Fixing Source IP for Nginx Docker Container

docker iptables

Over the last week, I’ve been deploying containers on Google Cloud Platform. During these deployments we faced an issue where the source IP of the client was not preserved in the nginx logs. Our simplified architecture looks something like the following.

                                          Client: 1.2.3.4
                                                         
                                                     |   
                                                     |
                                                     |
                    +--------------------------------v------+
                    | Network Load Balancer       port 80   |
                    |                                       |
                    |                      123.123.123.123  |
                    +------|--------------------------------+
                           |                            
                           |                            
                           |                            
                           |                            
                    +------v-----------v-------------------+ 
                    |    eth0       docker0                |
                    |10.10.10.10    172.17.0.1             |
                    |                          expose 80   |
                    |                        +-----------+ |
                    |                        |nginx      | |
                    |                        |container  | |
                    |                        |172.17.0.2 | |
                    | Compute Instance       +-----------+ |
                    +--------------------------------------+
                                      

What we have is an nginx container running inside a compute instance, which sits behind a network load balancer (NLB). NLB accepts requests on port 80 and transparently forwards the packets to the compute instance on port 80.

Now the issue here is that when a client with IP address 1.2.3.4 makes a request to the service served by the nginx container, the nginx access logs do not record the IP of the client (remote_addr=1.2.3.4) but instead record the IP address of the docker0 bridge (172.17.0.1).

We’re using an L4 (TCP/UDP) load balancer here, not an L7 (HTTP) load balancer, hence there are no additional request headers (e.g. X-Forwarded-For) that could be used to solve this problem.

Okay, let’s revisit one statement from an earlier paragraph and dive deeper.

NLB accepts requests on port 80 and transparently forwards the packets to the compute instance on port 80.

The issue wouldn’t have surfaced if the NLB had sent the packets with the destination address set to the default private interface of the compute instance (eth0=10.10.10.10). Instead, the destination address of the packets is the address of the NLB (123.123.123.123). Why does that matter? Because Docker adds the following rules to the PREROUTING and DOCKER chains of the nat table.

-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER
-A DOCKER -d 10.10.10.10/32 ! -i docker0 -p tcp -m tcp --dport 80 -j DNAT --to-destination 172.17.0.2:80

The first rule checks whether the packet is addressed to any of the local IP addresses; if yes, it jumps to the DOCKER chain, which holds the mapping between exposed ports and the corresponding container IP. This works fine in normal scenarios but not here. Why? Because the second rule carries an extra check, that the packet is destined for eth0=10.10.10.10, and that check fails for packets addressed to 123.123.123.123. The DNAT is skipped and the packet falls through to the regular iptables rules.
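The matching logic can be modelled in a few lines of Python. This is only a simplified sketch of the two rules, not how netfilter is implemented: the function name, the set of local addresses, and the assumption that the host treats the NLB address as local are all illustrative, and the DNAT target is taken to be the nginx container address from the diagram (172.17.0.2).

```python
from ipaddress import ip_address, ip_network

ETH0 = "10.10.10.10"
NLB = "123.123.123.123"
CONTAINER = "172.17.0.2"  # nginx container address from the diagram


def nat_prerouting(dst_ip, dst_port, in_iface, local_addrs,
                   docker_dst="10.10.10.10/32"):
    """Return the (ip, port) a TCP packet is addressed to after PREROUTING."""
    # Rule 1: -m addrtype --dst-type LOCAL -j DOCKER
    if dst_ip not in local_addrs:
        return dst_ip, dst_port  # never enters the DOCKER chain
    # Rule 2: -d <docker_dst> ! -i docker0 -p tcp --dport 80 -j DNAT
    if (ip_address(dst_ip) in ip_network(docker_dst)
            and in_iface != "docker0"
            and dst_port == 80):
        return CONTAINER, 80
    return dst_ip, dst_port  # no rule matched, destination unchanged


# Assumption: the host considers all of these addresses local.
local = {ETH0, NLB, "172.17.0.1", "127.0.0.1"}

print(nat_prerouting(ETH0, 80, "eth0", local))  # ('172.17.0.2', 80)
print(nat_prerouting(NLB, 80, "eth0", local))   # ('123.123.123.123', 80)
```

A packet addressed to eth0 is rewritten to the container, while a packet addressed to the NLB keeps its original destination because the `-d 10.10.10.10/32` check does not match.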

How do you solve this? We came up with three possible solutions.

  1. Run nginx as a system service on all interfaces (0.0.0.0). This is a good-enough, last-resort solution, as we would have to bypass our container tooling to manage services.
  2. Run nginx with network_mode=host. This essentially bypasses the creation of a separate network namespace and runs the container on the host network stack.
  3. Add iptables rules similar to the ones Docker added to the PREROUTING chain, but with the NLB address as the destination IP.
-I PREROUTING -d 123.123.123.123 -p tcp -m tcp --dport 80 -j DOCKER
-A DOCKER -d 123.123.123.123/32 ! -i docker0 -p tcp -m tcp --dport 80 -j DNAT --to-destination 172.17.0.2:80
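For the third option, one way to keep such rules consistent is to generate them from a small set of inputs rather than writing them by hand. The helper below is a hypothetical sketch, not part of Docker or any existing tool. It only builds rule text in the same form as the rules above and leaves applying it to the nat table out entirely. The container address is assumed to be the one from the diagram (172.17.0.2).

```python
def nlb_rules(nlb_ip, container_ip, port=80, bridge="docker0"):
    """Build the two nat-table rules for one published TCP port."""
    return [
        # Send packets addressed to the NLB IP into the DOCKER chain.
        f"-I PREROUTING -d {nlb_ip} -p tcp -m tcp --dport {port} -j DOCKER",
        # Rewrite their destination to the container, unless they
        # arrived on the Docker bridge itself.
        f"-A DOCKER -d {nlb_ip}/32 ! -i {bridge} -p tcp -m tcp "
        f"--dport {port} -j DNAT --to-destination {container_ip}:{port}",
    ]


# Values taken from the diagram; the container IP is an assumption.
rules = nlb_rules("123.123.123.123", "172.17.0.2")
for rule in rules:
    print(rule)
```

The rules would still have to be regenerated whenever the container address changes, which is part of what makes this approach tedious to maintain.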

It would have been better if we could somehow implement the third solution, but maintaining iptables rules out of band of the Docker toolchain, and keeping them consistent, was too tedious a task. Hence we opted for solution #2, which is less complicated and simple to implement. We were also okay with network_mode=host because we knew no other service would be using port 80 wherever the nginx container is running. YMMV.