OK, here’s a recipe to try:
- Get a development board or PC that has two or more network interfaces,
- Assign each of them a unique IP address,
- Connect the interfaces to a common network,
- Finally, ping one of the IP addresses from another machine.
Now, depending on which IP address you chose to ping, you may find that your pings suddenly stop getting replies and time out when you disconnect the other interface (i.e. the interface you are not pinging). Bizarre, isn’t it?
However, as my colleague and I recently discovered whilst debugging a new Ethernet driver, this gotcha is actually correct behaviour for Linux, and in fact it is correct behaviour as defined by the relevant RFCs. I thought I’d use this post to explain what is going on and why this is OK.
To reproduce this behaviour I set up a virtual machine and gave it four NAT’d Ethernet devices. I also set up Wireshark (formerly Ethereal) so that I could monitor the traffic. Here is a cut-down version of the output from ifconfig.
$ ifconfig
eth0      Link encap:Ethernet  HWaddr 00:0c:29:12:0b:bd
          inet addr:192.168.27.132  Bcast:192.168.27.255  Mask:255.255.255.0
eth1      Link encap:Ethernet  HWaddr 00:0c:29:12:0b:c7
          inet addr:192.168.27.133  Bcast:192.168.27.255  Mask:255.255.255.0
eth2      Link encap:Ethernet  HWaddr 00:0c:29:12:0b:d1
          inet addr:192.168.27.135  Bcast:192.168.27.255  Mask:255.255.255.0
eth3      Link encap:Ethernet  HWaddr 00:0c:29:12:0b:db
          inet addr:192.168.27.134  Bcast:192.168.27.255  Mask:255.255.255.
I cleared the ARP cache on my Windows machine by using “arp -d” and then pinged 192.168.27.132. The packet exchange captured by Wireshark proved to be quite interesting. Let’s take a look.
No.  Source           Destination      Protocol  Info
1    Vmware_c0:00:08  Broadcast        ARP       Who has 192.168.27.132? Tell 192.168.27.1
2    Vmware_12:0b:db  Vmware_c0:00:08  ARP       192.168.27.132 is at 00:0c:29:12:0b:db
3    192.168.27.1     192.168.27.132   ICMP      Echo (ping) request
4    Vmware_12:0b:d1  Vmware_c0:00:08  ARP       192.168.27.132 is at 00:0c:29:12:0b:d1
5    Vmware_12:0b:c7  Vmware_c0:00:08  ARP       192.168.27.132 is at 00:0c:29:12:0b:c7
6    Vmware_12:0b:bd  Vmware_c0:00:08  ARP       192.168.27.132 is at 00:0c:29:12:0b:bd
7    192.168.27.132   192.168.27.1     ICMP      Echo (ping) reply
8    192.168.27.1     192.168.27.132   ICMP      Echo (ping) request
9    192.168.27.132   192.168.27.1     ICMP      Echo (ping) reply
10   192.168.27.1     192.168.27.132   ICMP      Echo (ping) request
11   192.168.27.132   192.168.27.1     ICMP      Echo (ping) reply
After I invoke the ping command, my machine issues an ARP broadcast asking for the MAC address currently associated with 192.168.27.132. However, all of the network interfaces of my virtual machine respond, resulting in 4 ARP replies. When this happens Windows (and other operating systems) will ignore all but the first response, on the assumption that the first reply must have come via the quickest route.
In this example the quickest ARP reply came from the MAC address associated with eth3. Therefore whenever we communicate with 192.168.27.132, as we have done via ping, the traffic will be sent to eth3. As a result, if we now take eth3 down with “ifconfig eth3 down”, our pings will fail. This behaviour can be confusing: why should eth3 going down affect traffic directed to 192.168.27.132, which ifconfig shows as belonging to eth0?
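To make the first-reply-wins idea concrete, here is a minimal Python sketch that replays the four ARP replies from the capture above. The names `IFACE_BY_MAC`, `REPLIES` and `resolve_first_reply` are made up for illustration, and the caching policy is the one assumed in this post rather than a verified description of how Windows handles ARP replies.

```python
# MAC address -> interface name, taken from the ifconfig output above.
IFACE_BY_MAC = {
    "00:0c:29:12:0b:bd": "eth0",
    "00:0c:29:12:0b:c7": "eth1",
    "00:0c:29:12:0b:d1": "eth2",
    "00:0c:29:12:0b:db": "eth3",
}

# ARP replies for 192.168.27.132, in the order Wireshark captured them
# (packets 2, 4, 5 and 6).
REPLIES = [
    ("192.168.27.132", "00:0c:29:12:0b:db"),
    ("192.168.27.132", "00:0c:29:12:0b:d1"),
    ("192.168.27.132", "00:0c:29:12:0b:c7"),
    ("192.168.27.132", "00:0c:29:12:0b:bd"),
]


def resolve_first_reply(replies):
    """Build an ARP cache that keeps only the first reply seen per IP.

    Assumption: later replies for an already-cached IP are ignored.
    """
    cache = {}
    for ip, mac in replies:
        cache.setdefault(ip, mac)  # no-op if ip is already cached
    return cache


cache = resolve_first_reply(REPLIES)
mac = cache["192.168.27.132"]
print(mac, IFACE_BY_MAC[mac])  # prints "00:0c:29:12:0b:db eth3"
```

Under that assumption the address configured on eth0 ends up mapped to eth3's MAC, which is why taking eth3 down breaks the ping.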
Despite the impression ifconfig gives, Linux associates IP addresses with the host as opposed to individual interfaces of the host. With that in mind, the behaviour we’ve seen doesn’t seem so bizarre. When a network interface receives an ARP request for an IP address which the host owns, then in effect a valid network route has been established between the requestor and the requested. This route could potentially be the only route, and as it is likely that the two will communicate with each other, it makes sense to reply to the ARP request. And this is what happens: the network interface that received the ARP request will now act as a proxy for the requested IP address.
This behaviour is actually quite convenient. In our example, even though our pings began to fail once we disconnected a route to the host, as soon as the Windows ARP cache entry times out (after 10 minutes) another ARP request will be broadcast. As before, any interface that can provide a route to the host will respond, and so connectivity will be restored. If Linux weren’t designed in this way and each interface truly owned an IP address, then if that link went down connectivity would never be restored to that address, even though there are other physical connections to the machine that has that IP address!
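The recovery sequence can be sketched as a toy model in Python. This is a simulation under stated assumptions, not a description of the kernel's ARP code: the `Interface` and `Host` classes are hypothetical, the reply order is fixed to the order seen in the capture, and what happens to an address whose own interface is taken down is not modelled.

```python
from dataclasses import dataclass


@dataclass
class Interface:
    name: str
    mac: str
    ip: str
    up: bool = True


class Host:
    """Toy model of a host where addresses belong to the host itself."""

    def __init__(self, interfaces):
        # Assumption: list order stands in for how quickly each one replies.
        self.interfaces = interfaces

    def arp_replies(self, target_ip):
        """Return (name, mac) for every interface that answers target_ip."""
        local_ips = {iface.ip for iface in self.interfaces}
        if target_ip not in local_ips:
            return []
        # Every interface that is up answers, whichever one "has" the IP.
        return [(i.name, i.mac) for i in self.interfaces if i.up]


host = Host([
    Interface("eth3", "00:0c:29:12:0b:db", "192.168.27.134"),
    Interface("eth2", "00:0c:29:12:0b:d1", "192.168.27.135"),
    Interface("eth1", "00:0c:29:12:0b:c7", "192.168.27.133"),
    Interface("eth0", "00:0c:29:12:0b:bd", "192.168.27.132"),
])

# First resolution: the client caches the first reply, which is eth3.
before = host.arp_replies("192.168.27.132")[0]

# eth3 goes down; the stale cache entry now points at a dead link.
host.interfaces[0].up = False

# Once the cache entry expires, the client asks again and eth2 answers first.
after = host.arp_replies("192.168.27.132")[0]

print(before[0], "->", after[0])  # prints "eth3 -> eth2"
```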
The other point of interest here, at least with this contrived networking configuration, is that reliability is favoured over performance. The reason is that where multiple interfaces exist on a machine, it’s quite likely that a priority ordering will exist between them. And so if, as in this case, eth3 replies the quickest then it is likely that it will always be the quickest. As a result, it is also likely to respond to all the ARP requests first and so all traffic for the 4 IP addresses will arrive on a single interface. We can demonstrate this. After pinging all of the IP addresses assigned to my virtual machine we can examine the ARP cache of Windows.
arp -a

Interface: 192.168.27.1 --- 0x2
  Internet Address      Physical Address      Type
  192.168.27.132        00-0c-29-12-0b-bd     dynamic
  192.168.27.133        00-0c-29-12-0b-bd     dynamic
  192.168.27.134        00-0c-29-12-0b-bd     dynamic
  192.168.27.135        00-0c-29-12-0b-bd     dynamic
As you can see, all four IP addresses map to the same MAC address and therefore to a single interface (00-0c-29-12-0b-bd, which ifconfig lists as eth0). Thus all the traffic will go over a single 10/100Mbit link instead of 4 links.
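For a larger cache it is easier to check this mechanically. The following Python sketch groups the entries of the `arp -a` listing above by MAC address; `group_by_mac` is a hypothetical helper, and the parsing assumes the three-column Windows layout shown in this post.

```python
from collections import defaultdict

# The Windows "arp -a" listing from this post.
ARP_OUTPUT = """\
Interface: 192.168.27.1 --- 0x2
  Internet Address      Physical Address      Type
  192.168.27.132        00-0c-29-12-0b-bd     dynamic
  192.168.27.133        00-0c-29-12-0b-bd     dynamic
  192.168.27.134        00-0c-29-12-0b-bd     dynamic
  192.168.27.135        00-0c-29-12-0b-bd     dynamic
"""


def group_by_mac(arp_output):
    """Map each MAC address to the list of IP addresses cached for it."""
    groups = defaultdict(list)
    for line in arp_output.splitlines():
        fields = line.split()
        # Entry rows have three fields (IP, MAC, type); the MAC is
        # written as six hyphen-separated octets.
        if len(fields) == 3 and fields[1].count("-") == 5:
            ip, mac, _entry_type = fields
            groups[mac].append(ip)
    return dict(groups)


for mac, ips in group_by_mac(ARP_OUTPUT).items():
    print(mac, len(ips), "addresses:", ", ".join(ips))
```

A single key in the result means every address is being reached through one link.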
Fortunately, where this behaviour isn’t ideal, the proc interface provides a means to modify it. Of particular interest are the arp_filter and rp_filter sysctl knobs, which can be found under /proc/sys/net/ipv4/conf/. I’ve not really managed to make complete sense of these yet and may well write another post on them in the future. To get the behaviour described above, though, I had to invoke “echo 0 > /proc/sys/net/ipv4/conf/all/rp_filter”. Without this I would only ever get two ARP replies instead of 4. I’m not entirely sure why this is… suggestions anyone?
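Before changing anything it helps to see what the knobs are currently set to. Here is a small Python sketch that reads arp_filter and rp_filter for a given interface directory. The `read_knobs` function and its parameters are illustrative names of my own, and it assumes the usual Linux layout of one file per knob under /proc/sys/net/ipv4/conf/<interface>/.

```python
from pathlib import Path


def read_knobs(iface="all", knobs=("arp_filter", "rp_filter"),
               root="/proc/sys/net/ipv4/conf"):
    """Return {knob: int value}, or None for a knob that cannot be read."""
    values = {}
    for knob in knobs:
        path = Path(root) / iface / knob
        try:
            values[knob] = int(path.read_text().strip())
        except (OSError, ValueError):
            # Missing file (e.g. not on Linux) or unexpected contents.
            values[knob] = None
    return values


# Show the settings for "all" and for each interface used in this post.
for iface in ("all", "eth0", "eth1", "eth2", "eth3"):
    print(iface, read_knobs(iface))
```

The `root` parameter is only there so the function can be pointed at a copy of the directory tree for testing; on a real machine the default is what you want.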