DevOps Within

Stories from a man against the machines.

Whose MAC is this anyway?

One step back and two steps forward

In the previous installment I showed that we can find the MAC addresses of network neighbors using LLDP. While MAC addresses might work great as node identifiers (they have to be unique on the network if we don't want bad things to happen), this is not the usual way of addressing things over the network. The typical thing to use is an IP address, and there are ways of translating one to the other.

In fact, each time we try to send a network packet to a specific IP address, the Address Resolution Protocol (ARP) is involved. As usual, I don't want to get into the specifics of the protocol, but the core of it works as follows:

  1. Device 1 looks up Device 2's IP <-> physical address mapping in its ARP table
  2. If it is found, communication starts right away
  3. If not, Device 1 sends a 'who-has' broadcast packet asking about Device 2's IP address
  4. Device 2 responds with its MAC address
  5. Device 1 caches the mapping in its ARP table and then communicates with Device 2 directly

A good way of filling the table all at once is pinging the broadcast address, which should populate our ARP table with address mappings for all devices in the subnet. Then we could just look up a MAC address in the table to find its assigned IP.

This is great, but only works if:

  1. Devices are connected in local area network via Ethernet cable and switches
  2. There are no gateways or routers along the way

Welcome to the world where IP is made up and ARPs don't matter

It turns out that my idea of basing the application on this mechanism wasn't such a good one, because it's not very common for a computer to sit on the same physical network as the switches. This means that I cannot depend on ARP for MAC-to-IP translation in my app. Bugger. It took me a few evenings to find a workaround, and it turns out it was right in front of my eyes.

Let me quote a part of the LLDP MIB:

lldpRemManAddrTable OBJECT-TYPE
    SYNTAX      SEQUENCE OF LldpRemManAddrEntry
(...)
lldpRemManAddrEntry OBJECT-TYPE
    SYNTAX      LldpRemManAddrEntry
    MAX-ACCESS  not-accessible
    STATUS      current
    DESCRIPTION
            "Management address information about a particular chassis
            component.  There may be multiple management addresses
            configured on the remote system identified by a particular
            lldpRemIndex whose information is received on
            lldpRemLocalPortNum of the local system.  Each management
            address should have distinct 'management address
            type' (lldpRemManAddrSubtype) and 'management address'
            (lldpRemManAddr.)
            Entries may be created and deleted in this table by the
            agent."
    INDEX   { lldpRemTimeMark,
              lldpRemLocalPortNum,
              lldpRemIndex,
              lldpRemManAddrSubtype,
              lldpRemManAddr
 }

What does this mean? It means that the OID for lldpRemManAddrEntry (the remote management address of an LLDP neighbor) is indexed with... the neighbor's IP! So the whole MAC -> IP translation thing is not needed! :) As proof:

$ snmpwalk -M ~/.snmp/mibs/ -c <community> -v2c 10.1.1.42 lldpRemManAddrTable -Obfn
.1.0.8802.1.1.2.1.4.2.1.3.0.47.1.1.4.10.1.1.40 = INTEGER: ifIndex(2)
(...)
.1.0.8802.1.1.2.1.4.2.1.5.0.48.1.1.4.10.1.1.40 = OID: .0.0

The last four segments of the OID are the neighbor's IP address.
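Extracting the address is then just a matter of taking the last four segments of the OID string. A minimal Go sketch (the function name is mine, not from the actual code):

```go
package main

import (
	"fmt"
	"strings"
)

// ipFromOID extracts the neighbor's IPv4 address from the last four
// segments of an lldpRemManAddrTable OID.
func ipFromOID(oid string) string {
	parts := strings.Split(strings.TrimPrefix(oid, "."), ".")
	if len(parts) < 4 {
		return ""
	}
	return strings.Join(parts[len(parts)-4:], ".")
}

func main() {
	oid := ".1.0.8802.1.1.2.1.4.2.1.3.0.47.1.1.4.10.1.1.40"
	fmt.Println(ipFromOID(oid)) // prints 10.1.1.40
}
```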

This is the part where I don't show you my ugly code

So, all of this led to quite a few changes in the code itself.

First of all, I changed the OID fed to the BulkWalk method to the one representing lldpRemManAddrTable and started parsing its responses specifically.

Next, I decided to take small steps towards crawling the network: I created a struct representing a single port of a switch (with a pointer to the remote switch) and one representing the switch itself (with its assigned IP and a list of ports). A switch also has a method for finding its neighbors, which creates missing nodes on the fly.
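The structs described above could be sketched roughly like this (all names here are my guesses, not the actual code):

```go
package main

import "fmt"

// Port is a single switch port; Remote points at the neighbor
// discovered on that port.
type Port struct {
	Remote *Switch
}

// Switch is a network node with its management IP and the ports
// on which LLDP neighbors were found, keyed by local port number.
type Switch struct {
	IP    string
	Ports map[int]*Port
}

// addNeighbor records a neighbor seen on the given local port,
// creating the remote node in the known-switches map on the fly
// if it is missing.
func (s *Switch) addNeighbor(known map[string]*Switch, port int, remoteIP string) {
	remote, ok := known[remoteIP]
	if !ok {
		remote = &Switch{IP: remoteIP, Ports: make(map[int]*Port)}
		known[remoteIP] = remote
	}
	s.Ports[port] = &Port{Remote: remote}
}

func main() {
	known := make(map[string]*Switch)
	root := &Switch{IP: "10.1.1.0", Ports: make(map[int]*Port)}
	known[root.IP] = root

	root.addNeighbor(known, 46, "10.1.1.40")
	root.addNeighbor(known, 48, "10.0.0.252")

	for port, p := range root.Ports {
		fmt.Printf("port %d -> %s\n", port, p.Remote.IP)
	}
}
```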

The current output of the application looks as follows. It is, admittedly, quite ugly, but it shows that I have found all the devices connected to the starting point, 10.1.1.0.

[ `./i-must-go` | done: 163.531855ms ]
  * Switch 10.1.1.20: map[]
  * Switch 10.1.1.40: map[]
  * Switch 10.0.0.252: map[]
  * Switch 10.1.1.0: map[%!s(int=46):Remote IP:10.1.1.40
   %!s(int=48):Remote IP:10.0.0.252
   %!s(int=38):Remote IP:10.1.1.1
   %!s(int=43):Remote IP:10.1.1.20
   %!s(int=44):Remote IP:10.1.1.20
   %!s(int=41):Remote IP:10.1.1.3
   %!s(int=42):Remote IP:10.1.1.3
   %!s(int=45):Remote IP:10.1.1.40
   %!s(int=37):Remote IP:10.1.1.1
   %!s(int=39):Remote IP:10.1.1.2
   %!s(int=40):Remote IP:10.1.1.2
  ]
  * Switch 10.1.1.1: map[]
  * Switch 10.1.1.2: map[]
  * Switch 10.1.1.3: map[]

Next steps

I'm getting closer and closer to the thing that scares me the most of all the planned things: playing with graph theory, crawling the network and finding all the nodes. This makes me feel bad about not graduating from my Computer Science studies ;-)

The code is available to look at on GitHub.