Hey, so I'm trying to debug some slightly strange tunneldigger behaviour and thought I'd check whether anyone here has any thoughts.

This page shows ping times to a few mesh nodes from a VPS monitor server:

http://192.241.217.196/smokeping/smokeping.cgi?target=Mesh

Both MaxbMyNet1 and MaxbMyNet2 show a consistent increase in ping times starting Monday (5-25-15) around 11am.

MaxbMyNet1 has a direct ethernet connection to the internet and is tunnelling to the exit server, while MaxbMyNet2 does not have any ethernet connection and is instead connecting to the internet through MaxbMyNet1.
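
For reference, the path each node's traffic actually takes can be double-checked from the router itself. This is just a sketch, and assumes traceroute is included in the busybox build on the firmware:

# from either node: show the hop-by-hop path out to the internet, which should
# confirm whether traffic is riding through MaxbMyNet1 and the tunnel
traceroute -n 8.8.8.8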

If I ssh into MaxbMyNet1, I can see that the l2tp0 tunnel is correctly set up and that tunneldigger seems to be running normally:

root@my:~# ps | grep tunneldigger
 9538 root      5296 S    /usr/bin/tunneldigger -u Sudomesh-MyNet-2 -i l2tp0 -t 1 -b 104.236.181.226 8942 -L 20000kbit -s /opt/mesh/tunnel_hook -I eth0.1
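
If the client logs to syslog, any broker renegotiation around 11am on Monday should also show up in the system log ring buffer; something like:

# look for tunneldigger broker/session messages in the OpenWrt log
logread | grep -i tunneldigger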

root@my:~# ip addr show l2tp0
18: l2tp0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1438 qdisc htb state UNKNOWN group default qlen 1000
    link/ether da:d8:46:b7:d7:9b brd ff:ff:ff:ff:ff:ff
    inet 100.64.3.1/32 scope global l2tp0
       valid_lft forever preferred_lft forever
    inet6 fe80::d8d8:46ff:feb7:d79b/64 scope link
       valid_lft forever preferred_lft forever
root@my:~# ip addr show eth0.1
11: eth0.1@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 00:90:a9:0b:73:cb brd ff:ff:ff:ff:ff:ff
    inet 192.168.13.37/24 brd 192.168.13.255 scope global eth0.1
       valid_lft forever preferred_lft forever
    inet 192.168.0.102/24 brd 192.168.0.255 scope global eth0.1
       valid_lft forever preferred_lft forever
    inet6 fe80::290:a9ff:fe0b:73cb/64 scope link
       valid_lft forever preferred_lft forever
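
The kernel's view of the L2TP tunnel and session can be checked too, in case the tunnel got rebuilt with different parameters at some point. This assumes the ip binary on the node was built with l2tp support:

# dump the kernel's L2TP state to compare against what tunneldigger set up
ip l2tp show tunnel
ip l2tp show session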

Even more strangely, I can ping the world-routable IP of the exit server and get back ping times consistent with the lower line of the graph:

root@my:~# ping 104.236.181.226
PING 104.236.181.226 (104.236.181.226): 56 data bytes
64 bytes from 104.236.181.226: seq=0 ttl=52 time=14.670 ms
64 bytes from 104.236.181.226: seq=1 ttl=52 time=14.264 ms
64 bytes from 104.236.181.226: seq=2 ttl=52 time=13.241 ms
64 bytes from 104.236.181.226: seq=3 ttl=52 time=13.949 ms
64 bytes from 104.236.181.226: seq=4 ttl=52 time=13.626 ms
64 bytes from 104.236.181.226: seq=5 ttl=52 time=18.133 ms
64 bytes from 104.236.181.226: seq=6 ttl=52 time=13.531 ms
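
It might also be worth pinging the exit server's tunnel-side address over l2tp0 itself, to see whether the extra latency already shows up inside the tunnel before any forwarding happens at the exit. I don't have the exit-side mesh IP in front of me, so the address below is just a placeholder:

# <exit-tunnel-ip> is a placeholder for the exit server's l2tp/mesh address
ping -I l2tp0 <exit-tunnel-ip>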

And if I force pings out over the eth0.1 interface and NOT the l2tp0 interface, they show low ping times:

root@my:~# ping -I eth0.1 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: seq=0 ttl=55 time=21.834 ms
64 bytes from 8.8.8.8: seq=1 ttl=55 time=16.872 ms
64 bytes from 8.8.8.8: seq=2 ttl=55 time=19.764 ms
64 bytes from 8.8.8.8: seq=3 ttl=55 time=17.265 ms
64 bytes from 8.8.8.8: seq=4 ttl=55 time=16.989 ms
64 bytes from 8.8.8.8: seq=5 ttl=55 time=18.188 ms


However, if I ping over the tunnel and through the exit server, I get the slower times:

root@my:~# ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: seq=0 ttl=56 time=28.958 ms
64 bytes from 8.8.8.8: seq=1 ttl=56 time=29.211 ms
64 bytes from 8.8.8.8: seq=2 ttl=56 time=28.965 ms
64 bytes from 8.8.8.8: seq=3 ttl=56 time=29.022 ms
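
Just to confirm that this plain ping really is going out over the tunnel, the route lookup can be checked directly (assuming the ip build on the node supports 'route get'):

# show which interface/gateway the kernel picks for 8.8.8.8 by default
ip route get 8.8.8.8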


And then, weirdly, restarting tunneldigger on the MyNet seems to have fixed it (look for the new line that will probably start around 16:00 on Monday, back down at the lower time).
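
In case it creeps back up, a rough watch loop like this could log both paths side by side for later comparison. It's only a sketch, and assumes busybox ping supports -q and -c in addition to the -I used above:

# every 5 minutes, record a 5-ping summary over the default (tunnel) path and over eth0.1
while true; do
  echo "$(date) tunnel: $(ping -q -c 5 8.8.8.8 | tail -n 1)"
  echo "$(date) direct: $(ping -q -c 5 -I eth0.1 8.8.8.8 | tail -n 1)"
  sleep 300
done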

Thoughts? I'll keep taking a look at it. It's possible it has something to do with our up hook on the exit server, which adds the new l2tp interface to babel, but I wanted to put it out there in case anyone has ideas.
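
For whoever has access to the exit server, a quick sanity check would be to see which l2tp interface currently carries this node's tunnel address (100.64.3.1 above) and whether anything is left over from the session before the restart. Rough sketch; the interface names there will obviously differ:

# on the exit server: list l2tp interfaces and the route back to this node
ip -o link show | grep l2tp
ip route | grep 100.64.3.1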


Max