Hey, so I'm trying to debug some slightly strange tunneldigger behaviour and
thought I'd check whether anyone here has any thoughts.
This page shows ping times to a few mesh nodes from a VPS monitor server:
http://192.241.217.196/smokeping/smokeping.cgi?target=Mesh
Both MaxbMyNet1 and MaxbMyNet2 show a consistent increase in ping times
starting Monday (5-25-15) at around 11am.
MaxbMyNet1 has a direct ethernet connection to the internet and tunnels to
the exit server, while MaxbMyNet2 has no ethernet connection and instead
reaches the internet through MaxbMyNet1.
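So the traffic path for MaxbMyNet2 is roughly:

  MaxbMyNet2 --(mesh)--> MaxbMyNet1 --(l2tp0 tunnel over eth0.1)--> exit server --> internet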
If I ssh into MaxbMyNet1, I can see that the l2tp0 tunnel is correctly set
up and that tunneldigger seems to be running fine:
root@my:~# ps | grep tunneldigger
 9538 root      5296 S    /usr/bin/tunneldigger -u Sudomesh-MyNet-2 -i l2tp0 -t 1 -b 104.236.181.226 8942 -L 20000kbit -s /opt/mesh/tunnel_hook -I eth0.1
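It might also be worth checking whether the system log shows tunneldigger
reconnecting or hitting broker errors around that time (logread being the
usual syslog reader on these nodes):

# look for tunneldigger reconnects / broker errors in the system log
logread | grep -i tunneldigger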
root@my:~# ip addr show l2tp0
18: l2tp0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1438 qdisc htb state UNKNOWN group default qlen 1000
    link/ether da:d8:46:b7:d7:9b brd ff:ff:ff:ff:ff:ff
    inet 100.64.3.1/32 scope global l2tp0
       valid_lft forever preferred_lft forever
    inet6 fe80::d8d8:46ff:feb7:d79b/64 scope link
       valid_lft forever preferred_lft forever
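Since the interface is showing qdisc htb (presumably from the -L 20000kbit
limit), the shaper and the tunnel counters might be worth a look too,
assuming tc is installed on the node:

# packet/error/drop counters on the tunnel interface
ip -s link show l2tp0
# what the htb shaper on the tunnel is actually configured to
tc qdisc show dev l2tp0
tc class show dev l2tp0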
root@my:~# ip addr show eth0.1
11: eth0.1@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 00:90:a9:0b:73:cb brd ff:ff:ff:ff:ff:ff
    inet 192.168.13.37/24 brd 192.168.13.255 scope global eth0.1
       valid_lft forever preferred_lft forever
    inet 192.168.0.102/24 brd 192.168.0.255 scope global eth0.1
       valid_lft forever preferred_lft forever
    inet6 fe80::290:a9ff:fe0b:73cb/64 scope link
       valid_lft forever preferred_lft forever
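And just to confirm which way traffic actually leaves by default, the
standard iproute2 route lookup should settle it:

# which route / source address a packet to 8.8.8.8 would take
ip route get 8.8.8.8
# full table, to see what points at l2tp0 vs eth0.1
ip route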
Even more strangely, I can ping the world-routable IP of the exit server
and get back ping times consistent with the lower line of the graph:
root@my:~# ping 104.236.181.226
PING 104.236.181.226 (104.236.181.226): 56 data bytes
64 bytes from 104.236.181.226: seq=0 ttl=52 time=14.670 ms
64 bytes from 104.236.181.226: seq=1 ttl=52 time=14.264 ms
64 bytes from 104.236.181.226: seq=2 ttl=52 time=13.241 ms
64 bytes from 104.236.181.226: seq=3 ttl=52 time=13.949 ms
64 bytes from 104.236.181.226: seq=4 ttl=52 time=13.626 ms
64 bytes from 104.236.181.226: seq=5 ttl=52 time=18.133 ms
64 bytes from 104.236.181.226: seq=6 ttl=52 time=13.531 ms
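One check that might separate "the path to the exit server is slow" from
"the tunnel itself adds latency" is pinging the exit server's tunnel-side
address instead of its public one (100.64.0.1 below is just a placeholder,
substitute whatever its mesh address actually is):

# ping the exit server over the inside of the tunnel (placeholder address)
ping 100.64.0.1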
And if I tell ping to go out over the eth0.1 interface and NOT the l2tp0
interface, the times are low:
root@my:~# ping -I eth0.1 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: seq=0 ttl=55 time=21.834 ms
64 bytes from 8.8.8.8: seq=1 ttl=55 time=16.872 ms
64 bytes from 8.8.8.8: seq=2 ttl=55 time=19.764 ms
64 bytes from 8.8.8.8: seq=3 ttl=55 time=17.265 ms
64 bytes from 8.8.8.8: seq=4 ttl=55 time=16.989 ms
64 bytes from 8.8.8.8: seq=5 ttl=55 time=18.188 ms
However, if I ping over the tunnel and through the exit server, I get the
slower times:
root@my:~# ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: seq=0 ttl=56 time=28.958 ms
64 bytes from 8.8.8.8: seq=1 ttl=56 time=29.211 ms
64 bytes from 8.8.8.8: seq=2 ttl=56 time=28.965 ms
64 bytes from 8.8.8.8: seq=3 ttl=56 time=29.022 ms
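Comparing traceroutes over the two paths might show where the extra latency
gets added (I believe busybox traceroute takes -i for the interface), and
if tcpdump happens to be installed, watching the encapsulating UDP traffic
while pinging could show anything odd going on underneath the tunnel:

# path via the default route (through the tunnel) vs. forced out eth0.1
traceroute 8.8.8.8
traceroute -i eth0.1 8.8.8.8
# watch the UDP traffic to the exit server while the slow ping is running
tcpdump -ni eth0.1 udp and host 104.236.181.226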
And then, weirdly, restarting tunneldigger on the MyNet seems to have fixed
it (look for the new line that will probably start around 16:00 on Monday,
which will be at the lower time).
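For the record, "restarting tunneldigger" here just means the usual init
script restart on the node, assuming that's how our firmware packages it:

/etc/init.d/tunneldigger restart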
Thoughts? I'll keep taking a look at it. It's possible it has something to
do with our up hook on the exit server, which adds the new l2tp interface
to babel, but I wanted to put it out there in case anyone has ideas.
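If it is the up hook, my guess is it would show up on the exit server as
stale l2tp interfaces still holding routes after a reconnect, which should
be visible with something like:

# on the exit server: l2tp interfaces and any routes still pointing at them
ip link show | grep l2tp
ip route | grep l2tp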
Max