[mesh-dev] We need to talk about batman

max b maxb.personal at gmail.com
Sun Oct 19 19:35:19 PDT 2014


Met with Alex today and we went over a bunch of stuff.

TL;DR
iPhones and other OSes now work (! ...mostly), but UDP is probably still
broken. There may be some other hacky workarounds, but we may start taking a
more serious look at babel or bmx6.


Alex explained that because open0 and bat0 are bridged, and because the
destination address of a packet is on the layer 2 network managed by
batman, the br-openmesh bridge interface won't send back the ICMP response
that tells the client the correct MTU.

The first place that layer 3 routing happens is on the exit/relay server,
which is already past the point where an oversized packet would have been
dropped.
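
For anyone who wants to see the drop for themselves, here is a rough sketch
of a probe. It assumes Linux iputils ping (where -M do sets Don't Fragment).
The 1500-byte MTU and the example.org target are placeholders, not values we
measured, and the helper names are made up for illustration. The script only
prints the command so it is safe to run anywhere.

```shell
#!/bin/sh
# ICMP echo payload that exactly fills a given MTU over IPv4:
# MTU - 20 (IPv4 header) - 8 (ICMP header).
ping_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Print (rather than run) a ping with Don't Fragment set.
# $1 = MTU to probe, $2 = target host.
df_probe_cmd() {
    echo "ping -M do -c 3 -s $(ping_payload_for_mtu "$1") $2"
}

# 1500 and example.org are placeholders.
df_probe_cmd 1500 example.org
```

On a routed path I would expect a too-large probe to get an ICMP error back.
Across our bridge, if the explanation above is right, it should simply time
out.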

One partial solution that we implemented was setting up TCP MSS clamping
on the exit server. This commit handles that:

https://github.com/sudomesh/exitnode/commit/66a7523895053357a993a4ff61362eb89ca9e8c9

We tested it and an iPhone was able to connect to the peoplesopen.net
network, and everything we tried seemed to work just fine. So that's nice!
However, this will ONLY work for TCP connections, since the MSS is a TCP
option negotiated in the SYN packets. Everything over UDP will still have
the mismatched MTU issue.
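
As a rough illustration of the arithmetic behind the clamp (not necessarily
what the commit above does), here is a small sketch. It assumes IPv4 with no
IP or TCP options and takes the ~1400-byte tunnel figure from the thread as
the MTU. The helper name mss_for_mtu is made up, and the rule is printed
rather than applied.

```shell
#!/bin/sh
# Largest TCP payload that fits in one packet of the given MTU:
# MTU - 20 (IPv4 header) - 20 (TCP header).
mss_for_mtu() {
    echo $(( $1 - 40 ))
}

# Assumed value, based on the ~1400-byte L2TP envelope mentioned in the thread.
TUNNEL_MTU=1400
MSS=$(mss_for_mtu "$TUNNEL_MTU")

# Print the clamp rule instead of applying it (applying needs root).
echo "iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss $MSS"
```

With a 1400-byte MTU this gives an MSS of 1360. Nothing equivalent exists
for UDP, which is why those flows are still exposed to the drop.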

Alex and I discussed that it would be fairly easy to unbridge the open0 and
bat0 interfaces, assign them IP addresses, and set up forwarding rules
between them. That way, traffic arriving at the node would actually be layer
3 routed and the node would have the opportunity to return the ICMP
"Fragmentation required" response. It might work, but it's a kind of weird
hack and probably less than ideal.
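
A minimal sketch of what that unbridging might look like, purely as a
starting point. The interface and bridge names come from this thread, but
the two subnets are invented for illustration, and we have not tested
whether our batman-adv setup tolerates this. DRY_RUN defaults to 1, so the
commands are only printed.

```shell
#!/bin/sh
# With DRY_RUN=1 (the default) commands are printed instead of executed.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

unbridge_and_route() {
    # Take both interfaces out of the bridge.
    run brctl delif br-openmesh open0
    run brctl delif br-openmesh bat0
    # Hypothetical addresses: one subnet per side.
    run ip addr add 10.0.1.1/24 dev open0
    run ip addr add 10.0.2.1/24 dev bat0
    # Let the kernel route between them, so it can emit ICMP errors.
    run sysctl -w net.ipv4.ip_forward=1
}

unbridge_and_route
```

Clients behind open0 would then need addresses and a default route in the
open0 subnet, which is part of why this feels like a hack.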

We also talked a lot about how maybe the fundamental issue here is trying
to create a layer 2 mesh network over layer 3. We agreed that we'd take a
more serious look at what exactly a layer 3 mesh protocol would require and
how it would be implemented, and that we may start some amount of parallel
work testing out babel or bmx6. If anyone is interested in that, let us know
and we can figure out how to split up that work.


Max

On Sun, Oct 19, 2014 at 3:17 PM, max b <maxb.personal at gmail.com> wrote:

> Can we get a clearer understanding of how exactly we might reproduce
> these issues? All of my desktop clients (Mac OS X, Linux, Windows 7) seem to
> be able to connect. They're all able to view YouTube videos and seem to
> browse the web fairly reliably. My Android phone is working for 90% of
> applications, although it looks like maybe some of the apps aren't
> connecting reliably. That being said, some of these apps seem to always
> have some sort of connectivity issues and I'm having a hard time isolating
> them.
>
> Furthermore, isn't this MTU problem an issue that would occur on every
> batman-adv network that is connected to the internet? I'm not able to
> articulate this as well as I'd like, but I'm not seeing how this is
> specific to our particular network structure...
>
> Also - tried this on my picostation:
> root@my:~# iptables -t mangle -A POSTROUTING -s 10.0.0.0/8 -p tcp \
>     --tcp-flags SYN,RST SYN -j TCPMSS --set-mss 1400
> root@my:~# iptables -t mangle -A POSTROUTING -d 10.0.0.0/8 -p tcp \
>     --tcp-flags SYN,RST SYN -j TCPMSS --set-mss 1400
>
> I'm wondering, though: since this is also applied at layer 3, it probably
> won't catch the sort of issue that you're describing...
>
> I'm curious though, if the scenario you've described is accurate, why
> wouldn't the bridge (which is layer 3, and which has an IP address and a
> set MTU) send the ICMP response? In terms of layer 3 traffic, we
> have a client with a layer 3 IP address and then we also have a mesh node
> with a layer 3 IP address (which is the bridged interface).
>
>
> Also - are these only hosts which have DHCP clients that don't respect the
> MTU option?
>
>
> Hopefully catch you all on Tuesday, but things have been a little crazy on
> my end, so we'll see....
>
> On Fri, Oct 17, 2014 at 12:38 PM, Alexander Papazoglou <papazoga at gmail.com> wrote:
>
>> Max,
>>
>> We can't do what you're suggesting because the open0 interface is not
>> operating as a layer 3 interface. It is bridged to br-openmesh along with
>> bat0. This means that an oversized packet (e.g. one that doesn't fit in
>> the L2TP envelope of ~1400 bytes) arriving at open0 and headed toward bat0
>> wouldn't trigger an ICMP response, and would be unceremoniously dropped.
>>
>> The ICMP response is triggered by the IP protocol layer (layer 3). That
>> response is also the only way a Windows (and I think OS X) client would
>> know that the MTU is smaller than it thinks.
>>
>> Assuming this is correct, which is still up for debate, we have two
>> options:
>>
>> (1) find a way to make the Win/OSX (Android/iOS?) client understand that
>>     it must use a lower MTU (DHCP is not an option).
>> (2) remove the bridge and forward at layer 3 (so that ICMP responses
>>     would be triggered, and the client can discover its MTU).
>>
>> Alex
>>
>>
>> 2014-10-17 12:06 GMT-07:00 Max B <maxb.personal at gmail.com>:
>>
>>> Not that I'm arguing in favor of layer 2 vs layer 3 forwarding (although
>>> we're already pretty deep in certain parts of layer 2 implementations), but
>>> why can't we just match the MTU of the open0 interface to the bat0
>>> interface?
>>>
>>>
>>>
>>> On 10/17/14, 11:51 AM, Alexander Papazoglou wrote:
>>>
>>> Hello mesh-dev.
>>>
>>> I think we may finally have an explanation of the vexing issue of "I
>>> can't connect to the internet over peoplesopen.net."
>>>
>>> Marc and I spent some time last night staring at wireshark dumps and
>>> thinking about why some clients are unable to consistently connect via
>>> the tunnel. I think Marc came up with a disappointing but correct
>>> answer: it is basically an MTU issue (the MTU is not being discovered
>>> correctly), BUT there is no good fix because we are tunneling at layer 2.
>>>
>>> When a packet arrives at a node from a client whose MTU is too large,
>>> what SHOULD happen in a normal forwarding situation (per RFC 1191) is
>>> that the node issues an ICMP "Destination Unreachable" packet with a
>>> "Fragmentation required" code. The client then uses this information to
>>> reset its MTU.
>>>
>>> This doesn't happen because we aren't really forwarding (forwarding
>>> happens at layer 3). Instead, our interfaces (open0 and bat0) are
>>> bridged. So if a frame coming from open0 doesn't fit into bat0 it most
>>> likely gets silently dropped.
>>>
>>> So bridging open0 with bat0 is a disaster. A quick fix might be to
>>> replace bridging with forwarding (at the IP level). I suspect this is
>>> not the right thing to do. It might be better to abandon the idea of
>>> meshing at layer 2; there are numerous advantages to this.
>>>
>>> In any case, we should discuss options this Tuesday.
>>>
>>> Alex
>>>
>>>
>>>
>>> _______________________________________________
>>> mesh-dev mailing list
>>> mesh-dev at lists.sudoroom.org
>>> https://lists.sudoroom.org/listinfo/mesh-dev
>>>
>>>
>>
>