On Mon, Mar 30, 2015 at 3:07 AM, Marc Juul <juul@labitat.dk> wrote:
Oh look: It's another very long email from juul!

I've been working on the config for the N750 + antenna-node configuration and here are my thoughts so far.

We should _not_ let the LAN ports be one big ethernet interface for use both as a wired version of open0 _and_ a way to connect antenna-nodes. We never want to bridge two interfaces with attached antenna-nodes but treating the multiple LAN ethernet interfaces as one interface is effectively the same as bridging.

Example scenario: You have two nanobridges on your roof linking you to two parts of the mesh. They are both plugged into LAN ports on your N750, and since the LAN ports are treated as one interface, the two nanobridges are able to communicate on layer 2. The nodes at the houses to which you are connecting with your nanobridges have the same setup. Now we effectively have the spanning tree protocol as our mesh protocol instead of babel.
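To make the scenario concrete, here is a toy Python sketch (not anything that would run on the nodes) that models which devices end up sharing a layer 2 domain. The device and segment names are made up for illustration.

```python
# Toy model: which devices end up sharing a layer 2 domain?
# All names here (lan, lan1, nanobridge_north, ...) are hypothetical.

def l2_domains(links):
    """Group devices by the layer 2 segment they are attached to."""
    domains = {}
    for device, segment in links:
        domains.setdefault(segment, set()).add(device)
    return domains

def can_talk_l2(domains, a, b):
    """True if devices a and b sit in the same layer 2 domain."""
    return any(a in members and b in members for members in domains.values())

# LAN ports treated as one switched interface: both nanobridges share a segment.
switched = l2_domains([
    ("nanobridge_north", "lan"),
    ("nanobridge_south", "lan"),
])

# Each antenna port as its own interface: one nanobridge per segment.
separate = l2_domains([
    ("nanobridge_north", "lan1"),
    ("nanobridge_south", "lan2"),
])

print(can_talk_l2(switched, "nanobridge_north", "nanobridge_south"))  # True
print(can_talk_l2(separate, "nanobridge_north", "nanobridge_south"))  # False
```

In the first case frames can pass straight from one nanobridge to the other without babel ever seeing them, which is exactly the bridging behaviour we want to avoid.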

It was Alex who pointed this out when we were talking about bridging but for some reason I hadn't connected the dots and recognized that the same is true when telling the built-in switch to treat the interfaces as one.

This means that each physical port on the N750 that we wish to connect to an antenna-node must be its own interface and should have a /32 netmask. These interfaces can still have the same node IP as open0, etc, without any problems.

I suggest we allocate two of the four ports for this purpose until we can make something that intelligently changes the config when an antenna-node has been connected.

We now have two remaining LAN ports that can act as a single interface. We could then bridge this LAN interface to open0. However, we want to avoid channel interference when sending from/to open0, while there is no reason to try to avoid channel interference for traffic coming from the wired interfaces. If we bridge them we cannot treat them differently, so we should not bridge.

It is not clear if babel channel diversity also takes into account channel information for manually published routes such as the open0 route we currently publish with this rule:

  redistribute  if open0  metric 128

I'll have to look at the source code to check. If the functionality is not there then it should be fairly easy to add.

Aaaanyway: Since we don't want to bridge open0 and LAN, this complicates things because we then need two subnets per node (otherwise we have the same /26 on both LAN and open0, which is not going to work). Less than a /26 on either interface gets iffy, so it seems like we'll need to have two /26 subnets per node now.
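For the arithmetic, a small Python sketch using the standard ipaddress module. It assumes, purely for illustration, that the two /26 subnets are handed out as one contiguous /25 per node; the 10.0.0.0/25 block is a made-up example, not our actual address plan.

```python
import ipaddress

# Hypothetical per-node allocation; the real address plan may differ.
node_block = ipaddress.ip_network("10.0.0.0/25")

# A /25 splits into exactly two /26 subnets: one for open0, one for LAN.
open0_net, lan_net = node_block.subnets(new_prefix=26)

print(open0_net, open0_net.num_addresses)  # 10.0.0.0/26 64
print(lan_net, lan_net.num_addresses)      # 10.0.0.64/26 64

# The two interfaces must not share a subnet.
assert not open0_net.overlaps(lan_net)
```

So each node would consume 128 addresses instead of 64, which is worth keeping in mind for the overall address space.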

One last but important thing I realized: If the antenna-nodes have their wifi and ethernet interfaces bridged, then we will have problems. Imagine the following setup:

(N750 A) ------ (nanobridge A) ~~~~ (nanobridge B) ----- (N750 B)

Where "-----" means ethernet and "~~~~" means wifi.

If both nanobridge A and nanobridge B are simply bridging their ethernet and wifi interfaces, then we have the following problems:

1. If e.g. nanobridge A sends a DHCP request, then it will be received by both N750 A and N750 B and they will have no reliable way of knowing if the request was sent by "their" nanobridge or the remote nanobridge.

2. When managing e.g. nanobridge A via the N750 A web admin interface it will be impossible for nanobridge A to know whether it should grant admin access to N750 A or N750 B.

I can think of only two solutions to this:

1. Pre-configure all antenna-nodes with static IPs and knowledge of which N750 node is their parent.

2. Don't bridge the ethernet and wifi interfaces on antenna-nodes and instead run babel on them.

There's a third solution: On boot-up the antenna-nodes do not bridge their interfaces. They then get an IP from the N750 and run a hook script that bridges the two interfaces. Another hook script is run to unbridge upon physical ethernet disconnect.

Unfortunately it seems like it's not possible to use dnsmasq as the dhcp server on these interfaces since the netmask will be /32 on the N750 and dnsmasq figures out which interface to use based on the subnet. It might be better to make a very very simple dhcp-like server that uses a different port and only ever gives out one specific IP for each port. This might not be a bad thing since it will prevent normal DHCP clients from getting an IP when they connect to the N750 ethernet ports dedicated to antenna-nodes.
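A rough Python sketch of what the core of such a server could look like. Everything in it is hypothetical: the interface names, the addresses, and the message format are invented for illustration, and the actual socket handling is left out.

```python
# Sketch of the "one fixed IP per port" idea.
# Interface names, addresses and message format are all made up.

LEASES = {
    "eth0.10": "10.0.1.2",  # first port dedicated to an antenna-node
    "eth0.11": "10.0.1.3",  # second port dedicated to an antenna-node
}

REQUEST = b"ANTENNA-REQ"

def handle_request(iface, payload):
    """Return the reply for a request seen on iface, or None to ignore it."""
    if payload != REQUEST:
        return None  # not our protocol, e.g. a normal DHCP client
    ip = LEASES.get(iface)
    if ip is None:
        return None  # this port is not dedicated to an antenna-node
    return ("ANTENNA-OFFER " + ip).encode()

print(handle_request("eth0.10", REQUEST))      # b'ANTENNA-OFFER 10.0.1.2'
print(handle_request("eth0.10", b"DISCOVER"))  # None
```

A real version would listen on one UDP socket per interface (on Linux, something like SO_BINDTODEVICE would probably be needed so the server knows which port a request arrived on), but the decision logic would stay about this small.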

I like solution 2 much better since it's easier for both us and node operators. Not sure if we'll see a performance hit if we don't bridge. We have a few nanobridges though so we could easily test this.

What do y'all say?

PS: Obviously babel channel diversity doesn't apply to antenna-nodes since babel sees them as ethernet interfaces. But since they are mostly directional and far away from the N750, it is fine to treat them as having no interference, which is the default for ethernet interfaces anyway.

PPS: Did you know that the default STP delay when adding an interface to a bridge is apparently 30 seconds? Not sure if OpenWRT has a different default or maybe uses RSTP (rapid spanning tree protocol) to deal with this, but if not then it is something to be aware of in order to not go insane when troubleshooting. From: http://www.linuxfoundation.org/collaborate/workgroups/networking/bridge#Does_DHCP_work_over.2Fthrough_a_bridge.3F