Mesh/Firmware/Splash page

Some operating systems will attempt to detect whether you are behind a captive portal when you connect to a new network. This usually involves fetching a web page with known content from a web server controlled by the company behind the operating system. If a captive portal is detected, the operating system will pop up a dialog showing the web page from the captive portal.

We will run a fake captive portal that causes our splash page to be displayed on any operating system supporting captive portal detection. The fake captive portal will _only_ interfere with captive portal detection traffic. All other traffic will go through unaltered. The fake captive portal will run on the exit nodes and so will only be active when the mesh is connected to the internet.

If the mesh becomes disconnected from the internet, all HTTP GET requests to the internet will receive HTTP 302 redirects to a community portal web app, running on a server on the mesh, that informs the user that the internet is down. The redirection is accomplished with internetisdownredirect. If the node is not connected to any server running the community portal web app, all HTTP GET requests will instead receive HTTP 302 redirects to a local web page telling the user that their mesh node has become disconnected from the rest of the mesh, and whom to contact about it.
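
A hedged sketch of how a node could implement that fallback, assuming the portal server's mesh address is known (the 10.0.0.20 address is made up, the bat0 interface is borrowed from the examples later on this page, and detecting that the uplink is down is a separate problem):

# while the internet is unreachable, steer all client HTTP traffic to the
# community portal, which answers every GET with a 302 to its "internet is
# down" page
iptables -t nat -A PREROUTING -i bat0 -p tcp --dport 80 \
  -j DNAT --to-destination 10.0.0.20:80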

Currently this page talks mostly about how to implement the fake captive portal.

Alternative Options

After some thought and investigation, there are some alternative options that build on the proposed actions above and the research in the sections that follow. Instead of listening for captive portal requests and then responding with a fake captive portal, we could do a few things differently. First, let's consider the use case:

Ideally, we want new users to know what this network is about.

Running a default captive portal is simply not an option; it impedes usage of the network. However, what if there were an unobtrusive way to display a message to users that did not require the crappy captive portal mechanism or complicated matching on special captive portal probes?

Types of captive portal detection

Android

Expects one of the following on a working connection:

  • an HTTP 204 response from http://clients3.google.com/generate_204
  • a zero-length response body from http://www.google.com/blank.html
  • possibly something else entirely

The captive portal detection method appears to have changed in Android 4.2.2.

The code that uses the HTTP 204 method is here. This is the master branch, which I assume is either the latest stable or the latest development version, so it is unclear what the "faster captive portal detection" in 4.2.2 refers to.
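
This probe is easy to observe from a client on the mesh; only the status code matters:

# print only the status code returned by the Android probe URL
curl -s -o /dev/null -w '%{http_code}\n' http://clients3.google.com/generate_204
# 204            -> Android concludes the connection is open
# anything else  -> Android flags a captive portal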

Mac OS

A request is made to "http://www.apple.com/library/test/success.html", presumably expecting the same HTTP 200 "Success" response described in the iOS section below.
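
The same kind of spot check works for the Mac OS probe:

# on an open connection this returns the small "Success" page shown in the iOS section
curl -s http://www.apple.com/library/test/success.html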

iOS

Here is the sequence of events, as verified by Juul (talk) using an iPhone running iOS 5.0.1:

First a DNS lookup is issued for www.apple.com. Next the following HTTP GET is issued to the resulting IP:

GET /library/test/success.html HTTP/1.0
Host: www.apple.com
User-Agent: CaptiveNetworkSupport-183 wispr
Connection: close

The result that it expects is an HTTP 200 with this content:

<HTML><HEAD><TITLE>Success</TITLE></HEAD><BODY>Success</BODY></HTML>

If it gets anything else, then it will do the following HTTP GET, again to the www.apple.com IP:

GET / HTTP/1.1
Host: www.apple.com
User-Agent: Mozilla/5.0 (iPhone; CPU iPhone OS 5_0_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Mobile/9A405
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us
Accept-Encoding: gzip, d

and present the response as the splash page.

It seems that GET requests are also sent to "/library/test/success.html" after the GET for "/", and if one of those succeeds, the splash page disappears as soon as it has appeared.

The solution seems to be to wait for a request for e.g. "http://sudo.mesh/drop_the_splash_page", remove the redirect for that source IP for some number of hours or days, and have a button on the splash page that links to that URL, as sketched below. The "button" could be the entire splash page, but that might be annoying if you're trying to scroll and accidentally click.
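
A minimal sketch of the per-client bypass, assuming ipset is available on the exit nodes (the set name splash_ok, the 24-hour timeout, and the example client IP are made up for illustration):

# a set of whitelisted clients whose entries expire automatically
ipset create splash_ok hash:ip timeout 86400

# exempt whitelisted clients from the probe redirect
# (-I puts this ahead of the REDIRECT rule described under Solutions)
iptables -t nat -I PREROUTING -i bat0 -p tcp -d 184.85.61.15 --dport 80 \
  -m set --match-set splash_ok src -j ACCEPT

# run by the splash web app when a client fetches /drop_the_splash_page
ipset add splash_ok 10.5.6.7 timeout 86400

The per-entry timeout takes care of flushing old entries, so no separate cleanup job is needed.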

Windows

The captive portal detection is called NCSI (Network Connectivity Status Indicator). It works like so:

  • A DNS lookup of www.msftncsi.com followed by a GET request to the resulting IP with URL http://www.msftncsi.com/ncsi.txt. This file is expected to contain only the text "Microsoft NCSI" (no quotes).
  • A DNS lookup of dns.msftncsi.com. If the DNS lookup does not result in the IP 131.107.255.255, the internet connection is assumed to be non-functioning.

So: to get a splash page displayed, the initial request to http://www.msftncsi.com/ncsi.txt should not return the expected text (unknown if blocking the connection outright is good enough), but the DNS lookup of dns.msftncsi.com should still resolve to 131.107.255.255 so that the connection is not flagged as non-functioning.
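
Both checks are easy to emulate from a mesh client when testing an implementation:

# the probe file; to trigger the portal UI, this should NOT return "Microsoft NCSI"
curl -s http://www.msftncsi.com/ncsi.txt

# the DNS check; this should still resolve to 131.107.255.255
nslookup dns.msftncsi.com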

More info here.

Solutions

The filtering should happen at the exit nodes (the servers through which traffic flows between the mesh and the internet). This means that we are not limited by the processing power of the routers.

Current in-progress solution for iOS

The exit nodes run dnsmasq as a caching DNS server. They have an entry for www.apple.com in their /etc/hosts file:

184.85.61.15    www.apple.com

This is to ensure that the IP for www.apple.com is always the same for the entire network and is always known. Hardcoding the IP is not a good solution, though. Instead, the configuration that relies on the IP should be updated every time the IP for www.apple.com changes (see the script sketched under "ip-based optimization" below).

Apple is using Akamai and has many addresses. Moreover, might multiple different companies share the same IP? (Mitar)
Since we have caching DNS servers on the mesh exit nodes, everyone connected to the mesh will see the same IP for www.apple.com. If someone sets a different DNS server, they will simply not see the splash page if they get a different IP for www.apple.com. We are not blocking the IP or even redirecting all of the traffic for the IP. We are simply redirecting port 80 for the IP through a squid proxy and matching on the host name and URL. The content will be delivered normally unless the URL and host name match a captive portal detection probe. The only way this could be a problem is if something other than an HTTP server is listening on port 80 on www.apple.com, which is not likely to happen in the near future. (Juul (talk))
What about IPv6? (Mitar)
The IPv6 solution is almost identical. (Juul (talk))

An iptables rule redirects all port 80 traffic for the www.apple.com IP to a different port:

iptables -t nat -A PREROUTING -i bat0 -p tcp -d 184.85.61.15 --dport 80 -j REDIRECT --to-port 3128

The squid proxy runs on port 3128 and is configured to run a program called rewrite.pl that sends alternate responses to specific GET requests.

Why not use internetisdownredirect to redirect? (Mitar)
We want to run this on the exit nodes, not on the mesh nodes. (Juul (talk))

Squid 3.1 configuration:

acl mesh src 10.0.0.0/8
acl manager proto cache_object
acl localhost src 127.0.0.1/32 ::1
acl to_localhost dst 127.0.0.0/8 0.0.0.0/32 ::1
acl Safe_ports port 80 
acl CONNECT method CONNECT
# Only allow cachemgr access from localhost
http_access allow manager localhost
http_access deny manager
http_access deny !Safe_ports
http_access allow localhost
http_access allow mesh
http_access deny all
http_port 3128 transparent
coredump_dir /var/spool/squid3
# program to run to re-write urls of incoming requests
url_rewrite_program /etc/squid3/rewrite.pl
# The number of redirector processes to spawn
url_rewrite_children 10
# Bypass rewrite if all rewrite processes are busy
url_rewrite_bypass on
# This is almost certainly not needed
refresh_pattern ^ftp:           1440    20%     10080
refresh_pattern ^gopher:        1440    0%      1440
refresh_pattern -i (/cgi-bin/|\?) 0     0%      0
refresh_pattern .               0       20%     4320

We should see if we can use the squid url_rewrite_access directive to ensure that the rewrite.pl program is only run for the specific queries that need rewriting; a possible configuration is sketched below.
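
Something like the following should work (untested; the acl name and the regex are our own):

# hand only the Apple probe URLs to the rewriter; everything else skips it
acl apple_probe url_regex ^http://www\.apple\.com/(library/test/success\.html|$)
url_rewrite_access allow apple_probe
url_rewrite_access deny all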

The rewrite.pl program simply answers the captive portal probe queries with replies from a local Apache server. Here is the code of /etc/squid3/rewrite.pl:

#!/usr/bin/perl
# Squid (pre-3.4) url_rewrite_program helper. Squid feeds one request per
# line ("URL client_ip/fqdn user method ..."); we print back either the
# original URL (no change) or a replacement URL to rewrite the request to.
use strict;
use warnings;

my $splash_response = "http://localhost/splash.html\n";

$| = 1;    # unbuffered output so squid gets each reply immediately
while (<>) {
    chomp;
    my ($url) = split;
    if ($url =~ /^http:\/\/www\.apple\.com\/library\/test\/success\.html/) {
        # the iOS/Mac OS captive portal probe
        print $splash_response;
    } elsif ($url =~ /^http:\/\/www\.apple\.com\/$/) {
        # the follow-up request that iOS displays as the splash page
        print $splash_response;
    } else {
        # everything else passes through unmodified
        print $url . "\n";
    }
}

For squid versions 3.4 and above, the program should look like this instead (reference: v3.1 docs and v3.4 docs):

#!/usr/bin/perl
# Same helper using the squid 3.4+ response protocol: "OK" plus a
# rewrite-url kv-pair requests a rewrite, and "ERR" means "no change"
# (it does not signal an error).
use strict;
use warnings;

my $splash_response = "OK rewrite-url=http://localhost/splash.html\n";

$| = 1;    # unbuffered output so squid gets each reply immediately
while (<>) {
    chomp;
    my ($url) = split;
    if ($url =~ /^http:\/\/www\.apple\.com\/library\/test\/success\.html/) {
        print $splash_response;
    } elsif ($url =~ /^http:\/\/www\.apple\.com\/$/) {
        print $splash_response;
    } else {
        print "ERR\n";
    }
}

An Apache 2 server with the standard configuration is running, and /var/www/splash.html contains the following:

<!DOCTYPE html>
<html lang="en">
    <head>
        <meta charset="utf-8">
        <title>peopleswifi.org</title>
    </head>
    <body>
        <h1>Welcome to People's Wifi</h1>
        <p>
          Click anywhere to continue!
        </p>
    </body>
</html>

The last missing steps are to improve rewrite.pl to add firewall rules that bypass the squid proxy for a source IP after the user clicks past the splash screen; these rules should be flushed after some period of time (the ipset timeout in the sketch above would handle this). Also, the matching for http://www.apple.com/ should only be activated for an IP for a few minutes immediately after a request to http://www.apple.com/library/test/success.html, so that requests to http://www.apple.com from non-captive-portal-detecting devices are not redirected.

One concern is: what happens when the client roams to another mesh node and then stays there until their DHCP lease expires? They may get a new IP if batman-adv decides that another gateway is closer/better. If the client gets a new IP, will it try the captive portal detection again?

Won't clients have global IPs for the whole mesh? (Mitar)
The clients will get an IP that is on the mesh subnet and will be able to communicate with the entire mesh and the internet. They will get different IPs depending on which DHCP server / internet gateway is "closer" to them when their lease is up. This is how it's done in batman-adv. Juul (talk)
This is done in batman-adv if you have a different topology. If you have VPN tunnels, you might do it differently. The issue is that you might not want to have multiple gateways in the network anyway. Mitar (talk) 15:42, 3 September 2013 (PDT)
Yes we could run a central DHCP server, but that would make the mesh less resilient to segmentation. We are trying to make something that has as little centralization as possible. I'm interested in why we would not want multiple gateways though? (46.4.202.3 20:37, 29 October 2013 (PDT))

Proxy

A proxy such as Polipo or Squid could be used.

iptables layer 7

Layer 7 filtering allows regular-expression matching against the beginning of the packet data. A simpler variant using the stock iptables string match is sketched below.
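
Note that payload matching cannot drive a NAT REDIRECT directly: the NAT decision is made on the connection's first packet (the SYN), which carries no payload, while the GET line only arrives later. This is one reason the proxy approach above is attractive. A sketch using the stock iptables string match to simply drop iOS probe requests instead (whether a timed-out probe triggers the portal UI is untested):

# drop forwarded packets whose payload contains the iOS probe request line
iptables -A FORWARD -i bat0 -p tcp --dport 80 \
  -m string --algo bm --string "GET /library/test/success.html" \
  -j DROP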

ip-based optimization

One of the problems with a proxy and with layer 7 filtering is that it's slow. It would be nice if we could filter only the traffic going to the servers used for captive portal detection (using a proxy or layer 7 rules). Unfortunately, these servers have not one IP address but a range of them (at least for www.apple.com). However, a DNS request from an exit node and from any user on the mesh going through that same exit node should get the same response. Thus, we should be able to have a script (sketched after this list) that:

  1. Periodically looks up the IP for the different servers
  2. Adds the result as an entry in /etc/hosts
  3. Updates iptables rules to direct traffic to those IP addresses through the proxy or layer 7 rules.
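
A rough sketch of such a script, assuming dnsmasq on the exit node and the squid redirect rule from the Solutions section (the upstream resolver, state file path, and interface name are examples, not settled choices):

#!/bin/sh
# refresh the pinned IP for www.apple.com and keep the NAT redirect in sync
HOST=www.apple.com
PORT=3128
STATE=/var/run/captive-portal-ip

# ask an upstream resolver directly; our own dnsmasq would just
# hand back the pinned /etc/hosts entry
new_ip=$(dig +short "$HOST" A @8.8.8.8 | grep -E '^[0-9.]+$' | head -n 1)
[ -n "$new_ip" ] || exit 1

old_ip=$(cat "$STATE" 2>/dev/null)
[ "$new_ip" = "$old_ip" ] && exit 0

# replace the pinned hosts entry and tell dnsmasq to re-read /etc/hosts
sed -i "/[[:space:]]$HOST\$/d" /etc/hosts
echo "$new_ip    $HOST" >> /etc/hosts
kill -HUP "$(pidof dnsmasq)"

# swap the squid redirect over to the new IP
[ -n "$old_ip" ] && iptables -t nat -D PREROUTING -i bat0 -p tcp \
  -d "$old_ip" --dport 80 -j REDIRECT --to-port "$PORT"
iptables -t nat -A PREROUTING -i bat0 -p tcp \
  -d "$new_ip" --dport 80 -j REDIRECT --to-port "$PORT"

echo "$new_ip" > "$STATE"

Run from cron every few minutes, this keeps the pinned IP current; a caching DNS server with a change hook, as described below, would be the cleaner long-term solution.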

It may be that we could run an actual caching DNS server, but that DNS server would need a hook to be called every time certain entries change.