Spec for a usable IP tunneling system

2022-04-01

Filed under: JWRD, Networks, Software — Jacob Welsh @ 23:45

The tunnel network driver (aka "tun/tap") is a nifty thing, commonly available on Linux, the BSDs and perhaps other systems (albeit with sadly incompatible APIs). While tunneling was perhaps its initial motivation, it's more accurately described as a virtual network device interface that bridges the kernel network stack to a userspace program, and thus finds other applications such as connecting virtual machines or experimenting with novel low-level protocols.

My present interest in it is simply for tunneling, which means implementing private networks overlaid on top of potentially uncooperative (or even outright hostile) yet inexpensive public infrastructure.

When you aim to maintain direct control over your own information systems - as JWRD works to do for both ourselves and our clients - you encounter a kind of tradeoff. Long-distance cables may be abundant in the industrialized world, but competent, quality ISPs are scarce; the costs of access to them can be defrayed by sharing physical space at a colocation center, but this requires consigning your equipment to the owners of that space, opening yourself to interference from them, fellow tenants, or even complete strangers; further, you shoulder higher costs both in the real estate and in the overhead added to routine maintenance operations. On the other hand, keeping things on your own premises restricts your connectivity options to the mass consumer market (howsoever it may style itself "business grade"), wherein you will put up with all manner of indignities - at least until you grow to the point where you can lay or lease your own circuits across town, basically becoming a center in your own right.

So-called VPNs are widely used in the corporate world⁽ⁱ⁾; the trouble is that I haven't found any implementations that don't massively stink in one way or another. In particular:

UDP transport is required. Doing it over TCP forces all the drawbacks of that protocol (such as severe delays on packet loss and underutilization of bandwidth) onto everything transiting the tunnel, including other TCP connections with compounding interest (this is one problem with SSH tunnels, though they're a convenient hack in a pinch). Doing it at the lower IP level, as seen in GRE or IPsec ESP, fails to make it past the ubiquitous NAT and other stateful firewalls.
Stable IP addresses cannot be assumed on the client side; changes must be handled without human intervention. (GRE also fails here.)
Proper layering is required to maintain the utility of existing network stack functionality. For instance, the OSPF routing protocol can't be used with OpenVPN's "server" mode because it reinvents routing internally without the necessary multicast support; the in-kernel IPsec implementation in Linux suffers from similar artificial limitations.
Bloated, unverifiable and ever-shifting cryptography code is arguably worse than no crypto at all. In particular, the TLS protocol (as seen in OpenVPN) is absolutely toxic.
While they can never be guaranteed over a public network, throughput and latency (and the consistency thereof) do matter, and router CPU cycles can be scarce; so interpreted or otherwise garbage-collected programming languages are probably out.

Using the "tun" driver, meeting these basic requirements is actually pretty simple. Here's my spec for how a minimum viable implementation - humbly entitled "jtunnel" - would work; to be refined as needed.

Server operation

jtunnel -s [-b BIND_ADDRESS] [-p BIND_PORT] [-i SERVER_ID] PROG ARGS...

Binds a UDP socket and listens for packets from authorized clients.

Each time a new client is seen, the server opens a new "tun" device and spawns PROG ARGS... with environment variable TUN_DEV_NAME set to the dynamically assigned tun name and TUN_CLIENT_ID set to the client's identifier (hex-encoded).

IP packets entering a tun are encapsulated and sent by UDP to the corresponding client at its last seen address and port.

UDP packets received from a client are decapsulated and returned to the kernel through the corresponding tun.
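The hand-off to PROG described above can be sketched as follows. This is an illustration only, written in Python for brevity rather than as a suggestion for the real implementation language; the function names are hypothetical, and both the lowercase hex encoding and the inheritance of the parent's environment are assumptions the spec doesn't pin down.

```python
import os
import subprocess

def prog_environment(tun_name, client_id=None, base_env=None):
    """Return a copy of the environment with the variables the spec names."""
    # Assumption: PROG inherits the rest of jtunnel's own environment.
    env = dict(os.environ if base_env is None else base_env)
    env["TUN_DEV_NAME"] = tun_name
    if client_id is not None:
        # The spec passes the client identifier hex-encoded (server side only).
        env["TUN_CLIENT_ID"] = client_id.hex()
    return env

def spawn_prog(argv, tun_name, client_id=None):
    """Start PROG ARGS... without waiting for it to finish."""
    return subprocess.Popen(argv, env=prog_environment(tun_name, client_id))
```

For example, `spawn_prog(["/etc/jtunnel/up.sh"], "tun3", client_id)` would run a (hypothetical) setup script that reads both variables.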

Client operation

jtunnel [-b BIND_ADDRESS] [-p BIND_PORT] [-i CLIENT_ID] ADDRESS PORT PROG ARGS...

Binds a UDP socket, opens a "tun" device, spawns PROG ARGS... with environment variable TUN_DEV_NAME set to the tun name, then proceeds to forwarding.
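For reference, opening a "tun" device on Linux comes down to one ioctl on /dev/net/tun. The sketch below is Linux-specific (the BSD APIs differ) and uses Python only for illustration; the ioctl number shown is the one for x86 and ARM and is not the same on every architecture, and the helper names are made up here.

```python
import fcntl
import os
import struct

TUNSETIFF = 0x400454CA  # Linux value on x86/ARM; some architectures differ
IFF_TUN = 0x0001        # layer-3 (IP) device, as opposed to Ethernet "tap"
IFF_NO_PI = 0x1000      # no packet-information prefix on each read/write
IFNAMSIZ = 16

def build_ifreq(name=""):
    """Pack the part of struct ifreq that TUNSETIFF reads: name, then flags."""
    raw = name.encode("ascii")
    if len(raw) >= IFNAMSIZ:
        raise ValueError("interface name too long")
    # 16-byte name, 16-bit flags, zero padding up to 40 bytes.
    return struct.pack("16sH22x", raw, IFF_TUN | IFF_NO_PI)

def open_tun(name=""):
    """Open a tun device; an empty name lets the kernel assign tun0, tun1, ..."""
    fd = os.open("/dev/net/tun", os.O_RDWR)
    try:
        result = fcntl.ioctl(fd, TUNSETIFF, build_ifreq(name))
    except OSError:
        os.close(fd)
        raise
    assigned = result[:IFNAMSIZ].split(b"\0", 1)[0].decode("ascii")
    return fd, assigned
```

Each subsequent read on the returned descriptor yields one whole IP packet, and each write injects one, which is what makes the forwarding loops so simple. Creating the device normally requires root or CAP_NET_ADMIN.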

IP packets entering the tun are encapsulated and sent by UDP to the server at the given ADDRESS and PORT.

UDP packets received on the socket are filtered for the server's source ADDRESS and PORT, decapsulated, and possibly filtered for an authorized server identifier; then the payload is returned to the kernel through the tun.

Thus, the system behaves like a point-to-point link between the server and client tunnel devices.

Typically, PROG would be a shell script that configures IP addresses and routes for the new interface according to the peer's identity.

Both client and server determine authorized peer identifiers by checking existence of a correspondingly named file within a configuration directory. Later, these files could be extended to contain symmetric or public keys. Thus, new clients can be authorized without needing to restart/reload the server.
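A minimal sketch of that check, in Python for illustration only. It assumes the file is named after the hex-encoded identifier, which seems the natural reading of "correspondingly named" but isn't stated outright; the function name is hypothetical.

```python
import os

def is_authorized(config_dir, peer_id):
    """True if a file named after the hex-encoded peer ID exists in config_dir."""
    # Hex encoding keeps the name free of '/' and other awkward bytes.
    return os.path.isfile(os.path.join(config_dir, peer_id.hex()))
```

Because the filesystem is consulted on each lookup, adding a file authorizes a peer immediately, with no reload step.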

Protocol format

The payload of each UDP packet seen on the physical network consists of a header followed by the encapsulated payload.

Initially the header is just an 8-byte⁽ⁱⁱ⁾ field containing the randomly generated identifier of the sending party. Later, it could be extended to contain security fields (e.g. nonce and MAC), and the payload could be encrypted.
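In its initial form the framing is small enough to show whole. The following Python sketch (illustrative only; the names are invented here) assumes that datagrams shorter than the header are simply dropped, which the spec doesn't say explicitly.

```python
import os

ID_LEN = 8  # header length in bytes for the initial format

def new_identifier():
    """Generate a random identifier for a client or server."""
    return os.urandom(ID_LEN)

def encapsulate(sender_id, ip_packet):
    """Build the UDP payload: sender's identifier followed by the IP packet."""
    if len(sender_id) != ID_LEN:
        raise ValueError("identifier must be exactly %d bytes" % ID_LEN)
    return sender_id + ip_packet

def decapsulate(datagram):
    """Split a UDP payload into (sender_id, ip_packet); None if too short."""
    if len(datagram) < ID_LEN:
        return None
    return datagram[:ID_LEN], datagram[ID_LEN:]
```

Note that the 8 header bytes, plus the outer IP and UDP headers, come out of the path MTU, so the tun interface's MTU has to be set correspondingly lower than that of the physical link.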

Implementation

The client could be done using poll/select and nonblocking I/O; but since it has only two file descriptors to deal with, it could instead just split into two processes using simple blocking calls, one for each direction.
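The two-process variant might look roughly like this, sketched in Python purely to show the control flow. It assumes the UDP socket has already been connect()ed to the server's ADDRESS and PORT, so that the kernel does the source filtering; it also assumes header-only datagrams are dropped. All names are hypothetical.

```python
import os

ID_LEN = 8
MAX_PACKET = 65535  # generous read size; real packets are bounded by the MTU

def tun_to_udp_once(tun_fd, sock, my_id):
    """Outbound: read one IP packet from the tun, send it prefixed with our ID."""
    packet = os.read(tun_fd, MAX_PACKET)
    sock.send(my_id + packet)

def udp_to_tun_once(sock, tun_fd, server_id=None):
    """Inbound: receive one datagram, write its payload to the tun.

    Returns False if the datagram was dropped."""
    datagram = sock.recv(MAX_PACKET)
    if len(datagram) <= ID_LEN:
        return False  # nothing to deliver
    if server_id is not None and datagram[:ID_LEN] != server_id:
        return False  # not from the authorized server
    os.write(tun_fd, datagram[ID_LEN:])
    return True

def run_client(tun_fd, sock, my_id, server_id=None):
    """Fork into two processes, one per direction, each using blocking calls."""
    if os.fork() == 0:
        try:
            while True:
                tun_to_udp_once(tun_fd, sock, my_id)
        finally:
            os._exit(1)  # the child must never fall back into the parent's code
    while True:
        udp_to_tun_once(sock, tun_fd, server_id)
```

Neither process shares any mutable state with the other, which is what lets the split work without locking.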

The server would need to poll/select among all the tun devices as well as its UDP socket. A self-balancing tree or hash table maps client identifier to tun device file descriptor. A resizable array would suffice to map tun FD to the client's last seen address and port (since FDs are small integers).
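The two lookup structures can be sketched as below, with a Python dict standing in for the hash table and a list for the resizable array. This is an illustration of the bookkeeping only, with invented names; the poll/select loop around it is left out.

```python
class ClientTable:
    """Server-side bookkeeping: client ID -> tun FD -> last seen address."""

    def __init__(self):
        self.fd_by_id = {}    # hash table: client identifier -> tun FD
        self.addr_by_fd = []  # array indexed by tun FD -> (host, port) or None

    def add(self, client_id, tun_fd):
        """Register a newly seen client together with its freshly opened tun."""
        self.fd_by_id[client_id] = tun_fd
        if tun_fd >= len(self.addr_by_fd):
            # Grow the array so that tun_fd is a valid index.
            self.addr_by_fd.extend([None] * (tun_fd + 1 - len(self.addr_by_fd)))
        self.addr_by_fd[tun_fd] = None

    def seen(self, client_id, addr):
        """Record where a client's packet came from; return its tun FD or None."""
        fd = self.fd_by_id.get(client_id)
        if fd is not None:
            self.addr_by_fd[fd] = addr
        return fd

    def address_for(self, tun_fd):
        """Where to send packets that entered the given tun, if known."""
        return self.addr_by_fd[tun_fd] if tun_fd < len(self.addr_by_fd) else None

    def remove(self, client_id):
        """Forget a client; return the tun FD the caller should close, or None."""
        fd = self.fd_by_id.pop(client_id, None)
        if fd is not None:
            self.addr_by_fd[fd] = None
        return fd
```

Updating the address on every received packet is what lets a client change IP address or NAT mapping without any explicit reconnection.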

Alternatively, a threaded server implementation is possible; this would likely be more complex but could scale better to many clients by utilizing parallel resources. It could evolve by extending the plain poll/select loop to a thread pool, allowing a tunable level of parallelism.

Setting some socket options may be in order; perhaps pertaining to fragmentation or that REUSEADDR business.

Some form of keepalive will be needed to maintain connectivity across stateful firewalls. For starters it should suffice to keep a separate "ping" running through the tunnel: since there's no TCP-like connection state, there's no need for "reconnection"; but it might be nice to have this integrated into the client.

Some mechanism may be needed to make the server "forget" a particular client, i.e. to forcibly disconnect it by closing its tun device and removing its table entries. The whole server could simply be restarted, but that would interrupt traffic to all clients until each re-established its address (by a keepalive packet or otherwise). Perhaps a signal could trigger the server to rescan currently authorized identities from the filesystem and close out any not found.
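The rescan idea could be sketched like this, again in Python for illustration. The choice of SIGHUP, the hex file naming and all the names are assumptions; the handler only sets a flag, on the expectation that the main loop notices it and does the actual closing.

```python
import os
import signal

rescan_state = {"requested": False}

def request_rescan(signum, frame):
    """Signal handler: just note the request; the main loop does the work."""
    rescan_state["requested"] = True

def install_rescan_handler():
    # Assumption: SIGHUP is the trigger; any otherwise unused signal would do.
    signal.signal(signal.SIGHUP, request_rescan)

def revoked_clients(known_ids, config_dir):
    """Return the known client IDs that no longer have a file in config_dir."""
    present = set(os.listdir(config_dir))
    return [cid for cid in known_ids if cid.hex() not in present]
```

For each identifier returned, the server would close the tun device and drop the table entries, leaving all other clients undisturbed.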

⁽ⁱ⁾ At least they were; for all I know there aren't many left who still find value in not simply handing everything over to Amazon or worse.
⁽ⁱⁱ⁾ Long enough? Too long?

2 Comments »

I should note that security is pretty weak in the initial version proposed here. The random client ID acts as a kind of password; if long enough, it prevents brute-force attempts by third parties to impersonate a client (to gain access to the server's network or traffic intended for the client), but once sniffed, this becomes trivial. Impersonating a server (to gain access to the client's network or traffic intended for the server) is harder, requiring ability to intercept or spoof the server's return address.

Comment by Jacob Welsh — 2022-04-02 @ 05:36
[...] with full engineering support, active monitoring and backups. Bring your own braindead ISP connection. Consequently, mobile devices can play too, to the extent they're sufficiently [...]

Pingback by The Dovecot reports: how we came to forking a major email server « Fixpoint — 2023-04-06 @ 23:58
