Monday 4 January 2016

Dealing with Ultra High Packet Loss

"The Brown Fox was quick, even in the face of obstacles"
  - Ian Munsie, 2016

Over the last couple of weeks my Internet connection has developed a fault which is resulting in rather high packet loss. Even a simple ping test to 8.8.8.8 shows up to about 26% packet loss! Think about that - roughly 1 in every 4 packets might get dropped. A technician from my ISP visited last week and a Telstra technician (the company responsible for the copper phone lines) is coming this Friday to hopefully sort it out, but in the meantime I'm stuck with this lossy link.

Trying to use the Internet with such high packet loss really reveals just how poorly TCP is designed for this situation. See, TCP is designed around the assumption that a lost packet means the link is congested and that it should slow down. But that is not the case for my link - a packet has a high chance of being dropped even when there is no congestion whatsoever.

That leads to a situation where anything using TCP will slow down after only a few packets have been sent and at least one has been lost, then a few packets later it will slow down again, and then again, and again, and again... While my connection should be able to maintain 300KB/s (theoretically more, but that's a ball park figure it has achieved in practice in the past), right now I'm getting closer to 3KB/s, and some connections just hang indefinitely (it's hit or miss whether I can even finish a speedtest.net run). Interactive traffic is also affected, but fares slightly better - an HTML page probably only needs one packet for the HTML itself, so there's roughly a 3 in 4 chance it will load on the first attempt... but every javascript or css file it links to also only has a 3 in 4 chance of loading, and every image has a lower chance still (since they are larger and will take several packets or more) - some of those will be retried, but some will ultimately give up when several retries are also lost.
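To put some rough numbers on that, here's a quick back-of-the-envelope calculation (assuming losses are independent and roughly 1400 bytes of payload fit in each packet - not exactly how TCP segments data, but close enough to show why larger objects fare so much worse):

# Chance that every packet of an object survives a link with 26% per-packet loss.
# Assumes independent losses and ~1400 bytes of payload per packet.
loss = 0.26
for size_kb in (1, 15, 150):
    packets = max(1, (size_kb * 1024) // 1400)
    p_intact = (1 - loss) ** packets
    print("%4d KB (~%3d packets): %5.1f%% chance of arriving with no resends"
          % (size_kb, packets, p_intact * 100))

A 1KB page has about a 74% chance of getting through untouched, a 15KB script is already down around 5%, and a 150KB image has essentially no chance of arriving without retransmissions.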

Now, TCP is a great protocol - the majority of the Internet runs on it and things generally work pretty well, but it's just not designed for this situation (and there are a couple of other situations it is not suited to either, such as very high latency links - it will not be a good choice as we advance further into space, for example). The advent of WiFi led to some improvements in congestion avoidance algorithms and tunables so that TCP doesn't immediately assume that packet loss means congestion, but even then it can only tolerate a very small amount of packet loss before performance starts to suffer - and experimenting with different algorithms and tunables made no appreciable difference to my situation whatsoever.
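For the curious, this is the sort of experimentation I mean - on Linux the congestion avoidance algorithm can be swapped at runtime through proc. A minimal sketch (assuming root; which algorithms are actually available depends on your kernel and loaded modules):

# List the congestion avoidance algorithms the running kernel offers,
# then switch to one of them (needs root; Westwood+, for example, was
# designed with lossy wireless links in mind).
with open('/proc/sys/net/ipv4/tcp_available_congestion_control') as f:
    print(f.read().split())
with open('/proc/sys/net/ipv4/tcp_congestion_control', 'w') as f:
    f.write('westwood')

As I said, though, none of them made an appreciable difference on this link.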

So, I started thinking - what we need is a protocol that does not take packet loss to mean congestion. This protocol would instead base its estimate of the available bandwidth on how much data is actively being received, and more to the point - how that changes as the sender varies how much data is being transmitted.

So, for instance, suppose it starts transmitting at (let's pick an arbitrary number) 100KB/s and the receiver reports back that it is receiving 75KB/s (25% packet loss). At this point TCP would go "oh shit, congestion - slow down!", but our theoretical protocol would instead try sending 125KB/s to see what happens - if the receiver reports that it is now receiving 100KB/s then the sender knows it has not yet hit the bandwidth limit and the discrepancy is just down to packet loss. It could then increase to 200KB/s, then 300KB/s, until it finally finds the point where the receiver is no longer able to receive any more data.

It could also try reducing the send rate - if there is no change in the amount being received then it knows it was sending too fast for no good reason, while if the received rate drops with it then it knows the original rate was ok. The results would of course need to be smoothed out to cope with real world fluctuations, and the algorithm would have to periodically repeat this experiment to cope with changes in actual congestion, but with some tuning the result should be quite a bit better than what we can achieve with TCP in this situation (at least for longer downloads over relatively low latency links that can respond to changes in bandwidth faster - this would still not be a good choice for space exploration).
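A very rough sketch of what the upward half of that probing might look like (purely illustrative - measure_receive_rate is a stand-in for whatever mechanism the receiver uses to report its smoothed receive rate back to the sender):

# Illustrative sketch: estimate usable bandwidth by checking whether the
# receive rate keeps following the send rate, instead of treating any
# loss as congestion the way TCP does.
def estimate_bandwidth(measure_receive_rate, start_rate=100.0, step=1.25):
    send_rate = start_rate
    prev_received = measure_receive_rate(send_rate)
    while True:
        send_rate *= step                            # try sending a bit faster
        received = measure_receive_rate(send_rate)   # smoothed KB/s reported by the receiver
        if received > prev_received * 1.05:
            # The receiver saw more data when we sent more, so the gap was
            # just random loss, not congestion - keep probing upwards.
            prev_received = received
        else:
            # Sending faster didn't deliver any more data - that's roughly
            # how much the link can actually carry.
            return prev_received

The downward experiment is just the mirror image, and as mentioned the measurements would need smoothing and the probe would need to be repeated periodically.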

This protocol would need to keep track of which packets have been transmitted but not yet acknowledged, and just resend them after a while. It should not stall waiting for acknowledgements either - if it has other packets that haven't been sent yet it can just send those and resend the unacknowledged ones a little later, or if only a few packets remain it can just opportunistically resend them until they are acknowledged. It would also want to be a little smart about how acknowledgements themselves are handled - in this situation an acknowledgement has just as much chance of being lost as a data packet, and each lost acknowledgement means the packets it was trying to acknowledge will be resent unnecessarily. But we can make the acknowledgements redundant and acknowledge each packet several times, giving the sender the best chance of seeing at least one acknowledgement before it tries to resend the packet.
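The bookkeeping for that doesn't need to be complicated - something along these lines (again purely illustrative, just to show the idea):

# Illustrative sketch of the bookkeeping described above. The sender
# remembers every chunk sent but not yet acknowledged; the receiver
# repeats each acknowledgement in a few consecutive ACK packets so that
# a single lost ACK doesn't cause an unnecessary resend.
import time

class SenderState(object):
    def __init__(self):
        self.unacked = {}                 # sequence number -> time last sent

    def record_send(self, seq):
        self.unacked[seq] = time.time()

    def process_ack(self, acked_seqs):
        for seq in acked_seqs:
            self.unacked.pop(seq, None)   # may already be gone - ACKs are redundant

class ReceiverState(object):
    REDUNDANCY = 3                        # repeat each ack in this many ACK packets

    def __init__(self):
        self.pending = []                 # [sequence number, repeats remaining]

    def record_receive(self, seq):
        self.pending.append([seq, self.REDUNDANCY])

    def build_ack_packet(self):
        acks = [seq for seq, _ in self.pending]
        for entry in self.pending:
            entry[1] -= 1
        self.pending = [e for e in self.pending if e[1] > 0]
        return acks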

So, I've started working on an implementation of this in Python. This is very much a first cut and should largely be considered a highly experimental proof of concept - it currently just transfers a file over UDP between two machines, has no heuristics to estimate available bandwidth (I just tell it what rate to run at), and its acknowledgement & resend system needs some work to reduce the number of unnecessary packets being resent, but given this download was previously measured in bytes per second (and Steam was estimating it would take "more than 1 year", so I had my remote server download it using steamcmd), I'd say this is a significant improvement:

Sent: 928M/2G 31% (238337 chunks, 69428 resends totalling 270M, 101 not acked) @ 294K/s
Received: 928M (238322 chunks, 5672 duplicates) @ 245K/s

The "101 not acknowledged" is just due to my hardcoding that as the maximum number of unacknowledged packets that can be pending before it starts resending - it needs to be switched to use a length of time that has elapsed since the packet was last sent compared to the latency of the network and some heuristics. With some work I should also be able to get the number of unnecessary packets being resent down quite a bit (but 5672 / 69428 is close to 10%, which is actually pretty good - this is resending 30% of the packets and the link has 26% packet loss).

Feel free to check out the code - just keep in mind that it is still highly experimental (one 3GB file I transferred earlier ended up corrupt and had to be repaired with rsync - I still need to investigate exactly what happened there) and the usage is likely to change, so I won't document how to use it (hint: it supports --help):

https://raw.githubusercontent.com/DarkStarSword/junk/master/quickfox.py