Reliable Ordered Messages

derefr · on Aug 13, 2016

> Many people will tell you that implementing your own reliable message system on top of UDP is foolish.

It is foolish: both SCTP (over UDP) and RTP+RTCP (over UDP) already exist, and in combination do exactly what the author wants. There are more things in this world than TCP and "your own custom protocol."

It's almost like the 'network engineers' in gaming have never looked at the network architecture of a telecom system. Same problems: well-known standard solutions.

(There is one case where inventing your own transport protocol instead of reusing the standard ones makes sense: when every higher-level protocol but one is blocked, forcing you to tunnel over that one protocol. Thus WebRTC. But now that's a standard too, so don't reinvent that either!)

felixgallo · on Aug 13, 2016

Oh hi! I'm a network engineer! In gaming! I happen to routinely use 'the network architecture of a telecom system' via erlang so I may be qualified to answer your post.

RFC 6951 style SCTP-over-UDP is not super widely implemented, so you end up having to implement it yourself. When you're doing so, you pretty quickly realize that it's significantly more heavyweight than anything you actually need for a game (e.g. https://tools.ietf.org/html/rfc4960#section-5.1.6), while at the same time missing several key facilities, such as knowing instantly, with every packet received, which of the previous (e.g.) 32 packets have also been received, and so forth.

RTP has the same problem. There are fields included with every packet (SSRC, CSRC, others) which are irrelevant to gaming. There's a bunch of libraries, most of which are total junk. There are end-to-end compatibility problems owing to different interpretations of the spec.

This turns out to be a gigantic problem, because gaming network engineers use all sorts of ridiculous tricks to try to bum every last byte out of their protocols. For an entertaining afternoon check out http://gafferongames.com/2015/03/14/the-networked-physics-da....

A much more relevant criticism might be, well why don't you use, e.g. protobuf rather than invent your own framing system? Because the case of protobuf -- where you want, essentially, to transmit a purpose designed struct directly over the wire with no bullshit -- is very much like the case of a twitch game network protocol and comes pretty close to hand-bummed byte sizes; close enough that the benefits of using something a bit more extensible and introspectable might be worth it.

Anyway, in short, you don't know what you're talking about and Glenn is a national treasure.

vvanders · on Aug 13, 2016

Agreed with everything above, one of the things I love about games is the really unique problems they tend to present.

FWIW if you're looking at protobuf then take a peek at flatbuffers[1]. We found that protobuf's allocation was hurting perf and flatbuffers fit the same role with much better performance.

[1] https://google.github.io/flatbuffers/

the_angry_angel · on Aug 13, 2016

> why don't you use, e.g. protobuf

protobuf sounds attractive, but (unless I've missed something in protobuf's implementation) out of the box you have 2 issues:

1. Protobuf doesn't/can't deal with fragmentation - so you'd have to ensure your protobufs are small enough to fit inside a UDP packet

2. Without a wrapper you couldn't put multiple small protobufs into a single UDP packet?

Both things are fixable by wrapping protobuf, but I suppose if you're trying to get something as small as possible, can you actually just do better by avoiding protobuf in the first place (I guess the answer is yes?)
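A minimal sketch of what such a wrapper might look like, assuming each message is an opaque byte string (for instance an already-serialized protobuf). The names `pack_messages` and `unpack_messages`, the 2-byte length prefix, and the 1200-byte payload budget are all illustrative assumptions, not anything protobuf itself provides, and the sketch refuses oversized messages rather than fragmenting them.

```python
import struct

MAX_PAYLOAD = 1200  # assumed budget, chosen to stay under common path MTUs


def pack_messages(messages, max_payload=MAX_PAYLOAD):
    """Greedily pack length-prefixed messages into datagram payloads."""
    datagrams, current = [], bytearray()
    for msg in messages:
        # 2 bytes of length prefix plus the message body must fit in one payload.
        if len(msg) + 2 > max_payload:
            raise ValueError("message too large for one datagram (no fragmentation here)")
        framed = struct.pack("!H", len(msg)) + msg
        if len(current) + len(framed) > max_payload:
            # Current payload is full: close it and start a new one.
            datagrams.append(bytes(current))
            current = bytearray()
        current += framed
    if current:
        datagrams.append(bytes(current))
    return datagrams


def unpack_messages(datagram):
    """Split one datagram payload back into the messages it carries."""
    messages, offset = [], 0
    while offset < len(datagram):
        if offset + 2 > len(datagram):
            raise ValueError("truncated length prefix")
        (length,) = struct.unpack_from("!H", datagram, offset)
        offset += 2
        if offset + length > len(datagram):
            raise ValueError("truncated message")
        messages.append(datagram[offset:offset + length])
        offset += length
    return messages
```

The cost of the wrapper in this sketch is two bytes per message, which is part of why a hand-rolled format can usually come out smaller.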

niftich · on Aug 13, 2016

I understood it as using protobuf's interface definitions as an integration point, not necessarily protobuf's exact wire format and everything below it that comes out-of-the-box (like TCP and below) in the default client.

felixgallo · on Aug 14, 2016

I meant everything up to and including protobuf's wire format, but nothing beyond that, which as you note wouldn't be super great.

mikemarcin · on Aug 15, 2016

Protobuf and most similar libraries don't support delta compression which is essential for game state updates.

felixgallo · on Aug 15, 2016

protobuf as a structure serialization/deserialization library shouldn't have the responsibility for delta calculations in the first place. Once the application decides what the delta is, it can use protobuf or anything else to encode and transmit that.

mikemarcin · on Aug 16, 2016

You want to build a full update, then for each client diff it against the last ack'd full update and send only that delta.

The application specific logic builds the full update. The rest can be done automatically by a good library.

Building a delta compressed protobuf message for a transform (using protobuf.net) looks something like: http://codepad.org/3vjxogxZ
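Independent of protobuf, the per-client bookkeeping could be sketched roughly like this. It assumes the state is a flat dict with a fixed set of fields, that sequence numbers are plain unbounded integers (no wraparound), and that `ClientView`, `make_delta` and `apply_delta` are hypothetical names for illustration; it is not the code behind the codepad link.

```python
_MISSING = object()  # sentinel so a field absent from the baseline always counts as changed


def make_delta(baseline, current):
    """Return only the fields of `current` that differ from `baseline`."""
    return {key: value for key, value in current.items()
            if baseline.get(key, _MISSING) != value}


def apply_delta(baseline, delta):
    """Rebuild the full state on the receiving side."""
    merged = dict(baseline)
    merged.update(delta)
    return merged


class ClientView:
    """Per-client record of which full update the client has acknowledged."""

    def __init__(self):
        self.acked_seq = None  # sequence of the last acked full update, if any
        self.acked = {}        # that full update's contents
        self.pending = {}      # seq -> full update sent but not yet acked

    def build_update(self, seq, full_state):
        """Return (baseline_seq, delta) to send for this full update."""
        self.pending[seq] = dict(full_state)
        return self.acked_seq, make_delta(self.acked, full_state)

    def on_ack(self, seq):
        """The client confirmed it has reconstructed update `seq`."""
        state = self.pending.get(seq)
        if state is None:
            return  # stale or unknown ack; keep the current baseline
        self.acked_seq, self.acked = seq, state
        # Older snapshots can never become the baseline again.
        self.pending = {s: st for s, st in self.pending.items() if s > seq}
```

Sending the baseline sequence with each delta lets the client pick the matching snapshot to apply it to; until an ack arrives, every delta keeps being computed against the same old baseline, so a lost packet never corrupts state.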

felixgallo · on Aug 17, 2016

Dear lord. Why not just use optional fields?

jwatte · on Aug 18, 2016

To be fair, protobuf adds field-and-type framing for each field, which a known-in-sync client/server don't need; that adds up to 10-30% over a full packet!
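As a rough illustration of where that overhead comes from, here is a hand-rolled comparison for a message of three floats. It does not use the protobuf library: `encode_tagged` mimics the protobuf wire format for 32-bit fields (a one-byte tag of `(field_number << 3) | 5` before each value), and both function names are made up for this sketch. It also assumes every field is always emitted, which is not true of proto3 for default values.

```python
import struct


def encode_tagged(x, y, z):
    """Protobuf-style encoding: each float is preceded by a field/type tag byte."""
    out = bytearray()
    for field_number, value in enumerate((x, y, z), start=1):
        tag = (field_number << 3) | 5  # wire type 5 = fixed 32-bit value
        out.append(tag)
        out += struct.pack("<f", value)  # little-endian IEEE 754 float
    return bytes(out)


def encode_raw(x, y, z):
    """What an in-sync client/server can send: just the values, in agreed order."""
    return struct.pack("<3f", x, y, z)


tagged = encode_tagged(1.0, 2.0, 3.0)
raw = encode_raw(1.0, 2.0, 3.0)
overhead = (len(tagged) - len(raw)) / len(raw)
print(len(tagged), len(raw), f"{overhead:.0%}")  # 15 12 25%
```

For fields smaller than four bytes the tag is a proportionally larger share, and hand-packed formats can additionally quantize values to fewer bits, which a generic serializer will not do for you.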

derefr · on Aug 13, 2016

Thanks for that, it was informative! I mean, I certainly don't know what I'm talking about with respect to the specific protocols; SCTP and RTP were ones off the top of my head that seemed to fit the use-case. (I have used both—from Erlang, coincidentally!—as part of real-time collaboration software before. But the "real-time" there isn't nearly the same "real-time" as twitch games have, so I never really needed to dig deep into their semantics.)

I think I got off-track of the original thing that incensed me, though: it's not that there's any perfect existing network protocol that does everything the way games need it.

Rather, my thought was based on this premise: the OSI protocol-layering model is awesome. It allows you to have a stack of very thin protocol layers (e.g. UDP itself) that just add one single thing to the semantics of the layer below them, with as little overhead as it's possible to get away with in doing that. Thin layers, because they just Do One Thing Well, are easy to engineer and QA and port, so you often find (good!) libraries for them available for every platform. Creating your own custom protocol that re-implements the feature a thin protocol layer is intended to provide, often doesn't reduce overhead at all, and in the process often results in worse and less-tested code than you'd get by writing the rest of your custom code on top of the thin layer.

So: maybe you can't find a perfect "gaming protocol." But I can't believe that the most sensible thing to do is to implement one directly on top of UDP, rather than using A and B and C orthogonal protocol layers on top of UDP to handle A and B and C features, and then only coding the stuff that's left over.

In other words: why write your own multiplexing code instead of relying on existing session-layer logic? And why, indeed, write your own network-encoding code instead of relying on existing presentation-layer logic (like, of course, Protobuf)? And on and on for eight to ten other concerns it's possible to "pull out" of any given protocol.

felixgallo · on Aug 14, 2016

Well, I'd caution you that just because you don't understand why something is sensible, it doesn't necessarily mean that it's foolish.

In the case of protocols, there really don't exist "A and B and C orthogonal protocol layers on top of UDP to handle A and B and C features." What you have instead is several different, custom ideas that all need to work together at a low level.

For example, you need to be able to identify a particular packet so that you can keep track of whether you have it or not; so you have sequence numbers. But you need to keep the packet small, so you don't use a 128 bit float, you use a 16 bit int and keep track of under/overflow yourself. You need to understand if any very recent packets that you've sent need to be reencoded and sent again, so you have an ack field that tells the recipient the high water mark number of the packet you most recently saw from them. You need to know some, but not forever, of the history, so you keep a, say, 32 bit bitfield which tells the recipient the state of the last 32 packets you've sent. You need multiplexed messages to pack as much info in a single transmission as possible, so you have length-encoded payload attachments probably.
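To make that concrete, here is a small sketch of a header with a 16-bit sequence number, a 16-bit ack and a 32-bit ack bitfield. The field layout, the bit convention (bit n refers to sequence `ack - 1 - n`) and the names `seq_greater`, `AckTracker` and `acked_sequences` are assumptions for illustration; real implementations differ in the details.

```python
import struct

SEQ_MOD = 1 << 16
HEADER = struct.Struct("!HHI")  # sequence, ack, ack_bits: 8 bytes total


def seq_greater(a, b):
    """True if 16-bit sequence a is more recent than b, allowing for wraparound."""
    return 0 < (a - b) % SEQ_MOD < SEQ_MOD // 2


class AckTracker:
    """Receiver-side record of which recent remote packets have arrived."""

    def __init__(self):
        self.latest = None  # most recent remote sequence seen (the ack value)
        self.ack_bits = 0   # bit n set => sequence (latest - 1 - n) was received

    def on_receive(self, seq):
        if self.latest is None:
            self.latest = seq
        elif seq_greater(seq, self.latest):
            shift = (seq - self.latest) % SEQ_MOD
            # The old `latest` lands on bit (shift - 1); older bits slide up.
            self.ack_bits = ((self.ack_bits << shift) | (1 << (shift - 1))) & 0xFFFFFFFF
            self.latest = seq
        else:
            behind = (self.latest - seq) % SEQ_MOD
            if 1 <= behind <= 32:  # late arrival still inside the window
                self.ack_bits |= 1 << (behind - 1)


def acked_sequences(ack, ack_bits):
    """Sender side: expand (ack, ack_bits) into the sequences it acknowledges."""
    acked = [ack]
    for n in range(32):
        if ack_bits & (1 << n):
            acked.append((ack - 1 - n) % SEQ_MOD)
    return acked


tracker = AckTracker()
for seq in (65534, 65535, 1, 2):  # packet 0 was lost across the wraparound
    tracker.on_receive(seq)
header = HEADER.pack(7, tracker.latest, tracker.ack_bits)  # 7 = our own sequence
print(acked_sequences(tracker.latest, tracker.ack_bits))   # [2, 1, 65535, 65534]
```

Because every outgoing packet carries this header, a single lost ack costs nothing: the next packet repeats the same information, and the sender can decide for itself what to re-encode and send again.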

And all of these lessons were learned in blood. Like, hundreds of games have shipped and died to teach us these things. Many of them taught the bitter lessons of overabstraction and excessive separation of concerns. People shipped competitive fighting games with TCP. People shipped client-authoritative games where real money was involved. I mean, the battleground is littered with people who fucked up.

Glenn is one of those guys who has seen everything -- including some of the AAAest of the AAA games that you've ever even heard of -- and he's sharing some wisdom with you. Instead of first trying to question what he is doing, and attacking it for being foolish, why don't you take a chair, sit down, and try to figure out how he got where he did? You will learn a great deal and you will become a radically better engineer.

renox · on Aug 18, 2016

a 128 bit float --> a 128 bit integer. Probably a typo: using a float to send a sequence number would be a very bad idea.

vvanders · on Aug 13, 2016

And it's almost like network engineers haven't looked at what gaming does.

> it is message-oriented like UDP and ensures reliable, in-sequence transport of messages with congestion control

Emphasis on the sequencing. Usually in games we send multiple channels over a single connection and head-of-line blocking will wreak havoc on your simulation.

In addition you may want different (or no) congestion control, which can have a significant impact on the realtime nature of your packets (I've looked at WebRTC, it doesn't make any of the guarantees that you'd need in games).

derefr · on Aug 13, 2016

What are you quoting? I said in combination:

• SCTP or RTCP for data that needs to arrive (both are reliable, multiplexed connection-oriented protocols, like HTTP2, where packets are demultiplexed into ordered sequences at the receiver, but are not forced into a global linearized order on the wire)

• RTP for data that can be dropped. (Using RTP's realtime-video streaming dataflow semantics, under which congestion-control is handled via the client's transport layer informing the server's application layer of a reliability rate, and the server being able to do things like send more/fewer packets, or just change the data being sent to an entirely different stream, in response.)

vvanders · on Aug 13, 2016

I quoted the SCTP wiki page. If you're using them in combination, that means either multiple ports or another protocol so you know what type of frame you're getting, which adds overhead.

felixgallo already covered most of my points, gaming has some specific requirements that a lot of these protocols don't provide.

toomim · on Aug 13, 2016

> It is foolish: both SCTP (over UDP) and RTP+RTCP (over UDP) already exist, and in combination do exactly what the author wants.

Can you explain this? What parts of "what the author wants" are implemented by which features in these protocols?

CamperBob2 · on Aug 13, 2016

> It's almost like the 'network engineers' in gaming have never looked at the network architecture of a telecom system. Same problems: well-known standard solutions.

Not even remotely.

The biggest difference is that in commercial telecom, there's no such thing as obsolete data that can be safely dropped. If an audio stream packet doesn't arrive in time, the user will perceive a glitch, which the codec may or may not be able to cover up psychoacoustically. So these users try hard to prioritize reliable delivery whenever possible. Telecom's mission is to emulate a circuit-switched network with packet switching, which, while necessary, is never going to be the right strategy in the general case.

In game networking, on the other hand, a 200-millisecond-old packet is like a week-old newspaper. If it doesn't arrive in a reasonable amount of time, there's no upside to delivering it at all. The client is better off waiting for the next update from the server, relying on local prediction in the meantime. There's an excellent chance that the user will never notice the dropped packet(s).
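A toy version of that receive path might look like the following. It assumes each snapshot carries a monotonically increasing server tick (a plain integer, no wraparound) plus a position and velocity for one object; `SnapshotReceiver` and its fields are hypothetical names, and the prediction is simple linear extrapolation rather than anything a real engine would ship.

```python
class SnapshotReceiver:
    """Keeps only the newest server snapshot and extrapolates between updates."""

    def __init__(self):
        self.tick = -1    # tick of the snapshot currently applied
        self.pos = 0.0
        self.vel = 0.0
        self.dropped = 0  # count of stale snapshots that were ignored

    def on_snapshot(self, tick, pos, vel):
        """Apply a snapshot only if it is newer than the one already applied."""
        if tick <= self.tick:
            self.dropped += 1  # late or duplicate: yesterday's news, discard it
            return False
        self.tick, self.pos, self.vel = tick, pos, vel
        return True

    def predict(self, tick, seconds_per_tick):
        """Estimate the position at `tick` from the last applied snapshot."""
        elapsed = (tick - self.tick) * seconds_per_tick
        return self.pos + self.vel * elapsed


receiver = SnapshotReceiver()
receiver.on_snapshot(10, pos=5.0, vel=2.0)
receiver.on_snapshot(9, pos=4.0, vel=2.0)  # arrives late, ignored
print(receiver.predict(13, 0.5))           # 8.0
```

Nothing here ever asks for a retransmission: a missing tick is simply papered over by prediction until a newer snapshot replaces it.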

This fact by itself means that any application-agnostic attempt to implement reliable delivery or packet sequencing is a waste of time from a game developer's perspective. Streaming protocols developed for video and other media may be a better match, but they still won't be ideal for a use case involving interactivity and client-side prediction. (For one thing, any media streaming protocol is likely to be more concerned with bandwidth over latency, and games live and die over the latter.)

People have been trying to come up with "one transport protocol to rule them all" since TCP/IP, and have always failed spectacularly. There's always going to be room for interesting work in this space, and the notion that an entire class of users should "just use XYZ/IP" or whatever is always going to be as misguided as the early recommendations to ignore UDP in favor of TCP were.

derefr · on Aug 13, 2016

You attacked a strawman (a "telecom" engineering that hasn't been taking the concerns of realtime video into account in every telecom protocol for the last 20 years). Video has exactly the semantics you're talking about, and video-typed flows in RTP et al are interactive channels (originally, to enable adaptive streaming) in exactly the way games need for their data.

(The sending of extra information to enable client-side prediction, on the other hand, is something that can be added without altering the transport layer at all. Layering!)

Bytes on the wire are bytes on the wire. It doesn't matter what vertical you're working in; it just matters how the bytes on the wire need to behave. People should forget about the fact that something is "a game" and just look for wire-protocol structural isomorphisms before assuming their case is so unique.

CamperBob2 · on Aug 13, 2016

Again, nobody in gamedev cares about "ordered." Knowing what you missed can still be important, of course, but in-order delivery is simply not a helpful role for the transport layer to take on. It has to be handled at a higher level.

There is nothing that XYZ/IP from RFC xxxx can do to make everyone happy in this space, and there's no sense in trying.

derefr · on Aug 13, 2016

I didn't use the word "ordered" anywhere. Neither RTP nor SCTP require you to do anything in order; they're completely unordered at the level of flows, with no flow head-of-line-blocking another flow. Spread things horizontally so you've got one flow per concurrent consumer in the client (e.g. per zone, or per other player, or whatever else) so that messages concerning events that happen in parallel aren't queuing into a single flow. This is already the idiomatic thing to do if you're working with an actor-based engine; each actor gets its own flow.

> there's no sense in trying

Do you mean you see no advantages in trying? Like, say, allowing people to easily program bots for your Second-Life-alike using well-known open protocols? Or allowing partners to write their own game clients for your Minecraft-alike server, rather than porting yours?

Just because you can't make every game ever work perfectly without writing custom code once or twice, doesn't mean that every game requires custom code. Most games in a given genre are, from a networking perspective, the same game. If you don't think telecom protocols work, put out some gaming transport-protocol RFCs, like BitTorrent did with µTP!

jwatte · on Aug 18, 2016

Also, in games, most "messages" are a few bytes, and you pack dozens or hundreds into a single UDP packet. Each of the messages will have application specific ordering/reliability/latency trade offs. No existing IETF or ITU or other protocol comes close to the efficiency gotten by designing the stack correctly for the application.

But, why would games go through the trouble to do an RFC? It's not like you're going to have DOOM talk to Overwatch. And, while the router will see those two games as the same, the internals of the packets reflect the internals of each game. Hence, similarity ends approximately at the UDP level -- which is what the internet cares about, anyway!

(In fact, I even tried to get a standard for the most basic cross have entity transfer 10 years ago, with zero uptake.)

CamperBob2 · on Aug 13, 2016

> I didn't use the word "ordered" anywhere.

I seem to recall it being in there before you edited your post, but that's a tough thing to prove and a weak thing to base an argument on, so, OK. Either way, I don't have time to engage a moving target.

jephir · on Aug 14, 2016

There's a lot of talk here about TCP vs UDP vs SCTP vs x protocol.

It's important to keep in mind that this is all optimization. Don't lose sight of the big picture.

On one RTS we ended up just serializing world state over a TCP socket. If you can get away with the brute force approach then just do it. Don't optimize until you have a measurable performance problem.

dividuum · on Aug 13, 2016

Nice blog post. It reminded me a bit about how QuakeWorld did networking over UDP back in the days. There is a post about it here: http://fabiensanglard.net/quakeSource/quakeSourceNetWork.php. I remember implementing that myself for a networked version of jump'n'bump which never got anywhere :-)

jhasse · on Aug 15, 2016

Jump'n'bump rocks!

caseymarquis · on Aug 14, 2016

Glad to read anything I can regarding networking applications in practice.

Transmitting between embedded devices and PCs, I've been stuck with either Tcp or Udp; no other protocols are typically implemented on the embedded side. I've found sending reliable ordered messages requires implementing something on top of either protocol. Udp is great for concrete messages, but doesn't give you ordered reliability. Tcp has the opposite problem: the streaming nature means you need to wrap messages and occasionally confirm proper receipt of a message to figure out when old messages can be tossed. The issue with Tcp is that when the connection breaks down, you don't in practice know exactly where it broke, whether the other side received several completed messages and acted on them, or whether you need to resend.
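For the Tcp side, the wrapping can be sketched like this. The frame layout (a 4-byte message id and a 2-byte length) and the names `encode_frame`, `FrameDecoder` and `Outbox` are assumptions made for the example, and the application-level ack is cumulative: acking id N means everything up to N was received and acted on.

```python
import struct

FRAME = struct.Struct("!IH")  # message id, payload length (payloads up to 65535 bytes)


def encode_frame(msg_id, payload):
    return FRAME.pack(msg_id, len(payload)) + payload


class FrameDecoder:
    """Reassembles whole messages from arbitrarily chunked stream reads."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, data):
        """Add bytes from recv(); return the complete (id, payload) frames so far."""
        self._buf += data
        frames = []
        while len(self._buf) >= FRAME.size:
            msg_id, length = FRAME.unpack_from(self._buf)
            end = FRAME.size + length
            if len(self._buf) < end:
                break  # payload not fully received yet
            frames.append((msg_id, bytes(self._buf[FRAME.size:end])))
            del self._buf[:end]
        return frames


class Outbox:
    """Sender side: keeps messages until the peer confirms them."""

    def __init__(self):
        self._next_id = 0
        self._unacked = {}  # msg_id -> payload

    def send(self, payload):
        msg_id = self._next_id
        self._next_id += 1
        self._unacked[msg_id] = payload
        return encode_frame(msg_id, payload)

    def on_ack(self, msg_id):
        """Peer confirmed everything up to and including msg_id."""
        for old in [i for i in self._unacked if i <= msg_id]:
            del self._unacked[old]

    def frames_to_resend(self):
        """After a reconnect, replay whatever was never confirmed, in order."""
        return [encode_frame(i, p) for i, p in sorted(self._unacked.items())]
```

After a broken connection the sender replays `frames_to_resend()` on the new socket, and the receiver uses the message ids to skip anything it had already acted on before the break.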

I'd initially assumed Tcp would cover all my needs. Didn't take much initial research to figure out that wasn't true.