The problem is, you're using strong language like "under any reasonable definition of 'delivery'." But everyone else is defining delivery differently than you, referring to the delivery of the message to the system itself. Your language implies everyone else is unreasonable.
When your argument depends upon everyone else being unreasonable, maybe you're the one being unreasonable.
Yes, we can make the processing that occurs in response to those delivered message(s) idempotent. But in the end, the system has to either deal with:
1. messages being delivered once or lost entirely, or
2. messages being delivered once or multiple times
You are over-explaining a way to deal with situation #2 (detect duplicates at the endpoint).
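For concreteness, here is a minimal Python sketch of those two options. Everything in it is an assumption made for illustration: the function names, the `(message_id, payload)` message shape, and the scripted "drop schedules" that stand in for an unreliable network.

```python
from typing import Iterator, List, Set, Tuple

Message = Tuple[str, str]  # (message_id, payload); the ID is an assumption of this sketch


def send_once(msg: Message, drops: Iterator[bool], inbox: List[Message]) -> None:
    """Option 1: fire and forget. The receiver sees the message 0 or 1 times."""
    if not next(drops):
        inbox.append(msg)


def send_until_acked(msg: Message, request_drops: Iterator[bool],
                     ack_drops: Iterator[bool], inbox: List[Message],
                     max_tries: int = 10) -> int:
    """Option 2: retry until acknowledged. The receiver sees it 1 or more times."""
    for attempt in range(1, max_tries + 1):
        if next(request_drops):
            continue  # the message itself was lost; the sender times out and retries
        inbox.append(msg)  # the message arrived at the receiver
        if not next(ack_drops):
            return attempt  # the ack made it back, so the sender stops
        # The ack was lost: the sender cannot tell this apart from a lost message.
    raise TimeoutError("no ack received")


def dedupe(inbox: List[Message]) -> List[Message]:
    """Endpoint-side handling for option 2: keep the first copy of each ID."""
    seen: Set[str] = set()
    unique: List[Message] = []
    for msg_id, payload in inbox:
        if msg_id not in seen:
            seen.add(msg_id)
            unique.append((msg_id, payload))
    return unique


lost_inbox: List[Message] = []
send_once(("m1", "hello"), iter([True]), lost_inbox)  # dropped in transit

retry_inbox: List[Message] = []
tries = send_until_acked(("m1", "hello"),
                         request_drops=iter([True, False, False]),
                         ack_drops=iter([True, False]),
                         inbox=retry_inbox)
```

With this schedule the fire-and-forget message never arrives, while the retried message arrives twice (once before a lost ack, once before a successful one) and `dedupe` reduces it to a single copy at the endpoint.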
Yes, that is true. But why can't I choose to view "the system itself" as the thing that is on the other side of a de-duplicator?
It feels to me like an argument over whether or not humans can fly. An unassisted human cannot fly, but with some technological augmentation, they can. It seems a bit pedantic to deny that someone can fly from LA to New York simply because they have to get into an airplane to do it.
> why can't I choose to view "the system itself" as the thing that is on the other side of a de-duplicator?
because the "de-duplicator" would either:
* be somewhere else on an unreliable network (in which case we have the same problem)
* be on the same machine (or in the same process) as "the system itself" (in which case, from a distributed systems perspective, it is the same thing)
> It seems a bit pedantic...
It is pedantic. The only reason that these "delivery" rules are popular is because of how many times programmers have gotten it wrong. Mostly by making assumptions that either:
* the network is reliable
* the message queue (or whatever) will de-duplicate messages for me
Having a clear system boundary is required for analysis.
Knowing that messages will be delivered 1+ times gives us a variety of ways we could choose to deal with this on the endpoint, with different vulnerabilities. (Getting "exactly once" processing usually requires making various kinds of resilience tradeoffs based on timing windows, storage requirements, etc).
> It seems a bit pedantic to deny that someone can fly from LA to New York
At this point I question your good faith. You're calling people out by name, and you're going full on "well, aktuallyyyy" and seeming to deliberately misunderstand other people's assertions. "People can't breathe underwater" v. "Well, once I was in a tunnel that was under a body of water, and I still breathed!(@!("
If you choose to define words differently than everyone else, you're just sabotaging your own communication to try and feel smart.
The whole refusal to accept that a field could legitimately define something differently than how you prefer, and then running off to blog about it and name names... and then coming around for round II of flamewar... with ever more splitting of hairs in definitions... is not awesome.
One reason the delivery / processing distinction exists is that very often the application needs to atomically persist "I have received this message" with any other state changes made as a result of processing that message for correctness. You can't generally solve this with a layer put on top, even on the same machine. If it's not atomic, then you can still deliver duplicates to the application or end up never delivering to the application. (Power goes out when one side has written but not the other).
So, the state change to "already received" and the changes you want to make in response to the message being received have to happen together. TCP or even a message queueing implementation with a persistence layer cannot solve this problem for you. Thus, the application needs to deal with multiple delivery.
Imagine a "subtract $5 from my bank account" message with no ID on the message itself, and a layer "on top" that gives IDs and tries to ensure exactly once delivery. If the layer "on top" does not change state at the exact time $5 is deducted from the account, bad things can happen, and in practice a separate layer cannot make its change at that exact time. Hence, the application needs to be able to cope with the "subtract $5" being delivered to it multiple times, and this deduping has to be intimately tied to it subtracting the $5 (processing).
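A rough sketch of what "intimately tied" can look like, using Python's standard `sqlite3` module. Unlike the scenario above, this assumes the sender attaches a message ID; the table names, the `handle_debit` function, and the account data are all hypothetical. The point is only that the "already received" record and the debit commit in one transaction.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with one account and an empty dedup table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.execute("CREATE TABLE processed (message_id TEXT PRIMARY KEY)")
    conn.execute("INSERT INTO accounts VALUES ('alice', 100)")
    conn.commit()
    return conn


def handle_debit(conn: sqlite3.Connection, message_id: str,
                 account: str, amount: int) -> bool:
    """Apply a debit at most once per message_id. Returns True if it was applied."""
    try:
        with conn:  # one transaction: commits on success, rolls back on an exception
            # Recording the ID and debiting the account succeed or fail together.
            conn.execute("INSERT INTO processed (message_id) VALUES (?)", (message_id,))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                         (amount, account))
        return True
    except sqlite3.IntegrityError:
        return False  # primary-key violation: this message was already processed


def balance(conn: sqlite3.Connection, account: str) -> int:
    row = conn.execute("SELECT balance FROM accounts WHERE id = ?", (account,)).fetchone()
    return row[0]


conn = make_db()
first = handle_debit(conn, "msg-1", "alice", 5)
second = handle_debit(conn, "msg-1", "alice", 5)  # the same message, redelivered
```

Here the second delivery is rejected by the primary key on `processed`, so the account is debited once. A crash before the commit leaves neither write in place, which is what makes a later redelivery safe.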
I don't see anything there that is at odds with anything I have said. All I see there is a restatement of my position.
> One reason the delivery / processing distinction exists is that very often the application needs to atomically...
Yes. Do you really think I did not already know that?
> Thus, the application needs to deal with multiple delivery.
That depends on your requirements. What does that have to do with the possibility or impossibility of exactly-once delivery?
> Imagine a "subtract $5 from my bank account" message with no ID on the message itself
I have never denied that you can invent scenarios that will fail. I explicitly said that exactly-once delivery is likely not what you want. What does that have to do with whether or not it is possible?
> Yes. Do you really think I did not already know that?
Well, if the application and this mythical higher-level thing have to do things atomically and be tightly wed, but you're insistent on calling them different entities so that you can win an internet argument that the second one is not getting duplicate "deliveries" ... then that's honestly kind of sad.
The literature has used the term "delivery" like this basically 100% of the time for the past 20 years, and the majority of the time somewhere else. You can argue that your definition makes sense to you, but when everyone else uses the term the other way it's not helpful. Anyone can choose to define words differently from everyone else and then try to lawyer it out, but it's not likely to be useful or accepted.
> you're insistent on calling them different entities so that you can win an internet argument
No, I'm insistent on calling them different entities because in actual practice they can be, and indeed usually are, different entities. De-duping is usually done in the operating system, and applications usually run in user space.
No, as we're saying repeatedly at this point-- the application itself needs to tolerate multiple delivery, because if the deduplication isn't atomic with the application's actions, incorrect behavior results. Stacking on top of TCP doesn't fix this.
> if the deduplication isn't atomic with the application's actions, incorrect behavior results
So? What do the application requirements have to do with the question of whether or not exactly-once delivery is possible? The application is a red herring. Why do you keep bringing it up?
If you want to argue that exactly-once delivery is generally undesirable, that is not in dispute. What is in dispute is whether or not it is possible, and the application requirements cannot possibly have any bearing on that.
This has been a fun thread to read, though not reflecting highly on HN. For what it's worth, I agree with you that the original adage kicking this off is kind of silly, and basically wrong. For further enjoyment I found this interesting blog post from another PhD that addresses things more comprehensively (and also basically agrees with you): https://www.mydistributed.systems/2021/10/exactly-once-deliv...
The opening lines include: "The exact definition, however, is not agreed upon in the community. As a result, there is a debate on whether EOD is possible or impossible to achieve." If nothing else, I and probably others learned today that this is apparently a debate that can quickly turn into a flamewar. And I thought flamewars were mostly dead!
Another interesting paper that came up, as I have an interest in TLA+ proofs: "LogPlayer: Fault-tolerant Exactly-once Delivery using gRPC Asynchronous Streaming" https://arxiv.org/abs/1911.11286 It seems the community has no problem doing things like proving fault-tolerant exactly-once delivery, even if such terminology isn't universally agreed on.
It is hard to make the point that "exactly-once delivery" is not a technical term without referring to it. If you think it is a technical term, would you kindly point me to a definition? I'm particularly interested in learning how "exactly-once delivery" is distinguished from "exactly-once processing".
> While exactly-once-delivery is not possible, we have a way out: Exactly-once processing. Exactly-once processing is the guarantee that even though we may receive a message multiple times, in the end, we observe the effects of a single processing operation. This can be achieved in two ways:
> Deduplication: dropping messages if they are received more than once
> Idempotent processing: applying messages more than once has precisely the same effect as applying it exactly once
(I view deduplication as a special case of idempotency).
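A small Python sketch of the two approaches in the quoted passage, under made-up assumptions: messages are `(message_id, amount)` pairs, `m1` arrives twice, and the function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Delivery = Tuple[str, int]  # (message_id, amount); shape assumed for this sketch

deliveries: List[Delivery] = [("m1", 5), ("m1", 5), ("m2", 7)]  # m1 is delivered twice


def process_naively(items: List[Delivery]) -> int:
    """No protection: a redelivered message is counted again."""
    total = 0
    for _msg_id, amount in items:
        total += amount
    return total


def process_with_dedup(items: List[Delivery]) -> int:
    """Deduplication: remember IDs and drop repeats before a non-idempotent add."""
    seen: Set[str] = set()
    total = 0
    for msg_id, amount in items:
        if msg_id in seen:
            continue  # already handled, so drop the duplicate
        seen.add(msg_id)
        total += amount
    return total


def process_idempotently(items: List[Delivery]) -> int:
    """Idempotent processing: each message writes a keyed entry, so reapplying
    the same message leaves the state unchanged."""
    ledger: Dict[str, int] = {}
    for msg_id, amount in items:
        ledger[msg_id] = amount
    return sum(ledger.values())


naive_total = process_naively(deliveries)
dedup_total = process_with_dedup(deliveries)
idempotent_total = process_idempotently(deliveries)
```

Both protected versions end up with the effect of processing each message once, while the naive version counts `m1` twice. Note that both rely on state keyed by message ID, which is one way to read the "special case" remark above.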
> A 4 year old piece laying out the exact difference...
"Exactly-once delivery guarantee is the guarantee that a message can be delivered to a recipient once, and only once."
That seems circular to me.
Also, the author's proof is flawed. The 2GP requires more than exactly-once delivery, it requires common knowledge. It is not enough for the first general to know that the message will be received, it is required that the first general knows that the message has been received, and that the second general knows that the first general knows this, and that the first general knows that the second general knows... and so on.
Delivery is the property of a message showing up at a receiver, irrespective of the receiver making state changes.
Processing is making state changes.
You can't dedupe messages without some kind of state change. Your guards, writing down that a given message has been here before, have been delivered the message. An endpoint on a lossy medium has to cope with either (0 or 1) or (1 or many) messages.
Now, can the guards deliver it to you at most once? Well, if there's no lossy medium between, sure. But we already know that we can deliver exactly once when the medium is perfectly reliable. The guards have "processed" the message for this purpose, and the fact that they can deliver it to you over a perfectly reliable channel is moot.
The distinction is between the characteristics of the channel (delivery) versus what the receiver must do to achieve appropriate processing properties.
These might come from some combination of intrinsic idempotency, timers, persisting past messages to disks, establishing a message ordering, etc, etc, etc. These are the mechanisms that you need to cope with "one or many" delivery, and they all shape the state model of your system with respect to messages.
> The guards have "processed" the message for this purpose, and the fact that they can deliver it to you over a perfectly reliable channel is moot.
No, it isn't, because the situation inside the fort is different than the situation outside. The odds of a courier being intercepted inside the fort are effectively zero. If your computational model includes a non-zero probability of failure "inside the fort" then you are no longer in the realm of distributed systems, but are now talking about fault-tolerant computing, which is a whole nuther kettle o worms.
No, one of the key ideas in distributed systems is that the network is unreliable. It’s not bad actors “inside the fort”; it’s just that messages get lost sometimes.
Internal switches can fail. There are countless reasons why packets will get lost.
No. An element of a distributed system that wants to debit from bank accounts must acknowledge the receipt of a message after making its effects durable. That system may fail between the change being durable and the acknowledgment making it back to the original sender.
That element must tolerate that debit being delivered multiple times. And we can't even solve that problem by a higher-order system providing "reliable delivery" on that element, unless that thing persists atomically with the application performing the transaction (or the application itself tolerates multiple delivery).
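To illustrate the failure window, here is a toy Python simulation of a dedup layer whose record is not atomic with the application's debit. The `Node` class, the `Crash` exception (standing in for a power failure), and both handler functions are hypothetical; real durable state would live on disk rather than in object attributes.

```python
from typing import Callable, Set


class Crash(Exception):
    """Stands in for a power failure between two non-atomic writes."""


class Node:
    def __init__(self) -> None:
        self.seen: Set[str] = set()  # the dedup layer's durable state
        self.balance = 100           # the application's durable state


def layer_first(node: Node, msg_id: str, amount: int, crash_between: bool = False) -> None:
    """The dedup layer records the ID, then hands the message to the application."""
    if msg_id in node.seen:
        return
    node.seen.add(msg_id)
    if crash_between:
        raise Crash()
    node.balance -= amount


def app_first(node: Node, msg_id: str, amount: int, crash_between: bool = False) -> None:
    """The application applies the debit, then the dedup layer records the ID."""
    if msg_id in node.seen:
        return
    node.balance -= amount
    if crash_between:
        raise Crash()
    node.seen.add(msg_id)


def deliver_with_retry(handler: Callable[..., None], node: Node) -> int:
    """The sender never got an ack for the first attempt, so it sends again."""
    try:
        handler(node, "m1", 5, crash_between=True)
    except Crash:
        pass  # the node restarts with whatever had been written before the crash
    handler(node, "m1", 5)  # redelivery of the same message
    return node.balance


lost = deliver_with_retry(layer_first, Node())   # ID recorded, debit never applied
doubled = deliver_with_retry(app_first, Node())  # debit applied twice
```

With either ordering, a crash between the two writes produces a wrong balance after the retry: the debit is either never applied or applied twice. That is the case for making the two writes atomic, or for making the application itself tolerate redelivery.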
> An element of a distributed system that wants to debit from bank accounts must acknowledge the receipt of a message after making its effects durable.
I do not and never have disputed that. But I fail to see what that has to do with the matter at hand, namely, whether or not exactly-once delivery is possible.
Draw a diagram containing a source system, a destination system, and an unreliable communication channel between them. The destination system also has an output with no unreliable communications channel.
Exactly-once delivery means that a message sent from the source system reaches the destination system exactly once, and its result reaches the output channel exactly once as a consequence.
Exactly-once processing means that a message sent from the source system produces the expected output from the destination system once, even though it may be received by the destination system more than once.
(That's a little sloppy because it could use more discussion of the conditions in which it won't be received zero times, and how those are different between exactly-once and at-most-once delivery, but that's mostly beside the point because it isn't part of the distinction between exactly-once delivery and exactly-once processing. And, while definitely technical, they always involved a somewhat idealized view of the destination system, because all communications channels, including those internal to a single device, have some degree of unreliability.)
Yes, that is exactly my point. The only way you can make it non-sloppy is to define "delivery" as being something that happens exclusively upstream of deduping.
No, I'm saying it's "sloppy" as a definition because while it addresses the distinction you ask about, it doesn't fully cover what distinguishes exactly-once from at-most-once.
> The only way you can make it non-sloppy is to define "delivery" as being something that happens exclusively upstream of deduping.
"Deduping" can happen in many places. If it happens anywhere before the destination system end of the unreliable connection it is part of delivery (but also can't get you to exactly-once delivery). If it happens on the destination side of the unreliable communication channel, then yes, it's not part of the delivery guarantee, it is how you get exactly-once processing from at-least-once delivery. This has been well-known for a very long time. (I don't think it was new when I first encountered it in 1999.)
I disagree. If I'm an actual general in a fort with a gate, and I tell the gate guards to inspect messages brought in by couriers and not allow couriers carrying duplicate messages to enter, the ones that the guards let through are still delivering those messages to me.
I had a similar thought as what's in your post, but didn't share because I thought the reaction would be something like this. I personally would word this as you can achieve exactly once semantics by combining at least once delivery with idempotency.
If you have duplicate things, then you've clearly been delivered more than one thing. There is no way to deliver something exactly once, and yet the receiver has more than one thing such that they can throw all but one thing away.
> If you have duplicate things, then you've clearly been delivered more than one thing.
Yes, that's true. But this doesn't turn on what "delivery" means, it turns on what "you" means. If "you" are downstream of a de-duplication mechanism, then "you" can get exactly-once delivery. Why is that so absurd?
> Yes, that's true. But this doesn't turn on what "delivery" means, it turns on what "you" means. If "you" are downstream of a de-duplication mechanism, then "you" can get exactly-once delivery. Why is that so absurd?
So in the case of, say, a network service on server A and a network client B, your solution to "exactly once delivery" is to re-define it as "deliver it from A to B multiple times and have B deduplicate"?
Do you not see how nonsensical that is to call that "exactly once delivery"?
If you have a reliable connection between “you” and the deduplicator, then “you” aren’t receiving messages over an unreliable connection at all and so the claim that you can’t have exactly once delivery over an unreliable connection isn’t applicable in the first place. You’re receiving messages over a reliable connection and what happens upstream of that is irrelevant.
The same way I ensure any behavior in a digital system. There are boundaries inside of which processes are presumed to be reliable, typically consisting of a CPU, memory busses, and attached non-volatile storage. If you don't assume that those are reliable, then you can't guarantee anything.
Great! I agree 100%. We have to assume a "reliable network" within a "boundary" (i.e. a computer's CPU, memory, busses etc...). Distributed systems (from which these rules are taken) are specifically systems where anything within one of these "boundaries" is considered a "single node" and treated the same, whether it's a NIC, a kernel module/driver, a user space process or anything else.
In our case, if we were to assume (for example) that the NIC would de-duplicate the messages for us, anyone writing the producer/sender and a user-space program for the receiver would need to know that the NIC was doing this (or risk having messages dropped for failing to include a unique ID).
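A toy Python sketch of why the sender has to cooperate with any lower-level de-duplicator. The message dictionaries and both functions are hypothetical, and no real NIC works this way; the sketch only contrasts de-duplicating on content with de-duplicating on a sender-supplied ID.

```python
from typing import Dict, List, Set

Msg = Dict[str, str]

# Two distinct $5 debits ("a" and "b"); "a" was also retransmitted once.
messages: List[Msg] = [
    {"id": "a", "payload": "debit 5"},
    {"id": "a", "payload": "debit 5"},
    {"id": "b", "payload": "debit 5"},
]


def dedupe_by_payload(items: List[Msg]) -> List[Msg]:
    """Without IDs, the layer can only compare content, so it also drops
    legitimate messages that happen to look identical."""
    seen: Set[str] = set()
    kept: List[Msg] = []
    for msg in items:
        if msg["payload"] not in seen:
            seen.add(msg["payload"])
            kept.append(msg)
    return kept


def dedupe_by_id(items: List[Msg]) -> List[Msg]:
    """With a unique ID from the sender, retransmissions and distinct
    messages can be told apart."""
    seen: Set[str] = set()
    kept: List[Msg] = []
    for msg in items:
        if msg["id"] not in seen:
            seen.add(msg["id"])
            kept.append(msg)
    return kept


by_payload = dedupe_by_payload(messages)
by_id = dedupe_by_id(messages)
```

Content-based de-duplication keeps one message and silently loses the second debit, while ID-based de-duplication keeps both distinct debits and drops only the retransmission.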
This is a pedantic point, but I would strongly stress that the only reason these "delivery rules" are so popular (and evoke such a reaction) is because of the very large number of times that programmers misunderstand them.
Commonly they either assume that:
* the network is reliable
* something else will guarantee "exactly once delivery" for me (when in fact nothing will)
It explains why most people seem to disagree with you on what "delivery" and "you" mean in this context. For the majority of contexts, "delivery" means that a system responsible for de-duplication receives the message.
Yeah, but that just seems like a bizarre definition on which to base the claim that you cannot have exactly-once delivery. Obviously, if you define delivery to preclude de-duplication, then you can't have exactly-once delivery. But you can have something that delivers messages (for some reasonable definition of "delivers") exactly once. It seems weird to define delivery in such a way that such a system does not provide exactly-once delivery.
I'm not sure how to convince you to get over your hangup around how other people feel about the word "delivery", but this is exactly why others distinguish between "delivery" and "processing". If you think there are better terms than those two to describe "the system that receives a message and must be responsible for de-duplication" and "the system that can rely on messages being already de-duplicated", then feel free to propose them and have people debate that, I suppose. But because of what I noted earlier, this is a very useful distinction to maintain for people working on systems that are responsible for de-duplication (likely most people on this forum), and these words seem to make sense for most individuals.
I don't have any hangups about how other people feel about anything. My "hangup" is that I see no evidence that there is a consensus on a technical definition of the word "delivery". In the absence of such a definition, there is no basis for asserting that exactly-once delivery is impossible, particularly when there are existence proofs to the contrary based on reasonable informal definitions of the word "delivery".
I'm sorry people are being such utter jerks about this. Honestly, the comments on this are an embarrassment. Even if you are wrong, there's absolutely no call for the tone a lot of the comments are taking.
"This post is ostensibly about an obscure technical issue in distributed systems, but it's really about human communications."
and reiterate it at the end:
"This post was intended to be about human communication more than distributed systems or network protocols."
I really don't know how I could have made this any clearer.