
> referring to the delivery of the message to the system itself

And how do you define "the system itself"?



The thing that is at the end of the lossy medium. It must tolerate (0 or 1) or (1 or more) things being delivered to it.


Yes, that is true. But why can't I choose to view "the system itself" as the thing that is on the other side of a de-duplicator?

It feels to me like an argument over whether or not humans can fly. An unassisted human cannot fly, but with some technological augmentation, they can. It seems a bit pedantic to deny that someone can fly from LA to New York simply because they have to get into an airplane to do it.


> why can't I choose to view "the system itself" as the thing that is on the other side of a de-duplicator?

because the "de-duplicator" would either:

* be somewhere else on an unreliable network (in which case we have the same problem)

* be on the same machine (or in the same process) as "the system itself" (which, from a distributed systems perspective, makes it the same thing)

> It seems a bit pedantic...

It is pedantic. The only reason that these "delivery" rules are popular is because of how many times programmers have gotten it wrong. Mostly by making assumptions that either:

* the network is reliable

* the message queue (or whatever) will de-duplicate messages for me


Having a clear system boundary is required for analysis.

Knowing that messages will be delivered 1+ times gives us a variety of ways we could choose to deal with this on the endpoint, with different vulnerabilities. (Getting "exactly once" processing usually requires making various kinds of resilience tradeoffs based on timing windows, storage requirements, etc).
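To make the "1+ times" point concrete, here is a minimal sketch, assuming a sender that retries until it sees an acknowledgement and an endpoint that de-duplicates by message ID. The names `Endpoint`, `send_at_least_once`, and the `ack_lost` script are all hypothetical, and the in-memory `seen` set is exactly the kind of tradeoff mentioned above: it is cheap, but it is forgotten if the endpoint restarts.

```python
from typing import List, Set


class Endpoint:
    """The thing at the end of the lossy medium (hypothetical)."""

    def __init__(self) -> None:
        self.deliveries: List[str] = []  # every arrival, duplicates included
        self.seen: Set[str] = set()      # IDs already processed (lost on restart)
        self.processed: List[str] = []   # bodies actually acted on

    def deliver(self, msg_id: str, body: str) -> None:
        self.deliveries.append(msg_id)
        if msg_id in self.seen:
            return  # duplicate delivery: tolerate it, do nothing
        self.seen.add(msg_id)
        self.processed.append(body)


def send_at_least_once(endpoint: Endpoint, msg_id: str, body: str,
                       ack_lost: List[bool]) -> int:
    """Resend until an ack gets through.

    ack_lost[i] scripts whether the ack for attempt i is dropped by the
    network; this stands in for real packet loss so the run is repeatable.
    Returns the number of attempts made.
    """
    attempts = 0
    for lost in ack_lost:
        attempts += 1
        endpoint.deliver(msg_id, body)  # the message itself arrives each time
        if not lost:
            return attempts             # sender finally sees the ack
    raise RuntimeError("gave up without an ack")


ep = Endpoint()
attempts = send_at_least_once(ep, "m1", "subtract $5", [True, True, False])
print(attempts, ep.deliveries, ep.processed)
```

With two acks lost, the endpoint sees three deliveries of `m1` but acts on it once. The sender cannot tell a lost ack from a lost message, so it has no choice but to resend.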

> It seems a bit pedantic to deny that someone can fly from LA to New York

At this point I question your good faith. You're calling people out by name, and you're going full-on "well, aktuallyyyy" and seeming to deliberately misunderstand other people's assertions. "People can't breathe underwater" v. "Well, once I was in a tunnel that was under a body of water, and I still breathed!(@!("

If you choose to define words differently than everyone else, you're just sabotaging your own communication to try and feel smart.


> You're calling people out by name

I am? Where?

> Getting "exactly once" processing usually requires making various kinds of resilience tradeoffs based on timing windows, storage requirements, etc

Yes, of course. But that's not the same as "impossible".


> I was reading Hacker News a few days ago and stumbled on a comment posted by ...


Really? That is what causes you to question whether or not I'm acting in good faith?

If that's what you call "calling people out by name" I guess we'll just have to agree to disagree.


The whole refusal to accept that a field could legitimately define something differently than how you prefer, and then running off to blog about it and name names... and then coming around for round II of flamewar... with ever more splitting of hairs in definitions... is not awesome.

You are especially well-answered here, I think: https://news.ycombinator.com/item?id=41599131

One reason the delivery / processing distinction exists is that, for correctness, the application very often needs to atomically persist "I have received this message" together with any other state changes made as a result of processing that message. You can't generally solve this with a layer put on top, even on the same machine. If it's not atomic, then you can still deliver duplicates to the application or end up never delivering to the application (for example, the power goes out when one side has written but not the other).

So, the state change to "already received" and the changes you want to make in response to the message being received have to happen together. TCP or even a message queueing implementation with a persistence layer cannot solve this problem for you. Thus, the application needs to deal with multiple delivery.

Imagine a "subtract $5 from my bank account" message with no ID on the message itself, and a layer "on top" that gives IDs and tries to ensure exactly once delivery. If the layer "on top" does not change its state at the exact moment the $5 is deducted from the account, bad things can happen, and in practice a separate layer cannot guarantee that timing. Hence, the application needs to be able to cope with the "subtract $5" being delivered to it multiple times, and this deduping has to be intimately tied to it subtracting the $5 (processing).


> You are especially well-answered here

I don't see anything there that is at odds with anything I have said. All I see there is a restatement of my position.

> One reason the delivery / processing distinction exists is that [...] the application very often needs to atomically...

Yes. Do you really think I did not already know that?

> Thus, the application needs to deal with multiple delivery.

That depends on your requirements. What does that have to do with the possibility or impossibility of exactly-once delivery?

> Imagine a "subtract $5 from my bank account" message with no ID on the message itself

I have never denied that you can invent scenarios that will fail. I explicitly said that exactly-once delivery is likely not what you want. What does that have to do with whether or not it is possible?


> Yes. Do you really think I did not already know that?

Well, if the application and this mythical higher-level thing have to do things atomically and be tightly wed, but you're insistent on calling them different entities so that you can win an internet argument that the second one is not getting duplicate "deliveries" ... then that's honestly kind of sad.

The literature has used the term "delivery" like this basically 100% of the time for the past 20 years, and the majority of the time somewhere else. You can argue that your definition makes sense to you, but when everyone else uses the term the other way it's not helpful. Anyone can choose to define words differently from everyone else and then try to lawyer it out, but it's not likely to be useful or accepted.


> you're insistent on calling them different entities so that you can win an internet argument

No, I'm insistent on calling them different entities because in actual practice they can be, and indeed usually are, different entities. De-duping is usually done in the operating system, and applications usually run in user space.


No, as we're saying repeatedly at this point-- the application itself needs to tolerate multiple delivery, because if the deduplication isn't atomic with the application's actions, incorrect behavior results. Stacking on top of TCP doesn't fix this.


> if the deduplication isn't atomic with the application's actions, incorrect behavior results

So? What do the application requirements have to do with the question of whether or not exactly-once delivery is possible? The application is a red herring. Why do you keep bringing it up?

If you want to argue that exactly-once delivery is generally undesirable, that is not in dispute. What is in dispute is whether or not it is possible, and the application requirements cannot possibly have any bearing on that.



