Side note: there are some business insights you can get from a company using ser...

pacificmint · on Nov 25, 2023

> you can tell the growth rate of the company.

You can even do this when you don’t know the exact interval by using probabilities. The Allies used this method to estimate German tank production in World War II by analyzing the serial numbers of captured or destroyed tanks.

This is known as the German tank problem [1].

[1] https://en.wikipedia.org/wiki/German_tank_problem
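For anyone who wants to try it, here is a minimal sketch of the standard frequentist estimator from that article, N ≈ m + m/k − 1, where m is the largest serial seen and k is the number of samples. The function name and sample numbers are made up for illustration, and it assumes serials start at 1, have no gaps, and are observed uniformly at random, which real-world IDs often violate.

```python
def estimate_total(serials):
    """Estimate the largest serial issued from a sample of observed serials.

    Assumes serials start at 1, have no gaps, and the observed ones are a
    uniform random sample without replacement.
    """
    k = len(serials)   # number of observed serials
    m = max(serials)   # largest observed serial
    # The sample maximum plus the average gap between observations.
    return m + m / k - 1


print(estimate_total([19, 40, 42, 60]))  # 74.0
print(estimate_total([500]))             # 999.0
```

With a single observation x this reduces to 2x − 1, so even one ID gives a (very noisy) point estimate.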

leobg · on Nov 25, 2023

Very interesting.

I’m a lawyer, and I’m using sequential IDs in a fraud case right now to estimate the number of victims.

Unfortunately, so far I only have the IDs of two victims, both from within about a month, whereas the fraud has likely been going on for several years. Simply extrapolating that growth rate isn’t going to be very accurate.

Also, I suspect that the perpetrators did not start at ID 1.

infogulch · on Nov 26, 2023

You might try to use the information to find more victims first.

namtab00 · on Nov 25, 2023

ehm, yeah, n=2 will not get you anything useful...

that'll be like trying to determine the average salary in a company with only two known ones, which could be the janitor's and the CEO's

selcuka · on Nov 26, 2023

> that'll be like trying to determine the average salary in a company with only two known ones, which could be the janitor's and the CEO's

Ironically that would be somewhat close to the actual average.

mcherm · on Nov 26, 2023

It would be significantly above the average unless the company is ridiculously top-heavy or has shockingly little variation in salary. Or if the "salary" for the CEO ignores certain compensation (e.g., paid a salary of $1 + stock options).

selcuka · on Nov 27, 2023

Sure thing. I could have worded it better, but I was trying to say that it would be much more skewed if the two samples were, say, CEO and the CFO, or two janitors.

pixel8account · on Nov 26, 2023

Even with n=1 you can get something useful. IIRC, "on average", if you have ID x then the best population estimate is 2*x. Of course the error margin is immense, but it's still better than nothing.

lhamil64 · on Nov 25, 2023

It also makes it slightly easier to perform certain attacks since it's trivial to figure out other IDs.

ozim · on Nov 26, 2023

Making non-guessable IDs for broken authorization is security by obscurity.

If you have integer IDs it is also trivial to find authorization flaws on your own. Any pentester will go for it right away.

If you make non-guessable IDs, they might skip it and go look for other stuff.

rkagerer · on Nov 26, 2023

I would have introduced random, increasing skips in the sequence to make my army look 10x bigger.

raggi · on Nov 25, 2023

I see so many organizations add weird slowdowns from debts associated with this. I reflect on some of the most successful tech businesses of the last decade and remember that all their APIs exposed this kind of data early on and many still do.

Does anyone have an example they can reference of a business being harmed by this information being out there?

wodenokoto · on Nov 26, 2023

I don’t know any stories of digital businesses, but there was a case where someone went and counted customers at the door, only to realize that the company was lying in its yearly reports.

So I guess that kinda harmed that fraudulent business strategy…

aequitas · on Nov 25, 2023

At an internship long ago, my boss instructed me to always add a few extra to the auto-incremented order ID so customers couldn’t guess how business was going if they happened to place several orders in quick succession.

qumpis · on Nov 26, 2023

How big and random were these "few extras"?

diznq · on Nov 26, 2023

The solution I prefer is to simply encrypt data such as IDs.

Instead of getting a raw ID in the response, the user gets hmac(cipher(Data, secret_key), secret_key) + cipher(Data, secret_key), and a simple pre-request handler iterates over the query params / form data and decrypts them if the signature matches.

It also works as really nice CSRF protection, since the ID of the currently signed-in user can be embedded into Data and checked: current user.id == decrypted data.id.

Another nice advantage is that you can deny the request right in the beginning as you know ahead of time that the provided data is not valid (signature doesn't match), saving some DB queries.

The downside is that the URL gets pretty long, but if that's hidden by the browser or the user doesn't care, it's a non-issue.
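A rough sketch of that hmac + cipher scheme using only the Python standard library might look like the following. The names `seal`, `unseal`, `ENC_KEY` and `MAC_KEY` are hypothetical, and since the standard library ships no cipher, `_keystream_xor` is a stand-in (XOR with an HMAC-SHA256 keystream) that should be replaced by an authenticated cipher from a vetted library in anything real. It also uses separate keys for encryption and signing rather than one secret_key, which is an assumption on my part, not something the comment specifies.

```python
import hashlib
import hmac
import json
import secrets
from base64 import urlsafe_b64decode, urlsafe_b64encode

# Hypothetical keys; in practice these would come from configuration.
ENC_KEY = secrets.token_bytes(32)
MAC_KEY = secrets.token_bytes(32)


def _keystream_xor(data: bytes, nonce: bytes) -> bytes:
    # Stand-in cipher for illustration only: XOR with an HMAC-SHA256
    # keystream. Applying it twice with the same nonce restores the input.
    out = bytearray()
    for start in range(0, len(data), 32):
        counter = (start // 32).to_bytes(4, "big")
        pad = hmac.new(ENC_KEY, nonce + counter, hashlib.sha256).digest()
        out.extend(a ^ b for a, b in zip(data[start:start + 32], pad))
    return bytes(out)


def seal(data: dict) -> str:
    """Encrypt `data`, then prepend an HMAC tag over the ciphertext."""
    nonce = secrets.token_bytes(16)
    ciphertext = nonce + _keystream_xor(json.dumps(data).encode(), nonce)
    tag = hmac.new(MAC_KEY, ciphertext, hashlib.sha256).digest()
    return urlsafe_b64encode(tag + ciphertext).decode()


def unseal(token: str):
    """Return the decrypted dict, or None if the token is malformed or forged."""
    try:
        raw = urlsafe_b64decode(token.encode())
    except ValueError:
        return None
    tag, ciphertext = raw[:32], raw[32:]
    expected = hmac.new(MAC_KEY, ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None  # signature mismatch: reject before any DB query
    nonce, body = ciphertext[:16], ciphertext[16:]
    return json.loads(_keystream_xor(body, nonce))


token = seal({"order_id": 42, "user_id": 7})
print(unseal(token))  # {'order_id': 42, 'user_id': 7}
```

The signature check runs before decryption, which is what lets a pre-request handler drop tampered parameters early.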

throwaway47747 · on Nov 26, 2023

Can confirm, when I worked in VC we used this to verify order volume for a number of startups we were evaluating. For one startup, I wrote a bot to place a small order a few times a day, and log the order number.
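As an illustration of the arithmetic involved, here is a small sketch that turns such a log into an order rate. The function name, timestamps and order numbers are made up, and it assumes order numbers increase by exactly 1 per order with no skips (the bot's own orders are included in the count).

```python
from datetime import datetime


def orders_per_day(observations):
    """Average orders per day between the first and last observation.

    `observations` is a list of (timestamp, order_number) pairs sorted by
    time. Assumes order numbers increase by 1 per order.
    """
    (t0, n0), (t1, n1) = observations[0], observations[-1]
    days = (t1 - t0).total_seconds() / 86400
    return (n1 - n0) / days


# Made-up log entries from two probe orders placed a week apart.
log = [
    (datetime(2023, 11, 1, 9, 0), 10_250),
    (datetime(2023, 11, 8, 9, 0), 11_300),
]
print(orders_per_day(log))  # 150.0
```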

ChrisCinelli · on Nov 26, 2023

Did you use this for due diligence, to verify that the data the entrepreneur reported in the pitch was good, or were you looking at companies in a specific space and checking which one had more orders?

throwaway47747 · on Nov 26, 2023

The former.

paulddraper · on Nov 25, 2023

> most browsers

Not chrome...

Also, links are a thing in chat, etc

parhamn · on Nov 25, 2023

Here's what a recent YouTube link I shared looked like (Sqids documents YouTube as a sample use case):

> https://www.youtube.com/watch?v=fFMzQ3tYTFU&pp=ygURY2hQImVzZ...

Or Twitter:

> https://x.com/elonmusk/status/172853302828286055507?s=20

Or TikTok:

> https://www.tiktok.com/@<userId>/video/730292574259232054785...

While I tend to strip the tracking params and there are extensions that do this, I don't think most people do. These URLs are pretty 'ugly'.

So if the links that are being shared most on the internet (YT, TikTok, Twitter) don't care, you probably shouldn't either. I think the onus is on the UI layers (Chat apps, etc) to show urls how they look best on their respective platforms.

Edit: to this point, it looks like HN truncates these to make them less ugly too.

wodenokoto · on Nov 26, 2023

If you use the share button and not the url bar you get prettier and shorter urls.

Don’t know how common that is. Wouldn’t be surprised if non-techies don’t know how to copy from the URL bar.

taneq · on Nov 26, 2023

I always take note of invoice numbers for this exact reason: they give you a feel for how busy the company is.

hinkley · on Nov 26, 2023

And this is the stuff you get if you manage to get your access control right.

Get it wrong and we jump from actionable business metadata to actionable business data (like perhaps which of your customer's customers are poachable)

hot_gril · on Nov 26, 2023

Yes but just using Sqids doesn't fix this. Sqids are decodable. You need to use a random or otherwise unpredictable input.

What's the advantage of uuid7 (sequential + random) vs uuid4 (full random) for this?

krembo · on Nov 26, 2023

You can extract the timestamp from UUID7/8 so it also can reveal business information.
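To make that concrete for UUIDv7: the first 48 bits are a Unix timestamp in milliseconds, so anyone holding the ID can read off the creation time. A minimal sketch, where the helper name is made up and the sample value is just one whose leading bits encode 1645557742000 ms:

```python
import uuid
from datetime import datetime, timezone


def uuid7_created_at(value: str) -> datetime:
    """Read the creation time embedded in a UUIDv7 string."""
    u = uuid.UUID(value)
    if u.version != 7:
        raise ValueError("not a UUIDv7")
    unix_ms = u.int >> 80  # keep the top 48 of the 128 bits
    return datetime.fromtimestamp(unix_ms / 1000, tz=timezone.utc)


print(uuid7_created_at("017f22e2-79b0-7cc3-98c4-dc0c0c07398f"))
# 2022-02-22 19:22:22+00:00
```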

hot_gril · on Nov 26, 2023

That's why I'm asking, it seems like a liability.

parhamn · on Nov 26, 2023

You don't get the cardinality of the data type, just when the object was created.

There are probably some business cases where the “when” information is potentially useful (I can't think of any), but you can't know, for example, how many users are in the database.

hot_gril · on Nov 26, 2023

It's usually benign, but why encode any info into your public IDs? I wouldn't go anywhere near that.

It can make sense for some situational internal database use case where you want temporal locality and can't use full sequential since it's distributed, and even then your DBMS might recommend something else, e.g. Spanner explicitly says not to do this. And it doesn't need to be exposed to users.

parhamn · on Nov 27, 2023

Assuming the alternative is a fully random key, this can wreak havoc for performance depending on the database engine and index type used (lots written on this topic).

But I do agree, if performance isn't an issue with your db choice and you're not interested in getting a free "created_at", might as well go fully random.

hot_gril · on Nov 27, 2023

Primary keys have to be chosen carefully because they impact disk layout, joins, etc, and full random makes bad PKs in certain distributed DBs. But it's simple and cheap to convert a public user ID (full random) to/from internal row keys (sequential-ish) at the API boundaries using a secondary index or even a cache.

hot_gril · on Nov 27, 2023

(Like, simple/cheap enough that it's probably worth doing instead of exposing your row keys. Maybe in some careful cases exposing uuid7 row keys makes sense, but not nearly enough to recommend that so broadly. It's not a safe default.)
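A small sketch of that split, using SQLite from the Python standard library purely for illustration: the table and function names are hypothetical, the row key is a sequential integer that never leaves the server, and the public ID is a random UUIDv4 looked up through a unique secondary index.

```python
import sqlite3
import uuid

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE users ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"  # internal, sequential row key
    " public_id TEXT NOT NULL UNIQUE,"        # random ID exposed to clients
    " name TEXT NOT NULL)"
)


def create_user(name: str) -> str:
    """Insert a user and return only the random public ID."""
    public_id = uuid.uuid4().hex
    db.execute(
        "INSERT INTO users (public_id, name) VALUES (?, ?)", (public_id, name)
    )
    return public_id


def internal_id(public_id: str):
    """Translate a public ID to the internal row key at the API boundary."""
    row = db.execute(
        "SELECT id FROM users WHERE public_id = ?", (public_id,)
    ).fetchone()
    return row[0] if row else None


alice = create_user("alice")
bob = create_user("bob")
print(internal_id(alice), internal_id(bob))  # 1 2
```

The UNIQUE constraint gives SQLite an index on public_id, so the lookup at the boundary stays a single indexed read; a cache in front of it would be an optional extra.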

parhamn · on Nov 27, 2023

Makes sense. Longest HN convo I've managed to keep at this point -- thanks for engaging!

maxcan · on Nov 26, 2023

Yes, that's how I know I was roughly the 600,000th person to sign up for thefacebook.com.

cortesoft · on Nov 26, 2023

Unless they are doing master-master replication and so are incrementing by something other than 1.