
Side note: there are some business insights you can get from a company using serial ids.

E.g., if you sign up and get user ID 32588, then make another account a few days later, you can tell the company's growth rate from the difference between the two IDs.

And this is possible with every resource type in the application.
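
A minimal sketch of that arithmetic, assuming IDs are handed out one at a time with no gaps. The second ID and both dates are made up for illustration; only 32588 comes from the example above.

```python
from datetime import date


def signups_per_day(id_a, day_a, id_b, day_b):
    """Average number of IDs issued per day between two observations."""
    elapsed_days = (day_b - day_a).days
    if elapsed_days <= 0:
        raise ValueError("second observation must be on a later day")
    return (id_b - id_a) / elapsed_days


# Hypothetical second signup: ID 33188, five days after ID 32588.
rate = signups_per_day(32588, date(2024, 1, 1), 33188, date(2024, 1, 6))
print(rate)  # 120.0
```

The same subtraction works for any resource whose ID is exposed (orders, invoices, posts), as long as the counter really is gapless.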

I do wonder how much the URL bar junk thing matters these days. I tend to use ULIDs (waiting on wide UUID v7 adoption), and they're a bit ugly, but most browsers hide most of the URL now anyway. The fact that there is a built-in time component comes in clutch sometimes (e.g. object merging rules).



> you can tell the growth rate of the company.

You can even do this when you don’t know the exact interval by using probabilities. The Allies used this method to estimate German tank production in World War II by analyzing the serial numbers of captured or destroyed tanks.

This is known as the German Tank Problem [1].

[1] https://en.wikipedia.org/wiki/German_tank_problem
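
For reference, a rough sketch of the frequentist estimator from that article, N ≈ m + m/k - 1, where m is the largest ID seen and k is the number of IDs seen. It assumes IDs start at 1, have no gaps, and that the observed IDs are a uniform random sample; the function name is my own.

```python
def estimate_total(observed_ids):
    """German tank estimate of the total count: m + m/k - 1."""
    k = len(observed_ids)
    if k == 0:
        raise ValueError("need at least one observed ID")
    m = max(observed_ids)
    return m + m / k - 1


print(estimate_total([19, 40, 42, 60]))  # 74.0
print(estimate_total([32588]))           # 65175.0
```

With a single observation the formula reduces to 2m - 1, so the error margin is huge, and any offset in the starting ID breaks the assumption entirely.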


Very interesting.

I’m a lawyer and using sequential IDs in a fraud case right now, to determine the number of victims.

Unfortunately, so far, I only have the IDs of two victims, and those are from just within about a month, whereas the fraud has likely been going on for several years. Just simply extrapolating that growth rate isn’t going to be very accurate.

Also, I suspect that the perpetrators did not start at ID 1.


You might try to use the information to find more victims first.


ehm, yeah, n=2 will not get you anything useful...

that'll be like trying to determine the average salary in a company with only two known ones, which could be the janitor's and the CEO's


> that'll be like trying to determine the average salary in a company with only two known ones, which could be the janitor's and the CEO's

Ironically that would be somewhat close to the actual average.


It would be significantly above the average unless the company is ridiculously top-heavy or has shockingly little variation in salary. Or if the "salary" for the CEO ignores certain compensation (eg: paid a salary of $1 + stock options).


Sure thing. I could have worded it better, but I was trying to say that it would be much more skewed if the two samples were, say, CEO and the CFO, or two janitors.


Even with n=1 you can get something useful. IIRC "on average" if you have ID x then the best population estimate is about 2*x. Of course the error margin is immense, but it's still better than nothing.


It also makes it slightly easier to perform certain attacks since it's trivial to figure out other IDs.


Making non-guessable IDs for broken authorization is security by obscurity.

If you have integer IDs it is also trivial to find authorization flaws on your own. Any pentester will go for it right away.

If you make non-guessable IDs they might skip it and go look for other stuff.


I would have introduced random, increasing skips in the sequence to make my army look 10x bigger.


I see so many organizations add weird slowdowns from debts associated with this. I reflect on some of the most successful tech businesses of the last decade and remember that all their APIs exposed this kind of data early on and many still do.

Does anyone have an example they can reference of a business being harmed by this information being out there?


I don’t know any stories of digital businesses but there was a case where someone went and counted customers at the door only to realize that the company was lying on the yearly reports.

So I guess that kinda harmed that fraudulent business strategy…


At an internship long ago, my boss instructed me to always add a few extra to the auto incremented order ID so customers couldn’t guess how business was going if they happen to order stuff quickly in a row.


How big and random were these "few extras"?


The solution I prefer is to simply just encrypt the data such as IDs.

Instead of giving the user a raw ID in the response, the user gets hmac(cipher(Data, secret_key), secret_key) + cipher(Data, secret_key), and a simple pre-request handler iterates over the query params / form data and decrypts each value if its signature matches.

It also works as really nice CSRF protection, since the ID of the currently signed-in user can be embedded into Data and checked: current user.id == decrypted data.id.

Another nice advantage is that you can deny the request right at the beginning, because you know ahead of time that the provided data is not valid (signature doesn't match), saving some DB queries.

The downside is that the URL gets pretty long, but if that's hidden by the browser or the user doesn't care, it's a non-issue.
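
A rough sketch of the signature-check half of this, using only the Python standard library. Big caveat: the stdlib has no cipher, so this version signs the payload but does not encrypt it, meaning the ID is still readable inside the token. To actually hide it you would encrypt the payload first with a real crypto library. All names here (SECRET_KEY, sign_id, verify_id, the "user:record" payload layout) are made up for illustration.

```python
import base64
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical; load from config
TAG_LEN = hashlib.sha256().digest_size      # 32 bytes


def sign_id(record_id: int, user_id: int) -> str:
    """Return a URL-safe token: HMAC tag followed by the (unencrypted) payload."""
    payload = f"{user_id}:{record_id}".encode()
    tag = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(tag + payload).decode()


def verify_id(token: str, current_user_id: int):
    """Return the record ID, or None if the token is forged or not ours."""
    try:
        raw = base64.urlsafe_b64decode(token.encode())
    except ValueError:  # binascii.Error is a subclass of ValueError
        return None
    tag, payload = raw[:TAG_LEN], raw[TAG_LEN:]
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None  # reject before touching the database
    user_id, record_id = payload.decode().split(":")
    if int(user_id) != current_user_id:
        return None  # token was issued to a different user
    return int(record_id)


token = sign_id(record_id=42, user_id=7)
print(verify_id(token, current_user_id=7))  # 42
print(verify_id(token, current_user_id=8))  # None
```

The early return on a bad signature is the "deny the request right at the beginning" part; the user check is the embedded-user-ID comparison described above.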


Can confirm, when I worked in VC we used this to verify order volume for a number of startups we were evaluating. For one startup, I wrote a bot to place a small order a few times a day, and log the order number.


Did you use this for due diligence, to verify that the data reported by the entrepreneur in the pitch was good, or were you looking at several companies in a specific space and checking which one had more orders?


The former.


> most browsers

Not chrome...

Also, links are a thing in chat, etc


Here's what a recent YouTube link I shared looked like (YouTube being something Sqids documents as a sample use case):

> https://www.youtube.com/watch?v=fFMzQ3tYTFU&pp=ygURY2hQImVzZ...

Or Twitter:

> https://x.com/elonmusk/status/172853302828286055507?s=20

Or TikTok:

> https://www.tiktok.com/@<userId>/video/730292574259232054785...

While I tend to strip the tracking params and there are extensions that do this, I don't think most people do. These URLs are pretty 'ugly'.

So if the links that are being shared most on the internet (YT, TikTok, Twitter) don't care, you probably shouldn't either. I think the onus is on the UI layers (Chat apps, etc) to show urls how they look best on their respective platforms.

Edit: to this point, it looks like HN truncates these to make them less ugly too.


If you use the share button and not the url bar you get prettier and shorter urls.

Don’t know how common that is. Wouldn’t be surprised if non-techies don’t know how to copy from the URL bar.


I always take note of invoice numbers for this exact reason, they give you a feel for how busy they are.


And this is the stuff you get if you manage to get your access control right.

Get it wrong and we jump from actionable business metadata to actionable business data (like perhaps which of your customer's customers are poachable)


Yes but just using Sqids doesn't fix this. Sqids are decodable. You need to use a random or otherwise unpredictable input.

What's the advantage of uuid7 (sequential + random) vs uuid4 (full random) for this?


You can extract the timestamp from UUID7/8 so it also can reveal business information.
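
To make that concrete, a small sketch for version 7 only, where the top 48 bits are the Unix timestamp in milliseconds (UUIDv8 layouts are application-defined, so there is no generic way to do this for v8). The example UUID is built by hand with zeroed random bits so the snippet does not depend on a UUIDv7 generator being available; the function name is my own.

```python
import uuid
from datetime import datetime, timezone


def uuid7_timestamp(u: uuid.UUID) -> datetime:
    """Read the creation time out of a version 7 UUID."""
    if u.version != 7:
        raise ValueError("not a version 7 UUID")
    unix_ms = u.int >> 80  # top 48 of the 128 bits
    return datetime.fromtimestamp(unix_ms / 1000, tz=timezone.utc)


# Hand-built v7 UUID: 48-bit timestamp, version nibble 7, variant bits 10,
# and all "random" bits left as zero for the sake of the example.
example_ms = 1_700_000_000_000
example = uuid.UUID(int=(example_ms << 80) | (0x7 << 76) | (0b10 << 62))

print(example)
print(uuid7_timestamp(example))
```

Anyone holding the ID can run the same shift, so every public UUIDv7 doubles as a public created_at.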


That's why I'm asking, it seems like a liability.


You don't get the cardinality of the data type, just when the object was created.

There are probably some business cases where the “when” information is potentially useful (I can't think of any), but you can't know, for example, how many users are in the database.


It's usually benign, but why encode any info into your public IDs? I wouldn't go anywhere near that.

It can make sense for some situational internal database use case where you want temporal locality and can't use full sequential since it's distributed, and even then your DBMS might recommend something else, e.g. Spanner explicitly says not to do this. And it doesn't need to be exposed to users.


Assuming the alternative is a fully random key, this can wreak havoc for performance depending on the database engine and index type used (lots written on this topic).

But I do agree, if performance isn't an issue with your db choice and you're not interested in getting a free "created_at", might as well go fully random.


Primary keys have to be chosen carefully because they impact disk layout, joins, etc, and full random makes bad PKs in certain distributed DBs. But it's simple and cheap to convert a public user ID (full random) to/from internal row keys (sequential-ish) at the API boundaries using a secondary index or even a cache.


(Like, simple/cheap enough that it's probably worth doing instead of exposing your row keys. Maybe in some careful cases exposing uuid7 row keys makes sense, but not nearly enough to recommend that so broadly. It's not a safe default.)
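
A minimal sketch of that mapping, using SQLite only because it ships with Python. The table layout and function names are invented for illustration: the integer primary key stays internal, and the random public_id column (with its unique index) is the only thing the API ever shows.

```python
import secrets
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE users (
        id        INTEGER PRIMARY KEY,   -- internal sequential row key
        public_id TEXT NOT NULL UNIQUE,  -- random ID; UNIQUE adds the secondary index
        name      TEXT NOT NULL
    )
    """
)


def create_user(name: str) -> str:
    """Insert a user and return only the random public ID."""
    public_id = secrets.token_urlsafe(16)  # 128 bits of randomness
    conn.execute(
        "INSERT INTO users (public_id, name) VALUES (?, ?)", (public_id, name)
    )
    return public_id


def internal_id(public_id: str):
    """Translate a public ID to the internal row key at the API boundary."""
    row = conn.execute(
        "SELECT id FROM users WHERE public_id = ?", (public_id,)
    ).fetchone()
    return row[0] if row else None


alice = create_user("alice")
bob = create_user("bob")
print(internal_id(alice), internal_id(bob))  # 1 2
```

Joins and foreign keys keep using the small sequential id, and the extra cost is one indexed lookup per request (or a cache hit).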


Makes sense. Longest HN convo I've managed to keep at this point -- thanks for engaging!


Yes, that's how I know I was roughly the 600,000th person to sign up for thefacebook.com.


Unless they are doing master-master replication and so are incrementing by something other than 1.



