Side note: there are some business insights you can get from a company using serial ids.
E.g., if you sign up and get user ID 32588, then make another account a few days later, you can compare the two IDs to estimate the company's growth rate.
And this is possible with every resource type in the application.
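A rough sketch of that arithmetic, assuming IDs are handed out strictly sequentially with no gaps. The second ID (33188) and both dates are made up for illustration; only 32588 comes from the comment above.

```python
from datetime import datetime


def new_ids_per_day(id_a: int, seen_a: datetime, id_b: int, seen_b: datetime) -> float:
    """Estimate how many IDs were issued per day between two observations.

    Assumes IDs increase by exactly 1 per new record (no gaps, no padding).
    """
    elapsed_days = (seen_b - seen_a).total_seconds() / 86400
    if elapsed_days <= 0:
        raise ValueError("second observation must be later than the first")
    return (id_b - id_a) / elapsed_days


# Hypothetical observations: two signups made three days apart.
rate = new_ids_per_day(
    32588, datetime(2024, 1, 1),
    33188, datetime(2024, 1, 4),
)
print(rate)  # 600 new IDs over 3 days
```

The same function works for any resource with a sequential ID (orders, invoices, tickets), which is why the leak applies per resource type.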
I do wonder how much the URL bar junk thing matters these days. I tend to use ULIDs (waiting on wide adoption of UUID v7), and they're a bit ugly, but most browsers hide most of the URL now anyway. The fact that there is a built-in time component comes in clutch sometimes (e.g. object merging rules).
You can even do this when you don’t know the exact interval by using probabilities. The Allies used this method to estimate German tank production in World War II by analyzing the serial numbers of captured or destroyed tanks.
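For reference, here is a minimal sketch of the standard frequentist estimator from the "German tank problem", assuming serials run 1..N and the observed ones are a uniform random sample without replacement. The second function is a variant for when the starting ID is unknown; it needs at least two samples. The function names are just for illustration.

```python
def estimate_total(serials: list[int]) -> float:
    """Estimate N from k observed serials, assuming numbering starts at 1.

    Uses N_hat = m + m/k - 1, where m is the largest serial seen.
    """
    k = len(serials)
    if k == 0:
        raise ValueError("need at least one serial")
    m = max(serials)
    return m + m / k - 1


def estimate_total_unknown_start(serials: list[int]) -> float:
    """Estimate the count when the first ID is unknown (needs k >= 2).

    Uses (max - min) * (k + 1) / (k - 1) - 1, which is unbiased under
    the same uniform-sampling assumption.
    """
    k = len(serials)
    if k < 2:
        raise ValueError("need at least two serials")
    spread = max(serials) - min(serials)
    return spread * (k + 1) / (k - 1) - 1


print(estimate_total([19, 40, 42, 60]))
print(estimate_total_unknown_start([19, 40, 42, 60]))
```

With a single observed ID x, the first formula reduces to 2x - 1. Both estimators lean heavily on the uniform-sampling assumption, so samples clustered in a short time window will give a poor estimate.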
I’m a lawyer, and I’m using sequential IDs in a fraud case right now to determine the number of victims.
Unfortunately, so far I only have the IDs of two victims, and those are from within about a month of each other, whereas the fraud has likely been going on for several years. Simply extrapolating that growth rate isn’t going to be very accurate.
Also, I suspect that the perpetrators did not start at ID 1.
It would be significantly above the average unless the company is ridiculously top-heavy or has shockingly little variation in salary. Or if the "salary" for the CEO ignores certain compensation (e.g. paid a salary of $1 + stock options).
Sure thing. I could have worded it better, but I was trying to say that it would be much more skewed if the two samples were, say, CEO and the CFO, or two janitors.
Even with n=1 you can get something useful. IIRC, "on average", if you have ID x then the best population estimate is about 2*x. Of course the error margin is immense, but it's still better than nothing.
I see so many organizations add weird slowdowns from debts associated with this. I reflect on some of the most successful tech businesses of the last decade and remember that all their APIs exposed this kind of data early on and many still do.
Does anyone have an example they can reference of a business being harmed by this information being out there?
I don’t know any stories of digital businesses but there was a case where someone went and counted customers at the door only to realize that the company was lying on the yearly reports.
So I guess that kinda harmed that fraudulent business strategy…
At an internship long ago, my boss instructed me to always add a few extra to the auto incremented order ID so customers couldn’t guess how business was going if they happen to order stuff quickly in a row.
The solution I prefer is to simply encrypt data such as IDs.
Instead of getting an ID in the response, the user gets hmac(cipher(Data, secret_key), secret_key) + cipher(Data, secret_key), and then a simple pre-request handler iterates over the query params / form data and decrypts them if the signature matches.
It also works as really nice CSRF protection, since the ID of the currently signed-in user can be embedded into Data and checked against it (current user.id == decrypted data.id).
Another nice advantage is that you can deny the request right at the beginning, since you know ahead of time that the provided data is not valid (the signature doesn't match), saving some DB queries.
The downside is that the URL gets pretty long, but if the browser hides it or the user doesn't care, it's a non-issue.
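A minimal sketch of that scheme using only the Python standard library. Everything named here (`seal_id`, `open_id`, the keys, the payload layout) is hypothetical, and two things differ from the description above on purpose: it uses separate keys for encryption and signing, and the "cipher" is a stand-in keystream built from HMAC-SHA256 in counter mode, because the standard library has no block cipher. Real code should probably use a vetted authenticated cipher such as AES-GCM instead.

```python
import base64
import hashlib
import hmac
import json
import os
from typing import Optional

# Hypothetical keys; in practice these would come from configuration.
ENC_KEY = b"example-encryption-key"
MAC_KEY = b"example-signing-key"
TAG_LEN = 32    # HMAC-SHA256 digest size in bytes
NONCE_LEN = 16  # random per-token nonce


def _keystream(nonce: bytes, length: int) -> bytes:
    """Stand-in stream cipher: HMAC-SHA256 in counter mode (illustration only)."""
    out = b""
    counter = 0
    while len(out) < length:
        block = nonce + counter.to_bytes(8, "big")
        out += hmac.new(ENC_KEY, block, hashlib.sha256).digest()
        counter += 1
    return out[:length]


def _xor(data: bytes, stream: bytes) -> bytes:
    return bytes(a ^ b for a, b in zip(data, stream))


def seal_id(resource_id: int, user_id: int) -> str:
    """Encrypt {id, uid}, then sign the ciphertext: token = tag + nonce + ciphertext."""
    plaintext = json.dumps({"id": resource_id, "uid": user_id}).encode()
    nonce = os.urandom(NONCE_LEN)
    body = nonce + _xor(plaintext, _keystream(nonce, len(plaintext)))
    tag = hmac.new(MAC_KEY, body, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(tag + body).decode()


def open_id(token: str, current_user_id: int) -> Optional[int]:
    """Return the resource ID, or None if the token is invalid or belongs to another user."""
    try:
        raw = base64.urlsafe_b64decode(token.encode())
    except ValueError:  # covers binascii.Error (bad padding) too
        return None
    if len(raw) < TAG_LEN + NONCE_LEN:
        return None
    tag, body = raw[:TAG_LEN], raw[TAG_LEN:]
    expected = hmac.new(MAC_KEY, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None  # reject before touching the database
    nonce, ciphertext = body[:NONCE_LEN], body[NONCE_LEN:]
    data = json.loads(_xor(ciphertext, _keystream(nonce, len(ciphertext))))
    if data["uid"] != current_user_id:
        return None  # token was issued to a different user
    return data["id"]


token = seal_id(32588, user_id=7)
print(open_id(token, current_user_id=7))
```

Because of the random nonce, the same ID yields a different token each time, so tokens cannot be compared to infer ordering. The cost is the one already mentioned: the token is much longer than the ID it wraps.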
Can confirm, when I worked in VC we used this to verify order volume for a number of startups we were evaluating. For one startup, I wrote a bot to place a small order a few times a day, and log the order number.
Did you use this for due diligence, to verify that the data the entrepreneur reported in the pitch was accurate, or were you looking at several companies in a specific space and checking which one had more orders?
While I tend to strip the tracking params and there are extensions that do this, I don't think most people do. These URLs are pretty 'ugly'.
So if the links that are being shared most on the internet (YT, TikTok, Twitter) don't care, you probably shouldn't either. I think the onus is on the UI layers (Chat apps, etc) to show urls how they look best on their respective platforms.
Edit: to this point, it looks like HN truncates these to make them less ugly too.
You don't get the cardinality of the data type, just when the object was created.
There are probably some business cases where the “when” information is potentially useful (I can't think of any), but you can't know, for example, how many users are in the database.
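To make the "when" part concrete: in the UUID v7 layout, the top 48 bits are a Unix timestamp in milliseconds, so anyone holding the ID can read the creation time. A small sketch follows; `make_example_uuid7` is a hypothetical helper that builds a sample ID by hand, since not every Python version ships a v7 generator.

```python
import secrets
import uuid
from datetime import datetime, timezone


def make_example_uuid7(unix_ms: int) -> uuid.UUID:
    """Hand-build a UUID with the v7 layout (hypothetical helper for the demo)."""
    value = (unix_ms & ((1 << 48) - 1)) << 80  # 48-bit millisecond timestamp
    value |= 7 << 76                            # version nibble
    value |= secrets.randbits(12) << 64         # rand_a
    value |= 0b10 << 62                         # RFC variant bits
    value |= secrets.randbits(62)               # rand_b
    return uuid.UUID(int=value)


def uuid7_created_at(text: str) -> datetime:
    """Read the embedded creation time out of a UUID v7 string."""
    parsed = uuid.UUID(text)
    if parsed.version != 7:
        raise ValueError("not a version 7 UUID")
    unix_ms = parsed.int >> 80  # top 48 of 128 bits
    return datetime.fromtimestamp(unix_ms / 1000, tz=timezone.utc)


example = make_example_uuid7(1_700_000_000_000)
print(example, uuid7_created_at(str(example)))
```

The remaining bits are random, which is why the ID reveals when a record was created but not how many records exist.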
It's usually benign, but why encode any info into your public IDs? I wouldn't go anywhere near that.
It can make sense for some situational internal database use case where you want temporal locality and can't use full sequential since it's distributed, and even then your DBMS might recommend something else, e.g. Spanner explicitly says not to do this. And it doesn't need to be exposed to users.
Assuming the alternative is a fully random key, this can wreak havoc for performance depending on the database engine and index type used (lots written on this topic).
But I do agree: if performance isn't an issue with your DB choice and you're not interested in getting a free "created_at", you might as well go fully random.
Primary keys have to be chosen carefully because they impact disk layout, joins, etc, and full random makes bad PKs in certain distributed DBs. But it's simple and cheap to convert a public user ID (full random) to/from internal row keys (sequential-ish) at the API boundaries using a secondary index or even a cache.
(Like, simple/cheap enough that it's probably worth doing instead of exposing your row keys. Maybe in some careful cases exposing uuid7 row keys makes sense, but not nearly enough to recommend that so broadly. It's not a safe default.)
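Roughly what that boundary translation could look like, sketched with an in-memory SQLite database. The table and function names are made up for the example; the `UNIQUE` constraint on `public_id` is what provides the secondary index.

```python
import sqlite3
import uuid
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE users (
        id        INTEGER PRIMARY KEY AUTOINCREMENT,  -- internal, sequential row key
        public_id TEXT NOT NULL UNIQUE,               -- random, safe to expose
        name      TEXT NOT NULL
    )
    """
)


def create_user(name: str) -> str:
    """Insert a user and return only the random public ID."""
    public_id = uuid.uuid4().hex
    conn.execute(
        "INSERT INTO users (public_id, name) VALUES (?, ?)", (public_id, name)
    )
    return public_id


def internal_id(public_id: str) -> Optional[int]:
    """Translate a public ID to the internal row key at the API boundary."""
    row = conn.execute(
        "SELECT id FROM users WHERE public_id = ?", (public_id,)
    ).fetchone()
    return row[0] if row else None


alice = create_user("alice")
bob = create_user("bob")
print(alice, internal_id(alice))
print(bob, internal_id(bob))
```

Joins and foreign keys keep using the compact sequential `id` internally, while clients only ever see `public_id`.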