Hacker News

If you're going to be naming a lot of computers, it's surprisingly important to pick a naming format that is (1) expandable and (2) trivially parseable. A naming scheme that seems simple when you're in a garage can become constraining once there are too many machines to track in a spreadsheet.

My favored format is somewhat complex in terms of layout, but is compact and easy to read once you get used to it:

  IATA code (https://en.wikipedia.org/wiki/IATA_airport_code)
  Cluster number (digits)
  'r' (for "rack") \______ if meaningful for you
  Rack number      /       (ignore for EC2/GCP)
  'm' (for "machine")
  Machine number
An example hostname might be `dls1r56m10.mycompany-prod.com`.
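A minimal C sketch of a parser for this layout, assuming lowercase hostnames, a purely alphabetic location code, and an optional rack segment; `struct hostname_parts` and `parse_hostname` are hypothetical names. Each number is read until the next non-digit, so no field has a fixed width:

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed form of a name like "dls1r56m10". */
struct hostname_parts {
    char location[16];      /* IATA code, e.g. "dls" */
    unsigned long cluster;
    bool has_rack;          /* false for EC2/GCP-style names without 'r' */
    unsigned long rack;
    unsigned long machine;
};

/* Read one or more decimal digits at *p and advance *p past them.
   Returns false if *p is not a digit. */
static bool read_number(const char **p, unsigned long *out) {
    if (!isdigit((unsigned char)**p))
        return false;
    char *end;
    *out = strtoul(*p, &end, 10);
    *p = end;
    return true;
}

/* Parse "<letters><digits>[r<digits>]m<digits>[.<domain>]".
   Returns true on success. */
bool parse_hostname(const char *name, struct hostname_parts *out) {
    const char *p = name;
    size_t n = 0;
    while (isalpha((unsigned char)*p)) {    /* location: letters up to the first digit */
        if (n + 1 >= sizeof out->location)
            return false;
        out->location[n++] = *p++;
    }
    out->location[n] = '\0';
    if (n == 0 || !read_number(&p, &out->cluster))
        return false;
    out->has_rack = (*p == 'r');
    if (out->has_rack) {
        p++;
        if (!read_number(&p, &out->rack))
            return false;
    }
    if (*p++ != 'm' || !read_number(&p, &out->machine))
        return false;
    return *p == '\0' || *p == '.';         /* stop at the domain, if any */
}
```

`pdx2m7` (no rack) and `ord1r120m3` (a three-digit rack) go through exactly the same code path as `dls1r56m10`.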

Alternatives that don't work as well:

* Don't use a fixed-width field anywhere. Google used two-letter cluster names, and when those ran out they discovered that the two-letter assumption had worked its way into every layer of the stack. One of the important core services had `uint16_t cluster` in its wire protocol.

* Don't make up your own cluster names. Don't use names like "northwest" or "east". IATA codes are your friend and you will love them because someone else already decided what they should be and wrote them down.

* Don't use fields without delimiters. Being able to say "read digits until the next non-digit" is incredibly useful when writing ad-hoc parsers in shell scripts, because those parsers won't break when you bring up the first datacenter with more than 99 racks. If you tell people not to write hacky ad-hoc parsers in shell scripts, they will (1) do so anyway and (2) not tell you.

* Don't leave off the cluster number. Yes, you only have one cluster in us-west-2 right now, but maybe in five years you'll need to have more than one because you want to run 30,000 EC2 instances there but all your per-cluster infrastructure software falls over at 20,000 instances. Then you can just turn up "pdx2" instead of trying to explain to HashiCorp engineers why you want to run the world's biggest Consul cluster.

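* Do not put the production hostnames under a subdomain of your corporate website. If you are ACME LLC then your hostnames should end with `.acme-prod.com` instead of `.prod.acme.com`. The same is true of corporate IT assets like laptops or workstations (`.acme-corp.com`, NEVER `.corp.acme.com`). Why? Browser cookies: a cookie that the corporate website sets with `Domain=acme.com` is sent to every host under `acme.com`, including your servers, and any host under `acme.com` can in turn set cookies that the corporate website will receive.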
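To make the cookie problem concrete, here is a simplified C sketch of the "domain-match" rule from RFC 6265 (section 5.1.3), assuming lowercase hostnames and skipping the IP-address and public-suffix checks; `domain_matches` is a hypothetical helper. A cookie with `Domain=acme.com` matches `db1.prod.acme.com`, so the browser attaches it to requests to that host, but it does not match `db1.acme-prod.com`:

```c
#include <stdbool.h>
#include <string.h>

/* Simplified RFC 6265 domain-match: would a cookie whose Domain attribute
   is `domain` be sent to `host`? Assumes both strings are already
   lowercase; ignores the IP-address and public-suffix rules. */
bool domain_matches(const char *host, const char *domain) {
    if (domain[0] == '.')               /* a leading dot in Domain= is ignored */
        domain++;
    size_t hl = strlen(host), dl = strlen(domain);
    if (hl == dl)
        return strcmp(host, domain) == 0;
    if (hl < dl)
        return false;
    /* domain must be a suffix of host, preceded by a '.' in host */
    return strcmp(host + (hl - dl), domain) == 0 && host[hl - dl - 1] == '.';
}
```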



> IATA codes are your friend and you will love them because someone else already decided what they should be and wrote them down.

UN/LOCODE may be more appropriate:

* https://en.wikipedia.org/wiki/UN/LOCODE

* https://unece.org/trade/cefact/unlocode-code-list-country-an...

It has both a country code and a location code within that country.
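A sketch of turning a UN/LOCODE into a hostname prefix, assuming the usual shape: a two-letter ISO 3166-1 country code plus a three-character location code (letters, or digits 2 to 9 once a country runs out of letter combinations), optionally separated by a space. `locode_to_prefix` is a hypothetical helper:

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical helper: turn a UN/LOCODE such as "GB BEL" or "GBBEL"
   into a lowercase hostname prefix such as "gbbel".
   `out` must hold at least 6 bytes. */
bool locode_to_prefix(const char *locode, char *out) {
    int n = 0;
    for (const char *p = locode; *p != '\0'; p++) {
        if (*p == ' ' && n == 2)            /* optional space after the country */
            continue;
        if (n >= 5)
            return false;                   /* too long */
        unsigned char c = (unsigned char)*p;
        bool ok = n < 2 ? isupper(c)        /* country: A-Z */
                        : isupper(c) || (c >= '2' && c <= '9');  /* location: A-Z, 2-9 */
        if (!ok)
            return false;
        out[n++] = (char)tolower(c);
    }
    out[n] = '\0';
    return n == 5;
}
```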


I second LOCODEs; they're great.


+1 for airport codes. When I first started working for Netflix I was surprised to learn about this convention, which seems to be used by many CDNs, but it makes a huge amount of sense.

Now if I just had a dollar for each time I've fat-fingered sjc002 to scj002 ... :)


The IATA code thing seems a bit wobbly. Am I really going to sit and work out whether our datacentre in Sutton is closer to Heathrow or City airport?

We name our datacentres with a two-letter city code and a digit (or some letters and then some digits in your framework!). The city codes aren't from any canonical list, but it turns out there aren't enough to matter. So far, this has served equally well at avoiding arguments about what to call things.

We name machines ${datacentre}-${other_stuff}. That makes it trivial to tell exactly which datacentre a machine is in. That's very nice if you have to reason about networking. In your scheme, if you had multiple datacentres near one airport, you would have to know the mapping from cluster to datacentre, right?


The usual (and suboptimal) solution to that one is to just use the biggest airport in the metro area -- e.g. servers in Chicago are tagged ORD even if they happen to be located next door to Midway.

However, IATA does provide city codes even if no airport in that city actually uses them. London's is LON, Chicago's is CHI. It's better to just use those.
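Normalising airports to metro codes can be a simple lookup. This is a minimal sketch with a small, hand-picked excerpt of IATA metropolitan-area codes (not the full list); `metro_codes` and `metro_code` are hypothetical names, and unknown airports are assumed to keep their own code:

```c
#include <stddef.h>
#include <string.h>

/* Hand-picked excerpt of IATA metro-area codes; extend for your own sites. */
static const struct { const char *airport, *city; } metro_codes[] = {
    {"LHR", "LON"}, {"LGW", "LON"}, {"LCY", "LON"}, {"STN", "LON"}, {"LTN", "LON"},
    {"ORD", "CHI"}, {"MDW", "CHI"},
    {"JFK", "NYC"}, {"LGA", "NYC"}, {"EWR", "NYC"},
};

/* Return the metro code for an airport, or the airport code itself when
   the table has no separate metro code for it. */
const char *metro_code(const char *airport) {
    for (size_t i = 0; i < sizeof metro_codes / sizeof metro_codes[0]; i++)
        if (strcmp(metro_codes[i].airport, airport) == 0)
            return metro_codes[i].city;
    return airport;
}
```

With this, a machine next door to Midway and one near O'Hare both get the `chi` prefix.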


Same. Biggest airport in the area unless the data center is somehow VERY close to a secondary airport or something.

Can't speak for others, but one of the main reasons is that if our team has to fly, chances are they're hitting the main airport anyway. Like, we're not going to try to finagle a Spirit or Ryanair flight; we just fly to ORD and take a taxi or Uber. For someone looking to travel on the cheap with no concerns about time, those airlines and airports are fine, but work demands change that math.


I love the idea that someone might book a flight purely based on a hostname. Maybe they even have a Perl script for it!


Actually for Sutton it is probably BQH - Biggin Hill


> IATA code (3 characters)

What do you do if there's no airport?


Pick the nearest airport? Or a nearby airport? Airport in the location's capital city? There's always an airport[0].

The purpose is to have a Schelling point that bypasses any tedious weeks-long arguments. Otherwise your Frankfurt datacenter gets named "ceurope" because the London datacenter got "europe" first, or you named the Ohio datacenter "east" and there's a fight about whether to call the new Virginia datacenter "easter".

[0] If you're building a submerged datacenter in the middle of the Atlantic then ... well, do your best.


“I know we’re physically closer to BDL, but I think we’re culturally closer to JFK and, come on, you know the name is cooler,” says the guy determined to reintroduce bikeshedding into the naming process.


There's only one good "closer to" for these purposes, and that's "by packet latency." The prefix is basically the location of the carrier hotel that serves your DC.


This seems to just shift the problem to mapping an airport to the carrier hotel, though?


Yeah see, it doesn’t necessarily solve any of these problems.

If you have to do all this guesswork or refer to documentation anyway, maybe just use the human readable and immediately interpretable name of the town the DC is in. Increment a number for each new DC in the area. No cognitive hoops to jump through.
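The "increment a number for each new DC in the area" rule is easy to automate. A minimal sketch, assuming numbers are never reused (so it picks one past the highest existing number rather than filling gaps); `next_site_number` is a hypothetical helper:

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Given site names already in use (e.g. "pdx1", "pdx2") and a location
   prefix, return the next number for that prefix (1 if none exist).
   Assumes numbers are never reused. */
unsigned long next_site_number(const char *const *names, size_t count,
                               const char *prefix) {
    size_t plen = strlen(prefix);
    unsigned long highest = 0;
    for (size_t i = 0; i < count; i++) {
        const char *s = names[i];
        if (strncmp(s, prefix, plen) != 0 || !isdigit((unsigned char)s[plen]))
            continue;                       /* different location */
        char *end;
        unsigned long n = strtoul(s + plen, &end, 10);
        if (*end == '\0' && n > highest)    /* whole suffix must be digits */
            highest = n;
    }
    return highest + 1;
}
```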


How many Springfields in the US? (65ish)

How many San Joses in the world? (1700ish)

Pennsylvania has two Baldwins, two Whitehalls and two Elizabeths.


Got it. No data centre in Andorra.


https://unece.org/trade/cefact/unlocode-code-list-country-an...

UN/LOCODE tends to have an abbreviation for most places.


I'm sure someone can find an objection. For example, Belfast (UK) is "GB BEL", but isn't actually in Great Britain (it is in "the United Kingdom of Great Britain and Northern Ireland").


Here, "GB" stands for "United Kingdom of Great Britain and Northern Ireland", not "Great Britain".

> The codes are chosen, according to the ISO 3166/MA, "to reflect the significant, unique component of the country name in order to allow a visual association between country name and country code".[5] For this reason, common components of country names like "Republic", "Kingdom", "United", "Federal" or "Democratic" are normally not used for deriving the code elements. As a consequence, for example, the United Kingdom is officially assigned the alpha-2 code GB rather than UK, based on its official name "United Kingdom of Great Britain and Northern Ireland" (although UK is reserved on the request of the United Kingdom). Some codes are chosen based on the native names of the countries. For example, Germany is assigned the alpha-2 code DE, based on its native name "Deutschland".

https://en.wikipedia.org/wiki/ISO_3166-1


> What do you do if there's no airport?

UN/LOCODE may be more appropriate:

* https://en.wikipedia.org/wiki/UN/LOCODE

* https://unece.org/trade/cefact/unlocode-code-list-country-an...

It has both a country code and a location code within that country.


Or several?


Including location detail in the name only seems appropriate for massive operations with slow, methodical change procedures. Otherwise you end up moving servers and having names that incorrectly suggest where they are mounted.


> Do not put the production hostnames under a subdomain of your corporate website. ... Why? Browser cookies.

Is there more to say about this? How do browser cookies conflict with server and PC hostnames?


Interesting, I will bookmark this. Using airport codes for location is surprising to me.


You can pack 3 letters into uint16_t, 5 bits per letter.
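A sketch of that packing in C, assuming lowercase letters only; `pack3` and `unpack3` are hypothetical names. Each letter maps to 0..25, which fits in 5 bits, so three letters use 15 bits:

```c
#include <stdbool.h>
#include <stdint.h>

/* Pack a 3-letter lowercase code such as "dls" into 15 bits of a uint16_t,
   5 bits per letter ('a' = 0 ... 'z' = 25). Returns false on bad input. */
bool pack3(const char *code, uint16_t *out) {
    uint16_t v = 0;
    for (int i = 0; i < 3; i++) {
        if (code[i] < 'a' || code[i] > 'z')
            return false;                   /* also rejects a short string */
        v = (uint16_t)((v << 5) | (uint16_t)(code[i] - 'a'));
    }
    if (code[3] != '\0')
        return false;                       /* longer than 3 letters */
    *out = v;
    return true;
}

/* Inverse of pack3; `out` must hold 4 bytes. */
void unpack3(uint16_t v, char out[4]) {
    for (int i = 2; i >= 0; i--) {
        out[i] = (char)('a' + (v & 0x1f));
        v >>= 5;
    }
    out[3] = '\0';
}
```

Bit 15 is always clear in this encoding, so one bit is left over.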


That would have been one way to implement it, certainly.

The actual implementation was something like this:

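  const char *cluster = "ex";
  /* First letter in the high byte, second in the low byte. Combine with |; & would always give 0. */
  uint16_t enc_cluster = (uint16_t)((((uint16_t)cluster[0]) << 8) | (uint16_t)cluster[1]);
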
A proposal to reserve the high bit to signal a "long name" was unfortunately(?) not accepted.
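A hypothetical sketch of what a high-bit scheme could look like (not the actual proposal, whose details aren't given here): when two ASCII letters are packed with the first in the high byte, bit 15 is never set, so a set bit 15 could mean the low 15 bits index a table of longer names. `LONG_NAME_FLAG`, `long_names`, and `decode_cluster` are all made up for illustration:

```c
#include <stddef.h>
#include <stdint.h>

#define LONG_NAME_FLAG 0x8000u   /* hypothetical: bit 15 marks a long name */

/* Hypothetical registry of names longer than two letters. */
static const char *const long_names[] = {"dls1", "pdx2"};

/* Decode a 16-bit cluster id. `out` (3 bytes) receives legacy two-letter
   names. Returns the name, or NULL for an unknown long-name index. */
const char *decode_cluster(uint16_t id, char out[3]) {
    if (id & LONG_NAME_FLAG) {
        size_t idx = id & 0x7fffu;
        return idx < sizeof long_names / sizeof long_names[0] ? long_names[idx] : NULL;
    }
    /* Legacy form: ASCII letters are below 0x80, so bit 15 is clear here. */
    out[0] = (char)(id >> 8);
    out[1] = (char)(id & 0xff);
    out[2] = '\0';
    return out;
}
```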

This was nearly a decade ago, so things may be different now. You'd have to ask someone who currently works there to tell you what their cluster names look like.



