Hacker News

More context: This is related to today's release of the Spring Top 500 list (https://news.ycombinator.com/item?id=40346788). Aurora rated 1,012.00 PetaFLOPS/second Rmax, and is in 2nd place, behind Frontier.

In the November 2023 list, Aurora was also in second place, with an Rmax of 585.34 PetaFLOPS/second.

See https://www.top500.org/system/180183/ for the specs on Aurora, and https://www.top500.org/system/180047/ for the specs on Frontier.

See https://www.top500.org/project/top500_description/ and https://www.top500.org/project/linpack/ for a description of Rmax and the LINPACK benchmark, by which supercomputers are generally ranked. The Top 500 list only includes supercomputers that are able to run the LINPACK benchmark, and where the owner is willing to publish the results.

The jump in Aurora's Rmax score is explained by Aurora's difficult birth. https://morethanmoore.substack.com/p/5-years-late-only-2 (published when the November 2023 list came out) has a good explanation of what's been going on.



Looking at the two specs, interesting to see how Frontier (the first, running AMD CPUs) has much better power efficiency than Aurora (the second, running Intel), 18.89 kW/PFLOPS vs 38.24 kW/PFLOPS respectively... Good advertisement for AMD? :)
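A quick sanity check on those two figures (using only the Rmax and kW/PFLOPS numbers quoted in this thread; the implied total draw is my own derivation, not from the specs pages):

```python
# Efficiency figures quoted above (kW per PFLOPS of Rmax).
frontier_kw_per_pflops = 18.89  # AMD-based system
aurora_kw_per_pflops = 38.24    # Intel-based system

# Aurora's total draw implied by its 1,012.00 PFLOPS Rmax (approximate):
aurora_rmax_pflops = 1_012.00
aurora_total_kw = aurora_rmax_pflops * aurora_kw_per_pflops
print(f"Aurora implied draw: {aurora_total_kw / 1000:.1f} MW")

# Relative efficiency gap:
ratio = aurora_kw_per_pflops / frontier_kw_per_pflops
print(f"Aurora uses ~{ratio:.2f}x the power per PFLOPS of Frontier")
```

So by these numbers Aurora needs roughly twice the power per PFLOPS that Frontier does.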


These days this is true from top to bottom: desktop, servers, ... Even in gaming, the 7800X3D is cheaper than the 14700K, it is also more performant, and yet it uses roughly 20% less power at idle, and the gap only grows at full load.

AMD's current architecture is very power efficient, and Intel has more or less resorted to pumping extra watts into the chip to catch up in performance.


Is there any good estimate of how much of AMD’s power efficiency advantage can be attributed to TSMC’s process vs Intel’s? I know in GPUs AMD doesn’t enjoy the same advantage vs nVidia since they’re both manufactured by TSMC, and with nVidia actually being on a smaller node, iirc.


7800x3d maxes out around 80 watts (has to be gentle to the vcache), the 14900k can go up to 300w (out of box, though Intel is issuing a new bios to limit that), and they trade blows in gaming.

I would say that's a bit more than process efficiency?

https://youtu.be/2MvvCr-thM8?t=423


Oh, certainly there are significant architectural advantages, especially for the vcache SKUs in gaming. It would just be interesting to see how much TSMC is still (or maybe further) ahead of Intel. Intel was so used to having the process advantage over AMD that their architecture could afford to be less efficient. But now that they're the ones behind in both process and architecture, they're really hurting, especially on mobile, now that AMD is making inroads and Snapdragon X is about to get a serious launch in a week. I'm typing this on a ThinkPad X13s with a Snapdragon 8cx CPU running Windows, and it's a pretty usable device that lasts much longer on a smaller battery than my comparable Intel laptop. It seems to use particularly little power on standby, although it can't seem to wake up from hibernation reliably.


Aurora has 21K Xeons and 64K Intel Xe GPUs, which provide most of the compute power. The GPUs are made by TSMC.

https://en.wikipedia.org/wiki/Intel_Xe


I was under the impression that AMD desktops/home servers generally don't go below 15-20 W, while Intel can get down to 4-6 W idle for the full system. Has that changed? AMD seems to generally be the better perf/$, but I thought power usage at idle was their big drawback for desktops/low-usage servers.

IIRC the numbers I've read are that (at least desktop) Intel CPUs should be using something like 0.2 W package power at idle if the OS is correctly configured, regardless of whether it's a performance (K) or "efficiency" (T) model. Most power usage is the rest of the system.


https://en.wikipedia.org/wiki/Cool%27n%27Quiet

They both have similar frequency and voltage scaling algorithms at this point. You will probably not see 0.2W idle though, both probably idle around 10W on desktop and 5W on laptop. But Intel is getting much more aggressive with "turbo boost" to try to hide their IPC/process deficit vs. AMD/TSMC, to the point that a 14900k will use 120W+ to match the performance of a 7800x3d at 60W.


As far as I can gather, that's not the case. These guys[0] have been crowdsourcing information about power efficiency for a while now, and the big takeaways right now seem to be that

* Intel is the best for idle (there's several people that have systems that run at less than 5 W for the full system using modified old business minipcs off ebay). Allegedly someone has a 9500T at less than 2 W full system power.

* It doesn't matter which Intel processor you use; all of them for many years will get down to 1 W or less for the CPU at idle. A 14900K will idle just as well as an 8100T, which will be much better than a Ryzen 7950X.

* AMD pretty much never gets below 10 W with any of the Ryzen chiplet CPUs. Only their mobile processors can do it, but they don't sell them retail and they're usually (always?) soldered.

* Every component except the CPU is more important. Your motherboard and PCIe devices need to support power management. You need an efficient PSU (which has nothing to do with the 80-plus rating, which doesn't consider power draw at idle). One bad PCIe device like an SSD or a NIC can draw 10s of watts if it breaks sleep states. Unfortunately, this information seems to be almost entirely undocumented beyond these crowdsourcers.

For a usually idle home-server, Intel seems to be better for power usage, which is unfortunate because AMD tends to have more IO and supports ECC.

[0] https://www.hardwareluxx.de/community/threads/die-sparsamste...


Also the delta between theoretical performance and benchmarked performance is much smaller for Frontier (AMD) than for Aurora (Intel).

That being said, note that the software is also different on the two computers.


Wouldn't be surprised if it's the same thing: more wattage, more heat, more throttling.


Note all mentions of FLOPS in this thread refer to FP64 (double precision), unlike the more popular "AI OPS" figures, which are typically INT8, as quoted for modern GPUs.


> which are typically INT8

These systems are used for training which is VERY rarely INT8. On Frontier, for example, it's recommended to use bfloat16 or float32 if that doesn't work for you/your application.

Nvidia has FP8 with >=Hopper and supposedly AMD MI300 has it as well although I have no experience with the MI300 so I can't speak to that.


What does FLOPS/second mean? Isn’t FLOPS already per second? Are they accelerating?


I'd actually be interested in an estimate of the world's overall flop/s^2. Could someone please run a back of the envelope calculation for me, e.g. looking at last year's data?


We added 6.3 gigaFLOPS per second in 2022-23, based on an increase of 200 million gigaFLOPS observed over that period. This is in contrast to 20 gigaFLOPS per second in 2021-22. It's nil in 2020-21, but that seems only partially attributable to the pandemic, as there appears to be a tick-tock pattern going back to 2013.

https://ourworldindata.org/grapher/supercomputer-power-flops
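The back-of-envelope arithmetic above can be reproduced directly (the 200 million gigaFLOPS increase is the figure quoted from the OWID data):

```python
# Express a one-year increase in total capacity as a rate of change,
# i.e. gigaFLOPS added per second: the "flop/s^2" asked for above.
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # ~3.156e7 seconds

increase_gigaflops = 200e6  # the 2022-23 increase quoted above
rate = increase_gigaflops / SECONDS_PER_YEAR
print(f"~{rate:.1f} gigaFLOPS added per second")
```

200e6 / 3.156e7 comes out to ~6.3, matching the figure in the comment.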


Yeah, the top500 pages cited use Flop/s (apparently using "Flop" for "Floating point operations" – not sure which "o" and "p" are used). I could swear I've seen it expanded as "FLoating point Operations Per Second" when I first encountered it. "FLOPS/s" seems to be using "FLOPS" like the "Flop" above (probably as "FLoating point OPerationS"), in which case the "/s" makes sense.



Made me chuckle. F=ma, where a is the derivative of FLOPS with respect to time.


Some people treat FLOPS as “FLoating point OPerationS”.


But that doesn't make much sense for evaluating system performance. A Pentium III could perform a billion FP32 operations given almost 16 years, but you wouldn't say it's 1 GFLOPS. Assuming the "S" is seconds, it becomes a useful metric and we can say it has 2 FLOPS.
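The arithmetic behind that "2 FLOPS" figure, for anyone checking:

```python
# A large operation *count* spread over a long time is still a tiny *rate*.
ops = 1e9                           # one billion FP32 operations
seconds = 16 * 365.25 * 24 * 3600   # ~16 years in seconds (~5.05e8)
flops = ops / seconds
print(f"{flops:.1f} FLOPS")         # ~2 operations per second
```

Which is why the per-second denominator is what makes the unit meaningful.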


Then it should be "FLOPs" to indicate that the S is not a separate word in the acronym, just the plural form.


It’s not an acronym, it’s an abbreviation. Unfortunately, the rules for abbreviations are effectively arbitrary (or specific to some etymology that’s not available from context). The “s” could be op(s) or seconds, but back in the 90s trade publications it was seconds.


I don't know anything about supercomputer architecture; are lifetime upgrades that double the performance typical, let alone YoY?

What do those kinds of upgrades entail from a hardware side? Software side? Is this just a horizontal scaling of a cluster?


This isn’t really an upgrade, it’s the system still being commissioned.

See the last paragraph of my post for a link to more info.



