Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

The moral is that POPCNT is critical to zillions of massive optimizations, and omitting it from base instruction sets, as has been done over and over and then later corrected, each time at great expense, is extremely foolish. The latest offender was RISC-V, but the overwhelming majority of x86 code is still compiled to a target version lacking it.


Well, actually, in this case, no, POPCNT is unnecessary here, since you don't actually have to count anything. you only need to know if the bitmask changed when you added a new character ;)

But otherwise, yes.

I spent quite a while optimizing GCC's bitmap operations years ago, including implementing some new sparse bitmap types, and the sparse bitmap types i implemented ended up dependent on the speed of popcnt and friends, so yes.


In practice if you want the popcount of anything larger than a couple registers you'll want to use vector instructions anyways though. There are a lot of operations that will speed up certain applications if made into their own instruction, not all of them need to be.


it's not sufficient that the newest character in the window is unique? the older characters need to be unique too.


I posted how to do it in another comment on the thread.


If you're using OR (unlike the OP) how are you removing the left side of the window? And still, even if the newest character is unique, previous characters might not be


Input: "abccdef"

Your algorithm will report a match at "bccd" since the "a" wasn't removed from the bitmask.


Yes, you are correct you would have to xor them back out


I swear I have seen this conversation before. A glitch?


If you XOR them together, you get back to the original conversation.


No - he responded on both parts of the thread where i commented. I started to respond here as well, then updated it to redirect to the other comment so we don't end up with two long threads.


It is good to remember that the instruction POPCNT was introduced for the first time in the instruction set of a computer by Alan Turing, already in 1949, in Ferranti Mark I (as "sideways add").

Unfortunately few other computers have included it before Cray 1, which made it well known, under the current name.


We had it on the CDC 6600 in 1963 as 'Bi NXj' if I remember rightly.


Sideways add makes a bit of sense for the name of this operation. But I can't figure out why Cray named it "population count." It just seems odd to refer to the set bits as part of a population that you're counting.


legend say that popcnt was the "NSA" instruction, was heavily used for cryptographic analysis and that was kept out of common instruction sets for a long time to give NSA an advantage.

It is probably just a legend though.


The usefulness of popcnt, et al, in cryptography was known at least as far back as Alan Turing. It wasn't 'kept out of' ISAs, if for no other reason than not all computer manufacturers were (are) American, so the NSA wouldn't have had much leverage to keep, say, Ferranti or Hitachi from including it in their computers.

The legend you're probably misremembering is the one where the NSA approached Seymor Cray at CDC while he was designing the 6600 super and 'suggested' that if he included a popcnt instruction in the ISA, the NSA would certainly look favorably on purchasing some. He did and they did (quite a few). This story is also possibly apocryphal.


> The usefulness of popcnt, et al, in cryptography was known at least as far back as Alan Turing

Ok. Why??????????????

@gpderetta is correct at least in quoting hacker's delight where it was also said to be rumoured the NSA wanted popcount but it was unclear to HD's author why they wanted it.


IIRC the Colossus machine built during WW2 was basically counting the number of bits resulting from doing various boolean operations on the input. It was used to crack the Lorenz cipher, which XORed the plaintext with a pseudo-random keystream (generated using mechanical rotors!).

Cryptography has advanced since then - and I'm not an expert - but there may still be statistical weaknesses that could be found by counting bits?


Why??????????????

It accelerates the calculation of the Hamming weight of a vector or string. Hamming weight is useful lots of places in crypto, like helping frequency analysis, or observed power consumption attacks against crypto systems. It's useful in many other disciplines as well.


I think it is generally taken as useful in, specifically, cryptanalysis, presumably for estimating Hamming distance.


Oh right! Hacker's Delight might be where I read the story, and likely I misremember the details.


I never think of POPCNT because it's not a language primitive in any languages I use.

I would do it by storing the product of the counts; divide by the outgoing count before decrementing, multiply by the incoming count after incrementing. If the product is one they're all unique. Scaling should be comparable to POPCNT.


>RISC-V

Doesn't B have a POPCNT?

B is mandatory in RVA22.


These are the cpop and cpopw instructions.

https://github.com/riscv/riscv-bitmanip/blob/main/bitmanip/i...


Can you buy any RISC-V embodiment that implements such an instruction? It seems to be extremely optional.


B is pretty recent, but Sifive p650 will have it

https://www.sifive.com/cores/performance-p650

Heres another:

https://www.andestech.com/en/2022/12/08/andes-technology-unv...

I expect all future Linux cores to have it


>I expect all future Linux cores to have it

All RVA22 profile cores will, as RVA22 requires B extension.

>Sifive p650 will have it

Current versions of U74 also have it. And a U74 with B is already shipping in VisionFive 2.


It is part of B extension, so it should be available in the VisionFive2 (now being shipped), which has the newer version of SiFive U74 that has B.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: