
1. Want a simpler ISA

2. Build it

3. Realize adding a complex instruction that's weird can really boost performance in key use cases

4. Repeat 3 until someone thinks your ISA is overcomplicated and makes a new one



And not only for Instruction Set Architectures. I feel this might be the case for software too.

1. Want a simpler application

2. Write it

3. Realize adding complex code that's weird can really boost performance

4. Repeat 3 until someone thinks your application is overcomplicated and makes a new one

I guess the moral here is that computers are complicated and trying to avoid complexity is hard or infeasible.


Except it's usually not performance, it's just features. And then it gets bloated. And someone thinks they don't need all that crap.

But turns out they did.


I feel like there are massive industries that exist just because of this


Everyone uses about 5% of the features of their word processors, but it is a different subset for everyone, so all features are needed by someone and most get roughly equal usage.


I do not think that is true. There are definitely more and less commonly used features. Everyone uses basic formatting, but only a small minority of users use index or bibliography features. The features might matter a lot to that small minority, but they are not anything like equally used.


There is a small minority of features that everyone uses. After that it falls off quickly.


Ehh, the issue is features tacked on without regard to existing ones. A lot of apps like that end up with multiple ways to do the exact same thing, but with very slightly different use cases.


Jira has entered the chat...


Or programming language design. However, I think during the process things still get distilled, so that new common patterns are incorporated. I am sure null-terminated strings were not a particularly bad idea in the 70s, given the constraints developers faced at the time. It's just that we now have different constraints and more accumulated experience, and have finally realized that it is an unsafe design.


I expect it's basically always been understood that null-terminated strings were unsafe (after all, strncpy has existed since the 70s [1]), more just that the various costs of the alternatives (performance, complexity, later on interop) weren't seen as worth it until more recent times. And it's not like they didn't get tried— Pascal has always had length-prefixed strings as its default, and it's a contemporary of C.

[1]: https://softwareengineering.stackexchange.com/a/450802


We are coming full circle back to Pascal strings, just that now we don't mind using 32 or 64 bits for the length prefix. And in the cases where we do mind, we are now willing to spend a couple of instructions on variable-length integers.
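To make that concrete, here is a minimal C sketch of both ideas; the pstr type and varint_encode helper are hypothetical names for illustration, not from any particular library:

    #include <stddef.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    /* Pascal-style string: explicit length instead of a NUL terminator.
     * size_t is the "we don't mind 64 bits for the prefix" case. */
    struct pstr {
        size_t len;
        char   data[];       /* C99 flexible array member */
    };

    struct pstr *pstr_from(const char *s) {
        size_t n = strlen(s);
        struct pstr *p = malloc(sizeof *p + n);
        if (!p) return NULL;
        p->len = n;
        memcpy(p->data, s, n);
        return p;
    }

    /* The "couple of instructions" case: LEB128-style varint, 7 bits
     * per byte, high bit set while more bytes follow. */
    size_t varint_encode(uint64_t v, unsigned char out[10]) {
        size_t i = 0;
        do {
            unsigned char b = v & 0x7F;
            v >>= 7;
            if (v) b |= 0x80;
            out[i++] = b;
        } while (v);
        return i;
    }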

But in the bigger picture, the wheel of programming languages is a positive example of reinvention. We do get better at language design, not just because requirements become more relaxed with better hardware and better compilers, but also because we have gained decades of experience about which patterns and language features are desirable. And the evolution of IDEs plays a huge role: good indentation handling is crucial for languages like Python, and since LSP made intelligent auto-complete ubiquitous, strong typing has become a lot more popular. And while old languages do try to incorporate new developments, many have designed themselves into corners. That's where new languages can gain a real advantage: by using the newest insights and possibilities from the get-go, depending on them in standard library design, and leaving out old patterns that have fallen out of favor.

No modern language in its right mind would still introduce functions called strstr, atoi, or snwprint. Autocomplete and large screens have made such function names antipatterns. But C can't easily get rid of them.


I think saying "software" is much too broad, and you have to narrow the comparison to a small subset of software development for it to make sense. With software, typically you're dealing with vague and changing requirements, and the hope is that if you build five simple applications, four will be basically adequate as written, needing only incremental feature enhancements, and only the fifth needs significant work to rise to the emerging complexity of the problem. (The ratio can be adjusted according to the domain.)

In this case they're creating a new solution to a problem where all previous solutions have ended up extremely complex, and the existing range of software currently running on x86 and ARM gives them a concrete set of examples of the types of software they need to make fast, so they're dealing with orders of magnitude more information about the requirements than almost any software project.

The closest software development equivalent I can think of would be building a new web browser. All existing web browsers are extremely complex, and you have millions of existing web pages to test against.


Yeah, definitely this. We think that the specs are pretty much baked after a year or two of shipping, so the focus is now on making things faster, which requires very complex algorithms, but don't worry, once we get it done, it won't have to change anymore! Right? ... then new use cases come in. New feature requests come in. We want to adapt the current code to the new cases while still covering the existing ones and also maintaining all the performance boost we gained along the way. But the code is such a mess that it is just not feasible to do so without starting from scratch.


Your understanding of programming is superficial to the point of being unfixable by explaining why :(


Explain why please


Exempli gratia: the last statement is approximately equivalent to "I guess computer science is infeasible."

But it is not: the fact that Debian, a UNIX-like system running the programming languages in use today, exists proves that we have already managed a significant amount of complexity. Compare that to programming an infinite-size ENIAC. You would certainly rather use Debian than be stuck with the latter, given that you could not magically conjure the existing systems for it and had to write your whole program from scratch, either on tape or on the Debian system.

There are many other layers of issues and problems with the comment, but understand that no matter how intellectually honest and kind you are, you cannot reply to every person who is wrong and explain why. Often a plain "you are wrong" may be more valuable, mutually. I do not want to argue about this, though.


...or you can cheat and write the weird code as a separate, child app.


though I believe in RISC-V's case what will happen is that every vendor will have that realization at the same time, not tell anyone, and make an extension, and now there are five different incompatible encodings for the same operation.


And that doesn't matter, because:

- Such custom extensions live in custom extension space.

- A software ecosystem that must work across vendors will use none of them.

- If these extensions actually do something useful, the experience from them will be leveraged to make a standard extension, which will at some point make it into a standard profile and thus be adopted by the software ecosystem.


Open source fundamentally changes that situation. All you need is a maintained version of GCC/LLVM that supports your processor and you'll have a distro that supports your needs, especially if it's just about some performance-boosting instructions. It's not going to be an issue; we really aren't in a binary-only world anymore, for the most part.


RISC-V had a fantastic opportunity to be the one true microcontroller ISA, but the attempts to become an "everything, everywhere, all at once" ISA have been fascinating to see throughout the whole process.


RISC-V is not an ISA but a family of ISAs. That is why it has a small core and plenty of extensions. So instead of having a dozen completely different ISAs, you now have a dozen almost-the-same ISAs. It was meant to be forked and customized for specific needs.

Ideally, your needs can be met by selecting from the basket of already ratified extensions. So what if no one else has the exact set of extensions you do? A lot easier to support if you share a common core with already supported chips.

Look at how many programming languages look like C and how much easier it is to pick them up than, say, APL. Think of RISC-V as the C of the future.


It seems it's very hard to avoid trying to become everything everywhere. It looks easy from our point of view, but when you're at the wheel, it looks like the most reasonable course of action.


RISC-V will ALWAYS be simpler than x86 because the core of the ISA isn't a disaster. Around 90% of executed x86 instructions come from just 12 major instructions. x86 gets those instructions very wrong while RISC-V gets them very right. This guarantees that RISC-V will ALWAYS get 90% of code right.

The main benefit of variable-length encoding is code density, but x86 screwed that up too. An old ADD instruction was 2 bytes. Moving to AMD64 made that 3 bytes, and APX will make it 4 bytes. RISC-V can encode most ADDs in just 2 bytes. In fact, the average x86 instruction length is 4.25 bytes vs 4 bytes for ARM64 and just 3 bytes on average for RISC-V.
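To put bytes to that claim, here are the reference encodings for the specific cases above, as I read the encoding tables (an illustrative sketch with arbitrary register choices, not a density benchmark):

    #include <stdint.h>

    /* x86:    add eax, ebx  -> 2 bytes (legacy 32-bit form, 8 registers) */
    static const uint8_t x86_add32[] = {0x01, 0xD8};
    /* x86-64: add rax, rbx  -> 3 bytes (REX.W prefix widens to 64-bit) */
    static const uint8_t x86_add64[] = {0x48, 0x01, 0xD8};
    /* RISC-V: c.add a0, a1  -> 2 bytes (compressed; computes a0 += a1) */
    static const uint8_t rv_c_add[] = {0x2E, 0x95};  /* 0x952E, little-endian */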

x86 made massive mistakes early on, like the parity flag (still there today because Intel was trying to win a terminal contract with the 8008), x87, and nearly 30 incompatible SIMD extensions. RISC-V avoided every one of these issues and created actually good designs.

Lessons from previous RISC ISAs were also learned so bad ideas like register windows, massive "fill X registers from the stack", or branch delay slots aren't going to happen.

I hear the claim that "it'll become overcomplicated too", but I think the real question is "What's left to screw up?"

You have to get VERY far down the long tail of niche instructions before RISC-V doesn't have examples to learn from, and those marginal instructions aren't used very much, which makes them easier to fix if needed anyway. This is in sharp contrast with x86, where even the most basic stuff is screwed up.


> x86 gets those instructions very wrong while RISC-V gets them very right

That is a heavy exaggeration.

> An old ADD instruction was 2 bytes.

By "old" you mean.. 32-bit adds with 8 registers to choose from. Which are gonna be the majority of 'int' adds. Unfortunately, 64-bit, and thus pointer, 'add's do indeed require 3 bytes, but then you get a selection of 16 registers, for which RISC-V will need 4-byte instructions, and on x86 a good number of such 'add's can be "inlined" into the consumer ModR/M. (ok, for 'add' specifically RISC-V does have a special-cased compressed instruction with full 5-bit register fields (albeit destination still same as a source), but the only other such instruction is 'mv'. At least x86 doesn't blatantly special-case the encoding of regular 'add' :) )

> nearly 30 incompatible SIMD extensions

30, perhaps, but incompatible? They're all compatible; most depend on the previous one (the exceptions being AMD's one-off attempts at making its own extensions, and AVX-512, though Intel is solving that with AVX10), and they are routinely used together (with the note that you can get bad performance when mixing legacy-prefix and VEX-prefix instruction forms, but that's trivial to avoid, so much so that an assembler can upgrade the instructions for you).
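For illustration, a sketch of how that mixing penalty is conventionally avoided in C with intrinsics; legacy_sse_kernel is a hypothetical routine assumed to be compiled without AVX, i.e. with legacy-SSE encodings:

    #include <immintrin.h>

    void legacy_sse_kernel(float *dst, const float *src);  /* hypothetical */

    void avx_then_legacy(float *dst, const float *a, const float *b) {
        /* 256-bit VEX-encoded work dirties the upper YMM state. */
        __m256 x = _mm256_add_ps(_mm256_loadu_ps(a), _mm256_loadu_ps(b));
        _mm256_storeu_ps(dst, x);
        /* Clear the upper state before entering legacy-SSE code, avoiding
         * the AVX->SSE transition penalty. Compilers insert this
         * automatically around calls, which is why mixing rarely bites. */
        _mm256_zeroupper();
        legacy_sse_kernel(dst + 8, a + 8);
    }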

RISC-V isn't cheaping out on vector extensions either: besides the dozen subsets/supersets of 'v' with different minimum VLEN and supported element types (...and also 'p', which I've still heard about being supported on some hardware despite being approximately dead), it already has a dozen vector extensions: Zvfh, Zvfhmin, Zvfbfwma, Zvfbfmin, Zvbb, Zvkb, Zvbc, Zvkg, Zvkned, Zvknhb, Zvknha, Zvksed, Zvksh.

> massive "fill X registers from the stack"

https://github.com/riscv/riscv-isa-manual/blob/176d10ada5d8c...

Granted, that's optional (though, within a rounding error, all of RISC-V is) and meant primarily for embedded, but it still is a specified extension in the main RISC-V spec document.

Now, all that said, I'd still say that RISC-V is pretty meaningfully better than x86, but not anywhere near the amount you're saying.


I'd love to see a ground-up ISA that takes a CISC-style approach without all the memory hazards (and other faults) baked into it. Decode logic is "cheap" these days relative to other things on the CPU so why not lean into it?


Maintaining separation of concerns is the complete opposite of a CISC-style approach. If you are implementing garbage collection or OOP in hardware, you are bound to conflate things in weird ways just like this becomes unavoidable in software too.


This requirement is captured by MASKMOVDQU. It's clearly an initialism.


Reminds me of the XKCD about standards.

1. There are 5 different standards.

2. Some well-meaning person says "this is ridiculous, this should be standardized"

3. There are 6 different standards.


Time/date being the best example of this. I think I've seen every possible variation by now, except maybe putting the year in between the month and day. Drives me absolutely crazy.


A man finds a shiny bottle on the beach.

He rubs it, and a genie emerges.

  Genie: "I shall grant you one wish, mortal."

  Man: "I wish to bring my grandmother back to life."

  Genie: "Apologies, but reviving the dead is beyond even my vast powers. Perhaps you'd like to make a more... achievable wish?"

  Man: "Alright then, I wish for worldwide adoption of the ISO 8601 date format."

  Genie: "...Uh, where did you say your grandmother was buried?"


Please, at least ask for RFC 3339.
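For the curious, a minimal C sketch of emitting such a timestamp, assuming a POSIX environment for gmtime_r (RFC 3339 is essentially the strict interchange profile of ISO 8601):

    #include <stdio.h>
    #include <time.h>

    int main(void) {
        char buf[32];
        time_t now = time(NULL);
        struct tm tm;
        gmtime_r(&now, &tm);   /* UTC, hence the 'Z' suffix below */
        /* Full date, 'T' separator, explicit UTC designator. */
        strftime(buf, sizeof buf, "%Y-%m-%dT%H:%M:%SZ", &tm);
        puts(buf);             /* e.g. 2024-05-04T12:00:00Z */
        return 0;
    }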



