Thanks for writing this up, though I feel that it may be a bit premature since at this point hardly anyone has any real experience writing Hare code. Regardless, I understand that this is a contentious design decision of Hare, so I would be happy to explain it in more detail.
We have discussed adding first-class maps to the language many times. We recognize the value in this feature and have tried to come up with a good way of doing it that fits within the design constraints of the language, but it has several design issues. There are three key problems: finding a good way of hashing arbitrary data structures (or arbitrarily limiting the kinds of keys the map can store), finding a good way of determining equality of arbitrary data structures, and dealing with memory allocation semantics. Hare does not have generics, and the alternative is a hands-free approach to hashing and equality, which has some troubling limitations, some of which are very unintuitive and non-obvious (which conflicts with our values for explicitness). The allocation problem is also troublesome: Hare uses manual memory management, and any hash map solution which involves magic behind-the-scenes allocations has serious design conflicts with Hare's principles.
This article criticizes a sample of a hash map implemented for the build driver. This hash map is used to store a mapping of module details keyed on the module's namespace. It is true that it is a fixed size map, and that collisions can easily be found for the fnv32 hash. These constraints limit the ability for this hash map design to generalize. However, this is not supposed to generalize. It's designed to be a special-purpose hash map for this specific use-case, and takes the simplest approach which is sufficient to this specific problem. As the author notes, there are many approaches to hash maps and there is no one-size-fits-all solution. So far as hash collisions are concerned, these are very unlikely in this use-case. This is not a hash map where hash flooding is a concern, and accidental collisions are so unlikely as to be a negligible risk. If this were not the case, it is easily fixed by just comparing each bucket entry by the namespace rather than by the hash. For use-cases where these things do matter, I would be interested in seeing something like siphash end up in the Hare standard library.
Recall that Hare is designed to be similar to C in terms of scope and goals. C also provides no general-purpose hash map, and little by way of other data structures (though some attempts exist, none of them good). Each of these approaches and concerns comes with different needs and trade-offs, and Hare places responsibility for evaluating these needs and trade-offs into the capable hands of the programmer. This is a reflection of Hare's values, which are distinct from the values of some other languages mentioned in the OP - Rust, Go, Zig, C++, etc.
Thanks for providing me the opportunity to clarify this here, though perhaps this merited a blog post rather than an overlong HN comment. I understand that this is a design decision which appears especially confusing from the outside, but know that it was made through careful deliberation and discussion.
Oh-- and a map-like thing for net::uri would be nice to have as a convenience function in the future. We need to implement the basic approach using the most fundamental design and then build the convenience on top of it, in order to accommodate all use-cases and leave the trade-offs for the user to make.
> We recognize the value in this feature and have tried to come up with a good way of doing it that fits within the design constraints of the language, but it has several design issues.
> Recall that Hare is designed to be similar to C in terms of scope and goals.
This sounds like the language inherits all the limitations of C and isn't able to provide sufficient expressivity to solve problems that you yourself consider valuable (which aren't that many, considering the niche scope). People move away from C because of these limitations; how do you expect they would choose Hare as a replacement?
It does inherit some of the limitations of C, but in exchange, it also inherits much of the power. The trade-offs are similar to C, which is desirable in many cases. But, you may be working under different constraints, in which case you may be better off with another language - which is totally fine!
> So far as hash collisions are concerned, these are very unlikely in this use-case. This is not a hash map where hash flooding is a concern, and accidental collisions are so unlikely as to be a negligible risk.
The current compiler will silently ignore colliding hashes and (I believe) produce a very confusing error. The probability of this happening is not huge, but it's not small enough to rule out someone hitting it, possibly this year. At the very least you need to report hash collisions, so that the user can rename their modules and so on.
Checking the bucket entries based on their module identifier instead of (or in addition to) their hash is a simple and obvious improvement to this code.
> There are three key problems: finding a good way of hashing arbitrary data structures (or arbitrarily limiting the kinds of keys the map can store), finding a good way of determining equality of arbitrary data structures, and dealing with memory allocation semantics.
It's not unheard of or unreasonable to have collections that take an allocator as an argument. And a comparator and a hash function as arguments.
I think adding generic types and/or a notion of an interface might increase the complexity of the language, but it will also become far more expressive. And yes, these are tools that are sharp and require caution, especially when designing a standard library - you want to think hard about the constraints you'd impose on the consumers of your standard library when picking which interfaces require what methods etc, but I still think that there's not much to gain from a language without these features when compared to C.
Aye, we would have probably gone with a design similar to what you suggest... if we had generics, but we do not. Generics simply don't fit with the core values of the language. I still believe that there is room for languages without them.
Maybe with some default options, that should be enough for at least the basics.
Note that the indirection overhead would be significant, but without generics or something similar, you don't really have a choice.
The problem here is *void, which is not great. You lose type safety with this approach, and the allocation problems are still present. Rolling your own can be both more type safe and better suited to your problem area, and it's not particularly difficult.
What about a use case where you want to use a hash map to store 3 different types of keys independently? Would you implement the hash map once and just store a union of the 3 types, then assert the correct type has been stored at each call site? Would you implement the hash map 3 different times? What happens if, some years later, the code needs to change and another type needs to be stored in a map? Generalizing a map via void pointers is the most pragmatic solution.
It depends on your specific needs and is difficult to characterize in hypothetical. I imagine that it would probably involve Hare's native tagged unions, though.
Why not support limited parametric polymorphism for things like equality and hashing? These are well studied problems, seems a bit convoluted to reject existing solutions.
I think that there is a problem here with regards to what you are trying to do.
You are conflating the _implementation_ of a hash table with the _existence_ of a hash table.
You list a few problems:
Computing the hash for an arbitrary structure and checking equality is something that you can do by adding a func at the creation of the map.
That will likely add overhead (since you can't inline it, and most of them are trivial), but at least you'll have something.
I'm not sure how memory allocations for a hash table are any different from the usual. Memory ownership semantics are likely relevant, though, so you'll likely need to add a way to provide a free() routine here as well.
A common example may be a `map<string, file>` - where you need to deallocate the string and `close()` the file.
This gets interesting when you do a set over an already existing value, by the way. And certainly a challenge you'll need to deal with.
My issue isn't so much with the complexity of the solution. Design choices such as not having generics will have impact on performance. But having _a_ solution is still a baseline requirement in my eyes.
The original post said that you can just roll your own, except that you really can't. Certainly not over & over again.
With regards to the hash table I criticized, that is exactly the point. You were able to "get away" with implementing a pretty bare-bones thing, but you called it out as something that should be viable in general.
Hashtables are in _common_ use. They aren't something that you'll use once in a blue moon. Go's decision to provide dedicated syntax for map wasn't an accident.
And for certain scenarios, you may need to use a different implementation to get the best results. But for the vast majority of scenarios, you just want _something_. And whatever you have in the box should suffice.
That is because a generic data structure has a lot of work done on it to optimize it for a wide range of scenarios. Conversely, a one off implementation is likely to be far worse.
There are also security considerations to take into account. You are not likely to properly implement things like protections against hash poisoning attacks, etc.
And "easy to fix"? Sure, but you have to remember to _do_ that each and every time you write this code, because your approach is to give that ownership to the user.
And your language design means that you _cannot_ actually solve that properly.
That is not a good place to be.
C was designed in the early 70s, and it is very much a legacy of that timeframe. Today, there is practically no system beyond hello world you can build that won't require hash tables galore.
The web is full of those (response and request headers, query string parameters), data formats (JSON is basically a hashtable plus some fancy pieces), caches (that are just hash tables), etc.
>You were able to "get away" with implementing a pretty bare bone thing, but you called it out as something that should be viable in general.
I can see how you had this impression from the language of the blog post, but what I meant to express is that the approach generalizes, even if this specific code does not.
>The web is full of those (response and request headers, query string parameters), data formats (JSON is basically a hashtable plus some fancy pieces) , caches (that are just hash tables), etc.
The web generally falls well outside of the target use-cases for Hare.