Thanks for writing this up, though I feel that it may be a bit premature since at this point hardly anyone has any real experience writing Hare code. Regardless, I understand that this is a contentious design decision of Hare, so I would be happy to explain it in more detail.
We have discussed adding first-class maps to the language many times. We recognize the value in this feature and have tried to come up with a good way of doing it that fits within the design constraints of the language, but it has several design issues. There are three key problems: finding a good way of hashing arbitrary data structures (or arbitrarily limiting the kinds of keys the map can store), finding a good way of determining equality of arbitrary data structures, and dealing with memory allocation semantics. Hare does not have generics, and the alternative is a hands-free approach to hashing and equality, which has some troubling limitations, some of which are very unintuitive and non-obvious (which conflicts with our values for explicitness). The allocation problem is also troublesome: Hare uses manual memory management, and any hash map solution which involves magic behind-the-scenes allocations has serious design conflicts with Hare's principles.
This article criticizes a sample of a hash map implemented for the build driver. This hash map is used to store a mapping of module details keyed on the module's namespace. It is true that it is a fixed size map, and that collisions can easily be found for the fnv32 hash. These constraints limit the ability for this hash map design to generalize. However, this is not supposed to generalize. It's designed to be a special-purpose hash map for this specific use-case, and takes the simplest approach which is sufficient to this specific problem. As the author notes, there are many approaches to hash maps and there is no one-size-fits-all solution. So far as hash collisions are concerned, these are very unlikely in this use-case. This is not a hash map where hash flooding is a concern, and accidental collisions are so unlikely as to be a negligible risk. If this were not the case, it is easily fixed by just comparing each bucket entry by the namespace rather than by the hash. For use-cases where these things do matter, I would be interested in seeing something like siphash end up in the Hare standard library.
Recall that Hare is designed to be similar to C in terms of scope and goals. C also provides no general-purpose hash map, and little by way of other data structures (though some attempts exist, none of them good). Each of these approaches and concerns comes with different needs and trade-offs, and Hare places responsibility for evaluating these needs and trade-offs into the capable hands of the programmer. This is a reflection of Hare's values, which are distinct from the values of some other languages mentioned in the OP - Rust, Go, Zig, C++, etc.
Thanks for providing me the opportunity to clarify this here, though perhaps this merited a blog post rather than an overlong HN comment. I understand that this is a design decision which appears especially confusing from the outside, but know that it was made through careful deliberation and discussion.
Oh-- and a map-like thing for net::uri would be nice to have as a convenience function in the future. We need to implement the basic approach using the most fundamental design and then build the convenience on top of it, in order to accommodate all use-cases and leave the trade-offs for the user to make.
> We recognize the value in this feature and have tried to come up with a good way of doing it that fits within the design constraints of the language, but it has several design issues.
> Recall that Hare is designed to be similar to C in terms of scope and goals.
This sounds like the language inherits all the limitations of C and isn't able to provide sufficient expressivity to solve problems that you yourself consider valuable (which aren't that many, considering the niche scope). People move away from C because of these limitations; how do you expect they would choose Hare as a replacement?
It does inherit some of the limitations of C, but in exchange, it also inherits much of the power. The trade-offs are similar to C, which is desirable in many cases. But, you may be working under different constraints, in which case you may be better off with another language - which is totally fine!
> So far as hash collisions are concerned, these are very unlikely in this use-case. This is not a hash map where hash flooding is a concern, and accidental collisions are so unlikely as to be a negligible risk.
The current compiler will silently ignore colliding hashes and (I believe) produce a very confusing error. The probability of this happening is not huge, but it's not small enough to rule out someone hitting it, possibly this year. At the very least you need to report hash collisions, so that the user can rename their modules and so on.
Checking the bucket entries based on their module identifier instead of (or in addition to) their hash is a simple and obvious improvement to this code.
> There are three key problems: finding a good way of hashing arbitrary data structures (or arbitrarily limiting the kinds of keys the map can store), finding a good way of determining equality of arbitrary data structures, and dealing with memory allocation semantics.
It's not unheard of or unreasonable to have collections that take an allocator as an argument. And a comparator and a hash function as arguments.
I think adding generic types and/or a notion of an interface might increase the complexity of the language, but it will also become far more expressive. And yes, these are tools that are sharp and require caution, especially when designing a standard library - you want to think hard about the constraints you'd impose on the consumers of your standard library when picking which interfaces require what methods etc, but I still think that there's not much to gain from a language without these features when compared to C.
Aye, we would have probably gone with a design similar to what you suggest... if we had generics, but we do not. Generics simply don't fit with the core values of the language. I still believe that there is room for languages without them.
Maybe with some default options, that should be enough for at least the basics.
Note that the indirection overhead would be significant, but without generics or something similar, you don't really have a choice.
The problem here is *void, which is not great. You lose type safety with this approach, and the allocation problems are still present. Rolling your own can be both more type safe and better suited to your problem area, and it's not particularly difficult.
What about a use case where you want to use a hash map to store 3 different types of keys independently? Would you implement the hash map once and just store a union of the 3 types, then assert the correct type has been stored at each call site? Would you implement the hash map 3 different times? What happens if, some years later, the code needs to change and another type needs to be stored in a map? Generalizing a map via void pointers is the most pragmatic solution.
It depends on your specific needs and is difficult to characterize in hypothetical. I imagine that it would probably involve Hare's native tagged unions, though.
Why not support limited parametric polymorphism for things like equality and hashing? These are well studied problems, seems a bit convoluted to reject existing solutions.
I think that there is a problem here with regards to what you are trying to do.
You are conflating the _implementation_ of a hash table with the _existence_ of a hash table.
You list a few problems:
Computing the hash for an arbitrary structure and checking equality is something that you can do by adding a func at the creation of the map.
That will likely add overhead (since you can't inline it, and most of them are trivial), but at least you'll have something.
I'm not sure how memory allocations for a hash table are any different from the usual. Memory ownership semantics are likely relevant, though, so you'll likely need to add a way to provide a free() routine here as well.
A common example may be a `map<string, file>` - where you need to deallocate the string and `close()` the file.
This gets interesting when you do a set over an already existing value, by the way. And certainly a challenge you'll need to deal with.
My issue isn't so much with the complexity of the solution. Design choices such as not having generics will have impact on performance. But having _a_ solution is still a baseline requirement in my eyes.
The original post said that you can just roll your own, except that you really can't. Certainly not over & over again.
With regards to the hash table I criticized, that is exactly the point. You were able to "get away" with implementing a pretty bare-bones thing, but you called it out as something that should be viable in general.
Hashtables are in _common_ use. They aren't something that you'll use once in a blue moon. Go's decision to provide dedicated syntax for map wasn't an accident.
And for certain scenarios, you may need to use a different implementation to get the best results. But for the vast majority of scenarios, you just want _something_. And whatever you have in the box should suffice.
That is because a generic data structure has a lot of work done on it to optimize it for a wide range of scenarios. Conversely, a one off implementation is likely to be far worse.
There are also security considerations to take into account. You are not likely to properly implement things like protections against hash poisoning attacks, etc.
And "easy to fix"? Sure, but you have to remember to _do_ that each and every time you write this code, because your approach is to give that ownership to the user.
And your language design means that you _cannot_ actually solve that properly.
That is not a good place to be.
C was designed in the early 70s, and it is very much a legacy of that timeframe. Today, there is practically no system beyond hello world you can build that won't require hash tables galore.
The web is full of those (response and request headers, query string parameters), data formats (JSON is basically a hashtable plus some fancy pieces), caches (that are just hash tables), etc.
>You were able to "get away" with implementing a pretty bare bone thing, but you called it out as something that should be viable in general.
I can see how you had this impression from the language of the blog post, but what I meant to express is that the approach generalizes, even if this specific code does not.
>The web is full of those (response and request headers, query string parameters), data formats (JSON is basically a hashtable plus some fancy pieces) , caches (that are just hash tables), etc.
The web generally falls well outside of the target use-cases for Hare.