The scope is small and well defined, and so it's a welcome break from divining murky business requirements, slow iteration with microservices, and other everyday fatigue.
A possible alternative source of income for Mozilla, and one I haven't seen mentioned before, would be providing consulting services in their areas of expertise.
For example, demand for Rust engineers seems to be steadily increasing[1], and I can't think of a company with a larger concentration of Rust expertise than Mozilla. Perhaps the same could work for other areas (e.g. frontend), but I'm biased toward Rust.
Due to my lack of experience in this area I'm not sure whether this could actually work, but I could imagine them allocating part of their engineering time budget to consulting, with the income funding their core products.
[1] Besides all the articles popping up about well-known companies starting to adopt Rust, my anecdote as a full-time Rust SDE is that I'm getting more and more requests on LinkedIn, and not just for the usual crypto roles.
There are so many ways for Mozilla to leverage their core competencies. Unfortunately, failed management, with no means to remove it, means none of that will happen.
Executive leadership at Mozilla is a disgrace that has now had a tangible negative effect on the future of the internet. The fact that Mitchell Baker still leads this organization is insane. An absolutely inept failure somehow still leads Mozilla. It's absolutely crazy.
You do realize that Mitchell has only been CEO of Mozilla Corp since January, right? And that almost all of the C-suite has turned over in the last few years?
I completely agree. It’s depressing to see the whole thing steadily come apart at the seams due to their own leadership (or lack thereof, one could argue).
Actually I realize what I wrote above is less true now as they have fired a good number of their incredibly skilled engineers, many of them of course Rust developers. So I suppose my post should be rephrased along the lines of “what Mozilla should or could have done”.
It's a valid point, I think. I'd probably also trust something so established as Boost more than some random guy's lib on GitHub. However, I specifically wrote mio because I prefer not to use Boost, and from what I understand, many others don't either.
This is a valid point. My use case was very frequent reads of large files at pretty much unpredictable positions, so in theory mmap seemed justified. However, I never got around to thoroughly testing this assumption, and I may indeed have been better off just using read(2) and its variants.
You seem very experienced, so I hope you don't mind a question. In my use case the files were as large as tens of gigabytes and I was creating read-only mappings of 256KB-1MB chunks in them, keeping the mmap handles around according to a cache policy and RAM usage limit. Do you think in this case using mmap could in theory introduce performance gains?
I think that this is the wrong way to use mmap. Just map the whole file at once. The operating system will automatically read the pages you access from disk. And if memory gets tight, these pages will be flushed to disk if they are dirty and then discarded before the system starts paging. These mmapped pages essentially live in the disk cache.
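To make the "map the whole file at once" approach concrete, here's a minimal sketch in Python (the underlying mmap(2) call is the same from C/C++); the 1 MiB demo file and the offset are of course just placeholders for the multi-gigabyte case:

```python
import mmap
import os
import tempfile

# Create a sample file standing in for a multi-gigabyte one.
fd, path = tempfile.mkstemp()
os.write(fd, b"x" * (1 << 20))  # 1 MiB of 'x'
os.close(fd)

with open(path, "rb") as f:
    # Map the whole file once; length 0 means "the entire file".
    # No data is read yet -- pages are faulted in lazily on first access.
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # Random access at any offset: the kernel pages in just what is touched,
    # and under memory pressure clean pages are simply discarded.
    byte = mm[512 * 1024]
    mm.close()
os.remove(path)
```

The point is that there's no need to manage 256KB-1MB windows by hand: one mapping plus lazy faulting gives you the chunking and the caching for free.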
> The operating system will automatically read the pages you access from disk. And if memory gets tight, these pages will be flushed to disk if they are dirty and then discarded before the system starts paging.
You can tell that you understand how modern OS memory management works when you realize that the OS "automatically read[ing] the pages...from disk" and "flush[ing them] to disk" on memory pressure is paging whether those pages are anonymous pages or mmaped file pages. :-)
[Edit: flushing dirty file-paged pages is analogous to swapping anonymous memory to the swapfile. Discarding clean file-backed pages is a bit like discarding anonymous pages that have been made unused through munmap, process death, etc.]
But to the GP's point: you don't need (except to conserve address space) to limit file mapping size. I think he really wants something like MADV_FREE. But it's complicated.
There is a subtle difference between anonymous and mapped read-only pages: the latter can be discarded right away because their contents were read from permanent storage to begin with. Anonymous pages need to be written to disk first, and that is significantly slower.
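You can poke at that difference with madvise(2). A small sketch (Python 3.8+ exposes it as mmap.madvise; MADV_DONTNEED is a Linux constant, hence the guard): dropping clean file-backed pages is instant and lossless, because the kernel can always refetch them from the file.

```python
import mmap
import os
import tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b"hello world" * 1000)
os.close(fd)

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    first = mm[:5]
    # Drop the cached pages immediately (Linux). Clean file-backed pages
    # need no writeback -- unlike dirty anonymous pages, which would have
    # to be swapped out first.
    if hasattr(mmap, "MADV_DONTNEED"):
        mm.madvise(mmap.MADV_DONTNEED)
    # The data is still readable: the kernel just refetches it from the file.
    again = mm[:5]
    mm.close()
os.remove(path)
```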
Random access to large files is a legitimate use case! LMDB [1] uses a similar technique, and it works well for them. But depending on the specific application, explicitly application-managed caching via O_DIRECT IO with something like threaded pread or AIO might be even better, because with this explicit model, you control the cache sizing and eviction policy, and it's certainly possible with application-level knowledge to do better than the generic kernel-level LRU/active/inactive/kinda-sorta-works-heuristic stuff can do without application-specific knowledge.
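To illustrate the application-managed alternative, here's a deliberately tiny sketch of a chunk cache over pread (Python for brevity; the shape is the same with pread(2) in C). The chunk size, class name, and LRU policy are all illustrative assumptions, and real direct IO would add O_DIRECT plus aligned buffers, which this sketch omits:

```python
import os
import tempfile
from collections import OrderedDict

CHUNK = 256 * 1024  # cache granularity; echoes the 256KB chunks mentioned above

class ChunkCache:
    """Minimal application-managed read cache over pread (illustrative sketch)."""
    def __init__(self, fd, max_chunks=64):
        self.fd = fd
        self.max_chunks = max_chunks  # cache size is OUR knob, not the kernel's
        self.chunks = OrderedDict()   # chunk index -> bytes, kept in LRU order

    def read(self, offset, length):
        # Sketch only: assumes the request does not straddle a chunk boundary.
        idx = offset // CHUNK
        chunk = self.chunks.get(idx)
        if chunk is None:
            # Miss. pread takes an explicit offset and shares no file-position
            # state, so many threads can issue these concurrently.
            chunk = os.pread(self.fd, CHUNK, idx * CHUNK)
            self.chunks[idx] = chunk
            if len(self.chunks) > self.max_chunks:
                self.chunks.popitem(last=False)  # evict LRU -- our policy
        else:
            self.chunks.move_to_end(idx)  # mark as recently used
        start = offset - idx * CHUNK
        return chunk[start:start + length]

# Tiny demo file; real use would be the tens-of-GB files discussed above.
fd, path = tempfile.mkstemp()
os.write(fd, b"abcd" * CHUNK)
os.close(fd)

rfd = os.open(path, os.O_RDONLY)  # direct IO would add os.O_DIRECT here
cache = ChunkCache(rfd)
data = cache.read(CHUNK + 4, 4)  # read 4 bytes from inside the second chunk
os.close(rfd)
os.remove(path)
```

The eviction line is where the application-specific knowledge would go: replace the LRU popitem with whatever your access pattern actually favors.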
Another advantage of using application-managed caching is the ability to take advantage of things like huge pages (which can drastically reduce TLB miss rates), whereas with conventional mmap of conventional files, you're limited to regular 4kB (or whatever) small pages and associated management overhead. (There's no reason in principle filesystems can't use huge pages for page cache, but AFAIK, nobody does it yet.)
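For the huge-pages point: with an application-managed cache the buffer is just anonymous memory, so you can hint the kernel to back it with transparent huge pages. A sketch (MADV_HUGEPAGE is Linux-only and mmap.madvise needs Python 3.8+, hence the guard; the kernel is free to ignore the hint):

```python
import mmap

SIZE = 4 * 1024 * 1024  # 4 MiB anonymous buffer for an application-managed cache

# Anonymous private mapping -- not backed by a file, so huge pages are possible.
buf = mmap.mmap(-1, SIZE)

# Ask for transparent huge pages: with 2 MiB pages, the same range needs
# ~512x fewer TLB entries than with 4 kB pages.
if hasattr(mmap, "MADV_HUGEPAGE"):
    buf.madvise(mmap.MADV_HUGEPAGE)

buf[0:4] = b"data"
head = bytes(buf[0:4])
buf.close()
```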
OTOH, kernel management of page cache allows for better integration of cache eviction with system memory pressure signals and allows for multiple users of a single file to share the memory mirroring the contents of that file.
> Do you think in this case using mmap could in theory introduce performance gains?
It depends. The right approach depends on a lot of factors, including workload and developer complexity budget. It's funny, really: the more experience you get, the less likely you are to say "$SOLUTION is the bestest evar!" and the more often you say "well, it really depends, so I can't give you an answer".
What really strikes me as needless is someone using mmap to read a 10kB ~/.myapplication.lol.ini file or something.
Oh, I missed this response (still a little overwhelmed). I'm so glad I checked again because there is some precious wisdom in there. Thank you for taking the time to write it down. And it indeed seems like I was misusing mmap...well, next time I'll know better!
This is definitely unfortunate, but in my defense I was not aware of Rust's mio (or anything related to Rust beyond its existence) at the time of writing and naming my library. I have no emotional investment in the name, so I'm open to suggestions should anyone take issue with it.
Author here. Long time lurker, but made an account now.
Wow, I did not expect this. I'm really touched. I wrote this as a small utility for my own consumption because I was unsatisfied with the existing selection at the time, so I'm both surprised and delighted to learn that people are finding it useful. Although to be completely frank, I think this library is way too small and insignificant to deserve a spot on HN's front page, but it definitely made my day. So thank you kind stranger who posted it!
OP here. Thank you for creating mio! Your project clearly deserves the attention. I just found it and thought it would belong here. A lot of people seem to share that opinion :)
I am sure this won't be the last top HN post about one of your projects.
I like it, and I took a look at the GitHub repo. I think there is actually a missed opportunity here to make it a single-header library, since there seem to be only 4 or 5 files that go into it.