Usefulness of mnesia, the Erlang built-in database (erlang.org)
140 points by motiejus on Oct 14, 2015 | hide | past | favorite | 36 comments


It is what it is. It's useful to prototype with in Erlang. It may be useful to ship with. If Mnesia turns out not to fit your problem, here in 2015, you've got literally dozens of choices of alternate DB, with all sorts of consistency and distributibility and performance characteristics.

My guess is that if somehow Erlang was where it was in 2015 except it didn't have Mnesia, nobody would really perceive much of a hole there, and nobody would write it, because of the database explosion we've seen in the past 10 years. But it is there, and if it works for you, go for it.

My only slight suggestion is that rather than inlining all your mnesia calls, you ought to isolate them into a separate module or modules or something with an interface. But, that's not really because of Mnesia... I tend to recommend that anyhow. In pattern terms, I pretty much always recommend wrapping a Facade around your data store access, if for no other reason than how easy it makes your testing if you can drop in alternate implementations. And then, if mnesia... no, wait... And then, if $DATABASE turns out to be unsuitable, you're not stuck up a creek without a paddle. With this approach it's not even all that hard to try out multiple alternatives before settling on something.
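A minimal sketch of that facade idea in Erlang (module, function, and config names are made up for illustration, not from the thread):

```erlang
%% store.erl -- hypothetical facade over $DATABASE; callers never
%% touch mnesia (or whatever backend) directly.
-module(store).
-export([fetch/1, save/2]).

%% Pick the backend module at runtime; swapping implementations
%% (or dropping in a test double) is one config change.
backend() ->
    application:get_env(myapp, store_backend, store_mnesia).

fetch(Key) ->
    Mod = backend(),
    Mod:fetch(Key).

save(Key, Value) ->
    Mod = backend(),
    Mod:save(Key, Value).
```

Each backend module (store_mnesia, store_pg, a test stub) just has to export the same fetch/1 and save/2.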


mnesia is great in isolation with bounded datasets. erlang is great for operational observability of both the data flow and your application. they go great together, sometimes, if you know all your requirements up front.

It's always good to keep in mind that erlang+mnesia were designed to run in a "network" consisting of a unified blade chassis (each blade is a new node, and the network is a physical backplane). So, things like network partitions around erlang distribution and remote mnesia tables don't have easy/proper error recovery strategies.

Even with unbounded datasets, mnesia is great if you devote an engineering team to managing the scaling of it, because mnesia can't do simple things like move a DB from one node to another without semi-advanced erlang+mnesia knowledge. But, whatsapp was (is?) 100% mnesia last I recall, even though they had to reengineer some of it: http://www.erlang-factory.com/upload/presentations/558/efsf2...

But, mnesia still relies on dets, and dets still, in 2015, has a 2 GB max file size. If your data grows beyond 2 GB, you have to do mnesia fragmentation, which is just an operational burden.
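For reference, setting up fragmentation looks roughly like this (table name and fragment count are illustrative; `frag_properties` is mnesia's actual option):

```erlang
%% Hypothetical fragmented table: records are hashed across 8
%% fragments, so each underlying dets file stays under the 2 GB cap.
mnesia:create_table(big_tab,
    [{disc_only_copies, [node()]},
     {attributes, [key, value]},
     {frag_properties, [{n_fragments, 8},
                        {node_pool, [node()]}]}]).
```

Part of the burden: every access to a fragmented table then has to go through mnesia:activity/4 with the mnesia_frag access module, instead of plain mnesia:transaction/1.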

My scaling thoughts tend towards: sqlite -> postgres -> riak


If you can fit all your data in ram, you can use disc_copies, which doesn't use dets. schema is always disc_copies, but if it gets to be 2gb, wow!

Edit to add: Adding nodes isn't that hard? Connect to dist, add to extra copies of schema table, then the tables that you want. More complex if you're trying to merge together different sets of tables into one schema though.
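Those steps, roughly, in mnesia's actual API (node and table names are illustrative):

```erlang
%% Run on an existing node; 'service@node2' is already up with the
%% same cookie. First make it part of the mnesia cluster...
{ok, _} = mnesia:change_config(extra_db_nodes, ['service@node2']),
%% ...persist the schema there (new nodes start out ram-only)...
{atomic, ok} = mnesia:change_table_copy_type(schema, 'service@node2',
                                             disc_copies),
%% ...then replicate whichever tables you want.
{atomic, ok} = mnesia:add_table_copy(my_tab, 'service@node2',
                                     disc_copies).
```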


Edit to add: Adding nodes isn't that hard?

Adding nodes isn't hard, but try restoring a mnesia table from service@node1 to service@node2. You have to do something like http://stackoverflow.com/questions/463400/how-to-rename-the-...

It's not a great solution for modern AWS-style API-driven-deployment operational models.


Oh yeah... I would handle that by having node1 and node2 running at the same time, add copies to node2, delete copies from node1, and done; but that requires both nodes to be running at the same time (which sounds like it wasn't really an option in the stack overflow post)
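That add-then-delete move, in mnesia's API (names illustrative; both nodes up):

```erlang
%% Replicate the table onto the new node first...
{atomic, ok} = mnesia:add_table_copy(my_tab, 'service@node2', disc_copies),
%% ...then drop the copy from the old node.
{atomic, ok} = mnesia:del_table_copy(my_tab, 'service@node1').
```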


>My only slight suggestion is that rather than inlining all your mnesia calls, you ought to isolate them into a separate module or modules or something with an interface. But, that's not really because of Mnesia... I tend to recommend that anyhow.

+1. Also nice to have a module per table, because you can have a behaviour for your mnesia tables so that things like initializing tables are standardized (i.e. they all export `init([node()]) -> ok | error`).
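A sketch of what such a behaviour might look like (module and function names are made up, not a standard library):

```erlang
%% mnesia_table.erl -- hypothetical behaviour standardizing table setup.
-module(mnesia_table).
-export([init_all/2]).

%% Every per-table module implements this callback.
-callback init([node()]) -> ok | error.

%% Initialize every table module, in order, on the given nodes.
init_all(TableModules, Nodes) ->
    lists:foreach(fun(M) -> ok = M:init(Nodes) end, TableModules).
```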


Yes. To explain to others: it's easy to miss how dynamic Erlang is if you don't study the syntax, but generally, wherever you see a module or a function, you're actually using atoms, and they can be freely substituted by variables containing atoms.

    Eshell V5.10.1  (abort with ^G)
    1> lists:reverse([1,2,3]).
    [3,2,1]
    2> L = lists.
    lists
    3> R = reverse.
    reverse
    4> L:R([1,2,3]).
    [3,2,1]
This makes it easy to iterate through lists of modules, calling an init function on them, or generally treating modules generically. It is, I think, perhaps less powerful than Python's capabilities, but it can still take you a long way, and save you a lot of code.
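For instance, initializing a list of table modules generically (the module names are hypothetical; each is assumed to export init/0):

```erlang
%% Each module in the list exports init/0; call them all in order.
Tables = [cache_tab, users_tab, sessions_tab],
lists:foreach(fun(M) -> ok = M:init() end, Tables).
```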


I'm the Fred mentioned in the quoted text. As mentioned by FLGMwt, "Mnesia being a 90s database" was an offhand remark made at a workshop during a break.

The reason I considered it a DB of the 90s is that back then it could have been state of the art, but by today's standards, in its current form, it makes sense to use it mostly on fixed cluster sizes with reliable networks and a fairly stable topology.

Any fancier cases and you start needing to dive into the internals when it comes to coming back up from failures, partitions, requiring repairs, and so on. The DB has 3 standard backends: all in RAM, all on disk (with a 2GB limit), or as a log-copy (infinite disk size, but also bound by memory size).

That ends up leaving you with a DB that needs its whole dataset to fit in memory, and that supports distributed transactions but can't deal with network failures well out of the box (you need something like https://github.com/uwiger/unsplit).

Mnesia gaining new backends (Klarna is currently open-sourcing code for an experimental postgres backend and is using a leveldb one) would fix a lot of issues as a single-node DB, but another overhaul would be required for the rest.

The problem I see is that it was a very cool database back then, but it has lagged behind for a long while and now has to play a catch-up game. Its model and Erlang interface are still extremely nice, and I wish it made more sense to use it in production without committing to learning its internals in case of trouble.


This has little to do with databases, erlang or mnesia; it's just a moan against people writing ad tech.

mnesia is a database for the 90's because it was written by smart people in the 80's and, like most of the rest of the otp stack, was fairly under-used and under-maintained.

I have a huge amount of respect for Klacke and the original authors behind a lot of this tech, however the erlang community that followed seems to suffer some cognitive dissonance around what problems it solves and how well they are doing them. It would be hard to pick a database less suitable for SMB use than a domain specific database in a niche ecosystem.


I have much the same sentiment with SQLite. Often dismissed as a toy database, but absolutely appropriate for 99% of my clients - small and medium businesses, the same as mentioned in the thread.


I was working at a place that had decided that they needed "big data tools", and part of the team was developing a Hadoop data cluster and a myriad of other tools (Tableau, Redis cache etc).

I took all the campaign data (it was an ad tech company) that we actually used to alter bidding behaviour, and it fit into a sqlite3 DB that was < 5MB. Using python, the sqlite DB worked like a charm.


People like to believe that they have "big problems" and that they need large scale solutions. If they ever manage to get it working correctly, they now have big system administration problems on top of their existing problem. Whereas doing backups, creating staging/testing environments or moving servers can be done with cp or scp when running SQLite.

Admittedly, if you pick SQLite and ever manage to reach a scale where it's no longer an appropriate solution, then you will have more trouble migrating off it.


I would say that SQLite transitions quite nicely, as far as transitions go, into a client-server SQL RDBMS.


> People like to believe that they have "big problems" and that they need large scale solutions.

People like to believe that they will have big problems and will need large scale solutions, and often make the call to plan accordingly. Decision-making for this sort of thing is a tightrope walk across a bunch of unknowables affecting a bunch of trade-offs.


SQLite is amazing. It should definitely get more popular - it gives you relational DB features while still storing everything in a file. Which means portability, performance, and no bloating the entire OS with another database driver / server.


It's extremely popular: it's on pretty much every smartphone, IIRC.


apple's object graph framework, "core data", is also built on sqlite.

mnesia lets you decide, on a per-table basis, where each table should be stored:

    1) ram only
    2) disk only
    3) disk copies (cached in ram, and persisted to disk)
as well as the option of how it should be accessed in a cluster

    1) replicated (to a set of nodes you want)
    2) or location transparent (accessible from any node you want)
i doubt any dbms gives you this kind of distributed-systems-friendly primitives.
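In mnesia's API those choices are per-table options to mnesia:create_table/2; here's a hypothetical table spread across three nodes with a different copy type on each (node and field names are illustrative):

```erlang
mnesia:create_table(session,
    [{ram_copies,       ['a@host']},   %% 1) ram only
     {disc_only_copies, ['b@host']},   %% 2) disk only (dets-backed)
     {disc_copies,      ['c@host']},   %% 3) cached in ram + on disk
     {attributes, [id, data]}]).
%% Any other connected node can still read/write the table
%% transparently over erlang distribution.
```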

my only peeve against mnesia is corruption. if you change these nodes, or one of them is down, or you call it with the wrong list of nodes (eg: you may now need a distributed store like zk just to get this list of nodes right), many different things can go wrong, which is why most folks use it as an idempotent store: something that can be recreated if needed, and then used to great effect - but never as "the" persistent store. additionally there are limits on the size of these tables, and workarounds.

once bootstrapped - it works like a dream. mnesia is still my go-to db for a distributed cache or router for actors.

~B


I recall a piece recently where someone asserted it's the most widely-deployed software in the world, largely for this reason.


It shows up in all sorts of places. It's completely eclipsed the BerkeleyDB/DBM type of databases in most of their use cases. The main thing is that it's usually embedded into something else like say Subversion.


I like a lot of things about sqlite, but I don't like the fact that it doesn't enforce types.


That would be nice, at least as an option.

Theoretically, one could use triggers to enforce types. Not sure if that is worth the hassle, though.


Many valuable considerations inside; read the post. Starting with this thought in the question: 'I hate transiting syntactic boundaries when I'm programming'.

But the answer is a much broader evaluation of the utility of the tools we are using, relative to what we use them for.

And some rants that I share: "but really boil down to adtech, adtech, adtech, adtech, and some more adtech, and marketing campaigns about campaigns about adtech."


Erlang (and by association Elixir) tooling has a nice progressive approach to managing state.

Agent -> ets -> dets -> mnesia -> riak (or sql tooling etc.)

(Agent http://elixir-lang.org/docs/v1.1/elixir/Agent.html is just a state-holding process. Erlang folks can probably write one of these in their sleep, Elixir added a bit of wrapping-paper around it.)

If you're writing an app, I think it's best to be storage-agnostic from the get-go. You shouldn't be building up queries in your core app code; push it to the edge of your code, because otherwise you're not separating concerns. All your app (business logic) code should delegate to some wrapper to work out the specifics of retrieving the data; your app code should just be calling something like Modelname.specific_function_returning_specific_dataset(relevant_identifier) and let that work out the details. That way, if you ever upgrade your store, you just have to refactor those queries, but your app code remains the same. On top of that, in your unit tests you can pass in a mimicking test double for your store to do a true unit test, and avoid retesting your store over and over again wastefully. (You'd still of course have an integration test to cover that, but it wouldn't be doing it on every test.)


Not that this hasn't generated good talking points, but I was there at the workshop the OP mentioned and Fred's remark was very much said in passing to a small group of people during a break and he didn't seem to be making an intentionally negative remark. He certainly wasn't stating it as instructional fact.


Well, you could summarise this article by saying: "Just because something got invented 25 years ago doesn't mean it is useless". On the contrary - it is worth taking a look at technology that has survived 25 years in the wild.


The article is light on actual details and heavy on rants about the current state of products built on the web.

However, there is 1 thing that mnesia got absolutely and totally right: database schema upgrades. You can create an mnesia database and upgrade its schema on the fly as a part of its operation, without once bringing it down or running a script. I did this[0] for a toy project I did in erlang once that I unfortunately never finished, since the need for it disappeared.
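The mnesia primitive for that is mnesia:transform_table/3, which rewrites every record and updates the schema in one live operation (record and field names here are illustrative):

```erlang
%% Add an 'email' field to a live 'user' table, defaulting to undefined.
Transform = fun({user, Id, Name}) -> {user, Id, Name, undefined} end,
{atomic, ok} = mnesia:transform_table(user, Transform,
                                      [id, name, email]).
```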

[0]: https://github.com/zaphar/iterate/blob/master/src/db_migrate...


What? Mnesia is not fully-webscale graph-db so it's not useful in the 2010s?? Didn't get much from this piece.


By the same argument, though, why not just use Postgres? And I write that as a fan of Erlang. Indeed: https://github.com/epgsql/epgsql


I can't say I found the piece particularly insightful.

It seemed to imply that mnesia is the DB of the future as soon as everyone realises that everything they are doing is completely wrong and they should be doing things that are more suited to mnesia. Without saying what those things are.

I actually found one of the child comments [1] was pushing in a better direction. Essentially, the vast drop in $/TB of storage means that persistence of time series/ event type data is practical for the masses now. Sure it's found a niche in ads on the web, but it has much wider applicability than that. I personally think that Erlang is particularly well suited to this space.

[1] http://erlang.org/pipermail/erlang-questions/2015-October/08...


The reason why the comment is insightful is that suddenly Erlang has found itself in the crosshairs of an industry that it is particularly useful for (ad-tech) but there are many more applications that have nothing to do with adtech which erlang is also very well suited for. So for adtech to start pushing the direction in which erlang/mnesia evolve would be wrong.

As for the drop in $/TB of storage, that drop has been going on steadily since the 1970s, and for a very large chunk of that period Erlang/mnesia have been adapting bit by bit to take advantage of that price decrease.

Persistence of time series and event type data does not (usually) require the kind of storage solution that mnesia offers; a much simpler storage medium would probably suffice and/or be a better choice for that application anyway. Storing such data without further processing in a relational database is a bit of a cop-out.

Most of those applications could benefit from digesting and compression which is something you're not going to easily retro-fit onto an existing database.


> Persistence of time series and event type data does not (usually) require the kind of storage solution that mnesia offers

Indeed. That's precisely my point (just stated better than me). Erlang looks like being a very good fit for a large class of applications of this form. mnesia, on the other hand, is not well suited as the persistent storage for them. It would follow then that it could make sense for the "next gen" storage for Erlang to be more in line with this sort of thinking in the way you describe.

As to the drop in $/TB, it's been pretty steadily exponential. But the cost of consuming and analysing TB of data has only really been practical for non-specialist houses for 5-10 years. When you're in the low GB range then a static data picture makes sense. When you're able to play with TB of data then a dynamic picture makes sense and Erlang really shines.


> This is much more interesting than chasing click statistics in the interest of brokering ad sales at the speed of light so that users can continue to ignore them.

That comment really packs a punch and should get much wider visibility. Ad tech and related software is where way too much of our collective efforts are going.


> instead of shelling out for yet another upgrade to a spreadsheet program that will be used to do exactly what they would have done with a spreadsheet in the 80's -- or even funnier, for a version of that spreadsheet that still does the same things, but slower, in a browser

Oh yes. This whole e-mail is just golden.


Because it's one of the most important sources of funding for tech companies (justified or not).


Not really. There is a niche of social companies (twitter, facebook, google search engine) that make money from ads. There is a niche of media companies (youtube) that make money from ads. There is a niche of startups which make money from funding/IPOs. These niches are a focus of HN and get talked about way out of proportion to their actual share of the industry.

Most tech companies make products and get funding from sales (netflix, amazon, apple, ibm, microsoft, 99% of the Enterprise and 99% of the small-medium market).


And it became so probably because it's easiest - you can build whatever crap you want, slap some ads on top, and earn $$$. In time, the whole industry learned that it's not even worth improving the actual utility of the product - focusing on sexiness and addictiveness is better, because it drives even more ad views.



