The example is definitely a memory leak, which in Haskell land means that unevaluated expressions (known as "thunks") keep building up in memory. Example expression: ((((0+1)+2)+3)... ad infinitum)
Sure, the memory would eventually get freed in one of two cases: 1) your program crashes because it runs out of memory (OOM), or 2) you finally evaluate the thunk (print it, use it somewhere, etc.) and spike your CPU as the whole chain is computed at once. Definitely not good when you are expecting linear performance.
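A minimal sketch of that buildup (hypothetical names, not the article's code) — with a lazy accumulator, the sum grows into exactly that chain of additions:

    -- leakySum builds the thunk ((((0 + 1) + 2) + 3) + ...) because
    -- (acc + x) is never forced until the result is finally demanded.
    leakySum :: [Int] -> Int
    leakySum = go 0
      where
        go acc []     = acc
        go acc (x:xs) = go (acc + x) xs

    main :: IO ()
    main = print (leakySum [1 .. 10000000])  -- memory climbs, then a CPU spike when the chain is evaluated

(GHC's strictness analysis at -O2 will often rescue this particular function, so treat it as an illustration of the pattern rather than a guaranteed repro.)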
The author fixes this by enabling the BangPatterns language extension, which lets you mark arguments so that the expressions passed forward are evaluated at each call, just far enough to expose the value's outermost constructor (also known as Weak Head Normal Form, or WHNF), instead of being deferred.
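A sketch of the fix with the extension enabled (again illustrative, not the article's code): the bang on the accumulator forces it to WHNF on every call, so it stays a plain Int instead of a growing thunk.

    {-# LANGUAGE BangPatterns #-}

    strictSum :: [Int] -> Int
    strictSum = go 0
      where
        go !acc []     = acc             -- !acc: evaluate the accumulator to WHNF here
        go !acc (x:xs) = go (acc + x) xs

This is essentially what Data.List.foldl' does for you.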
That all said, the more interesting part of the article is how they use cachegrind/valgrind to identify the performance characteristics of the underlying machine code, and it gives some tips on what to look for if you are digging into low-level performance issues like their example.
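If you want to try it yourself, the basic invocation is just valgrind pointed at the compiled binary (a rough sketch from memory; check the article for the exact flags they used):

    $ ghc -O2 -g Main.hs -o main          # -g emits debug info so annotations can map back to source
    $ valgrind --tool=cachegrind ./main
    $ cg_annotate cachegrind.out.<pid>    # per-function cache-miss breakdown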
Virtually all "memory leaks" are freed eventually, usually when a program closes.
So when someone talks about a memory leak, they usually mean stuff allocated in one place in such a fashion that it will stay around as long as the program lasts (or maybe just unnecessarily long). "Memory that's not deleted when you're done with it" is a crude approximation.
The point of the fragment, as far as I can tell, is that if you keep calling it repeatedly, you will eventually use up memory, which is where it matters. And there are other places it doesn't matter (I was shocked years ago to find that the Qt C++ framework doesn't delete any of the objects it creates. Valgrind complains about this, but their reasoning is "everything is allocated once and lasts the life of the program, so who cares about deletion". Well, it would be nice just so I could see only the things I unintentionally keep, but they have a point).
> Virtually all "memory leaks" are freed eventually, usually when a program closes.
I think that's literally the definition -- it can only be freed upon program death. More specifically, there's no way I could choose to free it, because all aliases have been lost (but that memory is still allocated).
If I create a bunch of intermediate objects unnecessarily while I'm doing some computation, I'm not leaking memory -- I'm just being inefficient. Unnecessary generation of thunks definitely falls into the same category; it's only a leak once I've lost track of it.
Otherwise you're just using memory leak to mean "My objects live longer than I thought", which is an issue probably affecting more than 99% of existing programs.
You can also define a memory leak as retaining memory outside of the scope it's needed in. The program's lifespan is one of those scopes, but there are also smaller scopes you can leak memory into. An example is the per-request scope on a server.
> Otherwise you're just using memory leak to mean "My objects live longer than I thought", which is an issue probably affecting more than 99% of existing programs.
Yes, it's a class of errors. Big programs always have errors, so it's not necessarily helpful to say "this program has a null-pointer error" or "this program has a memory leak". The scope of the problem depends on the impact caused by the error.
For example, a memory leak during a computation could still bring down your program. Imagine a file uploader that computes an MD5 checksum of a file before uploading it. If the uploader pulls the entire file into memory for checksumming when it didn't intend to, a large file could cause an OOM error and crash your program. That is a memory leak even though it is scoped to the MD5 operation and not the program lifetime.
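A quick Haskell sketch of the difference (a toy byte-sum standing in for MD5; the names and the byte-sum are mine, not the commenter's code): reading lazily and folding with a strict accumulator keeps memory roughly constant, whereas forcing the whole file into memory first is exactly the scoped leak described above.

    import qualified Data.ByteString.Lazy as BL
    import Data.Word (Word8)

    -- Processes the file chunk by chunk; earlier chunks can typically be GC'd as the fold advances.
    checksum :: FilePath -> IO Word8
    checksum path = do
      contents <- BL.readFile path          -- lazy read: chunks are pulled in as the fold demands them
      return $! BL.foldl' (+) 0 contents    -- strict left fold over the bytes

    -- Swapping in a strict readFile (Data.ByteString.readFile) would pull the whole
    -- file into memory at once: fine for small files, an OOM risk for large ones.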
Otherwise, a very nice article. With both Haskell and valgrind being a bit "scary", this is something I'll be coming back to read again.