> And before anyone starts talking about "but most programs don't need random ac...

kerkeslager · on Oct 5, 2024

I am not sure of the architecture of the machines where Lisp originated, but for all the computers I had access to in 1998, the gap between memory speed and virtual memory speed was relevant. So perhaps I should have just said "locality" and not "cache locality".

It's hard for me to imagine any sane architecture where using more memory and having if fragmented rather than contiguous is going to perform as well as using a contiguous block of memory.

And to be clear, I'm not saying there are no upsides. Cons cells are very effective as a minimal yet highly expressive data structure. All I'm saying is that this post by Naggum, and a lot of writing about Common Lisp and other Lisps, pick out the upsides and present them as if they're all that exists, when the reality is that the choices described here are tradeoffs. Sometimes (often, even!) the tradeoffs are worth it. Sometimes they aren't. That doesn't fit the narrative that Lisp is god's gift to McCarthy who then gifted it to humanity, but it is reality.

_19qg · on Oct 5, 2024

> It's hard for me to imagine any sane architecture where using more memory and having if fragmented rather than contiguous is going to perform as well as using a contiguous block of memory.

Yet many of the language runtimes are actually following the Lisp model. Just that it's there not the cons cell with two slots, but arrays and objects with slots. A Java program is a multitude of dynamically allocated objects with one or more slots, where many of the slots contain pointers to other objects. The memory management is done by the runtime (and, usually, its garbage collector).

> Sometimes they aren't. That doesn't fit the narrative that Lisp is god's gift to McCarthy who then gifted it to humanity, but it is reality.

Other than what you might think (and set up as a strawman), that's well known and the evolution of Lisp is also showing how the implementations had to deal with this problem.

The LISP 1 implementation introduced mark&sweep garbage collection (-> McCarthy). As a reaction to it Collins proposed "reference counting" as another way to manage&reclaim allocated memory.

Over time early Lisp programs (Macsyma is an example often mentioned) were problematic on multi-user time-shared machines. One reaction to that was to develop single user workstations for Lisp, where all the memory was available to a single user. There one then used large virtual memory, which still made memory access (for example during garbage collection) time consuming. A GC run of a large heap in VM could run for 30 minutes making the computer mostly unusable during that time. Vectors/Arrays and OOP objects then also used the model of dynamic allocation with many referenced objects (unless they were of some primitive types). Generational GCs, compacting GCs, hardware-supported GCs, etc. were invented and implemented. Full GCs through virtual memory heaps were rare then. The availability of cheaper RAM made it possible that more of the large heaps fit into RAM.

Lisp implementations early on added vectors and other data structures. But they still often need pointers. A vector of records (called structures in Common Lisp) is usually a vector with pointers to records. For example Common Lisp implementations usually will not provide vectors of inline records. A vector of fixnums OTOH will be a vector without pointers -> the fixnums will be stored directly in the vector.

In the end a Lisp heap is a huge graph of objects referencing other objects. This does not only effect CONS cells, but also vectors, arrays, records, ... -> they also reference other, non-primitive, objects via pointers.

That's not very different from any other language runtime which uses dynamic memory allocation and a heap of linked objects (-> Java, JavaScript, ...). The problem of dealing/avoiding non-locality and pointer-chasing is there, too.

kerkeslager · on Oct 6, 2024

This is moving the goalposts. My critique was about list implementation, not about implementing the entire runtime.

> Yet many of the language runtimes are actually following the Lisp model.

This is frankly not true in any way relevant to this conversation. Using garbage collection is not equivalent to using the entire Lisp model. Java, JavaScript, Python, Lua, Ruby, C#, etc. do not make extensive use of linked lists in their standard libraries, instead preferring--you guessed it--vectors. Hashmaps in modern languages are implemented as... vectors again. And if you dig into how structs are implemented in say, C#, they are implemented as contiguous blocks of memory.

Yes, there are some things contiguous blocks of memory can't do, so everyone has to support nested data structures, but by default, most languages push you toward contiguous memory in their standard libraries and idioms whenever possible, because it's just so obviously more performant for the vast majority of cases.

> That's not very different from any other language runtime which uses dynamic memory allocation and a heap of linked objects (-> Java, JavaScript, ...). The problem of dealing/avoiding non-locality and pointer-chasing is there, too.

Yes, and my point is that having cons cells as a ubiquitous data structure in your language greatly exacerbates this problem. The extremely common case of lists does not need to entail pointer chasing, and in the vast majority of cases it should not entail pointer chasing, and contrary to your claims here, in most languages it does not entail pointer chasing. Lisp-family languages are fairly unique in this problem. Ironically, higher order functions like map/reduce/etc. push you toward operating on lists as a whole which is exactly the case where vectors most outshine linked lists.

_19qg · on Oct 6, 2024

> My critique was about list implementation, not about implementing the entire runtime.

Linked lists are actually managed by the runtime.

Example on a esoteric platform, a Lisp Machine, a real, but outdated, computer. But we still have Virtual Lisp Machines, which have a virtualized implementation (not as in a microcoded CPU, as the real ones were).

If I cons a list via CONS, I get a linked list made of cons cells. But over the lifetime of the list, this may change. At some point in time the runtime may decide to copy the list (for example because it has a copying garbage collector). The effect then is that the list is allocated in contiguous memory. The program itself does not see a difference.

First the list (1 2 3) is allocated as [1 | -> [2 | -> [3 | NIL]]], written in Lisp as (1 . (2 . (3 . NIL))) . At some point in time the runtime changes this transparently to [1][2][3], where the cells are tagged, such that the successor is in the next memory word and where the last element is tagged that it has no successor. That's called cdr-coding.

Now If I copy a list, with COPY-LIST I always get a cdr-coded list as a result. Several list functions return cdr-coded lists, instead of linked lists.

This cdr-coding method is not much used anymore, since there is not much of an advantage: either one uses linked lists (because they are easy to use and have useful properties, like that CONS does not need to copy its arguments) or other available data structures (vectors, structures, ...).

What is still used are locality improving garbage collectors.

> Using garbage collection is not equivalent to using the entire Lisp model. Java, JavaScript, Python, Lua, Ruby, C#, etc. do not make extensive use of linked lists in their standard libraries, instead preferring--you guessed it--vectors.

Lisp also makes use of vectors, where necessary. These languages with managed memory like Java have the same object/reference model as Lisp and the same calling mechanisms. linked lists are only a special case (-> java.util.LinkedList).

Say we have N cities. Each city is a an object of class CITY. Each city will have attributes, for example NAME and PEOPLE-NUMBER. Now we want to keep two "lists" of them one sorted by name and another one sorted by PEOPLE-NUMBER.

We need two vectors and N city objects. In Lisp the city objects are not stored in the vectors. They are stored somewhere in the heap. -> there goes your "contiguous memory" to bust. The vectors are pointers into a heap. Every access to the nth city object through one of the nicely contiguous vectors, references an object which is stored somewhere else.

That model is used in many languages implementations. It's the Lisp model of memory management, introduced with Lisp 1 -> dynamic memory management plus a garbage collector to reclaim unused space.

> And if you dig into how structs are implemented in say, C#, they are implemented as contiguous blocks of memory.

That's also the case in Lisp. But a struct (or list or vector) of structs usually points to those being somewhere on the heap. A struct/list/vector of structs is not a single contiguous block of memory. structs in slots are typically not inlined. Some languages inline data non-primitive structures, Lisp usually does not.

> contiguous blocks of memory

That's can be an illusion. In a virtual memory system, the memory is made from pages of a fixed size. Random pages are cached in RAM, influenced by their usage pattern over time.

> in most languages it does not entail pointer chasing

It does, see above. Just not tail pointers in the list.

> Ironically, higher order functions like map/reduce/etc. push you toward operating on lists as a whole which is exactly the case where vectors most outshine linked lists.

That's why in Common Lisp the functions MAP and REDUCE work over vectors, too.

> Lisp-family languages are fairly unique in this problem.

see Prolog and various functional languages, ... some even have basic strings implemented as linked lists of characters (which Common Lisp does not do).

If we look at data structures, one tries do address quite a bit more than "contiguous memory", for example persistence, runtime updates and runtime lookups, ... see for examples: https://cstheory.stackexchange.com/a/1550

kerkeslager · on Oct 8, 2024

> Lisp also makes use of vectors, where necessary. These languages with managed memory like Java have the same object/reference model as Lisp and the same calling mechanisms. linked lists are only a special case (-> java.util.LinkedList).

You seem insistent on missing my point.

"Where necessary?" is almost always, because vectors almost always outperform linked lists based on cons cells. If your program uses cons cells at all, chances are you made the wrong choice. Lisp doesn't make use of vectors in many cases "where necessary" because "where necessary" is a whole lot more than Lisp's structure, libraries, and community encourage you to do.

Simply pointing out that java.util contains a Linked List implementation means nothing. I wrote Java full time professionally for ~5 years and never saw that used once. If Python or C# ship with linked list implementations I'm not aware of them because they're also never used.

To be frank, if your program uses linked lists at all, it's probably a mistake from a performance perspective. There simply aren't many cases where a linked list is a better choice than a vector. And this is a mistake that Lisp programs make all the time because cons cells are so baked into the language.

> > in most languages it does not entail pointer chasing

> It does, see above. Just not tail pointers in the list.

It is both dishonest and rude to not even quote a full sentence when you're trying to refute something.

Tail pointers on the list being avoided is exactly my point, a fact which you obfuscated by removing context. That is a huge amount of added cache thrashing, even with cache-aware allocation. This is not something you can just pretend is a minor issue.

Are you even capable of admitting that Lisp has serious faults, or do you really think it's completely, 100% perfect? This isn't a rhetorical question: I legitimately would like to hear what serious faults you think Lisp has, because if you can't present any problems with Lisp at all, then you simply aren't capable of participating usefully in conversations about Lisp. Lisp is a human invention which includes human mistakes, and if you can't admit that, anything you say about it is going to suffer from crippling bias and can't be trusted.

And to be clear, I'm not interested in further whataboutism. If your idea of a problem with Lisp is something where you can end with "but every other language has this problem too", stop--that's just further proving your inability to criticize Lisp. I'm happy to describe serious problems unique to any of the languages I've mentioned in this thread, but that's not the current topic of discussion.

Pointer chasing in linked lists is a cause of performance problems in Lisp and it isn't a cause of performance problems in other languages. No, not for retrieving each individual item, I'm talking about moving from one item to the next. Every single thing you've said in this thread has been trying to move the goalposts to make it seem like this isn't a problem or it isn't unique to Lisp. It is a problem. It is unique to Lisp(s).

_19qg · on Oct 8, 2024

> "Where necessary?" is almost always, because vectors almost always outperform linked lists based on cons cells.

Let's see, I have here a Common Lisp compiling to machine code, here non-optimized generic Lisp code:

I'm allocating n elements list/vector of vectors and reverse it p times. reversing allocates a new list/vector.

    (defun test1 (n p)
      (let ((v (make-array n)))
        (map-into v
                  (lambda ()
                    (make-array 10 :initial-element 1000)))
        (loop repeat p
                do (setf v (reverse v)))
        v))

    (defun test2 (n p)
      (let ((l (make-list n)))
        (map-into l
                  (lambda ()
                    (make-array 10 :initial-element 1000)))
        (loop repeat p
                do (setf l (reverse l)))
        l))

    CL-USER 15 > (time (test1 10000 100))
    Timing the evaluation of (TEST1 10000 100)

    User time    =        0.005
    System time  =        0.000
    Elapsed time =        0.003
    Allocation   = 9088696 bytes
    1 Page faults
    GC time      =        0.000
    #(# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # ...)

    CL-USER 16 > (time (test2 10000 100))
    Timing the evaluation of (TEST2 10000 100)

    User time    =        0.008
    System time  =        0.001
    Elapsed time =        0.006
    Allocation   = 17113288 bytes
    842 Page faults
    GC time      =        0.003
    (# # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # # ...)

The list or vector is 10000 elements long and 100 times reversed and the difference in runtime is tiny. The list version is only marginally slower, but allocates a lot more memory.

I can also construct cases, where the list is much faster. Adding a one element list/vector to the front:

    CL-USER 22 > (time (test1 10000 100))
    Timing the evaluation of (TEST1 10000 100)

    User time    =        0.016
    System time  =        0.000
    Elapsed time =        0.012
    Allocation   = 9148776 bytes
    0 Page faults
    GC time      =        0.000
    #(A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A ...)

    CL-USER 26 > (time (test2 10000 100))
    Timing the evaluation of (TEST2 10000 100)

    User time    =        0.000
    System time  =        0.000
    Elapsed time =        0.000
    Allocation   = 1042248 bytes
    0 Page faults
    GC time      =        0.000
    (A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A A ...)

Here the runtime for lists is much faster, since it does not copy the complete list or -> lists share storage, consing to the front is cheap.

-> for many practical use cases for lists the total runtime and the runtime difference does not matter

Obviously there are also many case where a list would perform much worse, like accessing the last element of a list.

> Are you even capable of admitting that Lisp has serious faults

You seem to be keen to find "faults". Usually things are not black and white. There is no true and false, but engineering trade-offs. These tradeoffs can Lisp make the wrong choice for problems. Many of the trade-offs are also influenced by non-technical issues: like the amount of engineering time invested into a language implementation: see SUN/Oracles investment into Java.

kerkeslager · on Oct 8, 2024

So, you did tests to show that linked lists are slower AND use more memory? Which we both already knew? Why?

> I can also construct cases, where the list is much faster. Adding a one element list/vector to the front:

Oh good, I'll keep that in mind for the next time I am forced to prepend a bunch of items to a list, which I expect to happen approximately 0 times before I die of old age.

> > Are you even capable of admitting that Lisp has serious faults

> You seem to be keen to find "faults". Usually things are not black and white. There is no true and false, but engineering trade-offs. These tradeoffs can Lisp make the wrong choice for problems. Many of the trade-offs are also influenced by non-technical issues: like the amount of engineering time invested into a language implementation: see SUN/Oracles investment into Java.

So... what I'm hearing here is, that no, you aren't capable of admitting that Lisp has faults.

I'm aware that tradeoffs exist. I'm also aware that faults exist. These two can both be true.

It seems to me that being slower and using more memory and getting nothing in return isn't a tradeoff.

_19qg · on Oct 8, 2024

> So, you did tests to show that linked lists are slower AND use more memory? Which we both already knew? Why?

I showed you that it often does not matter much.

> Oh good, I'll keep that in mind for the next time I am forced to prepend a bunch of items to a list, which I expect to happen approximately 0 times before I die of old age.

It's not that uncommon, for example for a simple stack implementation.

> So... what I'm hearing here is, that no, you aren't capable of admitting that Lisp has faults.

With this discussion style you won't get very far.

> It seems to me that being slower and using more memory and getting nothing in return isn't a tradeoff.

Right, I think that's a useful task for you to actually think about: what do we get in return?

Why does Haskell use linked lists?

Why does INRIA's OCAML ( https://ocamlbook.org/lists-and-structural-recursion/ ) ?

Why does Microsoft's F# ( https://learn.microsoft.com/en-us/dotnet/fsharp/language-ref... ) ?

Why does Ericsson's Erlang ( https://www.erlang.org/doc/system/listhandling.html ) ?

kerkeslager · on Oct 9, 2024

> I showed you that it often does not matter much.

...which we also both already knew.

The thing is, it does matter sometimes; often enough that there's no good reason to take the less performant route by default.

> > Oh good, I'll keep that in mind for the next time I am forced to prepend a bunch of items to a list, which I expect to happen approximately 0 times before I die of old age.

> It's not that uncommon, for example for a simple stack implementation.

1. Does Lisp not come with a stack implementation? In general, I do not plan to implement simple stacks.

2. Pushing items onto a stack isn't inherently a prepend operation. It's generally a "put it on an end" operation, meaning you could append it. Now, with a linked list that would be slower, so obviously you prepend if you're using a linked list, but if you're using a vector, you append. And, lo and behold, appending-vector-based stack performs better than a prepending-linked-list-based stack, because cache. So if I do implement a simple stack, I implement it with a vector (example[1]).

I'm a professional. I'm not writing high school CS level demo programs. If I need a stack, I'm generally using the standard implementation that ships with the language, and if the language doesn't have one, I implement a stack with a vector because there's no upside to implementing it as a linked list. So no, it's not common to prepend to a list, and unless your language is intended for writing high school CS level programs, it probably shouldn't prominently feature linked lists as a normal way to do things, and if it does, that's a problem. Not a tradeoff, because you're not getting anything in return. A problem.

> > So... what I'm hearing here is, that no, you aren't capable of admitting that Lisp has faults.

> With this discussion style you won't get very far.

And how far has your jumping in to defend Lisp and never admitting it has any faults gotten you?

> Right, I think that's a useful task for you to actually think about: what do we get in return?

Nothing I need or want.

Yes, I know the answer you're fishing for is "immutability" but the fact is if you're operating on lists as a whole, i.e. your map/reduce/etc. type operations, you can achieve immutability with vectors, and it will be more performant when you do so.

> Why does Haskell use linked lists?

> Why does INRIA's OCAML ( https://ocamlbook.org/lists-and-structural-recursion/ ) ?

> Why does Microsoft's F# ( https://learn.microsoft.com/en-us/dotnet/fsharp/language-ref... ) ?

> Why does Ericsson's Erlang ( https://www.erlang.org/doc/system/listhandling.html ) ?

Because all these languages are wearing the same FP blinders as Lisp.

Why aren't any of these languages popular? ;)

[1] https://craftinginterpreters.com/a-virtual-machine.html#the-...

kazinator · on Oct 10, 2024

It took me about a minute to find an instance of list linking in Java sources (OpenJDK):

  class JVMCIRuntime: public CHeapObj<mtJVMCI> {
    ../
    JVMCIRuntime* _next;

The Linux kernel has list linking all over the place. Userland libraries and utilities ditto.

I don't think I've ever worked on a C++ or C project that had no linked lists anywhere. I'm trying to think of an example but coming up blank.

Quite often but not always linked lists in C are intrusive. Certain object are going to be put into exactly one list. Or more than my list but only one at a time. So the link pointer or pointers are embedded into that object. And so no memory allocation is required to insert those objects into a list. A classic example of this is putting a process structure onto a scheduling queue. You would never want a vector for this, for reasons that a professional such as yourself should understand.

kerkeslager · on Oct 15, 2024

> Quite often but not always linked lists in C are intrusive.

Yes, and that's a significantly different data structure from a cons-cell based list, so I don't think it's particularly relevant. C and C++ certainly aren't examples of one of the languages lispm is claiming uses the Lisp model.

> And so no memory allocation is required to insert those objects into a list.

This is a pretty key distinction which doesn't apply to the linked list objects I'm talking about.

_19qg · on Oct 10, 2024

I haven't looked at Linux kernel sources, but I read about this "linked list" usage in the kernel. Embedded link pointers is also a good point.

Do you happen to have a good example for it?

kerkeslager · on Oct 15, 2024

Let's not pretend this proves your point in any way. Intrusive lists in a manually managed memory system are incomparable to the cons cell based linked lists in a garbage collected system which you're defending. This is particularly true when the intrustive lists require no memory allocation.

_19qg · on Oct 9, 2024

> Pushing items onto a stack isn't inherently a prepend operation.

a stack has no idea of "prepend" or similar. It's defined by abstract like operations PUSH, POP, IS-EMPTY, MAKE-STACK. How it is implemented is an implementation detail.

> It's generally a "put it on an end" operation,

Which you made up. There is nothing "generally" like that.

> meaning you could append it. Now, with a linked list that would be slower,

and to append would be unnecessary.

> So if I do implement a simple stack, I implement it with a vector (example[1]).

That's a simple "bounded" & "fixed size" stack.

> I'm a professional.

Okay. ;)

> Yes, I know the answer you're fishing for is "immutability"

That would be one answer. Can you think of more?

kerkeslager · on Oct 15, 2024

> > Pushing items onto a stack isn't inherently a prepend operation.

> a stack has no idea of "prepend" or similar. It's defined by abstract like operations PUSH, POP, IS-EMPTY, MAKE-STACK. How it is implemented is an implementation detail.

You say that as if the fact that it's an implementation detail means it doesn't matter.

Implementation details do, in fact, matter.

> > It's generally a "put it on an end" operation,

> Which you made up. There is nothing "generally" like that.

So you're saying that stacks don't have a "push" operation? Or that you don't understand what "push" does?

> > meaning you could append it. Now, with a linked list that would be slower,

> and to append would be unnecessary.

Literally picking and quoting parts of two sentences out of context to state the obvious as if it's a response to anything I said in context is a new low even for you.

> > So if I do implement a simple stack, I implement it with a vector (example[1]).

> That's a simple "bounded" & "fixed size" stack.

No, it's not. Are you aware that vectors can be resized?

> > Yes, I know the answer you're fishing for is "immutability"

> That would be one answer. Can you think of more?

I'm not going to make your argument for you. If you have something to say, say it.

_19qg · on Oct 4, 2024

Lisp programs often needed large amounts of virtual memory. Factor ten (VM size vs. RAM size) was not unheard. RAM was basically a cache for that.