> Too many compilers classes get bogged down in grammar classifications and parsing.
Oh so very this.
This has repercussions far beyond students who can't write compilers. It infects the entire culture of software engineering with people who think that syntax is everything, and who spend their lives designing grammars and writing parsers for them. The problem with that is for every new grammar, it's not just that you need a new parser, but some human needs to learn that grammar too if it's going to do you any good at all. So we have this proliferation of zillions of grammars, not just in programming languages, but in config files, data formats, etc. Two examples: XML and JSON are bad re-inventions of S-expressions. And the sieve language (which I am currently dealing with because I'm working on a new spam filter) is just an abomination from top to bottom.
I agree about XML, but I don't think JSON is a reinvention of S-expressions. JSON is great if you are representing a lot of (string) key/value data since it has a canonical way of representing string-keyed dictionaries. You can represent these as a list of pairs, but that is ambiguous with data that really is just a list of pairs. JSON doesn't have this problem: if you see {"name": ...} you know you are reading an object, which has a 1-1 mapping to a dictionary with string keys.
While this seems like an innocent change, it simplifies serialization a lot. A generic S-expression deserializer can't make nearly as many assumptions as a JSON deserializer can.
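A minimal Python sketch of that ambiguity (the list-of-pairs encoding here is just one illustrative convention, not any standard):

```python
import json

# A JSON object maps unambiguously to a dict: the {...} syntax
# itself tells the deserializer "string-keyed dictionary".
obj = json.loads('{"name": "Alice", "age": 30}')
assert isinstance(obj, dict)

# A plain s-expression has no dedicated dictionary syntax, so a dict
# must be encoded as, say, a list of pairs -- which is indistinguishable
# from data that really is just a list of two-element lists.
pairs = [["name", "Alice"], ["age", 30]]   # a dict? or a list of pairs?
as_dict = dict((k, v) for k, v in pairs)   # the caller must decide
assert as_dict == {"name": "Alice", "age": 30}
```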
S-expressions have first class support for primitives (i.e. strings, numbers, etc.) and lists but nothing else. On the other hand, JSON has first class support for the same plus string keyed dictionaries. It is no coincidence that most other generic serialization formats have included at least support for string dictionaries if not other key types as well.
Sure, but 1) s-expressions also have native symbols whereas JSON has only strings, and 2) it is trivial to extend standard s-expression syntax to include a native dictionary serialization, and to extend existing s-expression parsers to parse that extension. It is much, much harder to add symbols to JSON. JSON is also very profligate with its use of punctuation, with tons of unnecessary commas and colons all over the place. So I stand by my initial characterization.
Also, Common Lisp s-expressions include a standard syntax for structures, which are essentially dictionaries. Extending that to a generic dictionary (using, say, #D(key value key value ...), which is actually available) takes about three lines of code.
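For illustration only, here is a toy reader in Python rather than Lisp that treats #D(key value ...) as a dictionary form, in the spirit of the extension described above. It is a sketch, not a conforming s-expression parser: atoms stay strings, and there is no quoting, escaping, or error recovery.

```python
import re

# Tokens: the #D( opener, bare parens, or runs of non-space non-paren atoms.
TOKEN = re.compile(r"#D\(|[()]|[^\s()]+")

def parse(text):
    form, rest = read(TOKEN.findall(text))
    assert not rest, "trailing tokens"
    return form

def read(tokens):
    tok, rest = tokens[0], tokens[1:]
    if tok in ("(", "#D("):
        items = []
        while rest[0] != ")":
            item, rest = read(rest)
            items.append(item)
        rest = rest[1:]  # drop the closing ')'
        if tok == "#D(":
            # Pair up alternating keys and values into a dict.
            return dict(zip(items[0::2], items[1::2])), rest
        return items, rest
    return tok, rest  # an atom

assert parse("#D(name Alice age 30)") == {"name": "Alice", "age": "30"}
assert parse("(a b (c d))") == ["a", "b", ["c", "d"]]
```

The point being made above survives the translation: the only change from a plain list reader is the three-line branch for the `#D(` token.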
> 1) s-expressions also have native symbols whereas JSON has only strings,
This is inconsistent with the claim that s-exprs are better because of their simplicity. Having two very similar string-like things is unnecessarily complex for little (no?) benefit in return.
> 2) it is trivial to extend standard s-expression syntax to include a native dictionary serialization
Sure, and when you do you get something at about the level of complexity of JSON, so it's not clear what the value proposition is here.
> JSON is also very profligate with its use of punctuation, with tons of unnecessary commas and colons all over the place. So I stand by my initial characterization.
JSON uses a richer set of delimiters, which provides:
1. Greater brevity. Basic information theory says that it takes fewer characters to encode a given piece of data if your character set is larger.
2. Some level of redundancy. This isn't necessary for data transmission for JSON, but it does make it much easier for parsers to provide good localized error reporting.
3. Easier visual parsing. Most humans have functional eyeballs connected to optical processing neurons dedicated to detecting different shapes. Using characters with a greater variety of shapes to encode structure takes advantage of that.
Don't get me wrong, I like Lisps and s-exprs. But it seems like any time any notation comes up for discussion, a lisper appears to claim how s-exprs are clearly superior. This despite the fact that that notation is decades older than almost every other syntax out there and yet still lost the popularity contest.
I mean, greater brevity in theory, but having used both s-expressions and JSON for config files, I can tell you that the s-expression version ends up noticeably less verbose and noisy in practice.
> This despite the fact that that notation is decades older than almost every other syntax out there and yet still lost the popularity contest.
Beyond some minimal level, quality and popularity aren't all that heavily correlated; something not becoming popular tells us almost nothing about it.
> Having two very similar string-like things is unnecessarily complex for little (no?) benefit in return.
Using symbols (or :keywords) and strings together offers many benefits:
1. Like with JSON's richer syntax for aggregates, omitting the "" makes for much easier visual parsing. Especially if symbols are usually used for keys.
2. Symbols may be suitable for interning, perhaps into a set used to validate input data. Strings usually aren't.
3. Most importantly, with this distinction, symbols can be imbued with particular semantics: they can be commands, references, lookup keys, whatever the program needs. Strings are just data, usually to be displayed somewhere to be read by someone, or to be parsed by something nasty like an interpreter.
The most human-readable data notation I know of is EDN. It combines the best of JSON and s-expressions.
> s-expressions also have native symbols whereas JSON has only strings
I'm not sure “native symbols” are a good thing in an interchange format. If you are serializing constructs from a language (Lisp, Erlang, Ruby) where that's a fundamental type, sure, it's convenient, but that's largely a language implementation detail; from an interchange perspective there's not a lot of reason to distinguish symbols from (possibly format-constrained) strings. That they are immutable (in languages where strings are otherwise mutable) and/or interned doesn't mean anything at the interchange level.
> 1) s-expressions also have native symbols whereas JSON has only strings
The only standardization effort of generalized s-expressions that I know of is http://people.csail.mit.edu/rivest/Sexp.txt, which makes strings and symbols/tokens equivalent.
Most languages do not have a runtime symbol table or the concept of a symbol as a data atom. In languages outside Lisp, what would a symbol even mean?
> extend existing s-expression parsers to parse that extension. It is much, much harder to add symbols to JSON
This sounds like a form of "No True Scotsman" argument. If you're extending S-Exp parsing via Lisp, then you can extend JSON parsing too. Once you add code, then it's not a format. It's whatever you want it to be.
> Extending that to a generic dictionary (using, say, #D(key value key value ...), which is actually available) takes about three lines of code.
Three lines of code in what language? C? C++? Java? It should be obvious you're conflating two things here. Generalized formats vs. a full Lisp ecosystem.
And you need a gazillion things on top of a grammar: error recovery, an IDE, a language server, a formatter, all the memory bugs, all the semantics that isn't described in a formal language (so you can't auto-generate code to check or enforce it), all the binding generators for other languages and linking conventions... Ugh, I wish we had a standard tool for that, like a much, much expanded yacc + interpreter generator + batteries included... There's something with libadalang and its language-agnostic tooling langkit (https://github.com/AdaCore/langkit) that is very seductive (write once, generate much code).
Sorry, I'm trying to find out more but the project is a bit confusing. I found http://www.dotgnu.org/treecc_essay.html, which looks of historical value (nowadays Java code generation / bytecode engineering is used quite often with amazing success). And of course libjit, but I'm at a loss finding out about semantic attributes, functions, typechecking, and the rest.
You need a lot of predefined structure to capture attributes using brackets, much more than you do for dictionaries - which are just ordered pairs in a list when serialized.
As an informally trained writer of compilers, I believe parsing could be reduced to just these concepts:
1. Greedy matching
2. Minimal matching
3. Tokenization
Regular expressions, and implementing a subset of their operators, are a good way to introduce and discuss each; recursive descent is the elaboration on that, the graduation to a customized mechanism. The last step is to generalize it all to a form of constraint logic and introduce other forms of backtracking (e.g. pathfinding algorithms). Discussion of types follows from discussion of constraints and explains "smartness" in compilers, so it might be the last thing if I were teaching the course.
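A sketch of those three concepts, plus the recursive-descent "graduation", using nothing but Python's re module (the tiny expression grammar is invented purely for illustration):

```python
import re

text = "<a><b>"
# 1. Greedy matching: .* consumes as much input as it can.
assert re.match(r"<.*>", text).group() == "<a><b>"
# 2. Minimal (lazy) matching: .*? consumes as little as it can.
assert re.match(r"<.*?>", text).group() == "<a>"

# 3. Tokenization: turn a character stream into a token stream.
TOKEN = re.compile(r"\d+|[-+*()]")
def tokenize(src):
    return TOKEN.findall(src)

# Recursive descent as the customized mechanism, for the toy grammar:
#   expr := term (('+'|'-') term)*    term := NUMBER | '(' expr ')'
def parse_expr(toks, i=0):
    value, i = parse_term(toks, i)
    while i < len(toks) and toks[i] in "+-":
        op, (rhs, i) = toks[i], parse_term(toks, i + 1)
        value = value + rhs if op == "+" else value - rhs
    return value, i

def parse_term(toks, i):
    if toks[i] == "(":
        value, i = parse_expr(toks, i + 1)
        return value, i + 1  # skip the closing ')'
    return int(toks[i]), i + 1

assert parse_expr(tokenize("1+(4-2)"))[0] == 3
```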
What I really think is at issue with our vast number of grammars is just the reliance on text as the interface. If we discuss parsing as a thing applicable to arbitrary data streams, we can start asking "well, what if we don't serialize it to characters, but to some other token format?" That is way more interesting today, since we seem to have dispensed with the premise that natural language is the way to program computers, which is what really got the whole parsing thing started.