Mistakes in programming language design (tuxen.de)
39 points by SlyShy on July 9, 2010 | hide | past | favorite | 44 comments


I totally disagree on the parser-unfriendly syntax point (no. 1). C++ is a particularly bad example in that code cannot be definitively parsed without semantic information; however, I think limiting yourself to a language which can be expressed as LALR/LL(1) is pretty brain-dead - you limit yourself to what a given algorithm can express clearly rather than what is clearest to the programmer. It's actually pretty hard to make a language "naturally" LALR anyway, at least without hacks of some kind. C# definitely isn't LALR; many features get in the way of LALR-ness, such as generics, which are nonetheless useful.

I do agree on every other point, however!


And I completely agree with you here.

The progress in programming language design is going to come from eliminating human-unfriendly syntax. The closer you can get to something that lets someone express their intentions without restrictions, the better. The computer is, after all, a servant of the human.

There are many barriers in the way of programming languages, or computers in general, adapting to human functioning. But whenever someone takes the line that humans must adapt to computers, it is a fail. And that fail is going to be swept away by the next technology that just works.


The progress in programming language design is going to come from eliminating human-unfriendly syntax. The closer you can get to something that lets someone express their intentions without restrictions, the better. The computer is, after all, a servant of the human.

And a servant isn't very useful if it can't understand its master. Ignoring machine-friendliness in language design just leads to worse, buggier tools (including compilers).

whenever someone takes the line that humans must adapt to computers, it is a fail. And that fail is going to be swept away by the next technology that just works.

Except the "just works" is far more likely to be actually achieved if it's easy to make it work as well as easy to understand.

Besides, "human-friendly vs. computer-friendly" is a false dichotomy. There are certainly things that are good for one and bad for the other, but language features can also be good for both, or very good for one while incurring only small disadvantages for the other.


But don't forget that we don't have to limit ourselves to LALR or LL(k). Approaches like GLR allow the user to be happy as well as the tool writers.

And again, don't forget that the tool writers have a harder time if they have to mangle their grammar to fit LALR and then write a bunch of hacks around it to get things to work.

Working around an algorithm which can't express usable languages properly is a bit silly I think. It's better to improve the algorithm in the first place and keep everybody happy!!


I do not know a single editor which does syntax highlighting by actually parsing (including semantic analysis) the code. Most of them use regular expressions at most.


Programming languages force programmers to think clearly and express ideas unambiguously. It's not a bad thing we should avoid; it helps us write robust software.


Both points are valid. Programming languages should be designed for people to use, and only secondarily for machines to execute. However, they should be designed so as to encourage clear expression of ideas when practical.

I think whether a general-purpose language should discourage or prevent unclear thinking is an open question. I don't think I've seen an example of a language doing so without being somehow crippled, and that's a Bad Thing.


"Programming languages should be designed for people to use"

Exactly. Which means it must be tool-friendly, hence LALR/LL(k). Unless you want to do all your refactoring yourself.


I don't think LALR/LL(k) implies automated refactoring is easy, nor that anything else makes it impossible!

Obviously non-context-free grammars are a problem, but a GLR or PEG grammar is still perfectly refactorable.

You forget that refactoring an intuitive grammar to LALR involves totally mangling that grammar and making it more difficult for everyone.

If the language is expressed as a GLR or PEG grammar then tool makers have it easier, I reckon - the grammar is easier to understand.

The way to really ensure it's easy for everyone is for the compiler writer to provide some abstract yet intuitive representation of the underlying grammar.

It really isn't an either/or, it's a case of both at once, I think!
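To make the PEG side of this concrete: a PEG grammar is essentially a set of recursive functions with ordered choice, which is why it tends to stay close to the way you'd write the grammar on paper. Here's a minimal hand-rolled sketch in Haskell - all names are invented for illustration; this is neither GLR nor a production parser, just the shape of the idea:

```haskell
import Data.Char (isDigit)

-- A parser consumes a prefix of the input and returns a value plus the rest.
newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }

char :: Char -> Parser Char
char c = Parser $ \s -> case s of
  (x:xs) | x == c -> Just (c, xs)
  _               -> Nothing

digit :: Parser Char
digit = Parser $ \s -> case s of
  (x:xs) | isDigit x -> Just (x, xs)
  _                  -> Nothing

-- PEG's defining feature: ordered choice. Try the first alternative,
-- and only if it fails, try the second - no ambiguity, no conflicts.
orElse :: Parser a -> Parser a -> Parser a
orElse p q = Parser $ \s -> case runParser p s of
  Nothing -> runParser q s
  r       -> r

many1 :: Parser a -> Parser [a]
many1 p = Parser $ \s -> case runParser p s of
  Nothing      -> Nothing
  Just (x, s') -> case runParser (many1 p) s' of
    Nothing        -> Just ([x], s')
    Just (xs, s'') -> Just (x:xs, s'')

number :: Parser Int
number = Parser $ \s -> case runParser (many1 digit) s of
  Nothing       -> Nothing
  Just (ds, s') -> Just (read ds, s')

-- expr <- number ('+' number)*   ... the rule reads like its grammar.
expr :: Parser Int
expr = Parser $ \s -> case runParser number s of
  Nothing      -> Nothing
  Just (n, s') -> go n s'
  where
    go acc s = case runParser (char '+') s of
      Nothing      -> Just (acc, s)
      Just (_, s') -> case runParser number s' of
        Nothing       -> Nothing
        Just (m, s'') -> go (acc + m) s''

main :: IO ()
main = print (runParser expr "1+2+3")   -- Just (6,"")
```

The point for tool writers: each rule is an ordinary function, so extending the grammar is adding a function, not untangling shift/reduce conflicts.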


I think you hit the nail on the head there. "Context-free" is more of a requirement than LALR or LL. The author clearly isn't a parser writer. I am, but I've yet to try a PEG grammar because ANTLR does such a great job.

But this doesn't change his point (or mine): languages should be created that are easy to write parsers for, so that it's easier to write good tools.

And I think this applies to the grammar too: ANTLR's .g files are a good grammar language because it's easy to write tools for them - hence AntlrWorks.


I'm the author, and I did write some parsers. However, I'm more in the write-parsers-by-hand camp, and context-free is not so hard there. ANTLR needs strange hacks to parse Python with its whitespace indentation; a handwritten parser just needs some context within the lexer.

In my eyes a parser generator is the solution if you want to trade efficiency for cross-language portability. ANTLR is probably the best option at this point, though the C backend is quite weird.


I like to get shit working first, and then hand-code if the parser generator does a horrible job. Nothing beats a hand-written anything, but I like having the option of tools. The strange hacks for Python support your original argument, I believe.


There's no language of higher order than machine code that allows you to specify computer behavior unambiguously (even that's doubtful).

In fact, any entity abstract enough to be called "an idea" is going to be ambiguously specified. The question is what sort of ambiguity is most useful - ambiguity about which object shows itself on the screen is great in an object-oriented window manager; ambiguity about which memory location to read when dereferencing an uninitialized pointer is less useful.


I completely agree with the parser friendly syntax (and disagree with you).

The issue is that tools are important. C++ is famously tool-unfriendly when, in fact, it has the most need for tools to verify it.

From my understanding of the generics problem, generics do not inherently get in the way of LALR requirements. You just have to implement them slightly differently than C# does, to make the tokens for the generic types more identifiable. So you can do generics; they just can't look the way C++ makes them look.


That is not the only problem I had with writing an LALR C# grammar. Anything even slightly touching anything else used for other constructs caused endless shift/reduce and reduce/reduce conflicts. Not to mention I had to totally mangle the intuitive representation of a C# grammar to fit LALR.

Forcing tool writers to have to do the sort of shit I did, even without generics, is not the way forward.

The alternative of a fully LALR-able language is not necessarily desirable either - 'I want to add a feature which makes my language far more usable, oh hang on, that screws the LALR. Oh well, let's scrap that then'.

How about looking at LALR alternatives?!


It would be interesting to see how often code is written for the language vs how often a parser is written for the language before tuning the syntax for one or the other.

Also I suspect that a sufficiently well designed parser api, even if it internally uses quirky non LALR parsing techniques, will get a lot of tools going without the need for rewriting a parser each time.


C++ is a pertinent example of terrible syntax design. Not only does the complexity make it a nightmare to write a standards-compliant compiler, it also slows down development because you need a lot of non-local information to read code.


Yeah, Ok. http://www.youtube.com/watch#!v=v9kTVZiJ3Uc

"A keyboard? How quaint!"

Sure, it would be great if you could just describe what it is you are after, except for two problems:

1. Programmers out of a job.

2. How good are you at describing what you want? In English? You think an AI will be able to do it better than a lawyer?

What has this got to do with your point? Well, until that time, your productivity is going to be determined by the quality of your tools. You are going to need to iterate, test and refactor. To do that you need good tools, and good tools are hard to write for languages that aren't easily parsed. Compare the refactoring tools for C++ (limited) and Objective-C (ha ha) vs. C# or Java. My productivity doubles or triples in Java and C# because I'm not spending my time moving characters around files - I'm programming.

http://blog.binaryfinery.com/?p=250


You're comparing four relatively similar C-family languages - it's not too surprising that tool support is the biggest difference between them. Throw Ruby or F# into the mix if you really want to compare language quality vs. tool support.


It's possible that I'm missing the thrust of #0, but I don't see how an object-oriented language completely without the concept of a "null pointer" could be used to construct elementary data structures like linked-lists and trees without resorting to special classes as terminators. It would be just as clumsy as using null pointers in the first place.

If "not-null" references were provided in addition to normal references, the effect would be pretty similar to the common usage of "final" fields in Java - compiler-enforced initialization.


> It would be just as clumsy as using null pointers in the first place.

When you combine this with type-inference, it's actually really convenient. Check out the Maybe monad in Haskell or the option type in OCaml - nullable types are boxed in a variant type of None | Some 'a, you're reminded to check for both possibilities in the few places where it actually matters, and you can quit wasting time and bloating your code writing null tests, forever.

You can define special sentinel subclasses of things in OO languages, but there's usually quite a bit of boilerplate compared to just adding a variant type in ML (and the compiler still can't guarantee that nulls will never occur), so people don't usually bother.
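A small Haskell sketch of the point (the names and data here are invented for illustration): the only way to reach the value is through a pattern match, so the "null check" can't be forgotten - and everywhere a value is statically non-Maybe, no check is needed at all.

```haskell
import Data.Maybe (fromMaybe)

-- lookup returns a Maybe; there is no way to "dereference" a Nothing.
phoneOf :: String -> Maybe String
phoneOf name = lookup name [("alice", "555-1234"), ("bob", "555-5678")]

-- The compiler forces us to handle both constructors here...
describe :: String -> String
describe name = case phoneOf name of
  Just phone -> name ++ ": " ++ phone
  Nothing    -> name ++ ": no number on file"

main :: IO ()
main = do
  putStrLn (describe "alice")                   -- alice: 555-1234
  putStrLn (describe "carol")                   -- carol: no number on file
  -- ...or we can collapse the check into a default:
  putStrLn (fromMaybe "none" (phoneOf "bob"))   -- 555-5678
```

Anywhere a plain String is expected, passing a `Maybe String` is a compile-time type error - that's the whole null-safety argument in one line.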


I was wondering that too, so I looked up the linked "Cyclone" language. If I'm understanding what I read there, the entire #0 is pretty misleading.

Basically, it boils down to adding a "qualifier" to the language, so you can mark pointers as nullable / never-null, so the system doesn't have to check.[1] You can also mark them as "thin" (no pointer arithmetic) and "fat" (safer pointer arithmetic), all of which is accomplished by putting a "@nullable" / "@notnull" / etc. qualifier on the pointer when you declare it (I think). @nullable does a check before each access, @notnull does not.

So, it's C + super-static-types to make C somewhat safer. At the cost of making the language significantly more complex. But still not safe, as this is put there by the programmer - people make mistakes. And C programmers seem especially vulnerable to making things unsafe "for speed"... what's to stop an overzealous optimizer from marking more pointers as @notnull?

[1]: http://cyclone.thelanguage.org/wiki/Pointers


If the compiler can't statically prove that a pointer can't be null, any attempt to convert it to *@notnull inserts a runtime assertion that it isn't null, according to the website.


Enabling faster compiled code if it can prove it can't be null? Makes sense, and a useful trade-off for essentially polluting the language with yet more keywords. But that still means you can force it to read things as never being null, even when they could be, and I still don't trust developers (despite (because of?) being one).

Unless I'm reading that wrong?


There are alternatives to making references nullable by default.

For example, functional programming languages often support algebraic data types and pattern matching. Creating the equivalent of a nullable value is one possible application of these features. Typically, the type system would make sure you can't get to the underlying value without first checking that your data is non-null via pattern matching, so the equivalent of dereferencing a null pointer simply isn't possible without generating a compile-time error.

Note that this idea is independent of the underlying type, so there's no particular reason you couldn't have a type system that could wrap an OO class type in an algebraic data type. Unfortunately, for now, these ideas seem to live in different worlds: I know of no mainstream OO language that has anything close to the power of algebraic data types and pattern matching that is commonly available in functional programming languages. But this is one way you could have an OO-friendly language with powerful data structures and without relying on the idea of a nullable reference.


This was the first thing I thought of. Sum types with pattern matching (which enable Maybes) provide safe options for these data types. An example Haskell type that keeps the surface idea of nulls (via Maybe holding an object) might be something like:

  data List a = Cons a (Maybe (List a))
  data Tree a = Branch (Maybe (Tree a)) (Maybe (Tree a))
The type system removes the concept of 'dereferencing nulls' while maintaining that if the pointer isn't null it still must be the (polymorphic) type a.

Of course, the simpler implementation folds the Maybe into the sum type itself and is more like:

  data List a = Cons a (List a) | End
  data Tree a = Branch (Tree a) (Tree a) | Leaf a
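Spelling out the simpler list definition as a self-contained, compilable sketch (helper names invented for illustration): End plays the structural role of the null pointer, but every consumer is forced by the type checker to handle it.

```haskell
-- 'End' replaces the null pointer; every pattern match must cover it.
data List a = Cons a (List a) | End

toBuiltinList :: List a -> [a]
toBuiltinList End         = []
toBuiltinList (Cons x xs) = x : toBuiltinList xs

-- There is no way to take the head without handling the empty case.
safeHead :: List a -> Maybe a
safeHead End        = Nothing
safeHead (Cons x _) = Just x

main :: IO ()
main = do
  let xs = Cons 1 (Cons 2 (Cons 3 End)) :: List Int
  print (toBuiltinList xs)             -- [1,2,3]
  print (safeHead (End :: List Int))   -- Nothing
```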


Pattern matching seems to mix really poorly with conventional OO subtyping / inheritance, though multi-methods seem like a good synthesis. (I haven't really used CLOS enough to decide, though.)


I'm not sure OO fundamentally requires the kind of inheritance relationship that is familiar from languages like C++, Java and C#, though.

For example, the Liskov Substitution Principle is phrased in terms of a subtype relationship between two types. This is a logical relationship rather than a physical one: it is about whether both types present the same interface in terms of (some aspect of) their state and behaviour, and therefore generalised code can be written that interacts analogously with instances of either type.

This subtype relationship need not be realised via a subclass relationship, which is a physical relationship related to the mechanics of inheritance hierarchies and so on.

I don't see any problem with mixing pattern matching and OO, if you allow a representation as an algebraic data type as one view of a class and provide a mechanism to convert between them in some suitable way. Indeed, I'm fairly sure I've seen papers investigating this approach, though I'm afraid I can't immediately remember what the authors called it so I could look them up; if anyone knows what I'm talking about, please post citations if you have them.


Late reply to a late reply.

While I think you're probably right, if I'm already representing things with algebraic data types, I'd rather just skip using OO altogether.


Linked lists are a particularly poor example, though. It is just as easy to make them without null pointers as with null pointers:

  struct list_node { list_node *next; };
  struct list { list_node sentinel; };

I.e., a circular linked list. In C++ speak, begin() is sentinel.next, end() is &sentinel. In an empty list, sentinel.next points to &sentinel.

It might seem more clumsy, but it isn't really any more work. When you actually do need the equivalent of a null pointer you can use an option type to specify that you either want nothing or a valid pointer.


Welcome to the idea of the "null object". It's an object which satisfies all of the use cases of the null value but doesn't cause you to litter nearly every line of your code with defensive null checks to avoid your program randomly barfing. When done right it's a far more elegant solution than "null".
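A minimal sketch of the null-object idea in a functional style (all names invented for illustration): the do-nothing object satisfies the same interface as the real one, so call sites never branch on null.

```haskell
-- A record of operations stands in for an OO interface.
newtype Logger = Logger { logMsg :: String -> IO () }

realLogger :: Logger
realLogger = Logger (putStrLn . ("log: " ++))

-- The "null object": same interface, no effect, no null checks anywhere.
nullLogger :: Logger
nullLogger = Logger (\_ -> return ())

-- Callers just use whatever logger they're handed.
process :: Logger -> Int -> IO Int
process lg n = do
  logMsg lg ("processing " ++ show n)
  return (n * 2)

main :: IO ()
main = do
  r1 <- process realLogger 21   -- prints "log: processing 21"
  r2 <- process nullLogger 21   -- prints nothing
  print (r1, r2)
```

Either way, `process` contains zero defensive checks - which is the elegance being claimed for the pattern.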


> Dynamic languages are fast enough to implement internet services and outgrow the demeaning term "scripting language".

This is never going to happen, because "scripting language" is used as a slur or insult, rather than a term with any usefully defined meaning. It's not possible for Python to "outgrow" being a scripting language as long as that's used as a condescending shorthand for "doesn't look like C++".

Consider Ruby. How many applications do you know of which use Ruby for scripting? Or Python -- I think my system has two applications which can be scripted in Python (Gimp and Blender), but many dozens of applications written in Python. If Ruby and Python are scripting languages, then so are Smalltalk, Haskell, Boo, or dozens of other high-level languages. And yet, you'll never hear of these being called "scripting languages" because nobody has an axe to grind against them.

------------

Unrelated to that, Haskell does have pointers (including null pointers) -- you can use them just like pointers in C/C++.
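For the record, that's via the FFI layer: the standard Foreign.Ptr, Foreign.Storable and Foreign.Marshal.Alloc modules give you raw, C-style pointers, null included. A tiny sketch:

```haskell
import Foreign.Marshal.Alloc (alloca)
import Foreign.Ptr (Ptr, nullPtr)
import Foreign.Storable (peek, poke)

main :: IO ()
main = do
  -- A genuine null pointer, comparable just like in C:
  print ((nullPtr :: Ptr Int) == nullPtr)   -- True
  -- Allocate, write through, and read back through a raw pointer:
  alloca $ \p -> do
    poke p (42 :: Int)
    v <- peek p
    print v                                 -- 42
```

The difference from C is that these live in IO and you opt in explicitly, rather than every reference being nullable by default.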


"This is never going to happen, because "scripting language" is used as a slur or insult, rather than a term with any usefully defined meaning."

This isn't necessarily true. See Ousterhout's dichotomy: http://home.pacbell.net/ouster/scripting.html


That's a really wonderful link, and I wish more people used the definition it advocates. I've one minor complaint, in that it conflates typeless and dynamically-typed languages, but considering when it was written that's not a big deal -- strong, dynamically-typed languages probably didn't exist in any significant way.

Unfortunately, almost every use of "scripting language" I've ever heard is more along the lines of:

"We can't write our GUI frontend in Python^! Scripting languages are too slow for rendering advanced graphics."

Usually uttered by somebody who's been writing C++ for 20 years and refuses to touch anything more recent.

^ Substitute Boo, Haskell, Scala, Clojure, or anything else without enough curly braces


(I know this is just an example but I can't resist commenting)

"We can't write our GUI frontend in Python^! Scripting languages are too slow for rendering advanced graphics."

After downloading and using Avant Window Navigator for Linux a little, I came to exactly that conclusion. It was horrifically slow. Maybe I'm so much smarter than the authors of that program that I could write something acceptable where they couldn't, but I wouldn't want to take that chance.

Maybe Python gives other scripting languages a bad name. I've written large web apps in Python and Ruby, but for desktop programs the choices with acceptable speed that I know of are C++, Java, and C#.

Rapid-prototyping languages make sense for the web. For the desktop, I think the problem is that OSes have been adding crud at the same rate Intel has been adding speed. Also, the size of hard disks has increased even more quickly, so an app that happens to index files must index 100x as many - again negating increased processor speed.


AWN is written in C; if it's slow, that's most likely due to your graphics drivers.


Why, you are correct.

Well I'm really not sure what's happening because other dock programs work fairly well.

Also, even the demo video here looks "slow" to me: http://www.youtube.com/v/mHDL4dIH10I - the icons' changes lag behind the movement of the mouse. Perhaps it's just the way the thing is animated - what seems responsive is different from merely running quickly.


strong, dynamically-typed languages probably didn't exist in any significant way

Need I do the Smug Lisp Weenie thing?

That said, a lot of people didn't make the distinction between dynamically-typed and untyped twelve years ago. It was common to refer to Scheme as untyped then, and I think a few people still do.


I haven't used LISP in years, so my information may be out-of-date, but isn't it mostly untyped? I don't remember it having any mechanism for defining types -- the only types were conses and atoms (numbers, strings, etc). There was no way to say "this string is a Name and this is a City, and they are different types".


Common Lisp has several kinds of user-defined types (types, structs, classes). It is possible to write non-trivial code without ever using them, and I don't think they really fit in to traditional Lisp style. Classes are fairly popular in modern Common Lisp code.


Prolog and APL had also been around for a while, or so I hear.


I agree with you, but I think jmillikin's point is when most programmers hear "scripting language", they think "dumbed-down programming language" rather than "cleanly separated layers of design".


I don't necessarily agree with #2. Yeah, SML has a formalized syntax, but it's also a frozen language. It simply isn't practical to have completely formalized semantics for a language like Python or Ruby that is still actively being changed. Nor is it always possible to eliminate all implementation-specific quirks.

In short, it's a good thing for a language to have few quirks, but there is such a thing as overspecifying.


These things don't need to be taken to their logical extremes, though - handling most of the semantic edge cases is still an improvement, even if it isn't exhaustive.

Emphasizing well-defined semantics could also be a push to keeping the language small and clean.



