
A parser should never crash on bad input. If it does, that's a serious bug that needs immediate attention, since that's at least a DoS vulnerability and quite likely something that could result in remote code execution. You definitely need to assume that the parser could fail, but that's different. Unless you're using "crash" in some way I'm not familiar with?


From the context, I'm assuming s_q_b simply means that if given malformed input, the parser should communicate the problem via the appropriate error channels for the given language. In Java, throw an exception; in Go, return an error, etc.

But yeah, the article's example of an Xcode segfault is a good example of both poor parsing logic and poor isolation of fault domains. If your JSON processing library can corrupt the whole process, then something's wonky.


Yes, exactly this.

What I see a lot is initializing an object from a JSON parser which, when it receives malformed input, either halts execution outright or returns an empty object. In both cases you just want to make sure to handle the error appropriately per-language.
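In Python, for example, "handle it per-language" might look like the sketch below (parse_payload and its fallback behavior are illustrative, not from any particular library):

```python
import json

def parse_payload(raw):
    """Decode a JSON payload, surfacing failure explicitly
    instead of returning an empty object or halting."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:
        # Malformed input is an expected case: report it and
        # let the caller decide what to do.
        print(f"rejecting payload: {exc}")
        return None
```

The caller then has one unambiguous signal (None, or the exception if you re-raise) rather than a silently empty object.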

Again, the main offender here is PHP, which makes this issue surprisingly easy to get wrong. (And add "==" to your lint checks! 99% of the time coders mean "===", and this can lead to surprisingly severe security issues.)


If you meant that it would throw or return an error on malformed input, I totally agree. You must always handle errors that your calls might return, and especially when those calls are given anything derived from outside input.


Yes, generally, and specifically that failure to handle this type of error can cause all sorts of plague, pestilence, and application compromise.


Indeed, forgetting to check for errors is a great way to cause all sorts of terrible bugs. That's one reason I'm starting to become a fan of checked exceptions and result types.


It can always crash when there is not enough memory.

Especially on the stack.

One of the test cases has 100000 [ characters, and my own JSON parser crashed on it, because it parsed recursively and the stack overflowed. So I fixed it to parse iteratively.

It still crashes.

Big surprise. Turns out the array destructor is also recursive, because it deletes every element in the array. So the array destructor crashes at some random point, after the parser has worked correctly.
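For comparison, CPython guards against this failure mode by converting deep recursion into an exception rather than a stack smash; a quick way to see it (the exact depth guard is an interpreter detail, but the default recursion limit is around 1000):

```python
import json

# Nesting far deeper than CPython's default recursion limit:
# the parser's recursive descent hits the interpreter's guard
# and raises instead of overflowing the C stack.
deep = "[" * 100000 + "]" * 100000
try:
    json.loads(deep)
except RecursionError:
    print("rejected: nesting too deep")
```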


In "safe" languages (i.e. ones which aren't supposed to segfault/etc unless there's a compiler/interpreter/VM bug -- PHP, Java, JS, Python all fit in here) I've often heard "crash" being used to just mean an uncaught exception, or something else that brings the program crashing down.

GP says "or catch the error", so they're probably using that term in this sense.


Some parsers are dealing with known good input, and should not include validation code for performance reasons. Often you will parse the same JSON many times throughout a pipeline, but you only really need to validate it once. A good example of this is a scatter-gather message bus, where the router parses a message record, scatters it to a large number of peers, and gathers responses. Depending on how latency-critical this system is, it can make sense to validate and normalize the representation at the router, and send normalized (whitespace-free) JSON to the compute nodes.


Even if the json source is $DEITY's own platinum-iridium json generator, the json has to actually get to your parser somehow, and you can't guarantee that it did so in one piece. Memory is finite, no bus is perfect, etc.


Might as well translate validated JSON to a faster binary format for the internal bus, e.g. MessagePack. Validated msgpack is probably faster than unvalidated JSON.


This was something I wondered about in these results: what's the definition of "crash"? Specifically, with the Rust libraries, I'm not sure if this means "the program panicked" or "the program segfaulted", or something else. The former isn't ideal, but isn't the worst. The latter would be much more worrisome.


In the Rust case, the crashes are either the stack being blown on recursion or panics when overflowing numbers.


What about raising an exception?


That qualifies as error reporting in most languages using exceptions.


What about it?


Exceptions should only be used for exceptional cases. For a parser, bad input should be expected.


You say that as fact but you must know it is a matter of opinion: I would say you should match the language idioms. For example, Python iterators and generators work by raising/throwing when there are no more items: it is fully expected and will always happen when you write a for-in loop.
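The iterator protocol itself shows this, for instance:

```python
it = iter([1, 2])
assert next(it) == 1
assert next(it) == 2
try:
    next(it)
except StopIteration:
    # Not an error: this is exactly how every for-in loop
    # learns that the sequence is finished.
    print("exhausted")
```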


That seems like bad design. Rust iterators return an Option<T>, for example.


In Python a lot of flow control uses exceptions, since it's cheaper to ask forgiveness than to ask permission and then still have to deal with errors anyway.
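A minimal sketch of that EAFP idiom (the settings dict and default value here are hypothetical):

```python
config = {"host": "localhost"}  # hypothetical settings dict

# EAFP: attempt the lookup and handle the failure, rather than
# pre-checking for the key and still needing error handling anyway.
try:
    timeout = config["timeout"]
except KeyError:
    timeout = 30  # fall back to a default when the key is absent
```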


And in many languages, they are the common way to communicate that a function cannot return the data that is expected of it. Invalid input data means the parser can't produce the equivalent data structure -> exception.

If the parser has some kind of partial parsing, a way to recover from errors or you are using a language in which returning explicit errors is the more common idiom, then you probably shouldn't throw an exception.


Throwing an exception when parsing fails sounds like a case of exception-handling as flow of control: a bad thing even when commonly done, having a lot in common with GOTO statements. (See http://softwareengineering.stackexchange.com/questions/18922... , and Ward's Wiki when it comes back up.)


Exceptions are for flow control. That's their entire purpose.

Throwing an exception when parsing fails is a near perfect example of where exceptions produce clarity of code.

That is to say, a parse function is usually a pure function that takes a string and returns some sort of object. As a pure function, it "answers a question". A parser answers the question: "What object does this string represent?"

When given bad input, the answer is not "NULL". It is not "-1". It is not a tuple of ("SUCCESS", NULL). The question itself has no answer. Exceptions enable code flows for questions without answers, and for procedures that can not be actualized.

Now, you can engineer it such that you change the question to "Give me a tuple that represents an error code and an object representing... etc." But then if your calling function cannot answer the question it's meant to answer, you have to manually check the inner function result and return some code. With exceptions, stack unwinding comes for free, with zero code. Once you get used to expecting the secondary, exceptional code flow, you can read and design far cleaner code than you can otherwise.
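The contrast can be sketched like this (function names hypothetical):

```python
# Error-code style: every caller checks and re-propagates by hand.
def parse_int_status(s):
    try:
        return ("SUCCESS", int(s))
    except ValueError:
        return ("BAD_INPUT", None)

def double_status(s):
    status, n = parse_int_status(s)
    if status != "SUCCESS":   # manual propagation at every level
        return (status, None)
    return ("SUCCESS", n * 2)

# Exception style: the happy path reads straight through, and the
# stack unwinds to whichever caller can actually handle the failure.
def double(s):
    return int(s) * 2         # ValueError propagates for free
```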


To be more specific (and your parser example actually makes it very clear), exceptions are a form of an error monad. The benefit that you describe - the ability to automatically flow error values through without having to check for them at every boundary - is exactly what a monad provides.

The problem with exceptions is that they're not type-checked in most languages - you can throw whatever you want, and the type system doesn't reflect that, so there's no way to statically determine that all expected exceptions are properly handled.

They're (partially) type-checked in Java, but it's extremely annoying, because there's no ability to express that part of the type in a way that lets you write methods that are generic with respect to exceptions thrown by other code (e.g. there's no way to describe a method `f(T x)` such that it throws `E` plus everything that `x.g()` may throw, for any `T`).
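To make the monad point concrete, here's a toy Result type in Python (entirely hypothetical, not a real library) where `and_then` is the bind operation that lets errors flow through untouched:

```python
class Result:
    """Minimal error-monad sketch: Ok carries a value,
    Err short-circuits everything downstream."""
    def __init__(self, ok, value):
        self.ok, self.value = ok, value

    def and_then(self, f):
        # The monadic "bind": skip f entirely on error, so error
        # values flow through without a check at every boundary.
        return f(self.value) if self.ok else self

def Ok(v): return Result(True, v)
def Err(e): return Result(False, e)

def parse_int(s):
    return Ok(int(s)) if s.lstrip("-").isdigit() else Err("not a number: " + s)

def reciprocal(n):
    return Ok(1 / n) if n != 0 else Err("division by zero")

good = parse_int("4").and_then(reciprocal)    # Ok(0.25)
bad = parse_int("zero").and_then(reciprocal)  # Err; reciprocal never runs
```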


In Rust, it would be Option<T> where T is the type of the object you're supposed to get.

This means that if you don't parse correctly, it would be None; if you do parse correctly, it would be Some(T).


Instead of `Option<T>` it would be more idiomatic to use `Result<T, ()>` or even better `Result<T, E>`. Where `E` is some way to communicate the error code/type/message back to the caller.


Consider this:

    def load_data():
      with open('some_file.json', 'r') as f_in:
        data = parse_json_stream(f_in)
        data['timestamp'] = some_date_fn() # Do something with the *definitely-valid* data on the next line.
        return data

    def parse_json_stream(io_stream):
      # Some complex parser...
      # at some point...
      if next_char != ']':
        raise JsonException('Expected "]" at line {}, column {}'.format(line, col))
      # More parser code...

A benefit of exceptions here is that you don't have to check the result of "data = parse_json_stream(f_in)" to immediately work with the resulting data. The stack unwinds until it is in a function that can handle the exception.

*edit: Code formatting.
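A runnable sketch of the same shape, with a trivial stand-in parser so the catch site is visible too (all names hypothetical):

```python
class JsonException(Exception):
    pass

def parse_json_stream(text):
    # Stand-in for the parser sketched above: reject anything
    # that doesn't end with ']'.
    if not text.endswith("]"):
        raise JsonException("Expected ']'")
    return {"items": text}

def load_data(text):
    data = parse_json_stream(text)
    # Works with the definitely-valid data immediately, no checks:
    data["timestamp"] = "2016-10-26"
    return data

def handle_request(text):
    try:
        return load_data(text)
    except JsonException as e:
        # Only the caller that knows what a failure means handles it;
        # everything in between stayed free of error-checking code.
        return {"error": str(e)}
```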


Rust nearly has that same benefit. You can wrap `parse_json_stream(f_in)` in `try!(parse_json_stream(f_in))`, and if an error was returned from `parse_json_stream`, then an early return for `load_data` is inserted automatically, thereby propagating the error, similar to raising an exception.

Of course, these approaches are not isomorphic, but in Rust, the cost of explicitly checking an error value is typically very small thanks to algebraic data types, polymorphism and macros.


But exception handling is flow control, by its very nature. So it's clearly a gray area and the right thing to do depends on the common idioms of the language you're using, the expected frequency of parsing failures, and (possibly) runtime performance concerns. In Java for example, the XML parser built into the standard library does throw exceptions for certain types of invalid input.


But see Dijkstra on "GOTO statement considered harmful" (http://david.tribble.com/text/goto.html). The problem is unstructured control flow, which both GOTOs and exceptions-as-control-flow give you; at least in what I was taught (early-2000s CS degree focused on C++), unstructured flow of control is only acceptable when it's a panic button to quit the program (or a major area of processing).

It sounds like the Web way of doing things doesn't have this tradition -- much like how it doesn't have static strict extensible type systems.


Thank you for that great link. Thank you twice over, because it refutes your claim.

Dijkstra specifically calls out exceptions as structured control flow, and as being probably-acceptable, and not subject to his concerns.

More broadly, any argument that goes "Exceptions are an extension of GOTO, and therefore bad" has some questions to answer, given that nearly all control structures are implemented as an extension of GOTO.

As to your last sentence, I think you have it backwards. I speculate that of the code written in 2016, most of the code that did not use exceptions for control flow was JavaScript async code. (There are of course other non-exception-using languages, but other than C they're mostly pretty fringe, and for good or for ill JS is so damn fecund.)


> nearly all control structures are implemented as an extension of GOTO

Well put. Under the hood, every IF, ELSE, WHILE, SWITCH, FOR, and other decision point in structured code is implemented with at least one jump, conditional or unconditional.
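You can watch this happen in CPython with the dis module (bytecode rather than machine JMPs, but the same idea):

```python
import dis

def sign(x):
    if x < 0:
        return -1
    return 1

# Every structured branch lowers to jump instructions in the bytecode;
# the exact opcode names vary by interpreter version, but they all
# contain "JUMP".
opnames = {ins.opname for ins in dis.get_instructions(sign)}
has_jump = any("JUMP" in name for name in opnames)
```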


Exception flow control is far more structured than goto. An unexpected goto can drop you anywhere in memory, with no idea how you got there or how to get back. Exceptions cause a clearly-defined chain of events and you always know exactly where you'll end up (passing through all the finally blocks until you hit the closest catch block).


It isn't unstructured. It is an exception from the control flow of calling and returning from functions, but it has a structure of its own which is often very convenient for writing clean code. This is not at all specific to the Web, by the way.


Quoting Dijkstra from 1968 on goto as an argument from authority in 2016 should be considered harmful.

He had reasons for saying what he said in the context of the times. Remove that temporal context and the why becomes more important than the what.


His arguments are still sound, I think, and until that post I'd assumed that avoiding unstructured flow of control was still received wisdom. (I've certainly found it a very sound policy; million-plus-line codebases of other people's code, plus surprises, equal misery.)

Joel Spolsky had much the same to say: http://www.joelonsoftware.com/items/2003/10/13.html . If I'm going to have to argue about this, I'd rather use Spolsky than Dijkstra as a starting point; what, if anything, do you see that's wrong in his article?


The article basically makes 2 points, which are:

1. They are invisible in the source code

2. They create too many possible exit points

As a Java developer I really don't find 1 to be a problem at all. Lots of people complain about checked exceptions, but they solve this problem. It's very easy to see which functions throw and which don't. I'd even argue that the invisibility that is there is a positive -- I usually don't want to think about the error case, but can easily trace the exceptional path if I do. I find that often functions need to go back up the stack to handle issues, in which case the exception control flow is exactly what I want.

Runtime exceptions, on the other hand, are invisible and can be an issue, but for things like running out of memory or overflowing the stack, well, I really don't want to have to write: public int add(int x, int y) throws OutOfMemoryError, StackOverflowError, ...

For 2, I find that they just create 1 extra exit point, which is back up the stack/to the catch handler. You could certainly abuse them to cause problems, but I personally don't find this to be an issue in practice.

I think that everyone agrees that unstructured flow of control is problematic, but checked exceptions do have structure, even if it's a bit of a loose one.


Parsing is often deeply recursive and in some languages, throwing an exception will automatically unwind the stack until it finds a handler. As well as explicitly giving you a good path and a bad path (as someone pointed out upthread) this can (again, in some languages) save a ton of repeated special case code in your parsing routines.

Some languages or libraries are explicitly designed to use exceptions for flow control some strongly discourage this. Apple's ObjC error handling guide for example contains this stricture but also calls out parsing as an example of when you should do it.

C.A.R. 'null pointer' Hoare considered exceptions a blight on the earth that would lead to us accidentally nuking ourselves, so there's a spectrum of opinions available here.


The JavaScript JSON parser throws an exception on invalid input. A benefit of this is that there is only one code path to handle any kind of failure, and another is that unhandled parse failures lead to a hard stop, which is a good default.


> Exceptions should only be used for exceptional cases.

Why?

(My thoughts on the matter: http://www.mcherm.com/reasons-why-my-code-style-is-wrong.htm... )


You must really dislike Python, then, since exceptions are used for all sorts of things that aren't unusual.


Legacy PHP code has an "Error" failure mechanism where the process just ends or execution jumps straight to a global handler function.

There is ongoing work to deprecate and remove it in favor of actual exceptions, which travel up the stack and are much easier to clean up after.


Recursion kills.



