I find it ironic that the article points out that relying on humans to find errors is something of a hit-or-miss proposition and suggests that automating error finding is an appropriate course instead of making it less likely to make the error in the first place.
For example, I wonder how many errors would have surfaced if format-string behaviour were the default? That is, how many times would people have written something like "hello {previously-defined-variable}" without meaning to substitute the value of that variable at runtime?
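A minimal sketch of the two behaviours (the variable name is hypothetical): today the braces are inert unless you opt in with the f prefix, so making interpolation the default would silently change the first case.

```python
greeting = "world"

# Today: a plain string, braces are just literal text
plain = "hello {greeting}"
print(plain)            # hello {greeting}

# Opt-in interpolation with an f-string
formatted = f"hello {greeting}"
print(formatted)        # hello world
```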
I don't think this makes sense. Plain strings and format strings are not interchangeable, and using one where the other was meant is probably a bug.
Would you expect that a user input like "{secret} please" is interpolated? If so, we hopefully agree that this would blow major security holes into any python script processing untrusted user input. And if not... Why not?
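The risk is easy to demonstrate with str.format, which (unlike f-strings) does run on strings that only exist at runtime; the variable names here are hypothetical:

```python
API_KEY = "s3cr3t"            # hypothetical secret held by the application

user_input = "{key} please"   # attacker-controlled text

# If runtime strings were interpolated by default, this would be the
# implicit behaviour everywhere, not an explicit call:
leaked = user_input.format(key=API_KEY)
print(leaked)                 # s3cr3t please
```

This is the same class of bug as format-string injection attacks against code that already calls .format() on untrusted templates.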
Look up how this works in Swift. They only have one string. No raw strings or f strings. Yet they have all the power of all three python string types and less syntax. It's very nice.
But they’re a distinct string syntax. Your point seemed to be that there was only one. rf"{expression}" works in Python too, note, so either way you want to interpret it, raw strings aren’t a difference.
If you only make it work with string literals (e.g. generate the underlying formatting logic at parse time), it wouldn't allow arbitrary inputs to be treated as f strings.
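That is already how Python's f-strings behave: the interpolation is compiled at parse time from the literal, so a brace-containing string that arrives at runtime is never treated as a template.

```python
secret = "hunter2"

# The f-string is compiled where the literal appears in the source,
# so only literals can interpolate:
literal = f"{secret}"          # "hunter2"

# A string constructed or received at runtime is just data;
# its braces stay inert:
runtime = "{secret}"
print(literal, runtime)        # hunter2 {secret}
```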
The assumption I think they mean is to make formatting the default and unformatted the opt-out, analogous to how "raw" strings are treated: escape sequences are interpreted by default unless the string is raw, signified by an 'r' prefix.
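For comparison, this is how the existing opt-out works for escape sequences:

```python
# By default, escape sequences in a literal are interpreted:
print(len("\n"))    # 1 -- a single newline character

# The r prefix opts out, leaving the backslash as literal text:
print(len(r"\n"))   # 2 -- a backslash followed by the letter n
```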
As someone who would like to be working on new, interesting things in 2-3 years rather than bringing old code into conformance with breaking changes, this attitude captures a worrisome trend in development.
On the one hand, it's great that we have platforms that innovate and improve and harden over time, but we're also facing a development culture where more and more time is spent servicing package/platform/language/OS changes that have no material impact on our own otherwise-mature projects.
It's worth being judicious about where breaking changes are applied, right?
We’re not talking about deprecating a feature here, we’re talking about the addition of behaviour that will break existing code, potentially in non-trivial and hard to debug ways, and in ways that could easily introduce security vulnerabilities.
We've done this too many times and had enough pain. Please, let's proceed at a pace where we can worry about delivering our product rather than updating formatted strings, thank you.
Did you think this through? What would you treat as a fatal error? How would the compiler know if a particular string is old style code wanting to print some characters between curly braces or new style code wanting to string interpolate a variable?
Right?! Imagine: "We're announcing Python 4. Python 3 was because we handled unicode in a way that turned out to make nearly 0 sense. Python 4 is because you lot can't put your f's in the right place."
Changing something as deeply rooted as the string type?
Python already went through exactly that disaster once before, when they changed the default string type from b""-strings to u""-strings. It took about 20 years for this transition to finally complete.
PHP has also been responsible for the majority of exploited servers and misconfigured applications. Whatever they are doing, I take it as a strong negative signal.
That's not unreasonable considering that PHP is by far the most popular server-side language. It's not like we have many hackers targeting Erlang instead.
To be fair, your suggestion might make for a more resilient default, but it's also a great way to leak data and add overhead for the default case. There are tradeoffs.
Not much overhead, I would think. We’re talking about literal strings in source code, not strings in general. It’s not much work to check those.
One thing it would break is that strings read from files would be treated differently from those in source code, even strings read from files that logically “belong” to the application (say, a config file).
I don’t think that’s an issue, though.
Also, in Swift "\(foo)" does string interpolation. I haven’t seen people complain it leaks data or makes Swift slow (but then, it’s not fast at compiling at all because of its rather complicated type inference)
> Also, in Swift "\(foo)" does string interpolation. I haven’t seen people complain it leaks data or makes Swift slow (but then, it’s not fast at compiling at all because of its rather complicated type inference)
I think that the claim is not that this leaks data in an absolute sense, but rather that changing the behaviour after people have come to rely on it will leak data from currently well behaving applications.
>...and suggests that automating error finding is an appropriate course instead of making it less likely to make the error in the first place.
You can't fix the syntax and standard library of the language; it is what it is. But how many bugs would you prevent if Python had compiler support to catch those kinds of syntax (and type) errors?
This is how bash works: any string with a $ in it will be interpolated unless you escape it, and the behaviour also depends on whether you use double or single quotes.
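Python's string.Template borrows the same shell-style $ convention, including a $$ escape, which gives a feel for those ergonomics without leaving Python (the template text here is made up):

```python
from string import Template

# $name is interpolated; $$ is the escape for a literal dollar sign
t = Template("Hello $name, that costs $$5")
print(t.substitute(name="Ada"))   # Hello Ada, that costs $5
```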
Spaces as list separators could also fall into this philosophical question of what makes the most sense. Sometimes it is super convenient, until you have actual spaces in your string and it becomes a pain.
See also the YAML Norway problem (where the country code NO is parsed as the boolean false) for what happens when implicit wins over explicit.
It generates about the same number of bugs, if not more, and would also end up with a code-review-doctor suggesting you use \$ over $. In the end, regardless of syntax, a human always has to make the final call on whether interpolation is wanted or not.