Finding an authorization bypass on my own website

PeterisP · on March 6, 2022

The key point is that an abstraction layer trying to implement an API that looks like parametrized queries is not equivalent to actual parametrized queries where the query and parameters are kept separately, and the SQL text is parsed and execution plan is formed by the DB engine before the parameters are even considered. If your DB library is inserting the parameters in a text query before sending it off to the server to be parsed as arbitrary SQL, that's not a parametrized query, but just fake smoke and mirrors.

bgrainger · on March 6, 2022

One major problem is that the MySQL "text" protocol doesn't support true parameterised queries. There's no way to pass parameters in the `COM_QUERY` packet: https://dev.mysql.com/doc/internals/en/com-query.html

That's probably why many of the libraries mentioned in this thread use "smoke and mirrors". Of course, it is quite possible to correctly escape a value by rendering it to a string first, _then_ encoding the whole thing.

To have true parameterised queries, you need to use the "binary" protocol, which many MySQL libraries don't offer support for. (MySQL also has some frustrating limitations with the binary protocol, such as not allowing a SQL string containing more than one statement to be prepared.)

namibj · on March 6, 2022

PostgreSQL shares that limitation on it's extended query protocol (needed for prepared statements), though you can send multiple queries off before sending the "I'm waiting, hurry up and get back with answers" packet to the server. Not that the native C library (libpq) supports such pipelining...

gruez · on March 6, 2022

>If your DB library is inserting the parameters in a text query before sending it off to the server to be parsed as arbitrary SQL, that's not a parametrized query, but just fake smoke and mirrors.

Do you have examples of libraries that's actually a "parametrized query"? The python libraries I encountered (psycopg2, sqlalchemy) both seem to be "fake smoke and mirrors".

PeterisP · on March 6, 2022

This is one of the major differences between psycopg2 and psycopg3 - https://www.psycopg.org/psycopg3/docs/basic/from_pg2.html#se...

kayodelycaon · on March 6, 2022

Active Record from Ruby on Rails does. It’s actually a problem on older versions because it may incorrectly generate duplicates of queries that could use the same prepared statement but don’t. Can cause unexpected memory usage.

btschaegg · on March 6, 2022

If ADO.NET counts as a library for you: Parameterized queries for MS SQL through ADO.NET.

They are seemingly run through a stored procedure on the server side, with each parameter passed in as an argument. This has some consequences that lead to very obscure behavior, too [1].

For example, you should never create temporary tables in an SQL statement that was initialized with parameters, as they won't survive the end o the statement; they will be destroyed as soon as the innermost scope (within the stored procedure call) finishes.

I ran into this because I tried setting up a single "run this query" method in a more complicated database routine in order to keep my C# code clean... didn't work out as I'd hoped ;)

[1]: https://stackoverflow.com/a/46311328

Edit: I just realized that this still isn't a guarantee that parameters are handled as a separate structure by the underlying network protocol. I hope they are -- but I don't know how to check. Maybe the according .NET Core code is on GitHub?

nightski · on March 6, 2022

Just run SQL Server Profiler. You can see what is hitting your server, it's a very useful tool.

bgrainger · on March 6, 2022

MySqlConnector (C# - https://github.com/mysql-net/MySqlConnector) uses "smoke and mirrors" for the "text" protocol (because MySQL Server has no way to send parameters as part of a `COM_QUERY` packet, https://dev.mysql.com/doc/internals/en/com-query.html) and true parameters for the "binary" protocol, which does (https://dev.mysql.com/doc/internals/en/com-stmt-execute.html).

Every MySQL client needs to do something similar, due to the design of the MySQL protocol.

adeel_siddiqui · on March 6, 2022

Hibernate has actual prepared statements and binds values when the query is executed.

tedunangst · on March 6, 2022

sqlite3 con.execute() appears to prepare and execute a statement, with bound parameters.

tedunangst · on March 6, 2022

From the mysqljs readme: "Caution This also differs from prepared statements in that all ? are replaced, even those contained in comments and strings."

Eek. "select * from classes where teacher = '?'" -> boom.

tyingq · on March 6, 2022

Wordpress also has faked parameterized queries.

https://github.com/WordPress/WordPress/blob/4ae0744585ea9417...

They do seem to have resisted attacks for quite a while though.

jolux · on March 6, 2022

Reading WordPress internals like this makes me feel inclined not to use it.

SahAssar · on March 6, 2022

Hasn't WP had multiple SQL injection CVE's? WP is famously terrible at security.

tomnipotent · on March 6, 2022

> to actual parametrized queries

Except there is no such thing as "actual parametrized queries", either as defined by the SQL standard or as supported by any of the major RDBMS vendors.

The closest thing we have are prepared statements, which are basically session-scoped stored procedures used by client-side libraries to pull off the illusion of parametrized queries.

I wouldn't be surprised that if you go back far enough, the idea of parametrized queries probably started in client libraries trying solve for SQL string soup with a dash of sprintf-like syntax sugar.

tus666 · on March 6, 2022

Of course there is. If the parameters are incorporated server side after the SQL is parsed then you have true parameterized queries. If the parameters are inserted before the SQL is parsed you have trouble.

tomnipotent · on March 6, 2022

> you have true parameterized queries

In the sense that stored procedures and prepared statements are queries, sure. But your SQL statement and your values are not being sent in the same call to the database.

default-kramer · on March 6, 2022

> Except there is no such thing as "actual parametrized queries" [...] as supported by any of the major RDBMS vendors.

That would surprise me... I thought that when I send a parameterized query to PostgreSQL or MS SQL Server, most (or even all) of the query plan gets created without looking at any parameter values. And if that is true, then I think your "no such thing" claim cannot be true. If the query plan has already been created based on the unparameterized SQL string, then parameter values cannot cause it to do something crazy like drop an unrelated table.

(But I haven't read the source code to either of those RDBMS, so maybe I am about to be surprised.)

hn_throwaway_99 · on March 6, 2022

> I thought that when I send a parameterized query to PostgreSQL or MS SQL Server, most (or even all) of the query plan gets created without looking at any parameter values.

Your comment peaked my interest so I looked into what postgres does, and it's pretty ingenious in my opinion. It can create either a generic plan (not looking at parameter values) or a custom plan where it looks at parameter values and then uses statistics to find the best plan.

By default, for the first 5 executions of a prepared statement uses a custom plan. If the execution costs of these different plans are pretty close to each other, from then on it just uses the generic plan (idea being that the extra cost of generating a custom plan each time isn't worth it), but if they are different enough (meaning looking at statistics can have a big speedup), then custom plans are used going forward. This behavior can be tweaked using DB flags.

tomnipotent · on March 6, 2022

> when I send a parameterized query to PostgreSQL or MS SQL Server

You're not sending a parameterized query. The libraries are creating prepared statements under the hood, and managing them for you. This generally requires one call to prepare the statement and fetch some sort of handle, and then executing the statement with the handle from the first call.

Part of the confusion is that it's common to see SQL libraries offer an API to parametrize queries using some sort of placeholder in the query, but it's really just a facade to perform local string interpolation along with some sort of vendor-specific sanitation scheme.

hn_throwaway_99 · on March 6, 2022

> This generally requires one call to prepare the statement and fetch some sort of handle, and then executing the statement with the handle from the first call.

Not always. The Node `pg` module pipelines down the commands on the connection: it sends a `prepare` command, then a `bind` command, then an `execute` command. Yes, it uses prepared statements under the hood, but there is no wire round trip between prepare and bind.

tomnipotent · on March 6, 2022

Looks like Postgres 14 added pipeline mode in libpq, good to know!

magicalhippo · on March 6, 2022

> Except there is no such thing as "actual parametrized queries", either as defined by the SQL standard or as supported by any of the major RDBMS vendors.

We're in the process of adding support for MSSQL in addition to the DB server we've been using for ages. This is one of the pain points. They differ in how they handle "parameterized queries" enough to be a PITA.

mirekrusin · on March 6, 2022

You can do it right with ie. combinators [0] where you explicitly specify at which position you are in (value, identifier etc) and have much more control over query building (dynamism and flexibility that value-only parametrized query/prepared statement doesn't provide). Tagged templates are fantastic to represent safe query mini-"dsl".

[0] https://github.com/appliedblockchain/tsql/

cromd · on March 6, 2022

> [...] execution plan is formed by the DB engine before the parameters are even considered

Is this really possible if the server considers table statistics (i.e. frequencies of certain values) when forming a plan?

thinkharderdev · on March 6, 2022

Typically the SQL statement will be parsed to a logical plan first, which then gets optimized and then turned into a physical execution plan. Statistics are used for a second layer of optimization on the physical execution plan.

tus666 · on March 6, 2022

This is a perfect example of why I often just read the comments before the links on HN. Saves a lot of face-palming.

Aeolun · on March 6, 2022

> that's not a parametrized query, but just fake smoke and mirrors.

And still much preferable to not having it at all.

Don’t let perfect be the enemy of good.

hn_throwaway_99 · on March 6, 2022

Well, in this case mysqljs pretty well fucked up the implementation here in that this sort of bug should never happen when using prepared statements, so I'd say it's worse than not having it at all, because it breaks the standard safety guarantees of prepared statements.

Aeolun · on March 6, 2022

This implies that the alternative to using mysqljs is using prepared statements, which is most definitely not the case.

The alternative for people using a library like this is to send plain queries to the db.

moron4hire · on March 6, 2022

That only applies when perfect is not yet achievable. Real parameterized queries exist and can be had today.

DaiPlusPlus · on March 6, 2022

> Real parameterized queries exist and can be had today

/s/today/30 years ago: SQL-92 describes them in chapter 4.18 - which at the time was just standardizing something almost all vendors already had in some form or another, for example, I'm pretty sure even the first release of ODBC in the late 1980s also had parameters.

Disclaimer: My pontifications below derive from my life's experiences starting with Access/JET Red as a sprog, to my current professional work with MS SQL Server, Postgres (and MySQL, I'll admit - but I'll say MariaDB instead) built over the past 17 years - but I have absolutely zero experience with Oracle and Db2, so I honestly don't know what cool language features they have (and they must be way ahead of the ISO spec, otherwise why else would people pay so much for it?... Hmm, then again, Oracle still doesn't have a bit/bool column type, does it?).

Anyway:

However, the expressiveness of parameters hasn't changed much since the original design of ODBCS - at least as far as I'm aware: Query parameters are still mere scalar value placeholders instead of hardcoding literals inside queries and statements: with limited exceptions like T-SQL's ceremony-laden table-valued-parameters and Postgre's array types, it's still not possible to parameterize database object identifiers or even have a true variadic `WHERE IN ( ... )` predicate without resorting to Dynamic-SQL, nor can we use parameters to conditionally enable or disable query predicate clauses: yet these are all essential features for any kind of ORM or language-integrated-query system built today (the canonical example being Linq/EF in .NET and TypeORM or Prisma for TypeScript/JS.

Why isn't anyone meaningfully advocating for a "SQL/2"? I know backwards-compatibility is 100% essential, but that's the easy part (because relational calculus and relational are isomorphisms, hence why semantic-preserving query translation between different SQL engines is a solved problem), but lack-of backwards-compatibility is usually the reason most good-ideas die in committee - so how come Google was successful in pushing HTTP/2 and even HTTP/3, but we're still using 1980s-level SMTP and SQL? Are all of the major RDBMS vendors so afraid of change that they're willing to forgoe winning-over millions of new customers if they can ship a usable, flexible, expressive, and safe query (and query-building) language?

I'm just blabbering at this point. Forgive me, but I need something to distract me from the outside world right now.

jhgb · on March 6, 2022

> /s/today/30 years ago: SQL-92 describes them in chapter 4.18 - which at the time was just standardizing something almost all vendors already had in some form or another, for example, I'm pretty sure even the first release of ODBC in the late 1980s also had parameters.

Interbase must have had them in 1980s since Interbase (and now Firebird) compiles ESQL queries into static BLR code. Since there's no way to compile a new query at runtime (other than by outright switching to using Interbase/Firebird's DSQL C API for a non-compile-time query), all these queries have to be parameterized to be of any use.

DaiPlusPlus · on March 6, 2022

I was curious about BLR so I had a quick look but there isn’t much around other than what’s on Firebird’s website.

Their description of BLR makes it sound like an eagerly-evaluated equivalent of SQL Server’s Execution Plans, but are compiled on CREATE instead of on first-use, and are comprised of machine-native (i.e. raw x86/x64 + disk read syscalls) operators arranged together - or maybe slightly higher-level? It’s unclear how schema-binding works.

I’m curious what advantages there are to that approach anymore: databases are invariably IO-bound, not CPU bound, and things like well-maintained indexes and statistics objects are far more important when it comes to DB perf than the ISA of the query engine. I’d wager an ultramodern engine running in WASM or even interpreted Java will run faster on the same hardware than JET, I’d think.

On a related note, why are we still stuck with 4K/8K-sized pages?

namibj · on March 6, 2022

You can combine literals into an SQL query, and doing so isn't even hard to do correctly. You'd still use parameters to feed user input, of course.

DaiPlusPlus · on March 6, 2022

SQL Server has had _autoparameterization_ opf queries for a while now: when you have multiple queries of the same general shape but with varying literal values, it automatically converts the literals to parameters (rather than vice-versa), and it's generally beneficial for performance because existing query-plans can be reused (instead of having to create a new plan for each literal value encountered): https://www.mssqltips.com/sqlservertip/2935/sql-server-simpl...

namibj · on March 6, 2022

I didn't mean it that way. I mean you can combine SQL fragments, think multiple `ORDER BY` clauses.

exac · on March 6, 2022

And has since June 2004 in MySQL.

https://en.wikipedia.org/wiki/MySQL#Milestones

Aeolun · on March 6, 2022

Yeah, but you’re just shifting the location you expect bugs. Using parametrized queries at SQL level doesn’t guarantee you don’t have bugs of this kind at the database layer.

Obviously they’ve been proven for tens of years now, so it’s extremely unlikely, but conceptually it isn’t different.

sqliwtf · on March 6, 2022

That’s the kind of thing you say to ship a product to market, to eschew a non-critical feature.

Parametrized Queries for a SQL library are a critical “do not ship without” feature. You do not lie and tell your user you have a safe product causing them to have a compromised system.

I hope to God you are not creating production systems anywhere.

tragictrash · on March 6, 2022

We had a candidate with 'wrote dynamic sql' listed under a job he held previously. They also couldn't answer the question 'how do you prevent a sql injection attack'. He had something like 5 years of experience.

I don't understand how anyone but the greenest of devs doesn't comprehend the importance of these kind of things. But alas, I see it everywhere, including the comment you are responding to.

papercrane · on March 6, 2022

I had a similar candidate a couple weeks ago. Their CV made a big deal about a dynamic SQL framework they wrote for a previous job, but they just gave me a blank look when I asked how they mitigated SQL injection attacks.

I ask about SQL injection pretty regularly, and it's scary how many devs don't seem to even know what it is.

DaiPlusPlus · on March 6, 2022

> I ask about SQL injection pretty regularly, and it's scary how many devs don't seem to even know what it is.

That's either a good thing: because they grew-up with database libraries designed to encourage, if not force, the use of parameterized queries, so it was never a problem for them.

...or it's a bad thing, because they grew-up in the early-days of PHP 4x, learning from PHP/MySQL tutorials written by people who'd today be considered utterly unqualified to speak at length about programming: the kinds of things that only happen when the blind were leading the blindfolded: nightmares like actively encouraging concatenating $_GET values directly into mysql_query() strings because it means having less variables, and less variables means better performance (right?!) - and they never learned otherwise.

-----

I have a pet theory that people who got started with PHP in the 2000s who are still working today are so burned from their earlier experiences that they're now the most detail-oriented and best-practices-following programmers around, regardless of the language they use today - while the people who never experienced hardship (to the extent that having to use PHP is a hardship...) become complacent, and our ever-increasing reliance on unvetted external dependencies (in all language ecosystems, imo) is going to end badly.

Or not. No idea, honestly!

Aeolun · on March 6, 2022

> I hope to God you are not creating production systems anywhere.

Good luck building production systems without compromises.

Also, I think everyone would appreciate it if you didn’t hurl personal insults about. This aint Reddit.

jsiaajdsdaa · on March 6, 2022

Imagine working on a flight controlller that has functions that produce the wrong values for flight control! "it's ok, something is better than nothing" is not true!

onphonenow · on March 6, 2022

This title is horrible. I clicked through expecting something interesting with let's say PostgreSQL parameterized queries. This has nothing to do with parameterized queries, this is some kind of standard SQL injection issue on an intermediate layer.

progforlyfe · on March 6, 2022

Still, I hope the author realizes the problem so they won't repeat such a mistake. The fact that such a package exists is a little odd to begin with...

onphonenow · on March 6, 2022

Why not just use the engines parameterized query support if you are supporting the feature?

They are probably missing out on many of the OTHER benefits of prepared statements. Easily 60%+ of queries can be prepared reducing overhead sometimes significantly.

addandsubtract · on March 6, 2022

At least it helped me understand that "parameterized queries" are a database feature and not just a library abstraction.

onphonenow · on March 7, 2022

Right, benefits include performance gains.

From HN:

https://blog.soykaf.com/post/postgresql-elixir-troubles/

In addition to security.

88913527 · on March 5, 2022

Is there any plausible reason why mysqljs should accept anything other than a string for the parameters array (second argument)?

  connection.query("SELECT * FROM accounts WHERE username = ? AND password = ?", [{username: {username: 1}}, "secret"]);

Without looking at the implementation details, I would expect this code to throw an exception. A "garbage in, garbage out" philosophy seems dangerous for a SQL statement.

progforlyfe · on March 6, 2022

It looks like the article headline is slightly misleading. It says "Parameterized queries", but I wouldn't use that phrase to describe escaping data "client side" and sending the SQL query to the Database essentially as raw. Or maybe I am thinking of "Prepared statements", but I tend to use the terms interchangeably.

merb · on March 6, 2022

one of the authors of the library is quite funny: https://github.com/mysqljs/mysql/issues/274#issuecomment-128... really stupid that such a big library still does not have the most important feature of a sql library.

tkiolp4 · on March 6, 2022

It’s an open source library that the author made for free in its free time. You either accept it as it is (check its License file), improve it, or walk away and use something else.

lazide · on March 6, 2022

That’s all we’ll and good when it doesn’t have the equivalent of camouflaged knives sticking out at eye level. Yikes.

technion · on March 6, 2022

I agree that this particular issue is extremely obvious and quite poor, but in the particular given link we're responding to their response was more than reasonable.

couchand · on March 6, 2022

Hmm... commentor raises concern that string manipulation dressed up as prepared statements is potentially a source of SQL injection vulnerabilities. Maintainer dismisses concern as "FUD". Eight years later, article is posted showing an exploitable SQL injection vulnerability.

I'm thinking they should have listened rather than getting defensive.

merb · on March 6, 2022

Well the first sentence is the problem, not the second.

datalopers · on March 6, 2022

These are not prepared statements, just sloppy crafting of SQL. Same mistakes made in PHP two decades ago are being made here.

rovr138 · on March 6, 2022

Those who do not learn from history, are doomed to repeat it. (or however that saying goes)

hn_throwaway_99 · on March 5, 2022

This is basically the exact same bug as log4shell: trying to be too "smart" by parsing unsafe user input values, as opposed to keeping that kind of logic only on the programmer provided template strings.

zamalek · on March 6, 2022

Exactly. Something called 'sql', 'json', 'xml' (DOCTYPE exploit anyone?), or whatever else should do the thing on the label and nothing else. If the developers are so inclined to add extra magic or automation then they can always make a prelude library.

Also, why in the hell are values being quoted (sql parlance for escaped)? Does the mysql protocol not support actual parameterized queries?

pigbearpig · on March 6, 2022

Amazing how SQL injection is in the OWASP Top 10 for the last 20 years and then you see something like this. I understand it’s open source but I can’t fathom why the author would be ok with not having true parameterized queries.

shoo · on March 6, 2022

at the day job i see quite a few fresh potential SQL injection vulns when i do code review or when reading through existing code. many of our colleagues in the industry are junior and haven't had to learn this yet or have deep expertise in some areas but dont have deep experience in writing production code, then end up in teams writing production code.

i didn't realise SQL injection was a thing to watch out for until i'd already been working professionally for 5 years -- that maybe sounds bad, but for the first 5 years i didn't work on any projects that integrated with databases.

throw_m239339 · on March 6, 2022

> at the day job i see quite a few fresh potential SQL injection vulns when i do code review or when reading through existing code. many of our colleagues in the industry are junior and haven't had to learn this yet or have deep expertise in some areas but dont have deep experience in writing production code, then end up in teams writing production code.

Don't blame that on juniors. The people writing these libraries with "magic" bullshit query parsing aren't junior. SQL injection is a very basic security issue that can be very mitigated simply with actual prepared statements.

thyrsus · on March 5, 2022

The exploit relies on the behavior of backtick quotes in mySql, which I'd never encountered before. Do other SQLish databases have similar features?

epitactic · on March 6, 2022

MySQL uses `backticks` for quoting identifiers, ANSI SQL and PostgreSQL use "double-quotes", SQL Server and MS Access use [square brackets].

SQLite supports all three: https://www.sqlite.org/lang_keywords.html

andreareina · on March 5, 2022

Other rdbms have identifier quoting, yes. But the problem isn't that identifier quoting exists, it's that the library interprets an object as a key = value statement (compounded treating the key as an identifier), which it shouldn't.

    ... AND password = password = 1

is just wrong, quoted or not.

yawaramin · on March 6, 2022

Also, the fact that the library doesn't validate the type of the input data ({password: 1} when the password should be a string...).

This is just a cascading chain of failures.

useerup · on March 5, 2022

Yes. Almost all DBMSs has a way to delimit table/column names (for instance when a name contains spaces or special characters.

However, proper DBMSs expose parameters as 1st class concepts through the API. That has several advantages, some of which:

1) It is more secure, as you will not have to do this dangerous escaping before invoking. Parameters are substituted by the DBMS.

2) The DBMS can better understand the "dynamic" part of a query and cache query plans.

phiresky · on March 6, 2022

The popular pg-promise library for PostgreSQL in NodeJS has a similar issue - for some types of queries ("Formatting Filters") it interpolates parameters itself instead of using real parameterized queries.

This is especially bad because it uses it's own escaping function and escaping in PG depends on a server configuration variable (standard_conforming_strings) that the client doesn't know about.

This behavior is barely mentioned in the docs, and the author does not really accept any suggestions or criticism.

throw_m239339 · on March 6, 2022

> This behavior is barely mentioned in the docs, and the author does not really accept any suggestions or criticism.

This is where the programming community has a role to play.

When library authors blatantly brush off security issues, it's time to call out that behavior publicly and promote a secure fork. a database library should never have hidden behaviors such as theses. And "magics" such has manually building strings into a query like that s, or parsing a provided query to transform it into something else under the hood, should be turned off by default.

This is absolute madness.

SahAssar · on March 6, 2022

Is there any reason to not use the pg package? It does promises too, and has been rock solid for me.

henvic · on March 6, 2022

I see no parameterized queries on the article. The title is misleading.

technion · on March 6, 2022

This appears to be an exact copy of an issue described 12 days ago:

https://flattsecurity.medium.com/finding-an-unseen-sql-injec...

The opening segment is the exact same piece of vulnerable code.

tedunangst · on March 6, 2022

Indeed, that's why the opening paragraph links to that article.

axiosgunnar · on March 5, 2022

…in a JavaScript ORM, not the actual database.

Still interesting.

nyanpasu64 · on March 6, 2022

I think this bug wouldn't happen in a statically typed language where the attacker couldn't pass a variable of a different type than expected. Dynamic typing makes it difficult to know every possible state your code can be in.

todotask · on March 6, 2022

Indeed, this title should be change or suggest remove this article for wrong info.

henning · on March 5, 2022

I clicked through expecting some fiendishly clever trick involving weird Unicode characters and encodings or something. Nope, just people using a browser scripting language invented in 7 days on the backend by choice.

throw_m239339 · on March 6, 2022

The fault lies with library authors trying to be smart while being ignorant of how SQL works, not JavaScript necessarily.

Aeolun · on March 6, 2022

What is up with all you people ragging Javascript the language? This is 100% programmer error, and has absolutely nothing to do with the language.

Javascript has plenty of bad points without making without making things up.

s4i · on March 6, 2022

It's a meme on HN at this point. Lazy negative remarks about JS or other web technologies has high demand in the reader side too I guess.

the_gipsy · on March 5, 2022

The root mistake is taking in user input without any checking.

It's not the author's fault, it's JavaScript that leads this way. Even worse is TypeScript, as its compile-time checking adds a false sense of security.

hn_throwaway_99 · on March 5, 2022

I disagree. I would qualify this as a clear bug in mysqljs.

First off, most other DB drivers will use real prepared statements, so what is passed down to the DB is actually the templated query and a set of values. It looks like mysqljs actually parses and interpolates the string before sending it to the DB engine.

Perhaps more importantly, though, this bug is basically the exact type of bug that caused the log4shell fiasco: trying to be too "smart" with user provided values that should just be treated as "dumb" scalars. The fundamental way of filling out template parameters with something that can refer to DB column names is a major flaw IMO.

progforlyfe · on March 6, 2022

Bingo. Any time software is made to try parsing & escaping SQL queries before sending to the server is going to find out the hard way why that is risky.

With prepared statements it's like saying to the DB, here's the query: SELECT * FROM users WHERE email = ? AND password = ?. And here's the data: ["whatever", "whatever"]. You can literally send whatever your heart desires for those two strings, unescaped, to the DB server and let them deal with it.

galaxyLogic · on March 6, 2022

Prepared Statements are great. Is there any way to force the programmer to use (only) them?

singron · on March 6, 2022

From the DB, no. E.g. you could always have `query("... a = ? and b = "+b, [a])`.

From within a language, you somehow need to restrict how you build up the query. E.g. a query builder or ORM usually makes it hard to use anything besides parameters, but they usually have escape hatches (e.g. a where method that takes arbitrary sql strings).

One trick you can do in Go is to create a type like `type sqlLiteral string`. The type is private, so other packages can't directly instantiate it, but a string literal will automatically coerce to it. Basically, you can't compute sqlLiteral from arbitrary expressions. It must be a single hardcoded string literal.

nicoburns · on March 6, 2022

No, because constants and column names are of course legitimate parts of SQL queries (even those using prepared statements), and because queries with no parameters are also legitimate.

namibj · on March 6, 2022

Code review and static analysis work great, by allowing obviously-safe dynamic SQL where static SQL is insufficient.

Merad · on March 6, 2022

Both are true. When the request input is blatantly the wrong type like this, you shouldn't even be attempting to execute the sql query. But of course JavaScript encourages developers to be lazy and not even think about such things.

hn_throwaway_99 · on March 6, 2022

Developers shouldn't have to think about these things. If I'm doing any sort of string interpolation, whether it be in a SQL statement like this or in something like a log or print formatter, I should be able to put whatever the f I want into the value, without checking it. That's the whole point of using interpolated statements like this in the first place: it should be safe to run (but may throw something like a type error in the DB) regardless of what is stuffed in the value.

This is solely a mysqljs bug.

Merad · on March 6, 2022

If you use any strongly typed language with a halfway decent framework you _don't_ have to think about this because the request will automatically be validated and rejected for invalid input before your code is ever hit.

nicoburns · on March 6, 2022

That’s also true of a decent JS library. The problem here being that the library wasn’t decent.

A library in a strongly typed language would still need to use parameterised queries or escape the strings just like a JS library. And it would be just as easy or difficult to have an incorrect sanitising function.

In fact, the specified string is not invalid input, it would be perfectly valid to store backticks in a string field in MySQL. They just need to be correctly escaped before being submitted to the database.

hn_throwaway_99 · on March 6, 2022

That's not really true. Pretty much every DB driver I know, in every language, will take prepared statement values that are any valid DB datatype: strings, ints, decimals, etc., and, importantly, JSON values because most DBs now support JSON. Many JS DB frameworks will take any JS object and convert it to a JSON string because that's obviously a trivial operation (it is "JavaScript Object Notation" after all).

Again, the only bug here is the completely incorrect serialization that mysqljs does.

Merad · on March 6, 2022

You're missing my point because you're focused on the database query. I'm talking about the web application framework. If this was written in Asp.Net, for example, the password field would be declared as a string and the data binding code within Asp.Net would validate that the field was in fact a string as part of instantiating a model for the strongly typed request body. If someone passed in an object for that field your code for the endpoint, including all of the database interaction, would never be called because the framework would automatically return a 400 response when data binding failed.

thinkharderdev · on March 6, 2022

I don't think it would actually be possible to use proper prepared statements for a "feature" like this. The whole point is that query is built dynamically from any JSON object. So you could, strictly speaking, build a prepared statement dynamically but you would have the exact same bug.

zamalek · on March 6, 2022

Nope. Primarily, this stance is hostile to international users. We have all the tools required to transparently move data from the front-end to the backend, and back to the UI. Furthermore, simply because my password begins with '{', contains '{', or is 4367 characters long shouldn't be a concern whatsoever.

Sanitization is an ignorantly dangerous practice, because it obscures much safer practices.

Validation? That's for business rules. No more order items than there are things in stock, and stuff like that. If you need to do that because something in your stack needs it, then it's far past the time to reconsider your stack.

ovex · on March 6, 2022

The problem is not only that types are not checked but that strings are simply replaced in structured input. Without having looked in detail, I would bet that the implementation also fails at queries which contain question marks inside strings, i.e. question marks that are not placeholders. Even if string escaping was the right way, which it is not, the proper way to do it would be to translate the SQL statement into an AST and then replace the leaves of the tree that are placeholders with the respective escaped strings.

the_gipsy · on March 6, 2022

> Perhaps more importantly, though, this bug is basically the exact type of bug that caused the log4shell fiasco: trying to be too "smart" with user provided values that should just be treated as "dumb" scalars.

Which is exactly the equivalent of using JSON.parse(). You will find more library combinations, and for many of them the options can't be considered bugs like you do for mysqljs here, and you definitely won't be able to catch it.

The solution is parsing beyond JSON.parse(). You can't have arbitrary user provided structures floating around your program.

ovex · on March 6, 2022

I'd argue that the root mistake is that the server-side feature of prepared statements through the client/server binary protocol is not used as described in [1]. Instead, the question marks are replaced using a dirty hack. The whole point of prepared statements is that the client/server protocol takes care of inserting the values into the statement instead of having to fiddle with functions like mysql_real_escape_string.

[1]: https://dev.mysql.com/doc/refman/8.0/en/sql-prepared-stateme...

evanelias · on March 6, 2022

This is actually not unusual though in practice. Large-scale applications quite frequently do client-side (application-side) handling of bind variables -- including many large websites/apps that you use, I can absolutely guarantee this from personal experience.

The reason is that using a real prepared statement in MySQL (as well as most other DBMS) requires an extra round-trip, which adds latency. It also potentially adds complexity if a proxy/middleware layer is in use, since prepared statements are per-connection, at least in MySQL.

The core problem in this specific case is that this js mysql client library simply did not implement client-side interpolation correctly or securely. This is quite bad; I'm not aware of a similar problem in any other major language's most popular mysql driver in recent years.

ovex · on March 6, 2022

Interesting. I would still insist on not doing it that way, especially when writing a library for universal use. First of all, string replacement on structured input is an immediate red flag. Second, even if you get handling the structured input right by parsing the statement into an AST and replacing the leaves that are placeholders with the escaped strings, there are still potential vulnerabilities. For example, mysql_real_escape_string is still vulnerable when being used with some exotic character sets [1].

I don't know the internals of the (binary) protocol used for communication with the MySQL server though. Couldn't one just save the extra round-trip with length-prefixed strings by sending the query together with the parameters in a single message?

[1]: http://www.gosecure.it/blog/art/483/sec/mysql_escape_string-...

bgrainger · on March 6, 2022

> Couldn't one just save the extra round-trip with length-prefixed strings by sending the query together with the parameters in a single message?

No, this wouldn't work, because you have to send COM_STMT_PREPARE (https://dev.mysql.com/doc/internals/en/com-stmt-prepare.html) first, which takes the SQL and returns a "statement ID". Then you can send COM_STMT_EXECUTE (https://dev.mysql.com/doc/internals/en/com-stmt-execute.html) which contains the statement ID and the parameters. Finally, you would ideally send COM_STMT_CLOSE (https://dev.mysql.com/doc/internals/en/com-stmt-close.html) to free the server-side resources for the prepared statement, although this could be "pipelined" with the EXECUTE packet.

evanelias · on March 6, 2022

> mysql_real_escape_string is still vulnerable when being used with some exotic character sets

Indeed -- mysql_real_escape_string "mostly" fixes this problem by requiring a db connection as one of its args. Since the driver is usually aware of the connection state, mysql_real_escape_string can check to see if one of those exotic charsets is in-use. The issue is that there are multiple ways to change the connection charset, some of which the driver is aware of (e.g. in PHP mysqli set_charset) but some it is not (running textual statements like SET NAMES or SET CHARACTER SET).

However, generally an attacker won't have the ability to set an arbitrary exotic character set for the connection anyway... unless they already have some other sql injection mechanism, in which case it's a moot point :)

Driver documentation also typically mentions this problem. For example, here's the doc for doing client-side param interpolation in the most popular MySQL driver for Golang: https://github.com/go-sql-driver/mysql#interpolateparams (see warning in italics)

It also explicitly detects if your initial connection settings attempt to use one of those charsets along with param interpolation, and throws an error if so: https://github.com/go-sql-driver/mysql/blob/21f789cd/dsn.go#...

> Couldn't one just save the extra round-trip with length-prefixed strings by sending the query together with the parameters in a single message?

AFAIK, no, not with the traditional MySQL binary protocol. The newer "X protocol" introduced in MySQL 5.7 does allow this, but it is not widely implemented in drivers.

nicoburns · on March 6, 2022

I actually think it’s fine to do string sanitising here. The string formats for the databases are well specified, and writing a function to that spec shouldn’t be difficult. I believe there’s no need to parse to AST for postgres as you can just wrap the string in quotes, replace any existing quotes in the string with two quotes and you’re done. If other databases have funkier string formats then you may have a point.

at least that’s true for Postgres, I can’t speak for MySQL.

vlovich123 · on March 6, 2022

I’m kind of shocked that there’s an extra round trip.

One would expect you can do the preparation lazily as there shouldn’t even be a hypothetical performance consideration for using prepared statements.

In other words you send an execute prepared statement request, you can provide the truly prepared statement OR the templated sql string. In the latter case the server would send you the results of preparing the statement before sending you the result of execution so that you can provide it on subsequent requests.

Or use an RPC system that supports promise pipelining so that you could send the prepare request and the execute is sent with the promise of the prepare request.

evanelias · on March 6, 2022

I think the core mistake in a lot of these older protocols is conflating the concept of a prepared statement with the concept of a parameterized query.

With a prepared statement, there's an implication that you want to re-use it multiple times, so the DB needs to give you back some sort of handle/identifier on it. But this requires bookkeeping on both the client and server side, and that overhead is often not worthwhile on modern CPUs, especially when running fairly simple CRUD-style queries which are quick to parse. (and at least in modern versions of MySQL, the DB will track a "digest" of the query which removes params anyway, even if you used client-side param interpolation!)

So ideally DB binary protocols should offer less expensive ways of doing parameters for "fire and forget" queries, instead of having to track real prepared statements, but often they don't.

MySQL's newer X Protocol provides a way to avoid the extra round-trip, but it isn't very widely used yet AFAIK, and it still involves prepared statement bookkeeping: https://dev.mysql.com/doc/internals/en/x-protocol-use-cases-...

vlovich123 · on March 6, 2022

Yes, you should also be able to do a stand-alone parametrized query. The only DB system I’ve seriously programmed against was SQLite where this isn’t an issue.

namibj · on March 6, 2022

PostgreSQL Extended Query Protocol won't cause the round trip issue; you can even send multiple parametrized queries back-to-back without any RTT penalties. This requires consideration in the API, though, because errors are delayed until the end of such a sequence, loosing attribution as to which individual query caused the error (at least at a low-level protocol level).

Not that this is a problem for a well-designed library, of course. E.g. tokio-postgres in Rust does it if you use a promise-combinator for the first poll of multiple queries inside the same transaction.

thinkharderdev · on March 6, 2022

The root mistake is that the actual feature makes using real prepared statements impossible. The whole point of a prepared statement is that you know the query AST ahead of time so you can just plug user input into the AST. I'm guessing since this seems to be intentionally designed this way, the goal is to allow a "parameterized query" like "SELECT * FROM foos WHERE ?" and I can pass { "foo": 1, "bar": 2 } to get "SELECT * FROM foos WHERE foo = 1 AND bar = 2"

tedunangst · on March 6, 2022

Soon: mysqljs.connection.real_query()

exabrial · on March 7, 2022

Looking at this code, this bug would could have been prevented if a statically typed language was used.

In addition to, actually using DB-level pre-compiled statements.

_bz2r · on March 6, 2022

When the API makes your query appear parameterized but it's not, then you may be the victim of attacks that work against non-parameterized queries.

joshuahaglund · on March 6, 2022

It seems to me that none of this would happen if they followed best practices, which (if I understand correctly) is to query for the record using the idendifier (username), make sure there is only one result, then use like some standard crypt library to compare the password to the hashed+salted password?

This just seems like bad code on bad code.

jbverschoor · on March 6, 2022

So it’s not actually parametrized

SonOfLilit · on March 6, 2022

As a security professional, I was horrified to find out that the maintainers don't consider this a security issue, though they did promise to take this seriously and change the API when they were made aware of it in 2014 (https://github.com/mysqljs/mysql/issues/731).

So I bumped an issue, noting this is all over HN, and offered to write a pull request for the API change proposed by the maintainers:

https://github.com/mysqljs/sqlstring/issues/60

Doug agreed to accept such a request, so I just sat down to figure out the code and a reasonable upgrade plan.

Three hours later, I could already write Doug this email (pasting it here because the issue and codebase are locked to non-contributors so I had to send it via email):

OK, I have a draft pull request ready. Of course, it's a big change and I expect to get a lot of feedback and have a few rounds of back and forth and fixups before it is accepted.

This is the plan as I envision it:

* Release SqlString 3.0.0 that has a new allowObjectValues parameter defaulting to false. This is a new major, so it shouldn't break anybody's code.

* Release mysqljs 2.19.0 (or should it be 2.18.3 for even higher adoption?) that depends on SqlString 3.0.0 but explicitly passes allowObjectValues on every call to it,with a default of true unless the user explicitly set it to false. This is a non-breaking change. This version will also add a deprecation warning whenever a ConnectionConfig is built without explicitly setting allowObjectValues, warning that the default will change in 3.0 and suggesting to set it to false unless it's needed and highlighting the need to typecheck values if it's set to true.

* Release mysqljs 3.0 that changes the default and removes the deprecation warning, so new projects get a sane default.

This involves, of course, changes to two repositories, so here they are (I can't open pull requests because I have not contributed in the past):

https://github.com/SonOfLilit/sqlstring https://github.com/SonOfLilit/mysql

(I didn't write the mysqljs3.0 patch yet to make pull request technicalities simpler, but it's trivial)

Again, I'm new to this project, am not a javascript developer in my day job, and I expect - and am prepared to handle - nontrivial amounts of feedback and requests for improvement.

For a brighter, safer future :-),

Aur