
This is a big deal in the database world, as Delta, Iceberg and Hudi mean that data is being stored in an open source format, often on S3.

It means that the storage and much of the processing are being standardised, so you can move between databases easily, and almost all tools will eventually be able to work with the same set of files in a transactionally sound way.

For instance, Snowflake could be writing to a file, a data scientist could be querying the data live from a Jupyter notebook, and ClickHouse could be serving user facing analytics against the same data with consistency guarantees.

If the business then decides to switch from Snowflake to Databricks, it isn’t such a big deal.

Right now, querying these formats on S3 isn’t quite as fast as a native ingestion would be, but every database vendor will be forced by the market to optimise for performance such that they tend towards the performance of natively ingested data.

It’s a great win for openness and open source and for businesses to have their data in open and portable formats.

Lakehouse has the same implications. Lots of companies have data lakes and data warehouses and end up copying data between the two. To query the same set of data and have just one system to manage is equally impactful.

It’s a very interesting time to be in the data engineering world.



Apache Arrow and Substrait have been working towards making this a reality. I see a future where executing a query can and will send plans to many different engines distributed across the cloud, but also locally on your own machine.


Real-time bidding on query execution? The more I think about it, the more I believe you actually have a viable business model here.


That’s a wildly interesting idea.

It opens up another market too: compatible, scalable storage. Sell shovels in a gold rush, and what better shovel than the substrate infrastructure that those bidding query engines would probably depend on?


If the queries can be executed by any provider, you are talking about a commodity product.

The business model of selling a commodity is wildly unlike the business model tech is in today.


The query execution might be commodity, but the purchasers will still need to store their data somewhere, and this somewhere will need to be able to service the bandwidth and requirements of the query execution providers.


It feels like you could just as well pack the runtime/engine into the job you are requesting? Am I wrong?


The point is more about creating interoperability between systems and, in turn, making them composable.

When there’s a common intermediate representation, you can pass those compute instructions around and execute them anywhere. And when there’s a shared memory format, data can pass from storage to engine without serialization/deserialization.

So it wouldn’t matter if the data is here or there, in this or that format: because the instructions are the same, the specific interface (Snowflake, MySQL, a local Parquet file, etc.) is irrelevant, which removes the need for glue code.


> " every database vendor will be forced by the market to optimise for performance such that they tend towards the performance of natively ingested data."

This assumes that their internal storage format has nothing to do with the decades of engineering infrastructure they built their business model around, and that they would simply give all that up and compete on their compute layer alone. Snowflake might as well shut up shop and return billions to the investors. Locking data into their ecosystem is their whole business model.

Is there a good example of an open standard forcing companies to give up their proprietary tech?


That's the natural evolution of most tech markets. When the tech is young, proprietary companies dominate because they can control the customer experience better and deliver functionality that is simply too complex for open solutions. As the technology matures, customers start demanding interoperability, reliability, better prices, and eventually some employees "defect" from one of the big companies and start the open standards that replace their ex-employer, or an outsider reads a paper and re-implements the technology from scratch.

> Is there a good example of an open standard forcing companies to give up their proprietary tech?

UNIX -> Linux, BSD

Oracle/Sybase -> MySQL/PostgreSQL

Symbolics/Lucid -> Common Lisp

Altair/Apple/Commodore/Atari -> IBM PC & clones

VMWare -> QEMU

Basically every tech that Google pioneered and then missed out on commercializing. Protobufs -> Avro/Parquet, MapReduce -> Hadoop, Flume -> Spark, Chubby -> Zookeeper, Borg -> Kubernetes, etc.


I’ll just point out on the Snowflake side, we’ve been very public saying we want Iceberg/Parquet to be at or as close to parity as possible with our native format. The value add is the platform, not lock-in. That also forces us to be the best on open formats, which IMO is also a good thing for everyone.

Disclaimer: I work at Snowflake literally on this with my team. :)


> we’ve been very public saying we want Iceberg/Parquet to be at or as close to parity as possible with our native format

That’s great to hear. Would this mean that external Iceberg tables would have the same performance as native tables? My impression from the parent comment was that, eventually, there would be no such thing as a ‘native format’. Really interested to see public statements by Snowflake to that effect; I would love to share them with my team.


> Snowflake might as well shut up shop and return billions to the investors.

I mean, we can dream right?

There’s a bunch of companies that I don’t believe deserve their status or valuation and Snowflake is one of them.





