
I've been looking at Iceberg for a while, but in the end went with Delta Lake because it doesn't have a dependency on a catalog. It also has good support for reading and writing from it without needing Spark.

Does anyone know if Iceberg has plans to support similar use cases?



Iceberg has the Hadoop (filesystem) catalog, which likewise relies only on directories and files.
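For context, the filesystem catalog works because the table directory itself encodes the current state: a `metadata/version-hint.text` file holds the latest version number N, which points at `vN.metadata.json`. A simplified sketch of that lookup (the real catalog also handles missing hints and listing fallbacks):

```python
import os
import tempfile

def current_metadata_path(table_dir: str) -> str:
    """Resolve the newest metadata file the way Iceberg's filesystem
    catalog does: read metadata/version-hint.text, which holds the
    latest version number N, then point at vN.metadata.json."""
    hint = os.path.join(table_dir, "metadata", "version-hint.text")
    with open(hint) as f:
        version = int(f.read().strip())
    return os.path.join(table_dir, "metadata", f"v{version}.metadata.json")

# Demo against a throwaway directory laid out like an Iceberg table.
with tempfile.TemporaryDirectory() as table_dir:
    os.makedirs(os.path.join(table_dir, "metadata"))
    for v in (1, 2, 3):
        open(os.path.join(table_dir, "metadata", f"v{v}.metadata.json"), "w").close()
    with open(os.path.join(table_dir, "metadata", "version-hint.text"), "w") as f:
        f.write("3")
    print(current_metadata_path(table_dir))  # ends with v3.metadata.json
```

No catalog service is involved: any reader that can list the directory and read the hint file can find the table's current metadata.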

That said, a catalog (which Delta can also have) helps a lot to keep things tidy. For example, I can write a dataset with Spark, transform it with dbt and a query engine (such as Trino), and consume the resulting dataset with any client that supports Iceberg. With a catalog, all of that happens without having to register the dataset's location in each of these components.


Why don't you want a catalog? The SQL or REST catalogs are pretty light to set up. I have my eye on lakekeeper[0], but Polaris (from Snowflake) is a good option too.

PyIceberg is likely the easiest way to write without Spark.

0 - https://github.com/lakekeeper/lakekeeper
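A minimal sketch of the Spark-free PyIceberg write path, assuming a REST catalog already running at an illustrative URI and a `demo` namespace that already exists (names and config are placeholders, and the `schema=` shortcut accepting a PyArrow schema requires a recent PyIceberg):

```python
import pyarrow as pa
from pyiceberg.catalog import load_catalog

# Connect to a REST catalog; URI is illustrative.
catalog = load_catalog("default", **{
    "type": "rest",
    "uri": "http://localhost:8181",
})

# Build a small Arrow table and append it -- no Spark involved.
df = pa.table({
    "id": pa.array([1, 2, 3], pa.int64()),
    "name": pa.array(["a", "b", "c"]),
})
table = catalog.create_table_if_not_exists("demo.events", schema=df.schema)
table.append(df)

# Read it back with a plain table scan.
print(table.scan().to_arrow().num_rows)
```

This only needs the catalog endpoint and object-store credentials; every engine pointed at the same catalog then sees the table without per-client registration.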


We evaluated various REST catalog options and went with Open Catalog from Snowflake (a Polaris-based managed service that works independently of their data warehousing solution). Lakekeeper is nice - it's one of the few catalogs with fine-grained access control (FGAC) and table maintenance.

https://tower.dev/blog/picking-snowflake-open-catalog-as-a-m...


PyIceberg is nice, but we had to drop it because it lags behind the Java API and it's unclear when it will catch up. Depending on which features you need, it's worth checking the feature support first.


What are you using instead?



