a) Scala being a JVM language is one of the fastest around. Much faster than say...

rr808 · 2025-05-22T08:43:58 1747903438

1) Yes Scala and JVM is fast. If we could just use that to clean up a feed on a single box that would be great. The problem is calling the Spark API creates a lot of complexity for developers and runtime platform which is super slow. 2) Yes for the few feeds that are a TB we need spark. The platform really just loads from hadoop transforms then saves back again.

threeseed · 2025-05-22T09:09:47 1747904987

a) You can easily run Spark jobs on a single box. Just set executors = 1.

b) The reason centralised clusters exist is because you can't have dozens/hundreds of data engineers/scientists all copying company data onto their laptop, causing support headaches because they can't install X library and making productionising impossible. There are bigger concerns than your personal productivity.

rr808 · 2025-05-22T09:12:29 1747905149

> a) You can easily run Spark jobs on a single box. Just set executors = 1.

Sure but why would you do this? Just using pandas or duckdb or even bash scripts makes your life is much easier than having to deal with Spark.

cgio · 2025-05-22T09:36:05 1747906565

For when you need more executors without rewriting your logic.

this_user · 2025-05-22T11:05:14 1747911914

Using a Python solution like Dask might actually be better, because you can work with all of the Python data frameworks and tools, but you can also easily scale it if you need it without having to step into the Spark world.

threeseed · 2025-05-23T19:35:34 1748028934

But Dask is orders of magnitude slower to Spark.

And you can still use Python data frameworks with Spark so not sure what you're getting.

rpier001 · 2025-05-22T10:09:01 1747908541

Re: b. This is a place where remote standard dev environments are a boon. I'm not going to give each dev a terabyte of RAM, but a terabyte to share with a reservation mechanism understanding that contention for the full resource is low? Yes, please.

Larrikin · 2025-05-22T10:50:30 1747911030

But can you justify Scala existing at all in 2025. I think it pushed boundaries but ultimately failed as a language worth adoption.l anymore.

threeseed · 2025-05-22T21:31:14 1747949474

Absolutely.

a) One of the only languages you can write your entire app in Scala i.e. it supports compiling to Javascript, JVM and LLVM.

b) It has the only formally proven type system of any language.

c) It is the innovation language. Many of the concepts that are now standard in other languages had their implementation borrowed from Scala. And it is continuing to innovate with libraries like Gears (https://github.com/lampepfl/gears) which does async without colouring and compiler additions like resource capabilities.

mountainriver · 2025-05-22T23:14:49 1747955689

I’m sorry but these are extremely weak arguments, and I would contend scale caused more harm than good in all

tsss · 2025-05-23T08:03:49 1747987429

Scala is still one of the most powerful languages out there.

tomrod · 2025-05-22T13:13:14 1747919594

PySpark is a wrapper, so Scala is unnecessary and boggy.

spark1377485 · 2025-05-22T17:36:04 1747935364

PySpark is great, except for UDF performance. This gap means that Scala is helpful for some Spark edge cases like column-level encryption/decryption with UDF