Hacker News: mikejulietbravo's comments

nuances like idioms can and will be solved. wispr flow is already solving a lot of these things via their speech-to-text interface.

as better models are introduced, figurative language, implication, cultural nuance, etc. become easier to reconcile.


does the plugin architecture come with added overhead (assuming you add a lot of elements)?


If the plugins are in Rust and tie into Bevy, then there probably is minimal overhead. That's one of the advantages of ECS. It's designed for efficiently adding additional layers of functionality.


It depends on whether the ECS needs to sync its data with external (non-ECS) systems.
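A minimal Python sketch of both points. This is not Bevy's API (Bevy is Rust, and its plugins, queries and schedules are far more sophisticated); it is a toy ECS with hypothetical names (`World`, `add_plugin`, `movement_plugin`, `sync_plugin`). A plugin only registers more systems that loop over component storage, so its cost is roughly that extra loop. The exception is a system that has to copy ECS state out to an external, non-ECS store on every tick.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class World:
    """Toy ECS world: components live in per-type dicts keyed by entity id."""
    components: dict[str, dict[int, dict]] = field(default_factory=dict)
    systems: list[Callable[[World], None]] = field(default_factory=list)
    next_id: int = 0

    def spawn(self, **comps: dict) -> int:
        eid = self.next_id
        self.next_id += 1
        for name, value in comps.items():
            self.components.setdefault(name, {})[eid] = value
        return eid

    def query(self, *names: str):
        """Yield (entity, components...) for entities that have every named component."""
        if not names:
            return
        stores = [self.components.get(n, {}) for n in names]
        for eid in stores[0]:
            if all(eid in s for s in stores[1:]):
                yield (eid, *(s[eid] for s in stores))

    def add_plugin(self, plugin: Callable[[World], None]) -> None:
        # A "plugin" here just registers systems; it adds no per-tick work by itself.
        plugin(self)

    def tick(self) -> None:
        for system in self.systems:
            system(self)


def movement_plugin(world: World) -> None:
    def movement(w: World) -> None:
        for _, pos, vel in w.query("pos", "vel"):
            pos["x"] += vel["dx"]
    world.systems.append(movement)


# Hypothetical external, non-ECS store (think UI layer or third-party physics engine).
external_positions: dict[int, float] = {}


def sync_plugin(world: World) -> None:
    def sync_out(w: World) -> None:
        # This is where overhead shows up: data is copied out of the ECS every tick.
        for eid, pos in w.query("pos"):
            external_positions[eid] = pos["x"]
    world.systems.append(sync_out)


world = World()
e = world.spawn(pos={"x": 0.0}, vel={"dx": 1.5})
world.add_plugin(movement_plugin)
world.add_plugin(sync_plugin)
for _ in range(2):
    world.tick()
print(external_positions)
```

After two ticks the external store mirrors the ECS position. The movement system costs one pass over matching entities; the sync system adds a second pass plus the copy, which is the kind of cost that grows with how much state has to leave the ECS.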


was just glad this wasn't about twitter :)


Mike from Baseten here

We're super proud to support this work. If you're thinking of running deepseek in production, give us a shout!


We're currently evaluating DeepSeek-R1 for our production system. We aren't done yet, but I think it's a match.


Awesome - we'd love to have our CEO/CTO chat with you and your team if you're interested. Shoot me a note at mike.bilodeau @ baseten.co and I'll make it happen!


Earlier today I read a reddit comment[1] about a guy who tried running the quantized version from unsloth[2] on 4xH100 and the results were underwhelming (it ended up costing $137 per 1 million tokens).

Any idea of what they're doing wrong?

[1]: https://www.reddit.com/r/LocalLLaMA/comments/1icphqa/how_to_...

[2]: https://unsloth.ai/blog/deepseekr1-dynamic


They're using Llama.cpp which is an amazing tool for local inference but doesn't match fast inference frameworks like TensorRT-LLM/SGLang for production speeds and throughputs on Hopper GPUs.

The Unsloth quantizations are really cool, but if you want to experiment with the R1 models in a smaller form factor the R1 Distills like Llama 70B are great and should run a lot faster as they take advantage of existing optimizations around inferencing llama-architecture models.


> They're using Llama.cpp which is an amazing tool for local inference but doesn't match fast inference frameworks like TensorRT-LLM/SGLang for production speeds and throughputs on Hopper GPUs.

That's something I thought about, but it wouldn't explain much: they are roughly two orders of magnitude off in terms of cost, and only a small fraction of that could be explained by the performance of the inference engine.

> The Unsloth quantizations are really cool, but if you want to experiment with the R1 models in a smaller form factor the R1 Distills like Llama 70B are great and should run a lot faster as they take advantage of existing optimizations around inferencing llama-architecture models.

What kind of optimization do you have in mind? DeepSeek has only 37B active parameters, which means ~12GB at this level of quantization, so inference ought to be much faster than a dense 70B model, especially an unquantized one, no? The Llama 70B distill would benefit from speculative decoding, though, but that shouldn't be enough to compensate. So I'm really curious about what kind of llama-specific optimizations you mean, and how much of a speedup you think they'd bring.
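The reasoning here rests on the idea that, at batch size 1, decoding is roughly memory-bandwidth bound, so the bytes of weights read per token set the pace. A minimal sketch of that arithmetic; the bit widths are assumptions (about 2.6 bits per weight reproduces the ~12GB figure, and 16 bits stands for an unquantized BF16 70B model), and real quantized files also carry scales and some unquantized layers.

```python
def weight_gb(params: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes) at a given bit width."""
    return params * bits_per_weight / 8 / 1e9


# Assumed bit widths, for illustration only.
r1_active = weight_gb(37e9, 2.6)  # DeepSeek-R1 parameters active per token (MoE)
llama_70b = weight_gb(70e9, 16)   # dense 70B distill in BF16: every weight is read per token

print(f"R1 active weights per token: {r1_active:.1f} GB")
print(f"Llama 70B weights per token: {llama_70b:.1f} GB")
print(f"ratio: {llama_70b / r1_active:.1f}x more bytes read per token")
```

Under these assumptions the dense model reads roughly an order of magnitude more weight bytes per token, which is why the commenter expects the MoE model to be faster rather than slower.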


I’m not an expert on at-scale inference, but they surely can’t have been running at a batch size of more than 1 if they were getting performance that bad on 4xH100… and I’m not even sure how they were getting performance that low even at batch size 1. Batching is essential to serving large token volumes at scale.

As the comments on reddit said, those numbers don’t make sense.
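To see why batching dominates a cost-per-token figure, here is a sketch of cost per million tokens as a function of aggregate throughput. The $2.50 per GPU-hour price and the throughput values are hypothetical placeholders, not numbers from the Reddit post; the point is only that the same hourly bill gets divided over however many tokens all concurrent requests produce.

```python
def cost_per_million_tokens(num_gpus: int, usd_per_gpu_hour: float,
                            tokens_per_second: float) -> float:
    """Dollars per 1M generated tokens, given aggregate throughput across all requests."""
    usd_per_hour = num_gpus * usd_per_gpu_hour
    tokens_per_hour = tokens_per_second * 3600
    return usd_per_hour / tokens_per_hour * 1e6


# Hypothetical: 4 GPUs at an assumed $2.50/GPU-hour.
for tps in (20, 200, 2000):  # e.g. one sequential stream vs. increasingly batched serving
    print(f"{tps:>5} tok/s -> ${cost_per_million_tokens(4, 2.50, tps):.2f} per 1M tokens")
```

With these made-up inputs, a single unbatched stream of about 20 tok/s already lands in the same order of magnitude as the reported figure, and the cost falls in direct proportion as concurrent requests raise aggregate throughput.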


> I’m not an expert on at-scale inference, but they surely can’t have been running at a batch size of more than 1 if they were getting performance that bad on 4xH100… and I’m not even sure how they were getting performance that low even at batch size 1. Batching is essential to serving large token volumes at scale.

That was my first thought as well, but from a quick search it looks like Llama.cpp has a default batch size that's quite high (256 or 512, I don't remember exactly, which I find surprising for something that's mostly used by local users), so it shouldn't be the issue.

> As the comments on reddit said, those numbers don’t make sense.

Absolutely, hence my question!


Sure, but that default batch size would only matter if the person in question was actually generating and measuring parallel requests, not just measuring the straight line performance of sequential requests... and I have no confidence they were.


Can you share at a high level how you run this model?

We know it’s 671B params with each MOE node at 37B…

If the GPUs have say, 140GB for an H200, then do you just load up as many nodes as will fit into a GPU?

How much do interconnects hurt performance vs being able to load the model into a single GPU?


Yeah so MoE doesn't really come into play for production serving -- once you are batching your requests you hit every expert at a large enough batch size so you have to think about running the models as a whole.
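A rough way to check the claim that a large enough batch touches every expert. Assume each token picks k of E experts and, as a simplifying assumption, that routing is uniform and independent across tokens; then a given expert is unused by a batch of n tokens with probability (1 - k/E)^n. The E = 256, k = 8 values are assumed to match DeepSeek-V3's routed experts per MoE layer; real routers are not uniform, which is why load balancing still matters.

```python
def expected_experts_hit(num_experts: int, top_k: int, num_tokens: int) -> float:
    """Expected number of distinct experts used by a batch, under uniform independent routing."""
    p_unused = (1 - top_k / num_experts) ** num_tokens
    return num_experts * (1 - p_unused)


E, K = 256, 8  # routed experts per MoE layer, experts chosen per token (assumed values)
for n in (1, 16, 64, 256, 1024):
    print(f"batch of {n:>4} tokens -> ~{expected_experts_hit(E, K, n):.1f} of {E} experts active")
```

During decoding each in-flight sequence contributes one token per step, so n is roughly the number of concurrent requests; under these assumptions a few hundred of them already light up essentially every expert in a layer.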

There are two ways we can run it:

- 8xH200 GPU == 8x141GB == 1128 GB VRAM

- 16xH100 GPU == 16x80GB == 1280 GB VRAM
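The totals are GPU count times per-GPU memory (16 x 80 GB for the H100 option). A quick sketch that also compares them with the weight footprint, assuming all 671B parameters are stored at 1 byte each (FP8); the remainder is what is left for KV cache, activations and runtime overhead.

```python
GB = 1e9  # decimal gigabytes, as GPU memory is usually quoted

# (GPU count, memory per GPU) for the two deployment options.
configs = {
    "8xH200": (8, 141 * GB),
    "16xH100": (16, 80 * GB),
}
# Assumption: 671B parameters at 1 byte each (FP8). Some tensors are kept in
# higher precision, so treat this as a lower bound on the weight footprint.
weights_bytes = 671e9 * 1

totals = {name: count * per_gpu / GB for name, (count, per_gpu) in configs.items()}
for name, total_gb in totals.items():
    headroom_gb = total_gb - weights_bytes / GB
    print(f"{name}: {total_gb:.0f} GB total, ~{headroom_gb:.0f} GB left after weights")
```

Both options fit the weights with a few hundred GB to spare; the 16xH100 option has more total memory but pays for it with the multi-node communication described below.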

Within a single node (up to 8 GPUs) you don't see any meaningful hit from GPU-to-GPU communication.

More than that (e.g. 16xH100) requires multi-node inference which very few places have solved at a production-ready level, but it's massive because there are way more H100s out there than H200s.


> Yeah so MoE doesn't really come into play for production serving -- once you are batching your requests you hit every expert at a large enough batch size

In their V3 paper DeepSeek talk about having redundant copies of some "experts" when deploying with expert parallelism in order to account for the different amounts of load they get. I imagine it only makes a difference at very high loads, but I thought it was a pretty interesting technique.
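A toy illustration of the redundant-experts idea, not DeepSeek's actual placement algorithm: given observed per-expert loads and a budget of extra replica slots, repeatedly duplicate whichever expert currently has the highest load per replica, assuming its traffic then splits evenly across replicas. The function name and load numbers are made up for the example.

```python
from __future__ import annotations

import heapq


def assign_redundant_replicas(loads: list[float], extra_slots: int) -> list[int]:
    """Greedy sketch: give each extra replica to the expert with the highest load per replica."""
    replicas = [1] * len(loads)
    # Max-heap on load per replica (negated, since heapq is a min-heap).
    heap = [(-load, i) for i, load in enumerate(loads)]
    heapq.heapify(heap)
    for _ in range(extra_slots):
        _, i = heapq.heappop(heap)
        replicas[i] += 1
        heapq.heappush(heap, (-loads[i] / replicas[i], i))
    return replicas


loads = [900.0, 300.0, 250.0, 150.0]  # hypothetical tokens routed to each expert
replicas = assign_redundant_replicas(loads, extra_slots=2)
per_replica = [load / r for load, r in zip(loads, replicas)]
print(replicas, max(per_replica))
```

Here both spare slots go to the hot expert, cutting the busiest replica's load from 900 to 300; with evenly loaded experts the extra copies would buy almost nothing, which matches the intuition that this only pays off when load is skewed.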


The start of this reads like the beginning of a cult manifesto, but then transitions to a very logical solution for an important problem.

Baffling. I'm in


It keeps bouncing between too real and too absurd.

Absurd: Not enough power on our sailboat to run Ableton and Photoshop.

Real: So we replaced it with open source technology.

Absurd: That technology was based on Electron.

Real: Electron was too bloated.

Absurd: So we ported everything over to the NES.

Real: And now you can run our software anywhere you can emulate an NES


everything about it reinforces the feeling that it's all just retroactive justification for finding a toy they made more fun than expected

ETA: to be clear there's nothing wrong with making a toy and then turning that toy into its own all-consuming hobby (TTRPGs for example) and one of the best parts of programming is how easy it is to do just that. It's just kind of annoying watching people wax rhapsodic about nonsense instead of copping to "yeah we're having a lot of fun, i feel like a kid again"


fwiw they actually live on a sailboat and have sporadic internet access and limited electricity, so saying it's retroactive justification isn't really true and minimizes the real problems they face.


The problem is that none of their problems are real problems and there's nothing to minimize when they're not real. You cannot minimize made-up first-world problems.


No, they really do live on a sailboat with intermittent power and internet access. Unless you take "made up" to mean "as a result of their choices" these are real problems, and ironically enough not problems faced by most people in the first world.

https://checkpointgaming.net/features/2020/05/making-games-a...


> Unless you take "made up" to mean "as a result of their choices"

Not the original poster, but that’s my view exactly. If you impose the limitations upon yourself then it’s not really a “problem” for you, is it now. You can just afford to make your life shittier for an “experience” and then have fun solving the issues you’ve created for yourself.


Then say "constraints" if it feels better. To me, this conversation comes off as much more of a manufactured problem than idealistic people living on a boat and figuring out how to make tech work for them.

Edit: However, upon reading further comments, I don't want this to be seen as a defense of the group against actual complaints.


One of the (many) fascinating things here is that - even if by virtue of their 'self-imposed' stringencies - their output showcases production values that are very applicable throughout.-


Problems created by lifestyle choices are still real problems.


> , so saying it's retroactive justification isn't really true and minimizes the real problems they face.

I wouldn't call any of the listed problems "real problems" in the context of my long-winded disability and homelessness lmao. I used to be in their community, the mods, and indirectly, them, were abusive as hell. Their community is, last I heard, hemorrhaging queer folk (or maybe it's bled dry and queer folk just don't stick around there anymore!) because they have repeatedly shielded abusive members and placed them in positions of power, and ignored, silenced, and ejected their victims when they finally kicked up a fuss about it. Part of the move from an internal chat to Mastodon was specifically so it would take the pressure off them having to actively perform any sort of moderation duty or deal with the abusive people directly.

They are, fundamentally, rich people playing at being poor and living on a tiny sustainable island while the rest of the world burns. Their stuff is very interesting, sure, but stating "real problems they face" ignores the fact that every one of the problems they are facing is one that they themselves have created. I actually really love some of the things they've come out with, but it's important that all of their work comes with the context that it was formed in, at least in my opinion.

edit: I forgot about the 'cult' thing... they are absolutely a cult. at least one of their members made explicit reference multiple times to being part of a cult and it was never actively denied outside of a "well, not yet, we don't have the numbers ;)" kind of thing.


Wow, you're the first person I've seen speak up about having similar experiences with them as me, thank you. I was a Merveilles member some years back until I had some really rude/abusive interactions in IRC from Devine and a prominent moderator. I really would love to play with uxn and varvara but gosh I simply refuse to be around people like that.


Honestly, adding your voice here is incredibly kind; and likewise, I'm so grateful to hear of another with this sort of experience.

Their design sensibilities are very good, and I feel exactly the same -- it just... doesn't sit right, feels bitter, somehow, to create things with their tools, in the full context of everything.

I've often mulled over starting up a little group sharing some of the same sensibilities but without the toxicity, to be honest.


If the single voice of just some random, well meaning guy on the internet helps: Go ahead and get going. We need more "groups", projects, efforts, initiatives, approaches, not less. Go for it.-


Thanks for writing this. It matches my experience 100%. I just signed up to comment because I know people will desperately want it to not be true but there are plenty of us ex-Merveilles folk out there who've experienced the cult element and abuse, we just don't talk about it.


Is this the right forum for accusations lacking evidence? We appear to be very reluctant about it, if it’s someone like Sam Altman, but it’s just fine for random developers?


Do you have anything I can review to see for myself? This is the first I've heard of any of this.


I have logs of some interactions stored somewhere, but they're very patchy and stored in plain text. They also contain personal interactions between server members, so I would not feel comfortable releasing them (I also lack any way to get in touch to obtain consent for releasing the logs!)

I do not have logs of direct messages because it escaped my attention -- while I planned to get them, that never happened. At the time, I was lied to and told I would be able to return, and then 3 months later I was informed I was not going to be able to return to the space. They also did not inform anyone that I was leaving, either. I had long friendships with many in that Slack instance, and not only would they not know where to find me, but none of them were informed that I had even left -- as far as any of them know, I ghosted them. There was absolutely zero transparency of moderation both at the time, and as far as I am aware, to this day.

Something I forgot to mention in the above is that at the time they had a code of conduct, and this code of conduct listed a two strike system, along with a resolution system. Neither of these were followed in any capacity (likely because they didn't exist), and there was never any communication by the moderators that I had had strikes raised against me.


I'm very surprised to read this, considering both authors of the linked article use they/them pronouns.


The thing to understand about minorities, the disabled, queer and alphabet folk is that they are human beings just like everybody else.

Ergo: some of them are actual arseholes.

Oscar Pistorius was an abusive murdering douchebag, not just a brave Paralympic gold medal winning runner.


Indeed. And even within a group that shares some core identity across one axis (e.g., queer people), the usual fraught hierarchies have a way of establishing themselves—unless you really make a point of preventing that from happening.

The ones who are wealthy will hold relative power over the ones who aren't. The ones in good health may neglect or actively exclude the ones who aren't. Racism and xenophobia rear their ugly heads. And so on.


Shieet, glad to know all it takes to be a doubleplus good person in Current Year is using an approved pronoun. Makes everything much easier.


No that's not what they mean. They mean you'd kind of assume someone who identifies as queer, or is at least knowledgeable enough on the community to participate in some ways, wouldn't be homophobic.

In practice this isn't the case, because you can use this as a shield. So for homophobic people it might be advantageous to enter the community in a way that causes the least amount of personal friction. Like, simply putting pronouns in your bio and doing literally nothing else is trivial - but the social benefit is not.

It's a big problem, because people who ARE non-binary or ARE bisexual or whatever then get a ton of backlash. Because those identities are the most common to be commandeered, so to speak. At least online.


The problem with identifying the goodness of people by their use of pronouns is that, surprise surprise, empty words good person does not make


Yes, well, that's not what anyone is doing. Here is the logic that caused surprise:

1. Leaders identify with nonbinary pronouns,

2. thus: leaders appear to be members of the/a queer community,

3. and: queer community members tend to center queer people/experiences (regardless of whether said members are shitty people for any reason),

4. yet: the leaders are specifically harming and driving out queer members of their community. This is unexpected. Not "wow, this should be impossible" unexpected, just "damn, this shouldn't have happened" unexpected.

It's quite simple and straightforward.

As an aside, (and I know I'll get downvoted for my tone, but it is what it is), for ye straight commenters: consider that your opinions on queerness and queer community dynamics probably aren't very well informed when you're entering conversations about them. (Inspired by but not personally attacking the parent comment. They might be queer too! And their statement is true, it's just off the mark in this context.)


You may be right about all that. Sad to hear, but not altogether unexpected.

I'll just point out though that most problems of the world are ones we ourselves have created.


UXN/Varvara don't do anything about relieving those pain points. WRT electricity it actually adds to the pain.


I mean it's all quite obviously a larp but it's what makes their work interesting.


They could solve these problems by not living on a sailboat.


There are solutions you want, and solutions you don't want.

Every personal problem has at least one easy solution. Better ones take more effort.


It's a bit like saying: People climbing a mountain can solve their mountain-climbing problems by not climbing mountains.

Also not unlike: It's not the destination, it's the journey.


It's a bit like saying that having to climb mountains is a problem when you choose to be a mountain climber.


Living on a sailboat approaches some very very hard life/existential pinnacles that most people never even attempt to climb.

Yeah, you can have a simple regular life; that's lower on problems maybe. But man, sailing around & futzing with interesting barefoot developers projects sure sounds challenging in a lot of very very excellent ways.


Satellite internet is expensive, let’s all move down town! Housing in the city is expensive, let’s all move to sailboats! So you see at some point you have to address difficulties with some kind of approach besides avoiding them


"We choose to make this video game and do the boat life thing, not because they are easy, but because they are hard. Because that goal will serve to organize and measure the best of our energies and skills, because that challenge is one that we are willing to accept, one we are unwilling to postpone, and one which we intend to win!"


Nice JFK you pulled there.-


Living in a boat is not hard.


Sure, if you live on a houseboat on a British river in front of the supermarket


Having spent a pandemic locked down in a boat, I beg to differ.


Spoken like someone who has never lived on a boat before.

Signed, someone who has lived on a boat before.


Spoken like someone who has never lived on a boat before.

Or whose boat came equipped with casinos, an Olympic swimming pool, Michelin-starred restaurants, and somebody else footing the bill.


Funny, I was about to say the same thing about most "modern" tech.


Exactly my feeling.-

PS. Which leads me - tangentially - to think that (maybe) the solution to (at least) some of our problems might someday be found in a cult :)

Who knows ...

> Absurd: So we ported everything over to the NES.

This was grand. The NES as a most effective "baseline" platform. Can totally see humanity sending out an NES emulator on Voyager VI as a last gasp.-


This is now my headcanon for why the UI of super advanced computers in 80’s sci-fi movies looks the way it does.


Good call :)

"8 bit 'looks' and hardware constitutes - and 'looks like' - some optimum as far as computing is concerned"

... so sufficiently advanced systems will look like it to interface with us as a sort of lingua franca.-

AGIs. Alien probes. The works. They will all look to us like a C64 or NES would :)


It's like how you can say that VT100 emulation has an expiration date, but you can't say that about the underlying concept of some UI based on a screenful of monospaced text, which is immortal.


> PS. Which leads me - tangentially - to think that (maybe) the solution to (at least) some of our problems might someday be found in a cult :)

The major religions have been beating that dead horse for a long time.


And then resurrecting it.


(I see what you did there :)


I apologize for an off-topic question, but I'm curious why you choose to write "." as ".-". Is it an internet convention I'm unaware of, or maybe punctuation from a language other than English?


No problem, thanks.-

Please, vid.:

- https://news.ycombinator.com/item?id=40989221


LessWrong had some pretty good advice in the early months of the pandemic, despite their terrible track record on politics and AI. There's a lot right with the Amish. You could write an entire book about the Rocky Horror Picture Show. Cults can have a lot to offer.


> Cults can have a lot to offer.

... in no small part perhaps because they remain isolated "pockets" of culture where - often - "progress" is slower or more controlled. Where idiosyncratic behavior becomes the "new" orthodoxy as behavior or culture "degrades".-

Where was it ... "Nightfall" (the novel) I think it was where a cult periodically saves civilization - by being the only ones that know how to handle the aftermath.-


cults are generally the only way to solve deep-rooted problems. otherwise people's habits are too strong and they keep reproducing the existing traditions that create the problems through unexamined avenues

technically varvara isn't actually the nes


> cults are generally the only way to solve deep-rooted problems.

Now that's an interesting proposition (which, I do not contend mind you ...)


The real question stands still - how can I join your cult?


Seconded.-

Make it a thousand rabbits. Make it a flotilla. Make it an armada ... :)


I think the general thesis statement is, "there are very few things we do today that couldn't have been done on older hardware".


Which, holds (?)

PS. Except for AI, perhaps ...

... I was going to add certain forms of cryptography to that, but then realized that we've always had some sort of cryptography that was "hardware-appropriate" (ie. sufficiently hard to break, to be useful) for the age. So older hardware was just fine ...


Any crypto you did couldn't be future-proof in the way it is today though. Don't know if that's mainly due to better algorithms or from the fact modern CPUs are optimized to rapidly decrypt/encrypt things.


It was algorithms. Back in the 90s there was no AES or ECC. There was RSA, and it was feasible to generate long keys, but it was impractical. Keys from back then could probably be easily factored nowadays. I think the spread of the Internet pushed demand for longer keys and better (more secure and efficient) algorithms.


Just because I was there (I agree with your general point) I wanted to say that I made my first PGP key in 1995 and it was a 4096-bit one, which is just as uncrackable now as it was then. I even remember being vaguely confused, because it gave you options, and I was thinking to myself "wut. Who wants the weaker-than-necessary key. I'll take the big one, thx"


Interesting. How long did it take to sign? Also, though I wasn't sure (which is why I didn't mention it), I thought one of the reasons keys were so short back then was due to the US classifying encryption algorithms as munitions, which made working with actually secure encryption standards difficult for developers. I would have expected the longest key would be 1024 bits, at a stretch. Even that is barely crackable today.


(I always thought the smaller keys options were there to accommodate much lower-end hardware or limited resources - ie. embedded systems ...)


Neural nets using individual tubes as nodes? Although the current trend seems to be quantizing down to a minimal amount of bits to process more in parallel, in an analogue system you could have a near "continuous" range of values.


    Chat rooms and bare bones text editors aren't supposed to be process-heavy, and yet the popular communication platform Slack requires outrageous amounts of ram and CPU to function. [...] Making software this way is costly to off-grid users, or those on slow connections, [...]
So true.


Slack had a good solution in the form of an IRC bridge but of course they killed it.


Yep. When you're small, cooperate, when you're big, kick everyone else out


Moat, then drawbridge removal.-


Embrace, Extend, Extinguish


It could be worse. One word: Urbit.

What the boat couple is doing strikes me as the most romantic sort of bricolage and just gives me the warm fuzzies all over. But Urbit just pisses me off for a variety of reasons.


I think you need to have the right mix of the absurd when you try to make something interesting.


I think they used Krita first. But UXN isn't restricted to small-res art/screens. Look at Oquonie.


The Baffler was a favorite read of mine in the early 90s.

https://thebaffler.com/


I’m glad it’s still around; I’m a happy subscriber


For me any potential technical argument and innovation is completely drowned in the needlessly pervasive anti-capitalist genderfluid digital nomad hippie talk.


Sorry? Couldn’t hear you over the unnecessarily-inserted alt-right knee-jerk anti-wokism.


I'm sorry you think me not acknowledging or caring about your made-up social minority makes me some sort of political activist.

What I care about is technology, and you have to dig quite hard to extract it here.


Fascinating that you seem to think that taking the time and energy to write a trollish shitpost about your offense at someone’s use of pronouns is somehow not acknowledging or caring about that someone’s use of pronouns.

I don’t think you’re an activist, I just think you’re yet another someone who is unable to see the Amazon forest for the chip on your shoulder.


What's the tl;dr on how this differs from SD?


tl;dr better quality even with the least powerful model and can be much faster


This is such a wild undertaking. I love it


This is the biggest point. You can simply not use them. Cook your own food or walk or drive or bike to a restaurant.

There is a massive market of lazy AF people who also are terrible at personal financial management, and they are the ones complaining.


I hear this a lot but I don't think the "lazy AF people who also are terrible at personal financial management" is really fair.

I order delivery through the apps a ton, but it's not because I'm bad at financial management. Like many here, I have a well paying job that takes up a ton of my time and energy during the week, and cooking every night is simply not feasible. Takeout is expensive, sure, but if it enables me to hold down a job that lets me afford it, it's a conscious trade-off that I can make.

Could I spend more time meal-prepping and freeze meals instead? Sure. But again, time and energy.


There's also seemingly an army of judgy AF people just salivating at the chance to call other people lazy.


Taxes are not the answer, let the free market figure it out


…which seems to be going poorly?


I'm kind of surprised they haven't already built this into the LD core

