
> Better training, larger models, and larger datasets, will lead to models that

Hypothetically, with enough information, one could predict the future (barring truly random events like radioactive decay). Generative AI is also constrained by economic forces - how much are GenAI companies willing to invest to get eyeball reflections right? Would they earn adequate revenue to cover the increase in costs and justify that feature? There are plenty of things that humanity can technically achieve that don't get done because the incentives are not aligned. For instance, there is enough food grown to feed every human on Earth and the technology to transport it, and yet we have hunger, malnutrition, and famines.



> how much are GenAI companies willing to invest to get eyeball reflections right

This isn't how it works. As the models are improved, they learn more about reality largely on their own. Except for glaringly obvious problems (like hands, deformed limbs, etc) the improvements are really just giving the models techniques for more accurately replicating features from reasoning data. There's nobody that's like "today we're working on fingernails" or "today we're making hair physics work better": it's about making the model understand and replicate the features already present in the training dataset.


No, it’s a valid point, which I didn’t interpret as literally “we’re working on eyeballs today” but rather “we’re scaling up these imperfect methods to a trillion dollar GPU cluster”, the latter of which is genuinely something people talk about. The models will learn to mimic more and more of the long tail of the distribution of training data, which to us looks like an emergent understanding. So there’s a theoretical amount of data you could provide for them to memorize physical laws.

The issue is practical. There isn’t enough data out there to learn the long tail. If neural nets genuinely understood the world they would be getting 100% on ARC.


You don't need a trillion-dollar GPU cluster to accomplish this stuff. Models from two years ago look incredibly different from the ones today, not just because they're bigger but because they're more sophisticated. And the data from two years ago looks a lot like the data today: messy and often poorly annotated.

Even if we added no new compute and capped the resources used for inference and training, models would produce higher fidelity results over time due to improvements in the architecture.


I don't know where you get this opinion from as it doesn't match the landscape that I'm witnessing. Around the huge names are many companies and business units fine-tuning foundational models on their private datasets for the specific domains they are interested in. I can think of scenarios where someone is interested in training models to generate images with accurate reflections in specific settings.


> I don't know where you get this opinion from as it doesn't match the landscape that I'm witnessing.

I'm an engineer at an AI company that makes text/image to video models, but go off


I may be in agreement, and I was an idiot to misunderstand your comment and reply based on it.

I especially agree with the last sentence that models largely learn features in the dataset, but I don't understand why you would describe it as

> There's nobody that's like "today we're working on fingernails" or "today we're making hair physics work better"

If there were a business case for that, I would characterize curating a dataset of fingernails and fine-tuning or augmenting a model based on that as "today we're working on fingernails".

And the same goes for eye reflections. So with the right dataset you can get eye reflections right, albeit in a limited domain (e.g. deepfakes in a setting similar to the training data). In fact, you can look at the community that sprang up around SD 1.5 (?), which fine-tunes SD with relevant datasets to improve its abilities in exactly a "today we're going to improve its ability to produce these faces" kind of fashion.
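To make that concrete, here is a minimal sketch of what such a feature-targeted fine-tune could look like: continuing to train the SD 1.5 UNet on a small, curated folder of close-up eye photos so that it renders reflections well in that narrow domain. This is an illustration under assumptions rather than anyone's actual pipeline: it assumes the Hugging Face diffusers/transformers/torchvision libraries, and the folder path, caption, and hyperparameters are hypothetical placeholders.

    import os
    import torch
    import torch.nn.functional as F
    from PIL import Image
    from torch.utils.data import Dataset, DataLoader
    from torchvision import transforms
    from diffusers import AutoencoderKL, UNet2DConditionModel, DDPMScheduler
    from transformers import CLIPTextModel, CLIPTokenizer

    MODEL_ID = "runwayml/stable-diffusion-v1-5"   # SD 1.5, as mentioned above
    DATA_DIR = "./curated_eye_closeups"           # hypothetical curated dataset
    CAPTION = "close-up photo of a human eye with window reflections"  # illustrative
    device = "cuda"

    class EyeDataset(Dataset):
        """A folder of curated images, all sharing one caption for simplicity."""
        def __init__(self, root):
            self.paths = [os.path.join(root, f) for f in sorted(os.listdir(root))]
            self.tf = transforms.Compose([
                transforms.Resize(512),
                transforms.CenterCrop(512),
                transforms.ToTensor(),
                transforms.Normalize([0.5], [0.5]),   # scale pixels to [-1, 1]
            ])
        def __len__(self):
            return len(self.paths)
        def __getitem__(self, i):
            return self.tf(Image.open(self.paths[i]).convert("RGB"))

    tokenizer = CLIPTokenizer.from_pretrained(MODEL_ID, subfolder="tokenizer")
    text_encoder = CLIPTextModel.from_pretrained(MODEL_ID, subfolder="text_encoder").to(device)
    vae = AutoencoderKL.from_pretrained(MODEL_ID, subfolder="vae").to(device)
    unet = UNet2DConditionModel.from_pretrained(MODEL_ID, subfolder="unet").to(device)
    scheduler = DDPMScheduler.from_pretrained(MODEL_ID, subfolder="scheduler")

    vae.requires_grad_(False)            # only the UNet is updated in this sketch
    text_encoder.requires_grad_(False)
    optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-5)

    # Pre-compute the text conditioning once, since every image shares the caption.
    tokens = tokenizer(CAPTION, padding="max_length", truncation=True,
                       max_length=tokenizer.model_max_length,
                       return_tensors="pt").input_ids.to(device)
    with torch.no_grad():
        cond = text_encoder(tokens)[0]

    loader = DataLoader(EyeDataset(DATA_DIR), batch_size=4, shuffle=True)
    for epoch in range(10):
        for pixels in loader:
            # Standard denoising objective: predict the noise added to the latents.
            latents = vae.encode(pixels.to(device)).latent_dist.sample() * vae.config.scaling_factor
            noise = torch.randn_like(latents)
            t = torch.randint(0, scheduler.config.num_train_timesteps,
                              (latents.shape[0],), device=device)
            noisy = scheduler.add_noise(latents, noise, t)
            pred = unet(noisy, t,
                        encoder_hidden_states=cond.expand(latents.shape[0], -1, -1)).sample
            loss = F.mse_loss(pred, noise)
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()

    unet.save_pretrained("./sd15-eye-finetune")   # swap back into a pipeline for inference

Notably, nothing in this sketch is "code for eyeballs": the only thing that makes it about eye reflections is the curated dataset, which is consistent with the point above that models learn whatever features are present in the training data.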

Where did I misunderstand your comment? I seem to arrive at the completely opposite response from the same fact.

I also noticed that you say

> the improvements are really just giving the models techniques for more accurately replicating features from reasoning data.

You seem to refer to aspects of a model unrelated to dataset quality. But fine-tuning on a curated dataset may be both necessary and sufficient for improving eye reflections and fingernails.


There could be edge cases, but fine tuning doesn't normally concentrate on a single specific feature. With positive and negative examples you could definitely train the eyes, but that's not what people usually do. Fine tuning is widely used to impart a specific style, clothing, or other larger-scale elements.


> There could be edge cases, but fine tuning doesn't normally concentrate on a single specific feature.

"Normally" is being strained here: yes, most fine-tuning isn't for things like this, but quite a substantial minority is for more accurate rendering of some narrow, specific feature, especially ones typically identified as signature problems of AI image gen. Publicly identifying this as a way to visually distinguish AI gens makes it more likely that fine-tuning effort will be directed at addressing it.


> This isn't how it works. As the models are improved, they learn more about reality largely on their own.

AI models aren't complete black boxes to the people who develop them: there is careful thought behind the architecture, dataset selection, and model evaluation. Assuming that taking an existing model and simply throwing more compute at it will automatically result in higher-fidelity illumination modeling takes almost religious levels of faith. If moar hardware is all you need, Nvidia would have the best models in every category right now. Perhaps someone ought to write the sequel to Fred Brooks' book and name it "The Mythical GPU-Cluster-Month".

FWIW, Google has AI-based illumination adjustment in Google Photos where one can add virtual lights - so specialized models for lighting already exist. However, I'm very cynical about a generic mixed model incidentally gaining those capabilities without specific training for it. When dealing with exponential requirements (training data, training time, GPUs, model weight size), you'll run out of resources in short order.


What you're refuting isn't what I said. I'm making the point that nobody is encoding all of the individual features of the human form and reality into their models through code or model design. You build a model by making it capable of observing details and then letting it observe the details of your training data. Nobody is spending time getting the reflections in the eyeballs working well, that comes as an emergent property of a model that's able to identify and replicate that. That doesn't mean it's a black box, it means that it's built in a general way so the researchers don't need to care about every facet of reality.


> If moar hardware is all you need, Nvidia would have the best models in every category right now.

Nvidia is making boatloads of money right now selling GPUs to companies that think they will be making boatloads of money in the future.

Nvidia has the better end of things at this very moment in time.


> Assuming that taking an existing model and simply throwing more compute at it will automatically result in higher-fidelity illumination modeling takes almost religious levels of faith.

Seems an odd response to a poster who said “as the models are improved...”; the way the models are improved isn't just additional training of existing models, it's updated model architectures.


Getting the eyeballs correct will correlate with other very useful improvements.

They won’t train a better model just for that reason. It will just happen along the way as they seek to broadly improve performance and usefulness.


I’m far from an expert on this, but these are often trained in conjunction with a model that recognizes deep fakes. Improving one will improve the other, and it’s an infinite recursion.


Popper disagrees

https://en.wikipedia.org/wiki/The_Poverty_of_Historicism

"Individual human action or reaction can never be predicted with certainty, therefore neither can the future"

See the death of Archduke Franz Ferdinand - perhaps it could have been predicted once it was known that he would go to Sarajevo. But before that?

If you look at SciFi, some things have been predicted, but many *obvious* things haven't.

What if Trump had been killed?

And Kennedy?


I could see state actors being willing to invest in order to make better propaganda or counterintelligence.


Yeah, every person is constantly predicting the future, often even scarily accurately. I don't see how this is a hot take at all.


> how much are GenAI companies willing to invest to get eyeball reflections right?

Willing to? Probably not much. Should? A WHOLE LOT. It is the whole enchilada.

While this might not seem like a big issue, and truthfully most people don't notice, getting this right (consistently) requires getting a lot more right. It doesn't require the model knowing physics (because every training sample face will have realistic lighting). But what underlies this issue is the model understanding subtleties. No model to date accomplishes this, from image generators to language generators (LLMs). There is a Pareto efficiency issue here too. Remember that it is orders of magnitude easier to get a model to be "80% correct" than to be "90% correct".

But recall that the devil is in the details. We live in a complex world, and that means the subtleties matter. The world is (mathematically) chaotic, so small things have big effects. You should start solving problems without worrying about these, but eventually you need to move on to tackling them. If you don't, you'll just generate enshittification. In fact, I'd argue that the difference between an amateur and an expert is knowledge of subtleties and nuance. This is both why amateurs can trick themselves into thinking they're more expert than they are, and why experts can recognize when they're talking to other experts (I remember a thread a while ago where many people were shocked that most industries don't give tests or whiteboard problems when interviewing candidates, and that hiring managers can still identify good hires from bad ones).



