
No, I'm telling you, you're assuming way too much. The problems are lower-level.

> stereoscopic cameras that map to a 3d world

The current state of the art for this is completely atrocious.

Take a look at this very recent research: https://makezur.github.io/SuperPrimitive/

The idea that robots can "understand" the 3D world from vision is, right now, completely illusory.



I basically agree; I don't think you understood my comment.

If you look at the transformer in an LLM, you have an input matrix, some math in the middle (mostly linear, apart from the softmax and activations), and an output matrix. If you take a single value of the output matrix and write out the algebraic expression for it, you get something that looks like a linear-layer transformation of the input, once the attention weights are held fixed.
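A quick numpy sketch of that point: with the softmax attention weights held fixed, each output row of a single attention head is a linear function of the input (the softmax is where the nonlinearity lives). Shapes and values here are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 4, 8                               # sequence length, model width
X = rng.standard_normal((n, d))           # input matrix
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

scores = (X @ Wq) @ (X @ Wk).T / np.sqrt(d)
A = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)  # softmax: the nonlinear part
out = A @ (X @ Wv)                        # one attention head, no masking

# Holding A fixed, output row i is just a linear map of the input X:
row0 = (A[0] @ X) @ Wv
print(np.allclose(out[0], row0))          # → True
```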

So a transformer is essentially a more efficient simplification of n fully connected layers, and thus faster to train. But it's not applicable to everything.

For the following examples, let's say you hypothetically had cheap power with good infrastructure to deliver it, A100s that cost a dollar each, and the same budget as OpenAI.

First, you could train GPT models as just a shitload of fully connected, massively wide deep layers.
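As a back-of-envelope on what "massively wide" means there, parameters in a plain fully connected stack grow as depth × width². The width and depth below are made-up numbers, purely for scale:

```python
# Back-of-envelope: parameters in a stack of fully connected layers.
# Width and depth are illustrative, not a real model config.
width, depth = 16384, 48
params_per_layer = width * width + width   # weight matrix + bias
total = depth * params_per_layer
print(f"~{total / 1e9:.1f}B parameters")   # → ~12.9B
```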

Second, you could also do 3D mapping quite easily with fully connected deep layers.

First you would train an Observer model to take images from two cameras and reconstruct a virtual 3D scene with an autoencoder/decoder, probably trained against photorealistic images generated with raytracing.
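The stereo half of that Observer reduces, at its core, to triangulation. A toy sketch with made-up focal length and baseline, using the standard rectified-stereo relation Z = f·B/d:

```python
import numpy as np

# Toy stereo triangulation: depth from disparity, Z = f * B / d.
# Focal length f (pixels) and baseline B (meters) are made-up values.
f, B = 700.0, 0.12

def depth_from_disparity(disparity_px):
    """Depth in meters for each pixel's left/right disparity."""
    d = np.asarray(disparity_px, dtype=float)
    return np.where(d > 0, f * B / np.maximum(d, 1e-6), np.inf)

# A point 2 m away shows f * B / Z = 42 px of disparity.
print(depth_from_disparity([42.0]))  # → [2.]
```

A learned Observer would have to discover something equivalent to this geometry from pixels, which is part of why the problem is harder than it looks.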

Then you would train a Predictor model to predict the physics in that 3D scene given a set of historical frames. Since compute is so cheap, you just RNG-initialize the initial conditions with velocities and accelerations, and run training until the huge model converges.
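A minimal sketch of that data generator, under the simplest possible physics (constant acceleration). The function name, frame rate, and ranges are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_physics_batch(n, dt=1 / 30, steps=8):
    """Synthetic training pairs for a Predictor: random initial positions,
    velocities, and accelerations, rolled forward under constant
    acceleration. Returns (history frames, next-frame target)."""
    p0 = rng.uniform(-1.0, 1.0, (n, 3))
    v0 = rng.uniform(-0.5, 0.5, (n, 3))
    a = rng.uniform(-0.1, 0.1, (n, 3))
    t = dt * np.arange(steps + 1)[:, None, None]   # (steps+1, 1, 1)
    traj = p0 + v0 * t + 0.5 * a * t**2            # (steps+1, n, 3)
    return traj[:-1], traj[-1]                     # past frames → next frame

hist, tgt = make_physics_batch(4)
print(hist.shape, tgt.shape)  # → (8, 4, 3) (4, 3)
```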

Then you would train a Controller model to move a robotic arm, with the start and final orientations as input and the motion as output.
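A trivial stand-in for what such a Controller would output — here just linear interpolation between the two orientations, where a trained model would produce a learned trajectory:

```python
import numpy as np

def plan_motion(start, final, steps=10):
    """Stand-in Controller: emit a motion as waypoints linearly
    interpolated between the start and final orientations."""
    t = np.linspace(0.0, 1.0, steps)[:, None]
    return (1 - t) * start + t * final

path = plan_motion(np.zeros(3), np.array([0.0, 1.57, 0.0]))
print(path.shape)  # → (10, 3)
```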

Then hook them all together. On every cycle of the robot controller, the Controller sends commands to move along a path, the robot moves, the Observer computes the 3D scene, and the history of scenes is fed to the Predictor, which generates the future position; that prediction has some error, and the Controller adjusts accordingly.
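That cycle, sketched with all three models replaced by trivial stand-ins (sensor noise for the Observer, constant-velocity extrapolation for the Predictor, a proportional step for the Controller):

```python
import numpy as np

rng = np.random.default_rng(0)

def observer(true_pos):                  # stand-in for the 3D-reconstruction model
    return true_pos + rng.normal(0, 0.01, 3)

def predictor(history):                  # stand-in: constant-velocity extrapolation
    return history[-1] + (history[-1] - history[-2])

def controller(pos, goal, gain=0.2):     # stand-in: proportional step toward goal
    return gain * (goal - pos)

goal = np.array([1.0, 0.5, -0.2])
pos = np.zeros(3)
history = [observer(pos), observer(pos)]

for _ in range(50):
    pos = pos + controller(pos, goal)    # Controller commands, robot moves
    scene = observer(pos)                # Observer maps cameras → 3D scene
    history.append(scene)
    error = scene - predictor(history[:-1])  # deviation from Predictor's forecast
    # a real Controller would adjust from `error`; the stand-in just loops

print(np.linalg.norm(goal - pos) < 0.05)  # → True (converges near the goal)
```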

My point is, until we reach that point with power and hardware, there have to be simplification discoveries like the transformer made along the way. One of those is how to one-shot or few-shot adjust parameters for a set of new data. If we can do that, we can basically fine-tune shitty models on specific data quite fast and make them behave well on a very limited data set.



