meindnoch's comments | Hacker News

Our CEO chiming in on a technical discussion between engineers: by the way, this is what Claude says: *some completely made-up bullshit*

I do want to counter that in the past, before AI, the CEO would just chime in with some completely off-the-wall bullshit from a consultant.

Hi CEO, thanks for the input. Next time we have a discussion, we will ask Claude instead of discussing it with whoever wrote the offending code.

No no no, please don't try to shame them publicly, because they will rewrite it in SwiftUI and it's going to be even worse!

Looks badly AI generated.

Jamie, bring up their nationalities.


>you will get the same results using a simple matmul, because euclidean distance over normalized vectors is a linear transform of the cosine distance.

Squared Euclidean distance of normalized vectors is an affine transform of their cosine similarity (the cosine of the angle between them).

  EuclideanDistance(x, y) = sqrt(dot(x - y, x - y))
                          = sqrt(dot(x, x) - 2*dot(x, y) + dot(y, y))
                          = sqrt(2 - 2*dot(x, y))    (since dot(x, x) = dot(y, y) = 1 for normalized vectors)
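
A quick numpy sanity check of that identity (an illustrative sketch, not part of the original thread): for unit-norm vectors, squared Euclidean distance equals 2 - 2 * cosine similarity.

  import numpy as np

  rng = np.random.default_rng(0)
  x, y = rng.normal(size=(2, 128))
  x /= np.linalg.norm(x)                 # normalize to unit length
  y /= np.linalg.norm(y)

  sq_euclidean = np.dot(x - y, x - y)    # squared Euclidean distance
  cosine_sim = np.dot(x, y)              # for unit vectors, dot product = cosine similarity
  assert np.isclose(sq_euclidean, 2 - 2 * cosine_sim)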

yes, you are right. I realized my mistake afterwards but it was after the edit window.

1. Create a point cloud from a scene (either via lidar, or via photogrammetry from multiple images)

2. Replace each point of the point cloud with a fuzzy ellipsoid that has a bunch of parameters for its position + size + orientation + view-dependent color (via spherical harmonics up to some low order)

3. If you render these ellipsoids using a differentiable renderer, then you can subtract the resulting image from the ground truth (i.e. your original photos), and calculate the partial derivatives of the error with respect to each of the millions of ellipsoid parameters that you fed into the renderer.

4. Now you can run gradient descent using the differentiable renderer, which makes your fuzzy ellipsoids converge to something closely reproducing the ground truth images (from multiple angles).

5. Since the ellipsoids started at the 3D point cloud's positions, the 3D structure of the scene will likely be preserved during gradient descent, thus the resulting scene will support novel camera angles with plausible-looking results.
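
To make steps 3-5 concrete, here is a toy 2D sketch of the optimization loop (a hypothetical illustration, not the actual 3D Gaussian Splatting code): each "splat" is an isotropic 2D Gaussian with parameters [x, y, sigma, brightness]. Gradients are estimated by finite differences for simplicity; real implementations compute them analytically inside the differentiable renderer.

  import numpy as np

  H = W = 32
  yy, xx = np.mgrid[0:H, 0:W].astype(float)

  def render(params):
      """params: (N, 4) array of [x, y, sigma, brightness] -> (H, W) image."""
      img = np.zeros((H, W))
      for px, py, sigma, b in params:
          img += b * np.exp(-((xx - px) ** 2 + (yy - py) ** 2) / (2.0 * sigma ** 2))
      return img

  def error(params, target):
      diff = render(params) - target          # subtract render from ground truth
      return np.sum(diff ** 2)                # total squared error

  # "Ground truth" photos: an image produced by splats we pretend not to know.
  true_params = np.array([[10.0, 12.0, 3.0, 1.0],
                          [22.0, 20.0, 4.0, 0.8]])
  target = render(true_params)

  # Initialize near the truth (this plays the role of the point-cloud init, step 5).
  rng = np.random.default_rng(0)
  noise = rng.normal(size=true_params.shape) * np.array([2.0, 2.0, 0.5, 0.2])
  params = true_params + noise
  initial_error = error(params, target)

  lr, eps = 1e-3, 1e-4
  for step in range(400):
      base = error(params, target)
      grad = np.zeros_like(params)
      for i in range(params.shape[0]):        # partial derivative of the error
          for j in range(params.shape[1]):    # w.r.t. every parameter of every splat
              bumped = params.copy()
              bumped[i, j] += eps
              grad[i, j] = (error(bumped, target) - base) / eps
      params -= lr * grad                     # gradient descent step (step 4)

  print("error before/after optimization:", initial_error, error(params, target))

The real pipeline's ellipsoids additionally carry orientation and view-dependent color, and the renderer blends millions of them per image, but the optimization loop has the same shape.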


You... you must have been quite some 5 year old.

ELI5 has meant friendly simplified explanations (not responses aimed at literal five-year-olds) since forever, at least on the subreddit where the concept originated.

Now, perhaps referring to differentiability isn't layperson-accessible, but this is HN after all. I found it to be the perfect degree of simplification personally.


Some things would be literally impossible to properly explain to a 5-year-old.

If one actually tried to explain to a five-year-old, they could use things like analogy, simile, metaphor, and other forms of rhetoric. This was just a straight-up technical explanation.

Lol. Def not for 5 year olds but it's about exactly what I needed

How about this:

Take a lot of pictures of a scene from different angles, do some crazy math, and then you can later pretend to zoom and pan the camera around however you want


Sure, but does that explanation really help anyone? IMO it might scare people off actually diving into things; the math isn't too crazy.

Saying "math" (even using it in a dismissive TL;DR) is immensely helpful. Specifically, I've never encountered these terms before:

- point cloud
- fuzzy ellipsoid
- view-dependent color
- spherical harmonics
- low order
- differentiable renderer (what makes it differentiable? A renderer creates images, right?)
- subtract the resulting image from the ground truth (good to know this means your original photos, but how do you subtract images from images?)
- millions of ellipsoid parameters (the explanation previously mentioned 4 parameters by name. Where are the millions coming from?)
- gradient descent (I've heard of this in AI, but usually ignore it because I haven't gotten deep enough into it to need to understand what it means)
- 3D point cloud's positions (are all point clouds 3D? The point cloud mentioned earlier wasn't. Or was it? Is this the same point cloud?)

In other words, you've explained this at far too high a level for me. Given that the request was for ELI5, I expected an explanation that I could actually follow, without knowing any specific terminology. Do disregard specifics and call it math. Don't just call it math and skip past it entirely: call it math and explain what you're actually doing with the math, rather than trying to explain the math you're doing; same for all the other words. If a technical term is only needed once in a conversation, then don't use it.

Given that I actually do know what photogrammetry is at a basic level, I can make a best-effort translation here, but it's purely from 100% guessing rather than actually understanding:

1. Create a 3d scan of a real-life scene or object. It uses radar (intentionally incorrect term, more familiar) or multiple photographs at different angles to see the 3 dimensional shape.

2. For some reason, break up the shapes into smaller shapes.

This is where my understanding goes to nearly 0:

3-5: somehow, looking at the difference between a rendering of your 3d scene and a picture of the actual scene allows you to correct the errors in the 3d scene to make it more realistic. Using complex math works better and having the computer do it is less effort than manually correcting the models in your 3d scene.


Anybody sufficiently interested would press further, not back away.

Thanks.

How hard is it to handle cases where the starting positions of the ellipsoids in 3D are not correct (too far off)? How common is such a scenario with the state of the art? E.g., if you have only a stereoscopic image pair, the correspondences are often not accurate.

Thanks.


I assume that the differentiable renderer is only given its position and viewing angle at any one time (in order to be able to generalize to new viewing angles)?

Is it a fully connected NN?


No. There are no neural networks here. The renderer is just a function that takes a bunch of ellipsoid parameters and outputs a bunch of pixels. You render the scene, then subtract the ground truth pixels from the result, and sum the squared differences to get the total error. Then you ask the question "how would the error change if the X position of ellipsoid #1 was changed slightly?" (then repeat for all ellipsoid parameters, not just the X position, and all ellipsoids, not just ellipsoid #1). In other words, compute the partial derivative of the error with respect to each ellipsoid parameter. This gives you a gradient that you can use to adjust the ellipsoids to decrease the error (i.e. get closer to the ground truth image).
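
For anyone who wants to see the "subtract, sum the squared differences, then nudge one parameter" step in isolation, here is a minimal one-parameter sketch (illustrative only; the real renderer computes these derivatives analytically for millions of parameters at once):

  import numpy as np

  xs = np.arange(16, dtype=float)

  def render(center):
      """A one-row 'renderer': a single blob parameterized by its center."""
      return np.exp(-(xs - center) ** 2 / 8.0)

  ground_truth = render(9.0)     # pretend this came from a photo
  center = 7.0                   # our current (wrong) estimate

  def error(c):
      diff = render(c) - ground_truth        # subtract render from ground truth
      return np.sum(diff ** 2)               # sum of squared differences

  eps = 1e-5
  d_error_d_center = (error(center + eps) - error(center)) / eps
  print(d_error_d_center)        # negative: nudging the center toward 9 lowers the error

Flip the sign of that derivative, scale it by a small learning rate, and you have one gradient descent step.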

Great explanation/simplification. Top quality contribution.

And what about the "mature enough" part? How has it changed / progressed recently?

The field is advancing rapidly; new research papers have been coming out daily for a few years now. The best news feed I've found on the topic is:

https://radiancefields.com/

https://x.com/RadianceFields alt: https://xcancel.com/RadianceFields


Thanks for the explanation!

Or: Matrix bullet time with more viewpoints and less quality.


And especially the ulimit command mentioned, which is mostly unknown to folks nowadays, it seems.

Lol, no. When you buy "barrels" of oil on a commodity market, the barrel is a unit of volume (42 US gallons).

>4,000 tons is almost four million kilograms

It is exactly four million kilograms. (Germany uses the SI metric ton)


TIL there are two units of measurement that are both called "ton" but, confusingly, are not the same as a (metric) ton. One is a tiny bit more than a metric ton (about 1.016 metric tons) and one is a bit less (about 0.907 metric tons). Apparently people use the prefixes long and short to differentiate them; at least that part is intuitive.

Well, three: the two you mentioned and the metric ton (1000 kg).
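
For concreteness, a tiny sketch of the three units in kilograms (the only constant assumed here is the standard 0.45359237 kg per pound):

  # The three units that get called a "ton", expressed in kilograms.
  TON_KG = {
      "metric tons (tonnes)": 1000.0,          # 1000 kg by definition
      "long tons (UK)": 2240 * 0.45359237,     # 2240 lb ~= 1016.05 kg
      "short tons (US)": 2000 * 0.45359237,    # 2000 lb ~=  907.18 kg
  }
  for name, kg in TON_KG.items():
      print(f"4,000 {name} = {4000 * kg:,.0f} kg")
  # 4,000 metric tons = exactly 4,000,000 kg, as stated above.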
