Hacker News
Solving Entry-Level Edge AI Challenges with Nvidia Jetson Orin Nano (nvidia.com)
65 points by homarp on Sept 21, 2022 | hide | past | favorite | 24 comments


Oh wow, this is a mixed bag.

First off, NVENC is gone. Instead of 4x 1080p @ 30 (HEVC) that the Jetson Nano could do with GPU acceleration, this one can barely manage 2x 1080p @ 30. Apparently, too many people (myself included) used the Jetson Nano as a cheap camera recorder.

Next, price went from $99 for a Jetson Nano to $299 for Jetson Orin Nano.

And lastly, they now advertise 40 TOPS for Jetson Orin Nano but that's measured at 50% sparsity and for INT8... So it is in no way comparable to the 0.5 TFLOPS that a Jetson Nano has, which was measured at dense FP16. And the fact that I can't find any FP16 number for any Jetson Orin device is not a good look.

In summary, this looks to me more like a TPU than a GPU, because they moved from FP16 to INT8 precision. That makes this much more similar to the mini PCIe Coral priced at $25 delivering 4 TOPS dense. Compared to that, Jetson Orin Nano is 5x faster and 12x more expensive.
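For what it's worth, the 5x / 12x figures check out on the back of an envelope, assuming the sparse number is roughly double the dense one (which is how Nvidia's structured-sparsity speedup is marketed):

```python
# Back-of-envelope check of the comparison above. Assumes Nvidia's
# "40 TOPS" is sparse INT8, i.e. roughly 2x the dense throughput.
orin_sparse_tops = 40
orin_dense_tops = orin_sparse_tops / 2       # 50% sparsity claim
coral_dense_tops = 4                         # Coral mini PCIe, dense INT8
speedup = orin_dense_tops / coral_dense_tops
price_ratio = 299 / 25                       # Orin Nano vs. Coral
assert speedup == 5.0 and round(price_ratio) == 12
```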

EDIT: In case you're wondering why I care about INT8 vs. FP16: The Jetson Nano can autonomously collect data and then calculate gradients and fine-tune the AI model. If the Jetson Orin Nano lacks FP16, it won't be able to do gradients and learning. So going from FP16 to INT8 is a downgrade from having AI training capabilities to pure execution of pre-trained AI models.
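A toy illustration of the FP16 vs. INT8 point (pure NumPy, all numbers made up): a typical SGD weight update fits comfortably in FP16, but rounds to zero INT8 quantization steps, so an INT8-only device can execute a model but never nudge its weights.

```python
import numpy as np

# Toy demo of why INT8-only hardware can't learn: a typical SGD update
# (lr * grad) fits in FP16 but rounds to zero INT8 quantization steps.
w_fp16 = np.float16(0.5)
update = np.float16(1e-3)             # a typical small lr * grad
assert w_fp16 - update != w_fp16      # FP16: the weight actually moves

scale = 1 / 127                       # INT8 weight quantized over [-1, 1]
w_int8 = np.int8(round(0.5 / scale))  # the same weight stored as INT8
step_int8 = int(round(1e-3 / scale))  # the update in INT8 quanta...
assert step_int8 == 0                 # ...rounds to zero: no learning
```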


Orin has support for FP16, of course; it would be insane not to when INT8 isn't viable for a lot of models anyway. Preliminary benchmarks confirming this weren't hard to find.[1] The number juicing by way of using "TOPS" to mean "sparse INT8" has been going on since the beginning of the Ampere line (the generation that added sparse tensor cores), so yes, you should take it with a grain of salt. But not including FP16 support would just be a ridiculous move, frankly.

According to the table on this page[2], the lack of NVENC support comes down to the "Nano" vs. "NX" product lines they are introducing. NX started with Xavier; there was never a Xavier Nano, to my knowledge (since you could just buy the original Jetson Nano). The Xavier NX still has NVENC support, and now you can buy separate Orin NX modules with even more RAM. So it looks like they're just segmenting their products further. (Hit "Specifications", then expand the table and look at the Video Encode/Decode rows.)

So for people like us, we just want the Orin NX series. But yes, no more ultra cheap cameras.

I'm guessing they'll eventually position Orin as a successor to the original Nano, but who knows. It'll take a few price cuts to get there.

[1] https://www.edge-ai-vision.com/2022/04/is-the-new-nvidia-jet...

[2] https://www.nvidia.com/en-us/autonomous-machines/embedded-sy...


> the lack of the NVENC support is due to the "Nano" vs "NX" product lines

Translated into plain English: "we're crippling the regular product because we want to extract more cash from consumers: we didn't like how much they enjoyed the hardware encoding support. Let's make them cough up a few hundred extra by cutting it out!"


Your link [1] doesn't seem to mention the Orin Nano anywhere? It says that the bigger SOMs with DLA have FP16.

But the Orin Nano doesn't have DLA.


Out of curiosity, what use cases require training at the edge?


In my case, it is very helpful to be able to adjust a speech model a bit to better match the user's dialect. You don't train the full model; rather, you adjust only the coarsest 5% of parameters in a pre-trained model by a tiny bit. But that can lead to great quality improvements at almost no additional cost.
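A minimal NumPy sketch of that idea (hypothetical shapes, random data standing in for user recordings): the pre-trained backbone stays frozen, and a few FP16 gradient steps adapt only the small output head.

```python
import numpy as np

# Sketch of edge fine-tuning: freeze the pre-trained backbone, run a few
# FP16 SGD steps on just the output head. Shapes and data are made up.
rng = np.random.default_rng(0)
backbone = rng.standard_normal((80, 256)).astype(np.float16)    # frozen
head = (rng.standard_normal((256, 40)) * 0.05).astype(np.float16)

x = rng.standard_normal((8, 80)).astype(np.float16)   # "user audio" features
y = rng.standard_normal((8, 40)).astype(np.float16)   # adaptation targets

feats = np.maximum(x @ backbone, np.float16(0))       # frozen forward pass
before = head.copy()
for _ in range(10):                                   # tiny SGD loop, head only
    pred = feats @ head
    grad = feats.T @ (pred - y) / np.float16(len(x))  # dL/dhead for MSE loss
    head -= np.float16(1e-5) * grad

assert not np.array_equal(head, before)               # the head actually adapted
```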


I'm no expert in ML, but one use case is refining a model that was already trained elsewhere.


Additionally, privacy-preserving AI.


On the flip side, 6x A78 cores with a decent range of I/O (7 PCIe Gen3 lanes, for example) is quite nice. There aren't many comparable SoCs that individuals can actually use.


Damn, why shoot themselves in the foot for image processing tasks. Appreciate the insights!


What happened to their hardware encoder? 2x 1080p30 on the CPU, compared to the Jetson Nano's "4K @ 30 | 4x 1080p @ 30 | 9x 720p @ 30 (H.264/H.265)", is a huge step in the wrong direction, especially considering their image-processing edge-computing use case.

Some benchmarks can be found here: https://developer.ridgerun.com/wiki/index.php/NVIDIA_Jetson_...


I think that is the larger, more expensive Jetson Xavier: https://www.nvidia.com/en-au/autonomous-machines/embedded-sy...


No, the small, cheap Jetson Nano could always handle 4x 1080p30 without any problem. But maybe their wording, "1080p30 on 1-2 CPU cores", is just misleading.

The Xavier NX was 20x 1080p30.


Related: I'd like to give a shout-out to the folks who made meta-tegra[1] possible, so that I don't have to use Nvidia's garbage software, which barely resembles what you get out of a proper SBC.

[1] https://github.com/OE4T/meta-tegra


Just curious: what problems of L4T does Yocto solve, apart from the ones mentioned here[0]?

[0] https://witekio.com/blog/yocto-for-nvidia-jetson/


Quoting from this source:

> "NVIDIA does not provide source code for their CUDA libraries, but instead only provide them as Ubuntu packages for their Jetpack L4T BSP"

Also, they don't maintain these packages.

So with Yocto you get something less stale than, say, the Ubuntu 18.04 that the current Jetson Nano is stuck on.


Mender integration


Never again will I personally buy a Jetson from Nvidia. The hardware is essentially disposable, and Nvidia moves/renames/hides software packages all the time. After a year or two, you’re stuck mining forum posts for third party hacks to keep using your device.


Sounds like the GPU uses system RAM, like a TPU; I can't find any VRAM information on the page.




This board sets "the new standard for entry-level edge AI" at 30x the performance of the previous generation of "entry level". Now I can perform number plate recognition at 1000fps vs the previous 100fps. Is the previous board no longer OK for entry level tasks?


At 100fps you can do superhuman agility robotics. This is a very old video, still impressive: https://youtu.be/-KxjVlaLBmk?t=155


Good luck getting low-latency video into the board. The best a previous Jetson managed, I was told, was 40 ms. That's still 3-4x better than any alternative, but nothing to write home about.

Most of the delay happens in the ISP path (image processing such as debayer, color correction, denoise, auto exposure). Many chips these days seem to have the raw power to do those things quickly, but the chip integrators don't really care. So we get results like the RK3566, which can encode 10-bit 720p in ~7 ms, while the ISP path adds 30-60 ms of latency.
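Adding up those RK3566 figures shows how lopsided the budget is:

```python
# Glass-to-encoder latency budget from the RK3566 figures above:
# a ~7 ms encode is swamped by the 30-60 ms ISP path.
encode_ms = 7
isp_ms_min, isp_ms_max = 30, 60   # debayer, CC, denoise, auto exposure
total = (encode_ms + isp_ms_min, encode_ms + isp_ms_max)
assert total == (37, 67)          # most of the delay is not the codec
```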


Entry-level tasks of yesteryear != entry-level tasks of today.



