Hacker News
Solving Entry-Level Edge AI Challenges with Nvidia Jetson Orin Nano (nvidia.com)
65 points by homarp on Sept 21, 2022 | hide | past | favorite | 24 comments


Oh wow, this is a mixed bag.

First off, NVENC is gone. Instead of 4x 1080p @ 30 (HEVC) that the Jetson Nano could do with GPU acceleration, this one can barely manage 2x 1080p @ 30. Apparently, too many people (myself included) used the Jetson Nano as a cheap camera recorder.

Next, price went from $99 for a Jetson Nano to $299 for Jetson Orin Nano.

And lastly, they now advertise 40 TOPS for Jetson Orin Nano but that's measured at 50% sparsity and for INT8... So it is in no way comparable to the 0.5 TFLOPS that a Jetson Nano has, which was measured at dense FP16. And the fact that I can't find any FP16 number for any Jetson Orin device is not a good look.

In summary, this looks to me more like a TPU than a GPU, because they moved from FP16 to INT8 precision. That makes this much more similar to the mini PCIe Coral priced at $25 delivering 4 TOPS dense. Compared to that, Jetson Orin Nano is 5x faster and 12x more expensive.
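For what it's worth, the 5x / 12x figures check out on the back of an envelope, assuming the sparse number is roughly double the dense one (which is how Nvidia's structured-sparsity speedup is marketed):

```python
# Back-of-envelope check of the comparison above. Assumes Nvidia's
# "40 TOPS" is sparse INT8, i.e. roughly 2x the dense throughput.
orin_sparse_tops = 40
orin_dense_tops = orin_sparse_tops / 2       # 50% sparsity claim
coral_dense_tops = 4                         # Coral mini PCIe, dense INT8
speedup = orin_dense_tops / coral_dense_tops
price_ratio = 299 / 25                       # Orin Nano vs. Coral
assert speedup == 5.0 and round(price_ratio) == 12
```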

EDIT: In case you're wondering why I care about INT8 vs. FP16: The Jetson Nano can autonomously collect data and then calculate gradients and fine-tune the AI model. If the Jetson Orin Nano lacks FP16, it won't be able to do gradients and learning. So going from FP16 to INT8 is a downgrade from having AI training capabilities to pure execution of pre-trained AI models.
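A toy illustration of the FP16 vs. INT8 point (pure NumPy, all numbers made up): a typical SGD weight update fits comfortably in FP16, but rounds to zero INT8 quantization steps, so an INT8-only device can execute a model but never nudge its weights.

```python
import numpy as np

# Toy demo of why INT8-only hardware can't learn: a typical SGD update
# (lr * grad) fits in FP16 but rounds to zero INT8 quantization steps.
w_fp16 = np.float16(0.5)
update = np.float16(1e-3)             # a typical small lr * grad
assert w_fp16 - update != w_fp16      # FP16: the weight actually moves

scale = 1 / 127                       # INT8 weight quantized over [-1, 1]
w_int8 = np.int8(round(0.5 / scale))  # the same weight stored as INT8
step_int8 = int(round(1e-3 / scale))  # the update in INT8 quanta...
assert step_int8 == 0                 # ...rounds to zero: no learning
```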


Orin has support for FP16, of course; it would be insane not to when INT8 isn't viable for a lot of models anyway. Preliminary benchmarks confirming this weren't hard to find.[1] The number juicing by way of using "TOPS" to mean "sparse INT8" has been going on since the beginning of the Ampere line (the generation that added sparse tensor cores), so yes, you should take it with a grain of salt. But not including FP16 support would just be a ridiculous move, frankly.

According to the table on this page[2], the lack of NVENC support comes down to the "Nano" vs. "NX" product lines they are introducing. NX started with Xavier; there was never a Xavier Nano, to my knowledge (since you could just buy the original Jetson Nano). The Xavier NX still has NVENC support, and now you can buy separate Orin NX modules with even more RAM. So it looks like they're just segmenting their products further. (Hit "Specifications", then expand the table and look at the Video Encode/Decode rows.)

So for people like us, we just want the Orin NX series. But yes, no more ultra cheap cameras.

I'm guessing they'll eventually position Orin as a successor to the original Nano, but who knows. It'll take a few price cuts to get there.

[1] https://www.edge-ai-vision.com/2022/04/is-the-new-nvidia-jet...

[2] https://www.nvidia.com/en-us/autonomous-machines/embedded-sy...


> the lack of the NVENC support is due to the "Nano" vs "NX" product lines

Translated into plain English: "we're crippling the regular product because we want to extract more cash from consumers: we didn't like how much they enjoyed the hardware encoding support. Let's make them cough up a few hundred extra by cutting it out!"


Your link [1] doesn't seem to mention the Orin Nano anywhere? It says that the bigger SOMs with DLA have FP16.

But the Orin Nano doesn't have DLA.


Out of curiosity, what use cases require training at the edge?


In my case, it is very helpful to be able to adjust a speech model a bit to better match the user's dialect. You don't train the full model; rather, you adjust only the coarsest 5% of parameters in a pre-trained model by a tiny bit. But that can lead to great quality improvements at almost no additional cost.
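A minimal NumPy sketch of that idea (hypothetical shapes, random data standing in for user recordings): the pre-trained backbone stays frozen, and a few FP16 gradient steps adapt only the small output head.

```python
import numpy as np

# Sketch of edge fine-tuning: freeze the pre-trained backbone, run a few
# FP16 SGD steps on just the output head. Shapes and data are made up.
rng = np.random.default_rng(0)
backbone = rng.standard_normal((80, 256)).astype(np.float16)    # frozen
head = (rng.standard_normal((256, 40)) * 0.05).astype(np.float16)

x = rng.standard_normal((8, 80)).astype(np.float16)   # "user audio" features
y = rng.standard_normal((8, 40)).astype(np.float16)   # adaptation targets

feats = np.maximum(x @ backbone, np.float16(0))       # frozen forward pass
before = head.copy()
for _ in range(10):                                   # tiny SGD loop, head only
    pred = feats @ head
    grad = feats.T @ (pred - y) / np.float16(len(x))  # dL/dhead for MSE loss
    head -= np.float16(1e-5) * grad

assert not np.array_equal(head, before)               # the head actually adapted
```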


I'm no expert in ML, but one use case is refining a model that was already trained elsewhere.


Additionally, privacy-preserving AI.


On the flip side, 6x A78 cores with a decent range of I/O (7 PCIe Gen3 lanes, for example) is quite nice. There aren't many comparable SoCs that individuals can actually use.


Damn, why shoot themselves in the foot for image processing tasks. Appreciate the insights!


What happened to their hardware encoder? 2x 1080p30 on the CPU, compared to the Jetson Nano's "4K @ 30 | 4x 1080p @ 30 | 9x 720p @ 30 (H.264/H.265)", is a huge step in the wrong direction, especially considering their image-processing edge-computing use case.

Some benchmarks can be found here: https://developer.ridgerun.com/wiki/index.php/NVIDIA_Jetson_...


I think that is the larger, more expensive Jetson Xavier: https://www.nvidia.com/en-au/autonomous-machines/embedded-sy...


No, the small, cheap Jetson Nano could always handle 4x 1080p30 without any problem. But maybe their wording, "1080p30 on 1-2 CPU cores", is just misleading.

The Xavier NX was 20x 1080p30.


Related: I'd like to give a shout-out to the folks who made meta-tegra[1] possible, so that I don't have to use Nvidia's garbage software, which barely resembles what you get out of a proper SBC.

[1] https://github.com/OE4T/meta-tegra


Just curious: what problems of L4T does Yocto solve, apart from the ones mentioned here[0]?

[0] https://witekio.com/blog/yocto-for-nvidia-jetson/


Quoting from this source:

> "NVIDIA does not provide source code for their CUDA libraries, but instead only provide them as Ubuntu packages for their Jetpack L4T BSP"

Also, they don't maintain these packages.

So with Yocto you get something less stale than, say, the Ubuntu 18.04 that the current Jetson Nano is stuck on.


Mender integration


Never again will I personally buy a Jetson from Nvidia. The hardware is essentially disposable, and Nvidia moves/renames/hides software packages all the time. After a year or two, you’re stuck mining forum posts for third party hacks to keep using your device.


Sounds like the GPU uses system RAM, like a TPU; I can't find any VRAM information on the page.




This board sets "the new standard for entry-level edge AI" at 30x the performance of the previous generation of "entry level". Now I can perform number plate recognition at 1000fps vs the previous 100fps. Is the previous board no longer OK for entry level tasks?


At 100fps you can do superhuman agility robotics. This is a very old video, still impressive: https://youtu.be/-KxjVlaLBmk?t=155


Good luck getting low-latency video into the board. The best a previous Jetson managed, I was told, was 40 ms. That's still 3-4x better than any alternative, but nothing to write home about.

Most of the delay happens in the ISP path (image processing such as debayer, color correction, denoise, auto exposure). Many chips these days seem to have the raw power to do those things quickly, but the chip integrators don't really care. So we get results like the RK3566, which can encode 10-bit 720p in ~7 ms, while the ISP path adds 30-60 ms of latency.
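Adding up those RK3566 figures shows how lopsided the budget is:

```python
# Glass-to-encoder latency budget from the RK3566 figures above:
# a ~7 ms encode is swamped by the 30-60 ms ISP path.
encode_ms = 7
isp_ms_min, isp_ms_max = 30, 60   # debayer, CC, denoise, auto exposure
total = (encode_ms + isp_ms_min, encode_ms + isp_ms_max)
assert total == (37, 67)          # most of the delay is not the codec
```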


Entry-level tasks of yesteryear != entry-level tasks of today.



