
My two cents as a kernel developer: the driver is pretty abominable compared to the code quality of most of the rest of the kernel.

However, having a GPU driver not just be open source but be in the upstream Linux kernel is a gigantic deal. Kernel development takes a long time, we have millions of lines in the amdgpu driver, and if every one of those had gone through the lengthy review process it would never have made it into the tree.

So it's a necessary evil. I do wish they would clean it up, though. I once sent a fix to amdgpu that applied the same change to three different files that were largely duplicated. That kind of thing wouldn't fly anywhere else in the kernel.



I would also mention that GPUs are a GIANT abstraction, and since they are rev'd faster than arguably any other hardware in a system, there are abstractions layered on top of that for the families and models of GPUs too.

Another way of looking at it is -- I started playing with openwrt for a relatively small router, with 5 ports plus wifi.

I was amazed at not only the amount of openwrt code required to support the different router families and the different router models, but at the sheer amount of stuff turned on by default in the kernel just in case I might need to load a module for some obscure feature or package. I assume the same goes for a gpu driver both at the source level and in the kernel.


Yeah, AMD/NV typically do 2-3 chips per year. That complexity adds up pretty quickly when backwards/forwards compatibility is fairly strict and the inputs/abstractions are not particularly well defined or behaved.


I'm wondering at which point it would make sense to split the driver into multiple device family drivers instead of lumping it all together into a mess of unmaintainable abstractions.


At the point where people are making jokes about the Linux kernel making up X% of the GPU driver.


The opposite. The problem is the lack of abstractions: mere industrial copy-pasta code.


I’m not saying it’s true here but duplication over the wrong abstraction is always better. Seems to me if each graphics card is different enough there’s probably lots to duplicate.


In software development absolutes are always dangerous.


Apart from absolutes about absolutes.


Nonsense. This duplication is not maintainable and needs massive amounts of memory. Proper abstraction adds a few if else in the data, and is about 20x smaller. You can even read and understand that, e.g. what changed with this HW upgrade. No chance with duplicated blobs of structs and enums.


> I’m not saying it’s true

Yes, I did caveat my suggestion. Why not submit a fix if it's so simple ;-)


If the Radeon driver is in the kernel now, then maybe at some point someone other than AMD will pick it up and start cleaning up excessive copy paste code.

Assuming they play nice with the community, it could be a huge benefit to AMD in the long run.

Still, 2 million lines is a massive amount of code to start working on.


Unlikely, unless that entity has all the supported hardware at hand and ready for automated tests. Refactoring without a thorough test harness is ref*toring.


Isn't that what Google is hoping to do with Fuchsia, to make a next generation of Android that's not dependent on device drivers in the kernel?


I can't see how they'd be able to achieve that, unless you mean simply that the device drivers would run in userland.


Reinventing QNX will always be cutting edge


Are there still workarounds for specific games/programs inside the driver?


You can see all the workaround used in mesa here: https://gitlab.freedesktop.org/mesa/mesa/-/blob/master/src/u...

Proprietary drivers (especially Nvidia's) most likely have lots of similar game-specific workarounds and optimizations (even going as far as overriding shaders in games with better ones they wrote). [1]

1: https://www.gamedev.net/forums/topic/666419-what-are-your-op...


Yes. In fact, that's essentially what the drivers are.


Leaving aside arguments about "what the drivers are", the kernel driver being discussed here generally doesn't have or need that kind of thing. The user-space drivers which talk to the kernel drivers are under the Mesa umbrella as part of Gallium for OpenGL and Direct3D support (e.g. https://github.com/mesa3d/mesa/tree/master/src/gallium/drive...) or as a standalone driver for Vulkan support (https://github.com/mesa3d/mesa/tree/master/src/amd/vulkan). That said, I haven't seen many app-specific hacks in the open source drivers, even in the user-space code.

If anyone wants to learn more about lower-level aspects of GPUs, the Vulkan driver code I linked is one of the best places to start. It directly implements the Vulkan API on one end and talks to the kernel drivers on the other end, so it's relatively easy to follow if you're a systems programmer with an API-level understanding of graphics. Just pick a Vulkan function of your choice and start tracing through the code, e.g. vkCmdDraw: https://github.com/mesa3d/mesa/blob/master/src/amd/vulkan/ra.... The Vulkan driver calls into some of the low-level radeonsi code I linked from the Gallium tree but it isn't a Gallium-based driver, so you don't have to deal with those extra layers of abstraction.


> That said, I haven't seen many app-specific hacks in the open source drivers, even in the user-space code.

They are enabled via driconf [0]. Not nearly as many as what I imagine you'd find in the proprietary Windows drivers though.

[0] https://github.com/mesa3d/mesa/blob/master/src/util/00-mesa-...
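
For reference, a driconf entry looks roughly like this (the application name and executable here are made up; `mesa_glthread` is a real Mesa option):

```xml
<driconf>
  <device>
    <!-- Hypothetical per-app workaround: enable GL thread
         marshalling only for this executable. -->
    <application name="Some Game" executable="some-game-bin">
      <option name="mesa_glthread" value="true"/>
    </application>
  </device>
</driconf>
```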



I understand how big a deal this is and want to buy an AMD card for my next PC, just to support them, but is the driver actually good? I.e., is support for AMD cards on par with Windows?

The Nvidia driver is crappy, doesn't support Optimus, etc, but at least I haven't had any problems with it for as long as I've used it.


I bought an AMD GPU specifically for use with my Linux workstation and haven't regretted it. Perhaps I simply had bad luck with specific nVidia cards, but the AMD driver is stable in a way the nVidia driver simply never was, especially w/ respect to GPU accelerated desktop environments and screen capture utilities. The only change I made was to switch Arch over to the LTS kernel, as the upstream kernel in Arch isn't quite as battle hardened, and did occasionally require a rollback. That's not something that's likely to affect any other distro though, it's a side effect of Arch's bleeding edge nature.

Anecdotal data and all that. I'm on a Radeon VII, pretty darn solid, will probably continue to choose AMD cards in the future. Wish the Windows driver were a bit more stable, and it's... frankly weird to be saying that in comparison to the Linux driver for the same card.


I have had a similar experience with my laptop, which uses a Ryzen 5 Pro 2500U: it would crash once every couple of days under Windows, but no such issue crops up using mainline drivers on Debian.


Yeah, from what I understand the difference in driver stability between Nvidia and AMD on Linux is exactly the reverse of their relationship on Windows.


>Wish the Windows driver were a bit more stable

I have an AMD 5700XT in my Windows games machine, and the driver is an absolute travesty. And looking over the installed files is a horror show. Qt5WebEngineCore.dll and avcodec-58.dll, because a browser engine and ffmpeg are essential in a device driver. And why does FacebookClient.exe exist? Fuck knows.


It's even worse with NVIDIA - they ship an entire custom NodeJS with their drivers.

Also don't forget that these are user-space apps that are simply bundled with the driver but not necessarily part of it. Qt5WebEngineCore.dll is most likely used by the UI portion of the driver (settings dialogs, radeon software etc.), same with the ffmpeg dll and the facebook client.

NVIDIA does the exact same btw. - see [1]

[1] https://www.ghacks.net/2020/03/13/nv-updater-nvidia-driver-u...


I think that crap is only included if you install "Geforce Experience". Nvidia parted off their garbage into an optional component, while AMD forces you to install it.

For anyone using Nvidia on Windows, here's a useful tool to carve out most of the trash from the driver prior to installing.

https://www.techpowerup.com/download/techpowerup-nvcleanstal...


Still nothing compared to typical RGB control software, which on Windows only runs while you're actively logged in (it stops when you lock the screen/desktop), instead of being a lightweight service that applies the last-known config and takes updates from the desktop GUI. Let alone the painfully missing technical docs or support for Linux.


Does AMD make you sign in to open their locally-installed driver utility? Nvidia seems to have thought it was a good idea and went ahead and did that a year or so ago...


No, and neither does Nvidia. "Geforce Experience" is not the driver. It's just bloatware nobody needs.


I have the same card and have issues with locking up and crashing in both Windows and Linux. It looks like the kernel in Ubuntu 20.10 might have a fix for some of the issues.


I remember back when AMD had both closed and open drivers and, having trouble with the proprietary drivers, I switched to the OSS version. It was NIGHT AND DAY. Games that would crash and had weird oddities now ran smooth with higher frame rates. A lot of weird video issues, especially those that come from running more than one X server or Xnest, all went away.

I would only use AMD cards in my Linux boxes. The nvidia drivers/cards pale in comparison.


At least on par. That said, the AMD driver on Windows is notoriously crap compared to Nvidia.

I'm hoping AMD's next gen turns out to be competitive with the RTX3000 series for my next GPU for the same reason.


That is really interesting, I haven't had an AMD card for >10 years now and would really like to be rid of Nvidia due to the closed source drivers. How is suspend/resume? Thinking about canceling my pre-order queue with EVGA and getting a 6xxx card.


For system integration stuff like suspend/resume, display hotplug and resolution changes, etc, the open-source radeon driver is good, and probably the best option on linux. The 3D accel is not bad (but not as good as nvidia).

However, don't expect a new Radeon GPU to be well supported on day of release, expect 1 kernel release cycle until it basically works, and one more until it has most of the bugs ironed out, and then wait until your favorite distro gets that kernel. So you're looking at 3 to 9 months depending on what distro you use.

I'm personally going to be looking for people selling their RX 5700, to replace my RX 480 ...


Agree mostly, but the best option under Linux seems to be Intel graphics (or at least it was until a few years ago) - arguably not beefy enough for some things, but regarding supported features, stability and power consumption it's the best supported mainstream GPU in the Linux kernel.

Intel simply has no closed source driver for Linux. New hardware is often supported/merged before it is even sold. AMD is trying the same, but not there yet.


The Intel i915 driver STILL crashes my system regularly even on the latest kernels... I have a skylake i5-2600k and the iGPU is absolute dogshit. Not sure if it's a hardware or driver issue but it still hasn't been sorted out after all these years.

Typically the entire system will freeze (speakers will continue to play whatever was in the short audio buffer - pretty awful) for 10-15s, then the driver will detect the hang and reboot the iGPU. Happens much more frequently (every ~15m) when using more graphically intense programs. I can't use blender because sometimes when it hangs it won't reset and requires a full reboot.

There are dozens of issues about it and related problems in Intel's drm fork of the kernel [0]. I (finally) posted a bug report about it months ago since it seemed to have gotten worse after 5.4 but never heard back from them.

All this to say - be wary of Intel graphics on linux.

[0] https://gitlab.freedesktop.org/drm/intel/-/issues


Ever since kernel 5.7 was released, my i7-5500 will not boot. (Well, it will boot with the "nomodeset" option, but then X doesn't work, so it's not very useful.) It's still not fixed in 5.9.


I wouldn't even say that; I've experienced regressions/bugs in Intel's laptop drivers a few times.

In general, it's kind of a crapshoot no matter which way you go, and expect pain if the gpu chipset is less than a year old.


> The 3D accel is not bad (but not as good as nvidia).

How is that? I think Mesa provides state of the art OpenGL and Vulkan support, especially with work on ACO. Nvidia doesn't have any edge in that anymore. They did a few years ago still, but not today.


Last time I checked (about a couple of months ago), Mesa had very primitive support for display lists (most of the time you get plain command playback, though if you only submit vertex commands it gets converted to a VBO - and I think even that was added recently-ish), whereas Nvidia's driver performs optimizations in background threads to convert lists into the best format for the GPU, splits them into the minimum number of calls, and performs culling before processing the full list when rendering. AMD's Windows drivers also do some of that stuff (though not all).

Mesa does implement a lot of stuff, but they do not take much advantage of the optimization opportunities the higher-level parts of the API allow. From what I remember, until AMD put some devs on it, they didn't care about supporting the entire API at all.

Vulkan support is most likely good though.

(EDIT: yes, "display lists are deprecated", but this is irrelevant: the API is there, it works, it works great on Nvidia and still very well on AMD's Windows driver, and a lot of applications use it. Khronos splitting the API into core/compatibility profiles was a mistake that made everything more complicated than necessary; if they wanted a clean API, they should have made something new, like they eventually did with Vulkan, and avoided messing up OpenGL.)


> Mesa does implement a lot of stuff, but they do not take much advantage of the optimization opportunities the higher-level parts of the API allow.

There is always more that could be optimized, especially when it comes to niche use cases, but generally Mesa/radeonsi do a decent job of making things fast.

> yes, "display lists are deprecated", but this is irrelevant, the API is there, available and works and works great on Nvidia and still very good on AMD Windows driver and a lot of applications use it

By "lot of applications" you mean some workstation applications that refuse to upgrade their code. You can still use AMD's closed source driver on Linux if you need optimizations for those. If you don't (and most people won't) then Mesa works extremely well.

> Khronos splitting the API to core/compatibility was a mistake that made everything more complicated than necessary when what they should have done if they wanted a clean API would be to make something new like they eventually did with Vulkan and avoid messing up OpenGL

You could argue for drivers not providing newer features in the compatibility profile (and Mesa did that until recently) but as long as there are customers demanding support for newer features while refusing to move off the older APIs, this is what you will get. I don't think having OpenGL Core and OpenGL Compat sharing some of the API hurt anything here.


> There is always more that could be optimized, especially when it comes to niche use cases, but generally Mesa/radeonsi do a decent job of making things fast.

Sure, I didn't dispute that; what I wrote was that Nvidia's drivers are faster in some cases, based on code I've actually seen. And they used to be slower until not too long ago in that case too, so it isn't like they aren't improving. But still, Nvidia's implementation is faster.

> By "lot of applications" you mean some workstation applications that refuse to upgrade their code. You can still use AMD's closed source driver on Linux if you need optimizations for those. If you don't (and most people won't) then Mesa works extremely well.

I mean games, applications and tools, not workstation applications. Not every application uses the latest and - rarely - greatest version of everything out there, nor are all applications always updated - or even still under development (especially games). Those that are may have other priorities too.

But why an application uses some API is irrelevant; the important part is that the API is being used and one implementation is faster than another, showing that the other implementation has room for improvement.

> You could argue for drivers not providing newer features in the compatibility profile (and Mesa did that until recently) but as long as there are customers demanding support for newer features while refusing to move off the older APIs, this is what you will get. I don't think having OpenGL Core and OpenGL Compat sharing some of the API hurt anything here.

My point was that the split itself was the mistake (it isn't like splitting OpenGL into Core and Compatibility was a mandate from heaven - or hell - it was something Khronos came up with). The hurt was that it made things complicated for a lot of people (not everyone cares about having the best performance out there - some applications are, e.g., tools that won't come close to using even 1% of a GPU's power, but they'd still prefer to rely only on open APIs instead of some proprietary one, or on some library that may be abandoned next year; code written for OpenGL 1.x 25 years ago can still work fine on modern PCs, after all) and that it split the OpenGL community into two "camps".

This created issues like libraries and tools supporting only one version or the other, tons of bugs and time wasted "migrating" to Core (or supporting both Compatibility and Core), and it invalidated a ton of existing knowledge and books (OpenGL being backwards compatible down to 1.0 is very helpful, since you can always start at the beginning with something proven and work your way towards more modern functionality on an as-needed basis). In the end all of that was a huge waste of time, since everyone outside Apple decided that Compatibility is necessary - and Apple decided that splitting OpenGL in two halves wasn't enough, so they made everyone's life even harder and came up with a proprietary API all on their own.


ACO developers will work on OpenGL at some point too. OpenGL in general isn't something I worry about, as long as it performs sufficiently well. Everything modern should be using Vulkan anyway, especially if it requires a focus on performance.

And deprecated features? I think there are better things to focus on first, optimization-wise.


Well, the original comparison was with Nvidia's driver and Nvidia has a much more optimized driver.

Also, it is much more practical (and realistic) to have a few devs optimize a handful of API implementations than to expect the thousands of devs who work on thousands of applications to do that (which is also why OpenGL etc isn't going anywhere).


> Well, the original comparison was with Nvidia's driver and Nvidia has a much more optimized driver.

I wouldn't say that. In all common cases they don't. And as above, deprecated features are the last thing I'd start comparing on. If you use something deprecated, performance shouldn't be your worry; rewriting your code should be.


That sounds just like sour grapes :-P "Mesa is as fast as Nvidia" "But they are slower in these cases" "That doesn't count".


At least on the hardware that I've had, it's basically rock-solid in practice. I use it with high-refresh-rate monitors, I've tried FreeSync and that works, it works with all my displays, and recently the older of my GPUs (Radeon Pro WX 7100) finally got audio output over DisplayPort, as did the newer one (Radeon VII), though I never really had any use for that feature.

The acceleration is in good shape too, particularly with RadeonSI and RADV, especially as the RADV developers (independents, Valve, and some smaller companies whose names I wish I remembered) have been making massive improvements on the shader compiler side. RADV's own shader compiler (ACO) is noticeably better than the first-party AMD LLVM stack, and RADV is substantially faster than any of the first-party AMD Vulkan drivers for both graphics and compute workloads. I hope ACO in RadeonSI becomes a thing; I think it will be a major improvement.

Message to anyone listening from AMD: maybe look into making ACO your primary target rather than LLVM, it is clearly a better design for your GPUs, it has substantially less overhead, and there's no legal reason it can't be a part of all of your drivers.

As for kernel support, it is often same-day or at least it can access the displays on launch day, provided you have the latest stable kernel. ArchLinux is rarely that far behind a new stable kernel release, so on ArchLinux, same-day support of one form or another, and full support that day or some day soon, is the norm.


Suspend / resume works fine with my Sapphire Pulse RX 5700 XT.


Is it really crap? I have it and it feels stable and the Crimson UI seems well made. It feels way better than the Catalyst days.


It is crap enough for me (an RX 5700 XT user) to keep a backup of the few previous successful drivers, so that when one inevitably breaks things I can roll back to a previous driver.

Some issues I've had with a variety of AMD drivers on my current PC, off the top of my head: turning on the monitor before the PC would cause the GPU to not realize there is a monitor attached; letting the monitor go into power-save mode would also cause the GPU to think the monitor was lost; settings for display scaling would be lost after every full reboot (full = a real reboot, not the fast hibernate-based one Win10 does most of the time - you get a full reboot after updates, some installs, etc); random full system hangs when trying to play GPU-accelerated video (which is pretty much most video on the web, as well as some applications like Microsoft's new Xbox Games app); random reboots too; etc.

So I tend to be careful with updating the drivers. The last issue I had wasn't as bad as the random hangs/reboots (which fortunately haven't happened recently), but I simply couldn't launch the Crimson UI at all. I had to do a full reset and reinstall of the drivers for it to appear again.

In comparison, updating to the latest Nvidia driver when I had an Nvidia GPU (which was from the early 2000s to ~2 years ago) was basically a non-issue: I wouldn't even think twice about it, as I never had any issue.

And FWIW it was the same on Linux too: I never had issues with Nvidia's drivers there either, and performance was more or less the same (at least for OpenGL stuff). But note that I avoid stuff like Wayland, hybrid GPUs, etc like the plague.


> turning on the monitor before the PC would cause the GPU to not realize there is a monitor attached

I have a similar issue with a Dell display attached to an AMD card. After suspending the PC, the monitor does not detect the PC at the other end of the DP cable, except for Amazon Basic cables which work for some reason. Digital standards are weird.


I've had all the same problems with my recently bought DisplayPort Monitor (previous ones were all HDMI and worked flawlessly).

The fix for me was switching from Xorg to Wayland. Haven't had a problem since, apart from Steam not liking it all that much.


Interesting you mention this standby issue. I just moved a monitor from an nvidia setup where it had zero issues.

Now when I turn the laptop (with Radeon graphics) on, I have to turn the monitor off and on before it is recognised.


The back and forth in this thread about nuisances like that is one of the reasons I am definitely sticking with Intel integrated GPUs when running Linux. It's 2020 and stuff like that should be much smoother :-(.


Note that in my comment above I was referring to the Windows AMD driver. I haven't used Linux much on this machine (though when I did, it had a 50/50 chance of completely hanging the system - but I think that was an issue between the kernel and the then-new Zen APUs that was quickly fixed).


I have Lexa PRO in my workstation (Fedora) - Suspend/Resume works so far.

I have an issue, though, where switching off the monitor for a few days might make the AMD card disable the outputs and stop recognizing the monitor afterwards (I think it is related to the order in which I try to "wake" the monitor) - which I cannot recover from without rebooting the machine.

But this is with a machine that never goes into suspend or any sleep state - and I can't say if it would be the same with the NVIDIA card. I do not use the NVIDIA card for video output because the proprietary driver would regularly stop showing my desktop - or suddenly stop showing any output at all after a reboot.

The integrated Intel GPU on my laptop is mostly without issues whatsoever.

On laptops I would still recommend Intel GPUs anyway for power consumption reasons - although AMD APUs are quite interesting and I don't have recent knowledge about how well they compare. The CPU and its ability to lower power consumption under sleep is also relevant there, and this was way better under Intel so far. Unless you need the increase in performance an AMD GPU/APU would offer...


I have a similar issue with Nvidia on Linux. My larger display is slow to start, so I have to rerun xrandr after suspend in order to get it working.


I remember the Catalyst days. I used to work for a company which included a pc in the price when selling its software. We unofficially supported people who would run it on their own PC but eventually had to put our foot down and explicitly state that we wouldn't support AMD cards.


Hmm, that's a low bar, huh. Is AMD on Linux anywhere close to Nvidia on Windows?


The thing is, Nvidia also has issues, but their PR game has historically been better. Many graphics developers have had experiences with Nvidia support where they run into a strange bug and are instructed to set a magic value to enable a driver hack. AMD drivers have had good and bad periods and hacks of their own, but are usually better behaved in this respect. It's actually Intel that gets the most praise for adhering to spec, and therefore being a useful baseline. So user perceptions and dev perceptions diverge on what makes the drivers good, and this has shifted with the different generations of APIs too; as we've moved towards a lower-level access model, basic driver functionality has become less focused on performance hacks, but there is a lot of legacy code there to support old games.

We're long past the worst period for Radeon on Linux which was back in the 2000's with "fglrx" - a driver that I never managed to get working. The new stuff will run with some competence.


I recently bought a RX 5700 XT, and installed it on a computer that first ran Linux and then Windows 10.

In Linux, the driver (including audio) seemed very robust, but I didn't find anything like a detailed control panel for the card's graphics features.

On Windows, the AMD-supplied control panel has plenty of knobs and buttons, but the driver itself seems less robust, particularly w.r.t. audio-over-HDMI.


That's very informative, thanks. I wonder if there's a cli utility on Linux instead...



Thanks for the links; I will definitely test those out.


Sure. For some reason they aren't packaged yet in common distros, which makes them not well known.


Maybe have a look at corectrl, which aims to create a beautiful control panel for graphics cards.


What about: is AMD on Linux anywhere close to Intel on Linux? No games, just 3D acceleration for the desktop, bug-free suspend and resume, etc.


Maybe I'm just lucky, but I have not had a single issue on Windows with my RX560. I know AMD/ATI drivers used to be horrible on Windows back in the day, but I really think they've gotten a lot better, I'd say on par with Nvidia's.


It is not. My 5500 XT is unusable when using 2 monitors. https://gitlab.freedesktop.org/drm/amd/-/issues/929

Apparently AMD doesn't have the resources to debug these millions lines of code, since this has been open for a year now.

Yet people still say NVIDIA on Linux has issues. They don't support Wayland and tend to lag behind with Linux-only tech in general, but the driver itself is top notch. I haven't had an nv driver crash on Linux in 10 years. It's just the same echo chamber born of the famous moment of Linus flipping NVIDIA the bird.


My experience is 180° opposite to what you describe.

I never had a mentionable issue with AMD cards since switching to the open source driver approx. a decade ago. I have an NVIDIA 1060 card in my workstation for CUDA - every single time I bring it back into a running state, I have a realistic chance of completely borking my system.

In fact, I installed an AMD card after the first two incidents, simply to have at least a chance of working video output when the NVIDIA driver once again doesn't want to talk to the kernel.

Add to that the practical implications and idealistic differences of a (mostly) open source driver vs. a (mostly) closed source one (I think we can agree that the open source NVIDIA driver is out of the discussion).

Obviously you might run into problems if you try to run very recent hardware right after availability. Kernel driver development is not ideal for cutting-edge hardware: some things might break, and it may take some time for your distro to ship the newest kernel/driver.


The Nvidia driver started supporting proper Optimus at the beginning of 2020 (it can run apps on the integrated and dedicated cards simultaneously). I use it regularly on my XPS 15 (to play Kerbal Space Program). It's called "DRI PRIME". You have to set an environment variable when starting an application saying which GPU you want it to run on.

I am, however, very much looking forward to the new AMD GPUs. Hopefully the RX 6000 series will be near a 3080 in more than the 3 hand picked games in their teaser. Would love to use Wayland on my desktop.


That's interesting, thanks, I tried to use Optimus on my XPS years ago but it wouldn't work. I'll try it now, thanks!


Search for “amdgpu ring gfx timeout”. There seem to be a whole class of bugs that have been open for years which not only haven’t been fixed, but there isn’t even any clear indication of what the root cause(s) is/are.


I tried a couple of different AMD cards, and my machine crashes on resume if I try to use either of them (but the Intel iGPU works fine).

Searching for amdgpu bug reports leads to:

https://amdgpu-install.readthedocs.io/en/latest/install-bugr...

which links to a page saying "Bugzilla is no longer in use" :-(

This is under Qubes/Xen, though, so maybe that causes extra problems. If any devs are reading, I did report it here in the end:

https://github.com/QubesOS/qubes-issues/issues/5459


It could be misbehaved applications. While AMDGPU and Mesa are much faster than AMD's proprietary driver (on some OpenGL workloads I have seen a 2x improvement compared to AMDGPU-PRO or the Windows driver) and are normally stable, I had several issues where bad shaders brought down the whole GPU (with "ring gfx timeout"). Things like out-of-bounds access or division by zero.


I upgraded from a Geforce 460GTX to a Radeon RX560, and I ran into two issues. Nothing major, and I've had worse issues with the Nvidia drivers, but they are still something to be aware of.

The first was that my distro (KDE Neon based on Ubuntu 18.04) shipped an older version of Mesa at the time, which was too old for the AMDGPU driver, so I had to add a PPA with an updated version. Since Neon updated to a 20.04 base, it works straight from a clean install. It also worked with no issues when I switched to openSUSE Leap 15.2.

The second was that DVI output was limited to single-link instead of dual-link. My monitor at the time only supported full 1440p through dual-link DVI or displayport, and the old GPU didn't have displayport. Buying a displayport cable was a quick fix, and I believe the DVI issue is fixed in the driver now.

Aside from those two minor hurdles, it has been smooth sailing, very good OpenGL performance in the games I play.


Not sure if this is a driver problem but there's a LOT of general usability issues on AMDGPU + Linux. The default thermal control being absolute catastrophe for one.


How is it a catastrophe? I game every day on AMD on Linux and have no issues. 99.9% of consumers don’t care about overclocking so if that’s what you’re referring to I think it’s a non-issue.


It runs at 75°C at idle because the fan curves are wonky.
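For what it's worth, amdgpu exposes a manual fan override through the standard hwmon sysfs interface, so a wonky default curve can be worked around. A sketch (the hwmon index varies per machine, so the glob below is illustrative):

```shell
# Resolves to something like .../hwmon/hwmon3 on a typical system
HWMON=$(echo /sys/class/drm/card0/device/hwmon/hwmon*)

# pwm1_enable: 1 = manual fan control, 2 = automatic (driver curve)
echo 1 | sudo tee "$HWMON/pwm1_enable"

# pwm1 takes 0-255; ~128 is roughly 50% duty cycle
echo 128 | sudo tee "$HWMON/pwm1"

# Read back the current fan speed in RPM
cat "$HWMON/fan1_input"
```

Write 2 back to pwm1_enable to return control to the driver. Tools like CoreCtrl wrap the same interface with a custom fan curve.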


AMD has caught up to Intel but still lags behind nVidia (on Windows at least). I'm just not sure they can fight a two front war. Something has to give.


If we're talking about CPUs wrt Intel, and GPUs wrt Nvidia, I think they'll do fine -- IIRC, they're separate internal groups with the same overall leader (Dr. Su).


Wait a few months after a new GPU comes out, maybe until the next major version cycle (like if you want a new card that comes out in November, wait until the 21.04 Ubuntu/PopOS release).

I bought my RX 5700 XT shortly after release, and for several months I was running alpha/beta kernel releases and manually downloading extra files just to get it working; any upgrade/update could turn into a blank screen on boot. It also broke out-of-the-box support for running full VMs, which was pretty painful for me as well, and I wasn't going down that rabbit hole to try and build it myself.

YMMV of course.. but that's just my take on it.. I bought specifically for Linux support, but took a few months to shake out.


Have you tried Nvidia on-demand option for Optimus?


It is good enough. I'd say overall Nvidia's driver is worse.


I have returned two Radeons that I bought for the specs, because the drivers were bad enough that I couldn't get the same-clock performance as Nvidia, or worse, dealt with driver crashes and system reboots -- note that this was between 10 and 20 years ago. I'm seriously considering trying again at eom when they announce the new cards, but it being a Radeon is still a downside to me.

Most of the Linux community has a historical hatred of Nvidia because of the driver issue so there’s a lot of relative love out in forums, but just “stable” would be a step up for me for Radeons on windows.


I recently built a system based around a Ryzen 5 3600 CPU and Radeon RX 5600 XT GPU, and in both Windows 10 and Linux with a 5.4+ kernel it's rock solid. Gaming in Windows is simply amazing and it pairs well with my 1440p monitor. On Linux gaming is also extremely good, with only a couple of "Windows only" titles acting buggy under Proton/Steam. Considering Proton itself is in its infancy, that's to be expected.

With native performance on official Linux games on par with or better than the Windows equivalent, and more and more games getting Linux ports due to Vulkan, I just about have no need to boot into Windows at home anymore apart from Fusion 360.

As a workstation in Windows, since I don't overclock I don't see any stability issues. Fusion 360 is fast and fluid unlike my 8 year old Sandy Bridge dinosaur at work, even after adding a GT 1030. Good quality Crucial RAM and a no-frills AsRock B450 board make for a rock-solid build. Ditto on Linux as a workstation, everything just works and works well, and it's superb for 3D modeling and music creation (two of my main hobbies).


Good to hear that things have gotten better! Will be watching the oct 27 reveal of the new cards :)


I'm also very interested in giving my 2080 Ti to my partner (a Windows user) and getting the fastest next-gen Radeon for myself.


It is not on par at all - my 5600 got annihilated by driver issues.

AMD has incredible CPUs, but just buy an Nvidia GPU - especially if you are using Linux.


Nvidia has subpar support for Wayland on Linux because it uses its own EGLStreams buffer API instead of the standard GBM buffer API, which is better-supported. Both AMD and Intel use GBM.

Also, the open source driver for Nvidia (nouveau) has incredibly poor performance compared to Nvidia's proprietary driver, and lacks essential features such as reclocking for recent hardware generations:

https://nouveau.freedesktop.org/PowerManagement.html

AMD's and Intel's open source drivers are their primary offerings on Linux and have good performance across all hardware generations.


Intel has actually gone downhill lately, especially for prior generations. I've had to live with 5 or so years of tearing with multi-monitor support on Ivy Bridge, and even single monitor tears inexplicably with some software (that shouldn't). The Intel Xorg driver is unmaintained and the generic modesetting driver doesn't work quite as well. When I first got my Ivy Bridge system, triple head mode didn't work for a while either, so it's not like they have great support when the hardware is current either.

I've switched to AMD now and things are much better. Go with AMD.


The Xorg modesetting driver works quite reliably on Intel in my experience.

The SNA acceleration architecture in the Intel Xorg driver was a disaster in terms of correctness and stability. When SNA appeared as an option it initially seemed quite fast, but didn't take long to reveal it was also quite broken vs. UXA.

I used to explicitly use UXA but for the last 5-10 years simply using modesetting has been the way to go.

Personally I think you're conflating Xorg and kernel driver issues. Xorg is basically unmaintained in general now and unfortunately SNA was the last major development in that context for the Intel driver, and it was not good.


This doesn't apply if you want to run CUDA-dependent software. I've generally gone for Nvidia for my personal machine since Torch has behaved oddly on AMD cards in the past.

It's true that Nvidia doesn't support Wayland properly, but that's not really an issue in my opinion. Wayland still has its own problems that mean switching from X11 isn't viable yet.


Although your argument is valid, are we talking about CUDA? Obviously CUDA is an NVIDIA thing under all platforms, right? I don't think anyone would buy AMD with the intention of running CUDA.

Regarding GPUs and how good they work under Linux, computing on GPUs is only a part of the discussion I would argue...


What issues have you had with Wayland? Switching to it has given me a tear free experience on both AMD & Intel laptops, besides that it performs similar to X11.


> tear free

> 5 or so years of tearing

I know what people are referring to, but a less geeky person might come away from this thinking people get very emotional about bad Linux graphics drivers.


My main problem with it is limited software support. Xmonad isn't available and as far as I can tell what support exists for screen recording and screenshots is half-baked at best. I haven't seen anywhere near enough problems with X11 to make switching window managers worth it, and the screen recording thing would be a massive pain to work around.


I'm still on an Intel system (skylake) and my experience is similar to yours. 5+ years of bugs and crashes, tearing, multi-monitor headaches and general instability.

Eagerly awaiting the new AMD hardware.


I've found the wayland server to be a great experience with intel—the only weird bits I've seen is full-screen noise on firefox and poor support for high dpi, the latter of which is even shittier under X11. The server is really very usable nowadays.

AMD's ok if you have the room for the discrete card, but I wish they would invest more in integrated on-board chips.


Modern AMD GPUs work better on linux than nvidia. No tearing, multi-monitor works, and vulkan is very smooth. Nvidia is actually less stable, and has some peculiar quirks, such as needing composite manager running to get rid of tearing, spotty multi-monitor support, etc..


You are dismissing people saying they ARE having issues with AMD on Linux. In fact my AMD card does not do multi-monitor, and in this thread I'm not the only one that has multi-monitor issues on AMD.


Which card are you using? I'm aware the older cards are still bad, especially if you still need fglrx. In my personal experience, modern AMD GPUs on Linux are the first time graphics have worked reasonably well on Linux. Even Intel drivers are riddled with bugs and instability (not to mention they still don't even do Gallium), the GMA 3650 (PowerVR SGX based) being the most infamous worst driver ever.


A 5500 XT bought in June, so not old at all. I've heard the opposing argument, that since it's a relatively new card (out since Dec 2019?) I should expect some bugs, which is insane one year later. It's actually unusable, I have to log into my machine via SSH to restart it, or force reboot. It might break after 30 minutes or 3 days, when idle or busy.

https://gitlab.freedesktop.org/drm/amd/-/issues/929

AMD developers in that thread are chasing their tails and still haven't figured out why so many cards are having issues, and why others aren't, but as a consumer, that's really not inspiring at all.


Funny, I have a 5600 XT (Sapphire Pulse) and it runs like a dream. The out-of-box experience with Linux has been very good. Note that some of the aftermarket cards are actually bad, and the instability might not be software related. Before the 5600 XT, I used an R9 290, and while it did require some tweaks to enable all features (due to being an older card), it still ran relatively stable and was in general a better experience than any Nvidia card I had used in the past.


This guy is having the same issues I'm having with a 5600. Multi-monitor, entirely new computer built a couple months ago.

Randomly locks up, random black screen, random rainbow colors all over my monitors.

With my new Nvidia 2060 which I bought to replace it; nothing. No issues. Works just fine on Manjaro.

For whatever reason, the AMD cards just get clapped on Linux.


My experience with Linux is that the Nvidia drivers and support are the worst of the bunch, and if I had a nickel for every time I could trace a kernel panic through their driver I'd get a very nice lunch. Their popularity seems to be driven primarily by exclusive access to CUDA APIs and Windows gaming. Nouveau is OK for accelerated 2D but is hardly in the same ballpark as the AMD drivers.

That said I just picked up a quadro (not my choice, came with a prebuilt NUC) and I've been pleased to find that it "just works" on freebsd (I use it to realtime transcode video), so clearly great experiences are possible and I don't want to be needlessly harsh.

Personally, I'm dying for a discrete intel card. I can't recall any hiccups with intel chipsets, ever, and that matters WAY more to me than raw performance.


> the driver is pretty abominable compared to the code quality of most of the rest of the kernel.

Could you say more about what specifically makes the driver abominable? Is it just those files with largely duplicated code?


duplication 3 times with small differences between them is a good case to keep them separate imo.

abstraction is one of the main sources of code complexity.

you start with one function used in 3 places, then add boolean args to it to get slightly different functionality at each place, eventually it becomes a mess of complexity


I think that's very subjective and situational.

The amdgpu driver has duplicated files for different versions of things, so it'll have thing_v6.c and thing_v7.c and thing_v8.c with a lot of duplicated functions.

The more common way of doing something like this would be to have structs of function pointers that get populated based on what version of GPU you have. You have one file with all the common functions that they can share, in the definitions for each GPU version you set the majority of the function pointers to the common version they all share and for ones that have to be different, you set them to their unique version. That way you can define all the common functions once, and point to them in the structs for each version.

Having a quick flick through the code now, they do use structs of function pointers in each version for common operations but they still don't abstract out the ones that are either identical or have very few differences that you could special case.

Refactoring such a giant driver for no performance gain is going to be extremely low on AMD's todo list, so it'll probably stay like that. It just doesn't look like anything else in the kernel.


This is literally what everyone does in embedded C land. The repetitive definitions are generally intended to be used with macros and are typically generated from the same definitions as the chip registers themselves. Some places also auto-generate embedded C/C++ structs or classes, which imo is better. But I have gotten quite a bit of pushback for doing it.

A big issue is also the use of bitfields, as much as register duplication. Bitfields in C/C++ are a minefield if you don't lock down a known-good compiler version, because so much of their layout is technically unspecified. Often you'll also have issues where certain register fields exist for some registers of a series and not the next, or where the functionality/sizing/interpretation is context dependent, or where certain locks or write orders are needed for correct access; these are often handled with presence-checking macros.

IMO, if we want better driver code, it's time for GCC/Clang to nail down the bitfield layouts for the embedded use cases. This has been broken for far too long.


Sounds like an excellent way for someone looking for something to contribute to get their code into the kernel though


It would be very difficult to get accepted. You'd have to get the AMDGPU driver maintainers on board, and you'd probably have to do a lot of it at once to justify the change. It would also take some discussion, and you're talking about refactoring a lot of stuff which probably moves underneath you during this, so you have to keep iterating to keep up with the changes, all without knowing if they'll even end up taking it...

Changes like this are probably a good way to get started but I would guess the AMDGPU driver is one of the worst places to get started as a beginner.


I mean, each new version is separate, correct? So the only change that can happen under you is when something is backported. How often does that happen for a gpu driver, and how far back does that go?


Or you duplicate code in 3 places, and apply the same fixes or updates in 3 places for all of eternity. There are pros and cons to both methods and each have their places, no need to start this constant debate here.


That's why this approach can tend to be a positive for driver versions matched to hardware iterations: a given fix may or may not apply to a given hardware config, and likely has to be tested against each config separately.

It's one of the unusual circumstances where, unfortunately, abstraction can decrease flexibility and increase development time.


Proverb: "A little copying is better than a little dependency." (Rob Pike)

That is, it's better to have duplication than the wrong abstraction. It may also be a reference to C compilation, where pulling in header files and dependencies costs more than inlined code; that's one of the problems the Go language sought to resolve, anyway.



