"On x86-64, there are two CPU settings which control the kernel’s ability to access memory."
There are a couple more than two, even in 2021.
Memory Protection Keys come to mind, as do the NPT/EPT tables when virtualization is in play. SEV and SGX also have their own ways of preventing the kernel from writing to memory. The CPU also has range registers that protect certain special physical address ranges, like the TDX module's range. You can't write there either.
That's all that comes to mind at the moment. It's definitely a fun question!
A thought: does MPK actually control the kernel's ability to access memory? On Intel, I think if the kernel tries to read that memory, a page fault won't be thrown. With PKS, though, kernel reads will cause a page fault.
So can the kernel (ring 0) freely read/write memory encrypted with MPK? I think so, yes. Good luck with whatever happens next tho lol
There are two versions of MPK. One is only applicable to userspace pages. The other is newer and can be applied to kernel space pages; last time I checked, this was only available on newer Xeon processors.
By the way, MPK memory is not encrypted. The key is just an identifier for the requestor. If the requestor key doesn’t match the same identifier for the memory page, then an exception is raised.
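To make the "key is just an identifier" point concrete, here is a small software model of the user-mode (PKU) check. It is a sketch, not what the hardware literally runs. Each page-table entry carries a 4-bit key, and the per-thread PKRU register holds two bits per key: access-disable at bit 2k and write-disable at bit 2k+1. The enum and helper names are made up for illustration. The model ignores the ordinary page-table permission checks and the CR0.WP nuance for supervisor-mode writes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Kinds of access the CPU might perform on a page. */
enum access_kind { ACCESS_READ, ACCESS_WRITE, ACCESS_FETCH };

/* Model of the PKU check for user-mode accesses. pkey is the 4-bit key
 * from the page-table entry; pkru is the current thread's PKRU value.
 * Only the key check is modeled, not normal page permissions. */
bool pkru_allows(uint32_t pkru, unsigned pkey, enum access_kind kind)
{
    unsigned shift = 2u * (pkey & 0xFu);  /* 16 keys, 2 bits each */
    bool ad = (pkru >> shift) & 1u;       /* access-disable bit */
    bool wd = (pkru >> (shift + 1)) & 1u; /* write-disable bit */

    switch (kind) {
    case ACCESS_FETCH:
        return true;          /* instruction fetches are never key-checked */
    case ACCESS_READ:
        return !ad;
    case ACCESS_WRITE:
        return !ad && !wd;
    }
    return false;
}

/* Return pkru with the AD/WD bits for one key replaced. */
uint32_t pkru_with(uint32_t pkru, unsigned pkey, bool ad, bool wd)
{
    unsigned shift = 2u * (pkey & 0xFu);
    pkru &= ~(3u << shift);               /* clear both bits for this key */
    pkru |= ((uint32_t)ad << shift) | ((uint32_t)wd << (shift + 1));
    return pkru;
}
```

Nothing in the model compares a "requestor key" against the page: the page supplies the key, and the thread's PKRU decides what that key is allowed to do.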
Funnily enough, MPK isn’t new at all. It’s almost a reintroduction of a feature from Itanium.
Aw, so I was half right. I knew the newer one, which is PKS, will throw a page fault. Sorry, it's been a while since I've done this stuff, and we were mostly working with tz.
> Specifying "gather_data_sampling=force" will use the microcode mitigation when
> available or disable AVX on affected systems where the microcode hasn't been
> updated to include the mitigation.
Disclaimer: I work on Linux at Intel. I probably wrote or tweaked the documentation and changelogs that are confusing folks.
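If you want to see which of those two outcomes a booted system actually ended up with, the kernel reports its Gather Data Sampling status in sysfs. The sketch below reads that file and buckets the string. The path follows the kernel's hw-vuln documentation as I understand it, but the exact status strings matched here are assumptions, so check them against your kernel version. The function and enum names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Sysfs file where the kernel reports its GDS mitigation status. */
#define GDS_SYSFS "/sys/devices/system/cpu/vulnerabilities/gather_data_sampling"

/* Read the status line into buf (len must be > 0). Returns 0 on success,
 * -1 if the file is missing (older kernel, non-x86, no sysfs) or unreadable. */
int read_gds_status(char *buf, size_t len)
{
    FILE *f = fopen(GDS_SYSFS, "r");
    if (!f)
        return -1;
    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';   /* drop the trailing newline */
    return 0;
}

enum gds_state {
    GDS_NOT_AFFECTED,
    GDS_MICROCODE,      /* microcode mitigation in use */
    GDS_AVX_DISABLED,   /* "force" fallback: AVX turned off */
    GDS_VULNERABLE,
    GDS_UNKNOWN
};

/* Bucket the status string. The substrings are assumptions about the
 * kernel's wording, not a stable ABI. */
enum gds_state classify_gds(const char *s)
{
    if (strstr(s, "Not affected"))
        return GDS_NOT_AFFECTED;
    if (strstr(s, "AVX disabled"))
        return GDS_AVX_DISABLED;
    if (strstr(s, "Mitigation: Microcode"))
        return GDS_MICROCODE;
    if (strncmp(s, "Vulnerable", 10) == 0)
        return GDS_VULNERABLE;
    return GDS_UNKNOWN;
}
```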
Uh... Did I miss the patches that add a pre-zeroed page pool to Linux? Wouldn't be the first time I missed something like that getting added, but 6.3-rc5 definitely zeroes _some_ pages at allocation time, and I don't see any indication of it consulting a pre-zeroed page pool: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...
> there's no way[0][1] to express R^X, PROT_EXEC without PROT_READ is not possible.
I'll also add a [2]:
[2] There's no way to do it in the page tables. But, if you have Protection Keys for Userspace (PKU), you can get it ... kinda. You can have a PROT_READ|PROT_EXEC mapping, assign it a pkey, then set PKEY_DISABLE_ACCESS in the PKRU register for that key. In fact, if you have a PKU CPU and you do an unadorned mmap(PROT_EXEC), the kernel will allocate you a pkey and do this under the covers FOR you. Anyone who can execute WRPKRU can easily undo this protection, but it's better than nothing.
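A minimal sketch of doing the same thing explicitly from userspace, assuming glibc 2.27 or newer (which provides `pkey_alloc()`, `pkey_mprotect()` and `pkey_free()`) and a CPU and kernel with PKU. It works because PKRU only gates data accesses: instruction fetches are not checked, so code in the region still runs while loads from it fault. `make_exec_only()` is a hypothetical helper name. On machines without PKU, `pkey_alloc()` fails and the helper simply returns -1.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Try to make [addr, addr+len) execute-only via a protection key.
 * Returns the pkey (>= 0) on success, or -1 with errno set, e.g. when
 * the CPU or kernel lacks PKU support. */
int make_exec_only(void *addr, size_t len)
{
    /* Allocate a key whose initial rights, in this thread's PKRU,
     * deny all data access to pages tagged with it. */
    int pkey = pkey_alloc(0, PKEY_DISABLE_ACCESS);
    if (pkey < 0)
        return -1;

    /* The page tables still say R+X; the key is what blocks reads. */
    if (pkey_mprotect(addr, len, PROT_READ | PROT_EXEC, pkey) != 0) {
        int saved = errno;
        pkey_free(pkey);
        errno = saved;
        return -1;
    }
    return pkey;
}
```

The rights passed to `pkey_alloc()` land in the calling thread's PKRU, and as noted above, anything that can execute WRPKRU (or call glibc's `pkey_set()`) can clear the access-disable bit again.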
Granted, this was a lonely little fellow. But he knew perfectly well what he was doing and repeatedly approached boats, despite the noise. He died after colliding with a tugboat prop.
They actually leave lots of evidence. A transient eating a seal is messy business and there are lots of seal bits and chunks left over. Eva Saulitis describes the aftermath in several cases in her book (https://www.penguinrandomhouse.com/books/219235/into-great-s...). IIRC, fishing the evidence out of the water is one of the primary ways they study killer whale diets.
IIRC, IME also handles a lot of core functionality like power regulation. Contrary to what many in this thread probably think, it does provide a lot of core functionality that you probably don't want removed.
The contention from the user's standpoint, however, is the network stack, the potential to phone home, and the unrestricted access to global machine state, combined with the fact that none of it is documented or disclosed.
It's one thing to have that and be up front and open on it. Get secretive, and you're creating a massive source of unknown unknowns for everyone involved.
And like it or not, if you won't/can't be transparent about it, either:
A) It'd take too long to document, which suggests there may be room for simplification.
B) You're doing something that, if it saw the light of day, would cause outrage, likely because you shouldn't be doing it.
C) You're holding back the state of the art for the sake of securing a revenue stream.
None of these inspires an excess of confidence/trust.
Actually, its primary design goal is to make address sanitizers faster. Right now, all the code that touches a sanitizer-tagged address must be recompiled to understand how to place and remove the tag. These address-bit-ignore approaches can (ideally) allow you to just modify the memory allocator to hand out tagged addresses. Those addresses can then be passed around to code that doesn't even know it's handling a tagged address. It doesn't need to be modified. You don't need to recompile the world. Even when the sanitizer is on, you also don't need to be constantly stripping tags out of pointers before dereferencing them.
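Here is a sketch of the software side of that bookkeeping, assuming a layout modeled on Intel's LAM_U57 mode, where bits 62:57 of a user pointer are ignored on dereference (ARM's TBI does the same with the top byte). The helper names are hypothetical. `tag_ptr()` is what a tagging allocator would do; `untag_ptr()` is the step that, without hardware masking, every dereference site has to perform, which is exactly what the address-bit-ignore features let you drop.

```c
#include <stdint.h>

_Static_assert(sizeof(uintptr_t) == 8, "sketch assumes 64-bit pointers");

/* Hypothetical layout modeled on LAM_U57: tag lives in bits 62:57,
 * bit 63 stays clear for user pointers. */
#define TAG_SHIFT 57
#define TAG_BITS  6
#define TAG_MASK  ((((uintptr_t)1 << TAG_BITS) - 1) << TAG_SHIFT)

/* Stamp a tag into the ignored bits, as a tagging allocator would.
 * Tags wider than TAG_BITS are truncated. */
void *tag_ptr(void *p, unsigned tag)
{
    uintptr_t v = (uintptr_t)p & ~TAG_MASK;
    v |= ((uintptr_t)tag << TAG_SHIFT) & TAG_MASK;
    return (void *)v;
}

/* Read the tag back out, e.g. for a sanitizer's check. */
unsigned ptr_tag(const void *p)
{
    return (unsigned)(((uintptr_t)p & TAG_MASK) >> TAG_SHIFT);
}

/* The per-dereference stripping that hardware masking makes unnecessary. */
void *untag_ptr(void *p)
{
    return (void *)((uintptr_t)p & ~TAG_MASK);
}
```

On x86-64 without LAM enabled, dereferencing the result of `tag_ptr()` directly would fault, because the pointer is non-canonical; that is why today's code has to call something like `untag_ptr()` first.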