Hacker News

Your estimate of BLAKE2's throughput is indeed off by an order of magnitude <https://blake2.net/ >, and there is certainly demand for better checksum algorithms in filesystems, both for integrity and for deduplication. ZFS, for example, uses SHA-2.

BLAKE2's authors say they designed a fast algorithm with storage use cases in mind, which is why I think it fits btrfs.



Replying to all the helpful posters here.

Thanks, that's very informative. I checked out the link, and it is very cool how BLAKE2 can be tuned for different roles. I hadn't thought about how newer filesystems are doing deduplication; my head is stuck in the ext4 era.
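To illustrate that tunability, here is a quick sketch using Python's hashlib, which exposes BLAKE2b's digest-size, keying, and personalization parameters (the personalization string below is made up for illustration):

```python
import hashlib

data = b"some block data"

# Plain checksum role: pick a 256-bit digest instead of the full 512 bits.
h_checksum = hashlib.blake2b(data, digest_size=32)

# Keyed role: with a key, BLAKE2 doubles as a MAC (no HMAC construction needed).
h_keyed = hashlib.blake2b(data, key=b"secret key", digest_size=32)

# Personalized role: domain-separate hashes for different subsystems.
# "fs-dedup" is a hypothetical personalization label, not a real btrfs value.
h_person = hashlib.blake2b(data, person=b"fs-dedup", digest_size=16)

print(len(h_checksum.digest()), len(h_person.digest()))  # 32 16
```

The keyed and personalized variants produce different digests from the plain one over the same input, which is exactly what lets one primitive serve several roles.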

I still think performance matters. Even at 800 megabytes per second, you are talking about committing two entire cores to checksums on 10 Gigabit Ethernet if you need to move data around, and an entire core if you are sequentially scanning an SSD. I suppose this will stop mattering as we get more cores.
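The core count above is back-of-the-envelope arithmetic; a minimal sketch, assuming the 800 MB/s single-core hash rate mentioned:

```python
# Cores needed to checksum a saturated 10 GbE link at an assumed
# single-core hash throughput of 800 MB/s.
link_bytes_per_s = 10e9 / 8      # 10 Gb/s = 1.25 GB/s of payload, ignoring framing overhead
hash_bytes_per_s = 800e6         # assumed per-core hash rate
cores = link_bytes_per_s / hash_bytes_per_s
print(round(cores, 2))           # 1.56 -- "two entire cores" once rounded up
```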

If a filesystem is using CRC32 for something, it doesn't need the properties of a cryptographic hash, or they are doing it wrong. I can see how you could argue against CRC on reliability grounds, though.

I am not sure you actually risk an undetected corrupt block every 1 in 2^32 blocks. Most blocks won't be corrupt, so the CRC only has to detect a much smaller number of errors. Even if every block had an error needing detection, you would only miss an error once every 16 terabytes (assuming 4 KiB blocks and an ideal CRC that misses 1 in 2^32 errors). If 1% of blocks are corrupt, you would miss an error every 1.6 petabytes. Maybe I am thinking about this wrong, and I recall other factors, like block size, affecting CRC's reliability.
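The arithmetic behind those figures, assuming 4 KiB blocks and a uniform 2^-32 miss probability per corrupt block (an idealization; real CRC32 guarantees vary with error pattern and block size):

```python
# Expected bytes read between undetected errors under the idealized model.
block_bytes = 4096               # assumed 4 KiB filesystem block
misses_per_corrupt_block = 2**-32

# Worst case: every block is corrupt.
bytes_per_miss_all_corrupt = block_bytes / misses_per_corrupt_block   # 2^44 bytes

# More realistic: 1% of blocks are corrupt.
bytes_per_miss_1pct = bytes_per_miss_all_corrupt / 0.01

print(bytes_per_miss_all_corrupt / 2**40,   # 16.0  (TiB)
      bytes_per_miss_1pct / 2**50)          # 1.5625 (PiB)
```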




