
Something that has always bothered me about z-lib and libgen and so on is how impossible it is to tell the quality of the different uploads apart.

Sometimes the epubs are very good, with a good cover and metadata as well as chapters and no weird aberrations; other times, they're unreadable. But the only things you have to judge quality by are the size and the general description of the upload, which on this website have been reduced so that everything looks the same.

I think the website looks good, but yeah, it would be nice if we could rate quality in some way.



Large PDF and DjVu files generally mean low-quality scans.


I think that really depends on the type of book. I've found the large PDFs to be by far the most reliable for books printed before the 20th century because - save for Google Books - no one's OCR is tuned for the kinds of printing errors, age-related wear and tear, and fonts used over the centuries.

Manuscripts (which are admittedly very niche) often have to be high resolution scans or photographs just to be readable, but zlib and libgen don't really have many of those.


Surely books published before the 20th century are in the public domain by now, though?


They are, but someone has to be in possession of the book and go through the trouble of digitizing it. Usually that's museums, libraries, and private collections, and a variety of enthusiasts eventually upload from those sources to Zlib/libgen. Since no one there cares about copyright anyway, it can be more complete in some genres than other central databases.


That varies, depending on a number of factors.

A straight-text book can run as little as 500--750 KiB for 100--200 pages or so.

ePubs may run smaller, as they're a compressed archive of HTML files, so how much larger than the straight text they end up is largely a matter of the additional markup and stylesheets applied. I've generated PDFs from Markdown which are fairly comparable in size.
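For what it's worth, a minimal sketch of that Markdown comparison, shelling out to pandoc (this assumes pandoc and a LaTeX engine are installed; the file names are placeholders, not anything from a real collection):

    # Render the same Markdown source to PDF and ePub, then compare sizes.
    import os
    import subprocess

    src = "notes.md"  # hypothetical Markdown source file

    for out in ("notes.pdf", "notes.epub"):
        subprocess.run(["pandoc", src, "-o", out], check=True)
        print(f"{out}: {os.path.getsize(out) / 1024:.0f} KiB")

For plain prose the two outputs usually land in the same ballpark, which is the point above: it's the markup, styling, and images that drive the size, not the text.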

Most mainstream trade press books run about 3--6 MiB as PDFs, if they have few images or graphics.

Books with a heavy graphics content can easily swell to 30--300 MiB.

And scanned rather than generated PDFs tend to run to similarly large sizes. Many of those scans are, however, of excellent quality.

The largest ebook within my own collection for quite some time was a copy of Lyell's Geography, downloaded from the Internet Archive and based on a library scan of a 19th-century printing. On a colour tablet it's actually pretty nice reading; on a B&W e-ink tablet, such scans often have significant background ghosting which may or may not be eliminated by contrast adjustments in your reader.

For newer materials, I've seen any number of issues with ebooks:

- Clean single-page scans with OCR'd text for highlighting and copy/paste are generally my favourite. There's something about original print layouts that I still find preferable to ePubs and other digital-native formats.

- Native PDFs can also be quite good, though they're far more prone to designeritis, where someone has thought they could improve on ink-on-paper conventions. If you think you can do this, you are almost certainly wrong.

- ePubs rerendered as PDFs are among the worst options. I prefer PDFs generally, but not in this case.

- ePubs ... can ... be reasonably good, though again the principal problem is excessive flexibility for designers. Less is more. I really miss having a single consistent layout of the text, rather than something which reflows as font faces, sizes, and spacing are changed. As a saving grace, cringeworthy (and migraine-inducing) font choices can be overridden with The One True Serif Font.

- Various low-quality scans. 2-up, lots of page skew and placement variation, heavily-marked texts, and such. I'll still almost always prefer these to an ePub or other digital-native format, though the reading can be much harder.

- Tiny-font 3-column scientific publications. Even on my 13" e-ink reader, these can be a challenge, and I'll occasionally resort to a sub-page rendering (this is natively supported in the Onyx NeoReader and several other e-book readers). Why such publications continue to insist on this format as we approach the 2nd quarter of the 21st century I've no idea at all. Scans based on older journal pubs (from ~1950s -- 1980s or so) with both physical wear and scan artefacts can be especially challenging.


The books are free; if you download the wrong version of a book, you can simply try another, and it costs you no more than a minute of your time.

Do you know how it worked before we had the internet for books? You had to go to a library yourself, which may have taken an hour of your day just in the journey, and then hope the library had the book you wanted. If not, you might have had to wait weeks for another library in the system to mail the book to your library.


I'm describing a problem; the fact that the books are free and that the problem causes only a minor inconvenience doesn't make it not a problem. It's important to be aware of these things.


One mitigating factor is that it’s very easy to “correct” all but the very worst ebooks in Calibre:

- Metadata and a cover are one click away

- Removing excessive spacing, padding, etc. takes a few clicks (or can be scripted; see the sketch after this list)
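For batches, the same fixes can be scripted with Calibre's command-line tools. A rough sketch, assuming ebook-meta and ebook-convert are on your PATH; the file names, title, author, and cover are placeholders, and it's worth checking ebook-convert --help for the options in your version:

    # Batch-"correct" a downloaded epub with Calibre's CLI tools.
    import subprocess

    book = "downloaded.epub"         # hypothetical input file
    fixed = "downloaded-fixed.epub"  # hypothetical cleaned-up output

    # Fix the title/author metadata and attach a cover image in place.
    subprocess.run(
        ["ebook-meta", book,
         "--title", "Correct Title",
         "--authors", "Correct Author",
         "--cover", "cover.jpg"],
        check=True,
    )

    # Re-convert epub -> epub, stripping the extra blank space between
    # paragraphs that a lot of uploads carry.
    subprocess.run(
        ["ebook-convert", book, fixed, "--remove-paragraph-spacing"],
        check=True,
    )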


Pirates being picky, eh?

Don't get me wrong, I do appreciate the access that libgen and z-lib provide and think on the whole it is beneficial to society. It is still not paying for people's work though.


As a counterpoint, have you looked at your local library's ebook offerings? Publishers have created a situation where digital books are significantly more expensive than physical copies and expire far more quickly. We're also just coming out of a two-year period where physical libraries effectively didn't exist, and most library systems hadn't expanded their digital offerings to meet demand.

I still buy books, but these sites have largely replaced traditional libraries for my partner and me, because they allow us to read what we actually want instead of Oprah's book club titles or the 50 digital self-help guides my local library stocks with its limited resources. I recognize that they're simply serving what most people want, but the alternative for me is just not reading recreationally.

The other side of this conversation is that even if we did get rid of pirate libraries there would still be huge, unresolved issues with author compensation. There's been a dramatic decline in how much authors (and the other people directly involved) take home over the past couple decades [1] that has little to do with the marginal number of pirate readers.

[1] https://authorsguild.org/news/authors-guild-survey-shows-dra...


A big reason piracy exists is that the functionality of these systems is better than that of the official channels. There's a reason streaming services led to a dramatic decrease in piracy. There's also a reason piracy is making a return. Yeah, pirates are picky, and they push other services forward.



