
Could we make a method to sanitize PDFs that preserves the metadata?

It would be better to strip active content such as JavaScript and actions without flattening the PDF and losing all the text data. Keeping the original text is better than sending the file through OCR again.
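To make the idea concrete, here is a rough sketch of what I mean. It assumes the PDF has already been parsed into plain nested dicts and lists (a stand-in for whatever object model a real PDF library exposes), with no cycles or indirect references. The function name `strip_active` and the two key sets are my own, and the key lists are surely not exhaustive. The point is only that you can walk the object tree and drop the active entries while leaving the content streams, and so the text, alone.

```python
# Sketch only: operates on a simplified dict/list model of a parsed PDF,
# not on a real PDF file. Names below are illustrative assumptions.

# Dictionary keys that hold active content (not an exhaustive list).
ACTIVE_KEYS = {"/OpenAction", "/AA", "/JavaScript", "/JS"}

# Action types (the /S entry of an action dictionary) worth dropping.
ACTIVE_ACTION_TYPES = {"/JavaScript", "/Launch", "/SubmitForm", "/ImportData"}


def strip_active(obj):
    """Return a copy of obj with active-content entries removed."""
    if isinstance(obj, dict):
        cleaned = {}
        for key, value in obj.items():
            # Drop document/page-level triggers and script entries outright.
            if key in ACTIVE_KEYS:
                continue
            # Drop an annotation's /A action only if its type is risky,
            # so ordinary links such as /URI survive.
            if (
                key == "/A"
                and isinstance(value, dict)
                and value.get("/S") in ACTIVE_ACTION_TYPES
            ):
                continue
            cleaned[key] = strip_active(value)
        return cleaned
    if isinstance(obj, list):
        return [strip_active(item) for item in obj]
    # Strings, numbers, content streams etc. pass through untouched.
    return obj
```

A real implementation would also have to handle indirect references, cyclic structures such as /Parent links, embedded files, and form fields, but the page content itself never needs to be rasterized.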


