
> How do you rank results?

There are a ton of factors.

https://github.com/MarginaliaSearch/MarginaliaSearch/blob/ma...

> Can you give some rough indications of how many pages you index in total?

I index around 300 million documents right now, though I crawl something like 1.4 billion (and could index them all). The search engine is pretty judicious about filtering out low-quality pages, mostly because this improves the search results.

> How many pages do you crawl each day?

I don't know if I have a good answer for that. In general the crawling isn't really much of a bottleneck. I try to refresh the index completely every ~8 weeks, and also have some capabilities for discovering recent changes via RSS feeds.
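For a rough sense of scale, here is a back-of-the-envelope sketch using only the figures quoted above. It assumes, purely for illustration, that every crawled document is refetched once per ~8-week cycle and that the load is spread evenly. The comment only says the index is refreshed on that schedule, so treat the result as a ballpark figure, not a measured crawl rate. The function name is made up for this example.

```python
# Figures quoted in the comment above.
CRAWLED_DOCS = 1_400_000_000   # "something like 1.4 billion"
INDEXED_DOCS = 300_000_000     # "like 300 million"
REFRESH_WEEKS = 8              # "every ~8 weeks"

SECONDS_PER_DAY = 24 * 60 * 60


def implied_daily_rate(docs: int, weeks: int) -> float:
    """Average documents per day needed to revisit `docs` once every `weeks` weeks.

    Assumption (not from the comment): the work is spread evenly over the cycle.
    """
    return docs / (weeks * 7)


crawl_per_day = implied_daily_rate(CRAWLED_DOCS, REFRESH_WEEKS)
index_per_day = implied_daily_rate(INDEXED_DOCS, REFRESH_WEEKS)
crawl_per_second = crawl_per_day / SECONDS_PER_DAY

print(f"crawl: {crawl_per_day:,.0f} docs/day ({crawl_per_second:,.0f} docs/s)")
print(f"index: {index_per_day:,.0f} docs/day")
```

Under those assumptions the crawl works out to roughly 25 million documents per day, or a few hundred per second on average.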

> Size of the machine(s) in RAM and HDD?

It's an EPYC 7543 x2 SMP machine with 512 GB RAM and something like 90 TB disk space, all NVMe storage.


