This sounds very promising, but let me ask an honest question: to me, databases seem like the hardest part to scale in your average IT infrastructure. How much load does it add to the database if you have it do all the ML-related work as well? And how much is actually saved by reducing the number of necessary queries?
Contrary to some of the sibling responses, my experience with pgvector specifically (with hundreds of millions or billions of vectors) is that the workload is quite different from your typical web-app workload, enough so that you really want them on separate databases. For example, you have to be really careful about how vacuum/autovacuum interacts with pgvector’s HNSW indices if you’re frequently updating data; you have to be aware that the tables and indices are huge and take up a ton of memory, which can have knock-on performance implications for other systems; etc.
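For anyone running into this, here's a rough sketch of the kind of per-table tuning that ends up being necessary, assuming a hypothetical items table with an embedding column (every name and value here is a placeholder to adjust for your own workload, not a recommendation):

```sql
-- Hypothetical table with a pgvector column and an HNSW index.
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE items (
    id        bigserial PRIMARY KEY,
    embedding vector(768)
);

-- HNSW build parameters are illustrative only; both the index and the
-- table can get very large, which is where the memory pressure comes from.
CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);

-- With frequent updates, dead tuples pile up and the index keeps stale
-- entries until vacuum gets to them, so per-table autovacuum settings
-- usually need to be much more aggressive than the defaults.
ALTER TABLE items SET (
    autovacuum_vacuum_scale_factor = 0.02,
    autovacuum_vacuum_cost_delay   = 2
);
```

The point isn't the specific numbers; it's that this kind of tuning fights with whatever your OLTP tables want, which is why I'd split the databases.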
This is a read workload that can easily be scaled horizontally. The reduction in dev and infrastructure complexity is well worth the slight increase in DB provisioning.
You can use PL/Python to make API calls out of the database; you just don't need a separate service sitting in front of the DB to orchestrate all your ML work, only the model endpoints themselves.
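A minimal sketch of what that looks like, assuming plpython3u and pgvector are installed; the function name, the ml-service.internal URL, and the response shape are all made up stand-ins for your real endpoint:

```sql
CREATE EXTENSION IF NOT EXISTS plpython3u;

CREATE OR REPLACE FUNCTION embed_text(input_text text)
RETURNS vector
LANGUAGE plpython3u
AS $$
import json, urllib.request

# Call the (hypothetical) external embedding service.
req = urllib.request.Request(
    "http://ml-service.internal/embed",
    data=json.dumps({"text": input_text}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req, timeout=5) as resp:
    embedding = json.loads(resp.read())["embedding"]

# PL/Python passes the str() of the return value to the target type's
# input function, and pgvector accepts the '[x,y,...]' text form.
return "[" + ",".join(str(x) for x in embedding) + "]"
$$;

-- Usage: enrich rows without a separate orchestration service, e.g.
-- UPDATE items SET embedding = embed_text(body) WHERE embedding IS NULL;
```

The tradeoff to keep in mind is that the HTTP call runs synchronously inside the backend process, so a slow endpoint ties up that connection (and its transaction) for the duration.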