Downloading the MLX version of "Qwen2.5-VL-32b-Instruct-8bit" via LM Studio right now, since it's not yet available on Ollama and I can run it locally. I have an OCR side project for it to work on and want to see how performant it is on my M4. Will report back.
Its errors are interesting (averaging around one per paragraph). The output is semantically correct but wrong on precision. Simple example: the English word "ardour" is transcribed as "ardor", and a foreign word like "palazzo", which is intended to remain as-is, gets translated to "palace". I'm still messing with temp/presence/frequency/top-p/top-k/prompting to see if I can squeeze some more precision out of it, but I'm running out of time.
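For anyone who wants to count this kind of error instead of eyeballing it, here is a minimal sketch using Python's standard difflib to list word-level differences between a known-good transcription and the model's output. The sample sentence is made up for illustration (only the "ardour"/"palazzo" substitutions come from my test), and `word_substitutions` is just a name I picked.

```python
import difflib


def word_substitutions(reference: str, ocr_output: str) -> list[tuple[str, str]]:
    """Return (expected, got) pairs wherever the OCR text differs from the reference."""
    ref_words = reference.split()
    ocr_words = ocr_output.split()
    # Compare word sequences rather than characters, so each diff is a whole word.
    matcher = difflib.SequenceMatcher(a=ref_words, b=ocr_words, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            diffs.append((" ".join(ref_words[i1:i2]), " ".join(ocr_words[j1:j2])))
    return diffs


# Made-up sentence; only the two substitutions mirror what I actually saw.
reference = "He spoke with ardour of the old palazzo by the river."
ocr_output = "He spoke with ardor of the old palace by the river."

for expected, got in word_substitutions(reference, ocr_output):
    print(f"expected {expected!r}, got {got!r}")
```

Running this per paragraph gives a rough error count to compare across sampling settings, though it treats punctuation attached to a word as part of that word.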
Not sure if it matters, but I exported a PDF page as a PNG at 200 dpi and used that.
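To put the 200 dpi figure in perspective, a rough back-of-the-envelope calculation: PDF units are points (1/72 inch), so pixel size is just points / 72 * dpi. The page size and font size below are assumptions (US Letter, 10 pt body text), not what my actual page used, and the function names are only for illustration.

```python
POINTS_PER_INCH = 72  # PDF user-space units: 1 point = 1/72 inch


def rendered_pixels(width_pt: float, height_pt: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions of a page rasterized at the given dpi."""
    return (
        round(width_pt / POINTS_PER_INCH * dpi),
        round(height_pt / POINTS_PER_INCH * dpi),
    )


def nominal_text_height_px(font_size_pt: float, dpi: int) -> float:
    """Nominal em height of text in pixels (actual glyphs are somewhat smaller)."""
    return font_size_pt / POINTS_PER_INCH * dpi


# Assumed US Letter page (612 x 792 pt) and 10 pt text.
print(rendered_pixels(612, 792, 200))              # (1700, 2200)
print(round(nominal_text_height_px(10, 200), 1))   # 27.8
```

So under those assumptions the text should be plenty large in the image, which would fit with the errors being substitutions and not misread glyphs.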
It seems like it's reading the text but getting the details wrong.
I would not be comfortable using this in an official capacity without more accuracy. I could see using this for words that another OCR system is uncertain about, though, as a fallback.
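A minimal sketch of what I mean by fallback, assuming the primary OCR engine reports a per-word confidence between 0 and 1. Everything here is hypothetical: `OcrWord`, `apply_fallback`, the 0.8 threshold, and the stub standing in for a call to the vision model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class OcrWord:
    text: str
    confidence: float  # 0.0 to 1.0, as reported by the primary OCR engine


def apply_fallback(
    words: list[OcrWord],
    second_opinion: Callable[[int, OcrWord], str],
    threshold: float = 0.8,
) -> list[str]:
    """Keep confident words as-is; ask the fallback only about uncertain ones."""
    result = []
    for index, word in enumerate(words):
        if word.confidence < threshold:
            result.append(second_opinion(index, word))
        else:
            result.append(word.text)
    return result


# Stub in place of a real vision-model call: a lookup keyed by word position.
vlm_readings = {1: "river"}


def stub_vlm(index: int, word: OcrWord) -> str:
    return vlm_readings.get(index, word.text)


primary = [OcrWord("The", 0.99), OcrWord("r1ver", 0.41), OcrWord("rose", 0.97)]
merged = apply_fallback(primary, stub_vlm)
print(" ".join(merged))
```

The point of the design is that the model only ever touches words the primary engine already flagged, so its habit of "improving" correct words can't spread to the rest of the page.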