I've run across some comments saying Qwen2.5-VL is really good at handwriting recognition.
It'd also be interesting to see how Tesseract compares when trying to OCR more mixed text+graphic media. Some possible examples: high-design magazines with color backgrounds, TikTok posts, maps, cardboard hold-up signs at political gatherings.
Is there an advantage to using an LLM here?