Actually, in this case that's not exactly true:

> generation of 281,128 augmented...

littlestymaar · 2025-09-30T18:47:03

> All examples are already correlated because they are generated in the same way.

All examples of “document information extraction” would be correlated no matter where they came from, because they would all be “document information extraction” examples…

The real question is whether or not the examples are representative of the broad “document information extraction” use-case.