Back to Articles
Using Rejection Pairs From Your Model's Own Failures The Loop Survives Fine-Tuning The Design Decision: Degenerate Outputs as Rejection Pairs Consistent Across Five Model Families The Pattern Beyond OCR Sources
Using Rejection Pairs From Your Model's Own Failures
In April, we released DharmaOCR, our specialized structured OCR model (available on Hugging Face) along with a paper detailing the methodology behind it and a benchmark demonstrating its superior quality and cost efficiency.
The paper benchmarked leading vision-language model families - both open-source and commercial - on a structured document extraction task: OCR on Brazilian Portuguese text. Among the reported metrics was text degeneration rate: the frequency with which a model produces a repetition loop instead of a transcription.






