Datalab has released lift, a 9B open-weights vision model for structured extraction. You pass it a JSON schema, and it returns a JSON object that conforms to it. The model reads PDFs and images directly, then decodes against your schema.
This is Datalab’s first model built purely for extraction. The team already ships open-source OCR tools: chandra, marker, and surya. lift extends that work into schema-driven field extraction.
lift scores 90.2% field accuracy on Datalab’s 225-document benchmark. The research team reports it as the strongest small self-hostable model they tested. It runs at a median of 9.5 seconds per document.
What is Datalab lift?
lift is a 9B-parameter vision model for structured extraction. It accepts standard JSON Schema as input. It returns valid JSON of that shape as output.
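To make the schema-in, JSON-out contract concrete, here is a minimal Python sketch. It does not call lift, since the way the model is invoked is not covered here. The invoice schema, its field names, the sample output string, and the `conforms` helper are all hypothetical, and the helper checks only a small subset of JSON Schema (`type`, `properties`, `required`, `items`).

```python
import json

# Hypothetical schema for an invoice; the field names are illustrative only.
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total": {"type": "number"},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "amount": {"type": "number"},
                },
                "required": ["description", "amount"],
            },
        },
    },
    "required": ["invoice_number", "total"],
}

# Maps JSON Schema type names to the Python types json.loads produces.
_TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "null": type(None),
}


def conforms(value, schema):
    """Check a small subset of JSON Schema: type, properties, required, items."""
    expected = schema.get("type")
    if expected is not None:
        # bool is a subclass of int in Python, so reject it for non-boolean types.
        if isinstance(value, bool) and expected != "boolean":
            return False
        if not isinstance(value, _TYPES[expected]):
            return False
    if isinstance(value, dict):
        # Every required key must be present.
        if any(key not in value for key in schema.get("required", [])):
            return False
        # Each declared property that is present must match its sub-schema.
        for key, sub_schema in schema.get("properties", {}).items():
            if key in value and not conforms(value[key], sub_schema):
                return False
    if isinstance(value, list) and "items" in schema:
        return all(conforms(item, schema["items"]) for item in value)
    return True


# Made-up stand-in for the JSON text a schema-driven extractor might return.
sample_output = (
    '{"invoice_number": "INV-0042", "total": 118.5, '
    '"line_items": [{"description": "Widget", "amount": 118.5}]}'
)
record = json.loads(sample_output)
print(conforms(record, INVOICE_SCHEMA))
```

In practice a full JSON Schema validator would replace the hand-rolled check. The sketch only illustrates what "valid JSON of that shape" means for nested objects and arrays.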