Datalab Releases lift: A 9B Open-Weights Vision Model That Extracts Structured JSON From PDFs Using Schemas

Datalab has released lift, a 9B open-weights vision model for structured extraction. You pass it a JSON schema, and it returns a JSON object that conforms to that schema. The model reads PDFs and images directly, then decodes its output against your schema.
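To make the schema-in, JSON-out contract concrete, here is a minimal Python sketch of the caller's side: a JSON Schema you might write for an invoice, its serialized form, and a check on a returned JSON string. The invoice fields and the sample response are hypothetical, and the sketch does not call lift itself; how the schema and document are actually submitted to the model is not covered here.

```python
import json

# Hypothetical schema for an invoice; the field names are illustrative only.
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "issue_date": {"type": "string"},
        "total": {"type": "number"},
    },
    "required": ["invoice_number", "total"],
}


def schema_payload(schema: dict) -> str:
    """Serialize the schema to the JSON text you would hand to an extraction model."""
    return json.dumps(schema)


def parse_response(raw: str, schema: dict) -> dict:
    """Parse a model response and confirm every required top-level field is present."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in schema.get("required", []) if key not in data]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return data


if __name__ == "__main__":
    # A made-up response standing in for what a schema-constrained model might return.
    raw = '{"invoice_number": "INV-001", "issue_date": "2024-01-31", "total": 1250.0}'
    result = parse_response(raw, INVOICE_SCHEMA)
    print(result["total"])
```

The required-field check is deliberately shallow; it only confirms the top-level keys exist, which is the cheapest useful guard before handing the object to downstream code.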

This is Datalab’s first model built purely for extraction. The team already ships open-source OCR tools: chandra, marker, and surya. lift extends that work into schema-driven field extraction.

lift scores 90.2% field accuracy on Datalab’s 225-document benchmark. Datalab’s research team reports it as the strongest of the small, self-hostable models they tested, with a median runtime of 9.5 seconds per document.
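The announcement does not spell out how field accuracy is scored. One common reading, assumed here, is the share of labeled schema fields whose extracted value exactly matches the reference answer, pooled across all documents. The sketch below computes that figure, plus a median per-document latency, on made-up toy data; the function name and data are hypothetical, and this is not Datalab's benchmark harness.

```python
from statistics import median


def field_accuracy(predictions: list, references: list) -> float:
    """Pooled exact-match accuracy over every field in the references (assumed metric)."""
    if len(predictions) != len(references):
        raise ValueError("need one prediction per reference document")
    correct = total = 0
    for pred, ref in zip(predictions, references):
        for key, expected in ref.items():
            total += 1
            # A missing or differing value counts as a miss.
            correct += pred.get(key) == expected
    return correct / total if total else 0.0


# Toy data: two documents with two labeled fields each (not benchmark data).
REFERENCES = [{"vendor": "Acme", "total": 120.0}, {"vendor": "Globex", "total": 75.5}]
PREDICTIONS = [{"vendor": "Acme", "total": 120.0}, {"vendor": "Globex", "total": 57.5}]
LATENCIES_S = [4.0, 6.0, 20.0]  # seconds per document, made up

if __name__ == "__main__":
    print(f"field accuracy: {field_accuracy(PREDICTIONS, REFERENCES):.1%}")  # 3 of 4 fields
    print(f"median latency: {median(LATENCIES_S)} s")
```

Pooling fields across documents weights schema-heavy documents more than averaging per-document scores would, and a median latency is less sensitive to a single slow document than the mean; either choice changes the headline number, so it is worth confirming which one a benchmark uses.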

What is Datalab lift?

lift is a 9B-parameter vision model for structured extraction. It takes a standard JSON Schema as input and returns valid JSON that conforms to that schema.
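If you run the model yourself, a defensive check on responses before they reach downstream code is a cheap design choice. The sketch below is a hand-rolled validator for a small subset of JSON Schema (`type`, `properties`, `required`, `items`), written with the standard library only so it stays self-contained; the `conforms` helper and the invoice schema are hypothetical. A complete validator such as the third-party `jsonschema` package covers far more of the specification.

```python
import json


def conforms(value, schema: dict) -> bool:
    """Check value against a subset of JSON Schema: type, properties, required, items."""
    expected = schema.get("type")
    if expected == "object":
        if not isinstance(value, dict):
            return False
        if any(key not in value for key in schema.get("required", [])):
            return False
        props = schema.get("properties", {})
        # Only validate properties that are actually present in the value.
        return all(conforms(value[k], sub) for k, sub in props.items() if k in value)
    if expected == "array":
        if not isinstance(value, list):
            return False
        item_schema = schema.get("items", {})
        return all(conforms(item, item_schema) for item in value)
    if expected == "string":
        return isinstance(value, str)
    if expected == "number":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if expected == "integer":
        if isinstance(value, bool):
            return False
        return isinstance(value, int) or (isinstance(value, float) and value.is_integer())
    if expected == "boolean":
        return isinstance(value, bool)
    if expected == "null":
        return value is None
    # No "type" keyword, or a form this subset does not handle: accept without checking.
    return True


# Hypothetical schema with a nested array of line items.
SCHEMA = {
    "type": "object",
    "properties": {
        "vendor": {"type": "string"},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {"description": {"type": "string"}, "amount": {"type": "number"}},
                "required": ["amount"],
            },
        },
    },
    "required": ["vendor", "line_items"],
}

if __name__ == "__main__":
    good = json.loads('{"vendor": "Acme", "line_items": [{"description": "Widgets", "amount": 40.5}]}')
    bad = json.loads('{"vendor": "Acme", "line_items": [{"description": "Widgets", "amount": "40.5"}]}')
    print(conforms(good, SCHEMA), conforms(bad, SCHEMA))
```

The second response fails because `amount` arrives as a string rather than a number, the kind of shape error a schema check catches before it corrupts a downstream total.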