2025, Dec 15 15:00
How to Fix Empty Docling Exports: Treat DoclingDocument.load_from_doctags as a Factory Method
Seeing a correct DocTags string but an empty Docling export? Assign the result of DoclingDocument.load_from_doctags, and understand why it matters that the method is a static constructor.
Exporting structured text after a successful multimodal run can be unexpectedly tricky. A common pitfall when converting SmolDocling output to a Docling document is ending up with an empty export even though the DocTags payload looks correct and no error is thrown. Below is a minimal, reproducible path to that behavior and how to fix it.
Reproducing the issue
The pipeline generates valid DocTags from a JPEG that contains a table. Decoding the tokens shows a proper DocTags string. The empty output happens at the final conversion step before exporting.
import torch
from transformers import AutoConfig, AutoProcessor
from transformers.image_utils import load_image
import onnxruntime
import numpy as np
import os
from docling_core.types.doc import DoclingDocument
from docling_core.types.doc.document import DocTagsDocument
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["ORT_CUDA_USE_MAX_WORKSPACE"] = "1"
# 1. Load models
model_ref = "ds4sd/SmolDocling-256M-preview"
cfg = AutoConfig.from_pretrained(model_ref)
proc = AutoProcessor.from_pretrained(model_ref)
# 2. ONNX sessions (CPU)
sess_vis = onnxruntime.InferenceSession("./models/smoldocling/vision_encoder.onnx")
sess_tok = onnxruntime.InferenceSession("./models/smoldocling/embed_tokens.onnx")
sess_dec = onnxruntime.InferenceSession("./models/smoldocling/decoder_model_merged.onnx")
# 3. Config values
kv_heads = cfg.text_config.num_key_value_heads
dim_head = cfg.text_config.head_dim
layers_hidden = cfg.text_config.num_hidden_layers
id_eos = cfg.text_config.eos_token_id
id_imgtok = cfg.image_token_id
id_eou = proc.tokenizer.convert_tokens_to_ids("<end_of_utterance>")
# 4. Inputs
chat_msgs = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Convert this page to docling."}
        ]
    },
]
pic = load_image("./data/image-with-table.jpeg")
tmpl = proc.apply_chat_template(chat_msgs, add_generation_prompt=True)
batch_inputs = proc(text=tmpl, images=[pic], return_tensors="np")
bsz = batch_inputs["input_ids"].shape[0]
mem_kv = {
    f"past_key_values.{layer}.{kv}": np.zeros([bsz, kv_heads, 0, dim_head], dtype=np.float32)
    for layer in range(layers_hidden)
    for kv in ("key", "value")
}
img_feats = None
ids_in = batch_inputs["input_ids"]
mask_attn = batch_inputs["attention_mask"]
pos_idx = np.cumsum(batch_inputs["attention_mask"], axis=-1)
# 5. Generation loop
max_new = 8192
out_tokens = np.array([[]], dtype=np.int64)
for _ in range(max_new):
    embeds = sess_tok.run(None, {"input_ids": ids_in})[0]
    if img_feats is None:
        img_feats = sess_vis.run(
            ["image_features"],
            {
                "pixel_values": batch_inputs["pixel_values"],
                "pixel_attention_mask": batch_inputs["pixel_attention_mask"].astype(np.bool_)
            }
        )[0]
        embeds[batch_inputs["input_ids"] == id_imgtok] = img_feats.reshape(-1, img_feats.shape[-1])
    logits, *present = sess_dec.run(None, dict(
        inputs_embeds=embeds,
        attention_mask=mask_attn,
        position_ids=pos_idx,
        **mem_kv,
    ))
    ids_in = logits[:, -1].argmax(-1, keepdims=True)
    mask_attn = np.ones_like(ids_in)
    pos_idx = pos_idx[:, -1:] + 1
    for j, key in enumerate(mem_kv):
        mem_kv[key] = present[j]
    out_tokens = np.concatenate([out_tokens, ids_in], axis=-1)
    if (ids_in == id_eos).all() or (ids_in == id_eou).all():
        break
# 6. Decode to DocTags
doc_markup = proc.batch_decode(out_tokens, skip_special_tokens=False)[0].lstrip()
print(doc_markup) # Visible, looks correct
# 7. Build DocTagsDocument and try to export
dt_doc = DocTagsDocument.from_doctags_and_image_pairs([doc_markup], [pic])
print(doc_markup)
paper = DoclingDocument(name="Document")
paper.load_from_doctags(
    doctag_document=dt_doc,
    document_name="Document"
)
print(paper.export_to_markdown())  # Empty string
What is actually going on
The conversion call that looks like a mutating method is not mutating anything. DoclingDocument.load_from_doctags is a static constructor. It returns a new, populated document instance and does not populate the instance you called it on. Because the original object remains untouched, exporting from it yields an empty string without warnings or errors.
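The same behavior can be reproduced without any model or Docling dependency. The sketch below uses a hypothetical ToyDocument class with a hypothetical load_from_tags static method; neither is part of docling-core. They only mimic the shape of the problem: a static factory called on an existing instance, with its return value discarded.

```python
class ToyDocument:
    """Hypothetical stand-in for a document class; not part of docling-core."""

    def __init__(self, name="Document"):
        self.name = name
        self.items = []

    @staticmethod
    def load_from_tags(tags, name="Document"):
        # A static method receives no `self`, so it cannot populate the
        # instance it happens to be called on. It builds a new object.
        doc = ToyDocument(name=name)
        doc.items.extend(tags)
        return doc

    def export(self):
        return "\n".join(self.items)


# Buggy pattern: call the factory on an instance and drop the result.
stale = ToyDocument()
stale.load_from_tags(["<table>"])

# Correct pattern: call the factory on the class and keep the result.
fresh = ToyDocument.load_from_tags(["<table>"])

print(repr(stale.export()))  # ''
print(repr(fresh.export()))  # '<table>'
```

Python allows a static method to be invoked through an instance, which is why the buggy call raises no error: it runs, returns a populated object, and that object is immediately thrown away.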
The fix
Treat load_from_doctags as a factory and capture the returned document. Do not call it on a pre-created instance and expect that instance to be populated in place.
paper = DoclingDocument.load_from_doctags(
    doctag_document=dt_doc,
    document_name="Document"
)
print(paper.export_to_markdown())
Why this matters
This kind of silently discarded result is easy to miss in data extraction pipelines, especially when earlier steps visibly produce good intermediate artifacts like DocTags. Recognizing that some APIs expose static constructors with instance-like names helps avoid chasing phantom bugs in model inference, ONNX sessions, token decoding, or image preprocessing when the real issue is a non-mutating call.
Takeaway
If you see a correct DocTags string but an empty export, verify that you are assigning the result of DoclingDocument.load_from_doctags to a new document variable. Once you capture that return value, exporting to markdown should reflect the content you generated.