https://pytroubles.com/en/posts/id2485-fixing-empty-docling-exports-doctags-to-markdown-fails-when-load-from-doctags-isn-t-assigned

Fixing Empty Docling Exports: DocTags to Markdown Fails When load_from_doctags isn't Assigned

How to Fix Empty Docling Exports: Treat DoclingDocument.load_from_doctags as a Factory Method


Seeing a correct DocTags string but an empty Docling export? Assign the result of DoclingDocument.load_from_doctags, and learn why its being a static constructor matters.

2025-12-15T15:00:12+03:00

2025-12-15T15:00:13+03:00

Exporting structured text after a successful multimodal run can be unexpectedly tricky. A common pitfall when converting SmolDocling output to a Docling document is ending up with an empty export even though the DocTags payload looks correct and no error is thrown. Below is a minimal, reproducible path to that behavior and how to fix it.


Reproducing the issue

The pipeline generates valid DocTags from a JPEG that contains a table. Decoding the tokens shows a proper DocTags string. The empty output happens at the final conversion step before exporting.

import os
# Set thread limits before importing numpy/onnxruntime so the runtimes pick them up
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["ORT_CUDA_USE_MAX_WORKSPACE"] = "1"
import numpy as np
import onnxruntime
from transformers import AutoConfig, AutoProcessor
from transformers.image_utils import load_image
from docling_core.types.doc import DoclingDocument
from docling_core.types.doc.document import DocTagsDocument
# 1. Load config and processor
model_ref = "ds4sd/SmolDocling-256M-preview"
cfg = AutoConfig.from_pretrained(model_ref)
proc = AutoProcessor.from_pretrained(model_ref)
# 2. ONNX sessions (CPU)
sess_vis = onnxruntime.InferenceSession("./models/smoldocling/vision_encoder.onnx")
sess_tok = onnxruntime.InferenceSession("./models/smoldocling/embed_tokens.onnx")
sess_dec = onnxruntime.InferenceSession("./models/smoldocling/decoder_model_merged.onnx")
# 3. Config values
kv_heads = cfg.text_config.num_key_value_heads
dim_head = cfg.text_config.head_dim
layers_hidden = cfg.text_config.num_hidden_layers
id_eos = cfg.text_config.eos_token_id
id_imgtok = cfg.image_token_id
id_eou = proc.tokenizer.convert_tokens_to_ids("<end_of_utterance>")
# 4. Inputs
chat_msgs = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Convert this page to docling."}
        ]
    },
]
pic = load_image("./data/image-with-table.jpeg")
tmpl = proc.apply_chat_template(chat_msgs, add_generation_prompt=True)
batch_inputs = proc(text=tmpl, images=[pic], return_tensors="np")
bsz = batch_inputs["input_ids"].shape[0]
mem_kv = {
    f"past_key_values.{layer}.{kv}": np.zeros([bsz, kv_heads, 0, dim_head], dtype=np.float32)
    for layer in range(layers_hidden)
    for kv in ("key", "value")
}
img_feats = None
ids_in = batch_inputs["input_ids"]
mask_attn = batch_inputs["attention_mask"]
pos_idx = np.cumsum(batch_inputs["attention_mask"], axis=-1)
# 5. Generation loop
max_new = 8192
out_tokens = np.array([[]], dtype=np.int64)
for _ in range(max_new):
    embeds = sess_tok.run(None, {"input_ids": ids_in})[0]
    if img_feats is None:
        img_feats = sess_vis.run(
            ["image_features"],
            {
                "pixel_values": batch_inputs["pixel_values"],
                "pixel_attention_mask": batch_inputs["pixel_attention_mask"].astype(np.bool_)
            }
        )[0]
        embeds[batch_inputs["input_ids"] == id_imgtok] = img_feats.reshape(-1, img_feats.shape[-1])
    logits, *present = sess_dec.run(None, dict(
        inputs_embeds=embeds,
        attention_mask=mask_attn,
        position_ids=pos_idx,
        **mem_kv,
    ))
    ids_in = logits[:, -1].argmax(-1, keepdims=True)
    mask_attn = np.ones_like(ids_in)
    pos_idx = pos_idx[:, -1:] + 1
    for j, key in enumerate(mem_kv):
        mem_kv[key] = present[j]
    out_tokens = np.concatenate([out_tokens, ids_in], axis=-1)
    if (ids_in == id_eos).all() or (ids_in == id_eou).all():
        break
# 6. Decode to DocTags
doc_markup = proc.batch_decode(out_tokens, skip_special_tokens=False)[0].lstrip()
print(doc_markup)  # Visible, looks correct
# 7. Build DocTagsDocument and try to export
dt_doc = DocTagsDocument.from_doctags_and_image_pairs([doc_markup], [pic])
paper = DoclingDocument(name="Document")
# Bug: load_from_doctags returns a NEW document; its return value is discarded here
paper.load_from_doctags(
    doctag_document=dt_doc,
    document_name="Document"
)
print(paper.export_to_markdown())  # Empty string

What is actually going on

The call that looks like a mutating method does not mutate anything. DoclingDocument.load_from_doctags is a static constructor: it builds and returns a new, populated document and leaves the instance you called it on unchanged. Python allows a static method to be invoked through an instance, so paper.load_from_doctags(...) runs without error and its return value is simply thrown away. Exporting from the original, still-empty paper therefore yields an empty string with no warning.
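To see the mechanics in isolation, here is a minimal pure-Python sketch. Document and load_from_text are hypothetical names standing in for DoclingDocument and load_from_doctags; the only assumption is that the real method, like this one, builds and returns a new object instead of modifying the instance it is called through.

```python
class Document:
    """Hypothetical stand-in for a document type with a static factory method."""

    def __init__(self, name: str, body: str = "") -> None:
        self.name = name
        self.body = body

    @staticmethod
    def load_from_text(text: str, name: str = "Document") -> "Document":
        # Builds and returns a NEW instance; never touches an existing one.
        return Document(name=name, body=text)

    def export(self) -> str:
        return self.body


doc = Document(name="Document")
doc.load_from_text("| a | b |")   # legal call through the instance, result discarded
print(repr(doc.export()))         # ''  (the original is still empty)

doc = Document.load_from_text("| a | b |")  # capture the returned instance
print(repr(doc.export()))         # '| a | b |'
```

The first call has the same shape as the buggy step 7 in the reproduction: valid Python, no exception, and no effect on doc.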

The fix

Treat load_from_doctags as a factory and capture the returned document. Do not call it on a pre-created instance and expect in-place population.

paper = DoclingDocument.load_from_doctags(
    doctag_document=dt_doc,
    document_name="Document"
)
print(paper.export_to_markdown())

Why this matters

This kind of silent no-op is easy to miss in data extraction pipelines, especially when earlier steps visibly produce good intermediate artifacts like DocTags. Recognizing that some APIs expose static constructors with instance-like names helps avoid chasing phantom bugs in model inference, ONNX sessions, token decoding, or image preprocessing when the real issue is a non-mutating call.
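One way to avoid guessing is to check how a method is defined before relying on it. This sketch uses inspect.getattr_static from the standard library, which returns the raw class attribute without invoking the descriptor protocol, so a staticmethod or classmethod wrapper remains visible. describe_method and Example are hypothetical names used only for illustration.

```python
import inspect


def describe_method(cls: type, name: str) -> str:
    """Report how an attribute is defined on a class, without triggering descriptors."""
    raw = inspect.getattr_static(cls, name)
    if isinstance(raw, staticmethod):
        return "staticmethod"
    if isinstance(raw, classmethod):
        return "classmethod"
    if inspect.isfunction(raw):
        return "instance method"
    return type(raw).__name__


class Example:
    @staticmethod
    def build() -> "Example":
        return Example()

    @classmethod
    def create(cls) -> "Example":
        return cls()

    def mutate(self) -> None:
        pass


print(describe_method(Example, "build"))   # staticmethod
print(describe_method(Example, "create"))  # classmethod
print(describe_method(Example, "mutate"))  # instance method
```

You can run describe_method(DoclingDocument, "load_from_doctags") against your installed docling-core. If it reports staticmethod or classmethod, the method is a constructor and its return value must be captured.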

Takeaway

If you see a correct DocTags string but an empty export, verify that you are assigning the result of DoclingDocument.load_from_doctags to a new document variable. Once you capture that return value, exporting to markdown should reflect the content you generated.

huggingface-transformers machine-learning ocr onnxruntime python