2026, Jan 02 01:00

How to Fix DeepSeek-VL2 Tiny Import Errors: Use deepseek_vl2 Namespace, Not deepseek_vl

Troubleshoot DeepSeek-VL2 tiny import errors fast: switch from deepseek_vl to deepseek_vl2 module paths. Includes step-by-step code and a working example.

The DeepSeek-VL2 tiny example fails on its very first imports? You are not alone. The snippet on the model card mixes package names from two different codebases, so running it as-is produces an immediate ImportError. The fix is simple once you align the import path with the correct repository and module names.

What goes wrong

The snippet below demonstrates the typical failure. It uses classes that belong to the VL2 codebase but imports them from the VL (v1) package path. That mismatch is exactly why the imports cannot be resolved.

import torch
from transformers import AutoModelForCausalLM

from deepseek_vl.models import DeepseekVLV2Processor, DeepseekVLV2ForCausalLM
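# ^ ImportError: DeepseekVLV2Processor and DeepseekVLV2ForCausalLM are not defined in the deepseek_vl (v1) package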
from deepseek_vl.utils.io import load_pil_images

repo_id = "deepseek-ai/deepseek-vl2-tiny"
dialog_proc = DeepseekVLV2Processor.from_pretrained(repo_id)
txt_tok = dialog_proc.tokenizer

mm_llm = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)
mm_llm = mm_llm.to(torch.bfloat16).cuda().eval()

# single image conversation example
dialogue = [
    {
        "role": "<|User|>",
        "content": "<image>\n<|ref|>The giraffe at the back.<|/ref|>.",
        "images": ["./images/visual_grounding.jpeg"],
    },
    {"role": "<|Assistant|>", "content": ""},
]

pil_imgs = load_pil_images(dialogue)
batched_inputs = dialog_proc(
    conversations=dialogue,
    images=pil_imgs,
    force_batchify=True,
    system_prompt=""
).to(mm_llm.device)

token_embeddings = mm_llm.prepare_inputs_embeds(**batched_inputs)

gen_out = mm_llm.language_model.generate(
    inputs_embeds=token_embeddings,
    attention_mask=batched_inputs.attention_mask,
    pad_token_id=txt_tok.eos_token_id,
    bos_token_id=txt_tok.bos_token_id,
    eos_token_id=txt_tok.eos_token_id,
    max_new_tokens=512,
    do_sample=False,
    use_cache=True
)

decoded_text = txt_tok.decode(gen_out[0].cpu().tolist(), skip_special_tokens=True)
print(f"{batched_inputs['sft_format'][0]}", decoded_text)

Why it fails

The example mixes two different namespaces. The deepseek_vl prefix in the import statements refers to the original DeepSeek-VL (v1) codebase, while the classes DeepseekVLV2Processor and DeepseekVLV2ForCausalLM are defined only in DeepSeek-VL2. The v1 package does not contain these classes, hence the ImportError.

The model card for the tiny checkpoint links to the DeepSeek-VL2 repository, where examples correctly use the deepseek_vl2 namespace. Aligning the import path with that codebase resolves the mismatch.
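
If you are not sure which package is actually installed in your environment, a quick check along the lines below shows whether deepseek_vl, deepseek_vl2, or both are importable. It uses only the standard library and assumes nothing beyond the two package names.

import importlib.util

# check which DeepSeek vision-language namespaces can be imported in this environment
for name in ("deepseek_vl", "deepseek_vl2"):
    spec = importlib.util.find_spec(name)
    print(f"{name}: {'importable' if spec is not None else 'not installed'}")

If only deepseek_vl turns up, you are on the v1 codebase and the VL2 classes will not import no matter how the class names are spelled.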

The fix

Use the VL2 package path consistently. Both DeepseekVLV2Processor and DeepseekVLV2ForCausalLM are imported from deepseek_vl2, which ships with the DeepSeek-VL2 repository, so install that repository (typically pip install -e . in a local clone) rather than the original DeepSeek-VL. Below is the same program with the imports corrected; the logic is unchanged.

import torch
from transformers import AutoModelForCausalLM

from deepseek_vl2.models import DeepseekVLV2Processor, DeepseekVLV2ForCausalLM
from deepseek_vl2.utils.io import load_pil_images
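# ^ both imports now resolve against the DeepSeek-VL2 codebase (the deepseek_vl2 namespace)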

repo_id = "deepseek-ai/deepseek-vl2-tiny"
dialog_proc = DeepseekVLV2Processor.from_pretrained(repo_id)
txt_tok = dialog_proc.tokenizer

mm_llm = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)
mm_llm = mm_llm.to(torch.bfloat16).cuda().eval()

# single image conversation example
dialogue = [
    {
        "role": "<|User|>",
        "content": "<image>\n<|ref|>The giraffe at the back.<|/ref|>.",
        "images": ["./images/visual_grounding.jpeg"],
    },
    {"role": "<|Assistant|>", "content": ""},
]

pil_imgs = load_pil_images(dialogue)
batched_inputs = dialog_proc(
    conversations=dialogue,
    images=pil_imgs,
    force_batchify=True,
    system_prompt=""
).to(mm_llm.device)

# run the vision encoder and merge the image features into the text token embeddings
token_embeddings = mm_llm.prepare_inputs_embeds(**batched_inputs)

# generate the response from the language model, conditioned on the combined embeddings
gen_out = mm_llm.language_model.generate(
    inputs_embeds=token_embeddings,
    attention_mask=batched_inputs.attention_mask,
    pad_token_id=txt_tok.eos_token_id,
    bos_token_id=txt_tok.bos_token_id,
    eos_token_id=txt_tok.eos_token_id,
    max_new_tokens=512,
    do_sample=False,
    use_cache=True
)

decoded_text = txt_tok.decode(gen_out[0].cpu().tolist(), skip_special_tokens=True)
print(f"{batched_inputs['sft_format'][0]}", decoded_text)

Why this matters

Visual-language stacks evolve quickly and often ship multiple major versions side by side. A single-character difference in the module path, such as deepseek_vl versus deepseek_vl2, is enough to break imports even when the class names look right. Following the repository link on the model page and matching its import namespace prevents hours of trial-and-error. It also helps when reporting issues, since providing the exact import error message clarifies whether the problem is a version mismatch or something else.
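
When filing such an issue, a minimal repro that prints the exact message is more useful than a paraphrase. The sketch below, with the try/except purely for illustration, captures it verbatim:

# minimal repro: attempt the problematic import and print the exact error message
try:
    from deepseek_vl.models import DeepseekVLV2Processor  # wrong namespace for the VL2 class
except ImportError as err:
    print(f"ImportError: {err}")

The printed message makes it clear whether the failure is the namespace mix-up described here or simply a missing installation.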

Takeaways

When a model card example fails on the first imports, check that the import prefix matches the repository version referenced on the same page. For DeepSeek-VL2, use the deepseek_vl2 namespace. If you need DeepseekVLV2Processor or DeepseekVLV2ForCausalLM, work with the DeepSeek-VL2 repository rather than the original DeepSeek-VL. Keeping the package name consistent across all imports avoids mixing code from different major versions and gets you to a working setup faster.