2026, Jan 08 21:00

How to resolve CrewAI DuplicateIDError during knowledge ingestion: attach shared PDFs at crew level

Learn why CrewAI hits DuplicateIDError when agents share a PDFKnowledgeSource, and fix it by moving shared knowledge to crew level to prevent duplicate IDs.

Fixing DuplicateIDError in CrewAI when sharing knowledge sources across agents

Initializing a Crew on Azure can unexpectedly fail with a document upsert error that mentions duplicated IDs. The failure surfaces during knowledge ingestion and stops the crew before the first task runs. The error looks like this:

crewai Failed to upsert documents: "Expected IDs to be unique, found 28 Duplicate IDs"

The behavior turned out to be tied to how knowledge sources are assigned. Adding the same PDF knowledge to multiple agents triggered the duplication. Moving that shared knowledge to the crew level resolved the issue.

Problem setup

The following example mirrors the failing configuration. Two agents both receive the same PDFKnowledgeSource, while the crew itself has a separate text source. The IDs generated for the PDF chunks collide when both agents try to upsert the same documents.

from typing import List
from crewai import Agent, Crew, Process, Task, LLM
from crewai.project import CrewBase, agent, crew, task
from crewai.agents.agent_builder.base_agent import BaseAgent
from crewai.knowledge.source.text_file_knowledge_source import TextFileKnowledgeSource
from crewai.knowledge.source.pdf_knowledge_source import PDFKnowledgeSource
from crewai.knowledge.source.csv_knowledge_source import CSVKnowledgeSource
from crewai_tools import VisionTool, CodeInterpreterTool

# Tools
viz_tool = VisionTool()
code_runner = CodeInterpreterTool()

# Knowledge sources
notes_docs = TextFileKnowledgeSource(file_paths=["abc.md"])
pdf_docs = PDFKnowledgeSource(file_paths=["def.pdf", "ghj.pdf"]) 
csv_docs = CSVKnowledgeSource(file_paths=["tyu.csv", "iop.csv"]) 

# LLM
core_model = LLM(
    model="anthropic/claude-sonnet-4-20250514",
    temperature=0,
    max_tokens=64000
)

@CrewBase
class CollabHub:
    """CollabHub crew"""

    agents: List[BaseAgent]
    tasks: List[Task]

    agents_config = 'config/agents.yaml'
    tasks_config = 'config/tasks.yaml'

    @agent
    def market_researcher(self) -> Agent:
        return Agent(
            config=self.agents_config['financial_analyst'],
            tools=[viz_tool, code_runner],
            verbose=True,
            reasoning=True,
            llm=core_model,
            knowledge_sources=[pdf_docs],
            allow_delegation=True,
        )

    @agent
    def lab_researcher(self) -> Agent:
        return Agent(
            config=self.agents_config['scientist'],
            tools=[viz_tool],
            verbose=True,
            reasoning=True,
            llm=core_model,
            knowledge_sources=[pdf_docs, csv_docs],
            allow_delegation=True,
        )

    @agent
    def coordinator(self) -> Agent:
        return Agent(
            config=self.agents_config['orchestrator'],
            allow_delegation=True,
            verbose=True,
            reasoning=True,
            llm=core_model,
        )

    @task
    def reply_to_request(self) -> Task:
        return Task(
            config=self.tasks_config['respond_to_user_task'],
        )

    @crew
    def team(self) -> Crew:
        return Crew(
            agents=[
                self.lab_researcher(),
                self.market_researcher(),
            ],
            tasks=[
                self.reply_to_request()
            ],
            process=Process.hierarchical,
            verbose=True,
            manager_agent=self.coordinator(),
            knowledge_sources=[notes_docs]
        )

What’s going on

The stack trace shows the failure occurs during knowledge ingestion, specifically when saving chunks from PDFKnowledgeSource. The upsert into the backing store raises a DuplicateIDError. The error message points to duplicated IDs during collection.upsert, at the moment CrewAI ingests the sources attached to the agents:

... in crewai/knowledge/source/pdf_knowledge_source.py self._save_documents() ... chromadb DuplicateIDError: Expected IDs to be unique, found duplicated IDs ...

The essential detail is that the same PDF knowledge was attached to two different agents. Each agent attempted to add the same documents, producing colliding IDs and the upsert failure. The behavior did not reproduce in one local environment, and the reason for that difference is unclear; in the failing environment, however, the root cause was confirmed to be duplicate ingestion of the same PDFs.
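The collision mechanism can be illustrated in isolation. The sketch below assumes (as the Chroma error suggests) that chunk IDs are derived deterministically from chunk content, here simplified to a SHA-256 hash; the exact ID scheme CrewAI uses may differ, but any content-derived scheme behaves the same way: two agents ingesting identical PDF chunks submit identical IDs in one upsert batch.

```python
import hashlib

def chunk_id(text: str) -> str:
    # Simplified stand-in for a deterministic, content-derived chunk ID.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

pdf_chunks = ["chunk from page 1", "chunk from page 2"]

# Agent A ingests the shared PDF chunks...
ids_agent_a = [chunk_id(c) for c in pdf_chunks]
# ...and agent B ingests the exact same chunks again.
ids_agent_b = [chunk_id(c) for c in pdf_chunks]

combined = ids_agent_a + ids_agent_b
# Identical content yields identical IDs, so the combined batch
# contains duplicates -- the condition the vector store rejects.
assert len(set(combined)) < len(combined)
```

With the sources attached once at the crew level, only one set of IDs is ever generated, so the batch stays duplicate-free.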

Working fix

Instead of assigning the PDF knowledge to individual agents, attach it once at the crew level. This prevents repeated ingestion and eliminates the duplicated IDs during upsert. The agents keep their tools and configuration, while the shared knowledge moves to Crew.

from typing import List
from crewai import Agent, Crew, Process, Task, LLM
from crewai.project import CrewBase, agent, crew, task
from crewai.agents.agent_builder.base_agent import BaseAgent
from crewai.knowledge.source.text_file_knowledge_source import TextFileKnowledgeSource
from crewai.knowledge.source.pdf_knowledge_source import PDFKnowledgeSource
from crewai.knowledge.source.csv_knowledge_source import CSVKnowledgeSource
from crewai_tools import VisionTool, CodeInterpreterTool

# Tools
viz_tool = VisionTool()
code_runner = CodeInterpreterTool()

# Knowledge sources
notes_docs = TextFileKnowledgeSource(file_paths=["abc.md"])
pdf_docs = PDFKnowledgeSource(file_paths=["def.pdf", "ghj.pdf"]) 
csv_docs = CSVKnowledgeSource(file_paths=["tyu.csv", "iop.csv"]) 

# LLM
core_model = LLM(
    model="anthropic/claude-sonnet-4-20250514",
    temperature=0,
    max_tokens=64000
)

@CrewBase
class CollabHub:
    """CollabHub crew"""

    agents: List[BaseAgent]
    tasks: List[Task]

    agents_config = 'config/agents.yaml'
    tasks_config = 'config/tasks.yaml'

    @agent
    def market_researcher(self) -> Agent:
        return Agent(
            config=self.agents_config['financial_analyst'],
            tools=[viz_tool, code_runner],
            verbose=True,
            reasoning=True,
            llm=core_model,
            allow_delegation=True,
        )

    @agent
    def lab_researcher(self) -> Agent:
        return Agent(
            config=self.agents_config['scientist'],
            tools=[viz_tool],
            verbose=True,
            reasoning=True,
            llm=core_model,
            allow_delegation=True,
        )

    @agent
    def coordinator(self) -> Agent:
        return Agent(
            config=self.agents_config['orchestrator'],
            allow_delegation=True,
            verbose=True,
            reasoning=True,
            llm=core_model,
        )

    @task
    def reply_to_request(self) -> Task:
        return Task(
            config=self.tasks_config['respond_to_user_task'],
        )

    @crew
    def team(self) -> Crew:
        return Crew(
            agents=[
                self.lab_researcher(),
                self.market_researcher(),
            ],
            tasks=[
                self.reply_to_request()
            ],
            process=Process.hierarchical,
            verbose=True,
            manager_agent=self.coordinator(),
            knowledge_sources=[notes_docs, pdf_docs, csv_docs]
        )

Why this matters

Knowledge architecture in multi-agent systems isn't just an organizational detail. When multiple agents try to ingest the same data source, the underlying storage can reject the operation if it sees identical document IDs more than once. Centralizing shared knowledge at the crew level ensures it is added a single time and remains available to all agents without duplication. This also avoids puzzling environment-specific behavior; the issue here did not reproduce on one local setup, but attaching the knowledge once at the crew level resolved it reliably.

Takeaways

If you hit DuplicateIDError during crew initialization and the trace points at knowledge ingestion, review where knowledge sources are attached. If the same PDF knowledge is added to multiple agents, move that source to the Crew's knowledge_sources so it is ingested once and shared. In the case above, attaching the PDF knowledge to the entire crew removed the error and allowed the system to initialize and run as expected.
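A quick pre-flight check can catch this misconfiguration before kickoff. The helper below is hypothetical, not a CrewAI API: it only assumes each agent exposes a knowledge_sources attribute (which CrewAI agents do) and flags any source object attached to more than one agent.

```python
from collections import Counter

def find_shared_sources(agents):
    """Return knowledge-source objects attached to more than one agent.

    Hypothetical helper: duck-typed against a `knowledge_sources`
    attribute, so it works on plain stubs as well as CrewAI agents.
    """
    counts = Counter()
    by_id = {}
    for agent in agents:
        for src in getattr(agent, "knowledge_sources", None) or []:
            counts[id(src)] += 1  # compare by object identity, not equality
            by_id[id(src)] = src
    return [by_id[sid] for sid, n in counts.items() if n > 1]

# Example with minimal stand-in objects:
class _Agent:
    def __init__(self, sources=None):
        self.knowledge_sources = sources

shared_pdf = object()  # stands in for a PDFKnowledgeSource
agents = [_Agent([shared_pdf]), _Agent([shared_pdf, object()]), _Agent()]
assert find_shared_sources(agents) == [shared_pdf]
```

Anything this check flags is a candidate for promotion to the Crew's knowledge_sources list.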