2025, Oct 01 11:00

How to Query Null-like Metadata in LlamaIndex: Understanding None Filtering and Practical Fixes

Learn why LlamaIndex metadata filters with EQ=None return no results, and see two fixes using a sentinel string or empty value to make null queries work.

Filtering by metadata that contains null-like values is easy to get wrong in retrieval pipelines. If you are building on LlamaIndex and expect a filter with value None to match documents that explicitly store None in their metadata, you will get an empty result set. Below is a concise walkthrough of the behavior, a minimal code sample, and two straightforward ways to make such queries work reliably.

Reproducing the issue

The following example builds a VectorStoreIndex with a single TextNode whose metadata contains start_date set to None. A metadata filter with FilterOperator.EQ and value=None returns no nodes.

from llama_index.core import VectorStoreIndex
from llama_index.core.schema import TextNode
from llama_index.core.vector_stores import (
    MetadataFilter,
    MetadataFilters,
    FilterOperator,
)
sample_node = TextNode(
    text="This document has None in the metadata",
    id_="node_01",
    metadata={"start_date": None},
)
idx = VectorStoreIndex([sample_node])
print("Index nodes:", [n.metadata for n in idx.docstore.docs.values()])
null_date_rule = MetadataFilter(key="start_date", operator=FilterOperator.EQ, value=None)
rule_set = MetadataFilters(filters=[null_date_rule])
fetcher = idx.as_retriever(filters=rule_set, similarity_top_k=1)
results = fetcher.retrieve("this")
print("Retrieved nodes:", [(n.node_id, n.metadata) for n in results])

Observed output shows that the metadata is indeed stored with None, yet the filter retrieves nothing.

Index nodes: [{'start_date': None}]
Retrieved nodes: []

What’s happening and why

This is expected behavior in LlamaIndex: None is not filterable. Even if the metadata contains a field explicitly set to None, a filter where value is None will not match it. If you need to query for “null” semantics, you must store a concrete, filterable value.
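
One way to picture this behavior is a toy matcher that cannot tell a key holding None apart from a key that is missing. The sketch below is a simplified mental model in plain Python, not LlamaIndex's actual implementation; toy_eq_match is a hypothetical name used only for illustration.

```python
from typing import Any, Dict


def toy_eq_match(metadata: Dict[str, Any], key: str, value: Any) -> bool:
    """Toy EQ check: a stored None is treated like a missing key."""
    stored = metadata.get(key)  # also returns None when the key is absent
    if stored is None:
        # Absent and explicit-None look identical here, so never match.
        return False
    return stored == value


print(toy_eq_match({"start_date": None}, "start_date", None))      # False
print(toy_eq_match({"start_date": "None"}, "start_date", "None"))  # True
print(toy_eq_match({"start_date": ""}, "start_date", ""))          # True
```

Under this model, only a concrete stored value such as a sentinel string or an empty string can ever satisfy an equality filter.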

Making it work in practice

There are two practical approaches. First, serialize the null-like value as a string and query that string. Second, store an empty string and filter for the empty string. Both strategies make the field filterable without changing retrieval logic elsewhere.

If you also need to configure embeddings and LLM with Ollama for your environment, the following setup can be used before building the index.

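from llama_index.core import VectorStoreIndex, Settings
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.llms.ollama import Ollama
# Embedding model used to build and query the index
emb_backend = OllamaEmbedding(
    model_name="llama3.2",
    base_url="http://localhost:11434",
)
Settings.embed_model = emb_backend
# LLM for response synthesis; the retriever-only examples in this article never call it
Settings.llm = Ollama(model="llama3.2", base_url="http://localhost:11434")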

The first option stores the sentinel string "None" in the metadata and filters on str(None), which evaluates to that same string.

from llama_index.core.schema import TextNode
from llama_index.core.vector_stores import (
    MetadataFilter,
    MetadataFilters,
    FilterOperator,
)
from llama_index.core import VectorStoreIndex
doc_a = TextNode(
    text="This document has None in the metadata",
    id_="node_01",
    metadata={"start_date": "None"},
)
doc_b = TextNode(
    text="This document has start date in the metadata",
    id_="node_02",
    metadata={"start_date": "20/03/2023"},
)
idx = VectorStoreIndex([doc_a, doc_b])
print("Index nodes:", [d.metadata for d in idx.docstore.docs.values()])
null_date_rule = MetadataFilter(key="start_date", operator=FilterOperator.EQ, value=str(None))
rule_set = MetadataFilters(filters=[null_date_rule])
fetcher = idx.as_retriever(filters=rule_set, similarity_top_k=1)
results = fetcher.retrieve("this")
print("Retrieved nodes:", [(x.node_id, x.metadata) for x in results])

Expected output shows the node with the sentinel string is matched by the filter.

Index nodes: [{'start_date': 'None'}, {'start_date': '20/03/2023'}]
Retrieved nodes: [('node_01', {'start_date': 'None'})]

The second option stores an empty string and filters for the empty string.

from llama_index.core.schema import TextNode
from llama_index.core.vector_stores import (
    MetadataFilter,
    MetadataFilters,
    FilterOperator,
)
from llama_index.core import VectorStoreIndex
doc_a = TextNode(
    text="This document has None in the metadata",
    id_="node_01",
    metadata={"start_date": ""},
)
doc_b = TextNode(
    text="This document has start date in the metadata",
    id_="node_02",
    metadata={"start_date": "20/03/2023"},
)
idx = VectorStoreIndex([doc_a, doc_b])
print("Index nodes:", [d.metadata for d in idx.docstore.docs.values()])
null_date_rule = MetadataFilter(key="start_date", operator=FilterOperator.EQ, value="")
rule_set = MetadataFilters(filters=[null_date_rule])
fetcher = idx.as_retriever(filters=rule_set, similarity_top_k=1)
results = fetcher.retrieve("this")
print("Retrieved nodes:", [(x.node_id, x.metadata) for x in results])

Expected output confirms that filtering by the empty string returns the intended node.

Index nodes: [{'start_date': ''}, {'start_date': '20/03/2023'}]
Retrieved nodes: [('node_01', {'start_date': ''})]

Why this matters

When building retrieval systems that depend on metadata filters for routing or post-filtering, silently missing documents is costly. Knowing that None is not filterable in LlamaIndex helps you normalize “null” data at ingestion time and maintain predictable query behavior.
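
Ingestion-time normalization of this kind could be sketched as follows, assuming the sentinel-string approach. NULL_SENTINEL and normalize_metadata are hypothetical names introduced here for illustration; they are not part of LlamaIndex.

```python
from typing import Any, Dict

# Assumed sentinel; the empty string "" works the same way if preferred.
NULL_SENTINEL = str(None)  # "None"


def normalize_metadata(
    metadata: Dict[str, Any], sentinel: str = NULL_SENTINEL
) -> Dict[str, Any]:
    """Return a copy of metadata with every None value replaced by the sentinel."""
    return {k: (sentinel if v is None else v) for k, v in metadata.items()}


raw = {"start_date": None, "title": "report"}
print(normalize_metadata(raw))
# {'start_date': 'None', 'title': 'report'}
```

The returned dict can be passed as the metadata argument when constructing each TextNode, and the same NULL_SENTINEL reused as the filter value, so the stored and queried representations stay in sync.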

Takeaways

If a field can be conceptually null, store an explicit filterable representation. Either use the string "None" and query str(None), or use an empty string and query that empty string. Keep the representation consistent across both indexing and filtering so that your VectorStoreIndex retrievers return the expected nodes.

The article is based on a question from StackOverflow by Gino and an answer by Ajeet Verma.