2025, Dec 26 23:00

How to round-trip YAML in Python without losing ~ nulls or explicit !!timestamp tags in ruamel.yaml

Learn how to round-trip YAML in Python with ruamel.yaml while preserving ~ nulls and explicit !!timestamp tags using a custom representer and pre/post transforms.

Editing YAML with Python often looks deceptively simple until serialization changes how your data is spelled in the output. A common case: you load a document, tweak a different part of the tree, and on writing it back discover that a tilde-based null and a tagged timestamp did not survive the round-trip intact. If you rely on exact tag forms, that normalization is a problem you cannot ignore.

Problem statement

Consider a YAML snippet with a null shown as a tilde and a value tagged as a timestamp. The original structure is straightforward:

example:
  - x: ~
    y: !!timestamp 2025-05-04

After a load/dump round-trip with ruamel.yaml, it comes back as:

example:
  - x:
    y: 2025-05-04

The tilde is gone because ~ is loaded as Python None, which ruamel.yaml writes back as an empty scalar. The explicit tag disappeared because ruamel.yaml treats it as superfluous: a plain 2025-05-04 already resolves to a timestamp without it. The result is semantically equivalent YAML, but not byte-for-byte identical to what some systems require.

Why this happens

ruamel.yaml normalizes output during a round-trip. That includes emitting None as an empty scalar when possible and dropping superfluous tags such as !!timestamp. If your downstream tooling depends on the exact spelling of null (~) or on having that explicit tag in place, the default behavior will not preserve it.

A precise fix without changing your data model

There are two pieces to restoring the original representation. First, ensure that null is emitted as a tilde by registering a custom representer for Python's None. Second, preserve the explicit timestamp tag by pre-processing the source to replace the superfluous !!timestamp with a local placeholder tag (!TMP_TS) that ruamel.yaml keeps, and then post-processing the dumped text through the transform argument of dump() to swap the placeholder back. The following code demonstrates both steps end-to-end.

import sys
import ruamel.yaml

doc_src = """\
example:
  - x: ~
    y: !!timestamp 2025-05-04
"""

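# Round-trip representer that writes nested None values as '~' instead of an empty scalar.
class TildeNullRep(ruamel.yaml.representer.RoundTripRepresenter):
    def emit_null(self, value):
        # A document consisting of nothing but a null keeps the default 'null' spelling,
        # mirroring ruamel.yaml's own RoundTripRepresenter.represent_none.
        if len(self.represented_objects) == 0 and not self.serializer.use_explicit_start:
            return self.represent_scalar('tag:yaml.org,2002:null', 'null')
        return self.represent_scalar('tag:yaml.org,2002:null', '~')

# Register the override on the subclass only; the stock representer is unchanged.
TildeNullRep.add_representer(type(None), TildeNullRep.emit_null)

# Post-processing: turn the placeholder back into the original core tag.
# Match the full placeholder, including its leading '!'.
def restore_ts_tag(s):
    return s.replace('!TMP_TS', '!!timestamp')

yaml_rt = ruamel.yaml.YAML()          # default round-trip (typ='rt') instance
yaml_rt.Representer = TildeNullRep
yaml_rt.indent(sequence=4, offset=2)  # match the source's '  - ' sequence indentation

# Pre-processing: hide the superfluous !!timestamp behind a local tag that ruamel.yaml preserves.
doc_obj = yaml_rt.load(doc_src.replace('!!timestamp', '!TMP_TS'))
yaml_rt.dump(doc_obj, sys.stdout, transform=restore_ts_tag)

This produces:

example:
  - x: ~
    y: !!timestamp 2025-05-04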

This works because ruamel.yaml preserves non-superfluous tags, such as the local !TMP_TS placeholder, through a round-trip without extra work; the only additional cost is the text substitution before loading and after dumping.
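The pre and post steps only work if they agree exactly on the placeholder; if the restore step matches only part of it, a stray "!" is left in the output. One way to keep them in sync is to derive both directions from a single mapping. The sketch below uses only the standard library; TAG_PLACEHOLDERS, hide_tags, and restore_tags are hypothetical names, and matching whitespace-delimited tokens is an assumption that suits typical block-style YAML. It would still rewrite a matching token that appears as a separate word inside a string value.

```python
import re

# Hypothetical mapping from a superfluous core tag to a local placeholder tag.
# Local tags (single '!') are not implied by any Python type, so ruamel.yaml keeps them.
TAG_PLACEHOLDERS = {'!!timestamp': '!TMP_TS'}


def _swap_tags(text, mapping):
    """Replace whole tag tokens only, never substrings of longer tokens."""
    alternatives = '|'.join(re.escape(tag) for tag in mapping)
    # (?<!\S): start of text or preceded by whitespace; (?=\s|$): the tag ends the token.
    pattern = re.compile(r'(?<!\S)(' + alternatives + r')(?=\s|$)')
    return pattern.sub(lambda m: mapping[m.group(1)], text)


def hide_tags(text):
    """Pre-processing: core tag -> placeholder, applied before yaml.load()."""
    return _swap_tags(text, TAG_PLACEHOLDERS)


def restore_tags(text):
    """Post-processing: placeholder -> core tag, suitable as transform= for yaml.dump()."""
    inverse = {placeholder: tag for tag, placeholder in TAG_PLACEHOLDERS.items()}
    return _swap_tags(text, inverse)


src = "example:\n  - x: ~\n    y: !!timestamp 2025-05-04\n"
print(hide_tags(src))
print(restore_tags(hide_tags(src)) == src)  # True
```

In the earlier script, hide_tags would take the place of the bare .replace(...) call before load(), and restore_tags would be passed as the transform argument.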

Why this matters

Some pipelines are sensitive to the exact textual form of YAML, not just the data it carries. Dropping an explicit tag or altering how null is spelled can break schema validators, integration tests, or systems that key off of those markers. Understanding how ruamel.yaml normalizes output, and how to override that behavior cleanly, keeps your automation predictable.

Takeaway

If you need to keep null represented as ~, provide a custom representer for None. If you must preserve a tag that ruamel.yaml considers superfluous, wrap the round-trip with a pre-processing placeholder and a post-processing restore. This approach lets you modify only what you intend while keeping the rest of the document exactly as required.