2025, Sep 16 09:00

Fast Serialization of Float Embeddings in Python: choose str(), json.dumps, or orjson wisely

Learn how to serialize Python float embeddings fast. We compare str()/repr(), json.dumps, and orjson with benchmarks, pros/cons, and when to choose JSON vs plain strings.

Converting large lists of float embeddings to strings can be unexpectedly slow. When you’re writing thousands of vectors to plain text, the serialization step becomes a measurable bottleneck. A quick benchmark across common approaches shows an interesting result: a third-party library, orjson, outpaces native options. The natural question follows — is there a standard-library method that matches it?

Example: turning a float array into strings

import json, orjson

vals = [3.141592653589793] * 100

plain_repr = str(vals)           # fastest native option for a plain string
json_repr = json.dumps(vals)     # JSON-compliant, slower; returns str
fast_serial = orjson.dumps(vals) # third-party, fastest overall; returns bytes

What’s really slow here

If you’re aiming for native Python performance close to orjson, there isn’t a built-in that matches it. orjson is a Rust extension module that performs highly optimized serialization at C/Rust speed. The standard library options — str(), format(), json.dumps() — operate under stricter guarantees, and that cost shows up in benchmarks.

There’s another important nuance. str(vals) and "{}".format(vals) both rely on the same underlying mechanism (list.__str__), which is why they perform nearly identically. json.dumps() is slower because it must honor full JSON semantics: escaping, correctness of output, precise handling of numeric types, and so on.
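This identity is easy to check directly: since both calls dispatch to the same list.__repr__ slot, the outputs match character for character (a quick sketch):

```python
vals = [3.141592653589793] * 100

# str(), "{}".format(), and repr() of a list all route through
# list.__repr__, so the resulting strings are identical:
as_str = str(vals)
as_fmt = "{}".format(vals)
as_repr = repr(vals)

assert as_str == as_fmt == as_repr
```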

External libraries can be faster not in spite of being external, but because they can adopt different implementations without breaking compatibility guarantees. Converting floats to strings isn’t trivial: there is an ongoing CPython discussion about replacing the existing C conversion code with newer, faster algorithms such as Ryū and Dragonbox. orjson likely manages strings differently as well. Allocating and releasing memory is expensive, so fast serializers pre-allocate and reuse buffers.

There’s also the reality of microbenchmarks. Current numbers show str(x) less than twice as slow as orjson, whereas in 2022 it was 3–4 times slower; either CPython’s implementation improved or the microbenchmark is too sensitive to noise. Both points surface in the discussion mentioned above.

What to use in practice

If you don’t need strict JSON and only want a readable string, the fastest native approach is str() or repr(). They’re already implemented in C and about as fast as you’ll get in the standard library.
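One upside of the plain-string route: because repr() of a Python float round-trips exactly, the text can be parsed back into the original values with the standard library alone (a sketch using ast.literal_eval):

```python
import ast

vals = [3.141592653589793, 2.718281828459045]

text = str(vals)  # '[3.141592653589793, 2.718281828459045]'

# ast.literal_eval safely parses the list literal back into floats;
# no precision is lost, since float repr round-trips exactly
restored = ast.literal_eval(text)
assert restored == vals
```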

If you do need JSON-compatible output and care about speed, orjson is designed for the job and wins against both json and ujson in benchmarks. That’s the practical answer today.

A couple of caveats are worth keeping in mind. First, json.dumps(x) != orjson.dumps(x): json.dumps returns a str and orjson.dumps returns bytes, with different default formatting, so treat them as different tools rather than drop-in equivalents. Second, the original benchmark used ints, while the workload described here involves floats. If your real data are floats, benchmark with floats to reflect your actual pipeline.
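The mismatch is easy to see side by side (a sketch; the orjson output shown in the comment assumes orjson with default options is installed):

```python
import json

data = [1.5, 2.25]

as_json = json.dumps(data)  # str: '[1.5, 2.25]' — spaces after commas
# With default options, orjson.dumps(data) instead returns compact bytes:
#   b'[1.5,2.25]'
# so the two outputs differ in both content and type.

# The parsed values still agree, of course:
assert json.loads(as_json) == data
```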

There’s also a separate angle: serializing to bytes instead of strings. Whether that helps depends on your downstream requirements; measure with your real data and targets to make the right trade-off.
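If downstream consumers accept binary data, the standard library can pack floats as raw doubles and skip decimal conversion entirely (a sketch with the array module; adjust the type code if you need a different precision):

```python
import array

vals = [3.141592653589793] * 100

# Pack as raw IEEE-754 doubles: 8 bytes per float, no decimal conversion
buf = array.array('d', vals).tobytes()
assert len(buf) == 8 * len(vals)

# Round-trip back to Python floats
restored = array.array('d')
restored.frombytes(buf)
assert list(restored) == vals
```

The trade-off is that the output is no longer human-readable or JSON, so it only makes sense when the consumer is under your control.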

Why this matters

When saving many embeddings, the cost of float-to-string conversion compounds. Every extra microsecond per vector turns into seconds or minutes at scale. Choosing an approach aligned with your output constraints — plain string vs JSON — avoids paying for guarantees you don’t need, or silently losing time on a hot path.

Conclusion

For plain textual representation, use str() or repr() and move on. For JSON-compatible output with high throughput, orjson.dumps is the fastest option, and nothing in the standard library currently matches it. Don’t assume equality between json.dumps and orjson.dumps; verify what your consumers expect. Finally, benchmark with your real float payloads, preferably with a tool like pyperf, and be mindful that microbenchmarks can be noisy — run enough iterations to get a stable signal.
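A minimal stdlib sketch of such a measurement with timeit (pyperf gives more robust statistics, but timeit is enough to see the ordering on real float payloads):

```python
import json
import timeit

vals = [3.141592653589793] * 100  # benchmark with floats, not ints

n = 2000
t_str = timeit.timeit(lambda: str(vals), number=n)
t_json = timeit.timeit(lambda: json.dumps(vals), number=n)

print(f"str():      {t_str / n * 1e6:.2f} us per call")
print(f"json.dumps: {t_json / n * 1e6:.2f} us per call")
```

Swapping orjson.dumps into the same harness (if it is installed) shows where it lands relative to both.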

The article is based on a question from StackOverflow by K_Augus and an answer by TAHSEEN BAIRAGDAR.