2025, Dec 17 01:00

Build a dictionary of lists from SQLite results in Python: efficient column-wise loading with chunks

Convert SQLite query results to a Python dictionary of lists using chunked fetchmany and a single transpose for faster column-wise loading and lower overhead.

When you pull data from SQLite, the obvious path is to build a list of dictionaries: one dict per row. That is ergonomic and readable, but not always what you want. If your downstream code expects column-major structures, turning the result into a dictionary of lists is a better fit. Below we look at a small table, show how the conventional approach behaves, and then switch to an efficient column-wise load using chunked reads.

Problem setup

Consider a simple table stored in my_db.db:

idx   speed   duration
 0     420       50
 1     380       40
 2     390       45
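
If you want to follow along, a small setup script can create this table first. This is a sketch, not part of the original article: it writes my_db.db to the current directory and inserts the three sample rows.

```python
import sqlite3

# Hypothetical setup for the sample table; creates my_db.db in the
# current directory so the later snippets have something to query.
conn = sqlite3.connect('my_db.db')
conn.execute('CREATE TABLE IF NOT EXISTS my_table '
             '(idx INTEGER, speed INTEGER, duration INTEGER)')
conn.execute('DELETE FROM my_table')  # start from a clean slate on re-runs
conn.executemany('INSERT INTO my_table VALUES (?, ?, ?)',
                 [(0, 420, 50), (1, 380, 40), (2, 390, 45)])
conn.commit()
conn.close()
```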

Loading it into Python as a list of dictionaries is straightforward:

import sqlite3

base_path = folder_path  # assumed defined elsewhere: the directory containing my_db.db

db_conn = sqlite3.connect(base_path + 'my_db.db')
db_conn.row_factory = sqlite3.Row
db_cur = db_conn.cursor()
db_cur.execute('SELECT * FROM my_table')
row_objs = db_cur.fetchall()
rows_as_dicts = [dict(r) for r in row_objs]
db_cur.close()
db_conn.close()

This produces:

rows_as_dicts = [
  {'idx': 0, 'speed': 420, 'duration': 50},
  {'idx': 1, 'speed': 380, 'duration': 40},
  {'idx': 2, 'speed': 390, 'duration': 45}
]

The target structure, however, is a dictionary of lists, organized by columns:

cols_as_lists = {
  'idx': [0, 1, 2],
  'speed': [420, 380, 390],
  'duration': [50, 40, 45]
}

What’s really going on

The list-of-dicts approach iterates row by row and constructs a separate dictionary for every record, repeatedly touching the same keys in Python. If you need a column-oriented layout, you end up doing extra work: either a second pass to pivot rows into columns, or per-cell dictionary lookups during ingestion. Both add overhead. The alternative is to build the column structure directly while fetching from the cursor.
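
For contrast, the second-pass pivot mentioned above can be sketched like this. It produces the right structure, but performs one dictionary lookup per cell, which is exactly the overhead the chunked approach avoids.

```python
# Starting point: the row-major list of dictionaries from the first snippet.
rows_as_dicts = [
    {'idx': 0, 'speed': 420, 'duration': 50},
    {'idx': 1, 'speed': 380, 'duration': 40},
    {'idx': 2, 'speed': 390, 'duration': 45},
]

# Second pass: pivot rows into columns, one dict lookup per cell.
cols_as_lists = {key: [row[key] for row in rows_as_dicts]
                 for key in rows_as_dicts[0]}
# {'idx': [0, 1, 2], 'speed': [420, 380, 390], 'duration': [50, 40, 45]}
```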

An efficient way to build a dictionary of lists

The idea is to read the result set in chunks and transform each chunk from row-major to column-major form with a single transpose step. Using fetchmany with a compact iterator pipeline replaces per-cell dictionary lookups with dense iteration over whole columns; in one measurement, this came out roughly 5x faster than the list-of-dicts route.

import sqlite3
import collections

cn = sqlite3.connect(base_path + 'my_db.db')
cr = cn.cursor()
cr.execute('SELECT * FROM my_table')

# Column names in the same order as the SELECT output, taken from the cursor
col_names = [d[0] for d in cr.description]

CHUNK_SIZE = 10_000  # rows per fetchmany batch; tune for your workload

store = collections.defaultdict(list)
while batch := cr.fetchmany(CHUNK_SIZE):
    for col_values, col_name in zip(zip(*batch), col_names):
        store[col_name].extend(col_values)

cr.close()
cn.close()

cols_as_lists = dict(store)

This builds the desired dictionary-of-lists structure directly during ingestion, without materializing a list of row dictionaries first.
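
To see the whole pipeline end to end, here is a self-contained sketch using an in-memory database and the three sample rows. The chunk size is deliberately tiny so that more than one fetchmany batch is exercised.

```python
import collections
import sqlite3

CHUNK_SIZE = 2  # tiny on purpose, to force multiple fetchmany batches

cn = sqlite3.connect(':memory:')
cn.execute('CREATE TABLE my_table (idx INTEGER, speed INTEGER, duration INTEGER)')
cn.executemany('INSERT INTO my_table VALUES (?, ?, ?)',
               [(0, 420, 50), (1, 380, 40), (2, 390, 45)])

cr = cn.cursor()
cr.execute('SELECT * FROM my_table')
col_names = [d[0] for d in cr.description]  # names in SELECT order

# Build the column store directly: transpose each batch once,
# then extend each column's list in bulk.
store = collections.defaultdict(list)
while batch := cr.fetchmany(CHUNK_SIZE):
    for col_values, col_name in zip(zip(*batch), col_names):
        store[col_name].extend(col_values)

cr.close()
cn.close()

cols_as_lists = dict(store)
# {'idx': [0, 1, 2], 'speed': [420, 380, 390], 'duration': [50, 40, 45]}
```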

Why this matters

For tables with many rows, column-wise construction reduces Python-level overhead and can speed up loading significantly, and it matches column-major consumers out of the box, avoiding a second pass. In one practitioner's recent test, loading directly with pandas was an order of magnitude slower than the chunked approach above. That said, pandas lets you go from SQL straight to a DataFrame, and a DataFrame is conceptually close to a dictionary of lists, so whether that trade-off is attractive depends on your constraints and workload.
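
If you do reach for pandas, the one-call route looks like the sketch below (assuming pandas is installed; read_sql_query and DataFrame.to_dict are standard pandas API).

```python
import sqlite3
import pandas as pd

cn = sqlite3.connect(':memory:')
cn.execute('CREATE TABLE my_table (idx INTEGER, speed INTEGER, duration INTEGER)')
cn.executemany('INSERT INTO my_table VALUES (?, ?, ?)',
               [(0, 420, 50), (1, 380, 40), (2, 390, 45)])

# One call from SQL to a DataFrame: convenient, but measured slower here.
df = pd.read_sql_query('SELECT * FROM my_table', cn)
cn.close()

# A DataFrame round-trips naturally to a dictionary of lists.
cols_as_lists = df.to_dict(orient='list')
```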

Takeaways

If your pipeline benefits from a dictionary-of-lists layout, build it directly from the cursor. Chunked fetch with a simple transpose step keeps the code compact and avoids per-cell lookups. When picking tools, consider the execution model you need: row-wise convenience versus column-wise throughput. If speed is a concern in your setting, test the chunked iterator approach on your data and keep it in your toolkit.