2025, Dec 17 01:00
Build a dictionary of lists from SQLite results in Python: efficient column-wise loading with chunks
Convert SQLite query results to a Python dictionary of lists using chunked fetchmany and a single transpose for faster column-wise loading and lower overhead.
When you pull data from SQLite, the obvious path is to build a list of dictionaries: one dict per row. That is ergonomic and readable, but not always what you want. If your downstream code expects column-major structures, turning the result into a dictionary of lists is a better fit. Below we look at a small table, show how the conventional approach behaves, and then switch to an efficient column-wise load using chunked reads.
Problem setup
Consider a simple table stored in my_db.db:
idx  speed  duration
0    420    50
1    380    40
2    390    45
Loading it into Python as a list of dictionaries is straightforward:
import sqlite3
base_path = '/path/to/'  # placeholder; point this at the folder containing my_db.db
db_conn = sqlite3.connect(base_path + 'my_db.db')
db_conn.row_factory = sqlite3.Row
db_cur = db_conn.cursor()
db_cur.execute('SELECT * FROM my_table')
row_objs = db_cur.fetchall()
rows_as_dicts = [dict(r) for r in row_objs]
db_cur.close()
db_conn.close()
This produces:
rows_as_dicts = [
    {'idx': 0, 'speed': 420, 'duration': 50},
    {'idx': 1, 'speed': 380, 'duration': 40},
    {'idx': 2, 'speed': 390, 'duration': 45}
]
The target structure, however, is a dictionary of lists, organized by columns:
cols_as_lists = {
    'idx': [0, 1, 2],
    'speed': [420, 380, 390],
    'duration': [50, 40, 45]
}
What’s really going on
The list-of-dicts approach iterates row by row and constructs a separate dictionary for every record, repeatedly touching the same keys in Python. If you need a column-oriented layout, you end up doing extra work: either a second pass to pivot rows into columns, or per-cell dictionary lookups during ingestion. Both add overhead. The alternative is to build the column structure directly while fetching from the cursor.
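To see where that cost comes from, here is a minimal sketch of the second-pass pivot, reusing rows_as_dicts from the first snippet; note the per-cell dictionary traffic:
# Pivot the list of row dicts into a dict of lists after the fact.
# Every cell costs a key lookup plus an append; this is the per-cell
# overhead the column-wise load below avoids. Assumes at least one row.
cols_as_lists = {key: [] for key in rows_as_dicts[0]}
for row in rows_as_dicts:
    for key, value in row.items():
        cols_as_lists[key].append(value)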
An efficient way to build a dictionary of lists
The idea is to read the result set in chunks and transform each chunk from row-major to column-major form with a single transpose step. Using fetchmany with a compact iterator pipeline keeps the iteration dense and avoids a dictionary lookup for every cell. In one reported test, this approach ran roughly 5x faster than the list-of-dicts route for exactly those reasons.
import sqlite3
import collections

CHUNK_SIZE = 10_000  # tune to your memory/latency trade-off

cn = sqlite3.connect(base_path + 'my_db.db')
cr = cn.cursor()
cr.execute('SELECT * FROM my_table')

# Column names, in the same order as the SELECT output.
col_names = [description[0] for description in cr.description]

store = collections.defaultdict(list)
while batch := cr.fetchmany(CHUNK_SIZE):  # walrus operator: Python 3.8+
    # zip(*batch) transposes the chunk from rows to columns.
    for col_values, col_name in zip(zip(*batch), col_names):
        store[col_name].extend(col_values)

cr.close()
cn.close()

cols_as_lists = dict(store)
This builds the desired dictionary-of-lists structure directly during ingestion, without materializing a list of row dictionaries first.
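As a quick sanity check against the sample table (assuming the chunked snippet above has just run):
print(cols_as_lists['speed'])     # [420, 380, 390]
print(cols_as_lists['duration'])  # [50, 40, 45]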
Why this matters
For tables with many rows, column-wise construction reduces Python-level overhead and speeds up loading noticeably. It also matches column-major consumers out of the box, avoiding a second pass. One practitioner reported that loading directly with pandas was an order of magnitude slower than the approach above in a recent test. That said, you can go from SQL straight to a pandas DataFrame, and a DataFrame is conceptually close to a dictionary of lists; whether that trade-off is attractive depends on your constraints and workload.
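If a DataFrame is the actual goal, the dictionary of lists feeds its constructor directly. A minimal sketch, assuming pandas is installed; read_sql_query is the direct route that the test above found slower:
import pandas as pd

# A dict of lists maps one-to-one onto DataFrame columns.
df = pd.DataFrame(cols_as_lists)

# Direct route, for comparison (requires an open connection):
# df = pd.read_sql_query('SELECT * FROM my_table', cn)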
Takeaways
If your pipeline benefits from a dictionary-of-lists layout, build it directly from the cursor. Chunked fetch with a simple transpose step keeps the code compact and avoids per-cell lookups. When picking tools, consider the execution model you need: row-wise convenience versus column-wise throughput. If speed is a concern in your setting, test the chunked iterator approach on your data and keep it in your toolkit.