Nov 23, 2025 13:00

Resolve Azure Blob Storage DNS errors and insert CSVs into an Excel workbook with tab1 first and tab5 penultimate using pandas and openpyxl

Fix Azure Blob Storage DNS errors from a bad connection string and append CSVs to an Excel workbook with pandas and openpyxl, preserving worksheet order.

When you automate Excel updates in Azure Blob Storage, a small misconfiguration can derail the entire flow. A frequent stumbling block is a malformed connection string that breaks DNS resolution, followed by confusion around appending CSV data as new worksheets at specific positions. Below is a focused walkthrough that shows the failure pattern, explains the root cause, and demonstrates a working approach to insert two CSVs into an existing workbook so that one becomes the first worksheet (tab1) and the other becomes the penultimate one (tab5).

What goes wrong

The pipeline generates two CSV files and attempts to push them into an existing Excel workbook with pandas and openpyxl. The initial code tries to read the workbook and then append CSV content as new sheets. However, the run fails with a DNS error caused by a malformed connection string.

Failed to resolve 'odsblobcontainer.blob.odsblobcontainer.blob.core.windows.net'

This happens because the connection string format is incorrect, producing a duplicated hostname in the blob endpoint. The SDK can’t resolve the resulting URL, so the download step for the Excel blob never even starts.
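
To make the failure concrete, here is roughly how the SDK composes the blob endpoint from the connection string fields. This is a simplified sketch of the composition, not the SDK's actual code:

account_name = "MyAccount"
endpoint_suffix = "myendpoint.blob.core.windows.net/myproj"  # malformed: full hostname plus path

# The blob endpoint is assembled as <account>.blob.<suffix>, so a suffix that
# already contains a hostname yields a doubled, unresolvable address.
blob_endpoint = f"https://{account_name}.blob.{endpoint_suffix}"
print(blob_endpoint)
# https://MyAccount.blob.myendpoint.blob.core.windows.net/myproj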

Problematic example

The following snippet illustrates the behavior. It targets Azure Blob Storage, reads one workbook and two CSVs, and tries to write tab1 and tab5, then re-upload the file. The configuration error sits in the connection string.

from azure.storage.blob import BlobServiceClient
import pandas as pd
from io import BytesIO
from datetime import date

run_stamp = date.today().strftime("%Y%m%d")

conn_str = "DefaultEndpointsProtocol=https;AccountName=MyAccount;AccountKey=accountkey;EndpointSuffix=myendpoint.blob.core.windows.net/myproj"

# Also invalid: container names cannot contain "/"; the folder segment
# belongs in the blob name instead (handled correctly in the working solution).
cont_id = "MyContainer/Prj1"
xls_blob = "My Excel.xlsx"
csv_blob_a = "My_csv1_" + run_stamp + ".csv"
csv_blob_b = "My_csv2_" + run_stamp + ".csv"

svc = BlobServiceClient.from_connection_string(conn_str)

xls_client = svc.get_blob_client(container=cont_id, blob=xls_blob)
xls_mem = BytesIO(xls_client.download_blob().readall())

csv_client_a = svc.get_blob_client(container=cont_id, blob=csv_blob_a)
buf_a = BytesIO(csv_client_a.download_blob().readall())
frame_a = pd.read_csv(buf_a)

csv_client_b = svc.get_blob_client(container=cont_id, blob=csv_blob_b)
buf_b = BytesIO(csv_client_b.download_blob().readall())
frame_b = pd.read_csv(buf_b)

with pd.ExcelWriter(xls_mem, mode="a", engine="openpyxl") as xls_writer:
    frame_a.to_excel(xls_writer, sheet_name="tab1", index=False)
    frame_b.to_excel(xls_writer, sheet_name="tab5", index=False)

xls_mem.seek(0)
xls_client.upload_blob(xls_mem, overwrite=True)
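
Note that even with connectivity fixed, this snippet hides a second problem: pd.ExcelWriter with mode="a" always appends new sheets at the end of the workbook, and by default raises a ValueError when a sheet with the same name already exists (if_sheet_exists defaults to "error"). On reruns you would need something like the variant below (if_sheet_exists="replace" requires pandas 1.3 or newer), and even then pandas offers no control over sheet position, which is why the working solution switches to openpyxl:

with pd.ExcelWriter(xls_mem, mode="a", engine="openpyxl", if_sheet_exists="replace") as xls_writer:
    frame_a.to_excel(xls_writer, sheet_name="tab1", index=False)
    frame_b.to_excel(xls_writer, sheet_name="tab5", index=False)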

Why it fails

The DNS error stems from how the endpoint is built from the connection string. EndpointSuffix is meant to hold only the bare DNS suffix, but here it carries a full blob hostname plus a path, so the SDK concatenates the account name onto an already-complete hostname and produces the duplicated address seen in the error. The correct connection string format is simple and must follow this shape exactly:

DefaultEndpointsProtocol=https;AccountName=<Your storage account name>;AccountKey=<Your storage account key>;EndpointSuffix=core.windows.net

Once the connection string is valid, the SDK resolves the endpoint properly and the rest of the workflow—downloading the workbook, writing CSVs as new sheets, and uploading back—can proceed.
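
A quick sanity check before running the full pipeline is to build the client and print the endpoint it resolved; url and account_name are standard BlobServiceClient attributes:

from azure.storage.blob import BlobServiceClient

conn_str = "DefaultEndpointsProtocol=https;AccountName=<storage account>;AccountKey=<key>;EndpointSuffix=core.windows.net"
svc = BlobServiceClient.from_connection_string(conn_str)

# Expect https://<storage account>.blob.core.windows.net/ --
# one hostname, no duplicated segments, no path baked in.
print(svc.url)
print(svc.account_name)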

Working solution

The code below connects with a properly formatted connection string, reads the workbook via openpyxl, loads the CSVs into pandas DataFrames, removes any preexisting tab1 and tab5, writes fresh content for both, reorders the sheets so that tab1 is first and tab5 is penultimate, and uploads the revised Excel to the same blob.

from azure.storage.blob import BlobServiceClient
import pandas as pd
from io import BytesIO
from openpyxl import load_workbook
from openpyxl.utils.dataframe import dataframe_to_rows
from datetime import date

run_stamp = date.today().strftime("%Y%m%d")
conn_str = "DefaultEndpointsProtocol=https;AccountName=<storage account>;AccountKey=<key>;EndpointSuffix=core.windows.net"
cont_id = "<container name>"
base_path = "prj1/"
xls_blob = base_path + "My Excel.xlsx"
csv_blob_a = base_path + f"My_csv1_{run_stamp}.csv"
csv_blob_b = base_path + f"My_csv2_{run_stamp}.csv"

svc = BlobServiceClient.from_connection_string(conn_str)
xls_client = svc.get_blob_client(container=cont_id, blob=xls_blob)

xls_stream = BytesIO(xls_client.download_blob().readall())
wb = load_workbook(xls_stream)

def load_csv(blob_name):
    # Download a CSV blob fully into memory and parse it with pandas
    obj = svc.get_blob_client(container=cont_id, blob=blob_name)
    mem = BytesIO(obj.download_blob().readall())
    return pd.read_csv(mem)

frame_a = load_csv(csv_blob_a)
frame_b = load_csv(csv_blob_b)

# Drop any stale copies of the target sheets before re-adding them
for nm in ["tab1", "tab5"]:
    if nm in wb.sheetnames:
        wb.remove(wb[nm])

# create_sheet appends at the end; the reorder step below fixes the positions
ws1 = wb.create_sheet("tab1")
ws5 = wb.create_sheet("tab5")

# Write each DataFrame into its sheet, header row first
for row in dataframe_to_rows(frame_a, index=False, header=True):
    ws1.append(row)

for row in dataframe_to_rows(frame_b, index=False, header=True):
    ws5.append(row)

# Rebuild the sheet order: tab1 first, the remaining sheets in their original
# order, then tab5 just before the original last sheet. Assumes the workbook
# still holds at least one other sheet, so order[-1] exists.
order = wb.sheetnames.copy()
order.remove("tab1")
order.remove("tab5")
final_order = ["tab1"] + order[:-1] + ["tab5", order[-1]]
# _sheets is a private openpyxl attribute; a public alternative follows below
wb._sheets = [wb[s] for s in final_order]

# Serialize the workbook to memory and overwrite the original blob
out_buf = BytesIO()
wb.save(out_buf)
out_buf.seek(0)
xls_client.upload_blob(out_buf, overwrite=True)
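
Assigning to wb._sheets works, but it touches a private openpyxl attribute. To stay on the public API, Workbook.move_sheet shifts a sheet by an offset and Workbook.index reports its current position; the sketch below produces the same ordering, under the same assumption that at least one other sheet remains in the workbook:

# Move tab1 to the front, then slot tab5 into the penultimate position.
wb.move_sheet("tab1", offset=-wb.index(wb["tab1"]))
wb.move_sheet("tab5", offset=(len(wb.sheetnames) - 2) - wb.index(wb["tab5"]))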

Why this matters

Storage connectivity issues are easy to misdiagnose when the error shows up far from the actual cause. A malformed connection string manifests as a DNS failure that looks like a network glitch, but the fix is in configuration, not infrastructure. Getting this right is critical when your orchestration needs deterministic placement of sheets and repeatable updates to existing workbooks in Blob Storage.

Takeaways

Validate the connection string format before anything else. Once connectivity is correct, explicitly control worksheet creation and ordering so that tab1 becomes the first sheet and tab5 lands just before the last sheet. Keeping the flow in-memory via BytesIO with pandas and openpyxl is a straightforward way to update workbooks stored in Azure Blob Storage without temporary files. With these pieces aligned, the pipeline cleanly merges CSV outputs into the Excel artifact in the required positions.
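
As a final guard, a few lines of plain Python can reject a malformed connection string before any SDK call is made. The helper below is a minimal sketch; check_conn_str and its checks are illustrative, not part of the Azure SDK:

def check_conn_str(conn_str):
    # Split on the first "=" only: account keys are base64 and may end in "=".
    parts = dict(p.split("=", 1) for p in conn_str.split(";") if p)
    for key in ("DefaultEndpointsProtocol", "AccountName", "AccountKey", "EndpointSuffix"):
        if key not in parts:
            raise ValueError(f"connection string is missing {key}")
    # The suffix must be a bare DNS suffix such as core.windows.net:
    # no scheme, no path, and no service hostname baked in.
    suffix = parts["EndpointSuffix"]
    if "/" in suffix or ".blob." in suffix:
        raise ValueError(f"EndpointSuffix looks malformed: {suffix}")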