2025, Nov 25 15:00

Tightly Crop PDF Diagrams to SVG Using PyMuPDF: Export Only Vector Content, Remove Page Margins

Crop PDF to SVG in PyMuPDF (fitz): detect vector drawings, union their bounds, set the crop box, and export a clean, tight diagram without page margins.

Extracting line diagrams from a PDF and converting them to SVG often yields an oversized canvas with unwanted margins. When the goal is a tight, borderless SVG focused on the vector content itself, exporting the full page is overkill and inconvenient.

Problem overview

You already have a working export to SVG using PyMuPDF (fitz), but it captures the entire page, surrounding margins included. The intent is to crop to the diagram’s bounds and save only that area as SVG.

Minimal example that reproduces the issue

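import fitz  # PyMuPDF

pdf_path = "input.pdf"  # placeholder: path to your PDF
pdf_obj = fitz.open(pdf_path)
first_page = pdf_obj[0]
# With no crop applied, the SVG covers the whole page.
svg_data = first_page.get_svg_image(matrix=fitz.Matrix(1, 1))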

This exports a full-page SVG, including borders you don’t need.

What actually causes the oversized SVG

The export relies on the page’s current canvas. If you don’t alter the page’s visible area, the SVG covers everything inside the original page box, including empty margins and border graphics. To remove that extra space, the page region must be narrowed down to the union of all vector drawings on the page before exporting.

Solution: crop to the union of vector drawings and export

The approach is straightforward: collect drawing objects from the page, aggregate their bounding rectangles, slightly pad the bounds to avoid clipping, set the crop box to this rectangle, and then export to SVG.
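Before the PyMuPDF version, the box arithmetic behind this approach can be sketched in plain Python. This is only an illustration under stated assumptions: boxes are plain (x0, y0, x1, y1) tuples in the same order PyMuPDF's Rect uses, and the helper names (union, pad, clamp, tight_box) as well as the sample coordinates are made up for this example.

```python
from functools import reduce

# A box is (x0, y0, x1, y1), mirroring the coordinate order of PyMuPDF's Rect.
# All helper names here are hypothetical and exist only for this sketch.


def union(a, b):
    """Smallest box that contains both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def pad(box, margin):
    """Grow the box by `margin` on every side."""
    x0, y0, x1, y1 = box
    return (x0 - margin, y0 - margin, x1 + margin, y1 + margin)


def clamp(box, page):
    """Keep the box inside the page so the crop never extends past it."""
    return (
        max(box[0], page[0]),
        max(box[1], page[1]),
        min(box[2], page[2]),
        min(box[3], page[3]),
    )


def tight_box(boxes, page, margin=2):
    """Union all boxes, pad the result, and clamp it to the page."""
    if not boxes:
        return None
    return clamp(pad(reduce(union, boxes), margin), page)


page = (0, 0, 595, 842)  # sample page size in points (assumed)
boxes = [(100, 120, 300, 200), (250, 150, 400, 380), (90, 300, 180, 360)]
print(tight_box(boxes, page))  # (88, 118, 402, 382)
```

The clamp step is not spelled out in the description above; it is included here because a padded box can otherwise extend past the page edge when a drawing sits close to it.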

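import fitz  # PyMuPDF


def make_svg_trimmed(src_pdf, out_svg, pg_idx=0):
    """Crop a page to the union of its vector drawings and save it as SVG."""
    with fitz.open(src_pdf) as doc_obj:
        pg = doc_obj[pg_idx]
        # Bounding rectangle of every vector drawing on the page.
        rects = [itm["rect"] for itm in pg.get_drawings() if itm["rect"].is_valid]
        if not rects:
            print("No vector drawings found")
            return
        # Start from a copy so the first drawing's rect is left untouched.
        union_box = fitz.Rect(rects[0])
        for bx in rects[1:]:
            union_box |= bx
        # Pad by 2 points so strokes on the edge are not clipped, then
        # clamp to the visible page so the crop box cannot extend past it.
        union_box = (union_box + (-2, -2, 2, 2)) & pg.rect
        pg.set_cropbox(union_box)
        svg_txt = pg.get_svg_image(matrix=fitz.Matrix(1, 1))
    with open(out_svg, "w", encoding="utf-8") as out_f:
        out_f.write(svg_txt)
    print(f"SVG saved to {out_svg}")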

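This trims the page to the actual vector content: the canvas is restricted to the combined bounds of the drawings plus a 2-point pad, so the empty margins disappear. Note that a page frame drawn as vector graphics is itself a drawing and is included in the union; if the page has one, filter it out of the list before combining the rectangles.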
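To confirm that the export really is tight, the canvas size can be read back from the SVG text. The following is a minimal sketch using only the standard library. It assumes the root svg element carries width, height, and viewBox attributes, which is common for SVG writers but not guaranteed; the function name svg_canvas and the sample SVG string are invented for this example.

```python
import re
import xml.etree.ElementTree as ET


def svg_canvas(svg_text):
    """Return the width, height, and viewBox found on the root <svg> element."""
    # Parse from bytes so an XML declaration with an encoding is accepted.
    root = ET.fromstring(svg_text.encode("utf-8"))
    view_box = root.get("viewBox")
    numbers = None
    if view_box:
        # viewBox values may be separated by whitespace and/or commas.
        numbers = [float(v) for v in re.split(r"[\s,]+", view_box.strip())]
    return {
        "width": root.get("width"),
        "height": root.get("height"),
        "viewBox": numbers,
    }


# Hand-written stand-in for an exported SVG (not real PyMuPDF output).
sample = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="314pt" height="264pt" '
    'viewBox="0 0 314 264"><path d="M2 2H312V262H2Z"/></svg>'
)
info = svg_canvas(sample)
print(info["viewBox"])  # [0.0, 0.0, 314.0, 264.0]
```

Passing the string returned by get_svg_image before and after cropping should show the canvas shrinking from the page size to roughly the size of the padded bounding box.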

Why this matters

When working with vector diagrams, clean and tightly cropped SVGs simplify downstream use, avoid distracting page margins, and match the intended visual area. The export becomes focused on the diagram itself rather than the original PDF’s full layout.

Wrap-up

If an SVG export keeps the full PDF page, gather the page’s vector drawings, compute their combined bounding box, set the crop box to that area, and then render the SVG. This trims the canvas to the diagram and removes the margins that otherwise clutter the result.