2025, Nov 24 19:00
How to Keep Your Python sdist Clean with setuptools-scm: Exclude .gitignore and .github via MANIFEST.in
Learn to control setuptools-scm sdist contents with MANIFEST.in: exclude .gitignore, prune .github, keep pyproject.toml, and fix package discovery patterns.
When you build a Python package with setuptools and setuptools-scm, it’s easy to end up with more files in the source distribution than you intended. Typical culprits are repository-only artifacts like .gitignore and the .github directory. If your goal is a clean sdist that contains just package code and the essentials, there’s a precise way to get there without breaking your build.
Reproducing the issue
Consider a small project where you invoke the build with python -m build. The repository is straightforward and includes a package, docs files, and some VCS-related housekeeping.
repo_root
├── .gitignore
├── .github
├── pyproject.toml
├── README.rst
├── LICENSE
└── toolkit
    ├── __init__.py
    └── module.py
The packaging configuration specifies setuptools, setuptools-scm, and wheel, and asks setuptools to discover packages using a pattern.
[build-system]
requires = [
    "setuptools >= 65.5.1",
    "setuptools-scm",
    "wheel"
]
build-backend = "setuptools.build_meta"
[tool.setuptools.package-data]
"*" = [
    "README.rst",
    "LICENSE"
]
[tool.setuptools.packages.find]
include = [
    "toolkit.*"
]
After building, the generated SOURCES.txt pulls in .gitignore and the .github directory, which then appear in the dist artifacts.
.gitignore
LICENSE
README.rst
pyproject.toml
.github/dependabot.yml
.github/pull_request_template.md
.github/ISSUE_TEMPLATE/99_any.md
.github/workflows/testing.yml
toolkit.egg-info/PKG-INFO
toolkit.egg-info/SOURCES.txt
toolkit.egg-info/dependency_links.txt
toolkit.egg-info/requires.txt
toolkit.egg-info/top_level.txt
toolkit/__init__.py
toolkit/module.py
Why it happens
setuptools-scm registers a file finder with setuptools, so once it is part of the build environment, every file under version control is included in the sdist by default. That is exactly what you are seeing: files tracked by your VCS are treated as part of the source distribution unless you explicitly say otherwise. This behavior is by design. If you want a narrower set, you can’t rely on implicit filters; you must declare exclusions.
There’s another subtle trap that sometimes confuses package discovery. An include pattern such as toolkit.* matches only subpackages (names that begin with "toolkit."), not the top-level toolkit package itself. If discovery appears broken and your base package isn’t recognized, change the pattern to toolkit*. This is a separate concern from the file selection policy and has nothing to do with why .gitignore or .github end up in your sdist.
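The difference between the two patterns can be checked with a minimal sketch. It assumes setuptools compares each dotted package name against the include patterns using shell-style globbing, which the standard library's fnmatch approximates here. The names toolkit.sub and toolkit_extras are hypothetical and are not part of the example project.

```python
from fnmatch import fnmatchcase


def selected(packages, include):
    """Return the package names matched by at least one include pattern."""
    return [
        name
        for name in packages
        if any(fnmatchcase(name, pattern) for pattern in include)
    ]


# Hypothetical candidates: the base package, a subpackage, and an
# unrelated package that merely shares the "toolkit" prefix.
candidates = ["toolkit", "toolkit.sub", "toolkit_extras"]

only_subpackages = selected(candidates, ["toolkit.*"])
with_base_package = selected(candidates, ["toolkit*"])

print(only_subpackages)   # the base package "toolkit" is missing
print(with_base_package)  # the base package is included
```

Note that toolkit* also matches any sibling whose name merely starts with "toolkit". If that is too broad, listing both "toolkit" and "toolkit.*" is a stricter alternative.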
The fix: MANIFEST.in with exclude and prune
When you must use setuptools-scm, the way to control what doesn’t ship is to add a MANIFEST.in file in the repository root. Use exclude to drop specific files and prune to drop entire directories or directory trees. Arguments support globs, so you can be very targeted or broad depending on your needs.
exclude .gitignore
prune .github
Place this MANIFEST.in at the top level of the repo. The arguments can be exact names, as above, or glob patterns such as exclude foo/*.py[co]. The available directives and their options are documented at https://setuptools.pypa.io/en/latest/userguide/miscellaneous.html.
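To build intuition for what the two directives do, here is a simplified model in Python. It is not how setuptools implements MANIFEST.in. It only assumes that exclude patterns are anchored at the project root and that a wildcard does not cross a path separator, while prune drops everything beneath the named directory. The function names are hypothetical.

```python
from fnmatch import fnmatchcase


def matches_anchored(path, pattern):
    """Match a root-anchored glob segment by segment, so '*' never crosses '/'."""
    path_parts = path.split("/")
    pattern_parts = pattern.split("/")
    return len(path_parts) == len(pattern_parts) and all(
        fnmatchcase(part, pat) for part, pat in zip(path_parts, pattern_parts)
    )


def apply_manifest(files, excludes=(), prunes=()):
    """Simplified model of 'exclude' and 'prune' applied to a list of files."""
    kept = []
    for path in files:
        if any(matches_anchored(path, pattern) for pattern in excludes):
            continue  # dropped by an 'exclude' line
        if any(path.startswith(directory.rstrip("/") + "/") for directory in prunes):
            continue  # dropped by a 'prune' line
        kept.append(path)
    return kept


# A subset of the tracked files from the listing above.
tracked = [
    ".gitignore",
    "LICENSE",
    "README.rst",
    "pyproject.toml",
    ".github/dependabot.yml",
    ".github/workflows/testing.yml",
    "toolkit/__init__.py",
    "toolkit/module.py",
]

remaining = apply_manifest(tracked, excludes=[".gitignore"], prunes=[".github"])
print(remaining)
```

Under this model, exclude .gitignore removes only the file in the repository root, while prune .github removes every file below that directory regardless of depth.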
What the result should look like
With MANIFEST.in in place, the sdist stops bundling repository-only content and keeps the packaging essentials. Leave pyproject.toml in the archive: it tells build front ends such as pip which backend to use and how to build the package. This project has no setup.py, so an sdist without pyproject.toml could not be built or installed.
LICENSE
README.rst
pyproject.toml
toolkit.egg-info/PKG-INFO
toolkit.egg-info/SOURCES.txt
toolkit.egg-info/dependency_links.txt
toolkit.egg-info/requires.txt
toolkit.egg-info/top_level.txt
toolkit/__init__.py
toolkit/module.py
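Rather than reading SOURCES.txt by eye, the built archive can be checked directly. The sketch below uses only the standard library's tarfile module and assumes the usual sdist layout, in which every entry sits under a single top-level "name-version/" directory. The function names and the default list of unwanted paths are illustrative choices, not part of any packaging tool.

```python
import tarfile


def sdist_members(sdist_path):
    """Return file paths inside the sdist, relative to its top-level folder."""
    with tarfile.open(sdist_path, "r:gz") as archive:
        names = [member.name for member in archive.getmembers() if member.isfile()]
    # Strip the leading "<name>-<version>/" component from each entry.
    return [name.split("/", 1)[1] for name in names if "/" in name]


def find_unwanted(sdist_path, banned=(".gitignore", ".github")):
    """List archive entries that equal, or live under, one of the banned paths."""
    return [
        rel
        for rel in sdist_members(sdist_path)
        if any(rel == item or rel.startswith(item + "/") for item in banned)
    ]


def has_pyproject(sdist_path):
    """Check that pyproject.toml survived at the root of the sdist."""
    return "pyproject.toml" in sdist_members(sdist_path)
```

After python -m build, calling find_unwanted on the .tar.gz file in dist/ should return an empty list, and has_pyproject should return True. Both checks are easy to wire into a CI step so that a regression in MANIFEST.in is caught before release.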
A note on simplifying the setup
If you don’t actually need setuptools-scm, a leaner configuration without it will avoid sweeping every tracked file into the sdist. In that case you can drop the tool.setuptools.* sections as well, since README and LICENSE are included by default. The archive will still contain pyproject.toml, and it won’t pull in .github or other unrelated files from your repo.
Why this matters
Keeping source distributions tight and intentional avoids shipping CI workflows, templates, and other repository metadata to your users. It produces cleaner artifacts and reduces confusion for consumers of your package. Ensuring pyproject.toml is present also safeguards installability across tools that rely on it to orchestrate the build.
Takeaways
When setuptools-scm is part of your toolchain, assume that everything under version control will be included unless you say otherwise. Put MANIFEST.in in the repository root, use exclude for single files and prune for directories, and keep pyproject.toml in the archive. If package discovery looks off, confirm your include pattern covers the base package, for example toolkit* rather than toolkit.*. With these adjustments, your sdist will contain exactly what it should—and nothing it shouldn’t.