2025, Dec 18 11:00

How to centralize Python media file extensions with a shared module or a YAML configuration

Learn how to centralize Python media file extensions with a single source of truth: compare a shared module vs a YAML configuration, with code to scan directories.

When multiple Python scripts scan the current working directory for media files, hardcoding dozens of extensions in each script quickly becomes a maintenance problem. The moment a new type appears, you’re copy-pasting changes across files. Centralizing this knowledge in one place is the obvious optimization. Below is a practical look at two ways to do it and how to wire them up cleanly.

Baseline: a shared Python module with tuples of extensions

A common starting point is a separate module that stores the file type lists, and each script imports it. The structure is simple: a small configuration file with tuples of extensions and a script that resolves the module, loads it, and filters files in the CWD.

C:\code\script1.py
C:\code\SYS_python\media_file_types.py

(base) P:\aaa\bbb\CWD> python c:\code\script1.py

Example configuration module:

# C:\code\SYS_python\media_file_types.py
pic_kinds = ('jpg', 'jpeg', 'jfif', 'gif')
clip_kinds = ('mov', 'lrv', 'thm', 'xml')
sound_kinds = ('3gpp', 'aiff', 'pcm', 'aac')

Example import and usage in a script:

import os
import sys

# The config module lives in a SYS_python folder next to this script.
cfg_path = os.path.join(os.path.dirname(__file__), 'SYS_python')
cfg_file = os.path.join(cfg_path, 'media_file_types.py')

if not os.path.exists(cfg_file):
    print(f"Error: Config file '{cfg_file}' is missing.")
    sys.exit(1)

# Make the config directory importable, then load the shared definitions.
sys.path.append(cfg_path)
try:
    import media_file_types as media_defs

    # Normalize once: lowercase with a leading dot, as tuples for str.endswith.
    pic_exts = tuple(f".{e.lower()}" for e in media_defs.pic_kinds)
    clip_exts = tuple(f".{e.lower()}" for e in media_defs.clip_kinds)
    sound_exts = tuple(f".{e.lower()}" for e in media_defs.sound_kinds)
    all_exts = pic_exts + clip_exts + sound_exts  # every known media extension
except ImportError:
    print(f"Error: Unable to import 'media_file_types' from '{cfg_path}'.")
    sys.exit(1)
except AttributeError as exc:
    print(f"Error: Expected attributes missing in 'media_file_types': {exc}")
    sys.exit(1)

# List the CWD once, then filter the same names against each tuple.
names = os.listdir('.')
pic_list = [n for n in names if n.lower().endswith(pic_exts)]
clip_list = [n for n in names if n.lower().endswith(clip_exts)]
sound_list = [n for n in names if n.lower().endswith(sound_exts)]

print(f"Found {len(pic_list)} photo files.")
print(f"Found {len(clip_list)} video files.")
print(f"Found {len(sound_list)} audio files.")

What’s going on here and why it works

The configuration lives in a dedicated module that lists file extensions as tuples. The script builds the path to that module's directory, checks that the file exists, appends the directory to sys.path, imports the module, normalizes each extension to lowercase with a dot prefix, and collects files by passing the appropriate tuple to str.endswith, which accepts a tuple of suffixes. It reports counts and exits early if the configuration is not present or is missing expected attributes. The result is a single source of truth for all scripts that need the same media taxonomy.
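As a variation on the import step described above, the module can also be loaded from its file path with importlib, which leaves sys.path untouched. This is a minimal sketch and not the method used in the script above; the helper names load_media_defs and normalize are hypothetical, and it assumes the module exposes tuples such as pic_kinds.

```python
import importlib.util
from pathlib import Path


def load_media_defs(cfg_file):
    """Load a config module from an explicit file path (hypothetical helper)."""
    cfg_file = Path(cfg_file)
    if not cfg_file.is_file():
        raise FileNotFoundError(f"Config file '{cfg_file}' is missing.")
    spec = importlib.util.spec_from_file_location(cfg_file.stem, cfg_file)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # executes the file; sys.path is not modified
    return module


def normalize(kinds):
    """Return lowercase, dot-prefixed extensions as a tuple for str.endswith."""
    return tuple(f".{k.lower().lstrip('.')}" for k in kinds)


# Usage (the path is an assumption matching the layout shown earlier):
# defs = load_media_defs(Path(__file__).parent / "SYS_python" / "media_file_types.py")
# pic_exts = normalize(defs.pic_kinds)
```

A missing attribute such as defs.pic_kinds still raises AttributeError, so the same error handling as in the script applies.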

Alternative: a YAML configuration with a small accessor class

The same “single source of truth” can be expressed as a .yaml file. The definitions move out of Python code and into a simple data document, while scripts query a helper class for the extensions they need and then scan directories.

Example YAML outline:

media_types:
  photo:
    - jpg
    ...
  video:
    ...

Example accessor class and usage:

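import os
from pathlib import Path

import yaml  # third-party PyYAML; the loading step is an assumption, not shown in the original


class MediaRegistry:
    def __init__(self, config_path):
        # Assumed initialization: read the YAML once and keep the media_types mapping.
        with open(config_path, encoding='utf-8') as fh:
            data = yaml.safe_load(fh) or {}
        self.media_types = data.get('media_types', {})

    def gather_ext(self, kind, include_dot=True):
        # Unknown or empty kinds yield an empty tuple, so a scan matches nothing.
        exts = self.media_types.get(kind.lower()) or []
        if include_dot:
            return tuple(f".{x.lower()}" for x in exts)
        return tuple(x.lower() for x in exts)

    def collect_files(self, kind, root='.'):
        exts = self.gather_ext(kind)
        return [p for p in os.listdir(root) if p.lower().endswith(exts)]


manifest_path = Path(__file__).parent / 'SYS_python' / 'media_config.yaml'
registry = MediaRegistry(manifest_path)  # initialization is required
files_photo = registry.collect_files(kind='photo')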

The definitions are kept in a single YAML file. The class exposes two focused operations: gather_ext returns the extensions for a requested media type, with or without a leading dot, and collect_files scans a directory using those extensions. The YAML file lives in the SYS_python folder next to your scripts, and the class has to load it into media_types on initialization before either method can be used.
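To make the lookup step concrete, here is a small sketch of it on its own. It assumes the YAML has already been parsed into a plain dict shaped like the outline above (for example with PyYAML's yaml.safe_load); the sample entries and the function name extensions_for are illustrative and not part of the original configuration.

```python
# Illustrative stand-in for the parsed YAML document; the entries are sample data.
parsed = {"media_types": {"photo": ["jpg", "JPEG"], "video": ["mov"]}}


def extensions_for(config, kind, include_dot=True):
    """Return normalized extensions for one media type, or () if it is unknown."""
    exts = config.get("media_types", {}).get(kind.lower()) or []
    prefix = "." if include_dot else ""
    return tuple(f"{prefix}{e.lower()}" for e in exts)


print(extensions_for(parsed, "photo"))                     # ('.jpg', '.jpeg')
print(extensions_for(parsed, "photo", include_dot=False))  # ('jpg', 'jpeg')
print(extensions_for(parsed, "audio"))                     # ()
```

An unknown type returns an empty tuple, and str.endswith with an empty tuple is always False, so scanning for an undefined type finds nothing rather than raising an error.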

Why this is worth knowing

Both approaches consolidate media type knowledge so you edit one file instead of many scripts. The first uses a Python module with tuples and explicit import checks. The second expresses the same idea through a data file and a thin layer that provides get-and-find methods. Either way, you avoid duplication and make it easier to add new extensions consistently across multiple entry points.

Practical takeaway

If you already have modules shared across scripts, keeping the tuples in a dedicated module and importing them with basic validation works and is easy to wire up. If you prefer a data-only source that scripts query through a small helper, placing the lists in a .yaml file and reading them through a class offers the same centralization. Use one configuration file, normalize extensions once, and have all scripts consume that shared definition rather than redefining it.
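The "normalize once" advice can be sketched as a single directory pass that maps each file's suffix to its media type. This is one possible arrangement rather than the approach used in the scripts above; the function names and the sample groups are assumptions for illustration.

```python
import os
from collections import defaultdict


def build_lookup(groups):
    """Map each normalized extension ('.jpg') to its media type name."""
    lookup = {}
    for kind, exts in groups.items():
        for ext in exts:
            lookup[f".{ext.lower().lstrip('.')}"] = kind
    return lookup


def classify(names, lookup):
    """Group file names by media type using the final suffix of each name."""
    found = defaultdict(list)
    for name in names:
        suffix = os.path.splitext(name)[1].lower()
        if suffix in lookup:
            found[lookup[suffix]].append(name)
    return dict(found)


# Sample groups for illustration; real values would come from the shared config.
lookup = build_lookup({"photo": ("jpg", "gif"), "video": ("mov",)})
print(classify(["a.JPG", "b.mov", "notes.txt"], lookup))
# {'photo': ['a.JPG'], 'video': ['b.mov']}
```

To scan the current working directory, pass os.listdir('.') as the list of names. The lookup is built once per run, so every script that shares it classifies files the same way.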