2025, Sep 24 21:00

How to Prevent Checkmarx SSRF Findings: Build Safe Python URLs Without Breaking CI/CD

Stop Checkmarx and SAST SSRF flags by building URLs safely in Python. See two patterns: urljoin+quote for encoding, and a clean requests.Session with a base URL.

Static analysis tools like Checkmarx are good at spotting risky taint flows, and they will fail a CI/CD gate when user input is interpolated into a URL. A common case is building a request path from a supposedly benign identifier. Even after strict UUID checks, placing a variable directly into a URL path can be flagged as potential SSRF. Below is a minimal pattern that triggers such a finding, followed by two ways to address it without changing the application’s behavior.

Problem setup

The application queries an internal REST endpoint to fetch a file by its UUIDv4. The hostname comes from configuration, the identifier is validated rigorously, and then used to construct the URL. Despite the checks, the direct string interpolation is reported as an SSRF risk in the CI/CD gate.

import os
import re
import uuid
import validators
import requests

UUID_V4_PATTERN = r'^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$'


def grab_document(doc_key):
    if not re.match(UUID_V4_PATTERN, doc_key):
        raise ValueError('invalid file_id')

    if not validators.uuid(doc_key):
        raise ValueError('invalid file_id')

    try:
        doc_key = str(uuid.UUID(doc_key))
    except Exception:
        raise ValueError('invalid file_id')

    svc_host = os.getenv('HOSTNAME', None)

    if svc_host is None:
        # A missing host is a configuration error, not a bad identifier.
        raise ValueError('invalid hostname')

    endpoint = f'https://{svc_host}/v1/files/{doc_key}'

    resp = requests.get(endpoint)

    if resp.status_code == 200:
        pass

Why this is flagged

From the scanner’s perspective, a user-controlled value flows into a URL string that is then passed to an HTTP request. This direct flow is a classic SSRF pattern. The scanner does not recognize the UUIDv4 checks as a sanitizer, so the value is still considered tainted when it reaches the request call, which is the sink. The issue is not the UUID format, but the interpolation of external input into a request URL.
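
To make the concern concrete, the sketch below skips validation entirely and parses the resulting URL with urllib.parse.urlsplit. The host files.internal.example and the helper naive_url are placeholders for illustration and are not part of the original code.

```python
from urllib.parse import urlsplit


def naive_url(host, doc_key):
    # Same interpolation as in the flagged code, with no validation in front of it.
    return f'https://{host}/v1/files/{doc_key}'


# Hypothetical host and a hostile "identifier" that smuggles in a query and a fragment.
parts = urlsplit(naive_url('files.internal.example', 'abc?admin=1#frag'))

print(parts.path)      # /v1/files/abc
print(parts.query)     # admin=1
print(parts.fragment)  # frag
```

The UUID checks keep input like this away from the f-string at runtime. The finding is about the shape of the data flow, not about a demonstrated exploit.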

Two safe construction patterns

There are two straightforward ways to restructure the code so that the request is built in a more controlled manner. Both keep the same validation logic and behavior; they differ only in how the final URL is assembled and handed to the HTTP call.

The first approach uses explicit URL construction primitives that properly encode and join path segments.

import os
import re
import uuid
import validators
import requests
from urllib.parse import urljoin, quote

UUID_V4_PATTERN = r'^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$'


def fetch_asset(asset_uuid):
    if not re.match(UUID_V4_PATTERN, asset_uuid):
        raise ValueError('invalid file_id')

    if not validators.uuid(asset_uuid):
        raise ValueError('invalid file_id')

    try:
        normalized_uuid = str(uuid.UUID(asset_uuid))
    except Exception:
        raise ValueError('invalid file_id')

    cfg_host = os.getenv('HOSTNAME', None)
    if cfg_host is None:
        raise ValueError('invalid hostname')

    base_root = f'https://{cfg_host}/v1/files/'
    encoded_uuid = quote(normalized_uuid, safe='')
    target_url = urljoin(base_root, encoded_uuid)

    resp = requests.get(target_url)

    if resp.status_code == 200:
        return resp
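
The effect of quote with safe='' is easiest to see on input that is not a UUID. The following is a minimal sketch that makes no network calls; the host files.internal.example and the helper build_file_url are assumptions made for illustration.

```python
from urllib.parse import quote, urljoin

# Placeholder base URL. The trailing slash matters: without it,
# urljoin would replace the last path segment ('files') instead of appending.
BASE = 'https://files.internal.example/v1/files/'


def build_file_url(base, segment):
    # safe='' also encodes '/', so the value can only ever be one path segment.
    return urljoin(base, quote(segment, safe=''))


# A canonical UUID contains only unreserved characters and passes through unchanged.
print(build_file_url(BASE, '123e4567-e89b-42d3-a456-426614174000'))

# A value that tries to switch hosts is encoded and stays under /v1/files/.
print(build_file_url(BASE, '//evil.example/x'))

# For comparison: the same value joined without encoding replaces the host.
print(urljoin(BASE, '//evil.example/x'))
```

Joined raw, a value starting with // is treated by urljoin as a new host, which is why the encoded form is joined rather than the original string.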

The second approach composes a base URL and issues the request through a requests.Session. The identifier is still appended to the base with an f-string, but in practice this variant has resolved the pipeline issue.

import os
import re
import uuid
import validators
import requests

UUID_V4_PATTERN = r'^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$'


def retrieve_file(item_id):
    if not re.match(UUID_V4_PATTERN, item_id):
        raise ValueError('invalid file_id')

    if not validators.uuid(item_id):
        raise ValueError('invalid file_id')

    try:
        clean_id = str(uuid.UUID(item_id))
    except Exception:
        raise ValueError('invalid file_id')

    env_host = os.getenv('HOSTNAME', None)
    if env_host is None:
        raise ValueError('invalid hostname')

    api_base = f'https://{env_host}/v1/files'

    # The context manager closes the session and its pooled connections afterwards.
    with requests.Session() as sess:
        resp = sess.get(f"{api_base}/{clean_id}")

    if resp.status_code == 200:
        return resp
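
requests.Session has no built-in notion of a base URL, so the composition can be wrapped in a small subclass. The sketch below is one possible arrangement, not the code from the original answer: FilesApiSession, url_for, and get_file are hypothetical names, the host is a placeholder, and the timeout value is an assumption.

```python
import requests
from urllib.parse import quote, urljoin


class FilesApiSession(requests.Session):
    """Hypothetical Session subclass that resolves identifiers against one fixed base URL."""

    def __init__(self, base_url):
        super().__init__()
        # Keep exactly one trailing slash so urljoin appends instead of replacing 'files'.
        self.base_url = base_url.rstrip('/') + '/'

    def url_for(self, file_id):
        # Encode the identifier as a single path segment before joining.
        return urljoin(self.base_url, quote(file_id, safe=''))

    def get_file(self, file_id, timeout=10):
        # Assumed timeout; requests does not time out unless one is set explicitly.
        return self.get(self.url_for(file_id), timeout=timeout)


# Building the URL does not touch the network.
session = FilesApiSession('https://files.internal.example/v1/files')
print(session.url_for('123e4567-e89b-42d3-a456-426614174000'))
```

The caller still validates the UUID first. The subclass only centralizes how the path is attached to the base, so no call site builds a URL by hand.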

What changed and why it helps

In both variants the input validation remains identical. The difference is how the final URL is assembled and sent: the first encodes the identifier with quote and joins it to a fixed base with urljoin, while the second composes a base endpoint and issues the request through a Session. Either way the scanner no longer sees the pattern it flags, and the outgoing request is the same.

Why you should care

SSRF findings are typically rated high severity in SAST and can block releases even for internal-only traffic. Adjusting how you build URLs keeps pipelines green without relaxing checks. It also makes the intent of the code clearer: the path segment is treated as data, not as a free-form URL.

Takeaways

Validate the identifier as strictly as needed, then build the request URI using safe primitives or a Session-based flow rather than raw string interpolation. This preserves functionality, reduces perceived SSRF risk in static analysis, and prevents your CI/CD from stalling on a false-positive taint pattern.
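
As an illustration of strict validation, the three checks used in the functions above can be approximated with the standard library alone. parse_uuid4 is a hypothetical helper, not code from the original question, and it drops the third-party validators dependency; whether that trade-off is acceptable depends on your project.

```python
import uuid


def parse_uuid4(value):
    """Return the canonical UUIDv4 string, or raise ValueError."""
    if not isinstance(value, str):
        raise ValueError('invalid file_id')
    try:
        parsed = uuid.UUID(value)
    except ValueError:
        raise ValueError('invalid file_id') from None
    # uuid.UUID also accepts braces, 'urn:uuid:' prefixes, and uppercase hex,
    # so require version 4 and an exact match with the canonical lowercase form.
    if parsed.version != 4 or str(parsed) != value:
        raise ValueError('invalid file_id')
    return str(parsed)


print(parse_uuid4('123e4567-e89b-42d3-a456-426614174000'))
```

Like the lowercase-only regular expression in the article, this rejects uppercase input instead of normalizing it.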

The article is based on a question from StackOverflow by hivegu and an answer by Mahrez BenHamad.