2025, Oct 18 23:00

How to generate unique, cross-role pairs in Python and fix the off-by-one tuple slicing bug

Learn how an off-by-one tuple slicing bug let same-role matches slip into a Python pairing algorithm, and see the fix to generate unique, cross-role pairs.

Pairing people randomly sounds trivial until one constraint flips the outcome. In this case, the goal is to form pairs that have never occurred before and that do not share the same role. The implementation almost gets there, but a subtle slice error lets same-role matches slip through.

Reproducing the issue

The following implementation reads previously seen pairs from a CSV, shuffles a roster, and builds new pairs while filtering out duplicates, exclusions, and role overlaps. The catch is in how roles are extracted from each tuple.

import random
import pandas as p

# Roster of all names and their roles
roster = [
    ("Samantha Reyes", "Innovation", "Product Owner"),
    ("Ethan McAllister", "Data Scientist"),
    ("Priya Deshmukh", "Data Architect", "SMT"),
    ("Marcus Liu", "Stream 3"),
    ("Elena Petrova", "SMT", "Stream 3"),
]

# Load previously generated pairs from the CSV file
def read_seen_duos(csv_path):
    df = p.read_csv(csv_path)
    duo_cache = set()
    for _, row in df.iterrows():
        duo = (row['name1'], row['name2'])
        duo_rev = (row['name2'], row['name1'])
        duo_cache.add(duo)
        duo_cache.add(duo_rev)
    return duo_cache

# Append new pairs to the CSV file
def append_duos_to_csv(csv_path, duos):
    df = p.DataFrame(duos, columns=['name1', 'name2'])
    df.to_csv(csv_path, index=False, mode='a', header=False)

# Path to previously generated pairs
pairs_csv_path = 'prev_pairs2.csv'

# Initialize the cache of seen pairs
prior_duos = read_seen_duos(pairs_csv_path)

# Exclusions for pairing logic
banned_duos = []

def build_fresh_duos_with_trace(roster, prior_duos, banned_duos):
    banned_set = set(banned_duos) | set((a[1], a[0]) for a in banned_duos)
    max_tries = 10000
    attempts = 0

    while attempts < max_tries:
        random.shuffle(roster)
        duos = []
        engaged = set()
        leftovers = []

        for i in range(0, len(roster) - 1, 2):
            member1, member2 = roster[i], roster[i + 1]
            tags1 = set(member1[2:]) if len(member1) > 2 else set()
            tags2 = set(member2[2:]) if len(member2) > 2 else set()
            duo_key = (member1[0], member2[0])
            duo_key_rev = (member2[0], member1[0])

            if duo_key in banned_set or duo_key_rev in banned_set:
                print(f"Skipping pair due to exclusion: {member1[0]} - {member2[0]}")
                continue
            if duo_key in prior_duos or duo_key_rev in prior_duos:
                print(f"Skipping pair due to seen pair: {member1[0]} - {member2[0]}")
                continue
            if tags1 & tags2:
                print(f"Skipping pair due to role overlap: {member1[0]} ({tags1}) - {member2[0]} ({tags2})")
                continue
            if member1[0] in engaged or member2[0] in engaged:
                print(f"Skipping pair due to duplicate usage: {member1[0]} - {member2[0]}")
                continue

            duos.append((member1, member2))
            engaged.update([member1[0], member2[0]])

        leftovers = [person for person in roster if person[0] not in engaged]
        if not leftovers:
            duo_keys = set((d[0][0], d[1][0]) for d in duos)
            if not any((k in prior_duos or (k[1], k[0]) in prior_duos) for k in duo_keys):
                prior_duos.update(duo_keys)
                return duos

        attempts += 1

    print("Unable to generate unique pairs with the given restrictions.")
    print(f"Skipped individuals: {[person[0] for person in leftovers]}")
    raise ValueError("Unable to generate unique pairs with the given restrictions.")

# Generate pairs
result_duos = build_fresh_duos_with_trace(roster, prior_duos, banned_duos)

# Print the pairs
for d in result_duos:
    print(f"{d[0][0]} - {d[1][0]}")
print("-----")
print(f"Total pairs: {len(result_duos)}")

# Persist
append_duos_to_csv(pairs_csv_path, [(d[0][0], d[1][0]) for d in result_duos])

What actually goes wrong

Tuples in the roster use this structure: index 0 is the name, indices 1 and beyond are the roles. Because Python uses 0-based indexing, slicing from index 2 means “start from the third element.” In other words, the first role at index 1 is silently dropped. When the overlap check relies on these role sets, two people sharing only the first role will pass the filter and get paired together, which is exactly the behavior you observed.

The fix

Build role sets from index 1 onward so that all roles are considered. There’s no need to guard with a length check, because slicing from 1 on a one-element tuple yields an empty tuple, which becomes an empty set.

def build_fresh_duos_with_trace(roster, prior_duos, banned_duos):
    banned_set = set(banned_duos) | set((a[1], a[0]) for a in banned_duos)
    max_tries = 10000
    attempts = 0

    while attempts < max_tries:
        random.shuffle(roster)
        duos = []
        engaged = set()
        leftovers = []

        for i in range(0, len(roster) - 1, 2):
            member1, member2 = roster[i], roster[i + 1]
            # Corrected: include all roles starting from index 1
            tags1 = set(member1[1:])
            tags2 = set(member2[1:])
            duo_key = (member1[0], member2[0])
            duo_key_rev = (member2[0], member1[0])

            if duo_key in banned_set or duo_key_rev in banned_set:
                print(f"Skipping pair due to exclusion: {member1[0]} - {member2[0]}")
                continue
            if duo_key in prior_duos or duo_key_rev in prior_duos:
                print(f"Skipping pair due to seen pair: {member1[0]} - {member2[0]}")
                continue
            if tags1 & tags2:
                print(f"Skipping pair due to role overlap: {member1[0]} ({tags1}) - {member2[0]} ({tags2})")
                continue
            if member1[0] in engaged or member2[0] in engaged:
                print(f"Skipping pair due to duplicate usage: {member1[0]} - {member2[0]}")
                continue

            duos.append((member1, member2))
            engaged.update([member1[0], member2[0]])

        leftovers = [person for person in roster if person[0] not in engaged]
        if not leftovers:
            duo_keys = set((d[0][0], d[1][0]) for d in duos)
            if not any((k in prior_duos or (k[1], k[0]) in prior_duos) for k in duo_keys):
                prior_duos.update(duo_keys)
                return duos

        attempts += 1

    print("Unable to generate unique pairs with the given restrictions.")
    print(f"Skipped individuals: {[person[0] for person in leftovers]}")
    raise ValueError("Unable to generate unique pairs with the given restrictions.")

Why this detail matters

Data shape assumptions leak into control logic quickly. Here, the uniqueness and overlap constraints depend on correctly interpreting tuple structure. An off-by-one slice is easy to miss in review, but it directly undermines the pairing invariant and pollutes your history with invalid pairs. Fixing the slice restores the intended semantics without changing the pairing strategy or I/O behavior.

Takeaways

When roles start at the second element, always slice from index 1 to capture the full set of attributes used in filtering. In this context, using set(person[1:]) is enough and resilient to variable tuple length. With that in place, the overlap check behaves as expected and the generator produces only unseen, cross-role pairs.

The article is based on a question from StackOverflow by Imam and an answer by Rafalon.

csv pandas python