2025, Dec 04 03:00

Why Quickselect Times Out on Many Duplicates - How to Fix Lomuto Partition for Kth Largest

Learn why Quickselect can time out on the Kth largest with many duplicates: Lomuto partition funnels equals, causing quadratic time. See simple fixes.

Quickselect is a go-to approach for finding the Kth largest element in an array with expected linear time. Yet it can still time out on seemingly reasonable inputs. If your implementation partitions with a fixed end pivot and treats equal elements naively, a large array with many duplicates and a K near the middle can push it into quadratic behavior. Understanding why this happens and how to neutralize it is essential for robust selection code.

Problem setup

Consider a straightforward Quickselect that shuffles the input, partitions with a Lomuto-style scheme using the last element as the pivot, and recurses to the side that contains the target index. The mapping from Kth largest to the index of the Kth smallest is done by targeting n - k in zero-based indexing.

import random

class Ranker(object):
    def kth_largest(self, arr, kpos):
        m = len(arr)
        random.shuffle(arr)
        return self._pick(arr, 0, m - 1, m - kpos)

    def _pick(self, arr, lo, hi, idx):
        p = self._split(arr, lo, hi)
        if idx < p:
            return self._pick(arr, lo, p - 1, idx)
        elif idx > p:
            return self._pick(arr, p + 1, hi, idx)
        else:
            return arr[p]

    def _split(self, arr, lo, hi):
        p = hi
        i = lo
        for j in range(lo, hi):
            if arr[j] <= arr[p]:
                arr[i], arr[j] = arr[j], arr[i]
                i += 1
        arr[i], arr[p] = arr[p], arr[i]
        return i

What actually goes wrong

The slowdown appears when the input has very low entropy, for example when the array is dominated by a single repeated value and the target index is near the middle. With the Lomuto scheme shown above, the partition condition uses less-or-equal. If the pivot value occurs many times, the loop moves every equal element to the left side and ends with the pivot placed at the far end. The recursion then discards only one element per step. That pattern repeats until the target is reached, degrading the overall running time to quadratic. Shuffling does not help here, because the issue is not pivot choice alone but how equals are handled during partitioning.

Fixing the partition on equals

To break that worst-case pattern, the partition must avoid funneling all elements equal to the pivot to the same side. One practical tweak is to swap strictly on less-than and, for elements equal to the pivot, perform the swap only half the time. This randomizes the placement of equals across the boundary, preventing the one-element-at-a-time shrinkage that causes timeouts on large inputs with many duplicates.

import random

class Ranker(object):
    def kth_largest(self, arr, kpos):
        m = len(arr)
        random.shuffle(arr)
        return self._pick(arr, 0, m - 1, m - kpos)

    def _pick(self, arr, lo, hi, idx):
        p = self._split(arr, lo, hi)
        if idx < p:
            return self._pick(arr, lo, p - 1, idx)
        elif idx > p:
            return self._pick(arr, p + 1, hi, idx)
        else:
            return arr[p]

    def _split(self, arr, lo, hi):
        p = hi
        i = lo
        for j in range(lo, hi):
            if arr[j] < arr[p] or (arr[j] == arr[p] and random.getrandbits(1)):
                arr[i], arr[j] = arr[j], arr[i]
                i += 1
        arr[i], arr[p] = arr[p], arr[i]
        return i

This small change preserves correctness and keeps the overall structure of Quickselect intact, while neutralizing the pathological behavior caused by clusters of equal values. It stays lightweight and avoids additional passes or memory.

Why this matters

Real-world data often contains many duplicates. If your partition routine pushes all equals to one side, Quickselect can repeatedly peel off a single element and fall into quadratic time. That type of hidden performance cliff is easy to miss during development and shows up only at scale or on particular test sets. Handling equals well is therefore not a micro-optimization but a guardrail against worst-case blowups.

There are other partition strategies that also help with many duplicates. A Hoare-style partition is known to behave better as the number of duplicates grows, and a three-way partition (the Dutch flag variant) is another common tool to separate less-than, equal-to, and greater-than regions efficiently. The key idea across these approaches is the same: do not let equal elements skew the recursion.

Takeaways

If your Quickselect times out on large inputs with a big K, look closely at how the partition treats equal values. Avoid the less-or-equal funnel in the Lomuto pattern by randomizing the placement of equals or by switching to a scheme that balances them by design. Test with arrays dominated by a single repeated number and a target near the middle to confirm that your implementation does not degrade. With that in place, Quickselect remains a dependable tool for Kth-largest selection.

algorithm python quickselect