https://pytroubles.com/en/posts/id1630-pandas-reshape-top-n-by-rank-per-date-with-concat-stack-pivot-plus-a-fast-numpy-top-k

Pandas reshape: top-N by rank per date with concat, stack, pivot (plus a fast NumPy top-K)

Top-N by Rank per Date in pandas: reshape scores with concat, stack, pivot, plus a NumPy top-K shortcut

Pandas reshape: top-N by rank per date with concat, stack, pivot (plus a fast NumPy top-K)

Reshape scores by contestant into top-N by rank per date in pandas. Use a clean concat+stack+pivot workflow, and a fast NumPy top-K method when you need values.

2025-11-02T07:00:09+03:00

Turning a “scores by contestant” table into a “top-N by rank per date” view is a common reshaping task in pandas. The subtlety is keeping the logic clean and resilient while avoiding unnecessary round-trips through long-form data. Below is a compact pattern that does exactly that and an alternative when you only need the top values themselves.Problem setupWe have daily scores for several contestants. The goal is to build a DataFrame with dates on the index, ranks as columns (1, 2, 3), and the corresponding score values as the cells.import numpy as npimport pandas as pdgen = np.random.default_rng(42)days = pd.date_range('2024-08-01', '2024-08-07')players = ['Alligator', 'Beryl', 'Chupacabra', 'Dandelion', 'Eggplant', 'Feldspar']vals_rand = gen.random((len(days), len(players)))df_scores = pd.DataFrame(vals_rand, index=days, columns=players)df_scores.index.name = 'DATE'df_scores.columns.name = 'CONTESTANT'df_ranks = df_scores.rank(axis=1, method='first', ascending=False)ranks_top3 = df_ranks.where(df_ranks <= 3)The first attempt (works, but feels clunky)One way to get the desired layout is to melt both frames, stitch the rank onto the score rows, drop non-top-3 entries, then pivot back to wide form.long_vals = df_scores.T.melt(value_name='SCORE')long_ranks = ranks_top3.T.melt(value_name='RANK')long_vals['RANK'] = long_ranks['RANK']wide_try = long_vals.dropna().pivot(columns='RANK', index='DATE', values='SCORE')This produces the target shape, but the double melt and manual alignment of ranks to scores can be error-prone and harder to read.A cleaner reshape with concat, stack, and pivotYou can avoid the double melt entirely by concatenating scores and ranks side by side, stacking once to align by contestant, filtering, and then pivoting.tidy = (pd.concat({'score': df_scores, 'rank': ranks_top3}, axis=1) .stack() .dropna(subset=['rank']) .droplevel('CONTESTANT'))final_by_rank = tidy.pivot(columns='rank', values='score')Here, concat binds the two aligned DataFrames under the keys score and rank. A single stack brings contestant-level data into the index, making the score and rank columns line up row by row. After filtering out non-top-3 entries and dropping the contestant level, a straightforward pivot produces the wide table with rank columns.If you only need the top-N valuesWhen contestant identity and rank labels aren’t needed, you can skip ranking altogether and take the top N values per date directly with NumPy sorting. This does not rely on unique ranks.arr = df_scores.to_numpy()arr.sort(axis=1)K = 3topn_values = pd.DataFrame( arr[:, :-K-1:-1], index=df_scores.index, columns=pd.Index(range(1, K+1), name='RANK'))This yields the highest K scores per day ordered from rank 1 to K, purely by value.Why this mattersReshaping performance data is a recurring task in analytics pipelines: ranking leaderboards, extracting top performers, or preparing inputs for downstream dashboards. Using concat and stack keeps the transformation concise and readable by aligning measurements with their derived labels in one pass. And when all you need are the values themselves, a small NumPy slice does the job with minimal ceremony and without depending on ranking semantics.Practical advice and wrap-upPrefer a single, well-structured reshape over multiple melts when you already have aligned DataFrames. Keep score and rank data together as long as possible to reduce the chance of misalignment. If rank uniqueness isn’t guaranteed or isn’t needed, sorting raw values is a simple, robust alternative. Finally, choose clear, distinct variable names and generate arrays directly in the target shape to keep the setup obvious and easier to maintain.

pandas reshape, top-N by rank per date, concat stack pivot, pandas ranking, leaderboard, DataFrame pivot, stack melt, NumPy top-K, top values per day, scores by contestant

2025

2025, Nov 02 07:00

Top-N by Rank per Date in pandas: reshape scores with concat, stack, pivot, plus a NumPy top-K shortcut

Reshape scores by contestant into top-N by rank per date in pandas. Use a clean concat+stack+pivot workflow, and a fast NumPy top-K method when you need values.

Problem setup

We have daily scores for several contestants. The goal is to build a DataFrame with dates on the index, ranks as columns (1, 2, 3), and the corresponding score values as the cells.

import numpy as np
import pandas as pd

gen = np.random.default_rng(42)
days = pd.date_range('2024-08-01', '2024-08-07')
players = ['Alligator', 'Beryl', 'Chupacabra', 'Dandelion', 'Eggplant', 'Feldspar']

vals_rand = gen.random((len(days), len(players)))
df_scores = pd.DataFrame(vals_rand, index=days, columns=players)
df_scores.index.name = 'DATE'
df_scores.columns.name = 'CONTESTANT'
df_ranks = df_scores.rank(axis=1, method='first', ascending=False)
ranks_top3 = df_ranks.where(df_ranks <= 3)

The first attempt (works, but feels clunky)

One way to get the desired layout is to melt both frames, stitch the rank onto the score rows, drop non-top-3 entries, then pivot back to wide form.

long_vals = df_scores.T.melt(value_name='SCORE')
long_ranks = ranks_top3.T.melt(value_name='RANK')
long_vals['RANK'] = long_ranks['RANK']
wide_try = long_vals.dropna().pivot(columns='RANK', index='DATE', values='SCORE')

This produces the target shape, but the double melt and manual alignment of ranks to scores can be error-prone and harder to read.

A cleaner reshape with concat, stack, and pivot

You can avoid the double melt entirely by concatenating scores and ranks side by side, stacking once to align by contestant, filtering, and then pivoting.

tidy = (pd.concat({'score': df_scores, 'rank': ranks_top3}, axis=1)
        .stack()
        .dropna(subset=['rank'])
        .droplevel('CONTESTANT'))
final_by_rank = tidy.pivot(columns='rank', values='score')

Here, concat binds the two aligned DataFrames under the keys score and rank. A single stack brings contestant-level data into the index, making the score and rank columns line up row by row. After filtering out non-top-3 entries and dropping the contestant level, a straightforward pivot produces the wide table with rank columns.

If you only need the top-N values

When contestant identity and rank labels aren’t needed, you can skip ranking altogether and take the top N values per date directly with NumPy sorting. This does not rely on unique ranks.

arr = df_scores.to_numpy()
arr.sort(axis=1)
K = 3
topn_values = pd.DataFrame(
    arr[:, :-K-1:-1],
    index=df_scores.index,
    columns=pd.Index(range(1, K+1), name='RANK')
)

This yields the highest K scores per day ordered from rank 1 to K, purely by value.

Why this matters

Reshaping performance data is a recurring task in analytics pipelines: ranking leaderboards, extracting top performers, or preparing inputs for downstream dashboards. Using concat and stack keeps the transformation concise and readable by aligning measurements with their derived labels in one pass. And when all you need are the values themselves, a small NumPy slice does the job with minimal ceremony and without depending on ranking semantics.

Practical advice and wrap-up

Prefer a single, well-structured reshape over multiple melts when you already have aligned DataFrames. Keep score and rank data together as long as possible to reduce the chance of misalignment. If rank uniqueness isn’t guaranteed or isn’t needed, sorting raw values is a simple, robust alternative. Finally, choose clear, distinct variable names and generate arrays directly in the target shape to keep the setup obvious and easier to maintain.

The article is based on a question from StackOverflow by lubenthrust and an answer by mozway.

pandas python