2025, Oct 30 03:00

Stop StaleElementReferenceException in Selenium: Collect HREF Links First and Use WebDriverWait the Right Way

Learn why Selenium throws TypeError and StaleElementReferenceException during navigation and how to fix it: collect hrefs first and use WebDriverWait correctly.

Looping over a collection of Selenium WebElements, following each link, and scraping a small piece of data sounds trivial until navigation enters the picture. As soon as the browser moves away from the current page, previously located elements become invalid, and naïve waiting strategies quickly fall apart with TypeError or StaleElementReferenceException. Below is a clear walkthrough of the failure mode and a robust fix.

Problem setup

You start with a function that opens a listing page, iterates over a list of cards, drills into the anchor inside each card, goes to the detail page, extracts a date, and repeats the loop. The code looks about like this:

from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def pull_posting_dates(browser, start_url):
    date_bins = []
    waiter = WebDriverWait(browser, 5)
    browser.get(start_url)
    # cards - a list of web elements
    cards = browser.find_elements(By.CLASS_NAME, 'object-cards-block.d-flex.cursor-pointer')
    for card in cards:
        try:
            # problem 1: a WebDriverWait object is not callable
            anchor = waiter(card, 5).until(
                EC.presence_of_element_located(
                    (By.CSS_SELECTOR, 'div.main-container-margins.width-100 > a')
                )
            )
        except NoSuchElementException:
            print('no such element')
        # follow the link - problem 2: navigating invalidates every element
        # located on the listing page
        browser.get(anchor.get_attribute('href'))
        stamp = browser.find_element(
            By.XPATH, '//*[@id="Data"]/div/div/div[4]/div[2]/span'
        ).text
        date_bins.append(stamp)

Why it breaks

The first issue is the TypeError: a WebDriverWait instance is not callable. The expression waiter(card, 5) tries to call the wait object as if it were a function, which it is not. The correct pattern is to create WebDriverWait once with a driver and a timeout, then call until with an expected condition. Trying to bind a wait to an individual element this way is what triggers the “object is not callable” error.
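
For reference, the intended usage looks roughly like this, a minimal sketch reusing browser, card, and the selector from the snippet above:

# The wait is built once from the driver and a timeout.
waiter = WebDriverWait(browser, 5)

# until() takes an expected condition built from a locator...
anchor = waiter.until(
    EC.presence_of_element_located(
        (By.CSS_SELECTOR, 'div.main-container-margins.width-100 > a')
    )
)

# ...or any callable that receives the driver, which is how you scope
# the lookup to a single card instead of the whole page.
anchor_in_card = waiter.until(
    lambda driver: card.find_element(
        By.CSS_SELECTOR, 'div.main-container-margins.width-100 > a'
    )
)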

The second, and more fundamental, issue is navigation. Once get() is called, the driver’s “view” of the previous page is gone. Each WebElement is a reference to a DOM node living in the browser’s memory; navigating to a new page replaces that DOM entirely and invalidates every stored reference from the previous page. That is why StaleElementReferenceException appears after the first navigation, and why reusing elements collected before calling get() fails. Even without a full navigation, DOM updates can force you to locate elements again, because the node a reference points to may have been removed or replaced.
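
A stripped-down repro shows the sequence (the detail URL below is just a placeholder):

cards = browser.find_elements(By.CLASS_NAME, 'object-cards-block.d-flex.cursor-pointer')
first_card = cards[0]

# Navigating replaces the entire DOM; first_card now refers to a node
# that no longer exists in the browser.
browser.get('https://example.com/detail-page')

# Any interaction with the old reference raises StaleElementReferenceException.
print(first_card.text)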

Fix: collect HREFs first, then visit each page

There are multiple ways to handle this, but the simplest approach is to extract all href attributes before navigating anywhere. With all links buffered in a Python list, you can safely iterate and call get() per link, waiting for the target element on each detail page. This is straightforward, reliable, and avoids reusing stale element references. It is not the most efficient pattern, but it is easy to reason about and it works.

from selenium.webdriver import Chrome, ChromeOptions
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

SOURCE_URL = "https://upn.ru/kupit/kvartiry"


def scrape_post_dates(session, entry_url):
    session.get(entry_url)
    gate = WebDriverWait(session, 10)

    # Phase 1: collect the href of every card while still on the listing page.
    predicate = EC.presence_of_all_elements_located
    container = (By.CLASS_NAME, "object-cards-block.d-flex.cursor-pointer")
    links = []
    for node in gate.until(predicate(container)):
        a_tag = node.find_element(
            By.CSS_SELECTOR, "div.main-container-margins.width-100 > a"
        )
        href_val = a_tag.get_attribute("href")
        if href_val is not None:
            links.append(href_val)

    # Phase 2: visit each buffered URL and re-locate the date element there.
    stamps = []
    for link_url in links:
        session.get(link_url)
        single_pred = EC.presence_of_element_located
        spot = (By.XPATH, "//*[@id='Data']/div/div/div[4]/div[2]/span")
        target = gate.until(single_pred(spot))
        txt = target.text
        stamps.append(txt)
        print(txt)
    return stamps


if __name__ == "__main__":
    opts = ChromeOptions()
    opts.add_argument("--headless=new")
    with Chrome(options=opts) as drv:
        # results holds the collected date strings.
        results = scrape_post_dates(drv, SOURCE_URL)

Example output (partial; “Размещено” is Russian for “Posted”):

Размещено: 16.06.2025 11
Размещено: 06.06.2025 6
Размещено: 02.06.2025 57
Размещено: 19.04.2025 42
Размещено: 03.04.2025 29
Размещено: 25.03.2025 63
...

Why this matters

Understanding how Selenium tracks DOM nodes is crucial. After navigation or any substantial DOM update, old references are no longer valid because the underlying page structure was replaced. The reliable pattern is to extract primitive data like URLs or IDs up front, then navigate and re-locate whatever you need on the new page. This keeps the scraper stable, reduces flaky waits, and helps isolate failures.

If you need better throughput, a multithreaded approach would be significantly more efficient, but the correctness principle stays the same: never rely on WebElement references across page loads.
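
As a rough sketch only (not part of the original answer; the helper names here are made up), the same collect-then-visit idea parallelizes with a thread pool, provided every task uses its own driver, since WebDriver instances are not thread-safe:

from concurrent.futures import ThreadPoolExecutor

from selenium.webdriver import Chrome, ChromeOptions
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


def fetch_post_date(link_url):
    # One dedicated headless browser per task: simple, if not the cheapest option.
    opts = ChromeOptions()
    opts.add_argument("--headless=new")
    with Chrome(options=opts) as drv:
        drv.get(link_url)
        target = WebDriverWait(drv, 10).until(
            EC.presence_of_element_located(
                (By.XPATH, "//*[@id='Data']/div/div/div[4]/div[2]/span")
            )
        )
        return target.text


def scrape_post_dates_parallel(links, workers=4):
    # links is the list of hrefs collected up front, exactly as in scrape_post_dates.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch_post_date, links))

Reusing one driver per thread instead of one per task would cut startup cost, but the key step of buffering the hrefs before any navigation stays exactly the same.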

Final notes

Avoid calling WebDriverWait as if it were a function and do not attach it to WebElements. Use a single wait per driver context and pass expected conditions to until. Do not reuse elements across navigations; instead, collect hrefs first and then iterate over them, re-finding what you need on each destination page. With this pattern in place, you eliminate the TypeError, sidestep stale references, and make the scraper predictable. If performance becomes a concern later, explore concurrency on top of the same approach.

The article is based on a question from StackOverflow by Burtsev and an answer by Ramrab.