2025, Dec 26 07:00

How to Combine Multiple container-ingredient_group Elements into One Ingredients String with Selenium

Learn how to scrape recipe ingredients with Selenium and XPath: merge container-ingredient_group nodes or target an ancestor to get one unified text block

Scraping grouped content often looks straightforward until the DOM structure forces you to rethink how you aggregate text. A common case is when Selenium returns a list of nodes for each group, and you end up with multiple fragments instead of a single, coherent block. Below is a concise walkthrough of why this happens with elements under class container-ingredient_group and how to collect them as one string without breaking the extraction flow.

Problem statement

You fetch a recipe page, locate every block with the class container-ingredient_group, and read its text. Each div lands in the result as a separate entry, which misaligns the data when you expect one combined string per recipe.

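from selenium.webdriver.common.by import By

# browser, link_pool and collected_groups are assumed to be defined earlier.
for link_idx in range(len(link_pool)):
    # The URL is hard-coded here for illustration.
    browser.get("https://receptes.tvnet.lv/recepte/23989-skabenu-darza-zalumu-zupa-ar-kupinatu-vistu-un-perlu-grubam")

    groups = browser.find_elements(By.XPATH, '//div[@class="container-ingredient_group"]')

    # Each group's text becomes its own list entry, so one recipe yields several entries.
    for group_node in groups:
        collected_groups.append(group_node.text)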

What’s actually going on

find_elements returns a list of matching elements. Calling .text on each of them yields separate strings for each group, so your output is fragmented by design. If the downstream logic expects a single string containing the entire ingredients section, you must explicitly merge these fragments or switch the selection to a single ancestor node and read .text once.

Two practical fixes

The first approach is to merge the text from multiple nodes into a single string. Build a list of strings from group nodes, join them with a delimiter, and append the result once.

groups = browser.find_elements(By.XPATH, '//div[@class="container-ingredient_group"]')

# One string per group, in document order.
text_chunks = [node.text for node in groups]
one_blob = "\n".join(text_chunks)

# Append once per recipe, not once per group.
final_payload.append(one_blob)
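
The joining step can also be wrapped in a small helper. The following is only a sketch: the names join_group_texts and FakeNode are hypothetical and not part of Selenium, and the sample strings are made up. The helper accepts any objects that expose a .text attribute, strips each fragment, and drops blank ones before joining. Dropping blanks is an assumption about what is wanted here, since .text is typically empty for elements that are not visible.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class FakeNode:
    """Hypothetical stand-in for a Selenium WebElement; only .text is used."""
    text: str


def join_group_texts(nodes: Iterable, delimiter: str = "\n") -> str:
    """Join the .text of each node into one string, skipping blank fragments."""
    chunks = (node.text.strip() for node in nodes)
    return delimiter.join(chunk for chunk in chunks if chunk)


# Made-up sample data, not taken from the recipe page.
sample_nodes = [
    FakeNode("Group A\nitem 1"),
    FakeNode("   "),
    FakeNode("Group B\nitem 2"),
]
one_blob = join_group_texts(sample_nodes)
```

With a live driver, the same helper could be called on the list returned by browser.find_elements, because WebElement objects expose .text as well.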

The second approach is to pick the parent container that holds all the desired subsections and read .text from that one node. This avoids manual joining and returns a single block. Because there can be many container nodes on the page, use an XPath that targets the one that has a container-ingredient_group div as a direct child. Note that this also brings along the header Sastāvdaļas.

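# Match the div with class "container" that has an ingredient group as a direct child.
container_xpath = '//div[@class="container" and div[@class="container-ingredient_group"]]'
# find_element returns the first match and raises NoSuchElementException if there is none.
section = browser.find_element(By.XPATH, container_xpath)
one_blob = section.text

final_payload.append(one_blob)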
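
If the header is not wanted, it can be removed after extraction. The sketch below assumes that Sastāvdaļas appears as the first non-empty line of the container text and matches it exactly. The function name drop_leading_header and the sample text are hypothetical, so adjust the check if the real page renders the header differently.

```python
def drop_leading_header(blob: str, header: str = "Sastāvdaļas") -> str:
    """Remove the header if it is the first line of the stripped blob."""
    lines = blob.strip().splitlines()
    if lines and lines[0].strip() == header:
        lines = lines[1:]
    return "\n".join(lines).strip()


# Made-up sample text shaped like a container's .text output.
sample_blob = "Sastāvdaļas\nGroup A\nitem 1\nGroup B\nitem 2"
cleaned = drop_leading_header(sample_blob)
```

Text that does not start with the header passes through unchanged, so the helper is safe to apply to the output of either approach.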

Why this matters

When you expect one record per page but scrape multiple slices, your data model drifts: indexes no longer match, lists of attributes de-sync, and any joining logic downstream becomes brittle. Consolidating the text at the point of extraction keeps the shape of the data stable and predictable.
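
A small illustration of the drift, using made-up data in place of scraped pages (the recipe keys and group texts are hypothetical): collecting every group as its own entry produces more entries than there are pages, while joining per page keeps the two lists aligned.

```python
# Made-up data: two recipe pages, each with two ingredient groups.
pages = {
    "recipe-1": ["Group A\nitem 1", "Group B\nitem 2"],
    "recipe-2": ["Group C\nitem 3", "Group D\nitem 4"],
}

urls = list(pages)

# Fragmented: every group becomes its own entry, so the lists drift apart.
fragmented = [text for groups in pages.values() for text in groups]

# Consolidated: one joined string per page keeps both lists the same length.
consolidated = ["\n".join(groups) for groups in pages.values()]

# Pairing by position is only safe for the consolidated list.
records = dict(zip(urls, consolidated))
```

Here fragmented holds four entries for two pages, so pairing it with urls by position would silently attach the wrong text to the second recipe.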

Takeaways

If you truly need one string, either join what Selenium returns from multiple nodes or switch to a single ancestor that spans the entire area of interest. The first option gives you full control over delimiters and ordering. The second option is shorter and cleaner, but it may include extra labels such as Sastāvdaļas, which you can strip afterwards if needed. Either way, the outcome is the same: one coherent text block per recipe, ready for storage or further parsing.

python selenium-webdriver