2025, Nov 22 05:00

Scrape CoCoRaHS daily precipitation reports in Python by aligning POST payload, VIEWSTATE, and form field values

Learn why your CoCoRaHS scraper returns no results and raises a RuntimeError, and how to fix it by matching the POST payload, VIEWSTATE, and checkbox and dropdown values.

Scraping data from form‑driven pages often fails not because of parsing, but because the POST payload doesn’t match what the page actually sends. A small mismatch in a checkbox value or a select option is enough to return different HTML and, consequently, no results to parse. Here’s a concise walkthrough of a real case on cocorahs.org that resulted in a RuntimeError when the table couldn’t be found, and how aligning the form data resolves it.

Problem setup

The target is the daily precipitation report list at https://www.cocorahs.org/ViewData/ListDailyPrecipReports.aspx. The script posts a search with station FL-BV-163 and a date range, then tries to extract the results table by its id. It fails with “table#ucReportList_ReportGrid not found”.

Below is a minimal example that reproduces the issue. The program logic is intact; only the names have been changed for clarity.

import requests
from bs4 import BeautifulSoup

import pandas as pd
from io import StringIO

client = requests.Session()

first_resp = client.get('https://www.cocorahs.org/ViewData/ListDailyPrecipReports.aspx')

dom = BeautifulSoup(first_resp.content, "html.parser")
vs_token = dom.find("input", {"name": "__VIEWSTATE", "value": True})["value"]
vsg_token = dom.find("input", {"name": "__VIEWSTATEGENERATOR", "value": True})["value"]
ev_token = dom.find("input", {"name": "__EVENTVALIDATION", "value": True})["value"]

search_resp = client.post('https://www.cocorahs.org/ViewData/ListDailyPrecipReports.aspx', data={
    "__EVENTTARGET": "",
    "__EVENTARGUMENT": "",
    "__LASTFOCUS": "",
    "VAM_Group": "",
    "__VIEWSTATE": vs_token,
    "VAM_JSE": "1",
    "__VIEWSTATEGENERATOR": vsg_token,
    "__EVENTVALIDATION": ev_token,
    "obsSwitcher:ddlObsUnits": "usunits",
    "frmPrecipReportSearch:ucStationTextFieldsFilter:tbTextFieldValue": "FL-BV-163",
    "frmPrecipReportSearch:ucStationTextFieldsFilter:cblTextFieldsToSearch:0": "checked",
    "frmPrecipReportSearch:ucStationTextFieldsFilter:cblTextFieldsToSearch:1": "",
    "frmPrecipReportSearch:ucStateCountyFilter:ddlCountry": "allcountries",
    "frmPrecipReportSearch:ucDateRangeFilter:dcStartDate:di": "6/13/2025",
    "frmPrecipReportSearch:ucDateRangeFilter:dcStartDate:hfDate": "2025-06-13",
    "frmPrecipReportSearch:ucDateRangeFilter:dcEndDate:di": "6/16/2025",
    "frmPrecipReportSearch:ucDateRangeFilter:dcEndDate:hfDate": "2025-06-16",
    "frmPrecipReportSearch:ddlPrecipField": "GaugeCatch",
    "frmPrecipReportSearch:ucPrecipValueFilter:ddlOperator": "LessEqual",
    "frmPrecipReportSearch:ucPrecipValueFilter:tbPrecipValue:tbPrecip": "0.15",
    "frmPrecipReportSearch:btnSearch": "Search",
})

grid = BeautifulSoup(search_resp.content, "html.parser").find("table", id="ucReportList_ReportGrid")

if grid is None:
    raise RuntimeError("table#ucReportList_ReportGrid not found")

frame = pd.read_html(StringIO(str(grid)))[0]

print(frame)
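Before diving into the fix, it helps to confirm what the server actually returned. A small debugging sketch (a hypothetical helper, not part of the original script) saves the response so it can be opened in a browser, and checks whether the grid id appears anywhere in the raw HTML:

```python
from pathlib import Path

def dump_for_inspection(html_text, path="debug_response.html"):
    # Save the server's HTML so it can be opened in a browser and compared
    # against what a real form submission returns.
    Path(path).write_text(html_text, encoding="utf-8")
    # Distinguish "the table id changed" from "the search returned a
    # different page entirely".
    return "ucReportList_ReportGrid" in html_text
```

Calling `dump_for_inspection(search_resp.text)` after the failing POST tells you whether the server rendered the results grid at all; if it returns False, the problem is the request, not the selector.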

Why it fails

The request doesn’t match what the site expects. The form fields for the station text search checkbox, the country selector, and the precipitation field use specific values the site posts when you submit the form. Using different values leads to a different response and the expected results table won’t be present in the HTML, triggering the RuntimeError.

Concretely, three values need to be aligned with the site's actual payload: the checkbox should post "on" rather than "checked", the country dropdown posts "0" rather than "allcountries", and the precipitation field selection must be "TotalPrecipAmt" rather than "GaugeCatch". The entry for the second text field checkbox (frmPrecipReportSearch:ucStationTextFieldsFilter:cblTextFieldsToSearch:1) must also be removed entirely: browsers omit unchecked checkboxes from form submissions rather than sending them with an empty value.

Fix and working example

The snippet below corrects the payload values and adds a User-Agent header. The rest of the flow remains the same: load the landing page, extract the state tokens, submit the search, parse the resulting table, convert it to a DataFrame.

import requests
from bs4 import BeautifulSoup
import pandas as pd
from io import StringIO

http_agent = requests.Session()

landing = http_agent.get('https://www.cocorahs.org/ViewData/ListDailyPrecipReports.aspx')

tree = BeautifulSoup(landing.content, "html.parser")
vs_val = tree.find("input", {"name": "__VIEWSTATE", "value": True})["value"]
vsg_val = tree.find("input", {"name": "__VIEWSTATEGENERATOR", "value": True})["value"]
ev_val = tree.find("input", {"name": "__EVENTVALIDATION", "value": True})["value"]

req_headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/137.0.0.0 Safari/537.36'
}

form_payload = {
    "__EVENTTARGET": "",
    "__EVENTARGUMENT": "",
    "__LASTFOCUS": "",
    "VAM_Group": "",
    "__VIEWSTATE": vs_val,
    "VAM_JSE": "1",
    "__VIEWSTATEGENERATOR": vsg_val,
    "__EVENTVALIDATION": ev_val,
    "obsSwitcher:ddlObsUnits": "usunits",
    "frmPrecipReportSearch:ucStationTextFieldsFilter:tbTextFieldValue": "FL-BV-163",
    "frmPrecipReportSearch:ucStationTextFieldsFilter:cblTextFieldsToSearch:0": "on",
    "frmPrecipReportSearch:ucStateCountyFilter:ddlCountry": "0",
    "frmPrecipReportSearch:ucDateRangeFilter:dcStartDate:di": "6/13/2025",
    "frmPrecipReportSearch:ucDateRangeFilter:dcStartDate:hfDate": "2025-06-13",
    "frmPrecipReportSearch:ucDateRangeFilter:dcEndDate:di": "6/16/2025",
    "frmPrecipReportSearch:ucDateRangeFilter:dcEndDate:hfDate": "2025-06-16",
    "frmPrecipReportSearch:ddlPrecipField": "TotalPrecipAmt",
    "frmPrecipReportSearch:ucPrecipValueFilter:ddlOperator": "LessEqual",
    "frmPrecipReportSearch:ucPrecipValueFilter:tbPrecipValue:tbPrecip": "0.15",
    "frmPrecipReportSearch:btnSearch": "Search"
}

results = http_agent.post('https://www.cocorahs.org/ViewData/ListDailyPrecipReports.aspx', headers=req_headers, data=form_payload)

grid_node = BeautifulSoup(results.content, "html.parser").find("table", id="ucReportList_ReportGrid")

if grid_node is None:
    raise RuntimeError("table#ucReportList_ReportGrid not found")

out_frame = pd.read_html(StringIO(str(grid_node)))[0]

print(out_frame.to_string())

If you prefer to skip the intermediate BeautifulSoup step, pandas can locate the table directly by its id attribute during read_html:

out_frame = pd.read_html(StringIO(results.content.decode('utf-8')), flavor='bs4', attrs={'id': 'ucReportList_ReportGrid'})[0]

Why this matters for scraping dynamic forms

With form‑based pages, especially those that ship hidden state fields like __VIEWSTATE, you must echo back exactly what the browser sends. Values for checkboxes and select elements aren’t always obvious from page source. The practical method is to open your browser’s developer tools, switch to the Network tab, interact with the form, submit it, and inspect the request to see the payload that was actually sent. Replicate those names and values exactly in your POST. This is how you discover details such as a checkbox posting "on" or a dropdown posting "0" for a specific selection.
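That comparison can be automated: capture the payload from the Network tab, paste it into a dict, and diff it against what your script builds. A small sketch (the helper name is made up for illustration):

```python
def diff_payloads(captured, scripted):
    # Return every key whose value differs between the browser's captured
    # payload and the script's payload; keys present on only one side show
    # up with None on the missing side.
    mismatches = {}
    for key in sorted(set(captured) | set(scripted)):
        if captured.get(key) != scripted.get(key):
            mismatches[key] = (captured.get(key), scripted.get(key))
    return mismatches
```

Run against the two payloads in this article, it would flag the three value mismatches plus the extra checkbox key that the browser never sends.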

Takeaways

When a post‑submission selector returns no results and downstream parsing fails, verify the payload first. Reuse the server’s state values from the initial GET. Match checkbox and dropdown values to what the browser sends, not what seems plausible from the static HTML. Remove fields that the site doesn’t send for your chosen options. Once the POST mirrors the real submission, the expected table will be present and parsing into a DataFrame is straightforward.
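A more general way to follow the "reuse the server's state values" advice is to start from every input the page itself renders and override only the search fields you care about. A sketch using BeautifulSoup (the helper name is invented; `<select>` and `<textarea>` elements would need similar handling and are omitted here):

```python
from bs4 import BeautifulSoup

def collect_form_defaults(html):
    # Gather name/value pairs for every <input> the server rendered, so
    # the POST echoes back hidden state (__VIEWSTATE etc.) automatically.
    soup = BeautifulSoup(html, "html.parser")
    fields = {}
    for inp in soup.find_all("input"):
        name = inp.get("name")
        if not name:
            continue
        is_box = inp.get("type") in ("checkbox", "radio")
        if is_box and not inp.has_attr("checked"):
            continue  # browsers omit unchecked boxes entirely
        value = inp.get("value")
        if value is None:
            # A checked box with no explicit value posts "on" in browsers.
            value = "on" if is_box else ""
        fields[name] = value
    return fields
```

With this, the search script becomes `payload = collect_form_defaults(landing.text)` followed by a `payload.update({...})` that sets only the station, date, and precipitation fields.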