2025, Dec 21 05:00

How to Automate Dikidi Business Login: Dealing with HTML-driven Flow, CSRF, Cookies, and Selenium Locators

Learn why Dikidi login via Python requests returns HTML, handle CSRF and cookies, and when to use a session token or Selenium for automated access securely.

Automating login to a modern web app often looks trivial until the flow turns out to be stateful and UI-driven. That’s exactly what happens when trying to sign in to the Dikidi Business site using plain Python requests: the endpoint behaves differently based on the input, and the server drives the next step with HTML fragments instead of a simple JSON API.

Problem overview

The authentication endpoint at https://auth.dikidi.ru/ajax/check/auth/ accepts credentials, but the response isn’t always a success-or-fail JSON. With an obviously invalid phone number you get a structured error in JSON. With a number that exists, the server returns a chunk of HTML that contains a password prompt and a hidden field named csrf. With an unregistered number you also get HTML. In other words, you’re dealing with a staged flow where the number is validated first and only then the password is requested.

Code that reproduces the issue

The following snippet shows a straightforward POST that does not account for the server’s two-step flow or any anti-CSRF state. It demonstrates the behavior differences you’ll see depending on the input values.

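import requests

with requests.Session() as http:
    endpoint = 'https://auth.dikidi.ru/ajax/check/auth/'
    # Placeholder values: substitute a real phone number and password.
    payload = {
        'number': 'number',
        'password': 'password'
    }

    # One POST with both fields; the session keeps any cookies the server sets.
    resp = http.post(endpoint, data=payload)
    # Prints JSON for a malformed number, HTML otherwise.
    print(resp.text)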

When the number format is clearly wrong, the server responds with JSON like {"error":{"code":"NUMBER_NOT_TRUE","message":"Probably you entered incorrect number"},"data":{}}. With an unregistered number, the server returns an HTML page. With a real number, the response is an HTML fragment that contains a password field, the phrase Forgot your password?, and a hidden input named csrf with a value. That fragment is intended to be injected into the modal on the website and used in the next request.
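
Because the same endpoint can answer with either JSON or HTML, it helps to classify the body before acting on it. The helper below is a minimal sketch of that check, not part of the site's API: the function name classify_auth_response and its return labels are made up for illustration, and it assumes only the error shape quoted above.

```python
import json


def classify_auth_response(body: str):
    """Label a response body as a JSON error, other JSON, or HTML.

    Hypothetical helper: the labels "error", "json" and "html" are
    illustrative and do not come from the Dikidi site.
    """
    try:
        data = json.loads(body)
    except ValueError:
        # Not parseable as JSON: treat it as the HTML meant for the modal.
        return ("html", body)

    error = data.get("error") if isinstance(data, dict) else None
    if isinstance(error, dict) and error:
        # Matches the {"error": {"code": ..., "message": ...}} shape.
        return ("error", error.get("code"))
    return ("json", data)
```

Calling classify_auth_response(resp.text) on the response from the snippet above tells you which branch of the flow the server chose, so the script can stop on an error code instead of trying to parse HTML as JSON.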

If you try to drive the flow through Selenium instead, the phone input might not be interactable through the chosen locator, which blocks sending keys to the field. For reference, the attempt below triggers the modal but fails to type a number using the ID locator.

from selenium import webdriver
from selenium.webdriver.common.by import By
import time

browser = webdriver.Chrome()
browser.get("https://dikidi.ru/en/business")

# Open the login modal.
browser.find_element(By.PARTIAL_LINK_TEXT, "Login / Registration").click()
time.sleep(1)

# Click the "Phone number" link inside the modal.
browser.find_element(By.PARTIAL_LINK_TEXT, "Phone number").click()
time.sleep(1)

# This is the step that fails: the field found by ID does not accept keys.
phone_input = browser.find_element(By.ID, 'number')
phone_input.send_keys('my number without a country code')

time.sleep(10)

What’s really happening

The login is not a single JSON API call. The server validates the phone number and then responds with HTML for the next step of the flow. That HTML includes a hidden field named csrf and prompts for a password, which indicates that the subsequent request is expected to include CSRF-related state and likely the cookies set earlier in the flow. Sending just number and password to the initial endpoint skips this state and results in HTML intended for the UI layer rather than an authenticated session.
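
The staged exchange described above can be sketched as two requests that share one session, with the csrf value lifted out of the first response. This is an illustration under stated assumptions, not the site's documented protocol: the second-step URL is left as a required argument because the article does not identify it, and the field names sent in step two (number, password, csrf) are a guess based on the fragment's form fields. Check both against the browser's network tab.

```python
from html.parser import HTMLParser


class _HiddenInputs(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if attr.get("type") == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def extract_csrf(html_fragment):
    """Return the value of the hidden csrf input, or None if absent."""
    parser = _HiddenInputs()
    parser.feed(html_fragment)
    return parser.fields.get("csrf")


def staged_login(session, number, password, check_url, password_url):
    """Hypothetical two-step login using one cookie-carrying session.

    `session` is expected to behave like requests.Session. `password_url`
    and the step-two payload are assumptions, not confirmed by the site.
    """
    # Step 1: submit the number; the session stores any cookies set here.
    first = session.post(check_url, data={"number": number}, timeout=10)
    csrf = extract_csrf(first.text)
    if csrf is None:
        raise RuntimeError("no csrf field in the first response")

    # Step 2: send the password together with the csrf value.
    payload = {"number": number, "password": password, "csrf": csrf}
    return session.post(password_url, data=payload, timeout=10)
```

Passing the session in as a parameter keeps the cookie handling in one place and lets the flow be exercised against a stub before pointing it at the real endpoints.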

Working approach with a session token

The reliable path in this case is to authenticate using the session token the site sets in a cookie. Once you have that token, you can make authenticated requests by passing it back as a cookie without replaying the modal steps.

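import requests

# 'MyToken' is a placeholder for the token cookie value of a logged-in session.
auth_cookies = {'token': 'MyToken'}
resp = requests.post('https://dikidi.ru', cookies=auth_cookies)
print(resp.status_code)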

Here, the presence of the token cookie is what marks the session as authenticated for subsequent requests to the site. This bypasses the two-phase modal exchange entirely because the server already recognizes the session via the token.
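
When several authenticated calls are needed, the token can be attached to a requests.Session once instead of being passed on every call. The following is a minimal sketch: the helper name session_with_token is hypothetical, and scoping the cookie to the dikidi.ru domain is an assumption about where the site expects it.

```python
import requests


def session_with_token(token, domain="dikidi.ru"):
    """Return a Session that sends `token` as the token cookie.

    The domain default is an assumption; adjust it if the browser shows
    the cookie stored under a different domain.
    """
    session = requests.Session()
    # Scope the cookie so it is not sent to unrelated hosts.
    session.cookies.set("token", token, domain=domain)
    return session


# Usage ('MyToken' is a placeholder, as above):
#   http = session_with_token("MyToken")
#   resp = http.get("https://dikidi.ru/en/business", timeout=10)
```

Keep the token out of source control, since anyone holding it can act as the logged-in user for as long as it stays valid.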

Why this matters

Not all login forms expose a clean, documented JSON API. Some endpoints are designed for the front-end, return HTML fragments, and depend on CSRF fields and dynamic cookies that are negotiated across several requests. Mixing JSON and HTML responses in one flow is a signal that the server expects state carried between steps. In such scenarios, reusing the site’s own session token is often the least brittle way to perform authenticated calls programmatically.

Conclusion

If a direct POST with credentials keeps yielding HTML instead of a logged-in state, treat the flow as UI-driven: the server gates progress through staged responses and CSRF. When you already have a valid session token, supply it as the token cookie to operate as an authenticated user. If you still need to automate the full login, capture and send the CSRF value along with the cookies the server sets between steps. And if browser automation is involved, choose a locator strategy that matches what the page actually exposes for the phone input rather than relying solely on an element ID that might not be interactable. Understanding these details upfront saves time and prevents brittle scripts in environments where authentication is tightly coupled to front-end state.