Practical guides·8 min read

How to detect browser automation beyond user agents

Detection techniques that work when user agents lie — from TLS fingerprints and HTTP/2 parameters to CDP artifacts and behavioral signals.

Why user agent checks fail

Most bot detection starts with the user agent string, a self-reported label that every HTTP client sends with each request. Selenium announces itself. Puppeteer used to announce itself. The problem is that changing a user agent string takes one line of code.

The W3C WebDriver specification requires browsers to set `navigator.webdriver` to `true` when controlled by automation. This was supposed to be a reliable signal. It wasn't. As DataDome's threat research documented, "when WebDriver bots realized they were being detected through navigator.webdriver, they simply toggled it back to false."

The scale of what's getting through is hard to overstate. 51% of all web traffic in 2024 was automated, the first time bots surpassed human visitors in a decade (Imperva, 2025). Of the bad bot traffic, 41% is now classified as "advanced": built specifically to mimic human behavior (Imperva, 2025). These bots spoof everything the browser surface exposes, not only the user agent.

If the signals a client volunteers about itself are unreliable, detection has to move to signals the client can't easily control.

TLS and protocol fingerprinting

Before a browser sends its first HTTP request, it performs a TLS handshake. The ClientHello message in that handshake contains cipher suites, extensions, protocol versions, and ALPN preferences. All are determined by the client's network stack, not by any JavaScript it might run later. These values form a fingerprint.

JA3 was the first widely adopted TLS fingerprinting method. It worked until browsers started randomizing the order of TLS extensions in their ClientHello messages, which broke JA3's hash consistency. JA4, developed by FoxIO and adopted by Cloudflare, fixes this by being resistant to extension randomization and adding protocol-level dimensions like ALPN. Cloudflare tracks over 15 million unique JA4 fingerprints generated from more than 500 million user agents (Cloudflare, 2025).

A Python script claiming to be Chrome 120 will have a JA4 fingerprint that belongs to Python's `requests` library. The mismatch is binary — there is no gray area.

The detection goes deeper at the HTTP/2 layer. When a connection opens, the client sends a SETTINGS frame with parameters that vary by implementation, typically followed by a WINDOW_UPDATE that grows the connection's flow-control window. Chrome raises it to roughly 15MB; Firefox to roughly 12.5MB. Most HTTP libraries never send the frame at all and stay at the protocol default of 64KB, a gap of more than two orders of magnitude, visible before a single page byte is exchanged.

Even the order of pseudo-headers in HTTP/2 requests is hardcoded per browser. Chrome sends `:method, :authority, :scheme, :path`. Firefox sends `:method, :path, :authority, :scheme`. Safari sends `:method, :scheme, :path, :authority`. Standard HTTP libraries use none of these orderings.
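
Because the orderings are fixed per browser, classification is a table lookup. A minimal sketch using the orderings above (the function name is illustrative):

```javascript
// Each browser hardcodes its HTTP/2 pseudo-header order, so a simple
// table lookup separates real browsers from generic HTTP libraries.
const KNOWN_ORDERS = {
  chrome:  [':method', ':authority', ':scheme', ':path'],
  firefox: [':method', ':path', ':authority', ':scheme'],
  safari:  [':method', ':scheme', ':path', ':authority'],
};

function classifyByHeaderOrder(pseudoHeaders) {
  for (const [browser, order] of Object.entries(KNOWN_ORDERS)) {
    if (order.length === pseudoHeaders.length &&
        order.every((h, i) => h === pseudoHeaders[i])) {
      return browser;
    }
  }
  return 'unknown'; // most HTTP libraries land here
}

console.log(classifyByHeaderOrder([':method', ':authority', ':scheme', ':path'])); // "chrome"
```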

A mismatched stack — a BoringSSL TLS fingerprint paired with Python's hyper-h2 HTTP/2 settings, for example — triggers immediate blocking. This cross-layer consistency check operates at the connection level, before any page content is served.
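
The consistency check amounts to a table of per-browser expectations that every layer must match. A minimal sketch, with illustrative profile values and field names:

```javascript
// Per-browser expectation table; a claimed identity must match every layer.
// Values are illustrative stand-ins for real per-version profiles.
const PROFILES = {
  chrome:  { tlsFamily: 'boringssl-chrome', h2Window: 15 * 1024 * 1024 },
  firefox: { tlsFamily: 'nss-firefox', h2Window: Math.round(12.5 * 1024 * 1024) },
};

function crossLayerConsistent({ claimed, tlsFamily, h2Window }) {
  const profile = PROFILES[claimed];
  if (!profile) return false; // unknown identity: fail closed
  return profile.tlsFamily === tlsFamily && profile.h2Window === h2Window;
}

// A Python stack claiming to be Chrome contradicts itself at both layers.
console.log(crossLayerConsistent({
  claimed: 'chrome',
  tlsFamily: 'openssl-python',
  h2Window: 65535, // protocol default: no WINDOW_UPDATE was ever sent
})); // false
```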

JavaScript environment signals

When automation tools control a browser, they modify the JavaScript environment in ways that are difficult to hide completely. The most obvious is `navigator.webdriver`, but that's just the beginning.

Headless browsers historically lacked APIs that real browsers have: specific plugins, MIME type handlers, screen properties. In November 2022, Google unified its headful and headless Chrome codebases, which eliminated many of these discrepancies. Detection adapted.

Modern JavaScript-based detection focuses on how properties are defined, not their values. When a stealth plugin overrides `navigator.webdriver` to return `false`, the property descriptor changes. The getter is no longer native code; it's a proxy. Calling `toString()` on the getter returns something subtly different from what a real browser returns. Detection systems test dozens of properties this way, checking whether property descriptors and `toString()` representations match a genuine browser environment.
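
The `toString()` trick is observable outside a browser too, since Node's built-ins stringify the same way native getters do. A sketch of the native-code check:

```javascript
// Native functions stringify to "function ... { [native code] }"; a
// JavaScript override cannot reproduce that without further patching.
function looksNative(fn) {
  return typeof fn === 'function' &&
    /\{\s*\[native code\]\s*\}/.test(Function.prototype.toString.call(fn));
}

// A genuine built-in passes the test...
console.log(looksNative(Array.prototype.push)); // true

// ...but a stealth-plugin-style override of navigator.webdriver does not.
const spoofed = {};
Object.defineProperty(spoofed, 'webdriver', {
  configurable: true,
  get() { return false; }, // lies about automation
});
const getter = Object.getOwnPropertyDescriptor(spoofed, 'webdriver').get;
console.log(looksNative(getter)); // false -- the override reveals itself
```

Stealth plugins respond by patching `Function.prototype.toString` as well, which in turn leaves its own traces; that escalation is why detectors check dozens of properties rather than one.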

Canvas and WebGL outputs are determined by the client's GPU. AudioContext processing depends on audio hardware. Automation tools running in cloud environments produce outputs that match the cloud provider's hardware, not the consumer hardware their user agent claims. Two visitors claiming to be Chrome on a MacBook Pro should not produce identical canvas hashes if they're on different machines — and they definitely shouldn't produce hashes that match an AWS GPU instance.

Chrome DevTools Protocol detection

Puppeteer, Playwright, and Selenium all control browsers through the Chrome DevTools Protocol (CDP). CDP works by opening a WebSocket connection to the browser and sending commands through domains like `Runtime`, `Network`, and `Page`.
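
The protocol itself is small: each command is a JSON envelope with an id to correlate the response, a "Domain.method" string, and parameters. A sketch that only builds the messages (actually sending them requires a browser launched with `--remote-debugging-port` and a WebSocket to its DevTools endpoint):

```javascript
// Minimal CDP command builder, in the shape a client like Puppeteer sends.
let nextId = 0;
function cdpCommand(method, params = {}) {
  return JSON.stringify({ id: ++nextId, method, params });
}

const enableRuntime = cdpCommand('Runtime.enable');
const navigate = cdpCommand('Page.navigate', { url: 'https://example.com' });

console.log(enableRuntime); // {"id":1,"method":"Runtime.enable","params":{}}
console.log(navigate);
```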

Detection systems target CDP's side effects. When a CDP client connects and enables the `Runtime` domain via `Runtime.enable`, it triggers object serialization across the WebSocket connection. JavaScript running in the page can observe these serialization artifacts — behaviors that don't occur in a browser with no CDP client attached.
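
One widely used page-side canary exploits exactly this: define a getter on an error's `stack` property and watch whether anything reads it. A sketch of the trap (names are illustrative):

```javascript
// Trap object: reading its .stack flips a flag. In a page, you would log
// the bait with console.debug; with Runtime.enable active, the browser
// serializes the object over the WebSocket and reads .stack, tripping the
// trap. Without a CDP client attached, nothing touches it.
function makeCdpCanary() {
  let inspected = false;
  const bait = new Error('canary');
  Object.defineProperty(bait, 'stack', {
    configurable: true,
    get() { inspected = true; return ''; },
  });
  return { bait, wasInspected: () => inspected };
}

const canary = makeCdpCanary();
console.log(canary.wasInspected()); // false -- nothing has serialized the bait yet
```

Anything that inspects the object locally (a debugger, logging the error itself) also trips the trap, so real detectors have to tune these canaries against false positives.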

In October 2025 alone, Castle detected roughly 205,000 Puppeteer stealth events — but ten times more traffic from vanilla Selenium bots (Castle, 2025). Puppeteer stealth is losing ground because it patches JavaScript-level artifacts without addressing the CDP layer underneath.

The evasion side responded by changing architecture, not patches. "Attackers have started shifting to newer frameworks like nodriver or Rebrowser patches, which focus on removing subtle Chrome DevTools Protocol (CDP) side effects," Castle's research found. These frameworks either minimize CDP usage (avoiding high-risk domains like `Runtime` and `Console`) or eliminate CDP entirely, automating the browser through OS-level input simulation.

Behavioral analysis

Fingerprinting catches bots that lie about what they are. Behavioral analysis catches bots that lie about what they do.

The easiest pattern to spot is mouse movement. Bots move in straight lines at constant speed, with direction locked to fixed angles (0, 90, 180 degrees). Human cursors wander. They follow curved, exploratory paths with variable speed, displacements centered around 300-400 pixels, and irregular pauses. A human exploring a page is inefficient by nature.
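
Those geometric differences are cheap to quantify. A sketch using heading and speed variance over a sampled path, with illustrative thresholds:

```javascript
// points: [{x, y}, ...] sampled at a uniform rate. Bots produce near-zero
// variance in both heading and segment length; humans do not.
function pathStats(points) {
  const headings = [], speeds = [];
  for (let i = 1; i < points.length; i++) {
    const dx = points[i].x - points[i - 1].x;
    const dy = points[i].y - points[i - 1].y;
    headings.push(Math.atan2(dy, dx)); // (ignores the pi/-pi wrap-around; fine for a sketch)
    speeds.push(Math.hypot(dx, dy));
  }
  const variance = (xs) => {
    const mean = xs.reduce((a, b) => a + b, 0) / xs.length;
    return xs.reduce((a, b) => a + (b - mean) ** 2, 0) / xs.length;
  };
  return { headingVar: variance(headings), speedVar: variance(speeds) };
}

function looksAutomated(points) {
  const { headingVar, speedVar } = pathStats(points);
  return headingVar < 1e-6 && speedVar < 1e-6; // perfectly straight, constant speed
}

// A bot gliding right in identical 10px steps:
const botPath = Array.from({ length: 20 }, (_, i) => ({ x: i * 10, y: 0 }));
console.log(looksAutomated(botPath)); // true
```

Production systems score these statistics continuously rather than thresholding once, but the separating signal is the same.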

In controlled tests, keystroke timing analysis alone correctly classified bots with 99.98% accuracy (IFIP, 2024). The signal is inter-key delay: humans vary their rhythm between familiar and unfamiliar words, pause to think, and correct mistakes. Bots either type with mechanical consistency or inject randomness that follows a uniform distribution, which is itself a distinguishing pattern.
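
The mechanical-consistency case can be caught with a one-line statistic: the coefficient of variation (standard deviation over mean) of inter-key delays. A sketch with made-up timestamps:

```javascript
// Keydown timestamps (ms) -> delays between consecutive keys.
function interKeyDelays(timestampsMs) {
  const delays = [];
  for (let i = 1; i < timestampsMs.length; i++) {
    delays.push(timestampsMs[i] - timestampsMs[i - 1]);
  }
  return delays;
}

// Near-zero for a bot typing on a fixed timer; well above zero for humans.
function coefficientOfVariation(xs) {
  const mean = xs.reduce((a, b) => a + b, 0) / xs.length;
  const variance = xs.reduce((a, b) => a + (b - mean) ** 2, 0) / xs.length;
  return Math.sqrt(variance) / mean;
}

// A bot pressing a key exactly every 50ms:
const botKeys = [0, 50, 100, 150, 200, 250];
console.log(coefficientOfVariation(interKeyDelays(botKeys))); // 0
```

Uniformly injected randomness defeats this particular statistic but not distribution tests, which is why the bots that add uniform jitter are still separable.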

Scroll behavior rounds out the picture. Bots scroll at constant rates or jump directly to target positions. Humans pause at content, vary speed, and occasionally scroll backward.

These signals are strongest on interactive pages: login forms, search fields, checkout flows. They are less useful for detecting crawlers that hit static pages without interaction. That's why behavioral analysis complements fingerprinting rather than replacing it.

The arms race

Every detection technique described above has a countermeasure in development. TLS fingerprinting spawned curl-impersonate and tls-client. CDP detection spawned nodriver and Rebrowser. JavaScript fingerprinting spawned anti-detect browsers like Multilogin and GoLogin that maintain separate browser profiles with unique fingerprints.

The data shows how this plays out at scale. Only 2.8% of websites are fully protected against bots, down from 8.4% a year earlier (DataDome, 2025). And 85% of companies hit by account takeover attacks already had bot detection in place (Kasada, 2025). Single-layer detection degrades over time because the evasion ecosystem specifically targets whatever the current detection standard is.

No single signal wins. Cross-layer consistency checking does. A request whose TLS fingerprint, HTTP/2 settings, and JavaScript environment all match Chrome, and whose behavioral signals look human, is probably a real browser. A request where any layer contradicts the others is probably not. The more layers you check simultaneously, the more expensive evasion becomes. Cost is the only deterrent that holds.

What this means for your site

Bad bots accounted for 37% of all internet traffic in 2024, up from 32% the year before (Imperva, 2025). The bots getting through are not the ones announcing themselves with honest user agent strings. They're the ones spoofing everything a surface-level check can see.

If your detection relies on user agent filtering, IP reputation lists, or rate limiting alone, you're catching the laziest bots and missing the rest. Effective detection requires checking signals across multiple layers (TLS handshake, HTTP/2 connection, JavaScript environment, CDP artifacts, behavior) and cross-validating them against each other in real time.

robots.txt handles well-behaved crawlers. For the 37% that aren't well-behaved, you need enforcement at the request level. Centinel checks every request across all of these layers and makes a decision in under 2ms — before the bot reaches your origin.

See what's crawling your site right now

Run a free audit and get a detailed report of which AI crawlers are accessing your content — in 48 hours.

Get your free audit