Practical guides·8 min read

How to detect browser automation beyond user agents

Detection techniques that work when user agents lie — from TLS fingerprints and HTTP/2 parameters to CDP artifacts and behavioral signals.

Why user agent checks fail

Most bot detection starts with the user agent string, a self-reported label that every HTTP client sends with each request. Selenium announces itself. Puppeteer used to announce itself. The problem is that changing a user agent string takes one line of code.

The W3C WebDriver specification requires browsers to set `navigator.webdriver` to `true` when controlled by automation. This was supposed to be a reliable signal. It wasn't. As DataDome's threat research documented, "when WebDriver bots realized they were being detected through navigator.webdriver, they simply toggled it back to false."

The scale of what's getting through is hard to overstate. 51% of all web traffic in 2024 was automated, the first time bots surpassed human visitors in a decade (Imperva, 2025). Of the bad bot traffic, 41% is now classified as "advanced": built specifically to mimic human behavior (Imperva, 2025). These bots spoof everything the browser surface exposes, not only the user agent.

If the signals a client volunteers about itself are unreliable, detection has to move to signals the client can't easily control.

TLS and protocol fingerprinting

Before a browser sends its first HTTP request, it performs a TLS handshake. The ClientHello message in that handshake contains cipher suites, extensions, protocol versions, and ALPN preferences. All are determined by the client's network stack, not by any JavaScript it might run later. These values form a fingerprint.

JA3 was the first widely adopted TLS fingerprinting method. It worked until browsers started randomizing the order of TLS extensions in their ClientHello messages, which broke JA3's hash consistency. JA4, developed by FoxIO and adopted by Cloudflare, fixes this by being resistant to extension randomization and adding protocol-level dimensions like ALPN. Cloudflare tracks over 15 million unique JA4 fingerprints generated from more than 500 million user agents (Cloudflare, 2025).

A Python script claiming to be Chrome 120 will have a JA4 fingerprint that belongs to Python's `requests` library. The mismatch is binary — there is no gray area.

The detection goes deeper at the HTTP/2 layer. When a connection opens, the client sends a SETTINGS frame with parameters that vary by implementation, typically followed by a WINDOW_UPDATE that grows the connection's flow-control window. Chrome raises it to roughly 15MB; Firefox to roughly 12.5MB. Most HTTP libraries never send the frame at all and stay at the protocol default of 64KB, a gap of more than two orders of magnitude, visible before a single page byte is exchanged.

Even the order of pseudo-headers in HTTP/2 requests is hardcoded per browser. Chrome sends `:method, :authority, :scheme, :path`. Firefox sends `:method, :path, :authority, :scheme`. Safari sends `:method, :scheme, :path, :authority`. Standard HTTP libraries use none of these orderings.
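
Because the orderings are fixed per browser, classification is a table lookup. A minimal sketch using the orderings above (the function name is illustrative):

```javascript
// Each browser hardcodes its HTTP/2 pseudo-header order, so a simple
// table lookup separates real browsers from generic HTTP libraries.
const KNOWN_ORDERS = {
  chrome:  [':method', ':authority', ':scheme', ':path'],
  firefox: [':method', ':path', ':authority', ':scheme'],
  safari:  [':method', ':scheme', ':path', ':authority'],
};

function classifyByHeaderOrder(pseudoHeaders) {
  for (const [browser, order] of Object.entries(KNOWN_ORDERS)) {
    if (order.length === pseudoHeaders.length &&
        order.every((h, i) => h === pseudoHeaders[i])) {
      return browser;
    }
  }
  return 'unknown'; // most HTTP libraries land here
}

console.log(classifyByHeaderOrder([':method', ':authority', ':scheme', ':path'])); // "chrome"
```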

A mismatched stack — a BoringSSL TLS fingerprint paired with Python's hyper-h2 HTTP/2 settings, for example — triggers immediate blocking. This cross-layer consistency check operates at the connection level, before any page content is served.
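
The consistency check amounts to a table of per-browser expectations that every layer must match. A minimal sketch, with illustrative profile values and field names:

```javascript
// Per-browser expectation table; a claimed identity must match every layer.
// Values are illustrative stand-ins for real per-version profiles.
const PROFILES = {
  chrome:  { tlsFamily: 'boringssl-chrome', h2Window: 15 * 1024 * 1024 },
  firefox: { tlsFamily: 'nss-firefox', h2Window: Math.round(12.5 * 1024 * 1024) },
};

function crossLayerConsistent({ claimed, tlsFamily, h2Window }) {
  const profile = PROFILES[claimed];
  if (!profile) return false; // unknown identity: fail closed
  return profile.tlsFamily === tlsFamily && profile.h2Window === h2Window;
}

// A Python stack claiming to be Chrome contradicts itself at both layers.
console.log(crossLayerConsistent({
  claimed: 'chrome',
  tlsFamily: 'openssl-python',
  h2Window: 65535, // protocol default: no WINDOW_UPDATE was ever sent
})); // false
```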

JavaScript environment signals

When automation tools control a browser, they modify the JavaScript environment in ways that are difficult to hide completely. The most obvious is `navigator.webdriver`, but that's just the beginning.

Headless browsers historically lacked APIs that real browsers have: specific plugins, MIME type handlers, screen properties. In November 2022, Google unified its headful and headless Chrome codebases, which eliminated many of these discrepancies. Detection adapted.

Modern JavaScript-based detection focuses on how properties are defined, not their values. When a stealth plugin overrides `navigator.webdriver` to return `false`, the property descriptor changes. The getter is no longer native code; it's a proxy. Calling `toString()` on the getter returns something subtly different from what a real browser returns. Detection systems test dozens of properties this way, checking whether property descriptors and `toString()` representations match a genuine browser environment.
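
The `toString()` trick is observable outside a browser too, since Node's built-ins stringify the same way native getters do. A sketch of the native-code check:

```javascript
// Native functions stringify to "function ... { [native code] }"; a
// JavaScript override cannot reproduce that without further patching.
function looksNative(fn) {
  return typeof fn === 'function' &&
    /\{\s*\[native code\]\s*\}/.test(Function.prototype.toString.call(fn));
}

// A genuine built-in passes the test...
console.log(looksNative(Array.prototype.push)); // true

// ...but a stealth-plugin-style override of navigator.webdriver does not.
const spoofed = {};
Object.defineProperty(spoofed, 'webdriver', {
  configurable: true,
  get() { return false; }, // lies about automation
});
const getter = Object.getOwnPropertyDescriptor(spoofed, 'webdriver').get;
console.log(looksNative(getter)); // false -- the override reveals itself
```

Stealth plugins respond by patching `Function.prototype.toString` as well, which in turn leaves its own traces; that escalation is why detectors check dozens of properties rather than one.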

Canvas and WebGL outputs are determined by the client's GPU. AudioContext processing depends on audio hardware. Automation tools running in cloud environments produce outputs that match the cloud provider's hardware, not the consumer hardware their user agent claims. Two visitors claiming to be Chrome on a MacBook Pro should not produce identical canvas hashes if they're on different machines — and they definitely shouldn't produce hashes that match an AWS GPU instance.

Chrome DevTools Protocol detection

Puppeteer, Playwright, and Selenium all control browsers through the Chrome DevTools Protocol (CDP). CDP works by opening a WebSocket connection to the browser and sending commands through domains like `Runtime`, `Network`, and `Page`.
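
The protocol itself is small: each command is a JSON envelope with an id to correlate the response, a "Domain.method" string, and parameters. A sketch that only builds the messages (actually sending them requires a browser launched with `--remote-debugging-port` and a WebSocket to its DevTools endpoint):

```javascript
// Minimal CDP command builder, in the shape a client like Puppeteer sends.
let nextId = 0;
function cdpCommand(method, params = {}) {
  return JSON.stringify({ id: ++nextId, method, params });
}

const enableRuntime = cdpCommand('Runtime.enable');
const navigate = cdpCommand('Page.navigate', { url: 'https://example.com' });

console.log(enableRuntime); // {"id":1,"method":"Runtime.enable","params":{}}
console.log(navigate);
```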

Detection systems target CDP's side effects. When a CDP client connects and enables the `Runtime` domain via `Runtime.enable`, it triggers object serialization across the WebSocket connection. JavaScript running in the page can observe these serialization artifacts — behaviors that don't occur in a browser with no CDP client attached.
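
One widely used page-side canary exploits exactly this: define a getter on an error's `stack` property and watch whether anything reads it. A sketch of the trap (names are illustrative):

```javascript
// Trap object: reading its .stack flips a flag. In a page, you would log
// the bait with console.debug; with Runtime.enable active, the browser
// serializes the object over the WebSocket and reads .stack, tripping the
// trap. Without a CDP client attached, nothing touches it.
function makeCdpCanary() {
  let inspected = false;
  const bait = new Error('canary');
  Object.defineProperty(bait, 'stack', {
    configurable: true,
    get() { inspected = true; return ''; },
  });
  return { bait, wasInspected: () => inspected };
}

const canary = makeCdpCanary();
console.log(canary.wasInspected()); // false -- nothing has serialized the bait yet
```

Anything that inspects the object locally (a debugger, logging the error itself) also trips the trap, so real detectors have to tune these canaries against false positives.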

In October 2025 alone, Castle detected roughly 205,000 Puppeteer stealth events — but ten times more traffic from vanilla Selenium bots (Castle, 2025). Puppeteer stealth is losing ground because it patches JavaScript-level artifacts without addressing the CDP layer underneath.

The evasion side responded by changing architecture, not patches. "Attackers have started shifting to newer frameworks like nodriver or Rebrowser patches, which focus on removing subtle Chrome DevTools Protocol (CDP) side effects," Castle's research found. These frameworks either minimize CDP usage (avoiding high-risk domains like `Runtime` and `Console`) or eliminate CDP entirely, automating the browser through OS-level input simulation.

Behavioral analysis

Fingerprinting catches bots that lie about what they are. Behavioral analysis catches bots that lie about what they do.

The easiest pattern to spot is mouse movement. Bots move in straight lines at constant speed, with direction locked to fixed angles (0, 90, 180 degrees). Human cursors wander. They follow curved, exploratory paths with variable speed, displacements centered around 300-400 pixels, and irregular pauses. A human exploring a page is inefficient by nature.
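
Those geometric differences are cheap to quantify. A sketch using heading and speed variance over a sampled path, with illustrative thresholds:

```javascript
// points: [{x, y}, ...] sampled at a uniform rate. Bots produce near-zero
// variance in both heading and segment length; humans do not.
function pathStats(points) {
  const headings = [], speeds = [];
  for (let i = 1; i < points.length; i++) {
    const dx = points[i].x - points[i - 1].x;
    const dy = points[i].y - points[i - 1].y;
    headings.push(Math.atan2(dy, dx)); // (ignores the pi/-pi wrap-around; fine for a sketch)
    speeds.push(Math.hypot(dx, dy));
  }
  const variance = (xs) => {
    const mean = xs.reduce((a, b) => a + b, 0) / xs.length;
    return xs.reduce((a, b) => a + (b - mean) ** 2, 0) / xs.length;
  };
  return { headingVar: variance(headings), speedVar: variance(speeds) };
}

function looksAutomated(points) {
  const { headingVar, speedVar } = pathStats(points);
  return headingVar < 1e-6 && speedVar < 1e-6; // perfectly straight, constant speed
}

// A bot gliding right in identical 10px steps:
const botPath = Array.from({ length: 20 }, (_, i) => ({ x: i * 10, y: 0 }));
console.log(looksAutomated(botPath)); // true
```

Production systems score these statistics continuously rather than thresholding once, but the separating signal is the same.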

In controlled tests, keystroke timing analysis alone correctly classified bots with 99.98% accuracy (IFIP, 2024). The signal is inter-key delay: humans vary their rhythm between familiar and unfamiliar words, pause to think, and correct mistakes. Bots either type with mechanical consistency or inject randomness that follows a uniform distribution, which is itself a distinguishing pattern.
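
The mechanical-consistency case can be caught with a one-line statistic: the coefficient of variation (standard deviation over mean) of inter-key delays. A sketch with made-up timestamps:

```javascript
// Keydown timestamps (ms) -> delays between consecutive keys.
function interKeyDelays(timestampsMs) {
  const delays = [];
  for (let i = 1; i < timestampsMs.length; i++) {
    delays.push(timestampsMs[i] - timestampsMs[i - 1]);
  }
  return delays;
}

// Near-zero for a bot typing on a fixed timer; well above zero for humans.
function coefficientOfVariation(xs) {
  const mean = xs.reduce((a, b) => a + b, 0) / xs.length;
  const variance = xs.reduce((a, b) => a + (b - mean) ** 2, 0) / xs.length;
  return Math.sqrt(variance) / mean;
}

// A bot pressing a key exactly every 50ms:
const botKeys = [0, 50, 100, 150, 200, 250];
console.log(coefficientOfVariation(interKeyDelays(botKeys))); // 0
```

Uniformly injected randomness defeats this particular statistic but not distribution tests, which is why the bots that add uniform jitter are still separable.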

Scroll behavior rounds out the picture. Bots scroll at constant rates or jump directly to target positions. Humans pause at content, vary speed, and occasionally scroll backward.

These signals are strongest on interactive pages: login forms, search fields, checkout flows. They are less useful for detecting crawlers that hit static pages without interaction. That's why behavioral analysis complements fingerprinting rather than replacing it.

The arms race

Every detection technique described above has a countermeasure in development. TLS fingerprinting spawned curl-impersonate and tls-client. CDP detection spawned nodriver and Rebrowser. JavaScript fingerprinting spawned anti-detect browsers like Multilogin and GoLogin that maintain separate browser profiles with unique fingerprints.

The data shows how this plays out at scale. Only 2.8% of websites are fully protected against bots, down from 8.4% a year earlier (DataDome, 2025). And 85% of companies hit by account takeover attacks already had bot detection in place (Kasada, 2025). Single-layer detection degrades over time because the evasion ecosystem specifically targets whatever the current detection standard is.

No single signal wins. Cross-layer consistency checking does. A request whose TLS fingerprint, HTTP/2 settings, and JavaScript environment all match Chrome, and whose behavioral signals look human, is probably a real browser. A request where any layer contradicts the others is probably not. The more layers you check simultaneously, the more expensive evasion becomes. Cost is the only deterrent that holds.

What this means for your site

Bad bots accounted for 37% of all internet traffic in 2024, up from 32% the year before (Imperva, 2025). The bots getting through are not the ones announcing themselves with honest user agent strings. They're the ones spoofing everything a surface-level check can see.

If your detection relies on user agent filtering, IP reputation lists, or rate limiting alone, you're catching the laziest bots and missing the rest. Effective detection requires checking signals across multiple layers (TLS handshake, HTTP/2 connection, JavaScript environment, CDP artifacts, behavior) and cross-validating them against each other in real time.

robots.txt handles well-behaved crawlers. For the 37% that aren't well-behaved, you need enforcement at the request level. Centinel checks every request across all of these layers and makes a decision in under 2ms — before the bot reaches your origin.

See what's crawling your site right now

Run a free audit and get a detailed report of which AI crawlers are accessing your content — in 48 hours.

Get your free audit