
TLS fingerprinting explained

How TLS fingerprinting identifies bots by the shape of their handshake — JA3, JA4, and why it catches scrapers that user-agent checks miss.

What is TLS fingerprinting?

TLS fingerprinting identifies the software making an HTTPS connection by inspecting the first message of the handshake. When a client opens a TLS connection, it sends a Client Hello packet that announces which TLS version it supports, which cipher suites it prefers, which extensions it understands, and in what order. That combination is a byte-level signature of the underlying TLS library, and different libraries produce different signatures.

The useful part: the Client Hello is a low-level implementation detail, not a header the scraper chose to set. You can put any string you want in the User-Agent field, but you cannot easily change the cipher suite list your TLS library ships with. A Python script claiming to be Chrome will get caught the moment the handshake hits the wire.
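You can see the library-pinned side of this from Python itself. A minimal sketch: the cipher list below comes from the OpenSSL build Python was linked against, and no header you set will change it (the exact list varies by platform and Python version).

```python
import ssl

# The cipher list is baked into the linked OpenSSL build. Changing the
# User-Agent header does nothing to it: this is what the TLS library
# will actually offer in the Client Hello.
ctx = ssl.create_default_context()
ciphers = [c["name"] for c in ctx.get_ciphers()]

print(f"{len(ciphers)} ciphers offered, e.g. {ciphers[0]}")
```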

How JA3 works

JA3 was the first widely adopted TLS fingerprinting method. Salesforce engineers published it in January 2019, building on work they had already released on GitHub. It takes five fields from the TLS Client Hello (version, offered cipher suites, list of extensions, elliptic curves, and elliptic curve point formats), concatenates them, and MD5-hashes the result into a 32-character fingerprint.

Here is what the input looks like before hashing: `769,47-53-5-10-49161-49162-49171-49172-50-56-19-4,0-10-11,23-24-25,0`. That becomes the hash `ada70206e40642a3e4461f35503241d5`.
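The hashing step is nothing more than MD5 over that comma-separated string. A minimal sketch using the example values above:

```python
import hashlib

# The five Client Hello fields, already rendered as the comma-separated
# JA3 string (version, ciphers, extensions, curves, point formats;
# each list joined with dashes).
ja3_string = "769,47-53-5-10-49161-49162-49171-49172-50-56-19-4,0-10-11,23-24-25,0"

ja3_hash = hashlib.md5(ja3_string.encode()).hexdigest()
print(ja3_hash)  # ada70206e40642a3e4461f35503241d5
```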

Browsers, scraping libraries, and bot frameworks each present their own combination of ciphers and extensions, so the JA3 hash is a reasonably stable identifier for the underlying software. Python's requests library has one JA3 hash. curl has another. Firefox has its own. A scraper that sends `Mozilla/5.0...` in the User-Agent but produces Python's JA3 on the wire has already told you it is lying.

Why Chrome broke JA3 in 2023

The method worked for years. Then in early 2023, Chrome started randomizing the order of TLS extensions in its Client Hello. The goal was to prevent servers and middleboxes from fixating on a specific extension sequence, but the side effect was that JA3 lost its ability to identify Chrome.

The math is brutal. A single Chrome client sending 16 extensions in randomized order can produce 16 factorial different orderings, roughly 20.9 trillion distinct JA3 hashes from the same browser on the same machine. As Stamus Networks put it after measuring the impact, "JA3 has been rendered useless for identifying clients and user agents."
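The arithmetic is easy to check:

```python
import math

# 16 extensions in a randomized order can appear in 16! permutations,
# and order-sensitive JA3 hashes each ordering to a different value.
orderings = math.factorial(16)
print(f"{orderings:,}")  # 20,922,789,888,000 -- roughly 20.9 trillion
```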

Firefox followed with similar randomization. Within months, the industry's default TLS fingerprint no longer distinguished a real browser from a noise pattern, which left bot detection vendors with a gap and no obvious replacement.

Enter JA4

FoxIO released JA4 in September 2023 to fix the randomization problem directly. The fix is simple: sort the cipher and extension lists before hashing them. Randomized order no longer changes the output because the output never depended on order in the first place.
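The effect of sorting is easy to demonstrate. This is an illustrative sketch, not the JA4 algorithm itself: the extension IDs are arbitrary and the truncated SHA-256 merely echoes the shape of JA4's hashed segments.

```python
import hashlib
import random

def fingerprint(extensions: list[int], sort_first: bool) -> str:
    """Hash an extension list, optionally order-normalizing it first."""
    values = sorted(extensions) if sort_first else extensions
    data = ",".join(str(v) for v in values).encode()
    return hashlib.sha256(data).hexdigest()[:12]

exts = [0, 5, 10, 11, 13, 16, 18, 23, 27, 35, 43, 45, 51]
shuffled = exts[:]
random.shuffle(shuffled)

# Order-sensitive (JA3-style): the same client can look like trillions
# of different clients. Sorted (JA4-style): randomization disappears.
print(fingerprint(exts, False) == fingerprint(shuffled, False))  # usually False
print(fingerprint(exts, True) == fingerprint(shuffled, True))
```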

JA4 also restructured the fingerprint itself. Instead of one opaque MD5, a JA4 fingerprint has an `a_b_c` format with three segments that can be queried independently. The `a` segment encodes protocol metadata: TLS version, SNI presence, ALPN value, cipher and extension counts. The `b` segment is a truncated SHA-256 of the sorted cipher list. The `c` segment is a truncated SHA-256 of sorted extensions plus signature algorithms. A detection engine can match on any single part of the fingerprint without needing the whole thing, which is useful when a client's ciphers are stable but its extensions are randomized, or when only the protocol metadata is known.
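A sketch of how the segments pull apart, using an illustrative fingerprint of the documented `a_b_c` shape (the value below is for demonstration, not captured from live traffic; field offsets follow my reading of the JA4 spec):

```python
# 'a' packs protocol metadata into fixed positions; 'b' and 'c' are
# truncated SHA-256 hashes of the sorted cipher and extension lists.
ja4 = "t13d1516h2_8daaf6152771_b0da82dd1658"

a, b, c = ja4.split("_")
meta = {
    "transport": a[0],            # 't' = TCP, 'q' = QUIC
    "tls_version": a[1:3],        # '13' = TLS 1.3
    "sni": a[3],                  # 'd' = SNI to a domain, 'i' = to an IP
    "cipher_count": int(a[4:6]),
    "extension_count": int(a[6:8]),
    "alpn": a[8:10],              # first and last char of the ALPN value
}
print(meta)
```

A rule engine can match on `b` alone when a client's ciphers are stable but its extensions churn, which is exactly the queryability the segmented format was designed for.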

JA4 is actually one member of a larger suite. JA4+ includes JA4S for TLS servers, JA4H for HTTP request patterns, JA4X for X.509 certificate generation, JA4T for TCP stack fingerprints, and JA4SSH for SSH connections. Each captures a different layer of the connection, and the segments are designed to be composable.

One caveat the FoxIO team calls out in the spec: a browser's JA4 will shift roughly once a year as its TLS library updates. Fingerprint databases need to track those updates or they accumulate false positives against the latest Chrome. By early 2025, JA4 had shipped in Cloudflare and Fastly bot management, with Akamai and others tracking the format.

What TLS fingerprints reveal about bots

The scale at which TLS fingerprinting operates is easier to show with numbers. Cloudflare reports analyzing over 15 million unique JA4 fingerprints per day across more than 500 million user agents and billions of IP addresses (Cloudflare, 2024). That is the aggregate signal set a large edge provider pulls from every day.

The useful part of that dataset is how cleanly it separates browsers from everything else. Out-of-the-box HTTP libraries like Python's requests, Go's `net/http`, and Node's axios all produce TLS handshakes that no real browser would ever send. The cipher order is different. The extension set is different. GREASE values (the intentionally invalid bytes Chrome injects to keep middleboxes honest) are missing. Session ticket behavior is different. You do not need machine learning to catch them. A lookup against a handful of known library fingerprints is enough to flag the first packet.
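That lookup can be as simple as a dictionary check. A minimal sketch, with placeholder hashes standing in for real library fingerprints:

```python
# Hypothetical fingerprint database: these hash values are invented
# placeholders, not real JA4 values. The check mirrors the lookup
# described above: flag any request whose TLS fingerprint belongs to a
# known HTTP library while its User-Agent claims to be a browser.
KNOWN_LIBRARIES = {
    "t13d0000h0_aaaaaaaaaaaa_aaaaaaaaaaaa": "python-requests",
    "t13d0000h0_bbbbbbbbbbbb_bbbbbbbbbbbb": "go-net-http",
}

def is_lying(ja4: str, user_agent: str) -> bool:
    """True when the handshake says 'HTTP library' but the UA says 'browser'."""
    library = KNOWN_LIBRARIES.get(ja4)
    claims_browser = "Mozilla/" in user_agent
    return library is not None and claims_browser

print(is_lying("t13d0000h0_aaaaaaaaaaaa_aaaaaaaaaaaa",
               "Mozilla/5.0 (Windows NT 10.0)"))  # True: requests posing as a browser
```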

That matters because automated traffic is now the majority of the web. Imperva's 2025 Bad Bot Report found that automated traffic accounted for 51% of all web traffic in 2024 for the first time in a decade, with bad bots making up 37% (Imperva, 2025). A site that cannot tell a library from a browser is blind to more than a third of its own traffic, and that blindness is most expensive in exactly the places scrapers target: login endpoints, pricing pages, search APIs, and RSS feeds.

How bots fight back

Scrapers saw JA3 coming years ago. There is now a small industry of impersonation libraries built specifically to replay real browser handshakes from scripts. curl-impersonate is a fork of curl built against NSS and BoringSSL that produces byte-identical Client Hellos for recent versions of Chrome, Firefox, Edge, and Safari: the same TLS handshake a real browser would send, driven by a Python or shell script. uTLS is a Go fork of `crypto/tls` that exposes ClientHello construction, so a developer can set any field they want and pick from a library of pre-built browser profiles. CycleTLS brings the same capability to Node.js. curl_cffi gives Python the curl-impersonate handshake as a drop-in replacement for the `requests` library. All four are open source. All four are trivial to install.

These are not obscure research tools. They ship in the toolchains of commercial scraping services, and the services that use them are measurably hard to block. A 2025 UC Davis study tested 20 commercial bot services against two leading detection systems and measured average evasion rates of 52.93% against DataDome and 44.56% against BotD across more than half a million requests (Venugopalan et al., IMC 2025).

Call that what it is. Peer-reviewed data shows that over half of commercial scraping traffic gets through one of the most-deployed bot detection systems in the industry. The number is not a marketing claim. It is an academic measurement from traffic captured under realistic conditions.

Why fingerprints alone aren't enough

Cloudflare's own engineers are upfront about this. In the blog post announcing JA4 Signals, the team writes that "fingerprints can be easily spoofed, they change frequently, and traffic patterns and behaviors are constantly evolving." The answer they pair with JA4 is behavioral: inter-request features computed across the last hour of global traffic, looking for clusters that no honest browser would ever form.

The logic is straightforward. A single JA4 hash tells you what software made the connection. It does not tell you what that software is doing. A forged Chrome fingerprint coming from thousands of residential IPs, hitting hundreds of pages per minute per session, requesting only HTML and skipping every image, is still obviously a bot. Just not because of its JA4. The fingerprint narrows the search. Behavior confirms the verdict.
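A toy version of that behavioral check, in the spirit of the description above. The thresholds and field names are invented for illustration; a production system computes these features over global traffic, not per request:

```python
# Hypothetical session features. Any two of these signals together
# outweigh a clean (possibly forged) TLS fingerprint.
def looks_automated(session: dict) -> bool:
    signals = [
        session["pages_per_minute"] > 120,        # no human reads that fast
        session["html_to_asset_ratio"] > 0.95,    # skips images, CSS, JS
        session["distinct_ips_last_hour"] > 50,   # rotating residential proxies
    ]
    return sum(signals) >= 2

bot = {"pages_per_minute": 300, "html_to_asset_ratio": 1.0,
       "distinct_ips_last_hour": 400}
human = {"pages_per_minute": 4, "html_to_asset_ratio": 0.2,
         "distinct_ips_last_hour": 1}
print(looks_automated(bot), looks_automated(human))  # True False
```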

This is why modern bot detection is layered. TLS fingerprint is one input. IP reputation, request cadence, header consistency, JavaScript execution signals, and crawler database matches are others. Remove any single layer and effectiveness drops, but no single layer is enough on its own.

What this means for AI crawler defense

AI crawlers run the same tooling as every other scraper: the same impersonation libraries and the same residential proxy networks. Tollbit data shows that roughly a third of AI scrapes bypass robots.txt entirely, which means you cannot trust what a crawler says about itself in either the User-Agent header or the robots.txt opt-out. The durable way to identify an AI crawler is to inspect what its software actually does at the protocol level and match against a database of known crawlers.

That is what Centinel runs at the edge. Every request is matched against 1,600+ crawler fingerprints using TLS signals, header patterns, and behavioral heuristics, the same layered approach Cloudflare, Fastly, and the academic research all point toward, applied specifically to AI crawler detection. TLS fingerprinting gives you the baseline. A crawler database and behavioral enforcement close the rest of the gap.

See what's crawling your site right now

Run a free audit and get a detailed report of which AI crawlers are accessing your content — in 48 hours.

Get your free audit