
TLS fingerprinting explained

How TLS fingerprinting identifies bots by the shape of their handshake — JA3, JA4, and why it catches scrapers that user-agent checks miss.

What is TLS fingerprinting?

TLS fingerprinting identifies the software making an HTTPS connection by inspecting the first message of the handshake. When a client opens a TLS connection, it sends a Client Hello packet that announces which TLS version it supports, which cipher suites it prefers, which extensions it understands, and in what order. That combination is a byte-level signature of the underlying TLS library, and different libraries produce different signatures.

The useful part: the Client Hello is a low-level implementation detail, not a header the scraper chose to set. You can put any string you want in the User-Agent field, but you cannot easily change the cipher suite list your TLS library ships with. A Python script claiming to be Chrome will get caught the moment the handshake hits the wire.
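You can see the library-pinned side of this from Python itself. A minimal sketch: the cipher list below comes from the OpenSSL build Python was linked against, and no header you set will change it (the exact list varies by platform and Python version).

```python
import ssl

# The cipher list is baked into the linked OpenSSL build. Changing the
# User-Agent header does nothing to it: this is what the TLS library
# will actually offer in the Client Hello.
ctx = ssl.create_default_context()
ciphers = [c["name"] for c in ctx.get_ciphers()]

print(f"{len(ciphers)} ciphers offered, e.g. {ciphers[0]}")
```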

How JA3 works

JA3 was the first widely adopted TLS fingerprinting method. Salesforce engineers published it in January 2019, building on work they had already released on GitHub. It takes five fields from the TLS Client Hello (version, offered cipher suites, list of extensions, elliptic curves, and elliptic curve point formats), concatenates them, and MD5-hashes the result into a 32-character fingerprint.

Here is what the input looks like before hashing: `769,47-53-5-10-49161-49162-49171-49172-50-56-19-4,0-10-11,23-24-25,0`. That becomes the hash `ada70206e40642a3e4461f35503241d5`.
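The hashing step is nothing more than MD5 over that comma-separated string. A minimal sketch using the example values above:

```python
import hashlib

# The five Client Hello fields, already rendered as the comma-separated
# JA3 string (version, ciphers, extensions, curves, point formats;
# each list joined with dashes).
ja3_string = "769,47-53-5-10-49161-49162-49171-49172-50-56-19-4,0-10-11,23-24-25,0"

ja3_hash = hashlib.md5(ja3_string.encode()).hexdigest()
print(ja3_hash)  # ada70206e40642a3e4461f35503241d5
```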

Browsers, scraping libraries, and bot frameworks each present their own combination of ciphers and extensions, so the JA3 hash is a reasonably stable identifier for the underlying software. Python's requests library has one JA3 hash. curl has another. Firefox has its own. A scraper that sends `Mozilla/5.0...` in the User-Agent but produces Python's JA3 on the wire has already told you it is lying.

Why Chrome broke JA3 in 2023

The method worked for years. Then in early 2023, Chrome started randomizing the order of TLS extensions in its Client Hello. The goal was to prevent servers and middleboxes from fixating on a specific extension sequence, but the side effect was that JA3 lost its ability to identify Chrome.

The math is brutal. A single Chrome client sending 16 extensions in randomized order can produce 16 factorial different orderings, roughly 20.9 trillion distinct JA3 hashes from the same browser on the same machine. As Stamus Networks put it after measuring the impact, "JA3 has been rendered useless for identifying clients and user agents."
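The arithmetic is easy to check:

```python
import math

# 16 extensions in a randomized order can appear in 16! permutations,
# and order-sensitive JA3 hashes each ordering to a different value.
orderings = math.factorial(16)
print(f"{orderings:,}")  # 20,922,789,888,000 -- roughly 20.9 trillion
```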

Firefox followed with similar randomization. Within months, the industry's default TLS fingerprint no longer distinguished a real browser from a noise pattern, which left bot detection vendors with a gap and no obvious replacement.

Enter JA4

FoxIO released JA4 in September 2023 to fix the randomization problem directly. The fix is simple: sort the cipher and extension lists before hashing them. Randomized order no longer changes the output because the output never depended on order in the first place.
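The effect of sorting is easy to demonstrate. This is an illustrative sketch, not the JA4 algorithm itself: the extension IDs are arbitrary and the truncated SHA-256 merely echoes the shape of JA4's hashed segments.

```python
import hashlib
import random

def fingerprint(extensions: list[int], sort_first: bool) -> str:
    """Hash an extension list, optionally order-normalizing it first."""
    values = sorted(extensions) if sort_first else extensions
    data = ",".join(str(v) for v in values).encode()
    return hashlib.sha256(data).hexdigest()[:12]

exts = [0, 5, 10, 11, 13, 16, 18, 23, 27, 35, 43, 45, 51]
shuffled = exts[:]
random.shuffle(shuffled)

# Order-sensitive (JA3-style): the same client can look like trillions
# of different clients. Sorted (JA4-style): randomization disappears.
print(fingerprint(exts, False) == fingerprint(shuffled, False))  # usually False
print(fingerprint(exts, True) == fingerprint(shuffled, True))
```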

JA4 also restructured the fingerprint itself. Instead of one opaque MD5, a JA4 fingerprint has an `a_b_c` format with three segments that can be queried independently. The `a` segment encodes protocol metadata: TLS version, SNI presence, ALPN value, cipher and extension counts. The `b` segment is a truncated SHA-256 of the sorted cipher list. The `c` segment is a truncated SHA-256 of sorted extensions plus signature algorithms. A detection engine can match on any single part of the fingerprint without needing the whole thing, which is useful when a client's ciphers are stable but its extensions are randomized, or when only the protocol metadata is known.
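A sketch of how the segments pull apart, using an illustrative fingerprint of the documented `a_b_c` shape (the value below is for demonstration, not captured from live traffic; field offsets follow my reading of the JA4 spec):

```python
# 'a' packs protocol metadata into fixed positions; 'b' and 'c' are
# truncated SHA-256 hashes of the sorted cipher and extension lists.
ja4 = "t13d1516h2_8daaf6152771_b0da82dd1658"

a, b, c = ja4.split("_")
meta = {
    "transport": a[0],            # 't' = TCP, 'q' = QUIC
    "tls_version": a[1:3],        # '13' = TLS 1.3
    "sni": a[3],                  # 'd' = SNI to a domain, 'i' = to an IP
    "cipher_count": int(a[4:6]),
    "extension_count": int(a[6:8]),
    "alpn": a[8:10],              # first and last char of the ALPN value
}
print(meta)
```

A rule engine can match on `b` alone when a client's ciphers are stable but its extensions churn, which is exactly the queryability the segmented format was designed for.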

JA4 is actually one member of a larger suite. JA4+ includes JA4S for TLS servers, JA4H for HTTP request patterns, JA4X for X.509 certificate generation, JA4T for TCP stack fingerprints, and JA4SSH for SSH connections. Each captures a different layer of the connection, and the segments are designed to be composable.

One caveat the FoxIO team calls out in the spec: a browser's JA4 will shift roughly once a year as its TLS library updates. Fingerprint databases need to track those updates or they accumulate false positives against the latest Chrome. By early 2025, JA4 had shipped in Cloudflare and Fastly bot management, with Akamai and others tracking the format.

What TLS fingerprints reveal about bots

The scale at which TLS fingerprinting operates is easier to show with numbers. Cloudflare reports analyzing over 15 million unique JA4 fingerprints per day across more than 500 million user agents and billions of IP addresses (Cloudflare, 2024). That is the aggregate signal set a large edge provider pulls from every day.

The useful part of that dataset is how cleanly it separates browsers from everything else. Out-of-the-box HTTP libraries like Python's requests, Go's `net/http`, and Node's axios all produce TLS handshakes that no real browser would ever send. The cipher order is different. The extension set is different. GREASE values (the intentionally invalid bytes Chrome injects to keep middleboxes honest) are missing. Session ticket behavior is different. You do not need machine learning to catch them. A lookup against a handful of known library fingerprints is enough to flag the first packet.
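That lookup can be as simple as a dictionary check. A minimal sketch, with placeholder hashes standing in for real library fingerprints:

```python
# Hypothetical fingerprint database: these hash values are invented
# placeholders, not real JA4 values. The check mirrors the lookup
# described above: flag any request whose TLS fingerprint belongs to a
# known HTTP library while its User-Agent claims to be a browser.
KNOWN_LIBRARIES = {
    "t13d0000h0_aaaaaaaaaaaa_aaaaaaaaaaaa": "python-requests",
    "t13d0000h0_bbbbbbbbbbbb_bbbbbbbbbbbb": "go-net-http",
}

def is_lying(ja4: str, user_agent: str) -> bool:
    """True when the handshake says 'HTTP library' but the UA says 'browser'."""
    library = KNOWN_LIBRARIES.get(ja4)
    claims_browser = "Mozilla/" in user_agent
    return library is not None and claims_browser

print(is_lying("t13d0000h0_aaaaaaaaaaaa_aaaaaaaaaaaa",
               "Mozilla/5.0 (Windows NT 10.0)"))  # True: requests posing as a browser
```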

That matters because automated traffic is now the majority of the web. Imperva's 2025 Bad Bot Report found that automated traffic accounted for 51% of all web traffic in 2024 for the first time in a decade, with bad bots making up 37% (Imperva, 2025). A site that cannot tell a library from a browser is blind to more than a third of its own traffic, and that blindness is most expensive in exactly the places scrapers target: login endpoints, pricing pages, search APIs, and RSS feeds.

How bots fight back

Scrapers saw JA3 coming years ago. There is now a small industry of impersonation libraries built specifically to replay real browser handshakes from scripts. curl-impersonate is a fork of curl built against NSS and BoringSSL that produces byte-identical Client Hellos for recent versions of Chrome, Firefox, Edge, and Safari: the same TLS handshake a real browser would send, driven by a Python or shell script. uTLS is a Go fork of `crypto/tls` that exposes ClientHello construction, so a developer can set any field they want and pick from a library of pre-built browser profiles. CycleTLS brings the same capability to Node.js. curl_cffi gives Python the curl-impersonate handshake as a drop-in replacement for the `requests` library. All four are open source. All four are trivial to install.

These are not obscure research tools. They ship in the toolchains of commercial scraping services, and the services that use them are measurably hard to block. A 2025 UC Davis study tested 20 commercial bot services against two leading detection systems and measured average evasion rates of 52.93% against DataDome and 44.56% against BotD across more than half a million requests (Venugopalan et al., IMC 2025).

Call that what it is. Peer-reviewed data shows that over half of commercial scraping traffic gets through one of the most-deployed bot detection systems in the industry. The number is not a marketing claim. It is an academic measurement from traffic captured under realistic conditions.

Why fingerprints alone aren't enough

Cloudflare's own engineers are upfront about this. In the blog post announcing JA4 Signals, the team writes that "fingerprints can be easily spoofed, they change frequently, and traffic patterns and behaviors are constantly evolving." The answer they pair with JA4 is behavioral: inter-request features computed across the last hour of global traffic, looking for clusters that no honest browser would ever form.

The logic is straightforward. A single JA4 hash tells you what software made the connection. It does not tell you what that software is doing. A forged Chrome fingerprint coming from thousands of residential IPs, hitting hundreds of pages per minute per session, requesting only HTML and skipping every image, is still obviously a bot. Just not because of its JA4. The fingerprint narrows the search. Behavior confirms the verdict.
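A toy version of that behavioral check, in the spirit of the description above. The thresholds and field names are invented for illustration; a production system computes these features over global traffic, not per request:

```python
# Hypothetical session features. Any two of these signals together
# outweigh a clean (possibly forged) TLS fingerprint.
def looks_automated(session: dict) -> bool:
    signals = [
        session["pages_per_minute"] > 120,        # no human reads that fast
        session["html_to_asset_ratio"] > 0.95,    # skips images, CSS, JS
        session["distinct_ips_last_hour"] > 50,   # rotating residential proxies
    ]
    return sum(signals) >= 2

bot = {"pages_per_minute": 300, "html_to_asset_ratio": 1.0,
       "distinct_ips_last_hour": 400}
human = {"pages_per_minute": 4, "html_to_asset_ratio": 0.2,
         "distinct_ips_last_hour": 1}
print(looks_automated(bot), looks_automated(human))  # True False
```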

This is why modern bot detection is layered. TLS fingerprint is one input. IP reputation, request cadence, header consistency, JavaScript execution signals, and crawler database matches are others. Remove any single layer and effectiveness drops, but no single layer is enough on its own.

What this means for AI crawler defense

AI crawlers run the same tooling as every other scraper: the same impersonation libraries and the same residential proxy networks. Tollbit data shows that roughly a third of AI scrapes bypass robots.txt entirely, which means you cannot trust what a crawler says about itself in either the User-Agent header or the robots.txt opt-out. The durable way to identify an AI crawler is to inspect what its software actually does at the protocol level and match against a database of known crawlers.

That is what Centinel runs at the edge. Every request is matched against 1,600+ crawler fingerprints using TLS signals, header patterns, and behavioral heuristics, the same layered approach Cloudflare, Fastly, and the academic research all point toward, applied specifically to AI crawler detection. TLS fingerprinting gives you the baseline. A crawler database and behavioral enforcement close the rest of the gap.

See what's crawling your site right now

Run a free audit and get a detailed report of which AI crawlers are accessing your content — in 48 hours.

Get your free audit