The State of TLS Fingerprinting: What’s Working, What Isn’t, and What’s Next
TLS fingerprinting has become a prevalent tool for helping security defenders identify which clients are talking to their server infrastructure. The idea, which originated in Lee Brotherston’s 2015 research (published as a blog post and later presented at DerbyCon), is that individual clients (and servers) handle TLS negotiation differently enough that we can tell them apart. Since then, several methods and protocol adaptations have built on this idea.
In this post, we’ll discuss three fingerprinting methods, along with the insights each one offers and the caveats that come with it.
The Trailblazer: JA3
In 2017, a trio of researchers from Salesforce – John Althouse, Jeff Atkinson, and Josh Atkins – released a passive method for TLS fingerprinting called JA3. JA3 is used for fingerprinting a TLS client, and JA3S is its counterpart for servers. This method was found to be useful for identifying not only malware clients and servers, but also web API clients and browsers.
To calculate the JA3 fingerprint, we receive or observe a TLS Client Hello message and extract the TLS version, accepted ciphers, list of extensions, supported groups (elliptic curves) and elliptic curve point formats. The extraction process is fairly simple: we take the decimal values of those fields, join the values within each field with hyphens, and join the five fields with commas in the order given by the method specification.
Example JA3 fingerprint and hash
This fingerprint is also sometimes referred to as the JA3 string. Because the string can be long, JA3 implementations calculate an MD5 hash of it. The resulting 32-character hash is easier to share with others and more compact for database lookups.
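The calculation described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: it assumes the Client Hello has already been parsed into lists of decimal values, and the function names `ja3_string` and `ja3_hash` are ours. It also drops GREASE values (RFC 8701), which the JA3 specification ignores.

```python
import hashlib

# GREASE values (RFC 8701): 0x0A0A, 0x1A1A, ..., 0xFAFA. Some clients inject
# them at random, so JA3 ignores them to keep the fingerprint stable.
GREASE = {0x0A0A + 0x1010 * i for i in range(16)}


def ja3_string(version, ciphers, extensions, groups, point_formats):
    """Build the JA3 string from already-parsed decimal Client Hello values."""
    def join(values):
        return "-".join(str(v) for v in values if v not in GREASE)

    return ",".join([
        str(version),
        join(ciphers),
        join(extensions),
        join(groups),
        join(point_formats),
    ])


def ja3_hash(fingerprint):
    """MD5 hex digest of the JA3 string (32 hexadecimal characters)."""
    return hashlib.md5(fingerprint.encode("ascii")).hexdigest()


if __name__ == "__main__":
    # Illustrative field values, not taken from a real capture.
    fp = ja3_string(769, [47, 53], [0, 10, 11], [23, 24, 25], [0])
    print(fp)  # 769,47-53,0-10-11,23-24-25,0
    print(ja3_hash(fp))
```

Note that an empty field is kept as an empty string, so the four commas are always present even when a client sends no extensions.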
JA3S uses a similar process: instead of the Client Hello, it uses the Server Hello message to extract the TLS version, the selected cipher and the extensions. It’s worth noting that the Server Hello varies based on the Client Hello; therefore, it does not provide fingerprint uniqueness equivalent to its client counterpart, but it is still useful when used in conjunction with the JA3 client hash.
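As a rough sketch of how the pairing might be used, the Python below computes a JA3S hash from already-parsed Server Hello values and then matches on the (JA3, JA3S) pair rather than on either hash alone. The function names and the `known_pairs` set are hypothetical, and the input values are illustrative.

```python
import hashlib


def ja3s_hash(version, cipher, extensions):
    """JA3S: TLS version, the single selected cipher, then the extensions."""
    ja3s = f"{version},{cipher}," + "-".join(str(e) for e in extensions)
    return hashlib.md5(ja3s.encode("ascii")).hexdigest()


def is_known_pair(ja3, ja3s, known_pairs):
    """Match on the client/server pair, which is more specific than JA3S alone."""
    return (ja3, ja3s) in known_pairs


if __name__ == "__main__":
    # Placeholder client hash and illustrative server values.
    client = "0" * 32
    server = ja3s_hash(771, 49199, [65281, 0, 11])
    known_pairs = {(client, server)}
    print(is_known_pair(client, server, known_pairs))     # True
    print(is_known_pair("f" * 32, server, known_pairs))   # False
```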
JA3 in some ways has properties similar to those of a browser’s User-Agent. As noted in the JA3 team’s blog post, there can be false positives. This can be due to clients behaving similarly enough to have the same hash, or through intentional deception. Attackers and bot developers alike are aware of fingerprinting and may attempt to emulate the TLS negotiation of a “good” client in order to evade detection. However, having to work to deceive controls which use JA3 is still an increased cost for attackers.
Public JA3 Databases: Convenient but Proceed with Caution
Being able to share a de-facto standard fingerprint among teams or across organizations has a lot of utility in detection engineering and threat research. These days, JA3 is well supported by many platforms and services. One common service and database that we see referenced is ja3er.com.
However, these databases are not perfect. In October 2021, Fastly’s SOC team noticed a mismatch between ja3er and an internal implementation developed for calculating JA3 fingerprints. After a thorough investigation, they narrowed it down to the handling of the TLS Application-Layer Protocol Settings (ALPS) extension, which is not yet registered with IANA but is included in Google’s BoringSSL implementation. The ja3er service drops this extension value, so the fingerprint it calculates is invalid.
Inconsistent fingerprint calculation across toolsets undermines the value of sharing fingerprints between organizations. To understand how widespread this issue might be, we set up an experiment to see if there are other parameters that can cause mismatches. We implemented a custom client using uTLS that allowed us to control exactly what parameters we were sending during TLS negotiation. We then directed the client at both ja3er and our own server to compare the results. We were able to verify that the hashes again did not match.
We used Wireshark as an independent check and confirmed that its output matched ours and that each field value was correct. When comparing the full JA3 fingerprint from our system to ja3er’s output, we found that a single elliptic curve, x448, accounted for the discrepancy. In the IANA registry, this curve is represented by the decimal value 30. For reasons unknown to us, ja3er produces a decimal value of 1035 for it.
Output from JA3 diff script
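A field-by-field comparison of this kind can be sketched as follows. This is not the script we used, only a minimal illustration of the idea; the function name `diff_ja3` and the shortened JA3 strings are hypothetical, with only the 30 versus 1035 difference taken from the finding above.

```python
FIELDS = ["version", "ciphers", "extensions", "groups", "point_formats"]


def diff_ja3(ours, theirs):
    """Return {field: (only_in_ours, only_in_theirs)} for each differing field."""
    result = {}
    for name, a, b in zip(FIELDS, ours.split(","), theirs.split(",")):
        if a == b:
            continue
        a_vals = a.split("-") if a else []
        b_vals = b.split("-") if b else []
        result[name] = (
            [v for v in a_vals if v not in b_vals],
            [v for v in b_vals if v not in a_vals],
        )
    return result


if __name__ == "__main__":
    # Shortened, made-up strings that differ only in the x448 group value.
    ours = "771,4865-4866,0-10-11,29-23-30,0"
    theirs = "771,4865-4866,0-10-11,29-23-1035,0"
    print(diff_ja3(ours, theirs))  # {'groups': (['30'], ['1035'])}
```

Comparing the full strings rather than the MD5 hashes is what makes the root cause visible: the hashes only tell you that two implementations disagree, not where.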
Not long after this finding, we looked at the official JA3 repository on GitHub and found that another user had independently discovered mismatches last October. They had already filed an issue and attempted to contact the author through multiple channels, seemingly without a response.
We can confirm that at the time of this writing, this issue is still present. In light of this, we would strongly recommend that defenders do not rely on the results of the ja3er service or database.
Unfortunately, there aren’t great alternatives. Abuse.ch has a database of JA3 fingerprints, but it’s largely focused on malware clients. The original JA3 repository has references to lists, but they have not been updated in a few years and aren’t comprehensive. Without a reliable third-party repository, you may want to consider creating one for your own use case.
Improved Server Fingerprinting: JARM
In November 2020, John Althouse together with Andrew Smart, RJ Nunnally and Mike Brady released another method for server TLS fingerprinting called JARM. JARM is used to scan and identify servers and provides more uniqueness compared to JA3S. Unlike JA3S, which utilizes passive observation, JARM involves active scanning to solicit information from servers. JARM sends 10 specially crafted TLS Client Hello packets and computes a hash over specific attributes of the responses. Since this method involves scanning, it is possible to JARM fingerprint large portions of the Internet proactively. This ability to scan for the fingerprint has driven widespread adoption in the industry, and JARM is now supported in many tools and services.
Example JARM fingerprint
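To make the structure of a JARM hash more concrete, here is a small Python sketch that splits one into its parts. It relies on our reading of the JARM authors’ published description, which is an assumption not covered elsewhere in this post: a 62-character hash in which the first 30 characters hold one 3-character code (chosen cipher and TLS version) per probe, with "000" meaning the server refused that probe, and the last 32 characters are a truncated SHA-256 over the servers’ extensions. The function name `split_jarm` and the sample value are made up.

```python
def split_jarm(jarm):
    """Split a 62-character JARM hash into per-probe codes and extension hash."""
    if len(jarm) != 62:
        raise ValueError("expected a 62-character JARM hash")
    head, ext_hash = jarm[:30], jarm[30:]
    # One 3-character code for each of the 10 Client Hello probes.
    probes = [head[i:i + 3] for i in range(0, 30, 3)]
    return probes, ext_hash


if __name__ == "__main__":
    # Synthetic value for illustration only, not a real server's fingerprint.
    sample = "29d" * 9 + "000" + "ab" * 16
    probes, ext_hash = split_jarm(sample)
    refused = [i for i, code in enumerate(probes) if code == "000"]
    print(len(probes), refused)  # 10 [9]
```

Looking at the per-probe codes separately can help explain why two servers share or almost share a fingerprint, which is useful context before acting on a match.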
As with JA3/JA3S, defenders should not rely on a JARM hash alone. A concrete example of this is that Cobalt Strike’s JARM fingerprint is really Java’s JARM fingerprint. This means that if teams were to block connections based on the fingerprint of a malicious Cobalt Strike installation, they could also end up blocking other, non-malicious Java software. Raphael Mudge did a great write-up on this topic. If you’re looking specifically for Cobalt Strike servers, you need to investigate further to confirm, or at least have high confidence, that you’ve encountered a malicious server.
The Latest: CYU
A couple of years after JA3 was announced, the QUIC protocol was continuing to gain traction. This left a blind spot and created a need to perform similar passive fingerprinting on those connections. Caleb Yu, as part of his internship at Salesforce, proposed a solution called CYU hash. Currently, implementations of CYU are less common than those of JA3, although you will see it in a few widely used tools such as Zeek and Suricata. We expect that CYU or a similar method will gain wider adoption now that HTTP/3 has become an official standard.
The work published by Yu and Althouse includes an example of a single attack tool using QUIC: Merlin C2. However, as many standard web clients (including web browsers and curl) have implemented support for QUIC, we expect the number of attack tools using it to grow over time.
Where do we go from here?
John Althouse and team have been the predominant force in pushing encrypted channel fingerprinting forward for the past several years. They have done excellent work and we commend them for their efforts. Going forward, we see a lot of benefit coming from more community participation to address the challenges inherent in these solutions. In our opinion, this effort does not need to rise to the level of a full standards body, but written specifications and common test suites would be helpful to ensure that different implementations are getting correct results in the face of various inputs.
We encourage security teams to consider fingerprinting as another tool in their arsenal to identify and track malicious clients and servers. Sharing these identified fingerprints between teams can help to improve detections for everyone defending against malice on the internet. Just don’t forget to confirm that everyone in the sharing group has validated their tool’s fingerprint outputs or is at least using common tools to generate the fingerprints. Without confidence in the fingerprint output, sharing may not lead to the desired outcome.