Inside Fastly: a look at our vulnerability remediation process
We are consistently impressed by the engineers who use our platform to build great things, design exceptional services, and even tell us when we’ve made a mistake. We experienced one example of the latter when researcher Emil Lerner sent in a report to our Vulnerability Disclosure Program after he discovered what we thought was an interesting bug.
We thought this would be a good opportunity to give you some insight into our vulnerability remediation process and team. After all, how better to learn what works, and why, than from each other?
In this post, I’ll tell you what he found, how we rolled out a complete fix just seven days after triage, and why we were able to move so quickly. For comparison, the CISA-recommended time to remediation is 30 days.
Before we dig in, it's worth noting that we have no evidence or reason to believe that this vulnerability was exploited to attack customers.
The vulnerability remediation team is part of our security organization, and its purpose is to understand discovered weaknesses and help the internal engineering teams fix them. We work closely with security researchers and our internal engineering teams to monitor and fix vulnerabilities. We have a few areas of focus that we’ve found enable us to work thoroughly and quickly.
First, we’re dedicated to constantly improving our vulnerability triage process. With our triage process, we strive to be able to understand and reproduce reported issues within five business days, which is half the time of the typical “time to triage”.
Then, within our vulnerability triage process, we make sure our engineering team has as much information as possible in order to debug. This means we have to be thoughtful in how we communicate the report — for example, giving the engineering team a method to quickly reproduce the issue, ideally in an automated fashion.
And finally, we focus on hiring the best engineers (and we’re hiring). We have a very talented team with deep expertise in system-level programming, as well as deep expertise in network protocols. As a result, their processes and workflows allow them to review the report and quickly take action — as we saw with this particular vulnerability.
The vulnerability, reported on Nov. 23, 2021, relates to the HTTP/3 server-side implementation in H2O, an optimized HTTP server that supports HTTP/1, HTTP/2, and HTTP/3. We are a main contributor to this open-source project. It was discovered that if the H2O server receives a set of QUIC frames in a certain order, H2O can be tricked into treating uninitialized memory as HTTP/3 frames that have been received.
If H2O is then used as a reverse proxy, an attacker can take advantage of this vulnerability to cause the reverse proxy to send the internal state of H2O to backend servers that are controlled by the attacker or a third party, as detailed in the diagram below. The attacker does not control what exactly will be dumped by the H2O server — it may be other users’ requests, responses to them, or something else present in H2O’s previously freed memory.
This vulnerability, labeled CVE-2021-43848, would only have been applicable to customers who have HTTP/3 enabled on Fastly (and only to services running on HTTP/3). As mentioned above, we have no evidence or reason to believe that this vulnerability was exploited to attack any of those customers.
The remediation process
Our initial triage team received and acknowledged the report, then forwarded it for a deeper technical review to our vulnerability triage team.
After the team reviewed the report and reproduced the issue, our engineers identified the root cause, created the fix, and deployed it to a test environment within a day. We found that when H2O’s HTTP/3 receive buffer is updated via lib/http3/common.c, the memory used for the buffer isn’t cleared before being reused if the current buffer size is less than the new size.
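To make that class of bug concrete, here is a minimal sketch of a growable receive buffer. It is not H2O's code: the `recv_buf` struct and the `grow_uncleared`, `grow_zeroed`, and `tail_is_zero` functions are hypothetical names invented for illustration, and the actual fix in H2O may look quite different. The sketch only shows why exposing a larger buffer without clearing the new region can let a parser read stale memory.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical receive buffer for illustration; not H2O's data structure. */
struct recv_buf {
    unsigned char *bytes;
    size_t size; /* number of bytes a parser is allowed to read */
};

/* Risky pattern: the buffer grows, but the tail [old size, new size) keeps
 * whatever the allocator left there, possibly data from freed objects. */
int grow_uncleared(struct recv_buf *b, size_t new_size)
{
    if (new_size <= b->size)
        return 0; /* already large enough */
    unsigned char *p = realloc(b->bytes, new_size);
    if (p == NULL)
        return -1;
    b->bytes = p;
    b->size = new_size; /* tail contents are indeterminate here */
    return 0;
}

/* Safer pattern: zero the newly exposed tail before publishing the new size. */
int grow_zeroed(struct recv_buf *b, size_t new_size)
{
    if (new_size <= b->size)
        return 0; /* already large enough */
    unsigned char *p = realloc(b->bytes, new_size);
    if (p == NULL)
        return -1;
    memset(p + b->size, 0, new_size - b->size);
    b->bytes = p;
    b->size = new_size;
    return 0;
}

/* Returns 1 if bytes [from, b->size) are all zero, 0 otherwise. */
int tail_is_zero(const struct recv_buf *b, size_t from)
{
    for (size_t i = from; i < b->size; i++)
        if (b->bytes[i] != 0)
            return 0;
    return 1;
}
```

Starting from `struct recv_buf b = { NULL, 0 };`, a call such as `grow_zeroed(&b, 64)` leaves every newly exposed byte at zero, whereas `grow_uncleared(&b, 64)` gives no such guarantee. In a server, those leftover bytes could be fragments of other requests or responses.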
The vulnerability triage team retested and confirmed that the code changes fixed the issue. Once the fix was confirmed, the rollout process was initiated and completed. You can see the full timeline below.
Nov. 23, 2021: Researcher reports issue.
Nov. 24, 2021: Initial triage team receives report.
Nov. 29, 2021: Vulnerability triage team completes research on report and confirms reproduction.
Dec. 1, 2021: Engineering team creates a fix and a test environment, and the vulnerability triage team retests to verify the fix. Vulnerability triage team communicates with the researcher to get confirmation that the proposed code modification fixes the vulnerability.
Dec. 8, 2021: Rollout of fix completes.
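For readers who want to check the intervals in this timeline, the following sketch counts calendar days and weekdays between two dates. The function names are hypothetical, the weekday count ignores public holidays, and treating the Nov. 29 entry as the end of triage is an assumption made for this example, not a statement of how we measure internally.

```c
#include <assert.h>

/* Days since 1970-01-01 for a Gregorian date (standard civil-date arithmetic). */
long days_from_civil(long y, int m, int d)
{
    y -= (m <= 2); /* treat Jan and Feb as months of the previous year */
    long era = (y >= 0 ? y : y - 399) / 400;
    long yoe = y - era * 400;                                  /* year of era */
    long doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1; /* day of year, March-based */
    long doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;          /* day of era */
    return era * 146097 + doe - 719468;
}

/* Weekdays (Monday to Friday) after `start`, up to and including `end`.
 * Both arguments are day numbers from days_from_civil.
 * Public holidays are NOT excluded. */
long weekdays_between(long start, long end)
{
    long count = 0;
    for (long day = start + 1; day <= end; day++) {
        long dow = ((day % 7) + 11) % 7; /* 0 = Sunday; 1970-01-01 was a Thursday */
        if (dow >= 1 && dow <= 5)
            count++;
    }
    return count;
}
```

Using the dates above, `days_from_civil(2021, 12, 8) - days_from_civil(2021, 11, 23)` gives 15 calendar days from report to completed rollout. `weekdays_between` gives 4 weekdays from the report (Nov. 23) to confirmed reproduction (Nov. 29), and 7 weekdays from confirmed reproduction to completed rollout (Dec. 8).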
In the end, we awarded the report a bounty, and as you can see in the timeline above, the entire process took just about two weeks. It's a turnaround we’re truly proud of, and one we credit to the great work being done on this team. We thank Emil Lerner for submitting this report. You can read his full write-up here.