
Incident Report: Feb 18-21 DDoS + Cloudflare Outage
Railway experienced a series of networking issues that impacted customer workload availability from February 18th through February 20th, with residual complications for some users in Asia on February 21st due to remediation work.
The incident was caused by a series of hostile traffic patterns (DDoS attacks) through which attackers were able to impact subsections of the Railway proxy layer, leading to service disruptions for a subset of customers. In isolation, attacks like these usually have minimal impact. However, multiple attack events, compounded with network capacity reduced by fiber cuts, caused a wave of sporadic outages on our platform.
Because of the wide impact across a variety of customers, we are reaching out to compensate them for the business impact they faced throughout the week. It's our goal that companies can build and serve their business on our platform, and we truly regret the impact we have caused.
In this incident report, we'll go into the shape of the attacks we faced, the impact to our services, what the Railway engineering team put in place to mitigate these attacks and prevent future ones, and the process improvements we're making so that we respond better to incidents affecting our customers' businesses.
This report is a rather long one, but in the spirit of transparency, a comprehensive one.
Over the course of February 18th through February 21st UTC, Railway customers experienced intermittent service disruptions across all regions.
Customer-facing impact included:
- Intermittent 4xx and 5xx responses for workloads served through Railway's HTTP proxy layer
- Periods of complete unreachability for services during peak attack windows
- During a WAF rollout on February 20th, approximately 2,700 active endpoints experienced errors due to SSL certificate re-provisioning, primarily affecting customers with non-standard domain configurations (e.g., A records instead of CNAMEs; forward proxies/WAFs above Railway blocking `.well-known` paths)
- HTTP requests exceeding 10 seconds were prematurely terminated during an early Fastly configuration issue
- Customers experienced an additional window of impact on February 20th due to an upstream Cloudflare BGP prefix outage
- On February 21st, some legitimate traffic from Asia was briefly blocked due to Fastly WAF tuning
Duration of impact by event:
| Event | Start (UTC) | Approx. Duration of Customer Impact |
| --- | --- | --- |
| Attack 1 | Feb 18, 3:57 AM | ~48 minutes |
| Attack 2 | Feb 18, 9:44 AM | ~41 minutes |
| Attack 3 | Feb 18, 2:01 PM | ~30 seconds |
| Attack 4 | Feb 18, 5:39 PM | ~1 minute |
| Attack 5 | Feb 18, 9:05 PM | ~7 minutes |
| Attack 6 | Feb 19, 2:21 PM | ~28 minutes |
| Attack 7 | Feb 19, 4:23 PM | ~6 minutes |
| Attack 8 | Feb 19, 6:20 PM | ~15 minutes |
| Cloudflare BGP Outage | Feb 20, 5:01 PM | ~5 minutes (upstream outage lasted 6 hours 49 minutes) |
| Attack 9 | Feb 21, 5:20 AM | ~10 minutes |
| Fastly Tuning (Asia) | Feb 21, 7:39 PM | ~11 minutes |
Railway is reaching out to all affected customers to provide compensation for the business impact experienced during this period.
All times listed are UTC
- February 16th, 2026 at 11:21 A.M. - Railway receives a notification from our network transit vendor that upstream network links are physically degraded. This has no customer impact on its own, but the reduced capacity enables the following events to cause downstream customer impact
Attack 1
- February 18th, 2026 at 3:57 A.M. - Railway’s internal network monitoring reports a traffic anomaly
- February 18th, 2026 at 4:00 A.M. - Railway receives the first customer report of impact
- February 18th, 2026 at 4:12 A.M. - Railway confirms that the traffic anomaly is a DDoS
- February 18th, 2026 at 4:13 A.M. - Railway raises a post-facto incident
- February 18th, 2026 at 4:15 A.M. - Railway engages countermeasures
- February 18th, 2026 at 4:19 A.M. - Customers report partial recovery
- February 18th, 2026 at 4:45 A.M. - Hostile traffic reported to have ended
- February 18th, 2026 at 5:01 A.M. - Railway disables the traffic countermeasures once the attack subsides
Attack 2
- February 18th, 2026 at 9:44 A.M. - Railway’s internal network monitoring reports a traffic anomaly
- February 18th, 2026 at 9:45 A.M. - Railway receives the first customer reports of impact
- February 18th, 2026 at 9:46 A.M. - Monitor pages the Infra on-call for a response
- February 18th, 2026 at 9:50 A.M. - Railway engages countermeasures
- February 18th, 2026 at 9:55 A.M. - Customers report partial recovery
- February 18th, 2026 at 10:25 A.M. - Hostile traffic reported to have ended
- February 18th, 2026 at 11:45 A.M. - Railway support raises a post-facto incident
- February 18th, 2026 at 12:10 P.M. - The Infra on-call provisions additional network infrastructure to help defend against additional attacks
Attack 3
- February 18th, 2026 at 2:01 P.M. - Railway’s network probes alert us of another wave of hostile traffic
- February 18th, 2026 at 2:02 P.M. - Due to the countermeasures already in place, the attack only affects customer traffic for ~30 seconds
Attack 4
- February 18th, 2026 at 5:39 P.M. - Railway’s network probes alert us of another wave of hostile traffic
- February 18th, 2026 at 5:41 P.M. - Customers report impact in the form of 502s
- February 18th, 2026 at 5:43 P.M. - Due to the countermeasures already in place, the attack only affects customer traffic for ~1 minute
Attack 5
- February 18th, 2026 at 9:05 P.M. - Railway’s network probes alert us of another wave of hostile traffic. The attacker shifts from an L4 TCP attack pattern to an L7 HTTP attack directed at Railway’s infrastructure and proxies, which Cloudflare can only act on after it notices an anomalous baseline
- February 18th, 2026 at 9:10 P.M. - Customers report impact in the form of 502s or Cloudflare error pages
- February 18th, 2026 at 9:12 P.M. - The network engineering team sets Railway’s L7 WAF to attack mode and mitigates the attack.
- February 18th, 2026 at 11:45 P.M. - Railway support raises a post-facto incident
Attack 6
- February 19th, 2026 at 2:21 P.M. - Railway’s network probes alert us of another wave of hostile traffic. The attacker shifts back to a L4 TCP attack pattern requiring us to fall back to the previous countermeasures to scrub traffic.
- February 19th, 2026 at 2:30 P.M. - A subset of customers report impact in the form of 502s and timeouts
- February 19th, 2026 at 2:34 P.M. - The Infra on-call pages the network team to engage the vendor
- February 19th, 2026 at 2:39 P.M. - Upstream vendor countermeasures fail; the vendor acknowledges an outage on the product used for countermeasures
- February 19th, 2026 at 2:45 P.M. - Railway support raises an incident
- February 19th, 2026 at 2:49 P.M. - Attack pattern ends
Attack 7
- February 19th, 2026 at 3:10 P.M. - After the failure of the upstream vendor’s countermeasures, the Railway network engineering team engages Fastly to roll out a WAF for all Railway customers. Work then begins to integrate it
- February 19th, 2026 at 4:23 P.M. - Railway’s network probes alert us of another wave of hostile traffic.
- February 19th, 2026 at 4:29 P.M. - Railway swaps and redeploys the proxies with a different IP set, mitigating the attack.
Attack 8
- February 19th, 2026 at 6:20 P.M. - Railway’s network probes alert us of another wave of hostile traffic.
- February 19th, 2026 at 6:35 P.M. - Railway swaps and redeploys the proxies with a different IP set, mitigating the attack.
- February 19th, 2026 at 6:50 P.M. - Initial commercial terms are established, unblocking implementation of the Fastly CDN for Railway
- February 19th, 2026 at 7:05 P.M. - Railway’s network engineering team begins provisioning a separate set of proxies for Business Class and Enterprise customers
- February 19th, 2026 at 11:05 P.M. - Railway’s customer teams begin migrating Business plan and above customers to a separate shard of proxies
Rollout of WAF
- February 20th, 2026 at 4:01 A.M. - Railway begins provisioning customer TLS certs on Fastly and begins migration of user domains to Fastly, the first step of enabling the WAF for all customers.
- February 20th, 2026 at 5:01 A.M. - 2,700 active endpoints report errors to internal telemetry. Customers with non-standard domain setups, such as A records, are triaged to the new system, as are customers who self-report issues
- February 20th, 2026 at 6:01 A.M. - The network engineering team successfully mitigates non-active certs and re-provisions incorrect certs.
- February 20th, 2026 at 6:32 A.M. - Customers report that HTTP connections lasting more than 10s are terminated.
- February 20th, 2026 at 6:33 A.M. - Railway engineering engages Fastly to triage the issue. The fix is rolled out from Fastly.
- February 20th, 2026 at 7:10 A.M. - Customers who use Cloudflare report a cert collision for some custom domains on the platform. Railway engineering engages Fastly to triage the issue, which is fixed with the implementation of fallback certs
- February 20th, 2026 at 9:00 A.M. - Fastly is rolled out to the complete set of customers on Railway. Fastly begins collecting baseline traffic information to tune automatic DDoS response.
Upstream Cloudflare BGP IP Prefix Outage - https://www.cloudflarestatus.com/incidents/kwy3dt82bwbt
- February 20th, 2026 at 6:20 P.M. - Railway’s internal monitoring reports that Railway-hosted services are unreachable
- February 20th, 2026 at 6:20 P.M. - Railway automated alerting pages the Infra on-call
- February 20th, 2026 at 6:22 P.M. - Early investigation led us to believe it was an unknown DNS issue
- February 20th, 2026 at 6:24 P.M. - Railway support raises an incident
- February 20th, 2026 at 6:25 P.M. - Railway concludes that our BGP prefixes used for authoritative DNS were removed from Cloudflare. We escalate to Cloudflare’s team.
- February 20th, 2026 at 6:26 P.M. - An Infra on-call engineer discovers that our announcements were labeled as Withdrawn on the Cloudflare Magic Transit dashboard, and immediately triggers a re-announcement from the dashboard
- February 20th, 2026 at 6:28 P.M. - Customers observe recovery
- February 20th, 2026 at 6:45 P.M. - Cloudflare formally acknowledges impact.
- February 20th, 2026 at 11:50 P.M. - Cloudflare formally acknowledges resolution
Attack 9
- February 21st, 2026 at 5:20 AM - An attacker decides to test our DDoS protection. (We would have preferred to not announce it, but people need to know where their traffic is routed through.)
- February 21st, 2026 at 5:30 AM - Because Fastly was still establishing a baseline, it took 10 minutes for automated mitigations to take effect. Customers report recovery at this time, and support adds a post-facto incident
Fastly Tuning (Asia)
- February 21st, 2026 at 7:39 PM - Our Fastly config starts accidentally blocking some legitimate traffic.
- February 21st, 2026 at 7:50 PM - We rolled out a fix restoring full traffic availability to end users in Asia.
- February 21st, 2026 at 8:10 PM - Subsequent attacks are thwarted, and the Railway network engineering team proceeds to phase 2 of network reinforcement
Railway maintains relationships with global internet bandwidth providers to deliver fast networking for our customers, alongside peering relationships with public clouds. These peering relationships at the bandwidth level let us deliver a high level of service worldwide.
Since October 2025, we've observed the macro environment decrease in stability affecting these connections, especially with the escalating undersea cable cuts. On February 16th at 11:21 A.M. UTC, we observed a fiber cut between EU and US-East affecting an upstream vendor, resulting in decreased bandwidth capacity on Railway's end. This event did not impact Railway's service quality at the time.
For customers, it's important to know that Railway faces a variety of traffic patterns that range from benign to hostile. The goal of the network operations team at Railway is to ensure a high service quality for all customers, regardless of what traffic patterns Railway faces throughout the day. Railway historically has handled hostile traffic by investing in peering relationships and network capacity first and foremost. If you have a very large pipe, it's very difficult for the pressure of traffic to "burst" that pipe.
However, with the reduced upstream bandwidth from the fiber cut, that pipe got smaller. Ambient hostile attack patterns that would normally be absorbed were now able to overwhelm the network capacity we could offer. We were able to gracefully handle attack patterns from February 16th to February 17th using our primary mitigation from Cloudflare, an upstream vendor. But on February 18th, the attacks escalated.
At 3:57 A.M. UTC, our internal network monitoring reported the first traffic anomaly. Within minutes, customers reported impact, and we confirmed it was a DDoS attack. We engaged our countermeasures and customers saw partial recovery by 4:19 A.M.
The attacker came back. A second wave hit at 9:44 A.M., and our Infra on-call provisioned additional network infrastructure after it subsided. The countermeasures we had in place were working. When attacks 3 and 4 hit later that afternoon, customer impact was limited to roughly 30 seconds and 1 minute respectively.
Then the attacker adapted. At 9:05 P.M., they shifted from an L4 TCP flood to an L7 HTTP attack directed at Railway's infrastructure and proxies. This bypassed the L4 countermeasures entirely. Our network engineering team responded by setting Railway's L7 WAF to attack mode, mitigating the attack within minutes.
The important thing to understand here is how shared proxy infrastructure works: each proxy is responsible for making workloads available within a given set of hosts. When hostile traffic overwhelms a proxy, every workload on that proxy is affected, even if a customer has their own WAF vendor (Fastly, Akamai, Cloudflare, etc.) in front of their workload. If an attack targets a domain without such a vendor in front of it, it saturates the proxy that all co-located workloads happen to share.
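To make the blast-radius dynamic concrete, here is a minimal sketch of how every workload sharing a proxy with an attacked domain is affected. The hash-based proxy assignment and all names here are assumptions for illustration, not Railway's actual implementation:

```python
import zlib

def assign_proxy(domain: str, num_proxies: int) -> int:
    """Stable mapping from a domain to a shared proxy shard.
    (Assumed scheme for this sketch: CRC32 hash modulo shard count.)"""
    return zlib.crc32(domain.encode()) % num_proxies

def blast_radius(domains: list[str], attacked: str, num_proxies: int) -> list[str]:
    """Every domain that lands on the same proxy as the attacked domain is
    affected, even if it has its own WAF vendor in front of it."""
    target = assign_proxy(attacked, num_proxies)
    return [d for d in domains if assign_proxy(d, num_proxies) == target]

# With a small number of shared proxies, many unrelated workloads
# end up in the attacked shard.
fleet = [f"app-{i}.example.com" for i in range(1000)]
affected = blast_radius(fleet, "app-0.example.com", 4)
```

A 1:1 proxy-per-host model shrinks `len(affected)` toward a single workload, which is the intent of the blast-radius reduction work described later in this report.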
Due to the degraded upstream bandwidth, we observed attack volumes reaching up to 1 Tbps at a time, directed at a myriad of workload targets across the platform.
The attacks continued into February 19th. At 2:21 P.M., another L4 TCP wave hit. We engaged our upstream vendor's countermeasures to scrub traffic, and they failed. The vendor acknowledged an outage on the product we relied on for mitigation. This was the turning point.
With our primary DDoS mitigation vendor down during an active attack, our Network Engineering team made the call to engage Fastly and roll out a WAF for the entire set of hosts on Railway. Work began immediately to integrate it.
In the meantime, the attacks kept coming. For attacks 7 and 8, our team swapped and redeployed the proxies with different IP sets to buy time, a manual mitigation that worked but wasn't sustainable. By 6:50 P.M., initial commercial terms with Fastly were established, unblocking implementation of a global WAF. At 7:05 P.M., our network engineering team also began provisioning a separate set of proxies for Business Class and Enterprise customers, isolating them from the shared infrastructure under attack. By 11:05 P.M., our customer teams began migrating Business plan and above customers to this separate shard.
At 4:01 A.M. on February 20th, Railway began uploading our customer TLS certificates to Fastly and migrating user domains. This was the first step of enabling the WAF for all customers. This was an emergency rollout, and it came with complications.
Approximately 2,700 active endpoints reported errors due to SSL certificate re-provisioning, primarily customers with non-standard domain setups such as WAFs on top of Railway blocking our ACME challenges. Our network engineering team triaged these throughout the morning. At 6:32 A.M., customers reported that HTTP requests longer than 10 seconds were being terminated. This was a Fastly configuration issue that was quickly resolved. At 7:10 A.M., a cert collision affected customers using Cloudflare for custom domains, which was fixed with the implementation of fallback certs.
By 9:00 A.M., Fastly was rolled out to the complete set of customers on Railway and began collecting baseline traffic information to tune its automatic DDoS response.
Just when we thought we were through the worst of it, a completely separate issue hit. At 6:20 P.M., our internal monitoring reported that Railway-hosted services were unreachable. After initial investigation pointed to what we thought was a DNS issue, we determined that our BGP IP prefixes had been removed from Cloudflare in an unrelated Cloudflare incident. A few minutes after the first page, we discovered that our prefix was labeled as Withdrawn on the Cloudflare dashboard and immediately triggered a re-announcement via their dashboard, which fixed the issue within a minute. We were also talking directly with Cloudflare executives during the incident to ensure a status page entry was created so we could inform our customers.
The Railway authoritative DNS servers (those which resolve domains on our platform) are announced to the internet as a /24 IP block via Cloudflare Magic Transit to protect them from DDoS attacks. We also announce a less-specific /23 IP block covering the same addresses from our own infrastructure, which should protect us from Cloudflare outages like this one. However, due to the nature of this outage, our /24 was still being announced to the internet by some erroneous routers on Cloudflare's network, even though the traffic was blackholed within their network. Because routers prefer the most specific prefix, the still-announced /24 rendered our own /23 path redundant.
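The routing behavior behind this is longest-prefix match: routers pick the most specific announcement regardless of whether its origin is actually delivering traffic. A simplified sketch (illustrative documentation prefixes and labels; real BGP path selection also weighs other attributes):

```python
import ipaddress

def best_route(dest_ip: str, announcements: list[tuple[str, str]]) -> str:
    """Return the origin of the longest matching prefix for dest_ip.
    A toy model of longest-prefix-match forwarding."""
    ip = ipaddress.ip_address(dest_ip)
    matches = [(ipaddress.ip_network(prefix), origin)
               for prefix, origin in announcements
               if ip in ipaddress.ip_network(prefix)]
    # The most specific prefix (largest prefixlen) wins, even if that
    # path blackholes traffic.
    return max(matches, key=lambda m: m[0].prefixlen)[1]

routes = [
    ("198.51.100.0/24", "cloudflare-blackholed"),  # still announced, traffic dropped
    ("198.51.100.0/23", "railway-direct"),         # healthy fallback path
]
print(best_route("198.51.100.7", routes))  # → cloudflare-blackholed
```

Only once the erroneous /24 announcement is fully withdrawn does the covering /23 fallback path take effect, which is why the re-announcement from the Magic Transit dashboard was the fastest fix.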
At 5:20 A.M. on February 21st, an attacker tested our new DDoS protection. Because Fastly was still establishing its traffic baseline, it took about 10 minutes for automated mitigations to kick in. Customers reported recovery at 5:30 A.M.
Later that evening at 7:39 P.M., our Fastly configuration started accidentally blocking some legitimate traffic to end users in Asia. We rolled out a fix by 7:50 P.M., restoring full traffic availability. By 8:10 P.M., subsequent attacks were thwarted automatically, and the Railway network engineering team proceeded to phase 2 of network reinforcement.
It is usually Railway's policy to write up an incident report for every outage that affects workload availability. The continuous nature of the attacks made it difficult for the response teams to perform a full survey of the series of incidents.
We have teams such as platform, product, logistics, etc., and people on those teams take turns carrying a pager (in our case, PagerDuty). A myriad of monitoring systems are set up to page the applicable team. When a page fires, we assess the issue; if there is user impact, logistics is paged, and logistics calls a public incident for the affected area, then runs communications and updates the status page as needed.
The persistent and evolving nature of these attacks (nine distinct waves across four days, shifting between L4 TCP floods and L7 HTTP attacks) meant that our incident response team was in a near-continuous state of triage. Each new attack required re-evaluation of the threat vector and adjustment of countermeasures, limiting the team's ability to perform root-cause analysis in parallel with active mitigation.
During Attack 6 on February 19th, our primary upstream DDoS mitigation vendor experienced a product outage at the exact moment we needed countermeasures engaged. This forced our network engineering team to pivot to an entirely new mitigation strategy, engaging Fastly to roll out a global WAF, under active attack conditions. This was not a planned migration: it required establishing commercial terms, provisioning infrastructure, and integrating a new vendor into our networking stack within hours.
On February 20th, just hours after completing the Fastly WAF rollout, Cloudflare experienced an unrelated incident in which our BGP IP prefixes were removed from Cloudflare. This created a compounding failure: customers who had weathered the DDoS attacks and WAF migration were then hit with an entirely separate upstream networking outage.
Rolling out a global WAF under emergency conditions introduced its own set of challenges. Approximately 2,700 endpoints encountered SSL certificate errors during the migration, primarily those with non-standard domain configurations. Additionally, an initial Fastly configuration terminated HTTP requests exceeding 10 seconds, and a certificate collision affected customers using Cloudflare for their custom domains. Each of these issues required real-time coordination between Railway's engineering team and Fastly's support team to resolve.
- Global WAF deployment: All customer workloads on Railway are now protected behind Fastly's WAF with automated DDoS detection and mitigation. This closes the primary attack vector that allowed hostile traffic to impact shared proxy infrastructure.
- Business Class and Enterprise proxy isolation: Business plan and above customers have been migrated to a dedicated shard of proxy infrastructure, ensuring that attack traffic targeting other workloads on the platform cannot impact their services.
- L7 WAF attack mode: Railway's L7 WAF rules have been updated to proactively defend against HTTP-layer attack patterns, in addition to the L3/L4 protections already in place.
- Fastly baseline tuning: Fastly's automated DDoS response system has completed its initial traffic baseline collection and is actively tuning detection thresholds.
- Network capacity restoration: We have worked with our bandwidth partners to restore full upstream bandwidth capacity following the February 16th fiber cut, and diversified our transit relationships to reduce single-vendor dependency for critical bandwidth. This workstream is now complete.
- Improved SSL provisioning pipeline: We are hardening our certificate provisioning and migration tooling to gracefully handle non-standard domain configurations (A records, Cloudflare-proxied domains, wildcard certs) during infrastructure changes, reducing the error surface for future migrations.
- Enhanced monitoring and alerting: We are expanding our network monitoring to provide earlier detection of bandwidth degradation events and automated correlation between upstream capacity changes and DDoS risk posture.
- Phase 2 network reinforcement: Our network engineering team has begun work on the next phase of network hardening, which includes additional redundancy at the proxy layer, improved traffic isolation between customer workloads, and expanded peering relationships to increase overall bandwidth headroom. We plan to move to a 1:1 model where each proxy is tied to a single host, minimizing the blast radius of such attacks should they overwhelm a WAF.
- Incident response process improvements: We are revising our incident response playbooks to better handle sustained, multi-day incidents. This includes clearer escalation paths for vendor failures, pre-negotiated emergency contracts with backup vendors, and dedicated incident commanders for extended events.
- Customer communication improvements: We are investing in more proactive and granular customer communication during incidents, including per-region status updates, automated impact notifications for affected workloads, and clearer guidance on customer-side mitigations (such as temporarily removing upstream proxies during BGP events). We encountered multiple issues with our status page vendor reporting copy such as “All systems operational” at the top of the page when we were in an outage state, confusing our customers. We are seeking a different vendor for our status updates.
- Domain configuration guidance: We will be working to support additional domain configuration options so that we can minimize the risk of certificate and routing issues during infrastructure changes and show customers within the product when it’s not properly configured.
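As a rough illustration of the kind of pre-flight check this guidance implies, a customer could verify that their domain resolves and that ACME-style `/.well-known/` paths are not being blocked by an upstream proxy or WAF (HTTP-01 certificate validation depends on such paths). This is a hedged sketch, not official Railway tooling, and the probe path is an assumption:

```python
import socket
import urllib.error
import urllib.request

def check_domain(domain: str) -> dict:
    """Report whether a domain resolves, and what HTTP status a request
    to an ACME-style /.well-known/ path returns. A 404 means the request
    reached an origin (fine for validation); a 403 or 503 suggests an
    upstream proxy or WAF is blocking certificate validation."""
    report = {"resolves": False, "well_known_status": None}
    try:
        socket.getaddrinfo(domain, 443)
        report["resolves"] = True
    except socket.gaierror:
        return report  # no DNS answer; nothing further to probe
    try:
        urllib.request.urlopen(
            f"http://{domain}/.well-known/acme-challenge/probe", timeout=5)
        report["well_known_status"] = 200
    except urllib.error.HTTPError as exc:
        report["well_known_status"] = exc.code
    except Exception:
        pass  # connection/timeout errors: leave status as None
    return report
```

Surfacing a check like this inside the product, as described above, would let customers catch misconfigurations before an infrastructure change rather than during one.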
We care deeply about our customers, your workloads matter to us, and we understand our downtime is also downtime for your customers.
The work resulting from this series of issues, although performed in non-ideal conditions, closed the gaps that turned individual failures into cascading outages on the network. The meta-pattern across every incident was the same: one thing fails, and because our systems were too tightly coupled and our blast radius too large, it cascaded into something much bigger. Over the last month, we have made significant progress scaling the platform to the point where we can efficiently serve 10,000+ new customers a day.
However, it is not lost on us that what matters most is the people with existing businesses on Railway and the uptime you deserve. For that, we apologize.
You chose to run your businesses on Railway. A lot of you have been here since the early days. That means something to us, and we don't take it for granted especially when we let you down.
Trust gets rebuilt through actions over time, not blog posts, so we appreciate you allowing us to try to win and keep your business.
