Diana Moore & Brody Over

Incident Report: February 11, 2026

Railway experienced an outage caused by a misconfigured automated abuse enforcement system that incorrectly terminated legitimate user deployments.

After a new abuse pattern was detected, we rolled out a new anti-fraud ruleset that unfortunately produced false positives on legitimate workloads, putting those workloads into a “forced pause” state where they were unable to start.

In total, <3% of our fleet was impacted during this staged rollout.

This resulted in unexpected SIGTERM signals being sent to active workloads. This included databases such as Postgres and MySQL, causing service disruptions across the platform. After we received reports, we began working on restoring access for all affected customers.

For a subset of customers, this manifested as end users being unable to access their workloads, or as their services being unable to reach an affected workload (such as a database), which presents as an outage. To make the implicit explicit: we treat workload access outages as critical internally, and the platform team began triaging the issue immediately when we received reports.

When a Major Outage occurs, it is Railway's policy to share the public details of what happened.

This incident caused disruptions for around 3% of services across the platform. Affected users experienced:

  • Roughly 3% of workloads inaccessible to end customers
  • Applications and databases taken offline due to erroneous SIGTERM signals
  • Deployment statuses on the dashboard inaccurately reflecting the state of services. Previously terminated workloads continued to appear as active
  • Networking errors due to services trying to contact services that were no longer online
  • Slower deployment times due to back pressure when users were redeploying services to restore access

On February 11, 2026:

  • 9:12 UTC - We identified additional abuse activity that bypassed our existing automated systems
  • 9:22 UTC - After analysis, we determined that there were only elevated internal metrics and no customer impact. On-call began implementing additional automated checks
  • 12:35 UTC - We performed a dry run of an update to our automated abuse detection system
  • 14:33 UTC - We moved from dry run to live run on our automated abuse detection system
  • 14:47 UTC - On-call engineers noticed abnormal shutdown of deployments causing outages in customer environments
  • 14:56 UTC - We reverted the changes causing abnormal shutdown of deployments
  • 15:07 UTC - Incident declared on our Status Page
  • 15:15 UTC - Revert begins on affected hosts, partial recovery begins
  • 16:31 UTC - Full reversion confirmed; restart ability restored for all deployments
  • 17:29 UTC - Automated recovery for all impacted deployments started
  • 18:31 UTC - Automated recovery for all impacted deployments completed

The full incident is available on our Status Page here.

Railway maintains a threat model system for analyzing and preventing abuse. At 9:12 UTC, we identified a new abuse pattern. We rolled out a mitigation heuristic, dry-ran it, and then initiated a staged, fleet-wide rollout.

After the rollout was complete, engineers noticed the enforcement logic was overly broad in its targeting criteria. Rather than isolating only the intended workloads, the system incorrectly matched certain legitimate user processes, including some databases and application services. As a result, the enforcement system sent SIGTERM signals to legitimate user workloads.
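
To make the failure mode concrete, here is a hypothetical sketch, illustrative only and not Railway's actual enforcement rules: a matcher keyed to a loose substring of a process's command line will also catch legitimate services whose command lines happen to contain that substring, while a rule keyed to an exact binary name plus a corroborating signal is far less likely to.

```go
// Hypothetical sketch of an over-broad fingerprint rule versus a narrower one.
// None of these names, rules, or signals reflect Railway's actual enforcement system.
package main

import (
	"fmt"
	"path"
	"strings"
)

// Workload is a simplified stand-in for a running deployment.
type Workload struct {
	Name    string
	Command string // process command line
}

// tooBroad flags any workload whose command line merely contains a suspicious
// substring -- the kind of rule that also matches legitimate services whose
// command lines happen to share that substring.
func tooBroad(w Workload) bool {
	return strings.Contains(w.Command, "miner")
}

// narrower requires an exact binary name plus a corroborating signal,
// which greatly reduces false positives on legitimate processes.
func narrower(w Workload, highCPU bool) bool {
	bin := path.Base(strings.Fields(w.Command)[0])
	return bin == "miner" && highCPU
}

func main() {
	fleet := []Workload{
		{Name: "postgres", Command: "postgres -D /var/lib/postgresql/data"},
		{Name: "api", Command: "node determiner-service/index.js"}, // "determiner" contains "miner"
		{Name: "abuse", Command: "/usr/bin/miner --pool stratum+tcp://pool.example:3333"},
	}
	for _, w := range fleet {
		// Pretend only the abusive workload shows sustained high CPU.
		highCPU := w.Name == "abuse"
		fmt.Printf("%-8s broad=%v narrow=%v\n", w.Name, tooBroad(w), narrower(w, highCPU))
	}
}
```

In this sketch, the broad rule flags the legitimate application service alongside the abusive workload, while the narrower rule flags only the abusive one.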

All impacted user workloads were automatically restarted. A sudden surge in active deployments caused slower than normal deployment times across the platform. Manually triggered redeploys were delayed due to this spike.

The enforcement system was tested extensively in staging prior to rollout and showed no adverse impact. In a production dry run, it identified abuse correctly and accurately. Only when it was switched to live enforcement, via staged rollout in production, were false positives observed.
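
For context on why those two modes can diverge: a dry run typically executes the matching logic and records what it would have done, without exercising the termination path itself. The sketch below is a generic illustration of that pattern under assumed names (the `enforce` function and its log messages are hypothetical), not Railway's implementation.

```go
// Minimal sketch of a dry-run gate in front of an enforcement action.
// The function, signatures, and log messages are illustrative, not Railway's API.
package main

import (
	"log"
	"syscall"
)

// enforce either records what it would do (dry run) or actually signals the
// workload (live run). Because a dry run never exercises the termination
// path, live rollouts can still surface failure modes the dry run did not.
func enforce(pid int, workload string, dryRun bool) error {
	if dryRun {
		log.Printf("dry-run: would send SIGTERM to %s (pid %d)", workload, pid)
		return nil
	}
	log.Printf("live: sending SIGTERM to %s (pid %d)", workload, pid)
	return syscall.Kill(pid, syscall.SIGTERM)
}

func main() {
	// Flip dryRun to false only after staged, tiered verification.
	if err := enforce(12345, "suspicious-workload", true); err != nil {
		log.Fatal(err)
	}
}
```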

Going forward, Railway is tightening the review and rollout process for any changes to our fraud model that leverage workload fingerprinting.

From here on, Railway is implementing the following changes:

  • Additional false positive testing against a broader range of common workloads before production rollout
  • Extending the testing window for enforcement changes to allow for longer observation periods
  • Staged rollout, by tier, to mitigate impact across customer cohorts
  • Adding safeguards to prevent enforcement systems from targeting known legitimate process types (see the sketch after this list)
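
As a sketch of what that last safeguard could look like (hypothetical names and allowlist contents, not Railway's code), an allowlist of known legitimate process types can gate automated enforcement so that matches against them are escalated for human review instead of acted on automatically:

```go
// Hypothetical safeguard: automated enforcement refuses to target process
// types known to be legitimate; those matches go to human review instead.
// Names and the allowlist contents are illustrative, not Railway's code.
package main

import "fmt"

var protectedBinaries = map[string]bool{
	"postgres":     true,
	"mysqld":       true,
	"redis-server": true,
}

// allowAutomatedEnforcement returns false for protected process types so a
// misconfigured rule cannot terminate them without a human in the loop.
func allowAutomatedEnforcement(binary string) bool {
	return !protectedBinaries[binary]
}

func main() {
	for _, bin := range []string{"postgres", "miner"} {
		fmt.Printf("%-10s automated enforcement allowed: %v\n", bin, allowAutomatedEnforcement(bin))
	}
}
```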

We apologize for this outage and are actively working to prevent similar issues from happening again. We understand that reliability is central to your workflow, and this disruption is unacceptable.