Diana Moore & Brody Over

Incident Report: February 11, 2026

Railway experienced an outage caused by a misconfigured automated abuse enforcement system that incorrectly terminated legitimate user deployments.

After a new abuse pattern was detected, we rolled out a new anti-fraud ruleset that unfortunately produced false positives on legitimate workloads, putting those workloads into a “forced pause” state where they were unable to start.

In total, <3% of our fleet was impacted during this staged rollout.

This resulted in unexpected SIGTERM signals being sent to active workloads. This included databases such as Postgres and MySQL, causing service disruptions across the platform. After we received reports, we began working on restoring access for all affected customers.

For a subset of customers, this manifested as end users being unable to access their workloads, or as their services being unable to reach an affected workload (such as a database), which presents as an outage. To make the implicit explicit: we treat workload access outages as critical internally, and the platform team began triaging the issue immediately when we received reports.

When a Major Outage occurs, it is Railway's policy to share the public details of what happened.

This incident caused disruptions for around 3% of services across the platform. Affected users experienced:

  • Roughly 3% of workloads inaccessible to end customers
  • Applications and databases taken offline due to erroneous SIGTERM signals
  • Deployment statuses on the dashboard inaccurately reflecting the state of services. Previously terminated workloads continued to appear as active
  • Networking errors due to services trying to contact services that were no longer online
  • Slower deployment times due to back pressure when users were redeploying services to restore access

On February 11, 2026:

  • 9:12 UTC - We identified additional abuse activity that bypassed our existing automated systems
  • 9:22 UTC - After analysis, we determined that there were only elevated internal metrics and no customer impact. On-call began implementing additional automated checks
  • 12:35 UTC - We performed a dry run of an update to our automated abuse detection system
  • 14:33 UTC - We moved from dry run to live run on our automated abuse detection system
  • 14:47 UTC - On-call engineers noticed abnormal shutdown of deployments causing outages in customer environments
  • 14:56 UTC - We reverted the changes causing abnormal shutdown of deployments
  • 15:07 UTC - Incident declared on our Status Page
  • 15:15 UTC - Revert begins on affected hosts, partial recovery begins
  • 16:31 UTC - Full reversion confirmed; restart ability restored for all deployments
  • 17:29 UTC - Automated recovery for all impacted deployments started
  • 18:31 UTC - Automated recovery for all impacted deployments completed

The full incident is available on our Status Page here.

Railway maintains a threat model system for analyzing and preventing abuse. At 9:12 UTC, we identified a new abuse pattern. We rolled out a mitigation heuristic, dry-ran it, and then initiated a staged, fleet-wide rollout.

After the rollout was complete, engineers noticed the enforcement logic was overly broad in its targeting criteria. Rather than isolating only the intended workloads, the system incorrectly matched certain legitimate user processes, including some databases and application services. As a result, the enforcement system sent SIGTERM signals to legitimate user workloads.
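
To make the failure mode concrete, here is a hypothetical sketch, illustrative only and not Railway's actual enforcement rules: a matcher keyed to a loose substring of a process's command line will also catch legitimate services whose command lines happen to contain that substring, while a rule keyed to an exact binary name plus a corroborating signal is far less likely to.

```go
// Hypothetical sketch of an over-broad fingerprint rule versus a narrower one.
// None of these names, rules, or signals reflect Railway's actual enforcement system.
package main

import (
	"fmt"
	"path"
	"strings"
)

// Workload is a simplified stand-in for a running deployment.
type Workload struct {
	Name    string
	Command string // process command line
}

// tooBroad flags any workload whose command line merely contains a suspicious
// substring -- the kind of rule that also matches legitimate services whose
// command lines happen to share that substring.
func tooBroad(w Workload) bool {
	return strings.Contains(w.Command, "miner")
}

// narrower requires an exact binary name plus a corroborating signal,
// which greatly reduces false positives on legitimate processes.
func narrower(w Workload, highCPU bool) bool {
	bin := path.Base(strings.Fields(w.Command)[0])
	return bin == "miner" && highCPU
}

func main() {
	fleet := []Workload{
		{Name: "postgres", Command: "postgres -D /var/lib/postgresql/data"},
		{Name: "api", Command: "node determiner-service/index.js"}, // "determiner" contains "miner"
		{Name: "abuse", Command: "/usr/bin/miner --pool stratum+tcp://pool.example:3333"},
	}
	for _, w := range fleet {
		// Pretend only the abusive workload shows sustained high CPU.
		highCPU := w.Name == "abuse"
		fmt.Printf("%-8s broad=%v narrow=%v\n", w.Name, tooBroad(w), narrower(w, highCPU))
	}
}
```

In this sketch, the broad rule flags the legitimate application service alongside the abusive workload, while the narrower rule flags only the abusive one.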

All impacted user workloads were automatically restarted. A sudden surge in active deployments caused slower than normal deployment times across the platform. Manually triggered redeploys were delayed due to this spike.

The enforcement system was tested extensively in staging prior to rollout and showed no adverse impact. In a production dry run, it identified abuse correctly and accurately. Only when it was switched to live enforcement, via staged rollout in production, were false positives observed.
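
For context on why those two modes can diverge: a dry run typically executes the matching logic and records what it would have done, without exercising the termination path itself. The sketch below is a generic illustration of that pattern under assumed names (the `enforce` function and its log messages are hypothetical), not Railway's implementation.

```go
// Minimal sketch of a dry-run gate in front of an enforcement action.
// The function, signatures, and log messages are illustrative, not Railway's API.
package main

import (
	"log"
	"syscall"
)

// enforce either records what it would do (dry run) or actually signals the
// workload (live run). Because a dry run never exercises the termination
// path, live rollouts can still surface failure modes the dry run did not.
func enforce(pid int, workload string, dryRun bool) error {
	if dryRun {
		log.Printf("dry-run: would send SIGTERM to %s (pid %d)", workload, pid)
		return nil
	}
	log.Printf("live: sending SIGTERM to %s (pid %d)", workload, pid)
	return syscall.Kill(pid, syscall.SIGTERM)
}

func main() {
	// Flip dryRun to false only after staged, tiered verification.
	if err := enforce(12345, "suspicious-workload", true); err != nil {
		log.Fatal(err)
	}
}
```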

Going forward, Railway is tightening the review and rollout process for any changes to our fraud model that leverage workload fingerprinting.

From here on, Railway is implementing the following changes:

  • Additional false positive testing against a broader range of common workloads before production rollout
  • Extending the testing window for enforcement changes to allow for longer observation periods
  • Staged rollout, by tier, to mitigate impact across customer cohorts
  • Adding safeguards to prevent enforcement systems from targeting known legitimate process types (see the sketch after this list)
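
As a sketch of what that last safeguard could look like (hypothetical names and allowlist contents, not Railway's code), an allowlist of known legitimate process types can gate automated enforcement so that matches against them are escalated for human review instead of acted on automatically:

```go
// Hypothetical safeguard: automated enforcement refuses to target process
// types known to be legitimate; those matches go to human review instead.
// Names and the allowlist contents are illustrative, not Railway's code.
package main

import "fmt"

var protectedBinaries = map[string]bool{
	"postgres":     true,
	"mysqld":       true,
	"redis-server": true,
}

// allowAutomatedEnforcement returns false for protected process types so a
// misconfigured rule cannot terminate them without a human in the loop.
func allowAutomatedEnforcement(binary string) bool {
	return !protectedBinaries[binary]
}

func main() {
	for _, bin := range []string{"postgres", "miner"} {
		fmt.Printf("%-10s automated enforcement allowed: %v\n", bin, allowAutomatedEnforcement(bin))
	}
}
```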

We apologize for this outage and are actively working to prevent similar issues from happening again. We understand that reliability is central to your workflow, and this disruption is unacceptable.