Incident Report: November 25th, 2025
We recently experienced an outage that impacted deployments and parts of the dashboard.
When a Major Outage occurs, it is Railway’s policy to share the public details of what happened.
This incident affected our task queue system. All deployments across Free, Trial, and Hobby were temporarily paused. Pro deployments continued, but with delays. Service-level actions, such as configuration changes, environment creation, and deployment removals, were also impacted.
All running deployments and platform-level features remained online throughout this period. Users who didn't interact with the dashboard or trigger a new deploy during the incident window experienced no disruption.
On November 25th, 2025:
- 22:47 UTC - Engineers were paged as deploy throughput dropped sharply
- 22:50 UTC - We observed elevated error rates on multiple systems dependent on the task queue
- 23:04 UTC - Free and Trial deployments temporarily disabled
- 23:10 UTC - Hobby deployments paused to alleviate back pressure and prevent additional load
- 23:16 UTC - Issue identified and mitigation put in place
- 00:06 UTC - Several fixes were pushed to reallocate resources in an attempt to stabilize the system
- 00:21 UTC - Previously queued deployments began to be picked up and processed
- 00:48 UTC - Queue processing resumed and caught up. All delayed deployments finished successfully
- 00:50 UTC - Hobby deployments re-enabled
- 01:08 UTC - Free and Trial deployments re-enabled
- 01:22 UTC - Incident resolved
For further reference, see the incident’s live updates on our Status Page under “Deploys and configuration changes delayed.”
Railway runs critical operations, such as deployments, configuration changes, and resource limit updates, through a task queue backed by Temporal. When you push code, update variables, or change service limits, a workflow is created in this queue to process that action.
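To make that flow concrete, here’s a minimal sketch of what enqueuing such an action could look like with Temporal’s TypeScript SDK. The workflow type (`deployService`), task queue name, and connection address are illustrative placeholders, not Railway’s actual identifiers.

```typescript
// Hypothetical sketch: enqueuing a deploy action as a Temporal workflow.
// The names used here are illustrative, not Railway's real ones.
import { Client, Connection } from '@temporalio/client';

async function enqueueDeploy(serviceId: string, commitSha: string) {
  // Connect to the Temporal cluster backing the task queue.
  const connection = await Connection.connect({ address: 'temporal.internal:7233' });
  const client = new Client({ connection });

  // Each user action becomes a workflow execution on a task queue;
  // workers poll that queue and run the workflow's activities.
  const handle = await client.workflow.start('deployService', {
    taskQueue: 'deployments',
    workflowId: `deploy-${serviceId}-${commitSha}`,
    args: [{ serviceId, commitSha }],
  });

  return handle; // handle.workflowId identifies the queued deployment
}
```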
Around 19:30 UTC, we observed GitHub API calls slowing to nearly 4x their usual p95 latency. This happened during peak deployment hours, and the slowdown cascaded into a growing backlog of tasks in the queue and delayed processing. We saw no signs that GitHub's API itself was unhealthy. Workers handling GitHub API calls began consuming elevated resources, eventually hitting Out-Of-Memory failures and crashing. As those workers went offline, new tasks were shifted to the remaining workers, increasing load and intensifying system pressure.
With this elevated pressure, the remaining workers ended up crashing and new tasks began piling up faster than workers could process them. As additional workers came online, they were immediately overloaded by the backlog and hit Out-Of-Memory failures as well. Free, Trial, and Hobby deployments were temporarily disabled to reduce strain on the task queue.
To relieve pressure on the system, worker CPU and memory were increased. We also adjusted worker parameters that had been causing workers to request more tasks than they could handle.
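For context, Temporal workers expose options that cap how much work they pull concurrently. The sketch below (TypeScript SDK, with made-up values and hypothetical `./activities` and `./workflows` modules) shows the kind of limits involved; it is not Railway’s actual configuration.

```typescript
// Illustrative only: a Temporal worker with explicit concurrency caps so it
// cannot poll more tasks than its memory budget can absorb.
import { Worker } from '@temporalio/worker';
import * as activities from './activities'; // hypothetical activities module

async function run() {
  const worker = await Worker.create({
    taskQueue: 'deployments',
    workflowsPath: require.resolve('./workflows'), // hypothetical workflows module
    activities,
    // Cap concurrent activity executions (e.g. GitHub API calls) per worker.
    maxConcurrentActivityTaskExecutions: 50,
    // Cap concurrent workflow tasks so a large backlog can't flood one worker.
    maxConcurrentWorkflowTaskExecutions: 20,
  });
  await worker.run();
}

run().catch((err) => {
  console.error(err);
  process.exit(1);
});
```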
After these fixes, we saw gradual recovery. The task queue began clearing the backlog of deployments, and as load decreased, we re-enabled deployments in stages: first Hobby, then Trial and Free. After additional monitoring, we observed full recovery and declared the incident resolved.
We’re making several changes to prevent errors of this class from happening again:
- We’ve implemented an auto-tuning algorithm that should prevent our fleet of workers from starving themselves if a similar thundering herd scenario appears again (a rough sketch of the general idea follows this list).
- We’ve scaled our task queue’s resources up across the board to account for additional load.
- We’re working to remove external API dependencies from critical workflows to limit the impact of similar outages.
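The auto-tuner mentioned above is internal to Railway, but the general idea can be sketched as a feedback loop that shrinks a worker’s concurrency cap when memory headroom gets tight and grows it back slowly once pressure eases. The thresholds, step sizes, and memory budget below are invented for illustration.

```typescript
// Rough sketch of resource-based auto-tuning: back off concurrency before the
// worker OOMs, recover it gradually when memory pressure is low.
// Not Railway's implementation; all numbers are made up.
class AdaptiveConcurrencyLimit {
  constructor(
    private current: number,
    private readonly min: number,
    private readonly max: number,
  ) {}

  get limit(): number {
    return this.current;
  }

  // Called periodically with the fraction of the memory budget currently used.
  adjust(memoryUtilization: number): void {
    if (memoryUtilization > 0.85) {
      // Shed load aggressively before hitting an Out-Of-Memory failure.
      this.current = Math.max(this.min, Math.floor(this.current / 2));
    } else if (memoryUtilization < 0.6) {
      // Recover capacity one slot at a time to avoid another thundering herd.
      this.current = Math.min(this.max, this.current + 1);
    }
  }
}

// Example: re-evaluate the cap every few seconds from process memory stats,
// assuming a 2 GiB memory budget per worker.
const limiter = new AdaptiveConcurrencyLimit(50, 4, 200);
const memoryBudgetBytes = 2 * 1024 ** 3;

setInterval(() => {
  const usedBytes = process.memoryUsage().rss;
  limiter.adjust(usedBytes / memoryBudgetBytes);
}, 5_000);
```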
We apologize for this outage and are actively working to prevent similar issues from happening again. We understand reliable deployments are central to your workflow, and this disruption is unacceptable.
