Brody Over

Incident Report: October 15th, 2025

We recently experienced an outage that impacted our dashboard and deployment pipeline.

When a Major Outage occurs, it is Railway’s policy to share the public details of what happened.

Between 20:49 UTC and 22:15 UTC, users had difficulty deploying new services and accessing the dashboard. Most of the platform's user-facing control systems were unavailable during this time, though running services and networking remained operational.

  • 20:49 UTC - A large deletion operation triggered a surge in internal requests, causing elevated error rates in our backend systems.
  • 20:49 UTC - The on-call engineer was alerted and the team began investigating.
  • 21:06 UTC - We identified that the issue was caused by an unexpected overload on our network management systems.
  • 21:08 UTC - Deployments were temporarily paused to reduce pressure while we worked on mitigation.
  • 21:45 UTC - A fix was implemented and deployed, and we began monitoring it.
  • 22:00 UTC - The platform began to recover, and deployments were gradually re-enabled.
  • 22:15 UTC - The majority of users regained full access to the dashboard and deployment capabilities.
  • 23:05 UTC - Some residual issues affecting a few services were resolved after clearing outdated cache entries.

For further reference, see the incident's live updates on our Status Page under "Network control plane outage".

We’re implementing additional safeguards to prevent similar incidents, including:

  • Improved rate limiting of our internal APIs.
  • Better visibility and alerting around internal request patterns.
  • Making the network control plane more resilient overall.
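
To illustrate the first safeguard, here is a minimal sketch of one common way to rate limit internal requests: a token bucket. It is not Railway's implementation. The `TokenBucket` class, its `rate` and `capacity` parameters, and the injectable `clock` are all hypothetical names chosen for this example.

```python
import time
from typing import Callable


class TokenBucket:
    """Hypothetical token-bucket limiter; an illustration, not Railway's code."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity      # start with a full bucket
        self.clock = clock          # injectable so the limiter can be tested
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True if the request may proceed, False if it should be shed."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

With a limiter like this in front of an internal API, a large fan-out (such as a bulk deletion) would be throttled to a bounded burst plus a steady refill rate, and callers that receive `False` could queue or retry with backoff rather than adding to the load.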

Railway is committed to providing a best-in-class cloud experience. Any inability to use the platform is unacceptable to us. We apologize for any inconvenience caused by this, and we will work to eliminate the entire class of issues contributing to this incident.