Mahmoud Abdelwahab

Deploy Autoscaling Services, AI Workflow Automation, and LLM APIs Without Kubernetes

Kubernetes solves a specific class of problems: orchestrating containers across nodes, managing service discovery at scale, handling rolling deployments across distributed systems. These are real concerns for organizations running hundreds of services across multiple regions with dedicated platform teams.

Most teams adopting Kubernetes do not have these problems. They have a handful of services, a small engineering team, and a need to deploy containers reliably. Kubernetes gives them that, but it also gives them YAML configuration sprawl, control plane maintenance, networking abstractions, and an observability stack they need to assemble themselves. The operational surface area expands far beyond the original requirement.

The control plane alone requires understanding:

  • etcd for state storage
  • The API server for cluster interactions
  • The scheduler for pod placement
  • The controller manager for reconciliation loops

Managed Kubernetes services like EKS, GKE, and AKS abstract some of this, but you still own node management, networking configuration, and the entire application layer above the cluster.

The question worth asking before adopting Kubernetes: what are you actually trying to accomplish? If the answer involves deploying containers, scaling them, setting up CI/CD, and having visibility into what is running, there are simpler paths.

This post covers common scenarios where developers reach for Kubernetes and walks through how to accomplish the same outcomes with less infrastructure overhead.

The appeal of Kubernetes for microservices is understandable. You get container orchestration, service discovery, and a consistent deployment model across services. The cost is managing the control plane, configuring ingress, writing deployment manifests, and maintaining Helm charts or Kustomize overlays.

For an API/service, the actual requirements are typically:

  1. Build a container from source
  2. Expose an HTTP endpoint
  3. Connect to a database or message queue
  4. Scale horizontally when load increases

On Kubernetes, each of these requirements translates to configuration:

Requirement           Kubernetes Implementation
Build from source     CI pipeline → Registry → Image pull
Expose endpoint       Service + Ingress + cert-manager
Database connection   Secrets (base64, not encrypted) or external secrets operator
Horizontal scaling    metrics-server + HorizontalPodAutoscaler

Each component is well-documented. Each component also introduces failure modes. The ingress controller can misconfigure routing rules. Secrets can fall out of sync. HPA can thrash between replica counts if thresholds are too sensitive. Debugging these issues requires understanding how each component interacts with the others.
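
For a sense of what just the "expose endpoint" row involves, here is a minimal sketch of the Service and Ingress objects, assuming an ingress controller and cert-manager are already installed in the cluster (names and hostname are illustrative):

apiVersion: v1
kind: Service
metadata:
  name: api
spec:
  selector:
    app: api
  ports:
    - port: 80
      targetPort: 8080
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt   # cert-manager issues the TLS certificate
spec:
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api
                port:
                  number: 80
  tls:
    - hosts:
        - api.example.com
      secretName: api-tls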

Container platforms like Railway provide a different model. Instead of assembling orchestration primitives yourself, you deploy applications to managed infrastructure that handles builds, networking, scaling, and observability as platform features.

Railway runs long-lived containers with usage-based billing. You can deploy virtually anything that runs in a container: web APIs, background workers, WebSocket servers, cron jobs, databases, message queues, or custom binaries. The platform supports any language or framework—Node.js, Python, Go, Rust, Java, .NET, Ruby, PHP—with no vendor-specific runtime constraints.

The value proposition is operational simplicity. You push code; the platform handles everything between your repository and a running service. For teams that need to ship products rather than manage infrastructure, this tradeoff often makes sense.

Push to a Git repository and Railway builds the container using Railpack, a custom builder that detects your project type and generates a build plan automatically—or uses a Dockerfile if one is present. The service receives an internal hostname automatically, formatted as service-name.railway.internal, which other services in the same project can resolve. Public endpoints are provisioned on demand with TLS termination handled by the platform.

Environment variables and secrets can be stored per-service or per-environment. There is no separate secrets management layer to configure. Railway also supports reference variables, which let services consume configuration from other services dynamically:

DATABASE_URL=${{Postgres.DATABASE_URL}}
REDIS_URL=${{Redis.REDIS_URL}}
API_URL=${{api.RAILWAY_PRIVATE_DOMAIN}}

References resolve at runtime. If you rotate database credentials, swap to a different Postgres instance, or rename a service, the updated values propagate automatically to every service that references them. On Kubernetes, this kind of change typically requires updating secrets in multiple places and restarting dependent pods. Reference variables eliminate that manual synchronization.
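
For comparison, a credential rotation on Kubernetes usually means re-applying the secret and then restarting every workload that consumes it (secret and deployment names here are hypothetical):

kubectl create secret generic db-credentials \
  --from-literal=DATABASE_URL="postgres://user:newpassword@host:5432/app" \
  --dry-run=client -o yaml | kubectl apply -f -

# Pods that read the secret as environment variables only pick up the new value after a restart
kubectl rollout restart deployment/api
kubectl rollout restart deployment/worker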

Scaling is a configuration value in the dashboard: set a replica count and resource allocation, and Railway handles distribution across its infrastructure.

The tradeoff is flexibility. Kubernetes lets you define exactly how pods are scheduled, how health checks are evaluated, how traffic is routed between service versions. Railway makes those decisions for you. Pod affinity, topology spread constraints, custom readiness probes with specific thresholds: these are not exposed because the platform assumes sensible defaults. For most microservice architectures under a certain scale, those defaults are appropriate. When they are not, Kubernetes becomes the right tool.

Moving a containerized API to production typically involves several pieces:

  • A container registry
  • A CI pipeline that builds and pushes images
  • A deployment mechanism that pulls the new image and rolls it out

On Kubernetes, the typical CI/CD flow looks like this:

┌─────────┐     ┌─────────┐     ┌─────────┐     ┌─────────────┐
│  Push   │────▶│  CI     │────▶│Registry │────▶│  Cluster    │
│  Code   │     │  Build  │     │  (ECR)  │     │  (kubectl)  │
└─────────┘     └─────────┘     └─────────┘     └─────────────┘
                    │                                  │
                    ▼                                  ▼
              ┌─────────────┐                  ┌─────────────┐
              │   Secrets   │                  │   ArgoCD/   │
              │(credentials)│                  │    Flux     │
              └─────────────┘                  └─────────────┘

The CI pipeline typically:

  1. Builds the image
  2. Tags it with the commit SHA or a semantic version
  3. Pushes to the registry
  4. Updates the deployment manifest or Helm values file with the new tag

This update then needs to propagate to the cluster. If you are using GitOps with ArgoCD or Flux, the controller detects the manifest change and applies it. If you are using kubectl directly from CI, you need cluster credentials stored as CI secrets with appropriate RBAC permissions.
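
A trimmed-down GitHub Actions job for that flow might look something like this; registry address, paths, and names are illustrative, and registry/git authentication steps are omitted:

name: deploy
on:
  push:
    branches: [main]
jobs:
  build-and-push:
    runs-on: ubuntu-latest
    env:
      REGISTRY: 123456789012.dkr.ecr.us-east-1.amazonaws.com
    steps:
      - uses: actions/checkout@v4
      # Registry authentication (e.g. ECR login) omitted for brevity
      - name: Build and push image tagged with the commit SHA
        run: |
          docker build -t $REGISTRY/api:$GITHUB_SHA .
          docker push $REGISTRY/api:$GITHUB_SHA
      - name: Point the manifest at the new tag
        # Assumes git identity and push credentials are configured
        run: |
          sed -i "s|image: .*|image: $REGISTRY/api:$GITHUB_SHA|" k8s/deployment.yaml
          git commit -am "deploy $GITHUB_SHA" && git push

From there, ArgoCD or Flux picks up the manifest change, or a final kubectl apply step rolls it out directly from CI.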

Rollbacks require either reverting the manifest to a previous image tag, using Kubernetes' native rollout undo (which depends on the revision history limit you configured), or having a more sophisticated deployment strategy like canary releases through Flagger or Argo Rollouts.

For an API or any Docker-based application, Railway collapses this:

┌─────────┐     ┌─────────────────────────────────────────────┐
│  Push   │────▶│                 Railway                     │
│  Code   │     │  (build, deploy, scale, route, TLS)         │
└─────────┘     └─────────────────────────────────────────────┘

Connect a repository, and pushes to the main branch trigger builds and deployments automatically. Services can also deploy from pre-built container images instead of source code. Railway supports Docker Hub, GitHub Container Registry, Quay.io, and GitLab Container Registry.

When you configure a service to track an image, Railway monitors the registry for new versions. For versioned tags like nginx:1.25.3, Railway stages the new version as an update; for mutable tags like nginx:latest, it redeploys to pull the latest digest. Automatic updates can run on a schedule with a defined maintenance window.

┌─────────────┐      ┌─────────────┐      ┌─────────────┐
│  Your CI    │─────▶│  Registry   │─────▶│   Railway   │
│  (builds)   │      │  (stores)   │      │  (deploys)  │
└─────────────┘      └─────────────┘      └─────────────┘

This accommodates teams that build images in existing CI pipelines, use third-party software distributed as containers, or want to separate the build step from the deployment platform. It’s also possible to deploy from a private registry by configuring authentication credentials in your service settings.

Preview environments spin up automatically for pull requests. Each PR gets an isolated deployment with its own URL, connected to either shared or isolated backing services depending on your configuration. When the PR closes, the preview environment is torn down. This is functionality that requires significant setup on Kubernetes: you need a system like Argo CD ApplicationSets or a custom controller that watches GitHub webhooks and provisions namespaces with the appropriate resources.

Rollbacks on Kubernetes depend on how you manage deployments. Native kubectl rollout undo works if you have not exceeded your revision history limit. GitOps tools like ArgoCD require reverting the manifest in Git and waiting for sync. Helm rollbacks restore the previous release but can conflict with manual changes applied outside Helm. Each approach has edge cases.

Railway retains the full deployment history for each service. Every push creates an immutable deployment with its container image preserved. Rolling back means selecting a previous deployment and promoting it. There is no rebuild, no CI pipeline to re-run, no manifest to revert. The previous image is already available and deploys immediately.

Deployment History
──────────────────
#47  ← current (live)
#46  ← rollback target (one click)
#45
#44
...

This model provides rollback safety without configuration. If a deployment introduces a regression, you revert to the last known-good state in seconds rather than minutes.

Railway supports health checks that gate traffic routing. You define an endpoint, and Railway waits for a successful response before directing traffic to a new deployment. If the health check fails, the deployment is marked as failed and traffic continues flowing to the previous version.

New deployment
     │
     ▼
Health check ──── fail ───▶ Deployment marked failed
     │                      Traffic stays on previous version
     │
   pass
     │
     ▼
Traffic routed to new deployment

This is comparable to Kubernetes readiness probes, but without writing probe configuration in your deployment manifest. The platform handles the routing logic. Combined with instant rollbacks, this means bad deployments are caught before users see them, and recovery is a single click if something slips through.
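
The Kubernetes counterpart is a readiness probe in the Deployment manifest, roughly (values illustrative):

readinessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
  failureThreshold: 3

On Railway, the same gate is just the health check endpoint you configure on the service; the probe semantics above are what the platform handles for you.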

On Kubernetes, autoscaling requires assembling multiple components:

Component                  Purpose
metrics-server             Provides CPU/memory utilization
HorizontalPodAutoscaler    Defines scaling thresholds
Prometheus Adapter         Required for custom metrics
Prometheus                 Collects application metrics

You write HPA manifests specifying target CPU utilization or custom metrics, configure the metrics pipeline, and tune thresholds to avoid thrashing between replica counts. Vertical scaling requires the Vertical Pod Autoscaler, which brings its own configuration and limitations.
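
For reference, a typical HPA manifest looks like this (target utilization and bounds are illustrative):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # requires metrics-server to report CPU usage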

On Railway, autoscaling is configured through the dashboard. You set replica counts and resource limits per service.

The platform handles load distribution across replicas. There is no metrics-server to deploy, no HPA resources to write, no custom metrics adapters to configure.

The mechanism here is opinionated integration rather than composable primitives. You lose the ability to define custom scaling metrics or implement sophisticated scaling strategies. You gain working horizontal scaling without the configuration overhead.

Railway operates dedicated infrastructure across four regions: US West (California), US East (Virginia), EU West (Amsterdam), and Southeast Asia (Singapore). Each service runs in a chosen region with consistent latency.

For horizontal scaling, you deploy replicas of a service. Each replica receives the full resource allocation and can be placed in any available region. Railway's routing layer balances public traffic across replicas within each region automatically.

┌─────────────────────────────────────────────────────────────────┐
│                     Multi-Region Deployment                     │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   US West          US East          EU West        SE Asia      │
│   ┌──────┐        ┌──────┐        ┌──────┐        ┌──────┐     │
│   │ API  │        │ API  │        │ API  │        │ API  │     │
│   │ (2x) │        │ (2x) │        │ (2x) │        │ (1x) │     │
│   └──────┘        └──────┘        └──────┘        └──────┘     │
│       │               │               │               │         │
│       └───────────────┴───────────────┴───────────────┘         │
│                    Automatic regional routing                   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

To run globally, you deploy the same service across multiple regions. Incoming requests route to the nearest healthy replica. This provides geographic distribution without configuring Kubernetes federation, managing multiple clusters, or setting up global load balancers.

Services scale vertically up to 32 vCPUs and 32 GB RAM on the Pro plan. If a workload needs more capacity than a single instance provides, horizontal scaling with replicas is the path forward.

On Kubernetes, observability is a separate infrastructure project. You deploy Prometheus for metrics collection, Grafana for dashboards, Fluentd or Fluent Bit for log aggregation, and Jaeger or Tempo for tracing. Each component requires configuration, storage, and maintenance. When something breaks in your observability stack, you lose visibility into everything else.

┌─────────────────────────────────────────────────────────────────┐
│              Kubernetes Observability Stack                     │
├─────────────────────────────────────────────────────────────────┤
│  Metrics          Prometheus → Grafana                          │
│                   ├── ServiceMonitor CRDs                       │
│                   ├── Alertmanager configuration                │
│                   └── Persistent storage for TSDB               │
├─────────────────────────────────────────────────────────────────┤
│  Logs             Fluentd/Fluent Bit → Elasticsearch/Loki       │
│                   ├── DaemonSet on every node                   │
│                   ├── Parser and filter configuration           │
│                   └── Index lifecycle management                │
├─────────────────────────────────────────────────────────────────┤
│  Traces           Jaeger/Tempo                                  │
│                   ├── Collector deployment                      │
│                   ├── Sampling configuration                    │
│                   └── Storage backend                           │
└─────────────────────────────────────────────────────────────────┘

Railway provides observability as a platform feature rather than an infrastructure project:

Capability           Kubernetes                                Railway
Log aggregation      Deploy and configure Fluentd/Loki         Automatic capture from stdout/stderr
Log search           Set up Elasticsearch or Loki queries      Log Explorer with structured filtering
Metrics              Prometheus + Grafana + ServiceMonitors    Built-in CPU, memory, disk, network metrics
Metric retention     Configure TSDB storage                    30 days included
Alerting             Alertmanager configuration                Configure thresholds on dashboard widgets
Cross-service view   Assemble dashboards manually              Observability Dashboard per environment

Logs are captured automatically from standard output. If you emit structured JSON logs, Railway parses and indexes the fields for filtering. The Log Explorer supports queries like @level:error to find all error-level logs or @user_id:4821 to trace a specific user's requests across services.
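
For example, a structured log line like the following (field names are whatever your application emits) is indexed automatically, and both of those queries would match it:

{"level": "error", "user_id": 4821, "msg": "payment webhook returned 500", "service": "billing"}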

Metrics for CPU, memory, disk usage, and network traffic are collected per service with no instrumentation required. For services with multiple replicas, you can view metrics as a combined total or per replica. Monitors let you configure alerts when metrics exceed thresholds, sending notifications via email or webhooks to Slack and Discord.

The tradeoff is customization. Kubernetes lets you define exactly what metrics to collect, how to aggregate them, and how to visualize them. Railway provides a fixed set of platform metrics. If you need application-level metrics (request latency percentiles, business KPIs), you would still instrument your application and send data to an external provider or build custom logging.

Tools like n8n are appealing precisely because they reduce reliance on external services. Self-hosting them, however, introduces the infrastructure burden those external services were abstracting away.

On Kubernetes, running n8n requires:

  • Deployment manifest
  • PersistentVolumeClaim for workflow data
  • Ingress controller for external access
  • Network connectivity to databases and external APIs
  • Logging infrastructure for observability

Persistent storage on Kubernetes requires understanding storage classes and provisioners:

Provider   Storage Classes         Zone Constraints
AWS        gp2, gp3, io1           Volume bound to single AZ
GCP        pd-standard, pd-ssd     Volume bound to single zone
Azure      managed-premium         Volume bound to single zone

The persistent volume claim binds to a volume in a specific availability zone, which constrains pod scheduling. If the node in that zone goes down, your pod cannot reschedule to a different zone until the volume is manually migrated or you have configured a storage solution that replicates across zones.
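
A minimal claim looks like the following; which zone the volume lands in depends on the StorageClass, and setting volumeBindingMode to WaitForFirstConsumer at least delays that choice until the pod is scheduled (names are illustrative, the provisioner shown is the AWS EBS CSI driver):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer   # bind the volume where the pod actually schedules
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: n8n-data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: gp3
  resources:
    requests:
      storage: 10Gi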

For n8n specifically, the persistent volume stores:

  • Workflow definitions
  • Execution history
  • Credentials and API keys

Losing this data means losing your automation setup. Backup strategies on Kubernetes typically involve either VolumeSnapshot resources (if your storage class supports them) or application-level backups using n8n's export functionality triggered by a CronJob.
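
A sketch of the second approach, assuming n8n's export:workflow CLI command and a separate backup volume (all names hypothetical, and the job still needs the same database configuration as the main instance):

apiVersion: batch/v1
kind: CronJob
metadata:
  name: n8n-backup
spec:
  schedule: "0 3 * * *"   # nightly export
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: export
              image: n8nio/n8n
              command: ["n8n", "export:workflow", "--all", "--output=/backup/workflows.json"]
              # envFrom with the n8n database credentials omitted for brevity
              volumeMounts:
                - name: backup
                  mountPath: /backup
          volumes:
            - name: backup
              persistentVolumeClaim:
                claimName: n8n-backup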

Ingress configuration determines how external webhooks reach your n8n instance. Workflows that respond to external events need a stable, publicly accessible URL:

External Webhook                    n8n Pod
      │                                │
      ▼                                │
┌──────────┐    ┌──────────┐    ┌─────────────┐
│   DNS    │───▶│  Ingress │───▶│   Service   │───▶ n8n
│          │    │Controller│    │  (ClusterIP)│
└──────────┘    └──────────┘    └─────────────┘
                     │
                     ▼
              ┌─────────────┐
              │ cert-manager│
              │   (TLS)     │
              └─────────────┘

The Railway templates marketplace features an n8n template that provisions the application with persistent storage attached automatically. The storage persists across deployments and is backed up at the platform level. You do not select a storage class or configure volume claims.

What you get out of the box:

  • Persistent filesystem at a specified path
  • Logs streaming to the dashboard
  • Railway-provided domain with automatic TLS
  • Custom domain support with automatic certificate provisioning

Updates follow the same Git-based flow as any other Railway service. If you are using the template, pulling upstream changes and pushing to your repository triggers a rebuild. There is no kubectl rollout, no Helm upgrade, no concern about whether the persistent volume will successfully reattach to the new pod.

The operational difference is significant. On Kubernetes, you are responsible for the container and everything beneath it: storage, networking, certificate management, logging infrastructure. On Railway, you are responsible for configuring n8n itself.

Running inference workloads has different constraints than typical web services. Models are large, startup times are slow, and resource requirements (particularly GPU access) do not fit neatly into standard container orchestration patterns.

FastChat, Ollama, and similar tools let you expose an OpenAI-compatible API backed by open-weight models. The OpenAI API format has become a de facto standard, meaning applications built against OpenAI can switch to self-hosted inference by changing a base URL and API key.
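
In practice the switch is usually just client configuration. Most OpenAI SDKs read the base URL and key from the environment, so pointing an application at a self-hosted endpoint can look like this (values illustrative, exact variable names depend on the SDK):

OPENAI_BASE_URL=https://inference.example.com/v1
OPENAI_API_KEY=any-non-empty-value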

On Kubernetes, hosting these tools involves:

┌─────────────────────────────────────────────────────────────────┐
│                    GPU Node Configuration                       │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Pod Spec                      Node Requirements                │
│  ├── nodeSelector:             ├── GPU node pool                │
│  │   gpu: nvidia               ├── nvidia device plugin         │
│  ├── tolerations:              ├── CUDA drivers                 │
│  │   - nvidia.com/gpu          └── dcgm-exporter (metrics)      │
│  └── resources:                                                 │
│      limits:                                                    │
│        nvidia.com/gpu: 1                                        │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Key considerations for LLM workloads on Kubernetes:

Concern               Implication
GPU cost              Billed while running, regardless of utilization
Spot instances        Lower cost but can terminate mid-request
Cold start            30-60s for 7B models, minutes for larger
Model storage         PVC or S3 mount for model weights
Memory requirements   7B fp16 ≈ 14GB VRAM

Model loading is a significant consideration. If your Kubernetes deployment scales to zero to save costs, every scale-up event incurs a cold start. Node selectors and tolerations in your pod spec ensure pods land on GPU-equipped nodes. Misconfiguration means your inference pods schedule to CPU nodes and fail to start.

Railway runs containers with configurable resource allocations. For inference workloads that fit within CPU and memory bounds, deployment is identical to any other container.

Quantized models provide a CPU-compatible option:

Model Size   Quantization   RAM Required   Tokens/sec (CPU)
7B           4-bit (Q4)     ~4GB           2-5
7B           8-bit (Q8)     ~8GB           1-3
13B          4-bit (Q4)     ~8GB           1-2

This is not suitable for high-volume production inference, but it works for internal tools, development environments, and applications where latency tolerance is higher.

For these CPU-compatible workloads, Railway eliminates the node management and GPU scheduling complexity. You configure the memory allocation your model needs, deploy, and the API is accessible. There is no cluster to provision, no node pools to configure, no nvidia device plugin to install.
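
Once something like Ollama is running as a service with a public domain, clients talk to it the same way they would talk to OpenAI; a quick check might look like this (domain and model name are placeholders):

curl https://my-inference.up.railway.app/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.2", "messages": [{"role": "user", "content": "Hello"}]}'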

Railway also supports TCP proxies for non-HTTP protocols. If your inference API uses gRPC (common for high-performance inference serving), the service can expose both HTTP and TCP endpoints simultaneously.

The constraint is GPU availability. If your inference workload requires dedicated GPU access for acceptable latency, Railway may not be the right fit today. For lightweight models that run on CPU, or for prototyping before moving to dedicated GPU infrastructure, the deployment simplicity is worth considering.

Kubernetes is not unnecessary complexity. It is complexity appropriate to a certain class of problems.

When Kubernetes is the right choice:

  • Multi-cloud deployments requiring a consistent abstraction layer
  • Platform teams building internal developer platforms for product teams
  • Compliance requirements mandating network policies, pod security standards, or audit logging
  • Multi-tenancy with different trust boundaries between teams or customers

The primitives Kubernetes provides for these use cases (namespaces, resource quotas, RBAC, network policies) exist because large organizations need them. Managed platforms like Railway may not expose the same level of control.

The question is whether those are your problems. For a team shipping a product rather than building a platform, the answer is often no.

Railway provides templates for common self-hosted tools including n8n, and supports any container you can build from a Git repository or Dockerfile. The fastest way to evaluate whether it fits your use case is to deploy something and see how it behaves.