The Vanilla Cloud Tax: What Rolling Your Own on AWS Actually Costs
The phrase "we'll just use AWS direct" has done more damage to engineering team capacity in the last decade than any single technical decision. It is the default-because-default choice, the answer that sounds responsible, the choice almost nobody gets fired for. It is also, for most teams, the most expensive answer they will give.
I've watched this play out from two very different vantage points. Before Railway, I was at Citrix, where I lived in the kind of customer environments where "just use the cloud" is not a real option: Verizon, Lockheed, the names that come with audit binders and air-gapped staging clusters. Those teams had dedicated platform organizations, three-letter compliance acronyms taped to the wall, and the staffing to make a vanilla cloud posture work. They paid the tax knowingly, and they got something for it.
Now, as a Solutions Engineer at Railway, I spend my days talking to the other 95% of the market. Twenty-engineer startups. Forty-engineer growth-stage teams. Series B companies who hired a "DevOps person" eighteen months ago and now wonder why that person ships product code roughly once a quarter. These teams are paying the same tax those Lockheed environments paid, except they're paying it accidentally, with a much smaller team, and they're getting nothing for it except a CloudWatch dashboard nobody looks at.
The vanilla cloud tax is not a single line item. It's four of them, and most teams underbudget every one. Let's go through them.
The four components of the vanilla cloud tax
Setup engineering: one engineer-quarter, minimum
Most teams never write this part down before they start. A clean, production-ready AWS footprint, the kind you'd be comfortable putting a paying customer's workload on, takes one experienced infrastructure engineer roughly a quarter of focused work. If your team is learning AWS as they go, double or triple it. I've seen this take a full year at teams who insisted it would take "a few sprints."
What does that quarter contain? Roughly:
- VPC and networking design. Public and private subnets across at least two availability zones. NAT gateways, and the awareness that they cost roughly $33/month each (about $0.045/hour in us-east-1), per AZ, before a single byte flows through them. Route tables, peering if you have more than one VPC, VPC endpoints for the services where egress through NAT would otherwise eat you alive.
- IAM, done correctly. Not the "AdministratorAccess to everyone, we'll fix it later" version. Role-per-service, least privilege, cross-account boundaries if you have separate dev/staging/prod accounts (which you should), and a strategy for human access that isn't long-lived access keys in someone's dotfile.
- Compute. ECS on Fargate if you want to skip managing EC2 instances, EKS if you've decided Kubernetes is worth the operational surface area, ECS on EC2 if you like a middle ground. Each of these has a learning curve measured in weeks, not days.
- RDS, properly. Multi-AZ, automated backups with a retention policy you've actually thought about, a read replica if your workload warrants it, parameter groups tuned for your engine, and a disaster recovery runbook that someone has tested.
- Secrets management. Secrets Manager or Parameter Store, rotation policies, injection into your runtime in a way that doesn't end up in CloudTrail or a deploy log.
- Observability. CloudWatch logs and metrics are the floor. Most teams end up adding Datadog, Grafana, or an equivalent on top because raw CloudWatch is unpleasant for application logs. Alarms, SNS topics, PagerDuty integration. None of this is hard individually; all of it is tedious.
- CI/CD. A pipeline that builds, tests, and deploys to dev, staging, and prod. ECR for images. Deployment automation that handles blue/green or rolling, with the ability to roll back when something is on fire at 2am.
- The unglamorous stuff. Security group hygiene, the difference between "0.0.0.0/0 because it works" and a posture you can defend in a SOC 2 audit. Cost allocation tags, applied uniformly, so finance can answer "which team spent the money." A bastion or SSM Session Manager setup so engineers can get into things when they need to.
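To give a sense of how much small, fiddly design sits inside even the first bullet, here is a minimal sketch of carving a VPC into one public and one private subnet per availability zone. The CIDR, the AZ names, the /20 subnet size, and the function name `plan_subnets` are all illustrative assumptions, not a recommended layout.

```python
import ipaddress


def plan_subnets(vpc_cidr: str, azs: list[str], new_prefix: int = 20) -> dict[str, dict[str, str]]:
    """Carve one public and one private subnet per AZ out of a VPC CIDR.

    Hypothetical helper: real layouts usually also reserve space for
    databases, future AZs, and peered VPCs.
    """
    vpc = ipaddress.ip_network(vpc_cidr)
    capacity = 2 ** (new_prefix - vpc.prefixlen)  # number of equal-sized blocks available
    if 2 * len(azs) > capacity:
        raise ValueError("VPC CIDR is too small for two subnets per AZ")

    blocks = vpc.subnets(new_prefix=new_prefix)  # generator of consecutive blocks
    plan = {}
    for az in azs:
        plan[az] = {"public": str(next(blocks)), "private": str(next(blocks))}
    return plan


plan = plan_subnets("10.0.0.0/16", ["us-east-1a", "us-east-1b"])
for az, subnets in plan.items():
    print(az, subnets)
# us-east-1a {'public': '10.0.0.0/20', 'private': '10.0.16.0/20'}
# us-east-1b {'public': '10.0.32.0/20', 'private': '10.0.48.0/20'}
```

This is the easy ten minutes. The route tables, NAT placement, and endpoint policies that hang off each of those subnets are where the weeks go.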
Fully-loaded cost of a senior infrastructure engineer in 2026 is somewhere around $250k/year all-in: salary, equity, benefits, the laptop. One quarter is $62k+ of engineering time before you've shipped a single feature on top of it. If you're learning as you go, it's $125k to $200k.
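The arithmetic is simple enough to put in a few lines, which makes it easy to swap in your own salary figure. This is a sketch using the $250k fully-loaded number above; `setup_cost` is an illustrative name, and the 2x and 3x multipliers are the "learning as you go" cases.

```python
FULLY_LOADED_ANNUAL = 250_000  # the essay's all-in figure for a senior infra engineer


def setup_cost(quarters: int, annual: int = FULLY_LOADED_ANNUAL) -> float:
    """Engineering cost of spending `quarters` quarters of one engineer on setup."""
    return annual * quarters / 4


for label, quarters in [("experienced", 1), ("learning, 2x", 2), ("learning, 3x", 3)]:
    print(f"{label}: ${setup_cost(quarters):,.0f}")
# experienced: $62,500
# learning, 2x: $125,000
# learning, 3x: $187,500
```

Tripling lands just under $190k, in the same ballpark as the upper end quoted above.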
Ongoing operational time: 15 to 25% of a senior engineer, forever
The setup cost is the part teams remember. The operational cost is the part that quietly metastasizes.
Once your AWS footprint exists, it requires care. Not in the romantic "infrastructure is a craft" sense, but in the boring sense that someone has to be on the hook for it. Conservatively, this is 15 to 25% of a senior engineer's capacity, in perpetuity. At a small team, that's an entire day a week, every week, that someone with senior product-engineering judgment is spending on platform work instead of shipping.
What does that time go to? In no particular order:
- Pager duty for infra issues. RDS storage hits 90%. The NAT gateway in one AZ is misbehaving. An ECS task is OOM-looping in a way that doesn't trigger your existing alarms.
- IAM debugging when something breaks. "Why can't this service write to this bucket" is an evergreen question, and the answer is almost always a four-hour rabbit hole through trust policies, resource policies, and SCPs.
- Cost monitoring. Someone has to notice when egress bills spike, when a forgotten dev environment is burning $2k/month, when a Postgres replica was provisioned in the wrong AZ and is paying cross-AZ transfer.
- Patching, rotation, and upgrades. RDS minor version upgrades. EKS Kubernetes version bumps, where AWS manages the control plane and you handle the data plane, the add-ons, and the migrations.
- Security findings remediation. GuardDuty, Security Hub, your SOC 2 auditor's pen tester. There is always a backlog.
- SDK and runtime migrations. AWS SDK v2 to v3 was a real, multi-week project for a lot of teams. The next one is coming.
Take 20% as the midpoint of that range. That 20% of a senior engineer is another $50k/year, every year, that doesn't show up on your AWS bill but does show up on your payroll.
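For the full 15 to 25% range rather than the midpoint, the same fully-loaded figure gives the following. A minimal sketch; `ops_cost` is an illustrative name and a five-day work week is assumed.

```python
FULLY_LOADED_ANNUAL = 250_000  # the essay's all-in figure for a senior engineer


def ops_cost(percent: int, annual: int = FULLY_LOADED_ANNUAL) -> float:
    """Annual payroll cost of `percent` percent of one senior engineer."""
    return annual * percent / 100


for percent in (15, 20, 25):
    days_per_week = percent * 5 / 100  # assumes a five-day work week
    print(f"{percent}%: ${ops_cost(percent):,.0f}/year, {days_per_week} days/week")
# 15%: $37,500/year, 0.75 days/week
# 20%: $50,000/year, 1.0 days/week
# 25%: $62,500/year, 1.25 days/week
```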
The support contract you forgot to budget
This one is small in absolute dollars but disproportionately painful when you discover it. AWS Basic support (the free tier) does not get you a human being on a bad day. It gets you a documentation portal and the AWS forums.
The minimum tier that gets you a human around the clock is Business support, and it scales with your bill: 10% of the first $10k of monthly AWS spend, with a floor around $100/month, and lower marginal rates above that (7% on the next $70k under the schedule AWS has long published), so the total still gets large quickly. At a small startup spending $2k/month on AWS, that's $200/month, or $2,400/year. At a growth-stage company spending $50k/month, you're at roughly $3,800/month, or about $45k/year, to have someone to call.
Without it, "RDS is unresponsive and I don't know why" is a forum post. With it, it's a chat with a support engineer, with a response target of under an hour for a production-down case. Teams that skip the support contract to save money discover its value during the exact incident where they cannot afford to discover it.
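To make the scaling concrete, here is a minimal sketch of the Business-tier fee as a function of monthly spend. It assumes the stepped schedule AWS has long published (10% of the first $10k, 7% up to $80k, 5% up to $250k, 3% beyond, with a $100 floor). Treat those bands as an assumption and check the current pricing page. Note that under this schedule a flat 10% overstates the fee once spend passes $10k/month. `BUSINESS_TIERS` and `business_support_monthly` are illustrative names.

```python
# Assumed schedule: (upper bound of the band in USD/month, rate in percent).
BUSINESS_TIERS = [
    (10_000, 10),
    (80_000, 7),
    (250_000, 5),
    (float("inf"), 3),
]
MONTHLY_FLOOR = 100  # assumed minimum monthly fee in USD


def business_support_monthly(spend: float) -> float:
    """Monthly support fee for a given monthly AWS spend, under the assumed bands."""
    fee = 0.0
    lower = 0
    for upper, percent in BUSINESS_TIERS:
        if spend > lower:
            # Only the slice of spend that falls inside this band is charged at its rate.
            fee += (min(spend, upper) - lower) * percent / 100
        lower = upper
    return max(fee, MONTHLY_FLOOR)


for spend in (500, 2_000, 5_000, 50_000):
    monthly = business_support_monthly(spend)
    print(f"${spend:,}/month spend -> ${monthly:,.0f}/month, ${monthly * 12:,.0f}/year")
# $500/month spend -> $100/month, $1,200/year
# $2,000/month spend -> $200/month, $2,400/year
# $5,000/month spend -> $500/month, $6,000/year
# $50,000/month spend -> $3,800/month, $45,600/year
```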
The opportunity cost of platform work vs product work
This is the least-budgeted line of all, and it's the one I care about most, because it's the one that determines whether your company exists in two years.
Every hour your senior engineers spend on AWS plumbing is an hour they are not spending on the product. At a 20-engineer team, your senior engineers are the leverage points. They are the ones who can architect the feature that wins the enterprise deal, who can debug the performance issue that's bleeding your retention, who can mentor the four mid-level engineers who joined last quarter. When they're rotating IAM policies and arguing with CloudFormation, none of that is happening.
I've seen this concretely. A team I worked with spent six months building out their own deployment platform on EKS because "we'll save money in the long run." During those six months, a competitor with half their headcount on a real PaaS shipped three features they'd had on their roadmap, signed two of their target customers, and closed a Series A on the strength of it. The first team is still on EKS. They saved an estimated $3k/month on hosting. They lost the market.
This is the bet you are making, implicitly, when you choose vanilla cloud at a stage where you don't need to. You are betting that the platform work is more valuable than the product work, on a team where the product work is the only thing keeping you alive.
Vanilla cloud as the right answer
I'd be a bad solutions engineer if I didn't tell you when to ignore this entire essay. Vanilla cloud is the right choice in four cases:
- You have a dedicated platform engineer. Not a generalist who "also does DevOps." Someone whose entire job is the platform, whose performance reviews are about the platform, who wakes up thinking about the platform. If you have that person, and ideally a small team around them, the math changes.
- You have a regulated compliance posture that demands your own VPC. FedRAMP High, certain HIPAA configurations, PCI environments where the auditor wants to see your networking, defense-industrial customers who require deployment into their tenancy. If your customers' contracts include the phrase "tenant-isolated infrastructure," you're on AWS direct, and a PaaS likely can't help you.
- You have exotic workloads. Multi-thousand-GPU training clusters with specialized interconnect. Workloads with bare-metal latency requirements. Custom kernel modules. Anything where the PaaS abstraction gets in the way of the thing you're trying to do.
- You have a real Enterprise Discount Program. If you've committed to $10M+ of cloud spend over three years and you're getting 30%+ off list, the math on your bill is materially different, and you need someone (probably that dedicated platform engineer) figuring out how to spend the commit.
If you are not in one of those four buckets, the rest of this essay applies to you.
The PaaS premium math
Let's do the numbers, because teams who roll their own AWS have rarely done the comparison honestly.
A 20-engineer team on a Pro-tier PaaS, at $20/seat/month, is paying $400/month, or $4,800/year. Add the underlying infrastructure spend (let's call it $5k/month for a growth-stage product), and you're at roughly $65k/year all-in.
Compare against the vanilla cloud posture for the same team:
- One quarter of a senior engineer for setup: $62k, one-time, plus probably another $30k of overrun.
- 20% of a senior engineer in perpetuity: $50k/year.
- AWS Business support: roughly $6k/year at this scale.
- The infrastructure spend itself: $5k/month, same as above, $60k/year.
Year one, vanilla cloud: $62k + $30k + $50k + $6k + $60k = $208k. Year two onward: $116k/year.
Year one, PaaS: $65k. Year two onward: $65k.
The PaaS pays for itself, in cash, in about 90 days: the setup quarter alone costs $62k in engineering time, close to a full year of the PaaS with infrastructure included. And that's before we count the opportunity cost; if you believe (as I do) that the senior engineer time saved is worth two to three times its fully-loaded cost in shipped product, the gap widens several times over.
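If you want to rerun the comparison with your own numbers, the whole model fits in a few lines. This sketch uses only the figures from this section; the `Scenario` class and its field names are illustrative, and the PaaS total comes out at $64,800 before the rounding to $65k used above.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    one_time: int   # USD spent once, in year one
    recurring: int  # USD spent every year

    def cumulative(self, years: int) -> int:
        """Total spend after `years` years."""
        return self.one_time + self.recurring * years


INFRA = 5_000 * 12  # same underlying infrastructure spend in both scenarios

vanilla = Scenario(
    one_time=62_000 + 30_000,           # setup quarter plus overrun
    recurring=50_000 + 6_000 + INFRA,   # ops time, Business support, infrastructure
)
paas = Scenario(
    one_time=0,
    recurring=20 * 20 * 12 + INFRA,     # 20 seats at $20/seat/month, plus infrastructure
)

for years in (1, 2, 3):
    v, p = vanilla.cumulative(years), paas.cumulative(years)
    print(f"after year {years}: vanilla ${v:,}, PaaS ${p:,}, gap ${v - p:,}")
# after year 1: vanilla $208,000, PaaS $64,800, gap $143,200
# after year 2: vanilla $324,000, PaaS $129,600, gap $194,400
# after year 3: vanilla $440,000, PaaS $194,400, gap $245,600
```

None of this includes opportunity cost, which is the larger number and the harder one to put in a spreadsheet.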
Closing
Most teams are paying the vanilla cloud tax because they think "free AWS" means "free." It doesn't. It means "the bill is on my payroll instead of on my AWS account," which is worse, because the bill on your payroll is paid in the currency you can least afford to spend: the attention of the people who could be building your product.
Pick the right tool for the right job. If you have a dedicated platform engineer, a regulated tenancy requirement, exotic workloads, or a committed-spend EDP, run your own AWS. You'll get something for the tax you're paying.
For everyone else, in 2026, the right tool is a real PaaS. Give yourself the quarter back. Give your senior engineers their day-a-week back. Spend it on the product, on the customers, on the things that determine whether you're still here in two years.
Happy shipping.
Angelo
Angelo Saraceno is a Solutions Engineer at Railway. Before Railway he was at Citrix, working inside Verizon and Lockheed environments, so he has seen what "enterprise IaaS" looks like after the slides come down. He writes about infrastructure, deployment, and the gap between how cloud is sold and how it runs in practice.