Kill your onboarding: selling to 10,000+ new users a day
When I was at Looker (later Google, after the acquisition), I spent most of my time in rooms with people who had bought a BI tool and were trying to figure out what to do with it. I sold a lot in those rooms and the pattern was almost always the same.
The customer would describe a question they wanted to answer.
I'd ask what data they had.
They'd hesitate.
Sometimes they had a warehouse but no events. Sometimes they had events but no warehouse. Sometimes they had both. The question they wanted to answer was almost always reasonable. The infrastructure to answer it almost never existed yet. (That gap was the entire Big Data wave of 2010–2020.)
The lesson I took from four years of that is unglamorous: you need to gather and clean your data. The interesting analysis you want to run in 2026 depends on events you're logging in 2024.
If you aren't logging them, no amount of cleverness will help.
Recently, I joined Railway as a Solutions Engineer expecting to apply some version of this lesson to a smaller, scrappier stack. Instead I walked into the opposite problem.
Before any of this, we were doing the most common-denominator version of PLG-to-sales onboarding.
Our email onboarding was last touched in 2022, and again in 2024, which means the program looked like every other PLG email program trying to convert sales leads.
If you were one of the 2.9 million users who signed up before April 2026, you'd remember it: a generic welcome sequence fired on signup. "Welcome to Railway, here's a template, here's our docs, here's our Discord."
This had an open rate of around 27%. Reply rate was, generously, a rounding error.
On top of that, we had no idea who it was talking to. Every sales-ICP signup got treated the same: a hobbyist deploying a side project and a platform team at a big org standing up production infra got the exact same "here's a template!" email on day one.
We had a targeting problem dressed up as an email problem.
As such, we advise companies to kill their generic sales onboarding entirely.
It’s common to say that data is oil. (Editor’s note: we think of data more as waste plutonium.)
My colleagues Angelo and Echo had spent time instrumenting the funnel, but no one had looked at it end to end, from signup to closed-won, since the product matured into its Enterprise offering.
Now, Railway gets 10,000+ signups a day. And as a proud product-led company, you sign up with GitHub or an email and you can deploy something in seconds. We deliberately don't ask "what company are you with?" or "how many engineers?"
There's a meme going around about Granola-style onboarding flows that interrogate you before letting you do anything; we are aggressively not that.
The tradeoff is obvious in retrospect: if you don't ask, you don't know. Real companies, the kind that will deploy production workloads, get lost in the noise.
When I opened Hex, I got the number that framed the whole project: we had identified roughly 21,000 accounts that fit our sales ICP (rough rubric: company size, market, region, funding stage). We had engaged less than 1% of them.
For context, Railway has one AE and now two SEs, including me, and no system existed to surface those accounts in a way a tiny team could act on. (Especially with demos booked at capacity.)
In a 1:1, our founder Cooper put it well: "All the oil rigs are in the right place. Nobody's drilling."
This was the inverse of every Looker conversation I'd ever had. PostHog had been collecting product events for a long time. The signup metadata was rich.
Our internal system ("backboard") had a granular view of what was actually running in each project: services, instance sizes, templates, database connections, deploy outcomes. I then set up dbt models on top of the warehouse, with Hex sitting in front for analysis.
The right question, for me, turned out to be: which of these 10,000 daily signups are companies, and which of those companies need help right now?
There's a temptation, at this point, to sit in a room with the sales team and guess what a "company-ish" signal looks like.
Don't. You'll pick the things that flatter your priors. (At Looker I watched a lot of dashboards built this way; they were beautiful and confirmed everything the sponsor already believed.)
What I did instead, in Hex:
- Pull every dimension off our `customers` and `workspaces` tables.
- For each dimension or behavior ("connected a database," "reset credentials," "downloaded a trust document," "added a teammate"), compute a ratio: rate among accounts we already knew were companies vs. rate among accounts we knew were hobbyists.
- Sort by uplift.
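The loop above is simple enough to sketch end to end. This is a minimal, self-contained version of the uplift computation; the feature names and toy data are illustrative assumptions, not Railway's actual schema or numbers.

```python
# Hedged sketch: per-feature uplift ratio between accounts known to be
# companies and accounts known to be hobbyists. Feature names are invented
# for illustration.

def uplift_ratios(accounts, features, label="is_company", floor=1e-6):
    """accounts: list of dicts with boolean feature flags plus a label.
    Returns {feature: company_rate / hobbyist_rate}, sorted descending."""
    companies = [a for a in accounts if a[label]]
    hobbyists = [a for a in accounts if not a[label]]
    ratios = {}
    for f in features:
        comp_rate = sum(a.get(f, False) for a in companies) / max(len(companies), 1)
        hob_rate = sum(a.get(f, False) for a in hobbyists) / max(len(hobbyists), 1)
        ratios[f] = comp_rate / max(hob_rate, floor)  # floor avoids div-by-zero
    return dict(sorted(ratios.items(), key=lambda kv: kv[1], reverse=True))

accounts = [
    {"is_company": True,  "downloaded_trust_doc": True,  "connected_db": True},
    {"is_company": True,  "downloaded_trust_doc": False, "connected_db": True},
    {"is_company": False, "downloaded_trust_doc": False, "connected_db": True},
    {"is_company": False, "downloaded_trust_doc": False, "connected_db": False},
]
print(uplift_ratios(accounts, ["downloaded_trust_doc", "connected_db"]))
```

In practice the "accounts" list is a warehouse query, not a literal, but the ratio-and-sort step is exactly this shape.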
Hex made this tractable because it has a registry of the schema and an LLM that can write the boilerplate, so I could iterate on prompts instead of writing the same GROUP BY fifty times. It is genuinely the part that would have taken weeks five years ago, when I was hand-rolling the same diagnostic queries in LookML.
Then I had to throw out a class of signals that looked amazing but were useless: anything deterministic. SSO usage, for example, has an absurdly high uplift… but only customers who've already had a sales conversation can turn it on.
So it tells you nothing about unmasking a stranger; it's the predictive equivalent of 1 = 1.
The same logic killed a few other tempting features. The rule I ended up with: a signal only counts if a hobbyist could plausibly trip it but rarely does.
For aspiring GTM Engineers, the features that look most predictive in a confusion matrix are often the ones where the label is leaking into the input.
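That rule can be expressed as a filter over the same per-feature rates: a signal survives only if hobbyists *can* trip it (nonzero base rate, otherwise the label is leaking into the input) but rarely do (high uplift). All names, rates, and thresholds below are illustrative assumptions.

```python
# Hedged sketch of the "probabilistic signals only" rule.

def filter_signals(rates, min_hobbyist_rate=0.001, min_uplift=3.0):
    """rates: {feature: (company_rate, hobbyist_rate)} -> surviving features."""
    kept = []
    for feature, (comp, hob) in rates.items():
        if hob < min_hobbyist_rate:     # deterministic / leaky: only accounts
            continue                    # we already know can trip it (e.g. SSO)
        if comp / hob >= min_uplift:    # rare-but-possible for hobbyists
            kept.append(feature)
    return kept

rates = {
    "sso_enabled":        (0.40, 0.0),    # dropped: label leakage
    "trust_doc_download": (0.12, 0.004),  # kept: 30x uplift
    "deployed_anything":  (0.95, 0.90),   # dropped: ~1x uplift
}
print(filter_signals(rates))  # → ['trust_doc_download']
```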
What survived? A few things, in roughly this order of strength:
- Trust center document downloads. This was by far the strongest probabilistic signal. Hobbyists do not download SOC 2 reports. When sales reaches out to someone who downloaded one, the response rate is around 50%, which is not a normal email response rate.
- Seat count and growth of seats. This works because Railway removed seat pricing last year: with no cost per seat, adding teammates reflects genuine usage, which makes it a more predictive signal.
- Credential resets. This one surprised me. Best guess: stricter internal security policies and password rotation requirements. It could also be users hitting our auth flow weirdly. It’s worth more digging.
- Database connections to managed/external DBs. Makes sense: companies plug in real data faster than hobbyists do.
- Specific deploy failure patterns. Less about "failed once" and more about "is wrestling with a real production-shaped problem."
None of these is individually decisive, so I combined them into a score.
I weighted each signal by its uplift ratio and summed the results into a per-account score. That's it: pure classical data science, a linear combination of hand-picked features with weights derived from observed lift. No LLMs.
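A minimal sketch of that scoring step, with invented weights (Railway's actual values are derived from the uplift ratios and aren't shown here):

```python
# Hedged sketch: linear score over boolean signals. Weights are illustrative,
# not Railway's real numbers.

WEIGHTS = {
    "trust_doc_download": 5.0,
    "seat_growth": 3.0,
    "credential_reset": 2.0,
    "external_db_connected": 2.0,
    "prod_shaped_deploy_failure": 1.5,
}

def account_score(signals):
    """signals: {signal_name: bool} -> (score, legible list of reasons)."""
    hits = [s for s, fired in signals.items() if fired and s in WEIGHTS]
    score = sum(WEIGHTS[s] for s in hits)
    return score, hits  # the hit list is what a salesperson actually reads

score, reasons = account_score(
    {"trust_doc_download": True, "seat_growth": True, "credential_reset": False}
)
print(score, reasons)  # 8.0 ['trust_doc_download', 'seat_growth']
```

Returning the hit list alongside the number is the legibility point made below: the reasons travel with the score.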
I considered something fancier and decided against it for two reasons:
- We were usually working with one day of behavioral data. Anything more sophisticated than a weighted sum would be overfitting to a sample size of "vibes."
- The score has to be legible to the sales team. If Rahul, our AE, is going to act on a high score, he needs to be able to look at the row and immediately understand why it's high. "Trust doc + 8 seats + DB connected" is legible. A 0.87 from a gradient-boosted tree is not.
I learned the same thing the hard way: the most accurate model in the world is worthless if the human who has to act on its output can't tell when it's wrong. Legibility beats marginal accuracy almost every time.
The score then drives two paths:
- Mid-tier scores → automated behavioral emails through Customer.io, sent from a real person's inbox.
- High scores → routed to Rahul (our AE) + Solutions for a human reach-out.
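The routing itself is a two-threshold switch. Thresholds here are made up; in practice they'd be tuned to the AE's actual capacity.

```python
# Hedged sketch of the two-path routing. HIGH/MID are illustrative cutoffs.

HIGH, MID = 7.0, 3.0

def route(account_id, score):
    if score >= HIGH:
        return ("ae_queue", account_id)         # human reach-out from the AE
    if score >= MID:
        return ("behavioral_email", account_id)  # Customer.io, real inbox
    return ("no_action", account_id)             # stays in PLG self-serve

print(route("acct_1", 8.5))  # ('ae_queue', 'acct_1')
print(route("acct_2", 4.0))  # ('behavioral_email', 'acct_2')
```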
That's the whole architecture, and I admit it's not clever. The cleverness, to the extent there is any, is in the signal selection.
The Customer.io campaign to our sales-ICP isn't a welcome sequence. Welcome sequences are how you get low open rates.
Instead, emails fire on events that meet two conditions simultaneously:
- The event is correlated with being a company.
- The event is correlated with the user being mildly stuck or making a real commitment right now.
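The two conditions above amount to an intersection over the event catalog. The catalog below is invented for illustration; the real lists come from the uplift analysis.

```python
# Hedged sketch of the trigger rule: an email fires only for events that are
# both company-correlated AND time-sensitive ("stuck or committing right now").
# Event names are illustrative assumptions.

COMPANY_CORRELATED = {"external_db_connected", "teammate_added",
                      "trust_doc_download", "deploy_failed_prod_shaped"}
TIME_SENSITIVE = {"deploy_failed_prod_shaped", "external_db_connected",
                  "trust_doc_download"}

def should_trigger(event):
    return event in COMPANY_CORRELATED and event in TIME_SENSITIVE

print(should_trigger("deploy_failed_prod_shaped"))  # True
print(should_trigger("teammate_added"))             # False: company signal,
                                                    # but not urgent
```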
The email goes out shortly after the event, from one of our actual sales people, and it offers help. It does not ask for a meeting. It does not pitch the enterprise tier. It says some version of "saw you hit X, anything I can do?"
On day one, with about 300–400 emails sent, open rates by trigger ranged from ~50% to ~70% in the first 24 hours, vs. the ~27% baseline of the old broad campaign.
The usual caveats apply, in roughly the order you're going to point them out in the comments:
- Open rate is partially gamed by Apple Mail Privacy Protection prefetching pixels. Reply rate is the number that matters, and we've had two real replies on day one.
- Selection bias is enormous. We're emailing the people most likely to want to talk to us, on the day they're most likely to want to talk to us. The honest comparison isn't "27% baseline," it's "what would these specific accounts have done with no email, or with a generic one." I don't have that A/B test yet. We'll set it up.
- N=300 is small. The variance on these open rates is wide.
I'm posting the numbers anyway, because the alternative (waiting until we have a clean six-month cohort study) is how interesting things never get written about.
Now, hot take territory. The reason you haven't seen much lift from GTM Engineering is that, well, we don't train GTM Engineers.
The standard PLG-to-sales path is: hire Business Development Representatives (cold-callers, usually new grads), then have them prospect from LinkedIn and Apollo, hand qualified leads to AEs, repeat.
Those teams exist because most startups need outbound to grow at all. I built and ran that motion in different shapes at IBM and at Firebolt; it works, but it's expensive, and it's slow to ramp.
Railway grew through the product and through marketing. We didn't have to build that team to hit our current size.
Keep in mind, the pitch isn't "replace your sales team with a SQL query."
It's: if you're a PLG company and you already have product telemetry, you probably have an unreasonably valuable list of warm accounts you're not talking to, and the cost of starting the conversation is mostly an afternoon in Hex and a Customer.io account.
This is the advice I would have given the Looker version of myself.
(Editor's note: Railway does well when the businesses on Railway do well, which is why we share these insights for free.)
- Instrument event data on day one. PostHog (which we like and use at Railway) or whichever competitor you prefer. The cost of not having historical events when you finally need them is high and asymmetric.
- Don't forget SaaS retention windows. Most behavioral tools (including Customer.io) keep ~90 days of event history. ETL it out to your warehouse; you will want to ask questions in 18 months that the vendor's UI cannot answer.
- Brute-force your features before you model them. Compute the company-vs-hobbyist rate for every dimension you have instead of guessing which ones matter.
- Throw out deterministic signals. They will inflate your model's apparent accuracy and tell you nothing about the people you can't already see.
- Keep the score legible. A linear weighted sum that a salesperson can read is worth more than a black-box classifier.
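The retention-window point above is the one most worth automating early. A minimal sketch of a nightly export job follows; `fetch_events_since` is a hypothetical stand-in, not a real Customer.io API call, and the NDJSON landing path is an assumption.

```python
# Hedged sketch: nightly, pull everything newer than the last watermark from
# the behavioral tool's export API (stubbed here) and land it in the warehouse
# before the vendor's ~90-day window expires.
import json
import datetime

def fetch_events_since(ts):
    """Hypothetical stand-in for a vendor export call; swap in the real client."""
    return [{"event": "email_opened", "ts": ts.isoformat()}]

def run_nightly_export(watermark, sink_path):
    events = fetch_events_since(watermark)
    with open(sink_path, "a") as f:   # append-only NDJSON landing zone;
        for e in events:              # dbt models read from here
            f.write(json.dumps(e) + "\n")
    return len(events)

n = run_nightly_export(datetime.datetime(2026, 4, 1), "/tmp/events.ndjson")
print(n)  # number of events landed
```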
The last piece, and the part I'm most excited about, is what happens after the email reply.
We built two internal applications to collapse that window.
The first wraps DocuSign. When Rahul finishes a call and agrees on terms, he doesn't open a template, hunt for the right legal entity, or ping ops to generate a redline. He fills out an order form inside an internal tool built and maintained by our operations team. The second ties the entitlement to our product. That's a whole separate post.
That's the part I find genuinely interesting, and the part I think more PLG companies should steal: once you've got the data plumbing in place to find the right accounts, the marginal cost of using that same plumbing to close and provision them is small.
As we said, the oil rigs were already there. We finally sent that oil to the refinery.
If you are a highly motivated team looking to ship fast on a Cloud platform that keeps up with your agents: try Railway out at https://dev.new

