Infrastructure Scaling Framework

A decision framework for right-sizing your startup infrastructure

Learn when to scale, when to stay simple, and how to avoid the complexity trap that kills startups. This framework helps you make infrastructure decisions based on reality, not hypotheticals.

Want to learn more about the philosophy behind this framework? Read our related blog post: Right-Size Your Infrastructure: Avoiding the Complexity Trap

The Core Question

"Is this infrastructure decision driven by paying customers, or hypothetical scenarios?"

Decision Gates

A proposed change must pass ALL gates before you scale infrastructure:

Gate	Question	Red Flag
Revenue Gate	Do we have paying customers demanding this?	"We might need this for future customers"
Cost/Revenue Ratio	Is infra cost < 20% of revenue?	Infra costs growing faster than revenue
Complexity Trigger	What specific problem does this solve?	"It would be nice to have" or "best practice"
Reversibility Check	Can we undo this in < 2 weeks?	Architectural decisions that lock us in

Team-Size Infrastructure Ceilings

Team Size	Max Monthly Infra	Recommended Complexity
Pre-PMF (1-10)	$1,000-2,000	Managed services only (Cloud Run, Fargate, RDS)
Early Traction (10-25)	$3,000-5,000	Single-region, single-tenant simplicity
Scaling (25-50)	$5,000-15,000	Multitenancy required before multi-region
Growth (50+)	Revenue-justified	EKS/GKE only if ops team exists
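
As a rough sketch, the ceilings in the table can be turned into a lookup. The table's ranges overlap at 10, 25, and 50 people, so this example assumes each stage starts at its lower bound (a team of 10 counts as Early Traction, a team of 50 as Growth). The names `CEILINGS`, `infra_ceiling`, and `over_ceiling` are illustrative, not part of the framework.

```python
from typing import Optional

# (minimum team size, stage, max monthly infra in USD).
# None means "revenue-justified": the table gives no fixed ceiling.
# Assumption: each stage begins at its lower bound, largest stage first.
CEILINGS = [
    (50, "Growth", None),
    (25, "Scaling", 15_000),
    (10, "Early Traction", 5_000),
    (1, "Pre-PMF", 2_000),
]


def infra_ceiling(team_size: int) -> tuple[str, Optional[int]]:
    """Return the stage name and the upper monthly infra ceiling for a team size."""
    if team_size < 1:
        raise ValueError("team_size must be at least 1")
    for minimum, stage, ceiling in CEILINGS:
        if team_size >= minimum:
            return stage, ceiling
    raise AssertionError("unreachable: the last tier starts at 1")


def over_ceiling(team_size: int, monthly_infra_cost: float) -> bool:
    """True when spend exceeds the fixed ceiling for the team's stage."""
    _, ceiling = infra_ceiling(team_size)
    return ceiling is not None and monthly_infra_cost > ceiling
```

For the Growth stage the function never reports an overrun, because that judgment depends on revenue rather than a fixed number.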

The "Why Kubernetes?" Litmus Test

Before adopting Kubernetes (K8s), you must answer YES to at least 3 of these 5 questions:

Do we have 10+ microservices that need orchestration?

Do we have dedicated ops/platform engineers?

Is our monthly container spend already > $5,000?

Do paying customers require specific deployment models?

Have we exhausted managed alternatives (ECS, Cloud Run, Fargate)?
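
One way to make the litmus test hard to fudge is to record an explicit answer to every question and count the YES answers. The sketch below assumes short, made-up keys for the five questions; `passes_k8s_litmus` is a hypothetical helper, not an established tool.

```python
# Hypothetical short keys mapped to the five questions above.
K8S_QUESTIONS = {
    "ten_plus_microservices": "Do we have 10+ microservices that need orchestration?",
    "dedicated_ops": "Do we have dedicated ops/platform engineers?",
    "container_spend_over_5k": "Is our monthly container spend already > $5,000?",
    "customers_require_model": "Do paying customers require specific deployment models?",
    "managed_exhausted": "Have we exhausted managed alternatives?",
}


def passes_k8s_litmus(answers: dict[str, bool], required_yes: int = 3) -> bool:
    """Return True when at least `required_yes` questions are answered YES.

    Every question must be answered explicitly; a missing answer raises
    instead of silently counting as NO.
    """
    unanswered = set(K8S_QUESTIONS) - set(answers)
    if unanswered:
        raise ValueError(f"unanswered questions: {sorted(unanswered)}")
    yes_count = sum(1 for key in K8S_QUESTIONS if answers[key])
    return yes_count >= required_yes
```

Raising on a missing answer is a deliberate choice here: skipping a question is treated as a process error rather than a quiet NO.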

Usage

Apply each decision gate sequentially. If any gate fails, STOP and reconsider the infrastructure change.
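
The sequential, stop-at-first-failure rule can be sketched as a short function. This is a minimal illustration, assuming a hypothetical `Proposal` record whose fields are filled in by a person; only the 20% ratio and the 2-week limit come from the gates table, and the yes/no judgments remain human calls.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Proposal:
    """Hypothetical description of a proposed infrastructure change."""
    paying_customers_demanding: int  # customers who pay today and asked for this
    monthly_infra_cost: float        # projected total infra cost after the change
    monthly_revenue: float
    solves_specific_problem: bool    # a named problem, not "best practice"
    weeks_to_undo: float             # estimated time to roll the change back


# Gates in the order they appear in the table; each predicate returns True on pass.
GATES: list[tuple[str, Callable[[Proposal], bool]]] = [
    ("Revenue Gate", lambda p: p.paying_customers_demanding > 0),
    ("Cost/Revenue Ratio",
     lambda p: p.monthly_revenue > 0
     and p.monthly_infra_cost / p.monthly_revenue < 0.20),
    ("Complexity Trigger", lambda p: p.solves_specific_problem),
    ("Reversibility Check", lambda p: p.weeks_to_undo < 2),
]


def first_failed_gate(proposal: Proposal) -> Optional[str]:
    """Evaluate gates in order and return the first one that fails, or None."""
    for name, passes in GATES:
        if not passes(proposal):
            return name  # STOP: later gates are not evaluated
    return None
```

Returning the name of the failed gate, rather than a bare boolean, keeps the conversation focused on which question the proposal could not answer.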

Strategy Change Audit Checklist

Trigger: Run this audit whenever the business strategy changes (pivot to SaaS, abandon a product line, change target market, etc.)

Questions to Answer

What infrastructure was built specifically for the old strategy?
- List all components added to support the previous direction
- Example: "Multi-region EKS was built for portable deployments"
Is this infrastructure still needed?
- For each component, ask: "Does the NEW strategy require this?"
- If no → schedule removal or simplification
What's the monthly cost of orphaned infrastructure?
- Calculate: sum the monthly cost of every component the new strategy no longer needs
- This is money you're burning for nothing
What's the simplest architecture for the new strategy?
- Start from zero: "If we were building for THIS strategy today, what would we build?"
- Compare to current state
What's the migration cost vs. ongoing waste?
- If migration takes 2 weeks and saves $5k/month, the work pays for itself once cumulative savings exceed the cost of those 2 weeks of engineering time
- Factor in reduced complexity and cognitive load
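
The cost questions above reduce to two small calculations: the monthly cost of orphaned components, and the payback period of removing them. The sketch below uses invented component names and an assumed migration cost; only the $5k/month savings figure echoes the example in the checklist.

```python
def orphaned_monthly_cost(components: dict[str, float], still_needed: set[str]) -> float:
    """Sum the monthly cost of components the new strategy does not require."""
    return sum(cost for name, cost in components.items() if name not in still_needed)


def payback_months(migration_cost: float, monthly_savings: float) -> float:
    """Months of savings needed to recover the one-off migration cost."""
    if monthly_savings <= 0:
        raise ValueError("monthly_savings must be positive")
    return migration_cost / monthly_savings


# Hypothetical inventory: names and prices are made up for illustration.
components = {"multi-region-eks": 4200.0, "managed-db": 800.0, "cdn": 150.0}
waste = orphaned_monthly_cost(components, still_needed={"managed-db", "cdn"})

# Assumption: 2 engineers for 2 weeks at $4,000 per engineer-week = $16,000.
months = payback_months(migration_cost=16_000, monthly_savings=5_000)
```

With these assumed numbers the orphaned spend is $4,200 per month, and a $16,000 migration that saves $5,000 per month is recovered in a little over three months. The calculation leaves out the reduced complexity and cognitive load, which argue for migrating sooner.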

The "Big Fish" Trap

Warning Sign: Building infrastructure to capture ONE specific prospect or customer.

Questions Before Chasing the Big Fish

Has this prospect committed money (deposit, LOI, signed contract)?

Have 3+ other prospects validated they'd pay for the same thing?

Is the prospect's need aligned with our core product direction?

Can we serve this prospect WITHOUT major architecture changes?

If this prospect says no, is the infrastructure still valuable?

Rule: If you can't check at least 3 boxes, DON'T build custom infrastructure for them.

The Trap in Action

"One customer built a solution using our OSS tools. Leadership decided to rebuild what they built so we could sell it to them."

Problems:

Customer already solved their problem - why pay you?
Sample size of one driving architecture decisions
Building on speculation, not validation

Container Architecture Checklist

Before deploying containers, verify you're not creating scaling bottlenecks.

Anti-Pattern: The Monolith Container

❌BAD: Multiple processes in one container

nginx + API + UI + supervisor

- Can't scale independently
- State prevents scaling
- Deployment = downtime

✅GOOD: Separate concerns

UI (CDN) + API (stateless) + database (managed)

Container Readiness Checklist

One process per container?

Stateless (no local storage dependencies)?

Can run multiple replicas without conflict?

Health checks that accurately reflect readiness?

Graceful shutdown handling?

If any box is unchecked: Fix before scaling, or accept that horizontal scaling won't work.
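
To illustrate the last two boxes, here is a minimal sketch of a stateless HTTP service with a readiness endpoint and SIGTERM handling, using only the Python standard library. The `/readyz` path, port 8080, and the function names are assumptions for this example; a real service would also check its dependencies before reporting ready.

```python
import signal
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Set once the process has been asked to stop; readiness then reports 503.
shutting_down = threading.Event()


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/readyz":
            # Report not-ready during shutdown so traffic is routed elsewhere.
            status = 503 if shutting_down.is_set() else 200
        elif self.path == "/":
            status = 200
        else:
            status = 404
        body = {200: b"ok\n", 503: b"shutting down\n"}.get(status, b"not found\n")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def make_server(port: int = 8080) -> ThreadingHTTPServer:
    server = ThreadingHTTPServer(("0.0.0.0", port), Handler)
    # Non-daemon handler threads let server_close() wait for in-flight requests.
    server.daemon_threads = False
    return server


def serve(port: int = 8080) -> None:
    """Run until SIGTERM, then stop accepting requests and exit cleanly."""
    server = make_server(port)

    def on_sigterm(signum, frame):
        shutting_down.set()
        # shutdown() blocks until serve_forever() returns, and this handler
        # runs on the serving thread, so call it from a separate thread.
        threading.Thread(target=server.shutdown, daemon=True).start()

    signal.signal(signal.SIGTERM, on_sigterm)
    server.serve_forever()
    server.server_close()
```

Calling `serve()` as the container entrypoint keeps one process per container. Because the handler holds no local state, several replicas of this process can run side by side without conflict.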

Gate 5: Minimum Viable Infrastructure

Question: Is this the simplest solution that meets the requirement?

Even legitimate requirements can be solved with varying levels of complexity. The right question isn't "can we manage this?" but "what's the minimum infrastructure that solves this problem?"

The Trap

Good operational practices (GitOps, ArgoCD, Terraform) can make complexity manageable without making it necessary. "We can manage it" is not the same as "we should build it this way."

Examples

Requirement	Over-Engineered	Right-Sized
Data residency	Full EKS cluster in region	Managed DB in region + existing compute
Customer isolation	Cluster per customer	Namespace per customer
High availability	Multi-region active-active	Single region with AZ redundancy
Portable deployments	Helm + K8s everywhere	Docker Compose + documentation
Blue-green deploys	Custom orchestration	Managed service feature (Cloud Run revisions)

Before Building, Ask

What's the actual requirement? (Not the solution someone proposed)

Can a managed service handle this?

What's the simplest architecture that meets the requirement?

Are we building this because we *can* manage it, or because we *should*?

Rule: If you can solve the problem with a managed service or simpler architecture, do that first. You can always add complexity later—removing it is much harder.

Gate 0: Market Validation (Before Any Infrastructure)

The Root Question: Before asking "what infrastructure do we need?", ask "is there a market willing to pay for this?"

Infrastructure decisions are downstream of product-market fit. You can right-size infrastructure perfectly and still fail if you're building for a market that doesn't exist or won't pay.

The Warning Signs

Feature parity isn't differentiation: If competitors bundle your core offering into broader tools, you're competing against "free"
Utility ≠ willingness to pay: Open-source adoption validates usefulness, not revenue potential
"Why would someone pay for this?": If you can't answer this clearly, don't build infrastructure for it

Market Validation Checklist

Is there a market willing to pay for this as a standalone product?

What's our differentiation vs. competitors who bundle this feature?

Have we validated pricing with actual prospects (not just interest)?

If larger players offer this 'free' as part of broader tools, what's our wedge?

Can we articulate why a customer would choose us over bundled alternatives?

The Multiplier Effect

Market validation failure + infrastructure over-engineering = accelerated runway burn.

The infrastructure wasn't the root cause of failure—it was a multiplier on a market validation problem.

Rule: Validate the market before validating the architecture. A perfectly right-sized infrastructure for the wrong product is still a waste.
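
The multiplier effect can be made concrete with simple runway arithmetic. This is an illustrative sketch with hypothetical figures; `runway_months` and `runway_lost_to_infra` are names invented for the example.

```python
def runway_months(cash: float, monthly_burn: float) -> float:
    """Months until cash runs out at a constant monthly burn."""
    if monthly_burn <= 0:
        raise ValueError("monthly_burn must be positive")
    return cash / monthly_burn


def runway_lost_to_infra(cash: float, base_burn: float, excess_infra: float) -> float:
    """Months of runway given up to infrastructure spend the product does not need."""
    return runway_months(cash, base_burn) - runway_months(cash, base_burn + excess_infra)


# Hypothetical company: $600k in the bank, $50k/month burn without the
# excess infrastructure, $10k/month of over-engineered infrastructure.
lost = runway_lost_to_infra(cash=600_000, base_burn=50_000, excess_infra=10_000)
```

Under these assumed figures the excess infrastructure shortens runway from 12 months to 10. It does not cause the failure, but it removes two months that could have gone to validating the market.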

Framework Summary: Gate Sequence

🚪Gate 0: Market Validation

↓ (Pass before ANY infrastructure work)

💰Gate 1: Revenue Gate

↓

📊Gate 2: Cost/Revenue Ratio

↓

🎯Gate 3: Complexity Trigger

↓

🔄Gate 4: Reversibility Check

↓

⚡Gate 5: Minimum Viable Infrastructure

Stop at any failed gate. Don't build infrastructure for markets that won't pay, products without differentiation, or hypothetical customers.
