Building a Hybrid Multi-Tenant Architecture for Stateful Services on AWS

Source: AWS Architecture


Running a large-scale ad-serving infrastructure presents unique challenges when balancing tenant isolation with operational efficiency. Our infrastructure handles millions of requests per second and generates billions of dollars in annual advertising revenue, serving ads across multiple properties and systems.

The cellular architecture problem

Earlier, we used a cellular architecture in which we allocated each tenant a dedicated AWS account containing an Application Load Balancer (ALB) and Amazon Elastic Container Service (Amazon ECS). This approach provided strong isolation but created the following significant operational challenges.

  • The scale problem: Supporting only 18 clients across four AWS Regions required 181 separate deployment targets. Our team configured dedicated AWS accounts, VPCs, load balancers, AWS Identity and Access Management (IAM) roles, and downstream service connections for each client.
  • The efficiency problem: Our servers spent more than 98 percent of their time waiting and less than 1 percent executing code. Average CPU utilization sat at 3 percent, and memory at 19 percent. We were paying for massive infrastructure that remained idle most of the time.
  • The onboarding problem: Bringing a new client online took approximately 52 days—roughly two weeks for AWS account provisioning, three weeks for VPC and networking setup, one week for IAM role configuration, and two weeks for downstream service integration and testing.
  • The scalability problem: When traffic grew or a new client joined, our only option was to spin up an entirely new cell and migrate the client to it. We couldn’t support concurrent tier-1 live events: multiple high-value games couldn’t run simultaneously, forcing us to divert traffic to alternative systems.
  • The noisy neighbor problem: Despite our isolation efforts, we still experienced performance degradation when tenants shared infrastructure, affecting service quality and reliability.

Why we needed dedicated compute

Our ad-serving platform is a stateful service that loads and maintains data in memory for each tenant rather than fetching it from a database on every request. This in-memory state improves performance but creates the noisy neighbor problem when tenants share infrastructure. When two tenants share a cluster, their in-memory data competes for the same heap, and a tenant with a large dataset can trigger out-of-memory conditions that affect its neighbors. This made shared-task and shared-cluster approaches challenging for our stateful workloads.

We needed a solution that maintained cluster-level isolation while dramatically improving operational efficiency.

Solution overview

We designed a hybrid multi-tenant architecture that provides cluster-level isolation within shared accounts. Here’s what we implemented:

  • Pre-integration model: Instead of provisioning VPCs, IAM roles, and downstream service connections for each new tenant, we created a configuration-driven infrastructure where these integrations are established once and reused across tenants.
  • Amazon Route 53 weighted routing: We implemented Route 53 weighted routing to enable gradual traffic migration between clusters without client-side changes. This allowed us to shift tenants between tiers as their traffic patterns evolved.
  • AWS PrivateLink connectivity: We established AWS PrivateLink endpoints that all tenants share, removing the need for us to set up new VPC peering or Transit Gateway connections for each tenant and reducing network configuration overhead by 80 percent.
  • Tier-based architecture: We organized our infrastructure into tiers (High TPS, Standard TPS, Low TPS) with multiple cells per tier, enabling horizontal scaling without the operational burden of per-tenant AWS accounts.
  • Configuration-driven onboarding: New tenant onboarding became a configuration change rather than an infrastructure provisioning exercise, dramatically reducing time and manual effort.

The architecture is organized around three nested levels of hierarchy. A tier is the top-level grouping—a logical classification of tenants that share a common infrastructure footprint. A tier spans one or more cells, where each cell is an AWS account boundary that represents the unit of horizontal scale-out at the account level. Within each cell, one or more infra groups serve as the self-contained infrastructure unit: a VPC, an Application Load Balancer, a set of ECS clusters (one per tenant), IAM roles, and a monitoring stack.

Why three levels? As you scale from 10 to 100 to 1,000 tenants, you reach different AWS limits at different scales. Application Load Balancer target group limits constrain how many tenants fit in a single load balancer. AWS account limits on Elastic Network Interfaces (ENIs) and VPC endpoints constrain how many load balancers fit in a single account. The three-level hierarchy gives you two independent scaling levers to address each constraint: add infra groups to scale within an account, and add cells to scale across accounts.

The key design principle is that we pre-wire downstream service dependencies at tier creation, not at tenant onboarding. AWS PrivateLink connections from the tier VPC to each downstream service VPC are established as soon as the tier is provisioned, so tenants onboarded to that tier automatically inherit full downstream connectivity. This single architectural decision is the primary reason for the 80 percent reduction in infrastructure setup steps. Route 53 performs weighted DNS routing across Application Load Balancers in multiple infra groups and cell accounts, enabling horizontal scale-out without client-side changes.
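To make configuration-driven onboarding concrete, the following is a minimal sketch of what a tenant onboarding record could look like in this model. The file name, field names, and values are hypothetical, not part of the original design; the point is that onboarding a tenant reduces to declaring its placement (tier, cell, infra group) and routing, because the underlying infrastructure and downstream connectivity already exist:

# Hypothetical onboarding record: placement and routing only, no infrastructure
cat > tenant-b-onboarding.json <<'EOF'
{
  "tenantId": "tenant-b",
  "tier": "tier-1",
  "cell": "cell-1",
  "infraGroup": "ig-1",
  "routing": {
    "pathPattern": "/tenant-b/*",
    "listenerRulePriority": 11
  },
  "cluster": {
    "instanceType": "m5.xlarge",
    "desiredTasks": 4
  }
}
EOF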

The following diagram illustrates the full architecture: Route 53 distributes traffic across ALBs in multiple infra groups within a single cell account, each ALB routes to tenant-specific ECS clusters using listener rules and target groups, and the clusters share tier-level PrivateLink connections to downstream services.


Figure 1: Hybrid multi-tenant architecture showing Route 53 weighted routing, Application Load Balancer listener rules, dedicated ECS clusters per tenant, and shared AWS PrivateLink connections to downstream services.

Prerequisites

Before you build this architecture, make sure that you have the following:

  • An AWS account and an IAM identity with least-privilege permissions to create VPCs, Application Load Balancers, ECS clusters, Route 53 hosted zones, and VPC endpoints.
  • The AWS Command Line Interface (AWS CLI) version 2.x or later, installed and configured with appropriate credentials.
  • Intermediate familiarity with Amazon ECS, Application Load Balancer, and Amazon Route 53, specifically ECS task definitions, Application Load Balancer listener rules, and Route 53 routing policies.
  • At least one downstream service exposing a VPC endpoint service for AWS PrivateLink connectivity.

Estimated time to complete: 2–3 hours.

Walkthrough

This walkthrough shows you how to build the hybrid multi-tenant architecture described previously. You will configure Route 53 weighted routing, deploy an ALB with tenant-specific listener rules, create dedicated ECS clusters per tenant, and establish AWS PrivateLink connectivity to shared downstream services, all in a way that makes future tenant onboarding a configuration-only operation.

Step 1: Configure Route 53 Regional endpoints with weighted routing

Each tier exposes a single Regional DNS endpoint (for example, tier-1.us-east-1.example.com) backed by Route 53 weighted routing records. You can configure Route 53 to use weighted routing to help distribute traffic across ALBs in multiple AWS accounts. When you add a new account to the tier for horizontal scale-out, add a new weighted record. You don’t need to change existing tenant DNS entries.

To configure Route 53 weighted routing for a tier:

  1. Open the Amazon Route 53 console and choose Hosted zones.
  2. Select or create the hosted zone for your tier.
  3. Choose Create record and select Weighted as the routing policy.
  4. Set the record name to your tier endpoint (for example, tier-1.us-east-1.example.com), record type to A, and configure an alias pointing to the ALB in your first AWS account.
  5. Set the Weight to 50 and provide a unique Set ID (for example, account-1).
  6. Enable Evaluate target health so that Route 53 directs traffic only to healthy ALBs.
  7. Repeat for each additional AWS account in the tier, using matching weights.

Alternatively, run the following AWS CLI command to create the first weighted record:

aws route53 change-resource-record-sets \
  --hosted-zone-id YOUR_HOSTED_ZONE_ID \
  --change-batch '{
    "Changes": [{
      "Action": "CREATE",
      "ResourceRecordSet": {
        "Name": "tier-1.us-east-1.example.com",
        "Type": "A",
        "SetIdentifier": "account-1",
        "Weight": 50,
        "AliasTarget": {
          "HostedZoneId": "Z35S*****K",
          "DNSName": "your-alb.us-east-1.elb.amazonaws.com",
          "EvaluateTargetHealth": true
        }
      }
    }]
  }'

Note: Replace Z35S*****K with the hosted zone ID for your ALB’s AWS Region. For more information, see Elastic Load Balancing endpoints and quotas.

Route 53 supports up to 10,000 records per hosted zone by default, so this approach scales to thousands of AWS accounts without architectural changes. For more information about weighted routing, see Weighted routing in the Amazon Route 53 Developer Guide.

Step 2: Deploy an Application Load Balancer with tenant-specific listener rules

Each infra group contains one Application Load Balancer. The load balancer inspects incoming requests and forwards them to the correct tenant’s ECS service based on a tenant identifier extracted from the request path or a custom HTTP header.

Two Application Load Balancer quotas shape the capacity of each infra group: a maximum of 100 target groups per load balancer, and a maximum of 5 target groups per listener rule. Each tenant gets one listener rule that forwards to its target groups (one per ECS cluster, up to 5 per tenant). At an average of two clusters per tenant, the 100-target-group quota caps a single load balancer at roughly 50 tenants per infra group, and a single infra group can host up to 100 ECS clusters.
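Before you plan capacity, you can confirm these quotas in your own account with the Service Quotas API. A minimal sketch using the AWS CLI; the --query filter simply narrows the output to target group quotas:

aws service-quotas list-service-quotas \
  --service-code elasticloadbalancing \
  --query "Quotas[?contains(QuotaName, 'Target Group')].[QuotaName,Value]" \
  --output table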

To create a tenant-specific listener rule:

  1. Open the Amazon EC2 console and choose Load Balancers in the navigation pane.
  2. Select your Application Load Balancer and choose the Listeners tab.
  3. Choose View/edit rules for the HTTPS listener.
  4. Choose the plus (+) icon to add a new rule.
  5. Add a condition: Path is /tenant-a/* (or HTTP header if you use header-based routing).
  6. Add an action: Forward to the target group for tenant-a.
  7. Set a unique rule priority and save.

To create the target group and listener rule using the AWS CLI:

# Create a target group for the tenant
aws elbv2 create-target-group \
  --name tg-tenant-a \
  --protocol HTTP --port 8080 \
  --vpc-id YOUR_VPC_ID \
  --target-type ip
# Add a listener rule routing /tenant-a/* to the target group
aws elbv2 create-rule \
  --listener-arn YOUR_LISTENER_ARN \
  --conditions '[{"Field":"path-pattern","Values":["/tenant-a/*"]}]' \
  --actions '[{"Type":"forward","TargetGroupArn":"YOUR_TARGET_GROUP_ARN"}]' \
  --priority 10
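If you route on a custom HTTP header instead of the request path, only the rule condition changes. A sketch of the header-based variant; the X-Tenant-Id header name is an assumption, not part of the original design:

# Route requests carrying a tenant header to the tenant's target group
aws elbv2 create-rule \
  --listener-arn YOUR_LISTENER_ARN \
  --conditions '[{"Field":"http-header","HttpHeaderConfig":{"HttpHeaderName":"X-Tenant-Id","Values":["tenant-a"]}}]' \
  --actions '[{"Type":"forward","TargetGroupArn":"YOUR_TARGET_GROUP_ARN"}]' \
  --priority 11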

For more information, see Listener rules for your Application Load Balancer.

Step 3: Create dedicated ECS clusters per tenant

In this step, you create a dedicated ECS cluster for each tenant within your infra group’s VPC. Use a consistent naming convention that encodes the tier, cell, infra group, and tenant identifier (for example, tier-1-cell-1-ig-1-tenant-a) to make ownership clear during operations and incident response.

To create a dedicated ECS cluster for a tenant:

  1. Open the Amazon ECS console and choose Clusters.
  2. Choose Create cluster.
  3. Enter a cluster name following your naming convention (for example, tier-1-cell-1-ig-1-tenant-a).
  4. Select EC2 Linux + Networking and configure the instance type and Auto Scaling group settings appropriate for the tenant’s workload.
  5. Select the infra group VPC and subnets.
  6. Choose Create.

To create the cluster using the AWS CLI:

aws ecs create-cluster \
  --cluster-name tier-1-cell-1-ig-1-tenant-a \
  --region us-east-1

In the ECS task definition for this tenant, pass the tenant identifier as an environment variable. The application reads this value at startup to scope its data access — loading only that tenant’s configuration and state from the shared remote cache:

{
  "containerDefinitions": [{
    "name": "app",
    "image": "your-ecr-image:latest",
    "environment": [
      { "name": "TENANT_ID", "value": "tenant-a" },
      { "name": "CACHE_ENDPOINT", "value": "cache.tier-1.internal" }
    ]
  }]
}

Note: Replace your-ecr-image:latest with your Amazon Elastic Container Registry (Amazon ECR) image URI.

Attach the ECS service to the ALB target group created in Step 2 so that its tasks register as targets. Configure ECS service auto scaling based on CPU and memory utilization metrics, scoped to the individual service. Because each cluster is single-tenant, the ECS limit of 5,000 tasks per service applies exclusively to that tenant, and one tenant’s resource consumption can’t affect another tenant’s cluster. For more information, see Creating a cluster in the Amazon ECS Developer Guide.
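As one way to implement the per-service scaling described above, you can use Application Auto Scaling target tracking. A minimal sketch; the service name tenant-a-service, the capacity bounds, and the 60 percent CPU target are illustrative:

# Register the ECS service's desired count as a scalable target
aws application-autoscaling register-scalable-target \
  --service-namespace ecs \
  --scalable-dimension ecs:service:DesiredCount \
  --resource-id service/tier-1-cell-1-ig-1-tenant-a/tenant-a-service \
  --min-capacity 2 \
  --max-capacity 20

# Scale on average CPU utilization for this service only
aws application-autoscaling put-scaling-policy \
  --service-namespace ecs \
  --scalable-dimension ecs:service:DesiredCount \
  --resource-id service/tier-1-cell-1-ig-1-tenant-a/tenant-a-service \
  --policy-name tenant-a-cpu-target-tracking \
  --policy-type TargetTrackingScaling \
  --target-tracking-scaling-policy-configuration '{
    "TargetValue": 60.0,
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
    }
  }'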

Step 4: Establish AWS PrivateLink connectivity to shared dependencies

This step happens at tier creation, not at tenant onboarding—and that distinction is the architectural heart of the design. For each downstream service your application integrates with, create a VPC interface endpoint in the infra group VPC. The ECS tasks in the tier route traffic to downstream services through these endpoints. Tenants onboarded to that tier can access downstream connectivity through the pre-configured endpoints.

Each VPC interface endpoint costs approximately $7.30 per month per Availability Zone, plus data transfer charges ($0.01/GB). For a tier with 50 tenants sharing one endpoint, this cost is negligible compared to the operational savings. If your downstream services run in VPCs that you control, consider VPC peering or AWS Transit Gateway as lower-cost alternatives. Use AWS PrivateLink when you need to connect to services in different AWS accounts or when you require the security and isolation benefits of private connectivity.

To create a VPC interface endpoint for a downstream service:

  1. Open the Amazon VPC console and choose Endpoints in the navigation pane.
  2. Choose Create endpoint.
  3. Select Find service by name and enter the VPC endpoint service name provided by the downstream service owner.
  4. Select the infra group VPC and the subnets used by ECS tasks.
  5. Attach a security group that allows outbound traffic from ECS tasks to the endpoint on the required port.
  6. Choose Create endpoint.

To create the endpoint using the AWS CLI:

aws ec2 create-vpc-endpoint \
  --vpc-id YOUR_VPC_ID \
  --service-name com.amazonaws.vpce.us-east-1.vpce-svc-YOUR_SERVICE_ID \
  --vpc-endpoint-type Interface \
  --subnet-ids subnet-*** subnet-*** \
  --security-group-ids sg-YOUR_SG_ID

Define tier-level IAM roles with the permissions needed to access downstream services and assign these roles to ECS task definitions at the tier level. New tenants can receive the tier-level permissions through the shared IAM roles without per-tenant role creation. For more information, see Access an AWS service using an interface VPC endpoint.
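A sketch of what the tier-level role creation can look like; the role name and the attached policy ARN are hypothetical placeholders. Each tenant’s task definition then references this role in its taskRoleArn field:

# Create a tier-level task role that ECS tasks can assume
aws iam create-role \
  --role-name tier-1-task-role \
  --assume-role-policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Principal": { "Service": "ecs-tasks.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }]
  }'

# Attach the downstream-access permissions once, at the tier level
aws iam attach-role-policy \
  --role-name tier-1-task-role \
  --policy-arn arn:aws:iam::123456789012:policy/tier-1-downstream-access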

Step 5: Configure tenant isolation, scaling, and observability

This architecture enforces tenant isolation at three layers. At the routing layer, ALB listener rules route traffic exclusively to the correct tenant’s target group based on the tenant identifier. At the compute layer, each tenant has a dedicated ECS cluster, so resource limits apply per cluster and one tenant’s resource consumption is isolated from another’s. At the in-memory state layer, because each ECS cluster is single-tenant, the in-memory data loaded at startup belongs exclusively to that tenant, with no shared heap between tenants.

Scaling strategies

When a single tenant’s traffic grows but you haven’t reached the 50-tenant limit per infra group, use vertical scaling — it’s faster (minutes vs. hours) and doesn’t require Route 53 changes. Increase ECS task CPU and memory reservations in the task definition, or switch to larger EC2 instance types in the Auto Scaling group.
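In practice, vertical scaling is a task definition revision followed by a service update. A minimal sketch; the family name, the CPU and memory sizes, and the service name are illustrative, and the container definition is abbreviated:

# Register a revision with larger CPU and memory reservations
aws ecs register-task-definition \
  --family tenant-a-app \
  --cpu 2048 \
  --memory 8192 \
  --container-definitions '[{"name":"app","image":"your-ecr-image:latest","essential":true}]'

# Roll the service onto the new revision (uses the latest revision of the family)
aws ecs update-service \
  --cluster tier-1-cell-1-ig-1-tenant-a \
  --service tenant-a-service \
  --task-definition tenant-a-app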

When you’re approaching the 50-tenant limit or when multiple tenants need capacity simultaneously, add a new infra group within the same cell—a new VPC, ALB, and set of ECS clusters. Route 53 weighted routing distributes traffic across infra groups without client-side changes:

aws route53 change-resource-record-sets \
  --hosted-zone-id YOUR_HOSTED_ZONE_ID \
  --change-batch '{
    "Changes": [{
      "Action": "CREATE",
      "ResourceRecordSet": {
        "Name": "tier-1.us-east-1.example.com",
        "Type": "A",
        "SetIdentifier": "cell-1-ig-2",
        "Weight": 50,
        "AliasTarget": {
          "HostedZoneId": "Z3******K",
          "DNSName": "your-alb-ig-2.us-east-1.elb.amazonaws.com",
          "EvaluateTargetHealth": true
        }
      }
    }]
  }'

Use cell-level scaling only when you’re approaching account-level limits—typically after 3–4 infra groups per cell. Each AWS account has hard limits on ENIs, VPC endpoints, and other resources. When a cell approaches these limits, add a new cell by provisioning an identical tier infrastructure stack in a new AWS account and registering its ALBs in Route 53 with weighted records alongside existing cells:

aws route53 change-resource-record-sets \
  --hosted-zone-id YOUR_HOSTED_ZONE_ID \
  --change-batch '{
    "Changes": [{
      "Action": "CREATE",
      "ResourceRecordSet": {
        "Name": "tier-1.us-east-1.example.com",
        "Type": "A",
        "SetIdentifier": "cell-2",
        "Weight": 50,
        "AliasTarget": {
          "HostedZoneId": "Z35SXDOTRQ7X7K",
          "DNSName": "your-alb-cell-2.us-east-1.elb.amazonaws.com",
          "EvaluateTargetHealth": true
        }
      }
    }]
  }'

The tier endpoint (tier-1.us-east-1.example.com) remains stable, and tenants don’t need to update their DNS configuration as the tier grows. The following summarizes when to use each scaling lever:

  • Application Load Balancer target group limit (about 50 tenants per infra group): add an infra group within the same cell. Unit added: infra group (VPC, Application Load Balancer, and ECS clusters).
  • AWS account-level limits (ENIs, VPC endpoints): add a new cell. Unit added: cell (a new AWS account).

Observability

Observability is structured at two levels. Emit tenant-level metrics from each ECS service with the tenant identifier as an Amazon CloudWatch dimension. Key metrics to monitor:

  • Memory usage per ECS service: the primary signal for in-memory state growth. A sudden spike often indicates a data model change or a misconfigured data pipeline. Set CloudWatch alarms at 70 percent (warning) and 85 percent (critical). When memory usage exceeds 70 percent, investigate whether the tenant’s data model has changed or whether a data pipeline is misconfigured; at 85 percent, prepare to vertically scale the ECS task definition.
  • TargetResponseTime and request count per ALB target group: latency and throughput per tenant. Establish a baseline for each tenant during onboarding (typically 100–200 ms for stateful services), then alert when latency exceeds 2x baseline for more than 5 minutes.
  • HTTPCode_Target_5XX_Count per target group: error rate per tenant.

For tier-level health, monitor ALB ActiveConnectionCount and ProcessedBytes, Route 53 health check status per load balancer, and ECS cluster CPU reservation and memory reservation for capacity planning.

Configure Amazon CloudWatch Logs with structured log fields including tenant_id, tier_id, and region in every log entry. Use a single log group per tier with log stream prefixes that encode the tenant identifier. The following CloudWatch Logs Insights query identifies error rates by tenant across the entire tier:

fields @timestamp, tenant_id, @message
| filter @message like /ERROR/
| stats count() as error_count by tenant_id
| sort error_count desc
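As a concrete example, the 70 percent memory warning described earlier can be a standard CloudWatch alarm on the ECS service’s MemoryUtilization metric. A sketch; the alarm name and the SNS topic ARN are placeholders:

aws cloudwatch put-metric-alarm \
  --alarm-name tenant-a-memory-warning \
  --namespace AWS/ECS \
  --metric-name MemoryUtilization \
  --dimensions Name=ClusterName,Value=tier-1-cell-1-ig-1-tenant-a Name=ServiceName,Value=tenant-a-service \
  --statistic Average \
  --period 300 \
  --evaluation-periods 3 \
  --threshold 70 \
  --comparison-operator GreaterThanThreshold \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:ops-alerts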

Step 6: Validate the architecture

Before onboarding production tenants, validate your architecture with the following checks:

  1. Send test requests to your tier endpoint with different tenant identifiers in the path (see the sketch after this list).
  2. Verify that Route 53 distributes traffic across Application Load Balancers: aws route53 test-dns-answer --hosted-zone-id YOUR_ID --record-name tier-1.us-east-1.example.com
  3. Confirm the load balancer routes requests to the correct tenant’s ECS cluster by checking ALB access logs.
  4. Test AWS PrivateLink connectivity by making requests from ECS tasks to downstream services.
  5. Simulate a tenant memory spike by loading a large dataset and confirm that it doesn’t affect other tenants.
  6. Verify that CloudWatch metrics are being emitted with correct tenant_id dimensions.
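For check 1, a minimal smoke test can be as simple as requesting two tenants through the same tier endpoint. The /health path is an assumption; substitute whatever route your application exposes:

# Both requests hit the same tier endpoint; listener rules route each to its own cluster
curl -s https://tier-1.us-east-1.example.com/tenant-a/health
curl -s https://tier-1.us-east-1.example.com/tenant-b/health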

Results

These results come from implementing this architecture for a stateful ad-serving application. Before this architecture, onboarding a new tenant required 52 days. With this architecture, onboarding dropped to seven days—primarily testing and validation, because infrastructure is pre-provisioned.

Measured improvements:

  • Tenant onboarding time: from 52 days to 7 days (86 percent reduction)
  • Infrastructure setup steps per tenant: 80 percent fewer
  • Engineering effort per onboarding: 80 percent reduction
  • Feature release time: from 2–3 days to 1 day
  • Tenant capacity: up to 100 tenants per AWS account with strong cluster-level isolation

Cleaning up

To avoid incurring future charges, delete the resources in the following order:

  1. Deregister ECS services from target groups, then delete ECS clusters (this might take 5–10 minutes).
  2. Delete Application Load Balancer listener rules, then delete target groups associated with test tenants.
  3. Remove Route 53 weighted routing records for test tier endpoints.
  4. Delete VPC interface endpoints (AWS PrivateLink) created during tier setup.
  5. Terminate EC2 instances in Auto Scaling groups, then delete the Auto Scaling groups.
  6. (Optional) Delete the VPC if no other resources depend on it.

Note: Deleting these resources stops charges immediately. If you plan to reuse this architecture, consider stopping ECS services instead of deleting clusters.

Conclusion

In this post, I showed you how to build a hybrid multi-tenant architecture that provides strong tenant isolation without requiring per-tenant AWS accounts. You learned how to configure Route 53 weighted routing to distribute traffic across multiple accounts, deploy Application Load Balancer listener rules for tenant-specific routing, create dedicated ECS clusters per tenant, and establish AWS PrivateLink connectivity to shared dependencies. This approach reduced tenant onboarding time by 86 percent and infrastructure setup steps by 80 percent.

The most important design decision is decoupling dependency setup from tenant onboarding. Pre-wiring the PrivateLink connections, IAM roles, and remote cache endpoints at tier creation transforms onboarding from a multi-week infrastructure project into a configuration-only operation. The three-level hierarchy (tier, cell, infra group) gives you two independent scaling levers. Add infra groups when an Application Load Balancer approaches its target group limit. Add cells when an AWS account approaches its ENI or VPC endpoint limits. Route 53 weighted routing absorbs both changes transparently.

Next steps

Ready to implement this architecture? Here’s how to get started:

  1. Assess your current tenant distribution and identify candidates for tier consolidation.
  2. Define tier promotion criteria based on your latency and isolation requirements.
  3. Start with a single tier and 2–3 test tenants to validate the architecture.
  4. Gradually migrate existing tenants using a phased approach.
  5. Monitor tenant-level metrics for 2–4 weeks before scaling to additional tiers.

For additional guidance, review the AWS Well-Architected Framework — SaaS Lens and explore the SaaS ECS reference architecture on the GitHub website.

Optional enhancements

After you’ve implemented this architecture, consider these additional improvements: formalized tier migration playbooks with automated tooling to make moving tenants between tiers a predictable, low-risk operation; and bin-packing analysis across tiers to identify tenants whose memory footprints allow co-location on the same EC2 instance without sharing a cluster, reducing EC2 costs while maintaining isolation properties.

Have you implemented a similar multi-tenant architecture? Leave a comment or reach out to share your story.
