Gradual deployments in Amazon ECS with linear and canary strategies
Source: AWS Containers
When you deploy new application versions, you need confidence that changes won’t impact customers. Amazon Elastic Container Service (Amazon ECS) now supports linear and canary deployment strategies, complementing its built-in blue/green deployments. With linear deployments, you shift traffic in equal increments with a bake time between each shift. With canary deployments, you route a small percentage of traffic to the new revision and monitor it before shifting the rest. Both strategies support Amazon CloudWatch alarms for failure detection and rollback, and lifecycle hooks for custom validation.
In this post, we walk through how linear and canary strategies work in Amazon ECS, how to configure each, and how to set up automatic rollbacks with CloudWatch alarms.
How Amazon ECS orchestrates gradual deployments
When you configure linear or canary deployments, Amazon ECS uses Elastic Load Balancing weighted target groups and CloudWatch alarms for traffic shifting and automated rollback.
Figure 1: Amazon ECS architecture for gradual deployments showing blue and green revisions connected to a load balancer (Application Load Balancer or Network Load Balancer) with weighted target groups monitored by CloudWatch alarms for automated rollback.

Architecture and traffic flow
Amazon ECS supports four deployment strategies, each with a different approach to traffic management. Your choice depends on your risk tolerance and required control over the rollout.
Rolling: task-by-task replacement
Rolling deployments replace tasks progressively without traffic shifting. Amazon ECS starts new tasks before stopping old ones to maintain availability (controlled by the minimumHealthyPercent and maximumPercent parameters). This is the default deployment type.
Figure 2: Rolling deployment showing four tasks being replaced progressively. New tasks (green) start before old tasks (blue) are stopped to maintain availability.

Consider rolling deployments for cost-sensitive workloads where you want to avoid running duplicate infrastructure, and for workloads where simplicity matters more than fine-grained traffic control.
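The minimumHealthyPercent and maximumPercent parameters translate into task-count bounds during a rolling deployment. The following minimal sketch computes those bounds for a given desired count. It assumes the ECS rounding rules, where the lower bound rounds up and the upper bound rounds down. The function name rolling_bounds is hypothetical.

```shell
#!/bin/sh
# rolling_bounds DESIRED MIN_HEALTHY_PCT MAX_PCT
# Prints the minimum number of tasks kept running during a rolling deployment
# and the maximum number of tasks allowed at once.
# Assumption: the lower bound rounds up and the upper bound rounds down.
rolling_bounds() {
  desired=$1 min_pct=$2 max_pct=$3
  # Ceiling division for non-negative integers: (a + 99) / 100
  min_tasks=$(( (desired * min_pct + 99) / 100 ))
  # Shell arithmetic truncates, which is floor division for non-negative values
  max_tasks=$(( desired * max_pct / 100 ))
  echo "min=$min_tasks max=$max_tasks"
}

rolling_bounds 4 100 200   # min=4 max=8: room for 4 extra tasks while old ones still run
rolling_bounds 3 50 150    # min=2 max=4
```

Raising maximumPercent buys faster replacement at the cost of temporary extra capacity. Lowering minimumHealthyPercent avoids that extra capacity but reduces the number of tasks serving traffic during the rollout.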
Blue/green: full traffic switch
Blue/green deployments create a complete replacement environment (green) alongside the existing one (blue). After validation using a test listener, traffic switches instantly from blue to green. The blue environment remains available for rollback.
Figure 3: Blue/green deployment showing instant traffic switch from blue (100%) to green (100%) after health checks pass.

Blue/green deployments work well for database schema changes requiring synchronized cutover, major version upgrades where instant rollback is critical, and services where gradual rollout provides no additional validation benefit.
Linear: gradual shift in equal increments
Linear deployments shift traffic in equal increments, with a configurable bake time at each stage. If CloudWatch alarms breach or health checks fail, the deployment automatically rolls back.
Figure 4: Linear deployment shifting traffic in 10% increments across 10 steps, with a 5-minute validation period at each step.
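To see how the step size and bake time combine, the following sketch prints the traffic split after each shift. It models only the arithmetic, not Amazon ECS itself, and rests on these assumptions:

- Step sizes are integers.
- The first shift happens at minute 0.
- One bake period separates consecutive shifts.
- The last shift is capped at 100% when 100 is not a multiple of the step.

The function name linear_schedule is hypothetical.

```shell
#!/bin/sh
# linear_schedule STEP_PERCENT BAKE_MINUTES
# Prints one line per traffic shift: minute, blue (old) percent, green (new) percent.
linear_schedule() {
  step=$1 bake=$2
  if [ "$step" -le 0 ]; then
    echo "step must be positive" >&2
    return 1
  fi
  green=0 minute=0
  while [ "$green" -lt 100 ]; do
    green=$(( green + step ))
    if [ "$green" -gt 100 ]; then
      green=100          # cap the final increment
    fi
    echo "$minute $(( 100 - green )) $green"
    minute=$(( minute + bake ))
  done
}

# 10% steps with a 5-minute bake: 10 shifts, the last one at minute 45
linear_schedule 10 5
```

Under these assumptions, the time to reach 100% is (number of steps - 1) × bake time. Halving the step size therefore roughly doubles the rollout duration.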

Canary: small traffic slice first
Canary deployments route a small percentage of traffic to the new version for an extended observation period. If validation succeeds, the remaining traffic shifts in a single step.
Figure 5: Canary deployment routing 5% of traffic to the green revision for 15 minutes of extended validation, then shifting remaining traffic on success.
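Because the canary receives only a small slice of traffic, its bake time must be long enough for that slice to produce a meaningful signal. The following sketch estimates how many requests the canary serves during its bake time. It assumes steady traffic at the requests-per-second rate you pass in. The function name canary_requests is hypothetical.

```shell
#!/bin/sh
# canary_requests RPS CANARY_PERCENT BAKE_MINUTES
# Estimates how many requests reach the canary (green) revision during its
# bake time, assuming a constant RPS.
canary_requests() {
  rps=$1 pct=$2 minutes=$3
  echo $(( rps * 60 * minutes * pct / 100 ))
}

# At 200 requests/second, a 5% canary observed for 15 minutes sees 9000 requests.
canary_requests 200 5 15
```

If the estimate is too small for your error alarms to fire reliably, increase the canary percentage or lengthen the bake time.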

Choosing your rollout strategy
The following table compares all four Amazon ECS deployment strategies to help you decide which approach fits your workload.
| Strategy | Traffic Control | Rollback Speed | Cost Impact | Best For |
|---|---|---|---|---|
| Rolling | No control | Slow (redeploy) | Low | Cost-sensitive workloads, simplicity over traffic control |
| Blue/green | Instant switch | Instant | High (2x resources) | Critical updates, DB migrations |
| Linear | Gradual increments | Fast (traffic shift) | Medium | APIs, microservices |
| Canary | Small test first | Fast (traffic shift) | Medium | Changes requiring careful validation, machine learning models |
Observability and rollbacks
Amazon ECS provides several mechanisms to monitor deployments and automatically roll back when issues arise.
Amazon CloudWatch alarms for automated failure detection
You can associate CloudWatch alarms with your Amazon ECS service for automatic rollback. If an alarm enters the ALARM state, the deployment rolls back.
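Common alarm metrics include the following. The HTTP code and TargetResponseTime metrics are published by Application Load Balancers in the AWS/ApplicationELB namespace; Network Load Balancers don’t publish them.
- Errors: HTTPCode_Target_5XX_Count, HTTPCode_Target_4XX_Count (these are counts; divide by RequestCount with metric math to get a rate)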
- Latency: TargetResponseTime (p50, p99, p99.9)
- Availability: UnHealthyHostCount, HealthyHostCount
- Custom metrics: Application-specific business metrics published to Amazon CloudWatch
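The HTTPCode_Target_* metrics are counts, so a busy service can cross a fixed count threshold even when its error rate is low. A common alternative is to divide by RequestCount with metric math. The following sketch prints a --metrics document for one target group that computes the 5XX percentage. It wraps the error count in FILL(errors, 0) because the count metric is often missing when there are no errors.

The function name, the alarm name in the usage comment, and its threshold are hypothetical. The xxx values are placeholders for your own identifiers.

```shell
#!/bin/sh
# error_rate_metrics TARGET_GROUP LOAD_BALANCER
# Prints a CloudWatch metric math query computing 100 * 5XX count / request count
# for one Application Load Balancer target group.
error_rate_metrics() {
  tg=$1 lb=$2
  cat <<EOF
[
  {"Id": "errors", "ReturnData": false,
   "MetricStat": {"Period": 60, "Stat": "Sum",
     "Metric": {"Namespace": "AWS/ApplicationELB", "MetricName": "HTTPCode_Target_5XX_Count",
       "Dimensions": [{"Name": "TargetGroup", "Value": "$tg"}, {"Name": "LoadBalancer", "Value": "$lb"}]}}},
  {"Id": "requests", "ReturnData": false,
   "MetricStat": {"Period": 60, "Stat": "Sum",
     "Metric": {"Namespace": "AWS/ApplicationELB", "MetricName": "RequestCount",
       "Dimensions": [{"Name": "TargetGroup", "Value": "$tg"}, {"Name": "LoadBalancer", "Value": "$lb"}]}}},
  {"Id": "errorRate", "Expression": "100 * FILL(errors, 0) / requests", "Label": "5XX error rate (%)", "ReturnData": true}
]
EOF
}

error_rate_metrics targetgroup/green/xxx app/my-load-balancer/xxx

# Hypothetical usage:
# aws cloudwatch put-metric-alarm --alarm-name my-service-5xx-rate \
#   --metrics "$(error_rate_metrics targetgroup/green/xxx app/my-load-balancer/xxx)" \
#   --evaluation-periods 2 --threshold 1 --comparison-operator GreaterThanThreshold
```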
Lifecycle hooks
Lifecycle hooks let you run custom validation logic at specific points during the deployment. You can use AWS Lambda functions to implement hooks for:
- Pre-deployment validation: Verify prerequisites before creating new tasks
- Post-deployment testing: Run automated tests against the new revision
- Custom health checks: Validate application-specific health criteria
- Integration testing: Test interactions with downstream dependencies
Hooks can return IN_PROGRESS to indicate ongoing validation, SUCCEEDED to proceed, or FAILED to trigger rollback.
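A hook’s decision logic usually reduces to mapping a validation result onto one of these three statuses. The following sketch shows that mapping for a hypothetical smoke test. The test probes the new revision and allows a fixed number of attempts before giving up.

The function name, its arguments, and the retry policy are all assumptions. A real hook runs inside the Lambda function and returns the status in the response format that the Amazon ECS lifecycle hook documentation describes.

```shell
#!/bin/sh
# hook_status HTTP_CODE ATTEMPT MAX_ATTEMPTS
# Maps one smoke-test probe result to a lifecycle hook status.
hook_status() {
  code=$1 attempt=$2 max=$3
  if [ "$code" -ge 200 ] && [ "$code" -lt 300 ]; then
    echo SUCCEEDED      # healthy response: let the deployment proceed
  elif [ "$attempt" -lt "$max" ]; then
    echo IN_PROGRESS    # not healthy yet, attempts remain: validation still ongoing
  else
    echo FAILED         # out of attempts: trigger rollback
  fi
}

hook_status 200 1 5   # SUCCEEDED
hook_status 503 2 5   # IN_PROGRESS
hook_status 503 5 5   # FAILED
```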
Bake time configuration
The deployment bake time is a buffer period after traffic shifting completes, during which the old (blue) revision remains running. This gives you instant rollback by shifting traffic back, without redeploying tasks.
Consider the following bake time configuration:
- Most workloads: Set a baseline bake time that covers your typical health check intervals and allows enough time for error rates and latency metrics to surface in CloudWatch alarms.
- Workloads requiring extended observation: Increase the bake time for workloads with longer feedback loops, such as batch processing, async workflows, or services where downstream effects take time to manifest.
- Cost vs. safety tradeoff: Longer bake times increase costs (running both revisions) but improve rollback capability. Choose based on your service’s mean time to detect (MTTD) failures.
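One way to ground the bake time in your alarms is to make sure it covers the longest time any rollback alarm needs to reach the ALARM state. That time is roughly the alarm’s period multiplied by its evaluation periods. The following sketch computes this lower bound in minutes, plus a safety margin you choose. It ignores metric publishing delays, so treat the result as a floor rather than a recommendation. The function name is hypothetical.

```shell
#!/bin/sh
# min_bake_minutes PERIOD_SECONDS EVALUATION_PERIODS MARGIN_MINUTES
# Lower bound for a bake time: the alarm evaluation window, rounded up to
# whole minutes, plus a safety margin.
min_bake_minutes() {
  period=$1 evals=$2 margin=$3
  window=$(( period * evals ))
  echo $(( (window + 59) / 60 + margin ))
}

# The walkthroughs' 5XX alarms (60-second period, 2 evaluation periods) need
# at least 2 minutes to fire; with a 3-minute margin this prints 5.
min_bake_minutes 60 2 3
# The business-metric alarm (300-second period, 1 evaluation period) needs 5.
min_bake_minutes 300 1 0
```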
Prerequisites
Before starting either walkthrough, verify that you have the following:
- An Amazon ECS cluster (AWS Fargate or Amazon Elastic Compute Cloud (Amazon EC2))
- An Application Load Balancer or Network Load Balancer with two target groups configured
- A task definition for your application
- A running Amazon ECS service configured with a load balancer and advancedConfiguration (two target groups and a production listener rule), along with an IAM role for traffic shifting. For the linear walkthrough, create a service named my-web-service; for the canary walkthrough, create a service named payment-service. See Creating an Amazon ECS linear deployment and Creating an Amazon ECS canary deployment for instructions.
- AWS Command Line Interface (AWS CLI) installed and configured
- Appropriate AWS Identity and Access Management (AWS IAM) permissions for Amazon ECS, Elastic Load Balancing, and Amazon CloudWatch
- An IAM role (for example, ecsBlueGreenRole) that allows Amazon ECS to manage load balancer target group weights during traffic shifting
- For the canary walkthrough: custom Amazon CloudWatch metrics for business logic validation (optional)
Walkthrough: linear strategy implementation
In this walkthrough, you deploy a sample application using the linear deployment strategy with automatic rollback capabilities.
Step 1: Create a CloudWatch alarm for 5XX errors
Create an alarm that uses metric math to sum HTTP 5XX responses from both target groups. This way the alarm catches errors no matter which revision serves them. Replace the xxx placeholders with your target group and load balancer identifiers.
```shell
# Create alarm for 5XX errors across both target groups
aws cloudwatch put-metric-alarm \
  --alarm-name my-service-5xx-errors \
  --alarm-description "Trigger on high 5XX error count across both target groups" \
  --metrics '[
    {
      "Id": "blue5xx",
      "MetricStat": {
        "Metric": {
          "Namespace": "AWS/ApplicationELB",
          "MetricName": "HTTPCode_Target_5XX_Count",
          "Dimensions": [
            {"Name": "TargetGroup", "Value": "targetgroup/blue/xxx"},
            {"Name": "LoadBalancer", "Value": "app/my-load-balancer/xxx"}
          ]
        },
        "Period": 60,
        "Stat": "Sum"
      },
      "ReturnData": false
    },
    {
      "Id": "green5xx",
      "MetricStat": {
        "Metric": {
          "Namespace": "AWS/ApplicationELB",
          "MetricName": "HTTPCode_Target_5XX_Count",
          "Dimensions": [
            {"Name": "TargetGroup", "Value": "targetgroup/green/xxx"},
            {"Name": "LoadBalancer", "Value": "app/my-load-balancer/xxx"}
          ]
        },
        "Period": 60,
        "Stat": "Sum"
      },
      "ReturnData": false
    },
    {
      "Id": "total5xx",
      "Expression": "SUM([blue5xx, green5xx])",
      "Label": "Total 5XX Errors",
      "ReturnData": true
    }
  ]' \
  --evaluation-periods 2 \
  --threshold 10 \
  --comparison-operator GreaterThanThreshold
```
Step 2: Create a CloudWatch alarm for high latency
Create an alarm for high response time across both target groups. TargetResponseTime is reported in seconds, so a threshold of 1.0 corresponds to 1 second.
```shell
# Create alarm for high latency across both target groups
aws cloudwatch put-metric-alarm \
  --alarm-name my-service-high-latency \
  --alarm-description "Trigger on high response time across both target groups" \
  --metrics '[
    {
      "Id": "blueLatency",
      "MetricStat": {
        "Metric": {
          "Namespace": "AWS/ApplicationELB",
          "MetricName": "TargetResponseTime",
          "Dimensions": [
            {"Name": "TargetGroup", "Value": "targetgroup/blue/xxx"},
            {"Name": "LoadBalancer", "Value": "app/my-load-balancer/xxx"}
          ]
        },
        "Period": 60,
        "Stat": "Average"
      },
      "ReturnData": false
    },
    {
      "Id": "greenLatency",
      "MetricStat": {
        "Metric": {
          "Namespace": "AWS/ApplicationELB",
          "MetricName": "TargetResponseTime",
          "Dimensions": [
            {"Name": "TargetGroup", "Value": "targetgroup/green/xxx"},
            {"Name": "LoadBalancer", "Value": "app/my-load-balancer/xxx"}
          ]
        },
        "Period": 60,
        "Stat": "Average"
      },
      "ReturnData": false
    },
    {
      "Id": "maxLatency",
      "Expression": "MAX([blueLatency, greenLatency])",
      "Label": "Max Response Time",
      "ReturnData": true
    }
  ]' \
  --evaluation-periods 2 \
  --threshold 1.0 \
  --comparison-operator GreaterThanThreshold
```
Step 3: Configure the linear strategy
Update your Amazon ECS service to use the linear deployment strategy with CloudWatch alarm integration. The configuration also enables the deployment circuit breaker for automatic rollback. The following settings shift 10% of traffic per step, with a 5-minute bake time between steps.
```shell
# Configure linear deployment strategy with CloudWatch alarm integration
aws ecs update-service \
  --cluster production-cluster \
  --service my-web-service \
  --deployment-configuration '{
    "deploymentCircuitBreaker": {
      "enable": true,
      "rollback": true
    },
    "strategy": "LINEAR",
    "linearConfiguration": {
      "stepPercent": 10,
      "stepBakeTimeInMinutes": 5
    },
    "alarms": {
      "alarmNames": [
        "my-service-5xx-errors",
        "my-service-high-latency"
      ],
      "enable": true,
      "rollback": true
    }
  }'
```
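Step 4: Deploy a new version
Trigger a deployment. This example forces a new deployment of the existing task definition, which is enough to exercise the strategy. To roll out an actual new version, register a new task definition revision and pass it with --task-definition.
```shell
# Trigger a deployment by forcing a new deployment of the existing task definition
aws ecs update-service \
  --cluster production-cluster \
  --service my-web-service \
  --force-new-deployment
```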
Step 5: Monitor rollout progress
Monitor deployment progress in real time using the DescribeServiceDeployments and DescribeServiceRevisions APIs, which provide detailed information about traffic shifting status, rollout state, and revision details.
```shell
# List service deployments to get the deployment ARN
aws ecs list-service-deployments \
  --service arn:aws:ecs:region:account:service/production-cluster/my-web-service

# Get detailed deployment status including traffic shifting progress
aws ecs describe-service-deployments \
  --service-deployment-arns arn:aws:ecs:region:account:service-deployment/production-cluster/my-web-service/xxx

# Get details about a specific service revision
aws ecs describe-service-revisions \
  --service-revision-arns arn:aws:ecs:region:account:service-revision/production-cluster/my-web-service/xxx
```
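To script this step, you can poll DescribeServiceDeployments until the deployment reaches a terminal status. The following sketch assumes that SUCCESSFUL, STOPPED, ROLLBACK_SUCCESSFUL, and ROLLBACK_FAILED are the terminal service deployment status values. The AWS_CLI and POLL_SECONDS variables are hypothetical hooks; they let you substitute a stub for the real aws command when testing.

```shell
#!/bin/sh
# wait_for_deployment SERVICE_DEPLOYMENT_ARN
# Polls a service deployment; returns 0 on success, 1 if it was stopped or rolled back.
wait_for_deployment() {
  arn=$1
  while :; do
    dep_status=$(${AWS_CLI:-aws} ecs describe-service-deployments \
      --service-deployment-arns "$arn" \
      --query "serviceDeployments[0].status" --output text)
    case "$dep_status" in
      SUCCESSFUL)
        echo "Deployment status: $dep_status"; return 0 ;;
      STOPPED|ROLLBACK_SUCCESSFUL|ROLLBACK_FAILED)
        echo "Deployment status: $dep_status"; return 1 ;;
      *)
        echo "Deployment status: ${dep_status:-unknown}, waiting..."
        sleep "${POLL_SECONDS:-30}" ;;
    esac
  done
}

# Example (replace xxx with the deployment ID from list-service-deployments):
# wait_for_deployment arn:aws:ecs:region:account:service-deployment/production-cluster/my-web-service/xxx
```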
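Step 6: Observe traffic shifting
During the deployment, traffic shifts in 10% increments with a 5-minute bake time between shifts. Times are measured from the first production traffic shift:
| Time | Blue (old) | Green (new) | Status |
|---|---|---|---|
| Start | 100% | 0% | Deployment started |
| 0:00 | 90% | 10% | Step 1 |
| 0:05 | 80% | 20% | Step 2 |
| 0:10 | 70% | 30% | Step 3 |
| ... | ... | ... | ... |
| 0:45 | 0% | 100% | Step 10: all traffic shifted; deployment bake time begins |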
If Amazon CloudWatch alarms breach during deployment, Amazon ECS automatically pauses the deployment, shifts traffic back to the stable (blue) revision, terminates the new (green) tasks, and marks the deployment as FAILED.
Step 7: Verify rollout status
Check that the deployment completed successfully. The command prints COMPLETED when the rollout has finished; the other possible values are IN_PROGRESS and FAILED.
```shell
# Check deployment status
aws ecs describe-services \
  --cluster production-cluster \
  --services my-web-service \
  --query "services[0].deployments[?status=='PRIMARY'].rolloutState" \
  --output text
```
Step 8: Verify alarm state
Confirm that neither alarm is in the ALARM state:
```shell
# Confirm no alarms in ALARM state
aws cloudwatch describe-alarms \
  --alarm-names my-service-5xx-errors my-service-high-latency \
  --query "MetricAlarms[*].[AlarmName,StateValue]" \
  --output table
```
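To use the alarm check in a pipeline, you can turn it into a pass/fail gate. The following sketch requests text output, where the AWS CLI separates columns with tabs, and fails if any line ends in the ALARM state. INSUFFICIENT_DATA and OK both pass. The function name and the AWS_CLI override, which substitutes a stub for the aws command, are hypothetical.

```shell
#!/bin/sh
# check_alarms ALARM_NAME...
# Prints each alarm's state and returns 1 if any alarm is currently in ALARM.
check_alarms() {
  states=$(${AWS_CLI:-aws} cloudwatch describe-alarms \
    --alarm-names "$@" \
    --query "MetricAlarms[*].[AlarmName,StateValue]" --output text)
  echo "$states"
  # Each line is "<alarm name><TAB><state>"; match a state of exactly ALARM.
  if printf '%s\n' "$states" | grep -q "[[:space:]]ALARM$"; then
    return 1
  fi
  return 0
}

# check_alarms my-service-5xx-errors my-service-high-latency || echo "An alarm is firing"
```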
Walkthrough: canary strategy implementation
This walkthrough deploys an update using the canary strategy.
Step 1: Create a CloudWatch alarm for HTTP errors
Create an alarm for HTTP 5XX errors using metric math across both target groups.
```shell
# Create alarm for 5XX errors across both target groups (blue and green)
aws cloudwatch put-metric-alarm \
  --alarm-name payment-service-errors \
  --alarm-description "Trigger on high 5XX error count across both target groups" \
  --metrics '[
    {
      "Id": "blue5xx",
      "MetricStat": {
        "Metric": {
          "Namespace": "AWS/ApplicationELB",
          "MetricName": "HTTPCode_Target_5XX_Count",
          "Dimensions": [
            {"Name": "TargetGroup", "Value": "targetgroup/blue/xxx"},
            {"Name": "LoadBalancer", "Value": "app/my-load-balancer/xxx"}
          ]
        },
        "Period": 60,
        "Stat": "Sum"
      },
      "ReturnData": false
    },
    {
      "Id": "green5xx",
      "MetricStat": {
        "Metric": {
          "Namespace": "AWS/ApplicationELB",
          "MetricName": "HTTPCode_Target_5XX_Count",
          "Dimensions": [
            {"Name": "TargetGroup", "Value": "targetgroup/green/xxx"},
            {"Name": "LoadBalancer", "Value": "app/my-load-balancer/xxx"}
          ]
        },
        "Period": 60,
        "Stat": "Sum"
      },
      "ReturnData": false
    },
    {
      "Id": "total5xx",
      "Expression": "SUM([blue5xx, green5xx])",
      "Label": "Total 5XX Errors",
      "ReturnData": true
    }
  ]' \
  --evaluation-periods 2 \
  --threshold 5 \
  --comparison-operator GreaterThanThreshold
```
Step 2: Create a CloudWatch alarm for business metrics
Create an alarm on a business metric. This example assumes that your application publishes a custom TransactionFailureRate metric to the CustomApp/Payments namespace:
```shell
# Business metric alarm (custom)
aws cloudwatch put-metric-alarm \
  --alarm-name payment-transaction-failure-rate \
  --metric-name TransactionFailureRate \
  --namespace CustomApp/Payments \
  --statistic Average \
  --period 300 \
  --evaluation-periods 1 \
  --threshold 0.5 \
  --comparison-operator GreaterThanThreshold
```
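The alarm only works if something publishes the metric. The following sketch computes a failure rate and publishes it with aws cloudwatch put-metric-data. It assumes the metric is expressed as a percentage, so the alarm’s 0.5 threshold means 0.5% of transactions failing. The function names and the AWS_CLI override, which substitutes a stub for the aws command, are hypothetical.

```shell
#!/bin/sh
# failure_rate FAILED TOTAL
# Prints the failure percentage with two decimals (0.00 when TOTAL is 0).
failure_rate() {
  awk -v f="$1" -v t="$2" 'BEGIN { if (t == 0) print "0.00"; else printf "%.2f\n", 100 * f / t }'
}

# publish_failure_rate FAILED TOTAL
# Publishes the rate to the namespace and metric name the alarm above watches.
publish_failure_rate() {
  ${AWS_CLI:-aws} cloudwatch put-metric-data \
    --namespace CustomApp/Payments \
    --metric-name TransactionFailureRate \
    --unit Percent \
    --value "$(failure_rate "$1" "$2")"
}

# Example: 3 failed out of 600 transactions in the last interval publishes 0.50
# publish_failure_rate 3 600
```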
Step 3: Configure the canary strategy
Configure the canary with a small initial traffic percentage and an extended bake time:
```shell
# Configure canary deployment strategy with CloudWatch alarm integration
aws ecs update-service \
  --cluster production-cluster \
  --service payment-service \
  --deployment-configuration '{
    "deploymentCircuitBreaker": {
      "enable": true,
      "rollback": true
    },
    "strategy": "CANARY",
    "canaryConfiguration": {
      "canaryPercent": 5,
      "canaryBakeTimeInMinutes": 20
    },
    "alarms": {
      "alarmNames": [
        "payment-service-errors",
        "payment-transaction-failure-rate"
      ],
      "enable": true,
      "rollback": true
    }
  }'
```
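Step 4: Deploy a new version
Trigger a deployment. As in the linear walkthrough, this example forces a new deployment of the existing task definition. To roll out an actual new version, register a new task definition revision and pass it with --task-definition.
```shell
# Trigger a deployment by forcing a new deployment of the existing task definition
aws ecs update-service \
  --cluster production-cluster \
  --service payment-service \
  --force-new-deployment
```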
Step 5: Observe canary traffic pattern
During a canary deployment, traffic shifts in two phases. In Phase 1, 5% of traffic goes to the canary for 20 minutes. In Phase 2, if no alarm fires, the remaining traffic shifts in a single step.
| Time | Blue (old) | Green (new) | Status |
|---|---|---|---|
| 0:00 | 100% | 0% | Canary deployment started |
| 0:01 | 95% | 5% | Phase 1 (canary): monitoring |
| 0:05 | 95% | 5% | Phase 1 (canary): monitoring |
| 0:10 | 95% | 5% | Phase 1 (canary): monitoring |
| 0:15 | 95% | 5% | Phase 1 (canary): monitoring |
| 0:21 | 95% | 5% | Canary validation complete |
| 0:21 | 0% | 100% | Phase 2: remaining traffic shifted; deployment bake time begins |
Step 6: Verify rollout status
Check that the deployment completed successfully. The command prints COMPLETED when the rollout has finished; the other possible values are IN_PROGRESS and FAILED.
```shell
# Check deployment status
aws ecs describe-services \
  --cluster production-cluster \
  --services payment-service \
  --query "services[0].deployments[?status=='PRIMARY'].rolloutState" \
  --output text
```
Step 7: Verify alarm state
Confirm that neither alarm is in the ALARM state:
```shell
# Confirm no alarms in ALARM state
aws cloudwatch describe-alarms \
  --alarm-names payment-service-errors payment-transaction-failure-rate \
  --query "MetricAlarms[*].[AlarmName,StateValue]" \
  --output table
```
Best practices
This section provides guidance on configuring alarms and setting bake times for your deployments.
Amazon CloudWatch alarm configuration
Consider configuring two tiers of alarms: critical alarms that trigger automatic rollback, and warning alarms for monitoring only. Set thresholds and evaluation periods based on your application’s baseline performance and acceptable error rates.
Critical alarms (immediate rollback):
- HTTPCode_Target_5XX_Count
- TargetResponseTime (p99)
- UnHealthyHostCount
Warning alarms (monitor, don’t roll back):
- TargetResponseTime (p50)
- RequestCount (anomaly detection or percentage decrease)
- CPUUtilization
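The two tiers map directly onto configuration. Only critical alarms belong in the alarms block of the ECS deployment configuration, while warning alarms notify you without affecting the rollout. The following sketch builds that block from a list of critical alarm names. It assumes the names contain no double quotes, and the function name is hypothetical.

```shell
#!/bin/sh
# rollback_alarms_json ALARM_NAME...
# Prints the "alarms" object for an ECS deployment configuration that contains
# only the critical alarms meant to trigger automatic rollback.
rollback_alarms_json() {
  names=""
  for name in "$@"; do
    names="${names:+$names, }\"$name\""   # comma-separate after the first name
  done
  printf '{"alarmNames": [%s], "enable": true, "rollback": true}\n' "$names"
}

# Critical alarms only; warning alarms such as p50 latency or CPUUtilization stay out.
rollback_alarms_json my-service-5xx-errors my-service-high-latency
```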
Bake time guidelines
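Canary bake time:
- Low risk: a shorter bake time, but long enough for every rollback alarm to complete at least one full evaluation window
- Medium risk: a moderate bake time
- High risk: an extended bake time that also covers slower signals, such as business metrics
Deployment bake time:
- Set a baseline that exceeds your service’s mean time to detect (MTTD) failures
- The blue revision keeps running during this period, so you can roll back instantly without redeploying tasks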
Clean up resources
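To avoid ongoing charges, delete all resources created during the walkthroughs. Load balancers and running Amazon ECS tasks are the primary cost drivers.
Warning: The --force flag immediately stops all running tasks without draining connections. This causes service disruption. Make sure no active traffic is being served and back up any necessary data before proceeding.
```shell
# Delete the linear walkthrough ECS service
aws ecs delete-service \
  --cluster production-cluster \
  --service my-web-service \
  --force

# Delete the canary walkthrough ECS service
aws ecs delete-service \
  --cluster production-cluster \
  --service payment-service \
  --force

# Wait until both services are INACTIVE before removing the resources they use
aws ecs wait services-inactive \
  --cluster production-cluster \
  --services my-web-service payment-service

# Delete CloudWatch alarms for linear walkthrough
aws cloudwatch delete-alarms \
  --alarm-names my-service-5xx-errors my-service-high-latency

# Delete CloudWatch alarms for canary walkthrough
aws cloudwatch delete-alarms \
  --alarm-names payment-service-errors payment-transaction-failure-rate

# Delete target groups. A target group can't be deleted while a listener or
# listener rule still forwards to it: first remove or update those rules, or,
# if you are also deleting the load balancer, run the load balancer commands
# in the next block before these.
aws elbv2 delete-target-group \
  --target-group-arn arn:aws:elasticloadbalancing:region:account:targetgroup/blue/xxx
aws elbv2 delete-target-group \
  --target-group-arn arn:aws:elasticloadbalancing:region:account:targetgroup/green/xxx
```
If you created the following resources specifically for this walkthrough and no longer need them, delete them:
```shell
# Delete the ECS cluster
aws ecs delete-cluster \
  --cluster production-cluster

# Delete the load balancer (this also deletes its listeners and rules)
aws elbv2 delete-load-balancer \
  --load-balancer-arn arn:aws:elasticloadbalancing:region:account:loadbalancer/app/my-load-balancer/xxx
aws elbv2 wait load-balancers-deleted \
  --load-balancer-arns arn:aws:elasticloadbalancing:region:account:loadbalancer/app/my-load-balancer/xxx

# Deregister task definitions
aws ecs deregister-task-definition \
  --task-definition my-task-definition:1

# A role with attached policies can't be deleted: detach managed policies first
# (remove any inline policies with aws iam delete-role-policy), then delete the role
for policy_arn in $(aws iam list-attached-role-policies \
    --role-name ecsBlueGreenRole \
    --query "AttachedPolicies[].PolicyArn" --output text); do
  aws iam detach-role-policy --role-name ecsBlueGreenRole --policy-arn "$policy_arn"
done
aws iam delete-role \
  --role-name ecsBlueGreenRole
```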
Conclusion
In this post, we showed you how to configure linear and canary deployment strategies in Amazon ECS, with CloudWatch alarms that trigger automatic rollback. Together they give you gradual rollouts built into Amazon ECS, with automated safety checks at every stage.
Next steps
To get started, try the linear deployment strategy with a non-production service first. Experiment with different step percentages and bake times to find optimal settings. After validating linear deployments, adopt canary deployments for your most sensitive services.
These strategies are available today in commercial AWS Regions. For pricing details, see the Amazon ECS pricing page.
For more information, see the Amazon ECS Developer Guide.
Additional resources
- For more information about deployment types and configuration options, see Amazon ECS deployment strategies documentation.
- For more information about automatic rollback mechanisms, see Amazon ECS deployment circuit breaker.
- For more information about traffic distribution during deployments, see Elastic Load Balancing weighted target groups.