This workshop has been deprecated and archived. The new Amazon EKS Workshop is now available at www.eksworkshop.com.
Before we start the canary deployment setup, let's learn more about canary analysis and how it works in Flagger.
Flagger can be configured to automate the release process for Kubernetes workloads with a custom resource named Canary, which we installed in the previous chapter. The Canary custom resource defines the release process of an application running on Kubernetes.
We have defined a canary release with progressive traffic shifting for the deployment of the detail backend service here. When we deploy a new version of the detail backend service, Flagger gradually shifts traffic to the canary and, at the same time, measures the request success rate as well as the average response duration.
Below is the canary target section from the Canary resource for the detail service. For more details, see the Flagger documentation here.
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: detail
  progressDeadlineSeconds: 60
Based on the above configuration, Flagger generates a deployment/detail-primary Kubernetes object during canary analysis. This primary deployment is considered the stable release of our detail service; by default, all traffic is routed to this version and the target deployment is scaled to zero. Flagger will detect changes to the target deployment (including secrets and configmaps) and will perform a canary analysis before promoting the new version as primary.
The progress deadline represents the maximum time in seconds for the canary deployment to make progress before it is rolled back; it defaults to ten minutes.
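To make the default explicit, here is an illustrative sketch (hypothetical Python helper, not Flagger's actual code) of how the effective deadline follows from the spec:

```python
DEFAULT_PROGRESS_DEADLINE_S = 600  # ten minutes, the documented default

def effective_deadline(canary_spec: dict) -> int:
    """Progress deadline in seconds, falling back to the default when unset."""
    return canary_spec.get("progressDeadlineSeconds", DEFAULT_PROGRESS_DEADLINE_S)

# Our spec above sets 60s; an empty spec would get the 600s default.
```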
The service section below, from the Canary resource, dictates how the target workload is exposed inside the cluster. The canary target should expose a TCP port that Flagger will use to create the ClusterIP services. For more details, see the Flagger documentation here.
  service:
    # container port
    port: 3000
Based on our canary service spec, Flagger creates the following Kubernetes ClusterIP services:

- detail.flagger.svc.cluster.local (selector app=detail-primary)
- detail-primary.flagger.svc.cluster.local (selector app=detail-primary)
- detail-canary.flagger.svc.cluster.local (selector app=detail)
This ensures that traffic to detail.flagger:3000 will be routed to the latest stable release of our detail service. The detail-canary.flagger:3000 address is available only during the canary analysis and can be used for conformance testing or load testing.
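As a sketch of such a conformance check (illustrative Python, not part of Flagger; the detail-canary.flagger hostname resolves only in-cluster while an analysis is running):

```python
from urllib.request import urlopen

# Resolvable only inside the cluster during an active canary analysis.
CANARY_URL = "http://detail-canary.flagger:3000/ping"

def conformance_ok(fetch=lambda url: urlopen(url, timeout=5).read().decode()) -> bool:
    """True when the canary's /ping endpoint reports 'Healthy'."""
    try:
        return "Healthy" in fetch(CANARY_URL)
    except OSError:
        return False
```

The fetch callable is injected so the check can be exercised outside the cluster.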
The canary analysis defines the schedule of the analysis, the metrics used to validate the canary version, and the webhooks used for conformance and load testing. The analysis runs periodically until it reaches the maximum traffic weight or the number of iterations. On each run, Flagger calls the webhooks, checks the metrics, and, if the failed-check threshold is reached, stops the analysis and rolls back the canary. For more details, see the Flagger documentation here.
For the canary analysis of the detail service, we have used the setup below.
  # define the canary analysis timing and KPIs
  analysis:
    # schedule interval (default 60s)
    interval: 1m
    # max number of failed metric checks before rollback
    threshold: 1
    # max traffic percentage routed to canary
    # percentage (0-100)
    maxWeight: 15
    # canary increment step
    # percentage (0-100)
    stepWeight: 5
    # App Mesh Prometheus checks
    metrics:
    - name: request-success-rate
      # minimum req success rate (non 5xx responses)
      # percentage (0-100)
      thresholdRange:
        min: 99
      interval: 1m
    - name: request-duration
      # maximum req duration P99
      # milliseconds
      thresholdRange:
        max: 500
      interval: 30s
    # testing (optional)
    webhooks:
    - name: acceptance-test
      type: pre-rollout
      url: http://flagger-loadtester.flagger/
      timeout: 30s
      metadata:
        type: bash
        cmd: 'curl -s http://detail-canary.flagger:3000/ping | grep "Healthy"'
    - name: 'load test'
      type: rollout
      url: http://flagger-loadtester.flagger/
      timeout: 15s
      metadata:
        cmd: 'hey -z 1m -q 5 -c 2 http://detail-canary.flagger:3000/ping'
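With interval: 1m, stepWeight: 5, and maxWeight: 15, the analysis steps through three traffic weights. A small sketch (not Flagger code) of the schedule this configuration implies, assuming every check passes:

```python
def canary_schedule(interval_s: int = 60, step_weight: int = 5, max_weight: int = 15):
    """Traffic weights the canary walks through, plus the minimum analysis time."""
    weights = list(range(step_weight, max_weight + 1, step_weight))
    return weights, interval_s * len(weights)

weights, total_s = canary_schedule()
# weights == [5, 10, 15], total_s == 180 (about three minutes before promotion)
```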
Later, in the next section, we will deploy canaries in our cluster. We can then get the current status of the canary deployments:
kubectl get canaries --all-namespaces
The status condition reflects the last known state of the canary analysis:
kubectl -n flagger get canary/detail -oyaml | awk '/status/,0'
status:
  canaryWeight: 0
  conditions:
  - lastTransitionTime: "2021-03-23T05:16:21Z"
    lastUpdateTime: "2021-03-23T05:16:21Z"
    message: Canary analysis completed successfully, promotion finished.
    reason: Succeeded
    status: "True"
    type: Promoted
  failedChecks: 0
  iterations: 0
  lastAppliedSpec: 75cf74bc9b
  lastTransitionTime: "2021-03-23T05:16:21Z"
  phase: Succeeded
  trackedConfigs: {}
The Promoted status condition carries a reason such as Succeeded or Failed. A failed canary will have the Promoted status set to False, the reason set to Failed, and the lastAppliedSpec will differ from the last promoted one.
status:
  canaryWeight: 0
  conditions:
  - lastTransitionTime: "2021-03-24T01:07:21Z"
    lastUpdateTime: "2021-03-24T01:07:21Z"
    message: Canary analysis failed, Deployment scaled to zero.
    reason: Failed
    status: "False"
    type: Promoted
  failedChecks: 0
  iterations: 0
  lastAppliedSpec: 6d5749b74f
  lastTransitionTime: "2021-03-24T01:07:21Z"
  phase: Failed
  trackedConfigs: {}
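A CI pipeline could gate on the Promoted condition in that status. A minimal sketch (hypothetical helper, not part of Flagger) that reads the condition from the status mapping:

```python
def promotion_state(status: dict):
    """Return (status, reason) of the Promoted condition, or (None, None)."""
    for cond in status.get("conditions", []):
        if cond.get("type") == "Promoted":
            return cond.get("status"), cond.get("reason")
    return None, None

succeeded = {"conditions": [{"type": "Promoted", "status": "True", "reason": "Succeeded"}]}
failed = {"conditions": [{"type": "Promoted", "status": "False", "reason": "Failed"}]}
# promotion_state(succeeded) -> ("True", "Succeeded")
# promotion_state(failed)    -> ("False", "Failed")
```

In practice the status dict would come from the kubectl output shown above (for example via `kubectl get canary/detail -ojson`).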
Now let’s deploy the app and canary analysis setup and see this in action!