Since that last DevOpsCon presentation, the team here at Daysha DevOps took some time out to publish a book titled One Feature at a Time (1FaaT™). It captures what we have learned over the eight or so years since we embraced DevOps as a true north for our clients.
Assuming a client is doing the right thing, the next priority is to do the thing right. Broadly speaking, that means our clients prioritize features by value and optimize the delivery process to put those features into production as soon as possible at the lowest risk of failure.
One of the key messages we are hearing from client product teams today is that a feature is worthless until it is in the customers’ or end users’ hands. Any improvement in the speed at which features are delivered is welcomed but not at the expense of lower quality.
First, a quick refresher on the options for reducing cycle time.
Continuous delivery is production quality software with a ‘push button’ to deploy all of the features as one release. This approach is often used by product teams that want to market a group of features as a branded version or by regulated organizations where a CAB authorizes releases.
Continuous deployment means every commit either goes straight to production or is rejected with an error message pinpointing the cause of the failure.
Progressive delivery (PD) is a hybrid version of these two approaches — software is released but gradually deployed or deprecated as the case arises. By separating the release of the feature from its deployment we retain control over when and how a feature can be impactful. If the feature is buggy or not delivering the value we expect — we can reverse its deployment. If it’s working as designed, we proceed at a pace that is within our control.
Why are organizations motivated to invest in PD?
Increased customer experimentation to discover value at lower cost
Booking.com runs circa 30,000 experiments per annum and is likely to have 1,000 such experiments running concurrently. The company adapts to its customers’ changing needs by constantly evaluating them. There is no room for a hunch or a product owner’s bias. ‘Show me the data’ is the only means by which product owners can drive decisions and value realization.
Scale safely and without loss of brand reputation
Ryanair.com implemented their 2.0 website (in a microservices cloud architecture) as a progressive deployment, gradually phasing out 1.0 features and replacing them with the newer edition. 1.0 had not been an easy big-bang deployment. Ryanair presented this work at a Daysha event late last year [1]. The bottom line: over the six months the upgrade was in progress, Ryanair customers never noticed it.
Scaling teams
Organizations that are growing their engineering team size and product breadth will be releasing more features at a rate that could create a cognitive overload. As organizations scale, it’s not unusual to find teams building the same feature or releasing features that collide (this is different from breaking the build) at the user end. In this respect, PD is a form of going slow to go fast.
Reduce friction between product and engineering functions
Tensions naturally arise between product and engineering teams. Product is interested in what is delivered and engineering in how this is done. Organizations that continuously deploy address this, but when a feature is not delivering value it has to be deprecated or left in situ, adding to technical debt. With PD, product teams can toggle features on or off, thereby obviating the need to deploy and then remove the feature in a subsequent deployment. The toggled code will still need to be removed if the feature is not adding value, but no time is lost in the process.
What are the prerequisites for progressive delivery?
- Continuous delivery – which implies solid continuous integration and automated end-to-end testing.
- Deployment management system (DMS) – the DMS links all the disparate systems (the change set from source control, the bug tracking system, code review comments, testing results, etc.) and the developers responsible for the update. Ultimately, the DMS is used to schedule the deployment and serves as the source of truth if things go awry. Stackify publishes a list of commercially available DMS products.
- Deployment tool – to compile code, build environments and configuration, install software, and roll back if required.
- Monitoring, for two reasons. First, to anticipate and avert an incident, or if it occurs, to log it so that a post-mortem can be undertaken. The second purpose is to measure the value.
- Highly cohesive, loosely coupled software, which makes small changes more likely to be isolated. Small deployment units allow software to be updated with higher precision and give the release engineering team the flexibility to hold back problematic updates. Microservices tick all of these boxes.
- Delivery teams – which includes accountability for the developer to see code into production and to own it until it is stable or deprecated.
- A management framework to measure progress. Our preference is for the four DORA metrics:
  - Cycle time
  - Change failure rate
  - Number of releases (adapted for PD)
  - Mean time to recover
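As a rough illustration of how these four metrics might be computed from a deployment log, here is a minimal sketch assuming Java 16 or later. The DeliveryMetrics class, the Deployment record and its fields are hypothetical names chosen for this example; they are not taken from any DORA tooling.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;

final class DeliveryMetrics {

    /** One production deployment (hypothetical shape); restoredAt is null when the change did not fail. */
    record Deployment(Instant committedAt, Instant deployedAt, boolean failed, Instant restoredAt) {}

    /** Cycle time: average time from commit to production. */
    static Duration averageCycleTime(List<Deployment> deployments) {
        if (deployments.isEmpty()) return Duration.ZERO;
        long totalSeconds = 0;
        for (Deployment d : deployments) {
            totalSeconds += Duration.between(d.committedAt(), d.deployedAt()).getSeconds();
        }
        return Duration.ofSeconds(totalSeconds / deployments.size());
    }

    /** Change failure rate: share of deployments that caused a failure in production. */
    static double changeFailureRate(List<Deployment> deployments) {
        if (deployments.isEmpty()) return 0.0;
        long failures = deployments.stream().filter(Deployment::failed).count();
        return (double) failures / deployments.size();
    }

    /** Mean time to recover: average time from a failed deployment to restored service. */
    static Duration meanTimeToRecover(List<Deployment> deployments) {
        long totalSeconds = 0;
        int recovered = 0;
        for (Deployment d : deployments) {
            if (d.failed() && d.restoredAt() != null) {
                totalSeconds += Duration.between(d.deployedAt(), d.restoredAt()).getSeconds();
                recovered++;
            }
        }
        return recovered == 0 ? Duration.ZERO : Duration.ofSeconds(totalSeconds / recovered);
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2023-01-02T09:00:00Z");
        List<Deployment> log = List.of(
            new Deployment(start, start.plus(Duration.ofHours(2)), false, null),
            new Deployment(start, start.plus(Duration.ofHours(4)), true, start.plus(Duration.ofHours(5))));

        System.out.println("Number of releases:   " + log.size());
        System.out.println("Average cycle time:   " + averageCycleTime(log));
        System.out.println("Change failure rate:  " + changeFailureRate(log));
        System.out.println("Mean time to recover: " + meanTimeToRecover(log));
    }
}
```

In practice the deployment log would come from the DMS rather than being built by hand, and the number of releases would be counted per feature toggle as well as per deployment.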
What types of organizations don’t benefit from using these techniques?
Codebases that are in maintenance mode or where there are only platform changes and no new features.
What are the specifics around implementing PD?
Feature Flagging is a simple boolean operation to turn on or off a feature in deployed code. Here is a Java example:
public class FeatureFlag {

    // Current state of the flag: true means the feature is switched on.
    private boolean isEnabled;

    public FeatureFlag() {
        // Set default value for feature flag
        isEnabled = false;
    }

    public boolean isEnabled() {
        return isEnabled;
    }

    public void setEnabled(boolean isEnabled) {
        this.isEnabled = isEnabled;
    }
}
Here we have defined a FeatureFlag class with a single boolean field isEnabled. The constructor defaults the flag to false, and calling setEnabled(true) switches the feature on at runtime when some condition is met, without a new deployment.
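To show where a flag is actually consulted, here is a minimal sketch of calling code that branches on one. The FeatureFlags registry, the CheckoutService class and the flag name "new-checkout" are all hypothetical, invented for this illustration; commercial feature-flag products expose richer APIs.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class FeatureFlagDemo {

    /** Hypothetical in-memory registry of named flags; unknown flags are treated as off. */
    static final class FeatureFlags {
        private final Map<String, Boolean> flags = new ConcurrentHashMap<>();

        void set(String name, boolean enabled) {
            flags.put(name, enabled);
        }

        boolean isEnabled(String name) {
            return flags.getOrDefault(name, false);
        }
    }

    /** Both code paths are deployed; the flag decides which one the user gets. */
    static final class CheckoutService {
        private final FeatureFlags flags;

        CheckoutService(FeatureFlags flags) {
            this.flags = flags;
        }

        String checkout(String basketId) {
            if (flags.isEnabled("new-checkout")) {
                return "new checkout flow for " + basketId;
            }
            return "legacy checkout flow for " + basketId;
        }
    }

    public static void main(String[] args) {
        FeatureFlags flags = new FeatureFlags();
        CheckoutService service = new CheckoutService(flags);

        System.out.println(service.checkout("basket-1")); // flag is off by default
        flags.set("new-checkout", true);                  // toggled on, no redeploy
        System.out.println(service.checkout("basket-1"));
        flags.set("new-checkout", false);                 // toggled off again if the feature misbehaves
        System.out.println(service.checkout("basket-1"));
    }
}
```

Defaulting unknown flags to off is a deliberate choice in this sketch: a feature that nobody has explicitly enabled stays hidden.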
Dark Launch (not to be confused with the name of a company). This is a deployment strategy where changes are released during off-peak hours, or where code is installed on all servers but configured so that users do not see its effects because the related user interface components are switched off. This is testing in production with minimal risk of brand damage.
Shadow Testing, which can also be used as a blue-green deployment later on. Production traffic is cloned and sent to a set of shadow machines that execute newer code than production. Results from the production and shadow environments can be compared automatically and discrepancies reported as failures. This is now far easier to do in the cloud, where clusters can be spun up via scripts and terminated thereafter.
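The comparison step can be sketched in a few lines, assuming Java 16 or later. ShadowTester and Discrepancy are hypothetical names, and the two environments are modelled as plain functions from request to response; a real setup would clone traffic at the network layer and call the shadow asynchronously.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.function.Function;

final class ShadowTester {

    /** A request for which production and shadow disagreed. */
    record Discrepancy(String request, String productionResponse, String shadowResponse) {}

    private final Function<String, String> production;
    private final Function<String, String> shadow;
    private final List<Discrepancy> discrepancies = new ArrayList<>();

    ShadowTester(Function<String, String> production, Function<String, String> shadow) {
        this.production = production;
        this.shadow = shadow;
    }

    /** Serves the request from production and replays a copy against the shadow. */
    String handle(String request) {
        String live = production.apply(request);
        try {
            String candidate = shadow.apply(request);
            if (!Objects.equals(live, candidate)) {
                discrepancies.add(new Discrepancy(request, live, candidate));
            }
        } catch (RuntimeException e) {
            // A shadow failure must never reach the user; record it as a discrepancy instead.
            discrepancies.add(new Discrepancy(request, live, "shadow error: " + e));
        }
        return live; // the user only ever sees the production response
    }

    List<Discrepancy> discrepancies() {
        return List.copyOf(discrepancies);
    }

    public static void main(String[] args) {
        ShadowTester tester = new ShadowTester(
            request -> request.toUpperCase(),
            request -> request.equals("refund") ? "ERROR" : request.toUpperCase());

        tester.handle("search");
        tester.handle("refund");
        tester.discrepancies().forEach(System.out::println);
    }
}
```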
Blue-Green Deployment. Following on from a successful shadow test where all discrepancies are remediated, the shadow can be promoted to prod, and the older production instance terminated once the new instance is deemed stable. Very often and subject to cost, this can mean leaving the old and new instances permanently in place. The term originated when a physical switch on a router could be used to redirect web traffic between instances.
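The switch between instances can be pictured as a single reference that is flipped atomically. The sketch below is only a model of that idea: BlueGreenRouter and its method names are hypothetical, and in production the flip would normally happen at a load balancer or DNS level rather than inside application code.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;

final class BlueGreenRouter {

    private final Function<String, String> blue;   // current production instance
    private final Function<String, String> green;  // new instance, e.g. the promoted shadow
    private final AtomicReference<Function<String, String>> live;

    BlueGreenRouter(Function<String, String> blue, Function<String, String> green) {
        this.blue = blue;
        this.green = green;
        this.live = new AtomicReference<>(blue); // all traffic starts on blue
    }

    /** Every request goes to whichever instance is currently live. */
    String handle(String request) {
        return live.get().apply(request);
    }

    /** Flip all traffic to the new instance in one step. */
    void promoteGreen() {
        live.set(green);
    }

    /** Blue is kept running, so rolling back is just flipping the switch again. */
    void rollBackToBlue() {
        live.set(blue);
    }

    public static void main(String[] args) {
        BlueGreenRouter router = new BlueGreenRouter(
            request -> "v1 handled " + request,
            request -> "v2 handled " + request);

        System.out.println(router.handle("GET /"));
        router.promoteGreen();
        System.out.println(router.handle("GET /"));
        router.rollBackToBlue();
        System.out.println(router.handle("GET /"));
    }
}
```

Keeping the old instance alive after promotion is what makes the rollback instant, which is why teams often leave both instances in place when cost allows.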
Canary Deployment. Back when mining coal was a business, miners would take canaries in cages with them as they descended to the coal face. If the canary keeled over, it was time to turn back as noxious gasses were in the atmosphere.
The same concept is a useful PD technique: teams deploy according to a rule such as location or percentage of traffic. If the first location or the first 5% of traffic works, the deployment continues. A service mesh such as the open source Istio is a big help here, acting as a rules engine that controls traffic flow to your container pods.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
      version: v1
  template:
    metadata:
      labels:
        app: myapp
        version: v1   # the DestinationRule subsets select pods by this label
    spec:
      containers:
        - name: myapp
          image: myapp:v1
          ports:
            - containerPort: 8080
---
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: myapp
spec:
  host: myapp
  subsets:
    - name: v1
      labels:
        app: myapp
        version: v1
    - name: v2
      labels:
        app: myapp
        version: v2
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts:
    - myapp
  http:
    - route:
        - destination:
            host: myapp
            subset: v1
          weight: 100   # all live traffic stays on v1 for now
        - destination:
            host: myapp
            subset: v2
          weight: 0
      mirror:
        host: myapp
        subset: v2
      mirrorPercentage:
        value: 100      # copy every request to v2 as well
Here, we create a Deployment for our application myapp, with a single replica and the image tagged v1. We also define two subsets, v1 and v2, in the Istio DestinationRule for the myapp host, with labels indicating the version of the application.
Next, we create a VirtualService to route traffic to the myapp host, with two routes defined: one to v1 and one to v2. We use a weight of 100 for the v1 route and 0 for the v2 route, effectively routing all traffic to the v1 subset. We also define a mirror to v2 and set the mirror percentage to 100, which means all traffic going to v1 is also mirrored to v2. Mirrored requests are fire-and-forget: Istio discards the responses from v2, so users are never affected by them.
To perform the canary deployment, you would deploy the new image tagged v2 alongside v1, in pods labelled version: v2, and then update the VirtualService to gradually increase the weight of the v2 route and decrease the weight of the v1 route until 100% of the traffic is routed to the v2 subset and 0% to v1. Updating the existing v1 pods in place would replace v1 outright and leave nothing to shift traffic away from.
A critical element of this is the observation of the feature that is being deployed. Either a person or an automated check watches the feature and decides whether to continue the deployment or to stop and roll back to v1.
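The continue-or-roll-back decision can be modelled as a simple loop. The sketch below is illustrative only: CanaryRollout, routesToCanary and run are hypothetical names, the health check is passed in as a predicate standing in for real monitoring, and in an Istio setup the actual traffic shift would be done by changing the route weights rather than by application code.

```java
import java.util.List;
import java.util.function.IntPredicate;

final class CanaryRollout {

    /**
     * Deterministically assigns a user to the canary so that the same user
     * keeps seeing the same version while the weight stays unchanged.
     */
    static boolean routesToCanary(String userId, int canaryWeightPercent) {
        int bucket = Math.floorMod(userId.hashCode(), 100); // 0..99
        return bucket < canaryWeightPercent;
    }

    /**
     * Walks through the weight steps, checking health after each one.
     * Returns the final canary weight; 0 means the rollout was rolled back.
     */
    static int run(List<Integer> weightSteps, IntPredicate healthyAtWeight) {
        int current = 0;
        for (int weight : weightSteps) {
            current = weight; // shift this percentage of traffic to the new version
            if (!healthyAtWeight.test(weight)) {
                return 0;     // stop and send all traffic back to the old version
            }
        }
        return current;
    }

    public static void main(String[] args) {
        List<Integer> steps = List.of(5, 25, 50, 100);

        // Healthy at every step: the rollout completes.
        System.out.println("Final weight: " + run(steps, weight -> true));

        // Problems appear once half of the traffic hits the new version: roll back.
        System.out.println("Final weight: " + run(steps, weight -> weight < 50));
    }
}
```

Starting with a small step such as 5% limits how many users are exposed if the canary keels over.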
This all sounds like common sense, so what stops organizations from implementing PD?
The subtitle of our book, One Feature at a Time, includes the words ‘overcoming cultural debt’, which is a polite way of saying that sometimes we need to get out of our own way.
Cultural debt includes process decisions that were made before PD existed. Organizations have either locked these older ways of working into their governance, or the practices have become so ingrained that people can’t recall why we ‘do things this way’. Organizations with a rigid or hierarchical culture may resist adopting progressive delivery because it requires increased collaboration, the democratization of decision-making, and transparency between teams.
Aside from the people, it’s also true to say that legacy infrastructure makes it more difficult to implement the modern tooling and automation required for PD. Note the use of the words more difficult — but not impossible.
References
[1] Fabrizio Fortunato, Head of Front End Development, Ryanair Labs. https://studio.youtube.com/video/DeadhB2bRgI/edit
[2] All code samples provided by ChatGPT.