Available, reliable and testable

We design for automation.
Platform operations that are repeatable and run often should be automated.
Where something can’t be automated, we aim for runnable documentation.

We make small, frequent changes using modern pipelines with validation built in.
Our CI/CD pipelines should contain every step required to build, test and release a change to production and prove it works.
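
As an illustration of a pipeline that carries a change all the way to a proven release, here is a minimal sketch of a fail-fast stage runner. The stage names and the `run_pipeline` helper are hypothetical and not taken from any particular CI/CD tool; real stages would call build, test and deployment tooling instead of returning fixed values.

```python
from typing import Callable, List, Tuple

# A stage is a name plus a callable that returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failure.

    Returns the names of the stages that passed, so a caller can see
    how far the change progressed.
    """
    passed: List[str] = []
    for name, step in stages:
        if not step():
            break  # fail fast: later stages never run
        passed.append(name)
    return passed


# Hypothetical stages; "verify" stands in for a post-release check
# that proves the change works in production.
stages: List[Stage] = [
    ("build", lambda: True),
    ("test", lambda: True),
    ("release", lambda: True),
    ("verify", lambda: True),
]

completed = run_pipeline(stages)
```

The point of the sketch is that verification is a stage in the same pipeline, not a manual step that happens afterwards.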

We make reversible changes using modern deployment practices like blue-green.
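
To show why blue-green makes a change reversible, the sketch below models the traffic switch in a few lines. `Router`, `blue_green_release`, and the `deploy` and `healthy` callables are hypothetical names used only for illustration; in practice the switch would be a load balancer or DNS change.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Router:
    live: str = "blue"   # environment currently receiving traffic
    idle: str = "green"  # environment available for the next release

    def switch(self) -> None:
        """Swap live and idle; calling it again reverses the change."""
        self.live, self.idle = self.idle, self.live


def blue_green_release(
    router: Router,
    deploy: Callable[[str], None],
    healthy: Callable[[str], bool],
) -> bool:
    """Deploy to the idle environment, switch traffic, roll back if unhealthy."""
    target = router.idle
    deploy(target)
    if not healthy(target):
        return False  # live environment was never touched
    router.switch()
    if not healthy(router.live):
        router.switch()  # roll back by switching traffic to the old environment
        return False
    return True
```

Because the previous environment stays in place until the new one is proven healthy, rolling back is the same cheap operation as rolling forward.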

Blast-radius reduction and redundancy are factored into our design decisions.

Observability is part of every change we make.
Knowing the state of the platform, and being alerted when it changes, lets us confirm that everything is running as it should and take preventative measures.

Disaster recovery is part of every change we make.
We have robust operational procedures to ensure our recovery point and recovery time objectives are met.
We use infrastructure as code (IaC) to ensure we can quickly, easily and repeatably recover to a known state.
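
The recovery point and recovery time objectives mentioned above can be checked mechanically after a recovery exercise. The following is a minimal sketch, assuming the objectives are expressed as durations; the function names and the example timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def meets_rpo(last_backup: datetime, incident: datetime, rpo: timedelta) -> bool:
    """Recovery point: data written since the last backup is lost,
    so that window must not exceed the objective."""
    return incident - last_backup <= rpo


def meets_rto(incident: datetime, restored: datetime, rto: timedelta) -> bool:
    """Recovery time: the outage duration must not exceed the objective."""
    return restored - incident <= rto


# Hypothetical recovery exercise.
incident = datetime(2024, 1, 1, 12, 0)
last_backup = datetime(2024, 1, 1, 11, 30)  # 30 minutes of data at risk
restored = datetime(2024, 1, 1, 13, 30)     # 90 minutes of downtime

rpo_ok = meets_rpo(last_backup, incident, rpo=timedelta(hours=1))
rto_ok = meets_rto(incident, restored, rto=timedelta(hours=2))
```

Recording these two checks for every recovery test gives a simple way to see whether the objectives are still being met as the platform changes.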