One of the things that's not immediately apparent in an initiative like this is the huge amount of effort required to provision, deploy and manage a system of this scale, across a whole bunch of different teams. We couldn't have done any of this without a tremendous group of SREs with the wisdom to invest heavily in automation, in addition to all the hard work.
We'll follow up in a week or two with a blog post highlighting some of the work here. I think the framework for disk remediation (automatically detecting failures, re-provisioning, etc.) is particularly interesting.