Section completed
This commit is contained in:
parent
649969cdf7
commit
46c836a284
39
Part_2.md
@@ -235,3 +235,42 @@ SLIs drives SLOs, SLOs inform the SLA

- How does the SLO inform the SLA? Example:

We want our SLO to be at or under 200ms, and we therefore want to set our SLA at a higher value, e.g. 300ms, and beyond that, "hair on fire ms". You want the SLA to differ significantly from the SLO because you don't want to be consistently setting your customers' hair on fire. Moreover, your SLAs should be routinely achievable because you are constantly working towards your objective (SLO).
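
As a rough illustration of the gap between the two thresholds, the sketch below classifies a measured latency against the 200ms SLO and 300ms SLA from the example. The constant names and the `classify_latency` helper are hypothetical, chosen only for this sketch.

```python
# Thresholds taken from the example above; the names are hypothetical.
SLO_LATENCY_MS = 200  # internal objective we work towards
SLA_LATENCY_MS = 300  # looser, customer-facing agreement


def classify_latency(latency_ms: float) -> str:
    """Return which target, if any, a measured latency violates."""
    if latency_ms <= SLO_LATENCY_MS:
        return "within SLO"
    if latency_ms <= SLA_LATENCY_MS:
        return "SLO missed, SLA still met"
    return "SLA breached"


if __name__ == "__main__":
    for sample_ms in (150, 250, 350):
        print(sample_ms, classify_latency(sample_ms))
```

The band between 200ms and 300ms is the buffer: the team sees that it is missing its objective before customers see a broken agreement.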

To summarize:

SLO:
- Internal targets that guide prioritization
- Represents the desired user experience
- Missing objectives should have consequences

SLA:
- Set at a level just high enough to keep customers
- Incentivizes a minimum level of service
- Looser than the corresponding objectives

An SLI is an indicator of how things are at some particular point in time. Are things good or bad right now? If our SLI doesn't always confidently tell us that, then it's not a good SLI. An SLO asks whether that indicator has been showing what we want it to for most of the time period that we care about. Has the SLI met our objective? In particular, have things been the right amount of both good and bad? An SLA is our agreement about what happens if we don't meet our objective. What's our punishment for doing worse than we agreed we would?

The SLI is like the speed of a car: it's travelling at some particular speed right now. The SLO is the speed limit at the upper end and the expected travel time for your trip at the lower end. You want to go approximately some speed over the course of your journey. The SLA is like the ticket you get for driving too fast or too slowly.
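
The relationship between the three terms can be sketched in a few lines. This is a minimal illustration, not a definitive implementation: the `Request` type, the 200ms threshold, and the 95% and 90% targets are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Request:
    latency_ms: float


def sli_is_good(request: Request, threshold_ms: float = 200) -> bool:
    """SLI: is this single request good or bad right now?"""
    return request.latency_ms <= threshold_ms


def slo_attainment(requests: Sequence[Request]) -> float:
    """Fraction of requests in the window whose SLI was good."""
    if not requests:
        return 1.0  # assumption: an empty window counts as compliant
    good = sum(sli_is_good(r) for r in requests)
    return good / len(requests)


def evaluate(requests: Sequence[Request],
             slo_target: float = 0.95,
             sla_target: float = 0.90) -> str:
    """SLO: was the SLI good often enough? SLA: if not, what follows?"""
    attainment = slo_attainment(requests)
    if attainment >= slo_target:
        return "SLO met"
    if attainment >= sla_target:
        return "SLO missed, SLA met"
    return "SLA breached: agreed penalty applies"


if __name__ == "__main__":
    # 92 fast requests and 8 slow ones in the window.
    window = [Request(120)] * 92 + [Request(450)] * 8
    print(slo_attainment(window), evaluate(window))
```

The SLI is evaluated per request, the SLO over a window of requests, and the SLA only comes into play once attainment drops below the looser, agreed level.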

### Making the Most of Risk

#### Setting Error Budgets

#### Defining and Reducing Toil

### Generating SRE Metrics

#### Monitoring Reliability

#### Alerting Principles

#### Investigating SRE Tools

### Reacting to Incidents

#### Handling Incident Response

#### Managing Service Lifecycle

#### Ensuring Healthy Operations Collaboration

####