Typos, formatting

2021-02-18 12:11:46 +00:00 · 753053326c
commit 753053326c
parent 9f33d9d605
1 changed files with 24 additions and 24 deletions
--- a/Part_2.md
+++ b/Part_2.md
@@ -319,19 +319,19 @@ So, if we take A = S/T, S = A*T and change our target availability to 99.8%
 100% - 99.8% = 0.2%, 0.2% (in decimal) == 0.002
 <br>
 Sucessful Requests (Really this is allowed errors based on target availability) = 0.002 * 1,000,000 = 2000 Errors, 1,000,000 - 2000 = 998,000 Successful requests
-
+<br>

 ##### Error budgets, what are they good for?

 1. Releasing new features
-  - Top use by the product team
+   1. Top use by the product team
 2. Expected system changes
-  - roll out enhancements, good to know you are covered should something go wrong
+   1. roll out enhancements, good to know you are covered should something go wrong
 3. Inevitable failure in networks, etc
 4. Planned downtime
-   1. e.g. take the entire system offline to implement a major upgrae
+   1. e.g. take the entire system offline to implement a major upgrade
 5. Risky experiments 
-6. Unforseen circumstances (unknown unknownes) e.g. Global pandemic!
+6. Unforseen circumstances (unknown unknowns) e.g. Global pandemic!

 #### Defining and Reducing Toil

@@ -357,7 +357,7 @@ Toil Reduction Benefits

 3x Top Tips for Reducing Toil

-1. Identify toil - Make sure youu're differentiating it from overhead or actual engineering
+1. Identify toil - Make sure you're differentiating it from overhead or actual engineering
 2. Estimate the time to automate - Make sure the benefits outweigh the cost
 3. Measure everything including context switching e.g. the time it takes you to switch to a new task and become involved in it

@@ -407,7 +407,7 @@ Logging - Append-only record of events
 * Inherent delay between when an event occurs and when it is visible in logs
 * Logs can be proccessed with a batch system, interrogated with ad hoc queries and visualised with dashboards
 * Use logs to find the root cause of an issue, as the information needed is often not available as a metric
-* For non-time-sensitive reporting, generate details reports using log processing systems
+* For non-time-sensitive reporting, generate detailed reports using log processing systems
 * Logs will nearly always produce more accurate data than metrics

 #### Alerting Principals
@@ -516,10 +516,10 @@ Roles should include:

 2. Established Command Post
   - The "post" could be a physical location or, more likely in a large company, a communication venue such as a slack channel
-
+<br>
 3. Live Incident State Document
-   - A shared docuement that reflects the current state of the incident, updated as necessary and retained for postmortem
-
+   - A shared document that reflects the current state of the incident, updated as necessary and retained for postmortem
+<br>
 4. Clear, Real-time Handoff
   - If the day is ending and the issue remains unresolved, an explicit handoff to another incident commander must take place

@@ -534,7 +534,7 @@ If yes to any of these questions:
 Incident Management Best Practices

 - Develop and document procedures
-- Prioritize damage and restore service - Take care of the buggest issues first
+- Prioritize damage and restore service - Take care of the biggest issues first
 - Trust team members - Give team autonomy they need without second guessing
 - If overwhelmed, get help
 - Consider response alternatives
@@ -605,13 +605,13 @@ What a postmortem is not:
    - Time to identify
    - Time to act
    - Time to resolve
-
+<br>
 - Recreate timeline
  - When and how was the incident reported?
  - When did the response start?
  - When and how did we make it better?
  - When was it over?
-
+<br>
 - Generate report
  - Report will be initiated by the incident commander
  - All participants need to add their own details on actions taken
@@ -629,20 +629,20 @@ Production Meeting Collaboration

 1. Upcoming production changes
   - Default to enabling change, which requires tracking the useful properties of that change: start time, duration, expected effect and so on. This is called near-term horizon visibility
-
+<br>
 2. Metrics
   - Review current SLOs, even if they are in line. Track how latency figures, CPU utilization figures, etc.. change over time
-
+<br>
 3. Outages
   - The big picture portion of the meeting can be devoted to a synopsis of the postmortem or working on the process
-
+<br>
 4. Paging Events
   - The tactical view: the list of pages, who was pages, what happened then, and so on. Two primary questions: should that alert have paged the way it did, and should it have paged at all?
-
-5. Nonpaging Event #1
+<br>
+5. Nonpaging Event `#1`
   - What events didn't get paged, but probably should have?
-
-6. Nonpaging Event #2 and #3
+<br>
+6. Nonpaging Event `#2` and `#3`
   - What events occured that are not pageable and require attention? What events are not pageable and do not require attention?

 ####
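
Note on the first hunk: the error-budget arithmetic in its context lines (A = S/T, with a 99.8% target over 1,000,000 requests) can be checked with a small sketch. The function name `error_budget` and its parameters are hypothetical, chosen only for illustration; they do not appear in `Part_2.md`.

```python
def error_budget(total_requests: int, target_availability: float) -> tuple:
    """Return (allowed_errors, required_successes) for an availability target.

    Hypothetical helper: follows S = A * T from the notes, where T is the
    total request count and A is the target availability as a fraction.
    """
    # The allowed error fraction is 1 - A; round() absorbs floating-point
    # noise such as 1 - 0.998 evaluating to 0.0020000000000000018.
    allowed_errors = round(total_requests * (1 - target_availability))
    required_successes = total_requests - allowed_errors
    return allowed_errors, required_successes


allowed, successes = error_budget(1_000_000, 0.998)
print(allowed, successes)  # 2000 998000
```

The result matches the figures in the notes: 2,000 allowed errors and 998,000 successful requests.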