Passed exam, w00t

2021-02-23 09:53:47 +00:00 · 2021-02-23 09:53:47 +00:00 · 9765410f11
commit 9765410f11
parent bd5304a097
3 changed files with 24 additions and 6 deletions
--- a/Part_2.md
+++ b/Part_2.md
@@ -318,7 +318,7 @@ So, if we take A = S/T, S = A*T and change our target availability to 99.8%
 <br>
 100% - 99.8% = 0.2%, 0.2% (in decimal) == 0.002
 <br>
-Sucessful Requests (Really this is allowed errors based on target availability) = 0.002 * 1,000,000 = 2000 Errors, 1,000,000 - 2000 = 998,000 Successful requests
+Successful Requests (Really this is allowed errors based on target availability) = 0.002 * 1,000,000 = 2000 Errors, 1,000,000 - 2000 = 998,000 Successful requests
 <br>

 ##### Error budgets, what are they good for?
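The availability arithmetic in the Part_2.md hunk above (A = S/T, so S = A*T) can be checked with a short sketch. This is an illustration only, not part of the commit; the function name `error_budget` is hypothetical, and rounding to whole requests is an assumption made to sidestep floating-point noise.

```python
def error_budget(total_requests: int, target_availability: float) -> tuple[int, int]:
    """Return (allowed_errors, required_successes) for a target availability.

    target_availability is a fraction, e.g. 0.998 for 99.8%.
    """
    # Allowed errors = (1 - A) * T, rounded to a whole number of requests.
    allowed_errors = round(total_requests * (1 - target_availability))
    # Successful requests S = T - allowed errors (equivalently A * T).
    return allowed_errors, total_requests - allowed_errors


allowed, successes = error_budget(1_000_000, 0.998)
print(f"allowed errors: {allowed}, required successes: {successes}")
```

With 1,000,000 requests and a 99.8% target this reproduces the figures in the notes: 2000 allowed errors and 998,000 successful requests.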
--- a/Part_4.md
+++ b/Part_4.md
@@ -822,14 +822,14 @@ sudo service google-fluentd start

 <u>Logs Viewer Query Interface</u>

- View logsa through queries
+- View logs through queries
 - Basic and Advanced query interface
 - Basic
  - Dropdown menus - simple searches
 - Advanced
  - View across log categories - advanced search capabilities

-<u>Basic and Advanced Filter Queries
+<u>Basic and Advanced Filter Queries</u>
 - Different query formats
  - Search field syntax is different for each method
 - Basic query
@@ -915,7 +915,7 @@ sudo service google-fluentd start
 #### Routing and Exporting Logs

 - Main premise - route a copy of logs from Cloud Logging to somewhere else
-  - BigQuery, Clous Storage, Pub/Sub, another logging bucket and more
+  - BigQuery, Cloud Storage, Pub/Sub, another logging bucket and more
 - Can export all logs, or certain logs based on defined criteria

 <u>Why Route/Export logs?</u>
@@ -1017,7 +1017,7 @@ Custom logs based distribution metrics
 - https://sre.google/workbook/alerting-on-slos/

 <u>Alerts Review - Why we need them</u>
- Somethign is not working correctly
+- Something is not working correctly
 - Action is necessary to fix it
 - Alerts inform relevant personnel that action is necessary when specified conditions met

@@ -1038,7 +1038,7 @@ Precision | Recall | Detection time | Reset time
 - Reset time: How long alerts persist after issue is resolved
  - Longer reset time = confusion/'white noise'

-<u>How to we balance these parameters?</u>
+<u>How do we balance these parameters?</u>
 - Window Length: Time period measured
  - % of errors over (x) time period
    - Example: average CPU utilization per minute vs. per hour
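The window-length point in the hunk above (% of errors over a time period, per minute vs. per hour) can be sketched as follows. This is a rough illustration, not part of the commit: the sample data, the 1% threshold, and the helper name `windowed_error_rate` are all assumptions chosen to show how the same spike looks under a short and a long window.

```python
def windowed_error_rate(errors, requests, window):
    """Error rate over the most recent `window` samples (here, minutes)."""
    recent_errors = sum(errors[-window:])
    recent_requests = sum(requests[-window:])
    return recent_errors / recent_requests if recent_requests else 0.0


# Hypothetical data: 60 one-minute samples of 1000 requests each,
# with a 5-minute error spike at the end of the hour.
requests = [1000] * 60
errors = [1] * 55 + [50] * 5

THRESHOLD = 0.01  # assumed alerting threshold: 1% errors

five_min_rate = windowed_error_rate(errors, requests, 5)
hour_rate = windowed_error_rate(errors, requests, 60)

print(f"5-minute window:  {five_min_rate:.4f} fires={five_min_rate > THRESHOLD}")
print(f"60-minute window: {hour_rate:.4f} fires={hour_rate > THRESHOLD}")
```

Under these assumed numbers the short window crosses the threshold while the long window averages the spike away. That is the trade-off the notes describe: shorter windows detect faster but are noisier, longer windows are steadier but slower to react.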
--- a/exam.md
+++ b/exam.md
@@ -0,0 +1,18 @@
+
+Practice these scenario type questions:
+1. You look after a system with a well-defined SLO...
+2. You look after a web-site that's experiencing latency xyz?
+3. How would you define an SLI for an application where you've tracked down high latency to the record generating systems, where the Persistent disk is resized to fix?
+   1. IO
+   2. A proportion
+   3. 
+4. Kubernetes deployment strategy to roll out new version of application to half of the web nodes
+   1. StatefulSet
+   2. ReplicaSet
+   3. Rolling-release with ???
+5. How to track billing of systems?
+   1. Simply examine them within the cloud console?
+   2. Add labels to groupings of resources and export to BigQuery?
+6. So as not to impact 3rd-party developers and users, how would you plan the roll-out of an updated API?
+   1. Steps to follow, e.g. announce the new API, notify users still using the old one, deprecate the old one, provide support
+   2.