Passed exam, w00t

Alex Soul 2021-02-23 09:53:47 +00:00
parent bd5304a097
commit 9765410f11
3 changed files with 24 additions and 6 deletions


@@ -318,7 +318,7 @@ So, if we take A = S/T, S = A*T and change our target availability to 99.8%
<br>
100% - 99.8% = 0.2%, 0.2% (in decimal) == 0.002
<br>
-Sucessful Requests (Really this is allowed errors based on target availability) = 0.002 * 1,000,000 = 2000 Errors, 1,000,000 - 2000 = 998,000 Successful requests
+Successful Requests (Really this is allowed errors based on target availability) = 0.002 * 1,000,000 = 2000 Errors, 1,000,000 - 2000 = 998,000 Successful requests
<br>
##### Error budgets, what are they good for?
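The availability arithmetic in the hunk above (A = S/T, so S = A*T, and whatever is left over is the error budget) can be checked with a few lines of Python. This is a minimal sketch: the function name `error_budget` is hypothetical, the target is passed as a fraction (0.998 for 99.8%), and S is rounded because 0.998 is not exactly representable as a binary float.

```python
def error_budget(target_availability, total_requests):
    """Return (required successes, allowed errors) for an availability target.

    target_availability is a fraction, e.g. 0.998 for 99.8%.
    """
    # S = A * T; round() absorbs floating-point noise so S is a whole count
    successes = round(target_availability * total_requests)
    # Every request the target does not require to succeed is error budget
    allowed_errors = total_requests - successes
    return successes, allowed_errors


print(error_budget(0.998, 1_000_000))  # (998000, 2000)
```

Tightening the target to 99.9% halves the budget to 1,000 errors over the same 1,000,000 requests.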


@@ -822,14 +822,14 @@ sudo service google-fluentd start
<u>Logs Viewer Query Interface</u>
-- View logsa through queries
+- View logs through queries
- Basic and Advanced query interface
- Basic
- Dropdown menus - simple searches
- Advanced
- View across log categories - advanced search capabilities
-<u>Basic and Advanced Filter Queries
+<u>Basic and Advanced Filter Queries</u>
- Different query formats
- Search field syntax different for each method
- Basic query
@@ -915,7 +915,7 @@ sudo service google-fluentd start
#### Routing and Exporting Logs
- Main premise - route a copy of logs from Cloud Logging to somewhere else
-- BigQuery, Clous Storage, Pub/Sub, another logging bucket and more
+- BigQuery, Cloud Storage, Pub/Sub, another logging bucket and more
- Can export all logs, or certain logs based on defined criteria
<u>Why Route/Export logs?</u>
@@ -1017,7 +1017,7 @@ Custom logs based distribution metrics
- https://sre.google/workbook/alerting-on-slos/
<u>Alerts Review - Why we need them</u>
-- Somethign is not working correctly
+- Something is not working correctly
- Action is necessary to fix it
- Alerts inform relevant personnel that action is necessary when specified conditions are met
@@ -1038,7 +1038,7 @@ Precision | Recall | Detection time | Reset time
- Reset time: How long alerts persist after issue is resolved
- Longer reset time = confusion/'white noise'
-<u>How to we balance these parameters?</u>
+<u>How do we balance these parameters?</u>
- Window Length: Time period measured
- % of errors over (x) time period
- Example: average CPU utilization per minute vs. per hour
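To see how window length trades detection time against precision and reset time, here is a minimal Python sketch that computes the % of errors over a trailing window of per-minute samples and reports the minutes where it crosses a threshold. The traffic data (1,000 requests per minute, one minute with 50 errors), the 1% threshold and the function names are all hypothetical, chosen only for illustration.

```python
from collections import deque


def windowed_error_ratio(samples, window):
    """Yield the error ratio over the trailing `window` samples.

    Each sample is an (errors, total_requests) pair for one minute.
    """
    buf = deque(maxlen=window)  # the oldest minute drops out automatically
    for errors, total in samples:
        buf.append((errors, total))
        err = sum(e for e, _ in buf)
        tot = sum(t for _, t in buf)
        yield err / tot if tot else 0.0


def alert_minutes(samples, window, threshold):
    """Minutes at which the windowed error ratio exceeds `threshold`."""
    return [
        minute
        for minute, ratio in enumerate(windowed_error_ratio(samples, window))
        if ratio > threshold
    ]


# Hypothetical hour of traffic: 0.1% errors, with a 5% spike at minute 30
samples = [(1, 1000)] * 60
samples[30] = (50, 1000)

print(alert_minutes(samples, window=1, threshold=0.01))   # [30]
print(alert_minutes(samples, window=5, threshold=0.01))   # [30, 31, 32, 33, 34]
print(alert_minutes(samples, window=60, threshold=0.01))  # []
```

The 1-minute window fires and clears immediately, but any brief blip would page someone (lower precision). The 5-minute window still catches the spike yet keeps firing for four minutes after the errors stop (longer reset time), and the 60-minute window dilutes the spike below the threshold and misses it altogether.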

exam.md

@@ -0,0 +1,18 @@
Practice these scenario-type questions:
1. You look after a system with a well-defined SLO...
2. You look after a website that's experiencing latency xyz?
3. How would you define an SLI for an application where you've traced high latency to the record-generating systems, and the fix is to resize the Persistent Disk?
1. IO
2. A proportion
3.
4. Which Kubernetes deployment strategy would roll out a new version of the application to half of the web nodes?
1. StatefulSet
2. ReplicaSet
3. Rolling-release with ???
5. How to track billing of systems?
1. Simply examine them within the Cloud Console?
2. Add labels to groupings of resources and export to BigQuery?
6. So as not to impact 3rd-party developers and users, how would you plan the roll-out of an updated API?
1. Steps to follow, e.g. announce the new API, notify users still using the old one, deprecate the old one, provide support
2.
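The noted answer to question 3, "A proportion", matches the usual SRE convention of expressing an SLI as good events divided by valid events. Below is a minimal Python sketch of a latency SLI, assuming a request counts as "good" if it completes within a chosen threshold. The function name, the 200 ms threshold and the sample latencies are hypothetical, and this is not meant as the exam's expected answer.

```python
def latency_sli(latencies_ms, threshold_ms):
    """SLI as a proportion: good events / valid events.

    A request is "good" if it completed within threshold_ms.
    Returns None when there were no valid events in the period.
    """
    if not latencies_ms:
        return None
    good = sum(1 for ms in latencies_ms if ms <= threshold_ms)
    return good / len(latencies_ms)


print(latency_sli([120, 80, 450, 95, 300], threshold_ms=200))  # 0.6
```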