From 9765410f1175d5765e9a8855547e001ca66146c4 Mon Sep 17 00:00:00 2001
From: Alex Soul
Date: Tue, 23 Feb 2021 09:53:47 +0000
Subject: [PATCH] Passed exam, w00t

---
 Part_2.md |  2 +-
 Part_4.md | 10 +++++-----
 exam.md   | 18 ++++++++++++++++++
 3 files changed, 24 insertions(+), 6 deletions(-)
 create mode 100644 exam.md

diff --git a/Part_2.md b/Part_2.md
index 7eda5e9..0b24d80 100644
--- a/Part_2.md
+++ b/Part_2.md
@@ -318,7 +318,7 @@ So, if we take A = S/T, S = A*T and change our target availability to 99.8%
 100% - 99.8% = 0.2%, 0.2% (in decimal) == 0.002
-Sucessful Requests (Really this is allowed errors based on target availability) = 0.002 * 1,000,000 = 2000 Errors, 1,000,000 - 2000 = 998,000 Successful requests
+Successful Requests (Really this is allowed errors based on target availability) = 0.002 * 1,000,000 = 2000 Errors, 1,000,000 - 2000 = 998,000 Successful requests
 ##### Error budgets, what are they good for?
diff --git a/Part_4.md b/Part_4.md
index ea80422..d05d295 100644
--- a/Part_4.md
+++ b/Part_4.md
@@ -822,14 +822,14 @@ sudo service google-fluentd start
 Logs Viewer Query Interface
-- View logsa through queries
+- View logs through queries
 - Basic and Advanced query interface
   - Basic
     - Dropdown menus - simple searches
   - Advanced
     - View across log categories - advanced search capabilities
-Basic and Advanced Filter Queries
+Basic and Advanced Filter Queries
 - Different query formats
   - Search field syntax fifferent for each method
 - Basic query
@@ -915,7 +915,7 @@ sudo service google-fluentd start
 #### Routing and Exporting Logs
 - Main premise - route a copy of logs from Cloud Logging to somewhere else
-  - BigQuery, Clous Storage, Pub/Sub, another logging bucket and more
+  - BigQuery, Cloud Storage, Pub/Sub, another logging bucket and more
 - Can export all logs, or certain logs based on defined criteria
 Why Route/Export logs?
@@ -1017,7 +1017,7 @@ Custom logs based distribution metrics
 - https://sre.google/workbook/alerting-on-slos/
 Alerts Review - Why we need them
-- Somethign is not working correctly
+- Something is not working correctly
 - Action is necessary to fix it
 - Alerts inform relevant personnel that action is necessary when specified conditions met
@@ -1038,7 +1038,7 @@ Precision | Recall | Detection time | Reset time
 - Reset time: How long alerts persist after issue is resolved
   - Longer reset time = confusion/'white noise'
-How to we balance these parameters?
+How do we balance these parameters?
 - Window Length: Time period measured
   - % of errors over (x) time period
   - Example: average CPU utilization per minute vs. per hour
diff --git a/exam.md b/exam.md
new file mode 100644
index 0000000..4ab6020
--- /dev/null
+++ b/exam.md
@@ -0,0 +1,18 @@
+
+Practice these scenario type questions:
+1. You look after a system with a well-defined SLO...
+2. You look after a web-site that's experiencing latency xyz?
+3. How would you define an SLI for an application where you've tracked down high latency to the record generating systems, where the Persistent disk is resized to fix?
+    1. IO
+    2. A proportion
+    3. 
+4. Kubernetes deployment strategy to roll out new version of application to half of the web nodes
+    1. StatefulSet
+    2. ReplicaSet
+    3. Rolling-release with ???
+5. How to track billing of systems?
+    1. Simply examine them within the cloud console?
+    2. Add labels to groupings of resources and export to BigQuery?
+6. So as not to impact 3rd-party developers and users, how would you plan the roll-out of an updated API?
+    1. Steps to follow e.g. Announce new API, notify user still using old one, deprecate old one, provide support
+    2. 
\ No newline at end of file
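
Reviewer note on the Part_2.md hunk: the error-budget arithmetic it touches (A = S/T, 99.8% target, 1,000,000 requests) can be checked with a minimal Python sketch. The function name `error_budget` and its parameters are illustrative assumptions, not part of the repository; the target is passed as a string so `Decimal` can represent it exactly.

```python
from decimal import Decimal


def error_budget(total_requests, target_availability_pct):
    """Return (allowed_errors, required_successes) for an availability target.

    Hypothetical helper for checking the notes' arithmetic.
    target_availability_pct is a string such as "99.8".
    """
    # A: target availability as a fraction, e.g. 0.998
    target = Decimal(target_availability_pct) / Decimal(100)
    # Allowed errors = (1 - A) * T, rounded down to whole requests
    allowed_errors = int((Decimal(1) - target) * total_requests)
    # S = T - allowed errors (equivalently A * T when it divides evenly)
    required_successes = total_requests - allowed_errors
    return allowed_errors, required_successes


print(error_budget(1_000_000, "99.8"))  # (2000, 998000)
```

This reproduces the figures in the changed line: 2000 allowed errors and 998,000 successful requests.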