From 64faa6a4257d55547523281b7f8eaf646e2c77b7 Mon Sep 17 00:00:00 2001
From: Dzmitry Kliapkou <51699361+dzmitrykliapkou@users.noreply.github.com>
Date: Fri, 27 Mar 2026 15:16:18 +0300
Subject: [PATCH] Create dzmitry-kliapkou.md

---
 progress/dzmitry-kliapkou.md | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)
 create mode 100644 progress/dzmitry-kliapkou.md

diff --git a/progress/dzmitry-kliapkou.md b/progress/dzmitry-kliapkou.md
new file mode 100644
index 000000000..33dd7126a
--- /dev/null
+++ b/progress/dzmitry-kliapkou.md
@@ -0,0 +1,22 @@
+2026-03-27 Fri:
+
+This week's main focus was onboarding into the project: understanding the existing infrastructure, workflows, and operational practices. I reviewed the current Ansible setup, monitoring stack, alerting rules, and deployment processes to build a complete picture of the system.
+
+On the infrastructure side, I reworked the inventory structure. Previously, the project used monolithic TOML inventory files in which hosts, groups, and variables were mixed together. I migrated this to a YAML-based inventory with hosts and groups kept in the inventory itself and variables split out into group_vars and host_vars. This improves maintainability and reduces the risk of configuration errors.
+
+During this migration, I encountered an issue with boolean values in the generated Harmony TOML configuration: values defined as booleans in YAML were rendered by the template as capitalized True/False, while TOML accepts only lowercase true/false. I updated the template to normalize boolean values to lowercase, ensuring compatibility with the application config format.
+
+I also improved the monitoring configuration:
+
+- Improved generation of Prometheus scrape configs via Ansible templates
+- Added an alert for missing log ingestion (Promtail not sending logs)
+- Excluded archival nodes from disk usage percentage alerts, since their large disks made percentage-based alerts noisy and not actionable
+
+On the deployment side, I extended the existing support for deploying local Harmony binaries. Previously this was available only in the upgrade playbook; I added the same capability to the install playbook, which makes it easier to test development builds.
+
+Operationally, I handled routine on-call activities:
+- Responded to Grafana alerts and investigated triggered issues
+- Updated the Harmony app on several devnet hosts at developers' request
+- Continued learning how the system behaves during real incidents
+
+Overall, the week was focused on improving infrastructure clarity, fixing configuration edge cases, and establishing a more maintainable foundation for further work.