From 64faa6a4257d55547523281b7f8eaf646e2c77b7 Mon Sep 17 00:00:00 2001 From: Dzmitry Kliapkou <51699361+dzmitrykliapkou@users.noreply.github.com> Date: Fri, 27 Mar 2026 15:16:18 +0300 Subject: [PATCH] Create dzmitry-kliapkou.md --- progress/dzmitry-kliapkou.md | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) create mode 100644 progress/dzmitry-kliapkou.md diff --git a/progress/dzmitry-kliapkou.md b/progress/dzmitry-kliapkou.md new file mode 100644 index 000000000..33dd7126a --- /dev/null +++ b/progress/dzmitry-kliapkou.md @@ -0,0 +1,22 @@ +2026-03-27 Fri: + +This week main focus was onboarding into the project, understanding existing infrastructure, workflows and operational practices. I spent time reviewing current Ansible setup, monitoring stack, alerting rules and deployment processes to build a complete picture of the system. + +From the infrastructure side, I reworked the inventory structure. Previously, the project was using monolithic TOML inventory files where hosts, groups and variables were mixed together. I migrated this to a YAML-based inventory with proper separation into group_vars and host_vars. This significantly improves maintainability and reduces risk of configuration errors. + +During this migration, I encountered an issue with boolean values being rendered incorrectly in the Harmony TOML configuration (uppercase True/False coming from YAML). I updated the template to safely normalize boolean values to lowercase, ensuring compatibility with the application config format. + +Additionally, I improved monitoring configuration: + +- Improved generation of Prometheus scrape configs via Ansible templates +- Added alert for missing log ingestion (promtail not sending logs) +- Excluded archival nodes from disk usage percentage alerts, since their large disks made percentage-based alerts noisy and not actionable + +From the deployment side, I extended existing functionality for using local Harmony binaries. Previously this was supported only in the upgrade playbook; I added the same capability to the install playbook, enabling more flexible testing of development builds. + +Operationally, I handled routine on-call activities: +- Responded to Grafana alerts and investigated triggered issues +- Performed Harmony app updates on several devnet hosts upon developer request +- Continued familiarization with system behavior under real incidents + +Overall, the week was focused on improving infrastructure clarity, fixing configuration edge cases, and establishing a more maintainable foundation for further work.