refactor: replace the collectd mapper with local flows#4136
refactor: replace the collectd mapper with local flows#4136didier-wenzek wants to merge 14 commits intothin-edge:mainfrom
Conversation
Robot Results
Failed Tests
|
a8ba280 to
e460f13
Compare
e460f13 to
e3a4f91
Compare
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! 🚀 New features to boost your workflow:
|
| Check running collectd | ||
| Service Should Be Running collectd | ||
|
|
||
| Is collectd publishing MQTT messages? | ||
| ${messages}= Should Have MQTT Messages topic=collectd/# minimum=1 maximum=None | ||
|
|
There was a problem hiding this comment.
These checks have been moved to the suite setup. They are not system tests but prerequisite checks.
| @@ -0,0 +1,18 @@ | |||
| // Translate an array of thin-edge measurements into a single message with grouped measurements | |||
There was a problem hiding this comment.
I consider promoting this script to a builtin transformer: grouping measurements is a quite common use case.
There was a problem hiding this comment.
That would be a good addition. The input topic would also have to be in the tedge measurements format, right?
There was a problem hiding this comment.
The input topic would also have to be in the tedge measurements format, right?
I still have to see what's the best approach to decompose the flow into independent steps that can be re-used / re-combined.
| let batch_config = BatchConfigBuilder::new() | ||
| .event_jitter(500) | ||
| .delivery_jitter(400) // Heuristic delay that should work out well on a Rpi | ||
| .message_leap_limit(500) | ||
| .build(); |
There was a problem hiding this comment.
These configuration parameters are simply hairy to get correct, the batching algorithm being complicated and fragile with two independent clocks (batching time and event time). This complexity is furthermore useless in practice, the best way to avoid too many collectd measurements being to reduce the collection frequency.
dedbc3a to
e00dafc
Compare
| fn event_time(&self) -> OffsetDateTime { | ||
| self.timestamp | ||
| .map(|t| t.into()) | ||
| .unwrap_or_else(OffsetDateTime::now_utc) | ||
| } |
There was a problem hiding this comment.
This defeats the purpose of the batcher: i.e using the processing time to batch message, ignoring the event time.
I will fix that with a flow config giving a JSON path to the message event time.
There was a problem hiding this comment.
Finally, this code will be let unchanged, i.e. the timestamp of the message being used as the event time.
For that purpose the flow engine has been improved. Now, message timestamps can be set by a script and used by the subsequent steps.
| type Key = String; | ||
|
|
||
| fn key(&self) -> Self::Key { | ||
| self.topic.clone() |
There was a problem hiding this comment.
Using the topic as a key make the batcher transformer a bit complicated to use: the previous step needs to be aware that the topic is used as a key by the batcher.
I will fix that with a way to configure the transformer with a path to the key.
rina23q
left a comment
There was a problem hiding this comment.
I guess still code is in progress, so I reviewed the documentation changes.
| ## Enabling the systemd watchdog feature for a tedge service | ||
|
|
||
| Enabling systemd watchdog for a %%te%% service (tedge-agent, tedge-mapper-c8y/az/collectd) is a two-step process. | ||
| Enabling systemd watchdog for a %%te%% service (tedge-agent, tedge-mapper-c8y/az) is a two-step process. |
There was a problem hiding this comment.
(unrelated) Out of curiosity, The watchdog feature doesn't work also for aws and local mappers?
I found this document is actually unlisted, so maybe it doesn't matter how precise information we have here.
| Run `tedge_mapper --debug aws` to log more debug messages | ||
| ::: | ||
|
|
||
| ### Device monitoring logs {#device-logs} |
There was a problem hiding this comment.
Probably we can also remove the section "Collectd logs" below (inside "Third-party component logs"). Instead, we can mention how to see the log of collectd in the new repository for the collectd flow.
| Install Collectd | ||
| Execute Command | ||
| ... sudo apt-get update && sudo apt-get install -y --no-install-recommends collectd-core libmosquitto1 | ||
| Execute Command sudo cp /etc/tedge/contrib/collectd/collectd.conf /etc/collectd/collectd.conf |
There was a problem hiding this comment.
I guess this collectd.conf file is also a candidate to move out of the thin-edge.io repository later.
e00dafc to
9bb4852
Compare
| ${start_time}= Get Unix Timestamp | ||
| Execute Command | ||
| ... tedge mqtt pub collectd/localhost/temperature/temp1 "`date +%s.%N`:50" && tedge mqtt pub collectd/localhost/temperature/temp2 "`date +%s.%N`:40" && tedge mqtt pub collectd/localhost/pressure/pres1 "`date +%s.%N`:10" && tedge mqtt pub collectd/localhost/pressure/pres2 "`date +%s.%N`:20" | ||
| Execute Command tedge mqtt pub collectd/localhost/temperature/temp1 "`date +%s.%N`:50" |
There was a problem hiding this comment.
IIRC, the chaining of the pub commands in the former version was done to minimise the timestamp drift between the first measurement and the last measurement, to make sure that they get batched together. So, this version might end up being flaky.
This was the point that we always wanted to improve, at least by making the batch window configurable, so that we don't have to rely on fragile tricks like that.
There was a problem hiding this comment.
So, this version might end up being flaky.
This is not what I observe:
$ invoke flake-finder --test-name "Check grouping of measurements" --iterations 25 --outputdir output_ff --clean
------------------------------
Overall: PASSED
Results: 25 iterations, 25 passed, 0 failed
Elapsed time: 0:10:34.572869
This was the point that we always wanted to improve, at least by making the batch window configurable,
I will do that in this PR.
| let grouped_measurements = {} | ||
|
|
||
| for (let measurement of measurements) { | ||
| Object.assign(grouped_measurements, measurement.payload) |
There was a problem hiding this comment.
That would get the timestamp wrong, no? The timestamp of the last measurement would replace the former ones, while the expectation is for the batch to keep the timestamp of the first measurement that triggered the batch.
There was a problem hiding this comment.
I would not say this is wrong. Not your expectation apparently, but as meaningful.
Anyway, this script is only used for testing purposes and will be replaced by a builtin transformer.
An the joke side. According this comment, messages must happen in a blink to be batched! So first or last timestamp should not be a big deal ;-)
| @@ -0,0 +1,18 @@ | |||
| // Translate an array of thin-edge measurements into a single message with grouped measurements | |||
There was a problem hiding this comment.
That would be a good addition. The input topic would also have to be in the tedge measurements format, right?
The fact that the batcher is always initialized with default values highlights that something is wrong. It's really difficult to pick sensible values. And in practice nobody care. Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
The following sections have still to be deeply rewritten: - Getting Started / A tour of thin-edge.io / Monitor the device - Getting Started / Monitoring - Operate Devices / Troubleshooting / Device Monitoring Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
A value created in JS using `new Date()` cannot be directly returned to the flow engine. A script has then to invoke `valueOf()` method to assign a date to a message timestamp. Signed-off-by: Didier Wenzek <didier.wenzek@free.fr>
89338e2 to
080f3a6
Compare
Proposed changes
The goal of this PR is to deprecate the collectd mapper providing the tools to implement the same using flows
Types of changes
Paste Link to the issue
Checklist
just prepare-devonce)just formatas mentioned in CODING_GUIDELINESjust checkas mentioned in CODING_GUIDELINESFurther comments