Merge ~ballot/charm-telegraf/+git/telegraf-charm:collection_jitter into charm-telegraf:master

Proposed by Benjamin Allot
Status: Merged
Approved by: James Troup
Approved revision: e0c262d0c2a2c78e03f6de1ac1caef784ea2e2bd
Merged at revision: 1c088cc23a77b07dbc2b9cb0544c4b31a8dfb9ef
Proposed branch: ~ballot/charm-telegraf/+git/telegraf-charm:collection_jitter
Merge into: charm-telegraf:master
Diff against target: 21 lines (+2/-2)
1 file modified
src/config.yaml (+2/-2)
Reviewers:
Canonical IS Reviewers: Pending
BootStack Reviewers: Pending
Review via email: mp+394475@code.launchpad.net

Commit message

Change the default collection_jitter value to "5s"

Canonical IS Mergebot (canonical-is-mergebot) wrote :

This merge proposal is being monitored by mergebot. Change the status to Approved to merge.

James Troup (elmo) wrote :

What (range of) latency will this introduce to Telegraf metrics displayed in Grafana?

Benjamin Allot (ballot) wrote :

> What (range of) latency will this introduce to Telegraf metrics displayed in
> Grafana?

Instead of collecting exactly every <interval> (10 seconds by default), each plugin will collect somewhere between <interval> and <interval+collection_jitter>, independently *for each plugin*. So the same Telegraf instance could gather bcache metrics at, say, 11 seconds and procstat metrics at 13 seconds.

The jitter has nanosecond precision, though [0][1].

[0]: https://github.com/influxdata/telegraf/blob/v1.16.2/agent/tick.go#L35
[1]: https://golang.org/pkg/time/#Duration

I'll try to capture a before/after graph snapshot with the change applied both to a compute node and to the Telegraf instances in all of its VMs.
Changing only the compute node itself hasn't been conclusive.

Benjamin Allot (ballot) wrote :

A compute node hosting a significant number of VMs running Telegraf has been configured with the jitter, and the results are really good: the periodic spike (every 1h45) has disappeared.

For me, this is a big +1 for this change.

Canonical IS Mergebot (canonical-is-mergebot) wrote :

Change successfully merged at revision 1c088cc23a77b07dbc2b9cb0544c4b31a8dfb9ef

Preview Diff

diff --git a/src/config.yaml b/src/config.yaml
index 0d02176..66f6105 100644
--- a/src/config.yaml
+++ b/src/config.yaml
@@ -39,14 +39,14 @@ options:
       interval. Maximum flush_interval will be flush_interval + flush_jitter
   flush_jitter:
     type: string
-    default: "0s"
+    default: "5s"
     description: >
       Jitter the flush interval by a random amount. This is primarily to avoid
       large write spikes for users running a large number of telegraf instances.
       ie, a jitter of 5s and interval 10s means flushes will happen every 10-15s
   collection_jitter:
     type: string
-    default: "0s"
+    default: "5s"
     description: >
       Collection jitter is used to jitter the collection by a random amount.
       Each plugin will sleep for a random time within jitter before collecting.
