Merge ~ballot/content-cache-charm/+git/content-cache-charm:prometheus.lua into content-cache-charm:master

Proposed by Benjamin Allot
Status: Merged
Approved by: Benjamin Allot
Approved revision: 96337dcd1b7eaca2a9cefc479e14e53236129bf3
Merged at revision: 3db5170393c32945341cc953684baa807fa9a4aa
Proposed branch: ~ballot/content-cache-charm/+git/content-cache-charm:prometheus.lua
Merge into: content-cache-charm:master
Diff against target: 633 lines (+627/-0)
1 file modified
files/prometheus.lua (+627/-0)
Reviewer                Review Type  Date Requested  Status
Joel Sing (community)   +1                           Approve
Canonical IS Reviewers                               Pending
Review via email: mp+372178@code.launchpad.net

Commit message

Preparing Nginx metrics, use the prometheus.lua library

🤖 Canonical IS Merge Bot (canonical-is-mergebot) wrote :

This merge proposal is being monitored by mergebot. Change the status to Approved to merge.

Benjamin Allot (ballot) wrote :

I was wondering whether a subtree or a submodule would be better. Since it is only one file, I was not sure, so I went the easiest way.
Happy to convert to a subtree/submodule if needed.

Joel Sing (jsing) wrote :

LGTM

There are a few style nits, but since this is a copy I'm ignoring these.

Also, be aware that there may be a bug (or possibly intentional behaviour) with histograms, where they cannot be reset.

review: Approve (+1)
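
One possible reading of the histogram remark above, sketched under assumptions rather than confirmed by the reviewer: the diff defines `Counter:reset()` and `Gauge:reset()` but no `Histogram:reset()`, and even a direct call to `Prometheus:reset(name)` would match only the `_bucket` keys, because `short_metric_name` strips the `_bucket` suffix but leaves `_count` and `_sum` intact. The snippet copies `short_metric_name` from the diff; the key list and the `keys_matched_by_reset` helper are hypothetical stand-ins for the shared dictionary and the loop in `Prometheus:reset`.

```lua
-- Copied from files/prometheus.lua in this merge proposal.
local function short_metric_name(full_name)
  local labels_start, _ = full_name:find("{")
  if not labels_start then
    return full_name
  end
  local suffix_idx, _ = full_name:find("_bucket{")
  if suffix_idx and full_name:find("le=") then
    return full_name:sub(1, suffix_idx - 1)
  end
  return full_name:sub(1, labels_start - 1)
end

-- Hypothetical dictionary keys for a histogram `m1` with one label.
local keys = {
  'm1_bucket{site="site1",le="Inf"}',
  'm1_count{site="site1"}',
  'm1_sum{site="site1"}',
}

-- Hypothetical helper: mimics the name comparison in Prometheus:reset(name)
-- without touching an Nginx shared dictionary.
local function keys_matched_by_reset(name)
  local matched = {}
  for _, key in ipairs(keys) do
    if short_metric_name(key) == name then
      table.insert(matched, key)
    end
  end
  return matched
end

local matched = keys_matched_by_reset("m1")
for _, key in ipairs(matched) do
  print(key)
end
```

Only the bucket key is printed, so under this reading the `_count` and `_sum` samples would survive a reset by histogram name.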
🤖 Canonical IS Merge Bot (canonical-is-mergebot) wrote :

Change successfully merged at revision 3db5170393c32945341cc953684baa807fa9a4aa

Preview Diff

1diff --git a/files/prometheus.lua b/files/prometheus.lua
2new file mode 100644
3index 0000000..00b70fd
4--- /dev/null
5+++ b/files/prometheus.lua
6@@ -0,0 +1,627 @@
7+-- vim: ts=2:sw=2:sts=2:expandtab
8+--
9+-- This module uses a single dictionary shared between Nginx workers to keep
10+-- all metrics. Each counter is stored as a separate entry in that dictionary,
11+-- which allows us to increment them using built-in `incr` method.
12+--
13+-- Prometheus requires that (a) all samples for a given metric are presented
14+-- as one uninterrupted group, and (b) buckets of a histogram appear in
15+-- increasing numerical order. We satisfy that by carefully constructing full
16+-- metric names (i.e. metric name along with all labels) so that they meet
17+-- those requirements while being sorted alphabetically. In particular:
18+--
19+-- * all labels for a given metric are presented in reproducible order (the one
20+-- used when labels were declared). "le" label for histogram metrics always
21+-- goes last;
22+-- * bucket boundaries (which are exposed as values of the "le" label) are
23+-- presented as floating point numbers with leading and trailing zeroes.
24+-- Number of zeroes is determined for each bucketer automatically based on
25+-- bucket boundaries;
26+-- * internally "+Inf" bucket is stored as "Inf" (to make it appear after
27+-- all numeric buckets), and gets replaced by "+Inf" just before we
28+-- expose the metrics.
29+--
30+-- For example, if you define your bucket boundaries as {0.00005, 10, 1000}
31+-- then we will keep the following samples for a metric `m1` with label
32+-- `site` set to `site1`:
33+--
34+-- m1_bucket{site="site1",le="0000.00005"}
35+-- m1_bucket{site="site1",le="0010.00000"}
36+-- m1_bucket{site="site1",le="1000.00000"}
37+-- m1_bucket{site="site1",le="Inf"}
38+-- m1_count{site="site1"}
39+-- m1_sum{site="site1"}
40+--
41+-- "Inf" will be replaced by "+Inf" while publishing metrics.
42+--
43+-- You can find the latest version and documentation at
44+-- https://github.com/knyar/nginx-lua-prometheus
45+-- Released under MIT license.
46+
47+
48+-- Default set of latency buckets, 5ms to 10s:
49+local DEFAULT_BUCKETS = {0.005, 0.01, 0.02, 0.03, 0.05, 0.075, 0.1, 0.2, 0.3,
50+ 0.4, 0.5, 0.75, 1, 1.5, 2, 3, 4, 5, 10}
51+
52+-- Metric is a "parent class" for all metrics.
53+local Metric = {}
54+function Metric:new(o)
55+ o = o or {}
56+ setmetatable(o, self)
57+ self.__index = self
58+ return o
59+end
60+
61+-- Checks that the right number of label values has been passed.
62+--
63+-- Args:
64+-- label_values: an array of label values.
65+--
66+-- Returns:
67+-- an error message or nil
68+function Metric:check_label_values(label_values)
69+ if self.label_names == nil and label_values == nil then
70+ return
71+ elseif self.label_names == nil and label_values ~= nil then
72+ return "Expected no labels for " .. self.name .. ", got " .. #label_values
73+ elseif label_values == nil and self.label_names ~= nil then
74+ return "Expected " .. #self.label_names .. " labels for " ..
75+ self.name .. ", got none"
76+ elseif #self.label_names ~= #label_values then
77+ return "Wrong number of labels for " .. self.name .. ". Expected " ..
78+ #self.label_names .. ", got " .. #label_values
79+ else
80+ for i, k in ipairs(self.label_names) do
81+ if label_values[i] == nil then
82+ return "Unexpected nil value for label " .. k .. " of " .. self.name
83+ end
84+ end
85+ end
86+end
87+
88+local Counter = Metric:new()
89+-- Increase a given counter by `value`
90+--
91+-- Args:
92+-- value: (number) a value to add to the counter. Defaults to 1 if skipped.
93+-- label_values: an array of label values. Can be nil (i.e. not defined) for
94+-- metrics that have no labels.
95+function Counter:inc(value, label_values)
96+ local err = self:check_label_values(label_values)
97+ if err ~= nil then
98+ self.prometheus:log_error(err)
99+ return
100+ end
101+ if value ~= nil and value < 0 then
102+ self.prometheus:log_error_kv(self.name, value, "Value should not be negative")
103+ return
104+ end
105+
106+ self.prometheus:inc(self.name, self.label_names, label_values, value or 1)
107+end
108+
109+-- Delete a given counter
110+--
111+-- Args:
112+-- label_values: an array of label values. Can be nil (i.e. not defined) for
113+-- metrics that have no labels.
114+function Counter:del(label_values)
115+ local err = self:check_label_values(label_values)
116+ if err ~= nil then
117+ self.prometheus:log_error(err)
118+ return
119+ end
120+ self.prometheus:set(self.name, self.label_names, label_values, nil)
121+end
122+
123+-- Delete all metrics for this counter. If this counter has no labels, it is
124+-- just the same as the Counter:del() function. If this counter has labels, it
125+-- will delete all the metrics with different label values.
126+function Counter:reset()
127+ self.prometheus:reset(self.name)
128+end
129+
130+local Gauge = Metric:new()
131+-- Set a given gauge to `value`
132+--
133+-- Args:
134+-- value: (number) a value to set the gauge to. Should be defined.
135+-- label_values: an array of label values. Can be nil (i.e. not defined) for
136+-- metrics that have no labels.
137+function Gauge:set(value, label_values)
138+ if value == nil then
139+ self.prometheus:log_error("No value passed for " .. self.name)
140+ return
141+ end
142+ local err = self:check_label_values(label_values)
143+ if err ~= nil then
144+ self.prometheus:log_error(err)
145+ return
146+ end
147+ self.prometheus:set(self.name, self.label_names, label_values, value)
148+end
149+
150+-- Delete a given gauge
151+--
152+-- Args:
153+-- label_values: an array of label values. Can be nil (i.e. not defined) for
154+-- metrics that have no labels.
155+function Gauge:del(label_values)
156+ local err = self:check_label_values(label_values)
157+ if err ~= nil then
158+ self.prometheus:log_error(err)
159+ return
160+ end
161+ self.prometheus:set(self.name, self.label_names, label_values, nil)
162+end
163+
164+-- Delete all metrics for this gauge. If this gauge has no labels, it is
165+-- just the same as the Gauge:del() function. If this gauge has labels, it
166+-- will delete all the metrics with different label values.
167+function Gauge:reset()
168+ self.prometheus:reset(self.name)
169+end
170+
171+-- Increase a given gauge by `value`
172+--
173+-- Args:
174+-- value: (number) a value to add to the gauge (a negative value when you
175+-- need to decrease the value of the gauge). Defaults to 1 if skipped.
176+-- label_values: an array of label values. Can be nil (i.e. not defined) for
177+-- metrics that have no labels.
178+function Gauge:inc(value, label_values)
179+ local err = self:check_label_values(label_values)
180+ if err ~= nil then
181+ self.prometheus:log_error(err)
182+ return
183+ end
184+ self.prometheus:inc(self.name, self.label_names, label_values, value or 1)
185+end
186+
187+local Histogram = Metric:new()
188+-- Record a given value in a histogram.
189+--
190+-- Args:
191+-- value: (number) a value to record. Should be defined.
192+-- label_values: an array of label values. Can be nil (i.e. not defined) for
193+-- metrics that have no labels.
194+function Histogram:observe(value, label_values)
195+ if value == nil then
196+ self.prometheus:log_error("No value passed for " .. self.name)
197+ return
198+ end
199+ local err = self:check_label_values(label_values)
200+ if err ~= nil then
201+ self.prometheus:log_error(err)
202+ return
203+ end
204+ self.prometheus:histogram_observe(self.name, self.label_names, label_values, value)
205+end
206+
207+local Prometheus = {}
208+Prometheus.__index = Prometheus
209+Prometheus.initialized = false
210+
211+-- Generate full metric name that includes all labels.
212+--
213+-- Args:
214+-- name: string
215+-- label_names: (array) a list of label keys.
216+-- label_values: (array) a list of label values.
217+-- Returns:
218+-- (string) full metric name.
219+local function full_metric_name(name, label_names, label_values)
220+ if not label_names then
221+ return name
222+ end
223+ local label_parts = {}
224+ for idx, key in ipairs(label_names) do
225+ local label_value = (string.format("%s", label_values[idx])
226+ :gsub("[^\032-\126]", "") -- strip non-printable characters
227+ :gsub("\\", "\\\\")
228+ :gsub('"', '\\"'))
229+ table.insert(label_parts, key .. '="' .. label_value .. '"')
230+ end
231+ return name .. "{" .. table.concat(label_parts, ",") .. "}"
232+end
233+
234+-- Construct bucket format for a list of buckets.
235+--
236+-- This receives a list of buckets and returns a sprintf template that should
237+-- be used for bucket boundaries to make them come in increasing order when
238+-- sorted alphabetically.
239+--
240+-- To re-phrase, this is where we detect how many leading and trailing zeros we
241+-- need.
242+--
243+-- Args:
244+-- buckets: a list of buckets
245+--
246+-- Returns:
247+-- (string) a sprintf template.
248+local function construct_bucket_format(buckets)
249+ local max_order = 1
250+ local max_precision = 1
251+ for _, bucket in ipairs(buckets) do
252+ assert(type(bucket) == "number", "bucket boundaries should be numeric")
253+ -- floating point number with all trailing zeros removed
254+ local as_string = string.format("%f", bucket):gsub("0*$", "")
255+ local dot_idx = as_string:find(".", 1, true)
256+ max_order = math.max(max_order, dot_idx - 1)
257+ max_precision = math.max(max_precision, as_string:len() - dot_idx)
258+ end
259+ return "%0" .. (max_order + max_precision + 1) .. "." .. max_precision .. "f"
260+end
261+
262+-- Extract short metric name from the full one.
263+--
264+-- Args:
265+-- full_name: (string) full metric name that can include labels.
266+--
267+-- Returns:
268+-- (string) short metric name with no labels. For a `*_bucket` metric of
269+-- histogram the _bucket suffix will be removed.
270+local function short_metric_name(full_name)
271+ local labels_start, _ = full_name:find("{")
272+ if not labels_start then
273+ -- no labels
274+ return full_name
275+ end
276+ local suffix_idx, _ = full_name:find("_bucket{")
277+ if suffix_idx and full_name:find("le=") then
278+ -- this is a histogram metric
279+ return full_name:sub(1, suffix_idx - 1)
280+ end
281+ -- this is not a histogram metric
282+ return full_name:sub(1, labels_start - 1)
283+end
284+
285+-- Makes a shallow copy of a table
286+local function copy_table(table)
287+ local new = {}
288+ if table ~= nil then
289+ for k, v in ipairs(table) do
290+ new[k] = v
291+ end
292+ end
293+ return new
294+end
295+
296+-- Check metric name and label names for correctness.
297+--
298+-- Regular expressions to validate metric and label names are
299+-- documented in https://prometheus.io/docs/concepts/data_model/
300+--
301+-- Args:
302+-- metric_name: (string) metric name.
303+-- label_names: label names (array of strings).
304+--
305+-- Returns:
306+-- Either an error string, or nil if no errors were found.
307+local function check_metric_and_label_names(metric_name, label_names)
308+ if not metric_name:match("^[a-zA-Z_:][a-zA-Z0-9_:]*$") then
309+ return "Metric name '" .. metric_name .. "' is invalid"
310+ end
311+ for _, label_name in ipairs(label_names or {}) do
312+ if label_name == "le" then
313+ return "Invalid label name 'le' in " .. metric_name
314+ end
315+ if not label_name:match("^[a-zA-Z_][a-zA-Z0-9_]*$") then
316+ return "Metric '" .. metric_name .. "' label name '" .. label_name ..
317+ "' is invalid"
318+ end
319+ end
320+end
321+
322+-- Initialize the module.
323+--
324+-- This should be called once from the `init_by_lua` section in nginx
325+-- configuration.
326+--
327+-- Args:
328+-- dict_name: (string) name of the nginx shared dictionary which will be
329+-- used to store all metrics
330+-- prefix: (optional string) if supplied, prefix is added to all
331+-- metric names on output
332+--
333+-- Returns:
334+-- an object that should be used to register metrics.
335+function Prometheus.init(dict_name, prefix)
336+ local self = setmetatable({}, Prometheus)
337+ dict_name = dict_name or "prometheus_metrics"
338+ self.dict = ngx.shared[dict_name]
339+ if self.dict == nil then
340+ ngx.log(ngx.ERR,
341+ "Dictionary '", dict_name, "' does not seem to exist. ",
342+ "Please define the dictionary using `lua_shared_dict`.")
343+ return self
344+ end
345+ self.help = {}
346+ if prefix then
347+ self.prefix = prefix
348+ else
349+ self.prefix = ''
350+ end
351+ self.type = {}
352+ self.registered = {}
353+ self.buckets = {}
354+ self.bucket_format = {}
355+ self.initialized = true
356+
357+ self:counter("nginx_metric_errors_total",
358+ "Number of nginx-lua-prometheus errors")
359+ self.dict:set("nginx_metric_errors_total", 0)
360+ return self
361+end
362+
363+function Prometheus:log_error(...)
364+ ngx.log(ngx.ERR, ...)
365+ self.dict:incr("nginx_metric_errors_total", 1)
366+end
367+
368+function Prometheus:log_error_kv(key, value, err)
369+ self:log_error(
370+ "Error while setting '", key, "' to '", value, "': '", err, "'")
371+end
372+
373+-- Register a counter.
374+--
375+-- Args:
376+-- name: (string) name of the metric. Required.
377+-- description: (string) description of the metric. Will be used for the HELP
378+-- comment on the metrics page. Optional.
379+-- label_names: array of strings, defining a list of label names. Optional.
380+--
381+-- Returns:
382+-- a Counter object.
383+function Prometheus:counter(name, description, label_names)
384+ if not self.initialized then
385+ ngx.log(ngx.ERR, "Prometheus module has not been initialized")
386+ return
387+ end
388+
389+ local err = check_metric_and_label_names(name, label_names)
390+ if err ~= nil then
391+ self:log_error(err)
392+ return
393+ end
394+
395+ if self.registered[name] then
396+ self:log_error("Duplicate metric " .. name)
397+ return
398+ end
399+ self.registered[name] = true
400+ self.help[name] = description
401+ self.type[name] = "counter"
402+
403+ return Counter:new{name=name, label_names=label_names, prometheus=self}
404+end
405+
406+-- Register a gauge.
407+--
408+-- Args:
409+-- name: (string) name of the metric. Required.
410+-- description: (string) description of the metric. Will be used for the HELP
411+-- comment on the metrics page. Optional.
412+-- label_names: array of strings, defining a list of label names. Optional.
413+--
414+-- Returns:
415+-- a Gauge object.
416+function Prometheus:gauge(name, description, label_names)
417+ if not self.initialized then
418+ ngx.log(ngx.ERR, "Prometheus module has not been initialized")
419+ return
420+ end
421+
422+ local err = check_metric_and_label_names(name, label_names)
423+ if err ~= nil then
424+ self:log_error(err)
425+ return
426+ end
427+
428+ if self.registered[name] then
429+ self:log_error("Duplicate metric " .. name)
430+ return
431+ end
432+ self.registered[name] = true
433+ self.help[name] = description
434+ self.type[name] = "gauge"
435+
436+ return Gauge:new{name=name, label_names=label_names, prometheus=self}
437+end
438+
439+-- Register a histogram.
440+--
441+-- Args:
442+-- name: (string) name of the metric. Required.
443+-- description: (string) description of the metric. Will be used for the HELP
444+-- comment on the metrics page. Optional.
445+-- label_names: array of strings, defining a list of label names. Optional.
446+-- buckets: array of numbers, defining bucket boundaries. Optional.
447+--
448+-- Returns:
449+-- a Histogram object.
450+function Prometheus:histogram(name, description, label_names, buckets)
451+ if not self.initialized then
452+ ngx.log(ngx.ERR, "Prometheus module has not been initialized")
453+ return
454+ end
455+
456+ local err = check_metric_and_label_names(name, label_names)
457+ if err ~= nil then
458+ self:log_error(err)
459+ return
460+ end
461+
462+ for _, suffix in ipairs({"", "_bucket", "_count", "_sum"}) do
463+ if self.registered[name .. suffix] then
464+ self:log_error("Duplicate metric " .. name .. suffix)
465+ return
466+ end
467+ self.registered[name .. suffix] = true
468+ end
469+ self.help[name] = description
470+ self.type[name] = "histogram"
471+
472+ self.buckets[name] = buckets or DEFAULT_BUCKETS
473+ self.bucket_format[name] = construct_bucket_format(self.buckets[name])
474+
475+ return Histogram:new{name=name, label_names=label_names, prometheus=self}
476+end
477+
478+-- Set a given dictionary key.
479+-- This overwrites existing values, so it should only be used when initializing
480+-- metrics or when explicitly overwriting the previous value of a metric.
481+function Prometheus:set_key(key, value)
482+ local ok, err = self.dict:safe_set(key, value)
483+ if not ok then
484+ self:log_error_kv(key, value, err)
485+ end
486+end
487+
488+-- Increment a given metric by `value`.
489+--
490+-- Args:
491+-- name: (string) short metric name without any labels.
492+-- label_names: (array) a list of label keys.
493+-- label_values: (array) a list of label values.
494+-- value: (number) value to add (a negative value when you need to decrease
495+-- the value of the gauge). Optional, defaults to 1.
496+function Prometheus:inc(name, label_names, label_values, value)
497+ local key = full_metric_name(name, label_names, label_values)
498+ if value == nil then value = 1 end
499+
500+ local newval, err = self.dict:incr(key, value)
501+ if newval then
502+ return
503+ end
504+ -- Yes, this looks like a race, so I guess we might under-report some values
505+ -- when multiple workers simultaneously try to create the same metric.
506+ -- Hopefully this does not happen too often (shared dictionary does not get
507+-- reset during configuration reload).
508+ if err == "not found" then
509+ self:set_key(key, value)
510+ return
511+ end
512+ -- Unexpected error
513+ self:log_error_kv(key, value, err)
514+end
515+
516+-- Set the current value of a gauge to `value`
517+--
518+-- Args:
519+-- name: (string) short metric name without any labels.
520+-- label_names: (array) a list of label keys.
521+-- label_values: (array) a list of label values.
522+-- value: (number) the new value for the gauge.
523+function Prometheus:set(name, label_names, label_values, value)
524+ local key = full_metric_name(name, label_names, label_values)
525+ self:set_key(key, value)
526+end
527+
528+-- Record a given value into a histogram metric.
529+--
530+-- Args:
531+-- name: (string) short metric name without any labels.
532+-- label_names: (array) a list of label keys.
533+-- label_values: (array) a list of label values.
534+-- value: (number) value to observe.
535+function Prometheus:histogram_observe(name, label_names, label_values, value)
536+ self:inc(name .. "_count", label_names, label_values, 1)
537+ self:inc(name .. "_sum", label_names, label_values, value)
538+
539+ -- we are going to mutate arrays of label names and values, so create a copy.
540+ local l_names = copy_table(label_names)
541+ local l_values = copy_table(label_values)
542+
543+ -- Last bucket. Note, that the label value is "Inf" rather than "+Inf"
544+ -- required by Prometheus. This is necessary for this bucket to be the last
545+ -- one when all metrics are lexicographically sorted. "Inf" will get replaced
546+ -- by "+Inf" in Prometheus:collect().
547+ table.insert(l_names, "le")
548+ table.insert(l_values, "Inf")
549+ self:inc(name .. "_bucket", l_names, l_values, 1)
550+
551+ local label_count = #l_names
552+ for _, bucket in ipairs(self.buckets[name]) do
553+ if value <= bucket then
554+ -- last label is now "le"
555+ l_values[label_count] = self.bucket_format[name]:format(bucket)
556+ self:inc(name .. "_bucket", l_names, l_values, 1)
557+ end
558+ end
559+end
560+
561+-- Delete all metrics in a gauge or counter. If this gauge or counter has labels, it
562+-- will delete all the metrics with different label values.
563+function Prometheus:reset(name)
564+ local keys = self.dict:get_keys(0)
565+ for _, key in ipairs(keys) do
566+ local value, err = self.dict:get(key)
567+ if value then
568+ local short_name = short_metric_name(key)
569+ if name == short_name then
570+ self:set_key(key, nil)
571+ end
572+ else
573+ self:log_error("Error getting '", key, "': ", err)
574+ end
575+ end
576+end
577+
578+-- Prometheus compatible metric data as an array of strings.
579+--
580+-- Returns:
581+-- Array of strings with all metrics in a text format compatible with
582+-- Prometheus.
583+function Prometheus:metric_data()
584+ if not self.initialized then
585+ ngx.log(ngx.ERR, "Prometheus module has not been initialized")
586+ return
587+ end
588+
589+ local keys = self.dict:get_keys(0)
590+ -- Prometheus server expects buckets of a histogram to appear in increasing
591+ -- numerical order of their label values.
592+ table.sort(keys)
593+
594+ local seen_metrics = {}
595+ local output = {}
596+ for _, key in ipairs(keys) do
597+ local value, err = self.dict:get(key)
598+ if value then
599+ local short_name = short_metric_name(key)
600+ if not seen_metrics[short_name] then
601+ if self.help[short_name] then
602+ table.insert(output, string.format("# HELP %s%s %s\n",
603+ self.prefix, short_name, self.help[short_name]))
604+ end
605+ if self.type[short_name] then
606+ table.insert(output, string.format("# TYPE %s%s %s\n",
607+ self.prefix, short_name, self.type[short_name]))
608+ end
609+ seen_metrics[short_name] = true
610+ end
611+ -- Replace "Inf" with "+Inf" in each metric's last bucket 'le' label.
612+ if key:find('le="Inf"', 1, true) then
613+ key = key:gsub('le="Inf"', 'le="+Inf"')
614+ end
615+ table.insert(output, string.format("%s%s %s\n", self.prefix, key, value))
616+ else
617+ self:log_error("Error getting '", key, "': ", err)
618+ end
619+ end
620+ return output
621+end
622+
623+-- Present all metrics in a text format compatible with Prometheus.
624+--
625+-- This function should be used to expose the metrics on a separate HTTP page.
626+-- It will get the metrics from the dictionary, sort them, and expose them
627+-- along with TYPE and HELP comments.
628+function Prometheus:collect()
629+ ngx.header.content_type = "text/plain"
630+ ngx.print(self:metric_data())
631+end
632+
633+return Prometheus
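
The header comment in the diff says bucket boundaries are stored with leading and trailing zeroes so that an alphabetical sort of the keys also yields increasing numerical order. A minimal sketch of that idea, runnable in plain Lua outside Nginx, is below. `construct_bucket_format` is copied from the diff; the `labels` list is a hypothetical stand-in for the `le` values that would otherwise sit inside shared dictionary keys.

```lua
-- Copied from files/prometheus.lua in this merge proposal.
local function construct_bucket_format(buckets)
  local max_order = 1
  local max_precision = 1
  for _, bucket in ipairs(buckets) do
    assert(type(bucket) == "number", "bucket boundaries should be numeric")
    -- floating point number with all trailing zeros removed
    local as_string = string.format("%f", bucket):gsub("0*$", "")
    local dot_idx = as_string:find(".", 1, true)
    max_order = math.max(max_order, dot_idx - 1)
    max_precision = math.max(max_precision, as_string:len() - dot_idx)
  end
  return "%0" .. (max_order + max_precision + 1) .. "." .. max_precision .. "f"
end

-- Bucket boundaries from the example in the file's header comment.
local buckets = {0.00005, 10, 1000}
local fmt = construct_bucket_format(buckets)

-- Build the "le" label values in reverse order, starting with the internal
-- "Inf" placeholder, so that the sort below has real work to do.
local labels = {"Inf"}
for i = #buckets, 1, -1 do
  table.insert(labels, fmt:format(buckets[i]))
end

-- A plain string sort now matches numerical order, with "Inf" last.
table.sort(labels)
for _, label in ipairs(labels) do
  print(label)
end
```

With these boundaries the template comes out as `%010.5f`, and the sorted labels match the `le` values listed in the header comment. Storing `+Inf` directly would not work with this scheme, since `+` sorts before the digits.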
