Merge ~samgilson/cloud-init:cloud-init-analyze-boot into cloud-init:master

Proposed by Sam Gilson
Status: Merged
Approved by: Dan Watkins
Approved revision: 54c7df160a48df264447727a85f684f18e2428ef
Merge reported by: Server Team CI bot
Merged at revision: not available
Proposed branch: ~samgilson/cloud-init:cloud-init-analyze-boot
Merge into: cloud-init:master
Diff against target: 661 lines (+546/-12)
6 files modified
cloudinit/analyze/__main__.py (+86/-2)
cloudinit/analyze/show.py (+192/-10)
cloudinit/analyze/tests/test_boot.py (+170/-0)
doc/rtd/topics/analyze.rst (+84/-0)
doc/rtd/topics/capabilities.rst (+1/-0)
doc/rtd/topics/debugging.rst (+13/-0)
Reviewer Review Type Date Requested Status
Server Team CI bot continuous-integration Approve
Dan Watkins Approve
Sam Gilson (community) Approve
Chad Smith Approve
Review via email: mp+368943@code.launchpad.net

Commit message

Cloud-init analyze module: Added ability to analyze boot events.

This branch introduces a new command line feature for cloud-init.
Currently, the cloud-init module has the capability to analyze events in
cloud-init.log in three ways: 'show', 'blame', 'dump'.
These changes add a fourth capability, called 'boot'.
Running the command 'cloud-init analyze boot' will provide the user three
timestamps.
1) Timestamp for when the kernel starts initializing.
2) Timestamp for when the kernel finishes its initialization.
3) Timestamp for when systemd activates cloud-init.
This feature enables cloud-init users to analyze different boot phases.
This would aid in debugging performance issues related
to cloud-init startup or tracking regression.

To post a comment you must log in.
Revision history for this message
Chad Smith (chad.smith) wrote :

Thank you for this branch Sam and for making cloud-init better!

A couple of notes:
 1. ^M characters present throughout:
    - cloudinit/analyze/tests/test_boot.py
    - doc/rtd/topics/analyze.rst

 2. Timestamps don't seem to be reported properly in an lxc container for me: (note that the kernel ended boot time abd cloud-init start time is in the future on my system:

root@x1:~# cloud-init analyze boot
-- Most Recent Boot Record --
   Kernel Started at: 2019-06-24 20:24:25.527606
   Kernel ended boot at: 2019-06-27 18:53:04.667329
   Kernel time to boot (seconds): 253719.139722
   Cloud-init start: 2019-06-27 18:53:05.277550
   Time between Kernel boot and Cloud-init start (seconds): 0.610222

root@x1:~# date
Mon Jun 24 20:39:25 UTC 2019

 2. timestamp reported by cloud-init boot doesn't match timestamp in cloud-init logs
  on my system. This may lead to confusing errors because you

 cloud-init analyze boot
   Cloud-init start: 2019-06-21 21:56:24.461153

YET cloud-init does startup @ 21.56:08

grep "running 'init-local" /var/log/cloud-init.log | tail -n 1
2019-06-21 21:56:09,067 - util.py[DEBUG]: Cloud-init v. 18.5-59-gb05832a5-1~bddeb running 'init-local' at Fri, 21 Jun 2019 21:56:08 +0000. Up 5.00 seconds.

Maybe we should be using ActiveEnterTimestampMonotonic or ExecMainStartTimestampMonotonic to represesent on cloud-init-local starts it's datasource discovery?

I'll continue to try to triage what might give us a better handle on cloud-init start time

Revision history for this message
Chad Smith (chad.smith) :
Revision history for this message
Chad Smith (chad.smith) wrote :

Marking needs information until we have an idea of how best to align cloud-init start time to something that aligns more with the timestamp in cloud-init logs.
Currently ActiveEnterTimestamp and InactiveExitTimestamp do show matching string values with the timestampl recoreded in cloud-init.log

# cloud-init.log

2019-06-24 20:24:25,463 - util.py[DEBUG]: Cloud-init v. 19.1-11-ge44586f2-1~bddeb running 'init-local' at Mon, 24 Jun 2019 20:24:25 +0000. Up 0.00 seconds.

# systemctl properties
root@x1:~# systemctl show -p ActiveEnterTimestamp cloud-init-local
ActiveEnterTimestamp=Mon 2019-06-24 20:24:25 UTC
root@x1:~# systemctl show -p InactiveExitTimestamp cloud-init-local
InactiveExitTimestamp=Mon 2019-06-24 20:24:25 UTC

So I think that maybe the timestamp arithmetic in the branch may be suspect

review: Needs Information
Revision history for this message
Sam Gilson (samgilson) wrote :

This is very interesting. I created an Ubuntu 18.04:latest VM on Azure and did the same checks as you, and here is what I got.
cat /var/log/cloud-init.log | grep "running 'init-local"
2019-06-24 18:59:34,735 - util.py[DEBUG]: Cloud-init v. 19.1-1-gbaa47854-0ubuntu1~18.04.1 running 'init-local' at Mon, 24 Jun 2019 18:59:34 +0000. Up 22.97 seconds.

systemctl show -p UserspaceTimestamp
UserspaceTimestamp=Mon 2019-06-24 18:59:22 UTC

systemctl show -p ExecMainStartTimestamp cloud-init-local
ExecMainStartTimestamp=Mon 2019-06-24 19:43:33 UTC

systemctl show -p ActiveEnterTimestamp cloud-init-local
ActiveEnterTimestamp=Mon 2019-06-24 19:43:33 UTC

systemctl show -p InactiveExitTimestamp cloud-init-local
InactiveExitTimestamp=Mon 2019-06-24 18:59:30 UTC

cloud-init analyze boot
-- Most Recent Boot Record --
   Kernel Started at: 2019-06-24 18:59:11.739511
   Kernel ended boot at: 2019-06-24 18:59:22.125180
   Kernel time to boot (seconds): 10.385669
   Cloud-init start: 2019-06-24 18:59:30.426426
   Time between Kernel boot and Cloud-init start (seconds): 8.301246

In my case, the InactiveExitTimestamp and the timestamp shown in cloud-init.log are the closest together (still not correct), with the ActiveEnter and ExecMainStart timestamps being very far off the mark. I am not really certain as to why the ActiveEnter and ExecMainStart are so far off in this case.

Revision history for this message
Sam Gilson (samgilson) wrote :

Keep in mind I had created this VM earlier in the day, hence why all the timestamps are so far off the current time.
> This is very interesting. I created an Ubuntu 18.04:latest VM on Azure and did
> the same checks as you, and here is what I got.
> cat /var/log/cloud-init.log | grep "running 'init-local"
> 2019-06-24 18:59:34,735 - util.py[DEBUG]: Cloud-init v.
> 19.1-1-gbaa47854-0ubuntu1~18.04.1 running 'init-local' at Mon, 24 Jun 2019
> 18:59:34 +0000. Up 22.97 seconds.
>
> systemctl show -p UserspaceTimestamp
> UserspaceTimestamp=Mon 2019-06-24 18:59:22 UTC
>
> systemctl show -p ExecMainStartTimestamp cloud-init-local
> ExecMainStartTimestamp=Mon 2019-06-24 19:43:33 UTC
>
> systemctl show -p ActiveEnterTimestamp cloud-init-local
> ActiveEnterTimestamp=Mon 2019-06-24 19:43:33 UTC
>
> systemctl show -p InactiveExitTimestamp cloud-init-local
> InactiveExitTimestamp=Mon 2019-06-24 18:59:30 UTC
>
> cloud-init analyze boot
> -- Most Recent Boot Record --
> Kernel Started at: 2019-06-24 18:59:11.739511
> Kernel ended boot at: 2019-06-24 18:59:22.125180
> Kernel time to boot (seconds): 10.385669
> Cloud-init start: 2019-06-24 18:59:30.426426
> Time between Kernel boot and Cloud-init start (seconds): 8.301246
>
> In my case, the InactiveExitTimestamp and the timestamp shown in cloud-
> init.log are the closest together (still not correct), with the ActiveEnter
> and ExecMainStart timestamps being very far off the mark. I am not really
> certain as to why the ActiveEnter and ExecMainStart are so far off in this
> case.

Revision history for this message
Sam Gilson (samgilson) wrote :

Hey Chad, in your first comment, what did you mean by "on my system"? I have been testing this on Ubuntu 18.04:latest and have found that on all the VMs I booted and tried, the cloud-init boot analyze is always consistently 4 seconds behind the cloud-init.log reported time. However, when checking the timestamps, it seems to be that ActiveEnter and ExecMainStart are consistently never correct, often off by several factors of 10. However in your test, it seems that InactiveExit and ActiveEnter are the same.

Revision history for this message
Sam Gilson (samgilson) wrote :

It seems that on Azure VMs that have been pre-provisioned, the ActiveEnter and ExecMainStart timestamps are incorrect due to the waiting that the pre-provisioning must do during the cloud-init-local phase of cloud-init in order to get the necessary data from Azure. Due to this, the more consistent time stamp is the InactiveExit timestamp, as it is written before any pre-provisioning occurs. However, InactiveExit is consistently 4 seconds off of the cloud-init.log timestamp.

However, when on a VM that has not been pre-provisioned, using ActiveEnter instead of InactiveExit always nets you the correct answer.

Revision history for this message
Sam Gilson (samgilson) wrote :

Hey Chad, thanks for all the great help in finding these bugs. I went back through and fixed all of the ones you found, so this MP is now ready for a re-review.

review: Needs Resubmitting
Revision history for this message
Chad Smith (chad.smith) wrote :

looks much better Sam, one suggestion inline to drop/adapt get_ci_start_from_log to use exinst event parsing from cloud-init.log

Revision history for this message
Chad Smith (chad.smith) wrote :

one more on some of the formatting success/failure too

Revision history for this message
Sam Gilson (samgilson) wrote :

Hey Chad, thank you for the reviews. I made all the changes you specified, specifically being dropping get_ci_start_from_log and replacing it with your inline comment, and changing the constant functions to formatted strings in analyze_boot. The previous two commits represent these changes, and this merge proposal is now ready for a re-review.

Revision history for this message
Server Team CI bot (server-team-bot) wrote :

FAILED: Continuous integration, rev:8e6d3c9f176e18cf60d3e3281d8f87334ce72417
https://jenkins.ubuntu.com/server/job/cloud-init-ci/766/
Executed test runs:
    SUCCESS: Checkout
    SUCCESS: Unit & Style Tests
    FAILED: Ubuntu LTS: Build

Click here to trigger a rebuild:
https://jenkins.ubuntu.com/server/job/cloud-init-ci/766/rebuild

review: Needs Fixing (continuous-integration)
Revision history for this message
Server Team CI bot (server-team-bot) wrote :

PASSED: Continuous integration, rev:8e6d3c9f176e18cf60d3e3281d8f87334ce72417
https://jenkins.ubuntu.com/server/job/cloud-init-ci/772/
Executed test runs:
    SUCCESS: Checkout
    SUCCESS: Unit & Style Tests
    SUCCESS: Ubuntu LTS: Build
    SUCCESS: Ubuntu LTS: Integration
    IN_PROGRESS: Declarative: Post Actions

Click here to trigger a rebuild:
https://jenkins.ubuntu.com/server/job/cloud-init-ci/772/rebuild

review: Approve (continuous-integration)
Revision history for this message
Chad Smith (chad.smith) wrote :

One final round of review commments from me on this.
here's a pastebin of the changes if you think it'll work:

1. I think we need to document that we are currently working around an lxc issue in this code
with a comment to pointing to https://github.com/lxc/lxcfs/issues/292

2. we probably shouldn't subshell with subp to stat a file, but use python's os.stat

here's a pastebin of the changes I suggest:
https://paste.ubuntu.com/p/TYbddMbVJ2/

Revision history for this message
Chad Smith (chad.smith) wrote :

This looks good to me. Will talk with other upstream devs on Monday to see if we have any concerns with including this in 19.2.

review: Approve
Revision history for this message
Dan Watkins (oddbloke) wrote :

I haven't done a full review (because I'm on a plane and about to be interrupted), but I did find some stylistic issues as well as perhaps some functional ones.

Revision history for this message
Sam Gilson (samgilson) :
review: Approve
Revision history for this message
Dan Watkins (oddbloke) wrote :

Thanks Sam!

review: Approve
Revision history for this message
Server Team CI bot (server-team-bot) wrote :

Commit message lints:
 - Line #0 has 26 too many characters. Line starts with: "Cloud-init analyze module:"... - Line #2 has 264 too many characters. Line starts with: "This branch introduces"... - Line #6 has 139 too many characters. Line starts with: "This feature enables"...

review: Needs Fixing
Revision history for this message
Server Team CI bot (server-team-bot) :
review: Approve (continuous-integration)
Revision history for this message
Server Team CI bot (server-team-bot) wrote :

Commit message lints:
 - Line #12 has 26 too many characters. Line starts with: "This would aid in debugging"...

review: Needs Fixing
Revision history for this message
Server Team CI bot (server-team-bot) :
review: Approve (continuous-integration)

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
diff --git a/cloudinit/analyze/__main__.py b/cloudinit/analyze/__main__.py
index f861365..99e5c20 100644
--- a/cloudinit/analyze/__main__.py
+++ b/cloudinit/analyze/__main__.py
@@ -7,7 +7,7 @@ import re
7import sys7import sys
88
9from cloudinit.util import json_dumps9from cloudinit.util import json_dumps
1010from datetime import datetime
11from . import dump11from . import dump
12from . import show12from . import show
1313
@@ -52,9 +52,93 @@ def get_parser(parser=None):
52 dest='outfile', default='-',52 dest='outfile', default='-',
53 help='specify where to write output. ')53 help='specify where to write output. ')
54 parser_dump.set_defaults(action=('dump', analyze_dump))54 parser_dump.set_defaults(action=('dump', analyze_dump))
55 parser_boot = subparsers.add_parser(
56 'boot', help='Print list of boot times for kernel and cloud-init')
57 parser_boot.add_argument('-i', '--infile', action='store',
58 dest='infile', default='/var/log/cloud-init.log',
59 help='specify where to read input. ')
60 parser_boot.add_argument('-o', '--outfile', action='store',
61 dest='outfile', default='-',
62 help='specify where to write output.')
63 parser_boot.set_defaults(action=('boot', analyze_boot))
55 return parser64 return parser
5665
5766
67def analyze_boot(name, args):
68 """Report a list of how long different boot operations took.
69
70 For Example:
71 -- Most Recent Boot Record --
72 Kernel Started at: <time>
73 Kernel ended boot at: <time>
74 Kernel time to boot (seconds): <time>
75 Cloud-init activated by systemd at: <time>
76 Time between Kernel end boot and Cloud-init activation (seconds):<time>
77 Cloud-init start: <time>
78 """
79 infh, outfh = configure_io(args)
80 kernel_info = show.dist_check_timestamp()
81 status_code, kernel_start, kernel_end, ci_sysd_start = \
82 kernel_info
83 kernel_start_timestamp = datetime.utcfromtimestamp(kernel_start)
84 kernel_end_timestamp = datetime.utcfromtimestamp(kernel_end)
85 ci_sysd_start_timestamp = datetime.utcfromtimestamp(ci_sysd_start)
86 try:
87 last_init_local = \
88 [e for e in _get_events(infh) if e['name'] == 'init-local' and
89 'starting search' in e['description']][-1]
90 ci_start = datetime.utcfromtimestamp(last_init_local['timestamp'])
91 except IndexError:
92 ci_start = 'Could not find init-local log-line in cloud-init.log'
93 status_code = show.FAIL_CODE
94
95 FAILURE_MSG = 'Your Linux distro or container does not support this ' \
96 'functionality.\n' \
97 'You must be running a Kernel Telemetry supported ' \
98 'distro.\nPlease check ' \
99 'https://cloudinit.readthedocs.io/en/latest' \
100 '/topics/analyze.html for more ' \
101 'information on supported distros.\n'
102
103 SUCCESS_MSG = '-- Most Recent Boot Record --\n' \
104 ' Kernel Started at: {k_s_t}\n' \
105 ' Kernel ended boot at: {k_e_t}\n' \
106 ' Kernel time to boot (seconds): {k_r}\n' \
107 ' Cloud-init activated by systemd at: {ci_sysd_t}\n' \
108 ' Time between Kernel end boot and Cloud-init ' \
109 'activation (seconds): {bt_r}\n' \
110 ' Cloud-init start: {ci_start}\n'
111
112 CONTAINER_MSG = '-- Most Recent Container Boot Record --\n' \
113 ' Container started at: {k_s_t}\n' \
114 ' Cloud-init activated by systemd at: {ci_sysd_t}\n' \
115 ' Cloud-init start: {ci_start}\n' \
116
117 status_map = {
118 show.FAIL_CODE: FAILURE_MSG,
119 show.CONTAINER_CODE: CONTAINER_MSG,
120 show.SUCCESS_CODE: SUCCESS_MSG
121 }
122
123 kernel_runtime = kernel_end - kernel_start
124 between_process_runtime = ci_sysd_start - kernel_end
125
126 kwargs = {
127 'k_s_t': kernel_start_timestamp,
128 'k_e_t': kernel_end_timestamp,
129 'k_r': kernel_runtime,
130 'bt_r': between_process_runtime,
131 'k_e': kernel_end,
132 'k_s': kernel_start,
133 'ci_sysd': ci_sysd_start,
134 'ci_sysd_t': ci_sysd_start_timestamp,
135 'ci_start': ci_start
136 }
137
138 outfh.write(status_map[status_code].format(**kwargs))
139 return status_code
140
141
58def analyze_blame(name, args):142def analyze_blame(name, args):
59 """Report a list of records sorted by largest time delta.143 """Report a list of records sorted by largest time delta.
60144
@@ -119,7 +203,7 @@ def analyze_dump(name, args):
119203
120def _get_events(infile):204def _get_events(infile):
121 rawdata = None205 rawdata = None
122 events, rawdata = show.load_events(infile, None)206 events, rawdata = show.load_events_infile(infile)
123 if not events:207 if not events:
124 events, _ = dump.dump_events(rawdata=rawdata)208 events, _ = dump.dump_events(rawdata=rawdata)
125 return events209 return events
diff --git a/cloudinit/analyze/show.py b/cloudinit/analyze/show.py
index 3e778b8..511b808 100644
--- a/cloudinit/analyze/show.py
+++ b/cloudinit/analyze/show.py
@@ -8,8 +8,11 @@ import base64
8import datetime8import datetime
9import json9import json
10import os10import os
11import time
12import sys
1113
12from cloudinit import util14from cloudinit import util
15from cloudinit.distros import uses_systemd
1316
14# An event:17# An event:
15'''18'''
@@ -49,6 +52,10 @@ format_key = {
4952
50formatting_help = " ".join(["{0}: {1}".format(k.replace('%', '%%'), v)53formatting_help = " ".join(["{0}: {1}".format(k.replace('%', '%%'), v)
51 for k, v in format_key.items()])54 for k, v in format_key.items()])
55SUCCESS_CODE = 'successful'
56FAIL_CODE = 'failure'
57CONTAINER_CODE = 'container'
58TIMESTAMP_UNKNOWN = (FAIL_CODE, -1, -1, -1)
5259
5360
54def format_record(msg, event):61def format_record(msg, event):
@@ -125,9 +132,175 @@ def total_time_record(total_time):
125 return 'Total Time: %3.5f seconds\n' % total_time132 return 'Total Time: %3.5f seconds\n' % total_time
126133
127134
135class SystemctlReader(object):
136 '''
137 Class for dealing with all systemctl subp calls in a consistent manner.
138 '''
139 def __init__(self, property, parameter=None):
140 self.epoch = None
141 self.args = ['/bin/systemctl', 'show']
142 if parameter:
143 self.args.append(parameter)
144 self.args.extend(['-p', property])
145 # Don't want the init of our object to break. Instead of throwing
146 # an exception, set an error code that gets checked when data is
147 # requested from the object
148 self.failure = self.subp()
149
150 def subp(self):
151 '''
152 Make a subp call based on set args and handle errors by setting
153 failure code
154
155 :return: whether the subp call failed or not
156 '''
157 try:
158 value, err = util.subp(self.args, capture=True)
159 if err:
160 return err
161 self.epoch = value
162 return None
163 except Exception as systemctl_fail:
164 return systemctl_fail
165
166 def parse_epoch_as_float(self):
167 '''
168 If subp call succeeded, return the timestamp from subp as a float.
169
170 :return: timestamp as a float
171 '''
172 # subp has 2 ways to fail: it either fails and throws an exception,
173 # or returns an error code. Raise an exception here in order to make
174 # sure both scenarios throw exceptions
175 if self.failure:
176 raise RuntimeError('Subprocess call to systemctl has failed, '
177 'returning error code ({})'
178 .format(self.failure))
179 # Output from systemctl show has the format Property=Value.
180 # For example, UserspaceMonotonic=1929304
181 timestamp = self.epoch.split('=')[1]
182 # Timestamps reported by systemctl are in microseconds, converting
183 return float(timestamp) / 1000000
184
185
186def dist_check_timestamp():
187 '''
188 Determine which init system a particular linux distro is using.
189 Each init system (systemd, upstart, etc) has a different way of
190 providing timestamps.
191
192 :return: timestamps of kernelboot, kernelendboot, and cloud-initstart
193 or TIMESTAMP_UNKNOWN if the timestamps cannot be retrieved.
194 '''
195
196 if uses_systemd():
197 return gather_timestamps_using_systemd()
198
199 # Use dmesg to get timestamps if the distro does not have systemd
200 if util.is_FreeBSD() or 'gentoo' in \
201 util.system_info()['system'].lower():
202 return gather_timestamps_using_dmesg()
203
204 # this distro doesn't fit anything that is supported by cloud-init. just
205 # return error codes
206 return TIMESTAMP_UNKNOWN
207
208
209def gather_timestamps_using_dmesg():
210 '''
211 Gather timestamps that corresponds to kernel begin initialization,
212 kernel finish initialization using dmesg as opposed to systemctl
213
214 :return: the two timestamps plus a dummy timestamp to keep consistency
215 with gather_timestamps_using_systemd
216 '''
217 try:
218 data, _ = util.subp(['dmesg'], capture=True)
219 split_entries = data[0].splitlines()
220 for i in split_entries:
221 if i.decode('UTF-8').find('user') != -1:
222 splitup = i.decode('UTF-8').split()
223 stripped = splitup[1].strip(']')
224
225 # kernel timestamp from dmesg is equal to 0,
226 # with the userspace timestamp relative to it.
227 user_space_timestamp = float(stripped)
228 kernel_start = float(time.time()) - float(util.uptime())
229 kernel_end = kernel_start + user_space_timestamp
230
231 # systemd wont start cloud-init in this case,
232 # so we cannot get that timestamp
233 return SUCCESS_CODE, kernel_start, kernel_end, \
234 kernel_end
235
236 except Exception:
237 pass
238 return TIMESTAMP_UNKNOWN
239
240
241def gather_timestamps_using_systemd():
242 '''
243 Gather timestamps that corresponds to kernel begin initialization,
244 kernel finish initialization. and cloud-init systemd unit activation
245
246 :return: the three timestamps
247 '''
248 kernel_start = float(time.time()) - float(util.uptime())
249 try:
250 delta_k_end = SystemctlReader('UserspaceTimestampMonotonic')\
251 .parse_epoch_as_float()
252 delta_ci_s = SystemctlReader('InactiveExitTimestampMonotonic',
253 'cloud-init-local').parse_epoch_as_float()
254 base_time = kernel_start
255 status = SUCCESS_CODE
256 # lxc based containers do not set their monotonic zero point to be when
257 # the container starts, instead keep using host boot as zero point
258 # time.CLOCK_MONOTONIC_RAW is only available in python 3.3
259 if util.is_container():
260 # clock.monotonic also uses host boot as zero point
261 if sys.version_info >= (3, 3):
262 base_time = float(time.time()) - float(time.monotonic())
263 # TODO: lxcfs automatically truncates /proc/uptime to seconds
264 # in containers when https://github.com/lxc/lxcfs/issues/292
265 # is fixed, util.uptime() should be used instead of stat on
266 try:
267 file_stat = os.stat('/proc/1/cmdline')
268 kernel_start = file_stat.st_atime
269 except OSError as err:
270 raise RuntimeError('Could not determine container boot '
271 'time from /proc/1/cmdline. ({})'
272 .format(err))
273 status = CONTAINER_CODE
274 else:
275 status = FAIL_CODE
276 kernel_end = base_time + delta_k_end
277 cloudinit_sysd = base_time + delta_ci_s
278
279 except Exception as e:
280 # Except ALL exceptions as Systemctl reader can throw many different
281 # errors, but any failure in systemctl means that timestamps cannot be
282 # obtained
283 print(e)
284 return TIMESTAMP_UNKNOWN
285 return status, kernel_start, kernel_end, cloudinit_sysd
286
287
128def generate_records(events, blame_sort=False,288def generate_records(events, blame_sort=False,
129 print_format="(%n) %d seconds in %I%D",289 print_format="(%n) %d seconds in %I%D",
130 dump_files=False, log_datafiles=False):290 dump_files=False, log_datafiles=False):
291 '''
292 Take in raw events and create parent-child dependencies between events
293 in order to order events in chronological order.
294
295 :param events: JSONs from dump that represents events taken from logs
296 :param blame_sort: whether to sort by timestamp or by time taken.
297 :param print_format: formatting to represent event, time stamp,
298 and time taken by the event in one line
299 :param dump_files: whether to dump files into JSONs
300 :param log_datafiles: whether or not to log events generated
301
302 :return: boot records ordered chronologically
303 '''
131304
132 sorted_events = sorted(events, key=lambda x: x['timestamp'])305 sorted_events = sorted(events, key=lambda x: x['timestamp'])
133 records = []306 records = []
@@ -189,19 +362,28 @@ def generate_records(events, blame_sort=False,
189362
190363
191def show_events(events, print_format):364def show_events(events, print_format):
365 '''
366 A passthrough method that makes it easier to call generate_records()
367
368 :param events: JSONs from dump that represents events taken from logs
369 :param print_format: formatting to represent event, time stamp,
370 and time taken by the event in one line
371
372 :return: boot records ordered chronologically
373 '''
192 return generate_records(events, print_format=print_format)374 return generate_records(events, print_format=print_format)
193375
194376
195def load_events(infile, rawdata=None):377def load_events_infile(infile):
196 if rawdata:378 '''
197 data = rawdata.read()379 Takes in a log file, read it, and convert to json.
198 else:380
199 data = infile.read()381 :param infile: The Log file to be read
200382
201 j = None383 :return: json version of logfile, raw file
384 '''
385 data = infile.read()
202 try:386 try:
203 j = json.loads(data)387 return json.loads(data), data
204 except ValueError:388 except ValueError:
205 pass389 return None, data
206
207 return j, data
diff --git a/cloudinit/analyze/tests/test_boot.py b/cloudinit/analyze/tests/test_boot.py
208new file mode 100644390new file mode 100644
index 0000000..706e2cc
--- /dev/null
+++ b/cloudinit/analyze/tests/test_boot.py
@@ -0,0 +1,170 @@
1import os
2from cloudinit.analyze.__main__ import (analyze_boot, get_parser)
3from cloudinit.tests.helpers import CiTestCase, mock
4from cloudinit.analyze.show import dist_check_timestamp, SystemctlReader, \
5 FAIL_CODE, CONTAINER_CODE
6
7err_code = (FAIL_CODE, -1, -1, -1)
8
9
10class TestDistroChecker(CiTestCase):
11
12 @mock.patch('cloudinit.util.system_info', return_value={'dist': ('', '',
13 ''),
14 'system': ''})
15 @mock.patch('platform.linux_distribution', return_value=('', '', ''))
16 @mock.patch('cloudinit.util.is_FreeBSD', return_value=False)
17 def test_blank_distro(self, m_sys_info, m_linux_distribution, m_free_bsd):
18 self.assertEqual(err_code, dist_check_timestamp())
19
20 @mock.patch('cloudinit.util.system_info', return_value={'dist': ('', '',
21 '')})
22 @mock.patch('platform.linux_distribution', return_value=('', '', ''))
23 @mock.patch('cloudinit.util.is_FreeBSD', return_value=True)
24 def test_freebsd_gentoo_cant_find(self, m_sys_info,
25 m_linux_distribution, m_is_FreeBSD):
26 self.assertEqual(err_code, dist_check_timestamp())
27
28 @mock.patch('cloudinit.util.subp', return_value=(0, 1))
29 def test_subp_fails(self, m_subp):
30 self.assertEqual(err_code, dist_check_timestamp())
31
32
33class TestSystemCtlReader(CiTestCase):
34
35 def test_systemctl_invalid_property(self):
36 reader = SystemctlReader('dummyProperty')
37 with self.assertRaises(RuntimeError):
38 reader.parse_epoch_as_float()
39
40 def test_systemctl_invalid_parameter(self):
41 reader = SystemctlReader('dummyProperty', 'dummyParameter')
42 with self.assertRaises(RuntimeError):
43 reader.parse_epoch_as_float()
44
45 @mock.patch('cloudinit.util.subp', return_value=('U=1000000', None))
46 def test_systemctl_works_correctly_threshold(self, m_subp):
47 reader = SystemctlReader('dummyProperty', 'dummyParameter')
48 self.assertEqual(1.0, reader.parse_epoch_as_float())
49 thresh = 1.0 - reader.parse_epoch_as_float()
50 self.assertTrue(thresh < 1e-6)
51 self.assertTrue(thresh > (-1 * 1e-6))
52
53 @mock.patch('cloudinit.util.subp', return_value=('U=0', None))
54 def test_systemctl_succeed_zero(self, m_subp):
55 reader = SystemctlReader('dummyProperty', 'dummyParameter')
56 self.assertEqual(0.0, reader.parse_epoch_as_float())
57
58 @mock.patch('cloudinit.util.subp', return_value=('U=1', None))
59 def test_systemctl_succeed_distinct(self, m_subp):
60 reader = SystemctlReader('dummyProperty', 'dummyParameter')
61 val1 = reader.parse_epoch_as_float()
62 m_subp.return_value = ('U=2', None)
63 reader2 = SystemctlReader('dummyProperty', 'dummyParameter')
64 val2 = reader2.parse_epoch_as_float()
65 self.assertNotEqual(val1, val2)
66
67 @mock.patch('cloudinit.util.subp', return_value=('100', None))
68 def test_systemctl_epoch_not_splittable(self, m_subp):
69 reader = SystemctlReader('dummyProperty', 'dummyParameter')
70 with self.assertRaises(IndexError):
71 reader.parse_epoch_as_float()
72
73 @mock.patch('cloudinit.util.subp', return_value=('U=foobar', None))
74 def test_systemctl_cannot_convert_epoch_to_float(self, m_subp):
75 reader = SystemctlReader('dummyProperty', 'dummyParameter')
76 with self.assertRaises(ValueError):
77 reader.parse_epoch_as_float()
78
79
80class TestAnalyzeBoot(CiTestCase):
81
82 def set_up_dummy_file_ci(self, path, log_path):
83 infh = open(path, 'w+')
84 infh.write('2019-07-08 17:40:49,601 - util.py[DEBUG]: Cloud-init v. '
85 '19.1-1-gbaa47854-0ubuntu1~18.04.1 running \'init-local\' '
86 'at Mon, 08 Jul 2019 17:40:49 +0000. Up 18.84 seconds.')
87 infh.close()
88 outfh = open(log_path, 'w+')
89 outfh.close()
90
91 def set_up_dummy_file(self, path, log_path):
92 infh = open(path, 'w+')
93 infh.write('dummy data')
94 infh.close()
95 outfh = open(log_path, 'w+')
96 outfh.close()
97
98 def remove_dummy_file(self, path, log_path):
99 if os.path.isfile(path):
100 os.remove(path)
101 if os.path.isfile(log_path):
102 os.remove(log_path)
103
104 @mock.patch('cloudinit.analyze.show.dist_check_timestamp',
105 return_value=err_code)
106 def test_boot_invalid_distro(self, m_dist_check_timestamp):
107
108 path = os.path.dirname(os.path.abspath(__file__))
109 log_path = path + '/boot-test.log'
110 path += '/dummy.log'
111 self.set_up_dummy_file(path, log_path)
112
113 parser = get_parser()
114 args = parser.parse_args(args=['boot', '-i', path, '-o',
115 log_path])
116 name_default = ''
117 analyze_boot(name_default, args)
118 # now args have been tested, go into outfile and make sure error
119 # message is in the outfile
120 outfh = open(args.outfile, 'r')
121 data = outfh.read()
122 err_string = 'Your Linux distro or container does not support this ' \
123 'functionality.\nYou must be running a Kernel ' \
124 'Telemetry supported distro.\nPlease check ' \
125 'https://cloudinit.readthedocs.io/en/latest/topics' \
126 '/analyze.html for more information on supported ' \
127 'distros.\n'
128
129 self.remove_dummy_file(path, log_path)
130 self.assertEqual(err_string, data)
131
132 @mock.patch("cloudinit.util.is_container", return_value=True)
133 @mock.patch('cloudinit.util.subp', return_value=('U=1000000', None))
134 def test_container_no_ci_log_line(self, m_is_container, m_subp):
135 path = os.path.dirname(os.path.abspath(__file__))
136 log_path = path + '/boot-test.log'
137 path += '/dummy.log'
138 self.set_up_dummy_file(path, log_path)
139
140 parser = get_parser()
141 args = parser.parse_args(args=['boot', '-i', path, '-o',
142 log_path])
143 name_default = ''
144
145 finish_code = analyze_boot(name_default, args)
146
147 self.remove_dummy_file(path, log_path)
148 self.assertEqual(FAIL_CODE, finish_code)
149
150 @mock.patch("cloudinit.util.is_container", return_value=True)
151 @mock.patch('cloudinit.util.subp', return_value=('U=1000000', None))
152 @mock.patch('cloudinit.analyze.__main__._get_events', return_value=[{
153 'name': 'init-local', 'description': 'starting search', 'timestamp':
154 100000}])
155 @mock.patch('cloudinit.analyze.show.dist_check_timestamp',
156 return_value=(CONTAINER_CODE, 1, 1, 1))
157 def test_container_ci_log_line(self, m_is_container, m_subp, m_get, m_g):
158 path = os.path.dirname(os.path.abspath(__file__))
159 log_path = path + '/boot-test.log'
160 path += '/dummy.log'
161 self.set_up_dummy_file_ci(path, log_path)
162
163 parser = get_parser()
164 args = parser.parse_args(args=['boot', '-i', path, '-o',
165 log_path])
166 name_default = ''
167 finish_code = analyze_boot(name_default, args)
168
169 self.remove_dummy_file(path, log_path)
170 self.assertEqual(CONTAINER_CODE, finish_code)
diff --git a/doc/rtd/topics/analyze.rst b/doc/rtd/topics/analyze.rst
0new file mode 100644171new file mode 100644
index 0000000..5cf38bd
--- /dev/null
+++ b/doc/rtd/topics/analyze.rst
@@ -0,0 +1,84 @@
1*************************
2Cloud-init Analyze Module
3*************************
4
5Overview
6========
7The analyze module was added to cloud-init in order to help analyze cloud-init boot time
8performance. It is loosely based on systemd-analyze where there are 4 main actions:
9show, blame, dump, and boot.
10
11The 'show' action is similar to 'systemd-analyze critical-chain' which prints a list of units, the
12time they started and how long they took. For cloud-init, we have four stages, and within each stage
13a number of modules may run depending on configuration. ‘cloudinit-analyze show’ will, for each
14boot, print this information and a summary total time, per boot.
15
16The 'blame' action matches 'systemd-analyze blame' where it prints, in descending order,
17the units that took the longest to run. This output is highly useful for examining where cloud-init
18is spending its time during execution.
19
20The 'dump' action simply dumps the cloud-init logs that the analyze module is performing
21the analysis on and returns a list of dictionaries that can be consumed for other reporting needs.
22
23The 'boot' action prints out kernel related timestamps that are not included in any of the
24cloud-init logs. There are three different timestamps that are presented to the user:
25kernel start, kernel finish boot, and cloud-init start. This was added for additional
26clarity into the boot process that cloud-init does not have control over, to aid in debugging of
27performance issues related to cloudinit startup or tracking regression.
28
29Usage
30=====
31Using each of the printing formats is as easy as running one of the following bash commands:
32
33.. code-block:: shell-session
34
35 cloud-init analyze show
36 cloud-init analyze blame
37 cloud-init analyze dump
38 cloud-init analyze boot
39
40Cloud-init analyze boot Timestamp Gathering
41===========================================
42The following boot related timestamps are gathered on demand when cloud-init analyze boot runs:
43- Kernel Startup, which is inferred from system uptime
44- Kernel Finishes Initialization, which is inferred from systemd UserSpaceMonotonicTimestamp property
45- Cloud-init activation, which is inferred from the property InactiveExitTimestamp of the cloud-init
46local systemd unit.
47
48In order to gather the necessary timestamps using systemd, running the commands
49
50.. code-block:: shell-session
51
52 systemctl show -p UserspaceTimestampMonotonic
53 systemctl show cloud-init-local -p InactiveExitTimestampMonotonic
54
55will gather the UserspaceTimestamp and InactiveExitTimestamp.
56The UserspaceTimestamp tracks when the init system starts, which is used as an indicator of kernel
57finishing initialization. The InactiveExitTimestamp tracks when a particular systemd unit transitions
58from the Inactive to Active state, which can be used to mark the beginning of systemd's activation
59of cloud-init.
60
61Currently this only works for distros that use systemd as the init process. We will be expanding
62support for other distros in the future and this document will be updated accordingly.
63
64If systemd is not present on the system, dmesg is used to attempt to find an event that logs the
65beginning of the init system. However, with this method only the first two timestamps are able to be found;
66dmesg does not monitor userspace processes, so no cloud-init start timestamps are emitted like when
67using systemd.
68
69List of Cloud-init analyze boot supported distros
70=================================================
71- Arch
72- CentOS
73- Debian
74- Fedora
75- OpenSuSE
76- Red Hat Enterprise Linux
77- Ubuntu
78- SUSE Linux Enterprise Server
79- CoreOS
80
81List of Cloud-init analyze boot unsupported distros
82===================================================
83- FreeBSD
84- Gentoo
0\ No newline at end of file85\ No newline at end of file
diff --git a/doc/rtd/topics/capabilities.rst b/doc/rtd/topics/capabilities.rst
index 0d8b894..6d85a99 100644
--- a/doc/rtd/topics/capabilities.rst
+++ b/doc/rtd/topics/capabilities.rst
@@ -217,6 +217,7 @@ Get detailed reports of where cloud-init spends most of its time. See
217* **dump** Machine-readable JSON dump of all cloud-init tracked events.217* **dump** Machine-readable JSON dump of all cloud-init tracked events.
218* **show** show time-ordered report of the cost of operations during each218* **show** show time-ordered report of the cost of operations during each
219 boot stage.219 boot stage.
220* **boot** show timestamps from kernel initialization, kernel finish initialization, and cloud-init start.
220221
221.. _cli_devel:222.. _cli_devel:
222223
diff --git a/doc/rtd/topics/debugging.rst b/doc/rtd/topics/debugging.rst
index 51363ea..e13d915 100644
--- a/doc/rtd/topics/debugging.rst
+++ b/doc/rtd/topics/debugging.rst
@@ -68,6 +68,19 @@ subcommands default to reading /var/log/cloud-init.log.
68 00.00100s (modules-final/config-rightscale_userdata)68 00.00100s (modules-final/config-rightscale_userdata)
69 ...69 ...
7070
71* ``analyze boot`` Make subprocess calls to the kernel in order to get relevant
72 pre-cloud-init timestamps, such as the kernel start, kernel finish boot, and cloud-init start.
73
74.. code-block:: shell-session
75
76 $ cloud-init analyze boot
77 -- Most Recent Boot Record --
78 Kernel Started at: 2019-06-13 15:59:55.809385
79 Kernel ended boot at: 2019-06-13 16:00:00.944740
80 Kernel time to boot (seconds): 5.135355
81 Cloud-init start: 2019-06-13 16:00:05.738396
82 Time between Kernel boot and Cloud-init start (seconds): 4.793656
83
7184
72Analyze quickstart - LXC85Analyze quickstart - LXC
73---------------------------86---------------------------

Subscribers

People subscribed via source and target branches