Merge ~raharper/cloud-init:feature/cloudinit-clean-from-write-log into cloud-init:master

Proposed by Ryan Harper
Status: Work in progress
Proposed branch: ~raharper/cloud-init:feature/cloudinit-clean-from-write-log
Merge into: cloud-init:master
Diff against target: 328 lines (+154/-7)
5 files modified
cloudinit/cmd/clean.py (+27/-4)
cloudinit/cmd/tests/test_clean.py (+28/-3)
cloudinit/tests/helpers.py (+2/-0)
cloudinit/util.py (+52/-0)
tests/unittests/test_util.py (+45/-0)
Reviewer Review Type Date Requested Status
Server Team CI bot continuous-integration Approve
cloud-init Commiters Pending
Review via email: mp+372946@code.launchpad.net

Commit message

clean: add a write_file audit log and use it to clean artifacts

Cloud-init uses util.write_file() in many places to write files
during cloud-init execution. This branch will create an audit log
in /var/lib/cloud/instance/write_file.log which contains JSON entries
which captures information about each write.

As a user of this newly created log, cloud-init clean subcommand
reads this data (if present) and augments the list of artifacts to
clean.

To post a comment you must log in.
Revision history for this message
Server Team CI bot (server-team-bot) wrote :

FAILED: Continuous integration, rev:6b005ae905b5666a9bcd10865ea95c599a5d7e99
https://jenkins.ubuntu.com/server/job/cloud-init-ci/1163/
Executed test runs:
    SUCCESS: Checkout
    SUCCESS: Unit & Style Tests
    FAILED: Ubuntu LTS: Build

Click here to trigger a rebuild:
https://jenkins.ubuntu.com/server/job/cloud-init-ci/1163//rebuild

review: Needs Fixing (continuous-integration)
65b8808... by Ryan Harper

Handle write_file_log_append failures

Revision history for this message
Server Team CI bot (server-team-bot) wrote :

PASSED: Continuous integration, rev:65b88088e1a5f83b41511dfdf57ef3a2315ad59f
https://jenkins.ubuntu.com/server/job/cloud-init-ci/1165/
Executed test runs:
    SUCCESS: Checkout
    SUCCESS: Unit & Style Tests
    SUCCESS: Ubuntu LTS: Build
    SUCCESS: Ubuntu LTS: Integration
    IN_PROGRESS: Declarative: Post Actions

Click here to trigger a rebuild:
https://jenkins.ubuntu.com/server/job/cloud-init-ci/1165//rebuild

review: Approve (continuous-integration)
Revision history for this message
Scott Moser (smoser) wrote :

Some thoughts:
a.) Why? Why do two writes for every write?
b.) currently atomic write_file is not covered.

Revision history for this message
Ryan Harper (raharper) wrote :

For (a) what would you suggest instead?

I'll look at (b).

Revision history for this message
Scott Moser (smoser) wrote :

For (a), I suggest maybe writing exactly one write for every write?
Why do you need a log of files' you've written? Should we have a separate log for:
 - files opened?
 - files checked for existence?
 - commands run?
 - log files written?

What is motivating you to have this list of files that cloud-init wrote? Realize that in regard to usefulness, it will likely be incomplete. For example, /var/log/cloud-init-output.log won't be in there, and neither would files written by commands cloud-init executes.

We already have a log file which I find to be way too verbose, and a "reporting" mechanism. Do we really need more logging?

I suspect that you're after something here, but what?

Revision history for this message
Ryan Harper (raharper) wrote :

We're looking to have cloud-init clean remove as many of the artifacts that cloud-init is responsible for creating. Since clean already removes things in cloud paths, like /var/lib/cloud but we do touch other files outside of those paths and logging what files we wrote makes clean-up of those files easy.

Revision history for this message
Scott Moser (smoser) wrote :

I think that this will not really get you what you're after.
Having a list of the files that cloud-init created or appended to or truncated isn't going to get you to your goal. The partial solution comes at the cost of 2 open and writes for every write. Additionally, we're already writing messages with the same information to the log. You could just parse the log to get the same information, possibly making it more machine friendly if you need to.

I think I'd prefer a solution that could at least get us to the goal.

Unmerged commits

65b8808... by Ryan Harper

Handle write_file_log_append failures

6b005ae... by Ryan Harper

cmd/clean: leverage write_file log to clean additional files

4837140... by Ryan Harper

Switch to json format for records, add unittests

46f3fe9... by Ryan Harper

Write file log intial commit

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
diff --git a/cloudinit/cmd/clean.py b/cloudinit/cmd/clean.py
index 30e49de..06f7daa 100644
--- a/cloudinit/cmd/clean.py
+++ b/cloudinit/cmd/clean.py
@@ -12,7 +12,7 @@ import sys
12from cloudinit.stages import Init12from cloudinit.stages import Init
13from cloudinit.util import (13from cloudinit.util import (
14 ProcessExecutionError, del_dir, del_file, get_config_logfiles,14 ProcessExecutionError, del_dir, del_file, get_config_logfiles,
15 is_link, subp)15 is_link, subp, write_file_log_read)
1616
1717
18def error(msg):18def error(msg):
@@ -45,7 +45,20 @@ def get_parser(parser=None):
45 return parser45 return parser
4646
4747
48def remove_artifacts(remove_logs, remove_seed=False):48def get_written_files(filterfn=None):
49 if filterfn is None:
50 def _exists(fn):
51 return os.path.exists(fn)
52 filterfn = _exists
53
54 for wf in write_file_log_read():
55 fn = wf.get('filename')
56 if fn:
57 if filterfn(fn):
58 yield fn
59
60
61def remove_artifacts(remove_logs=False, remove_seed=False):
49 """Helper which removes artifacts dir and optionally log files.62 """Helper which removes artifacts dir and optionally log files.
5063
51 @param: remove_logs: Boolean. Set True to delete the cloud_dir path. False64 @param: remove_logs: Boolean. Set True to delete the cloud_dir path. False
@@ -56,14 +69,24 @@ def remove_artifacts(remove_logs, remove_seed=False):
56 """69 """
57 init = Init(ds_deps=[])70 init = Init(ds_deps=[])
58 init.read_cfg()71 init.read_cfg()
72
73 logfiles = get_config_logfiles(init.cfg)
59 if remove_logs:74 if remove_logs:
60 for log_file in get_config_logfiles(init.cfg):75 for log_file in logfiles:
61 del_file(log_file)76 del_file(log_file)
6277
63 if not os.path.isdir(init.paths.cloud_dir):78 if not os.path.isdir(init.paths.cloud_dir):
64 return 0 # Artifacts dir already cleaned79 return 0 # Artifacts dir already cleaned
65 seed_path = os.path.join(init.paths.cloud_dir, 'seed')80 seed_path = os.path.join(init.paths.cloud_dir, 'seed')
66 for path in glob.glob('%s/*' % init.paths.cloud_dir):81
82 def _written_filter(fn):
83 if fn not in logfiles:
84 if not fn.startswith(init.paths.cloud_dir):
85 if os.path.exists(fn):
86 return fn
87
88 files_written = list(get_written_files(filterfn=_written_filter))
89 for path in glob.glob('%s/*' % init.paths.cloud_dir) + files_written:
67 if path == seed_path and not remove_seed:90 if path == seed_path and not remove_seed:
68 continue91 continue
69 try:92 try:
diff --git a/cloudinit/cmd/tests/test_clean.py b/cloudinit/cmd/tests/test_clean.py
index f092ab3..143d955 100644
--- a/cloudinit/cmd/tests/test_clean.py
+++ b/cloudinit/cmd/tests/test_clean.py
@@ -2,6 +2,7 @@
22
3from cloudinit.cmd import clean3from cloudinit.cmd import clean
4from cloudinit.util import ensure_dir, sym_link, write_file4from cloudinit.util import ensure_dir, sym_link, write_file
5from cloudinit import util
5from cloudinit.tests.helpers import CiTestCase, wrap_and_call, mock6from cloudinit.tests.helpers import CiTestCase, wrap_and_call, mock
6from collections import namedtuple7from collections import namedtuple
7import os8import os
@@ -18,6 +19,10 @@ class TestClean(CiTestCase):
18 self.artifact_dir = self.tmp_path('artifacts', self.new_root)19 self.artifact_dir = self.tmp_path('artifacts', self.new_root)
19 self.log1 = self.tmp_path('cloud-init.log', self.new_root)20 self.log1 = self.tmp_path('cloud-init.log', self.new_root)
20 self.log2 = self.tmp_path('cloud-init-output.log', self.new_root)21 self.log2 = self.tmp_path('cloud-init-output.log', self.new_root)
22 self.wflog = self.tmp_path('write_file.log', self.artifact_dir)
23 os.makedirs(os.path.dirname(self.wflog))
24 util.WRITE_FILE_LOG = self.wflog
25 util._ENABLE_WRITE_FILE_LOG = True
2126
22 class FakeInit(object):27 class FakeInit(object):
23 cfg = {'def_log_file': self.log1,28 cfg = {'def_log_file': self.log1,
@@ -37,21 +42,31 @@ class TestClean(CiTestCase):
37 """remove_artifacts removes logs when remove_logs is True."""42 """remove_artifacts removes logs when remove_logs is True."""
38 write_file(self.log1, 'cloud-init-log')43 write_file(self.log1, 'cloud-init-log')
39 write_file(self.log2, 'cloud-init-output-log')44 write_file(self.log2, 'cloud-init-output-log')
45 netplan = self.tmp_path('50-cloud-init.yaml', self.artifact_dir)
46 write_file(netplan, 'netplan content')
47 # read the write log before it gets cleaned
48 with open(self.wflog) as fh:
49 contents = fh.readlines()
50 self.assertEqual(3, len(contents))
4051
41 self.assertFalse(
42 os.path.exists(self.artifact_dir), 'Unexpected artifacts dir')
43 retcode = wrap_and_call(52 retcode = wrap_and_call(
44 'cloudinit.cmd.clean',53 'cloudinit.cmd.clean',
45 {'Init': {'side_effect': self.init_class}},54 {'Init': {'side_effect': self.init_class}},
46 clean.remove_artifacts, remove_logs=True)55 clean.remove_artifacts, remove_logs=True)
47 self.assertFalse(os.path.exists(self.log1), 'Unexpected file')56 self.assertFalse(os.path.exists(self.log1), 'Unexpected file')
48 self.assertFalse(os.path.exists(self.log2), 'Unexpected file')57 self.assertFalse(os.path.exists(self.log2), 'Unexpected file')
58 self.assertFalse(os.path.exists(netplan), 'Unexpected file')
49 self.assertEqual(0, retcode)59 self.assertEqual(0, retcode)
60 self.assertFalse(os.path.exists(self.wflog))
5061
51 def test_remove_artifacts_preserves_logs(self):62 def test_remove_artifacts_preserves_logs(self):
52 """remove_artifacts leaves logs when remove_logs is False."""63 """remove_artifacts leaves logs when remove_logs is False."""
53 write_file(self.log1, 'cloud-init-log')64 write_file(self.log1, 'cloud-init-log')
54 write_file(self.log2, 'cloud-init-output-log')65 write_file(self.log2, 'cloud-init-output-log')
66 # read the write log before it gets cleaned
67 with open(self.wflog) as fh:
68 contents = fh.readlines()
69 self.assertEqual(2, len(contents))
5570
56 retcode = wrap_and_call(71 retcode = wrap_and_call(
57 'cloudinit.cmd.clean',72 'cloudinit.cmd.clean',
@@ -60,6 +75,7 @@ class TestClean(CiTestCase):
60 self.assertTrue(os.path.exists(self.log1), 'Missing expected file')75 self.assertTrue(os.path.exists(self.log1), 'Missing expected file')
61 self.assertTrue(os.path.exists(self.log2), 'Missing expected file')76 self.assertTrue(os.path.exists(self.log2), 'Missing expected file')
62 self.assertEqual(0, retcode)77 self.assertEqual(0, retcode)
78 self.assertFalse(os.path.exists(self.wflog))
6379
64 def test_remove_artifacts_removes_unlinks_symlinks(self):80 def test_remove_artifacts_removes_unlinks_symlinks(self):
65 """remove_artifacts cleans artifacts dir unlinking any symlinks."""81 """remove_artifacts cleans artifacts dir unlinking any symlinks."""
@@ -77,6 +93,7 @@ class TestClean(CiTestCase):
77 self.assertFalse(93 self.assertFalse(
78 os.path.exists(path),94 os.path.exists(path),
79 'Unexpected {0} dir'.format(path))95 'Unexpected {0} dir'.format(path))
96 self.assertFalse(os.path.exists(self.wflog))
8097
81 def test_remove_artifacts_removes_artifacts_skipping_seed(self):98 def test_remove_artifacts_removes_artifacts_skipping_seed(self):
82 """remove_artifacts cleans artifacts dir with exception of seed dir."""99 """remove_artifacts cleans artifacts dir with exception of seed dir."""
@@ -101,6 +118,7 @@ class TestClean(CiTestCase):
101 self.assertFalse(118 self.assertFalse(
102 os.path.exists(deleted_dir),119 os.path.exists(deleted_dir),
103 'Unexpected {0} dir'.format(deleted_dir))120 'Unexpected {0} dir'.format(deleted_dir))
121 self.assertFalse(os.path.exists(self.wflog))
104122
105 def test_remove_artifacts_removes_artifacts_removes_seed(self):123 def test_remove_artifacts_removes_artifacts_removes_seed(self):
106 """remove_artifacts removes seed dir when remove_seed is True."""124 """remove_artifacts removes seed dir when remove_seed is True."""
@@ -124,6 +142,8 @@ class TestClean(CiTestCase):
124 os.path.exists(deleted_dir),142 os.path.exists(deleted_dir),
125 'Unexpected {0} dir'.format(deleted_dir))143 'Unexpected {0} dir'.format(deleted_dir))
126144
145 self.assertFalse(os.path.exists(self.wflog))
146
127 def test_remove_artifacts_returns_one_on_errors(self):147 def test_remove_artifacts_returns_one_on_errors(self):
128 """remove_artifacts returns non-zero on failure and prints an error."""148 """remove_artifacts returns non-zero on failure and prints an error."""
129 ensure_dir(self.artifact_dir)149 ensure_dir(self.artifact_dir)
@@ -139,6 +159,7 @@ class TestClean(CiTestCase):
139 self.assertEqual(159 self.assertEqual(
140 'ERROR: Could not remove %s/dir1: oops\n' % self.artifact_dir,160 'ERROR: Could not remove %s/dir1: oops\n' % self.artifact_dir,
141 m_stderr.getvalue())161 m_stderr.getvalue())
162 self.assertFalse(os.path.exists(self.wflog))
142163
143 def test_handle_clean_args_reboots(self):164 def test_handle_clean_args_reboots(self):
144 """handle_clean_args_reboots when reboot arg is provided."""165 """handle_clean_args_reboots when reboot arg is provided."""
@@ -159,10 +180,15 @@ class TestClean(CiTestCase):
159 self.assertEqual(0, retcode)180 self.assertEqual(0, retcode)
160 self.assertEqual(181 self.assertEqual(
161 [(['shutdown', '-r', 'now'], False)], called_cmds)182 [(['shutdown', '-r', 'now'], False)], called_cmds)
183 self.assertFalse(os.path.exists(self.wflog))
162184
163 def test_status_main(self):185 def test_status_main(self):
164 '''clean.main can be run as a standalone script.'''186 '''clean.main can be run as a standalone script.'''
165 write_file(self.log1, 'cloud-init-log')187 write_file(self.log1, 'cloud-init-log')
188 # read the write log before it gets cleaned
189 with open(self.wflog) as fh:
190 contents = fh.readlines()
191 self.assertEqual(1, len(contents))
166 with self.assertRaises(SystemExit) as context_manager:192 with self.assertRaises(SystemExit) as context_manager:
167 wrap_and_call(193 wrap_and_call(
168 'cloudinit.cmd.clean',194 'cloudinit.cmd.clean',
@@ -175,5 +201,4 @@ class TestClean(CiTestCase):
175 self.assertFalse(201 self.assertFalse(
176 os.path.exists(self.log1), 'Unexpected log {0}'.format(self.log1))202 os.path.exists(self.log1), 'Unexpected log {0}'.format(self.log1))
177203
178
179# vi: ts=4 expandtab syntax=python204# vi: ts=4 expandtab syntax=python
diff --git a/cloudinit/tests/helpers.py b/cloudinit/tests/helpers.py
index 4dad2af..c994a3b 100644
--- a/cloudinit/tests/helpers.py
+++ b/cloudinit/tests/helpers.py
@@ -96,6 +96,8 @@ class TestCase(unittest2.TestCase):
96 util.PROC_CMDLINE = None96 util.PROC_CMDLINE = None
97 util._DNS_REDIRECT_IP = None97 util._DNS_REDIRECT_IP = None
98 util._LSB_RELEASE = {}98 util._LSB_RELEASE = {}
99 util._ENABLE_WRITE_FILE_LOG = False
100 util.WRITE_LOG_FILE = '/var/lib/cloud/instance/write_file.log'
99101
100 def setUp(self):102 def setUp(self):
101 super(TestCase, self).setUp()103 super(TestCase, self).setUp()
diff --git a/cloudinit/util.py b/cloudinit/util.py
index aa23b3f..950921c 100644
--- a/cloudinit/util.py
+++ b/cloudinit/util.py
@@ -70,6 +70,8 @@ CONTAINER_TESTS = (['systemd-detect-virt', '--quiet', '--container'],
70 ['lxc-is-container'])70 ['lxc-is-container'])
7171
72PROC_CMDLINE = None72PROC_CMDLINE = None
73WRITE_FILE_LOG = '/var/lib/cloud/instance/write_file.log'
74_ENABLE_WRITE_FILE_LOG = True
7375
74_LSB_RELEASE = {}76_LSB_RELEASE = {}
7577
@@ -1841,6 +1843,55 @@ def chmod(path, mode):
1841 os.chmod(path, real_mode)1843 os.chmod(path, real_mode)
18421844
18431845
1846def write_file_log_append(filename, omode, target=None):
1847 """ Create an audit log of files that cloud-init has written.
1848
1849 The log is located at: /var/lib/cloud/instance/write_file.log
1850
1851 The format is JSON dict, one per line
1852 {'timestamp': time.time(), 'path': filename, 'mode': omode}
1853
1854 Example entries:
1855 {'filename': '/etc/apt/sources.list', 'mode': 'wb', 'timestamp': ts}
1856
1857 """
1858 global WRITE_FILE_LOG
1859 global _ENABLE_WRITE_FILE_LOG
1860
1861 if not _ENABLE_WRITE_FILE_LOG:
1862 return
1863
1864 log_file = target_path(target, path=WRITE_FILE_LOG)
1865 if not os.path.exists(os.path.dirname(log_file)):
1866 return
1867
1868 record = {'timestamp': time.time(), 'filename': filename, 'mode': omode}
1869 content = json.dumps(record, default=json_serialize_default)
1870 try:
1871 with open(log_file, "ab") as wfl:
1872 wfl.write((content + '\n').encode('utf-8'))
1873 wfl.flush()
1874 except IOError as e:
1875 LOG.debug('Failed to append to write file log (%s): %s', log_file, e)
1876
1877
1878def write_file_log_read(target=None):
1879 """ Read the WRITE_FILE_LOG and yield the contents.
1880
1881 :returns a list of record dicts
1882 """
1883 global WRITE_FILE_LOG
1884
1885 log_file = target_path(target, path=WRITE_FILE_LOG)
1886 if os.path.exists(log_file):
1887 with open(log_file, "rb") as wfl:
1888 contents = wfl.read()
1889 for line in contents.splitlines():
1890 record = load_json(line)
1891 if record:
1892 yield record
1893
1894
1844def write_file(filename, content, mode=0o644, omode="wb", copy_mode=False):1895def write_file(filename, content, mode=0o644, omode="wb", copy_mode=False):
1845 """1896 """
1846 Writes a file with the given content and sets the file mode as specified.1897 Writes a file with the given content and sets the file mode as specified.
@@ -1872,6 +1923,7 @@ def write_file(filename, content, mode=0o644, omode="wb", copy_mode=False):
1872 mode_r = "%r" % mode1923 mode_r = "%r" % mode
1873 LOG.debug("Writing to %s - %s: [%s] %s %s",1924 LOG.debug("Writing to %s - %s: [%s] %s %s",
1874 filename, omode, mode_r, len(content), write_type)1925 filename, omode, mode_r, len(content), write_type)
1926 write_file_log_append(filename, omode)
1875 with SeLinuxGuard(path=filename):1927 with SeLinuxGuard(path=filename):
1876 with open(filename, omode) as fh:1928 with open(filename, omode) as fh:
1877 fh.write(content)1929 fh.write(content)
diff --git a/tests/unittests/test_util.py b/tests/unittests/test_util.py
index 0e71db8..82909a7 100644
--- a/tests/unittests/test_util.py
+++ b/tests/unittests/test_util.py
@@ -186,6 +186,51 @@ class TestWriteFile(helpers.TestCase):
186 mockobj.assert_called_once_with('selinux')186 mockobj.assert_called_once_with('selinux')
187187
188188
189class TestWriteFileLogAppend(helpers.CiTestCase):
190
191 idir = 'var/lib/cloud/instance'
192
193 def test_write_file_log_append(self):
194 root_d = self.tmp_dir()
195 log_path = os.path.join(root_d, self.idir, 'write_file.log')
196 util._ENABLE_WRITE_FILE_LOG = True
197 util.WRITE_FILE_LOG = log_path
198 helpers.populate_dir(root_d,
199 {os.path.join(self.idir, 'dummy'): 'i-foobar'})
200
201 content = self.random_string()
202 fname = os.path.join(root_d, self.random_string())
203 util.write_file(fname, content, omode='wb')
204
205 self.assertTrue(os.path.exists(log_path))
206 contents = util.load_file(log_path)
207 self.assertEqual(1, len(contents.splitlines()))
208 for line in contents.splitlines():
209 record = util.load_json(line)
210 self.assertEqual(fname, record['filename'])
211 self.assertEqual('wb', record['mode'])
212 self.assertIn('timestamp', record)
213
214 def test_write_file_log_read(self):
215 root_d = self.tmp_dir()
216 log_path = os.path.join(root_d, self.idir, 'write_file.log')
217 util.WRITE_FILE_LOG = log_path
218 helpers.populate_dir(root_d,
219 {os.path.join(self.idir, 'dummy'): 'i-foobar'})
220
221 fname = os.path.join(root_d, self.random_string())
222
223 expected_record = {'timestamp': 0, 'filename': fname, 'mode': 'wb'}
224 with open(log_path, 'ab') as wfl:
225 record = (json.dumps(expected_record) + '\n').encode('utf-8')
226 wfl.write(record)
227 wfl.flush()
228
229 records = list(util.write_file_log_read())
230 self.assertEqual(1, len(records))
231 self.assertEqual(expected_record, records[0])
232
233
189class TestDeleteDirContents(helpers.TestCase):234class TestDeleteDirContents(helpers.TestCase):
190 def setUp(self):235 def setUp(self):
191 super(TestDeleteDirContents, self).setUp()236 super(TestDeleteDirContents, self).setUp()

Subscribers

People subscribed via source and target branches