Merge lp:~barry/ubuntu-system-image/lp1444347 into lp:ubuntu-system-image/server

Proposed by Barry Warsaw
Status: Merged
Merged at revision: 271
Proposed branch: lp:~barry/ubuntu-system-image/lp1444347
Merge into: lp:ubuntu-system-image/server
Diff against target: 433 lines (+254/-63)
2 files modified
lib/systemimage/diff.py (+89/-51)
lib/systemimage/tests/test_diff.py (+165/-12)
To merge this branch: bzr merge lp:~barry/ubuntu-system-image/lp1444347
Reviewer Review Type Date Requested Status
Steve Langasek (community) Needs Fixing
Caio Begotti (community) Approve
Stéphane Graber Pending
Registry Administrators Pending
Review via email: mp+260597@code.launchpad.net

Description of the change

Addresses LP: #1444347 by ensuring that any hardlinks which at first appear not to change between the source and target, but which their underlying link target *does* change, gets properly modified in the diff. This handles the following cases (among others):

source:
a (regular file)
b -> a (via hardlink)

target
a (regular file, modified)
b -> a (via hardlink)

If the diff doesn't not also include b then after application of the diff, b will point to a's *old* inode, but not the "new" a. This branch ensures that in this case, b also gets modified by the diff so that it will point to a's new inode.

Coincidentally, this also fixes a related previously broken case:

source:
a (regular file)
b -> a (via hardlink)

target:
b (regular file)
a -> b (via hardlink)

In this case, both a and b need to be modified in the diff.

This branch includes a few other drive-by improvements.

* Rather than use hard to read nested indexes to get at the bits of information, we use a namedtuple for the internal data structures, which allow us to use easy to read attribute access to ensure we're getting at the bits and pieces we intend to get to.

* Along those lines, this actually fixes some broken indexing in the big "switched hardlinks" conditional. E.g. devminor wasn't being checked previously, nor was gid. Those are now being checked.

* Lots of comments :)

* A few minor code style improvements.

One more important note as you're reviewing this branch. There is a change in an existing test, but I believe it is legitimate. In the original test, the hardlink c/h -> d did not get modified, but this is exactly the case we're trying to fix above. In the test, d does change but the hardlink to d (i.e. c/h) wasn't being changed. That means that after the delta, c/h would point to the old d inode, and we want it to point to the new d.

To post a comment you must log in.
Revision history for this message
Caio Begotti (caio1982) wrote :

Looks alright given the bug info, even though I can't reproduce or test it in a device myself :-)

review: Approve
Revision history for this message
Steve Langasek (vorlon) wrote :

Applying the changes to the test suite without the changes to the code, I see only the following set of test failures:

test_generate_tarball (systemimage.tests.test_diff.DiffTests) ... FAIL
test_print_changes (systemimage.tests.test_diff.DiffTests) ... FAIL
test_link_count_2_order_ab (systemimage.tests.test_diff.TestHardLinkTargetIsModified) ... FAIL

Crucially, the two unpack tests did *not* fail:

test_unpack_ab (systemimage.tests.test_diff.TestHardLinkTargetIsModified) ... ok
test_unpack_ab_swap_roles (systemimage.tests.test_diff.TestHardLinkTargetIsModified) ... ok

This indicates that the unpack tests have not succeeded in modeling the failure that was seen in the wild (see further comments inline).

Without that, I don't think we can be confident in the correctness of this fix.

I've posted a follow-up comment on the bug with some analysis of the original failed delta. I believe the processing of the 'removed' file is key - in the original scenario, system/usr/bin/python3.4m is removed and then re-unpacked, and system/usr/bin/python3.4 is unchanged as a result (because the removal breaks the relationship between the two hardlinks).

The unpack tests need to accurately replicate the behavior of the upgrader.

review: Needs Fixing
274. By Barry Warsaw

A more accurate test of the unpack case.

Revision history for this message
Barry Warsaw (barry) wrote :

Good catch, thanks! I've merged your branch into mine, so this should be ready to go. I'm running the full tox suite locally, but I expect it to pass.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'lib/systemimage/diff.py'
--- lib/systemimage/diff.py 2015-04-27 20:04:09 +0000
+++ lib/systemimage/diff.py 2015-06-01 16:00:55 +0000
@@ -20,6 +20,7 @@
20import tarfile20import tarfile
21import time21import time
2222
23from collections import namedtuple
23from io import BytesIO24from io import BytesIO
2425
2526
@@ -41,6 +42,14 @@
41 return fd_source.read() == fd_target.read()42 return fd_source.read() == fd_target.read()
4243
4344
45# For the data portion of the set and dict contents.
46FHash = namedtuple('FHash', 'mode devmajor devminor type uid gid size mtime')
47# The set contents record.
48SContent = namedtuple('SContent', 'path filetype data')
49# The dict contents record.
50DContent = namedtuple('DContent', 'filetype data')
51
52
44def list_tarfile(tarfile):53def list_tarfile(tarfile):
45 """54 """
46 Walk through a tarfile and generate a list of the content.55 Walk through a tarfile and generate a list of the content.
@@ -55,22 +64,23 @@
5564
56 for entry in tarfile:65 for entry in tarfile:
57 if entry.isdir():66 if entry.isdir():
58 set_content.add((entry.path, "dir", None))67 set_content.add(SContent(entry.path, "dir", None))
59 dict_content[entry.path] = ("dir", None)68 dict_content[entry.path] = DContent("dir", None)
60 else:69 else:
61 fhash = ("%s" % entry.mode,70 fhash = FHash(
62 "%s" % entry.devmajor,71 "%s" % entry.mode,
63 "%s" % entry.devminor,72 "%s" % entry.devmajor,
64 "%s" % entry.type.decode("utf-8"),73 "%s" % entry.devminor,
65 "%s" % entry.uid,74 "%s" % entry.type.decode("utf-8"),
66 "%s" % entry.gid,75 "%s" % entry.uid,
67 "%s" % entry.size,76 "%s" % entry.gid,
68 "%s" % entry.mtime)77 "%s" % entry.size,
6978 "%s" % entry.mtime)
70 set_content.add((entry.path, "file", fhash))79
71 dict_content[entry.path] = ("file", fhash)80 set_content.add(SContent(entry.path, "file", fhash))
7281 dict_content[entry.path] = DContent("file", fhash)
73 return (set_content, dict_content)82
83 return set_content, dict_content
7484
7585
76class ImageDiff:86class ImageDiff:
@@ -87,15 +97,12 @@
87 Scan the content of an image and return the image tuple.97 Scan the content of an image and return the image tuple.
88 This also caches the content for further use.98 This also caches the content for further use.
89 """99 """
90100 image_file = getattr(self, image + "_file", None)
91 if image not in ("source", "target"):101 if image_file is None:
92 raise KeyError("Invalid image '%s'." % image)102 raise KeyError("Invalid image '%s'." % image)
93103
94 image_file = getattr(self, "%s_file" % image)
95
96 content = list_tarfile(image_file)104 content = list_tarfile(image_file)
97105 setattr(self, image + "_content", content)
98 setattr(self, "%s_content" % image, content)
99 return content106 return content
100107
101 def compare_images(self):108 def compare_images(self):
@@ -105,63 +112,94 @@
105112
106 The set contains tuples of (path, changetype).113 The set contains tuples of (path, changetype).
107 """114 """
108 if not self.source_content:115 if self.source_content is None:
109 self.scan_content("source")116 self.scan_content("source")
110117
111 if not self.target_content:118 if self.target_content is None:
112 self.scan_content("target")119 self.scan_content("target")
113120
121 source_set, source_dict = self.source_content
122 target_set, target_dict = self.target_content
123
114 # Find the changes in the two trees124 # Find the changes in the two trees
115 changes = set()125 changes = set()
116 for change in self.source_content[0] \126 for change in source_set.symmetric_difference(target_set):
117 .symmetric_difference(self.target_content[0]):127 if change.path not in source_dict:
118 if change[0] not in self.source_content[1]:128 change_type = "add"
119 changetype = "add"129 elif change.path not in target_dict:
120 elif change[0] not in self.target_content[1]:130 change_type = "del"
121 changetype = "del"
122 else:131 else:
123 changetype = "mod"132 change_type = "mod"
124 changes.add((change[0], changetype))133 changes.add((change.path, change_type))
134
135 # Do a second pass through the source and target sets, looking for any
136 # hardlinks that point to a file that's being modified in the target.
137 # These links must get also get modified or they'll end up pointing to
138 # the old inode.
139 for no_change in source_set.intersection(target_set):
140 if no_change.filetype == "file" and no_change.data.type == "1":
141 # This is a hardlink which exists in both the source and
142 # target, *and* points to the same link target (by virtue of
143 # the set intersection).
144 changes.add((no_change.path, "mod"))
125145
126 # Ignore files that only vary in mtime146 # Ignore files that only vary in mtime
127 # (separate loop to run after de-dupe)147 # (separate loop to run after de-dupe)
128 for change in sorted(changes):148 for change in sorted(changes):
129 if change[1] == "mod":149 change_path, change_type = change
130 fstat_source = self.source_content[1][change[0]][1]150 if change_type == "mod":
131 fstat_target = self.target_content[1][change[0]][1]151 fstat_source = source_dict[change_path].data
152 fstat_target = target_dict[change_path].data
132153
133 # Skip differences between directories and files154 # Skip differences between directories and files
134 if not fstat_source or not fstat_target: # pragma: no cover155 if not fstat_source or not fstat_target: # pragma: no cover
135 continue156 continue
136157
137 # Deal with switched hardlinks158 # Deal with switched hardlinks.
138 if (fstat_source[0:2] == fstat_target[0:2] and159 #
139 fstat_source[3] != fstat_target[3] and160 # stgraber says on 2015-05-27: this was trying to solve the
140 (fstat_source[3] == "1" or fstat_target[3] == "1") and161 # case where the hardlink target would be placed *after* the
141 fstat_source[4:5] == fstat_target[4:5] and162 # hardlink in the tar archive, leading to a hardlink being
142 fstat_source[7] == fstat_target[7]):163 # created to the wrong file at unpack. barry thinks: ???
143 source_file = self.source_file.getmember(change[0])164 if (
144 target_file = self.target_file.getmember(change[0])165 fstat_source.mode == fstat_target.mode
166 and fstat_source.devmajor == fstat_target.devmajor
167 and fstat_source.devminor == fstat_target.devminor
168 # "1" is the LNKTYPE, i.e. hard link.
169 and (fstat_source.type == "1" or
170 fstat_target.type == "1")
171 and fstat_source.uid == fstat_target.uid
172 and fstat_source.gid == fstat_target.gid
173 # size is ignored since it is always 0 for hardlinks.
174 and fstat_source.mtime == fstat_target.mtime):
175 source_file = self.source_file.getmember(change_path)
176 target_file = self.target_file.getmember(change_path)
145 if compare_files(177 if compare_files(
146 self.source_file.extractfile(change[0]),178 self.source_file.extractfile(change_path),
147 self.target_file.extractfile(change[0])):179 self.target_file.extractfile(change_path)):
148 changes.remove(change)180 changes.remove(change)
149 continue181 continue
150182
151 # Deal with regular files183 # Deal with regular files. Compare all attributes of the file
184 # except the mtime.
152 if fstat_source[0:7] == fstat_target[0:7]:185 if fstat_source[0:7] == fstat_target[0:7]:
153 source_file = self.source_file.getmember(change[0])186 source_file = self.source_file.getmember(change_path)
154 target_file = self.target_file.getmember(change[0])187 target_file = self.target_file.getmember(change_path)
155188 # Symlinks that point to the same file in both the source
156 if (source_file.linkpath189 # and target can be ignored, however *hardlinks* cannot,
190 # since the inode they point to may change out from
191 # underneath them.
192 if (
193 source_file.type == "2"
194 and target_file.type == "2"
157 and source_file.linkpath == target_file.linkpath):195 and source_file.linkpath == target_file.linkpath):
158 changes.remove(change)196 changes.remove(change)
159 continue197 continue
160198
161 if (source_file.isfile() and target_file.isfile()199 if (source_file.isfile() and target_file.isfile()
162 and compare_files(200 and compare_files(
163 self.source_file.extractfile(change[0]),201 self.source_file.extractfile(change_path),
164 self.target_file.extractfile(change[0]))):202 self.target_file.extractfile(change_path))):
165 changes.remove(change)203 changes.remove(change)
166 continue204 continue
167205
168206
=== modified file 'lib/systemimage/tests/test_diff.py'
--- lib/systemimage/tests/test_diff.py 2015-04-27 20:04:09 +0000
+++ lib/systemimage/tests/test_diff.py 2015-06-01 16:00:55 +0000
@@ -15,6 +15,7 @@
15# You should have received a copy of the GNU General Public License15# You should have received a copy of the GNU General Public License
16# along with this program. If not, see <http://www.gnu.org/licenses/>.16# along with this program. If not, see <http://www.gnu.org/licenses/>.
1717
18import os
18import shutil19import shutil
19import sys20import sys
20import tarfile21import tarfile
@@ -26,11 +27,14 @@
2627
2728
28class DiffTests(unittest.TestCase):29class DiffTests(unittest.TestCase):
30 maxDiff = None
31
29 def setUp(self):32 def setUp(self):
30 temp_directory = tempfile.mkdtemp()33 self.temp_directory = temp_directory = tempfile.mkdtemp()
34 self.addCleanup(shutil.rmtree, temp_directory)
3135
32 source_tarball_path = "%s/source.tar" % temp_directory36 source_tarball_path = os.path.join(temp_directory, "source.tar")
33 target_tarball_path = "%s/target.tar" % temp_directory37 target_tarball_path = os.path.join(temp_directory, "target.tar")
3438
35 source_tarball = tarfile.open(39 source_tarball = tarfile.open(
36 source_tarball_path, "w", encoding="utf-8")40 source_tarball_path, "w", encoding="utf-8")
@@ -205,10 +209,6 @@
205 self.imagediff = ImageDiff(source_tarball_path, target_tarball_path)209 self.imagediff = ImageDiff(source_tarball_path, target_tarball_path)
206 self.source_tarball_path = source_tarball_path210 self.source_tarball_path = source_tarball_path
207 self.target_tarball_path = target_tarball_path211 self.target_tarball_path = target_tarball_path
208 self.temp_directory = temp_directory
209
210 def tearDown(self):
211 shutil.rmtree(self.temp_directory)
212212
213 def test_content(self):213 def test_content(self):
214 content_set, content_dict = self.imagediff.scan_content("source")214 content_set, content_dict = self.imagediff.scan_content("source")
@@ -249,10 +249,11 @@
249 output = sys.stdout.getvalue()249 output = sys.stdout.getvalue()
250 sys.stdout = old_stdout250 sys.stdout = old_stdout
251251
252 self.assertEqual(output, """ - b (del)252 self.assertMultiLineEqual(output, """ - b (del)
253 - c/a_i (add)253 - c/a_i (add)
254 - c/c (add)254 - c/c (add)
255 - c/d (mod)255 - c/d (mod)
256 - c/h (mod)
256 - c/j (add)257 - c/j (add)
257 - dir (mod)258 - dir (mod)
258 - e (add)259 - e (add)
@@ -269,13 +270,165 @@
269 tarball = tarfile.open(output_tarball, "r")270 tarball = tarfile.open(output_tarball, "r")
270271
271 files_list = [entry.name for entry in tarball]272 files_list = [entry.name for entry in tarball]
272 self.assertEqual(files_list, ['removed', 'c/c', 'c/a_i', 'c/d', 'c/j',273 self.assertEqual(files_list, [
273 'dir', 'e', 'f', 'system/o',274 'removed',
274 'system/o.1'])275 'c/c',
275276 'c/a_i',
277 'c/d',
278 'c/h',
279 'c/j',
280 'dir',
281 'e',
282 'f',
283 'system/o',
284 'system/o.1',
285 ])
276 removed_list = tarball.extractfile("removed")286 removed_list = tarball.extractfile("removed")
277 self.assertEqual(removed_list.read().decode("utf-8"), u"""b287 self.assertEqual(removed_list.read().decode("utf-8"), u"""b
278c/d288c/d
289c/h
279dir290dir
280system/中文中文中文291system/中文中文中文
281""")292""")
293
294
295class TestHardLinkTargetIsModified(unittest.TestCase):
296 def setUp(self):
297 self.temp_directory = tempfile.mkdtemp()
298 self.addCleanup(shutil.rmtree, self.temp_directory)
299
300 def _make_tarballs(self, order):
301 source_tarball_path = os.path.join(self.temp_directory, "source.tar")
302 target_tarball_path = os.path.join(self.temp_directory, "target.tar")
303
304 # Use an ExitStack() when we drop Python 2.7 compatibility.
305 source_tarball = tarfile.open(
306 source_tarball_path, "w", encoding="utf-8")
307 target_tarball = tarfile.open(
308 target_tarball_path, "w", encoding="utf-8")
309
310 if order == "a->b":
311 # Add a regular file to the source.
312 a_source = tarfile.TarInfo()
313 a_source.name = "a"
314 a_source.size = 4
315 source_tarball.addfile(a_source, BytesIO(b"XXXX"))
316
317 # Add a hardlink to this file.
318 b = tarfile.TarInfo()
319 b.name = "b"
320 b.type = tarfile.LNKTYPE
321 b.linkname = "a"
322 source_tarball.addfile(b)
323
324 # Change the content of link's target, i.e. the "a" file in the
325 # target tarball, but keep the hardlink pointing at it.
326 a_target = tarfile.TarInfo()
327 a_target.name = "a"
328 a_target.size = 5
329
330 target_tarball.addfile(a_target, BytesIO(b"YYYYY"))
331 target_tarball.addfile(b)
332 else:
333 assert order == "b->a", "Bad order: {}".format(order)
334 # Add a regular file to the source.
335 a_source = tarfile.TarInfo()
336 a_source.name = "a"
337 a_source.size = 4
338 source_tarball.addfile(a_source, BytesIO(b"XXXX"))
339
340 # Add a hardlink to this file.
341 b_source = tarfile.TarInfo()
342 b_source.name = "b"
343 b_source.type = tarfile.LNKTYPE
344 b_source.linkname = "a"
345 source_tarball.addfile(b_source)
346
347 # Swap things around in the target such that 'b' is the regular
348 # file and 'a' is the hardlink to b.
349 b_target = tarfile.TarInfo()
350 b_target.name = "b"
351 b_target.size = 5
352 target_tarball.addfile(b_target, BytesIO(b"YYYYY"))
353
354 a_target = tarfile.TarInfo()
355 a_target.name = "a"
356 a_target.type = tarfile.LNKTYPE
357 a_target.linkname = "b"
358 target_tarball.addfile(a_target)
359
360 source_tarball.close()
361 target_tarball.close()
362
363 return source_tarball_path, target_tarball_path
364
365 def test_link_count_2_order_ab(self):
366 # LP: #1444347 - a file with link count 2 (i.e. two hardlinks to the
367 # same inode) doesn't get both sources updated.
368 diff = ImageDiff(*self._make_tarballs("a->b"))
369 change_set = diff.compare_images()
370 self.assertEqual(change_set, {("a", "mod"), ("b", "mod")})
371
372 def test_unpack_ab(self):
373 # Ensure that the unpacked target tarball has a correct hardlink.
374 source_path, target_path = self._make_tarballs("a->b")
375 diff = ImageDiff(source_path, target_path)
376 diff_path = os.path.join(self.temp_directory, "diff.tar")
377 diff.generate_diff_tarball(diff_path)
378 # Unpack the source, then unpack the target over that.
379 unpack_path = os.path.join(self.temp_directory, "unpack")
380 os.mkdir(unpack_path)
381 with tarfile.open(source_path, "r:", encoding="utf-8") as tf:
382 tf.extractall(unpack_path)
383 # Before applying the diff, "b" contains the old "a" file's contents.
384 with open(os.path.join(unpack_path, "b"), "rb") as fp:
385 contents = fp.read()
386 self.assertEqual(contents, b"XXXX")
387 # Unpack the diff, which changes both the contents of 'a' and the
388 # hardlink 'b'.
389 with tarfile.open(diff_path, "r:", encoding="utf-8") as tf:
390 # Process any file removals first.
391 removed_list = tf.extractfile("removed")
392 for line in removed_list:
393 os.unlink(os.path.join(unpack_path,
394 line.decode("utf-8").rstrip()))
395 tf.extractall(unpack_path)
396 with open(os.path.join(unpack_path, "b"), "rb") as fp:
397 contents = fp.read()
398 self.assertEqual(contents, b"YYYYY")
399
400 def test_link_count_2_swap_roles(self):
401 # Like above but the source has regular file 'a' with hardlink 'b'
402 # pointing to it, while the target has regular file 'b' with the
403 # hardlink 'a' pointing to it. Both must be properly updated.
404 diff = ImageDiff(*self._make_tarballs("b->a"))
405 change_set = diff.compare_images()
406 self.assertEqual(change_set, {("a", "mod"), ("b", "mod")})
407
408 def test_unpack_ab_swap_roles(self):
409 # Ensure that the unpacked target tarball has a correct hardlink.
410 source_path, target_path = self._make_tarballs("b->a")
411 diff = ImageDiff(source_path, target_path)
412 diff_path = os.path.join(self.temp_directory, "diff.tar")
413 diff.generate_diff_tarball(diff_path)
414 # Unpack the source, then unpack the target over that.
415 unpack_path = os.path.join(self.temp_directory, "unpack")
416 os.mkdir(unpack_path)
417 with tarfile.open(source_path, "r:", encoding="utf-8") as tf:
418 tf.extractall(unpack_path)
419 # Before applying the diff, "b" contains the old "a" file's contents.
420 with open(os.path.join(unpack_path, "b"), "rb") as fp:
421 contents = fp.read()
422 self.assertEqual(contents, b"XXXX")
423 # Unpack the diff, which changes both the contents of 'a' and the
424 # hardlink 'b'.
425 with tarfile.open(diff_path, "r:", encoding="utf-8") as tf:
426 # Process any file removals first.
427 removed_list = tf.extractfile("removed")
428 for line in removed_list:
429 os.unlink(os.path.join(unpack_path,
430 line.decode("utf-8").rstrip()))
431 tf.extractall(unpack_path)
432 with open(os.path.join(unpack_path, "b"), "rb") as fp:
433 contents = fp.read()
434 self.assertEqual(contents, b"YYYYY")

Subscribers

People subscribed via source and target branches

to all changes: