Merge lp:~barry/ubuntu-system-image/lp1444347 into lp:ubuntu-system-image/server

Proposed by Barry Warsaw
Status: Merged
Merged at revision: 271
Proposed branch: lp:~barry/ubuntu-system-image/lp1444347
Merge into: lp:ubuntu-system-image/server
Diff against target: 433 lines (+254/-63)
2 files modified
lib/systemimage/diff.py (+89/-51)
lib/systemimage/tests/test_diff.py (+165/-12)
To merge this branch: bzr merge lp:~barry/ubuntu-system-image/lp1444347
Reviewer Review Type Date Requested Status
Steve Langasek (community) Needs Fixing
Caio Begotti (community) Approve
Stéphane Graber Pending
Registry Administrators Pending
Review via email: mp+260597@code.launchpad.net

Description of the change

Addresses LP: #1444347 by ensuring that any hardlinks which at first appear not to change between the source and target, but which their underlying link target *does* change, gets properly modified in the diff. This handles the following cases (among others):

source:
a (regular file)
b -> a (via hardlink)

target
a (regular file, modified)
b -> a (via hardlink)

If the diff doesn't not also include b then after application of the diff, b will point to a's *old* inode, but not the "new" a. This branch ensures that in this case, b also gets modified by the diff so that it will point to a's new inode.

Coincidentally, this also fixes a related previously broken case:

source:
a (regular file)
b -> a (via hardlink)

target:
b (regular file)
a -> b (via hardlink)

In this case, both a and b need to be modified in the diff.

This branch includes a few other drive-by improvements.

* Rather than use hard to read nested indexes to get at the bits of information, we use a namedtuple for the internal data structures, which allow us to use easy to read attribute access to ensure we're getting at the bits and pieces we intend to get to.

* Along those lines, this actually fixes some broken indexing in the big "switched hardlinks" conditional. E.g. devminor wasn't being checked previously, nor was gid. Those are now being checked.

* Lots of comments :)

* A few minor code style improvements.

One more important note as you're reviewing this branch. There is a change in an existing test, but I believe it is legitimate. In the original test, the hardlink c/h -> d did not get modified, but this is exactly the case we're trying to fix above. In the test, d does change but the hardlink to d (i.e. c/h) wasn't being changed. That means that after the delta, c/h would point to the old d inode, and we want it to point to the new d.

To post a comment you must log in.
Revision history for this message
Caio Begotti (caio1982) wrote :

Looks alright given the bug info, even though I can't reproduce or test it in a device myself :-)

review: Approve
Revision history for this message
Steve Langasek (vorlon) wrote :

Applying the changes to the test suite without the changes to the code, I see only the following set of test failures:

test_generate_tarball (systemimage.tests.test_diff.DiffTests) ... FAIL
test_print_changes (systemimage.tests.test_diff.DiffTests) ... FAIL
test_link_count_2_order_ab (systemimage.tests.test_diff.TestHardLinkTargetIsModified) ... FAIL

Crucially, the two unpack tests did *not* fail:

test_unpack_ab (systemimage.tests.test_diff.TestHardLinkTargetIsModified) ... ok
test_unpack_ab_swap_roles (systemimage.tests.test_diff.TestHardLinkTargetIsModified) ... ok

This indicates that the unpack tests have not succeeded in modeling the failure that was seen in the wild (see further comments inline).

Without that, I don't think we can be confident in the correctness of this fix.

I've posted a follow-up comment on the bug with some analysis of the original failed delta. I believe the processing of the 'removed' file is key - in the original scenario, system/usr/bin/python3.4m is removed and then re-unpacked, and system/usr/bin/python3.4 is unchanged as a result (because the removal breaks the relationship between the two hardlinks).

The unpack tests need to accurately replicate the behavior of the upgrader.

review: Needs Fixing
274. By Barry Warsaw

A more accurate test of the unpack case.

Revision history for this message
Barry Warsaw (barry) wrote :

Good catch, thanks! I've merged your branch into mine, so this should be ready to go. I'm running the full tox suite locally, but I expect it to pass.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'lib/systemimage/diff.py'
2--- lib/systemimage/diff.py 2015-04-27 20:04:09 +0000
3+++ lib/systemimage/diff.py 2015-06-01 16:00:55 +0000
4@@ -20,6 +20,7 @@
5 import tarfile
6 import time
7
8+from collections import namedtuple
9 from io import BytesIO
10
11
12@@ -41,6 +42,14 @@
13 return fd_source.read() == fd_target.read()
14
15
16+# For the data portion of the set and dict contents.
17+FHash = namedtuple('FHash', 'mode devmajor devminor type uid gid size mtime')
18+# The set contents record.
19+SContent = namedtuple('SContent', 'path filetype data')
20+# The dict contents record.
21+DContent = namedtuple('DContent', 'filetype data')
22+
23+
24 def list_tarfile(tarfile):
25 """
26 Walk through a tarfile and generate a list of the content.
27@@ -55,22 +64,23 @@
28
29 for entry in tarfile:
30 if entry.isdir():
31- set_content.add((entry.path, "dir", None))
32- dict_content[entry.path] = ("dir", None)
33+ set_content.add(SContent(entry.path, "dir", None))
34+ dict_content[entry.path] = DContent("dir", None)
35 else:
36- fhash = ("%s" % entry.mode,
37- "%s" % entry.devmajor,
38- "%s" % entry.devminor,
39- "%s" % entry.type.decode("utf-8"),
40- "%s" % entry.uid,
41- "%s" % entry.gid,
42- "%s" % entry.size,
43- "%s" % entry.mtime)
44-
45- set_content.add((entry.path, "file", fhash))
46- dict_content[entry.path] = ("file", fhash)
47-
48- return (set_content, dict_content)
49+ fhash = FHash(
50+ "%s" % entry.mode,
51+ "%s" % entry.devmajor,
52+ "%s" % entry.devminor,
53+ "%s" % entry.type.decode("utf-8"),
54+ "%s" % entry.uid,
55+ "%s" % entry.gid,
56+ "%s" % entry.size,
57+ "%s" % entry.mtime)
58+
59+ set_content.add(SContent(entry.path, "file", fhash))
60+ dict_content[entry.path] = DContent("file", fhash)
61+
62+ return set_content, dict_content
63
64
65 class ImageDiff:
66@@ -87,15 +97,12 @@
67 Scan the content of an image and return the image tuple.
68 This also caches the content for further use.
69 """
70-
71- if image not in ("source", "target"):
72+ image_file = getattr(self, image + "_file", None)
73+ if image_file is None:
74 raise KeyError("Invalid image '%s'." % image)
75
76- image_file = getattr(self, "%s_file" % image)
77-
78 content = list_tarfile(image_file)
79-
80- setattr(self, "%s_content" % image, content)
81+ setattr(self, image + "_content", content)
82 return content
83
84 def compare_images(self):
85@@ -105,63 +112,94 @@
86
87 The set contains tuples of (path, changetype).
88 """
89- if not self.source_content:
90+ if self.source_content is None:
91 self.scan_content("source")
92
93- if not self.target_content:
94+ if self.target_content is None:
95 self.scan_content("target")
96
97+ source_set, source_dict = self.source_content
98+ target_set, target_dict = self.target_content
99+
100 # Find the changes in the two trees
101 changes = set()
102- for change in self.source_content[0] \
103- .symmetric_difference(self.target_content[0]):
104- if change[0] not in self.source_content[1]:
105- changetype = "add"
106- elif change[0] not in self.target_content[1]:
107- changetype = "del"
108+ for change in source_set.symmetric_difference(target_set):
109+ if change.path not in source_dict:
110+ change_type = "add"
111+ elif change.path not in target_dict:
112+ change_type = "del"
113 else:
114- changetype = "mod"
115- changes.add((change[0], changetype))
116+ change_type = "mod"
117+ changes.add((change.path, change_type))
118+
119+ # Do a second pass through the source and target sets, looking for any
120+ # hardlinks that point to a file that's being modified in the target.
121+ # These links must get also get modified or they'll end up pointing to
122+ # the old inode.
123+ for no_change in source_set.intersection(target_set):
124+ if no_change.filetype == "file" and no_change.data.type == "1":
125+ # This is a hardlink which exists in both the source and
126+ # target, *and* points to the same link target (by virtue of
127+ # the set intersection).
128+ changes.add((no_change.path, "mod"))
129
130 # Ignore files that only vary in mtime
131 # (separate loop to run after de-dupe)
132 for change in sorted(changes):
133- if change[1] == "mod":
134- fstat_source = self.source_content[1][change[0]][1]
135- fstat_target = self.target_content[1][change[0]][1]
136+ change_path, change_type = change
137+ if change_type == "mod":
138+ fstat_source = source_dict[change_path].data
139+ fstat_target = target_dict[change_path].data
140
141 # Skip differences between directories and files
142 if not fstat_source or not fstat_target: # pragma: no cover
143 continue
144
145- # Deal with switched hardlinks
146- if (fstat_source[0:2] == fstat_target[0:2] and
147- fstat_source[3] != fstat_target[3] and
148- (fstat_source[3] == "1" or fstat_target[3] == "1") and
149- fstat_source[4:5] == fstat_target[4:5] and
150- fstat_source[7] == fstat_target[7]):
151- source_file = self.source_file.getmember(change[0])
152- target_file = self.target_file.getmember(change[0])
153+ # Deal with switched hardlinks.
154+ #
155+ # stgraber says on 2015-05-27: this was trying to solve the
156+ # case where the hardlink target would be placed *after* the
157+ # hardlink in the tar archive, leading to a hardlink being
158+ # created to the wrong file at unpack. barry thinks: ???
159+ if (
160+ fstat_source.mode == fstat_target.mode
161+ and fstat_source.devmajor == fstat_target.devmajor
162+ and fstat_source.devminor == fstat_target.devminor
163+ # "1" is the LNKTYPE, i.e. hard link.
164+ and (fstat_source.type == "1" or
165+ fstat_target.type == "1")
166+ and fstat_source.uid == fstat_target.uid
167+ and fstat_source.gid == fstat_target.gid
168+ # size is ignored since it is always 0 for hardlinks.
169+ and fstat_source.mtime == fstat_target.mtime):
170+ source_file = self.source_file.getmember(change_path)
171+ target_file = self.target_file.getmember(change_path)
172 if compare_files(
173- self.source_file.extractfile(change[0]),
174- self.target_file.extractfile(change[0])):
175+ self.source_file.extractfile(change_path),
176+ self.target_file.extractfile(change_path)):
177 changes.remove(change)
178 continue
179
180- # Deal with regular files
181+ # Deal with regular files. Compare all attributes of the file
182+ # except the mtime.
183 if fstat_source[0:7] == fstat_target[0:7]:
184- source_file = self.source_file.getmember(change[0])
185- target_file = self.target_file.getmember(change[0])
186-
187- if (source_file.linkpath
188+ source_file = self.source_file.getmember(change_path)
189+ target_file = self.target_file.getmember(change_path)
190+ # Symlinks that point to the same file in both the source
191+ # and target can be ignored, however *hardlinks* cannot,
192+ # since the inode they point to may change out from
193+ # underneath them.
194+ if (
195+ source_file.type == "2"
196+ and target_file.type == "2"
197 and source_file.linkpath == target_file.linkpath):
198 changes.remove(change)
199 continue
200
201 if (source_file.isfile() and target_file.isfile()
202 and compare_files(
203- self.source_file.extractfile(change[0]),
204- self.target_file.extractfile(change[0]))):
205+ self.source_file.extractfile(change_path),
206+ self.target_file.extractfile(change_path))):
207 changes.remove(change)
208 continue
209
210
211=== modified file 'lib/systemimage/tests/test_diff.py'
212--- lib/systemimage/tests/test_diff.py 2015-04-27 20:04:09 +0000
213+++ lib/systemimage/tests/test_diff.py 2015-06-01 16:00:55 +0000
214@@ -15,6 +15,7 @@
215 # You should have received a copy of the GNU General Public License
216 # along with this program. If not, see <http://www.gnu.org/licenses/>.
217
218+import os
219 import shutil
220 import sys
221 import tarfile
222@@ -26,11 +27,14 @@
223
224
225 class DiffTests(unittest.TestCase):
226+ maxDiff = None
227+
228 def setUp(self):
229- temp_directory = tempfile.mkdtemp()
230+ self.temp_directory = temp_directory = tempfile.mkdtemp()
231+ self.addCleanup(shutil.rmtree, temp_directory)
232
233- source_tarball_path = "%s/source.tar" % temp_directory
234- target_tarball_path = "%s/target.tar" % temp_directory
235+ source_tarball_path = os.path.join(temp_directory, "source.tar")
236+ target_tarball_path = os.path.join(temp_directory, "target.tar")
237
238 source_tarball = tarfile.open(
239 source_tarball_path, "w", encoding="utf-8")
240@@ -205,10 +209,6 @@
241 self.imagediff = ImageDiff(source_tarball_path, target_tarball_path)
242 self.source_tarball_path = source_tarball_path
243 self.target_tarball_path = target_tarball_path
244- self.temp_directory = temp_directory
245-
246- def tearDown(self):
247- shutil.rmtree(self.temp_directory)
248
249 def test_content(self):
250 content_set, content_dict = self.imagediff.scan_content("source")
251@@ -249,10 +249,11 @@
252 output = sys.stdout.getvalue()
253 sys.stdout = old_stdout
254
255- self.assertEqual(output, """ - b (del)
256+ self.assertMultiLineEqual(output, """ - b (del)
257 - c/a_i (add)
258 - c/c (add)
259 - c/d (mod)
260+ - c/h (mod)
261 - c/j (add)
262 - dir (mod)
263 - e (add)
264@@ -269,13 +270,165 @@
265 tarball = tarfile.open(output_tarball, "r")
266
267 files_list = [entry.name for entry in tarball]
268- self.assertEqual(files_list, ['removed', 'c/c', 'c/a_i', 'c/d', 'c/j',
269- 'dir', 'e', 'f', 'system/o',
270- 'system/o.1'])
271-
272+ self.assertEqual(files_list, [
273+ 'removed',
274+ 'c/c',
275+ 'c/a_i',
276+ 'c/d',
277+ 'c/h',
278+ 'c/j',
279+ 'dir',
280+ 'e',
281+ 'f',
282+ 'system/o',
283+ 'system/o.1',
284+ ])
285 removed_list = tarball.extractfile("removed")
286 self.assertEqual(removed_list.read().decode("utf-8"), u"""b
287 c/d
288+c/h
289 dir
290 system/中文中文中文
291 """)
292+
293+
294+class TestHardLinkTargetIsModified(unittest.TestCase):
295+ def setUp(self):
296+ self.temp_directory = tempfile.mkdtemp()
297+ self.addCleanup(shutil.rmtree, self.temp_directory)
298+
299+ def _make_tarballs(self, order):
300+ source_tarball_path = os.path.join(self.temp_directory, "source.tar")
301+ target_tarball_path = os.path.join(self.temp_directory, "target.tar")
302+
303+ # Use an ExitStack() when we drop Python 2.7 compatibility.
304+ source_tarball = tarfile.open(
305+ source_tarball_path, "w", encoding="utf-8")
306+ target_tarball = tarfile.open(
307+ target_tarball_path, "w", encoding="utf-8")
308+
309+ if order == "a->b":
310+ # Add a regular file to the source.
311+ a_source = tarfile.TarInfo()
312+ a_source.name = "a"
313+ a_source.size = 4
314+ source_tarball.addfile(a_source, BytesIO(b"XXXX"))
315+
316+ # Add a hardlink to this file.
317+ b = tarfile.TarInfo()
318+ b.name = "b"
319+ b.type = tarfile.LNKTYPE
320+ b.linkname = "a"
321+ source_tarball.addfile(b)
322+
323+ # Change the content of link's target, i.e. the "a" file in the
324+ # target tarball, but keep the hardlink pointing at it.
325+ a_target = tarfile.TarInfo()
326+ a_target.name = "a"
327+ a_target.size = 5
328+
329+ target_tarball.addfile(a_target, BytesIO(b"YYYYY"))
330+ target_tarball.addfile(b)
331+ else:
332+ assert order == "b->a", "Bad order: {}".format(order)
333+ # Add a regular file to the source.
334+ a_source = tarfile.TarInfo()
335+ a_source.name = "a"
336+ a_source.size = 4
337+ source_tarball.addfile(a_source, BytesIO(b"XXXX"))
338+
339+ # Add a hardlink to this file.
340+ b_source = tarfile.TarInfo()
341+ b_source.name = "b"
342+ b_source.type = tarfile.LNKTYPE
343+ b_source.linkname = "a"
344+ source_tarball.addfile(b_source)
345+
346+ # Swap things around in the target such that 'b' is the regular
347+ # file and 'a' is the hardlink to b.
348+ b_target = tarfile.TarInfo()
349+ b_target.name = "b"
350+ b_target.size = 5
351+ target_tarball.addfile(b_target, BytesIO(b"YYYYY"))
352+
353+ a_target = tarfile.TarInfo()
354+ a_target.name = "a"
355+ a_target.type = tarfile.LNKTYPE
356+ a_target.linkname = "b"
357+ target_tarball.addfile(a_target)
358+
359+ source_tarball.close()
360+ target_tarball.close()
361+
362+ return source_tarball_path, target_tarball_path
363+
364+ def test_link_count_2_order_ab(self):
365+ # LP: #1444347 - a file with link count 2 (i.e. two hardlinks to the
366+ # same inode) doesn't get both sources updated.
367+ diff = ImageDiff(*self._make_tarballs("a->b"))
368+ change_set = diff.compare_images()
369+ self.assertEqual(change_set, {("a", "mod"), ("b", "mod")})
370+
371+ def test_unpack_ab(self):
372+ # Ensure that the unpacked target tarball has a correct hardlink.
373+ source_path, target_path = self._make_tarballs("a->b")
374+ diff = ImageDiff(source_path, target_path)
375+ diff_path = os.path.join(self.temp_directory, "diff.tar")
376+ diff.generate_diff_tarball(diff_path)
377+ # Unpack the source, then unpack the target over that.
378+ unpack_path = os.path.join(self.temp_directory, "unpack")
379+ os.mkdir(unpack_path)
380+ with tarfile.open(source_path, "r:", encoding="utf-8") as tf:
381+ tf.extractall(unpack_path)
382+ # Before applying the diff, "b" contains the old "a" file's contents.
383+ with open(os.path.join(unpack_path, "b"), "rb") as fp:
384+ contents = fp.read()
385+ self.assertEqual(contents, b"XXXX")
386+ # Unpack the diff, which changes both the contents of 'a' and the
387+ # hardlink 'b'.
388+ with tarfile.open(diff_path, "r:", encoding="utf-8") as tf:
389+ # Process any file removals first.
390+ removed_list = tf.extractfile("removed")
391+ for line in removed_list:
392+ os.unlink(os.path.join(unpack_path,
393+ line.decode("utf-8").rstrip()))
394+ tf.extractall(unpack_path)
395+ with open(os.path.join(unpack_path, "b"), "rb") as fp:
396+ contents = fp.read()
397+ self.assertEqual(contents, b"YYYYY")
398+
399+ def test_link_count_2_swap_roles(self):
400+ # Like above but the source has regular file 'a' with hardlink 'b'
401+ # pointing to it, while the target has regular file 'b' with the
402+ # hardlink 'a' pointing to it. Both must be properly updated.
403+ diff = ImageDiff(*self._make_tarballs("b->a"))
404+ change_set = diff.compare_images()
405+ self.assertEqual(change_set, {("a", "mod"), ("b", "mod")})
406+
407+ def test_unpack_ab_swap_roles(self):
408+ # Ensure that the unpacked target tarball has a correct hardlink.
409+ source_path, target_path = self._make_tarballs("b->a")
410+ diff = ImageDiff(source_path, target_path)
411+ diff_path = os.path.join(self.temp_directory, "diff.tar")
412+ diff.generate_diff_tarball(diff_path)
413+ # Unpack the source, then unpack the target over that.
414+ unpack_path = os.path.join(self.temp_directory, "unpack")
415+ os.mkdir(unpack_path)
416+ with tarfile.open(source_path, "r:", encoding="utf-8") as tf:
417+ tf.extractall(unpack_path)
418+ # Before applying the diff, "b" contains the old "a" file's contents.
419+ with open(os.path.join(unpack_path, "b"), "rb") as fp:
420+ contents = fp.read()
421+ self.assertEqual(contents, b"XXXX")
422+ # Unpack the diff, which changes both the contents of 'a' and the
423+ # hardlink 'b'.
424+ with tarfile.open(diff_path, "r:", encoding="utf-8") as tf:
425+ # Process any file removals first.
426+ removed_list = tf.extractfile("removed")
427+ for line in removed_list:
428+ os.unlink(os.path.join(unpack_path,
429+ line.decode("utf-8").rstrip()))
430+ tf.extractall(unpack_path)
431+ with open(os.path.join(unpack_path, "b"), "rb") as fp:
432+ contents = fp.read()
433+ self.assertEqual(contents, b"YYYYY")

Subscribers

People subscribed via source and target branches

to all changes: