Merge lp:~eric97/bzr/regenerate-pack-names into lp:bzr

Proposed by Eric Siegerman on 2011-02-15
Status: Work in progress
Proposed branch: lp:~eric97/bzr/regenerate-pack-names
Merge into: lp:bzr
Diff against target: 455 lines (+325/-8)
8 files modified
bzrlib/builtins.py (+32/-0)
bzrlib/repofmt/pack_repo.py (+53/-4)
bzrlib/repository.py (+12/-1)
bzrlib/tests/__init__.py (+1/-1)
bzrlib/tests/blackbox/__init__.py (+1/-0)
bzrlib/tests/blackbox/test_regenerate_pack_names.py (+131/-0)
bzrlib/tests/per_pack_repository.py (+91/-2)
doc/en/release-notes/bzr-2.4.txt (+4/-0)
To merge this branch: bzr merge lp:~eric97/bzr/regenerate-pack-names
Reviewer Review Type Date Requested Status
Martin Pool 2011-02-15 Needs Fixing on 2011-02-15
Review via email: mp+49756@code.launchpad.net

Description of the change

I clobbered the pack-names file in one of my shared repos. (Not
to worry; I'm 100% certain it was my screwup, *not* Bazaar's.)

This MP is the result. It adds a hidden "regenerate-pack-names"
command.

It does *not* actually overwrite .bzr/repository/pack-names, but
rather writes its new version to "pack-names.generated", for the
user to inspect and, if it looks good, rename to the live
filename. (Yeah, I know JAM's new repair-dirstate command does
the repairs in-place, but pack-names seems a far more dangerous
file to replace, so I wanted to build in an extra layer of
paranoia.)

Two issues before it's ready to merge:
  - Locking: Does this need to lock the repo? If so, read or
    write? Strictly, it should block other processes from even
    renaming new packs into packs/*.pack for the duration, but is
    that even possible? Not much point locking the
    rewrite-pack-names mutex, because it doesn't in fact write
    that file...

  - So far, I've only tested against local file access; no
    remote Transports. I wonder whether there's any point
    supporting it remotely, given that you need local shell
    access to rename the new file into service anyway.

To post a comment you must log in.
Martin Pool (mbp) wrote :

Thanks for addressing that gap. I don't recall hearing of anyone else hitting that exact problem, but it's certainly good to have ways for people to recover whenever we can.

For consistency I think we should call this repair-pack-names.

I think you should hold a lock while doing this. I think the repo lock is held while this is changed. You should check.

--force doesn't seem to have any effect, and the help implies it does.

I see your point about writing it to a temporary file, but I think that just makes it more difficult for people who've got into this state. I would rather for example back up the existing file, or indeed back up the whole repo dir, or show a description of the differences. For users who are not super technical and perhaps have their tree stored on a remote file server, asking them to also find and rename the file just makes it more difficult.

regenerate_pack_names seems likely to be duplicating some knowledge about the file format that's already present.

+ (hdr, err) = self.run_bzr_error([], ['dump-btree', '--raw', path],
+ retcode=0)

It would be cleaner and faster to use an API call for these rather than starting a new copy of bzr.

+ def assertFileExists(self, path):
+ self.assertTrue(os.path.exists(path), "%s exists" % path)
+
+ def assertNotFileExists(self, path):
+ self.assertFalse(os.path.exists(path), "%s does not exist" % path)
+

There is already failIfExists and failUnlessExists in TestCase.

  - So far, I've only tested against local file access; no
    remote Transports. I wonder whether there's any point
    supporting it remotely, given that you need local shell
    access to rename the new file into service anyway.

You're not doing anything transport-specific so I don't see any reason you would need to run per-transport tests. However, it would be good to make the unit tests run against a memory transport because that will be faster and it will validate you're not doing anything accidentally local fs specific.

If you want help with anything just ask.

Thanks, Martin

review: Needs Fixing
Eric Siegerman (eric97) wrote :

I've set this back to Work in Progress while I address Martin's comments.

Martin Pool (mbp) wrote :

On 19 February 2011 06:08, Eric Siegerman <email address hidden> wrote:
> I've set this back to Work in Progress while I address Martin's comments.

If you want help, or want someone else to finish it, just ask.

Unmerged revisions

5671. By Eric Siegerman <email address hidden> on 2011-02-15

Blackbox tests now verify that the old pack-names wasn't touched.

5670. By Eric Siegerman <email address hidden> on 2011-02-15

Doc and release-note fixes.

5669. By Eric Siegerman <email address hidden> on 2011-02-15

Add a test case, and rename another one.

5668. By Eric Siegerman <email address hidden> on 2011-02-14

Implement pack_repo.RepositoryPackCollection.regenerate_pack_names().

5667. By Eric Siegerman <email address hidden> on 2011-02-14

Push the implementation down a couple more levels.
It still doesn't do anything, but checks for a bunch of error
conditions first.

5666. By Eric Siegerman <email address hidden> on 2011-02-14

A couple more doc fixes: simple typos.

5665. By Eric Siegerman <email address hidden> on 2011-02-14

Drive-by doc fix: TestCase.run_captured() doesn't exist, but .run_bzr_error() captures output,
which is why the comment in question seems to think one might want to "see also" it.

5664. By Eric Siegerman <email address hidden> on 2011-02-13

Drive-by doc fix.

5663. By Eric Siegerman <email address hidden> on 2011-02-13

Skeleton.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'bzrlib/builtins.py'
2--- bzrlib/builtins.py 2011-02-09 17:10:05 +0000
3+++ bzrlib/builtins.py 2011-02-15 02:33:59 +0000
4@@ -537,6 +537,38 @@
5 raise errors.BzrCommandError('failed to reset the tree state'
6 + extra)
7
8+class cmd_regenerate_pack_names(Command):
9+ __doc__ = """Regenerate the pack-names file.
10+
11+ This is not meant to be used normally, but only as a way to recover from
12+ filesystem corruption, etc. This creates a new .bzr/repository/pack-names
13+ file for the repository from scratch, including in it all packs in the
14+ repository's packs directory.
15+
16+ The existing pack-names file is *not* overwritten; the new file is written
17+ to .bzr/repository/pack-names.generated. After running this command, you
18+ should look over the new file, e.g. using `bzr dump-btree`; then, if it
19+ looks OK, move it into place manually.
20+ """
21+
22+ takes_options = ['directory',
23+ Option('force',
24+ help='Regenerate the file even if it doesn\'t appear to be'
25+ ' corrupted.'),
26+ ]
27+ hidden = True
28+
29+ def run(self, directory='.', force=False):
30+ bzr_dir = bzrdir.BzrDir.open(directory)
31+ repository = bzr_dir.open_repository()
32+ try:
33+ new_content = repository.regenerate_pack_names()
34+ except NotImplementedError:
35+ raise errors.BzrCommandError(directory + ': not a pack repository')
36+ t = repository._transport
37+ mode=repository.bzrdir._get_file_mode()
38+ t.put_file('pack-names.generated', new_content, mode)
39+
40
41 class cmd_revno(Command):
42 __doc__ = """Show current revision number.
43
44=== modified file 'bzrlib/repofmt/pack_repo.py'
45--- bzrlib/repofmt/pack_repo.py 2011-01-20 21:15:10 +0000
46+++ bzrlib/repofmt/pack_repo.py 2011-02-15 02:33:59 +0000
47@@ -1896,10 +1896,11 @@
48 def _diff_pack_names(self):
49 """Read the pack names from disk, and compare it to the one in memory.
50
51- :return: (disk_nodes, deleted_nodes, new_nodes)
52- disk_nodes The final set of nodes that should be referenced
53- deleted_nodes Nodes which have been removed from when we started
54- new_nodes Nodes that are newly introduced
55+ :return: (disk_nodes, deleted_nodes, new_nodes, orig_disk_nodes)
56+ disk_nodes The final set of nodes that should be referenced
57+ deleted_nodes Nodes which have been removed from when we started
58+ new_nodes Nodes that are newly introduced
59+ orig_disk_nodes Original set of nodes that are already referenced
60 """
61 # load the disk nodes across
62 disk_nodes = set()
63@@ -2216,6 +2217,47 @@
64 for token in tokens:
65 self._resume_pack(token)
66
67+ def regenerate_pack_names(self):
68+ """Build a new pack-names file by examining the pack and index files on disk.
69+
70+ Does *not* modify the existing pack-names file, but returns the contents
71+ for a new one.
72+
73+ :return: The contents that (this method believes) should be written
74+ to pack-names
75+ """
76+ if not self._pack_transport.listable():
77+ raise Exception("can't list repository's contents")
78+
79+ # Paranoia: make sure we don't have *anything* cached
80+ self.reset()
81+
82+ builder = self._index_builder_class()
83+ files = self._pack_transport.list_dir('.')
84+ for file in files:
85+ if not file.endswith('.pack'):
86+ continue
87+ pack_name = file[0:-5]
88+
89+ sizes = [None, None, None, None]
90+ if self.chk_index is not None:
91+ sizes.append(None)
92+ for suffix in self._suffix_offsets.keys():
93+ if suffix == '.cix' and self.chk_index is None:
94+ continue
95+ size_offset = self._suffix_offsets[suffix]
96+ index_name = pack_name + suffix
97+ size = self._index_transport.stat(index_name).st_size
98+ sizes[size_offset] = str(size)
99+
100+ builder.add_node((pack_name,), " ".join(sizes))
101+
102+ # More paranoia: if anything did make it back into the cache,
103+ # our caller might well be about to change things out from under it
104+ self.reset()
105+
106+ return builder.finish()
107+
108
109 class KnitPackRepository(KnitRepository):
110 """Repository with knit objects stored inside pack containers.
111@@ -2432,6 +2474,13 @@
112 packer = ReconcilePacker(collection, packs, extension, revs)
113 return packer.pack(pb)
114
115+ def regenerate_pack_names(self):
116+ """Regenerate the pack-names file within the repository.
117+
118+ This is only for disaster recovery; it should not be called in day-to-day use.
119+ """
120+ return self._pack_collection.regenerate_pack_names()
121+
122 @only_raises(errors.LockNotHeld, errors.LockBroken)
123 def unlock(self):
124 if self._write_lock_count == 1 and self._write_group is not None:
125
126=== modified file 'bzrlib/repository.py'
127--- bzrlib/repository.py 2011-02-08 15:41:44 +0000
128+++ bzrlib/repository.py 2011-02-15 02:33:59 +0000
129@@ -1439,7 +1439,7 @@
130 def lock_write(self, token=None):
131 """Lock this repository for writing.
132
133- This causes caching within the repository obejct to start accumlating
134+ This causes caching within the repository object to start accumlating
135 data during reads, and allows a 'write_group' to be obtained. Write
136 groups must be used for actual data insertion.
137
138@@ -2866,6 +2866,17 @@
139 """
140 raise NotImplementedError(self.revision_graph_can_have_wrong_parents)
141
142+ def regenerate_pack_names(self):
143+ """Build a new pack-names file by examining the pack and index files on disk.
144+
145+ Does *not* modify the existing pack-names file, but returns the contents
146+ for a new one.
147+
148+ :return: The contents that (this method believes) should be written
149+ to pack-names
150+ """
151+ raise NotImplementedError(self.regenerate_pack_names)
152+
153
154 def install_revision(repository, rev, revision_tree):
155 """Install all revision data into a repository."""
156
157=== modified file 'bzrlib/tests/__init__.py'
158--- bzrlib/tests/__init__.py 2011-02-11 17:12:35 +0000
159+++ bzrlib/tests/__init__.py 2011-02-15 02:33:59 +0000
160@@ -1896,7 +1896,7 @@
161 or a functional test of the library.)
162
163 This sends the stdout/stderr results into the test's log,
164- where it may be useful for debugging. See also run_captured.
165+ where it may be useful for debugging. See also run_bzr_error.
166
167 :keyword stdin: A string to be used as stdin for the command.
168 :keyword retcode: The status code the command should return;
169
170=== modified file 'bzrlib/tests/blackbox/__init__.py'
171--- bzrlib/tests/blackbox/__init__.py 2011-01-27 17:45:24 +0000
172+++ bzrlib/tests/blackbox/__init__.py 2011-02-15 02:33:59 +0000
173@@ -99,6 +99,7 @@
174 'test_remove',
175 'test_re_sign',
176 'test_remove_tree',
177+ 'test_regenerate_pack_names',
178 'test_repair_workingtree',
179 'test_resolve',
180 'test_revert',
181
182=== added file 'bzrlib/tests/blackbox/test_regenerate_pack_names.py'
183--- bzrlib/tests/blackbox/test_regenerate_pack_names.py 1970-01-01 00:00:00 +0000
184+++ bzrlib/tests/blackbox/test_regenerate_pack_names.py 2011-02-15 02:33:59 +0000
185@@ -0,0 +1,131 @@
186+# Copyright (C) 2011 Canonical Ltd
187+#
188+# This program is free software; you can redistribute it and/or modify
189+# it under the terms of the GNU General Public License as published by
190+# the Free Software Foundation; either version 2 of the License, or
191+# (at your option) any later version.
192+#
193+# This program is distributed in the hope that it will be useful,
194+# but WITHOUT ANY WARRANTY; without even the implied warranty of
195+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
196+# GNU General Public License for more details.
197+#
198+# You should have received a copy of the GNU General Public License
199+# along with this program; if not, write to the Free Software
200+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
201+
202+
203+import os
204+from bzrlib import (
205+ workingtree,
206+ )
207+from bzrlib.tests import TestCaseWithTransport
208+
209+
210+class TestRegeneratePackNames(TestCaseWithTransport):
211+
212+ def get_btree_content(self, path):
213+ """Produce a text dump of a B-Tree file.
214+
215+ :return: A hybrid dump: a --raw dump of the header,
216+ followed by a formatted dump of the content
217+ """
218+ (hdr, err) = self.run_bzr_error([], ['dump-btree', '--raw', path],
219+ retcode=0)
220+ self.assertEquals("", err)
221+ data_offset = hdr.find("\nPage 0")
222+ if data_offset != -1:
223+ hdr = hdr[0:data_offset]
224+
225+ (out, err) = self.run_bzr_error([], ['dump-btree', path], retcode=0)
226+ self.assertEquals("", err)
227+ return hdr + out
228+
229+ def assertFileExists(self, path):
230+ self.assertTrue(os.path.exists(path), "%s exists" % path)
231+
232+ def assertNotFileExists(self, path):
233+ self.assertFalse(os.path.exists(path), "%s does not exist" % path)
234+
235+ def assertFileContains(self, path, expected):
236+ """Check an actual pack-names file"""
237+ if expected is None:
238+ self.assertNotFileExists(path)
239+ else:
240+ self.assertFileExists(path)
241+ actual = self.get_btree_content(path)
242+ self.assertEqualDiff(expected, actual)
243+
244+ def check_files(self, live_expected, gen_expected):
245+ """Check the pack-names files after a test run.
246+
247+ :param live_expected: Expected content of the real pack-names file
248+ (which will be its original contents unless our caller explicitly
249+ modified the file). If None, the file is expected *not* to exist.
250+ :param gen_expected: Expected content of the pack-names.generated
251+ file that might have been produced from the code under test.
252+ If None, the file is expected *not* to exist.
253+ """
254+
255+ self.assertFileContains("tree/.bzr/repository/pack-names",
256+ live_expected)
257+ self.assertFileContains("tree/.bzr/repository/pack-names.generated",
258+ gen_expected)
259+
260+ def make_initial_tree(self, format=None, read_pack_names=True):
261+ tree = self.make_branch_and_tree('tree', format=format)
262+ self.build_tree(['tree/foo', 'tree/dir/', 'tree/dir/bar'])
263+ tree.add(['foo', 'dir', 'dir/bar'])
264+ tree.commit('first')
265+ if read_pack_names:
266+ content = self.get_btree_content("tree/.bzr/repository/pack-names")
267+ else:
268+ content = None
269+ return tree, content
270+
271+ # Success cases
272+
273+ def test_regenerate_pack_names_2a(self):
274+ tree, old_pack_names = self.make_initial_tree(format="2a",
275+ read_pack_names=True)
276+ self.run_bzr('regenerate-pack-names -d tree')
277+ self.check_files(old_pack_names, old_pack_names)
278+
279+ def test_regenerate_pack_names_default(self):
280+ """Some day a new format might become the default. Then,
281+ this test will have its moment in the sun :-)
282+ """
283+ tree, old_pack_names = self.make_initial_tree()
284+ self.run_bzr('regenerate-pack-names -d tree')
285+ self.check_files(old_pack_names, old_pack_names)
286+
287+ # Error cases
288+
289+ def test_regenerate_pack_names_knit(self):
290+ """Can't regenerate-pack-names in a repo that doesn't have one"""
291+ tree, old_pack_names = self.make_initial_tree(format="knit",
292+ read_pack_names=False)
293+ self.run_bzr_error([": not a pack repository"],
294+ 'regenerate-pack-names -d tree')
295+ self.check_files(None, None)
296+
297+ def test_regenerate_pack_names_dir_no_bzrdir(self):
298+ """regenerate-pack-names should fail on a directory that has no .bzr"""
299+ tree, old_pack_names = self.make_initial_tree()
300+ self.run_bzr_error(['Not a branch'],
301+ 'regenerate-pack-names -d tree/dir')
302+ self.check_files(old_pack_names, None)
303+
304+ def test_regenerate_pack_names_file(self):
305+ """regenerate-pack-names should fail on a file"""
306+ tree, old_pack_names = self.make_initial_tree()
307+ self.run_bzr_error(['Not a branch'],
308+ 'regenerate-pack-names -d tree/foo')
309+ self.check_files(old_pack_names, None)
310+
311+ def test_regenerate_pack_names_nonexistent(self):
312+ """regenerate-pack-names should fail on a nonexistent pathname"""
313+ tree, old_pack_names = self.make_initial_tree()
314+ self.run_bzr_error(['Not a branch'],
315+ 'regenerate-pack-names -d nonex')
316+ self.check_files(old_pack_names, None)
317
318=== modified file 'bzrlib/tests/per_pack_repository.py'
319--- bzrlib/tests/per_pack_repository.py 2011-01-26 19:34:58 +0000
320+++ bzrlib/tests/per_pack_repository.py 2011-02-15 02:33:59 +0000
321@@ -19,6 +19,10 @@
322 These tests are repeated for all pack-based repository formats.
323 """
324
325+from os import (
326+ listdir,
327+ remove,
328+ )
329 from stat import S_ISDIR
330
331 from bzrlib.btree_index import BTreeGraphIndex
332@@ -53,7 +57,7 @@
333
334 The following are populated from the test scenario:
335
336- :ivar format_name: Registered name fo the format to test.
337+ :ivar format_name: Registered name of the format to test.
338 :ivar format_string: On-disk format marker.
339 :ivar format_supports_external_lookups: Boolean.
340 """
341@@ -710,7 +714,7 @@
342
343 def _lock_write(self, write_lockable):
344 """Lock write_lockable, add a cleanup and return the result.
345-
346+
347 :param write_lockable: An object with a lock_write method.
348 :return: The result of write_lockable.lock_write().
349 """
350@@ -1120,6 +1124,91 @@
351 self.assertEqual(2, streaming_calls)
352
353
354+class TestRegeneratePackNames(TestCaseWithTransport):
355+ def get_format(self):
356+ return bzrdir.format_registry.make_bzrdir(self.format_name)
357+
358+ def read_file(self, path):
359+ f = file(path, 'rb')
360+ try:
361+ return f.read()
362+ finally:
363+ f.close()
364+
365+ self.fail("Should never get here")
366+
367+ def make_initial_tree(self, num_commits):
368+ """Build and populate a working tree.
369+
370+ :param num_commits: Number of commits to perform (for sufficiently small
371+ values, i.e. until autopacking happens, this is also the number of
372+ packs to create)
373+ :returns: (tree, old_pack_names)
374+ tree The WorkingTree
375+ old_pack_names Content of the repository's pack-names file
376+ """
377+ format = self.get_format()
378+ tree = self.make_branch_and_tree('tree', format=format)
379+ self.build_tree(['tree/foo', 'tree/dir/', 'tree/dir/bar'])
380+ tree.add(['foo', 'dir', 'dir/bar'])
381+ for i in range(1, num_commits + 1):
382+ tree.commit("revision #%d" % i)
383+ old_pack_names = self.read_file('tree/.bzr/repository/pack-names')
384+ return tree, old_pack_names
385+
386+ def run_and_check(self, tree, live_expected, gen_expected):
387+ """Run the method under test, and check the results.
388+
389+ :param tree: The working tree
390+ :param live_expected: Expected content of the real pack-names file
391+ (which will be its original contents unless our caller explicitly
392+ modified the file)
393+ :param gen_expected: Expected content of the newly generated pack-names
394+ """
395+ # Run the code under test
396+ reader = tree.branch.repository.regenerate_pack_names()
397+ # Check its output...
398+ gen_actual = reader.read()
399+ self.assertEqualDiff(gen_expected, gen_actual)
400+ # ...and the live pack-names file
401+ live_actual = self.read_file('tree/.bzr/repository/pack-names')
402+ self.assertEqualDiff(live_expected, live_actual)
403+
404+ def test_zero_packs(self):
405+ tree, old_pack_names = self.make_initial_tree(0)
406+ self.run_and_check(tree, old_pack_names, old_pack_names)
407+
408+ def test_one_pack(self):
409+ tree, old_pack_names = self.make_initial_tree(1)
410+ self.run_and_check(tree, old_pack_names, old_pack_names)
411+
412+ def test_three_packs(self):
413+ tree, old_pack_names = self.make_initial_tree(3)
414+ self.run_and_check(tree, old_pack_names, old_pack_names)
415+
416+ def test_with_real_pack_names_clobbered(self):
417+ """With the real pack-names clobbered, should succeed normally"""
418+ tree, old_pack_names = self.make_initial_tree(1)
419+ f = file('tree/.bzr/repository/pack-names', 'w')
420+ f.write("bogus\n")
421+ f.close()
422+ self.run_and_check(tree, "bogus\n", old_pack_names)
423+
424+ def test_with_an_index_deleted(self):
425+ """Hard-core repo corruption should raise an appropriate exception"""
426+ tree, old_pack_names = self.make_initial_tree(1)
427+ indices = listdir('tree/.bzr/repository/indices')
428+ for index in indices:
429+ if index.endswith('.tix'): # .tix chosen one at random
430+ remove('tree/.bzr/repository/indices/' + index)
431+
432+ self.assertRaises(errors.NoSuchFile,
433+ tree.branch.repository.regenerate_pack_names)
434+
435+ new_pack_names = self.read_file('tree/.bzr/repository/pack-names')
436+ self.assertEqualDiff(old_pack_names, new_pack_names)
437+
438+
439 def load_tests(basic_tests, module, loader):
440 # these give the bzrdir canned format name, and the repository on-disk
441 # format string
442
443=== modified file 'doc/en/release-notes/bzr-2.4.txt'
444--- doc/en/release-notes/bzr-2.4.txt 2011-02-14 12:03:05 +0000
445+++ doc/en/release-notes/bzr-2.4.txt 2011-02-15 02:33:59 +0000
446@@ -36,6 +36,10 @@
447 the dirstate file to be rebuilt, rather than using a ``bzr checkout``
448 workaround. (John Arbash Meinel)
449
450+* A new hidden command ``bzr regenerate-pack-names``. This builds a new
451+ pack-names file (useful if the old one was lost or damaged).
452+ (Eric Siegerman)
453+
454 * Branching, merging and pulling a branch now copies revisions named in
455 tags, not just the tag metadata. (Andrew Bennetts, #309682)
456