Merge lp:~agrimm/bzr-fastimport/baseline-commit into lp:bzr-fastimport

Proposed by Andy Grimm
Status: Merged
Approved by: Jelmer Vernooij
Approved revision: no longer in the source branch.
Merged at revision: 340
Proposed branch: lp:~agrimm/bzr-fastimport/baseline-commit
Merge into: lp:bzr-fastimport
Diff against target: 205 lines (+109/-3)
3 files modified
cmds.py (+16/-2)
exporter.py (+19/-1)
tests/test_commands.py (+74/-0)
To merge this branch: bzr merge lp:~agrimm/bzr-fastimport/baseline-commit
Reviewer Review Type Date Requested Status
Jelmer Vernooij (community) Approve
Review via email: mp+83025@code.launchpad.net

Description of the change

Adds a --baseline option to fast-export to produce an export with a truncated history (a "baseline" plus all commits in the range given by the revisionspec).

To post a comment you must log in.
Revision history for this message
Jelmer Vernooij (jelmer) wrote :

Hi Andy,

Thanks again for working on this.

Generally, we try to put commands that modify fastexport streams into "bzr fast-import-filter", rather than "bzr fast-export". In this case, I guess it also skips

+ help="Export an 'abolute' baseline commit prior to"
This should probably say "absolute".

+ assert(len(revobj.parent_ids))
+ previd = revobj.parent_ids[0]
It seems a bit odd that the code uses the parent of the revision that was specified, rather than that revision itself.

Can you add some tests for this new behaviour?

Cheers,

Jelmer

Revision history for this message
Andy Grimm (agrimm) wrote :

> Hi Andy,
>
> Thanks again for working on this.
>
> Generally, we try to put commands that modify fastexport streams into "bzr
> fast-import-filter", rather than "bzr fast-export". In this case, I guess it
> also skips

If you think that having it in fast-import-filter is useful, I can do that. Will that change the logic about how the baseline is produced? Also, was your last sentence cut off here? I didn't understand it as written (skips what?)

> + help="Export an 'abolute' baseline commit prior to"
> This should probably say "absolute".

Indeed.

> + assert(len(revobj.parent_ids))
> + previd = revobj.parent_ids[0]
> It seems a bit odd that the code uses the parent of the revision that was
> specified, rather than that revision itself.

I wrestled with that a little. If the baseline argument were to take a single revision as an argument, I would agree that that is what should be used. Instead, I required the user to specify a revisionspec, which indicates the commits which they'd like to preserve. If I did not use the parent revision here, then they'd lose the first diff they wanted to be able to see. That might be confusing, though, and I'm not set on this behavior.

Do you think that it would be better to have the baseline option take a revisionspec directly, rather than requiring -r along with this option? (If I continue to require -r, I guess I need to writing some code to enforce that requirement.)

> Can you add some tests for this new behaviour?

Sure, I haven't looked at the test suite yet, but I'll give it a try.

> Cheers,
>
> Jelmer

Revision history for this message
Jelmer Vernooij (jelmer) wrote :

Am 22/11/11 16:08, schrieb Andy Grimm:
>> Hi Andy,
>>
>> Thanks again for working on this.
>>
>> Generally, we try to put commands that modify fastexport streams into "bzr
>> fast-import-filter", rather than "bzr fast-export". In this case, I guess it
>> also skips
> If you think that having it in fast-import-filter is useful, I can do that. Will that change the logic about how the baseline is produced? Also, was your last sentence cut off here? I didn't understand it as written (skips what?)
Sorry, I meant to say that using a baseline can skip processing of a lot
of revisions. If you have a 20k revision history, and specify a baseline
of r19000 then there are a lot of revisions you don't have to export. If
this functionality is added to fast-import-filter, then you would still
export all 20k revisions but then throw 19k of them away.

In other words, I think it's fine to do this in fast-export. But I
wanted to emphasize that generally we add history-modifying commands to
fast-import-filter.
>
>> + help="Export an 'abolute' baseline commit prior to"
>> This should probably say "absolute".
> Indeed.
>
>> + assert(len(revobj.parent_ids))
>> + previd = revobj.parent_ids[0]
>> It seems a bit odd that the code uses the parent of the revision that was
>> specified, rather than that revision itself.
> I wrestled with that a little. If the baseline argument were to take a single revision as an argument, I would agree that that is what should be used. Instead, I required the user to specify a revisionspec, which indicates the commits which they'd like to preserve. If I did not use the parent revision here, then they'd lose the first diff they wanted to be able to see. That might be confusing, though, and I'm not set on this behavior.
>
> Do you think that it would be better to have the baseline option take a revisionspec directly, rather than requiring -r along with this option? (If I continue to require -r, I guess I need to writing some code to enforce that requirement.)
-r specifies a revision - not a delta between revisions, so I do think
--baseline should only include the revision that was specified and not
its left hand side parent. For deltas (changes) we usually use the -c
option.
>
>> Can you add some tests for this new behaviour?
> Sure, I haven't looked at the test suite yet, but I'll give it a try.
Cool. You know where to find me on IRC if there's anything I can help with.

Cheers,

Jelmer

Revision history for this message
Andy Grimm (agrimm) wrote :

> >> + assert(len(revobj.parent_ids))
> >> + previd = revobj.parent_ids[0]
> >> It seems a bit odd that the code uses the parent of the revision that was
> >> specified, rather than that revision itself.
> > I wrestled with that a little. If the baseline argument were to take a
> single revision as an argument, I would agree that that is what should be
> used. Instead, I required the user to specify a revisionspec, which indicates
> the commits which they'd like to preserve. If I did not use the parent
> revision here, then they'd lose the first diff they wanted to be able to see.
> That might be confusing, though, and I'm not set on this behavior.
> >
> > Do you think that it would be better to have the baseline option take a
> revisionspec directly, rather than requiring -r along with this option? (If I
> continue to require -r, I guess I need to writing some code to enforce that
> requirement.)
> -r specifies a revision - not a delta between revisions, so I do think
> --baseline should only include the revision that was specified and not
> its left hand side parent. For deltas (changes) we usually use the -c
> option.

Oh, I see your point here. when someone gives the command "bzr diff -r 4..6" for example, 4 is the baseline, and the diff shown is the delta between 4 and 6, so it makes sense to make the baseline the state at the first revision given. Thanks, I'll make that change.

Revision history for this message
Andy Grimm (agrimm) wrote :

> > >> + assert(len(revobj.parent_ids))
> > >> + previd = revobj.parent_ids[0]
> > >> It seems a bit odd that the code uses the parent of the revision that was
> > >> specified, rather than that revision itself.
> > > I wrestled with that a little. If the baseline argument were to take a
> > single revision as an argument, I would agree that that is what should be
> > used. Instead, I required the user to specify a revisionspec, which
> indicates
> > the commits which they'd like to preserve. If I did not use the parent
> > revision here, then they'd lose the first diff they wanted to be able to
> see.
> > That might be confusing, though, and I'm not set on this behavior.
> > >
> > > Do you think that it would be better to have the baseline option take a
> > revisionspec directly, rather than requiring -r along with this option? (If
> I
> > continue to require -r, I guess I need to writing some code to enforce that
> > requirement.)
> > -r specifies a revision - not a delta between revisions, so I do think
> > --baseline should only include the revision that was specified and not
> > its left hand side parent. For deltas (changes) we usually use the -c
> > option.
>
> Oh, I see your point here. when someone gives the command "bzr diff -r 4..6"
> for example, 4 is the baseline, and the diff shown is the delta between 4 and
> 6, so it makes sense to make the baseline the state at the first revision
> given. Thanks, I'll make that change.

Upon writing the test, I realized that my original code was using parent there because the starting revision was already being excluded, so getting the parent of the first revision in "interesting" was actually just putting back the starting revision. I've changed how this works now and added a test, which should clear up any confusion about how this behaves.

Revision history for this message
Jelmer Vernooij (jelmer) :
review: Approve
Revision history for this message
Jelmer Vernooij (jelmer) wrote :

Merged, thanks !

340. By Jelmer Vernooij

Merge support for --baseline option to 'bzr export'.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'cmds.py'
--- cmds.py 2011-10-25 23:21:02 +0000
+++ cmds.py 2011-11-22 20:40:47 +0000
@@ -606,6 +606,16 @@
606 unsupported in git with an underscore instead, specify606 unsupported in git with an underscore instead, specify
607 --rewrite-tag-names.607 --rewrite-tag-names.
608608
609 :History truncation:
610
611 When code has been significantly refactored over time (e.g., to separate
612 proprietary code from open source code), it is sometimes convenient to
613 simply truncate the revision history at a certain point. The --baseline
614 option, to be used in conjunction with -r, emits a baseline commit
615 containing the state of the entire source tree at the first requested
616 revision. This allows a user to produce a tree identical to the original
617 without munging multiple exports.
618
609 :Examples:619 :Examples:
610620
611 To produce data destined for import into Bazaar::621 To produce data destined for import into Bazaar::
@@ -656,12 +666,16 @@
656 help="Replace characters invalid in git with '_'"666 help="Replace characters invalid in git with '_'"
657 " (plain mode only).",667 " (plain mode only).",
658 ),668 ),
669 Option('baseline',
670 help="Export an 'absolute' baseline commit prior to"
671 "the first relative commit",
672 ),
659 ]673 ]
660 encoding_type = 'exact'674 encoding_type = 'exact'
661 def run(self, source, destination=None, verbose=False,675 def run(self, source, destination=None, verbose=False,
662 git_branch="master", checkpoint=10000, marks=None,676 git_branch="master", checkpoint=10000, marks=None,
663 import_marks=None, export_marks=None, revision=None,677 import_marks=None, export_marks=None, revision=None,
664 plain=True, rewrite_tag_names=False):678 plain=True, rewrite_tag_names=False, baseline=False):
665 load_fastimport()679 load_fastimport()
666 from bzrlib.branch import Branch680 from bzrlib.branch import Branch
667 from bzrlib.plugins.fastimport import exporter681 from bzrlib.plugins.fastimport import exporter
@@ -676,7 +690,7 @@
676 outf=outf, git_branch=git_branch, checkpoint=checkpoint,690 outf=outf, git_branch=git_branch, checkpoint=checkpoint,
677 import_marks_file=import_marks, export_marks_file=export_marks,691 import_marks_file=import_marks, export_marks_file=export_marks,
678 revision=revision, verbose=verbose, plain_format=plain,692 revision=revision, verbose=verbose, plain_format=plain,
679 rewrite_tags=rewrite_tag_names)693 rewrite_tags=rewrite_tag_names, baseline=baseline)
680 return exporter.run()694 return exporter.run()
681695
682696
683697
=== modified file 'exporter.py'
--- exporter.py 2011-10-25 23:21:02 +0000
+++ exporter.py 2011-11-22 20:40:47 +0000
@@ -148,7 +148,8 @@
148148
149 def __init__(self, source, outf, git_branch=None, checkpoint=-1,149 def __init__(self, source, outf, git_branch=None, checkpoint=-1,
150 import_marks_file=None, export_marks_file=None, revision=None,150 import_marks_file=None, export_marks_file=None, revision=None,
151 verbose=False, plain_format=False, rewrite_tags=False):151 verbose=False, plain_format=False, rewrite_tags=False,
152 baseline=False):
152 """Export branch data in fast import format.153 """Export branch data in fast import format.
153154
154 :param plain_format: if True, 'classic' fast-import format is155 :param plain_format: if True, 'classic' fast-import format is
@@ -170,6 +171,7 @@
170 self.excluded_revisions = set()171 self.excluded_revisions = set()
171 self.plain_format = plain_format172 self.plain_format = plain_format
172 self.rewrite_tags = rewrite_tags173 self.rewrite_tags = rewrite_tags
174 self.baseline = baseline
173 self._multi_author_api_available = hasattr(bzrlib.revision.Revision,175 self._multi_author_api_available = hasattr(bzrlib.revision.Revision,
174 'get_apparent_authors')176 'get_apparent_authors')
175 self.properties_to_exclude = ['authors', 'author']177 self.properties_to_exclude = ['authors', 'author']
@@ -213,6 +215,10 @@
213 self.note("Calculating the revisions to exclude ...")215 self.note("Calculating the revisions to exclude ...")
214 self.excluded_revisions = set([rev_id for rev_id, _, _, _ in216 self.excluded_revisions = set([rev_id for rev_id, _, _, _ in
215 self.branch.iter_merge_sorted_revisions(start_rev_id)])217 self.branch.iter_merge_sorted_revisions(start_rev_id)])
218 if self.baseline:
219 # needed so the first relative commit knows its parent
220 self.excluded_revisions.remove(start_rev_id)
221 view_revisions.insert(0, start_rev_id)
216 return list(view_revisions)222 return list(view_revisions)
217223
218 def run(self):224 def run(self):
@@ -225,6 +231,8 @@
225 self._commit_total)231 self._commit_total)
226 if not self.plain_format:232 if not self.plain_format:
227 self.emit_features()233 self.emit_features()
234 if self.baseline:
235 self.emit_baseline(interesting.pop(0), self.git_branch)
228 for revid in interesting:236 for revid in interesting:
229 self.emit_commit(revid, self.git_branch)237 self.emit_commit(revid, self.git_branch)
230 if self.branch.supports_tags():238 if self.branch.supports_tags():
@@ -302,6 +310,16 @@
302 for feature in sorted(commands.FEATURE_NAMES):310 for feature in sorted(commands.FEATURE_NAMES):
303 self.print_cmd(commands.FeatureCommand(feature))311 self.print_cmd(commands.FeatureCommand(feature))
304312
313 def emit_baseline(self, revid, git_branch):
314 # Emit a full source tree of the first commit's parent
315 git_ref = 'refs/heads/%s' % (git_branch,)
316 revobj = self.branch.repository.get_revision(revid)
317 mark = 1
318 self.revid_to_mark[revid] = mark
319 file_cmds = self._get_filecommands(bzrlib.revision.NULL_REVISION, revid)
320 self.print_cmd(self._get_commit_command(git_ref, mark, revobj,
321 file_cmds))
322
305 def emit_commit(self, revid, git_branch):323 def emit_commit(self, revid, git_branch):
306 if revid in self.revid_to_mark or revid in self.excluded_revisions:324 if revid in self.revid_to_mark or revid in self.excluded_revisions:
307 return325 return
308326
=== modified file 'tests/test_commands.py'
--- tests/test_commands.py 2011-10-17 10:25:55 +0000
+++ tests/test_commands.py 2011-11-22 20:40:47 +0000
@@ -16,6 +16,7 @@
16"""Test the command implementations."""16"""Test the command implementations."""
1717
18import os18import os
19import re
19import tempfile20import tempfile
20import gzip21import gzip
2122
@@ -58,6 +59,40 @@
58 self.assertIsNot("bla", stream.read())59 self.assertIsNot("bla", stream.read())
5960
6061
62fast_export_baseline_data = """commit refs/heads/master
63mark :1
64committer
65data 15
66add c, remove b
67M 644 inline a
68data 13
69test 1
70test 3
71M 644 inline c
72data 6
73test 4
74commit refs/heads/master
75mark :2
76committer
77data 14
78modify a again
79from :1
80M 644 inline a
81data 20
82test 1
83test 3
84test 5
85commit refs/heads/master
86mark :3
87committer
88data 5
89add d
90from :2
91M 644 inline d
92data 6
93test 6
94"""
95
61class TestFastExport(ExternalBase):96class TestFastExport(ExternalBase):
6297
63 def test_empty(self):98 def test_empty(self):
@@ -99,6 +134,45 @@
99 # "bad Tag" should be exported as bad_Tag134 # "bad Tag" should be exported as bad_Tag
100 self.assertNotEqual(-1, data.find("reset refs/tags/bad_Tag"))135 self.assertNotEqual(-1, data.find("reset refs/tags/bad_Tag"))
101136
137 def test_baseline_option(self):
138 tree = self.make_branch_and_tree("bl")
139
140 # Revision 1
141 file('bl/a', 'w').write('test 1')
142 tree.add('a')
143 tree.commit(message='add a')
144
145 # Revision 2
146 file('bl/b', 'w').write('test 2')
147 file('bl/a', 'a').write('\ntest 3')
148 tree.add('b')
149 tree.commit(message='add b, modify a')
150
151 # Revision 3
152 file('bl/c', 'w').write('test 4')
153 tree.add('c')
154 tree.remove('b')
155 tree.commit(message='add c, remove b')
156
157 # Revision 4
158 file('bl/a', 'a').write('\ntest 5')
159 tree.commit(message='modify a again')
160
161 # Revision 5
162 file('bl/d', 'w').write('test 6')
163 tree.add('d')
164 tree.commit(message='add d')
165
166 # This exports the baseline state at Revision 3,
167 # followed by the deltas for 4 and 5
168 data = self.run_bzr("fast-export --baseline -r 3.. bl")[0]
169 data = re.sub('committer.*', 'committer', data)
170 self.assertEquals(fast_export_baseline_data, data)
171
172 # Also confirm that --baseline with no args is identical to full export
173 data1 = self.run_bzr("fast-export --baseline bl")[0]
174 data2 = self.run_bzr("fast-export bl")[0]
175 self.assertEquals(data1, data2)
102176
103simple_fast_import_stream = """commit refs/heads/master177simple_fast_import_stream = """commit refs/heads/master
104mark :1178mark :1

Subscribers

People subscribed via source and target branches