Merge lp:~agrimm/bzr-fastimport/baseline-commit into lp:bzr-fastimport

Proposed by Andy Grimm on 2011-11-22
Status: Merged
Approved by: Jelmer Vernooij on 2011-11-23
Approved revision: no longer in the source branch.
Merged at revision: 340
Proposed branch: lp:~agrimm/bzr-fastimport/baseline-commit
Merge into: lp:bzr-fastimport
Diff against target: 205 lines (+109/-3)
3 files modified
cmds.py (+16/-2)
exporter.py (+19/-1)
tests/test_commands.py (+74/-0)
To merge this branch: bzr merge lp:~agrimm/bzr-fastimport/baseline-commit
Reviewer Review Type Date Requested Status
Jelmer Vernooij (community) 2011-11-22 Approve on 2011-11-23
Review via email: mp+83025@code.launchpad.net

Description of the Change

Adds a --baseline option to fast-export to produce an export with a truncated history (a "baseline" plus all commits in the range given by the revisionspec).

To post a comment you must log in.
Jelmer Vernooij (jelmer) wrote :

Hi Andy,

Thanks again for working on this.

Generally, we try to put commands that modify fastexport streams into "bzr fast-import-filter", rather than "bzr fast-export". In this case, I guess it also skips

+ help="Export an 'abolute' baseline commit prior to"
This should probably say "absolute".

+ assert(len(revobj.parent_ids))
+ previd = revobj.parent_ids[0]
It seems a bit odd that the code uses the parent of the revision that was specified, rather than that revision itself.

Can you add some tests for this new behaviour?

Cheers,

Jelmer

Andy Grimm (agrimm) wrote :

> Hi Andy,
>
> Thanks again for working on this.
>
> Generally, we try to put commands that modify fastexport streams into "bzr
> fast-import-filter", rather than "bzr fast-export". In this case, I guess it
> also skips

If you think that having it in fast-import-filter is useful, I can do that. Will that change the logic about how the baseline is produced? Also, was your last sentence cut off here? I didn't understand it as written (skips what?)

> + help="Export an 'abolute' baseline commit prior to"
> This should probably say "absolute".

Indeed.

> + assert(len(revobj.parent_ids))
> + previd = revobj.parent_ids[0]
> It seems a bit odd that the code uses the parent of the revision that was
> specified, rather than that revision itself.

I wrestled with that a little. If the baseline argument were to take a single revision as an argument, I would agree that that is what should be used. Instead, I required the user to specify a revisionspec, which indicates the commits which they'd like to preserve. If I did not use the parent revision here, then they'd lose the first diff they wanted to be able to see. That might be confusing, though, and I'm not set on this behavior.

Do you think that it would be better to have the baseline option take a revisionspec directly, rather than requiring -r along with this option? (If I continue to require -r, I guess I need to writing some code to enforce that requirement.)

> Can you add some tests for this new behaviour?

Sure, I haven't looked at the test suite yet, but I'll give it a try.

> Cheers,
>
> Jelmer

Jelmer Vernooij (jelmer) wrote :

Am 22/11/11 16:08, schrieb Andy Grimm:
>> Hi Andy,
>>
>> Thanks again for working on this.
>>
>> Generally, we try to put commands that modify fastexport streams into "bzr
>> fast-import-filter", rather than "bzr fast-export". In this case, I guess it
>> also skips
> If you think that having it in fast-import-filter is useful, I can do that. Will that change the logic about how the baseline is produced? Also, was your last sentence cut off here? I didn't understand it as written (skips what?)
Sorry, I meant to say that using a baseline can skip processing of a lot
of revisions. If you have a 20k revision history, and specify a baseline
of r19000 then there are a lot of revisions you don't have to export. If
this functionality is added to fast-import-filter, then you would still
export all 20k revisions but then throw 19k of them away.

In other words, I think it's fine to do this in fast-export. But I
wanted to emphasize that generally we add history-modifying commands to
fast-import-filter.
>
>> + help="Export an 'abolute' baseline commit prior to"
>> This should probably say "absolute".
> Indeed.
>
>> + assert(len(revobj.parent_ids))
>> + previd = revobj.parent_ids[0]
>> It seems a bit odd that the code uses the parent of the revision that was
>> specified, rather than that revision itself.
> I wrestled with that a little. If the baseline argument were to take a single revision as an argument, I would agree that that is what should be used. Instead, I required the user to specify a revisionspec, which indicates the commits which they'd like to preserve. If I did not use the parent revision here, then they'd lose the first diff they wanted to be able to see. That might be confusing, though, and I'm not set on this behavior.
>
> Do you think that it would be better to have the baseline option take a revisionspec directly, rather than requiring -r along with this option? (If I continue to require -r, I guess I need to writing some code to enforce that requirement.)
-r specifies a revision - not a delta between revisions, so I do think
--baseline should only include the revision that was specified and not
its left hand side parent. For deltas (changes) we usually use the -c
option.
>
>> Can you add some tests for this new behaviour?
> Sure, I haven't looked at the test suite yet, but I'll give it a try.
Cool. You know where to find me on IRC if there's anything I can help with.

Cheers,

Jelmer

Andy Grimm (agrimm) wrote :

> >> + assert(len(revobj.parent_ids))
> >> + previd = revobj.parent_ids[0]
> >> It seems a bit odd that the code uses the parent of the revision that was
> >> specified, rather than that revision itself.
> > I wrestled with that a little. If the baseline argument were to take a
> single revision as an argument, I would agree that that is what should be
> used. Instead, I required the user to specify a revisionspec, which indicates
> the commits which they'd like to preserve. If I did not use the parent
> revision here, then they'd lose the first diff they wanted to be able to see.
> That might be confusing, though, and I'm not set on this behavior.
> >
> > Do you think that it would be better to have the baseline option take a
> revisionspec directly, rather than requiring -r along with this option? (If I
> continue to require -r, I guess I need to writing some code to enforce that
> requirement.)
> -r specifies a revision - not a delta between revisions, so I do think
> --baseline should only include the revision that was specified and not
> its left hand side parent. For deltas (changes) we usually use the -c
> option.

Oh, I see your point here. when someone gives the command "bzr diff -r 4..6" for example, 4 is the baseline, and the diff shown is the delta between 4 and 6, so it makes sense to make the baseline the state at the first revision given. Thanks, I'll make that change.

Andy Grimm (agrimm) wrote :

> > >> + assert(len(revobj.parent_ids))
> > >> + previd = revobj.parent_ids[0]
> > >> It seems a bit odd that the code uses the parent of the revision that was
> > >> specified, rather than that revision itself.
> > > I wrestled with that a little. If the baseline argument were to take a
> > single revision as an argument, I would agree that that is what should be
> > used. Instead, I required the user to specify a revisionspec, which
> indicates
> > the commits which they'd like to preserve. If I did not use the parent
> > revision here, then they'd lose the first diff they wanted to be able to
> see.
> > That might be confusing, though, and I'm not set on this behavior.
> > >
> > > Do you think that it would be better to have the baseline option take a
> > revisionspec directly, rather than requiring -r along with this option? (If
> I
> > continue to require -r, I guess I need to writing some code to enforce that
> > requirement.)
> > -r specifies a revision - not a delta between revisions, so I do think
> > --baseline should only include the revision that was specified and not
> > its left hand side parent. For deltas (changes) we usually use the -c
> > option.
>
> Oh, I see your point here. when someone gives the command "bzr diff -r 4..6"
> for example, 4 is the baseline, and the diff shown is the delta between 4 and
> 6, so it makes sense to make the baseline the state at the first revision
> given. Thanks, I'll make that change.

Upon writing the test, I realized that my original code was using parent there because the starting revision was already being excluded, so getting the parent of the first revision in "interesting" was actually just putting back the starting revision. I've changed how this works now and added a test, which should clear up any confusion about how this behaves.

Jelmer Vernooij (jelmer) :
review: Approve
Jelmer Vernooij (jelmer) wrote :

Merged, thanks !

340. By Jelmer Vernooij on 2011-11-23

Merge support for --baseline option to 'bzr export'.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'cmds.py'
2--- cmds.py 2011-10-25 23:21:02 +0000
3+++ cmds.py 2011-11-22 20:40:47 +0000
4@@ -606,6 +606,16 @@
5 unsupported in git with an underscore instead, specify
6 --rewrite-tag-names.
7
8+ :History truncation:
9+
10+ When code has been significantly refactored over time (e.g., to separate
11+ proprietary code from open source code), it is sometimes convenient to
12+ simply truncate the revision history at a certain point. The --baseline
13+ option, to be used in conjunction with -r, emits a baseline commit
14+ containing the state of the entire source tree at the first requested
15+ revision. This allows a user to produce a tree identical to the original
16+ without munging multiple exports.
17+
18 :Examples:
19
20 To produce data destined for import into Bazaar::
21@@ -656,12 +666,16 @@
22 help="Replace characters invalid in git with '_'"
23 " (plain mode only).",
24 ),
25+ Option('baseline',
26+ help="Export an 'absolute' baseline commit prior to"
27+ "the first relative commit",
28+ ),
29 ]
30 encoding_type = 'exact'
31 def run(self, source, destination=None, verbose=False,
32 git_branch="master", checkpoint=10000, marks=None,
33 import_marks=None, export_marks=None, revision=None,
34- plain=True, rewrite_tag_names=False):
35+ plain=True, rewrite_tag_names=False, baseline=False):
36 load_fastimport()
37 from bzrlib.branch import Branch
38 from bzrlib.plugins.fastimport import exporter
39@@ -676,7 +690,7 @@
40 outf=outf, git_branch=git_branch, checkpoint=checkpoint,
41 import_marks_file=import_marks, export_marks_file=export_marks,
42 revision=revision, verbose=verbose, plain_format=plain,
43- rewrite_tags=rewrite_tag_names)
44+ rewrite_tags=rewrite_tag_names, baseline=baseline)
45 return exporter.run()
46
47
48
49=== modified file 'exporter.py'
50--- exporter.py 2011-10-25 23:21:02 +0000
51+++ exporter.py 2011-11-22 20:40:47 +0000
52@@ -148,7 +148,8 @@
53
54 def __init__(self, source, outf, git_branch=None, checkpoint=-1,
55 import_marks_file=None, export_marks_file=None, revision=None,
56- verbose=False, plain_format=False, rewrite_tags=False):
57+ verbose=False, plain_format=False, rewrite_tags=False,
58+ baseline=False):
59 """Export branch data in fast import format.
60
61 :param plain_format: if True, 'classic' fast-import format is
62@@ -170,6 +171,7 @@
63 self.excluded_revisions = set()
64 self.plain_format = plain_format
65 self.rewrite_tags = rewrite_tags
66+ self.baseline = baseline
67 self._multi_author_api_available = hasattr(bzrlib.revision.Revision,
68 'get_apparent_authors')
69 self.properties_to_exclude = ['authors', 'author']
70@@ -213,6 +215,10 @@
71 self.note("Calculating the revisions to exclude ...")
72 self.excluded_revisions = set([rev_id for rev_id, _, _, _ in
73 self.branch.iter_merge_sorted_revisions(start_rev_id)])
74+ if self.baseline:
75+ # needed so the first relative commit knows its parent
76+ self.excluded_revisions.remove(start_rev_id)
77+ view_revisions.insert(0, start_rev_id)
78 return list(view_revisions)
79
80 def run(self):
81@@ -225,6 +231,8 @@
82 self._commit_total)
83 if not self.plain_format:
84 self.emit_features()
85+ if self.baseline:
86+ self.emit_baseline(interesting.pop(0), self.git_branch)
87 for revid in interesting:
88 self.emit_commit(revid, self.git_branch)
89 if self.branch.supports_tags():
90@@ -302,6 +310,16 @@
91 for feature in sorted(commands.FEATURE_NAMES):
92 self.print_cmd(commands.FeatureCommand(feature))
93
94+ def emit_baseline(self, revid, git_branch):
95+ # Emit a full source tree of the first commit's parent
96+ git_ref = 'refs/heads/%s' % (git_branch,)
97+ revobj = self.branch.repository.get_revision(revid)
98+ mark = 1
99+ self.revid_to_mark[revid] = mark
100+ file_cmds = self._get_filecommands(bzrlib.revision.NULL_REVISION, revid)
101+ self.print_cmd(self._get_commit_command(git_ref, mark, revobj,
102+ file_cmds))
103+
104 def emit_commit(self, revid, git_branch):
105 if revid in self.revid_to_mark or revid in self.excluded_revisions:
106 return
107
108=== modified file 'tests/test_commands.py'
109--- tests/test_commands.py 2011-10-17 10:25:55 +0000
110+++ tests/test_commands.py 2011-11-22 20:40:47 +0000
111@@ -16,6 +16,7 @@
112 """Test the command implementations."""
113
114 import os
115+import re
116 import tempfile
117 import gzip
118
119@@ -58,6 +59,40 @@
120 self.assertIsNot("bla", stream.read())
121
122
123+fast_export_baseline_data = """commit refs/heads/master
124+mark :1
125+committer
126+data 15
127+add c, remove b
128+M 644 inline a
129+data 13
130+test 1
131+test 3
132+M 644 inline c
133+data 6
134+test 4
135+commit refs/heads/master
136+mark :2
137+committer
138+data 14
139+modify a again
140+from :1
141+M 644 inline a
142+data 20
143+test 1
144+test 3
145+test 5
146+commit refs/heads/master
147+mark :3
148+committer
149+data 5
150+add d
151+from :2
152+M 644 inline d
153+data 6
154+test 6
155+"""
156+
157 class TestFastExport(ExternalBase):
158
159 def test_empty(self):
160@@ -99,6 +134,45 @@
161 # "bad Tag" should be exported as bad_Tag
162 self.assertNotEqual(-1, data.find("reset refs/tags/bad_Tag"))
163
164+ def test_baseline_option(self):
165+ tree = self.make_branch_and_tree("bl")
166+
167+ # Revision 1
168+ file('bl/a', 'w').write('test 1')
169+ tree.add('a')
170+ tree.commit(message='add a')
171+
172+ # Revision 2
173+ file('bl/b', 'w').write('test 2')
174+ file('bl/a', 'a').write('\ntest 3')
175+ tree.add('b')
176+ tree.commit(message='add b, modify a')
177+
178+ # Revision 3
179+ file('bl/c', 'w').write('test 4')
180+ tree.add('c')
181+ tree.remove('b')
182+ tree.commit(message='add c, remove b')
183+
184+ # Revision 4
185+ file('bl/a', 'a').write('\ntest 5')
186+ tree.commit(message='modify a again')
187+
188+ # Revision 5
189+ file('bl/d', 'w').write('test 6')
190+ tree.add('d')
191+ tree.commit(message='add d')
192+
193+ # This exports the baseline state at Revision 3,
194+ # followed by the deltas for 4 and 5
195+ data = self.run_bzr("fast-export --baseline -r 3.. bl")[0]
196+ data = re.sub('committer.*', 'committer', data)
197+ self.assertEquals(fast_export_baseline_data, data)
198+
199+ # Also confirm that --baseline with no args is identical to full export
200+ data1 = self.run_bzr("fast-export --baseline bl")[0]
201+ data2 = self.run_bzr("fast-export bl")[0]
202+ self.assertEquals(data1, data2)
203
204 simple_fast_import_stream = """commit refs/heads/master
205 mark :1

Subscribers

People subscribed via source and target branches