Merge lp:~termie/bzr-fastimport/marks_normalization into lp:bzr-fastimport

Proposed by termie
Status: Merged
Merged at revision: 305
Proposed branch: lp:~termie/bzr-fastimport/marks_normalization
Merge into: lp:bzr-fastimport
Diff against target: 61 lines (+6/-6)
4 files modified
branch_updater.py (+1/-0)
bzr_commit_handler.py (+1/-1)
marks_file.py (+2/-3)
processors/generic_processor.py (+2/-2)
To merge this branch: bzr merge lp:~termie/bzr-fastimport/marks_normalization
Reviewer Review Type Date Requested Status
Jelmer Vernooij Pending
Ian Clatworthy Pending
Gyorgy Kovacs Pending
Bazaar Developers Pending
Review via email: mp+47944@code.launchpad.net

Description of the change

Add a bunch of mark id normalization.

bzr-fastimport is getting marks from a variety of sources, all of which
us the format ':\d+' but it really wants just the raw number internally.

This patch cleans up the symptoms of many places where this normalization
was not occuring.

Testing has been based on using git-bzr-ng (http://github.com/termie/git-bzr-ng).

I would love to get this merged soon so that my tool does not need to depend on a patched version of your library :D

To post a comment you must log in.
Revision history for this message
Martin Pool (mbp) wrote :

Hi, thanks for the patch.

It doesn't look wrong, but I'm not familiar enough with the code to say it's obviously right. Are there already tests for this, or could you add some?

Revision history for this message
termie (termie) wrote :

The existing tests for bzr-fastimport don't run at all on my system (I tried using "bzr selftest fastimport" as the documentation described), the tests I have run have been the tests for my own tool (git-bzr-ng) that depends on this file. Without this patch attempting to move between git and bzr fails in a variety of places.

The main issue is that bzr-fastimport gets marks from:

a) the marks files (marks_files.py)
b) the mark id of the fastimport data (generic_processor.py)
c) the branch tip feature of python-fastimport (branch_updater.py)
d) the parents/heads feature of python-fastimport (bzr_commit_handler.py)

All 4 of those provide bzr-fastimport with data in the format ':\d+', the only ones that bzr-fastimport normalizes are the ones from from (a), however the normalization being done is naive and simply removes a single ':' from the beginning of the mark id, which would probably work fine if normalization wasn't for missing (b), (c) and (d).

This patch extends the normalization to (b), (c) and (d) and makes the normalization a little more forgiving than the naive approach.

It is still realistically a cure for the symptoms rather than the cause, but without being able to run the bzr-fastimport tests (and without being a core dev of the project) it seems a little inefficient to attempt to refactor the project's use of mark ids, the project clearly wants them to be integers in a variety of places.

Revision history for this message
termie (termie) wrote :

Re: the tests not running, they seem to attempt to make use of a bunch of paths that don't exist, I'm guessing the tests are configured to run on a specific kind of system

Revision history for this message
Martin Pool (mbp) wrote :

On 31 January 2011 18:44, termie <email address hidden> wrote:
> Re: the tests not running, they seem to attempt to make use of a bunch of paths that don't exist, I'm guessing the tests are configured to run on a specific kind of system

Most of the bzr tests are isolated so that they can run anywhere.
With bzr, bzr-fastimport and python-fastimport installed I can do

  bzr selftest -s bp.fastimport

and I get 304 passes and no failures.

Revision history for this message
Martin Pool (mbp) wrote :

Basically, I guess basically I'd like to know the failure that motivated you to write this in the first place, and if possible to then turn it into a test.

Revision history for this message
Martin Pool (mbp) wrote :

I ran the tests in your branch and they all passed. So we haven't gone backwards. If it's possible to easily add some tests for what was broken, that would be nice. Otherwise, let's just merge it.

Revision history for this message
Jelmer Vernooij (jelmer) wrote :

On Mon, Jan 31, 2011 at 08:39:52AM -0000, Martin Pool wrote:
> I ran the tests in your branch and they all passed. So we haven't gone backwards. If it's possible to easily add some tests for what was broken, that would be nice. Otherwise, let's just merge it.
+1

It'd be nice to have a single function that can resolve marks (and
just does .lstrip(":") for now, but perhaps more later).

It also seems reasonable enough to land as-is, though.

Cheers,

Jelmer

Revision history for this message
termie (termie) wrote :

I was previously running `bzr selftest fastimport` as it that is what is described in the README, `bzr selftest -s bp.fastimport` does work for me.

Looking through your tests I don't see any that are very similar in form to what I would expect to use for testing this (at least as I imagine it) so I don't think adding them would be very easy for me.

My test would be to construct a bzr repo, and attempt to parse a stream built by git fast-export using bzr fast-import with a given --import-marks marks file. This is how git-bzr-ng is currently testing it (you are welcome to look at the tests there and massage some of them into your framework), if you want to avoid the dependency on Git I can also provide you with the example fast-export output and marks files, but I do hope that this patch's acceptance isn't dependent on those tests being added to this repo.

Revision history for this message
Martin Pool (mbp) wrote :

I think a small example input would be nice; we can then just paste
that into the tests and check that they handle it correctly.

Revision history for this message
termie (termie) wrote :

It is not quite that simple, as there are 4 distinct bugs being resolved here that don't all get triggered through the same code paths. Anyway, it fails on pretty much any trivial import from git (i don't see any tests actually testing the functionality of the code anywhere :/), here is a shell script that demonstrates the failure. Also probably requires bzr-2.2 because there is a bug in bzr-2.3 that will produce an exception unrelated to the bug in bzr-fastimport.

#!/bin/bash

TESTDIR=/tmp/git-bzr-test
BZRBRANCHNAME=bzrtest
BZRBRANCH=$TESTDIR/$BZRBRANCHNAME

if [ -d "$TESTDIR" ];
then
  rm -rf $TESTDIR
fi

mkdir $TESTDIR
cd $TESTDIR

# Make a bzr branch to interact with
bzr init $BZRBRANCH
cd $BZRBRANCH
echo "touch" > touch.txt
bzr add touch.txt
bzr commit -m "touch test"
echo "touch2" > touch2.txt
bzr add touch2.txt
bzr commit -m "touch2 test"

# Make the git repo and import the bzr repo
git init ${BZRBRANCH}_git
cd ${BZRBRANCH}_git
mkdir -p ${BZRBRANCH}_git/.git/bzr/map
bzr init-repo --no-trees ${BZRBRANCH}_git/.git/bzr/repo
bzr branch ${BZRBRANCH} ${BZRBRANCH}_git/.git/bzr/repo/master
bzr fast-export --plain \
                --export-marks=${BZRBRANCH}_git/.git/bzr/map/master-bzr \
                --git-branch=bzr/master ${BZRBRANCH}_git/.git/bzr/repo/master | \
  git fast-import --quiet --export-marks=${BZRBRANCH}_git/.git/bzr/map/master-git

# -- captured output (bzr fast-export) --
#commit refs/heads/bzr/master
#mark :1
#committer termie <email address hidden> 1296532203 -0800
#data 10
#touch test
#M 644 inline touch.txt
#data 6
#touch

#commit refs/heads/bzr/master
#mark :2
#committer termie <email address hidden> 1296532203 -0800
#data 11
#touch2 test
#from :1
#M 644 inline touch2.txt
#data 7
#touch2

# -- .git/bzr/map/master-bzr --
#:1 <email address hidden>
#:2 <email address hidden>

git branch master bzr/master
git checkout master

# Make some changes and push them back
git checkout -b pushed
echo 'touch3' > touch2.txt
git add touch2.txt
git commit -m 'touch3 test'
bzr branch ${BZRBRANCH}_git/.git/bzr/repo/master ${BZRBRANCH}_git/.git/bzr/repo/pushed
git fast-export --import-marks=${BZRBRANCH}_git/.git/bzr/map/master-git \
                --export-marks=${BZRBRANCH}_git/.git/bzr/map/pushed-git \
                pushed | \
    bzr fast-import --import-marks=${BZRBRANCH}_git/.git/bzr/map/master-bzr \
                    --export-marks=${BZRBRANCH}_git/.git/bzr/map/pushed-bzr \
                    - \
                    ${BZRBRANCH}_git/.git/bzr/repo/pushed

# -- captured output (git fast-export) --
#blob
#mark :3
#data 7
#touch3

#commit refs/heads/pushed
#mark :4
#author termie <email address hidden> 1296532204 -0800
#committer termie <email address hidden> 1296532204 -0800
#data 12
#touch3 test
#from :2
#M 100644 :3 touch2.txt

Revision history for this message
termie (termie) wrote :

This, btw, is only demonstrating the first of the failures. To test all of them I suggest you refer to the full tests in git-bzr-ng

Revision history for this message
Jelmer Vernooij (jelmer) wrote :

On Sun, Jan 30, 2011 at 10:52:15PM -0000, termie wrote:
> termie has proposed merging lp:~termie/bzr-fastimport/marks_normalization into lp:bzr-fastimport.
>
> Requested reviews:
> Bazaar Developers (bzr)
>
> For more details, see:
> https://code.launchpad.net/~termie/bzr-fastimport/marks_normalization/+merge/47944
>
> Add a bunch of mark id normalization.
>
> bzr-fastimport is getting marks from a variety of sources, all of which
> us the format ':\d+' but it really wants just the raw number internally.
>
> This patch cleans up the symptoms of many places where this normalization
> was not occuring.
I've pushed a new branch based on this one to
lp:~jelmer/bzr-fastimport/marks-normalization, which centralizes the dealing
with marks and allows for easier support of other committish in the future.

Do those changes look reasonable, and does it still work with git-bzr-ng ?

Cheers,

Jelmer

Revision history for this message
termie (termie) wrote :

There appears to be no such branch, nor any recent modifications to trunk. I do suggest, however, that you simply install git, git clone git://github.com/termie/git-bzr-ng.git and run the tests yourself. The tests are self-contained, simply replace "fastimport" in vendor with your version.

Revision history for this message
Jelmer Vernooij (jelmer) wrote :
Download full text (8.1 KiB)

On Tue, 2011-02-01 at 18:05 +0000, termie wrote:
> There appears to be no such branch, nor any recent modifications to
> trunk.
Sorry, it should be there now -
lp:~jelmer/bzr-fastimport/marks-normalization

I've also inlined the diff for these changes (against trunk) below.

> I do suggest, however, that you simply install git, git clone
> git://github.com/termie/git-bzr-ng.git and run the tests yourself. The
> tests are self-contained, simply replace "fastimport" in vendor with
> your version.
I'm keen on improving bzr-fastimport to comply better with the
fastimport protocol (and extending its testsuite to make sure it does)
and appreciate the patch. However, I have no interest in git-bzr-ng
specifically so I'll leave its testing up to you.

Cheers,

Jelmer

=== modified file 'branch_updater.py'
--- branch_updater.py 2010-10-15 19:25:16 +0000
+++ branch_updater.py 2011-02-01 10:04:51 +0000
@@ -121,7 +121,7 @@
                 except errors.BzrError, ex:
                     error("ERROR: failed to create branch %s: %s",
                         location, ex)
- lost_head = self.cache_mgr.revision_ids[tip]
+ lost_head = self.cache_mgr.lookup_committish(tip)
             lost_info = (name, lost_head)
             lost_heads.append(lost_info)
         return branch_tips, lost_heads
@@ -147,11 +147,11 @@

     def _update_branch(self, br, last_mark):
         """Update a branch with last revision and tag information.
-
+
         :return: whether the branch was changed or not
         """
         from fastimport.helpers import single_plural
- last_rev_id = self.cache_mgr.revision_ids[last_mark]
+ last_rev_id = self.cache_mgr.lookup_committish(last_mark)
         revs =
list(self.repo.iter_reverse_revision_history(last_rev_id))
         revno = len(revs)
         existing_revno, existing_last_rev_id = br.last_revision_info()

=== modified file 'bzr_commit_handler.py'
--- bzr_commit_handler.py 2011-01-23 09:27:40 +0000
+++ bzr_commit_handler.py 2011-02-01 10:04:51 +0000
@@ -113,7 +113,7 @@

         # Convert the parent commit-ids to bzr revision-ids
         if parents:
- self.parents = [self.cache_mgr.revision_ids[p]
+ self.parents = [self.cache_mgr.lookup_committish(p)
                 for p in parents]
         else:
             self.parents = []

=== modified file 'cache_manager.py'
--- cache_manager.py 2010-12-22 03:36:48 +0000
+++ cache_manager.py 2011-02-01 10:04:51 +0000
@@ -96,7 +96,7 @@

         # import commmit-ids -> revision-id lookup table
         # we need to keep all of these but they are small
- self.revision_ids = {}
+ self.marks = {}

         # (path, branch_ref) -> file-ids - as generated.
         # (Use store_file_id/fetch_fileid methods rather than direct
access.)
@@ -121,12 +121,25 @@

         self.reftracker = RefTracker()

+ def add_mark(self, mark, commit_id):
+ assert mark[0] != ':'
+ self.marks[mark] = commit_id
+
+ def lookup_committish(self, committish):
+ """Resolve a 'committish' to a revision id.
+
+ :param committish: A "committish" string
+ :return: Bazaar revision id
+ """...

Read more...

Revision history for this message
termie (termie) wrote :

Your patch works fine in my tests, looking forward to it getting merged :)

If you have a keen interest in making bzr-fastimport work, please add tests for it like the one I provided (and was asked for by Martin), any trivial attempt to actually import a git branch as provided by git fast-export without these patches has been failing, likely for some time.

Revision history for this message
Jelmer Vernooij (jelmer) wrote :

On Tue, 2011-02-01 at 21:19 +0000, termie wrote:
> Your patch works fine in my tests, looking forward to it getting merged :)
Great, thanks for checking. :) I'll have a look at merging this into trunk.

> If you have a keen interest in making bzr-fastimport work, please add
> tests for it like the one I provided (and was asked for by Martin), any
> trivial attempt to actually import a git branch as provided by git
> fast-export without these patches has been failing, likely for some
> time.
I've successfully imported streams generated by git-fast-export using
bzr-fast-import without these patches in the past. It's probably just
incremental imports that were affected.

Cheers,

Jelmer

Revision history for this message
termie (termie) wrote :

K, well either way you are welcome to make use of my test script. Can I expect that this is now going to be merged into trunk?

Revision history for this message
Martin Pool (mbp) wrote :

Yes, this should get merged to trunk as part of Jelmer's work. Thanks.

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'branch_updater.py'
2--- branch_updater.py 2010-10-15 19:25:16 +0000
3+++ branch_updater.py 2011-01-30 22:52:14 +0000
4@@ -151,6 +151,7 @@
5 :return: whether the branch was changed or not
6 """
7 from fastimport.helpers import single_plural
8+ last_mark = last_mark.lstrip(':')
9 last_rev_id = self.cache_mgr.revision_ids[last_mark]
10 revs = list(self.repo.iter_reverse_revision_history(last_rev_id))
11 revno = len(revs)
12
13=== modified file 'bzr_commit_handler.py'
14--- bzr_commit_handler.py 2011-01-23 09:27:40 +0000
15+++ bzr_commit_handler.py 2011-01-30 22:52:14 +0000
16@@ -113,7 +113,7 @@
17
18 # Convert the parent commit-ids to bzr revision-ids
19 if parents:
20- self.parents = [self.cache_mgr.revision_ids[p]
21+ self.parents = [self.cache_mgr.revision_ids[p.lstrip(':')]
22 for p in parents]
23 else:
24 self.parents = []
25
26=== modified file 'marks_file.py'
27--- marks_file.py 2010-12-11 21:27:52 +0000
28+++ marks_file.py 2011-01-30 22:52:14 +0000
29@@ -53,8 +53,7 @@
30 while line:
31 line = line.rstrip('\n')
32 mark, revid = line.split(' ', 1)
33- if mark.startswith(':'):
34- mark = mark[1:]
35+ mark = mark.lstrip(':')
36 revision_ids[mark] = revid
37 line = f.readline()
38 f.close()
39@@ -76,5 +75,5 @@
40
41 # Write the revision info
42 for mark, revid in revision_ids.iteritems():
43- f.write(':%s %s\n' % (mark, revid))
44+ f.write(':%s %s\n' % (str(mark).lstrip(':'), revid))
45 f.close()
46
47=== modified file 'processors/generic_processor.py'
48--- processors/generic_processor.py 2010-12-12 03:36:13 +0000
49+++ processors/generic_processor.py 2011-01-30 22:52:14 +0000
50@@ -533,9 +533,9 @@
51 except:
52 print "ABORT: exception occurred processing commit %s" % (cmd.id)
53 raise
54- self.cache_mgr.revision_ids[cmd.id] = handler.revision_id
55+ self.cache_mgr.revision_ids[cmd.id.lstrip(':')] = handler.revision_id
56 self._revision_count += 1
57- self.report_progress("(%s)" % cmd.id)
58+ self.report_progress("(%s)" % cmd.id.lstrip(':'))
59
60 if cmd.ref.startswith('refs/tags/'):
61 tag_name = cmd.ref[len('refs/tags/'):]

Subscribers

People subscribed via source and target branches