~nacc/git-ubuntu:lp1731554-importer-rework

Last commit made on 2017-11-17
Get this branch:
git clone -b lp1731554-importer-rework https://git.launchpad.net/~nacc/git-ubuntu

Branch information

Name:
lp1731554-importer-rework
Repository:
lp:~nacc/git-ubuntu

Recent commits

0fd899e... by Nish Aravamudan

importer: more consolidation

ff6d378... by Nish Aravamudan

importer: massive rework

I'm sorry, Robie :)

Basically, our current main importer loop looks like:

for applied in unapplied, applied:
    for dist in debian, ubuntu:
        for new publishes in $dist relative to $applied $dist branches
            import $applied publish
                this updates the branch pointer for the pocket + series
                if dist == 'ubuntu': this also updates the devel pointers

That is a fair amount of branch manipulation that will be discarded,
since a series might have multiple publishes. This is especially costly
on a first import or a reimport (the worst case).

So instead:

for dist in debian, ubuntu:
    for applied in unapplied, applied:
        for new publishes in $dist relative to $applied $dist branches
            if publish has already been imported: continue
            import $applied publish

update all affected branch pointers,
    where 'affected' is defined by keeping a running track of the refs we
    *would* have updated in the original algorithm, along with the 'last'
    commit each would have pointed to. This consolidates the branch
    and devel updates into one place as well.
    This also makes the importer algorithm match the spec on some level
    -- update the commit graph by importing only new publishes (which
    would create new import tags and new applied tags, while verifying
    any repeated publish data matches exactly) and then forcibly moving
    branch pointers to where they are "now" in Launchpad. The branch
    pointers are not part of the commit graph, so they change in a
    distinct step.
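
The reworked loop above can be sketched roughly as follows. This is only
an illustration, not the importer's actual code: import_new_publishes,
apply_ref_updates, the (ref_name, publish_id) pairs and the import_publish
callable are all hypothetical names, and refs are modelled as a plain dict
rather than real git references.

```python
def import_new_publishes(publishes, imported, import_publish):
    """Import unseen publishes and record where each ref should end up.

    publishes: iterable of (ref_name, publish_id) in publication order
    imported: set of publish ids already present (hypothetical bookkeeping)
    import_publish: callable taking a publish id, returning a commit hash
    """
    pending = {}
    for ref_name, publish_id in publishes:
        if publish_id in imported:
            # Already in the commit graph; skipped as in the pseudocode.
            continue
        commit = import_publish(publish_id)
        imported.add(publish_id)
        # A later publish for the same ref overwrites the earlier entry,
        # so only the 'last' commit per ref survives the loop.
        pending[ref_name] = commit
    return pending


def apply_ref_updates(refs, pending):
    """Move every affected ref exactly once, in a distinct final step."""
    for ref_name, commit in pending.items():
        refs[ref_name] = commit
```

The point of the split is that a ref with many publishes is written once
at the end instead of once per publish.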

In my testing, this drops the reimport time for ipsec-tools from ~65
minutes to ~42 minutes consistently. We could probably speed it up more
starting from this base, if need be -- e.g., I'm not sure it makes sense
to do the import action itself in the loop. Perhaps it's better to
accumulate the unique publishes in a set and then iterate that set in a
second loop.
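
A minimal sketch of that two-pass idea, under assumptions: the names
unique_publishes and import_all are hypothetical, publishes are
represented by hashable ids, and an insertion-ordered dict stands in for
the set on the guess that import order matters.

```python
def unique_publishes(publish_streams):
    """First pass: collect each publish once, keeping first-seen order."""
    seen = {}
    for stream in publish_streams:
        for publish in stream:
            # dict keys act as an ordered set (insertion order is kept).
            seen.setdefault(publish, None)
    return list(seen)


def import_all(publish_streams, import_publish):
    """Second pass: run the import action once per unique publish."""
    return {p: import_publish(p) for p in unique_publishes(publish_streams)}
```

With streams such as ["1.0", "1.1"] and ["1.1", "2.0"], the import action
runs three times rather than four.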

Given that we are going to be doing a lot of reimporting soon, speeding
it up is a net win. This also tries to clean up the code for readability
and adds comments as we go.

Note this does not resolve the fundamental issue I am sure exists:
orphan tags. It simply removes support for them.

LP: #1731554

e138c26... by Nish Aravamudan

source_information: drop unused parent_{applied,}_head_name

These are no longer used after the importer publishing parent drop.

2bbeba2... by Nish Aravamudan

importer: LP: #1731555

b1551b1... by Robie Basak

Initial tests for _devel_branch_updates

8a605fe... by Robie Basak

Adjust head_versions structure

Let's simplify this a little. Instead of containing a pygit2.Reference,
resolve it to a commit hash string first to simplify testing.

While I'm there, as we only need two elements, just use a two-tuple for
the dictionary values.

f060cc4... by Robie Basak

Factor out and rework _devel_branch_updates

Move the core functionality into _devel_branch_updates, making
update_devel_branches a thin wrapper to it. _devel_branch_updates now
has no dependencies so should be easier to test.

Rename namespace, applied_prefix to ref_prefix. _devel_branch_updates
can be simplified by collapsing namespace and applied_prefix into a
single concept ref_prefix.

Move printing to wrapper function. Really the inner function should do
computation only and leave it to the caller to report warnings etc. This
will prevent noise when under test.

_devel_branch_updates returns None for commit hashes, in which case the
intention is that the caller will suppress setting those refs but will
be able to note which refs are not being set. This is useful to maintain
reporting.
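
As a rough illustration of that caller-side contract (not the real
update_devel_branches): the sketch below assumes the inner function
yields (ref_name, commit_hash_or_None) pairs, and apply_devel_updates,
set_ref and warn are hypothetical names.

```python
def apply_devel_updates(updates, set_ref, warn=print):
    """Set refs that have a commit hash; report and skip those that do not.

    updates: iterable of (ref_name, commit_hash or None) pairs
    set_ref: callable(ref_name, commit_hash) that actually moves the ref
    warn: callable used for reporting, kept out of the pure computation
    """
    skipped = []
    for ref_name, commit_hash in updates:
        if commit_hash is None:
            # No target was computed: leave the ref alone, but report it.
            warn("Not setting %s: no commit available" % ref_name)
            skipped.append(ref_name)
            continue
        set_ref(ref_name, commit_hash)
    return skipped
```

Keeping the printing here means the computation itself can be tested
without capturing output.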

The debug and warning messages are now slightly changed since less
information is available one level up the stack. It should still be
sufficient for debugging or warning purposes.

8af442a... by Nish Aravamudan

importer: rework and move devel pointer moving

After discussion with Robie on IRC, we decided that 0f3c943054ab
("import: drop publishing parent functionality") and 5aa33fa08078 ("Also
reset devel heads") introduced a regression in the semantics of the
devel pointers.

Before those changes, as we imported publication entries, the devel
pointers were merged up so as to be fast-forwarding whenever a
publication entry was newer than the current devel pointer. In other
words, the devel pointers were part of the commit graph itself.

After those changes, the devel pointers are more like symbolic
references, describing meta-state about the commit graph, rather than
integral to the graph itself:

A given series devel branch, after a successful import, points to the
latest publication record in a given series.

The ubuntu/devel branch, after a successful import, points to the latest
series devel branch.

Given these 'rules', we can stop updating the devel pointers in the main
import loop and just do so after we are done importing. All series
branch pointers are updated, which is unnecessary but will generally be
a no-op. This means the method does not need to know whether it is
running as part of a reimport.
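
The two rules can be sketched as a pure function. Everything here is an
assumption made for illustration: the devel_pointers name, the
(series, published_at, commit) record shape, the "<series>-devel" ref
naming, and ordering publications by a comparable published_at value.

```python
def devel_pointers(series_order, publications, ref_prefix="ubuntu/"):
    """Compute devel ref targets after an import has finished.

    series_order: series names, oldest first (hypothetical input)
    publications: iterable of (series, published_at, commit) records
    """
    # Rule 1: each series devel branch points at the latest publication
    # record in that series.
    latest = {}
    for series, published_at, commit in publications:
        if series not in latest or published_at > latest[series][0]:
            latest[series] = (published_at, commit)

    updates = {}
    for series in series_order:
        if series in latest:
            updates["%s%s-devel" % (ref_prefix, series)] = latest[series][1]

    # Rule 2: the overall devel branch points at the latest series devel
    # branch, i.e. the newest series that has any publication.
    for series in reversed(series_order):
        if series in latest:
            updates["%sdevel" % ref_prefix] = latest[series][1]
            break
    return updates
```

Because the result depends only on the final publication state, it can be
computed once after the import loop.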

LP: #1730655

Fixes: 0f3c943054ab ("import: drop publishing parent functionality")
Fixes: 5aa33fa08078 ("Also reset devel heads")

ab12071... by Nish Aravamudan

source_information: add API to obtain all series objects/names

f3ff76d... by Nish Aravamudan

source_information: cleanup formatting per style