Code review comment for ~nacc/git-ubuntu:lp1730734-cache-importer-progress

Christian Ehrhardt (paelzer) wrote:

[...]

> So now if we use an unmodified importer, we will end up going back in
> IMPORT to the last publish that was the first time we saw a given
> version (typically in Debian, therefore) and walking forward from there.

To confirm - you go to "the first time we saw a given version" because without parenting history that is the only safe place to walk forward from?

> This will be unnecessary iteration of publishing data

because some might already have been handled.

>, at least, and
> unnecessary moving of the branch pointers.

For the walking

Thanks for the details, much clearer to me now what your case actually is.

>
> My branch modifies the catch-up to move the storage of the publication
> event to an external cache, currently DBM.

[...]

> The more I think about it, I think I am spinning myself on this problem
> for no reason :)
>
> I think we can let the linear walker not use the dbm [we can keep the
> code, just not pass anything] (or sqlite) and it will just mean if there
> is a race between when the linear walker hits a source package and when
> the keep-up script does, there will be some no-op looping (unless new
> SPRs are found).

Sounds good. Then you can actually lock the DB, so that you are fully safe against unintentional concurrent use of it by the catch-up script (e.g. if someone manually starts it twice).
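To illustrate the locking idea, here is a minimal sketch and not the importer's actual code. It assumes a POSIX system, and the helper name `locked_cache` and the `.lock` sidecar file are made up for this example. A second run that tries to open the same cache fails immediately instead of racing with the first.

```python
import dbm
import fcntl
from contextlib import contextmanager


@contextmanager
def locked_cache(path):
    """Open a DBM cache while holding an exclusive advisory lock.

    Hypothetical helper: raises BlockingIOError if another run
    already holds the lock on the sidecar file.
    """
    lock_file = open(path + ".lock", "w")
    try:
        # LOCK_NB makes this fail at once rather than wait for the lock.
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        with dbm.open(path, "c") as db:
            yield db
    finally:
        # Closing the file descriptor releases the lock, if it was taken.
        lock_file.close()
```

The lock is advisory, so it only protects against other processes that go through the same helper, which should be enough for a script that is accidentally started twice.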

[...]

> I do think we actually need 4 caches, since the unapplied / applied heads can easily be at a
> different state. I'll work on this now.

> I'll also add some changes to the importer to clean up that code.

Since you will only ever access it from code and never manually, I think you could even implement it as debian/ubuntu x unapplied/applied x per-package. If you end up locking the DBs in any way, that also gives you finer granularity on those locks.
And for your code it doesn't matter how finely the file structure is split.
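A rough sketch of what such a split could look like, assuming one cache file per combination under a common root directory. The function names and the directory layout are invented for illustration and are not taken from the branch.

```python
import itertools
from pathlib import Path

# The two axes mentioned above; per-package is the third.
DISTS = ("debian", "ubuntu")
PATCH_STATES = ("unapplied", "applied")


def cache_path(root, dist, patch_state, package):
    """Return the (hypothetical) cache file path for one combination."""
    if dist not in DISTS:
        raise ValueError("unknown distribution: %s" % dist)
    if patch_state not in PATCH_STATES:
        raise ValueError("unknown patch state: %s" % patch_state)
    return Path(root) / dist / patch_state / package


def caches_for_package(root, package):
    """Return all four cache paths that belong to one source package."""
    return [
        cache_path(root, dist, state, package)
        for dist, state in itertools.product(DISTS, PATCH_STATES)
    ]
```

With this layout a lock would only ever cover one package in one distribution and one patch state, so two runs working on different packages would not block each other.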
