Code review comment for ~nacc/git-ubuntu:lp1730734-cache-importer-progress

Nish Aravamudan (nacc) wrote :

I am fairly confident in the actual changes here. What I think needs deciding is how the linear import script and the publisher-watching script should interact with a shared DBM cache. I think it is relatively safe within one process, but from what I can tell there is no guarantee the two processes won't race, and the DBM implementation itself is not concurrency-safe.

Possibly we should switch to sqlite3 for that, then, even though it adds overhead to configure and query.

Finally, I'm wondering how to describe the race in question. Basically, the two processes could read different (possibly unset, in one case) values for the last SPPHR seen in the cache. They would then iterate over different sets of Launchpad records, but the result would be the same relative to the actual contents of the imports. The real question is the branches, I think. Both processes start with the same git repository (effectively, under the assumption that neither has pushed yet) and move the branches. If one processes more (older) records, it moves the branches further, but the end result *should* be either the same as the other's, or an ancestor of it. That ancestor case is the concern: since we force-push branches, we are now timing-sensitive to what gets pushed.

I wonder if we should treat it more like RCU, in that we can always read or write, but the read data may be stale, so it needs to be verified before and after our loop. I imagine we could read the data, iterate over our records and import them, and then, before writing our last_spphr back into the database, check whether the current value is still the same. That would shrink the race window to the gap between re-reading the value and writing the new one (and the linear importer is a one-time operation, in theory).
