lp:~nacc/extract-changelogs/changelog-components

Created by Nish Aravamudan
Get this branch:
bzr branch lp:~nacc/extract-changelogs/changelog-components

Branch information

Owner:
Nish Aravamudan
Project:
extract-changelogs
Status:
Development

Recent revisions

53. By Nish Aravamudan

_create_{binary,component}_symlinks: specify type of broken symlink to remove

52. By Nish Aravamudan

Move pool/ changelogs into the root directory, symlink in component subdirs

If a srcpkg is in main but some binaries are in universe, `apt changelog
$binpkg` will fail because `apt` has the component in the URL and the
component comes from the binary package, but the changelog is stored
according to the source package. There isn't really any reason for the
component to be in the URI, but we need to maintain
backwards-compatibility.

For new changelogs, elide the component from the URI for the actual
file. Make symlinks in the old component/ path for compatibility.

LP: #1672555
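
The layout change described in r52 can be sketched as follows. This is a minimal illustration, not the branch's actual code: the function name `store_changelog` and the exact directory layout (`<root>/<prefix>/<srcpkg>/<srcpkg>_<version>/changelog` for the real file, `<root>/<component>/...` for the compatibility links) are assumptions made for the example.

```python
import os


def store_changelog(root, components, srcpkg, version, text):
    """Write one changelog and link it from every legacy component path."""
    # Pool-style prefix: "libf" for libfoo, otherwise the first letter.
    prefix = srcpkg[:4] if srcpkg.startswith("lib") else srcpkg[0]
    name = "%s_%s" % (srcpkg, version)

    # Assumed component-free location of the real file.
    real_dir = os.path.join(root, prefix, srcpkg, name)
    os.makedirs(real_dir, exist_ok=True)
    with open(os.path.join(real_dir, "changelog"), "w") as f:
        f.write(text)

    for component in components:
        # Assumed legacy location that still names the component.
        legacy_parent = os.path.join(root, component, prefix, srcpkg)
        os.makedirs(legacy_parent, exist_ok=True)
        link = os.path.join(legacy_parent, name)
        if os.path.lexists(link):
            continue
        # A relative target keeps the tree relocatable.
        os.symlink(os.path.relpath(real_dir, legacy_parent), link)
    return real_dir
```

With links in both `main/` and `universe/`, a lookup built from the binary package's component resolves to the same file as one built from the source package's component.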

51. By Nish Aravamudan

_create_binary_symlinks: create symlinks regardless of existing changelog

r45 changed the behavior while attempting to fix the code, by creating the
binary/ symlinks even if the target changelog already exists. However,
it removed the invocation of (what is now) _create_binary_symlinks that
occurs when the target changelog is initially created.
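
The intended control flow after r51 might look like the sketch below: links are created on both paths, whether the changelog was just extracted or already existed. The names `ensure_binary_symlinks`, `publish`, and the `extract` callback are hypothetical and stand in for the real `_create_binary_symlinks` logic, which is not shown in the source.

```python
import os


def ensure_binary_symlinks(target, link_dir, binaries):
    """Point link_dir/<binary> at target and return how many links were made."""
    created = 0
    os.makedirs(link_dir, exist_ok=True)
    for binary in binaries:
        link = os.path.join(link_dir, binary)
        # A dangling symlink is a link whose target no longer exists.
        if os.path.islink(link) and not os.path.exists(link):
            os.unlink(link)
        if not os.path.lexists(link):
            os.symlink(target, link)
            created += 1
    return created


def publish(target, link_dir, binaries, extract):
    """Extract the changelog only if missing, then always create links."""
    if not os.path.exists(target):
        extract(target)
    # Runs after a fresh extraction and for a pre-existing changelog alike.
    return ensure_binary_symlinks(target, link_dir, binaries)
```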

50. By Nish Aravamudan

create_binary_symlinks: log symlinks correctly

Right now, the symlink deletion and creation logs appear to be reversed
(referring to the target not to the link).

49. By Greg Mason

[cjwatson, r=gmason] Use archive.getPublishedSources(order_by_date=True) for a significant speedup.

The query that extract-changelogs is currently relying on is very slow, and there are some subtle ways in which iterating over the collection can go wrong. For ddeb-retriever, we did a fair bit of work on this:

  https://bugs.launchpad.net/launchpad/+bug/1441729
  https://code.launchpad.net/~cjwatson/launchpad/db-index-bpph-datecreated/+merge/255539
  https://code.launchpad.net/~cjwatson/launchpad/getpublishedbinaries-sorting/+merge/255822

In the case of extract-changelogs, it should be sufficient to add order_by_date=True, which has the effect of joining fewer tables and using a reasonably well-indexed query to return a collection which is in decreasing ID order. If the collection changes during iteration (as long as you don't try to do any status filtering or similar, as explained in a comment here) then the worst case is that you get the same source package more than once, but extract-changelogs already handles this in LaunchpadChangelogsCrawler._unpack_changelogs_to_target.
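
To illustrate why repeated results are harmless, here is a small sketch of skipping source packages that were already seen. It is not the code in `LaunchpadChangelogsCrawler._unpack_changelogs_to_target`; the helper `unique_publications` and the stand-in `Publication` type are hypothetical, and only the attribute names mirror those of Launchpad publishing records.

```python
from collections import namedtuple

# Stand-in for a Launchpad source publication record (illustration only).
Publication = namedtuple(
    "Publication", ["source_package_name", "source_package_version"])


def unique_publications(publications):
    """Yield each (name, version) once, preserving the incoming order."""
    seen = set()
    for pub in publications:
        key = (pub.source_package_name, pub.source_package_version)
        if key in seen:
            # The collection changed during iteration and repeated an entry.
            continue
        seen.add(key)
        yield pub


# With launchpadlib the input would come from something like:
#   archive.getPublishedSources(order_by_date=True)
```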

Please do test this! I have not done so. However, I hear that extract-changelogs times out when asked to work from a very old starting date, and this should make it behave a lot better.

Review:

moon127 ran this successfully. After discussion, it looks safe to merge this with IS superpowers.

48. By Brian Murray

merge mvo's r41 which set a socket timeout to avoid hanging for days
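
One common way to set such a timeout in Python is a process-wide default, sketched below. The helper name and the 60-second value are assumptions; the source does not say which mechanism or value r41 used.

```python
import socket

# Assumed value; the source does not state the timeout that r41 chose.
TIMEOUT_SECONDS = 60.0


def set_network_timeout(seconds=TIMEOUT_SECONDS):
    """Apply a default timeout to sockets created after this call."""
    socket.setdefaulttimeout(seconds)
    return socket.getdefaulttimeout()
```

A blocked read then raises a timeout error instead of waiting indefinitely.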

47. By Brian Murray

add date, process id to log file. explicitly close the apt lock.
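
A log format carrying the date and process ID could be configured roughly as below. The format string, logger name, and `make_logger` helper are illustrative assumptions, not the values used in the branch.

```python
import logging
import os

# %(asctime)s adds the date and time, %(process)d the process ID.
LOG_FORMAT = "%(asctime)s [%(process)d] %(levelname)s: %(message)s"


def make_logger(path, name="extract-changelogs"):
    """Return a logger that writes timestamped, PID-tagged lines to path."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(path)
    handler.setFormatter(logging.Formatter(LOG_FORMAT))
    logger.addHandler(handler)
    return logger
```

Tagging each line with the process ID makes it possible to tell overlapping runs apart in a shared log file.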

46. By Brian Murray

log how many symlinks we created

45. By Brian Murray

move symlink creation into its own function and try to create the symlink even if the changelog has already been extracted.

44. By Stéphane Graber

Apply local changes from production

Branch metadata

Branch format:
Branch format 6
Repository format:
Bazaar pack repository format 1 (needs bzr 0.92)