lp:extract-changelogs

Created by Michael Vogt
Get this branch:
bzr branch lp:extract-changelogs
Members of Ubuntu Core Development Team can upload to this branch.

Branch information

Owner:
Ubuntu Core Development Team
Project:
extract-changelogs
Status:
Development

Recent revisions

49. By Greg Mason

[cjwatson, r=gmason] Use archive.getPublishedSources(order_by_date=True) for a significant speedup.

The query that extract-changelogs is currently relying on is very slow, and there are some subtle ways in which iterating over the collection can go wrong. For ddeb-retriever, we did a fair bit of work on this:

  https://bugs.launchpad.net/launchpad/+bug/1441729
  https://code.launchpad.net/~cjwatson/launchpad/db-index-bpph-datecreated/+merge/255539
  https://code.launchpad.net/~cjwatson/launchpad/getpublishedbinaries-sorting/+merge/255822

In the case of extract-changelogs, it should be sufficient to add order_by_date=True. This joins fewer tables and uses a reasonably well-indexed query to return a collection in decreasing ID order. If the collection changes during iteration, the worst case is that you get the same source package more than once (as long as you don't try to do any status filtering or similar, as explained in a comment here), and extract-changelogs already handles this in LaunchpadChangelogsCrawler._unpack_changelogs_to_target.

Please do test this! I have not done so. However, I hear that extract-changelogs times out when asked to work from a very old starting date, and this should make it behave a lot better.

Review:

moon127 ran this successfully. After discussion, it looks safe to merge this with IS superpowers.
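
The approach described in revision 49 can be sketched roughly as follows. This is a minimal illustration, not the branch's actual code: iter_unique_sources, published_sources_since and main are hypothetical names, the consumer name and start date are made up, and the duplicate handling here only stands in for what the commit message says LaunchpadChangelogsCrawler._unpack_changelogs_to_target already does. The live query in main() has not been run against Launchpad.

```python
from datetime import datetime, timezone


def iter_unique_sources(published_sources):
    """Yield each (name, version) pair once, skipping any repeats.

    Hypothetical helper: repeats can appear if the collection changes
    while it is being iterated, as the commit message warns.
    """
    seen = set()
    for pub in published_sources:
        key = (pub.source_package_name, pub.source_package_version)
        if key in seen:
            continue
        seen.add(key)
        yield pub


def published_sources_since(archive, since):
    """Query publications created since `since`, ordered by date.

    order_by_date=True is the flag named in the commit message; no
    status filter is passed, in line with the caveat given there.
    """
    return archive.getPublishedSources(
        created_since_date=since, order_by_date=True)


def main():
    # Imported lazily so the helpers above work without launchpadlib.
    from launchpadlib.launchpad import Launchpad

    # Consumer name and start date are illustrative assumptions.
    lp = Launchpad.login_anonymously(
        "extract-changelogs-sketch", "production", version="devel")
    archive = lp.distributions["ubuntu"].main_archive
    since = datetime(2015, 4, 1, tzinfo=timezone.utc)
    for pub in iter_unique_sources(published_sources_since(archive, since)):
        print(pub.source_package_name, pub.source_package_version)
```

Calling main() would perform the live query; the deduplication helper can be exercised on its own with any objects that carry the two attributes.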

48. By Brian Murray

merge mvo's r41 which set a socket timeout to avoid hanging for days
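
One common way to set such a timeout in Python is a process-wide default, sketched below. The commit message does not say how or with what value the timeout was set, so the function name and the 60-second figure are assumptions for illustration only.

```python
import socket

# Hypothetical value; the commit does not state the timeout chosen.
SOCKET_TIMEOUT_SECONDS = 60.0


def set_global_socket_timeout(seconds=SOCKET_TIMEOUT_SECONDS):
    """Give every socket created afterwards a default timeout.

    Without this, a stalled connection can block indefinitely.
    Returns the timeout now in effect.
    """
    socket.setdefaulttimeout(seconds)
    return socket.getdefaulttimeout()
```

The default applies only to sockets created after the call, so it would need to run before any download starts.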

47. By Brian Murray

add date, process id to log file. explicitly close the apt lock.
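
The logging half of this change could look something like the sketch below, which uses the standard logging module to prefix each line with a timestamp and the process id. The logger name, format string and make_logger helper are assumptions; the branch's real log format is not shown here, and the apt lock handling is not covered.

```python
import logging

# Timestamp and process id in every line; the exact layout is assumed.
LOG_FORMAT = "%(asctime)s [%(process)d] %(levelname)s: %(message)s"


def make_logger(log_path):
    """Return a logger that writes dated, pid-tagged lines to log_path."""
    logger = logging.getLogger("extract-changelogs-sketch")
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(log_path)
    handler.setFormatter(logging.Formatter(LOG_FORMAT))
    logger.addHandler(handler)
    return logger
```

Including the process id makes it possible to tell overlapping runs apart when they share one log file.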

46. By Brian Murray

log how many symlinks we created

45. By Brian Murray

move symlink creation into its own function and try to create the symlink even if the changelog has already been extracted.

44. By Stéphane Graber

Apply local changes from production

43. By Stéphane Graber

Change the binary symlink logic to create symlinks under binary/<poolhash>/<binary>/<version>
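
A rough sketch of that layout follows. It assumes <poolhash> follows the Debian pool convention (first letter of the name, or the first four characters for names starting with "lib") and that it is computed from the binary package name; neither detail is confirmed by the commit message. All function names are hypothetical.

```python
import os


def pool_hash(name):
    """Pool-style prefix: 'libf' for 'libfoo', 'a' for 'apt' (assumed)."""
    if name.startswith("lib") and len(name) > 3:
        return name[:4]
    return name[:1]


def binary_symlink_path(root, binary_name, version):
    """Build root/binary/<poolhash>/<binary>/<version>."""
    return os.path.join(
        root, "binary", pool_hash(binary_name), binary_name, version)


def link_binary(root, binary_name, version, target):
    """Create the symlink if it is missing and return its path."""
    link = binary_symlink_path(root, binary_name, version)
    os.makedirs(os.path.dirname(link), exist_ok=True)
    # lexists also catches dangling links, so reruns stay idempotent.
    if not os.path.lexists(link):
        os.symlink(target, link)
    return link
```

Checking for an existing link before creating one keeps the step safe to repeat when a changelog has already been extracted.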

42. By Stéphane Graber

Use urllib2 to download the files to avoid a weird urllib/launchpadlib interaction preventing me from downloading the files on my machine.

41. By Stéphane Graber

Clean up the existing code to pass pyflakes, pyflakes3 and pep8

40. By Michael Vogt

lp-extract-changelogs.py: do not append "/" to dest

Branch metadata

Branch format:
Branch format 6
Repository format:
Bazaar pack repository format 1 (needs bzr 0.92)
This branch contains Public information 
Everyone can see this information.