lp:extract-changelogs

Created by Michael Vogt on 2016-03-11 and last modified on 2016-04-29
Get this branch:
bzr branch lp:extract-changelogs
Members of Ubuntu Core Development Team can upload to this branch.

Branch information

Owner:
Ubuntu Core Development Team
Project:
extract-changelogs
Status:
Development

Recent revisions

49. By Greg Mason on 2016-04-29

[cjwatson, r=gmason] Use archive.getPublishedSources(order_by_date=True) for a significant speedup.

The query that extract-changelogs is currently relying on is very slow, and there are some subtle ways in which iterating over the collection can go wrong. For ddeb-retriever, we did a fair bit of work on this:

  https://bugs.launchpad.net/launchpad/+bug/1441729
  https://code.launchpad.net/~cjwatson/launchpad/db-index-bpph-datecreated/+merge/255539
  https://code.launchpad.net/~cjwatson/launchpad/getpublishedbinaries-sorting/+merge/255822

In the case of extract-changelogs, it should be sufficient to add order_by_date=True, which joins fewer tables and uses a reasonably well-indexed query to return a collection in decreasing ID order. If the collection changes during iteration, then as long as you don't try to do any status filtering or similar (as explained in a comment here), the worst case is that you get the same source package more than once. extract-changelogs already handles this in LaunchpadChangelogsCrawler._unpack_changelogs_to_target.

Please do test this! I have not done so. However, I hear that extract-changelogs times out when asked to work from a very old starting date, and this should make it behave a lot better.

Review:

moon127 ran this successfully. After discussion, it looks safe to merge this with IS superpowers.
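
To illustrate the change in revision 49, here is a minimal Python sketch of the pattern it describes: request the publications with order_by_date=True and tolerate the same source package showing up more than once. This is not the code in LaunchpadChangelogsCrawler; the helper names iter_unique_sources and fetch_sources are hypothetical, and it assumes each publication exposes source_package_name and source_package_version, as launchpadlib source publication entries do.

```python
from collections import namedtuple


def iter_unique_sources(publications):
    """Yield each (name, version) pair once, in the order first seen.

    Duplicates can appear if the collection changes while it is being
    iterated, so they are skipped rather than treated as errors.
    """
    seen = set()
    for pub in publications:
        key = (pub.source_package_name, pub.source_package_version)
        if key in seen:
            continue
        seen.add(key)
        yield pub


def fetch_sources(archive, since):
    """Query an archive for sources created since `since`, newest first.

    `archive` is assumed to be a launchpadlib archive entry (or any
    object with a compatible getPublishedSources method). No status
    filter is passed, in line with the caveat in the revision message.
    """
    pubs = archive.getPublishedSources(
        created_since_date=since, order_by_date=True)
    return iter_unique_sources(pubs)


# Stand-in data so the sketch runs without a Launchpad connection.
FakePub = namedtuple(
    "FakePub", "source_package_name source_package_version")
sample = [
    FakePub("apt", "1.2.10"),
    FakePub("bzr", "2.7.0"),
    FakePub("apt", "1.2.10"),  # duplicate from a shifting collection
]
unique = list(iter_unique_sources(sample))
```

With the stand-in data, unique holds two entries, apt followed by bzr. Keying on name and version is one possible choice; the real crawler may detect already-unpacked changelogs differently.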

48. By Brian Murray on 2016-03-11

merge mvo's r41 which set a socket timeout to avoid hanging for days

47. By Brian Murray on 2016-03-11

add date, process id to log file. explicitly close the apt lock.

46. By Brian Murray on 2016-03-11

log how many symlinks we created

45. By Brian Murray on 2016-03-11

move symlink creation into its own function and try to create the symlink even if the changelog has already been extracted.

44. By Stéphane Graber on 2013-07-24

Apply local changes from production

43. By Stéphane Graber on 2013-07-24

Change the binary symlink logic to create symlinks under binary/<poolhash>/<binary>/<version>

42. By Stéphane Graber on 2013-07-24

Use urllib2 to download the files to avoid a weird urllib/launchpadlib interaction preventing me from downloading the files on my machine.

41. By Stéphane Graber on 2013-07-24

Clean up the existing code to pass pyflakes, pyflakes3 and pep8

40. By Michael Vogt on 2011-07-07

lp-extract-changelogs.py: do not append "/" to dest

Branch metadata

Branch format:
Branch format 6
Repository format:
Bazaar pack repository format 1 (needs bzr 0.92)
This branch contains Public information 
Everyone can see this information.