generate-ppa-htaccess is too slow

Bug #628711 reported by Michael Nelson
This bug affects 1 person
Affects: Launchpad itself
Status: Fix Released
Importance: High
Assigned to: Michael Nelson

Bug Description

Looking at the logs, generate-ppa-htaccess currently takes close to 2 minutes per run, which makes it difficult to schedule more frequently than every 5 minutes.

30 seconds of each run is setup/teardown (i.e., the time between creating/removing the lockfile and starting/finishing the script), while 1:30 is spent running the actual script.

Looking at the code, the script does three things for each private PPA:
 1) Iterates through all subscriptions for the PPA to check whether they have expired, expiring them where appropriate
 2) Iterates through all tokens for the PPA and ensures they are all still valid, deactivating them where appropriate
 3) Checks and updates the htaccess file:
   i) gets a publisher configuration for the PPA and checks that an htaccess file exists
   ii) gets a publisher configuration for the PPA and generates a new htpasswd file
   iii) gets a publisher configuration for the PPA and compares the new htpasswd file with the old one, replacing it when appropriate.
Finally, the transaction is committed.

First, getPubConfig doesn't look too expensive, but I still can't see a reason not to call it just once per PPA (or even just once for the complete set, since we know they are all private PPAs).
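A minimal sketch of that refactoring, in Python since Launchpad is written in Python. `get_pub_config`, `PubConfig` and the three step functions are hypothetical stand-ins for getPubConfig and steps (i) to (iii) above, not Launchpad's actual code; the only point illustrated is moving the lookup out of the individual steps.

```python
from dataclasses import dataclass

# Counts calls to the hypothetical config lookup below.
LOOKUPS = {"count": 0}


@dataclass(frozen=True)
class PubConfig:
    """Hypothetical stand-in for a PPA's publisher configuration."""
    htaccessroot: str


def get_pub_config(ppa_name):
    """Hypothetical stand-in for getPubConfig(); records every lookup."""
    LOOKUPS["count"] += 1
    return PubConfig(htaccessroot=f"/srv/ppa/{ppa_name}")


def ensure_htaccess(config, log):
    log.append(("check htaccess", config.htaccessroot))       # step (i)


def generate_htpasswd(config, log):
    log.append(("generate htpasswd", config.htaccessroot))    # step (ii)


def replace_if_changed(config, log):
    log.append(("compare and replace", config.htaccessroot))  # step (iii)


def update_ppa_files_repeated(ppa_names, log):
    """Current shape: each step looks the configuration up again."""
    for name in ppa_names:
        ensure_htaccess(get_pub_config(name), log)
        generate_htpasswd(get_pub_config(name), log)
        replace_if_changed(get_pub_config(name), log)


def update_ppa_files_hoisted(ppa_names, log):
    """Suggested shape: one lookup per PPA, shared by all three steps."""
    for name in ppa_names:
        config = get_pub_config(name)
        ensure_htaccess(config, log)
        generate_htpasswd(config, log)
        replace_if_changed(config, log)
```

Both versions perform the same steps in the same order; for N PPAs the first does 3N lookups and the second does N.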

Second, if we were to split this job so that (1) and (2) above ran in a separate cron job hourly (or at some interval other than every 5 minutes), the remaining step (3) would be much faster; in fact, I don't see why it would even need to run in a transaction.

Related branches

Julian Edwards (julian-edwards) wrote :

I think (1) could potentially be separate; expiring doesn't have to be *that* timely.

However, (2) *does* need to be timely since someone may be deactivating a token for a malicious user or for a security leak.

We might be able to speed this script up by having an intermediate table that holds a set of actions that must take place, populated whenever someone a) deactivates a token, b) creates a token, or c) regenerates a token (similar to the publisher's "dirty pockets" concept).
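A sketch of the intermediate-table idea, using an in-memory dict in place of a real database table. The names `Action`, `PendingActions`, `record`, `snapshot`, `delete` and `run_htaccess_job` are hypothetical. Rows are deleted only after every affected PPA has been regenerated, and rows recorded while a run is in progress are left for the next run; a real table would presumably do the same by row id inside the job's transaction.

```python
import itertools
from enum import Enum


class Action(Enum):
    DEACTIVATE = "deactivate"
    CREATE = "create"
    REGENERATE = "regenerate"


class PendingActions:
    """Hypothetical in-memory stand-in for the proposed action table."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.rows = {}  # row id -> (ppa_name, Action)

    def record(self, ppa_name, action):
        # Would be called from the token deactivate/create/regenerate paths.
        self.rows[next(self._ids)] = (ppa_name, action)

    def snapshot(self):
        # The rows this run will handle; later rows wait for the next run.
        return dict(self.rows)

    def delete(self, row_ids):
        for row_id in row_ids:
            self.rows.pop(row_id, None)


def run_htaccess_job(pending, regenerate):
    """Regenerate each PPA with queued actions once, then drop those rows."""
    batch = pending.snapshot()
    dirty = sorted({ppa_name for ppa_name, _ in batch.values()})
    for ppa_name in dirty:
        regenerate(ppa_name)
    # Only reached if every regeneration succeeded.
    pending.delete(batch.keys())
    return dirty
```

Several actions against the same PPA collapse into a single regeneration, so the work per run tracks the number of changed PPAs rather than the total number of private PPAs.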

Obviously, the other code optimisations you mention would also be useful.

tags: added: p3a ppa software-center
Changed in soyuz:
status: New → Triaged
importance: Undecided → High
Changed in soyuz:
status: Triaged → In Progress
assignee: nobody → Michael Nelson (michael.nelson)
Robert Collins (lifeless) wrote : Re: [Bug 628711] [NEW] generate-ppa-htaccess is too slow

Is there any reason this can't be:
 - query expired subscriptions; deactivate
 - query invalid tokens; deactivate
 - output new htaccess files for affected ppas
 - commit-and-done?

The incremental work at any point should be pretty small, right, if we
are running this every (say) minute?
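A sketch of that single pass against a toy SQLite schema. The tables, columns and status values are invented for illustration and do not match Launchpad's real schema; in particular it assumes a token counts as invalid when its owner has no current subscription to the same archive. `with conn:` commits once at the end (or rolls back if anything raises), which matches the commit-and-done step.

```python
import sqlite3

# Toy schema (assumed, not Launchpad's). ISO-8601 text timestamps
# compare correctly as strings.
SCHEMA = """
CREATE TABLE subscription (
    id INTEGER PRIMARY KEY, archive TEXT, subscriber TEXT,
    date_expires TEXT, status TEXT);
CREATE TABLE token (
    id INTEGER PRIMARY KEY, archive TEXT, subscriber TEXT,
    date_deactivated TEXT);
"""


def run_pass(conn, now, write_htaccess):
    """Expire, deactivate, rewrite affected archives, commit once."""
    with conn:
        # 1. Expire current subscriptions whose expiry date has passed.
        conn.execute(
            "UPDATE subscription SET status = 'EXPIRED' "
            "WHERE status = 'CURRENT' AND date_expires IS NOT NULL "
            "AND date_expires <= ?", (now,))
        # 2. Active tokens no longer backed by a current subscription.
        invalid = conn.execute(
            "SELECT t.id, t.archive FROM token AS t "
            "WHERE t.date_deactivated IS NULL AND NOT EXISTS ("
            "  SELECT 1 FROM subscription AS s"
            "  WHERE s.archive = t.archive AND s.subscriber = t.subscriber"
            "  AND s.status = 'CURRENT')").fetchall()
        conn.executemany(
            "UPDATE token SET date_deactivated = ? WHERE id = ?",
            [(now, token_id) for token_id, _ in invalid])
        # 3. Rewrite files only for archives that lost a token.
        affected = sorted({archive for _, archive in invalid})
        for archive in affected:
            write_htaccess(conn, archive)
    return affected
```

This pass only finds archives that lost tokens; it does not notice newly created ones.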

Michael Nelson (michael.nelson) wrote :

On Thu, Sep 2, 2010 at 10:17 PM, Robert Collins
<email address hidden> wrote:
> Is there any reason this can't be:
>  - query expired subscriptions; deactivate

That is possible (and I think would be easy)

>  - query invalid tokens; deactivate

This *may* be possible. I can see how we could do one query per PPA,
and that should generalise for all PPAs. I'll look into it.

>  - output new htaccess files for affected ppas

That will ensure that people who no longer have valid tokens can no
longer access the PPAs, but it won't help us update those passwd files
where there are new tokens. AFAICS, the only way currently to know we
have a token that is not yet in the passwd file is to check the passwd
file.
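A sketch of that check, assuming the passwd file uses htpasswd's one `user:hash` entry per line and that each active token maps to one username (both assumptions for illustration; the function names are hypothetical). It reports the usernames that need adding and those that should be removed; the file can be left alone only when both sets are empty.

```python
def htpasswd_users(text):
    """Usernames from htpasswd-style 'user:hash' lines; blank lines skipped."""
    users = set()
    for line in text.splitlines():
        user, sep, _ = line.strip().partition(":")
        if sep and user:
            users.add(user)
    return users


def htpasswd_drift(current_text, active_usernames):
    """Return (missing, stale) username sets for one PPA's passwd file."""
    present = htpasswd_users(current_text)
    active = set(active_usernames)
    missing = active - present  # new tokens not yet in the file
    stale = present - active    # deactivated tokens still in the file
    return missing, stale
```

This still means reading every PPA's file on every run, which is the cost the narrowing below tries to avoid.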

That said, we could narrow down the number of PPAs we update using the
tokens.date_created timestamp (i.e., by default only add PPAs with
tokens created in the last 5 minutes to the affected PPAs, with the
window, e.g. double the cron interval, configurable as a parameter).

>  - commit-and-done?
>
> The incremental work at any point should be pretty small, right, if we
> are running this every (say) minute?

Sounds good - thanks for the pointers Robert.

Julian Edwards (julian-edwards) wrote :

On Friday 03 September 2010 08:55:10 Michael Nelson wrote:
> That said, we could narrow down the number of ppas we update using the
> tokens.date_created timestamp (ie. by default only add ppas with new
> tokens created in the last 5 minutes - or double the cron time etc.,
> as a parameter - to the affected ppas)

Ah this is a better strategy. But rather than guessing at the time, can we
use the entry in scriptactivity to see when we last ran successfully?
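A sketch of choosing the cutoff that way, with the fixed window as a fallback when there is no record of a previous successful run. It assumes the activity record provides the *start* time of the last successful run: using the start rather than the finish avoids missing tokens created while that run was in progress. The function names and the `(ppa_name, date_created)` token layout are hypothetical.

```python
from datetime import datetime, timedelta


def token_cutoff(last_success_started, now, cron_interval):
    """Start of the window of new tokens to look at."""
    if last_success_started is not None:
        return last_success_started
    # No record of a successful run: fall back to twice the cron interval.
    return now - 2 * cron_interval


def ppas_with_new_tokens(tokens, cutoff):
    """tokens: iterable of (ppa_name, date_created) pairs (hypothetical)."""
    # >= rather than >: regenerating a PPA twice is harmless, missing one is not.
    return sorted({ppa for ppa, created in tokens if created >= cutoff})
```

If a run fails, the last successful start stays put, so the next run automatically widens its window to cover the gap.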

Michael Nelson (michael.nelson) wrote :

On Fri, Sep 3, 2010 at 11:16 AM, Julian Edwards
<email address hidden> wrote:
> Ah this is a better strategy.  But rather than guessing at the time, can we
> use the entry in scriptactivity to see when we last ran successfully?

That would be perfect :)

Michael Nelson (michael.nelson) wrote :

Preliminary results from running the cronscript on dogfood when there are no changes (no new tokens or tokens to deactivate):

Before: 33 seconds, after applying the diff from the above branch: 1-2 seconds.

Details here: http://pastebin.ubuntu.com/487832/

Same performance when new tokens are activated or existing ones deactivated.

Michael Nelson (michael.nelson) wrote :

QA on dogfood: https://pastebin.canonical.com/36739/ (of r11493)

Launchpad QA Bot (lpqabot) wrote : Bug fixed by a commit
Changed in soyuz:
milestone: none → 10.09
tags: added: qa-needstesting
Changed in soyuz:
status: In Progress → Fix Committed
Robert Collins (lifeless) wrote : Re: [Bug 628711] Re: generate-ppa-htaccess is too slow

\o/

tags: added: qa-ok
removed: qa-needstesting
Curtis Hovey (sinzui)
Changed in soyuz:
status: Fix Committed → Fix Released
Michael Nelson (michael.nelson) wrote :

The logs look much healthier for generate-ppa-htaccess:

2010-09-28 00:10:06 INFO Creating lockfile:/var/lock/launchpad-generate-ppa-htaccess.lock
2010-09-28 00:10:23 INFO Starting the PPA .htaccess generation
2010-09-28 00:10:23 INFO Committing transaction...
2010-09-28 00:10:24 INFO Finished PPA .htaccess generation
2010-09-28 00:10:33 DEBUG Removing lock file:/var/lock/launchpad-generate-ppa-htaccess.lock

I've requested that the cronjob be updated to every minute (RT 41629).
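The phase timings can be read straight off the log above; this small sketch does the subtraction. The five log lines are copied verbatim, and the parsing assumes the `YYYY-MM-DD HH:MM:SS` prefix shown there.

```python
from datetime import datetime

LOG = """\
2010-09-28 00:10:06 INFO Creating lockfile:/var/lock/launchpad-generate-ppa-htaccess.lock
2010-09-28 00:10:23 INFO Starting the PPA .htaccess generation
2010-09-28 00:10:23 INFO Committing transaction...
2010-09-28 00:10:24 INFO Finished PPA .htaccess generation
2010-09-28 00:10:33 DEBUG Removing lock file:/var/lock/launchpad-generate-ppa-htaccess.lock
"""


def phase_seconds(log_text):
    """Split a run into setup, script and teardown time using the log stamps."""
    stamps = [datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
              for line in log_text.splitlines() if line.strip()]
    locked, started, _committing, finished, unlocked = stamps
    return {
        "setup": (started - locked).total_seconds(),
        "script": (finished - started).total_seconds(),
        "teardown": (unlocked - finished).total_seconds(),
    }


print(phase_seconds(LOG))
# {'setup': 17.0, 'script': 1.0, 'teardown': 9.0}
```

So the script itself now takes about 1 second instead of 1:30, while setup and teardown together (26 seconds) are roughly where they were.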

Julian Edwards (julian-edwards) wrote :

On Tuesday 28 September 2010 16:50:11 you wrote:
> The logs look much healthier for generate-ppa-htaccess:
>
> 2010-09-28 00:10:06 INFO Creating lockfile:/var/lock/launchpad-generate-ppa-htaccess.lock
> 2010-09-28 00:10:23 INFO Starting the PPA .htaccess generation
> 2010-09-28 00:10:23 INFO Committing transaction...
> 2010-09-28 00:10:24 INFO Finished PPA .htaccess generation
> 2010-09-28 00:10:33 DEBUG Removing lock file:/var/lock/launchpad-generate-ppa-htaccess.lock
>
> I've requested that the cronjob be updated to every minute (RT 41629).

It's probably easier to just make the change yourself on the private
production crontabs branch and get them to roll it out - the branch is not
managed by PQM :(
