patchwork:stable/2.1

Last commit made on 2020-04-14
Get this branch:
git clone -b stable/2.1 https://git.launchpad.net/patchwork

Branch information

Name:
stable/2.1
Repository:
lp:patchwork

Recent commits

2a940ff... by Daniel Axtens on 2020-04-14

Post-release version bump

Signed-off-by: Daniel Axtens <email address hidden>

2dd5d6c... by Daniel Axtens on 2020-04-14

Release 2.1.6

Signed-off-by: Daniel Axtens <email address hidden>

eddea66... by Daniel Axtens on 2020-03-17

REST: Add release note for faster queries

Didn't quite seem like it fit anywhere else in the series. I want
the release note mostly because I hope to backport this to stable.

Signed-off-by: Daniel Axtens <email address hidden>
Reviewed-by: Stephen Finucane <email address hidden>
(cherry picked from commit 271d91341e0e9ebea13fcfc46fb2f68106753c66)
Signed-off-by: Daniel Axtens <email address hidden>

90d85e2... by Daniel Axtens on 2020-03-17

REST: extend performance improvements to other parts of the API

We can trivially extend what we've just done to other parts of the API.

I haven't done much benchmarking, but we're seeing speedups of several
times pretty much across the board when filtering.

Signed-off-by: Daniel Axtens <email address hidden>
Reviewed-by: Stephen Finucane <email address hidden>
(backported from commit 046aa155c3bf827691bab9e1df8916c969a30d54
 - dropped tests, it depends on a test we don't carry
 - rejigged to suit old M:N series model and old API)
Signed-off-by: Daniel Axtens <email address hidden>

4e3100f... by Daniel Axtens on 2020-03-17

REST: fix patch listing query

The patch listing query is punishingly slow under even very simple
filters.

The new data model in 3.0 will help _a lot_, so this is a simple fix: I
did try indexes but haven't dug deeply into what we can do with them.

Move a number of things from select_related to prefetch_related: we trade off
one big, inefficient query for a slightly larger number of significantly more
efficient queries.
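
The trade-off described above can be sketched without Django at all. The
following is only an illustration, assuming a toy two-table schema (the
`patch` and `project` tables and both function names are hypothetical, not
Patchwork's models): one function fetches everything with a single JOIN, the
way select_related does, and the other issues one query per table and stitches
the rows together in Python, the way prefetch_related does.

```python
import sqlite3

# Toy schema, assumed for illustration only (not Patchwork's real tables).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE project (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE patch (
        id INTEGER PRIMARY KEY,
        name TEXT,
        project_id INTEGER REFERENCES project(id)
    );
    INSERT INTO project VALUES (1, 'kernel'), (2, 'tools');
    INSERT INTO patch VALUES (1, 'fix a', 1), (2, 'fix b', 1), (3, 'fix c', 2);
""")


def fetch_joined(conn):
    """select_related style: a single query that joins the related table."""
    return conn.execute(
        "SELECT patch.id, patch.name, project.name "
        "FROM patch INNER JOIN project ON patch.project_id = project.id "
        "ORDER BY patch.id"
    ).fetchall()


def fetch_prefetched(conn):
    """prefetch_related style: one query per table, combined in Python."""
    patches = conn.execute(
        "SELECT id, name, project_id FROM patch ORDER BY id"
    ).fetchall()
    # Second query: fetch only the projects the patches actually reference.
    project_ids = sorted({project_id for _, _, project_id in patches})
    placeholders = ", ".join("?" for _ in project_ids)
    projects = dict(conn.execute(
        f"SELECT id, name FROM project WHERE id IN ({placeholders})",
        project_ids,
    ).fetchall())
    return [(pid, name, projects[project_id])
            for pid, name, project_id in patches]
```

Both functions return the same rows; which one is faster depends on the
database and the data, which is why the numbers below were measured rather
than assumed.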

On my laptop with 2 copies of the canonical kernel team list loaded into the
database, and considering only the API view (the JSON view is always faster)
with warm caches and considering the entire set of SQL queries:

 - /api/patches/?project=1
    ~1.4-1.5s -> <100ms, something like 14x better

 - /api/patches/?project=1&since=2010-11-01T00:00:00
    ~1.7-1.8s -> <560ms, something like 3x better (now dominated by the
                         counting query only invoked on the HTML API view,
                         not the pure JSON API view.)

The things I moved:

 * project: this was generating SQL that looked like:

   INNER JOIN `patchwork_project` T5
    ON (`patchwork_submission`.`project_id` = T5.`id`)

   This is correct but we've already had to join the patchwork_submission
   table and perhaps as a result it seems to be inefficient.

 * series__project: Likewise we've already had to join the series table,
   doing another join is possibly why it is inefficient.

 * delegate: I do not know why this was tanking performance. I think it
   might relate to the strategy mysql was using.

Reported-by: Konstantin Ryabitsev <email address hidden>
Signed-off-by: Daniel Axtens <email address hidden>
Reviewed-by: Stephen Finucane <email address hidden>
(backported from commit 98a2d051372dcedb889c4cb94ebd8ed7b399b522
 - dropped tests, it depends on a test we don't carry
 - rejigged to suit old M:N series model)
Signed-off-by: Daniel Axtens <email address hidden>

913f195... by Daniel Axtens on 2020-03-17

REST: massively improve the patch counting query under filters

The DRF web view counts the patches as part of pagination.

The query it uses is a disaster zone:

  SELECT Count(*) FROM (
    SELECT DISTINCT
      `patchwork_submission`.`id` AS Col1,
      `patchwork_submission`.`msgid` AS Col2,
      `patchwork_submission`.`date` AS Col3,
      `patchwork_submission`.`submitter_id` AS Col4,
      `patchwork_submission`.`project_id` AS Col5,
      `patchwork_submission`.`name` AS Col6,
      `patchwork_patch`.`submission_ptr_id` AS Col7,
      `patchwork_patch`.`commit_ref` AS Col8,
      `patchwork_patch`.`pull_url` AS Col9,
      `patchwork_patch`.`delegate_id` AS Col10,
      `patchwork_patch`.`state_id` AS Col11,
      `patchwork_patch`.`archived` AS Col12,
      `patchwork_patch`.`hash` AS Col13,
      `patchwork_patch`.`patch_project_id` AS Col14,
      `patchwork_patch`.`series_id` AS Col15,
      `patchwork_patch`.`number` AS Col16,
      `patchwork_patch`.`related_id` AS Col17
    FROM `patchwork_patch`
    INNER JOIN `patchwork_submission`
      ON (`patchwork_patch`.`submission_ptr_id`=`patchwork_submission`.`id`)
    WHERE `patchwork_submission`.`project_id`=1
  )

This is because django-filters adds a DISTINCT qualifier on a
ModelMultiChoiceFilter by default. I guess it makes sense and they do a
decent job of justifying it, but it causes the count to be made with
this awful subquery. (The justification is that they don't know if you're
filtering on a to-many relationship, in which case there could be
duplicate values that need to be removed.)
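
A small sketch of why DISTINCT matters for to-many filters but not for a plain
foreign key. The schema is assumed for illustration (the `tag` table and the
`count` helper are hypothetical, and SQLite stands in for MySQL): filtering
through a to-many join repeats a patch once per matching row, while filtering
on the patch's own project column cannot produce duplicates.

```python
import sqlite3

# Toy schema, assumed for illustration: a patch can carry many tags.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patch (id INTEGER PRIMARY KEY, project_id INTEGER);
    CREATE TABLE tag (patch_id INTEGER, label TEXT);
    INSERT INTO patch VALUES (1, 1), (2, 1);
    INSERT INTO tag VALUES (1, 'acked'), (1, 'reviewed'), (2, 'acked');
""")


def count(sql):
    """Run a counting query and return the single integer it produces."""
    return conn.execute(sql).fetchone()[0]


# To-many filter: patch 1 matches two tag rows, so it is counted twice.
to_many_raw = count(
    "SELECT COUNT(*) FROM patch "
    "JOIN tag ON tag.patch_id = patch.id "
    "WHERE tag.label IN ('acked', 'reviewed')"
)

# DISTINCT removes the duplicates, at the cost of a subquery.
to_many_distinct = count(
    "SELECT COUNT(*) FROM ("
    "  SELECT DISTINCT patch.id FROM patch "
    "  JOIN tag ON tag.patch_id = patch.id "
    "  WHERE tag.label IN ('acked', 'reviewed')"
    ") AS sub"
)

# To-one filter: each patch has exactly one project, so DISTINCT is redundant.
to_one_plain = count("SELECT COUNT(*) FROM patch WHERE project_id = 1")
to_one_distinct = count(
    "SELECT COUNT(*) FROM ("
    "  SELECT DISTINCT id, project_id FROM patch WHERE project_id = 1"
    ") AS sub"
)
```

Here the to-many counts differ (3 rows against 2 patches), while the two
to-one counts agree, which is the case the project filter falls into.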

While fixing that, we can also tell the filter to filter on patch_project
rather than submission's project, which allows us in some cases to avoid
the join entirely.

The resultant SQL is beautiful when filtering by project only:

  SELECT COUNT(*) AS `__count`
  FROM `patchwork_patch`
  WHERE `patchwork_patch`.`patch_project_id` = 1

On my test setup (2x canonical kernel mailing list in the db, warm cache,
my laptop) this query goes from >1s to ~10ms, a ~100x improvement.

If we filter by project and date the query is still nice, but still also
very slow:

  SELECT COUNT(*) AS `__count`
  FROM `patchwork_patch`
  INNER JOIN `patchwork_submission`
    ON (`patchwork_patch`.`submission_ptr_id`=`patchwork_submission`.`id`)
  WHERE (`patchwork_patch`.`patch_project_id`=1 AND `patchwork_submission`.`date`>='2010-11-01 00:00:00')

This takes us from ~1.3s to a bit under 400ms - still not ideal, but I'll
take the 3x improvement!

Reported-by: Konstantin Ryabitsev <email address hidden>
Signed-off-by: Daniel Axtens <email address hidden>
Reviewed-by: Stephen Finucane <email address hidden>
(backported from commit 97155c0bc8881787f6c536031b678a4c3f89bda6
  old django-filters uses 'name' instead of 'field_name')
Signed-off-by: Daniel Axtens <email address hidden>

f40bcd0... by Mete Polat <email address hidden> on 2020-01-29

REST: Fix duplicate project queries

Eliminates duplicate project queries caused by calling
get_absolute_url() in the embedded serializers. Following foreign keys
with 'series__project' will cache the project of the series as well as
the series itself.
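
The duplicate-query pattern can be reproduced in miniature. This is a sketch
under stated assumptions, not Patchwork's serializer code: the schema, the
`CountingDB` wrapper, the function names, and the URL layout are all
hypothetical. Building a URL per series with a lazy project lookup issues one
extra query per row, whereas following the foreign key in the original query
needs only one.

```python
import sqlite3


class CountingDB:
    """Thin wrapper that counts how many queries are issued."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.queries = 0
        # Toy schema, assumed for illustration only.
        self.conn.executescript("""
            CREATE TABLE project (id INTEGER PRIMARY KEY, linkname TEXT);
            CREATE TABLE series (id INTEGER PRIMARY KEY, project_id INTEGER);
            INSERT INTO project VALUES (1, 'kernel');
            INSERT INTO series VALUES (10, 1), (11, 1), (12, 1);
        """)

    def query(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def urls_lazy(db):
    """One query for the series, then one project lookup per series."""
    urls = []
    for series_id, project_id in db.query(
            "SELECT id, project_id FROM series ORDER BY id"):
        # The same project row is fetched again for every series.
        (linkname,), = db.query(
            "SELECT linkname FROM project WHERE id = ?", (project_id,))
        urls.append(f"/project/{linkname}/series/{series_id}/")
    return urls


def urls_joined(db):
    """Follow the foreign key up front so the project arrives with the row."""
    rows = db.query(
        "SELECT series.id, project.linkname FROM series "
        "INNER JOIN project ON series.project_id = project.id "
        "ORDER BY series.id")
    return [f"/project/{linkname}/series/{series_id}/"
            for series_id, linkname in rows]
```

With three series in one project, the lazy version issues four queries and the
joined version issues one, for identical output.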

Signed-off-by: Mete Polat <email address hidden>
Signed-off-by: Stephen Finucane <email address hidden>
Closes: #335
(backported from commit ec00daae4d79bf2560034e1b2bc3cf76a98a3212
  dropped all the tests, they clash horribly)
Signed-off-by: Daniel Axtens <email address hidden>

5805143... by Konstantin Ryabitsev <email address hidden> on 2020-01-27

Handle pull requests with random trailing space

Another fix for copy-pasted pull requests, this time for cases
when something is copy-pasted from a terminal and retains all
the bogus trailing whitespace.

Example:
https://<email address hidden>

Signed-off-by: Konstantin Ryabitsev <email address hidden>
Reviewed-by: Stephen Finucane <email address hidden>
(cherry picked from commit 1633afe5b46042d522777f66b1959a82298d0ab2)

aa75400... by Stephen Finucane <email address hidden> on 2019-11-30

parser: Fix style issues

NOTE(stephenfin): Modified to fix another latent style issue that's
popping up here.

Signed-off-by: Stephen Finucane <email address hidden>
Fixes: 8c229caa ("Improve pull request URL matching regex")
(cherry picked from commit 3ed6a1434dd90f073f5db5a6e85a80469aaaed40)

86137cc... by Konstantin Ryabitsev <email address hidden> on 2019-11-16

Improve pull request URL matching regex

When git-request-pull output is pasted into a mail client instead of
mailed directly, the ref part of the pull URL may end up wrapped to the
next line.

Example: https://<email address hidden>/

This change properly parses URLs both with and without newlines.
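
For illustration, here is a minimal sketch of a pattern that tolerates both a
wrapped ref and trailing whitespace. It is not the regex from this commit: the
pattern, the name `PULL_URL_RE`, and the helper `find_pull_url` are
hypothetical, and the sketch assumes the URL follows the usual
git-request-pull phrase "are available in the Git repository at:".

```python
import re

# Hypothetical pattern, not Patchwork's: match the URL after the
# git-request-pull phrase, then an optional ref that may sit on the same
# line or be wrapped onto the next one, ignoring trailing spaces and tabs.
PULL_URL_RE = re.compile(
    r"are available in the git repository at:\s*\n"
    r"\s*(?P<url>\S+://\S+)"
    r"(?:[ \t]*\n?[ \t]*(?P<ref>\S+))?"
    r"[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def find_pull_url(body):
    """Return 'URL REF' (or just 'URL') from a mail body, or None."""
    match = PULL_URL_RE.search(body)
    if match is None:
        return None
    url, ref = match.group("url"), match.group("ref")
    # Re-join a wrapped ref with a single space.
    return f"{url} {ref}" if ref else url
```

A ref is only accepted when it is the sole token on its line, so ordinary
prose on the line after the URL is not mistaken for a wrapped ref.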

Signed-off-by: Konstantin Ryabitsev <email address hidden>
Reviewed-by: Andrew Donnellan <email address hidden>
(cherry picked from commit 8c229caa71d58e971708ac7ebdb02d6858cd2a4c)