Merge ~ahasenack/ubuntu/+source/pyzmq:eoan-pyzmq-test-poll-timeout-bump into ~ahasenack/ubuntu/+source/pyzmq:master

Proposed by Andreas Hasenack
Status: Merged
Approved by: Andreas Hasenack
Approved revision: fa3aeae065c912676f07abd8a50bcc6e5bf0dc9a
Merged at revision: fa3aeae065c912676f07abd8a50bcc6e5bf0dc9a
Proposed branch: ~ahasenack/ubuntu/+source/pyzmq:eoan-pyzmq-test-poll-timeout-bump
Merge into: ~ahasenack/ubuntu/+source/pyzmq:master
Diff against target: 68 lines (+34/-1)
4 files modified
debian/changelog (+7/-0)
debian/control (+2/-1)
debian/patches/bump-poll-timeout-auth-test.patch (+24/-0)
debian/patches/series (+1/-0)
Reviewer Review Type Date Requested Status
Christian Ehrhardt  (community) Approve
Bryce Harrington (community) Approve
Review via email: mp+370368@code.launchpad.net

Description of the change

This package isn't imported into git-ubuntu, so I imported the debian package using gbp (that's the "master" branch) and branched off with my changes. I'm proposing my changes against this master branch.

pyzmq is failing DEP8 in autopkgtest.ubuntu.com, specifically the test_curve_user_id test. This also runs during package build, where it passes, but not as DEP8, and not in the autopkgtest.u.c infrastructure. The history is quite bad also: http://autopkgtest.ubuntu.com/packages/pyzmq/eoan/amd64

I couldn't reproduce it reliably enough in a local vm, with or without proposed, to troubleshoot. What I came up with here is to just bump the poll timeout. I never saw the error locally with that bump, but, as I said, not reliable. I might just not have tried long enough.

To post a comment you must log in.
Revision history for this message
Andreas Hasenack (ahasenack) wrote :
Revision history for this message
Bryce Harrington (bryce) wrote :

The LP generated debdiff appears to be against a test build, so I constructed one against what's in eoan: http://paste.ubuntu.com/p/GBS8nD68Zg/

- Verified upstream-maintainer is updated
- Verified patch is correctly listed in debian/patches/series
- Verified debdiff targets correct release codename

Bumping the timeout seems safe and I don't have any strong arguments *against* doing so, and the packaging changes LGTM, so I feel comfortable giving a provisional +1 approval.

However, my guess is this will only partially solve the actual issue, so further thinking below:

The test cases in test_auth.py appear to be doing a bunch of connection tests against localhost. One should think these connections would be super-speedy. If they don't connect in under a second, then while giving it a larger timeout should reduce the frequency of the failures, presumably there'll still be cases where it will fail. I wonder if maybe a) use an even larger timeout??, b) skip the test if upstream already runs it "enough", or c) investigate why this particular test fails.

Spot checking the failure logs, the failures due indeed all seem to be occurring with the test_curve_user_id test case, although sometimes are under python2 and sometimes python3 so its an intermittent failure. The proposed patch would increase the timeout for the can_connect() routine, but there are a number of test cases in test_auth.py which call this routine so why is it this one test where that call fails? If the root cause here was simply machine sluggishness I would expect the failures to occur semi-randomly across a variety of callers; the test case consistency makes me wonder if there is a legit bug.

The most notable thing test_curve_user_id() is doing differently from the other test cases is setting up CURVE-based authentication certs. I wouldn't think the mere tacking on of some certs would cause this much delay, and wonder if something is attempting a network lookup somewhere under the hood. The failure errors in the log file aren't detailed enough to explore fully, but digging through the code I got as far as zmq/backend/cffi/_poll.py:zmq_poll() which has a 'while True' loop that might be where things are getting stuck; I couldn't guess why, but maybe tracing into this (or adding debug print's) would verify it's hanging in this routine, and at what part.

If bumping the timeout doesn't eliminate the test failures entirely, it would probably be a good idea to ensure there's a bug report for this for further investigation. If it ends up being more than just a sluggish machine, then this could be worth raising with upstream - particularly if a reproducible test case can be found.

review: Approve
Revision history for this message
Andreas Hasenack (ahasenack) wrote :

If this still fails, then I'm inclined to stop this test from running and ask for help from upstream indeed, as using the ubuntu archive to troubleshoot this is far from optimal. If only it happened more reliably locally.

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

Debian uploaded a -3 release, and we synced that automatically. It's failing again in excuses, and in ppc64el too now.

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

I rebased

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

The status still is WIP, did the "I rebased" mean this is fully ready for re-review?

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

Not yet, I was trying a bit harder to reproduce the failures locally. I'll switch the status back to needs review when it's ready (even if there are no changes)

Revision history for this message
Andreas Hasenack (ahasenack) wrote :
Revision history for this message
Andreas Hasenack (ahasenack) wrote :

But didn't try with the unchanged timeout, so I still don't know if bileto is a good enough replica of the real archive testing environment.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

The change itself looks fine, if you can confirm the issue triggers with the unchanged timeout I think you are good to push this to Eoan and to Debian to get rid of the delta at some point.

OTOH Debian might suggest doing the same upstream instead, anyway up to you and probably fine for now.

review: Approve
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Since the target is in your repo I can't add back (and could only consume) canonical-server - please add it back yourself.

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

Reverting the timeout bump triggered a failure in bileto in the same curve_id test:

https://bileto.ubuntu.com/excuses/3763/eoan.html

\o/

I'm uploading with the timeout bump then.

Revision history for this message
Andreas Hasenack (ahasenack) wrote :

This is not in g-u, so no tag to push, just a regular upload:
$ dput ubuntu ../pyzmq_17.1.2-3ubuntu1_source.changes
Checking signature on .changes
gpg: ../pyzmq_17.1.2-3ubuntu1_source.changes: Valid signature from AC983EB5BF6BCBA9
Checking signature on .dsc
gpg: ../pyzmq_17.1.2-3ubuntu1.dsc: Valid signature from AC983EB5BF6BCBA9
Uploading to ubuntu (via ftp to upload.ubuntu.com):
  Uploading pyzmq_17.1.2-3ubuntu1.dsc: done.
  Uploading pyzmq_17.1.2-3ubuntu1.debian.tar.xz: done.
  Uploading pyzmq_17.1.2-3ubuntu1_source.buildinfo: done.
  Uploading pyzmq_17.1.2-3ubuntu1_source.changes: done.
Successfully uploaded packages.

That was fa3aeae065c912676f07abd8a50bcc6e5bf0dc9a

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1diff --git a/debian/changelog b/debian/changelog
2index 33dae4a..54e35df 100644
3--- a/debian/changelog
4+++ b/debian/changelog
5@@ -1,3 +1,10 @@
6+pyzmq (17.1.2-3ubuntu1) eoan; urgency=medium
7+
8+ * d/p/bump-poll-timeout-auth-test.patch: increase poll timeout to
9+ two seconds to avoid test failure on slow machines.
10+
11+ -- Andreas Hasenack <andreas@canonical.com> Tue, 23 Jul 2019 11:56:58 -0300
12+
13 pyzmq (17.1.2-3) unstable; urgency=medium
14
15 * Team upload.
16diff --git a/debian/control b/debian/control
17index 9ae5f9b..d3451e8 100644
18--- a/debian/control
19+++ b/debian/control
20@@ -1,7 +1,8 @@
21 Source: pyzmq
22 Section: python
23 Priority: optional
24-Maintainer: Debian Python Modules Team <python-modules-team@lists.alioth.debian.org>
25+Maintainer: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
26+XSBC-Original-Maintainer: Debian Python Modules Team <python-modules-team@lists.alioth.debian.org>
27 Uploaders: Julian Taylor <jtaylor.debian@googlemail.com>,
28 Vincent Bernat <bernat@debian.org>,
29 Laszlo Boszormenyi (GCS) <gcs@debian.org>
30diff --git a/debian/patches/bump-poll-timeout-auth-test.patch b/debian/patches/bump-poll-timeout-auth-test.patch
31new file mode 100644
32index 0000000..655b7b3
33--- /dev/null
34+++ b/debian/patches/bump-poll-timeout-auth-test.patch
35@@ -0,0 +1,24 @@
36+Description: Bump poll timeout to avoid test error on slower machines
37+ Not forwarded yet because this is hard to reproduce outside of the Ubuntu
38+ autopkgtest infrastructure and we first want to see if it helps our case.
39+Author: Andreas Hasenack <andreas@canonical.com>
40+Forwarded: no
41+Last-Update: 2019-07-18
42+---
43+This patch header follows DEP-3: http://dep.debian.net/deps/dep3/
44+diff --git a/zmq/tests/test_auth.py b/zmq/tests/test_auth.py
45+index 003f171..53596ca 100644
46+--- a/zmq/tests/test_auth.py
47++++ b/zmq/tests/test_auth.py
48+@@ -102,9 +102,9 @@ class TestThreadAuthentication(BaseAuthTestCase):
49+ port = server.bind_to_random_port(iface)
50+ client.connect("%s:%i" % (iface, port))
51+ msg = [b"Hello World"]
52+- if server.poll(1000, zmq.POLLOUT):
53++ if server.poll(2000, zmq.POLLOUT):
54+ server.send_multipart(msg)
55+- if client.poll(1000):
56++ if client.poll(2000):
57+ rcvd_msg = client.recv_multipart()
58+ self.assertEqual(rcvd_msg, msg)
59+ result = True
60diff --git a/debian/patches/series b/debian/patches/series
61index 7931f9e..9756f48 100644
62--- a/debian/patches/series
63+++ b/debian/patches/series
64@@ -2,3 +2,4 @@ noncopysend-test.patch
65 cffi-fix.patch
66 skip_large_send
67 fix_monitor_test.patch
68+bump-poll-timeout-auth-test.patch

Subscribers

People subscribed via source and target branches

to all changes: