mailman crashing in production

Bug #791492 reported by Robert Collins
Affects: Launchpad itself
Status: Fix Released
Importance: Critical
Assigned to: Gary Poster
Milestone: (none)

Bug Description

2011-05-25T22:16:57Z: mailman xmlrpc log not updating - restarted mailman

tags: added: canonical-losa-lp
Gary Poster (gary) wrote :

This incident appears in the very last logs available from forster, so I'm copying the log output here, with the email addresses replaced by "XXX".

May 25 20:10:27 2011 (14860) Membership updates for ubuntu-tour: ['<email address hidden> -> <email address hidden>']
May 25 22:15:34 2011 (14860) Cannot talk to Launchpad: [Errno 4] Interrupted system call
May 25 22:15:34 2011 (14860) batch: ['rosalila-studio']
May 25 22:15:35 2011 (14860) Cannot talk to Launchpad: [Errno 4] Interrupted system call
May 25 22:15:35 2011 (14860) batch: ['ubuntu-ba']
May 25 22:15:47 2011 (14860) Cannot talk to Launchpad: [Errno 4] Interrupted system call
May 25 22:15:47 2011 (14860) batch: ['ubuntu-server-ec2-testing-notifications']
May 25 22:15:53 2011 (14860) Membership updates for openstreetmap: ['<email address hidden> -> <email address hidden>']
May 25 22:17:02 2011 (17848) XMLRPC runner starting

Gary Poster (gary) wrote :

I looked through a number of Launchpad log files to try to find a connection to the "Cannot talk to Launchpad" error. I didn't find one. FTR, here's where I looked.

gary@carob:~$ bunzip2 -c /srv/launchpad.net-logs/production/soybean/launchpad135.log-20110521.bz2 | less
gary@carob:~$ bunzip2 -c /srv/launchpad.net-logs/production/soybean/launchpad134.log-20110521.bz2 | less
gary@carob:~$ bunzip2 -c /srv/launchpad.net-logs/production/soybean/launchpad133.log-20110521.bz2 | less
gary@carob:~$ bunzip2 -c /srv/launchpad.net-logs/production/soybean/launchpad132.log-20110521.bz2 | less
gary@carob:~$ bunzip2 -c /srv/launchpad.net-logs/production/soybean/launchpad131.log-20110521.bz2 | less
gary@carob:~$ bunzip2 -c /srv/launchpad.net-logs/production/soybean/launchpad130.log-20110521.bz2 | less
gary@carob:~$ bunzip2 -c /srv/launchpad.net-logs/production/soybean/launchpad129.log-20110521.bz2 | less
gary@carob:~$ bunzip2 -c /srv/launchpad.net-logs/production/soybean/launchpad128.log-20110521.bz2 | less
gary@carob:~$ bunzip2 -c /srv/launchpad.net-logs/production/soybean/launchpad127.log-20110521.bz2 | less
gary@carob:~$ bunzip2 -c /srv/launchpad.net-logs/production/soybean/launchpad126.log-20110521.bz2 | less
gary@carob:~$ bunzip2 -c /srv/launchpad.net-logs/production/soybean/launchpad125.log-20110521.bz2 | less
gary@carob:~$ bunzip2 -c /srv/launchpad.net-logs/production/soybean/launchpad124.log-20110521.bz2 | less
gary@carob:~$ bunzip2 -c /srv/launchpad.net-logs/production/soybean/launchpad123.log-20110521.bz2 | less
gary@carob:~$ bunzip2 -c /srv/launchpad.net-logs/production/soybean/launchpad122.log-20110521.bz2 | less
gary@carob:~$ bunzip2 -c /srv/launchpad.net-logs/production/chaenomeles/launchpad55.log-20110525.bz2 | less
gary@carob:~$ bunzip2 -c /srv/launchpad.net-logs/production/chaenomeles/launchpad54.log-20110525.bz2 | less
gary@carob:~$ bunzip2 -c /srv/launchpad.net-logs/production/chaenomeles/launchpad53.log-20110525.bz2 | less
gary@carob:~$ bunzip2 -c /srv/launchpad.net-logs/production/chaenomeles/launchpad52.log-20110525.bz2 | less
gary@carob:~$ bunzip2 -c /srv/launchpad.net-logs/production/chaenomeles/launchpad51.log-20110525.bz2 | less

I could have gone further, and looking at more logs might show some kind of application hang. However, it seems more likely that this is a network hang: the access logs seem to show that we were serving files just fine at the time of the top log entry in comment 1.

In any case, we most likely just need to give mailman a timeout on its LP connections. The "Cannot talk to Launchpad" string appears repeatedly in lib/lp/services/mailman/monkeypatches/xmlrpcrunner.py. I think a simple fix is to add a timeout to the xmlrpclib ServerProxy instance, using the "transport" argument. That's what I plan to do.
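
For illustration, here is a minimal sketch of the approach described above. It is not the code that actually landed. A Transport subclass sets a socket timeout on each connection and is handed to ServerProxy through its "transport" argument. The sketch is written against Python 3's xmlrpc.client, the successor of the xmlrpclib module named above. The class name TimeoutTransport, the helper make_proxy, and the 5-second default are assumptions made for the example.

```python
import xmlrpc.client


class TimeoutTransport(xmlrpc.client.Transport):
    """HTTP transport whose connections give up after `timeout` seconds.

    Hypothetical helper for illustration; not taken from the Launchpad tree.
    """

    def __init__(self, timeout=5.0, use_datetime=False):
        super().__init__(use_datetime=use_datetime)
        self.timeout = timeout

    def make_connection(self, host):
        # The base class builds (and caches) an http.client.HTTPConnection.
        # Setting .timeout before the socket is opened makes both the
        # connect and later reads fail instead of blocking forever.
        conn = super().make_connection(host)
        conn.timeout = self.timeout
        return conn


def make_proxy(url, timeout=5.0):
    """Return a ServerProxy for a plain-http URL with a socket timeout."""
    return xmlrpc.client.ServerProxy(
        url, transport=TimeoutTransport(timeout=timeout))
```

With a proxy built this way, a stalled call should raise a socket timeout error that the runner can log and retry, rather than hanging. An https endpoint would need the same override applied to xmlrpc.client.SafeTransport instead.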

Changed in launchpad:
assignee: nobody → Gary Poster (gary)
Gary Poster (gary)
Changed in launchpad:
status: Triaged → In Progress
Launchpad QA Bot (lpqabot) wrote :
tags: added: qa-needstesting
Changed in launchpad:
status: In Progress → Fix Committed
Gary Poster (gary)
tags: added: qa-ok
removed: qa-needstesting
Gary Poster (gary) wrote :

The code for this has been deployed to production appserver machines, but not to the Mailman machines. I'll make an RT for this.

Changed in launchpad:
status: Fix Committed → Fix Released