dpkg hangs during sync under 2.6.32-25-server kernel

Bug #675613 reported by Tony Travis
This bug affects 3 people
Affects: linux (Ubuntu)
Status: Expired
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

Binary package hint: coreutils

During the past two weeks, "dpkg" has hung during unattended upgrades under the 2.6.32-25-server kernel on two systems that have not previously reported any errors. The "dpkg" process goes catatonic, and the kernel's hung-task detector shows "dpkg" attempting to sync the filesystem. Running "sync" manually after "dpkg" has hung fails in the same way: the process ceases to respond to any signals, and becomes a zombie. The only way to recover the system is to reboot:

[ 4080.400070] INFO: task dpkg:21019 blocked for more than 120 seconds.
[ 4080.441107] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4080.482752] dpkg D 0000000000000000 0 21019 20910 0x00000000
[ 4080.482757] ffff88027ac9bd38 0000000000000086 0000000000015bc0 0000000000015bc0
[ 4080.482762] ffff88027749df80 ffff88027ac9bfd8 0000000000015bc0 ffff88027749dbc0
[ 4080.482766] 0000000000015bc0 ffff88027ac9bfd8 0000000000015bc0 ffff88027749df80
[ 4080.482770] Call Trace:
[ 4080.482780] [<ffffffff8155841d>] schedule_timeout+0x22d/0x300
[ 4080.482785] [<ffffffff8105df42>] ? enqueue_entity+0x122/0x1a0
[ 4080.482788] [<ffffffff8105e005>] ? enqueue_task_fair+0x45/0x90
[ 4080.482792] [<ffffffff815576c6>] wait_for_common+0xd6/0x180
[ 4080.482797] [<ffffffff8105a254>] ? try_to_wake_up+0x284/0x380
[ 4080.482800] [<ffffffff8105a350>] ? default_wake_function+0x0/0x20
[ 4080.482804] [<ffffffff8155782d>] wait_for_completion+0x1d/0x20
[ 4080.482808] [<ffffffff811666b7>] sync_inodes_sb+0x87/0xb0
[ 4080.482812] [<ffffffff8116af92>] __sync_filesystem+0x82/0x90
[ 4080.482815] [<ffffffff8116b079>] sync_filesystems+0xd9/0x130
[ 4080.482818] [<ffffffff8116b131>] sys_sync+0x21/0x40
[ 4080.482823] [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b
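
For anyone who wants to check their own logs for the same symptom, the hung-task lines above have a regular shape that is easy to search for. The following is a minimal sketch, assuming the log text has already been read into a string (for example from dmesg output); the function name `find_hung_tasks` and the regular expression are illustrative and not part of any Ubuntu tool.

```python
import re

# Matches lines such as:
#   INFO: task dpkg:21019 blocked for more than 120 seconds.
HUNG_TASK = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(log_text):
    """Return (command, pid, seconds) for every hung-task report in log_text."""
    hits = []
    for line in log_text.splitlines():
        match = HUNG_TASK.search(line)
        if match:
            hits.append(
                (match.group("comm"), int(match.group("pid")), int(match.group("secs")))
            )
    return hits


# The first line of the trace from this report, used as sample input.
sample = "[ 4080.400070] INFO: task dpkg:21019 blocked for more than 120 seconds."
print(find_hung_tasks(sample))
```

The same function can be pointed at a saved kernel log to see whether other processes besides "dpkg" were reported as blocked around the same time.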

ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: coreutils 7.4-2ubuntu3
ProcVersionSignature: Ubuntu 2.6.32-25.45-server 2.6.32.21+drm33.7
Uname: Linux 2.6.32-25-server x86_64
Architecture: amd64
Date: Mon Nov 15 15:26:55 2010
ExecutablePath: /bin/sync
InstallationMedia: Bio-Linux 6 based on Ubuntu 10.04 "Lucid Lynx" - Release Candidate amd64 (20100419.1)
ProcEnviron:
 SHELL=/bin/bash
 PATH=(custom, no user)
 LANG=en_GB.UTF-8
SourcePackage: coreutils

Tony Travis (ajtravis) wrote :

After a long delay, the manual "sync" eventually returns to the command prompt and "dpkg" can be used again.

Tony Travis (ajtravis) wrote :

The "sync" command then works as expected, and returns to the command prompt after a few seconds.

C de-Avillez (hggdh2) wrote :

Thank you for opening this bug and helping make Ubuntu better.

I am confused: you report an issue with dpkg, assign the package as coreutils, and show a kernel OOPS. I am pretty sure coreutils has nothing to do with it...

Changed in coreutils (Ubuntu):
status: New → Incomplete
Tony Travis (ajtravis) wrote : Re: [Bug 675613] Re: dpkg hangs during sync under 2.6.32-25-server kernel

On 15/11/10 19:21, C de-Avillez wrote:
> Thank you for opening this bug and helping make Ubuntu better.
>
> I am confused: you report an issue with dpkg, assign the package as
> coreutils, and show a kernel OOPS. I am pretty sure coreutils has
> nothing to do with it...
>
> ** Changed in: coreutils (Ubuntu)
> Status: New => Incomplete

Sorry my report was confusing: The "dpkg" command hung up during an
unattended upgrade. I could not kill the "dpkg" process, and after
rebooting I encountered the same problem again but this time with a
different package being installed. I encountered a very similar problem
on a colleague's machine running the same 2.6.32-25-server kernel.
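
A process that cannot be killed like this is usually in uninterruptible sleep, which the kernel reports as state "D" (the same letter that follows "dpkg" in the hung-task trace in the description). The sketch below lists such processes by reading /proc directly. It assumes a Linux system with /proc mounted, and the helper names `parse_stat_line` and `uninterruptible_tasks` are made up for this illustration.

```python
import glob


def parse_stat_line(line):
    """Return (pid, comm, state) from the contents of one /proc/<pid>/stat file."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split on the first "(" and the last ")".
    left = line.index("(")
    right = line.rindex(")")
    pid = int(line[:left])
    comm = line[left + 1:right]
    state = line[right + 1:].split()[0]
    return pid, comm, state


def uninterruptible_tasks():
    """List (pid, comm) for every process currently in state D."""
    tasks = []
    for path in glob.glob("/proc/[0-9]*/stat"):
        try:
            with open(path) as f:
                pid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError):
            continue  # process exited, or the line was not in the expected form
        if state == "D":
            tasks.append((pid, comm))
    return tasks


if __name__ == "__main__":
    for pid, comm in uninterruptible_tasks():
        print(pid, comm)
```

Processes pass through state D briefly during normal disk I/O, so a single sighting proves little; it is a process that stays in the list across repeated runs that matches the symptom described here.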

The kernel messages indicate that the "dpkg" process was attempting to
sync the filesystem, so I tried running "sync" manually to see if that
would work. The sync process blocked when a zombie "dpkg" was present
but, after an hour or so, it eventually completed and I could use "dpkg"
again. I reported it as an issue with coreutils because "sync" hung when
"dpkg" is blocked. I'm not sure if the bug is in "dpkg" or if it's a
kernel issue: Similar bugs have been filed about "dpkg" hangs during
package installation, but AFAIK none have linked this problem to the
"sync" command or Kernel 'sync' issues.

HTH,

   Tony.
--
Dr. A.J.Travis, University of Aberdeen, Rowett Institute of Nutrition
and Health, Greenburn Road, Bucksburn, Aberdeen AB21 9SB, Scotland, UK
tel +44(0)1224 712751, fax +44(0)1224 716687, http://www.rowett.ac.uk
mailto:<email address hidden>, http://bioinformatics.rri.sari.ac.uk/~ajt

C de-Avillez (hggdh2) wrote :

Hi Tony,

Yes, it does help :-)

Now, dpkg is a binary -- meaning the 'sync' used inside dpkg is not equivalent to the coreutils' 'sync' utility. It may even be that both dpkg and sync are using an fsync() call (or equivalent) -- but this does not make it a coreutils bug, it only shows, so far, the same basic issue.

Also, the kernel OOPS trace suggests the *kernel* got hung while performing a sys_sync() call (and yes, I agree that it is probable both coreutils and dpkg would end up in the same kernel path). So... whatever happened seems to have happened below either utility.
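
To make the distinction between the two calls concrete: fsync() flushes one open file, while sync() asks the kernel to write out dirty data for every mounted filesystem, which is the path (sys_sync) shown in the trace in the description. The sketch below shows both from Python, which exposes them as `os.fsync` and `os.sync`. The helper names and the temporary file are assumptions for illustration only and say nothing about how dpkg itself is implemented.

```python
import os
import tempfile


def write_and_fsync(path, data):
    """Write data to path and flush only that file to disk (fsync)."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # push Python's own buffer to the kernel
        os.fsync(f.fileno())  # ask the kernel to write this one file out


def sync_everything():
    """Ask the kernel to write out dirty data on all filesystems (sync)."""
    os.sync()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        write_and_fsync(os.path.join(scratch, "example.dat"), b"hello")
    sync_everything()
```

Because sync() covers every mounted filesystem, a single slow or unresponsive one may be enough to hold up any caller, whichever program issued the call.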

I wonder if you have remote filesystems, or some sort of disc/filesystem issue. It may be a good move to run 'apport-collect linux 675613' to collect kernel data.

Tony Travis (ajtravis) wrote :

On 15/11/10 20:35, C de-Avillez wrote:
> Hi Tony,
>
> Yes, it does help :-)
>
> Now, dpkg is a binary -- meaning the 'sync' used inside dpkg is not
> equivalent to the coreutils' 'sync' utility. It may even be that both
> dpkg and sync are using a fsync() call (or equivalent) -- but this does
> not make it a coreutils bug, it only shows, so far, the same basic
> issue.

Hi,

Yes, I realise that the "dpkg" and "sync" binaries make system calls
into the kernel but I thought it might be appropriate to report a bug
with "coreutils" because of the interaction between "sync" and "dpkg".

> Also, the kernel OOPS trace suggests the *kernel* got hung while
> performing a sys_sync() call (and yes, I agree that it is probable both
> coreutils and dpkg would end up in the same kernel path). So... whatever
> happened seems to have happened below either utility.

OK, I should have reported this as a kernel bug not a "coreutils" bug.

> I wonder if you have remote filesystems, or some sort of disc/filesystem
> issue. It may be a good move to run apport-collect linux 675613' to
> collect kernel data.

I'm running the nfs-kernel server, but the systems concerned have all
been running without any "sync" or "dpkg" problems until just recently.

An important point is that the coreutils "sync" command works properly
on my server unless "dpkg" is deadlocked after a timeout attempting to
sync() the filesystem, in which case the coreutils "sync" then blocks.
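
One way to repeat this manual check without losing the terminal is to start "sync" as a child process and stop waiting after a timeout. This is only a sketch: the helper name `command_finishes_within` and the 30-second limit are assumptions, not anything taken from this report. It uses `Popen.wait` rather than `subprocess.run(..., timeout=...)`, because the latter tries to kill and then wait for the child when the timeout expires, and that wait could itself block if the child is stuck in uninterruptible sleep.

```python
import subprocess
import sys


def command_finishes_within(argv, timeout_s):
    """Start argv and report whether it exits within timeout_s seconds.

    On timeout the child is left running and is not reaped, so the caller
    never blocks on a process that cannot be killed.
    """
    proc = subprocess.Popen(argv)
    try:
        proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return False
    return True


if __name__ == "__main__":
    # Harmless demonstration child; on an affected machine one would pass
    # ["sync"] instead.
    print(command_finishes_within([sys.executable, "-c", "pass"], 30.0))
```

A False result only says that the command did not return in time; it does not say why, so it is best read alongside the kernel log.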

Bye,

   Tony.

C de-Avillez (hggdh2) wrote :

I just noticed you are running bio-linux, *not* Ubuntu itself. I am not sure what changes the bio-linux providers have made in regard to a standard Ubuntu distro.

It is extremely important you contact the bio-linux providers: I do not know if they have made changes to the kernel, and I am afraid we will not be able to help you.

I tried looking at their website, but I did not find any way to report issues, except by an email to <email address hidden>.

Tony Travis (ajtravis) wrote :

On 16/11/10 00:07, C de-Avillez wrote:
> I just noticed you are running bio-linux, *not* Ubuntu itself. I am not
> sure what changes the bio-linux providers have made in regard to a
> standard Ubuntu distro.

Hi,

Bio-Linux is a remastered version of the standard 64-bit Ubuntu 10.04
LTS Desktop CD, with FLOSS bioinformatics software pre-installed.

The system I had the problem with "dpkg" and "sync" on is a standard
Bio-Linux (Ubuntu 10.04 LTS) Desktop install, with the server kernel and
appropriate server daemons (NFS, SSH etc.) installed to convert it into
a Bio-Linux server.

However, my colleague who had the same problem with "dpkg" and "sync" is
running a standard Ubuntu 10.04 LTS server install - NOT Bio-Linux.

> It is extremely important you contact the bio-linux providers: I do not
> know if they have made changes to the kernel, and I am afraid we will
> not be able to help you.

I'm one of the people involved in the development of Bio-Linux, and I'm
running a standard Ubuntu 10.04 LTS server kernel with NO modifications.

> I tried looking at their website, but I did not find any way to report
> issues, except by an email to <email address hidden>.

OK, I've cc'ed this email to the list: Let's close the bug report.

My system is working again and if the problem returns I'll report it as
a kernel bug instead. I'm not the only one having this type of problem
with "dpkg" stalling, so there might be an underlying kernel problem.

Thanks for your help,

   Tony.

C de-Avillez (hggdh2) wrote :

I would keep this bug open; now that you have clarified the bio-linux setup, there should not be an issue keeping it open. Now, I do hope you agree that the bio-linux pages are sort of lacking on details -- most importantly, I could not find a reference to dealing with bugs.

I will assign this bug to the linux kernel, although I am still not sure (it is certainly better than coreutils, and marginally better than dpkg). I think NFS is playing a role here, so it might be worth checking with your colleague whether her/his setup is similar (re. remote FSs).

On a new stall, please:

1. run 'sudo ubuntu-bug -P $(pidof dpkg)' # replace 'dpkg' with the name of whichever process is stalling;
2. open a *NEW* bug with the output of the command above, and add a link to this one;
3. run 'sudo apport-collect 675613' -- this will collect kernel data on this bug.

It is possible that the dpkg backtrace will give us a bit more on what was being done; it is also possible that the collected kernel data (logs, dmesg, filesystems, etc) will help. I have a vague memory of something similar...

affects: coreutils (Ubuntu) → linux (Ubuntu)
Tony Travis (ajtravis) wrote :

On 16/11/10 15:32, C de-Avillez wrote:
> I would keep this bug open; now that you have clarified the bio-linux
> setup, there should not be an issue keeping it open. Now, I do hope you
> agree that the bio-linux pages are sort of lacking on details -- most
> importantly, I could not find a reference to dealing with bugs.
>
> I will assign this bug to the linux kernel, although I am still not sure
> (it is certainly better than coreutils, and marginally better than
> dpkg). I think NFS is playing a role here, so it might be worth to check
> with your colleague if her/his setup is similar (re. remote FSs).
>
> On a new stall, please:
>
> 1. run 'sudo ubuntu-bug -P $(pidof dpkg)' # replace 'dpkg' by whatever PID stalling;

Hi,

The "sync" command (binary) was invoked by my backup script, and stalled
in a similar way to my coreutils bug report 675613. Attempting to run
the "sync" binary manually, after the "sync" started by the backup script
stalled, resulted in two stalled "sync" processes:

> root 19701 0.0 0.0 6116 848 ? Ss 07:30 0:00 anacron -s
> root 19821 0.0 0.0 4096 584 ? S 07:35 0:00 \_ /bin/sh -c nice run-parts --report /etc/cron.daily
> root 19822 0.0 0.0 4004 644 ? SN 07:35 0:00 \_ run-parts --report /etc/cron.daily
> root 20206 0.0 0.0 9228 1332 ? SN 07:47 0:00 \_ /bin/bash /etc/cron.daily/backup
> root 20227 0.0 0.0 4020 472 ? DN 07:47 0:00 \_ sync

> 2. open a *NEW* bug with the output of the command above, and add a link to this one;

OK, done.

> 3. run 'sudo apport-collect 675613' -- this will collect kernel data on this bug.

I tried, but it didn't work:

> root@bobcat:~# apport-collect 675613
> The authorization page:
> (https://edge.launchpad.net/+authorize-token?oauth_token=Fz0q8TJ4JHv4XFpJlvsH&allow_permission=WRITE_PRIVATE)
> should be opening in your browser. After you have authorized
> this program to access Launchpad on your behalf you should come
> back here and press <Enter> to finish the authentication process.
>
> Error connecting to Launchpad: Request token has not yet been reviewed. Try again later.
> You can reset the credentials by removing the file "/root/.cache/apport/launchpad.credentials"
> root@bobcat:~# ls -l /root/.cache/apport/
> total 0

>
> It is possible that the dpkg backtrace will give us a bit more on what
> was being done; it is also possible the the collected kernel data (logs,
> dmseg, filesystems, etc) will help. I have a vague memory of something
> similar...
>
> ** Package changed: coreutils (Ubuntu) => linux (Ubuntu)

Bye,

   Tony.

Tony Travis (ajtravis) wrote :

I think the same sync() problem underlies "dpkg" problems reported in [Bug 537241]

Tony Travis (ajtravis) wrote :

And also sync() problems in [Bug 624229]

C de-Avillez (hggdh2) wrote :

Yes, it does sound like it. I am trying to find a common thread in all the bugs dealing with a similar scenario. I have so far found some, spread across many different packages. I do feel that a remote FS (or a USB-attached device?) is involved in all of them.
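
Reporters of similar bugs could check quickly whether a remote filesystem is mounted on the affected machine. The sketch below scans text in the format of /proc/mounts for a few network filesystem types. The set of type names is illustrative and not exhaustive, the function name `remote_mounts` is made up for this example, and it only sees client-side mounts, so a machine that merely exports directories as an NFS server would show nothing.

```python
# Filesystem types treated as "remote" here; an illustrative, incomplete set.
REMOTE_FS_TYPES = {"nfs", "nfs4", "cifs", "smbfs", "fuse.sshfs"}


def remote_mounts(mounts_text):
    """Return (device, mount_point, fs_type) for remote entries in mounts_text.

    mounts_text is expected to look like /proc/mounts: whitespace-separated
    fields with the device first, the mount point second and the type third.
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[2] in REMOTE_FS_TYPES:
            found.append((fields[0], fields[1], fields[2]))
    return found


if __name__ == "__main__":
    try:
        with open("/proc/mounts") as f:
            for device, mount_point, fs_type in remote_mounts(f.read()):
                print(device, mount_point, fs_type)
    except FileNotFoundError:
        print("/proc/mounts is not available on this system")
```

An empty result does not rule out a storage problem; USB-attached or failing local discs would need a different check.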

tags: added: sync
Jeremy Foshee (jeremyfoshee) wrote :

This bug report was marked as Incomplete and has not had any updated comments for quite some time. As a result this bug is being closed. Please reopen if this is still an issue in the current Ubuntu development release http://cdimage.ubuntu.com/daily-live/current/ . Also, please be sure to provide any requested information that may have been missing. To reopen the bug, click on the current status under the Status column and change the status back to "New". Thanks.

[This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: kj-expired
Changed in linux (Ubuntu):
status: Incomplete → Expired