linaro-media-create crashes during media setup/partitioning

Bug #717129 reported by Zygmunt Krynicki
36
This bug affects 6 people
Affects Status Importance Assigned to Milestone
Linaro Image Tools
Fix Released
High
Georgy Redkozubov

Bug Description

zyga@fx-maverick:~/Downloads$ linaro-media-create --mmc /dev/sdd --binary=linaro-m-headless-tar-20101108-2.tar.gz --hwpack=hwpack_linaro-omap3_20101109-1_armel_supported.tar.gz --hwpack-force-yes --dev=beagle

I see...
Device Mount point Size
/dev/fd0 none 0MB
/dev/sr0 none 0MB
/dev/sda1 / 39315MB
/dev/sda2 none 1639MB
/dev/sdd none 3781MB
/dev/sde none 0MB
/dev/sda none 40960MB
/dev/sdf none 0MB
/dev/sdb none 0MB
/dev/sdd1 /media/OMAP3 117MB
/dev/sdc none 0MB
Are you 100% sure, on selecting [/dev/sdd] (y/n)? y
[sudo] password for zyga:
------------------------------------------------------------
Installing (apt-get) hwpack_linaro-omap3_20101109-1_armel_supported.tar.gz in target rootfs.
Unpacking hardware pack ...Done
Updating apt package lists ...
Ign file: ./ Release.gpg
Ign file:/tmp/tmp.eWrazkCztj/unpacked/pkgs/ ./ Translation-en
Ign file: ./ Release
Ign file: ./ Packages
Get:1 http://ppa.launchpad.net maverick Release.gpg [316B]
Ign http://ppa.launchpad.net/linaro-maintainers/overlay/ubuntu/ maverick/main Translation-en
Hit http://ports.ubuntu.com maverick Release.gpg
Ign http://ports.ubuntu.com/ maverick/main Translation-en
Get:2 http://ppa.launchpad.net maverick Release [39.8kB]
Ign http://ports.ubuntu.com/ maverick/universe Translation-en
Get:3 http://ports.ubuntu.com maverick-security Release.gpg [198B]
Ign http://ports.ubuntu.com/ maverick-security/main Translation-en
Ign http://ports.ubuntu.com/ maverick-security/universe Translation-en
Get:4 http://ports.ubuntu.com maverick-updates Release.gpg [198B]
Ign http://ports.ubuntu.com/ maverick-updates/main Translation-en
Ign http://ports.ubuntu.com/ maverick-updates/universe Translation-en
Hit http://ports.ubuntu.com maverick Release
Get:5 http://ports.ubuntu.com maverick-security Release [31.4kB]
Get:6 http://ports.ubuntu.com maverick-updates Release [31.4kB]
Hit http://ports.ubuntu.com maverick/main armel Packages
Hit http://ports.ubuntu.com maverick/universe armel Packages
Get:7 http://ppa.launchpad.net maverick/main armel Packages [11.5kB]
Get:8 http://ports.ubuntu.com maverick-security/main armel Packages [80.9kB]
Get:9 http://ports.ubuntu.com maverick-security/universe armel Packages [46.6kB]
Get:10 http://ports.ubuntu.com maverick-updates/main armel Packages [222kB]
Get:11 http://ports.ubuntu.com maverick-updates/universe armel Packages [95.4kB]
Fetched 560kB in 10s (52.8kB/s)
Reading package lists...
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following NEW packages will be installed:
  devio flash-kernel linux-firmware linux-image-2.6.35-1008-linaro-omap linux-image-linaro-omap u-boot-linaro-omap3-beagle uboot-mkimage
  wireless-crda x-loader-omap
0 upgraded, 9 newly installed, 0 to remove and 28 not upgraded.
Need to get 0B/30.6MB of archives.
After this operation, 0B of additional disk space will be used.
WARNING: The following packages cannot be authenticated!
  wireless-crda linux-image-2.6.35-1008-linaro-omap devio flash-kernel linux-firmware linux-image-linaro-omap u-boot-linaro-omap3-beagle
  uboot-mkimage x-loader-omap
debconf: delaying package configuration, since apt-utils is not installed
Can not write log, openpty() failed (/dev/pts not mounted?)
Selecting previously deselected package wireless-crda.
(Reading database ... 9522 files and directories currently installed.)
Unpacking wireless-crda (from .../wireless-crda_1.12_armel.deb) ...
Selecting previously deselected package linux-image-2.6.35-1008-linaro-omap.
Unpacking linux-image-2.6.35-1008-linaro-omap (from .../linux-image-2.6.35-1008-linaro-omap_2.6.35-1008.15_armel.deb) ...
Done.
Selecting previously deselected package devio.
Unpacking devio (from .../devio_1.2-1build1_armel.deb) ...
Selecting previously deselected package flash-kernel.
Unpacking flash-kernel (from .../flash-kernel_2.28ubuntu10_armel.deb) ...
Selecting previously deselected package linux-firmware.
Unpacking linux-firmware (from .../linux-firmware_1.38_all.deb) ...
Selecting previously deselected package linux-image-linaro-omap.
Unpacking linux-image-linaro-omap (from .../linux-image-linaro-omap_2.6.35.1008.13_armel.deb) ...
Selecting previously deselected package u-boot-linaro-omap3-beagle.
Unpacking u-boot-linaro-omap3-beagle (from .../u-boot-linaro-omap3-beagle_2010.09~rc1.1-0ubuntu3_armel.deb) ...
Selecting previously deselected package uboot-mkimage.
Unpacking uboot-mkimage (from .../uboot-mkimage_0.4build1_armel.deb) ...
Selecting previously deselected package x-loader-omap.
Unpacking x-loader-omap (from .../x-loader-omap_1.4.4git20100713-1ubuntu2_armel.deb) ...
Can not write log, openpty() failed (/dev/pts not mounted?)
Setting up wireless-crda (1.12) ...
Setting up linux-image-2.6.35-1008-linaro-omap (2.6.35-1008.15) ...
Running depmod.
update-initramfs: Generating /boot/initrd.img-2.6.35-1008-linaro-omap
/bin/df: Warning: cannot read table of mounted file systems: No such file or directory
Examining /etc/kernel/postinst.d.
run-parts: executing /etc/kernel/postinst.d/initramfs-tools 2.6.35-1008-linaro-omap /boot/vmlinuz-2.6.35-1008-linaro-omap
Setting up devio (1.2-1build1) ...
Setting up flash-kernel (2.28ubuntu10) ...
Setting up linux-firmware (1.38) ...
Setting up linux-image-linaro-omap (2.6.35.1008.13) ...
Setting up u-boot-linaro-omap3-beagle (2010.09~rc1.1-0ubuntu3) ...
Setting up uboot-mkimage (0.4build1) ...
Setting up x-loader-omap (1.4.4git20100713-1ubuntu2) ...
Done
Cleaning up ...Done
------------------------------------------------------------
proc umounted
Checking that no-one is using this disk right now ...
OK
If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)
to zero the first 512 bytes: dd if=/dev/zero of=/dev/foo7 bs=512 count=1
(See fdisk(8).)
saTraceback (most recent call last):
  File "/usr/bin/linaro-media-create", line 127, in <module>
    args.should_format_rootfs)
  File "/usr/lib/pymodules/python2.6/linaro_media_create/partitions.py", line 75, in setup_partitions
    media, board_config)
  File "/usr/lib/pymodules/python2.6/linaro_media_create/partitions.py", line 204, in get_boot_and_root_partitions_for_media
    media.path, 1 + board_config.mmc_part_offset)
  File "/usr/lib/pymodules/python2.6/linaro_media_create/partitions.py", line 221, in _get_device_file_for_partition_number
    device_path = _get_udisks_device_path(dev_file)
  File "/usr/lib/pymodules/python2.6/linaro_media_create/partitions.py", line 236, in _get_udisks_device_path
    return udisks.get_dbus_method('FindDeviceByDeviceFile')(device)
  File "/usr/lib/pymodules/python2.6/dbus/proxies.py", line 68, in __call__
    return self._proxy_method(*args, **keywords)
  File "/usr/lib/pymodules/python2.6/dbus/proxies.py", line 140, in __call__
    **keywords)
  File "/usr/lib/pymodules/python2.6/dbus/connection.py", line 620, in call_blocking
    message, timeout)
dbus.exceptions.DBusException: org.freedesktop.UDisks.Error.Failed: No such device

Related branches

Revision history for this message
Zygmunt Krynicki (zyga) wrote :

This was done from the PPA, package version was 0.4.3-0ubuntu2~linaro1

Revision history for this message
Zygmunt Krynicki (zyga) wrote :
Revision history for this message
Zygmunt Krynicki (zyga) wrote :

This time the failure is slightly different. I used a different SD card but this issue occurred on the previous card as well.

Revision history for this message
Zygmunt Krynicki (zyga) wrote :

Another failure session, this time the error is:

Traceback (most recent call last):
  File "/usr/bin/linaro-media-create", line 127, in <module>
    args.should_format_rootfs)
  File "/usr/lib/pymodules/python2.6/linaro_media_create/partitions.py", line 75, in setup_partitions
    media, board_config)
  File "/usr/lib/pymodules/python2.6/linaro_media_create/partitions.py", line 208, in get_boot_and_root_partitions_for_media
    "Could not find boot/root partition for %s" % media.path)
AssertionError: Could not find boot/root partition for /dev/sdd

Revision history for this message
Zygmunt Krynicki (zyga) wrote :

Successful session on the same 16GB SD card.

Revision history for this message
Guilherme Salgado (salgado) wrote :

The first issue reported here is really a race condition: we create the partitions using sfdisk but before udisks get notified about them we query udisks for information about them, which fails. This is related to bug 701678, but in that case we could easily change the code to no longer use udisks.

Here we might have to either use udisks for partitioning (if we do that we must do it only when formatting SD cards because we must not use udisks when building image files) or wait for udisks signals after the partitions are created and before we proceed. Although in the meantime we could just increase the sleep() in create_partitions().

Changed in linaro-image-tools:
status: New → Triaged
importance: Undecided → High
Revision history for this message
AJ ONeal (coolaj86) wrote :

The one-second sleep fix didn't work for me. I set it to 30 seconds so that I had time to remove, reinsert, and un-automount my sd card.

Now it seems to be working.

Revision history for this message
Zygmunt Krynicki (zyga) wrote :

I added a second timeout in my branch, this helped (note, the discussion indicates that there are two racy places, prolonging the delay in just one of those places does not fix my problem, I had to add another delay in different spot)

https://code.launchpad.net/~zkrynicki/linaro-image-tools/hacking

Revision history for this message
Peter Pearse (peter-pearse) wrote :

Here's another one

mkfs.vfat 3.0.9 (31 Jan 2010)
Traceback (most recent call last):
  File "/work/linaro-image-tools/linaro-media-create", line 135, in <module>
    rootfs_uuid = get_uuid(root_partition)
  File "/work/linaro-image-tools/linaro_media_create/partitions.py", line 120, in get_uuid
    blkid_output, _ = proc.communicate()
  File "/usr/lib/python2.7/subprocess.py", line 737, in communicate
    self.wait()
  File "/work/linaro-image-tools/linaro_media_create/cmd_runner.py", line 62, in wait
    raise SubcommandNonZeroReturnValue(self._my_args, returncode)
linaro_media_create.cmd_runner.SubcommandNonZeroReturnValue: Sub process "['blkid', '-o', 'udev', '-p', '-c', '/dev/null', '/dev/sdc2']" returned a non-zero value: 2

Log attached

I'll try Zygmunt's branch

Revision history for this message
Peter Pearse (peter-pearse) wrote :

Same error.

(like the media indicator Zygmunt ;-) )

Checking that no-one is using this disk right now ...
OK
Warning: partition 1 does not end at a cylinder boundary
If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)
to zero the first 512 bytes: dd if=/dev/zero of=/dev/foo7 bs=512 count=1
(See fdisk(8).)
Waiting for 5 seconds after partitioning

Formating boot partition

mkfs.vfat 3.0.9 (31 Jan 2010)
Traceback (most recent call last):
  File "/work/hacking/linaro-media-create", line 135, in <module>
    rootfs_uuid = get_uuid(root_partition)
  File "/work/hacking/linaro_media_create/partitions.py", line 120, in get_uuid
    blkid_output, _ = proc.communicate()
  File "/usr/lib/python2.7/subprocess.py", line 737, in communicate
    self.wait()
  File "/work/hacking/linaro_media_create/cmd_runner.py", line 62, in wait
    raise SubcommandNonZeroReturnValue(self._my_args, returncode)
linaro_media_create.cmd_runner.SubcommandNonZeroReturnValue: Sub process "['sudo', '-E', 'blkid', '-o', 'udev', '-p', '-c', '/dev/null', '/dev/sdc2']" returned a non-zero value: 2

Revision history for this message
Peter Pearse (peter-pearse) wrote :

Successful with Zygmunt's branch.

Revision history for this message
Loïc Minier (lool) wrote :

Peter, what's your linaro-image-tools version and how do you run linaro-media-create?

Thanks,

Revision history for this message
Zygmunt Krynicki (zyga) wrote : Re: [Bug 717129] Re: linaro-media-create crashes during media setup/partitioning

W dniu 11.03.2011 11:37, Peter Pearse pisze:
> Same error.
>
> (like the media indicator Zygmunt ;-) )

Thanks, I added that after buying 51-in-one card reader with integrated
hub and getting a ton of devices with funky names.

I need to clean up a part of my branch related to that and push it to
Salgado again.

BTW: Peter, you could try to increase the timeout to 30 seconds and
checking if that helps you (you have to edit the method signatures in
partitions.py, it's easy to find.

Thanks
ZK

Revision history for this message
Peter Pearse (peter-pearse) wrote :

Loic - from the log:-

DistUtilsExtra.auto.setup(
        name="linaro-image-tools",
        version="0.4.3.1",
        description="Tools to create and write Linaro images",
        url="https://launchpad.net/linaro-image-tools",
        license="GPL v3 or later",
        author='Linaro Infrastructure team',
        <email address hidden>",

        scripts=[
            "linaro-hwpack-create", "linaro-hwpack-install",
            "linaro-media-create"],
     )

root@ubuntu:/work/linaro-image-tools# bzr update >> /work/images/nano/20110310-0/lmc.log
Tree is up to date at revision 296 of branch /work/linaro-image-tools

linaro-media-create --mmc /dev/sdc --no-rootfs --dev vexpress --console ttyAMA0,38400n8 --binary linaro-natty-*.tar.gz --hwpack hwpack_linaro-vexpress_*_armel_supported.tar.gz

Command

Revision history for this message
Peter Pearse (peter-pearse) wrote :

Whoops - please ignore the success report above - I was running with --image_file.......

Revision history for this message
Matt Waddel (mwaddel) wrote :

Peter, the failure you're seeing looks suspiciously similar to bug #729467. I believe Loïc found it was related to the --no-rootfs option and using pre-formated cards. If you run linaro-media-create without the --no-rootfs option, then run it again it doesn't fail.

Revision history for this message
Peter Pearse (peter-pearse) wrote :

Well I prefer Zygmunt's fix - works fine on two cards (both pre-formatted)

Revision history for this message
Loïc Minier (lool) wrote :

Peter, I don't think you're using --no-rootfs, but just to confirm: are you?

Revision history for this message
Peter Pearse (peter-pearse) wrote :

Loic

linaro-media-create --mmc /dev/sdc --no-rootfs --dev vexpress --console ttyAMA0,38400n8 --binary linaro-natty-*.tar.gz --hwpack hwpack_linaro-vexpress_*_armel_supported.tar.gz

Works fine with 30sec delay as per zygmunt's branch

Revision history for this message
Alexander Sack (asac) wrote :

have I asked whether running # sudo udisks --inhibit in a second terminal helps?

try:
1. start lmc
2. when lmc asks to confirm parttion layout, run sudo udisks --inhibit command above in new terminal window (don't hit ctrl-c)
3. continue with lmc

Revision history for this message
Mattias Backman (mabac) wrote :

Hi all,

I've tried to understand this problem, let's see if I'm anywhere close.

So there are two places in create_partitions where device operations take an unknown amount of time to complete, still after the command has returned. One place is after running parted, which zyga indicates with the sleep there, the other is after the sfdisk call.

It seems to me that the proper way to handle this would be to wait for a signal that everything is done. I see that there have been a few suggestions for waiting for the device to settle or get a signal back but I guess they don't work since there's no consensus around any method? (udevadm settle was mentioned to not work in https://bugs.launchpad.net/linaro-image-tools/+bug/701678/comments/2)

Some of the failures happen when running udisks in get_boot_and_root_partitions_for_media so some way of checking that udisks have gotten notification about partition changes would help in that case. Still wouldn't help when the sfdisk call fails after parted though. So perhaps this can't be solved by a single neat fix.

A crude way around this would be to just retry the sfdisk command if it raises an Error and give up after MAX_RETRIES attempts. The same could be done for the calls to udisks methods. It's similar to the sleeps but at least won't wait longer than necessary.

Thanks,

Mattias

Revision history for this message
Loïc Minier (lool) wrote : Re: [Bug 717129] Re: linaro-media-create crashes during media setup/partitioning

On Mon, Mar 28, 2011, Mattias Backman wrote:
> Some of the failures happen when running udisks in
> get_boot_and_root_partitions_for_media so some way of checking that
> udisks have gotten notification about partition changes would help in
> that case. Still wouldn't help when the sfdisk call fails after parted
> though. So perhaps this can't be solved by a single neat fix.
>
> A crude way around this would be to just retry the sfdisk command if it
> raises an Error and give up after MAX_RETRIES attempts. The same could
> be done for the calls to udisks methods. It's similar to the sleeps but
> at least won't wait longer than necessary.

 I think you actually need both, for both cases of "unknown time is
 needed for operation to complete"; unfortunately, the sfdisk one is
 worth than you imagine: it seems sometimes the SD card NEVER gets back
 to ready state:
    http://lists.linaro.org/pipermail/linaro-dev/2011-March/003182.html
 the thread has the details.

 I think it would be worthwhile to have a sfdisk -l loop after parted
 mklabel runs, perhaps also when changing the partition tables with
 sfdisk later on.

 Another thing we could try is whether parted does something specific
 which other partitioning tools don't; for instance, perhaps Shawn's USB
 reader breaks in the case of some special USB command triggered by some
 special ioctl() which is optional.

 A completely safe way to deliver the functionality would be to change
 linaro-image-tools to always generate an image file and then write it
 to the SD with a single dd run.

 Thanks a lot for your work on this!

--
Loïc Minier

Revision history for this message
Guilherme Salgado (salgado) wrote :

In the case of the race between parted/sfdisk, I think Loïc managed to reproduce it reliably so there seems to be an inherent race there, but maybe there's a way to have the changes done by parted flushed before we run sfdisk? If not, I'd expect a very short sleep to be enough.

The other case involves udisks, so I think we either need to do all disk operations using udev (if we don't do that we'll probably end up with a race somewhere else) or wait (somehow) until udev is notified about the disk changes.

Revision history for this message
Guilherme Salgado (salgado) wrote :

On Mon, 2011-03-28 at 15:56 +0000, Loïc Minier wrote:
[...]
> A completely safe way to deliver the functionality would be to change
> linaro-image-tools to always generate an image file and then write it
> to the SD with a single dd run.

I think this would simplify a bunch of things, allow us to get rid of
some dependencies (e.g. udisks) and make lmc more robust in general.

There's also the fact that when lmc fails to write the image to the SD
card we would then be able to point the user to the created image file
and they could dd it again to the current card or a different one --
that saves them having to re-run everything because of a flaky card or
bug in lmc.

And I don't think it'd be much work to implement this.

Revision history for this message
Loïc Minier (lool) wrote :

On Mon, Mar 28, 2011, Guilherme Salgado wrote:
> In the case of the race between parted/sfdisk, I think Loïc managed to
> reproduce it reliably so there seems to be an inherent race there, but
> maybe there's a way to have the changes done by parted flushed before we
> run sfdisk? If not, I'd expect a very short sleep to be enough.

 Well, my analysis might be incorrect, but my understanding of the issue
 is that on some combinations of (system, USB SD adapater, SD card) you
 can't access your SD card anymore after "parted mklabel msdos"; I don't
 think the fact we're calling sfdisk does anything, it just fails
 "forever". But perhaps other tools than parted wouldn't cause this,
 not sure. I actually don't have a way to reproduce this, but Shawn Guo
 does.

--
Loïc Minier

Revision history for this message
Loïc Minier (lool) wrote :

On Mon, Mar 28, 2011, Guilherme Salgado wrote:
> On Mon, 2011-03-28 at 15:56 +0000, Loïc Minier wrote:
> [...]
> > A completely safe way to deliver the functionality would be to change
> > linaro-image-tools to always generate an image file and then write it
> > to the SD with a single dd run.
>
> I think this would simplify a bunch of things, allow us to get rid of
> some dependencies (e.g. udisks) and make lmc more robust in general.

 awesome, I filed bug #744449 to not forget about this change

--
Loïc Minier

Revision history for this message
Guilherme Salgado (salgado) wrote :

On Mon, 2011-03-28 at 17:53 +0000, Loïc Minier wrote:
[...]
> Well, my analysis might be incorrect, but my understanding of the issue
> is that on some combinations of (system, USB SD adapater, SD card) you
> can't access your SD card anymore after "parted mklabel msdos"; I don't
> think the fact we're calling sfdisk does anything, it just fails
> "forever". But perhaps other tools than parted wouldn't cause this,
> not sure. I actually don't have a way to reproduce this, but Shawn Guo
> does.

Right, I think I misunderstood some things in the thread where you
proposed running "parted [...] && sfdisk [...]", but I've read it again
and now it's clear.

Although in Zygmunt's case there must've been a race because otherwise
the sleep() he added between parted/sfdisk wouldn't have helped.

Revision history for this message
Loïc Minier (lool) wrote :

On Mon, Mar 28, 2011, Guilherme Salgado wrote:
> Although in Zygmunt's case there must've been a race because otherwise
> the sleep() he added between parted/sfdisk wouldn't have helped.

 Yup; in fact, Shawn's testing shows both cases where sleeping would
 help and cases where it wouldn't; it would be best if we could find a
 solution around the latter as I think it would also help with the
 former.

--
Loïc Minier

Revision history for this message
Mattias Backman (mabac) wrote :

I have looked at the code since I can't reproduce the error and can't find any obvious fix that is significantly better than the timed sleeps.

There where some more discussions last week about changing the behavior (see bug #744449) which would get rid of the cause for this bug.

Thanks,

Mattias

Revision history for this message
Guilherme Salgado (salgado) wrote :

It seems to be easy to reproduce this inside a VM. IIRC it happened on the Fedora VM I tried to run l-m-c and another use just reported he's affected by it on a Ubuntu 10.04 VM.

Revision history for this message
Fathi Boudra (fboudra) wrote :

using 2011.08 under Debian, I trigger the same issue.
I tried the work-arounds:
* udevadm settle -> doesn't work
* udisks --inhibit -> doesn't work
* 30s sleep -> works

Changed in linaro-image-tools:
status: Triaged → Confirmed
Changed in linaro-image-tools:
assignee: nobody → Georgy Redkozubov (georgy-redkozubov)
Changed in linaro-image-tools:
status: Confirmed → In Progress
Revision history for this message
Guilherme Salgado (salgado) wrote :

I am able to reproduce this by running lmc/lamc on a VM, and with that I was able to check that lp:~salgado/linaro-image-tools/717129 fixes the issue.

Georgy, maybe you'd like to merge from that branch (which is based on yours) and we'll get it reviewed and committed?

Revision history for this message
LetoThe2nd (zentrale-der-macht) wrote :

I can confirm that lp:~salgado/linaro-image-tools/717129 seems to fix the issue (as tried with lamc for bug #902227)

Changed in linaro-image-tools:
status: In Progress → Fix Committed
Mattias Backman (mabac)
Changed in linaro-image-tools:
milestone: none → 12.01
Mattias Backman (mabac)
Changed in linaro-image-tools:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.