Top Crasher: /usr/sbin/system-image-dbus:RuntimeError:_download

Bug #1365646 reported by errors.ubuntu.com bug bridge
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Ubuntu system image
Fix Released
Critical
Barry Warsaw
system-image (Ubuntu)
Fix Released
Critical
Barry Warsaw
ubuntu-system-settings (Ubuntu)
Fix Released
Critical
Ken VanDine

Bug Description

The Ubuntu Error Tracker has been receiving reports about a problem regarding system-image. This problem was most recently seen with version 2.3.2-0ubuntu2, the problem page at https://errors.ubuntu.com/problem/2ddecb9c81d025a9d1342a892298e1f9788500b6 contains more details.

Related branches

tags: added: qa-daily-testing qasoak rtm14
summary: - /usr/sbin/system-image-dbus:RuntimeError:_download
+ Top Crasher: /usr/sbin/system-image-dbus:RuntimeError:_download
Revision history for this message
Barry Warsaw (barry) wrote :

So, what changed recently? Did we get a new version of system-settings or (less likely to be affecting) ubuntu-download-manager?

tags: added: client
Revision history for this message
Chris Gagnon (chris.gagnon) wrote :

https://errors.ubuntu.com/oops/73dfcb02-3447-11e4-8466-fa163e75317b was generated by a monkey runner, so there are no steps to reproduce.

On a mako and another device with the 20140903.1 image. The phone was flashed using --bootstrap so there is no old software. It was also reproduced by the same monkey script on 20140902 on both devices.

Revision history for this message
Barry Warsaw (barry) wrote :

Does the monkey need a banana? ;)

I tried a few scenarios with my n10 and cannot reproduce it so far.

Changed in ubuntu-system-image:
status: New → Incomplete
Changed in system-image (Ubuntu):
status: New → Incomplete
Revision history for this message
Chris Gagnon (chris.gagnon) wrote :

I've set up a camera to see if we can catch the monkey when it throws a wrench. The camera has a hard time transitioning between the dark screen of the indicator and the white screen of the system-settings app, but maybe it will work.

Revision history for this message
Chris Gagnon (chris.gagnon) wrote :

I haven't had this crash in my monkey test again, but the crash is still being reported in the field.

Revision history for this message
Barry Warsaw (barry) wrote :

Any information from the field on how to reproduce it? If not, for someone who has seen it, please attach /var/log/system-image/client.log

Revision history for this message
Ken VanDine (ken-vandine) wrote :

My mako is now hitting this repeatably, here's my client.log http://pastebin.ubuntu.com/8302100/

I'm wondering if it's being triggered by some race, the main view of system-settings does an update check to show available updates, and entering the updates page starts another check. It seems if I very quickly enter the updates page, before the check on the main page completes, it doesn't crash. But if I wait long enough on the main view for the updates available count to show, then enter updates system-settings and system-image-dbus both crash.

Revision history for this message
Ken VanDine (ken-vandine) wrote :

I've narrowed this down a bit. It seems if there's a system update available, and you get prompted to install it (after it's downloaded) and you click "Not now", the update is still left in /android/cache/recovery/.

If you go back into system-settings later, and the latest system-update is still available, and already downloaded, system-image-dbus crashes. I verified this by deleting the downloaded update from /android/cache/recovery/ and restart system-settings. I got prompted to update, and after choosing "Install & restart", everything worked and system-image-dbus left no crash file behind.

Changed in system-image (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Barry Warsaw (barry) wrote :

auto_download = wifi only & device is on wifi

Changed in system-image (Ubuntu):
assignee: nobody → Barry Warsaw (barry)
Changed in system-image (Ubuntu):
importance: Undecided → Critical
Revision history for this message
Barry Warsaw (barry) wrote :

Here's what I see on krillin r26.

* Go to System Settings
* When an update is available, click on Updates available (not the Updates icon)
* When the update is done downloading, click on Not Now
* I'm left on the Updates page (still showing r26)
* Hit back arrow to return to System Settings
* Wait 2 minutes
* Click on Updates available again
* I see the progress meters stuck on 0% but then eventually I get prompted to do an Update System
* No system-image or system-setting crash file, and system-image-dbus is still running

# system-image-cli --info
current build number: 9
device name: krillin
channel: ubuntu-touch/ubuntu-rtm/14.09-proposed
last update: 2014-09-01 04:08:48
version version: 9
version ubuntu: 20140901
version device: 20140829-678239d

Not sure why system settings says I'm on r26 but --info says 9. I am now going to try to upgrade and see what happens.

Revision history for this message
Pat McGowan (pat-mcgowan) wrote :

Any chance bug 1324612 is related? Its a slightly different scenario where update was never downloaded afaik

Revision history for this message
Barry Warsaw (barry) wrote :

Hmm, maybe I was on 9 and it offered to upgrade me to 26. I'm definitely on 26 now. Trying recipe again.

Revision history for this message
Barry Warsaw (barry) wrote : Re: [Bug 1365646] Re: Top Crasher: /usr/sbin/system-image-dbus:RuntimeError:_download

On Sep 09, 2014, at 09:33 PM, Pat McGowan wrote:

>Any chance bug 1324612 is related? Its a slightly different scenario
>where update was never downloaded afaik

You'd think not, since system-settings is a different process than
system-image. But right now I can't say for sure.

Revision history for this message
Barry Warsaw (barry) wrote :

Okay, so after setting .writable_image and hacking channel.ini to think it's r25 instead of r26. I can reproduce the crash file.

I'm seeing some weird behavior though coming from the dbus api via system-settings. Not enough to report them, but it does make me think that some s-s behavior has changed, which is triggering this problem. Could be a latent race window now hit by s-s where it wasn't before.

Revision history for this message
Barry Warsaw (barry) wrote :

Note that you do not even have to hit Not Now for the crash to occur. You'll get the crash file even while this dialog is still on the screen.

I instrumented systemimage/dbus.py and here's something very interesting:

[systemimage] Sep 09 22:05:19 2014 (13932) Update is downloading
[systemimage] Sep 09 22:05:20 2014 (13932) all files available in /android/cache/recovery
[systemimage] Sep 09 22:05:20 2014 (13932) preparing recovery
[systemimage] Sep 09 22:05:20 2014 (13932) Update downloaded
[systemimage] Sep 09 22:05:20 2014 (13932) release checking lock from _download()
[systemimage] Sep 09 22:05:20 2014 (13932) released!
[systemimage] Sep 09 22:05:20 2014 (13932) -----> DU
[systemimage] Sep 09 22:05:20 2014 (13932) -----> DU
[systemimage] Sep 09 22:05:20 2014 (13932) -----> DU
[systemimage] Sep 09 22:05:20 2014 (13932) Update is downloading
[systemimage] Sep 09 22:05:20 2014 (13932) Update downloaded
[systemimage] Sep 09 22:05:20 2014 (13932) release checking lock from _download()
[systemimage] Sep 09 22:05:20 2014 (13932) Update is downloading
[systemimage] Sep 09 22:05:20 2014 (13932) Update downloaded
[systemimage] Sep 09 22:05:20 2014 (13932) release checking lock from _download()
[systemimage] Sep 09 22:05:20 2014 (13932) Update is downloading
[systemimage] Sep 09 22:05:20 2014 (13932) Update downloaded
[systemimage] Sep 09 22:05:20 2014 (13932) release checking lock from _download()

What this is telling me is that CheckForUpdate is being called, the update is downloaded (since I'm on wifi), and all files are available. It tries to release the checking lock, and it gets released. So far so good.

Next, system-settings calls DownloadUpdate three times in a row! I have no idea why.

Because DownloadUpdate was called without calling CheckForUpdate, when the download is complete (as it is almost immediately because all the files are already available), it will try to release the checking lock, which was never acquired, since it's only acquired in CFU. I need to think on the right fix for this, but it probably involves a downloading lock as well, and perhaps a guard around the checking lock.

Anyway, I think I have enough information to reproduce this in a test case.

Changed in ubuntu-system-image:
status: Incomplete → Triaged
assignee: nobody → Barry Warsaw (barry)
importance: Undecided → Critical
milestone: none → 2.3.3
status: Triaged → In Progress
milestone: 2.3.3 → 2.4
Revision history for this message
Barry Warsaw (barry) wrote :

BTW, I seem to have to click Not Now three times to dismiss the dialog. Something tells me this is related to the system settings changes. I'll leave those bugs for another day, but system-image-dbus should still be robust against any system settings craziness. :)

Revision history for this message
Ken VanDine (ken-vandine) wrote :

The multiple dialogs getting opened was a bug I just fixed (or worked around), landed in utopic-proposed last night.

In the UpdateManager code, it was getting onSystemUpdateDownloaded signal multiple times, and each time opening a dialog. The number of times this signal was emitted seems to vary, I had times where it only happened twice (1 too many) and other times where it was 5 or more times. In system-settings, it was creating a dialog each time the signal was emitted. It wasn't visual, and you could only notice it if you hit "not now". If you choose to install the update, the view changes so you never noticed the additional dialogs, then the device rebooted. What I did was to do a check, if there is a system update in progress or awaiting confirmation from the user, it just does nothing on the additional signals.

I'm not very familiar with the update code, could it be emitting that signal for each update, not just system updates?

Revision history for this message
Chris Gagnon (chris.gagnon) wrote :

I saw this 300 times last night during long running tests. let me know if you still need logs.

Revision history for this message
Barry Warsaw (barry) wrote :

On Sep 10, 2014, at 01:16 PM, Ken VanDine wrote:

>In the UpdateManager code, it was getting onSystemUpdateDownloaded
>signal multiple times, and each time opening a dialog.

That's not a signal that system-image-dbus emits. s-i-d emits
UpdateAvailableStatus for each call to CheckForUpdate. The flags in that
signal will tell you what's happening in the client, e.g. the `downloading`
flag is set if an update is being downloaded.

Downloading always completes with an UpdateDownloaded or UpdateFailed signal
(I'm ignoring pauses for now). You will get UpdateProgress signals at regular
intervals while an update is being downloaded.

Maybe that onSystemUpdateDownloaded is a translation of the s-i-d
UpdateDownloaded signal? That signal should only get emitted after an
automatic or manual download is complete. However, I note in comment #15 that
three DownloadUpdate() calls were made in a row - that could trigger three
UpdateDownloaded signals even if the download was already complete. s-i-d
would check the already downloaded files for validity, and if they're good, it
will emit an UpdateDownloaded signal.

>The number of times this signal was emitted seems to vary, I had times where
>it only happened twice (1 too many) and other times where it was 5 or more
>times. In system-settings, it was creating a dialog each time the signal was
>emitted. It wasn't visual, and you could only notice it if you hit "not
>now". If you choose to install the update, the view changes so you never
>noticed the additional dialogs, then the device rebooted. What I did was to
>do a check, if there is a system update in progress or awaiting confirmation
>from the user, it just does nothing on the additional signals.
>
>I'm not very familiar with the update code, could it be emitting that
>signal for each update, not just system updates?

Of course, s-i-d only does system updates, so I'm not sure whether this other
signal would get emitted from. Perhaps something click related?

Revision history for this message
Ken VanDine (ken-vandine) wrote :

The signals I see comes from the system-settings UpdateManager plugin, so it's likely a logic problem in that plugin. I've never looked at this code before, I'm trying to get my head around it now. There must be bug in system-settings that is triggering the multiple downloads requests, I'll try to figure that out on my end. But there must be a bug in s-i-d too causing it to crash if it gets a download request for an update that is already downloaded.

Changed in ubuntu-system-settings (Ubuntu):
status: New → Confirmed
importance: Undecided → Critical
assignee: nobody → Ken VanDine (ken-vandine)
Revision history for this message
Barry Warsaw (barry) wrote :

On Sep 10, 2014, at 02:39 PM, Ken VanDine wrote:

>But there must be a bug in s-i-d too causing it to crash if it gets a
>download request for an update that is already downloaded.

That aligns with my analysis of s-i-d. This bug task for system-image is to
make DownloadUpdate robust in that case; i.e. not crash, but emit an
UpdateDownloaded signal.

Revision history for this message
Barry Warsaw (barry) wrote :

Not that you care, but it's tricky to craft a test case for this because the crash doesn't actually affect any behavior! Since the UpdateDownloaded signal is emitted before the checking lock is released, everything still works. The only effect is that s-i-d tracebacks, and you only see that in the crash report because of the way dbus-python's error handling works.

This probably means that the best thing to do is just explicitly swallow any exceptions from .release().

Revision history for this message
Barry Warsaw (barry) wrote :

More fun: I've verified that when this .crash file gets written, the si-dbus process does *not* exit. That seems crazy right? Here's what I think is happening. PyErr_Print() calls PyErr_PrintEx(1), and the latter does sys.excepthook processing. There are several places within dbus-python that calls PyErr_Print(). I haven't gdb'd the process but my guess is that the "exception-in-@method" case causes a PyErr_Print() but clears, <gasp> ignores, or otherwise arranges for the exception to not propagate to the Python main loop or otherwise cause the process to exit. Since apport hooks into sys.excepthook to write the .crash file, that's how you get the crash without exiting the si-dbus process.

How to test the darn thing?

Revision history for this message
Ken VanDine (ken-vandine) wrote :

I'm confident the trigger of the crash is system-settings calling downloadUpdate on an item that is already downloading. I have a branch that fixes that in system-settings.

Changed in ubuntu-system-settings (Ubuntu):
status: Confirmed → Fix Committed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ubuntu-system-settings - 0.3+14.10.20140911-0ubuntu1

---------------
ubuntu-system-settings (0.3+14.10.20140911-0ubuntu1) utopic; urgency=low

  [ Ken VanDine ]
  * Don't call downloadUpdate for system updates that are already
    downloading (LP: #1365646) (LP: #1311219, #1365646)
 -- Ubuntu daily release <email address hidden> Thu, 11 Sep 2014 00:42:18 +0000

Changed in ubuntu-system-settings (Ubuntu):
status: Fix Committed → Fix Released
Barry Warsaw (barry)
Changed in ubuntu-system-image:
status: In Progress → Fix Committed
Barry Warsaw (barry)
Changed in ubuntu-system-image:
status: Fix Committed → Fix Released
Barry Warsaw (barry)
Changed in system-image (Ubuntu):
status: Confirmed → In Progress
tags: added: touch-2014-09-25
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package system-image - 2.4-0ubuntu1

---------------
system-image (2.4-0ubuntu1) utopic; urgency=medium

  [ Barry Warsaw ]
  * New upstream release.
    - LP: #1353178 - The channel.ini file can override the device name by
      setting `[service]device`.
    - LP: #1324241 - Add optional instrumentation to collect code coverage
      data during test suite run via tox.
    - LP: #1279970 - When an exception occurs in a `system-image-dbus`
      D-Bus method, signal, or callback, this exception is logged in the
      standard log file, and the process exits. Also, `[system]loglevel`
      can now take an optional ":level" prefix which can be used to set
      the log level for the D-Bus API methods. By default, they log at
      `ERROR` level, but can be set lower for debugging purposes.
    - LP: #1365646 - Don't crash when releasing an unacquired checking lock.
    - LP: #1365761 - When checking files for `last_update_date()` ignore
      PermissionErrors and just keep checking the fall backs.
    - LP: #1369714 - `system-image-cli --dbus` has been deprecated and
      will be removed in the future.
  * d/control: Remove tox as a build dependency to avoid having to MIR tox,
    virtualenv, and pip.
  * d/rules:
    - Call nose2 explicitly to avoid use of tox.
    - Remove unnecessary override_dh_auto_clean rule.
  * d/system-image-common.post{inst,rm}: `set -e` to make lintian happy.
 -- Ubuntu daily release <email address hidden> Wed, 17 Sep 2014 17:56:45 +0000

Changed in system-image (Ubuntu):
status: In Progress → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.