screenshot_fullscreen_video doesn't work if run from checkbox-gui with auto-started checkbox service

Bug #1377237 reported by Daniel Manrique
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Checkbox Provider - Base
Triaged
High
Unassigned

Bug Description

Steps to reproduce:

- Run checkbox-gui and choose either the autotesting or sru whitelist
- Select *only* the graphics/1_screenshot_fullscreen_video-something tests
- Run it

Expected result:
- Test runs, and test run completes successfully

Actual result:
- GUI client stalls with the progress bar showing the fullscreen_video test as if it were still in progress. Maybe it is, but things are stuck anyway.

Observations:

- Doesn't happen if I run with plainbox -i (to test this, the graphics/generator_screenshot_fullscreen_video.* jobs also need to be run
- Doesn't happen if I run checkbox service manually. Meaning, it only happens if checkbox service is started automatically via dbus.

I suspect the problem is with the dbus-launch commands in the test's script, these may be having trouble if started from the auto-started checkbox service.

Tags: bite-size
Revision history for this message
Zygmunt Krynicki (zyga) wrote :

Confirmed it doesn't happen with plainbox. Debugging checkbox gui now

Revision history for this message
Daniel Manrique (roadmr) wrote :

OK, I confirmed the problem is with dbus-launch:

 command:
  [ -f ${PLAINBOX_PROVIDER_DATA}/video/Ogg_Theora_Video.ogv ] || { echo "Video file not found"; exit 1; }
  dbus-launch gsettings set org.gnome.totem repeat true
  totem --fullscreen ${PLAINBOX_PROVIDER_DATA}/video/Ogg_Theora_Video.ogv 2>/dev/null &
  set -o pipefail
  sleep 15 && camera_test still --device=/dev/external_webcam -f ${PLAINBOX_SESSION_SHARE}/screenshot_fullscreen_video_${index}.jpg -q 2>&1 | ansi_parser
  sleep 5 && totem --quit 2>/dev/null
  dbus-launch gsettings set org.gnome.totem repeat false

The wrapping dbus-launch lines set totem to repeat for the duration of the test, then restore to the previous setting.

For some reason, when using checkbox-gui and the automatically-started plainbox service, these lines cause things to stall. This is a bit weird because the first line seems to at least run and exit, the rest of the test runs, then the last line gets stuck.

I confirmed that this doesn't happen if run using plainbox run, or starting checkbox service manually, then connecting checkbox-gui to it to run the tests (this requires removing the .service file). I didn't check with checkbox-gui.

I also checked that removing the dbus-launch part (so the commands start with just gsettings) makes them work; from checkbox-gui, things work as expected with the video configuration being correctly set/reset.

Interestingly, and annoyingly, if I modify the command to keep dbus-launch but pipe the output to a file (to maybe see what dbus-launch is doing), things work, i.e.

  dbus-launch gsettings set org.gnome.totem repeat true >/tmp/log 2>&1

Revision history for this message
Daniel Manrique (roadmr) wrote :
Changed in checkbox:
status: New → Confirmed
importance: Undecided → High
Revision history for this message
Daniel Manrique (roadmr) wrote :

This is a sample of the session as it was when the client stalled. Since things are still running/stuck, the session may be weird or incomplete.

Revision history for this message
Daniel Manrique (roadmr) wrote :
summary: - screenshot_fullscreen_video doesn't work if run from checkbox-gui
+ screenshot_fullscreen_video doesn't work if run from checkbox-gui with
+ auto-started checkbox service
Revision history for this message
Daniel Manrique (roadmr) wrote :
Revision history for this message
Zygmunt Krynicki (zyga) wrote :

Not sure what is going on but there are two evident sub-bugs here:

1) dbus-launch is always wrong to use
2) plainbox needs to become like systemd and control all the spawned children and needs to kill them reliably

Revision history for this message
Daniel Manrique (roadmr) wrote :

Per Zygmunt's suggestion, I added set -x to the beginning of the command, then wrapped it all in a giant () and redirected to a file (blah) >/tmp/megalog 2>&1. This is the result:

+ '[' -f /usr/share/2013.com.canonical.certification:checkbox/data/video/Ogg_Theora_Video.ogv ']'
+ dbus-launch gsettings set org.gnome.totem repeat true
Activating service name='ca.desrt.dconf'
Successfully activated service 'ca.desrt.dconf'
+ set -o pipefail
+ sleep 15
+ totem --fullscreen /usr/share/2013.com.canonical.certification:checkbox/data/video/Ogg_Theora_Video.ogv
+ camera_test still --device=/dev/external_webcam -f /home/ubuntu/.cache/plainbox/sessions/pbox-va2_uf96.session/CHECKBOX_DATA/screenshot_fullscreen_video_1.jpg -q
+ ansi_parser
+ sleep 5
+ totem --quit
+ dbus-launch gsettings set org.gnome.totem repeat false
Activating service name='ca.desrt.dconf'
Successfully activated service 'ca.desrt.dconf'

Interestingly, and as can be seen here, when I did this, things *worked* again :D

Revision history for this message
Zygmunt Krynicki (zyga) wrote :
Download full text (8.4 KiB)

19:41 < zyga> roadmr: I can hear you now
19:41 < zyga> roadmr: I've killed dropbox
19:41 < zyga> roadmr: sorry
19:41 < zyga> yes
19:41 < zyga> it's good
19:41 -!- Irssi: Join to #Cert was synced in 30 secs
19:41 < zyga> it's just laggy
19:41 < zyga> but I can
19:41 < zyga> ok
19:41 < zyga> so we have all the tests
19:41 < zyga> wich one should I keep?
19:41 < zyga> hehe
19:42 < zyga> so why?
19:42 < zyga> run?
19:42 < zyga> yes
19:52 < roadmr> zyga: see my last comment, an easy workaround seems to be piping dbus-launch to a file or something. Kinda scary
                because I don't know why that works (voodoo), but it may help figure things out
19:52 < roadmr> zyga: console log and compressed session attached, though the logs for that job in the session look empty :/
19:52 < roadmr> (even the io logs)
19:55 < zyga> roadmr: hmmmmmmmmmmmm
19:55 < roadmr> zyga: fun, eh? :)
19:55 < zyga> so maybe something is stuck on a PIPE buffer
19:57 < zyga> roadmr: do you have the log files from plainbox itself?
19:57 < zyga> ~/.cache/plainbox/logs/
19:57 < roadmr> zyga: oh... that one was empty :/
19:57 < zyga> roadmr: ok
19:57 < zyga> roadmr: HMM
19:57 < zyga> roadmr: /me tries something
19:58 < roadmr> zyga: if you want to see differences in environment, a dummy job that just does "env" is usually what I do
19:58 < roadmr> er, env >/tmp/some-file
19:59 < roadmr> cgregan: I pushed the whitelist change to the testing branch and am building a release candidate now, if you're
                happy with everything else we can skip testing it and just do the same process for the release package
19:59 < zyga> roadmr: I was going to try that, did you get both env outputs?
20:00 < roadmr> zyga: no, but I can do it easily, give me 5 mins
20:00 < zyga> k
20:01 < cgregan> yeah...just push to prod roadmr
20:01 < cgregan> I will do a little sanity check.
20:01 < roadmr> cgregan: ok, here I go then
20:03 < zyga> roadmr: so why do we use dbus-launch there?
20:03 < roadmr> zyga: no idea, we'd have to ask spineau, he wrote that part of the job
20:04 * roadmr would benefit from 2 computers side-by-side instead of having to 180 the chair to reach the one behind...
20:07 < zyga> roadmr: ok I have a theory
20:07 < zyga> roadmr: hehe
20:07 < zyga> :)
20:07 < roadmr> :) glad to see it's at least an interesting bug...
20:08 < roadmr> almost done collecting env data
20:12 < zyga> roadmr: so first part of the theory
20:12 < zyga> roadmr: using dbus-launch is totally broken
20:12 < zyga> roadmr: it's leaving processes behind
20:12 < zyga> roadmr: it starts new dbus daemons each time
20:14 < zyga> roadmr: that part is non-controversial and can be quickly tested by using dbus-launch and reading the maual page
20:14 < roadmr> ok... yes I thought starting a new dbus bus was weird
20:15 < zyga> roadmr: so what happens when you redirect to a file
20:15 < zyga> it "all works"
20:15 < zyga> I think that's because..
20:16 < zyga> the file that we save is actually in a temporary directory
20:17 < zyga> no, that makes no sense :/
20:17 < zyga> no idea how that changes anything
20:17 < roadmr> zyga: a head-scratcher... ok, I attached good and bad environments
20:17 < zyga> roadmr: quick...

Read more...

Revision history for this message
Daniel Manrique (roadmr) wrote :

OK, wrapping the command in () 2>&1 | tee /tmp/megalog produces the exact same log file *but* it stalls, as Zygmunt predicted. Not pasting output from that as it's identical as the above comment.

Revision history for this message
Daniel Manrique (roadmr) wrote :
Revision history for this message
Zygmunt Krynicki (zyga) wrote :

20:41 < cgregan> agreed roadmr
20:45 < roadmr> zyga: yes, using | tee does stall
20:50 < zyga> roadmr: but now we have a real log
20:50 < zyga> roadmr: let's see it
20:50 < zyga> roadmr: I bet it will be of interesting size
20:51 < zyga> roadmr: can you tell me the size of the log file as it is now?
20:52 < zyga> roadmr: is it close to 1048576
20:55 < roadmr> zyga: hmm, it's 727 bytes, this is the teed file. The plainbox io logs are also still at size 0
20:56 < zyga> roadmr: hmmm
20:56 < zyga> roadmr: curious
20:56 < zyga> roadmr: so what's in that log?
20:56 < zyga> roadmr: is this with dbus-launch still in?
20:56 < zyga> (me would rather change one thing at a time)
20:58 < roadmr> zyga: yes, dbus-launch... I'm trying to change one thing at a time only :)
20:58 < roadmr> zyga: that 727-byte log? it's the one I posted in the bug report
20
:58 < roadmr> it ends with "Successfully activated service blah"
21:06 < zyga> ah
21:06 < zyga> looking
21:07 < zyga> roadmr: ok
21:07 < zyga> roadmr: *interesting*
21:07 < zyga> roadmr: look, it's actually hanged there on the second gsettings
21:07 < zyga> roadmr: can you see the process list?
21:08 < zyga> roadmr: ps aux if you can
21:08 < zyga> kissiel: has your neighbour finished drilling yet?
21:08 < kissiel> zyga, y, it's 9pm after all
21:08 < roadmr> zyga: yep... ok, there's no gsettings or dbus-launch individual process, though I do see the huge command (as
21:09 < zyga> roadmr: if you run htop in tree mode
21:09 < zyga> roadmr: what is that big command waiting on?
21:09 < zyga> roadmr: and which children does it have/
21:10 < roadmr> zyga: just "tee megalog"
21:10 < zyga> hmm
21:10 < zyga> can you pastebin that somehow?
21:10 < kissiel> zyga, oh shoot, drilling
21:11 < zyga> kissiel: OMG :)
21:11 < kissiel> zyga, unbelievable
21:11 < zyga> kissiel: when does it become illegal to make noise?
21:11 < roadmr> zyga: I can try :)
21:11 < kissiel> zyga, 10pm
21:11 < zyga> roadmr: thanks
21:11 < zyga> roadmr: ps aux is good too :)
21:14 < roadmr> zyga: attached to the bug
21
:14 < roadmr> zyga: I need to take the dog for a walk, is begging to go out
21:17 < zyga> roadmr: thanks
21:17 < zyga> roadmr: go :)
21:17 < roadmr> brb
21:18 -!- roadmr is now known as roadmr_afk
21:22 < zyga> roadmr_afk: when you are back
21:22 < zyga> roadmr_afk: could you please kill processes 24849 and 24833
21:23 < zyga> roadmr_afk: after each kill, check if that causes process 24839 to exit
21:23 < zyga> roadmr_afk: I have a theory
21:23 < zyga> roadmr_afk: which is harder to test artificially but I'll do my best
21:23 < zyga> roadmr_afk: for now it might require --close-stderr to each dbus-launch
21:23 < zyga> roadmr_afk: (after we test that without dbus-launch it doesn't freeze at all)
21:24 < zyga> roadmr_afk: so I'm trying to build an artificial test case
21:24 < zyga> roadmr_afk: but once you are back and still want to work on this, ping me please ;0

Revision history for this message
Daniel Manrique (roadmr) wrote :

Though it's not entirely clear why this happens, Zygmunt figured out that removing dbus-launch helps things work well, and per dbus-launch's page, that seems correct, so I'll set this to triaged, with the intention of removing dbus-launch from all commands that use it. There are 4 instances, 2 in graphics.txt.in and 2 in graphics-legacy.txt.in.

tags: added: bite-size
Changed in checkbox:
status: Confirmed → Triaged
affects: checkbox → plainbox-provider-checkbox
Changed in plainbox-provider-checkbox:
milestone: none → 0.14
Changed in plainbox-provider-checkbox:
milestone: 0.14 → 0.15
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.