Code review comment for lp:~kdub/powerd/fix-1258655

Revision history for this message
Seth Forshee (sforshee) wrote :

I don't think this actually "fixes" anything, I think it probably just
shuffles timing around and ends up somehow hiding the problem.

On Wed, Dec 18, 2013 at 10:40:39PM -0000, Kevin DuBois wrote:
> Kevin DuBois has proposed merging lp:~kdub/powerd/fix-1258655 into lp:powerd.
>
> Commit message:
> 1) eliminating fb_state as a global variable and locking actual_display_state properly.

Previously all updates to actual_display_state happened on the main
loop. There were a couple of reads off the main loop related to power
button handling, but adding a lock doesn't eliminate any class of races
(i.e. the state changes just after the read). So actual_screen_state
doesn't *need* any locking until you start manipulating it from
wait_for_fb_thread(). Even then I find most of the locking to be
dubious, more on that later.

> powerd must ensure that the system is in a usable state before
> triggering the calls to the display server. The problem was that
> for earlysuspend devices, there were two threads that thought they had
> responsibility for turning the display on, when really only the thread
> monitoring the files in /sys/power had the right to do so. Furthermore,
> the variables concerned with display state "actual_display_state" and
> "fb_state" were accessed by both the file-monitoring thread and the the
> request-servicing thread, causing races around the actual state of the
> display.

Previously only the main thread ever turned the display on/off; the
monitoring thready only scheduled work on the main thread to do so. The
monitoring thread never checks actual_screen_state, only the work which
is scheduled on the main thread does this. Therefore there is no race in
accessing actual_screen_state, since it's always happening on the main
thread.

This change actually reduces serialization. Previously changes to the
display state and to the system power state were serialized relative to
each other by nature of both happening on the main thread. It _should_
be impossible to have earlysuspend enabled while the display was on due
the following:

 * Before the display was turned on update_display_state() requested the
   active state, which would synchronously disable earlysuspend before
   proceeding to turn on the display.
 * The display was turned off before releasing the active state request.
   Releasing this request may cause the system to be put into
   suspend/earlysuspend mode.
 * All of this happens on the main thread and is thus serialized

These changes move the display on/off to another thread, so the
guarantees of the display state being serialized wrt the system power
state are broken. There *are* races between turning the screen on on the
monitor thread and turning it off on the main thread with your changes.
Expanding the amount of code serialized by the mutex in the screen off
case could help some, but I'm not sure that it will ever be entirely
sufficient.

My guess would be that the actual problem is that unity's
setScreenPowerMode() doesn't complete synchronously wrt what powerd is
doing. So something like this might happen:

 * powerd takes an active state request to guarantee that earlysuspend
   is disabled, then calls setScreenPowerMode to turn the screen on
 * Unity hands the request off to another thread and setScreenPowerMode
   returns
 * powerd considers the "screen on" action to be complete and proceeds
 * Another power button press happens. powerd calls setScreenPowerMode.
 * Unity still hasn't finished the screen on request, so it queues the
   screen off request and setScreenPowerMode returns.
 * powerd releases the active state request, and autosuspend (i.e.
   earlysuspend) is enabled.
 * Unity tries to turn the screen on, but the display driver is now in
   earlysuspend.

Does that seem possible? If so, I think the appropriate fix would be to
either make setScreenPowerMode() synchronous with turning the display on
or off or else providing some way for powerd to know when the state
change completes. This change might just speed up the process of calling
into unity enough to mask the problem, or something like that.

> +/* the Display Server (mir or sf) is free to fire off a few more frames as
> + part of its shutdown process. powerd must guarantee the fb_state is FB_AWAKE,
> + and the system is not suspended until the request returns. */

powerd should already guarantee that the state is FB_AWAKE before making
the call to turn on the display, and nothing you've said here has
convinced me that it is otherwise.

> gboolean powerd_display_enabled(void)
> {
> - return actual_screen_state == POWERD_DISPLAY_STATE_ON;
> + g_mutex_lock(&actual_screen_state_guard);
> + gboolean ret = (actual_screen_state == POWERD_DISPLAY_STATE_ON);
> + g_mutex_unlock(&actual_screen_state_guard);
> + return ret;
> }

I'm not convinced that using a mutex here really adds anything of value.
Sure, if the screen is being turned on while the screen is getting
turned on you're going to see the new state instead of the old one, but
it's still timing sensitive; e.g. what if this is called just before the
screen state is changed, then there's a context switch which changes the
screen state, then there's another context switch back which uses the
now stale result? To avoid that class of races you have to lock all code
both when it is reading the variable and when it is taking action based
on the value it read.

> @@ -230,11 +227,19 @@
> if (using_ab)
> powerd_autobrightness_disable();
> powerd_set_brightness(0);
> +
> if (!display_set_power_mode(0, "off")) {
> powerd_warn("failed to set display power mode, not clearing state");
> return;
> }
>
> + if (!earlysuspend_used)
> + {
> + g_mutex_lock(&actual_screen_state_guard);
> + actual_screen_state = DISPLAY_STATE_OFF;
> + g_mutex_unlock(&actual_screen_state_guard);
> + }
> +

If you want this to be effective you need to take the mutex before
turning the screen off and release it after clearing the system state
request. But I still don't see how that would be better than the current
technique of always doing display on/off on the main thread.

> @@ -253,16 +259,18 @@
> if (!ret)
> powerd_warn("Request for active state failed");
>
> - /*
> - * If the current state is suspend we need to wait for
> - * notification of leaving the active state, otherwise the
> - * screen may fail to turn on for devices using earlysuspend.
> - * Otherwise we can turn on the screen right away.
> - */
> - if (fb_state == FB_AWAKE)
> + /* if the device uses earlysuspend, responsibility for turning
> + the display server and changing actual_screen_state falls
> + to wait_for_fb_thread() */
> + if (!earlysuspend_used)
> + {
> turn_on_display(use_ab);
> + g_mutex_lock(&actual_screen_state_guard);
> + actual_screen_state = POWERD_DISPLAY_STATE_ON;
> + g_mutex_unlock(&actual_screen_state_guard);
> + }

Again, I find using the mutex in this way to be pointless.

> } else {
> - /* Only changing brightness */
> + /* display is already on, must only changing brightness */

Bad grammar ;-)

> static gpointer wait_for_fb_thread(gpointer unused)
> {
> - unsigned long state = fb_state;
> -
> - if (state >= NUM_FB_STATES) {
> - powerd_warn("Invalid FB state");
> - return NULL;
> - }
> -
> + enum fb_state fb_state = FB_AWAKE;
> while (1) {
> - state = (state == FB_SLEEP) ? FB_AWAKE : FB_SLEEP;
> - if (wait_for_file(fb_file_names[state])) {
> - powerd_warn("Error reading %s", fb_file_names[state]);
> + enum fb_state next_state = (fb_state == FB_SLEEP) ? FB_AWAKE : FB_SLEEP;
> +
> + if (wait_for_file(fb_file_names[next_state])) {
> + powerd_warn("Error reading %s", fb_file_names[next_state]);
> return NULL;
> }
> - g_timeout_add(0, fb_state_changed, (gpointer)state);
> +
> + fb_state = next_state;
> + powerd_debug("fb state %s", fb_state == FB_SLEEP ? "sleep" : "awake");
> +
> + g_mutex_lock(&actual_screen_state_guard);
> + if (fb_state == FB_AWAKE)
> + {
> + turn_on_display(ab_enabled(internal_state.flags));
> + actual_screen_state = POWERD_DISPLAY_STATE_ON;
> + }
> + else if (fb_state == FB_SLEEP)
> + {
> + actual_screen_state = DISPLAY_STATE_OFF;
> + }
> + g_mutex_unlock(&actual_screen_state_guard);

This part is wrong. We do not want to turn the screen uconditionally
when the FB state changes from sleep->wake. I don't know if there are
any today, but there will undoubtedly be cases in the future where we
want to come out of suspend without turning the screen on. For example:
we receive a push notification of a new email and want to turn on
briefly to play a sound then go back to suspend without ever turning on
the screen.

I can give you a synthetic test case though that might demonstrate
problems (depending on what the system does when USB is plugged in
nowadays, but in the build I flashed yesterday I think it should show
problems). Plug in USB and turn off the screen. From a shell run
'powerd-cli active'. Before your changes the screen would have stayed
off, but now it will probably turn on. That would be a regression. Then
I suspect if you press Ctrl-C to stop powerd-cli you might see something
crash, or some interesting messages in dmesg, because powerd is going to
re-enable suspend without calling into unity to turn off the display.

So obviously, I'm giving a NACK on these changes.

« Back to merge proposal