Mir

Merge lp:~albaguirre/mir/screencast-crash-fix into lp:mir

screencast-crash-fix
Merge into development-branch

Proposed by Alberto Aguirre on 2014-02-14

Status:

Merged

Approved by:

Daniel van Vugt on 2014-02-18

Approved revision:

no longer in the source branch.

Merged at revision:

1404

Proposed branch:

lp:~albaguirre/mir/screencast-crash-fix

Merge into:

lp:mir

Diff against target:

135 lines (+38/-36)

1 file modified

src/utils/screencast.cpp (+38/-36)

To merge this branch:

bzr merge lp:~albaguirre/mir/screencast-crash-fix

Medium

Fix Released

Reviewer	Review Type	Date Requested	Status
PS Jenkins bot (community)	continuous-integration	2014-02-14	Approve on 2014-02-18
Alexandros Frantzis (community)		2014-02-14	Approve on 2014-02-14
Review via email: mp+206337@code.launchpad.net

Commit message

Fix crash on Android devices by working around a subtle threading bug

Use a dummy thread_local array to push the gl/egl context TLS into a region
where the future wait code does not overwrite it.

Other changes:
- Query the preferred read pixel format/type at setup.
- Write a video file instead of snapshot files; the raw file can then be played as a video with vlc

fixes: lp: #1280086

Description of the change

Fix crash on Android devices by working around a subtle threading bug

Use a dummy thread_local array to push the gl/egl context TLS into a region
where the future wait code does not overwrite it.

Other changes:
- Query the preferred read pixel format/type at setup.
- Write a video file instead of snapshot files; the raw file can then be played as a video with vlc

fixes: lp: #1280086

For some unknown reason, on Android devices, the EGL context goes bad after waiting on a future in the main thread, even though the EGL context was made current in the main thread and should remain so.

Possibly a TLS issue with libhybris?

Revision history for this message

Alexandros Frantzis (afrantzis) wrote on 2014-02-14:

> Remove the use of asyncs to dispatch file writes; no real reason to launch one as the main loop just synchronizes it with a wait anyway.

Actually there is a reason: with an async file write, we can swap the screencast buffer and write to file concurrently, which depending on the actual operation timings and IO blocking, helps improve the capture frame rate (so, iteration_time = max(write_time, swap_time) instead of iteration_time = write_time + swap_time).

Anyway, this is the main point of the workaround, so no point arguing for this to stay :)
Hopefully we can reinstate it when we fix the core issue.
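
To illustrate the timing argument above, here is a minimal sketch, not the screencast code itself. The names fake_work and run_iteration are hypothetical, and sleeps stand in for the real file write and buffer swap. With overlap enabled, an iteration costs roughly max(write_time, swap_time); without it, write_time + swap_time.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

// Hypothetical stand-in for either the file write or the buffer swap.
inline void fake_work(std::chrono::milliseconds duration)
{
    std::this_thread::sleep_for(duration);
}

// Runs one capture-loop iteration and returns the elapsed wall time.
inline std::chrono::milliseconds run_iteration(
    std::chrono::milliseconds write_time,
    std::chrono::milliseconds swap_time,
    bool overlap)
{
    assert(write_time.count() >= 0 && swap_time.count() >= 0);
    auto const start = std::chrono::steady_clock::now();

    if (overlap)
    {
        // The write runs on another thread while this thread swaps.
        auto write_done = std::async(std::launch::async, fake_work, write_time);
        fake_work(swap_time);
        write_done.wait();
    }
    else
    {
        // Strictly sequential: write first, then swap.
        fake_work(write_time);
        fake_work(swap_time);
    }

    return std::chrono::duration_cast<std::chrono::milliseconds>(
        std::chrono::steady_clock::now() - start);
}
```

Calling run_iteration with equal write and swap times should show the overlapped variant finishing in about half the time of the sequential one, subject to scheduler noise.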

> std::ofstream videoFile(ss.str());

We should use underscores in variable names: http://unity.ubuntu.com/mir/cppguide/index.html#Variable_Names

> glGetIntegerv(GL_IMPLEMENTATION_COLOR_READ_FORMAT, &format);

GL_IMPLEMENTATION_COLOR_READ_FORMAT is meaningful only in combination with GL_IMPLEMENTATION_COLOR_READ_TYPE (i.e., we can't assume per the standard that GL_IMPLEMENTATION_COLOR_READ_TYPE => GL_UNSIGNED_BYTE, although it's often the case). Since we currently want support for either RGBA8888 or BGRA8888 frames, we should check that GL_IMPLEMENTATION_COLOR_READ_TYPE == GL_UNSIGNED_BYTE and GL_IMPLEMENTATION_COLOR_READ_FORMAT == BGRA and only then return it as valid, otherwise return the guaranteed default RGBA.

Note that when implementing surface snapshots I found that using the GL_IMPLEMENTATION_COLOR_READ_FORMAT/TYPE was unreliable (e.g., using the returned values failed depending on the underlying buffer type), although the situation was more complex there, involving FBOs backed by textures backed by EGLImages. That's why I used the double glReadPixels scheme, which was also used here.
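
The format/type check suggested above could look roughly like the sketch below. It is only an illustration: choose_read_format is a hypothetical helper, and the constants are local stand-ins that mirror the GL enum values so the snippet compiles without GL headers. In real code the two inputs would come from glGetIntegerv.

```cpp
#include <cstdint>

// Local stand-ins mirroring the GL enum values (assumption: real code
// would use the definitions from the GLES2 headers instead).
constexpr std::uint32_t gl_rgba = 0x1908;           // GL_RGBA
constexpr std::uint32_t gl_bgra_ext = 0x80E1;       // GL_BGRA_EXT
constexpr std::uint32_t gl_unsigned_byte = 0x1401;  // GL_UNSIGNED_BYTE

// Accept the implementation's preferred read format only when the
// format/type pair describes BGRA8888; otherwise fall back to RGBA,
// which is always valid with GL_UNSIGNED_BYTE.
inline std::uint32_t choose_read_format(std::int32_t impl_format,
                                        std::int32_t impl_type)
{
    bool const is_bgra8888 =
        static_cast<std::uint32_t>(impl_type) == gl_unsigned_byte &&
        static_cast<std::uint32_t>(impl_format) == gl_bgra_ext;

    return is_bgra8888 ? gl_bgra_ext : gl_rgba;
}
```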

review: Needs Fixing

Revision history for this message

Alberto Aguirre (albaguirre) wrote on 2014-02-14:

> > Remove the use of asyncs to dispatch file writes; no real reason to launch
> one as the main loop just synchronizes it with a wait anyway.
>
> Actually there is a reason: with an async file write, we can swap the
> screencast buffer and write to file concurrently, which depending on the
> actual operation timings and IO blocking, helps improve the capture frame rate
> (so, iteration_time = max(write_time, swap_time) instead of iteration_time =
> write_time + swap_time).
>
Aahh, very true. Updated commit/description.

> > std::ofstream videoFile(ss.str());
>
> We should use underscores in variable names:
> http://unity.ubuntu.com/mir/cppguide/index.html#Variable_Names
>

Fixed now.

> > glGetIntegerv(GL_IMPLEMENTATION_COLOR_READ_FORMAT, &format);
>
> GL_IMPLEMENTATION_COLOR_READ_FORMAT is meaningful only in combination with
> GL_IMPLEMENTATION_COLOR_READ_TYPE (i.e., we can't assume per the standard that
> GL_IMPLEMENTATION_COLOR_READ_TYPE => GL_UNSIGNED_BYTE, although it's often the
> case). Since we currently want support for either RGBA8888 or BGRA8888 frames,
> we should check that GL_IMPLEMENTATION_COLOR_READ_TYPE == GL_UNSIGNED_BYTE and
> GL_IMPLEMENTATION_COLOR_READ_FORMAT == BGRA and only then return it as valid,
> otherwise return the guaranteed default RGBA.
>
> Note that when implementing surface snapshots I found that using the
> GL_IMPLEMENTATION_COLOR_READ_FORMAT/TYPE was unreliable (e.g., using the
> returned values failed depending on the underlying buffer type), although the
> situation was more complex there, involving FBOs backed by textures backed by
> EGLImages. That's why I used the double glReadPixels scheme, which was also
> used here.

Ok. I implemented these suggestions at setup time. To make sure, I issue a 1-pixel glReadPixels to validate the read format and type.

Revision history for this message

Alexandros Frantzis (afrantzis) wrote on 2014-02-14:

56 + if (type == GL_UNSIGNED_BYTE)
57 + read_pixel_format = static_cast<GLenum>(format);

Better to use (type == GL_UNSIGNED_BYTE && format == GL_BGRA_EXT), for now at least, since we only support BGRA as an alternative, and platforms are allowed to return other types (e.g. GL_RGB). Of course, ideally we would support other types too.

Revision history for this message

Alberto Aguirre (albaguirre) wrote on 2014-02-14:

> 56 + if (type == GL_UNSIGNED_BYTE)
> 57 + read_pixel_format = static_cast<GLenum>(format);
>
> Better to use (type == GL_UNSIGNED_BYTE && format == GL_BGRA_EXT), for now at
> least, since we only support BGRA as an alternative, and platforms are allowed
> to return other types (e.g. GL_RGB). Of course, ideally we would support other
> types too.

Yeah, I just realized it could be another non-4-byte format. So I'll just do the
glReadPixels as before, but at setup time. See update.

Revision history for this message

Alexandros Frantzis (afrantzis) wrote on 2014-02-14:

Other than what I mentioned above, looks ok, so pre-approving in order not to block this.

There is a chance we will encounter similar issues as I did for surface snapshots, in which case we will need to try both formats (RGBA and implementation/BGRA) and take the one that works, but let's do this only if the need comes up.
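
If that need does come up, the "try each format and keep the one that works" idea might be sketched as follows. This is an assumption-laden illustration: first_working_format is a hypothetical helper, and the probe callback stands in for a 1-pixel glReadPixels followed by a glGetError check.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Returns the first candidate format for which probe() reports success,
// or fallback if none of them works. The probe is supplied by the caller;
// in real code it would issue a small glReadPixels and check glGetError().
inline std::uint32_t first_working_format(
    std::vector<std::uint32_t> const& candidates,
    std::function<bool(std::uint32_t)> const& probe,
    std::uint32_t fallback)
{
    for (auto const format : candidates)
    {
        if (probe(format))
            return format;
    }
    return fallback;
}
```

Keeping the probe injectable means the selection logic can be exercised without a live GL context.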

review: Approve

Revision history for this message

Alexandros Frantzis (afrantzis) wrote on 2014-02-14:

(Latest updates) Looks good.

review: Approve

Revision history for this message

PS Jenkins bot (ps-jenkins) wrote on 2014-02-14:

review: Needs Fixing (continuous-integration)

Revision history for this message

PS Jenkins bot (ps-jenkins) wrote on 2014-02-15:

Click here to trigger a rebuild:
http://s-jenkins.ubuntu-ci:8080/job/mir-team-mir-development-branch-ci/852/rebuild

review: Approve (continuous-integration)

Revision history for this message

PS Jenkins bot (ps-jenkins) wrote on 2014-02-15:

review: Needs Fixing (continuous-integration)

Revision history for this message

Alberto Aguirre (albaguirre) wrote on 2014-02-17:

I restored the async file write by using a different workaround.

Using a thread_local variable of enough size seems to push the egl/gl context TLS into a region where the future wait code does not overwrite it.
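
For readers unfamiliar with the mechanism, the sketch below shows only what a thread_local variable does: each thread gets its own copy, so touching it in the main thread forces an allocation in that thread's TLS. It does not reproduce the libhybris problem; tls_pad and read_pad_on_other_thread are hypothetical names.

```cpp
#include <future>

// One zero-initialised copy of this array exists per thread.
thread_local int tls_pad[2];

// Reads tls_pad[0] from a freshly launched thread, which sees its own
// zero-initialised copy rather than the caller's.
inline int read_pad_on_other_thread()
{
    return std::async(std::launch::async, [] { return tls_pad[0]; }).get();
}
```

Writing to tls_pad in the calling thread leaves the value seen by read_pad_on_other_thread() unchanged, which is the per-thread storage the workaround relies on.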

Revision history for this message

PS Jenkins bot (ps-jenkins) wrote on 2014-02-17:

Click here to trigger a rebuild:
http://s-jenkins.ubuntu-ci:8080/job/mir-team-mir-development-branch-ci/853/rebuild

review: Needs Fixing (continuous-integration)

Revision history for this message

Alexandros Frantzis (afrantzis) wrote on 2014-02-17:

Failure not related to this branch, rebuilding.

(see https://launchpad.net/bugs/1281146)

Revision history for this message

Alexandros Frantzis (afrantzis) wrote on 2014-02-17:

> Failure not related to this branch, rebuilding.

Ah, hadn't seen the latest update :)

Revision history for this message

PS Jenkins bot (ps-jenkins) wrote on 2014-02-17:

Click here to trigger a rebuild:
http://s-jenkins.ubuntu-ci:8080/job/mir-team-mir-development-branch-ci/866/rebuild

review: Approve (continuous-integration)

Revision history for this message

Daniel van Vugt (vanvugt) wrote on 2014-02-18:

Hmm, usually I'd say wait for a second review+approval on all MPs. But in this case it's really just Alexandros who needed to check it.

Revision history for this message

PS Jenkins bot (ps-jenkins) wrote on 2014-02-18:

review: Needs Fixing (continuous-integration)

Revision history for this message

Daniel van Vugt (vanvugt) wrote on 2014-02-18:

Now landing is blocked by bug 1281145. We need that fix to land too :/

Revision history for this message

PS Jenkins bot (ps-jenkins) wrote on 2014-02-18:

review: Approve (continuous-integration)

Preview Diff

=== modified file 'src/utils/screencast.cpp'
--- src/utils/screencast.cpp	2014-02-06 14:44:00 +0000
+++ src/utils/screencast.cpp	2014-02-17 14:56:45 +0000
@@ -42,45 +42,23 @@
 volatile sig_atomic_t running = 1;
+//In android, waiting for a future is causing the gl/egl context to become invalid
+//possibly due to assumptions in libhybris/android linker.
+//A TLS allocation in the main thread is forced with this variable which seems to push
+//the gl/egl context TLS into a slot where the future wait code does not overwrite it.
+thread_local int tls_hack[2];
+
 void shutdown(int)
 {
     running = 0;
 }
-std::future<void> write_frame_to_file(
-    std::vector<char> const& frame_data, int frame_number, GLenum format)
-{
-    return std::async(
-        std::launch::async,
-        [&frame_data, frame_number, format]
-        {
-            std::stringstream ss;
-            ss << "/tmp/mir_" ;
-            ss.width(5);
-            ss.fill('0');
-            ss << frame_number;
-            ss << (format == GL_BGRA_EXT ? ".bgra" : ".rgba");
-            std::ofstream f(ss.str());
-            f.write(frame_data.data(), frame_data.size());
-        });
-}
-
-GLenum read_pixels(mir::geometry::Size const& size, void* buffer)
+void read_pixels(GLenum format, mir::geometry::Size const& size, void* buffer)
 {
     auto width = size.width.as_uint32_t();
     auto height = size.height.as_uint32_t();
-    GLenum format = GL_BGRA_EXT;
-
     glReadPixels(0, 0, width, height, format, GL_UNSIGNED_BYTE, buffer);
-
-    if (glGetError() != GL_NO_ERROR)
-    {
-        format = GL_RGBA;
-        glReadPixels(0, 0, width, height, format, GL_UNSIGNED_BYTE, buffer);
-    }
-
-    return format;
 }
@@ -184,6 +162,13 @@
         {
             throw std::runtime_error("Failed to make screencast surface current");
         }
+
+        uint32_t a_pixel;
+        glReadPixels(0, 0, 1, 1, GL_BGRA_EXT, GL_UNSIGNED_BYTE, &a_pixel);
+        if (glGetError() == GL_NO_ERROR)
+            read_pixel_format = GL_BGRA_EXT;
+        else
+            read_pixel_format = GL_RGBA;
     }
     ~EGLSetup()
@@ -200,10 +185,16 @@
             throw std::runtime_error("Failed to swap screencast surface buffers");
     }
+    GLenum pixel_read_format()
+    {
+        return read_pixel_format;
+    }
+
     EGLDisplay egl_display;
     EGLContext egl_context;
     EGLSurface egl_surface;
     EGLConfig egl_config;
+    GLenum read_pixel_format;
 };
 void do_screencast(MirConnection* connection, MirScreencast* screencast,
@@ -216,22 +207,30 @@
                                  frame_size.width.as_uint32_t() *
                                  frame_size.height.as_uint32_t();
-    int frame_number{0};
     std::vector<char> frame_data(frame_size_bytes, 0);
-    std::future<void> frame_written_future =
-        std::async(std::launch::deferred, []{});
     EGLSetup egl_setup{connection, screencast};
+    auto format = egl_setup.pixel_read_format();
+
+    std::stringstream ss;
+    ss << "/tmp/mir_screencast_" ;
+    ss << frame_size.width << "x" << frame_size.height;
+    ss << (format == GL_BGRA_EXT ? ".bgra" : ".rgba");
+    std::ofstream video_file(ss.str());
     while (running)
     {
-        frame_written_future.wait();
+        read_pixels(format, frame_size, frame_data.data());
-        auto format = read_pixels(frame_size, frame_data.data());
-        frame_written_future = write_frame_to_file(frame_data, frame_number, format);
+        auto write_out_future = std::async(
+                std::launch::async,
+                [&video_file, &frame_data] {
+                    video_file.write(frame_data.data(), frame_data.size());
+                });
         egl_setup.swap_buffers();
-        ++frame_number;
+
+        write_out_future.wait();
     }
 }
@@ -245,6 +244,9 @@
     char const* socket_file = nullptr;
     uint32_t output_id = mir_display_output_id_invalid;
+    //avoid unused warning/error
+    tls_hack[0] = 0;
+
     while ((arg = getopt (argc, argv, "hm:o:")) != -1)
     {
         switch (arg)