Mir

Protocol (buffer) replies are being sent from the compositor thread, holding up the render loop

Bug #1395421 reported by Daniel van Vugt
18
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Mir
Fix Released
Medium
Chris Halse Rogers

Bug Description

Protocol (exchange_buffer) replies are being sent from the compositor thread, holding up the render loop.

Attached is a Google Profiler graph which shows the issue. It's fair to say that on a low-end host blocking the compositor loop to do significant protocol work is going to cause premature frame skipping.

Tags: performance

Related branches

Revision history for this message
Daniel van Vugt (vanvugt) wrote :
description: updated
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

On the other hand, if we optimize our protocol code then it shouldn't matter what thread does the work in future.

Revision history for this message
Daniel van Vugt (vanvugt) wrote :

I think the core of the issue is this synchronous socket write. Not only does the render loop get held up for one write() per client, but the OS is likely to context switch in that time too. Last time I measured this, it took around 1ms per client, which is a bit scary if you want to run multiple clients and still have time to composite them all within 16ms...

void mfd::SocketMessenger::send(char const* data, size_t length, FdSets const& fd_set)
{
    static size_t const header_size{2};
    mir::VariableLengthArray<mf::serialization_buffer_size> whole_message{header_size + length};

    whole_message.data()[0] = static_cast<unsigned char>((length >> 8) & 0xff);
    whole_message.data()[1] = static_cast<unsigned char>((length >> 0) & 0xff);
    std::copy(data, data + length, whole_message.data() + header_size);

    std::unique_lock<std::mutex> lg(message_lock);

    // TODO: This should be asynchronous, but we are not making sure
    // that a potential call to send_fds is executed _after_ this
    // function has completed (if it would be executed asynchronously.
    // NOTE: we rely on this synchronous behavior as per the comment in
    // mf::SessionMediator::create_surface
    ba::write(*socket, ba::buffer(whole_message.data(), whole_message.size()));

    for (auto const& fds : fd_set)
        mir::send_fds(socket_fd, fds);
}

Revision history for this message
PS Jenkins bot (ps-jenkins) wrote :

Fix committed into lp:mir at revision None, scheduled for release in mir, milestone 0.16.0

Changed in mir:
status: Triaged → Fix Committed
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

The branch that landed was really just a workaround that made things less bad :)

Changed in mir:
status: Fix Committed → Triaged
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Another observation that's probably just this bug again:

On my desktop I can start 50 clients and they all run smooth. Compositor render time in the worst case never exceeds 4ms. However before I reach 60 clients they all stutter while the total render time is still much less than half a frame. It appears that I'm limited by the number of send()'s I can do sequentially per frame on a single thread rather than being limited by graphics performance. We desperately need asynchronous send support.

Revision history for this message
Mir CI Bot (mir-ci-bot) wrote :

Fix committed into lp:mir at revision None, scheduled for release in mir, milestone 0.25.0

Changed in mir:
status: Triaged → Fix Committed
Changed in mir:
status: Fix Committed → Triaged
Revision history for this message
Daniel van Vugt (vanvugt) wrote :

Here's a fresh profile from the latest lp:mir code (using NBS now).

It shows much the same problem as essentially 10% of the server's real time is spent under clean_onscreen_buffers sending buffers to clients via synchronous socket writes.

Although you need to keep in mind this profile is exaggerated with 49 clients rendering simultaneously. So the fact that protocol IO is taking around 50% of Mir's time and the other 50% rendering is not interesting. What is interesting is the fact that 20% of the renderer's time is still spent in IPC sending buffer replies when it should be doing rendering.

Changed in mir:
importance: High → Medium
summary: - Protocol (exchange_buffer) replies are being sent from the compositor
- thread, holding up the render loop
+ Protocol (buffer) replies are being sent from the compositor thread,
+ holding up the render loop
Changed in mir:
milestone: none → 0.26.0
Changed in mir:
milestone: 0.26.0 → none
Changed in mir:
assignee: nobody → Chris Halse Rogers (raof)
status: Triaged → In Progress
milestone: none → 0.28.0
Revision history for this message
Mir CI Bot (mir-ci-bot) wrote :

Fix committed into lp:mir at revision None, scheduled for release in mir, milestone 1.0.0

Changed in mir:
status: In Progress → Fix Committed
Changed in mir:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.