Code review comment for ~rafaeldtinoco/ubuntu/+source/qemu:lp1805256-bionic

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

Hey Dann,

This, per se, is not a change backed by an upstream change (not an entire change at least). The diff brought by you is part of a bigger commit:

----
Message-Id: <email address hidden>
Signed-off-by: Fam Zheng <email address hidden>
Signed-off-by: Rafael David Tinoco <email address hidden>

Author: Paolo Bonzini <email address hidden>
Origin: upstream, https://github.com/qemu/qemu/commit/b37548fcd
Bug: https://launchpad.net/bugs/1805256
Bug-Ubuntu: https://launchpad.net/bugs/1805256
Reviewed-By: Rafael David Tinoco <email address hidden>
Last-Update: 2020-05-07
----

also added, by me, to the Bionic SRU, just like all further SRUs, with the following backport note:

    [Rafael David Tinoco]

    I have cherry-picked this fix because of the AIO primitives fix
    dependency, nevertheless, it is good to notice it fixes the case
    above as well. Per se it would be difficult to SRU this, but,
    since we're stressing AIO mechanism with the other Aarch64 needed
    fix, this one fits the purpose as well.

and the filename:

debian/patches/ubuntu/lp-1805256-aio-Do-aio_notify_accept-only-during-blocking.patch

containing:

https://paste.ubuntu.com/p/hFb3kzJx3f/

and the *most important* part of the code change you referred to.

----

The thing is.. changes to AIO code can make the issue to appear or disappear and/or make it more or less intermittent than it was, depending on how the cache line is being used by the binary object during its execution (specially in the case where the 2 variables that need syncing are sharing the same cache line <- which is hardly ever the truth).

It's hard to consider something fixed because of a change in binaries not backed by upstream in this case (remember when we added a local - to a function - variable and caused the issue to temporarily disappear because of diff cache lines having the 2 addresses that had to be synced ?).

Looks like there are no available TX2 machines for me to play with: I provisioned one and after a bit of work there (couldn't reproduce) I realized it wasn't a TX2 =\. I'll keep it there so if I can get a TX2 this week I can just move my files from one machine to another.

----

Current Status in my POV:

- Either we fixed the issue and the patch above is causing the same regression for the version we have in Bionic (which seems less likely)

- Or the issue is still there but being reproduced in a few builds only (like the one we have in my PPA) and not in others (like the one you had without the patch above).

----

Id like to have access to TX2 machine so I can test the same build with and without commit
lp-1805256-aio-Do-aio_notify_accept-only-during-blocking.patch in Bionic and check if I can reproduce the issue and also do some objdumps in qemu.

It could be the case that we did not fix any version but we can only reproduce the issue in Bionic, for some reason, for example. Then I would have to get back to Paolo showing that by using a dump analysis to back us up.

Thoughts ?

« Back to merge proposal