Comment 21 for bug 1994002

Revision history for this message
Mauricio Faria de Oliveira (mfo) wrote :

These are the steps for a synthetic reproducer with GDB and QEMU.

It's sufficient to validate the change on QEMU and move forward with SRUs.
(We can add libvirt for reaching the error message too, but it's a plus.)

I'll check/coordinate on the QEMU uploads next week.

cheers,
Mauricio

...

Original code.

The race condition is, one thread can read 's->state' as SETUP (line 1078/1083/1086),
and another thread changes it to ACTIVE _before_ this thread assigns (the different)
's->state' to 'info->status' (line 1123), which will NOT have RAM info expected for
the ACTIVE status (line 1087/1096).

 1056 static void fill_source_migration_info(MigrationInfo *info)
 1057 {
 1058 MigrationState *s = migrate_get_current();
 ...
 1078 switch (s->state) {
 ...
 1083 case MIGRATION_STATUS_SETUP:
 ...
 1086 break;
 1087 case MIGRATION_STATUS_ACTIVE:
 ...
 1096 populate_ram_info(info, s);
 ...
 1123 info->status = s->state;
 1124 }

We'll break on this function that changes migration states.

 void migrate_set_state(int *state, int old_state, int new_state)

Terminal 1)

QEMU process for incoming migration (ie, destination)

qemu-system-x86_64 -nodefaults -nographic -S -incoming tcp:0:4444

Terminal 2)

QEMU process for outgoing migration (ie, source)
with QEMU monitor on port 3333,
with GDB in non-stop threads mode
(and debuginfod for debug symbols),

gdb \
  -ex 'set non-stop on' -ex 'set pagination off' -ex 'set confirm off' \
  -iex 'set debuginfod enabled on' -iex 'set debuginfod urls "https://debuginfod.ubuntu.com"' \
  qemu-system-x86_64

Set breakpoints on the functions above
(and line number of the 'break' statement under 'case MIGRATION_STATUS_SETUP'):

b migrate_set_state
b migration/migration.c:1086

run -nodefaults -nographic -S -monitor tcp:0:3333,server,wait=off
...

Terminal 3)

Connect to QEMU monitor with netcat,
and start the the migration:

nc 127.0.0.1 3333
...
(qemu) migrate -d tcp:127.0.0.1:4444

Terminal 2)

GDB breaks on change from migration status NONE to SETUP.

Thread 1 "qemu-system-x86" hit Breakpoint 1, migrate_set_state (state=0x5555566f8628, old_state=0, new_state=1) at ../../migration/migration.c:1746

(gdb) p (MigrationStatus) 0
$1 = MIGRATION_STATUS_NONE

(gdb) p (MigrationStatus) 1
$2 = MIGRATION_STATUS_SETUP

Just continue.

(gdb) c

GDB breaks on change from migration status SETUP to ACTIVE.

Thread 5 "qemu-system-x86" hit Breakpoint 1, migrate_set_state (state=0x5555566f8628, old_state=1, new_state=4) at ../../migration/migration.c:1746

(gdb) p (MigrationStatus) 1
$2 = MIGRATION_STATUS_SETUP

(gdb) p (MigrationStatus) 4
$3 = MIGRATION_STATUS_ACTIVE

Let's explore the race condition.

Terminal 2)

Check the migration information:

(qemu) info migrate

Terminal 3)

GDB breaks on the migration information function.

The status is now observed as SETUP (not yet ACTIVE),
and is not yet propagated to the migration information
to be returned to the monitor.

Thread 1 "qemu-system-x86" hit Breakpoint 2, fill_source_migration_info (info=0x555556d65c70) at ../../migration/migration.c:1086

(gdb) p (MigrationStatus) s.state
$4 = MIGRATION_STATUS_SETUP

(gdb) p info.status
$5 = MIGRATION_STATUS_NONE

Now, allow the other thread to continue and change status to ACTIVE.

(gdb) info threads
  Id Target Id Frame
* 1 Thread 0x7ffff6c21e80 (LWP 995) "qemu-system-x86" fill_source_migration_info (info=0x555556d65c70) at ../../migration/migration.c:1123
  2 Thread 0x7ffff65ff640 (LWP 998) "qemu-system-x86" (running)
  3 Thread 0x7ffff5d7c640 (LWP 999) "qemu-system-x86" (running)
  5 Thread 0x7ffff49ff640 (LWP 1002) "qemu-system-x86" migrate_set_state (state=0x5555566f8628, old_state=1, new_state=4) at ../../migration/migration.c:1746

(gdb) thread 5
[Switching to thread 5 (Thread 0x7ffff49ff640 (LWP 1002))]
#0 migrate_set_state (state=0x5555566f8628, old_state=1, new_state=4) at ../../migration/migration.c:1746
1746 in ../../migration/migration.c

(gdb) continue &
Continuing.

The first thread now observes the ACTIVE status.

(gdb) info threads
  Id Target Id Frame
  1 Thread 0x7ffff6c21e80 (LWP 995) "qemu-system-x86" fill_source_migration_info (info=0x555556d65c70) at ../../migration/migration.c:1123
  2 Thread 0x7ffff65ff640 (LWP 998) "qemu-system-x86" (running)
  3 Thread 0x7ffff5d7c640 (LWP 999) "qemu-system-x86" (running)
* 5 Thread 0x7ffff49ff640 (LWP 1002) "qemu-system-x86" (running)
(gdb)

(gdb) thread 1
[Switching to thread 1 (Thread 0x7ffff6c21e80 (LWP 995))]
#0 fill_source_migration_info (info=0x555556d65c70) at ../../migration/migration.c:1123
1123 in ../../migration/migration.c

(gdb) p (MigrationStatus) s.state
$7 = MIGRATION_STATUS_ACTIVE

Done, the next statement in the code assigns the ACTIVE
status to the migration information, but it did NOT add
any RAM statistics (as it executed for the SETUP status).

Just continue, and check the resulting migration info:

(gdb) c

Terminal 2)
... info migrate
...
Migration status: active
total time: 0 ms
(qemu)

Migration status is active, without any RAM statistics.

...
...
...

Note that, normally, without exploring the race condition,
the status is 'active' and there _is_ memory statistics:

[just a summary from gdb/monitor steps]

...
(qemu) migrate -d tcp:127.0.0.1:4444

...
Thread 1 "qemu-system-x86" hit Breakpoint 1, migrate_set_state (state=0x5555566f8628, old_state=0, new_state=1) at ../../migration/migration.c:1746
(gdb) c
Thread 5 "qemu-system-x86" hit Breakpoint 1, migrate_set_state (state=0x5555566f8628, old_state=1, new_state=4) at ../../migration/migration.c:1746
(gdb) thread 5
(gdb) c
Thread 5 "qemu-system-x86" hit Breakpoint 1, migrate_set_state (state=0x5555566f8628, old_state=4, new_state=8) at ../../migration/migration.c:1746

(qemu) info migrate
...
Migration status: active
total time: 41387 ms
expected downtime: 300 ms
setup: 22590 ms
transferred ram: 485 kbytes
throughput: 0.18 mbps
remaining ram: 0 kbytes
total ram: 131592 kbytes
duplicate: 32849 pages
skipped: 0 pages
normal: 49 pages
normal bytes: 196 kbytes
dirty sync count: 3
page size: 4 kbytes
multifd bytes: 0 kbytes
pages-per-second: 1453
(qemu)

(gdb) c
...