~ubuntu-support-team/binutils/+git/binutils-gdb:users/palves/detach-step-over

Last commit made on 2021-01-13
Get this branch:
git clone -b users/palves/detach-step-over https://git.launchpad.net/~ubuntu-support-team/binutils/+git/binutils-gdb

Branch merges

Branch information

Name:
users/palves/detach-step-over
Repository:
lp:~ubuntu-support-team/binutils/+git/binutils-gdb

Recent commits

b5d7ccb... by Pedro Alves <email address hidden>

Testcase for detaching while stepping over breakpoint

This adds a testcase that exercises detaching while GDB is stepping
over a breakpoint, in all combinations of:

  - maint target non-stop off/on
  - set non-stop on/off
  - displaced stepping on/off

This exercises the bugs fixed in the previous 8 patches.

gdb/testsuite/ChangeLog:

 * gdb.threads/detach-step-over.c: New file.
 * gdb.threads/detach-step-over.exp: New file.

350909b... by Pedro Alves <email address hidden>

detach in all-stop with threads running

A following patch will add a testcase that has a number of threads
constantly stepping over a breakpoint, and then has GDB detach the
process, while threads are running. If we have more then one inferior
running, and we detach from just one of the inferiors, we expect that
the remaining inferior continues running. However, in all-stop, if
GDB needs to pause the target for the detach, nothing is re-resuming
the other inferiors after the detach. "info threads" shows the
threads as running, but they really aren't. This fixes it.

gdb/ChangeLog:

 * infcmd.c (detach_command): Hold strong reference to target, and
 if all-stop on entry, restart threads on exit.
 * infrun.c (switch_back_to_stepped_thread): Factor out bits to ...
 (restart_stepped_thread): ... this new function. Also handle
 trap_expected.
 (restart_after_all_stop_detach): New function.
 * infrun.h (restart_after_all_stop_detach): Declare.

4874985... by Pedro Alves <email address hidden>

detach with in-line step over in progress

A following patch will add a testcase that has a number of threads
constantly stepping over a breakpoint, and then has GDB detach the
process. That testcase exercises both "set displaced-stepping
on/off". Testing with "set displaced-stepping off" reveals that GDB
does not handle the case of the user typing "detach" just while some
thread is in the middle of an in-line step over. If that thread
belongs to the inferior that is being detached, then the step-over
never finishes, and threads of other inferiors are never re-resumed.
This fixes it.

gdb/ChangeLog:

 * infrun.c (struct step_over_info): Initialize fields.
 (prepare_for_detach): Handle ongoing in-line step over.

bae1a66... by Pedro Alves <email address hidden>

detach and breakpoint removal

A following patch will add a testcase that has a number of threads
constantly stepping over a breakpoint, and then has GDB detach the
process. That testcase sometimes fails with the inferior crashing
with SIGTRAP after the detach because of the bug fixed by this patch,
when tested with the native target.

The problem is that target_detach removes breakpoints from the target
immediately, and that does not work with the native GNU/Linux target
(and probably no other native target) currently. The test wouldn't
fail with this issue when testing against gdbserver, because gdbserver
does allow accessing memory while the current thread is running, by
transparently pausing all threads temporarily, without GDB noticing.
Implementing that in gdbserver was a lot of work, so I'm not looking
forward right now to do the same in the native target. Instead, I
came up with a simpler solution -- push the breakpoints removal down
to the targets. The Linux target conveniently already pauses all
threads before detaching them, since PTRACE_DETACH only works with
stopped threads, so we move removing breakpoints to after that. Only
the remote and GNU/Linux targets support support async execution, so
no other target should really need this.

gdb/ChangeLog:

 * linux-nat.c (linux_nat_target::detach): Remove breakpoints
 here...
 * remote.c (remote_target::remote_detach_1): ... and here ...
 * target.c (target_detach): ... instead of here.

80d8b6d... by Pedro Alves <email address hidden>

prepare_for_detach and ongoing displaced stepping

I noticed that "detach" while a program was running sometimes resulted
in the process crashing. I tracked it down to this change to
prepare_for_detach in commit 187b041e ("gdb: move displaced stepping
logic to gdbarch, allow starting concurrent displaced steps"):

    /* Is any thread of this process displaced stepping? If not,
       there's nothing else to do. */
 - if (displaced->step_thread == nullptr)
 + if (displaced_step_in_progress (inf))
      return;

The problem above is that the condition was inadvertently flipped. It
should have been:

   if (!displaced_step_in_progress (inf))

So I fixed it, and wrote a testcase to exercise it. The testcase has
a number of threads constantly stepping over a breakpoint, and then
GDB detaches the process, while threads are running and stepping over
the breakpoint. And then I was surprised that my testcase would hang
-- GDB would get stuck in an infinite loop in prepare_for_detach,
here:

  while (displaced_step_in_progress (inf))
    {
      ...

What is going on is that since we now have two displaced stepping
buffers, as one displaced step finishes, GDB starts another, and
there's another one already in progress, and on and on, so the
displaced_step_in_progress condition never turns false. This happens
because we go via the whole handle_inferior_event, which tries to
start new step overs when one finishes. And also because while we
remove breakpoints from the target before prepare_for_detach is
called, handle_inferior_event ends up calling insert_breakpoints via
e.g. keep_going.

Thinking through all this, I came to the conclusion that going through
the whole handle_inferior_event isn't ideal. A _lot_ is done by that
function, e.g., some thread may get a signal which is passed to the
inferior, and gdb decides to try to get over the signal handler, which
reinstalls breakpoints. Or some process may exit. We can end up
reporting these events via normal_stop while detaching, maybe end up
running some breakpoint commands, or maybe even something runs an
inferior function call. Etc. All this after the user has already
declared they don't want to debug the process anymore, by asking to
detach.

I came to the conclusion that it's better to do the minimal amount of
work possible, in a more controlled fashion, without going through
handle_inferior_event. So in the new approach implemented by this
patch, if there are threads of the inferior that we're detaching in
the middle of a displaced step, stop them, and cancel the displaced
step. This is basically what stop_all_threads already does, via
wait_one and (the now factored out) handle_one, so I'm reusing those.

gdb/ChangeLog:

 * infrun.c (struct wait_one_event): Move higher up.
 (prepare_for_detach): Abort in-progress displaced steps instead of
 letting them complete.
 (handle_one): If the inferior is detaching, don't add the thread
 back to the global step-over chain.
 (restart_threads): Don't restart threads if detaching.
 (handle_signal_stop): Remove inferior::detaching reference.

0dc1231... by Pedro Alves <email address hidden>

prepare_for_detach: don't release scoped_restore at the end

After detaching from a process, the inf->detaching flag is
inadvertently left set to true. If you afterwards reuse the same
inferior to start a new process, GDB will mishave...

The problem is that prepare_for_detach discards the scoped_restore at
the end, while the intention is for the flag to be set only for the
duration of prepare_for_detach.

This was already a bug in the original commit that added
prepare_for_detach, commit 24291992dac3 ("PR gdb/11321"), by yours
truly. Back then, we still used cleanups, and the function called
discard_cleanups instead of do_cleanups, by mistake.

gdb/ChangeLog:

 * infrun.c (prepare_for_detach): Don't release scoped_restore at
 the end.

f49e903... by Pedro Alves <email address hidden>

Factor out after-stop event handling code from stop_all_threads

This moves the code handling an event out of wait_one to a separate
function, to be used in another context in a following patch.

gdb/ChangeLog:

 * infrun.c (handle_one): New function, factored out from ...
 (stop_all_threads): ... here.

5e78777... by Pedro Alves <email address hidden>

gdbserver: spurious SIGTRAP w/ detach while step-over in progress

A following patch will add a new testcase that has two processes, each
with a number of threads constantly tripping a breakpoint and stepping
over it, because the breakpoint has a condition that evals false.
Then GDB detaches from one of the processes, while both processes are
running. And then the testcase sends a SIGUSR1 to the other process.

When run against gdbserver, that would occasionaly fail like this:

 (gdb) PASS: gdb.threads/detach-step-over.exp: iter 1: detach
 Executing on target: kill -SIGUSR1 208303 (timeout = 300)
 spawn -ignore SIGHUP kill -SIGUSR1 208303

 Thread 2.5 "detach-step-ove" received signal SIGTRAP, Trace/breakpoint trap.
 [Switching to Thread 208303.208305]
 0x000055555555522a in thread_func (arg=0x0) at /home/pedro/gdb/binutils-gdb/src/gdb/testsuite/gdb.threads/detach-step-over.c:54
 54 counter++; /* Set breakpoint here. */

Note that it's gdbserver itself that steps over the breakpoint.

The gdbserver logs reveal what happened:

 - GDB manages to detach while a step over is in progress. That reaches
   linux_process_target::complete_ongoing_step_over(), which does:

      /* Passing NULL_PTID as filter indicates we want all events to
  be left pending. Eventually this returns when there are no
  unwaited-for children left. */
      ret = wait_for_event_filtered (minus_one_ptid, null_ptid, &wstat,
         __WALL);

   As the comment say, this leaves all events pending, _including_ the
   just finished step SIGTRAP. We never discard that SIGTRAP. So
   GDBserver reports the SIGTRAP to GDB. GDB can't explain the
   SIGTRAP, so it reports it to the user.

The gdbserver log looks like this. The LWP of interest is 208305:

 Need step over [LWP 208305]? yes, found breakpoint at 0x555555555227
 proceed_all_lwps: found thread 208305 needing a step-over
 Starting step-over on LWP 208305. Stopping all threads

208305 starts a step-over.

 >>>> entering void linux_process_target::stop_all_lwps(int, lwp_info*)
 stop_all_lwps (stop-and-suspend, except=LWP 208303.208305)
 Sending sigstop to lwp 208303
 Sending sigstop to lwp 207755
 wait_for_sigstop: pulling events
 LWFE: waitpid(-1, ...) returned 207755, ERRNO-OK
 LLW: waitpid 207755 received Stopped (signal) (stopped)
 pc is 0x7f7e045593bf
 Expected stop.
 LLW: SIGSTOP caught for LWP 207755.207755 while stopping threads.
 LWFE: waitpid(-1, ...) returned 208303, ERRNO-OK
 LLW: waitpid 208303 received Stopped (signal) (stopped)
 pc is 0x7ffff7e743bf
 Expected stop.
 LLW: SIGSTOP caught for LWP 208303.208303 while stopping threads.
 LWFE: waitpid(-1, ...) returned 0, ERRNO-OK
 leader_pid=208303, leader_lp!=NULL=1, num_lwps=11, zombie=0
 leader_pid=207755, leader_lp!=NULL=1, num_lwps=11, zombie=0
 LLW: exit (no unwaited-for LWP)
 stop_all_lwps done, setting stopping_threads back to !stopping
 <<<< exiting void linux_process_target::stop_all_lwps(int, lwp_info*)
 Done stopping all threads for step-over.
 pc is 0x555555555227
 Writing 8b to 0x555555555227 in process 208305
 Could not findsigchld_handler
  fast tracepoint jump at 0x555555555227 in list (uninserting).
   pending reinsert at 0x555555555227
   step from pc 0x555555555227
 Resuming lwp 208305 (step, signal 0, stop expected)
 <<<< exiting ptid_t linux_process_target::wait_1(ptid_t, target_waitstatus*, target_wait_flags)
 handling possible serial event
 getpkt ("D;32b8b"); [no ack sent]

The detach request arrives.

 sigchld_handler
 Tracing is already off, ignoring
 detach: step over in progress, finish it first

gdbserver realizes a step over for 208305 was in progress, let's it
finish.

 LWFE: waitpid(-1, ...) returned 208305, ERRNO-OK
 LLW: waitpid 208305 received Stopped (signal) (stopped)
 pc is 0x555555555227
 Expected stop.
 LLW: step LWP 208303.208305, 0, 0 (discard delayed SIGSTOP)
   pending reinsert at 0x555555555227
   step from pc 0x555555555227
 Resuming lwp 208305 (step, signal 0, stop not expected)
 LWFE: waitpid(-1, ...) returned 0, ERRNO-OK
 leader_pid=208303, leader_lp!=NULL=1, num_lwps=11, zombie=0
 leader_pid=207755, leader_lp!=NULL=1, num_lwps=11, zombie=0
 sigsuspend'ing
 LWFE: waitpid(-1, ...) returned 208305, ERRNO-OK
 LLW: waitpid 208305 received Trace/breakpoint trap (stopped)
 pc is 0x55555555522a
 CSBB: LWP 208303.208305 stopped by trace
 LWFE: waitpid(-1, ...) returned 0, ERRNO-OK
 leader_pid=208303, leader_lp!=NULL=1, num_lwps=11, zombie=0
 leader_pid=207755, leader_lp!=NULL=1, num_lwps=11, zombie=0
 LLW: exit (no unwaited-for LWP)
 Finished step over.

The step-over for 208305 finishes.

 Writing cc to 0x555555555227 in process 208305
 Could not find fast tracepoint jump at 0x555555555227 in list (reinserting).
 >>>> entering void linux_process_target::stop_all_lwps(int, lwp_info*)
 stop_all_lwps (stop, except=none)
 wait_for_sigstop: pulling events

The detach proceeds (snipped).

...

 proceed_one_lwp: lwp 208305
    LWP 208305 has pending status, leaving stopped

Later on, 208305 has a pending status (the step SIGTRAP from the
step-over), so GDBserver starts the process of reporting it.

...

 wait_1 ret = LWP 208303.208305, 1, 5
 <<<< exiting ptid_t linux_process_target::wait_1(ptid_t, target_waitstatus*, target_wait_flags)

...

and eventually GDB receives the stop notification (T05 == SIGTRAP):

 getpkt ("vStopped"); [no ack sent]
 sigchld_handler
 vStopped: acking 3
 Writing resume reply for LWP 208303.208305:1
 putpkt ("$T0506:f0ee58f7ff7f0* ;07:f0ee58f7ff7f0* ;10:2a525*"550* ;thread:p32daf.32db1;core:c;#37"); [noack mode]

From the GDB side, we see:

 [infrun] fetch_inferior_event: enter
   [infrun] fetch_inferior_event: fetch_inferior_event enter
   [infrun] do_target_wait: Found 2 inferiors, starting at #1
   [infrun] print_target_wait_results: target_wait (-1.0.0 [process -1], status) =
   [infrun] print_target_wait_results: 208303.208305.0 [Thread 208303.208305],
   [infrun] print_target_wait_results: status->kind = stopped, signal = GDB_SIGNAL_TRAP
   [infrun] handle_inferior_event: status->kind = stopped, signal = GDB_SIGNAL_TRAP
   [infrun] start_step_over: enter
     [infrun] start_step_over: stealing global queue of threads to step, length = 6
     [infrun] operator(): putting back 6 threads to step in global queue
   [infrun] start_step_over: exit
   [infrun] handle_signal_stop: context switch
   [infrun] context_switch: Switching context from process 0 to Thread 208303.208305
   [infrun] handle_signal_stop: stop_pc=0x55555555522a
   [infrun] handle_signal_stop: random signal (GDB_SIGNAL_TRAP)
   [infrun] stop_waiting: stop_waiting
   [infrun] stop_all_threads: starting

The fix is to discard the step SIGTRAP, unless GDB wanted the thread
to step.

gdbserver/ChangeLog:

 * linux-low.cc (linux_process_target::complete_ongoing_step_over):
 Discard step SIGTRAP, unless GDB wanted the thread to step.

8189abc... by Pedro Alves <email address hidden>

Fix a couple vStopped pending ack bugs

A following patch will add a testcase that has two processes with
threads stepping over a breakpoint continuously, and then detaches
from one of the processes while threads are running. The other
process continues stepping over its breakpoint. And then the testcase
sends a SIGUSR1, expecting that GDB reports it. That would sometimes
hang against gdbserver, due to the bugs fixed here. Both bugs are
related, in that they're about remote protocol asynchronous Stop
notifications. There's a bug in GDB, and another in GDBserver.

The GDB bug:

- when we detach from a process, the remote target discards any
  pending RSP notification related to that process, including the
  in-flight, yet-unacked notification. Discarding the in-flight
  notification is the problem. Until the in-flight notification is
  acked with a vStopped packet, the server won't send another %Stop
  notification. As a result, the debug session gets messed up. In
  the new testcase's case, GDB would hang inside stop_all_threads,
  waiting for a stop for one of the process'es threads, which never
  arrived -- its stop reply was permanently stuck in the stop reply
  queue, waiting for a vStopped packet that never arrived.

The GDBserver bug:

  GDBserver has the opposite bug. It also discards notifications for
  the process being detached. If that discards the head of the
  notification queue, when gdb sends an ack, it ends up acking the
  _next_ notification. Meaning, gdb loses one notification. In the
  testcase, this results in a similar hang in stop_all_threads.

So we have two very similar bugs in GDB and GDBserver, both resulting
in a similar symptom. That's why I'm fixing them both at the same
time.

gdb/ChangeLog:

 * remote.c (remote_notif_stop_ack): Don't error out on
 TARGET_WAITKIND_IGNORE; instead, just ignore the notification.
 (remote_target::discard_pending_stop_replies): Don't delete
 in-flight notification; instead, clear its contents.

gdbserver/ChangeLog:

 * server.cc (discard_queued_stop_replies): Don't ever discard the
 notification at the head of the list.

87183c0... by Pedro Alves <email address hidden>

Testcase for attaching in non-stop mode

This adds a testcase exercising attaching to a multi-threaded process,
in all combinations of:

  - set non-stop on/off
  - maint target non-stop off/on
  - "attach" vs "attach &"

This exercises the bugs fixed in the two previous patches.

gdb/testsuite/ChangeLog:

 * gdb.threads/attach-non-stop.c: New file.
 * gdb.threads/attach-non-stop.exp: New file.