~ubuntu-support-team/binutils/+git/binutils-gdb:users/aburgess/bp-inferior-calls

Last commit made on 2023-01-25
Get this branch:
git clone -b users/aburgess/bp-inferior-calls https://git.launchpad.net/~ubuntu-support-team/binutils/+git/binutils-gdb

Branch merges

Branch information

Name:
users/aburgess/bp-inferior-calls
Repository:
lp:~ubuntu-support-team/binutils/+git/binutils-gdb

Recent commits

3c2dc6d... by Andrew Burgess <email address hidden>

gdb: rename unwindonsignal to unwind-on-signal

We now have unwind-on-timeout and unwind-on-terminating-exception, and
then the odd one out unwindonsignal.

I'm not a great fan of these squashed together command names, so in
this commit I propose renaming this to unwind-on-signal.

Obviously I've added the hidden alias unwindonsignal so any existing
GDB scripts will keep working.

There's one test that I've extended to test the alias works, but in
most of the other test scripts I've changed over to use the new name.

The docs are updated to reference the new name.

c7d08e6... by Andrew Burgess <email address hidden>

gdb: introduce unwind-on-timeout setting

Now that inferior function calls can timeout (see the recent
introduction of direct-call-timeout and indirect-call-timeout), this
commit adds a new setting unwind-on-timeout.

This new setting is just like the existing unwindonsignal and
unwind-on-terminating-exception, but the new setting will cause GDB to
unwind the stack if an inferior function call times out.

The existing inferior function call timeout tests have been updated to
cover the new setting.

7a8cf75... by Andrew Burgess <email address hidden>

gdb/remote: avoid SIGINT after calling remote_target::stop

Currently, if the remote target is not running in non-stop mode, then,
when GDB calls remote_target::stop, we end up sending an interrupt
packet \x03 to the remote target.

If the user interrupts the inferior from the GDB prompt (e.g. by
typing Ctrl-c), then GDB calls remote_target::interrupt, which also
ends up sending the interrupt packet.

The problem here is that both of these mechanisms end up sending the
interrupt packet, which means, when the target stops with a SIGINT,
and this is reported back to GDB, we have no choice but to report this
to the user as a SIGINT stop event.

Now maybe this is the correct thing to do, after all the target has
been stopped with SIGINT. However, this leads to an unfortunate
change in behaviour.

When running in non-stop mode, and remote_target::stop is called, the
target will be stopped with a vCont packet, and this stop is then
reported back to GDB as GDB_SIGNAL_0, this will cause GDB to print a
message like:

  Program stopped.

Or:

  Thread NN "binary name" stopped.

In contrast, when non-stop mode is off, we get messages like:

  Program received SIGINT, Segmentation fault.

Or:

  Thread NN "binary name" received SIGINT, Segmentation fault.

In this commit I propose a mechanism where we can track that a stop
has been requested for a particular thread through
remote_target::stop, then, when the stop arrives, we can convert the
SIGINT to a GDB_SIGNAL_0. With this done GDB will now display the
"stopped" based messages rather than the "received SIGINT" messages.

Two of the tests added in the previous commit exposed this issue. In
the previous commit the tests looked for either of the above
patterns. In this commit I've updated these tests to only look for
the "stopped" based messages.

8e62567... by Andrew Burgess <email address hidden>

gdb: add timeouts for inferior function calls

In the previous commits I have been working on improving inferior
function call support. One thing that worries me about using inferior
function calls from a conditional breakpoint is: what happens if the
inferior function call fails?

If the failure is obvious, e.g. the thread performing the call
crashes, or hits a breakpoint, then this case is already well handled,
and the error is reported to the user.

But what if the thread performing the inferior call just deadlocks?
If the user made the call from a 'print' or 'call' command, then the
user might have some expectation of when the function call should
complete, and, when this time limit is exceeded, the user
will (hopefully) interrupt GDB and regain control of the debug
session.

But, when the inferior function call is from a breakpoint condition it
is much harder to understand that GDB is deadlocked within an inferior
call. Maybe the breakpoint hasn't been hit yet? Or maybe the
condition was always false? Or maybe GDB is deadlocked in an inferior
call? The only way to know for sure is to periodically interrupt GDB,
check on all the threads, and then continue.

Additionally, the focus of the previous commit was inferior function
calls, from a conditional breakpoint, in a multi-threaded inferior.
This opens up a whole new set of potential failure conditions. For
example, what if the function called relies on interaction with some
other thread, and the other thread crashes? Or hits a breakpoint?
Given how inferior function calls work - in a synchronous manor, a
stop event in some other thread is going to be ignored when the
inferior function call is being done as part of a breakpoint
condition, and this means that GDB could get stuck waiting for the
original condition thread, which will now never complete.

In this commit I propose a solution to this problem. A timeout. For
targets that support async-mode we can install an event-loop timer
before starting the inferior function call. When the timer expires we
will stop the thread performing the inferior function call. With this
mechanism in place a user can be sure that any inferior call they make
will either complete, or timeout eventually.

Adding a timer like this is obviously a change in behaviour for the
more common 'call' and 'print' uses of inferior function calls, so, in
this patch, I propose having two different timers. One I call the
'direct-call-timeout', which is used for 'call' and 'print' commands.
This timeout is by default set to unlimited, which, not surprisingly,
means there is no timeout in place.

A second timer, which I've called 'indirect-call-timeout', is used for
inferior function calls from breakpoint conditions. This timeout has
a default value of 300 seconds. This is still a pretty substantial
time to be waiting for a single inferior call to complete, but I
didn't want to be too aggressive with the value I selected. A user
can, of course, still use Ctrl-c to interrupt an inferior function
call, but this limit will ensure that GDB will stop at some point.

The new commands added by this commit are:

  set direct-call-timeout SECONDS
  show direct-call-timeout
  set indirect-call-timeout SECONDS
  show indirect-call-timeout

These new timeouts do depend on async-mode, so, if async-mode is
disabled (maint set target-async off), or not supported (e.g. target
sim), then the timeout is treated as unlimited (that is, no timeout is
set).

For targets that "fake" non-async mode, e.g. Linux native, where
non-async mode is really just async mode, but then we park the target
in a sissuspend, we could easily fix things so that the timeouts still
work, however, for targets that really are not async aware, like the
simulator, fixing things so that timeouts work correctly would be a
much bigger task - that effort would be better spent just making the
target async-aware. And so, I'm happy for now that this feature will
only work on async targets.

The two new show commands will display slightly different text if the
current target is a non-async target, which should allow users to
understand what's going on.

There's a somewhat random test adjustment needed in gdb.base/help.exp,
the test uses a regexp with the apropos command, and expects to find a
single result. Turns out the new settings I added also matched the
regexp, which broke the test. I've updated the regexp a little to
exclude my new settings.

a65fc33... by Andrew Burgess <email address hidden>

gdb: fix b/p conditions with infcalls in multi-threaded inferiors

This commit fixes bug PR 28942, that is, creating a conditional
breakpoint in a multi-threaded inferior, where the breakpoint
condition includes an inferior function call.

Currently, when a user tries to create such a breakpoint, then GDB
will fail with:

  (gdb) break infcall-from-bp-cond-single.c:61 if (return_true ())
  Breakpoint 2 at 0x4011fa: file /tmp/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/infcall-from-bp-cond-single.c, line 61.
  (gdb) continue
  Continuing.
  [New Thread 0x7ffff7c5d700 (LWP 2460150)]
  [New Thread 0x7ffff745c700 (LWP 2460151)]
  [New Thread 0x7ffff6c5b700 (LWP 2460152)]
  [New Thread 0x7ffff645a700 (LWP 2460153)]
  [New Thread 0x7ffff5c59700 (LWP 2460154)]
  Error in testing breakpoint condition:
  Couldn't get registers: No such process.
  An error occurred while in a function called from GDB.
  Evaluation of the expression containing the function
  (return_true) will be abandoned.
  When the function is done executing, GDB will silently stop.
  Selected thread is running.
  (gdb)

Or, in some cases, like this:

  (gdb) break infcall-from-bp-cond-simple.c:56 if (is_matching_tid (arg, 1))
  Breakpoint 2 at 0x401194: file /tmp/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/infcall-from-bp-cond-simple.c, line 56.
  (gdb) continue
  Continuing.
  [New Thread 0x7ffff7c5d700 (LWP 2461106)]
  [New Thread 0x7ffff745c700 (LWP 2461107)]
  ../../src.release/gdb/nat/x86-linux-dregs.c:146: internal-error: x86_linux_update_debug_registers: Assertion `lwp_is_stopped (lwp)' failed.
  A problem internal to GDB has been detected,
  further debugging may prove unreliable.

The precise error depends on the exact thread state; so there's race
conditions depending on which threads have fully started, and which
have not. But the underlying problem is always the same; when GDB
tries to execute the inferior function call from within the breakpoint
condition, GDB will, incorrectly, try to resume threads that are
already running - GDB doesn't realise that some threads might already
be running.

The solution proposed in this patch requires an additional member
variable thread_info::in_cond_eval. This flag is set to true (in
breakpoint.c) when GDB is evaluating a breakpoint condition.

In user_visible_resume_ptid (infrun.c), when the in_cond_eval flag is
true, then GDB will only try to resume the current thread, that is,
the thread for which the breakpoint condition is being evaluated.
This solves the problem of GDB trying to resume threads that are
already running.

The next problem is that inferior function calls are assumed to be
synchronous, that is, GDB doesn't expect to start an inferior function
call in thread #1, then receive a stop from thread #2 for some other,
unrelated reason. To prevent GDB responding to an event from another
thread, we update fetch_inferior_event and do_target_wait in infrun.c,
so that, when an inferior function call (on behalf of a breakpoint
condition) is in progress, we only wait for events from the current
thread (the one evaluating the condition).

The fix in do_target_wait is because previously, we only ever waited
for the general any-thread, minus_one_ptid, for which matching
against the inferior::pid would always succeed. However, now we might
wait against a specific ptid value, in which case we need to ensure we
only compare the pid part of the ptid.

In fetch_inferior_event, after receiving the event, we only want to
stop all the other threads, and call inferior_event_handler with
INF_EXEC_COMPLETE, if we are not evaluating a conditional breakpoint.
If we are, then all the other threads should be left doing whatever
they were before. The inferior_event_handler call will be performed
once the breakpoint condition has finished being evaluated, and GDB
decides to stop or not.

The final problem that needs solving relates to GDB's commit-resume
mechanism, this allows GDB to collect resume requests into a single
packet in order to reduce traffic to a remote target.

The problem is that the commit-resume mechanism will not send any
resume requests for an inferior if there are already events pending on
the GDB side.

Imagine an inferior with two threads. Both threads hit a breakpoint,
maybe the same conditional breakpoint. At this point there are two
pending events, one for each thread.

GDB selects one of the events and spots that this is a conditional
breakpoint, GDB evaluates the condition.

The condition includes an inferior function call, so GDB sets up for
the call and resumes the one thread, the resume request is added to
the commit-resume queue.

When the commit-resume queue is committed GDB sees that there is a
pending event from another thread, and so doesn't send any resume
requests to the actual target, GDB is assuming that when we wait we
will select the event from the other thread.

However, as this is an inferior function call for a condition
evaluation, we will not select the event from the other thread, we
only care about events from the thread that is evaluating the
condition - and the resume for this thread was never sent to the
target.

And so, GDB hangs, waiting for an event from a thread that was never
fully resumed.

To fix this issue I have added the concept of "forcing" the
commit-resume queue. When enabling commit resume, if the force flag
is true, then any resumes will be committed to the target, even if
there are other threads with pending events.

A note on authorship: this patch was based on some work done by
Natalia Saiapova and Tankut Baris Aktemur from Intel[1]. I have made
some changes to their work in this version.

Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28942

[1] https://sourceware.org/pipermail/gdb-patches/2020-October/172454.html

Co-authored-by: Natalia Saiapova <email address hidden>
Co-authored-by: Tankut Baris Aktemur <email address hidden>

ffc275f... by Andrew Burgess <email address hidden>

Revert "gdb: remove unnecessary parameter wait_ptid from do_target_wait"

This reverts commit ac0d67ed1dcf470bad6a3bc4800c2ddc9bedecca.

There was nothing wrong with the commit which I'm reverting here, but
it removed some functionality that will be needed for a later commit;
that is, the ability for GDB to ask for events from a specific ptid_t
via the do_target_wait function.

In a follow up commit, this functionality will be used to implement
inferior function calls in multi-threaded inferiors.

9190dd3... by Andrew Burgess <email address hidden>

gdb: don't always print breakpoint location after failed condition check

Consider the following session:

  (gdb) list some_func
  1 int
  2 some_func ()
  3 {
  4 int *p = 0;
  5 return *p;
  6 }
  7
  8 void
  9 foo ()
  10 {
  (gdb) break foo if (some_func ())
  Breakpoint 1 at 0x40111e: file bpcond.c, line 11.
  (gdb) r
  Starting program: /tmp/bpcond

  Program received signal SIGSEGV, Segmentation fault.
  0x0000000000401116 in some_func () at bpcond.c:5
  5 return *p;
  Error in testing condition for breakpoint 1:
  The program being debugged stopped while in a function called from GDB.
  Evaluation of the expression containing the function
  (some_func) will be abandoned.
  When the function is done executing, GDB will silently stop.

  Breakpoint 1, 0x0000000000401116 in some_func () at bpcond.c:5
  5 return *p;
  (gdb)

What happens here is the breakpoint condition includes a call to an
inferior function, and the inferior function segfaults. We can see
that GDB reports the segfault, and then gives an error message that
indicates that an inferior function call was interrupted.

After this GDB appears to report that it is stopped at Breakpoint 1,
inside some_func.

I find this second stop report a little confusing. Yes, GDB has
stopped as a result of hitting breakpoint 1, but, I think the message
as it currently is might give the impression that the thread is
actually stopped at a location of breakpoint 1, which is not the case.

Also, I find the second stop message draws attention away from
the "Program received signal SIGSEGV, Segmentation fault" stop
message, and this second stop might be thought of as replacing in
someway the earlier message.

In short, I think that the in the situation above, I think things
would be clearer if the second stop message were not reported at all,
so the output should (I think) look like this:

  (gdb) list some_func
  1 int
  2 some_func ()
  3 {
  4 int *p = 0;
  5 return *p;
  6 }
  7
  8 void
  9 foo ()
  10 {
  (gdb) break foo if (some_func ())
  Breakpoint 1 at 0x40111e: file bpcond.c, line 11.
  (gdb) r
  Starting program: /tmp/bpcond

  Program received signal SIGSEGV, Segmentation fault.
  0x0000000000401116 in some_func () at bpcond.c:5
  5 return *p;
  Error in testing condition for breakpoint 1:
  The program being debugged stopped while in a function called from GDB.
  Evaluation of the expression containing the function
  (some_func) will be abandoned.
  When the function is done executing, GDB will silently stop.
  (gdb)

The user can still find the number of the breakpoint that triggered
the initial stop in this line:

  Error in testing condition for breakpoint 1:

But there's now only one stop reason reported, the SIGSEGV, which I
think is much clearer.

To achieve this change I set the bpstat::print field when:

  (a) a breakpoint condition evaluation failed, and

  (b) the $pc of the thread changed during condition evaluation.

I've updated the existing tests that checked the error message printed
when a breakpoint condition evaluation failed.

be99338... by Andrew Burgess <email address hidden>

gdb: avoid repeated signal reporting during failed conditional breakpoint

Consider the following case:

  (gdb) list some_func
  1 int
  2 some_func ()
  3 {
  4 int *p = 0;
  5 return *p;
  6 }
  7
  8 void
  9 foo ()
  10 {
  (gdb) break foo if (some_func ())
  Breakpoint 1 at 0x40111e: file bpcond.c, line 11.
  (gdb) r
  Starting program: /tmp/bpcond

  Program received signal SIGSEGV, Segmentation fault.
  0x0000000000401116 in some_func () at bpcond.c:5
  5 return *p;
  Error in testing breakpoint condition:
  The program being debugged was signaled while in a function called from GDB.
  GDB remains in the frame where the signal was received.
  To change this behavior use "set unwindonsignal on".
  Evaluation of the expression containing the function
  (some_func) will be abandoned.
  When the function is done executing, GDB will silently stop.

  Program received signal SIGSEGV, Segmentation fault.

  Breakpoint 1, 0x0000000000401116 in some_func () at bpcond.c:5
  5 return *p;
  (gdb)

Notice that this line:

  Program received signal SIGSEGV, Segmentation fault.

Appears twice in the output. The first time is followed by the
current location. The second time is a little odd, why do we print
that?

Printing that line is controlled, in part, by a global variable,
stopped_by_random_signal. This variable is reset to zero in
handle_signal_stop, and is set if/when GDB figures out that the
inferior stopped due to some random signal.

The problem is, in our case, GDB first stops at the breakpoint for
foo, and enters handle_signal_stop and the stopped_by_random_signal
global is reset to 0.

Later within handle_signal_stop GDB calls bpstat_stop_status, it is
within this function (via bpstat_check_breakpoint_conditions) that the
breakpoint condition is checked, and, we end up calling the inferior
function (some_func in our example above).

In our case above the thread performing the inferior function call
segfaults in some_func. GDB catches the SIGSEGV and handles the stop,
this causes us to reenter handle_signal_stop. The global variable
stopped_by_random_signal is updated, this time it is set to true
because the thread stopped due to SIGSEGV. As a result of this we
print the first instance of the line (as seen above in the example).

Finally we unwind GDB's call stack, the inferior function call is
complete, and we return to the original handle_signal_stop. However,
the stopped_by_random_signal global is still carrying the value as
computed for the inferior function call's stop, which is why we now
print a second instance of the line, as seen in the example.

To prevent this, I propose adding a scoped_restore before we start an
inferior function call, this will save and restore the global
stopped_by_random_signal value.

With this done, the output from our example is now this:

 (gdb) list some_func
  1 int
  2 some_func ()
  3 {
  4 int *p = 0;
  5 return *p;
  6 }
  7
  8 void
  9 foo ()
  10 {
  (gdb) break foo if (some_func ())
  Breakpoint 1 at 0x40111e: file bpcond.c, line 11.
  (gdb) r
  Starting program: /tmp/bpcond

  Program received signal SIGSEGV, Segmentation fault.
  0x0000000000401116 in some_func () at bpcond.c:5
  5 return *p;
  Error in testing condition for breakpoint 1:
  The program being debugged stopped while in a function called from GDB.
  Evaluation of the expression containing the function
  (some_func) will be abandoned.
  When the function is done executing, GDB will silently stop.

  Breakpoint 1, 0x0000000000401116 in some_func () at bpcond.c:5
  5 return *p;
  (gdb)

We now only see the 'Program received signal SIGSEGV, ...' line once,
which I think makes more sense.

Finally, I'm aware that the last few lines, that report the stop as
being at 'Breakpoint 1', when this is not where the thread is actually
located anymore, is not great. I'll address that in the next commit.

4e5c0af... by Andrew Burgess <email address hidden>

gdbserver: allow agent expressions to fail with invalid memory access

This commit extends gdbserver to take account of a failed memory
access from agent_mem_read, and to return a new eval_result_type
expr_eval_invalid_memory_access.

I have only updated the agent_mem_read calls related directly to
reading memory, I have not updated any of the calls related to
tracepoint data collection. This is just because I'm not familiar
with that area of gdb/gdbserver, and I don't want to break anything,
so leaving the existing behaviour as is seems like the safest
approach.

I've then update gdb.base/bp-cond-failure.exp to test evaluating the
breakpoints on the target, and have also extended the test so that it
checks for different sizes of memory access.

97a4893... by Andrew Burgess <email address hidden>

gdbserver: allows agent_mem_read to return an error code

Currently the gdbserver function agent_mem_read ignores any errors
from calling read_inferior_memory. This means that if there is an
attempt to access invalid memory then this will appear to succeed.

In this I update agent_mem_read so that if read_inferior_memory fails,
agent_mem_read will return an error code.

However, non of the callers of agent_mem_read actually check the
return value, so this commit will have no effect on anything. In the
next commit I will update the users of agent_mem_read to check for the
error code.

I've also updated the header comments on agent_mem_read to better
reflect what the function does, and its possible return values.