lttng-tools:stable-2.11

Last commit made on 2021-10-18
Get this branch:
git clone -b stable-2.11 https://git.launchpad.net/lttng-tools

Branch merges

Branch information

Name:
stable-2.11
Repository:
lp:lttng-tools

Recent commits

dd8ce0a... by =?utf-8?q?J=C3=A9r=C3=A9mie_Galarneau?= <email address hidden>

Update version to v2.11.8

Signed-off-by: Jérémie Galarneau <email address hidden>

19bfbd1... by Jonathan Rajotte

Fix: sessiond: previously created channel cannot be enabled

Observed issue
==============

A previously created channel cannot be enabled back once a session is
started.

Cause
=====

The check validating that the session was started is to early in the
`cmd_enable_channel` function.

Solution
========

Move the check at the creation code path when the channel is not found.

Known drawbacks
=========

None.

Signed-off-by: Jonathan Rajotte <email address hidden>
Change-Id: I8e7d62b7e97246e65f1cf9022270293a6dd34cc9
Signed-off-by: Jérémie Galarneau <email address hidden>

77ef915... by Jonathan Rajotte

Fix: ust: app stuck on recv message during UST comm timeout scenario

Observed issue
==============

The following scenario lead to the UST thread to be "stuck" on recvmsg
on the notify socket.
The problem manifest itself when an application is unresponsive during
the ustctl_start_session call. Note that the default timeout for ust
communication is 5 seconds.

  # Start an instrumented app
  ./app
  gdb lttng-sessiond
  # put a breakpoint on ustctl_start_session
  lttng create my_session
  lttng enable-event -u -a
  lttng start
  # The tracepoint should hit. Do not continue.
  kill -s SIGSTOP $(pgrep app)
  # Continue lttng-sessiond.
  sleep 5 # This make sure lttng-sessiond unregister the app from its point of view
  kill -s SIGCONT $(pgrep app)
  gdb -p $(pgrep app)
  thread apply all bt

App stack trace:

  Thread 3 (Thread 0x7fe2c6f58700 (LWP 48172)):
  #0 __libc_recvmsg (flags=0, msg=0x7fe2c6f56ac0, fd=4) at ../sysdeps/unix/sysv/linux/recvmsg.c:28
  #1 __libc_recvmsg (fd=fd@entry=4, msg=msg@entry=0x7fe2c6f56ac0, flags=flags@entry=0) at ../sysdeps/unix/sysv/linux/recvmsg.c:25
  #2 0x00007fe2c7a010ba in ustcomm_recv_unix_sock (sock=sock@entry=4, buf=buf@entry=0x7fe2c6f56ea0, len=len@entry=48) at lttng-ust-comm.c:308
  #3 0x00007fe2c7a037c3 in ustcomm_register_channel (sock=4, session=session@entry=0x7fe2c0000ba0, session_objd=<optimized out>, channel_objd=<optimized out>, nr_ctx_fields=nr_ctx_fields@entry=0, ctx_fields=<optimized out>, chan_id=0x7fe2
  c6f5716c, header_type=0x7fe2c0012b18) at lttng-ust-comm.c:1544
  #4 0x00007fe2c7a10787 in lttng_session_enable (session=0x7fe2c0000ba0) at lttng-events.c:444
  #5 0x00007fe2c7a0b785 in lttng_session_cmd (objd=1, cmd=128, arg=140611977311672, uargs=0x7fe2c6f57800, owner=0x7fe2c7a5da00 <local_apps>) at lttng-ust-abi.c:576
  #6 0x00007fe2c7a07d6d in handle_message (lum=0x7fe2c6f57590, sock=3, sock_info=0x7fe2c7a5da00 <local_apps>) at lttng-ust-comm.c:1003
  #7 ust_listener_thread (arg=0x7fe2c7a5da00 <local_apps>) at lttng-ust-comm.c:1712
  #8 0x00007fe2c7993609 in start_thread (arg=<optimized out>) at pthread_create.c:477
  #9 0x00007fe2c78ba293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

  ...

Cause
=====

When the app continues after the timeout from lttng-sessiond side, the
actual start_session message is received on the application side then
UST, app side, send commands on the notify socket. On lttng-sessiond
side, the command is received but no reply is sent.

This is due to the fact that the lookup against the
ust_app_ht_by_notify_sock hash table (find_app_by_notify_sock)
return nothing since the app is unregistered at this point and the hash
table node was removed on unregistration.

Solution
========

When the app lookup fails, return an error that will trigger the cleanup
of the notify socket.

Known drawbacks
=========
None

Note
=========
Subsequent error path in reply_ust_register_channel,
add_event_ust_registry, and add_enum_ust_registry might lead to the same
type of problem since no reply is sent to the app. Still, for those
cases the complete application/notify socket should not be destroyed
since the error path relate to either a session or a sub object of a
session.

Signed-off-by: Jonathan Rajotte <email address hidden>
Change-Id: Iea0dc027ca1ee772e84c7e545114f1be69fd1f63
Signed-off-by: Jérémie Galarneau <email address hidden>

ee3f516... by Jonathan Rajotte

Fix: ust: UST communication can return -EAGAIN

Observed issue
==============

The following scenario lead to an abort on event creation. The
problem manifest itself when an application is unresponsive. Note that
the default timeout for ust communication is 5 seconds.

  # Start an instrumented app
  ./app
  gdb lttng-sessiond
  # put a breakpoint on ustctl_create_event.
  lttng create my_session
  lttng enable-event -u -a
  lttng start
  # The tracepoint should hit. Do not continue.
  kill -s SIGSTOP $(pgrep app)
  # Continue lttng-sessiond.
  # lttng-sessiond will abort.

Note that for UST this is not an expected behaviour. Expected
communication failure with a single app should not invalidate the
complete channel, compromise its setup or result in an abort.

Note that a similar scenario for the following ustctl call sites also
lead to scenario where failure of a single app lead to error reporting
and/or error propagation to upper level object.

Problematic callsites:
   ustctl_set_exclusion
   ustctl_set_filter
   ustctl_disable_channel

These callsites are also fixed by this patch.

Cause
=====

For an unresponsive application, EAGAIN is returned and is treated as an
"unknown" hard error.

In this particular case the abort() call was introduced by commit:
88e3c2f5610b9ac89b0923d448fee34140fc46fb [1]. It is not clear if this is
a leftover from debugging session since this is the only callsite where
an abort is issued on communication failure via ustctl.

Solution
========

Handle EAGAIN coming from ustctl_* and treat it the same way a
dying application is handled. The only minor difference is that we WARN
on communication time out. Albeit not the most useful thing for a CLI
client, it could help overall user of lttng-sessiond in time out
situation.

Most call site already handled "unknown" error correctly. For those call
site we simply end up bringing more info in regards to the timeout
issue instead of mentioning that "-11" was returned.

Note, the reclamation of "app" is handled by the poll loop and
ust_app_unregister since the socket is shutdown by lttng-ust internally
on error, including EAGAIN.

Note that the application will try to register itself back to the
lttng-sessiond based on its configuration.

Known drawbacks
=========
None

References
==========
[1] https://github.com/lttng/lttng-tools/commit/88e3c2f5610b9ac89b0923d448fee34140fc46fb

Fixes: #1384

Signed-off-by: Jonathan Rajotte <email address hidden>
Change-Id: If364b5d48e7fd2b664276a0fb1b7eec2c45ed683
Signed-off-by: Jonathan Rajotte <email address hidden>
Signed-off-by: Jérémie Galarneau <email address hidden>

b77e9a3... by Jonathan Rajotte

Fix: ust: segfault on lttng start on filter bytecode copy

Observed issue
==============

A segmentation fault is observed for multiple UST timeout scenarios.

Backtrace:

 #0 __memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:384
 #1 0x0000557fe0395df9 in copy_filter_bytecode (orig_f=0x7f9c5802b790) at ust-app.c:1196
 #2 0x0000557fe0397702 in shadow_copy_event (ua_event=0x7f9c58025ff0, uevent=0x7f9c58033560) at ust-app.c:1824
 #3 0x0000557fe039ac46 in create_ust_app_event (ua_sess=0x7f9c5802ec20, ua_chan=0x7f9c58025cc0, uevent=0x7f9c58033560, app=0x7f9c5c001da0) at ust-app.c:3192
 #4 0x0000557fe03a054d in ust_app_channel_synchronize_event (ua_chan=0x7f9c58025cc0, uevent=0x7f9c58033560, ua_sess=0x7f9c5802ec20, app=0x7f9c5c001da0) at ust-app.c:5096
 #5 0x0000557fe03a0772 in ust_app_synchronize (usess=0x7f9c580074a0, app=0x7f9c5c001da0) at ust-app.c:5173
 #6 0x0000557fe03a0a70 in ust_app_global_update (usess=0x7f9c580074a0, app=0x7f9c5c001da0) at ust-app.c:5255
 #7 0x0000557fe03a00e0 in ust_app_start_trace_all (usess=0x7f9c580074a0) at ust-app.c:4987
 #8 0x0000557fe0355c6a in cmd_start_trace (session=0x7f9c5800a190) at cmd.c:2668
 #9 0x0000557fe0382e70 in process_client_msg (cmd_ctx=0x7f9c58003d70, sock=0x7f9c74bf44e0, sock_error=0x7f9c74bf44e4) at client.c:1527
 #10 0x0000557fe03848a2 in thread_manage_clients (data=0x557fe06d9440) at client.c:2200
 #11 0x0000557fe037d1cb in launch_thread (data=0x557fe06d94b0) at thread.c:75
 #12 0x00007f9c796af609 in start_thread (arg=<optimized out>) at pthread_create.c:477
 #13 0x00007f9c795b6293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

The scenario:

  # Start an instrumented app
  ./app
  gdb lttng-sessiond
  # put a breakpoint on ustctl_set_filter
  lttng create my_session
  lttng enable-event -u tp:tp_test
  lttng start
  lttng enable-event -u __dummy --filter 'my_field == "user34"'
  # The tracepoint should hit. Do not continue.
  kill -s SIGSTOP $(pgrep app)
  # Continue lttng-sessiond.
  # enable-event will return an error. This a bug in itself, still let's
  # continue with the current bug.
  lttng stop
  # Start a new app that will register.
  ./app &
  sleep 1
  lttng start
  # lttng-sessiond should segfault.

Cause
=====

During the "lttng enable-event" command, the timeout error bubbles up
all the way to event_ust_enable_tracepoint and is different from
LTTNG_UST_ERR_EXIST. `trace_ust_destroy_event` is called and frees the
`uevent` object. Note that contrary to the comment `uevent` is added to
the channel event hash table at this point.

On the next `lttng start` command, the event node is still present in
the hash table and is iterated on. lttng-sessiond segfault on the first
data access of the previously freed memory.

The problem was introduced by commit
88e3c2f5610b9ac89b0923d448fee34140fc46fb [1]. Which essentially move the
callsite of `add_unique_ust_event` before `ust_app_*_event_glb` calls.

Solution
========

Go to `end` label to prevent freeing of the uevent object.

Note that app synchronization should not force an error at the channel
level, since a single app can fail but the whole channel should not.

The `error` label is now obsolete.

Known drawbacks
=========

None.

References
==========

[1] https://github.com/lttng/lttng-tools/commit/88e3c2f5610b9ac89b0923d448fee34140fc46fb

Signed-off-by: Jonathan Rajotte <email address hidden>
Change-Id: Ifaf3f4c71bb2da869c7b441aaa4b367f8f7cbdd6
Signed-off-by: Jérémie Galarneau <email address hidden>

74b8147... by =?utf-8?q?J=C3=A9r=C3=A9mie_Galarneau?= <email address hidden>

Build fix: Missing message in LTTNG_DEPRECATED invocation

Coverity scan build jobs fail since LTTNG_DEPRECATED expects a string
and none is provided at the lttng_metadata_regenerate use site.

Signed-off-by: Jérémie Galarneau <email address hidden>
Change-Id: I7e6701abd24c679f578b0adead771ac93b6566cd

486042a... by Simon Marchi <email address hidden>

include: remove spurious spaces in condition/session-rotation.h

Change-Id: Ia525d24c3b4098dff5c50fb2c5d93c16f6e08f5c
Signed-off-by: Simon Marchi <email address hidden>
Signed-off-by: Jérémie Galarneau <email address hidden>

6ecf04f... by Michael Jeanson <email address hidden>

fix: wrong define used for GCC version check

As far as I can tell, the __GNUC_MAJOR__ define has never existed, the
proper define for the major version is __GNUC__. See
https://gcc.gnu.org/onlinedocs/cpp/Common-Predefined-Macros.html for
more details.

Change-Id: I0d47d524e7efd204fd2f8976311c62e872eb6170
Signed-off-by: Michael Jeanson <email address hidden>
Signed-off-by: Jérémie Galarneau <email address hidden>

e994318... by =?utf-8?q?J=C3=A9r=C3=A9mie_Galarneau?= <email address hidden>

Fix: userspace-probe: unreported error on string copy error

Issue
=====

String copy errors, either due to the length or an allocation failure,
are not reported by
lttng_userspace_probe_location_tracepoint_create_from_payload
and don't log a clear error message.

This allowed truncation bugs like the one fixed in b45a296 to go
unnoticed.

Fix
===

Return an "invalid" status code and log a more descriptive error
message.

Signed-off-by: Jérémie Galarneau <email address hidden>
Change-Id: Ia07cac7cba315ea79337262e9082dd06eb60950f

7b25210... by Francis Deslauriers <email address hidden>

Fix: userspace-probe: truncating binary path for SDT

Issue
=====
This issue was uncovered when we enabled the testing of the SDT
userspace probe instrumentation on the CI, where the paths to file are
specially long.

The reported error is:
  - rule: ma-probe-sdt (type: kernel:uprobe, location type: SDT, location: /root/workspace/dev_gerrit_lttng-tools_rootbuild/arch/amd64/babeltrace_version/stable-2.0/build/std/conf/agents/liburcu_version/master/node/amd64-rootnode/test_type/base/src/lttng-tools/tests/utils/testapp/userspace-probe-sdt-binary/.libs/userspace-probe-sdt-binary:foobar:tp1)
  + rule: ma-probe-sdt (type: kernel:uprobe, location type: SDT, location: /root/workspace/dev_gerrit_lttng-tools_rootbuild/arch/amd64/babeltrace_version/stable-2.0/build/std/conf/agents/liburcu_version/master/node/amd64-rootnode/test_type/base/src/lttng-tools/tests/utils/testapp/userspace-probe-sdt-binary/.libs/userspace-probe-s:foobar:tp1)

The important part to notice is that the path to the binary is truncated
compared to was is expected by the test case.

The problem is caused by the
`lttng_userspace_probe_location_tracepoint_create_from_payload()`
function that strdup() the path string using the wrong defined value.

Fix
===
Use LTTNG_PATH_MAX rather then LTTNG_SYMBOL_NAME_LEN to copy the binary
path.

Signed-off-by: Francis Deslauriers <email address hidden>
Signed-off-by: Jérémie Galarneau <email address hidden>
Change-Id: I24cbf413baba405bf4c4b534ccbc2b18f8d5d43f