glibc:nsz/bug19329-v2

Last commit made on 2021-04-13
Get this branch:
git clone -b nsz/bug19329-v2 https://git.launchpad.net/glibc

Branch merges

Branch information

Name:
nsz/bug19329-v2
Repository:
lp:glibc

Recent commits

b116855... by Szabolcs Nagy <email address hidden>

RFC elf: Fix slow tls access after dlopen [BZ #19924]

In short: __tls_get_addr checks the global generation counter,
_dl_update_slotinfo updates up to the generation of the accessed
module. If the global generation is newer than geneneration of the
module then __tls_get_addr keeps hitting the slow path that updates
the dtv.

Possible approaches i can see:

1. update to global generation instead of module,
2. check the module generation in the fast path.

This patch is 1.: it needs additional sync (load acquire) so the
slotinfo list is up to date with the observed global generation.

Approach 2. would require walking the slotinfo list at all times.
I don't know how to make that fast with many modules.

Note: in the x86_64 version of dl-tls.c the generation is only loaded
once, since relaxed mo is not faster than acquire mo load.

I have not benchmarked this yet.

f8ea2b9... by Szabolcs Nagy <email address hidden>

elf: Remove lazy tlsdesc relocation related code

Remove generic tlsdesc code related to lazy tlsdesc processing since
lazy tlsdesc relocation is no longer supported. This includes removing
GL(dl_load_lock) from _dl_make_tlsdesc_dynamic which is only called at
load time when that lock is already held.

Added a documentation comment too.

13859e5... by Szabolcs Nagy <email address hidden>

i386: Remove lazy tlsdesc relocation related code

Like in commit e75711ebfa976d5468ec292282566a18b07e4d67 for x86_64,
remove unused lazy tlsdesc relocation processing code:

  _dl_tlsdesc_resolve_abs_plus_addend
  _dl_tlsdesc_resolve_rel
  _dl_tlsdesc_resolve_rela
  _dl_tlsdesc_resolve_hold

5419c82... by Szabolcs Nagy <email address hidden>

x86_64: Remove lazy tlsdesc relocation related code

_dl_tlsdesc_resolve_rela and _dl_tlsdesc_resolve_hold are only used for
lazy tlsdesc relocation processing which is no longer supported.

--
v2:
- fix elf_machine_runtime_setup: remove tlsdesc got setup.

e865fa9... by Szabolcs Nagy <email address hidden>

i386: Avoid lazy relocation of tlsdesc [BZ #27137]

Lazy tlsdesc relocation is racy because the static tls optimization and
tlsdesc management operations are done without holding the dlopen lock.

This similar to the commit b7cf203b5c17dd6d9878537d41e0c7cc3d270a67
for aarch64, but it fixes a different race: bug 27137.

On i386 the code is a bit more complicated than on x86_64 because both
rel and rela relocs are supported.

b37a0e4... by Szabolcs Nagy <email address hidden>

x86_64: Avoid lazy relocation of tlsdesc [BZ #27137]

Lazy tlsdesc relocation is racy because the static tls optimization and
tlsdesc management operations are done without holding the dlopen lock.

This similar to the commit b7cf203b5c17dd6d9878537d41e0c7cc3d270a67
for aarch64, but it fixes a different race: bug 27137.

Another issue is that ld auditing ignores DT_BIND_NOW and thus tries to
relocate tlsdesc lazily, but that does not work in a BIND_NOW module
due to missing DT_TLSDESC_PLT. Unconditionally relocating tlsdesc at
load time fixes this bug 27721 too.

--
v2:
- mention the ldaudit issue with bindnow and tlsdesc.

299d28c... by Szabolcs Nagy <email address hidden>

elf: Fix DTV gap reuse logic [BZ #27135]

For some reason only dlopen failure caused dtv gaps to be reused.

It is possible that the intent was to never reuse modids for a
different module, but after dlopen failure all gaps are reused
not just the ones caused by the unfinished dlopened.

So the code has to handle reused modids already which seems to
work, however the data races at thread creation and tls access
(see bug 19329 and bug 27111) may be more severe if slots are
reused so this is scheduled after those fixes. I think fixing
the races are not simpler if reuse is disallowed and reuse has
other benefits, so set GL(dl_tls_dtv_gaps) whenever entries are
removed from the middle of the slotinfo list. The value does
not have to be correct: incorrect true value causes the next
modid query to do a slotinfo walk, incorrect false will leave
gaps and new entries are added at the end.

Fixes bug 27135.

8df69db... by Szabolcs Nagy <email address hidden>

elf: Add test case for [BZ #19329]

Test concurrent dlopen and pthread_create when the loaded modules have
TLS. This triggers dl-tls assertion failures more reliably than the
nptl/tst-stack4 test.

The dlopened module has 100 DT_NEEDED dependencies with TLS, they were
reused from an existing TLS test. The number of created threads during
dlopen depends on filesystem speed and hardware, but at most 3 threads
are alive at a time to limit resource usage.

---
v5:
- moved from nptl/ to elf/
- use many tls modules from another test (affects the timing a bit as
  these need more relocation processing, but still detects the race)
- use xdlopen
- use stdatomic.h
- moved from tests-internal back to tests
- use acq/rel atomics instead of relaxed, this works too and cleaner.
- moved after bug fix patch.

v4:
- rebased, updated copyright year.
- moved to tests-internal because of <atomic.h>

v3:
- use the new test support code.
- better Makefile usage so modules are cleaned properly.

v2:
- undef NDEBUG.
- join nop threads so at most 3 threads exist at a time.
- remove stack size setting (resource usage is no longer a concern).
- stop creating threads after dlopen observably finished.
- print the number of threads created for debugging.

10fb15a... by Szabolcs Nagy <email address hidden>

elf: Use relaxed atomics for racy accesses [BZ #19329]

This is a follow up patch to the fix for bug 19329. This adds
relaxed MO atomics to accesses that are racy, but relaxed MO is
enough.

--
v2:
- handle x86_64 dl-tls.c too

ae99087... by Szabolcs Nagy <email address hidden>

elf: Fix data races in pthread_create and TLS access [BZ #19329]

DTV setup at thread creation (_dl_allocate_tls_init) is changed
to take the dlopen lock, GL(dl_load_lock). Avoiding data races
here without locks would require design changes: the map that is
accessed for static TLS initialization here may be concurrently
freed by dlclose. That use after free may be solved by only
locking around static TLS setup or by ensuring dlclose does not
free modules with static TLS, however currently every link map
with TLS has to be accessed at least to see if it needs static
TLS. And even if that's solved, still a lot of atomics would be
needed to synchronize DTV related globals without a lock. So fix
both bug 19329 and bug 27111 with a lock that prevents DTV setup
running concurrently with dlopen or dlclose.

_dl_update_slotinfo at TLS access still does not use any locks
so CONCURRENCY NOTES are added to explain the synchronization.
The early exit from the slotinfo walk when max_modid is reached
is not strictly necessary, but does not hurt either.

An incorrect acquire load was removed from _dl_resize_dtv: it
did not synchronize with any release store or fence and
synchronization is now handled separately at thread creation
and TLS access time.

There are still a number of racy read accesses to globals that
will be changed to relaxed MO atomics in a followup patch. This
should not introduce regressions compared to existing behaviour
and avoid cluttering the main part of the fix.

Not all TLS access related data races got fixed here: there are
additional races at lazy tlsdesc relocations see bug 27137.