ceph patch as of 8/29 segfaults all bluestore osds

Bug #1842020 reported by Harry Coin
This bug affects 1 person
Affects        Status        Importance  Assigned to  Milestone
ceph (Ubuntu)  Fix Released  High        James Page

Bug Description

The ceph patch for eoan distributed on 8/29 crashes all bluestore OSDs and is unusable, at least on some systems. It failed on an old dual Xeon E5345 box.

Easy test. Run:

/usr/bin/ceph-bluestore-tool

On working systems it reports a help message. On the latest eoan release distributed by Canonical, the process is killed with an illegal instruction signal just after reading /proc/<...>/auxv.

Looks like something to do with vsock issues.

strace -k -y /usr/bin/ceph-bluestore-tool
is instructive.

Note that valgrind run against that program reports thousands of memory allocation issues.
Reverting the system to the snapshot before the apt upgrade restores full operations.
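
A rough sketch of how the failure mode described above could be told apart from an ordinary error exit: run the tool and look at how the process ended. The helper names `describe_exit` and `run_and_describe` are made up for illustration; only the tool path comes from this report. The sketch relies on the Python `subprocess` convention that a negative return code means the child was killed by the signal with that number.

```python
import signal
import subprocess


def describe_exit(returncode):
    """Turn a subprocess return code into a short verdict string."""
    if returncode < 0:
        # subprocess reports death-by-signal as the negated signal number.
        number = -returncode
        try:
            return "killed by " + signal.Signals(number).name
        except ValueError:
            return "killed by signal %d" % number
    return "exited with status %d" % returncode


def run_and_describe(argv):
    """Run a command, discard its output, and describe how it ended."""
    proc = subprocess.run(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return describe_exit(proc.returncode)


# Usage (path taken from the report; not executed here):
#   print(run_and_describe(["/usr/bin/ceph-bluestore-tool"]))
```

If the description holds, an affected machine would be expected to print `killed by SIGILL` rather than an exit status. The exit status of a healthy build is not stated in this report.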
---
ProblemType: Bug
ApportVersion: 2.20.11-0ubuntu7
Architecture: amd64
DistroRelease: Ubuntu 19.10
InstallationDate: Installed on 2019-07-30 (30 days ago)
InstallationMedia: Ubuntu-MATE 19.10 "Eoan Ermine" - Alpha amd64 (20190726)
Package: ceph 14.2.2-0ubuntu2
PackageArchitecture: amd64
ProcEnviron:
 LANGUAGE=en_US
 TERM=xterm-256color
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 5.2.0-15.16-generic 5.2.9
Tags: eoan
Uname: Linux 5.2.0-15-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:

_MarkForUpload: True
mtime.conffile..etc.default.apport: 2019-08-30T11:35:29.463071

Harry Coin (hcoin) wrote :

Confirmed on two different boxes of the same processor vintage. Both otherwise have the latest eoan updates. The desktop is MATE, though that shouldn't matter.

Harry Coin (hcoin) wrote :

Bisected: the problem starts with 14.2.2-0ubuntu1, or possibly the release after it. It works in 14.2.1-0ubuntu3. Look for the change in the file size of ceph-bluestore-osd.

James Page (james-page) wrote :

Attempting to reproduce:

$ apt-cache policy ceph-osd
ceph-osd:
  Installed: 14.2.2-0ubuntu2
  Candidate: 14.2.2-0ubuntu2
  Version table:
 *** 14.2.2-0ubuntu2 500
        500 http://gb.archive.ubuntu.com/ubuntu eoan/main amd64 Packages
        100 /var/lib/dpkg/status

$ ceph-bluestore-tool
must specify an action; --help for help

$ ceph-bluestore-tool --help
All options:

Options:
  -h [ --help ] produce help message
  --path arg bluestore path
  --out-dir arg output directory
  -l [ --log-file ] arg log file
  --log-level arg log level (30=most, 20=lots, 10=some, 1=little)
  --dev arg device(s)
  --devs-source arg bluefs-dev-migrate source device(s)
  --dev-target arg target/resulting device
  --deep arg deep fsck (read all data)
  -k [ --key ] arg label metadata key name
  -v [ --value ] arg label metadata value

Positional options:
  --command arg fsck, repair, bluefs-export, bluefs-bdev-sizes,
                         bluefs-bdev-expand, bluefs-bdev-new-db,
                         bluefs-bdev-new-wal, bluefs-bdev-migrate, show-label,
                         set-label-key, rm-label-key, prime-osd-dir,
                         bluefs-log-dump

That said, I'm running on a much newer processor type.

Changed in ceph (Ubuntu):
assignee: nobody → James Page (james-page)
James Page (james-page) wrote :

I don't have access to the same processor class; I'd suspect this is not a ceph-specific issue but might be a compiler bug in eoan.

The stacktrace for the issue might be recorded on one of your deployments - could you try to collect it using:

  apport-collect 1842020

alternatively you can collect a backtrace with full debug symbols:

  https://wiki.ubuntu.com/Backtrace

and attach to this bug report.

Trent Lloyd (lathiat) wrote :

At a super basic level I can't reproduce this. With an eoan container on an eoan host I don't get a segfault from ceph-bluestore-tool.

I'd suggest we may need to look at getting
 (1) a coredump
 (2) the somewhat unlikely but not impossible chance that it's CPU-dependent for some kind of optimization reason or similar as this CPU is quite old [can you confirm the install is also 64-bit?]
 (3) A bunch of information about the system configuration.. e.g. from 'sosreport' would work or similar. [I'm not sure if you can use reportbug to upload system info about an existing bug] - including at least the "dpkg -l" full package list.

Harry Coin (hcoin) wrote :

Attached is the result of
apport-bug --save
it can be viewed with
apport-unpack
It has the answers to most of the above questions. Yes, it is amd64 (dual xeon...)

Harry Coin (hcoin) wrote :

Backtrace log for you.

Harry Coin (hcoin) wrote :

The above data is from a run in an eoan VM on the same processor that hosts the OSDs in the bare-metal layer (same bug in both cases, but I reverted the bare-metal layer to its state prior to the 'upgrade' to restore ceph OSD function).
Here's the dpkg -l you asked for.

Here's cat /proc/cpuinfo on the VM
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel Celeron_4x0 (Conroe/Merom Class Core 2)
stepping : 3
microcode : 0x1
cpu MHz : 2327.284
cache size : 16384 KB
physical id : 0
siblings : 1
core id : 0
cpu cores : 1
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx lm constant_tsc rep_good nopl cpuid tsc_known_freq pni vmx ssse3 cx16 x2apic tsc_deadline_timer hypervisor lahf_lm cpuid_fault pti tpr_shadow vnmi flexpriority tsc_adjust arat arch_capabilities
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs
bogomips : 4654.56
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel Celeron_4x0 (Conroe/Merom Class Core 2)
stepping : 3
microcode : 0x1
cpu MHz : 2327.284
cache size : 16384 KB
physical id : 1
siblings : 1
core id : 0
cpu cores : 1
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx lm constant_tsc rep_good nopl cpuid tsc_known_freq pni vmx ssse3 cx16 x2apic tsc_deadline_timer hypervisor lahf_lm cpuid_fault pti tpr_shadow vnmi flexpriority tsc_adjust arat arch_capabilities
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs
bogomips : 4654.56
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:

Harry Coin (hcoin) wrote :

FYI, in the eoan VM you used to duplicate the bug, try it again with the processor set to 2 'Conroe' cpus. i440FX chipset, BIOS, kvm-spice emulator.

Also, I tried compiling ceph nautilus from upstream. It can't be done in eoan without installing tox from bionic and a couple of other packages not in eoan (e.g. upstream depends on libicu60, eoan has ...63). Had to add these #pragmas, which look memory related...

src/spdk/dpdk/lib/librte_eal/common/eal_common_memory.c:#pragma GCC diagnostic warning "-Waddress-of-packed-member"
src/spdk/dpdk/lib/librte_eal/common/eal_common_memzone.c:#pragma GCC diagnostic warning "-Waddress-of-packed-member"
src/spdk/dpdk/lib/librte_eal/common/eal_common_tailqs.c:#pragma GCC diagnostic warning "-Waddress-of-packed-member"
src/spdk/dpdk/lib/librte_eal/common/malloc_heap.c:#pragma GCC diagnostic warning "-Waddress-of-packed-member"
src/spdk/dpdk/lib/librte_eal/common/rte_malloc.c:#pragma GCC diagnostic warning "-Waddress-of-packed-member"
src/spdk/dpdk/lib/librte_eal/linuxapp/eal/eal_memalloc.c:#pragma GCC diagnostic warning "-Waddress-of-packed-member"
src/spdk/dpdk/lib/librte_eal/linuxapp/eal/eal_memory.c:#pragma GCC diagnostic warning "-Waddress-of-packed-member"
src/spdk/dpdk/lib/librte_eal/linuxapp/eal/eal_vfio.c:#pragma GCC diagnostic warning "-Waddress-of-packed-member"
src/spdk/dpdk/lib/librte_eal/linuxapp/eal/eal_vfio.h:#pragma message("VFIO configured but not supported by this kernel, disabling.")
src/spdk/dpdk/lib/librte_ethdev/rte_tm.c:#pragma GCC diagnostic warning "-Waddress-of-packed-member"
src/spdk/dpdk/lib/librte_ethdev/ethdev_profile.c:#pragma GCC diagnostic warning "-Waddress-of-packed-member"
src/spdk/dpdk/lib/librte_ethdev/rte_ethdev.c:#pragma GCC diagnostic warning "-Waddress-of-packed-member"
src/spdk/dpdk/lib/librte_ethdev/rte_flow.c:#pragma GCC diagnostic warning "-Waddress-of-packed-member"
src/spdk/dpdk/lib/librte_ethdev/rte_mtr.c:#pragma GCC diagnostic warning "-Waddress-of-packed-member"
src/spdk/dpdk/lib/librte_mempool/rte_mempool.c:#pragma GCC diagnostic warning "-Waddress-of-packed-member"
src/spdk/dpdk/lib/librte_net/rte_arp.c:#pragma GCC diagnostic warning "-Waddress-of-packed-member"
src/spdk/dpdk/lib/librte_net/rte_net.c:#pragma GCC diagnostic warning "-Waddress-of-packed-member"
src/spdk/dpdk/lib/librte_ring/rte_ring.c:#pragma GCC diagnostic warning "-Waddress-of-packed-member"

Harry Coin (hcoin) wrote : Dependencies.txt

apport information

tags: added: apport-collected eoan
description: updated
Harry Coin (hcoin) wrote : ProcCpuinfoMinimal.txt

apport information

Harry Coin (hcoin) wrote : modified.conffile..etc.default.apport.txt

apport information

Harry Coin (hcoin) wrote :

Here's a core.gz of a typical crash, in case the previous apport upload didn't get it to you.

Harry Coin (hcoin) wrote : Re: [Bug 1842020] Re: ceph patch as of 8/29 segfaults all bluestore osds

Try it with the processors on the VM set to dual Conroe.

On 8/30/19 3:38 AM, Trent Lloyd wrote:
> At a super basic level I can't reproduce this. With an eoan container on
> an eoan host I don't get a segfault from ceph-bluestore-tool.
>
> I'd suggest we may need to look at getting
> (1) a coredump
> (2) the somewhat unlikely but not impossible chance that it's CPU-dependent for some kind of optimization reason or similar as this CPU is quite old [can you confirm the install is also 64-bit?]
> (3) A bunch of information about the system configuration.. e.g. from 'sosreport' would work or similar. [I'm not sure if you can use reportbug to upload system info about an existing bug] - including at least the "dpkg -l" full package list.
>

Harry Coin (hcoin) wrote :

And with debug symbols:

(gdb) run
Starting program: /usr/bin/ceph-bluestore-tool
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Program received signal SIGILL, Illegal instruction.
0x0000555555743984 in eth_dev_init_cb_lists ()
(gdb) backtrace full
#0 0x0000555555743984 in eth_dev_init_cb_lists ()
No symbol table info available.
#1 0x0000555555dc045d in __libc_csu_init ()
No symbol table info available.
#2 0x00007fffee524e2e in __libc_start_main (main=0x5555557346b0 <main(int, char**)>, argc=1, argv=0x7fffffffe328, init=0x555555dc0410 <__libc_csu_init>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe318)
at ../csu/libc-start.c:264
result = <optimized out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140737351328816, 5390835150573769728, 140737353589600, 140737353586152, 1, 140737488347944, 140737488347960, 140737354007706}, mask_was_saved = 8}}, priv = {pad = {0x1,
0x7fffffffe328, 0x7fffffffe338, 0x7ffff7ffe190}, data = {prev = 0x1, cleanup = 0x7fffffffe328, canceltype = -7368}}}
not_first_call = <optimized out>
#3 0x000055555581e47e in _start () at /usr/include/c++/9/ostream:108
No symbol table info available.
(gdb) backtrace full
#0 0x0000555555743984 in eth_dev_init_cb_lists () at /usr/include/c++/9/ostream:108
No symbol table info available.
#1 0x0000555555dc045d in __libc_csu_init ()
No symbol table info available.
#2 0x00007fffee524e2e in __libc_start_main (main=0x5555557346b0 <main(int, char**)>, argc=1, argv=0x7fffffffe328, init=0x555555dc0410 <__libc_csu_init>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe318)
at ../csu/libc-start.c:264
result = <optimized out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140737351328816, 5390835150573769728, 140737353589600, 140737353586152, 1, 140737488347944, 140737488347960, 140737354007706}, mask_was_saved = 8}}, priv = {pad = {0x1,
0x7fffffffe328, 0x7fffffffe338, 0x7ffff7ffe190}, data = {prev = 0x1, cleanup = 0x7fffffffe328, canceltype = -7368}}}
not_first_call = <optimized out>
#3 0x000055555581e47e in _start () at /usr/include/c++/9/ostream:108
No symbol table info available.
(gdb) info registers
rax 0x555555fe0340 93825003291456
rbx 0x36 54
rcx 0xb 11
rdx 0x5555568921a0 93825012408736
rsi 0x7fffffffe328 140737488347944
rdi 0x1 1
rbp 0xc5 0xc5
rsp 0x7fffffffe208 0x7fffffffe208
r8 0x0 0
r9 0x0 0
r10 0x642e6264626f6c62 7218815436009204834
r11 0x20 32
r12 0x555555f2b510 93825002550544
r13 0x1 1
r14 0x7fffffffe328 140737488347944
r15 0x5555568921a0 93825012408736
rip 0x555555743984 0x555555743984 <eth_dev_init_cb_lists+68>
eflags 0x10212 [ AF IF RF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
f...


Harry Coin (hcoin) wrote :

I think I found it, but could use some validation.
Notice at https://ceph.io/geen-categorie/sse-optimization-for-erasure-code-in-ceph/
we have a precedent for ceph checking which level of SSE instructions is available and then using the appropriate one.

However, in the ubuntu version, littered around the makefiles we see -msse4.2 in several places
oddly (there is no mssse3)
-msse -msse2 -msse3 -mssse3 -mpclmul -msse4.1 -msse4.2

In rocksdb we often see -msse4.2.

Canonical should remove the -msse4.2 compiler flags, since ceph does not advertise that it is incompatible with systems that have less than SSE4 capabilities.

I'm looking into this further, but it appears to fit what I know so far.
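
A minimal sketch of how the suspicion above could be checked against a machine: map the compiler flags listed in this comment to the feature names that /proc/cpuinfo prints, and report which ones the CPU does not advertise. The function names and the `GCC_TO_CPUINFO` table are illustrative assumptions, not anything from ceph or Ubuntu. A flag showing up as unsupported only means the build is allowed to emit such instructions, not that it actually does.

```python
# Assumed mapping from GCC -m flags to /proc/cpuinfo feature names.
# Note that cpuinfo spells SSE3 as "pni" and PCLMUL as "pclmulqdq".
GCC_TO_CPUINFO = {
    "-msse": "sse",
    "-msse2": "sse2",
    "-msse3": "pni",
    "-mssse3": "ssse3",
    "-mpclmul": "pclmulqdq",
    "-msse4.1": "sse4_1",
    "-msse4.2": "sse4_2",
}


def cpu_flags(cpuinfo_text):
    """Return the feature flags of the first processor in cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def unsupported_flags(gcc_flags, cpuinfo_text):
    """List the known -m flags whose feature the CPU does not advertise."""
    have = cpu_flags(cpuinfo_text)
    return [
        flag
        for flag in gcc_flags.split()
        if flag in GCC_TO_CPUINFO and GCC_TO_CPUINFO[flag] not in have
    ]


# Usage on a live system (not executed here):
#   with open("/proc/cpuinfo") as handle:
#       print(unsupported_flags("-mssse3 -msse4.1 -msse4.2", handle.read()))
```

Flags that are not in the table are silently ignored, so the result is a lower bound. Inside a VM the hypervisor may also hide features the host CPU has.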

Harry Coin (hcoin) wrote :

Not so great minds think alike. Here it is, from upstream:
https://tracker.ceph.com/issues/41330

Harry Coin (hcoin) wrote :

Upstream has two approaches to a solution. One was to disable SPDK except for development versions, because the SPDK folks set their minimum supported CPU level to corei7. I couldn't get that patch to work in the Ubuntu packaging from 'apt-get source ceph'.

I was able to get the other patch working, the one that edits the two files mentioned in the above: it changes the memcpy code and comments out the corei7 setting.
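
For anyone trying the same thing, a hedged sketch of one way to locate where a CPU baseline is being raised in an unpacked source tree such as the one from 'apt-get source ceph'. It only looks for `-march=...` and `-msse4*` strings. The function names and the regular expression are illustrative assumptions, and which files actually carry the corei7 setting is not stated here.

```python
import os
import re

# Matches -march=<cpu> and -msse4, -msse4.1, -msse4.2. Other -m flags are
# deliberately ignored to keep the output short.
ISA_FLAG = re.compile(r"-march=[\w.+-]+|-msse4(?:\.[12])?")


def scan_text(text):
    """Return (line_number, flag) pairs for every match in a block of text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in ISA_FLAG.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


def find_isa_flags(root):
    """Yield (path, line_number, flag) for every match in files under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    text = handle.read()
            except OSError:
                continue  # unreadable file, skip it
            for lineno, flag in scan_text(text):
                yield path, lineno, flag


# Usage (the directory name is a placeholder; not executed here):
#   for hit in find_isa_flags("path/to/unpacked/source"):
#       print(*hit)
```

The scan reads every file, including binaries, so on a large tree it is slow; restricting it to makefiles and CMake files would be a reasonable refinement.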

What I would like to see, as a general solution that might set Canonical apart from others in a good way, is:

when compiling using dpkg-buildpackage ...
a Canonical-wide flag that overrides whatever -msse and -march defaults are in place and replaces them with -march=native.

In that way, those who want to compile a package to get best performance (or any performance) on a particular machine can make it 'just work'.

James Page (james-page)
Changed in ceph (Ubuntu):
status: New → Triaged
importance: Undecided → High
James Page (james-page) wrote :

Package with SPDK disabled (in line with the short-term upstream fix) is building here:

  https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/3535

This also includes a fix for py3 compat in ceph-crash.

James Page (james-page)
Changed in ceph (Ubuntu):
status: Triaged → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ceph - 14.2.2-0ubuntu3

---------------
ceph (14.2.2-0ubuntu3) eoan; urgency=medium

  * d/rules: Disable SPDK support as this generates a build which
    has a minimum CPU baseline of 'corei7' on x86_64 which is not
    compatible with older CPU's (LP: #1842020).
  * d/p/issue40781.patch: Cherry pick fix for py3 compatibility in ceph-
    crash.

 -- James Page <email address hidden> Tue, 03 Sep 2019 14:52:38 +0100

Changed in ceph (Ubuntu):
status: Fix Committed → Fix Released