CONFIG_LOG_BUF_SHIFT set to 14 is too low on arm64

Bug #1824864 reported by Dimitri John Ledkov
14
This bug affects 1 person
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Fix Released
Undecided
Unassigned
Xenial
Fix Released
Undecided
Unassigned
Bionic
Fix Released
Undecided
Unassigned
Cosmic
Invalid
Undecided
Unassigned
Disco
Fix Released
Undecided
Unassigned
Eoan
Fix Released
Undecided
Unassigned

Bug Description

[Impact]

 * Too small dmsg kernel buf ring size leads to loosing/missing early boot kernel messages which happen before journald starts slurping them up and storing them on disc. This results in messages similar to this one on boot "missed NN kernel messages on boot". This is especially pronounced on arm64 as the default setting there is way lower than any other 32bit or 64bit architecture we ship. Also amd64 appears to have the highest setting of 18 among all architectures we ship. The best course of action to bump all 64bit arches to 18, and keep all 32bit arches at the current & upstream default of 17.

[Test Case]

 * $ cat /boot/config-`uname -r` | grep CONFIG_LOG_BUF_SHIFT

on 64bit arches result should be: CONFIG_LOG_BUF_SHIFT=18
on 32bit arches result should be: CONFIG_LOG_BUF_SHIFT=17

 * run systemd adt test, the boot-and-services test case should not fail journald tests with "missed kernel messages" visible in the error logs.

[Regression Potential]

 * Increasing the size of the log_buf, will increase kernel memory usage which cannot be reclaimed. It will now become 256kb on arm64, ppc64el, s390x instead of 8kB/128kb/128kb respectively. 32bit arches remain unchanged at 128kb.

[Other Info]

 * Original bug report

CONFIG_LOG_BUF_SHIFT
policy<{
'amd64' : '18',
'arm64' : '14',
'armhf' : '17',
'i386' : '17',
'ppc64el': '17',
's390x' : '17'}>

Please set CONFIG_LOG_BUF_SHIFT to at least 17 on arm64.

Potentially bump all 64-bit arches to 18 (or higher!) as was done on amd64, meaning set 18 on arm64 s390x ppc64el.

I have a systemd autopkgtest test that asserts that we see Linux kernel command line in the dmesg (journalctl -k -b). And it is consistently failing on arm64 scalingstack KVM EFI machines with messages of "missing 81 kernel messages".

config LOG_BUF_SHIFT
        int "Kernel log buffer size (16 => 64KB, 17 => 128KB)"
        range 12 25
        default 17
        depends on PRINTK
        help
          Select the minimal kernel log buffer size as a power of 2.
          The final size is affected by LOG_CPU_MAX_BUF_SHIFT config
          parameter, see below. Any higher size also might be forced
          by "log_buf_len" boot parameter.

          Examples:
                     17 => 128 KB
                     16 => 64 KB
                     15 => 32 KB
                     14 => 16 KB
                     13 => 8 KB
                     12 => 4 KB

14 sounds like redictiously low for arm64. given that 17 is default across 32-bit arches, and 18 is default on amd64.

On a related note, we have CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT policy<{'amd64': '13', 'arm64': '13', 'armhf': '13', 'i386': '13', 'ppc64el': '13', 's390x': '13'}>
I'm not sure if we want to bump these up to LOG_BUF_SHIFT size or not.

Please backport this to xenial and up.

=== systemd ===

systemd, boot-and-services test case can bump the ring buffer before running the tests.

tags: added: bot-stop-nagging
summary: - CONFIG_LOG_BUF_SHIFT is too low on arm64
+ CONFIG_LOG_BUF_SHIFT set to 14 is too low on arm64
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1824864

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Changed in linux (Ubuntu Bionic):
status: New → Incomplete
Changed in linux (Ubuntu Cosmic):
status: New → Incomplete
Changed in linux (Ubuntu Xenial):
status: New → Incomplete
description: updated
Changed in linux (Ubuntu Disco):
status: Incomplete → Confirmed
Changed in linux (Ubuntu Cosmic):
status: Incomplete → Confirmed
Changed in linux (Ubuntu Bionic):
status: Incomplete → Confirmed
Changed in linux (Ubuntu Xenial):
status: Incomplete → Confirmed
description: updated
description: updated
Changed in linux (Ubuntu Disco):
status: Confirmed → Fix Committed
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-disco' to 'verification-done-disco'. If the problem still exists, change the tag 'verification-needed-disco' to 'verification-failed-disco'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-disco
Revision history for this message
Dimitri John Ledkov (xnox) wrote :

$ cat /boot/config-5.0.0-16-generic | grep LOG_BUF
CONFIG_LOG_BUF_SHIFT=18
CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=13

$ uname -a
Linux doerfel 5.0.0-16-generic #17-Ubuntu SMP Wed May 15 10:54:19 UTC 2019 aarch64 aarch64 aarch64 GNU/Linux

And journal shows complete kernel boot messages, thus this all now good on disco.

tags: added: verification-done-disco
removed: verification-needed-disco
no longer affects: systemd (Ubuntu Xenial)
no longer affects: systemd (Ubuntu Bionic)
no longer affects: systemd (Ubuntu Cosmic)
no longer affects: systemd (Ubuntu Disco)
no longer affects: systemd (Ubuntu Eoan)
no longer affects: systemd (Ubuntu)
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (24.0 KiB)

This bug was fixed in the package linux - 5.0.0-16.17

---------------
linux (5.0.0-16.17) disco; urgency=medium

  * linux: 5.0.0-16.17 -proposed tracker (LP: #1829173)

  * shiftfs: lock security sensitive superblock flags (LP: #1827122)
    - SAUCE: shiftfs: lock down certain superblock flags

  * Please package libbpf (which is done out of the kernel src) in Debian [for
    19.10] (LP: #1826410)
    - SAUCE: tools -- fix add ability to disable libbfd

  * Disco update: 5.0.8 upstream stable release (LP: #1828415)
    - drm/i915/gvt: do not let pin count of shadow mm go negative
    - kbuild: pkg: use -f $(srctree)/Makefile to recurse to top Makefile
    - netfilter: nft_compat: use .release_ops and remove list of extension
    - netfilter: nf_tables: use-after-free in dynamic operations
    - netfilter: nf_tables: add missing ->release_ops() in error path of newrule()
    - hv_netvsc: Fix unwanted wakeup after tx_disable
    - ibmvnic: Fix completion structure initialization
    - ip6_tunnel: Match to ARPHRD_TUNNEL6 for dev type
    - ipv6: Fix dangling pointer when ipv6 fragment
    - ipv6: sit: reset ip header pointer in ipip6_rcv
    - kcm: switch order of device registration to fix a crash
    - net: ethtool: not call vzalloc for zero sized memory request
    - net-gro: Fix GRO flush when receiving a GSO packet.
    - net/mlx5: Decrease default mr cache size
    - netns: provide pure entropy for net_hash_mix()
    - net: rds: force to destroy connection if t_sock is NULL in
      rds_tcp_kill_sock().
    - net/sched: act_sample: fix divide by zero in the traffic path
    - net/sched: fix ->get helper of the matchall cls
    - qmi_wwan: add Olicard 600
    - r8169: disable ASPM again
    - sctp: initialize _pad of sockaddr_in before copying to user memory
    - tcp: Ensure DCTCP reacts to losses
    - tcp: fix a potential NULL pointer dereference in tcp_sk_exit
    - vrf: check accept_source_route on the original netdevice
    - net/mlx5e: Fix error handling when refreshing TIRs
    - net/mlx5e: Add a lock on tir list
    - nfp: validate the return code from dev_queue_xmit()
    - nfp: disable netpoll on representors
    - bnxt_en: Improve RX consumer index validity check.
    - bnxt_en: Reset device on RX buffer errors.
    - net: ip_gre: fix possible use-after-free in erspan_rcv
    - net: ip6_gre: fix possible use-after-free in ip6erspan_rcv
    - net: bridge: always clear mcast matching struct on reports and leaves
    - net: thunderx: fix NULL pointer dereference in nicvf_open/nicvf_stop
    - net: vrf: Fix ping failed when vrf mtu is set to 0
    - net: core: netif_receive_skb_list: unlist skb before passing to pt->func
    - r8169: disable default rx interrupt coalescing on RTL8168
    - net: mlx5: Add a missing check on idr_find, free buf
    - net/mlx5e: Update xoff formula
    - net/mlx5e: Update xon formula
    - kbuild: clang: choose GCC_TOOLCHAIN_DIR not on LD
    - lib/string.c: implement a basic bcmp
    - Revert "clk: meson: clean-up clock registration"
    - tty: mark Siemens R3964 line discipline as BROKEN
    - [Config]: remove CONFIG_R3964
    - [Config]: add CONFIG_LDISC_AUTOLOAD=y
    - tty: ldisc: add sysctl to p...

Changed in linux (Ubuntu Disco):
status: Fix Committed → Fix Released
Changed in linux (Ubuntu Eoan):
status: Confirmed → Fix Released
Revision history for this message
Evgeny Vereshchagin (evvers) wrote :

Given that https://github.com/systemd/systemd/issues/11104 is a variation of this bug and the reason `arm64` has been off since December 2018, I'm wondering how long it will take to fix this on Bionic.

Changed in linux (Ubuntu Bionic):
status: Confirmed → Fix Committed
Changed in linux (Ubuntu Cosmic):
status: Confirmed → Fix Committed
Changed in linux (Ubuntu Xenial):
status: Confirmed → Fix Committed
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-cosmic' to 'verification-done-cosmic'. If the problem still exists, change the tag 'verification-needed-cosmic' to 'verification-failed-cosmic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-cosmic
tags: added: verification-needed-bionic
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

Revision history for this message
Dan Streetman (ddstreet) wrote :
tags: added: verification-done-xenial
removed: verification-needed-xenial
Revision history for this message
Dan Streetman (ddstreet) wrote :

cosmic:

the failure has been more intermittent on cosmic, but the latest run on kernel 4.18.0-26 did not fail:
https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-cosmic/cosmic/arm64/s/systemd/20190703_141922_8afb6@/log.gz

tags: added: verification-done-cosmic
removed: verification-needed-cosmic
Revision history for this message
Dan Streetman (ddstreet) wrote :

bionic:

the testbed appears to be very slow and is regularly timing out (or, there is some code or test bug that is hanging the test), but it does it make it past testing boot-and-services, with the results:

boot-and-services/test_no_options fails with kernel 4.15.0-54:
https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-bionic/bionic/arm64/s/systemd/20190625_054508_8441b@/log.gz

boot-and-services/test_no_options passes with kernel 4.15.0-55:
https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-bionic/bionic/arm64/s/systemd/20190703_184234_631d0@/log.gz

tags: added: verification-done-bionic
removed: verification-needed-bionic
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (11.2 KiB)

This bug was fixed in the package linux - 4.15.0-55.60

---------------
linux (4.15.0-55.60) bionic; urgency=medium

  * linux: 4.15.0-55.60 -proposed tracker (LP: #1834954)

  * Request backport of ceph commits into bionic (LP: #1834235)
    - ceph: use atomic_t for ceph_inode_info::i_shared_gen
    - ceph: define argument structure for handle_cap_grant
    - ceph: flush pending works before shutdown super
    - ceph: send cap releases more aggressively
    - ceph: single workqueue for inode related works
    - ceph: avoid dereferencing invalid pointer during cached readdir
    - ceph: quota: add initial infrastructure to support cephfs quotas
    - ceph: quota: support for ceph.quota.max_files
    - ceph: quota: don't allow cross-quota renames
    - ceph: fix root quota realm check
    - ceph: quota: support for ceph.quota.max_bytes
    - ceph: quota: update MDS when max_bytes is approaching
    - ceph: quota: add counter for snaprealms with quota
    - ceph: avoid iput_final() while holding mutex or in dispatch thread

  * QCA9377 isn't being recognized sometimes (LP: #1757218)
    - SAUCE: USB: Disable USB2 LPM at shutdown

  * hns: fix ICMP6 neighbor solicitation messages discard problem (LP: #1833140)
    - net: hns: fix ICMP6 neighbor solicitation messages discard problem
    - net: hns: fix unsigned comparison to less than zero

  * Fix occasional boot time crash in hns driver (LP: #1833138)
    - net: hns: Fix probabilistic memory overwrite when HNS driver initialized

  * use-after-free in hns_nic_net_xmit_hw (LP: #1833136)
    - net: hns: fix KASAN: use-after-free in hns_nic_net_xmit_hw()

  * hns: attempt to restart autoneg when disabled should report error
    (LP: #1833147)
    - net: hns: Restart autoneg need return failed when autoneg off

  * systemd 237-3ubuntu10.14 ADT test failure on Bionic ppc64el (test-seccomp)
    (LP: #1821625)
    - powerpc: sys_pkey_alloc() and sys_pkey_free() system calls
    - powerpc: sys_pkey_mprotect() system call

  * [UBUNTU] pkey: Indicate old mkvp only if old and curr. mkvp are different
    (LP: #1832625)
    - pkey: Indicate old mkvp only if old and current mkvp are different

  * [UBUNTU] kernel: Fix gcm-aes-s390 wrong scatter-gather list processing
    (LP: #1832623)
    - s390/crypto: fix gcm-aes-s390 selftest failures

  * System crashes on hot adding a core with drmgr command (4.15.0-48-generic)
    (LP: #1833716)
    - powerpc/numa: improve control of topology updates
    - powerpc/numa: document topology_updates_enabled, disable by default

  * Kernel modules generated incorrectly when system is localized to a non-
    English language (LP: #1828084)
    - scripts: override locale from environment when running recordmcount.pl

  * [UBUNTU] kernel: Fix wrong dispatching for control domain CPRBs
    (LP: #1832624)
    - s390/zcrypt: Fix wrong dispatching for control domain CPRBs

  * CVE-2019-11815
    - net: rds: force to destroy connection if t_sock is NULL in
      rds_tcp_kill_sock().

  * Sound device not detected after resume from hibernate (LP: #1826868)
    - drm/i915: Force 2*96 MHz cdclk on glk/cnl when audio power is enabled
    - drm/i915: Save the old CDCLK atomic state
...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (30.5 KiB)

This bug was fixed in the package linux - 4.4.0-157.185

---------------
linux (4.4.0-157.185) xenial; urgency=medium

  * linux: 4.4.0-157.185 -proposed tracker (LP: #1837476)

  * systemd 229-4ubuntu21.22 ADT test failure with linux 4.4.0-156.183 (storage)
    (LP: #1837235)
    - Revert "block/bio: Do not zero user pages"
    - Revert "block: Clear kernel memory before copying to user"
    - Revert "bio_copy_from_iter(): get rid of copying iov_iter"

linux (4.4.0-156.183) xenial; urgency=medium

  * linux: 4.4.0-156.183 -proposed tracker (LP: #1836880)

  * BCM43602 802.11ac Wireless regression - PCI ID 14e4:43ba (LP: #1836801)
    - brcmfmac: add eth_type_trans back for PCIe full dongle

linux (4.4.0-155.182) xenial; urgency=medium

  * linux: 4.4.0-155.182 -proposed tracker (LP: #1834918)

  * Geneve tunnels don't work when ipv6 is disabled (LP: #1794232)
    - geneve: correctly handle ipv6.disable module parameter

  * Kernel modules generated incorrectly when system is localized to a non-
    English language (LP: #1828084)
    - scripts: override locale from environment when running recordmcount.pl

  * Handle overflow in proc_get_long of sysctl (LP: #1833935)
    - sysctl: handle overflow in proc_get_long

  * Xenial update: 4.4.181 upstream stable release (LP: #1832661)
    - x86/speculation/mds: Revert CPU buffer clear on double fault exit
    - x86/speculation/mds: Improve CPU buffer clear documentation
    - ARM: exynos: Fix a leaked reference by adding missing of_node_put
    - crypto: vmx - fix copy-paste error in CTR mode
    - crypto: crct10dif-generic - fix use via crypto_shash_digest()
    - crypto: x86/crct10dif-pcl - fix use via crypto_shash_digest()
    - ALSA: usb-audio: Fix a memory leak bug
    - ALSA: hda/hdmi - Consider eld_valid when reporting jack event
    - ALSA: hda/realtek - EAPD turn on later
    - ASoC: max98090: Fix restore of DAPM Muxes
    - ASoC: RT5677-SPI: Disable 16Bit SPI Transfers
    - mm/mincore.c: make mincore() more conservative
    - ocfs2: fix ocfs2 read inode data panic in ocfs2_iget
    - mfd: da9063: Fix OTP control register names to match datasheets for
      DA9063/63L
    - tty/vt: fix write/write race in ioctl(KDSKBSENT) handler
    - ext4: actually request zeroing of inode table after grow
    - ext4: fix ext4_show_options for file systems w/o journal
    - Btrfs: do not start a transaction at iterate_extent_inodes()
    - bcache: fix a race between cache register and cacheset unregister
    - bcache: never set KEY_PTRS of journal key to 0 in journal_reclaim()
    - ipmi:ssif: compare block number correctly for multi-part return messages
    - crypto: gcm - Fix error return code in crypto_gcm_create_common()
    - crypto: gcm - fix incompatibility between "gcm" and "gcm_base"
    - crypto: chacha20poly1305 - set cra_name correctly
    - crypto: salsa20 - don't access already-freed walk.iv
    - crypto: arm/aes-neonbs - don't access already-freed walk.iv
    - writeback: synchronize sync(2) against cgroup writeback membership switches
    - fs/writeback.c: use rcu_barrier() to wait for inflight wb switches going
      into workqueue when umount
    - ALSA: hda/realtek - Fix for Lenovo B...

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
Terry Rudd (terrykrudd)
Changed in linux (Ubuntu Cosmic):
status: Fix Committed → Invalid
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.