Kdump fails on Ubuntu 16.04 (PowerVM/PowerKVM/BareMetal)

Bug #1536904 reported by bugproxy
This bug affects 1 person
Affects                  Status         Importance   Assigned to   Milestone
makedumpfile (Ubuntu)    Fix Released   High         Unassigned
Wily                     Confirmed      High         Unassigned

Bug Description

== Comment: #0 - ==
---Problem Description---
Kdump fails on Ubuntu 16.04 with Austin adapter(tg3)

Contact Information = <email address hidden>, <email address hidden>,<email address hidden>

---uname output---
linux ltciofvtr-s822l1 4.3.0-5-generic #16-Ubuntu SMP Wed Dec 16 23:32:23 UTC 2015 ppc64le ppc64le ppc64le GNU/Linux

---Additional Hardware Info---
Machine details:
9.47.67.156 (root/ltcnetdd)

Machine Type = 8247-22L

---System Hang---
The system hangs after a crash is triggered. A reboot is needed to bring it back up and make it functional again.

---Debugger---
A debugger is not configured

---Steps to Reproduce---
 Steps to follow:
1. apt-get install linux-crashdump
2. apt-get install kdump-tools
3. Edit /etc/default/kdump-tools and change the following:
USE_KDUMP=0 to 1
4. Change the size of the crash kernel in /boot/grub/grub.cfg to crashkernel=4096M-:4096M
5. Load the kdump config file: kdump-config load
6. echo 1 > /proc/sys/kernel/sysrq
7. echo c > /proc/sysrq-trigger

Things to look at to cross-check are:

After loading the kdump config file, check its status:
root@ltciofvtr-s822l1:~# kdump-config show
DUMP_MODE: kdump
USE_KDUMP: 1
KDUMP_SYSCTL: kernel.panic_on_oops=1
KDUMP_COREDIR: /var/crash
crashkernel addr:
SSH: root@35.35.35.36
SSH_KEY: /root/.ssh/id_rsa
HOSTTAG: ip
current state: ready to kdump

kexec command:
  /sbin/kexec -p --args-linux --command-line="root=UUID=e445a093-4593-4e91-bebb-6968483bf2ea ro quiet splash irqpoll maxcpus=1 nousb systemd.unit=kdump-tools.service" --initrd=/boot/initrd.img-4.3.0-5-generic /boot/vmlinux-4.3.0-5-generic

root@ltciofvtr-s822l1:~# kdump-config status
 * Broken symlink : /var/lib/kdump/vmlinuz: broken symbolic link to /boot/vmlinuz-4.3.0-5-generic
current state : ready to kdump
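
The "current state" line is the quickest thing to read out of the two outputs above. A minimal sketch for extracting it, assuming the output format shown in this report (with or without a space before the colon); the function name `kdump_state` is made up for illustration and is not part of kdump-tools:

```shell
#!/bin/sh
# Print the value of the "current state" line from kdump-config output
# read on stdin. Accepts both "current state:" and "current state :".
kdump_state() {
    sed -n 's/^[[:space:]]*current state[[:space:]]*:[[:space:]]*//p'
}

# Hypothetical usage on a live system:
#   kdump-config show | kdump_state
```

Note that a "ready to kdump" state did not rule out the broken symlink here, so the state line alone is not a sufficient check.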

root@ltciofvtr-s822l1:~# cat /proc/cmdline
root=UUID=e445a093-4593-4e91-bebb-6968483bf2ea ro quiet splash crashkernel=4096M-:4096M

root@ltciofvtr-s822l1:~# dmesg| grep -i crash
[ 0.000000] Reserving 4096MB of memory at 128MB for crashkernel (System RAM: 131072MB)
[ 0.000000] Kernel command line: root=UUID=e445a093-4593-4e91-bebb-6968483bf2ea ro quiet splash crashkernel=4096M-:4096M
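
The value `crashkernel=4096M-:4096M` uses the kernel's range syntax, `<start>-[<end>]:<size>`: the size is reserved when system RAM falls inside the range, and an empty end means no upper bound. With 131072MB of RAM the single entry applies, which matches the "Reserving 4096MB" line above. A minimal sketch of that selection, assuming range-form entries only, sizes in M or G, and no `@offset` suffix; `to_mb` and `crashkernel_size_mb` are illustrative names, not kernel or kdump-tools functions:

```shell
#!/bin/sh
# Convert a size such as 4096M or 4G to megabytes (M and G suffixes only).
to_mb() {
    case "$1" in
        *G) echo $(( ${1%G} * 1024 )) ;;
        *M) echo "${1%M}" ;;
        *)  return 1 ;;
    esac
}

# crashkernel_size_mb SPEC RAM_MB
# SPEC is a comma-separated list of START-[END]:SIZE entries.
# Prints the size (in MB) of the first entry whose range contains RAM_MB,
# or returns non-zero if no entry matches.
crashkernel_size_mb() {
    spec=$1
    ram=$2
    old_ifs=$IFS
    IFS=,
    set -- $spec          # split SPEC on commas into positional parameters
    IFS=$old_ifs
    for entry in "$@"; do
        range=${entry%%:*}
        size=${entry#*:}
        start=$(to_mb "${range%%-*}") || return 1
        end=${range#*-}   # empty when the range is open-ended
        if [ "$ram" -ge "$start" ]; then
            if [ -z "$end" ] || [ "$ram" -lt "$(to_mb "$end")" ]; then
                to_mb "$size"
                return 0
            fi
        fi
    done
    return 1
}

# Usage with the values from this report:
#   crashkernel_size_mb 4096M-:4096M 131072
```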

Observations:
1. The kdump-config status command reports a broken symbolic link, suggesting that kdump-config is unable to handle the symbolic link.

2. Trace observed on console:
root@ltciofvtr-s822l1:~# echo c | tee /proc/sysrq-trigger
c
[ 238.872102] sysrq: SysRq : Trigger a crash
[ 238.872179] Unable to handle kernel paging request for data at address 0x00000000
[ 238.872256] Faulting instruction address: 0xc000000000646534
[ 238.872322] Oops: Kernel access of bad area, sig: 11 [#1]
[ 238.872373] SMP NR_CPUS=2048 NUMA PowerNV
[ 238.872427] Modules linked in: dm_round_robin dm_service_time ipmi_powernv ipmi_msghandler leds_powernv uio_pdrv_genirq powernv_rng uio dm_multipath sunrpc bonding autofs4 btrfs xor raid6_pq mlx4_en ses enclosure bnx2x mlx4_core lpfc qla2xxx mdio libcrc32c be2net e1000e vxlan ipr ip6_udp_tunnel udp_tunnel scsi_transport_fc
[ 238.872895] CPU: 121 PID: 3861 Comm: tee Not tainted 4.3.0-5-generic #16-Ubuntu
[ 238.872973] task: c000000fe01ce860 ti: c000000fe022c000 task.ti: c000000fe022c000
[ 238.873049] NIP: c000000000646534 LR: c0000000006475f8 CTR: c000000000646500
[ 238.873125] REGS: c000000fe022f990 TRAP: 0300 Not tainted (4.3.0-5-generic)
[ 238.873200] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 28004222 XER: 20000000
[ 238.873392] CFAR: c000000000008468 DAR: 0000000000000000 DSISR: 42000000 SOFTE: 1
GPR00: c0000000006475f8 c000000fe022fc10 c00000000155e400 0000000000000063
GPR04: c0000007fc648450 c0000007fc659cf0 c000001fff830000 0000000000000792
GPR08: 0000000000000007 0000000000000001 0000000000000000 c000001fff861780
GPR12: c000000000646500 c000000007b87d80 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000010009d88 0000000000000001
GPR24: 0000000010009d88 00003fffe7b210b0 c0000000014a5cb0 0000000000000004
GPR28: c0000000014a6070 0000000000000063 c000000001460de4 0000000000000000
[ 238.875062] NIP [c000000000646534] sysrq_handle_crash+0x34/0x50
[ 238.875178] LR [c0000000006475f8] __handle_sysrq+0xe8/0x280
[ 238.875270] Call Trace:
[ 238.875322] [c000000fe022fc10] [c000000000dc92a0] _fw_tigon_tg3_bin_name+0x2c5d0/0x33708 (unreliable)
[ 238.875516] [c000000fe022fc30] [c0000000006475f8] __handle_sysrq+0xe8/0x280
[ 238.875658] [c000000fe022fcd0] [c000000000647da8] write_sysrq_trigger+0x78/0xa0
[ 238.875820] [c000000fe022fd00] [c00000000036bf50] proc_reg_write+0xb0/0x110
[ 238.875963] [c000000fe022fd50] [c0000000002d45bc] __vfs_write+0x6c/0xe0
[ 238.876104] [c000000fe022fd90] [c0000000002d52f0] vfs_write+0xc0/0x230
[ 238.876246] [c000000fe022fde0] [c0000000002d632c] SyS_write+0x6c/0x110
[ 238.876389] [c000000fe022fe30] [c000000000009204] system_call+0x38/0xb4
[ 238.876525] Instruction dump:
[ 238.876601] 38427f00 7c0802a6 f8010010 f821ffe1 60000000 60000000 3d22001a 3949aae4
[ 238.876843] 39200001 912a0000 7c0004ac 39400000 <992a0000> 38210020 e8010010 7c0803a6
[ 238.877091] ---[ end trace 2028716a4fb3f0e5 ]---
[ 238.880521]
[ 238.880590] Sending IPI to other CPUs
[ 238.881716] IPI complete

The system hang is observed here.

3. No crash dump generated after a reboot.

4. The kdump hang is also observed on KVM and PowerVM, as well as on OpenPOWER.
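
The broken symlink from observation 1 can also be confirmed without kdump-config. A minimal sketch, assuming a POSIX shell; `broken_links` is a hypothetical helper name:

```shell
#!/bin/sh
# Print every symbolic link directly under DIR whose target does not exist.
broken_links() {
    dir=$1
    for f in "$dir"/*; do
        # -L: the entry is a symlink; ! -e: following it finds nothing.
        if [ -L "$f" ] && [ ! -e "$f" ]; then
            echo "$f"
        fi
    done
}

# Hypothetical usage: broken_links /var/lib/kdump
```

On the affected machine this would be expected to list /var/lib/kdump/vmlinuz, since the link points to /boot/vmlinuz-4.3.0-5-generic while the kexec command above uses /boot/vmlinux-4.3.0-5-generic.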

Oops output:
 no

System Dump Location:
 No dump generated

*Additional Instructions for <email address hidden>, <email address hidden>,<email address hidden>:
-Post a private note with access information to the machine that the bug is occurring on.
-Attach sysctl -a output to the bug.

bugproxy (bugproxy)
tags: added: architecture-ppc64le bugnameltc-135822 severity-critical targetmilestone-inin---
Changed in ubuntu:
assignee: nobody → Taco Screen team (taco-screen-team)
Ubuntu Foundations Team Bug Bot (crichton) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs irc channel on Freenode.

To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/1536904/+editstatus and add the package name in the text box next to the word Package.

[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]

tags: added: bot-comment
Steve Langasek (vorlon)
affects: ubuntu → makedumpfile (Ubuntu)
Louis Bouchard (louis) wrote :

Hello,

I must admit that I am a bit puzzled by some of your statements :

  ---Steps to Reproduce---
   Steps to follow:
  1. apt-get install linux-crashdump
  2. apt-get install kdump-tools

Step 2 is not required: kdump-tools is a dependency of linux-crashdump, so it is installed automatically.

  3. Edit /etc/default/kdump-tools and change the following:
  USE_KDUMP=0 to 1
This is not required. kdump-tools 1:1.5.9-3 does that automatically during installation and you should be prompted to accept it :

  ┌──────────────────────────────────────────────┤ Configuring kdump-tools ├──────────────────────────────────────────────┐
  │ │
  │ If you choose this option, the kdump-tools mechanism will be enabled. A reboot is still required in order to enable │
  │ the crashkernel kernel parameter. │
  │ │
  │ Should kdump-tools be enabled by default? │
  │ │
  │ <Yes> <No> │
  │ │
  └───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

  4. Change the size of the crash kernel in /boot/grub/grub.cfg to crashkernel=4096M-:4096M
  5. Load the kdump config file: kdump-config load

This will fail with the following :

# kdump-config load
 * no crashkernel= parameter in the kernel cmdline

This is normal, as a reboot is required for the crashkernel parameter to take effect.
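
A quick pre-flight check before running `kdump-config load` is to confirm that the running kernel was actually booted with the parameter. A minimal sketch; `has_crashkernel` is a hypothetical helper, and the optional argument exists only so the check can be exercised against a sample string:

```shell
#!/bin/sh
# Succeed if the given kernel command line contains a crashkernel= parameter.
# With no argument, the running kernel's /proc/cmdline is checked.
has_crashkernel() {
    cmdline=${1-$(cat /proc/cmdline)}
    # Pad with spaces so the match only hits a whole parameter name.
    case " $cmdline " in
        *" crashkernel="*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical usage:
#   has_crashkernel || echo "reboot needed: crashkernel= not on cmdline"
```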

  6. echo 1 > /proc/sys/kernel/sysrq
  7. echo c > /proc/sysrq-trigger

The hang following this command is normal : as previously stated, a reboot is required otherwise kdump-tools is not loaded.

After the reboot, you should see the following in /var/log/syslog :

Jan 25 11:12:25 XenialS-crashdump kdump-tools[523]: Starting kdump-tools: * Missing symlink : /var/lib/kdump/initrd.img
Jan 25 11:12:25 XenialS-crashdump kdump-tools[523]: * Creating symlink /var/lib/kdump/initrd.img
Jan 25 11:12:25 XenialS-crashdump kdump-tools[523]: * Missing symlink : /var/lib/kdump/vmlinuz
Jan 25 11:12:25 XenialS-crashdump kdump-tools[523]: * Creating symlink /var/lib/kdump/vmlinuz
Jan 25 11:12:26 XenialS-crashdump kdump-tools[523]: * loaded kdump kernel

To verify the status of kdump you can do :

# kdump-config show
DUMP_MODE: kdump
USE_KDUMP: 1
KDUMP_SYSCTL: kernel.panic_on_oops=1
KDUMP_COREDIR: /var/crash
crashkernel addr: 0x2c000000
current state: ready to kdump

kexec command:
  /sbin/kexec -p --command-line="BOOT_I...


Changed in makedumpfile (Ubuntu):
status: New → Invalid
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2016-01-25 06:32 EDT-------
(In reply to comment #12)
> Hello,
>
> I must admit that I am a bit puzzled by some of your statements :
>
> ---Steps to Reproduce---
> Steps to follow:
> 1. apt-get install linux-crashdump
> 2. apt-get install kdump-tools
>
> Step 2 is not required : kdump-tools is a dependency of linux-crashdump so
> it is installed automatically :
>
> 3. Edit /etc/default/kdump-tools and change the following:
> USE_KDUMP=0 to 1
> This is not required. kdump-tools 1:1.5.9-3 does that automatically during
> installation and you should be prompted to accept it :
>
> ┤ Configuring kdump-tools ├
>
> If you choose this option, the kdump-tools mechanism will be enabled. A
> reboot is still required in order to enable the crashkernel kernel
> parameter.
>
> Should kdump-tools be enabled by default?
>
> <Yes> <No>
>
> 4. Change the size of the crash kernel in /boot/grub/grub.cfg to
> crashkernel=4096M-:4096M
> 5. Load the kdump config file: kdump-config load
>
> This will fail with the following :
>
> # kdump-config load
> * no crashkernel= parameter in the kernel cmdline
>

Hi Louis,

The description should have another step between 4 & 5.
4a. Reboot.

While this step was missing from the description, the reboot was done, as you
can see from the output of the "kdump-config show" command.

> Which is normal as a reboot is required in order to have the crashkernel
> parameter taken into account after the reboot.
>
> 6. echo 1 > /proc/sys/kernel/sysrq
> 7. echo c > /proc/sysrq-trigger
>
> The hang following this command is normal : as previously stated, a reboot
> is required otherwise kdump-tools is not loaded.
>
> After the reboot, you should see the following in /var/log/syslog :
>
> Jan 25 11:12:25 XenialS-crashdump kdump-tools[523]: Starting kdump-tools: *
> Missing symlink : /var/lib/kdump/initrd.img
> Jan 25 11:12:25 XenialS-crashdump kdump-tools[523]: * Creating symlink
> /var/lib/kdump/initrd.img
> Jan 25 11:12:25 XenialS-crashdump kdump-tools[523]: * Missing symlink :
> /var/lib/kdump/vmlinuz
> Jan 25 11:12:25 XenialS-crashdump kdump-tools[523]: * Creating symlink
> /var/lib/kdump/vmlinuz
> Jan 25 11:12:26 XenialS-crashdump kdump-tools[523]: * loaded kdump kernel
>
> To verify the status of kdump you can do :
>
> # kdump-config show
> DUMP_MODE: kdump
> USE_KDUMP: 1
> KDUMP_SYSCTL: kernel.panic_on_oops=1
> KDUMP_COREDIR: /var/crash
> crashkernel addr: 0x2c000000
> current state: ready to kdump
>
> kexec command:
> /sbin/kexec -p --command-line="BOOT_IMAGE=/vmlinuz-4.3.0-7-generic
> root=/dev/mapper/VividS--vg-root ro console=ttyS0,115200 irqpoll maxcpus=1
> nousb systemd.unit=kdump-tools.service" --initrd=/var/lib/kdump/initrd.img
> /var/lib/kdump/vmlinuz
>
> Your bug statement shows the following :
>
> /sbin/kexec -p --args-linux
> --command-...


Louis Bouchard (louis) wrote :

Hello,

Sorry for the delay in replying.

This is definitely a regression following the implementation of the smaller initrd. I am currently working on fixing this. Your second problem might be caused by not using the smaller initrd, so I would suggest waiting for the fix before testing it.

I can have a test package available quickly if you have the possibility of testing from a PPA (according to a previous bug I think you do) so let me know & I'll tell you where to find the PPA.

Changed in makedumpfile (Ubuntu):
status: Invalid → In Progress
importance: Undecided → High
assignee: Taco Screen team (taco-screen-team) → Louis Bouchard (louis-bouchard)
Changed in makedumpfile (Ubuntu Wily):
status: New → Confirmed
importance: Undecided → High
assignee: nobody → Louis Bouchard (louis-bouchard)
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-01-28 06:24 EDT-------
(In reply to comment #15)
Hey Louis,
I can test with the PPA. Kindly point me to its location.

Louis Bouchard (louis) wrote :

Hello,

You can find the upcoming version in the following PPA : ppa:louis-bouchard/makedumpfile-tests. Here is the changelog :

  * Allow for symlinks to be created for vmlinux files : On Power8
    architecture, systems are booting from a vmlinux file. The symlink
    /var/lib/kdump/vmlinuz has to point to this file. (LP: #1536904)

  * Add functionality to create symlinks for older kernels : If kdump
    is installed on systems with more than one kernel package, the smaller
    initrd.img file will only be created for the latest kernel. Adding the
    'symlinks' functionality will allow for the creation of symlinks to
    older kernels. If the smaller initrd.img file is missing in /var/lib/kdump
    it will be created beforehand. This will be preempted if kdump is already
    loaded. (LP: #1537714)

  * Fix kdump-config manpage : add documentation for the propagate option.
    (LP: #1538148)

  * Improve manpage for kdump-config
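
The first changelog entry can be pictured as a lookup that falls back to the uncompressed image name. A minimal sketch under that assumption; it is not the code shipped in kdump-tools, and `pick_kernel_image` and `link_kdump_kernel` are hypothetical names:

```shell
#!/bin/sh
# pick_kernel_image BOOT_DIR VERSION
# Print the kernel image for VERSION: vmlinuz-VERSION if present, otherwise
# vmlinux-VERSION (the name used in the kexec command in this report).
pick_kernel_image() {
    boot_dir=$1
    version=$2
    for name in vmlinuz vmlinux; do
        if [ -e "$boot_dir/$name-$version" ]; then
            echo "$boot_dir/$name-$version"
            return 0
        fi
    done
    return 1
}

# link_kdump_kernel BOOT_DIR VERSION KDUMP_DIR
# Point KDUMP_DIR/vmlinuz at whichever image was found.
link_kdump_kernel() {
    image=$(pick_kernel_image "$1" "$2") || return 1
    ln -sf "$image" "$3/vmlinuz"
}

# Hypothetical usage:
#   link_kdump_kernel /boot "$(uname -r)" /var/lib/kdump
```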

Louis Bouchard (louis) wrote :

Sorry for the formatting : the ppa is

ppa:louis-bouchard/makedumpfile-tests

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-02-04 04:14 EDT-------
raised kdump bug :

https://bugzilla.linux.ibm.com/show_bug.cgi?id=136588

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-02-05 00:30 EDT-------
(In reply to comment #21)
> Ok.. Let us close this bug as symlink issue is resolved.
> Please open a new bug to track the hang issue.
>
> Thanks
> Hari

Alright..Closing this bug.

tags: added: targetmilestone-inin1604
removed: targetmilestone-inin---
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package makedumpfile - 1:1.5.9-4

---------------
makedumpfile (1:1.5.9-4) sid; urgency=medium

  * Allow for symlinks to be created for vmlinux files : On Power8
    architecture, systems are booting from a vmlinux file. The symlink
    /var/lib/kdump/vmlinuz has to point to this file. (LP: #1536904)

  * Add functionality to create symlinks for older kernels : If kdump
    is installed on systems with more than one kernel package, the smaller
    initrd.img file will only be created for the latest kernel. Adding the
    'symlinks' functionality will allow for the creation of symlinks to
    older kernels. If the smaller initrd.img file is missing in /var/lib/kdump
    it will be created beforehand. This will be preempted if kdump is already
    loaded. (LP: #1537714)

  * Fix kdump-config manpage : add documentation for the propagate option.
    (LP: #1538148)

  * Improve manpage for kdump-config

 -- Louis Bouchard <email address hidden> Tue, 26 Jan 2016 15:30:48 +0100

Changed in makedumpfile (Ubuntu):
status: In Progress → Fix Released
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-02-10 04:02 EDT-------
*** Bug 136820 has been marked as a duplicate of this bug. ***

Louis Bouchard (louis)
Changed in makedumpfile (Ubuntu):
assignee: Louis Bouchard (louis) → nobody
Changed in makedumpfile (Ubuntu Wily):
assignee: Louis Bouchard (louis) → nobody