Ubuntu 16.04.01 kdump: IP prefix is missing in the directory name after first dump during kdump over ssh.

Bug #1599561 reported by bugproxy
This bug affects 1 person
Affects                 Status         Importance   Assigned to        Milestone
makedumpfile (Ubuntu)   Fix Released   Undecided    Taco Screen team
  Xenial                Fix Released   High         Louis Bouchard

Bug Description

[SRU justification]
HOSTTAG=ip is unusable without this fix on some platforms

[Impact]
HOSTTAG functionality works as expected.

[Fix]
Loop on hostname -I for five seconds. Revert to HOSTTAG=hostname if IP address not found.
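
A minimal sketch of the approach described in this section, not the actual kdump-config code: poll hostname -I once per second for up to five seconds, and fall back to the plain hostname when no address shows up. The function name resolve_host_tag and the IP_CMD, RETRIES and DELAY variables are hypothetical, introduced only so the loop can be exercised without a real network race.

```shell
#!/bin/sh
# Hypothetical helper: print the tag to use as the dump directory prefix.
# IP_CMD, RETRIES and DELAY are illustrative knobs, not kdump-tools settings.
resolve_host_tag() {
    ip_cmd=${IP_CMD:-"hostname -I"}
    retries=${RETRIES:-5}
    delay=${DELAY:-1}
    i=0
    while [ "$i" -lt "$retries" ]; do
        # Keep only the first address; hostname -I may list several.
        addr=$($ip_cmd 2>/dev/null | awk '{ print $1; exit }')
        if [ -n "$addr" ]; then
            printf '%s\n' "$addr"
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    # No address after all attempts: behave as if HOSTTAG=hostname.
    hostname
}
```

With the defaults the worst case adds about five seconds to the dump path, which matches the delay named in the fix; an empty prefix can no longer be produced because the fallback always prints a host name.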

[Test Case]
The race condition is difficult to run into but the bug reporter has been able to confirm the change from a PPA (see comment #5).

[Regression]
Minimal, as the code path is only enhanced by one loop. Exit path remains the same.

[Original description of the problem]

== Comment: #0 - PAVITHRA R. PRAKASH - 2016-06-30 07:12:40 ==
---Problem Description---

During kdump over ssh, the IP prefix is present only in the directory name of the first dump; subsequent dumps have no IP in the directory name.

---Steps to Reproduce---

1) apt-get install linux-crashdump
2) increase crashdump size:
sudo vim /etc/default/grub.d/kexec-tools.cfg

GRUB_CMDLINE_LINUX_DEFAULT="$GRUB_CMDLINE_LINUX_DEFAULT crashkernel=2G-4G:320M,4G-32G:512M,32G-64G:1024M,64G-128G:2048M,128G-:4096M"

3) sudo update-grub ; reboot the machine
4) sudo sed -i 's/USE_KDUMP=0/USE_KDUMP=1/g' /etc/default/kdump-tools
5) kdump-config show # should show "ready to dump"
6) ssh-keygen -t rsa
7) Edit below parameters in /etc/default/kdump-tools
SSH="root@<server IP>"
SSH_KEY=/root/.ssh/id_rsa
8) kdump-config propagate
9) kdump-config show
10) reboot
11) echo "c" > /proc/sysrq-trigger
12) verify dump is created in ssh server.
13) trigger the crash again.
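
Step 2 relies on the kernel's crashkernel=<range>:<size>,... syntax, where each range is start-[end] and the reservation is taken from the first range that contains the machine's RAM size (start inclusive, end exclusive). The sketch below shows which size the value from step 2 would select for a given amount of memory. The function names to_mib and crashkernel_for are hypothetical, and only the M and G suffixes are handled.

```shell
#!/bin/sh
# Convert a size such as 320M or 4G to MiB (only M and G are handled).
to_mib() {
    case $1 in
        *G) echo $(( ${1%G} * 1024 )) ;;
        *M) echo "${1%M}" ;;
        *)  return 1 ;;
    esac
}

# crashkernel_for SPEC RAM: print the size reserved for RAM, or nothing
# (and return 1) when no range matches.
crashkernel_for() {
    ram=$(to_mib "$2") || return 1
    old_ifs=$IFS
    IFS=,
    set -- $1            # split the spec into range:size entries
    IFS=$old_ifs
    for entry in "$@"; do
        range=${entry%%:*}
        size=${entry#*:}
        start=$(to_mib "${range%%-*}") || return 1
        end=${range#*-}  # empty for an open-ended range such as 128G-
        if [ "$ram" -ge "$start" ]; then
            if [ -z "$end" ] || [ "$ram" -lt "$(to_mib "$end")" ]; then
                echo "$size"
                return 0
            fi
        fi
    done
    return 1
}
```

For example, crashkernel_for "2G-4G:320M,4G-32G:512M,128G-:4096M" 8G selects the 4G-32G entry. A machine with less than 2G of RAM matches no range, so nothing is reserved and kdump cannot load.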

Logs
=====
root@ubuntu:/home/ubuntu# uname -a
Linux ubuntu 4.4.0-26-generic #45-Ubuntu SMP Mon Jun 20 17:27:01 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux

dumps on ssh server when 3 crashes are triggered
------------------------------------------------------------------------
[root@ltc-fire5 crash]# ls
192.168.122.25-201606300523
[root@ltc-fire5 crash]# ls
192.168.122.25-201606300523 -201606300524
[root@ltc-fire5 crash]# ls
192.168.122.25-201606300523 -201606300524 -201606300525
[root@ltc-fire5 crash]# cd -201606300524
-bash: cd: -2: invalid option
cd: usage: cd [-L|[-P [-e]]] [dir]
[root@ltc-fire5 crash]# cd -- -201606300524
[root@ltc-fire5 -201606300524]# ls
dmesg.201606300524 dump.201606300524
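
To check a dump server for affected directories, a small sketch: list the entries whose name begins with "-", which is what the missing prefix leaves behind. The function name list_unprefixed_dumps is hypothetical. Because each result is printed with the search directory in front of it, the paths can be passed to cd or ls without hitting the option-parsing error shown in the log.

```shell
#!/bin/sh
# Hypothetical helper: print dump directories under $1 (default: current
# directory) whose name starts with "-", i.e. dumps saved without a host tag.
list_unprefixed_dumps() {
    dir=${1:-.}
    find "$dir" -mindepth 1 -maxdepth 1 -type d -name '-*' | sort
}
```

Run from the crash directory on the ssh server, list_unprefixed_dumps . would print ./-201606300524 and ./-201606300525 for the listing above.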

== Comment: #8 - Kevin W. Rudd - 2016-07-05 18:23:01 ==

Canonical,

The behavior appears to be somewhat intermittent, and it looks as if define_stampdir() might be getting called before the network init has completed. If delaying this function is not practical, it might be helpful to have define_stampdir() fall back to using just "hostname" if "hostname -I" doesn't return any useful information for setting THIS_HOST.

bugproxy (bugproxy) wrote : Sosreport

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-143269 severity-high targetmilestone-inin16041
Changed in ubuntu:
assignee: nobody → Taco Screen team (taco-screen-team)
affects: ubuntu → makedumpfile (Ubuntu)
Louis Bouchard (louis)
Changed in makedumpfile (Ubuntu):
status: New → Triaged
Changed in makedumpfile (Ubuntu Xenial):
assignee: nobody → Louis Bouchard (louis-bouchard)
no longer affects: makedumpfile (Ubuntu Yakkety)
Changed in makedumpfile (Ubuntu Xenial):
status: New → Confirmed
Louis Bouchard (louis) wrote :

Hello,

While I'm not able to reproduce the behavior, I do think that the bug is valid. It also brings up another issue with how the IP address is collected. On a system with multiple IPv4 and IPv6 addresses, hostname -I would return :

192.168.88.240 10.0.4.1 192.168.30.1 192.168.10.1 192.168.122.1 192.168.11.1 10.0.3.1 10.172.64.118 fd39:6c94:2e21:a491::1 2001:67c:1562:8007::aac:4076

So while the race condition needs to be addressed, the definition method also needs to be improved.
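
One possible way to narrow such a list to a single tag, shown only as a sketch and not as the method adopted in the package: take the first token that looks like a dotted-quad IPv4 address. The function name first_ipv4 is hypothetical; it reads a whitespace-separated address list on standard input, in the form printed by hostname -I.

```shell
#!/bin/sh
# Hypothetical helper: print the first IPv4-looking token from stdin.
# The pattern only checks the dotted-quad shape, not the 0-255 range.
first_ipv4() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/) { print $i; exit }
    }'
}
```

With the list above, hostname -I | first_ipv4 would print 192.168.88.240. On an IPv6-only system it prints nothing, so a caller would still need a fallback.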

I'll do my best to look at it in a timely manner.

Kind regards,

...Louis

Louis Bouchard (louis)
Changed in makedumpfile (Ubuntu):
status: Triaged → Confirmed
Louis Bouchard (louis) wrote :

Hello,

A tentative fix for this bug is available for testing in the following PPA :

   ppa:louis-bouchard/makedumpfile-test

If you could kindly test and verify that it solves this issue on Xenial as I am unable to reproduce the issue, it would help me a great deal.

If this is successful, I will upload it to Yakkety and proceed with the SRU process.

Kind regards,

...Louis

Changed in makedumpfile (Ubuntu Xenial):
status: Confirmed → In Progress
Changed in makedumpfile (Ubuntu):
status: Confirmed → In Progress
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2016-07-12 16:11 EDT-------
Thanks Louis.

Version 1:1.5.9-5ubuntu1 seemed to do the trick in my secondary replication environment. But, since this behavior is very timing dependent (simply adding "debug" to the kdump kernel options would make the problem go away for my test), I'm also going to ask the original bug submitter to test in their environment.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2016-07-14 10:12 EDT-------
Hi Louis.

Your change looks good in both environments.

Thanks.

(In reply to comment #16)
> Hi Kevin,
>
> After the given update all the dump directories had ip prefix in directory
> name.
>
> [root@lep8a crash]# ls -lrt
> total 28
>
> Before update:
> drwxr-xr-x. 2 root root 4096 Jul 9 17:52 -201607132347
> drwxr-xr-x. 2 root root 4096 Jul 9 20:02 -201607140156
> drwxr-xr-x. 2 root root 4096 Jul 9 20:02 -201607140157
>
> After update:
> drwxr-xr-x. 2 root root 4096 Jul 9 20:09 192.168.122.62-201607140204
> drwxr-xr-x. 2 root root 4096 Jul 9 20:11 192.168.122.62-201607140205
> drwxr-xr-x. 2 root root 4096 Jul 9 20:14 192.168.122.62-201607140208
> drwxr-xr-x. 2 root root 4096 Jul 9 20:16 192.168.122.62-201607140210
>
>
> Thanks,
> Pavithra

Louis Bouchard (louis) wrote :

Hello,

Thanks for testing the fix. I will proceed with the SRU process.

Kind regards,

...Louis

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package makedumpfile - 1:1.6.0-2

---------------
makedumpfile (1:1.6.0-2) sid; urgency=medium

  * define_stampdir() : Loop on hostname -I for 5 sec to get IP address
    if HOSTTAG=ip. The network stack may not be ready when kdump-config runs.
    Give it some time before reverting HOSTTAG to hostname if an IP address
    cannot be found. (LP: #1599561)

  * debian/rules : drop the dh_installinit override
    Uses a syntax which is no longer supported and generates an error on
    install. (LP: #1599491)

 -- Louis Bouchard <email address hidden> Fri, 15 Jul 2016 17:25:33 +0200

Changed in makedumpfile (Ubuntu):
status: In Progress → Fix Released
Louis Bouchard (louis)
Changed in makedumpfile (Ubuntu Xenial):
importance: Undecided → High
description: updated
Adam Conrad (adconrad) wrote : Please test proposed package

Hello bugproxy, or anyone else affected,

Accepted makedumpfile into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/makedumpfile/1:1.5.9-5ubuntu0.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in makedumpfile (Ubuntu Xenial):
status: In Progress → Fix Committed
tags: added: verification-needed
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2016-07-22 17:14 EDT-------
Looks good in my repeat dump testing.

Thanks.

tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package makedumpfile - 1:1.5.9-5ubuntu0.1

---------------
makedumpfile (1:1.5.9-5ubuntu0.1) xenial; urgency=medium

  [ Hari Bathini <email address hidden> ]
  * Fix networked kdump failure to reach remote server.
    Avoids "Network is unreachable" message when trying to do remote dumps on
    either SSH or NFS. (LP: #1571590)

  * Replace maxcpus by nr_cpus
    nr_cpus is a hard limit that has an impact on the (kdump) kernel
    memory consumption, while it is not the case with maxcpus=1, as we can
    theoretically hotplug cpus with maxcpus=1 (LP: #1568952)

  * define_stampdir() : Loop on hostname -I for 5 sec to get IP address
    if HOSTTAG=ip. The network stack may not be ready when kdump-config runs.
    Give it some time before reverting HOSTTAG to hostname if an IP address
    cannot be found. (LP: #1599561)

  * Add cio_ignore result to /etc/default/kdump-tools on s390x
    In order to have crashkernel=128M to work correctly on the s390
    architecture the result of cio_ignore -u -k needs to be appended to the
    KDUMP_CMDLINE_APPEND variable in /etc/default/kdump-tools. This patch
    adds the required logic to do the proper modification. (LP: #1570775)

  * debian/rules : drop the dh_installinit override
    Uses a syntax which is no longer supported and generates an error on
    install. (LP: #1599491)

 -- Louis Bouchard <email address hidden> Fri, 22 Jul 2016 10:15:20 +0200

Changed in makedumpfile (Ubuntu Xenial):
status: Fix Committed → Fix Released
Brian Murray (brian-murray) wrote : Update Released

The verification of the Stable Release Update for makedumpfile has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
