Cannot collect dump due to "Can't get a valid pmd_pte" error

Bug #1857616 reported by Guilherme G. Piccoli
This bug affects 1 person
Affects                  Status        Importance  Assigned to
makedumpfile (Ubuntu)    Fix Released  High        Guilherme G. Piccoli
  Xenial                 Opinion       Low         Guilherme G. Piccoli
  Bionic                 Fix Released  High        Guilherme G. Piccoli
  Disco                  Invalid       Undecided   Unassigned
  Eoan                   Fix Released  High        Guilherme G. Piccoli
  Focal                  Fix Released  High        Guilherme G. Piccoli

Bug Description

[Impact]

* Currently, makedumpfile has two flaws: (a) it is out of sync with kernel code, and (b) it mishandles the KASLR offset. The first concerns the definition of a memory section flag bit, whereas the second concerns the KASLR offset calculation; both cause similar failures when collecting the vmcore in the kdump environment:

Excluding unnecessary pages : [ 46.3 %] / __vtop4_x86_64[ 39.341233]: Can't get a valid pmd_pte.
readmem: Can't convert a virtual address(ffffe05cb4000000) to physical address.
readmem: type_addr: 0, addr:ffffe05cb4000000, size:32768
__exclude_unnecessary_pages: Can't read the buffer of struct page.
create_2nd_bitmap: Can't exclude unnecessary pages.

* The report is mainly about the first issue, which started to happen after kernel commit 326e1b8f83a4 ("mm/sparsemem: introduce a SECTION_IS_EARLY flag") was merged in kernel 5.3. That commit added a new memory section flag bit, so makedumpfile's definition of the flag bits fell out of sync with the kernel and dump collection failed. The fix is available in makedumpfile as commit 7bdb468c2c ("Increase SECTION_MAP_LAST_BIT to 4"). It is hereby SRUed to Bionic (because of the 5.3 HWE kernel), Eoan, and Focal.

* The other issue is already fixed in both Eoan and Focal, whose makedumpfile packages are based on version 1.6.6. It is related to the KASLR offset: if the offset is small enough, get_kaslr_offset_x86_64() may wrongly return 0, causing vmcore collection to fail or, even worse, unintended data to be erased from the memory dump. This is fixed by makedumpfile commit 3222d4ad04 ("x86_64: fix get_kaslr_offset_x86_64() to return kaslr_offset correctly"), which is not present in versions before 1.6.6. We hereby SRU this fix to Bionic.

* Note that this change is being worked on concurrently with other kdump-tools changes in LP #1828596.
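
The two flaws above can be modeled in a few lines of Python. This is a hedged sketch, not makedumpfile code. The section flag values follow the kernel's include/linux/mmzone.h as of 5.3. The shape of the KASLR lookup is an assumption: a range check on an unrelocated vmlinux address whose upper bound is __START_KERNEL_map + kaslr_offset, "fixed" by an offset-independent 1 GiB window. It was chosen only because it reproduces the "small offset returns 0" symptom described in this report. The helper names and sample addresses are hypothetical.

```python
MASK64 = (1 << 64) - 1

# Flaw (a): flag bits kept in the low end of mem_section.section_mem_map,
# as defined by the kernel's include/linux/mmzone.h from 5.3 on.
SECTION_MARKED_PRESENT = 1 << 0
SECTION_HAS_MEM_MAP = 1 << 1
SECTION_IS_ONLINE = 1 << 2
SECTION_IS_EARLY = 1 << 3  # added by kernel commit 326e1b8f83a4


def section_map_mask(last_bit: int) -> int:
    """SECTION_MAP_MASK = ~(SECTION_MAP_LAST_BIT - 1), truncated to 64 bits."""
    return ~(last_bit - 1) & MASK64


OLD_MASK = section_map_mask(1 << 3)  # makedumpfile before commit 7bdb468c2c
NEW_MASK = section_map_mask(1 << 4)  # after "Increase SECTION_MAP_LAST_BIT to 4"


def decode_mem_map(section_mem_map: int, mask: int) -> int:
    """Strip the flag bits, leaving the encoded mem_map address."""
    return section_mem_map & mask


# Flaw (b): KASLR offset lookup (toy model, see ASSUMPTION notes).
START_KERNEL_MAP = 0xFFFF_FFFF_8000_0000  # x86_64 __START_KERNEL_map
KERNEL_IMAGE_SIZE = 1 << 30  # ASSUMPTION: 1 GiB window for the kernel image


def kaslr_offset_old(vaddr: int, kaslr_offset: int) -> int:
    # ASSUMPTION: the upper bound grows with the offset itself, so a small
    # offset excludes unrelocated kernel-text addresses and 0 is returned.
    if START_KERNEL_MAP <= vaddr < START_KERNEL_MAP + kaslr_offset:
        return kaslr_offset
    return 0


def kaslr_offset_new(vaddr: int, kaslr_offset: int) -> int:
    # ASSUMPTION: a fixed, offset-independent window covering the image.
    if START_KERNEL_MAP <= vaddr < START_KERNEL_MAP + KERNEL_IMAGE_SIZE:
        return kaslr_offset
    return 0


if __name__ == "__main__":
    # Hypothetical section entry: an aligned address plus all four flags.
    raw = 0xFFFF_EA00_0000_0000 | 0xF
    print(hex(decode_mem_map(raw, OLD_MASK)))  # 0xffffea0000000008: bit 3 leaks
    print(hex(decode_mem_map(raw, NEW_MASK)))  # 0xffffea0000000000

    # Hypothetical unrelocated symbol address and a small KASLR offset.
    sym = START_KERNEL_MAP + 0x1A0_0000
    print(hex(kaslr_offset_old(sym, 0xE0_0000)))  # 0x0: offset lost
    print(hex(kaslr_offset_new(sym, 0xE0_0000)))  # 0xe00000
```

With the stale mask, SECTION_IS_EARLY survives decoding and the mem_map address is off by 8 bytes. That is plausibly how the later struct page reads go wrong, although the exact path to the pmd_pte error is not analyzed here.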

[Test Case]

1) Deploy an Eoan VM e.g. with uvt-kvm;
2) Set up console output in the guest;
3) Install the kdump-tools package;
4) Configure kdump and collect a dump (using sysrq to panic the system); the error described above is observed because the Eoan kernel is 5.3-based.
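
When step 4 is repeated for each candidate package, a small helper can scan the captured guest console output for the failure signature quoted in [Impact]. This is a hypothetical convenience script, not part of kdump-tools: the function name is made up, and only the matched message text is taken from this report.

```python
import re

# Failure signatures taken from the console output quoted in [Impact].
FAILURE_PATTERNS = (
    # A kernel timestamp may be interleaved, e.g. "__vtop4_x86_64[ 39.34]:".
    re.compile(r"__vtop4_x86_64.*Can't get a valid pmd_pte"),
    re.compile(r"Can't exclude unnecessary pages"),
)


def shows_pmd_pte_failure(console_log: str) -> bool:
    """Return True if any known failure signature appears in the log."""
    return any(p.search(console_log) for p in FAILURE_PATTERNS)


if __name__ == "__main__":
    failing = (
        "Excluding unnecessary pages : [ 46.3 %] / "
        "__vtop4_x86_64[ 39.341233]: Can't get a valid pmd_pte.\n"
        "create_2nd_bitmap: Can't exclude unnecessary pages.\n"
    )
    print(shows_pmd_pte_failure(failing))  # True
    print(shows_pmd_pte_failure("Excluding unnecessary pages : [100.0 %]"))  # False
```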

[Regression Potential]

* The proposed modifications are minimal and constrained to makedumpfile on x86_64; both are merged in upstream makedumpfile, and one of them is already released in Eoan/Focal. An unlikely regression could cause vmcore collection to fail in the kdump environment.

Revision history for this message
Guilherme G. Piccoli (gpiccoli) wrote :

This proved not to be the case; in fact, the issue was caused by a kernel change that redefined a flag bit, which made the vmcore collection fail.

The culprit was kernel commit 326e1b8f83a4 ("mm/sparsemem: introduce a SECTION_IS_EARLY flag"), introduced in kernel 5.3.
The makedumpfile fix for this comes in commit 7bdb468c2c ("Increase SECTION_MAP_LAST_BIT to 4").

The PPA launchpad.net/~gpiccoli/+archive/ubuntu/lp1857616 contains a build with this fix for testing purposes.
Cheers,

Guilherme

Changed in makedumpfile (Ubuntu Xenial):
status: New → Invalid
Changed in makedumpfile (Ubuntu Bionic):
status: New → Confirmed
Changed in makedumpfile (Ubuntu Disco):
status: New → Invalid
Changed in makedumpfile (Ubuntu Eoan):
status: New → Confirmed
importance: Undecided → High
Changed in makedumpfile (Ubuntu Disco):
importance: Undecided → Low
Changed in makedumpfile (Ubuntu Xenial):
importance: Undecided → Low
Changed in makedumpfile (Ubuntu Bionic):
importance: Undecided → High
Changed in makedumpfile (Ubuntu Eoan):
assignee: nobody → Guilherme G. Piccoli (gpiccoli)
Changed in makedumpfile (Ubuntu Disco):
assignee: nobody → Guilherme G. Piccoli (gpiccoli)
Changed in makedumpfile (Ubuntu Bionic):
assignee: nobody → Guilherme G. Piccoli (gpiccoli)
Changed in makedumpfile (Ubuntu Xenial):
assignee: nobody → Guilherme G. Piccoli (gpiccoli)
Revision history for this message
Guilherme G. Piccoli (gpiccoli) wrote :

We still need to discuss whether including each fix in Xenial is feasible or needed.

description: updated
Changed in makedumpfile (Ubuntu Xenial):
status: Invalid → Opinion
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package makedumpfile - 1:1.6.6-4ubuntu2

---------------
makedumpfile (1:1.6.6-4ubuntu2) focal; urgency=medium

  * x86_64: Fix an error due to makedumpfile being out-of-sync with recent
    kernels. To achieve that, add the following patch: "Increase
    SECTION_MAP_LAST_BIT to 4". (LP: #1857616)

 -- <email address hidden> (Guilherme G. Piccoli) Fri, 03 Jan 2020 16:57:00 -0300

Changed in makedumpfile (Ubuntu Focal):
status: Confirmed → Fix Released
Revision history for this message
Guilherme G. Piccoli (gpiccoli) wrote :

For this LP SRU submission, the following candidate packages were tested on amd64 running kernel 5.3:
* Bionic, candidate version 1.6.5-1ubuntu1~18.04.4;
* Eoan, candidate version 1.6.6-2ubuntu2;

The test consisted of performing a kdump collection with the package currently in -updates; after it failed with the error message "Can't get a valid pmd_pte", we used the candidate version and observed the kdump complete successfully.

Cheers,

Guilherme

Changed in makedumpfile (Ubuntu Disco):
importance: Low → Undecided
assignee: Guilherme G. Piccoli (gpiccoli) → nobody
Revision history for this message
Andy Whitcroft (apw) wrote : Please test proposed package

Hello Guilherme, or anyone else affected,

Accepted makedumpfile into eoan-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/makedumpfile/1:1.6.6-2ubuntu2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will help us get this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-eoan to verification-done-eoan. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-eoan. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in makedumpfile (Ubuntu Eoan):
status: Confirmed → Fix Committed
tags: added: verification-needed verification-needed-eoan
Revision history for this message
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (makedumpfile/1:1.6.6-2ubuntu2)

All autopkgtests for the newly accepted makedumpfile (1:1.6.6-2ubuntu2) for eoan have finished running.
The following regressions have been reported in tests triggered by the package:

makedumpfile/1:1.6.6-2ubuntu2 (i386, ppc64el)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/eoan/update_excuses.html#makedumpfile

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Revision history for this message
Andy Whitcroft (apw) wrote : Please test proposed package

Hello Guilherme, or anyone else affected,

Accepted makedumpfile into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/makedumpfile/1:1.6.5-1ubuntu1~18.04.4 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will help us get this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in makedumpfile (Ubuntu Bionic):
status: Confirmed → Fix Committed
tags: added: verification-needed-bionic
Revision history for this message
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (makedumpfile/1:1.6.5-1ubuntu1~18.04.4)

All autopkgtests for the newly accepted makedumpfile (1:1.6.5-1ubuntu1~18.04.4) for bionic have finished running.
The following regressions have been reported in tests triggered by the package:

makedumpfile/1:1.6.5-1ubuntu1~18.04.4 (ppc64el, s390x)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/bionic/update_excuses.html#makedumpfile

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Revision history for this message
Guilherme G. Piccoli (gpiccoli) wrote :

I was able to verify the Eoan -proposed version (1:1.6.6-2ubuntu2). The test was to attempt a kdump using the makedumpfile from eoan-release while running kernel 5.3.0-26-generic; as expected, it failed with the error message "__vtop4_x86_64: Can't get a valid pmd_pte." and fell back to cp.
When using the -proposed version, it works as expected.
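
The fallback to cp mentioned in this verification can be modeled as follows. This is a simplified, hypothetical Python sketch of the kdump-tools behavior (the real logic lives in its kdump-config shell script); the function name save_core, the destination path, and the injectable run callback are illustrative assumptions. The -c and -d options are real makedumpfile flags (compression and dump level).

```python
import subprocess
from typing import Callable, List, Optional, Sequence


def _run(cmd: List[str]) -> int:
    """Run a command and return its exit status."""
    return subprocess.call(cmd)


def save_core(vmcore: str, dest: str, makedumpfile_args: Sequence[str],
              run: Callable[[List[str]], int] = _run) -> Optional[str]:
    """Return the tool that saved the dump, or None if both attempts failed."""
    if run(["makedumpfile", *makedumpfile_args, vmcore, dest]) == 0:
        return "makedumpfile"
    # makedumpfile failed (e.g. "Can't get a valid pmd_pte"), so copy the
    # whole, unfiltered vmcore instead.
    if run(["cp", vmcore, dest]) == 0:
        return "cp"
    return None


if __name__ == "__main__":
    calls = []

    def fake_run(cmd: List[str]) -> int:
        calls.append(cmd)
        return 1 if cmd[0] == "makedumpfile" else 0  # simulate the 5.3 failure

    print(save_core("/proc/vmcore", "/var/crash/dump", ["-c", "-d", "31"], fake_run))  # cp
```

Because cp copies /proc/vmcore without any page filtering, the fallback dump is typically much larger than a filtered makedumpfile dump.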

Cheers,

Guilherme

tags: added: verification-done-eoan
removed: verification-needed-eoan
Revision history for this message
Tushar Dave (tdavenvidia) wrote :

I tested kdump functionality using makedumpfile and kdump-tools version '1:1.6.5-1ubuntu1~18.04.4' from -proposed on the 5.3.0-24 kernel, and it works.

tags: added: verification-done verification-done-bionic
removed: verification-needed verification-needed-bionic
Revision history for this message
Chris Halse Rogers (raof) wrote : Update Released

The verification of the Stable Release Update for makedumpfile has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package makedumpfile - 1:1.6.6-2ubuntu2

---------------
makedumpfile (1:1.6.6-2ubuntu2) eoan; urgency=medium

  [ Thadeu Lima de Souza Cascardo ]
  * Fixes for DLPAR cpu add operation (LP: #1828596)
    - d/kdump-config.in: Add a reload command.
    - d/kdump-config.in: implement try-reload.
    - d/50-kdump-tools.rules: Use kdump-config reload after cpu or memory hotplug
    - d/50-kdump-tools.rules: use try-reload instead.
  * d/rules: Use reset_devices as a cmdline parameter. (LP: #1800566)

  [ Guilherme G. Piccoli ]
  * d/kdump-tools-dump.service: Add a systemd-resolved service dependency
    in order to make kdump-tool able to resolve DNS when in kdump boot.
    (LP: #1856323)
  * d/p/0003-Increase-SECTION_MAP_LAST_BIT-to-4.patch: x86_64: Fix an error due
    to makedumpfile being out-of-sync with recent kernels. (LP: #1857616)

 -- <email address hidden> (Guilherme G. Piccoli) Fri, 03 Jan 2020 16:10:19 -0300

Changed in makedumpfile (Ubuntu Eoan):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package makedumpfile - 1:1.6.5-1ubuntu1~18.04.4

---------------
makedumpfile (1:1.6.5-1ubuntu1~18.04.4) bionic; urgency=medium

  [ Thadeu Lima de Souza Cascardo ]
  * Fixes for DLPAR cpu add operation (LP: #1828596)
    - d/kdump-config.in: Add a reload command.
    - d/kdump-config.in: implement try-reload.
    - d/50-kdump-tools.rules: Use kdump-config reload after cpu or memory hotplug
    - d/50-kdump-tools.rules: use try-reload instead.
  * d/rules: Use reset_devices as a cmdline parameter. (LP: #1800566)

  [ Guilherme G. Piccoli ]
  * d/kdump-tools-dump.service: Add a systemd-resolved service dependency
    in order to make kdump-tool able to resolve DNS when in kdump boot.
    (LP: #1856323)
  * Fix an error due to makedumpfile being out-of-sync with recent kernels.
    (LP: #1857616)
    - d/p/0004-x86_64-fix-get_kaslr_offset_x86_64-to-return-kaslr_offset-correctly.patch
    - d/p/0005-Increase-SECTION_MAP_LAST_BIT-to-4.patch

 -- <email address hidden> (Guilherme G. Piccoli) Fri, 03 Jan 2020 13:14:39 -0300

Changed in makedumpfile (Ubuntu Bionic):
status: Fix Committed → Fix Released