HP ProLiant m400 nic doesn't work after trusty

Bug #1386490 reported by dann frazier
This bug affects 1 person
Affects                     Status        Importance  Assigned to   Milestone
debian-installer (Ubuntu)   Fix Released  High        Unassigned
  Utopic                    Fix Released  High        Unassigned
  Vivid                     Fix Released  High        Unassigned
linux (Ubuntu)              Fix Released  High        Craig Magina
  Utopic                    Fix Released  High        Ming Lei
  Vivid                     Fix Released  High        Craig Magina

Bug Description

Starting in 3.15, arm64 began defaulting to non-coherent dma_ops:

commit c7a4a7658d689f664050c45493d79adf053f226e
Author: Ritesh Harjani <email address hidden>
Date: Wed Apr 23 06:29:46 2014 +0100

    arm64: Make default dma_ops to be noncoherent

Firmware (the dtb, in the case of the m400) is responsible for telling the kernel when a device requires coherent dma_ops. However, as of utopic, this property is not inherited by downstream devices. Specifically, the xgene-pcie device is marked as coherent, but the devices behind it (the Mellanox card) are still initialized with non-coherent ops.

This results in the mlx4 driver bailing out with the following messages:
[ 18.703635] mlx4_core 0000:01:00.0: command 0x23 timed out (go bit not cleared)
[ 18.710911] mlx4_core 0000:01:00.0: Failed to initialize queue pair table, aborting
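
The inheritance gap described above can be illustrated with a small model. This is only a sketch and not kernel code: the `Node` class, the node names, and both lookup functions are hypothetical, and the property is assumed to be the standard device-tree `dma-coherent` flag. The only facts taken from this report are that the xgene-pcie device is marked coherent and that the Mellanox device behind it carries no marking of its own.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for a device-tree node or a PCI device behind one."""
    name: str
    parent: Optional["Node"] = None
    properties: set = field(default_factory=set)


def coherent_own_node_only(node: Node) -> bool:
    # Models the reported behaviour: only the device's own node is consulted.
    return "dma-coherent" in node.properties


def coherent_inherited(node: Node) -> bool:
    # Models the expected behaviour: walk up towards the root and treat the
    # device as coherent if it or any ancestor carries the property.
    current = node
    while current is not None:
        if "dma-coherent" in current.properties:
            return True
        current = current.parent
    return False


# Illustrative topology: host bridge marked coherent, NIC behind it unmarked.
root = Node("soc")
pcie = Node("xgene-pcie", parent=root, properties={"dma-coherent"})
nic = Node("mlx4 0000:01:00.0", parent=pcie)

for lookup in (coherent_own_node_only, coherent_inherited):
    ops = "coherent" if lookup(nic) else "non-coherent"
    print(f"{lookup.__name__}: {nic.name} gets {ops} dma_ops")
```

In this model the first lookup hands the NIC non-coherent ops even though its host bridge is coherent, which is the mismatch that leads to the mlx4 timeouts.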

There's an upstream discussion on the topic here:
  http://www.spinics.net/lists/arm-kernel/msg362320.html

dann frazier (dannf)
Changed in linux (Ubuntu Utopic):
status: New → Confirmed
importance: Undecided → High
tags: added: kernel-da-key utopic vivid
Ming Lei (tom-leiming) wrote :

I have sent a patch upstream to fix the problem:

    http://marc.info/?t=141706718400005&r=1&w=2

Let's see how the discussion goes.

Ming Lei (tom-leiming) wrote :

From ARM64 maintainer's viewpoint:

         http://marc.info/?l=linux-arm-kernel&m=141708838404470&w=2

         Either way, I don't think it's a problem for the kernel. We just need to
         change the default DMA ops to coherent when booting with ACPI (using
         non-coherent ops for a coherent device is not safe as the CPU can
         corrupt cache lines written by the device).

So I suggest reverting c7a4a7658d689f6 for utopic, since utopic ships APM's
non-upstreamed PCI implementation and APM's ARM64 SoC is a coherent
architecture.

Dann, what do you think about it?

Thanks,

Changed in linux (Ubuntu Utopic):
assignee: nobody → Ming Lei (tom-leiming)
dann frazier (dannf) wrote : Re: [Bug 1386490] Re: HP ProLiant m400 nic doesn't work after trusty

On Thu, Nov 27, 2014 at 8:17 PM, Ming Lei <email address hidden> wrote:
> From ARM64 maintainer's viewpoint:
>
> http://marc.info/?l=linux-arm-kernel&m=141708838404470&w=2
>
> Either way, I don't think it's a problem for the kernel. We just need to
> change the default DMA ops to coherent when booting with ACPI (using
> non-coherent ops for a coherent device is not safe as the CPU can
> corrupt cache lines written by the device).
>
> So I suggest to revert c7a4a7658d689f6 for utopic since utopic ships APM's
> non-upstreamed PCI implementation, and APM's ARM64 Soc is coherent
> arch.
>
> Dann, what do you think about it?

Ming,
 Thanks for looking into this.
I'm not sure I follow the relevancy of a couple things:

 - The upstream discussion seems to have gotten rerouted to solving
the problem for ACPI, but this issue is regarding device-tree (as was
your patch), since that is what Ubuntu/m400 uses. I'm having trouble
seeing how the ACPI solution helps us.

 - I'm confident I reproduced this problem with all upstream bits
after PCI was merged. Therefore my assumption was that this issue is
independent of PCI stack choice. Do you believe that the solution for
the upstream PCI stack is different than the solution for the APM PCI
stack?

In general I'm OK with a revert of c7a4a7658d689f6 for utopic, as long
as we have a plan for an upstream answer for vivid and beyond (>=
3.19). I don't think the kernel team would support a plan that
involves carrying this revert indefinitely (if that's what you're
suggesting, I'm not sure it is).

  -dann

Ming Lei (tom-leiming) wrote :

On Tue, Dec 2, 2014 at 2:49 AM, dann frazier <email address hidden> wrote:
> On Thu, Nov 27, 2014 at 8:17 PM, Ming Lei <email address hidden> wrote:
>> From ARM64 maintainer's viewpoint:
>>
>> http://marc.info/?l=linux-arm-kernel&m=141708838404470&w=2
>>
>> Either way, I don't think it's a problem for the kernel. We just need to
>> change the default DMA ops to coherent when booting with ACPI (using
>> non-coherent ops for a coherent device is not safe as the CPU can
>> corrupt cache lines written by the device).
>>
>> So I suggest to revert c7a4a7658d689f6 for utopic since utopic ships APM's
>> non-upstreamed PCI implementation, and APM's ARM64 Soc is coherent
>> arch.
>>
>> Dann, what do you think about it?
>
> Ming,
> Thanks for looking into this.
> I'm not sure I follow the relevancy of a couple things:
>
> - The upstream discussion seems to have gotten rerouted to solving
> the problem for ACPI, but this issue is regarding device-tree (as was
> your patch), since that is what Ubuntu/m400 uses. I'm having trouble
> seeing how the ACPI solution helps us.

Firstly, I understand that upstream prefers ACPI for arm64 servers.

For a DT-based solution, my patch or a similar fix is needed, but the
upstream community thinks that handling of DMA coherency (along with
irq, dma mask, iommu, ...) should first be moved into the kernel's
drivers/pci. That work needs cooperation between the arm64 and PCI
communities, so it might not be merged soon.

According to the discussion, Red Hat also ships a similar patch in
their internal tree.

>
> - I'm confident I reproduced this problem with all upstream bits
> after PCI was merged. Therefore my assumption was that this issue is
> independent of PCI stack choice. Do you believe that the solution for
> the upstream PCI stack is different than the solution for the APM PCI
> stack?

The root cause is very clear: on ARM64, PCI devices cannot inherit
the DMA-coherent attribute, with either the upstream or the APM PCI
stack.

The patch I posted can't be applied to the APM PCI stack, since its
implementation is very different from upstream, so reverting
c7a4a7658d is easier for utopic.

>
> In general I'm OK with a revert of c7a4a7658d689f6 for utopic, as long
> as we have a plan for an upstream answer for vivid and beyond (>=
> 3.19). I don't think the kernel team would support a plan that
> involves carrying this revert indefinitely (if that's what you're
> suggesting, I'm not sure it is).

It depends on upstream, as I described above. :-)

Thanks,

Brad Figg (brad-figg)
Changed in linux (Ubuntu Utopic):
status: Confirmed → Fix Committed
tags: added: arm-hs-vivid
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-utopic' to 'verification-done-utopic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-utopic
Ming Lei (tom-leiming) wrote :

ubuntu@ms10-42-mcdivittA3:~$ uname -a
Linux ms10-42-mcdivittA3 3.16.0-29-generic #39-Ubuntu SMP Mon Dec 15 22:31:48 UTC 2014 aarch64 aarch64 aarch64 GNU/Linux
ubuntu@ms10-42-mcdivittA3:~$ ping kernel.ubuntu.com
PING kernel.ubuntu.com (91.189.94.216) 56(84) bytes of data.
64 bytes from zinc.canonical.com (91.189.94.216): icmp_seq=1 ttl=58 time=80.4 ms
64 bytes from zinc.canonical.com (91.189.94.216): icmp_seq=2 ttl=58 time=80.4 ms
64 bytes from zinc.canonical.com (91.189.94.216): icmp_seq=3 ttl=58 time=80.5 ms
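
In addition to the ping test, the kernel log could be checked for the two failure messages quoted in the bug description. The following is a minimal sketch, assuming the log text has already been captured (for example from `dmesg`); the function name and the regular expressions are illustrative and derived only from those two messages.

```python
import re

# Failure signatures taken from the two mlx4_core lines in the bug description.
FAILURE_PATTERNS = [
    re.compile(r"mlx4_core .*command 0x[0-9a-f]+ timed out"),
    re.compile(r"mlx4_core .*Failed to initialize queue pair table"),
]


def find_mlx4_failures(log_text: str) -> list:
    """Return the log lines that match any known failure signature."""
    return [
        line
        for line in log_text.splitlines()
        if any(pattern.search(line) for pattern in FAILURE_PATTERNS)
    ]


# The first two lines are copied from the report; the last one is a made-up
# placeholder standing in for unrelated log output.
sample = """\
[ 18.703635] mlx4_core 0000:01:00.0: command 0x23 timed out (go bit not cleared)
[ 18.710911] mlx4_core 0000:01:00.0: Failed to initialize queue pair table, aborting
placeholder: unrelated log line
"""

for line in find_mlx4_failures(sample):
    print("failure signature:", line)
```

An empty result on a kernel from -proposed would be consistent with the fix, though it does not replace an actual network test such as the ping above.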

tags: added: verification-done-utopic
removed: verification-needed-utopic
Adam Conrad (adconrad) wrote : Please test proposed package

Hello dann, or anyone else affected,

Accepted debian-installer into utopic-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/debian-installer/20101020ubuntu352.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in debian-installer (Ubuntu Utopic):
status: New → Fix Committed
tags: added: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.16.0-29.39

---------------
linux (3.16.0-29.39) utopic; urgency=low

  [ Kamal Mostafa ]

  * Release Tracking Bug
    - LP: #1402822

  [ AceLan Kao ]

  * SAUCE: Add use_native_backlight quirk for HP ProBook 6570b
    - LP: #1359010

  [ Andy Whitcroft ]

  * Revert "SAUCE: (no-up) arm64: optimized copy_to_user and copy_from_user
    assembly code"
    - LP: #1398596
  * [Config] updateconfigs to balance CONFIG_SCOM_DEBUGFS

  [ Paolo Pisati ]

  * [Config] armhf: VIRTIO_[BALLOON|MMIO]=y

  [ Upstream Kernel Changes ]

  * Revert "arm64: Make default dma_ops to be noncoherent"
    - LP: #1386490
  * Revert "percpu: free percpu allocation info for uniprocessor system"
    - LP: #1401079
  * ath3k: Add support of MCI 13d3:3408 bt device
    - LP: #1395465
  * x86: kvm: use alternatives for VMCALL vs. VMMCALL if kernel text is
    read-only
    - LP: #1379340
  * cpufreq: Allow stop CPU callback to be used by all cpufreq drivers
    - LP: #1397928
  * cpufreq: powernv: Set the pstate of the last hotplugged out cpu in
    policy->cpus to minimum
    - LP: #1397928
  * cpufreq: powernv: Set the cpus to nominal frequency during reboot/kexec
    - LP: #1397928
  * xen-netfront: Remove BUGs on paged skb data which crosses a page
    boundary
    - LP: #1275879
  * ACPI / blacklist: blacklist Win8 OSI for Dell Vostro 3546
    - LP: #1383589
  * iwlwifi: add device / firmware to fw-error-dump file
    - LP: #1399440
  * iwlwifi: rename iwl_mvm_fw_error_next_data
    - LP: #1399440
  * iwlwifi: pcie: add firmware monitor capabilities
    - LP: #1399440
  * iwlwifi: remove wrong comment about alignment in iwl-fw-error-dump.h
    - LP: #1399440
  * iwlwifi: mvm: don't collect logs in the interrupt thread
    - LP: #1399440
  * iwlwifi: mvm: kill iwl_mvm_fw_error_rxf_dump
    - LP: #1399440
  * iwlwifi: mvm: update layout of firmware error dump
    - LP: #1399440
  * powerpc/pseries: Fix endiannes issue in RTAS call from xmon
    - LP: #1396235
  * mmc: sdhci-pci-o2micro: Fix Dell E5440 issue
    - LP: #1346067
  * mfd: rtsx: Fix PM suspend for 5227 & 5249
    - LP: #1359052
  * samsung-laptop: Add broken-acpi-video quirk for NC210/NC110
    - LP: #1401079
  * acer-wmi: Add acpi_backlight=video quirk for the Acer KAV80
    - LP: #1401079
  * pinctrl: baytrail: show output gpio state correctly on Intel Baytrail
    - LP: #1401079
  * ALSA: hda - Add dock support for Thinkpad T440 (17aa:2212)
    - LP: #1401079
  * ALSA: hda - Add ultra dock support for Thinkpad X240.
    - LP: #1401079
  * rbd: Fix error recovery in rbd_obj_read_sync()
    - LP: #1401079
  * ds3000: fix LNB supply voltage on Tevii S480 on initialization
    - LP: #1401079
  * powerpc: do_notify_resume can be called with bad thread_info flags
    argument
    - LP: #1401079
  * powerpc/powernv: Properly fix LPC debugfs endianness
    - LP: #1401079
  * irqchip: armada-370-xp: Fix MSI interrupt handling
    - LP: #1401079
  * irqchip: armada-370-xp: Fix MPIC interrupt handling
    - LP: #1401079
  * USB: kobil_sct: fix non-atomic allocation in write path
    - LP: #1401079
  * USB: opticon: fix non-atomic allocation in write path
    - LP: #14010...

Changed in linux (Ubuntu Utopic):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package debian-installer - 20101020ubuntu352.1

---------------
debian-installer (20101020ubuntu352.1) utopic; urgency=medium

  * Move master kernels to 3.16.0-29 (LP: #1386490)
 -- Adam Conrad <email address hidden> Wed, 07 Jan 2015 11:39:23 -0700

Changed in debian-installer (Ubuntu Utopic):
status: Fix Committed → Fix Released
Andrew Cloke (andrew-cloke) wrote :

Quick history summary and status update: Ming originally proposed a fix for upstream, but this was rejected as upstream wanted a more complete, larger fix that included some PCI changes. Such a change has now been proposed to upstream, is being discussed and is nearing acceptance.

Until the upstream fix is accepted and pulled into an Ubuntu release, we will continue to attempt to carry the reversion in the generic Ubuntu kernel.

This bug is now tracking the propagation of the reversion, and the successful upstream submission of a fix for this issue.

Mathew Hodson (mhodson) wrote :

debian-installer was moved to utopic-updates so removing tag.

tags: removed: verification-needed
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in debian-installer (Ubuntu):
status: New → Confirmed
Samir Methia (smethia)
tags: added: mcdivitt-fwd-comp
Raghuram Kota (rkota)
Changed in santacruz:
milestone: none → 15.04-required
Samir Methia (smethia)
Changed in lomond:
assignee: nobody → Craig Magina (craig.magina)
Changed in santacruz:
assignee: nobody → Craig Magina (craig.magina)
Changed in linux (Ubuntu Vivid):
assignee: nobody → Craig Magina (craig.magina)
Samir Methia (smethia)
no longer affects: lomond
Changed in debian-installer (Ubuntu Vivid):
importance: Undecided → High
Changed in debian-installer (Ubuntu Utopic):
importance: Undecided → High
Raghuram Kota (rkota)
no longer affects: santacruz
Mathieu Trudel-Lapierre (cyphermox) wrote :

I see it was fixed in utopic for d-i already; are we good on vivid too?

Craig Magina (craig.magina) wrote :

Upstream has merged a patch set into the iommu topic tree that solves this issue for all platforms. I am currently testing that patch set with the vivid 3.19 kernel.

Brian Murray (brian-murray) wrote :

What is the status of your testing, Craig?

Craig Magina (craig.magina) wrote :

Here is the patch set that fixes this issue:
https://git.kernel.org/cgit/linux/kernel/git/helgaas/pci.git/log/?h=pci/iommu&showmsg=1

I have tested the patches and they work when applied to the 3.19.0-10.10 kernel. The set of patches is from HEAD to pci/iommu.

Changed in debian-installer (Ubuntu Vivid):
status: Confirmed → Fix Committed
Changed in linux (Ubuntu Vivid):
status: Confirmed → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.19.0-12.12

---------------
linux (3.19.0-12.12) vivid; urgency=low

  [ Andy Whitcroft ]

  * [Packaging] do_common_tools should always be on
  * [Packaging] Provides: virtualbox-guest-modules when appropriate
    - LP: #1434579

  [ Chris J Arges ]

  * Revert "SAUCE: ext4: disable ext4_punch_hole for indirect filesystems"
    - LP: #1292234

  [ Leann Ogasawara ]

  * Release Tracking Bug
    - LP: #1439803

  [ Timo Aaltonen ]

  * SAUCE: i915_bpo: Provide a backport driver for Skylake & Cherryview
    graphics
    - LP: #1420774
  * SAUCE: i915_bpo: Update intel_ips.h file location
    - LP: #1420774
  * SAUCE: i915_bpo: Only support Skylake and Cherryview with the backport
    driver
    - LP: #1420774
  * SAUCE: i915_bpo: Rename the backport driver to i915_bpo
    - LP: #1420774
  * i915_bpo: [Config] Enable CONFIG_DRM_I915_BPO=m
    - LP: #1420774
  * SAUCE: i915_bpo: Add i915_bpo_*() calls for ubuntu/i915
    - LP: #1420774
  * SAUCE: i915_bpo: Revert "drm/i915: remove unused
    power_well/get_cdclk_freq api"
    - LP: #1420774
  * SAUCE: i915_bpo: Add i915_bpo specific power well calls
    - LP: #1420774
  * SAUCE: Backport I915_PARAM_MMAP_VERSION and I915_MMAP_WC
    - LP: #1420774
  * SAUCE: Partial backport of drm/i915: Add ioctl to set per-context
    parameters
    - LP: #1420774
  * SAUCE: drm/i915: Specify bsd rings through exec flag
    - LP: #1420774
  * SAUCE: drm/i915: add I915_PARAM_HAS_BSD2 to i915_getparam
    - LP: #1420774
  * SAUCE: drm/i915: add component support
    - LP: #1420774
  * SAUCE: drm/i915: Add tiled framebuffer modifiers
    - LP: #1420774
  * SAUCE: Backport new displayable tiling formats
    - LP: #1420774
  * SAUCE: Backport drm_crtc_vblank_reset() helper
    - LP: #1420774
  * SAUCE: drm/i915: Add I915_PARAM_REVISION
    - LP: #1420774
  * SAUCE: drm/i915: Export total subslice and EU counts
    - LP: #1420774
  * SAUCE: i915_bpo: Revert drm/mm: Support 4 GiB and larger ranges
    - LP: #1420774

  [ Upstream Kernel Changes ]

  * drm/i915/skl: Split the SKL PCI ids by GT
    - LP: #1420774
  * drm: Reorganize probed mode validation
    - LP: #1420774
  * drm: Perform basic sanity checks on probed modes
    - LP: #1420774
  * drm: Do basic sanity checks for user modes
    - LP: #1420774
  * drm/atomic-helper: Export both plane and modeset check helpers
    - LP: #1420774
  * drm/atomic-helper: Again check modeset *before* plane states
    - LP: #1420774
  * drm/atomic: Introduce state->obj backpointers
    - LP: #1420774
  * drm: allow property validation for refcnted props
    - LP: #1420774
  * drm: store property instead of id in obj attachment
    - LP: #1420774
  * drm: get rid of direct property value access
    - LP: #1420774
  * drm: add atomic_set_property wrappers
    - LP: #1420774
  * drm: tweak getconnector locking
    - LP: #1420774
  * drm: add atomic_get_property
    - LP: #1420774
  * drm: Remove unneeded braces for single statement blocks
    - LP: #1420774
  * drm: refactor getproperties/getconnector
    - LP: #1420774
  * drm: add atomic properties
    - LP: #1420774
  * drm/atomic: atomic_check functions
    - LP: #1420774
  * drm: s...

Changed in linux (Ubuntu Vivid):
status: Fix Committed → Fix Released
Adam Conrad (adconrad)
Changed in debian-installer (Ubuntu Vivid):
status: Fix Committed → Fix Released
Changed in debian-installer (Ubuntu):
status: Fix Committed → Fix Released
Mathew Hodson (mhodson)
affects: linux → ubuntu-translations
no longer affects: ubuntu-translations
Bjorn Helgaas (bjorn-helgaas) wrote :

The git link in comment #17 is stale because it references a branch name, which is often re-used.

I think http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/log/?h=9a6d7298b0833614c411f774c46514efb1bd5651 is a permanent link to the same thing.
