Driver installation takes about 20 minutes on a 64-core system

Bug #1688431 reported by Alex Tu
This bug affects 1 person
Affects: HWE Next
  Status: Fix Released
  Importance: Undecided
  Assigned to: Unassigned
Affects: nvidia-graphics-drivers-375 (Ubuntu)
  Status: Fix Released
  Importance: High
  Assigned to: Alberto Milone

Bug Description

Ubuntu version: 16.04
Kernel: 4.4.0-67-generic

Issue:
The current makefile setting is "make -j$(nproc)".

On a 64-core system, the nvidia driver[1] installation gets stuck at "Building initial module for 4.4.0-67-generic" for about 20 minutes.

Workaround:
Repack the driver to change the setting to "make -j16"; the "Building initial module" step then takes only about 3 minutes.
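
A minimal sketch of the capping idea behind this workaround, assuming a POSIX shell and `nproc` from GNU coreutils. The helper name `cap_jobs` and its default limit of 16 are illustrative only and are not taken from the driver package.

```shell
#!/bin/sh
# cap_jobs CORES [LIMIT]: print the smaller of CORES and LIMIT (default 16).
# Hypothetical helper for illustration; not part of the nvidia packaging.
cap_jobs() {
    cores=$1
    limit=${2:-16}
    if [ "$cores" -gt "$limit" ]; then
        echo "$limit"
    else
        echo "$cores"
    fi
}

# Example: use at most 16 parallel jobs regardless of the core count.
# make -j"$(cap_jobs "$(nproc)")"
```

On a machine with 16 or fewer cores this behaves exactly like "make -j$(nproc)"; only larger systems are throttled.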

Investigation:
According to iotop, 35 processes were using >90% CPU and 23 processes were using >50%. This may be evidence that the heavy I/O caused by -j$(nproc) makes the whole system hang during the nvidia driver installation.

 htop:
 http://paste.ubuntu.com/24514786/
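
A rough way to reproduce this kind of count from a shell, assuming procps `ps` and a POSIX `awk`. The helper name `count_busy` is hypothetical. Note that `ps` reports CPU usage averaged over each process's lifetime, so the numbers will not match an interactive monitor exactly.

```shell
#!/bin/sh
# count_busy THRESHOLD: read one %CPU value per line on stdin and
# print how many of them are strictly greater than THRESHOLD.
# Hypothetical helper for illustration.
count_busy() {
    awk -v t="$1" '$1 + 0 > t + 0 { n++ } END { print n + 0 }'
}

# Example against the live system while the module is building:
# ps -eo pcpu= | count_busy 90
```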

[1] https://launchpad.net/~graphics-drivers/+archive/ubuntu/ppa/+packages

Alex Tu (alextu) wrote :

I added verbose debug messages to check what the build was doing after printing "Building initial module for 4.4.0-67-generic".

The attached tarball includes the messages for -j$(nproc) and -j16:
├── make-nvidia-j16-verbose-complete.log : the complete output for building with -j16
├── make-nvidia-jnproc-verbose-complete.log : the complete output for building with -j$(nproc)
├── make-nvidia-jnproc-verbose.log : the output copied when "Building initial module for 4.4.0-67-generic" appeared
├── make-nvidia-jnproc-verbose-2.log : the output copied after "Building initial module for 4.4.0-67-generic" appeared and the build had been stuck for a while

Changed in nvidia-graphics-drivers-375 (Ubuntu):
status: New → In Progress
importance: Undecided → High
assignee: nobody → Alberto Milone (albertomilone)
tags: added: originate-from-1675061 somerville
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nvidia-graphics-drivers-375 - 375.66-0ubuntu1

---------------
nvidia-graphics-drivers-375 (375.66-0ubuntu1) artful; urgency=medium

  * New upstream release:
    - Added support for the following GPUs:
      o GeForce GTX 1080 Ti
      o Quadro P3000
      o Quadro M520
      o TITAN Xp
    - Fixed a bug that could cause EGL applications to crash when
      calling eglInitialize() multiple times on X11-backed displays.
    - Fixed a regression that could cause rendering corruption on a
      monitor connected via DisplayPort upon a modeset event (for
      example, changing resolutions or power cycling the monitor).
    - Fixed a bug that could cause OpenGL applications to crash when
      VT switching between multiple X servers.
    - Fixed a bug that caused the system to become unresponsive after
      resuming from power management suspend/hibernate.  Additional
      symptoms of this bug included display flickering and "Xid 56"
      errors in the kernel log.
    - Fixed a bug that caused backlight brightness to not be
      controllable on some notebooks with DisplayPort internal
      panels.
    - Fixed a bug that left HDMI and DisplayPort audio muted after a
      framebuffer console mode was restored. For some displays, this
      caused the display to remain blank.
    - Fixed a bug that caused audio over DisplayPort to stop working
      when the monitor was unplugged and plugged back in or awoken
      from DPMS power-saving mode.
    - Restored support for the following GPU:
      GRID K520
    - Fixed a regression that caused corruption in certain
      applications, such as window border shadows in Unity, after
      resuming from suspend.
    - Fixed a bug that could cause some applications to crash when
      running with PRIME Sync.
    - Fixed a bug that prevented PRIME Sync from working on notebooks
      with GeForce GTX 4xx and 5xx series GPUs.
    - Fixed a bug that caused OpenGL apps to have excessive CPU usage
      when running with PRIME Sync but without native displays
      enabled.
    - Fixed a bug that could cause PRIME Sync to deadlock in the
      kernel, particularly common on Linux 4.10.
    - Fixed a bug that caused PRIME Sync to run slowly on systems
      with Pascal GPUs.

  [ Alberto Milone ]
  * debian/templates/dkms_nvidia.conf.in:
    - Drop buildfix_kernel_4.10.patch.
    - Limit the amount of cores to a maximum of 16 (LP: #1688431).

  [ Jeremy Bicha ]
  * Depend on xserver-xorg-legacy (LP: #1559576).

 -- Alberto Milone <email address hidden> Fri, 05 May 2017 15:13:39 +0200

Changed in nvidia-graphics-drivers-375 (Ubuntu):
status: In Progress → Fix Released
Changed in hwe-next:
status: New → Fix Released