[ppc64el] curtin block-discover fails to generate storage-config for partitions on raids

Bug #1853018 reported by Paride Legovini
28
This bug affects 5 people
Affects Status Importance Assigned to Milestone
curtin
Fix Committed
Medium
Unassigned
subiquity
Invalid
Undecided
Unassigned

Bug Description

Premise: I don't think this is really specific to ppc64el, but as I'm not sure of what the problem is I'll start with describing what I observe, then let's try to draw the right conclusions.

When trying to perform an automated raid1 installation with the answers.yaml [1] by using virt-install like this:

virt-install --name isotest-raid1 --os-variant ubuntu18.04 --noautoconsole --wait -1 --memory 2048 --disk bus=virtio,cache=unsafe,size=6 --disk bus=virtio,cache=unsafe,size=6 --disk disk3.qcow2,bus=virtio,cache=unsafe,size=6 --disk vol=answers.img,bus=virtio,readonly=on --cdrom /path/to/focal-live-server-ppc64el.iso

the install fails printing the following errors in the console where subiquity is running:

ERROR: hook /usr/share/apport/general-hooks/powerpc.py crashed:
Traceback (most recent call last):
File "/snap/subiquity/1335/usr/lib/python3/dist-packages/apport/report.py", line 198, in _run_hook
symb['add_info'](report, ui)
File "/usr/share/apport/general-hooks/powerpc.py", line 42, in add_info
is_kernel = report['ProblemType'].startswith('Kernel') or 'linux' in report.get('Package')
TypeError: argument of type 'NoneType' is not iterable

which seems to be an error with the call to apport. This is probably by itself a bug. Looking in /var/log/install, there are 3 subiquity log files:

$ du -h subiquity-debug.log.*
608K subiquity-debug.log.2167
56K subiquity-debug.log.2180
20K subiquity-debug.log.4033

The bigger one is likely the one that actually did the installation. Looking at what's in it, the first thing I notice is:

2019-11-17 11:24:36,725 ERROR subiquity:139 Failed to lock auto answers file, proceding without it.
Traceback (most recent call last):
  File "/snap/subiquity/1335/lib/python3.6/site-packages/subiquity/cmd/tui.py", line 136, in main
    fcntl.flock(opts.answers, fcntl.LOCK_EX | fcntl.LOCK_NB)
BlockingIOError: [Errno 11] Resource temporarily unavailable

so this subiquity seems to have lost the "race" lock the answers.yaml, but then did a lot of things. Strange. At the very end of the log there are two tracebacks, probably the true reason of the failure. The subiquity-debug.log.2167 log file shows very similar tracebacks.

Interesting data points:

 - The very same installation succeeds on amd64;
 - An automated, non-raid installation done on ppc64el
   using virt-install in the same way suceeds;
 - A non-automated install with the same setup on
   ppc64el also succeeds (!).

Attached: the full /var/log of the installer system.

Paride

[1] http://paste.ubuntu.com/p/xvKtK5bY5P/

Related branches

Revision history for this message
Paride Legovini (paride) wrote :
Revision history for this message
Ryan Harper (raharper) wrote : Re: [Bug 1853018] [NEW] [autoinstall] raid1 install fails on ppc64el
Download full text (3.7 KiB)

   Reading package lists...
        E: Failed to fetch
http://us.archive.ubuntu.com/ubuntu/dists/focal/main/binary-ppc64el/Packages
404 Not Found [IP: 91.189.91.23 80]
        E: Some index files failed to download. They have been ignored, or
old ones used instead.

Failure during apt-get update command,

+ /snap/subiquity/1335/usr/bin/python3 -m curtin in-target -- apt-get update

On Mon, Nov 18, 2019 at 10:10 AM Paride Legovini <
<email address hidden>> wrote:

> Public bug reported:
>
> Premise: I don't think this is really specific to ppc64el, but as I'm
> not sure of what the problem is I'll start with describing what I
> observe, then let's try to draw the right conclusions.
>
>
> When trying to perform an automated raid1 installation with the
> answers.yaml [1] by using virt-install like this:
>
> virt-install --name isotest-raid1 --os-variant ubuntu18.04
> --noautoconsole --wait -1 --memory 2048 --disk
> bus=virtio,cache=unsafe,size=6 --disk bus=virtio,cache=unsafe,size=6
> --disk disk3.qcow2,bus=virtio,cache=unsafe,size=6 --disk
> vol=answers.img,bus=virtio,readonly=on --cdrom /path/to/focal-live-
> server-ppc64el.iso
>
> the install fails printing the following errors in the console where
> subiquity is running:
>
>
> ERROR: hook /usr/share/apport/general-hooks/powerpc.py crashed:
> Traceback (most recent call last):
> File
> "/snap/subiquity/1335/usr/lib/python3/dist-packages/apport/report.py", line
> 198, in _run_hook
> symb['add_info'](report, ui)
> File "/usr/share/apport/general-hooks/powerpc.py", line 42, in add_info
> is_kernel = report['ProblemType'].startswith('Kernel') or 'linux' in
> report.get('Package')
> TypeError: argument of type 'NoneType' is not iterable
>
>
> which seems to be an error with the call to apport. This is probably by
> itself a bug. Looking in /var/log/install, there are 3 subiquity log files:
>
> $ du -h subiquity-debug.log.*
> 608K subiquity-debug.log.2167
> 56K subiquity-debug.log.2180
> 20K subiquity-debug.log.4033
>
> The bigger one is likely the one that actually did the installation.
> Looking at what's in it, the first thing I notice is:
>
>
> 2019-11-17 11:24:36,725 ERROR subiquity:139 Failed to lock auto answers
> file, proceding without it.
> Traceback (most recent call last):
> File
> "/snap/subiquity/1335/lib/python3.6/site-packages/subiquity/cmd/tui.py",
> line 136, in main
> fcntl.flock(opts.answers, fcntl.LOCK_EX | fcntl.LOCK_NB)
> BlockingIOError: [Errno 11] Resource temporarily unavailable
>
>
> so this subiquity seems to have lost the "race" lock the answers.yaml, but
> then did a lot of things. Strange. At the very end of the log there are two
> tracebacks, probably the true reason of the failure. The
> subiquity-debug.log.2167 log file shows very similar tracebacks.
>
> Interesting data points:
>
> - The very same installation succeeds on amd64;
> - An automated, non-raid installation done on ppc64el
> using virt-install in the same way suceeds;
> - A non-automated install with the same setup on
> ppc64el also succeeds (!).
>
> Attached: the full /var/log of the installer system.
>
> Paride
>
>
> [1] http://paste.ubuntu.com/p/xvKtK5bY5P/
>
> ** A...

Read more...

Revision history for this message
Paride Legovini (paride) wrote : Re: [autoinstall] raid1 install fails on ppc64el

Right! That was pretty obvious from the curtin logs, less so from the subiquity logs. The problem was that this was on ppc64el and I was using archive.ubuntu.com instead of ports.ubuntu.com.

Switching to the correct mirror allows the installation to complete, *however* the tracebacks are still there, e.g.:

2019-11-18 23:03:54,528 ERROR block-discover:210 load_probe_data failed restricted=False
Traceback (most recent call last):
  File "/snap/subiquity/1335/lib/python3.6/site-packages/subiquity/controllers/filesystem.py", line 207, in _probe_done
    self.model.load_probe_data(probe.result)
  File "/snap/subiquity/1335/lib/python3.6/site-packages/subiquity/models/filesystem.py", line 1378, in load_probe_data
    self.reset()
  File "/snap/subiquity/1335/lib/python3.6/site-packages/subiquity/models/filesystem.py", line 1202, in reset
    config = storage_config.extract_storage_config(self._probe_data)
  File "/snap/subiquity/1335/lib/python3.6/site-packages/curtin/storage_config.py", line 1249, in extract_storage_config
    tree = get_config_tree(cfg.get('id'), final_config)
  File "/snap/subiquity/1335/lib/python3.6/site-packages/curtin/storage_config.py", line 274, in get_config_tree
    for dep in find_item_dependencies(item, sconfig):
  File "/snap/subiquity/1335/lib/python3.6/site-packages/curtin/storage_config.py", line 244, in find_item_dependencies
    _validate_dep_type(item_id, dep_key, dep, config)
  File "/snap/subiquity/1335/lib/python3.6/site-packages/curtin/storage_config.py", line 192, in _validate_dep_type
    'Invalid dep_id (%s) not in storage config' % dep_id)
ValueError: Invalid dep_id (raid-md1p1) not in storage config

I can attach the logs of an installation with the correct mirror set up.

It would be to add a "no tracebacks in the installer logs" step in the ISO testing jobs, what do you think?

Revision history for this message
Paride Legovini (paride) wrote :

Attached: the installer log of a successful installation, but with several Tracebacks in them.

summary: - [autoinstall] raid1 install fails on ppc64el
+ raid1 install on ppc64el hits Tracebacks
Revision history for this message
Ryan Harper (raharper) wrote : Re: [Bug 1853018] Re: [autoinstall] raid1 install fails on ppc64el

On Tue, Nov 19, 2019 at 7:40 AM Paride Legovini <
<email address hidden>> wrote:

> Attached: the installer log of a successful installation, but with
> several Tracebacks in them.
>
> ** Attachment added: "installer_logs.tar.gz"
>
> https://bugs.launchpad.net/subiquity/+bug/1853018/+attachment/5306471/+files/installer_logs.tar.gz
>
> ** Summary changed:
>
> - [autoinstall] raid1 install fails on ppc64el
> + raid1 install on ppc64el hits Tracebacks
>
>
The second traceback, on 'probe_data missing bcache data' is known, and
there's a branch
for it:

 https://code.launchpad.net/~raharper/curtin/+git/curtin/+merge/375463

The first trace is related to partitions on a raid. Let's rename this bug
for this
curtin block-discover fails to generate storage-config for partitions on
raids.

--
> You received this bug notification because you are subscribed to
> subiquity.
> Matching subscriptions: subiquity-bugs
> https://bugs.launchpad.net/bugs/1853018
>
> Title:
> raid1 install on ppc64el hits Tracebacks
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/subiquity/+bug/1853018/+subscriptions
>

Revision history for this message
Paride Legovini (paride) wrote :

Thanks Ryan. I renamed the bug, adding a [ppc64el] tag because I'm not getting those tracebacks on amd64.

summary: - raid1 install on ppc64el hits Tracebacks
+ [ppc64el] curtin block-discover fails to generate storage-config for
+ partitions on raids
Revision history for this message
Paride Legovini (paride) wrote :

And neither on arm64, now that I tested it.

Ryan Harper (raharper)
Changed in curtin:
importance: Undecided → Medium
status: New → In Progress
Revision history for this message
Server Team CI bot (server-team-bot) wrote :

This bug is fixed with commit 8c4b2252 to curtin on branch master.
To view that commit see the following URL:
https://git.launchpad.net/curtin/commit/?id=8c4b2252

Changed in curtin:
status: In Progress → Fix Committed
Paride Legovini (paride)
Changed in subiquity:
status: New → Invalid
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.