mdcheck_start.service trying to start a nonexistent file

Bug #1852747 reported by Alexey Bazhin
This bug affects 36 people
Affects           Status        Importance  Assigned to
mdadm (Debian)    Fix Released  Unknown
mdadm (Ubuntu)    Fix Released  High        Eric Desrochers
  Bionic          Won't Fix     Undecided   Unassigned
  Focal           Fix Released  High        Eric Desrochers

Bug Description

[Impact]

The mdadm package is missing the mdcheck script. This has two consequences:

In the immediate term, that means that we get failed systemd units on all of our physical machines (because they have mirrored disks) as we upgrade them to 20.04. This raises alarms in our monitoring system as we monitor systemd unit failures.

In the longer term, this means that the arrays are not being checked. If a drive develops a bad sector, the check would normally catch it and a good copy would be rewritten from the other side of the mirror. Without the check, that will not happen. If the other drive (the one with the good copy of the sector) then dies, that sector's data is lost permanently. How bad that is depends on what the sector was storing, but it is clearly not good.

[Test Case]

* systemctl start mdcheck_start.service

* journalctl -u mdcheck_start
-- Logs begin at Wed 2020-09-23 18:33:35 UTC, end at Wed 2020-09-23 18:40:27 UTC. --
Sep 23 18:40:27 mdadmgroovy systemd[1]: Starting MD array scrubbing...
Sep 23 18:40:27 mdadmgroovy systemd[1515]: mdcheck_start.service: Failed to execute command: No such file or directory
Sep 23 18:40:27 mdadmgroovy systemd[1515]: mdcheck_start.service: Failed at step EXEC spawning /usr/share/mdadm/mdcheck: No such file or directory
Sep 23 18:40:27 mdadmgroovy systemd[1]: mdcheck_start.service: Main process exited, code=exited, status=203/EXEC
Sep 23 18:40:27 mdadmgroovy systemd[1]: mdcheck_start.service: Failed with result 'exit-code'.
Sep 23 18:40:27 mdadmgroovy systemd[1]: Failed to start MD array scrubbing.

* ls -altr /usr/share/mdadm/mdcheck
ls: cannot access '/usr/share/mdadm/mdcheck': No such file or directory

* dpkg -l mdadm
ii mdadm 4.1-5ubuntu1 amd64 tool to administer Linux MD arrays (software RAID)

* dpkg -L mdadm | grep -i mdcheck
/lib/systemd/system/mdcheck_continue.service
/lib/systemd/system/mdcheck_continue.timer
/lib/systemd/system/mdcheck_start.service
/lib/systemd/system/mdcheck_start.timer

* Also, we'd like to confirm that mdcheck runs under its 'natural' schedule (the next run on the first Sunday of the month) and have impacted users report feedback backed by logs.

* We found a regression that was fixed upstream:
https://git.kernel.org/pub/scm/utils/mdadm/mdadm.git/commit/?id=6636788aaf4ec0cacaefb6e77592e4a68e70a957

* We then found a regression in that upstream fix, pushed a fix for it into groovy, and submitted it upstream to the linux-raid mailing list:
https://marc.info/?l=linux-raid&m=160130979927617&w=2

* We'd like mdcheck_continue to be enabled whenever mdcheck_start is enabled.

[Regression Potential]

* 'misc/mdcheck' will be introduced in Ubuntu for the first time, and it is also quite new in Debian's mdadm packaging (introduced on Sept 12, 2020).

No fixes have been added on top of it since Debian introduced it about two weeks ago.

$ git log --oneline --grep="mdcheck"
5a3db0f Install misc/mdcheck; turn on hardening; enable dh_lintian. (Closes: #960132)
f258a5e mdcheck: improve cleanup
ea83549 mdcheck: add some logging.
979b1fe mdcheck: be careful when sourcing the output of "mdadm --detail --export"
36dab45 mdcheck: don't git error if not /dev/md?* devices exist.
868ab80 mdcheck: don't pass the '+' to "date".
df881f7 mdcheck: new script to help with regular checks of md arrays.

No new bugs related to the introduction of mdcheck have been opened.

On code inspection, the 'mdcheck' script appears harmless, at least at first glance. Of course, real-world testing across RAID types will still be needed during the verification phase before drawing conclusions. If possible, running the script in debug mode (set -xv) would also be a good way to observe its workflow in action.

This change will permit 'mdcheck' to be run on the first Sunday of each month for 6 hours (mdcheck_start.timer: OnCalendar=Sun *-*-1..7 1:00:00), then on every subsequent morning until the check is finished (mdcheck_continue.timer:OnCalendar=daily).
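To make the `OnCalendar=Sun *-*-1..7 1:00:00` expression concrete, here is a minimal sketch of the same date test as a shell function: a date matches when it is a Sunday falling on day 1 to 7 of its month. It assumes GNU `date` (for `-d`), and `is_first_sunday` is a hypothetical name; systemd evaluates calendar expressions itself and does not use anything like this.

```shell
# Hypothetical helper: succeed if the given date (YYYY-MM-DD) is a Sunday
# on day 1..7 of its month, i.e. a day matched by "Sun *-*-1..7".
# Requires GNU date for the -d option.
is_first_sunday() {
    # %u prints the ISO weekday: 1 = Monday ... 7 = Sunday.
    [ "$(date -d "$1" +%u)" = 7 ] || return 1
    # %d prints the zero-padded day of the month (01..31).
    case $(date -d "$1" +%d) in
        0[1-7]) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `is_first_sunday 2020-10-04` succeeds, since October 4, 2020 is the first Sunday of that month, while `is_first_sunday 2020-10-11` fails. On the machine itself, `systemd-analyze calendar 'Sun *-*-1..7 1:00:00'` reports when systemd will next fire the expression.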

It's not a script that one would typically run manually on a regular basis.

The script uses 'logger' to write messages to the system log, so we will have a trace of its execution (in addition to the usual systemd unit and timer logs) when it begins, pauses, and continues. My upload also adds a patch so that mdcheck logs completion as well. This lets users know how long the RAID check took, which I think is essential information to include when introducing this script in Ubuntu.

I suggest we not release the package to focal-updates before we have at least one sample of a 'natural' scheduled run on the first Sunday of the month (the next one should be October 4th), with impacted users reporting feedback backed by logs.

I think running it on Sunday is reasonable (as fstrim, zfs scrub, and others do). Sunday is typically when cron jobs and timers run maintenance tasks like this.

One thing I would like to confirm, though it may not be a blocker for this case, is that 'mdcheck_continue' starts correctly when its conditions are met. It has never been tested, because 'mdcheck_start' always failed due to the missing 'mdcheck' script.

[Other Info]

Debian bug:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=960132

salsa commit:
https://salsa.debian.org/lechner/mdadm/-/commit/5a3db0f5429fc81e0f53cbf9aa473059b74fe057

[Original Description]

mdcheck_start.service trying to start a nonexistent file

root@d:~# cat /lib/systemd/system/mdcheck_start.service | grep Exec
ExecStart=/usr/share/mdadm/mdcheck --duration $MDADM_CHECK_DURATION

root@d:~# ls -la /usr/share/mdadm/mdcheck
ls: cannot access '/usr/share/mdadm/mdcheck': No such file or directory
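The manual check above (reading ExecStart= and then testing the path) can be scripted. This is a minimal sketch in POSIX-style sh (it also uses `local`, which bash and dash both support); `check_execstart` is a hypothetical name, and it only looks at the first word of each `ExecStart=` line, ignoring systemd's special prefixes such as `-` or `@`.

```shell
# Hypothetical helper: for each ExecStart= line of a unit file, print the
# program path and whether it exists and is executable. Returns non-zero
# if any program is missing.
check_execstart() {
    local unit="$1" status=0 prog
    # Extract the first word after "ExecStart=" (the program path).
    for prog in $(sed -n 's/^ExecStart=\([^ ]*\).*/\1/p' "$unit"); do
        if [ -x "$prog" ]; then
            echo "ok: $prog"
        else
            echo "missing: $prog"
            status=1
        fi
    done
    return "$status"
}
```

On an affected 4.1-5ubuntu1 system, `check_execstart /lib/systemd/system/mdcheck_start.service` would be expected to print `missing: /usr/share/mdadm/mdcheck` and return non-zero.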

ProblemType: Bug
DistroRelease: Ubuntu 19.10
Package: mdadm 4.1-2ubuntu3
ProcVersionSignature: Ubuntu 5.3.0-19.20-generic 5.3.1
Uname: Linux 5.3.0-19-generic x86_64
ApportVersion: 2.20.11-0ubuntu8.2
Architecture: amd64
Date: Fri Nov 15 13:13:17 2019
Lspci: Error: [Errno 2] No such file or directory: 'lspci': 'lspci'
Lsusb: Error: [Errno 2] No such file or directory: 'lsusb': 'lsusb'
MachineType: HP HP EliteBook x360 1030 G3
ProcEnviron:
 LANG=C
 TERM=screen
 PATH=(custom, no user)
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.3.0-19-generic root=/dev/mapper/system-root ro cryptdevice=UUID=95c107ea-73d0-4206-a31c-fb0ed6d7d6a9:cryptlvm mem_sleep_default=deep
ProcMDstat:
 Personalities :
 unused devices: <none>
SourcePackage: mdadm
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 08/07/2019
dmi.bios.vendor: HP
dmi.bios.version: Q90 Ver. 01.08.01
dmi.board.name: 8438
dmi.board.vendor: HP
dmi.board.version: KBC Version 14.3F.00
dmi.chassis.asset.tag: 5CD9296RDC
dmi.chassis.type: 31
dmi.chassis.vendor: HP
dmi.modalias: dmi:bvnHP:bvrQ90Ver.01.08.01:bd08/07/2019:svnHP:pnHPEliteBookx3601030G3:pvr:rvnHP:rn8438:rvrKBCVersion14.3F.00:cvnHP:ct31:cvr:
dmi.product.family: 103C_5336AN HP EliteBook x360
dmi.product.name: HP EliteBook x360 1030 G3
dmi.product.sku: 5SR46ES#ACB
dmi.sys.vendor: HP
etc.blkid.tab: Error: [Errno 2] No such file or directory: '/etc/blkid.tab'
initrd.files: Error: [Errno 2] No such file or directory: '/boot/initrd.img-5.3.0-19-generic'

Revision history for this message
Alexey Bazhin (baz-irc) wrote :
Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in mdadm (Ubuntu):
status: New → Confirmed
Revision history for this message
legioner (legioner) wrote :

The current version of the mdadm package contains the checkarray script instead of mdcheck.

Workaround: create /etc/cron.d/mdadm as it was in previous versions.

Revision history for this message
legioner (legioner) wrote :

Here is a patch that adds the missing scripts, but it doesn't touch the docs.

Revision history for this message
Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "mdcheck-fix.patch" seems to be a patch. If it isn't, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.]

tags: added: patch
Revision history for this message
KonishchevDmitry (konishchevdmitry) wrote :

Any activity here? :)

The bug silently breaks periodic RAID checks. A lot of people rely on these checks, but most of them aren't aware that the checks are no longer being triggered.

tags: added: focal
tags: added: groovy
Revision history for this message
Guilherme G. Piccoli (gpiccoli) wrote :

Hi Alexey, thanks for your report! And thanks legioner for your comment and patch. I think this is indeed a bug: we ship systemd services for mdcheck, but the script itself is not in the package.
For Bionic, neither the script nor the systemd services are present.

Debian came up with the "checkarray" approach a long time ago [0]; it does much the same as mdcheck, with some small differences (checkarray lets you ionice the process, mdcheck lets you stop and continue it, etc.). I disagree with the portion of legioner's patch that "removes" checkarray; I don't see a reason for it. That said, we should either (a) include mdcheck in the package, or (b) remove the systemd mdcheck-related patches and officially not support mdcheck.

I've noticed that mdadm has the same issue on Debian Sid: the mdcheck script is missing (although the systemd services are there). So my suggestion is to also open a Debian bug about this (and link it here in Launchpad) to get input from the Debian mdadm maintainers on the subject.

Cheers,

Guilherme

[0] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=377071

Revision history for this message
legioner (legioner) wrote :

Thanks for your help, Guilherme.
I've opened a Debian bug - https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=960132
It contains a new version of the patch that keeps the "checkarray" script, as you suggested.

Revision history for this message
Guilherme G. Piccoli (gpiccoli) wrote :

Thank you legioner, let's see how Debian maintainers respond to that. Keep us posted :)
Cheers,

Guilherme

Changed in mdadm (Debian):
status: Unknown → New
Revision history for this message
legioner (legioner) wrote :

Hooray! The Debian mdadm maintainers fixed that bug in version 4.1-6. This version is currently available in Debian Sid.

Revision history for this message
Guilherme G. Piccoli (gpiccoli) wrote :

Cool, thanks for the update legioner!

Revision history for this message
Richard Laager (rlaager) wrote :

Unfortunately, we are past the DebianImportFreeze for groovy. Can you apply the one-line bug fix to Groovy so that it can then SRU into Focal?

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=960132#15

Revision history for this message
Richard Laager (rlaager) wrote :

I have tested the fix on Focal and confirmed it works. Here is a link to the diff in our PPA:
https://launchpadlibrarian.net/498490932/mdadm_4.1-5ubuntu1_4.1-5ubuntu1.1~wiktel1.20.04.1.diff.gz

Eric Desrochers (slashd)
Changed in mdadm (Ubuntu):
assignee: nobody → Eric Desrochers (slashd)
status: Confirmed → In Progress
Eric Desrochers (slashd)
description: updated
Eric Desrochers (slashd)
description: updated
Eric Desrochers (slashd)
description: updated
Eric Desrochers (slashd)
Changed in mdadm (Ubuntu):
importance: Undecided → High
description: updated
Revision history for this message
Eric Desrochers (slashd) wrote :

[sts-sponsors]

Uploaded to groovy. I'll SRU it once the package lands in groovy-releases.

Regards,
Eric

tags: added: sts-sponsors-slashd
Eric Desrochers (slashd)
description: updated
Changed in mdadm (Ubuntu Focal):
status: New → In Progress
importance: Undecided → High
assignee: nobody → Eric Desrochers (slashd)
Eric Desrochers (slashd)
description: updated
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package mdadm - 4.1-5ubuntu2

---------------
mdadm (4.1-5ubuntu2) groovy; urgency=medium

  * d/rules: Install misc/mdcheck (LP: #1852747)
    - The absence of mdcheck utility is preventing
      mdcheck_continue.service & mdcheck_start.service
      to execute the command when the service start via
      ExecStart. (Closes: #960132)

 -- Eric Desrochers <email address hidden> Wed, 23 Sep 2020 14:08:32 -0400

Changed in mdadm (Ubuntu):
status: In Progress → Fix Released
Eric Desrochers (slashd)
description: updated
Revision history for this message
Eric Desrochers (slashd) wrote :

[sts-sponsors]

Uploaded to the focal upload queue for SRU verification team approval; after that it will build in focal-proposed and enter the testing phase.

- Eric

Eric Desrochers (slashd)
Changed in mdadm (Ubuntu Bionic):
status: New → Won't Fix
Eric Desrochers (slashd)
description: updated
Eric Desrochers (slashd)
description: updated
Eric Desrochers (slashd)
description: updated
Eric Desrochers (slashd)
description: updated
Revision history for this message
Łukasz Zemczak (sil2100) wrote :

Ok, I looked at this briefly. The situation right now in focal is a bit of both: partly this is a bug, but partly we're also introducing a behavior change. In theory this is how it was supposed to work, but it didn't, which doesn't change the fact that existing users will now notice a behavior change: mdcheck actually being run.

Such cases are always a bit tricky, especially for LTSes, where some people might have settled expectations. We also don't want to regress by accident. That being said, I still *feel* that everyone would benefit from having their arrays checked periodically, and I also feel that the risk here is relatively low.

Under these circumstances, I would like to try getting this into -proposed. I agree with the regression potential section that it would be very reasonable to, besides manually testing the service and mdcheck itself, leave the -proposed package on a system to get the check executed naturally per the schedule. I'll copy over that part to the test case so that it's not forgotten. This is also good, because I'd like for this change to age a bit longer in -proposed anyway.

description: updated
Changed in mdadm (Ubuntu Focal):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-focal
Revision history for this message
Łukasz Zemczak (sil2100) wrote : Please test proposed package

Hello Alexey, or anyone else affected,

Accepted mdadm into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/mdadm/4.1-5ubuntu1.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Revision history for this message
Eric Desrochers (slashd) wrote :

@sil2100, I totally agree with your statement.

I was on the fence too (bug vs. feature request) about introducing the script in LTSes, but since the services and timers were already shipped and only the script was forgotten, leaving it out signals to the community that something is wrong or missing in 'mdadm'.

And monitoring a raid array is always good to have.

Thanks for the approval.

Revision history for this message
Richard Laager (rlaager) wrote :

[The following is probably outside the scope of this SRU, but since this will be the first time that people see this logging, maybe you do want to improve it now.]

The existing log statements are:

logger -p daemon.info mdcheck start checking $dev
logger -p daemon.info mdcheck continue checking $dev from $start
logger -p daemon.info mdcheck finished checking $dev
logger -p daemon.info pause checking $dev at `cat $fl`

Some issues:
1. The last one does not contain "mdcheck", which is inconsistent and hampers grepping.
2. These do not set a "tag", so we get "root" as the tag. The typical syslog convention is that the tag is the daemon/script name. I propose "-t mdcheck". That can just be the "mdcheck" that starts the log messages now; there is no need for two "mdcheck"s.
3. nit: I'd use $() instead of ``.

That is, I would change them to the following:

logger -p daemon.info -t mdcheck start checking $dev
logger -p daemon.info -t mdcheck continue checking $dev from $start
logger -p daemon.info -t mdcheck finished checking $dev
logger -p daemon.info -t mdcheck pause checking $dev at $(cat $fl)
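Richard's proposal can also be expressed as a single wrapper, so the priority and tag are set in one place and cannot drift between messages. This is a hypothetical sketch, not the upstream mdcheck code; `mdcheck_log` is an invented name, while `-p` and `-t` are the standard `logger` options already used in the lines above.

```shell
# Hypothetical wrapper: every message gets the same priority (daemon.info)
# and the same tag (mdcheck), so a whole run can be listed with
# "journalctl -t mdcheck".
mdcheck_log() {
    logger -p daemon.info -t mdcheck "$*"
}

# The four messages above would then become (variables as in the script):
#   mdcheck_log "start checking $dev"
#   mdcheck_log "continue checking $dev from $start"
#   mdcheck_log "finished checking $dev"
#   mdcheck_log "pause checking $dev at $(cat "$fl")"
```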

Revision history for this message
Richard Laager (rlaager) wrote :

I installed the update on 4 basically identical systems (note to self: hostnames starting with g, k, r, w):

I enabled -proposed and installed the package:

sudo vi /etc/apt/sources.list.d/ubuntu-proposed.list
sudo apt update
sudo apt install mdadm=4.1-5ubuntu1.1

I tested the scrub on one system (hostname starts with k):

# In another terminal:
watch cat /proc/mdstat

sudo systemctl start mdcheck_start.service &
# This started the scrub.

In ~20 minutes, the scrub completed and the service stopped ~1 min thereafter. (This makes sense given the "sleep 120" that the script uses.)
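The roughly one-minute lag follows from polling: if the script only looks at the array every 120 seconds, it notices completion up to one interval late, about half an interval on average. Below is a hypothetical sketch of such a deadline-bounded polling loop; it is not the mdcheck source. The file to poll, the deadline (in epoch seconds) and the interval are parameters so it can be exercised without a real array. On a real system the file would be an md attribute such as `/sys/block/md0/md/sync_action`, which reads `check` while a check is running and `idle` once it has finished.

```shell
# Hypothetical sketch: poll a file until it reads "idle" or the deadline
# (seconds since the epoch) passes, sleeping $3 seconds between looks.
wait_for_idle() {
    local action_file="$1" deadline="$2" interval="$3"
    while [ "$(date +%s)" -lt "$deadline" ]; do
        if [ "$(cat "$action_file")" = idle ]; then
            echo finished
            return 0
        fi
        # Completion is only noticed on the next wake-up, so the loop can
        # lag the real end of the check by up to one interval.
        sleep "$interval"
    done
    echo deadline
    return 1
}
```

For example, `wait_for_idle /sys/block/md0/md/sync_action "$(date -d '+6 hours' +%s)" 120` would mimic a six-hour window with two-minute polling (GNU `date` assumed for `-d`).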

Logs:
2020-09-24T12:28:56.517769-05:00 k... root: mdcheck start checking /dev/md0
2020-09-24T12:50:56.665042-05:00 k... root: mdcheck finished checking /dev/md0
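Because both the start and the finish are now logged, the duration of a check can be read straight from the timestamps. A minimal sketch, assuming GNU `date` and RFC 3339 timestamps like the two lines above; `elapsed_seconds` is a hypothetical name. To keep parsing simple it keeps only the first 19 characters (`YYYY-MM-DDTHH:MM:SS`), discarding the fractional seconds and UTC offset, so both timestamps must share the same offset (and no daylight-saving change may fall between them).

```shell
# Hypothetical helper: whole seconds between two RFC 3339 timestamps that
# share the same UTC offset, e.g. 2020-09-24T12:28:56.517769-05:00.
# Requires GNU date for the -d option.
elapsed_seconds() {
    local start end
    # Keep "YYYY-MM-DDTHH:MM:SS" and turn the "T" into a space, a form
    # GNU date parses reliably (interpreted in the local time zone).
    start=$(printf '%.19s' "$1" | tr T ' ')
    end=$(printf '%.19s' "$2" | tr T ' ')
    echo $(( $(date -d "$end" +%s) - $(date -d "$start" +%s) ))
}
```

Applied to the two log lines above, `elapsed_seconds 2020-09-24T12:28:56.517769-05:00 2020-09-24T12:50:56.665042-05:00` prints 1320, i.e. a 22-minute check.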

I tested the continue script on one other system (hostname starts with r):

# I changed the time from 6 hours to 3 minutes with a drop-in unit file.
sudo mkdir /etc/systemd/system/mdcheck_start.service.d
sudo vi /etc/systemd/system/mdcheck_start.service.d/time.conf
sudo systemctl daemon-reload

# In another terminal:
watch cat /proc/mdstat

sudo systemctl start mdcheck_start.service &
# This started the scrub.

watch systemctl status mdcheck_start.service
# Again, the script uses a "sleep 120" (two minutes), so at the 4 minute mark
# the service stopped, as did the scrub.

sudo systemctl start mdcheck_continue.service &
watch systemctl status mdcheck_start.service
# The scrub started where it left off.
# The time on this was still the default of 6 hours.
# After another ~18 minutes, the scrub completed.

sudo rm /etc/systemd/system/mdcheck_start.service.d/time.conf
sudo rmdir /etc/systemd/system/mdcheck_start.service.d
sudo systemctl daemon-reload

sudo systemctl start mdcheck_start.service &
# This started the scrub.

watch systemctl status mdcheck_start.service
# After ~20 minutes, the scrub completed and the service stopped.

Logs:
2020-09-24T12:14:56.204254-05:00 r... root: mdcheck start checking /dev/md0
2020-09-24T12:17:37.912431-05:00 r... root: mdcheck start checking /dev/md0
2020-09-24T12:21:38.282462-05:00 r... root: pause checking /dev/md0 at 95207168
2020-09-24T12:21:50.636301-05:00 r... root: mdcheck continue checking /dev/md0 from 95207168
2020-09-24T12:39:50.737671-05:00 r... root: mdcheck finished checking /dev/md0
2020-09-24T12:41:03.127050-05:00 r... root: mdcheck start checking /dev/md0
2020-09-24T13:03:03.243179-05:00 r... root: mdcheck finished checking /dev/md0
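The pause and continue lines record a position (95207168 here). md reports check progress in the `sync_completed` sysfs attribute as `<completed> / <total>` sectors, or `none` when nothing is running, and the `sync_min` attribute sets the sector a check starts from. The sketch below only shows how such a position could be extracted for saving; it is an illustration, not the mdcheck source, and `checkpoint_from_sync_completed` is a hypothetical name.

```shell
# Hypothetical helper: print the completed-sector count from a file in the
# format of md's sync_completed ("<completed> / <total>"), or fail if it
# reads "none" (no check running, nothing to save).
checkpoint_from_sync_completed() {
    local completed total
    # Fields: completed sectors, the "/" separator, total sectors.
    read -r completed _ total < "$1"
    if [ "$completed" = none ] || [ -z "$total" ]; then
        return 1
    fi
    echo "$completed"
}
```

For example, reading `/sys/block/md0/md/sync_completed` during a check would print the first number, which a later run could write back to `/sys/block/md0/md/sync_min` before starting the check again.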

I have NOT marked this verification-done, as I believe you wanted to see a "natural" start of the service on October 4 before calling the testing complete. I will leave these systems as is. That will give us 4 examples of it, two of which were not touched in any special way at all (other than installing the update from -proposed, of course). However, if you want to mark this verification-done now, or ask me to do it, I am not opposed to that either.

Revision history for this message
Eric Desrochers (slashd) wrote :

Richard Laager (rlaager)

* Thanks for the thorough testing. Yes, let's wait for a 'natural' start of the service on October 4th, and if everything still looks good, let's change the tag to 'done'.

* Regarding your code suggestion to improve 'mdcheck', the first step would be to get the changes adopted upstream: https://git.kernel.org/pub/scm/utils/mdadm/mdadm.git/

If you submit it upstream and it gets approved in a timely fashion, then with sil2100's approval (if he agrees), we could possibly add it alongside the current bug fix. But there is no guarantee, and this could delay the package release.

Regards,
Eric

Eric Desrochers (slashd)
tags: added: seg sts
removed: sts-sponsors-slashd
tags: removed: eoan
Revision history for this message
Eric Desrochers (slashd) wrote :

In case you want to submit a patch upstream

https://raid.wiki.kernel.org/index.php/Linux_Raid

Linux RAID issues are discussed in the linux-raid mailing list to be found at http://vger.kernel.org/vger-lists.html#linux-raid

Revision history for this message
Richard Laager (rlaager) wrote :

It was trivial, so I sent in the patches. I didn't change `...` to $(...) as I don't care to argue with them about that. We'll see what upstream says.

Revision history for this message
Eric Desrochers (slashd) wrote :

* [PATCH 1/2] mdcheck: Prefix pause message with mdcheck
https://marc.info/?l=linux-raid&m=160098735125487&w=2

* [PATCH 2/2] mdcheck: Set a tag of mdcheck
https://marc.info/?l=linux-raid&m=160098735125488&w=2

Revision history for this message
legioner (legioner) wrote :

I enabled -proposed and installed the package:

# dpkg -l mdadm | grep mdadm
ii mdadm 4.1-5ubuntu1.1 amd64 tool to administer Linux MD arrays (software RAID)

# dpkg -L mdadm | grep -i mdcheck
/lib/systemd/system/mdcheck_continue.service
/lib/systemd/system/mdcheck_continue.timer
/lib/systemd/system/mdcheck_start.service
/lib/systemd/system/mdcheck_start.timer
/usr/share/mdadm/mdcheck

# ls -altr /usr/share/mdadm/mdcheck
-rwxr-xr-x 1 root root 3884 Sep 24 18:54 /usr/share/mdadm/mdcheck

The original bug is now fixed.

But I see another problem with the systemd files: mdcheck_continue.timer is misconfigured and will never be started:

# systemctl status mdcheck_continue.timer
● mdcheck_continue.timer - MD array scrubbing - continuation
     Loaded: loaded (/lib/systemd/system/mdcheck_continue.timer; static; vendor preset: enabled)
     Active: inactive (dead)
    Trigger: n/a
   Triggers: ● mdcheck_continue.service

# systemctl enable mdcheck_continue.timer
The unit files have no installation config (WantedBy=, RequiredBy=, Also=,
Alias= settings in the [Install] section, and DefaultInstance= for template
units). This means they are not meant to be enabled using systemctl.

Possible reasons for having this kind of units are:
• A unit may be statically enabled by being symlinked from another unit's
  .wants/ or .requires/ directory.
• A unit's purpose may be to act as a helper for some other unit which has
  a requirement dependency on it.
• A unit may be started when needed via activation (socket, path, timer,
  D-Bus, udev, scripted systemctl call, ...).
• In case of template units, the unit is meant to be enabled with some
  instance name specified.

I suppose I need to open another bug.
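The `static` state above means the unit file has none of the `[Install]` settings that `systemctl enable` looks for (the error message lists them). Below is a hypothetical lint for that condition using only `awk`; `has_install_config` is an invented name, and the sketch ignores details such as comments, indented keys or trailing whitespace on section headers.

```shell
# Hypothetical lint: succeed if the unit file's [Install] section contains
# any of the keys that make "systemctl enable" meaningful.
has_install_config() {
    awk '
        # Track whether we are inside the [Install] section.
        /^\[/ { in_install = ($0 == "[Install]") }
        in_install && /^(WantedBy|RequiredBy|Also|Alias|DefaultInstance)=/ { found = 1 }
        # Exit 0 when a key was found, 1 otherwise.
        END { exit !found }
    ' "$1"
}
```

Run against the mdcheck_continue.timer shipped in 4.1-5ubuntu1.1, this check would be expected to fail, matching the `static` state reported by systemctl.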

Revision history for this message
Richard Laager (rlaager) wrote :
Revision history for this message
Eric Desrochers (slashd) wrote :

@richard @legioner

I'll wait to see if Richard's patches are adopted upstream and will re-upload at the beginning of next week, including "6636788a".

- Eric

Eric Desrochers (slashd)
description: updated
Revision history for this message
Eric Desrochers (slashd) wrote :

@legioner

# build logs
dh_installsystemd -pmdadm mdcheck_continue.timer mdcheck_start.timer mdmonitor-oneshot.timer
dh_installsystemd: error: Package 'mdadm' does not install unit ''.
make: *** [debian/rules:88: binary-arch] Error 25
dpkg-buildpackage: error: fakeroot debian/rules binary subprocess returned exit status 2

debhelper src code:
    # This hash prevents us from looping forever in the following
    # while loop. An actual real-world example of such a loop is
    # systemd's systemd-readahead-drop.service, which contains
    # Also=systemd-readahead-collect.service, and that file in turn
    # contains Also=systemd-readahead-drop.service, thus forming an
    # endless loop.
    my %seen;

    # Must use while and shift because the loop alters the list.
    while (@args) {
        my $unit = shift @args;
        my $path = "${tmpdir}/lib/systemd/system/${unit}";

        error("Package '$package' does not install unit '$unit'.") unless (-f $path);

I need to investigate what dh_installsystemd doesn't like about this.

Revision history for this message
Eric Desrochers (slashd) wrote :

The above was after applying commit "6636788a".

Revision history for this message
Eric Desrochers (slashd) wrote :

It could simply be a matter of ordering...

- dh_installsystemd -pmdadm mdcheck_continue.timer mdcheck_start.timer mdmonitor-oneshot.timer
+ dh_installsystemd -pmdadm mdcheck_start.timer mdcheck_continue.timer mdmonitor-oneshot.timer

Will try the following and see if my hypothesis is right.

Revision history for this message
Eric Desrochers (slashd) wrote :

Didn't work...

dh_installsystemd -pmdadm mdcheck_start.timer mdcheck_continue.timer mdmonitor-oneshot.timer
dh_installsystemd: error: Package 'mdadm' does not install unit ''.

Still investigating ...

Revision history for this message
Eric Desrochers (slashd) wrote :

For some reason, 'dh_installsystemd' doesn't like the "+Also= mdcheck_continue.timer" in mdcheck_start.timer. It builds fine without it.

Revision history for this message
Eric Desrochers (slashd) wrote :

It's probably detecting a loop, as explained in the 'dh_installsystemd' source I pasted earlier. I'll need to review all this.

Revision history for this message
Richard Laager (rlaager) wrote :

It might be unhappy about the extra space after the equals sign. Note that it is complaining about the empty string. If it is splitting on spaces, that would explain it.

Revision history for this message
Eric Desrochers (slashd) wrote :

It's not an extra space issue, I have tested it.

Still investigating; it seems dh_installsystemd inspects unit file directives to detect potential loops.

Revision history for this message
Eric Desrochers (slashd) wrote :

'systemd-analyze verify <UNIT>' reports nothing wrong with the 'offending' "Also=" directives.

root@groovymdadm:/tmp/md/groovy/mdadm-4.1# systemd-analyze verify /lib/systemd/system/mdcheck_continue.timer
/lib/systemd/system/plymouth-start.service:17: Unit configured to use KillMode=none. This is unsafe, as it disables systemd's process lifecycle management for the service. Please update your service to use a safer KillMode=, such as 'mixed' or 'control-group'. Support for KillMode=none is deprecated and will eventually be removed.
/lib/systemd/system/dbus.service:12: Unit configured to use KillMode=none. This is unsafe, as it disables systemd's process lifecycle management for the service. Please update your service to use a safer KillMode=, such as 'mixed' or 'control-group'. Support for KillMode=none is deprecated and will eventually be removed.

root@groovymdadm:/tmp/md/groovy/mdadm-4.1# systemd-analyze verify /lib/systemd/system/mdcheck_start.timer
/lib/systemd/system/plymouth-start.service:17: Unit configured to use KillMode=none. This is unsafe, as it disables systemd's process lifecycle management for the service. Please update your service to use a safer KillMode=, such as 'mixed' or 'control-group'. Support for KillMode=none is deprecated and will eventually be removed.
/lib/systemd/system/dbus.service:12: Unit configured to use KillMode=none. This is unsafe, as it disables systemd's process lifecycle management for the service. Please update your service to use a safer KillMode=, such as 'mixed' or 'control-group'. Support for KillMode=none is deprecated and will eventually be removed.

Revision history for this message
Eric Desrochers (slashd) wrote :

I ran it locally with no space, and it did the trick. Maybe something went wrong with my PPA.
Will give it another try with no spaces.

Revision history for this message
Eric Desrochers (slashd) wrote :

Yup, I confirm: removing the spaces does the trick. My first PPA test went wrong for some reason; with a fresh PPA it went well.

I'll submit the fix upstream, but will upload it to groovy now.

- Eric
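The error above names an empty unit (''), which fits the hypothesis that a value with a leading space was split into an empty first token plus the real unit name; that is an inference from the symptoms, not something confirmed from the debhelper source. systemd itself accepted the line (systemd-analyze verify reported nothing for it). Below is a small, hypothetical lint that flags and strips such `Key= value` assignments; the function names are invented for illustration.

```shell
# Hypothetical lint: list "Key= value" assignments (whitespace right after
# "=") with their line numbers. Exits non-zero when none are found.
find_space_after_equals() {
    grep -nE '^[A-Za-z]+=[[:space:]]+[^[:space:]]' "$1"
}

# Print the file with that whitespace removed, e.g.
# "Also= mdcheck_continue.timer" -> "Also=mdcheck_continue.timer".
strip_space_after_equals() {
    sed -E 's/^([A-Za-z]+=)[[:space:]]+/\1/' "$1"
}
```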

Eric Desrochers (slashd)
description: updated
Revision history for this message
Eric Desrochers (slashd) wrote :
Revision history for this message
Eric Desrochers (slashd) wrote :
Revision history for this message
Łukasz Zemczak (sil2100) wrote :

Hello Alexey, or anyone else affected,

Accepted mdadm into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/mdadm/4.1-5ubuntu1.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Revision history for this message
legioner (legioner) wrote :

Test finished =)

# dpkg -l mdadm | grep mdadm
ii mdadm 4.1-5ubuntu1.2 amd64 tool to administer Linux MD arrays (software RAID)

The problem with mdcheck_continue.timer is now fixed:

# systemctl status mdcheck_continue.timer
● mdcheck_continue.timer - MD array scrubbing - continuation
     Loaded: loaded (/lib/systemd/system/mdcheck_continue.timer; enabled; vendor preset: enabled)
     Active: active (running) since Tue 2020-09-29 09:25:28 +07; 35s ago
    Trigger: n/a
   Triggers: ● mdcheck_continue.service

Sep 29 09:25:28 fatman systemd[1]: Started MD array scrubbing - continuation.

Moreover, installing the latest version automatically continued the previous check.

# journalctl
...
Sep 29 09:25:28 fatman systemd[1]: Started MD array scrubbing - continuation.
Sep 29 09:25:28 fatman systemd[1]: Starting MD array scrubbing - continuation...
Sep 29 09:25:28 fatman systemd[1]: mdcheck_start.timer: Succeeded.
Sep 29 09:25:28 fatman systemd[1]: Stopped MD array scrubbing.
Sep 29 09:25:28 fatman systemd[1]: Stopping MD array scrubbing.
Sep 29 09:25:28 fatman systemd[1]: Started MD array scrubbing.
Sep 29 09:25:28 fatman systemd[1]: mdmonitor-oneshot.timer: Succeeded.
Sep 29 09:25:28 fatman systemd[1]: Stopped Reminder for degraded MD arrays.
Sep 29 09:25:28 fatman systemd[1]: Stopping Reminder for degraded MD arrays.
Sep 29 09:25:28 fatman systemd[1]: Started Reminder for degraded MD arrays.
Sep 29 09:25:28 fatman root[2983618]: mdcheck continue checking /dev/md0 from 1604272336
Sep 29 09:25:28 fatman kernel: md: data-check of RAID array md0
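The `root[...]` line records where the continued scrub resumed. The sketch below extracts the device and the resume position from such lines. It assumes the message keeps the exact `mdcheck continue checking <device> from <position>` wording shown above. The thread does not say what unit the position is in, so the sketch prints it unchanged. `mdcheck_resume_points` is a hypothetical name.

```shell
# Hypothetical helper: read journal text on stdin and print
# "<device> <position>" for each "mdcheck continue checking" message.
mdcheck_resume_points() {
  sed -n 's|.*mdcheck continue checking \(/dev/[^ ]*\) from \([0-9][0-9]*\).*|\1 \2|p'
}
```

Usage: `journalctl -b | mdcheck_resume_points`. For the log above this yields `/dev/md0 1604272336`.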

Revision history for this message
Richard Laager (rlaager) wrote :

I repeated my same test procedure. Everything worked as expected.

Revision history for this message
Eric Desrochers (slashd) wrote :

Many thanks, everyone, for your active participation.

I checked this morning, and Richard's patch hasn't been reviewed, approved, or merged upstream yet.

Since this is cosmetic and not a bug per se (unless I'm proven wrong), let's keep 'mdadm' in focal as is for now.

- Eric

tags: added: verification-done verification-done-focal
removed: verification-needed verification-needed-focal
Revision history for this message
Eric Desrochers (slashd) wrote :

[VERIFICATION FOCAL]

In addition to the two verifications done by Richard and legioner, here's an extra visual verification:

[4.1-5ubuntu1]

# systemd-analyze verify /lib/systemd/system/mdcheck_start.service
mdcheck_start.service: Command /usr/share/mdadm/mdcheck is not executable: No such file or directory

# systemd-analyze verify /lib/systemd/system/mdcheck_continue.service
mdcheck_continue.service: Command /usr/share/mdadm/mdcheck is not executable: No such file or directory

[4.1-5ubuntu1.2]
The "Command ... is not executable" error is now gone.

- Eric
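`systemd-analyze verify` flagged the unit because the first word of `ExecStart=` did not point to an executable file. The sketch below scripts only that one condition. It does not replace `systemd-analyze verify`. It reads the command path from each `ExecStart=` line, strips systemd's `-`, `@`, `+`, `!` and `:` prefixes in a simplified way, and tests whether the result is an executable file. `check_execstart` is a hypothetical name.

```shell
# Hypothetical helper: exit non-zero if any ExecStart= command in the
# given unit file is missing or not executable. Simplified: the first
# word after the prefixes is taken as the binary path.
check_execstart() {
  local status=0 cmd
  for cmd in $(sed -n 's/^ExecStart=[-@+!:]*\([^ ]*\).*/\1/p' "$1"); do
    if [ ! -f "$cmd" ] || [ ! -x "$cmd" ]; then
      echo "$1: $cmd is missing or not executable" >&2
      status=1
    fi
  done
  return "$status"
}
```

On 4.1-5ubuntu1, `check_execstart /lib/systemd/system/mdcheck_start.service` would be expected to fail because `/usr/share/mdadm/mdcheck` is absent. With 4.1-5ubuntu1.2 it should succeed.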

Revision history for this message
Eric Desrochers (slashd) wrote :

Actually, I marked the bug as verification-done too quickly ... one item that I wanted to see is still missing.

I'd like feedback on the 'natural' run that should happen on October 4 (Sunday).

I'll wait for your feedback.

For now, I'll switch the LP bug back to verification-needed.

- Eric
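This explanation of why the natural run falls on October 4 rests on an assumption. The shipped `mdcheck_start.timer` is assumed to use the upstream schedule `OnCalendar=Sun *-*-1..7 1:00:00`, meaning the first Sunday of each month at 01:00, together with a randomized start delay of up to a day. The timer file is not quoted in this thread. That delay would also explain the spread of start times reported further down. The sketch below uses GNU `date` to compute the first Sunday of a month. `first_sunday` is a hypothetical name.

```shell
# Hypothetical helper: print the first Sunday of a month as YYYY-MM-DD.
# Requires GNU date (-d). %u prints the ISO weekday, 1=Monday ... 7=Sunday.
first_sunday() {
  local year month day ymd
  year=$1
  month=${2#0}                      # accept "9" or "09"
  for day in 1 2 3 4 5 6 7; do
    ymd=$(printf '%04d-%02d-%02d' "$year" "$month" "$day")
    if [ "$(date -u -d "$ymd" +%u)" = 7 ]; then
      echo "$ymd"
      return 0
    fi
  done
  return 1
}
```

`first_sunday 2020 10` gives `2020-10-04`. `systemd-analyze calendar 'Sun *-*-1..7 1:00:00'` shows the next elapse of that calendar expression directly.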

tags: added: verification-needed verification-needed-focal
removed: verification-done verification-done-focal
Revision history for this message
Eric Desrochers (slashd) wrote :

Taken from my mdadm focal-proposed testbed:

$ systemctl list-timers | grep -i mdcheck
Sun 2020-10-04 04:47:32 UTC 2 days left n/a n/a mdcheck_start.timer mdcheck_start.service
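To pull only the next scheduled start out of such a listing, for example for a monitoring check, the sketch below can be used. It assumes the default `systemctl list-timers` column order shown above: NEXT (weekday, date, time, zone), then LEFT, LAST, PASSED, UNIT and ACTIVATES. Keying on the second-to-last field keeps it independent of how many words LEFT and LAST take up. `next_mdcheck_start` is a hypothetical name.

```shell
# Hypothetical helper: read `systemctl list-timers` output on stdin and
# print the NEXT date, time and zone of mdcheck_start.timer.
next_mdcheck_start() {
  awk '$(NF-1) == "mdcheck_start.timer" { print $2, $3, $4; exit }'
}
```

Usage: `systemctl list-timers --all | next_mdcheck_start`. For the listing above this prints `2020-10-04 04:47:32 UTC`.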

Revision history for this message
legioner (legioner) wrote :

Everything works as expected:

# systemctl status mdcheck_start.timer
● mdcheck_start.timer - MD array scrubbing
     Loaded: loaded (/lib/systemd/system/mdcheck_start.timer; enabled; vendor preset: enabled)
     Active: active (running) since Tue 2020-09-29 09:25:28 +07; 5 days ago
    Trigger: n/a
   Triggers: ● mdcheck_start.service

# systemctl status mdcheck_start.service
● mdcheck_start.service - MD array scrubbing
     Loaded: loaded (/lib/systemd/system/mdcheck_start.service; static; vendor preset: enabled)
     Active: activating (start) since Sun 2020-10-04 06:40:59 +07; 5h 17min ago
TriggeredBy: ● mdcheck_start.timer
   Main PID: 3696570 (mdcheck)
      Tasks: 2 (limit: 9481)
     Memory: 1.7M
     CGroup: /system.slice/mdcheck_start.service
             ├─3696570 /bin/bash /usr/share/mdadm/mdcheck --duration 6 hours
             └─3724586 sleep 120

Oct 04 06:40:59 fatman systemd[1]: Starting MD array scrubbing...
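While `mdcheck` is running, progress can be read from md's sysfs interface. The `sleep 120` child in the cgroup suggests that the script polls every two minutes. The sketch below assumes the kernel's `/sys/block/<md>/md/sync_completed` format: `<done> / <total>` in sectors, or `none` when no sync is running. `sync_percent` is a hypothetical name.

```shell
# Hypothetical helper: turn a sync_completed value ("<done> / <total>",
# or "none") into an integer percentage. Returns 1 for "none".
sync_percent() {
  [ "$1" = none ] && return 1
  set -- $1                          # split into done, "/", total
  [ "${3:-0}" -gt 0 ] || return 1
  echo $(( $1 * 100 / $3 ))
}
```

Usage: `sync_percent "$(cat /sys/block/md0/md/sync_completed)"`. The integer division rounds down.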

Revision history for this message
Richard Laager (rlaager) wrote :

The "natural start" succeeded on all 4 of my systems. The start times were 01:41, 10:50, 18:11, and 21:43.

tags: added: verification-done verification-done-focal
removed: verification-needed verification-needed-focal
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package mdadm - 4.1-5ubuntu1.2

---------------
mdadm (4.1-5ubuntu1.2) focal; urgency=medium

  * d/p/mdcheck-when-mdcheck-start-is-enabled-mdcheck-continue-too.patch:
    - When mdcheck_start is enabled, enable mdcheck_continue too.

mdadm (4.1-5ubuntu1.1) focal; urgency=medium

  * d/rules: Install misc/mdcheck (LP: #1852747)
    - The absence of mdcheck utility is preventing
      mdcheck_continue.service & mdcheck_start.service
      to execute the command when the service start via
      ExecStart. (Closes: #960132)

  * d/p/mdcheck-log-when-done.patch:
    - Make sure mdcheck logs the completion too, so that
      it can be determined how long the raid check took.

 -- Eric Desrochers <email address hidden> Mon, 28 Sep 2020 12:29:11 -0400
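The second 4.1-5ubuntu1.1 item makes `mdcheck` log completion as well as start, so that the length of a check can be determined. Once both log lines are at hand, the duration is a timestamp subtraction. The sketch below does this with GNU `date`. The timestamps must be in a form that `date -d` accepts, such as `2020-10-04 06:40:59 +0700`, so journal timestamps may need reformatting first. The thread does not show the wording of the completion message, so the sketch does not try to locate the lines itself. `check_duration` is a hypothetical name.

```shell
# Hypothetical helper: print the time between two timestamps as H:MM:SS.
# Requires GNU date; both arguments must be parseable by `date -d`.
check_duration() {
  local start end secs
  start=$(date -d "$1" +%s) || return 1
  end=$(date -d "$2" +%s) || return 1
  secs=$(( end - start ))
  printf '%d:%02d:%02d\n' $(( secs / 3600 )) $(( secs % 3600 / 60 )) $(( secs % 60 ))
}
```

Usage: `check_duration "2020-10-04 06:40:59 +0700" "2020-10-04 12:40:59 +0700"` prints `6:00:00`, which matches the `--duration 6 hours` bound seen in the running service.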

Changed in mdadm (Ubuntu Focal):
status: Fix Committed → Fix Released
Revision history for this message
Łukasz Zemczak (sil2100) wrote : Update Released

The verification of the Stable Release Update for mdadm has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Changed in mdadm (Debian):
status: New → Fix Released