Hot-Add Memory failing for lack of udev rule

Bug #1233466 reported by Abhishek Gupta
28
This bug affects 3 people
Affects Status Importance Assigned to Milestone
udev
Unknown
Unknown
linux (Ubuntu)
Triaged
Medium
Andy Whitcroft
systemd (Ubuntu)
Fix Released
Medium
Andy Whitcroft

Bug Description

It appears that the Hot-Add mechanism is not working because of lack of a udev rule. The required rule is attached to this bug. Please include it.

Furthermore, please ensure that the following patches are included:

https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/drivers/hv?id=20138d6cb838aa01bb1b382dcb5f3d3a119ff2cb

https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/drivers/hv?id=c5e2254f8d63a6654149aa32ac5f2b7dd66a976d

https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/drivers/hv?id=ed07ec93e83ec471d365ce084e43ad90fd205903

The above patches should be in but just wanted to ensure that you guys got these.

Thanks again for the great last minute help.
Abhishek

Revision history for this message
Abhishek Gupta (abgupta) wrote :
information type: Public → Public Security
information type: Public Security → Public
Revision history for this message
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1233466

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Revision history for this message
Abhishek Gupta (abgupta) wrote :

Log files not needed as solution included in bug report.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Abhishek Gupta (abgupta) wrote :

Being a little proactive and adding Andy as well as he is fixing the other bugs. Hope that is ok Ben.

Changed in linux (Ubuntu):
importance: Undecided → Medium
tags: added: kernel-da-key
Andy Whitcroft (apw)
Changed in linux (Ubuntu):
status: Confirmed → In Progress
assignee: nobody → Andy Whitcroft (apw)
Revision history for this message
Andy Whitcroft (apw) wrote :

These commits from mainline which are in v3.12-rc1, v3.11-rc3, and v3.11-rc3 respectively:

  commit 20138d6cb838aa01bb1b382dcb5f3d3a119ff2cb
  Author: K. Y. Srinivasan <email address hidden>
  Date: Wed Jul 17 17:27:27 2013 -0700

    Drivers: hv: balloon: Initialize the transaction ID just before sending the packet

  commit c5e2254f8d63a6654149aa32ac5f2b7dd66a976d
  Author: K. Y. Srinivasan <email address hidden>
  Date: Sun Jul 14 22:38:12 2013 -0700

    Drivers: hv: balloon: Do not post pressure status if interrupted

  commit ed07ec93e83ec471d365ce084e43ad90fd205903
  Author: K. Y. Srinivasan <email address hidden>
  Date: Sun Jul 14 22:38:11 2013 -0700

    Drivers: hv: balloon: Fix a bug in the hot-add code

Revision history for this message
Abhishek Gupta (abgupta) wrote :

Could someone also ack that the udev rule will be included? Just want to make sure you guys noticed it. Furthermore, this is high pri than medium from our perspective.

Revision history for this message
Andy Whitcroft (apw) wrote :

Confirmed that two of the kernel patches are already applied, pulled the third in and sent kernels for testing.

Changed in systemd (Ubuntu):
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Andy Whitcroft (apw)
Changed in linux (Ubuntu):
status: In Progress → Fix Committed
Revision history for this message
Andy Whitcroft (apw) wrote :

Ok I have built a test kernel containing the missing patch above which is here:

    http://people.canonical.com/~apw/hyperv-saucy/

I have also attempted to add the required rule to udev which is here:

    http://people.canonical.com/~apw/hyperv-saucy/udev/

If you could test those and confirm that this combination works as expected.

Please note I have made some changes to the udev rule to firstly make it use modern ATTR manipulation, and secondly to only apply it for hyper-v guests. Please report any testing back here.

Revision history for this message
Abhishek Gupta (abgupta) wrote :

Testing now.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.11.0-11.17

---------------
linux (3.11.0-11.17) saucy; urgency=low

  [ Andy Whitcroft ]

  * [Packaging] tools -- only build common tools package in master
  * SAUCE: tools -- report when tool is not available
    - LP: #1233376
  * [Packaging] tools -- common tools must carry all possible tools
    - LP: #1233376

  [ Colin Ian King ]

  * [Config] Fix power and performance regression
    - LP: #1233681

  [ Colin Watson ]

  * [Config] Clean up various udeb Provides

  [ Tim Gardner ]

  * rebase to v3.11.3
  * Release tracker
    - LP: #1233808

  [ Upstream Kernel Changes ]

  * ALSA: hda - Add CS4208 codec support for MacBook 6,1 and 6,2
    - LP: #1233623
  * ALSA: hda - Add fixup for MacBook Air 6,1 and 6,2 with CS4208 codec
    - LP: #1233623
  * NVMe: Remove "process_cq did something" message
    - LP: #1233686
  * Drivers: hv: balloon: Initialize the transaction ID just before sending the packet
    - LP: #1233466
  * cciss: fix info leak in cciss_ioctl32_passthru()
    - LP: #1188355
    - CVE-2013-2147
  * cpqarray: fix info leak in ida_locked_ioctl()
    - LP: #1188355
    - CVE-2013-2147
  * Drivers: hv: util: Fix a bug in version negotiation code for util services
    - LP: #1233433
 -- Tim Gardner <email address hidden> Tue, 01 Oct 2013 07:24:27 -0600

Changed in linux (Ubuntu):
status: Fix Committed → Fix Released
Revision history for this message
Abhishek Gupta (abgupta) wrote :

I am still testing the fix. It appears that the udev rule installs correctly but I see two issues:

a) Memory demand from Ubuntu 13.10 is extraordinarily high on Hyper-V. It is likely to be a bug on our side and we will work on it.
b) After applying all the packages under udev, our mouse driver is disabled for some reason.

Please hold until tomorrow for more results.
Abhishek

Revision history for this message
Martin Pitt (pitti) wrote :

Please don't apply this udev rule as it is:

 * It can be made *much* more efficient and avoid an extra shell with

     SUBSYSTEM=="memory", ACTION=="add", DEVPATH=="/devices/system/memory/memory*[0-9]", ATTR{online}="1"

   (same for the other rule)

 * In this generic way they don't look plausible. If we *always* want hotplugged CPUs and memory to be online (which sounds plausible), then the kernel should do just that. It shouldn't require lots of extra userspace overhead to always set a driver attribute to a constant value. It is certainly ok as a workaround until the kernel gets fixed properly, though.

 * If this applies to Hyper-V only, it would be better to surround these rules with some check if the system is actually running on this hypervisor. This might happen via DMI data (like ATTR{[dmi/id]product_version}=="hyper-v"), the presence of a Hyper-V specific device/subsystem in sysfs, and so on. This ATTR{[subsystem/device]attribute} syntax works generically by the way, so that you can look at a device different than the one you are currently selecting.

Revision history for this message
Abhishek Gupta (abgupta) wrote :

Hi there! The udev rule works but after installing the following udev package the following drivers do not load anymore:

1. hyperv-fb
2. hv_utils
3. hv_balloon

Is this just a side effect or something else is wrong? Please let us know.

Thanks,
Abhishek

Revision history for this message
Andy Whitcroft (apw) wrote :

@Pitti -- for completeness this is the proposed udev rule which is being tested:

# On Hyper-V Virtual Machines we want to add memory and cpus on demand.
ATTR{[dmi/id]sys_vendor}!="Microsoft Corporation", GOTO="hyperv_hotadd_end"
ATTR{[dmi/id]product_name}!="Virtual Machine", GOTO="hyperv_hotadd_end"

# Memory hotadd request
SUBSYSTEM=="memory", ACTION=="add", DEVPATH=="/devices/system/memory/memory*[0-9]", TEST=="state", ATTR{state}="online"

# CPU hotadd request
SUBSYSTEM=="cpu", ACTION=="add", DEVPATH=="/devices/system/cpu/cpu*[0-9]", TEST=="online", ATTR{online}="1"

LABEL="hyperv_hotadd_end"

Revision history for this message
Abhishek Gupta (abgupta) wrote :
Download full text (3.9 KiB)

Hi guys,

We went the alternate route of trying to convince kernel folks of fixing this problem through a patch. Unfortunately, the patch was rejected. The discussion involved is quoted below. Please also note that in the above patch we can remove the CPU hotadd request command as that is extraneous:

# CPU hotadd request
 SUBSYSTEM=="cpu", ACTION=="add", DEVPATH=="/devices/system/cpu/cpu*[0-9]", TEST=="online", ATTR{online}="1"

Please include the memory hotadd patch as not including it may lead to kernel panics related to memory management.

Thanks,
Abhishek

-----Original Message-----
From: Dave Hansen [mailto:<email address hidden>]
Sent: Thursday, July 25, 2013 9:35 AM
To: Kay Sievers
Cc: KY Srinivasan; Michal Hocko; <email address hidden>; <email address hidden>; <email address hidden>; <email address hidden>; <email address hidden>; <email address hidden>; <email address hidden>; <email address hidden>; <email address hidden>; <email address hidden>; <email address hidden>; <email address hidden>
Subject: Re: [PATCH 1/1] Drivers: base: memory: Export symbols for onlining memory blocks

On 07/25/2013 08:15 AM, Kay Sievers wrote:
> Complexity, well, it's just a bit of code which belongs in the kernel.
> The mentioned unconditional hotplug loop through userspace is
> absolutely pointless. Such defaults never belong in userspace tools if
> they do not involve data that is only available in userspace and
> something would make a decision about that. Saying "hello" to
> userspace and usrspace has a hardcoded "yes" in return makes no sense
> at all. The kernel can just go ahead and do its job, like it does for
> all other devices it finds too.

Sorry, but memory is different than all other devices. You never need a mouse in order to add another mouse to the kernel.

I'll repaste something I said earlier in this thread:

> A system under memory pressure is going to have troubles doing a
> hot-add. You need memory to add memory. Of the two operations ("add"
> and "online"), "add" is the one vastly more likely to fail. It has to
> allocate several large swaths of contiguous physical memory. For that
> reason, the system was designed so that you could "add" and "online"
> separately. The intention was that you could "add" far in advance and
> then "online" under memory pressure, with the "online" having *VASTLY*
> smaller memory requirements and being much more likely to succeed.

So, no, it makes no sense to just have userspace always unconditionally online all the memory that the kernel adds. But, the way it's set up, we _have_ a method that can work under lots memory pressure, and it is available for users that want it. It was designed 10 years ago, and maybe it's outdated, or history has proved that nobody is going to use it the way it was designed.

If I had it to do over again, I'd probably set up configurable per-node sets of spare kernel metadata. That way, you could say "make sure we have enough memory reserved to add $FOO sections to node $BAR". Use that for the largest allocations, then depend on PF_MEMALLOC to get us enough for the other little bits along the way.

Also, if this is a problem, it's going to be a problem for *EVERY...

Read more...

Revision history for this message
Martin Pitt (pitti) wrote :

Abhishek Gupta, did you really quote the right mails? Because those mails are just proving my point, it makes no sense to have an unconditional userspace rule which is just there to always enable new memory. To the contrary, as this is always going to involve more memory allocation (like it might spawn a new udev worker), it's much more prone to fail in userspace than in kernel space. This would make more sense if, as the thread suggests, you want to pre-add memory without activating it yet (I have no idea why this would be useful, but I don't claim to have a full understanding of all the use cases here), but then we shouldn't have that udev rule.

> Please include the memory hotadd patch

That's another kernel patch not yet covered by the initial set of three (which should already be in precise)?

Revision history for this message
Abhishek Gupta (abgupta) wrote :

Hi Martin, I will circle back with KY and reply by tomorrow morning on this. What we are trying to convey here is that this was discussed with the kernel folks and they shot down the kernel patch which would have got rid of the udev rule.

> Please include the memory hotadd patch
I apologize for not being clear In the above statement. I meant to imply that please include the udev rule we have provided as without it we will run in to kernel panics.

If you absolutely do not want to include the udev rule we can send you the kernel patch which was debated upon and perhaps you guys can carry that until further resolution with the kernel community.

Let me know what you think.
Thanks,
Abhishek

Revision history for this message
Abhishek Gupta (abgupta) wrote :

Btw, another thought is that we do realize that the udev rule is not the best solution here but given the release time frame perhaps we can compromise a little. We will be more than happy to work with you to provide an alternate solution. However releasing 13.10 with risks of panics is more worrisome and we might want to avoid that.

Revision history for this message
Martin Pitt (pitti) wrote :

Yes, I think Andy's udev rule from comment 14 is an acceptable distro-level compromise for 13.10, until that gets solved more cleanly. I didn't see a confirmation from you that this worked, is that the one which got tested?

> If you absolutely do not want to include the udev rule we can send you the kernel patch which was debated upon and perhaps you guys can carry that until further resolution with the kernel community.

It would be interesting to see, at least for me.

Thank you!

Revision history for this message
Martin Pitt (pitti) wrote :

BTW, where does this rule need to live, in the hypervisor or in the guests? In the latter case I guess we should ship it in udev, in the former case they hypervisor package could just ship it? That would avoid the extra rule for other installs.

Revision history for this message
Abhishek Gupta (abgupta) wrote :

Hi Martin, Thanks for accommodating the request for the udev rule. The rule needs to live in the guest as Hyper-V does not understand udev rules.

Yes, we did test with Andy's udev rule from comment 14 and it seems to work. Only issue we saw was that it seemed to suppress some of our other drivers such as:

1. hyperv-fb
2. hv_utils
3. hv_balloon

It would be really helpful if you guys could do some unit testing to ensure that our drivers are loaded even after adding the udev rule then that would be great.

The entire patch along with complete conversation is available here:

https://groups.google.com/forum/#!msg/linux.kernel/AxvyuQjr4GY/TLC-K0sL_NEJ

Please let me know if you need further information.
Thanks,
Abhishek

Revision history for this message
Abhishek Gupta (abgupta) wrote :

In my last comment, I meant to say that the patch that got rejected on lkml is at:

https://groups.google.com/forum/#!msg/linux.kernel/AxvyuQjr4GY/TLC-K0sL_NEJ

Thanks,
Abhishek

Revision history for this message
Martin Pitt (pitti) wrote :

Time is running out for saucy, so I'll upload the rule from comment 14 now, with some correction: I take it you actually mean "memory[0-9]*" instead of "memory*[0-9]" (same for CPU). @Abhishek, I don't see how these rules would cause any other kernel module to suddenly not load? All they do is to change a sysfs attribute of an existing device.

Martin Pitt (pitti)
Changed in systemd (Ubuntu):
status: In Progress → Fix Committed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package systemd - 204-0ubuntu18

---------------
systemd (204-0ubuntu18) saucy; urgency=low

  * Configure with --disable-silent-rules, as per request from Matthias.
  * Add debian/extra/rules/40-hyperv-hotadd.rules: On Hyper-V VMs, bring new
    CPU and memory devices online as soon as they appear. This is not quite an
    ideal solution, but an unintrusive compromise for Saucy. (LP: #1233466)
 -- Martin Pitt <email address hidden> Sat, 12 Oct 2013 12:16:29 +0200

Changed in systemd (Ubuntu):
status: Fix Committed → Fix Released
Revision history for this message
Martin Pitt (pitti) wrote :

Reopening the kernel task here. We've carried the "temporary" udev rule workaround for three releases now; it seems we all agree that it's wrong (unconditional "unbreak my kernel rule") and makes things even worse under memory pressure. Is there any better solution in sight? Thanks!

Changed in linux (Ubuntu):
status: Fix Released → Confirmed
Revision history for this message
Benjamin Drung (bdrung) wrote :

Can we also enable the hotplug udev rules for QEMU guests? See the attached git format-patch.

tags: added: patch
Revision history for this message
Martin Pitt (pitti) wrote :

Maintaining an ever-growing list of vendors is error-prone and TBH quite silly. Are there any CPUs where we *don't* want hot-add? If not, why can't the kernel just default to enabling this, instead of userspace unconditionally turning this on? But if hot-add is supposed to work everywhere, then for SRUs at least the udev rule could be simplified to drop the vendor check.

Revision history for this message
Benjamin Drung (bdrung) wrote :

I fail to come up with a use-case that just wants to add CPU/RAM without onlining it sooner or later. For virtual environments, you add CPU/RAM to use it and therefore want to online it once it appears. In a cloud environment (like ProfitBricks), it makes no sense to add CPU/RAM without using it, because you have to pay for it.

Hot-add is supposed to work on recent kernels (>= 3.10). Kernel 3.2 had an issue, but I can't remember if adding CPU/RAM or onlining it causes the issue.

I would be happy to unconditionally turning hot-add on and SRU the udev rule back to Ubuntu 14.04.

Benjamin Drung (bdrung)
affects: linux → udev
Revision history for this message
Benjamin Drung (bdrung) wrote :
Revision history for this message
Martin Pitt (pitti) wrote :

> I would be happy to unconditionally turning hot-add on and SRU the udev rule back to Ubuntu 14.04.

As I (and Kay) said, this is okay for an SRU, but unconditionally second-guessing/changing the kernel defaults is just wrong and inefficient too. For yakkety and onwards this *really* should be fixed in the kernel (preferrably upstream too), and that silly rule be dropped.

Changed in linux (Ubuntu):
status: Confirmed → Triaged
Revision history for this message
Benjamin Drung (bdrung) wrote :

Any updates on this?

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.