Merge ~pguimaraes/maas:kvm-overcommit-fix-lp1842896 into ~blake-rouse/maas:fix-1843493

Proposed by Pedro Guimarães
Status: Rejected
Rejected by: Blake Rouse
Proposed branch: ~pguimaraes/maas:kvm-overcommit-fix-lp1842896
Merge into: ~blake-rouse/maas:fix-1843493
Diff against target: 110 lines (+72/-2)
2 files modified
src/maasserver/forms/pods.py (+55/-2)
src/maasserver/models/bmc.py (+17/-0)
Reviewer Review Type Date Requested Status
Blake Rouse Pending
Review via email: mp+372808@code.launchpad.net

Commit message

Fixes LP: #1842896. Following changes made on Pods form and model:
* created machine inherits Pod zone instead of Default value, since we are filtering Pods by zone just before calling get_allocated_composed_machine
* Pod model's create_machine now checks for overcommit ratios just before running the creation process, failing with PodProblem if overcommit limits arenot respected
* create_and_sync runs create_machine within transactional() method to avoid concurrency where multiple allocate requests ends up landing on same Pod and not respecting overcommit limits for that Pod

Description of the change

Fixes LP: #1842896. Following changes made on Pods form and model:
* created machine inherits Pod zone instead of Default value, since we are filtering Pods by zone just before calling get_allocated_composed_machine
* Pod model's create_machine now checks for overcommit ratios just before running the creation process, failing with PodProblem if overcommit limits arenot respected
* create_and_sync runs create_machine within transactional() method to avoid concurrency where multiple allocate requests ends up landing on same Pod and not respecting overcommit limits for that Pod

To post a comment you must log in.
Revision history for this message
Blake Rouse (blake-rouse) wrote :

I am actively working on proper fix for this, jsyk

Unmerged commits

8aad7b4... by Pedro Guimarães

Fixes LP: #1842896. Following changes made on Pods form and model:
* created machine inherits Pod zone instead of Default value, since we are filtering Pods by zone just before calling get_allocated_composed_machine
* Pod model's create_machine now checks for overcommit ratios just before running the creation process, failing with PodProblem if overcommit limits arenot respected
* create_and_sync runs create_machine within transactional() method to avoid concurrency where multiple allocate requests ends up landing on same Pod and not respecting overcommit limits for that Pod

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1diff --git a/src/maasserver/forms/pods.py b/src/maasserver/forms/pods.py
2index 85c1da2..54dc71f 100644
3--- a/src/maasserver/forms/pods.py
4+++ b/src/maasserver/forms/pods.py
5@@ -443,7 +443,7 @@ class ComposeMachineForm(forms.Form):
6 self.initial['domain'] = Domain.objects.get_default_domain()
7 self.fields['zone'] = ModelChoiceField(
8 required=False, queryset=Zone.objects.all())
9- self.initial['zone'] = Zone.objects.get_default_zone()
10+ self.initial['zone'] = self.pod.zone
11 self.fields['pool'] = ModelChoiceField(
12 required=False, queryset=ResourcePool.objects.all())
13 self.initial['pool'] = self.pod.pool
14@@ -639,6 +639,59 @@ class ComposeMachineForm(forms.Form):
15 self.pod.sync_hints(pod_hints)
16 return created_machine
17
18+ def create_and_sync_nonIOThread(result, post_pod_sync=False):
19+ requested_machine, result = result
20+ discovered_machine, pod_hints = result
21+
22+ @asynchronous
23+ def wrap_create_machine(func,
24+ discovered_machine,
25+ commissioning_user,
26+ skip_commissioning,
27+ creation_type,
28+ interfaces,requested_machine,
29+ domain,pool,zone):
30+ d = deferToDatabase(transactional(
31+ partial(
32+ func,
33+ discovered_machine=discovered_machine,
34+ commissioning_user=commissioning_user,
35+ skip_commissioning=skip_commissioning,
36+ creation_type=creation_type,
37+ interfaces=interfaces,
38+ requested_machine=requested_machine,
39+ domain=domain,
40+ pool=pool,
41+ zone=zone,
42+ )
43+ ))
44+ return d
45+ try:
46+ created_machine = wrap_create_machine(
47+ self.pod.create_machine,
48+ discovered_machine=discovered_machine,
49+ commissioning_user=self.request.user,
50+ skip_commissioning=skip_commissioning,
51+ creation_type=creation_type,
52+ interfaces=self.get_value_for('interfaces'),
53+ requested_machine=requested_machine,
54+ domain=self.get_value_for('domain'),
55+ pool=self.get_value_for('pool'),
56+ zone=self.get_value_for('zone'),
57+ ).wait(timeout)
58+ except crochet.TimeoutError:
59+ log.debug("Compose: wrap_create_machine timed out")
60+ raise Exception(
61+ "Unable to compose a machine because database driver "
62+ "timed out after %d seconds." % (
63+ timeout))
64+
65+ post_commit_do(
66+ deferToDatabase,
67+ transactional(self.pod.sync_hints), pod_hints)
68+ log.info("Composition: Result received is: {}".format(created_machine))
69+ return created_machine
70+
71 @inlineCallbacks
72 def async_compose_machine(
73 result, power_type, power_paramaters, **kwargs):
74@@ -706,7 +759,7 @@ class ComposeMachineForm(forms.Form):
75 "timed out after %d seconds." % (
76 self.pod.power_type, timeout))
77 try:
78- return create_and_sync(
79+ return create_and_sync_nonIOThread(
80 (requested_machine, result), post_pod_sync=True)
81 except Exception:
82 # Issue saving the machine into the database. The composed
83diff --git a/src/maasserver/models/bmc.py b/src/maasserver/models/bmc.py
84index 5b76b6c..b312dbf 100644
85--- a/src/maasserver/models/bmc.py
86+++ b/src/maasserver/models/bmc.py
87@@ -783,6 +783,23 @@ class Pod(BMC):
88 skip_commissioning=False,
89 creation_type=NODE_CREATION_TYPE.PRE_EXISTING, interfaces=None,
90 requested_machine=None, **kwargs):
91+
92+ # TODO: Both check_over_commit_ratios and following code to create
93+ # machine should be run within same DB atomic transaction.
94+ # that will ensure that a single thread is adding machines
95+ # to this KVM host.
96+ # TODO action is to try detect if we're under
97+ # transaction.atomic()
98+ # if not, generate LOG warning in case with:
99+ # "models.bmc.pod.create_machine should always run within
100+ # transaction.atomic(): <ADD STACK TRACE>"
101+ msg = self.check_over_commit_ratios(discovered_machine.cores,
102+ discovered_machine.memory)
103+ if msg:
104+ raise PodProblem(
105+ "Unable to compose KVM instance in '%s'. %s" % (
106+ self.name, msg))
107+
108 """Create's a `Machine` from `discovered_machines` for this pod."""
109 if skip_commissioning:
110 status = NODE_STATUS.READY

Subscribers

People subscribed via source and target branches

to all changes: