Merge lp:~nuclearbob/utah/cobbler-pidlock into lp:utah

Proposed by Max Brustkern
Status: Rejected
Rejected by: Max Brustkern
Proposed branch: lp:~nuclearbob/utah/cobbler-pidlock
Merge into: lp:utah
Diff against target: 90 lines (+44/-26)
1 file modified
utah/provisioning/inventory/sqlite.py (+44/-26)
To merge this branch: bzr merge lp:~nuclearbob/utah/cobbler-pidlock
Reviewer Review Type Date Requested Status
Max Brustkern (community) Disapprove
Review via email: mp+129046@code.launchpad.net

Description of the change

The original cobbler inventory used 'available' and 'provisioned' as states. If a job timed out and left the machine in the 'provisioned' state, there's no automatic process to recover that. This changes the inventory to use the pid of the process requesting the machine as the state, and if a machine is requested and that pid is no longer active on the system, the machine will be claimed anyway. If the listed pid is still active, the machine will not be used. I've tested it in magners-orchestra with the dx team's jenkins job. It could cause problems if multiple provisioning hosts were trying to use the same sqlite database to manage physical machines, but I think that scenario would introduce other problems as well. If we're dealing with multiple hosts trying to provision the same pool of machines, a more robust solution than sqlite is probably warranted.

To post a comment you must log in.
Revision history for this message
Joe Talbott (joetalbott) wrote :

Keep in mind that on a highly loaded machine PIDs get recycled so you might have a whole different process getting an available PID that is in the inventory database.

Revision history for this message
Javier Collado (javier.collado) wrote :

To take Joe's comment into account, you might want to check the process command line:

if psutil.pid_exists(pid) and 'run_utah_test.py' in '.join(psutil.Process(pid).cmdline):
    <machine is being used>

utah already depends on psutil, so this won't add an extra dependency to the package.

Revision history for this message
Javier Collado (javier.collado) wrote :

One more comment is that if you plan to use the pid to track machine reservation,
then the column name in the table should be renamed to pid.

lp:~nuclearbob/utah/cobbler-pidlock updated
708. By Max Brustkern

Adding command line check

709. By Max Brustkern

Added check for "run_cobbler_test.py"

710. By Max Brustkern

Changed run_cobbler_test to run_test_cobbler

Revision history for this message
Max Brustkern (nuclearbob) wrote :

I updated the check to look for "utah" or "run_test_cobbler.py" on the command line. I'm testing now. The thing about changing the name of the field is that it still could be "available" which isn't a pid. What's a good name that indicates it can either be a state or a pid? Or can we change available to some number that will never be a pid, like -1?

Revision history for this message
Joe Talbott (joetalbott) wrote :

On Thu, Oct 11, 2012 at 01:24:20PM -0000, Max Brustkern wrote:
> I updated the check to look for "utah" or "run_test_cobbler.py" on the command line. I'm testing now. The thing about changing the name of the field is that it still could be "available" which isn't a pid. What's a good name that indicates it can either be a state or a pid? Or can we change available to some number that will never be a pid, like -1?

What about changing the value to 'available - <pid>'? Then you can string
compare on 'available' and easily parse out the pid for doing the check.
> --
> https://code.launchpad.net/~nuclearbob/utah/cobbler-pidlock/+merge/129046
> Your team UTAH Dev is requested to review the proposed merge of lp:~nuclearbob/utah/cobbler-pidlock into lp:utah.

Revision history for this message
Javier Collado (javier.collado) wrote :

@Max

I think this looks like two different columns are needed. One for the state
and one for the pid and that the pid is used to check if the state can be trusted.
Encoding both things in one column could lead to confusion.

Another improvement could be to have a new table named states with all the possible
states as strings and replace the state column in the machines with state_id to be
used as a foreign key.

Finally, I'm not sure if this will grow in complexity in the future, but have you
thought about using an ORM instead of write sql commands directly?

Revision history for this message
Max Brustkern (nuclearbob) wrote :

I'll rework this and resubmit later.

review: Disapprove

Unmerged revisions

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'utah/provisioning/inventory/sqlite.py'
2--- utah/provisioning/inventory/sqlite.py 2012-08-22 15:20:25 +0000
3+++ utah/provisioning/inventory/sqlite.py 2012-10-11 13:23:20 +0000
4@@ -2,6 +2,7 @@
5
6 import sqlite3
7 import os
8+import psutil
9 from utah.provisioning.inventory.exceptions import \
10 UTAHProvisioningInventoryException
11 from utah.provisioning.inventory.inventory import Inventory
12@@ -112,35 +113,52 @@
13 else:
14 for machineinfo in result:
15 cargs = dict(machineinfo)
16- machineid = cargs.pop('machineid')
17- name = cargs.pop('name')
18- state = cargs.pop('state')
19- if state == 'available':
20- update = self.connection.execute(
21- "UPDATE machines SET state='provisioned' "
22- "WHERE machineid=? AND state='available'",
23- [machineid]).rowcount
24- if update == 1:
25- machine = CobblerMachine(*args, cargs=cargs,
26- inventory=self, name=name, **kw)
27- self.machines.append(machine)
28- return machine
29- elif update == 0:
30- raise UTAHProvisioningInventoryException(
31- 'Machine was requested by another process '
32- 'before we could request it')
33- elif update > 1:
34- raise UTAHProvisioningInventoryException(
35- 'Multiple machines exist '
36- 'matching those criteria; '
37- 'database ' + self.db + ' may be corrupt')
38- else:
39- raise UTAHProvisioningInventoryException(
40- 'Negative rowcount returned '
41- 'when attempting to request machine')
42+ if cargs['state'] == 'available':
43+ return self._take(cargs, *args, **kw)
44+ for machineinfo in result:
45+ cargs = dict(machineinfo)
46+ name = cargs['name']
47+ state = cargs['state']
48+ try:
49+ pid = int(state)
50+ try:
51+ if not (psutil.pid_exists(pid) and ('utah' in
52+ ' '.join(psutil.Process(pid).cmdline)
53+ or 'run_test_cobbler.py' in
54+ ' '.join(psutil.Process(pid).cmdline))):
55+ return self._take(cargs, *args, **kw)
56+ except ValueError:
57+ continue
58+
59 raise UTAHProvisioningInventoryException(
60 'All machines meeting criteria are currently unavailable')
61
62+ def _take(self, cargs, *args, **kw):
63+ machineid = cargs.pop('machineid')
64+ name = cargs.pop('name')
65+ state = cargs.pop('state')
66+ update = self.connection.execute(
67+ "UPDATE machines SET state=? WHERE machineid=? AND state=?",
68+ [os.getpid(), machineid, state]).rowcount
69+ if update == 1:
70+ machine = CobblerMachine(*args, cargs=cargs,
71+ inventory=self, name=name, **kw)
72+ self.machines.append(machine)
73+ return machine
74+ elif update == 0:
75+ raise UTAHProvisioningInventoryException(
76+ 'Machine was requested by another process '
77+ 'before we could request it')
78+ elif update > 1:
79+ raise UTAHProvisioningInventoryException(
80+ 'Multiple machines exist '
81+ 'matching those criteria; '
82+ 'database ' + self.db + ' may be corrupt')
83+ else:
84+ raise UTAHProvisioningInventoryException(
85+ 'Negative rowcount returned '
86+ 'when attempting to request machine')
87+
88 def release(self, machine=None, name=None):
89 if machine is not None:
90 name = machine.name

Subscribers

People subscribed via source and target branches