Merge lp:~rvb/maas/pool-size-1.8 into lp:maas/1.8
- pool-size-1.8
- Merge into 1.8
Status: | Merged |
---|---|
Approved by: | Raphaël Badin |
Approved revision: | no longer in the source branch. |
Merged at revision: | 4021 |
Proposed branch: | lp:~rvb/maas/pool-size-1.8 |
Merge into: | lp:maas/1.8 |
Diff against target: |
193 lines (+44/-10) 6 files modified
src/maasserver/api/nodes.py (+2/-3) src/maasserver/plugin.py (+14/-0) src/maasserver/tests/test_plugin.py (+14/-2) src/maasserver/websockets/handlers/node.py (+3/-4) src/provisioningserver/drivers/power/__init__.py (+9/-0) src/provisioningserver/rpc/power.py (+2/-1) |
To merge this branch: | bzr merge lp:~rvb/maas/pool-size-1.8 |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Raphaël Badin (community) | Approve | ||
Review via email: mp+264390@code.launchpad.net |
Commit message
Backport 4072: Set the maximum thread pool size to 100: this is meant as a workaround for deadlocks such as the one described in the linked bug. It's not meant to be a permanent solution but rather to buy us time to come up with a proper solution. Unify the power query timeout: use a unique timeout across the board and use a value that can accommodate seamicro's tendency to take 30s to reply to a power query.
Description of the change
MAAS Lander (maas-lander) wrote : | # |
The attempt to merge lp:~rvb/maas/pool-size-1.8 into lp:maas/1.8 failed. Below is the output from the failed tests.
Ign http://
Get:1 http://
Ign http://
Get:2 http://
Get:3 http://
Get:4 http://
Get:5 http://
Get:6 http://
Hit http://
Hit http://
Ign http://
Hit http://
Get:7 http://
Hit http://
Get:8 http://
Hit http://
Hit http://
Hit http://
Hit http://
Hit http://
Hit http://
Get:9 http://
Get:10 http://
Get:11 http://
Get:12 http://
Get:13 http://
Get:14 http://
Get:15 http://
Hit http://
Hit http://
Get:16 http://
Get:17 http://
Ign http://
Ign http://
Fetched 1,155 kB in 16min 35s (1,159 B/s)
W: Failed to fetch http://
W: Failed to fetch http://
E: Some index files failed to download. They have been ignored, or old ones used instead.
MAAS Lander (maas-lander) wrote : | # |
The attempt to merge lp:~rvb/maas/pool-size-1.8 into lp:maas/1.8 failed. Below is the output from the failed tests.
Ign http://
Get:1 http://
Get:2 http://
Ign http://
Ign http://
Hit http://
Get:3 http://
Get:4 http://
Get:5 http://
Get:6 http://
Hit http://
Hit http://
Get:7 http://
Hit http://
Get:8 http://
Hit http://
Hit http://
Hit http://
Hit http://
Hit http://
Hit http://
Get:9 http://
Get:10 http://
Get:11 http://
Get:12 http://
Get:13 http://
Hit http://
Hit http://
Get:14 http://
Ign http://
Ign http://
Fetched 1,604 kB in 11min 55s (2,243 B/s)
W: Failed to fetch http://
E: Some index files failed to download. They have been ignored, or old ones used instead.
MAAS Lander (maas-lander) wrote : | # |
The attempt to merge lp:~rvb/maas/pool-size-1.8 into lp:maas/1.8 failed. Below is the output from the failed tests.
Ign http://
Get:1 http://
Get:2 http://
Get:3 http://
Get:4 http://
Get:5 http://
Get:6 http://
Hit http://
Hit http://
Ign http://
Ign http://
Hit http://
Get:7 http://
Hit http://
Get:8 http://
Hit http://
Hit http://
Hit http://
Hit http://
Hit http://
Hit http://
Get:9 http://
Get:10 http://
Get:11 http://
Get:12 http://
Hit http://
Hit http://
Get:13 http://
Ign http://
Ign http://
Fetched 1,882 kB in 9min 57s (3,151 B/s)
W: Failed to fetch http://
E: Some index files failed to download. They have been ignored, or old ones used instead.
MAAS Lander (maas-lander) wrote : | # |
The attempt to merge lp:~rvb/maas/pool-size-1.8 into lp:maas/1.8 failed. Below is the output from the failed tests.
Ign http://
Get:1 http://
Get:2 http://
Ign http://
Ign http://
Hit http://
Get:3 http://
Hit http://
Get:4 http://
Get:5 http://
Get:6 http://
Get:7 http://
Get:8 http://
Hit http://
Hit http://
Get:9 http://
Hit http://
Hit http://
Hit http://
Hit http://
Hit http://
Hit http://
Get:10 http://
Get:11 http://
Get:12 http://
Get:13 http://
Hit http://
Hit http://
Get:14 http://
Ign http://
Ign http://
Fetched 1,732 kB in 11min 26s (2,521 B/s)
W: Failed to fetch http://
E: Some index files failed to download. They have been ignored, or old ones used instead.
MAAS Lander (maas-lander) wrote : | # |
The attempt to merge lp:~rvb/maas/pool-size-1.8 into lp:maas/1.8 failed. Below is the output from the failed tests.
E: Could not get lock /var/lib/
E: Unable to lock directory /var/lib/apt/lists/
MAAS Lander (maas-lander) wrote : | # |
The attempt to merge lp:~rvb/maas/pool-size-1.8 into lp:maas/1.8 failed. Below is the output from the failed tests.
Ign http://
Get:1 http://
Get:2 http://
Ign http://
Ign http://
Hit http://
Get:3 http://
Hit http://
Get:4 http://
Get:5 http://
Get:6 http://
Get:7 http://
Hit http://
Hit http://
Get:8 http://
Hit http://
Hit http://
Hit http://
Hit http://
Hit http://
Hit http://
Get:9 http://
Get:10 http://
Get:11 http://
Get:12 http://
Hit http://
Hit http://
Get:13 http://
Ign http://
Ign http://
Fetched 1,663 kB in 9min 19s (2,974 B/s)
W: Failed to fetch http://
E: Some index files failed to download. They have been ignored, or old ones used instead.
Preview Diff
1 | === modified file 'src/maasserver/api/nodes.py' | |||
2 | --- src/maasserver/api/nodes.py 2015-05-28 09:38:40 +0000 | |||
3 | +++ src/maasserver/api/nodes.py 2015-07-10 09:11:31 +0000 | |||
4 | @@ -77,6 +77,7 @@ | |||
5 | 77 | from maasserver.utils import find_nodegroup | 77 | from maasserver.utils import find_nodegroup |
6 | 78 | from maasserver.utils.orm import get_first | 78 | from maasserver.utils.orm import get_first |
7 | 79 | from piston.utils import rc | 79 | from piston.utils import rc |
8 | 80 | from provisioningserver.drivers.power import POWER_QUERY_TIMEOUT | ||
9 | 80 | from provisioningserver.power.poweraction import ( | 81 | from provisioningserver.power.poweraction import ( |
10 | 81 | PowerActionFail, | 82 | PowerActionFail, |
11 | 82 | UnknownPowerType, | 83 | UnknownPowerType, |
12 | @@ -696,9 +697,7 @@ | |||
13 | 696 | power_type=power_info.power_type, | 697 | power_type=power_info.power_type, |
14 | 697 | context=power_info.power_parameters) | 698 | context=power_info.power_parameters) |
15 | 698 | try: | 699 | try: |
19 | 699 | # Allow 30 seconds for the power query max as we're holding | 700 | state = call.wait(POWER_QUERY_TIMEOUT) |
17 | 700 | # up an appserver thread here. | ||
18 | 701 | state = call.wait(30) | ||
20 | 702 | except crochet.TimeoutError: | 701 | except crochet.TimeoutError: |
21 | 703 | maaslog.error( | 702 | maaslog.error( |
22 | 704 | "%s: Timed out waiting for power response in Node.power_state", | 703 | "%s: Timed out waiting for power response in Node.power_state", |
23 | 705 | 704 | ||
24 | === modified file 'src/maasserver/plugin.py' | |||
25 | --- src/maasserver/plugin.py 2015-05-07 18:14:38 +0000 | |||
26 | +++ src/maasserver/plugin.py 2015-07-10 09:11:31 +0000 | |||
27 | @@ -24,6 +24,7 @@ | |||
28 | 24 | from twisted.plugin import IPlugin | 24 | from twisted.plugin import IPlugin |
29 | 25 | from twisted.python import usage | 25 | from twisted.python import usage |
30 | 26 | from zope.interface import implementer | 26 | from zope.interface import implementer |
31 | 27 | from twisted.internet import reactor | ||
32 | 27 | 28 | ||
33 | 28 | 29 | ||
34 | 29 | def serverFromString(description): | 30 | def serverFromString(description): |
35 | @@ -45,6 +46,14 @@ | |||
36 | 45 | ] | 46 | ] |
37 | 46 | 47 | ||
38 | 47 | 48 | ||
39 | 49 | # The maximum number of threads used by the default twisted thread pool. | ||
40 | 50 | # This value is a trade-off between a small value (such as the default: 10) | ||
41 | 51 | # which can create deadlocks (see 1470013) and a huge value which can cause | ||
42 | 52 | # MAAS to hit other limitations such as the number of open files or the | ||
43 | 53 | # number of concurrent database connexions. | ||
44 | 54 | MAX_THREADS = 100 | ||
45 | 55 | |||
46 | 56 | |||
47 | 48 | @implementer(IServiceMaker, IPlugin) | 57 | @implementer(IServiceMaker, IPlugin) |
48 | 49 | class RegionServiceMaker: | 58 | class RegionServiceMaker: |
49 | 50 | """Create a service for the Twisted plugin.""" | 59 | """Create a service for the Twisted plugin.""" |
50 | @@ -76,6 +85,10 @@ | |||
51 | 76 | import crochet | 85 | import crochet |
52 | 77 | crochet.no_setup() | 86 | crochet.no_setup() |
53 | 78 | 87 | ||
54 | 88 | def _configurePoolSize(self): | ||
55 | 89 | threadpool = reactor.getThreadPool() | ||
56 | 90 | threadpool.adjustPoolsize(10, MAX_THREADS) | ||
57 | 91 | |||
58 | 79 | def _makeIntrospectionService(self, endpoint): | 92 | def _makeIntrospectionService(self, endpoint): |
59 | 80 | from provisioningserver.utils import introspect | 93 | from provisioningserver.utils import introspect |
60 | 81 | introspect_service = ( | 94 | introspect_service = ( |
61 | @@ -91,6 +104,7 @@ | |||
62 | 91 | self._configureLogging() | 104 | self._configureLogging() |
63 | 92 | self._configureDjango() | 105 | self._configureDjango() |
64 | 93 | self._configureCrochet() | 106 | self._configureCrochet() |
65 | 107 | self._configurePoolSize() | ||
66 | 94 | 108 | ||
67 | 95 | # Populate the region's event-loop with services. | 109 | # Populate the region's event-loop with services. |
68 | 96 | from maasserver import eventloop | 110 | from maasserver import eventloop |
69 | 97 | 111 | ||
70 | === modified file 'src/maasserver/tests/test_plugin.py' | |||
71 | --- src/maasserver/tests/test_plugin.py 2015-05-07 18:14:38 +0000 | |||
72 | +++ src/maasserver/tests/test_plugin.py 2015-07-10 09:11:31 +0000 | |||
73 | @@ -17,6 +17,7 @@ | |||
74 | 17 | import crochet | 17 | import crochet |
75 | 18 | from maasserver import eventloop | 18 | from maasserver import eventloop |
76 | 19 | from maasserver.plugin import ( | 19 | from maasserver.plugin import ( |
77 | 20 | MAX_THREADS, | ||
78 | 20 | Options, | 21 | Options, |
79 | 21 | RegionServiceMaker, | 22 | RegionServiceMaker, |
80 | 22 | ) | 23 | ) |
81 | @@ -24,11 +25,13 @@ | |||
82 | 24 | from maastesting.testcase import MAASTestCase | 25 | from maastesting.testcase import MAASTestCase |
83 | 25 | from provisioningserver import logger | 26 | from provisioningserver import logger |
84 | 26 | from provisioningserver.utils.twisted import asynchronous | 27 | from provisioningserver.utils.twisted import asynchronous |
85 | 28 | from testtools.matchers import GreaterThan | ||
86 | 27 | from twisted.application.service import MultiService | 29 | from twisted.application.service import MultiService |
87 | 30 | from twisted.internet import reactor | ||
88 | 28 | 31 | ||
89 | 29 | 32 | ||
90 | 30 | class TestOptions(MAASTestCase): | 33 | class TestOptions(MAASTestCase): |
92 | 31 | """Tests for `provisioningserver.plugin.Options`.""" | 34 | """Tests for `maasserver.plugin.Options`.""" |
93 | 32 | 35 | ||
94 | 33 | def test_defaults(self): | 36 | def test_defaults(self): |
95 | 34 | options = Options() | 37 | options = Options() |
96 | @@ -42,7 +45,7 @@ | |||
97 | 42 | 45 | ||
98 | 43 | 46 | ||
99 | 44 | class TestRegionServiceMaker(MAASTestCase): | 47 | class TestRegionServiceMaker(MAASTestCase): |
101 | 45 | """Tests for `provisioningserver.plugin.RegionServiceMaker`.""" | 48 | """Tests for `maasserver.plugin.RegionServiceMaker`.""" |
102 | 46 | 49 | ||
103 | 47 | def setUp(self): | 50 | def setUp(self): |
104 | 48 | super(TestRegionServiceMaker, self).setUp() | 51 | super(TestRegionServiceMaker, self).setUp() |
105 | @@ -78,3 +81,12 @@ | |||
106 | 78 | "Not all services are named.") | 81 | "Not all services are named.") |
107 | 79 | self.assertThat(logger.basicConfig, MockCalledOnceWith()) | 82 | self.assertThat(logger.basicConfig, MockCalledOnceWith()) |
108 | 80 | self.assertThat(crochet.no_setup, MockCalledOnceWith()) | 83 | self.assertThat(crochet.no_setup, MockCalledOnceWith()) |
109 | 84 | |||
110 | 85 | @asynchronous(timeout=5) | ||
111 | 86 | def test__sets_pool_size(self): | ||
112 | 87 | service_maker = RegionServiceMaker("Harry", "Hill") | ||
113 | 88 | service_maker.makeService(Options()) | ||
114 | 89 | threadpool = reactor.getThreadPool() | ||
115 | 90 | self.assertEqual(MAX_THREADS, threadpool.max) | ||
116 | 91 | # Max threads is reasonable. | ||
117 | 92 | self.assertThat(threadpool.max, GreaterThan(50)) | ||
118 | 81 | 93 | ||
119 | === modified file 'src/maasserver/websockets/handlers/node.py' | |||
120 | --- src/maasserver/websockets/handlers/node.py 2015-05-28 15:07:01 +0000 | |||
121 | +++ src/maasserver/websockets/handlers/node.py 2015-07-10 09:11:31 +0000 | |||
122 | @@ -48,6 +48,7 @@ | |||
123 | 48 | ) | 48 | ) |
124 | 49 | from metadataserver.enum import RESULT_TYPE | 49 | from metadataserver.enum import RESULT_TYPE |
125 | 50 | from metadataserver.models import NodeResult | 50 | from metadataserver.models import NodeResult |
126 | 51 | from provisioningserver.drivers.power import POWER_QUERY_TIMEOUT | ||
127 | 51 | from provisioningserver.logger import get_maas_logger | 52 | from provisioningserver.logger import get_maas_logger |
128 | 52 | from provisioningserver.power.poweraction import ( | 53 | from provisioningserver.power.poweraction import ( |
129 | 53 | PowerActionFail, | 54 | PowerActionFail, |
130 | @@ -547,9 +548,7 @@ | |||
131 | 547 | power_type=power_info.power_type, | 548 | power_type=power_info.power_type, |
132 | 548 | context=power_info.power_parameters) | 549 | context=power_info.power_parameters) |
133 | 549 | try: | 550 | try: |
137 | 550 | # Allow 15 seconds for the power query max as we're holding | 551 | state = call.wait(POWER_QUERY_TIMEOUT)['state'] |
135 | 551 | # up a thread waiting. | ||
136 | 552 | state = call.wait(15)['state'] | ||
138 | 553 | except crochet.TimeoutError: | 552 | except crochet.TimeoutError: |
139 | 554 | maaslog.error( | 553 | maaslog.error( |
140 | 555 | "%s: Timed out waiting for power response in " | 554 | "%s: Timed out waiting for power response in " |
141 | @@ -569,5 +568,5 @@ | |||
142 | 569 | # sure the change is committed and retried if required. Not pushing | 568 | # sure the change is committed and retried if required. Not pushing |
143 | 570 | # this to another thread, would result in the entire power query being | 569 | # this to another thread, would result in the entire power query being |
144 | 571 | # performed again. | 570 | # performed again. |
146 | 572 | update_power_state(state).wait(15) | 571 | update_power_state(state).wait(POWER_QUERY_TIMEOUT) |
147 | 573 | return state | 572 | return state |
148 | 574 | 573 | ||
149 | === modified file 'src/provisioningserver/drivers/power/__init__.py' | |||
150 | --- src/provisioningserver/drivers/power/__init__.py 2015-05-07 18:14:38 +0000 | |||
151 | +++ src/provisioningserver/drivers/power/__init__.py 2015-07-10 09:11:31 +0000 | |||
152 | @@ -13,6 +13,7 @@ | |||
153 | 13 | 13 | ||
154 | 14 | __metaclass__ = type | 14 | __metaclass__ = type |
155 | 15 | __all__ = [ | 15 | __all__ = [ |
156 | 16 | "POWER_QUERY_TIMEOUT", | ||
157 | 16 | "PowerActionError", | 17 | "PowerActionError", |
158 | 17 | "PowerAuthError", | 18 | "PowerAuthError", |
159 | 18 | "PowerConnError", | 19 | "PowerConnError", |
160 | @@ -29,6 +30,7 @@ | |||
161 | 29 | abstractmethod, | 30 | abstractmethod, |
162 | 30 | abstractproperty, | 31 | abstractproperty, |
163 | 31 | ) | 32 | ) |
164 | 33 | from datetime import timedelta | ||
165 | 32 | 34 | ||
166 | 33 | from jsonschema import validate | 35 | from jsonschema import validate |
167 | 34 | from provisioningserver.drivers import ( | 36 | from provisioningserver.drivers import ( |
168 | @@ -52,6 +54,13 @@ | |||
169 | 52 | } | 54 | } |
170 | 53 | 55 | ||
171 | 54 | 56 | ||
172 | 57 | # Timeout for the power query action. We might be holding up a thread for that | ||
173 | 58 | # long but some BMCs (notably seamicro) can take a long time to respond to | ||
174 | 59 | # a power query request. | ||
175 | 60 | # This should be configurable per-BMC. | ||
176 | 61 | POWER_QUERY_TIMEOUT = timedelta(seconds=45).total_seconds() | ||
177 | 62 | |||
178 | 63 | |||
179 | 55 | class PowerError(Exception): | 64 | class PowerError(Exception): |
180 | 56 | """Base error for all power driver failure commands.""" | 65 | """Base error for all power driver failure commands.""" |
181 | 57 | 66 | ||
182 | 58 | 67 | ||
183 | === modified file 'src/provisioningserver/rpc/power.py' | |||
184 | --- src/provisioningserver/rpc/power.py 2015-05-14 07:22:16 +0000 | |||
185 | +++ src/provisioningserver/rpc/power.py 2015-07-10 09:11:31 +0000 | |||
186 | @@ -78,7 +78,8 @@ | |||
187 | 78 | 78 | ||
188 | 79 | 79 | ||
189 | 80 | # Timeout for change_power_state(). We set it to 5 minutes by default, | 80 | # Timeout for change_power_state(). We set it to 5 minutes by default, |
191 | 81 | # but it would be lovely if this was configurable. | 81 | # but it would be lovely if this was configurable. This is only a backstop |
192 | 82 | # meant to cope with broken BMCs. | ||
193 | 82 | CHANGE_POWER_STATE_TIMEOUT = timedelta(minutes=5).total_seconds() | 83 | CHANGE_POWER_STATE_TIMEOUT = timedelta(minutes=5).total_seconds() |
194 | 83 | 84 | ||
195 | 84 | 85 |
Simple backport, self-approving. My testing in one of our lab (ivok's lab) indicates this definitely improves the situation.