Merge lp:~bigdata-dev/charms/bundles/apache-hadoop-spark-notebook/trunk into lp:~charmers/charms/bundles/apache-hadoop-spark-notebook/bundle
Proposed by
Kevin W Monroe
Status: Merged
Merged at revision: 5
Proposed branch: lp:~bigdata-dev/charms/bundles/apache-hadoop-spark-notebook/trunk
Merge into: lp:~charmers/charms/bundles/apache-hadoop-spark-notebook/bundle
Diff against target: 462 lines (+147/-194), 5 files modified
  README.md (+78/-78), bundle.yaml (+9/-9), tests/00-setup (+0/-8), tests/01-bundle.py (+57/-99), tests/tests.yaml (+3/-0)
To merge this branch: bzr merge lp:~bigdata-dev/charms/bundles/apache-hadoop-spark-notebook/trunk
Related bugs:
Reviewer | Review Type | Date Requested | Status
---|---|---|---
Kevin W Monroe | | | Approve
Review via email: mp+286952@code.launchpad.net
Commit message
Description of the change
Updates from bigdata-dev:
- version-lock charms in bundle.yaml
- update bundle tests
- fix README formatting
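The version-lock change pins each charm to an explicit store revision (e.g. `cs:trusty/apache-spark-6`), so later charm releases cannot silently change what the bundle deploys. A minimal stdlib sketch of such a pin check, using a hypothetical excerpt in the same `cs:<series>/<charm>-<revision>` style (the excerpt and function names are illustrative, not part of this branch):

```python
import re

# Hypothetical excerpt of a bundle.yaml with version-locked charm URLs,
# mirroring the style used in this bundle (cs:<series>/<charm>-<revision>).
BUNDLE_EXCERPT = """\
services:
  compute-slave:
    charm: cs:trusty/apache-hadoop-compute-slave-9
  spark:
    charm: cs:trusty/apache-spark-6
"""

# A pinned charm URL ends in an explicit numeric revision suffix.
PINNED = re.compile(r"^cs:[\w-]+/[\w-]+-\d+$")

def unpinned_charms(bundle_text):
    """Return charm URLs that lack an explicit revision suffix."""
    charms = re.findall(r"charm:\s*(\S+)", bundle_text)
    return [c for c in charms if not PINNED.match(c)]

print(unpinned_charms(BUNDLE_EXCERPT))  # -> []
```

Running the check against an unpinned URL such as `cs:trusty/apache-spark` reports it, which is how a pinning regression could be caught in review.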
Preview Diff
1 | === modified file 'README.md' |
2 | --- README.md 2015-07-09 15:09:34 +0000 |
3 | +++ README.md 2016-02-23 21:01:00 +0000 |
4 | @@ -16,81 +16,81 @@ |
5 | - 1 Notebook (colocated on the Spark unit) |
6 | |
7 | |
8 | - ## Usage |
9 | - Deploy this bundle using juju-quickstart: |
10 | - |
11 | - juju quickstart u/bigdata-dev/apache-hadoop-spark-notebook |
12 | - |
13 | - See `juju quickstart --help` for deployment options, including machine |
14 | - constraints and how to deploy a locally modified version of the |
15 | - apache-hadoop-spark-notebook bundle.yaml. |
16 | - |
17 | - |
18 | - ## Testing the deployment |
19 | - |
20 | - ### Smoke test HDFS admin functionality |
21 | - Once the deployment is complete and the cluster is running, ssh to the HDFS |
22 | - Master unit: |
23 | - |
24 | - juju ssh hdfs-master/0 |
25 | - |
26 | - As the `ubuntu` user, create a temporary directory on the Hadoop file system. |
27 | - The steps below verify HDFS functionality: |
28 | - |
29 | - hdfs dfs -mkdir -p /tmp/hdfs-test |
30 | - hdfs dfs -chmod -R 777 /tmp/hdfs-test |
31 | - hdfs dfs -ls /tmp # verify the newly created hdfs-test subdirectory exists |
32 | - hdfs dfs -rm -R /tmp/hdfs-test |
33 | - hdfs dfs -ls /tmp # verify the hdfs-test subdirectory has been removed |
34 | - exit |
35 | - |
36 | - ### Smoke test YARN and MapReduce |
37 | - Run the `terasort.sh` script from the Spark unit to generate and sort data. The |
38 | - steps below verify that Spark is communicating with the cluster via the plugin |
39 | - and that YARN and MapReduce are working as expected: |
40 | - |
41 | - juju ssh spark/0 |
42 | - ~/terasort.sh |
43 | - exit |
44 | - |
45 | - ### Smoke test HDFS functionality from user space |
46 | - From the Spark unit, delete the MapReduce output previously generated by the |
47 | - `terasort.sh` script: |
48 | - |
49 | - juju ssh spark/0 |
50 | - hdfs dfs -rm -R /user/ubuntu/tera_demo_out |
51 | - exit |
52 | - |
53 | - ### Smoke test Spark |
54 | - SSH to the Spark unit and run the SparkPi demo as follows: |
55 | - |
56 | - juju ssh spark/0 |
57 | - ~/sparkpi.sh |
58 | - exit |
59 | - |
60 | - ### Access the IPython Notebook web interface |
61 | - Access the notebook web interface at |
62 | - http://{spark_unit_ip_address}:8880. The ip address can be found by running |
63 | - `juju status spark/0 | grep public-address`. |
64 | - |
65 | - |
66 | - ## Scale Out Usage |
67 | - This bundle was designed to scale out. To increase the amount of Compute |
68 | - Slaves, you can add units to the compute-slave service. To add one unit: |
69 | - |
70 | - juju add-unit compute-slave |
71 | - |
72 | - Or you can add multiple units at once: |
73 | - |
74 | - juju add-unit -n4 compute-slave |
75 | - |
76 | - |
77 | - ## Contact Information |
78 | - |
79 | - - <bigdata-dev@lists.launchpad.net> |
80 | - |
81 | - |
82 | - ## Help |
83 | - |
84 | - - [Juju mailing list](https://lists.ubuntu.com/mailman/listinfo/juju) |
85 | - - [Juju community](https://jujucharms.com/community) |
86 | +## Usage |
87 | +Deploy this bundle using juju-quickstart: |
88 | + |
89 | + juju quickstart apache-hadoop-spark-notebook |
90 | + |
91 | +See `juju quickstart --help` for deployment options, including machine |
92 | +constraints and how to deploy a locally modified version of the |
93 | +apache-hadoop-spark-notebook bundle.yaml. |
94 | + |
95 | + |
96 | +## Testing the deployment |
97 | + |
98 | +### Smoke test HDFS admin functionality |
99 | +Once the deployment is complete and the cluster is running, ssh to the HDFS |
100 | +Master unit: |
101 | + |
102 | + juju ssh hdfs-master/0 |
103 | + |
104 | +As the `ubuntu` user, create a temporary directory on the Hadoop file system. |
105 | +The steps below verify HDFS functionality: |
106 | + |
107 | + hdfs dfs -mkdir -p /tmp/hdfs-test |
108 | + hdfs dfs -chmod -R 777 /tmp/hdfs-test |
109 | + hdfs dfs -ls /tmp # verify the newly created hdfs-test subdirectory exists |
110 | + hdfs dfs -rm -R /tmp/hdfs-test |
111 | + hdfs dfs -ls /tmp # verify the hdfs-test subdirectory has been removed |
112 | + exit |
113 | + |
114 | +### Smoke test YARN and MapReduce |
115 | +Run the `terasort.sh` script from the Spark unit to generate and sort data. The |
116 | +steps below verify that Spark is communicating with the cluster via the plugin |
117 | +and that YARN and MapReduce are working as expected: |
118 | + |
119 | + juju ssh spark/0 |
120 | + ~/terasort.sh |
121 | + exit |
122 | + |
123 | +### Smoke test HDFS functionality from user space |
124 | +From the Spark unit, delete the MapReduce output previously generated by the |
125 | +`terasort.sh` script: |
126 | + |
127 | + juju ssh spark/0 |
128 | + hdfs dfs -rm -R /user/ubuntu/tera_demo_out |
129 | + exit |
130 | + |
131 | +### Smoke test Spark |
132 | +SSH to the Spark unit and run the SparkPi demo as follows: |
133 | + |
134 | + juju ssh spark/0 |
135 | + ~/sparkpi.sh |
136 | + exit |
137 | + |
138 | +### Access the IPython Notebook web interface |
139 | +Access the notebook web interface at |
140 | +http://{spark_unit_ip_address}:8880. The ip address can be found by running |
141 | +`juju status spark/0 | grep public-address`. |
142 | + |
143 | + |
144 | +## Scale Out Usage |
145 | +This bundle was designed to scale out. To increase the amount of Compute |
146 | +Slaves, you can add units to the compute-slave service. To add one unit: |
147 | + |
148 | + juju add-unit compute-slave |
149 | + |
150 | +Or you can add multiple units at once: |
151 | + |
152 | + juju add-unit -n4 compute-slave |
153 | + |
154 | + |
155 | +## Contact Information |
156 | + |
157 | +- <bigdata-dev@lists.launchpad.net> |
158 | + |
159 | + |
160 | +## Help |
161 | + |
162 | +- [Juju mailing list](https://lists.ubuntu.com/mailman/listinfo/juju) |
163 | +- [Juju community](https://jujucharms.com/community) |
164 | |
165 | === modified file 'bundle.yaml' |
166 | --- bundle.yaml 2015-07-16 20:35:31 +0000 |
167 | +++ bundle.yaml 2016-02-23 21:01:00 +0000 |
168 | @@ -1,46 +1,46 @@ |
169 | services: |
170 | compute-slave: |
171 | - charm: cs:trusty/apache-hadoop-compute-slave |
172 | + charm: cs:trusty/apache-hadoop-compute-slave-9 |
173 | num_units: 3 |
174 | annotations: |
175 | gui-x: "300" |
176 | gui-y: "200" |
177 | - constraints: mem=3G |
178 | + constraints: mem=7G |
179 | hdfs-master: |
180 | - charm: cs:trusty/apache-hadoop-hdfs-master |
181 | + charm: cs:trusty/apache-hadoop-hdfs-master-9 |
182 | num_units: 1 |
183 | annotations: |
184 | gui-x: "600" |
185 | gui-y: "350" |
186 | constraints: mem=7G |
187 | plugin: |
188 | - charm: cs:trusty/apache-hadoop-plugin |
189 | + charm: cs:trusty/apache-hadoop-plugin-10 |
190 | annotations: |
191 | gui-x: "900" |
192 | gui-y: "200" |
193 | secondary-namenode: |
194 | - charm: cs:trusty/apache-hadoop-hdfs-secondary |
195 | + charm: cs:trusty/apache-hadoop-hdfs-secondary-7 |
196 | num_units: 1 |
197 | annotations: |
198 | gui-x: "600" |
199 | gui-y: "600" |
200 | constraints: mem=7G |
201 | spark: |
202 | - charm: cs:trusty/apache-spark |
203 | + charm: cs:trusty/apache-spark-6 |
204 | num_units: 1 |
205 | annotations: |
206 | gui-x: "1200" |
207 | gui-y: "200" |
208 | constraints: mem=3G |
209 | yarn-master: |
210 | - charm: cs:trusty/apache-hadoop-yarn-master |
211 | + charm: cs:trusty/apache-hadoop-yarn-master-7 |
212 | num_units: 1 |
213 | annotations: |
214 | gui-x: "600" |
215 | gui-y: "100" |
216 | constraints: mem=7G |
217 | notebook: |
218 | - charm: cs:trusty/apache-spark-notebook |
219 | + charm: cs:trusty/apache-spark-notebook-3 |
220 | annotations: |
221 | gui-x: "1200" |
222 | gui-y: "450" |
223 | @@ -53,4 +53,4 @@ |
224 | - [plugin, yarn-master] |
225 | - [plugin, hdfs-master] |
226 | - [spark, plugin] |
227 | - - [notebook, spark] |
228 | + - [spark, notebook] |
229 | |
230 | === removed file 'tests/00-setup' |
231 | --- tests/00-setup 2015-07-16 20:35:31 +0000 |
232 | +++ tests/00-setup 1970-01-01 00:00:00 +0000 |
233 | @@ -1,8 +0,0 @@ |
234 | -#!/bin/bash |
235 | - |
236 | -if ! dpkg -s amulet &> /dev/null; then |
237 | - echo Installing Amulet... |
238 | - sudo add-apt-repository -y ppa:juju/stable |
239 | - sudo apt-get update |
240 | - sudo apt-get -y install amulet |
241 | -fi |
242 | |
243 | === modified file 'tests/01-bundle.py' |
244 | --- tests/01-bundle.py 2015-07-16 20:35:31 +0000 |
245 | +++ tests/01-bundle.py 2016-02-23 21:01:00 +0000 |
246 | @@ -1,61 +1,32 @@ |
247 | #!/usr/bin/env python3 |
248 | |
249 | import os |
250 | -import time |
251 | import unittest |
252 | |
253 | import yaml |
254 | import amulet |
255 | |
256 | |
257 | -class Base(object): |
258 | - """ |
259 | - Base class for tests for Apache Hadoop Bundle. |
260 | - """ |
261 | +class TestBundle(unittest.TestCase): |
262 | bundle_file = os.path.join(os.path.dirname(__file__), '..', 'bundle.yaml') |
263 | - profile_name = None |
264 | |
265 | @classmethod |
266 | - def deploy(cls): |
267 | - # classmethod inheritance doesn't work quite right with |
268 | - # setUpClass / tearDownClass, so subclasses have to manually call this |
269 | + def setUpClass(cls): |
270 | cls.d = amulet.Deployment(series='trusty') |
271 | with open(cls.bundle_file) as f: |
272 | bun = f.read() |
273 | - profiles = yaml.safe_load(bun) |
274 | - # amulet always selects the first profile, so we have to fudge it here |
275 | - profile = {cls.profile_name: profiles[cls.profile_name]} |
276 | - cls.d.load(profile) |
277 | - cls.d.setup(timeout=9000) |
278 | - cls.d.sentry.wait() |
279 | - cls.hdfs = cls.d.sentry.unit['hdfs-master/0'] |
280 | - cls.yarn = cls.d.sentry.unit['yarn-master/0'] |
281 | - cls.slave = cls.d.sentry.unit['compute-slave/0'] |
282 | - cls.secondary = cls.d.sentry.unit['secondary-namenode/0'] |
283 | - cls.plugin = cls.d.sentry.unit['plugin/0'] |
284 | - cls.client = cls.d.sentry.unit['client/0'] |
285 | - |
286 | - @classmethod |
287 | - def reset_env(cls): |
288 | - # classmethod inheritance doesn't work quite right with |
289 | - # setUpClass / tearDownClass, so subclasses have to manually call this |
290 | - juju_env = amulet.helpers.default_environment() |
291 | - services = ['hdfs-master', 'yarn-master', 'compute-slave', 'secondary-namenode', 'plugin', 'client'] |
292 | - |
293 | - def check_env_clear(): |
294 | - state = amulet.waiter.state(juju_env=juju_env) |
295 | - for service in services: |
296 | - if state.get(service, {}) != {}: |
297 | - return False |
298 | - return True |
299 | - |
300 | - for service in services: |
301 | - cls.d.remove(service) |
302 | - with amulet.helpers.timeout(300): |
303 | - while not check_env_clear(): |
304 | - time.sleep(5) |
305 | - |
306 | - def test_hadoop_components(self): |
307 | + bundle = yaml.safe_load(bun) |
308 | + cls.d.load(bundle) |
309 | + cls.d.setup(timeout=1800) |
310 | + cls.d.sentry.wait_for_messages({'notebook': 'Ready'}, timeout=1800) |
311 | + cls.hdfs = cls.d.sentry['hdfs-master'][0] |
312 | + cls.yarn = cls.d.sentry['yarn-master'][0] |
313 | + cls.slave = cls.d.sentry['compute-slave'][0] |
314 | + cls.secondary = cls.d.sentry['secondary-namenode'][0] |
315 | + cls.spark = cls.d.sentry['spark'][0] |
316 | + cls.notebook = cls.d.sentry['notebook'][0] |
317 | + |
318 | + def test_components(self): |
319 | """ |
320 | Confirm that all of the required components are up and running. |
321 | """ |
322 | @@ -63,17 +34,48 @@ |
323 | yarn, retcode = self.yarn.run("pgrep -a java") |
324 | slave, retcode = self.slave.run("pgrep -a java") |
325 | secondary, retcode = self.secondary.run("pgrep -a java") |
326 | - client, retcode = self.client.run("pgrep -a java") |
327 | + spark, retcode = self.spark.run("pgrep -a java") |
328 | + notebook, retcode = self.spark.run("pgrep -a python") |
329 | |
330 | # .NameNode needs the . to differentiate it from SecondaryNameNode |
331 | assert '.NameNode' in hdfs, "NameNode not started" |
332 | + assert '.NameNode' not in yarn, "NameNode should not be running on yarn-master" |
333 | + assert '.NameNode' not in slave, "NameNode should not be running on compute-slave" |
334 | + assert '.NameNode' not in secondary, "NameNode should not be running on secondary-namenode" |
335 | + assert '.NameNode' not in spark, "NameNode should not be running on spark" |
336 | + |
337 | assert 'ResourceManager' in yarn, "ResourceManager not started" |
338 | + assert 'ResourceManager' not in hdfs, "ResourceManager should not be running on hdfs-master" |
339 | + assert 'ResourceManager' not in slave, "ResourceManager should not be running on compute-slave" |
340 | + assert 'ResourceManager' not in secondary, "ResourceManager should not be running on secondary-namenode" |
341 | + assert 'ResourceManager' not in spark, "ResourceManager should not be running on spark" |
342 | + |
343 | assert 'JobHistoryServer' in yarn, "JobHistoryServer not started" |
344 | + assert 'JobHistoryServer' not in hdfs, "JobHistoryServer should not be running on hdfs-master" |
345 | + assert 'JobHistoryServer' not in slave, "JobHistoryServer should not be running on compute-slave" |
346 | + assert 'JobHistoryServer' not in secondary, "JobHistoryServer should not be running on secondary-namenode" |
347 | + assert 'JobHistoryServer' not in spark, "JobHistoryServer should not be running on spark" |
348 | + |
349 | assert 'NodeManager' in slave, "NodeManager not started" |
350 | + assert 'NodeManager' not in yarn, "NodeManager should not be running on yarn-master" |
351 | + assert 'NodeManager' not in hdfs, "NodeManager should not be running on hdfs-master" |
352 | + assert 'NodeManager' not in secondary, "NodeManager should not be running on secondary-namenode" |
353 | + assert 'NodeManager' not in spark, "NodeManager should not be running on spark" |
354 | + |
355 | assert 'DataNode' in slave, "DataServer not started" |
356 | + assert 'DataNode' not in yarn, "DataNode should not be running on yarn-master" |
357 | + assert 'DataNode' not in hdfs, "DataNode should not be running on hdfs-master" |
358 | + assert 'DataNode' not in secondary, "DataNode should not be running on secondary-namenode" |
359 | + assert 'DataNode' not in spark, "DataNode should not be running on spark" |
360 | + |
361 | assert 'SecondaryNameNode' in secondary, "SecondaryNameNode not started" |
362 | + assert 'SecondaryNameNode' not in yarn, "SecondaryNameNode should not be running on yarn-master" |
363 | + assert 'SecondaryNameNode' not in hdfs, "SecondaryNameNode should not be running on hdfs-master" |
364 | + assert 'SecondaryNameNode' not in slave, "SecondaryNameNode should not be running on compute-slave" |
365 | + assert 'SecondaryNameNode' not in spark, "SecondaryNameNode should not be running on spark" |
366 | |
367 | - return hdfs, yarn, slave, secondary, client # allow subclasses to do additional checks |
368 | + assert 'spark' in spark, 'Spark should be running on spark' |
369 | + assert 'notebook' in notebook, 'Notebook should be running on spark' |
370 | |
371 | def test_hdfs_dir(self): |
372 | """ |
373 | @@ -84,11 +86,11 @@ |
374 | |
375 | NB: These are order-dependent, so must be done as part of a single test case. |
376 | """ |
377 | - output, retcode = self.client.run("su hdfs -c 'hdfs dfs -mkdir -p /user/ubuntu'") |
378 | + output, retcode = self.spark.run("su hdfs -c 'hdfs dfs -mkdir -p /user/ubuntu'") |
379 | assert retcode == 0, "Created a user directory on hdfs FAILED:\n{}".format(output) |
380 | - output, retcode = self.client.run("su hdfs -c 'hdfs dfs -chown ubuntu:ubuntu /user/ubuntu'") |
381 | + output, retcode = self.spark.run("su hdfs -c 'hdfs dfs -chown ubuntu:ubuntu /user/ubuntu'") |
382 | assert retcode == 0, "Assigning an owner to hdfs directory FAILED:\n{}".format(output) |
383 | - output, retcode = self.client.run("su hdfs -c 'hdfs dfs -chmod -R 755 /user/ubuntu'") |
384 | + output, retcode = self.spark.run("su hdfs -c 'hdfs dfs -chmod -R 755 /user/ubuntu'") |
385 | assert retcode == 0, "seting directory permission on hdfs FAILED:\n{}".format(output) |
386 | |
387 | def test_yarn_mapreduce_exe(self): |
388 | @@ -112,59 +114,15 @@ |
389 | ('cleanup', "su hdfs -c 'hdfs dfs -rm -r /user/ubuntu/teragenout'"), |
390 | ] |
391 | for name, step in test_steps: |
392 | - output, retcode = self.client.run(step) |
393 | + output, retcode = self.spark.run(step) |
394 | assert retcode == 0, "{} FAILED:\n{}".format(name, output) |
395 | |
396 | - |
397 | -class TestScalable(unittest.TestCase, Base): |
398 | - profile_name = 'apache-core-batch-processing' |
399 | - |
400 | - @classmethod |
401 | - def setUpClass(cls): |
402 | - cls.deploy() |
403 | - |
404 | - @classmethod |
405 | - def tearDownClass(cls): |
406 | - cls.reset_env() |
407 | - |
408 | - def test_hadoop_components(self): |
409 | - """ |
410 | - In addition to testing that the components are running where they |
411 | - are supposed to be, confirm that none of them are also running where |
412 | - they shouldn't be. |
413 | - """ |
414 | - hdfs, yarn, slave, secondary, client = super(TestScalable, self).test_hadoop_components() |
415 | - |
416 | - # .NameNode needs the . to differentiate it from SecondaryNameNode |
417 | - assert '.NameNode' not in yarn, "NameNode should not be running on yarn-master" |
418 | - assert '.NameNode' not in slave, "NameNode should not be running on compute-slave" |
419 | - assert '.NameNode' not in secondary, "NameNode should not be running on secondary-namenode" |
420 | - assert '.NameNode' not in client, "NameNode should not be running on client" |
421 | - |
422 | - assert 'ResourceManager' not in hdfs, "ResourceManager should not be running on hdfs-master" |
423 | - assert 'ResourceManager' not in slave, "ResourceManager should not be running on compute-slave" |
424 | - assert 'ResourceManager' not in secondary, "ResourceManager should not be running on secondary-namenode" |
425 | - assert 'ResourceManager' not in client, "ResourceManager should not be running on client" |
426 | - |
427 | - assert 'JobHistoryServer' not in hdfs, "JobHistoryServer should not be running on hdfs-master" |
428 | - assert 'JobHistoryServer' not in slave, "JobHistoryServer should not be running on compute-slave" |
429 | - assert 'JobHistoryServer' not in secondary, "JobHistoryServer should not be running on secondary-namenode" |
430 | - assert 'JobHistoryServer' not in client, "JobHistoryServer should not be running on client" |
431 | - |
432 | - assert 'NodeManager' not in yarn, "NodeManager should not be running on yarn-master" |
433 | - assert 'NodeManager' not in hdfs, "NodeManager should not be running on hdfs-master" |
434 | - assert 'NodeManager' not in secondary, "NodeManager should not be running on secondary-namenode" |
435 | - assert 'NodeManager' not in client, "NodeManager should not be running on client" |
436 | - |
437 | - assert 'DataNode' not in yarn, "DataNode should not be running on yarn-master" |
438 | - assert 'DataNode' not in hdfs, "DataNode should not be running on hdfs-master" |
439 | - assert 'DataNode' not in secondary, "DataNode should not be running on secondary-namenode" |
440 | - assert 'DataNode' not in client, "DataNode should not be running on client" |
441 | - |
442 | - assert 'SecondaryNameNode' not in yarn, "SecondaryNameNode should not be running on yarn-master" |
443 | - assert 'SecondaryNameNode' not in hdfs, "SecondaryNameNode should not be running on hdfs-master" |
444 | - assert 'SecondaryNameNode' not in slave, "SecondaryNameNode should not be running on compute-slave" |
445 | - assert 'SecondaryNameNode' not in client, "SecondaryNameNode should not be running on client" |
446 | + def test_spark(self): |
447 | + output, retcode = self.spark.run("su ubuntu -c 'bash -lc /home/ubuntu/sparkpi.sh 2>&1'") |
448 | + assert 'Pi is roughly' in output, 'SparkPI test failed: %s' % output |
449 | + |
450 | + def test_notebook(self): |
451 | + pass # requires javascript; how to test? |
452 | |
453 | |
454 | if __name__ == '__main__': |
455 | |
456 | === added file 'tests/tests.yaml' |
457 | --- tests/tests.yaml 1970-01-01 00:00:00 +0000 |
458 | +++ tests/tests.yaml 2016-02-23 21:01:00 +0000 |
459 | @@ -0,0 +1,3 @@ |
460 | +reset: false |
461 | +packages: |
462 | + - amulet |
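The rewritten `test_components` asserts daemon placement by grepping `pgrep -a java` output on each unit: expected daemons must appear, and daemons belonging to other units must not. The pattern can be sketched offline with canned output (the sample process line is hypothetical and amulet is not required for the sketch):

```python
# Sketch of the process-placement checks from tests/01-bundle.py, run
# against canned `pgrep -a java` output instead of live Juju units.
SAMPLE_HDFS_PGREP = """\
1234 java -Dproc_namenode ... org.apache.hadoop.hdfs.server.namenode.NameNode
"""

def assert_placement(pgrep_output, expected, forbidden):
    """Assert that each expected daemon appears in the pgrep output
    and that each forbidden daemon does not."""
    for proc in expected:
        assert proc in pgrep_output, "%s not started" % proc
    for proc in forbidden:
        assert proc not in pgrep_output, "%s should not be running here" % proc

# The leading '.' in '.NameNode' distinguishes the NameNode class name
# from SecondaryNameNode, as noted in the test's own comment.
assert_placement(SAMPLE_HDFS_PGREP,
                 expected=['.NameNode'],
                 forbidden=['SecondaryNameNode', 'NodeManager', 'DataNode'])
```

This mirrors the branch's consolidation: instead of a subclass adding the "should not be running" checks separately, one test asserts both presence and absence per unit.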
+1, test deploy on AWS succeeded