Merge lp:~bigdata-dev/charms/bundles/apache-hadoop-spark-notebook/trunk into lp:~charmers/charms/bundles/apache-hadoop-spark-notebook/bundle

Proposed by Kevin W Monroe on 2016-02-23
Status: Merged
Merged at revision: 5
Proposed branch: lp:~bigdata-dev/charms/bundles/apache-hadoop-spark-notebook/trunk
Merge into: lp:~charmers/charms/bundles/apache-hadoop-spark-notebook/bundle
Diff against target: 462 lines (+147/-194)
5 files modified
README.md (+78/-78)
bundle.yaml (+9/-9)
tests/00-setup (+0/-8)
tests/01-bundle.py (+57/-99)
tests/tests.yaml (+3/-0)
To merge this branch: bzr merge lp:~bigdata-dev/charms/bundles/apache-hadoop-spark-notebook/trunk
Reviewer | Review Type | Date Requested | Status
Kevin W Monroe | | | Approve on 2016-02-23
Review via email: mp+286952@code.launchpad.net

Description of the change

Updates from bigdata-dev:
- Version-lock charms in bundle.yaml
- Update bundle tests
- Fix README formatting
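
The version lock in bundle.yaml means each charm URL now ends in an explicit revision (for example `cs:trusty/apache-spark-6`). A minimal sketch of how such a lock could be checked is shown here; it is not part of this branch. The helper name `unpinned_charms` and the sample dictionary are hypothetical, and the trailing `-<digits>` rule is only a heuristic that assumes charm names do not themselves end in a number.

```python
import re

# Heuristic (an assumption): a charm URL is "locked" when it ends in -<revision>.
REVISION_SUFFIX = re.compile(r"-\d+$")


def unpinned_charms(bundle):
    """Return {service: charm_url} for services whose charm has no revision.

    `bundle` is an already-parsed bundle.yaml, i.e. a dict with a
    top-level "services" mapping as in this proposal.
    """
    unpinned = {}
    for name, svc in bundle.get("services", {}).items():
        charm = svc.get("charm", "")
        if not REVISION_SUFFIX.search(charm):
            unpinned[name] = charm
    return unpinned


# Hypothetical sample: one locked entry and one unlocked entry, modelled on
# the before/after lines of bundle.yaml in the diff.
SAMPLE = {
    "services": {
        "spark": {"charm": "cs:trusty/apache-spark-6"},
        "notebook": {"charm": "cs:trusty/apache-spark-notebook"},
    }
}

if __name__ == "__main__":
    for service, charm in sorted(unpinned_charms(SAMPLE).items()):
        print("{} is not version locked: {}".format(service, charm))
```

In a test, the parsed result of `yaml.safe_load` on bundle.yaml could be passed in directly and the return value asserted to be empty.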

Kevin W Monroe (kwmonroe) wrote :

+1, test deploy on AWS succeeded

review: Approve

Preview Diff

1=== modified file 'README.md'
2--- README.md 2015-07-09 15:09:34 +0000
3+++ README.md 2016-02-23 21:01:00 +0000
4@@ -16,81 +16,81 @@
5 - 1 Notebook (colocated on the Spark unit)
6
7
8- ## Usage
9- Deploy this bundle using juju-quickstart:
10-
11- juju quickstart u/bigdata-dev/apache-hadoop-spark-notebook
12-
13- See `juju quickstart --help` for deployment options, including machine
14- constraints and how to deploy a locally modified version of the
15- apache-hadoop-spark-notebook bundle.yaml.
16-
17-
18- ## Testing the deployment
19-
20- ### Smoke test HDFS admin functionality
21- Once the deployment is complete and the cluster is running, ssh to the HDFS
22- Master unit:
23-
24- juju ssh hdfs-master/0
25-
26- As the `ubuntu` user, create a temporary directory on the Hadoop file system.
27- The steps below verify HDFS functionality:
28-
29- hdfs dfs -mkdir -p /tmp/hdfs-test
30- hdfs dfs -chmod -R 777 /tmp/hdfs-test
31- hdfs dfs -ls /tmp # verify the newly created hdfs-test subdirectory exists
32- hdfs dfs -rm -R /tmp/hdfs-test
33- hdfs dfs -ls /tmp # verify the hdfs-test subdirectory has been removed
34- exit
35-
36- ### Smoke test YARN and MapReduce
37- Run the `terasort.sh` script from the Spark unit to generate and sort data. The
38- steps below verify that Spark is communicating with the cluster via the plugin
39- and that YARN and MapReduce are working as expected:
40-
41- juju ssh spark/0
42- ~/terasort.sh
43- exit
44-
45- ### Smoke test HDFS functionality from user space
46- From the Spark unit, delete the MapReduce output previously generated by the
47- `terasort.sh` script:
48-
49- juju ssh spark/0
50- hdfs dfs -rm -R /user/ubuntu/tera_demo_out
51- exit
52-
53- ### Smoke test Spark
54- SSH to the Spark unit and run the SparkPi demo as follows:
55-
56- juju ssh spark/0
57- ~/sparkpi.sh
58- exit
59-
60- ### Access the IPython Notebook web interface
61- Access the notebook web interface at
62- http://{spark_unit_ip_address}:8880. The ip address can be found by running
63- `juju status spark/0 | grep public-address`.
64-
65-
66- ## Scale Out Usage
67- This bundle was designed to scale out. To increase the amount of Compute
68- Slaves, you can add units to the compute-slave service. To add one unit:
69-
70- juju add-unit compute-slave
71-
72- Or you can add multiple units at once:
73-
74- juju add-unit -n4 compute-slave
75-
76-
77- ## Contact Information
78-
79- - <bigdata-dev@lists.launchpad.net>
80-
81-
82- ## Help
83-
84- - [Juju mailing list](https://lists.ubuntu.com/mailman/listinfo/juju)
85- - [Juju community](https://jujucharms.com/community)
86+## Usage
87+Deploy this bundle using juju-quickstart:
88+
89+ juju quickstart apache-hadoop-spark-notebook
90+
91+See `juju quickstart --help` for deployment options, including machine
92+constraints and how to deploy a locally modified version of the
93+apache-hadoop-spark-notebook bundle.yaml.
94+
95+
96+## Testing the deployment
97+
98+### Smoke test HDFS admin functionality
99+Once the deployment is complete and the cluster is running, ssh to the HDFS
100+Master unit:
101+
102+ juju ssh hdfs-master/0
103+
104+As the `ubuntu` user, create a temporary directory on the Hadoop file system.
105+The steps below verify HDFS functionality:
106+
107+ hdfs dfs -mkdir -p /tmp/hdfs-test
108+ hdfs dfs -chmod -R 777 /tmp/hdfs-test
109+ hdfs dfs -ls /tmp # verify the newly created hdfs-test subdirectory exists
110+ hdfs dfs -rm -R /tmp/hdfs-test
111+ hdfs dfs -ls /tmp # verify the hdfs-test subdirectory has been removed
112+ exit
113+
114+### Smoke test YARN and MapReduce
115+Run the `terasort.sh` script from the Spark unit to generate and sort data. The
116+steps below verify that Spark is communicating with the cluster via the plugin
117+and that YARN and MapReduce are working as expected:
118+
119+ juju ssh spark/0
120+ ~/terasort.sh
121+ exit
122+
123+### Smoke test HDFS functionality from user space
124+From the Spark unit, delete the MapReduce output previously generated by the
125+`terasort.sh` script:
126+
127+ juju ssh spark/0
128+ hdfs dfs -rm -R /user/ubuntu/tera_demo_out
129+ exit
130+
131+### Smoke test Spark
132+SSH to the Spark unit and run the SparkPi demo as follows:
133+
134+ juju ssh spark/0
135+ ~/sparkpi.sh
136+ exit
137+
138+### Access the IPython Notebook web interface
139+Access the notebook web interface at
140+http://{spark_unit_ip_address}:8880. The ip address can be found by running
141+`juju status spark/0 | grep public-address`.
142+
143+
144+## Scale Out Usage
145+This bundle was designed to scale out. To increase the amount of Compute
146+Slaves, you can add units to the compute-slave service. To add one unit:
147+
148+ juju add-unit compute-slave
149+
150+Or you can add multiple units at once:
151+
152+ juju add-unit -n4 compute-slave
153+
154+
155+## Contact Information
156+
157+- <bigdata-dev@lists.launchpad.net>
158+
159+
160+## Help
161+
162+- [Juju mailing list](https://lists.ubuntu.com/mailman/listinfo/juju)
163+- [Juju community](https://jujucharms.com/community)
164
165=== modified file 'bundle.yaml'
166--- bundle.yaml 2015-07-16 20:35:31 +0000
167+++ bundle.yaml 2016-02-23 21:01:00 +0000
168@@ -1,46 +1,46 @@
169 services:
170 compute-slave:
171- charm: cs:trusty/apache-hadoop-compute-slave
172+ charm: cs:trusty/apache-hadoop-compute-slave-9
173 num_units: 3
174 annotations:
175 gui-x: "300"
176 gui-y: "200"
177- constraints: mem=3G
178+ constraints: mem=7G
179 hdfs-master:
180- charm: cs:trusty/apache-hadoop-hdfs-master
181+ charm: cs:trusty/apache-hadoop-hdfs-master-9
182 num_units: 1
183 annotations:
184 gui-x: "600"
185 gui-y: "350"
186 constraints: mem=7G
187 plugin:
188- charm: cs:trusty/apache-hadoop-plugin
189+ charm: cs:trusty/apache-hadoop-plugin-10
190 annotations:
191 gui-x: "900"
192 gui-y: "200"
193 secondary-namenode:
194- charm: cs:trusty/apache-hadoop-hdfs-secondary
195+ charm: cs:trusty/apache-hadoop-hdfs-secondary-7
196 num_units: 1
197 annotations:
198 gui-x: "600"
199 gui-y: "600"
200 constraints: mem=7G
201 spark:
202- charm: cs:trusty/apache-spark
203+ charm: cs:trusty/apache-spark-6
204 num_units: 1
205 annotations:
206 gui-x: "1200"
207 gui-y: "200"
208 constraints: mem=3G
209 yarn-master:
210- charm: cs:trusty/apache-hadoop-yarn-master
211+ charm: cs:trusty/apache-hadoop-yarn-master-7
212 num_units: 1
213 annotations:
214 gui-x: "600"
215 gui-y: "100"
216 constraints: mem=7G
217 notebook:
218- charm: cs:trusty/apache-spark-notebook
219+ charm: cs:trusty/apache-spark-notebook-3
220 annotations:
221 gui-x: "1200"
222 gui-y: "450"
223@@ -53,4 +53,4 @@
224 - [plugin, yarn-master]
225 - [plugin, hdfs-master]
226 - [spark, plugin]
227- - [notebook, spark]
228+ - [spark, notebook]
229
230=== removed file 'tests/00-setup'
231--- tests/00-setup 2015-07-16 20:35:31 +0000
232+++ tests/00-setup 1970-01-01 00:00:00 +0000
233@@ -1,8 +0,0 @@
234-#!/bin/bash
235-
236-if ! dpkg -s amulet &> /dev/null; then
237- echo Installing Amulet...
238- sudo add-apt-repository -y ppa:juju/stable
239- sudo apt-get update
240- sudo apt-get -y install amulet
241-fi
242
243=== modified file 'tests/01-bundle.py'
244--- tests/01-bundle.py 2015-07-16 20:35:31 +0000
245+++ tests/01-bundle.py 2016-02-23 21:01:00 +0000
246@@ -1,61 +1,32 @@
247 #!/usr/bin/env python3
248
249 import os
250-import time
251 import unittest
252
253 import yaml
254 import amulet
255
256
257-class Base(object):
258- """
259- Base class for tests for Apache Hadoop Bundle.
260- """
261+class TestBundle(unittest.TestCase):
262 bundle_file = os.path.join(os.path.dirname(__file__), '..', 'bundle.yaml')
263- profile_name = None
264
265 @classmethod
266- def deploy(cls):
267- # classmethod inheritance doesn't work quite right with
268- # setUpClass / tearDownClass, so subclasses have to manually call this
269+ def setUpClass(cls):
270 cls.d = amulet.Deployment(series='trusty')
271 with open(cls.bundle_file) as f:
272 bun = f.read()
273- profiles = yaml.safe_load(bun)
274- # amulet always selects the first profile, so we have to fudge it here
275- profile = {cls.profile_name: profiles[cls.profile_name]}
276- cls.d.load(profile)
277- cls.d.setup(timeout=9000)
278- cls.d.sentry.wait()
279- cls.hdfs = cls.d.sentry.unit['hdfs-master/0']
280- cls.yarn = cls.d.sentry.unit['yarn-master/0']
281- cls.slave = cls.d.sentry.unit['compute-slave/0']
282- cls.secondary = cls.d.sentry.unit['secondary-namenode/0']
283- cls.plugin = cls.d.sentry.unit['plugin/0']
284- cls.client = cls.d.sentry.unit['client/0']
285-
286- @classmethod
287- def reset_env(cls):
288- # classmethod inheritance doesn't work quite right with
289- # setUpClass / tearDownClass, so subclasses have to manually call this
290- juju_env = amulet.helpers.default_environment()
291- services = ['hdfs-master', 'yarn-master', 'compute-slave', 'secondary-namenode', 'plugin', 'client']
292-
293- def check_env_clear():
294- state = amulet.waiter.state(juju_env=juju_env)
295- for service in services:
296- if state.get(service, {}) != {}:
297- return False
298- return True
299-
300- for service in services:
301- cls.d.remove(service)
302- with amulet.helpers.timeout(300):
303- while not check_env_clear():
304- time.sleep(5)
305-
306- def test_hadoop_components(self):
307+ bundle = yaml.safe_load(bun)
308+ cls.d.load(bundle)
309+ cls.d.setup(timeout=1800)
310+ cls.d.sentry.wait_for_messages({'notebook': 'Ready'}, timeout=1800)
311+ cls.hdfs = cls.d.sentry['hdfs-master'][0]
312+ cls.yarn = cls.d.sentry['yarn-master'][0]
313+ cls.slave = cls.d.sentry['compute-slave'][0]
314+ cls.secondary = cls.d.sentry['secondary-namenode'][0]
315+ cls.spark = cls.d.sentry['spark'][0]
316+ cls.notebook = cls.d.sentry['notebook'][0]
317+
318+ def test_components(self):
319 """
320 Confirm that all of the required components are up and running.
321 """
322@@ -63,17 +34,48 @@
323 yarn, retcode = self.yarn.run("pgrep -a java")
324 slave, retcode = self.slave.run("pgrep -a java")
325 secondary, retcode = self.secondary.run("pgrep -a java")
326- client, retcode = self.client.run("pgrep -a java")
327+ spark, retcode = self.spark.run("pgrep -a java")
328+ notebook, retcode = self.spark.run("pgrep -a python")
329
330 # .NameNode needs the . to differentiate it from SecondaryNameNode
331 assert '.NameNode' in hdfs, "NameNode not started"
332+ assert '.NameNode' not in yarn, "NameNode should not be running on yarn-master"
333+ assert '.NameNode' not in slave, "NameNode should not be running on compute-slave"
334+ assert '.NameNode' not in secondary, "NameNode should not be running on secondary-namenode"
335+ assert '.NameNode' not in spark, "NameNode should not be running on spark"
336+
337 assert 'ResourceManager' in yarn, "ResourceManager not started"
338+ assert 'ResourceManager' not in hdfs, "ResourceManager should not be running on hdfs-master"
339+ assert 'ResourceManager' not in slave, "ResourceManager should not be running on compute-slave"
340+ assert 'ResourceManager' not in secondary, "ResourceManager should not be running on secondary-namenode"
341+ assert 'ResourceManager' not in spark, "ResourceManager should not be running on spark"
342+
343 assert 'JobHistoryServer' in yarn, "JobHistoryServer not started"
344+ assert 'JobHistoryServer' not in hdfs, "JobHistoryServer should not be running on hdfs-master"
345+ assert 'JobHistoryServer' not in slave, "JobHistoryServer should not be running on compute-slave"
346+ assert 'JobHistoryServer' not in secondary, "JobHistoryServer should not be running on secondary-namenode"
347+ assert 'JobHistoryServer' not in spark, "JobHistoryServer should not be running on spark"
348+
349 assert 'NodeManager' in slave, "NodeManager not started"
350+ assert 'NodeManager' not in yarn, "NodeManager should not be running on yarn-master"
351+ assert 'NodeManager' not in hdfs, "NodeManager should not be running on hdfs-master"
352+ assert 'NodeManager' not in secondary, "NodeManager should not be running on secondary-namenode"
353+ assert 'NodeManager' not in spark, "NodeManager should not be running on spark"
354+
355 assert 'DataNode' in slave, "DataServer not started"
356+ assert 'DataNode' not in yarn, "DataNode should not be running on yarn-master"
357+ assert 'DataNode' not in hdfs, "DataNode should not be running on hdfs-master"
358+ assert 'DataNode' not in secondary, "DataNode should not be running on secondary-namenode"
359+ assert 'DataNode' not in spark, "DataNode should not be running on spark"
360+
361 assert 'SecondaryNameNode' in secondary, "SecondaryNameNode not started"
362+ assert 'SecondaryNameNode' not in yarn, "SecondaryNameNode should not be running on yarn-master"
363+ assert 'SecondaryNameNode' not in hdfs, "SecondaryNameNode should not be running on hdfs-master"
364+ assert 'SecondaryNameNode' not in slave, "SecondaryNameNode should not be running on compute-slave"
365+ assert 'SecondaryNameNode' not in spark, "SecondaryNameNode should not be running on spark"
366
367- return hdfs, yarn, slave, secondary, client # allow subclasses to do additional checks
368+ assert 'spark' in spark, 'Spark should be running on spark'
369+ assert 'notebook' in notebook, 'Notebook should be running on spark'
370
371 def test_hdfs_dir(self):
372 """
373@@ -84,11 +86,11 @@
374
375 NB: These are order-dependent, so must be done as part of a single test case.
376 """
377- output, retcode = self.client.run("su hdfs -c 'hdfs dfs -mkdir -p /user/ubuntu'")
378+ output, retcode = self.spark.run("su hdfs -c 'hdfs dfs -mkdir -p /user/ubuntu'")
379 assert retcode == 0, "Created a user directory on hdfs FAILED:\n{}".format(output)
380- output, retcode = self.client.run("su hdfs -c 'hdfs dfs -chown ubuntu:ubuntu /user/ubuntu'")
381+ output, retcode = self.spark.run("su hdfs -c 'hdfs dfs -chown ubuntu:ubuntu /user/ubuntu'")
382 assert retcode == 0, "Assigning an owner to hdfs directory FAILED:\n{}".format(output)
383- output, retcode = self.client.run("su hdfs -c 'hdfs dfs -chmod -R 755 /user/ubuntu'")
384+ output, retcode = self.spark.run("su hdfs -c 'hdfs dfs -chmod -R 755 /user/ubuntu'")
385 assert retcode == 0, "seting directory permission on hdfs FAILED:\n{}".format(output)
386
387 def test_yarn_mapreduce_exe(self):
388@@ -112,59 +114,15 @@
389 ('cleanup', "su hdfs -c 'hdfs dfs -rm -r /user/ubuntu/teragenout'"),
390 ]
391 for name, step in test_steps:
392- output, retcode = self.client.run(step)
393+ output, retcode = self.spark.run(step)
394 assert retcode == 0, "{} FAILED:\n{}".format(name, output)
395
396-
397-class TestScalable(unittest.TestCase, Base):
398- profile_name = 'apache-core-batch-processing'
399-
400- @classmethod
401- def setUpClass(cls):
402- cls.deploy()
403-
404- @classmethod
405- def tearDownClass(cls):
406- cls.reset_env()
407-
408- def test_hadoop_components(self):
409- """
410- In addition to testing that the components are running where they
411- are supposed to be, confirm that none of them are also running where
412- they shouldn't be.
413- """
414- hdfs, yarn, slave, secondary, client = super(TestScalable, self).test_hadoop_components()
415-
416- # .NameNode needs the . to differentiate it from SecondaryNameNode
417- assert '.NameNode' not in yarn, "NameNode should not be running on yarn-master"
418- assert '.NameNode' not in slave, "NameNode should not be running on compute-slave"
419- assert '.NameNode' not in secondary, "NameNode should not be running on secondary-namenode"
420- assert '.NameNode' not in client, "NameNode should not be running on client"
421-
422- assert 'ResourceManager' not in hdfs, "ResourceManager should not be running on hdfs-master"
423- assert 'ResourceManager' not in slave, "ResourceManager should not be running on compute-slave"
424- assert 'ResourceManager' not in secondary, "ResourceManager should not be running on secondary-namenode"
425- assert 'ResourceManager' not in client, "ResourceManager should not be running on client"
426-
427- assert 'JobHistoryServer' not in hdfs, "JobHistoryServer should not be running on hdfs-master"
428- assert 'JobHistoryServer' not in slave, "JobHistoryServer should not be running on compute-slave"
429- assert 'JobHistoryServer' not in secondary, "JobHistoryServer should not be running on secondary-namenode"
430- assert 'JobHistoryServer' not in client, "JobHistoryServer should not be running on client"
431-
432- assert 'NodeManager' not in yarn, "NodeManager should not be running on yarn-master"
433- assert 'NodeManager' not in hdfs, "NodeManager should not be running on hdfs-master"
434- assert 'NodeManager' not in secondary, "NodeManager should not be running on secondary-namenode"
435- assert 'NodeManager' not in client, "NodeManager should not be running on client"
436-
437- assert 'DataNode' not in yarn, "DataNode should not be running on yarn-master"
438- assert 'DataNode' not in hdfs, "DataNode should not be running on hdfs-master"
439- assert 'DataNode' not in secondary, "DataNode should not be running on secondary-namenode"
440- assert 'DataNode' not in client, "DataNode should not be running on client"
441-
442- assert 'SecondaryNameNode' not in yarn, "SecondaryNameNode should not be running on yarn-master"
443- assert 'SecondaryNameNode' not in hdfs, "SecondaryNameNode should not be running on hdfs-master"
444- assert 'SecondaryNameNode' not in slave, "SecondaryNameNode should not be running on compute-slave"
445- assert 'SecondaryNameNode' not in client, "SecondaryNameNode should not be running on client"
446+ def test_spark(self):
447+ output, retcode = self.spark.run("su ubuntu -c 'bash -lc /home/ubuntu/sparkpi.sh 2>&1'")
448+ assert 'Pi is roughly' in output, 'SparkPI test failed: %s' % output
449+
450+ def test_notebook(self):
451+ pass # requires javascript; how to test?
452
453
454 if __name__ == '__main__':
455
456=== added file 'tests/tests.yaml'
457--- tests/tests.yaml 1970-01-01 00:00:00 +0000
458+++ tests/tests.yaml 2016-02-23 21:01:00 +0000
459@@ -0,0 +1,3 @@
460+reset: false
461+packages:
462+ - amulet
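
The placement asserts that this branch folds into `test_components` in tests/01-bundle.py follow a regular pattern: each Hadoop process must appear in the `pgrep -a java` output of exactly one unit and in none of the others. As a rough sketch only (not part of this branch), the same checks could be driven from a table. The names `EXPECTED` and `placement_errors` are hypothetical, and the input is assumed to be a dict mapping unit names to their `pgrep` output rather than Amulet sentry objects.

```python
# Process marker -> the one service expected to run it, taken from the
# asserts in test_components. ".NameNode" keeps its leading dot so it does
# not match SecondaryNameNode.
EXPECTED = {
    ".NameNode": "hdfs-master",
    "ResourceManager": "yarn-master",
    "JobHistoryServer": "yarn-master",
    "NodeManager": "compute-slave",
    "DataNode": "compute-slave",
    "SecondaryNameNode": "secondary-namenode",
}


def placement_errors(pgrep_output):
    """Return a list of messages for missing or misplaced processes.

    `pgrep_output` maps a service name to the text printed by
    `pgrep -a java` on one of its units.
    """
    errors = []
    for marker, home in EXPECTED.items():
        label = marker.lstrip(".")
        for unit, text in pgrep_output.items():
            running = marker in text
            if unit == home and not running:
                errors.append("{} not started on {}".format(label, home))
            elif unit != home and running:
                errors.append(
                    "{} should not be running on {}".format(label, unit))
    return errors


# Hypothetical pgrep output for a healthy deployment.
HEALTHY = {
    "hdfs-master": "101 java org.apache.hadoop.hdfs.server.namenode.NameNode",
    "yarn-master": (
        "201 java org.apache.hadoop.yarn.server.resourcemanager.ResourceManager\n"
        "202 java org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer"
    ),
    "compute-slave": (
        "301 java org.apache.hadoop.yarn.server.nodemanager.NodeManager\n"
        "302 java org.apache.hadoop.hdfs.server.datanode.DataNode"
    ),
    "secondary-namenode": (
        "401 java org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode"
    ),
    "spark": "501 java org.apache.spark.deploy.master.Master",
}

if __name__ == "__main__":
    print(placement_errors(HEALTHY))
```

A table like this keeps the expected layout in one place, at the cost of less specific failure messages than the hand-written asserts in the diff.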
