Merge lp:~bigdata-dev/charms/bundles/apache-hadoop-spark-notebook/trunk into lp:~charmers/charms/bundles/apache-hadoop-spark-notebook/bundle

Proposed by Kevin W Monroe on 2016-02-23
Status: Merged
Merged at revision: 5
Proposed branch: lp:~bigdata-dev/charms/bundles/apache-hadoop-spark-notebook/trunk
Merge into: lp:~charmers/charms/bundles/apache-hadoop-spark-notebook/bundle
Diff against target: 462 lines (+147/-194)
5 files modified
README.md (+78/-78)
bundle.yaml (+9/-9)
tests/00-setup (+0/-8)
tests/01-bundle.py (+57/-99)
tests/tests.yaml (+3/-0)
To merge this branch: bzr merge lp:~bigdata-dev/charms/bundles/apache-hadoop-spark-notebook/trunk
Reviewer: Kevin W Monroe — Approve on 2016-02-23
Review via email: mp+286952@code.launchpad.net

Description of the change

updates from bigdata-dev:
- version lock charms in bundle.yaml
- update bundle tests
- fix README formatting
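"Version lock" here means pinning each charm in bundle.yaml to an explicit charm-store revision (e.g. `cs:trusty/apache-hadoop-plugin-10` instead of `cs:trusty/apache-hadoop-plugin`). As a rough illustration only — the helper name and regex below are mine, not part of this branch — a checker for unpinned charm URLs might look like:

```python
import re

# A charm URL is "pinned" when it ends in an explicit revision,
# e.g. cs:trusty/apache-hadoop-plugin-10 rather than cs:trusty/apache-hadoop-plugin.
PINNED = re.compile(r'^cs:[\w-]+/[\w-]+-\d+$')

def unpinned_charms(bundle_text):
    """Return charm URLs from bundle YAML text that lack a revision suffix."""
    charms = re.findall(r'charm:\s*(\S+)', bundle_text)
    return [c for c in charms if not PINNED.match(c)]

# Fabricated bundle fragment for demonstration:
sample = """
services:
  plugin:
    charm: cs:trusty/apache-hadoop-plugin-10
  spark:
    charm: cs:trusty/apache-spark
"""
print(unpinned_charms(sample))  # ['cs:trusty/apache-spark']
```

Pinning revisions makes a deployed bundle reproducible: later charm-store uploads cannot silently change what `juju quickstart` deploys.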

Kevin W Monroe (kwmonroe) wrote :

+1, test deploy on AWS succeeded

review: Approve

Preview Diff

=== modified file 'README.md'
--- README.md 2015-07-09 15:09:34 +0000
+++ README.md 2016-02-23 21:01:00 +0000
@@ -16,81 +16,81 @@
  - 1 Notebook (colocated on the Spark unit)
 
 
 ## Usage
 Deploy this bundle using juju-quickstart:
 
-    juju quickstart u/bigdata-dev/apache-hadoop-spark-notebook
+    juju quickstart apache-hadoop-spark-notebook
 
 See `juju quickstart --help` for deployment options, including machine
 constraints and how to deploy a locally modified version of the
 apache-hadoop-spark-notebook bundle.yaml.
 
 
 ## Testing the deployment
 
 ### Smoke test HDFS admin functionality
 Once the deployment is complete and the cluster is running, ssh to the HDFS
 Master unit:
 
     juju ssh hdfs-master/0
 
 As the `ubuntu` user, create a temporary directory on the Hadoop file system.
 The steps below verify HDFS functionality:
 
     hdfs dfs -mkdir -p /tmp/hdfs-test
     hdfs dfs -chmod -R 777 /tmp/hdfs-test
     hdfs dfs -ls /tmp # verify the newly created hdfs-test subdirectory exists
     hdfs dfs -rm -R /tmp/hdfs-test
     hdfs dfs -ls /tmp # verify the hdfs-test subdirectory has been removed
     exit
 
 ### Smoke test YARN and MapReduce
 Run the `terasort.sh` script from the Spark unit to generate and sort data. The
 steps below verify that Spark is communicating with the cluster via the plugin
 and that YARN and MapReduce are working as expected:
 
     juju ssh spark/0
     ~/terasort.sh
     exit
 
 ### Smoke test HDFS functionality from user space
 From the Spark unit, delete the MapReduce output previously generated by the
 `terasort.sh` script:
 
     juju ssh spark/0
     hdfs dfs -rm -R /user/ubuntu/tera_demo_out
     exit
 
 ### Smoke test Spark
 SSH to the Spark unit and run the SparkPi demo as follows:
 
     juju ssh spark/0
     ~/sparkpi.sh
     exit
 
 ### Access the IPython Notebook web interface
 Access the notebook web interface at
 http://{spark_unit_ip_address}:8880. The ip address can be found by running
 `juju status spark/0 | grep public-address`.
 
 
 ## Scale Out Usage
 This bundle was designed to scale out. To increase the amount of Compute
 Slaves, you can add units to the compute-slave service. To add one unit:
 
     juju add-unit compute-slave
 
 Or you can add multiple units at once:
 
     juju add-unit -n4 compute-slave
 
 
 ## Contact Information
 
 - <bigdata-dev@lists.launchpad.net>
 
 
 ## Help
 
 - [Juju mailing list](https://lists.ubuntu.com/mailman/listinfo/juju)
 - [Juju community](https://jujucharms.com/community)
 
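The README's last step extracts the Spark unit's public address by grepping `juju status` output to build the notebook URL. As a sketch of that extraction (the status text below is a fabricated stand-in for real `juju status spark/0` output, and `notebook_url` is a hypothetical helper, not part of this branch):

```python
import re

# Fabricated fragment resembling `juju status spark/0` YAML output;
# the public-address line is what the README's grep pulls out.
status = """\
services:
  spark:
    units:
      spark/0:
        agent-state: started
        public-address: 10.0.3.47
"""

def notebook_url(status_text, port=8880):
    """Build the notebook URL from the unit's public-address line."""
    m = re.search(r'public-address:\s*(\S+)', status_text)
    if m is None:
        raise ValueError('no public-address in status output')
    return 'http://{}:{}'.format(m.group(1), port)

print(notebook_url(status))  # http://10.0.3.47:8880
```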
=== modified file 'bundle.yaml'
--- bundle.yaml 2015-07-16 20:35:31 +0000
+++ bundle.yaml 2016-02-23 21:01:00 +0000
@@ -1,46 +1,46 @@
 services:
   compute-slave:
-    charm: cs:trusty/apache-hadoop-compute-slave
+    charm: cs:trusty/apache-hadoop-compute-slave-9
     num_units: 3
     annotations:
       gui-x: "300"
       gui-y: "200"
-    constraints: mem=3G
+    constraints: mem=7G
   hdfs-master:
-    charm: cs:trusty/apache-hadoop-hdfs-master
+    charm: cs:trusty/apache-hadoop-hdfs-master-9
     num_units: 1
     annotations:
       gui-x: "600"
       gui-y: "350"
     constraints: mem=7G
   plugin:
-    charm: cs:trusty/apache-hadoop-plugin
+    charm: cs:trusty/apache-hadoop-plugin-10
     annotations:
       gui-x: "900"
       gui-y: "200"
   secondary-namenode:
-    charm: cs:trusty/apache-hadoop-hdfs-secondary
+    charm: cs:trusty/apache-hadoop-hdfs-secondary-7
     num_units: 1
     annotations:
       gui-x: "600"
       gui-y: "600"
     constraints: mem=7G
   spark:
-    charm: cs:trusty/apache-spark
+    charm: cs:trusty/apache-spark-6
     num_units: 1
     annotations:
       gui-x: "1200"
       gui-y: "200"
     constraints: mem=3G
   yarn-master:
-    charm: cs:trusty/apache-hadoop-yarn-master
+    charm: cs:trusty/apache-hadoop-yarn-master-7
     num_units: 1
     annotations:
       gui-x: "600"
       gui-y: "100"
     constraints: mem=7G
   notebook:
-    charm: cs:trusty/apache-spark-notebook
+    charm: cs:trusty/apache-spark-notebook-3
     annotations:
       gui-x: "1200"
       gui-y: "450"
@@ -53,4 +53,4 @@
   - [plugin, yarn-master]
   - [plugin, hdfs-master]
   - [spark, plugin]
-  - [notebook, spark]
+  - [spark, notebook]
 
=== removed file 'tests/00-setup'
--- tests/00-setup 2015-07-16 20:35:31 +0000
+++ tests/00-setup 1970-01-01 00:00:00 +0000
@@ -1,8 +0,0 @@
-#!/bin/bash
-
-if ! dpkg -s amulet &> /dev/null; then
-    echo Installing Amulet...
-    sudo add-apt-repository -y ppa:juju/stable
-    sudo apt-get update
-    sudo apt-get -y install amulet
-fi
=== modified file 'tests/01-bundle.py'
--- tests/01-bundle.py 2015-07-16 20:35:31 +0000
+++ tests/01-bundle.py 2016-02-23 21:01:00 +0000
@@ -1,61 +1,32 @@
 #!/usr/bin/env python3
 
 import os
-import time
 import unittest
 
 import yaml
 import amulet
 
 
-class Base(object):
-    """
-    Base class for tests for Apache Hadoop Bundle.
-    """
+class TestBundle(unittest.TestCase):
     bundle_file = os.path.join(os.path.dirname(__file__), '..', 'bundle.yaml')
-    profile_name = None
 
     @classmethod
-    def deploy(cls):
-        # classmethod inheritance doesn't work quite right with
-        # setUpClass / tearDownClass, so subclasses have to manually call this
+    def setUpClass(cls):
         cls.d = amulet.Deployment(series='trusty')
         with open(cls.bundle_file) as f:
             bun = f.read()
-        profiles = yaml.safe_load(bun)
-        # amulet always selects the first profile, so we have to fudge it here
-        profile = {cls.profile_name: profiles[cls.profile_name]}
-        cls.d.load(profile)
-        cls.d.setup(timeout=9000)
-        cls.d.sentry.wait()
-        cls.hdfs = cls.d.sentry.unit['hdfs-master/0']
-        cls.yarn = cls.d.sentry.unit['yarn-master/0']
-        cls.slave = cls.d.sentry.unit['compute-slave/0']
-        cls.secondary = cls.d.sentry.unit['secondary-namenode/0']
-        cls.plugin = cls.d.sentry.unit['plugin/0']
-        cls.client = cls.d.sentry.unit['client/0']
-
-    @classmethod
-    def reset_env(cls):
-        # classmethod inheritance doesn't work quite right with
-        # setUpClass / tearDownClass, so subclasses have to manually call this
-        juju_env = amulet.helpers.default_environment()
-        services = ['hdfs-master', 'yarn-master', 'compute-slave', 'secondary-namenode', 'plugin', 'client']
-
-        def check_env_clear():
-            state = amulet.waiter.state(juju_env=juju_env)
-            for service in services:
-                if state.get(service, {}) != {}:
-                    return False
-            return True
-
-        for service in services:
-            cls.d.remove(service)
-        with amulet.helpers.timeout(300):
-            while not check_env_clear():
-                time.sleep(5)
-
-    def test_hadoop_components(self):
+        bundle = yaml.safe_load(bun)
+        cls.d.load(bundle)
+        cls.d.setup(timeout=1800)
+        cls.d.sentry.wait_for_messages({'notebook': 'Ready'}, timeout=1800)
+        cls.hdfs = cls.d.sentry['hdfs-master'][0]
+        cls.yarn = cls.d.sentry['yarn-master'][0]
+        cls.slave = cls.d.sentry['compute-slave'][0]
+        cls.secondary = cls.d.sentry['secondary-namenode'][0]
+        cls.spark = cls.d.sentry['spark'][0]
+        cls.notebook = cls.d.sentry['notebook'][0]
+
+    def test_components(self):
         """
         Confirm that all of the required components are up and running.
         """
@@ -63,17 +34,48 @@
         yarn, retcode = self.yarn.run("pgrep -a java")
         slave, retcode = self.slave.run("pgrep -a java")
         secondary, retcode = self.secondary.run("pgrep -a java")
-        client, retcode = self.client.run("pgrep -a java")
+        spark, retcode = self.spark.run("pgrep -a java")
+        notebook, retcode = self.spark.run("pgrep -a python")
 
         # .NameNode needs the . to differentiate it from SecondaryNameNode
         assert '.NameNode' in hdfs, "NameNode not started"
+        assert '.NameNode' not in yarn, "NameNode should not be running on yarn-master"
+        assert '.NameNode' not in slave, "NameNode should not be running on compute-slave"
+        assert '.NameNode' not in secondary, "NameNode should not be running on secondary-namenode"
+        assert '.NameNode' not in spark, "NameNode should not be running on spark"
+
         assert 'ResourceManager' in yarn, "ResourceManager not started"
+        assert 'ResourceManager' not in hdfs, "ResourceManager should not be running on hdfs-master"
+        assert 'ResourceManager' not in slave, "ResourceManager should not be running on compute-slave"
+        assert 'ResourceManager' not in secondary, "ResourceManager should not be running on secondary-namenode"
+        assert 'ResourceManager' not in spark, "ResourceManager should not be running on spark"
+
         assert 'JobHistoryServer' in yarn, "JobHistoryServer not started"
+        assert 'JobHistoryServer' not in hdfs, "JobHistoryServer should not be running on hdfs-master"
+        assert 'JobHistoryServer' not in slave, "JobHistoryServer should not be running on compute-slave"
+        assert 'JobHistoryServer' not in secondary, "JobHistoryServer should not be running on secondary-namenode"
+        assert 'JobHistoryServer' not in spark, "JobHistoryServer should not be running on spark"
+
         assert 'NodeManager' in slave, "NodeManager not started"
+        assert 'NodeManager' not in yarn, "NodeManager should not be running on yarn-master"
+        assert 'NodeManager' not in hdfs, "NodeManager should not be running on hdfs-master"
+        assert 'NodeManager' not in secondary, "NodeManager should not be running on secondary-namenode"
+        assert 'NodeManager' not in spark, "NodeManager should not be running on spark"
+
         assert 'DataNode' in slave, "DataServer not started"
+        assert 'DataNode' not in yarn, "DataNode should not be running on yarn-master"
+        assert 'DataNode' not in hdfs, "DataNode should not be running on hdfs-master"
+        assert 'DataNode' not in secondary, "DataNode should not be running on secondary-namenode"
+        assert 'DataNode' not in spark, "DataNode should not be running on spark"
+
         assert 'SecondaryNameNode' in secondary, "SecondaryNameNode not started"
+        assert 'SecondaryNameNode' not in yarn, "SecondaryNameNode should not be running on yarn-master"
+        assert 'SecondaryNameNode' not in hdfs, "SecondaryNameNode should not be running on hdfs-master"
+        assert 'SecondaryNameNode' not in slave, "SecondaryNameNode should not be running on compute-slave"
+        assert 'SecondaryNameNode' not in spark, "SecondaryNameNode should not be running on spark"
 
-        return hdfs, yarn, slave, secondary, client  # allow subclasses to do additional checks
+        assert 'spark' in spark, 'Spark should be running on spark'
+        assert 'notebook' in notebook, 'Notebook should be running on spark'
 
     def test_hdfs_dir(self):
         """
@@ -84,11 +86,11 @@
 
         NB: These are order-dependent, so must be done as part of a single test case.
         """
-        output, retcode = self.client.run("su hdfs -c 'hdfs dfs -mkdir -p /user/ubuntu'")
+        output, retcode = self.spark.run("su hdfs -c 'hdfs dfs -mkdir -p /user/ubuntu'")
         assert retcode == 0, "Created a user directory on hdfs FAILED:\n{}".format(output)
-        output, retcode = self.client.run("su hdfs -c 'hdfs dfs -chown ubuntu:ubuntu /user/ubuntu'")
+        output, retcode = self.spark.run("su hdfs -c 'hdfs dfs -chown ubuntu:ubuntu /user/ubuntu'")
         assert retcode == 0, "Assigning an owner to hdfs directory FAILED:\n{}".format(output)
-        output, retcode = self.client.run("su hdfs -c 'hdfs dfs -chmod -R 755 /user/ubuntu'")
+        output, retcode = self.spark.run("su hdfs -c 'hdfs dfs -chmod -R 755 /user/ubuntu'")
         assert retcode == 0, "seting directory permission on hdfs FAILED:\n{}".format(output)
 
     def test_yarn_mapreduce_exe(self):
@@ -112,59 +114,15 @@
             ('cleanup', "su hdfs -c 'hdfs dfs -rm -r /user/ubuntu/teragenout'"),
         ]
         for name, step in test_steps:
-            output, retcode = self.client.run(step)
+            output, retcode = self.spark.run(step)
             assert retcode == 0, "{} FAILED:\n{}".format(name, output)
 
-
-class TestScalable(unittest.TestCase, Base):
-    profile_name = 'apache-core-batch-processing'
-
-    @classmethod
-    def setUpClass(cls):
-        cls.deploy()
-
-    @classmethod
-    def tearDownClass(cls):
-        cls.reset_env()
-
-    def test_hadoop_components(self):
-        """
-        In addition to testing that the components are running where they
-        are supposed to be, confirm that none of them are also running where
-        they shouldn't be.
-        """
-        hdfs, yarn, slave, secondary, client = super(TestScalable, self).test_hadoop_components()
-
-        # .NameNode needs the . to differentiate it from SecondaryNameNode
-        assert '.NameNode' not in yarn, "NameNode should not be running on yarn-master"
-        assert '.NameNode' not in slave, "NameNode should not be running on compute-slave"
-        assert '.NameNode' not in secondary, "NameNode should not be running on secondary-namenode"
-        assert '.NameNode' not in client, "NameNode should not be running on client"
-
-        assert 'ResourceManager' not in hdfs, "ResourceManager should not be running on hdfs-master"
-        assert 'ResourceManager' not in slave, "ResourceManager should not be running on compute-slave"
-        assert 'ResourceManager' not in secondary, "ResourceManager should not be running on secondary-namenode"
-        assert 'ResourceManager' not in client, "ResourceManager should not be running on client"
-
-        assert 'JobHistoryServer' not in hdfs, "JobHistoryServer should not be running on hdfs-master"
-        assert 'JobHistoryServer' not in slave, "JobHistoryServer should not be running on compute-slave"
-        assert 'JobHistoryServer' not in secondary, "JobHistoryServer should not be running on secondary-namenode"
-        assert 'JobHistoryServer' not in client, "JobHistoryServer should not be running on client"
-
-        assert 'NodeManager' not in yarn, "NodeManager should not be running on yarn-master"
-        assert 'NodeManager' not in hdfs, "NodeManager should not be running on hdfs-master"
-        assert 'NodeManager' not in secondary, "NodeManager should not be running on secondary-namenode"
-        assert 'NodeManager' not in client, "NodeManager should not be running on client"
-
-        assert 'DataNode' not in yarn, "DataNode should not be running on yarn-master"
-        assert 'DataNode' not in hdfs, "DataNode should not be running on hdfs-master"
-        assert 'DataNode' not in secondary, "DataNode should not be running on secondary-namenode"
-        assert 'DataNode' not in client, "DataNode should not be running on client"
-
-        assert 'SecondaryNameNode' not in yarn, "SecondaryNameNode should not be running on yarn-master"
-        assert 'SecondaryNameNode' not in hdfs, "SecondaryNameNode should not be running on hdfs-master"
-        assert 'SecondaryNameNode' not in slave, "SecondaryNameNode should not be running on compute-slave"
-        assert 'SecondaryNameNode' not in client, "SecondaryNameNode should not be running on client"
+    def test_spark(self):
+        output, retcode = self.spark.run("su ubuntu -c 'bash -lc /home/ubuntu/sparkpi.sh 2>&1'")
+        assert 'Pi is roughly' in output, 'SparkPI test failed: %s' % output
+
+    def test_notebook(self):
+        pass  # requires javascript; how to test?
 
 
 if __name__ == '__main__':
 
=== added file 'tests/tests.yaml'
--- tests/tests.yaml 1970-01-01 00:00:00 +0000
+++ tests/tests.yaml 2016-02-23 21:01:00 +0000
@@ -0,0 +1,3 @@
+reset: false
+packages:
+  - amulet
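The refactored test folds the old TestScalable placement checks into a single pass: each daemon must appear in the `pgrep` output of exactly one unit and nowhere else. The core of that check can be sketched independently of amulet (the `pgrep` outputs and helper below are fabricated for illustration and are not part of this branch):

```python
# Sketch of the placement check the updated test performs: each Hadoop
# daemon must show up in the `pgrep -a java` output of exactly one unit.
# '.NameNode' keeps its leading dot to avoid matching SecondaryNameNode.
EXPECTED = {
    '.NameNode': 'hdfs-master',
    'ResourceManager': 'yarn-master',
    'JobHistoryServer': 'yarn-master',
    'NodeManager': 'compute-slave',
    'DataNode': 'compute-slave',
    'SecondaryNameNode': 'secondary-namenode',
}

def placement_errors(pgrep_by_unit):
    """Return error strings for daemons missing or running on the wrong unit."""
    errors = []
    for daemon, home in EXPECTED.items():
        for unit, output in pgrep_by_unit.items():
            if unit == home and daemon not in output:
                errors.append('{} not started on {}'.format(daemon, unit))
            if unit != home and daemon in output:
                errors.append('{} should not be running on {}'.format(daemon, unit))
    return errors

# Fabricated stand-ins for real per-unit pgrep output:
fake = {
    'hdfs-master': '1234 java org.apache.hadoop.hdfs.server.namenode.NameNode',
    'yarn-master': '2345 java ResourceManager\n2346 java JobHistoryServer',
    'compute-slave': '3456 java NodeManager\n3457 java DataNode',
    'secondary-namenode': '4567 java SecondaryNameNode',
    'spark': '5678 java org.apache.spark.deploy.master.Master',
}
print(placement_errors(fake))  # [] when every daemon is where it belongs
```

Driving all placement assertions from one table keeps the test symmetric: adding a unit to the bundle only requires adding its pgrep output, and every daemon is automatically checked against it.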
