Juju Charms Collection
apache-hadoop-spark-notebook package

Merge lp:~bigdata-dev/charms/bundles/apache-hadoop-spark-notebook/trunk into lp:~charmers/charms/bundles/apache-hadoop-spark-notebook/bundle

Charm Bundles (0.0)
trunk
Merge into bundle

Proposed by Kevin W Monroe on 2016-02-23

Status: Merged

Merged at revision:

Proposed branch: lp:~bigdata-dev/charms/bundles/apache-hadoop-spark-notebook/trunk

Merge into: lp:~charmers/charms/bundles/apache-hadoop-spark-notebook/bundle

Diff against target: 462 lines (+147/-194), 5 files modified

README.md (+78/-78)
bundle.yaml (+9/-9)
tests/00-setup (+0/-8)
tests/01-bundle.py (+57/-99)
tests/tests.yaml (+3/-0)

To merge this branch:

bzr merge lp:~bigdata-dev/charms/bundles/apache-hadoop-spark-notebook/trunk


Reviewer	Review Type	Date Requested	Status
Kevin W Monroe			Approve on 2016-02-23
Review via email: mp+286952@code.launchpad.net

Description of the change

updates from bigdata-dev:
- version lock charms in bundle.yaml
- update bundle tests
- fix README formatting
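
Version locking here means each `charm:` URL in bundle.yaml now ends in an explicit charm store revision (for example `cs:trusty/apache-spark-6` instead of `cs:trusty/apache-spark`), so every deploy of the bundle pulls the same charm revisions. Below is a minimal Python sketch of a check for this. It assumes a URL counts as pinned when it ends in `-<digits>`; the `unpinned_charms` helper is hypothetical and not part of the bundle's tests. It works on a plain dict so it needs only the standard library; in practice the dict would come from loading bundle.yaml (for example with `yaml.safe_load`, as tests/01-bundle.py does) and taking its `services` key.

```python
import re

# Assumption: a charm store URL is "version locked" when its final path
# segment ends in -<revision>, e.g. cs:trusty/apache-spark-6.
PINNED_URL = re.compile(r"^cs:[a-z0-9]+/[a-z0-9-]+-\d+$")


def unpinned_charms(services):
    """Return the sorted names of services whose charm URL has no revision."""
    return sorted(
        name for name, spec in services.items()
        if not PINNED_URL.match(spec["charm"])
    )


# Charm URLs from bundle.yaml after this merge.
AFTER = {
    "compute-slave": {"charm": "cs:trusty/apache-hadoop-compute-slave-9"},
    "hdfs-master": {"charm": "cs:trusty/apache-hadoop-hdfs-master-9"},
    "plugin": {"charm": "cs:trusty/apache-hadoop-plugin-10"},
    "secondary-namenode": {"charm": "cs:trusty/apache-hadoop-hdfs-secondary-7"},
    "spark": {"charm": "cs:trusty/apache-spark-6"},
    "yarn-master": {"charm": "cs:trusty/apache-hadoop-yarn-master-7"},
    "notebook": {"charm": "cs:trusty/apache-spark-notebook-3"},
}

# The same services before the merge: strip the trailing -<revision>.
BEFORE = {
    name: {"charm": re.sub(r"-\d+$", "", spec["charm"])}
    for name, spec in AFTER.items()
}

if __name__ == "__main__":
    print("after:", unpinned_charms(AFTER))    # no unpinned services
    print("before:", unpinned_charms(BEFORE))  # all seven float
```

One limitation of the suffix heuristic: a charm whose name itself ended in `-<digits>` would be misread as pinned, so a stricter check would parse the URL rather than pattern-match it.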

Revision history for this message

Kevin W Monroe (kwmonroe) wrote on 2016-02-23:

+1, test deploy on AWS succeeded

review: Approve

Preview Diff


Subscribers

People subscribed via source and target branches

to all changes:

Big Data Charmers

Juju Big Data Development

charmers

 === modified file 'README.md'
 --- README.md	2015-07-09 15:09:34 +0000
 +++ README.md	2016-02-23 21:01:00 +0000
@@ -16,81 +16,81 @@
    - 1 Notebook (colocated on the Spark unit)
--  ## Usage
--  Deploy this bundle using juju-quickstart:
--
--      juju quickstart u/bigdata-dev/apache-hadoop-spark-notebook
--
--  See `juju quickstart --help` for deployment options, including machine
--  constraints and how to deploy a locally modified version of the
--  apache-hadoop-spark-notebook bundle.yaml.
--
--
--  ## Testing the deployment
--
--  ### Smoke test HDFS admin functionality
--  Once the deployment is complete and the cluster is running, ssh to the HDFS
--  Master unit:
--
--      juju ssh hdfs-master/0
--
--  As the `ubuntu` user, create a temporary directory on the Hadoop file system.
--  The steps below verify HDFS functionality:
--
--      hdfs dfs -mkdir -p /tmp/hdfs-test
--      hdfs dfs -chmod -R 777 /tmp/hdfs-test
--      hdfs dfs -ls /tmp # verify the newly created hdfs-test subdirectory exists
--      hdfs dfs -rm -R /tmp/hdfs-test
--      hdfs dfs -ls /tmp # verify the hdfs-test subdirectory has been removed
--      exit
--
--  ### Smoke test YARN and MapReduce
--  Run the `terasort.sh` script from the Spark unit to generate and sort data. The
--  steps below verify that Spark is communicating with the cluster via the plugin
--  and that YARN and MapReduce are working as expected:
--
--      juju ssh spark/0
--      ~/terasort.sh
--      exit
--
--  ### Smoke test HDFS functionality from user space
--  From the Spark unit, delete the MapReduce output previously generated by the
--  `terasort.sh` script:
--
--      juju ssh spark/0
--      hdfs dfs -rm -R /user/ubuntu/tera_demo_out
--      exit
--
--  ### Smoke test Spark
--  SSH to the Spark unit and run the SparkPi demo as follows:
--
--      juju ssh spark/0
--      ~/sparkpi.sh
--      exit
--
--  ### Access the IPython Notebook web interface
--  Access the notebook web interface at
--  http://{spark_unit_ip_address}:8880. The ip address can be found by running
--  `juju status spark/0 | grep public-address`.
--
--
--  ## Scale Out Usage
--  This bundle was designed to scale out. To increase the amount of Compute
--  Slaves, you can add units to the compute-slave service. To add one unit:
--
--      juju add-unit compute-slave
--
--  Or you can add multiple units at once:
--
--      juju add-unit -n4 compute-slave
--
--
--  ## Contact Information
--
--  - <bigdata-dev@lists.launchpad.net>
--
--
--  ## Help
--
--  - [Juju mailing list](https://lists.ubuntu.com/mailman/listinfo/juju)
--  - [Juju community](https://jujucharms.com/community)
++## Usage
++Deploy this bundle using juju-quickstart:
++
++    juju quickstart apache-hadoop-spark-notebook
++
++See `juju quickstart --help` for deployment options, including machine
++constraints and how to deploy a locally modified version of the
++apache-hadoop-spark-notebook bundle.yaml.
++
++
++## Testing the deployment
++
++### Smoke test HDFS admin functionality
++Once the deployment is complete and the cluster is running, ssh to the HDFS
++Master unit:
++
++    juju ssh hdfs-master/0
++
++As the `ubuntu` user, create a temporary directory on the Hadoop file system.
++The steps below verify HDFS functionality:
++
++    hdfs dfs -mkdir -p /tmp/hdfs-test
++    hdfs dfs -chmod -R 777 /tmp/hdfs-test
++    hdfs dfs -ls /tmp # verify the newly created hdfs-test subdirectory exists
++    hdfs dfs -rm -R /tmp/hdfs-test
++    hdfs dfs -ls /tmp # verify the hdfs-test subdirectory has been removed
++    exit
++
++### Smoke test YARN and MapReduce
++Run the `terasort.sh` script from the Spark unit to generate and sort data. The
++steps below verify that Spark is communicating with the cluster via the plugin
++and that YARN and MapReduce are working as expected:
++
++    juju ssh spark/0
++    ~/terasort.sh
++    exit
++
++### Smoke test HDFS functionality from user space
++From the Spark unit, delete the MapReduce output previously generated by the
++`terasort.sh` script:
++
++    juju ssh spark/0
++    hdfs dfs -rm -R /user/ubuntu/tera_demo_out
++    exit
++
++### Smoke test Spark
++SSH to the Spark unit and run the SparkPi demo as follows:
++
++    juju ssh spark/0
++    ~/sparkpi.sh
++    exit
++
++### Access the IPython Notebook web interface
++Access the notebook web interface at
++http://{spark_unit_ip_address}:8880. The ip address can be found by running
++`juju status spark/0 | grep public-address`.
++
++
++## Scale Out Usage
++This bundle was designed to scale out. To increase the amount of Compute
++Slaves, you can add units to the compute-slave service. To add one unit:
++
++    juju add-unit compute-slave
++
++Or you can add multiple units at once:
++
++    juju add-unit -n4 compute-slave
++
++
++## Contact Information
++
++- <bigdata-dev@lists.launchpad.net>
++
++
++## Help
++
++- [Juju mailing list](https://lists.ubuntu.com/mailman/listinfo/juju)
++- [Juju community](https://jujucharms.com/community)
 === modified file 'bundle.yaml'
 --- bundle.yaml	2015-07-16 20:35:31 +0000
 +++ bundle.yaml	2016-02-23 21:01:00 +0000
@@ -1,46 +1,46 @@
  services:
    compute-slave:
--    charm: cs:trusty/apache-hadoop-compute-slave
++    charm: cs:trusty/apache-hadoop-compute-slave-9
      num_units: 3
      annotations:
        gui-x: "300"
        gui-y: "200"
--    constraints: mem=3G
++    constraints: mem=7G
    hdfs-master:
--    charm: cs:trusty/apache-hadoop-hdfs-master
++    charm: cs:trusty/apache-hadoop-hdfs-master-9
      num_units: 1
      annotations:
        gui-x: "600"
        gui-y: "350"
      constraints: mem=7G
    plugin:
--    charm: cs:trusty/apache-hadoop-plugin
++    charm: cs:trusty/apache-hadoop-plugin-10
      annotations:
        gui-x: "900"
        gui-y: "200"
    secondary-namenode:
--    charm: cs:trusty/apache-hadoop-hdfs-secondary
++    charm: cs:trusty/apache-hadoop-hdfs-secondary-7
      num_units: 1
      annotations:
        gui-x: "600"
        gui-y: "600"
      constraints: mem=7G
    spark:
--    charm: cs:trusty/apache-spark
++    charm: cs:trusty/apache-spark-6
      num_units: 1
      annotations:
        gui-x: "1200"
        gui-y: "200"
      constraints: mem=3G
    yarn-master:
--    charm: cs:trusty/apache-hadoop-yarn-master
++    charm: cs:trusty/apache-hadoop-yarn-master-7
      num_units: 1
      annotations:
        gui-x: "600"
        gui-y: "100"
      constraints: mem=7G
    notebook:
--    charm: cs:trusty/apache-spark-notebook
++    charm: cs:trusty/apache-spark-notebook-3
      annotations:
        gui-x: "1200"
        gui-y: "450"
@@ -53,4 +53,4 @@
    - [plugin, yarn-master]
    - [plugin, hdfs-master]
    - [spark, plugin]
--  - [notebook, spark]
++  - [spark, notebook]
 === removed file 'tests/00-setup'
 --- tests/00-setup	2015-07-16 20:35:31 +0000
 +++ tests/00-setup	1970-01-01 00:00:00 +0000
@@ -1,8 +0,0 @@
--#!/bin/bash
--
--if ! dpkg -s amulet &> /dev/null; then
--    echo Installing Amulet...
--    sudo add-apt-repository -y ppa:juju/stable
--    sudo apt-get update
--    sudo apt-get -y install amulet
--fi
 === modified file 'tests/01-bundle.py'
 --- tests/01-bundle.py	2015-07-16 20:35:31 +0000
 +++ tests/01-bundle.py	2016-02-23 21:01:00 +0000
@@ -1,61 +1,32 @@
  #!/usr/bin/env python3
  import os
--import time
  import unittest
  import yaml
  import amulet
--class Base(object):
--    """
--    Base class for tests for Apache Hadoop Bundle.
--    """
++class TestBundle(unittest.TestCase):
      bundle_file = os.path.join(os.path.dirname(__file__), '..', 'bundle.yaml')
--    profile_name = None
      @classmethod
--    def deploy(cls):
--        # classmethod inheritance doesn't work quite right with
--        # setUpClass / tearDownClass, so subclasses have to manually call this
++    def setUpClass(cls):
          cls.d = amulet.Deployment(series='trusty')
          with open(cls.bundle_file) as f:
              bun = f.read()
--        profiles = yaml.safe_load(bun)
--        # amulet always selects the first profile, so we have to fudge it here
--        profile = {cls.profile_name: profiles[cls.profile_name]}
--        cls.d.load(profile)
--        cls.d.setup(timeout=9000)
--        cls.d.sentry.wait()
--        cls.hdfs = cls.d.sentry.unit['hdfs-master/0']
--        cls.yarn = cls.d.sentry.unit['yarn-master/0']
--        cls.slave = cls.d.sentry.unit['compute-slave/0']
--        cls.secondary = cls.d.sentry.unit['secondary-namenode/0']
--        cls.plugin = cls.d.sentry.unit['plugin/0']
--        cls.client = cls.d.sentry.unit['client/0']
--
--    @classmethod
--    def reset_env(cls):
--        # classmethod inheritance doesn't work quite right with
--        # setUpClass / tearDownClass, so subclasses have to manually call this
--        juju_env = amulet.helpers.default_environment()
--        services = ['hdfs-master', 'yarn-master', 'compute-slave', 'secondary-namenode', 'plugin', 'client']
--
--        def check_env_clear():
--            state = amulet.waiter.state(juju_env=juju_env)
--            for service in services:
--                if state.get(service, {}) != {}:
--                    return False
--            return True
--
--        for service in services:
--            cls.d.remove(service)
--        with amulet.helpers.timeout(300):
--            while not check_env_clear():
--                time.sleep(5)
--
--    def test_hadoop_components(self):
++        bundle = yaml.safe_load(bun)
++        cls.d.load(bundle)
++        cls.d.setup(timeout=1800)
++        cls.d.sentry.wait_for_messages({'notebook': 'Ready'}, timeout=1800)
++        cls.hdfs = cls.d.sentry['hdfs-master'][0]
++        cls.yarn = cls.d.sentry['yarn-master'][0]
++        cls.slave = cls.d.sentry['compute-slave'][0]
++        cls.secondary = cls.d.sentry['secondary-namenode'][0]
++        cls.spark = cls.d.sentry['spark'][0]
++        cls.notebook = cls.d.sentry['notebook'][0]
++
++    def test_components(self):
          """
          Confirm that all of the required components are up and running.
          """
@@ -63,17 +34,48 @@
          yarn, retcode = self.yarn.run("pgrep -a java")
          slave, retcode = self.slave.run("pgrep -a java")
          secondary, retcode = self.secondary.run("pgrep -a java")
--        client, retcode = self.client.run("pgrep -a java")
++        spark, retcode = self.spark.run("pgrep -a java")
++        notebook, retcode = self.spark.run("pgrep -a python")
          # .NameNode needs the . to differentiate it from SecondaryNameNode
          assert '.NameNode' in hdfs, "NameNode not started"
++        assert '.NameNode' not in yarn, "NameNode should not be running on yarn-master"
++        assert '.NameNode' not in slave, "NameNode should not be running on compute-slave"
++        assert '.NameNode' not in secondary, "NameNode should not be running on secondary-namenode"
++        assert '.NameNode' not in spark, "NameNode should not be running on spark"
++
          assert 'ResourceManager' in yarn, "ResourceManager not started"
++        assert 'ResourceManager' not in hdfs, "ResourceManager should not be running on hdfs-master"
++        assert 'ResourceManager' not in slave, "ResourceManager should not be running on compute-slave"
++        assert 'ResourceManager' not in secondary, "ResourceManager should not be running on secondary-namenode"
++        assert 'ResourceManager' not in spark, "ResourceManager should not be running on spark"
++
          assert 'JobHistoryServer' in yarn, "JobHistoryServer not started"
++        assert 'JobHistoryServer' not in hdfs, "JobHistoryServer should not be running on hdfs-master"
++        assert 'JobHistoryServer' not in slave, "JobHistoryServer should not be running on compute-slave"
++        assert 'JobHistoryServer' not in secondary, "JobHistoryServer should not be running on secondary-namenode"
++        assert 'JobHistoryServer' not in spark, "JobHistoryServer should not be running on spark"
++
          assert 'NodeManager' in slave, "NodeManager not started"
++        assert 'NodeManager' not in yarn, "NodeManager should not be running on yarn-master"
++        assert 'NodeManager' not in hdfs, "NodeManager should not be running on hdfs-master"
++        assert 'NodeManager' not in secondary, "NodeManager should not be running on secondary-namenode"
++        assert 'NodeManager' not in spark, "NodeManager should not be running on spark"
++
          assert 'DataNode' in slave, "DataNode not started"
++        assert 'DataNode' not in yarn, "DataNode should not be running on yarn-master"
++        assert 'DataNode' not in hdfs, "DataNode should not be running on hdfs-master"
++        assert 'DataNode' not in secondary, "DataNode should not be running on secondary-namenode"
++        assert 'DataNode' not in spark, "DataNode should not be running on spark"
++
          assert 'SecondaryNameNode' in secondary, "SecondaryNameNode not started"
++        assert 'SecondaryNameNode' not in yarn, "SecondaryNameNode should not be running on yarn-master"
++        assert 'SecondaryNameNode' not in hdfs, "SecondaryNameNode should not be running on hdfs-master"
++        assert 'SecondaryNameNode' not in slave, "SecondaryNameNode should not be running on compute-slave"
++        assert 'SecondaryNameNode' not in spark, "SecondaryNameNode should not be running on spark"
--        return hdfs, yarn, slave, secondary, client  # allow subclasses to do additional checks
++        assert 'spark' in spark, 'Spark should be running on spark'
++        assert 'notebook' in notebook, 'Notebook should be running on spark'
      def test_hdfs_dir(self):
          """
@@ -84,11 +86,11 @@
          NB: These are order-dependent, so must be done as part of a single test case.
          """
--        output, retcode = self.client.run("su hdfs -c 'hdfs dfs -mkdir -p /user/ubuntu'")
++        output, retcode = self.spark.run("su hdfs -c 'hdfs dfs -mkdir -p /user/ubuntu'")
          assert retcode == 0, "Created a user directory on hdfs FAILED:\n{}".format(output)
--        output, retcode = self.client.run("su hdfs -c 'hdfs dfs -chown ubuntu:ubuntu /user/ubuntu'")
++        output, retcode = self.spark.run("su hdfs -c 'hdfs dfs -chown ubuntu:ubuntu /user/ubuntu'")
          assert retcode == 0, "Assigning an owner to hdfs directory FAILED:\n{}".format(output)
--        output, retcode = self.client.run("su hdfs -c 'hdfs dfs -chmod -R 755 /user/ubuntu'")
++        output, retcode = self.spark.run("su hdfs -c 'hdfs dfs -chmod -R 755 /user/ubuntu'")
          assert retcode == 0, "setting directory permission on hdfs FAILED:\n{}".format(output)
      def test_yarn_mapreduce_exe(self):
@@ -112,59 +114,15 @@
              ('cleanup',      "su hdfs -c 'hdfs dfs -rm -r /user/ubuntu/teragenout'"),
+         ]
          for name, step in test_steps:
--            output, retcode = self.client.run(step)
++            output, retcode = self.spark.run(step)
              assert retcode == 0, "{} FAILED:\n{}".format(name, output)
--
--class TestScalable(unittest.TestCase, Base):
--    profile_name = 'apache-core-batch-processing'
--
--    @classmethod
--    def setUpClass(cls):
--        cls.deploy()
--
--    @classmethod
--    def tearDownClass(cls):
--        cls.reset_env()
--
--    def test_hadoop_components(self):
--        """
--        In addition to testing that the components are running where they
--        are supposed to be, confirm that none of them are also running where
--        they shouldn't be.
--        """
--        hdfs, yarn, slave, secondary, client = super(TestScalable, self).test_hadoop_components()
--
--        # .NameNode needs the . to differentiate it from SecondaryNameNode
--        assert '.NameNode' not in yarn, "NameNode should not be running on yarn-master"
--        assert '.NameNode' not in slave, "NameNode should not be running on compute-slave"
--        assert '.NameNode' not in secondary, "NameNode should not be running on secondary-namenode"
--        assert '.NameNode' not in client, "NameNode should not be running on client"
--
--        assert 'ResourceManager' not in hdfs, "ResourceManager should not be running on hdfs-master"
--        assert 'ResourceManager' not in slave, "ResourceManager should not be running on compute-slave"
--        assert 'ResourceManager' not in secondary, "ResourceManager should not be running on secondary-namenode"
--        assert 'ResourceManager' not in client, "ResourceManager should not be running on client"
--
--        assert 'JobHistoryServer' not in hdfs, "JobHistoryServer should not be running on hdfs-master"
--        assert 'JobHistoryServer' not in slave, "JobHistoryServer should not be running on compute-slave"
--        assert 'JobHistoryServer' not in secondary, "JobHistoryServer should not be running on secondary-namenode"
--        assert 'JobHistoryServer' not in client, "JobHistoryServer should not be running on client"
--
--        assert 'NodeManager' not in yarn, "NodeManager should not be running on yarn-master"
--        assert 'NodeManager' not in hdfs, "NodeManager should not be running on hdfs-master"
--        assert 'NodeManager' not in secondary, "NodeManager should not be running on secondary-namenode"
--        assert 'NodeManager' not in client, "NodeManager should not be running on client"
--
--        assert 'DataNode' not in yarn, "DataNode should not be running on yarn-master"
--        assert 'DataNode' not in hdfs, "DataNode should not be running on hdfs-master"
--        assert 'DataNode' not in secondary, "DataNode should not be running on secondary-namenode"
--        assert 'DataNode' not in client, "DataNode should not be running on client"
--
--        assert 'SecondaryNameNode' not in yarn, "SecondaryNameNode should not be running on yarn-master"
--        assert 'SecondaryNameNode' not in hdfs, "SecondaryNameNode should not be running on hdfs-master"
--        assert 'SecondaryNameNode' not in slave, "SecondaryNameNode should not be running on compute-slave"
--        assert 'SecondaryNameNode' not in client, "SecondaryNameNode should not be running on client"
++    def test_spark(self):
++        output, retcode = self.spark.run("su ubuntu -c 'bash -lc /home/ubuntu/sparkpi.sh 2>&1'")
++        assert 'Pi is roughly' in output, 'SparkPI test failed: %s' % output
++
++    def test_notebook(self):
++        pass  # requires javascript; how to test?
  if __name__ == '__main__':
 === added file 'tests/tests.yaml'
 --- tests/tests.yaml	1970-01-01 00:00:00 +0000
 +++ tests/tests.yaml	2016-02-23 21:01:00 +0000
@@ -0,0 +1,3 @@
++reset: false
++packages:
++  - amulet
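
The rewritten `test_components` spells out, one assert at a time, that each Hadoop daemon runs on its own unit and on no other. The same rule reads more compactly as a table. Below is a minimal sketch, assuming the `pgrep -a java` output of each unit has already been collected into a dict keyed by service name. `EXPECTED_HOME` and `placement_errors` are hypothetical names, and the sample process lines are abbreviated illustrations rather than captured output.

```python
# Where each daemon is expected to run, mirroring the asserts in
# test_components. '.NameNode' keeps its leading dot so that it does not
# also match SecondaryNameNode.
EXPECTED_HOME = {
    ".NameNode": "hdfs-master",
    "ResourceManager": "yarn-master",
    "JobHistoryServer": "yarn-master",
    "NodeManager": "compute-slave",
    "DataNode": "compute-slave",
    "SecondaryNameNode": "secondary-namenode",
}


def placement_errors(procs):
    """procs maps a service name to that unit's `pgrep -a java` output.

    Returns one message per daemon that is missing from its home unit or
    running somewhere it should not be; an empty list means all is well.
    """
    errors = []
    for daemon, home in EXPECTED_HOME.items():
        for unit, output in procs.items():
            running = daemon in output
            if unit == home and not running:
                errors.append("{} not started on {}".format(daemon, unit))
            elif unit != home and running:
                errors.append(
                    "{} should not be running on {}".format(daemon, unit))
    return errors


# Abbreviated, hypothetical pgrep output for a healthy deployment.
HEALTHY = {
    "hdfs-master": "101 java org.apache.hadoop.hdfs.server.namenode.NameNode",
    "yarn-master": (
        "201 java org.apache.hadoop.yarn.server.resourcemanager.ResourceManager\n"
        "202 java org.apache.hadoop.mapreduce.v2.hs.JobHistoryServer"),
    "compute-slave": (
        "301 java org.apache.hadoop.yarn.server.nodemanager.NodeManager\n"
        "302 java org.apache.hadoop.hdfs.server.datanode.DataNode"),
    "secondary-namenode":
        "401 java org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode",
    "spark": "501 java org.apache.spark.deploy.SparkSubmit",
}

if __name__ == "__main__":
    print(placement_errors(HEALTHY))  # prints []
```

Keeping the expected placement in one table means that checking another unit or daemon takes one new dict entry rather than a new column of asserts.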
