Merge lp:~ewollesen/charms/trusty/apache-spark/spark-config into lp:~bigdata-dev/charms/trusty/apache-spark/trunk

Proposed by Eric Wollesen
Status: Needs review
Proposed branch: lp:~ewollesen/charms/trusty/apache-spark/spark-config
Merge into: lp:~bigdata-dev/charms/trusty/apache-spark/trunk
Diff against target: 235 lines (+125/-17)
4 files modified
config.yaml (+17/-1)
hooks/callbacks.py (+20/-16)
hooks/eawutils.py (+45/-0)
tests/100-spark-config (+43/-0)
To merge this branch: bzr merge lp:~ewollesen/charms/trusty/apache-spark/spark-config
Reviewer: Juju Big Data Development
Status: Pending
Review via email: mp+260782@code.launchpad.net

Description of the change

Adds config options for +spark_local_dir+ and +spark_driver_cores+.


Unmerged revisions

15. By Eric Wollesen

Take two at a spark config amulet test.

The amulet configure method isn't working as I would expect. It doesn't
appear to be triggering the juju hooks that effect the requested
changes. As a result, the values aren't being written to disk, and
the test fails. Modifying values via juju set, however, works fine.

14. By Eric Wollesen

Re-arrange the install callback, so config-changed works

13. By Eric Wollesen

Merged ~bigdata-dev's trunk

12. By Eric Wollesen

Adds configuration for driver cores and local dir.

See config.yaml for details.
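
The revision 15 note above reports that amulet's Deployment.configure() does not appear to trigger the config-changed hook, while changing the options with juju set does. Below is a minimal sketch of one possible workaround for the test, assuming the test runner has the juju 1.x client on its PATH and that shelling out to it is acceptable; unit is the spark/0 sentry unit (cls.unit in the test), and _wait_for_config is a hypothetical helper, not part of this branch:

    import subprocess
    import time

    def _wait_for_config(unit, pattern, path, attempts=30, delay=10):
        # Hypothetical helper: poll the unit until the expected entry
        # appears in the rendered config file, or give up.
        for _ in range(attempts):
            output, retcode = unit.run("grep -Pq '%s' %s" % (pattern, path))
            if retcode == 0:
                return True
            time.sleep(delay)
        return False

    # Apply the options with the juju 1.x CLI instead of amulet's
    # configure(), mirroring what reportedly works when done by hand.
    subprocess.check_call(['juju', 'set', 'spark',
                           'spark_driver_cores=2', 'spark_local_dir=/var'])
    assert _wait_for_config(unit, r'spark.driver.cores\t2',
                            '/etc/spark/conf/spark-defaults.conf')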

Preview Diff

=== modified file 'config.yaml'
--- config.yaml 2015-05-30 04:34:18 +0000
+++ config.yaml 2015-06-03 05:31:01 +0000
@@ -4,4 +4,20 @@
     default: ''
     description: |
       URL from which to fetch resources (e.g., Hadoop binaries) instead of Launchpad.
-
+  spark_driver_cores:
+    type: int
+    default: 1
+    description: |
+      Number of cores to use for the driver process, only in cluster
+      mode.
+  spark_local_dir:
+    type: string
+    default: /tmp
+    description: |
+      Directory to use for "scratch" space in Spark, including map
+      output files and RDDs that get stored on disk. This should be on a
+      fast, local disk in your system. It can also be a comma-separated
+      list of multiple directories on different disks. NOTE: In Spark
+      1.0 and later this will be overridden by SPARK_LOCAL_DIRS
+      (Standalone, Mesos) or LOCAL_DIRS (YARN) environment variables set
+      by the cluster manager.
 
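
Inside the charm, the new options arrive through charmhelpers' hookenv.config() and are rewritten from config-key form (spark_driver_cores) to Spark property form (spark.driver.cores) before being merged into spark-defaults.conf; the hooks/eawutils.py helpers added later in this diff do exactly that. A minimal standalone sketch of the mapping, using only the standard library (translate_options is illustrative, not part of the branch):

    def translate_options(config):
        # Keep only the spark_* options and swap the separator,
        # e.g. spark_driver_cores -> spark.driver.cores.
        return dict((k.replace('_', '.'), v)
                    for k, v in config.items()
                    if k.startswith('spark_'))

    charm_config = {'resources_mirror': '',
                    'spark_driver_cores': 2,
                    'spark_local_dir': '/var'}
    assert translate_options(charm_config) == {'spark.driver.cores': 2,
                                               'spark.local.dir': '/var'}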
=== modified file 'hooks/callbacks.py'
--- hooks/callbacks.py 2015-06-02 13:45:28 +0000
+++ hooks/callbacks.py 2015-06-03 05:31:01 +0000
@@ -1,4 +1,3 @@
-
 from subprocess import check_output, Popen
 
 import jujuresources
@@ -6,7 +5,7 @@
 from charmhelpers.core import unitdata
 from charmhelpers.contrib.bigdata import utils
 from path import Path
-
+import eawutils
 
 class Spark(object):
 
@@ -18,8 +17,11 @@
         return unitdata.kv().get('spark.installed')
 
     def install(self, force=False):
-        if not force and self.is_installed():
-            return
+        if force or not self.is_installed():
+            self.install_spark()
+        self.configure_spark()
+
+    def install_spark(self):
         mirror_url = hookenv.config()['resources_mirror']
         jujuresources.fetch('spark-%s' % self.cpu_arch, mirror_url=mirror_url)
         jujuresources.install('spark-%s' % self.cpu_arch,
@@ -29,9 +31,8 @@
         self.dist_config.add_dirs()
         self.dist_config.add_packages()
         self.setup_spark_config()
-        self.configure_spark()
         unitdata.kv().set('spark.installed', True)
-
+
     def install_demo(self):
         '''
         Install demo.sh script to /home/ubuntu
@@ -42,11 +43,11 @@
         Path(demo_source).copy(demo_target)
         Path(demo_target).chmod(0o755)
         Path(demo_target).chown('ubuntu', 'hadoop')
-
+
     def setup_spark_config(self):
         '''
         copy Spark's default configuration files to spark_conf property defined
-        in dist.yaml
+        in dist.yaml
         '''
         conf_dir = self.dist_config.path('spark') / 'conf'
         self.dist_config.path('spark_conf').rmtree_p()
@@ -64,10 +65,10 @@
         utils.re_edit_in_place(spark_log4j, {
             r'log4j.rootCategory=INFO, console': 'log4j.rootCategory=ERROR, console',
         })
-
+
     def configure_spark(self):
         '''
-        Configure spark environment for all users
+        Configure spark environment for all users
         '''
         from subprocess import call
         spark_bin = self.dist_config.path('spark') / 'bin'
@@ -78,12 +79,15 @@
         env['SPARK_CONF_DIR'] = self.dist_config.path('spark_conf')
         self.configure_spark_hdfs()
         self.spark_optimize()
+        spark_default = self.dist_config.path('spark_conf') / 'spark-defaults.conf'
+        spark_config = eawutils.getSparkConfig(hookenv.config())
+        eawutils.updateSparkConfig(spark_default, spark_config)
         cmd = "chown -R ubuntu:hadoop {}".format (spark_home)
         call(cmd.split())
         cmd = "chown -R ubuntu:hadoop {}".format (self.dist_config.path('spark_conf'))
         call(cmd.split())
-
-    def configure_spark_hdfs(self):
+
+    def configure_spark_hdfs(self):
         e = utils.read_etc_env()
         utils.run_as('hdfs', 'hdfs', 'dfs', '-mkdir', '-p', '/user/ubuntu/directory', env=e)
         utils.run_as('hdfs', 'hdfs', 'dfs', '-chown', '-R', 'ubuntu:hadoop', '/user/ubuntu/directory', env=e)
@@ -107,19 +111,19 @@
             r'.*spark.eventLog.enabled *.*':'spark.eventLog.enabled true',
             r'.*spark.eventLog.dir *.*':'spark.eventLog.dir hdfs:///user/ubuntu/directory',
         })
-
-
+
+
     def start(self):
         e = utils.read_etc_env()
         spark_home = self.dist_config.path('spark')
         if utils.jps("HistoryServer"):
             self.stop()
         utils.run_as('ubuntu', '{}/sbin/start-history-server.sh'.format(spark_home), 'hdfs:///user/ubuntu/directory', env=e)
-
+
     def stop(self):
         e = utils.read_etc_env()
         spark_home = self.dist_config.path('spark')
-        utils.run_as('ubuntu', '{}/sbin/stop-history-server.sh'.format(spark_home), env=e)
+        utils.run_as('ubuntu', '{}/sbin/stop-history-server.sh'.format(spark_home), env=e)
 
     def cleanup(self):
         self.dist_config.remove_dirs()
 
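
The change to install() above splits installation from configuration: the heavy install_spark() step still runs at most once (unless forced), while configure_spark() now runs on every call and upserts the spark_* options into spark-defaults.conf. That is what lets the config-changed hook reuse the same entry point, per the "Re-arrange the install callback" revision. A self-contained sketch of that control flow, using a stub in place of the real Spark class:

    class FakeSpark(object):
        # Stub that mirrors the rearranged install()/install_spark() split.
        def __init__(self):
            self.installed = False
            self.install_runs = 0
            self.configure_runs = 0

        def is_installed(self):
            return self.installed

        def install_spark(self):
            self.install_runs += 1
            self.installed = True

        def configure_spark(self):
            self.configure_runs += 1

        def install(self, force=False):
            # Same shape as the diff: install once, configure every time.
            if force or not self.is_installed():
                self.install_spark()
            self.configure_spark()

    spark = FakeSpark()
    spark.install()    # install hook
    spark.install()    # config-changed re-entry
    assert spark.install_runs == 1 and spark.configure_runs == 2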
=== added file 'hooks/eawutils.py'
--- hooks/eawutils.py 1970-01-01 00:00:00 +0000
+++ hooks/eawutils.py 2015-06-03 05:31:01 +0000
@@ -0,0 +1,45 @@
+# These functions should live in charmhelpers.contrib.bigdata.utils or
+# somewhere similar.
+import re
+from charmhelpers.contrib.bigdata import utils
+
+def updateSparkConfig(path, config):
+    """Updates spark config settings in +path+.
+
+    Assumes +path+ is in spark config file syntax."""
+    inserts, updates = calcSparkConfigUpserts(path, config)
+
+    utils.re_edit_in_place(path, updates)
+    with open(path, 'a') as configFile:
+        for item in inserts.items():
+            configFile.write("%s\t%s\n" % item)
+
+def calcSparkConfigUpserts(path, config):
+    """Calculate upserts to transform +path+ to +config+, idempotently.
+
+    Returns (inserts, updates)."""
+    inserts = config.copy()
+    updates = {}
+
+    with open(path, 'r') as configFile:
+        for line in configFile.readlines():
+            if line.startswith("#") or re.match(r'\A\s*\Z', line):
+                continue
+            key = line.split(None, 1)[0]
+            if key in config:
+                updates[r"^%s\s.*" % key] = "%s\t%s" % (key, config[key])
+                inserts.pop(key)
+
+    return inserts, updates
+
+def getKeysStartingWith(d, prefix):
+    "Return a dict of the keys prefixed with +prefix+."
+    return dict([(k,v) for k,v in d.items() if k.startswith(prefix)])
+
+def underscoreToDot(d):
+    "Return the dictionary with underscores in keys replaced with dots."
+    return dict([(k.replace("_", "."),v) for k,v in d.items()])
+
+def getSparkConfig(config):
+    "Return a dict of the 'spark_'-prefixed config options, keyed by their Spark property names."
+    return underscoreToDot(getKeysStartingWith(config, "spark_"))
 
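
The upsert helpers above are split so that keys already present in spark-defaults.conf become in-place edits (via utils.re_edit_in_place) and only genuinely new keys get appended. A rough usage sketch of calcSparkConfigUpserts against a throwaway file; it assumes the charm's charmhelpers copy is importable so that eawutils loads, and the sample file contents are made up for illustration:

    import tempfile

    import eawutils  # needs charmhelpers.contrib.bigdata on sys.path

    sample = ("# spark-defaults.conf\n"
              "spark.eventLog.enabled\ttrue\n"
              "spark.local.dir\t/tmp\n")
    with tempfile.NamedTemporaryFile('w', suffix='.conf', delete=False) as f:
        f.write(sample)
        path = f.name

    desired = {'spark.driver.cores': 2, 'spark.local.dir': '/var'}
    inserts, updates = eawutils.calcSparkConfigUpserts(path, desired)

    # spark.local.dir already exists, so it becomes a regex update;
    # spark.driver.cores is new, so it stays in the insert set.
    assert inserts == {'spark.driver.cores': 2}
    assert updates == {r'^spark.local.dir\s.*': 'spark.local.dir\t/var'}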
=== added file 'tests/100-spark-config'
--- tests/100-spark-config 1970-01-01 00:00:00 +0000
+++ tests/100-spark-config 2015-06-03 05:31:01 +0000
@@ -0,0 +1,43 @@
+#!/usr/bin/python3
+
+import unittest
+import amulet
+
+
+class TestSparkConfig(unittest.TestCase):
+    """
+    Configuration settings test for Apache Spark.
+    """
+
+    @classmethod
+    def setUpClass(cls):
+        cls.d = amulet.Deployment(series='trusty')
+        #### Deploy a hadoop cluster
+        cls.d.add('yarn-master', charm='cs:~bigdata-dev/trusty/apache-hadoop-yarn-master')
+        cls.d.add('hdfs-master', charm='cs:~bigdata-dev/trusty/apache-hadoop-hdfs-master')
+        cls.d.add('compute-slave', charm='cs:~bigdata-dev/trusty/apache-hadoop-compute-slave', units=2)
+        cls.d.add('hadoop-plugin', charm='cs:~bigdata-dev/trusty/apache-hadoop-plugin')
+        cls.d.relate('yarn-master:namenode', 'hdfs-master:namenode')
+        cls.d.relate('yarn-master:resourcemanager', 'hadoop-plugin:resourcemanager')
+        cls.d.relate('hadoop-plugin:namenode', 'hdfs-master:namenode')
+
+        cls.d.relate('compute-slave:nodemanager', 'yarn-master:nodemanager')
+        cls.d.relate('compute-slave:datanode', 'hdfs-master:datanode')
+
+        ### Add Spark Service
+        cls.d.add('spark', 'apache-spark')
+        cls.d.configure('spark', {'spark_driver_cores': 2,
+                                  'spark_local_dir': '/var'})
+        cls.d.relate('hadoop-plugin:hadoop-plugin', 'spark:hadoop-plugin')
+
+        cls.d.setup(timeout=9000)
+        cls.d.sentry.wait()
+        cls.unit = cls.d.sentry.unit['spark/0']
+
+    def test_config_setting(self):
+        output, retcode = self.unit.run("grep -Pq 'spark.driver.cores\t2' /etc/spark/conf/spark-defaults.conf")
+        self.assertEqual(retcode, 0, 'failed to configure spark service\n')
+
+
+if __name__ == '__main__':
+    unittest.main()
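
The test above checks only spark.driver.cores; a companion assertion for the other option set in setUpClass could be added to TestSparkConfig along these lines (a suggested addition, not part of this branch):

    def test_local_dir_setting(self):
        # spark_local_dir was configured to /var, which should surface as a
        # tab-separated spark.local.dir entry in spark-defaults.conf.
        output, retcode = self.unit.run(
            "grep -Pq 'spark.local.dir\t/var' /etc/spark/conf/spark-defaults.conf")
        self.assertEqual(retcode, 0, 'failed to configure spark.local.dir\n')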
