Graylog Charm

Merge ~raychan96/charm-graylog:1970964_and_1835156 into charm-graylog:master

Proposed by Chi Wai CHAN on 2022-12-21

Status:

Merged

Approved by:

Eric Chen on 2022-12-26

Approved revision:

c4365e49ac50e96d4af4d1cd57ddc6ca3166d5d2

Merged at revision:

4f92d5ba4a8fe0f32622190d9214300e8dc3281e

Proposed branch:

~raychan96/charm-graylog:1970964_and_1835156

Merge into:

charm-graylog:master

Diff against target:

485 lines (+159/-90)

7 files modified

src/lib/charms/layer/elasticsearch/api.py (+29/-16)
src/lib/charms/layer/graylog/__init__.py (+1/-1)
src/lib/charms/layer/graylog/api.py (+45/-12)
src/lib/charms/layer/graylog/constants.py (+0/-1)
src/reactive/graylog.py (+53/-29)
src/tests/unit/requirements.txt (+1/-0)
src/tests/unit/test_es_api.py (+30/-31)

High

Fix Released

Reviewer	Review Type	Date Requested	Status
JamesLin		2022-12-21	Approve on 2022-12-26
🤖 prod-jenkaas-bootstack (community)	continuous-integration		Approve on 2022-12-23
Eric Chen		2022-12-21	Approve on 2022-12-23
BootStack Reviewers		2022-12-21	Pending
Review via email: mp+434900@code.launchpad.net

Commit message

This patch fixes #1970964 and #1835156 (the two are related).

Description of the change

* Properly handle a non-healthy (not green) elasticsearch cluster with extra hooks (addresses #1970964).

* Add health check to graylog api and properly handle the cases when graylog api is not ready.

* Installation of content packs should only be done once; calling it multiple times will lead to HTTP 400 bad request.
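
The first bullet can be illustrated with a minimal sketch of a "green or not" probe. It assumes the Elasticsearch `_cluster/health` endpoint with the `wait_for_status=green&timeout=10s` query used in the diff. The function name `check_cluster_green` and the `fetch` parameter are hypothetical: `fetch` is any callable that takes a URL and returns the parsed JSON body, so the logic can be exercised without a live cluster.

```python
from urllib.parse import urljoin


class ElasticSearchNotHealthy(Exception):
    """Raised when the cluster answers but does not report green."""


def check_cluster_green(endpoint, fetch):
    """Return True if the cluster reports green, otherwise raise.

    `fetch` is a hypothetical callable: URL in, parsed JSON dict out.
    """
    url = urljoin(endpoint, "/_cluster/health?wait_for_status=green&timeout=10s")
    try:
        result = fetch(url)
    except OSError as err:
        # Network-level failure: report it separately from "not green".
        raise ConnectionError("cannot reach {}: {}".format(url, err))
    if result.get("status") != "green":
        raise ElasticSearchNotHealthy(
            "cluster status is {}".format(result.get("status"))
        )
    return True
```

Keeping the two failure modes as separate exception types lets the caller decide whether to wait (cluster still settling) or to report a connectivity problem.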

🤖 Canonical IS Merge Bot (canonical-is-mergebot) wrote on 2022-12-21:

This merge proposal is being monitored by mergebot. Change the status to Approved to merge.

🤖 prod-jenkaas-bootstack (prod-jenkaas-bootstack) wrote on 2022-12-21:

FAILED: Continuous integration, rev:f4e7c73d3cd4197fe9cca54a4342ea5a99f57c0b
https://jenkins.canonical.com/bootstack/job/lp-charm-graylog-ci/203/
Executed test runs:
FAILURE: https://jenkins.canonical.com/bootstack/job/lp-charm-test-functest/1784/
None: https://jenkins.canonical.com/bootstack/job/lp-update-mp/1837/

Click here to trigger a rebuild:
https://jenkins.canonical.com/bootstack/job/lp-charm-graylog-ci/203//rebuild

review: Needs Fixing (continuous-integration)

JamesLin (jneo8) wrote on 2022-12-21:

Retries should be based on knowing which exceptions to handle.
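
A minimal sketch of what this could look like, using only the standard library rather than the `tenacity` decorator that appears in the diff. The decorator name `retry_on` and the `flaky_health_check` function are hypothetical; `ApiTimeout` stands in for the exception of the same name in the charm.

```python
import functools
import time


class ApiTimeout(Exception):
    """The API did not answer in time."""


def retry_on(exc_type, attempts=3, delay=0.0):
    """Retry the wrapped call only when it raises `exc_type`."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exc_type:
                    if attempt == attempts:
                        raise  # out of attempts: surface the known error
                    time.sleep(delay)

        return wrapper

    return decorator


calls = []


@retry_on(ApiTimeout, attempts=3)
def flaky_health_check():
    # Hypothetical call that fails twice, then succeeds.
    calls.append(1)
    if len(calls) < 3:
        raise ApiTimeout("API not ready")
    return "ok"
```

Any exception other than `ApiTimeout` propagates on the first attempt, so programming errors are not hidden behind retries.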

Chi Wai CHAN (raychan96) on 2022-12-21:

🤖 prod-jenkaas-bootstack (prod-jenkaas-bootstack) wrote on 2022-12-21:

FAILED: Continuous integration, rev:c41928d7918b7654f11d3b151a509dda8d841dc9
https://jenkins.canonical.com/bootstack/job/lp-charm-graylog-ci/204/
Executed test runs:
FAILURE: https://jenkins.canonical.com/bootstack/job/lp-charm-test-functest/1786/
None: https://jenkins.canonical.com/bootstack/job/lp-update-mp/1839/

Click here to trigger a rebuild:
https://jenkins.canonical.com/bootstack/job/lp-charm-graylog-ci/204//rebuild

review: Needs Fixing (continuous-integration)

Eric Chen (eric-chen) wrote on 2022-12-22:

I compared this with the solution from Sudeep. The two seem very similar.
One puts the check inside ElasticSearchApi; the other is an adaptor used around ElasticSearchApi.

https://code.launchpad.net/~sudeephb/charm-graylog/+git/charm-graylog/+merge/433720

The common problem: both may block the whole juju agent process.

Is it possible that:
- when the elasticsearch connection or health check fails, graylog's status becomes error
- the juju agent can still handle other events, e.g. config-changed
- the charm recovers once the connection is okay
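
One way to meet these three points is to run a single probe per hook invocation and record the outcome as a flag, instead of sleeping and retrying inside the hook. The sketch below is an assumption, not the charm's code: `probe_once` is a hypothetical helper and `flags` is a plain set standing in for the charm's reactive state.

```python
def probe_once(check, flags, flag="elasticsearch.ready", errors=(ConnectionError,)):
    """Run one health probe and record the result without blocking.

    `check` is any zero-argument callable that raises one of `errors`
    when the backend is not usable.
    """
    try:
        check()
    except errors as err:
        # Not ready: clear the flag and return immediately so that
        # other events can still be handled.
        flags.discard(flag)
        return "not ready: {}".format(err)
    # Ready: handlers gated on this flag may run from now on.
    flags.add(flag)
    return "ready"
```

Because the probe runs again on the next hook, the flag is set as soon as the connection is back, without any long wait in between.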

review: Needs Fixing

JamesLin (jneo8) wrote on 2022-12-22:

It's a bit of a grey area here, because ElasticSearch can be in several states:

1. ElasticSearch is not healthy; something went wrong.
2. ElasticSearch is still initialising and just needs more time.
3. All nodes are healthy.
4. Some nodes are not healthy.

Can we just set the graylog charm to BlockState (waiting for ElasticSearch to be healthy) instead of raising an error? An error status may not be precise here.

Another topic: the green/yellow/red lights in elasticsearch are per node, but our checking logic doesn't seem to handle details such as "what percentage of nodes must be healthy before we can use it".

The checking logic should also be included in the update_status hook (this should be another task).
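
As a rough illustration of the states listed above, the sketch below maps a cluster health reply to a workload status. It is one possible policy under stated assumptions, not what the charm does: `classify` is a hypothetical helper, the input is the parsed JSON of `_cluster/health` (or `None` when the cluster is unreachable), and `initializing_shards` is used as a simple hint that the cluster is still starting up.

```python
def classify(health):
    """Map a cluster health reply (or None) to (status, message)."""
    if health is None:
        return ("blocked", "cannot reach ElasticSearch")
    status = health.get("status")
    if status == "green":
        return ("active", "ElasticSearch is healthy")
    if health.get("initializing_shards", 0) > 0:
        # Shards are still coming up: more time may be all that is needed.
        return ("waiting", "ElasticSearch is still initialising")
    return ("blocked", "ElasticSearch status is {}".format(status))
```

None of the branches raises, so a hook using this would report a status rather than put the unit into an error state.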

🤖 prod-jenkaas-bootstack (prod-jenkaas-bootstack) wrote on 2022-12-23:

FAILED: Continuous integration, rev:67ee699d90e0c562cda5e5504859a98c2aacbe9a
https://jenkins.canonical.com/bootstack/job/lp-charm-graylog-ci/205/
Executed test runs:
FAILURE: https://jenkins.canonical.com/bootstack/job/lp-charm-test-functest/1794/
None: https://jenkins.canonical.com/bootstack/job/lp-update-mp/1847/

Click here to trigger a rebuild:
https://jenkins.canonical.com/bootstack/job/lp-charm-graylog-ci/205//rebuild

review: Needs Fixing (continuous-integration)

Eric Chen (eric-chen) wrote on 2022-12-23:

LGTM; waiting for the Jenkins CI to pass.

review: Approve

🤖 prod-jenkaas-bootstack (prod-jenkaas-bootstack) wrote on 2022-12-23:

FAILED: Continuous integration, rev:8178486678699e4044919c7c3830a99108315cb5
https://jenkins.canonical.com/bootstack/job/lp-charm-graylog-ci/206/
Executed test runs:
FAILURE: https://jenkins.canonical.com/bootstack/job/lp-charm-test-functest/1795/
None: https://jenkins.canonical.com/bootstack/job/lp-update-mp/1848/

Click here to trigger a rebuild:
https://jenkins.canonical.com/bootstack/job/lp-charm-graylog-ci/206//rebuild

review: Needs Fixing (continuous-integration)

Chi Wai CHAN (raychan96) wrote on 2022-12-23:

Interesting pattern when comparing one of the previous successful CI runs with the current failed one: both hit an error when checking mongodb health in the gl2 tests (graylog 2, bionic tests). The successful run passed the test after 3 retries, while the current run failed because there is no retry now. I am adding the retry logic back into the request function.

🤖 prod-jenkaas-bootstack (prod-jenkaas-bootstack) wrote on 2022-12-23:

FAILED: Continuous integration, rev:9b9af11feaccbe9460995f73a5fe794774e72129
https://jenkins.canonical.com/bootstack/job/lp-charm-graylog-ci/207/
Executed test runs:
FAILURE: https://jenkins.canonical.com/bootstack/job/lp-charm-test-functest/1796/
None: https://jenkins.canonical.com/bootstack/job/lp-update-mp/1849/

Click here to trigger a rebuild:
https://jenkins.canonical.com/bootstack/job/lp-charm-graylog-ci/207//rebuild

review: Needs Fixing (continuous-integration)

🤖 prod-jenkaas-bootstack (prod-jenkaas-bootstack) wrote on 2022-12-23:

PASSED: Continuous integration, rev:c4365e49ac50e96d4af4d1cd57ddc6ca3166d5d2
https://jenkins.canonical.com/bootstack/job/lp-charm-graylog-ci/208/
Executed test runs:
SUCCESS: https://jenkins.canonical.com/bootstack/job/lp-charm-test-functest/1797/
None: https://jenkins.canonical.com/bootstack/job/lp-update-mp/1850/

Click here to trigger a rebuild:
https://jenkins.canonical.com/bootstack/job/lp-charm-graylog-ci/208//rebuild

review: Approve (continuous-integration)

🤖 prod-jenkaas-bootstack (prod-jenkaas-bootstack) wrote on 2022-12-23:

FAILED: Continuous integration, rev:9b9af11feaccbe9460995f73a5fe794774e72129
https://jenkins.canonical.com/bootstack/job/lp-charm-graylog-ci/209/
Executed test runs:
FAILURE: https://jenkins.canonical.com/bootstack/job/lp-charm-test-functest/1798/
None: https://jenkins.canonical.com/bootstack/job/lp-update-mp/1851/

Click here to trigger a rebuild:
https://jenkins.canonical.com/bootstack/job/lp-charm-graylog-ci/209//rebuild

review: Needs Fixing (continuous-integration)

🤖 prod-jenkaas-bootstack (prod-jenkaas-bootstack) wrote on 2022-12-23:

PASSED: Continuous integration, rev:c4365e49ac50e96d4af4d1cd57ddc6ca3166d5d2
https://jenkins.canonical.com/bootstack/job/lp-charm-graylog-ci/210/
Executed test runs:
SUCCESS: https://jenkins.canonical.com/bootstack/job/lp-charm-test-functest/1799/
None: https://jenkins.canonical.com/bootstack/job/lp-update-mp/1852/

Click here to trigger a rebuild:
https://jenkins.canonical.com/bootstack/job/lp-charm-graylog-ci/210//rebuild

review: Approve (continuous-integration)

JamesLin (jneo8) on 2022-12-26:

review: Approve

🤖 Canonical IS Merge Bot (canonical-is-mergebot) wrote on 2022-12-26:

Change successfully merged at revision 4f92d5ba4a8fe0f32622190d9214300e8dc3281e

Preview Diff

 diff --git a/src/lib/charms/layer/elasticsearch/api.py b/src/lib/charms/layer/elasticsearch/api.py
 index 811432b..b78064c 100644
 --- a/src/lib/charms/layer/elasticsearch/api.py
 +++ b/src/lib/charms/layer/elasticsearch/api.py
@@ -1,30 +1,43 @@
  """Elastic search API connector."""
  import json
++from urllib.parse import urljoin
  from charmhelpers.core import hookenv
  import requests
++class ElasticSearchNotHealthy(Exception):
++    pass
++
++
  class ElasticSearchApi(object):
      def __init__(self, endpoints):
--        self.endpoints = endpoints
--        self.reachable_ep = None
--        for ep in self.endpoints:
--            try:
--                hookenv.log("Trying to connect to endpoint {}".format(ep))
--                requests.get(ep)
--                self.reachable_ep = ep
--                break
--            except Exception:
--                hookenv.log("Endpoint {} is not a valid endpoint".format(ep))
--
--        if self.reachable_ep is not None:
--            hookenv.log("Connected to ES endpoint: {}".format(self.reachable_ep))
++        # any endpoint will do
++        self.endpoint = endpoints[0]
++
++    def health_check(self):
++        response = None
++        health_endpoint = urljoin(
++            self.endpoint, "/_cluster/health?wait_for_status=green&timeout=10s"
++        )
++        try:
++            hookenv.log("Verifying ES cluster is healthy (green).")
++            response = requests.get(health_endpoint)
++        except Exception:
++            message = "Endpoint {} is not a valid endpoint".format(health_endpoint)
++            hookenv.log(message)
++            raise ConnectionError(message)
          else:
--            hookenv.log("Can not connect to ES API")
--            raise ConnectionError
++            result = response.json()
++            message = "ES cluster is healthy (green)."
++            if result.get("status") == "green":
++                hookenv.log(message)
++            else:
++                message = "ES cluster is not healthy or cannot connect to ES."
++                hookenv.log(message)
++                raise ElasticSearchNotHealthy(message)
      # NOTE(erlon): As per Graylog's docs, graylog can inadvertently create a
      # index called 'graylog_deflector' on elastic search. This index conflicts
@@ -36,5 +49,5 @@ class ElasticSearchApi(object):
              "persistent": {"action.auto_create_index": "-graylog_deflector,+*"}
+         }
          api_params = json.dumps(api_params)
--        url = "%s%s" % (self.reachable_ep, api_url)
++        url = urljoin(self.endpoint, api_url)
          requests.put(url, data=api_params)
 diff --git a/src/lib/charms/layer/graylog/__init__.py b/src/lib/charms/layer/graylog/__init__.py
 index 5bab227..fa74c7d 100644
 --- a/src/lib/charms/layer/graylog/__init__.py
 +++ b/src/lib/charms/layer/graylog/__init__.py
@@ -1,6 +1,6 @@
  """Graylog library."""
--from .api import GraylogApi  # noqa: F401
++from .api import ApiTimeout, GraylogApi  # noqa: F401
  from .logextract import (  # noqa: F401
      GraylogPipelines,
      GraylogRules,
 diff --git a/src/lib/charms/layer/graylog/api.py b/src/lib/charms/layer/graylog/api.py
 index d93e741..cdda27b 100644
 --- a/src/lib/charms/layer/graylog/api.py
 +++ b/src/lib/charms/layer/graylog/api.py
@@ -4,6 +4,11 @@ import os
  import requests
++from tenacity import retry
++from tenacity.retry import retry_if_exception_type
++from tenacity.stop import stop_after_delay
++from tenacity.wait import wait_fixed
++
  # When using 'certifi' from the virtualenv, the system-wide certificates store
  # is not used, so installed certificates won't be used to validate hosts.
@@ -11,6 +16,7 @@ import requests
  # https://git.launchpad.net/ubuntu/+source/python-certifi/tree/debian/patches/0001-Use-Debian-provided-etc-ssl-certs-ca-certificates.cr.patch
  SYSTEM_CA_BUNDLE = "/etc/ssl/certs/ca-certificates.crt"
  DEFAULT_BACKEND_USER_ROLE = "Reader"
++DEFAULT_REST_API_TIMEOUT = 10
  # We are in a charm environment
  charm = False
@@ -25,6 +31,12 @@ def get_ignore_indexer_failures_file():  # noqa: D103
      return "/usr/local/lib/nagios/plugins/ignore_indexer_failures.timestamp"
++class ApiTimeout(Exception):
++    """Unable to restart Graylog in a timely manner."""
++
++    pass
++
++
  class GraylogApi:
      """Manage Graylog via its API."""
@@ -41,10 +53,23 @@ class GraylogApi:
          self.token_name = token_name
          self.token = None
          self.input_types = None
--        self.req_timeout = 3
          self.req_retries = 4
++        self.req_timeout = 3
          self.verify = verify
++    @retry(
++        wait=wait_fixed(5),
++        stop=stop_after_delay(DEFAULT_REST_API_TIMEOUT),
++        retry=retry_if_exception_type(ApiTimeout),
++        reraise=True,
++    )
++    def health_check(self):
++        health_check_endpoint = ""
++        if not self.request(health_check_endpoint, prime_token=False):
++            raise ApiTimeout(
++                "Timeout waiting for graylog api; will retry after update-status."
++            )
++
      def request(  # noqa: C901
          self, path, method="GET", data={}, params=None, prime_token=True
      ):
@@ -65,11 +90,13 @@ class GraylogApi:
+         }
          if data:
              data = json.dumps(data, indent=True)
++
          tries = 0
--        while tries < self.req_retries:
++        result = False
++        while tries < self.req_retries and not result:
              tries += 1
              try:
--                resp = requests.api.request(
++                resp = requests.request(
                      method,
                      url,
                      auth=self.auth,
@@ -79,24 +106,30 @@ class GraylogApi:
                      timeout=self.req_timeout,
                      verify=SYSTEM_CA_BUNDLE if self.verify is None else self.verify,
+                 )
++            except Exception as ex:
++                msg = "Error calling graylog api: {}".format(ex)
++                if charm:
++                    log(msg)
++                else:
++                    print(msg)
++                result = False
++            else:
                  if resp.ok:
                      if method == "DELETE":
--                        return True
++                        result = True
                      if resp.content:
--                        return resp.json()
++                        result = resp.json()
                  else:
--                    msg = "API error code: {}".format(resp.status_code)
++                    msg = "{}: response of graylog api is not okay. Reason: {}".format(
++                        resp.status_code, resp.reason
++                    )
                      if charm:
                          log(msg)
                          hookenv.status_set("blocked", msg)
                      else:
                          print(msg)
--            except Exception as ex:
--                msg = "Error calling graylog api: {}".format(ex)
--                if charm:
--                    log(msg)
--                else:
--                    print(msg)
++                    result = False
++        return result
      def token_get(self, token_name=None, halt=False):
          """Return a token."""
 diff --git a/src/lib/charms/layer/graylog/constants.py b/src/lib/charms/layer/graylog/constants.py
 index 4340dc8..2c5f733 100644
 --- a/src/lib/charms/layer/graylog/constants.py
 +++ b/src/lib/charms/layer/graylog/constants.py
@@ -14,7 +14,6 @@ SHIPPED_SNAP_SERVER_DEFAULT_CONF_FILE = (
  SERVER_DEFAULT_CONF_FILE = "/var/snap/graylog/current/default-graylog-server"
  ELASTICSEARCH_DISCOVERY_PORT = "9300"
  SERVICE_NAME = "snap.graylog.graylog"
--DEFAULT_REST_API_TIMEOUT = 120
  NAGIOS_USERNAME = "nagios"
  CERT_PATH = os.path.join(SNAP_COMMON_DIR, "server.crt")
  CERT_KEY_PATH = os.path.join(SNAP_COMMON_DIR, "server.key")
 diff --git a/src/reactive/graylog.py b/src/reactive/graylog.py
 index b6f4313..ffd17ad 100644
 --- a/src/reactive/graylog.py
 +++ b/src/reactive/graylog.py
@@ -17,6 +17,7 @@ import charms.leadership
  from charms.layer import snap, tls_client
  from charms.layer.elasticsearch import api as esapi
  from charms.layer.graylog import (
++    ApiTimeout,
      GraylogApi,
      LogExtractPipeline,
      create_or_update_ldap_backend,
@@ -32,7 +33,6 @@ from charms.layer.graylog.constants import (
      CERT_PATH,
      CONF_FILE,
      CONTENT_PACKS_PATH,
--    DEFAULT_REST_API_TIMEOUT,
      ELASTICSEARCH_DISCOVERY_PORT,
      NAGIOS_USERNAME,
      SERVER_DEFAULT_CONF_FILE,
@@ -60,12 +60,6 @@ from charms.reactive.helpers import data_changed
  import yaml
--class ApiTimeout(Exception):
--    """Unable to restart Graylog in a timely manner."""
--
--    pass
--
--
  @hook("upgrade-charm")
  def upgrade_charm():
      """Reconfigure Graylog upon Juju charm upgrade."""
@@ -326,6 +320,7 @@ def report_status():  # noqa: C901
      beats_available = is_state("beat.setup")
      es_connected = is_state("elasticsearch.connected")
      es_available = is_state("elasticsearch.available")
++    es_ready = is_state("elasticsearch.ready")
      mongodb_connected = is_state("mongodb.connected")
      mongodb_available = is_state("mongodb.available")
      requested_certs = is_state("graylog.certificates.configured")
@@ -389,7 +384,7 @@ def report_status():  # noqa: C901
          # Elasticsearch
          if es_connected and not es_available:
              waiting_apps.append("elasticsearch")
--        elif es_available:
++        elif es_available and es_ready:
              ready_apps.append("elasticsearch")
          # MongoDB
@@ -400,7 +395,7 @@ def report_status():  # noqa: C901
          # Graylog REST API
          try:
--            _verify_rest_api_is_alive(timeout=5)
++            _verify_rest_api_is_alive()
          except ApiTimeout:
              waiting_apps.append("REST API")
@@ -415,6 +410,15 @@ def report_status():  # noqa: C901
  @when("graylog.configured")
++@when("graylog_api.configured")
++@when_not("graylog.content_pack_installed")
++def configure_content_packs():
++    """Install graylog content packs."""
++    install_content_packs()
++    set_state("graylog.content_pack_installed")
++
++
++@when("graylog.configured")
  @when("mongodb_config.set")
  @when("elasticsearch_config.set")
  @when_not("graylog.needs_restart")
@@ -426,9 +430,8 @@ def configure_graylog_api(*discard):
      except ApiTimeout:
          # Corner case: ES/Mongo are up, but REST API is not up yet.
          # Just wait (status already set in report_status()) and try again next time.
--        pass
++        remove_state("graylog_api.configured")
      else:
--        install_content_packs()
          remove_state("beat.setup")
          remove_state("graylog_index_sets.configured")
          remove_state("graylog_inputs.configured")
@@ -723,7 +726,29 @@ def configure_inputs(*discard):
      set_state("graylog_inputs.configured")
++@when_not("elasticsearch.ready")
++@when("elasticsearch.available")
++def check_elasticsearch_health(elasticsearch):
++    """Check if ES cluster is in green state."""
++    http_hosts = [
++        "http://{}:{}".format(unit["host"], unit["port"])
++        for unit in elasticsearch.list_unit_data()
++    ]
++    es = esapi.ElasticSearchApi(http_hosts)
++    try:
++        es.health_check()
++    except ConnectionError as conn_err:
++        remove_state("elasticsearch.ready")
++        hookenv.log("ES cluster not ready: {}".format(conn_err))
++    except esapi.ElasticSearchNotHealthy as not_healthy_err:
++        remove_state("elasticsearch.ready")
++        hookenv.log("ES cluster not ready: {}".format(not_healthy_err))
++    else:
++        set_state("elasticsearch.ready")
++
++
  @when("graylog.configured")
++@when("elasticsearch.ready")
  @when("elasticsearch.available")
  def configure_elasticsearch(elasticsearch):
      """Configure ES parameters in Graylog's configuration file."""
@@ -940,13 +965,13 @@ def restart_service(service=SERVICE_NAME):
      This handles situations when relations are adding quick enough to trigger
      systemd's standard protection against frequent restarts.
      """
--    host.service_restart(service)
++    num_retries = 2
      remove_state("graylog.needs_restart")
--    if host.service_running(service):
--        return
--
--    time.sleep(15)
--    host.service_restart(service)
++    for i in range(num_retries):
++        host.service_restart(service)
++        if host.service_running(service):
++            return
++        time.sleep(15)
  def set_conf(key, value, conf_path=CONF_FILE):
@@ -1197,6 +1222,7 @@ def set_up_nagios_user():
  @when("leadership.is_leader")
  @when("leadership.set.nagios_password")
++@when("graylog_api.configured")
  @when_not("leadership.set.nagios_token")
  def set_up_nagios_token():
      """Configure a token for Nagios to use."""
@@ -1376,7 +1402,7 @@ def trigger_restart_after_tls_cert_update():
      set_state("graylog.initial_certs_received")
      remove_state("tls_client.certs.saved")
      remove_state("tls_client.server.certs.changed")
--    set_state("graylog.needs_restart")
++    flag_restart_and_api_reconfigure_needed()
  @when("graylog.configured")
@@ -1449,19 +1475,17 @@ def get_default_graylog_client():  # noqa: D103
+     )
--def _verify_rest_api_is_alive(timeout=DEFAULT_REST_API_TIMEOUT):
++def _verify_rest_api_is_alive():
      hookenv.log("Verifying REST API is alive...")
      g = get_default_graylog_client()
--    url = ""  # Will query using the base URL of the client, i.e. /api/
--    resp = g.request(url)
--    start_ts = time.time()
--    while resp is None:
--        time.sleep(5)
--        hookenv.log("Retrying REST API check...")
--        resp = g.request(url)
--        if time.time() - start_ts > timeout:
--            raise ApiTimeout()
--    hookenv.log("REST API is up")
++    try:
++        g.health_check()
++    except ApiTimeout as err:
++        hookenv.log("REST API is not up")
++        hookenv.status_set("blocked", str(err))
++        raise ApiTimeout(err)
++    else:
++        hookenv.log("REST API is up")
  def _maybe_install_ca_certificates_hook():
 diff --git a/src/tests/unit/requirements.txt b/src/tests/unit/requirements.txt
 index c465d62..fbcf43e 100644
 --- a/src/tests/unit/requirements.txt
 +++ b/src/tests/unit/requirements.txt
@@ -5,3 +5,4 @@ netifaces
  pytest
  pytest-cov
  requests
++tenacity
 diff --git a/src/tests/unit/test_es_api.py b/src/tests/unit/test_es_api.py
 index 8032de5..c4959d0 100644
 --- a/src/tests/unit/test_es_api.py
 +++ b/src/tests/unit/test_es_api.py
@@ -8,40 +8,35 @@ import charms.layer.elasticsearch.api as api
  class TestESAPI(unittest.TestCase):
      @mock.patch("requests.get")
--    def test_init_class(self, req_get):
--        def _side_effect_generator(endpoint):
--            if "unreachable" in endpoint:
--                raise ConnectionError
++    def test_init_class_all_okay(self, req_get):
++        fake_endpoints = ["http://reachable.host1:9000", "http://reachable.host2:9000"]
++        try:
++            mock_resp = mock.Mock()
++            mock_resp.json.return_value = {"status": "green"}
++            req_get.return_value = mock_resp
++            es = api.ElasticSearchApi(fake_endpoints)
++            es.health_check()
++        except Exception:
++            self.fail("Elasticsearch should be in green state.")
--        # All reachable
++    @mock.patch("requests.get")
++    def test_init_class_connection_error(self, req_get):
          fake_endpoints = ["http://reachable.host1:9000", "http://reachable.host2:9000"]
--        es = api.ElasticSearchApi(fake_endpoints)
--        self.assertTrue(es.reachable_ep in fake_endpoints[0])
--
--        fake_endpoints = [
--            "http://unreachable.host1:9000",
--            "http://unreachable.host2:9000",
--            "http://host2:9000",
--        ]
--        req_get.side_effect = _side_effect_generator
--        es = api.ElasticSearchApi(fake_endpoints)
--        self.assertTrue(es.reachable_ep in fake_endpoints[2])
--
--        fake_endpoints = [
--            "http://reachable.host1:9000",
--            "http://unreachable.host1:9000",
--            "http://unreachable.host2:9000",
--        ]
--        req_get.side_effect = _side_effect_generator
--        es = api.ElasticSearchApi(fake_endpoints)
--        self.assertTrue(es.reachable_ep in fake_endpoints[0])
++        req_get.reset_mock()
++        req_get.side_effect = ConnectionError
++        with self.assertRaises(ConnectionError):
++            es = api.ElasticSearchApi(fake_endpoints)
++            es.health_check()
--        fake_endpoints = [
--            "http://unreachable.host1:9000",
--            "http://unreachable.host2:9000",
--        ]
--        req_get.side_effect = _side_effect_generator
--        self.assertRaises(ConnectionError, api.ElasticSearchApi, fake_endpoints)
++    @mock.patch("requests.get")
++    def test_init_class_not_healthy(self, req_get):
++        fake_endpoints = ["http://reachable.host1:9000", "http://reachable.host2:9000"]
++        mock_resp = mock.Mock()
++        mock_resp.json.return_value = {"status": "yellow"}
++        req_get.return_value = mock_resp
++        with self.assertRaises(api.ElasticSearchNotHealthy):
++            es = api.ElasticSearchApi(fake_endpoints)
++            es.health_check()
      @mock.patch("json.dumps")
      @mock.patch("requests.put")
@@ -50,6 +45,10 @@ class TestESAPI(unittest.TestCase):
          fake_endpoints = ["http://host1:9000", "http://host2:9000"]
          j_dumps.return_value = '{"fake_data":"data"}'
++        mock_resp = mock.Mock()
++        mock_resp.json.return_value = {"status": "green"}
++        req_get.return_value = mock_resp
++
          es = api.ElasticSearchApi(fake_endpoints)
          es.disable_auto_create_index()
          req_put.assert_called_with(