Merge lp:~termie/nova/rpc_multicall into lp:~hudson-openstack/nova/trunk

Proposed by termie
Status: Merged
Approved by: Vish Ishaya
Approved revision: 1138
Merged at revision: 1116
Proposed branch: lp:~termie/nova/rpc_multicall
Merge into: lp:~hudson-openstack/nova/trunk
Diff against target: 995 lines (+443/-144)
8 files modified
nova/fakerabbit.py (+23/-8)
nova/rpc.py (+208/-71)
nova/service.py (+36/-24)
nova/test.py (+6/-3)
nova/tests/integrated/integrated_helpers.py (+1/-4)
nova/tests/test_cloud.py (+9/-17)
nova/tests/test_rpc.py (+107/-11)
nova/tests/test_service.py (+53/-6)
To merge this branch: bzr merge lp:~termie/nova/rpc_multicall
Reviewer Review Type Date Requested Status
Vish Ishaya (community) Approve
Ed Leafe (community) Approve
Chris Behrens (community) Approve
Jay Pipes (community) Needs Information
Review via email: mp+61686@code.launchpad.net

Description of the change

Adds the ability to make a call that returns multiple times (a call returning a generator). This is also based on the work in rpc-improvements, plus a bunch of fixes Vish and I worked through to get all the tests to pass, so the code is a bit all over the place.

The functionality is being added to support Vish's work on removing worker access to the database; it allows us to write multi-phase actions that yield state updates as they progress, letting the frontend update the db.
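
For a sense of the new interface, here is a minimal sketch modeled on the new rpc tests in the diff below; it assumes a consumer is already listening on the 'test' topic with a proxy exposing the echo_three_times_yield method from TestReceiver:

    from nova import context
    from nova import rpc

    ctxt = context.get_admin_context()

    # rpc.call still returns a single (final) result; rpc.multicall returns
    # an iterable, and each value the remote method yields (or passes to
    # context.reply) arrives as a separate result, in order.
    results = rpc.multicall(ctxt, 'test',
                            {"method": "echo_three_times_yield",
                             "args": {"value": 42}})
    for value in results:
        print value  # 42, then 43, then 44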

Revision history for this message
Chris Behrens (cbehrens) wrote :

Added ConnectionPool tests, and changed the Pool to be LIFO: lp:~cbehrens/nova/rpc_multicall
(See comments added around ConnectionPool in rpc.py)

One other problem I can think of here:

MulticallWaiter() will put the connection back into the pool as long as its 'close' method is called.
But what happens if an exception occurs between someone pulling results and calling .close()?

Should we put in a def __del__() in MulticallWaiter() to make absolutely sure we .put the connection back when we're done with the result?
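
A minimal sketch of the safeguard being asked about, purely hypothetical and not part of this branch, added as an extra method on MulticallWaiter:

    def __del__(self):
        # Hypothetical safety net: if the caller abandons the iterator
        # without reaching close(), e.g. because an exception interrupted
        # result iteration, still return the pooled connection exactly once.
        if not self._closed:
            try:
                self.close()
            except Exception:
                # Never let __del__ raise; dropping the connection is
                # preferable to masking the original error during GC.
                pass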

lp:~termie/nova/rpc_multicall updated
1088. By Justin Shepherd

The tools/* directory is now included in pep8 runs. Added an opt-out system for excluding files/dirs from pep8 (using GLOBIGNORE).

1089. By Johannes Erdfelt

The XenAPI driver uses openssl as part of the nova-agent implementation to set the password for root. It uses a temporary file insecurely and unnecessarily. Change the code to write the password directly to stdin of the openssl process instead.

1090. By Johannes Erdfelt

Add new flag 'max_kernel_ramdisk_size' to specify a maximum size of kernel or ramdisk so we don't copy large files to dom0 and fill up /boot/guest

1091. By Anthony Young

This fix ensures that kpartx -d is called in the event that tune2fs fails during key injection, as it does when trying to inject a key into a windows instance. Bug #760921 is a symptom of this issue: if kpartx -d is not called, partitions remain mapped and prevent the underlying nbd from being reused.

Couldn't think of a good way to regression test for this - any ideas?

1092. By John Tran

Added an EC2 API endpoint that allows import of a public key. Previously, the API only allowed generation of new keys.

1093. By Dan Prince

Update OSAPI v1.1 extensions so that it supports RequestExtensions. ResponseExtensions were removed since the new RequestExtension covers both use cases. This branch also removes some of the odd serialization code in the RequestExtensionController that converted dictionary objects into webob objects. RequestExtension handlers should now always return proper webob objects.

1094. By William Wolf

Get rid of old virt/images.py functions that are no longer needed. Checked for any loose calls to these functions and found none. All tests pass for me.

1095. By Rick Harris

This is the groundwork for the upcoming distributed scheduler changes. Nothing is actually wired up here, so it shouldn't break any existing code (and all tests pass).

The goals were to:

1. Define the basic distributed scheduler communication mechanism:
   a. call_zone_method - how each zone can communicate with its children

   b. encrypted child-blobs - how child zones can securely and statelessly report back weight and build-plan info

2. Put in hooks for advanced-filtering (hard-requirements, capabilities) as well as preferences (least-cost-scheduling)

3. Create a base set of dist-scheduler tests that we can extend as we add more functionality.

Next up will be to:

1. Add in a filtering driver

2. Add in a cost-scheduler driver

1096. By Eldar Nugaev

Print information about nova-manage project problems.

1097. By Vish Ishaya

Makes sure vlan creation locks so we don't race and fail to create a vlan.

1098. By Soren Hansen

Include data files for public key tests in the tarball.

1099. By <email address hidden>

Found a typo in the xenserver glance plugin that doesn't work with glance trunk. Also modified the image URL to fetch from /v1/image/X instead of /image/X, as the latter returned a 300.

1100. By Andrey Brindeyev

--dhcp-lease-max=150 by default. This prevents >150 instances in one network.

Revision history for this message
Jay Pipes (jaypipes) wrote :

Hey terms. Looks great overall, including Chris' additions and LIFO.

One little question, though.

37 - greenthread.sleep(0)
38 + for (queue, callback) in CONSUMERS.itervalues():
39 + item = self.get(queue)
40 + if item:
41 + callback(item)
42 + num += 1
43 + yield
44 + if limit and num == limit:
45 + raise StopIteration()
46 + greenthread.sleep(0.1)

Why did you change the sleep time from 0 to 0.1? Did you notice something lock up when it was at 0?

Just curious :) Cheers!
-jay

review: Needs Information
Revision history for this message
Chris Behrens (cbehrens) wrote :

Jay: I think it may be so that there's less CPU spinning... though this is just in fakerabbit.

I'm worried about the 'time.sleep(0.01)' in MulticallWaiter, though. That's a lot of send/recvs spinning until the result actually comes into the queue...

What about this:

    def wait(self):
        while not self._closed:
            try:
                self._consumer.wait(limit=1)
            except Exception:
                self.close()
                raise

            result = self._results.get()
            if isinstance(result, Exception):
                self.close()
                raise result
            if result == None:
                self.close()
                raise StopIteration
            yield result
        raise StopIteration

That removes the need for the sleep and will be less harsh on rabbit.. wait() will block until a result is received (eventlet will be polling for data). And if one happens to call wait() after close(), StopIteration is raised.

Couple other things:

1) Service.kill puts self.conn back into the pool, but it never came out of the pool in the first place. I'd just remove the 'put'. No need to use the pool for it right now, and it somewhat breaks the pool if ConsumerSet needs to reconnect to rabbit.
2) Should this code that kills the csetthread, etc, actually be in Service.stop?
3) There's no handling of reconnecting to rabbit if the connection dies. We really need a larger re-factor. But since only 'call' is using this right now, and 'call' already didn't handle reconnecting, there's not really any behavior change.

lp:~termie/nova/rpc_multicall updated
1101. By justinsb

Fix bug #744150 by starting nova-api on an unused port.

1102. By Renuka Apte

Fixes euca-attach-volume for iscsi using Xenserver

Minor changes required to xenapi functions to get correct format for volume-id, iscsi-host, etc.

1103. By Anne Gentle

Fixes some minor doc issues - misspelled flags in zones doc and also adds zones doc to an index for easier findability

1104. By Dave Walker

When adding a keypair that already exists via the EC2 API, give a friendly error and no traceback in nova-api

1105. By termie

Fixes a bug related to incorrect reparsing of flags and prevents many extra reparses.

1106. By Chris Behrens

Pretty simple. We call openssl to encrypt the admin password, but the recent changes around this code forgot to strip the newline off the read from stdout.

1107. By Johannes Erdfelt

Using the root-password subcommand of the nova client results in the password being changed for the instance specified, but to a different unknown password. The patch changes nova to use the password specified in the API call.

1108. By Johannes Erdfelt

eventlet.spawn_n() takes the function and its arguments, but the arguments must be passed unpacked since it uses *args.

1109. By Ed Leafe

The code for getting an opaque reference to an instance assumed that there was a reference to an instance obj available when raising an exception. I changed this from raising an InstanceNotFound exception to a NotFound, as this is more appropriate for the failure, and doesn't require an instance ID.

1110. By Brian Lamar

Created new libvirt directory, moved libvirt_conn.py to libvirt/connection.py, moved libvirt templates, broke out firewall and network utilities.

1111. By Chris Behrens

During the API create call, the API would kick off a build and then loop in a greenthread waiting for the scheduler to pick a host for the instance. After API would see a host was picked, it would cast to the compute node's set_admin_password method.

The API server really should not have to do this. The password to set should be pushed along with the build request, instead. The compute node can then set the password after it detects the instance has booted. This removes a greenthread from the API server, a loop that constantly checks the DB for the host, and finally a cast to the compute node.

1112. By Mark Washenberger

Several changes designed to bring the openstack api 1.1 closer to spec
- add ram limits to the nova compute quotas
- enable injected file limits and injected file size limits to be overridden in the quota database table
- expose quota limits as absolute limits in the openstack api 1.1 limits resource
- add support for controlling 'unlimited' quotas to nova-manage

1113. By Alex Meade

Fixed the mistyped line referred to in bug 787023

Revision history for this message
termie (termie) wrote :

Chris: I've just tried a few different variations of the above suggested code, as well as code based on iterconsume, but all have resulted in issues of some sort or another that have been difficult to debug (either hanging or returning nothing)... the code works as it is, so I am hesitant to go back down the rabbit hole. Will push a newer version shortly.

lp:~termie/nova/rpc_multicall updated
1114. By termie

add support to rpc for multicall

1115. By termie

make the test more explicit

1116. By termie

add commented out unworking code for yield-based returns

1117. By Chris Behrens

Add a connection pool for rpc cast/call
Use the same rabbit connection for all topic listening and wait to be notified vs doing a 0.1 second poll for each.

1118. By Chris Behrens

pep8 and comment fixes

1119. By Chris Behrens

convert fanout_cast to ConnectionPool

1120. By Chris Behrens

fakerabbit's declare_consumer should support more than 1 consumer. also: make fakerabbit Backend.consume be an iterator like it should be..

1121. By Vish Ishaya

fix consumers to actually be deleted and clean up cloud test

1122. By Chris Behrens

catch greenlet.GreenletExit when shutting service down

1123. By Chris Behrens

Always create Service consumers no matter if report_interval is 0
Fix tests to handle how Service loads Consumers now

1124. By Chris Behrens

Add rpc_conn_pool_size flag for the new connection pool

1125. By Chris Behrens

connection pool tests and make the pool LIFO

1126. By termie

bring back commits lost in merge

1127. By termie

almost everything working with fake_rabbit

1128. By termie

don't need to use a separate connection

1129. By Vish Ishaya

lots of fixes for rpc and extra imports

1130. By termie

make sure that using multicall on a call with a single result still functions

1131. By termie

cleanup the code for merging

1132. By termie

cleanups

1133. By termie

replace removed import

1134. By termie

don't put connection back in pool

1135. By termie

move consumerset killing into stop

1136. By termie

change the behavior of calling a multicall

Revision history for this message
Chris Behrens (cbehrens) wrote :

termie: Strange. Well, I'd like to get this merged sooner rather than later, so it's probably okay for now. I have a couple of other starts at redoing this code even more. One of them uses kombu, which seems to simplify some things for us, but it still has some annoyances. More on that later.

I think there was one more thing I spotted when playing around with larger refactors that I didn't comment on here yet. I'll take a look at what you've changed here since I last looked.

Revision history for this message
Chris Behrens (cbehrens) wrote :

Ah, I added a self.conn.close() to Service.stop() after killing the csetthread. Not terribly important.

Looks good to me.

Revision history for this message
Ed Leafe (ed-leafe) wrote :

When testing if an object is a generator, relying on having an attribute named 'send' is weak and error-prone. It would be preferable to use:
   isinstance(rval, types.GeneratorType)
to check for generators.
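
A quick illustration of why the duck-typed check is fragile (the names here are only for demonstration):

    import types

    def gen():
        yield 1

    class FakeSocket(object):
        def send(self, data):
            pass

    # hasattr matches anything with a send() method, not just generators:
    hasattr(gen(), 'send')                         # True
    hasattr(FakeSocket(), 'send')                  # True, but not a generator
    isinstance(gen(), types.GeneratorType)         # True
    isinstance(FakeSocket(), types.GeneratorType)  # False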

Line 239 of the diff (wait method): Logging the error should probably not only include the type of exception, but the exception message as well:
    type_e = type(e)
    LOG.error(_("Received exception %(e)s (%(type_e)s while processing consumer")
            % locals())

In the call() method of nova/rpc.py (416 of the diff), is there any chance that MultiCall() will return either a) a non-list result or b) an empty list? If so, the slice action will throw an error.

In nova/service.py, the declaration at line 109 is stylistically inconsistent. The declarations above it use full names ('consumer_all', 'consumer_node', etc.), while the last name is 'cset'; it should be 'consumer_set' for naming consistency. Also, the indentation of continued lines is inconsistent; since the preceding lines use the standard two indents, this last declaration should follow that convention. And while I'm nitpicking, there should not be a blank line between the first 3 and the last declarations (i.e., remove line 108).

review: Needs Fixing
Revision history for this message
Josh Kearney (jk0) wrote :

Not that it matters much, but I always prefer doing my for loops like so (in reference to the tests near the end):

for i, x in enumerate(y):

^ This eliminates having to set i = 0 and manually increment it inside the loop.
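
As a generic illustration of the pattern (not code from the branch):

    values = ['a', 'b', 'c']

    # Manual counter:
    i = 0
    for x in values:
        print i, x
        i += 1

    # With enumerate, the counter is maintained for you:
    for i, x in enumerate(values):
        print i, x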

lp:~termie/nova/rpc_multicall updated
1137. By termie

changes per review

Revision history for this message
termie (termie) wrote :

Ed and jk0, I think I've addressed all the issues you've brought up.

Revision history for this message
Chris Behrens (cbehrens) wrote :

Something I spotted which is a bug that predates your changes:

215 msg_reply(msg_id, _('No method for message: %s') % message_data)

msg_id could be None there.

Revision history for this message
Chris Behrens (cbehrens) wrote :

Also: would it simplify things to remove this:

- # This will be popped off in _unpack_context
- msg_id = message_data.get('_msg_id', None)

And using ctxt.reply going forward? See: http://paste.openstack.org/show/1433/

RPC test passes.
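
The concrete version is in the linked paste and is not reproduced here; as a rough sketch of the idea, based on the RpcContext class added in this branch, the "no method" path in process_data could look something like:

    # _unpack_context already pops '_msg_id' into the RpcContext, so
    # process_data would not read it separately and replies would go
    # through the context:
    ctxt = _unpack_context(message_data)
    method = message_data.get('method')
    if not method:
        LOG.warn(_('no method for message: %s') % message_data)
        if ctxt.msg_id:
            ctxt.reply(_('No method for message: %s') % message_data)
        return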

lp:~termie/nova/rpc_multicall updated
1138. By termie

fix a minor bug unrelated to this change

Revision history for this message
termie (termie) wrote :

Fixed the bug; I don't particularly want to do that other change for this, but you are welcome to submit such a fixup later :)

Revision history for this message
Chris Behrens (cbehrens) wrote :

I'm good with that. :)

review: Approve
Revision history for this message
Ed Leafe (ed-leafe) wrote :

Fixes look good.

review: Approve
Revision history for this message
Vish Ishaya (vishvananda) wrote :

I've been testing this as we go as well. I think it is good to go.

review: Approve

Preview Diff

1=== modified file 'nova/fakerabbit.py'
2--- nova/fakerabbit.py 2011-02-22 23:05:48 +0000
3+++ nova/fakerabbit.py 2011-05-27 00:09:30 +0000
4@@ -31,6 +31,7 @@
5
6 EXCHANGES = {}
7 QUEUES = {}
8+CONSUMERS = {}
9
10
11 class Message(base.BaseMessage):
12@@ -96,17 +97,29 @@
13 ' key %(routing_key)s') % locals())
14 EXCHANGES[exchange].bind(QUEUES[queue].push, routing_key)
15
16- def declare_consumer(self, queue, callback, *args, **kwargs):
17- self.current_queue = queue
18- self.current_callback = callback
19+ def declare_consumer(self, queue, callback, consumer_tag, *args, **kwargs):
20+ global CONSUMERS
21+ LOG.debug("Adding consumer %s", consumer_tag)
22+ CONSUMERS[consumer_tag] = (queue, callback)
23+
24+ def cancel(self, consumer_tag):
25+ global CONSUMERS
26+ LOG.debug("Removing consumer %s", consumer_tag)
27+ del CONSUMERS[consumer_tag]
28
29 def consume(self, limit=None):
30+ global CONSUMERS
31+ num = 0
32 while True:
33- item = self.get(self.current_queue)
34- if item:
35- self.current_callback(item)
36- raise StopIteration()
37- greenthread.sleep(0)
38+ for (queue, callback) in CONSUMERS.itervalues():
39+ item = self.get(queue)
40+ if item:
41+ callback(item)
42+ num += 1
43+ yield
44+ if limit and num == limit:
45+ raise StopIteration()
46+ greenthread.sleep(0.1)
47
48 def get(self, queue, no_ack=False):
49 global QUEUES
50@@ -134,5 +147,7 @@
51 def reset_all():
52 global EXCHANGES
53 global QUEUES
54+ global CONSUMERS
55 EXCHANGES = {}
56 QUEUES = {}
57+ CONSUMERS = {}
58
59=== modified file 'nova/rpc.py'
60--- nova/rpc.py 2011-04-20 19:08:22 +0000
61+++ nova/rpc.py 2011-05-27 00:09:30 +0000
62@@ -28,12 +28,15 @@
63 import sys
64 import time
65 import traceback
66+import types
67 import uuid
68
69 from carrot import connection as carrot_connection
70 from carrot import messaging
71 from eventlet import greenpool
72-from eventlet import greenthread
73+from eventlet import pools
74+from eventlet import queue
75+import greenlet
76
77 from nova import context
78 from nova import exception
79@@ -47,7 +50,10 @@
80
81
82 FLAGS = flags.FLAGS
83-flags.DEFINE_integer('rpc_thread_pool_size', 1024, 'Size of RPC thread pool')
84+flags.DEFINE_integer('rpc_thread_pool_size', 1024,
85+ 'Size of RPC thread pool')
86+flags.DEFINE_integer('rpc_conn_pool_size', 30,
87+ 'Size of RPC connection pool')
88
89
90 class Connection(carrot_connection.BrokerConnection):
91@@ -90,6 +96,22 @@
92 return cls.instance()
93
94
95+class Pool(pools.Pool):
96+ """Class that implements a Pool of Connections."""
97+
98+ # TODO(comstud): Timeout connections not used in a while
99+ def create(self):
100+ LOG.debug('Creating new connection')
101+ return Connection.instance(new=True)
102+
103+# Create a ConnectionPool to use for RPC calls. We'll order the
104+# pool as a stack (LIFO), so that we can potentially loop through and
105+# timeout old unused connections at some point
106+ConnectionPool = Pool(
107+ max_size=FLAGS.rpc_conn_pool_size,
108+ order_as_stack=True)
109+
110+
111 class Consumer(messaging.Consumer):
112 """Consumer base class.
113
114@@ -131,7 +153,9 @@
115 self.connection = Connection.recreate()
116 self.backend = self.connection.create_backend()
117 self.declare()
118- super(Consumer, self).fetch(no_ack, auto_ack, enable_callbacks)
119+ return super(Consumer, self).fetch(no_ack,
120+ auto_ack,
121+ enable_callbacks)
122 if self.failed_connection:
123 LOG.error(_('Reconnected to queue'))
124 self.failed_connection = False
125@@ -159,13 +183,13 @@
126 self.pool = greenpool.GreenPool(FLAGS.rpc_thread_pool_size)
127 super(AdapterConsumer, self).__init__(connection=connection,
128 topic=topic)
129-
130- def receive(self, *args, **kwargs):
131- self.pool.spawn_n(self._receive, *args, **kwargs)
132-
133- @exception.wrap_exception
134- def _receive(self, message_data, message):
135- """Magically looks for a method on the proxy object and calls it.
136+ self.register_callback(self.process_data)
137+
138+ def process_data(self, message_data, message):
139+ """Consumer callback to call a method on a proxy object.
140+
141+ Parses the message for validity and fires off a thread to call the
142+ proxy object method.
143
144 Message data should be a dictionary with two keys:
145 method: string representing the method to call
146@@ -175,8 +199,8 @@
147
148 """
149 LOG.debug(_('received %s') % message_data)
150- msg_id = message_data.pop('_msg_id', None)
151-
152+ # This will be popped off in _unpack_context
153+ msg_id = message_data.get('_msg_id', None)
154 ctxt = _unpack_context(message_data)
155
156 method = message_data.get('method')
157@@ -188,8 +212,17 @@
158 # we just log the message and send an error string
159 # back to the caller
160 LOG.warn(_('no method for message: %s') % message_data)
161- msg_reply(msg_id, _('No method for message: %s') % message_data)
162+ if msg_id:
163+ msg_reply(msg_id,
164+ _('No method for message: %s') % message_data)
165 return
166+ self.pool.spawn_n(self._process_data, msg_id, ctxt, method, args)
167+
168+ @exception.wrap_exception
169+ def _process_data(self, msg_id, ctxt, method, args):
170+ """Thread that maigcally looks for a method on the proxy
171+ object and calls it.
172+ """
173
174 node_func = getattr(self.proxy, str(method))
175 node_args = dict((str(k), v) for k, v in args.iteritems())
176@@ -197,7 +230,18 @@
177 try:
178 rval = node_func(context=ctxt, **node_args)
179 if msg_id:
180- msg_reply(msg_id, rval, None)
181+ # Check if the result was a generator
182+ if isinstance(rval, types.GeneratorType):
183+ for x in rval:
184+ msg_reply(msg_id, x, None)
185+ else:
186+ msg_reply(msg_id, rval, None)
187+
188+ # This final None tells multicall that it is done.
189+ msg_reply(msg_id, None, None)
190+ elif isinstance(rval, types.GeneratorType):
191+ # NOTE(vish): this iterates through the generator
192+ list(rval)
193 except Exception as e:
194 logging.exception('Exception during message handling')
195 if msg_id:
196@@ -205,11 +249,6 @@
197 return
198
199
200-class Publisher(messaging.Publisher):
201- """Publisher base class."""
202- pass
203-
204-
205 class TopicAdapterConsumer(AdapterConsumer):
206 """Consumes messages on a specific topic."""
207
208@@ -242,6 +281,58 @@
209 topic=topic, proxy=proxy)
210
211
212+class ConsumerSet(object):
213+ """Groups consumers to listen on together on a single connection."""
214+
215+ def __init__(self, connection, consumer_list):
216+ self.consumer_list = set(consumer_list)
217+ self.consumer_set = None
218+ self.enabled = True
219+ self.init(connection)
220+
221+ def init(self, conn):
222+ if not conn:
223+ conn = Connection.instance(new=True)
224+ if self.consumer_set:
225+ self.consumer_set.close()
226+ self.consumer_set = messaging.ConsumerSet(conn)
227+ for consumer in self.consumer_list:
228+ consumer.connection = conn
229+ # consumer.backend is set for us
230+ self.consumer_set.add_consumer(consumer)
231+
232+ def reconnect(self):
233+ self.init(None)
234+
235+ def wait(self, limit=None):
236+ running = True
237+ while running:
238+ it = self.consumer_set.iterconsume(limit=limit)
239+ if not it:
240+ break
241+ while True:
242+ try:
243+ it.next()
244+ except StopIteration:
245+ return
246+ except greenlet.GreenletExit:
247+ running = False
248+ break
249+ except Exception as e:
250+ LOG.exception(_("Exception while processing consumer"))
251+ self.reconnect()
252+ # Break to outer loop
253+ break
254+
255+ def close(self):
256+ self.consumer_set.close()
257+
258+
259+class Publisher(messaging.Publisher):
260+ """Publisher base class."""
261+ pass
262+
263+
264 class TopicPublisher(Publisher):
265 """Publishes messages on a specific topic."""
266
267@@ -306,16 +397,18 @@
268 LOG.error(_("Returning exception %s to caller"), message)
269 LOG.error(tb)
270 failure = (failure[0].__name__, str(failure[1]), tb)
271- conn = Connection.instance()
272- publisher = DirectPublisher(connection=conn, msg_id=msg_id)
273- try:
274- publisher.send({'result': reply, 'failure': failure})
275- except TypeError:
276- publisher.send(
277- {'result': dict((k, repr(v))
278- for k, v in reply.__dict__.iteritems()),
279- 'failure': failure})
280- publisher.close()
281+
282+ with ConnectionPool.item() as conn:
283+ publisher = DirectPublisher(connection=conn, msg_id=msg_id)
284+ try:
285+ publisher.send({'result': reply, 'failure': failure})
286+ except TypeError:
287+ publisher.send(
288+ {'result': dict((k, repr(v))
289+ for k, v in reply.__dict__.iteritems()),
290+ 'failure': failure})
291+
292+ publisher.close()
293
294
295 class RemoteError(exception.Error):
296@@ -347,8 +440,9 @@
297 if key.startswith('_context_'):
298 value = msg.pop(key)
299 context_dict[key[9:]] = value
300+ context_dict['msg_id'] = msg.pop('_msg_id', None)
301 LOG.debug(_('unpacked context: %s'), context_dict)
302- return context.RequestContext.from_dict(context_dict)
303+ return RpcContext.from_dict(context_dict)
304
305
306 def _pack_context(msg, context):
307@@ -360,70 +454,112 @@
308 for args at some point.
309
310 """
311- context = dict([('_context_%s' % key, value)
312- for (key, value) in context.to_dict().iteritems()])
313- msg.update(context)
314-
315-
316-def call(context, topic, msg):
317- """Sends a message on a topic and wait for a response."""
318+ context_d = dict([('_context_%s' % key, value)
319+ for (key, value) in context.to_dict().iteritems()])
320+ msg.update(context_d)
321+
322+
323+class RpcContext(context.RequestContext):
324+ def __init__(self, *args, **kwargs):
325+ msg_id = kwargs.pop('msg_id', None)
326+ self.msg_id = msg_id
327+ super(RpcContext, self).__init__(*args, **kwargs)
328+
329+ def reply(self, *args, **kwargs):
330+ msg_reply(self.msg_id, *args, **kwargs)
331+
332+
333+def multicall(context, topic, msg):
334+ """Make a call that returns multiple times."""
335 LOG.debug(_('Making asynchronous call on %s ...'), topic)
336 msg_id = uuid.uuid4().hex
337 msg.update({'_msg_id': msg_id})
338 LOG.debug(_('MSG_ID is %s') % (msg_id))
339 _pack_context(msg, context)
340
341- class WaitMessage(object):
342- def __call__(self, data, message):
343- """Acks message and sets result."""
344- message.ack()
345- if data['failure']:
346- self.result = RemoteError(*data['failure'])
347- else:
348- self.result = data['result']
349-
350- wait_msg = WaitMessage()
351- conn = Connection.instance()
352- consumer = DirectConsumer(connection=conn, msg_id=msg_id)
353+ con_conn = ConnectionPool.get()
354+ consumer = DirectConsumer(connection=con_conn, msg_id=msg_id)
355+ wait_msg = MulticallWaiter(consumer)
356 consumer.register_callback(wait_msg)
357
358- conn = Connection.instance()
359- publisher = TopicPublisher(connection=conn, topic=topic)
360+ publisher = TopicPublisher(connection=con_conn, topic=topic)
361 publisher.send(msg)
362 publisher.close()
363
364- try:
365- consumer.wait(limit=1)
366- except StopIteration:
367- pass
368- consumer.close()
369- # NOTE(termie): this is a little bit of a change from the original
370- # non-eventlet code where returning a Failure
371- # instance from a deferred call is very similar to
372- # raising an exception
373- if isinstance(wait_msg.result, Exception):
374- raise wait_msg.result
375- return wait_msg.result
376+ return wait_msg
377+
378+
379+class MulticallWaiter(object):
380+ def __init__(self, consumer):
381+ self._consumer = consumer
382+ self._results = queue.Queue()
383+ self._closed = False
384+
385+ def close(self):
386+ self._closed = True
387+ self._consumer.close()
388+ ConnectionPool.put(self._consumer.connection)
389+
390+ def __call__(self, data, message):
391+ """Acks message and sets result."""
392+ message.ack()
393+ if data['failure']:
394+ self._results.put(RemoteError(*data['failure']))
395+ else:
396+ self._results.put(data['result'])
397+
398+ def __iter__(self):
399+ return self.wait()
400+
401+ def wait(self):
402+ while True:
403+ rv = None
404+ while rv is None and not self._closed:
405+ try:
406+ rv = self._consumer.fetch(enable_callbacks=True)
407+ except Exception:
408+ self.close()
409+ raise
410+ time.sleep(0.01)
411+
412+ result = self._results.get()
413+ if isinstance(result, Exception):
414+ self.close()
415+ raise result
416+ if result == None:
417+ self.close()
418+ raise StopIteration
419+ yield result
420+
421+
422+def call(context, topic, msg):
423+ """Sends a message on a topic and wait for a response."""
424+ rv = multicall(context, topic, msg)
425+ # NOTE(vish): return the last result from the multicall
426+ rv = list(rv)
427+ if not rv:
428+ return
429+ return rv[-1]
430
431
432 def cast(context, topic, msg):
433 """Sends a message on a topic without waiting for a response."""
434 LOG.debug(_('Making asynchronous cast on %s...'), topic)
435 _pack_context(msg, context)
436- conn = Connection.instance()
437- publisher = TopicPublisher(connection=conn, topic=topic)
438- publisher.send(msg)
439- publisher.close()
440+ with ConnectionPool.item() as conn:
441+ publisher = TopicPublisher(connection=conn, topic=topic)
442+ publisher.send(msg)
443+ publisher.close()
444
445
446 def fanout_cast(context, topic, msg):
447 """Sends a message on a fanout exchange without waiting for a response."""
448 LOG.debug(_('Making asynchronous fanout cast...'))
449 _pack_context(msg, context)
450- conn = Connection.instance()
451- publisher = FanoutPublisher(topic, connection=conn)
452- publisher.send(msg)
453- publisher.close()
454+ with ConnectionPool.item() as conn:
455+ publisher = FanoutPublisher(topic, connection=conn)
456+ publisher.send(msg)
457+ publisher.close()
458
459
460 def generic_response(message_data, message):
461@@ -459,6 +595,7 @@
462
463 if wait:
464 consumer.wait()
465+ consumer.close()
466
467
468 if __name__ == '__main__':
469
470=== modified file 'nova/service.py'
471--- nova/service.py 2011-05-20 04:03:15 +0000
472+++ nova/service.py 2011-05-27 00:09:30 +0000
473@@ -19,14 +19,11 @@
474
475 """Generic Node baseclass for all workers that run on hosts."""
476
477+import greenlet
478 import inspect
479 import os
480-import sys
481-import time
482
483-from eventlet import event
484 from eventlet import greenthread
485-from eventlet import greenpool
486
487 from nova import context
488 from nova import db
489@@ -91,27 +88,37 @@
490 if 'nova-compute' == self.binary:
491 self.manager.update_available_resource(ctxt)
492
493- conn1 = rpc.Connection.instance(new=True)
494- conn2 = rpc.Connection.instance(new=True)
495- conn3 = rpc.Connection.instance(new=True)
496+ self.conn = rpc.Connection.instance(new=True)
497+ logging.debug("Creating Consumer connection for Service %s" %
498+ self.topic)
499+
500+ # Share this same connection for these Consumers
501+ consumer_all = rpc.TopicAdapterConsumer(
502+ connection=self.conn,
503+ topic=self.topic,
504+ proxy=self)
505+ consumer_node = rpc.TopicAdapterConsumer(
506+ connection=self.conn,
507+ topic='%s.%s' % (self.topic, self.host),
508+ proxy=self)
509+ fanout = rpc.FanoutAdapterConsumer(
510+ connection=self.conn,
511+ topic=self.topic,
512+ proxy=self)
513+ consumer_set = rpc.ConsumerSet(
514+ connection=self.conn,
515+ consumer_list=[consumer_all, consumer_node, fanout])
516+
517+ # Wait forever, processing these consumers
518+ def _wait():
519+ try:
520+ consumer_set.wait()
521+ finally:
522+ consumer_set.close()
523+
524+ self.consumer_set_thread = greenthread.spawn(_wait)
525+
526 if self.report_interval:
527- consumer_all = rpc.TopicAdapterConsumer(
528- connection=conn1,
529- topic=self.topic,
530- proxy=self)
531- consumer_node = rpc.TopicAdapterConsumer(
532- connection=conn2,
533- topic='%s.%s' % (self.topic, self.host),
534- proxy=self)
535- fanout = rpc.FanoutAdapterConsumer(
536- connection=conn3,
537- topic=self.topic,
538- proxy=self)
539-
540- self.timers.append(consumer_all.attach_to_eventlet())
541- self.timers.append(consumer_node.attach_to_eventlet())
542- self.timers.append(fanout.attach_to_eventlet())
543-
544 pulse = utils.LoopingCall(self.report_state)
545 pulse.start(interval=self.report_interval, now=False)
546 self.timers.append(pulse)
547@@ -174,6 +181,11 @@
548 logging.warn(_('Service killed that has no database entry'))
549
550 def stop(self):
551+ self.consumer_set_thread.kill()
552+ try:
553+ self.consumer_set_thread.wait()
554+ except greenlet.GreenletExit:
555+ pass
556 for x in self.timers:
557 try:
558 x.stop()
559
560=== modified file 'nova/test.py'
561--- nova/test.py 2011-04-20 19:08:22 +0000
562+++ nova/test.py 2011-05-27 00:09:30 +0000
563@@ -31,17 +31,15 @@
564 import unittest
565
566 import mox
567-import shutil
568 import stubout
569 from eventlet import greenthread
570
571-from nova import context
572-from nova import db
573 from nova import fakerabbit
574 from nova import flags
575 from nova import rpc
576 from nova import service
577 from nova import wsgi
578+from nova.virt import fake
579
580
581 FLAGS = flags.FLAGS
582@@ -85,6 +83,7 @@
583 self._monkey_patch_attach()
584 self._monkey_patch_wsgi()
585 self._original_flags = FLAGS.FlagValuesDict()
586+ rpc.ConnectionPool = rpc.Pool(max_size=FLAGS.rpc_conn_pool_size)
587
588 def tearDown(self):
589 """Runs after each test method to tear down test environment."""
590@@ -99,6 +98,10 @@
591 if FLAGS.fake_rabbit:
592 fakerabbit.reset_all()
593
594+ if FLAGS.connection_type == 'fake':
595+ if hasattr(fake.FakeConnection, '_instance'):
596+ del fake.FakeConnection._instance
597+
598 # Reset any overriden flags
599 self.reset_flags()
600
601
602=== modified file 'nova/tests/integrated/integrated_helpers.py'
603--- nova/tests/integrated/integrated_helpers.py 2011-03-30 16:08:36 +0000
604+++ nova/tests/integrated/integrated_helpers.py 2011-05-27 00:09:30 +0000
605@@ -154,10 +154,7 @@
606 # set up services
607 self.start_service('compute')
608 self.start_service('volume')
609- # NOTE(justinsb): There's a bug here which is eluding me...
610- # If we start the network_service, all is good, but then subsequent
611- # tests fail: CloudTestCase.test_ajax_console in particular.
612- #self.start_service('network')
613+ self.start_service('network')
614 self.start_service('scheduler')
615
616 self._start_api_service()
617
618=== modified file 'nova/tests/test_cloud.py'
619--- nova/tests/test_cloud.py 2011-05-20 06:51:29 +0000
620+++ nova/tests/test_cloud.py 2011-05-27 00:09:30 +0000
621@@ -17,13 +17,9 @@
622 # under the License.
623
624 from base64 import b64decode
625-import json
626 from M2Crypto import BIO
627 from M2Crypto import RSA
628 import os
629-import shutil
630-import tempfile
631-import time
632
633 from eventlet import greenthread
634
635@@ -33,12 +29,10 @@
636 from nova import flags
637 from nova import log as logging
638 from nova import rpc
639-from nova import service
640 from nova import test
641 from nova import utils
642 from nova import exception
643 from nova.auth import manager
644-from nova.compute import power_state
645 from nova.api.ec2 import cloud
646 from nova.api.ec2 import ec2utils
647 from nova.image import local
648@@ -79,14 +73,21 @@
649 self.stubs.Set(local.LocalImageService, 'show', fake_show)
650 self.stubs.Set(local.LocalImageService, 'show_by_name', fake_show)
651
652+ # NOTE(vish): set up a manual wait so rpc.cast has a chance to finish
653+ rpc_cast = rpc.cast
654+
655+ def finish_cast(*args, **kwargs):
656+ rpc_cast(*args, **kwargs)
657+ greenthread.sleep(0.2)
658+
659+ self.stubs.Set(rpc, 'cast', finish_cast)
660+
661 def tearDown(self):
662 network_ref = db.project_get_network(self.context,
663 self.project.id)
664 db.network_disassociate(self.context, network_ref['id'])
665 self.manager.delete_project(self.project)
666 self.manager.delete_user(self.user)
667- self.compute.kill()
668- self.network.kill()
669 super(CloudTestCase, self).tearDown()
670
671 def _create_key(self, name):
672@@ -113,7 +114,6 @@
673 self.cloud.describe_addresses(self.context)
674 self.cloud.release_address(self.context,
675 public_ip=address)
676- greenthread.sleep(0.3)
677 db.floating_ip_destroy(self.context, address)
678
679 def test_associate_disassociate_address(self):
680@@ -129,12 +129,10 @@
681 self.cloud.associate_address(self.context,
682 instance_id=ec2_id,
683 public_ip=address)
684- greenthread.sleep(0.3)
685 self.cloud.disassociate_address(self.context,
686 public_ip=address)
687 self.cloud.release_address(self.context,
688 public_ip=address)
689- greenthread.sleep(0.3)
690 self.network.deallocate_fixed_ip(self.context, fixed)
691 db.instance_destroy(self.context, inst['id'])
692 db.floating_ip_destroy(self.context, address)
693@@ -306,31 +304,25 @@
694 'instance_type': instance_type,
695 'max_count': max_count}
696 rv = self.cloud.run_instances(self.context, **kwargs)
697- greenthread.sleep(0.3)
698 instance_id = rv['instancesSet'][0]['instanceId']
699 output = self.cloud.get_console_output(context=self.context,
700 instance_id=[instance_id])
701 self.assertEquals(b64decode(output['output']), 'FAKE CONSOLE?OUTPUT')
702 # TODO(soren): We need this until we can stop polling in the rpc code
703 # for unit tests.
704- greenthread.sleep(0.3)
705 rv = self.cloud.terminate_instances(self.context, [instance_id])
706- greenthread.sleep(0.3)
707
708 def test_ajax_console(self):
709 kwargs = {'image_id': 'ami-1'}
710 rv = self.cloud.run_instances(self.context, **kwargs)
711 instance_id = rv['instancesSet'][0]['instanceId']
712- greenthread.sleep(0.3)
713 output = self.cloud.get_ajax_console(context=self.context,
714 instance_id=[instance_id])
715 self.assertEquals(output['url'],
716 '%s/?token=FAKETOKEN' % FLAGS.ajax_console_proxy_url)
717 # TODO(soren): We need this until we can stop polling in the rpc code
718 # for unit tests.
719- greenthread.sleep(0.3)
720 rv = self.cloud.terminate_instances(self.context, [instance_id])
721- greenthread.sleep(0.3)
722
723 def test_key_generation(self):
724 result = self._create_key('test')
725
726=== modified file 'nova/tests/test_rpc.py'
727--- nova/tests/test_rpc.py 2011-02-23 22:41:11 +0000
728+++ nova/tests/test_rpc.py 2011-05-27 00:09:30 +0000
729@@ -31,7 +31,6 @@
730
731
732 class RpcTestCase(test.TestCase):
733- """Test cases for rpc"""
734 def setUp(self):
735 super(RpcTestCase, self).setUp()
736 self.conn = rpc.Connection.instance(True)
737@@ -43,14 +42,55 @@
738 self.context = context.get_admin_context()
739
740 def test_call_succeed(self):
741- """Get a value through rpc call"""
742 value = 42
743 result = rpc.call(self.context, 'test', {"method": "echo",
744 "args": {"value": value}})
745 self.assertEqual(value, result)
746
747+ def test_call_succeed_despite_multiple_returns(self):
748+ value = 42
749+ result = rpc.call(self.context, 'test', {"method": "echo_three_times",
750+ "args": {"value": value}})
751+ self.assertEqual(value + 2, result)
752+
753+ def test_call_succeed_despite_multiple_returns_yield(self):
754+ value = 42
755+ result = rpc.call(self.context, 'test',
756+ {"method": "echo_three_times_yield",
757+ "args": {"value": value}})
758+ self.assertEqual(value + 2, result)
759+
760+ def test_multicall_succeed_once(self):
761+ value = 42
762+ result = rpc.multicall(self.context,
763+ 'test',
764+ {"method": "echo",
765+ "args": {"value": value}})
766+ for i, x in enumerate(result):
767+ if i > 0:
768+ self.fail('should only receive one response')
769+ self.assertEqual(value + i, x)
770+
771+ def test_multicall_succeed_three_times(self):
772+ value = 42
773+ result = rpc.multicall(self.context,
774+ 'test',
775+ {"method": "echo_three_times",
776+ "args": {"value": value}})
777+ for i, x in enumerate(result):
778+ self.assertEqual(value + i, x)
779+
780+ def test_multicall_succeed_three_times_yield(self):
781+ value = 42
782+ result = rpc.multicall(self.context,
783+ 'test',
784+ {"method": "echo_three_times_yield",
785+ "args": {"value": value}})
786+ for i, x in enumerate(result):
787+ self.assertEqual(value + i, x)
788+
789 def test_context_passed(self):
790- """Makes sure a context is passed through rpc call"""
791+ """Makes sure a context is passed through rpc call."""
792 value = 42
793 result = rpc.call(self.context,
794 'test', {"method": "context",
795@@ -58,11 +98,12 @@
796 self.assertEqual(self.context.to_dict(), result)
797
798 def test_call_exception(self):
799- """Test that exception gets passed back properly
800+ """Test that exception gets passed back properly.
801
802 rpc.call returns a RemoteError object. The value of the
803 exception is converted to a string, so we convert it back
804 to an int in the test.
805+
806 """
807 value = 42
808 self.assertRaises(rpc.RemoteError,
809@@ -81,7 +122,7 @@
810 self.assertEqual(int(exc.value), value)
811
812 def test_nested_calls(self):
813- """Test that we can do an rpc.call inside another call"""
814+ """Test that we can do an rpc.call inside another call."""
815 class Nested(object):
816 @staticmethod
817 def echo(context, queue, value):
818@@ -108,25 +149,80 @@
819 "value": value}})
820 self.assertEqual(value, result)
821
822+ def test_connectionpool_single(self):
823+ """Test that ConnectionPool recycles a single connection."""
824+ conn1 = rpc.ConnectionPool.get()
825+ rpc.ConnectionPool.put(conn1)
826+ conn2 = rpc.ConnectionPool.get()
827+ rpc.ConnectionPool.put(conn2)
828+ self.assertEqual(conn1, conn2)
829+
830+ def test_connectionpool_double(self):
831+ """Test that ConnectionPool returns and reuses separate connections.
832+
833+ When called consecutively we should get separate connections and upon
834+ returning them those connections should be reused for future calls
835+ before generating a new connection.
836+
837+ """
838+ conn1 = rpc.ConnectionPool.get()
839+ conn2 = rpc.ConnectionPool.get()
840+
841+ self.assertNotEqual(conn1, conn2)
842+ rpc.ConnectionPool.put(conn1)
843+ rpc.ConnectionPool.put(conn2)
844+
845+ conn3 = rpc.ConnectionPool.get()
846+ conn4 = rpc.ConnectionPool.get()
847+ self.assertEqual(conn1, conn3)
848+ self.assertEqual(conn2, conn4)
849+
850+ def test_connectionpool_limit(self):
851+ """Test connection pool limit and connection uniqueness."""
852+ max_size = FLAGS.rpc_conn_pool_size
853+ conns = []
854+
855+ for i in xrange(max_size):
856+ conns.append(rpc.ConnectionPool.get())
857+
858+ self.assertFalse(rpc.ConnectionPool.free_items)
859+ self.assertEqual(rpc.ConnectionPool.current_size,
860+ rpc.ConnectionPool.max_size)
861+ self.assertEqual(len(set(conns)), max_size)
862+
863
864 class TestReceiver(object):
865- """Simple Proxy class so the consumer has methods to call
866-
867- Uses static methods because we aren't actually storing any state"""
868+ """Simple Proxy class so the consumer has methods to call.
869+
870+ Uses static methods because we aren't actually storing any state.
871+
872+ """
873
874 @staticmethod
875 def echo(context, value):
876- """Simply returns whatever value is sent in"""
877+ """Simply returns whatever value is sent in."""
878 LOG.debug(_("Received %s"), value)
879 return value
880
881 @staticmethod
882 def context(context, value):
883- """Returns dictionary version of context"""
884+ """Returns dictionary version of context."""
885 LOG.debug(_("Received %s"), context)
886 return context.to_dict()
887
888 @staticmethod
889+ def echo_three_times(context, value):
890+ context.reply(value)
891+ context.reply(value + 1)
892+ context.reply(value + 2)
893+
894+ @staticmethod
895+ def echo_three_times_yield(context, value):
896+ yield value
897+ yield value + 1
898+ yield value + 2
899+
900+ @staticmethod
901 def fail(context, value):
902- """Raises an exception with the value sent in"""
903+ """Raises an exception with the value sent in."""
904 raise Exception(value)
905
906=== modified file 'nova/tests/test_service.py'
907--- nova/tests/test_service.py 2011-03-17 13:35:00 +0000
908+++ nova/tests/test_service.py 2011-05-27 00:09:30 +0000
909@@ -106,7 +106,10 @@
910
911 # NOTE(vish): Create was moved out of mox replay to make sure that
912 # the looping calls are created in StartService.
913- app = service.Service.create(host=host, binary=binary)
914+ app = service.Service.create(host=host, binary=binary, topic=topic)
915+
916+ self.mox.StubOutWithMock(service.rpc.Connection, 'instance')
917+ service.rpc.Connection.instance(new=mox.IgnoreArg())
918
919 self.mox.StubOutWithMock(rpc,
920 'TopicAdapterConsumer',
921@@ -114,6 +117,11 @@
922 self.mox.StubOutWithMock(rpc,
923 'FanoutAdapterConsumer',
924 use_mock_anything=True)
925+
926+ self.mox.StubOutWithMock(rpc,
927+ 'ConsumerSet',
928+ use_mock_anything=True)
929+
930 rpc.TopicAdapterConsumer(connection=mox.IgnoreArg(),
931 topic=topic,
932 proxy=mox.IsA(service.Service)).AndReturn(
933@@ -129,9 +137,14 @@
934 proxy=mox.IsA(service.Service)).AndReturn(
935 rpc.FanoutAdapterConsumer)
936
937- rpc.TopicAdapterConsumer.attach_to_eventlet()
938- rpc.TopicAdapterConsumer.attach_to_eventlet()
939- rpc.FanoutAdapterConsumer.attach_to_eventlet()
940+ def wait_func(self, limit=None):
941+ return None
942+
943+ mock_cset = self.mox.CreateMock(rpc.ConsumerSet,
944+ {'wait': wait_func})
945+ rpc.ConsumerSet(connection=mox.IgnoreArg(),
946+ consumer_list=mox.IsA(list)).AndReturn(mock_cset)
947+ wait_func(mox.IgnoreArg())
948
949 service_create = {'host': host,
950 'binary': binary,
951@@ -287,8 +300,42 @@
952 # Creating mocks
953 self.mox.StubOutWithMock(service.rpc.Connection, 'instance')
954 service.rpc.Connection.instance(new=mox.IgnoreArg())
955- service.rpc.Connection.instance(new=mox.IgnoreArg())
956- service.rpc.Connection.instance(new=mox.IgnoreArg())
957+
958+ self.mox.StubOutWithMock(rpc,
959+ 'TopicAdapterConsumer',
960+ use_mock_anything=True)
961+ self.mox.StubOutWithMock(rpc,
962+ 'FanoutAdapterConsumer',
963+ use_mock_anything=True)
964+
965+ self.mox.StubOutWithMock(rpc,
966+ 'ConsumerSet',
967+ use_mock_anything=True)
968+
969+ rpc.TopicAdapterConsumer(connection=mox.IgnoreArg(),
970+ topic=topic,
971+ proxy=mox.IsA(service.Service)).AndReturn(
972+ rpc.TopicAdapterConsumer)
973+
974+ rpc.TopicAdapterConsumer(connection=mox.IgnoreArg(),
975+ topic='%s.%s' % (topic, host),
976+ proxy=mox.IsA(service.Service)).AndReturn(
977+ rpc.TopicAdapterConsumer)
978+
979+ rpc.FanoutAdapterConsumer(connection=mox.IgnoreArg(),
980+ topic=topic,
981+ proxy=mox.IsA(service.Service)).AndReturn(
982+ rpc.FanoutAdapterConsumer)
983+
984+ def wait_func(self, limit=None):
985+ return None
986+
987+ mock_cset = self.mox.CreateMock(rpc.ConsumerSet,
988+ {'wait': wait_func})
989+ rpc.ConsumerSet(connection=mox.IgnoreArg(),
990+ consumer_list=mox.IsA(list)).AndReturn(mock_cset)
991+ wait_func(mox.IgnoreArg())
992+
993 self.mox.StubOutWithMock(serv.manager.driver,
994 'update_available_resource')
995 serv.manager.driver.update_available_resource(mox.IgnoreArg(), host)