Merge lp:~nttdata/nova/live-migration into lp:~hudson-openstack/nova/trunk

Proposed by Kei Masumoto
Status: Merged
Merged at revision: 799
Proposed branch: lp:~nttdata/nova/live-migration
Merge into: lp:~hudson-openstack/nova/trunk
Diff against target: 3335 lines (+2788/-10)
21 files modified
bin/nova-manage (+88/-0)
contrib/nova.sh (+1/-0)
nova/compute/manager.py (+252/-1)
nova/db/api.py (+59/-0)
nova/db/sqlalchemy/api.py (+121/-0)
nova/db/sqlalchemy/migrate_repo/versions/010_add_live_migration.py (+83/-0)
nova/db/sqlalchemy/models.py (+38/-0)
nova/scheduler/driver.py (+237/-0)
nova/scheduler/manager.py (+52/-0)
nova/service.py (+3/-0)
nova/tests/test_compute.py (+294/-0)
nova/tests/test_scheduler.py (+622/-1)
nova/tests/test_service.py (+41/-0)
nova/tests/test_virt.py (+223/-3)
nova/tests/test_volume.py (+195/-0)
nova/virt/cpuinfo.xml.template (+9/-0)
nova/virt/fake.py (+21/-0)
nova/virt/libvirt_conn.py (+369/-0)
nova/virt/xenapi_conn.py (+21/-0)
nova/volume/driver.py (+52/-4)
nova/volume/manager.py (+7/-1)
To merge this branch: bzr merge lp:~nttdata/nova/live-migration
Reviewer Review Type Date Requested Status
Ken Pepple (community) Approve
Thierry Carrez (community) Approve
Jay Pipes (community) Approve
Rick Harris (community) Approve
Brian Schott (community) Approve
termie (community) Needs Fixing
Review via email: mp+49699@code.launchpad.net

Description of the change

Main changes from previous merge request:

1. Added test code.
2. Bug fixes:
   - Improper resource checking
     (memory checking is sufficient for the current version).
   - Retrying when live migration requests arrive in quick succession;
     in this case iptables complains, so we retry.
3. iSCSI EBS volume checking:
   - Added nova.volume.driver.ISCSIDriver.check_for_export.
   - Changed nova.compute.post_live_migration to log out from the iSCSI server.

Please feel free to give us comments.
Thanks in advance.

Revision history for this message
Rick Harris (rconradharris) wrote :

Just a few nits :)

> + def describeresource(self, host):
> + def updateresource(self, host):

These should probably be `describe_resource` and `update_resource` respectively.

3083 +def mktmpfile(dir):
3084 + """create tmpfile under dir, and return filename."""
3085 + filename = datetime.datetime.utcnow().strftime('%Y%m%d%H%M%S')
3086 + fpath = os.path.join(dir, filename)
3087 + open(fpath, 'a+').write(fpath + '\n')
3088 + return fpath

It would probably be better to use the `tempfile` module in the Python stdlib.
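For illustration, a minimal sketch of such a replacement, assuming the same `dir` argument and path-returning convention as the original helper:

    import os
    import tempfile

    def mktmpfile(dir):
        """Create a uniquely named file under dir and return its path."""
        fd, fpath = tempfile.mkstemp(dir=dir)
        os.write(fd, fpath + '\n')  # keep the same file contents as the original helper
        os.close(fd)
        return fpath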

3091 +def exists(filename):
3092 + """check file path existence."""
3093 + return os.path.exists(filename)
3094 +
3095 +
3096 +def remove(filename):
3097 + """remove file."""
3098 + return os.remove(filename)

These wrapper functions seem unnecessary; it would probably be better to just use os.path.exists and os.remove directly in the code.

If you need a stub-point for testing, you can stub out `os.path` and `os` directly.

+ LOG.info('post_live_migration() is started..')

Needs i18n _('post_live...') treatment.
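For example, the gettext-wrapped form would be:

    LOG.info(_('post_live_migration() is started..'))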

533 + #services.create_column(services_vcpus)
534 + #services.create_column(services_memory_mb)
535 + #services.create_column(services_local_gb)
536 + #services.create_column(services_vcpus_used)
537 + #services.create_column(services_memory_mb_used)
538 + #services.create_column(services_local_gb_used)
539 + #services.create_column(services_hypervisor_type)
540 + #services.create_column(services_hypervisor_version)
541 + #services.create_column(services_cpu_info)

Was this left in by mistake?

902 + print 'manager.attrerr', e

Probably should be logging here, rather than printing to stdout.

review: Needs Fixing
Revision history for this message
Kei Masumoto (masumotok) wrote :

Hi Rick,

Thanks for review!
I think I have addressed all of your comments.

Additional changes were made to nova.compute.api.create and nova.image.s3.
They are not related to live migration, but instances that have a kernel and ramdisk cannot launch without them. I never changed these files, and before raising this merge request I not only ran run_tests.sh but also confirmed that instances migrated successfully on a real server, so I have no idea when these changes were included...
Anyway, I think this change is necessary. Could you also please review it?

Kind regards,
Kei


Revision history for this message
Brian Schott (bfschott) wrote :

We're very interested in this capability, so we're looking forward to it. A few comments.

1. Current branch conflicts with lp:nova trunk.

+N nova/db/sqlalchemy/migrate_repo/versions/003_add_label_to_networks.py
+N nova/db/sqlalchemy/migrate_repo/versions/004_add_zone_tables.py

fix: bschott@island100:~/source/nova/live-migration/nova/db/sqlalchemy/migrate_repo/versions$ bzr rename 003_cactus.py 005_add_instance_migration.py

2. Should these be in their own table? That is a lot of fields to add to the Service table directly, since this is a table that has entries for every service type. I was thinking about adding a ComputeService (compute_services) table for our heterogeneous compute cluster.

627 + # The below items are compute node only.
628 + # None is inserted for other service.
629 + vcpus = Column(Integer, nullable=True)
630 + memory_mb = Column(Integer, nullable=True)
631 + local_gb = Column(Integer, nullable=True)
632 + vcpus_used = Column(Integer, nullable=True)
633 + memory_mb_used = Column(Integer, nullable=True)
634 + local_gb_used = Column(Integer, nullable=True)
635 + hypervisor_type = Column(Text, nullable=True)
636 + hypervisor_version = Column(Integer, nullable=True)

3. We can use the "arch" sub-field below for our project. Can we talk about adding accelerator_info (for GPUs, FPGAs, or other co-procesors) and possibly network_info for details on the physical network interface?

    # Note(masumotok): Expected Strings example:
    #
    # '{"arch":"x86_64", "model":"Nehalem",
    # "topology":{"sockets":1, "threads":2, "cores":3},
    # features:[ "tdtscp", "xtpr"]}'
    #
    # Points are "json translatable" and it must have all
    # dictionary keys above.
    cpu_info = Column(Text, nullable=True)

bschott@island100:~/source/nova/live-migration$ bzr merge lp:nova
+N nova/api/openstack/zones.py
+N nova/db/sqlalchemy/migrate_repo/versions/003_add_label_to_networks.py
+N nova/db/sqlalchemy/migrate_repo/versions/004_add_zone_tables.py
+N nova/tests/api/openstack/test_common.py
+N nova/tests/api/openstack/test_zones.py
 M .mailmap
 M Authors
 M HACKING
 M MANIFEST.in
 M bin/nova-manage
 M locale/nova.pot
 M nova/api/ec2/cloud.py
 M nova/api/openstack/__init__.py
 M nova/api/openstack/auth.py
 M nova/api/openstack/common.py
 M nova/api/openstack/servers.py
 M nova/auth/ldapdriver.py
 M nova/auth/novarc.template
 M nova/compute/api.py
 M nova/compute/manager.py
 M nova/compute/power_state.py
 M nova/context.py
 M nova/db/api.py
 M nova/db/sqlalchemy/api.py
 M nova/db/sqlalchemy/migrate_repo/versions/001_austin.py
 M nova/db/sqlalchemy/migrate_repo/versions/002_bexar.py
 M nova/db/sqlalchemy/migration.py
 M nova/db/sqlalchemy/models.py
 M nova/flags.py
 M nova/log.py
 M nova/network/linux_net.py
 M nova/network/manager.py
 M nova/rpc.py
 M nova/tests/api/openstack/__init__.py
 M nova/tests/api/openstack/test_servers.py
 M nova/tests/test_api.py
 M nova/tests/test_compute.py
 M nova/tests/test_log.py
 M nova/tests/test_xenapi.py
 M nova/twistd.py
 M nova/u...


review: Needs Fixing
Revision history for this message
termie (termie) wrote :

Hello :) I think the code looks very good (the tests especially appear thorough); however, there are many places for style cleanup. You may want to read the part of the HACKING file about docstrings before going on:

in bin/nova-api:

looks like utils.default_flagfile() should be in the __main__ function rather than at the top of the file.

in bin/nova-dhcpbridge:

looks like there is a leftover debugging statement ('open...')

in bin/nova-manage:

please update the docstring for 'live_migration' to describe what it will do (something like "Migrates a running instance to a new machine." is fine)

for the long "if FLAGS.volume_driver..." line, please instead put the line in parens like so:

if (FLAGS.volume_driver != 'nova.volume.driver.AOEDriver' and
    FLAGS.volume_driver != 'nova.volume.driver.ISCSIDriver'):

When generating the "msg" you can do something similar:

msg = ('Migration of %s initiated. Checking its progress'
       ' using euca-describe-instances.') % ec2_id

in the docstring for describe_resource, please capitalize the first word (Describe...)

the comment at line 83 ("Checking result msg format is necessary...") is a little unclear, are you saying:

It will be necessary to check the result msg format when this feature is included in the API.

if so, you could say:

TODO(masumotok): It will be necessary to check the result msg...

Please capitalize the first letter of the docstring for update_resource

in nova/compute/manager.py:

the triple quotes are not necessary around the description of the 'flags.DEFINE_string' line, single quotes are fine.

flags.DEFINE_string looks like it should be flags.DEFINE_integer

the docstring for compare_cpu has an extra space at the beginning that is not necessary.

please capitalize the first letter of the docstring for mktmpfile

if you are only writing to the tmpfile for debugging purposes, perhaps that should be a logging.debug call?

please add a period to the end of the docstring for update_available_resource

in the pre_live_migration method, there should be an apostrophe in the word "doesnt" (doesn't)

may as well capitalize the first letter in the Bridge settings comment ('Call this method...')

in the message about failing a retry you can remove the 'th.' part, and change 'fail' to 'failed', it still doesn't read perfectly but pluralization isn't really necessary for log messages.

in the live_migration method, you can delete the line about #@exception.wrap_exception if you are going to comment it out.

also, please capitalize the first letter of the docstring.

in post_live_migration, please move the first line to the first line of the docstring... and reformat the string a bit like so:

"""Post operations for live migration.

Mainly, database updating.

"""

also in post_live_migration you check 'None == some_variable' a couple of times; in Python we don't usually do this, because it is impossible to write 'if some_variable = None' (the assignment operation is not an expression), which means you don't need to be extra safe about which side the variable is on, and having the variable first is easier to read (at least in English).
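For illustration (the variable and function names are just placeholders), the usual style is the variable first, and `is` for None checks:

    if some_variable is None:     # preferred
        handle_missing_value()

    if None == some_variable:     # works, but harder to read
        handle_missing_value()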

also a bit further down you don't need to use triple ...


review: Needs Fixing
Revision history for this message
Kei Masumoto (masumotok) wrote :

Hi Brian,
Thanks for the review!

I think I have fixed everything based on your comments
(the branch will be updated soon).
Also, regarding point 3 below,

> 3. We can use the "arch" sub-field below for our project. Can we talk about adding
> accelerator_info (for GPUs, FPGAs, or other co-procesors) and possibly network_info
> for details on the physical network interface?

We use the cpu_info column to store the argument to compareCPU() in virConnect.
You can get an example by following the procedure below.

# python
>>> import libvirt
>>> conn = libvirt.openReadOnly()
>>> conn.getCapabilities()

Once you follow the above, you get XML. We cut out the <cpu>...</cpu> element and store it in the db.
I think the result will look different in your hardware environment.
FPGA/GPU info may then also be included, I suppose.
If libvirt doesn't find any FPGA/GPU info, please let me know.
I will have to give it more thought..


Revision history for this message
Kei Masumoto (masumotok) wrote :

Hi termie,

Thank you very much for reviewing my branch!
I have fixed everything based on your comments.
I hope I haven't missed anything...

Please let me know if you have further comments.

Regards,
Kei


Revision history for this message
Kei Masumoto (masumotok) wrote :

Hi Rick,

Thanks again for reviewing my branch.
Since your review, I have been through two other reviewers' comments and things have improved.
Hopefully we can now proceed with merging.
Also, please let me know if I have missed any of your points.

Regards,
Kei


Revision history for this message
Brian Schott (bfschott) wrote :

I've been reviewing your changes. Thank you for doing the compute service table changes. I plan to do some integration testing next week with our hpc-trunk branch.

Brian Schott
<email address hidden>


Revision history for this message
Kei Masumoto (masumotok) wrote :

To All reviewers:

It has been a week since I last did "bzr merge lp:nova" on this branch, and I think it's better to update again to avoid any conflicts.
If you are in the middle of reviewing again, please wait for a while. I will e-mail again once I have finished. A few hours should be enough, hopefully.

Regards,
Kei Masumoto

Revision history for this message
Kei Masumoto (masumotok) wrote :

To All reviewers:

I have finished merging trunk rev 752.
A few changes were necessary. Please check the comments below.

> 1. merged trunk rev749
> 2. rpc.call returns '/' as '\/', so nova.compute.manager.mktmpfile, nova.compute.manager.confirm_tmpfile, nova.scheduler.driver.Scheduler.mounted_on_same_shared_storage are modified to follow this change.
> 3. nova.tests.test_virt.py is modified so that other teams' modifications are easily detected, since another team is using nova.db.sqlalchemy.models.ComputeService.

If you have further comments, or if I missed your point, please let me know.


Revision history for this message
Rick Harris (rconradharris) wrote :

Hi Kei! Improvements look good, thanks for the updates. Here are my round-two review notes:

> def mktmpfile(self, context):

It might be a good idea to rename these functions. Right now, the name
confirm_tmpfile contains implementation details but doesn't provide a good
hint as to what it's used for.

Might be better as:

    create_shared_storage_test_file

Also, what happens if the destination isn't on shared storage? We've deposited
the test file; will that ever be cleaned up?

Perhaps in pseudo-code it should be something like:

    def mounted_on_same_shared_storage(source, dest):

        create_shared_storage_test_file(dest)
        try:
            # Unlike confirm_tmpfile, this doesn't delete the test file; that
            # is left to cleanup_shared_storage_test_file
            test_file_exists = check_shared_storage_test_file(source)
        finally:
            # Regardless of whether we find it, we always delete it
            cleanup_shared_storage_test_file(dest)
        return test_file_exists

> ======================================================================
> ERROR: Failure: ImportError (No module named libvirt)
> ----------------------------------------------------------------------
> Traceback (most recent call last):
> File "/Library/Python/2.6/site-packages/nose-0.11.3-py2.6.egg/nose/loader.py", line 382, in loadTestsFromName
> addr.filename, addr.module)
> File "/Library/Python/2.6/site-packages/nose-0.11.3-py2.6.egg/nose/importer.py", line 39, in importFromPath
> return self.importFromDir(dir_path, fqname)
> File "/Library/Python/2.6/site-packages/nose-0.11.3-py2.6.egg/nose/importer.py", line 86, in importFromDir
> mod = load_module(part_fqname, fh, filename, desc)
> File "/Users/rick/Documents/code/openstack/nova/live-migration/nova/tests/test_virt.py", line 17, in <module>
> import libvirt
> ImportError: No module named libvirt
>
> ----------------------------------------------------------------------

Libvirt is required for the tests to run.

Since not everyone is going to have libvirt on their machine, we should probably use
the 'LazyImport pattern' here so that we only import libvirt if it's actually
going to be used -- in the case of unit tests, only the FakeLibvirt should be
used.
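A rough sketch of that lazy-import idea (this is not the actual patch; the module-level name and the existing LibvirtConnection class are assumed):

    libvirt = None

    def get_connection(read_only=False):
        """Import libvirt only when a real connection is requested."""
        global libvirt
        if libvirt is None:
            libvirt = __import__('libvirt')
        return LibvirtConnection(read_only)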

> 232 + for v in instance_ref['volumes']:

In general, it's better to use variable names with more than one letter -- it aids
readability and makes it a little easier to 'grep' around the code. In this
case, 'volume' seems like the right choice. There are a few other instances
throughout the code where I think single-letter variable names should
probably be expanded:

    + p = os.path.join(FLAGS.instances_path, filename)

On the other hand, it's fine (and idiomatic) for exception handler blocks to use `e` as the
variable for their exception. Like:

+ except exception.ProcessExecutionError, e:

> 697 +compute_services = Table('compute_services', meta,

'compute_services' sounds a little too much like the 'compute worker service'
that we already have. This might be clearer if renamed

'compute_nodes' or 'compute_hosts'.

The 'compute_node' would represent the physical machine, while the
'compute-service' would represent the logical endpoin...


review: Needs Fixing
Revision history for this message
Kei Masumoto (masumotok) wrote :

Hi Rick!
Thanks for review!

I agree with all of your comments.
I will fix them soon...

Revision history for this message
Kei Masumoto (masumotok) wrote :

Hi, Rick!

Fixed based on your comments.
One thing to note: I removed the libvirt-dependent tests in test_virt.py for developers without a libvirt environment.
(At first I was trying to mock/use FakeLibvirt as much as I could, but eventually the tests became not very meaningful :) )

Looking forward to your feedback..

Kei

Revision history for this message
Rick Harris (rconradharris) wrote :

Hi Kei.

I ran the tests on my Mac OS X machine and received 1 failure. Looks like we might need to mock out the get_cpu_info portion of the driver.

> but eventually tests become not so meaningful

Agreed, in terms of "does this really work?", the unit tests aren't a substitute for real functional/integration testing.

However, even with lots of code faked out, we can still get some value from the tests in terms of catching small issues: passing the wrong number of arguments, syntax errors, variables being of the wrong type. These are things that unit tests are really good at catching. And since we don't have the benefit of a compiler pass, these unit tests really help cut down on the number of these problems that make it into trunk.

======================================================================
ERROR: test_update_available_resource_works_correctly (nova.tests.test_virt.LibvirtConnTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/rick/Documents/code/openstack/nova/live-migration/nova/tests/test_virt.py", line 290, in test_update_available_resource_works_correctly
    conn.update_available_resource(self.context, 'dummy')
  File "/Users/rick/Documents/code/openstack/nova/live-migration/nova/virt/libvirt_conn.py", line 1042, in update_available_resource
    dic = {'vcpus': self.get_vcpu_total(),
  File "/Users/rick/Documents/code/openstack/nova/live-migration/nova/virt/libvirt_conn.py", line 861, in get_vcpu_total
    return open('/proc/cpuinfo').read().count('processor')
IOError: [Errno 2] No such file or directory: '/proc/cpuinfo'
-------------------- >> begin captured logging << --------------------
2011-03-03 12:58:20,747 AUDIT nova.auth.manager [-] Created user fake (admin: True)
2011-03-03 12:58:20,750 AUDIT nova.auth.manager [-] Created project fake with manager fake
--------------------- >> end captured logging << ---------------------

======================================================================
ERROR: test_update_available_resource_works_correctly (nova.tests.test_virt.LibvirtConnTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/rick/Documents/code/openstack/nova/live-migration/nova/tests/test_virt.py", line 325, in tearDown
    super(LibvirtConnTestCase, self).tearDown()
  File "/Users/rick/Documents/code/openstack/nova/live-migration/nova/test.py", line 91, in tearDown
    self.mox.VerifyAll()
  File "build/bdist.macosx-10.6-universal/egg/mox.py", line 286, in VerifyAll
    mock_obj._Verify()
  File "build/bdist.macosx-10.6-universal/egg/mox.py", line 506, in _Verify
    raise ExpectedMethodCallsError(self._expected_calls_queue)
ExpectedMethodCallsError: Verify: Expected methods never called:
  0. get_cpu_info.__call__() -> 'cpuinfo'
-------------------- >> begin captured logging << --------------------
2011-03-03 12:58:20,747 AUDIT nova.auth.manager [-] Created user fake (admin: True)
2011-03-03 12:58:20,750 AUDIT nova.auth.manager [-] Created project fake with manager fake
2011-03-03 12:58:20,795 AUDIT nova.auth.manager [-] Deleting project fake
2011-03-03 ...


Revision history for this message
Kei Masumoto (masumotok) wrote :

Hi Rick.

Thanks for your response.

> I ran the tests on my Mac OS X machine and received 1 failure. Looks like we might
> need to mock out the get_cpu_info portion of the driver.

Thanks for this information! That is very helpful. OK, perhaps enhancing the exception handling in get_cpu_info is better than mocking it out, since get_cpu_info() is called whenever nova-compute launches.

if sys.platform.upper() != 'LINUX2':
    return 0
else:
    return open('/proc/cpuinfo').read().count('processor')

Please let me know if you have any comments at this point.

> However, even with lots of code faked out, we can still get some value
> from the tests in terms of catching small issues: passing wrong number arguments,
> syntax errors, variables being of the wrong type. These are things that unit
> tests are really good at catching. And since we don't have the benefit of
> a compiler-pass, these unit tests really help cut down on the number of
> these problems that make it into trunk.

Understood. Actually, I was a bit confused about whether I could include unit-test-like tests in trunk.
Let me bring the deleted test code back into this branch.

I will fix this soon...

Thanks again!
Kei

Revision history for this message
Kei Masumoto (masumotok) wrote :

Hi Rick,

I have fixed my branch based on your comments.
I think I already explained the main changes in my previous e-mail - please review it.

Thanks,
Kei


Revision history for this message
Brian Schott (bfschott) wrote :

In case reviewers are hitting this:

---
bzr rename nova/db/sqlalchemy/migrate_repo/versions/009_add_instance_migrations.py nova/db/sqlalchemy/migrate_repo/versions/010_add_instance_migrations.py
--

I suggest you create flags:
compute_vcpus_total
compute_memory_mb_total
compute_local_gb_total

They could be used to specify less than the total resources (like suppose you only want to dedicate 1 core to VMs or half your host memory). Also, some Linux distros don't have /proc, or at least I think the /proc fs is still optional in the kernel.

if FLAGS.compute_vcpus_total:
    return FLAGS.compute_vcpus_total
else:
    try:
        return open('/proc/cpuinfo').read().count('processor')
    except IOError:  # (i forget what goes here :-)
        return 1

review: Approve
Revision history for this message
Kei Masumoto (masumotok) wrote :

Hi Brian,

Thanks for approval.

> bzr rename nova/db/sqlalchemy/migrate_repo/versions/009_add_instance_migrations.py
> nova/db/sqlalchemy/migrate_repo/versions/010_add_instance_migrations.py
OK. I will fix this soon.

> They can be used to specify resources less than total
> (like suppose you only want to dedicate 1 core to VMs or half your host memory)?
> Also, some Linux distros don't have /proc, or at least
> I think /proc fs is still optional in the kernel.
>
> if FLAGS.compute_vcpus_total>
> return FLAGS.compute_vcpus_total
> else:
> try
> open('/proc/cpuinfo').read().count('processors')
> except ... (i forget what goes here :-)
> return 1

I think I understand your point.
Currently, I use a multi-platform library to calculate the number of CPUs and the amount of disk, so there is no problem on that point.
Regarding memory, I tried a multi-platform library like psutil (http://code.google.com/p/psutil/), but only an older version is available on Ubuntu Maverick. So I have to wait for a newer version, and I agree with your suggestion for the memory calculation.

If "like suppose you only want to dedicate 1 core to VMs or half your host memory?" is your point, please specify which point you mean. I intend neither "only 1 core for VMs" nor "half of host memory".

Again, thanks for approval.

Kei

Revision history for this message
Brian Schott (bfschott) wrote :

Sorry, I meant it would be good for cloud server administrators to be able to specify how many cores and how much disk and memory are dedicated to nova.

If someone has a cloud on a laptop or office computers, they might want to reserve some capacity for the host operating system.

Or an admin might reserve one core and a gigabyte of memory for a Swift storage server. Not all configs are dedicated compute blades.

Looking forward to seeing this merged soon!

Sent from my iPhone


Revision history for this message
Kei Masumoto (masumotok) wrote :

Brian, thanks for the explanation! I understand your comment; "reserving some cpu/memory/disk" sounds like a good idea. I personally agree with implementing it in nova, but unfortunately I didn't mention "reserving" when I got blueprint approval. In addition, it is not only a live-migration topic but a nova-wide one: an admin would also want "reserving" when just launching instances or creating volumes, wouldn't they?
Therefore, I think it is better to discuss it at the next design summit. (Actually, I've heard from someone that the feature is necessary and that it should be implemented in the scheduler. I am sure there will be many discussions :)
Thanks again!
Kei

Revision history for this message
Brian Schott (bfschott) wrote :

No problem. I'll propose a follow-on blueprint and link it to this one with a more detailed approach.

Brian Schott
<email address hidden>


Revision history for this message
Rick Harris (rconradharris) wrote :

Nice work, Kei.

Some small nits:

> 194 + os.fdopen(fd, 'w+').close()

`os.close` should suffice since the fd that mkstemp returns is already open.
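That is, something along these lines (the directory is illustrative):

    fd, tmp_file = tempfile.mkstemp(dir=FLAGS.instances_path)
    os.close(fd)  # the fd returned by mkstemp() is already open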

> 1007 + ec2_id = instance_ref['hostname']

Doesn't appear to be used.

review: Approve
Revision history for this message
Jay Pipes (jaypipes) wrote :

Hi Kei!

All tests are passing locally for me, which is great, and the code looks very solid. Very good use of mox in your test cases.

Just a few suggestions, mostly small style stuff...

1)

There are a number of places you use __setitem__, like so:

1223 + instance_ref.__setitem__('id', 1)

it's easier to just write:

instance_ref['id'] = 1

2) Tiny style/i18n/English stuff

Please do not take offence at me correcting your English phrases! :)

62 + print 'Unexpected error occurs'

Please i18n that. Also, the English saying would be "An unexpected error has occurred."

29 + raise exception.Error(msg)

There are 2 spaces after raise. Only 1 needed :)

92 + raise exception.Invalid(_('%s does not exists.') % host)

English saying would be "%s does not exist" (without the s on exist)

146 +flags.DEFINE_string('live_migration_retry_count', 30,

Might want to use DEFINE_integer to ensure an integer is used as the flag value...

192 + LOG.debug(_("Creating tmpfile %s to notify to other "
193 + "compute node that they mounts same storage.") % tmp_file)

s/node that they mounts same storage/nodes that they should mount the same storage/

248 + msg = _("%(instance_id)s(%(ec2_id)s) does'nt have fixed_ip")

s/does'nt have/does not have/

365 + LOG.info(_('floating_ip is not found for %s'), i_name)
73 + LOG.info(_('Floating_ip is not found for %s'), i_name)

s/floating_ip is not found for/No floating IP was found for/

381 + LOG.info(_('Migrating %(i_name)s to %(dest)s finishes successfully.')

s/finishes successfully/finished successfully/

383 + LOG.info(_("The below error is normally occurs. "
384 + "Just check if instance is successfully migrated.\n"
385 + "libvir: QEMU error : Domain not found: no domain "
386 + "with matching name.."))

I would say this, instead:

LOG.info(_("You may see the error \"libvirt: QEMU error: "
           "Domain not found: no domain with matching name.\" "
           "This error can be safely ignored.")

547 + raise exception.NotFound(_("%s does not exist or not "
548 + "compute node.") % host)

s/or not compute node/or is not a compute node/

1040 + raise exception.NotEmpty(_("%(ec2_id)s is not capable to "
1041 + "migrate %(dest)s (host:%(mem_avail)s "

I'd rewrite that as "Unable to migrate %(ec2_id)s to destination: %(dest)s ..."

1073 + logging.error(_("Cannot comfirm tmpfile at %(ipath)s is on "

s/comfirm/confirm/ :)

You use this line:

global ghost, gbinary, gmox

in 2 places in the nova/tests/test_service.py file:

2198 + global ghost, gbinary, gmox
2238 + global ghost, gbinary, gmox

However, the actual variable names are:

2187 +# temporary variable to store host/binary/self.mox
2188 +# from each method to fake class.
2189 +global_host = None
2190 +global_binary = None
2191 +global_mox = None

You will want to make those consistent I believe, otherwise I'm not sure what gbinary, ghost, and gmox are going to refer to ;)

2385 + def tes1t_update_available_resource_works_correctly(self):

s/tes1t/test :) The misspelling is causing this test case not to be run. (It passes, BTW, when you fix the typo...I checked. :) )

2658 + msg = _("""Cannot confirm exported volume id:%(volume_id)s."""
2659 + """vblade process...


review: Needs Fixing
Revision history for this message
Kei Masumoto (masumotok) wrote :

Thanks for the review, Rick! I'll fix it soon..

Kei

Revision history for this message
Kei Masumoto (masumotok) wrote :

Thanks Jay! Your statement at the last IRC meeting was very helpful for me. Everyone reviewed my branch again and it got Approves!
I'll fix my branch based on your comments soon...

One note:
> You use this line:
>
> global ghost, gbinary, gmox
>
> in 2 places in the nova/tests/test_service.py file:
>
>2198 + global ghost, gbinary, gmox
>2238 + global ghost, gbinary, gmox
>
>However, the actual variable names are:
>
>2187 +# temporary variable to store host/binary/self.mox
>2188 +# from each method to fake class.
>2189 +global_host = None
>2190 +global_binary = None
>2191 +global_mox = None
>
>You will want to make those consistent I believe, otherwise I'm not sure what gbinary, ghost, and gmox are going to >refer to ;)
I completely forgot to update this testcase. I'll rewrite this. Sorry..

Thanks again!
Kei

Revision history for this message
Jay Pipes (jaypipes) wrote :

Awesome job, Kei :)

review: Approve
Revision history for this message
Brian Schott (bfschott) wrote :

+1
This branch adds a lot of new capabilities.

Brian Schott
<email address hidden>


Revision history for this message
Kei Masumoto (masumotok) wrote :

Jay, Brian, thank you! I appreciate your help!

Kei

Revision history for this message
Ken Pepple (ken-pepple) wrote :

This is a nit but will drive nova admins crazy -- in nova-manage, I think we should verify the destination host name and service aliveness before we send the rpc call off to the scheduler. I think this is as easy to implement as wrapping nova-manage:31-39 in an if..else statement with a call to db.service_get_by_host_and_topic(context, dest, "compute").

This will save us from waiting on the migration (that will never happen) and cleaning out the queue later.
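A minimal sketch of the suggested pre-check (the exact return/raise behaviour of the db call, and how a missing host should be reported, are assumptions):

    ctxt = context.get_admin_context()
    service = db.service_get_by_host_and_topic(ctxt, dest, "compute")
    if not service:
        msg = _('%s is not a running compute host.') % dest
        raise exception.Error(msg)
    # otherwise fall through to the existing rpc call to the scheduler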

review: Needs Fixing
Revision history for this message
Kei Masumoto (masumotok) wrote :

Hi Ken, thanks for your comments!
I will fix it soon..

Kei

Revision history for this message
Kei Masumoto (masumotok) wrote :

Hi Ken,

I would appreciate it if I could ask you a few questions.

1. In the currently proposed code, the scheduler checks many things: src/dest alive check, lack-of-memory check, hypervisor check... Does your comment imply that all those checks must be done in nova-manage, or just the alive check? I was thinking it is safer for those checks to be done in the scheduler, for example in case the scheduler is busy. Another example is a user sending a request directly to the scheduler without going through nova-manage (I am worrying whether some security issue arises here, or is there no need to think about it?).

2. In the currently proposed code, if the destination host is not alive (this check is done in the scheduler), an exception is raised and returned to nova-manage. Do we then have to clean up rabbitmq?

I'm a bit confused; please shed some light on this..

Kei

Revision history for this message
Ken Pepple (ken-pepple) wrote :

> 1. In current proposed code, scheduler checks many things, src/dest alive check, lack of memory check, hypervisor check... Your comment implies all those checks must be done in nova-manage? or just alive check? I was thinking it is safer that those checks is done at scheduler, for example, scheduler is busy. Other example is that an use directly send a request to scheduler not using nova-manage(what I am worrying about some security issues occurs here or no need to think?).

Hi masumoto-san -

Sorry, I meant for only basic checks to be done in nova-manage. My concern is that admins will start a live migration to a non-existent host (or a disabled host), wait for a minute or two, then check euca-describe-instances and see that nothing happened because recover_live_migration has already set it back to the "running" state.

I agree that most checks should be done in scheduler, so that later we might be able to add API support for live-migration.

Will nova/compute/manager.py:879 check to make sure that the service is alive? I thought this just looked for a queue (and queues aren't destroyed on alive checks), so this may be my misunderstanding.

> 2. In current proposed code, if destination host is not alive(this check is done at scheduler), an exception raised and returned to nova-manage. then, we have to cleanup rabbitmq?

No, we are never putting it in the queue.

Thanks for all the work on this -- looking forward to live-migration.

Revision history for this message
Kei Masumoto (masumotok) wrote :

Hi Ken-san, Thanks for your answer!

> My concern is that admins will start a live migration to a non-existant host
> (or disabled host), wait for a minute or two, then check euca-describe-instances
> and see that nothing happened because the recover_live_migration has
> already set it back to "running" state.

In our environment, the admin gets an error message such as "destination host is not alive" within a few seconds, because the scheduler checks it and raises an exception (see below for the other scheduler checks).

> Will nova/compute/manager.py:879 check to make sure that the service is alive ?
pre_live_migration (meaning nova/compute/manager.py:879) performs the checks that can only be done on the compute node, such as: has the security group ingress rule (iptables rule) been successfully taken over? Can the destination host recognize the volumes (is the iSCSI daemon alive? is the AoE kernel module inserted?), etc.
On the other hand, the scheduler takes care of the other checks, which can be done anywhere (see below).

[Examples of scheduler checks]
- instance is running
- src/dest host exists (and is alive)
- nova-compute runs on src/dest host
- nova-volume is alive when the instance mounts a volume
- hypervisor_type, hypervisor_version and cpu compatibility
- dest host has enough memory
- src/dest hosts mount the same shared storage
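A rough sketch of how those checks could be ordered on the scheduler side (the helper method names are illustrative, not necessarily the ones in the patch):

    def schedule_live_migration(self, context, instance_id, dest):
        """Order the live-migration checks before casting to compute."""
        instance_ref = db.instance_get(context, instance_id)

        # instance is running
        if instance_ref['state'] != power_state.RUNNING:
            raise exception.Invalid(_('Instance is not running.'))

        # src/dest hosts exist and are alive, nova-compute runs on both,
        # and nova-volume is alive if the instance mounts a volume
        self._check_hosts_alive(context, instance_ref, dest)

        # hypervisor type/version and cpu compatibility, enough free memory
        # on dest, and src/dest mounting the same shared storage
        self._check_migration_feasibility(context, instance_ref, dest)

        # only after all checks pass is the request cast to the compute node
        return dest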

Please let me know if it does not make sense to you. (I always worry that my English mistakes might confuse you :)) I think I have explained here why your concerns do not apply to the current implementation....

Thanks again!
Kei


Revision history for this message
Ken Pepple (ken-pepple) wrote :

On Mar 10, 2011, at 8:01 PM, Kei Masumoto wrote:
> Please let me know if it does not make sense to you.(I always am worrying about my english mistake confuse you :)) . I think I explained here your concerns does not matter in current implementation….

Okay, I think I understand... it can be a bit difficult following the scheduler code sometimes.

Last question: don't we need this patch (see below)? My install fails when I do this:

root@shuttle:~/src/live-migration/contrib/nova# bin/nova-manage vm live_migration i0023 badhost
2011-03-10 21:53:20,653 CRITICAL nova [-] global name 'ec2_id_to_id' is not defined
(nova): TRACE: Traceback (most recent call last):
(nova): TRACE: File "bin/nova-manage", line 1074, in <module>
(nova): TRACE: main()
(nova): TRACE: File "bin/nova-manage", line 1066, in main
(nova): TRACE: fn(*argv)
(nova): TRACE: File "bin/nova-manage", line 573, in live_migration
(nova): TRACE: instance_id = ec2_id_to_id(ec2_id)
(nova): TRACE: NameError: global name 'ec2_id_to_id' is not defined
(nova): TRACE:

Or did I not install this correctly?
Thanks again
/k

===== PATCH ======

=== modified file 'bin/nova-manage'
--- bin/nova-manage 2011-03-10 06:23:13 +0000
+++ bin/nova-manage 2011-03-11 05:56:38 +0000
@@ -570,7 +570,7 @@
         """

         ctxt = context.get_admin_context()
-        instance_id = ec2_id_to_id(ec2_id)
+        instance_id = ec2utils.ec2_id_to_id(ec2_id)

         if FLAGS.connection_type != 'libvirt':
             msg = _('Only KVM is supported for now. Sorry!')

Revision history for this message
Kei Masumoto (masumotok) wrote :

Hi ken!

Hold on -- a quite big earthquake has just hit Japan. The TV channels are not working, so I am not sure what is going on..
Probably almost all companies have stopped business, and employees are trying to escape.
Actually, I helped a pregnant woman and her baby and finally got back to my apartment.

After I tidy my room, I'm going to check this. My new TV didn't fall over and isn't broken, although some glasses broke; I don't know whether I am still lucky or not...

By the way, I am not sure whether it is OK to be writing e-mail or whether I should prepare to escape? :)

Kei

Revision history for this message
Ken Pepple (ken-pepple) wrote :

Masumoto-san -- I'm talking with my friends in Aoyama (8.8M!). You should definitely escape :)

Revision history for this message
Thierry Carrez (ttx) wrote :

I think this should be merged *now*. The feature part was approved already. Given the situation in Japan I don't expect Kei to have lots of time to add the additional pre-checks that Ken mentioned.

Someone can propose a branch that adds the checks afterwards.

review: Approve
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Attempt to merge into lp:nova failed due to conflicts:

text conflict in nova/tests/test_virt.py

Revision history for this message
Ken Pepple (ken-pepple) wrote :

Agreeing with ttx; I will file bugs/patches for my objections.

review: Approve
Revision history for this message
Jay Pipes (jaypipes) wrote :

I'm fixing the merge conflict locally for Kei and will push shortly.

Revision history for this message
Brian Schott (bfschott) wrote :

Jay,

Not that you need a reference, but I may have fixed those conflicts in:
lp:~usc-isi/nova/hpc-trunk
Don't pull the whole branch, as it has our cpu-arch extensions.

Brian Schott
<email address hidden>


Revision history for this message
Jay Pipes (jaypipes) wrote :

Thanks Brian! It was a simple little import order thingie, though :)
Not a big deal!


Revision history for this message
Brian Schott (bfschott) wrote :

Lorin,

Good catch. That's going to hit trunk soon. I'm going to submit a bug to nova trunk. Jay, are you able to confirm this?

Brian

---
Brian Schott
USC Information Sciences Institute
http://www.east.isi.edu/~bschott
ph: 703-812-3722 fx: 703-812-3712

On Mar 14, 2011, at 3:34 PM, Lorin Hochstein wrote:

> I was running hpc-trunk with Ubuntu packages and saw this error in nova-compute
>
> 2011-03-14 12:12:02,764 ERROR nova [-] in Service.create()
> (nova): TRACE: Traceback (most recent call last):
> (nova): TRACE: File "/usr/lib/pymodules/python2.6/nova/service.py", line 264, in serve
> (nova): TRACE: services = [Service.create()]
> (nova): TRACE: File "/usr/lib/pymodules/python2.6/nova/service.py", line 167, in create
> (nova): TRACE: report_interval, periodic_interval)
> (nova): TRACE: File "/usr/lib/pymodules/python2.6/nova/service.py", line 73, in __init__
> (nova): TRACE: self.manager = manager_class(host=self.host, *args, **kwargs)
> (nova): TRACE: File "/usr/lib/pymodules/python2.6/nova/compute/manager.py", line 118, in __init__
> (nova): TRACE: self.driver = utils.import_object(compute_driver)
> (nova): TRACE: File "/usr/lib/pymodules/python2.6/nova/utils.py", line 75, in import_object
> (nova): TRACE: return cls()
> (nova): TRACE: File "/usr/lib/pymodules/python2.6/nova/virt/connection.py", line 64, in get_connection
> (nova): TRACE: conn = libvirt_conn.get_connection(read_only)
> (nova): TRACE: File "/usr/lib/pymodules/python2.6/nova/virt/libvirt_conn.py", line 131, in get_connection
> (nova): TRACE: return LibvirtConnection(read_only)
> (nova): TRACE: File "/usr/lib/pymodules/python2.6/nova/virt/libvirt_conn.py", line 163, in __init__
> (nova): TRACE: self.cpuinfo_xml = open(FLAGS.cpuinfo_xml_template).read()
> (nova): TRACE: IOError: [Errno 2] No such file or directory: '/usr/lib/pymodules/python2.6/nova/virt/cpuinfo.xml.template'
> (nova): TRACE:
>
> The NTT guys seem to have added a new file (cpuinfo.xml.template), but that file isn't currently being packaged. I'm not sure why, but I'm trying to figure it out.
>
> Lorin
>
> --
> Lorin Hochstein, Computer Scientist
> USC Information Sciences Institute
> 703.812.3710
> http://www.east.isi.edu/~lorin
>

Revision history for this message
Jay Pipes (jaypipes) wrote :

Hmm, this should not have gotten through the distribution/packaging
tests... I'll see what I can discover.

-jay


Revision history for this message
Soren Hansen (soren) wrote :

I have a Jenkins job that would have alerted us about this sooner. It
triggers when files are added to bzr, but don't end up in the tarball.
I've made a note to get that added tomorrow.

--
Soren Hansen        | http://linux2go.dk/
Ubuntu Developer    | http://www.ubuntu.com/
OpenStack Developer | http://www.openstack.org/

Revision history for this message
Kei Masumoto (masumotok) wrote :

Hi,

I would like to thank everyone involved in getting the live-migration branch merged. In particular, the reviewers, both from the core developers and from the community, helped me improve the quality of our branch. I don't think this is the end of our work; I have already recognized that I need to catch up with some recent changes in nova, and my team is planning to submit further patches.

By the way, regarding the earthquake: it hit the northern part of Japan, not Tokyo (where I live and work). Although we can keep our business running, the effects reach Tokyo as well, for example heavy traffic jams, shortages of gasoline and food, and power cuts, so I spend the nights with candles instead of lights.

Please forgive us if we need some more time to get back to normal. I am looking forward to thanking all of you face to face at the next design summit.

Regards,
Kei

Revision history for this message
Jay Pipes (jaypipes) wrote :

I think I can safely say that all of us in the contributor community
wish you and all our colleagues and friends in Japan our best. It's a
horrific event and many of us feel powerless to do anything about it.
We hope you can find some normality in the coming weeks as, hopefully,
Japan recovers from the earthquake.

All our best,

jay

On Wed, Mar 16, 2011 at 2:42 AM, <email address hidden> wrote:
> Hi,
>
> I would like to say thank you for everyone regarding to live-migration branch was merged. Especially, reviewers both from core dev and from community, help me to improve quality of our branch. I don’t think this is the end of our work, at least I've already recognized that I have to follow some recent nova's changes. My team is planning to submit some patches.
>
> By the way, regarding to the earthquake, it happened the north part of Japan, not Tokyo(where I live and I work at). Although we can engaging our business, the effect also comes to Tokyo. For example, crazy traffic jam, lack of gasoline, lack of food, stopping electricity and I spend with no lights but with candles in the night... etc.
>
> Please forgive us we need some more time to get back, and I am looking forward to sending all of you many thanks f2f at next design summit.
>
> Regards,
> Kei
>

Revision history for this message
Brian Schott (bfschott) wrote :

+100

Brian Schott
<email address hidden>

On Mar 16, 2011, at 11:12 AM, Jay Pipes wrote:

> I think I can safely say that all of us in the contributor community
> wish you and all our colleagues and friends in Japan our best. It's a
> horrific event and many of us feel powerless to do anything about it.
> We hope you can find some normality in the coming weeks as, hopefully,
> Japan recovers from the earthquake.
>
> All our best,
>
> jay
>
> --
> https://code.launchpad.net/~nttdata/nova/live-migration/+merge/49699
> You are reviewing the proposed merge of lp:~nttdata/nova/live-migration into lp:nova.

Preview Diff

1=== modified file 'bin/nova-manage'
2--- bin/nova-manage 2011-03-10 04:42:11 +0000
3+++ bin/nova-manage 2011-03-10 06:27:59 +0000
4@@ -558,6 +558,40 @@
5 db.network_delete_safe(context.get_admin_context(), network.id)
6
7
8+class VmCommands(object):
9+ """Class for mangaging VM instances."""
10+
11+ def live_migration(self, ec2_id, dest):
12+ """Migrates a running instance to a new machine.
13+
14+ :param ec2_id: instance id which comes from euca-describe-instance.
15+ :param dest: destination host name.
16+
17+ """
18+
19+ ctxt = context.get_admin_context()
20+ instance_id = ec2_id_to_id(ec2_id)
21+
22+ if FLAGS.connection_type != 'libvirt':
23+ msg = _('Only KVM is supported for now. Sorry!')
24+ raise exception.Error(msg)
25+
26+ if (FLAGS.volume_driver != 'nova.volume.driver.AOEDriver' and \
27+ FLAGS.volume_driver != 'nova.volume.driver.ISCSIDriver'):
28+ msg = _("Support only AOEDriver and ISCSIDriver. Sorry!")
29+ raise exception.Error(msg)
30+
31+ rpc.call(ctxt,
32+ FLAGS.scheduler_topic,
33+ {"method": "live_migration",
34+ "args": {"instance_id": instance_id,
35+ "dest": dest,
36+ "topic": FLAGS.compute_topic}})
37+
38+ print _('Migration of %s initiated. '
39+ 'Check its progress using euca-describe-instances.') % ec2_id
40+
41+
42 class ServiceCommands(object):
43 """Enable and disable running services"""
44
45@@ -602,6 +636,59 @@
46 return
47 db.service_update(ctxt, svc['id'], {'disabled': True})
48
49+ def describe_resource(self, host):
50+ """Describes cpu/memory/hdd info for host.
51+
52+ :param host: hostname.
53+
54+ """
55+
56+ result = rpc.call(context.get_admin_context(),
57+ FLAGS.scheduler_topic,
58+ {"method": "show_host_resources",
59+ "args": {"host": host}})
60+
61+ if type(result) != dict:
62+ print _('An unexpected error has occurred.')
63+ print _('[Result]'), result
64+ else:
65+ cpu = result['resource']['vcpus']
66+ mem = result['resource']['memory_mb']
67+ hdd = result['resource']['local_gb']
68+ cpu_u = result['resource']['vcpus_used']
69+ mem_u = result['resource']['memory_mb_used']
70+ hdd_u = result['resource']['local_gb_used']
71+
72+ print 'HOST\t\t\tPROJECT\t\tcpu\tmem(mb)\tdisk(gb)'
73+ print '%s(total)\t\t\t%s\t%s\t%s' % (host, cpu, mem, hdd)
74+ print '%s(used)\t\t\t%s\t%s\t%s' % (host, cpu_u, mem_u, hdd_u)
75+ for p_id, val in result['usage'].items():
76+ print '%s\t\t%s\t\t%s\t%s\t%s' % (host,
77+ p_id,
78+ val['vcpus'],
79+ val['memory_mb'],
80+ val['local_gb'])
81+
82+ def update_resource(self, host):
83+ """Updates available vcpu/memory/disk info for host.
84+
85+ :param host: hostname.
86+
87+ """
88+
89+ ctxt = context.get_admin_context()
90+ service_refs = db.service_get_all_by_host(ctxt, host)
91+ if len(service_refs) <= 0:
92+ raise exception.Invalid(_('%s does not exist.') % host)
93+
94+ service_refs = [s for s in service_refs if s['topic'] == 'compute']
95+ if len(service_refs) <= 0:
96+ raise exception.Invalid(_('%s is not a compute node.') % host)
97+
98+ rpc.call(ctxt,
99+ db.queue_get_for(ctxt, FLAGS.compute_topic, host),
100+ {"method": "update_available_resource"})
101+
102
103 class LogCommands(object):
104 def request(self, request_id, logfile='/var/log/nova.log'):
105@@ -905,6 +992,7 @@
106 ('fixed', FixedIpCommands),
107 ('floating', FloatingIpCommands),
108 ('network', NetworkCommands),
109+ ('vm', VmCommands),
110 ('service', ServiceCommands),
111 ('log', LogCommands),
112 ('db', DbCommands),
113
114=== modified file 'contrib/nova.sh'
115--- contrib/nova.sh 2011-03-08 00:01:43 +0000
116+++ contrib/nova.sh 2011-03-10 06:27:59 +0000
117@@ -76,6 +76,7 @@
118 sudo apt-get install -y python-migrate python-eventlet python-gflags python-ipy python-tempita
119 sudo apt-get install -y python-libvirt python-libxml2 python-routes python-cheetah
120 sudo apt-get install -y python-netaddr python-paste python-pastedeploy python-glance
121+ sudo apt-get install -y python-multiprocessing
122
123 if [ "$USE_IPV6" == 1 ]; then
124 sudo apt-get install -y radvd
125
126=== modified file 'nova/compute/manager.py'
127--- nova/compute/manager.py 2011-03-07 22:40:19 +0000
128+++ nova/compute/manager.py 2011-03-10 06:27:59 +0000
129@@ -36,9 +36,12 @@
130
131 import base64
132 import datetime
133+import os
134 import random
135 import string
136 import socket
137+import tempfile
138+import time
139 import functools
140
141 from nova import exception
142@@ -61,6 +64,9 @@
143 flags.DEFINE_string('console_host', socket.gethostname(),
144 'Console proxy host to use to connect to instances on'
145 'this host.')
146+flags.DEFINE_integer('live_migration_retry_count', 30,
147+ ("Retry count needed in live_migration."
148+ " sleep 1 sec for each count"))
149
150 LOG = logging.getLogger('nova.compute.manager')
151
152@@ -181,7 +187,7 @@
153 context=context)
154 self.db.instance_update(context,
155 instance_id,
156- {'host': self.host})
157+ {'host': self.host, 'launched_on': self.host})
158
159 self.db.instance_set_state(context,
160 instance_id,
161@@ -723,3 +729,248 @@
162 self.volume_manager.remove_compute_volume(context, volume_id)
163 self.db.volume_detached(context, volume_id)
164 return True
165+
166+ @exception.wrap_exception
167+ def compare_cpu(self, context, cpu_info):
168+ """Checks the host cpu is compatible to a cpu given by xml.
169+
170+ :param context: security context
171+ :param cpu_info: json string obtained from virConnect.getCapabilities
172+ :returns: See driver.compare_cpu
173+
174+ """
175+ return self.driver.compare_cpu(cpu_info)
176+
177+ @exception.wrap_exception
178+ def create_shared_storage_test_file(self, context):
179+ """Makes tmpfile under FLAGS.instance_path.
180+
181+ This method enables compute nodes to recognize that they mount the
182+ same shared storage. (create|check|cleanup)_shared_storage_test_file()
183+ are meant to be used together.
184+
185+ :param context: security context
186+ :returns: tmpfile name(basename)
187+
188+ """
189+
190+ dirpath = FLAGS.instances_path
191+ fd, tmp_file = tempfile.mkstemp(dir=dirpath)
192+ LOG.debug(_("Creating tmpfile %s to notify to other "
193+ "compute nodes that they should mount "
194+ "the same storage.") % tmp_file)
195+ os.close(fd)
196+ return os.path.basename(tmp_file)
197+
198+ @exception.wrap_exception
199+ def check_shared_storage_test_file(self, context, filename):
200+ """Confirms existence of the tmpfile under FLAGS.instances_path.
201+
202+ :param context: security context
203+ :param filename: confirm existence of FLAGS.instances_path/thisfile
204+
205+ """
206+
207+ tmp_file = os.path.join(FLAGS.instances_path, filename)
208+ if not os.path.exists(tmp_file):
209+ raise exception.NotFound(_('%s not found') % tmp_file)
210+
211+ @exception.wrap_exception
212+ def cleanup_shared_storage_test_file(self, context, filename):
213+ """Removes existence of the tmpfile under FLAGS.instances_path.
214+
215+ :param context: security context
216+ :param filename: remove existence of FLAGS.instances_path/thisfile
217+
218+ """
219+
220+ tmp_file = os.path.join(FLAGS.instances_path, filename)
221+ os.remove(tmp_file)
222+
223+ @exception.wrap_exception
224+ def update_available_resource(self, context):
225+ """See comments update_resource_info.
226+
227+ :param context: security context
228+ :returns: See driver.update_available_resource()
229+
230+ """
231+
232+ return self.driver.update_available_resource(context, self.host)
233+
234+ def pre_live_migration(self, context, instance_id):
235+ """Preparations for live migration at dest host.
236+
237+ :param context: security context
238+ :param instance_id: nova.db.sqlalchemy.models.Instance.Id
239+
240+ """
241+
242+ # Getting instance info
243+ instance_ref = self.db.instance_get(context, instance_id)
244+ ec2_id = instance_ref['hostname']
245+
246+ # Getting fixed ips
247+ fixed_ip = self.db.instance_get_fixed_address(context, instance_id)
248+ if not fixed_ip:
249+ msg = _("%(instance_id)s(%(ec2_id)s) does not have fixed_ip.")
250+ raise exception.NotFound(msg % locals())
251+
252+ # If any volume is mounted, prepare here.
253+ if not instance_ref['volumes']:
254+ LOG.info(_("%s has no volume."), ec2_id)
255+ else:
256+ for v in instance_ref['volumes']:
257+ self.volume_manager.setup_compute_volume(context, v['id'])
258+
259+ # Bridge settings.
260+ # Call this method prior to ensure_filtering_rules_for_instance,
261+ # since ensure_filtering_rules_for_instance fails if the bridge
262+ # is not set up yet.
263+ #
264+ # Retrying is necessary because when requests arrive continuously,
265+ # concurrent requests hit iptables and it complains.
266+ max_retry = FLAGS.live_migration_retry_count
267+ for cnt in range(max_retry):
268+ try:
269+ self.network_manager.setup_compute_network(context,
270+ instance_id)
271+ break
272+ except exception.ProcessExecutionError:
273+ if cnt == max_retry - 1:
274+ raise
275+ else:
276+ LOG.warn(_("setup_compute_network() failed %(cnt)d."
277+ "Retry up to %(max_retry)d for %(ec2_id)s.")
278+ % locals())
279+ time.sleep(1)
280+
281+ # Creating filters to hypervisors and firewalls.
282+ # An example is that nova-instance-instance-xxx,
283+ # which is written to libvirt.xml(Check "virsh nwfilter-list")
284+ # This nwfilter is necessary on the destination host.
285+ # In addition, this method is creating filtering rule
286+ # onto destination host.
287+ self.driver.ensure_filtering_rules_for_instance(instance_ref)
288+
289+ def live_migration(self, context, instance_id, dest):
290+ """Executing live migration.
291+
292+ :param context: security context
293+ :param instance_id: nova.db.sqlalchemy.models.Instance.Id
294+ :param dest: destination host
295+
296+ """
297+
298+ # Get instance for error handling.
299+ instance_ref = self.db.instance_get(context, instance_id)
300+ i_name = instance_ref.name
301+
302+ try:
303+ # Checking volume node is working correctly when any volumes
304+ # are attached to instances.
305+ if instance_ref['volumes']:
306+ rpc.call(context,
307+ FLAGS.volume_topic,
308+ {"method": "check_for_export",
309+ "args": {'instance_id': instance_id}})
310+
311+ # Asking dest host to prepare for live migration.
312+ rpc.call(context,
313+ self.db.queue_get_for(context, FLAGS.compute_topic, dest),
314+ {"method": "pre_live_migration",
315+ "args": {'instance_id': instance_id}})
316+
317+ except Exception:
318+ msg = _("Pre live migration for %(i_name)s failed at %(dest)s")
319+ LOG.error(msg % locals())
320+ self.recover_live_migration(context, instance_ref)
321+ raise
322+
323+ # Executing live migration
324+ # live_migration might raise exceptions, but
325+ # nothing needs to be recovered in this version.
326+ self.driver.live_migration(context, instance_ref, dest,
327+ self.post_live_migration,
328+ self.recover_live_migration)
329+
330+ def post_live_migration(self, ctxt, instance_ref, dest):
331+ """Post operations for live migration.
332+
333+ This method is called from live_migration
334+ and mainly updates database records.
335+
336+ :param ctxt: security context
337+ :param instance_ref: nova.db.sqlalchemy.models.Instance object
338+ :param dest: destination host
339+
340+ """
341+
342+ LOG.info(_('post_live_migration() started.'))
343+ instance_id = instance_ref['id']
344+
345+ # Detaching volumes.
346+ try:
347+ for vol in self.db.volume_get_all_by_instance(ctxt, instance_id):
348+ self.volume_manager.remove_compute_volume(ctxt, vol['id'])
349+ except exception.NotFound:
350+ pass
351+
352+ # Releasing vlan.
353+ # (not necessary in current implementation?)
354+
355+ # Releasing security group ingress rule.
356+ self.driver.unfilter_instance(instance_ref)
357+
358+ # Database updating.
359+ i_name = instance_ref.name
360+ try:
361+ # Do not return if floating_ip is not found; otherwise,
362+ # the instance will never be accessible.
363+ floating_ip = self.db.instance_get_floating_address(ctxt,
364+ instance_id)
365+ if not floating_ip:
366+ LOG.info(_('No floating_ip is found for %s.'), i_name)
367+ else:
368+ floating_ip_ref = self.db.floating_ip_get_by_address(ctxt,
369+ floating_ip)
370+ self.db.floating_ip_update(ctxt,
371+ floating_ip_ref['address'],
372+ {'host': dest})
373+ except exception.NotFound:
374+ LOG.info(_('No floating_ip is found for %s.'), i_name)
375+ except:
376+ LOG.error(_("Live migration: Unexpected error:"
377+ "%s cannot inherit floating ip..") % i_name)
378+
379+ # Restore instance/volume state
380+ self.recover_live_migration(ctxt, instance_ref, dest)
381+
382+ LOG.info(_('Migrating %(i_name)s to %(dest)s finished successfully.')
383+ % locals())
384+ LOG.info(_("You may see the error \"libvirt: QEMU error: "
385+ "Domain not found: no domain with matching name.\" "
386+ "This error can be safely ignored."))
387+
388+ def recover_live_migration(self, ctxt, instance_ref, host=None):
389+ """Recovers Instance/volume state from migrating -> running.
390+
391+ :param ctxt: security context
392+ :param instance_ref: nova.db.sqlalchemy.models.Instance object
393+ :param host:
394+ DB column value is updated by this hostname.
395+ if none, the host instance currently running is selected.
396+
397+ """
398+
399+ if not host:
400+ host = instance_ref['host']
401+
402+ self.db.instance_update(ctxt,
403+ instance_ref['id'],
404+ {'state_description': 'running',
405+ 'state': power_state.RUNNING,
406+ 'host': host})
407+
408+ for volume in instance_ref['volumes']:
409+ self.db.volume_update(ctxt, volume['id'], {'status': 'in-use'})
410
411=== modified file 'nova/db/api.py'
412--- nova/db/api.py 2011-03-09 21:27:38 +0000
413+++ nova/db/api.py 2011-03-10 06:27:59 +0000
414@@ -104,6 +104,11 @@
415 return IMPL.service_get_all_by_host(context, host)
416
417
418+def service_get_all_compute_by_host(context, host):
419+ """Get all compute services for a given host."""
420+ return IMPL.service_get_all_compute_by_host(context, host)
421+
422+
423 def service_get_all_compute_sorted(context):
424 """Get all compute services sorted by instance count.
425
426@@ -153,6 +158,29 @@
427 ###################
428
429
430+def compute_node_get(context, compute_id, session=None):
431+ """Get an computeNode or raise if it does not exist."""
432+ return IMPL.compute_node_get(context, compute_id)
433+
434+
435+def compute_node_create(context, values):
436+ """Create a computeNode from the values dictionary."""
437+ return IMPL.compute_node_create(context, values)
438+
439+
440+def compute_node_update(context, compute_id, values):
441+ """Set the given properties on an computeNode and update it.
442+
443+ Raises NotFound if computeNode does not exist.
444+
445+ """
446+
447+ return IMPL.compute_node_update(context, compute_id, values)
448+
449+
450+###################
451+
452+
453 def certificate_create(context, values):
454 """Create a certificate from the values dictionary."""
455 return IMPL.certificate_create(context, values)
456@@ -257,6 +285,11 @@
457 return IMPL.floating_ip_get_by_address(context, address)
458
459
460+def floating_ip_update(context, address, values):
461+ """Update a floating ip by address or raise if it doesn't exist."""
462+ return IMPL.floating_ip_update(context, address, values)
463+
464+
465 ####################
466
467 def migration_update(context, id, values):
468@@ -441,6 +474,27 @@
469 security_group_id)
470
471
472+def instance_get_vcpu_sum_by_host_and_project(context, hostname, proj_id):
473+ """Get instances.vcpus by host and project."""
474+ return IMPL.instance_get_vcpu_sum_by_host_and_project(context,
475+ hostname,
476+ proj_id)
477+
478+
479+def instance_get_memory_sum_by_host_and_project(context, hostname, proj_id):
480+ """Get amount of memory by host and project."""
481+ return IMPL.instance_get_memory_sum_by_host_and_project(context,
482+ hostname,
483+ proj_id)
484+
485+
486+def instance_get_disk_sum_by_host_and_project(context, hostname, proj_id):
487+ """Get total amount of disk by host and project."""
488+ return IMPL.instance_get_disk_sum_by_host_and_project(context,
489+ hostname,
490+ proj_id)
491+
492+
493 def instance_action_create(context, values):
494 """Create an instance action from the values dictionary."""
495 return IMPL.instance_action_create(context, values)
496@@ -765,6 +819,11 @@
497 return IMPL.volume_get_all_by_host(context, host)
498
499
500+def volume_get_all_by_instance(context, instance_id):
501+ """Get all volumes belonging to a instance."""
502+ return IMPL.volume_get_all_by_instance(context, instance_id)
503+
504+
505 def volume_get_all_by_project(context, project_id):
506 """Get all volumes belonging to a project."""
507 return IMPL.volume_get_all_by_project(context, project_id)
508
509=== modified file 'nova/db/sqlalchemy/api.py'
510--- nova/db/sqlalchemy/api.py 2011-03-09 21:27:38 +0000
511+++ nova/db/sqlalchemy/api.py 2011-03-10 06:27:59 +0000
512@@ -118,6 +118,11 @@
513 service_ref = service_get(context, service_id, session=session)
514 service_ref.delete(session=session)
515
516+ if service_ref.topic == 'compute' and \
517+ len(service_ref.compute_node) != 0:
518+ for c in service_ref.compute_node:
519+ c.delete(session=session)
520+
521
522 @require_admin_context
523 def service_get(context, service_id, session=None):
524@@ -125,6 +130,7 @@
525 session = get_session()
526
527 result = session.query(models.Service).\
528+ options(joinedload('compute_node')).\
529 filter_by(id=service_id).\
530 filter_by(deleted=can_read_deleted(context)).\
531 first()
532@@ -175,6 +181,24 @@
533
534
535 @require_admin_context
536+def service_get_all_compute_by_host(context, host):
537+ topic = 'compute'
538+ session = get_session()
539+ result = session.query(models.Service).\
540+ options(joinedload('compute_node')).\
541+ filter_by(deleted=False).\
542+ filter_by(host=host).\
543+ filter_by(topic=topic).\
544+ all()
545+
546+ if not result:
547+ raise exception.NotFound(_("%s does not exist or is not "
548+ "a compute node.") % host)
549+
550+ return result
551+
552+
553+@require_admin_context
554 def _service_get_all_topic_subquery(context, session, topic, subq, label):
555 sort_value = getattr(subq.c, label)
556 return session.query(models.Service, func.coalesce(sort_value, 0)).\
557@@ -285,6 +309,42 @@
558
559
560 @require_admin_context
561+def compute_node_get(context, compute_id, session=None):
562+ if not session:
563+ session = get_session()
564+
565+ result = session.query(models.ComputeNode).\
566+ filter_by(id=compute_id).\
567+ filter_by(deleted=can_read_deleted(context)).\
568+ first()
569+
570+ if not result:
571+ raise exception.NotFound(_('No computeNode for id %s') % compute_id)
572+
573+ return result
574+
575+
576+@require_admin_context
577+def compute_node_create(context, values):
578+ compute_node_ref = models.ComputeNode()
579+ compute_node_ref.update(values)
580+ compute_node_ref.save()
581+ return compute_node_ref
582+
583+
584+@require_admin_context
585+def compute_node_update(context, compute_id, values):
586+ session = get_session()
587+ with session.begin():
588+ compute_ref = compute_node_get(context, compute_id, session=session)
589+ compute_ref.update(values)
590+ compute_ref.save(session=session)
591+
592+
593+###################
594+
595+
596+@require_admin_context
597 def certificate_get(context, certificate_id, session=None):
598 if not session:
599 session = get_session()
600@@ -505,6 +565,16 @@
601 return result
602
603
604+@require_context
605+def floating_ip_update(context, address, values):
606+ session = get_session()
607+ with session.begin():
608+ floating_ip_ref = floating_ip_get_by_address(context, address, session)
609+ for (key, value) in values.iteritems():
610+ floating_ip_ref[key] = value
611+ floating_ip_ref.save(session=session)
612+
613+
614 ###################
615
616
617@@ -905,6 +975,45 @@
618
619
620 @require_context
621+def instance_get_vcpu_sum_by_host_and_project(context, hostname, proj_id):
622+ session = get_session()
623+ result = session.query(models.Instance).\
624+ filter_by(host=hostname).\
625+ filter_by(project_id=proj_id).\
626+ filter_by(deleted=False).\
627+ value(func.sum(models.Instance.vcpus))
628+ if not result:
629+ return 0
630+ return result
631+
632+
633+@require_context
634+def instance_get_memory_sum_by_host_and_project(context, hostname, proj_id):
635+ session = get_session()
636+ result = session.query(models.Instance).\
637+ filter_by(host=hostname).\
638+ filter_by(project_id=proj_id).\
639+ filter_by(deleted=False).\
640+ value(func.sum(models.Instance.memory_mb))
641+ if not result:
642+ return 0
643+ return result
644+
645+
646+@require_context
647+def instance_get_disk_sum_by_host_and_project(context, hostname, proj_id):
648+ session = get_session()
649+ result = session.query(models.Instance).\
650+ filter_by(host=hostname).\
651+ filter_by(project_id=proj_id).\
652+ filter_by(deleted=False).\
653+ value(func.sum(models.Instance.local_gb))
654+ if not result:
655+ return 0
656+ return result
657+
658+
659+@require_context
660 def instance_action_create(context, values):
661 """Create an instance action from the values dictionary."""
662 action_ref = models.InstanceActions()
663@@ -1522,6 +1631,18 @@
664 all()
665
666
667+@require_admin_context
668+def volume_get_all_by_instance(context, instance_id):
669+ session = get_session()
670+ result = session.query(models.Volume).\
671+ filter_by(instance_id=instance_id).\
672+ filter_by(deleted=False).\
673+ all()
674+ if not result:
675+ raise exception.NotFound(_('No volume for instance %s') % instance_id)
676+ return result
677+
678+
679 @require_context
680 def volume_get_all_by_project(context, project_id):
681 authorize_project_context(context, project_id)
682
683=== added file 'nova/db/sqlalchemy/migrate_repo/versions/010_add_live_migration.py'
684--- nova/db/sqlalchemy/migrate_repo/versions/010_add_live_migration.py 1970-01-01 00:00:00 +0000
685+++ nova/db/sqlalchemy/migrate_repo/versions/010_add_live_migration.py 2011-03-10 06:27:59 +0000
686@@ -0,0 +1,83 @@
687+# vim: tabstop=4 shiftwidth=4 softtabstop=4
688+
689+# Copyright 2010 United States Government as represented by the
690+# Administrator of the National Aeronautics and Space Administration.
691+# All Rights Reserved.
692+#
693+# Licensed under the Apache License, Version 2.0 (the "License"); you may
694+# not use this file except in compliance with the License. You may obtain
695+# a copy of the License at
696+#
697+# http://www.apache.org/licenses/LICENSE-2.0
698+#
699+# Unless required by applicable law or agreed to in writing, software
700+# distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
701+# WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
702+# License for the specific language governing permissions and limitations
703+# under the License.
704+
705+from migrate import *
706+from nova import log as logging
707+from sqlalchemy import *
708+
709+
710+meta = MetaData()
711+
712+instances = Table('instances', meta,
713+ Column('id', Integer(), primary_key=True, nullable=False),
714+ )
715+
716+#
717+# New Tables
718+#
719+
720+compute_nodes = Table('compute_nodes', meta,
721+ Column('created_at', DateTime(timezone=False)),
722+ Column('updated_at', DateTime(timezone=False)),
723+ Column('deleted_at', DateTime(timezone=False)),
724+ Column('deleted', Boolean(create_constraint=True, name=None)),
725+ Column('id', Integer(), primary_key=True, nullable=False),
726+ Column('service_id', Integer(), nullable=False),
727+
728+ Column('vcpus', Integer(), nullable=False),
729+ Column('memory_mb', Integer(), nullable=False),
730+ Column('local_gb', Integer(), nullable=False),
731+ Column('vcpus_used', Integer(), nullable=False),
732+ Column('memory_mb_used', Integer(), nullable=False),
733+ Column('local_gb_used', Integer(), nullable=False),
734+ Column('hypervisor_type',
735+ Text(convert_unicode=False, assert_unicode=None,
736+ unicode_error=None, _warn_on_bytestring=False),
737+ nullable=False),
738+ Column('hypervisor_version', Integer(), nullable=False),
739+ Column('cpu_info',
740+ Text(convert_unicode=False, assert_unicode=None,
741+ unicode_error=None, _warn_on_bytestring=False),
742+ nullable=False),
743+ )
744+
745+
746+#
747+# Tables to alter
748+#
749+instances_launched_on = Column(
750+ 'launched_on',
751+ Text(convert_unicode=False, assert_unicode=None,
752+ unicode_error=None, _warn_on_bytestring=False),
753+ nullable=True)
754+
755+
756+def upgrade(migrate_engine):
757+ # Upgrade operations go here. Don't create your own engine;
758+ # bind migrate_engine to your metadata
759+ meta.bind = migrate_engine
760+
761+ try:
762+ compute_nodes.create()
763+ except Exception:
764+ logging.info(repr(compute_nodes))
765+ logging.exception('Exception while creating table')
766+ meta.drop_all(tables=[compute_nodes])
767+ raise
768+
769+ instances.create_column(instances_launched_on)
770
771=== modified file 'nova/db/sqlalchemy/models.py'
772--- nova/db/sqlalchemy/models.py 2011-03-03 19:13:15 +0000
773+++ nova/db/sqlalchemy/models.py 2011-03-10 06:27:59 +0000
774@@ -113,6 +113,41 @@
775 availability_zone = Column(String(255), default='nova')
776
777
778+class ComputeNode(BASE, NovaBase):
779+ """Represents a running compute service on a host."""
780+
781+ __tablename__ = 'compute_nodes'
782+ id = Column(Integer, primary_key=True)
783+ service_id = Column(Integer, ForeignKey('services.id'), nullable=True)
784+ service = relationship(Service,
785+ backref=backref('compute_node'),
786+ foreign_keys=service_id,
787+ primaryjoin='and_('
788+ 'ComputeNode.service_id == Service.id,'
789+ 'ComputeNode.deleted == False)')
790+
791+ vcpus = Column(Integer, nullable=True)
792+ memory_mb = Column(Integer, nullable=True)
793+ local_gb = Column(Integer, nullable=True)
794+ vcpus_used = Column(Integer, nullable=True)
795+ memory_mb_used = Column(Integer, nullable=True)
796+ local_gb_used = Column(Integer, nullable=True)
797+ hypervisor_type = Column(Text, nullable=True)
798+ hypervisor_version = Column(Integer, nullable=True)
799+
800+ # Note(masumotok): Expected Strings example:
801+ #
802+ # '{"arch":"x86_64",
803+ # "model":"Nehalem",
804+ # "topology":{"sockets":1, "threads":2, "cores":3},
805+ # "features":["tdtscp", "xtpr"]}'
806+ #
807+ # The points are that this is "json translatable" and that it must
808+ # have all the dictionary keys above, since it is copied from the
809+ # <cpu> tag of getCapabilities() (See libvirt.virtConnection).
810+ cpu_info = Column(Text, nullable=True)
811+
812+
813 class Certificate(BASE, NovaBase):
814 """Represents a an x509 certificate"""
815 __tablename__ = 'certificates'
816@@ -191,6 +226,9 @@
817 display_name = Column(String(255))
818 display_description = Column(String(255))
819
820+ # To remember on which host an instance booted.
821+ # An instance may have moved to another host by live migration.
822+ launched_on = Column(Text)
823 locked = Column(Boolean)
824
825 # TODO(vish): see Ewan's email about state improvements, probably
826
827=== modified file 'nova/scheduler/driver.py'
828--- nova/scheduler/driver.py 2011-01-18 19:01:16 +0000
829+++ nova/scheduler/driver.py 2011-03-10 06:27:59 +0000
830@@ -26,10 +26,14 @@
831 from nova import db
832 from nova import exception
833 from nova import flags
834+from nova import log as logging
835+from nova import rpc
836+from nova.compute import power_state
837
838 FLAGS = flags.FLAGS
839 flags.DEFINE_integer('service_down_time', 60,
840 'maximum time since last checkin for up service')
841+flags.DECLARE('instances_path', 'nova.compute.manager')
842
843
844 class NoValidHost(exception.Error):
845@@ -64,3 +68,236 @@
846 def schedule(self, context, topic, *_args, **_kwargs):
847 """Must override at least this method for scheduler to work."""
848 raise NotImplementedError(_("Must implement a fallback schedule"))
849+
850+ def schedule_live_migration(self, context, instance_id, dest):
851+ """Live migration scheduling method.
852+
853+ :param context:
854+ :param instance_id:
855+ :param dest: destination host
856+ :return:
857+ The host where the instance is currently running.
858+ The scheduler then sends the request to that host.
859+
860+ """
861+
862+ # Whether instance exists and is running.
863+ instance_ref = db.instance_get(context, instance_id)
864+
865+ # Checking instance.
866+ self._live_migration_src_check(context, instance_ref)
867+
868+ # Checking destination host.
869+ self._live_migration_dest_check(context, instance_ref, dest)
870+
871+ # Common checking.
872+ self._live_migration_common_check(context, instance_ref, dest)
873+
874+ # Changing instance_state.
875+ db.instance_set_state(context,
876+ instance_id,
877+ power_state.PAUSED,
878+ 'migrating')
879+
880+ # Changing volume state
881+ for volume_ref in instance_ref['volumes']:
882+ db.volume_update(context,
883+ volume_ref['id'],
884+ {'status': 'migrating'})
885+
886+ # The return value is needed to send the request to src.
887+ # See _schedule() for details.
888+ src = instance_ref['host']
889+ return src
890+
891+ def _live_migration_src_check(self, context, instance_ref):
892+ """Live migration check routine (for src host).
893+
894+ :param context: security context
895+ :param instance_ref: nova.db.sqlalchemy.models.Instance object
896+
897+ """
898+
899+ # Checking instance is running.
900+ if (power_state.RUNNING != instance_ref['state'] or \
901+ 'running' != instance_ref['state_description']):
902+ ec2_id = instance_ref['hostname']
903+ raise exception.Invalid(_('Instance(%s) is not running') % ec2_id)
904+
905+ # Checking volume node is running when any volumes are mounted
906+ # to the instance.
907+ if len(instance_ref['volumes']) != 0:
908+ services = db.service_get_all_by_topic(context, 'volume')
909+ if len(services) < 1 or not self.service_is_up(services[0]):
910+ raise exception.Invalid(_("volume node is not alive"
911+ "(time synchronize problem?)"))
912+
913+ # Checking src host exists and is a compute node.
914+ src = instance_ref['host']
915+ services = db.service_get_all_compute_by_host(context, src)
916+
917+ # Checking src host is alive.
918+ if not self.service_is_up(services[0]):
919+ raise exception.Invalid(_("%s is not alive(time "
920+ "synchronize problem?)") % src)
921+
922+ def _live_migration_dest_check(self, context, instance_ref, dest):
923+ """Live migration check routine (for destination host).
924+
925+ :param context: security context
926+ :param instance_ref: nova.db.sqlalchemy.models.Instance object
927+ :param dest: destination host
928+
929+ """
930+
931+ # Checking dest exists and is a compute node.
932+ dservice_refs = db.service_get_all_compute_by_host(context, dest)
933+ dservice_ref = dservice_refs[0]
934+
935+ # Checking dest host is alive.
936+ if not self.service_is_up(dservice_ref):
937+ raise exception.Invalid(_("%s is not alive (time "
938+ "synchronization problem?)") % dest)
939+
940+ # Checking whether the host where the instance is running
941+ # and dest are not the same.
942+ src = instance_ref['host']
943+ if dest == src:
944+ ec2_id = instance_ref['hostname']
945+ raise exception.Invalid(_("%(dest)s is where %(ec2_id)s is "
946+ "running now. choose other host.")
947+ % locals())
948+
949+ # Checking dst host still has enough capacity.
950+ self.assert_compute_node_has_enough_resources(context,
951+ instance_ref,
952+ dest)
953+
954+ def _live_migration_common_check(self, context, instance_ref, dest):
955+ """Live migration common check routine.
956+
957+ The checks below follow
958+ http://wiki.libvirt.org/page/TodoPreMigrationChecks
959+
960+ :param context: security context
961+ :param instance_ref: nova.db.sqlalchemy.models.Instance object
962+ :param dest: destination host
963+
964+ """
965+
966+ # Checking shared storage connectivity
967+ self.mounted_on_same_shared_storage(context, instance_ref, dest)
968+
969+ # Checking dest exists.
970+ dservice_refs = db.service_get_all_compute_by_host(context, dest)
971+ dservice_ref = dservice_refs[0]['compute_node'][0]
972+
973+ # Checking original host (where instance was launched) exists.
974+ try:
975+ oservice_refs = db.service_get_all_compute_by_host(context,
976+ instance_ref['launched_on'])
977+ except exception.NotFound:
978+ raise exception.Invalid(_("host %s where instance was launched "
979+ "does not exist.")
980+ % instance_ref['launched_on'])
981+ oservice_ref = oservice_refs[0]['compute_node'][0]
982+
983+ # Checking hypervisor is same.
984+ orig_hypervisor = oservice_ref['hypervisor_type']
985+ dest_hypervisor = dservice_ref['hypervisor_type']
986+ if orig_hypervisor != dest_hypervisor:
987+ raise exception.Invalid(_("Different hypervisor type"
988+ "(%(orig_hypervisor)s->"
989+ "%(dest_hypervisor)s)')" % locals()))
990+
991+ # Checking hypervisor version.
992+ orig_hypervisor = oservice_ref['hypervisor_version']
993+ dest_hypervisor = dservice_ref['hypervisor_version']
994+ if orig_hypervisor > dest_hypervisor:
995+ raise exception.Invalid(_("Older hypervisor version"
996+ "(%(orig_hypervisor)s->"
997+ "%(dest_hypervisor)s)") % locals())
998+
999+ # Checking cpuinfo.
1000+ try:
1001+ rpc.call(context,
1002+ db.queue_get_for(context, FLAGS.compute_topic, dest),
1003+ {"method": 'compare_cpu',
1004+ "args": {'cpu_info': oservice_ref['cpu_info']}})
1005+
1006+ except rpc.RemoteError:
1007+ src = instance_ref['host']
1008+ logging.exception(_("host %(dest)s is not compatible with "
1009+ "original host %(src)s.") % locals())
1010+ raise
1011+
1012+ def assert_compute_node_has_enough_resources(self, context,
1013+ instance_ref, dest):
1014+ """Checks if destination host has enough resource for live migration.
1015+
1016+ Currently, only memory is checked.
1017+ If storage migration (block migration, meaning live migration
1018+ without any shared storage) becomes available, local storage
1019+ checking will also be necessary.
1020+
1021+ :param context: security context
1022+ :param instance_ref: nova.db.sqlalchemy.models.Instance object
1023+ :param dest: destination host
1024+
1025+ """
1026+
1027+ # Getting instance information
1028+ ec2_id = instance_ref['hostname']
1029+
1030+ # Getting host information
1031+ service_refs = db.service_get_all_compute_by_host(context, dest)
1032+ compute_node_ref = service_refs[0]['compute_node'][0]
1033+
1034+ mem_total = int(compute_node_ref['memory_mb'])
1035+ mem_used = int(compute_node_ref['memory_mb_used'])
1036+ mem_avail = mem_total - mem_used
1037+ mem_inst = instance_ref['memory_mb']
1038+ if mem_avail <= mem_inst:
1039+ raise exception.NotEmpty(_("Unable to migrate %(ec2_id)s "
1040+ "to destination: %(dest)s "
1041+ "(host:%(mem_avail)s "
1042+ "<= instance:%(mem_inst)s)")
1043+ % locals())
1044+
1045+ def mounted_on_same_shared_storage(self, context, instance_ref, dest):
1046+ """Check if the src and dest host mount same shared storage.
1047+
1048+ First, the dest host creates a temp file, and the src host can see
1049+ it if they mount the same shared storage. Then the src host erases it.
1050+
1051+ :param context: security context
1052+ :param instance_ref: nova.db.sqlalchemy.models.Instance object
1053+ :param dest: destination host
1054+
1055+ """
1056+
1057+ src = instance_ref['host']
1058+ dst_t = db.queue_get_for(context, FLAGS.compute_topic, dest)
1059+ src_t = db.queue_get_for(context, FLAGS.compute_topic, src)
1060+
1061+ try:
1062+ # create tmpfile at dest host
1063+ filename = rpc.call(context, dst_t,
1064+ {"method": 'create_shared_storage_test_file'})
1065+
1066+ # make sure it exists at the src host.
1067+ rpc.call(context, src_t,
1068+ {"method": 'check_shared_storage_test_file',
1069+ "args": {'filename': filename}})
1070+
1071+ except rpc.RemoteError:
1072+ ipath = FLAGS.instances_path
1073+ logging.error(_("Cannot confirm tmpfile at %(ipath)s is on "
1074+ "same shared storage between %(src)s "
1075+ "and %(dest)s.") % locals())
1076+ raise
1077+
1078+ finally:
1079+ rpc.call(context, dst_t,
1080+ {"method": 'cleanup_shared_storage_test_file',
1081+ "args": {'filename': filename}})
1082
1083=== modified file 'nova/scheduler/manager.py'
1084--- nova/scheduler/manager.py 2011-01-19 15:41:30 +0000
1085+++ nova/scheduler/manager.py 2011-03-10 06:27:59 +0000
1086@@ -67,3 +67,55 @@
1087 {"method": method,
1088 "args": kwargs})
1089 LOG.debug(_("Casting to %(topic)s %(host)s for %(method)s") % locals())
1090+
1091+ # NOTE (masumotok) : This method should be moved to nova.api.ec2.admin.
1092+ # Based on bexar design summit discussion,
1093+ # just put this here for bexar release.
1094+ def show_host_resources(self, context, host, *args):
1095+ """Shows the physical/usage resource given by hosts.
1096+
1097+ :param context: security context
1098+ :param host: hostname
1099+ :returns:
1100+ example format is below.
1101+ {'resource':D, 'usage':{proj_id1:D, proj_id2:D}}
1102+ D: {'vcpus':3, 'memory_mb':2048, 'local_gb':2048}
1103+
1104+ """
1105+
1106+ compute_ref = db.service_get_all_compute_by_host(context, host)
1107+ compute_ref = compute_ref[0]
1108+
1109+ # Getting physical resource information
1110+ compute_node_ref = compute_ref['compute_node'][0]
1111+ resource = {'vcpus': compute_node_ref['vcpus'],
1112+ 'memory_mb': compute_node_ref['memory_mb'],
1113+ 'local_gb': compute_node_ref['local_gb'],
1114+ 'vcpus_used': compute_node_ref['vcpus_used'],
1115+ 'memory_mb_used': compute_node_ref['memory_mb_used'],
1116+ 'local_gb_used': compute_node_ref['local_gb_used']}
1117+
1118+ # Getting usage resource information
1119+ usage = {}
1120+ instance_refs = db.instance_get_all_by_host(context,
1121+ compute_ref['host'])
1122+ if not instance_refs:
1123+ return {'resource': resource, 'usage': usage}
1124+
1125+ project_ids = [i['project_id'] for i in instance_refs]
1126+ project_ids = list(set(project_ids))
1127+ for project_id in project_ids:
1128+ vcpus = db.instance_get_vcpu_sum_by_host_and_project(context,
1129+ host,
1130+ project_id)
1131+ mem = db.instance_get_memory_sum_by_host_and_project(context,
1132+ host,
1133+ project_id)
1134+ hdd = db.instance_get_disk_sum_by_host_and_project(context,
1135+ host,
1136+ project_id)
1137+ usage[project_id] = {'vcpus': int(vcpus),
1138+ 'memory_mb': int(mem),
1139+ 'local_gb': int(hdd)}
1140+
1141+ return {'resource': resource, 'usage': usage}
1142
1143=== modified file 'nova/service.py'
1144--- nova/service.py 2011-03-09 00:51:05 +0000
1145+++ nova/service.py 2011-03-10 06:27:59 +0000
1146@@ -92,6 +92,9 @@
1147 except exception.NotFound:
1148 self._create_service_ref(ctxt)
1149
1150+ if 'nova-compute' == self.binary:
1151+ self.manager.update_available_resource(ctxt)
1152+
1153 conn1 = rpc.Connection.instance(new=True)
1154 conn2 = rpc.Connection.instance(new=True)
1155 if self.report_interval:
1156
1157=== modified file 'nova/tests/test_compute.py'
1158--- nova/tests/test_compute.py 2011-03-10 04:42:11 +0000
1159+++ nova/tests/test_compute.py 2011-03-10 06:27:59 +0000
1160@@ -20,6 +20,7 @@
1161 """
1162
1163 import datetime
1164+import mox
1165
1166 from nova import compute
1167 from nova import context
1168@@ -27,15 +28,20 @@
1169 from nova import exception
1170 from nova import flags
1171 from nova import log as logging
1172+from nova import rpc
1173 from nova import test
1174 from nova import utils
1175 from nova.auth import manager
1176 from nova.compute import instance_types
1177+from nova.compute import manager as compute_manager
1178+from nova.compute import power_state
1179+from nova.db.sqlalchemy import models
1180 from nova.image import local
1181
1182 LOG = logging.getLogger('nova.tests.compute')
1183 FLAGS = flags.FLAGS
1184 flags.DECLARE('stub_network', 'nova.compute.manager')
1185+flags.DECLARE('live_migration_retry_count', 'nova.compute.manager')
1186
1187
1188 class ComputeTestCase(test.TestCase):
1189@@ -83,6 +89,41 @@
1190 'project_id': self.project.id}
1191 return db.security_group_create(self.context, values)
1192
1193+ def _get_dummy_instance(self):
1194+ """Get mock-return-value instance object
1195+ Use this when any testcase executed later than test_run_terminate
1196+ """
1197+ vol1 = models.Volume()
1198+ vol1['id'] = 1
1199+ vol2 = models.Volume()
1200+ vol2['id'] = 2
1201+ instance_ref = models.Instance()
1202+ instance_ref['id'] = 1
1203+ instance_ref['volumes'] = [vol1, vol2]
1204+ instance_ref['hostname'] = 'i-00000001'
1205+ instance_ref['host'] = 'dummy'
1206+ return instance_ref
1207+
1208+ def test_create_instance_defaults_display_name(self):
1209+ """Verify that an instance cannot be created without a display_name."""
1210+ cases = [dict(), dict(display_name=None)]
1211+ for instance in cases:
1212+ ref = self.compute_api.create(self.context,
1213+ FLAGS.default_instance_type, None, **instance)
1214+ try:
1215+ self.assertNotEqual(ref[0]['display_name'], None)
1216+ finally:
1217+ db.instance_destroy(self.context, ref[0]['id'])
1218+
1219+ def test_create_instance_associates_security_groups(self):
1220+ """Make sure create associates security groups"""
1221+ group = self._create_group()
1222+ instance_ref = models.Instance()
1223+ instance_ref['id'] = 1
1224+ instance_ref['volumes'] = [{'id': 1}, {'id': 2}]
1225+ instance_ref['hostname'] = 'i-00000001'
1226+ return instance_ref
1227+
1228 def test_create_instance_defaults_display_name(self):
1229 """Verify that an instance cannot be created without a display_name."""
1230 cases = [dict(), dict(display_name=None)]
1231@@ -301,3 +342,256 @@
1232 self.compute.terminate_instance(self.context, instance_id)
1233 type = instance_types.get_by_flavor_id("1")
1234 self.assertEqual(type, 'm1.tiny')
1235+
1236+ def _setup_other_managers(self):
1237+ self.volume_manager = utils.import_object(FLAGS.volume_manager)
1238+ self.network_manager = utils.import_object(FLAGS.network_manager)
1239+ self.compute_driver = utils.import_object(FLAGS.compute_driver)
1240+
1241+ def test_pre_live_migration_instance_has_no_fixed_ip(self):
1242+ """Confirm raising exception if instance doesn't have fixed_ip."""
1243+ instance_ref = self._get_dummy_instance()
1244+ c = context.get_admin_context()
1245+ i_id = instance_ref['id']
1246+
1247+ dbmock = self.mox.CreateMock(db)
1248+ dbmock.instance_get(c, i_id).AndReturn(instance_ref)
1249+ dbmock.instance_get_fixed_address(c, i_id).AndReturn(None)
1250+
1251+ self.compute.db = dbmock
1252+ self.mox.ReplayAll()
1253+ self.assertRaises(exception.NotFound,
1254+ self.compute.pre_live_migration,
1255+ c, instance_ref['id'])
1256+
1257+ def test_pre_live_migration_instance_has_volume(self):
1258+ """Confirm setup_compute_volume is called when volume is mounted."""
1259+ i_ref = self._get_dummy_instance()
1260+ c = context.get_admin_context()
1261+
1262+ self._setup_other_managers()
1263+ dbmock = self.mox.CreateMock(db)
1264+ volmock = self.mox.CreateMock(self.volume_manager)
1265+ netmock = self.mox.CreateMock(self.network_manager)
1266+ drivermock = self.mox.CreateMock(self.compute_driver)
1267+
1268+ dbmock.instance_get(c, i_ref['id']).AndReturn(i_ref)
1269+ dbmock.instance_get_fixed_address(c, i_ref['id']).AndReturn('dummy')
1270+ for i in range(len(i_ref['volumes'])):
1271+ vid = i_ref['volumes'][i]['id']
1272+ volmock.setup_compute_volume(c, vid).InAnyOrder('g1')
1273+ netmock.setup_compute_network(c, i_ref['id'])
1274+ drivermock.ensure_filtering_rules_for_instance(i_ref)
1275+
1276+ self.compute.db = dbmock
1277+ self.compute.volume_manager = volmock
1278+ self.compute.network_manager = netmock
1279+ self.compute.driver = drivermock
1280+
1281+ self.mox.ReplayAll()
1282+ ret = self.compute.pre_live_migration(c, i_ref['id'])
1283+ self.assertEqual(ret, None)
1284+
1285+ def test_pre_live_migration_instance_has_no_volume(self):
1286+ """Confirm log meg when instance doesn't mount any volumes."""
1287+ i_ref = self._get_dummy_instance()
1288+ i_ref['volumes'] = []
1289+ c = context.get_admin_context()
1290+
1291+ self._setup_other_managers()
1292+ dbmock = self.mox.CreateMock(db)
1293+ netmock = self.mox.CreateMock(self.network_manager)
1294+ drivermock = self.mox.CreateMock(self.compute_driver)
1295+
1296+ dbmock.instance_get(c, i_ref['id']).AndReturn(i_ref)
1297+ dbmock.instance_get_fixed_address(c, i_ref['id']).AndReturn('dummy')
1298+ self.mox.StubOutWithMock(compute_manager.LOG, 'info')
1299+ compute_manager.LOG.info(_("%s has no volume."), i_ref['hostname'])
1300+ netmock.setup_compute_network(c, i_ref['id'])
1301+ drivermock.ensure_filtering_rules_for_instance(i_ref)
1302+
1303+ self.compute.db = dbmock
1304+ self.compute.network_manager = netmock
1305+ self.compute.driver = drivermock
1306+
1307+ self.mox.ReplayAll()
1308+ ret = self.compute.pre_live_migration(c, i_ref['id'])
1309+ self.assertEqual(ret, None)
1310+
1311+ def test_pre_live_migration_setup_compute_node_fail(self):
1312+ """Confirm operation setup_compute_network() fails.
1313+
1314+ It retries and raises an exception when the retry count is exceeded.
1315+
1316+ """
1317+
1318+ i_ref = self._get_dummy_instance()
1319+ c = context.get_admin_context()
1320+
1321+ self._setup_other_managers()
1322+ dbmock = self.mox.CreateMock(db)
1323+ netmock = self.mox.CreateMock(self.network_manager)
1324+ volmock = self.mox.CreateMock(self.volume_manager)
1325+
1326+ dbmock.instance_get(c, i_ref['id']).AndReturn(i_ref)
1327+ dbmock.instance_get_fixed_address(c, i_ref['id']).AndReturn('dummy')
1328+ for i in range(len(i_ref['volumes'])):
1329+ volmock.setup_compute_volume(c, i_ref['volumes'][i]['id'])
1330+ for i in range(FLAGS.live_migration_retry_count):
1331+ netmock.setup_compute_network(c, i_ref['id']).\
1332+ AndRaise(exception.ProcessExecutionError())
1333+
1334+ self.compute.db = dbmock
1335+ self.compute.network_manager = netmock
1336+ self.compute.volume_manager = volmock
1337+
1338+ self.mox.ReplayAll()
1339+ self.assertRaises(exception.ProcessExecutionError,
1340+ self.compute.pre_live_migration,
1341+ c, i_ref['id'])
1342+
1343+ def test_live_migration_works_correctly_with_volume(self):
1344+ """Confirm check_for_export to confirm volume health check."""
1345+ i_ref = self._get_dummy_instance()
1346+ c = context.get_admin_context()
1347+ topic = db.queue_get_for(c, FLAGS.compute_topic, i_ref['host'])
1348+
1349+ dbmock = self.mox.CreateMock(db)
1350+ dbmock.instance_get(c, i_ref['id']).AndReturn(i_ref)
1351+ self.mox.StubOutWithMock(rpc, 'call')
1352+ rpc.call(c, FLAGS.volume_topic, {"method": "check_for_export",
1353+ "args": {'instance_id': i_ref['id']}})
1354+ dbmock.queue_get_for(c, FLAGS.compute_topic, i_ref['host']).\
1355+ AndReturn(topic)
1356+ rpc.call(c, topic, {"method": "pre_live_migration",
1357+ "args": {'instance_id': i_ref['id']}})
1358+ self.mox.StubOutWithMock(self.compute.driver, 'live_migration')
1359+ self.compute.driver.live_migration(c, i_ref, i_ref['host'],
1360+ self.compute.post_live_migration,
1361+ self.compute.recover_live_migration)
1362+
1363+ self.compute.db = dbmock
1364+ self.mox.ReplayAll()
1365+ ret = self.compute.live_migration(c, i_ref['id'], i_ref['host'])
1366+ self.assertEqual(ret, None)
1367+
1368+ def test_live_migration_dest_raises_exception(self):
1369+ """Confirm exception when pre_live_migration fails."""
1370+ i_ref = self._get_dummy_instance()
1371+ c = context.get_admin_context()
1372+ topic = db.queue_get_for(c, FLAGS.compute_topic, i_ref['host'])
1373+
1374+ dbmock = self.mox.CreateMock(db)
1375+ dbmock.instance_get(c, i_ref['id']).AndReturn(i_ref)
1376+ self.mox.StubOutWithMock(rpc, 'call')
1377+ rpc.call(c, FLAGS.volume_topic, {"method": "check_for_export",
1378+ "args": {'instance_id': i_ref['id']}})
1379+ dbmock.queue_get_for(c, FLAGS.compute_topic, i_ref['host']).\
1380+ AndReturn(topic)
1381+ rpc.call(c, topic, {"method": "pre_live_migration",
1382+ "args": {'instance_id': i_ref['id']}}).\
1383+ AndRaise(rpc.RemoteError('', '', ''))
1384+ dbmock.instance_update(c, i_ref['id'], {'state_description': 'running',
1385+ 'state': power_state.RUNNING,
1386+ 'host': i_ref['host']})
1387+ for v in i_ref['volumes']:
1388+ dbmock.volume_update(c, v['id'], {'status': 'in-use'})
1389+
1390+ self.compute.db = dbmock
1391+ self.mox.ReplayAll()
1392+ self.assertRaises(rpc.RemoteError,
1393+ self.compute.live_migration,
1394+ c, i_ref['id'], i_ref['host'])
1395+
1396+ def test_live_migration_dest_raises_exception_no_volume(self):
1397+ """Same as above test(input pattern is different) """
1398+ i_ref = self._get_dummy_instance()
1399+ i_ref['volumes'] = []
1400+ c = context.get_admin_context()
1401+ topic = db.queue_get_for(c, FLAGS.compute_topic, i_ref['host'])
1402+
1403+ dbmock = self.mox.CreateMock(db)
1404+ dbmock.instance_get(c, i_ref['id']).AndReturn(i_ref)
1405+ dbmock.queue_get_for(c, FLAGS.compute_topic, i_ref['host']).\
1406+ AndReturn(topic)
1407+ self.mox.StubOutWithMock(rpc, 'call')
1408+ rpc.call(c, topic, {"method": "pre_live_migration",
1409+ "args": {'instance_id': i_ref['id']}}).\
1410+ AndRaise(rpc.RemoteError('', '', ''))
1411+ dbmock.instance_update(c, i_ref['id'], {'state_description': 'running',
1412+ 'state': power_state.RUNNING,
1413+ 'host': i_ref['host']})
1414+
1415+ self.compute.db = dbmock
1416+ self.mox.ReplayAll()
1417+ self.assertRaises(rpc.RemoteError,
1418+ self.compute.live_migration,
1419+ c, i_ref['id'], i_ref['host'])
1420+
1421+ def test_live_migration_works_correctly_no_volume(self):
1422+ """Confirm live_migration() works as expected correctly."""
1423+ i_ref = self._get_dummy_instance()
1424+ i_ref['volumes'] = []
1425+ c = context.get_admin_context()
1426+ topic = db.queue_get_for(c, FLAGS.compute_topic, i_ref['host'])
1427+
1428+ dbmock = self.mox.CreateMock(db)
1429+ dbmock.instance_get(c, i_ref['id']).AndReturn(i_ref)
1430+ self.mox.StubOutWithMock(rpc, 'call')
1431+ dbmock.queue_get_for(c, FLAGS.compute_topic, i_ref['host']).\
1432+ AndReturn(topic)
1433+ rpc.call(c, topic, {"method": "pre_live_migration",
1434+ "args": {'instance_id': i_ref['id']}})
1435+ self.mox.StubOutWithMock(self.compute.driver, 'live_migration')
1436+ self.compute.driver.live_migration(c, i_ref, i_ref['host'],
1437+ self.compute.post_live_migration,
1438+ self.compute.recover_live_migration)
1439+
1440+ self.compute.db = dbmock
1441+ self.mox.ReplayAll()
1442+ ret = self.compute.live_migration(c, i_ref['id'], i_ref['host'])
1443+ self.assertEqual(ret, None)
1444+
1445+ def test_post_live_migration_working_correctly(self):
1446+ """Confirm post_live_migration() works as expected correctly."""
1447+ dest = 'desthost'
1448+ flo_addr = '1.2.1.2'
1449+
1450+ # Preparing data
1451+ c = context.get_admin_context()
1452+ instance_id = self._create_instance()
1453+ i_ref = db.instance_get(c, instance_id)
1454+ db.instance_update(c, i_ref['id'], {'state_description': 'migrating',
1455+ 'state': power_state.PAUSED})
1456+ v_ref = db.volume_create(c, {'size': 1, 'instance_id': instance_id})
1457+ fix_addr = db.fixed_ip_create(c, {'address': '1.1.1.1',
1458+ 'instance_id': instance_id})
1459+ fix_ref = db.fixed_ip_get_by_address(c, fix_addr)
1460+ flo_ref = db.floating_ip_create(c, {'address': flo_addr,
1461+ 'fixed_ip_id': fix_ref['id']})
1462+ # reload is necessary before setting mocks
1463+ i_ref = db.instance_get(c, instance_id)
1464+
1465+ # Preparing mocks
1466+ self.mox.StubOutWithMock(self.compute.volume_manager,
1467+ 'remove_compute_volume')
1468+ for v in i_ref['volumes']:
1469+ self.compute.volume_manager.remove_compute_volume(c, v['id'])
1470+ self.mox.StubOutWithMock(self.compute.driver, 'unfilter_instance')
1471+ self.compute.driver.unfilter_instance(i_ref)
1472+
1473+ # executing
1474+ self.mox.ReplayAll()
1475+ ret = self.compute.post_live_migration(c, i_ref, dest)
1476+
1477+ # make sure all data is rewritten to dest
1478+ i_ref = db.instance_get(c, i_ref['id'])
1479+ c1 = (i_ref['host'] == dest)
1480+ flo_refs = db.floating_ip_get_all_by_host(c, dest)
1481+ c2 = (len(flo_refs) != 0 and flo_refs[0]['address'] == flo_addr)
1482+
1483+ # post operation
1484+ self.assertTrue(c1 and c2)
1485+ db.instance_destroy(c, instance_id)
1486+ db.volume_destroy(c, v_ref['id'])
1487+ db.floating_ip_destroy(c, flo_addr)
1488
1489=== modified file 'nova/tests/test_scheduler.py'
1490--- nova/tests/test_scheduler.py 2011-03-07 01:25:01 +0000
1491+++ nova/tests/test_scheduler.py 2011-03-10 06:27:59 +0000
1492@@ -20,10 +20,12 @@
1493 """
1494
1495 import datetime
1496+import mox
1497
1498 from mox import IgnoreArg
1499 from nova import context
1500 from nova import db
1501+from nova import exception
1502 from nova import flags
1503 from nova import service
1504 from nova import test
1505@@ -32,11 +34,14 @@
1506 from nova.auth import manager as auth_manager
1507 from nova.scheduler import manager
1508 from nova.scheduler import driver
1509+from nova.compute import power_state
1510+from nova.db.sqlalchemy import models
1511
1512
1513 FLAGS = flags.FLAGS
1514 flags.DECLARE('max_cores', 'nova.scheduler.simple')
1515 flags.DECLARE('stub_network', 'nova.compute.manager')
1516+flags.DECLARE('instances_path', 'nova.compute.manager')
1517
1518
1519 class TestDriver(driver.Scheduler):
1520@@ -54,6 +59,34 @@
1521 super(SchedulerTestCase, self).setUp()
1522 self.flags(scheduler_driver='nova.tests.test_scheduler.TestDriver')
1523
1524+ def _create_compute_service(self):
1525+ """Create compute-manager(ComputeNode and Service record)."""
1526+ ctxt = context.get_admin_context()
1527+ dic = {'host': 'dummy', 'binary': 'nova-compute', 'topic': 'compute',
1528+ 'report_count': 0, 'availability_zone': 'dummyzone'}
1529+ s_ref = db.service_create(ctxt, dic)
1530+
1531+ dic = {'service_id': s_ref['id'],
1532+ 'vcpus': 16, 'memory_mb': 32, 'local_gb': 100,
1533+ 'vcpus_used': 16, 'memory_mb_used': 32, 'local_gb_used': 10,
1534+ 'hypervisor_type': 'qemu', 'hypervisor_version': 12003,
1535+ 'cpu_info': ''}
1536+ db.compute_node_create(ctxt, dic)
1537+
1538+ return db.service_get(ctxt, s_ref['id'])
1539+
1540+ def _create_instance(self, **kwargs):
1541+ """Create a test instance"""
1542+ ctxt = context.get_admin_context()
1543+ inst = {}
1544+ inst['user_id'] = 'admin'
1545+ inst['project_id'] = kwargs.get('project_id', 'fake')
1546+ inst['host'] = kwargs.get('host', 'dummy')
1547+ inst['vcpus'] = kwargs.get('vcpus', 1)
1548+ inst['memory_mb'] = kwargs.get('memory_mb', 10)
1549+ inst['local_gb'] = kwargs.get('local_gb', 20)
1550+ return db.instance_create(ctxt, inst)
1551+
1552 def test_fallback(self):
1553 scheduler = manager.SchedulerManager()
1554 self.mox.StubOutWithMock(rpc, 'cast', use_mock_anything=True)
1555@@ -76,6 +109,73 @@
1556 self.mox.ReplayAll()
1557 scheduler.named_method(ctxt, 'topic', num=7)
1558
1559+ def test_show_host_resources_host_not_exist(self):
1560+ """A host given as an argument does not exist."""
1561+
1562+ scheduler = manager.SchedulerManager()
1563+ dest = 'dummydest'
1564+ ctxt = context.get_admin_context()
1565+
1566+ try:
1567+ scheduler.show_host_resources(ctxt, dest)
1568+ except exception.NotFound, e:
1569+ c1 = (e.message.find(_("does not exist or is not a "
1570+ "compute node.")) >= 0)
1571+ self.assertTrue(c1)
1572+
1573+ def _dic_is_equal(self, dic1, dic2, keys=None):
1574+ """Compares 2 dictionary contents(Helper method)"""
1575+ if not keys:
1576+ keys = ['vcpus', 'memory_mb', 'local_gb',
1577+ 'vcpus_used', 'memory_mb_used', 'local_gb_used']
1578+
1579+ for key in keys:
1580+ if not (dic1[key] == dic2[key]):
1581+ return False
1582+ return True
1583+
1584+ def test_show_host_resources_no_project(self):
1585+ """No instance are running on the given host."""
1586+
1587+ scheduler = manager.SchedulerManager()
1588+ ctxt = context.get_admin_context()
1589+ s_ref = self._create_compute_service()
1590+
1591+ result = scheduler.show_host_resources(ctxt, s_ref['host'])
1592+
1593+ # result checking
1594+ c1 = ('resource' in result and 'usage' in result)
1595+ compute_node = s_ref['compute_node'][0]
1596+ c2 = self._dic_is_equal(result['resource'], compute_node)
1597+ c3 = result['usage'] == {}
1598+ self.assertTrue(c1 and c2 and c3)
1599+ db.service_destroy(ctxt, s_ref['id'])
1600+
1601+ def test_show_host_resources_works_correctly(self):
1602+ """Show_host_resources() works correctly as expected."""
1603+
1604+ scheduler = manager.SchedulerManager()
1605+ ctxt = context.get_admin_context()
1606+ s_ref = self._create_compute_service()
1607+ i_ref1 = self._create_instance(project_id='p-01', host=s_ref['host'])
1608+ i_ref2 = self._create_instance(project_id='p-02', vcpus=3,
1609+ host=s_ref['host'])
1610+
1611+ result = scheduler.show_host_resources(ctxt, s_ref['host'])
1612+
1613+ c1 = ('resource' in result and 'usage' in result)
1614+ compute_node = s_ref['compute_node'][0]
1615+ c2 = self._dic_is_equal(result['resource'], compute_node)
1616+ c3 = result['usage'].keys() == ['p-01', 'p-02']
1617+ keys = ['vcpus', 'memory_mb', 'local_gb']
1618+ c4 = self._dic_is_equal(result['usage']['p-01'], i_ref1, keys)
1619+ c5 = self._dic_is_equal(result['usage']['p-02'], i_ref2, keys)
1620+ self.assertTrue(c1 and c2 and c3 and c4 and c5)
1621+
1622+ db.service_destroy(ctxt, s_ref['id'])
1623+ db.instance_destroy(ctxt, i_ref1['id'])
1624+ db.instance_destroy(ctxt, i_ref2['id'])
1625+
1626
1627 class ZoneSchedulerTestCase(test.TestCase):
1628 """Test case for zone scheduler"""
1629@@ -161,9 +261,15 @@
1630 inst['project_id'] = self.project.id
1631 inst['instance_type'] = 'm1.tiny'
1632 inst['mac_address'] = utils.generate_mac()
1633+ inst['vcpus'] = kwargs.get('vcpus', 1)
1634 inst['ami_launch_index'] = 0
1635- inst['vcpus'] = 1
1636 inst['availability_zone'] = kwargs.get('availability_zone', None)
1637+ inst['host'] = kwargs.get('host', 'dummy')
1638+ inst['memory_mb'] = kwargs.get('memory_mb', 20)
1639+ inst['local_gb'] = kwargs.get('local_gb', 30)
1640+ inst['launched_on'] = kwargs.get('launched_on', 'dummy')
1641+ inst['state_description'] = kwargs.get('state_description', 'running')
1642+ inst['state'] = kwargs.get('state', power_state.RUNNING)
1643 return db.instance_create(self.context, inst)['id']
1644
1645 def _create_volume(self):
1646@@ -173,6 +279,211 @@
1647 vol['availability_zone'] = 'test'
1648 return db.volume_create(self.context, vol)['id']
1649
1650+ def _create_compute_service(self, **kwargs):
1651+ """Create a compute service."""
1652+
1653+ dic = {'binary': 'nova-compute', 'topic': 'compute',
1654+ 'report_count': 0, 'availability_zone': 'dummyzone'}
1655+ dic['host'] = kwargs.get('host', 'dummy')
1656+ s_ref = db.service_create(self.context, dic)
1657+ if 'created_at' in kwargs.keys() or 'updated_at' in kwargs.keys():
1658+ t = datetime.datetime.utcnow() - datetime.timedelta(0)
1659+ dic['created_at'] = kwargs.get('created_at', t)
1660+ dic['updated_at'] = kwargs.get('updated_at', t)
1661+ db.service_update(self.context, s_ref['id'], dic)
1662+
1663+ dic = {'service_id': s_ref['id'],
1664+ 'vcpus': 16, 'memory_mb': 32, 'local_gb': 100,
1665+ 'vcpus_used': 16, 'local_gb_used': 10,
1666+ 'hypervisor_type': 'qemu', 'hypervisor_version': 12003,
1667+ 'cpu_info': ''}
1668+ dic['memory_mb_used'] = kwargs.get('memory_mb_used', 32)
1669+ dic['hypervisor_type'] = kwargs.get('hypervisor_type', 'qemu')
1670+ dic['hypervisor_version'] = kwargs.get('hypervisor_version', 12003)
1671+ db.compute_node_create(self.context, dic)
1672+ return db.service_get(self.context, s_ref['id'])
1673+
1674+ def test_doesnt_report_disabled_hosts_as_up(self):
1675+ """Ensures driver doesn't find hosts before they are enabled"""
1676+ # NOTE(vish): constructing service without create method
1677+ # because we are going to use it without queue
1678+ compute1 = service.Service('host1',
1679+ 'nova-compute',
1680+ 'compute',
1681+ FLAGS.compute_manager)
1682+ compute1.start()
1683+ compute2 = service.Service('host2',
1684+ 'nova-compute',
1685+ 'compute',
1686+ FLAGS.compute_manager)
1687+ compute2.start()
1688+ s1 = db.service_get_by_args(self.context, 'host1', 'nova-compute')
1689+ s2 = db.service_get_by_args(self.context, 'host2', 'nova-compute')
1690+ db.service_update(self.context, s1['id'], {'disabled': True})
1691+ db.service_update(self.context, s2['id'], {'disabled': True})
1692+ hosts = self.scheduler.driver.hosts_up(self.context, 'compute')
1693+ self.assertEqual(0, len(hosts))
1694+ compute1.kill()
1695+ compute2.kill()
1696+
1697+ def test_reports_enabled_hosts_as_up(self):
1698+ """Ensures driver can find the hosts that are up"""
1699+ # NOTE(vish): constructing service without create method
1700+ # because we are going to use it without queue
1701+ compute1 = service.Service('host1',
1702+ 'nova-compute',
1703+ 'compute',
1704+ FLAGS.compute_manager)
1705+ compute1.start()
1706+ compute2 = service.Service('host2',
1707+ 'nova-compute',
1708+ 'compute',
1709+ FLAGS.compute_manager)
1710+ compute2.start()
1711+ hosts = self.scheduler.driver.hosts_up(self.context, 'compute')
1712+ self.assertEqual(2, len(hosts))
1713+ compute1.kill()
1714+ compute2.kill()
1715+
1716+ def test_least_busy_host_gets_instance(self):
1717+ """Ensures the host with less cores gets the next one"""
1718+ compute1 = service.Service('host1',
1719+ 'nova-compute',
1720+ 'compute',
1721+ FLAGS.compute_manager)
1722+ compute1.start()
1723+ compute2 = service.Service('host2',
1724+ 'nova-compute',
1725+ 'compute',
1726+ FLAGS.compute_manager)
1727+ compute2.start()
1728+ instance_id1 = self._create_instance()
1729+ compute1.run_instance(self.context, instance_id1)
1730+ instance_id2 = self._create_instance()
1731+ host = self.scheduler.driver.schedule_run_instance(self.context,
1732+ instance_id2)
1733+ self.assertEqual(host, 'host2')
1734+ compute1.terminate_instance(self.context, instance_id1)
1735+ db.instance_destroy(self.context, instance_id2)
1736+ compute1.kill()
1737+ compute2.kill()
1738+
1739+ def test_specific_host_gets_instance(self):
1740+ """Ensures if you set availability_zone it launches on that zone"""
1741+ compute1 = service.Service('host1',
1742+ 'nova-compute',
1743+ 'compute',
1744+ FLAGS.compute_manager)
1745+ compute1.start()
1746+ compute2 = service.Service('host2',
1747+ 'nova-compute',
1748+ 'compute',
1749+ FLAGS.compute_manager)
1750+ compute2.start()
1751+ instance_id1 = self._create_instance()
1752+ compute1.run_instance(self.context, instance_id1)
1753+ instance_id2 = self._create_instance(availability_zone='nova:host1')
1754+ host = self.scheduler.driver.schedule_run_instance(self.context,
1755+ instance_id2)
1756+ self.assertEqual('host1', host)
1757+ compute1.terminate_instance(self.context, instance_id1)
1758+ db.instance_destroy(self.context, instance_id2)
1759+ compute1.kill()
1760+ compute2.kill()
1761+
1762+ def test_wont_schedule_if_specified_host_is_down(self):
1763+ compute1 = service.Service('host1',
1764+ 'nova-compute',
1765+ 'compute',
1766+ FLAGS.compute_manager)
1767+ compute1.start()
1768+ s1 = db.service_get_by_args(self.context, 'host1', 'nova-compute')
1769+ now = datetime.datetime.utcnow()
1770+ delta = datetime.timedelta(seconds=FLAGS.service_down_time * 2)
1771+ past = now - delta
1772+ db.service_update(self.context, s1['id'], {'updated_at': past})
1773+ instance_id2 = self._create_instance(availability_zone='nova:host1')
1774+ self.assertRaises(driver.WillNotSchedule,
1775+ self.scheduler.driver.schedule_run_instance,
1776+ self.context,
1777+ instance_id2)
1778+ db.instance_destroy(self.context, instance_id2)
1779+ compute1.kill()
1780+
1781+ def test_will_schedule_on_disabled_host_if_specified(self):
1782+ compute1 = service.Service('host1',
1783+ 'nova-compute',
1784+ 'compute',
1785+ FLAGS.compute_manager)
1786+ compute1.start()
1787+ s1 = db.service_get_by_args(self.context, 'host1', 'nova-compute')
1788+ db.service_update(self.context, s1['id'], {'disabled': True})
1789+ instance_id2 = self._create_instance(availability_zone='nova:host1')
1790+ host = self.scheduler.driver.schedule_run_instance(self.context,
1791+ instance_id2)
1792+ self.assertEqual('host1', host)
1793+ db.instance_destroy(self.context, instance_id2)
1794+ compute1.kill()
1795+
1796+ def test_too_many_cores(self):
1797+ """Ensures we don't go over max cores"""
1798+ compute1 = service.Service('host1',
1799+ 'nova-compute',
1800+ 'compute',
1801+ FLAGS.compute_manager)
1802+ compute1.start()
1803+ compute2 = service.Service('host2',
1804+ 'nova-compute',
1805+ 'compute',
1806+ FLAGS.compute_manager)
1807+ compute2.start()
1808+ instance_ids1 = []
1809+ instance_ids2 = []
1810+ for index in xrange(FLAGS.max_cores):
1811+ instance_id = self._create_instance()
1812+ compute1.run_instance(self.context, instance_id)
1813+ instance_ids1.append(instance_id)
1814+ instance_id = self._create_instance()
1815+ compute2.run_instance(self.context, instance_id)
1816+ instance_ids2.append(instance_id)
1817+ instance_id = self._create_instance()
1818+ self.assertRaises(driver.NoValidHost,
1819+ self.scheduler.driver.schedule_run_instance,
1820+ self.context,
1821+ instance_id)
1822+ for instance_id in instance_ids1:
1823+ compute1.terminate_instance(self.context, instance_id)
1824+ for instance_id in instance_ids2:
1825+ compute2.terminate_instance(self.context, instance_id)
1826+ compute1.kill()
1827+ compute2.kill()
1828+
1829+ def test_least_busy_host_gets_volume(self):
1830+ """Ensures the host with less gigabytes gets the next one"""
1831+ volume1 = service.Service('host1',
1832+ 'nova-volume',
1833+ 'volume',
1834+ FLAGS.volume_manager)
1835+ volume1.start()
1836+ volume2 = service.Service('host2',
1837+ 'nova-volume',
1838+ 'volume',
1839+ FLAGS.volume_manager)
1840+ volume2.start()
1841+ volume_id1 = self._create_volume()
1842+ volume1.create_volume(self.context, volume_id1)
1843+ volume_id2 = self._create_volume()
1844+ host = self.scheduler.driver.schedule_create_volume(self.context,
1845+ volume_id2)
1846+ self.assertEqual(host, 'host2')
1847+ volume1.delete_volume(self.context, volume_id1)
1848+ db.volume_destroy(self.context, volume_id2)
1854+
1855 def test_doesnt_report_disabled_hosts_as_up(self):
1856 """Ensures driver doesn't find hosts before they are enabled"""
1857 compute1 = self.start_service('compute', host='host1')
1858@@ -316,3 +627,313 @@
1859 volume2.delete_volume(self.context, volume_id)
1860 volume1.kill()
1861 volume2.kill()
1862+
1863+ def test_scheduler_live_migration_with_volume(self):
1864+ """scheduler_live_migration() works correctly as expected.
1865+
1866+ Also, checks instance state is changed from 'running' -> 'migrating'.
1867+
1868+ """
1869+
1870+ instance_id = self._create_instance()
1871+ i_ref = db.instance_get(self.context, instance_id)
1872+ dic = {'instance_id': instance_id, 'size': 1}
1873+ v_ref = db.volume_create(self.context, dic)
1874+
1875+ # cannot check the 2nd argument because the address of the
1876+ # instance object differs between calls.
1877+ driver_i = self.scheduler.driver
1878+ nocare = mox.IgnoreArg()
1879+ self.mox.StubOutWithMock(driver_i, '_live_migration_src_check')
1880+ self.mox.StubOutWithMock(driver_i, '_live_migration_dest_check')
1881+ self.mox.StubOutWithMock(driver_i, '_live_migration_common_check')
1882+ driver_i._live_migration_src_check(nocare, nocare)
1883+ driver_i._live_migration_dest_check(nocare, nocare, i_ref['host'])
1884+ driver_i._live_migration_common_check(nocare, nocare, i_ref['host'])
1885+ self.mox.StubOutWithMock(rpc, 'cast', use_mock_anything=True)
1886+ kwargs = {'instance_id': instance_id, 'dest': i_ref['host']}
1887+ rpc.cast(self.context,
1888+ db.queue_get_for(nocare, FLAGS.compute_topic, i_ref['host']),
1889+ {"method": 'live_migration', "args": kwargs})
1890+
1891+ self.mox.ReplayAll()
1892+ self.scheduler.live_migration(self.context, FLAGS.compute_topic,
1893+ instance_id=instance_id,
1894+ dest=i_ref['host'])
1895+
1896+ i_ref = db.instance_get(self.context, instance_id)
1897+ self.assertTrue(i_ref['state_description'] == 'migrating')
1898+ db.instance_destroy(self.context, instance_id)
1899+ db.volume_destroy(self.context, v_ref['id'])
1900+
1901+ def test_live_migration_src_check_instance_not_running(self):
1902+ """The instance given by instance_id is not running."""
1903+
1904+ instance_id = self._create_instance(state_description='migrating')
1905+ i_ref = db.instance_get(self.context, instance_id)
1906+
1907+ try:
1908+ self.scheduler.driver._live_migration_src_check(self.context,
1909+ i_ref)
1910+ except exception.Invalid, e:
1911+ c = (e.message.find('is not running') > 0)
1912+
1913+ self.assertTrue(c)
1914+ db.instance_destroy(self.context, instance_id)
1915+
1916+ def test_live_migration_src_check_volume_node_not_alive(self):
1917+ """Raise exception when volume node is not alive."""
1918+
1919+ instance_id = self._create_instance()
1920+ i_ref = db.instance_get(self.context, instance_id)
1921+ dic = {'instance_id': instance_id, 'size': 1}
1922+ v_ref = db.volume_create(self.context, {'instance_id': instance_id,
1923+ 'size': 1})
1924+ t1 = datetime.datetime.utcnow() - datetime.timedelta(1)
1925+ dic = {'created_at': t1, 'updated_at': t1, 'binary': 'nova-volume',
1926+ 'topic': 'volume', 'report_count': 0}
1927+ s_ref = db.service_create(self.context, dic)
1928+
1929+ try:
1930+ self.scheduler.driver.schedule_live_migration(self.context,
1931+ instance_id,
1932+ i_ref['host'])
1933+ except exception.Invalid, e:
1934+ c = (e.message.find('volume node is not alive') >= 0)
1935+
1936+ self.assertTrue(c)
1937+ db.instance_destroy(self.context, instance_id)
1938+ db.service_destroy(self.context, s_ref['id'])
1939+ db.volume_destroy(self.context, v_ref['id'])
1940+
1941+ def test_live_migration_src_check_compute_node_not_alive(self):
1942+ """Confirms src-compute node is alive."""
1943+ instance_id = self._create_instance()
1944+ i_ref = db.instance_get(self.context, instance_id)
1945+ t = datetime.datetime.utcnow() - datetime.timedelta(10)
1946+ s_ref = self._create_compute_service(created_at=t, updated_at=t,
1947+ host=i_ref['host'])
1948+
1949+ try:
1950+ self.scheduler.driver._live_migration_src_check(self.context,
1951+ i_ref)
1952+ except exception.Invalid, e:
1953+ c = (e.message.find('is not alive') >= 0)
1954+
1955+ self.assertTrue(c)
1956+ db.instance_destroy(self.context, instance_id)
1957+ db.service_destroy(self.context, s_ref['id'])
1958+
1959+ def test_live_migration_src_check_works_correctly(self):
1960+ """Confirms this method finishes with no error."""
1961+ instance_id = self._create_instance()
1962+ i_ref = db.instance_get(self.context, instance_id)
1963+ s_ref = self._create_compute_service(host=i_ref['host'])
1964+
1965+ ret = self.scheduler.driver._live_migration_src_check(self.context,
1966+ i_ref)
1967+
1968+ self.assertTrue(ret == None)
1969+ db.instance_destroy(self.context, instance_id)
1970+ db.service_destroy(self.context, s_ref['id'])
1971+
1972+ def test_live_migration_dest_check_not_alive(self):
1973+ """Confirms exception raises in case dest host does not exist."""
1974+ instance_id = self._create_instance()
1975+ i_ref = db.instance_get(self.context, instance_id)
1976+ t = datetime.datetime.utcnow() - datetime.timedelta(10)
1977+ s_ref = self._create_compute_service(created_at=t, updated_at=t,
1978+ host=i_ref['host'])
1979+
1980+ try:
1981+ self.scheduler.driver._live_migration_dest_check(self.context,
1982+ i_ref,
1983+ i_ref['host'])
1984+ except exception.Invalid, e:
1985+ c = (e.message.find('is not alive') >= 0)
1986+
1987+ self.assertTrue(c)
1988+ db.instance_destroy(self.context, instance_id)
1989+ db.service_destroy(self.context, s_ref['id'])
1990+
1991+ def test_live_migration_dest_check_service_same_host(self):
1992+ """Confirms exceptioin raises in case dest and src is same host."""
1993+ instance_id = self._create_instance()
1994+ i_ref = db.instance_get(self.context, instance_id)
1995+ s_ref = self._create_compute_service(host=i_ref['host'])
1996+
1997+ try:
1998+ self.scheduler.driver._live_migration_dest_check(self.context,
1999+ i_ref,
2000+ i_ref['host'])
2001+ except exception.Invalid, e:
2002+ c = (e.message.find('choose other host') >= 0)
2003+
2004+ self.assertTrue(c)
2005+ db.instance_destroy(self.context, instance_id)
2006+ db.service_destroy(self.context, s_ref['id'])
2007+
2008+ def test_live_migration_dest_check_service_lack_memory(self):
2009+ """Confirms exception raises when dest doesn't have enough memory."""
2010+ instance_id = self._create_instance()
2011+ i_ref = db.instance_get(self.context, instance_id)
2012+ s_ref = self._create_compute_service(host='somewhere',
2013+ memory_mb_used=12)
2014+
2015+ try:
2016+ self.scheduler.driver._live_migration_dest_check(self.context,
2017+ i_ref,
2018+ 'somewhere')
2019+ except exception.NotEmpty, e:
2020+ c = (e.message.find('Unable to migrate') >= 0)
2021+
2022+ self.assertTrue(c)
2023+ db.instance_destroy(self.context, instance_id)
2024+ db.service_destroy(self.context, s_ref['id'])
2025+
2026+ def test_live_migration_dest_check_service_works_correctly(self):
2027+ """Confirms method finishes with no error."""
2028+ instance_id = self._create_instance()
2029+ i_ref = db.instance_get(self.context, instance_id)
2030+ s_ref = self._create_compute_service(host='somewhere',
2031+ memory_mb_used=5)
2032+
2033+ ret = self.scheduler.driver._live_migration_dest_check(self.context,
2034+ i_ref,
2035+ 'somewhere')
2036+ self.assertTrue(ret == None)
2037+ db.instance_destroy(self.context, instance_id)
2038+ db.service_destroy(self.context, s_ref['id'])
2039+
2040+ def test_live_migration_common_check_service_orig_not_exists(self):
2041+ """Destination host does not exist."""
2042+
2043+ dest = 'dummydest'
2044+ # mocks for live_migration_common_check()
2045+ instance_id = self._create_instance()
2046+ i_ref = db.instance_get(self.context, instance_id)
2047+ t1 = datetime.datetime.utcnow() - datetime.timedelta(10)
2048+ s_ref = self._create_compute_service(created_at=t1, updated_at=t1,
2049+ host=dest)
2050+
2051+ # mocks for mounted_on_same_shared_storage()
2052+ fpath = '/test/20110127120000'
2053+ self.mox.StubOutWithMock(driver, 'rpc', use_mock_anything=True)
2054+ topic = FLAGS.compute_topic
2055+ driver.rpc.call(mox.IgnoreArg(),
2056+ db.queue_get_for(self.context, topic, dest),
2057+ {"method": 'create_shared_storage_test_file'}).AndReturn(fpath)
2058+ driver.rpc.call(mox.IgnoreArg(),
2059+ db.queue_get_for(mox.IgnoreArg(), topic, i_ref['host']),
2060+ {"method": 'check_shared_storage_test_file',
2061+ "args": {'filename': fpath}})
2062+ driver.rpc.call(mox.IgnoreArg(),
2063+ db.queue_get_for(mox.IgnoreArg(), topic, dest),
2064+ {"method": 'cleanup_shared_storage_test_file',
2065+ "args": {'filename': fpath}})
2066+
2067+ self.mox.ReplayAll()
2068+ try:
2069+ self.scheduler.driver._live_migration_common_check(self.context,
2070+ i_ref,
2071+ dest)
2072+ except exception.Invalid, e:
2073+ c = (e.message.find('does not exist') >= 0)
2074+
2075+ self.assertTrue(c)
2076+ db.instance_destroy(self.context, instance_id)
2077+ db.service_destroy(self.context, s_ref['id'])
2078+
2079+ def test_live_migration_common_check_service_different_hypervisor(self):
2080+ """Original host and dest host has different hypervisor type."""
2081+ dest = 'dummydest'
2082+ instance_id = self._create_instance()
2083+ i_ref = db.instance_get(self.context, instance_id)
2084+
2085+ # compute service for destination
2086+ s_ref = self._create_compute_service(host=i_ref['host'])
2087+ # compute service for original host
2088+ s_ref2 = self._create_compute_service(host=dest, hypervisor_type='xen')
2089+
2090+ # mocks
2091+ driver = self.scheduler.driver
2092+ self.mox.StubOutWithMock(driver, 'mounted_on_same_shared_storage')
2093+ driver.mounted_on_same_shared_storage(mox.IgnoreArg(), i_ref, dest)
2094+
2095+ self.mox.ReplayAll()
2096+ try:
2097+ self.scheduler.driver._live_migration_common_check(self.context,
2098+ i_ref,
2099+ dest)
2100+ except exception.Invalid, e:
2101+ c = (e.message.find(_('Different hypervisor type')) >= 0)
2102+
2103+ self.assertTrue(c)
2104+ db.instance_destroy(self.context, instance_id)
2105+ db.service_destroy(self.context, s_ref['id'])
2106+ db.service_destroy(self.context, s_ref2['id'])
2107+
2108+ def test_live_migration_common_check_service_different_version(self):
2109+ """Original host and dest host has different hypervisor version."""
2110+ dest = 'dummydest'
2111+ instance_id = self._create_instance()
2112+ i_ref = db.instance_get(self.context, instance_id)
2113+
2114+ # compute service for destination
2115+ s_ref = self._create_compute_service(host=i_ref['host'])
2116+ # compute service for original host
2117+ s_ref2 = self._create_compute_service(host=dest,
2118+ hypervisor_version=12002)
2119+
2120+ # mocks
2121+ driver = self.scheduler.driver
2122+ self.mox.StubOutWithMock(driver, 'mounted_on_same_shared_storage')
2123+ driver.mounted_on_same_shared_storage(mox.IgnoreArg(), i_ref, dest)
2124+
2125+ self.mox.ReplayAll()
2126+ try:
2127+ self.scheduler.driver._live_migration_common_check(self.context,
2128+ i_ref,
2129+ dest)
2130+ except exception.Invalid, e:
2131+ c = (e.message.find(_('Older hypervisor version')) >= 0)
2132+
2133+ self.assertTrue(c)
2134+ db.instance_destroy(self.context, instance_id)
2135+ db.service_destroy(self.context, s_ref['id'])
2136+ db.service_destroy(self.context, s_ref2['id'])
2137+
2138+ def test_live_migration_common_check_checking_cpuinfo_fail(self):
2139+ """Raise excetion when original host doen't have compatible cpu."""
2140+
2141+ dest = 'dummydest'
2142+ instance_id = self._create_instance()
2143+ i_ref = db.instance_get(self.context, instance_id)
2144+
2145+ # compute service for the original (source) host
2146+ s_ref = self._create_compute_service(host=i_ref['host'])
2147+ # compute service for the destination host
2148+ s_ref2 = self._create_compute_service(host=dest)
2149+
2150+ # mocks
2151+ driver = self.scheduler.driver
2152+ self.mox.StubOutWithMock(driver, 'mounted_on_same_shared_storage')
2153+ driver.mounted_on_same_shared_storage(mox.IgnoreArg(), i_ref, dest)
2154+ self.mox.StubOutWithMock(rpc, 'call', use_mock_anything=True)
2155+ rpc.call(mox.IgnoreArg(), mox.IgnoreArg(),
2156+ {"method": 'compare_cpu',
2157+ "args": {'cpu_info': s_ref2['compute_node'][0]['cpu_info']}}).\
2158+ AndRaise(rpc.RemoteError("doesn't have compatibility to", "", ""))
2159+
2160+ self.mox.ReplayAll()
2161+ try:
2162+ self.scheduler.driver._live_migration_common_check(self.context,
2163+ i_ref,
2164+ dest)
2165+ except rpc.RemoteError, e:
2166+ c = (e.message.find(_("doesn't have compatibility to")) >= 0)
2167+
2168+ self.assertTrue(c)
2169+ db.instance_destroy(self.context, instance_id)
2170+ db.service_destroy(self.context, s_ref['id'])
2171+ db.service_destroy(self.context, s_ref2['id'])
2172
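Taken together, the scheduler tests above stub three private checks and then an rpc cast, which suggests the ordering sketched below. This is only an illustration built from the names the tests use; the real schedule_live_migration() in nova/scheduler/driver.py is part of this branch and may differ in detail.

    from nova import db
    from nova import flags
    from nova import rpc

    FLAGS = flags.FLAGS


    def schedule_live_migration_sketch(driver, context, instance_id, dest):
        """Illustrative ordering only, based on the mocks in the tests above."""
        i_ref = db.instance_get(context, instance_id)

        # 1. The instance must be running and its source host alive.
        driver._live_migration_src_check(context, i_ref)
        # 2. The destination must be alive, different from the source,
        #    and have enough free memory.
        driver._live_migration_dest_check(context, i_ref, dest)
        # 3. Shared storage plus compatible hypervisor type/version and CPU.
        driver._live_migration_common_check(context, i_ref, dest)

        # The compute manager on the instance's current host drives the
        # migration; the scheduler only casts the request to it.
        topic = db.queue_get_for(context, FLAGS.compute_topic, i_ref['host'])
        rpc.cast(context, topic,
                 {'method': 'live_migration',
                  'args': {'instance_id': instance_id, 'dest': dest}})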
2173=== modified file 'nova/tests/test_service.py'
2174--- nova/tests/test_service.py 2011-02-23 23:14:16 +0000
2175+++ nova/tests/test_service.py 2011-03-10 06:27:59 +0000
2176@@ -30,6 +30,7 @@
2177 from nova import test
2178 from nova import service
2179 from nova import manager
2180+from nova.compute import manager as compute_manager
2181
2182 FLAGS = flags.FLAGS
2183 flags.DEFINE_string("fake_manager", "nova.tests.test_service.FakeManager",
2184@@ -251,3 +252,43 @@
2185 serv.report_state()
2186
2187 self.assert_(not serv.model_disconnected)
2188+
2189+ def test_compute_can_update_available_resource(self):
2190+ """Confirm compute updates their record of compute-service table."""
2191+ host = 'foo'
2192+ binary = 'nova-compute'
2193+ topic = 'compute'
2194+
2195+ # Mocks do not work without UnsetStubs() here.
2196+ self.mox.UnsetStubs()
2197+ ctxt = context.get_admin_context()
2198+ service_ref = db.service_create(ctxt, {'host': host,
2199+ 'binary': binary,
2200+ 'topic': topic})
2201+ serv = service.Service(host,
2202+ binary,
2203+ topic,
2204+ 'nova.compute.manager.ComputeManager')
2205+
2206+ # This test case verifies that update_available_resource() is called.
2207+ # Periodic tasks are not needed, so the intervals below are set to 0.
2208+ serv.report_interval = 0
2209+ serv.periodic_interval = 0
2210+
2211+ # Creating mocks
2212+ self.mox.StubOutWithMock(service.rpc.Connection, 'instance')
2213+ service.rpc.Connection.instance(new=mox.IgnoreArg())
2214+ service.rpc.Connection.instance(new=mox.IgnoreArg())
2215+ self.mox.StubOutWithMock(serv.manager.driver,
2216+ 'update_available_resource')
2217+ serv.manager.driver.update_available_resource(mox.IgnoreArg(), host)
2218+
2219+ # Just doing start()/stop() without confirming that a new db record is
2220+ # created, because update_available_resource() only works in a
2221+ # libvirt environment. This test case confirms that
2222+ # update_available_resource() is called; otherwise mox complains.
2223+ self.mox.ReplayAll()
2224+ serv.start()
2225+ serv.stop()
2226+
2227+ db.service_destroy(ctxt, service_ref['id'])
2228
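The test above only proves the hook is invoked on service start-up. A minimal manual invocation of the same hook might look roughly like this (the admin context and use of FLAGS.host are assumptions; on a host with no matching 'nova-compute' service record the driver raises exception.Invalid):

    from nova import context
    from nova import flags
    from nova import utils

    FLAGS = flags.FLAGS

    # Build the compute manager the same way the tests do and poke the
    # virt driver hook directly.
    compute = utils.import_object(FLAGS.compute_manager)
    compute.driver.update_available_resource(context.get_admin_context(),
                                             FLAGS.host)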
2229=== modified file 'nova/tests/test_virt.py'
2230--- nova/tests/test_virt.py 2011-03-09 23:45:00 +0000
2231+++ nova/tests/test_virt.py 2011-03-10 06:27:59 +0000
2232@@ -14,21 +14,28 @@
2233 # License for the specific language governing permissions and limitations
2234 # under the License.
2235
2236+import eventlet
2237+import mox
2238 import os
2239+import sys
2240
2241-import eventlet
2242 from xml.etree.ElementTree import fromstring as xml_to_tree
2243 from xml.dom.minidom import parseString as xml_to_dom
2244
2245 from nova import context
2246 from nova import db
2247+from nova import exception
2248 from nova import flags
2249 from nova import test
2250 from nova import utils
2251 from nova.api.ec2 import cloud
2252 from nova.auth import manager
2253+from nova.compute import manager as compute_manager
2254+from nova.compute import power_state
2255+from nova.db.sqlalchemy import models
2256 from nova.virt import libvirt_conn
2257
2258+libvirt = None
2259 FLAGS = flags.FLAGS
2260 flags.DECLARE('instances_path', 'nova.compute.manager')
2261
2262@@ -103,11 +110,28 @@
2263 libvirt_conn._late_load_cheetah()
2264 self.flags(fake_call=True)
2265 self.manager = manager.AuthManager()
2266+
2267+ try:
2268+ pjs = self.manager.get_projects()
2269+ pjs = [p for p in pjs if p.name == 'fake']
2270+ if 0 != len(pjs):
2271+ self.manager.delete_project(pjs[0])
2272+
2273+ users = self.manager.get_users()
2274+ users = [u for u in users if u.name == 'fake']
2275+ if 0 != len(users):
2276+ self.manager.delete_user(users[0])
2277+ except Exception, e:
2278+ pass
2279+
2280+ users = self.manager.get_users()
2281 self.user = self.manager.create_user('fake', 'fake', 'fake',
2282 admin=True)
2283 self.project = self.manager.create_project('fake', 'fake', 'fake')
2284 self.network = utils.import_object(FLAGS.network_manager)
2285+ self.context = context.get_admin_context()
2286 FLAGS.instances_path = ''
2287+ self.call_libvirt_dependant_setup = False
2288
2289 test_ip = '10.11.12.13'
2290 test_instance = {'memory_kb': '1024000',
2291@@ -119,6 +143,58 @@
2292 'bridge': 'br101',
2293 'instance_type': 'm1.small'}
2294
2295+ def lazy_load_library_exists(self):
2296+ """check if libvirt is available."""
2297+ # try to connect libvirt. if fail, skip test.
2298+ try:
2299+ import libvirt
2300+ import libxml2
2301+ except ImportError:
2302+ return False
2303+ global libvirt
2304+ libvirt = __import__('libvirt')
2305+ libvirt_conn.libvirt = __import__('libvirt')
2306+ libvirt_conn.libxml2 = __import__('libxml2')
2307+ return True
2308+
2309+ def create_fake_libvirt_mock(self, **kwargs):
2310+ """Defining mocks for LibvirtConnection(libvirt is not used)."""
2311+
2312+ # A fake libvirt.virConnect
2313+ class FakeLibvirtConnection(object):
2314+ pass
2315+
2316+ # A fake libvirt_conn.IptablesFirewallDriver
2317+ class FakeIptablesFirewallDriver(object):
2318+
2319+ def __init__(self, **kwargs):
2320+ pass
2321+
2322+ def setattr(self, key, val):
2323+ self.__setattr__(key, val)
2324+
2325+ # Creating mocks
2326+ fake = FakeLibvirtConnection()
2327+ fakeip = FakeIptablesFirewallDriver
2328+ # Customizing above fake if necessary
2329+ for key, val in kwargs.items():
2330+ fake.__setattr__(key, val)
2331+
2332+ # Inevitable mocks for libvirt_conn.LibvirtConnection
2333+ self.mox.StubOutWithMock(libvirt_conn.utils, 'import_class')
2334+ libvirt_conn.utils.import_class(mox.IgnoreArg()).AndReturn(fakeip)
2335+ self.mox.StubOutWithMock(libvirt_conn.LibvirtConnection, '_conn')
2336+ libvirt_conn.LibvirtConnection._conn = fake
2337+
2338+ def create_service(self, **kwargs):
2339+ service_ref = {'host': kwargs.get('host', 'dummy'),
2340+ 'binary': 'nova-compute',
2341+ 'topic': 'compute',
2342+ 'report_count': 0,
2343+ 'availability_zone': 'zone'}
2344+
2345+ return db.service_create(context.get_admin_context(), service_ref)
2346+
2347 def test_xml_and_uri_no_ramdisk_no_kernel(self):
2348 instance_data = dict(self.test_instance)
2349 self._check_xml_and_uri(instance_data,
2350@@ -258,8 +334,8 @@
2351 expected_result,
2352 '%s failed common check %d' % (xml, i))
2353
2354- # This test is supposed to make sure we don't override a specifically
2355- # set uri
2356+ # This test is supposed to make sure we don't
2357+ # override a specifically set uri
2358 #
2359 # Deliberately not just assigning this string to FLAGS.libvirt_uri and
2360 # checking against that later on. This way we make sure the
2361@@ -273,6 +349,150 @@
2362 self.assertEquals(uri, testuri)
2363 db.instance_destroy(user_context, instance_ref['id'])
2364
2365+ def test_update_available_resource_works_correctly(self):
2366+ """Confirm compute_node table is updated successfully."""
2367+ org_path = FLAGS.instances_path
2368+ FLAGS.instances_path = '.'
2369+
2370+ # Prepare mocks
2371+ def getVersion():
2372+ return 12003
2373+
2374+ def getType():
2375+ return 'qemu'
2376+
2377+ def listDomainsID():
2378+ return []
2379+
2380+ service_ref = self.create_service(host='dummy')
2381+ self.create_fake_libvirt_mock(getVersion=getVersion,
2382+ getType=getType,
2383+ listDomainsID=listDomainsID)
2384+ self.mox.StubOutWithMock(libvirt_conn.LibvirtConnection,
2385+ 'get_cpu_info')
2386+ libvirt_conn.LibvirtConnection.get_cpu_info().AndReturn('cpuinfo')
2387+
2388+ # Start test
2389+ self.mox.ReplayAll()
2390+ conn = libvirt_conn.LibvirtConnection(False)
2391+ conn.update_available_resource(self.context, 'dummy')
2392+ service_ref = db.service_get(self.context, service_ref['id'])
2393+ compute_node = service_ref['compute_node'][0]
2394+
2395+ if sys.platform.upper() == 'LINUX2':
2396+ self.assertTrue(compute_node['vcpus'] >= 0)
2397+ self.assertTrue(compute_node['memory_mb'] > 0)
2398+ self.assertTrue(compute_node['local_gb'] > 0)
2399+ self.assertTrue(compute_node['vcpus_used'] == 0)
2400+ self.assertTrue(compute_node['memory_mb_used'] > 0)
2401+ self.assertTrue(compute_node['local_gb_used'] > 0)
2402+ self.assertTrue(len(compute_node['hypervisor_type']) > 0)
2403+ self.assertTrue(compute_node['hypervisor_version'] > 0)
2404+ else:
2405+ self.assertTrue(compute_node['vcpus'] >= 0)
2406+ self.assertTrue(compute_node['memory_mb'] == 0)
2407+ self.assertTrue(compute_node['local_gb'] > 0)
2408+ self.assertTrue(compute_node['vcpus_used'] == 0)
2409+ self.assertTrue(compute_node['memory_mb_used'] == 0)
2410+ self.assertTrue(compute_node['local_gb_used'] > 0)
2411+ self.assertTrue(len(compute_node['hypervisor_type']) > 0)
2412+ self.assertTrue(compute_node['hypervisor_version'] > 0)
2413+
2414+ db.service_destroy(self.context, service_ref['id'])
2415+ FLAGS.instances_path = org_path
2416+
2417+ def test_update_resource_info_no_compute_record_found(self):
2418+ """Raise exception if no recorde found on services table."""
2419+ org_path = FLAGS.instances_path = ''
2420+ FLAGS.instances_path = '.'
2421+ self.create_fake_libvirt_mock()
2422+
2423+ self.mox.ReplayAll()
2424+ conn = libvirt_conn.LibvirtConnection(False)
2425+ self.assertRaises(exception.Invalid,
2426+ conn.update_available_resource,
2427+ self.context, 'dummy')
2428+
2429+ FLAGS.instances_path = org_path
2430+
2431+ def test_ensure_filtering_rules_for_instance_timeout(self):
2432+ """ensure_filtering_fules_for_instance() finishes with timeout."""
2433+ # Skip if non-libvirt environment
2434+ if not self.lazy_load_library_exists():
2435+ return
2436+
2437+ # Preparing mocks
2438+ def fake_none(self):
2439+ return
2440+
2441+ def fake_raise(self):
2442+ raise libvirt.libvirtError('ERR')
2443+
2444+ self.create_fake_libvirt_mock(nwfilterLookupByName=fake_raise)
2445+ instance_ref = db.instance_create(self.context, self.test_instance)
2446+
2447+ # Start test
2448+ self.mox.ReplayAll()
2449+ try:
2450+ conn = libvirt_conn.LibvirtConnection(False)
2451+ conn.firewall_driver.setattr('setup_basic_filtering', fake_none)
2452+ conn.firewall_driver.setattr('prepare_instance_filter', fake_none)
2453+ conn.ensure_filtering_rules_for_instance(instance_ref)
2454+ except exception.Error, e:
2455+ c1 = (0 <= e.message.find('Timeout migrating for'))
2456+ self.assertTrue(c1)
2457+
2458+ db.instance_destroy(self.context, instance_ref['id'])
2459+
2460+ def test_live_migration_raises_exception(self):
2461+ """Confirms recover method is called when exceptions are raised."""
2462+ # Skip if non-libvirt environment
2463+ if not self.lazy_load_library_exists():
2464+ return
2465+
2466+ # Preparing data
2467+ self.compute = utils.import_object(FLAGS.compute_manager)
2468+ instance_dict = {'host': 'fake', 'state': power_state.RUNNING,
2469+ 'state_description': 'running'}
2470+ instance_ref = db.instance_create(self.context, self.test_instance)
2471+ instance_ref = db.instance_update(self.context, instance_ref['id'],
2472+ instance_dict)
2473+ vol_dict = {'status': 'migrating', 'size': 1}
2474+ volume_ref = db.volume_create(self.context, vol_dict)
2475+ db.volume_attached(self.context, volume_ref['id'], instance_ref['id'],
2476+ '/dev/fake')
2477+
2478+ # Preparing mocks
2479+ vdmock = self.mox.CreateMock(libvirt.virDomain)
2480+ self.mox.StubOutWithMock(vdmock, "migrateToURI")
2481+ vdmock.migrateToURI(FLAGS.live_migration_uri % 'dest',
2482+ mox.IgnoreArg(),
2483+ None, FLAGS.live_migration_bandwidth).\
2484+ AndRaise(libvirt.libvirtError('ERR'))
2485+
2486+ def fake_lookup(instance_name):
2487+ if instance_name == instance_ref.name:
2488+ return vdmock
2489+
2490+ self.create_fake_libvirt_mock(lookupByName=fake_lookup)
2491+
2492+ # Start test
2493+ self.mox.ReplayAll()
2494+ conn = libvirt_conn.LibvirtConnection(False)
2495+ self.assertRaises(libvirt.libvirtError,
2496+ conn._live_migration,
2497+ self.context, instance_ref, 'dest', '',
2498+ self.compute.recover_live_migration)
2499+
2500+ instance_ref = db.instance_get(self.context, instance_ref['id'])
2501+ self.assertTrue(instance_ref['state_description'] == 'running')
2502+ self.assertTrue(instance_ref['state'] == power_state.RUNNING)
2503+ volume_ref = db.volume_get(self.context, volume_ref['id'])
2504+ self.assertTrue(volume_ref['status'] == 'in-use')
2505+
2506+ db.volume_destroy(self.context, volume_ref['id'])
2507+ db.instance_destroy(self.context, instance_ref['id'])
2508+
2509 def tearDown(self):
2510 self.manager.delete_project(self.project)
2511 self.manager.delete_user(self.user)
2512
2513=== modified file 'nova/tests/test_volume.py'
2514--- nova/tests/test_volume.py 2011-03-07 01:25:01 +0000
2515+++ nova/tests/test_volume.py 2011-03-10 06:27:59 +0000
2516@@ -20,6 +20,8 @@
2517
2518 """
2519
2520+import cStringIO
2521+
2522 from nova import context
2523 from nova import exception
2524 from nova import db
2525@@ -173,3 +175,196 @@
2526 # each of them having a different FLAG for storage_node
2527 # This will allow us to test cross-node interactions
2528 pass
2529+
2530+
2531+class DriverTestCase(test.TestCase):
2532+ """Base Test class for Drivers."""
2533+ driver_name = "nova.volume.driver.FakeAOEDriver"
2534+
2535+ def setUp(self):
2536+ super(DriverTestCase, self).setUp()
2537+ self.flags(volume_driver=self.driver_name,
2538+ logging_default_format_string="%(message)s")
2539+ self.volume = utils.import_object(FLAGS.volume_manager)
2540+ self.context = context.get_admin_context()
2541+ self.output = ""
2542+
2543+ def _fake_execute(_command, *_args, **_kwargs):
2544+ """Fake _execute."""
2545+ return self.output, None
2546+ self.volume.driver._execute = _fake_execute
2547+ self.volume.driver._sync_execute = _fake_execute
2548+
2549+ log = logging.getLogger()
2550+ self.stream = cStringIO.StringIO()
2551+ log.addHandler(logging.StreamHandler(self.stream))
2552+
2553+ inst = {}
2554+ self.instance_id = db.instance_create(self.context, inst)['id']
2555+
2556+ def tearDown(self):
2557+ super(DriverTestCase, self).tearDown()
2558+
2559+ def _attach_volume(self):
2560+ """Attach volumes to an instance. This function also sets
2561+ a fake log message."""
2562+ return []
2563+
2564+ def _detach_volume(self, volume_id_list):
2565+ """Detach volumes from an instance."""
2566+ for volume_id in volume_id_list:
2567+ db.volume_detached(self.context, volume_id)
2568+ self.volume.delete_volume(self.context, volume_id)
2569+
2570+
2571+class AOETestCase(DriverTestCase):
2572+ """Test Case for AOEDriver"""
2573+ driver_name = "nova.volume.driver.AOEDriver"
2574+
2575+ def setUp(self):
2576+ super(AOETestCase, self).setUp()
2577+
2578+ def tearDown(self):
2579+ super(AOETestCase, self).tearDown()
2580+
2581+ def _attach_volume(self):
2582+ """Attach volumes to an instance. This function also sets
2583+ a fake log message."""
2584+ volume_id_list = []
2585+ for index in xrange(3):
2586+ vol = {}
2587+ vol['size'] = 0
2588+ volume_id = db.volume_create(self.context,
2589+ vol)['id']
2590+ self.volume.create_volume(self.context, volume_id)
2591+
2592+ # each volume has a different mountpoint
2593+ mountpoint = "/dev/sd" + chr((ord('b') + index))
2594+ db.volume_attached(self.context, volume_id, self.instance_id,
2595+ mountpoint)
2596+
2597+ (shelf_id, blade_id) = db.volume_get_shelf_and_blade(self.context,
2598+ volume_id)
2599+ self.output += "%s %s eth0 /dev/nova-volumes/vol-foo auto run\n" \
2600+ % (shelf_id, blade_id)
2601+
2602+ volume_id_list.append(volume_id)
2603+
2604+ return volume_id_list
2605+
2606+ def test_check_for_export_with_no_volume(self):
2607+ """No log message when no volume is attached to an instance."""
2608+ self.stream.truncate(0)
2609+ self.volume.check_for_export(self.context, self.instance_id)
2610+ self.assertEqual(self.stream.getvalue(), '')
2611+
2612+ def test_check_for_export_with_all_vblade_processes(self):
2613+ """No log message when all the vblade processes are running."""
2614+ volume_id_list = self._attach_volume()
2615+
2616+ self.stream.truncate(0)
2617+ self.volume.check_for_export(self.context, self.instance_id)
2618+ self.assertEqual(self.stream.getvalue(), '')
2619+
2620+ self._detach_volume(volume_id_list)
2621+
2622+ def test_check_for_export_with_vblade_process_missing(self):
2623+ """Output a warning message when some vblade processes aren't
2624+ running."""
2625+ volume_id_list = self._attach_volume()
2626+
2627+ # the first vblade process isn't running
2628+ self.output = self.output.replace("run", "down", 1)
2629+ (shelf_id, blade_id) = db.volume_get_shelf_and_blade(self.context,
2630+ volume_id_list[0])
2631+
2632+ msg_is_match = False
2633+ self.stream.truncate(0)
2634+ try:
2635+ self.volume.check_for_export(self.context, self.instance_id)
2636+ except exception.ProcessExecutionError, e:
2637+ volume_id = volume_id_list[0]
2638+ msg = _("Cannot confirm exported volume id:%(volume_id)s. "
2639+ "vblade process for e%(shelf_id)s.%(blade_id)s "
2640+ "isn't running.") % locals()
2641+
2642+ msg_is_match = (0 <= e.message.find(msg))
2643+
2644+ self.assertTrue(msg_is_match)
2645+ self._detach_volume(volume_id_list)
2646+
2647+
2648+class ISCSITestCase(DriverTestCase):
2649+ """Test Case for ISCSIDriver"""
2650+ driver_name = "nova.volume.driver.ISCSIDriver"
2651+
2652+ def setUp(self):
2653+ super(ISCSITestCase, self).setUp()
2654+
2655+ def tearDown(self):
2656+ super(ISCSITestCase, self).tearDown()
2657+
2658+ def _attach_volume(self):
2659+ """Attach volumes to an instance. This function also sets
2660+ a fake log message."""
2661+ volume_id_list = []
2662+ for index in xrange(3):
2663+ vol = {}
2664+ vol['size'] = 0
2665+ vol_ref = db.volume_create(self.context, vol)
2666+ self.volume.create_volume(self.context, vol_ref['id'])
2667+ vol_ref = db.volume_get(self.context, vol_ref['id'])
2668+
2669+ # each volume has a different mountpoint
2670+ mountpoint = "/dev/sd" + chr((ord('b') + index))
2671+ db.volume_attached(self.context, vol_ref['id'], self.instance_id,
2672+ mountpoint)
2673+ volume_id_list.append(vol_ref['id'])
2674+
2675+ return volume_id_list
2676+
2677+ def test_check_for_export_with_no_volume(self):
2678+ """No log message when no volume is attached to an instance."""
2679+ self.stream.truncate(0)
2680+ self.volume.check_for_export(self.context, self.instance_id)
2681+ self.assertEqual(self.stream.getvalue(), '')
2682+
2683+ def test_check_for_export_with_all_volume_exported(self):
2684+ """No log message when all the vblade processes are running."""
2685+ volume_id_list = self._attach_volume()
2686+
2687+ self.mox.StubOutWithMock(self.volume.driver, '_execute')
2688+ for i in volume_id_list:
2689+ tid = db.volume_get_iscsi_target_num(self.context, i)
2690+ self.volume.driver._execute("sudo ietadm --op show --tid=%(tid)d"
2691+ % locals())
2692+
2693+ self.stream.truncate(0)
2694+ self.mox.ReplayAll()
2695+ self.volume.check_for_export(self.context, self.instance_id)
2696+ self.assertEqual(self.stream.getvalue(), '')
2697+ self.mox.UnsetStubs()
2698+
2699+ self._detach_volume(volume_id_list)
2700+
2701+ def test_check_for_export_with_some_volume_missing(self):
2702+ """Output a warning message when some volumes are not recognied
2703+ by ietd."""
2704+ volume_id_list = self._attach_volume()
2705+
2706+ # the first volume is not exported by ietd
2707+ tid = db.volume_get_iscsi_target_num(self.context, volume_id_list[0])
2708+ self.mox.StubOutWithMock(self.volume.driver, '_execute')
2709+ self.volume.driver._execute("sudo ietadm --op show --tid=%(tid)d"
2710+ % locals()).AndRaise(exception.ProcessExecutionError())
2711+
2712+ self.mox.ReplayAll()
2713+ self.assertRaises(exception.ProcessExecutionError,
2714+ self.volume.check_for_export,
2715+ self.context,
2716+ self.instance_id)
2717+ msg = _("Cannot confirm exported volume id:%s.") % volume_id_list[0]
2718+ self.assertTrue(0 <= self.stream.getvalue().find(msg))
2719+ self.mox.UnsetStubs()
2720+
2721+ self._detach_volume(volume_id_list)
2722
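From the ISCSI tests above, check_for_export() asks ietd about each attached volume's target and re-raises ProcessExecutionError after logging which volume could not be confirmed. A sketch of that behaviour, reconstructed only from what the tests exercise (not the branch's exact driver code; the function name is invented):

    from nova import db
    from nova import exception
    from nova import log as logging

    LOG = logging.getLogger('nova.volume.manager')


    def check_for_export_sketch(driver, context, instance_id):
        """Illustrative only: confirm every attached volume is still exported."""
        instance_ref = db.instance_get(context, instance_id)
        for volume in instance_ref['volumes']:
            tid = db.volume_get_iscsi_target_num(context, volume['id'])
            try:
                # ietd only reports targets it is actually serving.
                driver._execute("sudo ietadm --op show --tid=%d" % tid)
            except exception.ProcessExecutionError:
                LOG.warn("Cannot confirm exported volume id:%s." % volume['id'])
                raise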
2723=== added file 'nova/virt/cpuinfo.xml.template'
2724--- nova/virt/cpuinfo.xml.template 1970-01-01 00:00:00 +0000
2725+++ nova/virt/cpuinfo.xml.template 2011-03-10 06:27:59 +0000
2726@@ -0,0 +1,9 @@
2727+<cpu>
2728+ <arch>$arch</arch>
2729+ <model>$model</model>
2730+ <vendor>$vendor</vendor>
2731+ <topology sockets="$topology.sockets" cores="$topology.cores" threads="$topology.threads"/>
2732+#for $var in $features
2733+ <features name="$var" />
2734+#end for
2735+</cpu>
2736
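The template above is rendered from the dict built by get_cpu_info() further down in this diff. A hypothetical rendering for illustration (the CPU values are invented; searchList is passed as a list here, which is the documented Cheetah form):

    from Cheetah.Template import Template

    cpu_info = {'arch': 'x86_64', 'model': 'Nehalem', 'vendor': 'Intel',
                'topology': {'sockets': '1', 'cores': '2', 'threads': '2'},
                'features': ['vmx', 'ssse3']}
    template = open('nova/virt/cpuinfo.xml.template').read()
    print Template(template, searchList=[cpu_info])

    # Expected output (illustrative):
    # <cpu>
    #   <arch>x86_64</arch>
    #   <model>Nehalem</model>
    #   <vendor>Intel</vendor>
    #   <topology sockets="1" cores="2" threads="2"/>
    #   <features name="vmx" />
    #   <features name="ssse3" />
    # </cpu>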
2737=== modified file 'nova/virt/fake.py'
2738--- nova/virt/fake.py 2011-02-28 17:39:23 +0000
2739+++ nova/virt/fake.py 2011-03-10 06:27:59 +0000
2740@@ -407,6 +407,27 @@
2741 """
2742 return True
2743
2744+ def update_available_resource(self, ctxt, host):
2745+ """This method is supported only by libvirt."""
2746+ return
2747+
2748+ def compare_cpu(self, xml):
2749+ """This method is supported only by libvirt."""
2750+ raise NotImplementedError('This method is supported only by libvirt.')
2751+
2752+ def ensure_filtering_rules_for_instance(self, instance_ref):
2753+ """This method is supported only by libvirt."""
2754+ raise NotImplementedError('This method is supported only by libvirt.')
2755+
2756+ def live_migration(self, context, instance_ref, dest,
2757+ post_method, recover_method):
2758+ """This method is supported only by libvirt."""
2759+ return
2760+
2761+ def unfilter_instance(self, instance_ref):
2762+ """This method is supported only by libvirt."""
2763+ raise NotImplementedError('This method is supported only by libvirt.')
2764+
2765
2766 class FakeInstance(object):
2767
2768
2769=== modified file 'nova/virt/libvirt_conn.py'
2770--- nova/virt/libvirt_conn.py 2011-03-10 04:42:11 +0000
2771+++ nova/virt/libvirt_conn.py 2011-03-10 06:27:59 +0000
2772@@ -36,10 +36,13 @@
2773
2774 """
2775
2776+import multiprocessing
2777 import os
2778 import shutil
2779+import sys
2780 import random
2781 import subprocess
2782+import time
2783 import uuid
2784 from xml.dom import minidom
2785
2786@@ -70,6 +73,7 @@
2787 LOG = logging.getLogger('nova.virt.libvirt_conn')
2788
2789 FLAGS = flags.FLAGS
2790+flags.DECLARE('live_migration_retry_count', 'nova.compute.manager')
2791 # TODO(vish): These flags should probably go into a shared location
2792 flags.DEFINE_string('rescue_image_id', 'ami-rescue', 'Rescue ami image')
2793 flags.DEFINE_string('rescue_kernel_id', 'aki-rescue', 'Rescue aki image')
2794@@ -100,6 +104,17 @@
2795 flags.DEFINE_string('firewall_driver',
2796 'nova.virt.libvirt_conn.IptablesFirewallDriver',
2797 'Firewall driver (defaults to iptables)')
2798+flags.DEFINE_string('cpuinfo_xml_template',
2799+ utils.abspath('virt/cpuinfo.xml.template'),
2800+ 'CpuInfo XML Template (used only by live migration for now)')
2801+flags.DEFINE_string('live_migration_uri',
2802+ "qemu+tcp://%s/system",
2803+ 'Define protocol used by live_migration feature')
2804+flags.DEFINE_string('live_migration_flag',
2805+ "VIR_MIGRATE_UNDEFINE_SOURCE, VIR_MIGRATE_PEER2PEER",
2806+ 'Define live migration behavior.')
2807+flags.DEFINE_integer('live_migration_bandwidth', 0,
2808+ 'Maximum bandwidth to be used during live migration (0 means no limit)')
2809
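The live_migration_flag string above is a comma-separated list of libvirt flag names; _live_migration() (near the end of this diff) resolves each name on the libvirt module and ORs them into a single value for migrateToURI(). A standalone illustration of that parsing, assuming the libvirt python bindings are importable:

    import libvirt

    flag_string = "VIR_MIGRATE_UNDEFINE_SOURCE, VIR_MIGRATE_PEER2PEER"
    flag_names = [name.strip() for name in flag_string.split(',')]
    # Look each symbolic name up on the libvirt module and OR them together.
    logical_sum = reduce(lambda x, y: x | y,
                         [getattr(libvirt, name) for name in flag_names])
    # With these two flags the value is typically 18
    # (VIR_MIGRATE_PEER2PEER=2 | VIR_MIGRATE_UNDEFINE_SOURCE=16).
    print logical_sum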
2810
2811 def get_connection(read_only):
2812@@ -146,6 +161,7 @@
2813 self.libvirt_uri = self.get_uri()
2814
2815 self.libvirt_xml = open(FLAGS.libvirt_xml_template).read()
2816+ self.cpuinfo_xml = open(FLAGS.cpuinfo_xml_template).read()
2817 self._wrapped_conn = None
2818 self.read_only = read_only
2819
2820@@ -851,6 +867,158 @@
2821
2822 return interfaces
2823
2824+ def get_vcpu_total(self):
2825+ """Get vcpu number of physical computer.
2826+
2827+ :returns: the number of cpu cores.
2828+
2829+ """
2830+
2831+ # On certain platforms, this will raise a NotImplementedError.
2832+ try:
2833+ return multiprocessing.cpu_count()
2834+ except NotImplementedError:
2835+ LOG.warn(_("Cannot get the number of cpu, because this "
2836+ "function is not implemented for this platform. "
2837+ "This error can be safely ignored for now."))
2838+ return 0
2839+
2840+ def get_memory_mb_total(self):
2841+ """Get the total memory size(MB) of physical computer.
2842+
2843+ :returns: the total amount of memory(MB).
2844+
2845+ """
2846+
2847+ if sys.platform.upper() != 'LINUX2':
2848+ return 0
2849+
2850+ meminfo = open('/proc/meminfo').read().split()
2851+ idx = meminfo.index('MemTotal:')
2852+ # transforming kb to mb.
2853+ return int(meminfo[idx + 1]) / 1024
2854+
2855+ def get_local_gb_total(self):
2856+ """Get the total hdd size(GB) of physical computer.
2857+
2858+ :returns:
2859+ The total amount of HDD(GB).
2860+ Note that this value shows a partition where
2861+ NOVA-INST-DIR/instances mounts.
2862+
2863+ """
2864+
2865+ hddinfo = os.statvfs(FLAGS.instances_path)
2866+ return hddinfo.f_frsize * hddinfo.f_blocks / 1024 / 1024 / 1024
2867+
2868+ def get_vcpu_used(self):
2869+ """ Get vcpu usage number of physical computer.
2870+
2871+ :returns: The total number of vcpu that currently used.
2872+
2873+ """
2874+
2875+ total = 0
2876+ for dom_id in self._conn.listDomainsID():
2877+ dom = self._conn.lookupByID(dom_id)
2878+ total += len(dom.vcpus()[1])
2879+ return total
2880+
2881+ def get_memory_mb_used(self):
2882+ """Get the free memory size(MB) of physical computer.
2883+
2884+ :returns: the total usage of memory(MB).
2885+
2886+ """
2887+
2888+ if sys.platform.upper() != 'LINUX2':
2889+ return 0
2890+
2891+ m = open('/proc/meminfo').read().split()
2892+ idx1 = m.index('MemFree:')
2893+ idx2 = m.index('Buffers:')
2894+ idx3 = m.index('Cached:')
2895+ avail = (int(m[idx1 + 1]) + int(m[idx2 + 1]) + int(m[idx3 + 1])) / 1024
2896+ return self.get_memory_mb_total() - avail
2897+
2898+ def get_local_gb_used(self):
2899+ """Get the free hdd size(GB) of physical computer.
2900+
2901+ :returns:
2902+ The total usage of HDD(GB).
2903+ Note that this value shows a partition where
2904+ NOVA-INST-DIR/instances mounts.
2905+
2906+ """
2907+
2908+ hddinfo = os.statvfs(FLAGS.instances_path)
2909+ avail = hddinfo.f_frsize * hddinfo.f_bavail / 1024 / 1024 / 1024
2910+ return self.get_local_gb_total() - avail
2911+
2912+ def get_hypervisor_type(self):
2913+ """Get hypervisor type.
2914+
2915+ :returns: hypervisor type (ex. qemu)
2916+
2917+ """
2918+
2919+ return self._conn.getType()
2920+
2921+ def get_hypervisor_version(self):
2922+ """Get hypervisor version.
2923+
2924+ :returns: hypervisor version (ex. 12003)
2925+
2926+ """
2927+
2928+ return self._conn.getVersion()
2929+
2930+ def get_cpu_info(self):
2931+ """Get cpuinfo information.
2932+
2933+ Obtains cpu feature from virConnect.getCapabilities,
2934+ and returns as a json string.
2935+
2936+ :return: see above description
2937+
2938+ """
2939+
2940+ xml = self._conn.getCapabilities()
2941+ xml = libxml2.parseDoc(xml)
2942+ nodes = xml.xpathEval('//cpu')
2943+ if len(nodes) != 1:
2944+ raise exception.Invalid(_("Invalid xml. '<cpu>' must be 1,"
2945+ "but %d\n") % len(nodes)
2946+ + xml.serialize())
2947+
2948+ cpu_info = dict()
2949+ cpu_info['arch'] = xml.xpathEval('//cpu/arch')[0].getContent()
2950+ cpu_info['model'] = xml.xpathEval('//cpu/model')[0].getContent()
2951+ cpu_info['vendor'] = xml.xpathEval('//cpu/vendor')[0].getContent()
2952+
2953+ topology_node = xml.xpathEval('//cpu/topology')[0].get_properties()
2954+ topology = dict()
2955+ while topology_node is not None:
2956+ name = topology_node.get_name()
2957+ topology[name] = topology_node.getContent()
2958+ topology_node = topology_node.get_next()
2959+
2960+ keys = ['cores', 'sockets', 'threads']
2961+ tkeys = topology.keys()
2962+ if list(set(tkeys)) != list(set(keys)):
2963+ ks = ', '.join(keys)
2964+ raise exception.Invalid(_("Invalid xml: topology(%(topology)s) "
2965+ "must have %(ks)s") % locals())
2966+
2967+ feature_nodes = xml.xpathEval('//cpu/feature')
2968+ features = list()
2969+ for nodes in feature_nodes:
2970+ features.append(nodes.get_properties().getContent())
2971+
2972+ cpu_info['topology'] = topology
2973+ cpu_info['features'] = features
2974+ return utils.dumps(cpu_info)
2975+
2976 def block_stats(self, instance_name, disk):
2977 """
2978 Note that this function takes an instance name, not an Instance, so
2979@@ -881,6 +1049,207 @@
2980 def refresh_security_group_members(self, security_group_id):
2981 self.firewall_driver.refresh_security_group_members(security_group_id)
2982
2983+ def update_available_resource(self, ctxt, host):
2984+ """Updates compute manager resource info on ComputeNode table.
2985+
2986+ This method is called when nova-compute launches, and
2987+ whenever the admin executes "nova-manage service update_resource".
2988+
2989+ :param ctxt: security context
2990+ :param host: hostname that compute manager is currently running
2991+
2992+ """
2993+
2994+ try:
2995+ service_ref = db.service_get_all_compute_by_host(ctxt, host)[0]
2996+ except exception.NotFound:
2997+ raise exception.Invalid(_("Cannot update compute manager "
2998+ "specific info, because no service "
2999+ "record was found."))
3000+
3001+ # Updating host information
3002+ dic = {'vcpus': self.get_vcpu_total(),
3003+ 'memory_mb': self.get_memory_mb_total(),
3004+ 'local_gb': self.get_local_gb_total(),
3005+ 'vcpus_used': self.get_vcpu_used(),
3006+ 'memory_mb_used': self.get_memory_mb_used(),
3007+ 'local_gb_used': self.get_local_gb_used(),
3008+ 'hypervisor_type': self.get_hypervisor_type(),
3009+ 'hypervisor_version': self.get_hypervisor_version(),
3010+ 'cpu_info': self.get_cpu_info()}
3011+
3012+ compute_node_ref = service_ref['compute_node']
3013+ if not compute_node_ref:
3014+ LOG.info(_('Compute_service record created for %s ') % host)
3015+ dic['service_id'] = service_ref['id']
3016+ db.compute_node_create(ctxt, dic)
3017+ else:
3018+ LOG.info(_('Compute_service record updated for %s ') % host)
3019+ db.compute_node_update(ctxt, compute_node_ref[0]['id'], dic)
3020+
3021+ def compare_cpu(self, cpu_info):
3022+ """Checks the host cpu is compatible to a cpu given by xml.
3023+
3024+ "xml" must be a part of libvirt.openReadonly().getCapabilities().
3025+ return values follows by virCPUCompareResult.
3026+ if 0 > return value, do live migration.
3027+ 'http://libvirt.org/html/libvirt-libvirt.html#virCPUCompareResult'
3028+
3029+ :param cpu_info: json string that shows cpu feature(see get_cpu_info())
3030+ :returns:
3031+ None. if given cpu info is not compatible to this server,
3032+ raise exception.
3033+
3034+ """
3035+
3036+ LOG.info(_('Instance launched has CPU info:\n%s') % cpu_info)
3037+ dic = utils.loads(cpu_info)
3038+ xml = str(Template(self.cpuinfo_xml, searchList=dic))
3039+ LOG.info(_('to xml...\n:%s ') % xml)
3040+
3041+ u = "http://libvirt.org/html/libvirt-libvirt.html#virCPUCompareResult"
3042+ m = _("CPU doesn't have compatibility.\n\n%(ret)s\n\nRefer to %(u)s")
3043+ # unknown character exists in xml, then libvirt complains
3044+ try:
3045+ ret = self._conn.compareCPU(xml, 0)
3046+ except libvirt.libvirtError, e:
3047+ ret = e.message
3048+ LOG.error(m % locals())
3049+ raise
3050+
3051+ if ret <= 0:
3052+ raise exception.Invalid(m % locals())
3053+
3054+ return
3055+
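As the scheduler test earlier in this diff shows, the common check ships the source host's recorded cpu_info to the destination over rpc, where it ends up in compare_cpu(). A minimal local usage sketch (comparing a connection against its own cpu_info is only a smoke test; the connection is built via get_connection() as defined earlier in this file):

    from nova import exception
    from nova.virt import libvirt_conn

    conn = libvirt_conn.get_connection(read_only=False)
    cpu_info = conn.get_cpu_info()
    try:
        # Raises (or re-raises libvirtError) when the host cannot run guests
        # with the given cpu feature set; returns None when it can.
        conn.compare_cpu(cpu_info)
    except exception.Invalid:
        print "destination CPU is not compatible, refusing to migrate"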
3056+ def ensure_filtering_rules_for_instance(self, instance_ref):
3057+ """Setting up filtering rules and waiting for its completion.
3058+
3059+ To migrate an instance, filtering rules to hypervisors
3060+ and firewalls are inevitable on destination host.
3061+ (We wait only for the hypervisor filtering rules, since the
3062+ firewall rules can be set faster).
3063+
3064+ Concretely, the methods below must be called:
3065+ - setup_basic_filtering (for nova-basic, etc.)
3066+ - prepare_instance_filter(for nova-instance-instance-xxx, etc.)
3067+
3068+ to_xml may have to be called since it defines PROJNET, PROJMASK.
3069+ but libvirt migrates those value through migrateToURI(),
3070+ so , no need to be called.
3071+
3072+ Don't use thread for this method since migration should
3073+ not be started when setting-up filtering rules operations
3074+ are not completed.
3075+
3076+ :params instance_ref: nova.db.sqlalchemy.models.Instance object
3077+
3078+ """
3079+
3080+ # If no instance has ever launched on the destination host,
3081+ # basic filtering may not be set up yet, so set it here.
3082+ self.firewall_driver.setup_basic_filtering(instance_ref)
3083+ # Setting up nova-instance-instance-xx filters mainly.
3084+ self.firewall_driver.prepare_instance_filter(instance_ref)
3085+
3086+ # wait for completion
3087+ timeout_count = range(FLAGS.live_migration_retry_count)
3088+ while timeout_count:
3089+ try:
3090+ filter_name = 'nova-instance-%s' % instance_ref.name
3091+ self._conn.nwfilterLookupByName(filter_name)
3092+ break
3093+ except libvirt.libvirtError:
3094+ timeout_count.pop()
3095+ if len(timeout_count) == 0:
3096+ ec2_id = instance_ref['hostname']
3097+ iname = instance_ref.name
3098+ msg = _('Timeout migrating for %(ec2_id)s(%(iname)s)')
3099+ raise exception.Error(msg % locals())
3100+ time.sleep(1)
3101+
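The wait loop above simply polls libvirt until the per-instance nwfilter appears. The standalone equivalent below may help when reviewing; the filter name is made up and the connection URI (None, i.e. the local default) is an assumption:

    import libvirt

    conn = libvirt.openReadOnly(None)
    try:
        conn.nwfilterLookupByName('nova-instance-instance-00000001')
        print 'filter is defined'
    except libvirt.libvirtError:
        print 'filter not defined yet, keep waiting'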
3102+ def live_migration(self, ctxt, instance_ref, dest,
3103+ post_method, recover_method):
3104+ """Spawning live_migration operation for distributing high-load.
3105+
3106+ :params ctxt: security context
3107+ :params instance_ref:
3108+ nova.db.sqlalchemy.models.Instance object
3109+ instance object that is migrated.
3110+ :params dest: destination host
3111+ :params post_method:
3112+ post operation method.
3113+ expected nova.compute.manager.post_live_migration.
3114+ :params recover_method:
3115+ recovery method when any exception occurs.
3116+ expected nova.compute.manager.recover_live_migration.
3117+
3118+ """
3119+
3120+ greenthread.spawn(self._live_migration, ctxt, instance_ref, dest,
3121+ post_method, recover_method)
3122+
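As the docstring notes, the post and recover callbacks are expected to come from the compute manager. A hedged sketch of the expected call, with the manager-side method names taken from the docstring rather than from this hunk:

    # Sketch only: assumed caller in nova/compute/manager.py.
    self.driver.live_migration(context, instance_ref, dest,
                               self.post_live_migration,
                               self.recover_live_migration)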
3123+ def _live_migration(self, ctxt, instance_ref, dest,
3124+ post_method, recover_method):
3125+ """Do live migration.
3126+
3127+ :params ctxt: security context
3128+ :params instance_ref:
3129+ nova.db.sqlalchemy.models.Instance object
3130+ instance object that is migrated.
3131+ :params dest: destination host
3132+ :params post_method:
3133+ post operation method.
3134+ expected nova.compute.manager.post_live_migration.
3135+ :params recover_method:
3136+ recovery method when any exception occurs.
3137+ expected nova.compute.manager.recover_live_migration.
3138+
3139+ """
3140+
3141+ # Do live migration.
3142+ try:
3143+ flaglist = FLAGS.live_migration_flag.split(',')
3144+ flagvals = [getattr(libvirt, x.strip()) for x in flaglist]
3145+ logical_sum = reduce(lambda x, y: x | y, flagvals)
3146+
3147+ if self.read_only:
3148+ tmpconn = self._connect(self.libvirt_uri, False)
3149+ dom = tmpconn.lookupByName(instance_ref.name)
3150+ dom.migrateToURI(FLAGS.live_migration_uri % dest,
3151+ logical_sum,
3152+ None,
3153+ FLAGS.live_migration_bandwidth)
3154+ tmpconn.close()
3155+ else:
3156+ dom = self._conn.lookupByName(instance_ref.name)
3157+ dom.migrateToURI(FLAGS.live_migration_uri % dest,
3158+ logical_sum,
3159+ None,
3160+ FLAGS.live_migration_bandwidth)
3161+
3162+ except Exception:
3163+ recover_method(ctxt, instance_ref)
3164+ raise
3165+
3166+ # Waiting for completion of live_migration.
3167+ timer = utils.LoopingCall(f=None)
3168+
3169+ def wait_for_live_migration():
3170+ """waiting for live migration completion"""
3171+ try:
3172+ self.get_info(instance_ref.name)['state']
3173+ except exception.NotFound:
3174+ timer.stop()
3175+ post_method(ctxt, instance_ref, dest)
3176+
3177+ timer.f = wait_for_live_migration
3178+ timer.start(interval=0.5, now=True)
3179+
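To illustrate how live_migration_flag is consumed above: the comma-separated flag names are resolved against the libvirt module and OR-ed into a single bitmask. The flag value shown is an assumption; the actual default is defined elsewhere in the branch:

    import libvirt

    # Assumed flag value, illustrating the parsing done in _live_migration().
    flaglist = 'VIR_MIGRATE_UNDEFINE_SOURCE, VIR_MIGRATE_PEER2PEER'.split(',')
    flagvals = [getattr(libvirt, x.strip()) for x in flaglist]
    logical_sum = reduce(lambda x, y: x | y, flagvals)
    # logical_sum is then handed to dom.migrateToURI() along with
    # FLAGS.live_migration_uri % dest and FLAGS.live_migration_bandwidth.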
3180+ def unfilter_instance(self, instance_ref):
3181+ """See comments of same method in firewall_driver."""
3182+ self.firewall_driver.unfilter_instance(instance_ref)
3183+
3184
3185 class FirewallDriver(object):
3186 def prepare_instance_filter(self, instance):
3187
3188=== modified file 'nova/virt/xenapi_conn.py'
3189--- nova/virt/xenapi_conn.py 2011-03-07 23:51:20 +0000
3190+++ nova/virt/xenapi_conn.py 2011-03-10 06:27:59 +0000
3191@@ -263,6 +263,27 @@
3192 'username': FLAGS.xenapi_connection_username,
3193 'password': FLAGS.xenapi_connection_password}
3194
3195+ def update_available_resource(self, ctxt, host):
3196+ """This method is supported only by libvirt."""
3197+ return
3198+
3199+ def compare_cpu(self, xml):
3200+ """This method is supported only by libvirt."""
3201+ raise NotImplementedError('This method is supported only by libvirt.')
3202+
3203+ def ensure_filtering_rules_for_instance(self, instance_ref):
3204+ """This method is supported only libvirt."""
3205+ return
3206+
3207+ def live_migration(self, context, instance_ref, dest,
3208+ post_method, recover_method):
3209+ """This method is supported only by libvirt."""
3210+ return
3211+
3212+ def unfilter_instance(self, instance_ref):
3213+ """This method is supported only by libvirt."""
3214+ raise NotImplementedError('This method is supported only by libvirt.')
3215+
3216
3217 class XenAPISession(object):
3218 """The session to invoke XenAPI SDK calls"""
3219
3220=== modified file 'nova/volume/driver.py'
3221--- nova/volume/driver.py 2011-03-09 20:33:20 +0000
3222+++ nova/volume/driver.py 2011-03-10 06:27:59 +0000
3223@@ -143,6 +143,10 @@
3224 """Undiscover volume on a remote host."""
3225 raise NotImplementedError()
3226
3227+ def check_for_export(self, context, volume_id):
3228+ """Make sure volume is exported."""
3229+ raise NotImplementedError()
3230+
3231
3232 class AOEDriver(VolumeDriver):
3233 """Implements AOE specific volume commands."""
3234@@ -198,15 +202,45 @@
3235 self._try_execute('sudo', 'vblade-persist', 'destroy',
3236 shelf_id, blade_id)
3237
3238- def discover_volume(self, _volume):
3239+ def discover_volume(self, context, _volume):
3240 """Discover volume on a remote host."""
3241- self._execute('sudo', 'aoe-discover')
3242- self._execute('sudo', 'aoe-stat', check_exit_code=False)
3243+ (shelf_id,
3244+ blade_id) = self.db.volume_get_shelf_and_blade(context,
3245+ _volume['id'])
3246+ self._execute("sudo aoe-discover")
3247+ out, err = self._execute("sudo aoe-stat", check_exit_code=False)
3248+ device_path = 'e%(shelf_id)d.%(blade_id)d' % locals()
3249+ if out.find(device_path) >= 0:
3250+ return "/dev/etherd/%s" % device_path
3251+ else:
3252+ return
3253
3254 def undiscover_volume(self, _volume):
3255 """Undiscover volume on a remote host."""
3256 pass
3257
3258+ def check_for_export(self, context, volume_id):
3259+ """Make sure volume is exported."""
3260+ (shelf_id,
3261+ blade_id) = self.db.volume_get_shelf_and_blade(context,
3262+ volume_id)
3263+ cmd = "sudo vblade-persist ls --no-header"
3264+ out, _err = self._execute(cmd)
3265+ exported = False
3266+ for line in out.split('\n'):
3267+ param = line.split(' ')
3268+ if len(param) == 6 and param[0] == str(shelf_id) \
3269+ and param[1] == str(blade_id) and param[-1] == "run":
3270+ exported = True
3271+ break
3272+ if not exported:
3273+ # Instance will be terminated in this case.
3274+ desc = _("Cannot confirm exported volume id:%(volume_id)s. "
3275+ "vblade process for e%(shelf_id)s.%(blade_id)s "
3276+ "isn't running.") % locals()
3277+ raise exception.ProcessExecutionError(out, _err, cmd=cmd,
3278+ description=desc)
3279+
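To make the parsing above easier to review, here is a made-up vblade-persist line in the shape the loop expects; only the fields the code actually checks (shelf, blade, and the trailing "run") are meaningful, and the real output format may differ on actual hosts:

    # Illustrative only: a fabricated "vblade-persist ls --no-header" line.
    line = '1 2 eth0 /dev/nova-volumes/volume-00000001 persist run'
    param = line.split(' ')
    exported = (len(param) == 6 and param[0] == '1'
                and param[1] == '2' and param[-1] == 'run')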
3280
3281 class FakeAOEDriver(AOEDriver):
3282 """Logs calls instead of executing."""
3283@@ -402,7 +436,7 @@
3284 (property_key, property_value))
3285 return self._run_iscsiadm(iscsi_properties, iscsi_command)
3286
3287- def discover_volume(self, volume):
3288+ def discover_volume(self, context, volume):
3289 """Discover volume on a remote host."""
3290 iscsi_properties = self._get_iscsi_properties(volume)
3291
3292@@ -461,6 +495,20 @@
3293 self._run_iscsiadm(iscsi_properties, "--logout")
3294 self._run_iscsiadm(iscsi_properties, "--op delete")
3295
3296+ def check_for_export(self, context, volume_id):
3297+ """Make sure volume is exported."""
3298+
3299+ tid = self.db.volume_get_iscsi_target_num(context, volume_id)
3300+ try:
3301+ self._execute("sudo ietadm --op show --tid=%(tid)d" % locals())
3302+ except exception.ProcessExecutionError, e:
3303+ # Instances remount read-only in this case.
3304+ # Restarting /etc/init.d/iscsitarget and rebooting nova-volume
3305+ # is preferable, since ensure_export() runs at boot time.
3306+ logging.error(_("Cannot confirm exported volume "
3307+ "id:%(volume_id)s.") % locals())
3308+ raise
3309+
3310
3311 class FakeISCSIDriver(ISCSIDriver):
3312 """Logs calls instead of executing."""
3313
3314=== modified file 'nova/volume/manager.py'
3315--- nova/volume/manager.py 2011-02-21 23:52:41 +0000
3316+++ nova/volume/manager.py 2011-03-10 06:27:59 +0000
3317@@ -160,7 +160,7 @@
3318 if volume_ref['host'] == self.host and FLAGS.use_local_volumes:
3319 path = self.driver.local_path(volume_ref)
3320 else:
3321- path = self.driver.discover_volume(volume_ref)
3322+ path = self.driver.discover_volume(context, volume_ref)
3323 return path
3324
3325 def remove_compute_volume(self, context, volume_id):
3326@@ -171,3 +171,9 @@
3327 return True
3328 else:
3329 self.driver.undiscover_volume(volume_ref)
3330+
3331+ def check_for_export(self, context, instance_id):
3332+ """Make sure whether volume is exported."""
3333+ instance_ref = self.db.instance_get(context, instance_id)
3334+ for volume in instance_ref['volumes']:
3335+ self.driver.check_for_export(context, volume['id'])
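Finally, a hypothetical sketch of how the scheduler side is expected to invoke this check before allowing a migration; the actual RPC plumbing lives in nova/scheduler/driver.py in this branch and may differ in detail (volume_host, context, and instance_ref here are placeholders):

    # Hypothetical sketch: topic and queue names follow the usual nova
    # conventions rather than anything shown in this hunk.
    from nova import db
    from nova import flags
    from nova import rpc

    FLAGS = flags.FLAGS
    rpc.call(context,
             db.queue_get_for(context, FLAGS.volume_topic, volume_host),
             {"method": "check_for_export",
              "args": {"instance_id": instance_ref['id']}})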