CloudStack data source uses wrong address

Bug #1089989 reported by Gerard Dethier
This bug affects 2 people
Affects: cloud-init
Status: Fix Released
Importance: Medium
Assigned to: Gerard Dethier

Bug Description

Found with a build from trunk (revision 747).

CloudStack's documentation states that meta-data and user-data can be retrieved from CloudStack's Virtual Router (VR); see http://incubator.apache.org/cloudstack/docs/en-US/Apache_CloudStack/4.0.0-incubating/html/Admin_Guide/user-data-and-meta-data.html. However, cloud-init retrieves this information from the default gateway. The VR and the default gateway may be the same machine (i.e. have the same address) in some cases, but that is not always true (in my case, it is not).

The fix seems easy: in file 'DataSourceCloudStack.py', the method 'get_default_gateway' should be corrected.

I may provide a patch soon, if it can help.
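
For illustration only, the default-gateway lookup described above could look roughly like the sketch below. This is not the code from 'DataSourceCloudStack.py': the function name `default_gateway_from_route` is hypothetical, and it assumes the gateway is read from `route -n` style output.

```python
def default_gateway_from_route(route_output):
    """Return the default gateway found in `route -n` style text, or None.

    Hypothetical helper for illustration; not cloud-init's implementation.
    """
    for line in route_output.splitlines():
        fields = line.split()
        # Data rows start with: Destination Gateway Genmask Flags ...
        # The default route has destination 0.0.0.0 and the G (gateway) flag.
        if len(fields) >= 4 and fields[0] == "0.0.0.0" and "G" in fields[3]:
            return fields[1]
    return None
```

The address returned this way is the default gateway, which is only the right meta-data server when the VR happens to be the gateway as well.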

description: updated
Gerard Dethier (g-dethier) wrote :

Here is a patch that fixes, at least in my case, the bug.

Gerard Dethier (g-dethier) wrote :

Hello, the patch I previously submitted was invalid; here is the right one.

Scott Moser (smoser) wrote :

Gerard,
  Thanks for updating.
   Reading the patch, I have 2 questions:
a.) are you sure that getting the router from dhcp response is reliable in this scenario?
b.) is there a better choice for the url to check ?
   Assuming that '/latest/meta-data/instance-id' will always be present is essentially assuming that their MD api version will *always* have that. In other cases we've chosen values to hard code, we've relied on a specific version of the metadata service.
  I.e.: /2012-12-12/meta-data/instance-id
 Perhaps we could at least try 2 different URLs there as a failsafe?

Gerard Dethier (g-dethier) wrote : Re: [Bug 1089989] Re: CloudStack data source uses wrong address

Scott,

> Reading the patch, I have 2 questions:
> a.) are you sure that getting the router from dhcp response is reliable in this scenario?
I hope so, it is the recommended method in CloudStack's documentation
(see link below). To go into more details, CloudStack uses a VM called
the Virtual Router that exposes several services to created VMs, among
which meta-data and DHCP. Therefore, using the DHCP response works. However,
if for some reason several DHCP servers are detected, there is an ambiguity
that my code treats as an error.

> b.) is there a better choice for the url to check ?
> Assuming that '/latest/meta-data/instance-id' will always be present is essentially assuming that their MD api version will *always* have that. In other cases we've chosen values to hard code, we've relied on a specific version of the metadata service.
> I.e.: /2012-12-12/meta-data/instance-id
> Perhaps we could at least try 2 different URLs there as a failsafe?
It seems there is no better choice (see link below). Currently
CloudStack does not have the concept of API version: there are no
multiple URLs for different versions, only a "latest" one.

http://incubator.apache.org/cloudstack/docs/en-US/Apache_CloudStack/4.0.0-incubating/html/Admin_Guide/user-data-and-meta-data.html
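
As a rough sketch of the check discussed above, the snippet below builds the un-versioned "latest" URL for a meta-data key and probes it. The function names `metadata_url` and `metadata_available`, the `timeout` default and the injectable `opener` parameter are assumptions made for illustration; they are not taken from the patch.

```python
import urllib.request


def metadata_url(vr_address, key="instance-id"):
    """Build the 'latest' meta-data URL for one key on the Virtual Router."""
    return "http://{}/latest/meta-data/{}".format(vr_address, key)


def metadata_available(vr_address, timeout=5, opener=urllib.request.urlopen):
    """Return True if the instance-id key can be fetched from vr_address.

    `opener` is injectable so the check can be exercised without a network.
    """
    try:
        with opener(metadata_url(vr_address), timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # urllib.error.URLError is a subclass of OSError, so this also
        # covers refused connections, timeouts and HTTP error responses.
        return False
```

Since only a "latest" path exists, there is no second versioned URL to try; a failsafe would have to probe a different key rather than a different API version.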

Scott Moser (smoser)
Changed in cloud-init:
importance: Undecided → Medium
status: New → Confirmed
Scott Moser (smoser) wrote :

fixed in revno 753.
Please test my integration changes. Thanks for your help.

Changed in cloud-init:
status: Confirmed → Fix Committed
Gerard Dethier (g-dethier) wrote :

Just tested with revno 754 and it works. Thank you!

Scott Moser (smoser) wrote :

Gerard,
  I went and looked a bit at this, as I was hoping to just simply change it to use 'util.load_file' rather than 'open' directly.

Then, I found a few things:
 * Ubuntu's "dhclient leases" directory is /var/lib/dhcp, not /var/lib/dhclient. So this would appear broken on Ubuntu (I just looked at my desktop running 12.10).
 * I have 65 files in that directory matching *.lease or *.leases, some of them:

   $ ( cd /var/lib/dhcp && ls -altr *.lease *.leases ) > /tmp/out
  995 Oct 24 2011 dhclient-134bd9a3-e41f-4d54-bd8c-4ea6ad41d0dd-eth0.lease
  995 Oct 25 2011 dhclient-f2ec2e51-12fc-4b69-a2f7-be079e4c23fa-eth0.lease
  549 Oct 30 2011 dhclient-80e3dc75-8306-4e89-96c7-edac04cde415-eth0.lease
    0 Oct 30 2011 dhclient-154048a8-3379-47da-8527-6c9d68ec2526-wlan0.lease
    0 Oct 30 2011 dhclient.leases
 1016 Oct 31 2011 dhclient-9b714964-4a60-449c-a1d3-eec7bc5d35bf-eth0.lease
  <snip>
  789 Dec 3 08:56 dhclient-9798af6e-1c1b-40ea-9a0e-1c25b7621fd6-eth0.lease
  789 Dec 5 13:11 dhclient-394c8646-57d5-48cf-8263-4866aa99f153-eth0.lease
 1519 Dec 8 23:59 dhclient-d166b4ac-9cf3-4f46-8571-b7f5b12f14b4-wlan0.lease

   output was trimmed for brevity

   So, my directory would end up with a None result although there were 12 matches (len(addresses) == 12)

Could we make the following changes:
 * support searching /var/lib/dhcp, ordered by write time, and take the last match.
 * if None is found fall back to the incorrect 'route' solution we had before.
 * make searching directories configurable and make it default to include both
   /var/lib/dhcp and /var/lib/dhclient.
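
A minimal sketch of the directory search proposed in the list above, assuming the lease files match *.lease or *.leases as in the listing. The names `DEFAULT_LEASE_DIRS`, `lease_files_by_mtime` and `latest_lease_file` are hypothetical and not part of cloud-init.

```python
import glob
import os

# Default search locations; callers can pass their own list instead.
DEFAULT_LEASE_DIRS = ("/var/lib/dhcp", "/var/lib/dhclient")


def lease_files_by_mtime(lease_dirs=DEFAULT_LEASE_DIRS):
    """Return *.lease and *.leases files from lease_dirs, oldest first."""
    found = []
    for directory in lease_dirs:
        for pattern in ("*.lease", "*.leases"):
            # glob returns an empty list for directories that do not exist.
            found.extend(glob.glob(os.path.join(directory, pattern)))
    return sorted(found, key=os.path.getmtime)


def latest_lease_file(lease_dirs=DEFAULT_LEASE_DIRS):
    """Return the most recently written lease file, or None if there is none."""
    files = lease_files_by_mtime(lease_dirs)
    return files[-1] if files else None
```

A None result here is the point at which the caller would fall back to the 'route' approach.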

I'm sorry I didn't take a deeper look before, but I'm pretty sure Ubuntu is broken if we ship this.

Gerard Dethier (g-dethier) wrote :

Scott,

Sorry, it seems I did not go into enough detail with this patch.

* Regarding Ubuntu's "dhclient leases" directory, you are totally right. I only tested on Fedora and CentOS and did not consider that Ubuntu (and Debian?) distros use another directory. Searching both directories should indeed be implemented and made configurable in the way you mention.

* Regarding the multiple lease files, the matter is rather "complex": a single file may actually contain several leases, some of them expired (fortunately, they all seem to come from the same DHCP server). Your solution, to consider the most recent lease file and, if there is an ambiguity, fall back on the previous approach (choosing the default route), is probably the best as it should work in most cases.

Unfortunately, I will not be able to work on this soon. I can submit a new patch in 1-2 weeks if it is OK...

Scott Moser (smoser) wrote :

1-2 weeks is fine. I went ahead and assigned you, and targeted this for 0.7.2, since we really need to have it fixed; otherwise we regress.

Thanks.

Changed in cloud-init:
status: Fix Committed → In Progress
assignee: nobody → Gerard Dethier (g-dethier)
milestone: none → 0.7.2
Gerard Dethier (g-dethier) wrote :

Here is a new patch (against revno 754) related to previous discussion. I did as follows:
- both lease directories are supported (existing one is chosen)
- only latest (based on modification time) lease file is considered
- only latest DHCP identifier is considered (no more ambiguity)
- fall-back to default gateway if no DHCP identifier was found
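
The last two points can be sketched as follows. This is not the attached patch: the function name `virtual_router_address` is hypothetical, and the sketch assumes that leases appear in the file in the order they were written, so the last `dhcp-server-identifier` option is the latest one.

```python
import re

# Matches lines such as:  option dhcp-server-identifier 10.1.1.1;
_SERVER_ID = re.compile(r"option\s+dhcp-server-identifier\s+([0-9.]+)\s*;")


def virtual_router_address(lease_text, default_gateway=None):
    """Return the last DHCP server identifier found in lease_text.

    Falls back to default_gateway when the lease text contains no
    identifier at all.
    """
    matches = _SERVER_ID.findall(lease_text)
    if matches:
        # Assumption: later entries in the file are more recent leases.
        return matches[-1]
    return default_gateway
```

The caller would pass in the contents of the latest lease file and the address obtained from the default route.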

Scott Moser (smoser) wrote :

fixed in trunk at revision 757.

Thanks!

Changed in cloud-init:
status: In Progress → Fix Committed
Scott Moser (smoser) wrote :

fixed in 0.7.2

Changed in cloud-init:
milestone: 0.7.2 → none
status: Fix Committed → Fix Released
Kevin (kbockspam) wrote :

Could this also please be ported to precise?

Dan Watkins (oddbloke) wrote :

Hi Kevin,

If you would like to prepare a patch for the precise version of cloud-init that is tested on a CloudStack environment where you see this problem, we would be able to look at integrating it.

It's also worth noting that precise has less than 2 years until it reaches its EOL, so you would probably be best served by migrating to trusty instead of precise where possible. :)

Thanks,

Dan

Carlos (creategui) wrote :

Hi Dan,
I'll add another request for this to get ported to precise. Until this bug was fixed, the CloudStack datasource has never worked. Prior to this fix it was connecting to the wrong metadata server. This fix resolved that.

I noticed that other fixes have been ported to precise, like this one: https://bugs.launchpad.net/cloud-init/+bug/1422388. However, as I mentioned in my comment on that bug, those changes will not do anything unless this fix is also ported.

thanks,
Carlos

Dan Watkins (oddbloke) wrote :

Hi Carlos,

It's not entirely true that other precise fixes won't do anything; some CloudStack environments are deployed in a way that precise's cloud-init works with.

Unfortunately, I don't have access to such an environment to test any development work I do. If you'd like to prepare a patch to fix this, I can help you work through the SRU process to get it in to precise.

Thanks,

Dan

Carlos (creategui) wrote :

Hi Dan,

I have looked at this further and I guess it is possible for CloudStack's "Virtual Router" to also act as a gateway in certain configurations. That may explain why it has worked for some folks. Unfortunately, that is not the case in my configuration.

I would like to take you up on your offer to help me get a patch through.

I have a cloud-init.log I can share with you from a system with cloud-init 0.6.3-0ubuntu1.17 where it takes nearly 35 minutes for cloud-init to finally give up and let the machine finish booting, because it is unable to connect to the meta-data server.

I have updated the patch I have been using for a while (pre-1.17) to work with the changes made to the CloudStack datasource in 0.6.3-0ubuntu1.17. I can send it to you along with the corresponding cloud-init.log.

We are in the process of moving over to trusty but I still have a few folks using precise. This would help.

thank you
Carlos

Dan Watkins (oddbloke) wrote :

Hi Carlos,

Could you attach the patch you've been using to this bug so I can review it?

Thanks,

Dan

Carlos (creategui) wrote :

Patch to current CloudStack data source on precise to bring it up to equivalent of current version (1118) of this file.
