Running instances can't be contacted from CLC in CLC+Walrus / CC+SC / NC topology

Bug #527648 reported by Thierry Carrez
This bug affects 1 person
Affects              Status        Importance  Assigned to     Milestone
Eucalyptus           Fix Released  Low         Daniel Nurmi
eucalyptus (Ubuntu)  Fix Released  High        Thierry Carrez
Lucid                Fix Released  High        Thierry Carrez

Bug Description

Alpha3 UEC install (eucalyptus 1.6.2-0ubuntu4)
CLC+Walrus / CC+SC / NC topology

Running an alpha3 lucid cloud image: it starts up ("running") but you can't SSH in. console-output shows "Caught exception reading instance data". Nothing related seems to show in cloud-error.log.

Running the official karmic cloud image: it starts up ("running") but you can't SSH in (or ping it).

I suspect a networking issue that prevents the images from accessing the metadata service and prevents us from pinging the public or private address of the instance.

Thierry Carrez (ttx)
Changed in eucalyptus (Ubuntu):
importance: Undecided → High
status: New → Confirmed
Thierry Carrez (ttx) wrote :

That analysis tends to be confirmed by my latest test:

From the CC+SC, I can ping and try to connect by ssh to the karmic instance.
The SSH connection fails with a publickey error, which suggests the instance could not query the metadata service to retrieve the key.

So it seems like the instance cannot contact the CLC+Walrus (to query metadata) and that the CLC+Walrus cannot contact the public/private address of the instance (for ping or SSH).

Dustin Kirkland  (kirkland) wrote :

I can confirm this. I'm seeing this in ISO and preseed installs.

Thierry Carrez (ttx)
summary: - Running instances can't be contacted in CLC+Walrus / CC+SC / NC topology
+ Running instances can't be contacted from CLC in CLC+Walrus / CC+SC / NC
+ topology
Mathias Gug (mathiaz) wrote : Re: [Bug 527648] Re: Running instances can't be contacted in CLC+Walrus / CC+SC / NC topology

On Thu, Feb 25, 2010 at 12:39:35PM -0000, Thierry Carrez wrote:
>
> So it seems like the instance cannot contact the CLC+Walrus (to query
> metadata) and that the CLC+Walrus cannot contact the public/private
> address of the instance (for ping or SSH).
>

Is routing set correctly on the CLC+Walrus?

Could you outline your address topology?

--
Mathias Gug
Ubuntu Developer http://www.ubuntu.com

Thierry Carrez (ttx) wrote :

Setting up the missing VNET_CLOUDIP on CC seems to be the only thing preventing it from working.

Thierry Carrez (ttx) wrote :

I mean, setting up VNET_CLOUDIP on CC makes it work :)

Thierry Carrez (ttx) wrote :

OK, further testing reveals there are two issues:
* The FORWARD chain is blocking connections from CLC to instance
* The CC doesn't know the way to the metadata service (missing VNET_CLOUDIP)

Defaulting to accept on the FORWARD chain works around the first issue:
sudo iptables -P FORWARD ACCEPT

I can ping karmic instances OK from the CLC.

Adding VNET_CLOUDIP=ip.address.of.clc to the CC's eucalyptus.conf solves the second one.

I can boot a karmic or lucid instance and SSH into it.

More investigation is needed to see why the FORWARD chain isn't set up to accept the packets as it should...
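
A minimal sketch of the second workaround, assuming a POSIX shell on the CC. The function name set_vnet_cloudip is hypothetical and not part of Eucalyptus, and the config file path is passed in as an argument rather than assumed. It replaces an existing VNET_CLOUDIP line, or appends one if none is present:

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Eucalyptus): write VNET_CLOUDIP=<ip>
# into the given config file, replacing any existing assignment.
set_vnet_cloudip() {
    conf=$1   # path to the config file to edit
    ip=$2     # address of the CLC
    if grep -q '^VNET_CLOUDIP=' "$conf" 2>/dev/null; then
        # Rewrite the existing line via a temporary file, then copy the
        # result back so the original file keeps its permissions.
        tmp=$(mktemp) || return 1
        sed "s|^VNET_CLOUDIP=.*|VNET_CLOUDIP=$ip|" "$conf" > "$tmp" \
            && cat "$tmp" > "$conf"
        rm -f "$tmp"
    else
        # No assignment yet: append one.
        printf 'VNET_CLOUDIP=%s\n' "$ip" >> "$conf"
    fi
}
```

For example, `set_vnet_cloudip /etc/eucalyptus/eucalyptus.local.conf ip.address.of.clc`, run with enough privileges to write the file.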

Thierry Carrez (ttx) wrote :

for the record, it's the FORWARD chain on the CC, of course.

Dustin Kirkland  (kirkland) wrote :

Actually, I found I only needed to make the VNET_CLOUDIP=... change. I did not do anything to my iptables.

I'll investigate where the VNET_CLOUDIP needs to be written to eucalyptus.local.conf.

Daniel Nurmi (nurmi) wrote :

The iptables policy is set to DROP by the CC, as a way to enforce correct AWS security group semantics (inter-security-group traffic is blocked by default until rules are added to allow it). However, once a public->private mapping is added (DNAT/SNAT rules show up in the iptables nat table), traffic should flow freely, assuming that you've authorized ssh/ping access to the security group in which your VM is running (euca-authorize ....). Those authorizations show up in iptables once the authorize rules have been applied. If you can get it into this state, it would help to see the output of:

iptables -t nat -L -n
iptables -L -n

Regards
-Dan
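
To check quickly whether the FORWARD chain is in the DROP state described above, the chain header in the `iptables -L -n` output can be parsed. This is a small sketch, assuming the usual header format `Chain FORWARD (policy DROP)`; the function name chain_policy is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: read `iptables -L -n` output on stdin and print the
# default policy (e.g. ACCEPT or DROP) of the chain named in $1.
chain_policy() {
    # Match the header line "Chain <name> (policy <POLICY>...)" and print
    # only the policy word; all other lines are suppressed by -n.
    sed -n "s/^Chain $1 (policy \([A-Z]*\).*/\1/p"
}
```

On the CC this could be used as `sudo iptables -L -n | chain_policy FORWARD`.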

Dustin Kirkland  (kirkland) wrote :

Dan-

Just to be clear ... The command:

euca-authorize default -P tcp -p 22 -s 0.0.0.0/0

Only needs to be run once, and only from an account with admin privileges, correct?

In other words, this doesn't need to be run in more than one place (CLC, and CC, or some such), right?

Thierry Carrez (ttx) wrote :

Re: "I found I only needed to make the VNET_CLOUDIP=... change"
That's what I originally found, but then when I tried to reproduce the fix I also needed to tweak the FORWARD rule to connect from CLC to instance. I guess I got something wrong.

Dan, can we set VNET_CLOUDIP on the CC in all cases, or should it only be set if the CLC is separate?

Re: euca-authorize
IIUC it needs to be run only once (from anywhere) for the security group you want to use ("default" being the default one). You probably need to be admin to modify the "default" one.

Daniel Nurmi (nurmi) wrote :

euca-authorize sets up a rule that is specific to a user/security group. For example, if you are using admin credentials and the group you're authorizing is 'default', then the rule will apply to all instances run by 'admin' in the group 'default'. If you are acting as a different user, say 'foobar', then you will need to run the command again to authorize the 'foobar/default' group. In a nutshell, the authorizations are tied to user/group pairs, and only need to be run once per user/group pair.

Regards,
-Dan

Thierry Carrez (ttx)
Changed in eucalyptus (Ubuntu Lucid):
milestone: none → ubuntu-10.04-beta-1
Changed in eucalyptus (Ubuntu Lucid):
status: Confirmed → In Progress
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package eucalyptus - 1.6.2-0ubuntu5

---------------
eucalyptus (1.6.2-0ubuntu5) lucid; urgency=low

  [ Colin Watson ]
  * Preseed postfix/main_mailer_type on the cloud controller, not the
    cluster controller (LP: #455746).

  [ Dustin Kirkland ]
  * debian/eucalyptus-cc.postinst, debian/eucalyptus-udeb.finish-install,
    debian/eucalyptus-cc.templates, debian/eucalyptus-udeb.postinst:
    - Have the CLC add it's IP address to the served preseed file, such
      the CC can pick it up and write it to eucalyptus.local.conf
      as the required VNET_CLOUDIP value (on separated CC, CLC installs),
      LP: #527648
 -- Dustin Kirkland <email address hidden> Fri, 26 Feb 2010 11:41:45 -0600

Changed in eucalyptus (Ubuntu Lucid):
status: In Progress → Fix Released
Dustin Kirkland  (kirkland) wrote :

Reopening, not yet fixed.

VNET_CLOUDIP did not make it into the preseed file, unfortunately.

Changed in eucalyptus (Ubuntu Lucid):
status: Fix Released → In Progress
assignee: nobody → Dustin Kirkland (kirkland)
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package eucalyptus - 1.6.2-0ubuntu6

---------------
eucalyptus (1.6.2-0ubuntu6) lucid; urgency=low

  * debian/eucalyptus-cc.config, debian/eucalyptus-udeb.finish-install,
    tools/euca_conf.in: continuation of previous fixes to LP: #527648;
    must add VNET_CLOUDIP to euca_conf (should send this upstream)
 -- Dustin Kirkland <email address hidden> Fri, 26 Feb 2010 16:19:56 -0600

Changed in eucalyptus (Ubuntu Lucid):
status: In Progress → Fix Released
Dustin Kirkland  (kirkland) wrote :

Still not fixed :-(

The eucalyptus/cloud-address value is not landing in the preseed.conf.

Colin, could you take a look? There's something going wrong here with the scope of that db_get, I think. I'm wondering if it needs to happen in the install chroot or something?

Baffled and tired...
Dustin

Changed in eucalyptus (Ubuntu Lucid):
status: Fix Released → Confirmed
assignee: Dustin Kirkland (kirkland) → Colin Watson (cjwatson)
Dustin Kirkland  (kirkland) wrote :

Aha, I think I understand why ...

During the CLC install, I think the user is prompted for eucalyptus/cloud-address (if it exists), and if the user leaves it blank (and none is detected on the network), then the installer suggests installing this machine as the CLC. Hence, eucalyptus/cloud-address remains blank. So I just think we need to determine that address in a different manner. I'll take another look.

Changed in eucalyptus (Ubuntu Lucid):
assignee: Colin Watson (cjwatson) → Dustin Kirkland (kirkland)
Dustin Kirkland  (kirkland) wrote :

Attaching /var/log/installer/syslog from my CC/SC install.

Search this file for 10.1.1.71 (the address of the CLC) and you'll see that eucalyptus/cloud-address is set to 10.1.1.71:8773, at least for a bit.

Maybe it's set correctly in the installer environment but that doesn't get propagated to the install chroot. Needs a little more investigation.

Thierry Carrez (ttx) wrote :

The main issue is that eucalyptus/cloud-address was just meant to hold the value of the user answer to the "No CLC detected, enter IP address" question from the installer (in order to get an IP address to get the CC/SC/Walrus preseed from). In order to use it for this use case, you'll need to preseed in the target (magic that involves some logfile, see how Colin seeded the postfix debconf value).

There is also a catch in the value cloud-address can take: when set by the user it's an IP address, but when "discovered" by euca_find_component it's "IPaddress:port". It's all normalized before being used in the postinst by "${cloud%:*}": you should probably do the same before writing VNET_CLOUDIP.
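
The "${cloud%:*}" normalization mentioned above can be sketched as follows. The function name normalize_cloud_address is hypothetical; only the parameter expansion comes from the comment. It strips the shortest trailing ":..." suffix, so both forms of the value yield the bare address:

```shell
#!/bin/sh
# Hypothetical wrapper around the "${cloud%:*}" expansion: remove an optional
# ":port" suffix so "10.1.1.71" and "10.1.1.71:8773" both yield the address.
normalize_cloud_address() {
    cloud=$1
    # %:* deletes the shortest suffix matching ":<anything>"; a value with
    # no colon is left unchanged.
    printf '%s\n' "${cloud%:*}"
}
```

Note that this assumes an IPv4 address or hostname; a bare IPv6 address contains colons and would be truncated by the same expansion.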

Daniel Nurmi (nurmi) wrote :

The upstream part of this fix (add VNET_CLOUDIP and VNET_LOCALIP to euca_conf --import-conf) is in revno 1202

-Dan

Changed in eucalyptus:
status: New → Confirmed
importance: Undecided → Low
assignee: nobody → Daniel Nurmi (nurmi)
status: Confirmed → Fix Committed
Scott Moser (smoser) wrote :

I tried working around this as described above. In data center UEC at the moment:
santol (10.55.55.7): cluster-controller (CC)
cempadek (10.55.55.2): cloud-controller (CLC)

ubuntu@santol:~$ grep VNET_CLOUDIP /etc/eucalyptus/eucalyptus.local.conf
VNET_CLOUDIP=10.55.55.2

I ran 'sudo iptables -P FORWARD ACCEPT' on both CC and CLC. Then:
$ euca-run-instances --key sm-kp emi-B1E21863
$ euca-describe-instances i-3F580796
RESERVATION r-4AB408BB admin default
INSTANCE i-3F580796 emi-B1E21863 10.55.55.100 172.19.1.2 running sm-kp 0 m1.small 2010-03-01T19:18:37.335Z UEC-TEST1 eki-1E661D65 eri-F91C1CD8

The instance boots and console-output shows ec2-init ran successfully (i.e., the metadata service could be reached from the instance).

Now, on cempadek:
ping 10.55.55.100 succeeds
ping 172.19.1.2 fails

The instance
Maybe there was some user error, or misunderstanding, but it appears that from the CLC I can only reach the external address, not the internal. From santol (CC) either can be used.

Thierry Carrez (ttx)
Changed in eucalyptus (Ubuntu Lucid):
assignee: Dustin Kirkland (kirkland) → Thierry Carrez (ttx)
status: Confirmed → Triaged
Thierry Carrez (ttx)
Changed in eucalyptus (Ubuntu Lucid):
status: Triaged → In Progress
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package eucalyptus - 1.6.2-0ubuntu7

---------------
eucalyptus (1.6.2-0ubuntu7) lucid; urgency=low

  [ Dustin Kirkland ]
  * tools/euca_conf.in: fix manual node registration with rsync, LP: #530942
    - fix one stray 'rsync', replace with $RSYNC
    - use sudo -u $EUCA_USER to match (working) behavior with scp
    - use local $RSYNC_RSH variable, rather than exporting to the environment
  * eucalyptus-cc.templates, eucalyptus-nc.config, eucalyptus-nc.templates,
    eucalyptus-sc.templates: fix the default cluster name 'cluster1', as
    this was not getting populated in the -nc if the default cluster name
    was accepted on the CC, LP: #530937

  [ Thierry Carrez ]
  * Fixed eucalyptus-nc.templates so that eucalyptus-nc postinst doesn't fail
  * eucalyptus-udeb.*: Preseed detected cloud for the CC installer to pick
    it up and set VNET_CLOUDIP in separated CC case (LP: #527648)
 -- Thierry Carrez <email address hidden> Wed, 03 Mar 2010 14:02:24 +0100

Changed in eucalyptus (Ubuntu Lucid):
status: In Progress → Fix Released
Changed in eucalyptus:
status: Fix Committed → Fix Released