gzipped and base64 encoded user-data leads to failure

Bug #1884071 reported by Bryan Carmichael
This bug affects 1 person
Affects: cloud-init
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

The same issue is also reported here: https://github.com/terraform-providers/terraform-provider-aws/issues/8244

In /var/log/cloud-init.log this is the only WARNING message (there are no ERROR messages):

2020-06-18 12:16:59,413 - __init__.py[WARNING]: Unhandled non-multipart (text/x-not-multipart) userdata: 'b'H4sIAAAAAAAA/8RV3W4bNxO9'...'
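The `H4sI` prefix in that warning is a useful clue: it is what the gzip magic bytes (`1f 8b 08`) look like after base64 encoding. In other words, cloud-init received base64 text, not gzip bytes. A short Python check; the payload is an arbitrary example, not the reporter's data:

```python
import base64
import gzip

# Every gzip stream starts with the magic bytes 1f 8b followed by 08 (deflate).
compressed = gzip.compress(b"#cloud-config\n")
print(compressed[:3].hex())  # 1f8b08

# The first 4 base64 characters encode exactly those 3 bytes, so base64 of
# any gzip stream begins with "H4sI", just like the user-data in the warning.
encoded = base64.b64encode(compressed)
print(encoded[:4])  # b'H4sI'
```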

I can even fetch the user-data from the created VM, and it decodes and extracts correctly, with proper formatting and everything:

curl -L http://169.254.169.254/latest/user-data/ | base64 --decode | gunzip

Content-Type: multipart/mixed; boundary="MIMEBOUNDARY"
MIME-Version: 1.0

--MIMEBOUNDARY
Content-Transfer-Encoding: 7bit
Content-Type: text/cloud-config
Mime-Version: 1.0

#cloud-config
# set locale
locale: en_GB.UTF-8
# ensure time sync between all nodes
ntp:
  enabled: true
  ntp_client: chrony
# hides ssh keys in console
ssh_fp_console_blacklist: [ssh-dss, ssh-dsa, ssh-ed25519]
ssh_key_console_blacklist: [ssh-dss, ssh-dsa, ssh-ed25519]

# upgrade all packages and install necessary ones
package_upgrade: true
package_reboot_if_required: true
packages:
- apt-transport-https
- ca-certificates
- curl
- gnupg-agent
- software-properties-common
- build-essential
- libssl-dev
- make

# set random root password and disable password login for ssh
chpasswd:
  expire: false
  list: |
      root:RANDOM
ssh_pwauth: no

# create sre user with sudo privs and set authorized key
users:
- name: sre
  groups: sudo
  lock_passwd: true
  ssh_authorized_keys:
   - censored
  sudo: ['ALL=(ALL) NOPASSWD:ALL']
  shell: /bin/bash

--MIMEBOUNDARY
Content-Transfer-Encoding: 7bit
Content-Type: text/cloud-config
Mime-Version: 1.0

#cloud-config
# Configure Floating IP (Ubuntu 20.04 LTS)
# Not required when using https://github.com/costela/hcloud-ip-floater
#write_files:
#  - content: |
#      network:
#        version: 2
#        ethernets:
#          eth0:
#            addresses:
#            - ${floating_ip}/32
#    path: /etc/netplan/60-floating-ip.yaml
# Install Keepalived
runcmd:
- cd /root/
- wget http://www.keepalived.org/software/keepalived-2.1.2.tar.gz
- tar xvf keepalived-2.1.2.tar.gz
- cd keepalived-2.1.2
- ./configure
- make
- sudo make install

final_message: "The system is finally up, after $UPTIME seconds"
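The `curl | base64 --decode | gunzip` check above can be reproduced in Python to confirm that the decoded payload is a well-formed MIME document. This is a sketch: so that it runs offline, `encoded` is built locally from a hypothetical, shortened MIME document instead of being fetched from the metadata service.

```python
import base64
import gzip
from email import message_from_bytes

# Hypothetical stand-in for what the metadata service returned:
# a MIME multipart document, gzipped and then base64 encoded.
mime_doc = (
    b'Content-Type: multipart/mixed; boundary="MIMEBOUNDARY"\n'
    b"MIME-Version: 1.0\n"
    b"\n"
    b"--MIMEBOUNDARY\n"
    b"Content-Type: text/cloud-config\n"
    b"\n"
    b"#cloud-config\n"
    b"locale: en_GB.UTF-8\n"
    b"--MIMEBOUNDARY--\n"
)
encoded = base64.b64encode(gzip.compress(mime_doc))

# Equivalent of `base64 --decode | gunzip`.
decoded = gzip.decompress(base64.b64decode(encoded))

# Parse the result and list the content type of each leaf part.
msg = message_from_bytes(decoded)
part_types = [p.get_content_type() for p in msg.walk() if not p.is_multipart()]
print(part_types)  # ['text/cloud-config']
```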

When I pass the exact same user-data as above as plain text (decoded and extracted), it works without issue.

Bryan Carmichael (bcxxbc) wrote :
Scott Moser (smoser) wrote :

Thank you for taking the time to file a bug on cloud-init.

Is there a reason you base64 encoded the content? Was there documentation
that led you to believe you should? Some clouds (EC2) require the user of
the API to base64-encode user-data when posting it to their API.

Cloud-init should correctly treat the user-data on Hetzner cloud as binary
content and uncompress it if it is gzipped data. You should not need to
base64 encode it.
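The behavior described here (treat user-data as opaque bytes and decompress only when it really is gzip) can be sketched as below. `maybe_gunzip` is a hypothetical helper for illustration, not cloud-init's internal code.

```python
import base64
import gzip
import zlib

GZIP_MAGIC = b"\x1f\x8b"


def maybe_gunzip(data: bytes) -> bytes:
    """Gunzip `data` if it starts with the gzip magic, else return it as-is."""
    if data[:2] != GZIP_MAGIC:
        return data
    try:
        return gzip.decompress(data)
    except (OSError, EOFError, zlib.error):
        # Looked like gzip but was not a valid stream; leave it untouched.
        return data


raw = b"#cloud-config\nlocale: en_GB.UTF-8\n"
print(maybe_gunzip(gzip.compress(raw)) == raw)  # True: gzip is expanded
print(maybe_gunzip(raw) == raw)  # True: plain bytes pass through

# Base64 text of gzip data is not itself gzip, so it also passes through
# unchanged, which is likely why the report ends in a warning.
wrapped = base64.b64encode(gzip.compress(raw))
print(maybe_gunzip(wrapped) == wrapped)  # True
```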

If you need a text-friendly format, you could try one of the
other formats described in [1]. Both cloud-config-archive and MIME
multipart are text-friendly formats.

 [1] https://cloudinit.readthedocs.io/en/latest/topics/format.html?highlight=include%20file#id2
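As an illustration of the MIME multipart option, Python's standard `email` package can assemble a multipart document with `text/cloud-config` parts, similar in shape to the user-data in this report. A minimal sketch; the part contents are placeholders:

```python
from email import message_from_string
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Placeholder parts; the subtype "cloud-config" becomes text/cloud-config.
parts = [
    ("cloud-config", "#cloud-config\nlocale: en_GB.UTF-8\n"),
    ("cloud-config", "#cloud-config\nfinal_message: done\n"),
]

combined = MIMEMultipart()
for subtype, content in parts:
    # ASCII-only content is emitted with Content-Transfer-Encoding: 7bit.
    combined.attach(MIMEText(content, subtype))

# as_string() yields plain text suitable for a string-only user-data field.
user_data = combined.as_string()

# Round-trip check: parse the text back and list the part types.
parsed = message_from_string(user_data)
part_types = [p.get_content_type() for p in parsed.get_payload()]
print(part_types)  # ['text/cloud-config', 'text/cloud-config']
```

Unlike gzip, the result is ordinary ASCII text, so it needs no base64 wrapping to fit in a string field.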

If you think I've mis-diagnosed this, please feel free to put more
information in and set the Status back to New.

Changed in cloud-init:
status: New → Incomplete
Bryan Carmichael (bcxxbc) wrote :

Hey Scott,

I don't see how it would be possible to submit gzipped binary data as user-data on Hetzner Cloud, since the user-data field expects a string. This means the only way to provide it is with base64 encoding.

I assumed that since other providers with cloud-init allow base64, it would work the same way here. Cloud-init is a single, separately maintained project included with the OS, so I'd expect base64 handling to be a standard feature of the application itself, not something each cloud provider would maintain or add on their own.

Even the Terraform modules for templating cloud-init make base64 encoding a hard requirement when gzip compression is enabled:

Error: base64_encode is mandatory when gzip is enabled

  on main.tf line 1, in data "cloudinit_config" "user_data":
   1: data "cloudinit_config" "user_data" {

I also tried changing my MIME type from text/cloud-config to text/cloud-config-archive, with the same result.

Scott Moser (smoser) wrote :

@Bryan

> I don't see how it'd be possible to submit gzip binary data with hetzner
> cloud as user-data since the user-data field expects a string on hetzner
> cloud. This means the only way to provide user-data is with
> base64encoding.

cloud-init treats the user-data content as binary. If the platform does
not allow you to put binary data there, then there isn't much we can do.

In the case of "gzip"ed data, cloud-init attempts to transparently
uncompress it. But otherwise, without explicit instruction, cloud-init
does not modify the data.

> I assumed since other providers with cloud-init allow base64 this would
> be the way it would work here as well. Cloud-init is a singular

Other providers like EC2 actually provide the consumer (cloud-init) with
binary data. They just require base64 encoding when posting to their API
server. But they *always* require base64, and *always* provide binary to
cloud-init. So there is no guessing.
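The "no guessing" point can be made concrete: many ordinary strings happen to be valid base64, so a consumer that tried to auto-detect base64 could misread plain user-data. A small Python illustration; the sample strings are arbitrary:

```python
import base64
import binascii


def is_valid_base64(data: bytes) -> bool:
    """True if `data` decodes cleanly as strict base64."""
    try:
        base64.b64decode(data, validate=True)
        return True
    except binascii.Error:
        return False


# A cloud-config header is not valid base64 ('#' is outside the alphabet)...
print(is_valid_base64(b"#cloud-config"))  # False
# ...but short plain-text tokens can be, so detection is only a heuristic.
print(is_valid_base64(b"true"))  # True
print(base64.b64decode(b"true"))  # three bytes nobody intended to send
```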

> separately maintained project included with the OS, so I'd expect the
> base64 handling be a standard feature handled from within the
> application itself not something the cloud provider would maintain or
> add on their own.

> Even the terraform modules for templating cloud init hard depend base64
> encoding on enabling gzip compression:

> Error: base64_encode is mandatory when gzip is enabled

> on main.tf line 1, in data "cloudinit_config" "user_data":
> 1: data "cloudinit_config" "user_data" {

> I tried changing my mime type to use text/cloud-config-archive instead
> of text/cloud-config with the same result.

If you submit user-data to the API, and then cloud-init correctly
gets the cloud-config-archive, then it should work.

If you have submitted such data, please run 'cloud-init collect-logs' and
attach the output.

Scott Moser (smoser) wrote :

fwiw, I'm fully aware that cloud-config-archive and mime data are not well documented :-(

Scott Moser (smoser) wrote :

Well, this does make cloud-config-archive support gzipped blobs, so in one way it fixes an issue that was reported. But the problem really seems to be that:

a.) Hetzner only supports (utf8?) strings in their user-data
b.) "terraform modules for templating cloud init hard depend base64 encoding on enabling gzip compression:"
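Point (a) explains why raw gzip cannot go into a UTF-8 string field: the second gzip magic byte, 0x8b, is a UTF-8 continuation byte and cannot start a character, so every gzip stream fails to decode as UTF-8. A quick check with an arbitrary payload:

```python
import gzip

payload = gzip.compress(b"#cloud-config\n")

try:
    payload.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError as err:
    decodes_as_utf8 = False
    print(err)  # reports byte 0x8b in position 1: invalid start byte

print(decodes_as_utf8)  # False
```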

Another option is just to add transparent decompression of the user-data blob that we get from Hetzner. The problem with transparent decompression is that you can't turn it off...

That change would look like this:

--- a/cloudinit/sources/DataSourceHetzner.py
+++ b/cloudinit/sources/DataSourceHetzner.py
@@ -59,7 +59,13 @@ class DataSourceHetzner(sources.DataSource):
                 self.userdata_address, timeout=self.timeout,
                 sec_between=self.wait_retry, retries=self.retries)

-        self.userdata_raw = ud
+        # Hetzner cloud does not support binary user-data. So try to
+        # transparently decompress what we download. The issue with
+        # doing this is that now /var/lib/cloud-init/cloud-config.txt
+        # would have *uncompressed* content in it. A user that was
+        # rightfully expecting it to have exactly the user-data they
+        # provided to Hetzner would be broken.
+        self.userdata_raw = util.decomp_gzip(ud, quiet=True, decode=False)
         self.metadata_full = md

         """hostname is name provided by user at launch. The API enforces

Bryan Carmichael (bcxxbc) wrote :

Hey Scott,

Wow, thanks so much for the many responses and for looking closely into this. You are correct that the user-data field expects a UTF-8 string.

In my opinion, your first solution seems like the best path to take, because I imagine it has no implications from provider to provider. I see no downside to detecting gzip data and then extracting it, since data that isn't gzip would just be handled the same as before.

As for this:

> If you submit user-data to the API, and then cloud-init correctly
> gets the cloud-config-archive, then it should work.
>
> If you have submitted such data, please run 'cloud-init collect-logs' and
> attach the output.

When I tried the "text/cloud-config-archive" MIME type, I got exactly the same error message as before, because, given how Hetzner Cloud accepts user-data, the only way I could submit it was as gzipped base64.

I did open a ticket with Hetzner to ask how they handle this, and maybe they will do something on their end, but I haven't gotten a solid response yet after giving them log data.

Either way, thanks so much for your hard work so far on this!

Bryan Carmichael (bcxxbc) wrote :

Hey Scott,

So I spoke to Hetzner, and they think handling base64-encoded binary data on the cloud-init side is the way to go:

"Dear Client,

thank you for your feedback. We are not planning to change the cloudinit handling on our side, so there are no future plans to handle binary data.

We are happy to hear that someone actually adds that feature to our datasource in cloudinit.

Mit freundlichen Grüßen / Kind regards

Jonas Keidel"

Scott Moser (smoser) wrote :

@Bryan,

So the response from Hetzner is: "No, we can't change it; someone else should."
It's not very helpful.

Motivating factors for that response:
 a.) Hetzner would have work to do to implement that feature
 b.) Hetzner would potentially break some existing users.

Those are the same arguments from cloud-init side for not transparently doing this in cloud-init.

One last question: Is there a limit to the size of Hetzner user-data?

I've put up a pull request at https://github.com/canonical/cloud-init/pull/448 . I'm pretty sure it should support your use case of base64-encoded gzipped data, but I would like your feedback and testing.
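One way to support base64-encoded, gzipped user-data is to base64-decode the downloaded bytes only when they are strictly valid base64, and then let gzip handling run as before. The sketch below illustrates that assumption only; `maybe_b64decode` and `maybe_gunzip` are hypothetical names, and this is not necessarily what the linked pull request does.

```python
import base64
import binascii
import gzip


def maybe_b64decode(data: bytes) -> bytes:
    """Return base64-decoded bytes if `data` is strict base64, else `data`."""
    try:
        return base64.b64decode(data, validate=True)
    except binascii.Error:
        return data


def maybe_gunzip(data: bytes) -> bytes:
    """Decompress if `data` starts with the gzip magic (raises if corrupt)."""
    return gzip.decompress(data) if data[:2] == b"\x1f\x8b" else data


original = b"#cloud-config\nlocale: en_GB.UTF-8\n"
# Assumed shape of what Terraform submits with gzip and base64_encode enabled.
as_submitted = base64.b64encode(gzip.compress(original))

print(maybe_gunzip(maybe_b64decode(as_submitted)) == original)  # True
print(maybe_gunzip(maybe_b64decode(original)) == original)  # True: plain text untouched
```

The trade-off Scott raises still applies: a plain-text payload that happens to be valid base64 would be decoded too.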

Bryan Carmichael (bcxxbc) wrote :

@Scott,

Thanks for your continued engagement on this, and sorry for the slow reply; I was waiting to hear back from Hetzner to answer your question. They told me their user-data limit is 32KiB. That is better than the 16KiB I thought it was, but I am fairly certain I will hit it soon, as I am working on a modular multipart user-data Terraform project to provision entire k8s clusters using almost only cloud-init modules.
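Because the payload has to be base64 encoded, the limit effectively applies to the encoded text, and base64 turns every 3 bytes into 4 characters. Assuming the limit is 32 × 1024 = 32768 characters of encoded user-data (how Hetzner actually counts it is an assumption here), the gzipped payload can be at most 24576 bytes. `fits_limit` is a hypothetical pre-flight check, not a Hetzner or Terraform API:

```python
import base64
import gzip

# Assumed limit: 32 KiB, measured on the base64 text that is submitted.
LIMIT = 32 * 1024


def encoded_size(raw_len: int) -> int:
    """Length of the padded base64 encoding of `raw_len` bytes."""
    return (raw_len + 2) // 3 * 4


# Largest gzipped payload whose base64 form still fits the limit.
max_gzipped = LIMIT // 4 * 3
print(max_gzipped)  # 24576


def fits_limit(user_data: bytes) -> bool:
    """Gzip, base64 encode, and compare the encoded length to LIMIT."""
    return len(base64.b64encode(gzip.compress(user_data))) <= LIMIT
```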

How can I test this properly? It seems that if I provide a custom image on hcloud, user-data is no longer supported. I am happy to test if you can shed some light on the best way to do it.

Thanks!

Bryan

Scott Moser (smoser)
Changed in cloud-init:
status: Incomplete → Fix Committed
James Falcon (falcojr) wrote : Fixed in cloud-init version 20.3.

This bug is believed to be fixed in cloud-init version 20.3. If this is still a problem for you, please make a comment and set the state back to New.

Thank you.

Changed in cloud-init:
status: Fix Committed → Fix Released