images should not be largely bigger than the actual content

Bug #1619362 reported by Oliver Grawert
This bug affects 2 people
Affects: Ubuntu Image
Status: Fix Released
Importance: High
Assigned to: Unassigned

Bug Description

Currently, building a Raspberry Pi image results in a 4GB file for less than 300MB of content ... Since we resize the writable partition to full disk size on first boot, this is a lot of wasted space, and it makes dd take enormously long just to write zeros to the SD card.
ubuntu-image should check the actual space requirements of the content that gets added to the image, and probably add 10-100MB of wiggle room, so we don't end up with gigantic, mostly-empty images.

Revision history for this message
Steve Langasek (vorlon) wrote :

The image is created as a sparse file, so it does not occupy 4GB on disk. We use xz for compression everywhere, so the sparseness is preserved when decompressing. What is the use case for which the pre-expanded filesystem causes a problem? Note that even dd has sparse handling (though this was unfamiliar to me until recently), so it is possible to write efficiently to an SD card as well.

I'm not saying we can't make this change, if first boot resizing is preferred; I'd just like to be sure we understand what is actually impacted.

Changed in ubuntu-image:
status: New → Incomplete
Revision history for this message
Oliver Grawert (ogra) wrote :

dd'ing a sparse 4GB file still writes 4GB of zeros to my SD card, which takes about 20 minutes (what is the magic you mention above to make it not do that?), while a 150MB file (as the pi2 image would likely be if we just matched the content size) takes under a minute ...

Also, when we release images we usually compress them manually using xz, which takes a horrid amount of time just for compressing zeros ... (I understand cdimage will take care of this for future images, though.)

Additionally, you do force the user to have that amount of disk space available if they want to uncompress the image before writing it.

Revision history for this message
Steve Langasek (vorlon) wrote : Re: [Bug 1619362] Re: images should not be largely bigger than the actual content

On Thu, Sep 01, 2016 at 05:09:32PM -0000, Oliver Grawert wrote:
> dd'ing a sparse 4GB file still writes 4GB of zeros to my SD card, which
> takes about 20 minutes (what is the magic you mention above to make it
> not do that?)

The `conv=sparse` option to dd.

> while a 150MB file (as the pi2 image would likely be if we just matched
> the content size) takes under a minute ...

The rootfs image we create shows here as 289M (du -sh
workdir/.images/root.img). Is this so much smaller because of filesystem
overhead when creating a larger image?

> Also, when we release images we usually compress them manually using xz,
> which takes a horrid amount of time just for compressing zeros ... (I
> understand cdimage will take care of this for future images, though.)

xz does create sparse files by default, but it's possible that it doesn't
support /reading/ sparsely on input. If so, that's also a good argument for
creating the image smaller and expanding it on first boot, though the impact
would be only on the build process and not on the user experience.

> Additionally, you do force the user to have that amount of disk space
> available if they want to uncompress the image before writing it.

Certainly not. xz uses sparse unpacking by default.

Revision history for this message
Oliver Grawert (ogra) wrote :

ogra@anubis:~$ ls -l datengrab/images/snappy/u-image-pi2.img
-rw-r--r-- 1 root root 4000000000 Sep 1 17:30 datengrab/images/snappy/u-image-pi2.img
...
ogra@anubis:~$ du -sh datengrab/images/snappy/u-image-pi2.img
3,8G datengrab/images/snappy/u-image-pi2.img

this image was created using http://people.canonical.com/~ogra/snappy/pi2-model.assertion with http://people.canonical.com/~ogra/snappy/ubuntu-image_0.5_amd64.snap (snapcraft build from a PR from mvo with some extra fixes to master) ...

The actual content of the image is an 85MB pi2-kernel snap
and a 56MB ubuntu-core snap, plus whatever overhead the image creation adds (pre-created writable bits, unpacked gadget, copied kernel and initrd files, etc.); nearly 300MB actually sounds pretty large for that content.

I wasn't aware conv=sparse could be used when writing SD cards; I'll play with that, thanks!

Revision history for this message
Steve Langasek (vorlon) wrote :

On Thu, Sep 01, 2016 at 06:34:51PM -0000, Oliver Grawert wrote:
> ogra@anubis:~$ ls -l datengrab/images/snappy/u-image-pi2.img
> -rw-r--r-- 1 root root 4000000000 Sep 1 17:30 datengrab/images/snappy/u-image-pi2.img
> ...
> ogra@anubis:~$ du -sh datengrab/images/snappy/u-image-pi2.img
> 3,8G datengrab/images/snappy/u-image-pi2.img

> this image was created using
> http://people.canonical.com/~ogra/snappy/pi2-model.assertion with
> http://people.canonical.com/~ogra/snappy/ubuntu-image_0.5_amd64.snap
> (snapcraft build from a PR from mvo with some extra fixes to master) ...

Ok, I've been able to reproduce the problem now. The issue is that you were
probably using sensible default options, so ubuntu-image was doing all of its
building in /tmp, whereas I've been futzing with ubuntu-image internals and
was therefore doing all of my builds in a named working directory. The final
stage of the build moves the image from the working directory to its output
location with shutil.move(), which falls back to shutil.copy2() for a
cross-filesystem move (e.g. if /tmp is a tmpfs), and that function is not
sparse-aware.

  https://bugs.python.org/issue10016

So if you're doing all the building in your current directory, you get a
sparse file that is simply renamed, and there's no problem. If you're doing it
under /tmp, you get a sparse file that is then copied by a non-sparse-aware
Python function.

I believe this may also explain the report in bug #1619351, and I believe that
also means that switching to os.truncate() won't fix the problem.
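As an illustration of the failure mode (my own sketch, not ubuntu-image code): a file created with truncate() occupies almost no blocks on disk, while a shutil.copy2() copy of it typically ends up fully allocated, because the holes are read back as zeros and written out verbatim.

```python
import os
import shutil
import tempfile

tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, "root.img")
dst = os.path.join(tmpdir, "copy.img")

# Create a 100MB sparse file: full logical size, (almost) no blocks allocated.
with open(src, "wb") as f:
    f.truncate(100 * 1024 * 1024)

# copy2() reads the holes back as zeros and writes them out verbatim,
# so on most filesystems the copy loses its sparseness.
shutil.copy2(src, dst)

def allocated(path):
    # st_blocks is counted in 512-byte units (POSIX)
    return os.stat(path).st_blocks * 512

print("src allocated:", allocated(src))  # near zero
print("dst allocated:", allocated(dst))  # typically close to the full 100MB
```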

> The actual content of the image is an 85MB pi2-kernel snap and a 56MB
> ubuntu-core snap, plus whatever overhead the image creation adds
> (pre-created writable bits, unpacked gadget, copied kernel and initrd
> files, etc.); nearly 300MB actually sounds pretty large for that content.

snap prepare-image stores two copies of the snaps: one in
/var/lib/snapd/seed/snaps and one in /var/lib/snapd/snaps. So it doesn't
look like filesystem overhead is the cause of the size increase here.

Revision history for this message
Oliver Grawert (ogra) wrote :

Oh, indeed ... I totally forgot about rollback ... yeah, the ~300MB size seems fine with all snaps duplicated.

Revision history for this message
Barry Warsaw (barry) wrote :

I knew about potential cross-filesystem copies, but didn't realize they had the effect of de-sparsifying the file. IIRC we used some other mechanism in an earlier version of the code that adjusted for cross-fs copies, but I don't remember the details now.

At the very least we should verify the sparseness both before and after the copy, or warn if we know we're doing a cross-fs copy. I'm not sure of the best way to check for sparseness in Python, other than perhaps shelling out. We'll also need an alternative for cross-fs copies in that case.
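For what it's worth, a plausible pure-Python check is possible without shelling out (a sketch; `is_sparse` is my own name, not an existing API), by comparing the blocks actually allocated on disk with the file's logical size:

```python
import os
import tempfile

def is_sparse(path):
    """Heuristic: a file is sparse when the blocks actually allocated on
    disk cover less than its logical size. st_blocks is counted in
    512-byte units per POSIX."""
    st = os.stat(path)
    return st.st_blocks * 512 < st.st_size

# Demo: a truncate()-created file is all hole ...
tmpdir = tempfile.mkdtemp()
hole = os.path.join(tmpdir, "hole.img")
with open(hole, "wb") as f:
    f.truncate(8 * 1024 * 1024)

# ... while a file of written (incompressible) data is not.
data = os.path.join(tmpdir, "data.img")
with open(data, "wb") as f:
    f.write(os.urandom(1024 * 1024))

print(is_sparse(hole), is_sparse(data))
```

A check like this could run on the image both before and after the final move, warning if sparseness was lost along the way.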

Revision history for this message
Oliver Grawert (ogra) wrote :

Excuse my ignorance, but what is the benefit of using a sparse file at all over a properly sized image? I really don't see any advantage in wasting all these zeros :)

Revision history for this message
Steve Langasek (vorlon) wrote :

On Fri, Sep 02, 2016 at 04:04:30PM -0000, Oliver Grawert wrote:
> Excuse my ignorance, but what is the benefit of using a sparse file at
> all over a properly sized image? I really don't see any advantage in
> wasting all these zeros :)

I'm not opposed to us switching to online resizing of the filesystem. But
here are a couple of reasons why, in general, we should preserve sparseness
and not just rely on online resizing:

 - Only the rootfs is auto-expanded. All the other filesystems get as much
   space as is initially allocated in the gadget snap, so there are going to
   be a lot of zeroes there.
 - Some partition schemes may require placing additional partitions at the
   end of the disk, /after/ the writable partition. In this case, we won't
   be able to online resize anything, but will instead have to statically
   allocate it.

Revision history for this message
Oliver Grawert (ogra) wrote :

I totally understand why you would have sparse partition images that leave the needed wiggle room for partition content ...

What I was referring to was the image as a whole being a giant sparse blob, which I find unnecessary ...

In the case where something lives after the writable partition, you are indeed correct that we need sparseness there.

Revision history for this message
Michael Vogt (mvo) wrote :

As a data point: using `dd if=image.img of=/dev/device conv=sparse` is not practical. It is very fast (of course), but e.g. uboot.env needs to be written with a lot of \0 bytes in it, and conv=sparse just jumps over these (see dd.c:iwrite()), which leads to e.g. a corrupted uboot.env (its checksum no longer matches). So we would need something like a `conv=keep-sparse` that would only skip a write if the input is also sparse there; stock dd does not seem to do that.
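The `conv=keep-sparse` idea can be sketched in Python using SEEK_DATA/SEEK_HOLE (Linux-only; this is an illustration with my own function name, not shipped code): skip only the ranges that are real holes in the input, and still write out data blocks that merely contain zeros, such as a zero-filled uboot.env.

```python
import os

def copy_keep_sparse(src, dst, bufsize=1 << 20):
    """Copy src to dst, skipping only ranges that are actual holes in src
    (found via SEEK_DATA/SEEK_HOLE). Unlike `dd conv=sparse`, which skips
    any all-zero block, data blocks that merely contain zero bytes (e.g.
    a zero-filled uboot.env) are still written out."""
    with open(src, "rb", buffering=0) as fsrc, open(dst, "wb") as fdst:
        size = os.fstat(fsrc.fileno()).st_size
        fdst.truncate(size)  # give dst its full logical size up front
        offset = 0
        while offset < size:
            try:
                start = os.lseek(fsrc.fileno(), offset, os.SEEK_DATA)
            except OSError:
                break  # ENXIO: no data left, the rest is one big hole
            end = os.lseek(fsrc.fileno(), start, os.SEEK_HOLE)
            fsrc.seek(start)
            fdst.seek(start)
            remaining = end - start
            while remaining > 0:
                chunk = fsrc.read(min(bufsize, remaining))
                fdst.write(chunk)
                remaining -= len(chunk)
            offset = end
```

Note that when the destination is a block device rather than a regular file, the skipped ranges keep whatever stale bytes are already on the medium, so this is only safe if those areas are pre-zeroed or don't matter.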

Revision history for this message
Steve Langasek (vorlon) wrote :

> As a data point: using `dd if=image.img of=/dev/device conv=sparse`
> is not practical. It is very fast (of course), but e.g. uboot.env needs to
> be written with a lot of \0 bytes in it, and conv=sparse just jumps over
> these (see dd.c:iwrite()), which leads to e.g. a corrupted uboot.env

This is a very good argument. Bumping priority of this bug accordingly.

Changed in ubuntu-image:
importance: Undecided → High
status: Incomplete → Triaged
Steve Langasek (vorlon)
Changed in ubuntu-image:
status: Triaged → Fix Released