Comment 3 for bug 1891473

Guilherme G. Piccoli (gpiccoli) wrote :

After some investigation, it seems we were able to narrow down the issue.

* tl;dr: After the PPA builder was upgraded to Bionic, the memlock limit (ulimit -l) was bumped from a ridiculously low value (64, i.e. 64 KiB) to something bigger (16M). As a result, cryptsetup's call to mlockall() now succeeds, so its memory allocations become restricted by that limit, which is still fairly low and ends up causing allocation failures.
When the limit is very low (as on the Xenial builder), the lock call fails, cryptsetup's allocations are not subject to this restriction, and everything just works.
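The mechanism above can be checked outside cryptsetup. Below is a minimal Linux sketch; memlock_experiment() and the outcome names are hypothetical, and it assumes (consistent with the EAGAIN in the traces below) that the lock is taken with mlockall(MCL_CURRENT | MCL_FUTURE). In a forked child it sets the soft RLIMIT_MEMLOCK, tries the lock, then maps an anonymous region.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Outcome of one experiment, passed back through the child's exit status. */
enum memlock_outcome {
    OUTCOME_ERROR = 1,             /* unexpected failure (fork, setrlimit, odd errno) */
    OUTCOME_UNLOCKED_ALLOC_OK = 2, /* mlockall() refused; mapping succeeded unrestricted */
    OUTCOME_LOCKED_ALLOC_OK = 3,   /* lock held and the mapping still fit (or CAP_IPC_LOCK) */
    OUTCOME_LOCKED_EAGAIN = 4,     /* lock held and mmap() failed with EAGAIN */
};

/* Hypothetical helper: runs in a child so the caller's limits and locks are untouched. */
static enum memlock_outcome memlock_experiment(rlim_t limit_bytes, size_t alloc_size)
{
    assert(alloc_size > 0);

    pid_t pid = fork();
    if (pid < 0)
        return OUTCOME_ERROR;

    if (pid == 0) {
        /* Set the soft limit, capped at the hard limit; this needs no privilege. */
        struct rlimit rl;
        if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
            _exit(OUTCOME_ERROR);
        if (rl.rlim_max == RLIM_INFINITY || limit_bytes < rl.rlim_max)
            rl.rlim_cur = limit_bytes;
        else
            rl.rlim_cur = rl.rlim_max;
        if (setrlimit(RLIMIT_MEMLOCK, &rl) != 0)
            _exit(OUTCOME_ERROR);

        /* MCL_FUTURE makes later mappings count against RLIMIT_MEMLOCK too. */
        int locked = mlockall(MCL_CURRENT | MCL_FUTURE) == 0;

        void *p = mmap(NULL, alloc_size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p != MAP_FAILED)
            _exit(locked ? OUTCOME_LOCKED_ALLOC_OK : OUTCOME_UNLOCKED_ALLOC_OK);
        _exit(locked && errno == EAGAIN ? OUTCOME_LOCKED_EAGAIN : OUTCOME_ERROR);
    }

    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return OUTCOME_ERROR;
    return (enum memlock_outcome)WEXITSTATUS(status);
}
```

For an unprivileged process one would expect memlock_experiment(64 * 1024, 4u << 20) to give OUTCOME_UNLOCKED_ALLOC_OK (the Xenial situation) and memlock_experiment(16u << 20, 32u << 20) to give OUTCOME_LOCKED_EAGAIN (the LXD situation); a process with CAP_IPC_LOCK bypasses the limit. The 32 MiB request is deliberately larger than the limit, since how much cryptsetup had already locked in the failing run isn't known.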

See the "Conclusion" section for alternatives on how to fix this.

* Details:
I managed to reproduce it by collecting the luks2-validation images in a local environment, running a Bionic VM + LXD (a Focal container). Collecting the strace of luksDump in both environments, we got the following:

### LXD - NOT working
...
openat(AT_FDCWD, "./luks2-metadata-size-4m.img", O_RDONLY|O_DIRECT) = 5
fstat(5, {st_mode=S_IFREG|0644, st_size=16777216, ...}) = 0
fstat(6, {st_mode=S_IFREG|0644, st_size=16777216, ...}) = 0
lseek(5, 0, SEEK_SET) = 0
read(5, "LUKS\272\276\0\2\0\0\0\0\0@\0\0\0\0\0\0\0\0\0\n\0\0\0\0\0\0\0\0"..., 4096) = 4096
mmap(NULL, 4194304, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 EAGAIN (Resource temporarily unavailable)
brk(0x55e789c38000) = 0x55e78982d000
mmap(NULL, 4325376, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 EAGAIN (Resource temporarily unavailable)
lseek(5, 16384, SEEK_SET) = 16384
read(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
lseek(5, 32768, SEEK_SET) = 32768
read(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
lseek(5, 65536, SEEK_SET) = 65536
read(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
...

### VM - working
...
openat(AT_FDCWD, "./luks2-metadata-size-4m.img", O_RDONLY|O_DIRECT) = 5
fstat(5, {st_mode=S_IFREG|0644, st_size=16777216, ...}) = 0
fstat(6, {st_mode=S_IFREG|0644, st_size=16777216, ...}) = 0
lseek(5, 0, SEEK_SET) = 0
read(5, "LUKS\272\276\0\2\0\0\0\0\0@\0\0\0\0\0\0\0\0\0\n\0\0\0\0\0\0\0\0"..., 4096) = 4096
mmap(NULL, 4194304, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6b06031000
lseek(5, 4096, SEEK_SET) = 4096
mmap(NULL, 4198400, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6b05c30000

So: as mmap() fails, lseeks start being attempted at other offsets, 2^N KiB for N = 4, 5, 6, ... (16384, 32768, 65536, ...).

In the cryptsetup code, function LUKS2_disk_hdr_read() in lib/luks2/luks2_disk_metadata.c, we fail and then try all known offsets, as per the code below:

[...]
                 * No header size, check all known offsets.
                 */
                for (r = -EINVAL,i = 0; r < 0 && i < ARRAY_SIZE(hdr2_offsets); i++)
[...]

This explains why, in the failing LXD case, we see so many lseeks at different offsets.
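The retry behaviour can be sketched in isolation. This is a minimal model, assuming hdr2_offsets runs from 16 KiB to 4 MiB in powers of two (the exact list is an assumption here); known_offsets, struct probe_log, failing_read_logged() and probe_known_offsets() are hypothetical names, and the reader stands in for hdr_read_disk() failing with -ENOMEM on every attempt, as in the LXD case.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Assumed secondary-header offsets: 2^N KiB for N = 4..12, i.e. 16 KiB .. 4 MiB. */
static const uint64_t known_offsets[] = {
    0x4000, 0x8000, 0x10000, 0x20000, 0x40000,
    0x80000, 0x100000, 0x200000, 0x400000,
};

/* Records every offset a reader was asked to try. */
struct probe_log {
    uint64_t seen[ARRAY_SIZE(known_offsets)];
    size_t count;
};

/* Hypothetical header reader: returns 0 on success or a negative errno. */
typedef int (*read_hdr_fn)(uint64_t offset, struct probe_log *plog);

/* Hypothetical stand-in for the LXD case: the JSON-area malloc() always fails. */
static int failing_read_logged(uint64_t offset, struct probe_log *plog)
{
    if (plog->count < ARRAY_SIZE(plog->seen))
        plog->seen[plog->count++] = offset;
    return -ENOMEM;
}

/* Same loop shape as the quoted code: stop at the first offset that reads. */
static int probe_known_offsets(read_hdr_fn read_hdr, struct probe_log *plog)
{
    int r;
    size_t i;

    assert(read_hdr != NULL && plog != NULL);
    for (r = -EINVAL, i = 0; r < 0 && i < ARRAY_SIZE(known_offsets); i++)
        r = read_hdr(known_offsets[i], plog);
    return r;
}
```

With a reader that always fails, the loop walks the whole table and returns the last error; the first offsets it visits are 16384, 32768 and 65536, the same sequence of lseeks as in the LXD trace.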

But then, why do we fail at all? In the failing case, LUKS2_disk_hdr_read() fails right at the first header read, as per the code in lib/luks2/luks2_disk_metadata.c:

[...]
         * Read primary LUKS2 header (offset 0).
         */
        state_hdr1 = HDR_FAIL;
        r = hdr_read_disk(cd, device, &hdr_disk1, &json_area1, 0, 0);
[...]

The failure comes from a malloc(), specifically in hdr_read_disk():

[...]
        r = hdr_disk_sanity_check_pre(cd, hdr_disk, &hdr_json_size, secondary, offset);
        if (r < 0) {
                return r;
        }
        /*
         * Allocate and read JSON area. Always the whole area must be read.
         */
        *json_area = malloc(hdr_json_size);
        if (!*json_area) {
                return -ENOMEM;
        }
[...]

Without json_area allocated, we end up looping in search of the proper header offset, and the test fails. This malloc() is the one generating the following strace entry (glibc serves an allocation this large with a dedicated mmap() rather than from the heap):

mmap(NULL, 4194304, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 EAGAIN (Resource temporarily unavailable)
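The 4194304 in that entry is consistent with the header bytes read just before it. A worked sketch, assuming the LUKS2 layout where hdr_size is a big-endian 64-bit field at byte 8 (after the 6-byte magic and 2-byte version), a 4 KiB binary header so the JSON area is hdr_size minus 4096, and a 64-bit glibc that adds roughly 16 bytes of chunk bookkeeping and rounds up to 4 KiB pages; all helper and macro names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions: 4 KiB LUKS2 binary header, 4 KiB pages, ~16 bytes of glibc chunk overhead. */
#define LUKS2_BIN_HDR_LEN 4096u
#define PAGE_LEN 4096u
#define MALLOC_OVERHEAD 16u

/* Decode a big-endian 64-bit field, as stored in the on-disk LUKS2 header. */
static uint64_t be64_at(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

static uint64_t round_up(uint64_t x, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0); /* power of two */
    return (x + align - 1) & ~(align - 1);
}

/* hdr_size follows the 6-byte magic and the 2-byte version. */
static uint64_t hdr_size_from_bytes(const unsigned char *hdr) { return be64_at(hdr + 8); }

/* JSON area = whole header minus the binary part. */
static uint64_t json_area_len(uint64_t hdr_size) { return hdr_size - LUKS2_BIN_HDR_LEN; }

/* Approximate length glibc passes to mmap() when malloc(n) bypasses the heap. */
static uint64_t malloc_mmap_len(uint64_t n) { return round_up(n + MALLOC_OVERHEAD, PAGE_LEN); }
```

Decoding the first 16 bytes from the read() in the LXD trace gives hdr_size = 0x400000 (4 MiB), a JSON area of 4190208 bytes, and an mmap() length of 4194304 bytes, matching the failed call.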

* Conclusion: we have two avenues for fixing this; I personally consider (a) below the more correct one.

(a) We could increase the builders' memlock limit to 64M, which Focal now has as its default. This seems to me the proper approach: in real life cryptsetup does perform the memory lock, so the build tests should exercise it that way too.

(b) We could fall back to the Xenial builder scenario by _reducing_ the memlock limit, so that cryptsetup does not set the memory lock at all during the build. The advantage of this approach is its simplicity, since we can decrease the limit from within the package itself; the downside is that the build tests no longer exercise real-life usage.
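Both options come down to RLIMIT_MEMLOCK. A minimal sketch of the underlying system calls, with hypothetical helper names; in practice the package would more likely use the shell's ulimit -l in its build rules, and approach (a) would be configured on the builder. An unprivileged process may lower its soft limit freely, but raising the hard limit needs CAP_SYS_RESOURCE. Note that ulimit -l reports KiB, which is why the old Xenial value shows up as 64.

```c
#include <assert.h>
#include <errno.h>
#include <sys/resource.h>

/* Report the soft memlock limit like `ulimit -l`: KiB, -1 for unlimited, -2 on error. */
static long long memlock_soft_limit_kib(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return -2;
    if (rl.rlim_cur == RLIM_INFINITY)
        return -1;
    return (long long)(rl.rlim_cur / 1024);
}

/*
 * Set the soft memlock limit for this process and the children it starts.
 * Lowering it (approach (b)) makes mlockall() fail as on the Xenial builder.
 * The soft limit can never exceed the hard one, and raising the hard limit
 * (what approach (a) would need) requires privilege, so report -EPERM then.
 */
static int set_memlock_soft_limit(rlim_t bytes)
{
    struct rlimit rl;

    assert(bytes != RLIM_INFINITY); /* only meant for finite limits */
    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return -errno;
    if (rl.rlim_max != RLIM_INFINITY && bytes > rl.rlim_max)
        return -EPERM;
    rl.rlim_cur = bytes;
    if (setrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return -errno;
    return 0;
}
```

Calling set_memlock_soft_limit(64 * 1024) before running the tests reproduces the Xenial-like limit, after which memlock_soft_limit_kib() reports 64, the same number ulimit -l printed there.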

Following approach (b) above, I managed to make the build work: https://launchpad.net/~gpiccoli/+archive/ubuntu/crypt-groovy/+build/19913720

I'll start a mailing-list thread, replying to Colin's PPA builder update message, about the possibility of approach (a).
Cheers,

Guilherme