~mamarley/openconnect/+git/gitlab-main:aesni

Last commit made on 2021-07-02
Get this branch:
git clone -b aesni https://git.launchpad.net/~mamarley/openconnect/+git/gitlab-main

Recent commits

96686a8... by dwmw2

Add AES-NI ESP implementation

Signed-off-by: David Woodhouse <email address hidden>
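
For context, AES-NI exposes whole AES rounds as single CPU instructions,
reachable from C via the wmmintrin.h intrinsics. A minimal sketch of
encrypting one block with pre-expanded AES-128 round keys (an illustration
of the instructions involved, not the OpenConnect/CRYPTOGAMS code) looks
like this:

/* Encrypt one 16-byte block with pre-expanded AES-128 round keys
 * (11 x 128 bits) using the AES-NI intrinsics. Compile with -maes.
 * Illustration only, not the actual OpenConnect implementation. */
#include <wmmintrin.h>

static void aes128_encrypt_block(const __m128i rk[11],
                                 const unsigned char in[16],
                                 unsigned char out[16])
{
        __m128i b = _mm_loadu_si128((const __m128i *)in);

        b = _mm_xor_si128(b, rk[0]);            /* initial AddRoundKey */
        for (int i = 1; i < 10; i++)
                b = _mm_aesenc_si128(b, rk[i]); /* one full AES round each */
        b = _mm_aesenclast_si128(b, rk[10]);    /* final round, no MixColumns */

        _mm_storeu_si128((__m128i *)out, b);
}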

d3ebd62... by dwmw2

Import CRYPTOGAMS ASM code (from OpenSSL 3.0.0-beta1)

The CRYPTOGAMS code is suitably licensed for use in OpenConnect under
LGPLv2.1, and gives us a 40% speedup to ESP AES-SHA1 encryption.

However, not everything is in the standalone CRYPTOGAMS repository, so
we have to import from OpenSSL itself for now, which means the licence
is incompatible.

Once https://github.com/dot-asm/cryptogams/issues/7 is resolved, we can
do this for real. But for now it's worth having it to experiment with.

Really needs SHA256 too...

Signed-off-by: David Woodhouse <email address hidden>

3629c27... by dwmw2

Add esptest microbenchmark for ESP encryption

Signed-off-by: David Woodhouse <email address hidden>
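
The general shape of such a microbenchmark is just a timed tight loop over
the encryption path. The sketch below uses a trivial stand-in for the
routine under test and placeholder sizes; it is not the actual esptest code:

/* Rough shape of an ESP-encryption microbenchmark. The routine being
 * timed here is a trivial stand-in, not the real ESP code. */
#include <stdio.h>
#include <string.h>
#include <time.h>

#define PKT_LEN     1500
#define ITERATIONS  100000L

/* Stand-in for whatever ESP encrypt routine is actually being measured. */
static void esp_encrypt_stub(unsigned char *pkt, int len)
{
        for (int i = 0; i < len; i++)
                pkt[i] ^= 0x5a;
}

int main(void)
{
        unsigned char pkt[PKT_LEN];
        struct timespec start, end;

        memset(pkt, 0xa5, sizeof(pkt));
        clock_gettime(CLOCK_MONOTONIC, &start);
        for (long i = 0; i < ITERATIONS; i++)
                esp_encrypt_stub(pkt, PKT_LEN);
        clock_gettime(CLOCK_MONOTONIC, &end);

        double secs = (end.tv_sec - start.tv_sec) +
                      (end.tv_nsec - start.tv_nsec) / 1e9;
        printf("%ld packets in %.3fs => %.1f Mb/s\n",
               ITERATIONS, secs, ITERATIONS * PKT_LEN * 8 / secs / 1e6);
        return 0;
}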

a01005e... by dwmw2

Allow ESP functions to be overridden

Signed-off-by: David Woodhouse <email address hidden>
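
The usual C pattern for this is a function pointer that defaults to the
generic routine and can be repointed at an accelerated one. A hedged
sketch with stand-in names, not the real OpenConnect symbols:

/* Illustrative only: making the ESP routines overridable via a function
 * pointer that defaults to the portable implementation. */
struct esp_ctx;   /* opaque here */

int generic_esp_encrypt(struct esp_ctx *esp, unsigned char *pkt, int len);
int aesni_esp_encrypt(struct esp_ctx *esp, unsigned char *pkt, int len);

/* Default to the generic implementation... */
static int (*esp_encrypt)(struct esp_ctx *, unsigned char *, int) =
        generic_esp_encrypt;

/* ...and repoint it when an accelerated one is available. */
static void select_esp_impl(int have_aesni)
{
        if (have_aesni)
                esp_encrypt = aesni_esp_encrypt;
}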

167cf7a... by dwmw2

vhost: Add USED_EVENT and AVAIL_EVENT macros

Open-coding this was kind of awful. I mean, it's *still* fairly awful
but now we can hide it away in the macro and never think about it again.

Signed-off-by: David Woodhouse <email address hidden>
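
With VIRTIO_RING_F_EVENT_IDX, the driver's used_event index sits just past
the avail ring entries and the device's avail_event index just past the
used ring entries, which is why the open-coded access is so ugly. A sketch
of what such macros typically look like, with an illustrative ring struct
rather than OpenConnect's real one:

/* Sketch only: the used_event/avail_event indices live at the tail of the
 * opposite ring, so hide the pointer arithmetic behind macros. */
#include <stdint.h>
#include <linux/virtio_ring.h>

struct oc_vring {                 /* illustrative, not the real struct name */
        unsigned int num;         /* ring size */
        struct vring_avail *avail;
        struct vring_used *used;
};

#define USED_EVENT(r)   ((r)->avail->ring[(r)->num])
#define AVAIL_EVENT(r)  (*(uint16_t *)&(r)->used->ring[(r)->num])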

acc4bae... by dwmw2

vhost: Avoid TX queue when writing directly is faster

Using vhost makes high-volume transfers go nice and fast, especially since
we are using 100% of a CPU in the single-threaded OpenConnect process and
just offloading the kernel←→user copies for the tun packets to the vhost
thread instead of having to do them from our single thread too.

However, for a lightly used link with *occasional* packets, which is
pretty much the definition of a VPN being used for VoIP, it adds a lot
of unwanted latency. If our userspace thread is otherwise going to be
*idle*, and fall back into select() to wait for something else to do,
then we might as well just write the packet *directly* to the tun
device.

So... when the queue is stopped and would need to be kicked, and if
there are only a *few* (heuristic: half max_qlen) packets on the
queue to be sent, just send them directly.

Signed-off-by: David Woodhouse <email address hidden>
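
Conceptually, the TX-path decision ends up looking something like the
sketch below; the struct and helper functions are stand-ins, not the
actual code:

/* Illustrative sketch of the TX decision described above. */
struct tx_ctx {
        int max_qlen;        /* configured queue length */
        int queued;          /* packets currently waiting to be sent */
        int queue_stopped;   /* vhost TX ring idle and would need a kick */
};

void write_packets_to_tun(struct tx_ctx *ctx);   /* stand-in helpers */
void queue_packets_on_vhost(struct tx_ctx *ctx);
void kick_vhost_tx(struct tx_ctx *ctx);

static void flush_outgoing(struct tx_ctx *ctx)
{
        if (ctx->queue_stopped && ctx->queued <= ctx->max_qlen / 2) {
                /* Lightly loaded: we would otherwise go back to select()
                 * anyway, so write straight to the tun device and skip the
                 * vhost round trip (and its latency). */
                write_packets_to_tun(ctx);
        } else {
                queue_packets_on_vhost(ctx);
                kick_vhost_tx(ctx);
        }
}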

87bbf24... by dwmw2

Use vhost for dtls-psk and sigterm tests

Signed-off-by: David Woodhouse <email address hidden>

c6ef119... by dwmw2

Initial vhost-net support

We spend a lot of CPU time copying packets between kernel and userspace.

Eventually we want to implement a completely in-kernel data path. It
isn't even that hard, now that most of the functionality we need from
the kernel is there and it's mostly just a case of fitting it together.

In the meantime, though, there are a few things we can do even on today's
released kernels. For a start, we can use vhost-net to avoid having to
do the read()/write() on the tun device in our mainloop.

Ultimately, it ends up being done by a kernel thread instead; it doesn't
really go away. But that should at least give us a performance win which
would compare with a decent threading model, while allowing OpenConnect
to remain naïvely single-threaded and lock-free.

We have to carefully pick a configuration for vhost-net which actually
works, since it's fairly hosed for IFF_TUN support:
https://lore.<email address hidden>/T/

But by limiting the sndbuf (which disables XDP, sadly) and by requesting
a virtio header that we don't actually want, we *can* make it work even
with today's production kernels.

Thanks to Eugenio Pérez Martín <email address hidden> for his blog at
https://www.redhat.com/en/blog/virtqueues-and-virtio-ring-how-data-travels
and for lots more help and guidance as I floundered around trying to make
this work.

Although this gives a 10% improvement on the bandwidth we can manage in
my testing (up to 2.75Gb/s with other tricks, on a c5.8xlarge Skylake VM),
it also introduces a small amount of extra latency, so disable it by
default unless the user has bumped the queue length to 16 or more, which
presumably means they favour bandwidth over latency.

Signed-off-by: David Woodhouse <email address hidden>
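
The tun-side half of that workaround (a finite sndbuf plus a vnet header
we then ignore) looks roughly like this, using the standard Linux if_tun
ioctls. This is a hedged sketch, not the exact OpenConnect code; error
handling and the /dev/vhost-net setup are omitted:

/* Sketch of the tun configuration described above. */
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/if.h>
#include <linux/if_tun.h>
#include <linux/virtio_net.h>

static int open_tun_for_vhost(const char *name)
{
        struct ifreq ifr;
        int sndbuf = 1024 * 1024;   /* finite sndbuf: disables XDP, sadly */
        int hdrlen = sizeof(struct virtio_net_hdr);
        int fd = open("/dev/net/tun", O_RDWR);

        memset(&ifr, 0, sizeof(ifr));
        /* IFF_VNET_HDR: request a virtio header we don't actually want,
         * because vhost-net mishandles plain IFF_TUN without it. */
        ifr.ifr_flags = IFF_TUN | IFF_NO_PI | IFF_VNET_HDR;
        strncpy(ifr.ifr_name, name, IFNAMSIZ - 1);

        if (fd < 0 ||
            ioctl(fd, TUNSETIFF, &ifr) < 0 ||
            ioctl(fd, TUNSETSNDBUF, &sndbuf) < 0 ||
            ioctl(fd, TUNSETVNETHDRSZ, &hdrlen) < 0)
                return -1;

        return fd;
}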

e1564bf... by dwmw2

Stop accepting DTLS packets when the queue is full

Signed-off-by: David Woodhouse <email address hidden>

d120d8b... by dwmw2

Clear epoll_fd after forking to background self

Otherwise the parent process removes the events from the epoll_fd before
it exits.

This would be a bit awful if it were something we required all users of
libopenconnect to know about, but it isn't. We make everything O_CLOEXEC
and we don't expect users to be calling openconnect_vpninfo_free() from
another process after forking, like the background code does. We only
do it there so that we can check for memory leaks (I think).

Signed-off-by: David Woodhouse <email address hidden>
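
The shape of the fix is simply that the parent forgets the epoll fd before
running its cleanup path, so the child's event registrations survive. An
illustrative sketch with stand-in names, not the real OpenConnect code:

/* Illustrative only: after forking into the background, the parent must
 * not let its cleanup path remove events from the epoll fd it now shares
 * with the child. */
#include <unistd.h>

struct ctx {
        int epoll_fd;        /* shared epoll instance after fork() */
};

void free_ctx(struct ctx *ctx);  /* stand-in: would epoll_ctl(DEL) if fd >= 0 */

static void background_self(struct ctx *ctx)
{
        pid_t pid = fork();

        if (pid > 0) {
                /* Parent: forget the epoll fd so cleanup leaves the child's
                 * event registrations alone, then free (only so that memory
                 * leaks can be checked) and exit. */
                ctx->epoll_fd = -1;
                free_ctx(ctx);
                _exit(0);
        }
        /* Child (pid == 0) carries on with the VPN connection. */
}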