->ki_pos value is unreliable in such cases. For an obvious example,
consider an O_DSYNC write - we feed the data to the page cache and
start IO, then we make sure it's completed. Update of ->ki_pos is dealt
with by the first part; failure in the second ends up with a negative
value returned _and_ ->ki_pos left advanced as if the sync had been
successful. In the same situation write(2) does not advance the file
position at all.
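The rollback can be sketched in userspace; dsync_write_sim() and its arguments are hypothetical simplified stand-ins for the kernel write path, not the actual patch:

```c
#include <assert.h>

/* Sketch: the buffered part advances the position; if the sync part
 * then fails, roll the position back so the caller sees write(2)-like
 * behaviour - an error return with the file position unchanged. */
static long dsync_write_sim(long long *ki_pos, long nbytes, int sync_err)
{
	long long old_pos = *ki_pos;

	*ki_pos += nbytes;		/* page cache write advanced ->ki_pos */
	if (sync_err) {
		*ki_pos = old_pos;	/* don't leave ->ki_pos advanced */
		return sync_err;	/* negative errno */
	}
	return nbytes;
}
```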
Reviewed-by: Christian Brauner <email address hidden>
Reviewed-by: Jens Axboe <email address hidden>
Signed-off-by: Al Viro <email address hidden>
(cherry picked from commit 1939316bf988f3e49a07d9c4dd6f660bf4daa53d)
Signed-off-by: Thadeu Lima de Souza Cascardo <email address hidden>
If an application does O_DIRECT writes with io_uring and the file system
supports IOCB_DIO_CALLER_COMP, then completion of the dio write side is
done from the same task_work that will post the completion event for
said write as well.
Whenever a dio write is done against a file, the inode i_dio_count is
elevated. This enables other callers to use inode_dio_wait() to wait for
previous writes to complete. If we defer the full dio completion to
task_work, we are dependent on that task_work being run before the
inode i_dio_count can be decremented.
If the same task that issues io_uring dio writes with
IOCB_DIO_CALLER_COMP performs a synchronous system call that calls
inode_dio_wait(), then we can deadlock as we're blocked sleeping on
the event to become true, but not processing the completions that will
result in the inode i_dio_count being decremented.
Until we can guarantee that the task_work is run in time, disable the
deferred caller completions.
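The deadlock can be illustrated with a userspace sketch; the types and helpers below are simplified stand-ins, not the kernel API. If the task that holds the deferred completion is the one blocked in inode_dio_wait(), the count never drops:

```c
#include <stdbool.h>

struct inode_sim {
	int i_dio_count;	/* elevated while a dio write is in flight */
};

/* The deferred task_work that would both post the CQE and drop
 * i_dio_count. */
static void dio_task_work_sim(struct inode_sim *inode)
{
	inode->i_dio_count--;
}

/* inode_dio_wait() can only finish once i_dio_count hits zero. If the
 * waiting task is also the one holding the deferred completion and it
 * is blocked here, the task_work never runs: false == deadlock. */
static bool inode_dio_wait_sim(struct inode_sim *inode, bool task_work_ran)
{
	if (task_work_ran)
		dio_task_work_sim(inode);
	return inode->i_dio_count == 0;
}
```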
Fixes: 099ada2c8726 ("io_uring/rw: add write support for IOCB_DIO_CALLER_COMP")
Reported-by: Andres Freund <email address hidden>
Signed-off-by: Jens Axboe <email address hidden>
(cherry picked from commit 838b35bb6a89c36da07ca39520ec071d9250334d)
Signed-off-by: Thadeu Lima de Souza Cascardo <email address hidden>
We could race with SQ thread exit, and if we do, we'll hit a NULL pointer
dereference when the thread is cleared. Grab the SQPOLL data lock before
attempting to get the task cpu and pid for fdinfo, this ensures we have a
stable view of it.
Cc: <email address hidden>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=218032
Reviewed-by: Gabriel Krisman Bertazi <email address hidden>
Signed-off-by: Jens Axboe <email address hidden>
(cherry picked from commit 7644b1a1c9a7ae8ab99175989bfc8676055edb46)
Signed-off-by: Thadeu Lima de Souza Cascardo <email address hidden>
If we specify a valid CQ ring address but an invalid SQ ring address,
we'll correctly spot this, free the allocated pages, and clear the
pointers to NULL. However, we don't clear the ring page count, and hence
will attempt to free the pages again. We've already cleared the address
of the page array when freeing them, but we don't check for that, which
causes a crash on the second free.
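The fix pattern can be sketched on hypothetical simplified structures (not the real io_uring ring bookkeeping): clear the page count together with the array, and bail out if the array is already gone.

```c
#include <stdlib.h>

struct ring_pages_sim {
	void **pages;
	unsigned long nr_pages;
};

static void free_ring_pages_sim(struct ring_pages_sim *r)
{
	unsigned long i;

	if (!r->pages)		/* already freed on the error path */
		return;
	for (i = 0; i < r->nr_pages; i++)
		free(r->pages[i]);
	free(r->pages);
	r->pages = NULL;
	r->nr_pages = 0;	/* the missing piece: clear the count too */
}
```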
A CPU hotplug callback could be issued before wq->all_list was
initialized, resulting in a NULL pointer dereference. The fix is to
fully set up the io_wq before calling
cpuhp_state_add_instance_nocalls().
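The ordering issue can be sketched on simplified stand-in types (not the kernel's io_wq code): a hotplug callback that walks wq->all_list before the list head is initialized would chase a NULL pointer.

```c
#include <stddef.h>

struct list_sim {
	struct list_sim *next, *prev;
};

struct io_wq_sim {
	struct list_sim all_list;	/* zeroed until fully set up */
};

static void io_wq_setup_sim(struct io_wq_sim *wq)
{
	wq->all_list.next = wq->all_list.prev = &wq->all_list;
}

/* Returns 0 if the walk is safe, -1 if it would dereference NULL,
 * i.e. if the callback ran before setup finished. */
static int hotplug_cb_sim(struct io_wq_sim *wq)
{
	if (!wq->all_list.next)
		return -1;
	/* ... walk the list ... */
	return 0;
}
```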
Signed-off-by: Jeff Moyer <email address hidden>
Link: https://<email address hidden>
Signed-off-by: Jens Axboe <email address hidden>
(cherry picked from commit 0f8baa3c9802fbfe313c901e1598397b61b91ada)
Signed-off-by: Thadeu Lima de Souza Cascardo <email address hidden>
On at least arm32, but presumably any arch with highmem, if the
application passes in memory that resides in highmem for the rings,
then we should fail that ring creation. We fail it with -EINVAL, which
is what kernels that don't support IORING_SETUP_NO_MMAP will do as well.
Cc: <email address hidden>
Fixes: 03d89a2de25b ("io_uring: support for user allocated memory for rings/sqes")
Signed-off-by: Jens Axboe <email address hidden>
(cherry picked from commit 223ef474316466e9f61f6e0064f3a6fe4923a2c5)
Signed-off-by: Thadeu Lima de Souza Cascardo <email address hidden>
io_lockdep_assert_cq_locked() checks that locking is correctly done when
a CQE is posted. If the ring is setup in a disabled state with
IORING_SETUP_R_DISABLED, then ctx->submitter_task isn't assigned until
the ring is later enabled. We generally don't post CQEs in this state,
as no SQEs can be submitted. However it is possible to generate a CQE
if tagged resources are being updated. If this happens and PROVE_LOCKING
is enabled, then the locking check helper will dereference
ctx->submitter_task, which hasn't been set yet.
Fix up io_lockdep_assert_cq_locked() to handle this case correctly. While
at it, convert it to a static inline as well, so that generated line
offsets will actually reflect which condition failed, rather than just
the line offset for io_lockdep_assert_cq_locked() itself.
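The NULL-safe check can be sketched on hypothetical simplified types - this is not the kernel's io_lockdep_assert_cq_locked() itself. While the ring is still R_DISABLED, submitter_task is NULL and there is nothing to assert against:

```c
#include <stdbool.h>
#include <stddef.h>

struct task_sim { int pid; };

struct ctx_sim {
	struct task_sim *submitter_task;	/* NULL until ring enabled */
	bool uring_locked;
};

/* static inline in the kernel, so generated line offsets point at the
 * failing condition; here just a plain function. */
static inline bool cq_locked_ok(const struct ctx_sim *ctx,
				const struct task_sim *current_task)
{
	if (!ctx->submitter_task)	/* R_DISABLED: skip the check */
		return true;
	return ctx->submitter_task == current_task && ctx->uring_locked;
}
```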
syzbot reports that registering a mapped buffer ring on arm32 can
trigger an OOPS. Registered buffer rings have two modes, one of them
is the application passing in the memory that the buffer ring should
reside in. Once those pages are mapped, we use page_address() to get
a virtual address. This will obviously fail on highmem pages, which
aren't mapped.
Add a check if we have any highmem pages after mapping, and fail the
attempt to register a provided buffer ring if we do. This will return
the same error as kernels that don't support provided buffer rings to
begin with.
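The post-mapping check can be sketched on a hypothetical simplified page type (not the kernel's struct page / page_address()): reject the registration if any mapped page is highmem, returning the same -EINVAL that kernels without the feature return.

```c
#include <errno.h>
#include <stdbool.h>

struct page_sim {
	bool highmem;	/* stands in for PageHighMem() */
};

static int check_buf_ring_pages(const struct page_sim *pages, int nr)
{
	int i;

	for (i = 0; i < nr; i++) {
		if (pages[i].highmem)
			return -EINVAL;	/* page_address() would fail */
	}
	return 0;
}
```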
Link: https://<email address hidden>/
Fixes: c56e022c0a27 ("io_uring: add support for user mapped provided buffer ring")
Cc: <email address hidden>
Reported-by: <email address hidden>
Signed-off-by: Jens Axboe <email address hidden>
(cherry picked from commit f8024f1f36a30a082b0457d5779c8847cea57f57)
Signed-off-by: Thadeu Lima de Souza Cascardo <email address hidden>
This is unionized with the actual link flags, so they can of course be
set and will be evaluated further down. If we don't consider them, we
fail any LINKAT that has to set option flags.
Fixes: cf30da90bc3a ("io_uring: add support for IORING_OP_LINKAT")
Cc: <email address hidden>
Reported-by: Thomas Leonard <email address hidden>
Link: https://github.com/axboe/liburing/issues/955
Signed-off-by: Jens Axboe <email address hidden>
(cherry picked from commit a52d4f657568d6458e873f74a9602e022afe666f)
Signed-off-by: Thadeu Lima de Souza Cascardo <email address hidden>
io_uring/net: fix iter retargeting for selected buf
When using the selected buffer feature, io_uring delays data iter setup
until later. If io_setup_async_msg() is called before that, it might see
an iterator that hasn't been set up correctly. Pre-initialize nr_segs
and judge from its state whether we need to repoint the iterator.
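The nr_segs idea can be sketched on hypothetical simplified types, not the real io_uring msg structures: nr_segs starts at 0, is set by iter setup, and retargeting only repoints the iovec when setup has actually run.

```c
#include <stddef.h>

struct iter_sim {
	int nr_segs;		/* 0 == iter not set up yet */
	const void *iov;
};

static void iter_setup_sim(struct iter_sim *it, const void *iov, int segs)
{
	it->iov = iov;
	it->nr_segs = segs;
}

static void iter_retarget_sim(struct iter_sim *it, const void *new_iov)
{
	if (!it->nr_segs)	/* selected buffer: setup delayed, skip */
		return;
	it->iov = new_iov;
}
```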
Cc: <email address hidden>
Reported-by: <email address hidden>
Fixes: 0455d4ccec548 ("io_uring: add POLL_FIRST support for send/sendmsg and recv/recvmsg")
Signed-off-by: Pavel Begunkov <email address hidden>
Link: https://<email address hidden>
Signed-off-by: Jens Axboe <email address hidden>
(cherry picked from commit c21a8027ad8a68c340d0d58bf1cc61dcb0bc4d2f)
Signed-off-by: Thadeu Lima de Souza Cascardo <email address hidden>