Linux kernel source tree
Go to file
Jens Axboe 7029acd8a9 io_uring/rsrc: get rid of per-ring io_rsrc_node list
Work in progress, but get rid of the per-ring serialization of resource
nodes, like registered buffers and files. Main issue here is that one
node can otherwise hold up a bunch of other nodes from getting freed,
which is especially a problem for file resource nodes and networked
workloads where some descriptors may not see activity in a long time.

As an example, instantiate an io_uring ring fd and create a sparse
registered file table. Even 2 will do. Then create a socket and register
it as fixed file 0, F0. The number of open files in the app is now 5,
with 0/1/2 being the usual stdin/out/err, 3 being the ring fd, and 4
being the socket. Register this socket (eg "the listener") in slot 0 of
the registered file table. Now add an operation on the socket that uses
slot 0. Finally, loop N times, where each loop creates a new socket,
registers said socket as a file, then unregisters the socket, and
finally closes the socket. This is roughly similar to what a basic
accept loop would look like.

At the end of this loop, it's not unreasonable to expect that there
would still be 5 open files. Each socket created and registered in the
loop is also unregistered and closed. But since the listener socket
registered first still has references to its resource node due to still
being active, each subsequent socket unregistration is stuck behind it
for reclaim. Hence 5 + N files are still open at that point, where N is
awaiting the final put held up by the listener socket.

Rewrite the io_rsrc_node handling to NOT rely on serialization. Struct
io_kiocb now gets explicit resource nodes assigned, with each holding a
reference to the parent node. A parent node is either of type FILE or
BUFFER, which are the two types of nodes that exist. A request can have
two nodes assigned, if it's using both registered files and buffers.
Since request issue and task_work completion is both under the ring
private lock, no atomics are needed to handle these references. It's a
simple unlocked inc/dec. As before, the registered buffer or file table
each hold a reference as well to the registered nodes. Final put of the
node will remove the node and free the underlying resource, eg unmap the
buffer or put the file.

Outside of removing the stall in resource reclaim described above, it
has the following advantages:

1) It's a lot simpler than the previous scheme, and easier to follow.
   No need to specific quiesce handling anymore.

2) There are no resource node allocations in the fast path, all of that
   happens at resource registration time.

3) The structs related to resource handling can all get simplified
   quite a bit, like io_rsrc_node and io_rsrc_data. io_rsrc_put can
   go away completely.

4) Handling of resource tags is much simpler, and doesn't require
   persistent storage as it can simply get assigned up front at
   registration time. Just copy them in one-by-one at registration time
   and assign to the resource node.

The only real downside is that a request is now explicitly limited to
pinning 2 resources, one file and one buffer, where before just
assigning a resource node to a request would pin all of them. The upside
is that it's easier to follow now, as an individual resource is
explicitly referenced and assigned to the request.

With this in place, the above mentioned example will be using exactly 5
files at the end of the loop, not N.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-11-02 15:44:18 -06:00
arch - Explicitly disable the TSC deadline timer when going idle to address 2024-10-20 12:04:32 -07:00
block block-6.12-20241018 2024-10-18 15:53:00 -07:00
certs sign-file,extract-cert: use pkcs11 provider for OPENSSL MAJOR >= 3 2024-09-20 19:52:48 +03:00
crypto This push fixes the following issues: 2024-10-16 08:42:54 -07:00
Documentation Char/Misc/IIO fixes for 6.12-rc4 2024-10-20 13:10:44 -07:00
drivers bluetooth pull request for net: 2024-10-20 14:08:17 -07:00
fs Mashed-up update that I sat on too long: 2024-10-19 08:44:10 -07:00
include io_uring/rsrc: get rid of per-ring io_rsrc_node list 2024-11-02 15:44:18 -06:00
init cfi: fix conditions for HAVE_CFI_ICALL_NORMALIZE_INTEGERS 2024-10-13 22:23:13 +02:00
io_uring io_uring/rsrc: get rid of per-ring io_rsrc_node list 2024-11-02 15:44:18 -06:00
ipc struct fd layout change (and conversion to accessor helpers) 2024-09-23 09:35:36 -07:00
kernel - Add PREEMPT_RT maintainers 2024-10-20 11:30:56 -07:00
lib Rust fixes for v6.12 (2nd) 2024-10-19 08:32:47 -07:00
LICENSES LICENSES: add 0BSD license text 2024-09-01 20:43:24 -07:00
mm mm: fix follow_pfnmap API lockdep assert 2024-10-18 09:50:05 -07:00
net bluetooth pull request for net: 2024-10-20 14:08:17 -07:00
rust Driver core fix for 6.12-rc3 2024-10-13 09:10:52 -07:00
samples [tree-wide] finally take no_llseek out 2024-09-27 08:18:43 -07:00
scripts kbuild: rust: add CONFIG_RUSTC_LLVM_VERSION 2024-10-13 22:22:28 +02:00
security ipe: fallback to platform keyring also if key in trusted keyring is rejected 2024-10-18 12:14:53 -07:00
sound ALSA: hda/conexant - Use cached pin control for Node 0x1d on HP EliteOne 1000 G2 2024-10-16 10:29:57 +02:00
tools BPF fixes: 2024-10-18 16:27:14 -07:00
usr initramfs: shorten cmd_initfs in usr/Makefile 2024-07-16 01:07:52 +09:00
virt sched/fair: Fix external p->on_rq users 2024-10-14 09:14:35 +02:00
.clang-format clang-format: Update with v6.11-rc1's for_each macro list 2024-08-02 13:20:31 +02:00
.cocciconfig
.editorconfig .editorconfig: remove trim_trailing_whitespace option 2024-06-13 16:47:52 +02:00
.get_maintainer.ignore Add Jeff Kirsher to .get_maintainer.ignore 2024-03-08 11:36:54 +00:00
.gitattributes
.gitignore Kbuild updates for v6.12 2024-09-24 13:02:06 -07:00
.mailmap mailmap: add an entry for Andy Chiu 2024-10-17 00:28:08 -07:00
.rustfmt.toml
COPYING
CREDITS CREDITS: sort alphabetically by name 2024-10-09 12:47:19 -07:00
Kbuild
Kconfig
MAINTAINERS Char/Misc/IIO fixes for 6.12-rc4 2024-10-20 13:10:44 -07:00
Makefile Linux 6.12-rc4 2024-10-20 15:19:38 -07:00
README README: Fix spelling 2024-03-18 03:36:32 -06:00

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the reStructuredText markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.