[rlc-10/6.12.0-211.26.1.el10_2] Multiple patches tested (7 commits) by ciq-kernel-automation[bot] · Pull Request #1384 · ctrliq/kernel-src-tree

ciq-kernel-automation · 2026-06-26T19:33:09Z

Summary

This PR has been automatically created after successful completion of all CI stages.

Commit Message(s)

writeback: Avoid contention on wb->list_lock when switching inodes

jira SECO-535
bugfix: writeback softlockups
commit-author Jan Kara <jack@suse.cz>
commit e1b849cfa6b61f1c866a908c9e8dd9b5aaab820b
upstream-diff | The change to bdi_writeback propagates a kABI
	breakage through every pointer to it and to
	backing_dev_info, so we have to use RH_KABI_EXTEND() on
	bdi_writeback to prevent the CRC miscalculation.

writeback: Avoid softlockup when switching many inodes

jira SECO-535
bugfix: writeback softlockups
commit-author Jan Kara <jack@suse.cz>
commit 66c14dccd810d42ec5c73bb8a9177489dfd62278

writeback: Avoid excessively long inode switching times

jira SECO-535
bugfix: writeback softlockups
commit-author Jan Kara <jack@suse.cz>
commit 9a6ebbdbd41235ea3bc0c4f39e2076599b8113cc

writeback: Add tracepoint to track pending inode switches

jira SECO-535
bugfix: writeback softlockups
commit-author Jan Kara <jack@suse.cz>
commit 0cee64c547e3c9cda646af3e075a64f445ee8148

writeback: Fix use after free in inode_switch_wbs_work_fn()

jira SECO-535
bugfix: writeback softlockups
commit-author Jan Kara <jack@suse.cz>
commit 6689f01d6740cf358932b3e97ee968c6099800d9

writeback: fix race between cgroup_writeback_umount() and inode_switch_wbs()

jira SECO-535
bugfix: writeback softlockups
commit-author Baokun Li <libaokun@linux.alibaba.com>
commit cba38ec4cbd3a7b8b942a8d52531a05be8a9ff0d

writeback: drop now-unnecessary rcu_barrier() in cgroup_writeback_umount()

jira SECO-535
bugfix: writeback softlockups
commit-author Baokun Li <libaokun@linux.alibaba.com>
commit e90a6d668e26e00a72df2d09c173b563468f09c9

Test Results

✅ Build Stage

Architecture	Build Time	Total Time
x86_64	41m 8s	41m 57s
aarch64	24m 38s	25m 16s

View build logs

✅ Boot Verification

Status: Passed (all architectures)
View boot logs

✅ Kernel Selftests

Architecture	Passed	Failed	Compared Against	Status
x86_64	428	64	rlc-10/6.12.0-211.26.1.el10_2	⚠️ No baseline available
aarch64	375	60	rlc-10/6.12.0-211.26.1.el10_2	⚠️ No baseline available

View kselftest logs

✅ LTP Results

Architecture	Passed	Failed	Compared Against	Status
x86_64	1481	79	rlc-10/6.12.0-211.26.1.el10_2	⚠️ No baseline available
aarch64	1452	80	rlc-10/6.12.0-211.26.1.el10_2	⚠️ No baseline available

View LTP logs

🤖 This PR was automatically generated by GitHub Actions
Run ID: 28248930605

jira SECO-535
bugfix: writeback softlockups
commit-author Jan Kara <jack@suse.cz>
commit e1b849c
upstream-diff | The change to bdi_writeback propagates a kABI
	breakage through every pointer to it and to
	backing_dev_info, so we have to use RH_KABI_EXTEND() on
	bdi_writeback to prevent the CRC miscalculation.

There can be multiple inode switch works that are trying to switch inodes to / from the same wb. This can happen in particular if some cgroup exits which owns many (thousands) inodes and we need to switch them all. In this case several inode_switch_wbs_work_fn() instances will be just spinning on the same wb->list_lock while only one of them makes forward progress. This wastes CPU cycles and quickly leads to softlockup reports and unusable system.

Instead of running several inode_switch_wbs_work_fn() instances in parallel switching to the same wb and contending on wb->list_lock, run just one work item per wb and manage a queue of isw items switching to this wb.

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
(cherry picked from commit e1b849c)
Signed-off-by: Jonathan Maple <jmaple@ciq.com>
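The queueing scheme described in this commit can be sketched in userspace C. This is a minimal model under stated assumptions, not the kernel code: `wb_model`, `isw_item`, `model_llist_add()`, `model_queue_isw()` and `model_work_fn()` are hypothetical names, and C11 atomics stand in for the kernel's lock-less list. It only illustrates the idea that the producer which makes the list non-empty is the one that schedules the single worker.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are not the kernel's structures. */
struct isw_item {
    struct isw_item *next;
    int inode_id;
};

struct wb_model {
    _Atomic(struct isw_item *) switch_ctxs; /* pending switch requests */
    atomic_int work_queued;                 /* times the worker was scheduled */
};

/* Push one item; returns true if the list was empty before the push. */
static bool model_llist_add(struct wb_model *wb, struct isw_item *item)
{
    struct isw_item *head = atomic_load(&wb->switch_ctxs);

    do {
        item->next = head;
    } while (!atomic_compare_exchange_weak(&wb->switch_ctxs, &head, item));
    return head == NULL;
}

/* Only the producer that makes the list non-empty schedules the worker. */
static void model_queue_isw(struct wb_model *wb, struct isw_item *item)
{
    assert(item != NULL);
    if (model_llist_add(wb, item))
        atomic_fetch_add(&wb->work_queued, 1);
}

/* The single worker detaches the whole batch at once and walks it. */
static int model_work_fn(struct wb_model *wb)
{
    struct isw_item *list =
        atomic_exchange(&wb->switch_ctxs, (struct isw_item *)NULL);
    int processed = 0;

    for (; list != NULL; list = list->next)
        processed++;
    return processed;
}
```

In this model, queueing three items before the worker runs schedules the worker once, and that one run handles all three items, so no second worker is left spinning on a shared lock.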

jira SECO-535
bugfix: writeback softlockups
commit-author Jan Kara <jack@suse.cz>
commit 66c14dc

process_inode_switch_wbs_work() can be switching over 100 inodes to a different cgroup. Since switching an inode requires counting all dirty & under-writeback pages in the address space of each inode, this can take a significant amount of time. Add a possibility to reschedule after processing each inode to avoid softlockups.

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
(cherry picked from commit 66c14dc)
Signed-off-by: Jonathan Maple <jmaple@ciq.com>

jira SECO-535
bugfix: writeback softlockups
commit-author Jan Kara <jack@suse.cz>
commit 9a6ebbd

With lazytime mount option enabled we can be switching many dirty inodes on cgroup exit to the parent cgroup. The numbers observed in practice when systemd slice of a large cron job exits can easily reach hundreds of thousands or millions. The logic in inode_do_switch_wbs() which sorts the inode into appropriate place in b_dirty list of the target wb however has linear complexity in the number of dirty inodes thus overall time complexity of switching all the inodes is quadratic leading to workers being pegged for hours consuming 100% of the CPU and switching inodes to the parent wb.

Simple reproducer of the issue:

  FILES=10000
  # Filesystem mounted with lazytime mount option
  MNT=/mnt/
  echo "Creating files and switching timestamps"
  for (( j = 0; j < 50; j ++ )); do
      mkdir $MNT/dir$j
      for (( i = 0; i < $FILES; i++ )); do
          echo "foo" >$MNT/dir$j/file$i
      done
      touch -a -t 202501010000 $MNT/dir$j/file*
  done
  wait
  echo "Syncing and flushing"
  sync
  echo 3 >/proc/sys/vm/drop_caches

  echo "Reading all files from a cgroup"
  mkdir /sys/fs/cgroup/unified/mycg1 || exit
  echo $$ >/sys/fs/cgroup/unified/mycg1/cgroup.procs || exit
  for (( j = 0; j < 50; j ++ )); do
      cat /mnt/dir$j/file* >/dev/null &
  done
  wait
  echo "Switching wbs"
  # Now rmdir the cgroup after the script exits

We need to maintain b_dirty list ordering to keep writeback happy so instead of sorting inode into appropriate place just append it at the end of the list and clobber dirtied_time_when. This may result in inode writeback starting later after cgroup switch however cgroup switches are rare so it shouldn't matter much. Since the cgroup had write access to the inode, there are no practical concerns of the possible DoS issues.

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
(cherry picked from commit 9a6ebbd)
Signed-off-by: Jonathan Maple <jmaple@ciq.com>
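The complexity argument in this commit can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the kernel's list handling: `dirty_list`, `insert_sorted()`, `append_clobbered()` and `switch_all()` are hypothetical names, and an array ordered by timestamp stands in for the b_dirty list. It counts how many existing entries each newly switched inode has to be moved past.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_INODES 4096

/* Hypothetical model of a dirty list kept in timestamp order. */
struct dirty_list {
    unsigned long when[MAX_INODES];
    size_t len;
    unsigned long steps; /* existing entries a new inode was moved past */
};

/* Sorted placement: a linear scan per inode, so N inodes cost O(N^2). */
static void insert_sorted(struct dirty_list *l, unsigned long when)
{
    size_t pos = l->len;

    assert(l->len < MAX_INODES);
    while (pos > 0 && l->when[pos - 1] > when) {
        l->when[pos] = l->when[pos - 1];
        pos--;
        l->steps++;
    }
    l->when[pos] = when;
    l->len++;
}

/* Append: O(1). Order is preserved by overwriting the timestamp with "now". */
static void append_clobbered(struct dirty_list *l, unsigned long now)
{
    assert(l->len < MAX_INODES);
    l->when[l->len++] = now;
}

/* Switch n inodes whose original timestamps arrive in the worst-case order. */
static unsigned long switch_all(size_t n, int use_append)
{
    static struct dirty_list l;

    l.len = 0;
    l.steps = 0;
    for (size_t i = 0; i < n; i++) {
        if (use_append)
            append_clobbered(&l, 1000000UL);
        else
            insert_sorted(&l, (unsigned long)(n - i));
    }
    return l.steps;
}
```

In this model, switching 1000 inodes with sorted placement moves past 1000 * 999 / 2 = 499500 entries in the worst case, while appending moves past none. The trade-off named in the commit message still applies: the appended inode loses its original timestamp, so its writeback may start later.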

jira SECO-535
bugfix: writeback softlockups
commit-author Jan Kara <jack@suse.cz>
commit 0cee64c

Add trace_inode_switch_wbs_queue tracepoint to allow insight into how many inodes are queued to switch their bdi_writeback structure.

Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
(cherry picked from commit 0cee64c)
Signed-off-by: Jonathan Maple <jmaple@ciq.com>

jira SECO-535
bugfix: writeback softlockups
commit-author Jan Kara <jack@suse.cz>
commit 6689f01

inode_switch_wbs_work_fn() has a loop like:

  wb_get(new_wb);
  while (1) {
    list = llist_del_all(&new_wb->switch_wbs_ctxs);
    /* Nothing to do? */
    if (!list)
      break;
    ... process the items ...
  }

Now adding of items to the list looks like:

  wb_queue_isw()
    if (llist_add(&isw->list, &wb->switch_wbs_ctxs))
      queue_work(isw_wq, &wb->switch_work);

Because inode_switch_wbs_work_fn() loops when processing isw items, it can happen that wb->switch_work is pending while wb->switch_wbs_ctxs is empty. This is a problem because in that case wb can get freed (no isw items -> no wb reference) while the work is still pending causing use-after-free issues.

We cannot just fix this by cancelling work when freeing wb because that could still trigger problematic 0 -> 1 transitions on wb refcount due to wb_get() in inode_switch_wbs_work_fn(). It could be all handled with more careful code but that seems unnecessarily complex so let's avoid that until it is proven that the looping actually brings practical benefit. Just remove the loop from inode_switch_wbs_work_fn() instead. That way when wb_queue_isw() queues work, we are guaranteed we have added the first item to wb->switch_wbs_ctxs and nobody is going to remove it (and drop the wb reference it holds) until the queued work runs.

Fixes: e1b849c ("writeback: Avoid contention on wb->list_lock when switching inodes")
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20260413093618.17244-2-jack@suse.cz
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>
(cherry picked from commit 6689f01)
Signed-off-by: Jonathan Maple <jmaple@ciq.com>
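The hazard described in this commit can be replayed with a deterministic, single-threaded model. This is a sketch under stated assumptions, not the kernel code: `wb_model`, `queue_item()`, `drain_once()` and `run_work()` are hypothetical names, each queued item is assumed to hold one wb reference, and the work's pending state is assumed to be cleared before the work function starts running.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: every queued item pins the wb with one reference. */
struct wb_model {
    int queued_items;  /* items on the switch list, each holding a wb ref */
    bool work_pending; /* switch work queued but not yet started */
};

/* Mirrors "if (llist_add(...)) queue_work(...)": the first item queues work. */
static void queue_item(struct wb_model *wb)
{
    if (wb->queued_items++ == 0)
        wb->work_pending = true;
}

/* One detach-all pass: take every item, which drops their references. */
static int drain_once(struct wb_model *wb)
{
    int n = wb->queued_items;

    wb->queued_items = 0;
    return n;
}

/*
 * Run the work function once. `arrivals` items are queued while the first
 * batch is being processed. Returns true if the work ends up pending while
 * no item pins the wb, which is the use-after-free window.
 */
static bool run_work(struct wb_model *wb, bool loop, int arrivals)
{
    assert(arrivals >= 0);
    wb->work_pending = false; /* pending is cleared before the function runs */
    drain_once(wb);
    for (int i = 0; i < arrivals; i++)
        queue_item(wb);       /* producers racing with the worker */
    while (loop && drain_once(wb) > 0)
        ;                     /* old behaviour: keep draining until empty */
    return wb->work_pending && wb->queued_items == 0;
}
```

With `loop` set, the late arrivals re-queue the work and are then consumed by the same run, leaving pending work with nothing pinning the wb. Without the loop, the late arrivals stay on the list and keep their references until the re-queued work runs.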

…h_wbs()

jira SECO-535
bugfix: writeback softlockups
commit-author Baokun Li <libaokun@linux.alibaba.com>
commit cba38ec

When a container exits, the following BUG_ON() is occasionally triggered:

==================================================================
VFS: Busy inodes after unmount of sdb (ext4)
------------[ cut here ]------------
kernel BUG at fs/super.c:695!
CPU: 3 PID: 6 Comm: containerd-shim Tainted: G OE K 6.6 #1
pstate: 63400009 (nZCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
pc : generic_shutdown_super+0xf0/0x100
lr : generic_shutdown_super+0xf0/0x100
Call trace:
 generic_shutdown_super+0xf0/0x100
 kill_block_super+0x20/0x48
 ext4_kill_sb+0x28/0x60
 deactivate_locked_super+0x54/0x130
 deactivate_super+0x84/0xa0
 cleanup_mnt+0xa4/0x140
 __cleanup_mnt+0x18/0x28
 task_work_run+0x78/0xe0
 do_notify_resume+0x204/0x240
==================================================================

The root cause is a race between cgroup_writeback_umount() and inode_switch_wbs()/cleanup_offline_cgwb(). There is a window between inode_prepare_wbs_switch() returning true and the subsequent wb_queue_isw() call. Following is the process that triggers the issue:

CPU A (umount)                       | CPU B (writeback)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                     | inode_switch_wbs/cleanup_offline_cgwb
                                     |  atomic_inc(&isw_nr_in_flight)
                                     |  inode_prepare_wbs_switch
                                     |   -> passes SB_ACTIVE check
                                     |  __iget(inode)
generic_shutdown_super               |
 sb->s_flags &= ~SB_ACTIVE           |
 cgroup_writeback_umount(sb)         |
  smp_mb()                           |
  atomic_read(&isw_nr_in_flight)     |
  rcu_barrier()                      |
   -> no pending RCU callbacks       |
  flush_workqueue(isw_wq)            |
   -> nothing queued, returns        |
 evict_inodes(sb)                    |
  -> Inode skipped as isw still      |
     holds a ref.                    |
 sop->put_super(sb)                  |
  /* destroys percpu counters */     |
 -> VFS: Busy inodes after unmount!  |
                                     |  wb_queue_isw()
                                     |   queue_work(isw_wq, ...)
                                     | /* later in work function */
                                     | inode_switch_wbs_work_fn
                                     |  process_inode_switch_wbs
                                     |   iput() -> evict
                                     |    percpu_counter_dec() // UAF!

Fix this by extending the RCU read-side critical section in inode_switch_wbs() and cleanup_offline_cgwb() to cover from inode_prepare_wbs_switch() through wb_queue_isw(). Since there is no sleep in this window, rcu_read_lock() can be used. Then add a synchronize_rcu() in cgroup_writeback_umount() before the existing rcu_barrier(), so that all in-flight switchers that have passed the SB_ACTIVE check have completed queue_work() before flush_workqueue() is called.

The existing rcu_barrier() is intentionally retained so this fix can be backported unchanged to stable kernels (5.10.y, 6.6.y, ...) that still queue switches via queue_rcu_work(). It is a no-op on current mainline (since commit e1b849c ("writeback: Avoid contention on wb->list_lock when switching inodes")) and is removed in a follow-up patch.

Fixes: a1a0e23 ("writeback: flush inode cgroup wb switches instead of pinning super_block")
Cc: stable@vger.kernel.org
Suggested-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/all/mxnjq2l6guusfchvauxr3v7c4bwjasybxlleqbbh4efloeqspz@iqylk76ohufz
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Baokun Li <libaokun@linux.alibaba.com>
Link: https://patch.msgid.link/20260521095016.2791354-2-libaokun@linux.alibaba.com
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
(cherry picked from commit cba38ec)
Signed-off-by: Jonathan Maple <jmaple@ciq.com>
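The ordering that the fix relies on can be sketched in userspace C. This is a rough model under stated assumptions, not the kernel implementation: `model_inode_switch()` and `model_umount()` are hypothetical names, a plain reader counter stands in for the RCU read-side critical section, and spinning until that counter reaches zero stands in for synchronize_rcu(). Real RCU works very differently; only the "check and queue inside one section, then wait, then flush" ordering is illustrated.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model; a reader counter stands in for RCU. */
static atomic_bool sb_active = true;
static atomic_int readers;     /* switchers inside the critical section */
static atomic_int queued_work; /* switch work items queued but not yet run */

/* Switcher side: the check and the queueing share one critical section. */
static bool model_inode_switch(void)
{
    bool queued = false;

    atomic_fetch_add(&readers, 1);         /* stands in for rcu_read_lock() */
    if (atomic_load(&sb_active)) {         /* the SB_ACTIVE check */
        atomic_fetch_add(&queued_work, 1); /* stands in for wb_queue_isw() */
        queued = true;
    }
    atomic_fetch_sub(&readers, 1);         /* stands in for rcu_read_unlock() */
    return queued;
}

/* Umount side: clear the flag, wait out the readers, then flush. */
static int model_umount(void)
{
    int flushed;

    atomic_store(&sb_active, false);
    while (atomic_load(&readers) != 0)     /* stands in for synchronize_rcu() */
        ;
    /* Any switcher that saw the flag set has queued its work by now. */
    flushed = atomic_exchange(&queued_work, 0); /* stands in for the flush */
    assert(flushed >= 0);
    return flushed;
}
```

In this model a switcher either sees the flag cleared and queues nothing, or it finishes queueing before the umount side stops waiting, so the flush always sees its work item.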

…unt()

jira SECO-535
bugfix: writeback softlockups
commit-author Baokun Li <libaokun@linux.alibaba.com>
commit e90a6d6

Commit e1b849c ("writeback: Avoid contention on wb->list_lock when switching inodes") replaced the queue_rcu_work() based scheduling of inode wb switches with a plain queue_work(). Since then no switcher goes through call_rcu(), so rcu_barrier() in cgroup_writeback_umount() has no callbacks of its own to wait for. It still drains unrelated call_rcu() callbacks from other subsystems on busy systems, which incidentally slows umount down; drop it.

Fixes: e1b849c ("writeback: Avoid contention on wb->list_lock when switching inodes")
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Baokun Li <libaokun@linux.alibaba.com>
Link: https://patch.msgid.link/20260521095016.2791354-3-libaokun@linux.alibaba.com
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
(cherry picked from commit e90a6d6)
Signed-off-by: Jonathan Maple <jmaple@ciq.com>

PlaidCat added 7 commits June 26, 2026 11:42

ciq-kernel-automation Bot added the created-by-kernelci Tag PRs that were automatically created when a user branch was pushed to the repo (kernelCI) label Jun 26, 2026

ciq-kernel-automation[bot] wants to merge 7 commits into rlc-10/6.12.0-211.26.1.el10_2 from {jmaple}_rlc-10/6.12.0-211.26.1.el10_2
