Skip to content

net: sched: sch_qfq: Fix UAF in qfq_dequeue()#57

Closed
gvrose8192 wants to merge 1 commit into
PR-CHECKERfrom
gvrose_ciqlts8_8-PR
Closed

net: sched: sch_qfq: Fix UAF in qfq_dequeue()#57
gvrose8192 wants to merge 1 commit into
PR-CHECKERfrom
gvrose_ciqlts8_8-PR

Conversation

@gvrose8192

Copy link
Copy Markdown

jira VULN-6730
cve CVE-2023-4921
commit-author valis sec@valis.email
commit 8fc134f

When the plug qdisc is used as a class of the qfq qdisc it could trigger a UAF. This issue can be reproduced with following commands:

tc qdisc add dev lo root handle 1: qfq
tc class add dev lo parent 1: classid 1:1 qfq weight 1 maxpkt 512
tc qdisc add dev lo parent 1:1 handle 2: plug
tc filter add dev lo parent 1: basic classid 1:1
ping -c1 127.0.0.1

and boom:

[ 285.353793] BUG: KASAN: slab-use-after-free in qfq_dequeue+0xa7/0x7f0 [ 285.354910] Read of size 4 at addr ffff8880bad312a8 by task ping/144 [ 285.355903]
[ 285.356165] CPU: 1 PID: 144 Comm: ping Not tainted 6.5.0-rc3+ #4 [ 285.357112] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014 [ 285.358376] Call Trace:
[ 285.358773]
[ 285.359109] dump_stack_lvl+0x44/0x60
[ 285.359708] print_address_description.constprop.0+0x2c/0x3c0 [ 285.360611] kasan_report+0x10c/0x120
[ 285.361195] ? qfq_dequeue+0xa7/0x7f0
[ 285.361780] qfq_dequeue+0xa7/0x7f0
[ 285.362342] __qdisc_run+0xf1/0x970
[ 285.362903] net_tx_action+0x28e/0x460
[ 285.363502] __do_softirq+0x11b/0x3de
[ 285.364097] do_softirq.part.0+0x72/0x90
[ 285.364721]
[ 285.365072]
[ 285.365422] __local_bh_enable_ip+0x77/0x90
[ 285.366079] __dev_queue_xmit+0x95f/0x1550
[ 285.366732] ? __pfx_csum_and_copy_from_iter+0x10/0x10 [ 285.367526] ? __pfx___dev_queue_xmit+0x10/0x10 [ 285.368259] ? __build_skb_around+0x129/0x190
[ 285.368960] ? ip_generic_getfrag+0x12c/0x170
[ 285.369653] ? __pfx_ip_generic_getfrag+0x10/0x10 [ 285.370390] ? csum_partial+0x8/0x20
[ 285.370961] ? raw_getfrag+0xe5/0x140
[ 285.371559] ip_finish_output2+0x539/0xa40
[ 285.372222] ? __pfx_ip_finish_output2+0x10/0x10 [ 285.372954] ip_output+0x113/0x1e0
[ 285.373512] ? __pfx_ip_output+0x10/0x10
[ 285.374130] ? icmp_out_count+0x49/0x60
[ 285.374739] ? __pfx_ip_finish_output+0x10/0x10 [ 285.375457] ip_push_pending_frames+0xf3/0x100
[ 285.376173] raw_sendmsg+0xef5/0x12d0
[ 285.376760] ? do_syscall_64+0x40/0x90
[ 285.377359] ? __static_call_text_end+0x136578/0x136578 [ 285.378173] ? do_syscall_64+0x40/0x90
[ 285.378772] ? kasan_enable_current+0x11/0x20
[ 285.379469] ? __pfx_raw_sendmsg+0x10/0x10
[ 285.380137] ? __sock_create+0x13e/0x270
[ 285.380673] ? __sys_socket+0xf3/0x180
[ 285.381174] ? __x64_sys_socket+0x3d/0x50
[ 285.381725] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [ 285.382425] ? __rcu_read_unlock+0x48/0x70
[ 285.382975] ? ip4_datagram_release_cb+0xd8/0x380 [ 285.383608] ? __pfx_ip4_datagram_release_cb+0x10/0x10 [ 285.384295] ? preempt_count_sub+0x14/0xc0
[ 285.384844] ? __list_del_entry_valid+0x76/0x140 [ 285.385467] ? _raw_spin_lock_bh+0x87/0xe0
[ 285.386014] ? __pfx__raw_spin_lock_bh+0x10/0x10 [ 285.386645] ? release_sock+0xa0/0xd0
[ 285.387148] ? preempt_count_sub+0x14/0xc0
[ 285.387712] ? freeze_secondary_cpus+0x348/0x3c0 [ 285.388341] ? aa_sk_perm+0x177/0x390
[ 285.388856] ? __pfx_aa_sk_perm+0x10/0x10
[ 285.389441] ? check_stack_object+0x22/0x70
[ 285.390032] ? inet_send_prepare+0x2f/0x120
[ 285.390603] ? __pfx_inet_sendmsg+0x10/0x10
[ 285.391172] sock_sendmsg+0xcc/0xe0
[ 285.391667] __sys_sendto+0x190/0x230
[ 285.392168] ? __pfx___sys_sendto+0x10/0x10
[ 285.392727] ? kvm_clock_get_cycles+0x14/0x30
[ 285.393328] ? set_normalized_timespec64+0x57/0x70 [ 285.393980] ? _raw_spin_unlock_irq+0x1b/0x40
[ 285.394578] ? __x64_sys_clock_gettime+0x11c/0x160 [ 285.395225] ? __pfx___x64_sys_clock_gettime+0x10/0x10 [ 285.395908] ? _copy_to_user+0x3e/0x60
[ 285.396432] ? exit_to_user_mode_prepare+0x1a/0x120 [ 285.397086] ? syscall_exit_to_user_mode+0x22/0x50 [ 285.397734] ? do_syscall_64+0x71/0x90
[ 285.398258] __x64_sys_sendto+0x74/0x90
[ 285.398786] do_syscall_64+0x64/0x90
[ 285.399273] ? exit_to_user_mode_prepare+0x1a/0x120 [ 285.399949] ? syscall_exit_to_user_mode+0x22/0x50 [ 285.400605] ? do_syscall_64+0x71/0x90
[ 285.401124] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [ 285.401807] RIP: 0033:0x495726
[ 285.402233] Code: ff ff ff f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 11 b8 2c 00 00 00 0f 09 [ 285.404683] RSP: 002b:00007ffcc25fb618 EFLAGS: 00000246 ORIG_RAX: 000000000000002c [ 285.405677] RAX: ffffffffffffffda RBX: 0000000000000040 RCX: 0000000000495726 [ 285.406628] RDX: 0000000000000040 RSI: 0000000002518750 RDI: 0000000000000000 [ 285.407565] RBP: 00000000005205ef R08: 00000000005f8838 R09: 000000000000001c [ 285.408523] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000002517634 [ 285.409460] R13: 00007ffcc25fb6f0 R14: 0000000000000003 R15: 0000000000000000 [ 285.410403]
[ 285.410704]
[ 285.410929] Allocated by task 144:
[ 285.411402] kasan_save_stack+0x1e/0x40
[ 285.411926] kasan_set_track+0x21/0x30
[ 285.412442] __kasan_slab_alloc+0x55/0x70
[ 285.412973] kmem_cache_alloc_node+0x187/0x3d0
[ 285.413567] __alloc_skb+0x1b4/0x230
[ 285.414060] __ip_append_data+0x17f7/0x1b60
[ 285.414633] ip_append_data+0x97/0xf0
[ 285.415144] raw_sendmsg+0x5a8/0x12d0
[ 285.415640] sock_sendmsg+0xcc/0xe0
[ 285.416117] __sys_sendto+0x190/0x230
[ 285.416626] __x64_sys_sendto+0x74/0x90
[ 285.417145] do_syscall_64+0x64/0x90
[ 285.417624] entry_SYSCALL_64_after_hwframe+0x6e/0xd8 [ 285.418306]
[ 285.418531] Freed by task 144:
[ 285.418960] kasan_save_stack+0x1e/0x40
[ 285.419469] kasan_set_track+0x21/0x30
[ 285.419988] kasan_save_free_info+0x27/0x40
[ 285.420556] ____kasan_slab_free+0x109/0x1a0
[ 285.421146] kmem_cache_free+0x1c2/0x450
[ 285.421680] __netif_receive_skb_core+0x2ce/0x1870 [ 285.422333] __netif_receive_skb_one_core+0x97/0x140 [ 285.423003] process_backlog+0x100/0x2f0
[ 285.423537] __napi_poll+0x5c/0x2d0
[ 285.424023] net_rx_action+0x2be/0x560
[ 285.424510] __do_softirq+0x11b/0x3de
[ 285.425034]
[ 285.425254] The buggy address belongs to the object at ffff8880bad31280 [ 285.425254] which belongs to the cache skbuff_head_cache of size 224 [ 285.426993] The buggy address is located 40 bytes inside of [ 285.426993] freed 224-byte region [ffff8880bad31280, ffff8880bad31360) [ 285.428572]
[ 285.428798] The buggy address belongs to the physical page: [ 285.429540] page:00000000f4b77674 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xbad31 [ 285.430758] flags: 0x100000000000200(slab|node=0|zone=1) [ 285.431447] page_type: 0xffffffff()
[ 285.431934] raw: 0100000000000200 ffff88810094a8c0 dead000000000122 0000000000000000 [ 285.432757] raw: 0000000000000000 00000000800c000c 00000001ffffffff 0000000000000000 [ 285.433562] page dumped because: kasan: bad access detected [ 285.434144]
[ 285.434320] Memory state around the buggy address: [ 285.434828] ffff8880bad31180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 285.435580] ffff8880bad31200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 285.436264] >ffff8880bad31280: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 285.436777] ^
[ 285.437106] ffff8880bad31300: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
[ 285.437616] ffff8880bad31380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 285.438126] ==================================================================
[ 285.438662] Disabling lock debugging due to kernel taint

Fix this by:

  1. Changing sch_plug's .peek handler to qdisc_peek_dequeued(), a function compatible with non-work-conserving qdiscs
  2. Checking the return value of qdisc_dequeue_peeked() in sch_qfq.

Fixes: 462dbc9 ("pkt_sched: QFQ Plus: fair-queueing service at DRR cost")
Reported-by: valis sec@valis.email
Signed-off-by: valis sec@valis.email
Signed-off-by: Jamal Hadi Salim jhs@mojatatu.com
Link: https://lore.kernel.org/r/20230901162237.11525-1-jhs@mojatatu.com
Signed-off-by: Paolo Abeni pabeni@redhat.com
(cherry picked from commit 8fc134f)
Signed-off-by: Greg Rose g.v.rose@ciq.com

jira VULN-6730
cve CVE-2023-4921
commit-author valis <sec@valis.email>
commit 8fc134f

When the plug qdisc is used as a class of the qfq qdisc it could trigger a
UAF. This issue can be reproduced with following commands:

  tc qdisc add dev lo root handle 1: qfq
  tc class add dev lo parent 1: classid 1:1 qfq weight 1 maxpkt 512
  tc qdisc add dev lo parent 1:1 handle 2: plug
  tc filter add dev lo parent 1: basic classid 1:1
  ping -c1 127.0.0.1

and boom:

[  285.353793] BUG: KASAN: slab-use-after-free in qfq_dequeue+0xa7/0x7f0
[  285.354910] Read of size 4 at addr ffff8880bad312a8 by task ping/144
[  285.355903]
[  285.356165] CPU: 1 PID: 144 Comm: ping Not tainted 6.5.0-rc3+ #4
[  285.357112] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[  285.358376] Call Trace:
[  285.358773]  <IRQ>
[  285.359109]  dump_stack_lvl+0x44/0x60
[  285.359708]  print_address_description.constprop.0+0x2c/0x3c0
[  285.360611]  kasan_report+0x10c/0x120
[  285.361195]  ? qfq_dequeue+0xa7/0x7f0
[  285.361780]  qfq_dequeue+0xa7/0x7f0
[  285.362342]  __qdisc_run+0xf1/0x970
[  285.362903]  net_tx_action+0x28e/0x460
[  285.363502]  __do_softirq+0x11b/0x3de
[  285.364097]  do_softirq.part.0+0x72/0x90
[  285.364721]  </IRQ>
[  285.365072]  <TASK>
[  285.365422]  __local_bh_enable_ip+0x77/0x90
[  285.366079]  __dev_queue_xmit+0x95f/0x1550
[  285.366732]  ? __pfx_csum_and_copy_from_iter+0x10/0x10
[  285.367526]  ? __pfx___dev_queue_xmit+0x10/0x10
[  285.368259]  ? __build_skb_around+0x129/0x190
[  285.368960]  ? ip_generic_getfrag+0x12c/0x170
[  285.369653]  ? __pfx_ip_generic_getfrag+0x10/0x10
[  285.370390]  ? csum_partial+0x8/0x20
[  285.370961]  ? raw_getfrag+0xe5/0x140
[  285.371559]  ip_finish_output2+0x539/0xa40
[  285.372222]  ? __pfx_ip_finish_output2+0x10/0x10
[  285.372954]  ip_output+0x113/0x1e0
[  285.373512]  ? __pfx_ip_output+0x10/0x10
[  285.374130]  ? icmp_out_count+0x49/0x60
[  285.374739]  ? __pfx_ip_finish_output+0x10/0x10
[  285.375457]  ip_push_pending_frames+0xf3/0x100
[  285.376173]  raw_sendmsg+0xef5/0x12d0
[  285.376760]  ? do_syscall_64+0x40/0x90
[  285.377359]  ? __static_call_text_end+0x136578/0x136578
[  285.378173]  ? do_syscall_64+0x40/0x90
[  285.378772]  ? kasan_enable_current+0x11/0x20
[  285.379469]  ? __pfx_raw_sendmsg+0x10/0x10
[  285.380137]  ? __sock_create+0x13e/0x270
[  285.380673]  ? __sys_socket+0xf3/0x180
[  285.381174]  ? __x64_sys_socket+0x3d/0x50
[  285.381725]  ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[  285.382425]  ? __rcu_read_unlock+0x48/0x70
[  285.382975]  ? ip4_datagram_release_cb+0xd8/0x380
[  285.383608]  ? __pfx_ip4_datagram_release_cb+0x10/0x10
[  285.384295]  ? preempt_count_sub+0x14/0xc0
[  285.384844]  ? __list_del_entry_valid+0x76/0x140
[  285.385467]  ? _raw_spin_lock_bh+0x87/0xe0
[  285.386014]  ? __pfx__raw_spin_lock_bh+0x10/0x10
[  285.386645]  ? release_sock+0xa0/0xd0
[  285.387148]  ? preempt_count_sub+0x14/0xc0
[  285.387712]  ? freeze_secondary_cpus+0x348/0x3c0
[  285.388341]  ? aa_sk_perm+0x177/0x390
[  285.388856]  ? __pfx_aa_sk_perm+0x10/0x10
[  285.389441]  ? check_stack_object+0x22/0x70
[  285.390032]  ? inet_send_prepare+0x2f/0x120
[  285.390603]  ? __pfx_inet_sendmsg+0x10/0x10
[  285.391172]  sock_sendmsg+0xcc/0xe0
[  285.391667]  __sys_sendto+0x190/0x230
[  285.392168]  ? __pfx___sys_sendto+0x10/0x10
[  285.392727]  ? kvm_clock_get_cycles+0x14/0x30
[  285.393328]  ? set_normalized_timespec64+0x57/0x70
[  285.393980]  ? _raw_spin_unlock_irq+0x1b/0x40
[  285.394578]  ? __x64_sys_clock_gettime+0x11c/0x160
[  285.395225]  ? __pfx___x64_sys_clock_gettime+0x10/0x10
[  285.395908]  ? _copy_to_user+0x3e/0x60
[  285.396432]  ? exit_to_user_mode_prepare+0x1a/0x120
[  285.397086]  ? syscall_exit_to_user_mode+0x22/0x50
[  285.397734]  ? do_syscall_64+0x71/0x90
[  285.398258]  __x64_sys_sendto+0x74/0x90
[  285.398786]  do_syscall_64+0x64/0x90
[  285.399273]  ? exit_to_user_mode_prepare+0x1a/0x120
[  285.399949]  ? syscall_exit_to_user_mode+0x22/0x50
[  285.400605]  ? do_syscall_64+0x71/0x90
[  285.401124]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[  285.401807] RIP: 0033:0x495726
[  285.402233] Code: ff ff ff f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 11 b8 2c 00 00 00 0f 09
[  285.404683] RSP: 002b:00007ffcc25fb618 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[  285.405677] RAX: ffffffffffffffda RBX: 0000000000000040 RCX: 0000000000495726
[  285.406628] RDX: 0000000000000040 RSI: 0000000002518750 RDI: 0000000000000000
[  285.407565] RBP: 00000000005205ef R08: 00000000005f8838 R09: 000000000000001c
[  285.408523] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000002517634
[  285.409460] R13: 00007ffcc25fb6f0 R14: 0000000000000003 R15: 0000000000000000
[  285.410403]  </TASK>
[  285.410704]
[  285.410929] Allocated by task 144:
[  285.411402]  kasan_save_stack+0x1e/0x40
[  285.411926]  kasan_set_track+0x21/0x30
[  285.412442]  __kasan_slab_alloc+0x55/0x70
[  285.412973]  kmem_cache_alloc_node+0x187/0x3d0
[  285.413567]  __alloc_skb+0x1b4/0x230
[  285.414060]  __ip_append_data+0x17f7/0x1b60
[  285.414633]  ip_append_data+0x97/0xf0
[  285.415144]  raw_sendmsg+0x5a8/0x12d0
[  285.415640]  sock_sendmsg+0xcc/0xe0
[  285.416117]  __sys_sendto+0x190/0x230
[  285.416626]  __x64_sys_sendto+0x74/0x90
[  285.417145]  do_syscall_64+0x64/0x90
[  285.417624]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[  285.418306]
[  285.418531] Freed by task 144:
[  285.418960]  kasan_save_stack+0x1e/0x40
[  285.419469]  kasan_set_track+0x21/0x30
[  285.419988]  kasan_save_free_info+0x27/0x40
[  285.420556]  ____kasan_slab_free+0x109/0x1a0
[  285.421146]  kmem_cache_free+0x1c2/0x450
[  285.421680]  __netif_receive_skb_core+0x2ce/0x1870
[  285.422333]  __netif_receive_skb_one_core+0x97/0x140
[  285.423003]  process_backlog+0x100/0x2f0
[  285.423537]  __napi_poll+0x5c/0x2d0
[  285.424023]  net_rx_action+0x2be/0x560
[  285.424510]  __do_softirq+0x11b/0x3de
[  285.425034]
[  285.425254] The buggy address belongs to the object at ffff8880bad31280
[  285.425254]  which belongs to the cache skbuff_head_cache of size 224
[  285.426993] The buggy address is located 40 bytes inside of
[  285.426993]  freed 224-byte region [ffff8880bad31280, ffff8880bad31360)
[  285.428572]
[  285.428798] The buggy address belongs to the physical page:
[  285.429540] page:00000000f4b77674 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xbad31
[  285.430758] flags: 0x100000000000200(slab|node=0|zone=1)
[  285.431447] page_type: 0xffffffff()
[  285.431934] raw: 0100000000000200 ffff88810094a8c0 dead000000000122 0000000000000000
[  285.432757] raw: 0000000000000000 00000000800c000c 00000001ffffffff 0000000000000000
[  285.433562] page dumped because: kasan: bad access detected
[  285.434144]
[  285.434320] Memory state around the buggy address:
[  285.434828]  ffff8880bad31180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  285.435580]  ffff8880bad31200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  285.436264] >ffff8880bad31280: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[  285.436777]                                   ^
[  285.437106]  ffff8880bad31300: fb fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
[  285.437616]  ffff8880bad31380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[  285.438126] ==================================================================
[  285.438662] Disabling lock debugging due to kernel taint

Fix this by:
1. Changing sch_plug's .peek handler to qdisc_peek_dequeued(), a
function compatible with non-work-conserving qdiscs
2. Checking the return value of qdisc_dequeue_peeked() in sch_qfq.

Fixes: 462dbc9 ("pkt_sched: QFQ Plus: fair-queueing service at DRR cost")
	Reported-by: valis <sec@valis.email>
	Signed-off-by: valis <sec@valis.email>
	Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://lore.kernel.org/r/20230901162237.11525-1-jhs@mojatatu.com
	Signed-off-by: Paolo Abeni <pabeni@redhat.com>
(cherry picked from commit 8fc134f)
	Signed-off-by: Greg Rose <g.v.rose@ciq.com>
@gvrose8192 gvrose8192 closed this Jan 13, 2025
@gvrose8192 gvrose8192 deleted the gvrose_ciqlts8_8-PR branch January 13, 2025 23:02
PlaidCat added a commit that referenced this pull request Jun 18, 2025
jira LE-3201
cve CVE-2024-35989
Rebuild_History Non-Buildable kernel-rt-4.18.0-553.22.1.rt7.363.el8_10
commit-author Fenghua Yu <fenghua.yu@intel.com>
commit f221033

During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

    BUG: unable to handle page fault for address: 000000000002a2b8
    #PF: supervisor write access in kernel mode
    #PF: error_code(0x0002) - not-present page
    PGD 1470e1067 P4D 0
    Oops: 0002 [#1] PREEMPT SMP NOPTI
    CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ #57
    Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
    RIP: 0010:mutex_lock+0x2e/0x50
    ...
    Call Trace:
    <TASK>
    __die+0x24/0x70
    page_fault_oops+0x82/0x160
    do_user_addr_fault+0x65/0x6b0
    __pfx___rdmsr_safe_on_cpu+0x10/0x10
    exc_page_fault+0x7d/0x170
    asm_exc_page_fault+0x26/0x30
    mutex_lock+0x2e/0x50
    mutex_lock+0x1e/0x50
    perf_pmu_migrate_context+0x87/0x1f0
    perf_event_cpu_offline+0x76/0x90 [idxd]
    cpuhp_invoke_callback+0xa2/0x4f0
    __pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
    cpuhp_thread_fun+0x98/0x150
    smpboot_thread_fn+0x27/0x260
    smpboot_thread_fn+0x1af/0x260
    __pfx_smpboot_thread_fn+0x10/0x10
    kthread+0x103/0x140
    __pfx_kthread+0x10/0x10
    ret_from_fork+0x31/0x50
    __pfx_kthread+0x10/0x10
    ret_from_fork_asm+0x1b/0x30
    <TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4 ("dmaengine: idxd: Add IDXD performance monitor support")
	Reported-by: Terrence Xu <terrence.xu@intel.com>
	Tested-by: Terrence Xu <terrence.xu@intel.com>
	Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
	Signed-off-by: Vinod Koul <vkoul@kernel.org>
(cherry picked from commit f221033)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
github-actions Bot pushed a commit that referenced this pull request Aug 4, 2025
All of the ID tables based on <linux/mod_devicetable.h> (of_device_id,
pci_device_id, ...) require their arrays to end in an empty sentinel
value.  That's usually spelled with an empty initializer entry (e.g.,
"{}"), but also sometimes with explicit 0 entries, field initializers
(e.g., '.id = ""'), or even a macro entry (like PCMCIA_DEVICE_NULL).

Without a sentinel, device-matching code may read out of bounds.

I've found a number of such bugs in driver reviews, and we even
occasionally commit one to the tree.  See commit 5751eee ("i2c:
nomadik: Add missing sentinel to match table") for example.

Teach checkpatch to find these ID tables, and complain if it looks like
there wasn't a sentinel value.

Test output:

  $ git format-patch -1 a0d15cc --stdout | scripts/checkpatch.pl -
  ERROR: missing sentinel in ID array
  #57: FILE: drivers/i2c/busses/i2c-nomadik.c:1073:
  +static const struct of_device_id nmk_i2c_eyeq_match_table[] = {
   	{
   		.compatible = "XXXXXXXXXXXXXXXXXX",
   		.data = (void *)(NMK_I2C_EYEQ_FLAG_32B_BUS | NMK_I2C_EYEQ_FLAG_IS_EYEQ5),
   	},
   };

  total: 1 errors, 0 warnings, 66 lines checked

  NOTE: For some of the reported defects, checkpatch may be able to
        mechanically convert to the typical style using --fix or --fix-inplace.

  "[PATCH] i2c: nomadik: switch from of_device_is_compatible() to" has style problems, please review.

  NOTE: If any of the errors are false positives, please report
        them to the maintainer, see CHECKPATCH in MAINTAINERS.

When run across the entire tree (scripts/checkpatch.pl -q --types
MISSING_SENTINEL -f ...), false positives exist:

* where macros are used that hide the table from analysis
  (e.g., drivers/gpu/drm/radeon/radeon_drv.c / radeon_PCI_IDS).
  There are fewer than 5 of these.

* where such tables are processed correctly via ARRAY_SIZE() (fewer than
  5 instances). This is by far not the typical usage of *_device_id
  arrays.

* some odd parsing artifacts, where ctx_statement_block() seems to quit
  in the middle of a block due to #if/#else/#endif.

Also, not every "struct *_device_id" is in fact a sentinel-requiring
structure, but even with such types, false positives are very rare.

Link: https://lkml.kernel.org/r/20250702235245.1007351-1-briannorris@chromium.org
Signed-off-by: Brian Norris <briannorris@chromium.org>
Acked-by: Joe Perches <joe@perches.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Brian Norris <briannorris@chromium.org>
Cc: Dwaipayan Ray <dwaipayanray1@gmail.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
bmastbergen pushed a commit to bmastbergen/kernel-src-tree that referenced this pull request Aug 29, 2025
jira LE-1907
cve CVE-2024-35989
Rebuild_History Non-Buildable kernel-5.14.0-427.40.1.el9_4
commit-author Fenghua Yu <fenghua.yu@intel.com>
commit f221033

During the removal of the idxd driver, registered offline callback is
invoked as part of the clean up process. However, on systems with only
one CPU online, no valid target is available to migrate the
perf context, resulting in a kernel oops:

    BUG: unable to handle page fault for address: 000000000002a2b8
    #PF: supervisor write access in kernel mode
    #PF: error_code(0x0002) - not-present page
    PGD 1470e1067 P4D 0
    Oops: 0002 [#1] PREEMPT SMP NOPTI
    CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ ctrliq#57
    Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
    RIP: 0010:mutex_lock+0x2e/0x50
    ...
    Call Trace:
    <TASK>
    __die+0x24/0x70
    page_fault_oops+0x82/0x160
    do_user_addr_fault+0x65/0x6b0
    __pfx___rdmsr_safe_on_cpu+0x10/0x10
    exc_page_fault+0x7d/0x170
    asm_exc_page_fault+0x26/0x30
    mutex_lock+0x2e/0x50
    mutex_lock+0x1e/0x50
    perf_pmu_migrate_context+0x87/0x1f0
    perf_event_cpu_offline+0x76/0x90 [idxd]
    cpuhp_invoke_callback+0xa2/0x4f0
    __pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
    cpuhp_thread_fun+0x98/0x150
    smpboot_thread_fn+0x27/0x260
    smpboot_thread_fn+0x1af/0x260
    __pfx_smpboot_thread_fn+0x10/0x10
    kthread+0x103/0x140
    __pfx_kthread+0x10/0x10
    ret_from_fork+0x31/0x50
    __pfx_kthread+0x10/0x10
    ret_from_fork_asm+0x1b/0x30
    <TASK>

Fix the issue by preventing the migration of the perf context to an
invalid target.

Fixes: 81dd4d4 ("dmaengine: idxd: Add IDXD performance monitor support")
	Reported-by: Terrence Xu <terrence.xu@intel.com>
	Tested-by: Terrence Xu <terrence.xu@intel.com>
	Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: https://lore.kernel.org/r/20240313214031.1658045-1-fenghua.yu@intel.com
	Signed-off-by: Vinod Koul <vkoul@kernel.org>
(cherry picked from commit f221033)
	Signed-off-by: Jonathan Maple <jmaple@ciq.com>
github-actions Bot pushed a commit that referenced this pull request May 31, 2026
Prevent a crash from happening as the first serial port is initialised:

  Console: switching to mono frame buffer device 160x64
  fb0: PMAG-AA frame buffer device at tc0
  DECstation Z85C30 serial driver version 0.10
  CPU 0 Unable to handle kernel paging request at virtual address 0000002c, epc == 803ab00c, ra == 803aafe0
  Oops[#1]:
  CPU: 0 PID: 1 Comm: swapper Not tainted 6.4.0-rc3-00031-g84a9582fd203-dirty #57
  $ 0   : 00000000 10012c00 803aaeb0 00000000
  $ 4   : 80e12f60 80e12f50 80e12f58 81000030
  $ 8   : 00000000 805ff37c 00000000 33433538
  $12   : 65732030 00000006 80c2915d 6c616972
  $16   : 80e12f00 807b7630 00000000 00000000
  $20   : 00000004 00000348 000001a0 807623b8
  $24   : 00000018 00000000
  $28   : 80c24000 80c25d60 8078b148 803aafe0
  Hi    : 00000000
  Lo    : 00000000
  epc   : 803ab00c serial_base_ctrl_add+0x78/0xf4
  ra    : 803aafe0 serial_base_ctrl_add+0x4c/0xf4
  Status: 10012c03	KERNEL EXL IE
  Cause : 00000008 (ExcCode 02)
  BadVA : 0000002c
  PrId  : 00000440 (R4400SC)
  Modules linked in:
  Process swapper (pid: 1, threadinfo=(ptrval), task=(ptrval), tls=00000000)
  Stack : 80760000 00000cc0 00400044 00400040 803aa02c 80d61ab8 00000000 807b7630
          80760000 807623b8 807b7628 803aa644 80386998 00000000 80e17780 80220f68
          80e17780 80d61ab8 80c17d80 80e17780 80e17780 8063c798 80e17780 80383fa0
          00000010 80e17780 00000000 80386998 807a0000 00000000 00400040 8038f848
          807623b8 80d61ab8 00000004 80e17780 00000000 803a68e4 80c25e2c 803bb884
          ...
  Call Trace:
  [<803ab00c>] serial_base_ctrl_add+0x78/0xf4
  [<803aa644>] serial_core_register_port+0x174/0x69c
  [<8077e9ac>] zs_init+0xc8/0xfc
  [<800404d4>] do_one_initcall+0x40/0x2ac
  [<8076cecc>] kernel_init_freeable+0x1e4/0x270
  [<80605bec>] kernel_init+0x20/0x108
  [<800431e8>] ret_from_kernel_thread+0x14/0x1c

  Code: 2442aeb0  ae120024  ae0200d0 <8c67002c> 50e00001  8c670000  3c06806  3c05806e  afb30010

  ---[ end trace 0000000000000000 ]---

(report at the offending commit) -- where a pointer is dereferenced that
has been derived from a null pointer to the port's parent device.

Since no device is available with legacy probing and it's not anymore a
preferable way to discover devices anyway, switch the driver to using a
platform device and use it as the port's parent device.  Update resource
handling accordingly and only request the actual span of addresses used
within the slot, which will have had its resource already requested by
generic platform device code.

Use platform_driver_probe() not just because SCC devices are fixed with
solder on board and not straightforward to remove, but foremost because
the associated TTY's major device number is the same as used by the dz
driver and the first driver to claim it will prevent the other one from
using it.  Either one DZ device or some SCC devices will be present in a
given system but never both at a time, and therefore we want the major
device number to be claimed by the first driver to actually successfully
bind to its device and platform_driver_probe() is a way to fulfil that.

An unfortunate consequence of the switch to a platform device is we now
hand the console over from the bootconsole much later in the bootstrap.
The firmware console handler appears good enough though to work so late
and in particular with interrupts enabled.

Since there is one way only remaining to reach zs_reset() now, remove
the port initialisation marker as no longer needed and go through the
channel reset unconditionally.

Fixes: 84a9582 ("serial: core: Start managing serial controllers to enable runtime PM")
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Cc: stable@vger.kernel.org # needs to use .remove_new for <= 6.10
Link: https://patch.msgid.link/alpine.DEB.2.21.2605062328480.46195@angie.orcam.me.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
github-actions Bot pushed a commit to athimm/kernel-src-tree that referenced this pull request Jun 9, 2026
commit 7cac59d upstream.

Prevent a crash from happening as the first serial port is initialised:

  Console: switching to mono frame buffer device 160x64
  fb0: PMAG-AA frame buffer device at tc0
  DECstation Z85C30 serial driver version 0.10
  CPU 0 Unable to handle kernel paging request at virtual address 0000002c, epc == 803ab00c, ra == 803aafe0
  Oops[ctrliq#1]:
  CPU: 0 PID: 1 Comm: swapper Not tainted 6.4.0-rc3-00031-g84a9582fd203-dirty ctrliq#57
  $ 0   : 00000000 10012c00 803aaeb0 00000000
  $ 4   : 80e12f60 80e12f50 80e12f58 81000030
  $ 8   : 00000000 805ff37c 00000000 33433538
  $12   : 65732030 00000006 80c2915d 6c616972
  $16   : 80e12f00 807b7630 00000000 00000000
  $20   : 00000004 00000348 000001a0 807623b8
  $24   : 00000018 00000000
  $28   : 80c24000 80c25d60 8078b148 803aafe0
  Hi    : 00000000
  Lo    : 00000000
  epc   : 803ab00c serial_base_ctrl_add+0x78/0xf4
  ra    : 803aafe0 serial_base_ctrl_add+0x4c/0xf4
  Status: 10012c03	KERNEL EXL IE
  Cause : 00000008 (ExcCode 02)
  BadVA : 0000002c
  PrId  : 00000440 (R4400SC)
  Modules linked in:
  Process swapper (pid: 1, threadinfo=(ptrval), task=(ptrval), tls=00000000)
  Stack : 80760000 00000cc0 00400044 00400040 803aa02c 80d61ab8 00000000 807b7630
          80760000 807623b8 807b7628 803aa644 80386998 00000000 80e17780 80220f68
          80e17780 80d61ab8 80c17d80 80e17780 80e17780 8063c798 80e17780 80383fa0
          00000010 80e17780 00000000 80386998 807a0000 00000000 00400040 8038f848
          807623b8 80d61ab8 00000004 80e17780 00000000 803a68e4 80c25e2c 803bb884
          ...
  Call Trace:
  [<803ab00c>] serial_base_ctrl_add+0x78/0xf4
  [<803aa644>] serial_core_register_port+0x174/0x69c
  [<8077e9ac>] zs_init+0xc8/0xfc
  [<800404d4>] do_one_initcall+0x40/0x2ac
  [<8076cecc>] kernel_init_freeable+0x1e4/0x270
  [<80605bec>] kernel_init+0x20/0x108
  [<800431e8>] ret_from_kernel_thread+0x14/0x1c

  Code: 2442aeb0  ae120024  ae0200d0 <8c67002c> 50e00001  8c670000  3c06806  3c05806e  afb30010

  ---[ end trace 0000000000000000 ]---

(report at the offending commit) -- where a pointer is dereferenced that
has been derived from a null pointer to the port's parent device.

Since no device is available with legacy probing and it's not anymore a
preferable way to discover devices anyway, switch the driver to using a
platform device and use it as the port's parent device.  Update resource
handling accordingly and only request the actual span of addresses used
within the slot, which will have had its resource already requested by
generic platform device code.

Use platform_driver_probe() not just because SCC devices are fixed with
solder on board and not straightforward to remove, but foremost because
the associated TTY's major device number is the same as used by the dz
driver and the first driver to claim it will prevent the other one from
using it.  Either one DZ device or some SCC devices will be present in a
given system but never both at a time, and therefore we want the major
device number to be claimed by the first driver to actually successfully
bind to its device and platform_driver_probe() is a way to fulfil that.

An unfortunate consequence of the switch to a platform device is we now
hand the console over from the bootconsole much later in the bootstrap.
The firmware console handler appears good enough though to work so late
and in particular with interrupts enabled.

Since there is one way only remaining to reach zs_reset() now, remove
the port initialisation marker as no longer needed and go through the
channel reset unconditionally.

Fixes: 84a9582 ("serial: core: Start managing serial controllers to enable runtime PM")
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Cc: stable@vger.kernel.org # needs to use .remove_new for <= 6.10
Link: https://patch.msgid.link/alpine.DEB.2.21.2605062328480.46195@angie.orcam.me.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
github-actions Bot pushed a commit that referenced this pull request Jun 9, 2026
commit 7cac59d upstream.

Prevent a crash from happening as the first serial port is initialised:

  Console: switching to mono frame buffer device 160x64
  fb0: PMAG-AA frame buffer device at tc0
  DECstation Z85C30 serial driver version 0.10
  CPU 0 Unable to handle kernel paging request at virtual address 0000002c, epc == 803ab00c, ra == 803aafe0
  Oops[#1]:
  CPU: 0 PID: 1 Comm: swapper Not tainted 6.4.0-rc3-00031-g84a9582fd203-dirty #57
  $ 0   : 00000000 10012c00 803aaeb0 00000000
  $ 4   : 80e12f60 80e12f50 80e12f58 81000030
  $ 8   : 00000000 805ff37c 00000000 33433538
  $12   : 65732030 00000006 80c2915d 6c616972
  $16   : 80e12f00 807b7630 00000000 00000000
  $20   : 00000004 00000348 000001a0 807623b8
  $24   : 00000018 00000000
  $28   : 80c24000 80c25d60 8078b148 803aafe0
  Hi    : 00000000
  Lo    : 00000000
  epc   : 803ab00c serial_base_ctrl_add+0x78/0xf4
  ra    : 803aafe0 serial_base_ctrl_add+0x4c/0xf4
  Status: 10012c03	KERNEL EXL IE
  Cause : 00000008 (ExcCode 02)
  BadVA : 0000002c
  PrId  : 00000440 (R4400SC)
  Modules linked in:
  Process swapper (pid: 1, threadinfo=(ptrval), task=(ptrval), tls=00000000)
  Stack : 80760000 00000cc0 00400044 00400040 803aa02c 80d61ab8 00000000 807b7630
          80760000 807623b8 807b7628 803aa644 80386998 00000000 80e17780 80220f68
          80e17780 80d61ab8 80c17d80 80e17780 80e17780 8063c798 80e17780 80383fa0
          00000010 80e17780 00000000 80386998 807a0000 00000000 00400040 8038f848
          807623b8 80d61ab8 00000004 80e17780 00000000 803a68e4 80c25e2c 803bb884
          ...
  Call Trace:
  [<803ab00c>] serial_base_ctrl_add+0x78/0xf4
  [<803aa644>] serial_core_register_port+0x174/0x69c
  [<8077e9ac>] zs_init+0xc8/0xfc
  [<800404d4>] do_one_initcall+0x40/0x2ac
  [<8076cecc>] kernel_init_freeable+0x1e4/0x270
  [<80605bec>] kernel_init+0x20/0x108
  [<800431e8>] ret_from_kernel_thread+0x14/0x1c

  Code: 2442aeb0  ae120024  ae0200d0 <8c67002c> 50e00001  8c670000  3c06806  3c05806e  afb30010

  ---[ end trace 0000000000000000 ]---

(report at the offending commit) -- where a pointer is dereferenced that
has been derived from a null pointer to the port's parent device.

Since no device is available with legacy probing and it's not anymore a
preferable way to discover devices anyway, switch the driver to using a
platform device and use it as the port's parent device.  Update resource
handling accordingly and only request the actual span of addresses used
within the slot, which will have had its resource already requested by
generic platform device code.

Use platform_driver_probe() not just because SCC devices are fixed with
solder on board and not straightforward to remove, but foremost because
the associated TTY's major device number is the same as used by the dz
driver and the first driver to claim it will prevent the other one from
using it.  Either one DZ device or some SCC devices will be present in a
given system but never both at a time, and therefore we want the major
device number to be claimed by the first driver to actually successfully
bind to its device and platform_driver_probe() is a way to fulfil that.

An unfortunate consequence of the switch to a platform device is we now
hand the console over from the bootconsole much later in the bootstrap.
The firmware console handler appears good enough though to work so late
and in particular with interrupts enabled.

Since there is one way only remaining to reach zs_reset() now, remove
the port initialisation marker as no longer needed and go through the
channel reset unconditionally.

Fixes: 84a9582 ("serial: core: Start managing serial controllers to enable runtime PM")
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Cc: stable@vger.kernel.org # needs to use .remove_new for <= 6.10
Link: https://patch.msgid.link/alpine.DEB.2.21.2605062328480.46195@angie.orcam.me.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

1 participant