PR of Maira's pmdomain/downstream/timeouts branch by pelwell · Pull Request #7400 · raspberrypi/linux

pelwell · 2026-05-25T20:07:33Z

Turn Maira's branch into a PR to get the build artefacts.

pelwell · 2026-05-26T19:02:03Z

Consider this an Approve.

rvprudent-lang · 2026-05-27T08:19:30Z

First, sorry for not getting back sooner — I've been caught up with other things and only just had the chance to test this today.

Thank you both for the quick turnaround on this. The analysis of the MMU/TLB flush issue and the runtime PM work look spot-on based on what I was seeing.

I've just applied the patch on my RPi4 (running headless, Debian 13):

sudo rpi-update pulls/7400/head

Kernel is now 6.18.33-v8+. I removed my --gpu workaround and WayVNC is running again with hardware acceleration enabled. Boot was clean — no v3d errors or MMU pte invalid messages in dmesg.

I'll leave it running for 48 hours and report back. Fingers crossed 🤞

Thanks again for the great work!

pelwell · 2026-05-27T13:07:14Z

I think this is looking good enough to merge. Are you happy for me to proceed, @mairacanal?

Commit 18605b1 ("pmdomain: bcm: bcm2835-power: Increase ASB control timeout") raised the ASB handshake polling budget from 1us to 5us. Surveying the pmdomain subsystem, 5us is still the smallest polling budget by a wide margin - comparable handshakes in other drivers use:

- 100us : starfive jh71xx-pmu, apple pmgr-pwrstate
- 1ms   : renesas rcar-sysc, rmobile-sysc (power-on)
- 10ms  : renesas rcar-gen4-sysc, sunxi sun55i-pck600
- 1s    : mediatek mtk-pm-domains, mtk-scpsys

Raise the bcm2835 timeout to 100us, matching analogous drivers. 100us is still negligible relative to a power-domain transition and gives the V3D master ASB substantially more headroom to drain under heavy workloads, where 5us has been observed to be insufficient in practice.

Cc: stable@vger.kernel.org
Fixes: b826d2c ("pmdomain: bcm: bcm2835-power: Increase ASB control timeout")
Signed-off-by: Maíra Canal <mcanal@igalia.com>
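To illustrate why the polling budget matters, here is a minimal Python model of a poll-until-acknowledged loop. It is not the driver code: `poll_ack`, the one-poll-per-microsecond clock, and the 20us drain time used in the usage note are all assumptions chosen for illustration.

```python
def poll_ack(ack_after_us: int, budget_us: int) -> bool:
    """Poll a simulated handshake once per simulated microsecond.

    ack_after_us is a hypothetical time the bridge needs to drain;
    budget_us is the polling budget. Returns True if the acknowledgement
    arrives within the budget, False if the budget runs out first.
    """
    now_us = 0
    while True:
        if now_us >= ack_after_us:  # simulated hardware has acknowledged
            return True
        if now_us >= budget_us:     # budget exhausted before the ack
            return False
        now_us += 1                 # advance the simulated clock
```

With a bridge that needs a hypothetical 20us to drain, `poll_ack(20, 5)` times out while `poll_ack(20, 100)` succeeds, which is the kind of headroom the commit message argues for.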

Make the downstream version match the upstream commit 458f2a712ab4 ("drm/v3d: Introduce Runtime Power Management"). Signed-off-by: Maíra Canal <mcanal@igalia.com>

v3d_mmu_set_page_table() ends by calling v3d_mmu_flush_all() to flush the MMU cache and clear the TLB after reprogramming V3D_MMU_PT_PA_BASE. v3d_mmu_flush_all() is gated by pm_runtime_get_if_active(), which returns 0 unless runtime_status == RPM_ACTIVE.

v3d_mmu_set_page_table() is called from two paths that *know* V3D is reachable, but where the runtime PM status might be wrong:

1. v3d_power_resume(): the runtime resume callback itself, where runtime_status is RPM_RESUMING.

2. v3d_reset(): called from the DRM scheduler timeout handler with the hung job's pm_runtime reference held, so RPM_ACTIVE, but here we don't need to take an extra reference for the duration of the flush either.

In the first case pm_runtime_get_if_active() returns 0, the flush is silently skipped, and V3D resumes executing with whatever MMUC/TLB state happened to survive the last reset. On BCM2711, this leaves stale translations live across runtime PM cycles, manifesting as random GPU hangs.

Split the actual flush sequence into a helper that does the writes unconditionally, and have v3d_mmu_set_page_table() call it directly.

Fixes: 17af1d14deaf ("drm/v3d: Introduce Runtime Power Management")
Signed-off-by: Maíra Canal <mcanal@igalia.com>
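The skipped flush described above can be modelled in a few lines of Python. This is a sketch of the control flow only, not the kernel implementation: `FakeV3d`, `_flush_hw`, the `stale_tlb` flag, and the `fixed` switch are hypothetical names introduced for illustration.

```python
from enum import Enum, auto


class RpmStatus(Enum):
    SUSPENDED = auto()
    RESUMING = auto()
    ACTIVE = auto()


class FakeV3d:
    """Toy model of the runtime-PM gating around the MMU flush."""

    def __init__(self) -> None:
        self.status = RpmStatus.SUSPENDED
        self.usage = 0
        self.stale_tlb = True  # translations left over from before suspend

    def get_if_active(self) -> bool:
        # Models pm_runtime_get_if_active(): succeeds only when ACTIVE.
        if self.status is not RpmStatus.ACTIVE:
            return False
        self.usage += 1
        return True

    def put(self) -> None:
        self.usage -= 1

    def _flush_hw(self) -> None:
        # Hypothetical helper: performs the flush unconditionally.
        self.stale_tlb = False

    def flush_all(self) -> None:
        # Gated flush: silently does nothing unless the device is ACTIVE.
        if not self.get_if_active():
            return
        self._flush_hw()
        self.put()

    def set_page_table(self, fixed: bool) -> None:
        # fixed=True calls the unconditional helper directly.
        if fixed:
            self._flush_hw()
        else:
            self.flush_all()

    def resume(self, fixed: bool) -> None:
        # The resume callback runs while the status is still RESUMING.
        self.status = RpmStatus.RESUMING
        self.set_page_table(fixed)
        self.status = RpmStatus.ACTIVE
```

Calling `resume(fixed=False)` leaves `stale_tlb` set because the gated flush sees RESUMING and returns early; `resume(fixed=True)` clears it.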

v3d_clean_caches() starts the cache-clean sequence by writing V3D_L2TCACTL_TMUWCF to V3D_CTL_L2TCACTL and then polling for that bit to clear. It does not, however, check for an L2T flush (L2TFLS) that may still be in flight from a previous operation. On pre-V3D 7.1 hardware, kicking off the TMU write-combiner flush while an L2T flush is still pending can clobber bits in L2TCACTL and cause cache inconsistencies.

Poll for L2TFLS to clear before writing L2TCACTL on V3D < 7.1, ensuring any pending flush has completed before a new clean is issued.

Cc: stable@vger.kernel.org
Fixes: d223f98 ("drm/v3d: Add support for compute shader dispatch.")
Signed-off-by: Maíra Canal <mcanal@igalia.com>
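The ordering constraint can be sketched with a toy register model in Python. Everything here is an assumption made for illustration: the bit positions of `L2TFLS` and `TMUWCF`, the `FakeL2tcactl` class, the way a pending flush completes after a fixed number of reads, and the `wait_for_flush` switch are not taken from the driver or the hardware documentation.

```python
L2TFLS = 1 << 0  # hypothetical bit position: L2T flush in flight
TMUWCF = 1 << 8  # hypothetical bit position: TMU write-combiner flush


class FakeL2tcactl:
    """Toy control register whose pending flush ends after N reads."""

    def __init__(self, flush_reads_left: int) -> None:
        self.flush_reads_left = flush_reads_left
        self.clobbered = False
        self.clean_started = False

    def read(self) -> int:
        if self.flush_reads_left > 0:
            self.flush_reads_left -= 1
            return L2TFLS
        return 0

    def write(self, value: int) -> None:
        # Writing while a flush is still pending models the clobber.
        if self.flush_reads_left > 0:
            self.clobbered = True
        self.clean_started = bool(value & TMUWCF)


def clean_caches(reg, ver, wait_for_flush, max_polls=100) -> bool:
    """Start a cache clean; returns False if the wait times out."""
    if wait_for_flush and ver < (7, 1):
        for _ in range(max_polls):
            if not (reg.read() & L2TFLS):
                break  # pending flush has completed
        else:
            return False  # flush never completed within the poll budget
    reg.write(TMUWCF)
    return True
```

With a flush that needs three more reads to finish, `clean_caches(reg, (4, 2), wait_for_flush=False)` marks the register as clobbered, while the same call with `wait_for_flush=True` starts the clean only after the flush bit has cleared.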

On runtime suspend, clean the V3D caches before suspending so all dirty lines are written back to memory before the power domain is shut down.

Fixes several system hangs reported in [1][2][3].

Closes: raspberrypi#7381 [1]
Closes: raspberrypi#7396 [2]
Closes: raspberrypi#7397 [3]
Fixes: 17af1d14deaf ("drm/v3d: Introduce Runtime Power Management")
Signed-off-by: Maíra Canal <mcanal@igalia.com>

mairacanal · 2026-05-27T13:42:11Z

@pelwell, I just rebased the branch and reviewed the patches (fixing a few nits in the commit messages), so we are good to go. Thanks!

I'll proceed with the upstreaming process.

rvprudent-lang · 2026-05-28T16:28:03Z

Following up on my previous comment — I promised a 48h report, here's the result.

The test ran for ~22 hours before being cut short by a power outage (not a GPU hang). During that entire window, zero v3d errors, no MMU faults, no resets:

$ journalctl -b -1 -k | grep -iE 'v3d.*error|hang|MMU error|reset GPU'
(no output)

Kernel 6.18.33-v8+ with WayVNC running --gpu was stable throughout. I can't call it a full 48h clean run, but the fix is clearly working — the v3d hang that used to appear within ~26h didn't show up at all.

Congrats on the merge and on getting this upstreamed. Thanks again to @pelwell and @mairacanal for the fast turnaround!

This reverts commit 3341dd2.

After #7400, this commit is no longer needed. After further analysis, the 100ms autosuspend delay was only ever a workaround: shorter delays caused more frequent runtime suspend/resume cycles on the BCM2711, which exposed the cache and MMU coherency bugs as random GPU hangs.

With those hangs resolved, the inflated delay is no longer necessary. Reduce it from 100ms to 50ms so the GPU power domain can be released sooner once the GPU goes idle.

Signed-off-by: Maíra Canal <mcanal@igalia.com>
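The link between the autosuspend delay and the number of suspend/resume cycles can be shown with a deliberately simplified Python model. It assumes a runtime suspend happens whenever an idle gap exceeds the delay; `count_suspends` and the gap values in the usage note are made up for illustration and are not measurements.

```python
def count_suspends(idle_gaps_ms, delay_ms: int) -> int:
    """Count idle gaps long enough to trigger a runtime suspend.

    Simplification: a gap strictly longer than the autosuspend delay
    produces exactly one suspend/resume cycle.
    """
    return sum(1 for gap in idle_gaps_ms if gap > delay_ms)
```

For the hypothetical gaps `[10, 60, 75, 120, 300]`, a 100ms delay yields 2 cycles and a 50ms delay yields 4, so a shorter delay exercises the suspend and resume paths more often.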

This was referenced May 25, 2026

Random system freezes on RPi 4/400 using kernel 6.18 - possible regression from 6.12 #7381

Closed

v3d: GPU hang with V3D_ERR_STAT=0x00001000 / MMU pte invalid causing system crash (RPi4, headless, WayVNC) #7397

Closed

mairacanal force-pushed the pmdomain/downstream/timeouts branch from dd930f1 to da2f572 Compare May 26, 2026 23:55

mairacanal added 5 commits May 27, 2026 10:34

[BACKPORT] drm/v3d: Introduce Runtime Power Management

b3b6213

mairacanal force-pushed the pmdomain/downstream/timeouts branch from da2f572 to 716bd38 Compare May 27, 2026 13:41

pelwell merged commit 95b85be into raspberrypi:rpi-6.18.y May 27, 2026
12 checks passed

pelwell mentioned this pull request May 27, 2026

Kernel 7.0.9 Freezing #7396

Closed

popcornmix added a commit to raspberrypi/firmware that referenced this pull request May 27, 2026

kernel: Revert downstream changes to imx219 driver

ac55e8b

See: raspberrypi/linux#7394 kernel: PR of Maira's pmdomain/downstream/timeouts branch See: raspberrypi/linux#7400

popcornmix added a commit to raspberrypi/rpi-firmware that referenced this pull request May 27, 2026

kernel: Revert downstream changes to imx219 driver

f0192de

See: raspberrypi/linux#7394 kernel: PR of Maira's pmdomain/downstream/timeouts branch See: raspberrypi/linux#7400

mairacanal mentioned this pull request May 30, 2026

Revert "drm/v3d: Increase the autosuspend delay" #7408

Merged

pelwell merged 5 commits into raspberrypi:rpi-6.18.y from mairacanal:pmdomain/downstream/timeouts
