PR of Maira's pmdomain/downstream/timeouts branch #7400
|
Consider this an Approve. |
dd930f1 to da2f572
|
Hi @pelwell and @mairacanal, First, sorry for not getting back sooner — I've been caught up with other things and only just had the chance to test this today. Thank you both for the quick turnaround on this. The analysis of the MMU/TLB flush issue and the runtime PM work look spot-on based on what I was seeing. I've just applied the patch on my RPi4 (running headless, Debian 13): Kernel is now I'll leave it running for 48 hours and report back. Fingers crossed 🤞 Thanks again for the great work! |
|
I think this is looking good enough to merge. Are you happy for me to proceed, @mairacanal? |
Commit 18605b1 ("pmdomain: bcm: bcm2835-power: Increase ASB control
timeout") raised the ASB handshake polling budget from 1us to 5us.
Surveying the pmdomain subsystem, 5us is still the smallest polling
budget by a wide margin - comparable handshakes in other drivers use:
  - 100us : starfive jh71xx-pmu, apple pmgr-pwrstate
  - 1ms   : renesas rcar-sysc, rmobile-sysc (power-on)
  - 10ms  : renesas rcar-gen4-sysc, sunxi sun55i-pck600
  - 1s    : mediatek mtk-pm-domains, mtk-scpsys
Raise the bcm2835 timeout to 100us, matching analogous drivers. 100us is
still negligible relative to a power-domain transition and gives the V3D
master ASB substantially more headroom to drain under heavy workloads,
where 5us has been observed to be insufficient in practice.
Cc: stable@vger.kernel.org
Fixes: b826d2c ("pmdomain: bcm: bcm2835-power: Increase ASB control timeout")
Signed-off-by: Maíra Canal <mcanal@igalia.com>
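To illustrate why the polling budget matters, here is a minimal userspace sketch of a bounded handshake poll. It is not the bcm2835-power code: struct fake_asb, asb_ack_pending() and asb_poll() are hypothetical names, and the clock is simulated so that each poll iteration costs 1us.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a handshake that completes at a fixed time. */
struct fake_asb {
	uint64_t now_us;         /* simulated clock, in microseconds */
	uint64_t ready_after_us; /* time at which the ACK stops being pending */
};

/* True while the simulated handshake has not completed yet. */
static bool asb_ack_pending(const struct fake_asb *asb)
{
	return asb->now_us < asb->ready_after_us;
}

/*
 * Poll until the handshake completes or the budget runs out.
 * Returns 0 on success and -1 on timeout. Assumption of this model:
 * every poll iteration advances the simulated clock by 1us.
 */
static int asb_poll(struct fake_asb *asb, uint64_t timeout_us)
{
	uint64_t deadline = asb->now_us + timeout_us;

	while (asb_ack_pending(asb)) {
		if (asb->now_us >= deadline)
			return -1;
		asb->now_us++;
	}
	return 0;
}
```

In this model, a handshake that needs 20us to drain times out with a 5us budget and succeeds with a 100us budget, while a fast handshake returns just as quickly under either budget, since the budget only bounds the worst case.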
Make the downstream version match the upstream commit
458f2a712ab4 ("drm/v3d: Introduce Runtime Power Management").
Signed-off-by: Maíra Canal <mcanal@igalia.com>
v3d_mmu_set_page_table() ends by calling v3d_mmu_flush_all() to flush the
MMU cache and clear the TLB after reprogramming V3D_MMU_PT_PA_BASE.
v3d_mmu_flush_all() is gated by pm_runtime_get_if_active(), which returns
0 unless runtime_status == RPM_ACTIVE.
v3d_mmu_set_page_table() is called from two paths that *know* V3D is
reachable, but where the runtime PM status might be wrong:
1. v3d_power_resume(): the runtime resume callback itself, where
runtime_status is RPM_RESUMING.
2. v3d_reset(): called from the DRM scheduler timeout handler with the
hung job's pm_runtime reference held, so RPM_ACTIVE, but here we
don't need to take an extra reference for the duration of the flush
either.
In the first case pm_runtime_get_if_active() returns 0, the flush is
silently skipped, and V3D resumes executing with whatever MMUC/TLB state
happened to survive the last reset. On BCM2711, this leaves stale
translations live across runtime PM cycles, manifesting as random GPU
hangs.
Split the actual flush sequence into a helper that does the writes
unconditionally, and have v3d_mmu_set_page_table() call it directly.
Fixes: 17af1d14deaf ("drm/v3d: Introduce Runtime Power Management")
Signed-off-by: Maíra Canal <mcanal@igalia.com>
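The failure mode described above can be sketched with a small userspace model. This is not the v3d driver code: every model_* name is hypothetical, model_get_if_active() only mirrors the described behaviour of pm_runtime_get_if_active() (returning 0 unless the status is active), and the name of the unconditional helper is an assumption, since the commit message does not give one.

```c
/* Hypothetical stand-ins for the runtime PM states discussed above. */
enum model_rpm_status {
	MODEL_RPM_SUSPENDED,
	MODEL_RPM_RESUMING,
	MODEL_RPM_ACTIVE,
};

struct model_v3d {
	enum model_rpm_status status;
	int usage_count; /* runtime PM references held */
	int flush_count; /* how many times the flush writes were issued */
};

/* Takes a reference only when the device is active, otherwise returns 0. */
static int model_get_if_active(struct model_v3d *v3d)
{
	if (v3d->status != MODEL_RPM_ACTIVE)
		return 0;
	v3d->usage_count++;
	return 1;
}

static void model_put(struct model_v3d *v3d)
{
	v3d->usage_count--;
}

/* Hypothetical helper: performs the flush writes unconditionally. */
static void model_flush_hw(struct model_v3d *v3d)
{
	v3d->flush_count++;
}

/* Gated flush, for callers that cannot assume the hardware is powered. */
static void model_flush_all(struct model_v3d *v3d)
{
	if (!model_get_if_active(v3d))
		return; /* silently skipped */
	model_flush_hw(v3d);
	model_put(v3d);
}

/* Before the fix: the page table setup goes through the gated flush. */
static void model_set_page_table_old(struct model_v3d *v3d)
{
	model_flush_all(v3d);
}

/* After the fix: the caller knows V3D is reachable, so flush directly. */
static void model_set_page_table_new(struct model_v3d *v3d)
{
	model_flush_hw(v3d);
}
```

With the status set to MODEL_RPM_RESUMING, as in the resume callback, the old path issues no flush at all, while the new path flushes once and leaves the reference count untouched.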
v3d_clean_caches() starts the cache-clean sequence by writing
V3D_L2TCACTL_TMUWCF to V3D_CTL_L2TCACTL and then polling for that bit
to clear. It does not, however, check for an L2T flush (L2TFLS) that
may still be in flight from a previous operation. On pre-V3D 7.1
hardware, kicking off the TMU write-combiner flush while an L2T flush
is still pending can clobber bits in L2TCACTL and cause cache
inconsistencies.
Poll for L2TFLS to clear before writing L2TCACTL on V3D < 7.1, ensuring
any pending flush has completed before a new clean is issued.
Cc: stable@vger.kernel.org
Fixes: d223f98 ("drm/v3d: Add support for compute shader dispatch.")
Signed-off-by: Maíra Canal <mcanal@igalia.com>
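The ordering this commit describes can be sketched as a userspace model. It is not the driver code: the model_* names, the bit positions, the encoding of V3D 7.1 as the integer 71, and the fake register that clears L2TFLS after a fixed number of reads are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions, assumed for this model only. */
#define MODEL_L2TFLS (1u << 0) /* L2T flush in flight */
#define MODEL_TMUWCF (1u << 8) /* TMU write-combiner flush */

struct model_l2t {
	uint32_t l2tcactl;    /* fake control register */
	int flush_reads_left; /* reads until a pending L2TFLS clears */
	bool clobbered;       /* TMUWCF was written while L2TFLS was pending */
};

/* Returns the current value; a pending flush completes after N reads. */
static uint32_t model_read(struct model_l2t *hw)
{
	uint32_t val = hw->l2tcactl;

	if ((val & MODEL_L2TFLS) && --hw->flush_reads_left <= 0)
		hw->l2tcactl &= ~MODEL_L2TFLS;
	return val;
}

/* Records the hazard: starting TMUWCF on top of a pending L2T flush. */
static void model_write(struct model_l2t *hw, uint32_t val)
{
	if ((hw->l2tcactl & MODEL_L2TFLS) && (val & MODEL_TMUWCF))
		hw->clobbered = true;
	hw->l2tcactl = val;
}

/*
 * On versions below 7.1 (encoded as 71 here), wait for any pending L2T
 * flush before starting the clean. Returns 0 on success and -1 if the
 * pending flush does not clear within max_polls reads.
 */
static int model_clean_caches(struct model_l2t *hw, int ver, int max_polls)
{
	if (ver < 71) {
		int polls = 0;

		while (model_read(hw) & MODEL_L2TFLS) {
			if (++polls >= max_polls)
				return -1;
		}
	}
	model_write(hw, MODEL_TMUWCF);
	return 0;
}
```

In the model, writing TMUWCF directly while L2TFLS is set trips the clobbered flag, whereas going through model_clean_caches() on an older version waits for the flush to finish first and never trips it.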
On runtime suspend, clean the V3D caches before suspending so all dirty
lines are written back to memory before the power domain is shut down.
Fixes several system hangs reported in [1][2][3].
Closes: raspberrypi#7381 [1]
Closes: raspberrypi#7396 [2]
Closes: raspberrypi#7397 [3]
Fixes: 17af1d14deaf ("drm/v3d: Introduce Runtime Power Management")
Signed-off-by: Maíra Canal <mcanal@igalia.com>
da2f572 to 716bd38
|
@pelwell, I just rebased the branch and reviewed the patches (fixing a few nits in the commit messages), so we are good to go. Thanks! I'll proceed with the upstreaming process. |
See: raspberrypi/linux#7394
kernel: PR of Maira's pmdomain/downstream/timeouts branch
See: raspberrypi/linux#7400
|
Following up on my previous comment — I promised a 48h report, here's the result. The test ran for ~22 hours before being cut short by a power outage (not a GPU hang). During that entire window, zero Kernel Congrats on the merge and on getting this upstreamed. Thanks again to @pelwell and @mairacanal for the fast turnaround! |
This reverts commit 3341dd2.
After #7400, this commit is no longer needed.
After further analysis, the 100ms autosuspend delay was only ever a
workaround: shorter delays caused more frequent runtime suspend/resume
cycles on the BCM2711, which exposed the cache and MMU coherency bugs as
random GPU hangs. With those hangs resolved, the inflated delay is no
longer necessary. Reduce it from 100ms to 50ms so the GPU power domain
can be released sooner once the GPU goes idle.
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Turn Maira's branch into a PR to get the build artefacts.