Re: [PATCH] cpuidle: riscv-sbi: Stop using non-retentive suspend
From: Conor Dooley
Date: Wed Nov 23 2022 - 05:28:12 EST
Hey Anup,
On Wed, Nov 23, 2022 at 09:56:31AM +0530, Anup Patel wrote:
> On Tue, Nov 22, 2022 at 4:50 PM Conor Dooley <conor.dooley@xxxxxxxxxxxxx> wrote:
> >
> > On Tue, Nov 22, 2022 at 11:06:15AM +0530, Anup Patel wrote:
> > > On Tue, Nov 22, 2022 at 10:46 AM Palmer Dabbelt <palmer@xxxxxxxxxxxx> wrote:
> > > >
> > > > On Mon, 21 Nov 2022 19:45:07 PST (-0800), anup@xxxxxxxxxxxxxx wrote:
> > > > > On Tue, Nov 22, 2022 at 2:27 AM Palmer Dabbelt <palmer@xxxxxxxxxxxx> wrote:
> > > > >>
> > > > >> From: Palmer Dabbelt <palmer@xxxxxxxxxxxx>
> > > > >>
> > > > >> As per [1], whether or not the core can wake up from non-retentive
> > > > >> suspend is a platform-specific detail. We don't have any way to encode
> > > > >> that, so just stop using them until we've sorted that out.
> > > > >>
> > > > >> Link: https://github.com/riscv-non-isa/riscv-sbi-doc/issues/98#issuecomment-1288564687
> > > > >> Fixes: 6abf32f1d9c5 ("cpuidle: Add RISC-V SBI CPU idle driver")
> > > > >> Signed-off-by: Palmer Dabbelt <palmer@xxxxxxxxxxxx>
> > > > >
> > > > > This is just unnecessary maintenance churn and it's not the
> > > > > right way to go. Better to fix this the right way instead of having
> > > > > a temporary fix.
> > > > >
> > > > > I had already sent-out a patch series 5 months back to describe
> > > > > this in DT:
> > > > > https://lore.kernel.org/lkml/20220727114302.302201-1-apatel@xxxxxxxxxxxxxxxx/
> > > > >
> > > > > No one has commented/suggested anything (except Samuel
> > > > > Holland and Sudeep Holla).
> > > >
> > > > I see some comments from Krzysztof here
> > > > <https://lore.kernel.org/lkml/7a0477a0-9f0f-87d6-4070-30321745f4cc@xxxxxxxxxx/>
> > > > as well. Looks like everyone is pointing out that having our CPU nodes
> > > > encode timers is a bad idea, my guess is that they're probably right.
> > >
> > > Adding a separate timer DT node, creates a new set of compatibility
> > > issues for existing platforms. I am fine updating my series to have
> > > separate timer DT node but do we want to go in this direction ?
> >
> > I don't really follow. How is there a compatibility issue created by
> > adding a new node that is not added for a new property? Both will
> > require changes to the device tree. (You need not reply here, I am going
> > to review the other thread, it's been on my todo list for too long. Been
> > caught up with non-coherent stuff & our sw release cycle..)
>
> Adding a new timer DT node would mean, the RISC-V timer driver
> will now be probed using the compatible to the new DT node whereas
> the RISC-V timer driver is currently probed using CPU DT nodes.
Ahh, that is what I have missed. I'll continue my thoughts on this in
the dt-binding thread.
> > > Even if ARM has a separate timer DT node, the timers are still part
> > > of the CPU. It depends on how we see the DT bindings aligning with
> > > actual HW.
> > >
> > > >
> > > > > Please review this series. I can quickly address comments to
> > > > > make this available for Linux-6.2. Until this series is merged,
> > > > > the affected platforms can simply remove non-retentive suspend
> > > > > states from their DT.
> > > >
> > > > That leaves us with a dependency between kernel versions and DT
> > > > bindings: kernels with the current driver will result in broken systems
> > > > with the non-retentive suspend states in the DT they boot with when
> > > > those states can't wake up the CPU.
> >
> > Can someone point me at a (non D1 or virt) system that has suspend
> > states in the DT that would need fixing?
>
> For the QEMU virt machine, the default non-retentive suspend state was
> tested using a temporary DTB provided separately via QEMU command
> line. The QEMU virt machine does not have its own HART suspend
> states so OpenSBI will functionally emulate default retentive/non-retentive
> suspend states.
So since I asked for non D1 or virt systems, that's a no & no DTs
actually needs to be fixed :)
> > > This is not a new problem we are facing. Even in the ARM world,
> > > the DT bindings grew organically over time based on newer platform
> > > requirements.
> > >
> > > Now that we have a platform which does not want the time
> > > C3STOP feature, we need to first come-up with DT bindings
> > > to support this platform instead of temporarily disabling
> > > features which don't work on this platform.
> >
> > It's the opposite surely? It should be "now that we have a platform that
> > *does want* the C3STOP feature", right?
>
> Yes, we can think this way as well.
No, there's no "thinking" involved here from what I can tell. Pre-D1
systems do not seem to need the flag and the D1 does want that flag for
its riscv,timer. We have to operate with respect to hardware timelines
& the corresponding software implementations, not specs in this context.
If it was the case that you proposed, there would be no chance for
regressions if someone updates their kernel but not their DT.
> > > > > With all due respect, NACK to this patch from my side.
> >
> > As Samuel pointed out that the D1 doesn't actually use the timer in
> > question, I think we are okay here?
>
> Yes, that's why D1 needs the C3STOP flag.
I don't understand what you mean here, you don't appear to be replying
to what I said.
I was saying that the current D1 configuration does not actually use
the timer-riscv driver as there's another one that has a higher rating
& therefore we are okay to not apply this patch as my revert will not
cause it to be put into sleep states that it cannot return from.
Your reply makes no sense to me in that context.
Thanks,
Conor.