Re: [PATCH V1] PCI/ASPM: Save/restore L1SS Capability for suspend/resume

From: Matthias Kaehlcke
Date: Wed Nov 23 2022 - 16:44:14 EST


Hi,

not sure this is the best thread to reply to, but we are also
observing suspend issues with the same Kioxia NVMe on a platform
with a Qualcomm sc7280 SoC. The system runs a v5.15 downstream
kernel which includes most (post v5.15) ASPM patches from
upstream.

There are two issues with the Kioxia NVMe:

1. Power consumption is high unless a LTR_L1.2_THRESHOLD of 0ns
is configured (related dubious patch: [1])

2. The system often hangs on resume unless a longer delay is
added in the suspend pass. QC engineers say that the NVMe is
taking so much time to settle in L1ss.

Other NVMe models don't exhibit power or suspend issues on this
platform, except for one model which also needs a shorter
delay during suspend, otherwise the system will hang
occasionally upon resume.

The second issue could possibly be 'fixed' with a quirk for
the Kioxia NVMe model, though it seems the issue is not seen on
all platforms, apparently the delay is not needed on Kenny's
system.

I'm currently a bit at a loss with the first issue. The patch
mentioned above claims that the (no-)snoop latencies are
involved, which may or may not be true. In this thread I saw
Kenny posting 'lspci' output from his (now) working system.
I noticed max (no-)snoop values of 3145728ns, which seems to
be some sort of default (programmed) max. On my system these
values are 0ns, which is the default value of the registers.
I tried to set these to 3146us from the kernel to see if that
makes a difference, but could only successfully update the max
snoop latency, but not non-snoop (maybe this can be only done
at early initialization time?). With just the max snoop latency
set to 3146us power consumption of the NVMe remains high.

The output of lspci from my system is attached.

In this thread it was mentioned that possibly a BIOS update
fixed the issue Kenny was seeing. What kind of values is
the BIOS supposed to adjust (I'm a PCI n00b)?

Any suggestions about what else to try?

Thanks

m.

[1] https://patchwork.kernel.org/project/linux-arm-msm/patch/1663315719-21563-1-git-send-email-quic_krichai@xxxxxxxxxxx/

---

0001:00:00.0 PCI bridge: Qualcomm Device 010b (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 183
IOMMU group: 0
Region 0: Memory at 40700000 (32-bit, non-prefetchable) [size=4K]
Bus: primary=00, secondary=01, subordinate=ff, sec-latency=0
I/O behind bridge: 00001000-00001fff [size=4K]
Memory behind bridge: 40300000-404fffff [size=2M]
Prefetchable memory behind bridge: 0000000040500000-00000000406fffff [size=2M]
Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [50] MSI: Enable+ Count=1/32 Maskable+ 64bit+
Address: 00000000fffff000 Data: 0000
Masking: fffffffe Pending: 00000000
Capabilities: [70] Express (v2) Root Port (Slot+), MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0
ExtTag- RBE+
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
LnkCap: Port #0, Speed 8GT/s, Width x2, ASPM L0s L1, Exit Latency L0s <1us, L1 <64us
ClockPM- Surprise- LLActRep+ BwNot- ASPMOptComp+
LnkCtl: ASPM L0s L1 Enabled; RCB 128 bytes, Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 8GT/s (ok), Width x2 (ok)
TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt-
SltCap: AttnBtn+ PwrCtrl+ MRL+ AttnInd+ PwrInd+ HotPlug+ Surprise+
Slot #0, PowerLimit 0.000W; Interlock+ NoCompl-
SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
Control: AttnInd Off, PwrInd Off, Power- Interlock-
SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet- Interlock-
Changed: MRL- PresDet- LinkState-
RootCap: CRSVisible-
RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible-
RootSta: PME ReqID 0000, PMEStatus- PMEPending-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ NROPrPrP+ LTR+
10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS- LN System CLS Not Supported, TPHComp+ ExtTPHComp- ARIFwd-
AtomicOpsCap: Routing- 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ OBFF Disabled, ARIFwd-
AtomicOpsCtl: ReqEn- EgressBlck-
LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported
Capabilities: [100 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
AERCap: First Error Pointer: 00, ECRCGenCap+ ECRCGenEn- ECRCChkCap+ ECRCChkEn-
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
RootCmd: CERptEn- NFERptEn- FERptEn-
RootSta: CERcvd- MultCERcvd- UERcvd- MultUERcvd-
FirstFatal- NonFatalMsg- FatalMsg- IntMsg 0
ErrorSrc: ERR_COR: 0000 ERR_FATAL/NONFATAL: 0000
Capabilities: [148 v1] Secondary PCI Express
LnkCtl3: LnkEquIntrruptEn- PerformEqu-
LaneErrStat: 0
Capabilities: [168 v1] Transaction Processing Hints
No steering table available
Capabilities: [1fc v1] L1 PM Substates
L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
PortCommonModeRestoreTime=70us PortTPowerOnTime=0us
L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
T_CommonMode=70us LTR1.2_Threshold=86016ns
L1SubCtl2: T_PwrOn=10us
Kernel driver in use: pcieport

0001:01:00.0 Non-Volatile memory controller: KIOXIA Corporation Device 0001 (prog-if 02 [NVM Express])
Subsystem: KIOXIA Corporation Device 0001
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 182
IOMMU group: 0
Region 0: Memory at 40300000 (64-bit, non-prefetchable) [size=16K]
Capabilities: [40] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 8GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <2us, L1 <32us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes, Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 8GT/s (ok), Width x2 (downgraded)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range AB, TimeoutDis+ NROPrPrP- LTR+
10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt+ EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS- TPHComp- ExtTPHComp-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ OBFF Disabled,
AtomicOpsCtl: ReqEn-
LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported
Capabilities: [80] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [90] MSI: Enable- Count=1/32 Maskable+ 64bit+
Address: 0000000000000000 Data: 0000
Masking: 00000000 Pending: 00000000
Capabilities: [b0] MSI-X: Enable+ Count=32 Masked-
Vector table: BAR=0 offset=00002000
PBA: BAR=0 offset=00003000
Capabilities: [100 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Capabilities: [150 v1] Virtual Channel
Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
Arb: Fixed- WRR32- WRR64- WRR128-
Ctrl: ArbSelect=Fixed
Status: InProgress-
VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
Status: NegoPending- InProgress-
Capabilities: [260 v1] Latency Tolerance Reporting
Max snoop latency: 0ns
Max no snoop latency: 0ns
Capabilities: [300 v1] Secondary PCI Express
LnkCtl3: LnkEquIntrruptEn- PerformEqu-
LaneErrStat: 0
Capabilities: [400 v1] L1 PM Substates
L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1- L1_PM_Substates+
PortCommonModeRestoreTime=60us PortTPowerOnTime=10us
L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
T_CommonMode=0us LTR1.2_Threshold=86016ns
L1SubCtl2: T_PwrOn=10us
Kernel driver in use: nvme


On Tue, Apr 12, 2022 at 05:50:47PM -0500, Bjorn Helgaas wrote:
> [+cc Ricky for rtsx_pci ASPM behavior, Rajat, Prasad for L1 SS stuff,
> Victor for interest in disabling ASPM during save/restore]
>
> On Wed, Feb 16, 2022 at 06:41:39PM +0530, Vidya Sagar wrote:
> > On 2/16/2022 11:30 AM, Kenneth R. Crudup wrote:
> > > On Wed, 16 Feb 2022, Vidya Sagar wrote:
> > >
> > > > I see that the ASPM-L1 state of Realtek NIC which was in
> > > > disabled state before hibernate got enabled after hibernate.
> > >
> > > That's actually my SD-Card reader; there's a good chance the BIOS
> > > does "something" to it at boot time, as it's possible to boot from
> > > SD-Card on my laptop.
> > >
> > > > This patch doesn't do anything to LnkCtl register which has
> > > > control for ASPM L1 state.
> > >
> > > > Could you please check why ASPM L1 got enabled post hibernation?
> > >
> > > I wouldn't know how to do that; if you're still interested in that
> > > let me know what to do to determine that.
>
> > I would like Bjorn to take a call on it.
> > At this point, there are contradictions in observations.
>
> Remind me what contradictions you see? I know Kenny saw NVMe errors
> on a kernel that included 4257f7e008ea ("PCI/ASPM: Save/restore L1SS
> Capability for suspend/resume") in December 2020 [1], and that he did
> *not* see those errors on 4257f7e008ea in February 2022 [2]. Is that
> what you mean?
>
> > Just to summarize,
> > - The root ports in your laptop don't have support for L1SS
> > - With the same old code base with which the errors were observed plus my
> > patch on top of it, I see that ASPM-L1 state getting enabled for one of the
> > endpoints (Realtek SD-Card reader) after system comes out of hibernation
> > even though ASPM-L1 was disabled before the system enter into hibernation.
> > No errors are reported now.
>
> I assume you refer to [2], where on 4257f7e008ea ("PCI/ASPM:
> Save/restore L1SS Capability for suspend/resume"), Kenny saw ASPM L1
> disabled before hibernate and enabled afterwards:
>
> --- pre-hibernate
> +++ post-hibernate
> 00:1d.7 PCI bridge [0604]: Intel [8086:34b7]
> Bus: primary=00, secondary=58, subordinate=58
> LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
> 58:00.0 RTS525A PCI Express Card Reader [10ec:525a]
> - LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk-
> - ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> + LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk-
> + ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
>
> Per PCIe r6.0, sec 7.5.3.7, "ASPM L1 must be enabled by software in
> the Upstream component on a Link prior to enabling ASPM L1 in the
> Downstream component on that Link," so this definitely seems broken,
> but wouldn't explain the NVMe issue.
>
> The PCI core (pcie_config_aspm_link()) always enables L1 in the
> upstream component before the downstream one, but 58:00.0 uses the
> rtsx_pci driver, which does a lot of its own ASPM fiddling, so my
> guess is that it's doing something wrong here.
>
> > - With the linux-next top of the tree plus my patch, no change in the ASPM
> > states and no errors also reported.
>
> I don't know which report this refers to.
>
> > This points to BIOS being buggy (both old and new with new one being less
> > problematic)
>
> I agree that a BIOS change between [1] and [2] seems plausible, but I
> don't think we can prove that yet. I'm slightly queasy because while
> Kenny may have updated his BIOS, most people will not have.
>
> I think we should try this patch again with some changes and maybe
> some debug logging:
>
> - I wonder if we should integrate the LTR, L1 SS, and Link Control
> ASPM restore instead of having them spread around through
> pci_restore_ltr_state(), pci_restore_aspm_l1ss_state(), and
> pci_restore_pcie_state(). Maybe a new pci_restore_aspm() that
> would be called from pci_restore_pcie_state()?
>
> - For L1 PM Substates configuration, sec 5.5.4 says that both ports
> must be configured while ASPM L1 is disabled, but I don't think we
> currently guarantee this: we restore all the upstream component
> state first, and we don't know the ASPM state of the downstream
> one. Maybe we need to:
>
> * When restoring upstream component,
> + disable its ASPM
>
> * When restoring downstream component,
> + disable its ASPM
> + restore upstream component's LTR, L1SS
> + restore downstream component's LTR, L1SS
> + restore upstream component's ASPM
> + restore downstream component's ASPM
>
> This seems pretty messy, but seems like what the spec requires.
>
> - Add some pci_dbg() logging of all these save/restore values to
> help debug any issues.
>
> Bjorn
>
> [1] https://lore.kernel.org/r/20201228040513.GA611645@bjorn-Precision-5520
> [2] https://lore.kernel.org/r/3ca14a7-b726-8430-fe61-a3ac183a1088@xxxxxxxxx
>