Re: [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()"
From: Will Deacon
Date: Fri Nov 18 2022 - 07:34:10 EST
On Fri, Nov 18, 2022 at 04:24:02PM +0530, Manivannan Sadhasivam wrote:
> On Mon, Nov 14, 2022 at 05:38:00PM +0000, Catalin Marinas wrote:
> > On Mon, Nov 14, 2022 at 03:14:21PM +0000, Robin Murphy wrote:
> > > On 2022-11-14 14:11, Will Deacon wrote:
> > > > On Mon, Nov 14, 2022 at 04:33:29PM +0530, Manivannan Sadhasivam wrote:
> > > > > This reverts commit c44094eee32f32f175aadc0efcac449d99b1bbf7.
> > > > >
> > > > > As reported by Amit [1], dropping cache invalidation from
> > > > > arch_dma_prep_coherent() triggers a crash on the Qualcomm SM8250 platform
> > > > > (most probably on other Qcom platforms too). The reason is, Qcom
> > > > > qcom_q6v5_mss driver copies the firmware metadata and shares it with modem
> > > > > for validation. The modem has a secure block (XPU) that will trigger a
> > > > > whole system crash if the shared memory is accessed by the CPU while modem
> > > > > is poking at it.
> > > > >
> > > > > To avoid this issue, the qcom_q6v5_mss driver allocates a chunk of memory
> > > > > with no kernel mapping, vmap's it, copies the firmware metadata and
> > > > > unvmap's it. Finally the address is then shared with modem for metadata
> > > > > validation [2].
> > > > >
> > > > > Now because of the removal of cache invalidation from
> > > > > arch_dma_prep_coherent(), there will be cache lines associated with this
> > > > > memory even after sharing with modem. So when the CPU accesses it, the XPU
> > > > > violation gets triggered.
> > > >
> > > > This last past is a non-sequitur: the buffer is no longer mapped on the CPU
> > > > side, so how would the CPU access it?
> > >
> > > Right, for the previous change to have made a difference the offending part
> > > of this buffer must be present in some cache somewhere *before* the DMA
> > > buffer allocation completes.
> > >
> > > Clearly that driver is completely broken though. If the DMA allocation came
> > > from a no-map carveout vma_dma_alloc_from_dev_coherent() then the vmap()
> > > shenanigans wouldn't work, so if it backed by struct pages then the whole
> > > dance is still pointless because *a cacheable linear mapping exists*, and
> > > it's just relying on the reduced chance that anything's going to re-fetch
> > > the linear map address after those pages have been allocated, exactly as I
> > > called out previously[1].
> >
> > So I guess a DMA pool that's not mapped in the linear map, together with
> > memremap() instead of vmap(), would work around the issue. But the
> > driver needs fixing, not the arch code.
> >
>
> Okay, thanks for the hint. Can you share how to allocate the dma-pool that's
> not part of the kernel's linear map? I looked into it but couldn't find a way.
The no-map property should take care of this iirc
Will