Re: [PATCH] arm64/mm: Drop redundant BUG_ON(!pgtable_alloc)
From: Will Deacon
Date: Mon Nov 21 2022 - 07:51:43 EST
On Mon, Nov 21, 2022 at 12:27:51PM +0000, Mark Rutland wrote:
> On Mon, Nov 21, 2022 at 11:00:42AM +0530, Anshuman Khandual wrote:
> > On 11/20/22 21:46, Nathan Chancellor wrote:
> > > I just bisected a boot failure in our QEMU-based continuous integration
> > > setup to this change, which landed as commit 9ed2b4616d4e ("arm64/mm:
> > > Drop redundant BUG_ON(!pgtable_alloc)") in the arm64 tree. There is no
> > > output, so the panic clearly happens early during boot. If I move back
> > > to the previous commit and add a WARN_ON() like so:
> > >
> > > diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> > > index d386033a074c..9280a92ff920 100644
> > > --- a/arch/arm64/mm/mmu.c
> > > +++ b/arch/arm64/mm/mmu.c
> > > @@ -383,6 +383,7 @@ static void __create_pgd_mapping_locked(pgd_t *pgdir, phys_addr_t phys,
> > >  	phys &= PAGE_MASK;
> > >  	addr = virt & PAGE_MASK;
> > >  	end = PAGE_ALIGN(virt + size);
> > > +	WARN_ON(!pgtable_alloc);
> > >
> > >  	do {
> > >  		next = pgd_addr_end(addr, end);
> > >
> > > I do see some stacktraces. I have attached the boot log from QEMU.
> > >
> > > If there is any additional information I can provide or patches I can
> > > test, I am more than happy to do so.
> >
> > There are a couple of instances where __create_pgd_mapping() gets called
> > without a valid pgtable alloc function (NULL is passed instead), as it is
> > not expected to allocate page table pages during the mapping process. The
> > following change on top of this patch should solve the reported problem.
> >
> > diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> > index 9ea8e9039992..a00563122fcb 100644
> > --- a/arch/arm64/mm/mmu.c
> > +++ b/arch/arm64/mm/mmu.c
> > @@ -42,6 +42,7 @@
> >  #define NO_BLOCK_MAPPINGS	BIT(0)
> >  #define NO_CONT_MAPPINGS	BIT(1)
> >  #define NO_EXEC_MAPPINGS	BIT(2)	/* assumes FEAT_HPDS is not used */
> > +#define NO_ALLOC_MAPPINGS	BIT(3)	/* does not allocate page table pages */
> >
> >  int idmap_t0sz __ro_after_init;
> >
> > @@ -380,7 +381,7 @@ static void __create_pgd_mapping_locked(pgd_t *pgdir, phys_addr_t phys,
> >  	phys &= PAGE_MASK;
> >  	addr = virt & PAGE_MASK;
> >  	end = PAGE_ALIGN(virt + size);
> > -	BUG_ON(!pgtable_alloc);
> > +	BUG_ON(!(flags & NO_ALLOC_MAPPINGS) && !pgtable_alloc);
> >
> >  	do {
> >  		next = pgd_addr_end(addr, end);
> > @@ -453,7 +454,7 @@ static void __init create_mapping_noalloc(phys_addr_t phys, unsigned long virt,
> >  		return;
> >  	}
> >  	__create_pgd_mapping(init_mm.pgd, phys, virt, size, prot, NULL,
> > -			     NO_CONT_MAPPINGS);
> > +			     NO_CONT_MAPPINGS | NO_ALLOC_MAPPINGS);
> >  }
> >
> >  void __init create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
> > @@ -481,7 +482,7 @@ static void update_mapping_prot(phys_addr_t phys, unsigned long virt,
> >  	}
> >
> >  	__create_pgd_mapping(init_mm.pgd, phys, virt, size, prot, NULL,
> > -			     NO_CONT_MAPPINGS);
> > +			     NO_CONT_MAPPINGS | NO_ALLOC_MAPPINGS);
> >
> >  	/* flush the TLBs after updating live kernel mappings */
> >  	flush_tlb_kernel_range(virt, virt + size);
>
> This is now more complicated than what we had originally, and it doesn't catch
> the case where the caller sets NO_ALLOC_MAPPINGS but the callee ends up needing
> to perform an allocation, which the old code would have caught.
>
> This is clearly more subtle than we thought initially; for now could we please
> just drop the patch?
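To make that concrete, here is a minimal, self-contained sketch of the case
described above. This is not the actual arch/arm64/mm/mmu.c code: the struct,
the map_at_level() helper and the main() harness are invented purely for
illustration, with BUG_ON() stubbed via assert(). It only shows why a check
placed next to each pgtable_alloc() call catches more than a flag tested once
on entry to the walk.

/*
 * Illustrative sketch only; not the real arm64 page-table code. One level
 * of a table walk: a new table is allocated only when the entry is missing.
 */
#include <assert.h>
#include <stddef.h>

#define BUG_ON(cond) assert(!(cond))	/* stand-in for the kernel macro */

typedef unsigned long phys_addr_t;
typedef phys_addr_t (*pgtable_alloc_fn)(int shift);

struct table_entry {
	int present;			/* next-level table already exists? */
	phys_addr_t next_table;
};

static void map_at_level(struct table_entry *e, pgtable_alloc_fn pgtable_alloc)
{
	if (!e->present) {
		/*
		 * Pre-9ed2b4616d4e behaviour: complain only when a table
		 * genuinely has to be allocated and no allocator was given.
		 */
		BUG_ON(!pgtable_alloc);
		e->next_table = pgtable_alloc(12 /* PAGE_SHIFT */);
		e->present = 1;
	}
	/* ... update entries in the (now) existing next-level table ... */
}

int main(void)
{
	struct table_entry existing = { .present = 1, .next_table = 0x1000 };
	struct table_entry missing = { .present = 0, .next_table = 0 };

	/* Fine: nothing needs allocating, so passing NULL is legitimate. */
	map_at_level(&existing, NULL);

	/*
	 * Trips the assertion at the exact allocation site. A flag like
	 * NO_ALLOC_MAPPINGS, tested once before the walk, would let this
	 * case through silently.
	 */
	map_at_level(&missing, NULL);
	return 0;
}

Callers such as create_mapping_noalloc() correspond to the first call: they
pass NULL and are safe only as long as every level they touch already exists,
and the per-allocation-site check catches the moment that assumption breaks.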
Absolutely, this was supposed to be a trivial cleanup, but it's clearly much
more than that. I'll revert it today.
Thanks, Nathan!
Will