Re: [PATCH RFC v2 06/12] mm/hugetlb: Protect huge_pmd_share() with walker lock

From: Peter Xu
Date: Thu Nov 17 2022 - 20:18:54 EST


On Thu, Nov 17, 2022 at 08:10:19PM -0500, Peter Xu wrote:
> huge_pmd_share() is normally called with vma lock held already, it lets us
> feel like we don't really need the walker lock. But that's not true,
> because we only took the vma lock for "current" vma, but not all the vma
> pgtables we're going to walk upon.
>
> Taking each vma lock may lead to deadlock and hard to order. The safe
> approach is just to use the walker lock which guarantees the pgtable page
> being alive, then we should use get_page_unless_zero() rather than
> get_page() only, to make sure the pgtable page is not being freed at the
> meantime.
>
> Signed-off-by: Peter Xu <peterx@xxxxxxxxxx>
> ---
> mm/hugetlb.c | 16 +++++++++++++---
> 1 file changed, 13 insertions(+), 3 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 61a1fa678172..5ef883184885 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -7008,6 +7008,13 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
> spinlock_t *ptl;
>
> i_mmap_lock_read(mapping);
> +
> + /*
> + * NOTE: even if we've got the vma read lock, here we still need to
> + * take the walker lock, because we're not walking the current vma,
> + * but some other mm's!
> + */
> + hugetlb_walker_lock();
> vma_interval_tree_foreach(svma, &mapping->i_mmap, idx, idx) {
> if (svma == vma)
> continue;
> @@ -7016,12 +7023,15 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
> if (saddr) {
> spte = huge_pte_offset(svma->vm_mm, saddr,
> vma_mmu_pagesize(svma));
> - if (spte) {
> - get_page(virt_to_page(spte));
> + /*
> + * When page ref==0, it means it's probably being
> + * freed; continue with the next one.
> + */
> + if (spte && get_page_unless_zero(virt_to_page(spte)))
> break;
> - }
> }
> }
> + hugetlb_walker_unlock();
>
> if (!spte)
> goto out;
> --
> 2.37.3
>

Will squash this in..

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 6c77ae7a3d94..f9abede9de47 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -7045,6 +7045,8 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
*/
if (spte && get_page_unless_zero(virt_to_page(spte)))
break;
+ else
+ spte = NULL;
}
}
hugetlb_walker_unlock();

--
Peter Xu