Re: [RFC PATCH v2 10/47] hugetlb: add hugetlb_pte to track HugeTLB page table entries
From: Peter Xu
Date: Wed Nov 16 2022 - 17:19:05 EST
On Fri, Oct 21, 2022 at 04:36:26PM +0000, James Houghton wrote:
> +struct hugetlb_pte {
> + pte_t *ptep;
> + unsigned int shift;
> + enum hugetlb_level level;
> + spinlock_t *ptl;
> +};
Do we need both shift + level? Maybe it's only meaningful for ARM where
the shift may not be directly calculcated from level?
I'm wondering whether we can just maintain "shift" then we calculate
"level" realtime. It just reads a bit weird to have these two fields, also
a burden to most of the call sites where shift and level exactly match..
> +
> +static inline
> +void hugetlb_pte_populate(struct hugetlb_pte *hpte, pte_t *ptep,
> + unsigned int shift, enum hugetlb_level level)
I'd think it's nicer to replace "populate" with something else, as populate
is definitely a meaningful word in vm world for "making something appear if
it wasn't". Maybe hugetlb_pte_setup()?
Even one step back, on the naming of hugetlb_pte.. Sorry to comment on
namings especially on this one, I really don't like to do that normally..
but here hugetlb_pte only walks the sub-page level of pgtables, meanwhile
it's not really a pte but an iterator. How about hugetlb_hgm_iter? "hgm"
tells that it only walks sub-level, and "iter" tells that it is an
iterator, being updated for each stepping downwards.
Then hugetlb_pte_populate() can be hugetlb_hgm_iter_init().
Take these comments with a grain of salt, and it never hurts to wait for a
2nd opinion before anything.
> +{
> + WARN_ON_ONCE(!ptep);
> + hpte->ptep = ptep;
> + hpte->shift = shift;
> + hpte->level = level;
> + hpte->ptl = NULL;
> +}
> +
> +static inline
> +unsigned long hugetlb_pte_size(const struct hugetlb_pte *hpte)
> +{
> + WARN_ON_ONCE(!hpte->ptep);
> + return 1UL << hpte->shift;
> +}
> +
> +static inline
> +unsigned long hugetlb_pte_mask(const struct hugetlb_pte *hpte)
> +{
> + WARN_ON_ONCE(!hpte->ptep);
> + return ~(hugetlb_pte_size(hpte) - 1);
> +}
> +
> +static inline
> +unsigned int hugetlb_pte_shift(const struct hugetlb_pte *hpte)
> +{
> + WARN_ON_ONCE(!hpte->ptep);
> + return hpte->shift;
> +}
> +
> +static inline
> +enum hugetlb_level hugetlb_pte_level(const struct hugetlb_pte *hpte)
> +{
> + WARN_ON_ONCE(!hpte->ptep);
There're definitely a bunch of hpte->ptep WARN_ON_ONCE()s.. AFAIK the
hugetlb_pte* will be setup once with valid ptep and then it should always
be. I rem someone commented on these helpers look not useful, which I must
confess I had the same feeling. But besides that, I'd rather drop all
these WARN_ON_ONCE()s but only check it when init() the iterator/pte.
> + return hpte->level;
> +}
> +
> +static inline
> +void hugetlb_pte_copy(struct hugetlb_pte *dest, const struct hugetlb_pte *src)
> +{
> + dest->ptep = src->ptep;
> + dest->shift = src->shift;
> + dest->level = src->level;
> + dest->ptl = src->ptl;
> +}
> +
> +bool hugetlb_pte_present_leaf(const struct hugetlb_pte *hpte, pte_t pte);
> +
> struct hugepage_subpool {
> spinlock_t lock;
> long count;
> @@ -1210,6 +1279,25 @@ static inline spinlock_t *huge_pte_lock(struct hstate *h,
> return ptl;
> }
>
> +static inline
> +spinlock_t *hugetlb_pte_lockptr(struct mm_struct *mm, struct hugetlb_pte *hpte)
> +{
> +
> + BUG_ON(!hpte->ptep);
Another BUG_ON(); better be dropped too.
> + if (hpte->ptl)
> + return hpte->ptl;
> + return huge_pte_lockptr(hugetlb_pte_shift(hpte), mm, hpte->ptep);
I'm curious whether we can always have hpte->ptl set for a valid
hugetlb_pte. I think that means we'll need to also init the ptl in the
init() fn of the iterator. Then it'll be clear on which lock to take for
each valid hugetlb_pte.
> +}
--
Peter Xu