Re: [PATCH 2/2] mm/page_alloc: Leave IRQs enabled for per-cpu page allocations
From: Vlastimil Babka
Date: Fri Nov 18 2022 - 09:31:11 EST
On 11/18/22 11:17, Mel Gorman wrote:
> The pcp_spin_lock_irqsave protecting the PCP lists is IRQ-safe because a
> task allocating from the PCP must not re-enter the allocator from IRQ
> context. In each instance where IRQ-reentrancy is a concern, the lock is
> acquired using pcp_spin_trylock_irqsave(), even though IRQs are disabled
> there and re-entrancy is impossible.
>
> Demoting the lock to pcp_spin_lock avoids an IRQ disable/enable in the
> common case at the cost of some IRQ allocations taking a slower path. If
> the PCP lists need to be refilled, the zone lock still needs to disable
> IRQs, but that will only happen on PCP refill and drain. If an IRQ is
> raised while a PCP allocation is in progress, the trylock will fail and
> the allocation will fall back to using the buddy lists directly. Note
> that this may not be a universal win if an interrupt-intensive workload
> also allocates heavily from interrupt context and contends heavily on
> the zone->lock as a result.
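
(Just to spell out the resulting fast path for other readers: IIUC the
allocation side ends up looking roughly like the below after this patch.
A condensed sketch from my reading of the series, not the literal hunk;
on trylock failure the caller falls back to the buddy path.)

	/*
	 * IRQs stay enabled here. pcp_trylock_prepare() is a no-op with
	 * SMP; on UP it disables IRQs because spin_trylock() always
	 * succeeds there and cannot prevent re-entrancy by itself.
	 */
	pcp_trylock_prepare(UP_flags);
	pcp = pcp_spin_trylock(zone->per_cpu_pageset);
	if (!pcp) {
		/* Interrupted a PCP lock holder; use the buddy lists. */
		pcp_trylock_finish(UP_flags);
		return NULL;
	}

	/* ... allocate from the PCP lists ... */

	pcp_spin_unlock(pcp);
	pcp_trylock_finish(UP_flags);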
>
> [yuzhao@xxxxxxxxxx: Reported lockdep issue on IO completion from softirq]
> [hughd@xxxxxxxxxx: Fix list corruption, lock improvements, micro-optimisations]
> Signed-off-by: Mel Gorman <mgorman@xxxxxxxxxxxxxxxxxxx>
Reviewed-by: Vlastimil Babka <vbabka@xxxxxxx>
Some nits below:
> @@ -3516,10 +3485,10 @@ void free_unref_page(struct page *page, unsigned int order)
> */
> void free_unref_page_list(struct list_head *list)
> {
> + unsigned long __maybe_unused UP_flags;
> struct page *page, *next;
> struct per_cpu_pages *pcp = NULL;
> struct zone *locked_zone = NULL;
> - unsigned long flags;
> int batch_count = 0;
> int migratetype;
>
> @@ -3550,11 +3519,26 @@ void free_unref_page_list(struct list_head *list)
>
> /* Different zone, different pcp lock. */
> if (zone != locked_zone) {
> - if (pcp)
> - pcp_spin_unlock_irqrestore(pcp, flags);
> + if (pcp) {
> + pcp_spin_unlock(pcp);
> + pcp_trylock_finish(UP_flags);
> + }
>
> + /*
> + * trylock is necessary as pages may be getting freed
> + * from IRQ or SoftIRQ context after an IO completion.
> + */
> + pcp_trylock_prepare(UP_flags);
> + pcp = pcp_spin_trylock(zone->per_cpu_pageset);
> + if (!pcp) {
Perhaps use unlikely() here?
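i.e.:

	if (unlikely(!pcp)) {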
> + pcp_trylock_finish(UP_flags);
> + free_one_page(zone, page, page_to_pfn(page),
> + 0, migratetype, FPI_NONE);
Not critical for correctness, but the migratetype here might be stale and we
should do get_pcppage_migratetype(page);
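i.e. re-read it at the fallback site, something like:

	free_one_page(zone, page, page_to_pfn(page), 0,
		      get_pcppage_migratetype(page), FPI_NONE);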
> + locked_zone = NULL;
> + continue;
> + }
> locked_zone = zone;
> - pcp = pcp_spin_lock_irqsave(locked_zone->per_cpu_pageset, flags);
> + batch_count = 0;
> }
>
> /*
> @@ -3569,18 +3553,23 @@ void free_unref_page_list(struct list_head *list)
> free_unref_page_commit(zone, pcp, page, migratetype, 0);
>
> /*
> - * Guard against excessive IRQ disabled times when we get
> - * a large list of pages to free.
> + * Guard against excessive lock hold times when freeing
> + * a large list of pages. Lock will be reacquired if
> + * necessary on the next iteration.
> */
> if (++batch_count == SWAP_CLUSTER_MAX) {
> - pcp_spin_unlock_irqrestore(pcp, flags);
> + pcp_spin_unlock(pcp);
> + pcp_trylock_finish(UP_flags);
> batch_count = 0;
> - pcp = pcp_spin_lock_irqsave(locked_zone->per_cpu_pageset, flags);
> + pcp = NULL;
> + locked_zone = NULL;
AFAICS if this block were just "locked_zone = NULL;" then the existing code
above would do the right thing.
Or maybe, to keep the code simpler, just do batch_count++ here and make the
relocking check do:
if (zone != locked_zone || batch_count == SWAP_CLUSTER_MAX)
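Roughly, i.e. (an untested sketch, eliding the trylock/fallback details
quoted above):

	if (zone != locked_zone || batch_count == SWAP_CLUSTER_MAX) {
		if (pcp) {
			pcp_spin_unlock(pcp);
			pcp_trylock_finish(UP_flags);
		}

		/* trylock, fallback to free_one_page() etc. as above */

		locked_zone = zone;
		batch_count = 0;
	}

	...
	free_unref_page_commit(zone, pcp, page, migratetype, 0);
	batch_count++;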
> }
> }
>
> - if (pcp)
> - pcp_spin_unlock_irqrestore(pcp, flags);
> + if (pcp) {
> + pcp_spin_unlock(pcp);
> + pcp_trylock_finish(UP_flags);
> + }
> }
>
> /*