Re: [PATCH 0/3] mm,thp,rmap: rework the use of subpages_mapcount
From: Johannes Weiner
Date: Mon Nov 21 2022 - 13:52:07 EST

On Mon, Nov 21, 2022 at 04:59:38PM +0000, Shakeel Butt wrote:
> On Fri, Nov 18, 2022 at 01:08:13AM -0800, Hugh Dickins wrote:
> > Linus was underwhelmed by the earlier compound mapcounts series:
> > this series builds on top of it (as in next-20221117) to follow
> > up on his suggestions - except rmap.c still using lock_page_memcg(),
> > since I hesitate to steal the pleasure of deletion from Johannes.
>
> Is there a plan to remove lock_page_memcg() altogether which I missed? I
> am planning to make lock_page_memcg() a nop for cgroup-v2 (as it shows
> up in the perf profile on exit path) but if we are removing it then I
> should just wait.

We can remove it for rmap at least, but we might be able to do more.

Besides rmap, we're left with the dirty and writeback page transitions
that wrt cgroups need to be atomic with NR_FILE_DIRTY and NR_WRITEBACK.

Looking through the various callsites, I think we can delete it from
setting and clearing dirty state, as we always hold the page lock (or
the pte lock in some instances of folio_mark_dirty). Both of these are
taken from the cgroup side, so we're good there.

I think we can also remove it when setting writeback, because those
sites have the page locked as well.

That leaves clearing writeback. This can't hold the page lock due to
the atomic context, so currently we need to take lock_page_memcg() as
the lock of last resort.

I wonder if we can have cgroup take the xalock instead: writeback
ending on file pages always acquires the xarray lock. Swap writeback
currently doesn't, but we could make it so (swap_address_space).

The only thing that gives me pause is the !mapping check in
__folio_end_writeback. File and swapcache pages usually have mappings,
and truncation waits for writeback to finish before axing
page->mapping. So AFAICS this can only happen if we call end_writeback
on something that isn't under writeback - in which case the test_clear
will fail and we don't update the stats anyway. But I want to be sure.

Does anybody know off the top of their head if a page under
writeback could be without a mapping in some weird corner case?

If we could ensure that the NR_WRITEBACK decs are always protected by
the xalock, we could grab it from mem_cgroup_move_account(), and then
kill lock_page_memcg() altogether.
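The locking idea above can be sketched as a rough userspace model. This
is not kernel code: a pthread mutex stands in for the xarray lock, and
every struct and function name (model_memcg, model_mapping, model_folio,
model_start_writeback, model_end_writeback, model_move_account) is
hypothetical. The model also assumes a mapping is always present, which
is exactly the open question about the !mapping check. For simplicity
the set side takes the same mutex; the page lock is not modelled.

```c
/* Userspace model only; all names are hypothetical, not kernel APIs. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct model_memcg { long nr_writeback; };          /* per-cgroup stat */
struct model_mapping { pthread_mutex_t xa_lock; };  /* stand-in for the xarray lock */

struct model_folio {
	struct model_mapping *mapping;
	struct model_memcg *memcg;
	bool writeback;
};

/* Set the writeback flag and bump the owning cgroup's counter. */
static void model_start_writeback(struct model_folio *folio)
{
	assert(folio->mapping);	/* assumption: mapping never NULL here */
	pthread_mutex_lock(&folio->mapping->xa_lock);
	if (!folio->writeback) {
		folio->writeback = true;
		folio->memcg->nr_writeback++;
	}
	pthread_mutex_unlock(&folio->mapping->xa_lock);
}

/*
 * Test-and-clear under the lock. If the flag was not set, no stat is
 * touched. Returns whether the flag had been set.
 */
static bool model_end_writeback(struct model_folio *folio)
{
	bool was_set;

	assert(folio->mapping);	/* assumption: mapping never NULL here */
	pthread_mutex_lock(&folio->mapping->xa_lock);
	was_set = folio->writeback;
	if (was_set) {
		folio->writeback = false;
		folio->memcg->nr_writeback--;
	}
	pthread_mutex_unlock(&folio->mapping->xa_lock);
	return was_set;
}

/* Move the charge; the same lock keeps model_end_writeback() out. */
static void model_move_account(struct model_folio *folio,
			       struct model_memcg *to)
{
	assert(folio->mapping);	/* assumption: mapping never NULL here */
	pthread_mutex_lock(&folio->mapping->xa_lock);
	if (folio->writeback) {
		folio->memcg->nr_writeback--;
		to->nr_writeback++;
	}
	folio->memcg = to;
	pthread_mutex_unlock(&folio->mapping->xa_lock);
}
```

Because the flag clear, the counter decrement and the charge move all
serialize on one lock in this model, the decrement always lands on the
cgroup that currently owns the folio. Whether the real code paths can
guarantee the same depends on the mapping question raised above.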