32bit architectures and __HAVE_ARCH_PTE_SWP_EXCLUSIVE
From: David Hildenbrand
Date: Tue Nov 22 2022 - 09:07:47 EST
Hi all,
Spoiler: is there a real use case for > 16 GiB of swap in a single file
on 32bit architectures?
I'm currently looking into implementing __HAVE_ARCH_PTE_SWP_EXCLUSIVE
support for all remaining architectures. So far, I only implemented it
for the most relevant enterprise architectures.
With __HAVE_ARCH_PTE_SWP_EXCLUSIVE, we remember when unmapping a page
and replacing the present PTE by a swap PTE for swapout whether the
anonymous page that was mapped was exclusive (PageAnonExclusive(), i.e.,
not COW-shared). When refaulting that page, whereby we replace the swap
PTE by a present PTE, we can reuse that information to map that page
writable and avoid unnecessary page copies due to COW, even if there are
still unexpected references on the page.
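
As a rough illustration of the mechanism described above, here is a
user-space sketch of a swap PTE that carries an exclusive marker. The bit
layout, the swp_pte_t type and all sketch_* names are assumptions made up
for this example; real layouts are architecture-specific, and the kernel's
own helpers (pte_swp_mkexclusive(), pte_swp_exclusive()) operate on pte_t
rather than on a plain integer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical 32-bit swap PTE layout (illustration only):
 *   bit  0     : present bit, always 0 in a swap PTE
 *   bit  1     : exclusive marker
 *   bits 2-6   : swap type (which swap area)
 *   bits 7-31  : swap offset (page index inside the swap area)
 */
typedef uint32_t swp_pte_t;

#define SWP_EXCLUSIVE_BIT (UINT32_C(1) << 1)
#define SWP_TYPE_SHIFT    2
#define SWP_TYPE_BITS     5
#define SWP_OFFSET_SHIFT  (SWP_TYPE_SHIFT + SWP_TYPE_BITS)
#define SWP_OFFSET_BITS   (32 - SWP_OFFSET_SHIFT)

/* Swapout: build the swap PTE and remember whether the page was exclusive. */
static inline swp_pte_t sketch_mk_swap_pte(uint32_t type, uint32_t offset,
                                           bool anon_exclusive)
{
    assert(type < (UINT32_C(1) << SWP_TYPE_BITS));
    assert(offset < (UINT32_C(1) << SWP_OFFSET_BITS));

    swp_pte_t pte = (type << SWP_TYPE_SHIFT) | (offset << SWP_OFFSET_SHIFT);
    if (anon_exclusive)
        pte |= SWP_EXCLUSIVE_BIT;
    return pte;
}

static inline bool sketch_swp_exclusive(swp_pte_t pte)
{
    return (pte & SWP_EXCLUSIVE_BIT) != 0;
}

static inline uint32_t sketch_swp_type(swp_pte_t pte)
{
    return (pte >> SWP_TYPE_SHIFT) & ((UINT32_C(1) << SWP_TYPE_BITS) - 1);
}

static inline uint32_t sketch_swp_offset(swp_pte_t pte)
{
    return pte >> SWP_OFFSET_SHIFT;
}

/*
 * Refault: the page may be mapped writable right away only if the mapping
 * allows writes and the marker says the page was exclusive at swapout.
 * Otherwise a later write fault has to go through COW handling.
 */
static inline bool sketch_can_map_writable(swp_pte_t pte, bool vma_writable)
{
    return vma_writable && sketch_swp_exclusive(pte);
}
```

For example, sketch_mk_swap_pte(3, 1000, true) yields an entry for which
sketch_can_map_writable(pte, true) holds, while the same entry built with
anon_exclusive == false does not. Note that the marker occupies a bit that
could otherwise have extended the offset field.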
While this would usually be a pure optimization, currently O_DIRECT
still (wrongly) uses FOLL_GET instead of FOLL_PIN and can trigger
memory corruption in corner cases. So for that case, it is also a
temporary fix until O_DIRECT properly uses FOLL_PIN. More details can be
found in [1].
Ideally, I'd just implement __HAVE_ARCH_PTE_SWP_EXCLUSIVE for all
architectures. However, __HAVE_ARCH_PTE_SWP_EXCLUSIVE requires an
additional bit in the swap PTE. While mostly unproblematic on 64bit, for
32bit this implies that we'll have to "steal" one bit from the swap
offset on most architectures, reducing the maximum swap size per file.
Assuming we previously supported 32 GiB per swap file (e.g., hexagon,
csky), this number would get reduced to 16 GiB. The kernel would
automatically truncate the oversized swap area and the system would
continue working by using less space of that swapfile, but ... well, is
there a but?
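
The halving follows directly from the width of the offset field. A small
sketch of the arithmetic, assuming 4 KiB pages (page shift 12), under which
32 GiB corresponds to 23 offset bits; the helper name is made up for this
example and the actual bit counts on hexagon or csky are not taken from the
kernel sources here:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Maximum bytes addressable in one swap file when the swap PTE has
 * `offset_bits` bits for the page index and pages are 2^page_shift bytes.
 */
static inline uint64_t sketch_max_swap_bytes(unsigned offset_bits,
                                             unsigned page_shift)
{
    assert(offset_bits + page_shift < 64);
    return (UINT64_C(1) << offset_bits) << page_shift;
}
```

With these assumptions, sketch_max_swap_bytes(23, 12) is 32 GiB, and giving
up one offset bit, sketch_max_swap_bytes(22, 12), leaves 16 GiB.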
Usually (well, there is PAE on x86 ...), a 32bit system can address 4
GiB of memory. Maximum swap size recommendations seem to be around 2-3
times the memory size (2x without hibernation, 3x with hibernation). So
it sounds like there is barely a use case for more swap space. Of course,
one can use multiple swap files.
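
To make that estimate concrete, here is a minimal sketch that applies the
2x/3x rule of thumb and compares the result against a per-file limit. The
function names and the GIB macro are hypothetical, and the multipliers are
just the recommendation quoted above, not a kernel policy:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GIB (UINT64_C(1) << 30)

/* Rule-of-thumb swap size: 2x RAM, or 3x RAM when hibernation is used. */
static inline uint64_t sketch_recommended_swap(uint64_t ram_bytes,
                                               bool hibernation)
{
    assert(ram_bytes <= UINT64_MAX / 3); /* avoid overflow in the multiply */
    return ram_bytes * (hibernation ? 3 : 2);
}

/* Does the recommended amount fit into a single swap file of this limit? */
static inline bool sketch_fits_one_swapfile(uint64_t ram_bytes,
                                            bool hibernation,
                                            uint64_t per_file_limit)
{
    return sketch_recommended_swap(ram_bytes, hibernation) <= per_file_limit;
}
```

For 4 GiB of memory with hibernation this gives 12 GiB, which still fits
below a 16 GiB per-file limit.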
So, is anybody aware of excessive swap space requirements on 32bit?
Note that I thought about storing the exclusive marker in the swap_map
instead of in the swap PTE, but quickly decided to discard that idea
because it results in significantly more complexity and the swap code is
already horrible enough.
[1] https://lkml.kernel.org/r/20220329164329.208407-1-david@xxxxxxxxxx
--
Thanks,
David / dhildenb