RE: BUG: in squashfs_xz_uncompress() (Was: RCU stalls in squashfs_readahead())

From: Elliott, Robert (Servers)
Date: Fri Nov 18 2022 - 11:52:13 EST




> -----Original Message-----
> From: Phillip Lougher <phillip@xxxxxxxxxxxxxxx>
> Sent: Friday, November 18, 2022 12:11 AM
> To: Mirsad Goran Todorovac <mirsad.todorovac@xxxxxxxxxxxx>; LKML
> <linux-kernel@xxxxxxxxxxxxxxx>; Paul E. McKenney <paulmck@xxxxxxxxxx>
> Cc: phillip.lougher@xxxxxxxxx; Thorsten Leemhuis
> <regressions@xxxxxxxxxxxxx>
> Subject: Re: BUG: in squashfs_xz_uncompress() (Was: RCU stalls in
> squashfs_readahead())
>
> On 17/11/2022 23:05, Mirsad Goran Todorovac wrote:
> > Hi,
> >
> > While trying to bisect, I've found another bug that predated the
> > introduction of squashfs_readahead(), but it has
> > a common denominator in squashfs_decompress() and
> > squashfs_xz_uncompress().
>
> Wrong, the stall is happening in the XZ decompressor library, which
> is *not* in Squashfs.
>
> This reported stall in the decompressor code is likely a symptom of you
> deliberately thrashing your system. When the system thrashes everything
> starts to happen very slowly, and the system will spend a lot of
> its time doing page I/O, and the CPU will spend a lot of time in
> any CPU intensive code like the XZ decompressor library.
>
> So the fact the stall is being hit here is a symptom and not
> a cause. The decompressor code is likely running slowly due to
> thrashing and waiting on paged-out buffers. This is not indicative
> of any bug, merely a system running slowly due to overload.
>
> As I said, this is not a Squashfs issue, because the code where the
> stall takes place isn't in Squashfs.
>
> The people responsible for the rcu code should have a lot more insight
> about what happens when the system is thrashing, and how this will
> throw up false positives. In this case I believe this is an instance of
> perfectly correct code running slowly due to thrashing incorrectly
> being flagged as looping.
>
> CC'ing Paul E. McKenney <paulmck@xxxxxxxxxx>
>
> Phillip

How big can these readahead sizes be? Should one of the loops include
cond_resched() calls?
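
To illustrate the pattern I have in mind, here is a minimal userspace
sketch. It is not the actual Squashfs or XZ code: process_in_chunks() and
cond_resched_standin() are hypothetical names, sched_yield() stands in for
the kernel's cond_resched(), and the byte sum is only a placeholder for the
real per-block decompression work. The point is just that a long-running
loop offers the scheduler a chance to run something else after each
bounded chunk of work.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <stdint.h>

/* Counts how many times the loop offered to give up the CPU. */
static unsigned long yield_count;

/*
 * Userspace stand-in for the kernel's cond_resched().
 * Assumption for illustration only: in the kernel this would be a
 * plain cond_resched() call, not sched_yield().
 */
static void cond_resched_standin(void)
{
	yield_count++;
	sched_yield();
}

/*
 * Hypothetical worker: handle `len` bytes in pieces of at most `chunk`
 * bytes, yielding after each piece so that one large request cannot
 * monopolise the CPU. The checksum is a placeholder for real work.
 */
static uint32_t process_in_chunks(const uint8_t *buf, size_t len, size_t chunk)
{
	uint32_t sum = 0;
	size_t off = 0;

	assert(chunk > 0);

	while (off < len) {
		size_t n = (len - off < chunk) ? (len - off) : chunk;

		for (size_t i = 0; i < n; i++)
			sum += buf[off + i];

		off += n;
		cond_resched_standin();	/* scheduling point between chunks */
	}

	return sum;
}
```

With a 10-byte buffer and a chunk size of 4, the loop runs three times
(4 + 4 + 2 bytes) and reaches the scheduling point three times. Whether
the real loops are long enough to need this depends on how large the
readahead requests can get, which is the question above.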