RE: BUG: BISECTED: in squashfs_xz_uncompress() (Was: RCU stalls in squashfs_readahead())
From: Elliott, Robert (Servers)
Date: Wed Nov 23 2022 - 11:09:53 EST
> -----Original Message-----
> From: Paul E. McKenney <paulmck@xxxxxxxxxx>
> Sent: Wednesday, November 23, 2022 8:44 AM
> Subject: Re: BUG: BISECTED: in squashfs_xz_uncompress() (Was: RCU stalls
> in squashfs_readahead())
>
> On Wed, Nov 23, 2022 at 10:14:07AM +0100, Mirsad Goran Todorovac wrote:
...
> > commit 1045a06724f322ed61f1ffb994427c7bdbe64647
> > Author: Christoph Hellwig <hch@xxxxxx>
> > Date: Wed Jun 29 17:01:02 2022 +0200
> >
> > remove CONFIG_ANDROID
>
> But your CONFIG_RCU_EXP_CPU_STALL_TIMEOUT seems to be about 20.
> As in 20 milliseconds.
>
> Is that intentional?
...
> > > And one approach is to take a less aggressive RCU CPU stall timeout,
> > > say reducing from 21 seconds to (say) 15 seconds instead of all the
> > > way down to 20 milliseconds. This could allow you to ease into the
> > > latency-reduction work.
> > >
> > > Alternatively, consider that response time is a property of the
> > > entire system plus the environment that it runs in. So I suspect that
> > > the Android folks are accompanying that 20-millisecond timeout with
> > > some restrictions on what the on-phone workloads are permitted to do.
> > > Maybe ask the Android guys what those restrictions are and loosen them
> > > slightly, again allowing you to ease into the latency-reduction work.
Since there is no longer a kernel default of 20 ms for Android since
commit 1045a06724f3 ("remove CONFIG_ANDROID"), it might be appropriate
to drop that from stallwarn.rst. Something like:
diff --git a/Documentation/RCU/stallwarn.rst b/Documentation/RCU/stallwarn.rst
index e38c587067fc..d86a8b47504f 100644
--- a/Documentation/RCU/stallwarn.rst
+++ b/Documentation/RCU/stallwarn.rst
@@ -168,10 +168,11 @@ CONFIG_RCU_EXP_CPU_STALL_TIMEOUT
Same as the CONFIG_RCU_CPU_STALL_TIMEOUT parameter but only for
the expedited grace period. This parameter defines the period
of time that RCU will wait from the beginning of an expedited
- grace period until it issues an RCU CPU stall warning. This time
- period is normally 20 milliseconds on Android devices. A zero
- value causes the CONFIG_RCU_CPU_STALL_TIMEOUT value to be used,
- after conversion to milliseconds.
+ grace period until it issues an RCU CPU stall warning.
+
+ A zero value causes the CONFIG_RCU_CPU_STALL_TIMEOUT value to be
+ used, after conversion to milliseconds, limited to a maximum of
+ 21 seconds.
This configuration parameter may be changed at runtime via the
/sys/module/rcupdate/parameters/rcu_exp_cpu_stall_timeout, however
diff --git a/kernel/rcu/Kconfig.debug b/kernel/rcu/Kconfig.debug
index 1b0c41d490f0..4477eeb8a54f 100644
--- a/kernel/rcu/Kconfig.debug
+++ b/kernel/rcu/Kconfig.debug
@@ -93,7 +93,7 @@ config RCU_EXP_CPU_STALL_TIMEOUT
If the RCU grace period persists, additional CPU stall warnings
are printed at more widely spaced intervals. A value of zero
says to use the RCU_CPU_STALL_TIMEOUT value converted from
- seconds to milliseconds.
+ seconds to milliseconds, limited to a maximum of 21 seconds.
> > > But please keep in mind that cond_resched() helps only in kernels
> > > built with CONFIG_PREEMPTION=n.
I think they also help with
CONFIG_PREEMPT_VOLUNTARY=y
which is used by Fedora and RHEL. For example, adding more of those in
the crypto tcrypt test module eliminated some RCU stalls when running
speed tests (that send lots of data through the algorithms).
> > I have bad news that 6.1-r6 is still affected with
> > squashfs_xz_uncompress bug, despite having both of your fixes
> > (as visible in above command's output -- double checked):
> >
> > [ 91.065659] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 3-.... } 6 jiffies s: 621 root: 0x8/.
>
> If you build with (say) CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=200, does
> this still happen?
BTW, you don't need to rebuild the kernel to change those parameters; they're
module parameters, so can be modified on the kernel command line (if needed
during boot) and sysfs (if only needed after boot).
For sysfs the syntax is:
#!/bin/bash
# set rcu timeouts to specified values
echo 60 > /sys/module/rcupdate/parameters/rcu_cpu_stall_timeout
echo 21000 > /sys/module/rcupdate/parameters/rcu_exp_cpu_stall_timeout
echo 600000 > /sys/module/rcupdate/parameters/rcu_task_stall_timeout
grep -Hs . /sys/module/rcupdate/parameters/rcu_*_timeout