Re: [PATCH v5 3/3] softirq: defer softirq processing to ksoftirqd if CPU is busy with RT

From: John Stultz
Date: Fri Nov 18 2022 - 03:18:52 EST


On Wed, Nov 16, 2022 at 2:37 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Wed, Nov 16, 2022 at 07:59:28AM +0000, John Stultz wrote:
> > From: Pavankumar Kondeti <pkondeti@xxxxxxxxxxxxxx>
> >
> > Defer the softirq processing to ksoftirqd if a RT task is
> > running or queued on the current CPU. This complements the RT
> > task placement algorithm which tries to find a CPU that is not
> > currently busy with softirqs.
> >
> > Currently only the NET_TX, NET_RX, BLOCK and IRQ_POLL softirqs are
> > deferred, as they can potentially run for a long time.
> >
> > Additionally, this patch stubs out ksoftirqd_running() logic,
> > in the CONFIG_RT_SOFTIRQ_AWARE_SCHED case, as deferring
> > potentially long-running softirqs will cause the logic to not
> > process shorter-running softirqs immediately. By stubbing it out
> > the potentially long running softirqs are deferred, but the
> > shorter running ones can still run immediately.
>
> So I'm hating on the new config space, and dubious of the placement
> logic (I'm thinking much the same gain can be had when actual softirq
> processing stops quickly when we want to reschedule).

Hey Peter!
Thanks for taking the time to look this over and provide feedback!

> However I still have these here patches (revived from the dead):
>
> https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/log/?h=core/softirq
>
> That fix some of this same... I think the last time this fell on its
> face due to regressions and me not having time/energy to chase them down
> and the author of the hack-of-the-day solution walking off or something
> like that.

So I've taken a look at your patch series and backported it (as well
as extending it to use softirq_needs_break() for the BLOCK softirq) to
a 5.10 device kernel, where I can compare its behavior against the
tuned code that's shipping.
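
For reference, the BLOCK-side extension looks roughly like the
following (a simplified sketch against 5.10's blk_done_softirq(); I'm
assuming softirq_needs_break() is a bare predicate here, though your
branch's helper may well take a start timestamp instead, and the
splice-back/re-raise handling is only illustrative):

static __latent_entropy void blk_done_softirq(struct softirq_action *h)
{
	struct list_head *cpu_list, local_list;

	local_irq_disable();
	cpu_list = this_cpu_ptr(&blk_cpu_done);
	list_replace_init(cpu_list, &local_list);
	local_irq_enable();

	while (!list_empty(&local_list)) {
		struct request *rq;

		rq = list_entry(local_list.next, struct request, ipi_list);
		list_del_init(&rq->ipi_list);
		rq->q->mq_ops->complete(rq);

		/* Bail out once we've run long enough, pushing the
		 * remaining completions back and re-raising the vector
		 * so they get picked up on the next round. */
		if (!list_empty(&local_list) && softirq_needs_break()) {
			local_irq_disable();
			list_splice(&local_list, cpu_list);
			raise_softirq_irqoff(BLOCK_SOFTIRQ);
			local_irq_enable();
			break;
		}
	}
}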

At first glance at your patches, I was optimistic, as it's great that
it nicely time-bounds the number of softirqs run, but I fret that it
doesn't help if a single softirq takes longer than we'd like.
(And indeed, I can see single BLOCK softirq invocations taking quite
some time - over 16ms in my last run!)

Obviously "fix that driver to not do that" would be the standard
response, but it does become a constant wack-a-mole game. Sort of like
some of the philosophy around some of the PREEMPT_RT changes, where a
broad solution is used to avoid having to fix and maintain latencies
in code across the kernel, this task placement solution helps avoid rt
latencies being dependent on the continual perfection of all drivers
used.

> I think this aspect of the whole softirq thing really needs fixing first
> and not hidden under some obscure CONFIG symbol.

Sure, and I'm happy to help with your current patch series, as I do
think it should improve things, but I'm not sure it will let us move
away from the softirq-rt placement optimization patches.
Any ideas for additional approaches that would be more agreeable to you?

We could pull most of the logic out from behind the CONFIG option,
and maybe let userland set the mask of long-running softirqs whose
CPUs rt tasks would avoid being placed on - something like the rough
sketch below. Though I know folks would probably like to avoid the
cpupri_find_fitness() test logic if they could, so I'll have to think
about how to efficiently shortcut that when the mask is empty.
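
Very roughly, and purely as a strawman (the names
sysctl_long_softirq_mask and rt_task_fits_cpu() below are made up for
illustration, active_softirqs is the per-cpu tracking from the earlier
patches in this series, and the find_lowest_rq() hunk is abbreviated):

/* Userland-writable mask of "long running" softirq vectors, e.g.
 * (1 << NET_RX_SOFTIRQ) | (1 << NET_TX_SOFTIRQ) |
 * (1 << BLOCK_SOFTIRQ)  | (1 << IRQ_POLL_SOFTIRQ)
 */
unsigned int sysctl_long_softirq_mask;

static bool rt_task_fits_cpu(struct task_struct *p, int cpu)
{
	/* Reject CPUs currently busy with a "long" softirq vector. */
	if (per_cpu(active_softirqs, cpu) &
	    READ_ONCE(sysctl_long_softirq_mask))
		return false;

	return rt_task_fits_capacity(p, cpu);
}

/* ...and in find_lowest_rq(), only take the pricier fitness walk when
 * the mask has actually been populated (or we need it anyway for
 * asymmetric capacities), so the default case keeps the existing fast
 * path:
 */
	if (READ_ONCE(sysctl_long_softirq_mask) ||
	    static_branch_unlikely(&sched_asym_cpucapacity))
		ret = cpupri_find_fitness(&task_rq(task)->rd->cpupri,
					  task, lowest_mask,
					  rt_task_fits_cpu);
	else
		ret = cpupri_find(&task_rq(task)->rd->cpupri,
				  task, lowest_mask);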

Anyway, thanks so much again for the feedback!
-john