Re: [PATCH v6 0/4] sched/fair: Improve scan efficiency of SIS

From: Abel Wu
Date: Wed Nov 23 2022 - 22:50:27 EST

Next message: Wang Honghui: "[PATCH 2/3]mailbox: Add to support Phytium FT2000/4 CPU"
Previous message: Macpaul Lin: "Re: [PATCH v4] arm64: dts: mt8195: Add Ethernet controller"
In reply to: K Prateek Nayak: "Re: [PATCH v6 0/4] sched/fair: Improve scan efficiency of SIS"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Hi Prateek, thanks again for your detailed test!

On 11/22/22 7:28 PM, K Prateek Nayak wrote:

Hello Abel,

Following are the results for hackbench with larger number of
groups, ycsb-mongodb, Spec-JBB, and unixbench. Apart for
a regression in unixbench spawn in NPS2 and NPS4 mode and
unixbench syscall in NPs2 mode, everything looks good.

...

-> unixbench-syscall

o NPS4

kernel: tip sis_core
Min unixbench-syscall-1 2971799.80 ( 0.00%) 2979335.60 ( -0.25%)
Min unixbench-syscall-512 7824196.90 ( 0.00%) 8155610.20 ( -4.24%)
Amean unixbench-syscall-1 2973045.43 ( 0.00%) 2982036.13 * -0.30%*
Amean unixbench-syscall-512 7826302.17 ( 0.00%) 8173026.57 * -4.43%* <-- Regression in syscall for larger worker count
CoeffVar unixbench-syscall-1 0.04 ( 0.00%) 0.09 (-139.63%)
CoeffVar unixbench-syscall-512 0.03 ( 0.00%) 0.20 (-701.13%)

-> unixbench-spawn

o NPS1

kernel: tip sis_core
Min unixbench-spawn-1 6536.50 ( 0.00%) 6000.30 ( -8.20%)
Min unixbench-spawn-512 72571.40 ( 0.00%) 70829.60 ( -2.40%)
Hmean unixbench-spawn-1 6811.16 ( 0.00%) 7016.11 ( 3.01%)
Hmean unixbench-spawn-512 72801.77 ( 0.00%) 71012.03 * -2.46%*
CoeffVar unixbench-spawn-1 3.69 ( 0.00%) 13.52 (-266.69%)
CoeffVar unixbench-spawn-512 0.27 ( 0.00%) 0.22 ( 18.25%)

o NPS2

kernel: tip sis_core
Min unixbench-spawn-1 7042.20 ( 0.00%) 7078.70 ( 0.52%)
Min unixbench-spawn-512 85571.60 ( 0.00%) 77362.60 ( -9.59%)
Hmean unixbench-spawn-1 7199.01 ( 0.00%) 7276.55 ( 1.08%)
Hmean unixbench-spawn-512 85717.77 ( 0.00%) 77923.73 * -9.09%* <-- Regression in spawn test for larger worker count
CoeffVar unixbench-spawn-1 3.50 ( 0.00%) 3.30 ( 5.70%)
CoeffVar unixbench-spawn-512 0.20 ( 0.00%) 0.82 (-304.88%)

o NPS4

kernel: tip sis_core
Min unixbench-spawn-1 7521.90 ( 0.00%) 8102.80 ( 7.72%)
Min unixbench-spawn-512 84245.70 ( 0.00%) 73074.50 ( -13.26%)
Hmean unixbench-spawn-1 7659.12 ( 0.00%) 8645.19 * 12.87%*
Hmean unixbench-spawn-512 84908.77 ( 0.00%) 73409.49 * -13.54%* <-- Regression in spawn test for larger worker count
CoeffVar unixbench-spawn-1 1.92 ( 0.00%) 5.78 (-200.56%)
CoeffVar unixbench-spawn-512 0.76 ( 0.00%) 0.41 ( 46.58%)

...

For unixbench regressions, I do not see anything obvious jump up
in perf traces captureed with IBS. top shows over 99% utilization
which would ideally mean there are not many updates to the mask.
I'll take some more look at the spawn test case and get back to you.

These regressions seems to be common in full parallel tests. I
guess it might be due to over updating the idle cpumask when LLC
is overloaded which is not necessary if SIS_UTIL enabled, but I
need to dig it further. Maybe the rq avg_idle or nr_idle_scan need
to be taken into consideration as well. Thanks for providing these
important information.

~~~~~~~~~~~~~
~ Hackbench ~
~~~~~~~~~~~~~

$ perf bench sched messaging -p -l 50000 -g <groups>

o NPS1

kernel: tip sis_core
32-groups: 6.20 (0.00 pct) 5.86 (5.48 pct)
64-groups: 16.55 (0.00 pct) 15.21 (8.09 pct)
128-groups: 42.57 (0.00 pct) 34.63 (18.65 pct)
256-groups: 71.69 (0.00 pct) 67.11 (6.38 pct)
512-groups: 108.48 (0.00 pct) 110.23 (-1.61 pct)

o NPS2

kernel: tip sis_core
32-groups: 6.56 (0.00 pct) 5.60 (14.63 pct)
64-groups: 15.74 (0.00 pct) 14.45 (8.19 pct)
128-groups: 39.93 (0.00 pct) 35.33 (11.52 pct)
256-groups: 74.49 (0.00 pct) 69.65 (6.49 pct)
512-groups: 112.22 (0.00 pct) 113.75 (-1.36 pct)

o NPS4:

kernel: tip sis_core
32-groups: 9.48 (0.00 pct) 5.64 (40.50 pct)
64-groups: 15.38 (0.00 pct) 14.13 (8.12 pct)
128-groups: 39.93 (0.00 pct) 34.47 (13.67 pct)
256-groups: 75.31 (0.00 pct) 67.98 (9.73 pct)
512-groups: 115.37 (0.00 pct) 111.15 (3.65 pct)

Note: Hackbench with 32-groups show run to run variation
on tip but is more stable with sis_core. Hackbench for
64-groups and beyond is stable on both kernels.

The result is consistent with mine except 512-groups which I
didn't test. The 512-groups test may have the same problem
aforementioned.

Thanks & Regards,
Abel

Next message: Wang Honghui: "[PATCH 2/3]mailbox: Add to support Phytium FT2000/4 CPU"
Previous message: Macpaul Lin: "Re: [PATCH v4] arm64: dts: mt8195: Add Ethernet controller"
In reply to: K Prateek Nayak: "Re: [PATCH v6 0/4] sched/fair: Improve scan efficiency of SIS"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]