Re: [PATCH v3 1/2] x86/resctrl: IPI all CPUs for group updates

From: Reinette Chatre
Date: Mon Nov 21 2022 - 16:53:38 EST


Hi Peter,

On 11/15/2022 6:19 AM, Peter Newman wrote:
> To rule out needing to update a CPU when deleting an rdtgroup, we must

Please do not impersonate code in changelog and comments (do not
use "we"). This is required for resctrl changes to be considered for
inclusion because resctrl patches are routed via the "tip" repo
and is thus required to follow the "tip tree handbook"
(Documentation/process/maintainer-tip.rst). Please also
stick to a clear "context-problem-solution" changelog as is the custom
in this area.

> search the entire tasklist for group members which could be running on
> that CPU. This needs to be done while blocking updates to the tasklist
> to avoid leaving newly-created child tasks assigned to the old
> CLOSID/RMID.

This is not clear to me. rdt_move_group_tasks() obtains a read lock,
read_lock(&tasklist_lock), so concurrent modifications to the tasklist
are indeed possible. Should this perhaps be write_lock() instead?
It sounds like the scenario you are describing may be a concern. That is,
if a task belonging to a group that is being removed happens to
call fork()/clone() during the move then the child may end up being
created with old closid.

>
> The cost of reliably propagating a CLOSID or RMID update to a single
> task is higher than originally thought. The present understanding is
> that we must obtain the task_rq_lock() on each task to ensure that it
> observes CLOSID/RMID updates in the case that it migrates away from its
> current CPU before the update IPI reaches it.

I find this confusing since it describes why a potential solution does
not solve a problem, neither problem nor solution is well described at this
point.

What if you switch the order of the two patches? Patch #2 provides
the potential solution mentioned here so that may be helpful to have as
reference in this changelog.

> For now, just notify all the CPUs after updating the closid/rmid fields

For now? If you anticipate changes then there should be a plan for that,
otherwise this is the fix without further speculation.

> in impacted tasks task_structs rather than paying the cost of obtaining
> a more precise cpu mask.

s/cpu/CPU/
It may be helpful to add that an accurate CPU mask cannot be guaranteed and
the more tasks moved the less accurate it could be (if I understand correctly).

>
> Signed-off-by: Peter Newman <peternewman@xxxxxxxxxx>
> Reviewed-by: James Morse <james.morse@xxxxxxx>
> ---
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 52 +++++++-------------------
> 1 file changed, 13 insertions(+), 39 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index e5a48f05e787..049971efea2f 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -2385,12 +2385,10 @@ static int reset_all_ctrls(struct rdt_resource *r)
> * Move tasks from one to the other group. If @from is NULL, then all tasks
> * in the systems are moved unconditionally (used for teardown).
> *
> - * If @mask is not NULL the cpus on which moved tasks are running are set
> - * in that mask so the update smp function call is restricted to affected
> - * cpus.
> + * Following this operation, the caller is required to update the MSRs on all
> + * CPUs.
> */

On x86 only one MSR needs updating, the PQR_ASSOC MSR. The above could be
summarized as:
"Caller should update per CPU storage and PQR_ASSOC."

> -static void rdt_move_group_tasks(struct rdtgroup *from, struct rdtgroup *to,
> - struct cpumask *mask)
> +static void rdt_move_group_tasks(struct rdtgroup *from, struct rdtgroup *to)
> {
> struct task_struct *p, *t;
>
> @@ -2400,16 +2398,6 @@ static void rdt_move_group_tasks(struct rdtgroup *from, struct rdtgroup *to,
> is_rmid_match(t, from)) {
> WRITE_ONCE(t->closid, to->closid);
> WRITE_ONCE(t->rmid, to->mon.rmid);
> -
> - /*
> - * If the task is on a CPU, set the CPU in the mask.
> - * The detection is inaccurate as tasks might move or
> - * schedule before the smp function call takes place.
> - * In such a case the function call is pointless, but
> - * there is no other side effect.
> - */
> - if (IS_ENABLED(CONFIG_SMP) && mask && task_curr(t))
> - cpumask_set_cpu(task_cpu(t), mask);
> }
> }
> read_unlock(&tasklist_lock);
> @@ -2440,7 +2428,7 @@ static void rmdir_all_sub(void)
> struct rdtgroup *rdtgrp, *tmp;
>
> /* Move all tasks to the default resource group */
> - rdt_move_group_tasks(NULL, &rdtgroup_default, NULL);
> + rdt_move_group_tasks(NULL, &rdtgroup_default);
>
> list_for_each_entry_safe(rdtgrp, tmp, &rdt_all_groups, rdtgroup_list) {
> /* Free any child rmids */
> @@ -3099,23 +3087,19 @@ static int rdtgroup_mkdir(struct kernfs_node *parent_kn, const char *name,
> return -EPERM;
> }
>
> -static int rdtgroup_rmdir_mon(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
> +static int rdtgroup_rmdir_mon(struct rdtgroup *rdtgrp)
> {
> struct rdtgroup *prdtgrp = rdtgrp->mon.parent;
> int cpu;
>
> /* Give any tasks back to the parent group */
> - rdt_move_group_tasks(rdtgrp, prdtgrp, tmpmask);
> + rdt_move_group_tasks(rdtgrp, prdtgrp);
>
> /* Update per cpu rmid of the moved CPUs first */
> for_each_cpu(cpu, &rdtgrp->cpu_mask)
> per_cpu(pqr_state.default_rmid, cpu) = prdtgrp->mon.rmid;
> - /*
> - * Update the MSR on moved CPUs and CPUs which have moved
> - * task running on them.
> - */
> - cpumask_or(tmpmask, tmpmask, &rdtgrp->cpu_mask);
> - update_closid_rmid(tmpmask, NULL);
> +
> + update_closid_rmid(cpu_online_mask, NULL);
>
> rdtgrp->flags = RDT_DELETED;
> free_rmid(rdtgrp->mon.rmid);
> @@ -3140,12 +3124,12 @@ static int rdtgroup_ctrl_remove(struct rdtgroup *rdtgrp)
> return 0;
> }
>
> -static int rdtgroup_rmdir_ctrl(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
> +static int rdtgroup_rmdir_ctrl(struct rdtgroup *rdtgrp)
> {
> int cpu;
>
> /* Give any tasks back to the default group */
> - rdt_move_group_tasks(rdtgrp, &rdtgroup_default, tmpmask);
> + rdt_move_group_tasks(rdtgrp, &rdtgroup_default);
>
> /* Give any CPUs back to the default group */
> cpumask_or(&rdtgroup_default.cpu_mask,
> @@ -3157,12 +3141,7 @@ static int rdtgroup_rmdir_ctrl(struct rdtgroup *rdtgrp, cpumask_var_t tmpmask)
> per_cpu(pqr_state.default_rmid, cpu) = rdtgroup_default.mon.rmid;
> }
>
> - /*
> - * Update the MSR on moved CPUs and CPUs which have moved
> - * task running on them.
> - */
> - cpumask_or(tmpmask, tmpmask, &rdtgrp->cpu_mask);
> - update_closid_rmid(tmpmask, NULL);
> + update_closid_rmid(cpu_online_mask, NULL);
>
> closid_free(rdtgrp->closid);
> free_rmid(rdtgrp->mon.rmid);
> @@ -3181,12 +3160,8 @@ static int rdtgroup_rmdir(struct kernfs_node *kn)
> {
> struct kernfs_node *parent_kn = kn->parent;
> struct rdtgroup *rdtgrp;
> - cpumask_var_t tmpmask;
> int ret = 0;
>
> - if (!zalloc_cpumask_var(&tmpmask, GFP_KERNEL))
> - return -ENOMEM;
> -
> rdtgrp = rdtgroup_kn_lock_live(kn);
> if (!rdtgrp) {
> ret = -EPERM;
> @@ -3206,18 +3181,17 @@ static int rdtgroup_rmdir(struct kernfs_node *kn)
> rdtgrp->mode == RDT_MODE_PSEUDO_LOCKED) {
> ret = rdtgroup_ctrl_remove(rdtgrp);
> } else {
> - ret = rdtgroup_rmdir_ctrl(rdtgrp, tmpmask);
> + ret = rdtgroup_rmdir_ctrl(rdtgrp);
> }
> } else if (rdtgrp->type == RDTMON_GROUP &&
> is_mon_groups(parent_kn, kn->name)) {
> - ret = rdtgroup_rmdir_mon(rdtgrp, tmpmask);
> + ret = rdtgroup_rmdir_mon(rdtgrp);
> } else {
> ret = -EPERM;
> }
>
> out:
> rdtgroup_kn_unlock(kn);
> - free_cpumask_var(tmpmask);
> return ret;
> }
>

The fix looks good to me. I do think its motivation and description
needs to improve to make it palatable to folks not familiar with this area.

Reinette