Re: [PATCH v3 2/2] x86/resctrl: update task closid/rmid with task_call_func()

From: Reinette Chatre
Date: Mon Nov 21 2022 - 16:59:29 EST


Hi Peter,

The patch description in the subject line should start with an upper-case letter.

On 11/15/2022 6:19 AM, Peter Newman wrote:
> When determining whether running tasks need to be interrupted due to a
> closid/rmid change, it was possible for the task in question to migrate
> or wake up concurrently without observing the updated values.

Mixing tenses can quickly become confusing. Please stick to imperative tone.

Also, please start with the context of this work before jumping to
the problem description.

For example (not a requirement to use - feel free to change):
"A task is moved to a resource group when its task id is written to the
destination resource group's "tasks" file. Moving a task to a new
resource group involves updating the task's closid and rmid (found
in its task_struct) and updating the PQR_ASSOC MSR if the task
is current on a CPU.

It is possible for the task to migrate or wake up while it is moved
to a new resource group. In this scenario the task starts running
with the old closid and rmid values, but it may not be considered
as running and thus continues running with the old values until it is
rescheduled."

>
> In particular, __rdtgroup_move_task() assumed that a CPU migrating
> implied that it observed the updated closid/rmid. This assumption is
> broken by the following reorderings, both of which are allowed on x86:
>
> 1. In __rdtgroup_move_task(), stores updating the closid and rmid in
> the task structure could reorder with the loads in task_curr() and
> task_cpu().
> 2. In resctrl_sched_in(), the earlier stores to the fields read by
> task_curr() could be delayed until after the loads from
> t->{closid,rmid}.
>
> Preventing this reordering with barriers would have required an smp_mb()
> in all context switches whenever resctrlfs is mounted. Instead, when
> moving a single task, use task_call_func() to serialize updates to the
> closid and rmid fields in the task_struct with context switch.

Please adjust the above to imperative tone.

>
> Signed-off-by: Peter Newman <peternewman@xxxxxxxxxx>
> Reviewed-by: James Morse <james.morse@xxxxxxx>
> ---
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 76 +++++++++++++++-----------
> 1 file changed, 45 insertions(+), 31 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 049971efea2f..511b7cea143f 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -538,10 +538,47 @@ static void _update_task_closid_rmid(void *task)
>  		resctrl_sched_in();
>  }
>
> -static void update_task_closid_rmid(struct task_struct *t)
> +static int update_locked_task_closid_rmid(struct task_struct *t, void *arg)
>  {
> -	if (IS_ENABLED(CONFIG_SMP) && task_curr(t))
> -		smp_call_function_single(task_cpu(t), _update_task_closid_rmid, t, 1);
> +	struct rdtgroup *rdtgrp = arg;
> +
> +	/*
> +	 * We assume task_call_func() has provided the necessary serialization
> +	 * with resctrl_sched_in().

Please do not use "we".

Also, either task_call_func() provides serialization or it does not. Wording
it as "assume" creates uncertainty about this change.

> +	 */
> +	if (rdtgrp->type == RDTCTRL_GROUP) {
> +		t->closid = rdtgrp->closid;
> +		t->rmid = rdtgrp->mon.rmid;
> +	} else if (rdtgrp->type == RDTMON_GROUP) {
> +		t->rmid = rdtgrp->mon.rmid;
> +	}

Regarding the READ_ONCE() in __resctrl_sched_in(): memory-barriers.txt states
that "When dealing with CPU-CPU interactions, certain types of memory barrier
should always be paired. A lack of appropriate pairing is almost certainly
an error."
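
To illustrate the pairing concern, here is a minimal userspace sketch, not
kernel code. It assumes the intent is to keep marked stores on the writer
side so that they match the READ_ONCE() on the reader side. The demo_* names
are hypothetical, and the two macros are simplified stand-ins for the kernel
ones (they rely on the GCC/Clang __typeof__ extension).

```c
#include <assert.h>

/* Simplified userspace stand-ins for the kernel's WRITE_ONCE()/READ_ONCE(). */
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))
#define READ_ONCE(x)       (*(volatile __typeof__(x) *)&(x))

/* Hypothetical, trimmed-down stand-ins for task_struct and rdtgroup. */
struct demo_task {
	unsigned int closid;
	unsigned int rmid;
};

enum demo_type { DEMO_CTRL_GROUP, DEMO_MON_GROUP };

struct demo_group {
	enum demo_type type;
	unsigned int closid;
	unsigned int rmid;
};

/* Writer side: every store to a field read locklessly is a marked store. */
static void demo_update_task(struct demo_task *t, const struct demo_group *g)
{
	if (g->type == DEMO_CTRL_GROUP) {
		WRITE_ONCE(t->closid, g->closid);
		WRITE_ONCE(t->rmid, g->rmid);
	} else if (g->type == DEMO_MON_GROUP) {
		WRITE_ONCE(t->rmid, g->rmid);
	}
}

/* Reader side: marked loads that pair with the marked stores above. */
static unsigned int demo_sched_in_closid(struct demo_task *t)
{
	return READ_ONCE(t->closid);
}

static unsigned int demo_sched_in_rmid(struct demo_task *t)
{
	return READ_ONCE(t->rmid);
}
```

The sketch only shows the marking on both sides. It says nothing about
ordering, which in this patch is expected to come from task_call_func().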

> +
> +	/*
> +	 * If the task is current on a CPU, the PQR_ASSOC MSR needs to be
> +	 * updated to make the resource group go into effect. If the task is not
> +	 * current, the MSR will be updated when the task is scheduled in.
> +	 */
> +	return task_curr(t);
> +}
> +
> +static void update_task_closid_rmid(struct task_struct *t,
> +				    struct rdtgroup *rdtgrp)
> +{
> +	/*
> +	 * Serialize the closid and rmid update with context switch. If this
> +	 * function indicates that the task was running, then it needs to be

What does "this function" refer to? Please replace it with the function name
to be specific, since there are a few functions below.

/was running/is running/?

> +	 * interrupted to install the new closid and rmid.
> +	 */
> +	if (task_call_func(t, update_locked_task_closid_rmid, rdtgrp) &&
> +	    IS_ENABLED(CONFIG_SMP))
> +		/*
> +		 * If the task has migrated away from the CPU indicated by
> +		 * task_cpu() below, then it has already switched in on the
> +		 * new CPU using the updated closid and rmid and the call below
> +		 * unnecessary, but harmless.

is unnecessary?

> +		 */
> +		smp_call_function_single(task_cpu(t),
> +					 _update_task_closid_rmid, t, 1);
>  	else
>  		_update_task_closid_rmid(t);
>  }

Could you please keep update_task_closid_rmid() and
_update_task_closid_rmid() together?

> @@ -557,39 +594,16 @@ static int __rdtgroup_move_task(struct task_struct *tsk,
>  		return 0;
>
>  	/*
> -	 * Set the task's closid/rmid before the PQR_ASSOC MSR can be
> -	 * updated by them.
> -	 *
> -	 * For ctrl_mon groups, move both closid and rmid.
>  	 * For monitor groups, can move the tasks only from
>  	 * their parent CTRL group.
>  	 */
> -
> -	if (rdtgrp->type == RDTCTRL_GROUP) {
> -		WRITE_ONCE(tsk->closid, rdtgrp->closid);
> -		WRITE_ONCE(tsk->rmid, rdtgrp->mon.rmid);
> -	} else if (rdtgrp->type == RDTMON_GROUP) {
> -		if (rdtgrp->mon.parent->closid == tsk->closid) {
> -			WRITE_ONCE(tsk->rmid, rdtgrp->mon.rmid);
> -		} else {
> -			rdt_last_cmd_puts("Can't move task to different control group\n");
> -			return -EINVAL;
> -		}
> +	if (rdtgrp->type == RDTMON_GROUP &&
> +	    rdtgrp->mon.parent->closid != tsk->closid) {
> +		rdt_last_cmd_puts("Can't move task to different control group\n");
> +		return -EINVAL;
>  	}
>
> -	/*
> -	 * Ensure the task's closid and rmid are written before determining if
> -	 * the task is current that will decide if it will be interrupted.
> -	 */
> -	barrier();
> -
> -	/*
> -	 * By now, the task's closid and rmid are set. If the task is current
> -	 * on a CPU, the PQR_ASSOC MSR needs to be updated to make the resource
> -	 * group go into effect. If the task is not current, the MSR will be
> -	 * updated when the task is scheduled in.
> -	 */
> -	update_task_closid_rmid(tsk);
> +	update_task_closid_rmid(tsk, rdtgrp);
>
>  	return 0;
>  }

Reinette