Re: [PATCH v8 10/13] x86/resctrl: Add sysfs interface to write mbm_total_bytes_config
From: Reinette Chatre
Date: Tue Nov 22 2022 - 19:22:39 EST
Hi Babu,
On 11/4/2022 1:01 PM, Babu Moger wrote:
> The current event configuration for mbm_total_bytes can be changed by
> the user by writing to the file
> /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config.
>
> The event configuration settings are domain specific and will affect all
> the CPUs in the domain.
>
> Following are the types of events supported:
>
> ==== ===========================================================
> Bits Description
> ==== ===========================================================
> 6 Dirty Victims from the QOS domain to all types of memory
> 5 Reads to slow memory in the non-local NUMA domain
> 4 Reads to slow memory in the local NUMA domain
> 3 Non-temporal writes to non-local NUMA domain
> 2 Non-temporal writes to local NUMA domain
> 1 Reads to memory in the non-local NUMA domain
> 0 Reads to memory in the local NUMA domain
> ==== ===========================================================
>
> For example:
> To change the mbm_total_bytes to count only reads on domain 0, the bits
> 0, 1, 4 and 5 needs to be set, which is 110011b (in hex 0x33). Run the
> command.
> $echo 0=0x33 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
>
> To change the mbm_total_bytes to count all the slow memory reads on
> domain 1, the bits 4 and 5 needs to be set which is 110000b (in hex 0x30).
> Run the command.
> $echo 1=0x30 > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
>
> Signed-off-by: Babu Moger <babu.moger@xxxxxxx>
> ---
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 130 ++++++++++++++++++++++++++++++++
> 1 file changed, 129 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> index 18f9588a41cf..0cdccb69386e 100644
> --- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> +++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
> @@ -1505,6 +1505,133 @@ static int mbm_local_bytes_config_show(struct kernfs_open_file *of,
> return 0;
> }
>
> +static void mon_event_config_write(void *info)
> +{
> + struct mon_config_info *mon_info = info;
> + u32 index;
> +
> + index = mon_event_config_index_get(mon_info->evtid);
> + if (index >= MAX_CONFIG_EVENTS) {
> + pr_warn_once("Invalid event id %d\n", mon_info->evtid);
> + return;
> + }
> + wrmsr(MSR_IA32_EVT_CFG_BASE + index, mon_info->mon_config, 0);
> +}
> +
> +static int mbm_config_write(struct rdt_resource *r, struct rdt_domain *d,
> + u32 evtid, u32 val)
> +{
> + struct mon_config_info mon_info = {0};
> + int ret = 0;
> +
> + rdt_last_cmd_clear();
> +
Why is this extra clear() needed?
> + /* mon_config cannot be more than the supported set of events */
> + if (val > MAX_EVT_CONFIG_BITS) {
> + rdt_last_cmd_puts("Invalid event configuration\n");
> + return -EINVAL;
> + }
> +
> + /*
> + * Read the current config value first. If both are same then
> + * we don't need to write it again.
Please no "we". Maybe just "If both are the same then no need to write it again."
> + */
> + mon_info.evtid = evtid;
> + mondata_config_read(d, &mon_info);
Here I see motivation for doing validity check in mondata_config_read() as
mentioned in feedback for patch #8. If hardware decides to use the other bits
in that MSR then the check below would have trouble.
> + if (mon_info.mon_config == val)
> + goto write_exit;
> +
Could you please follow the custom in this area? Please see goto usage in the rest
of the file that you are changing. The label should reflect the action being
jumped to. In that sense, "write_exit" is not clear. A simple "goto out"
would be clear and matches usage in rest of file.
> + mon_info.mon_config = val;
> +
> + /*
> + * Update MSR_IA32_EVT_CFG_BASE MSRs on all the CPUs in the
> + * domain. The MSRs offset from MSR MSR_IA32_EVT_CFG_BASE
> + * are scoped at the domain level. Writing any of these MSRs
> + * on one CPU is supposed to be observed by all CPUs in the
> + * domain. However, the hardware team recommends to update
> + * these MSRs on all the CPUs in the domain.
> + */
> + on_each_cpu_mask(&d->cpu_mask, mon_event_config_write, &mon_info, 1);
> +
> + /*
> + * When an Event Configuration is changed, the bandwidth counters
> + * for all RMIDs and Events will be cleared by the hardware. The
> + * hardware also sets MSR_IA32_QM_CTR.Unavailable (bit 62) for
> + * every RMID on the next read to any event for every RMID.
> + * Subsequent reads will have MSR_IA32_QM_CTR.Unavailable (bit 62)
> + * cleared while it is tracked by the hardware. Clear the
> + * mbm_local and mbm_total counts for all the RMIDs.
> + */
> + memset(d->mbm_local, 0, sizeof(struct mbm_state) * r->num_rmid);
> + memset(d->mbm_total, 0, sizeof(struct mbm_state) * r->num_rmid);
> +
> +write_exit:
> + return ret;
> +}
> +
> +static int mon_config_parse(struct rdt_resource *r, char *tok, u32 evtid)
> +{
> + char *dom_str = NULL, *id_str;
> + unsigned long dom_id, val;
> + struct rdt_domain *d;
> + int ret = 0;
> +
> +next:
> + if (!tok || tok[0] == '\0')
> + return 0;
> +
> + /* Start processing the strings for each domain */
> + dom_str = strim(strsep(&tok, ";"));
> + id_str = strsep(&dom_str, "=");
> +
> + if (!dom_str || kstrtoul(id_str, 10, &dom_id)) {
> + rdt_last_cmd_puts("Missing '=' or non-numeric domain id\n");
> + return -EINVAL;
> + }
> +
> + if (!dom_str || kstrtoul(dom_str, 16, &val)) {
> + rdt_last_cmd_puts("Missing '=' or non-numeric event configuration value\n");
> + return -EINVAL;
> + }
There is some duplication above ... both if () statememts
check for "!dom_str" - is one intended to be "!id_str"?
Could both checks really mean that a "=" may be missing?
> +
> + list_for_each_entry(d, &r->domains, list) {
> + if (d->id == dom_id) {
> + ret = mbm_config_write(r, d, evtid, val);
> + if (ret)
> + return -EINVAL;
> + goto next;
> + }
> + }
> +
> + return -EINVAL;
> +}
> +
> +static ssize_t mbm_total_bytes_config_write(struct kernfs_open_file *of,
> + char *buf, size_t nbytes,
> + loff_t off)
> +{
> + struct rdt_resource *r = of->kn->parent->priv;
> + int ret;
> +
> + /* Valid input requires a trailing newline */
> + if (nbytes == 0 || buf[nbytes - 1] != '\n')
> + return -EINVAL;
> +
> + cpus_read_lock();
> + mutex_lock(&rdtgroup_mutex);
> +
> + rdt_last_cmd_clear();
> +
> + buf[nbytes - 1] = '\0';
> +
> + ret = mon_config_parse(r, buf, QOS_L3_MBM_TOTAL_EVENT_ID);
> +
The naming here does not reflect what is done ... much more than
parsing is done here.
How about renaming mon_config_parse() to mon_config_write(), and
renaming mon_config_write() to mon_config_write_domain() ?
Reinette