Re: [PATCH] arch/x86/events/amd/core.c: Return -ENODEV when CPU does not have PERFCTL_CORE bit
From: Sandipan Das
Date: Wed Nov 23 2022 - 00:42:10 EST
On 11/14/2022 4:22 PM, Sandipan Das wrote:
> On 11/13/2022 4:03 AM, Liang Yan wrote:
>>
>> On 10/31/22 10:28, Sandipan Das wrote:
>>> Hi Liang, Peter,
>>>
>>> On 10/31/2022 6:29 PM, Peter Zijlstra wrote:
>>>> On Thu, Oct 27, 2022 at 09:35:11AM -0400, Liang Yan wrote:
>>>>> After disabling cpu.perfctr_core in qemu, I noticed that the guest kernel
>>>>> still loads the PMU driver even though CPUID does not report perfctr_core.
>>>>>
>>>>> The test is running on an EPYC Rome machine.
>>>>> root@ubuntu-s-4vcpu-8gb-amd-nyc1-01:~# lscpu | grep perfctl
>>>>> root@ubuntu-s-4vcpu-8gb-amd-nyc1-01:~#
>>>>> root@ubuntu-s-4vcpu-8gb-amd-nyc1-01:~# dmesg | grep PMU
>>>>> [ 0.732097] Performance Events: AMD PMU driver.
>>>>>
>>>>> Looking further:
>>>>>
>>>>> ==> init_hw_perf_events
>>>>> ==> amd_pmu_init
>>>>> ==> amd_core_pmu_init
>>>>> ==>
>>>>> if (!boot_cpu_has(X86_FEATURE_PERFCTR_CORE))
>>>>> return 0;
>>>>>
>>>>> Since it returns 0, amd_pmu_init() carries on and also returns 0 to
>>>>> init_hw_perf_events(), which continues the initialization.
>>>>>
>>>>> I am not a perf expert and am not sure whether this is expected for the
>>>>> AMD PMU; if it is not, it would be nicer to return -ENODEV instead.
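To make the return-value plumbing concrete, here is a simplified user-space model of the call chain above. It is a sketch, not kernel code: the functions only borrow the kernel's names, has_perfctr_core stands in for boot_cpu_has(X86_FEATURE_PERFCTR_CORE), and the hypothetical patched flag selects the proposed -ENODEV return.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Simplified model; 'patched' selects the proposed -ENODEV behaviour. */
int amd_core_pmu_init(bool has_perfctr_core, bool patched)
{
	if (!has_perfctr_core)
		return patched ? -ENODEV : 0; /* 0 keeps the four legacy counters */
	/* ... real code would set up the six PERF_CTL/PERF_CTR pairs ... */
	return 0;
}

int amd_pmu_init(bool has_perfctr_core, bool patched)
{
	int ret = amd_core_pmu_init(has_perfctr_core, patched);

	if (ret)
		return ret; /* errors propagate to the caller */
	/* ... further AMD-specific setup would follow here ... */
	return 0;
}

/* Returns true if a hardware PMU driver would be registered. */
bool init_hw_perf_events(bool has_perfctr_core, bool patched)
{
	if (amd_pmu_init(has_perfctr_core, patched) != 0) {
		puts("Performance Events: no PMU driver, software events only.");
		return false;
	}
	puts("Performance Events: AMD PMU driver.");
	return true;
}
```

With has_perfctr_core set to false, the unpatched path still reports the AMD PMU driver, while the patched path falls back to software events, which matches the two dmesg lines quoted in this mail.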
>>>>>
>>>>> New output after the change:
>>>>> root@ubuntu-s-4vcpu-8gb-amd-nyc1-01:~# dmesg | grep PMU
>>>>> [ 0.531609] Performance Events: no PMU driver, software events only.
>>>>>
>>>>> Signed-off-by: Liang Yan <lyan@xxxxxxxxxxxxxxx>
>>>> Looks about right, Ravi?
>>>>
>>>>> ---
>>>>> arch/x86/events/amd/core.c | 2 +-
>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
>>>>> index 8b70237c33f7..34d3d2944020 100644
>>>>> --- a/arch/x86/events/amd/core.c
>>>>> +++ b/arch/x86/events/amd/core.c
>>>>> @@ -1335,7 +1335,7 @@ static int __init amd_core_pmu_init(void)
>>>>> int i;
>>>>> if (!boot_cpu_has(X86_FEATURE_PERFCTR_CORE))
>>>>> - return 0;
>>>>> + return -ENODEV;
>>>>>
>>> There are four legacy counters that are always available even when PERFCTR_CORE
>>> is absent. This is why the code returns 0 here. I found this a bit confusing
>>> as well during PerfMonV2 development, so I wrote the following patch but forgot to
>>> send it out.
>>
>>
>> Hi Sandipan,
>>
>> Thanks for the clarification.
>> From a VM perspective, are these legacy counters covered by the AMD PMU property? That is, if I want to disable the PMU for an AMD vCPU for some reason, is it possible to disable both perfctr_core and the four counters, or is that not logical since the four counters cannot be disabled at the bare-metal level either?
>> I ask because I saw that 'pmu' can be disabled for Intel and ARM, but apparently not for AMD.
>>
>
> From what I see, the four legacy counters are not tied to any processor
> properties (e.g. CPUID bits). Disabling "perfctr-core" only brings the
> number of supported core counters down from 6 to 4. So guests exhibit
> the same behaviour as bare metal, where the legacy counters are used if
> CPUID 0x80000001[ECX].PerfCtrExtCore is not set.
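As a sketch of the CPUID dependency described above, assuming GCC or Clang's <cpuid.h> on an x86 build: PerfCtrExtCore is bit 23 of ECX in leaf 0x80000001, and the hypothetical helpers below map it to 6 or 4 core counters. They deliberately ignore PerfMonV2 processors, where the counter count can instead be reported through a separate CPUID leaf.

```c
#include <stdbool.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

#define PERFCTR_EXT_CORE_BIT 23 /* CPUID 0x80000001 ECX[23]: PerfCtrExtCore */

/* Hypothetical helper: 6 core counters with PerfCtrExtCore, else 4 legacy. */
unsigned int core_counter_count(uint32_t ecx)
{
	return ((ecx >> PERFCTR_EXT_CORE_BIT) & 1u) ? 6u : 4u;
}

/* Returns false if the leaf cannot be queried (e.g. a non-x86 build). */
bool query_core_counter_count(unsigned int *count)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	/* __get_cpuid() returns 0 if the extended leaf is not supported. */
	if (!__get_cpuid(0x80000001, &eax, &ebx, &ecx, &edx))
		return false;
	*count = core_counter_count(ecx);
	return true;
#else
	(void)count;
	return false;
#endif
}
```

In a guest started with "perfctr-core" disabled, the bit is clear, so this query would report 4, yet the legacy counters themselves remain reachable.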
>
> The "pmu" property only overrides guest CPUID. Hence, it cannot be used
> to prevent the discovery of the legacy counters.
>
Following up on this:

KVM has an "enable_pmu" module parameter which, when disabled, turns off guest
PMC access completely.

Here's how it works:
With enable_pmu=0, the intercepted guest accesses to the PMC MSRs fail. The SVM
code also takes care of clearing the PerfCtrExtCore bit from the guest CPUID
(see svm_set_cpu_caps() in arch/x86/kvm/svm/svm.c).

During PMU initialization, check_hw_exists() in arch/x86/events/core.c tests
whether all the required PMC MSRs are accessible by reading them. For a guest,
this fails due to an exception, which stops hardware PMU initialization. At
this point, the guest kernel continues with just software events.

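The probing step can be modeled roughly as follows. This is a simplified sketch, not check_hw_exists() itself: read_msr_fn is a hypothetical accessor that returns nonzero when a read faults (as rdmsrl_safe() does for an inaccessible MSR), only the pass over the legacy event-select MSRs is modeled, and the real function performs additional checks.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_K7_EVNTSEL0 0xc0010000u /* first legacy event-select MSR */

/* Hypothetical accessor: 0 on success, nonzero if the read faults. */
typedef int (*read_msr_fn)(uint32_t msr, uint64_t *val);

/* Read every event-select MSR and give up on the first fault. */
bool pmu_msrs_accessible(read_msr_fn read_msr, int num_counters)
{
	for (int i = 0; i < num_counters; i++) {
		uint64_t val;

		if (read_msr(MSR_K7_EVNTSEL0 + (uint32_t)i, &val))
			return false; /* fault: keep only software events */
	}
	return true;
}

/* Fake accessors for illustration: one always succeeds, one always faults. */
int read_msr_ok(uint32_t msr, uint64_t *val)
{
	(void)msr;
	*val = 0;
	return 0;
}

int read_msr_faults(uint32_t msr, uint64_t *val)
{
	(void)msr;
	(void)val;
	return -1;
}
```

A guest running with enable_pmu=0 behaves like read_msr_faults here: the very first read fails, so hardware PMU setup is abandoned.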
>> Also, could you please list the four legacy counters here?
>>
>
> The MSRs for the four legacy counters are:
> 0xc001000[0..3] known as PERF_LEGACY_CTL[0..3], alias of PERF_CTL[0..3]
> 0xc001000[4..7] known as PERF_LEGACY_CTR[0..3], alias of PERF_CTR[0..3]
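For reference, the MSR numbering above can be expressed as two small helpers. This sketch assumes the legacy layout (consecutive CTL MSRs from 0xc0010000, CTR MSRs from 0xc0010004) and the PERFCTR_CORE layout (interleaved CTL/CTR pairs starting at 0xc0010200, which the kernel's msr-index.h names MSR_F15H_PERF_CTL); the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_K7_EVNTSEL0   0xc0010000u /* PERF_LEGACY_CTL0 */
#define MSR_K7_PERFCTR0   0xc0010004u /* PERF_LEGACY_CTR0 */
#define MSR_F15H_PERF_CTL 0xc0010200u /* PERF_CTL0 */
#define MSR_F15H_PERF_CTR 0xc0010201u /* PERF_CTR0 */

/* Event-select (control) MSR for counter idx. */
uint32_t perf_ctl_msr(int idx, bool perfctr_core)
{
	return perfctr_core ? MSR_F15H_PERF_CTL + 2u * (uint32_t)idx
			    : MSR_K7_EVNTSEL0 + (uint32_t)idx;
}

/* Counter MSR for counter idx. */
uint32_t perf_ctr_msr(int idx, bool perfctr_core)
{
	return perfctr_core ? MSR_F15H_PERF_CTR + 2u * (uint32_t)idx
			    : MSR_K7_PERFCTR0 + (uint32_t)idx;
}
```

On Family 17h and later, for counters 0 to 3 the legacy address and the extended address reach the same underlying register, which is what "alias" means in the list above.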
>
> You can find more details in the Processor Programming Reference (PPR) that
> is appropriate for the AMD processor that you are using. PPRs can be found
> at: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>
>>
>>
>>> diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
>>> index 262e39a85031..d3eb7b2f4dda 100644
>>> --- a/arch/x86/events/amd/core.c
>>> +++ b/arch/x86/events/amd/core.c
>>> @@ -1345,6 +1345,14 @@ static int __init amd_core_pmu_init(void)
>>> u64 even_ctr_mask = 0ULL;
>>> int i;
>>> + /*
>>> + * All processors support four PMCs even when X86_FEATURE_PERFCTR_CORE
>>> + * is unavailable. They are programmable via the PERF_LEGACY_CTLx and
>>> + * PERF_LEGACY_CTRx registers, which share their addresses with
>>> + * MSR_K7_EVNTSELx and MSR_K7_PERFCTRx. For Family 17h+, these are
>>> + * legacy aliases of PERF_CTLx and PERF_CTRx respectively. Hence, do not
>>> + * return -ENODEV here.
>>> + */
>>> if (!boot_cpu_has(X86_FEATURE_PERFCTR_CORE))
>>> return 0;
>>>
>>>
>>> If this looks good to you, I will post it.
>>>
>>> - Sandipan
>>>
>