Re: [PATCH 07/11] cxl/mem: Trace Memory Module Event Record

From: Jonathan Cameron
Date: Thu Nov 17 2022 - 06:22:44 EST


On Wed, 16 Nov 2022 17:23:58 -0800
Ira Weiny <ira.weiny@xxxxxxxxx> wrote:

> On Wed, Nov 16, 2022 at 03:35:28PM +0000, Jonathan Cameron wrote:
> > On Thu, 10 Nov 2022 10:57:54 -0800
> > ira.weiny@xxxxxxxxx wrote:
> >
> > > From: Ira Weiny <ira.weiny@xxxxxxxxx>
> > >
> > > CXL rev 3.0 section 8.2.9.2.1.3 defines the Memory Module Event Record.
> > >
> > > Determine if the event read is memory module record and if so trace the
> > > record.
> > >
> > > Signed-off-by: Ira Weiny <ira.weiny@xxxxxxxxx>
> > >
> > Noticed that we have a mixture of fully capitalized and not for flags.
> > With that either explained or tidied up:
> >
> > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx>
> >
> > > +/*
> > > + * Device Health Information - DHI
> > > + *
> > > + * CXL res 3.0 section 8.2.9.8.3.1; Table 8-100
> > > + */
> > > +#define CXL_DHI_HS_MAINTENANCE_NEEDED BIT(0)
> > > +#define CXL_DHI_HS_PERFORMANCE_DEGRADED BIT(1)
> > > +#define CXL_DHI_HS_HW_REPLACEMENT_NEEDED BIT(2)
> > > +#define show_health_status_flags(flags) __print_flags(flags, "|", \
> > > + { CXL_DHI_HS_MAINTENANCE_NEEDED, "Maintenance Needed" }, \
> > > + { CXL_DHI_HS_PERFORMANCE_DEGRADED, "Performance Degraded" }, \
> > > + { CXL_DHI_HS_HW_REPLACEMENT_NEEDED, "Replacement Needed" } \
> >
> > Why are we sometime using capitals for flags (e.g patch 5) and not other times?
>
> Not sure what you mean. Do you mean this from patch 5?
Nope

+#define CXL_DPA_VOLATILE BIT(0)
+#define CXL_DPA_NOT_REPAIRABLE BIT(1)
+#define show_dpa_flags(flags) __print_flags(flags, "|", \
+ { CXL_DPA_VOLATILE, "VOLATILE" }, \
+ { CXL_DPA_NOT_REPAIRABLE, "NOT_REPAIRABLE" } \
+)
+

Where they are all capitals. I thought that was maybe a flags vs other fields
thing but it doesn't seem to be.


>
> ...
> { CXL_GMER_EVT_DESC_UNCORECTABLE_EVENT, "Uncorrectable Event" }, \
> { CXL_GMER_EVT_DESC_THRESHOLD_EVENT, "Threshold event" }, \
> { CXL_GMER_EVT_DESC_POISON_LIST_OVERFLOW, "Poison List Overflow" } \
> ...
>
> Threshold event was a mistake. This is the capitalization the spec uses.
>
> Bit[0]: Uncorrectable Event: When set, indicates the reported event is
> ^^^^^^^^^^^^^^^^^^^
> uncorrectable by the device. When cleared, indicates the reported
> event was corrected by the device.
>
> Bit[1]: Threshold Event: When set, the event is the result of a
> ^^^^^^^^^^^^^^^
> threshold on the device having been reached. When cleared, the event
> is not the result of a threshold limit.
>
> Bit[2]: Poison List Overflow Event: When set, the Poison List has
> ^^^^^^^^^^^^^^^^^^^^^^^^^^
> overflowed, and this event is not in the Poison List. When cleared, the
> Poison List has not overflowed.
>
>
> I'll update this 'Event' in patch 5. Probably need to add 'Event' to the
> Poison List...
>
> Ira