Re: [PATCH 02/11] cxl/mem: Implement Get Event Records command

From: Ira Weiny
Date: Wed Nov 16 2022 - 19:49:45 EST


On Wed, Nov 16, 2022 at 03:19:36PM +0000, Jonathan Cameron wrote:
> On Thu, 10 Nov 2022 10:57:49 -0800
> ira.weiny@xxxxxxxxx wrote:
>
> > From: Ira Weiny <ira.weiny@xxxxxxxxx>
> >
> > CXL devices have multiple event logs which can be queried for CXL event
> > records. Devices are required to support the storage of at least one
> > event record in each event log type.
> >
> > Devices track event log overflow by incrementing a counter and tracking
> > the time of the first and last overflow event seen.
> >
> > Software queries events via the Get Event Record mailbox command; CXL
> > rev 3.0 section 8.2.9.2.2.
> >
> > Issue the Get Event Record mailbox command on driver load. Trace each
> > record found with a generic record trace. Trace any overflow
> > conditions.
> >
> > The device can return up to 1MB worth of event records per query. This
> > presents complications with allocating a huge buffers to potentially
> > capture all the records. It is not anticipated that these event logs
> > will be very deep and reading them does not need to be performant.
> > Process only 3 records at a time. 3 records was chosen as it fits
> > comfortably on the stack to prevent dynamic allocation while still
> > cutting down on extra mailbox messages.
> >
> > This patch traces a raw event record only and leaves the specific event
> > record types to subsequent patches.
> >
> > Macros are created to use for tracing the common CXL Event header
> > fields.
> >
> > Cc: Steven Rostedt <rostedt@xxxxxxxxxxx>
> > Signed-off-by: Ira Weiny <ira.weiny@xxxxxxxxx>
>
> Hi Ira,
>
> A question inline about whether some of the conditions you are checking
> for can actually happen. Otherwise looks good to me.
>
> Jonathan
>

[snip]

> > +static void cxl_mem_get_records_log(struct cxl_dev_state *cxlds,
> > + enum cxl_event_log_type type)
> > +{
> > + struct cxl_get_event_payload payload;
> > + u16 pl_nr;
> > +
> > + do {
> > + u8 log_type = type;
> > + int rc;
> > +
> > + rc = cxl_mbox_send_cmd(cxlds, CXL_MBOX_OP_GET_EVENT_RECORD,
> > + &log_type, sizeof(log_type),
> > + &payload, sizeof(payload));
> > + if (rc) {
> > + dev_err(cxlds->dev, "Event log '%s': Failed to query event records : %d",
> > + cxl_event_log_type_str(type), rc);
> > + return;
> > + }
> > +
> > + pl_nr = le16_to_cpu(payload.record_count);
> > + if (trace_cxl_generic_event_enabled()) {
> > + u16 nr_rec = min_t(u16, pl_nr, CXL_GET_EVENT_NR_RECORDS);
>
> Either I'm misreading the spec, or it can't be greater than NR_RECORDS.

Well... I could have read the spec wrong as well. But after reading very
carefully I think this is actually correct.

> "The number of event records in the Event Records list...."

Where is this quote from? I don't see that in the spec.

> Event Records being the field inside this payload which is not big enough to
> take more than CXL_GET_EVENT_NR_RECORDS and the intro to Get Event Records
> refers to the number being restricted by the mailbox output payload provided.

My understanding is that the output payload is only limited by the Payload Size
reported in the Mailbox Capability Register.Payload Size. (Section 8.2.8.4.3)

This can be up to 1MB. So the device could fill up to 1MB's worth of Event
Records while still being in compliance. The generic mailbox code in the
driver caps the data based on the size passed into cxl_mbox_send_cmd() however,
the number of records reported is not changed.

>
> I'm in favor of defense against broken hardware, but don't paper over any
> such error - scream about it.

I don't think this is out of spec unless the device is trying to write more
than 1MB and I think the core mailbox code will scream about that.

>
> > + int i;
> > +
> > + for (i = 0; i < nr_rec; i++)
> > + trace_cxl_generic_event(dev_name(cxlds->dev),
> > + type,
> > + &payload.record[i]);
> > + }
> > +
> > + if (trace_cxl_overflow_enabled() &&
> > + (payload.flags & CXL_GET_EVENT_FLAG_OVERFLOW))
> > + trace_cxl_overflow(dev_name(cxlds->dev), type, &payload);
> > +
> > + } while (pl_nr > CXL_GET_EVENT_NR_RECORDS ||
>
> Isn't pl_nr > CXL_GET_EVENT_NR_RECORDS a hardware bug? It's the number in returned
> payload not the total number.

I don't think so. The only value passed to the device is the _input_ payload
size. The output payload size is not passed to the device and is not included
in the Get Event Records Input Payload. (Table 8-49)

So my previous code was wrong. Here is an example I think which is within the
spec but would result in the more records flag not being set.

Device log depth == 10
nr log entries == 7
nr log entries in 1MB ~= (1M - hdr size) / 128 ~= 8000

Device sets Output Payload.Event Record Count == 7 (which is < 8000). Common
mailbox code truncates that to 3. More Event Records == 0 because it sent all
7 that it had.

This code will clear 3 and read again 2 more times.

Am I reading that wrong?

>
> > + payload.flags & CXL_GET_EVENT_FLAG_MORE_RECORDS);
> > +}
>
>
> > diff --git a/include/trace/events/cxl.h b/include/trace/events/cxl.h
> > new file mode 100644
> > index 000000000000..60dec9a84918
> > --- /dev/null
> > +++ b/include/trace/events/cxl.h
> > @@ -0,0 +1,127 @@
>
>
> > +#define CXL_EVT_TP_fast_assign(dname, l, hdr) \
> > + __assign_str(dev_name, (dname)); \
> > + __entry->log = (l); \
> > + memcpy(&__entry->hdr_uuid, &(hdr).id, sizeof(uuid_t)); \
> > + __entry->hdr_length = (hdr).length; \
> > + __entry->hdr_flags = get_unaligned_le24((hdr).flags); \
> > + __entry->hdr_handle = le16_to_cpu((hdr).handle); \
> > + __entry->hdr_related_handle = le16_to_cpu((hdr).related_handle); \
> > + __entry->hdr_timestamp = le64_to_cpu((hdr).timestamp); \
> > + __entry->hdr_maint_op_class = (hdr).maint_op_class
> > +
> Trivial: Maybe one blank line is enough?

Yea I'll adjust,
Ira

> > +
> > +#define CXL_EVT_TP_printk(fmt, ...) \
> > + TP_printk("%s log=%s : time=%llu uuid=%pUb len=%d flags='%s' " \
> > + "handle=%x related_handle=%x maint_op_class=%u" \
> > + " : " fmt, \
> > + __get_str(dev_name), cxl_event_log_type_str(__entry->log), \
> > + __entry->hdr_timestamp, &__entry->hdr_uuid, __entry->hdr_length,\
> > + show_hdr_flags(__entry->hdr_flags), __entry->hdr_handle, \
> > + __entry->hdr_related_handle, __entry->hdr_maint_op_class, \
> > + ##__VA_ARGS__)
>