Re: [patch 28/33] PCI/MSI: Provide IMS (Interrupt Message Store) support
From: Bjorn Helgaas
Date: Wed Nov 16 2022 - 15:17:19 EST
On Fri, Nov 11, 2022 at 02:58:55PM +0100, Thomas Gleixner wrote:
> IMS (Interrupt Message Store) is a new specification which allows
> implementation specific storage of MSI messages contrary to the
> strict standard specified MSI and MSI-X message stores.
>
> This requires new device specific interrupt domains to handle the
> implementation defined storage which can be an array in device memory or
> host/guest memory which is shared with hardware queues.
>
> Add a function to create IMS domains for PCI devices. IMS domains are using
> the new per device domain mechanism and are configured by the device driver
> via a template. IMS domains are created as secondary device domains so they
> work side on side with MSI[-X] on the same device.
>
> The IMS domains have a few constraints:
>
> - The index space is managed by the core code.
>
> Device memory based IMS provides a storage array with a fixed size
> which obviously requires an index. But there is no association between
> index and functionality so the core can randomly allocate an index in
> the array.
>
> Queue memory based IMS does not have the concept of an index as the
> storage is somewhere in memory. In that case the index is purely
> software based to keep track of the allocations.
>
> - There is no requirement for consecutive index ranges
>
> This is currently a limitation of the MSI core and can be implemented
> if there is a justified use case by changing the internal storage from
> xarray to maple_tree. For now it's single vector allocation.
>
> - The interrupt chip must provide the following callbacks:
>
> - irq_mask()
> - irq_unmask()
> - irq_write_msi_msg()
>
> - The interrupt chip must provide the following optional callbacks
> when the irq_mask(), irq_unmask() and irq_write_msi_msg() callbacks
> cannot operate directly on hardware, e.g. in the case that the
> interrupt message store is in queue memory:
>
> - irq_bus_lock()
> - irq_bus_unlock()
>
> These callbacks are invoked from preemptible task context and are
> allowed to sleep. In this case the mandatory callbacks above just
> store the information. The irq_bus_unlock() callback is supposed to
> make the change effective before returning.
>
> - Interrupt affinity setting is handled by the underlying parent
> interrupt domain and communicated to the IMS domain via
> irq_write_msi_msg(). IMS domains cannot have a irq_set_affinity()
> callback. That's a reasonable restriction similar to the PCI/MSI
> device domain implementations.
>
> The domain is automatically destroyed when the PCI device is removed.
>
> Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Acked-by: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
A couple typos below.
> ---
> drivers/pci/msi/irqdomain.c | 59 ++++++++++++++++++++++++++++++++++++++++++++
> include/linux/pci.h | 5 +++
> 2 files changed, 64 insertions(+)
>
> --- a/drivers/pci/msi/irqdomain.c
> +++ b/drivers/pci/msi/irqdomain.c
> @@ -355,6 +355,65 @@ bool pci_msi_domain_supports(struct pci_
> return (supported & feature_mask) == feature_mask;
> }
>
> +/**
> + * pci_create_ims_domain - Create a secondary IMS domain for a PCI device
> + * @pdev: The PCI device to operate on
> + * @template: The MSI info template which describes the domain
> + * @hwsize: The size of the hardware entry table or 0 if the domain
> + * is purely software managed
> + * @data: Optional pointer to domain specific data to be stored
> + * in msi_domain_info::data
> + *
> + * Return: True on success, false otherwise
> + *
> + * A IMS domain is expected to have the following constraints:
An IMS ...
> + * - The index space is managed by the core code
> + *
> + * - There is no requirement for consecutive index ranges
> + *
> + * - The interrupt chip must provide the following callbacks:
> + * - irq_mask()
> + * - irq_unmask()
> + * - irq_write_msi_msg()
> + *
> + * - The interrupt chip must provide the following optional callbacks
> + * when the irq_mask(), irq_unmask() and irq_write_msi_msg() callbacks
> + * cannot operate directly on hardware, e.g. in the case that the
> + * interrupt message store is in queue memory:
> + * - irq_bus_lock()
> + * - irq_bus_unlock()
> + *
> + * These callbacks are invoked from preemptible task context and are
> + * allowed to sleep. In this case the mandatory callbacks above just
> + * store the information. The irq_bus_unlock() callback is supposed
> + * to make the change effective before returning.
> + *
> + * - Interrupt affinity setting is handled by the underlying parent
> + * interrupt domain and communicated to the IMS domain via
> + * irq_write_msi_msg().
Different indentation than the bullet items above.
> + *
> + * The domain is automatically destroyed when the PCI device is removed.
> + */
> +bool pci_create_ims_domain(struct pci_dev *pdev, const struct msi_domain_template *template,
> + unsigned int hwsize, void *data)
> +{
> + struct irq_domain *domain = dev_get_msi_domain(&pdev->dev);
> +
> + if (!domain || !irq_domain_is_msi_parent(domain))
> + return -ENOTSUPP;
> +
> + if (template->info.bus_token != DOMAIN_BUS_PCI_DEVICE_IMS ||
> + !(template->info.flags & MSI_FLAG_ALLOC_SIMPLE_MSI_DESCS) ||
> + !(template->info.flags & MSI_FLAG_FREE_MSI_DESCS) ||
> + !template->chip.irq_mask || !template->chip.irq_unmask ||
> + !template->chip.irq_write_msi_msg || template->chip.irq_set_affinity)
> + return -EINVAL;
> +
> + return msi_create_device_irq_domain(&pdev->dev, MSI_SECONDARY_DOMAIN, template,
> + hwsize, data, NULL);
> +}
> +EXPORT_SYMBOL_GPL(pci_create_ims_domain);
> +
> /*
> * Users of the generic MSI infrastructure expect a device to have a single ID,
> * so with DMA aliases we have to pick the least-worst compromise. Devices with
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -2481,6 +2481,11 @@ static inline bool pci_is_thunderbolt_at
> void pci_uevent_ers(struct pci_dev *pdev, enum pci_ers_result err_type);
> #endif
>
> +struct msi_domain_template;
> +
> +bool pci_create_ims_domain(struct pci_dev *pdev, const struct msi_domain_template *template,
> + unsigned int hwsize, void *data);
> +
> #include <linux/dma-mapping.h>
>
> #define pci_printk(level, pdev, fmt, arg...) \
>