Re: [patch 08/20] genirq/msi: Make MSI descriptor iterators device domain aware

From: Thomas Gleixner
Date: Wed Nov 16 2022 - 17:32:25 EST


On Wed, Nov 16 2022 at 14:36, Jason Gunthorpe wrote:
> On Fri, Nov 11, 2022 at 02:56:50PM +0100, Thomas Gleixner wrote:
>> To support multiple MSI interrupt domains per device it is necessary to
>> segment the xarray MSI descriptor storage. Each domain gets up to
>> MSI_MAX_INDEX entries.
>
> This kind of suggests that the new per-device MSI domains should hold
> this storage instead of per-device xarray?

No, really not. This would create random storage in random driver places
instead of having a central storage place which is managed by the core
code. We had that back in the days when every architecture had its
own magic place to store and manage interrupt descriptors. We've seen that,
mopped it up and never want to go back.

> I suppose the reason to avoid this is because a lot of the driver
> facing API is now built on vector index numbers that index this
> xarray?

That's one aspect, but as I demonstrate later, even for the IMS domains,
which have no real requirement for an 'index', you still need a place
to store the MSI descriptor and to allocate storage space for it.

I really don't want to have random places doing that, because then I
can no longer provide implicit MSI descriptor management, e.g. automatic
alloc/free, and everything has to happen on the driver side. The only
reason why I still need to do that for PCI/MSI is to support the museum
architectures which still depend on the arch_....() interfaces from 20
years ago.

So if an IMS domain, which e.g. stores the MSI message in queue memory,
wants a new interrupt, it allocates it with MSI_ANY_INDEX, which gives
it the next free slot in the xarray section of that MSI domain.
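
To make the MSI_ANY_INDEX behaviour concrete, here is a minimal userspace
sketch in C. It assumes 64k slots per domain and two domains, and a flat
flag array stands in for the kernel's xarray. struct msi_store,
msi_alloc_index() and msi_free_index() are hypothetical names for
illustration, not the genirq API.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Assumed values for illustration: 64k slots per domain, two domains. */
#define MSI_MAX_INDEX    0xFFFFu                /* highest index within one domain */
#define MSI_DOMAIN_SIZE  (MSI_MAX_INDEX + 1u)
#define MSI_NUM_DOMAINS  2u
#define MSI_ANY_INDEX    UINT_MAX               /* "give me the next free slot" */

/* Hypothetical stand-in for the per-device xarray: one occupancy flag per slot. */
struct msi_store {
	bool used[MSI_NUM_DOMAINS * MSI_DOMAIN_SIZE];
};

/*
 * Allocate a slot in the section owned by @domid. With MSI_ANY_INDEX the
 * first free slot of that section is taken; otherwise the requested index
 * is claimed if it is free. Returns the domain-relative index or -1.
 */
static long msi_alloc_index(struct msi_store *s, unsigned int domid,
			    unsigned int index)
{
	unsigned long base;

	if (domid >= MSI_NUM_DOMAINS)
		return -1;
	base = (unsigned long)domid * MSI_DOMAIN_SIZE;

	if (index == MSI_ANY_INDEX) {
		for (unsigned long i = 0; i < MSI_DOMAIN_SIZE; i++) {
			if (!s->used[base + i]) {
				s->used[base + i] = true;
				return (long)i;
			}
		}
		return -1;	/* this domain's section is exhausted */
	}

	if (index > MSI_MAX_INDEX || s->used[base + index])
		return -1;
	s->used[base + index] = true;
	return (long)index;
}

/* Release a previously allocated slot so MSI_ANY_INDEX can hand it out again. */
static void msi_free_index(struct msi_store *s, unsigned int domid,
			   unsigned int index)
{
	assert(domid < MSI_NUM_DOMAINS && index <= MSI_MAX_INDEX);
	s->used[(unsigned long)domid * MSI_DOMAIN_SIZE + index] = false;
}
```

For example, two consecutive MSI_ANY_INDEX allocations in domain 1 return
0 and 1, independent of what domain 0 holds, and a freed slot is handed
out again by the next MSI_ANY_INDEX allocation. No IDA or bitmap is needed
in the driver.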

This avoids IDA, bitmap allocators or whatever else on the driver side,
and having a virtual index number to track things does not restrict the
flexibility of the driver side in any way.

All the driver needs at the very end is the interrupt number and the
message itself.

> But on the other hand can we just say drivers using multiple domains
> are "new" and they should use some new style pointer based interface
> so we don't have to have arrays of things?

Then driver writers have to provide storage for the domain pointer and
take care of teardown etc. Seriously? NO!

> At least, I'd like to understand a bit better the motivation for using
> a domain ID instead of a pointer.

The main motivation was to avoid device-specific storage for the irq
domain pointers. It would have started with PCI/MSI[X]: I'd have had to
add an irqdomain pointer to struct pci_dev and then have the PCI core
care about it. So we'd have to add that to everything that utilizes
per-device MSI domains, which is quite a few places outside of PCI in
the ARM64 world, and growing.

The msi_device_data struct, which is allocated on demand for MSI usage,
is the obvious place to store _and_ manage these things, i.e. to provide
managed teardown etc.
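
A minimal userspace sketch of that argument, again in C: the core owns a
small array of domain pointers indexed by domain ID and frees all of them
in one teardown call, so drivers never store or release a pointer
themselves. MAX_DEVICE_IRQDOMAINS, struct fake_irq_domain, struct
msi_dev_data and the helpers are hypothetical; the real struct
msi_device_data looks different.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_DEVICE_IRQDOMAINS 2   /* assumed limit, for illustration only */

/* Hypothetical stand-in for struct irq_domain. */
struct fake_irq_domain {
	unsigned int domid;
};

/*
 * Hypothetical per-device MSI data, allocated on first MSI use. The core,
 * not the driver, owns the domain pointers; drivers only see the domain ID.
 */
struct msi_dev_data {
	struct fake_irq_domain *domains[MAX_DEVICE_IRQDOMAINS];
};

static struct msi_dev_data *msi_dev_data_alloc(void)
{
	/* calloc() zeroes the pointer array, so every slot starts empty. */
	return calloc(1, sizeof(struct msi_dev_data));
}

/* Create a domain in slot @domid. Returns 0 on success, -1 on failure. */
static int msi_create_domain(struct msi_dev_data *md, unsigned int domid)
{
	if (domid >= MAX_DEVICE_IRQDOMAINS || md->domains[domid])
		return -1;
	md->domains[domid] = malloc(sizeof(struct fake_irq_domain));
	if (!md->domains[domid])
		return -1;
	md->domains[domid]->domid = domid;
	return 0;
}

/* Managed teardown: one call releases every domain the device created. */
static void msi_dev_data_release(struct msi_dev_data *md)
{
	for (unsigned int i = 0; i < MAX_DEVICE_IRQDOMAINS; i++)
		free(md->domains[i]);	/* free(NULL) is a no-op for unused slots */
	free(md);
}
```

A driver in this model only ever passes the domain ID (0, 1, ...) around.
Which pointer sits behind it, and when it is freed, is the core's business.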

Giving this up makes any change to the core code hard because you have
to chase down all usage sites and mop them up. Just look at the ARM part
of this series, which by now is 40+ patches just to mop up the irqchip
core. There are still 25 PCI/MSI global irqdomains left.

> It feels like we are baking in several hard coded limits with this
> choice

Which ones?

The chosen array section size per domain is arbitrary and can be changed
at any time. That said, you'd have to exhaust 64k vectors per domain
before we even start debating that.
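
The section arithmetic behind that is simple. Assuming a section size of
64k entries (MSI_MAX_INDEX = 0xFFFF, an assumption for illustration),
each domain ID maps to a fixed range of the per-device index space. The
helper names below are hypothetical.

```c
#include <assert.h>

/* Assumed section size: 64k descriptors per domain. */
#define MSI_MAX_INDEX    0xFFFFul
#define MSI_DOMAIN_SIZE  (MSI_MAX_INDEX + 1ul)

/* First storage index of the section owned by domain @domid. */
static unsigned long msi_section_first(unsigned int domid)
{
	return domid * MSI_DOMAIN_SIZE;
}

/* Last storage index of the section owned by domain @domid. */
static unsigned long msi_section_last(unsigned int domid)
{
	return msi_section_first(domid) + MSI_MAX_INDEX;
}
```

Domain 0 covers 0..65535 and domain 1 covers 65536..131071. Changing the
section size only changes the constant; the domain-relative indices that
drivers see stay the same.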

The number of irqdomains is not really hard-limited either. It's trivial
to extend that number, and once we hit 32 we can just stash them away in
the xarray. I considered doing that right away, but it wastes too much
memory for now.

It really does not matter whether the domain creation results in a
number or in a pointer. Pointers are required for the inner workings of
the domain hierarchy but absolutely uninteresting for endpoint domains.

All you need there is a convenient way to create the domain and then
allocate/free interrupts as you see fit.

We agreed a year ago that we want to abstract most of these things away
for driver writers: all they need is a simple way to create the domains,
and the corresponding interrupt chip is mostly about writing the MSI
message to implementation-defined storage and possibly providing an
implementation-specific mask/unmask operation.
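
As a hedged illustration of how little device-specific code that leaves,
here is a userspace C sketch of an IMS-style queue whose MSI message and
mask bit live in queue memory. struct ims_queue, struct ims_chip_ops and
the callbacks are hypothetical. In the kernel the corresponding hooks
would be irq_chip callbacks such as irq_write_msi_msg, irq_mask and
irq_unmask, and struct msi_msg here only mirrors the basic
address_lo/address_hi/data fields of the kernel's type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the basic fields of the kernel's struct msi_msg. */
struct msi_msg {
	uint32_t address_lo;
	uint32_t address_hi;
	uint32_t data;
};

/*
 * Hypothetical IMS-style device queue: the MSI message and a mask bit live
 * in implementation-defined queue memory instead of a PCI MSI-X table.
 */
struct ims_queue {
	struct msi_msg msg;
	bool masked;
};

/* Hypothetical chip ops: the only device-specific parts a driver supplies. */
struct ims_chip_ops {
	void (*write_msg)(struct ims_queue *q, const struct msi_msg *msg);
	void (*mask)(struct ims_queue *q);
	void (*unmask)(struct ims_queue *q);
};

/* Store the message where the device will read it when raising the IRQ. */
static void queue_write_msg(struct ims_queue *q, const struct msi_msg *msg)
{
	q->msg = *msg;
}

static void queue_mask(struct ims_queue *q)   { q->masked = true; }
static void queue_unmask(struct ims_queue *q) { q->masked = false; }

static const struct ims_chip_ops queue_chip = {
	.write_msg = queue_write_msg,
	.mask      = queue_mask,
	.unmask    = queue_unmask,
};
```

Everything else (descriptor storage, index allocation, teardown) stays in
the core, which is the point of the domain-ID based interface.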

So what are you concerned about?

Thanks,

tglx