Re: [PATCH v4 0/6] Add Auxiliary driver support
From: Leon Romanovsky
Date: Wed Nov 16 2022 - 08:22:20 EST
On Mon, Nov 14, 2022 at 04:47:31PM -0800, Ajit Khaparde wrote:
> Leon,
> We appreciate your valuable feedback.
> Please see inline.
>
> On Thu, Nov 10, 2022 at 2:53 AM Leon Romanovsky <leon@xxxxxxxxxx> wrote:
> >
> > On Wed, Nov 09, 2022 at 10:42:38AM -0800, Ajit Khaparde wrote:
> > > Add auxiliary device driver for Broadcom devices.
> > > The bnxt_en driver will register and initialize an aux device
> > > if RDMA is enabled in the underlying device.
> > > The bnxt_re driver will then probe and initialize the
> > > RoCE interfaces with the infiniband stack.
> > >
> > > We got rid of the bnxt_en_ops which the bnxt_re driver used to
> > > communicate with bnxt_en.
> > > Similarly We have tried to clean up most of the bnxt_ulp_ops.
> > > In most of the cases we used the functions and entry points provided
> > > by the auxiliary bus driver framework.
> > > And now these are the minimal functions needed to support the functionality.
> > >
> > > We will try to work on getting rid of the remaining if we find any
> > > other viable option in future.
> >
> > I still see extra checks for something that was already checked in upper
> > functions, for example in bnxt_re_register_netdev() you check rdev, which
> > you already checked before.
> Sure. I will do another sweep and clean up.
>
> >
> > However, the most important part is still existence of bnxt_ulp_ops,
> > which shows completely no-go thing - SR-IOV config in RDMA code.
> >
> > 302 static struct bnxt_ulp_ops bnxt_re_ulp_ops = {
> > 303 .ulp_sriov_config = bnxt_re_sriov_config,
> > 304 .ulp_irq_stop = bnxt_re_stop_irq,
> > 305 .ulp_irq_restart = bnxt_re_start_irq
> > 306 };
> >
> > All PCI management logic and interfaces are needed to be inside eth part
> > of your driver and only that part should implement SR-IOV config. Once
> > user enabled SR-IOV, the PCI driver should create auxiliary devices for
> > each VF. These device will have RDMA capabilities and it will trigger RDMA
> > driver to bind to them.
> I agree and once the PF creates the auxiliary devices for the VF, the RoCE
> Vf indeed get probed and created. But the twist in bnxt_en/bnxt_re
> design is that
> the RoCE driver is responsible for making adjustments to the RoCE resources.
You can still do these adjustments by checking type of function that
called to RDMA .probe. PCI core exposes some functions to help distinguish between
PF and VFs.
>
> So once the VF's are created and the bnxt_en driver enables SRIOV adjusts the
> NIC resources for the VF, and such, it tries to call into the bnxt_re
> driver for the
> same purpose.
If I read code correctly, all these resources are for one PCI function.
Something like this:
bnxt_re_probe()
{
...
if (is_virtfn(p))
bnxt_re_sriov_config(p);
...
}
>
> 1. We do something like this to the auxiliary_device structure:
>
> diff --git a/include/linux/auxiliary_bus.h b/include/linux/auxiliary_bus.h
> index de21d9d24a95..4e581fbf458f 100644
> --- a/include/linux/auxiliary_bus.h
> +++ b/include/linux/auxiliary_bus.h
> @@ -148,6 +148,7 @@ struct auxiliary_device {
> * @shutdown: Called at shut-down time to quiesce the device.
> * @suspend: Called to put the device to sleep mode. Usually to a power state.
> * @resume: Called to bring a device from sleep mode.
> + * @sriov_configure: Called to allow configuration of VFs .
> * @name: Driver name.
> * @driver: Core driver structure.
> * @id_table: Table of devices this driver should match on the bus.
> @@ -183,6 +184,7 @@ struct auxiliary_driver {
> void (*shutdown)(struct auxiliary_device *auxdev);
> int (*suspend)(struct auxiliary_device *auxdev, pm_message_t state);
> int (*resume)(struct auxiliary_device *auxdev);
> + int (*sriov_configure)(struct auxiliary_device *auxdev, int
> num_vfs); /* On PF */
> const char *name;
> struct device_driver driver;
> const struct auxiliary_device_id *id_table;
>
> Then the bnxt_en driver could call into bnxt_re via that interface.
>
@sriov_configure callback is PCI specific and doesn't belong to aux
devices.
> Please let me know what you feel.
>
> 2. While it may take care of the first function in the ulp_ops, it
> leaves us with two.
> And that is where I will need some input if we need to absolutely get
> rid of the ops.
>
> 2a. We may be able to use the auxiliary_device suspend & resume with a
> private flag
> in the driver's aux_dev pointer.
> 2b. Or just like (1) above, add another hook to auxiliary_driver
> void (*restart)(struct auxiliary_device *auxdev);
> And then use the auxiliary_driver shutdown & restart with a private flag.
>
> Note that we may get creative right now and get rid of the ulp_ops.
> But the bnxt_en driver having a need to update the bnxt_re driver is a
> part of the
> design. So it will help if we can consider beyond the ulp_irq_stop,
> ulp_irq_restart.
> 2c. Maybe keep the bnxt_ulp_ops for that reason?
It is nicer to get rid from bnxt_ulp_ops completely, but it is not must.
To get rid from sriov_configure is the most important comment here.
Thanks
>
> Thank you for your time.
>
> Thanks
> Ajit
>
> >
> > Thanks