Re: [PATCH v1] virtio_pmem: populate numa information

From: Pankaj Gupta
Date: Wed Nov 16 2022 - 09:29:11 EST


> > > > > > > > Compute the numa information for a virtio_pmem device from the memory
> > > > > > > > range of the device. Previously, the target_node was always 0 since
> > > > > > > > the ndr_desc.target_node field was never explicitly set. The code for
> > > > > > > > computing the numa node is taken from cxl_pmem_region_probe in
> > > > > > > > drivers/cxl/pmem.c.
> > > > > > > >
> > > > > > > > Signed-off-by: Michael Sammler <sammler@xxxxxxxxxx>
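[ For context: the relevant kernel helpers here are
  memory_add_physaddr_to_nid() and phys_to_target_node(). A minimal sketch of
  what the probe path ends up computing, following the cxl_pmem_region_probe()
  pattern named above rather than quoting the exact diff:

	/* in virtio_pmem_probe(), once the device's host memory range is known */
	ndr_desc.numa_node = memory_add_physaddr_to_nid(res.start);
	ndr_desc.target_node = phys_to_target_node(res.start);
	if (ndr_desc.target_node == NUMA_NO_NODE) {
		/* no firmware-described target node for this range; fall back */
		ndr_desc.target_node = ndr_desc.numa_node;
	}
]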
> > >
> > > Tested-by: Mina Almasry <almasrymina@xxxxxxxxxx>
> > >
> > > I don't have much expertise on this driver, but with the help of this
> > > patch I was able to get memory tiering [1] emulation going on qemu. As
> > > far as I know there is no alternative way to get this emulation, and so
> > > I would love to see this or an equivalent merged, if possible.
> > >
> > > This is what I have to get memory tiering emulation going:
> > >
> > > In qemu, added these configs:
> > > -object memory-backend-file,id=m4,share=on,mem-path="$path_to_virtio_pmem_file",size=2G \
> > > -smp 2,sockets=2,maxcpus=2 \
> > > -numa node,nodeid=0,memdev=m0 \
> > > -numa node,nodeid=1,memdev=m1 \
> > > -numa node,nodeid=2,memdev=m2,initiator=0 \
> > > -numa node,nodeid=3,initiator=0 \
> > > -device virtio-pmem-pci,memdev=m4,id=nvdimm1 \
> > >
> > > On boot, ran these commands:
> > > ndctl_static create-namespace -e namespace0.0 -m devdax -f &> /dev/null
> > > echo dax0.0 > /sys/bus/dax/drivers/device_dax/unbind
> > > echo dax0.0 > /sys/bus/dax/drivers/kmem/new_id
> > > for i in `ls /sys/devices/system/memory/`; do
> > >     state=$(cat "/sys/devices/system/memory/$i/state" 2>/dev/null)
> > >     if [ "$state" == "offline" ]; then
> > >         echo online_movable > "/sys/devices/system/memory/$i/state"
> > >     fi
> > > done
> >
> > Nice to see the virtio-pmem device memory handled through the kmem driver
> > and the corresponding memory blocks onlined to 'zone_movable'.
> >
> > This also opens the way to use this memory range directly, irrespective of
> > the attached block device. Of course there is no persistent data guarantee,
> > but it is a good way to simulate memory tiering inside the guest, as
> > demonstrated below.
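[ For context: the target node matters for this flow because the kmem driver
  hotplugs the dax range onto the dax device's target node, so with
  target_node always 0 the memory can only ever show up in node 0. Roughly,
  paraphrasing drivers/dax/kmem.c, dev_dax_kmem_probe() (exact arguments vary
  across kernel versions):

	/* the hotplugged memory blocks land on the dax device's target node */
	int numa_node = dev_dax->target_node;
	...
	rc = add_memory_driver_managed(numa_node, range.start,
				       range_len(&range), kmem_name, MHP_NONE);
]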
> > >
> > > Without this CL, I see the memory always onlined in node 0, and it is
> > > not a separate memory tier. With this CL and the qemu configs above, the
> > > memory is onlined in node 3 and placed in a separate memory tier, which
> > > enables qemu-based development:
> > >
> > > ==> /sys/devices/virtual/memory_tiering/memory_tier22/nodelist <==
> > > 3
> > > ==> /sys/devices/virtual/memory_tiering/memory_tier4/nodelist <==
> > > 0-2
> > >
> > > AFAIK there is no alternative way to enable memory tiering emulation in
> > > qemu, and I would love to see this or an equivalent merged, if possible.
> >
> > Just wondering if the Qemu vNVDIMM device can also achieve this?
> >
>
> I spent a few minutes on this. Please note I'm really not familiar
> with these drivers, but as far as I can tell the qemu vNVDIMM device
> has the same problem and needs a fix similar to what Michael did
> here. What I did with the vNVDIMM qemu device:
>
> - Added these qemu configs:
> -object memory-backend-file,id=m4,share=on,mem-path=./hello,size=2G,readonly=off \
> -device nvdimm,id=nvdimm1,memdev=m4,unarmed=off \
>
> - Ran the same commands as in my previous email (they seem to apply to
> the vNVDIMM device without modification):
> ndctl_static create-namespace -e namespace0.0 -m devdax -f &> /dev/null
> echo dax0.0 > /sys/bus/dax/drivers/device_dax/unbind
> echo dax0.0 > /sys/bus/dax/drivers/kmem/new_id
> for i in `ls /sys/devices/system/memory/`; do
>     state=$(cat "/sys/devices/system/memory/$i/state" 2>/dev/null)
>     if [ "$state" == "offline" ]; then
>         echo online_movable > "/sys/devices/system/memory/$i/state"
>     fi
> done
>
> I see the memory from the vNVDIMM device get onlined on node 0, and it is
> not detected as a separate memory tier. I suspect that driver needs a
> similar fix to this one.

Thanks for trying. It seems the vNVDIMM device already has an option to
provide the target node [1].

[1] https://www.mail-archive.com/qemu-devel@xxxxxxxxxx/msg827765.html
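
[ For completeness: when qemu exposes the range as a vNVDIMM with a NUMA
  proximity domain, the guest's ACPI NFIT driver already derives both node
  fields from the SPA range, roughly as below (paraphrasing
  drivers/acpi/nfit/core.c, acpi_nfit_register_region(); helper names vary
  slightly across kernel versions):

	/* node information comes straight from the SRAT proximity domain */
	ndr_desc->numa_node = pxm_to_online_node(spa->proximity_domain);
	ndr_desc->target_node = pxm_to_node(spa->proximity_domain);
]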