Re: [PATCH] xfs: account extra freespace btree splits for multiple allocations
From: Gao Xiang
Date: Tue Nov 22 2022 - 19:42:33 EST
On Wed, Nov 23, 2022 at 09:42:43AM +1100, Dave Chinner wrote:
> On Tue, Nov 22, 2022 at 06:03:03PM +0800, Oliver Sang wrote:
> > hi Gao Xiang,
> >
> > On Tue, Nov 22, 2022 at 09:33:38AM +0800, Gao Xiang wrote:
> > > On Tue, Nov 22, 2022 at 09:09:34AM +0800, kernel test robot wrote:
> > > >
> > > > please be noted we noticed Gao Xiang and Dave Chinner have already had lots of
> > > > discussion around this patch, which seems there is maybe new version later.
> > > > we just sent out this report FYI the possible performance impact of this patch.
> > > >
> > > >
> > > > Greeting,
> > > >
> > > > FYI, we noticed a -15.1% regression of fxmark.ssd_xfs_MWCM_72_directio.works/sec due to commit:
> > >
> > > Thanks for your report!
> > >
> > > At a glance, I have no idea why this commit can have performance
> > > impacts. Is the result stable?
> >
> > in our tests, the result is quite stable.
> > 45589 -15.1% 38687 ± 2% fxmark.ssd_xfs_MWCM_72_directio.works/sec
> >
> > and detail data is as below:
> > for this commit:
> > "fxmark.ssd_xfs_MWCM_72_directio.works/sec": [
> > 39192.224368,
> > 39665.690567,
> > 38980.680601,
> > 37298.99538,
> > 37483.256377,
> > 39504.606569
> > ],
> >
> > for parent:
> > "fxmark.ssd_xfs_MWCM_72_directio.works/sec": [
> > 45381.458009,
> > 45314.376204,
> > 45724.688965,
> > 45751.955937,
> > 45614.323267,
> > 45747.216475
> > ],
>
> This MWCM workload uses a shared directory. Every worker thread (72
> of them) iterates creating a new file, writes 4kB of data to it and
> then closes it. There is no synchronisation between worker threads.
>
> The worker threads will lockstep on the directory lock for file
> creation, they will all attempt to allocate data in the same AG as
> the file is created. Hence writeback will race with file creation
> for AG locks, too. Once the first AG is full, they will all attempt
> to allocate in the next AG (file creation and writeback).
>
> IOWs, this workload will race to fill AGs, will exercise the "AG
> full so skip to next AG" allocator fallbacks, etc.
Glad to know about that. I didn't look into the MWCM workload before.
>
> Changing where/how AGs are considered full will impact how the AG
> selection is made. I'm betting that there's a mismatch between the
> code that selects the initial AG for allocation (from
> xfs_bmap_btalloc() via the nullfb case) and the code that selects
> the actual AG for allocation (xfs_alloc_vextent() w/ NEAR_BNO
> policy) as a result of this change. This then results in
> xfs_alloc_vextent() trying to initially allocate from an AG that
> xfs_alloc_fix_freelist() considers to be full, so it skips the
> initial selected AG and starts searching for an AG it can allocate
> into.
I can imagine, but I didn't think out several block reservation could
cause such huge impacts.
>
> Combine that with AGF lock contention from 70+ tasks all trying to
> allocate in the same location...
Yeah, anyway, I will reconfirm on our side about this workload
as well.
Thanks,
Gao Xiang
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx