Re: [PATCH v4 0/5] fsnotify: fix softlockups iterating over d_subdirs
From: Amir Goldstein
Date: Tue Nov 22 2022 - 09:05:31 EST
On Tue, Nov 22, 2022 at 1:50 PM Jan Kara <jack@xxxxxxx> wrote:
>
> Hi Stephen!
>
> On Fri 11-11-22 14:06:09, Stephen Brennan wrote:
> > Here's my v4 patch series that aims to eliminate soft lockups when updating
> > dentry flags in fsnotify. I've incorporated Jan's suggestion of simply
> > allowing the flag to be lazily cleared in the fsnotify_parent() function,
> > via Amir's patch. This allowed me to drop patch #2 from my previous series
> > (fsnotify: Protect i_fsnotify_mask and child flags with inode rwsem). I
> > replaced it with "fsnotify: require inode lock held during child flag
> > update", patch #5 in this series. I also added "dnotify: move
> > fsnotify_recalc_mask() outside spinlock" to address the sleep-during-atomic
> > issues with dnotify.
>
> Yes, the series is now much simpler. Thanks!
>
> > Jan expressed concerns about lock ordering of the inode rwsem with the
> > fsnotify group mutex. I built this with lockdep enabled (see below for the
> > lock debugging .config section -- I'm not too familiar with lockdep so I
> > wanted a sanity check). I ran all the fanotify, inotify, and dnotify tests
> > I could find in LTP, with no lockdep splats to be found. I don't know that
> > this can completely satisfy the concerns about lock ordering: I'm reading
> > through the code to better understand the concern about "the removal of
> > oneshot mark during modify event generation". But I'm encouraged by the
> > LTP+lockdep results.
>
> So I had a look and I think your patches could cause deadlock at least for
> nfsd. The problem is with things like inotify IN_ONESHOT marks. They get
> autodeleted as soon as they trigger. Thus e.g. fsnotify_mkdir() can trigger
> IN_ONESHOT mark and goes on removing it by calling fsnotify_destroy_mark()
> from inotify_handle_inode_event(). And nfsd calls e.g. fsnotify_mkdir()
> while holding dir->i_rwsem held. So we have lock ordering like:
>
> nfsd_mkdir()
> inode_lock(dir);
> ...
> __nfsd_mkdir(dir, ...)
> fsnotify_mkdir(dir, dentry);
> ...
> inotify_handle_inode_event()
> ...
> fsnotify_destroy_mark()
> fsnotify_group_lock(group)
>
> So we have dir->i_rwsem > group->mark_mutex. But we also have callchains
> like:
>
> inotify_add_watch()
> inotify_update_watch()
> fsnotify_group_lock(group)
> inotify_update_existing_watch()
> ...
> fsnotify_recalc_mask()
> inode_lock(dir); -> added by your series
>
> which creates ordering group->mark_mutex > dir->i_rwsem.
>
> It is even worse with dnotify which (even with your patches) ends up
> calling fsnotify_recalc_mask() from dnotify_handle_event() so we have a
> possibility of direct A->A deadlock. But I'd leave dnotify aside, I think
> that can be massaged to not need to call fsnotify_recalc_mask()
> (__fsnotify_recalc_mask() would be enough there).
>
> Still I'm not 100% sure about a proper way out of this. The simplicity of
> alias->d_subdirs iteration with i_rwsem held is compeling.
Agreed.
> We could mandate
> that fsnotify hooks cannot be called with inode->i_rwsem held (and fixup
> nfsd) but IMO that is pushing the complexity from the fsnotify core into
> its users which is undesirable.
I think inode in this context is the parent inode, so all fsnotify hooks
in namei.c are holding inode->i_rwsem by design.
> Maybe we could grab inode->i_rwsem in those
> places adding / removing notification marks before we grab
> group->mark_mutex, just verify (with lockdep) that fsnotify_recalc_mask()
> has the inode->i_rwsem held and be done with it? That pushes a bit of
> complexity into the fsnotify backends but it is not too bad.
> fsnotify_recalc_mask() gets only called by dnotify, inotify, and fanotify.
> Amir?
>
Absolutely agree - I think it makes sense and will simplify things a lot.
Obviously if we need to assert inode_is_locked() in fsnotify_recalc_mask()
only for (conn->type == FSNOTIFY_OBJ_TYPE_INODE).
Thanks,
Amir.