Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix inode eviction and sync writeback deadlock on Linux #17159

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions include/sys/zfs_znode.h
Original file line number Diff line number Diff line change
Expand Up @@ -278,6 +278,7 @@ extern int zfs_znode_hold_compare(const void *, const void *);
extern znode_hold_t *zfs_znode_hold_enter(zfsvfs_t *, uint64_t);
extern void zfs_znode_hold_exit(zfsvfs_t *, znode_hold_t *);
extern int zfs_zget(zfsvfs_t *, uint64_t, znode_t **);
extern int zfs_zget_impl(zfsvfs_t *, uint64_t, znode_t **, boolean_t);
extern int zfs_rezget(znode_t *);
extern void zfs_zinactive(znode_t *);
extern void zfs_znode_delete(znode_t *, dmu_tx_t *);
Expand Down
7 changes: 7 additions & 0 deletions module/os/freebsd/zfs/zfs_znode_os.c
Original file line number Diff line number Diff line change
Expand Up @@ -1051,6 +1051,13 @@ zfs_zget(zfsvfs_t *zfsvfs, uint64_t obj_num, znode_t **zpp)
return (err);
}

int
zfs_zget_impl(zfsvfs_t *zfsvfs, uint64_t obj_num, znode_t **zpp,
boolean_t check_sync)
{
return (zfs_zget(zfsvfs, obj_num, zpp));
}

int
zfs_rezget(znode_t *zp)
{
Expand Down
22 changes: 20 additions & 2 deletions module/os/linux/zfs/zfs_znode_os.c
Original file line number Diff line number Diff line change
Expand Up @@ -1044,13 +1044,21 @@ zfs_xvattr_set(znode_t *zp, xvattr_t *xvap, dmu_tx_t *tx)

int
zfs_zget(zfsvfs_t *zfsvfs, uint64_t obj_num, znode_t **zpp)
{
return (zfs_zget_impl(zfsvfs, obj_num, zpp, B_FALSE));
}

int
zfs_zget_impl(zfsvfs_t *zfsvfs, uint64_t obj_num, znode_t **zpp,
boolean_t check_sync)
{
dmu_object_info_t doi;
dmu_buf_t *db;
znode_t *zp;
znode_hold_t *zh;
int err;
sa_handle_t *hdl;
boolean_t noloop = B_FALSE;

*zpp = NULL;

Expand Down Expand Up @@ -1109,8 +1117,18 @@ zfs_zget(zfsvfs_t *zfsvfs, uint64_t obj_num, znode_t **zpp)
if (igrab(ZTOI(zp)) == NULL) {
if (zp->z_unlinked)
err = SET_ERROR(ENOENT);
else
else {
err = SET_ERROR(EAGAIN);
/*
* In writeback path, I_SYNC flag will be set
* and block inode eviction. So we must not
* loop doing igrab in possible writeback
* path, i.e. zfs_get_data, if inode is being
* evicted and I_SYNC is also set.
*/
if (check_sync && (ZTOI(zp)->i_state & I_SYNC))
Copy link
Contributor

@snajpa snajpa Mar 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe you can move this before the igrab, take the i_lock and check for I_FREEING and I_SYNC and if they are both set, bail out of the original zfs_zget without the need for zfs_zget_impl - have you already considered it? am I overlooking something?

Copy link
Contributor Author

@tuxoko tuxoko Mar 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only place that needs bail out is if you are in writeback context, either you are the one setting I_SYNC or whoever set it is blocked by you.

If you do bailout for other caller then you will get errors randomly in all sorts of places for no good reason.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I_SYNC is set in two kernel functions, writeback_sb_inodes, writeback_single_inode. So if you check for I_SYNC and I_FREEING and you use i_lock correctly as you're supposed to, you get more clean and also reliable solution. How is what am saying wrong specifically and why? Which tons of errors?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@snajpa
I_FREEING and I_SYNC is set on the inode. Every one trying to do zfs_zget will see them. So if you don't have any restriction on caller then every one will bailout and got error as long as eviction and writeback is happening at the same time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well.. I tried playing around with it and it seems like it would be an interesting project to get to the bottom of maybe few other lurking bugs, forcing one to go around and look at how errors from zfs_zget are handled all over the codebase - I think there's a thing in direct IO I'll try to tackle if I find time; but that's not relevant to resolving the deadlock, so, yeah, no easy way other than the _impl split :)

noloop = B_TRUE;
}
} else {
*zpp = zp;
err = 0;
Expand All @@ -1120,7 +1138,7 @@ zfs_zget(zfsvfs_t *zfsvfs, uint64_t obj_num, znode_t **zpp)
sa_buf_rele(db, NULL);
zfs_znode_hold_exit(zfsvfs, zh);

if (err == EAGAIN) {
if (err == EAGAIN && !noloop) {
/* inode might need this to finish evict */
cond_resched();
goto again;
Expand Down
13 changes: 12 additions & 1 deletion module/zfs/zfs_vnops.c
Original file line number Diff line number Diff line change
Expand Up @@ -1149,10 +1149,21 @@ zfs_get_data(void *arg, uint64_t gen, lr_write_t *lr, char *buf,
ASSERT3P(lwb, !=, NULL);
ASSERT3U(size, !=, 0);

error = zfs_zget_impl(zfsvfs, object, &zp, B_TRUE);
#if defined(__linux__)
Copy link
Contributor Author

@tuxoko tuxoko Mar 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added ifdef because I'm not sure if it's possible to return EAGAIN on freebsd or not.
But I guess looking at the caller, it doesn't really matter if we return EAGAIN or EIO.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps it would be more clean to enhance zil_lwb_commit, which is the user of zfs_get_data, to recognize EAGAIN, I think

/*
* Under Linux, EAGAIN indicates the inode is being evicted and I_SYNC
* is also set possibly blocking eviction, so we can't loop in
* zfs_zget to avoid deadlock. Return EIO to force txg sync under such
* scenario.
*/
if (error == EAGAIN)
return (SET_ERROR(EIO));
#endif
/*
* Nothing to do if the file has been removed
*/
if (zfs_zget(zfsvfs, object, &zp) != 0)
if (error)
return (SET_ERROR(ENOENT));
if (zp->z_unlinked) {
/*
Expand Down
Loading