Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Three-way deadlock between z_iput, sync(), and fsync() #7964

Open
runderwo opened this issue Sep 27, 2018 · 4 comments · May be fixed by #17159
Open

Three-way deadlock between z_iput, sync(), and fsync() #7964

runderwo opened this issue Sep 27, 2018 · 4 comments · May be fixed by #17159
Labels
Bot: Not Stale Override for the stale bot Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@runderwo
Copy link

runderwo commented Sep 27, 2018

System information

Type Version/Name
Distribution Name Debian
Distribution Version Stretch
Linux Kernel 4.17.0-0.bpo.3-amd64
Architecture amd64
ZFS Version 0.7.9-3~bpo9+1
SPL Version 0.7.9-3~bpo9+1

Describe the problem you're observing

Three-way deadlock between z_iput, sync(), and fsync()

Describe how to reproduce the problem

Not sure - it just happens after a few weeks on an AFS fileserver with 4-spindle raidz

I previously had to disable swap due to #7734

There are no hardware errors logged and nothing apparently wrong with the disks. I can continue to use the pool but certain operations will hang in uninterruptible sleep until the system is rebooted.

Include any warning/errors/backtraces from the system logs

[1378261.919390]       Tainted: P           O      4.17.0-0.bpo.3-amd64 #1 Debian 4.17.17-1~bpo9+1
[1378261.924563] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1378261.929657] z_iput          D    0   348      2 0x80000000
[1378261.934686] Call Trace:
[1378261.939606]  ? __schedule+0x3dc/0x860
[1378261.944430]  ? __switch_to_asm+0x34/0x70
[1378261.949234]  ? __switch_to_asm+0x40/0x70
[1378261.953852]  ? bit_wait_io_timeout+0x90/0x90
[1378261.958381]  schedule+0x32/0x80
[1378261.962864]  bit_wait+0xd/0x50
[1378261.967328]  __wait_on_bit+0x55/0x80
[1378261.971814]  ? __switch_to+0x175/0x450
[1378261.976315]  ? bit_wait_io_timeout+0x90/0x90
[1378261.980768]  __inode_wait_for_writeback+0xa7/0xe0
[1378261.985272]  ? init_wait_var_entry+0x40/0x40
[1378261.989790]  inode_wait_for_writeback+0x21/0x30
[1378261.994288]  evict+0xab/0x1c0
[1378261.998756]  taskq_thread+0x2d8/0x500 [spl]
[1378262.003184]  ? wake_up_q+0x70/0x70
[1378262.007632]  kthread+0xf8/0x130
[1378262.012013]  ? taskq_thread_spawn+0x50/0x50 [spl]
[1378262.016393]  ? kthread_create_worker_on_cpu+0x70/0x70
[1378262.020772]  ret_from_fork+0x22/0x40
[1378262.025144] INFO: task HWRaidManager:29978 blocked for more than 120 seconds.
[1378262.029447]       Tainted: P           O      4.17.0-0.bpo.3-amd64 #1 Debian 4.17.17-1~bpo9+1
[1378262.033705] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1378262.037912] HWRaidManager   D    0 29978      1 0x20020080
[1378262.042067] Call Trace:
[1378262.046099]  ? __schedule+0x3dc/0x860
[1378262.050046]  ? default_file_splice_write+0x20/0x20
[1378262.053890]  schedule+0x32/0x80
[1378262.057654]  wb_wait_for_completion+0x5e/0x90
[1378262.061335]  ? remove_wait_queue+0x60/0x60
[1378262.064932]  sync_inodes_sb+0xa7/0x280
[1378262.068438]  ? __queue_work+0x15e/0x410
[1378262.071847]  ? default_file_splice_write+0x20/0x20
[1378262.075180]  iterate_supers+0xb5/0x100
[1378262.078417]  ksys_sync+0x40/0xb0
[1378262.081593]  __ia32_sys_sync+0xa/0x10
[1378262.084747]  do_fast_syscall_32+0x98/0x1e0
[1378262.087866]  entry_SYSCALL_compat_after_hwframe+0x45/0x4d
[1378262.090938] INFO: task vlserver:18469 blocked for more than 120 seconds.
[1378262.093961]       Tainted: P           O      4.17.0-0.bpo.3-amd64 #1 Debian 4.17.17-1~bpo9+1
[1378262.097022] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1378262.100142] vlserver        D    0 18469  18467 0x00000080
[1378262.103303] Call Trace:
[1378262.106405]  ? __schedule+0x3dc/0x860
[1378262.109524]  schedule+0x32/0x80
[1378262.112687]  cv_wait_common+0x115/0x130 [spl]
[1378262.115875]  ? remove_wait_queue+0x60/0x60
[1378262.119227]  zil_commit.part.13+0x86/0x8c0 [zfs]
[1378262.122464]  ? tsd_hash_search.isra.1+0x41/0x90 [spl]
[1378262.125708]  ? tsd_set+0x2db/0x480 [spl]
[1378262.128975]  ? _cond_resched+0x16/0x40
[1378262.132376]  zfs_fsync+0x70/0xe0 [zfs]
[1378262.135798]  zpl_fsync+0x63/0x90 [zfs]
[1378262.138987]  do_fsync+0x38/0x60
[1378262.142092]  __x64_sys_fsync+0x10/0x20
[1378262.145091]  do_syscall_64+0x55/0x110
[1378262.148037]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[1378262.150909] RIP: 0033:0x7fdf15610560
[1378262.153769] RSP: 002b:00007ffddb903ad8 EFLAGS: 00000246 ORIG_RAX: 000000000000004a
[1378262.156735] RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007fdf15610560
[1378262.159688] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000005
[1378262.162604] RBP: 0000559976687a50 R08: 000000000000000c R09: 00007ffddb903a20
[1378262.165473] R10: 000055997532e220 R11: 0000000000000246 R12: 0000559976687a40
[1378262.168331] R13: 00007ffddb903cf8 R14: 00000000ffffff9c R15: 0000559976687a40
[1378382.755489] INFO: task z_iput:348 blocked for more than 120 seconds.
[1378382.758346]       Tainted: P           O      4.17.0-0.bpo.3-amd64 #1 Debian 4.17.17-1~bpo9+1
[1378382.761291] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1378382.764291] z_iput          D    0   348      2 0x80000000
[1378382.767307] Call Trace:
[1378382.770310]  ? __schedule+0x3dc/0x860
[1378382.773326]  ? __switch_to_asm+0x34/0x70
[1378382.776340]  ? __switch_to_asm+0x40/0x70
[1378382.779334]  ? bit_wait_io_timeout+0x90/0x90
[1378382.782321]  schedule+0x32/0x80
[1378382.785295]  bit_wait+0xd/0x50
[1378382.788256]  __wait_on_bit+0x55/0x80
[1378382.791216]  ? __switch_to+0x175/0x450
[1378382.794160]  ? bit_wait_io_timeout+0x90/0x90
[1378382.797118]  __inode_wait_for_writeback+0xa7/0xe0
[1378382.800087]  ? init_wait_var_entry+0x40/0x40
[1378382.803068]  inode_wait_for_writeback+0x21/0x30
[1378382.806036]  evict+0xab/0x1c0
[1378382.809011]  taskq_thread+0x2d8/0x500 [spl]
[1378382.811978]  ? wake_up_q+0x70/0x70
[1378382.814904]  kthread+0xf8/0x130
[1378382.817779]  ? taskq_thread_spawn+0x50/0x50 [spl]
[1378382.820652]  ? kthread_create_worker_on_cpu+0x70/0x70
[1378382.823497]  ret_from_fork+0x22/0x40
[1378382.826314] INFO: task auditd:3285 blocked for more than 120 seconds.
[1378382.829189]       Tainted: P           O      4.17.0-0.bpo.3-amd64 #1 Debian 4.17.17-1~bpo9+1
[1378382.832123] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1378382.835086] auditd          D    0  3285      1 0x00000000
[1378382.838049] Call Trace:
[1378382.841005]  ? __schedule+0x3dc/0x860
[1378382.843965]  schedule+0x32/0x80
[1378382.846931]  cv_wait_common+0x115/0x130 [spl]
[1378382.849881]  ? remove_wait_queue+0x60/0x60
[1378382.852994]  zil_commit.part.13+0x86/0x8c0 [zfs]
[1378382.855987]  ? tsd_hash_search.isra.1+0x41/0x90 [spl]
[1378382.858989]  ? tsd_set+0x2db/0x480 [spl]
[1378382.861953]  ? _cond_resched+0x16/0x40
[1378382.865043]  zfs_fsync+0x70/0xe0 [zfs]
[1378382.868107]  zpl_fsync+0x63/0x90 [zfs]
[1378382.870989]  do_fsync+0x38/0x60
[1378382.873795]  __x64_sys_fsync+0x10/0x20
[1378382.876586]  do_syscall_64+0x55/0x110
[1378382.879372]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[1378382.882154] RIP: 0033:0x7fb4709786ed
[1378382.884956] RSP: 002b:00007fb46e4f0e30 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
[1378382.887829] RAX: ffffffffffffffda RBX: 0000559611942280 RCX: 00007fb4709786ed
[1378382.890722] RDX: 0000559611942280 RSI: 0000000000000000 RDI: 0000000000000004
[1378382.893611] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000f2e
[1378382.896514] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
[1378382.899411] R13: 00007ffd5156da3f R14: 0000000000000000 R15: 00007fb4713f3040
[1378382.902349] INFO: task HWRaidManager:29978 blocked for more than 120 seconds.
[1378382.905289]       Tainted: P           O      4.17.0-0.bpo.3-amd64 #1 Debian 4.17.17-1~bpo9+1
[1378382.908295] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1378382.911381] HWRaidManager   D    0 29978      1 0x20020080
[1378382.914432] Call Trace:
[1378382.917486]  ? __schedule+0x3dc/0x860
[1378382.920538]  ? default_file_splice_write+0x20/0x20
[1378382.923591]  schedule+0x32/0x80
[1378382.926608]  wb_wait_for_completion+0x5e/0x90
[1378382.929588]  ? remove_wait_queue+0x60/0x60
[1378382.932575]  sync_inodes_sb+0xa7/0x280
[1378382.935545]  ? __queue_work+0x15e/0x410
[1378382.938482]  ? default_file_splice_write+0x20/0x20
[1378382.941447]  iterate_supers+0xb5/0x100
[1378382.944384]  ksys_sync+0x40/0xb0
[1378382.947255]  __ia32_sys_sync+0xa/0x10
[1378382.950060]  do_fast_syscall_32+0x98/0x1e0
[1378382.952861]  entry_SYSCALL_compat_after_hwframe+0x45/0x4d
[1378382.955677] INFO: task vlserver:18469 blocked for more than 120 seconds.
[1378382.958524]       Tainted: P           O      4.17.0-0.bpo.3-amd64 #1 Debian 4.17.17-1~bpo9+1
[1378382.961416] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1378382.964385] vlserver        D    0 18469  18467 0x00000080
[1378382.967376] Call Trace:
[1378382.970326]  ? __schedule+0x3dc/0x860
[1378382.973312]  schedule+0x32/0x80
[1378382.976310]  cv_wait_common+0x115/0x130 [spl]
[1378382.979316]  ? remove_wait_queue+0x60/0x60
[1378382.982421]  zil_commit.part.13+0x86/0x8c0 [zfs]
[1378382.985437]  ? tsd_hash_search.isra.1+0x41/0x90 [spl]
[1378382.988453]  ? tsd_set+0x2db/0x480 [spl]
[1378382.991448]  ? _cond_resched+0x16/0x40
[1378382.994535]  zfs_fsync+0x70/0xe0 [zfs]
[1378382.997638]  zpl_fsync+0x63/0x90 [zfs]
[1378383.000610]  do_fsync+0x38/0x60
[1378383.003581]  __x64_sys_fsync+0x10/0x20
[1378383.006548]  do_syscall_64+0x55/0x110
[1378383.009489]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[1378383.012460] RIP: 0033:0x7fdf15610560
[1378383.015399] RSP: 002b:00007ffddb903ad8 EFLAGS: 00000246 ORIG_RAX: 000000000000004a
[1378383.018339] RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007fdf15610560
[1378383.021280] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000005
[1378383.024203] RBP: 0000559976687a50 R08: 000000000000000c R09: 00007ffddb903a20
[1378383.027129] R10: 000055997532e220 R11: 0000000000000246 R12: 0000559976687a40
[1378383.030049] R13: 00007ffddb903cf8 R14: 00000000ffffff9c R15: 0000559976687a40
[1378503.587095] INFO: task z_iput:348 blocked for more than 120 seconds.
[1378503.590045]       Tainted: P           O      4.17.0-0.bpo.3-amd64 #1 Debian 4.17.17-1~bpo9+1
[1378503.593155] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1378503.596215] z_iput          D    0   348      2 0x80000000
[1378503.599285] Call Trace:
[1378503.602332]  ? __schedule+0x3dc/0x860
[1378503.605388]  ? __switch_to_asm+0x34/0x70
[1378503.608448]  ? __switch_to_asm+0x40/0x70
[1378503.611499]  ? bit_wait_io_timeout+0x90/0x90
[1378503.614549]  schedule+0x32/0x80
[1378503.617605]  bit_wait+0xd/0x50
[1378503.620655]  __wait_on_bit+0x55/0x80
[1378503.623694]  ? __switch_to+0x175/0x450
[1378503.626709]  ? bit_wait_io_timeout+0x90/0x90
[1378503.629708]  __inode_wait_for_writeback+0xa7/0xe0
[1378503.632690]  ? init_wait_var_entry+0x40/0x40
[1378503.635654]  inode_wait_for_writeback+0x21/0x30
[1378503.638607]  evict+0xab/0x1c0
[1378503.641578]  taskq_thread+0x2d8/0x500 [spl]
[1378503.644536]  ? wake_up_q+0x70/0x70
[1378503.647460]  kthread+0xf8/0x130
[1378503.650336]  ? taskq_thread_spawn+0x50/0x50 [spl]
[1378503.653185]  ? kthread_create_worker_on_cpu+0x70/0x70
[1378503.656003]  ret_from_fork+0x22/0x40
[1378503.658817] INFO: task auditd:3285 blocked for more than 120 seconds.
[1378503.661676]       Tainted: P           O      4.17.0-0.bpo.3-amd64 #1 Debian 4.17.17-1~bpo9+1
[1378503.664595] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1378503.667544] auditd          D    0  3285      1 0x00000000
[1378503.670506] Call Trace:
[1378503.673447]  ? __schedule+0x3dc/0x860
[1378503.676396]  schedule+0x32/0x80
[1378503.679353]  cv_wait_common+0x115/0x130 [spl]
[1378503.682306]  ? remove_wait_queue+0x60/0x60
[1378503.685414]  zil_commit.part.13+0x86/0x8c0 [zfs]
[1378503.688397]  ? tsd_hash_search.isra.1+0x41/0x90 [spl]
[1378503.691391]  ? tsd_set+0x2db/0x480 [spl]
[1378503.694360]  ? _cond_resched+0x16/0x40
[1378503.697439]  zfs_fsync+0x70/0xe0 [zfs]
[1378503.700490]  zpl_fsync+0x63/0x90 [zfs]
[1378503.703372]  do_fsync+0x38/0x60
[1378503.706182]  __x64_sys_fsync+0x10/0x20
[1378503.708964]  do_syscall_64+0x55/0x110
[1378503.711734]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[1378503.714521] RIP: 0033:0x7fb4709786ed
[1378503.717325] RSP: 002b:00007fb46e4f0e30 EFLAGS: 00000293 ORIG_RAX: 000000000000004a
[1378503.720204] RAX: ffffffffffffffda RBX: 0000559611942280 RCX: 00007fb4709786ed
[1378503.723099] RDX: 0000559611942280 RSI: 0000000000000000 RDI: 0000000000000004
[1378503.725991] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000f2e
[1378503.728895] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
[1378503.731792] R13: 00007ffd5156da3f R14: 0000000000000000 R15: 00007fb4713f3040
[1378503.734737] INFO: task HWRaidManager:29978 blocked for more than 120 seconds.
[1378503.737681]       Tainted: P           O      4.17.0-0.bpo.3-amd64 #1 Debian 4.17.17-1~bpo9+1
[1378503.740688] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1378503.743728] HWRaidManager   D    0 29978      1 0x20020080
[1378503.746782] Call Trace:
[1378503.749832]  ? __schedule+0x3dc/0x860
[1378503.752880]  ? default_file_splice_write+0x20/0x20
[1378503.755929]  schedule+0x32/0x80
[1378503.758928]  wb_wait_for_completion+0x5e/0x90
[1378503.761924]  ? remove_wait_queue+0x60/0x60
[1378503.764908]  sync_inodes_sb+0xa7/0x280
[1378503.767873]  ? __queue_work+0x15e/0x410
[1378503.770812]  ? default_file_splice_write+0x20/0x20
[1378503.773774]  iterate_supers+0xb5/0x100
[1378503.776702]  ksys_sync+0x40/0xb0
[1378503.779576]  __ia32_sys_sync+0xa/0x10
[1378503.782427]  do_fast_syscall_32+0x98/0x1e0
[1378503.785225]  entry_SYSCALL_compat_after_hwframe+0x45/0x4d
rpool:
    version: 5000
    name: 'rpool'
    state: 0
    txg: 475211
    pool_guid: 17441379133630177917
    errata: 0
    hostid: 715009
    hostname: 'seraph'
    com.delphix:has_per_vdev_zaps
    vdev_children: 1
    vdev_tree:
        type: 'root'
        id: 0
        guid: 17441379133630177917
        children[0]:
            type: 'raidz'
            id: 0
            guid: 11960073590447584870
            nparity: 1
            metaslab_array: 35
            metaslab_shift: 37
            ashift: 12
            asize: 15998825529344
            is_log: 0
            create_txg: 4
            com.delphix:vdev_zap_top: 40
            children[0]:
                type: 'disk'
                id: 0
                guid: 13773523373170097779
                path: '/dev/disk/by-id/ata-HGST_HUS724040ALA640_PN2334PBJ3SV0T-part3'
                whole_disk: 0
                DTL: 65
                create_txg: 4
                com.delphix:vdev_zap_leaf: 41
            children[1]:
                type: 'disk'
                id: 1
                guid: 14927593852271164084
                path: '/dev/disk/by-id/ata-WDC_WD4003FZEX-00Z4SA0_WD-WMC1F0D1ZE3K-part3'
                whole_disk: 0
                DTL: 77
                create_txg: 4
                com.delphix:vdev_zap_leaf: 268
            children[2]:
                type: 'disk'
                id: 2
                guid: 4795217381968720306
                path: '/dev/disk/by-id/ata-HGST_HUS724040ALA640_PN2334PBJ525GT-part3'
                whole_disk: 0
                DTL: 76
                create_txg: 4
                com.delphix:vdev_zap_leaf: 75
            children[3]:
                type: 'disk'
                id: 3
                guid: 14529900694222241193
                path: '/dev/disk/by-id/ata-WDC_WD4003FZEX-00Z4SA0_WD-WMC5D0D2U951-part3'
                whole_disk: 0
                DTL: 62
                create_txg: 4
                com.delphix:vdev_zap_leaf: 44
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 4h38m with 0 errors on Sun Sep  9 05:02:45 2018
config:

        NAME                                                  STATE     READ WRITE CKSUM
        rpool                                                 ONLINE       0     0     0
          raidz1-0                                            ONLINE       0     0     0
            ata-HGST_HUS724040ALA640_PN2334PBJ3SV0T-part3     ONLINE       0     0     0
            ata-WDC_WD4003FZEX-00Z4SA0_WD-WMC1F0D1ZE3K-part3  ONLINE       0     0     0
            sda3                                              ONLINE       0     0     0
            ata-WDC_WD4003FZEX-00Z4SA0_WD-WMC5D0D2U951-part3  ONLINE       0     0     0
        spares
          sdf3                                                AVAIL   

errors: No known data errors
@stale
Copy link

stale bot commented Aug 25, 2020

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: Stale No recent activity for issue label Aug 25, 2020
@TheUbuntuGuy
Copy link

This issue is still present in the latest 0.8.4 release. I ran into this on a production workload just a few weeks ago.

@stale stale bot removed the Status: Stale No recent activity for issue label Aug 25, 2020
@behlendorf behlendorf added the Type: Defect Incorrect behavior (e.g. crash, hang) label Aug 25, 2020
@stale
Copy link

stale bot commented Aug 25, 2021

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: Stale No recent activity for issue label Aug 25, 2021
@behlendorf behlendorf added Bot: Not Stale Override for the stale bot and removed Status: Stale No recent activity for issue labels Sep 2, 2021
@tuxoko
Copy link
Contributor

tuxoko commented Mar 17, 2025

I think this is a deadlock issue between inode writeback issued from sync and inode evict.
When evict and writeback happen at the same time on the same inode,
evict will set I_FREEING flag, and writeback will set I_SYNC flag.
I_SYNC flag will cause evict to block at inode_wait_for_writeback(), while writeback will call into zil_commit and directly or indirectly call zfs_get_data and will block at zfs_zget because of I_FREEING.

tuxoko pushed a commit to tuxoko/zfs that referenced this issue Mar 19, 2025
If inode eviction and sync writeback happen on the same inode at the
same time, inode eviction will set I_FREEING and wait for sync
writeback, and sync writeback may eventually calls zfs_get_data and loop
in zfs_zget forever because igrab cannot succeed with I_FREEING, thus
causing deadlock.

To fix this, in zfs_get_data we call a variant of zfs_zget where we
bailout on loop if I_SYNC flag is set, and force the caller to wait for
txg sync.

Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Fixes openzfs#7964
Fixes openzfs#9430
@tuxoko tuxoko linked a pull request Mar 19, 2025 that will close this issue
13 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bot: Not Stale Override for the stale bot Type: Defect Incorrect behavior (e.g. crash, hang)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants