ocfs2: ocfs2 crash due to invalid h_next_leaf_blk value in extent block
One of our customers reported the following stack.
crash-7.3.0> bt
PID: 250515 TASK: ffff888189482f80 CPU: 1 COMMAND: "vmbackup"
#0 [ffffc90025017878] die at ffffffff81033c22
#1 [ffffc900250178a8] do_trap at ffffffff81030990
#2 [ffffc900250178f8] do_error_trap at ffffffff810311d7
#3 [ffffc900250179c0] do_invalid_op at ffffffff81031310
#4 [ffffc900250179d0] invalid_op at ffffffff81a01f2a
[exception RIP: ocfs2_truncate_rec+1914]
RIP: ffffffffc1e73b4a RSP: ffffc90025017a80 RFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000053a75 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff8882d385be08 RDI: ffff8882d385be08
RBP: ffffc90025017b10 R8: 0000000000000000 R9: 0000000000005900
R10: 0000000000000001 R11: 0000000000aaaaaa R12: 0000000000000001
R13: ffff88829e5a9900 R14: ffffc90025017cf0 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: e030 SS: e02b
#5 [ffffc90025017b18] ocfs2_remove_extent at ffffffffc1e73e6c [ocfs2]
#6 [ffffc90025017bc8] ocfs2_remove_btree_range at ffffffffc1e745f2 [ocfs2]
#7 [ffffc90025017c60] ocfs2_commit_truncate at ffffffffc1e75b1f [ocfs2]
#8 [ffffc90025017d68] __dta_ocfs2_wipe_inode_606 at ffffffffc1e9a3e0 [ocfs2]
#9 [ffffc90025017dd8] ocfs2_evict_inode at ffffffffc1e9ac10 [ocfs2]
RIP: 00007f9b26ec8307 RSP: 00007ffc5a193f68 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: 0000000000ddd0a0 RCX: 00007f9b26ec8307
RDX: 0000000000000001 RSI: 00007f9b2719e770 RDI: 0000000001010400
RBP: 0000000001263d80 R8: 0000000000000000 R9: 00000000012146a0
R10: 000000000000000d R11: 0000000000000246 R12: 0000000000ddd0a0
R13: 00007f9b27ba9595 R14: 00007f9b27ca4a50 R15: 00000000ffffffff
ORIG_RAX: 0000000000000057 CS: 0033 SS: 002b crash-7.3.0>
This crash resulted from an invalid extent record being selected for
truncate.
At the top of ocfs2_truncate_rec(), the code checks whether the first
extent record in the leaf extent list corresponding to the input path is
empty. If so, the tree is rotated left to get rid of the empty extent
record; in this case, however, that rotation did not happen. Nevertheless,
ocfs2_truncate_rec() assumes that the top-level call to
ocfs2_rotate_tree_left() always succeeds in removing the empty extent,
and so it decrements the input "index" value. As a result, a wrong record
is selected for truncate, which hits the call to BUG() with the message
"Owner %llu: Invalid record truncate: (%u, %u) ".
The stack above is the panic stack produced by hitting that BUG().
Although ocfs2_rotate_tree_left() was intended to get rid of the first
empty record in the extent block, it did not call
ocfs2_rotate_rightmost_leaf_left(), because it found h_next_leaf_blk in
the extent leaf block to be non-zero; instead, it proceeded to call
__ocfs2_rotate_tree_left(). However, the input "index" value did in fact
point to the last extent record in the leaf block. The macro
path_leaf_bh() returned the rightmost extent block per the tree depth,
and ocfs2_find_cpos_for_right_leaf() also determined that the extent
block in question is indeed the rightmost, so there was nothing to
rotate at the last extent record pointed to by the input "index" value.
Hence the extent list in the leaf block was not rotated at all.
Hence, the real reason for the above panic is that the h_next_leaf_blk
field in the rightmost leaf block was non-zero, which prevented the tree
from rotating left and resulted in an invalid record being selected for
truncate.
The reason why h_next_leaf_blk was not cleared for the last extent block
is still not known. The code change here is a workaround that avoids the
panic by verifying that the extent block in question is indeed the
rightmost leaf block in the tree and then correcting the invalid
h_next_leaf_blk value. The customer has verified these changes by
running the provided rpm in their environment.
Orabug: 34393593
Signed-off-by: Gautham Ananthakrishna <gautham.ananthakrishna@oracle.com>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>