Skip to content

Commit e96bb93

Browse files
srabinovjfvogel
authored andcommitted
bonding: ratelimit no-delay interface up messages
This work around should be reverted when upstream commit (d8b91dd Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip) is available in uek. Issue appear is fixed in upstream tag 4.16.0 . Tag 4.15.0 still has this issue. The lask known commit on the perf topic branch that solve this issue is this: (c19d084 (tag: perf-core-for-mingo-4.16-20180125) perf trace beauty flock: Move to separate object file). Without this commit the perf topic branch has the below issue. With this commit the branch does not have the issue. Issue is that the above commit does not fix the issue on top of upstream tag 4.15.0. So the issue is probably fixed by this commit and some additional commits on the perf topic branch *or/and* on master branch below the point that the perf branch was branched. Also this specific commit is not a fix and the only possible relation to this bug is that it touches the 'flock' code which is used by bash/scripts to synchronize. To find the additional commits via git bisect I need to re-order the commits so that the above commit will be *below* the other commits that solve this issue. To do that I need to know what's the lowest commit that relate to this fix. I do not know and have no way to know that. Attempt to merge the perf topic on top of uek5 produce ~20k commits and tons of merge conflicts as uek5 is way behind the upstream. So can't even know if the topic branch with it's ~270 commits fix this issue for uek5. So I chose to work-around the issue and wait for the upstream topic merge to obsolite this commit. When issue occuer: Serial is flooded with messages: [71266.680745] bondib0: link status up for interface ib0, enabling it in 0 ms [71266.682740] bondib0: link status up for interface ib0, enabling it in 0 ms [71266.685738] bondib0: link status up for interface ib0, enabling it in 0 ms Then panic occur: [71266.695757] INFO: task NetworkManager:5837 blocked for more than 120 seconds. [71266.695759] Not tainted 4.14.35-1902.0.6.el7uek.x86_64 #2 [71266.695760] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [71266.695761] NetworkManager D 0 5837 1 0x00000082 [71266.695765] Call Trace: [71266.695778] __schedule+0x2bc/0x8da [71266.695782] schedule+0x36/0x7c [71266.695785] schedule_preempt_disabled+0xe/0x10 [71266.695788] __mutex_lock.isra.5+0x20c/0x634 [71266.695792] __mutex_lock_slowpath+0x13/0x15 [71266.695794] mutex_lock+0x2f/0x3a [71266.695800] rtnetlink_rcv_msg+0x1d0/0x289 [71266.695806] ? __skb_try_recv_datagram+0xca/0x174 [71266.695809] ? rtnl_calcit.isra.25+0x110/0x103 [71266.695812] netlink_rcv_skb+0xdf/0x111 [71266.695816] rtnetlink_rcv+0x15/0x17 [71266.695818] netlink_unicast+0x18d/0x255 [71266.695820] netlink_sendmsg+0x2df/0x3cc [71266.695825] sock_sendmsg+0x3e/0x4a [71266.695828] ___sys_sendmsg+0x2b5/0x2c6 [71266.695832] ? arch_tlb_finish_mmu+0x1b/0xcb [71266.695835] ? tlb_finish_mmu+0x23/0x30 [71266.695838] ? unmap_region+0xf4/0x12d [71266.695844] ? lockref_put_or_lock+0x44/0x72 [71266.695846] ? __vma_rb_erase+0x10f/0x1f4 [71266.695850] __sys_sendmsg+0x54/0x8d [71266.695854] SyS_sendmsg+0x12/0x1c [71266.695860] do_syscall_64+0x79/0x1ae [71266.695864] entry_SYSCALL_64_after_hwframe+0x151/0x0 [71266.695866] RIP: 0033:0x7f16f2553c5d [71266.695867] RSP: 002b:00007ffff7a493f0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e [71266.695870] RAX: ffffffffffffffda RBX: 00005570a5026380 RCX: 00007f16f2553c5d [71266.695874] RDX: 0000000000000000 RSI: 00007ffff7a49420 RDI: 0000000000000007 [71266.695875] RBP: 00007ffff7a49420 R08: 0000000000000001 R09: 0000000000000000 [71266.695876] R10: 0000000000000808 R11: 0000000000000293 R12: 00005570a5026380 [71266.695876] R13: 0000000000000000 R14: 0000000000000000 R15: 00007f16d4004b70 Issue analysis: The ip process is hung in addrconf_notify while trying to print to serial one of the below messages: "ADDRCONF(NETDEV_UP): %s: link is not ready\n" "ADDRCONF(NETDEV_CHANGE): %s: link becomes ready\n" The ip process hold the rtnl_lock while network-manager process try to grab this lock in 1 msec loop and every time it fail to grab the lock, the network-manager send additional line to the serial log as seen in the dmesg: "bondib0: link status up for interface ib0, enabling it in 0 ms" So the bond device flood the serial while waiting for the rtnl_lock while ip hold the rtnl_lock while waiting for the serial. Offending stack trace from vmcore is this: PID: 30063 TASK: ffff909c3f675a00 CPU: 7 COMMAND: "ip" #0 [fffffe000013ce38] crash_nmi_callback at ffffffff8e059ba7 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/include/asm/paravirt.h: 99 #1 [fffffe000013ce48] nmi_handle at ffffffff8e032748 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/kernel/nmi.c: 137 #2 [fffffe000013cea0] default_do_nmi at ffffffff8e032c96 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/kernel/nmi.c: 336 #3 [fffffe000013cec8] do_nmi at ffffffff8e032e76 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/kernel/nmi.c: 521 #4 [fffffe000013cef0] end_repeat_nmi at ffffffff8ea0436f /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/entry/entry_64.S: 1750 [exception RIP: delay_tsc+51] RIP: ffffffff8e8558f3 RSP: ffff9f63c6c07390 RFLAGS: 00000046 RAX: 0000000016d23977 RBX: ffffffff903fbc00 RCX: 00009b7616d23038 RDX: 0000000000009b76 RSI: 0000000000000007 RDI: 000000000000095a RBP: ffff9f63c6c07390 R8: 00000000fffffffe R9: 0000000000000000 R10: 0000000000000005 R11: 0000000000020503 R12: 000000000000261f R13: 0000000000000020 R14: ffffffff8f96de2f R15: ffffffff903fbc00 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/include/asm/msr.h: 193 <NMI exception stack> #5 [ffff9f63c6c07390] delay_tsc at ffffffff8e8558f3 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/include/asm/msr.h: 193 #6 [ffff9f63c6c07398] __const_udelay at ffffffff8e855838 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/lib/delay.c: 176 #7 [ffff9f63c6c073a8] wait_for_xmitr at ffffffff8e510dcc /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/include/linux/nmi.h: 126 #8 [ffff9f63c6c073d0] serial8250_console_putchar at ffffffff8e510e6c /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/include/linux/serial_core.h: 265 #9 [ffff9f63c6c073f0] uart_console_write at ffffffff8e509573 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/drivers/tty/serial/serial_core.c: 1886 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/drivers/tty/serial/8250/8250_port.c: 3256 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/drivers/tty/serial/8250/8250_core.c: 598 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/kernel/printk/printk.c: 1574 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/kernel/printk/printk.c: 1766 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/kernel/printk/printk.c: 1808 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/kernel/printk/printk_safe.c: 402 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/kernel/printk/printk.c: 1842 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/ipv6/addrconf.c: 3532 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/kernel/notifier.c: 95 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/kernel/notifier.c: 402 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/core/dev.c: 1682 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/core/dev.c: 1697 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/core/dev.c: 6903 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/core/rtnetlink.c: 2072 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/core/rtnetlink.c: 2624 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/core/rtnetlink.c: 4255 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/netlink/af_netlink.c: 2433 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/core/rtnetlink.c: 4268 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/netlink/af_netlink.c: 1287 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/netlink/af_netlink.c: 1877 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/socket.c: 646 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/socket.c: 2061 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/include/linux/file.h: 26 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/net/socket.c: 2102 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/entry/common.c: 295 /usr/src/debug/kernel-4.14.35/linux-4.14.35-1902.0.6.el7uek/arch/x86/entry/entry_64.S: 247 RIP: 00007faf75ccafd0 RSP: 00007ffc710a9368 RFLAGS: 00000246 RAX: ffffffffffffffda RBX: 000000005c65f66d RCX: 00007faf75ccafd0 RDX: 0000000000000000 RSI: 00007ffc710a93b0 RDI: 0000000000000003 RBP: 00007ffc710a93b0 R8: 0000000000000000 R9: 0000000000000008 R10: 00007ffc710a8f30 R11: 0000000000000246 R12: 0000000000000000 R13: 000000000066a440 R14: 00007ffc710a9458 R15: 00007ffc710a9b88 ORIG_RAX: 000000000000002e CS: 0033 SS: 002b Orabug: 29357838 Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Aron Silverton <aron.silverton@oracle.com> Reviewed-by: John Haxby <john.haxby@oracle.com>
1 parent 7a81273 commit e96bb93

File tree

1 file changed

+12
-5
lines changed

1 file changed

+12
-5
lines changed

drivers/net/bonding/bond_main.c

+12-5
Original file line numberDiff line numberDiff line change
@@ -2102,11 +2102,18 @@ static int bond_miimon_inspect(struct bonding *bond)
21022102
slave->delay = bond->params.updelay;
21032103

21042104
if (slave->delay) {
2105-
netdev_info(bond->dev, "link status up for interface %s, enabling it in %d ms\n",
2106-
slave->dev->name,
2107-
ignore_updelay ? 0 :
2108-
bond->params.updelay *
2109-
bond->params.miimon);
2105+
static DEFINE_RATELIMIT_STATE(
2106+
rs,
2107+
DEFAULT_RATELIMIT_INTERVAL,
2108+
DEFAULT_RATELIMIT_BURST);
2109+
if (__ratelimit(&rs))
2110+
netdev_info(
2111+
bond->dev,
2112+
"link status up for interface %s, enabling it in %d ms\n",
2113+
slave->dev->name,
2114+
ignore_updelay ? 0 :
2115+
bond->params.updelay *
2116+
bond->params.miimon);
21102117
}
21112118
/*FALLTHRU*/
21122119
case BOND_LINK_BACK:

0 commit comments

Comments
 (0)