Commit dd1e6d4

libvirt: Increase incremental and max sleep time during device detach
Bug #1894804 outlines how DEVICE_DELETED events were often missing from
QEMU on Focal based OpenStack CI hosts, as originally seen in bug
#1882521. This has eventually been tracked down to some undefined QEMU
behaviour when a new device_del QMP command is received while another
is still being processed, causing the original attempt to be aborted.

We hit this race in slower OpenStack CI envs because n-cpu rather
crudely retries attempts to detach devices using the RetryDecorator
from oslo.service. The default incremental sleep time is currently
short enough that QEMU is still processing the first device_del request
on these slower CI hosts when n-cpu asks libvirt to retry the detach.
The retry sends another device_del to QEMU, which triggers the
behaviour described above.

Additionally, we have also seen the following check being hit when
testing with QEMU >= v5.0.0. This check now rejects overlapping
device_del requests in QEMU rather than aborting the original:

qemu/qemu@cce8944

This change aims to avoid this situation entirely by raising the
default incremental sleep time between detach requests from 2 seconds
to 10, leaving enough time for the first attempt to complete. The
overall maximum sleep time is also increased from 30 to 60 seconds.

Future work will aim to entirely remove this retry logic with a libvirt
event driven approach, polling for the
VIR_DOMAIN_EVENT_ID_DEVICE_REMOVED and
VIR_DOMAIN_EVENT_ID_DEVICE_REMOVAL_FAILED events before retrying.

Finally, the cleanup of unused arguments in detach_device_with_retry is
left for a follow up change in order to keep this initial change small
enough to quickly backport.

Closes-Bug: #1882521
Related-Bug: #1894804
Change-Id: Ib9ed7069cef5b73033351f7a78a3fb566753970d
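
The event driven approach mentioned as future work could look roughly like the sketch below. It is only an illustration of "wait for the outcome of one request before sending another": `DetachWaiter` and `detach_once` are hypothetical names, no libvirt API is called, and the removal event is simulated with a timer rather than delivered by a real domain event callback.

```python
import threading


class DetachWaiter:
    """Hypothetical helper that blocks until a removal event is reported."""

    def __init__(self):
        self._done = threading.Event()
        self.removed = None  # True: device removed, False: removal failed

    def on_event(self, removed):
        # In a real integration this would be invoked from a libvirt
        # domain event callback; here it is called by hand.
        self.removed = removed
        self._done.set()

    def wait(self, timeout):
        # Returns True if an event arrived before the timeout expired.
        return self._done.wait(timeout)


def detach_once(send_detach, waiter, timeout):
    """Send one detach request, then wait for its outcome.

    Returns True (removed), False (removal failed) or None (no event
    arrived in time, so the caller may decide whether to retry).
    """
    send_detach()
    if waiter.wait(timeout):
        return waiter.removed
    return None


# Simulate the hypervisor reporting removal shortly after the request.
waiter = DetachWaiter()
result = detach_once(
    lambda: threading.Timer(0.05, waiter.on_event, args=(True,)).start(),
    waiter,
    timeout=2.0,
)
```

Because the caller only issues a second request after the first has either reported an outcome or timed out, two requests for the same device would not overlap as long as the timeout is generous enough.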
1 parent 2beb184 commit dd1e6d4

1 file changed, +2 -2 lines changed

nova/virt/libvirt/guest.py

@@ -367,8 +367,8 @@ def get_all_devices(self, devtype=None):
         return devs
 
     def detach_device_with_retry(self, get_device_conf_func, device, live,
-                                 max_retry_count=7, inc_sleep_time=2,
-                                 max_sleep_time=30,
+                                 max_retry_count=7, inc_sleep_time=10,
+                                 max_sleep_time=60,
                                  alternative_device_name=None,
                                  supports_device_missing_error_code=False):
         """Detaches a device from the guest. After the initial detach request,
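
To get a feel for what the new defaults mean in practice, the following sketch compares the old and new wait schedules. It assumes a simplified model in which the delay before each retry grows by `inc_sleep_time` and is capped at `max_sleep_time`, with one delay per retry; the real RetryDecorator in oslo.service may differ in detail. `sleep_schedule` is a hypothetical helper and not part of nova.

```python
def sleep_schedule(max_retry_count, inc_sleep_time, max_sleep_time):
    """Return the modelled delay in seconds before each retry.

    Assumption: the delay increases linearly by inc_sleep_time per retry
    and never exceeds max_sleep_time.
    """
    return [min(inc_sleep_time * attempt, max_sleep_time)
            for attempt in range(1, max_retry_count + 1)]


# Defaults before this change: inc_sleep_time=2, max_sleep_time=30
old = sleep_schedule(7, 2, 30)
# Defaults after this change: inc_sleep_time=10, max_sleep_time=60
new = sleep_schedule(7, 10, 60)

print("old delays:", old, "total:", sum(old))
print("new delays:", new, "total:", sum(new))
```

Under this model the first retry is delayed by 10 seconds instead of 2, which is the gap intended to let the first device_del finish on slow hosts.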
