-
Notifications
You must be signed in to change notification settings - Fork 185
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NIC passthrough perf test #3706
base: main
Are you sure you want to change the base?
Conversation
For device passthrough case, some test need specific PCI device to be run with. Vendor/Device id based pool might assign unwanted devices if many devices are present. Addind support to pass bdf list to create pool here. Signed-off-by: Smit Gardhariya <sgardhariya@microsoft.com>
Add suuport to run iperf3 server with specific IP. Signed-off-by: Smit Gardhariya <sgardhariya@microsoft.com>
Add support to run netperf server/client with specific IP. Signed-off-by: Smit Gardhariya <sgardhariya@microsoft.com>
48b3596
to
ba8af94
Compare
self, | ||
node: Node, | ||
) -> Tuple[RemoteNode, str]: | ||
ctx = get_node_context(node) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid to use context in test cases, use features to operate nodes, so it's easy to support multiple platforms, and decouple cases and platforms.
So, use the NetworkInterface
feature, add a method like enable_passthrough
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, Will make this change
a1a57d3
to
25e82b7
Compare
@@ -135,6 +136,8 @@ def run_as_server_async( | |||
cmd += f" -f {report_unit} " | |||
if port: | |||
cmd += f" -p {port} " | |||
if interface_ip: | |||
cmd += f" -B {interface_ip}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain what does -B
exactly do ?
Something like server_ip
is passed from the client while running the iperf tool. What does the new interface_ip
additional do here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@purna It will run the command and use dedicated interface passed with -B
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One server might have multiple interface and we want to run with dedicated passthrough NIC IP
@@ -207,6 +207,7 @@ def perf_tcp_pps( | |||
test_type: str, | |||
server: Optional[RemoteNode] = None, | |||
client: Optional[RemoteNode] = None, | |||
run_with_internal_address: bool = False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use_internal_address
would be a better name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pupacha Sure will make that change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make this change globally in all files.
if server is not None or client is not None: | ||
assert server is not None, "server need to be specified, if client is set" | ||
assert client is not None, "client need to be specified, if server is set" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if server or client
is sufficient. is not None
is redundant.
Same for the assert statements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pupacha Agree but it was causing issue with flak8 or mypy validation if i remember correctly. So i put it this way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you post the error message here?
assert passthrough_nic_ip, "Can not find interface IP" | ||
|
||
test_node = cast(RemoteNode, node) | ||
test_node.internal_address = passthrough_nic_ip |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is internal_address
is a field to store any internal network's IP. For example, on an Azure node, it is IP associated with Vnet and for libvirt platforms, it is IP of virtual net setup that libvirt already has.
A testcase should not change node's internal_address
. This attribute ownership has to be just with platform.
@squirrelsc , please correct me if my understanding is wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is internal address of node only. In passthrough testing, node will have 2 interfaces configured. One is virtio-net and other is with vfio. Any case as of now won't use internal_address
attribute for passthrough.
For Nating usecases just like we have for libvirt platform, we can use it but node is getting initialized there having host detail similar for internal_address
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont think I fully understand it.
TestCase should not change the internal_address
or any attribute of the node. It should only be done by platform.
There could be other entities like Notifier, Transformer, Feature that could use internal_address
. I agree that its not used anywhere as of now, but we shouldn't change it.
May be we should figure out another way of storing IP assigned with the passthrough-ed NIC. How about having PassthroughNIC
as a feature and platform will create the feature based on the passthrough config from runbook.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@anirudhrb @gamora12 Any thoughts on this ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The feature approach is cleaner. If it works, we should go with that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, we can have a feature for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like a tool more than a feature. The tool returns passthrough IP, and you can use it for testing. The features are used for platform operations, for example, add a data disk by Azure. The passthrough IP can be got from the VM itself, a tool is enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since, the device assignment to VM is already taken care by the platform (with device pool), the only part left is to fetch the IP etc which are all node operations. I agree, Tool
is more applicable here.
server = self._get_host_as_server(variables) | ||
client, _ = self._configure_passthrough_nic_for_node(node) | ||
|
||
# Reboot the nodes. Libvirt sometime re-use the nodes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean Libvirt sometime re-use the node
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rebooting the node will create fresh state conditions. I have seen that is requirements are similar, node is getting re-used.
# Reboot the nodes. Libvirt sometime re-use the nodes. | ||
# Try to run the test on fresh state of the nodes | ||
client.reboot() | ||
server.reboot(time_out=1200) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no change happening to host, from the time the test started, so reboot is not needed here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have seen some port being occupied issue where we run test with libvirt platform one by one. So, rebooting it will create fresh state of VM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why would server ports be occupied?
If the tests are doing this, it should be cleanup proper in the after_case section.
vendor_id=vendor_id, | ||
device_id=device_id, | ||
) | ||
elif isinstance(devices, dict): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be PciAddressIdentifier
instead of dict
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be dict when it runbook loads it. PciAddressIdentifier is getting treated as dict.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If thats the case, why use isinstance(d, VendorDeviceIdIdentifier)
in line 60 ?
Add testcases for nttcp/iperf3/netperf for NIC device passthrough cases for host-guest and guest-guest scenarios. Signed-off-by: Smit Gardhariya <sgardhariya@microsoft.com>
25e82b7
to
9ee8763
Compare
return test_node, interface_name | ||
|
||
def _get_host_as_server(self, variables: Dict[str, Any]) -> RemoteNode: | ||
ip = variables.get("baremetal_host_ip", "") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't need to be baremetal, and the name should specify where the variables used. How about networkperf_peer_xxx
?
@dataclass | ||
class PciAddressIdentifier: | ||
# list of bdf like 0000:3b:00.0 - <domain>:<bus>:<slot>.<fn> | ||
pci_bdf: List[str] = field(default_factory=list) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Identifier should have one BDF. Same as how the VendorDeviceIdIdentifer has one set of (vendor_id, device_id).
pool[iommu_group] = devices | ||
self.available_host_devices[pool_type] = pool | ||
|
||
def _get_pci_address_instance( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be a constructor in DeviceAddressSchema
.
Also, DeviceAddressSchema
can be renamed to just DeviceAddress
if len(devices) > 1: | ||
raise LisaException( | ||
"Device Pool does not support more than one " | ||
"vendor/device id list for given pool type" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we support one set of vendor id + device id or one list?
Add support to create libvirt device pool from BDF list
Add NIC perf test for passthrough scenario