NO-JIRA: OVNK BGP: workaround OCPBUGS-56488 #29853

Merged
merged 1 commit into from
May 29, 2025

jcaamano
Contributor

@jcaamano jcaamano commented May 27, 2025

Due to a bug in the liveness probe of kubernetes-nmstate reported in OCPBUGS-56488, we need, as a workaround, to scale down the operator and disable the probe.
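The two steps of the workaround can be expressed as patch payloads. The following is a minimal sketch using only the standard library, not the code from this PR: the helper names `scaleToZeroPatch` and `removeLivenessProbePatch` and the container index `0` are assumptions for illustration, and the second payload uses an RFC 6902 JSON Patch, which may differ from the mechanism the PR actually uses to disable the probe.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// jsonPatchOp is a single RFC 6902 JSON Patch operation (only the fields
// needed for "remove" are modeled here).
type jsonPatchOp struct {
	Op   string `json:"op"`
	Path string `json:"path"`
}

// scaleToZeroPatch (hypothetical helper) builds a merge patch body that sets
// spec.replicas of a Deployment to 0.
func scaleToZeroPatch() ([]byte, error) {
	return json.Marshal(map[string]interface{}{
		"spec": map[string]interface{}{"replicas": 0},
	})
}

// removeLivenessProbePatch (hypothetical helper) builds a JSON Patch body that
// removes the livenessProbe of the container at containerIndex in a pod
// template. The index is an assumption; a real caller would look it up.
func removeLivenessProbePatch(containerIndex int) ([]byte, error) {
	path := fmt.Sprintf("/spec/template/spec/containers/%d/livenessProbe", containerIndex)
	return json.Marshal([]jsonPatchOp{{Op: "remove", Path: path}})
}

func main() {
	scale, err := scaleToZeroPatch()
	if err != nil {
		panic(err)
	}
	probe, err := removeLivenessProbePatch(0)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(scale))
	fmt.Println(string(probe))
}
```

One design point worth noting: a JSON Patch `remove` fails if the target path no longer exists, so a caller that retries would need to tolerate an already-removed probe, whereas the merge patch for `replicas` can be reapplied safely.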

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 27, 2025
@openshift-ci-robot

@jcaamano: This pull request explicitly references no jira issue.

@jcaamano
Contributor Author

/test e2e-metal-ipi-ovn-dualstack-bgp-local-gw-techpreview

@openshift-ci openshift-ci bot requested review from trozet and tssurya May 27, 2025 13:13
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 27, 2025

// This is a workaround for OCPBUGS-56488: scale down
// nmstate operator and disable liveness probe
o.Eventually(workaroundOCPBUGS56488).WithArguments(oc).WithTimeout(3 * time.Minute).WithPolling(5 * time.Second).Should(o.BeTrue())
Contributor

Can this end up running in parallel? Or do we have just one test in this BeforeEach?

Contributor Author

It's only one test, but it wouldn't be an issue if it ran in parallel.

Comment on lines 1515 to 1519
opPatch := []byte(`{"spec":{"replicas": 0}}`)
_, err := oc.AdminKubeClient().AppsV1().Deployments(nmstateNamespace).Patch(context.Background(), "nmstate-operator", types.MergePatchType, opPatch, metav1.PatchOptions{})
if err != nil {
	return false, err
}
Contributor

Maybe we need to wait for the nmstate-operator pods to be gone after scaling them down?

Contributor Author

Unlikely to be an issue: we wait until the daemonset is fully deployed with the new probe, so a problem would require the operator to still be running (not yet scaled down) and yet not to have updated the daemonset in all that time.

Anyway, added a check for this as well.

@jcaamano jcaamano force-pushed the ovnk-bgp-bug56488-wa branch from 46b90af to 2b66987 Compare May 27, 2025 13:37
@jcaamano
Contributor Author

/test e2e-metal-ipi-ovn-dualstack-bgp-local-gw-techpreview

@jcaamano jcaamano force-pushed the ovnk-bgp-bug56488-wa branch 2 times, most recently from 5717d28 to fb607fb Compare May 27, 2025 14:17
@jcaamano
Contributor Author

/test e2e-metal-ipi-ovn-dualstack-bgp-local-gw-techpreview

@jcaamano
Contributor Author

/test ?

Contributor

openshift-ci bot commented May 27, 2025

@jcaamano: The following commands are available to trigger required jobs:

/test e2e-aws-jenkins
/test e2e-aws-ovn-edge-zones
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-image-registry
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-ovn
/test e2e-gcp-ovn-builds
/test e2e-gcp-ovn-image-ecosystem
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi
/test images
/test lint
/test okd-scos-images
/test unit
/test verify
/test verify-deps

The following commands are available to trigger optional jobs:

/test 4.12-upgrade-from-stable-4.11-e2e-aws-ovn-upgrade-rollback
/test e2e-agnostic-ovn-cmd
/test e2e-aws
/test e2e-aws-csi
/test e2e-aws-disruptive
/test e2e-aws-etcd-certrotation
/test e2e-aws-etcd-recovery
/test e2e-aws-ovn
/test e2e-aws-ovn-cgroupsv2
/test e2e-aws-ovn-etcd-scaling
/test e2e-aws-ovn-ipsec-serial
/test e2e-aws-ovn-kube-apiserver-rollout
/test e2e-aws-ovn-kubevirt
/test e2e-aws-ovn-serial-publicnet-1of2
/test e2e-aws-ovn-serial-publicnet-2of2
/test e2e-aws-ovn-single-node
/test e2e-aws-ovn-single-node-serial
/test e2e-aws-ovn-single-node-techpreview
/test e2e-aws-ovn-single-node-techpreview-serial
/test e2e-aws-ovn-single-node-upgrade
/test e2e-aws-ovn-upgrade
/test e2e-aws-ovn-upgrade-rollback
/test e2e-aws-ovn-upi
/test e2e-aws-ovn-virt-techpreview
/test e2e-aws-proxy
/test e2e-azure
/test e2e-azure-ovn-etcd-scaling
/test e2e-azure-ovn-upgrade
/test e2e-baremetalds-kubevirt
/test e2e-external-aws
/test e2e-external-aws-ccm
/test e2e-external-vsphere-ccm
/test e2e-gcp-csi
/test e2e-gcp-disruptive
/test e2e-gcp-fips-serial-1of2
/test e2e-gcp-fips-serial-2of2
/test e2e-gcp-ovn-etcd-scaling
/test e2e-gcp-ovn-rt-upgrade
/test e2e-gcp-ovn-techpreview
/test e2e-gcp-ovn-techpreview-serial-1of2
/test e2e-gcp-ovn-techpreview-serial-2of2
/test e2e-gcp-ovn-usernamespace
/test e2e-hypershift-conformance
/test e2e-metal-ipi-ovn
/test e2e-metal-ipi-ovn-dualstack
/test e2e-metal-ipi-ovn-dualstack-bgp-local-gw-techpreview
/test e2e-metal-ipi-ovn-dualstack-bgp-techpreview
/test e2e-metal-ipi-ovn-dualstack-local-gateway
/test e2e-metal-ipi-ovn-kube-apiserver-rollout
/test e2e-metal-ipi-serial-1of2
/test e2e-metal-ipi-serial-2of2
/test e2e-metal-ipi-serial-ovn-ipv6-1of2
/test e2e-metal-ipi-serial-ovn-ipv6-2of2
/test e2e-metal-ipi-virtualmedia
/test e2e-metal-ovn-single-node-live-iso
/test e2e-metal-ovn-single-node-with-worker-live-iso
/test e2e-metal-ovn-two-node-arbiter
/test e2e-metal-ovn-two-node-fencing
/test e2e-openstack-ovn
/test e2e-openstack-serial
/test e2e-vsphere-ovn-dualstack-primaryv6
/test e2e-vsphere-ovn-etcd-scaling
/test okd-e2e-gcp
/test okd-scos-e2e-aws-ovn

Use /test all to run the following jobs that were automatically triggered:

pull-ci-openshift-origin-main-4.12-upgrade-from-stable-4.11-e2e-aws-ovn-upgrade-rollback
pull-ci-openshift-origin-main-e2e-agnostic-ovn-cmd
pull-ci-openshift-origin-main-e2e-aws
pull-ci-openshift-origin-main-e2e-aws-csi
pull-ci-openshift-origin-main-e2e-aws-disruptive
pull-ci-openshift-origin-main-e2e-aws-ovn
pull-ci-openshift-origin-main-e2e-aws-ovn-cgroupsv2
pull-ci-openshift-origin-main-e2e-aws-ovn-edge-zones
pull-ci-openshift-origin-main-e2e-aws-ovn-etcd-scaling
pull-ci-openshift-origin-main-e2e-aws-ovn-fips
pull-ci-openshift-origin-main-e2e-aws-ovn-kube-apiserver-rollout
pull-ci-openshift-origin-main-e2e-aws-ovn-microshift
pull-ci-openshift-origin-main-e2e-aws-ovn-microshift-serial
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-1of2
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-2of2
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-publicnet-1of2
pull-ci-openshift-origin-main-e2e-aws-ovn-serial-publicnet-2of2
pull-ci-openshift-origin-main-e2e-aws-ovn-single-node
pull-ci-openshift-origin-main-e2e-aws-ovn-single-node-serial
pull-ci-openshift-origin-main-e2e-aws-ovn-single-node-upgrade
pull-ci-openshift-origin-main-e2e-aws-ovn-upgrade
pull-ci-openshift-origin-main-e2e-aws-proxy
pull-ci-openshift-origin-main-e2e-azure
pull-ci-openshift-origin-main-e2e-azure-ovn-etcd-scaling
pull-ci-openshift-origin-main-e2e-azure-ovn-upgrade
pull-ci-openshift-origin-main-e2e-gcp-csi
pull-ci-openshift-origin-main-e2e-gcp-disruptive
pull-ci-openshift-origin-main-e2e-gcp-fips-serial-1of2
pull-ci-openshift-origin-main-e2e-gcp-fips-serial-2of2
pull-ci-openshift-origin-main-e2e-gcp-ovn
pull-ci-openshift-origin-main-e2e-gcp-ovn-etcd-scaling
pull-ci-openshift-origin-main-e2e-gcp-ovn-rt-upgrade
pull-ci-openshift-origin-main-e2e-gcp-ovn-upgrade
pull-ci-openshift-origin-main-e2e-hypershift-conformance
pull-ci-openshift-origin-main-e2e-metal-ipi-ovn
pull-ci-openshift-origin-main-e2e-metal-ipi-ovn-dualstack
pull-ci-openshift-origin-main-e2e-metal-ipi-ovn-dualstack-local-gateway
pull-ci-openshift-origin-main-e2e-metal-ipi-ovn-ipv6
pull-ci-openshift-origin-main-e2e-metal-ipi-ovn-kube-apiserver-rollout
pull-ci-openshift-origin-main-e2e-metal-ipi-serial-1of2
pull-ci-openshift-origin-main-e2e-metal-ipi-serial-2of2
pull-ci-openshift-origin-main-e2e-metal-ipi-serial-ovn-ipv6-1of2
pull-ci-openshift-origin-main-e2e-metal-ipi-serial-ovn-ipv6-2of2
pull-ci-openshift-origin-main-e2e-metal-ipi-virtualmedia
pull-ci-openshift-origin-main-e2e-openstack-ovn
pull-ci-openshift-origin-main-e2e-openstack-serial
pull-ci-openshift-origin-main-e2e-vsphere-ovn
pull-ci-openshift-origin-main-e2e-vsphere-ovn-dualstack-primaryv6
pull-ci-openshift-origin-main-e2e-vsphere-ovn-etcd-scaling
pull-ci-openshift-origin-main-e2e-vsphere-ovn-upi
pull-ci-openshift-origin-main-images
pull-ci-openshift-origin-main-lint
pull-ci-openshift-origin-main-okd-e2e-gcp
pull-ci-openshift-origin-main-okd-scos-e2e-aws-ovn
pull-ci-openshift-origin-main-okd-scos-images
pull-ci-openshift-origin-main-unit
pull-ci-openshift-origin-main-verify
pull-ci-openshift-origin-main-verify-deps

In response to this:

/test ?

@jcaamano
Contributor Author

/test e2e-aws-ovn-ipsec-serial

@jcaamano jcaamano force-pushed the ovnk-bgp-bug56488-wa branch from fb607fb to ea97140 Compare May 27, 2025 18:39
@jcaamano
Contributor Author

/test e2e-aws-ovn-ipsec-serial
/test e2e-metal-ipi-ovn-dualstack-bgp-local-gw-techpreview

openshift-trt bot commented May 28, 2025

Job Failure Risk Analysis for sha: ea97140

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-aws-disruptive Medium
[sig-node] static pods should start after being created
Potential external regression detected for High Risk Test analysis
pull-ci-openshift-origin-main-e2e-vsphere-ovn-etcd-scaling Medium
[sig-architecture] platform pods in ns/openshift-etcd should not exit an excessive amount of times
Potential external regression detected for High Risk Test analysis

@jcaamano
Contributor Author

/test e2e-metal-ipi-ovn-dualstack-bgp-local-gw-techpreview

@jcaamano jcaamano force-pushed the ovnk-bgp-bug56488-wa branch from ea97140 to 26364f4 Compare May 28, 2025 12:59
@qinqon
Contributor

qinqon commented May 28, 2025

/lgtm
/approve

Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
@jcaamano jcaamano force-pushed the ovnk-bgp-bug56488-wa branch from 26364f4 to 712a943 Compare May 28, 2025 13:01
@qinqon
Contributor

qinqon commented May 28, 2025

/lgtm
/approve

Contributor

openshift-ci bot commented May 28, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jcaamano, qinqon

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label May 28, 2025
@openshift-ci-robot
/retest-required

Remaining retests: 0 against base HEAD 2836632 and 2 for PR HEAD 712a943 in total

@jcaamano
Contributor Author

/retest-required

Contributor

openshift-ci bot commented May 28, 2025

@jcaamano: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-azure-ovn-upgrade | 712a943 | link | false | /test e2e-azure-ovn-upgrade |
| ci/prow/e2e-metal-ipi-virtualmedia | 712a943 | link | false | /test e2e-metal-ipi-virtualmedia |
| ci/prow/e2e-aws-ovn-serial-publicnet-1of2 | 712a943 | link | false | /test e2e-aws-ovn-serial-publicnet-1of2 |
| ci/prow/e2e-aws-ovn-single-node-serial | 712a943 | link | false | /test e2e-aws-ovn-single-node-serial |
| ci/prow/e2e-aws-ovn-kube-apiserver-rollout | 712a943 | link | false | /test e2e-aws-ovn-kube-apiserver-rollout |
| ci/prow/e2e-gcp-disruptive | 712a943 | link | false | /test e2e-gcp-disruptive |
| ci/prow/okd-e2e-gcp | 712a943 | link | false | /test okd-e2e-gcp |
| ci/prow/e2e-gcp-ovn-etcd-scaling | 712a943 | link | false | /test e2e-gcp-ovn-etcd-scaling |
| ci/prow/e2e-gcp-fips-serial-2of2 | 712a943 | link | false | /test e2e-gcp-fips-serial-2of2 |
| ci/prow/e2e-vsphere-ovn-etcd-scaling | 712a943 | link | false | /test e2e-vsphere-ovn-etcd-scaling |
| ci/prow/e2e-aws-disruptive | 712a943 | link | false | /test e2e-aws-disruptive |
| ci/prow/e2e-aws-ovn-single-node-upgrade | 712a943 | link | false | /test e2e-aws-ovn-single-node-upgrade |
| ci/prow/e2e-openstack-ovn | 712a943 | link | false | /test e2e-openstack-ovn |
| ci/prow/e2e-aws-ovn | 712a943 | link | false | /test e2e-aws-ovn |
| ci/prow/e2e-vsphere-ovn-dualstack-primaryv6 | 712a943 | link | false | /test e2e-vsphere-ovn-dualstack-primaryv6 |
| ci/prow/e2e-gcp-fips-serial-1of2 | 712a943 | link | false | /test e2e-gcp-fips-serial-1of2 |
| ci/prow/4.12-upgrade-from-stable-4.11-e2e-aws-ovn-upgrade-rollback | 712a943 | link | false | /test 4.12-upgrade-from-stable-4.11-e2e-aws-ovn-upgrade-rollback |
| ci/prow/e2e-azure-ovn-etcd-scaling | 712a943 | link | false | /test e2e-azure-ovn-etcd-scaling |
| ci/prow/e2e-metal-ipi-ovn-kube-apiserver-rollout | 712a943 | link | false | /test e2e-metal-ipi-ovn-kube-apiserver-rollout |
| ci/prow/e2e-aws-ovn-etcd-scaling | 712a943 | link | false | /test e2e-aws-ovn-etcd-scaling |
| ci/prow/e2e-openstack-serial | 712a943 | link | false | /test e2e-openstack-serial |

openshift-trt bot commented May 28, 2025

Job Failure Risk Analysis for sha: 712a943

Job Name Failure Risk
pull-ci-openshift-origin-main-e2e-aws-ovn-etcd-scaling High
[sig-etcd][Feature:EtcdVerticalScaling][Suite:openshift/etcd/scaling][Serial] etcd is able to vertically scale up and down when CPMS is disabled [apigroup:machine.openshift.io]
This test has passed 100.00% of 1 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:aws SecurityMode:default Topology:ha Upgrade:none] in the last week.
---
operator conditions etcd
This test has passed 99.71% of 3785 runs on release 4.20 [Overall] in the last week.
---
operator conditions kube-apiserver
This test has passed 99.42% of 3785 runs on release 4.20 [Overall] in the last week.
pull-ci-openshift-origin-main-e2e-aws-ovn-single-node-upgrade IncompleteTests
Tests for this run (26) are below the historical average (3811): IncompleteTests (not enough tests ran to make a reasonable risk analysis; this could be due to infra, installation, or upgrade problems)
pull-ci-openshift-origin-main-e2e-gcp-disruptive High
[sig-node] static pods should start after being created
This test has passed 99.06% of 5309 runs on release 4.20 [Overall] in the last week.
---
[sig-etcd] etcd should not log excessive took too long messages
This test has passed 98.11% of 5227 runs on release 4.20 [Overall] in the last week.
---
[bz-Etcd] clusteroperator/etcd should not change condition/Available
This test has passed 99.81% of 5309 runs on release 4.20 [Overall] in the last week.
---
[sig-arch][Late] operators should not create watch channels very often
This test has passed 99.79% of 4852 runs on release 4.20 [Overall] in the last week.

Open Bugs
Component Readiness Shows Old Test Name For Renamed Tests
ResilientWatchCacheInitialization (Re)enablement - operator watch counts from component readiness
pull-ci-openshift-origin-main-e2e-gcp-ovn-etcd-scaling Low
[bz-Cloud Compute] clusteroperator/control-plane-machine-set should not change condition/Degraded
This test has passed 0.00% of 1 runs on release 4.20 [Architecture:amd64 FeatureSet:default Installer:ipi JobTier:rare Network:ovn NetworkStack:ipv4 Owner:eng Platform:gcp SecurityMode:default Topology:ha Upgrade:none] in the last week.

@openshift-merge-bot openshift-merge-bot bot merged commit 88edc2f into openshift:main May 29, 2025
38 of 59 checks passed
@openshift-bot
Contributor

[ART PR BUILD NOTIFIER]

Distgit: openshift-enterprise-tests
This PR has been included in build openshift-enterprise-tests-container-v4.20.0-202505290511.p0.g88edc2f.assembly.stream.el9.
All builds following this will include this PR.
