Also revert "rds: ib: Correct the cm_id compare commit"
This reverts commit 7999a6a.
This reverts commit 7b0b7b2.
Replacing a stale pointer with a cyclically allocated identifier
that's hypothetically less likely to be stale
doesn't really solve the root cause of the staleness.
These pointer compares have a long history going back to:
commit fe00cae ("RDS: give up on half formed connections after 15s")
when a "ic->i_cm_id == cm_id" compare was introduced inside
"rds_ib_cm_handle_connect" in order to test if the upper-layer "ic->i_cm_id"
is already pointing to the underlying "cm_id".
However, upon arrival of a "RDMA_CM_EVENT_CONNECT_REQUEST" event,
the "cm_id" is expected to have been freshly allocated,
("cm_req_handler" unconditionally calls "ib_create_cm_id",
which allocates with "kzalloc")
and therefore cm_id->context is expected to be "NULL".
What's more likely in this case is that "ic->i_cm_id" was never
invalidated when "rds_rdma_cm_event_handler_cmn" returned a non-zero
value, so it kept pointing to a "cm_id" that was subsequently destroyed
("rdma_destroy_id") because of that non-zero return value of the
"event_handler".
A subsequent "kzalloc" may then return the same pointer, not because
it's the same object, but because a different object happened to land
on the same address as the never invalidated "ic->i_cm_id" pointer.
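
The address-reuse scenario above can be sketched in userspace. This is only
an illustration under stated assumptions: "struct fake_cm_id",
"struct fake_conn", the one-slot pool and all "fake_*" helpers are
hypothetical stand-ins (not RDS or RDMA CM code), and the one-slot pool
exists purely to make the address reuse deterministic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for "struct rdma_cm_id" and the RDS connection. */
struct fake_cm_id {
	void *context;
	bool in_use;
};

struct fake_conn {
	struct fake_cm_id *i_cm_id;
};

/* One-slot pool: every allocation lands on the same address. */
static struct fake_cm_id pool_slot;

static struct fake_cm_id *fake_create_id(void)
{
	if (pool_slot.in_use)
		return NULL;
	/* Zeroed like a kzalloc'ed object, so context starts out NULL. */
	memset(&pool_slot, 0, sizeof(pool_slot));
	pool_slot.in_use = true;
	return &pool_slot;
}

static void fake_destroy_id(struct fake_cm_id *id)
{
	id->in_use = false;
}

/*
 * Returns true if the pointer compare reports a match between the
 * connection's remembered id and a freshly created, unrelated id.
 */
bool stale_compare_matches(bool invalidate_on_destroy)
{
	struct fake_conn ic = { .i_cm_id = NULL };
	struct fake_cm_id *old_id, *new_id;
	bool match;

	old_id = fake_create_id();
	ic.i_cm_id = old_id;

	/* The event handler returned non-zero: the lower layer destroys the id. */
	fake_destroy_id(old_id);
	if (invalidate_on_destroy)
		ic.i_cm_id = NULL;

	/* A later connect request gets a new object at the same address. */
	new_id = fake_create_id();
	match = (ic.i_cm_id == new_id);

	fake_destroy_id(new_id);
	return match;
}
```

Without invalidation the compare reports a match even though the two ids
are logically different objects; with invalidation it does not.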
Back to the commits we're reverting here.
There were a number of problems with this commit:
1) "drivers/infiniband/core/cma.c" is completely unaware of the
"rds_ib_rdma_destroy_id" wrapper and will continue to call
"rdma_destroy_id" instead.
So the point of keeping track of and being able to invalidate these identifiers
is missed whenever it's "cma.c" that destroys the "cm_id" due to a non-zero
return value of "event_handler".
Not only that, but this will inevitably lead to identifier / memory-leaks.
2) The identifiers started at 0 (start == 0) in "idr_alloc_cyclic"
and were then cast to "(struct rds_connection *)(unsigned long)".
If we ignore the confusing nature of this pointer cast for a moment,
we can no longer distinguish between a "NULL" pointer that resulted
from casting an IDR value of "== 0" and a real NULL pointer
after the "kzalloc".
A freshly allocated "cm_id" would stand a chance to be mistaken for
"IDR value == 0".
3) There is no error check on the return value of "idr_alloc_cyclic".
If identifiers ran out (e.g. due to the leak described in #1),
"cm_id->context" would just end up with a value of ERR_PTR(-ENOSPC)
or whatever error code "idr_alloc_cyclic" returned.
That would lead to a number of hard to debug problems.
4) "RDS_IB_NO_CTX" was defined as "ERR_PTR(ENOENT)".
The convention is to use a negative errno in the kernel,
i.e. "ERR_PTR(-ENOENT)".
"ERR_PTR" is defined as an inline function that just casts
its parameter to a pointer.
By using a positive value of (ENOENT == 2) this also causes
a collision with IDR value "== 2", as both of them end up
with the same value in "cm_id->context".
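
The value collisions described in #2, #3 and #4 can be reproduced with a
small userspace sketch. The "ERR_PTR" and "IS_ERR" helpers below are
simplified re-creations of the kernel ones, and "id_to_context" is a
hypothetical name for the cast that put the IDR value into
"cm_id->context"; none of this is the reverted code itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified userspace re-creation of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline bool IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Hypothetical helper: an integer identifier stored as a pointer value. */
void *id_to_context(int id)
{
	return (void *)(unsigned long)id;
}

/* #2: an identifier of 0 cannot be told apart from a real NULL pointer. */
bool id_zero_looks_like_null(void)
{
	return id_to_context(0) == NULL;
}

/* #4: a positive errno (ENOENT is 2 on Linux) equals the identifier of the same value. */
bool positive_errno_collides_with_id(void)
{
	return ERR_PTR(ENOENT) == id_to_context(ENOENT);
}

/* #4: only the negative errno lands in the range that IS_ERR recognizes. */
bool negative_errno_is_distinguishable(void)
{
	return IS_ERR(ERR_PTR(-ENOENT)) && !IS_ERR(ERR_PTR(ENOENT));
}

/* #3: an unchecked negative return value ends up stored as an error pointer. */
bool unchecked_alloc_error_becomes_context(void)
{
	int ret = -ENOSPC;	/* what a failed allocation would return */

	return IS_ERR(id_to_context(ret));
}
```

Each function returns true, which is the point: every one of these values
is ambiguous or invalid once it is stored in a pointer-typed field.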
So instead of making stale pointers or stale identifiers less likely,
we fix the root-cause(s) of these problems in subsequent commits.
Please note that these reverts also remove code that had been
introduced on behalf of this functionality (now reverted)
between the original commit and this revert in
commit 2b13b0b ("rds: add tracepoint for RDS IB errors, info")
* -DEFINE_EVENT(rds_ib, rds_ib_cm_mismatch, ...);
* - reason = "rds_ib_rdma_create_id failed";
+ reason = "rdma_create_id failed";
* -bool rds_ib_same_cm_id(struct rds_ib_connection *ic, struct rdma_cm_id *cm_id)
* -EXPORT_TRACEPOINT_SYMBOL_GPL(rds_ib_cm_mismatch);
Orabug: 32373816
Fixes: 7b0b7b2 ("rds: ib: Correct the cm_id compare commit")
Fixes: 7999a6a ("rds: ib: Implement proper cm_id compare")
Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
Reviewed-by: Ka-cheong Poon <ka-cheong.poon@oracle.com>
(cherry picked from commit d91b564)
Port to U3
Signed-off-by: Jack Vogel <jack.vogel@oracle.com>
Orabug: 33590097
UEK6 => UEK7
(cherry picked from commit 95037a6)
cherry-pick-repo=UEK/production/linux-uek.git
Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Orabug: 33590087
UEK7 => LUCI
(cherry picked from commit e359ce1)
cherry-pick-repo=UEK/production/linux-uek.git
Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>