Skip to content

Commit 747f45e

Browse files
gerd-rauschvijay-suman
authored andcommitted
Revert "rds: ib: Implement proper cm_id compare"
Also revert "rds: ib: Correct the cm_id compare commit" This reverts commit 7999a6a. This reverts commit 7b0b7b2. Replacing a stale pointer by a cyclically allocated identifier that's hypothetically less likely to be stale doesn't really solve the root cause of the staleness. These pointer compares have a long history going back to: commit fe00cae ("RDS: give up on half formed connections after 15s") when a "ic->i_cm_id == cm_id" compare was introduced inside "rds_ib_cm_handle_connect" in order to test if the upper-layer "ic->i_cm_id" is already pointing to the underlying "cm_id". However, upon arrival of a "RDMA_CM_EVENT_CONNECT_REQUEST" event, the "cm_id" is expected to have been freshly allocated, ("cm_req_handler" unconditionally calls "ib_create_cm_id", which allocates with "kzalloc") and therefore cm_id->context is expected to be "NULL". What's more likely in this case is that due to the missing invalidation of "ic->i_cm_id" whenever "rds_rdma_cm_event_handler_cmn" returns a non-zero value, that "ic->i_cm_id" was still pointing to a "cm_id" that consequently gets destroyed ("rdma_destroy_id") due to the non-zero return value of the "event_handler". So a subsequent "kzalloc" may return the same pointer, but not because it's the same object, but a different object that happened to land on the same address as the never invalidated "ic->i_cm_id" pointer. Back to the commits we're reverting here. There were a number of problems with this commit: 1) "drivers/infiniband/core/cma.c" is completely unaware of of the "rds_ib_rdma_destroy_id" wrapper and will continue to call "rdma_destroy_id" instead. So the point of keeping track of and being able to invalidate these identifiers is missed whenever it's "cma.c" that destroys the "cm_id" due to a non-zero return value of "event_handler". Not only that, but this will inevitably lead to identifier / memory-leaks. 2) The identifiers started with (start == 0) in "idr_alloc_cyclic", and then cast to "(struct rds_connection *)(unsigned long)". If we ignore the confusing nature of this pointer cast for a moment, we now no longer can distinguish between a "NULL" pointer that was a result of having cast an IDR value of "== 0" and a real NULL pointer after the "kzalloc". A freshly allocated "cm_id" would stand a chance to be mistaken for "IDR value == 0". 3) There is no error check on the return value of "idr_alloc_cyclic". If identifiers would run out (e.g. due to the leak desribed in #1), "cm_id->context" would just end up with a value of PTR_ERR(-ENOSPC) or whatever error code "idr_alloc_cyclic" returned. That would lead to a number of hard to debug problems. 4) "RDS_IB_NO_CTX" was defined as "ERR_PTR(ENOENT)" The convention is to use a negative errno in the kernel, i.e. "ERR_PTR(-ENOENT)". "ERR_PTR" is defined as an inline function that just casts its parameter to a pointer. By using a positive value of (ENOENT == 2) this also causes a collision with IDR value "== 2", as both of them end up with the same value in "cm_id->context". So instead of making stale pointers or stale identifiers less likely, we fix the root-cause(s) of these problems in subsequent commits. Please note that these reverts also removes code that had been introduced on behalf of this functionality (ow reverted) between the original commit and this revert in commit 2b13b0b ("rds: add tracepoint for RDS IB errors, info") * -DEFINE_EVENT(rds_ib, rds_ib_cm_mismatch, ...); * - reason = "rds_ib_rdma_create_id failed"; + reason = "rdma_create_id failed"; * -bool rds_ib_same_cm_id(struct rds_ib_connection *ic, struct rdma_cm_id *cm_id) * -EXPORT_TRACEPOINT_SYMBOL_GPL(rds_ib_cm_mismatch); Orabug: 32373816 Fixes: 7b0b7b2 ("rds: ib: Correct the cm_id compare commit") Fixes: 7999a6a ("rds: ib: Implement proper cm_id compare") Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com> Reviewed-by: Ka-cheong Poon <ka-cheong.poon@oracle.com> (cherry picked from commit d91b564) Port to U3 Signed-off-by: Jack Vogel <jack.vogel@oracle.com> Orabug: 33590097 UEK6 => UEK7 (cherry picked from commit 95037a6) cherry-pick-repo=UEK/production/linux-uek.git Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Orabug: 33590087 UEK7 => LUCI (cherry picked from commit e359ce1) cherry-pick-repo=UEK/production/linux-uek.git Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com> Reviewed-by: William Kucharski <william.kucharski@oracle.com>
1 parent 637d076 commit 747f45e

File tree

7 files changed

+21
-132
lines changed

7 files changed

+21
-132
lines changed

net/rds/ib.c

+3-4
Original file line numberDiff line numberDiff line change
@@ -902,7 +902,6 @@ static int rds_ib_laddr_check_cm(struct net *net, const struct in6_addr *addr,
902902
__u32 scope_id)
903903
{
904904
int ret;
905-
struct rds_ib_connection dummy_ic;
906905
struct rdma_cm_id *cm_id;
907906
#if IS_ENABLED(CONFIG_IPV6)
908907
struct sockaddr_in6 sin6;
@@ -932,8 +931,8 @@ static int rds_ib_laddr_check_cm(struct net *net, const struct in6_addr *addr,
932931
/* Create a CMA ID and try to bind it. This catches both
933932
* IB and iWARP capable NICs.
934933
*/
935-
cm_id = rds_ib_rdma_create_id(net, rds_rdma_cm_event_handler, &dummy_ic,
936-
NULL, RDMA_PS_TCP, IB_QPT_RC);
934+
cm_id = rdma_create_id(net, rds_rdma_cm_event_handler,
935+
NULL, RDMA_PS_TCP, IB_QPT_RC);
937936
if (IS_ERR(cm_id))
938937
return -EADDRNOTAVAIL;
939938

@@ -996,7 +995,7 @@ static int rds_ib_laddr_check_cm(struct net *net, const struct in6_addr *addr,
996995
cm_id->device ? cm_id->device->node_type : -1);
997996

998997
out:
999-
rds_ib_rdma_destroy_id(cm_id);
998+
rdma_destroy_id(cm_id);
1000999

10011000
return ret;
10021001
}

net/rds/ib.h

-65
Original file line numberDiff line numberDiff line change
@@ -209,7 +209,6 @@ struct rds_ib_connection {
209209
/* alphabet soup, IBTA style */
210210
struct rw_semaphore i_cm_id_free_lock;
211211
struct rdma_cm_id *i_cm_id;
212-
struct rds_connection *i_cm_id_ctx;
213212
struct ib_pd *i_pd;
214213
struct ib_mr *i_mr;
215214
struct ib_cq *i_scq;
@@ -564,69 +563,6 @@ struct rds_ib_statistics {
564563

565564
extern struct workqueue_struct *rds_ib_wq;
566565

567-
#define RDS_IB_NO_CTX ERR_PTR(ENOENT)
568-
569-
static inline struct rds_connection *rds_ib_map_conn(struct rds_connection *conn)
570-
{
571-
int id;
572-
573-
mutex_lock(&cm_id_map_lock);
574-
id = idr_alloc_cyclic(&cm_id_map, conn, 0, 0, GFP_KERNEL);
575-
mutex_unlock(&cm_id_map_lock);
576-
577-
if (id < 0)
578-
return ERR_PTR(id);
579-
580-
return (struct rds_connection *)(unsigned long)id;
581-
}
582-
583-
static inline struct rdma_cm_id *rds_ib_rdma_create_id(struct net *net,
584-
rdma_cm_event_handler event_handler,
585-
struct rds_ib_connection *ic,
586-
void *context,
587-
enum rdma_ucm_port_space ps,
588-
enum ib_qp_type qp_type)
589-
{
590-
ic->i_cm_id_ctx = rds_ib_map_conn(context);
591-
592-
return rdma_create_id(net, event_handler, ic->i_cm_id_ctx, ps, qp_type);
593-
}
594-
595-
static inline struct rds_connection *rds_ib_get_conn(struct rdma_cm_id *cm_id)
596-
{
597-
struct rds_connection *conn;
598-
599-
mutex_lock(&cm_id_map_lock);
600-
conn = idr_find(&cm_id_map, (unsigned long)cm_id->context);
601-
mutex_unlock(&cm_id_map_lock);
602-
603-
return conn;
604-
}
605-
606-
static inline void rds_ib_rdma_unlink_id(struct rdma_cm_id *cm_id)
607-
{
608-
struct rds_ib_connection *ic = NULL;
609-
struct rds_connection *conn;
610-
611-
conn = rds_ib_get_conn(cm_id);
612-
if (conn)
613-
ic = conn->c_transport_data;
614-
if (ic)
615-
ic->i_cm_id_ctx = RDS_IB_NO_CTX;
616-
617-
mutex_lock(&cm_id_map_lock);
618-
(void)idr_remove(&cm_id_map, (int)(u64)cm_id->context);
619-
mutex_unlock(&cm_id_map_lock);
620-
621-
cm_id->context = RDS_IB_NO_CTX;
622-
}
623-
624-
static inline void rds_ib_rdma_destroy_id(struct rdma_cm_id *cm_id)
625-
{
626-
rds_ib_rdma_unlink_id(cm_id);
627-
rdma_destroy_id(cm_id);
628-
}
629-
630566
/*
631567
* Fake ib_dma_sync_sg_for_{cpu,device} as long as ib_verbs.h
632568
* doesn't define it.
@@ -701,7 +637,6 @@ extern struct list_head ib_nodev_conns;
701637
extern struct socket *rds_ib_inet_socket;
702638

703639
/* ib_cm.c */
704-
bool rds_ib_same_cm_id(struct rds_ib_connection *ic, struct rdma_cm_id *cm_id);
705640
int rds_ib_conn_alloc(struct rds_connection *conn, gfp_t gfp);
706641
void rds_ib_conn_free(void *arg);
707642
bool rds_ib_conn_has_alt_conn(struct rds_connection *conn);

net/rds/ib_cm.c

+9-28
Original file line numberDiff line numberDiff line change
@@ -1439,23 +1439,6 @@ u32 __rds_find_ifindex_v4(struct net *net, __be32 addr)
14391439
return idx;
14401440
}
14411441

1442-
bool rds_ib_same_cm_id(struct rds_ib_connection *ic, struct rdma_cm_id *cm_id)
1443-
{
1444-
char *reason = "no connection";
1445-
1446-
if (ic) {
1447-
if (ic->i_cm_id != cm_id)
1448-
reason = "cm_id mismatch";
1449-
else if (ic->i_cm_id_ctx != cm_id->context)
1450-
reason = "context mismatch";
1451-
else
1452-
return true;
1453-
}
1454-
trace_rds_ib_cm_mismatch(NULL, ic ? ic->rds_ibdev : NULL,
1455-
ic ? ic->conn : NULL, ic, reason, 0);
1456-
return false;
1457-
}
1458-
14591442
static void rds_destroy_cm_id_worker(struct work_struct *_work)
14601443
{
14611444
struct rds_ib_destroy_cm_id_work *work = container_of(_work,
@@ -1498,12 +1481,11 @@ static int rds_ib_cm_accept(struct rds_connection *conn,
14981481

14991482
ic = conn->c_transport_data;
15001483

1501-
BUG_ON(rds_ib_get_conn(cm_id));
1484+
BUG_ON(cm_id->context);
15021485
BUG_ON(ic->i_cm_id);
15031486

15041487
ic->i_cm_id = cm_id;
1505-
cm_id->context = rds_ib_map_conn(conn);
1506-
ic->i_cm_id_ctx = cm_id->context;
1488+
cm_id->context = conn;
15071489

15081490
if (isv6) {
15091491
#if IS_ENABLED(CONFIG_IPV6)
@@ -1811,7 +1793,7 @@ void rds_ib_conn_destroy_init(struct rds_connection *conn)
18111793

18121794
int rds_ib_cm_initiate_connect(struct rdma_cm_id *cm_id, bool isv6)
18131795
{
1814-
struct rds_connection *conn = rds_ib_get_conn(cm_id);
1796+
struct rds_connection *conn = cm_id->context;
18151797
struct rds_ib_connection *ic = conn->c_transport_data;
18161798
struct rdma_conn_param conn_param;
18171799
union rds_ib_conn_priv dp;
@@ -1878,7 +1860,7 @@ int rds_ib_cm_initiate_connect(struct rdma_cm_id *cm_id, bool isv6)
18781860
trace_rds_ib_cm_initiate_connect_err(ic->rds_ibdev ?
18791861
ic->rds_ibdev->dev : NULL, ic->rds_ibdev, conn, ic,
18801862
reason, ret);
1881-
if (rds_ib_same_cm_id(ic, cm_id))
1863+
if (ic->i_cm_id == cm_id)
18821864
ret = 0;
18831865
}
18841866

@@ -2018,13 +2000,13 @@ int rds_ib_conn_path_connect(struct rds_conn_path *cp)
20182000
handler = rds_rdma_cm_event_handler;
20192001

20202002
WARN_ON(ic->i_cm_id);
2021-
ic->i_cm_id = rds_ib_rdma_create_id(rds_conn_net(conn),
2022-
handler, ic, conn, RDMA_PS_TCP, IB_QPT_RC);
2003+
ic->i_cm_id = rdma_create_id(rds_conn_net(conn),
2004+
handler, conn, RDMA_PS_TCP, IB_QPT_RC);
20232005

20242006
if (IS_ERR(ic->i_cm_id)) {
20252007
ret = PTR_ERR(ic->i_cm_id);
20262008
ic->i_cm_id = NULL;
2027-
reason = "rds_ib_rdma_create_id failed";
2009+
reason = "rdma_create_id failed";
20282010
goto out;
20292011
}
20302012

@@ -2067,7 +2049,7 @@ int rds_ib_conn_path_connect(struct rds_conn_path *cp)
20672049
ic->i_cm_id = NULL;
20682050
up_write(&ic->i_cm_id_free_lock);
20692051

2070-
rds_ib_rdma_destroy_id(cm_id);
2052+
rdma_destroy_id(cm_id);
20712053
}
20722054

20732055
out:
@@ -2165,10 +2147,10 @@ void rds_ib_conn_path_shutdown_final(struct rds_conn_path *cp)
21652147
* when the conn cannot be associated with the underlying
21662148
* device.
21672149
*/
2168-
rds_ib_rdma_unlink_id(ic->i_cm_id);
21692150

21702151
down_write(&ic->i_cm_id_free_lock);
21712152
cm_id = ic->i_cm_id;
2153+
cm_id->context = NULL;
21722154
ic->i_cm_id = NULL;
21732155
up_write(&ic->i_cm_id_free_lock);
21742156

@@ -2296,7 +2278,6 @@ int rds_ib_conn_alloc(struct rds_connection *conn, gfp_t gfp)
22962278
list_add_tail(&ic->ib_node, &ib_nodev_conns);
22972279
spin_unlock_irqrestore(&ib_nodev_conns_lock, flags);
22982280

2299-
ic->i_cm_id_ctx = RDS_IB_NO_CTX;
23002281
ic->i_irq_local_cpu = NR_CPUS;
23012282

23022283
INIT_DELAYED_WORK(&ic->i_cm_watchdog_w, rds_ib_cm_watchdog_handler);

net/rds/ib_send.c

+1-1
Original file line numberDiff line numberDiff line change
@@ -458,7 +458,7 @@ int rds_ib_send_grab_credits(struct rds_ib_connection *ic,
458458
avail--;
459459

460460
if (avail < wanted) {
461-
struct rds_connection *conn = rds_ib_get_conn(ic->i_cm_id);
461+
struct rds_connection *conn = ic->i_cm_id->context;
462462

463463
/* Oops, there aren't that many credits left! */
464464
set_bit(RDS_LL_SEND_FULL, &conn->c_flags);

net/rds/rdma_transport.c

+8-23
Original file line numberDiff line numberDiff line change
@@ -105,7 +105,7 @@ static int rds_rdma_cm_event_handler_cmn(struct rdma_cm_id *cm_id,
105105
bool isv6)
106106
{
107107
/* this can be null in the listening path */
108-
struct rds_connection *conn;
108+
struct rds_connection *conn = cm_id->context;
109109
struct rds_transport *trans = &rds_ib_transport;
110110
int ret = 0;
111111
/* ADDR_CHANGE event indicates that the local address has moved
@@ -120,7 +120,6 @@ static int rds_rdma_cm_event_handler_cmn(struct rdma_cm_id *cm_id,
120120
char *reason = NULL;
121121
int *err;
122122

123-
conn = rds_ib_get_conn(cm_id);
124123
if (!conn) {
125124
if (event->event == RDMA_CM_EVENT_CONNECT_REQUEST) {
126125
trace_rds_rdma_cm_event_handler(NULL, NULL, NULL,
@@ -205,7 +204,7 @@ static int rds_rdma_cm_event_handler_cmn(struct rdma_cm_id *cm_id,
205204
*/
206205
reason = "resolve route failed";
207206
ibic = conn->c_transport_data;
208-
if (rds_ib_same_cm_id(ibic, cm_id))
207+
if (ibic && ibic->i_cm_id == cm_id)
209208
ibic->i_cm_id = NULL;
210209
rds_conn_drop(conn, DR_IB_RESOLVE_ROUTE_FAIL, ret);
211210
} else if (conn->c_to_index < (RDS_RDMA_RESOLVE_TO_MAX_INDEX-1))
@@ -220,7 +219,7 @@ static int rds_rdma_cm_event_handler_cmn(struct rdma_cm_id *cm_id,
220219
* cm_id is valid before proceeding */
221220

222221
ibic = conn->c_transport_data;
223-
if (rds_ib_same_cm_id(ibic, cm_id)) {
222+
if (ibic && ibic->i_cm_id == cm_id) {
224223
/* ibacm caches the path record without considering the tos/sl.
225224
* It is considered a match if the <src,dest> matches the
226225
* cache. In order to create qp with the correct sl/vl, RDS
@@ -349,18 +348,13 @@ static int rds_rdma_listen_init_common(rdma_cm_event_handler handler,
349348
struct sockaddr *sa,
350349
struct rdma_cm_id **ret_cm_id)
351350
{
352-
struct rds_ib_connection *dummy_ic;
353351
struct rdma_cm_id *cm_id;
354352
int ret;
355353

356-
dummy_ic = kmalloc(sizeof(*dummy_ic), GFP_KERNEL);
357-
if (!dummy_ic)
358-
return -ENOMEM;
359-
360-
cm_id = rds_ib_rdma_create_id(&init_net, handler, dummy_ic, NULL, RDMA_PS_TCP, IB_QPT_RC);
354+
cm_id = rdma_create_id(&init_net, handler, NULL, RDMA_PS_TCP, IB_QPT_RC);
361355
if (IS_ERR(cm_id)) {
362356
ret = PTR_ERR(cm_id);
363-
printk(KERN_ERR "RDS/RDMA: failed to setup listener, rds_ib_rdma_create_id() returned %d\n",
357+
printk(KERN_ERR "RDS/RDMA: failed to setup listener, rdma_create_id() returned %d\n",
364358
ret);
365359
return ret;
366360
}
@@ -391,7 +385,7 @@ static int rds_rdma_listen_init_common(rdma_cm_event_handler handler,
391385
cm_id = NULL;
392386
out:
393387
if (cm_id)
394-
rds_ib_rdma_destroy_id(cm_id);
388+
rdma_destroy_id(cm_id);
395389
return ret;
396390
}
397391

@@ -436,25 +430,16 @@ static int rds_rdma_listen_init(void)
436430

437431
static void rds_rdma_listen_stop(void)
438432
{
439-
struct rds_ib_connection *ic;
440-
struct rds_connection *conn;
441-
442433
if (rds_rdma_listen_id) {
443-
conn = rds_ib_get_conn(rds_rdma_listen_id);
444-
ic = conn ? conn->c_transport_data : NULL;
445434
rdsdebug("cm %p\n", rds_rdma_listen_id);
446-
rds_ib_rdma_destroy_id(rds_rdma_listen_id);
435+
rdma_destroy_id(rds_rdma_listen_id);
447436
rds_rdma_listen_id = NULL;
448-
kfree(ic);
449437
}
450438
#if IS_ENABLED(CONFIG_IPV6)
451439
if (rds6_rdma_listen_id) {
452-
conn = rds_ib_get_conn(rds6_rdma_listen_id);
453-
ic = conn ? conn->c_transport_data : NULL;
454440
rdsdebug("cm %p\n", rds6_rdma_listen_id);
455-
rds_ib_rdma_destroy_id(rds6_rdma_listen_id);
441+
rdma_destroy_id(rds6_rdma_listen_id);
456442
rds6_rdma_listen_id = NULL;
457-
kfree(ic);
458443
}
459444
#endif
460445
}

net/rds/trace.c

-1
Original file line numberDiff line numberDiff line change
@@ -25,7 +25,6 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(rds_ib_add_device_err);
2525
EXPORT_TRACEPOINT_SYMBOL_GPL(rds_ib_remove_device);
2626
EXPORT_TRACEPOINT_SYMBOL_GPL(rds_ib_remove_device_err);
2727
EXPORT_TRACEPOINT_SYMBOL_GPL(rds_ib_shutdown_device);
28-
EXPORT_TRACEPOINT_SYMBOL_GPL(rds_ib_cm_mismatch);
2928
EXPORT_TRACEPOINT_SYMBOL_GPL(rds_ib_cm_accept_err);
3029
EXPORT_TRACEPOINT_SYMBOL_GPL(rds_ib_cm_handle_connect);
3130
EXPORT_TRACEPOINT_SYMBOL_GPL(rds_ib_cm_handle_connect_err);

net/rds/trace.h

-10
Original file line numberDiff line numberDiff line change
@@ -728,16 +728,6 @@ DEFINE_EVENT(rds_ib, rds_ib_shutdown_device,
728728

729729
);
730730

731-
DEFINE_EVENT(rds_ib, rds_ib_cm_mismatch,
732-
733-
TP_PROTO(struct ib_device *dev, struct rds_ib_device *rds_ibdev,
734-
struct rds_connection *conn, struct rds_ib_connection *ic,
735-
char *reason, int err),
736-
737-
TP_ARGS(dev, rds_ibdev, conn, ic, reason, err)
738-
739-
);
740-
741731
DEFINE_EVENT(rds_ib, rds_ib_cm_accept_err,
742732

743733
TP_PROTO(struct ib_device *dev, struct rds_ib_device *rds_ibdev,

0 commit comments

Comments
 (0)