Commit graph

87 commits

Author SHA1 Message Date
Daisuke Matsuda
7f88072507 RDMA/rxe: Move some code to rxe_loc.h in preparation for ODP
rxe_mr_init() and resp_states are going to be used in rxe_odp.c, which is
to be created in the subsequent patch.

Link: https://patch.msgid.link/r/20241220100936.2193541-2-matsuda-daisuke@fujitsu.com
Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-02-21 13:07:43 -04:00
Eric Biggers
ccca5e8aa1 RDMA/rxe: switch to using the crc32 library
Now that the crc32_le() library function takes advantage of
architecture-specific optimizations, it is unnecessary to go through the
crypto API.  Just use crc32_le().  This is much simpler, and it improves
performance due to eliminating the crypto API overhead.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Link: https://patch.msgid.link/20250207032316.53941-1-ebiggers@kernel.org
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-09 04:20:10 -05:00
Zhu Yanjun
d34d0bdb50 RDMA/rxe: Replace netdev dev addr with raw_gid
Because TUN device does not have dev_addr, but a gid in rdma is needed,
as such, a raw_gid is generated to act as the gid. The similar commit is
in SIW. This commit learns from the similar commit bad5b6e34f
("RDMA/siw: Fabricate a GID on tun and loopback devices") in SIW.

Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Link: https://patch.msgid.link/20250119172831.3123110-2-yanjun.zhu@linux.dev
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-02-03 06:38:43 -05:00
Zhu Yanjun
2ac5415022 RDMA/rxe: Remove the direct link to net_device
The similar patch in siw is in the link:
https://git.kernel.org/rdma/rdma/c/16b87037b48889

This problem also occurred in RXE. The following analyze this problem.
In the following Call Traces:
"
BUG: KASAN: slab-use-after-free in dev_get_flags+0x188/0x1d0 net/core/dev.c:8782
Read of size 4 at addr ffff8880554640b0 by task kworker/1:4/5295

CPU: 1 UID: 0 PID: 5295 Comm: kworker/1:4 Not tainted
6.12.0-rc3-syzkaller-00399-g9197b73fd7bb #0
Hardware name: Google Compute Engine/Google Compute Engine,
BIOS Google 09/13/2024
Workqueue: infiniband ib_cache_event_task
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
 print_address_description mm/kasan/report.c:377 [inline]
 print_report+0x169/0x550 mm/kasan/report.c:488
 kasan_report+0x143/0x180 mm/kasan/report.c:601
 dev_get_flags+0x188/0x1d0 net/core/dev.c:8782
 rxe_query_port+0x12d/0x260 drivers/infiniband/sw/rxe/rxe_verbs.c:60
 __ib_query_port drivers/infiniband/core/device.c:2111 [inline]
 ib_query_port+0x168/0x7d0 drivers/infiniband/core/device.c:2143
 ib_cache_update+0x1a9/0xb80 drivers/infiniband/core/cache.c:1494
 ib_cache_event_task+0xf3/0x1e0 drivers/infiniband/core/cache.c:1568
 process_one_work kernel/workqueue.c:3229 [inline]
 process_scheduled_works+0xa65/0x1850 kernel/workqueue.c:3310
 worker_thread+0x870/0xd30 kernel/workqueue.c:3391
 kthread+0x2f2/0x390 kernel/kthread.c:389
 ret_from_fork+0x4d/0x80 arch/x86/kernel/process.c:147
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
 </TASK>
"

1). In the link [1],

"
 infiniband syz2: set down
"

This means that on 839.350575, the event ib_cache_event_task was sent andi
queued in ib_wq.

2). In the link [1],

"
 team0 (unregistering): Port device team_slave_0 removed
"

It indicates that before 843.251853, the net device should be freed.

3). In the link [1],

"
 BUG: KASAN: slab-use-after-free in dev_get_flags+0x188/0x1d0
"

This means that on 850.559070, this slab-use-after-free problem occurred.

In all, on 839.350575, the event ib_cache_event_task was sent and queued
in ib_wq,

before 843.251853, the net device veth was freed.

on 850.559070, this event was executed, and the mentioned freed net device
was called. Thus, the above call trace occurred.

[1] https://syzkaller.appspot.com/x/log.txt?x=12e7025f980000

Reported-by: syzbot+4b87489410b4efd181bf@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=4b87489410b4efd181bf
Fixes: 8700e3e7c4 ("Soft RoCE driver")
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Link: https://patch.msgid.link/20241220222325.2487767-1-yanjun.zhu@linux.dev
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-12-24 04:36:40 -05:00
Bob Pearson
3d807a3ebc RDMA/rxe: Don't call rxe_requester from rxe_completer
Instead of rescheduling rxe_requester from rxe_completer() just extend the
duration of rxe_sender() by one pass. Setting run_requester_again forces
rxe_completer() to return 0 which will cause rxe_sender() to be called at
least one more time.

Link: https://lore.kernel.org/r/20240329145513.35381-10-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-04-22 16:54:33 -03:00
Bob Pearson
67f57892f9 RDMA/rxe: Merge request and complete tasks
Currently the rxe driver has three work queue tasks per qp.  These are the
req.task, comp.task and resp.task which call rxe_requester(),
rxe_completer() and rxe_responder() respectively directly or on work
queues. Each of these subroutines checks to see if there is work to be
performed on the send queue or on the response packet queue or the request
packet queue and will run until there is no work remaining or yield the
cpu and reschedule itself until there is no work remaining.

This commit combines the req.task and comp.task into a single send.task
and renames the resp.task to the recv.task. The combined send.task calls
rxe_requester() and rxe_completer() serially and continues until all work
on both the send queue and the response packet queue are done.

In various benchmarks the performance is either improved or left the
same. At high scale there is a significant reduction in the load on the
cpu.

This is the first step in combining these two tasks. Once they are
serialized cross rescheduling of req.task and comp.task can be more
efficiently handled by just letting the send.task continue to run. This
will be done in the next several patches.

Link: https://lore.kernel.org/r/20240329145513.35381-7-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-04-22 16:54:33 -03:00
Bob Pearson
ee678e5dff RDMA/rxe: Fixes mr access supported list
A recent patch incorrectly did not include IB_ACCESS_RELAXED_ORDERING in
the list of supported access flags for the rxe driver. The driver actually
does nothing related to relaxed ordering but it causes no problems to
include it as supported but with no effect. This change caused ib_send_bw
and friends to not run correctly.

The correct approach is for the driver to allow any of the optional access
flags and otherwise ignore them. This patch adds IB_ACCESS_OPTIONAL to the
list of rxe supported flags.

Fixes: 02ed253770 ("RDMA/rxe: Introduce rxe access supported flags")
Link: https://lore.kernel.org/r/20230613171654.19334-1-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-06-20 11:42:39 -03:00
Bob Pearson
544c7f62cf RDMA/rxe: Implement rereg_user_mr
Implement the two easy cases of ib_rereg_user_mr.

Link: https://lore.kernel.org/r/20230530221334.89432-7-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-06-09 13:18:53 -03:00
Bob Pearson
02ed253770 RDMA/rxe: Introduce rxe access supported flags
Introduce supported bit masks for setting the access attributes of MWs,
MRs, and QPs. Check these when attributes are set.

Link: https://lore.kernel.org/r/20230530221334.89432-5-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-06-09 13:18:53 -03:00
Bob Pearson
d11442c6bd RDMA/rxe: Rename IB_ACCESS_REMOTE
Rename IB_ACCESS_REMOTE to RXE_ACCESS_REMOTE and move to rxe_verbs.h as an
enum instead of a #define. Shouldn't use IB_xxx for rxe symbols.

Link: https://lore.kernel.org/r/20230530221334.89432-2-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-06-09 13:18:52 -03:00
Bob Pearson
98e891b5e4 RDMA/rxe: Remove qp->req.state
The rxe driver has four different QP state variables,
    qp->attr.qp_state,
    qp->req.state,
    qp->comp.state, and
    qp->resp.state.
All of these basically carry the same information.

This patch replaces uses of qp->req.state by qp->attr.qp_state and enum
rxe_qp_state.  This is the third of three patches which will remove all
but the qp->attr.qp_state variable. This will bring the driver closer to
the IBA description.

Link: https://lore.kernel.org/r/20230405042611.6467-3-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-04-17 16:01:44 -03:00
Bob Pearson
f55efc2ed2 RDMA/rxe: Remove qp->comp.state
The rxe driver has four different QP state variables,
    qp->attr.qp_state,
    qp->req.state,
    qp->comp.state, and
    qp->resp.state.
All of these basically carry the same information.

This patch replaces uses of qp->comp.state by qp->attr.qp_state.  This is
the second of three patches which will remove all but the
qp->attr.qp_state variable. This will bring the driver closer to the IBA
description.

Link: https://lore.kernel.org/r/20230405042611.6467-2-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-04-17 16:01:44 -03:00
Bob Pearson
a588429a66 RDMA/rxe: Remove qp->resp.state
The rxe driver has four different QP state variables,
    qp->attr.qp_state,
    qp->req.state,
    qp->comp.state, and
    qp->resp.state.
All of these basically carry the same information.

This patch replaces uses of qp->resp.state by qp->attr.qp_state.  This is
the first of three patches which will remove all but the qp->attr.qp_state
variable. This will bring the driver closer to the IBA description.

Link: https://lore.kernel.org/r/20230405042611.6467-1-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-04-17 16:01:44 -03:00
Bob Pearson
78b26a3353 RDMA/rxe: Remove tasklet call from rxe_cq.c
Remove the tasklet call in rxe_cq.c and also the is_dying in the
cq struct. There is no reason for the rxe driver to defer the call
to the cq completion handler by scheduling a tasklet. rxe_cq_post()
is not called in a hard irq context.

The rxe driver currently is incorrect because the tasklet call is
made without protecting the cq pointer with a reference from having
the underlying memory freed before the deferred routine is called.
Executing the comp_handler inline fixes this problem.

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Link: https://lore.kernel.org/r/20230327215643.10410-1-rpearsonhpe@gmail.com
Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-03-29 14:25:11 +03:00
Bob Pearson
592627ccbd RDMA/rxe: Replace rxe_map and rxe_phys_buf by xarray
Replace struct rxe-phys_buf and struct rxe_map by struct xarray
in rxe_verbs.h. This allows using rcu locking on reads for
the memory maps stored in each mr.

This is based off of a sketch of a patch from Jason Gunthorpe in the
link below. Some changes were needed to make this work. It applies
cleanly to the current for-next and passes the pyverbs, perftest
and the same blktests test cases which run today.

Link: https://lore.kernel.org/r/20230119235936.19728-7-rpearsonhpe@gmail.com
Link: https://lore.kernel.org/linux-rdma/Y3gvZr6%2FNCii9Avy@nvidia.com/
Co-developed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-01-27 12:14:14 -04:00
Bob Pearson
325a7eb851 RDMA/rxe: Cleanup page variables in rxe_mr.c
Cleanup usage of mr->page_shift and mr->page_mask and introduce
an extractor for mr->ibmr.page_size. Normal usage in the kernel
has page_mask masking out offset in page rather than masking out
the page number. The rxe driver had reversed that which was confusing.
Implicitly there can be a per mr page_size which was not uniformly
supported.

Link: https://lore.kernel.org/r/20230119235936.19728-6-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-01-26 15:04:46 -04:00
Li Zhijian
ea1bb00ee9 RDMA/rxe: Implement flush execution in responder side
Only the requested placement types that also registered in the destination
memory region are acceptable.
Otherwise, responder will also reply NAK "Remote Access Error" if it
found a placement type violation.

We will persist data via arch_wb_cache_pmem(), which could be
architecture specific.

This commit also adds 2 helpers to update qp.resp from the incoming packet.

Link: https://lore.kernel.org/r/20221206130201.30986-8-lizhijian@fujitsu.com
Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-09 19:36:02 -04:00
yangx.jy@fujitsu.com
71d2363991 RDMA/rxe: Remove the member 'type' of struct rxe_mr
The member 'type' is included in both struct rxe_mr and struct ib_mr
so remove the duplicate one of struct rxe_mr.

Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
Link: https://lore.kernel.org/r/20221021134513.17730-1-yangx.jy@fujitsu.com
Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-10-24 14:50:14 +03:00
Daisuke Matsuda
954afc5a8f RDMA/rxe: Use members of generic struct in rxe_mr
rxe_mr and ib_mr have interchangeable members. Remove device specific
members and use ones in the generic struct. Both 'iova' and 'length' are
filled in ib_uverbs or ib_core layer after MR registration.

Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Link: https://lore.kernel.org/r/20220921080844.1616883-2-matsuda-daisuke@fujitsu.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-09-22 12:46:39 +03:00
Daisuke Matsuda
d4ecb56e86 RDMA/rxe: Remove an unused member from struct rxe_mr
Commit 1e75550648 ("Revert "RDMA/rxe: Create duplicate mapping tables for
FMRs"") brought back the member 'va' to struct rxe_mr. However, it is
actually used by nobody and thus can be removed.

Fixes: 1e75550648 ("Revert "RDMA/rxe: Create duplicate mapping tables for FMRs"")
Link: https://lore.kernel.org/r/20220829012335.1212697-1-matsuda-daisuke@fujitsu.com
Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-08-29 09:44:07 +03:00
Bob Pearson
62494ec7fb RDMA/rxe: Split qp state for requester and completer
Currently the requester can continue to process send wqes after an local
qp operation error is detected because the setting of the qp state to the
error state is deferred until later. This patch splits the qp state for
the completer and requester into two separate states and sets
qp->req.state = QP_STATE_ERROR as soon as the error is detected before
another wqe can be executed.

Link: https://lore.kernel.org/r/1658307368-1851-4-git-send-email-lizhijian@fujitsu.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-08-02 13:53:36 -03:00
Li Zhijian
1e75550648 Revert "RDMA/rxe: Create duplicate mapping tables for FMRs"
Below 2 commits will be reverted:
 commit 8ff5f5d9d8 ("RDMA/rxe: Prevent double freeing rxe_map_set()")
 commit 647bf13ce9 ("RDMA/rxe: Create duplicate mapping tables for FMRs")

The community has a few bug reports which pointed this commit at last.
Some proposals are raised up in the meantime but all of them have no
follow-up operation.

The previous commit led the map_set of FMR to be not available any more if
the MR is registered again after invalidating. Although the mentioned
patch try to fix a potential race in building/accessing the same table
for fast memory regions, it broke rtrs etc ULPs. Since the latter could
be worse, revert this patch.

With previous commit, it's observed that a same MR in rnbd server will
trigger below code path:
 -> rxe_mr_init_fast()
 |-> alloc map_set() # map_set is uninitialized
 |...-> rxe_map_mr_sg() # build the map_set
     |-> rxe_mr_set_page()
 |...-> rxe_reg_fast_mr() # mr->state change to VALID from FREE that means
                          # we can access host memory(such rxe_mr_copy)
 |...-> rxe_invalidate_mr() # mr->state change to FREE from VALID
 |...-> rxe_reg_fast_mr() # mr->state change to VALID from FREE,
                          # but map_set was not built again
 |...-> rxe_mr_copy() # kernel crash due to access wild addresses
                      # that lookup from the map_set

The backtraces are not always identical.
[1st]----------
  RIP: 0010:lookup_iova+0x66/0xa0 [rdma_rxe]
  Code: 00 00 00 48 d3 ee 89 32 c3 4c 8b 18 49 8b 3b 48 8b 47 08 48 39 c6 72 38 48 29 c6 45 31 d2 b8 01 00 00 00 48 63 c8 48 c1 e1 04 <48> 8b 4c 0f 08 48 39 f1 77 21 83 c0 01 48 29 ce 3d 00 01 00 00 75
  RSP: 0018:ffffb7ff80063bf0 EFLAGS: 00010246
  RAX: 0000000000000000 RBX: ffff9b9949d86800 RCX: 0000000000000000
  RDX: ffffb7ff80063c00 RSI: 0000000049f6b378 RDI: 002818da00000004
  RBP: 0000000000000120 R08: ffffb7ff80063c08 R09: ffffb7ff80063c04
  R10: 0000000000000002 R11: ffff9b9916f7eef8 R12: ffff9b99488a0038
  R13: ffff9b99488a0038 R14: ffff9b9914fb346a R15: ffff9b990ab27000
  FS:  0000000000000000(0000) GS:ffff9b997dc00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007efc33a98ed0 CR3: 0000000014f32004 CR4: 00000000001706f0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   <TASK>
   rxe_mr_copy.part.0+0x6f/0x140 [rdma_rxe]
   rxe_responder+0x12ee/0x1b60 [rdma_rxe]
   ? rxe_icrc_check+0x7e/0x100 [rdma_rxe]
   ? rxe_rcv+0x1d0/0x780 [rdma_rxe]
   ? rxe_icrc_hdr.isra.0+0xf6/0x160 [rdma_rxe]
   rxe_do_task+0x67/0xb0 [rdma_rxe]
   rxe_xmit_packet+0xc7/0x210 [rdma_rxe]
   rxe_requester+0x680/0xee0 [rdma_rxe]
   ? update_load_avg+0x5f/0x690
   ? update_load_avg+0x5f/0x690
   ? rtrs_clt_recv_done+0x1b/0x30 [rtrs_client]

[2nd]----------
  RIP: 0010:rxe_mr_copy.part.0+0xa8/0x140 [rdma_rxe]
  Code: 00 00 49 c1 e7 04 48 8b 00 4c 8d 2c d0 48 8b 44 24 10 4d 03 7d 00 85 ed 7f 10 eb 6c 89 54 24 0c 49 83 c7 10 31 c0 85 ed 7e 5e <49> 8b 3f 8b 14 24 4c 89 f6 48 01 c7 85 d2 74 06 48 89 fe 4c 89 f7
  RSP: 0018:ffffae3580063bf8 EFLAGS: 00010202
  RAX: 0000000000018978 RBX: ffff9d7ef7a03600 RCX: 0000000000000008
  RDX: 000000000000007c RSI: 000000000000007c RDI: ffff9d7ef7a03600
  RBP: 0000000000000120 R08: ffffae3580063c08 R09: ffffae3580063c04
  R10: ffff9d7efece0038 R11: ffff9d7ec4b1db00 R12: ffff9d7efece0038
  R13: ffff9d7ef4098260 R14: ffff9d7f11e23c6a R15: 4c79500065708144
  FS:  0000000000000000(0000) GS:ffff9d7f3dc00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007fce47276c60 CR3: 0000000003f66004 CR4: 00000000001706f0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   <TASK>
   rxe_responder+0x12ee/0x1b60 [rdma_rxe]
   ? rxe_icrc_check+0x7e/0x100 [rdma_rxe]
   ? rxe_rcv+0x1d0/0x780 [rdma_rxe]
   ? rxe_icrc_hdr.isra.0+0xf6/0x160 [rdma_rxe]
   rxe_do_task+0x67/0xb0 [rdma_rxe]
   rxe_xmit_packet+0xc7/0x210 [rdma_rxe]
   rxe_requester+0x680/0xee0 [rdma_rxe]
   ? update_load_avg+0x5f/0x690
   ? update_load_avg+0x5f/0x690
   ? rtrs_clt_recv_done+0x1b/0x30 [rtrs_client]
   rxe_do_task+0x67/0xb0 [rdma_rxe]
   tasklet_action_common.constprop.0+0x92/0xc0
   __do_softirq+0xe1/0x2d8
   run_ksoftirqd+0x21/0x30
   smpboot_thread_fn+0x183/0x220
   ? sort_range+0x20/0x20
   kthread+0xe2/0x110
   ? kthread_complete_and_exit+0x20/0x20
   ret_from_fork+0x22/0x30

Link: https://lore.kernel.org/r/1658805386-2-1-git-send-email-lizhijian@fujitsu.com
Link: https://lore.kernel.org/all/20220210073655.42281-1-guoqing.jiang@linux.dev/T/
Link: https://www.spinics.net/lists/linux-rdma/msg110836.html
Link: https://lore.kernel.org/lkml/94a5ea93-b8bb-3a01-9497-e2021f29598a@linux.dev/t/
Tested-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-27 12:22:57 +03:00
Bob Pearson
445fd4f4fb RDMA/rxe: Fix rnr retry behavior
Currently the completer tasklet when retransmit timer or the rnr timer
fires the same flag (qp->req.need_retry) is set so that if either timer
fires it will attempt to perform a retry flow on the send queue.  This has
the effect of responding to an RNR NAK at the first retransmit timer event
which might not allow the requested rnr timeout.

This patch adds a new flag (qp->req.wait_for_rnr_timer) which, if set,
prevents a retry flow until the rnr nak timer fires.

This patch fixes rnr retry errors which can be observed by running the
pyverbs test_rdmacm_async_traffic_external_qp multiple times. With this
patch applied they do not occur.

Link: https://lore.kernel.org/linux-rdma/a8287823-1408-4273-bc22-99a0678db640@gmail.com/
Link: https://lore.kernel.org/linux-rdma/2bafda9e-2bb6-186d-12a1-179e8f6a2678@talpey.com/
Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20220630190425.2251-6-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-22 17:43:00 -03:00
Bob Pearson
96938258b1 RDMA/rxe: Remove unnecessary include statement
rxe_verbs.h includes the file <rdma/rdma_user_rxe.h>.  It should have been
<uapi/rdma/rdma_user_rxe.h>, however, it is not used and not required in
this file.

This patch removes the include statement.

Link: https://lore.kernel.org/r/20220630190425.2251-4-rpearsonhpe@gmail.com
Reported-by: Frank Zago <frank.zago@hpe.com>
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-04 10:04:09 -03:00
Bob Pearson
dc18483881 RDMA/rxe: Merge normal and retry atomic flows
Make the execution of the atomic operation in rxe_atomic_reply()
conditional on res->replay and make duplicate_request() call into
rxe_atomic_reply() to merge the two flows. This is modeled on the behavior
of read reply. Delete the skb from the atomic responder resource since it
is no longer used. Adjust the reference counting of the qp in
send_atomic_ack() for this flow.

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20220606143836.3323-6-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-06-30 14:00:21 -03:00
Bob Pearson
8264411595 RDMA/rxe: Move atomic original value to res
Move the saved original value to the atomic responder resource.  This
replaces saving it in the qp. In preparation for merging the normal and
retry atomic responder flows.

Link: https://lore.kernel.org/r/20220606143836.3323-5-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-06-30 14:00:21 -03:00
Bob Pearson
4703b4f0d9 RDMA/rxe: Enforce IBA C11-17
Add a counter to keep track of the number of WQs connected to a CQ and
return an error if destroy_cq() is called while the counter is non zero.

Link: https://lore.kernel.org/r/20220421014042.26985-8-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-09 09:03:45 -03:00
Bob Pearson
409baed5d7 RDMA/rxe: Remove support for SMI QPs from rdma_rxe
Currently the rdma_rxe driver supports SMI type QPs in a few places which
is incorrect. RoCE devices never should support SMI QPs.  This commit
removes SMI QP support from the driver.

Link: https://lore.kernel.org/r/20220407185416.16372-1-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-04-08 14:38:33 -03:00
Bob Pearson
5c477ee768 RDMA/rxe: Remove mc_grp_pool from struct rxe_dev
Remove struct rxe_dev mc_grp_pool field. This field is no longer used.

Link: https://lore.kernel.org/r/20220407184849.14359-1-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-04-08 14:38:18 -03:00
Bob Pearson
8a1a0be894 RDMA/rxe: Replace mr by rkey in responder resources
Currently rxe saves a copy of MR in responder resources for RDMA reads.
Since the responder resources are never freed just over written if more
are needed this MR may not have a reference freed until the QP is
destroyed. This patch uses the rkey instead of the MR and on subsequent
packets of a multipacket read reply message it looks up the MR from the
rkey for each packet. This makes it possible for a user to deregister an
MR or unbind a MW on the fly and get correct behaviour.

Link: https://lore.kernel.org/r/20220304000808.225811-3-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-03-15 20:49:56 -03:00
Bob Pearson
4a4f107347 RDMA/rxe: Collect mca init code in a subroutine
Collect initialization code for struct rxe_mca into a subroutine,
__rxe_init_mca(), to cleanup rxe_attach_mcg() in rxe_mcast.c. Check
limit on total number of attached qp's.

Link: https://lore.kernel.org/r/20220223230706.50332-3-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-02-23 20:29:15 -04:00
Bob Pearson
3810c1a1cb RDMA/rxe: Remove mcg from rxe pools
Finish removing mcg from rxe pools. Replace rxe pools ref counting by
kref's. Replace rxe_alloc by kzalloc.

Link: https://lore.kernel.org/r/20220208211644.123457-8-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-02-16 12:11:29 -04:00
Bob Pearson
8a0a5fe0c4 RDMA/rxe: Replace pool key by rxe->mcg_tree
Continuing to decouple mcg from rxe pools. Create red-black tree code in
rxe_mcast.c to hold mcg index. Replace pool key calls by calls to local
red-black routines.

Link: https://lore.kernel.org/r/20220208211644.123457-6-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-02-16 12:06:35 -04:00
Bob Pearson
8a99c81f12 RDMA/rxe: Replace int num_qp by atomic_t qp_num
Replace int num_qp in struct rxe_mcg by atomic_t qp_num.

Link: https://lore.kernel.org/r/20220208211644.123457-5-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-02-16 12:01:22 -04:00
Bob Pearson
d572405518 RDMA/rxe: Use kzmalloc/kfree for mca
Remove rxe_mca (was rxe_mc_elem) from rxe pools and use kzmalloc and kfree
to allocate and free in rxe_mcast.c. Call kzalloc outside of spinlocks to
avoid having to use GFP_ATOMIC.

Link: https://lore.kernel.org/r/20220208211644.123457-3-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-02-16 11:59:11 -04:00
Bob Pearson
9fd0eb7c3c RDMA/rxe: Move mcg_lock to rxe
Replace mcg->mcg_lock and mc_grp_pool->pool_lock by rxe->mcg_lock.  This
is the first step of several intended to decouple the mc_grp and mc_elem
objects from the rxe pool code.

Link: https://lore.kernel.org/r/20220208211644.123457-2-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-02-16 11:55:28 -04:00
Bob Pearson
d3f6899b0b RDMA/rxe: Remove qp->grp_lock and qp->grp_list
Since it is no longer required to cleanup attachments to multicast
groups when a QP is destroyed qp->grp_lock and qp->grp_list are
no longer needed and are removed.

Link: https://lore.kernel.org/r/20220127213755.31697-7-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-01-28 14:33:28 -04:00
Bob Pearson
f9f4846057 RDMA/rxe: Enforce IBA o10-2.2.3
Add code to check if a QP is attached to one or more multicast groups
when destroy_qp is called and return an error if so.

Link: https://lore.kernel.org/r/20220127213755.31697-5-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-01-28 14:33:28 -04:00
Bob Pearson
02e3524474 RDMA/rxe: Rename rxe_mc_grp and rxe_mc_elem
Rename rxe_mc_grp to rxe_mcg. Rename rxe_mc_elem to rxe_mca.
These can be read 'multicast group' and 'multicast attachment'.
'elem' collided with the use of elem in rxe pools and was a little
confusing.

Link: https://lore.kernel.org/r/20220127213755.31697-4-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-01-28 13:22:09 -04:00
Zhu Yanjun
8803836fe7 RDMA/rxe: Remove the unused xmit_errors member
The member variable xmit_errors can be replaced with

 rxe_counter_inc(rxe, RXE_CNT_SEND_ERR)

Link: https://lore.kernel.org/r/20211216054842.1099428-1-yanjun.zhu@linux.dev
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-01-05 13:56:50 -04:00
Bob Pearson
02827b6708 RDMA/rxe: Cleanup rxe_pool_entry
Currently three different names are used to describe rxe pool elements.
They are referred to as entries, elems or pelems. This patch chooses one
'elem' and changes the other ones.

Link: https://lore.kernel.org/r/20211103050241.61293-3-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-19 13:29:14 -04:00
Bob Pearson
4da698eabf RDMA/rxe: Replace ah->pd by ah->ibah.pd
The pd field in struct rxe_ah is redundant with the pd field in the
rdma-core's ib_ah. Eliminate the pd field in rxe_ah and add an inline to
extract the pd from the ibah field.

Link: https://lore.kernel.org/r/20211007204051.10086-5-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-12 13:25:26 -03:00
Bob Pearson
73a5493210 RDMA/rxe: Create AH index and return to user space
Make changes to rdma_user_rxe.h to allow indexing AH objects, passing the
index in UD send WRs to the driver and returning the index to the rxe
provider.

Modify rxe_create_ah() to add an index to AH when created and if called
from a new user provider return it to user space. If called from an old
provider mark the AH as not having a useful index.  Modify rxe_destroy_ah
to drop the index before deleting the object.

Link: https://lore.kernel.org/r/20211007204051.10086-4-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-12 13:25:26 -03:00
Xiao Yang
609bb8c3a3 RDMA/rxe: Change the is_user member of struct rxe_cq to bool
Make the is_user members of struct rxe_qp/rxe_cq has the same type.

Link: https://lore.kernel.org/r/20210930094813.226888-3-yangx.jy@fujitsu.com
Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-06 19:45:30 -03:00
Xiao Yang
1cf2ce8272 RDMA/rxe: Remove the is_user members of struct rxe_sq/rxe_rq/rxe_srq
The is_user members of struct rxe_sq/rxe_rq/rxe_srq are unsed since
commit ae6e843fe0 ("RDMA/rxe: Add memory barriers to kernel queues").
In this case, it is fine to remove them directly.

Link: https://lore.kernel.org/r/20210930094813.226888-2-yangx.jy@fujitsu.com
Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-06 19:45:29 -03:00
Bob Pearson
647bf13ce9 RDMA/rxe: Create duplicate mapping tables for FMRs
For fast memory regions create duplicate mapping tables so ib_map_mr_sg()
can build a new mapping table which is then swapped into place
synchronously with the execution of an IB_WR_REG_MR work request.

Currently the rxe driver uses the same table for receiving RDMA operations
and for building new tables in preparation for reusing the MR. This
exposes users to potentially incorrect results.

Link: https://lore.kernel.org/r/20210914164206.19768-5-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-09-24 10:15:00 -03:00
Bob Pearson
001345339f RDMA/rxe: Separate HW and SW l/rkeys
Separate software and simulated hardware lkeys and rkeys for MRs and MWs.
This makes struct ib_mr and struct ib_mw isolated from hardware changes
triggered by executing work requests.

This change fixes a bug seen in blktest.

Link: https://lore.kernel.org/r/20210914164206.19768-4-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-09-24 10:14:59 -03:00
Bob Pearson
47b7f7064b RDMA/rxe: Cleanup MR status and type enums
Eliminate RXE_MR_STATE_ZOMBIE which is not compatible with IBA.
RXE_MR_STATE_INVALID is better.

Replace RXE_MR_TYPE_XXX by IB_MR_TYPE_XXX which covers all the needed
types.

Link: https://lore.kernel.org/r/20210914164206.19768-3-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-09-24 10:14:59 -03:00
Zhu Yanjun
ad17bbef3d RDMA/rxe: remove the unnecessary variable
In the struct rxe_qp, the variable send_pkts is never used.  So remove it.

Link: https://lore.kernel.org/r/20210915075128.482919-1-yanjun.zhu@intel.com
Signed-off-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-09-14 16:22:48 -03:00
Leon Romanovsky
514aee660d RDMA: Globally allocate and release QP memory
Convert QP object to follow IB/core general allocation scheme.  That
change allows us to make sure that restrack properly kref the memory.

Link: https://lore.kernel.org/r/48e767124758aeecc433360ddd85eaa6325b34d9.1627040189.git.leonro@nvidia.com
Reviewed-by: Gal Pressman <galpress@amazon.com> #efa
Tested-by: Gal Pressman <galpress@amazon.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> #rdma and core
Tested-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Tested-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-03 13:44:27 -03:00