Thursday, March 26, 2020
Ubuntu 18.04 Mellanox ConnectX-5 EDR + 100GbE (CX556A) setting
** Check the device.
# lspci | grep Mellanox
01:00.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
01:00.1 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
** Check the driver.
# lspci -k
01:00.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
Subsystem: Mellanox Technologies MT27800 Family [ConnectX-5]
Kernel driver in use: mlx5_core
Kernel modules: mlx5_core
01:00.1 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
Subsystem: Mellanox Technologies MT27800 Family [ConnectX-5]
Kernel driver in use: mlx5_core
Kernel modules: mlx5_core
** Driver installation
1. Ubuntu 18.04 supports this card out of the box; the lspci -k output above confirms mlx5_core is already in use.
2. If the driver is missing, the vendor provides the MLNX_OFED driver for download:
https://www.mellanox.com/products/infiniband-drivers/linux/mlnx_ofed
At the time of writing, only the Ubuntu 18.04 build supports aarch64 systems.
After extracting the archive, run ./mlnxofedinstall to complete the installation (a sketch follows below).
See also https://community.mellanox.com/s/article/howto-install-mlnx-ofed-driver
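A minimal sketch of that manual install flow, assuming an x86_64 Ubuntu 18.04 host (the archive name is an example; substitute whatever release you actually download):
# tar -xzf MLNX_OFED_LINUX-<version>-ubuntu18.04-x86_64.tgz
# cd MLNX_OFED_LINUX-<version>-ubuntu18.04-x86_64
# ./mlnxofedinstall
# /etc/init.d/openibd restart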
** Important Packages and Their Installation
# apt-get install rdma-core
rdma-core: RDMA core userspace libraries and daemons
# apt-get install opensm
opensm: InfiniBand Subnet Manager and management utilities
# apt-get install ibutils
ibutils: InfiniBand network diagnostic utilities
# apt-get install infiniband-diags
infiniband-diags: OpenFabrics Alliance InfiniBand diagnostic tools
# apt-get install perftest
perftest: InfiniBand verbs performance tests
# apt-get install mstflint
mstflint: Mellanox firmware burning and diagnostics tools
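All of the above can also be pulled in with a single command:
# apt-get install rdma-core opensm ibutils infiniband-diags perftest mstflint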
# apt-cache search infiniband
grub-ipxe - Network booting from GRUB using iPXE
ibverbs-providers - User space provider drivers for libibverbs
ipxe - PXE boot firmware
ipxe-qemu - PXE boot firmware - ROM images for qemu
ipxe-qemu-256k-compat-efi-roms - PXE boot firmware - Compat EFI ROM images for qemu
libibumad-dev - Development files for libibumad
libibumad3 - InfiniBand Userspace Management Datagram (uMAD) library
libibverbs-dev - Development files for the libibverbs library
libibverbs1 - Library for direct userspace use of RDMA (InfiniBand/iWARP)
librdmacm-dev - Development files for the librdmacm library
librdmacm1 - Library for managing RDMA connections
tgt - Linux SCSI target user-space daemon and tools
tgt-dbg - Linux SCSI target user-space daemon and tools - debug symbols
collectl - Utility to collect Linux performance data
ctdb - clustered database to store temporary data
dapl2-utils - utilities for use with the DAPL libraries
glusterfs-client - clustered file-system (client package)
glusterfs-common - GlusterFS common libraries and translator modules
glusterfs-server - clustered file-system (server package)
ibacm - InfiniBand Communication Manager Assistant (ACM)
ibsim-utils - InfiniBand fabric simulator utilities
ibutils - InfiniBand network utilities
ibverbs-utils - Examples for the libibverbs library
infiniband-diags - InfiniBand diagnostic programs
libdapl-dev - development files for the DAPL libraries
libdapl2 - Direct Access Programming Library (DAPL)
libibdm-dev - Development files for the libibdm library
libibdm1 - InfiniBand network diagnostic library
libibmad-dev - Development files for libibmad
libibmad5 - Infiniband Management Datagram (MAD) library
libibnetdisc-dev - InfiniBand diagnostics library headers
libibnetdisc5 - InfiniBand diagnostics library
libopensm-dev - Header files for compiling against libopensm
libopensm5a - InfiniBand subnet manager library
libosmcomp3 - InfiniBand subnet manager - component library
libosmvendor4 - InfiniBand subnet manager - vendor library
libpgm-5.2-0 - OpenPGM shared library
libpgm-dbg - OpenPGM debugging symbols
libpgm-dev - OpenPGM development files
libumad2sim0 - InfiniBand fabric simulator
opensm - InfiniBand subnet manager
opensm-doc - Documentation for the InfiniBand subnet manager
perftest - Infiniband verbs performance tests
rdma-core - RDMA core userspace infrastructure and documentation
rdmacm-utils - Examples for the librdmacm library
srptools - Tools for Infiniband attached storage (SRP)
tgt-rbd - Linux SCSI target user-space daemon and tools - RBD support
Ubuntu installation: the officially recommended package set (for reference only).
Run the following installation commands on both servers:
# apt-get install libmlx4-1 infiniband-diags ibutils ibverbs-utils rdmacm-utils perftest
** The card runs in InfiniBand (IB) mode by default; use mstconfig (from mstflint) to switch it to Ethernet (ETH) mode.
# mstconfig -d 01:00.0 set LINK_TYPE_P1=2 LINK_TYPE_P2=2
Device #1:
----------
Device type: ConnectX5
PCI device: 01:00.0
Configurations:              Next Boot    New
     LINK_TYPE_P1            IB(1)        ETH(2)
     LINK_TYPE_P2            IB(1)        ETH(2)
Apply new Configuration? ? (y/n) [n] : y
Applying... Done!
-I- Please reboot machine to load new configurations.
After the reboot the change takes effect and the ports run in ETH mode.
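To double-check the setting after the reboot, mstconfig can query the current values (same device address as above); both link-type entries should now read ETH(2):
# mstconfig -d 01:00.0 query | grep LINK_TYPE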
** Enable iSER: confirm that the ib_iser module is loaded.
# modprobe ib_iser
# lsmod | grep iser
ib_iser 49152 0
rdma_cm 61440 3 rpcrdma,ib_iser,rdma_ucm
libiscsi 53248 3 libiscsi_tcp,iscsi_tcp,ib_iser
scsi_transport_iscsi 98304 4 iscsi_tcp,ib_iser,libiscsi
ib_core 221184 10 rdma_cm,ib_ipoib,rpcrdma,iw_cm,ib_iser,ib_umad,rdma_ucm,ib_uverbs,mlx5_ib,ib_cm
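modprobe only loads the module for the current boot. To have ib_iser load automatically on every boot, the standard Ubuntu mechanism is a modules-load.d entry (the filename here is an arbitrary choice):
# echo ib_iser > /etc/modules-load.d/iser.conf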
** Use ibstat to confirm the link status. To actually reach 100G, two requirements must be met:
1. The PCIe slot must be Gen3 x16 (a quick check is shown right after this list).
2. The attached InfiniBand cable must support 100G.
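A quick way to verify the PCIe link, using the device address from lspci above; for Gen3 x16 the LnkSta line should report Speed 8GT/s, Width x16:
# lspci -s 01:00.0 -vv | grep -E 'LnkCap|LnkSta'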
# ibstat
CA 'mlx5_0'
CA type: MT4119
Number of ports: 1
Firmware version: 16.24.1000
Hardware version: 0
Node GUID: 0x1c34da03005d4508
System image GUID: 0x1c34da03005d4508
Port 1:
State: Active
Physical state: LinkUp
Rate: 100
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x04010000
Port GUID: 0x1e34dafffe5d4508
Link layer: Ethernet
CA 'mlx5_1'
CA type: MT4119
Number of ports: 1
Firmware version: 16.24.1000
Hardware version: 0
Node GUID: 0x1c34da03005d4509
System image GUID: 0x1c34da03005d4508
Port 1:
State: Down
Physical state: Disabled
Rate: 40
Base lid: 0
LMC: 0
SM lid: 0
Capability mask: 0x04010000
Port GUID: 0x1e34dafffe5d4509
Link layer: Ethernet
# ifconfig -a
enp0s31f6: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.10.8.55 netmask 255.255.255.0 broadcast 10.10.8.255
inet6 fe80::428d:5cff:feb4:af78 prefixlen 64 scopeid 0x20<link>
ether 40:8d:5c:b4:af:78 txqueuelen 1000 (Ethernet)
RX packets 243241 bytes 28357786 (28.3 MB)
RX errors 0 dropped 43816 overruns 0 frame 0
TX packets 212004 bytes 229846081 (229.8 MB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
device interrupt 16 memory 0xed200000-ed220000
enp1s0f0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9000
inet 10.10.10.1 netmask 255.255.255.0 broadcast 10.10.10.255
inet6 fe80::1e34:daff:fe5d:4508 prefixlen 64 scopeid 0x20<link>
ether 1c:34:da:5d:45:08 txqueuelen 1000 (Ethernet)
RX packets 38564681 bytes 258283792450 (258.2 GB)
RX errors 0 dropped 10 overruns 0 frame 0
TX packets 52412745 bytes 433726224495 (433.7 GB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
enp1s0f1: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
ether 1c:34:da:5d:45:09 txqueuelen 1000 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
inet6 ::1 prefixlen 128 scopeid 0x10<host>
loop txqueuelen 1000 (Local Loopback)
RX packets 3216 bytes 217429 (217.4 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 3216 bytes 217429 (217.4 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
Set the server and client IPs; both machines use the first port.
Set the server IP:
# ifconfig enp1s0f0 up 10.10.10.1/24 mtu 9000
Set the client IP:
# ifconfig enp1s0f0 up 10.10.10.2/24 mtu 9000
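Note that ifconfig settings are lost on reboot. On Ubuntu 18.04 the persistent way is netplan; a minimal sketch for the server side (the filename is arbitrary; interface name and address are the ones used here):
# cat /etc/netplan/60-mlx.yaml
network:
  version: 2
  ethernets:
    enp1s0f0:
      addresses: [10.10.10.1/24]
      mtu: 9000
# netplan apply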
** Benchmarking: use iperf to measure throughput.
numactl is also needed to bind the process to a CPU node and avoid cross-NUMA access penalties; from initial testing this should only matter on multi-socket server boards.
# apt-get install numactl
# apt-get install iperf
** Check how many NUMA nodes the platform has:
# numactl --hardware
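To see which NUMA node the NIC itself is attached to (so that --cpunodebind pins iperf to the same node), the kernel exposes it in sysfs; a value of 0 matches the --cpunodebind=0 used below, while -1 means no NUMA locality is reported:
# cat /sys/class/net/enp1s0f0/device/numa_node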
Server setup:
# numactl --cpunodebind=0 iperf -s -P8 -w 256K
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 416 KByte (WARNING: requested 256 KByte)
------------------------------------------------------------
The server now waits here for the client to connect.
Client setup:
# numactl --cpunodebind=0 iperf -c 10.10.10.1 -t 60 -P8 -w 256K
------------------------------------------------------------
Client connecting to 10.10.10.1, TCP port 5001
TCP window size: 416 KByte (WARNING: requested 256 KByte)
------------------------------------------------------------
[ 9] local 10.10.10.2 port 53176 connected with 10.10.10.1 port 5001
[ 10] local 10.10.10.2 port 53178 connected with 10.10.10.1 port 5001
[ 4] local 10.10.10.2 port 53166 connected with 10.10.10.1 port 5001
[ 8] local 10.10.10.2 port 53174 connected with 10.10.10.1 port 5001
[ 7] local 10.10.10.2 port 53172 connected with 10.10.10.1 port 5001
[ 3] local 10.10.10.2 port 53164 connected with 10.10.10.1 port 5001
[ 6] local 10.10.10.2 port 53170 connected with 10.10.10.1 port 5001
[ 5] local 10.10.10.2 port 53168 connected with 10.10.10.1 port 5001
Once the client connects, the server shows the connection status:
root@sam:/home/sam# numactl --cpunodebind=0 iperf -s -P8 -w 256K
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 416 KByte (WARNING: requested 256 KByte)
------------------------------------------------------------
[ 4] local 10.10.10.1 port 5001 connected with 10.10.10.2 port 53164
[ 5] local 10.10.10.1 port 5001 connected with 10.10.10.2 port 53166
[ 6] local 10.10.10.1 port 5001 connected with 10.10.10.2 port 53168
[ 7] local 10.10.10.1 port 5001 connected with 10.10.10.2 port 53172
[ 8] local 10.10.10.1 port 5001 connected with 10.10.10.2 port 53170
[ 9] local 10.10.10.1 port 5001 connected with 10.10.10.2 port 53174
[ 10] local 10.10.10.1 port 5001 connected with 10.10.10.2 port 53176
[ 11] local 10.10.10.1 port 5001 connected with 10.10.10.2 port 53178
Server output after the test completes:
root@sam:/home/sam# numactl --cpunodebind=0 iperf -s -P8 -w 256K
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 416 KByte (WARNING: requested 256 KByte)
------------------------------------------------------------
[ 4] local 10.10.10.1 port 5001 connected with 10.10.10.2 port 53164
[ 5] local 10.10.10.1 port 5001 connected with 10.10.10.2 port 53166
[ 6] local 10.10.10.1 port 5001 connected with 10.10.10.2 port 53168
[ 7] local 10.10.10.1 port 5001 connected with 10.10.10.2 port 53172
[ 8] local 10.10.10.1 port 5001 connected with 10.10.10.2 port 53170
[ 9] local 10.10.10.1 port 5001 connected with 10.10.10.2 port 53174
[ 10] local 10.10.10.1 port 5001 connected with 10.10.10.2 port 53176
[ 11] local 10.10.10.1 port 5001 connected with 10.10.10.2 port 53178
[ ID] Interval Transfer Bandwidth
[ 5] 0.0-60.0 sec 72.4 GBytes 10.4 Gbits/sec
[ 8] 0.0-60.0 sec 70.3 GBytes 10.1 Gbits/sec
[ 10] 0.0-60.0 sec 72.8 GBytes 10.4 Gbits/sec
[ 11] 0.0-60.0 sec 70.3 GBytes 10.1 Gbits/sec
[ 4] 0.0-60.0 sec 74.2 GBytes 10.6 Gbits/sec
[ 6] 0.0-60.0 sec 73.0 GBytes 10.4 Gbits/sec
[ 7] 0.0-60.0 sec 74.3 GBytes 10.6 Gbits/sec
[ 9] 0.0-60.0 sec 72.2 GBytes 10.3 Gbits/sec
Client output after the test completes:
root@ubuntu_1804_server:/home/sam# numactl --cpunodebind=0 iperf -c 10.10.10.1 -t 60 -P8 -w 256K
------------------------------------------------------------
Client connecting to 10.10.10.1, TCP port 5001
TCP window size: 416 KByte (WARNING: requested 256 KByte)
------------------------------------------------------------
[ 9] local 10.10.10.2 port 53176 connected with 10.10.10.1 port 5001
[ 10] local 10.10.10.2 port 53178 connected with 10.10.10.1 port 5001
[ 4] local 10.10.10.2 port 53166 connected with 10.10.10.1 port 5001
[ 8] local 10.10.10.2 port 53174 connected with 10.10.10.1 port 5001
[ 7] local 10.10.10.2 port 53172 connected with 10.10.10.1 port 5001
[ 3] local 10.10.10.2 port 53164 connected with 10.10.10.1 port 5001
[ 6] local 10.10.10.2 port 53170 connected with 10.10.10.1 port 5001
[ 5] local 10.10.10.2 port 53168 connected with 10.10.10.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 9] 0.0-60.0 sec 72.8 GBytes 10.4 Gbits/sec
[ 10] 0.0-60.0 sec 70.3 GBytes 10.1 Gbits/sec
[ 4] 0.0-60.0 sec 72.4 GBytes 10.4 Gbits/sec
[ 8] 0.0-60.0 sec 72.2 GBytes 10.3 Gbits/sec
[ 7] 0.0-60.0 sec 74.3 GBytes 10.6 Gbits/sec
[ 3] 0.0-60.0 sec 74.2 GBytes 10.6 Gbits/sec
[ 6] 0.0-60.0 sec 70.3 GBytes 10.1 Gbits/sec
[ 5] 0.0-60.0 sec 73.0 GBytes 10.5 Gbits/sec
[SUM] 0.0-60.0 sec 580 GBytes 83.0 Gbits/sec
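The ~83 Gbit/s aggregate is below the 100G line rate. If you want to push further, kernel socket-buffer ceilings are a common limiter; a hedged example of raising them (the values are illustrative, not tuned for this platform):
# sysctl -w net.core.rmem_max=268435456
# sysctl -w net.core.wmem_max=268435456
# sysctl -w net.ipv4.tcp_rmem="4096 87380 134217728"
# sysctl -w net.ipv4.tcp_wmem="4096 65536 134217728"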
References:
https://zhuanlan.zhihu.com/p/74082377
https://community.mellanox.com/s/article/getting-started-with-connectx-5-100gb-s-adapters-for-linux
https://community.mellanox.com/s/article/howto-setup-rdma-connection-using-inbox-driver--rhel--ubuntu-x
https://community.mellanox.com/s/article/howto-configure-lio-enabled-with-iser-for-ubuntu-14-04-inbox-driver
How to rebuild and install the driver yourself:
http://benjr.tw/28088
If an iSCSI disk is served via LIO (an iscsiadm sketch follows the listing below):
# lsscsi
[1:0:0:0] disk ATA INTEL SSDSC2CT24 335t /dev/sda
[5:0:0:0] cd/dvd ATAPI DVD D DH16D2S EP52 /dev/sr0
[6:0:0:0] disk LIO-ORG iscsi-ramdisk 4.0 /dev/sdb
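For reference, a sketch of how an open-iscsi initiator would attach to such a LIO target over iSER (the target IQN below is a hypothetical placeholder; use the one your target actually exports):
# iscsiadm -m discovery -t st -p 10.10.10.1
# iscsiadm -m node -T iqn.2003-01.org.linux-iscsi.example:ramdisk -p 10.10.10.1 -o update -n iface.transport_name -v iser
# iscsiadm -m node -T iqn.2003-01.org.linux-iscsi.example:ramdisk -p 10.10.10.1 --login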
root@ubuntu_1804_server:/home/sam# dd if=/dev/sdb of=/dev/null bs=64k iflag=direct
65536+0 records in
65536+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 6.98333 s, 615 MB/s
root@ubuntu_1804_server:/home/sam# dd if=/dev/sdb of=/dev/null bs=1M iflag=direct
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 1.77295 s, 2.4 GB/s
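dd only measures single-threaded sequential reads. For a more representative number, fio with direct I/O and multiple jobs can be used (a sketch; the parameters are illustrative):
# fio --name=randread --filename=/dev/sdb --rw=randread --bs=4k --ioengine=libaio --direct=1 --numjobs=4 --iodepth=32 --runtime=30 --time_based --group_reporting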
** Other ways to check the device status (note: this sample output was captured while the ports were still in IB mode):
# ibv_devinfo
hca_id: mlx5_0
transport: InfiniBand (0)
fw_ver: 16.24.1000
node_guid: 1c34:da03:0057:49ec
sys_image_guid: 1c34:da03:0057:49ec
vendor_id: 0x02c9
vendor_part_id: 4119
hw_ver: 0x0
board_id: MT_0000000008
phys_port_cnt: 1
port: 1
state: PORT_ACTIVE (4)
max_mtu: 4096 (5)
active_mtu: 4096 (5)
sm_lid: 1
port_lid: 1
port_lmc: 0x00
link_layer: InfiniBand
hca_id: mlx5_1
transport: InfiniBand (0)
fw_ver: 16.24.1000
node_guid: 1c34:da03:0057:49ed
sys_image_guid: 1c34:da03:0057:49ec
vendor_id: 0x02c9
vendor_part_id: 4119
hw_ver: 0x0
board_id: MT_0000000008
phys_port_cnt: 1
port: 1
state: PORT_DOWN (1)
max_mtu: 4096 (5)
active_mtu: 4096 (5)
sm_lid: 0
port_lid: 65535
port_lmc: 0x00
link_layer: InfiniBand
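The perftest package installed earlier can also measure raw RDMA bandwidth between the two machines, independent of TCP (a sketch; start the server side first, then point the client at it):
Server:
# ib_write_bw -d mlx5_0
Client:
# ib_write_bw -d mlx5_0 10.10.10.1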