Thursday, March 26, 2020

Ubuntu 18.04 Mellanox ConnectX-5 EDR + 100GbE (CX556A) setting




**  Check the device.

# lspci | grep Mellanox
01:00.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
01:00.1 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]

**  Check the driver.
# lspci -k
01:00.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
        Subsystem: Mellanox Technologies MT27800 Family [ConnectX-5]
        Kernel driver in use: mlx5_core
        Kernel modules: mlx5_core
01:00.1 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
        Subsystem: Mellanox Technologies MT27800 Family [ConnectX-5]
        Kernel driver in use: mlx5_core
        Kernel modules: mlx5_core

**  Driver install
    1.  Ubuntu 18.04 supports the card out of the box; the lspci -k output above confirms that mlx5_core is already in use.
    2.  If the driver is missing, the vendor's MLNX_OFED driver can be installed instead (see the sketch below).
        Download it from https://www.mellanox.com/products/infiniband-drivers/linux/mlnx_ofed
        At the time of writing, Ubuntu 18.04 is the only release with aarch64 support.
        After extracting the archive, run ./mlnxofedinstall to complete the installation.
        See https://community.mellanox.com/s/article/howto-install-mlnx-ofed-driver for details.
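        A rough outline of the MLNX_OFED install, assuming a downloaded archive whose exact name depends on the version and architecture chosen (<version> and <arch> below are placeholders):
        # tar xzf MLNX_OFED_LINUX-<version>-ubuntu18.04-<arch>.tgz
        # cd MLNX_OFED_LINUX-<version>-ubuntu18.04-<arch>
        # ./mlnxofedinstall
        # /etc/init.d/openibd restart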

**  Important Packages and Their Installation
#apt-get install rdma-core
    rdma-core   RDMA core userspace libraries and daemons

#apt-get install opensm
    opensm: InfiniBand Subnet Manager
    opensm-libs Libraries used by OpenSM and included utilities
    opensm      OpenIB InfiniBand Subnet Manager and management utilities

#apt-get install ibutils
    ibutils: OpenIB Mellanox InfiniBand Diagnostic Tools
    ibutils-libs    Shared libraries used by ibutils binaries
    ibutils         OpenIB Mellanox InfiniBand Diagnostic Tools

#apt-get install infiniband-diags
    infiniband-diags:   OpenFabrics Alliance InfiniBand Diagnostic Tools

#apt-get install perftest
    perftest: IB Performance tests

#apt-get install mstflint
    mstflint: Mellanox Firmware Burning and Diagnostics Tools
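
    mstflint can also read back the firmware currently on the card (using the PCI address shown by lspci earlier), for example:
# mstflint -d 01:00.0 query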

# apt-cache search infiniband
grub-ipxe - Network booting from GRUB using iPXE
ibverbs-providers - User space provider drivers for libibverbs
ipxe - PXE boot firmware
ipxe-qemu - PXE boot firmware - ROM images for qemu
ipxe-qemu-256k-compat-efi-roms - PXE boot firmware - Compat EFI ROM images for qemu
libibumad-dev - Development files for libibumad
libibumad3 - InfiniBand Userspace Management Datagram (uMAD) library
libibverbs-dev - Development files for the libibverbs library
libibverbs1 - Library for direct userspace use of RDMA (InfiniBand/iWARP)
librdmacm-dev - Development files for the librdmacm library
librdmacm1 - Library for managing RDMA connections
tgt - Linux SCSI target user-space daemon and tools
tgt-dbg - Linux SCSI target user-space daemon and tools - debug symbols
collectl - Utility to collect Linux performance data
ctdb - clustered database to store temporary data
dapl2-utils - utilities for use with the DAPL libraries
glusterfs-client - clustered file-system (client package)
glusterfs-common - GlusterFS common libraries and translator modules
glusterfs-server - clustered file-system (server package)
ibacm - InfiniBand Communication Manager Assistant (ACM)
ibsim-utils - InfiniBand fabric simulator utilities
ibutils - InfiniBand network utilities
ibverbs-utils - Examples for the libibverbs library
infiniband-diags - InfiniBand diagnostic programs
libdapl-dev - development files for the DAPL libraries
libdapl2 - Direct Access Programming Library (DAPL)
libibdm-dev - Development files for the libibdm library
libibdm1 - InfiniBand network diagnostic library
libibmad-dev - Development files for libibmad
libibmad5 - Infiniband Management Datagram (MAD) library
libibnetdisc-dev - InfiniBand diagnostics library headers
libibnetdisc5 - InfiniBand diagnostics library
libopensm-dev - Header files for compiling against libopensm
libopensm5a - InfiniBand subnet manager library
libosmcomp3 - InfiniBand subnet manager - component library
libosmvendor4 - InfiniBand subnet manager - vendor library
libpgm-5.2-0 - OpenPGM shared library
libpgm-dbg - OpenPGM debugging symbols
libpgm-dev - OpenPGM development files
libumad2sim0 - InfiniBand fabric simulator
opensm - InfiniBand subnet manager
opensm-doc - Documentation for the InfiniBand subnet manager
perftest - Infiniband verbs performance tests
rdma-core - RDMA core userspace infrastructure and documentation
rdmacm-utils - Examples for the librdmacm library
srptools - Tools for Infiniband attached storage (SRP)
tgt-rbd - Linux SCSI target user-space daemon and tools - RBD support

Ubuntu Installation: the official Mellanox recommendation, listed here for reference only.
Run the following installation commands on both servers:
# apt-get install libmlx4-1 infiniband-diags ibutils ibverbs-utils rdmacm-utils perftest

**  The card defaults to InfiniBand (IB) mode; use mstconfig (the mstflint counterpart of mlxconfig) to switch both ports to ETH mode.
# mstconfig -d 01:00.0 set LINK_TYPE_P1=2 LINK_TYPE_P2=2

Device #1:
----------

Device type:    ConnectX5     
PCI device:     01:00.0       

Configurations:                              Next Boot       New
         LINK_TYPE_P1                        IB(1)           ETH(2)       
         LINK_TYPE_P2                        IB(1)           ETH(2)       

Apply new Configuration? ? (y/n) [n] : y
Applying... Done!
-I- Please reboot machine to load new configurations.

After a reboot the new configuration takes effect and both ports run in ETH mode.
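
After rebooting, the setting can be double-checked with a query (same mstflint mstconfig tool as above):
# mstconfig -d 01:00.0 query | grep LINK_TYPE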

** Enable iSER: load the ib_iser module and confirm it is loaded.
# modprobe ib_iser

# lsmod | grep iser
ib_iser                49152  0
rdma_cm                61440  3 rpcrdma,ib_iser,rdma_ucm
libiscsi               53248  3 libiscsi_tcp,iscsi_tcp,ib_iser
scsi_transport_iscsi    98304  4 iscsi_tcp,ib_iser,libiscsi
ib_core               221184  10 rdma_cm,ib_ipoib,rpcrdma,iw_cm,ib_iser,ib_umad,rdma_ucm,ib_uverbs,mlx5_ib,ib_cm
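
modprobe only loads the module for the current boot; to have ib_iser loaded automatically after a reboot, one common option on Ubuntu is to add it to /etc/modules:
# echo ib_iser >> /etc/modules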


**  Use ibstat to confirm the link status. To actually reach 100G there are a few prerequisites (a quick PCIe check follows this list):
    1. The card must sit in a PCIe Gen3 x16 slot.
    2. The attached InfiniBand/Ethernet cable must be rated for 100G.
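
One way to confirm the PCIe link is to read the link capability/status fields from lspci (field names can differ slightly between lspci versions); for this card the expected values are 8GT/s and x16:
# lspci -s 01:00.0 -vv | grep -E 'LnkCap|LnkSta'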
 
# ibstat
CA 'mlx5_0'
        CA type: MT4119
        Number of ports: 1
        Firmware version: 16.24.1000
        Hardware version: 0
        Node GUID: 0x1c34da03005d4508
        System image GUID: 0x1c34da03005d4508
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 100
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x04010000
                Port GUID: 0x1e34dafffe5d4508
                Link layer: Ethernet
CA 'mlx5_1'
        CA type: MT4119
        Number of ports: 1
        Firmware version: 16.24.1000
        Hardware version: 0
        Node GUID: 0x1c34da03005d4509
        System image GUID: 0x1c34da03005d4508
        Port 1:
                State: Down
                Physical state: Disabled
                Rate: 40
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 0x04010000
                Port GUID: 0x1e34dafffe5d4509
                Link layer: Ethernet

# lspci
01:00.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
01:00.1 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]

# ifconfig -a
enp0s31f6: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.10.8.55  netmask 255.255.255.0  broadcast 10.10.8.255
        inet6 fe80::428d:5cff:feb4:af78  prefixlen 64  scopeid 0x20<link>
        ether 40:8d:5c:b4:af:78  txqueuelen 1000  (Ethernet)
        RX packets 243241  bytes 28357786 (28.3 MB)
        RX errors 0  dropped 43816  overruns 0  frame 0
        TX packets 212004  bytes 229846081 (229.8 MB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device interrupt 16  memory 0xed200000-ed220000

enp1s0f0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9000
        inet 10.10.10.1  netmask 255.255.255.0  broadcast 10.10.10.255
        inet6 fe80::1e34:daff:fe5d:4508  prefixlen 64  scopeid 0x20<link>
        ether 1c:34:da:5d:45:08  txqueuelen 1000  (Ethernet)
        RX packets 38564681  bytes 258283792450 (258.2 GB)
        RX errors 0  dropped 10  overruns 0  frame 0
        TX packets 52412745  bytes 433726224495 (433.7 GB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp1s0f1: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        ether 1c:34:da:5d:45:09  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 3216  bytes 217429 (217.4 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 3216  bytes 217429 (217.4 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Configure the server and client IPs; both machines use the first port (enp1s0f0).
Set the server IP:
# ifconfig enp1s0f0 up 10.10.10.1/24 mtu 9000
Set the client IP:
# ifconfig enp1s0f0 up 10.10.10.2/24 mtu 9000
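
These ifconfig settings do not persist across reboots. If only iproute2 is available, the equivalent commands (shown here for the server side) would be roughly:
# ip link set enp1s0f0 up mtu 9000
# ip addr add 10.10.10.1/24 dev enp1s0f0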
     

**  Benchmark: use iperf to measure transfer speed.
      numactl is also used to bind the process to a CPU/NUMA node and avoid cross-NUMA memory access penalties; in practice this should only matter on multi-socket server boards.
# apt-get install numactl
# apt-get install iperf

** Check how many NUMA nodes the platform has
# numactl --hardware
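
To see which NUMA node the NIC itself sits on (so that --cpunodebind can target the same node), the sysfs entry for the device can be read; a value of -1 means the platform does not report NUMA locality:
# cat /sys/class/net/enp1s0f0/device/numa_node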

Server setup:
# numactl --cpunodebind=0 iperf -s -P8 -w 256K
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size:  416 KByte (WARNING: requested  256 KByte)
------------------------------------------------------------

The server now waits here for the client to connect.


Client setup:
# numactl --cpunodebind=0 iperf -c 10.10.10.1  -t 60 -P8 -w 256K
------------------------------------------------------------
Client connecting to 10.10.10.1, TCP port 5001
TCP window size:  416 KByte (WARNING: requested  256 KByte)
------------------------------------------------------------
[  9] local 10.10.10.2 port 53176 connected with 10.10.10.1 port 5001
[ 10] local 10.10.10.2 port 53178 connected with 10.10.10.1 port 5001
[  4] local 10.10.10.2 port 53166 connected with 10.10.10.1 port 5001
[  8] local 10.10.10.2 port 53174 connected with 10.10.10.1 port 5001
[  7] local 10.10.10.2 port 53172 connected with 10.10.10.1 port 5001
[  3] local 10.10.10.2 port 53164 connected with 10.10.10.1 port 5001
[  6] local 10.10.10.2 port 53170 connected with 10.10.10.1 port 5001
[  5] local 10.10.10.2 port 53168 connected with 10.10.10.1 port 5001


After the client connects, the server shows the connection status:
root@sam:/home/sam# numactl --cpunodebind=0 iperf -s -P8 -w 256K
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size:  416 KByte (WARNING: requested  256 KByte)
------------------------------------------------------------
[  4] local 10.10.10.1 port 5001 connected with 10.10.10.2 port 53164
[  5] local 10.10.10.1 port 5001 connected with 10.10.10.2 port 53166
[  6] local 10.10.10.1 port 5001 connected with 10.10.10.2 port 53168
[  7] local 10.10.10.1 port 5001 connected with 10.10.10.2 port 53172
[  8] local 10.10.10.1 port 5001 connected with 10.10.10.2 port 53170
[  9] local 10.10.10.1 port 5001 connected with 10.10.10.2 port 53174
[ 10] local 10.10.10.1 port 5001 connected with 10.10.10.2 port 53176
[ 11] local 10.10.10.1 port 5001 connected with 10.10.10.2 port 53178

Server output after the test completes:
root@sam:/home/sam# numactl --cpunodebind=0 iperf -s -P8 -w 256K
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size:  416 KByte (WARNING: requested  256 KByte)
------------------------------------------------------------
[  4] local 10.10.10.1 port 5001 connected with 10.10.10.2 port 53164
[  5] local 10.10.10.1 port 5001 connected with 10.10.10.2 port 53166
[  6] local 10.10.10.1 port 5001 connected with 10.10.10.2 port 53168
[  7] local 10.10.10.1 port 5001 connected with 10.10.10.2 port 53172
[  8] local 10.10.10.1 port 5001 connected with 10.10.10.2 port 53170
[  9] local 10.10.10.1 port 5001 connected with 10.10.10.2 port 53174
[ 10] local 10.10.10.1 port 5001 connected with 10.10.10.2 port 53176
[ 11] local 10.10.10.1 port 5001 connected with 10.10.10.2 port 53178
[ ID] Interval       Transfer     Bandwidth
[  5]  0.0-60.0 sec  72.4 GBytes  10.4 Gbits/sec
[  8]  0.0-60.0 sec  70.3 GBytes  10.1 Gbits/sec
[ 10]  0.0-60.0 sec  72.8 GBytes  10.4 Gbits/sec
[ 11]  0.0-60.0 sec  70.3 GBytes  10.1 Gbits/sec
[  4]  0.0-60.0 sec  74.2 GBytes  10.6 Gbits/sec
[  6]  0.0-60.0 sec  73.0 GBytes  10.4 Gbits/sec
[  7]  0.0-60.0 sec  74.3 GBytes  10.6 Gbits/sec
[  9]  0.0-60.0 sec  72.2 GBytes  10.3 Gbits/sec

Client output after the test completes:
root@ubuntu_1804_server:/home/sam# numactl --cpunodebind=0 iperf -c 10.10.10.1  -t 60 -P8 -w 256K
------------------------------------------------------------
Client connecting to 10.10.10.1, TCP port 5001
TCP window size:  416 KByte (WARNING: requested  256 KByte)
------------------------------------------------------------
[  9] local 10.10.10.2 port 53176 connected with 10.10.10.1 port 5001
[ 10] local 10.10.10.2 port 53178 connected with 10.10.10.1 port 5001
[  4] local 10.10.10.2 port 53166 connected with 10.10.10.1 port 5001
[  8] local 10.10.10.2 port 53174 connected with 10.10.10.1 port 5001
[  7] local 10.10.10.2 port 53172 connected with 10.10.10.1 port 5001
[  3] local 10.10.10.2 port 53164 connected with 10.10.10.1 port 5001
[  6] local 10.10.10.2 port 53170 connected with 10.10.10.1 port 5001
[  5] local 10.10.10.2 port 53168 connected with 10.10.10.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  9]  0.0-60.0 sec  72.8 GBytes  10.4 Gbits/sec
[ 10]  0.0-60.0 sec  70.3 GBytes  10.1 Gbits/sec
[  4]  0.0-60.0 sec  72.4 GBytes  10.4 Gbits/sec
[  8]  0.0-60.0 sec  72.2 GBytes  10.3 Gbits/sec
[  7]  0.0-60.0 sec  74.3 GBytes  10.6 Gbits/sec
[  3]  0.0-60.0 sec  74.2 GBytes  10.6 Gbits/sec
[  6]  0.0-60.0 sec  70.3 GBytes  10.1 Gbits/sec
[  5]  0.0-60.0 sec  73.0 GBytes  10.5 Gbits/sec
[SUM]  0.0-60.0 sec   580 GBytes  83.0 Gbits/sec
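
iperf only measures TCP throughput. To check raw RDMA bandwidth, the perftest tools installed earlier can be used as well; a typical run (start the server side first, then point the client at it) might look like:
Server:
# ib_write_bw -d mlx5_0 --report_gbits
Client:
# ib_write_bw -d mlx5_0 --report_gbits 10.10.10.1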

References:
https://zhuanlan.zhihu.com/p/74082377
https://community.mellanox.com/s/article/getting-started-with-connectx-5-100gb-s-adapters-for-linux
https://community.mellanox.com/s/article/howto-setup-rdma-connection-using-inbox-driver--rhel--ubuntu-x
https://community.mellanox.com/s/article/howto-configure-lio-enabled-with-iser-for-ubuntu-14-04-inbox-driver

This describes how to rebuild the driver and install it manually:
http://benjr.tw/28088

If an iSCSI disk has been exported with LIO, it shows up on the initiator like this:
# lsscsi
[1:0:0:0]    disk    ATA      INTEL SSDSC2CT24 335t  /dev/sda
[5:0:0:0]    cd/dvd  ATAPI    DVD D  DH16D2S   EP52  /dev/sr0
[6:0:0:0]    disk    LIO-ORG  iscsi-ramdisk    4.0   /dev/sdb
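
For reference, logging in to such a target over iSER with open-iscsi usually looks something like the following; the IQN and portal below are placeholders rather than values taken from this setup:
# apt-get install open-iscsi
# iscsiadm -m discovery -t sendtargets -p 10.10.10.1
# iscsiadm -m node -T iqn.2003-01.org.example:ramdisk -p 10.10.10.1 -o update -n iface.transport_name -v iser
# iscsiadm -m node -T iqn.2003-01.org.example:ramdisk -p 10.10.10.1 --login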

root@ubuntu_1804_server:/home/sam# dd if=/dev/sdb of=/dev/null bs=64k iflag=direct
65536+0 records in
65536+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 6.98333 s, 615 MB/s

root@ubuntu_1804_server:/home/sam# dd if=/dev/sdb of=/dev/null bs=1M iflag=direct
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 1.77295 s, 2.4 GB/s
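
dd only exercises single-threaded sequential reads. For a broader look at the iSER disk, a tool such as fio (installed separately with apt-get install fio) could be used; the parameters below are just one reasonable starting point, not results from this test:
# fio --name=randread --filename=/dev/sdb --readonly --direct=1 --rw=randread --bs=4k --ioengine=libaio --iodepth=32 --numjobs=4 --runtime=30 --time_based --group_reporting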


** Another way to check the device status is with ibv_devinfo:
# ibv_devinfo

hca_id: mlx5_0
        transport:                      InfiniBand (0)
        fw_ver:                         16.24.1000
        node_guid:                      1c34:da03:0057:49ec
        sys_image_guid:                 1c34:da03:0057:49ec
        vendor_id:                      0x02c9
        vendor_part_id:                 4119
        hw_ver:                         0x0
        board_id:                       MT_0000000008
        phys_port_cnt:                  1
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             4096 (5)
                        sm_lid:                 1
                        port_lid:               1
                        port_lmc:               0x00
                        link_layer:             InfiniBand

hca_id: mlx5_1
        transport:                      InfiniBand (0)
        fw_ver:                         16.24.1000
        node_guid:                      1c34:da03:0057:49ed
        sys_image_guid:                 1c34:da03:0057:49ec
        vendor_id:                      0x02c9
        vendor_part_id:                 4119
        hw_ver:                         0x0
        board_id:                       MT_0000000008
        phys_port_cnt:                  1
                port:   1
                        state:                  PORT_DOWN (1)
                        max_mtu:                4096 (5)
                        active_mtu:             4096 (5)
                        sm_lid:                 0
                        port_lid:               65535
                        port_lmc:               0x00
                        link_layer:             InfiniBand


