Important Note: The definitive source for Lustre documentation is the Lustre Operations Manual available at https://wiki.hpdd.intel.com/display/PUB/Documentation. These documents are copied from internal SSEC working documentation that may be useful for some, but we provide no guarantee of accuracy, correctness, or safety. Use at your own risk.
At SSEC we tested mirrored metadata targets for our Lustre file systems. The idea is to use ZFS to mirror storage targets across 2 different MDS, so the data is always available on both servers without using iSCSI or other technologies. The basis of this idea comes from Charles Taylor's LUG 2012 presentation "High Availability Lustre Using SRP-Mirrored LUNs".
Instead of LVM, we will use ZFS to provide the mirror. SCST provided the targets over InfiniBand RDMA, and together with ZFS mirrors it performed well in our testing. We did not have a chance to test more thoroughly for production.
Below are notes from our investigation.
The device to which data will be written. Usually it controls a group of LUNs (think OSS, not OST or individual disk).
The system or device attempting to access the target. Client system in our case.
Despite its name, this can be implemented without RDMA. As we would likely implement it, it is a protocol used to communicate with SCSI devices directly over RDMA.
A layer of abstraction on the iSCSI protocol implemented by a "Datamover Architecture" and with RDMA support. The basic idea is simple: RDMA allows devices to reach each other's memory directly. When an initiator begins an unsolicited write, the disk uses the protocol to read the data from the initiator directly while writing to itself. So the target effectively goes and reads the data off of the initiator.
It appears that Datera got this thing into the Linux kernel, but deployment and usage documentation is nonexistent or very hard to actually find.
TargetCLI is a Python CLI management interface for the targets.
This framework includes a few components:
- Core/Engine software
- Target "Drivers" - I put drivers in quotes because this part is implemented as a kernel module and they call it a driver, but it is software that controls the Target (think OSS) and doesn't really provide a hardware driver as far as I understand.
- Storage Drivers - This is the part that implements the SCSI commands on the LUN (in our case, attached OSTs).
We will likely need to compile/link the target and storage drivers against a kernel version, and install only with that kernel version. We already link kernel versions with Lustre, so this may not be unreasonable.
This doesn't perform as well as SCST according to the research. It's considered obsolete, but I included the definition because it may be mentioned in a lot of documentation.
NOTE BEFORE IMPLEMENTING: THIS APPEARS TO USE 128KB INODE SIZES WHEN EXPORTED VIA ZPOOLS
This requires the OFA OFED stack and links against it.
export KDIR=/usr/src/kernels/2.6.32-431.el6.x86_64/
export KVER=2.6.32-431.el6.x86_64
make scst && make scst_install
make srpt && make srpt_install
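The kernel version above is hard-coded for one kernel. As a minimal sketch, assuming kernel sources live under /usr/src/kernels/<version> as in the lines above, the two variables could be derived from a single version string. The function name scst_build_env is hypothetical and is not part of SCST.

```shell
#!/bin/sh
# Hypothetical helper (not part of SCST): print the two export lines for a
# given kernel version, defaulting to the running kernel.
# Assumption: kernel sources are under /usr/src/kernels/<version>.
scst_build_env() {
    kver="${1:-$(uname -r)}"
    printf 'export KDIR=/usr/src/kernels/%s/\n' "$kver"
    printf 'export KVER=%s\n' "$kver"
}

# Usage before running make:
#   eval "$(scst_build_env 2.6.32-431.el6.x86_64)"
```

Printing the export lines instead of setting the variables directly lets you inspect them before they are applied.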
Then load the modules into the kernel, and set it up to start on boot.
/usr/lib/lsb/install_initd scst
chkconfig --add scst
modprobe scst
modprobe ib_srpt
modprobe scst_vdisk
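After loading the modules it can be worth confirming that all three are actually present. A minimal sketch, assuming the usual lsmod output with one header line and the module name in the first column, is shown here. The function name missing_modules is hypothetical.

```shell
#!/bin/sh
# Hypothetical check: read lsmod-style text on stdin and print every module
# named in the arguments that does not appear in the first column.
# Prints nothing when all requested modules are loaded.
missing_modules() {
    # Skip the "Module Size Used by" header, keep only module names.
    loaded=$(awk 'NR > 1 { print $1 }')
    for mod in "$@"; do
        printf '%s\n' "$loaded" | grep -qxF "$mod" || printf '%s\n' "$mod"
    done
}

# Usage on the target host:
#   lsmod | missing_modules scst ib_srpt scst_vdisk
```

Matching whole lines against the first column only means a module that merely appears in another module's "Used by" list is not counted as loaded.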
zfs create -V 300G shps-meta/MDT
zfs set canmount=off shps-meta
scstadmin -open_dev MDT1 -handler vdisk_blockio -attributes filename=/dev/zvol/shps-meta/MDT
scstadmin -list_device
scstadmin -list_target
ls -l /sys/kernel/scst_tgt/devices
scstadmin --add_group MDS -driver ib_srpt -target ib_srpt_target_0
scstadmin -list_group
scstadmin -add_lun 0 -driver ib_srpt -target ib_srpt_target_0 -group MDS -device MDT1
scstadmin -enable_target ib_srpt_target_0 -driver ib_srpt
scstadmin -set_drv_attr ib_srpt -attributes enabled=1
# cat /etc/modprobe.d/ib_srpt.conf
options ib_srpt one_target_per_port=1
scstadmin -add_init '*' -driver ib_srpt -target ib_srpt_target_0 -group MDS
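The scstadmin calls above are written for one device (MDT1), one group (MDS), and one target (ib_srpt_target_0). As a rough sketch for repeating them with other names, the following dry-run helper only prints the same commands with the names substituted and executes nothing. The function name scst_export_cmds is hypothetical. The scstadmin options are copied from these notes and have not been checked against other SCST versions.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print (do not run) the scstadmin calls used in
# these notes, in the same order, for one block device.
# Arguments: SCST device name, backing zvol path, group name, target name.
scst_export_cmds() {
    dev="$1"; zvol="$2"; group="$3"; target="$4"
    echo "scstadmin -open_dev $dev -handler vdisk_blockio -attributes filename=$zvol"
    echo "scstadmin -add_group $group -driver ib_srpt -target $target"
    echo "scstadmin -add_lun 0 -driver ib_srpt -target $target -group $group -device $dev"
    echo "scstadmin -enable_target $target -driver ib_srpt"
    echo "scstadmin -set_drv_attr ib_srpt -attributes enabled=1"
    echo "scstadmin -add_init '*' -driver ib_srpt -target $target -group $group"
}

# Usage: review the output, then run the lines by hand.
#   scst_export_cmds MDT1 /dev/zvol/shps-meta/MDT MDS ib_srpt_target_0
```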
modprobe ib_srp
srp_daemon -oacd/dev/infiniband/umad0
find /sys -iname add_target -print
echo "id_ext=0002c90300b77f40,ioc_guid=0002c90300b77f40,dgid=fe800000000000000002c90300b77f41,pkey=ffff,service_id=0002c90300b77f40" > /sys/devices/pci0000:00/0000:00:03.0/0000:04:00.0/infiniband_srp/srp-mlx4_0-1/add_target
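The string written to add_target is a single comma-separated list, so a stray space or line break when pasting it is an easy mistake. A minimal sketch of a helper that assembles it from the five fields used above follows. The function name srp_target_string is hypothetical, and the helper only formats text; it does not talk to the kernel.

```shell
#!/bin/sh
# Hypothetical helper: build the comma-separated target description in the
# field order used in these notes. The values would come from the target's
# entry in the srp_daemon output.
# Arguments: id_ext ioc_guid dgid pkey service_id
srp_target_string() {
    printf 'id_ext=%s,ioc_guid=%s,dgid=%s,pkey=%s,service_id=%s\n' \
        "$1" "$2" "$3" "$4" "$5"
}

# Usage on the initiator (the path is the one reported by the find command):
#   srp_target_string 0002c90300b77f40 0002c90300b77f40 \
#       fe800000000000000002c90300b77f41 ffff 0002c90300b77f40 \
#       > /sys/devices/.../infiniband_srp/srp-mlx4_0-1/add_target
```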
scstadmin -write_config /etc/scst.conf
chkconfig --list scst
chkconfig --list srpd
chkconfig --list rdma
SRP_LOAD=yes