Important Note: The definitive source for Lustre documentation is the Lustre Operations Manual available at https://wiki.hpdd.intel.com/display/PUB/Documentation. These documents are copied from internal SSEC working documentation that may be useful to some, but we provide no guarantee of accuracy, correctness, or safety. Use at your own risk.
Notes on how to perform a Lustre upgrade for ZFS-based Lustre filesystems
This details the process for a point release upgrade of Lustre on ZFS, for example from Lustre 2.3 to 2.4 or from 2.4.0 to 2.4.2.
These steps must be executed on the MDS and OSSs.
Use lshowmount -l on the MDS to see which clients have the Lustre filesystem mounted, then unmount it on each of those clients.
After clients are unmounted, unmount the MDS followed by the OSSs.
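As a rough aid for the client check, the helper below pulls a unique list of client NIDs out of lshowmount -l output. It is only a sketch: the function name client_nids is made up for this example, and it assumes that client NIDs appear as address@network tokens (for example 10.0.0.5@o2ib) while everything else, such as target names, can be ignored.

```shell
#!/bin/sh
# Extract unique client NIDs from `lshowmount -l` output read on stdin.
# Assumption: NIDs are tokens of the form address@network (e.g. 10.0.0.5@o2ib);
# any other token, such as a target name, is skipped.
client_nids() {
    tr -s ' \t' '\n' | grep -E '^[^@[:space:]]+@[A-Za-z0-9]+$' | sort -u
}

# Typical use on the MDS (not run here):
#   lshowmount -l | client_nids
```

An empty result suggests that no clients are still mounted, but confirm against the raw lshowmount output before unmounting the servers.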
service lustre stop
service lnet stop
The first step in this process is to uninstall the Lustre module from the system. This is done so that the new Lustre module can be built cleanly against the kernel that you will be using.
yum remove lustre lustre-dkms
An existing Lustre file system in a ZFS pool is untouched by this process. When the new Lustre is installed and the pool is mounted, it will automatically be upgraded to the new version of Lustre.
At this point, verify that the horrible Lustre weak-updates are gone from the system. If they are not, you will probably need to delete them to prevent conflicts when the new Lustre module is built.
ls -l /lib/modules/*/weak-updates
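If the ls output is long, a small helper can narrow it to module files only. This is a sketch under assumptions: the function name list_weak_updates is hypothetical, and it lists every module file or symlink under any weak-updates directory rather than trying to guess which ones belong to Lustre, so the result still needs to be reviewed by hand.

```shell
#!/bin/sh
# List kernel module files (or symlinks) left under any weak-updates directory.
# The optional argument replaces /lib/modules; it exists only to make testing easy.
list_weak_updates() {
    modroot=${1:-/lib/modules}
    find "$modroot" -path '*/weak-updates/*' -name '*.ko*' \
        \( -type f -o -type l \) 2>/dev/null | sort
}

# Typical use (not run here):
#   list_weak_updates
```

Nothing is deleted by this helper; remove leftover Lustre entries manually once you are sure they are stale.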
Lustre updates usually bring in new kernel support. We should use this opportunity to upgrade the kernel on the Lustre servers for performance improvements, bug fixes, and security fixes.
yum update kernel-VERSION kernel-devel-VERSION kernel-firmware-VERSION
You do not want the latest kernel from the CentOS repo; you want the latest kernel supported by the version of Lustre you are installing. You will need to spell out the exact rpm names for the above updates, using the correct VERSION.
FOR EXAMPLE: yum install kernel*-2.6.32-358.23.2.el6.x86_64
Now, ensure that the new kernel is selected in /etc/grub.conf and reboot.
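After the reboot, it is worth confirming that the server really came up on the intended kernel before building anything against it. A minimal sketch follows; kernel_matches is a hypothetical helper name, and the release string in the usage comment is simply the one from the example above.

```shell
#!/bin/sh
# Succeed only if the running kernel release equals the expected one.
# The optional second argument overrides `uname -r`; it exists only for testing.
kernel_matches() {
    expected=$1
    running=${2:-$(uname -r)}
    if [ "$running" = "$expected" ]; then
        echo "OK: running $running"
    else
        echo "MISMATCH: running $running, expected $expected" >&2
        return 1
    fi
}

# Typical use (not run here):
#   kernel_matches 2.6.32-358.23.2.el6.x86_64
```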
After booting into the new kernel, you can update ZFS.
yum update zfs
That should take care of all of the dependencies from the ZFS repo.
At this point, we should be ready to build the Lustre module on the new kernel.
yum install lustre lustre-dkms
This step takes a while, so let it chug away.
THIS STEP IS IMPORTANT!!! DUE TO DKMS STUPIDITY WE HAVE TO EXECUTE THE FOLLOWING COMMANDS:
dkms remove --all spl
dkms remove --all zfs
dkms install --force spl
dkms install --force zfs
Reboot
That is the only way to get the modules to load correctly as of 6/16/2014. Otherwise, the zpool import will not detect the filesystems.
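One way to sanity-check the rebuild is to look at dkms status and confirm that both spl and zfs report as installed. The helper below is a sketch, not part of the original procedure: dkms_installed is a made-up name, and it assumes status lines that begin with the module name followed by a comma or slash and end with ": installed", which may differ between dkms versions.

```shell
#!/bin/sh
# Read `dkms status` output on stdin and succeed if the named module
# has at least one line reporting it as installed.
# Assumed line format (may vary by dkms version):
#   spl, 0.6.2, 2.6.32-358.23.2.el6.x86_64, x86_64: installed
dkms_installed() {
    module=$1
    grep -E "^${module}[,/]" | grep -q ': installed'
}

# Typical use (not run here):
#   dkms status | dkms_installed spl && dkms status | dkms_installed zfs
```

If either check fails, compare against the raw dkms status output before repeating the remove and install steps.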
Ensure that the appropriate configuration files are still there:
/etc/zfs/vdev_id.conf
/etc/ldev.conf
/etc/modprobe.d/lustre.conf
If they are, ensure that the ib0 network connection is up.
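The configuration file check can be scripted so that nothing is overlooked on a server. A minimal sketch, assuming only the three paths listed above; the function name missing_configs and its prefix argument are introduced here for illustration.

```shell
#!/bin/sh
# Report which of the three configuration files are missing.
# The optional prefix is empty on a real server; it exists only for testing.
missing_configs() {
    prefix=${1:-}
    status=0
    for f in /etc/zfs/vdev_id.conf /etc/ldev.conf /etc/modprobe.d/lustre.conf; do
        if [ ! -f "$prefix$f" ]; then
            echo "missing: $f"
            status=1
        fi
    done
    return $status
}

# Typical use (not run here):
#   missing_configs && echo "all configuration files present"
```

The function returns non-zero if any file is absent, so it can gate the Lustre start in a wrapper script.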
Start Lustre on the MDS first, followed by the OSSs.
service lnet start
service lustre start
The filesystem should go through recovery. Test remounting on a single client first. Monitor /var/log/messages on the MDS and OSSs for errors.
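To make the log monitoring less tedious, the lines that Lustre tags as errors can be filtered out of the syslog. This sketch assumes that errors of interest carry the LustreError prefix in the kernel log; other problems (for example ZFS or network messages) will not be caught by it, and the function name lustre_errors is hypothetical.

```shell
#!/bin/sh
# Print lines containing "LustreError" from a syslog file.
# Defaults to /var/log/messages; returns non-zero when no such line exists.
lustre_errors() {
    logfile=${1:-/var/log/messages}
    grep 'LustreError' "$logfile"
}

# Typical use on the MDS or an OSS (not run here):
#   lustre_errors | tail -n 20
```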