Upgrading from Integrated Manager for Lustre 4.0.x to Integrated Manager for Lustre 5.1.0.0 and Lustre 2.12.2

Upgrade Guide

Upgrade Integrated Manager for Lustre

The first component in the environment to upgrade is the Integrated Manager for Lustre server and software. The manager server upgrade can be conducted without any impact on the Lustre filesystem services.

Back up the Existing Configuration

Before commencing the upgrade, it is essential to complete a backup of the existing configuration. This will enable recovery of the original configuration if a problem occurs during execution of the upgrade.

The following shell script can be used to capture the essential configuration information that is relevant to the Integrated Manager for Lustre software itself:

#!/bin/sh
# Integrated Manager for Lustre (IML) server backup script

BCKNAME=bck-$HOSTNAME-`date +%Y%m%d-%H%M%S`
BCKROOT=$HOME/$BCKNAME
mkdir -p $BCKROOT
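# Stage the essential configuration files via a tar pipe; the package repo
# under /var/lib/chroma/repo is excluded because it can be re-created.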
tar cf - --exclude=/var/lib/chroma/repo \
/var/lib/chroma \
/etc/sysconfig/network \
/etc/sysconfig/network-scripts/ifcfg-* \
/etc/yum.conf \
/etc/yum.repos.d \
/etc/hosts \
/etc/passwd \
/etc/group \
/etc/shadow \
/etc/gshadow \
/etc/sudoers \
/etc/resolv.conf \
/etc/nsswitch.conf \
/etc/rsyslog.conf \
/etc/ntp.conf \
/etc/selinux/config \
/etc/ssh \
/root/.ssh \
| (cd $BCKROOT && tar xf -)

# IML Database
su - postgres -c "/usr/bin/pg_dumpall --clean" | /bin/gzip > $BCKROOT/pgbackup-`date +\%Y-\%m-\%d-\%H:\%M:\%S`.sql.gz

cd `dirname $BCKROOT`
tar zcf $BCKROOT.tgz `basename $BCKROOT`

Copy the backup tarball to a safe location that is not on the server being upgraded.
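
For example, the tarball can be copied over SSH to another machine; the destination host and directory below are placeholders for your environment:

    scp $HOME/bck-*.tgz admin@backup.example.com:/srv/iml-backups/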

Note: This script is not intended to provide a comprehensive backup of the entire operating system configuration. It covers the essential components pertinent to Lustre servers managed by Integrated Manager for Lustre that are difficult to re-create if deleted.

Stopping the filesystem

IML requires that the filesystem(s) associated with each node being upgraded be stopped. Follow these steps:

  1. Navigate to Configuration->Filesystems
  2. For each filesystem listed:

    1. Click the filesystem’s Actions button
    2. Select Stop

Install the Integrated Manager for Lustre Upgrade

The software upgrade process requires super-user privileges to run. Log in as the root user or use sudo to elevate privileges as required.

  1. Download the latest Integrated Manager for Lustre release repo:

    yum-config-manager --add-repo=https://github.com/whamcloud/integrated-manager-for-lustre/releases/download/v5.1.0/chroma_support.repo
    
  2. Verify that the old iml-4.0.x repo file has been removed from the repolist and that the 5.1 repo has been added.

    yum repolist
    
  3. Run the OS upgrade.

    yum clean metadata
    yum update
    

Refer to the operating system documentation for details on the correct procedure for upgrading between minor OS releases. Note that yum will display a summary of the packages to be newly installed and upgraded. Review this summary carefully and verify that python2-iml-manager is marked for upgrade to version 5.1.0.0.
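
Once the update completes, the installed manager version can also be confirmed directly from the RPM database:

    rpm -q python2-iml-manager   # the reported version should be 5.1.0.0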

  1. Run chroma-config setup to complete the installation:
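
    chroma-config setup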

  2. Perform a hard refresh on the browser and verify that IML loads correctly.

Upgrade the Lustre Servers

Lustre server upgrades can be coordinated either as an online roll-out, leveraging the failover HA mechanism to migrate services between nodes and minimize disruption, or as an offline service outage, which is usually faster to deploy overall and carries generally lower risk.

The upgrade procedure documented here describes the faster and more reliable approach, which requires that the filesystem be stopped. It assumes that the Lustre servers have been installed in pairs, where each server pair forms an independent high-availability cluster built on Pacemaker and Corosync. Integrated Manager for Lustre deploys these configurations and uses both the stock Lustre resource agent and the ClusterLabs ZFS resource agent. Integrated Manager for Lustre can also configure STONITH agents to provide node fencing in the event of a cluster partition or loss of quorum.
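
Before starting the server upgrades, it can be useful to record the current state of each HA pair from one of its nodes; pcs is the standard Pacemaker CLI on CentOS 7:

    pcs status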

Back up Existing Server Data

  1. As a precaution, create a backup of the existing configuration for each server. The following shell script can be used to capture the essential configuration information that is relevant to Integrated Manager for Lustre managed mode servers:

    #!/bin/sh
    BCKNAME=bck-$HOSTNAME-`date +%Y%m%d-%H%M%S`
    BCKROOT=$HOME/$BCKNAME
    mkdir -p $BCKROOT
    tar cf - \
    /var/lib/chroma \
    /etc/sysconfig/network \
    /etc/sysconfig/network-scripts/ifcfg-* \
    /etc/yum.conf \
    /etc/yum.repos.d \
    /etc/hosts \
    /etc/passwd \
    /etc/group \
    /etc/shadow \
    /etc/gshadow \
    /etc/sudoers \
    /etc/resolv.conf \
    /etc/nsswitch.conf \
    /etc/rsyslog.conf \
    /etc/ntp.conf \
    /etc/selinux/config \
    /etc/modprobe.d/iml_lnet_module_parameters.conf \
    /etc/corosync/corosync.conf \
    /etc/ssh \
    /root/.ssh \
    | (cd $BCKROOT && tar xf -)
    
    # Pacemaker Configuration:
    cibadmin --query > $BCKROOT/cluster-cfg-$HOSTNAME.xml
    
    cd `dirname $BCKROOT`
    tar zcf $BCKROOT.tgz `basename $BCKROOT`
    

    Note: This is not intended to be a comprehensive backup of the entire operating system configuration. It covers the essential components pertinent to Lustre servers managed by Integrated Manager for Lustre that are difficult to re-create if deleted. Make sure to back up any other important configuration files that may be on your system, such as multipath configurations.

    The following files will need to be backed up if multipath is being used on the system:

    /etc/multipath/*
    /etc/multipath.conf
    
  2. Copy the backups for each server’s configuration to a safe location that is not on the servers being upgraded.
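
    For example, the per-server tarballs can be pulled from an administrative host outside the cluster; the hostnames below follow the example set used later in this guide, and the destination directory is a placeholder:

    for host in mds1.local mds2.local oss1.local oss2.local; do
        scp root@$host:bck-*.tgz /srv/iml-backups/
    done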

Upgrade the OS on each server node

Before upgrading, make sure yum is configured on each storage server node to pull down CentOS 7.7 packages. Then upgrade the OS on each host and refresh the IML repos from the manager node:

  1. Upgrade the OS

    yum clean metadata
    yum update
    
  2. Update the repos on each server node. As an example, consider the following hosts: mds1.local, mds2.local, oss1.local, and oss2.local:

    [root@manager]# iml update_repo --hosts mds[1,2].local,oss[1,2].local
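
    The OS upgrade in step 1 can also be driven from the manager node over SSH rather than logging in to each server individually; a minimal sketch using the same example hosts:

    for host in mds1.local mds2.local oss1.local oss2.local; do
        ssh root@$host "yum clean metadata && yum -y update"
    done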
    

Run the updates

Next, navigate to the server page and proceed to update each of the servers:

  1. Navigate to Configuration->Servers
  2. Each storage server should report “Updates are ready for server X”
  3. Click the Install Updates button
  4. Select all storage servers for upgrade and begin the upgrade.

Setting up HA

  1. Once the upgrade completes, migrate the storage servers to the new HA setup. Run the following on each storage server:

    chroma-agent convert_targets
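
    After conversion, the Pacemaker configuration can be inspected on each node to confirm that the Lustre targets now appear as managed cluster resources:

    pcs status --full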
    

Summary

The filesystem(s) can now be started: navigate to Configuration->Filesystems and, for each filesystem, click the Actions button and select Start. Connect a client and verify that it is able to access files on the filesystem.
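
For example, from a client node; the MGS NID and filesystem name below are placeholders for your environment:

    mkdir -p /mnt/testfs
    mount -t lustre 10.0.0.10@tcp:/testfs /mnt/testfs
    touch /mnt/testfs/upgrade-check && ls -l /mnt/testfs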
