Virtual I/O Server Part 1 : Shared Storage Pool enhancements

Everybody knows that I’m a huge fan of Shared Storage Pools; you can check my previous posts on the subject on this blog. With the new version of the Virtual I/O Server, Shared Storage Pools have been enhanced with some cool features : simplified management, pool mirroring, pool shrinking and the long-awaited unicast mode to get rid of multicast. This post will show you that Shared Storage Pools are now powerful and ready to be used for production servers (this is my own opinion). I won’t come back to this later, but be aware that the maximum pool size is now 16TB and that the pool can now serve 250 Virtual I/O Clients. Here we go :

Rolling updates

Rolling updates have been available on the Virtual I/O Server for a while, but this is the first time I am using them. This feature is not a new enhancement brought by this version, but it is still good to write about it :-). Rolling updates allow you to update the Virtual I/O Servers (members of a Shared Storage Pool) one by one without causing an outage of the entire cluster. Each Virtual I/O Server has to leave the cluster before starting the update and can rejoin the cluster after its update and reboot. A special node called the database primary node (DBN) checks every ten minutes whether the Virtual I/O Servers are ON_LEVEL or UP_LEVEL. Before trying the new Shared Storage Pool features I had to update my Virtual I/O Servers; here is a little reminder on how to do it with rolling updates :

  • Find out which node is the current database primary node in the cluster (this is the last one I’ll update) :
  • # cluster -clustername vio-cluster -status -verbose -fmt : -field pool_name pool_state node_name node_mtm node_partition_num node_upgrade_status node_roles
  • The DBN role moves from one Virtual I/O Server to another when the current DBN leaves the cluster :
  • vio-ssp:OK:vios1.domain.test:8202-E4C02064099R:2:UP_LEVEL:
     : :vios3.domain.test:8202-E4C02064011R:2:ON_LEVEL:
  • On the Virtual I/O Server you want to update, leave the cluster before running the update :
  • # clstartstop -stop -n vio-cluster -m $(hostname)
    # mount nim_server:/export/nim/lpp_source/vios2231-fp27-lpp_source /mnt
    # updateios -dev /mnt -accept -install
  • When the update is finished, re-join the cluster :
  • # ioslevel
    # clstartstop -start -n vio-cluster -m $(hostname)
  • On any Virtual I/O Server you can list the cluster and check if Virtual I/O Servers are ON_LEVEL or UP_LEVEL :
  • # cluster -clustername vio-cluster -status -verbose -fmt : -field pool_name pool_state node_name node_mtm node_partition_num node_upgrade_status node_roles
  • All backing devices served by the Shared Storage Pool remain available on each node, whether it is ON_LEVEL or UP_LEVEL.
  • When all the Virtual I/O Servers are updated (including the last one, in my case the DBN), the node_upgrade_status of every Virtual I/O Server will be ON_LEVEL. Remember that you have to wait at least 10 minutes before checking that all Virtual I/O Servers are ON_LEVEL :
  • $ cluster -clustername vio-cluster -status -verbose -fmt : -field pool_name pool_state node_name node_mtm node_partition_num node_upgrade_status node_roles
    vio-ssp:OK:vios1.domain.test:8202-E4C02064099R:2: ON_LEVEL:
    vio-ssp:OK:vios2.domain.test:8202-E4C02064099R:1: ON_LEVEL:DBN
    vio-ssp:OK:vios3.domain.test:8202-E4C02064011R:2: ON_LEVEL:
    vio-ssp:OK:vios4.domain.test:8202-E4C02064011R:1: ON_LEVEL:
  • The Shared Storage Pool upgrade is triggered from the root user’s crontab. You can run the sspupgrade command by hand if you want to check whether a Shared Storage Pool upgrade is running :
  • # crontab -l | grep sspupgrade
    0,10,20,30,40,50 * * * * /usr/sbin/sspupgrade -upgrade
    # sspupgrade -status
    No Upgrade in progress
  • If the nodes are not all at the same ioslevel you won’t be able to use the new commands (check the output below) :
  • # failgrp -list
    The requested operation can not be performed since the software capability is currently not enabled.
    Please upgrade all nodes within the cluster and retry the request once the upgrade has completed successfully.
  • If you are moving from any earlier version to this one, the cluster communication mode will change from multicast to unicast as part of the rolling upgrade operation. You have nothing to do.
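
The ON_LEVEL check and the DBN lookup described above can be scripted. This is only a minimal sketch: it assumes the colon-separated layout shown in the listing above (field 3 is the node name, field 6 the upgrade status, field 7 the roles), the helper names not_on_level and dbn_node are mine, and the sample data is copied from this post rather than read from a live cluster.

```shell
# Sample output copied from the listing above. On a real Virtual I/O Server
# you would pipe the "cluster ... -status -verbose -fmt :" command instead.
sample_status() {
cat <<'EOF'
vio-ssp:OK:vios1.domain.test:8202-E4C02064099R:2: ON_LEVEL:
vio-ssp:OK:vios2.domain.test:8202-E4C02064099R:1: ON_LEVEL:DBN
vio-ssp:OK:vios3.domain.test:8202-E4C02064011R:2: ON_LEVEL:
vio-ssp:OK:vios4.domain.test:8202-E4C02064011R:1: ON_LEVEL:
EOF
}

# Hypothetical helper: print the nodes whose upgrade status is not ON_LEVEL.
# The status field may carry a leading space, so spaces are stripped first.
not_on_level() {
    awk -F: '{ gsub(/ /, "", $6); if ($6 != "ON_LEVEL") print $3 }'
}

# Hypothetical helper: print the node that currently holds the DBN role.
dbn_node() {
    awk -F: '$7 ~ /DBN/ { print $3 }'
}

echo "DBN: $(sample_status | dbn_node)"
if [ -z "$(sample_status | not_on_level)" ]; then
    echo "all nodes ON_LEVEL"
fi
```

Knowing the DBN up front is handy because, as said above, it is the node I update last.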

Mirroring the Shared Storage Pool with failgrp command

One of the major drawbacks of Shared Storage Pools was resilience: the Shared Storage Pool was not able to mirror luns from one site to another. The failgrp command introduced in this new version is used to mirror the Shared Storage Pool across different SAN arrays and easily answers the resiliency question. In my opinion this was the missing feature needed to deploy Shared Storage Pools in a production environment. One of the coolest things about it is that you have NOTHING to do at the lpar level. All the mirroring is performed by the Virtual I/O Servers and the Shared Storage Pool themselves, so there is no need at all to verify the mirroring of your volume groups on each lpar. The single point of management for mirroring is now the failure group. :-)

  • By default, after upgrading all nodes, all luns are assigned to a failure group called Default; you can rename it if you want to. Notice that the Shared Storage Pool is not mirrored at this stage :
  • $ failgrp -list -fmt : -header
    $ failgrp -modify -clustername vio-cluster -sp vio-ssp -fg Default -attr FG_NAME=failgrp_site1
    Given attribute(s) modified successfully.
    $  failgrp -list -fmt : -header
    $ cluster -clustername vio-cluster -status -verbose 
        Pool Name:            vio-ssp
        Pool Id:              000000000AFD672900000000529641F4
        Pool Mirror State:    NOT_MIRRORED
  • Create the second failure group on the second site (be careful with the command syntax) :
  • $ failgrp -create -clustername vio-cluster -sp vio-ssp -fg failgrp_site2: hdiskpower141
    failgrp_site2 FailureGroup has been created successfully.
    $ failgrp -list -fmt : -header
  • While the failgrp command is mirroring the pool, and once it has finished, you can check the mirror state of the pool (SYNCED here) :
  • $ cluster -status -clustername vio-cluster -verbose
        Pool Name:            vio-ssp
        Pool Id:              000000000AFD672900000000529641F4
        Pool Mirror State:    SYNCED
  • A new command called pv is introduced to identify the luns in the Shared Storage Pool. It quickly shows the luns used by the pool along with their respective failure group :
  • $ pv -list
    POOL_NAME: vio-ssp
    FG_NAME: failgrp_site1
    PV_NAME          SIZE(MB)    STATE            UDID
    hdiskpower156    55770       ONLINE           1D06020F6709SYMMETRIX03EMCfcp
    hdiskpower1      111540      ONLINE           1D06020F6909SYMMETRIX03EMCfcp
    POOL_NAME: vio-ssp
    FG_NAME: failgrp_site2
    PV_NAME          SIZE(MB)    STATE            UDID
    hdiskpower2      223080      ONLINE           1D06020F6B09SYMMETRIX03EMCfcp
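
Since the pool is mirrored between the two failure groups, it can be useful to compare their raw capacity. A small sketch, assuming the default pv -list layout shown above; the helper name fg_capacity is mine and the sample data is the output from this post.

```shell
# Sample "pv -list" output copied from the listing above.
sample_pv_list() {
cat <<'EOF'
POOL_NAME: vio-ssp
FG_NAME: failgrp_site1
PV_NAME          SIZE(MB)    STATE            UDID
hdiskpower156    55770       ONLINE           1D06020F6709SYMMETRIX03EMCfcp
hdiskpower1      111540      ONLINE           1D06020F6909SYMMETRIX03EMCfcp
POOL_NAME: vio-ssp
FG_NAME: failgrp_site2
PV_NAME          SIZE(MB)    STATE            UDID
hdiskpower2      223080      ONLINE           1D06020F6B09SYMMETRIX03EMCfcp
EOF
}

# Hypothetical helper: print "failure_group total_size_in_MB", one line per
# failure group, in the order the groups appear in the listing.
fg_capacity() {
    awk '
        $1 == "FG_NAME:" {
            fg = $2
            if (!(fg in total)) { order[++n] = fg; total[fg] = 0 }
            next
        }
        $1 == "POOL_NAME:" || $1 == "PV_NAME" { next }   # skip header lines
        NF >= 2 { total[fg] += $2 }                      # column 2 is SIZE(MB)
        END { for (i = 1; i <= n; i++) print order[i], total[order[i]] }
    '
}

sample_pv_list | fg_capacity
```

On a real system you would replace sample_pv_list with the pv -list command itself.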

Mapping made easier with the lu command

If, like me, you have ever used a Shared Storage Pool, you probably already know that mapping a device was not so easy with the mkbdsp command. Once again the new version of the Virtual I/O Server tries to simplify logical unit and backing device management with the addition of a new command called lu. Its purpose is to handle logical unit creation, listing, mapping and removal in a single easy-to-use command. I won’t go through every option here, but here are a few nice examples :

  • Create a thin backing device called bd_lu01 and map it to vhost4 :
  • $ lu -create -clustername vio-cluster -sp vio-ssp -lu lu01 -vadapter vhost4 -vtd bd_lu01 -size 10G
    Lu Name:lu0011
    Lu Udid:e982bf85313bcafe0af7653e8e39c3d9
    Assigning logical unit 'lu01' as a backing device.
  • Do not forget to map it on the second Virtual I/O Server :
  • $ lu -map -clustername vio-cluster -sp vio-ssp -lu lu01 -vadapter vhost4 -vtd bd_lu01
    Assigning logical unit 'lu01' as a backing device.
  • List all logical units viewed in the Shared Storage Pool :
  • $ lu -list
    POOL_NAME: vio-ssp
    LU_NAME                 SIZE(MB)    UNUSED(MB)  UDID
    lu01                  10240       10240       e982bf85313bcafe0af7653e8e39c3d9
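
The listing gives both the size and the unused space of each logical unit, so the space actually consumed by a thin device can be derived from it. A minimal sketch, assuming the lu -list layout shown above and assuming that used space is simply SIZE(MB) minus UNUSED(MB); the helper name lu_used is mine.

```shell
# Sample "lu -list" output copied from the listing above.
sample_lu_list() {
cat <<'EOF'
POOL_NAME: vio-ssp
LU_NAME                 SIZE(MB)    UNUSED(MB)  UDID
lu01                  10240       10240       e982bf85313bcafe0af7653e8e39c3d9
EOF
}

# Hypothetical helper: print "lu_name used_MB" for every logical unit,
# where used_MB is computed as SIZE(MB) - UNUSED(MB).
lu_used() {
    awk '
        $1 == "POOL_NAME:" || $1 == "LU_NAME" { next }   # skip header lines
        NF >= 3 { print $1, $2 - $3 }
    '
}

sample_lu_list | lu_used
```

A freshly created thin logical unit like lu01 has all of its space unused, so nothing is reported as consumed yet.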

Physical volume management made easier with the pv command

Before this new release, the only thing you could do was replace a lun in the Shared Storage Pool with the chsp command; you were not able to remove a lun from the pool. The new release aims to simplify Shared Storage Pool management, and a new command called pv is introduced to add, replace, remove and list luns from a single, easy command.

  • Adding a disk to the Shared Storage Pool :
  • $ pv -add -clustername vio-cluster -sp vio-ssp -fg failgrp_site1: hdiskpower1
    Given physical volume(s) have been added successfully.
  • Replacing a disk in the Shared Storage Pool (note that you can’t replace a disk with a smaller one, even if the old one is not fully used) :
  • $ pv -replace -clustername vio-cluster -sp vio-ssp -oldpv hdiskpower2 -newpv hdiskpower1
    Current request action progress: % 5
    Current request action progress: % 100
    The capacity of the new device(s) is less than the capacity
    of the old device(s). The old device(s) cannot be replaced.
    $ pv -replace -clustername vio-cluster -sp vio-ssp -oldpv hdiskpower156 -newpv hdiskpower1
    Current request action progress: % 5
    Current request action progress: % 6
    Current request action progress: % 100
    Given physical volume(s) have been replaced successfully.
  • Unfortunately the REPDISK cannot be replaced with this command; you have to use the chrepos command :
  • $ lspv | grep hdiskpower0
    hdiskpower0      00f7407858d6a19d                     caavg_private    active
    $ pv -replace -oldpv hdiskpower0 -newpv hdiskpower156
    The specified PV is not part of the storage pool
  • The physical volumes in use can easily be listed with this command :
  • $ pv -list -verbose -fmt : -header
    vio-ssp:SYSTEM:failgrp_site1:hdiskpower1:111540:ONLINE:1D06020F6909SYMMETRIX03EMCfcp:PowerPath Device
    vio-ssp:SYSTEM:failgrp_site2:hdiskpower2:223080:ONLINE:1D06020F6B09SYMMETRIX03EMCfcp:PowerPath Device

Removing a disk from the Shared Storage Pool

It is now possible to remove a disk from the Shared Storage Pool; this new feature is known as pool shrinking.

  • Removing a disk from the Shared Storage Pool :
  • $ pv -remove -clustername vio-cluster -sp vio-ssp -pv hdiskpower1
    Given physical volume(s) have been removed successfully.

Repository disk failure and replacement

The Shared Storage Pool is now able to stay up without its repository disk. Having an error on the repository disk is not a problem: the disk can be replaced with the chrepos command as soon as you find an error on it, and it’s pretty easy to do. Here is a little reminder :

    • To identify the repository disk you can use a CAA command as root :
    • # /usr/lib/cluster/clras lsrepos
      hdisk558 has a cluster repository signature.
      hdisk559 has a cluster repository signature.
      hdisk560 has a cluster repository signature.
      Cycled 680 disks.
      hdisk561 has a cluster repository signature.
      hdiskpower139 has a cluster repository signature.
      Cycled 690 disks.
      Found 5 cluster repository disks.
    • For this example we are going to write some zeros to the repository disk, to show that it can no longer be read :
    • # lsvg -p caavg_private
      hdiskpower0       active            15          8           02..00..00..03..03
      # lsvg -l caavg_private
      LV NAME             TYPE       LPs     PPs     PVs  LV STATE      MOUNT POINT
      caalv_private1      boot       1       1       1    closed/syncd  N/A
      caalv_private2      boot       1       1       1    closed/syncd  N/A
      caalv_private3                 4       4       1    open/syncd    N/A
      powerha_crlv        boot       1       1       1    closed/syncd  N/A
      #  dd if=/dev/zero of=/dev/hdiskpower0 bs=1024 count=1024
      1024+0 records in
      1024+0 records out
      # lsvg -l caavg_private
      0516-066 : Physical volume is not a volume group member.
              Check the physical volume name specified.
    • The REPDISK cannot be removed outright. In an emergency you have to replace it with another lun :
    • $ chrepos -n vio-cluster -r -hdiskpower0
      chrepos: The removal of repository disks is not currently supported.
    • If the repository disk has a problem you can move it to another disk with the chrepos command :
    • $ chrepos -r -hdiskpower0,+hdiskpower141
      chrepos: Successfully modified repository disk or disks.

Cluster unicast communication by default

Unicast in a Cluster Aware AIX (CAA) cluster is a long-awaited feature requested by IBM customers. As everybody knows, Shared Storage Pools are based on a Cluster Aware AIX cluster, which used multicast before this release. Once all the Virtual I/O Servers that are members of a Shared Storage Pool are updated, the communication_mode attribute used by CAA is changed to unicast. This mode does not exist at all in previous versions, so do not try to modify it there. With this new feature a new command called clo is available to padmin. It lets you check and change CAA tunables, communication_mode included :

    • Before updating the Virtual I/O Server, the communication_mode tunable does not exist; you can check this with the CAA command clctrl (you have to be root) :
    • # clctrl -tune -a
         vio-cluster(791d19c8-5796-11e3-ac2c-5cf3fcea5648).config_timeout = 480
           vio-cluster(791d19c8-5796-11e3-ac2c-5cf3fcea5648).deadman_mode = a
           vio-cluster(791d19c8-5796-11e3-ac2c-5cf3fcea5648).link_timeout = 0
        vio-cluster(791d19c8-5796-11e3-ac2c-5cf3fcea5648).node_down_delay = 10000
           vio-cluster(791d19c8-5796-11e3-ac2c-5cf3fcea5648).node_timeout = 20000
             vio-cluster(791d19c8-5796-11e3-ac2c-5cf3fcea5648).packet_ttl = 32
       vio-cluster(791d19c8-5796-11e3-ac2c-5cf3fcea5648).remote_hb_factor = 10
             vio-cluster(791d19c8-5796-11e3-ac2c-5cf3fcea5648).repos_mode = e
      vio-cluster(791d19c8-5796-11e3-ac2c-5cf3fcea5648).site_merge_policy = p
    • As you can see in the output below, running the lscluster command on a Shared Storage Pool before the new version gives you the multicast address used by the CAA cluster :
    • $ lscluster -i | grep MULTICAST
                      IPv4 MULTICAST ADDRESS:
                      IPv4 MULTICAST ADDRESS:
                      IPv4 MULTICAST ADDRESS:
                      IPv4 MULTICAST ADDRESS:
    • After updating a node to the new version, a new command named clo is available in the /usr/ios/utils directory, along with a new tunable called communication_mode :
    • $ oem_setup_env
      # ls -l /usr/ios/utils/clo
      lrwxrwxrwx    1 root     system           20 Dec  2 10:20 /usr/ios/utils/clo -> /usr/lib/cluster/clo
      # exit
      $ clo -L communication_mode
      NAME                      DEF    MIN    MAX    UNIT           SCOPE
           ENTITY_NAME(UUID)                                                CUR
      communication_mode        m                                   c
           vio-cluster(791d19c8-5796-11e3-ac2c-5cf3fcea5648)                m
    • If the nodes in the cluster are not all at the same level (and your update is not finished) you won’t be able to use this tunable :
    • # clo -o communication_mode
      clo: 1485-506 The current cluster level does not support tunable communication_mode.
    • After updating all nodes (all nodes will then be ON_LEVEL), communication_mode is automatically set to unicast :
    • $  clo -o communication_mode
      vio-cluster(791d19c8-5796-11e3-ac2c-5cf3fcea5648).communication_mode = u
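
If you want to check the mode from a script, the single-letter value can be decoded. A small sketch, assuming the "name = value" layout of the clo -o output shown above, with u meaning unicast and m meaning multicast as seen in this post; the helper name comm_mode is mine.

```shell
# Sample "clo -o communication_mode" output copied from the listing above.
sample_clo() {
    echo 'vio-cluster(791d19c8-5796-11e3-ac2c-5cf3fcea5648).communication_mode = u'
}

# Hypothetical helper: translate the value of communication_mode
# (last whitespace-separated field of the line) into a readable word.
comm_mode() {
    awk '/communication_mode/ {
        if ($NF == "u") print "unicast"
        else if ($NF == "m") print "multicast"
        else print "unknown (" $NF ")"
    }'
}

sample_clo | comm_mode
```

On a real Virtual I/O Server you would pipe the clo -o communication_mode command into the helper instead of the sample.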

To sum up : in my opinion Shared Storage Pools are now production ready. I’ve never seen them in production, or even in a development environment, but Shared Storage Pools really deserve to be used. Things are drastically simplified, even more so with this new version. Please try the new Shared Storage Pools and give me your feedback in the comments. Once again I hope it helps.

10 thoughts on “Virtual I/O Server Part 1 : Shared Storage Pool enhancements”

      • Hi,

        First of all, as I said in the post, you can re-create the repository disk with the chrepos command if it has been lost.
        One thing you can do is dump the repository with a CAA command to check that the chrepos command went fine :
        # /usr/lib/cluster/clras dumprepos

        The mappings and the Shared Storage Pool configuration are saved by the viosbr command; do not forget to run it on your Virtual I/O Servers :
        # viosbr -backup -clustername vio-cluster -file FileName -frequency daily
        To recover a lost database you can also use the viosbr command :
        # viosbr -recoverdb -clustername vio-cluster -file cfgbackups/vios1.domain.test.viosbr

        Tell me if you have further questions.



    1. Thanks for the answers. I have one more question…

      The disk is protected by a matrix, but the “shared storage pool” is a filesystem – poolfs. What happens when the filesystem fails?

      • Just to really understand your question :

        What happens if the poolfs is scrambled ?

        In my case this one :

        # mount | grep SSP
                 /var/vio/SSP/vio-cluster/D_E_F_A_U_L_T_061310 /var/vio/SSP/vio-cluster/D_E_F_A_U_L_T_061310 poolfs Dec 03 18:07

        Let me test this, but my first thought is to try to resync the cluster with the running configuration :

        $ cluster -sync -clustername vio-cluster

        Let me try this and i’ll update the answer.



    2. Hi,
      we have been using SSP for 4 months now. All was OK until this Tuesday, after an unsuccessful upgrade of one node. I restored the node from a mksysb but it was not able to join the cluster, and after some time the whole cluster went down. After a restart of the VIOSes only one VIOS could access the SSP at a time. No other VIOS could join the cluster. IBM identified a problem on the multicast network. The network design did not incorporate multicast over two sites. When I started building the cluster I did not know that there was some multicast communication, but somehow it worked over our two DCs until the upgrade failure.
      So to solve the issue I have upgraded all nodes to the new VIOS level, and the cluster is back and serving the SSP. I came up with this solution thanks to this article. Big THANKS.

      Otherwise I can recommend usage of SSPs. SSP was working well and now after the upgrade to latest version I believe it is really worth using. Next week I will setup failgroup and add cluster backup into my viobackup script.


    3. Hi Benoit,
      I have two questions:
      1) I tried upgrading my VIOS to the new level, but I got stuck on a build date verification failure for a couple of filesets. I tried reinstalling using the force flag and the overwrite same or newer versions option, but that did not work either. Do you have any idea how to bypass those errors?
      2) Assume I have to rebuild the VIOS from scratch, but I have saved the mapping and SSP configuration using the viosbr command. Can I restore all the configuration the way it was before? I have doubts because the hdisk numbers and some other configuration might change after the installation, and I am not sure the viosbr command is smart enough to detect those changes and restore accordingly.
      Thanks in advance,

      • Hi

        1) I’ve faced the same problem when updating. There were a few filesets installed on my Virtual I/O Server coming from the expansion DVD (some iSCSI target/initiator filesets, and some ldap client filesets). The Virtual I/O Server lpp_source does not provide these filesets, which can be a problem when upgrading: you have to download the expansion DVD from your Entitled Software Download, then add the filesets you want to the lpp_source before updating. Maybe you can give me your output and we can identify which fileset is the cause.
        2) I’ve never used viosbr on a Shared Storage Pool, but I regularly use it for SEA/vSCSI/NPIV. Don’t worry about your disks: viosbr is smart enough to check disks by their PVID (maybe UUID too). For vSCSI mappings it works like a charm, and for the creation of Shared Ethernet Adapters too. I trust this tool. Please be careful when using viosbr on an SSP, and use the right commands to save and restore :

        # # This will backup all the node that are UP
        # viosbr -backup -file /home/padmin/viosbr/$(hostname).viosbr.ssp -clustername vio-cluster -frequency daily
        # # Check your backup is ok, there is one xml file per node
        # viosbr -view -clustername vio-cluster -file /home/padmin/viosbr/$(hostname).viosbr.ssp
        # # Restore the whole pool 
        # viosbr -restore -clustername vio-cluster -file /home/padmin/viosbr/$(hostname) -repopvs hdiskpowerX

        Please let me know if I can help you further.



        • Thanks for the quick reply. And, sorry for the long outputs:

          # lppchk -v
          lppchk: The following filesets need to be installed or corrected to bring
          the system to a consistent state: (usr: APPLIED, root: not installed) (usr: APPLIED, root: not installed) (usr: APPLIED, root: not installed)
          bos.sysmgt.trace (usr: APPLIED, root: not installed)
          devices.chrp.IBM.lhca.rte (usr: APPLIED, root: not installed)
          devices.chrp.pci.rte (usr: APPLIED, root: not installed)
          devices.common.IBM.ib.rte (usr: APPLIED, root: not installed)
          devices.common.IBM.mpio.rte (usr: APPLIED, root: not installed)
          devices.common.IBM.mpt2.rte (usr: APPLIED, root: not installed)
          devices.common.IBM.sissas.rte (usr: APPLIED, root: not installed)
          devices.common.IBM.storfwork.rte (usr: APPLIED, root: not installed)
          devices.fcp.disk.rte (usr: APPLIED, root: not installed)
          devices.fcp.tape.rte (usr: APPLIED, root: not installed)
          devices.ide.cdrom.rte (usr: APPLIED, root: not installed)
          devices.iscsi_sw.rte (usr: APPLIED, root: not installed)
          devices.pci.14107802.rte (usr: APPLIED, root: not installed) (usr: APPLIED, root: not installed) (usr: APPLIED, root: not installed)
          devices.pciex.8680c71014108003.rte (usr: APPLIED, root: not installed) (usr: APPLIED, root: not installed)
          devices.usbif.08025002.rte (usr: APPLIED, root: not installed)
          devices.vdevice.IBM.VASI-1.rte (usr: APPLIED, root: not installed)
          devices.vdevice.IBM.v-scsi-host.rte (usr: APPLIED, root: not installed)
          devices.vdevice.IBM.v-scsi.rte (usr: APPLIED, root: not installed)
          devices.vdevice.IBM.vfc-server.rte (usr: APPLIED, root: not installed)
          devices.vdevice.vbsd.rte (usr: APPLIED, root: not installed)
          devices.vtdev.scsi.rte (usr: APPLIED, root: not installed)
          ios.migration.rte (usr: APPLIED, root: not installed)
          ios.paging.rte (usr: APPLIED, root: not installed)
          ios.sea (usr: APPLIED, root: not installed)
          pool.basic.rte (usr: COMMITTED, root: not installed)

          ***When I tried to install above filesets one at a time, e.g: pool.basic.rte, I get this,***

          Requisite Failures
          SELECTED FILESETS: The following is a list of filesets that you asked to
          install. They cannot be installed until all of their requisite filesets
          are also installed. See subsequent lists for details of requisites.

          pool.basic.rte # Virtual Server Storage Subsy…
          pool.basic.rte # Virtual Server Storage Subsy…

          CONFLICTING REQUISITES: The following filesets are required by one or
          more of the selected filesets listed above. There are other versions of
          these filesets which are already installed (or which were selected to be
          installed during this installation session). A base level fileset cannot
          be installed automatically as another fileset’s requisite when a different
          version of the requisite is already installed. You must explicitly select
          the base level requisite for installation.

          ios.vlog.rte # Virtual Log Device Software

          ***And when I tried to install ios.vlog.rte, I got the build date error for all the filesets listed by lppchk -v.***

          The system currently has ios.vlog.rte installed. I tried the force flag as well as the overwrite option to install ios.vlog.rte, but it did not work. Any help will be greatly appreciated.

          Also note that the filesets are already in the lpp_source.

    4. Pingback: Exploit the full potential of PowerVC by using Shared Storage Pools & Linked Clones | PowerVC secrets about Linked Clones (pooladm,mksnap,mkclone,vioservice) | chmod666

    5. Hi Benoit,

      It was great to meet you last week at the 2016 IBM TCC, and I really enjoyed your presentations.

      I am also a huge fan of shared storage pools. They have helped greatly speed up the delivery of LPARs in our test and dev environments and allow for easy LPM setup.

      I wanted to let the community know that we did experience a shared storage pool outage recently. Our pool has 14 VIOS nodes, and we had one VIOS run out of memory unexpectedly. This node also happened to be the mfsMgr (aka server node for the pool). The node was in a kind of limbo state where it wasn’t completely dead, so the pool did not elect a new server node. The rest of the nodes hung waiting on the server node to respond. A couple of LPARs using SSP backed disks were rebooted during this time, and they did not come up because they couldn’t find their rootvg disks anymore.

      Once I rebooted the VIOS that was out of memory, normal pool functions returned and the LPARs that couldn’t boot were able to see their rootvg disks again.

      The out of memory condition was caused by a known memory leak in the solid db database used by the VIOS.

      IBM provided me with two ifixes, one for each affected level, to fix the memory leak issue :

      IV83165m5a.160422.epkg.Z
      IV83165s6a.160415.epkg.Z

      I’m working to get these installed ASAP, I love SSPs and want my whole team to love them too!
