Updating and backing up Virtual I/O Servers with NIM : Story of APARs IV46060, IV????? and IV?????

I recently had to find the best way to update a bunch of Virtual I/O Servers at once. For the past couple of months I have been making heavy use of the newer NIM features such as DSM, so my first thought was to use NIM to update all my Virtual I/O Servers. You have probably noticed that recent NIM versions have a new operation called “updateios”. With this operation come two new object types: vios (a Virtual I/O Server machine) and ios_mksysb (a mksysb created by the backupios command on the Virtual I/O Server). I’m probably the only guy using this, because at the time of writing the updateios operation does not work as shipped. For the IBMers reading this post: I had the chance to work with the French L3 Virtual I/O Server support on two PMRs (a big thanks to them for their skills and efficiency), which you can look up :

  • PMR 84369,664,706 : NIM updateios operation hanging on the NIM master, resulting in two APARs (IV????? and IV?????), both still in validation at the time of writing.
  • PMR 84152,664,706 : NIM updateios problem with /usr/lpp/bos.sysmgt/nim/methods/c_updateios resulting in one APAR (IV46060) (http://www-01.ibm.com/support/docview.wss?crawler=1&uid=isg1IV46060).

After a few weeks of work with the support team we finally found workarounds for both problems, and this post explains them. If there is one lesson to remember from this post, it is this one : “Always subscribe to SWMA support, because they are damn brilliant”.

Defining the Virtual I/O Server object

If you are reading this post I hope you have already read my post about NIM’s lesser-known features. If you have not had time to read it, here is a reminder. Before running any operation on a Virtual I/O Server, you have to create the management objects associated with it :

  • Create the HMC object :
  • # dpasswd -f myhmc_passwd -U hscroot
    Password file is /etc/ibm/sysmgt/dsm/config/myhmc_passwd
    Password:
    Re-enter password:
    Password file created.
    # dkeyexch -f /etc/ibm/sysmgt/dsm/config/myhmc_passwd -I hmc -H myhmc
    OpenSSH_6.0p1, OpenSSL 0.9.8x 10 May 2012
    # nim -o define -t hmc -a if1="find_net myhmc 0" -a passwd_file=/etc/ibm/sysmgt/dsm/config/myhmc_passwd myhmc
    
  • Create the CEC object. In this example I use the nimquery command to find the machine type, model and serial number :
  • # nimquery -a hmc=myhmc -p | grep ^CEC
    [..]
    CEC SERVER1 - 8202-E4B_6565655 :
    CEC SERVER2 - 8205-E6B_0606065 :
    [..]
    # nim -o define -t cec -a hw_type=8202 -a hw_model=E4B -a hw_serial=6565655 -a mgmt_source=myhmc SERVER1 
    
  • Create the vios object. In this example I use the nimquery command to find the value for the identity field (the lpar_id) :
  • # nimquery -a cec=SERVER1 -p
    [..]
    LPAR my_vios - lpar_id 2 :
            allow_perf_collection = 1
            auto_start = 0
            curr_lpar_proc_compat_mode = POWER7
            curr_profile = my_vios
            default_profile = my_vios
            desired_lpar_proc_compat_mode = default
            logical_serial_num = 6565655
            lpar_avail_priority = 191
            lpar_env = vioserver
            lpar_id = 2
            lpar_keylock = norm
            msp = 1
            name = my_vios
            os_version = VIOS 2.2.2.1
            power_ctrl_lpar_ids = none
            redundant_err_path_reporting = 0
            resource_config = 1
            rmc_ipaddr = 10.10.20.107
            rmc_state = active
            shared_proc_pool_util_auth = 1
            state = Running
            time_ref = 0
            work_group_id = none
    [..]
    # nim -o define -t vios -a if1="1020-10-10-20-0-s24-net my_vios 0" -a mgmt_source="SERVER1" -a identity=2  my_vios
    
  • Check that everything is OK with the lsnim command :
  • # lsnim -t hmc
    myhmc       management       hmc
    # lsnim -t cec
    SERVER2     management       cec
    # lsnim -t vios
    my_vios           management       vios
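
When there are many managed systems, the CEC definitions can be generated from the nimquery listing instead of being typed by hand. The sketch below assumes the "CEC <name> - <type>-<model>_<serial> :" line format shown above; the function name cec_define_cmds is made up for this example, and it only prints the nim commands so that they can be reviewed before being run.

```shell
# Turn "CEC SERVER1 - 8202-E4B_6565655 :" lines (read from stdin) into
# "nim -o define -t cec ..." commands. Prints only; nothing is executed.
# Intended use: nimquery -a hmc=myhmc -p | cec_define_cmds myhmc
cec_define_cmds() {
    hmc=$1
    awk -v hmc="$hmc" '
        $1 == "CEC" && $3 == "-" {
            name = $2
            split($4, tm, "_")        # tm[1] = "8202-E4B", tm[2] = "6565655"
            split(tm[1], t, "-")      # t[1]  = "8202",     t[2]  = "E4B"
            printf "nim -o define -t cec -a hw_type=%s -a hw_model=%s -a hw_serial=%s -a mgmt_source=%s %s\n",
                   t[1], t[2], tm[2], hmc, name
        }'
}
```

Lines that do not start with CEC are ignored, so the full nimquery output can be piped in without a grep.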
    

Setting up the Virtual I/O Server as a NIM client

Few people know that a Virtual I/O Server can be set up as a NIM client. Remember that you should never have to use oem_setup_env to perform administration tasks on a Virtual I/O Server. To set up a Virtual I/O Server as a NIM client, use a dedicated command called remote_management as padmin; it is the Virtual I/O Server equivalent of the niminit command. Keep in mind that remote_management sets up the NIM client to use the nimsh protocol (this is important for the rest of this post) :

  • You will probably have to add entries for the NIM servers to your /etc/hosts file :
  • # hostmap -addr 10.10.20.140 -host my_nim1 my_nim1.lab.chmod666.org
    # hostmap -addr 10.10.20.141 -host my_nim2 my_nim2.lab.chmod666.org
    
  • Enable remote_management :
  • # remote_management -interface en0 my_nim1
    nimsh:2:wait:/usr/bin/startsrc -e "LIBPATH=/usr/lib" -g nimclient >/dev/console 2>&1
    0513-059 The nimsh Subsystem has been started. Subsystem PID is 7340278.
    
  • If you have to disable remote_management, use the -disable option :
  • # remote_management -disable
    0513-044 The nimsh Subsystem was requested to stop.
    
  • Check that nimsh is running :
  • # ps -ef | grep nimsh
        root 5767198 5963976   0   Aug 23      -  0:00 /usr/sbin/nimsh -s
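
A plain "ps -ef | grep nimsh" can also match the grep process itself, which makes it awkward to use in a script. A small filter such as the one below, whose name nimsh_running is an assumption of this sketch, reads ps output on stdin and returns success only when a real /usr/sbin/nimsh process is listed.

```shell
# Succeeds (exit 0) if the "ps -ef" output on stdin contains a
# /usr/sbin/nimsh process, ignoring any grep line that mentions nimsh.
# Intended use: ps -ef | nimsh_running && echo "nimsh is up"
nimsh_running() {
    awk '/\/usr\/sbin\/nimsh/ && !/grep/ { found = 1 } END { exit !found }'
}
```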
    

Backing up the Virtual I/O Server by creating an ios_mksysb resource

Before updating the Virtual I/O Server, create an ios_mksysb. Most PowerVM administrators run a backup script from the Virtual I/O Server itself, but you can now invoke the backupios command from the NIM server. You can do this for all your Virtual I/O Servers and store the ios_mksysb images on the NIM server, which is much easier than running a command on each Virtual I/O Server and mounting an NFS share on it :

# nim -o define -t ios_mksysb -a source=my_vios -a location=/export/nim/mksysb/my_vios/my_vios-ios_mksysb  -a server=master -a mk_image=yes my_vios-ios_mksysb
+---------------------------------------------------------------------+
                System Backup Image Space Information
              (Sizes are displayed in 1024-byte blocks.)
+---------------------------------------------------------------------+
Required = 7316181 (7145 MB)    Available = 386230180 (377178 MB)


/tmp/7274624.mnt0/my_vios-ios_mksysb  doesn't exist.

Creating /tmp/7274624.mnt0/my_vios-ios_mksysb
Backup in progress.  This command can take a considerable amount of time
to complete, please be patient...


Creating information file (/image.data) for rootvg.

Creating list of files to back up.
....
Backing up 169631 files............
51526 of 169631 files (30%)..............................
155443 of 169631 files (91%)..

169631 of 169631 files (100%)
0512-038 savevg: Backup Completed Successfully.
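
To back up several Virtual I/O Servers in one pass, the define command above can be generated per server. This is only a sketch: the vios object names, the BASE directory layout and the dated resource name are assumptions, the per-server directory is expected to exist already, and the commands are printed rather than executed.

```shell
# Print one "nim -o define -t ios_mksysb" command per VIOS.
# BASE and the dated resource name are assumptions made for this sketch.
BASE=${BASE:-/export/nim/mksysb}

ios_mksysb_cmd() {
    vios=$1
    stamp=$2                       # for example 20130930
    res="${vios}-ios_mksysb-${stamp}"
    echo "nim -o define -t ios_mksysb -a source=${vios} -a location=${BASE}/${vios}/${res} -a server=master -a mk_image=yes ${res}"
}

for v in my_vios1 my_vios2; do     # hypothetical NIM vios object names
    ios_mksysb_cmd "$v" "$(date +%Y%m%d)"
done
```

Putting the date in the resource name keeps older backups defined side by side instead of overwriting them.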

While this command is running you can have a look at the Virtual I/O Server. By running proctree on the nimsh process you can check that the backupios command is running with the -mksysb flag :

# proctree -a  9240678
1    /etc/init
   3342492    /usr/sbin/srcmstr
      5046448    /usr/sbin/nimsh -s
         10813570    /usr/sbin/nimsh -s
            6160534    /bin/ksh /usr/lpp/bos.sysmgt/nim/methods/c_nimpush /usr/lpp/bos.sysmgt/nim/meth
               7274624    /bin/ksh /usr/lpp/bos.sysmgt/nim/methods/c_backupios -aserver=my_nim1 -al
                  9240678    /usr/ios/cli/ioscli backupios -file /tmp/7274624.mnt0/my_vios-ios_mksysb -mk
                     10158278    /bin/ksh /usr/bin/savevg -X -i -f /tmp/7274624.mnt0/my_vios-ios_mksysb rootv
                        8585348    /bin/ksh /usr/bin/savevg -X -i -f /tmp/7274624.mnt0/my_vios-ios_mksysb rootv
                           10223832    /usr/bin/sleep 10
                        9764964    /usr/bin/cat /tmp/mksysb.10158278/.archive.list.10158278
                        11337872    backbyname -i -q -v -Z -p -U -f /tmp/7274624.mnt0/my_vios-ios_mksysb

After the ios_mksysb creation you can check the source and the ioslevel of your backup :

# lsnim -l my_vios-ios_mksysb
my_vios-ios_mksysb:
   class         = resources
   type          = ios_mksysb
   arch          = power
   Rstate        = ready for use
   prev_state    = unavailable for use
   location      = /export/nim/mksysb/my_vios/my_vios-ios_mksysb
   version       = 6
   release       = 1
   mod           = 8
   oslevel_r     = 6100-07
   alloc_count   = 0
   server        = master
   creation_date = Mon Sep 30 11:52:35 2013
   source_image  = my_vios
   ioslevel      = 2.2.2.1
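
Before going on to the update it can be worth checking from a script that the backup really is usable. The helper below, whose name lsnim_attr is made up for this sketch, pulls a single attribute out of "lsnim -l" output in the format shown above.

```shell
# Print the value of one attribute from "lsnim -l" output read on stdin.
# Intended use: lsnim -l my_vios-ios_mksysb | lsnim_attr Rstate
lsnim_attr() {
    awk -v key="$1" '
        $1 == key && $2 == "=" {
            sub(/^[^=]*= */, "")   # drop everything up to and including "= "
            print
            exit
        }'
}
```

A wrapper script could then refuse to start the update unless Rstate is "ready for use" and ioslevel matches the level the Virtual I/O Server is currently running.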

Committing existing updates on the Virtual I/O Server with the updateios operation

Commit all uncommitted updates on the Virtual I/O Server. The NIM operation invokes the “ioscli updateios -commit” command on the Virtual I/O Server. Remember to remove all ifixes/efixes before committing (use emgr) :

# /usr/sbin/emgr -r -L IV16920s02
# nim -o updateios -a lpp_source=vios2223-fp26-sp02-lpp_source  -a accept_licenses=yes -a preview=no -a updateios_flags="-commit" -a force=yes my_vios
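
If several interim fixes are installed, the removal commands can be derived from the "emgr -l" listing. The sketch below assumes the usual layout of that listing, where data rows start with a numeric ID and carry the fix label in the third column; the function name efix_remove_cmds is hypothetical, and the commands are only printed.

```shell
# Read "emgr -l" style output on stdin and print one removal command per fix.
# Assumption: data rows start with a numeric ID and have the label in column 3.
# Intended use (as root): emgr -l | efix_remove_cmds
efix_remove_cmds() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 3 { print "/usr/sbin/emgr -r -L " $3 }'
}
```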

Updating the Virtual I/O Server with the updateios operation

First of all, a Virtual I/O Server that is an active member of a Shared Storage Pool cluster can’t be updated. Take the node offline in the cluster before running the update :

#  clstartstop -stop -n my_cluster -m my_vios

You will face two problems when updating a Virtual I/O Server from NIM with the updateios operation. Running an updateios operation from the NIM server calls the script /usr/lpp/bos.sysmgt/nim/methods/c_updateios on the Virtual I/O Server. As shipped, the updateios operation fails with this output :

# nim -o updateios -a lpp_source=vios2223-fp26-sp02-lpp_source  -a accept_licenses=yes -a preview=no -a updateios_flags="-install" -a force=yes my_vios
[..]
******************************************************************************
End of installp PREVIEW.  No apply operation has actually occurred.
******************************************************************************

Continue bos.rte.install installation [y|n]?
[..]
******************************************************************************
End of installp PREVIEW.  No apply operation has actually occurred.
******************************************************************************

Continue the installation [y|n]?
Command did not complete.

As you can see in the output, the updateios command is interactive and asks TWO yes/no questions. On the Virtual I/O Server, while the updateios operation is running, you can check that /usr/lpp/bos.sysmgt/nim/methods/c_updateios is called by the nimsh process :

# proctree 15466556
4260044    /usr/sbin/srcmstr
   7340280    /usr/sbin/nimsh -s -c
      12451968    /usr/sbin/nimsh -s -c
         15466556    /bin/ksh /usr/lpp/bos.sysmgt/nim/methods/c_nimpush /usr/lpp/bos.sysmgt/nim/meth
            14352628    /bin/ksh /usr/lpp/bos.sysmgt/nim/methods/c_updateios -aaccept_licenses=yes -afo
               10944754    /usr/ios/cli/ioscli updateios -install -dev /tmp/_nim_dir_14352628/mnt0 -f -acc
                  5374158    installp -e install.log -a -d /tmp/_nim_dir_14352628/mnt0 bos.rte.install
                     9961620    installp -e install.log -a -d /tmp/_nim_dir_14352628/mnt0 bos.rte.install

If you edit /usr/lpp/bos.sysmgt/nim/methods/c_updateios you can see, at line 130, that ‘y’ is sent only once :

# vi /usr/lpp/bos.sysmgt/nim/methods/c_updateios
[..]
                -install)
                        argument="-install -dev $lpp_access ${force:+-f} ${accept_licenses:+-accept}"
                        if [[ $preview = "no" ]]; then
                                command="eval echo 'y' | /usr/ios/cli/ioscli updateios $argument"
                        else
                                command="eval echo 'n' | /usr/ios/cli/ioscli updateios $argument"
                        fi
                        ;;
[..]

Replace the ‘y’ with ‘y\ny’ and the script will send two ‘y’ answers, easy :-) :

# grep -n eval /usr/lpp/bos.sysmgt/nim/methods/c_updateios | head -1
130:                            command="eval echo 'y\ny' | /usr/ios/cli/ioscli updateios $argument"

Rerun the NIM operation and the update will start.
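
Why a single ‘y’ is not enough can be reproduced without a Virtual I/O Server. In the sketch below, fake_updateios is a made-up stand-in for the interactive command: it asks two questions and gives up when an answer is missing. printf is used to feed the answers because it expands \n the same way in every POSIX shell, whereas echo 'y\ny' relies on the ksh echo doing so.

```shell
# Stand-in for the interactive updateios run (hypothetical helper): it asks
# two y/n questions on stderr and fails if either answer is missing or not "y".
fake_updateios() {
    for question in "Continue bos.rte.install installation" "Continue the installation"; do
        printf '%s [y|n]? ' "$question" >&2
        if ! read -r answer || [ "$answer" != "y" ]; then
            echo "Command did not complete."
            return 1
        fi
    done
    echo "update started"
}

# One answer, as in the unmodified script: the second read hits end of input.
printf 'y\n' | fake_updateios || true

# Two answers, as in the modified script.
printf 'y\ny\n' | fake_updateios
```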

At the end of the installation you will probably face another problem. This one occurs only if the Virtual I/O Server NIM client is using the nimsh protocol. The NIM operation hangs forever on the NIM server because, on the Virtual I/O Server, a socket remains open between the NIM client and the NIM server :

# netstat -Aan |grep 3901
f1000e0001cb2bb8 tcp4       0      0  10.10.20.107.3901   10.10.20.140.1021   ESTABLISHED
f1000e00098bdbb8 tcp        0      0  *.3901                *.*                   LISTEN
# rmsock f1000e00098bdbb8 tcpcb
The socket 0xf1000e00098bd808 is being held by proccess 8126526 (accessprocess).
#  rmsock f1000e0001cb2bb8 tcpcb
The socket 0xf1000e0001cb2808 is being held by proccess 12386388 (cimserver).
#  proctree 12386388
12386388
   8323090    /usr/ios/lpm/sbin/eventhelper --events ref_code,lpar_state,not_ivm,migration_st
# proctree 8126526
15269920    /usr/bin/ksh /usr/ios/lpm/sbin/lparmgr all start
   8126526    /usr/ios/lpm/sbin/accessprocess
# ps -ef |grep 12386388
    root  8323090 12386388   0 15:44:31      -  0:00 /usr/ios/lpm/sbin/eventhelper --events ref_code,lpar_state,not_ivm,migration_state,vsp_state
    root 12386388        1   0 15:42:56      -  0:16 [cimserve]

The issue was found with the support: a script called climgr, which is called by cimserver, does not correctly close its file descriptors at the end of the update. Modify this script to close all open file descriptors :

# grep -n exec /usr/ios/sbin/climgr
366:exec 1<&-
367:exec 2<&-
368:exec 5<&-
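
One plausible reading of this hang, and it is an assumption rather than something stated in the PMR, is the classic inherited-descriptor problem: a long-lived process started during the update keeps a copy of the caller's descriptors, so the other end never sees the connection close. The sketch below reproduces the effect with a pipe standing in for the nimsh socket and sleep standing in for the long-lived process.

```shell
# A pipe stands in for the nimsh socket; "sleep" stands in for a helper
# process that outlives the update. Both are illustrations, not VIOS code.

# The helper inherits stdout, so a reader on the pipe has to wait for it too.
leaky() {
    sleep 4 &
    echo "update finished"
}

# Same helper, started after closing stdout and stderr, which is what
# "exec 1<&-" and "exec 2<&-" do for the rest of a script.
tidy() {
    ( exec 1<&- 2<&-; sleep 4 ) &
    echo "update finished"
}

tidy | cat      # the reader gets end of file as soon as tidy returns
# leaky | cat   # the reader would only return once the sleep has ended
```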

Rerun the operation and everything will just work fine :-)
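
With both workarounds in place, the commit and install steps can be chained for a list of Virtual I/O Servers. This is a sketch under assumptions: the vios object names are hypothetical, the lpp_source is the one used in this post, and by default the nim commands are only printed (RUN defaults to echo; run the script with RUN set to an empty value to execute them).

```shell
# RUN=echo (default) prints the commands; an empty RUN executes them.
RUN=${RUN-echo}
LPP_SOURCE=vios2223-fp26-sp02-lpp_source

# Commit first, then install; stop at the first step that fails.
update_vios() {
    vios=$1
    lpp=$2
    for flag in -commit -install; do
        $RUN nim -o updateios -a lpp_source="$lpp" -a accept_licenses=yes \
            -a preview=no -a updateios_flags="$flag" -a force=yes "$vios" || return 1
    done
}

for v in my_vios1 my_vios2; do     # hypothetical NIM vios object names
    update_vios "$v" "$LPP_SOURCE" || { echo "stopping: $v failed" >&2; break; }
done
```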

Conclusion

I assume these two problems will be fixed in the next Virtual I/O Server release, probably not in version 2.2.3.0 but in the one after (on average I have to wait six months before a fix is delivered in the current version). Once again I want to thank IBM Support for helping me with these cases and for their efficiency. I hope it helps.

4 thoughts on “Updating and backing up Virtual I/O Servers with NIM : Story of APARs IV46060, IV????? and IV?????”

  1. Hello,

    we built vios in the same way a week ago with jean.
    but we didn’t perform updateios through NIM yet, in order to go to the latest release (2.2.2.3-FP26_SP02 awful naming….)

    By reading your blog post, we will gain some precious time !
    thanks a lot for sharing !

  2. hello,

    After applying the tips you described, I invoked the updateios operation from the NIM server and met nearly the same issue.

    The updateios operation stalled on the NIM server although the update had finished on the VIO side :

    VIO :
    # netstat -Aan |grep 3901
    f1000e00028423b8 tcp 0 0 *.3901 *.* LISTEN
    f1000e0000b653b8 tcp4 0 0 10.254.138.50.3901 10.254.138.16.1023 ESTABLISHED

    # rmsock f1000e0000b653b8 tcpcb
    The socket 0xf1000e0000b65008 is being held by proccess 7930100 (ksh).

    I prefer “ps -T” to proctree ;-)

    # ps -T 7930100 -o pid,ppid,user,command,args
    PID PPID USER COMMAND COMMAND
    7930100 1 root ksh /usr/bin/ksh /usr/ios/lpm/sbin/lparmgr all start
    9896058 7930100 root \--accessprocess \--/usr/ios/lpm/sbin/accessprocess

    Obviously NIMSH is the one listening on port 3901…

    # rmsock f1000e00028423b8 tcpcb
    The socket 0xf1000e0002842008 is being held by proccess 4456590 (nimsh).
    # ps -T 4456590 -o pid,ppid,user,command,args
    PID PPID USER COMMAND COMMAND
    4456590 4194500 root nimsh /usr/sbin/nimsh -s

    It seems that the process “lparmgr” is blocked in kernel space :

    # truss -p 7930100
    truss: 0915-023 Cannot control process #7930100.

    Unlike on a “regular” VIO, where we can “truss” it :

    # ps -edf | grep lpar
    root 6553800 1 0 16:03:54 - 0:00 /usr/bin/ksh /usr/ios/lpm/sbin/lparmgr all start
    root 7536822 4784264 1 16:57:57 pts/0 0:00 grep lpar
    # truss -p 6553800
    kwaitpid(0x00000000, 0, 0, 0x00000000, 0x00000000) (sleeping...)
    Pstatus: process is not stopped

    With KDB we can see :

    (0)> p 121
    SLOT NAME STATE PID PPID ADSPACE CL #THS

    pvproc+01E400 121 ksh ACTIVE 07900F4 0000001 0000000879CCF480 0 0001

    (0)> f 121
    pvthread+007900 STACK:
    [000E47F8]e_block_thread+000298 ()
    [000E5368]e_sleep_thread+0000E8 (??, ??, ??)
    [002B7910]j2PagerThread+0001B0 (??)
    [00387774]threadentry+000094 (??, ??, ??, ??)

    … it seems to be a bug, or I made a mistake somewhere…

    So I finally rebooted the VIO and everything seems all right…

    • OK, I had the same issue and worked on it with the support: close the file descriptors in lparmgr like you did in climgr (after # Start Main Execution)

      # Start Main Execution
      if [ $# -lt 2 ] ; then
      usage
      exit 0
      fi

      --> close FD here

      One IV was opened for climgr ….
      A second one was opened for lparmgr …

      Hope it helps.

      Regards.

      Benoit.

  3. I like your posts: very advanced and packed with practical command samples. I’d love to stay in touch with your blog updates.
