Adventures in IBM Systems Director in System P environment. Part 7 : Introduction to Smart Cloud Entry

If you’ve read the six previous parts of this series of posts you know that managing Power Systems through VMControl and IBM Systems Director is not so easy. Nobody wants to type ugly long command lines with puzzling parameters that nobody understands. Nobody wants to check that everything is OK on the Virtual I/O Servers, on the Systems Director, on the storage side, and so on. Customers and administrators just want to deploy new servers, they want to do it quickly, and it has to be easy. If you’ve read the previous parts you know that our VMControl is ready to deploy servers. The next step is to add Smart Cloud Entry on top of this. It answers all these needs by letting you deploy new servers through a nice and clear web interface. Smart Cloud Entry has one strength: it’s easy to deploy, easy to use, easy to manage, it’s EASY. This post will not be very technical but will show what you can do with Smart Cloud Entry. This is the last brick of a fully automated and virtualized environment. For me it is the result of one year of hard work trying to understand every brick of this configuration (this is the result of my “after hours” work; these posts and these configurations were written and designed between 18:30 and 22:00. I mention this because it is not part of my everyday work and my boss does not ask me to do it. This is pure passion).

Installation, configuration, update

Installation

Installing Smart Cloud Entry is pretty easy. Just download the base package from your IBM Entitled Software page, and get the latest updates on Fix Central. In my case I have two files : ESD_-_IBM_SmartCloud_Entry_for_Power_V2.4.0_102012.tar.gz for the base installation package and 2.4.0.3-IBM-SCE-FP003-201304241341.zip for the update. Extract the base installation package and run the installer :

# gunzip ESD_-_IBM_SmartCloud_Entry_for_Power_V2.4.0_102012.tar.gz
# tar xf ESD_-_IBM_SmartCloud_Entry_for_Power_V2.4.0_102012.tar
# cd ESD_-_IBM_SmartCloud_Entry_for_Power_V2.4.0_102012/install/power/aix
# sh sce240_aix_installer.bin
Preparing to install...
Extracting the JRE from the installer archive...
Unpacking the JRE...
Extracting the installation resources from the installer archive...
Configuring the installer for this system's environment...

Launching installer...

===============================================================================
Choose Locale...
----------------

    1- Deutsch
  ->2- English
[..]
Preparing CONSOLE Mode Installation...

Accept License Agreement :

===============================================================================
License Agreement
-----------------

Installation and Use of IBM SmartCloud Entry Requires Acceptance of the
Following License Agreement:

International Program License Agreement

Part 1 - General Terms
[..]
DO YOU ACCEPT THE TERMS OF THIS LICENSE AGREEMENT? (Y/N):

Choose the destination folder. My advice is to create a filesystem dedicated to Smart Cloud Entry: by default it goes under /opt/ibm, but I’m using a /app/sce filesystem to avoid filling the /opt filesystem. Please note that I’m using the same folder for the properties files :

===============================================================================
Choose Property File Install Folder
-----------------------------------

Please choose a destination folder for the property files.

Where Would You Like to Install the Property Files? (DEFAULT: /root): /app/sce


===============================================================================
Pre-Installation Summary
------------------------

Please Review the Following Before Continuing:

Property File Install Folder:
    /app/sce/.SCE24/

Install Folder:
    /app/sce/SCE24

Disk Space:
    Free: 6497 MB Required: 765 MB



PRESS <ENTER> TO CONTINUE:

===============================================================================
Installing...
-------------

 [==================|==================|==================|==================]
 [------------------|------------------|------------------|------------------]

 ===============================================================================
Install Finished
----------------

Define the password for the Smart Cloud Entry administrator :

===============================================================================
Add Configuration Values
------------------------

Changing these properties is optional. All options can be changed post install in /app/list/sce/product/.SCE24/authentication.properties. Press <ENTER> to accept the default value.

Initial admin user name (DEFAULT: admin):

Initial admin name (DEFAULT: SmartCloud Entry Administrator):




===============================================================================
Add Configuration Values
------------------------


Initial administrator password:



===============================================================================
Add Configuration Values
------------------------


Verify initial administrator password:



===============================================================================
IBM SmartCloud Entry has been successfully installed
----------------------------------------------------

IBM SmartCloud Entry has been successfully installed to:

   /app/sce/SCE24

If you choose to create a silent install response file, it will be located in this directory.

    1- Create Silent Install Response File
  ->2- Do Not Create Silent Install Response File

ENTER THE NUMBER FOR YOUR CHOICE, OR PRESS <ENTER> TO ACCEPT THE DEFAULT::

Update

The way to update Smart Cloud Entry is a bit weird: you have to launch an osgi console and add a repository pointing to the directory where the update has been extracted, then run the installupdates command :

# mkdir 2.4.0.3-IBM-SCE-FP003-201304241341
# mv 2.4.0.3-IBM-SCE-FP003-201304241341.zip 2.4.0.3-IBM-SCE-FP003-201304241341
# cd 2.4.0.3-IBM-SCE-FP003-201304241341
# unzip 2.4.0.3-IBM-SCE-FP003-201304241341.zip
# /app/sce/SCE24/skc
osgi> version
2.4.0.0-201208290130
osgi> addrepo file:/app/sce/2.4.0.3-IBM-SCE-FP003-201304241341
SmartCloud Entry update repository added
osgi> installupdates
SmartCloud Entry updates to install:
        com.ibm.cfs.product 2.4.0.0-201208290130 ==> com.ibm.cfs.product 2.4.0.3-201304241341
SmartCloud Entry update done
osgi> exit

Re-run Smart Cloud Entry and check that the version has been correctly updated :

# /app/list/sce/product/SCE24/skc
osgi> version
2.4.0.3-201304241341
[08/26/13 14:01:11:529] 00000 INFO: IBM SmartCloud Entry 2.4.0.3-201304241341 is being initialized. - com.ibm.cfs.app.osgi.internal.Autostarter.start
[08/26/13 14:01:13:719] 00001 INFO: CYX5052I: Metering services are disabled by the administrator. - com.ibm.cfs.services.metering.impl.internal.MeteringServicesImpl.start
[08/26/13 14:01:13:728] 00001 INFO: CYX1223I: Current logging levels: default=FINE - com.ibm.cfs.app.osgi.internal.LoggingCommandProvider.logCurrentLevels
[08/26/13 14:01:13:732] 00001 INFO: *** IBM SmartCloud Entry 2.4.0.3-201304241341 is ready (startup time 00:16.450) ***  - com.ibm.cfs.app.impl.internal.CFSImpl.startup
[08/26/13 14:01:14:756] 00005 INFO: CYX4391I: Billing services have been disabled by the administrator. - com.ibm.cfs.services.billing.BillingServicesFactoryImpl.isBillingEnabled

Start Smart Cloud Entry at boot

The only way I’ve found to start Smart Cloud Entry at server boot is to add an entry in inittab with the -nosplash option :

# mkitab "skc:2:once:/app/list/sce/product/SCE24/skc -nosplash"
# tail -1 /etc/inittab
skc:2:once:/app/list/sce/product/SCE24/skc -nosplash

If you want to access the osgi console without re-running skc, add a port number after the -console stanza in the skc.ini file :

# head -10 /app/sce/SCE24/skc.ini
-vm
jre/bin/java
-console
7777
-clean
# telnet localhost 7777
Trying...
Connected to loopback.
Escape character is '^]'.

osgi>

Accessing the web interface

Access the Smart Cloud Entry web interface through this url (replace the hostname with yours) : http://pa000sce1.domain.local:8080/cloud/web/login.html.
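As a quick sanity check before opening a browser, you can verify that the web server answers. This is just a sketch: the helper name is mine, and the hostname and port are the ones from my environment :

```shell
# Hypothetical helper: print the HTTP status code of the SCE login
# page ("000" means the TCP connection itself failed).
check_sce_ui() {
  curl -s -o /dev/null -w '%{http_code}' "http://$1:$2/cloud/web/login.html"
}

# Example (hostname and port from this post):
# check_sce_ui pa000sce1.domain.local 8080
```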

Adding a new cloud to Smart Cloud Entry

Adding a VMControl cloud to Smart Cloud Entry is really easy: just point to your IBM Systems Director server and enter your login and password (in my case I’m using my own user and password, but my advice is to create a dedicated Smart Cloud Entry user on the Systems Director). Be sure you can resolve the IBM Systems Director server with your DNS server, and check your firewall rules (TCP 8422) if Smart Cloud Entry and IBM Systems Director are not installed on the same network (this is my case) :
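Before adding the cloud, a minimal reachability check can save some head-scratching. A sketch under assumptions: the function name and the director hostname are mine, and I use perl because nc is not always available on AIX :

```shell
# check_port HOST PORT — exit 0 if a TCP connection to HOST:PORT
# succeeds within 5 seconds (uses perl's IO::Socket::INET).
check_port() {
  perl -MIO::Socket::INET -e \
    'exit(IO::Socket::INET->new(PeerAddr => "$ARGV[0]:$ARGV[1]", Timeout => 5) ? 0 : 1)' \
    "$1" "$2"
}

# e.g. check the Systems Director port used by Smart Cloud Entry:
# check_port director.domain.local 8422 && echo "TCP 8422 reachable"
```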


Once the cloud has been correctly added you should see its status at the right of your screen; if everything is fine the status should be OK (like in my image). That’s it, your VMControl cloud is ready to use. You can now deploy/create/capture servers, easy isn’t it ? Be sure to select the right version of VMControl when adding the cloud. You can check your VMControl version by logging on your IBM Systems Director server and checking in the GUI (I didn’t find any command to check this by using the Systems Director cli …).

No DHCP : network pool creation

I’m not using any DHCP server. Smart Cloud Entry is smart and lets you define a pool of IP addresses by hand, which is very useful and easy to set up. While most cloud management tools force you to use DHCP, Smart Cloud Entry gives you the choice. For my own use I’ve defined a pool with a bunch of addresses in the VLAN served by my Virtual I/O Server (be careful to set the network id to match your Shared Ethernet Adapter configuration).

Once again Smart Cloud Entry gives you a choice in everything: while defining a network you can define either a range of addresses or a single IP. In the example below a range of addresses was created.

Even better, when an IP address is in use it is marked as utilized. You can also lock an IP and Smart Cloud Entry will never use it; for instance, lock the last five IPs of a range because they are reserved for the network team. This example shows you how flexible Smart Cloud Entry is.
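To illustrate the reservation policy above (the addresses and range are made up; Smart Cloud Entry itself manages the pool in its GUI, this is just the arithmetic) :

```shell
# Build a made-up pool 10.10.10.10 - 10.10.10.250 and hold back the
# last five addresses for the network team, as in the policy above.
awk 'BEGIN { for (i = 10; i <= 250; i++) print "10.10.10." i }' > pool.txt
total=$(wc -l < pool.txt)
awk -v keep=$((total - 5)) 'NR <= keep' pool.txt > usable.txt   # give these to SCE
tail -5 pool.txt > reserved.txt                                 # lock these in SCE
```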

Deploying Virtual Appliance

Deploying a Virtual Appliance is one of the easiest things to do in Smart Cloud Entry. Just select the Virtual Appliance from the list and choose deploy. Two modes are available, “basic” and “advanced”; in advanced mode you can choose the hostname, redefine processor and memory settings, and define an expiration date. Be careful when setting an expiration date: the workload will be totally deleted after it expires :


While a Virtual Appliance is deploying, you can check the Workload summary: one of the Workloads will be “In transition”. Logs can also be checked while the Virtual Appliance is deploying.



After a successful creation, the Workload has a green OK icon next to its name. You can also check that fewer resources are now available …


Approval, Accounting, Metering, …

By default the Approval, Accounting (billing), and Metering functions are not enabled; by enabling these features Smart Cloud Entry gains new functionality. I’m not an expert on this part but I’ll tell you how to enable each one, what its purpose is, and give you a few screenshots :

  • Approval : you can choose to have all Smart Cloud Entry events approved. For instance, each time a workload is initiated the Smart Cloud Entry administrator will receive a mail and has to approve the workload initiation; it will stay pending until the administrator approves it. The approval policies can be configured from the web interface in the Cloud configuration tab :

  • Accounting : by enabling accounting, you’ll be able to create accounts for your users; accounts are linked to a username and allow you to define each user’s credits. By creating and running appliances users will have to pay, and their current credits will decrease over time :
  • # cd /app/sce/SCE24
    # cp billing.properties billing.properties.$(date +%s)
    # sed s/com.ibm.cfs.billing.enabled=false/com.ibm.cfs.billing.enabled=true/ billing.properties.1377519521 > billing.properties
    # grep com.ibm.cfs.billing.enabled billing.properties
    com.ibm.cfs.billing.enabled=true
    

  • Metering : enabling metering allows Smart Cloud Entry to calculate the resource usage of every appliance and every user. The billing is calculated from the data collected by the metering :
  • # cp metering.properties metering.properties.$(date +%s)
    # sed "s/com.ibm.cfs.metering.enabled=false/com.ibm.cfs.metering.enabled=true/" metering.properties.1377519318 > metering.properties
    # grep com.ibm.cfs.metering.enabled metering.properties
    com.ibm.cfs.metering.enabled=true
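
The two sed recipes above hardcode the epoch-stamped backup names from my session; a small helper (the name and approach are mine, not part of the product) avoids copy/paste mistakes :

```shell
# enable_prop FILE KEY — flip "KEY=false" to "KEY=true" in FILE,
# keeping a timestamped backup first (same idea as the manual steps).
enable_prop() {
  _file=$1 _key=$2
  _backup="$_file.$(date +%s)"
  cp "$_file" "$_backup"
  sed "s/^${_key}=false/${_key}=true/" "$_backup" > "$_file"
}

# e.g. (paths assume the /app/sce/SCE24 install directory used here):
# enable_prop /app/sce/SCE24/billing.properties  com.ibm.cfs.billing.enabled
# enable_prop /app/sce/SCE24/metering.properties com.ibm.cfs.metering.enabled
```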
    

Other features, conclusion

Smart Cloud Entry allows you to capture, suspend, resume, stop and start a workload. You can also modify its size (processor and memory). Smart Cloud Entry can also be used with VMware and PureSystems but I’ve not tested that myself. In my opinion it is a user-friendly interface to VMControl, simple to use and simple to manage. This is the way cloud software should be: light and easy to use. It surely can’t answer all your problems, but I’m sure it can fit most customers who want a cloud. Smart Cloud Entry does a few things, but it does them well.

Adventures in IBM Systems Director in System P environment. Part 6 : VMcontrol and Shared Storage Pool Linked Clones

As many of you already know, Virtual I/O Server Shared Storage Pools come with one very cool feature : snapshots ! If you have read Part 5 of these adventures, you know how to use VMControl to deploy a new Virtual Appliance. Part 5 showed you how to deploy a Virtual Appliance through NIM using a mksysb image or an lpp_source. Using a mksysb or an lpp_source can take time depending on the lpar configuration (entitled capacity, virtual processors ..) or on the NIM network speed (for instance a NIM server with a 100 Mbits network adapter). In my case an rte installation takes approximately twenty to thirty minutes. By using the Shared Storage Pool feature, VMControl can create a snapshot of an existing Workload and use it to create a new one. This is called a Linked Clone (because the new Virtual Appliance will obviously be linked to its source Workload by its snapshot). By using a linked clone a new Virtual Appliance deployment takes twenty seconds, no joke ….

Here are the four steps needed to use linked clones; each one is described in detail in this post :

  1. Create a linked clones repository. This VMcontrol repository is created on a Virtual I/O Server participating in the Shared Storage Pool.
  2. On an existing Workload deploy the Activation Engine.
  3. Capture the Workload (the one with the Activation Engine installed), to create a new Virtual Appliance. At this point a snapshot of the Workload is created on the Shared Storage Pool.
  4. Deploy the Virtual Appliance to create a new Workload; the Workload will be booted and reconfigured by the Activation Engine. The Activation Engine will set the new hostname and the IP address.

Repository creation

  • Get the OID of one of the Virtual I/O Servers participating in the Shared Storage Pool; this OID is needed for the creation of the repository :
  • # smcli lssys -oT vios03
    vios03, Server, 0x506a2
    vios3, OperatingSystem, 0x50654
    
  • A repository has to be created on a storage location (this can be on a NIM server or, in our case for linked clones, on a Shared Storage Pool). A list of available storage locations can be found by using the mkrepos command. In this case the storage OID is 326512 :
  • # mkrepos -C | grep -ip vios03
    [..]
    vios03 (329300)
    repositorystorage
            Min:    1
            Max:    1
            Description:    null
            Options:
            Key,    Storage,        Storage location,       Type,   Available GB,   Total GB,       Description,    OID
            [tst-ssp]       tst-ssp tst-cluster     SAN     6       68              326512
    [..]
    
  • With the storage OID and the Virtual I/O Server OID create the repository and give it a name :
  • # smcli mkrepos -S 326512 -O 0x50654 -n linked-clones-repository
    
  • List the repositories and check the new one is created :
  • # smcli lsrepos
    nim-repostory
    linked-clones-repository
    

Activation Engine

The Activation Engine is a script used to customize newly deployed Virtual Appliances. By default it changes the IP address and the hostname to new ones (you set the IP and hostname of the new Virtual Appliance when you deploy it). The Activation Engine can be customized, but this post will not cover that. Here is a link to the documentation : click here

  • The Activation Engine can be found on the Director itself, in /opt/ibm/director/proddata/activation-engine/vmc.vsae.tar. Copy it to the Workload you want to capture :
  • # scp /opt/ibm/director/proddata/activation-engine/vmc.vsae.tar pyrite:/root
    root@pyrite's password:
    vmc.vsae.tar                                                                                                                                                                  100% 7950KB   7.8MB/s   00:01
    
  • Unpack it and run the installation :
  • # tar xvf vmc.vsae.tar
    x activation-engine-2.1-1.13.aix5.3.noarch.rpm, 86482 bytes, 169 tape blocks
    [..]
    x aix-install.sh, 2198 bytes, 5 tape blocks
    
    # export JAVA_HOME=/usr/java5/jre
    # ./aix-install.sh
    Install VSAE and VMC extensions
    JAVA_HOME=/usr/java5/jre
    [..]
    [2013-06-03 11:18:04,871] INFO: Looking for platform initialization commands
    [2013-06-03 11:18:04,905] INFO:  Version: AIX pyrite 1 6 00XXXXXXXX00
    [..]
    [2013-06-03 11:18:15,082] INFO: Created system services for activation.
    
  • Prepare the capture by running the newly installed AE.sh script. Be aware that running this command will shut down your host, so be sure all the customization you want has been made on this host :
  • # /opt/ibm/ae/AE.sh --reset
    JAVA_HOME=/usr/java5/jre
    [2013-06-03 11:23:43,575] INFO: Looking for platform initialization commands
    [2013-06-03 11:23:43,591] INFO:  Version: AIX pyrite 1 6 00C0CE744C00
    [..]
    [2013-06-03 11:23:52,476] INFO: Cleaning AR and AP directories
    [2013-06-03 11:23:52,492] INFO: Shutting down the system
    
    SHUTDOWN PROGRAM
    Mon Jun  3 11:23:53 CDT 2013
    
    
    Broadcast message from root@pyrite.prodinfo.gca (tty) at 11:23:54 ...
    
    PLEASE LOG OFF NOW ! ! !
    System maintenance in progress.
    All processes will be killed now.
    ! ! ! SYSTEM BEING BROUGHT DOWN NOW ! ! !
    

Capture

We’re now ready to capture the host. You’ll need the server’s OID and the repository’s OID :

  1. The repository OID is 0x6ec9b :
  2. # smcli lsrepos -o | grep linked-clones-repository
    linked-clones-repository, 453787 (0x6ec9b)
    
  3. The OID of the server to capture is 0x6ef4e :
  4. # smcli lssys -oT pyrite
    pyrite, Server, 0x6ef4e
    
  5. Capture the server with the captureva command or through the GUI (be sure you have a Server and Operating System object for this one):
  6. # smcli captureva -r 0x6ec9b -s 0x6ef4e -n pyrite-vmc-va -D "imported from server pyrite"
    Mon Jun 03 19:27:17 CEST 2013  captureva Operation started.
    Get capture customization data
    Call capture function
    DNZLOP411I Capturing virtual server pyrite to virtual appliance pyrite-vmc-va in repository linked-clones-repository.
    DNZLOP912I Disk group to be captured: DG_05.29.2013-13:26:28:062
    DNZLOP900I Requesting SAN volume(s)
    DNZLOP948I New disk group: DG_06.03.2013-19:27:21:609
    DNZLOP413I The virtual appliance is using disk group DG_06.03.2013-19:27:21:609 with the following SAN volumes: [pyrite-vmc-va4].
    DNZLOP414I The virtual server is using disk group DG_05.29.2013-13:26:28:062 with the following SAN volumes: [IBMsvsp22].
    DNZLOP909I Copying disk images
    DNZLOP409I Creating the OVF for the virtual appliance.
    Call capture command executed. Return code= 456,287
    Mon Jun 03 19:27:28 CEST 2013  captureva Operation took 11 seconds.
    
  7. This output tells you two things about the storage : the captured virtual server is using a backing device on the Shared Storage Pool called IBMsvsp22, and a snapshot of this backing device, called pyrite-vmc-va4, has been created and will be used by the virtual appliance. On the Shared Storage Pool you can check, by using the snapshot command, that a snapshot of IBMsvsp22 has been created :
  8. # snapshot -clustername vio-cluster -list -spname vio-ssp -lu IBMsvsp22
    Lu Name          Size(mb)    ProvisionType      %Used Unused(mb)  Lu Udid
    IBMsvsp22        9537        THIN                 41% 9537        7d2895ede7cb14dab04b988064616ff2
    Snapshot
    e68b174abd27e3fa0beb4c8d30d76f92IMSnap
    
    Lu(Client Image)Name     Size(mb)       ProvisionType     %Used Unused(mb)     Lu Udid
    pyrite-vmc-va4           9537           THIN                41% 9537           e68b174abd27e3fa0beb4c8d30d76f92
    
  9. Optionally you can check the creation of the Virtual Appliance on the Director itself :
  10. # smcli lsva -o | grep -i pyrite  
    pyrite-vmc-va, 456287 (0x6f65f)
    # smcli lsva -l -V 0x6f65f
    pyrite-vmc-va
            TrunkId:13
            Notifiable:true
            ClassName:com.ibm.usmi.datamodel.virtual.VirtualAppliance
            RevisionVersion:1.1
            Description:imported from server pyrite
            ChangedDate:2013-06-03T19:27:27+02:00
            TrunkName:pyrite-vmc-va
            DisplayName:pyrite-vmc-va
            CreatedDate:2013-06-03T19:27:26+02:00
            SpecificationId:1
            SpecificationVersion:1.1
            OID:456287
            Guid:25C4F588982E3A8C8249871DDFB15031
            ApplianceId:5c1e6c95-68bc-4697-a9ce-3b5641c4f48f
            ObjectType:VirtualAppliance
            DisplayNameSpecified:true
    

Deploy

We’re now ready to deploy the Virtual Appliance. For this you’ll need the Virtual Appliance OID (we already have it : 0x6f65f), a system or a system pool where the Virtual Appliance will be deployed, and a deploymentplanid. (Please don’t ask why we need a deploymentplanid, I don’t know; if an IBMer is reading this, please tell us why … :-)):

  1. In my case I’m using a system pool with OID 0x57e20 (by deploying a Virtual Appliance in a system pool it can be made resilient and automatically moved between systems, for instance in case of a hardware failure). Use lssys if you’re deploying on a system, or lssyspool if you’re deploying on a system pool :
  2. # smcli lssyspool 
    Show server system pool list. 1 Server system pool(s) found.
    --------------------------------
    ID:359968 (0x57e20)
    Name:FRMTST-systempool
    Description:Server System Pool
    Type:PowerHMC
    Status:Critical
    State:Active
    Resilience:Capable
    Server system pool properties
    AutoOptimization:0
    FarmType:PowerHMC
    LEMEnsembleId:0009ED4C0DCD4B5CA83CE5F0232989D4
    OperatingState:20
    OptimizationInterval:30
    Platform:3
    
    Storage Pool Case
    Storage Pool:326512 (0x4fb70),  vio-ssp
    Storage Pool owning Subsystem:vio-cluster
    --------------------------------
    
  3. Use the lscustomization command to find the deploymentplanid (the -H option says that my Workload will be resilient) :
  4. # smcli lscustomization -a deploy_new -V 0x6f65f -g 0x57e20 -H true
    [..]
    deploymentplanid
            Value:  -7980877749837517784_01
            Description:    null
    [..]
    
  5. It’s now time to deploy; this operation takes 30 seconds, no joke :
  6. # smcli deployva -v -g 0x57e20 -V 0x6f65f -m -7980877749837517784_01 -a deploy_new -A "poolstorages=326512,product.vs0.com.ibm.ovf.vmcontrol.system.networking.hostname=ruby,product.vs0.com.ibm.ovf.vmcontrol.adapter.networking.ipv4addresses.5=10.10.10.209,product.vs0.com.ibm.ovf.vmcontrol.adapter.networking.ipv4netmasks.5=255.255.255.0,product.vs0.com.ibm.ovf.vmcontrol.system.networking.ipv4defaultgateway=10.240.122.254,product.vs0.com.ibm.ovf.vmcontrol.system.networking.dnsIPaddresses=134.227.74.196,134.227.2.251,product.vs0.com.ibm.ovf.vmcontrol.system.networking.domainname=prodinfo.gca"
    Mon June 03 20:01:52 CEST 2013  deployva Operation started.
    Attempt to get the default customization data for deploy_new.
    Attempt to get the deploy_new customization data.
    Update collection with user entered attributes.
    Attempt to validate the deploy request for 456,287.
    Attempt to deploy new.
    Workload pyrite-vmc-va_52529 was created.
    Virtual server ruby added to workload pyrite-vmc-va_52529.
    Workload pyrite-vmc-va_52529 is stopped.
    DNZIMC094I Deployed Virtual Appliance pyrite-vmc-va to new Server ruby hosted by system .
    Mon May 27 18:31:41 CEST 2013  deployva Operation took 30 seconds.
    
  7. Have a look at the Shared Storage Pool and check that the newly created server is using the snapshot created by the capture, called pyrite-vmc-va4 :
    # lsmap -vadapter vhost1
    SVSA            Physloc                                      Client Partition ID
    --------------- -------------------------------------------- ------------------
    vhost1          U8203.E4A.060CE74-V2-C12                     0x00000004
    
    VTD                   deploy504c27f14
    Status                Available
    LUN                   0x8200000000000000
    Backing device        /var/vio/SSP/vio-cluster/D_E_F_A_U_L_T_061310/VOL1/AEIM.47bab2102f7794906a65be98d9f126bf
    Physloc
    Mirrored              N/A
    
    VTD                   vtscsi1
    Status                Available
    LUN                   0x8100000000000000
    Backing device        IBMsvsp24.8c380f19e79706b992f9a970301f944a
    Physloc
    Mirrored              N/A
    # snapshot -clustername udvio000tst-cluster -list -spname udvio000tst-ssp -lu IBMsvsp24
    Lu(Client Image)Name     Size(mb)       ProvisionType     %Used Unused(mb)     Lu Udid
    pyrite-vmc-va4           9537           THIN                41% 9537           e68b174abd27e3fa0beb4c8d30d76f92
                    Snapshot
                    157708e20d49cbd00f21767f3aeda35eIMSnap
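
The -A argument of the deployva command above is a long comma-separated attribute string; building it step by step makes it easier to read and adapt. A sketch only: the variable names are mine, while the keys and values are the ones used in this deploy :

```shell
# Assemble the deployva -A attribute string from variables (the keys
# are the OVF customization properties shown in the deploy step above).
hostname=ruby
ip=10.10.10.209
netmask=255.255.255.0
gateway=10.240.122.254
prefix="product.vs0.com.ibm.ovf.vmcontrol"

attrs="poolstorages=326512"
attrs="$attrs,$prefix.system.networking.hostname=$hostname"
attrs="$attrs,$prefix.adapter.networking.ipv4addresses.5=$ip"
attrs="$attrs,$prefix.adapter.networking.ipv4netmasks.5=$netmask"
attrs="$attrs,$prefix.system.networking.ipv4defaultgateway=$gateway"

# Then pass it to smcli:
# smcli deployva -v -g 0x57e20 -V 0x6f65f -m <deploymentplanid> -a deploy_new -A "$attrs"
echo "$attrs"
```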
    


Hope this can help !

Adventures in IBM Systems Director in System P environment. Part 5: VMcontrol and Shared Storage Pool

I have been working with IBM Systems Director for almost one year. I remember how frustrated I was when someone from IBM told me that I couldn’t use VMControl because of our SAN environment (Cisco switches + EMC 40K storage array). My first question after this statement was “OK, no problem, I’ll use it over a Shared Storage Pool”; I was even more frustrated when the answer was “Uh, not yet supported”. Fortunately there was a “yet” in this sentence. That happened six months ago. With the IBM Systems Director update to 6.3.2 and the new 2.2.2.1 Virtual I/O Server version, this functionality is now supported. I have successfully implemented VMControl over a Shared Storage Pool and I don’t have enough words to tell you how incredible it is. AWESOME. Here are my tips and tricks to set up VMControl over a Shared Storage Pool (v3). Enjoy :

Prerequisites

Before trying to deploy a Virtual Appliance, or even to capture one, check all these points one by one :

  • Update IBM Systems Director to 6.3.2 :
  • # smcli lsver
    6.3.2
    
  • Ensure that all the Virtual I/O Servers participating in the Shared Storage Pool are running Virtual I/O Server 2.2.2.1 (I highly encourage you to install ifixes IV31624m0a and IV32091s0a) :
  • # ioslevel
    2.2.2.1
    # oem_setup_env
    # emgr -l
    [..]
    1    S    IV31624m0a 12/17/12 14:30:52            VIOS 2.2.2.1 + ifixes
    2    S    IV32091s0a 12/17/12 14:31:10            cleanup fails on source vios
    [..]
    
  • On all your Virtual I/O Servers ensure that the common agent is correctly installed with the viocluster subagent in version 6.3.2 :
  • # /opt/ibm/director/agent/bin/lwiupdatemgr.sh -listFeatures | grep -i vios
    com.ibm.director.hw.power.vioscluster.agent.feature_6.2.1.10 Disabled
    com.ibm.director.hw.power.vioscluster.agent.feature_6.3.2.0 Enabled
    com.ibm.director.hw.power.vioscluster.agent.installer.feature_6.3.2 Enabled
    
  • On your NIM (Network Installation Manager) server, ensure that the common agent is correctly installed with the common repository subagent and the nim subagent :
  • # /opt/ibm/director/agent/bin/lwiupdatemgr.sh -listFeatures | grep -E "nim|cr"
    com.ibm.director.im.cr.agent.installer_2.4.2.0-201211131517 Enabled
    com.ibm.director.im.rf.nim.subagent_2.4.1 Disabled
    com.ibm.director.im.rf.nim.subagent_2.4.2.0-201211131517 Enabled
    
  • The dsm, openssl, and openssh filesets have to be installed on the NIM server :
  • # lslpp -Lc | grep -E "dsm.core|openssh|openssl"
    dsm:dsm.core:7.1.2.0: : :C:F:Distributed Systems Management Core: : : : : : :0:0:/:1241
    openssh.base:openssh.base.client:6.0.0.6100: : :C: :Open Secure Shell Commands: : : : : : :0:0:/:
    openssh.base:openssh.base.server:6.0.0.6100: : :C: :Open Secure Shell Server: : : : : : :0:0:/:
    openssh.license:openssh.license:5.8.0.6102: : :C: :Open Secure Shell License: : : : : : :0:0:/:
    openssh.man.en_US:openssh.man.en_US:6.0.0.6100: : :C: :Open Secure Shell Documentation - U.S. English: : : : : : :0:0:/:
    openssl.base:openssl.base:0.9.8.2400: : :C: :Open Secure Socket Layer: : : : : : :0:0:/:
    openssl.license:openssl.license:0.9.8.2400: : :C: :Open Secure Socket License: : : : : : :0:0:/:
    openssl.man.en_US:openssl.man.en_US:0.9.8.2400: : :C: :Open Secure Socket Layer: : : : : : :0:0:/:
    

As always “discover, access, inventory”

On all IBM Systems Director objects involved in VMControl (Virtual I/O Servers, NIM server, Pseries on which Virtual Appliances will be deployed, Pseries on which Virtual Appliances will be captured, HMC controlling these Pseries) ensure that access is granted and a full inventory has been collected :

  • Virtual I/O Server (if someone knows how to get “Last Collected Inventory” from the command line, it would be useful) :
  • # smcli lssys -oT -A AccessState vios1,vios2,vios3,vios4
    vios1, Server, 0x5067e: Unlocked
    vios1, OperatingSystem, 0x50646: Unlocked
    vios2, OperatingSystem, 0x5064c: Unlocked
    vios2, Server, 0x50707: Unlocked
    vios3, Server, 0x506a2: Unlocked
    vios3, OperatingSystem, 0x50654: Unlocked
    vios4, Server, 0x506e3: Unlocked
    vios4, OperatingSystem, 0x5065c: Unlocked
    

  • NIM Server :
  • # smcli lssys -oT -A AccessState nim
    nim, OperatingSystem, 0x28a28: Unlocked
    nim, Server, 0x47d73: Unlocked
    
  • Pseries :
  • # smcli lssys -oT -A AccessState P520-TST-1,P520-TST-2
    P520-TST-1, Server, 0x18312: Unlocked
    P520-TST-2, Server, 0x1830c: Unlocked
    

  • HMC :
  • # smcli lssys -oT -A AccessState hmc1
    hmc1, HardwareManagementConsole, 0x33bfe: Unlocked
    

If everything has gone well, IBM Systems Director has now discovered a new Virtual I/O Cluster and a new Shared Storage Pool associated with it :

  • Virtual I/O Server :
  • # smcli lssys -le vio000tst-cluster
    vio000tst-cluster:
        DisplayName (Name) : vio000tst-cluster (vio000tst-cluster)
        Description (Description) : Storage Manageable Endpoint (Storage System)
        SerialNumber (Serial Number) : a549bcf4c77311e18f5400215e487480 (a549bcf4c77311e18f5400215e487480)
        MachineType (Machine Type) : VIOS Cluster (VIOS Cluster)
        PrimaryHostName (Primary Host Name) : 10.10.122.109 (10.10.122.109)
        Manufacturer (Manufacturer) : IBM (IBM)
        AccessState (Access State) : Unlocked (Full Access)
        CommunicationState (Communication State) : 2 (Communication OK)
        Model (Model) : VIOS Cluster (VIOS Cluster)
        CreatedDate (Created Date) : 2012-12-17T19:35:08+01:00 (2012-12-17T19:35:08+01:00)
        ChangedDate (Changed Date) : 2012-12-31T11:14:00+01:00 (2012-12-31T11:14:00+01:00)
        CurrentTimeZone (Agent Time Zone Offset) : -1 ()
        IPv4Address (IP Addresses) : { '10.10.122.109', '10.10.122.107', '10.10.122.108', '10.10.122.110' } (10.10.122.109, 10.10.122.107, 10.10.122.108, 10.10.122.110)
        HostName (IP Hosts) : { '10.10.122.109', '10.10.122.107', '10.10.122.108', '10.10.122.110' } (10.10.122.109, 10.10.122.107, 10.10.122.108, 10.10.122.110)
        OperatingState (State) : 0 (Unknown)
        DisplayPingTime (Query Vital Properties Interval) : 2 (Every hour)
        DisplayOperationalStatusTime (Verify Connection Interval) : 3 (Every 15 minutes)
    

  • Shared Storage Pool :
  • # smcli lssspstoragepool -C 0x4fb69 -l
    vio000tst-ssp:
            OID : 0x4fb70
            Capacity : 65,280
            RemainingManagedSpace : 30,574
            Threshold : 95
            CreatedDate : 12/17/12 7:35 PM
            ChangedDate : 12/31/12 11:12 AM
    # smcli lssharedstpool -l
    vio000tst-cluster:
            OID : 0x4fb69
            PrimaryHostName : 10.10.122.109
            CreatedDate : 12/17/12 7:35 PM
            ChangedDate : 12/31/12 11:14 AM
    # smcli lssspviosvs -C 0x4fb69
    vios1, vios2, vios3, vios4
    # smcli lssspphysvol -C 0x4fb69 -l
    Repository Volumes
    hdisk5:
            OID : 0x522bb
            UDID : 1D0667520609SYMMETRIX03EMCfcp
            TotalSize (MB) : 32768
            DeviceID on end-point : hdisk5
            CreatedDate : 12/18/12 11:12 AM
            ChangedDate : 12/31/12 11:15 AM
    
    Storage Pool Volumes
    hdisk6:
            OID : 0x522c3
            UDID : 1D0667520709SYMMETRIX03EMCfcp
            TotalSize (MB) : 32768
            DeviceID on end-point : hdisk6
            CreatedDate : 12/18/12 11:12 AM
            ChangedDate : 12/31/12 11:15 AM
    
    hdisk7:
            OID : 0x522c1
            UDID : 1D0667520809SYMMETRIX03EMCfcp
            TotalSize (MB) : 32768
            DeviceID on end-point : hdisk7
            CreatedDate : 12/18/12 11:12 AM
            ChangedDate : 12/31/12 11:15 AM
    

  • Check that the Shared Storage Pool is present in the storage management tab :

Step by step VMcontrol workflow

Here is a step by step workflow for the deployment of a new Workload :

  • 1/ Prerequisites summary :
    • Ensure that IBM Systems Director is at version 6.3.2.
    • Install the IBM Systems Director Common Agent with the Virtual I/O Server Cluster Subagent on each of the Shared Storage Pool’s Virtual I/O Servers.
    • Install the IBM Systems Director Common Agent with the NIM Subagent and Common Repository Subagent on the NIM server.
    • Run a full inventory on each IBM Systems Director object used by VMcontrol.
    • Check that the Virtual I/O Cluster and the Shared Storage Pool are discovered and accessible.
  • 2/ Create a Common Repository on the NIM server using the ‘mkrepos’ command on IBM Systems Director. (The repository can also be created directly on the Virtual I/O Cluster.)
  • 3/ Capture or import a Virtual Appliance using the ‘captureva’ or ‘importva’ command on IBM Systems Director. A Virtual Appliance can be captured from :
    • a mksysb.
    • an lpp_source.
    • a Virtual Server (an lpar; the lpar must be powered off to be captured).
    • an existing Virtual Appliance.
  • 4/ Using the ‘deployva’ command from IBM Systems Director, deploy the previously captured Virtual Appliance. Use the ‘lscustomization’ command to check the available and compatible parameters. A Virtual Appliance can be deployed on :
    • a Server (if deployed on a Server, Virtual Appliance resilience will not be active).
    • a System Pool (if deployed on a System Pool, Virtual Appliance resilience will be active, and the Virtual Appliance can automatically be relocated based on user-defined criteria).
  • 5/ Through the Hardware Management Console, IBM Systems Director creates the new logical partition :
    • If you’re using dual Virtual I/O Servers, two client scsi adapters will be created on the new logical partition.
    • On each Virtual I/O Server a server scsi adapter will be created.
    • A new backing device will be created in the Shared Storage Pool.
  • 6/ IBM Systems Director ‘prepares’ the NIM server through the NIM Subagent :
    • Management objects will be created (an hmc object, a cec object).
    • If this is the first deployment of the Virtual Appliance, an associated spot will be created.
    • A machine object will be created (be careful, NIM has to resolve the hostname).
    • scripts, resolv_conf and bosinst_data objects will be created.
  • 7/ The previously created resources are exported to enable the installation of the new Workload.
  • 8/ The logical partition is booted and installed through the Hardware Management Console (lpar_netboot).
  • 9/ Resources are unexported.
  • 10/ An inventory is collected on the newly created Workload.
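The administrator's side of the steps above boils down to a handful of smcli commands. Here is a dry-run sketch of the sequence (the OIDs and names are the example values used in this post; the ‘smcli’ stub only echoes the commands, so nothing is executed — remove it to run them for real) :

```shell
# Dry-run sketch of the VMcontrol command sequence described above.
smcli() { echo "smcli $*"; }    # stub: echo instead of executing

STORAGE_OID=326512              # storage OID from 'mkrepos -C'
OS_OID=0x50646                  # OperatingSystem OID from 'lssys -oT'
REPO_OID=340561                 # repository OID from 'lsrepos -o'
VA_OID=340914                   # virtual appliance OID from 'captureva'/'lsva -o'
HOST_OID=0x1830c                # deploy target OID from 'lsdeploytargets'

smcli mkrepos -S "$STORAGE_OID" -O "$OS_OID" -n my-repository
smcli captureva -r "$REPO_OID" -F repos://export/nim/images/moonstone -n my-va
smcli lscustomization -a deploy_new -V "$VA_OID" -s "$HOST_OID"
smcli deployva -v -s "$HOST_OID" -V "$VA_OID" -a deploy_new -A "poolstorages=$STORAGE_OID"
```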

Click on the image to enlarge it; this is how VMcontrol works :

Common image repository creation

VMcontrol needs an Image Repository to store captured Virtual Appliances; a Common Image Repository can be created on a NIM server or on a Virtual I/O Server. I’ve created two repositories, one on the NIM server and the other on a Virtual I/O Server. Here is an example of how to create a Common Image Repository on a Virtual I/O Server. My “main” Common Image Repository was created on my NIM server; it is the one used for the rest of this post.

  • Use the ‘mkrepos’ command to identify the storage OID, in this case 326512 :
  • # smcli mkrepos -C | grep -ip vios1
    vios1 (329286)
    repositorystorage
            Min:    1
            Max:    1
            Description:    null
            Options:
            Key,    Storage,        Storage location,       Type,   Available GB,   Total GB,       Description,    OID
            [vio000tst-ssp]         vio000tst-ssp vio000tst-cluster     SAN     23      68              326512
    
  • Use the ‘lssys’ command to identify the Operating System on which the Common Image Repository will be created :
  • # smcli lssys  -oT  vios1
    vios1, Server, 0x5067e
    vios1, OperatingSystem, 0x50646
    
  • Using the Operating System’s OID and the Storage’s OID build the ‘mkrepos’ command and create the common repository :
  • # smcli mkrepos -S 326512 -O 0x50646 -n vio-common-repository
    
  • All repositories can be listed with ‘lsrepos’ command :
  • # smcli lsrepos -l
    nim
            Notifiable:true
            ClassName:com.ibm.usmi.datamodel.software.ImageRepository
            UniqueId:15b69cf1-433d-4bc8-98af-b7ec033797c1
            ImageRepositoryType:1
            ChangedDate:2012-12-19T11:08:29+01:00
            SourceTokens:{ 'NO_IR_DELETE' }
            DisplayName:nim
            CreatedDate:2012-12-19T11:08:29+01:00
            ImagingTool:DISCOVERY_NIM_REPOSITORY
            OID:340561
            Guid:07DFEC27B3763D56976642CD4CE0A493
            ObjectType:ImageRepository
            DisplayNameSpecified:true
    
    vio-common-repository
            Notifiable:true
            ClassName:com.ibm.usmi.datamodel.software.ImageRepository
            UniqueId:6b51515c-e754-458a-b5ed-310fd4b1d380
            ImageRepositoryType:0
            ChangedDate:2013-01-07T16:53:40+01:00
            DisplayName:vio-common-repository
            CreatedDate:2013-01-07T16:53:40+01:00
            ImagingTool:DISCOVERY_CR_REPOSITORY
            OID:369178
            Guid:7CDBB36682143AD1A4B12E91B2707215
            AgentType:1
            ObjectType:ImageRepository
            DisplayNameSpecified:true
    
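Picking the right OID out of the ‘lssys -oT’ output by eye is error-prone. A one-liner (my own sketch, assuming the comma-separated ‘name, type, oid’ format shown above) can extract the OperatingSystem OID directly :

```shell
# os_oid: print the OperatingSystem OID from `smcli lssys -oT <name>` output.
os_oid() {
    awk -F', ' '$2 == "OperatingSystem" { print $3 }'
}

# Example, using the sample output from above:
printf '%s\n' 'vios1, Server, 0x5067e' 'vios1, OperatingSystem, 0x50646' | os_oid
```

In real use : ‘smcli lssys -oT vios1 | os_oid’, then pass the result to ‘mkrepos -O’.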

Virtual appliance capture

A captured Virtual Appliance is stored in a repository. Before trying to capture a new Virtual Appliance, the first thing to do is to identify the common repository that will be used. As always, I’m working with IDs and OIDs, not with names :

  • Identify the common repository with the ‘lsrepos’ command :
  • # smcli lsrepos -o
    nim, 340561 (0x53251)
    vio-repository, 338309 (0x52985)
    

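The same kind of one-liner works for repositories; this sketch (assuming the ‘name, decimal-oid (hex-oid)’ format shown above) prints the decimal OID for a given repository name :

```shell
# repo_oid NAME: print the decimal OID for repository NAME from
# `smcli lsrepos -o` output, whose lines look like "nim, 340561 (0x53251)".
repo_oid() {
    awk -F'[ ,]+' -v name="$1" '$1 == name { print $2 }'
}

# Example, using the sample output from above:
printf '%s\n' 'nim, 340561 (0x53251)' 'vio-repository, 338309 (0x52985)' | repo_oid nim
```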
After the repository has been identified, use the ‘captureva’ command to capture the virtual appliance. Here are two examples, the first using an mksysb, the second using an lpp_source on the NIM server :

  • Capturing a virtual appliance from an mksysb :
  • # smcli captureva -v -r 340561 -F repos://export/nim/images/moonstone -n 7100-02-00-1241-virtual_appliance -D "imported from mksysb 7100-02-00-1241" -A "cpushare=0.1,memsize=512"
    Wed Dec 19 16:26:49 CET 2012  captureva Operation started.
    Attempt to get capture object data from file repos://export/nim/image/moonstone
    Update collection with user entered attributes.
    Call captureFile function
    Call capture command executed. Return code= 340,914
    Wed Dec 19 16:27:40 CET 2012  captureva Operation took 51 seconds.
    
  • Capturing a virtual appliance from an lpp_source :
  • # smcli captureva -vvvv -r 340561 -F repos:6100-08-01-1245-lpp_source -n 6100-08-01-1245-virtual_appliance -D "imported from 6100-08-01-1245-lpp_source" -A "cpushare=0.1,memsize=512"
    Thu Dec 27 18:01:05 CET 2012  captureva Operation started.
    Attempt to get capture object data from file repos:6100-08-01-1245-lpp_source
    Update collection with user entered attributes.
    Call captureFile function
    Call capture command executed. Return code= 350,520
    Thu Dec 27 18:01:38 CET 2012  captureva Operation took 32 seconds.
    

If a mksysb is used for the capture, a new NIM object is created :

# lsnim -l appliance-1_image-1
appliance-1_image-1:
   class          = resources
   type           = mksysb
   Rstate         = ready for use
   prev_state     = unavailable for use
   location       = /export/nim/appliances/84dd48b5-2eaa-416c-b70b-fe4fe3c5c6c1/moonstone
   version        = 7
   release        = 1
   mod            = 2
   oslevel_r      = 7100-02
   alloc_count    = 0
   server         = master
   extracted_spot = nimrf-0000000000000005-spot
   creation_date  = Wed Dec 19 16:28:48 2012

All captured Virtual Appliances are stored in /export/nim/appliances :

# ls /export/nim/appliances
84dd48b5-2eaa-416c-b70b-fe4fe3c5c6c1  bf6c42a3-45c8-4764-835e-0b4dc10a90a4  d5227bc5-85ef-4e82-ba86-b7835652a5f7  lost+found
b907734c-84a5-41eb-a112-db7df014984d  d03f6420-d0e7-4756-8e02-7c2e350cfabb  da0b9051-4046-4341-9b41-b8c6dbefb9e6  version

Each Virtual Appliance is described in an OVF (Open Virtualization Format) file. This file can be edited by hand :

# more da0b9051-4046-4341-9b41-b8c6dbefb9e6.ovf

Deploy a new Virtual Appliance

Before trying to deploy a new Virtual Appliance, you have to collect some information :

  • What is the OID of the Virtual Appliance to be deployed (use the ‘lsva’ command to list the virtual appliances) :
  • # smcli lsva -o
    5300-12-05-1140-virtual_appliance, 346436 (0x54944)
    6100-08-01-1245-virtual_appliance, 350520 (0x55938)
    7100-02-00-1241-virtual_appliance, 359718 (0x57d26)
    
  • Is this a new Virtual Appliance (deploy_new : the lpar will be created) or an existing one (deploy_existing : an existing lpar will be used to deploy the Virtual Appliance)?
  • On which host or system pool will the virtual appliance be deployed (use ‘lsdeploytargets’ to check the eligible hosts) :
  • On a server :
  • # smcli lsdeploytargets -v -a deploy_new -V 340914 | grep TST
    P520-TST-1, (0x18312) (P520-TST-1)
    P520-TST-2, (0x1830c) (P520-TST-2)
    
  • On a System Pool :
  • # smcli lsdeploytargets -v -o -a deploy_new -V 350520 | grep TST
    FRMTST-systempool, 359968 (0x57e20) (FRMTST-systempool)
    
  • Some parameters can be tuned (storage pool used, hostname, IP, etc.). Use the ‘lscustomization’ command to check the tunable parameters :
  • # smcli lscustomization -a deploy_new -V 350520 -s 0x1830c
    [..]
    virtualnetworks
            Description:    Network Mapping
            Changeable Columns:
                    Column Name*    CLI Attribute
                    Virtual Networks on Host        hostVnet
    
            Options:
            Key,    Network Name,   Description,    Virtual Networks on Host*
            [Network 1]     Network 1       Default network Discovered/1122/0
    
            Options:        Discovered/1122/0 (Discovered/1122/0 (VLAN 1122, Bridged)),
                            ETHERNET0/1122 (Discovered/1122/0 (VLAN 1122, Bridged)),
                            Discovered/4094/0 (Discovered/4094/0 (VLAN 4094, Not Bridged)),
                            ETHERNET0/4094 (Discovered/4094/0 (VLAN 4094, Not Bridged)),
                            Discovered/999/0 (Discovered/999/0 (VLAN 999, Bridged)),
                            ETHERNET0/999 (Discovered/999/0 (VLAN 999, Bridged))
    [..]
    poolstorages
            Min:    1
            Max:    1
            Description:    The storage pools available for virtual disk allocation. Used together with the storagemapping parameter.
            Options:
            Key,    Name,   Location,       VIOS Count,     Maximum Allocation (MB),        Description
            [326512]        vio000tst-ssp VIOS Cluster: vio000tst-cluster       2       18209   Shared Storage Pool accessed through one or more VIOS.
    [..]
    
  • Here are two examples of a Virtual Appliance deployment, the first on a Server, the second on a System Pool :
  • On a Server :
  • # smcli deployva -v -s 0x1830c -V 340914 -a deploy_new -A "poolstorages=326512,product.vs0.com.ibm.ovf.vmcontrol.system.networking.hostname=carbon,product.vs0.com.ibm.ovf.vmcontrol.adapter.networking.ipv4addresses.5=10.10.122.239,product.vs0.com.ibm.ovf.vmcontrol.adapter.networking.ipv4netmasks.5=255.255.255.0,product.vs0.com.ibm.ovf.vmcontrol.system.networking.ipv4defaultgateway=10.10.122.254,product.vs0.com.ibm.ovf.vmcontrol.system.networking.dnsIPaddresses=10.20.74.196 10.20.2.251,product.vs0.com.ibm.ovf.vmcontrol.system.networking.domainname=domain.test"
    Fri Dec 21 11:48:03 CET 2012  deployva Operation started.
    Attempt to get the default customization data for deploy_new.
    Attempt to get the deploy_new customization data.
    Update collection with user entered attributes.
    Attempt to validate the deploy request for 340,914.
    Attempt to deploy new.
    Workload 7100-02-00-1241-virtual_appliance_56750 was created.
    DNZLOP412I Deploying virtual appliance 7100-02-00-1241-virtual_appliance to server P520-TST-2.
    DNZLOP412I Deploying virtual appliance 7100-02-00-1241-virtual_appliance to server carbon.
    DNZLOP401I Booting virtual server carbon to the Open Firmware state.
    DNZLOP402I Gathering network adapter information for virtual server carbon.
    DNZLOP405I Initiating deploy processing on the NIM master.
    Virtual server carbon added to workload 7100-02-00-1241-virtual_appliance_56750.
    Workload 7100-02-00-1241-virtual_appliance_56750 is stopped.
    DNZIMC094I Deployed Virtual Appliance 7100-02-00-1241-virtual_appliance to new Server carbon hosted by system P520-TST-2.
    Fri Dec 21 12:13:14 CET 2012  deployva Operation took 1510 seconds.
    
  • On a System Pool :
  • smcli deployva -v -V 350520 -g 0x57e20 -m -1093167946908409598_01 -a deploy_new -A "poolstorages=326512,product.vs0.com.ibm.ovf.vmcontrol.system.networking.hostname=carbon,product.vs0.com.ibm.ovf.vmcontrol.adapter.networking.ipv4addresses.5=10.10.122.231,product.vs0.com.ibm.ovf.vmcontrol.adapter.networking.ipv4netmasks.5=255.255.255.0,product.vs0.com.ibm.ovf.vmcontrol.system.networking.ipv4defaultgateway=10.10.122.254,product.vs0.com.ibm.ovf.vmcontrol.system.networking.dnsIPaddresses=10.20.74.196 10.20.2.251,product.vs0.com.ibm.ovf.vmcontrol.system.networking.domainname=domain.test"
    
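The -A attribute string is long and easy to mistype. It can be assembled from variables instead of typed as one line; the OVF property prefix, names and addresses below are the example values used in this post, swap in your own :

```shell
# Build the deployva -A attribute string from variables.
# P is the OVF property prefix VMcontrol generated for this appliance.
P=product.vs0.com.ibm.ovf.vmcontrol
NEW_HOSTNAME=carbon
NEW_IP=10.10.122.239
NEW_MASK=255.255.255.0
NEW_GW=10.10.122.254

ATTRS="poolstorages=326512"
ATTRS="$ATTRS,$P.system.networking.hostname=$NEW_HOSTNAME"
ATTRS="$ATTRS,$P.adapter.networking.ipv4addresses.5=$NEW_IP"
ATTRS="$ATTRS,$P.adapter.networking.ipv4netmasks.5=$NEW_MASK"
ATTRS="$ATTRS,$P.system.networking.ipv4defaultgateway=$NEW_GW"

# Show the final command instead of running it (drop the echo for real use):
echo "smcli deployva -v -s 0x1830c -V 340914 -a deploy_new -A \"$ATTRS\""
```

Note that the dnsIPaddresses property from the examples above takes a space-separated list, so the whole -A string must stay quoted.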

As you can see, the new Virtual Appliance is created in about 25 minutes (1510 seconds), not so bad. Here is a screenshot with some deployed Virtual Appliances :

What’s next

In my opinion, VMcontrol is very powerful; deploying a new AIX lpar in about 25 minutes is incredible. Combined with a Shared Storage Pool, VMcontrol can easily be installed and used by everyone. In part 6 of “Adventures in IBM Systems Director” I’ll post about how to create a resilient workload. A resilient workload has to be created on a System Pool and can be automatically relocated between the System Pool’s hosts. These workloads are monitored with a resiliency policy; if problems are detected, actions and relocations are taken to maintain workload resilience. I do not want to say too much about that in this post, you’ll have to wait for the next one.

Hope this can help.

Adventures in IBM Systems Director in System P environment. Part 4: Playing with errnotify and genevent.sh

For some puzzling reasons that I don’t understand, my client does not want to install monitoring agents (Tivoli) on the VIO Servers. I can’t argue with that decision, I have to work with it; anyway, in my opinion the VIO Servers are one of the most important parts of a Pseries virtualized environment, and they have to be monitored. One of our recurring problems comes from Ethernet ports that keep flapping, resulting in Shared Ethernet Adapter failovers (become primary, then become backup, and so on) :

  • ent1 is flapping on this VIO Server :
  • # errlog | more 
    E136EAFA   1009203212 I H ent7           BECOME PRIMARY
    F3931284   1009203212 I H ent1           ETHERNET NETWORK RECOVERY MODE
    0B41DD00   1009103812 I H ent7           ADAPTER FAILURE
    EC0BCCD4   1009103812 T H ent1           ETHERNET DOWN
    E136EAFA   1009103812 I H ent7           BECOME PRIMARY
    F3931284   1009103812 I H ent1           ETHERNET NETWORK RECOVERY MODE
    0B41DD00   1009022212 I H ent7           ADAPTER FAILURE
    EC0BCCD4   1009022212 T H ent1           ETHERNET DOWN
    E136EAFA   1009022212 I H ent7           BECOME PRIMARY
    
  • The SEA on the second VIO Server becomes backup, then primary :
  • 40D97644   1009203212 I H ent7           BECOME BACKUP
    E136EAFA   1009103912 I H ent7           BECOME PRIMARY
    40D97644   1009103812 I H ent7           BECOME BACKUP
    E136EAFA   1009022212 I H ent7           BECOME PRIMARY
    40D97644   1009022212 I H ent7           BECOME BACKUP
    E136EAFA   1009022112 I H ent7           BECOME PRIMARY
    40D97644   1009022112 I H ent7           BECOME BACKUP
    E136EAFA   1009022112 I H ent7           BECOME PRIMARY
    40D97644   1009022112 I H ent7           BECOME BACKUP
    
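To get an idea of how often each adapter is flapping, the errlog output can be fed to a small counter (my own sketch, assuming the column layout shown above, where the adapter name is the fifth field) :

```shell
# flap_count: count BECOME PRIMARY / BECOME BACKUP events per adapter
# from errlog output lines like:
#   E136EAFA   1009203212 I H ent7           BECOME PRIMARY
flap_count() {
    awk '/BECOME (PRIMARY|BACKUP)/ { n[$5]++ }
         END { for (a in n) print a, n[a] }'
}

# Example, using two lines pasted from the errlog output above:
printf '%s\n' \
  'E136EAFA   1009203212 I H ent7           BECOME PRIMARY' \
  '40D97644   1009203212 I H ent7           BECOME BACKUP' | flap_count
```

On a VIO Server you would run ‘errlog | flap_count’ from the padmin shell.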

This problem can be really important on production hosts and has to be detected “in real time”. Trapping errpt errors and running scripts when an error is raised is possible with errnotify, and it can be even more useful if the error is raised on IBM Systems Director. Here is the method I used to send myself a mail every time a port is flapping :

errnotify configuration

The first thing I have to do is to set up errnotify to run a script every time an ethernet link goes down on a VIO Server :

  • Identify the error codes : on my VIO Server I have to identify which error codes are raised when a link goes up or down :
  • This list may not be exhaustive, but these are the error codes I found in errpt when a link goes :
    • Down (MSNENT_LINK_DOWN and GOENT_LINK_DOWN) :
    • # oem_setup_env
      # errpt -t | grep ABB8A22B
      ABB8A22B MSNENT_LINK_DOWN    TEMP H  ETHERNET DOWN
      # errpt -t | grep EC0BCCD4
      EC0BCCD4 GOENT_LINK_DOWN     TEMP H  ETHERNET DOWN
      
    • Up (MSNENT_RCVRY_EXIT and GOENT_RCVRY_EXIT) :
    • # oem_setup_env
      # errpt -t | grep 4969AE33
      4969AE33 MSNENT_RCVRY_EXIT   INFO H  ETHERNET NETWORK RECOVERY MODE
      # errpt -t |grep F3931284
      F3931284 GOENT_RCVRY_EXIT    INFO H  ETHERNET NETWORK RECOVERY MODE
      
  • With these errpt identifiers, create an ODM entries file and add it with ‘odmadd’ :
  • # vi errnotify.odmadd
    errnotify:
        en_pid = 0
        en_name = "MSNENT_RCVRY_EXIT"
        en_persistenceflg = 1
        en_label = "MSNENT_RCVRY_EXIT"
        en_class = H
        en_crcid = 0
        en_method = "/usr/lib/ras/link_state.notify $1 $2 $3 $4 $5 $6 $7 $8 $9"
    errnotify:
        en_pid = 0
        en_name = "MSNENT_LINK_DOWN"
        en_persistenceflg = 1
        en_label = "MSNENT_LINK_DOWN"
        en_class = H
        en_crcid = 0
        en_method = "/usr/lib/ras/link_state.notify $1 $2 $3 $4 $5 $6 $7 $8 $9"
    errnotify:
        en_pid = 0
        en_name = "GOENT_LINK_DOWN"
        en_persistenceflg = 1
        en_label = "GOENT_LINK_DOWN"
        en_class = H
        en_crcid = 0
        en_method = "/usr/lib/ras/link_state.notify $1 $2 $3 $4 $5 $6 $7 $8 $9"
    errnotify:
        en_pid = 0
        en_name = "GOENT_RCVRY_EXIT"
        en_persistenceflg = 1
        en_label = "GOENT_RCVRY_EXIT"
        en_class = H
        en_crcid = 0
        en_method = "/usr/lib/ras/link_state.notify $1 $2 $3 $4 $5 $6 $7 $8 $9"
    # odmadd errnotify.odmadd
    
  • Check that the entries are correctly added in the ODM :
  • # odmget errnotify | tail -16
    errnotify:
            en_pid = 0
            en_name = "GOENT_RCVRY_EXI"
            en_persistenceflg = 1
            en_label = "GOENT_RCVRY_EXIT"
            en_crcid = 0
            en_class = "H"
            en_type = ""
            en_alertflg = ""
            en_resource = ""
            en_rtype = ""
            en_rclass = ""
            en_symptom = ""
            en_err64 = ""
            en_dup = ""
            en_method = "/usr/lib/ras/link_state.notify $1 $2 $3 $4 $5 $6 $7 $8 $9"
    
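The four errnotify stanzas differ only by label, so the odmadd file can be generated instead of copy-pasted (my own sketch; adjust the labels to match the error identifiers of your adapters) :

```shell
# gen_errnotify: emit one errnotify stanza per error label given as argument,
# ready to be written to a file and loaded with odmadd.
gen_errnotify() {
    for label in "$@"; do
        cat <<EOF
errnotify:
    en_pid = 0
    en_name = "$label"
    en_persistenceflg = 1
    en_label = "$label"
    en_class = H
    en_crcid = 0
    en_method = "/usr/lib/ras/link_state.notify \$1 \$2 \$3 \$4 \$5 \$6 \$7 \$8 \$9"
EOF
    done
}

gen_errnotify MSNENT_LINK_DOWN MSNENT_RCVRY_EXIT GOENT_LINK_DOWN GOENT_RCVRY_EXIT > errnotify.odmadd
# then: odmadd errnotify.odmadd
```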

Writing the script called by errnotify, using genevent.sh

For my own use I need to write to a file every time a link goes up or down; this will be added to the script. But what I want above all is to generate an event on Systems Director. This can be done with a script delivered with the Common Agent called genevent.sh. Let’s have a look at my link_state.notify script :

# cat /usr/lib/ras/link_state.notify
#!/bin/ksh
echo "`date` | $1 $2 $3 $4 $5 $6 $7 $8 $9" >> /home/padmin/vio_mon/vio_mon.errnotify
/var/opt/tivoli/ep/runtime/agent/subagents/director/genevent.sh /type:"Managed Resource.Managed System Resource.Logical Resource.Logical Device.Logical Port.Network Port.Ethernet Port" /text:"Link $6 $9" /sev:0

The first line, as I said before, is for my own use; the second one calls genevent.sh with :

  • /type : the event type generated on Systems Director, in my case : Managed Resource.Managed System Resource.Logical Resource.Logical Device.Logical Port.Network Port.Ethernet Port.
  • /text : $6 is the ethernet adapter and $9 the errpt identifier (GOENT_RCVRY_EXIT, etc.).
  • /sev : the event severity, 0 for fatal, 1 for critical, and so on.
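One caveat : a port flapping every few seconds will flood your mailbox. A possible addition to link_state.notify (my own sketch, not part of the delivered script; the state directory is a hypothetical location) is to throttle repeated notifications per adapter :

```shell
# Sketch: skip the Director event if the same adapter already notified within
# the last THROTTLE seconds; the last-notified time is kept per adapter in a
# small state file.
THROTTLE=300
STATE_DIR=/tmp/link_state        # hypothetical location, pick your own
mkdir -p "$STATE_DIR"

# should_notify <adapter> <now-epoch-seconds>: return 0 if we should notify.
should_notify() {
    last_file="$STATE_DIR/$1"
    now=$2
    last=$(cat "$last_file" 2>/dev/null || echo 0)
    if [ $((now - last)) -ge "$THROTTLE" ]; then
        echo "$now" > "$last_file"
        return 0
    fi
    return 1
}
```

In link_state.notify you would wrap the genevent.sh call in ‘if should_notify $6 $(date +%s); then ... fi’ (assuming your date supports %s).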

Create Event filter, Event Action and Event Automation Plan

As shown in the image below, an error is raised on Systems Director when a link is flapping :

Right click on this event to create an Event Filter; mine is called “Link Down on VIO Server” :

Create a new Event Action; in my case I want to send an email to my mailbox to notify a Link Down (IP and e-mail addresses are hidden in this screenshot) :

With this Event Filter and this Event Action, create an Event Automation Plan to send a mail when a link is flapping. When creating the Event Automation Plan, use the newly created Event Filter and Event Action :

  • Event filter choice :
  • Event action choice :
  • Event automation plan creation summary :

Testing

Do not forget to test the newly created Event Automation Plan; call your network team to shut/no shut a Shared Ethernet Adapter port. You’ll receive a new mail in your mailbox :

Link ent1 MSNENT_RCVRY_EXIT

Event Text     Link ent1 MSNENT_RCVRY_EXIT
Date           10/18/2012 6:21 PM CEST
Severity       Fatal
Event Type     Managed Resource.Managed System Resource.Logical Resource.Logical Device.Logical Port.Network Port.Ethernet Port
System Name    vio35
Sender Name    vio35

Hope this can help.