NovaLink ‘HMC Co-Management’ and PowerVC 1.3.0.1 Dynamic Resource Optimizer

Everybody now knows that I’m using PowerVC a lot in my current company. My environment is growing bigger and bigger and we are now managing more than 600 virtual machines with PowerVC (the goal is to reach ~3000 this year). Some of them were built by PowerVC itself and some of them were migrated through a homemade Python script calling the PowerVC REST API, moving our old vSCSI machines to the new full NPIV/Live Partition Mobility/PowerVC environment (I’m still struggling with the “old men” to move to SSP, but I’m alone versus everybody on this one). I’m happy with that but (there is always a but) I’m facing a lot of problems.

The first one is that we are doing more and more things with PowerVC (virtual machine creation, virtual machine resizing, adding additional disks, moving machines with LPM, and finally using this Python script to migrate the old machines to the new environment). I realized that the machine hosting PowerVC was getting slower and slower: the more actions we did, the more “unresponsive” PowerVC became. By this I mean that the GUI was slow and creating objects took longer and longer. By looking at the CPU graphs in lpar2rrd we noticed that the CPU consumption was growing as fast as we were doing things on PowerVC (check the graph below).

The second problem was my teams (unfortunately for me, we have different teams doing different sorts of things here and everybody is using the Hardware Management Consoles their own way: some people are renaming the machines, making them unusable with PowerVC; some people are changing the profiles, disabling the synchronization; even worse, we have some third party tools used for capacity planning making the Hardware Management Console unusable by PowerVC). The solution to all these problems is to use NovaLink and especially the NovaLink Co-Management. By doing this the Hardware Management Consoles will be restricted to a read-only view, and PowerVC will stop querying the HMCs and will directly query the NovaLink partitions on each host instead.

cpu_powervc

What is NovaLink?

If you are using PowerVC you know that it is based on OpenStack. Until now all the OpenStack services were running on the PowerVC host. If you check on the PowerVC host today you can see that there is one nova-compute process per managed host. In the example below I’m managing ten hosts so I have ten different nova-compute processes running:

# ps -ef | grep [n]ova-compute
nova       627     1 14 Jan16 ?        06:24:30 /usr/bin/python /usr/bin/nova-compute --config-file /etc/nova/nova-9117MMD_10D6666.conf --log-file /var/log/nova/nova-compute.log --log-file /var/log/nova/nova-compute-9117MMD_10D6666.log
nova       649     1 14 Jan16 ?        06:30:25 /usr/bin/python /usr/bin/nova-compute --config-file /etc/nova/nova-9117MMD_65E6666.conf --log-file /var/log/nova/nova-compute.log --log-file /var/log/nova/nova-compute-9117MMD_65E6666.log
nova       664     1 17 Jan16 ?        07:49:27 /usr/bin/python /usr/bin/nova-compute --config-file /etc/nova/nova-9117MMD_1086666.conf --log-file /var/log/nova/nova-compute.log --log-file /var/log/nova/nova-compute-9117MMD_1086666.log
nova       675     1 19 Jan16 ?        08:40:27 /usr/bin/python /usr/bin/nova-compute --config-file /etc/nova/nova-9117MMD_06D6666.conf --log-file /var/log/nova/nova-compute.log --log-file /var/log/nova/nova-compute-9117MMD_06D6666.log
nova       687     1 18 Jan16 ?        08:15:57 /usr/bin/python /usr/bin/nova-compute --config-file /etc/nova/nova-9117MMD_6576666.conf --log-file /var/log/nova/nova-compute.log --log-file /var/log/nova/nova-compute-9117MMD_6576666.log
nova       697     1 21 Jan16 ?        09:35:40 /usr/bin/python /usr/bin/nova-compute --config-file /etc/nova/nova-9117MMD_6556666.conf --log-file /var/log/nova/nova-compute.log --log-file /var/log/nova/nova-compute-9117MMD_6556666.log
nova       712     1 13 Jan16 ?        06:02:23 /usr/bin/python /usr/bin/nova-compute --config-file /etc/nova/nova-9117MMD_10A6666.conf --log-file /var/log/nova/nova-compute.log --log-file /var/log/nova/nova-compute-9117MMD_10A6666.log
nova       728     1 17 Jan16 ?        07:49:02 /usr/bin/python /usr/bin/nova-compute --config-file /etc/nova/nova-9117MMD_1016666.conf --log-file /var/log/nova/nova-compute.log --log-file /var/log/nova/nova-compute-9117MMD_1016666.log
nova       752     1 17 Jan16 ?        07:34:45 /usr/bin/python /usr/bin/nova-compute --config-file /etc/nova/nova-9117MMD_1036666.conf --log-file /var/log/nova/nova-compute.log --log-file /var/log/nova/nova-compute-9119MHE_1036666.log
nova       779     1 13 Jan16 ?        05:54:52 /usr/bin/python /usr/bin/nova-compute --config-file /etc/nova/nova-9117MMD_6596666.conf --log-file /var/log/nova/nova-compute.log --log-file /var/log/nova/nova-compute-9119MHE_6596666.log
# ps -ef | grep [n]ova-compute | wc -l
10

The goal of NovaLink is to move these processes onto a dedicated partition running on each managed host (each Power Systems server). This partition is called the NovaLink partition. It runs Ubuntu 15.10 (Little Endian) (so it is only available on Power8 hosts) and is in charge of running the OpenStack nova processes. By doing that you distribute the load across all the NovaLink partitions instead of loading a single PowerVC host. Even better, my understanding is that the NovaLink partition is able to communicate directly with the FSP. By using NovaLink you will be able to stop using the Hardware Management Consoles and avoid their slowness. As the NovaLink partition is hosted on the host itself, the RMC connections can now use a direct link (IPv6) through the PowerVM Hypervisor. No more RMC connection problems at all ;-), it’s just awesome. NovaLink allows you to choose between two modes of management:

  • Full Nova Management: You install your new host directly with NovaLink on it and you will not need a Hardware Management Console anymore (in this case the NovaLink installation is in charge of deploying the Virtual I/O Servers and the SEAs).
  • Nova Co-Management: Your host is already installed and you give write access (setmaster) to the NovaLink partition. The Hardware Management Console will be limited in this mode (you will not be able to create partitions anymore or modify profiles; it’s not a “read only” mode as you will still be able to start and stop the partitions and do a few things with the HMC, but you will be very limited).
  • You can still mix NovaLink and non-NovaLink managed hosts, and have P7/P6 hosts managed by HMCs, P8 hosts managed by HMCs, P8 Nova co-managed hosts and P8 full Nova managed hosts ;-).
  • Nova1

Prerequisites

As always, upgrade your systems to the latest code level if you want to use NovaLink and NovaLink Co-Management (a quick way to check the current levels is shown after this list):

  • Power8 only, with firmware version 840 (or later)
  • Virtual I/O Server 2.2.4.10 or later
  • For NovaLink co-management HMC V8R8.4.0
  • Obviously install NovaLink on each NovaLink managed system (install the latest patch version of NovaLink)
  • PowerVC 1.3.0.1 or later
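
If you want to quickly check where you stand before starting, the HMC release and the managed system firmware level can be verified from the HMC, and the Virtual I/O Server level directly on each VIOS. This is only a minimal sketch; the managed system name below is the one used later in this post, replace it with yours:

$ lshmc -V
$ lslic -m br-8286-41A-2166666 -t sys
$ ioslevel

The first two commands are run on the HMC, the last one on each Virtual I/O Server.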

NovaLink installation on an existing system

I’ll show you here how to install a NovaLink partition on an existing deployed system. Installing a new system from scratch is also possible. My advice is to start by looking at this address: , and to check this YouTube video showing how a system is installed from scratch:

The goal of this post is to show you how to set up a co-managed system on an already existing system with Virtual I/O Servers already deployed on the host. My advice is to be very careful. The first thing you’ll need to do is to create a partition (2VP, 0.5EC and 5GB of memory) (I’m calling it nova in the example below) and use the Virtual Optical device to load the NovaLink system on it. In the example below the machine is “SSP” backed. Be very careful when doing that: set up the profile name and all the configuration before moving to co-managed mode; after that it will be harder for you to change things as the new pvmctl command will be very new to you.
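
If you don’t already have a suitable partition to host NovaLink, it can be created from the HMC command line before switching to co-management. The line below is only a sketch matching the 2VP/0.5EC/5GB sizing used here (the managed system name is the one used later in this post; adjust the memory, processor and virtual slot values, and add the virtual SCSI or fibre channel client adapters pointing to your Virtual I/O Servers):

# mksyscfg -r lpar -m br-8286-41A-2166666 -i "name=nova,profile_name=default_profile,lpar_env=aixlinux,min_mem=2048,desired_mem=5120,max_mem=8192,proc_mode=shared,sharing_mode=uncap,uncap_weight=128,min_proc_units=0.1,desired_proc_units=0.5,max_proc_units=2.0,min_procs=1,desired_procs=2,max_procs=4,max_virtual_slots=64"

Once the partition exists, load the NovaLink ISO through a virtual optical device on one of the Virtual I/O Servers: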

# mkvdev -fbo -vadapter vhost0
vtopt0 Available
# lsrep
Size(mb) Free(mb) Parent Pool         Parent Size      Parent Free
    3059     1579 rootvg                   102272            73216

Name                                                  File Size Optical         Access
PowerVM_NovaLink_V1.1_122015.iso                           1479 None            rw
vopt_a19a8fbb57184aad8103e2c9ddefe7e7                         1 None            ro
# loadopt -disk PowerVM_NovaLink_V1.1_122015.iso -vtd vtopt0
# lsmap -vadapter vhost0 -fmt :
vhost0:U8286.41A.21AFF8V-V2-C40:0x00000003:nova_b1:Available:0x8100000000000000:nova_b1.7f863bacb45e3b32258864e499433b52: :N/A:vtopt0:Available:0x8200000000000000:/var/vio/VMLibrary/PowerVM_NovaLink_V1.1_122015.iso: :N/A
  • At the grub page select the first entry:
  • install1

  • Wait for the machine to boot:
  • install2

  • Choose to perform an installation:
  • install3

  • Accept the licenses
  • install4

  • Set up the padmin user:
    install5

  • Enter your network configuration:
  • install6

  • Accept to install the Ubuntu system:
  • install8

  • You can then modify anything you want in the configuration file (in my case the timezone):
  • install9

    By default NovaLink is (I think, not 100% sure) designed to be installed on SAS disks, so without multipathing. If like me you decide to install the NovaLink partition as a “boot-on-SAN” lpar, my advice is to launch the installation without any multipathing enabled (only one vSCSI adapter or one virtual fibre channel adapter). After the installation is completed, install the Ubuntu multipathd service and configure the second vSCSI or virtual fibre channel adapter. If you don’t do that you may experience problems at installation time (RAID error). Please remember that you have to do this before enabling the co-management. Last thing about the installation: it may take a lot of time to finish, so be patient (especially the preseed step).

install10

Updating to the latest code level

The iso file provided in the Entitled Software Support is not updated to the latest available NovaLink code. Make a copy of the official repository available at this address: ftp://public.dhe.ibm.com/systems/virtualization/Novalink/debian. Serve the content of this ftp server on your own http server (use the command below to copy it):

# wget --mirror ftp://public.dhe.ibm.com/systems/virtualization/Novalink/debian
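
Any web server can then publish the mirrored directory (deckard.lab.chmod666.org used below is my own internal server, yours will obviously differ). As a quick and dirty sketch you can even serve the mirror with the Python built-in web server, as long as the published path matches what you put in sources.list afterwards:

# cd public.dhe.ibm.com/systems/virtualization
# python3 -m http.server 80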

Modify /etc/apt/sources.list (and the files in /etc/apt/sources.list.d) and comment out all the available deb repositories to keep only your copy:

root@nova:~# grep -v ^# /etc/apt/sources.list
deb http://deckard.lab.chmod666.org/nova/Novalink/debian novalink_1.0.0 non-free
root@nova:/etc/apt/sources.list.d# apt-get upgrade
Reading package lists... Done
Building dependency tree
Reading state information... Done
Calculating upgrade... Done
The following packages will be upgraded:
  pvm-cli pvm-core pvm-novalink pvm-rest-app pvm-rest-server pypowervm
6 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Need to get 165 MB of archives.
After this operation, 53.2 kB of additional disk space will be used.
Do you want to continue? [Y/n]
Get:1 http://deckard.lab.chmod666.org/nova/Novalink/debian/ novalink_1.0.0/non-free pypowervm all 1.0.0.1-151203-1553 [363 kB]
Get:2 http://deckard.lab.chmod666.org/nova/Novalink/debian/ novalink_1.0.0/non-free pvm-cli all 1.0.0.1-151202-864 [63.4 kB]
Get:3 http://deckard.lab.chmod666.org/nova/Novalink/debian/ novalink_1.0.0/non-free pvm-core ppc64el 1.0.0.1-151202-1495 [2,080 kB]
Get:4 http://deckard.lab.chmod666.org/nova/Novalink/debian/ novalink_1.0.0/non-free pvm-rest-server ppc64el 1.0.0.1-151203-1563 [142 MB]
Get:5 http://deckard.lab.chmod666.org/nova/Novalink/debian/ novalink_1.0.0/non-free pvm-rest-app ppc64el 1.0.0.1-151203-1563 [21.1 MB]
Get:6 http://deckard.lab.chmod666.org/nova/Novalink/debian/ novalink_1.0.0/non-free pvm-novalink ppc64el 1.0.0.1-151203-408 [1,738 B]
Fetched 165 MB in 7s (20.8 MB/s)
(Reading database ... 72094 files and directories currently installed.)
Preparing to unpack .../pypowervm_1.0.0.1-151203-1553_all.deb ...
Unpacking pypowervm (1.0.0.1-151203-1553) over (1.0.0.0-151110-1481) ...
Preparing to unpack .../pvm-cli_1.0.0.1-151202-864_all.deb ...
Unpacking pvm-cli (1.0.0.1-151202-864) over (1.0.0.0-151110-761) ...
Preparing to unpack .../pvm-core_1.0.0.1-151202-1495_ppc64el.deb ...
Removed symlink /etc/systemd/system/multi-user.target.wants/pvm-core.service.
Unpacking pvm-core (1.0.0.1-151202-1495) over (1.0.0.0-151111-1375) ...
Preparing to unpack .../pvm-rest-server_1.0.0.1-151203-1563_ppc64el.deb ...
Unpacking pvm-rest-server (1.0.0.1-151203-1563) over (1.0.0.0-151110-1480) ...
Preparing to unpack .../pvm-rest-app_1.0.0.1-151203-1563_ppc64el.deb ...
Unpacking pvm-rest-app (1.0.0.1-151203-1563) over (1.0.0.0-151110-1480) ...
Preparing to unpack .../pvm-novalink_1.0.0.1-151203-408_ppc64el.deb ...
Unpacking pvm-novalink (1.0.0.1-151203-408) over (1.0.0.0-151112-304) ...
Processing triggers for ureadahead (0.100.0-19) ...
ureadahead will be reprofiled on next reboot
Setting up pypowervm (1.0.0.1-151203-1553) ...
Setting up pvm-cli (1.0.0.1-151202-864) ...
Installing bash completion script /etc/bash_completion.d/python-argcomplete.sh
Setting up pvm-core (1.0.0.1-151202-1495) ...
addgroup: The group `pvm_admin' already exists.
Created symlink from /etc/systemd/system/multi-user.target.wants/pvm-core.service to /usr/lib/systemd/system/pvm-core.service.
0513-071 The ctrmc Subsystem has been added.
Adding /usr/lib/systemd/system/ctrmc.service for systemctl ...
0513-059 The ctrmc Subsystem has been started. Subsystem PID is 3096.
Setting up pvm-rest-server (1.0.0.1-151203-1563) ...
The user `wlp' is already a member of `pvm_admin'.
Setting up pvm-rest-app (1.0.0.1-151203-1563) ...
Setting up pvm-novalink (1.0.0.1-151203-408) ...

NovaLink and HMC Co-Management configuration

Before adding the hosts to PowerVC you still need to do the most important thing. After the installation is finished, enable the co-management mode to have a system managed by NovaLink while still connected to a Hardware Management Console:

  • Enable the powervm_mgmt_capable attribute on the nova partition:
  • # chsyscfg -r lpar -m br-8286-41A-2166666 -i "name=nova,powervm_mgmt_capable=1"
    # lssyscfg -r lpar -m br-8286-41A-2166666 -F name,powervm_mgmt_capable --filter "lpar_names=nova"
    nova,1
    
  • Enable co-management (please note here that you have to setmaster first (you’ll see that the curr_master_name is the HMC) and then relmaster (you’ll see that the curr_master_name is the NovaLink partition; this is the state we want to be in)):
  • # lscomgmt -m br-8286-41A-2166666
    is_master=null
    # chcomgmt -m br-8286-41A-2166666 -o setmaster -t norm --terms agree
    # lscomgmt -m br-8286-41A-2166666
    is_master=1,curr_master_name=myhmc1,curr_master_mtms=7042-CR8*2166666,curr_master_type=norm,pend_master_mtms=none
    # chcomgmt -m br-8286-41A-2166666 -o relmaster
    # lscomgmt -m br-8286-41A-2166666
    is_master=0,curr_master_name=nova,curr_master_mtms=3*8286-41A*2166666,curr_master_type=norm,pend_master_mtms=none
    

Going back to HMC managed system

You can go back to a Hardware Management Console managed system whenever you want (set the master to the HMC, delete the nova partition and release the master from the HMC).

# chcomgmt -m br-8286-41A-2166666 -o setmaster -t norm --terms agree
# lscomgmt -m br-8286-41A-2166666
is_master=1,curr_master_name=myhmc1,curr_master_mtms=7042-CR8*2166666,curr_master_type=norm,pend_master_mtms=none
# chlparstate -o shutdown -m br-8286-41A-2166666 --id 9 --immed
# rmsyscfg -r lpar -m br-8286-41A-2166666 --id 9
# chcomgmt -o relmaster -m br-8286-41A-2166666
# lscomgmt -m br-8286-41A-2166666
is_master=0,curr_master_mtms=none,curr_master_type=none,pend_master_mtms=none

Using NovaLink

After the installation you are now able to log in on the NovaLink partition (you can gain root access with the “sudo su -” command). A new command called pvmctl is available on the NovaLink partition allowing you to perform any action (stop or start a virtual machine, list the Virtual I/O Servers, …). Before trying to add the host, double check that the pvmctl command is working ok.

padmin@nova:~$ pvmctl lpar list
Logical Partitions
+------+----+---------+-----------+---------------+------+-----+-----+
| Name | ID |  State  |    Env    |    Ref Code   | Mem  | CPU | Ent |
+------+----+---------+-----------+---------------+------+-----+-----+
| nova | 3  | running | AIX/Linux | Linux ppc64le | 8192 |  2  | 0.5 |
+------+----+---------+-----------+---------------+------+-----+-----+

Adding hosts

On the PowerVC side add the NovaLink host by choosing the NovaLink option:

addhostnovalink

Some deb packages (ibmpowervc-powervm) will be installed and configured on the NovaLink machine:

addhostnovalink3
addhostnovalink4

By doing this, on each NovaLink machine you can check that a nova-compute process is now running (by adding the host, the deb packages were installed and configured on the NovaLink host):

# ps -ef | grep nova
nova      4392     1  1 10:28 ?        00:00:07 /usr/bin/python /usr/bin/nova-compute --config-file /etc/nova/nova.conf --log-file /var/log/nova/nova-compute.log
root      5218  5197  0 10:39 pts/1    00:00:00 grep --color=auto nova
# grep host_display_name /etc/nova/nova.conf
host_display_name = XXXX-8286-41A-XXXX
# tail -1 /var/log/apt/history.log
Start-Date: 2016-01-18  10:27:54
Commandline: /usr/bin/apt-get -o Dpkg::Options::=--force-confdef -o Dpkg::Options::=--force-confold -y install --force-yes --allow-unauthenticated ibmpowervc-powervm
Install: python-keystoneclient:ppc64el (1.6.0-2.ibm.ubuntu1, automatic), python-oslo.reports:ppc64el (0.1.0-1.ibm.ubuntu1, automatic), ibmpowervc-powervm:ppc64el (1.3.0.1), python-ceilometer:ppc64el (5.0.0-201511171217.ibm.ubuntu1.199, automatic), ibmpowervc-powervm-compute:ppc64el (1.3.0.1, automatic), nova-common:ppc64el (12.0.0-201511171221.ibm.ubuntu1.213, automatic), python-oslo.service:ppc64el (0.11.0-2.ibm.ubuntu1, automatic), python-oslo.rootwrap:ppc64el (2.0.0-1.ibm.ubuntu1, automatic), python-pycadf:ppc64el (1.1.0-1.ibm.ubuntu1, automatic), python-nova:ppc64el (12.0.0-201511171221.ibm.ubuntu1.213, automatic), python-keystonemiddleware:ppc64el (2.4.1-2.ibm.ubuntu1, automatic), python-kafka:ppc64el (0.9.3-1.ibm.ubuntu1, automatic), ibmpowervc-powervm-monitor:ppc64el (1.3.0.1, automatic), ibmpowervc-powervm-oslo:ppc64el (1.3.0.1, automatic), neutron-common:ppc64el (7.0.0-201511171221.ibm.ubuntu1.280, automatic), python-os-brick:ppc64el (0.4.0-1.ibm.ubuntu1, automatic), python-tooz:ppc64el (1.22.0-1.ibm.ubuntu1, automatic), ibmpowervc-powervm-ras:ppc64el (1.3.0.1, automatic), networking-powervm:ppc64el (1.0.0.0-151109-25, automatic), neutron-plugin-ml2:ppc64el (7.0.0-201511171221.ibm.ubuntu1.280, automatic), python-ceilometerclient:ppc64el (1.5.0-1.ibm.ubuntu1, automatic), python-neutronclient:ppc64el (2.6.0-1.ibm.ubuntu1, automatic), python-oslo.middleware:ppc64el (2.8.0-1.ibm.ubuntu1, automatic), python-cinderclient:ppc64el (1.3.1-1.ibm.ubuntu1, automatic), python-novaclient:ppc64el (2.30.1-1.ibm.ubuntu1, automatic), python-nova-ibm-ego-resource-optimization:ppc64el (2015.1-201511110358, automatic), python-neutron:ppc64el (7.0.0-201511171221.ibm.ubuntu1.280, automatic), nova-compute:ppc64el (12.0.0-201511171221.ibm.ubuntu1.213, automatic), nova-powervm:ppc64el (1.0.0.1-151203-215, automatic), openstack-utils:ppc64el (2015.2.0-201511171223.ibm.ubuntu1.18, automatic), ibmpowervc-powervm-network:ppc64el (1.3.0.1, automatic), python-oslo.policy:ppc64el (0.5.0-1.ibm.ubuntu1, automatic), python-oslo.db:ppc64el (2.4.1-1.ibm.ubuntu1, automatic), python-oslo.versionedobjects:ppc64el (0.9.0-1.ibm.ubuntu1, automatic), python-glanceclient:ppc64el (1.1.0-1.ibm.ubuntu1, automatic), ceilometer-common:ppc64el (5.0.0-201511171217.ibm.ubuntu1.199, automatic), openstack-i18n:ppc64el (2015.2-3.ibm.ubuntu1, automatic), python-oslo.messaging:ppc64el (2.1.0-2.ibm.ubuntu1, automatic), python-swiftclient:ppc64el (2.4.0-1.ibm.ubuntu1, automatic), ceilometer-powervm:ppc64el (1.0.0.0-151119-44, automatic)
End-Date: 2016-01-18  10:28:00

The command line interface

You can do ALL the things you were doing on the HMC using the pvmctl command. The syntax is pretty simple: pvmctl |OBJECT| |ACTION| where OBJECT can be vios, vm, vea (virtual ethernet adapter), vswitch, lu (logical unit), or anything you want, and ACTION can be list, delete, create, update. Here are a few examples (a tip on how to discover the full syntax by yourself follows these examples):

  • List the Virtual I/O Servers:
  • # pvmctl vios list
    Virtual I/O Servers
    +--------------+----+---------+----------+------+-----+-----+
    |     Name     | ID |  State  | Ref Code | Mem  | CPU | Ent |
    +--------------+----+---------+----------+------+-----+-----+
    | s00ia9940825 | 1  | running |          | 8192 |  2  | 0.2 |
    | s00ia9940826 | 2  | running |          | 8192 |  2  | 0.2 |
    +--------------+----+---------+----------+------+-----+-----+
    
  • List the partitions (note the -d option, short for --display-fields, allowing me to print some attributes):
  • # pvmctl vm list
    Logical Partitions
    +----------+----+----------+----------+----------+-------+-----+-----+
    |   Name   | ID |  State   |   Env    | Ref Code |  Mem  | CPU | Ent |
    +----------+----+----------+----------+----------+-------+-----+-----+
    | aix72ca> | 3  | not act> | AIX/Lin> | 00000000 |  2048 |  1  | 0.1 |
    |   nova   | 4  | running  | AIX/Lin> | Linux p> |  8192 |  2  | 0.5 |
    | s00vl99> | 5  | running  | AIX/Lin> | Linux p> | 10240 |  2  | 0.2 |
    | test-59> | 6  | not act> | AIX/Lin> | 00000000 |  2048 |  1  | 0.1 |
    +----------+----+----------+----------+----------+-------+-----+-----+
    # pvmctl list vm -d name id 
    [..]
    # pvmctl vm list -i id=4 --display-fields LogicalPartition.name
    name=aix72-1-d3707953-00000090
    # pvmctl vm list  --display-fields LogicalPartition.name LogicalPartition.id LogicalPartition.srr_enabled SharedProcessorConfiguration.desired_virtual SharedProcessorConfiguration.uncapped_weight
    name=aix72capture,id=3,srr_enabled=False,desired_virtual=1,uncapped_weight=64
    name=nova,id=4,srr_enabled=False,desired_virtual=2,uncapped_weight=128
    name=s00vl9940243,id=5,srr_enabled=False,desired_virtual=2,uncapped_weight=128
    name=test-5925058d-0000008d,id=6,srr_enabled=False,desired_virtual=1,uncapped_weight=128
    
  • Delete the virtual adapter with a given uuid on the partition named nova (note the --parent-id option to select the partition; the uuid was found with pvmctl vea list):
  • # pvmctl vea delete --parent-id name=nova --object-id uuid=fe7389a8-667f-38ca-b61e-84c94e5a3c97
    
  • Power off the lpar named aix72-2:
  • # pvmctl vm power-off -i name=aix72-2-536bf0f8-00000091
    Powering off partition aix72-2-536bf0f8-00000091, this may take a few minutes.
    Partition aix72-2-536bf0f8-00000091 power-off successful.
    
  • Delete the lpar named aix72-2:
  • # pvmctl vm delete -i name=aix72-2-536bf0f8-00000091
    
  • Delete the vswitch named MGMTVSWITCH:
  • # pvmctl vswitch delete -i name=MGMTVSWITCH
    
  • Open a console:
  • #  mkvterm --id 4
    vterm for partition 4 is active.  Press Control+] to exit.
    |
    Elapsed time since release of system processors: 57014 mins 10 secs
    [..]
    
  • Power on an lpar:
  • # pvmctl vm power-on -i name=aix72capture
    Powering on partition aix72capture, this may take a few minutes.
    Partition aix72capture power-on successful.
    
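One last tip about the syntax: pvmctl is self-documenting, so if you are unsure about the available objects, actions or field names, the built-in help is the quickest way to discover them (output not shown here):

# pvmctl --help
# pvmctl vm --help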

Is this a dream? No more RMC connectivity problems

I’m 100% sure that you have always had problems with RMC connectivity due to firewall issues, ports not opened, and IDS blocking incoming or outgoing RMC traffic. NovaLink is THE solution that will solve all the RMC problems forever. I’m not joking, it’s a major improvement for PowerVM. As the NovaLink partition is installed on each host, it can communicate through a dedicated IPv6 link with all the partitions hosted on the host. A dedicated virtual switch called MGMTSWITCH is used to allow the RMC flow to transit between all the lpars and the NovaLink partition. Of course this virtual switch must be created and one Virtual Ethernet Adapter must also be created on the NovaLink partition. These are the first two actions to do if you want to implement this solution. Before starting, here are a few things you need to know:

  • For security reasons the MGMTSWITCH must be created in VEPA mode. If you are not aware of what the VEPA and VEB modes are, here is a reminder:
  • In VEB mode all the partitions connected to the same vlan can communicate together. We do not want that as it is a security issue.
  • The VEPA mode gives us the ability to isolate lpars that are on the same subnet. Lpar to lpar traffic is forced out of the machine. This is what we want.
  • The PVID for this VEPA network is 4094.
  • The adapter in the NovaLink partition must be a trunk adapter.
  • It is mandatory to name the VEPA vswitch MGMTSWITCH.
  • At lpar creation, if the MGMTSWITCH exists a new Virtual Ethernet Adapter will be automatically created on the deployed lpar.
  • To be correctly configured the deployed lpar needs the latest level of rsct code (3.2.1.0 for now).
  • The latest cloud-init version must be deployed on the captured lpar used to make the image.
  • You don’t need to configure any address on this adapter: on the deployed lpars the adapter is configured with a link-local address (the same thing as the 169.254.0.0/16 addresses used in IPv4, but for IPv6). Please note that any IPv6 adapter must “by design” have a link-local address.

mgmtswitch2

  • Create the virtual switch called MGMTSWITCH in Vepa mode:
  • # pvmctl vswitch create --name MGMTSWITCH --mode=Vepa
    # pvmctl vswitch list  --display-fields VirtualSwitch.name VirtualSwitch.mode 
    name=ETHERNET0,mode=Veb
    name=vdct,mode=Veb
    name=vdcb,mode=Veb
    name=vdca,mode=Veb
    name=MGMTSWITCH,mode=Vepa
    
  • Create a virtual ethernet adapter on the NovaLink partition with PVID 4094 and a trunk priority set to 1 (it’s a trunk adapter). Note that we now have two adapters on the NovaLink partition (one in IPv4 (routable) and the other one in IPv6 (non-routable)):
  • # pvmctl vea create --pvid 4094 --vswitch MGMTSWITCH --trunk-pri 1 --parent-id name=nova
    # pvmctl vea list --parent-id name=nova
    --------------------------
    | VirtualEthernetAdapter |
    --------------------------
      is_tagged_vlan_supported=False
      is_trunk=False
      loc_code=U8286.41A.216666-V3-C2
      mac=EE3B84FD1402
      pvid=666
      slot=2
      uuid=05a91ab4-9784-3551-bb4b-9d22c98934e6
      vswitch_id=1
    --------------------------
    | VirtualEthernetAdapter |
    --------------------------
      is_tagged_vlan_supported=True
      is_trunk=True
      loc_code=U8286.41A.216666-V3-C34
      mac=B6F837192E63
      pvid=4094
      slot=34
      trunk_pri=1
      uuid=fe7389a8-667f-38ca-b61e-84c94e5a3c97
      vswitch_id=4
    

    Configure the link-local IPv6 address on the NovaLink partition:

    # more /etc/network/interfaces
    [..]
    auto eth1
    iface eth1 inet manual
     up /sbin/ifconfig eth1 0.0.0.0
    # ifup eth1
    # ifconfig eth1
    eth1      Link encap:Ethernet  HWaddr b6:f8:37:19:2e:63
              inet6 addr: fe80::b4f8:37ff:fe19:2e63/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:0 errors:0 dropped:0 overruns:0 frame:0
              TX packets:17 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes:0 (0.0 B)  TX bytes:1454 (1.4 KB)
              Interrupt:34
    

Capture an AIX host with the latest version of rsct (3.2.1.0 or later) and the latest version of cloud-init installed. This version of RMC/rsct handles this new feature, so it is mandatory to have it installed on the captured host. When PowerVC deploys a Virtual Machine on a NovaLink managed host with this version of rsct installed, a new adapter with the PVID 4094 in the virtual switch MGMTSWITCH will be created, and all the RMC traffic will use this adapter instead of your public IP address:

# lslpp -L rsct*
  Fileset                      Level  State  Type  Description (Uninstaller)
  ----------------------------------------------------------------------------
  rsct.core.auditrm          3.2.1.0    C     F    RSCT Audit Log Resource
                                                   Manager
  rsct.core.errm             3.2.1.0    C     F    RSCT Event Response Resource
                                                   Manager
  rsct.core.fsrm             3.2.1.0    C     F    RSCT File System Resource
                                                   Manager
  rsct.core.gui              3.2.1.0    C     F    RSCT Graphical User Interface
  rsct.core.hostrm           3.2.1.0    C     F    RSCT Host Resource Manager
  rsct.core.lprm             3.2.1.0    C     F    RSCT Least Privilege Resource
                                                   Manager
  rsct.core.microsensor      3.2.1.0    C     F    RSCT MicroSensor Resource
                                                   Manager
  rsct.core.rmc              3.2.1.1    C     F    RSCT Resource Monitoring and
                                                   Control
  rsct.core.sec              3.2.1.0    C     F    RSCT Security
  rsct.core.sensorrm         3.2.1.0    C     F    RSCT Sensor Resource Manager
  rsct.core.sr               3.2.1.0    C     F    RSCT Registry
  rsct.core.utils            3.2.1.1    C     F    RSCT Utilities

When this image is deployed, a new adapter will be created in the MGMTSWITCH virtual switch and an IPv6 link-local address will be configured on it. You can check the cloud-init activation log to see that the IPv6 address is configured at activation time:

# pvmctl vea list --parent-id name=aix72-2-0a0de5c5-00000095
--------------------------
| VirtualEthernetAdapter |
--------------------------
  is_tagged_vlan_supported=True
  is_trunk=False
  loc_code=U8286.41A.216666-V5-C32
  mac=FA620F66FF20
  pvid=3331
  slot=32
  uuid=7f1ec0ab-230c-38af-9325-eb16999061e2
  vswitch_id=1
--------------------------
| VirtualEthernetAdapter |
--------------------------
  is_tagged_vlan_supported=True
  is_trunk=False
  loc_code=U8286.41A.216666-V5-C33
  mac=46A066611B09
  pvid=4094
  slot=33
  uuid=560c67cd-733b-3394-80f3-3f2a02d1cb9d
  vswitch_id=4
# ifconfig -a
en0: flags=1e084863,14c0
        inet 10.10.66.66 netmask 0xffffff00 broadcast 10.14.33.255
         tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1
en1: flags=1e084863,14c0
        inet6 fe80::c032:52ff:fe34:6e4f/64
         tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1
sit0: flags=8100041
        inet6 ::10.10.66.66/96
[..]

Note that the link-local address is configured at activation time (addresses starting with fe80):

# more /var/log/cloud-init-output.log
[..]
auto eth1

iface eth1 inet6 static
    address fe80::c032:52ff:fe34:6e4f
    hwaddress ether c2:32:52:34:6e:4f
    netmask 64
    pre-up [ $(ifconfig eth1 | grep -o -E '([[:xdigit:]]{1,2}:){5}[[:xdigit:]]{1,2}') = "c2:32:52:34:6e:4f" ]
        dns-search fr.net.intra
# entstat -d ent1 | grep -iE "switch|vlan"
Invalid VLAN ID Packets: 0
Port VLAN ID:  4094
VLAN Tag IDs:  None
Switch ID: MGMTSWITCH

To be sure everything is working correctly, here is a proof test. I’m taking down the en0 interface on which the IPv4 public address is configured. Then I’m launching a tcpdump on en1 (the MGMTSWITCH adapter). Finally I’m resizing the Virtual Machine with PowerVC. AND EVERYTHING IS WORKING GREAT !!!! AWESOME !!! :-) (note the fe80 to fe80 communication):

# ifconfig en0 down detach ; tcpdump -i en1 port 657
tcpdump: WARNING: en1: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on en1, link-type 1, capture size 96 bytes
22:00:43.224964 IP6 fe80::9850:f6ff:fe9c:5739.32819 > fe80::d09e:aff:fecf:a868.rmc: S 4049792650:4049792650(0) win 65535 
22:00:43.225022 IP6 fe80::d09e:aff:fecf:a868.rmc > fe80::9850:f6ff:fe9c:5739.32819: S 2055569200:2055569200(0) ack 4049792651 win 28560 
22:00:43.225051 IP6 fe80::9850:f6ff:fe9c:5739.32819 > fe80::d09e:aff:fecf:a868.rmc: . ack 1 win 32844 
22:00:43.225547 IP6 fe80::9850:f6ff:fe9c:5739.32819 > fe80::d09e:aff:fecf:a868.rmc: P 1:209(208) ack 1 win 32844 
22:00:43.225593 IP6 fe80::d09e:aff:fecf:a868.rmc > fe80::9850:f6ff:fe9c:5739.32819: . ack 209 win 232 
22:00:43.225638 IP6 fe80::d09e:aff:fecf:a868.rmc > fe80::9850:f6ff:fe9c:5739.32819: P 1:97(96) ack 209 win 232 
22:00:43.225721 IP6 fe80::9850:f6ff:fe9c:5739.32819 > fe80::d09e:aff:fecf:a868.rmc: P 209:377(168) ack 97 win 32844 
22:00:43.225835 IP6 fe80::d09e:aff:fecf:a868.rmc > fe80::9850:f6ff:fe9c:5739.32819: P 97:193(96) ack 377 win 240 
22:00:43.225910 IP6 fe80::9850:f6ff:fe9c:5739.32819 > fe80::d09e:aff:fecf:a868.rmc: P 377:457(80) ack 193 win 32844 
22:00:43.226076 IP6 fe80::d09e:aff:fecf:a868.rmc > fe80::9850:f6ff:fe9c:5739.32819: P 193:289(96) ack 457 win 240 
22:00:43.226154 IP6 fe80::9850:f6ff:fe9c:5739.32819 > fe80::d09e:aff:fecf:a868.rmc: P 457:529(72) ack 289 win 32844 
22:00:43.226210 IP6 fe80::d09e:aff:fecf:a868.rmc > fe80::9850:f6ff:fe9c:5739.32819: P 289:385(96) ack 529 win 240 
22:00:43.226276 IP6 fe80::9850:f6ff:fe9c:5739.32819 > fe80::d09e:aff:fecf:a868.rmc: P 529:681(152) ack 385 win 32844 
22:00:43.226335 IP6 fe80::d09e:aff:fecf:a868.rmc > fe80::9850:f6ff:fe9c:5739.32819: P 385:481(96) ack 681 win 249 
22:00:43.424049 IP6 fe80::9850:f6ff:fe9c:5739.32819 > fe80::d09e:aff:fecf:a868.rmc: . ack 481 win 32844 
22:00:44.725800 IP6 fe80::d09e:aff:fecf:a868.rmc > fe80::9850:f6ff:fe9c:5739.rmc: UDP, length 88
22:00:44.726111 IP6 fe80::9850:f6ff:fe9c:5739.rmc > fe80::d09e:aff:fecf:a868.rmc: UDP, length 88
22:00:50.137605 IP6 fe80::d09e:aff:fecf:a868.rmc > fe80::9850:f6ff:fe9c:5739.rmc: UDP, length 632
22:00:50.137900 IP6 fe80::9850:f6ff:fe9c:5739.rmc > fe80::d09e:aff:fecf:a868.rmc: UDP, length 88
22:00:50.183108 IP6 fe80::9850:f6ff:fe9c:5739.rmc > fe80::d09e:aff:fecf:a868.rmc: UDP, length 408
22:00:51.683382 IP6 fe80::9850:f6ff:fe9c:5739.rmc > fe80::d09e:aff:fecf:a868.rmc: UDP, length 408
22:00:51.683661 IP6 fe80::d09e:aff:fecf:a868.rmc > fe80::9850:f6ff:fe9c:5739.rmc: UDP, length 88

To be sure the security requirements are met, from the lpar I’m pinging the NovaLink host (the first ping), which answers, and then I’m pinging the second lpar (the second ping), which does not work. (And this is what we want !!!)

# ping fe80::d09e:aff:fecf:a868
PING fe80::d09e:aff:fecf:a868 (fe80::d09e:aff:fecf:a868): 56 data bytes
64 bytes from fe80::d09e:aff:fecf:a868: icmp_seq=0 ttl=64 time=0.203 ms
64 bytes from fe80::d09e:aff:fecf:a868: icmp_seq=1 ttl=64 time=0.206 ms
64 bytes from fe80::d09e:aff:fecf:a868: icmp_seq=2 ttl=64 time=0.216 ms
^C
--- fe80::d09e:aff:fecf:a868 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0/0/0 ms
# ping fe80::44a0:66ff:fe61:1b09
PING fe80::44a0:66ff:fe61:1b09 (fe80::44a0:66ff:fe61:1b09): 56 data bytes
^C
--- fe80::44a0:66ff:fe61:1b09 ping statistics ---
2 packets transmitted, 0 packets received, 100% packet loss
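
As a last check, you can also look at the RMC state from the NovaLink partition itself. RSCT ships a small diagnostic command for that; this is just a sketch, run as root on the NovaLink partition (it lists the RMC peers known by the ctrmc subsystem):

# /usr/sbin/rsct/bin/rmcdomainstatus -s ctrmc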

PowerVC 1.3.0.1 Dynamic Resource Optimizer

In addition to the NovaLink part of this blog post I also wanted to talk about the killer app of 2016: Dynamic Resource Optimizer. This feature can be used on any PowerVC 1.3.0.1 managed hosts (you obviously need at least two hosts). DRO is in charge of re-balancing your Virtual Machines across all the available hosts (in the host group). To sum up, if a host is experiencing a heavy load and reaching a certain amount of CPU consumption over a period of time, DRO will move your virtual machines to re-balance the load across all the available hosts (this is done at the host level). Here are a few details about DRO:

  • The DRO configuration is done at a host level.
  • You set up a threshold (in the capture below) that must be reached to trigger Live Partition Mobility or mobile core movements (Power Enterprise Pool).
  • droo6
    droo3

  • To be triggered this threshold must be reached a certain number of times (stabilization) over a period you define (run interval).
  • You can choose to move virtual machines using Live Partition Mobility, or to move “cores” using Power Enterprise Pools (you can do both; moving CPU will always be preferred over moving partitions).
  • DRO can be run in advise mode (nothing is done, a warning is thrown in the new DRO events tab) or in active mode (which does the job and moves things).
    droo2
    droo1

  • Your most critical virtual machines can be excluded from DRO:
  • droo5

How does DRO choose which machines are moved

I’ve been running DRO in production for one month now and I’ve had time to check what is going on behind the scenes. How does DRO choose which machines are moved when a Live Partition Mobility operation must be run to face a heavy load on a host? To find out I decided to launch 3 different cpuhog processes (16 forks, 4VP, SMT4), which eat CPU resources, on three different lpars with 4VP each. On PowerVC I can check that before launching these processes the CPU consumption is ok on this host (the three lpars are running on the same host):

droo4

# cat cpuhog.pl
#!/usr/bin/perl

print "eating the CPUs\n";

# fork 16 children; each child leaves the loop immediately (fork returns 0 in the child)
foreach $i (1..16) {
      $pid = fork();
      last if $pid == 0;
      print "created PID $pid\n";
}

# parent and children all spin here, eating CPU
while (1) {
      $x++;
}
# perl cpuhog.pl
eating the CPUs
created PID 47514604
created PID 22675712
created PID 3015584
created PID 21496152
created PID 25166098
created PID 26018068
created PID 11796892
created PID 33424106
created PID 55444462
created PID 65077976
created PID 13369620
created PID 10813734
created PID 56623850
created PID 19333542
created PID 58393312
created PID 3211988
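
While the cpuhog processes are running you can confirm from the lpars themselves that the entitlement is fully consumed; lparstat is the usual AIX way to check this (5 second interval, 3 samples):

# lparstat 5 3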

I’m waiting a couple of minutes and I realize that the virtual machines on which the cpuhog processes were launched are the ones being migrated. So we can say that PowerVC is moving the machines that are eating CPU (another strategy could be to move all the machines that are not eating CPU, to let the working ones do their job without launching a mobility operation on them).

# errpt | head -3
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
A5E6DB96   0118225116 I S pmig           Client Partition Migration Completed
08917DC6   0118225116 I S pmig           Client Partition Migration Started

After the moves are done I can see that the load is now ok on the host. DRO has done the job for me and moved the lpars to meet the configured threshold ;-)

droo7dro_effect

The images below show you a good example of the “power” of PowerVC and DRO. To update my Virtual I/O Servers to the latest version, the PowerVC maintenance mode was used to free up the Virtual I/O Servers. After leaving maintenance mode, DRO did the job of re-balancing the Virtual Machines across all the hosts (the red arrows symbolize the maintenance mode actions and the purple ones the DRO actions). You can also see that some lpars were moved across 4 different hosts during this process. All these pictures are taken from real life experience on my production systems. This is not a lab environment, this is one part of my production. So yes, DRO and PowerVC 1.3.0.1 are production ready. Hell yes!

real1
real2
real3
real4
real5

Conclusion

As my environment is growing bigger, the next step for me will be to move to NovaLink on my P8 hosts. Please note that the NovaLink Co-Management feature is today a “TechPreview” but should go GA very soon. Talking about DRO, I had been waiting for this for years and it finally happened. I can assure you that it is production ready; to prove it I’ll just give you this number: to upgrade my Virtual I/O Servers to the 2.2.4.10 release using PowerVC maintenance mode and DRO, more than 1000 Live Partition Mobility moves were performed without any outage on production servers and during working hours. Nobody in my company was aware of this during the operations. It was a seamless experience for everybody.

What’s new in VIOS 2.2.4.10 and PowerVM: Part 1, Virtual I/O Server Rules

I will post a series of mini blog posts about the new features of PowerVM and Virtual I/O Server that are released this month. By this I mean Hardware Management Console 840 + Power firmware 840 + Virtual I/O Server 2.2.4.10. As writing blog posts is not part of my job and I’m doing it in my spare time, some of the topics I will talk about have already been covered by other AIX bloggers, but I think the more material we have the better it is. Other ones, like this first one, will be new to you. So please accept my apologies if the topics are not what I call “0 day” (the day of release). Anyway, writing things helps me understand them better and I add little details I have not seen in other blog posts or in the official documentation. Last point: in these mini posts I will always try to give you something new, at least my point of view as an IBM customer. I hope it will be useful for you.

The first topic I want to talk about is Virtual I/O Server rules. With the latest version, three new commands called “rules”, “rulescfgset” and “rulesdeploy” are now available on the Virtual I/O Servers. These help you configure your device attributes by creating, deploying, or checking rules (against the current configuration). I’m 100% sure that every time you install a Virtual I/O Server you do the same thing over and over again: you check your buffer attributes, you check attributes on fibre channel adapters and so on. The rules are a way to be sure everything is the same on all your Virtual I/O Servers (you can create a rule file (xml format) that can be deployed on every Virtual I/O Server you install). Even better, if you are a PowerVC user like me, you want to be sure that any new device created by PowerVC is created with the attributes you want (for instance buffers for Virtual Ethernet Adapters). In the “old days” you had to use the chdef command; you can now do this by using the rules. Better than giving you a list of commands, I’ll show you here what I’m now doing on my Virtual I/O Servers in 2.2.4.10.
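
Before going into the details, here is a quick summary of the rules operations used throughout this post (all of them are demonstrated below; the <type>, <attribute>, <value> and <file> parts are placeholders):

$ rules -o list
$ rules -o modify -t <type> -a <attribute>=<value>
$ rules -o add -t <type> -a <attribute>=<value>
$ rules -o diff -s
$ rules -o deploy
$ rules -o import -f <file>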

Creating and modifying existing default rules

Before starting, here is a (non-exhaustive) list of the attributes I’m changing on all my Virtual I/O Servers at deploy time. I now want to do that using the rules (these are just examples, you can do much more using the rules; the sketch after this list shows the old per-device way for comparison):

  • On fcs Adapters I’m changing the max_xfer_size attribute to 0x200000.
  • On fcs Adapters I’m changing the num_cmd_elems attribute to 2048.
  • On fscsi Devices I’m changing the dyntrk attribute to yes.
  • On fscsi Devices I’m changing the fc_err_recov to fast_fail.
  • On Virtual Ethernet Adapters I’m changing the max_buf_tiny attribute to 4096.
  • On Virtual Ethernet Adapters I’m changing the min_buf_tiny attribute to 4096.
  • On Virtual Ethernet Adapters I’m changing the max_buf_small attribute to 4096.
  • On Virtual Ethernet Adapters I’m changing the min_buf_small attribute to 4096.
  • On Virtual Ethernet Adapters I’m changing the max_buf_medium attribute to 512.
  • On Virtual Ethernet Adapters I’m changing the min_buf_medium attribute to 512.
  • On Virtual Ethernet Adapters I’m changing the max_buf_large attribute to 128.
  • On Virtual Ethernet Adapters I’m changing the min_buf_large attribute to 128.
  • On Virtual Ethernet Adapters I’m changing the max_buf_huge attribute to 128.
  • On Virtual Ethernet Adapters I’m changing the min_buf_huge attribute to 128.
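
For comparison, here is what the pre-rules method looked like for one of these attributes. This is only a sketch, run from oem_setup_env: chdef changes the ODM default so that future l-lan adapters inherit the value, while chdev -P changes an existing adapter at its next boot:

$ oem_setup_env
# chdef -a max_buf_tiny=4096 -c adapter -s vdevice -t IBM,l-lan
# chdev -l ent8 -a max_buf_tiny=4096 -P

You would have to repeat this for every attribute, every device and every Virtual I/O Server, which is exactly what the rules mechanism avoids.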

Modify existing attributes using rules

By default a “factory” default rule file now exists on the Virtual I/O Server. It is located in /home/padmin/rules/vios_current_rules.xml. You can check the content of the file (it’s an xml file) and list the rules contained in it:

# ls -l /home/padmin/rules
total 40
-r--r-----    1 root     system        17810 Dec 08 18:40 vios_current_rules.xml
$ oem_setup_env
# head -10 /home/padmin/rules/vios_current_rules.xml
<?xml version="1.0" encoding="UTF-8"?>
<Profile origin="get" version="3.0.0" date="2015-12-08T17:40:37Z">
 <Catalog id="devParam.disk.fcp.mpioosdisk" version="3.0">
  <Parameter name="reserve_policy" value="no_reserve" applyType="nextboot" reboot="true">
   <Target class="device" instance="disk/fcp/mpioosdisk"/>
  </Parameter>
 </Catalog>
 <Catalog id="devParam.disk.fcp.mpioapdisk" version="3.0">
  <Parameter name="reserve_policy" value="no_reserve" applyType="nextboot" reboot="true">
   <Target class="device" instance="disk/fcp/mpioapdisk"/>
[..]
$ rules -o list -d

Let’s now say you have an existing Virtual I/O Server with an existing SEA configured on it. You want two things by using the rules:

  • Apply the rules to modify the existing devices.
  • Be sure that new devices will be created using the rules.

For the purpose of this example we will work here on the buffer attributes of a Virtual Ethernet Adapter (the same concepts apply to other device types). So we have an SEA with Virtual Ethernet Adapters and we want to change the buffer attributes. Let’s first check the current values of the virtual adapters:

$ lsdev -type adapter | grep -i Shared
ent13            Available   Shared Ethernet Adapter
$ lsdev -dev ent13 -attr virt_adapters
value

ent8,ent9,ent10,ent11
$ lsdev -dev ent8 -attr max_buf_huge,max_buf_large,max_buf_medium,max_buf_small,max_buf_tiny,min_buf_huge,min_buf_large,min_buf_medium,min_buf_small,min_buf_tiny
value

64
64
256
2048
2048
24
24
128
512
512
$ lsdev -dev ent9 -attr max_buf_huge,max_buf_large,max_buf_medium,max_buf_small,max_buf_tiny,min_buf_huge,min_buf_large,min_buf_medium,min_buf_small,min_buf_tiny
value

64
64
256
2048
2048
24
24
128
512
512

Let’s now check the values in the current Virtual I/O Server rules:

$ rules -o list | grep buf
adapter/vdevice/IBM,l-lan      max_buf_tiny         2048
adapter/vdevice/IBM,l-lan      min_buf_tiny         512
adapter/vdevice/IBM,l-lan      max_buf_small        2048
adapter/vdevice/IBM,l-lan      min_buf_small        512

For the tiny and small buffers I can change the rules easily using the rules command (using the modify operation):

$ rules -o modify -t adapter/vdevice/IBM,l-lan -a max_buf_tiny=4096
$ rules -o modify -t adapter/vdevice/IBM,l-lan -a min_buf_tiny=4096
$ rules -o modify -t adapter/vdevice/IBM,l-lan -a max_buf_small=4096
$ rules -o modify -t adapter/vdevice/IBM,l-lan -a min_buf_small=4096

I’m re-running the rules command to check that the rules are now modified:

$ rules -o list | grep buf
adapter/vdevice/IBM,l-lan      max_buf_tiny         4096
adapter/vdevice/IBM,l-lan      min_buf_tiny         4096
adapter/vdevice/IBM,l-lan      max_buf_small        4096
adapter/vdevice/IBM,l-lan      min_buf_small        4096

I can check the current values of my system against the currently defined rules by using the diff operation:

# rules -o diff -s
devParam.adapter.vdevice.IBM,l-lan:max_buf_tiny device=adapter/vdevice/IBM,l-lan    2048 | 4096
devParam.adapter.vdevice.IBM,l-lan:min_buf_tiny device=adapter/vdevice/IBM,l-lan     512 | 4096
devParam.adapter.vdevice.IBM,l-lan:max_buf_small device=adapter/vdevice/IBM,l-lan   2048 | 4096
devParam.adapter.vdevice.IBM,l-lan:min_buf_small device=adapter/vdevice/IBM,l-lan    512 | 4096

Creating new attributes using rules

In the rules embedded with the current Virtual I/O Server release there are no existing rules for the medium, large and huge buffers. Unfortunately for me, I’m modifying these attributes by default and I want a rule capable of doing that. The goal is now to create a new set of rules for the buffers not already present in the default file … Let’s try to do that using the add operation:

# rules -o add -t adapter/vdevice/IBM,l-lan -a max_buf_medium=512
The rule is not supported or does not exist.

Annoying: I can’t add a rule for the medium buffer (same for the large and huge ones). The available attributes for each device are based on the current AIX ARTEX catalog. You can check all the files present in the catalog to see what the available attributes are for each device type; you can see in the output below that there is nothing in the current ARTEX catalog for the medium buffer.

$ oem_setup_env
# cd /etc/security/artex/catalogs
# ls -ltr | grep l-lan
-r--r-----    1 root     security       1261 Nov 10 00:30 devParam.adapter.vdevice.IBM,l-lan.xml
# grep medium devParam.adapter.vdevice.IBM,l-lan.xml
# 

To show that it is possible to add new rules I’ll show you a simple example adding the new ‘src_lun_val’ and ‘dest_lun_val’ attributes on the vioslpm0 device. First I check that I can add these rules by looking in the ARTEX catalog:

$ oem_setup_env
# cd /etc/security/artex/catalogs
# ls -ltr | grep lpm
-r--r-----    1 root     security       2645 Nov 10 00:30 devParam.pseudo.vios.lpm.xml
# grep -iE "src_lun_val|dest_lun_val" devParam.pseudo.vios.lpm.xml
  <ParameterDef name="dest_lun_val" type="string" targetClass="device" cfgmethod="attr" reboot="true">
  <ParameterDef name="src_lun_val" type="string" targetClass="device" cfgmethod="attr" reboot="true">

Then I’m checking the ‘range’ of authorized values for both attributes:

# lsattr -l vioslpm0 -a src_lun_val -R
on
off
# lsattr -l vioslpm0 -a dest_lun_val -R
on
off
restart_off
lpm_off

I’m looking up the type using the lsdev command (here pseudo/vios/lpm):

# lsdev -P | grep lpm
pseudo         lpm             vios           VIOS LPM Adapter

I’m finally adding the rules and checking the differences:

$ rules -o add -t pseudo/vios/lpm -a src_lun_val=on
$ rules -o add -t pseudo/vios/lpm -a dest_lun_val=on
$ rules -o diff -s
devParam.adapter.vdevice.IBM,l-lan:max_buf_tiny device=adapter/vdevice/IBM,l-lan    2048 | 4096
devParam.adapter.vdevice.IBM,l-lan:min_buf_tiny device=adapter/vdevice/IBM,l-lan     512 | 4096
devParam.adapter.vdevice.IBM,l-lan:max_buf_small device=adapter/vdevice/IBM,l-lan   2048 | 4096
devParam.adapter.vdevice.IBM,l-lan:min_buf_small device=adapter/vdevice/IBM,l-lan    512 | 4096
devParam.pseudo.vios.lpm:src_lun_val device=pseudo/vios/lpm                          off | on
devParam.pseudo.vios.lpm:dest_lun_val device=pseudo/vios/lpm                 restart_off | on

But what about my buffers? Is there any possibility to add these attributes to the current ARTEX catalog? The answer is yes. By looking at the catalog used for Virtual Ethernet Adapters (a file named devParam.adapter.vdevice.IBM,l-lan.xml) you will see that a message catalog named ‘vioent.cat’ is utilized by this xml file. Check the content of this catalog file by using the dspcat command and find if there is anything related to medium, large and huge buffers (all the catalog files are located in /usr/lib/methods):

$ oem_setup_env
# cd /usr/lib/methods
# dspcat vioent.cat |grep -iE "medium|large|huge"
1 : 10 Minimum Huge Buffers
1 : 11 Maximum Huge Buffers
1 : 12 Minimum Large Buffers
1 : 13 Maximum Large Buffers
1 : 14 Minimum Medium Buffers
1 : 15 Maximum Medium Buffers

Modify the xml file located in the ARTEX catalog and add the necessary information for these three new buffer types:

$ oem_setup_env
# vi /etc/security/artex/catalogs/devParam.adapter.vdevice.IBM,l-lan.xml
<?xml version="1.0" encoding="UTF-8"?>

<Catalog id="devParam.adapter.vdevice.IBM,l-lan" version="3.0" inherit="devCommon">

  <ShortDescription><NLSCatalog catalog="vioent.cat" setNum="1" msgNum="1">Virtual I/O Ethernet Adapter (l-lan)</NLSCatalog></ShortDescription>

  <ParameterDef name="min_buf_huge" type="integer" targetClass="device" cfgmethod="attr" reboot="true">
    <Description><NLSCatalog catalog="vioent.cat" setNum="1" msgNum="10">Minimum Huge Buffers</NLSCatalog></Description>
  </ParameterDef>

  <ParameterDef name="max_buf_huge" type="integer" targetClass="device" cfgmethod="attr" reboot="true">
    <Description><NLSCatalog catalog="vioent.cat" setNum="1" msgNum="11">Maximum Huge Buffers</NLSCatalog></Description>
  </ParameterDef>

  <ParameterDef name="min_buf_large" type="integer" targetClass="device" cfgmethod="attr" reboot="true">
    <Description><NLSCatalog catalog="vioent.cat" setNum="1" msgNum="12">Minimum Large Buffers</NLSCatalog></Description>
  </ParameterDef>

  <ParameterDef name="max_buf_large" type="integer" targetClass="device" cfgmethod="attr" reboot="true">
    <Description><NLSCatalog catalog="vioent.cat" setNum="1" msgNum="13">Maximum Large Buffers</NLSCatalog></Description>
  </ParameterDef>

  <ParameterDef name="min_buf_medium" type="integer" targetClass="device" cfgmethod="attr" reboot="true">
    <Description><NLSCatalog catalog="vioent.cat" setNum="1" msgNum="14">Minimum Medium Buffers</NLSCatalog></Description>
  </ParameterDef>

  <ParameterDef name="max_buf_medium" type="integer" targetClass="device" cfgmethod="attr" reboot="true">
    <Description><NLSCatalog catalog="vioent.cat" setNum="1" msgNum="15">Maximum Medium Buffers</NLSCatalog></Description>
  </ParameterDef>

[..]
  <ParameterDef name="max_buf_tiny" type="integer" targetClass="device" cfgmethod="attr" reboot="true">
    <Description><NLSCatalog catalog="vioent.cat" setNum="1" msgNum="19">Maximum Tiny Buffers</NLSCatalog></Description>
  </ParameterDef>


Then I’m retrying to add the rules for the medium, large and huge buffers … and it’s working great:

# rules -o add -t adapter/vdevice/IBM,l-lan -a max_buf_medium=512
# rules -o add -t adapter/vdevice/IBM,l-lan -a min_buf_medium=512
# rules -o add -t adapter/vdevice/IBM,l-lan -a max_buf_huge=128
# rules -o add -t adapter/vdevice/IBM,l-lan -a min_buf_huge=128
# rules -o add -t adapter/vdevice/IBM,l-lan -a max_buf_large=128
# rules -o add -t adapter/vdevice/IBM,l-lan -a min_buf_large=128

Deploying the rules

Now that a couple of rules are defined, let’s apply them on the Virtual I/O Server. First check the differences you will get after applying the rules by using the diff operation of the rules command:

$ rules -o diff -s
devParam.adapter.vdevice.IBM,l-lan:max_buf_tiny device=adapter/vdevice/IBM,l-lan    2048 | 4096
devParam.adapter.vdevice.IBM,l-lan:min_buf_tiny device=adapter/vdevice/IBM,l-lan     512 | 4096
devParam.adapter.vdevice.IBM,l-lan:max_buf_small device=adapter/vdevice/IBM,l-lan   2048 | 4096
devParam.adapter.vdevice.IBM,l-lan:min_buf_small device=adapter/vdevice/IBM,l-lan    512 | 4096
devParam.adapter.vdevice.IBM,l-lan:max_buf_medium device=adapter/vdevice/IBM,l-lan   256 | 512
devParam.adapter.vdevice.IBM,l-lan:min_buf_medium device=adapter/vdevice/IBM,l-lan   128 | 512
devParam.adapter.vdevice.IBM,l-lan:max_buf_huge device=adapter/vdevice/IBM,l-lan      64 | 128
devParam.adapter.vdevice.IBM,l-lan:min_buf_huge device=adapter/vdevice/IBM,l-lan      24 | 128
devParam.adapter.vdevice.IBM,l-lan:max_buf_large device=adapter/vdevice/IBM,l-lan     64 | 128
devParam.adapter.vdevice.IBM,l-lan:min_buf_large device=adapter/vdevice/IBM,l-lan     24 | 128
devParam.pseudo.vios.lpm:src_lun_val device=pseudo/vios/lpm                          off | on
devParam.pseudo.vios.lpm:dest_lun_val device=pseudo/vios/lpm                 restart_off | on

Let’s now deploy the rules using the deploy operation of the rules command. You can notice that for some rules a reboot is mandatory to change the existing devices; this is the case for the buffers, but not for the vioslpm0 attributes (we can check again that we now have no differences … some attributes are applied the same way as with the -P flag of the chdev command, so they only take effect at the next boot):

$ rules -o deploy 
A manual post-operation is required for the changes to take effect, please reboot the system.
$ lsdev -dev ent8 -attr min_buf_small
value

4096
$ lsdev -dev vioslpm0 -attr src_lun_val
value

on
$ rules -o diff -s

Don’t forget to reboot the Virtual I/O Server and check that everything is ok after the reboot (check the kernel values by using entstat):

$ shutdown -force -restart
[..]
$ for i in ent8 ent9 ent10 ent11 ; do lsdev -dev $i -attr max_buf_huge,max_buf_large,max_buf_medium,max_buf_small,max_buf_tiny,min_buf_huge,min_buf_large,min_buf_medium,min_buf_small,min_buf_tiny ; done
[..]
128
128
512
4096
4096
128
128
512
4096
4096
$ entstat -all ent13 | grep -i buf
[..]
No mbuf Errors: 0
  Transmit Buffers
    Buffer Size             65536
    Buffers                    32
      No Buffers                0
  Receive Buffers
    Buffer Type              Tiny    Small   Medium    Large     Huge
    Min Buffers              4096     4096      512      128      128
    Max Buffers              4096     4096      512      128      128

For the fibre channel adapters I’m using these rules:

$ rules -o modify -t driver/iocb/efscsi -a dyntrk=yes
$ rules -o modify -t driver/qliocb/qlfscsi -a dyntrk=yes
$ rules -o modify -t driver/qiocb/qfscsi -a dyntrk=yes
$ rules -o modify -t driver/iocb/efscsi -a fc_err_recov=fast_fail
$ rules -o modify -t driver/qliocb/qlfscsi -a fc_err_recov=fast_fail
$ rules -o modify -t driver/qiocb/qfscsi -a fc_err_recov=fast_fail
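
After deploying these rules and rebooting, a quick check on one of the fscsi devices should return yes and fast_fail (fscsi0 is just an example device name, adapt it to your configuration):

$ lsdev -dev fscsi0 -attr dyntrk,fc_err_recov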

What about new devices ?

Let’s now create a new SEA by adding new Virtual Ethernet Adapters using DLPAR and check that the devices are created with the correct values (I’m not showing here how to create the VEAs, I’m doing it in the GUI for simplicity; ent14, ent15, ent16 and ent17 are the new ones):

$ lsdev | grep ent
ent12            Available   EtherChannel / IEEE 802.3ad Link Aggregation
ent13            Available   Shared Ethernet Adapter
ent14            Available   Virtual I/O Ethernet Adapter (l-lan)
ent15            Available   Virtual I/O Ethernet Adapter (l-lan)
ent16            Available   Virtual I/O Ethernet Adapter (l-lan)
ent17            Available   Virtual I/O Ethernet Adapter (l-lan)
$ lsdev -dev ent14 -attr
buf_mode        min            Receive Buffer Mode                        True
copy_buffs      32             Transmit Copy Buffers                      True
max_buf_control 64             Maximum Control Buffers                    True
max_buf_huge    128            Maximum Huge Buffers                       True
max_buf_large   128            Maximum Large Buffers                      True
max_buf_medium  512            Maximum Medium Buffers                     True
max_buf_small   4096           Maximum Small Buffers                      True
max_buf_tiny    4096           Maximum Tiny Buffers                       True
min_buf_control 24             Minimum Control Buffers                    True
min_buf_huge    128            Minimum Huge Buffers                       True
min_buf_large   128            Minimum Large Buffers                      True
min_buf_medium  512            Minimum Medium Buffers                     True
min_buf_small   4096           Minimum Small Buffers                      True
min_buf_tiny    4096           Minimum Tiny Buffers                       True
$  mkvdev -sea ent0 -vadapter ent14 ent15 ent16 ent17 -default ent14 -defaultid 14 -attr ha_mode=sharing largesend=1 large_receive=yes
ent18 Available
$ entstat -all ent18 | grep -i buf
No mbuf Errors: 0
  Transmit Buffers
    Buffer Size             65536
    Buffers                    32
      No Buffers                0
  Receive Buffers
    Buffer Type              Tiny    Small   Medium    Large     Huge
    Min Buffers              4096     4096      512      128      128
    Max Buffers              4096     4096      512      128      128
  Buffer Mode: Min
[..]

Deploying these rules to another Virtual I/O Server

The goal is now to use this rules file and deploy it on all my Virtual I/O Servers, to be sure all the attributes are the same everywhere.

I’m copying my rules file and its catalog file to another Virtual I/O Server:

$ oem_setup_env
# cp /home/padmin/rules
# scp /home/padmin/rules/custom_rules.xml anothervios:/home/padmin/rules
custom_rules.xml                   100%   19KB  18.6KB/s   00:00
# scp /etc/security/artex/catalogs/devParam.adapter.vdevice.IBM,l-lan.xml anothervios:/etc/security/artex/catalogs/
devParam.adapter.vdevice.IBM,l-lan.xml
devParam.adapter.vdevice.IBM,l-lan.xml    100% 2737     2.7KB/s   00:00
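
As I have more than two Virtual I/O Servers, here is a minimal sketch of the loop I would use to push both files everywhere (run it from oem_setup_env as above; the vios names are placeholders):

# for vios in vios2 vios3 vios4 ; do scp /home/padmin/rules/custom_rules.xml ${vios}:/home/padmin/rules ; scp /etc/security/artex/catalogs/devParam.adapter.vdevice.IBM,l-lan.xml ${vios}:/etc/security/artex/catalogs/ ; done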

I’m now connecting to the new Virtual I/O Server and applying the rules:

$ rules -o import -f /home/padmin/rules/custom_rules.xml
$ rules -o diff -s
devParam.adapter.vdevice.IBM,l-lan:max_buf_tiny device=adapter/vdevice/IBM,l-lan    2048 | 4096
devParam.adapter.vdevice.IBM,l-lan:min_buf_tiny device=adapter/vdevice/IBM,l-lan     512 | 4096
devParam.adapter.vdevice.IBM,l-lan:max_buf_small device=adapter/vdevice/IBM,l-lan   2048 | 4096
devParam.adapter.vdevice.IBM,l-lan:min_buf_small device=adapter/vdevice/IBM,l-lan    512 | 4096
devParam.adapter.vdevice.IBM,l-lan:max_buf_medium device=adapter/vdevice/IBM,l-lan   256 | 512
devParam.adapter.vdevice.IBM,l-lan:min_buf_medium device=adapter/vdevice/IBM,l-lan   128 | 512
devParam.adapter.vdevice.IBM,l-lan:max_buf_huge device=adapter/vdevice/IBM,l-lan      64 | 128
devParam.adapter.vdevice.IBM,l-lan:min_buf_huge device=adapter/vdevice/IBM,l-lan      24 | 128
devParam.adapter.vdevice.IBM,l-lan:max_buf_large device=adapter/vdevice/IBM,l-lan     64 | 128
devParam.adapter.vdevice.IBM,l-lan:min_buf_large device=adapter/vdevice/IBM,l-lan     24 | 128
devParam.pseudo.vios.lpm:src_lun_val device=pseudo/vios/lpm                          off | on
devParam.pseudo.vios.lpm:dest_lun_val device=pseudo/vios/lpm                 restart_off | on
$ rules -o deploy
A manual post-operation is required for the changes to take effect, please reboot the system.
$ entstat -all ent18 | grep -i buf
[..]
    Buffer Type              Tiny    Small   Medium    Large     Huge
    Min Buffers               512      512      128       24       24
    Max Buffers              2048     2048      256       64       64
[..]
$ shutdown -force -restart
$ entstat -all ent18 | grep -i buf
[..]
   Buffer Type              Tiny    Small   Medium    Large     Huge
    Min Buffers              4096     4096      512      128      128
    Max Buffers              4096     4096      512      128      128
[..]

rulescfgset

If you don’t care at all about creating your own rules you can just use the rulescfgset command as padmin to apply the default Virtual I/O Server rules; my advice for newbies is to do that just after the Virtual I/O Server is installed. By doing that you will be sure to have the default IBM rules. It is a good practice to do that every time you deploy a new Virtual I/O Server.

# rulescfgset

Conclusion

Use rules! It is a good way to be sure your Virtual I/O Server device attributes are the same everywhere. I hope my examples are good enough to convince you to use them. For PowerVC users like me, using rules is a must. As PowerVC is creating devices for you, you want to be sure all your devices are created with the exact same attributes. My example about Virtual Ethernet Adapter buffers is just a mandatory thing to do now for PowerVC users. As always I hope it helps.

A first look at SRIOV vNIC adapters

I have the chance to participate in the current Early Shipment Program (ESP) for Power Systems, especially the software part. One of my tasks is to test a new feature called SRIOV vNIC. For those who do not know anything about SRIOV, this technology is comparable to LHEA except that it is based on an industry standard (and has a couple of other features). By using an SRIOV adapter you can divide a physical port into what we call Virtual Functions (or Logical Ports) and map a Virtual Function to a partition. You can also set Quality of Service on these Virtual Functions: at creation time you configure the Virtual Function to take a certain percentage of the physical port. This can be very useful if you want to be sure that your production server will always have a guaranteed bandwidth, instead of using a Shared Ethernet Adapter where all the client partitions compete for the bandwidth. Customers are also using SRIOV adapters for performance purposes; as nothing goes through the Virtual I/O Server the latency added by this layer is eliminated and CPU cycles are saved on the Virtual I/O Server side (a Shared Ethernet Adapter consumes a lot of CPU cycles). If you are not aware of what SRIOV is I encourage you to check the IBM Redbook about it (http://www.redbooks.ibm.com/abstracts/redp5065.html?Open). Unfortunately you can’t move a partition with Live Partition Mobility if it has a Virtual Function assigned to it. Using vNICs allows you to use SRIOV through the Virtual I/O Servers and enables the possibility to move your partition even if you are using an SRIOV logical port. The best of both worlds: performance/QoS and virtualization. Is this the end of the Shared Ethernet Adapter ?

SRIOV vNIC, what’s this ?

Before talking about the technical details it is important to understand what vNICs are. When I’m explaining this to newbies I often refer to NPIV: imagine something similar to NPIV but for the network part. By using an SRIOV vNIC:

  • A Virtual Function (SRIOV Logical Port) is created and assigned to the Virtual I/O Server.
  • A vNIC adapter is created in the client partition.
  • The Virtual Function and the vNIC adapter are linked (mapped) together.
  • This is a one to one relationship between a Virtual Function and a vNIC (just as a vfcs adapter is a one to one relationship between your vfcs and the physical Fibre Channel adapter).

On the image below, the vNIC lpars are the “yellow” ones. You can see here that the SRIOV adapter is divided into different Virtual Functions, and that some of them are mapped to the Virtual I/O Servers. The relationship between the Virtual Function and the vNIC is achieved by a vnicserver (this is a special Virtual I/O Server device).
vNIC

One of the major advantages of using vNIC is that you eliminate the need for the Virtual I/O Server in the data flow:

  • The network data flow is direct between the partition memory and the SRIOV adapter; there is no data copy passing through the Virtual I/O Server, which eliminates the CPU cost and the latency of doing that. This is achieved by LRDMA. Pretty cool!
  • The vNIC inherits the bandwidth allocation of the Virtual Function (QoS). If the VF is configured with a capacity of 2% the vNIC will also have this capacity.
  • vNIC2

vNIC Configuration

Before checking all the details on how to configure an SRIOV vNIC adapter you have to check all the prerequisites. As this is a new feature you will need the latest level of …. everything. My advice is to stay up to date as much as possible.

vNIC Prerequisites

These outputs are taken from the early shipment program. All of this can be changed at the GA release:

  • Hardware Management Console v840:
  • # lshmc -V
    lshmc -V
    "version= Version: 8
     Release: 8.4.0
     Service Pack: 0
    HMC Build level 20150803.3
    ","base_version=V8R8.4.0
    "
    
  • Power 8 only, firmware 840 at least (both enterprise and scale out systems):
  • firmware

  • AIX 7.1TL4 or AIX 7.2:
  • # oslevel -s
    7200-00-00-0000
    # cat /proc/version
    Oct 20 2015
    06:57:03
    1543A_720
    @(#) _kdb_buildinfo unix_64 Oct 20 2015 06:57:03 1543A_720
    
  • Obviously, at least one SRIOV capable adapter!

Using the HMC GUI

The configuration of a vNIC is done at the partition level and is only available in the enhanced version of the GUI. Select the virtual machine on which you want to add the vNIC, and in the Virtual I/O tab you’ll see a new Virtual NICs section. Click on “Virtual NICs” and a new panel will be opened with a new button called “Add Virtual NIC”; just click it to add a Virtual NIC:

vnic_n1
vnic_conf2

All the SRIOV capable ports will be displayed on the next screen. Choose the SRIOV port you want (a Virtual Function will be created on this port; don’t do anything more, the creation of a vNIC automatically creates the Virtual Function, assigns it to the Virtual I/O Server and does the mapping to the vNIC for you). Choose the Virtual I/O Server on which the vnicserver will be created (don’t worry, we will talk about vNIC redundancy later in this post) and the Virtual NIC Capacity, which is the percentage of the physical SRIOV port that will be dedicated to this vNIC (it has to be a multiple of 2, and be careful: it can’t be changed afterwards, you’ll have to delete and recreate the vNIC to change it):

vnic_conf3

The “Advanced Virtual NIC Settings” section allows you to choose the Virtual NIC Adapter ID, choose a MAC address, and configure the vlan restrictions and vlan tagging. In the example below I’m configuring my Virtual NIC in vlan 310:

vnic_conf4
vnic_conf5
allvnic

Using the HMC Command Line

As always the configuration can be achieved using the HMC command line, using lshwres to list vNIC and chhwres to create a vNIC.

List SRIOV adapters to get the adapter_id needed by the chhwres command:

# lshwres -r sriov --rsubtype adapter -m blade-8286-41A-21AFFFF
adapter_id=1,slot_id=21020014,adapter_max_logical_ports=48,config_state=sriov,functional_state=1,logical_ports=48,phys_loc=U78C9.001.WZS06RN-P1-C12,phys_ports=4,sriov_status=running,alternate_config=0
# lshwres -r virtualio  -m blade-8286-41A-21AFFFF --rsubtype vnic --level lpar --filter "lpar_names=72vm1"
lpar_name=72vm1,lpar_id=9,slot_num=7,desired_mode=ded,curr_mode=ded,port_vlan_id=310,pvid_priority=0,allowed_vlan_ids=all,mac_addr=ee3b8cd87707,allowed_os_mac_addrs=all,desired_capacity=2.0,backing_devices=sriov/vios1/2/1/1/27004008/2.0

Create the vNIC:

# chhwres -r virtualio -m blade-8286-41A-21AFFFF -o a -p 72vm1 --rsubtype vnic -v -a "port_vlan_id=310,backing_devices=sriov/vios2/1/1/1/2"

List the vNICs after the creation:

# lshwres -r virtualio  -m blade-8286-41A-21AFFFF --rsubtype vnic --level lpar --filter "lpar_names=72vm1"
lpar_name=72vm1,lpar_id=9,slot_num=7,desired_mode=ded,curr_mode=ded,port_vlan_id=310,pvid_priority=0,allowed_vlan_ids=all,mac_addr=ee3b8cd87707,allowed_os_mac_addrs=all,desired_capacity=2.0,backing_devices=sriov/vios1/2/1/1/27004008/2.0
lpar_name=72vm1,lpar_id=9,slot_num=2,desired_mode=ded,curr_mode=ded,port_vlan_id=310,pvid_priority=0,allowed_vlan_ids=all,mac_addr=ee3b8cd87702,allowed_os_mac_addrs=all,desired_capacity=2.0,backing_devices=sriov/vios2/1/1/1/2700400a/2.0
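
If you need to remove a vNIC from the command line, the same chhwres command with the remove operation and the slot number from the output above should do the job (this is only a sketch, check the chhwres manpage before using it):

# chhwres -r virtualio -m blade-8286-41A-21AFFFF -o r -p 72vm1 --rsubtype vnic -s 2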

System and Virtual I/O Server Side:

  • On the Virtual I/O Server you can use two commands to check your vNIC configuration. First use the lsmap command to check the one to one relationship between the VF and the vNIC (you can see in the output below that a VF and a vnicserver device are created, and you can also see the name of the vNIC on the client partition side):
  • # lsdev | grep VF
    ent4             Available   PCIe2 100/1000 Base-TX 4-port Converged Network Adapter VF (df1028e214103c04)
    # lsdev | grep vnicserver
    vnicserver0      Available   Virtual NIC Server Device (vnicserver)
    # lsmap -vadapter vnicserver0 -vnic
    Name          Physloc                            ClntID ClntName       ClntOS
    ------------- ---------------------------------- ------ -------------- -------
    vnicserver0   U8286.41A.21FFFFF-V2-C32897             6 72nim1         AIX
    
    Backing device:ent4
    Status:Available
    Physloc:U78C9.001.WZS06RN-P1-C12-T4-S16
    Client device name:ent1
    Client device physloc:U8286.41A.21FFFFF-V6-C3
    
  • You can get more details (QoS, vlan tagging, port states) by using the vnicstat command:
  • # vnicstat -b vnicserver0
    [..]
    --------------------------------------------------------------------------------
    VNIC Server Statistics: vnicserver0
    --------------------------------------------------------------------------------
    Device Statistics:
    ------------------
    State: active
    Backing Device Name: ent4
    
    Client Partition ID: 6
    Client Partition Name: 72nim1
    Client Operating System: AIX
    Client Device Name: ent1
    Client Device Location Code: U8286.41A.21FFFFF-V6-C3
    [..]
    Device ID: df1028e214103c04
    Version: 1
    Physical Port Link Status: Up
    Logical Port Link Status: Up
    Physical Port Speed: 1Gbps Full Duplex
    [..]
    Port VLAN (Priority:ID): 0:3331
    [..]
    VF Minimum Bandwidth: 2%
    VF Maximum Bandwidth: 100%
    
  • On the client side you can list your vNICs and, as always, get the details using the entstat command:
  • # lsdev -c adapter -s vdevice -t IBM,vnic
    ent0 Available  Virtual NIC Client Adapter (vnic)
    ent1 Available  Virtual NIC Client Adapter (vnic)
    ent3 Available  Virtual NIC Client Adapter (vnic)
    ent4 Available  Virtual NIC Client Adapter (vnic)
    # entstat -d ent0 | more
    [..]
    ETHERNET STATISTICS (ent0) :
    Device Type: Virtual NIC Client Adapter (vnic)
    [..]
    Virtual NIC Client Adapter (vnic) Specific Statistics:
    ------------------------------------------------------
    Current Link State: Up
    Logical Port State: Up
    Physical Port State: Up
    
    Speed Running:  1 Gbps Full Duplex
    
    Jumbo Frames: Disabled
    [..]
    Port VLAN ID Status: Enabled
            Port VLAN ID: 3331
            Port VLAN Priority: 0
    

Redundancy

You will certainly agree that having such a cool new feature without something fully redundant would be a shame. Fortunately we have a solution here, with the return in great fanfare of the Network Interface Backup (NIB). As I told you before, each time a vNIC is created a vnicserver is created on one of the Virtual I/O Servers (at vNIC creation you have to choose on which Virtual I/O Server it will be created). So to be fully redundant and to have a failover feature the only way is to create two vNIC adapters (one using the first Virtual I/O Server and the second one using the second Virtual I/O Server), and on top of this to create a Network Interface Backup, like in the old times :-) . Here are a couple of things and best practices to know before doing this.

  • You can’t use two VFs coming from the same SRIOV adapter physical port (the NIB creation will be ok, but any configuration on top of this NIB will fail).
  • You can use two VFs coming from the same SRIOV adapter but from two different physical ports (this is the example I will show below).
  • The best practice is to use two VFs coming from two different SRIOV adapters (you can then afford to lose one of the two SRIOV adapters).

vNIC_nib

Verify on your partition that you have two vNIC adapters and check that their statuses are ok using the ‘entstat‘ command:

  • Both vNIC are available on the client partition:
  • # lsdev -c adapter -s vdevice -t IBM,vnic
    ent0 Available  Virtual NIC Client Adapter (vnic)
    ent1 Available  Virtual NIC Client Adapter (vnic)
    # lsdev -c adapter -s vdevice -t IBM,vnic -F physloc
    U8286.41A.21FFFFF-V6-C2
    U8286.41A.21FFFFF-V6-C3
    
  • For the vNIC backed by the first Virtual I/O Server, check that “Current Link State”, “Logical Port State” and “Physical Port State” are ok (all of them need to be up):
  • # entstat -d ent0 | grep -p vnic
    -------------------------------------------------------------
    ETHERNET STATISTICS (ent0) :
    Device Type: Virtual NIC Client Adapter (vnic)
    Hardware Address: ee:3b:86:f6:45:02
    Elapsed Time: 0 days 0 hours 0 minutes 0 seconds
    
    Virtual NIC Client Adapter (vnic) Specific Statistics:
    ------------------------------------------------------
    Current Link State: Up
    Logical Port State: Up
    Physical Port State: Up
    
  • Same check for the vNIC backed by the second Virtual I/O Server:
  • # entstat -d ent1 | grep -p vnic
    -------------------------------------------------------------
    ETHERNET STATISTICS (ent1) :
    Device Type: Virtual NIC Client Adapter (vnic)
    Hardware Address: ee:3b:86:f6:45:03
    Elapsed Time: 0 days 0 hours 0 minutes 0 seconds
    
    Virtual NIC Client Adapter (vnic) Specific Statistics:
    ------------------------------------------------------
    Current Link State: Up
    Logical Port State: Up
    Physical Port State: Up
    

Verify on both Virtual I/O Servers that the two vNICs are backed by two different SRIOV adapters (for the purpose of this test I’m using two different ports of the same SRIOV adapter, but it remains the same with two different adapters). You can see in the outputs below that on Virtual I/O Server 1 the vNIC is backed by the port in position 3 (T3) and that on Virtual I/O Server 2 the vNIC is backed by the port in position 4 (T4):

  • Once again use the lsmap command on the first Virtual I/O Server to check that (note that you can also check the client name and the client device):
  • # lsmap -vadapter vnicserver0 -vnic
    Name          Physloc                            ClntID ClntName       ClntOS
    ------------- ---------------------------------- ------ -------------- -------
    vnicserver0   U8286.41A.21AFF8V-V1-C32897             6 72nim1         AIX
    
    Backing device:ent4
    Status:Available
    Physloc:U78C9.001.WZS06RN-P1-C12-T3-S13
    Client device name:ent0
    Client device physloc:U8286.41A.21AFF8V-V6-C2
    
  • Same thing on the second Virtual I/O Server:
  • # lsmap -vadapter vnicserver0 -vnic -fmt :
    vnicserver0:U8286.41A.21AFF8V-V2-C32897:6:72nim1:AIX:ent4:Available:U78C9.001.WZS06RN-P1-C12-T4-S14:ent1:U8286.41A.21AFF8V-V6-C3
    

Finally create the Network Interface Backup and put an IP on top of it:

# mkdev -c adapter -s pseudo -t ibm_ech -a adapter_names=ent0 -a backup_adapter=ent1
ent2 Available
# mktcpip -h 72nim1 -a 10.44.33.223 -i en2 -g 10.44.33.254 -m 255.255.255.0 -s
en2
72nim1
inet0 changed
en2 changed
inet0 changed
[..]
# echo "vnic" | kdb
+-------------------------------------------------+
|       pACS       | Device | Link |    State     |
|------------------+--------+------+--------------|
| F1000A0032880000 |  ent0  |  Up  |     Open     |
|------------------+--------+------+--------------|
| F1000A00329B0000 |  ent1  |  Up  |     Open     |
+-------------------------------------------------+

Let’s now try different things to see if the redundancy is working ok. First let’s shut down one of the Virtual I/O Servers and ping our machine from another host:

# ping 10.14.33.223
PING 10.14.33.223 (10.14.33.223) 56(84) bytes of data.
64 bytes from 10.14.33.223: icmp_seq=1 ttl=255 time=0.496 ms
64 bytes from 10.14.33.223: icmp_seq=2 ttl=255 time=0.528 ms
64 bytes from 10.14.33.223: icmp_seq=3 ttl=255 time=0.513 ms
[..]
64 bytes from 10.14.33.223: icmp_seq=40 ttl=255 time=0.542 ms
64 bytes from 10.14.33.223: icmp_seq=41 ttl=255 time=0.514 ms
64 bytes from 10.14.33.223: icmp_seq=47 ttl=255 time=0.550 ms
64 bytes from 10.14.33.223: icmp_seq=48 ttl=255 time=0.596 ms
[..]
--- 10.14.33.223 ping statistics ---
50 packets transmitted, 45 received, 10% packet loss, time 49052ms
rtt min/avg/max/mdev = 0.457/0.525/0.596/0.043 ms
# errpt | more
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
59224136   1120200815 P H ent2           ETHERCHANNEL FAILOVER
F655DA07   1120200815 I S ent0           VNIC Link Down
3DEA4C5F   1120200815 T S ent0           VNIC Error CRQ
81453EE1   1120200815 T S vscsi1         Underlying transport error
DE3B8540   1120200815 P H hdisk0         PATH HAS FAILED
# echo "vnic" | kdb
(0)> vnic
+-------------------------------------------------+
|       pACS       | Device | Link |    State     |
|------------------+--------+------+--------------|
| F1000A0032880000 |  ent0  | Down |   Unknown    |
|------------------+--------+------+--------------|
| F1000A00329B0000 |  ent1  |  Up  |     Open     |
+-------------------------------------------------+

Same test with the addition of an address to ping on the EtherChannel (the netaddr attribute), and this time I’m only losing 4 packets:

# ping 10.14.33.223
[..]
64 bytes from 10.14.33.223: icmp_seq=41 ttl=255 time=0.627 ms
64 bytes from 10.14.33.223: icmp_seq=42 ttl=255 time=0.548 ms
64 bytes from 10.14.33.223: icmp_seq=46 ttl=255 time=0.629 ms
64 bytes from 10.14.33.223: icmp_seq=47 ttl=255 time=0.492 ms
[..]
# errpt | more
59224136   1120203215 P H ent2           ETHERCHANNEL FAILOVER
F655DA07   1120203215 I S ent0           VNIC Link Down
3DEA4C5F   1120203215 T S ent0           VNIC Error CRQ

vNIC Live Partition Mobility

By default you can use Live Partition Mobility with SRIOV vNICs; it is super simple and fully supported by IBM. As always, I’ll show you how to do that using the HMC GUI and the command line:

Using the GUI

First validate the mobility operation; this will allow you to choose the destination SRIOV adapter/port on which to map your current vNICs. You have to choose:

  • The adapter (if you have more than one SRIOV adapter).
  • The Physical port on which the vNIC will be mapped.
  • The Virtual I/O Server on which the vnicserver will be created.

New options are now available in the mobility validation panel:

lpmiov1

Modify each vNIC to match your destination SRIOV adapter and ports (choose the destination Virtual I/O Server here):

lpmiov2
lpmiov3

Then migrate:

lpmiov4

IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
A5E6DB96   1120205915 I S pmig           Client Partition Migration Completed
4FB9389C   1120205915 I S ent1           VNIC Link Up
F655DA07   1120205915 I S ent1           VNIC Link Down
11FDF493   1120205915 I H ent2           ETHERCHANNEL RECOVERY
4FB9389C   1120205915 I S ent1           VNIC Link Up
4FB9389C   1120205915 I S ent0           VNIC Link Up
[..]
59224136   1120205915 P H ent2           ETHERCHANNEL FAILOVER
B50A3F81   1120205915 P H ent2           TOTAL ETHERCHANNEL FAILURE
F655DA07   1120205915 I S ent1           VNIC Link Down
3DEA4C5F   1120205915 T S ent1           VNIC Error CRQ
F655DA07   1120205915 I S ent0           VNIC Link Down
3DEA4C5F   1120205915 T S ent0           VNIC Error CRQ
08917DC6   1120205915 I S pmig           Client Partition Migration Started

The ping test during the LPM shows only 9 pings lost, due to an EtherChannel failover (one of my ports was down on the destination server):

# ping 10.14.33.223
64 bytes from 10.14.33.223: icmp_seq=23 ttl=255 time=0.504 ms
64 bytes from 10.14.33.223: icmp_seq=31 ttl=255 time=0.607 ms

Using the command line

I’m moving the partition back using the HMC command line interface; check the manpage for all the details. Here is the format of the vnic_mappings attribute: slot_num/ded/[vios_lpar_name]/[vios_lpar_id]/[adapter_id]/[physical_port_id]/[capacity]:

  • Validate:
  • # migrlpar -o v -m blade-8286-41A-21AFFFF -t  runner-8286-41A-21AEEEE  -p 72nim1 -i 'vnic_mappings="2/ded/vios1/1/1/2/2,3/ded/vios2/2/1/3/2"'
    
    Warnings:
    HSCLA291 The selected partition may have an open virtual terminal session.  The management console will force termination of the partition's open virtual terminal session when the migration has completed.
    
  • Migrate:
  • # migrlpar -o m -m blade-8286-41A-21AFFFF -t  runner-8286-41A-21AEEEE  -p 72nim1 -i 'vnic_mappings="2/ded/vios1/1/1/2/2,3/ded/vios2/2/1/3/2"'
    

Port Labelling

One very annoying thing when using LPM with vNICs is that you have to redo the mapping of your vNICs each time you move. The default choices are never ok: the GUI will always show you the first port or the first adapter and you will have to do that job by yourself. Even worse, with the command line the vnic_mappings can give you some headaches :-) . Fortunately there is a feature called port labelling. You can put a label on each SRIOV physical port on all your machines. My advice is to tag the ports that are serving the same network and the same vlan with the same label on all your machines. During the mobility operation, if labels match between the two machines, the adapter/port combination matching the label will be automatically chosen for the mobility and you will have nothing to map on your own. Super useful. The outputs below show you how to label your SRIOV ports:

label1
label2

# chhwres -m s00ka9942077-8286-41A-21C9F5V -r sriov --rsubtype physport -o s -a "adapter_id=1,phys_port_id=3,phys_port_label=adapter1port3"
# chhwres -m s00ka9942077-8286-41A-21C9F5V -r sriov --rsubtype physport -o s -a "adapter_id=1,phys_port_id=2,phys_port_label=adapter1port2"
# lshwres -m s00ka9942077-8286-41A-21C9F5V -r sriov --rsubtype physport --level eth -F adapter_id,phys_port_label
1,adapter1port2
1,adapter1port3
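
If you also want to see which physical port each label is attached to, just add phys_port_id to the field list (same command as above with one extra field):

# lshwres -m s00ka9942077-8286-41A-21C9F5V -r sriov --rsubtype physport --level eth -F adapter_id,phys_port_id,phys_port_label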

At the validation time source and destination ports will automatically be matched:

labelautochoose

What about performance ?

One of the main reasons I’m looking at SRIOV vNIC adapters is performance. As all of our design is based on the fact that we need to be able to move all of our virtual machines from one host to another, we need a solution allowing both mobility and performance. If you have tried to run a TSM server in a virtualized environment you’ll probably understand what I mean about performance and virtualization. In the case of TSM you need a lot of network bandwidth. My current customer and my previous one tried to do that using Shared Ethernet Adapters and of course this solution did not work, because a classic Virtual Ethernet Adapter is not able to provide enough bandwidth for a single Virtual I/O client. I’m not an expert in network performance but the results you will see below are pretty easy to understand and will show you the power of vNIC and SRIOV (I know some optimization can be done on the SEA side, but it’s just a super simple test).

Methodology

I will try here to compare a classic Virtual Ethernet Adapter with a vNIC in the same configuration; both environments are the same, using the same machines, the same switches and so on (an example of the iperf invocation is shown right after this list):

  • Two machines are used to do the test. In the vNIC case both are using a single vNIC backed by a 10Gb adapter; in the Virtual Ethernet Adapter case both are backed by a SEA built on top of a 10Gb adapter.
  • The two machines are running on two different S814s.
  • Entitlement and memory are the same for source and destination machines.
  • In the vNIC case the capacity of the VF is set at 100% and the physical port of the SRIOV adapter is dedicated to the vNIC.
  • In the Virtual Ethernet Adapter case the SEA is dedicated to the test virtual machine.
  • In both cases an MTU of 1500 is utilized.
  • The tool used for the performance test is iperf (MTU 1500, Window Size 64K, and 10 TCP threads).
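
For reference, here is the kind of iperf invocation used for these tests (the server hostname is a placeholder; the options match the methodology above). The first command is run on the iperf server, the second on the client (10 parallel TCP threads for 60 seconds):

$ iperf -s -w 64k
$ iperf -c iperfserver -w 64k -P 10 -t 60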

SEA test for reference only

  • iperf server:
  • seaserver1

  • iperf client:
  • seacli1

vNIC SRIOV test

We are here running the exact same test:

  • iperf server:
  • iperf_vnic_client2

  • iperf client:
  • iperf_vnic_client

By using a vNIC I get 300% of the bandwidth I get with a Virtual Ethernet Adapter. Just awesome ;-) , and with no tuning at all (out of the box configuration). Nothing more to add about it: it’s pretty obvious that using vNICs for performance will be a must.

Conclusion

Are SRIOV vNICs the end of the SEAs ? Maybe, but not yet! For some cases like performance and QoS they will be very useful and adopted (I’m pretty sure I will use them at my current customer to virtualize the TSM servers). But today, in my opinion, SRIOV lacks a real redundancy feature at the adapter level. What I want is a heartbeat communication between the two SRIOV adapters. Having such a feature on an SRIOV adapter will finally convince customers to move from SEA to SRIOV vNIC. I know nothing about the future but I hope something like that will be available in the next few years. To sum up, SRIOV vNICs are powerful, easy to use and simplify the configuration and management of your Power Servers. Please wait for the GA and try this new killer functionality. As always I hope it helps.

Using Chef and cloud-init with PowerVC 1.2.2.2 | What’s new in version 1.2.2.2

I’ve been busy, very busy, and I apologize for that … almost two months since the last update on the blog, but I’m still alive and I love AIX more than ever ;-) . There is no blog post about it, but I’ve developed a tool called “lsseas” which can be useful to all PowerVM administrators (you can find the script on github at this address: https://github.com/chmod666org/lsseas). I’ll not talk too much about it, but I thought sharing the information with all my readers who are not following me on twitter was the best way to promote the tool. Have a look at it, submit your own changes on github, code and share!

This said, we can talk about this new blog post. PowerVC 1.2.2.2 has been released for a few months now and there are a few things I wanted to talk about. The new version includes new features making the product more powerful than ever (export/import images, activation input, vscsi lun management). PowerVC only builds “empty” machines; it’s a good start but we can do better. The activation engine can customize the virtual machines but it is limited and, in my humble opinion, not really usable for post-installation tasks. With the recent release of cloud-init and Chef for AIX, PowerVC can be utilized to build your machines from nothing … and finally get your applications running in minutes. Using cloud-init and Chef can help you make your infrastructure repeatable, “versionable” and testable; this is what we call infrastructure as code, and it is damn powerful.

A big thank you to Jay Kruemcke (@chromeaix), Philippe Hermes (@phhermes) and S.Tran (https://github.com/transt); they gave me very useful help about the cloud-init support on AIX. Follow them on twitter!

PowerVC 1.2.2.1 mandatory fixes

Before starting, please note that I strongly recommend having the latest ifixes installed on your Virtual I/O Servers. These ones are mandatory for PowerVC; install these ifixes no matter what:

  • On Virtual I/O Servers install IV66758m4c, rsctvios2:
  • # emgr -X -e /mnt/VIOS_2.2.3.4_IV66758m4c.150112.epkg.Z
    # emgr -l
    [..]
    ID  STATE LABEL      INSTALL TIME      UPDATED BY ABSTRACT
    === ===== ========== ================= ========== ======================================
    1    S    rsctvios2  03/03/15 12:13:42            RSCT fixes for VIOS
    2    S    IV66758m4c 03/03/15 12:16:04            Multiple PowerVC fixes VIOS 2.2.3.4
    3    S    IV67568s4a 03/03/15 14:12:45            man fails in VIOS shell
    [..]
    
  • Check that you have the latest version of the Hardware Management Console (I strongly recommend V8R8.2.0 Service Pack 1):
  • hscroot@myhmc:~> lshmc -V
    "version= Version: 8
     Release: 8.2.0
     Service Pack: 1
    HMC Build level 20150216.1
    ","base_version=V8R8.2.0
    "
    

Exporting and importing image from another PowerVC

The latest PowerVC version allows you to export and import images. It’s a good thing! Let’s say that, like me, you have a few PowerVC hosts on different SAN networks with different storage arrays: you probably do not want to create your images on each one, and you prefer to be sure to use the same image for each PowerVC. Just create one image and use the export/import feature to copy/move this image to a different storage array or PowerVC host:

  • To do so, map your current image disk on the PowerVC host itself (in my case by using the SVC); you can’t attach a volume used as an image volume directly from PowerVC, so you have to do it on the storage side by hand:
  • maptohost
    maptohost2

  • On the PowerVC host, rescan the volume and copy the whole newly discovered lun with a dd:
  • powervc_source# rescan-scsi-bus.sh
    [..]
    powervc_source# multipath -ll
    mpathe (3600507680c810010f800000000000097) dm-10 IBM,2145
    [..]
    powervc_source# dd if=/dev/mapper/mpathe of=/data/download/aix7100-03-04-cloudinit-chef-ohai bs=4M
    16384+0 records in
    16384+0 records out
    68719476736 bytes (69 GB) copied, 314.429 s, 219 MB/s                                         
    
  • Map a new volume to the new PowerVC server, upload the newly created file on the new PowerVC server, then dd the file back to the new volume:
  • mapnewlun

    powervc_dest# scp /data/download/aix7100-03-04-cloudinit-chef-ohai new_powervc:/data/download
    aix7100-03-04-cloudinit-chef-ohai          100%   64GB  25.7MB/s   42:28.
    powervc_dest# dd if=/data/download/aix7100-03-04-cloudinit-chef-ohai of=/dev/mapper/mpathc bs=4M
    16384+0 records in
    16384+0 records out
    68719476736 bytes (69 GB) copied, 159.028 s, 432 MB/s
    
  • Unmap the volume from the new PowerVC after the dd operation, and import it with the PowerVC graphical interface.
  • Manage the existing volume you just created (note that the current PowerVC code does not allow you to choose cloud-init as an activation engine even if it is working great):
  • manage_ex1
    manage_ex2

  • Import the image:
  • import1
    import2
    import3
    import4

You can also use the powervc-volume-image-import command to import the new volume from the command line instead of the graphical user interface. Here is an example with a Red Hat Enterprise Linux 6.4 image:

powervc_source# dd if=/dev/hdisk4 of=/apps/images/rhel-6.4.raw bs=4M
5815360+0 records in
15360+0 records out
powervc_dest# scp 10.255.248.38:/apps/images/rhel-6.4.raw .
powervc_dest# dd if=/home/rhel-6.4.raw of=/dev/mapper/mpathe
30720+0 records in
30720+0 records out
64424509440 bytes (64 GB) copied, 124.799 s, 516 MB/s
powervc_dest# powervc-volume-image-import --name rhel64 --os rhel --volume volume_capture2 --activation-type ae
Password:
Image creation complete for image id: e3a4ece1-c0cd-4d44-b197-4bbbc2984a34

Activation input (cloud-init and ae)

Instead of doing post-installation tasks by hand after the deployment of the machine you can now use the activation input recently added to PowerVC. The activation input can be utilized to run any script you want, or even better things (such as cloud-config syntax) if you are using cloud-init instead of the old activation engine. Remember that cloud-init is not yet officially supported by PowerVC; for this reason I think most customers will still use the old activation engine. The latest activation engine version also works with the activation input. In the examples below I’m of course using cloud-init :-) . Don’t worry, I’ll detail later in this post how to install and use cloud-init on AIX:

  • If you are using the activation engine please be sure to use the latest version. The current version of the activation engine in PowerVC 1.2.2.* is vmc-vsae-ext-2.4.5-1; the only way to be sure you are using this version is to check the size of /opt/ibm/ae/AS/vmc-sys-net/activate.py. The size of this file is 21127 bytes for the latest version. Check this before trying to do anything with the activation input. More information can be found here: Activation input documentation.
  • A simple shebang script can be used, on the example below this one is just writing a file, but it can be anything you want:
  • ai1

    # cat /tmp/activation_input
    Activation input was used on this server
    
  • If you are using cloud-init you can directly put a cloud-config “script” in the activation input. The first line is always mandatory: it tells the format of the activation input. If you forget to put this first line cloud-init cannot determine the format and the script will not be executed. Check the next points for more information about the activation input:
  • ai2

    # cat /tmp/activation_input
    cloud-config activation input
    
  • There are additional fields called “server meta data key/value pairs”; just do not use them. They are used by images provided by IBM with customization of the activation engine. Forget about this, it is useless; use these fields only if IBM tells you to do so.
  • The valid cloud-init activation input formats can be found here: http://cloudinit.readthedocs.org/en/latest/topics/format.html. As you can see in the two examples above, shell scripts and the cloud-config format can be utilized, but you can also upload a gzip archive or use a part handler format. Go to the url above for more information (a cloud-config example is shown just after this list).
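
As an illustration, here is a minimal cloud-config activation input that would create the /tmp/activation_input file shown above (a sketch; it assumes the write_files module is enabled in your cloud.cfg, as it is in the standard module list):

#cloud-config
write_files:
  - path: /tmp/activation_input
    content: |
      cloud-config activation input
    permissions: '0644'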

vscsi and mix NPIV/vscsi machine creation

This is one of the major enhancements: PowerVC is now able to create and map vscsi disks, and even better you can create mixed NPIV/vscsi machines. To do so, create storage connectivity groups for each technology you want to use. You can choose a different way to create disks for boot volumes and for data volumes. Here are three examples: full NPIV, full vscsi, and a mix of vscsi (boot) and NPIV (data):

connectivitygroup1
connectivitygroup2
connectivitygroup3

What is really cool about this new feature is that PowerVC can use luns already mapped to the Virtual I/O Server. Please note that PowerVC will only use SAN backed devices and cannot use iSCSI or local disks (local disks can be used in the express version). You obviously have to do the zoning of your Virtual I/O Servers by yourself. Here is an example where I have 69 devices mapped to my Virtual I/O Server; you can see that PowerVC is using one of the existing devices for its deployment. This can be very useful if you have different teams working on the SAN and on the system side: the storage guys will not change their habits and can still map you a bunch of luns on the Virtual I/O Server. This can be used as a transition if you did not succeed in convincing the guys from your storage team:

$ lspv | wc -l
      69

connectivitygroup_deploy1

$ lspv | wc -l
      69
$ lsmap -all -fmt :
vhost1:U8202.E4D.845B2DV-V2-C28:0x00000009:vtopt0:Available:0x8100000000000000:/var/vio/VMLibrary/vopt_c1309be1ed244a5c91829e1a5dfd281c: :N/A:vtscsi1:Available:0x8200000000000000:hdisk66:U78AA.001.WZSKM6P-P1-C3-T1-W500507680C11021F-L41000000000000:false

Please note that you still need to add fabrics and storage on PowerVC even if you have pre-mapped luns on your Virtual I/O Servers. This is mandatory for PowerVC image management and creation.

Maintenance Mode

This last feature is probably the one I like the most. You can now put your hosts in maintenance mode: when you put a host in maintenance mode all the virtual machines hosted on it are migrated with Live Partition Mobility (remember the migrlpar --all option; I’m pretty sure this option is utilized for the PowerVC maintenance mode). A host in maintenance mode is no longer available for new machine deployments or for mobility operations. The host can then be shut down, for instance for a firmware upgrade.
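
If you want to achieve the same thing by hand outside of PowerVC, the migrlpar --all option evacuates every migratable partition from a host in one command. Here is a sketch (machine names are placeholders):

# migrlpar -o m -m source-8286-41A-21AFFFF -t target-8286-41A-21AEEEE --all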

  • Select a host and click the “Enter maintenance mode button”:
  • maintenance1

  • Choose where you want to move virtual machines, or let PowerVC decide for you (packing or stripping placement policy):
  • maintenance2

  • The host is entering maintenance mode:
  • maintenance3

  • Once the host is in maintenance mode this one is ready to be shutdown:
  • maintenance4

  • Leave the maintenance mode when you are ready:
  • maintenance5

An overview of Chef and cloud-init

With PowerVC you are now able to deploy new AIX virtual machines in a few minutes, but there is still some work to do. What about post-installation tasks ? I’m sure that most of you are using NIM post-install scripts for post-installation tasks. PowerVC does not use NIM, and even if you can run your own shell scripts after a PowerVC deployment, the goal of this tool is to automate a full installation… post-install included.

If the activation engine does the job of changing the hostname and ip address of the machine, it is pretty hard to customize it to do other tasks. Documentation is hard to find and I can assure you that it is not easy at all to customize and maintain. PowerVC Linux users are probably already aware of cloud-init. cloud-init is a tool (like the activation engine) in charge of the reconfiguration of your machine after its deployment; as the activation engine does today, cloud-init changes the hostname and the ip address of the machine, but it can do way more than that (create users, add ssh-keys, mount a filesystem, …). The good news is that cloud-init has been available on AIX for a few days, and you can use it with PowerVC. Awesome \o/.

If cloud-init can do one part of this job, it can’t do everything and is not designed for that! It is not a configuration management tool, configurations are not centralized on a server, there is no way to create cookbooks, runbooks (or whatever you call them), you can’t pull product sources from a git server; there are a lot of things missing. cloud-init is a light tool designed for a simple job. I recently (at work and in my spare time) played a lot with configuration management tools. I’m a huge fan of Saltstack but unfortunately salt-minion (which is the Saltstack client) is not available on AIX… I had to find another tool. A few months ago Chef (by Opscode) announced the support of AIX and a release of chef-client for AIX, so I decided to give it a try and I can assure you that this is damn powerful; let me explain this further.

Instead of creating shell scripts to do your post installation, Chef allows you to create cookbooks. Cookbooks are composed of recipes and each recipe does a task, for instance install an Oracle client, create the home directory for the root user and create its profile file, enable or disable a service on the system. The recipes are coded in a Chef language, and you can directly put Ruby code inside a recipe. Chef recipes are idempotent: if something has already been done, it will not be done again. The advantage of using a solution like this is that you don’t have to maintain shell code and shell scripts which are difficult to change/rewrite. Your infrastructure is repeatable and changeable in minutes (after Chef is installed you can for instance tell it to change /etc/resolv.conf on all your Websphere servers). This is called “infrastructure as code”. Give it a try and you’ll see that the first thing you’ll think will be “waaaaaaaaaaaaaooooooooooo”.

Trying to explain how PowerVC, cloud-init and Chef can work together is not really easy; a nice diagram is probably better than a long text:

chef

  1. You have built an AIX virtual machine. On this machine cloud-init is installed and Chef client 12 is installed. cloud-init is configured to register the chef-client on the chef-server and to run a cookbook for a specific role. This server has been captured with PowerVC and is now ready to be deployed.
  2. Virtual machines are created with PowerVC.
  3. When the machine is built, cloud-init runs on first boot. The ip address and the hostname of this machine are changed with the values provided in PowerVC. cloud-init creates the chef-client configuration (client.rb, validation.pem). Finally chef-client is called.
  4. chef-client registers on the chef-server. The machine is now known by the chef-server.
  5. chef-client resolves and downloads the cookbooks for a specific role. Cookbooks and recipes are executed on the machine. After the cookbooks execution the machine is ready and configured.
  6. The administrator creates and uploads cookbooks and recipes from his knife workstation (knife is the tool used to interact with the chef-server; it can be hosted anywhere you want, your laptop, a server …).

In a few step here is what you need to do to use PowerVC, cloud-init, and Chef together:

  1. Create a virtual machine with PowerVC.
  2. Download cloud-init, and install cloud-init in this virtual machine.
  3. Download chef-client, and install chef-client in this virtual machine.
  4. Configure cloud-init: modify /opt/freeware/etc/cloud/cloud.cfg. In this file put the Chef configuration of the cc_chef cloud-init module.
  5. Create the mandatory files, such as the /etc/chef directory, and put your ohai plugins in the /etc/chef/ohai_plugins directory.
  6. Stop the virtual machine.
  7. Capture the virtual machine with PowerVC.
  8. Obviously as prerequisites a chef-server is up and running, cookbooks, recipes, roles, environments are ok in this chef-server.

cloud-init installation

cloud-init is now available on AIX, but you have to build the rpm by yourself. Sources can be found on github at this address: https://github.com/transt/cloud-init-0.7.5. There are a lot of prerequisites; most of them can be found on the github page, some of them on the famous perzl site. Download and install these prerequisites, it is mandatory (links to download the prerequisites are on the github page, and the zip file containing cloud-init can be downloaded here: https://github.com/transt/cloud-init-0.7.5/archive/master.zip):

# rpm -ivh --nodeps gettext-0.17-8.aix6.1.ppc.rpm
[..]
gettext                     ##################################################
# for rpm in bzip2-1.0.6-2.aix6.1.ppc.rpm db-4.8.24-4.aix6.1.ppc.rpm expat-2.1.0-1.aix6.1.ppc.rpm gmp-5.1.3-1.aix6.1.ppc.rpm libffi-3.0.11-1.aix6.1.ppc.rpm openssl-1.0.1g-1.aix6.1.ppc.rpm zlib-1.2.5-6.aix6.1.ppc.rpm gdbm-1.10-1.aix6.1.ppc.rpm libiconv-1.14-1.aix6.1.ppc.rpm bash-4.2-9.aix6.1.ppc.rpm info-5.0-2.aix6.1.ppc.rpm readline-6.2-3.aix6.1.ppc.rpm ncurses-5.9-3.aix6.1.ppc.rpm sqlite-3.7.15.2-2.aix6.1.ppc.rpm python-2.7.6-1.aix6.1.ppc.rpm python-2.7.6-1.aix6.1.ppc.rpm python-devel-2.7.6-1.aix6.1.ppc.rpm python-xml-0.8.4-1.aix6.1.ppc.rpm python-boto-2.34.0-1.aix6.1.noarch.rpm python-argparse-1.2.1-1.aix6.1.noarch.rpm python-cheetah-2.4.4-2.aix6.1.ppc.rpm python-configobj-5.0.5-1.aix6.1.noarch.rpm python-jsonpointer-1.0.c1ec3df-1.aix6.1.noarch.rpm python-jsonpatch-1.8-1.aix6.1.noarch.rpm python-oauth-1.0.1-1.aix6.1.noarch.rpm python-pyserial-2.7-1.aix6.1.ppc.rpm python-prettytable-0.7.2-1.aix6.1.noarch.rpm python-requests-2.4.3-1.aix6.1.noarch.rpm libyaml-0.1.4-1.aix6.1.ppc.rpm python-setuptools-0.9.8-2.aix6.1.noarch.rpm fdupes-1.51-1.aix5.1.ppc.rpm ; do rpm -ivh $rpm ;done
[..]
python-oauth                ##################################################
python-pyserial             ##################################################
python-prettytable          ##################################################
python-requests             ##################################################
libyaml                     ##################################################

Build the rpm by following the commands below. You can reuse this rpm on every AIX on which you want to install cloud-init package:

# jar -xvf cloud-init-0.7.5-master.zip
inflated: cloud-init-0.7.5-master/upstart/cloud-log-shutdown.conf
# mv cloud-init-0.7.5-master  cloud-init-0.7.5
# chmod -Rf +x cloud-init-0.7.5/bin
# chmod -Rf +x cloud-init-0.7.5/tools
# cp cloud-init-0.7.5/packages/aix/cloud-init.spec.in /opt/freeware/src/packages/SPECS/cloud-init.spec
# tar -cvf cloud-init-0.7.5.tar cloud-init-0.7.5
[..]
a cloud-init-0.7.5/upstart/cloud-init.conf 1 blocks
a cloud-init-0.7.5/upstart/cloud-log-shutdown.conf 2 blocks
# gzip cloud-init-0.7.5.tar
# cp cloud-init-0.7.5.tar.gz /opt/freeware/src/packages/SOURCES/cloud-init-0.7.5.tar.gz
# rpm -v -bb /opt/freeware/src/packages/SPECS/cloud-init.spec
[..]
Requires: cloud-init = 0.7.5
Wrote: /opt/freeware/src/packages/RPMS/ppc/cloud-init-0.7.5-4.1.aix7.1.ppc.rpm
Wrote: /opt/freeware/src/packages/RPMS/ppc/cloud-init-doc-0.7.5-4.1.aix7.1.ppc.rpm
Wrote: /opt/freeware/src/packages/RPMS/ppc/cloud-init-test-0.7.5-4.1.aix7.1.ppc.rpm

Finally install the rpm:

# rpm -ivh /opt/freeware/src/packages/RPMS/ppc/cloud-init-0.7.5-4.1.aix7.1.ppc.rpm
cloud-init                  ##################################################
# rpm -qa | grep cloud-init
cloud-init-0.7.5-4.1

cloud-init configuration

By installing the cloud-init package on AIX some entries have been added to /etc/rc.d/rc2.d:

# ls -l /etc/rc.d/rc2.d | grep cloud
lrwxrwxrwx    1 root     system           33 Apr 26 15:13 S01cloud-init-local -> /etc/rc.d/init.d/cloud-init-local
lrwxrwxrwx    1 root     system           27 Apr 26 15:13 S02cloud-init -> /etc/rc.d/init.d/cloud-init
lrwxrwxrwx    1 root     system           29 Apr 26 15:13 S03cloud-config -> /etc/rc.d/init.d/cloud-config
lrwxrwxrwx    1 root     system           28 Apr 26 15:13 S04cloud-final -> /etc/rc.d/init.d/cloud-final

The default configuration file is located in /opt/freeware/etc/cloud/cloud.cfg. This configuration file is split in three parts. The first one, called cloud_init_modules, tells cloud-init to run specific modules when the cloud-init script is started at boot time: for instance set the hostname of the machine (set_hostname), reset the rmc (reset_rmc) and so on. In our case this part will automatically change the hostname and the ip address of the machine with the values provided in PowerVC at deployment time. This cloud_init_modules part is split in two, the local one and the normal one. The local one uses information provided by the cdrom built by PowerVC at deployment time; this cdrom provides the ip and hostname of the machine, the activation input script, and nameserver information. The datasource_list stanza tells cloud-init to use the “ConfigDrive” (in our case the virtual cdrom) to get the ip and hostname needed by some cloud_init_modules. The second part, called cloud_config_modules, tells cloud-init to run specific modules when the cloud-config script is called; at this stage the minimal requirements have already been configured by the previous cloud_init_modules stage (dns, ip address, hostname are ok). We will configure and set up the chef-client in this stage. The last part, called cloud_final_modules, tells cloud-init to run specific modules when the cloud-final script is called. At this step you can print a final message, reboot the host and so on (in my case a host reboot is needed by my install_sddpcm Chef recipe). Here is an overview of the cloud.cfg configuration file:

cloud-init

  • The datasource_list stanza tells cloud-init to use the virtual cdrom as a source of information:
  • datasource_list: ['ConfigDrive']
    
  • cloud_init_module:
  • cloud_init_modules:
    [..]
     - set-multipath-hcheck-interval
     - update-bootlist
     - reset-rmc
     - set_hostname
     - update_hostname
     - update_etc_host
    
  • cloud_config_module:
  • cloud_config_modules:
    [..]
      - mounts
      - chef
      - runcmd
    
  • cloud_final_module:
  • cloud_final_modules:
      [..]
      - final-message
    

If you do not want to use Chef at all you can modify the cloud.cfg file to fit your needs (running homemade scripts, mounting filesystems …), but my goal here is to do the job with Chef. We will try to do the minimal job with cloud-init, so the goal here is to configure cloud-init to configure the chef-client. Anyway I also wanted to play with cloud-init and see its capabilities. The full documentation of cloud-init can be found here: https://cloudinit.readthedocs.org/en/latest/. Here are a few things I just added (the Chef part will be detailed later), but keep in mind you can use cloud-init without Chef if you want (set up your ssh keys, mount or create filesystems, create files and so on):

write_files:
  - path: /tmp/cloud-init-started
    content: |
      cloud-init was started on this server
    permissions: '0755'
  - path: /var/log/cloud-init-sub.log
    content: |
      starting chef logging
    permissions: '0755'

final_message: "The system is up, cloud-init is finished"

EDIT: The IBM developer of cloud-init for AIX just sent me a mail yesterday about the new support of cc_power_state. As I need to reboot my host at the end of the build I can, with the latest version of cloud-init for AIX, use the power_state stanza. I use poweroff here as an example; use mode: reboot … for a reboot:

power_state:
 delay: "+5"
 mode: poweroff
 message: cloud-init mandatory reboot for sddpcm
 timeout: 5

power_state1

Rerun cloud-init for testing purpose

You probably want to test your cloud-init configuration before or after capturing the machine. When cloud-init is launched by the startup script a check is performed to be sure that cloud-init has not already been run. Some “semaphore” files are created in /opt/freeware/var/lib/cloud/instance/sem to record that modules have already been executed. If you want to re-run cloud-init by hand without having to rebuild a machine, just remove these files from this directory:

# rm -rf /opt/freeware/var/lib/cloud/instance/sem

Let’s say we just want to re-run the Chef part:

# rm /opt/freeware/var/lib/cloud/instance/sem/config_chef
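
Once the semaphore is removed, on my build the corresponding stage can be replayed by calling its startup script again (this is an assumption based on the rc scripts listed earlier; have a look at the script itself before relying on it):

# /etc/rc.d/init.d/cloud-config start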

To sum up here is what I want to do with cloud-init:

  1. Use the cdrom as datasource.
  2. Set the hostname and ip.
  3. Setup my chef-client.
  4. Print a final message.
  5. Do a mandatory reboot at the end of the installation.

chef-client installation and configuration

Before modifying the cloud.cfg file to tell cloud-init to set up the Chef client, we first have to download and install the chef-client on the AIX host we will capture later. Download the Chef client bff file at this address: https://opscode-omnibus-packages.s3.amazonaws.com/aix/6.1/powerpc/chef-12.1.2-1.powerpc.bff and install it:

# installp -aXYgd . chef
[..]
+-----------------------------------------------------------------------------+
                         Installing Software...
+-----------------------------------------------------------------------------+

installp: APPLYING software for:
        chef 12.1.2.1
[..]
Installation Summary
--------------------
Name                        Level           Part        Event       Result
-------------------------------------------------------------------------------
chef                        12.1.2.1        USR         APPLY       SUCCESS
chef                        12.1.2.1        ROOT        APPLY       SUCCESS
# lslpp -l | grep -i chef
  chef                      12.1.2.1    C     F    The full stack of chef
# which chef-client
/usr/bin/chef-client

The chef-client configuration files created by cloud-init will be placed in the /etc/chef directory. By default the /etc/chef directory does not exist, so you’ll have to create it:

# mkdir -p /etc/chef
# mkdir -p /etc/chef/ohai_plugins

If, like me, you are using custom ohai plugins, you have two things to do. cloud-init uses template files to build the configuration files needed by Chef. These template files are located in /opt/freeware/etc/cloud/templates. Modify the chef_client.rb.tmpl file to add a configuration line for the ohai plugin_path, and copy your ohai plugin to /etc/chef/ohai_plugins:

# tail -1 /opt/freeware/etc/cloud/templates/chef_client.rb.tmpl
Ohai::Config[:plugin_path] << '/etc/chef/ohai_plugins'
# ls /etc/chef/ohai_plugins
aixcustom.rb

Add the chef stanza in /opt/freeware/etc/cloud/cloud.cfg. After this step the image is ready to be captured (check the ohai plugin configuration first if you need one); the chef-client is already installed. Set the force_install stanza to false, set the server_url and the validation_name of your Chef server and organization, and finally put the validation RSA private key provided by your Chef server (in the example below the key has been truncated for obvious reasons; server_url and validation_name have also been replaced). As you can see below, I tell Chef to run the aix7 role, which runs the recipes defined in the aix7 cookbook; we'll see later how to create a cookbook and recipes:

chef:
  force_install: false
  server_url: "https://chefserver.lab.chmod666.org/organizations/chmod666"
  validation_name: "chmod666-validator"
  validation_key: |
    -----BEGIN RSA PRIVATE KEY-----
    MIIEpQIBAAKCAQEApj/Qqb+zppWZP+G3e/OA/2FXukNXskV8Z7ygEI9027XC3Jg8
    [..]
    XCEHzpaBXQbQyLshS4wAIVGxnPtyqXkdDIN5bJwIgLaMTLRSTtjH/WY=
    -----END RSA PRIVATE KEY-----
  run_list:
    - "role[aix7]"

runcmd:
  - /usr/bin/chef-client

EDIT: With the latest build of cloud-init for AIX there is no need to run chef-client with the runcmd stanza. Just add exec: 1 in the chef stanza.

To sum up: cloud-init is installed and configured to run a few actions at boot time, mainly to configure chef-client and run it with a specific role. The chef-client is installed. The machine can now be shut down and is ready to be deployed. At deployment time cloud-init will do the job of changing the ip address and hostname and configuring Chef. Chef will then retrieve the cookbooks and recipes and run them on the machine.

If you want to use custom ohai plugins read the ohai part before capturing your machine.

capture
capture2

Use chef-solo for testing

You will have to create your own recipes. My advice is to use chef-solo for debugging. The chef-solo binary is provided with the chef-client package and can be used without a Chef server to run and execute Chef recipes:

  • Create a test recipe:
  • # mkdir -p ~/chef/cookbooks/testing/recipes
    # cat  ~/chef/cookbooks/testing/recipes/test.rb
    file "/tmp/helloworld.txt" do
      owner "root"
      group "system"
      mode "0755"
      action :create
      content "Hello world !"
    end
    
  • Create a run_list with your test recipe:
  • # cat ~/chef/node.json
    {
      "run_list": [ "recipe[testing::test]" ]
    }
    
  • Create the configuration file for the chef-solo execution:
  • # cat  ~/chef/solo.rb
    file_cache_path "/root/chef"
    cookbook_path "/root/chef/cookbooks"
    json_attribs "/root/chef/node.json"
    
  • Run chef-solo:
  • # chef-solo -c /root/chef/solo.rb
    

chef-solo

cookbooks and recipes example on AIX

Let's say you have written all your recipes using chef-solo on a test server. On the Chef server you now want to put all these recipes in a cookbook. From the workstation, create a cookbook:

# knife cookbook create aix7
** Creating cookbook aix7 in /home/kadmin/.chef/cookbooks
** Creating README for cookbook: aix7
** Creating CHANGELOG for cookbook: aix7
** Creating metadata for cookbook: aix7

In the .chef directory you can now find a directory for the aix7 cookbook. In it you will find a directory for each type of Chef object: recipes, templates, files, and so on. This place is called the chef-repo. I strongly recommend turning it into a git repository (by doing this you keep track of every modification of any object in the cookbook).

# ls /home/kadmin/.chef/cookbooks/aix7/recipes
create_fs_rootvg.rb  create_profile_root.rb  create_user_group.rb  delete_group.rb  delete_user.rb  dns.rb  install_sddpcm.rb  install_ssh.rb  ntp.rb  ohai_custom.rb  test_ohai.rb
# ls /home/kadmin/.chef/cookbooks/aix7/templates/default
aixcustom.rb.erb  ntp.conf.erb  ohai_test.erb  resolv.conf.erb
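
Not shown above: knife also generates a metadata.rb at the root of the cookbook. As a purely illustrative sketch (maintainer and description are my assumptions; the version matches the 0.1.0 uploaded later in this post), it looks something like this:

# /home/kadmin/.chef/cookbooks/aix7/metadata.rb (illustrative content only)
name             'aix7'
maintainer       'kadmin'                                   # assumption
description      'Post-install recipes for AIX 7.1 LPARs'   # assumption
version          '0.1.0'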

Recipes

Here are a few examples of my own recipes:

  • install_ssh: the recipe mounts an nfs filesystem (from the nim server). The nim_server is an attribute coming from the role default attributes (we will check that later), and the oslevel is an attribute coming from a custom ohai plugin (we will check that later too). The openssh.license and openssh.base filesets are installed, the filesystem is unmounted, and finally the ssh service is started:
  • # creating temporary directory
    directory "/var/mnttmp" do
      action :create
    end
    # mouting nim server
    mount "/var/mnttmp" do
      device "#{node[:nim_server]}:/export/nim/lppsource/#{node['aixcustom']['oslevel']}"
      fstype "nfs"
      action :mount
    end
    # installing ssh packages (openssh.license, openssh.base)
    bff_package "openssh.license" do
      source "/var/mnttmp"
      action :install
    end
    bff_package "openssh.base" do
      source "/var/mnttmp"
      action :install
    end
    # umount the /var/mnttmp directory
    mount "/var/mnttmp" do
      fstype "nfs"
      action :umount
    end
    # deleting temporary directory
    directory "/var/mnttmp" do
      action :delete
    end
    # start and enable ssh service
    service "sshd" do
      action :start
    end
    
  • install_sddpcm: the recipe mounts an nfs filesystem (from the nim server). The nim_server is an attribute coming from the role default attributes, and the platform_version comes from ohai. The devices.fcp.disk.ibm.mpio and devices.sddpcm.71.rte filesets are installed, then the filesystem is unmounted:
  • # creating temporary directory
    directory "/var/mnttmp" do
      action :create
    end
    # mouting nim server
    mount "/var/mnttmp" do
      device "#{node[:nim_server]}:/export/nim/lpp_source/#{node['platform_version']}/sddpcm-71-2660"
      fstype "nfs"
      action :mount
    end
    # installing sddpcm packages (devices.fcp.disk.ibm.mpio, devices.sddpcm.71.rte)
    bff_package "devices.fcp.disk.ibm.mpio" do
      source "/var/mnttmp"
      action :install
    end
    bff_package "devices.sddpcm.71.rte" do
      source "/var/mnttmp"
      action :install
    end
    # umount the /var/mnttmp directory
    mount "/var/mnttmp" do
      fstype "nfs"
      action :umount
    end
    # deleting temporary directory
    directory "/var/mnttmp" do
      action :delete
    end
    
  • create_fs_rootvg: some filesystems are extended, and an /apps filesystem is created and mounted. Please note that there are no cookbooks for the AIX lvm at the moment, so you have to use the execute resource here, which is the only one that is not idempotent (see the guard sketch after this list):
  • execute "hd3" do
      command "chfs -a size=1024M /tmp"
    end
    execute "hd9var" do
      command "chfs -a size=512M /var"
    end
    execute "/apps" do
      command "crfs -v jfs2 -g rootvg -m /apps -Ay -a size=1M ; chlv -n appslv fslv00"
      not_if { ::File.exists?("/dev/appslv")}
    end
    mount "/apps" do
      device "/dev/appslv"
      fstype "jfs2"
    end
    
  • ntp, ntp.conf.erb located in the template directory is copied to /etc/ntp.conf:
  • template "/etc/ntp.conf" do
      source "ntp.conf.erb"
    end
    
  • dns, resolv.conf.erb located in the template directory is copied to /etc/resolv.conf:
  • template "/etc/resolv.conf" do
      source "resolv.conf.erb"
    end
    
  • create_user_group: a user for TADDM is created:
  • user "taddmux" do
      gid 'sys'
      uid 421
      home '/home/taddmux'
      comment 'user TADDM connect SSH'
    end
    

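As noted for the create_fs_rootvg recipe, the execute resource is not idempotent by itself. Here is a minimal sketch of how the chfs calls could be guarded with not_if (the awk parsing of the AIX df -k output and the 1048576 KB threshold, which is just 1024M expressed in df -k units, are my assumptions):

# sketch only: skip the chfs if /tmp is already at least 1024M (1048576 KB)
execute "extend /tmp" do
  command "chfs -a size=1024M /tmp"
  not_if "[ $(df -k /tmp | awk 'NR==2 {print $2}') -ge 1048576 ]"
end
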
Templates

In the recipes above, templates are used for the ntp and dns configuration. Template files are files in which some strings are replaced by Chef attributes found in roles, environments, ohai, or even directly in recipes. Here are the two files I used for dns and ntp (a small notification sketch follows them):

  • ntp.conf.erb, the ntpserver1,2,3 attributes are found in environments (let's say I have siteA and siteB and the ntp servers are different for each site; I can define an environment for siteA and one for siteB):
  • [..]
    server <%= node['ntpserver1'] %>
    server <%= node['ntpserver2'] %>
    server <%= node['ntpserver3'] %>
    driftfile /etc/ntp.drift
    tracefile /etc/ntp.trace
    
  • resolv.conf.erb, nameserver1,2,3 and namesearch are found in environments:
  • search  <%= node['namesearch'] %>
    nameserver      <%= node['nameserver1'] %>
    nameserver      <%= node['nameserver2'] %>
    nameserver      <%= node['nameserver3'] %>
    

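One possible refinement, not part of the original recipes: make the template notify the xntpd subsystem so the daemon is restarted whenever ntp.conf is re-rendered (Chef 12 ships an AIX service provider based on SRC, so the restart should map to stopsrc/startsrc):

# sketch: restart xntpd when the rendered ntp.conf changes
service "xntpd" do
  action :nothing
end

template "/etc/ntp.conf" do
  source "ntp.conf.erb"
  notifies :restart, "service[xntpd]", :delayed
end
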
role assignation

Chef roles can be used to run different Chef recipes depending on the type of server you want to post-install. You can for instance create a role for web servers in which a WebSphere recipe will be executed, and a role for database servers in which a recipe for Oracle will be executed. In my case, and for the simplicity of this example, I just created one role called aix7:

# knife role create aix7
Created role[aix7]
# knife role edit aix7
{
  "name": "aix7",
  "description": "",
  "json_class": "Chef::Role",
  "default_attributes": {
    "nim_server": "nimsrv01"
  },
  "override_attributes": {

  },
  "chef_type": "role",
  "run_list": [
    "recipe[aix7::ohai_custom]",
    "recipe[aix7::create_fs_rootvg]",
    "recipe[aix7::create_profile_root]",
    "recipe[aix7::test_ohai]",
    "recipe[aix7::install_ssh]",
    "recipe[aix7::install_sddpcm]",
    "recipe[aix7::ntp]",
    "recipe[aix7::dns]"
  ],
  "env_run_lists": {

  }
}

We can see two important things here. We created an attribute specific to this role called nim_server: in all recipes and templates, node['nim_server'] will be replaced by nimsrv01 (remember the recipes above, and remember we told chef-client to run the aix7 role). We also created a run_list telling Chef that the recipes coming from the aix7 cookbook (install_ssh, install_sddpcm, ...) should be executed on a server calling chef-client with the aix7 role.
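
Instead of editing the JSON with knife role edit, the same role can be kept under version control as a Ruby role file and loaded with knife role from file roles/aix7.rb. A shortened sketch (the description is mine and the run_list is abbreviated):

# roles/aix7.rb -- abbreviated sketch of the same role in the Ruby role DSL
name "aix7"
description "Post-install role for AIX 7.1 partitions"   # assumption
default_attributes "nim_server" => "nimsrv01"
run_list(
  "recipe[aix7::ohai_custom]",
  "recipe[aix7::install_ssh]",
  "recipe[aix7::install_sddpcm]"
)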

environments

Chef environments can be used to separate your environments, for instance production, development, backup, or in my example sites. In my company, depending on the site on which you are building a machine, the nameservers and ntp servers will differ. Remember that we are using template files for the resolv.conf and ntp.conf files:

knife environment show siteA
chef_type:           environment
cookbook_versions:
default_attributes:
  namesearch:  lab.chmod666.org chmod666.org
  nameserver1: 10.10.10.10
  nameserver2: 10.10.10.11
  nameserver3: 10.10.10.12
  ntpserver1:  11.10.10.10
  ntpserver2:  11.10.10.11
  ntpserver3:  11.10.10.12
description:         production site
json_class:          Chef::Environment
name:                siteA
override_attributes:

When chef-client is called with the -E siteA option, node['namesearch'] will be replaced by "lab.chmod666.org chmod666.org" in all recipes and template files.
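
Inside a recipe you can also check the environment directly through node.chef_environment; a trivial sketch, just to show the idea:

# sketch: logging which environment and NTP server a run is using
log "environment info" do
  message "running in #{node.chef_environment}, primary NTP server is #{node['ntpserver1']}"
  level :info
end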

A Chef run

When you are ok with your cookbook upload it to the Chef server:

# knife cookbook upload aix7
Uploading aix7           [0.1.0]
Uploaded 1 cookbook.

When chef-client is not executed by cloud-init you can run it by hand. I thought it would be interesting to put the output of a chef-client run here; you can see that files are modified, packages installed and so on ;-) :

chef-clientrun1
chef-clientrun2

Ohai

ohai is a command delivered with chef-client. Its purpose is to gather information about the machine on which chef-client is executed. Each time chef-client runs, a call to ohai is made. By default ohai gathers a lot of information such as the ip address of the machine, the lpar id, the lpar name, and so on. A call to ohai returns a json tree, and each element of this tree can be accessed in Chef recipes or templates. For instance, to get the lpar name you can use node['virtualization']['lpar_name'] (a short recipe sketch follows the output below). Here is an example of a single call to ohai:

# ohai | more
  "ipaddress": "10.244.248.56",
  "macaddress": "FA:A3:6A:5C:82:20",
  "os": "aix",
  "os_version": "1",
  "platform": "aix",
  "platform_version": "7.1",
  "platform_family": "aix",
  "uptime_seconds": 14165,
  "uptime": "3 hours 56 minutes 05 seconds",
  "virtualization": {
    "lpar_no": "7",
    "lpar_name": "s00va9940866-ada56a6e-0000004d"
  },

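As a quick illustration (a sketch, not part of the aix7 cookbook), any of these values can be consumed in a recipe like a regular attribute; /etc/motd is just an arbitrary target here:

# sketch: writing built-in ohai attributes into /etc/motd
file "/etc/motd" do
  owner "root"
  group "system"
  mode "0644"
  content "This LPAR is #{node['virtualization']['lpar_name']} running AIX #{node['platform_version']}\n"
end
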
At the time of writing this blog post there are, in my humble opinion, some attributes missing in ohai. For instance, if you want to install a specific package from an lpp_source you first need to know your current oslevel (I mean the output of oslevel -s). Fortunately ohai can be extended with custom plugins, so you can add your own attributes, whatever they are.

  • In ohai 7 (the one shipped with chef-client 12) an attribute needs to be added to the Chef client.rb configuration to tell ohai where the plugins are located. Remember that the chef-client is configured by cloud-init, so you need to modify the template used by cloud-init to build the client.rb file. This one is located in /opt/freeware/etc/cloud/templates:
  • # tail -1 /opt/freeware/etc/cloud/templates/chef_client.rb.tmpl
    Ohai::Config[:plugin_path] << '/etc/chef/ohai_plugins'
    # mkdir -p /etc/chef/ohai_plugins
    
  • After this modification the machine is ready to be captured.
  • You want your custom ohai plugins to be uploaded to the chef-client machine at the time of the chef-client execution. Here is an example of a custom ohai plugin used as a template. This one gathers the oslevel (oslevel -s), the node name, the partition name and the memory mode of the machine. These attributes are gathered with the lparstat command:
  • Ohai.plugin(:Aixcustom) do
      provides "aixcustom"
    
      collect_data(:aix) do
        aixcustom Mash.new
    
        oslevel = shell_out("oslevel -s").stdout.split($/)[0]
        nodename = shell_out("lparstat -i | awk -F ':' '$1 ~ \"Node Name\" {print $2}'").stdout.split($/)[0]
        partitionname = shell_out("lparstat -i | awk -F ':' '$1 ~ \"Partition Name\" {print $2}'").stdout.split($/)[0]
        memorymode = shell_out("lparstat -i | awk -F ':' '$1 ~ \"Memory Mode\" {print $2}'").stdout.split($/)[0]
    
        aixcustom[:oslevel] = oslevel
        aixcustom[:nodename] = nodename
        aixcustom[:partitionname] = partitionname
        aixcustom[:memorymode] = memorymode
      end
    end
    
  • The custom ohai plugin is written. Remember that you want it to be uploaded to the machine at chef-client execution time. The new attributes created by this plugin need to be added to ohai. Here is a recipe uploading the custom ohai plugin; at the time the plugin is uploaded, ohai is reloaded and the new attributes can be used in any further templates (for recipes you have no other choice than putting the custom ohai plugin in the directory before the capture). A sketch of a consumer recipe follows this list:
  • cat ~/.chef/cookbooks/aix7/recipes/ohai_custom.rb
    ohai "reload" do
      action :reload
    end
    
    template "/etc/chef/ohai_plugins/aixcustom.rb" do
      notifies :reload, "ohai[reload]", :immediately
    end
    

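The test_ohai recipe itself is not shown in this post; as an illustration only, here is a hypothetical recipe consuming the new aixcustom attributes. The lazy block delays evaluation until converge time, which matters because the attributes only exist once the ohai reload above has been triggered:

# hypothetical sketch: dump the attributes gathered by aixcustom.rb
file "/tmp/aixcustom_report.txt" do
  owner "root"
  group "system"
  mode "0644"
  content lazy {
    "oslevel:        #{node['aixcustom']['oslevel']}\n" \
    "node name:      #{node['aixcustom']['nodename']}\n" \
    "partition name: #{node['aixcustom']['partitionname']}\n" \
    "memory mode:    #{node['aixcustom']['memorymode']}\n"
  }
end
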
chef-server, chef workstation, knife

I'll not detail here how to set up a Chef server, nor how to configure your Chef workstation (knife). There are plenty of good tutorials about that on the internet. Please just note that you need to use Chef server 12 if you are using Chef client 12.

I had some difficulties during the configuration; here are a few tricks to know:

  • The cacert can be found here: /opt/opscode/embedded/ssl/cert/cacert.pem
  • The Chef validation key can be found in /etc/chef/chef-validator.pem

Building the machine, checking the logs

  • The write_files part was executed; the file is present in the /tmp filesystem:
  • # cat /tmp/cloud-init-started
    cloud-init was started on this server
    
  • The chef-client was configured, and files are present in the /etc/chef directory; looking at the log file, these files were created by cloud-init:
  • # ls -l /etc/chef
    total 32
    -rw-------    1 root     system         1679 Apr 26 23:46 client.pem
    -rw-r--r--    1 root     system          646 Apr 26 23:46 client.rb
    -rw-r--r--    1 root     system           38 Apr 26 23:46 firstboot.json
    -rw-r--r--    1 root     system         1679 Apr 26 23:46 validation.pem
    
    # grep chef /var/log/cloud-init-output.log
    2015-04-26 23:46:22,463 - importer.py[DEBUG]: Found cc_chef with attributes ['handle'] in ['cloudinit.config.cc_chef']
    2015-04-26 23:46:22,879 - util.py[DEBUG]: Writing to /opt/freeware/var/lib/cloud/instances/a8b8fe0d-34c1-4bdb-821c-777fca1c391f/sem/config_chef - wb: [420] 23 bytes
    2015-04-26 23:46:22,882 - helpers.py[DEBUG]: Running config-chef using lock ()
    2015-04-26 23:46:22,884 - util.py[DEBUG]: Writing to /etc/chef/validation.pem - wb: [420] 1679 bytes
    2015-04-26 23:46:22,887 - util.py[DEBUG]: Reading from /opt/freeware/etc/cloud/templates/chef_client.rb.tmpl (quiet=False)
    2015-04-26 23:46:22,889 - util.py[DEBUG]: Read 892 bytes from /opt/freeware/etc/cloud/templates/chef_client.rb.tmpl
    2015-04-26 23:46:22,954 - util.py[DEBUG]: Writing to /etc/chef/client.rb - wb: [420] 646 bytes
    2015-04-26 23:46:22,958 - util.py[DEBUG]: Writing to /etc/chef/firstboot.json - wb: [420] 38 bytes
    
  • The runcmd part was executed:
  • # cat /opt/freeware/var/lib/cloud/instance/scripts/runcmd
    #!/bin/sh
    /usr/bin/chef-client
    
    2015-04-26 23:46:22,488 - importer.py[DEBUG]: Found cc_runcmd with attributes ['handle'] in ['cloudinit.config.cc_runcmd']
    2015-04-26 23:46:22,983 - util.py[DEBUG]: Writing to /opt/freeware/var/lib/cloud/instances/a8b8fe0d-34c1-4bdb-821c-777fca1c391f/sem/config_runcmd - wb: [420] 23 bytes
    2015-04-26 23:46:22,986 - helpers.py[DEBUG]: Running config-runcmd using lock ()
    2015-04-26 23:46:22,987 - util.py[DEBUG]: Writing to /opt/freeware/var/lib/cloud/instances/a8b8fe0d-34c1-4bdb-821c-777fca1c391f/scripts/runcmd - wb: [448] 31 bytes
    2015-04-26 23:46:25,868 - util.py[DEBUG]: Running command ['/opt/freeware/var/lib/cloud/instance/scripts/runcmd'] with allowed return codes [0] (shell=False, capture=False)
    
  • The final message was printed in the cloud-init output log file:
  • 2015-04-26 23:06:01,203 - helpers.py[DEBUG]: Running config-final-message using lock ()
    The system is up, cloud-init is finished
    2015-04-26 23:06:01,240 - util.py[DEBUG]: The system is up, cloud-init is finished
    2015-04-26 23:06:01,242 - util.py[DEBUG]: Writing to /opt/freeware/var/lib/cloud/instance/boot-finished - wb: [420] 57 bytes
    

On the Chef server you can check that the client registered itself and get details about it:

# knife node list | grep a8b8fe0d-34c1-4bdb-821c-777fca1c391f
a8b8fe0d-34c1-4bdb-821c-777fca1c391f
# knife node show a8b8fe0d-34c1-4bdb-821c-777fca1c391f
Node Name:   a8b8fe0d-34c1-4bdb-821c-777fca1c391f
Environment: _default
FQDN:
IP:          10.10.208.61
Run List:    role[aix7]
Roles:       france_testing
Recipes:     aix7::create_fs_rootvg, aix7::create_profile_root
Platform:    aix 7.1
Tags:

What's next ?

If you have a look at the Chef supermarket (the place where you can download Chef cookbooks written by the community and validated by Opscode) you'll see that there are not a lot of cookbooks for AIX. I'm currently writing my own cookbook for the AIX logical volume manager and filesystem creation, but there is still a lot of work to do on cookbook creation for AIX. Here is a list of cookbooks that need to be written by the community: chdev, multibos, mksysb, nim client, wpar, update_all, ldap_client .... I could go on, but I'm sure you have plenty of ideas too. One last word: learn Ruby and write cookbooks; they will be used by the community and we can finally have a good configuration management tool on AIX. With PowerVC, cloud-init and Chef support, AIX will have a full "DevOps" stack and can finally fight against Linux. As always, I hope this blog post helps you to understand PowerVC, cloud-init and Chef!