What’s new in VIOS 2.2.4.10 and PowerVM: Part 1, Virtual I/O Server Rules

I will post a series of mini blog posts about the new features of PowerVM and Virtual I/O Server released this month, by which I mean Hardware Management Console 840, Power firmware 840 and Virtual I/O Server 2.2.4.10. As writing blog posts is not part of my job and I do it in my spare time, some of the topics I will cover have already been treated by other AIX bloggers, but I think the more material we have the better. Others, like this first one, will be new to you. So please accept my apologies if the topics are not what I call “0 day” (covered the day of release). In any case, writing things down helps me understand them better, and I add little details I have not seen in other blog posts or in the official documentation. In these mini posts I will always try to give you something new, at least my point of view as an IBM customer. I hope it will be useful for you.

The first topic I want to talk about is Virtual I/O Server rules. With the latest version, three new commands called “rules”, “rulescfgset” and “rulesdeploy” are available on the Virtual I/O Server. They help you configure your device attributes by creating, deploying, or checking rules against the current configuration. I’m 100% sure that every time you install a Virtual I/O Server you do the same thing over and over again: you check your buffer attributes, you check the attributes on the fibre channel adapters, and so on. Rules are a way to be sure everything is the same on all your Virtual I/O Servers: you can create a rule file (in XML format) that can be deployed on every Virtual I/O Server you install. Even better, if you are a PowerVC user like me, you want to be sure that any new device created by PowerVC is created with the attributes you want (for instance the buffers of Virtual Ethernet Adapters). In the “old days” you had to use the chdef command; you can now do this with rules. Rather than just giving you a list of commands, I’ll show you here what I’m now doing on my Virtual I/O Servers in 2.2.4.10.

Creating and modifying existing default rules

Before starting, here is a (non-exhaustive) list of the attributes I change on all my Virtual I/O Servers at deploy time. I now want to do that using rules (these are just examples; you can do much more with rules):

  • On fcs Adapters I’m changing the max_xfer_size attribute to 0x200000.
  • On fcs Adapters I’m changing the num_cmd_elems attribute to 2048.
  • On fscsi Devices I’m changing the dyntrk attribute to yes.
  • On fscsi Devices I’m changing the fc_err_recov to fast_fail.
  • On Virtual Ethernet Adapters I’m changing the max_buf_tiny attribute to 4096.
  • On Virtual Ethernet Adapters I’m changing the min_buf_tiny attribute to 4096.
  • On Virtual Ethernet Adapters I’m changing the max_buf_small attribute to 4096.
  • On Virtual Ethernet Adapters I’m changing the min_buf_small attribute to 4096.
  • On Virtual Ethernet Adapters I’m changing the max_buf_medium attribute to 512.
  • On Virtual Ethernet Adapters I’m changing the min_buf_medium attribute to 512.
  • On Virtual Ethernet Adapters I’m changing the max_buf_large attribute to 128.
  • On Virtual Ethernet Adapters I’m changing the min_buf_large attribute to 128.
  • On Virtual Ethernet Adapters I’m changing the max_buf_huge attribute to 128.
  • On Virtual Ethernet Adapters I’m changing the min_buf_huge attribute to 128.
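Before rules existed, applying the list above meant running chdev (or chdef for defaults) against every device on every Virtual I/O Server. A minimal sketch of that old-style automation, using the buffer values listed above (the device name ent8 is just a placeholder for whatever lsdev reports):

```python
# Sketch: generate the chdev commands that the rules feature now replaces.
# Attribute values are the ones listed above; the device name is a placeholder.
VEA_BUFFERS = {
    "max_buf_tiny": 4096, "min_buf_tiny": 4096,
    "max_buf_small": 4096, "min_buf_small": 4096,
    "max_buf_medium": 512, "min_buf_medium": 512,
    "max_buf_large": 128, "min_buf_large": 128,
    "max_buf_huge": 128, "min_buf_huge": 128,
}

def chdev_cmd(device, attrs):
    """Build a single 'chdev -dev ... -attr ... -perm' command line."""
    pairs = " ".join(f"{k}={v}" for k, v in sorted(attrs.items()))
    return f"chdev -dev {device} -attr {pairs} -perm"

print(chdev_cmd("ent8", VEA_BUFFERS))
```

The point of rules is precisely to get rid of this per-device, per-server loop.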

Modify existing attributes using rules

By default a “factory” default rule file now exists on the Virtual I/O Server. It is located in /home/padmin/rules/vios_current_rules.xml; you can check the content of the file (it’s an XML file) and list the rules contained in it:

# ls -l /home/padmin/rules
total 40
-r--r-----    1 root     system        17810 Dec 08 18:40 vios_current_rules.xml
$ oem_setup_env
# head -10 /home/padmin/rules/vios_current_rules.xml
<?xml version="1.0" encoding="UTF-8"?>
<Profile origin="get" version="3.0.0" date="2015-12-08T17:40:37Z">
 <Catalog id="devParam.disk.fcp.mpioosdisk" version="3.0">
  <Parameter name="reserve_policy" value="no_reserve" applyType="nextboot" reboot="true">
   <Target class="device" instance="disk/fcp/mpioosdisk"/>
  </Parameter>
 </Catalog>
 <Catalog id="devParam.disk.fcp.mpioapdisk" version="3.0">
  <Parameter name="reserve_policy" value="no_reserve" applyType="nextboot" reboot="true">
   <Target class="device" instance="disk/fcp/mpioapdisk"/>
[..]
$ rules -o list -d
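Since the rule file is plain XML, it is easy to inspect programmatically too. A small sketch, using a fragment modeled on the head output above, that extracts roughly what `rules -o list` prints (target, attribute, value):

```python
import xml.etree.ElementTree as ET

# Fragment modeled on /home/padmin/rules/vios_current_rules.xml shown above.
# Bytes input is used because the string carries an encoding declaration.
RULES_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<Profile origin="get" version="3.0.0" date="2015-12-08T17:40:37Z">
 <Catalog id="devParam.disk.fcp.mpioosdisk" version="3.0">
  <Parameter name="reserve_policy" value="no_reserve" applyType="nextboot" reboot="true">
   <Target class="device" instance="disk/fcp/mpioosdisk"/>
  </Parameter>
 </Catalog>
</Profile>"""

root = ET.fromstring(RULES_XML)
for param in root.iter("Parameter"):
    target = param.find("Target")
    print(target.get("instance"), param.get("name"), param.get("value"))
```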

Let’s now say you have an existing Virtual I/O Server with an existing SEA configured on it. You want two things from the rules:

  • Apply the rules to modify the existing devices.
  • Ensure that new devices will be created according to the rules.

For the purpose of this example we will work on the buffer attributes of a Virtual Network Adapter (the same concepts apply to other device types). So we have an SEA with Virtual Network Adapters and we want to change the buffer attributes. Let’s first check the current values on the Virtual Adapters:

$ lsdev -type adapter | grep -i Shared
ent13            Available   Shared Ethernet Adapter
$ lsdev -dev ent13 -attr virt_adapters
value

ent8,ent9,ent10,ent11
$ lsdev -dev ent8 -attr max_buf_huge,max_buf_large,max_buf_medium,max_buf_small,max_buf_tiny,min_buf_huge,min_buf_large,min_buf_medium,min_buf_small,min_buf_tiny
value

64
64
256
2048
2048
24
24
128
512
512
$ lsdev -dev ent9 -attr max_buf_huge,max_buf_large,max_buf_medium,max_buf_small,max_buf_tiny,min_buf_huge,min_buf_large,min_buf_medium,min_buf_small,min_buf_tiny
value

64
64
256
2048
2048
24
24
128
512
512

Let’s now check the values in the current Virtual I/O Server rules:

$ rules -o list | grep buf
adapter/vdevice/IBM,l-lan      max_buf_tiny         2048
adapter/vdevice/IBM,l-lan      min_buf_tiny         512
adapter/vdevice/IBM,l-lan      max_buf_small        2048
adapter/vdevice/IBM,l-lan      min_buf_small        512

For the tiny and small buffers I can change the rules easily using the rules command (with the modify operation):

$ rules -o modify -t adapter/vdevice/IBM,l-lan -a max_buf_tiny=4096
$ rules -o modify -t adapter/vdevice/IBM,l-lan -a min_buf_tiny=4096
$ rules -o modify -t adapter/vdevice/IBM,l-lan -a max_buf_small=4096
$ rules -o modify -t adapter/vdevice/IBM,l-lan -a min_buf_small=4096

I’m re-running the rules command to check that the rules are now modified:

$ rules -o list | grep buf
adapter/vdevice/IBM,l-lan      max_buf_tiny         4096
adapter/vdevice/IBM,l-lan      min_buf_tiny         4096
adapter/vdevice/IBM,l-lan      max_buf_small        4096
adapter/vdevice/IBM,l-lan      min_buf_small        4096

I can check the current values of my system against the currently defined rules by using the diff operation:

# rules -o diff -s
devParam.adapter.vdevice.IBM,l-lan:max_buf_tiny device=adapter/vdevice/IBM,l-lan    2048 | 4096
devParam.adapter.vdevice.IBM,l-lan:min_buf_tiny device=adapter/vdevice/IBM,l-lan     512 | 4096
devParam.adapter.vdevice.IBM,l-lan:max_buf_small device=adapter/vdevice/IBM,l-lan   2048 | 4096
devParam.adapter.vdevice.IBM,l-lan:min_buf_small device=adapter/vdevice/IBM,l-lan    512 | 4096
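The diff output has a fixed shape (catalog:attribute, target, then current value | rule value), so it can be post-processed, for instance to count pending changes across many servers. A sketch that parses the sample lines above:

```python
# Sketch: parse 'rules -o diff -s' output into (attribute, current, rule) tuples.
DIFF_OUTPUT = """\
devParam.adapter.vdevice.IBM,l-lan:max_buf_tiny device=adapter/vdevice/IBM,l-lan    2048 | 4096
devParam.adapter.vdevice.IBM,l-lan:min_buf_tiny device=adapter/vdevice/IBM,l-lan     512 | 4096"""

def parse_diff(text):
    rows = []
    for line in text.splitlines():
        head, _, values = line.partition(" device=")
        attr = head.split(":")[1]                     # attribute name after the catalog id
        target, current_and_rule = values.split(None, 1)
        current, rule = (v.strip() for v in current_and_rule.split("|"))
        rows.append((attr, current, rule))
    return rows

for attr, current, rule in parse_diff(DIFF_OUTPUT):
    print(f"{attr}: system={current} rule={rule}")
```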

Creating new attributes using rules

The rules embedded with the current Virtual I/O Server release contain nothing for the medium, large and huge buffers. Unfortunately for me, I modify these attributes by default and I want a rule capable of doing that. The goal is now to create a new set of rules for the buffers not already present in the default file. Let’s try to do that using the add operation:

# rules -o add -t adapter/vdevice/IBM,l-lan -a max_buf_medium=512
The rule is not supported or does not exist.

Annoying: I can’t add a rule for the medium buffer (same for the large and huge ones). The available attributes for each device are based on the current AIX ARTEX catalog. You can check the files present in the catalog to see which attributes are available for each device type; you can see in the output below that there is nothing in the current ARTEX catalog for the medium buffer.

$ oem_setup_env
# cd /etc/security/artex/catalogs
# ls -ltr | grep l-lan
-r--r-----    1 root     security       1261 Nov 10 00:30 devParam.adapter.vdevice.IBM,l-lan.xml
# grep medium devParam.adapter.vdevice.IBM,l-lan.xml
# 

To show that it is possible to add new rules, I’ll show you a simple example adding the ‘src_lun_val’ and ‘dest_lun_val’ attributes on the vioslpm0 device. First I check that I can add these rules by looking in the ARTEX catalog:

$ oem_setup_env
# cd /etc/security/artex/catalogs
# ls -ltr | grep lpm
-r--r-----    1 root     security       2645 Nov 10 00:30 devParam.pseudo.vios.lpm.xml
# grep -iE "src_lun_val|dest_lun_val" devParam.pseudo.vios.lpm.xml
  <ParameterDef name="dest_lun_val" type="string" targetClass="device" cfgmethod="attr" reboot="true">
  <ParameterDef name="src_lun_val" type="string" targetClass="device" cfgmethod="attr" reboot="true">

Then I check the range of authorized values for both attributes:

# lsattr -l vioslpm0 -a src_lun_val -R
on
off
# lsattr -l vioslpm0 -a dest_lun_val -R
on
off
restart_off
lpm_off

I look up the type using the lsdev command (here pseudo/vios/lpm):

# lsdev -P | grep lpm
pseudo         lpm             vios           VIOS LPM Adapter

I’m finally adding the rules and checking the differences:

$ rules -o add -t pseudo/vios/lpm -a src_lun_val=on
$ rules -o add -t pseudo/vios/lpm -a dest_lun_val=on
$ rules -o diff -s
devParam.adapter.vdevice.IBM,l-lan:max_buf_tiny device=adapter/vdevice/IBM,l-lan    2048 | 4096
devParam.adapter.vdevice.IBM,l-lan:min_buf_tiny device=adapter/vdevice/IBM,l-lan     512 | 4096
devParam.adapter.vdevice.IBM,l-lan:max_buf_small device=adapter/vdevice/IBM,l-lan   2048 | 4096
devParam.adapter.vdevice.IBM,l-lan:min_buf_small device=adapter/vdevice/IBM,l-lan    512 | 4096
devParam.pseudo.vios.lpm:src_lun_val device=pseudo/vios/lpm                          off | on
devParam.pseudo.vios.lpm:dest_lun_val device=pseudo/vios/lpm                 restart_off | on

But what about my buffers: is there any way to add these attributes to the current ARTEX catalog? The answer is yes. By looking in the catalog used for Virtual Ethernet Adapters (the file named devParam.adapter.vdevice.IBM,l-lan.xml) you will see that a message catalog named ‘vioent.cat’ is referenced by this XML file. Check the content of this catalog with the dspcat command and find whether there is anything related to the medium, large and huge buffers (all the catalog files are located in /usr/lib/methods):

$ oem_setup_env
# cd /usr/lib/methods
# dspcat vioent.cat |grep -iE "medium|large|huge"
1 : 10 Minimum Huge Buffers
1 : 11 Maximum Huge Buffers
1 : 12 Minimum Large Buffers
1 : 13 Maximum Large Buffers
1 : 14 Minimum Medium Buffers
1 : 15 Maximum Medium Buffers

Modify the XML file located in the ARTEX catalog and add the necessary information for these three new buffer types:

$ oem_setup_env
# vi /etc/security/artex/catalogs/devParam.adapter.vdevice.IBM,l-lan.xml
<?xml version="1.0" encoding="UTF-8"?>

<Catalog id="devParam.adapter.vdevice.IBM,l-lan" version="3.0" inherit="devCommon">

  <ShortDescription><NLSCatalog catalog="vioent.cat" setNum="1" msgNum="1">Virtual I/O Ethernet Adapter (l-lan)</NLSCatalog></ShortDescription>

  <ParameterDef name="min_buf_huge" type="integer" targetClass="device" cfgmethod="attr" reboot="true">
    <Description><NLSCatalog catalog="vioent.cat" setNum="1" msgNum="10">Minimum Huge Buffers</NLSCatalog></Description>
  </ParameterDef>

  <ParameterDef name="max_buf_huge" type="integer" targetClass="device" cfgmethod="attr" reboot="true">
    <Description><NLSCatalog catalog="vioent.cat" setNum="1" msgNum="11">Maximum Huge Buffers</NLSCatalog></Description>
  </ParameterDef>

  <ParameterDef name="min_buf_large" type="integer" targetClass="device" cfgmethod="attr" reboot="true">
    <Description><NLSCatalog catalog="vioent.cat" setNum="1" msgNum="12">Minimum Large Buffers</NLSCatalog></Description>
  </ParameterDef>

  <ParameterDef name="max_buf_large" type="integer" targetClass="device" cfgmethod="attr" reboot="true">
    <Description><NLSCatalog catalog="vioent.cat" setNum="1" msgNum="13">Maximum Large Buffers</NLSCatalog></Description>
  </ParameterDef>

  <ParameterDef name="min_buf_medium" type="integer" targetClass="device" cfgmethod="attr" reboot="true">
    <Description><NLSCatalog catalog="vioent.cat" setNum="1" msgNum="14">Minimum Medium Buffers</NLSCatalog></Description>
  </ParameterDef>

  <ParameterDef name="max_buf_medium" type="integer" targetClass="device" cfgmethod="attr" reboot="true">
    <Description><NLSCatalog catalog="vioent.cat" setNum="1" msgNum="15">Maximum Medium Buffers</NLSCatalog></Description>
  </ParameterDef>

[..]
  <ParameterDef name="max_buf_tiny" type="integer" targetClass="device" cfgmethod="attr" reboot="true">
    <Description><NLSCatalog catalog="vioent.cat" setNum="1" msgNum="19">Maximum Tiny Buffers</NLSCatalog></Description>
  </ParameterDef>


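Since the six new ParameterDef stanzas only differ by attribute name, message number and description, you can generate them instead of typing them by hand. A sketch, using the (name, message number, description) triples from the dspcat output above:

```python
# Sketch: generate the ParameterDef stanzas added to the ARTEX catalog above.
# (name, vioent.cat message number, description) per the dspcat output.
NEW_PARAMS = [
    ("min_buf_huge",   10, "Minimum Huge Buffers"),
    ("max_buf_huge",   11, "Maximum Huge Buffers"),
    ("min_buf_large",  12, "Minimum Large Buffers"),
    ("max_buf_large",  13, "Maximum Large Buffers"),
    ("min_buf_medium", 14, "Minimum Medium Buffers"),
    ("max_buf_medium", 15, "Maximum Medium Buffers"),
]

TEMPLATE = (
    '  <ParameterDef name="{name}" type="integer" targetClass="device" '
    'cfgmethod="attr" reboot="true">\n'
    '    <Description><NLSCatalog catalog="vioent.cat" setNum="1" '
    'msgNum="{msg}">{desc}</NLSCatalog></Description>\n'
    '  </ParameterDef>'
)

for name, msg, desc in NEW_PARAMS:
    print(TEMPLATE.format(name=name, msg=msg, desc=desc))
```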
Then I retry adding the rules for the medium, large and huge buffers … and it works great:

# rules -o add -t adapter/vdevice/IBM,l-lan -a max_buf_medium=512
# rules -o add -t adapter/vdevice/IBM,l-lan -a min_buf_medium=512
# rules -o add -t adapter/vdevice/IBM,l-lan -a max_buf_huge=128
# rules -o add -t adapter/vdevice/IBM,l-lan -a min_buf_huge=128
# rules -o add -t adapter/vdevice/IBM,l-lan -a max_buf_large=128
# rules -o add -t adapter/vdevice/IBM,l-lan -a min_buf_large=128

Deploying the rules

Now that a couple of rules are defined, let’s apply them on the Virtual I/O Server. First check the differences you will get after applying the rules by using the diff operation of the rules command:

$ rules -o diff -s
devParam.adapter.vdevice.IBM,l-lan:max_buf_tiny device=adapter/vdevice/IBM,l-lan    2048 | 4096
devParam.adapter.vdevice.IBM,l-lan:min_buf_tiny device=adapter/vdevice/IBM,l-lan     512 | 4096
devParam.adapter.vdevice.IBM,l-lan:max_buf_small device=adapter/vdevice/IBM,l-lan   2048 | 4096
devParam.adapter.vdevice.IBM,l-lan:min_buf_small device=adapter/vdevice/IBM,l-lan    512 | 4096
devParam.adapter.vdevice.IBM,l-lan:max_buf_medium device=adapter/vdevice/IBM,l-lan   256 | 512
devParam.adapter.vdevice.IBM,l-lan:min_buf_medium device=adapter/vdevice/IBM,l-lan   128 | 512
devParam.adapter.vdevice.IBM,l-lan:max_buf_huge device=adapter/vdevice/IBM,l-lan      64 | 128
devParam.adapter.vdevice.IBM,l-lan:min_buf_huge device=adapter/vdevice/IBM,l-lan      24 | 128
devParam.adapter.vdevice.IBM,l-lan:max_buf_large device=adapter/vdevice/IBM,l-lan     64 | 128
devParam.adapter.vdevice.IBM,l-lan:min_buf_large device=adapter/vdevice/IBM,l-lan     24 | 128
devParam.pseudo.vios.lpm:src_lun_val device=pseudo/vios/lpm                          off | on
devParam.pseudo.vios.lpm:dest_lun_val device=pseudo/vios/lpm                 restart_off | on

Let’s now deploy the rules using the deploy operation of the rules command. Notice that for some rules a reboot is mandatory for the changes to apply to existing devices: this is the case for the buffers, but not for the vioslpm0 attributes (we can check again that we now have no differences; some attributes are applied using the -P flag of the chdev command):

$ rules -o deploy 
A manual post-operation is required for the changes to take effect, please reboot the system.
$ lsdev -dev ent8 -attr min_buf_small
value

4096
$ lsdev -dev vioslpm0 -attr src_lun_val
value

on
$ rules -o diff -s

Don’t forget to reboot the Virtual I/O Server and check everything is ok after the reboot (check the kernel values by using entstat):

$ shutdown -force -restart
[..]
$ for i in ent8 ent9 ent10 ent11 ; do lsdev -dev $i -attr max_buf_huge,max_buf_large,max_buf_medium,max_buf_small,max_buf_tiny,min_buf_huge,min_buf_large,min_buf_medium,min_buf_small,min_buf_tiny ; done
[..]
128
128
512
4096
4096
128
128
512
4096
4096
$ entstat -all ent13 | grep -i buf
[..]
No mbuf Errors: 0
  Transmit Buffers
    Buffer Size             65536
    Buffers                    32
      No Buffers                0
  Receive Buffers
    Buffer Type              Tiny    Small   Medium    Large     Huge
    Min Buffers              4096     4096      512      128      128
    Max Buffers              4096     4096      512      128      128
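To verify the values after the reboot without eyeballing the entstat output, you can parse the Receive Buffers table and compare it against the rule values. A sketch working on the sample output above:

```python
# Sketch: turn the entstat 'Receive Buffers' table into dicts for checking.
ENTSTAT = """\
    Buffer Type              Tiny    Small   Medium    Large     Huge
    Min Buffers              4096     4096      512      128      128
    Max Buffers              4096     4096      512      128      128"""

lines = [l.split() for l in ENTSTAT.splitlines()]
types = lines[0][2:]                      # Tiny, Small, Medium, Large, Huge
min_bufs = dict(zip(types, map(int, lines[1][2:])))
max_bufs = dict(zip(types, map(int, lines[2][2:])))

print(min_bufs)
print(max_bufs)
```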

For the fibre channel adapters I use these rules:

$ rules -o modify -t driver/iocb/efscsi -a dyntrk=yes
$ rules -o modify -t driver/qliocb/qlfscsi -a dyntrk=yes
$ rules -o modify -t driver/qiocb/qfscsi -a dyntrk=yes
$ rules -o modify -t driver/iocb/efscsi -a fc_err_recov=fast_fail
$ rules -o modify -t driver/qliocb/qlfscsi -a fc_err_recov=fast_fail
$ rules -o modify -t driver/qiocb/qfscsi -a fc_err_recov=fast_fail
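The same two attributes are set on three driver types, so the six commands above are just the product of both lists; scripting this keeps the pattern obvious and typo-free. A sketch:

```python
from itertools import product

# Sketch: generate the six 'rules -o modify' commands for the FC drivers above.
DRIVERS = ["driver/iocb/efscsi", "driver/qliocb/qlfscsi", "driver/qiocb/qfscsi"]
ATTRS = ["dyntrk=yes", "fc_err_recov=fast_fail"]

cmds = [f"rules -o modify -t {t} -a {a}" for a, t in product(ATTRS, DRIVERS)]
for c in cmds:
    print(c)
```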

What about new devices?

Let’s now create a new SEA by adding new Virtual Ethernet Adapters with DLPAR and check that the devices are created with the right values (I’m not showing here how to create the VEAs; I used the GUI for simplicity). ent14 to ent17 are the new ones:

$ lsdev | grep ent
ent12            Available   EtherChannel / IEEE 802.3ad Link Aggregation
ent13            Available   Shared Ethernet Adapter
ent14            Available   Virtual I/O Ethernet Adapter (l-lan)
ent15            Available   Virtual I/O Ethernet Adapter (l-lan)
ent16            Available   Virtual I/O Ethernet Adapter (l-lan)
ent17            Available   Virtual I/O Ethernet Adapter (l-lan)
$ lsdev -dev ent14 -attr
buf_mode        min            Receive Buffer Mode                        True
copy_buffs      32             Transmit Copy Buffers                      True
max_buf_control 64             Maximum Control Buffers                    True
max_buf_huge    128            Maximum Huge Buffers                       True
max_buf_large   128            Maximum Large Buffers                      True
max_buf_medium  512            Maximum Medium Buffers                     True
max_buf_small   4096           Maximum Small Buffers                      True
max_buf_tiny    4096           Maximum Tiny Buffers                       True
min_buf_control 24             Minimum Control Buffers                    True
min_buf_huge    128            Minimum Huge Buffers                       True
min_buf_large   128            Minimum Large Buffers                      True
min_buf_medium  512            Minimum Medium Buffers                     True
min_buf_small   4096           Minimum Small Buffers                      True
min_buf_tiny    4096           Minimum Tiny Buffers                       True
$  mkvdev -sea ent0 -vadapter ent14 ent15 ent16 ent17 -default ent14 -defaultid 14 -attr ha_mode=sharing largesend=1 large_receive=yes
ent18 Available
$ entstat -all ent18 | grep -i buf
No mbuf Errors: 0
  Transmit Buffers
    Buffer Size             65536
    Buffers                    32
      No Buffers                0
  Receive Buffers
    Buffer Type              Tiny    Small   Medium    Large     Huge
    Min Buffers              4096     4096      512      128      128
    Max Buffers              4096     4096      512      128      128
  Buffer Mode: Min
[..]

Deploying these rules to another Virtual I/O Server

The goal is now to use this rule file and deploy it on all my Virtual I/O Servers, to be sure all the attributes are the same everywhere.

I copy my rule file and the modified ARTEX catalog to another Virtual I/O Server:

$ oem_setup_env
# cd /home/padmin/rules
# scp /home/padmin/rules/custom_rules.xml anothervios:/home/padmin/rules
custom_rules.xml                   100%   19KB  18.6KB/s   00:00
# scp /etc/security/artex/catalogs/devParam.adapter.vdevice.IBM,l-lan.xml anothervios:/etc/security/artex/catalogs/
devParam.adapter.vdevice.IBM,l-lan.xml
devParam.adapter.vdevice.IBM,l-lan.xml    100% 2737     2.7KB/s   00:00
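With more than a couple of Virtual I/O Servers, the copy/import/deploy sequence is worth scripting. A minimal sketch that only generates the command lines (the hostnames are placeholders, and whether you can chain commands over ssh this way depends on the padmin restricted shell):

```python
# Sketch: generate the copy/deploy command lines for a list of Virtual I/O
# Servers. Hostnames are placeholders; the paths match the ones used above.
VIOS_HOSTS = ["vios2", "vios3", "vios4"]
RULE_FILE = "/home/padmin/rules/custom_rules.xml"
CATALOG = "/etc/security/artex/catalogs/devParam.adapter.vdevice.IBM,l-lan.xml"

cmds = []
for host in VIOS_HOSTS:
    cmds.append(f"scp {RULE_FILE} {host}:/home/padmin/rules")
    cmds.append(f"scp {CATALOG} {host}:/etc/security/artex/catalogs/")
    cmds.append(f"ssh padmin@{host} 'rules -o import -f {RULE_FILE}; rules -o deploy'")

for c in cmds:
    print(c)
```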

I’m now connecting to the new Virtual I/O Server and applying the rules:

$ rules -o import -f /home/padmin/rules/custom_rules.xml
$ rules -o diff -s
devParam.adapter.vdevice.IBM,l-lan:max_buf_tiny device=adapter/vdevice/IBM,l-lan    2048 | 4096
devParam.adapter.vdevice.IBM,l-lan:min_buf_tiny device=adapter/vdevice/IBM,l-lan     512 | 4096
devParam.adapter.vdevice.IBM,l-lan:max_buf_small device=adapter/vdevice/IBM,l-lan   2048 | 4096
devParam.adapter.vdevice.IBM,l-lan:min_buf_small device=adapter/vdevice/IBM,l-lan    512 | 4096
devParam.adapter.vdevice.IBM,l-lan:max_buf_medium device=adapter/vdevice/IBM,l-lan   256 | 512
devParam.adapter.vdevice.IBM,l-lan:min_buf_medium device=adapter/vdevice/IBM,l-lan   128 | 512
devParam.adapter.vdevice.IBM,l-lan:max_buf_huge device=adapter/vdevice/IBM,l-lan      64 | 128
devParam.adapter.vdevice.IBM,l-lan:min_buf_huge device=adapter/vdevice/IBM,l-lan      24 | 128
devParam.adapter.vdevice.IBM,l-lan:max_buf_large device=adapter/vdevice/IBM,l-lan     64 | 128
devParam.adapter.vdevice.IBM,l-lan:min_buf_large device=adapter/vdevice/IBM,l-lan     24 | 128
devParam.pseudo.vios.lpm:src_lun_val device=pseudo/vios/lpm                          off | on
devParam.pseudo.vios.lpm:dest_lun_val device=pseudo/vios/lpm                 restart_off | on
$ rules -o deploy
A manual post-operation is required for the changes to take effect, please reboot the system.
$ entstat -all ent18 | grep -i buf
[..]
    Buffer Type              Tiny    Small   Medium    Large     Huge
    Min Buffers               512      512      128       24       24
    Max Buffers              2048     2048      256       64       64
[..]
$ shutdown -force -restart
$ entstat -all ent18 | grep -i buf
[..]
   Buffer Type              Tiny    Small   Medium    Large     Huge
    Min Buffers              4096     4096      512      128      128
    Max Buffers              4096     4096      512      128      128
[..]

rulescfgset

If you don’t care at all about creating your own rules, you can just run the rulescfgset command as padmin to apply the default Virtual I/O Server rules. My advice for newbies is to do that just after the Virtual I/O Server is installed; that way you will be sure to have the default IBM rules. It is a good practice to do this every time you deploy a new Virtual I/O Server.

# rulescfgset

Conclusion

Use rules! It is a good way to be sure your Virtual I/O Server device attributes are the same. I hope my examples are good enough to convince you to use them. For PowerVC users like me, using rules is a must: as PowerVC creates devices for you, you want to be sure all your devices are created with the exact same attributes. My example about Virtual Ethernet Adapter buffers is now simply a mandatory thing to do for PowerVC users. As always, I hope it helps.

A first look at SRIOV vNIC adapters

I have the chance to participate in the current Early Shipment Program (ESP) for Power Systems, especially the software part. One of my tasks is to test a new feature called SRIOV vNIC. For those who don’t know anything about SRIOV, this technology is comparable to LHEA except that it is based on an industry standard (and has a couple of other features). Using an SRIOV adapter you can divide a physical port into what we call Virtual Functions (or Logical Ports) and map a Virtual Function to a partition. You can also set Quality of Service on these Virtual Functions: at creation time you configure the Virtual Function to take a certain percentage of the physical port. This can be very useful if you want to be sure that your production server will always have a guaranteed bandwidth, instead of using a Shared Ethernet Adapter where all client partitions compete for the bandwidth. Customers also use SRIOV adapters for performance purposes: as nothing goes through the Virtual I/O Server, the latency added by that hop is eliminated and CPU cycles are saved on the Virtual I/O Server side (a Shared Ethernet Adapter consumes a lot of CPU cycles). If you are not aware of what SRIOV is, I encourage you to check the IBM Redbook about it (http://www.redbooks.ibm.com/abstracts/redp5065.html?Open). Unfortunately you can’t move a partition with Live Partition Mobility if it has a Virtual Function assigned to it. Using vNICs allows you to use SRIOV through the Virtual I/O Servers and enables the possibility to move your partition even if you are using an SRIOV logical port. The best of both worlds: performance/QoS and virtualization. Is this the end of the Shared Ethernet Adapter?

SRIOV vNIC, what’s this?

Before talking about the technical details it is important to understand what vNICs are. When I explain this to newbies I often refer to NPIV: imagine something similar to NPIV, but for the network part. Using an SRIOV vNIC:

  • A Virtual Function (SRIOV Logical Port) is created and assigned to the Virtual I/O Server.
  • A vNIC adapter is created in the client partition.
  • The Virtual Function and the vNIC adapter are linked (mapped) together.
  • There is a one-to-one relationship between a Virtual Function and a vNIC (just as, with NPIV, there is a one-to-one relationship between a virtual fibre channel adapter and the physical fibre channel adapter).

In the image below, the vNIC lpars are the “yellow” ones. You can see that the SRIOV adapter is divided into different Virtual Functions, and some of them are mapped to the Virtual I/O Server. The relationship between the Virtual Function and the vNIC is handled by a vnicserver (a special Virtual I/O Server device).
vNIC

One of the major advantages of using vNICs is that you eliminate the need for the Virtual I/O Server in the data path:

  • The network data flow is direct between the partition memory and the SRIOV adapter; no data copy passes through the Virtual I/O Server, which eliminates the CPU cost and the latency of doing so. This is achieved by LRDMA. Pretty cool!
  • The vNIC inherits the bandwidth allocation of the Virtual Function (QoS). If the VF is configured with a capacity of 2%, the vNIC will also have this capacity.
vNIC2
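The capacity percentage translates directly into a guaranteed minimum share of the physical port. A quick sketch of the arithmetic (the port speeds are assumed examples; the multiple-of-2 constraint comes from the HMC configuration screen):

```python
# Sketch: minimum guaranteed bandwidth implied by a VF capacity setting.
# Capacity must be a multiple of 2%; port speeds here are assumed examples.
def min_bandwidth_mbps(port_speed_gbps, capacity_pct):
    if capacity_pct % 2 != 0:
        raise ValueError("capacity must be a multiple of 2")
    return port_speed_gbps * 1000 * capacity_pct / 100

print(min_bandwidth_mbps(10, 2))    # 2% of a 10 Gbps port
print(min_bandwidth_mbps(1, 100))   # a whole 1 Gbps port
```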

vNIC Configuration

Before checking the details of how to configure an SRIOV vNIC adapter, you have to check all the prerequisites. As this is a new feature you will need the latest level of … everything. My advice is to stay as up to date as possible.

vNIC Prerequisites

These outputs are taken from the Early Shipment Program; all of this may change at the GA release:

  • Hardware Management Console v840:
  • # lshmc -V
    "version= Version: 8
     Release: 8.4.0
     Service Pack: 0
    HMC Build level 20150803.3
    ","base_version=V8R8.4.0
    "
    
  • Power 8 only, firmware 840 at least (both enterprise and scale out systems):
  • firmware

  • AIX 7.1TL4 or AIX 7.2:
  • # oslevel -s
    7200-00-00-0000
    # cat /proc/version
    Oct 20 2015
    06:57:03
    1543A_720
    @(#) _kdb_buildinfo unix_64 Oct 20 2015 06:57:03 1543A_720
    
  • Obviously, at least one SRIOV-capable adapter!

Using the HMC GUI

The configuration of a vNIC is done at the partition level and is only available in the enhanced version of the GUI. Select the virtual machine on which you want to add the vNIC; in the Virtual I/O tab you’ll see that a new Virtual NICs section is present. Click on “Virtual NICs” and a new panel will open with a button called “Add Virtual NIC”; just click it to add a Virtual NIC:

vnic_n1
vnic_conf2

All the SRIOV-capable ports will be displayed on the next screen. Choose the SRIOV port you want (a Virtual Function will be created on it; don’t do anything more, as creating a vNIC automatically creates the Virtual Function, assigns it to a Virtual I/O Server and does the mapping to the vNIC for you). Choose the Virtual I/O Server that will host the vNIC server (don’t worry, we will talk about vNIC redundancy later in this post) and the Virtual NIC capacity, i.e. the percentage of the physical SRIOV port that will be dedicated to this vNIC. The capacity has to be a multiple of 2, and be careful: it can’t be changed afterwards, so you’d have to delete and recreate the vNIC to change it:

vnic_conf3

The “Advanced Virtual NIC Settings” allow you to choose the Virtual NIC adapter ID and MAC address, and to configure VLAN restrictions and VLAN tagging. In the example below I’m configuring my Virtual NIC in VLAN 310:

vnic_conf4
vnic_conf5
allvnic

Using the HMC Command Line

As always, the configuration can also be achieved from the HMC command line: use lshwres to list vNICs and chhwres to create one.

List SRIOV adapters to get the adapter_id needed by the chhwres command:

# lshwres -r sriov --rsubtype adapter -m blade-8286-41A-21AFFFF
adapter_id=1,slot_id=21020014,adapter_max_logical_ports=48,config_state=sriov,functional_state=1,logical_ports=48,phys_loc=U78C9.001.WZS06RN-P1-C12,phys_ports=4,sriov_status=running,alternate_config=0
# lshwres -r virtualio  -m blade-8286-41A-21AFFFF --rsubtype vnic --level lpar --filter "lpar_names=72vm1"
lpar_name=72vm1,lpar_id=9,slot_num=7,desired_mode=ded,curr_mode=ded,port_vlan_id=310,pvid_priority=0,allowed_vlan_ids=all,mac_addr=ee3b8cd87707,allowed_os_mac_addrs=all,desired_capacity=2.0,backing_devices=sriov/vios1/2/1/1/27004008/2.0

Create the vNIC:

# chhwres -r virtualio -m blade-8286-41A-21AFFFF -o a -p 72vm1 --rsubtype vnic -v -a "port_vlan_id=310,backing_devices=sriov/vios2/1/1/1/2"

List the vNIC after create:

# lshwres -r virtualio  -m blade-8286-41A-21AFFFF --rsubtype vnic --level lpar --filter "lpar_names=72vm1"
lpar_name=72vm1,lpar_id=9,slot_num=7,desired_mode=ded,curr_mode=ded,port_vlan_id=310,pvid_priority=0,allowed_vlan_ids=all,mac_addr=ee3b8cd87707,allowed_os_mac_addrs=all,desired_capacity=2.0,backing_devices=sriov/vios1/2/1/1/27004008/2.0
lpar_name=72vm1,lpar_id=9,slot_num=2,desired_mode=ded,curr_mode=ded,port_vlan_id=310,pvid_priority=0,allowed_vlan_ids=all,mac_addr=ee3b8cd87702,allowed_os_mac_addrs=all,desired_capacity=2.0,backing_devices=sriov/vios2/1/1/1/2700400a/2.0
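The backing_devices field packs several identifiers into one slash-separated string. A sketch that splits it into named fields (the field names are my interpretation of the lshwres output above, not official documentation):

```python
# Sketch: split the lshwres backing_devices string into fields.
# Field names are my reading of the output shown above, not official docs.
def parse_backing_device(s):
    kind, vios, vios_id, adapter_id, phys_port, lport, capacity = s.split("/")
    return {
        "type": kind, "vios": vios, "vios_id": int(vios_id),
        "sriov_adapter_id": int(adapter_id), "phys_port_id": int(phys_port),
        "logical_port_id": lport, "capacity_pct": float(capacity),
    }

bd = parse_backing_device("sriov/vios1/2/1/1/27004008/2.0")
print(bd["vios"], bd["capacity_pct"])
```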

System and Virtual I/O Server Side:

  • On the Virtual I/O Server you can use two commands to check your vNIC configuration. First use the lsmap command to check the one-to-one relationship between the VF and the vNIC (you can see in the output below that a VF and a vnicserver device were created, as well as the name of the vNIC on the client partition side):
  • # lsdev | grep VF
    ent4             Available   PCIe2 100/1000 Base-TX 4-port Converged Network Adapter VF (df1028e214103c04)
    # lsdev | grep vnicserver
    vnicserver0      Available   Virtual NIC Server Device (vnicserver)
    # lsmap -vadapter vnicserver0 -vnic
    Name          Physloc                            ClntID ClntName       ClntOS
    ------------- ---------------------------------- ------ -------------- -------
    vnicserver0   U8286.41A.21FFFFF-V2-C32897             6 72nim1         AIX
    
    Backing device:ent4
    Status:Available
    Physloc:U78C9.001.WZS06RN-P1-C12-T4-S16
    Client device name:ent1
    Client device physloc:U8286.41A.21FFFFF-V6-C3
    
  • You can get more details (QoS, vlan tagging, port states) by using the vnicstat command:
  • # vnicstat -b vnicserver0
    [..]
    --------------------------------------------------------------------------------
    VNIC Server Statistics: vnicserver0
    --------------------------------------------------------------------------------
    Device Statistics:
    ------------------
    State: active
    Backing Device Name: ent4
    
    Client Partition ID: 6
    Client Partition Name: 72nim1
    Client Operating System: AIX
    Client Device Name: ent1
    Client Device Location Code: U8286.41A.21FFFFF-V6-C3
    [..]
    Device ID: df1028e214103c04
    Version: 1
    Physical Port Link Status: Up
    Logical Port Link Status: Up
    Physical Port Speed: 1Gbps Full Duplex
    [..]
    Port VLAN (Priority:ID): 0:3331
    [..]
    VF Minimum Bandwidth: 2%
    VF Maximum Bandwidth: 100%
    
  • On the client side you can list your vNIC adapters and, as always, get details using the entstat command:
  • # lsdev -c adapter -s vdevice -t IBM,vnic
    ent0 Available  Virtual NIC Client Adapter (vnic)
    ent1 Available  Virtual NIC Client Adapter (vnic)
    ent3 Available  Virtual NIC Client Adapter (vnic)
    ent4 Available  Virtual NIC Client Adapter (vnic)
    # entstat -d ent0 | more
    [..]
    ETHERNET STATISTICS (ent0) :
    Device Type: Virtual NIC Client Adapter (vnic)
    [..]
    Virtual NIC Client Adapter (vnic) Specific Statistics:
    ------------------------------------------------------
    Current Link State: Up
    Logical Port State: Up
    Physical Port State: Up
    
    Speed Running:  1 Gbps Full Duplex
    
    Jumbo Frames: Disabled
    [..]
    Port VLAN ID Status: Enabled
            Port VLAN ID: 3331
            Port VLAN Priority: 0
    

Redundancy

You will certainly agree that having such a cool new feature without full redundancy would be a shame. Fortunately we have a solution here, with the return in great fanfare of the Network Interface Backup (NIB). As I told you before, each time a vNIC is created a vnicserver is created on one of the Virtual I/O Servers (at vNIC creation you choose on which Virtual I/O Server it will be created). So to be fully redundant and have failover, the only way is to create two vNIC adapters (one using the first Virtual I/O Server, the second one using the second Virtual I/O Server) and then create a Network Interface Backup on top of them, like in the old times :-) . Here are a couple of things and best practices to know before doing this.

  • You can’t use two VFs coming from the same physical port of the same SRIOV adapter (the NIB creation will be ok, but any configuration on top of this NIB will fail).
  • You can use two VFs coming from the same SRIOV adapter but from two different physical ports (this is the example I will show below).
  • The best practice is to use two VFs coming from two different SRIOV adapters (you can then afford to lose one of the two SRIOV adapters).

vNIC_nib

Verify on your partition that you have two vNIC adapters and check that their status is ok using the entstat command:

  • Both vNICs are available on the client partition:
  • # lsdev -c adapter -s vdevice -t IBM,vnic
    ent0 Available  Virtual NIC Client Adapter (vnic)
    ent1 Available  Virtual NIC Client Adapter (vnic)
    # lsdev -c adapter -s vdevice -t IBM,vnic -F physloc
    U8286.41A.21FFFFF-V6-C2
    U8286.41A.21FFFFF-V6-C3
    
  • You can check on the first Virtual I/O Server that “Current Link State”, “Logical Port State” and “Physical Port State” are ok (all of them need to be up):
  • # entstat -d ent0 | grep -p vnic
    -------------------------------------------------------------
    ETHERNET STATISTICS (ent0) :
    Device Type: Virtual NIC Client Adapter (vnic)
    Hardware Address: ee:3b:86:f6:45:02
    Elapsed Time: 0 days 0 hours 0 minutes 0 seconds
    
    Virtual NIC Client Adapter (vnic) Specific Statistics:
    ------------------------------------------------------
    Current Link State: Up
    Logical Port State: Up
    Physical Port State: Up
    
  • Same on the second Virtual I/O Server:
  • # entstat -d ent1 | grep -p vnic
    -------------------------------------------------------------
    ETHERNET STATISTICS (ent1) :
    Device Type: Virtual NIC Client Adapter (vnic)
    Hardware Address: ee:3b:86:f6:45:03
    Elapsed Time: 0 days 0 hours 0 minutes 0 seconds
    
    Virtual NIC Client Adapter (vnic) Specific Statistics:
    ------------------------------------------------------
    Current Link State: Up
    Logical Port State: Up
    Physical Port State: Up
    

Verify on both Virtual I/O Servers that the two vNICs are backed by two different SRIOV adapters (for the purpose of this test I’m using two different ports on the same SRIOV adapter, but the procedure remains the same with two different adapters). You can see in the outputs below that on Virtual I/O Server 1 the vNIC is backed by the port at position 3 (T3) and that on Virtual I/O Server 2 the vNIC is backed by the port at position 4 (T4):

  • Once again use the lsmap command on the first Virtual I/O Server to check this (note that you can also verify the client name and the client device):
  • # lsmap -vadapter vnicserver0 -vnic
    Name          Physloc                            ClntID ClntName       ClntOS
    ------------- ---------------------------------- ------ -------------- -------
    vnicserver0   U8286.41A.21AFF8V-V1-C32897             6 72nim1         AIX
    
    Backing device:ent4
    Status:Available
    Physloc:U78C9.001.WZS06RN-P1-C12-T3-S13
    Client device name:ent0
    Client device physloc:U8286.41A.21AFF8V-V6-C2
    
  • Same thing on the second Virtual I/O Server:
  • # lsmap -vadapter vnicserver0 -vnic -fmt :
    vnicserver0:U8286.41A.21AFF8V-V2-C32897:6:72nim1:AIX:ent4:Available:U78C9.001.WZS06RN-P1-C12-T4-S14:ent1:U8286.41A.21AFF8V-V6-C3
    

Finally create the Network Interface Backup and put an IP on top of it:

# mkdev -c adapter -s pseudo -t ibm_ech -a adapter_names=ent0 -a backup_adapter=ent1
ent2 Available
# mktcpip -h 72nim1 -a 10.44.33.223 -i en2 -g 10.44.33.254 -m 255.255.255.0 -s
en2
72nim1
inet0 changed
en2 changed
inet0 changed
[..]
# echo "vnic" | kdb
+-------------------------------------------------+
|       pACS       | Device | Link |    State     |
|------------------+--------+------+--------------|
| F1000A0032880000 |  ent0  |  Up  |     Open     |
|------------------+--------+------+--------------|
| F1000A00329B0000 |  ent1  |  Up  |     Open     |
+-------------------------------------------------+

Let’s now try different things to check that the redundancy is working ok. First let’s shut down one of the Virtual I/O Servers and ping our machine from another host:

# ping 10.14.33.223
PING 10.14.33.223 (10.14.33.223) 56(84) bytes of data.
64 bytes from 10.14.33.223: icmp_seq=1 ttl=255 time=0.496 ms
64 bytes from 10.14.33.223: icmp_seq=2 ttl=255 time=0.528 ms
64 bytes from 10.14.33.223: icmp_seq=3 ttl=255 time=0.513 ms
[..]
64 bytes from 10.14.33.223: icmp_seq=40 ttl=255 time=0.542 ms
64 bytes from 10.14.33.223: icmp_seq=41 ttl=255 time=0.514 ms
64 bytes from 10.14.33.223: icmp_seq=47 ttl=255 time=0.550 ms
64 bytes from 10.14.33.223: icmp_seq=48 ttl=255 time=0.596 ms
[..]
--- 10.14.33.223 ping statistics ---
50 packets transmitted, 45 received, 10% packet loss, time 49052ms
rtt min/avg/max/mdev = 0.457/0.525/0.596/0.043 ms
# errpt | more
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
59224136   1120200815 P H ent2           ETHERCHANNEL FAILOVER
F655DA07   1120200815 I S ent0           VNIC Link Down
3DEA4C5F   1120200815 T S ent0           VNIC Error CRQ
81453EE1   1120200815 T S vscsi1         Underlying transport error
DE3B8540   1120200815 P H hdisk0         PATH HAS FAILED
# echo "vnic" | kdb
(0)> vnic
+-------------------------------------------------+
|       pACS       | Device | Link |    State     |
|------------------+--------+------+--------------|
| F1000A0032880000 |  ent0  | Down |   Unknown    |
|------------------+--------+------+--------------|
| F1000A00329B0000 |  ent1  |  Up  |     Open     |
+-------------------------------------------------+

Same test with the addition of an address to ping; this time I’m only losing 4 packets:

# ping 10.14.33.223
[..]
64 bytes from 10.14.33.223: icmp_seq=41 ttl=255 time=0.627 ms
64 bytes from 10.14.33.223: icmp_seq=42 ttl=255 time=0.548 ms
64 bytes from 10.14.33.223: icmp_seq=46 ttl=255 time=0.629 ms
64 bytes from 10.14.33.223: icmp_seq=47 ttl=255 time=0.492 ms
[..]
# errpt | more
59224136   1120203215 P H ent2           ETHERCHANNEL FAILOVER
F655DA07   1120203215 I S ent0           VNIC Link Down
3DEA4C5F   1120203215 T S ent0           VNIC Error CRQ

vNIC Live Partition Mobility

By default you can use Live Partition Mobility with SRIOV vNICs; it is super simple and fully supported by IBM. As always, I’ll show you how to do it using both the HMC GUI and the command line:

Using the GUI

First validate the mobility operation; this allows you to choose the destination SRIOV adapter/port on which your current vNICs will be mapped. You have to choose:

  • The adapter (if you have more than one SRIOV adapter).
  • The Physical port on which the vNIC will be mapped.
  • The Virtual I/O Server on which the vnicserver will be created.

New options are now available in the mobility validation panel:

lpmiov1

Modify each vNIC to match your destination SRIOV adapter and ports (choose the destination Virtual I/O Server here):

lpmiov2
lpmiov3

Then migrate:

lpmiov4

IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
A5E6DB96   1120205915 I S pmig           Client Partition Migration Completed
4FB9389C   1120205915 I S ent1           VNIC Link Up
F655DA07   1120205915 I S ent1           VNIC Link Down
11FDF493   1120205915 I H ent2           ETHERCHANNEL RECOVERY
4FB9389C   1120205915 I S ent1           VNIC Link Up
4FB9389C   1120205915 I S ent0           VNIC Link Up
[..]
59224136   1120205915 P H ent2           ETHERCHANNEL FAILOVER
B50A3F81   1120205915 P H ent2           TOTAL ETHERCHANNEL FAILURE
F655DA07   1120205915 I S ent1           VNIC Link Down
3DEA4C5F   1120205915 T S ent1           VNIC Error CRQ
F655DA07   1120205915 I S ent0           VNIC Link Down
3DEA4C5F   1120205915 T S ent0           VNIC Error CRQ
08917DC6   1120205915 I S pmig           Client Partition Migration Started

The ping test during the LPM shows only 9 pings lost, due to an EtherChannel failover (one of my ports was down on the destination server):

# ping 10.14.33.223
64 bytes from 10.14.33.223: icmp_seq=23 ttl=255 time=0.504 ms
64 bytes from 10.14.33.223: icmp_seq=31 ttl=255 time=0.607 ms

Using the command line

I’m moving the partition back using the HMC command line interface; check the man page for all the details. Here is the format of the vnic_mappings attribute: slot_num/ded/[vios_lpar_name]/[vios_lpar_id]/[adapter_id]/[physical_port_id]/[capacity]:

  • Validate:
  • # migrlpar -o v -m blade-8286-41A-21AFFFF -t  runner-8286-41A-21AEEEE  -p 72nim1 -i 'vnic_mappings="2/ded/vios1/1/1/2/2,3/ded/vios2/2/1/3/2"'
    
    Warnings:
    HSCLA291 The selected partition may have an open virtual terminal session.  The management console will force termination of the partition's open virtual terminal session when the migration has completed.
    
  • Migrate:
  • # migrlpar -o m -m blade-8286-41A-21AFFFF -t  runner-8286-41A-21AEEEE  -p 72nim1 -i 'vnic_mappings="2/ded/vios1/1/1/2/2,3/ded/vios2/2/1/3/2"'
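To avoid typos in these long strings, the mapping can be built from variables. The helper below is a hypothetical sketch of mine (not an HMC tool), covering dedicated (ded) vNICs only, with the field order quoted above:

```shell
# Hypothetical helper: builds one vnic_mappings entry of the form
# slot_num/ded/vios_lpar_name/vios_lpar_id/adapter_id/physical_port_id/capacity
build_vnic_mapping() {
  printf '%s/ded/%s/%s/%s/%s/%s' "$1" "$2" "$3" "$4" "$5" "$6"
}

m1=$(build_vnic_mapping 2 vios1 1 1 2 2)   # slot 2 -> vios1, adapter 1, port 2, 2%
m2=$(build_vnic_mapping 3 vios2 2 1 3 2)   # slot 3 -> vios2, adapter 1, port 3, 2%
echo "vnic_mappings=\"$m1,$m2\""
# prints: vnic_mappings="2/ded/vios1/1/1/2/2,3/ded/vios2/2/1/3/2"
```

The resulting string is exactly the value passed with -i 'vnic_mappings=…' in the migrlpar commands above.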
    

Port Labelling

One very annoying thing about using LPM with vNICs is that you have to redo the mapping of your vNICs every time you move. The default choices are never right: the GUI will always show you the first port of the first adapter, and you have to do the job yourself. It’s even worse with the command line, where the vnic_mappings attribute can give you some headaches :-) . Fortunately there is a feature called port labelling. You can put a label on each SRIOV physical port of all your machines. My advice is to tag the ports serving the same network and the same VLAN with the same label on all your machines. During the mobility operation, if labels match between the two machines, the adapter/port combination matching the label is automatically chosen and you have nothing to map on your own. Super useful. The outputs below show you how to label your SRIOV ports:

label1
label2

# chhwres -m s00ka9942077-8286-41A-21C9F5V -r sriov --rsubtype physport -o s -a "adapter_id=1,phys_port_id=3,phys_port_label=adapter1port3"
# chhwres -m s00ka9942077-8286-41A-21C9F5V -r sriov --rsubtype physport -o s -a "adapter_id=1,phys_port_id=2,phys_port_label=adapter1port2"
# lshwres -m s00ka9942077-8286-41A-21C9F5V -r sriov --rsubtype physport --level eth -F adapter_id,phys_port_label
1,adapter1port2
1,adapter1port3
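To keep labels consistent across machines, the labelling itself can be scripted from the HMC. The sketch below only echoes the chhwres commands (a dry run); the system name and label list are examples from this post, so adapt them to your environment:

```shell
# Dry-run sketch: print the labelling commands for one managed system.
# Each spec is adapter_id,phys_port_id,label (example values from above).
label_ports() {
  ms="$1"
  for spec in "1,2,adapter1port2" "1,3,adapter1port3"; do
    adapter=${spec%%,*}
    rest=${spec#*,}
    port=${rest%%,*}
    label=${rest#*,}
    # drop the leading 'echo' to really run this on the HMC
    echo chhwres -m "$ms" -r sriov --rsubtype physport -o s \
      -a "adapter_id=$adapter,phys_port_id=$port,phys_port_label=$label"
  done
}

label_ports s00ka9942077-8286-41A-21C9F5V
# on a real HMC you could loop over all systems:
#   for ms in $(lssyscfg -r sys -F name); do label_ports "$ms"; done
```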

At validation time, source and destination ports will automatically be matched:

labelautochoose

What about performance

One of the main reasons I’m looking at SRIOV vNIC adapters is performance. As our whole design is based on the fact that we need to be able to move all of our virtual machines from one host to another, we need a solution allowing both mobility and performance. If you have ever tried to run a TSM server in a virtualized environment you’ll probably understand what I mean about performance and virtualization: in the case of TSM you need a lot of network bandwidth. My current customer and my previous one tried to do that using Shared Ethernet Adapters, and of course this solution did not work, because a classic Virtual Ethernet Adapter is not able to provide enough bandwidth to a single Virtual I/O client. I’m not an expert on network performance, but the results below are pretty obvious and will show you the power of vNIC and SRIOV (I know some optimization can be done on the SEA side, but this is just a super simple test).

Methodology

I will compare here a classic Virtual Ethernet Adapter with a vNIC in the same configuration; both environments are identical, using the same machines, the same switches and so on:

  • Two machines are used for the test. In the vNIC case both use a single vNIC backed by a 10Gb adapter; in the Virtual Ethernet Adapter case both are backed by a SEA built on top of a 10Gb adapter.
  • The two machines are running on two different S814s.
  • Entitlement and memory are the same for source and destination machines.
  • In the vNIC case the capacity of the VF is set to 100% and the physical port of the SRIOV adapter is dedicated to the vNIC.
  • In the Virtual Ethernet Adapter case the SEA is dedicated to the test virtual machine.
  • In both cases an MTU of 1500 is used.
  • The tool used for the performance test is iperf (MTU 1500, window size 64K, and 10 TCP threads).
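For reference, the iperf invocations matching these settings could look like the sketch below (the hostname is a placeholder and iperf 2 flags are assumed; the commands are only echoed here):

```shell
SERVER=72nim1                                     # placeholder iperf server hostname
server_cmd="iperf -s -w 64k"                      # server side: 64K TCP window
client_cmd="iperf -c $SERVER -w 64k -P 10 -t 60"  # client side: 10 parallel TCP streams
echo "$server_cmd"
echo "$client_cmd"
```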

SEA test for reference only

  • iperf server:
  • seaserver1

  • iperf client:
  • seacli1

vNIC SRIOV test

We are here running the exact same test:

  • iperf server:
  • iperf_vnic_client2

  • iperf client:
  • iperf_vnic_client

By using a vNIC I get 300% of the bandwidth I get with a Virtual Ethernet Adapter, with no tuning at all (out-of-the-box configuration). Just awesome ;-) . Nothing more to add: it’s pretty obvious that using vNICs for performance will be a must.

Conclusion

Are SRIOV vNICs the end of SEAs? Maybe, but not yet! For cases requiring performance and QoS they will be very useful and quickly adopted (I’m pretty sure I will use them at my current customer to virtualize the TSM servers). But today, in my opinion, SRIOV lacks a real redundancy feature at the adapter level. What I want is a heartbeat communication between the two SRIOV adapters. Having such a feature would finally convince customers to move from SEA to SRIOV vNIC. I know nothing about the future, but I hope something like that will be available in the next few years. To sum up, SRIOV vNICs are powerful, easy to use, and simplify the configuration and management of your Power Servers. Please wait for the GA and try this new killer functionality. As always, I hope it helps.

IBM Technical University for PowerSystems 2015 – Cannes (both sessions files included)

I have been traveling the world since my first IBM Technical University for Power Systems in Dublin (4 years ago, as far as I remember). I had the chance to be in Budapest last year and in Cannes this year (a little bit less fun for a French guy than Dublin or Budapest), but in a different role: this year I had the opportunity to be a speaker for two sessions (and two repeats), thanks to the kindness of Alex Abderrazag (thank you for trusting me, Alex). My first plan was to go to Tokyo for the OpenStack summit to talk about PowerVC, but unfortunately I was not able to make it because of confidentiality issues with my current company (the goal was to be a customer reference for PowerVC). I didn’t realize that creating two sessions from scratch, on two topics which are pretty new, would be so hard. I thought it would take me a couple of hours for each one, but it took so many hours that I am now impressed by the people who do this as their daily job ;-) . Something that took even more time than creating the slides was the preparation of the two sessions (speaker notes, practicing (special thanks here to the people who helped me rehearse, especially the fantastic Bill Miller ;-) ), and so on). One last thing I didn’t realize is that you have to manage your stress. As it was my first time at such a big event, I can assure you that I was super stressed. One funny thing is that the stress disappeared one hour before the session; before that moment I had to find ways to deal with it … and I realized that I wasn’t stressed because of the sessions themselves but because I had to speak English in front of so many people (a super tricky thing to do for a shy French guy, trust me!). My first sessions (on both topics) were full (no more chairs available in the room) and the repeats were ok too, so I think it went well and I was not so bad at it ;-) .

IMG_20151104_233030

I want to thank here all the people who helped me do this. Philippe Hermes (best pre-sales in France ;-) ) for believing in me and helping me (re-reading my PowerPoint, and taking care of me during the event). Alex Abderrazag for allowing me to do it. Nigel Griffiths for re-reading the PowerVC session and giving me a couple of tips and tricks about being a speaker. Bill Miller and Alain Lechevalier for the rehearsal of both sessions, and finally Rosa Davidson (she gave me the desire to do this). I’m not forgetting Jay Kruemcke, who gave me some IBM shirts for these sessions (and also for a lot of other things). Sorry to those I may have forgotten.

Many people asked me to share my PowerPoint files; you will find both files below in this post. Here are the two presentations:

  • PowerVC for PowerVM deep dive – Tips & Tricks.
  • Using Chef Automation on AIX.

PowerVC for PowerVM deep dive – Tips & Tricks

This session is for advanced PowerVC users. You’ll find a lot of tips and tricks allowing you to customize your PowerVC. Beyond the tips and tricks, you’ll also find in this session how PowerVC works (images, activation, cloud-init, and so on). If you are not a PowerVC user this session may be a little bit difficult for you, but these tips and tricks are the lessons I learned from the field, using PowerVC in a production environment:

Using Chef Automation on AIX

This session will give you all the basics to understand what Chef is and what you can do with this tool. You’ll also find examples of how to update a service pack or a technology level on AIX using Chef. Good examples of using Chef for post-installation tasks, and of using it with PowerVC, are also provided in this session.

Conclusion

I hope you enjoyed the sessions if you were in Cannes this year. On my side I really enjoyed doing this; it was a very good experience for me, and I hope I’ll have the opportunity to do it again. Feel free to tell me if you want to see me at future technical events like these. The next step is now to do something at Edge … not so sure this dream will come true any time soon ;-) .

Updating AIX TL and SP using Chef

Creating something to automate the update of a service pack or a technology level has always been a dream that never came true. You can trust me, almost every customer I know has tried to make that dream come true. Different customers, same story everywhere: they tried to do something and then tripped up in a miserable way. A fact that is always true in those stories is that the decision is always taken by someone who does not understand that AIX cannot be managed like a workstation or any other OS (who said Windows?). A good example of that is an IBM tool (and you know that I’m an IBM fan) called BigFix/TEM (Tivoli Endpoint Manager): I’m not an expert on TEM (so maybe I am wrong), but you can use it to update your Windows OS, your Linux, your AIX and even your iPhones or Android devices. LET ME LAUGH! How can anyone think that updating an iPhone is the same as updating an AIX? A good joke! (To be clear, I always have and always will support IBM, but my role is also to say what I think.) Another good example is the use of IBM Systems Director (unfortunately … or fortunately, this one was withdrawn a few days ago). I tried it myself a few years ago (you can check this post). Systems Director was, in my humble opinion, the least worst solution for updating an AIX or a Virtual I/O Server in an automated way. So how are we going to do this in a world that keeps asking us to do more with fewer people? I had to find a solution a few months ago to update more than 700 hosts from AIX 6.1 to AIX 7.1; the job was to create something that anybody could launch without knowing anything about AIX (one more time, who can even think this is possible?).
I tried writing scripts to automate nimadm and I’m pretty happy with that solution (almost 80% of the migrations were ok without any errors, but there were tons of prerequisites before launching the scripts and we faced some inevitable problems (nimsh errors, sendmail configuration, broken filesets) forcing the AIX L3 team to fix tons of migrations). As everybody knows, I have now been working on Chef for a few months, and it may be the answer to what our world is asking for today: replacing hundreds of people with a single man launching a magical thing that can do everything, without knowing anything about anything, and save money! This is obviously ironic, but unfortunately it is the reality of what happens today in France. “Money” and “resources” rule everything, without any plan for the future (to be clear, I’m talking in generalities here; nothing reflects what’s going on in my place). It is what it is, and as a good soldier I’m going to give you solutions to face the reality of this harsh world. But now it’s action time! I don’t want to be too pessimistic, but this is unfortunately what is happening today, and my anger about it only reflects the fact that I’m living in fear: the fear of becoming bad at my job, or the fear of doing a job I really don’t like. I think I have to find a solution to this problem. The picture below is clear enough to give you a good example of what I’m trying to do with Chef.

CF8j9_dWgAAOuyC

How do we update machines

I’m not here to teach you how to update a service pack or a technology level (I’m sure everybody knows that), but to automate it we need to talk about the method and identify each step needed to perform an update. As there is always one more way to do it, I have identified three ways to update a machine: the multibos way, the nimclient way, and finally the alt_disk_copy way. To update using Chef we obviously need an available provider for each method (you could do this with the execute resource, but we’re here to have fun and learn new things). So we need one provider capable of managing multibos, one capable of managing nimclient, and one capable of managing alt_disk_copy. All three providers are available now and can be used to write different recipes doing what is necessary to update a machine. Obviously there are pre-update and post-update steps needed (removing efixes, checking filesets). Let’s identify the required steps first:

  • Verify with lppchk the consistency of all installed packages.
  • Remove any installed efixes (using the emgr provider).
  • The multibos way:
    • You don’t need to create a backup of the rootvg with the multibos way.
    • Mount the SP or TL directory from the NIM server (using the Chef mount resource).
    • Create the multibos instance and update it using the remote mounted directory (using the multibos resource).
  • The nimclient way:
    • Create a backup of your rootvg (using the altdisk resource).
    • Use nimclient to run a cust operation (using the niminit and nimclient resources).
  • The alt_disk_copy way:
    • You don’t need to create a backup of the rootvg with the alt_disk_copy way.
    • Mount the SP or TL directory from the NIM server (using the Chef mount resource).
    • Create the altinst_rootvg volume group and update it using the remote mounted directory (using the altdisk provider).
  • Reboot the machine.
  • Remove any unwanted bos or old_rootvg.

Reminder where to download the AIX Chef cookbook:

Before trying to chain all these steps together, let’s use the resources one by one to understand what each one does.

Fixes installation

This one is simple and allows you to install or remove fixes on your AIX machines. In the examples below we are going to see how to do that with two Chef recipes: one for installing and the other one for removing! Super easy.

Installing fixes

In the recipe, provide all the fix names in an array and specify the directory in which the filesets are located (this can be an NFS mount point if you want). Please note that I’m using the cookbook_file resource to download the fixes; this resource allows you to download a file directly from the cookbook (so from the Chef server). Imagine using this single recipe to install a fix on all your machines. Quite easy ;-)

directory "/var/tmp/fixes" do
  action :create
end

cookbook_file "/var/tmp/fixes/IV75031s5a.150716.71TL03SP05.epkg.Z" do
  source 'IV75031s5a.150716.71TL03SP05.epkg.Z'
  action :create
end

cookbook_file "/var/tmp/fixes/IV77596s5a.150930.71TL03SP05.epkg.Z" do
  source 'IV77596s5a.150930.71TL03SP05.epkg.Z'
  action :create
end

aix_fixes "installing fixes" do
  fixes ["IV75031s5a.150716.71TL03SP05.epkg.Z", "IV77596s5a.150930.71TL03SP05.epkg.Z"]
  directory "/var/tmp/fixes"
  action :install
end

directory "/var/tmp/fixes" do
  recursive true
  action :delete
end

emgr1

Removing fixes

The recipe is almost the same, but with the remove action instead of the install action. Please note that you can specify which fixes to remove, or use the keyword all to remove all the installed fixes (in the recipe updating our servers we will use “all”, as we want to remove every fix before launching the update).

aix_fixes "remove fixes IV75031s5a and IV77596s5a" do
  fixes ["IV75031s5a", "IV77596s5a"]
  action :remove
end

aix_fixes "remove all fixes" do
  fixes ["all"]
  action :remove
end

emgr2

Alternate disks

In most AIX shops I have seen, the solution to back up your system before doing anything is to create an alternate disk using the alt_disk_copy command. Sometimes, in places where sysadmins love their job, this disk is updated on the fly to perform a TL or SP upgrade. The altdisk resource I’ve coded for Chef takes care of this. I’ll not detail every available action with examples, and will focus on create and customize:

  • create: This action creates an alternate disk; we will detail its attributes in the next section.
  • cleanup: Cleans up the alternate disk (removes it).
  • rename: Renames the alternate disk.
  • sleep: Puts the alternate disk to sleep (unmounts every /alt_inst/* filesystem and varies off the volume group).
  • wakeup: Wakes up the alternate disk (varies on the volume group and mounts every filesystem).
  • customize: Runs a cust operation (the current resource is coded to update the alternate disk with all the filesets present in a directory).

Creation

The alternate disk create action creates an alternate disk and helps you find an available disk for it. In any case only free disks will be chosen (disks with no PVID and no volume group defined). Different types are available to choose the disk on which the alternate disk will be created:

  • size: If type is :size, a disk with exactly the size given by the value attribute will be used.
  • name: If type is :name, the disk whose name matches the value attribute will be used.
  • auto: In auto mode the available values for value are bigger and equals. With bigger, the first free disk found with a size bigger than the current rootvg is used; with equals, the first free disk found with a size equal to the current rootvg is used.

aix_altdisk "cloning rootvg by name" do
  type :name
  value "hdisk3"
  action :create
end

aix_altdisk "cloning rootvg by size 66560" do
  type :size
  value "66560"
  action :create
end

aix_altdisk "removing old alternates" do
  action :cleanup
end

aix_altdisk "cloning rootvg" do
  type :auto
  value "bigger"
  action :create
end
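The selection rules above can be sketched in plain Ruby; pick_disk and the sample disk list are illustrative only, not the cookbook’s actual code:

```ruby
# A disk is considered free when it has no PVID and no volume group.
def pick_disk(disks, rootvg_size, type, value)
  free = disks.select { |d| d[:pvid].nil? && d[:vg].nil? }
  case type
  when :name then free.find { |d| d[:name] == value }
  when :size then free.find { |d| d[:size] == value.to_i }
  when :auto
    if value == 'bigger'
      free.find { |d| d[:size] > rootvg_size }
    else # 'equals'
      free.find { |d| d[:size] == rootvg_size }
    end
  end
end

disks = [
  { name: 'hdisk1', size: 66_560,  pvid: 'a0b1', vg: 'rootvg' }, # in use: skipped
  { name: 'hdisk2', size: 66_560,  pvid: nil,    vg: nil },
  { name: 'hdisk3', size: 102_400, pvid: nil,    vg: nil },
]

pick_disk(disks, 66_560, :auto, 'bigger')[:name] # => "hdisk3"
```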

altdisk1

Customization

The customize action will update the previously created alternate disk with the filesets present in an NFS-mounted directory (from the NIM server). Please note in the recipe below that we are mounting the directory over NFS. node[:nim_server] is a node attribute telling which NIM server will be mounted; for instance you can define one NIM server for the production environment and another one for the development environment.

# mounting /mnt
mount "/mnt" do
  device "#{node[:nim_server]}:/export/nim/lpp_source"
  fstype 'nfs'
  action :mount
end

# updating the current disk
aix_altdisk "altdisk_update" do
  image_location "/mnt/7100-03-05-1524"
  action :customize
end

mount "/mnt" do
  action :umount
end

altdisk_cust

niminit/nimclient

The niminit and nimclient resources are used to register the NIM client to the NIM master and then run nimclient operations from the client. In my humble opinion this is the best way to do the update at the time of writing this blog post. One cool thing is that you can specify on which adapter the nimclient will be configured by using some ohai attributes. It’s an elegant way to do it; one more time, this shows you the power of Chef ;-) . Let’s start with some examples:

niminit

aix_niminit node[:hostname] do
  action :remove
end

aix_niminit node[:hostname] do 
  master "nimcloud"
  connect "nimsh"
  pif_name node[:network][:default_interface]
  action :setup
end

nimclient1

nimclient

nimclient can first be used to install some filesets you may need. The provider is intelligent and can choose the right lpp_source for you. Please note that you will need lpp_sources with a specific naming convention if you want to use this feature. To find the next/latest available SP/TL, the provider checks the current oslevel of the machine and compares it with the lpp_sources available on your NIM server. The naming convention needed is $(oslevel -s)-lpp_source (i.e. 7100-03-05-1524-lpp_source) (the same principle applies to the spot when you need to use one).

$ lsnim -t lpp_source | grep 7100
7100-03-00-0000-lpp_source             resources       lpp_source
7100-03-01-1341-lpp_source             resources       lpp_source
7100-03-02-1412-lpp_source             resources       lpp_source
7100-03-03-1415-lpp_source             resources       lpp_source
7100-03-04-1441-lpp_source             resources       lpp_source
7100-03-05-1524-lpp_source             resources       lpp_source

If your NIM resource names follow this convention, the lpp_source attribute can be:

  • latest_sp: the latest available service pack.
  • next_sp: the next available service pack.
  • latest_tl: the latest available technology level.
  • next_tl: the next available technology level.
  • If you do not want to use this you can still specify the name of the lpp_source by hand.
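The naming-convention lookup can be sketched as below; resolve_lpp_source is a hypothetical illustration of the idea (only the service-pack cases are shown), not the provider’s real code:

```ruby
# Given the machine's `oslevel -s` output and the lpp_source names on the
# NIM server, pick the one matching the requested keyword.
def resolve_lpp_source(current_oslevel, lpp_sources, choice)
  levels = lpp_sources.map { |l| l.sub(/-lpp_source\z/, '') }.sort
  case choice
  when 'latest_sp' then "#{levels.last}-lpp_source"
  when 'next_sp'
    nxt = levels.find { |l| l > current_oslevel }
    nxt && "#{nxt}-lpp_source"
  else
    choice # an explicit lpp_source name, used as-is
  end
end

sources = %w(7100-03-04-1441-lpp_source 7100-03-05-1524-lpp_source)
resolve_lpp_source('7100-03-04-1441', sources, 'next_sp') # => "7100-03-05-1524-lpp_source"
```

This works because the 7100-03-05-1524 style levels sort correctly as plain strings.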

Here are a few examples installing packages:

aix_nimclient "installing filesets" do
  installp_flags "aXYg"
  lpp_source "7100-03-04-1441-lpp_source"
  filesets ["openssh.base.client","openssh.base.server","openssh.license"]
  action :cust
end

aix_nimclient "installing filesets" do
  installp_flags "aXYg"
  lpp_source "7100-03-04-1441-lpp_source"
  filesets ["bos.compat.cmds", "bos.compat.libs"]
  action :cust
end

aix_nimclient "installing filesets" do
  installp_flags "aXYg"
  lpp_source "7100-03-04-1441-lpp_source"
  filesets ["Java6_64.samples"]
  action :cust
end


Please note that some filesets were already installed, so the resource did not converge for them ;-) . Let's now try to update to the latest service pack:

aix_nimclient "updating to latest sp" do
  installp_flags "aXYg"
  lpp_source "latest_sp"
  fixes "update_all"
  action :cust
end


Ta-da! The machine was updated from 7100-03-04-1441 to 7100-03-05-1524 using a single recipe and without knowing beforehand which service pack was available!

multibos

I really like the multibos way and I don't know why so few people are using it today. Anyway, I know some customers who work only this way, so I thought it was worth working on a multibos resource. Here is a nice recipe creating a standby BOS and updating it.

# creating dir for mount
directory "/var/tmp/mnt" do
  action :create
end

# mounting /mnt
mount "/var/tmp/mnt" do
  device "#{node[:nim_server]}:/export/nim/lpp_source"
  fstype 'nfs'
  action :mount
end

# removing standby multibos
aix_multibos "removing standby bos" do
  action :remove
end

# create a standby bos and update it
aix_multibos "creating bos" do
  action :create
end

aix_multibos "update bos" do
  update_device "/var/tmp/mnt/7100-03-05-1524"
  action :update
end

# unmount /mnt
mount "/var/tmp/mnt" do
  action :umount
end

# deleting temp directory
directory "/var/tmp/mnt" do
  action :delete
end


Full recipes for updates

Let's now write a big recipe doing all the things we need for an update. Remember that if one resource fails, the recipe stops by itself. For instance, you'll see in the recipe below that I'm running an "lppchk -vm3". If it returns something other than 0, the resource fails and the recipe fails. This is obviously the expected behavior; it seems sensible not to continue if there is a problem. So to sum up, here are all the steps this recipe performs: check fileset consistency, remove all efixes, commit filesets, create an alternate disk, configure the nimclient, run the update, deallocate resources.

# if lppchk -vm return code is different
# than zero recipe will fail
# no guard needed here
execute "lppchk" do
  command 'lppchk -vm3'
end

# removing any efixes
aix_fixes "removing_efixes" do
  fixes ["all"]
  action :remove
end

# committing filesets
# no guard needed here
execute 'commit' do
  command 'installp -c all'
end

# cleaning existing altdisks
aix_altdisk "cleanup alternate rootvg" do
  action :cleanup
end

# creating an alternate disk using the
# first disk bigger than the actual rootvg
# bootlist to false as this disk is just a backup copy
aix_altdisk "altdisk_by_auto" do
  type :auto
  value "bigger"
  change_bootlist true
  action :create
end

# nimclient configuration
aix_niminit node[:hostname] do
  master "nimcloud"
  connect "nimsh"
  pif_name "en1"
  action :setup
end

# update to latest available tl/sp
aix_nimclient "updating to latest sp" do
  installp_flags "aXYg"
  lpp_source "latest_sp"
  fixes "update_all"
  action :cust
end

# deallocate resources
aix_nimclient "deallocating resources" do
  action :deallocate
end

How about a single point of management: "knife ssh" and "push jobs"?

Chef was designed on a pull model: the client asks the server for the recipes and cookbooks and then executes them. This is the role of the chef-client. In a Linux environment, people often run the client in daemonized mode: the client wakes up on a time-interval basis and runs, so every change to the cookbooks is applied automatically. I'm almost sure that every AIX shop will be against this method because it is dangerous. If you do run this way, apply the change first in a test environment, then in dev, and finally in production. To be honest this is not the model I want to build where I work; for some actions (like updates) we want a push model. Chef is delivered with a feature called push jobs, a way to run jobs like "execute the chef-client" from your knife workstation. Unfortunately push jobs needs a plugin on the chef-client, and this one is only available on Linux... not yet on AIX. Anyway we have an alternative: the knife ssh plugin. This plugin, included in knife by default, allows you to run commands on the nodes over ssh. Even better, if you already have an ssh gateway with key sharing enabled, knife ssh can use this gateway to communicate with the clients. Using knife ssh you have the possibility to say "run chef-client on all my AIX 6.1 nodes" or "run this recipe installing this fix on all my AIX 7.1 nodes"; the possibilities are endless. One last note about knife ssh: it creates tunnels through your ssh gateway to communicate with the nodes, so if you use a shared key you have to copy the private key onto the knife workstation (it took me some time to understand that). Here are some examples:
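To make this concrete, the invocations look something like the following sketch. The search query, user name and gateway host are placeholders for your own environment, not values taken from my setup:

```shell
# Check the current oslevel on all AIX nodes registered on the Chef server
# (chefuser and gateway are hypothetical names)
knife ssh "platform:aix" "oslevel -s" -x chefuser -G chefuser@gateway

# Run the chef-client (and therefore the update recipes) on the 7.1 nodes only
knife ssh "platform:aix AND platform_version:7*" "chef-client" \
  -x chefuser -G chefuser@gateway
```

The first argument is a Chef search query resolved against the server, so any node attribute collected by ohai can be used to target the run.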

  • Screenshot: checking the current oslevel on two nodes.
  • Screenshot: running the update with Chef.
  • Screenshot: the alternate disks have been created.
  • Screenshot: both systems are up to date.

Conclusion

I think this blog post helped you better understand Chef and what Chef is capable of. We are still at the very beginning of the Chef cookbook, and I'm sure plenty of new things (recipes, providers) will come in the next few months. Try it by yourself and I'm sure you'll like the way it works. I must admit that it is difficult to learn and to get started with, but if you do this right you'll get the benefit of an automation tool working on AIX... and honestly AIX needs an automation tool. I'm almost sure it will be Chef (in fact we have no other choice). Help us write postinstall recipes, update recipes and any other recipes you can think of. We need your help and it is happening now! You have the opportunity to be a part of this, a part of something new that will help AIX in the future. We don't want a dying OS; Chef will give AIX the opportunity to be an OS with a fully working automation tool. Go give it a try now!