Unleash the true potential of SRIOV vNIC using vNIC failover !

I'm always working on a tight schedule; I never have the time to write documentation because we're moving fast, very fast … but not as fast as I want to ;-). A few months ago we were asked to put the TSM servers in our PowerVC environment. I thought it was a very, very bad idea to put a pet among the cattle, as TSM servers are very specific and super I/O intensive in our environment (and are configured with plenty of rmt devices; it would have meant putting lan-free stuff into Openstack, which is not designed at all for this kind of thing). In my previous place we had tried to put the TSM servers behind a virtualized environment (serving the network through Shared Ethernet Adapters) and it was an EPIC FAIL: a few weeks after putting the servers in production we decided to move back to physical I/O and to use dedicated network adapters. As we didn't want to make the same mistake in my current place we decided not to go with Shared Ethernet Adapters. Instead we took the decision to use SRIOV vNICs. SRIOV vNICs have the advantage of being fully virtualized (LPM aware and super flexible), giving us the flexibility we wanted (moving TSM servers between sites if we feel the need to put a host in maintenance mode or if we are facing any kind of outage). In my previous blog post about vNICs I was very happy with the performance but not with the reliability. I didn't want to go with NIB adapters for network redundancy, because it is an anti-virtualization way of doing things (we do not want to manage anything inside the VM, we want to let the virtualization environment do the job for us). Lucky for me the project was rescheduled to the end of the year and we finally took the decision not to put the TSM servers into our big Openstack, dedicating some hosts to the backup stuff instead. The latest versions of PowerVM, HMC and firmware arrived just in time to let me use the new SRIOV vNIC failover feature for this TSM environment (fortunately for me we had some data center issues allowing me to wait long enough not to go with NIB and to start the production directly with SRIOV vNIC \o/). I delivered the first four servers to my backup team yesterday and I must admit that SRIOV vNIC failover is a killer feature for this kind of thing. Let's now see how to set this up !

Prerequisites

As always, using the latest features means you need to have everything up to date. In this case the minimal requirements for SRIOV vNIC failover are Virtual I/O Server 2.2.5.10, Hardware Management Console V8R8.6.0 with the latest patches, and an up to date firmware (ie. fw 860). Note that not all AIX versions are ok with SRIOV vNIC; here I'm only using AIX 7.2 TL1 SP1:

  • Check the Virtual I/O Servers are installed in 2.2.5.10:
  • # ioslevel
    2.2.5.10
    
  • Check the HMC is at the latest version (V8R8.6.0):
  • hscroot@myhmc:~> lshmc -V
    "version= Version: 8
     Release: 8.6.0
     Service Pack: 0
    HMC Build level 20161101.1
    MH01655: Required fix for HMC V8R8.6.0 (11-01-2016)
    ","base_version=V8R8.6.0
    "
    

    860

  • Check the firmware version is ok on the PowerSystem:
  • # updlic -o u -t sys -l latest -m reptilian-9119-MME-659707C -r mountpoint -d /home/hscroot/860_056/ -v
# lslic -m reptilian-9119-MME-65BA46F -F activated_level,activated_spname
    56,FW860.10
    

    fw

What's SRIOV vNIC failover and how does it work ?

I'll not explain here what an SRIOV vNIC is; if you want to know more about it just check my previous blog post on this topic, A first look at SRIOV vNIC adapters. What failover adds is the ability to configure as many backing devices as you want for a vNIC adapter (the maximum is 6 backing devices). For each backing device you can choose on which Virtual I/O Server the corresponding vnicserver will be created, and set a failover priority determining which backing device is active. Keep in mind that priorities work the exact same way as they do with the Shared Ethernet Adapter: priority 10 is a higher priority than priority 20.

vnicvisio1

In the example shown in the images above and below the vNIC is configured with two backing devices (on two different SRIOV adapters) with priorities 10 and 20. As long as there is no outage (for instance on the Virtual I/O Server or on the adapter itself) the physical port utilized will be the one with priority 10. If that adapter has, for instance, a hardware issue we have the possibility to manually fall back to the second backing device, or to let the hypervisor do this for us by picking the backing device with the next highest priority. Easy. This allows us to have redundant, LPM-aware, high-performance adapters, fully virtualized. A MUST :-) !
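
You can quickly check which backing device is currently active from the HMC command line (detailed later in this post). My assumption, reading the outputs shown further down, is that the 0/1 flag inside the backing_device_states field marks the active backing device:

    # lizard and reptilian-9119-MME-65BA46F are the partition and managed system used later in this post
    hscroot@myhmc:~> lshwres -r virtualio -m reptilian-9119-MME-65BA46F --rsubtype vnic --level lpar --filter "lpar_names=lizard" -F backing_device_states
    "sriov/2700c003/0/Operational,sriov/27004003/1/Operational"
    # here the device flagged "1" (logical port 27004003) is the one serving the traffic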

vnicvisio2

Creating a SRIOV vNIC failover using the HMC GUI and administering it

To create or delete an SRIOV vNIC failover adapter (I'll just call it a vNIC for the rest of the blog post) the machine must be shut down or active (it is not possible to add a vNIC when a machine is booted in OpenFirmware). The only way to do this using the HMC GUI is to use the enhanced interface (no problem, as we will have no other choice in the near future). Select the machine on which you want to create the adapter and click on the "Virtual NICs" tab.

vnic1b

Click “Add Virtual NIC”:

vnic1c

Choose the "Physical Port Location Code" (the physical port of the SRIOV adapter) on which you want to create the vNIC. You can add from one to six backing adapters (by clicking the "Add Entry" button). Only one backing device will be active at a time; if this one is failing (adapter issue, network issue) the vNIC will fail over to the next backing adapter depending on the "Failover priority". Be careful to spread the backing devices across the hosting Virtual I/O Servers, to be sure that having a Virtual I/O Server down will be seamless for your partition:

vnic1d

In the example above:

  • I’m creating a vNIC failover with “vNIC Auto Priority Failover” enabled.
  • Four VFs will be created: two on the VIOS ending with 88, two on the VIOS ending with 89.
  • Obviously four vnicservers will be created on the VIOS (two on each).
  • The lowest priority number takes the lead. This means that if the first one with priority 10 is failing, the active adapter will be the second one. Then if the second one with priority 20 is failing, the third one will be active, and so on. Keep in mind that as long as your highest priority device is ok, nothing will happen if one of the other backup adapters is failing. Be smart when choosing the priorities (see the small sketch after this list). As Yoda says "Wise you must be!".
  • The physical ports are located on different CECs.
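
Here is the priority logic in a nutshell, as I understand it, with "vNIC Auto Priority Failover" enabled (the failback part matches what I observed during the tests later in this post):

    # lowest operational priority number wins:
    #   all four operational : 10 20 30 40 -> active is 10
    #   10 fails             : 20 30 40    -> active becomes 20
    #   10 comes back        : 10 20 30 40 -> the hypervisor fails back to 10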

vnic1e

The "Advanced Virtual NIC Settings" apply to the whole vNIC, whatever the number of backing devices (four in the example above). For instance I'm using vlan tagging on these ports, so I just need to set the "Port VLAN ID" one time.

vnic1f

You can choose whether or not to allow the hypervisor to perform the failover/failback automatically depending on the priorities you have set. If you click "enable" the hypervisor will automatically fail over to the next operational backing device depending on the priorities. If it is disabled, only a user can trigger a failover operation.
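
The same setting can be toggled from the HMC command line (the exact command is reused later in this post; lizard and slot 6 are the partition and vNIC slot of my examples):

    # chhwres -m reptilian-9119-MME-65BA46F -r virtualio -o s --rsubtype vnic -p lizard -s 6 -a "auto_priority_failover=1"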

vnic1g

Be careful: the priorities are designed the same way as on the Shared Ethernet Adapter. The lowest number in the failover priority field is the highest failover priority. On the image below you can notice that priority 10, which is the highest failover priority, is active (it is the lowest number among 10, 20, 30 and 40).

vnic1h

After the creation of the vNIC you can check different things on the Virtual I/O Servers. You will notice that every entry added at vNIC creation time has a corresponding VF (Virtual Function) and a corresponding vnicserver (each vnicserver has a VF mapped to it):

  • You can see that for each entry added when creating a vNIC you’ll have the corresponding VF device present on the Virtual I/O Servers:
  • vios1# lsdev -type adapter -field name physloc description | grep "VF"
    [..]
    ent3             U78CA.001.CSS08ZN-P1-C3-C1-T2-S5                                  PCIe3 4-Port 10GbE SR Adapter VF(df1028e21410e304)
    ent4             U78CA.001.CSS08EL-P1-C3-C1-T2-S6                                  PCIe3 4-Port 10GbE SR Adapter VF(df1028e21410e304)
    
    vios2# lsdev -type adapter -field name physloc description | grep "VF"
    [..]
    ent3             U78CA.001.CSS08ZN-P1-C4-C1-T2-S2                                  PCIe3 4-Port 10GbE SR Adapter VF(df1028e21410e304)
    ent4             U78CA.001.CSS08EL-P1-C4-C1-T2-S2                                  PCIe3 4-Port 10GbE SR Adapter VF(df1028e21410e304)
    
  • For each VF you’ll see the corresponding vnicserver devices:
  • vios1# lsdev -type adapter -virtual | grep vnicserver
    [..]
    vnicserver1      Available   Virtual NIC Server Device (vnicserver)
    vnicserver2      Available   Virtual NIC Server Device (vnicserver)
    
    vios2# lsdev -type adapter -virtual | grep vnicserver
    [..]
    vnicserver1      Available   Virtual NIC Server Device (vnicserver)
    vnicserver2      Available   Virtual NIC Server Device (vnicserver)
    
  • You can check the corresponding mapped VF for each vnicserver using the 'lsmap' command. One funny thing to notice: as long as the backing device has never been made active (using the "Make the Backing Device Active" button in the GUI) the corresponding client name and client device will not be shown:
  • vios1# lsmap -all -vnic -fmt :
    [..]
    vnicserver1:U9119.MME.659707C-V2-C32898:6:lizard:AIX:ent3:Available:U78CA.001.CSS08ZN-P1-C3-C1-T2-S5:ent0:U9119.MME.659707C-V6-C6
    vnicserver2:U9119.MME.659707C-V2-C32899:6:N/A:N/A:ent4:Available:U78CA.001.CSS08EL-P1-C3-C1-T2-S6:N/A:U9119.MME.659707C-V6-C6
    
    vios2# lsmap -all -vnic
    [..]
    Name          Physloc                            ClntID ClntName       ClntOS
    ------------- ---------------------------------- ------ -------------- -------
    vnicserver1   U9119.MME.659707C-V1-C32898             6 N/A            N/A
    
    Backing device:ent3
    Status:Available
    Physloc:U78CA.001.CSS08ZN-P1-C4-C1-T2-S2
    Client device name:ent0
    Client device physloc:U9119.MME.659707C-V6-C6
    
    Name          Physloc                            ClntID ClntName       ClntOS
    ------------- ---------------------------------- ------ -------------- -------
    vnicserver2   U9119.MME.659707C-V1-C32899             6 N/A            N/A
    
    Backing device:ent4
    Status:Available
    Physloc:U78CA.001.CSS08EL-P1-C4-C1-T2-S2
    Client device name:N/A
    Client device physloc:U9119.MME.659707C-V6-C6
    
  • You can activate a backing device yourself just by clicking the "Make Backing Device Active" button in the GUI, and then check that the vnicserver is now logged in:
  • vnic1i
    vnic1j

    vios2# lsmap -all -vnic -fmt :
    [..]
    vnicserver1:U9119.MME.659707C-V1-C32898:6:lizard:AIX:ent3:Available:U78CA.001.CSS08ZN-P1-C4-C1-T2-S2:ent0:U9119.MME.659707C-V6-C6
    vnicserver2:U9119.MME.659707C-V1-C32899:6:N/A:N/A:ent4:Available:U78CA.001.CSS08EL-P1-C4-C1-T2-S2:N/A:U9119.MME.659707C-V6-C6
    
  • I noticed something that seemed pretty strange to me: when you perform a manual failover of the vNIC, the auto-priority failover will be set to disabled. Remember to re-enable it after the manual operation is performed:
  • vnic1k

    You can also check the status and the priority of the vNIC on the Virtual I/O Server using the vnicstat command. Some good information is shown by the command: the state of the device, whether it is active or not (I noticed two different states during my tests, "active" (meaning this is the vf/vnicserver you are using) and "config_2" (meaning the adapter is ready and available for a failover operation); there is probably another state when the link is down but I didn't have the time to ask my network team to shut a port to verify this), and finally the failover priority. The vnicstat command is a root command.

    vios1#  vnicstat vnicserver1
    
    --------------------------------------------------------------------------------
    VNIC Server Statistics: vnicserver1
    --------------------------------------------------------------------------------
    Device Statistics:
    ------------------
    State: active
    Backing Device Name: ent3
    
    Failover State: active
    Failover Readiness: operational
    Failover Priority: 10
    
    Client Partition ID: 6
    Client Partition Name: lizard
    Client Operating System: AIX
    Client Device Name: ent0
    Client Device Location Code: U9119.MME.659707C-V6-C6
    [..]
    
    vios2# vnicstat vnicserver1
    --------------------------------------------------------------------------------
    VNIC Server Statistics: vnicserver1
    --------------------------------------------------------------------------------
    Device Statistics:
    ------------------
    State: config_2
    Backing Device Name: ent3
    
    Failover State: inactive
    Failover Readiness: operational
    Failover Priority: 20
    [..]
    

    You can also check the vNIC server events in the errpt (client logins when a failover occurs, and so on …):

    # errpt | more
    8C577CB6   1202195216 I S vnicserver1    VNIC Transport Event
    60D73419   1202194816 I S vnicserver1    VNIC Client Login
    # errpt -aj 60D73419 | more
    ---------------------------------------------------------------------------
    LABEL:          VS_CLIENT_LOGIN
    IDENTIFIER:     60D73419
    
    Date/Time:       Fri Dec  2 19:48:06 2016
    Sequence Number: 10567
    Machine Id:      00C9707C4C00
    Node Id:         vios2
    Class:           S
    Type:            INFO
    WPAR:            Global
    Resource Name:   vnicserver1
    
    Description
    VNIC Client Login
    
    Probable Causes
    VNIC Client Login
    
    Failure Causes
    VNIC Client Login
    

    Same thing using the HMC command line.

    Now we will do the same thing on the command line. I warn you, the commands are pretty huge !!!!

    • List the SRIOV adapters (you will need the adapter ids to create the vNICs):
    • # lshwres -r sriov --rsubtype adapter -m reptilian-9119-MME-65BA46F
      adapter_id=3,slot_id=21010012,adapter_max_logical_ports=64,config_state=sriov,functional_state=1,logical_ports=64,phys_loc=U78CA.001.CSS08XH-P1-C3-C1,phys_ports=4,sriov_status=running,alternate_config=0
      adapter_id=4,slot_id=21010013,adapter_max_logical_ports=64,config_state=sriov,functional_state=1,logical_ports=64,phys_loc=U78CA.001.CSS08XH-P1-C4-C1,phys_ports=4,sriov_status=running,alternate_config=0
      adapter_id=1,slot_id=21010022,adapter_max_logical_ports=64,config_state=sriov,functional_state=1,logical_ports=64,phys_loc=U78CA.001.CSS08RG-P1-C3-C1,phys_ports=4,sriov_status=running,alternate_config=0
      adapter_id=2,slot_id=21010023,adapter_max_logical_ports=64,config_state=sriov,functional_state=1,logical_ports=64,phys_loc=U78CA.001.CSS08RG-P1-C4-C1,phys_ports=4,sriov_status=running,alternate_config=0
      
    • List the vNICs of virtual machine "lizard":
    • # lshwres -r virtualio -m reptilian-9119-MME-65BA46F --rsubtype vnic --level lpar --filter "lpar_names=lizard"
      lpar_name=lizard,lpar_id=6,slot_num=6,desired_mode=ded,curr_mode=ded,auto_priority_failover=0,port_vlan_id=0,pvid_priority=0,allowed_vlan_ids=all,mac_addr=6ac53577b106,allowed_os_mac_addrs=all,"backing_devices=sriov/vios1/1/3/0/2700c003/2.0/2.0/50,sriov/vios2/2/1/0/27004003/2.0/2.0/60","backing_device_states=sriov/2700c003/0/Operational,sriov/27004003/1/Operational"
      
    • Create a vNIC with 2 backing devices, the first one on Virtual I/O Server 1 on adapter 1 physical port 2 with a failover priority set to 10, the second one on Virtual I/O Server 2 on adapter 3 physical port 2 with a failover priority set to 20. This vNIC will take the next available slot, which will be 6 (WARNING: physical port numbering starts from 0; the backing_devices format is decoded in the sketch after this list):
    • # chhwres -r virtualio -m reptilian-9119-MME-65BA46F -o a -p lizard --rsubtype vnic -v -a 'port_vlan_id=3455,auto_priority_failover=1,backing_devices="sriov/vios1//1/1/2.0/10,sriov/vios2//3/1/2.0/20"'
      # lshwres -r virtualio -m reptilian-9119-MME-65BA46F --rsubtype vnic --level lpar --filter "lpar_names=lizard"
      lpar_name=lizard,lpar_id=6,slot_num=6,desired_mode=ded,curr_mode=ded,auto_priority_failover=1,port_vlan_id=3455,pvid_priority=0,allowed_vlan_ids=all,mac_addr=6ac53577b106,allowed_os_mac_addrs=all,"backing_devices=sriov/vios1/1/1/1/2700400b/2.0/2.0/10,sriov/vios2/2/3/1/2700c008/2.0/2.0/20","backing_device_states=sriov/2700400b/1/Operational,sriov/2700c008/0/Operational"
      
    • Add two backing devices (one on each VIOS, on adapters 2 and 4, both on physical port 2, with failover priorities set to 30 and 40) to the vNIC in slot 6:
    • # chhwres -r virtualio -m reptilian-9119-MME-65BA46F -o s --rsubtype vnic -p lizard -s 6 -a '"backing_devices+=sriov/vios1//2/1/2.0/30,sriov/vios2//4/1/2.0/40"'
      # lshwres -r virtualio -m reptilian-9119-MME-65BA46F --rsubtype vnic --level lpar --filter "lpar_names=lizard"
      lpar_name=lizard,lpar_id=6,slot_num=6,desired_mode=ded,curr_mode=ded,auto_priority_failover=1,port_vlan_id=3455,pvid_priority=0,allowed_vlan_ids=all,mac_addr=6ac53577b106,allowed_os_mac_addrs=all,"backing_devices=sriov/vios1/1/1/1/2700400b/2.0/2.0/10,sriov/vios2/2/3/1/2700c008/2.0/2.0/20,sriov/vios1/1/2/1/27008005/2.0/2.0/30,sriov/vios2/2/4/1/27010002/2.0/2.0/40","backing_device_states=sriov/2700400b/1/Operational,sriov/2700c008/0/Operational,sriov/27008005/0/Operational,sriov/27010002/0/Operational"
      
    • Change the failover priority of logical port 2700400b of the vNIC in slot 6 to 11:
    • # chhwres -m reptilian-9119-MME-65BA46F -r virtualio -o s --rsubtype vnicbkdev -p lizard -s 6 --logport 2700400b -a "failover_priority=11"
      # lshwres -r virtualio -m reptilian-9119-MME-65BA46F --rsubtype vnic --level lpar --filter "lpar_names=lizard"
      lpar_name=lizard,lpar_id=6,slot_num=6,desired_mode=ded,curr_mode=ded,auto_priority_failover=1,port_vlan_id=3455,pvid_priority=0,allowed_vlan_ids=all,mac_addr=6ac53577b106,allowed_os_mac_addrs=all,"backing_devices=sriov/vios1/1/1/1/2700400b/2.0/2.0/11,sriov/vios2/2/3/1/2700c008/2.0/2.0/20,sriov/vios1/1/2/1/27008005/2.0/2.0/30,sriov/vios2/2/4/1/27010002/2.0/2.0/40","backing_device_states=sriov/2700400b/1/Operational,sriov/2700c008/0/Operational,sriov/27008005/0/Operational,sriov/27010002/0/Operational"
      
    • Make logical port 27008005 active on vNIC in slot 6:
    • # chhwres -m reptilian-9119-MME-65BA46F -r virtualio -o act --rsubtype vnicbkdev -p lizard  -s 6 --logport 27008005 
      # lshwres -r virtualio -m reptilian-9119-MME-65BA46F --rsubtype vnic --level lpar --filter "lpar_names=lizard"
      lpar_name=lizard,lpar_id=6,slot_num=6,desired_mode=ded,curr_mode=ded,auto_priority_failover=0,port_vlan_id=3455,pvid_priority=0,allowed_vlan_ids=all,mac_addr=6ac53577b106,allowed_os_mac_addrs=all,"backing_devices=sriov/vios1/1/1/1/2700400b/2.0/2.0/11,sriov/vios2/2/3/1/2700c008/2.0/2.0/20,sriov/vios1/1/2/1/27008005/2.0/2.0/30,sriov/vios2/2/4/1/27010002/2.0/2.0/40","backing_device_states=sriov/2700400b/0/Operational,sriov/2700c008/0/Operational,sriov/27008005/1/Operational,sriov/27010002/0/Operational"
      
    • Re-enable automatic failover on vNIC in slot 6:
    • # chhwres -m reptilian-9119-MME-65BA46F -r virtualio -o s --rsubtype vnic -p lizard  -s 6 -a "auto_priority_failover=1"
      # lshwres -r virtualio -m reptilian-9119-MME-65BA46F --rsubtype vnic --level lpar --filter "lpar_names=lizard"
      lpar_name=lizard,lpar_id=6,slot_num=6,desired_mode=ded,curr_mode=ded,auto_priority_failover=1,port_vlan_id=3455,pvid_priority=0,allowed_vlan_ids=all,mac_addr=6ac53577b106,allowed_os_mac_addrs=all,"backing_devices=sriov/vios1/1/1/1/2700400b/2.0/2.0/11,sriov/vios2/2/3/1/2700c008/2.0/2.0/20,sriov/vios1/1/2/1/27008005/2.0/2.0/30,sriov/vios2/2/4/1/27010002/2.0/2.0/40","backing_device_states=sriov/2700400b/1/Operational,sriov/2700c008/0/Operational,sriov/27008005/0/Operational,sriov/27010002/0/Operational"
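
    Before moving on, here is my decoding of the backing_devices strings used above. This is my own reading of the examples of this post, not official documentation, so double check it on your system:

    # input format (creation/addition); the vios id can be left empty when the vios name is given:
    #   sriov/<vios name>/<vios id>/<sriov adapter id>/<physical port id>/<capacity>/<failover priority>
    #   e.g. sriov/vios1//1/1/2.0/10
    # output format (lshwres); my guess for the two 2.0 fields is current and desired capacity:
    #   sriov/<vios name>/<vios id>/<adapter id>/<phys port id>/<logical port id>/<capacity>/<capacity>/<priority>
    #   e.g. sriov/vios1/1/1/1/2700400b/2.0/2.0/10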
      

    Testing the failover.

    It's now time to test if the failover is working as intended. The test will be super simple: I will just shut off one of the two Virtual I/O Servers and check whether I'm losing packets or not. I'm first checking on which VIOS the active adapter is located:
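
    To measure the packet loss I simply keep a ping running against the partition from another machine during the whole test (lizard being the partition used in this post):

    # run from any box that can reach the partition, before shutting down the VIOS
    $ ping lizard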

    vnic1l

    I now need to shut down the Virtual I/O Server ending with 88 and check that the one ending with 89 takes the lead:

    *****88# shutdown -force 
    

    Priorities 10 and 30 are on the Virtual I/O Server that was shut down; the highest remaining priority, on the surviving Virtual I/O Server, is 20. This backing device, hosted on the second Virtual I/O Server, is now serving the network I/Os:

    vnic1m

    You can check the same thing with the command line on the remaining Virtual I/O Server:

    *****89# errpt | more
    IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
    60D73419   1202214716 I S vnicserver0    VNIC Client Login
    60D73419   1202214716 I S vnicserver1    VNIC Client Login
    *****89# vnicstat vnicserver1
    --------------------------------------------------------------------------------
    VNIC Server Statistics: vnicserver1
    --------------------------------------------------------------------------------
    Device Statistics:
    ------------------
    State: active
    Backing Device Name: ent3
    
    Failover State: active
    Failover Readiness: operational
    Failover Priority: 20
    
    

    During my tests the failover worked as I expected. You can see on the picture below that during this test I only lost one ping (between icmp_seq 64 and 66) during the failover/failback process.

    vnic1n

    In the partition I saw some messages in the errpt during the failover:

    # errpt | more
    4FB9389C   1202215816 I S ent0           VNIC Link Up
    F655DA07   1202215816 I S ent0           VNIC Link Down
    # errpt -a | more
    [..]
    SOURCE ADDRESS
    56FB 2DB8 A406
    Event
    physical link: DOWN   logical link: DOWN
    Status
    [..]
    SOURCE ADDRESS
    56FB 2DB8 A406
    Event
    physical link: UP   logical link: UP
    Status
    

    What about Live Partition Mobility ?

    If you want a seamless LPM experience, without having to choose the destination adapter and physical port on which to map your current vNIC backing devices, just fill in the label and sublabel (the label being the most important) for each physical port of your SRIOV adapters. Then, during the LPM, if the names are aligned between the two systems the right physical port will be automatically chosen based on the label names:

    vnic1o
    vnic1p
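
    As a side note, the labels can also be set from the HMC command line. I believe the syntax is along these lines, but take it as an assumption and check the phys_port_label attribute name on your own HMC level before relying on it:

    # label physical port 0 of SRIOV adapter 1 the same way on the source and the target system
    # chhwres -r sriov -m reptilian-9119-MME-65BA46F -o s --rsubtype physport -a "adapter_id=1,phys_port_id=0,phys_port_label=tsm-backup-net"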

    The LPM was working like a charm and I didn't notice any particular problem during the move. vNIC failover and LPM work fine as long as you take care of your SRIOV labels :-). I did notice on AIX 7.2 TL1 SP1 that there were no errpt messages in the partition itself, just in the Virtual I/O Servers … weird :-)

    # errpt | more
    IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
    3EB09F5A   1202222416 I S Migration      Migration completed successfully
    

    Conclusion.

    No long story here: if you need performance AND flexibility you absolutely have to use SRIOV vNIC failover adapters. This feature offers you the best of both worlds, the possibility of 10Gb adapters with a failover capability, without having to worry about LPM or about NIB configuration. It's not applicable in all cases but it's definitely something to have for an environment such as TSM or any network I/O intensive workload. Use it !

    About reptilians !

    Before you start reading this, keep your sense of humor, and be aware that what I say is not related to my workplace at all; it's a general way of thinking, not especially based on my own experience. Don't be offended, it's just a personal opinion based on things I may or may not have seen during my life. You've been warned.

    This blog was never a place to share my opinions about life and society, but I must admit that I should have done it before. Speaking about this kind of thing makes you feel alive in a world where everything needs to be ok and where you no longer have the right to feel or express something about what you are living. There are a couple of good blog posts speaking about this kind of thing in the IT world, and I agree with all of what is said in them. Some of the authors are just telling what they love in their daily jobs, but I think it's also a way to say what they probably won't love in another one :-) :

    • Adam Leventhal’s “I’m not a resource”: here
    • Brendan Gregg’s “Working at Netflix in 2016″: here

    All of this to say that I work at night, I work on weekends, I'm thinking about PowerSystems/computers when I fall asleep. I always have new ideas and I always want to learn new things, discover new technologies and features. I truly, deeply love this, but being like this does not help me and will never help me in my daily job, for one single reason: in this world the people who have the knowledge are not the people who are taking the technical decisions. It's sad but true. I'm just good at working as much as I can for the least money possible. Nobody cares if techs are happy, unhappy, want to stay or leave. It doesn't make any difference to anyone driving a company. What's important is money. Everything is meaningless. We are no one, we are nothing, just numbers in an Excel spreadsheet. I'm probably saying this because I'm not good enough at anything to find an acceptable workplace. Once again, sad but true.

    Even worse, if you just want to follow what the industry is asking, you have to be everywhere and know everything. I know I'll be forced in a very near future to move to Devops/Linux (I love Linux, I'm an RHCE certified engineer !). That's why for a couple of years now, at night after my daily job is finished, I'm working again: working to understand how Docker works, working to install my own Openstack on my own machines, working to understand Saltstack, Ceph, Python, Ruby, Go …. it's a never ending process. But it's still not enough for them ! Not enough to be considered a good enough guy to fit for a job. I remember being asked to know about Openstack, Cassandra, Hadoop, AWS, KVM, Linux, automation tools (Puppet this time), Docker and continuous integration for one single job application. First, I seriously doubt that anyone has such skills and is good at each of them. Second, even if I were an expert in each one: if you look a few years back it was the exact same thing, but with different products. You have to understand and be good at every new product in minutes. All of this to understand that one or two years after you are considered an "expert", you are bad at everything that exists in the industry. I'm really sick of this fight against something I can't control. Being a hard worker, and clever enough to understand every new feature, is not enough nowadays. On top of that you also need to be a beautiful person with a nice perfect smile wearing a perfect suit. You also have to be on LinkedIn and be connected with the right persons. And even if all of these boxes are checked you still need to be lucky enough to be at the right place at the right moment. I'm so sick of this. Work doesn't pay. Only luck. I don't want to live in this kind of world but I have to. Anyway this is just a "two-cents" way of thinking. Everything is probably a big trick orchestrated by these reptilian lizard men ! ^^. Be good at what you do and don't care about what people think of you (even your horrible french accent during your sessions) … that's the most important !

    picture-of-reptilian-alien

    What’s new in VIOS 2.2.4.10 and PowerVM : Part 1 Virtual I/O Server Rules

    I will post a series of mini blog posts about the new features of PowerVM and Virtual I/O Server that are released this month. By this I mean Hardware Management Console 840 + Power firmware 840 + Virtual I/O Server 2.2.4.10. As writing blog posts is not a part of my job and I'm doing it in my spare time, some of the topics I will talk about have already been covered by other AIX bloggers, but I think the more materials we have the better it is. Other ones, like this first one, will be new to you. So please accept my apologies if topics are not what I'm calling "0 day" (covered the day of release). Anyway, writing things down helps me understand them better, and I add little details I have not seen in other blog posts or in the official documentation. Last point: in these mini posts I will always try to give something new to you, at least my point of view as an IBM customer. I hope it will be useful for you.

    The first topic I want to talk about is Virtual I/O Server rules. With the latest version, three new commands called "rules", "rulescfgset" and "rulesdeploy" are now available on the Virtual I/O Servers. These help you configure your device attributes by creating, deploying, or checking rules (against the current configuration). I'm 100% sure that every time you are installing a Virtual I/O Server you are doing the same thing over and over again: you check your buffer attributes, you check attributes on the fibre channel adapters, and so on. Rules are a way to be sure everything is the same on all your Virtual I/O Servers (you can create a rule file (xml format) that can be deployed on every Virtual I/O Server you install). Even better, if you are a PowerVC user like me you want to be sure that any new device created by PowerVC is created with the attributes you want (for instance buffers for Virtual Ethernet Adapters). In the "old days" you had to use the chdef command; you can now do this by using the rules. Rather than giving you a list of commands I'll show you here what I'm now doing on my Virtual I/O Servers in 2.2.4.10.
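
    As a quick recap before diving in, here are the rules operations used throughout this post (all of them are shown with real outputs below):

    $ rules -o list                              # list the currently defined rules
    $ rules -o modify -t <type> -a <attr>=<val>  # change an existing rule
    $ rules -o add -t <type> -a <attr>=<val>     # create a rule with no default entry
    $ rules -o diff -s                           # compare the rules with the running system
    $ rules -o deploy                            # apply the rules to the system
    $ rules -o import -f <file>                  # import a rule file coming from another VIOS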

    Creating and modifying existing default rules

    Before starting, here is a (non-exhaustive) list of the attributes I'm changing on all my Virtual I/O Servers at deploy time. I now want to do that using the rules (these are just examples, you can do much more with the rules):

    • On fcs Adapters I’m changing the max_xfer_size attribute to 0x200000.
    • On fcs Adapters I’m changing the num_cmd_elems attribute to 2048.
    • On fscsi Devices I’m changing the dyntrk attribute to yes.
    • On fscsi Devices I’m changing the fc_err_recov to fast_fail.
    • On Virtual Ethernet Adapters I’m changing the max_buf_tiny attribute to 4096.
    • On Virtual Ethernet Adapters I’m changing the min_buf_tiny attribute to 4096.
    • On Virtual Ethernet Adapters I’m changing the max_buf_small attribute to 4096.
    • On Virtual Ethernet Adapters I’m changing the min_buf_small attribute to 4096.
    • On Virtual Ethernet Adapters I’m changing the max_buf_medium attribute to 512.
    • On Virtual Ethernet Adapters I’m changing the min_buf_medium attribute to 512.
    • On Virtual Ethernet Adapters I’m changing the max_buf_large attribute to 128.
    • On Virtual Ethernet Adapters I’m changing the min_buf_large attribute to 128.
    • On Virtual Ethernet Adapters I’m changing the max_buf_huge attribute to 128.
    • On Virtual Ethernet Adapters I’m changing the min_buf_huge attribute to 128.

    Modify existing attributes using rules

    A "factory" default rule file now exists on the Virtual I/O Server. It is located in /home/padmin/rules/vios_current_rules.xml; you can check the content of the file (it's an xml file) and list the rules contained in it:

    # ls -l /home/padmin/rules
    total 40
    -r--r-----    1 root     system        17810 Dec 08 18:40 vios_current_rules.xml
    $ oem_setup_env
    # head -10 /home/padmin/rules/vios_current_rules.xml
    <?xml version="1.0" encoding="UTF-8"?>
    <Profile origin="get" version="3.0.0" date="2015-12-08T17:40:37Z">
     <Catalog id="devParam.disk.fcp.mpioosdisk" version="3.0">
      <Parameter name="reserve_policy" value="no_reserve" applyType="nextboot" reboot="true">
       <Target class="device" instance="disk/fcp/mpioosdisk"/>
      </Parameter>
     </Catalog>
     <Catalog id="devParam.disk.fcp.mpioapdisk" version="3.0">
      <Parameter name="reserve_policy" value="no_reserve" applyType="nextboot" reboot="true">
       <Target class="device" instance="disk/fcp/mpioapdisk"/>
    [..]
    
    $ rules -o list -d
    

    Let's now say you have an existing Virtual I/O Server with an existing SEA configured on it. You want two things from the rules:

    • Apply the rules to modify the existing devices.
    • Be sure that new devices will be created using the rules.

    For the purpose of this example we will work on the buffer attributes of a Virtual Ethernet Adapter (the same concepts apply to other device types). So we have an SEA with Virtual Ethernet Adapters and we want to change the buffer attributes. Let's first check the current values on the virtual adapters:

    $ lsdev -type adapter | grep -i Shared
    ent13            Available   Shared Ethernet Adapter
    $ lsdev -dev ent13 -attr virt_adapters
    value
    
    ent8,ent9,ent10,ent11
    
    $ lsdev -dev ent8 -attr max_buf_huge,max_buf_large,max_buf_medium,max_buf_small,max_buf_tiny,min_buf_huge,min_buf_large,min_buf_medium,min_buf_small,min_buf_tiny
    value
    
    64
    64
    256
    2048
    2048
    24
    24
    128
    512
    512
    $ lsdev -dev ent9 -attr max_buf_huge,max_buf_large,max_buf_medium,max_buf_small,max_buf_tiny,min_buf_huge,min_buf_large,min_buf_medium,min_buf_small,min_buf_tiny
    value
    
    64
    64
    256
    2048
    2048
    24
    24
    128
    512
    512
    

    Let's now check the values in the current Virtual I/O Server rules:

    $ rules -o list | grep buf
    adapter/vdevice/IBM,l-lan      max_buf_tiny         2048
    adapter/vdevice/IBM,l-lan      min_buf_tiny         512
    adapter/vdevice/IBM,l-lan      max_buf_small        2048
    adapter/vdevice/IBM,l-lan      min_buf_small        512
    

    For the tiny and small buffers I can change the rules easily using the rules command (modify operation):

    $ rules -o modify -t adapter/vdevice/IBM,l-lan -a max_buf_tiny=4096
    $ rules -o modify -t adapter/vdevice/IBM,l-lan -a min_buf_tiny=4096
    $ rules -o modify -t adapter/vdevice/IBM,l-lan -a max_buf_small=4096
    $ rules -o modify -t adapter/vdevice/IBM,l-lan -a min_buf_small=4096
    

    I'm re-running the rules command to check the rules are now modified:

    $ rules -o list | grep buf
    adapter/vdevice/IBM,l-lan      max_buf_tiny         4096
    adapter/vdevice/IBM,l-lan      min_buf_tiny         4096
    adapter/vdevice/IBM,l-lan      max_buf_small        4096
    adapter/vdevice/IBM,l-lan      min_buf_small        4096
    

    I can check the current values of my system against the currently defined rules by using the diff operation:

    # rules -o diff -s
    devParam.adapter.vdevice.IBM,l-lan:max_buf_tiny device=adapter/vdevice/IBM,l-lan    2048 | 4096
    devParam.adapter.vdevice.IBM,l-lan:min_buf_tiny device=adapter/vdevice/IBM,l-lan     512 | 4096
    devParam.adapter.vdevice.IBM,l-lan:max_buf_small device=adapter/vdevice/IBM,l-lan   2048 | 4096
    devParam.adapter.vdevice.IBM,l-lan:min_buf_small device=adapter/vdevice/IBM,l-lan    512 | 4096
    

    Creating new attributes using rules

    The default rules embedded with the current Virtual I/O Server release contain nothing for the medium, large and huge buffers. Unfortunately for me I'm modifying these attributes by default and I want a rule capable of doing that. The goal is now to create a new set of rules for the buffers not already present in the default file … Let's try to do that using the add operation:

    # rules -o add -t adapter/vdevice/IBM,l-lan -a max_buf_medium=512
    The rule is not supported or does not exist.
    

    Annoying: I can't add a rule for the medium buffer (same for the large and huge ones). The available attributes for each device are based on the current AIX ARTEX catalog. You can look at the files present in the catalog to check what the available attributes are for each device type; you can see in the output below that there is nothing in the current ARTEX catalog for the medium buffer.

    $ oem_setup_env
    # cd /etc/security/artex/catalogs
    # ls -ltr | grep l-lan
    -r--r-----    1 root     security       1261 Nov 10 00:30 devParam.adapter.vdevice.IBM,l-lan.xml
    # grep medium devParam.adapter.vdevice.IBM,l-lan.xml
    # 
    

    To show that it is possible to add new rules I'll give you a simple example adding the 'src_lun_val' and 'dest_lun_val' attributes on the vioslpm0 device. First I check that I can add these rules by looking in the ARTEX catalog:

    $ oem_setup_env
    # cd /etc/security/artex/catalogs
    # ls -ltr | grep lpm
    -r--r-----    1 root     security       2645 Nov 10 00:30 devParam.pseudo.vios.lpm.xml
    # grep -iE "src_lun_val|dest_lun_val" devParam.pseudo.vios.lpm.xml
      <ParameterDef name="dest_lun_val" type="string" targetClass="device" cfgmethod="attr" reboot="true">
      <ParameterDef name="src_lun_val" type="string" targetClass="device" cfgmethod="attr" reboot="true">
    

    Then I’m checking the ‘range’ of authorized values for both attributes:

    # lsattr -l vioslpm0 -a src_lun_val -R
    on
    off
    # lsattr -l vioslpm0 -a dest_lun_val -R
    on
    off
    restart_off
    lpm_off
    

    I'm searching for the device type using the lsdev command (here pseudo/vios/lpm):

    # lsdev -P | grep lpm
    pseudo         lpm             vios           VIOS LPM Adapter
    

    I’m finally adding the rules and checking the differences:

    $ rules -o add -t pseudo/vios/lpm -a src_lun_val=on
    $ rules -o add -t pseudo/vios/lpm -a dest_lun_val=on
    $ rules -o diff -s
    devParam.adapter.vdevice.IBM,l-lan:max_buf_tiny device=adapter/vdevice/IBM,l-lan    2048 | 4096
    devParam.adapter.vdevice.IBM,l-lan:min_buf_tiny device=adapter/vdevice/IBM,l-lan     512 | 4096
    devParam.adapter.vdevice.IBM,l-lan:max_buf_small device=adapter/vdevice/IBM,l-lan   2048 | 4096
    devParam.adapter.vdevice.IBM,l-lan:min_buf_small device=adapter/vdevice/IBM,l-lan    512 | 4096
    devParam.pseudo.vios.lpm:src_lun_val device=pseudo/vios/lpm                          off | on
    devParam.pseudo.vios.lpm:dest_lun_val device=pseudo/vios/lpm                 restart_off | on
    

    But what about my buffers: is there any possibility to add these attributes to the current ARTEX catalog ? The answer is yes. By looking in the catalog used for Virtual Ethernet Adapters (the file named devParam.adapter.vdevice.IBM,l-lan.xml) you will see that a message catalog file named 'vioent.cat' is utilized by this xml file. Check the content of this catalog file by using the dspcat command and find out if there is anything related to the medium, large and huge buffers (all the catalog files are located in /usr/lib/methods):

    $ oem_setup_env
    # cd /usr/lib/methods
    # dspcat vioent.cat |grep -iE "medium|large|huge"
    1 : 10 Minimum Huge Buffers
    1 : 11 Maximum Huge Buffers
    1 : 12 Minimum Large Buffers
    1 : 13 Maximum Large Buffers
    1 : 14 Minimum Medium Buffers
    1 : 15 Maximum Medium Buffers
    

    Modify the xml file located in the ARTEX catalog and add the necessary information for these three new buffer types:

    $ oem_setup_env
    # vi /etc/security/artex/catalogs/devParam.adapter.vdevice.IBM,l-lan.xml
    <?xml version="1.0" encoding="UTF-8"?>
    
    <Catalog id="devParam.adapter.vdevice.IBM,l-lan" version="3.0" inherit="devCommon">
    
      <ShortDescription><NLSCatalog catalog="vioent.cat" setNum="1" msgNum="1">Virtual I/O Ethernet Adapter (l-lan)</NLSCatalog></ShortDescription>
    
      <ParameterDef name="min_buf_huge" type="integer" targetClass="device" cfgmethod="attr" reboot="true">
        <Description><NLSCatalog catalog="vioent.cat" setNum="1" msgNum="10">Minimum Huge Buffers</NLSCatalog></Description>
      </ParameterDef>
    
      <ParameterDef name="max_buf_huge" type="integer" targetClass="device" cfgmethod="attr" reboot="true">
        <Description><NLSCatalog catalog="vioent.cat" setNum="1" msgNum="11">Maximum Huge Buffers</NLSCatalog></Description>
      </ParameterDef>
    
      <ParameterDef name="min_buf_large" type="integer" targetClass="device" cfgmethod="attr" reboot="true">
        <Description><NLSCatalog catalog="vioent.cat" setNum="1" msgNum="12">Minimum Large Buffers</NLSCatalog></Description>
      </ParameterDef>
    
      <ParameterDef name="max_buf_large" type="integer" targetClass="device" cfgmethod="attr" reboot="true">
        <Description><NLSCatalog catalog="vioent.cat" setNum="1" msgNum="13">Maximum Large Buffers</NLSCatalog></Description>
      </ParameterDef>
    
      <ParameterDef name="min_buf_medium" type="integer" targetClass="device" cfgmethod="attr" reboot="true">
        <Description><NLSCatalog catalog="vioent.cat" setNum="1" msgNum="14">Minimum Medium Buffers</NLSCatalog></Description>
      </ParameterDef>
    
      <ParameterDef name="max_buf_medium" type="integer" targetClass="device" cfgmethod="attr" reboot="true">
        <Description><NLSCatalog catalog="vioent.cat" setNum="1" msgNum="15">Maximum Medium Buffers</NLSCatalog></Description>
      </ParameterDef>
    
    [..]
      <ParameterDef name="max_buf_tiny" type="integer" targetClass="device" cfgmethod="attr" reboot="true">
        <Description><NLSCatalog catalog="vioent.cat" setNum="1" msgNum="19">Maximum Tiny Buffers</NLSCatalog></Description>
      </ParameterDef>
    
    
    

    Then I'm retrying to add the rules for the medium, large and huge buffers …. and it's working great:

    # rules -o add -t adapter/vdevice/IBM,l-lan -a max_buf_medium=512
    # rules -o add -t adapter/vdevice/IBM,l-lan -a min_buf_medium=512
    # rules -o add -t adapter/vdevice/IBM,l-lan -a max_buf_huge=128
    # rules -o add -t adapter/vdevice/IBM,l-lan -a min_buf_huge=128
    # rules -o add -t adapter/vdevice/IBM,l-lan -a max_buf_large=128
    # rules -o add -t adapter/vdevice/IBM,l-lan -a min_buf_large=128
    

    Deploying the rules

    Now that a couple of rules are defined, let's apply them on the Virtual I/O Server. First check the differences you will get after applying the rules, by using the diff operation of the rules command:

    $ rules -o diff -s
    devParam.adapter.vdevice.IBM,l-lan:max_buf_tiny device=adapter/vdevice/IBM,l-lan    2048 | 4096
    devParam.adapter.vdevice.IBM,l-lan:min_buf_tiny device=adapter/vdevice/IBM,l-lan     512 | 4096
    devParam.adapter.vdevice.IBM,l-lan:max_buf_small device=adapter/vdevice/IBM,l-lan   2048 | 4096
    devParam.adapter.vdevice.IBM,l-lan:min_buf_small device=adapter/vdevice/IBM,l-lan    512 | 4096
    devParam.adapter.vdevice.IBM,l-lan:max_buf_medium device=adapter/vdevice/IBM,l-lan   256 | 512
    devParam.adapter.vdevice.IBM,l-lan:min_buf_medium device=adapter/vdevice/IBM,l-lan   128 | 512
    devParam.adapter.vdevice.IBM,l-lan:max_buf_huge device=adapter/vdevice/IBM,l-lan      64 | 128
    devParam.adapter.vdevice.IBM,l-lan:min_buf_huge device=adapter/vdevice/IBM,l-lan      24 | 128
    devParam.adapter.vdevice.IBM,l-lan:max_buf_large device=adapter/vdevice/IBM,l-lan     64 | 128
    devParam.adapter.vdevice.IBM,l-lan:min_buf_large device=adapter/vdevice/IBM,l-lan     24 | 128
    devParam.pseudo.vios.lpm:src_lun_val device=pseudo/vios/lpm                          off | on
    devParam.pseudo.vios.lpm:dest_lun_val device=pseudo/vios/lpm                 restart_off | on
    

    Let's now deploy the rules using the deploy operation of the rules command. You can notice that for some rules a reboot is mandatory to change the existing devices; this is the case for the buffers, but not for the vioslpm0 attributes (we can check again that we now have no differences … some attributes are applied in the same way as with the -P flag of the chdev command):

    $ rules -o deploy 
    A manual post-operation is required for the changes to take effect, please reboot the system.
    $ lsdev -dev ent8 -attr min_buf_small
    value
    
    4096
    $ lsdev -dev vioslpm0 -attr src_lun_val
    value
    
    on
    $ rules -o diff -s
    

    Don't forget to reboot the Virtual I/O Server and check everything is ok after the reboot (check the values seen by the kernel by using entstat):

    $ shutdown -force -restart
    [..]
    $ for i in ent8 ent9 ent10 ent11 ; do lsdev -dev $i -attr max_buf_huge,max_buf_large,max_buf_medium,max_buf_small,max_buf_tiny,min_buf_huge,min_buf_large,min_buf_medium,min_buf_small,min_buf_tiny ; done
    [..]
    128
    128
    512
    4096
    4096
    128
    128
    512
    4096
    4096
    $ entstat -all ent13 | grep -i buf
    [..]
    No mbuf Errors: 0
      Transmit Buffers
        Buffer Size             65536
        Buffers                    32
          No Buffers                0
      Receive Buffers
        Buffer Type              Tiny    Small   Medium    Large     Huge
        Min Buffers              4096     4096      512      128      128
        Max Buffers              4096     4096      512      128      128
    

    For the fibre channel adapters I'm using these rules:

    $ rules -o modify -t driver/iocb/efscsi -a dyntrk=yes
    $ rules -o modify -t driver/qliocb/qlfscsi -a dyntrk=yes
    $ rules -o modify -t driver/qiocb/qfscsi -a dyntrk=yes
    $ rules -o modify -t driver/iocb/efscsi -a fc_err_recov=fast_fail
    $ rules -o modify -t driver/qliocb/qlfscsi -a fc_err_recov=fast_fail
    $ rules -o modify -t driver/qiocb/qfscsi -a fc_err_recov=fast_fail
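
    The max_xfer_size and num_cmd_elems attributes from my initial list live on the fcs adapters, not on the fscsi devices, so they need their own rules. I'm not showing the exact rule type of my adapters here, so treat the following as a sketch: <fcs_adapter_type> is a placeholder to replace with the matching type from the rules -o list output on your own system (and use the add operation instead of modify if there is no default rule for the attribute, as we saw with the buffers):

    $ rules -o list | more   # find the type matching your fcs adapters
    $ rules -o modify -t <fcs_adapter_type> -a max_xfer_size=0x200000
    $ rules -o modify -t <fcs_adapter_type> -a num_cmd_elems=2048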
    

    What about new devices ?

    Let's now create a new SEA, adding new Virtual Ethernet Adapters using DLPAR, and check that the devices are created with the right values (I'm not showing here how to create the VEAs, I'm doing it in the GUI for simplicity; ent14, ent15, ent16 and ent17 are the new ones):

    $ lsdev | grep ent
    ent12            Available   EtherChannel / IEEE 802.3ad Link Aggregation
    ent13            Available   Shared Ethernet Adapter
    ent14            Available   Virtual I/O Ethernet Adapter (l-lan)
    ent15            Available   Virtual I/O Ethernet Adapter (l-lan)
    ent16            Available   Virtual I/O Ethernet Adapter (l-lan)
    ent17            Available   Virtual I/O Ethernet Adapter (l-lan)
    $ lsdev -dev ent14 -attr
    buf_mode        min            Receive Buffer Mode                        True
    copy_buffs      32             Transmit Copy Buffers                      True
    max_buf_control 64             Maximum Control Buffers                    True
    max_buf_huge    128            Maximum Huge Buffers                       True
    max_buf_large   128            Maximum Large Buffers                      True
    max_buf_medium  512            Maximum Medium Buffers                     True
    max_buf_small   4096           Maximum Small Buffers                      True
    max_buf_tiny    4096           Maximum Tiny Buffers                       True
    min_buf_control 24             Minimum Control Buffers                    True
    min_buf_huge    128            Minimum Huge Buffers                       True
    min_buf_large   128            Minimum Large Buffers                      True
    min_buf_medium  512            Minimum Medium Buffers                     True
    min_buf_small   4096           Minimum Small Buffers                      True
    min_buf_tiny    4096           Minimum Tiny Buffers                       True
    $  mkvdev -sea ent0 -vadapter ent14 ent15 ent16 ent17 -default ent14 -defaultid 14 -attr ha_mode=sharing largesend=1 large_receive=yes
    ent18 Available
    $ entstat -all ent18 | grep -i buf
    No mbuf Errors: 0
      Transmit Buffers
        Buffer Size             65536
        Buffers                    32
          No Buffers                0
      Receive Buffers
        Buffer Type              Tiny    Small   Medium    Large     Huge
        Min Buffers              4096     4096      512      128      128
        Max Buffers              4096     4096      512      128      128
      Buffer Mode: Min
    [..]
    

    Deploying these rules to another Virtual I/O Server

    The goal is now to use this rule file and deploy it on all my Virtual I/O Servers, to be sure all the attributes are the same everywhere.

    I'm making a copy of my current rule file (which now contains my customizations) and copying it to another Virtual I/O Server, along with the modified ARTEX catalog file:

    $ oem_setup_env
    # cp /home/padmin/rules/vios_current_rules.xml /home/padmin/rules/custom_rules.xml
    # scp /home/padmin/rules/custom_rules.xml anothervios:/home/padmin/rules
    custom_rules.xml                   100%   19KB  18.6KB/s   00:00
    # scp /etc/security/artex/catalogs/devParam.adapter.vdevice.IBM,l-lan.xml anothervios:/etc/security/artex/catalogs/
    devParam.adapter.vdevice.IBM,l-lan.xml
    devParam.adapter.vdevice.IBM,l-lan.xml    100% 2737     2.7KB/s   00:00
    

    I’m now connecting to the new Virtual I/O Server and applying the rules:

    $ rules -o import -f /home/padmin/rules/custom_rules.xml
    $ rules -o diff -s
    devParam.adapter.vdevice.IBM,l-lan:max_buf_tiny device=adapter/vdevice/IBM,l-lan    2048 | 4096
    devParam.adapter.vdevice.IBM,l-lan:min_buf_tiny device=adapter/vdevice/IBM,l-lan     512 | 4096
    devParam.adapter.vdevice.IBM,l-lan:max_buf_small device=adapter/vdevice/IBM,l-lan   2048 | 4096
    devParam.adapter.vdevice.IBM,l-lan:min_buf_small device=adapter/vdevice/IBM,l-lan    512 | 4096
    devParam.adapter.vdevice.IBM,l-lan:max_buf_medium device=adapter/vdevice/IBM,l-lan   256 | 512
    devParam.adapter.vdevice.IBM,l-lan:min_buf_medium device=adapter/vdevice/IBM,l-lan   128 | 512
    devParam.adapter.vdevice.IBM,l-lan:max_buf_huge device=adapter/vdevice/IBM,l-lan      64 | 128
    devParam.adapter.vdevice.IBM,l-lan:min_buf_huge device=adapter/vdevice/IBM,l-lan      24 | 128
    devParam.adapter.vdevice.IBM,l-lan:max_buf_large device=adapter/vdevice/IBM,l-lan     64 | 128
    devParam.adapter.vdevice.IBM,l-lan:min_buf_large device=adapter/vdevice/IBM,l-lan     24 | 128
    devParam.pseudo.vios.lpm:src_lun_val device=pseudo/vios/lpm                          off | on
    devParam.pseudo.vios.lpm:dest_lun_val device=pseudo/vios/lpm                 restart_off | on
    $ rules -o deploy
    A manual post-operation is required for the changes to take effect, please reboot the system.
    $ entstat -all ent18 | grep -i buf
    [..]
        Buffer Type              Tiny    Small   Medium    Large     Huge
        Min Buffers               512      512      128       24       24
        Max Buffers              2048     2048      256       64       64
    [..]
    $ shutdown -force -restart
    $ entstat -all ent18 | grep -i buf
    [..]
       Buffer Type              Tiny    Small   Medium    Large     Huge
        Min Buffers              4096     4096      512      128      128
        Max Buffers              4096     4096      512      128      128
    [..]
    

    rulescfgset

    If you don't care at all about creating your own rules you can just use the rulescfgset command as padmin to apply the default Virtual I/O Server rules; my advice for newbies is to do that just after the Virtual I/O Server is installed. By doing that you will be sure to have the default IBM rules. It is a good practice every time you deploy a new Virtual I/O Server.

    # rulescfgset
    

    Conclusion

    Use rules ! It is a good way to be sure your Virtual I/O Server device attributes are the same everywhere. I hope my examples are good enough to convince you to use them. For PowerVC users like me, using rules is a must: as PowerVC creates devices for you, you want to be sure all your devices are created with the exact same attributes. My example about the Virtual Ethernet Adapter buffers is simply a mandatory thing to do now for PowerVC users. As always I hope it helps.

    A first look at SRIOV vNIC adapters

    I have the chance to participate in the current Early Shipment Program (ESP) for Power Systems, especially the software part. One of my tasks is to test a new feature called SRIOV vNIC. For those who do not know anything about SRIOV, this technology is comparable to LHEA except that it is based on an industry standard (and has a couple of other features). By using an SRIOV adapter you can divide a physical port into what we call Virtual Functions (or Logical Ports) and map a Virtual Function to a partition. You can also set Quality of Service on these Virtual Functions: at creation time you set up the Virtual Function allowing it to take a certain percentage of the physical port. This can be very useful if you want to be sure that your production server will always have a guaranteed bandwidth, instead of using a Shared Ethernet Adapter where all client partitions are competing for the bandwidth. Customers are also using SRIOV adapters for performance purposes; as nothing goes through the Virtual I/O Server, the latency it adds is eliminated and CPU cycles are saved on the Virtual I/O Server side (Shared Ethernet Adapters consume a lot of CPU cycles). If you are not aware of what SRIOV is I encourage you to check the IBM Redbook about it (http://www.redbooks.ibm.com/abstracts/redp5065.html?Open). Unfortunately you can't move a partition using Live Partition Mobility if it has a Virtual Function assigned to it. Using vNICs allows you to use SRIOV through the Virtual I/O Servers, enabling the possibility to move your partition even if you are using an SRIOV logical port. The best of both worlds: performance/QoS and virtualization. Is this the end of the Shared Ethernet Adapter ?

    SRIOV vNIC, what’s this ?

    Before talking about the technical details it is important to understand what vNICs are. When I'm explaining this to newbies I often refer to NPIV: imagine something similar to NPIV, but for the network part. By using SRIOV vNIC:

    • A Virtual Function (SRIOV Logical Port) is created and assigned to the Virtual I/O Server.
    • A vNIC adapter is created in the client partition.
    • The Virtual Function and the vNIC adapter are linked (mapped) together.
    • This is a one to one relationship between a Virtual Function and a vNIC (just like, with NPIV, a client vfcs adapter has a one to one relationship with its backing physical fibre channel port).

    On the image below, the vNIC lpars are the "yellow" ones; you can see that the SRIOV adapter is divided into different Virtual Functions, and that some of them are mapped to the Virtual I/O Servers. The relationship between the Virtual Function and the vNIC is achieved by a vnicserver (a special Virtual I/O Server device).
    vNIC

    One of the major advantages of using vNIC is that you eliminate the need for the Virtual I/O Server in the data flow:

    • The network data flow goes directly between the partition memory and the SRIOV adapter; there is no data copy passing through the Virtual I/O Server, which eliminates the CPU cost and the latency of doing so. This is achieved by LRDMA. Pretty cool !
    • The vNIC inherits the bandwidth allocation of the Virtual Function (QoS). If the VF is configured with a capacity of 2% the vNIC will also have this capacity.

    vNIC2

    vNIC Configuration

    Before checking the details of how to configure an SRIOV vNIC adapter you have to check all the prerequisites. As this is a new feature you will need the latest level of …. everything. My advice is to stay as up to date as possible.

    vNIC Prerequisites

    These outputs are taken from the Early Shipment Program; all of this may change at the GA release:

    • Hardware Management Console v840:
    • # lshmc -V
      "version= Version: 8
       Release: 8.4.0
       Service Pack: 0
      HMC Build level 20150803.3
      ","base_version=V8R8.4.0
      "
      
    • Power 8 only, firmware 840 at least (both enterprise and scale out systems):
    • firmware

    • AIX 7.1 TL4 or AIX 7.2:
    • # oslevel -s
      7200-00-00-0000
      # cat /proc/version
      Oct 20 2015
      06:57:03
      1543A_720
      @(#) _kdb_buildinfo unix_64 Oct 20 2015 06:57:03 1543A_720
      
    • Obviously, at least one SRIOV capable adapter!

    Using the HMC GUI

    The configuration of a vNIC is done at the partition level. The configuration is only available in the enhanced version of the GUI. Select the virtual machine on which you want to add the vNIC; in the Virtual I/O tab you'll see a new "Virtual NICs" section. Click on "Virtual NICs" and a new panel will be opened with a new button called "Add Virtual NIC"; just click this one to add a Virtual NIC:

    vnic_n1
    vnic_conf2

All the SRIOV capable ports will be displayed on the next screen. Choose the SRIOV port you want (a Virtual Function will be created on this one; don't do anything more, the creation of a vNIC will automatically create the Virtual Function, assign it to the Virtual I/O Server and do the mapping to the vNIC for you). Choose the Virtual I/O Server that will be used for this vNIC (the vnicserver will be created on this Virtual I/O Server; don't worry, we will talk about vNIC redundancy later in this post) and the Virtual NIC Capacity (the percentage of the physical SRIOV port that will be dedicated to this vNIC; it has to be a multiple of 2). Be careful with the capacity: it can't be changed afterwards and you'll have to delete your vNIC to redo the configuration:

    vnic_conf3

The "Advanced Virtual NIC Settings" section allows you to choose the Virtual NIC Adapter ID, to choose a MAC Address, and to configure the vlan restrictions and vlan tagging. In the example below I'm configuring my Virtual NIC in the vlan 310:

    vnic_conf4
    vnic_conf5
    allvnic

    Using the HMC Command Line

As always the configuration can be achieved using the HMC command line: use lshwres to list vNICs and chhwres to create a vNIC.

    List SRIOV adapters to get the adapter_id needed by the chhwres command:

    # lshwres -r sriov --rsubtype adapter -m blade-8286-41A-21AFFFF
    adapter_id=1,slot_id=21020014,adapter_max_logical_ports=48,config_state=sriov,functional_state=1,logical_ports=48,phys_loc=U78C9.001.WZS06RN-P1-C12,phys_ports=4,sriov_status=running,alternate_config=0
    # lshwres -r virtualio  -m blade-8286-41A-21AFFFF --rsubtype vnic --level lpar --filter "lpar_names=72vm1"
    lpar_name=72vm1,lpar_id=9,slot_num=7,desired_mode=ded,curr_mode=ded,port_vlan_id=310,pvid_priority=0,allowed_vlan_ids=all,mac_addr=ee3b8cd87707,allowed_os_mac_addrs=all,desired_capacity=2.0,backing_devices=sriov/vios1/2/1/1/27004008/2.0
    

    Create the vNIC:

    # chhwres -r virtualio -m blade-8286-41A-21AFFFF -o a -p 72vm1 --rsubtype vnic -v -a "port_vlan_id=310,backing_devices=sriov/vios2/1/1/1/2"
    
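For reference, the backing_devices attribute is a slash-separated string. Based on the lshwres outputs above (and on the vnic_mappings format detailed later in this post), the fields appear to be the following; here is an annotated sketch of the value used in the command above:

sriov / vios2 / 1 / 1 / 1 / 2
  |       |     |   |   |   |
  |       |     |   |   |   +-- capacity (percentage of the physical port)
  |       |     |   |   +------ physical port id on the adapter
  |       |     |   +---------- SRIOV adapter_id (see the lshwres output above)
  |       |     +-------------- Virtual I/O Server partition id
  |       +-------------------- Virtual I/O Server hosting the vnicserver
  +---------------------------- backing device type (an SRIOV logical port)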

List the vNICs after the creation:

    # lshwres -r virtualio  -m blade-8286-41A-21AFFFF --rsubtype vnic --level lpar --filter "lpar_names=72vm1"
    lpar_name=72vm1,lpar_id=9,slot_num=7,desired_mode=ded,curr_mode=ded,port_vlan_id=310,pvid_priority=0,allowed_vlan_ids=all,mac_addr=ee3b8cd87707,allowed_os_mac_addrs=all,desired_capacity=2.0,backing_devices=sriov/vios1/2/1/1/27004008/2.0
    lpar_name=72vm1,lpar_id=9,slot_num=2,desired_mode=ded,curr_mode=ded,port_vlan_id=310,pvid_priority=0,allowed_vlan_ids=all,mac_addr=ee3b8cd87702,allowed_os_mac_addrs=all,desired_capacity=2.0,backing_devices=sriov/vios2/1/1/1/2700400a/2.0
    

    System and Virtual I/O Server Side:

• On the Virtual I/O Server you can use two commands to check your vNIC configuration. You can first use the lsmap command to check the one to one relationship between the VF and the vNIC (you can see on the output below that a VF and a vnicserver device are created, as well as the name of the vNIC on the client partition side):
    • # lsdev | grep VF
      ent4             Available   PCIe2 100/1000 Base-TX 4-port Converged Network Adapter VF (df1028e214103c04)
      # lsdev | grep vnicserver
      vnicserver0      Available   Virtual NIC Server Device (vnicserver)
      # lsmap -vadapter vnicserver0 -vnic
      Name          Physloc                            ClntID ClntName       ClntOS
      ------------- ---------------------------------- ------ -------------- -------
      vnicserver0   U8286.41A.21FFFFF-V2-C32897             6 72nim1         AIX
      
      Backing device:ent4
      Status:Available
      Physloc:U78C9.001.WZS06RN-P1-C12-T4-S16
      Client device name:ent1
      Client device physloc:U8286.41A.21FFFFF-V6-C3
      
    • You can get more details (QoS, vlan tagging, port states) by using the vnicstat command:
    • # vnicstat -b vnicserver0
      [..]
      --------------------------------------------------------------------------------
      VNIC Server Statistics: vnicserver0
      --------------------------------------------------------------------------------
      Device Statistics:
      ------------------
      State: active
      Backing Device Name: ent4
      
      Client Partition ID: 6
      Client Partition Name: 72nim1
      Client Operating System: AIX
      Client Device Name: ent1
      Client Device Location Code: U8286.41A.21FFFFF-V6-C3
      [..]
      Device ID: df1028e214103c04
      Version: 1
      Physical Port Link Status: Up
      Logical Port Link Status: Up
      Physical Port Speed: 1Gbps Full Duplex
      [..]
      Port VLAN (Priority:ID): 0:3331
      [..]
      VF Minimum Bandwidth: 2%
      VF Maximum Bandwidth: 100%
      
• On the client side you can list your vNICs and, as always, get the details using the entstat command:
    • # lsdev -c adapter -s vdevice -t IBM,vnic
      ent0 Available  Virtual NIC Client Adapter (vnic)
      ent1 Available  Virtual NIC Client Adapter (vnic)
      ent3 Available  Virtual NIC Client Adapter (vnic)
      ent4 Available  Virtual NIC Client Adapter (vnic)
      # entstat -d ent0 | more
      [..]
      ETHERNET STATISTICS (ent0) :
      Device Type: Virtual NIC Client Adapter (vnic)
      [..]
      Virtual NIC Client Adapter (vnic) Specific Statistics:
      ------------------------------------------------------
      Current Link State: Up
      Logical Port State: Up
      Physical Port State: Up
      
      Speed Running:  1 Gbps Full Duplex
      
      Jumbo Frames: Disabled
      [..]
      Port VLAN ID Status: Enabled
              Port VLAN ID: 3331
              Port VLAN Priority: 0
      

    Redundancy

You will certainly agree that having such a cool new feature without something fully redundant would be a shame. Fortunately we have a solution here, with the triumphant return of the Network Interface Backup (NIB). As I told you before, each time a vNIC is created a vnicserver is created on one of the Virtual I/O Servers (at the vNIC creation you have to choose on which Virtual I/O Server it will be created). So the only way to be fully redundant and to have a failover feature is to create two vNIC adapters (one using the first Virtual I/O Server and the second one using the second Virtual I/O Server); on top of this you then have to create a Network Interface Backup, like in the old times :-). Here are a couple of things and best practices to know before doing this.

• You can't use two VFs coming from the same SRIOV adapter physical port (the NIB creation will be ok, but any configuration on top of this NIB will fail).
• You can use two VFs coming from the same SRIOV adapter but from two different physical ports (this is the example I will show below).
• The best practice is to use two VFs coming from two different SRIOV adapters (you can then afford to lose one of the two SRIOV adapters). A sketch of the corresponding commands follows the diagram below.

    vNIC_nib
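Following these recommendations, here is a minimal sketch of the creation of the two vNICs from the HMC command line, reusing the chhwres syntax shown earlier in this post (the machine, partition, adapter and port ids are the ones of my example; adapt them to your own configuration):

# chhwres -r virtualio -m blade-8286-41A-21AFFFF -o a -p 72nim1 --rsubtype vnic -v -a "port_vlan_id=3331,backing_devices=sriov/vios1/1/1/2/2"
# chhwres -r virtualio -m blade-8286-41A-21AFFFF -o a -p 72nim1 --rsubtype vnic -v -a "port_vlan_id=3331,backing_devices=sriov/vios2/2/1/3/2"

The first vNIC is backed by a VF on physical port 2 with its vnicserver hosted on vios1; the second one is backed by a VF on physical port 3 with its vnicserver hosted on vios2.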

Verify on your partition that you have two vNIC adapters and check that their statuses are ok using the 'entstat' command:

    • Both vNIC are available on the client partition:
    • # lsdev -c adapter -s vdevice -t IBM,vnic
      ent0 Available  Virtual NIC Client Adapter (vnic)
      ent1 Available  Virtual NIC Client Adapter (vnic)
      # lsdev -c adapter -s vdevice -t IBM,vnic -F physloc
      U8286.41A.21FFFFF-V6-C2
      U8286.41A.21FFFFF-V6-C3
      
• For the vNIC backed by the first Virtual I/O Server, check that "Current Link State", "Logical Port State" and "Physical Port State" are ok (all of them need to be up):
    • # entstat -d ent0 | grep -p vnic
      -------------------------------------------------------------
      ETHERNET STATISTICS (ent0) :
      Device Type: Virtual NIC Client Adapter (vnic)
      Hardware Address: ee:3b:86:f6:45:02
      Elapsed Time: 0 days 0 hours 0 minutes 0 seconds
      
      Virtual NIC Client Adapter (vnic) Specific Statistics:
      ------------------------------------------------------
      Current Link State: Up
      Logical Port State: Up
      Physical Port State: Up
      
• Same for the vNIC backed by the second Virtual I/O Server:
    • # entstat -d ent1 | grep -p vnic
      -------------------------------------------------------------
      ETHERNET STATISTICS (ent1) :
      Device Type: Virtual NIC Client Adapter (vnic)
      Hardware Address: ee:3b:86:f6:45:03
      Elapsed Time: 0 days 0 hours 0 minutes 0 seconds
      
      Virtual NIC Client Adapter (vnic) Specific Statistics:
      ------------------------------------------------------
      Current Link State: Up
      Logical Port State: Up
      Physical Port State: Up
      

Verify on both Virtual I/O Servers that the two vNICs are coming from two different SRIOV adapters (for the purpose of this test I'm using two different ports on the same SRIOV adapter, but it remains the same with two different adapters). You can see on the output below that on Virtual I/O Server 1 the vNIC is backed by the port at position 3 (T3) and that on Virtual I/O Server 2 the vNIC is backed by the port at position 4 (T4):

• Once again use the lsmap command on the first Virtual I/O Server to check this (note that you can also check the client name and the client device):
    • # lsmap -vadapter vnicserver0 -vnic
      Name          Physloc                            ClntID ClntName       ClntOS
      ------------- ---------------------------------- ------ -------------- -------
      vnicserver0   U8286.41A.21AFF8V-V1-C32897             6 72nim1         AIX
      
      Backing device:ent4
      Status:Available
      Physloc:U78C9.001.WZS06RN-P1-C12-T3-S13
      Client device name:ent0
      Client device physloc:U8286.41A.21AFF8V-V6-C2
      
    • Same thing on the second Virtual I/O Server:
    • # lsmap -vadapter vnicserver0 -vnic -fmt :
      vnicserver0:U8286.41A.21AFF8V-V2-C32897:6:72nim1:AIX:ent4:Available:U78C9.001.WZS06RN-P1-C12-T4-S14:ent1:U8286.41A.21AFF8V-V6-C3
      

Finally create the Network Interface Backup and put an IP on top of it:

    # mkdev -c adapter -s pseudo -t ibm_ech -a adapter_names=ent0 -a backup_adapter=ent1
    ent2 Available
    # mktcpip -h 72nim1 -a 10.44.33.223 -i en2 -g 10.44.33.254 -m 255.255.255.0 -s
    en2
    72nim1
    inet0 changed
    en2 changed
    inet0 changed
    [..]
    # echo "vnic" | kdb
    +-------------------------------------------------+
    |       pACS       | Device | Link |    State     |
    |------------------+--------+------+--------------|
    | F1000A0032880000 |  ent0  |  Up  |     Open     |
    |------------------+--------+------+--------------|
    | F1000A00329B0000 |  ent1  |  Up  |     Open     |
    +-------------------------------------------------+
    

Let's now try different things to see if the redundancy is working ok. First let's shutdown one of the Virtual I/O Servers and ping our machine from another one:

    # ping 10.14.33.223
    PING 10.14.33.223 (10.14.33.223) 56(84) bytes of data.
    64 bytes from 10.14.33.223: icmp_seq=1 ttl=255 time=0.496 ms
    64 bytes from 10.14.33.223: icmp_seq=2 ttl=255 time=0.528 ms
    64 bytes from 10.14.33.223: icmp_seq=3 ttl=255 time=0.513 ms
    [..]
    64 bytes from 10.14.33.223: icmp_seq=40 ttl=255 time=0.542 ms
    64 bytes from 10.14.33.223: icmp_seq=41 ttl=255 time=0.514 ms
    64 bytes from 10.14.33.223: icmp_seq=47 ttl=255 time=0.550 ms
    64 bytes from 10.14.33.223: icmp_seq=48 ttl=255 time=0.596 ms
    [..]
    --- 10.14.33.223 ping statistics ---
    50 packets transmitted, 45 received, 10% packet loss, time 49052ms
    rtt min/avg/max/mdev = 0.457/0.525/0.596/0.043 ms
    
    # errpt | more
    IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
    59224136   1120200815 P H ent2           ETHERCHANNEL FAILOVER
    F655DA07   1120200815 I S ent0           VNIC Link Down
    3DEA4C5F   1120200815 T S ent0           VNIC Error CRQ
    81453EE1   1120200815 T S vscsi1         Underlying transport error
    DE3B8540   1120200815 P H hdisk0         PATH HAS FAILED
    # echo "vnic" | kdb
    (0)> vnic
    +-------------------------------------------------+
    |       pACS       | Device | Link |    State     |
    |------------------+--------+------+--------------|
    | F1000A0032880000 |  ent0  | Down |   Unknown    |
    |------------------+--------+------+--------------|
    | F1000A00329B0000 |  ent1  |  Up  |     Open     |
    +-------------------------------------------------+
    

Same test with the addition of an "address to ping" on the NIB, and this time I'm only losing 4 packets:

    # ping 10.14.33.223
    [..]
    64 bytes from 10.14.33.223: icmp_seq=41 ttl=255 time=0.627 ms
    64 bytes from 10.14.33.223: icmp_seq=42 ttl=255 time=0.548 ms
    64 bytes from 10.14.33.223: icmp_seq=46 ttl=255 time=0.629 ms
    64 bytes from 10.14.33.223: icmp_seq=47 ttl=255 time=0.492 ms
    [..]
    # errpt | more
    59224136   1120203215 P H ent2           ETHERCHANNEL FAILOVER
    F655DA07   1120203215 I S ent0           VNIC Link Down
    3DEA4C5F   1120203215 T S ent0           VNIC Error CRQ
    

    vNIC Live Partition Mobility

By default you can use Live Partition Mobility with SRIOV vNIC; it is super simple and it is fully supported by IBM. As always, I'll show you how to do that using the HMC GUI and the command line:

    Using the GUI

First validate the mobility operation; it will allow you to choose the destination SRIOV adapter/port on which to map your current vNIC. You have to choose:

    • The adapter (if you have more than one SRIOV adapter).
    • The Physical port on which the vNIC will be mapped.
    • The Virtual I/O Server on which the vnicserver will be created.

    New options are now available in the mobility validation panel:

    lpmiov1

    Modify each vNIC to match your destination SRIOV adapter and ports (choose the destination Virtual I/O Server here):

    lpmiov2
    lpmiov3

    Then migrate:

    lpmiov4

    IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
    A5E6DB96   1120205915 I S pmig           Client Partition Migration Completed
    4FB9389C   1120205915 I S ent1           VNIC Link Up
    F655DA07   1120205915 I S ent1           VNIC Link Down
    11FDF493   1120205915 I H ent2           ETHERCHANNEL RECOVERY
    4FB9389C   1120205915 I S ent1           VNIC Link Up
    4FB9389C   1120205915 I S ent0           VNIC Link Up
    [..]
    59224136   1120205915 P H ent2           ETHERCHANNEL FAILOVER
    B50A3F81   1120205915 P H ent2           TOTAL ETHERCHANNEL FAILURE
    F655DA07   1120205915 I S ent1           VNIC Link Down
    3DEA4C5F   1120205915 T S ent1           VNIC Error CRQ
    F655DA07   1120205915 I S ent0           VNIC Link Down
    3DEA4C5F   1120205915 T S ent0           VNIC Error CRQ
    08917DC6   1120205915 I S pmig           Client Partition Migration Started
    

The ping test during the LPM shows only 9 pings lost, due to an etherchannel failover (one of my ports was down on the destination server):

    # ping 10.14.33.223
    64 bytes from 10.14.33.223: icmp_seq=23 ttl=255 time=0.504 ms
    64 bytes from 10.14.33.223: icmp_seq=31 ttl=255 time=0.607 ms
    

    Using the command line

I'm moving the partition back using the HMC command line interface; check the manpage for all the details. Here are the details for the vnic_mappings attribute: slot_num/ded/[vios_lpar_name]/[vios_lpar_id]/[adapter_id]/[physical_port_id]/[capacity]:

    • Validate:
    • # migrlpar -o v -m blade-8286-41A-21AFFFF -t  runner-8286-41A-21AEEEE  -p 72nim1 -i 'vnic_mappings="2/ded/vios1/1/1/2/2,3/ded/vios2/2/1/3/2"'
      
      Warnings:
      HSCLA291 The selected partition may have an open virtual terminal session.  The management console will force termination of the partition's open virtual terminal session when the migration has completed.
      
    • Migrate:
    • # migrlpar -o m -m blade-8286-41A-21AFFFF -t  runner-8286-41A-21AEEEE  -p 72nim1 -i 'vnic_mappings="2/ded/vios1/1/1/2/2,3/ded/vios2/2/1/3/2"'
      

    Port Labelling

One very annoying thing when using LPM with vNIC is that you have to do the mapping of your vNICs each time you are moving. The default choices are never ok: the GUI will always show you the first port of the first adapter and you will have to do that job by yourself. Even worse, with the command line the vnic_mappings can give you some headaches :-). Fortunately there is a feature called port labelling. You can put a label on each SRIOV physical port on all your machines. My advice is to tag the ports that are serving the same network and the same vlan with the same label on all your machines. During the mobility operation, if labels are matching between the two machines, the adapter/port combination matching the label will be automatically chosen for the mobility and you will have nothing to map on your own. Super useful. The outputs below show you how to label your SRIOV ports:

    label1
    label2

    # chhwres -m s00ka9942077-8286-41A-21C9F5V -r sriov --rsubtype physport -o s -a "adapter_id=1,phys_port_id=3,phys_port_label=adapter1port3"
    # chhwres -m s00ka9942077-8286-41A-21C9F5V -r sriov --rsubtype physport -o s -a "adapter_id=1,phys_port_id=2,phys_port_label=adapter1port2"
    
    # lshwres -m s00ka9942077-8286-41A-21C9F5V -r sriov --rsubtype physport --level eth -F adapter_id,phys_port_label
    1,adapter1port2
    1,adapter1port3
    

    At the validation time source and destination ports will automatically be matched:

    labelautochoose

    What about performance

One of the main reasons I'm looking at SRIOV vNIC adapters is performance. As all of our design is based on the fact that we need to move all of our virtual machines from one host to another, we need a solution allowing both mobility and performance. If you have tried to run a TSM server in a virtualized environment you'll probably understand what I mean about performance and virtualization. In the case of TSM you need a lot of network bandwidth. My current customer and my previous one tried to do that using Shared Ethernet Adapters, and of course this solution did not work because a classic Virtual Ethernet Adapter is not able to provide enough bandwidth for a single Virtual I/O client. I'm not an expert in network performance but the results you will see below are pretty obvious to understand and will show you the power of vNIC and SRIOV (I know some optimization can be done on the SEA side but it's just a super simple test).

    Methodology

I will try here to compare a classic Virtual Ethernet Adapter with a vNIC in the same configuration; both environments are the same, using the same machines, the same switches and so on:

• Two machines are used to do the test. In the vNIC case both are using a single vNIC backed by a 10Gb adapter; in the Virtual Ethernet Adapter case both are backed by a SEA built on top of a 10Gb adapter.
    • The two machines are running on two different s814.
    • Entitlement and memory are the same for source and destination machines.
    • In the case of vNIC the capacity of the VF is set at 100% and the physical port of the SRIOV adapter is dedicated to the vNIC.
• In the case of the Virtual Ethernet Adapter the SEA is dedicated to the test virtual machine.
    • In both cases a MTU of 1500 is utilized.
• The tool used for the performance test is iperf (MTU 1500, Window Size 64K, and 10 TCP threads); see the example invocation just after this list.
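For reference, the iperf invocations for this kind of test probably look like the following sketch: the first command is run on the server side, the second on the client side (the address is the test machine used earlier in this post; the exact flags used during my runs are an assumption matching the parameters listed above):

# iperf -s -w 64k
# iperf -c 10.14.33.223 -w 64k -P 10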

    SEA test for reference only

    • iperf server:
    • seaserver1

    • iperf client:
    • seacli1

    vNIC SRIOV test

    We are here running the exact same test:

    • iperf server:
    • iperf_vnic_client2

    • iperf client:
    • iperf_vnic_client

By using a vNIC I get 300% of the bandwidth I get with a Virtual Ethernet Adapter. Just awesome ;-), with no tuning (out of the box configuration). Nothing more to add about it: it's pretty obvious that the usage of vNIC for performance will be a must.

    Conclusion

Are SRIOV vNICs the end of the SEAs ? Maybe, but not yet ! For some cases like performance and QoS it will be very useful and adopted (I'm pretty sure I will use that for my current customer to virtualize the TSM servers). But today, in my opinion, SRIOV lacks a real redundancy feature at the adapter level. What I want is a heartbeat communication between the two SRIOV adapters. Having such a feature on an SRIOV adapter would finally convince customers to move from SEA to SRIOV vNIC. I know nothing about the future but I hope something like that will be available in the next few years. To sum up, SRIOV vNICs are powerful, easy to use, and simplify the configuration and management of your Power Servers. Please wait for the GA and try this new killer functionality. As always I hope it helps.

    Using Chef and cloud-init with PowerVC 1.2.2.2 | What’s new in version 1.2.2.2

I've been busy, very busy, and I apologize for that … almost two months since the last update on the blog, but I'm still alive and I love AIX more than ever ;-). There is no blog post about it but I've developed a tool called "lsseas" which can be useful to all PowerVM administrators (you can find the script on github at this address: https://github.com/chmod666org/lsseas). I'll not talk too much about it but I thought sharing the information with all my readers who are not following me on twitter was the best way to promote the tool. Have a look at it, submit your own changes on github, code and share !

This said, we can talk about this new blog post. PowerVC 1.2.2.2 was released a few months ago and there are a few things I wanted to talk about. The new version includes new features making the product more powerful than ever (export/import images, activation input, vscsi lun management). PowerVC is only building "empty" machines; it's a good start but we can do better. The activation engine can customize the virtual machines but it is limited and in my humble opinion not really usable for post-installation tasks. With the recent release of cloud-init and Chef for AIX, PowerVC can be utilized to build your machines from nothing … and finally get your applications running in minutes. Using cloud-init and Chef can help you make your infrastructure repeatable, "versionable" and testable; this is what we call infrastructure as code and it is damn powerful.

    A big thank you to Jay Kruemcke (@chromeaix), Philippe Hermes (@phhermes) and S.Tran (https://github.com/transt) , they gave me very useful help about the cloud-init support on AIX. Follow them on twitter !

    PowerVC 1.2.2.1 mandatory fixes

Before starting, please note that I strongly recommend having the latest ifixes installed on your Virtual I/O Servers. These ones are mandatory for PowerVC; install these ifixes no matter what:

    • On Virtual I/O Servers install IV66758m4c, rsctvios2:
    • # emgr -X -e /mnt/VIOS_2.2.3.4_IV66758m4c.150112.epkg.Z
      # emgr -l
      [..]
      ID  STATE LABEL      INSTALL TIME      UPDATED BY ABSTRACT
      === ===== ========== ================= ========== ======================================
      1    S    rsctvios2  03/03/15 12:13:42            RSCT fixes for VIOS
      2    S    IV66758m4c 03/03/15 12:16:04            Multiple PowerVC fixes VIOS 2.2.3.4
      3    S    IV67568s4a 03/03/15 14:12:45            man fails in VIOS shell
      [..]
      
    • Check you have the latest version of the Hardware Management Console (I strongly recommend v8.2.20 Service Pack 1):
    • hscroot@myhmc:~> lshmc -V
      "version= Version: 8
       Release: 8.2.0
       Service Pack: 1
      HMC Build level 20150216.1
      ","base_version=V8R8.2.0
      "
      

    Exporting and importing image from another PowerVC

The latest PowerVC version allows you to export and import images. It's a good thing ! Let's say that like me you have a few PowerVC hosts, on different SAN networks with different storage arrays: you probably do not want to create your images on each one, and you prefer to be sure to use the same image for each PowerVC. Just create one image and use the export/import feature to copy/move this image to a different storage array or PowerVC host:

• To do so, map your current image disk on the PowerVC host itself (in my case by using the SVC). You can't attach a volume used for an image directly from PowerVC, so you have to do it on the storage side by hand:
    • maptohost
      maptohost2

• On the PowerVC host, rescan the scsi bus and copy the whole newly discovered lun with a dd:
    • powervc_source# rescan-scsi-bus.sh
      [..]
      powervc_source# multipath -ll
      mpathe (3600507680c810010f800000000000097) dm-10 IBM,2145
      [..]
      powervc_source# dd if=/dev/mapper/mpathe of=/data/download/aix7100-03-04-cloudinit-chef-ohai bs=4M
      16384+0 records in
      16384+0 records out
      68719476736 bytes (69 GB) copied, 314.429 s, 219 MB/s                                         
      
• Map a new volume to the new PowerVC server and upload the newly created file on the new PowerVC server, then dd the file back to the new volume:
    • mapnewlun

      powervc_dest# scp /data/download/aix7100-03-04-cloudinit-chef-ohai new_powervc:/data/download
      aix7100-03-04-cloudinit-chef-ohai          100%   64GB  25.7MB/s   42:28.
      powervc_dest# dd if=/data/download/aix7100-03-04-cloudinit-chef-ohai of=/dev/mapper/mpathc bs=4M
      16384+0 records in
      16384+0 records out
      68719476736 bytes (69 GB) copied, 159.028 s, 432 MB/s
      
    • Unmap the volume from the new PowerVC after the dd operation, and import it with the PowerVC graphical interface.
• Manage the existing volume you just created (note that the current PowerVC code does not allow you to choose cloud-init as an activation engine even if it is working great):
    • manage_ex1
      manage_ex2

    • Import the image:
    • import1
      import2
      import3
      import4

    You can also use the command powervc-volume-image-import to import the new volume by using the command line instead of the graphical user interface. Here is an example with a Red Hat Enterprise Linux 6.4 image:

    powervc_source# dd if=/dev/hdisk4 of=/apps/images/rhel-6.4.raw bs=4M
15360+0 records in
    15360+0 records out
    powervc_dest# scp 10.255.248.38:/apps/images/rhel-6.4.raw .
    powervc_dest# dd if=/home/rhel-6.4.raw of=/dev/mapper/mpathe
    30720+0 records in
    30720+0 records out
    64424509440 bytes (64 GB) copied, 124.799 s, 516 MB/s
    powervc_dest# powervc-volume-image-import --name rhel64 --os rhel --volume volume_capture2 --activation-type ae
    Password:
    Image creation complete for image id: e3a4ece1-c0cd-4d44-b197-4bbbc2984a34
    

    Activation input (cloud-init and ae)

Instead of doing post-installation tasks by hand after the deployment of the machine, you can now use the activation input recently added to PowerVC. The activation input can be utilized to run any scripts you want, or even better things (such as cloud-config syntax) if you are using cloud-init instead of the old activation engine. You have to remember that cloud-init is not yet officially supported by PowerVC; for this reason I think most customers will still use the old activation engine. The latest activation engine version is also working with the activation input. In the examples below I'm of course using cloud-init :-). Don't worry, I'll detail later in this post how to install and use cloud-init on AIX:

• If you are using the activation engine please be sure to use the latest version. The current version of the activation engine in PowerVC 1.2.2.* is vmc-vsae-ext-2.4.5-1; the only way to be sure you are using this version is to check the size of /opt/ibm/ae/AS/vmc-sys-net/activate.py. The size of this file is 21127 bytes for the latest version. Check this before trying to do anything with the activation input. More information can be found here: Activation input documentation.
    • A simple shebang script can be used, on the example below this one is just writing a file, but it can be anything you want:
    • ai1

      # cat /tmp/activation_input
      Activation input was used on this server
      
• If you are using cloud-init you can directly put a cloud-config "script" in the activation input. The first line is always mandatory to tell the format of the activation input: if you forget to put this first line cloud-init cannot determine the format and the script will not be executed. Check the last point of this list for more information about activation input formats; a minimal example is also shown right after the list:
    • ai2

      # cat /tmp/activation_input
      cloud-config activation input
      
• There are additional fields called "server meta data key/value pairs"; just do not use them. They are used by images provided by IBM with customization of the activation engine. Forget about this, it is useless; use this field only if IBM tells you to do so.
• Valid cloud-init activation inputs can be found here: http://cloudinit.readthedocs.org/en/latest/topics/format.html. As you can see on the two examples above, shell scripts and the cloud-config format can be utilized, but you can also upload a gzip archive, or use a part handler format. Go to the url above for more information.
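To make the cloud-config format explicit, here is a minimal activation input sketch (the #cloud-config header is the mandatory first line mentioned above; the write_files module used here is the same one I use later in this post):

#cloud-config
write_files:
  - path: /tmp/activation_input
    content: |
      cloud-config activation input
    permissions: '0644'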

vscsi and mixed NPIV/vscsi machine creation

This is one of the major enhancements: PowerVC is now able to create and map vscsi disks, and even better, you can create mixed NPIV/vscsi machines. To do so, create storage connectivity groups for each technology you want to use. You can choose a different way to create disks for boot volumes and for data volumes. Here are three examples: full NPIV, full vscsi, and mixed vscsi (boot) and NPIV (data):

    connectivitygroup1
    connectivitygroup2
    connectivitygroup3

What is really cool about this new feature is that PowerVC can use existing mapped luns on the Virtual I/O Server. Please note that PowerVC will only use SAN backed devices and cannot use iSCSI or local disks (local disks can be used in the express version). You obviously have to do the zoning of your Virtual I/O Servers by yourself. Here is an example where I have 69 devices mapped to my Virtual I/O Server; you can see that PowerVC is using one of the existing devices for its deployment. This can be very useful if you have different teams working on the SAN and the system side: the storage guys will not change their habits and can still map you a bunch of luns on the Virtual I/O Server. This can be used as a transition if you did not succeed in convincing the guys from your storage team:

    $ lspv | wc -l
          69
    

    connectivitygroup_deploy1

    $ lspv | wc -l
          69
    $ lsmap -all -fmt :
    vhost1:U8202.E4D.845B2DV-V2-C28:0x00000009:vtopt0:Available:0x8100000000000000:/var/vio/VMLibrary/vopt_c1309be1ed244a5c91829e1a5dfd281c: :N/A:vtscsi1:Available:0x8200000000000000:hdisk66:U78AA.001.WZSKM6P-P1-C3-T1-W500507680C11021F-L41000000000000:false
    

    Please note that you still need to add fabrics and storage on PowerVC even if you have pre-mapped luns on your Virtual I/O Servers. This is mandatory for PowerVC image management and creation.

    Maintenance Mode

This last feature is probably the one I like the most. You can now put your hosts in maintenance mode; this means that when you put a host in maintenance mode all the virtual machines hosted on it are migrated with Live Partition Mobility (remember the migrlpar --all option, I'm pretty sure this option is utilized for the PowerVC maintenance mode). A host in maintenance mode is no longer available for new machine deployments nor for mobility operations. The host can then be shutdown, for instance for a firmware upgrade.
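As a side note, here is a sketch of what PowerVC probably runs under the hood when entering maintenance mode (the hostnames are hypothetical; the --all option migrates all the running partitions of the source host):

# migrlpar -o m -m source-host -t destination-host --all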

    • Select a host and click the “Enter maintenance mode button”:
    • maintenance1

    • Choose where you want to move virtual machines, or let PowerVC decide for you (packing or stripping placement policy):
    • maintenance2

    • The host is entering maintenance mode:
    • maintenance3

    • Once the host is in maintenance mode this one is ready to be shutdown:
    • maintenance4

    • Leave the maintenance mode when you are ready:
    • maintenance5

    An overview of Chef and cloud-init

With PowerVC you are now able to deploy new AIX virtual machines in a few minutes, but there is still some work to do. What about post-installation tasks ? I'm sure that most of you are using NIM post-install scripts for post-installation tasks. PowerVC does not use NIM, and even if you can run your own shell scripts after a PowerVC deployment, the goal of this tool is to automate a full installation … post-install included.

If the activation engine does the job of changing the hostname and ip address of the machine, it is pretty hard to customize it to do other tasks. Documentation is hard to find and I can assure you that it is not easy at all to customize and maintain. PowerVC Linux users are probably already aware of cloud-init. cloud-init is a tool (like the activation engine) in charge of the reconfiguration of your machine after its deployment; as the activation engine does today, cloud-init changes the hostname and the ip address of the machine, but it can do way more than that (create users, add ssh-keys, mount a filesystem, …). The good news is that cloud-init has been available on AIX for a few days now, and you can use it with PowerVC. Awesome \o/.

If cloud-init can do one part of this job, it can't do all of it and is not designed for that! It is not a configuration management tool: configurations are not centralized on a server, there is no way to create cookbooks, runbooks (or whatever you call them), you can't pull product sources from a git server; there are a lot of things missing. cloud-init is a light tool designed for a simple job. I recently (at work and in my spare time) played a lot with configuration management tools. I'm a huge fan of Saltstack, but unfortunately salt-minion (the Saltstack client) is not available on AIX… I had to find another tool. A few months ago Chef (by Opscode) announced the support of AIX and a release of chef-client for AIX; I decided to give it a try and I can assure you that this is damn powerful, let me explain this further.

Instead of creating shell scripts to do your post installation, Chef allows you to create cookbooks. Cookbooks are composed of recipes, and each recipe does a task, for instance installing an Oracle client, creating the home directory of the root user and its profile file, or enabling and disabling services on the system. The recipes are coded in the Chef language, and you can directly put Ruby code inside a recipe. Chef recipes are idempotent: it means that if something has already been done, it will not be done again. The advantage of using a solution like this is that you don't have to maintain shell code and shell scripts, which are difficult to change and rewrite. Your infrastructure is repeatable and changeable in minutes (after Chef is installed you can for instance tell it to change /etc/resolv.conf on all your Websphere servers). This is called "infrastructure as code". Give it a try and you'll see that the first thing you'll think will be "waaaaaaaaaaaaaooooooooooo".

    Trying to explain how PowerVC, cloud-init and Chef can work together is not really easy, a nice diagram is probably better than a long text:

    chef

1. You have built an AIX virtual machine. On this machine cloud-init is installed, Chef client 12 is installed. cloud-init is configured to register the chef-client on the chef-server, and to run a cookbook for a specific role. This server has been captured with PowerVC and is now ready to be deployed.
2. Virtual machines are created with PowerVC.
3. When a machine is built, cloud-init runs on the first boot. The ip address and the hostname of this machine are changed with the values provided in PowerVC. cloud-init creates the chef-client configuration (client.rb, validation.pem). Finally chef-client is called.
4. chef-client registers on the chef-server. The machine is now known by the chef-server.
5. chef-client resolves and downloads the cookbooks for a specific role. Cookbooks and recipes are executed on the machine. After the cookbooks execution the machine is ready and configured.
6. The administrator creates and uploads cookbooks and recipes from his knife workstation (knife is the tool used to interact with the chef-server; it can be hosted anywhere you want: your laptop, a server …).

    In a few step here is what you need to do to use PowerVC, cloud-init, and Chef together:

    1. Create a virtual machine with PowerVC.
    2. Download cloud-init, and install cloud-init in this virtual machine.
    3. Download chef-client, and install chef-client in this virtual machine.
4. Configure cloud-init: modify /opt/freeware/etc/cloud/cloud.cfg. In this file put the Chef configuration of the cc_chef cloud-init module.
5. Create the mandatory files, such as the /etc/chef directory; put your ohai plugins in the /etc/chef/ohai_plugins directory.
    6. Stop the virtual machine.
    7. Capture the virtual machine with PowerVC.
    8. Obviously as prerequisites a chef-server is up and running, cookbooks, recipes, roles, environments are ok in this chef-server.

    cloud-init installation

cloud-init is now available on AIX, but you have to build the rpm by yourself. Sources can be found on github at this address: https://github.com/transt/cloud-init-0.7.5. There are a lot of prerequisites; most of them can be found on the github page, some of them on the famous perzl site. Download and install these prerequisites, it is mandatory (links to download the prerequisites are on the github page; the zip file containing cloud-init can be downloaded here: https://github.com/transt/cloud-init-0.7.5/archive/master.zip).

    # rpm -ivh --nodeps gettext-0.17-8.aix6.1.ppc.rpm
    [..]
    gettext                     ##################################################
    # for rpm in bzip2-1.0.6-2.aix6.1.ppc.rpm db-4.8.24-4.aix6.1.ppc.rpm expat-2.1.0-1.aix6.1.ppc.rpm gmp-5.1.3-1.aix6.1.ppc.rpm libffi-3.0.11-1.aix6.1.ppc.rpm openssl-1.0.1g-1.aix6.1.ppc.rpm zlib-1.2.5-6.aix6.1.ppc.rpm gdbm-1.10-1.aix6.1.ppc.rpm libiconv-1.14-1.aix6.1.ppc.rpm bash-4.2-9.aix6.1.ppc.rpm info-5.0-2.aix6.1.ppc.rpm readline-6.2-3.aix6.1.ppc.rpm ncurses-5.9-3.aix6.1.ppc.rpm sqlite-3.7.15.2-2.aix6.1.ppc.rpm python-2.7.6-1.aix6.1.ppc.rpm python-2.7.6-1.aix6.1.ppc.rpm python-devel-2.7.6-1.aix6.1.ppc.rpm python-xml-0.8.4-1.aix6.1.ppc.rpm python-boto-2.34.0-1.aix6.1.noarch.rpm python-argparse-1.2.1-1.aix6.1.noarch.rpm python-cheetah-2.4.4-2.aix6.1.ppc.rpm python-configobj-5.0.5-1.aix6.1.noarch.rpm python-jsonpointer-1.0.c1ec3df-1.aix6.1.noarch.rpm python-jsonpatch-1.8-1.aix6.1.noarch.rpm python-oauth-1.0.1-1.aix6.1.noarch.rpm python-pyserial-2.7-1.aix6.1.ppc.rpm python-prettytable-0.7.2-1.aix6.1.noarch.rpm python-requests-2.4.3-1.aix6.1.noarch.rpm libyaml-0.1.4-1.aix6.1.ppc.rpm python-setuptools-0.9.8-2.aix6.1.noarch.rpm fdupes-1.51-1.aix5.1.ppc.rpm ; do rpm -ivh $rpm ;done
    [..]
    python-oauth                ##################################################
    python-pyserial             ##################################################
    python-prettytable          ##################################################
    python-requests             ##################################################
    libyaml                     ##################################################
    

Build the rpm by following the commands below. You can reuse this rpm on every AIX on which you want to install the cloud-init package:

    # jar -xvf cloud-init-0.7.5-master.zip
    inflated: cloud-init-0.7.5-master/upstart/cloud-log-shutdown.conf
    # mv cloud-init-0.7.5-master  cloud-init-0.7.5
    # chmod -Rf +x cloud-init-0.7.5/bin
    # chmod -Rf +x cloud-init-0.7.5/tools
    # cp cloud-init-0.7.5/packages/aix/cloud-init.spec.in /opt/freeware/src/packages/SPECS/cloud-init.spec
    # tar -cvf cloud-init-0.7.5.tar cloud-init-0.7.5
    [..]
    a cloud-init-0.7.5/upstart/cloud-init.conf 1 blocks
    a cloud-init-0.7.5/upstart/cloud-log-shutdown.conf 2 blocks
    # gzip cloud-init-0.7.5.tar
    # cp cloud-init-0.7.5.tar.gz /opt/freeware/src/packages/SOURCES/cloud-init-0.7.5.tar.gz
    # rpm -v -bb /opt/freeware/src/packages/SPECS/cloud-init.spec
    [..]
    Requires: cloud-init = 0.7.5
    Wrote: /opt/freeware/src/packages/RPMS/ppc/cloud-init-0.7.5-4.1.aix7.1.ppc.rpm
    Wrote: /opt/freeware/src/packages/RPMS/ppc/cloud-init-doc-0.7.5-4.1.aix7.1.ppc.rpm
    Wrote: /opt/freeware/src/packages/RPMS/ppc/cloud-init-test-0.7.5-4.1.aix7.1.ppc.rpm
    

    Finally install the rpm:

    # rpm -ivh /opt/freeware/src/packages/RPMS/ppc/cloud-init-0.7.5-4.1.aix7.1.ppc.rpm
    cloud-init                  ##################################################
    # rpm -qa | grep cloud-init
    cloud-init-0.7.5-4.1
    

    cloud-init configuration

By installing the cloud-init package on AIX some entries have been added to /etc/rc.d/rc2.d:

# ls -l /etc/rc.d/rc2.d | grep cloud
    lrwxrwxrwx    1 root     system           33 Apr 26 15:13 S01cloud-init-local -> /etc/rc.d/init.d/cloud-init-local
    lrwxrwxrwx    1 root     system           27 Apr 26 15:13 S02cloud-init -> /etc/rc.d/init.d/cloud-init
    lrwxrwxrwx    1 root     system           29 Apr 26 15:13 S03cloud-config -> /etc/rc.d/init.d/cloud-config
    lrwxrwxrwx    1 root     system           28 Apr 26 15:13 S04cloud-final -> /etc/rc.d/init.d/cloud-final
    

The default configuration file is located in /opt/freeware/etc/cloud/cloud.cfg. This configuration file is split in three parts. The first one, called cloud_init_modules, tells cloud-init to run specific modules when the cloud-init script is started at boot time, for instance setting the hostname of the machine (set_hostname), resetting the rmc (reset_rmc) and so on. In our case this part will automatically change the hostname and the ip address of the machine with the values provided in PowerVC at deployment time. This cloud_init_modules part is split in two: the local one and the normal one. The local one uses information provided by the cdrom built by PowerVC at the time of the deployment; this cdrom provides the ip and hostname of the machine, the activation input script, and the nameservers information. The datasource_list stanza tells cloud-init to use the "ConfigDrive" (in our case the virtual cdrom) to get the ip and hostname needed by some cloud_init_modules. The second part, called cloud_config_modules, tells cloud-init to run specific modules when the cloud-config script is called; at this stage the minimal requirements have already been configured by the previous cloud_init_modules stage (dns, ip address and hostname are ok). We will configure and set up the chef-client in this stage. The last part, called cloud_final_modules, tells cloud-init to run specific modules when the cloud-final script is called. At this step you can print a final message, reboot the host and so on (in my case a host reboot is needed by my install_sddpcm Chef recipe). Here is an overview of the cloud.cfg configuration file:

    cloud-init

    • The datasource_list stanza tells cloud-init to use the virtual cdrom as a source of information:
    • datasource_list: ['ConfigDrive']
      
    • cloud_init_module:
    • cloud_init_modules:
      [..]
       - set-multipath-hcheck-interval
       - update-bootlist
       - reset-rmc
       - set_hostname
       - update_hostname
       - update_etc_host
      
    • cloud_config_module:
    • cloud_config_modules:
      [..]
        - mounts
        - chef
        - runcmd
      
    • cloud_final_module:
    • cloud_final_modules:
        [..]
        - final-message
      

If you do not want to use Chef at all you can modify the cloud.cfg file to fit your needs (running homemade scripts, mounting filesystems …), but my goal here is to do the job with Chef. We will try to do the minimal job with cloud-init, so the goal here is to configure cloud-init to set up the chef-client. Anyway, I also wanted to play with cloud-init and see its capabilities. The full documentation of cloud-init can be found here: https://cloudinit.readthedocs.org/en/latest/. Here are a few things I just added (the Chef part will be detailed later), but keep in mind you can use cloud-init without Chef if you want (set up your ssh keys, mount or create filesystems, create files and so on):

    write_files:
      - path: /tmp/cloud-init-started
        content: |
          cloud-init was started on this server
        permissions: '0755'
      - path: /var/log/cloud-init-sub.log
        content: |
          starting chef logging
        permissions: '0755'
    
    final_message: "The system is up, cloud-init is finished"
    

EDIT: The IBM developer of cloud-init for AIX just sent me a mail yesterday about the new support of cc_power_state. As I need to reboot my host at the end of the build I can, with the latest version of cloud-init for AIX, use the power_state stanza. I'm using poweroff as an example here; use mode: reboot … for a reboot:

    power_state:
     delay: "+5"
     mode: poweroff
     message: cloud-init mandatory reboot for sddpcm
     timeout: 5
    

    power_state1

    Rerun cloud-init for testing purpose

You probably want to test your cloud-init configuration before or after capturing the machine. When cloud-init is launched by the startup script, a check is performed to be sure that cloud-init has not already been run. Some "semaphore" files are created in /opt/freeware/var/lib/cloud/instance/sem to tell which modules have already been executed. If you want to re-run cloud-init by hand without having to rebuild a machine, just remove these files from this directory:

    # rm -rf /opt/freeware/var/lib/cloud/instance/sem
    

    Let’s say we just want to re-run the Chef part:

    # rm /opt/freeware/var/lib/cloud/instance/sem/config_chef
    

    To sum up here is what I want to do with cloud-init:

    1. Use the cdrom as datasource.
    2. Set the hostname and ip.
    3. Setup my chef-client.
    4. Print a final message.
    5. Do a mandatory reboot at the end of the installation.

    chef-client installation and configuration

    Before modifying the cloud.cfg file to tell cloud-init to setup the Chef client we first have to download and install the chef-client on the AIX host we will capture later. Download the Chef client bff file at this address: https://opscode-omnibus-packages.s3.amazonaws.com/aix/6.1/powerpc/chef-12.1.2-1.powerpc.bff and install it:

    # installp -aXYgd . chef
    [..]
    +-----------------------------------------------------------------------------+
                             Installing Software...
    +-----------------------------------------------------------------------------+
    
    installp: APPLYING software for:
            chef 12.1.2.1
    [..]
    Installation Summary
    --------------------
    Name                        Level           Part        Event       Result
    -------------------------------------------------------------------------------
    chef                        12.1.2.1        USR         APPLY       SUCCESS
    chef                        12.1.2.1        ROOT        APPLY       SUCCESS
    # lslpp -l | grep -i chef
      chef                      12.1.2.1    C     F    The full stack of chef
    # which chef-client
    /usr/bin/chef-client
    

The chef-client configuration file created by cloud-init will be placed in the /etc/chef directory. By default the /etc/chef directory does not exist, so you'll have to create it:

    # mkdir -p /etc/chef
    # mkdir -p /etc/chef/ohai_plugins
    

If -like me- you are using custom ohai plugins, you have two things to do. cloud-init uses template files to build the configuration files needed by Chef. These template files are located in /opt/freeware/etc/cloud/templates. Modify the chef_client.rb.tmpl file to add a configuration line for the ohai plugin_path, and copy your ohai plugins in /etc/chef/ohai_plugins:

    # tail -1 /opt/freeware/etc/cloud/templates/chef_client.rb.tmpl
    Ohai::Config[:plugin_path] << '/etc/chef/ohai_plugins'
    # ls /etc/chef/ohai_plugins
    aixcustom.rb
    

Add the chef stanza in /opt/freeware/etc/cloud/cloud.cfg. After this step the image is ready to be captured (check the ohai plugin configuration first if you need one); the chef-client is already installed. Set the force_install stanza to false; put the server_url and the validation_name of your Chef server, the organization, and finally the validation RSA private key provided by your Chef server (in the example below the key has been truncated on purpose; server_url and validation_name have also been replaced). As you can see below, I tell Chef here to run the aix7 role; we'll see later how to create a cookbook and recipes:

    chef:
      force_install: false
      server_url: "https://chefserver.lab.chmod666.org/organizations/chmod666"
      validation_name: "chmod666-validator"
      validation_key: |
        -----BEGIN RSA PRIVATE KEY-----
        MIIEpQIBAAKCAQEApj/Qqb+zppWZP+G3e/OA/2FXukNXskV8Z7ygEI9027XC3Jg8
        [..]
        XCEHzpaBXQbQyLshS4wAIVGxnPtyqXkdDIN5bJwIgLaMTLRSTtjH/WY=
        -----END RSA PRIVATE KEY-----
      run_list:
        - "role[aix7]"
    
    runcmd:
      - /usr/bin/chef-client
    

    EDIT: With the latest build of cloud-init for AIX there is no need to run chef-client with the runcmd stanza. Just add exec: 1 in the chef stanza.
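In that case the chef stanza simply becomes something like this (a sketch based on the stanza above, with the runcmd part removed):

chef:
  force_install: false
  exec: 1
  server_url: "https://chefserver.lab.chmod666.org/organizations/chmod666"
  validation_name: "chmod666-validator"
  [..]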

To sum up: cloud-init is installed; cloud-init is configured to run a few actions at boot time, but mainly to configure the chef-client and run it with a specific role; the chef-client is installed. The machine can now be shutdown and is ready to be deployed. At deployment time cloud-init will do the job of changing the ip address and the hostname, and will configure Chef. Chef will retrieve the cookbooks and recipes and run them on the machine.

    If you want to use custom ohai plugins read the ohai part before capturing your machine.

    capture
    capture2

    Use chef-solo for testing

You will have to create your own recipes. My advice is to use chef-solo to debug them. The chef-solo binary is provided with the chef-client package; it can be used without a Chef server to run and execute Chef recipes:

    • Create a test recipe:
    • # mkdir -p ~/chef/cookbooks/testing/recipes
      # cat  ~/chef/cookbooks/testing/recipes/test.rb
      file "/tmp/helloworld.txt" do
        owner "root"
        group "system"
        mode "0755"
        action :create
        content "Hello world !"
      end
      
• Create a run_list with your test recipe:
    • # cat ~/chef/node.json
      {
        "run_list": [ "recipe[testing::test]" ]
      }
      
• Create the attribute file for the chef-solo execution:
    • # cat  ~/chef/solo.rb
      file_cache_path "/root/chef"
      cookbook_path "/root/chef/cookbooks"
      json_attribs "/root/chef/node.json"
      
    • Run chef-solo:
    • # chef-solo -c /root/chef/solo.rb
      

    chef-solo

    cookbooks and recipes example on AIX

Let's say you have written all your recipes using chef-solo on a test server. On the Chef server you now want to put all these recipes in a cookbook. From the workstation, create a cookbook:

# knife cookbook create aix7
** Creating cookbook aix7 in /home/kadmin/.chef/cookbooks
    ** Creating README for cookbook: aix7
    ** Creating CHANGELOG for cookbook: aix7
    ** Creating metadata for cookbook: aix7
    

In the .chef directory you can now find a directory for the aix7 cookbook. In this one you will find a directory for each type of Chef object: recipes, templates, files, and so on. This place is called the chef-repo. I strongly recommend using this place as a git repository (by doing this you will save all the modifications of any object in the cookbook).

    # ls /home/kadmin/.chef/cookbooks/aix7/recipes
    create_fs_rootvg.rb  create_profile_root.rb  create_user_group.rb  delete_group.rb  delete_user.rb  dns.rb  install_sddpcm.rb  install_ssh.rb  ntp.rb  ohai_custom.rb  test_ohai.rb
    # ls /home/kadmin/.chef/cookbooks/aix7/templates/default
    aixcustom.rb.erb  ntp.conf.erb  ohai_test.erb  resolv.conf.erb
    

    Recipes

    Here are a few examples of my own recipes:

• install_ssh: the recipe mounts an nfs filesystem (from the nim server). The nim_server is an attribute coming from the role default attributes (we will check that later); the oslevel is an ohai attribute coming from a custom ohai plugin (we will check that later too). The openssh.license and openssh.base filesets are installed, the filesystem is unmounted, and finally the ssh service is started:
    • # creating temporary directory
      directory "/var/mnttmp" do
        action :create
      end
# mounting the nim server
      mount "/var/mnttmp" do
        device "#{node[:nim_server]}:/export/nim/lppsource/#{node['aixcustom']['oslevel']}"
        fstype "nfs"
        action :mount
      end
      # installing ssh packages (openssh.license, openssh.base)
      bff_package "openssh.license" do
        source "/var/mnttmp"
        action :install
      end
      bff_package "openssh.base" do
        source "/var/mnttmp"
        action :install
      end
      # umount the /var/mnttmp directory
      mount "/var/mnttmp" do
        fstype "nfs"
        action :umount
      end
      # deleting temporary directory
      directory "/var/mnttmp" do
        action :delete
      end
      # start and enable ssh service
      service "sshd" do
        action :start
      end
      
• install_sddpcm: the recipe mounts an nfs filesystem (from the nim server). The nim_server is an attribute coming from the role default attributes (we will check that later); the platform_version is coming from ohai. The devices.fcp.disk.ibm.mpio and devices.sddpcm.71.rte filesets are installed, and the filesystem is unmounted:
    • # creating temporary directory
      directory "/var/mnttmp" do
        action :create
      end
# mounting the nim server
      mount "/var/mnttmp" do
        device "#{node[:nim_server]}:/export/nim/lpp_source/#{node['platform_version']}/sddpcm-71-2660"
        fstype "nfs"
        action :mount
      end
      # installing sddpcm packages (devices.fcp.disk.ibm.mpio, devices.sddpcm.71.rte)
      bff_package "devices.fcp.disk.ibm.mpio" do
        source "/var/mnttmp"
        action :install
      end
      bff_package "devices.sddpcm.71.rte" do
        source "/var/mnttmp"
        action :install
      end
      # umount the /var/mnttmp directory
      mount "/var/mnttmp" do
        fstype "nfs"
        action :umount
      end
      # deleting temporary directory
      directory "/var/mnttmp" do
        action :delete
      end
      
• create_fs_rootvg: some filesystems are extended, and an /apps filesystem is created and mounted. Please note that there are no cookbooks for the AIX lvm at the moment and you have to use the execute resource here, which is the only one not to be idempotent (note the not_if guard used to avoid re-creating the filesystem):
    • execute "hd3" do
        command "chfs -a size=1024M /tmp"
      end
      execute "hd9var" do
        command "chfs -a size=512M /var"
      end
      execute "/apps" do
        command "crfs -v jfs2 -g rootvg -m /apps -Ay -a size=1M ; chlv -n appslv fslv00"
        not_if { ::File.exists?("/dev/appslv")}
      end
      mount "/apps" do
        device "/dev/appslv"
        fstype "jfs2"
      end
      
    • ntp, ntp.conf.erb located in the template directory is copied to /etc/ntp.conf:
    • template "/etc/ntp.conf" do
        source "ntp.conf.erb"
      end
      
    • dns, resolv.conf.erb located in the template directory is copied to /etc/resolv.conf:
    • template "/etc/resolv.conf" do
        source "resolv.conf.erb"
      end
      
• create_user_group: a user for TADDM is created:
    • user "taddmux" do
        gid 'sys'
        uid 421
        home '/home/taddmux'
        comment 'user TADDM connect SSH'
      end
      

    Templates

In the recipes above, templates are used for the ntp and dns configuration. Template files are files in which some strings are replaced by Chef attributes found in the roles, the environments, in ohai, or even directly in the recipes. Here are the two files I used for dns and ntp:

• ntp.conf.erb: the ntpserver1,2,3 attributes are found in the environments (let's say I have siteA and siteB and the ntp servers are different for each site: I can define an environment for siteA and one for siteB):
    • [..]
      server <%= node['ntpserver1'] %>
      server <%= node['ntpserver2'] %>
      server <%= node['ntpserver3'] %>
      driftfile /etc/ntp.drift
      tracefile /etc/ntp.trace
      
• resolv.conf.erb: nameserver1,2,3 and namesearch are found in the environments:
    • search  <%= node['namesearch'] %>
      nameserver      <%= node['nameserver1'] %>
      nameserver      <%= node['nameserver2'] %>
      nameserver      <%= node['nameserver3'] %>
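
    Note that attributes do not have to be read through node[...] inside the ERB file: the template resource can also hand them over explicitly. A small sketch (my own illustration, not taken from my cookbook) passing the NTP server list as a template variable:

    template "/etc/ntp.conf" do
      source "ntp.conf.erb"
      variables(ntpservers: [node['ntpserver1'], node['ntpserver2'], node['ntpserver3']])
    end
    # the matching erb then iterates over @ntpservers:
    #   <% @ntpservers.each do |ntp| -%>
    #   server <%= ntp %>
    #   <% end -%>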
      

    Role assignment

    Chef roles can be used to run different Chef recipes depending on the type of server you want to post-install. You can for instance create a role for web servers in which a WebSphere recipe is executed, and a role for database servers in which an Oracle recipe is executed. In my case, and for the simplicity of this example, I just created one role called aix7:

    # knife role create aix7
    Created role[aix7]
    # knife role edit aix7
    {
      "name": "aix7",
      "description": "",
      "json_class": "Chef::Role",
      "default_attributes": {
        "nim_server": "nimsrv01"
      },
      "override_attributes": {
    
      },
      "chef_type": "role",
      "run_list": [
        "recipe[aix7::ohai_custom]",
        "recipe[aix7::create_fs_rootvg]",
        "recipe[aix7::create_profile_root]",
        "recipe[aix7::test_ohai]",
        "recipe[aix7::install_ssh]",
        "recipe[aix7::install_sddpcm]",
        "recipe[aix7::ntp]",
        "recipe[aix7::dns]"
      ],
      "env_run_lists": {
    
      }
    }
    

    What we can see here are two important things. First, we created an attribute specific to this role called nim_server: in all recipes and templates, node['nim_server'] will be replaced by nimsrv01 (remember the recipes above, and remember we told chef-client to run the aix7 role). Second, we created a run_list stating that the recipes coming from the aix7 cookbook (install_ssh, install_sddpcm, ...) will be executed on a server calling chef-client with the aix7 role.
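
    For an already-registered node the role can also be attached by hand with knife (cloud-init normally does this for you through /etc/chef/firstboot.json); the node name below is the one we will meet later in this post:

    # knife node run_list add a8b8fe0d-34c1-4bdb-821c-777fca1c391f 'role[aix7]'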

    Environments

    Chef environments can be used to separate your environments, for instance production, development, backup, or, in my example, sites. In my company, depending on the site where you are building a machine, name servers and NTP servers differ. Remember that we are using template files for the resolv.conf and ntp.conf files:

    knife environment show siteA
    chef_type:           environment
    cookbook_versions:
    default_attributes:
      namesearch:  lab.chmod666.org chmod666.org
      nameserver1: 10.10.10.10
      nameserver2: 10.10.10.11
      nameserver3: 10.10.10.12
      ntpserver1:  11.10.10.10
      ntpserver2:  11.10.10.11
      ntpserver3:  11.10.10.12
    description:         production site
    json_class:          Chef::Environment
    name:                siteA
    override_attributes:
    

    When chef-client is called with the -E siteA option, node['namesearch'] will be replaced by "lab.chmod666.org chmod666.org" in all recipes and template files.
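
    In practice this just means passing the environment when the client runs, or pinning the node to it once and for all (the second command assumes your knife version ships the node environment subcommand):

    # chef-client -E siteA
    # knife node environment set a8b8fe0d-34c1-4bdb-821c-777fca1c391f siteA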

    A Chef run

    When you are happy with your cookbook, upload it to the Chef server:

    # knife cookbook upload aix7
    Uploading aix7           [0.1.0]
    Uploaded 1 cookbook.
    

    When chef-client is not executed by cloud-init you can run it by hand. I thought it was interesting to put an output of chef-client here: you can see that files are modified, packages are installed, and so on ;-) :

    [screenshots: two chef-client runs (chef-clientrun1, chef-clientrun2) showing files modified and packages installed]
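
    A trick I find handy when testing: chef-client accepts an override run_list, so you can replay just the role, or a single recipe, without changing what is saved on the node:

    # chef-client -o 'role[aix7]'
    # chef-client -o 'recipe[aix7::ntp]'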

    Ohai

    ohai is a command delivered with chef-client. Its purpose is to gather information about the machine on which chef-client is executed. Each time chef-client runs, a call to ohai is launched. By default ohai gathers a lot of information, such as the IP address of the machine, the LPAR id, the LPAR name, and so on. A call to ohai returns a JSON tree, and each element of this JSON tree can be accessed in Chef recipes or in Chef templates. For instance, to get the LPAR name, node['virtualization']['lpar_name'] can be used. Here is an excerpt of a single call to ohai:

    # ohai | more
      "ipaddress": "10.244.248.56",
      "macaddress": "FA:A3:6A:5C:82:20",
      "os": "aix",
      "os_version": "1",
      "platform": "aix",
      "platform_version": "7.1",
      "platform_family": "aix",
      "uptime_seconds": 14165,
      "uptime": "3 hours 56 minutes 05 seconds",
      "virtualization": {
        "lpar_no": "7",
        "lpar_name": "s00va9940866-ada56a6e-0000004d"
      },
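
    Since the whole tree ends up in the node object, any of these values can be consumed directly. A minimal sketch (my own illustration) dropping the LPAR name into a file from a recipe:

    file "/etc/lpar_name" do
      content "#{node['virtualization']['lpar_name']}\n"
    end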
    

    At the time of writing this blog post there are, in my humble opinion, some attributes missing from ohai. For instance, if you want to install a specific package from an lpp_source, you first need to know your current oslevel (I mean the output of oslevel -s). Fortunately ohai can be extended with custom plugins, letting you add your own attributes, whatever they are (a sketch consuming such a custom attribute follows the list below).

    • In ohai 7 (the one shipped with chef-client 12) an attribute needs to be added to the Chef client.rb configuration to tell where the ohai plugins are located. Remember that chef-client is configured by cloud-init; to do so you need to modify the template used by cloud-init to build the client.rb file. This one is located in /opt/freeware/etc/cloud/templates:
    • # tail -1 /opt/freeware/etc/cloud/templates/chef_client.rb.tmpl
      Ohai::Config[:plugin_path] << '/etc/chef/ohai_plugins'
      # mkdir -p /etc/chef/ohai_plugins
      
    • After this modification the machine is ready to be captured.
    • You want your custom ohai plugins to be uploaded to the chef-client machine at chef-client execution time. Here is an example of a custom ohai plugin used as a template. This one gathers the oslevel (oslevel -s), the node name, the partition name and the memory mode of the machine. These attributes are gathered with the lparstat command:
    • Ohai.plugin(:Aixcustom) do
        provides "aixcustom"
      
        collect_data(:aix) do
          aixcustom Mash.new
      
          oslevel = shell_out("oslevel -s").stdout.split($/)[0]
          nodename = shell_out("lparstat -i | awk -F ':' '$1 ~ \"Node Name\" {print $2}'").stdout.split($/)[0]
          partitionname = shell_out("lparstat -i | awk -F ':' '$1 ~ \"Partition Name\" {print $2}'").stdout.split($/)[0]
          memorymode = shell_out("lparstat -i | awk -F ':' '$1 ~ \"Memory Mode\" {print $2}'").stdout.split($/)[0]
      
          aixcustom[:oslevel] = oslevel
          aixcustom[:nodename] = nodename
          aixcustom[:partitionname] = partitionname
          aixcustom[:memorymode] = memorymode
        end
      end
      
    • The custom ohai plugin is written. Remember that you want this one to be uploaded on the machine at chef-client execution time, and the new attributes created by this plugin need to be added to ohai. Here is a recipe uploading the custom ohai plugin; at the time the plugin is uploaded, ohai is reloaded and the new attributes can be utilized in any further templates (for recipes you have no other choice than putting the custom ohai plugin in the directory before the capture):
    • cat ~/.chef/cookbooks/aix7/recipes/ohai_custom.rb
      ohai "reload" do
        action :reload
      end
      
      template "/etc/chef/ohai_plugins/aixcustom.rb" do
        notifies :reload, "ohai[reload]", :immediately
      end
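
    Once reloaded, the custom attributes behave like any other ohai attribute. Here is the sketch announced above: mounting the lpp_source matching the exact oslevel of the machine (the directory layout on the NIM server is an assumption of mine):

    mount "/var/mnttmp" do
      # the lpp_source path below is hypothetical, adapt it to your NIM layout
      device "#{node[:nim_server]}:/export/nim/lpp_source/#{node['aixcustom']['oslevel']}"
      fstype "nfs"
      action :mount
    end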
      

    Chef server, Chef workstation, knife

    I'll not detail here how to set up a Chef server, and how to configure your Chef workstation (knife); there are plenty of good tutorials about that on the internet. Please just note that you need to use Chef server 12 if you are using Chef client 12.

    I had some difficulties during the configuration; here are a few tricks to know (a knife.rb sketch follows this short list):

    • cacert can be found here: /opt/opscode/embedded/ssl/cert/cacert.pem
    • The Chef validation key can be found in /etc/chef/chef-validator.pem
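
    Both paths end up in the knife.rb of your workstation. A minimal sketch, assuming a hypothetical user (chefadmin) and organization (myorg); adapt the names to your setup:

    node_name        'chefadmin'
    client_key       '/root/.chef/chefadmin.pem'
    validation_key   '/etc/chef/chef-validator.pem'
    chef_server_url  'https://chefserver:443/organizations/myorg'
    ssl_ca_file      '/opt/opscode/embedded/ssl/cert/cacert.pem'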

    Building the machine, checking the logs

    • The write_file part was executed, the file is present in the /tmp filesystem:
    • # cat /tmp/cloud-init-started
      cloud-init was started on this server
      
    • The chef-client was configured, the files are present in the /etc/chef directory and, looking at the log file, these files were created by cloud-init:
    • # ls -l /etc/chef
      total 32
      -rw-------    1 root     system         1679 Apr 26 23:46 client.pem
      -rw-r--r--    1 root     system          646 Apr 26 23:46 client.rb
      -rw-r--r--    1 root     system           38 Apr 26 23:46 firstboot.json
      -rw-r--r--    1 root     system         1679 Apr 26 23:46 validation.pem
      
      # grep chef /var/log/cloud-init-output.log
      2015-04-26 23:46:22,463 - importer.py[DEBUG]: Found cc_chef with attributes ['handle'] in ['cloudinit.config.cc_chef']
      2015-04-26 23:46:22,879 - util.py[DEBUG]: Writing to /opt/freeware/var/lib/cloud/instances/a8b8fe0d-34c1-4bdb-821c-777fca1c391f/sem/config_chef - wb: [420] 23 bytes
      2015-04-26 23:46:22,882 - helpers.py[DEBUG]: Running config-chef using lock ()
      2015-04-26 23:46:22,884 - util.py[DEBUG]: Writing to /etc/chef/validation.pem - wb: [420] 1679 bytes
      2015-04-26 23:46:22,887 - util.py[DEBUG]: Reading from /opt/freeware/etc/cloud/templates/chef_client.rb.tmpl (quiet=False)
      2015-04-26 23:46:22,889 - util.py[DEBUG]: Read 892 bytes from /opt/freeware/etc/cloud/templates/chef_client.rb.tmpl
      2015-04-26 23:46:22,954 - util.py[DEBUG]: Writing to /etc/chef/client.rb - wb: [420] 646 bytes
      2015-04-26 23:46:22,958 - util.py[DEBUG]: Writing to /etc/chef/firstboot.json - wb: [420] 38 bytes
      
    • The runcmd part was executed:
    • # cat /opt/freeware/var/lib/cloud/instance/scripts/runcmd
      #!/bin/sh
      /usr/bin/chef-client
      
      2015-04-26 23:46:22,488 - importer.py[DEBUG]: Found cc_runcmd with attributes ['handle'] in ['cloudinit.config.cc_runcmd']
      2015-04-26 23:46:22,983 - util.py[DEBUG]: Writing to /opt/freeware/var/lib/cloud/instances/a8b8fe0d-34c1-4bdb-821c-777fca1c391f/sem/config_runcmd - wb: [420] 23 bytes
      2015-04-26 23:46:22,986 - helpers.py[DEBUG]: Running config-runcmd using lock ()
      2015-04-26 23:46:22,987 - util.py[DEBUG]: Writing to /opt/freeware/var/lib/cloud/instances/a8b8fe0d-34c1-4bdb-821c-777fca1c391f/scripts/runcmd - wb: [448] 31 bytes
      2015-04-26 23:46:25,868 - util.py[DEBUG]: Running command ['/opt/freeware/var/lib/cloud/instance/scripts/runcmd'] with allowed return codes [0] (shell=False, capture=False)
      
    • The final message was printed in the output of the cloud-init log file:
    • 2015-04-26 23:06:01,203 - helpers.py[DEBUG]: Running config-final-message using lock ()
      The system is up, cloud-init is finished
      2015-04-26 23:06:01,240 - util.py[DEBUG]: The system is up, cloud-init is finished
      2015-04-26 23:06:01,242 - util.py[DEBUG]: Writing to /opt/freeware/var/lib/cloud/instance/boot-finished - wb: [420] 57 bytes
      

    On the Chef server you can check that the client registered itself and get details about it:

    # knife node list | grep a8b8fe0d-34c1-4bdb-821c-777fca1c391f
    a8b8fe0d-34c1-4bdb-821c-777fca1c391f
    # knife node show a8b8fe0d-34c1-4bdb-821c-777fca1c391f
    Node Name:   a8b8fe0d-34c1-4bdb-821c-777fca1c391f
    Environment: _default
    FQDN:
    IP:          10.10.208.61
    Run List:    role[aix7]
    Roles:       france_testing
    Recipes:     aix7::create_fs_rootvg, aix7::create_profile_root
    Platform:    aix 7.1
    Tags:
    

    What's next ?

    If you have a look at the Chef supermarket (the place where you can download Chef cookbooks written by the community and validated by opscode) you'll see that there are not a lot of cookbooks for AIX. I'm currently writing my own cookbook for the AIX logical volume manager and filesystem creation, but there is still a lot of work to do on cookbook creation for AIX. Here is a list of cookbooks that need to be written by the community: chdev, multibos, mksysb, nim client, wpar, update_all, ldap_client ... I could continue this list but I'm sure you have a lot of ideas of your own (as a teaser, a tiny chdev sketch closes this post). One last word: learn Ruby and write cookbooks; they will be used by the community and we will finally have a good configuration management tool on AIX. With PowerVC, cloud-init and Chef support, AIX will have a full "DevOps" stack and can finally fight against Linux. As always I hope this blog post helps you to understand PowerVC, cloud-init and Chef!
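
    Here is that chdev teaser: a hedged sketch of what such a resource could look like today with a guarded execute (the device and the attribute values are just an example):

    execute "set queue_depth on hdisk0" do
      command "chdev -l hdisk0 -a queue_depth=32"
      # skip the chdev when lsattr already reports the target value
      not_if "lsattr -El hdisk0 -a queue_depth -F value | grep -qx 32"
    end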