Unleash the true potential of SRIOV vNICs using vNIC failover!

I'm always working on a tight schedule. I never have the time to write documentation because we're moving fast, very fast … but not as fast as I want to ;-). A few months ago we were asked to put the TSM servers in our PowerVC environment. I thought it was a very bad idea to put a pet among the cattle: TSM servers are very specific and super I/O intensive in our environment (and are configured with plenty of rmt devices; this means we would have been trying to put lan-free stuff into Openstack, which is not designed at all for this kind of thing). At my previous place we tried to put the TSM servers behind a virtualized environment (meaning serving network through Shared Ethernet Adapters) and this was an EPIC FAIL. A few weeks after putting the servers in production we decided to move back to physical I/O and dedicated network adapters. As we didn't want to make the same mistake at my current place, we decided not to go with Shared Ethernet Adapters. Instead we took the decision to use SRIOV vNICs. SRIOV vNICs have the advantage of being fully virtualized (meaning LPM aware and super flexible), giving us the flexibility we wanted (moving TSM servers between sites if we need to put a host in maintenance mode or if we are facing any kind of outage). In my previous blog post about vNICs I was very happy with the performance but not with the reliability. I didn't want to go with NIB adapters for network redundancy, because it is an anti-virtualization way of doing things (we do not want to manage anything inside the VM; we want to let the virtualization environment do the job for us). Lucky for me the project was rescheduled to the end of the year and we finally took the decision not to put the TSM servers into our big Openstack, dedicating some hosts to the backup stuff instead. The latest versions of PowerVM, HMC and firmware arrived just in time to let me use the new SRIOV vNIC failover feature for this new TSM environment (fortunately for me we had some data center issues allowing me to wait long enough not to go with NIB and to start the production directly with SRIOV vNICs \o/). I delivered the first four servers to my backup team yesterday and I must admit that SRIOV vNIC failover is a killer feature for this kind of thing. Let's now see how to set this up!

Prerequisites

As always, using the latest features means you need to have everything up to date. In this case the minimal requirements for SRIOV vNIC failover are Virtual I/O Server 2.2.5.10, Hardware Management Console V8R8.6.0 with the latest patches, and an up-to-date firmware (i.e. fw 860). Note that not all AIX versions are ok with SRIOV vNICs; here I'm only using AIX 7.2 TL1 SP1:

  • Check the Virtual I/O Servers are running 2.2.5.10:
  • # ioslevel
    2.2.5.10
    
  • Check the HMC is at the latest version (V8R8.6.0):
  • hscroot@myhmc:~> lshmc -V
    "version= Version: 8
     Release: 8.6.0
     Service Pack: 0
    HMC Build level 20161101.1
    MH01655: Required fix for HMC V8R8.6.0 (11-01-2016)
    ","base_version=V8R8.6.0
    "
    

    860

  • Check the firmware version is ok on the PowerSystem (updlic performs the update, lslic checks the activated level):
  • # updlic -o u -t sys -l latest -m reptilian-9119-MME-659707C -r mountpoint -d /home/hscroot/860_056/ -v
    # lslic -m reptilian-9119-MME-65BA46F -F activated_level,activated_spname
    56,FW860.10
    

    fw

What is SRIOV vNIC failover and how does it work?

I'll not explain here what an SRIOV vNIC is; if you want to know more about it just check my previous blog post on this topic, A first look at SRIOV vNIC adapters. What failover adds is the ability to give a vNIC adapter as many backing devices as you want (the maximum is 6 backing devices). For each backing device you choose on which Virtual I/O Server the corresponding vnicserver will be created and you set a failover priority that determines which backing device is active. Keep in mind that priorities work exactly the same way as they do with Shared Ethernet Adapters: priority 10 is a higher priority than priority 20.

vnicvisio1

In the example shown in the images above and below, the vNIC is configured with two backing devices (on two different SRIOV adapters) with priorities 10 and 20. As long as there is no outage (for instance on the Virtual I/O Server or on the adapter itself) the physical port utilized will be the one with priority 10. If that adapter has, for instance, a hardware issue we have the possibility to manually fall back to the second backing device, or to let the hypervisor do it for us by picking the operational backing device with the next highest priority. Easy. This gives us redundant, LPM-aware, high-performance adapters, fully virtualized. A MUST :-) !

vnicvisio2

Creating an SRIOV vNIC failover adapter using the HMC GUI and administering it

To create or delete an SRIOV vNIC failover adapter (I'll just call it a vNIC for the rest of this blog post) the machine must be shut down or active (it is not possible to add a vNIC while a machine is booted in OpenFirmware). The only way to do this using the HMC GUI is to use the enhanced interface (no problem, as we will have no other choice in the near future anyway). Select the machine on which you want to create the adapter and click the "Virtual NICs" tab.

vnic1b

Click “Add Virtual NIC”:

vnic1c

Choose the "Physical Port Location Code" (the physical port of the SRIOV adapter) on which you want to create the vNIC. You can add from one to six "backup adapters" (by clicking the "Add Entry" button). Only one backing device is active at a time. If the active one fails (adapter issue, network issue) the vNIC will fail over to the next backup adapter depending on the "Failover priority". Be careful to spread the backing devices across the hosting Virtual I/O Servers to be sure that taking a Virtual I/O Server down will be seamless for your partition:

vnic1d

In the example above:

  • I’m creating a vNIC failover with “vNIC Auto Priority Failover” enabled.
  • Four VFs will be created: two on the VIOS ending with 88, two on the VIOS ending with 89.
  • Obviously four vnicservers will be created on the VIOS (two on each).
  • The lowest priority number takes the lead. This means that if the first backing device with priority 10 fails, the active adapter becomes the second one. Then if the second one with priority 20 fails, the third one becomes active, and so on. Keep in mind that as long as the backing device with the lowest priority number is healthy, nothing will happen if one of the other backup adapters fails. Be smart when choosing the priorities. As Yoda says, "Wise you must be!".
  • The physical ports are located on different CECs.

vnic1e

The "Advanced Virtual NIC Settings" are applied to all the backing devices of the vNIC being created (in the example above, 4). For instance I'm using vlan tagging on these ports, so I just need to set the "Port VLAN ID" once.

vnic1f

You can choose whether to allow the hypervisor to perform the failover/fallback automatically depending on the priorities you have set. If you click "enable" the hypervisor will automatically fail over to the next operational backing device based on priority. If it is disabled, only a user can trigger a failover operation.

vnic1g

Be careful: the priorities are designed the same way as they are on Shared Ethernet Adapters. The lowest number in the failover priority is the highest failover priority, just like it is for a Shared Ethernet Adapter. In the image below you can notice that priority 10, which is the highest failover priority, is active (it is the lowest number among 10, 20, 30 and 40).

vnic1h

After the creation of the vNIC you can check different things on the Virtual I/O Servers. You will notice that every entry added during the creation of the vNIC has a corresponding VF (virtual function) and a corresponding vnicserver (each vnicserver has a VF mapped to it):

  • You can see that for each entry added when creating a vNIC you’ll have the corresponding VF device present on the Virtual I/O Servers:
  • vios1# lsdev -type adapter -field name physloc description | grep "VF"
    [..]
    ent3             U78CA.001.CSS08ZN-P1-C3-C1-T2-S5                                  PCIe3 4-Port 10GbE SR Adapter VF(df1028e21410e304)
    ent4             U78CA.001.CSS08EL-P1-C3-C1-T2-S6                                  PCIe3 4-Port 10GbE SR Adapter VF(df1028e21410e304)
    
    vios2# lsdev -type adapter -field name physloc description | grep "VF"
    [..]
    ent3             U78CA.001.CSS08ZN-P1-C4-C1-T2-S2                                  PCIe3 4-Port 10GbE SR Adapter VF(df1028e21410e304)
    ent4             U78CA.001.CSS08EL-P1-C4-C1-T2-S2                                  PCIe3 4-Port 10GbE SR Adapter VF(df1028e21410e304)
    
  • For each VF you’ll see the corresponding vnicserver devices:
  • vios1# lsdev -type adapter -virtual | grep vnicserver
    [..]
    vnicserver1      Available   Virtual NIC Server Device (vnicserver)
    vnicserver2      Available   Virtual NIC Server Device (vnicserver)
    
    vios2# lsdev -type adapter -virtual | grep vnicserver
    [..]
    vnicserver1      Available   Virtual NIC Server Device (vnicserver)
    vnicserver2      Available   Virtual NIC Server Device (vnicserver)
    
  • You can check the corresponding mapped VF for each vnicserver using the 'lsmap' command. One funny thing: if the backing device was never made active (with the "Make the Backing Device Active" button in the GUI) the corresponding client name and client device are not shown:
  • vios1# lsmap -all -vnic -fmt :
    [..]
    vnicserver1:U9119.MME.659707C-V2-C32898:6:lizard:AIX:ent3:Available:U78CA.001.CSS08ZN-P1-C3-C1-T2-S5:ent0:U9119.MME.659707C-V6-C6
    vnicserver2:U9119.MME.659707C-V2-C32899:6:N/A:N/A:ent4:Available:U78CA.001.CSS08EL-P1-C3-C1-T2-S6:N/A:U9119.MME.659707C-V6-C6
    
    vios2# lsmap -all -vnic
    [..]
    Name          Physloc                            ClntID ClntName       ClntOS
    ------------- ---------------------------------- ------ -------------- -------
    vnicserver1   U9119.MME.659707C-V1-C32898             6 N/A            N/A
    
    Backing device:ent3
    Status:Available
    Physloc:U78CA.001.CSS08ZN-P1-C4-C1-T2-S2
    Client device name:ent0
    Client device physloc:U9119.MME.659707C-V6-C6
    
    Name          Physloc                            ClntID ClntName       ClntOS
    ------------- ---------------------------------- ------ -------------- -------
    vnicserver2   U9119.MME.659707C-V1-C32899             6 N/A            N/A
    
    Backing device:ent4
    Status:Available
    Physloc:U78CA.001.CSS08EL-P1-C4-C1-T2-S2
    Client device name:N/A
    Client device physloc:U9119.MME.659707C-V6-C6
    
  • You can activate the device yourself just by clicking the "Make Backing Device Active" button in the GUI, and check that the vnicserver is now logged in:
  • vnic1i
    vnic1j

vios2# lsmap -all -vnic -fmt :
    [..]
    vnicserver1:U9119.MME.659707C-V1-C32898:6:lizard:AIX:ent3:Available:U78CA.001.CSS08ZN-P1-C4-C1-T2-S2:ent0:U9119.MME.659707C-V6-C6
    vnicserver2:U9119.MME.659707C-V1-C32899:6:N/A:N/A:ent4:Available:U78CA.001.CSS08EL-P1-C4-C1-T2-S2:N/A:U9119.MME.659707C-V6-C6
    
  • I noticed something pretty strange: when you perform a manual failover of the vNIC, the auto priority failover is set to disabled. Remember to re-enable it after the manual operation is performed:
  • vnic1k

    You can also check the status and the priority of the vNIC on the Virtual I/O Server using the vnicstat command. Some good information is shown by the command: the state of the device, whether it is active or not (I noticed two different states in my tests: "active", meaning this is the vf/vnicserver you are currently using, and "config_2", meaning the adapter is ready and available for a failover operation (there is probably another state when the link is down, but I didn't have the time to ask my network team to shut a port to verify this)) and finally the failover priority. The vnicstat command is a root command.

    vios1#  vnicstat vnicserver1
    
    --------------------------------------------------------------------------------
    VNIC Server Statistics: vnicserver1
    --------------------------------------------------------------------------------
    Device Statistics:
    ------------------
    State: active
    Backing Device Name: ent3
    
    Failover State: active
    Failover Readiness: operational
    Failover Priority: 10
    
    Client Partition ID: 6
    Client Partition Name: lizard
    Client Operating System: AIX
    Client Device Name: ent0
    Client Device Location Code: U9119.MME.659707C-V6-C6
    [..]
    
    vios2# vnicstat vnicserver1
    --------------------------------------------------------------------------------
    VNIC Server Statistics: vnicserver1
    --------------------------------------------------------------------------------
    Device Statistics:
    ------------------
    State: config_2
    Backing Device Name: ent3
    
    Failover State: inactive
    Failover Readiness: operational
    Failover Priority: 20
    [..]
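
    To get a quick overview of all the vnicservers of a Virtual I/O Server, a little loop around vnicstat does the job. A minimal sketch, run as root (oem_setup_env), assuming the devices shown above:

    # show backing device, failover state, readiness and priority of every vnicserver
    for v in $(lsdev -Cc adapter | awk '/^vnicserver/ {print $1}'); do
      echo "== $v =="
      vnicstat $v | egrep "Backing Device Name|Failover (State|Readiness|Priority)"
    done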
    

    You can also check the vNIC server events in the errpt (a client login is logged when a failover occurs, and so on):

    # errpt | more
    8C577CB6   1202195216 I S vnicserver1    VNIC Transport Event
    60D73419   1202194816 I S vnicserver1    VNIC Client Login
    # errpt -aj 60D73419 | more
    ---------------------------------------------------------------------------
    LABEL:          VS_CLIENT_LOGIN
    IDENTIFIER:     60D73419
    
    Date/Time:       Fri Dec  2 19:48:06 2016
    Sequence Number: 10567
    Machine Id:      00C9707C4C00
    Node Id:         vios2
    Class:           S
    Type:            INFO
    WPAR:            Global
    Resource Name:   vnicserver1
    
    Description
    VNIC Client Login
    
    Probable Causes
    VNIC Client Login
    
    Failure Causes
    VNIC Client Login
    

    Same thing using the HMC command line.

    Now we will do the same thing on the command line. I warn you, the commands are pretty huge!

    • List the SRIOV adapters (you will need the adapter IDs to create the vNICs):
    • # lshwres -r sriov --rsubtype adapter -m reptilian-9119-MME-65BA46F
      adapter_id=3,slot_id=21010012,adapter_max_logical_ports=64,config_state=sriov,functional_state=1,logical_ports=64,phys_loc=U78CA.001.CSS08XH-P1-C3-C1,phys_ports=4,sriov_status=running,alternate_config=0
      adapter_id=4,slot_id=21010013,adapter_max_logical_ports=64,config_state=sriov,functional_state=1,logical_ports=64,phys_loc=U78CA.001.CSS08XH-P1-C4-C1,phys_ports=4,sriov_status=running,alternate_config=0
      adapter_id=1,slot_id=21010022,adapter_max_logical_ports=64,config_state=sriov,functional_state=1,logical_ports=64,phys_loc=U78CA.001.CSS08RG-P1-C3-C1,phys_ports=4,sriov_status=running,alternate_config=0
      adapter_id=2,slot_id=21010023,adapter_max_logical_ports=64,config_state=sriov,functional_state=1,logical_ports=64,phys_loc=U78CA.001.CSS08RG-P1-C4-C1,phys_ports=4,sriov_status=running,alternate_config=0
      
    • List the vNICs of virtual machine "lizard" (as far as I can tell, each backing_devices element reads sriov/<vios name>/<vios id>/<adapter id>/<physical port id>/<logical port id>/<capacity>/<max capacity>/<failover priority>):
    • lshwres -r virtualio  -m reptilian-9119-MME-65BA46F --rsubtype vnic --level lpar --filter "lpar_names=lizard"
      lpar_name=lizard,lpar_id=6,slot_num=6,desired_mode=ded,curr_mode=ded,auto_priority_failover=0,port_vlan_id=0,pvid_priority=0,allowed_vlan_ids=all,mac_addr=6ac53577b106,allowed_os_mac_addrs=all,"backing_devices=sriov/vios1/1/3/0/2700c003/2.0/2.0/50,sriov/vios2/2/1/0/27004003/2.0/2.0/60","backing_device_states=sriov/2700c003/0/Operational,sriov/27004003/1/Operational"
      
    • Create a vNIC with two backing devices, the first one on Virtual I/O Server 1, adapter 1, physical port 2, with a failover priority of 10, and the second one on Virtual I/O Server 2, adapter 3, physical port 2, with a failover priority of 20 (this vNIC will take the next available slot, which will be 6) (WARNING: physical port numbering starts from 0):
    • #chhwres -r virtualio -m reptilian-9119-MME-65BA46F -o a -p lizard --rsubtype vnic -v -a 'port_vlan_id=3455,auto_priority_failover=1,backing_devices="sriov/vios1//1/1/2.0/10,sriov/vios2//3/1/2.0/20"'
      #lshwres -r virtualio -m reptilian-9119-MME-65BA46F --rsubtype vnic --level lpar --filter "lpar_names=lizard"
      lpar_name=lizard,lpar_id=6,slot_num=6,desired_mode=ded,curr_mode=ded,auto_priority_failover=1,port_vlan_id=3455,pvid_priority=0,allowed_vlan_ids=all,mac_addr=6ac53577b106,allowed_os_mac_addrs=all,"backing_devices=sriov/vios1/1/1/1/2700400b/2.0/2.0/10,sriov/vios2/2/3/1/2700c008/2.0/2.0/20","backing_device_states=sriov/2700400b/1/Operational,sriov/2700c008/0/Operational"
      
    • Add two backing devices (one on each VIOS, on adapters 2 and 4, both on physical port 2, with failover priorities of 30 and 40) to the vNIC in slot 6:
    • # chhwres -r virtualio -m reptilian-9119-MME-65BA46F -o s --rsubtype vnic -p lizard -s 6 -a '"backing_devices+=sriov/vios1//2/1/2.0/30,sriov/vios2//4/1/2.0/40"'
      # lshwres -r virtualio -m reptilian-9119-MME-65BA46F --rsubtype vnic --level lpar --filter "lpar_names=lizard"
      lpar_name=lizard,lpar_id=6,slot_num=6,desired_mode=ded,curr_mode=ded,auto_priority_failover=1,port_vlan_id=3455,pvid_priority=0,allowed_vlan_ids=all,mac_addr=6ac53577b106,allowed_os_mac_addrs=all,"backing_devices=sriov/vios1/1/1/1/2700400b/2.0/2.0/10,sriov/vios2/2/3/1/2700c008/2.0/2.0/20,sriov/vios1/1/2/1/27008005/2.0/2.0/30,sriov/vios2/2/4/1/27010002/2.0/2.0/40","backing_device_states=sriov/2700400b/1/Operational,sriov/2700c008/0/Operational,sriov/27008005/0/Operational,sriov/27010002/0/Operational"
      
    • Change the failover priority of logical port 2700400b of the vNIC in slot 6 to 11:
    • # chhwres -m reptilian-9119-MME-65BA46F -r virtualio -o s --rsubtype vnicbkdev -p lizard -s 6 --logport 2700400b -a "failover_priority=11"
      # lshwres -r virtualio -m reptilian-9119-MME-65BA46F --rsubtype vnic --level lpar --filter "lpar_names=lizard"
      lpar_name=lizard,lpar_id=6,slot_num=6,desired_mode=ded,curr_mode=ded,auto_priority_failover=1,port_vlan_id=3455,pvid_priority=0,allowed_vlan_ids=all,mac_addr=6ac53577b106,allowed_os_mac_addrs=all,"backing_devices=sriov/vios1/1/1/1/2700400b/2.0/2.0/11,sriov/vios2/2/3/1/2700c008/2.0/2.0/20,sriov/vios1/1/2/1/27008005/2.0/2.0/30,sriov/vios2/2/4/1/27010002/2.0/2.0/40","backing_device_states=sriov/2700400b/1/Operational,sriov/2700c008/0/Operational,sriov/27008005/0/Operational,sriov/27010002/0/Operational"
      
    • Make logical port 27008005 active on the vNIC in slot 6 (note in the output below that this manual activation turns auto_priority_failover off):
    • # chhwres -m reptilian-9119-MME-65BA46F -r virtualio -o act --rsubtype vnicbkdev -p lizard  -s 6 --logport 27008005 
      # lshwres -r virtualio -m reptilian-9119-MME-65BA46F --rsubtype vnic --level lpar --filter "lpar_names=lizard"
      lpar_name=lizard,lpar_id=6,slot_num=6,desired_mode=ded,curr_mode=ded,auto_priority_failover=0,port_vlan_id=3455,pvid_priority=0,allowed_vlan_ids=all,mac_addr=6ac53577b106,allowed_os_mac_addrs=all,"backing_devices=sriov/vios1/1/1/1/2700400b/2.0/2.0/11,sriov/vios2/2/3/1/2700c008/2.0/2.0/20,sriov/vios1/1/2/1/27008005/2.0/2.0/30,sriov/vios2/2/4/1/27010002/2.0/2.0/40","backing_device_states=sriov/2700400b/0/Operational,sriov/2700c008/0/Operational,sriov/27008005/1/Operational,sriov/27010002/0/Operational"
      
    • Re-enable automatic failover on vNIC in slot 6:
    • # chhwres -m reptilian-9119-MME-65BA46F -r virtualio -o s --rsubtype vnic -p lizard  -s 6 -a "auto_priority_failover=1"
      # lshwres -r virtualio -m reptilian-9119-MME-65BA46F --rsubtype vnic --level lpar --filter "lpar_names=lizard"
      lpar_name=lizard,lpar_id=6,slot_num=6,desired_mode=ded,curr_mode=ded,auto_priority_failover=1,port_vlan_id=3455,pvid_priority=0,allowed_vlan_ids=all,mac_addr=6ac53577b106,allowed_os_mac_addrs=all,"backing_devices=sriov/vios1/1/1/1/2700400b/2.0/2.0/11,sriov/vios2/2/3/1/2700c008/2.0/2.0/20,sriov/vios1/1/2/1/27008005/2.0/2.0/30,sriov/vios2/2/4/1/27010002/2.0/2.0/40","backing_device_states=sriov/2700400b/1/Operational,sriov/2700c008/0/Operational,sriov/27008005/0/Operational,sriov/27010002/0/Operational"
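
    Since a manual activation turns the automatic failover off (as we just saw), I find it handy to wrap the two operations together. A minimal sketch run from an admin host with ssh access to the HMC, reusing the names from above (hscroot@myhmc is an assumption):

    #!/bin/ksh
    # activate a chosen backing device, then re-enable automatic priority failover
    HMC=hscroot@myhmc ; M=reptilian-9119-MME-65BA46F ; LPAR=lizard ; SLOT=6 ; LOGPORT=27008005
    ssh $HMC "chhwres -m $M -r virtualio -o act --rsubtype vnicbkdev -p $LPAR -s $SLOT --logport $LOGPORT"
    ssh $HMC "chhwres -m $M -r virtualio -o s --rsubtype vnic -p $LPAR -s $SLOT -a auto_priority_failover=1"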
      

    Testing the failover.

    It's now time to test if the failover works as intended. The test will be super simple: I will just shut off one of the two Virtual I/O Servers and check whether I lose any packets. I'm first checking which VIOS hosts the active backing device:

    vnic1l

    I now need to shut down the Virtual I/O Server ending with 88 and check that the one ending with 89 takes the lead:

    *****88# shutdown -force 
    

    Priorities 10 and 30 were hosted on the Virtual I/O Server that was shut down, so the highest remaining priority, 20, is on the surviving Virtual I/O Server. The backing device hosted on this second Virtual I/O Server is now serving the network I/O:

    vnic1m

    You can check the same thing from the command line on the remaining Virtual I/O Server:

    *****89# errpt | more
    IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
    60D73419   1202214716 I S vnicserver0    VNIC Client Login
    60D73419   1202214716 I S vnicserver1    VNIC Client Login
    *****89# vnicstat vnicserver1
    --------------------------------------------------------------------------------
    VNIC Server Statistics: vnicserver1
    --------------------------------------------------------------------------------
    Device Statistics:
    ------------------
    State: active
    Backing Device Name: ent3
    
    Failover State: active
    Failover Readiness: operational
    Failover Priority: 20
    
    

    During my tests the failover worked as I expected. You can see in the picture below that I only lost one ping (between icmp_seq 64 and 66) during the failover/failback process.

    vnic1n
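
    For the record, the "test harness" is nothing more than a timestamped ping left running against the partition from another host while the VIOS goes down; a trivial sketch (lizard being the partition's hostname):

    # timestamp every ping answer to spot the exact failover window
    ping lizard | while read line ; do
      echo "$(date +%H:%M:%S) $line"
    done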

    In the partition I saw some messages in the errpt during the failover:

    # errpt | more
    4FB9389C   1202215816 I S ent0           VNIC Link Up
    F655DA07   1202215816 I S ent0           VNIC Link Down
    # errpt -a | more
    [..]
    SOURCE ADDRESS
    56FB 2DB8 A406
    Event
    physical link: DOWN   logical link: DOWN
    Status
    [..]
    SOURCE ADDRESS
    56FB 2DB8 A406
    Event
    physical link: UP   logical link: UP
    Status
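
    If you want to watch these events live during a test, errpt has a concurrent mode that prints new entries as they are logged:

    # errpt -c | grep -i vnic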
    

    What about Live Partition Mobility?

    If you want a seamless LPM experience, without having to choose the destination adapter and physical port on which to map your current vNIC backing devices, just fill in the label and sublabel (the label being the most important) for each physical port of your SRIOV adapters. Then, during the LPM, if the labels are aligned between the two systems, the right physical port is automatically chosen based on the label names:

    vnic1o
    vnic1p
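
    The labels can be set from the GUI as in the screenshots above, or from the HMC command line. I believe the syntax is along these lines, but double-check the man pages on your HMC (the label value is just an example):

    # chhwres -r sriov --rsubtype physport -m reptilian-9119-MME-65BA46F -o s -a "adapter_id=1,phys_port_id=0,phys_port_label=BACKUP_PROD"
    # lshwres -r sriov --rsubtype physport --level eth -m reptilian-9119-MME-65BA46F -F adapter_id,phys_port_id,phys_port_label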

    The LPM was working like a charm and I didn't notice any particular problem during the move. vNIC failover and LPM work fine as long as you take care of your SRIOV labels :-). I did notice on AIX 7.2 TL1 SP1 that there were no errpt messages in the partition itself, just in the Virtual I/O Servers … weird :-)

    # errpt | more
    IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
    3EB09F5A   1202222416 I S Migration      Migration completed successfully
    

    Conclusion

    No long story here. If you need performance AND flexibility you absolutely have to use SRIOV vNIC failover adapters. This feature offers you the best of both worlds: the possibility to dedicate 10Gb adapters, with failover capability, without having to worry about LPM or about a NIB configuration. It's not applicable in all cases, but it's definitely something to have for an environment such as TSM or any network I/O intensive workload. Use it!

    About reptilians !

    Before you start reading this, keep your sense of humor, and be advised that what I say is not related to my workplace at all; it's a general way of thinking, not especially based on my own experience. Don't be offended, it's just a personal opinion based on things I may or may not have seen during my life. You've been warned.

    This blog was never a place to share my opinions about life and society, but I must admit I should have done it before. Speaking about this kind of thing makes you feel alive in a world where everything needs to be ok and where you no longer have the right to feel or express something about what you are living. There are a couple of good blog posts speaking of this kind of thing, related to the IT world. I agree with everything said in these posts. Some of the authors are just telling what they love in their daily jobs, but I think it's also a way to say what they probably wouldn't love in another one :-) :

    • Adam Leventhal’s “I’m not a resource”: here
    • Brendan Gregg’s “Working at Netflix in 2016″: here

    All of this to say that I work at night, I work on weekends, I'm thinking about PowerSystems/computers when I fall asleep. I always have new ideas and I always want to learn new things, discover new technologies and features. I truly, deeply love this, but being like this does not help me and will never help me in my daily job, for one single reason: in this world the people who have the knowledge are not the people who take the technical decisions. It's sad but true. I'm just good at working as much as I can for the least money possible. Nobody cares if techs are happy, unhappy, want to stay or leave. It doesn't make any difference to anyone driving a company. What's important is money. Everything is meaningless. We are no one, we are nothing, just numbers in an Excel spreadsheet. I'm probably saying this because I'm not good enough at anything to find an acceptable workplace. Once again, sad but true.

    Even worse, if you just want to follow what the industry is asking, you have to be everywhere and know everything. I know I'll be forced in the very near future to move to DevOps/Linux (I love Linux, I'm an RHCE certified engineer!). That's why, for a couple of years now, at night after my daily job is finished, I'm working again: working to understand how Docker works, to install my own Openstack on my own machines, to understand Saltstack, Ceph, Python, Ruby, Go … it's a never ending process. But it's still not enough for them! Not enough to be considered a good, or good enough, guy to fit a job. I remember being asked to know about Openstack, Cassandra, Hadoop, AWS, KVM, Linux, automation tools (Puppet this time), Docker and continuous integration, for one single job application. First, I seriously doubt that anyone has all of those skills and is good at each of them. Second, even if I were an expert at each one, a few years ago it was the exact same thing but with different products. You have to understand and be good at every new product in minutes. All of this to understand that one or two years after you are considered an "expert", you are bad at everything that exists in the industry. I'm really sick of this fight against something I can't control. Being a hard worker, clever enough to understand every new feature, is not enough nowadays. On top of that you also need to be a beautiful person with a nice perfect smile wearing a perfect suit. You also have to be on LinkedIn and be connected with the right persons. And even if all of these boxes are checked you still need to be lucky enough to be at the right place at the right moment. I'm so sick of this. Work doesn't pay. Only luck does. I don't want to live in this kind of world, but I have to. Anyway, this is just a two-cents way of thinking. Everything is probably a big trick orchestrated by these reptilian lizard men! ^^. Be good at what you do and don't care about what people think of you (even your horrible french accent during your sessions) … that's the most important thing!

    picture-of-reptilian-alien

    Using the Simplified Remote Restart capability on Power8 Scale Out Servers

    A few weeks ago I had to work on simplified remote restart. I'm not lucky enough yet (because of some political decisions in my company) to have access to any E880 or E870. We just have a few scale-out machines to play with (S814). For some critical applications we need, in the future, to be able to restart the virtual machines if the system hosting them has failed (hardware problem). We decided a couple of months ago not to use remote restart because it was mandatory to use a reserved storage pool device, and it was too hard to manage because of this mandatory storage. We now have enough P8 boxes to try and understand the new version of remote restart, called simplified remote restart, which does not need any reserved storage pool device. For those who want to understand what remote restart is, I strongly recommend checking my previous blog post about remote restart on two P7 boxes: Configuration of a remote restart partition. For the others, here is what I learned about the simplified version of this awesome feature.

    Please keep in mind that the FSP of the machine must be up to perform a simplified remote restart operation. It means that if, for instance, you lose one of your datacenters, or the link between your two datacenters, you cannot use simplified remote restart to restart your partitions on the main/backup site. Simplified remote restart only protects you against a hardware failure of your machine. Maybe this will change in the near future, but for the moment it is the most important thing to understand about simplified remote restart.

    Updating to the latest version of firmware

    I was very surprised when I got my Power8 machines. After deploying these boxes I decided to give simplified remote restart a try, but it was just not possible. Since the Power8 Scale Out servers were released they were NOT simplified remote restart capable. The release of the SV830 firmware now enables simplified remote restart on Power8 Scale Out machines. Please note that there is nothing about it in the patch note, so chmod666.org is the only place where you can get this information :-). Here is the patch note: here. Last word: you will find on the internet that you need Power8 to use simplified remote restart. It's true, but only partially true. YOU NEED A P8 MACHINE WITH AT LEAST AN 820 FIRMWARE.

    The first thing to do is to update your firmware to the SV830 version (on both systems participating in the simplified remote restart operation):

    # updlic -o u -t sys -l latest -m p814-1 -r mountpoint -d /home/hscroot/SV830_048 -v
    [..]
    # lslic -m p814-1 -F activated_spname,installed_level,ecnumber
    FW830.00,48,01SV830
    # lslic -m p814-2 -F activated_spname,installed_level,ecnumber
    FW830.00,48,01SV830
    

    You can check the firmware version directly from the Hardware Management Console or in the ASMI:

    fw1
    fw3

    After the firmware upgrade, verify that the Simplified Remote Restart capability is now set to true.

    fw2

    # lssyscfg -r sys -F name,powervm_lpar_simplified_remote_restart_capable
    p720-1,0
    p814-1,1
    p720-2,0
    p814-2,1
    

    Prerequisites

    These prerequisites are true ONLY for Scale out systems:

    • To update to the SV830_048 firmware you need the latest Hardware Management Console release, which is V8R8.3.0 plus the MH01514 PTF.
    • Obviously, on Scale Out systems SV830_048 is the minimum firmware requirement.
    • The minimum level of Virtual I/O Server is 2.2.3.4 (on both source and destination systems).
    • PowerVM Enterprise edition (to be confirmed).

    Enabling simplified remote restart of an existing partition

    You probably want to enable simplified remote restart after an LPM migration/evacuation. After migrating your virtual machine(s) to a Power8 with the simplified remote restart capability, you have to enable this capability on each virtual machine. This can only be done when the machine is shut down, so you first have to stop the virtual machines (after the live partition mobility move) if you want to enable SRR. It can't be done without rebooting the virtual machine:

    • List the current partitions on the system and check which ones are "simplified remote restart capable" (here only one is):
    • # lssyscfg -r lpar -m p814-1 -F name,simplified_remote_restart_capable
      vios1,0
      vios2,0
      lpar1,1
      lpar2,0
      lpar3,0
      lpar4,0
      lpar5,0
      lpar6,0
      lpar7,0
      
    • For each lpar that is not simplified remote restart capable, change the simplified_remote_restart_capable attribute using the chsyscfg command. Please note that you can't do this from the Hardware Management Console GUI: in the latest V8R8.3.0, when you enable it through the GUI, the HMC tells you that you need a reserved storage device, which is required by the classic Remote Restart capability, not by the simplified version. You have to use the command line! (check the screenshots below)
    • You can’t change this attribute while the machine is running:
    • gui_change_to_srr

    • You can't do it with the GUI even after the machine is shut down:
    • gui_change_to_srr2
      gui_change_to_srr3

    • The only way to enable this attribute is through the Hardware Management Console command line (please note in the output below that running lpars cannot be changed):
    • # for i in lpar2 lpar3 lpar4 lpar5 lpar6 lpar7 ; do chsyscfg -r lpar -m p814-1 -i "name=$i,simplified_remote_restart_capable=1" ; done
      An error occurred while changing the partition named lpar6.
      HSCLA9F8 The remote restart capability of the partition can only be changed when the partition is shutdown.
      An error occurred while changing the partition named lpar7.
      HSCLA9F8 The remote restart capability of the partition can only be changed when the partition is shutdown.
      # lssyscfg -r lpar -m p814-1 -F name,simplified_remote_restart_capable,lpar_env | grep -v vioserver
      lpar1,1,aixlinux
      lpar2,1,aixlinux
      lpar3,1,aixlinux
      lpar4,1,aixlinux
      lpar5,1,aixlinux
      lpar6,0,aixlinux
      lpar7,0,aixlinux
      

    Remote restarting

    If you try a live partition mobility operation back to a P7 box, or to a P8 box without the simplified remote restart capability, it will not be possible. Enabling simplified remote restart forces the virtual machine to stay on P8 boxes with the simplified remote restart capability. This is one of the reasons why most customers are not doing it:

    # migrlpar -o v -m p814-1 -t p720-1 -p lpar2
    Errors:
    HSCLB909 This operation is not allowed because managed system p720-1 does not support PowerVM Simplified Partition Remote Restart.
    

    lpm_not_capable_anymore

    On the Hardware Management Console you can see that the virtual machine is simplified remote restart capable by checking its properties:

    gui_change_to_srr4

    You can now try to remote restart your virtual machines to another server. As always, the status of the source server has to be different from Operating (Power Off, Error, Error - Dump in progress, Initializing). As always, my advice is to validate before restarting:

    # rrstartlpar -o validate -m p814-1 -t p814-2 -p lpar1
    # echo $?
    0
    # rrstartlpar -o restart -m p814-1 -t p814-2 -p lpar1
    HSCLA9CE The managed system is not in a valid state to support partition remote restart operations.
    
    # lssyscfg -r sys -F name,state
    p814-2,Operating
    p814-1,Power Off
    # rrstartlpar -o restart -m p814-1 -t p814-2 -p lpar1
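
    To validate a whole machine at once before you actually need it, a small loop over the SRR-capable lpars does the trick; a sketch run directly on the HMC (same names as above):

    # validate every simplified remote restart capable lpar of p814-1 against p814-2
    for l in $(lssyscfg -r lpar -m p814-1 -F name,simplified_remote_restart_capable | awk -F, '$2 == 1 {print $1}') ; do
      rrstartlpar -o validate -m p814-1 -t p814-2 -p $l && echo "$l: validation OK"
    done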
    

    When you perform a remote restart operation the machine boots automatically on the target. You can check in the errpt that in most cases the partition ID has changed (proving that you are on another machine):

    # errpt | more
    IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
    A6DF45AA   0618170615 I O RMCdaemon      The daemon is started.
    1BA7DF4E   0618170615 P S SRC            SOFTWARE PROGRAM ERROR
    CB4A951F   0618170615 I S SRC            SOFTWARE PROGRAM ERROR
    CB4A951F   0618170615 I S SRC            SOFTWARE PROGRAM ERROR
    D872C399   0618170615 I O sys0           Partition ID changed and devices recreat
    

    Be very careful with the ghostdev sys0 attribute. Every VM that may be remote restarted needs ghostdev set to 0 to avoid an ODM wipe (if you remote restart an lpar with ghostdev set to 1 you will lose all your ODM customizations):

    # lsattr -El sys0 -a ghostdev
    ghostdev 0 Recreate ODM devices on system change / modify PVID True
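
    A quick way to audit this across a batch of partitions; a sketch assuming you can ssh as root to each one (the host list is made up):

    # report any lpar where ghostdev is not 0
    for h in lpar1 lpar2 lpar3 lpar4 ; do
      v=$(ssh root@$h "lsattr -El sys0 -a ghostdev -F value")
      [ "$v" != "0" ] && echo "$h: ghostdev=$v, fix it with: chdev -l sys0 -a ghostdev=0"
    done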
    

    When the source machine is back up and running, you have to clean the old definition of the remote restarted lpar by launching a cleanup operation. This wipes the old lpar definition:

    # rrstartlpar -o cleanup -m p814-1 -p lpar1
    

    The rrMonitor (modified version)

    There is a script delivered by IBM called rrMonitor. It watches a PowerSystem's state and, if the system is in a particular state, remote restarts one specific virtual machine. This script is just not usable as delivered because it has to be executed directly on the HMC (you need a pesh password to put the script on the HMC) and it only checks one particular virtual machine. I modified this script to ssh to the HMC and check every lpar on the machine, not just one in particular. You can download my modified version here: rrMonitor. Here is what the script does:

    • It checks the state of the source machine.
    • If that state is not "Operating", it searches for every remote restartable lpar on the machine.
    • It launches remote restart operations for all these partitions.
    • It tells the user the command to run to clean up the old lpars once the source machine is running again.
    # ./rrMonitor p814-1 p814-2 all 60 myhmc
    Getting remote restartable lpars
    lpar1 is rr simplified capable
    lpar1 rr status is Remote Restartable
    lpar2 is rr simplified capable
    lpar2 rr status is Remote Restartable
    lpar3 is rr simplified capable
    lpar3 rr status is Remote Restartable
    lpar4 is rr simplified capable
    lpar4 rr status is Remote Restartable
    Checking for source server state....
    Source server state is Operating
    Checking for source server state....
    Source server state is Operating
    Checking for source server state....
    Source server state is Power Off In Progress
    Checking for source server state....
    Source server state is Power Off
    It's time to remote restart
    Remote restarting lpar1
    Remote restarting lpar2
    Remote restarting lpar3
    Remote restarting lpar4
    Thu Jun 18 20:20:40 CEST 2015
    Source server p814-1 state is Power Off
    Source server has crashed and hence attempting a remote restart of the partition lpar1 in the destination server p814-2
    Thu Jun 18 20:23:12 CEST 2015
    The remote restart operation was successful
    The cleanup operation has to be executed on the source server once the server is back to operating state
    The following command can be used to execute the cleanup operation,
    rrstartlpar -m p814-1 -p lpar1 -o cleanup
    Thu Jun 18 20:23:12 CEST 2015
    Source server p814-1 state is Power Off
    Source server has crashed and hence attempting a remote restart of the partition lpar2 in the destination server p814-2
    Thu Jun 18 20:25:42 CEST 2015
    The remote restart operation was successful
    The cleanup operation has to be executed on the source server once the server is back to operating state
    The following command can be used to execute the cleanup operation,
    rrstartlpar -m p814-1 -p lpar2 -o cleanup
    Thu Jun 18 20:25:42 CEST 2015
    [..]
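
    If you just want the gist of it, here is a minimal sketch of the loop (run from an admin host with ssh access to the HMC; the real script linked above does much more checking):

    #!/bin/ksh
    # watch a source machine and remote restart its SRR-capable lpars when it dies
    HMC=hscroot@myhmc ; SRC=p814-1 ; DST=p814-2 ; INTERVAL=60
    # collect the remote restartable lpars while the source is still reachable
    LPARS=$(ssh $HMC "lssyscfg -r lpar -m $SRC -F name,simplified_remote_restart_capable" | awk -F, '$2 == 1 {print $1}')
    while true ; do
      state=$(ssh $HMC "lssyscfg -r sys -m $SRC -F state")
      echo "Source server state is $state"
      [ "$state" = "Power Off" -o "$state" = "Error" ] && break
      sleep $INTERVAL
    done
    for l in $LPARS ; do
      echo "Remote restarting $l"
      ssh $HMC "rrstartlpar -o restart -m $SRC -t $DST -p $l"
      echo "Once $SRC is back: rrstartlpar -m $SRC -p $l -o cleanup"
    done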
    

    Conclusion

    As you can see, the simplified version of the remote restart feature is simpler than the normal one. My advice is to create all your lpars with the simplified remote restart attribute; it's that easy :). If you plan to LPM back to P6 or P7 boxes, don't use simplified remote restart. I think this functionality will become more popular when all the old P7 and P6 machines are replaced by P8. As always, I hope it helps.

    Here are a couple of links with great documentation about Simplified Remote Restart:

    • Simplified Remote Restart Whitepaper: here
    • Original rrMonitor: here
    • Materials about the latest HMC release and a couple of videos related to Simplified Remote Restart: here

    Adventures in IBM Systems Director in System P environment. Part 4: Playing with errnotify and genevent.sh

    For some puzzling reason that I don't understand, my client does not want to install supervision agents (Tivoli) on VIO Servers. I can't argue with that decision, I have to work with it; anyway, in my opinion the VIO Servers are one of the most important parts of a Pseries virtualized environment. They have to be monitored. One of our recurring problems comes from Ethernet ports flapping, resulting in repeated Shared Ethernet Adapter failovers (become primary, then become backup, and so on):

    • ent1 is flapping on this VIO Server :
    • # errlog | more 
      E136EAFA   1009203212 I H ent7           BECOME PRIMARY
      F3931284   1009203212 I H ent1           ETHERNET NETWORK RECOVERY MODE
      0B41DD00   1009103812 I H ent7           ADAPTER FAILURE
      EC0BCCD4   1009103812 T H ent1           ETHERNET DOWN
      E136EAFA   1009103812 I H ent7           BECOME PRIMARY
      F3931284   1009103812 I H ent1           ETHERNET NETWORK RECOVERY MODE
      0B41DD00   1009022212 I H ent7           ADAPTER FAILURE
      EC0BCCD4   1009022212 T H ent1           ETHERNET DOWN
      E136EAFA   1009022212 I H ent7           BECOME PRIMARY
      
    • The SEA on the second VIO Server becomes backup, then primary:
    • 40D97644   1009203212 I H ent7           BECOME BACKUP
      E136EAFA   1009103912 I H ent7           BECOME PRIMARY
      40D97644   1009103812 I H ent7           BECOME BACKUP
      E136EAFA   1009022212 I H ent7           BECOME PRIMARY
      40D97644   1009022212 I H ent7           BECOME BACKUP
      E136EAFA   1009022112 I H ent7           BECOME PRIMARY
      40D97644   1009022112 I H ent7           BECOME BACKUP
      E136EAFA   1009022112 I H ent7           BECOME PRIMARY
      40D97644   1009022112 I H ent7           BECOME BACKUP
      

    This problem can really be important on production hosts and has to be detected "in real time". Trapping errpt errors and running scripts when an error is raised is possible with errnotify, and it is even more useful if the error is also raised on IBM Systems Director. Here is the method I used to send myself a mail every time a port flaps:

    errnotify configuration

    The first thing to do is to set up errnotify to run a script every time an Ethernet link goes down on a VIO Server:

    • Identify the error codes: on my VIO Servers I have to identify which error codes are raised when a link goes up or down.
    • This list is maybe not exhaustive, but these are the error codes I found in errpt when a link goes:
      • Down (MSNENT_LINK_DOWN and GOENT_LINK_DOWN) :
      • # oem_setup_env
        # errpt -t | grep ABB8A22B
        ABB8A22B MSNENT_LINK_DOWN    TEMP H  ETHERNET DOWN
        # errpt -t | grep EC0BCCD4
        EC0BCCD4 GOENT_LINK_DOWN     TEMP H  ETHERNET DOWN
        
      • Up (MSNENT_RCVRY_EXIT and GOENT_RCVRY_EXIT) :
      • # oem_setup_env
        # errpt -t | grep 4969AE33
        4969AE33 MSNENT_RCVRY_EXIT   INFO H  ETHERNET NETWORK RECOVERY MODE
        # errpt -t |grep F3931284
        F3931284 GOENT_RCVRY_EXIT    INFO H  ETHERNET NETWORK RECOVERY MODE
        
    • With these errpt identifiers, create an ODM entries file and add it to the ODM:
    • # vi errnotify.odmadd
      errnotify:
          en_pid = 0
          en_name = "MSNENT_RCVRY_EXIT"
          en_persistenceflg = 1
          en_label = "MSNENT_RCVRY_EXIT"
          en_class = H
          en_crcid = 0
          en_method = "/usr/lib/ras/link_state.notify $1 $2 $3 $4 $5 $6 $7 $8 $9"
      errnotify:
          en_pid = 0
          en_name = "MSNENT_LINK_DOWN"
          en_persistenceflg = 1
          en_label = "MSNENT_LINK_DOWN"
          en_class = H
          en_crcid = 0
          en_method = "/usr/lib/ras/link_state.notify $1 $2 $3 $4 $5 $6 $7 $8 $9"
      errnotify:
          en_pid = 0
          en_name = "GOENT_LINK_DOWN"
          en_persistenceflg = 1
          en_label = "GOENT_LINK_DOWN"
          en_class = H
          en_crcid = 0
          en_method = "/usr/lib/ras/link_state.notify $1 $2 $3 $4 $5 $6 $7 $8 $9"
      errnotify:
          en_pid = 0
          en_name = "GOENT_RCVRY_EXIT"
          en_persistenceflg = 1
          en_label = "GOENT_RCVRY_EXIT"
          en_class = H
          en_crcid = 0
          en_method = "/usr/lib/ras/link_state.notify $1 $2 $3 $4 $5 $6 $7 $8 $9"
      # odmadd errnotify.odmadd
      
    • Check the entries are correctly added to the ODM:
    • # odmget errnotify | tail -16
      errnotify:
              en_pid = 0
              en_name = "GOENT_RCVRY_EXI"
              en_persistenceflg = 1
              en_label = "GOENT_RCVRY_EXIT"
              en_crcid = 0
              en_class = "H"
              en_type = ""
              en_alertflg = ""
              en_resource = ""
              en_rtype = ""
              en_rclass = ""
              en_symptom = ""
              en_err64 = ""
              en_dup = ""
              en_method = "/usr/lib/ras/link_state.notify $1 $2 $3 $4 $5 $6 $7 $8 $9"
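
    Notice in the output above that en_name was silently truncated to "GOENT_RCVRY_EXI" (the field is apparently limited to 15 characters, judging from this output). If you ever need to change or remove one of these entries, odmdelete does the job; querying on en_label, which was not truncated, is safer here (then re-run odmadd with the corrected file):

    # odmdelete -o errnotify -q "en_label=GOENT_RCVRY_EXIT"
    # odmadd errnotify.odmadd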
      

    Writing the script called by errnotify, using genevent.sh

    For my own use I need to write to a file every time a link goes up or down; this is the first thing the script does. But what I want most is to generate an event on Systems Director. This can be done with a script delivered with the Common Agent called genevent.sh. Let's have a look at my link_state.notify script:

    # cat /usr/lib/ras/link_state.notify
    #!/bin/ksh
    # keep a local trace of every link event ($1..$9 are the arguments passed by errnotify)
    echo "`date` | $1 $2 $3 $4 $5 $6 $7 $8 $9" >> /home/padmin/vio_mon/vio_mon.errnotify
    # raise an event on IBM Systems Director ($6 is the resource name, $9 the error label)
    /var/opt/tivoli/ep/runtime/agent/subagents/director/genevent.sh /type:"Managed Resource.Managed System Resource.Logical Resource.Logical Device.Logical Port.Network Port.Ethernet Port" /text:"Link $6 $9" /sev:0
    

    The echo line, as I told you before, is for my own use; the second command calls genevent.sh with:

    • /type : the event type generated on Systems Director, in my case Managed Resource.Managed System Resource.Logical Resource.Logical Device.Logical Port.Network Port.Ethernet Port.
    • /text : the event text; $6 is the Ethernet adapter and $9 the errpt label (GOENT_RCVRY_EXIT, etc.).
    • /sev : the event severity, 0 for fatal, 1 for critical, and so on.
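
    You can test the Systems Director side independently of errnotify by calling genevent.sh by hand with the same kind of arguments (run as root on the VIO Server; the event text here is obviously made up):

    # /var/opt/tivoli/ep/runtime/agent/subagents/director/genevent.sh /type:"Managed Resource.Managed System Resource.Logical Resource.Logical Device.Logical Port.Network Port.Ethernet Port" /text:"Link ent1 TEST" /sev:0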

    Create Event filter, Event Action and Event Automation Plan

    As shown in the image below, an error is raised on Systems Director when a link is flapping:

    Right click on this event to create an Event Filter; mine is called "Link Down on VIO Server":

    Create a new Event Action; in my case I want to send an email to my mailbox to notify a link down (IP and e-mail addresses are hidden in this screenshot):

    With this Event Filter and this Event Action, create an Event Automation Plan to send a mail when a link is flapping; when creating the Event Automation Plan, use the newly created Event Filter and Event Action:

    • Event filter choice :
    • Event action choice :
    • Event automation plan creation summary :

    Testing

    Do not forget to test this newly created Event Automation Plan: call your network team to shut/no shut a Shared Ethernet Adapter port. You'll receive a new mail in your mailbox:

    Link ent1 MSNENT_RCVRY_EXIT
    
    Event Text     Link ent1 MSNENT_RCVRY_EXIT
    Date           10/18/2012 6:21 PM CEST
    Severity       Fatal
    Event Type     Managed Resource.Managed System Resource.Logical Resource.Logical Device.Logical Port.Network Port.Ethernet Port
    System Name    vio35
    Sender Name    vio35
    

    Hope this can help.