Unleash the true potential of SRIOV vNICs using vNIC failover!

I’m always working on a tight schedule; I never have the time to write documentation because we’re moving fast, very fast … but not as fast as I want to ;-). A few months ago we were asked to put the TSM servers in our PowerVC environment. I thought it was a very, very bad idea to put a pet among the cattle, as TSM servers are very specific and super I/O intensive in our environment (and are configured with plenty of rmt devices; it would have meant putting lan-free stuff into OpenStack, which is not designed at all for this kind of thing). In my previous place we tried to put the TSM servers behind a virtualized environment (serving the network through Shared Ethernet Adapters) and it was an EPIC FAIL. A few weeks after putting the servers in production we decided to move back to physical I/O and used dedicated network adapters. As we didn't want to make the same mistake in my current place, we decided not to go with Shared Ethernet Adapters. Instead we took the decision to use SRIOV vNICs. SRIOV vNICs have the advantage of being fully virtualized (LPM aware and super flexible), giving us the flexibility we wanted (being able to move the TSM servers between sites if we need to put a host in maintenance mode or if we are facing any kind of outage). In my previous blog post about vNICs I was very happy with the performance but not with the reliability. I didn't want to go with NIB adapters for network redundancy, because it is an anti-virtualization way of doing things (we do not want to manage anything inside the VM, we want to let the virtualization environment do the job for us). Lucky for me the project was rescheduled to the end of the year and we finally took the decision not to put the TSM servers into our big OpenStack, dedicating some hosts to the backup stuff instead. The latest versions of PowerVM, HMC and firmware arrived just in time to let me use the new SRIOV vNIC failover feature for this new TSM environment (fortunately for me we had some data center issues allowing me to wait long enough to skip NIB and start production directly with SRIOV vNIC failover \o/). I delivered the first four servers to my backup team yesterday and I must admit that SRIOV vNIC failover is a killer feature for this kind of workload. Let's now see how to set this up!

Prerequisites

As always, using the latest features means you need to have everything up to date. In this case the minimal requirements for SRIOV vNIC failover are Virtual I/O Server 2.2.5.10, Hardware Management Console V8R8.6.0 with the latest patches, and an up-to-date firmware (i.e. fw 860). Note that not all AIX versions support SRIOV vNICs; here I'm only using AIX 7.2 TL1 SP1:

  • Check the Virtual I/O Servers are installed with 2.2.5.10:
  • # ioslevel
    2.2.5.10
    
  • Check the HMC is at the latest version (V8R8.6.0):
  • hscroot@myhmc:~> lshmc -V
    "version= Version: 8
     Release: 8.6.0
     Service Pack: 0
    HMC Build level 20161101.1
    MH01655: Required fix for HMC V8R8.6.0 (11-01-2016)
    ","base_version=V8R8.6.0
    "
    

    860

  • Check (and if needed update) the firmware level on the Power System:
  • # updlic -o u -t sys -l latest -m reptilian-9119-MME-659707C -r mountpoint -d /home/hscroot/860_056/ -v
    # lslic -m reptilian-9119-MME-65BA46F -F activated_level,activated_spname
    56,FW860.10
    

    fw

What is SRIOV vNIC failover and how does it work?

I'll not explain here what an SRIOV vNIC is; if you want to know more about it just check my previous blog post on the topic, A first look at SRIOV vNIC adapters. What failover adds is the ability to define as "many" backing devices as you want for a vNIC adapter (the maximum is 6 backing devices). For each backing device you can choose on which Virtual I/O Server the corresponding vnicserver will be created, and set a failover priority that determines which backing device is active. Keep in mind that priorities work the exact same way as they do with Shared Ethernet Adapters: priority 10 is a higher priority than priority 20.

vnicvisio1

In the example shown in the images above and below, the vNIC is configured with two backing devices (on two different SRIOV adapters) with priorities 10 and 20. As long as there is no outage (for instance on the Virtual I/O Server or on the adapter itself) the physical port used will be the one with priority 10. If that adapter has, for instance, a hardware issue, we can either manually fail over to the second backing device or let the hypervisor do it for us by picking the backing device with the next highest priority. Easy. This gives us redundant, LPM aware, high performance adapters that are fully virtualized. A MUST :-)!

vnicvisio2
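
To see at a glance which backing device currently carries the traffic, you can query the vNIC from the HMC command line (the same lshwres command is used again in the CLI section below): in the backing_device_states field the entry flagged with 1 is the active one, the others are flagged 0. A minimal sketch using the partition and frame names of my example:

    # lshwres -r virtualio -m reptilian-9119-MME-65BA46F --rsubtype vnic --level lpar --filter "lpar_names=lizard"
    [..]
    "backing_device_states=sriov/2700400b/1/Operational,sriov/2700c008/0/Operational"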

Creating an SRIOV vNIC failover adapter using the HMC GUI and administering it

To create or delete an SRIOV vNIC failover adapter (I'll just call it a vNIC for the rest of the blog post) the machine must be shut down or active (it is not possible to add a vNIC when a machine is booted in OpenFirmware). The only way to do this using the HMC GUI is to use the enhanced interface (no problem, as we will have no other choice in the near future anyway). Select the machine on which you want to create the adapter and click on the "Virtual NICs" tab.

vnic1b

Click “Add Virtual NIC”:

vnic1c

Choose the "Physical Port Location Code" (the physical port of the SRIOV adapter) on which you want to create the vNIC. You can add from one to six backup adapters (by clicking the "Add Entry" button). Only one backing device will be active at a time; if it fails (adapter issue, network issue) the vNIC will fail over to the next backup adapter depending on the "Failover priority". Be careful to spread the backing devices across the hosting Virtual I/O Servers, to be sure that a Virtual I/O Server outage will be seamless for your partition:

vnic1d

In the example above:

  • I’m creating a failover vNIC with "vNIC Auto Priority Failover" enabled.
  • Four VFs will be created: two on the VIOS ending with 88, two on the VIOS ending with 89.
  • Obviously four vnicservers will be created on the VIOSes (two on each).
  • The lowest priority number takes the lead. This means that if the first one with priority 10 fails, the active adapter will be the second one. Then if the second one with priority 20 fails, the third one becomes active, and so on. Keep in mind that as long as your lowest-priority backing device is ok, nothing happens if one of the other backup adapters fails. Be smart when choosing the priorities. As Yoda says, "Wise you must be!".
  • The physical ports are located on different CECs.

vnic1e

The "Advanced Virtual NIC Settings" are applied to all the backing devices that will be created (4 in the example above). For instance I'm using VLAN tagging on these ports, so I only need to set the "Port VLAN ID" once.

vnic1f

You can choose whether or not to allow the hypervisor to perform the failover/failback automatically depending on the priorities you have set. If you click "enable", the hypervisor will automatically fail over to the next operational backing device depending on the priorities. If it is disabled, only a user can trigger a failover operation.

vnic1g
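
If you prefer the command line, the same toggle can be flipped with chhwres; this is the exact syntax used later in the CLI section (and presumably setting auto_priority_failover=0 disables it):

    # chhwres -m reptilian-9119-MME-65BA46F -r virtualio -o s --rsubtype vnic -p lizard -s 6 -a "auto_priority_failover=1"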

Be careful: the priorities are designed the same way they are on Shared Ethernet Adapters. The lowest failover priority number is the "highest failover priority", just like it is for Shared Ethernet Adapters. In the image below you can notice that priority 10, which is the "highest failover priority", is active (it is the lowest number among 10, 20, 30 and 40).

vnic1h

After the creation of the vNIC you can check different things on the Virtual I/O Servers. You will notice that every entry added when creating the vNIC has a corresponding VF (virtual function) and a corresponding vnicserver (each vnicserver has a VF mapped to it):

  • You can see that for each entry added when creating a vNIC you’ll have the corresponding VF device present on the Virtual I/O Servers:
  • vios1# lsdev -type adapter -field name physloc description | grep "VF"
    [..]
    ent3             U78CA.001.CSS08ZN-P1-C3-C1-T2-S5                                  PCIe3 4-Port 10GbE SR Adapter VF(df1028e21410e304)
    ent4             U78CA.001.CSS08EL-P1-C3-C1-T2-S6                                  PCIe3 4-Port 10GbE SR Adapter VF(df1028e21410e304)
    
    vios2# lsdev -type adapter -field name physloc description | grep "VF"
    [..]
    ent3             U78CA.001.CSS08ZN-P1-C4-C1-T2-S2                                  PCIe3 4-Port 10GbE SR Adapter VF(df1028e21410e304)
    ent4             U78CA.001.CSS08EL-P1-C4-C1-T2-S2                                  PCIe3 4-Port 10GbE SR Adapter VF(df1028e21410e304)
    
  • For each VF you’ll see the corresponding vnicserver devices:
  • vios1# lsdev -type adapter -virtual | grep vnicserver
    [..]
    vnicserver1      Available   Virtual NIC Server Device (vnicserver)
    vnicserver2      Available   Virtual NIC Server Device (vnicserver)
    
    vios2# lsdev -type adapter -virtual | grep vnicserver
    [..]
    vnicserver1      Available   Virtual NIC Server Device (vnicserver)
    vnicserver2      Available   Virtual NIC Server Device (vnicserver)
    
  • You can check the corresponding mapped VF for each vnicserver using the 'lsmap' command. One funny thing to note: as long as a backing device has never been made active (with the "Make the Backing Device Active" button in the GUI), the corresponding client name and client device are not shown:
  • vios1# lsmap -all -vnic -fmt :
    [..]
    vnicserver1:U9119.MME.659707C-V2-C32898:6:lizard:AIX:ent3:Available:U78CA.001.CSS08ZN-P1-C3-C1-T2-S5:ent0:U9119.MME.659707C-V6-C6
    vnicserver2:U9119.MME.659707C-V2-C32899:6:N/A:N/A:ent4:Available:U78CA.001.CSS08EL-P1-C3-C1-T2-S6:N/A:U9119.MME.659707C-V6-C6
    
    vios2# lsmap -all -vnic
    [..]
    Name          Physloc                            ClntID ClntName       ClntOS
    ------------- ---------------------------------- ------ -------------- -------
    vnicserver1   U9119.MME.659707C-V1-C32898             6 N/A            N/A
    
    Backing device:ent3
    Status:Available
    Physloc:U78CA.001.CSS08ZN-P1-C4-C1-T2-S2
    Client device name:ent0
    Client device physloc:U9119.MME.659707C-V6-C6
    
    Name          Physloc                            ClntID ClntName       ClntOS
    ------------- ---------------------------------- ------ -------------- -------
    vnicserver2   U9119.MME.659707C-V1-C32899             6 N/A            N/A
    
    Backing device:ent4
    Status:Available
    Physloc:U78CA.001.CSS08EL-P1-C4-C1-T2-S2
    Client device name:N/A
    Client device physloc:U9119.MME.659707C-V6-C6
    
  • You can activate a backing device yourself just by clicking the "Make Backing Device Active" button in the GUI, and then check that the vnicserver is now logged in:
  • vnic1i
    vnic1j

    vios2# lsmap -all -vnic -fmt :
    [..]
    vnicserver1:U9119.MME.659707C-V1-C32898:6:lizard:AIX:ent3:Available:U78CA.001.CSS08ZN-P1-C4-C1-T2-S2:ent0:U9119.MME.659707C-V6-C6
    vnicserver2:U9119.MME.659707C-V1-C32899:6:N/A:N/A:ent4:Available:U78CA.001.CSS08EL-P1-C4-C1-T2-S2:N/A:U9119.MME.659707C-V6-C6
    
  • I noticed something that seems pretty strange to me: when you perform a manual failover of the vNIC, auto priority failover is set to disabled. Remember to re-enable it after the manual operation has been performed:
  • vnic1k

    You can also check the status and the priority of the vNIC on the Virtual I/O Server using the vnicstat command. The command shows some good information: the state of the device, whether it is active or not (I noticed 2 different states in my tests, "active", meaning this is the VF/vnicserver currently in use, and "config_2", meaning the adapter is ready and available for a failover operation (there is probably another state when the link is down but I didn't have the time to ask my network team to shut a port to verify this)), and finally the failover priority. Note that vnicstat is a root command.

    vios1#  vnicstat vnicserver1
    
    --------------------------------------------------------------------------------
    VNIC Server Statistics: vnicserver1
    --------------------------------------------------------------------------------
    Device Statistics:
    ------------------
    State: active
    Backing Device Name: ent3
    
    Failover State: active
    Failover Readiness: operational
    Failover Priority: 10
    
    Client Partition ID: 6
    Client Partition Name: lizard
    Client Operating System: AIX
    Client Device Name: ent0
    Client Device Location Code: U9119.MME.659707C-V6-C6
    [..]
    
    vios2# vnicstat vnicserver1
    --------------------------------------------------------------------------------
    VNIC Server Statistics: vnicserver1
    --------------------------------------------------------------------------------
    Device Statistics:
    ------------------
    State: config_2
    Backing Device Name: ent3
    
    Failover State: inactive
    Failover Readiness: operational
    Failover Priority: 20
    [..]
    

    You can also check the vNIC server events in the errpt (client login on failover, and so on …):

    # errpt | more
    8C577CB6   1202195216 I S vnicserver1    VNIC Transport Event
    60D73419   1202194816 I S vnicserver1    VNIC Client Login
    # errpt -aj 60D73419 | more
    ---------------------------------------------------------------------------
    LABEL:          VS_CLIENT_LOGIN
    IDENTIFIER:     60D73419
    
    Date/Time:       Fri Dec  2 19:48:06 2016
    Sequence Number: 10567
    Machine Id:      00C9707C4C00
    Node Id:         vios2
    Class:           S
    Type:            INFO
    WPAR:            Global
    Resource Name:   vnicserver1
    
    Description
    VNIC Client Login
    
    Probable Causes
    VNIC Client Login
    
    Failure Causes
    VNIC Client Login
    

    Same thing using the HMC command line.

    Now we will do the same thing on the command line. I warn you, the commands are pretty huge!!!!

    • List the SRIOV adapters (you will need their adapter IDs to create the vNICs):
    • # lshwres -r sriov --rsubtype adapter -m reptilian-9119-MME-65BA46F
      adapter_id=3,slot_id=21010012,adapter_max_logical_ports=64,config_state=sriov,functional_state=1,logical_ports=64,phys_loc=U78CA.001.CSS08XH-P1-C3-C1,phys_ports=4,sriov_status=running,alternate_config=0
      adapter_id=4,slot_id=21010013,adapter_max_logical_ports=64,config_state=sriov,functional_state=1,logical_ports=64,phys_loc=U78CA.001.CSS08XH-P1-C4-C1,phys_ports=4,sriov_status=running,alternate_config=0
      adapter_id=1,slot_id=21010022,adapter_max_logical_ports=64,config_state=sriov,functional_state=1,logical_ports=64,phys_loc=U78CA.001.CSS08RG-P1-C3-C1,phys_ports=4,sriov_status=running,alternate_config=0
      adapter_id=2,slot_id=21010023,adapter_max_logical_ports=64,config_state=sriov,functional_state=1,logical_ports=64,phys_loc=U78CA.001.CSS08RG-P1-C4-C1,phys_ports=4,sriov_status=running,alternate_config=0
      
    • List the vNICs of the virtual machine "lizard":
    • lshwres -r virtualio  -m reptilian-9119-MME-65BA46F --rsubtype vnic --level lpar --filter "lpar_names=lizard"
      lpar_name=lizard,lpar_id=6,slot_num=6,desired_mode=ded,curr_mode=ded,auto_priority_failover=0,port_vlan_id=0,pvid_priority=0,allowed_vlan_ids=all,mac_addr=6ac53577b106,allowed_os_mac_addrs=all,"backing_devices=sriov/vios1/1/3/0/2700c003/2.0/2.0/50,sriov/vios2/2/1/0/27004003/2.0/2.0/60","backing_device_states=sriov/2700c003/0/Operational,sriov/27004003/1/Operational"
      
    • Create a vNIC with 2 backing devices: the first one on Virtual I/O Server 1, adapter 1, physical port 2, with a failover priority of 10; the second one on Virtual I/O Server 2, adapter 3, physical port 2, with a failover priority of 20 (this vNIC will take the next available slot, which will be 6) (WARNING: physical port numbering starts from 0):
    • #chhwres -r virtualio -m reptilian-9119-MME-65BA46F -o a -p lizard --rsubtype vnic -v -a 'port_vlan_id=3455,auto_priority_failover=1,backing_devices="sriov/vios1//1/1/2.0/10,sriov/vios2//3/1/2.0/20"'
      #lshwres -r virtualio -m reptilian-9119-MME-65BA46F --rsubtype vnic --level lpar --filter "lpar_names=lizard"
      lpar_name=lizard,lpar_id=6,slot_num=6,desired_mode=ded,curr_mode=ded,auto_priority_failover=1,port_vlan_id=3455,pvid_priority=0,allowed_vlan_ids=all,mac_addr=6ac53577b106,allowed_os_mac_addrs=all,"backing_devices=sriov/vios1/1/1/1/2700400b/2.0/2.0/10,sriov/vios2/2/3/1/2700c008/2.0/2.0/20","backing_device_states=sriov/2700400b/1/Operational,sriov/2700c008/0/Operational"
      
    • Add two backing devices (one on each VIOS, on adapters 2 and 4, both on physical port 2, with failover priorities of 30 and 40) to the vNIC in slot 6:
    • # chhwres -r virtualio -m reptilian-9119-MME-65BA46F -o s --rsubtype vnic -p lizard -s 6 -a '"backing_devices+=sriov/vios1//2/1/2.0/30,sriov/vios2//4/1/2.0/40"'
      # lshwres -r virtualio -m reptilian-9119-MME-65BA46F --rsubtype vnic --level lpar --filter "lpar_names=lizard"
      lpar_name=lizard,lpar_id=6,slot_num=6,desired_mode=ded,curr_mode=ded,auto_priority_failover=1,port_vlan_id=3455,pvid_priority=0,allowed_vlan_ids=all,mac_addr=6ac53577b106,allowed_os_mac_addrs=all,"backing_devices=sriov/vios1/1/1/1/2700400b/2.0/2.0/10,sriov/vios2/2/3/1/2700c008/2.0/2.0/20,sriov/vios1/1/2/1/27008005/2.0/2.0/30,sriov/vios2/2/4/1/27010002/2.0/2.0/40","backing_device_states=sriov/2700400b/1/Operational,sriov/2700c008/0/Operational,sriov/27008005/0/Operational,sriov/27010002/0/Operational"
      
    • Change the failover priority of logical port 2700400b of the vNIC in slot 6 to 11:
    • # chhwres -m reptilian-9119-MME-65BA46F -r virtualio -o s --rsubtype vnicbkdev -p lizard -s 6 --logport 2700400b -a "failover_priority=11"
      # lshwres -r virtualio -m reptilian-9119-MME-65BA46F --rsubtype vnic --level lpar --filter "lpar_names=lizard"
      lpar_name=lizard,lpar_id=6,slot_num=6,desired_mode=ded,curr_mode=ded,auto_priority_failover=1,port_vlan_id=3455,pvid_priority=0,allowed_vlan_ids=all,mac_addr=6ac53577b106,allowed_os_mac_addrs=all,"backing_devices=sriov/vios1/1/1/1/2700400b/2.0/2.0/11,sriov/vios2/2/3/1/2700c008/2.0/2.0/20,sriov/vios1/1/2/1/27008005/2.0/2.0/30,sriov/vios2/2/4/1/27010002/2.0/2.0/40","backing_device_states=sriov/2700400b/1/Operational,sriov/2700c008/0/Operational,sriov/27008005/0/Operational,sriov/27010002/0/Operational"
      
    • Make logical port 27008005 active on vNIC in slot 6:
    • # chhwres -m reptilian-9119-MME-65BA46F -r virtualio -o act --rsubtype vnicbkdev -p lizard  -s 6 --logport 27008005 
      # lshwres -r virtualio -m reptilian-9119-MME-65BA46F --rsubtype vnic --level lpar --filter "lpar_names=lizard"
      lpar_name=lizard,lpar_id=6,slot_num=6,desired_mode=ded,curr_mode=ded,auto_priority_failover=0,port_vlan_id=3455,pvid_priority=0,allowed_vlan_ids=all,mac_addr=6ac53577b106,allowed_os_mac_addrs=all,"backing_devices=sriov/vios1/1/1/1/2700400b/2.0/2.0/11,sriov/vios2/2/3/1/2700c008/2.0/2.0/20,sriov/vios1/1/2/1/27008005/2.0/2.0/30,sriov/vios2/2/4/1/27010002/2.0/2.0/40","backing_device_states=sriov/2700400b/0/Operational,sriov/2700c008/0/Operational,sriov/27008005/1/Operational,sriov/27010002/0/Operational"
      
    • Re-enable automatic failover on vNIC in slot 6:
    • # chhwres -m reptilian-9119-MME-65BA46F -r virtualio -o s --rsubtype vnic -p lizard  -s 6 -a "auto_priority_failover=1"
      # lshwres -r virtualio -m reptilian-9119-MME-65BA46F --rsubtype vnic --level lpar --filter "lpar_names=lizard"
      lpar_name=lizard,lpar_id=6,slot_num=6,desired_mode=ded,curr_mode=ded,auto_priority_failover=1,port_vlan_id=3455,pvid_priority=0,allowed_vlan_ids=all,mac_addr=6ac53577b106,allowed_os_mac_addrs=all,"backing_devices=sriov/vios1/1/1/1/2700400b/2.0/2.0/11,sriov/vios2/2/3/1/2700c008/2.0/2.0/20,sriov/vios1/1/2/1/27008005/2.0/2.0/30,sriov/vios2/2/4/1/27010002/2.0/2.0/40","backing_device_states=sriov/2700400b/1/Operational,sriov/2700c008/0/Operational,sriov/27008005/0/Operational,sriov/27010002/0/Operational"
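
    For completeness, removing the whole vNIC should follow the usual DLPAR remove pattern; the exact flags below are an assumption, so double check the chhwres man page on your HMC level before using them:

      # chhwres -r virtualio -m reptilian-9119-MME-65BA46F -o r --rsubtype vnic -p lizard -s 6    # assumed syntax: remove the vNIC in slot 6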
      

    Testing the failover.

    It's now time to test if the failover is working as intended. The test will be super simple: I will just shut off one of the two Virtual I/O Servers and check whether I'm losing packets or not. I'm first checking on which VIOS the active backing device is located:

    vnic1l

    I now need to shut down the Virtual I/O Server ending with 88 and check that the one ending with 89 takes the lead:

    *****88# shutdown -force 
    

    Priorities 10 and 30 are on the Virtual I/O Server that was shut down; the highest remaining priority, on the surviving Virtual I/O Server, is 20. This backing device hosted on the second Virtual I/O Server is now serving the network I/Os:

    vnic1m

    You can check the same thing from the command line on the remaining Virtual I/O Server:

    *****89# errpt | more
    IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
    60D73419   1202214716 I S vnicserver0    VNIC Client Login
    60D73419   1202214716 I S vnicserver1    VNIC Client Login
    *****89# vnicstat vnicserver1
    --------------------------------------------------------------------------------
    VNIC Server Statistics: vnicserver1
    --------------------------------------------------------------------------------
    Device Statistics:
    ------------------
    State: active
    Backing Device Name: ent3
    
    Failover State: active
    Failover Readiness: operational
    Failover Priority: 20
    
    

    During my tests the failover worked as I expected. You can see in the picture below that I only lost a single ping (between sequence 64 and 66) during the failover/failback process.

    vnic1n

    In the partition I saw some messages in the errpt during the failover:

     # errpt | more
    4FB9389C   1202215816 I S ent0           VNIC Link Up
    F655DA07   1202215816 I S ent0           VNIC Link Down
    # errpt -a | more
    [..]
    SOURCE ADDRESS
    56FB 2DB8 A406
    Event
    physical link: DOWN   logical link: DOWN
    Status
    [..]
    SOURCE ADDRESS
    56FB 2DB8 A406
    Event
    physical link: UP   logical link: UP
    Status
    

    What about Live Partition Mobility?

    If you want a seamless LPM experience, without having to choose the destination adapter and physical port on which to map your current vNIC backing devices, just fill in the label and sublabel (the label is the most important) for each physical port of your SRIOV adapters. Then, during the LPM, if the names are aligned between the two systems the right physical port will be automatically chosen based on the label names:

    vnic1o
    vnic1p
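
    The labels can also be set from the HMC command line (the screenshots above show the GUI way); the attribute names below are an assumption, so verify them against the lshwres/chhwres documentation for your HMC level:

    # lshwres -r sriov --rsubtype physport --level eth -m reptilian-9119-MME-659707C -F adapter_id,phys_port_id,phys_port_label,phys_port_sub_label   # assumed attribute names
    # chhwres -r sriov --rsubtype physport -m reptilian-9119-MME-659707C -o s -a "adapter_id=1,phys_port_id=1,phys_port_label=BACKUP-PROD"            # assumed syntax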

    LPM was working like a charm and I didn't notice any particular problem during the move. vNIC failover and LPM work fine as long as you take care of your SRIOV labels :-). I did notice on AIX 7.2 TL1 SP1 that there were no errpt messages in the partition itself, just in the Virtual I/O Servers … weird :-)

    # errpt | more
    IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
    3EB09F5A   1202222416 I S Migration      Migration completed successfully
    

    Conclusion.

    No long story here. If you need performance AND flexibility you absolutely have to use SRIOV failover vNICs. This feature offers you the best of both worlds: the ability to dedicate 10Gb adapters with a failover capability, without having to worry about LPM or about NIB configuration. It's not applicable in all cases but it's definitely something to have for an environment such as TSM or other network I/O intensive workloads. Use it!

    About reptilians!

    Before you start reading this, keep your sense of humor and note that what I say is not related to my workplace at all; it's a general way of thinking, not especially based on my own experience. Don't be offended by this, it's just a personal opinion based on things I may or may not have seen during my life. You've been warned.

    This blog was never a place to share my opinions about life and society, but I must admit that I should have done it before. Speaking about this kind of thing makes you feel alive in a world where everything needs to be ok and where you no longer have the right to feel or express something about what you are living through. There are a couple of good blog posts about this kind of thing related to the IT world, and I agree with everything said in them. Some of the authors are just telling what they love in their daily jobs, but I think it's also a way of saying what they probably wouldn't love in another one :-) :

    • Adam Leventhal’s “I’m not a resource”: here
    • Brendan Gregg's "Working at Netflix in 2016": here

    All of this to say that I work at night, I work on weekends, I'm thinking about Power Systems/computers when I fall asleep. I always have new ideas and I always want to learn new things, discover new technologies and features. I truly, deeply love this, but being like this does not help me and will never help me in my daily job, for one single reason: in this world the people who have the knowledge are not the people who take the technical decisions. It's sad but true. I'm just good at working as much as I can for the least money possible. Nobody cares if techs are happy, unhappy, want to stay or leave. It doesn't make any difference to anyone running a company. What's important is money. Everything is meaningless. We are no one, we are nothing, just numbers in an Excel spreadsheet. I'm probably saying this because I'm not good enough at anything to find an acceptable workplace. Once again, sad but true.

    Even worse, if you just want to follow what the industry is asking for, you have to be everywhere and know everything. I know I'll be forced in the very near future to move to DevOps/Linux (I love Linux, I'm an RHCE certified engineer!). That's why, for a couple of years now, at night after my daily job is finished I'm working again: working to understand how Docker works, working to install my own OpenStack on my own machines, working to understand SaltStack, Ceph, Python, Ruby, Go … it's a never ending process. But it's still not enough for them! Not enough to be considered good, or a good enough guy to fit the job. I remember being asked to know about OpenStack, Cassandra, Hadoop, AWS, KVM, Linux, automation tools (Puppet this time), Docker and continuous integration for one single job application. First, I seriously doubt that anyone has all these skills and is good at each of them. Second, even if I were an expert in each one, a few years ago it was the exact same thing but with different products. You have to understand and be good at every new product in minutes. All of this to say that one or two years after you are considered an "expert", you are bad at everything else that exists in the industry. I'm really sick of this fight against something I can't control. Being a hard worker and clever enough to understand every new feature is not enough nowadays. On top of that you also need to be a beautiful person with a nice perfect smile wearing a perfect suit. You also have to be on LinkedIn and be connected with the right persons. And even if all of these boxes are checked you still need to be lucky enough to be at the right place at the right moment. I'm so sick of this. Work doesn't pay. Only luck does. I don't want to live in this kind of world but I have to. Anyway, this is just my "two cents" way of thinking. Everything is probably a big trick orchestrated by those reptilian lizard men! ^^. Be good at what you do and don't care about what people think of you (even about your horrible French accent during your sessions) … that's the most important thing!

    picture-of-reptilian-alien

    NovaLink 'HMC Co-Management' and PowerVC 1.3.0.1 Dynamic Resource Optimizer

    Everybody now knows that I'm using PowerVC a lot in my current company. My environment is growing bigger and bigger and we are now managing more than 600 virtual machines with PowerVC (the goal is to reach ~3000 this year). Some of them were built by PowerVC itself and some of them were migrated through a homemade Python script calling the PowerVC REST API, moving our old vSCSI machines to the new full NPIV/Live Partition Mobility/PowerVC environment (still struggling with the "old men" to move to SSP, but I'm alone against everybody on this one). I'm happy with that, but (there is always a but) I'm facing a lot of problems. The first one is that we are doing more and more with PowerVC (virtual machine creation, virtual machine resizing, adding additional disks, moving machines with LPM, and finally using this Python script to migrate the old machines to the new environment). I realized that the machine hosting PowerVC was getting slower and slower: the more actions we performed, the more "unresponsive" PowerVC became. By this I mean that the GUI was slow and creating objects took longer and longer. By looking at the CPU graphs in lpar2rrd we noticed that the CPU consumption was growing as fast as the activity on PowerVC (check the graph below). The second problem was my teams (unfortunately for me, we have different teams doing different sorts of things here and everybody uses the Hardware Management Consoles their own way: some people rename the machines, making them unusable with PowerVC, some people change the profiles, disabling the synchronization, and even worse we have third-party tools used for capacity planning that make the Hardware Management Console unusable by PowerVC). The solution to all these problems is to use NovaLink and especially NovaLink co-management. By doing this the Hardware Management Consoles are restricted to a read-only view and PowerVC stops querying the HMCs, querying the NovaLink partition on each host directly instead.

    cpu_powervc

    What is NovaLink?

    If you are using PowerVC you know that it is based on OpenStack. Until now all the OpenStack services were running on the PowerVC host. If you check a PowerVC host today you can see that there is one nova-compute process per managed host. In the example below I'm managing ten hosts so I have ten different nova-compute processes running:

    # ps -ef | grep [n]ova-compute
    nova       627     1 14 Jan16 ?        06:24:30 /usr/bin/python /usr/bin/nova-compute --config-file /etc/nova/nova-9117MMD_10D6666.conf --log-file /var/log/nova/nova-compute.log --log-file /var/log/nova/nova-compute-9117MMD_10D6666.log
    nova       649     1 14 Jan16 ?        06:30:25 /usr/bin/python /usr/bin/nova-compute --config-file /etc/nova/nova-9117MMD_65E6666.conf --log-file /var/log/nova/nova-compute.log --log-file /var/log/nova/nova-compute-9117MMD_65E6666.log
    nova       664     1 17 Jan16 ?        07:49:27 /usr/bin/python /usr/bin/nova-compute --config-file /etc/nova/nova-9117MMD_1086666.conf --log-file /var/log/nova/nova-compute.log --log-file /var/log/nova/nova-compute-9117MMD_1086666.log
    nova       675     1 19 Jan16 ?        08:40:27 /usr/bin/python /usr/bin/nova-compute --config-file /etc/nova/nova-9117MMD_06D6666.conf --log-file /var/log/nova/nova-compute.log --log-file /var/log/nova/nova-compute-9117MMD_06D6666.log
    nova       687     1 18 Jan16 ?        08:15:57 /usr/bin/python /usr/bin/nova-compute --config-file /etc/nova/nova-9117MMD_6576666.conf --log-file /var/log/nova/nova-compute.log --log-file /var/log/nova/nova-compute-9117MMD_6576666.log
    nova       697     1 21 Jan16 ?        09:35:40 /usr/bin/python /usr/bin/nova-compute --config-file /etc/nova/nova-9117MMD_6556666.conf --log-file /var/log/nova/nova-compute.log --log-file /var/log/nova/nova-compute-9117MMD_6556666.log
    nova       712     1 13 Jan16 ?        06:02:23 /usr/bin/python /usr/bin/nova-compute --config-file /etc/nova/nova-9117MMD_10A6666.conf --log-file /var/log/nova/nova-compute.log --log-file /var/log/nova/nova-compute-9117MMD_10A6666.log
    nova       728     1 17 Jan16 ?        07:49:02 /usr/bin/python /usr/bin/nova-compute --config-file /etc/nova/nova-9117MMD_1016666.conf --log-file /var/log/nova/nova-compute.log --log-file /var/log/nova/nova-compute-9117MMD_1016666.log
    nova       752     1 17 Jan16 ?        07:34:45 /usr/bin/python /usr/bin/nova-compute --config-file /etc/nova/nova-9117MMD_1036666.conf --log-file /var/log/nova/nova-compute.log --log-file /var/log/nova/nova-compute-9119MHE_1036666.log
    nova       779     1 13 Jan16 ?        05:54:52 /usr/bin/python /usr/bin/nova-compute --config-file /etc/nova/nova-9117MMD_6596666.conf --log-file /var/log/nova/nova-compute.log --log-file /var/log/nova/nova-compute-9119MHE_6596666.log
    # ps -ef | grep [n]ova-compute | wc -l
    10
    

    The goal of NovaLink is to move these processes to a dedicated partition running on each managed host (each Power System). This partition is called the NovaLink partition. It runs an Ubuntu 15.10 Linux OS (little endian) (so it is only available on POWER8 hosts) and is in charge of running the OpenStack nova processes. By doing that you distribute the load across all the NovaLink partitions instead of loading a single PowerVC host. Even better, my understanding is that the NovaLink partition is able to communicate directly with the FSP. By using NovaLink you can stop using the Hardware Management Consoles and avoid their slowness. As the NovaLink partition is hosted on the host itself, the RMC connections can now use a direct link (IPv6) through the PowerVM hypervisor. No more RMC connection problems at all ;-), it's just awesome. NovaLink lets you choose between two modes of management:

    • Full NovaLink management: you install your new host directly with NovaLink on it and you will not need a Hardware Management Console anymore (in this case the NovaLink installation is in charge of deploying the Virtual I/O Servers and the SEAs).
    • NovaLink co-management: your host is already installed and you give write access (setmaster) to the NovaLink partition; the Hardware Management Console is limited in this mode (you will no longer be able to create partitions or modify profiles; it is not strictly a "read only" mode as you will still be able to start and stop the partitions and do a few things with the HMC, but you will be very limited).
    • You can still mix NovaLink and non-NovaLink managed hosts: P7/P6 managed by HMCs, P8 managed by HMCs, P8 NovaLink co-managed and P8 full NovaLink managed ;-).
    • Nova1

    Prerequisites

    As always, upgrade your systems to the latest code level if you want to use NovaLink and NovaLink co-management:

    • POWER8 only, with firmware version 840 (or later).
    • Virtual I/O Server 2.2.4.10 or later.
    • For NovaLink co-management: HMC V8R8.4.0.
    • Obviously, install NovaLink on each NovaLink managed system (install the latest NovaLink patch level).
    • PowerVC 1.3.0.1 or later

    NovaLink installation on an existing system

    I'll show you here how to install a NovaLink partition on an existing, already deployed system. Installing a new system from scratch is also possible. My advice is to look at this address to start, and to check this YouTube video showing how a system is installed from scratch:

    The goal of this post is to show you how to set up a co-managed system on an already existing system with Virtual I/O Servers already deployed on the host. My advice is to be very careful. The first thing you'll need to do is to create a partition (2 VP, 0.5 EC and 5GB of memory) (I'm calling it nova in the example below) and use a virtual optical device to load the NovaLink system on it. In the example below the machine is "SSP" backed. Be very careful when you do that: set up the profile name and all the configuration details before moving to co-managed mode … after that it will be harder for you to change things, as the new pvmctl command will be very new to you:

    # mkvdev -fbo -vadapter vhost0
    vtopt0 Available
    # lsrep
    Size(mb) Free(mb) Parent Pool         Parent Size      Parent Free
        3059     1579 rootvg                   102272            73216
    
    Name                                                  File Size Optical         Access
    PowerVM_NovaLink_V1.1_122015.iso                           1479 None            rw
    vopt_a19a8fbb57184aad8103e2c9ddefe7e7                         1 None            ro
    # loadopt -disk PowerVM_NovaLink_V1.1_122015.iso -vtd vtopt0
    # lsmap -vadapter vhost0 -fmt :
    vhost0:U8286.41A.21AFF8V-V2-C40:0x00000003:nova_b1:Available:0x8100000000000000:nova_b1.7f863bacb45e3b32258864e499433b52: :N/A:vtopt0:Available:0x8200000000000000:/var/vio/VMLibrary/PowerVM_NovaLink_V1.1_122015.iso: :N/A
    
    • At the grub page select the first entry:
    • install1

    • Wait for the machine to boot:
    • install2

    • Choose to perform an installation:
    • install3

    • Accept the licenses
    • install4

    • Set up the padmin user:
      install5

    • Enter your network configuration:
    • install6

    • Confirm the installation of the Ubuntu system:
    • install8

    • You can then modify anything you want in the configuration file (in my case the timezone):
    • install9

      By default NovaLink (I think, I'm not 100% sure) is designed to be installed on a SAS disk, so without multipathing. If like me you decide to install the NovaLink partition in a "boot-on-san" lpar, my advice is to launch the installation without any multipathing enabled (only one vSCSI adapter or one virtual fibre channel adapter). After the installation is completed, install the Ubuntu multipathd service and configure the second vSCSI or virtual fibre channel adapter. If you don't do that you may experience problems at installation time (RAID error). Please remember that you have to do this before enabling co-management. One last thing about the installation: it may take a lot of time to finish, so be patient (especially during the preseed step).

    install10
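
    For the boot-on-SAN case, here is a minimal sketch of what adding multipathing after the installation can look like on the Ubuntu-based NovaLink partition (standard Ubuntu package and service names; adapt /etc/multipath.conf to your storage vendor's recommendations, and remember to do it before enabling co-management):

    # apt-get install multipath-tools        # provides the multipathd service on Ubuntu
    # systemctl enable multipathd
    # systemctl start multipathd
    # multipath -ll                          # check that both paths to the NovaLink disks are seen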

    Updating to the latest code level

    The ISO file provided on the Entitled Software Support site is not updated with the latest available NovaLink code. Make a copy of the official repository available at this address: ftp://public.dhe.ibm.com/systems/virtualization/Novalink/debian. Serve the content of this FTP server on your own HTTP server (use the command below to copy it):

    # wget --mirror ftp://public.dhe.ibm.com/systems/virtualization/Novalink/debian
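
    Any web server can do the job of serving the mirrored tree. A quick, hedged sketch with Python's built-in HTTP server (Python 2 syntax, for a lab test only; in production use your regular web server), assuming you serve the directory layout created by the wget mirror:

    # cd public.dhe.ibm.com/systems/virtualization      # directory created by wget --mirror
    # python -m SimpleHTTPServer 80                     # the repository is then reachable under http://<your-server>/Novalink/debian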
    

    Modify /etc/apt/sources.list (and sources.list.d) and comment out all the available deb repositories to keep only your copy:

    root@nova:~# grep -v ^# /etc/apt/sources.list
    deb http://deckard.lab.chmod666.org/nova/Novalink/debian novalink_1.0.0 non-free
    root@nova:/etc/apt/sources.list.d# apt-get upgrade
    Reading package lists... Done
    Building dependency tree
    Reading state information... Done
    Calculating upgrade... Done
    The following packages will be upgraded:
      pvm-cli pvm-core pvm-novalink pvm-rest-app pvm-rest-server pypowervm
    6 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
    Need to get 165 MB of archives.
    After this operation, 53.2 kB of additional disk space will be used.
    Do you want to continue? [Y/n]
    Get:1 http://deckard.lab.chmod666.org/nova/Novalink/debian/ novalink_1.0.0/non-free pypowervm all 1.0.0.1-151203-1553 [363 kB]
    Get:2 http://deckard.lab.chmod666.org/nova/Novalink/debian/ novalink_1.0.0/non-free pvm-cli all 1.0.0.1-151202-864 [63.4 kB]
    Get:3 http://deckard.lab.chmod666.org/nova/Novalink/debian/ novalink_1.0.0/non-free pvm-core ppc64el 1.0.0.1-151202-1495 [2,080 kB]
    Get:4 http://deckard.lab.chmod666.org/nova/Novalink/debian/ novalink_1.0.0/non-free pvm-rest-server ppc64el 1.0.0.1-151203-1563 [142 MB]
    Get:5 http://deckard.lab.chmod666.org/nova/Novalink/debian/ novalink_1.0.0/non-free pvm-rest-app ppc64el 1.0.0.1-151203-1563 [21.1 MB]
    Get:6 http://deckard.lab.chmod666.org/nova/Novalink/debian/ novalink_1.0.0/non-free pvm-novalink ppc64el 1.0.0.1-151203-408 [1,738 B]
    Fetched 165 MB in 7s (20.8 MB/s)
    (Reading database ... 72094 files and directories currently installed.)
    Preparing to unpack .../pypowervm_1.0.0.1-151203-1553_all.deb ...
    Unpacking pypowervm (1.0.0.1-151203-1553) over (1.0.0.0-151110-1481) ...
    Preparing to unpack .../pvm-cli_1.0.0.1-151202-864_all.deb ...
    Unpacking pvm-cli (1.0.0.1-151202-864) over (1.0.0.0-151110-761) ...
    Preparing to unpack .../pvm-core_1.0.0.1-151202-1495_ppc64el.deb ...
    Removed symlink /etc/systemd/system/multi-user.target.wants/pvm-core.service.
    Unpacking pvm-core (1.0.0.1-151202-1495) over (1.0.0.0-151111-1375) ...
    Preparing to unpack .../pvm-rest-server_1.0.0.1-151203-1563_ppc64el.deb ...
    Unpacking pvm-rest-server (1.0.0.1-151203-1563) over (1.0.0.0-151110-1480) ...
    Preparing to unpack .../pvm-rest-app_1.0.0.1-151203-1563_ppc64el.deb ...
    Unpacking pvm-rest-app (1.0.0.1-151203-1563) over (1.0.0.0-151110-1480) ...
    Preparing to unpack .../pvm-novalink_1.0.0.1-151203-408_ppc64el.deb ...
    Unpacking pvm-novalink (1.0.0.1-151203-408) over (1.0.0.0-151112-304) ...
    Processing triggers for ureadahead (0.100.0-19) ...
    ureadahead will be reprofiled on next reboot
    Setting up pypowervm (1.0.0.1-151203-1553) ...
    Setting up pvm-cli (1.0.0.1-151202-864) ...
    Installing bash completion script /etc/bash_completion.d/python-argcomplete.sh
    Setting up pvm-core (1.0.0.1-151202-1495) ...
    addgroup: The group `pvm_admin' already exists.
    Created symlink from /etc/systemd/system/multi-user.target.wants/pvm-core.service to /usr/lib/systemd/system/pvm-core.service.
    0513-071 The ctrmc Subsystem has been added.
    Adding /usr/lib/systemd/system/ctrmc.service for systemctl ...
    0513-059 The ctrmc Subsystem has been started. Subsystem PID is 3096.
    Setting up pvm-rest-server (1.0.0.1-151203-1563) ...
    The user `wlp' is already a member of `pvm_admin'.
    Setting up pvm-rest-app (1.0.0.1-151203-1563) ...
    Setting up pvm-novalink (1.0.0.1-151203-408) ...
    

    NovaLink and HMC Co-Management configuration

    Before adding the hosts to PowerVC you still need to do the most important thing. After the installation is finished, enable the co-management mode to have a system managed by NovaLink while still connected to a Hardware Management Console:

    • Enable the powervm_mgmt_capable attribute on the nova partition:
    • # chsyscfg -r lpar -m br-8286-41A-2166666 -i "name=nova,powervm_mgmt_capable=1"
      # lssyscfg -r lpar -m br-8286-41A-2166666 -F name,powervm_mgmt_capable --filter "lpar_names=nova"
      nova,1
      
    • Enable co-management (please note here that you have to setmaster (you'll see that the curr_master_name is the HMC) and then relmaster (you'll see that the curr_master_name is the NovaLink partition; this is the state we want to be in)):
    • # lscomgmt -m br-8286-41A-2166666
      is_master=null
      # chcomgmt -m br-8286-41A-2166666 -o setmaster -t norm --terms agree
      # lscomgmt -m br-8286-41A-2166666
      is_master=1,curr_master_name=myhmc1,curr_master_mtms=7042-CR8*2166666,curr_master_type=norm,pend_master_mtms=none
      # chcomgmt -m br-8286-41A-2166666 -o relmaster
      # lscomgmt -m br-8286-41A-2166666
      is_master=0,curr_master_name=nova,curr_master_mtms=3*8286-41A*2166666,curr_master_type=norm,pend_master_mtms=none
      

    Going back to HMC managed system

    You can go back to a Hardware Management Console managed system whenever you want (set the master to the HMC, delete the nova partition and release the master from the HMC).

    # chcomgmt -m br-8286-41A-2166666 -o setmaster -t norm --terms agree
    # lscomgmt -m br-8286-41A-2166666
    is_master=1,curr_master_name=myhmc1,curr_master_mtms=7042-CR8*2166666,curr_master_type=norm,pend_master_mtms=none
    # chlparstate -o shutdown -m br-8286-41A-2166666 --id 9 --immed
    # rmsyscfg -r lpar -m br-8286-41A-2166666 --id 9
    # chcomgmt -o relmaster -m br-8286-41A-2166666
    # lscomgmt -m br-8286-41A-2166666
    is_master=0,curr_master_mtms=none,curr_master_type=none,pend_master_mtms=none
    

    Using NovaLink

    After the installation you are now able to log in to the NovaLink partition (you can gain root access with the "sudo su -" command). A new command called pvmctl is available on the NovaLink partition, allowing you to perform any action (stop or start a virtual machine, list the Virtual I/O Servers, …). Before trying to add the host, double check that the pvmctl command is working ok.

    padmin@nova:~$ pvmctl lpar list
    Logical Partitions
    +------+----+---------+-----------+---------------+------+-----+-----+
    | Name | ID |  State  |    Env    |    Ref Code   | Mem  | CPU | Ent |
    +------+----+---------+-----------+---------------+------+-----+-----+
    | nova | 3  | running | AIX/Linux | Linux ppc64le | 8192 |  2  | 0.5 |
    +------+----+---------+-----------+---------------+------+-----+-----+
    

    Adding hosts

    On the PowerVC side add the NovaLink host by choosing the NovaLink option:

    addhostnovalink

    Some deb packages (ibmpowervc-powervm) will be installed and configured on the NovaLink machine:

    addhostnovalink3
    addhostnovalink4

    By doing this, on each NovaLink machine you can check that a nova-compute process is now running (by adding the host, the deb packages were installed and configured on the NovaLink host):

    # ps -ef | grep nova
    nova      4392     1  1 10:28 ?        00:00:07 /usr/bin/python /usr/bin/nova-compute --config-file /etc/nova/nova.conf --log-file /var/log/nova/nova-compute.log
    root      5218  5197  0 10:39 pts/1    00:00:00 grep --color=auto nova
    # grep host_display_name /etc/nova/nova.conf
    host_display_name = XXXX-8286-41A-XXXX
    # tail -1 /var/log/apt/history.log
    Start-Date: 2016-01-18  10:27:54
    Commandline: /usr/bin/apt-get -o Dpkg::Options::=--force-confdef -o Dpkg::Options::=--force-confold -y install --force-yes --allow-unauthenticated ibmpowervc-powervm
    Install: python-keystoneclient:ppc64el (1.6.0-2.ibm.ubuntu1, automatic), python-oslo.reports:ppc64el (0.1.0-1.ibm.ubuntu1, automatic), ibmpowervc-powervm:ppc64el (1.3.0.1), python-ceilometer:ppc64el (5.0.0-201511171217.ibm.ubuntu1.199, automatic), ibmpowervc-powervm-compute:ppc64el (1.3.0.1, automatic), nova-common:ppc64el (12.0.0-201511171221.ibm.ubuntu1.213, automatic), python-oslo.service:ppc64el (0.11.0-2.ibm.ubuntu1, automatic), python-oslo.rootwrap:ppc64el (2.0.0-1.ibm.ubuntu1, automatic), python-pycadf:ppc64el (1.1.0-1.ibm.ubuntu1, automatic), python-nova:ppc64el (12.0.0-201511171221.ibm.ubuntu1.213, automatic), python-keystonemiddleware:ppc64el (2.4.1-2.ibm.ubuntu1, automatic), python-kafka:ppc64el (0.9.3-1.ibm.ubuntu1, automatic), ibmpowervc-powervm-monitor:ppc64el (1.3.0.1, automatic), ibmpowervc-powervm-oslo:ppc64el (1.3.0.1, automatic), neutron-common:ppc64el (7.0.0-201511171221.ibm.ubuntu1.280, automatic), python-os-brick:ppc64el (0.4.0-1.ibm.ubuntu1, automatic), python-tooz:ppc64el (1.22.0-1.ibm.ubuntu1, automatic), ibmpowervc-powervm-ras:ppc64el (1.3.0.1, automatic), networking-powervm:ppc64el (1.0.0.0-151109-25, automatic), neutron-plugin-ml2:ppc64el (7.0.0-201511171221.ibm.ubuntu1.280, automatic), python-ceilometerclient:ppc64el (1.5.0-1.ibm.ubuntu1, automatic), python-neutronclient:ppc64el (2.6.0-1.ibm.ubuntu1, automatic), python-oslo.middleware:ppc64el (2.8.0-1.ibm.ubuntu1, automatic), python-cinderclient:ppc64el (1.3.1-1.ibm.ubuntu1, automatic), python-novaclient:ppc64el (2.30.1-1.ibm.ubuntu1, automatic), python-nova-ibm-ego-resource-optimization:ppc64el (2015.1-201511110358, automatic), python-neutron:ppc64el (7.0.0-201511171221.ibm.ubuntu1.280, automatic), nova-compute:ppc64el (12.0.0-201511171221.ibm.ubuntu1.213, automatic), nova-powervm:ppc64el (1.0.0.1-151203-215, automatic), openstack-utils:ppc64el (2015.2.0-201511171223.ibm.ubuntu1.18, automatic), ibmpowervc-powervm-network:ppc64el (1.3.0.1, automatic), python-oslo.policy:ppc64el (0.5.0-1.ibm.ubuntu1, automatic), python-oslo.db:ppc64el (2.4.1-1.ibm.ubuntu1, automatic), python-oslo.versionedobjects:ppc64el (0.9.0-1.ibm.ubuntu1, automatic), python-glanceclient:ppc64el (1.1.0-1.ibm.ubuntu1, automatic), ceilometer-common:ppc64el (5.0.0-201511171217.ibm.ubuntu1.199, automatic), openstack-i18n:ppc64el (2015.2-3.ibm.ubuntu1, automatic), python-oslo.messaging:ppc64el (2.1.0-2.ibm.ubuntu1, automatic), python-swiftclient:ppc64el (2.4.0-1.ibm.ubuntu1, automatic), ceilometer-powervm:ppc64el (1.0.0.0-151119-44, automatic)
    End-Date: 2016-01-18  10:28:00
    

    The command line interface

    You can do ALL the things you were doing on the HMC using the pvmctl command. The syntax is pretty simple: pvmctl |OBJECT| |ACTION|, where OBJECT can be vios, vm, vea (virtual ethernet adapter), vswitch, lu (logical unit), and so on, and ACTION can be list, delete, create, update. Here are a few examples:

    • List the Virtual I/O Servers:
    • # pvmctl vios list
      Virtual I/O Servers
      +--------------+----+---------+----------+------+-----+-----+
      |     Name     | ID |  State  | Ref Code | Mem  | CPU | Ent |
      +--------------+----+---------+----------+------+-----+-----+
      | s00ia9940825 | 1  | running |          | 8192 |  2  | 0.2 |
      | s00ia9940826 | 2  | running |          | 8192 |  2  | 0.2 |
      +--------------+----+---------+----------+------+-----+-----+
      
    • List the partitions (note the -d / --display-fields option allowing me to print some specific attributes):
    • # pvmctl vm list
      Logical Partitions
      +----------+----+----------+----------+----------+-------+-----+-----+
      |   Name   | ID |  State   |   Env    | Ref Code |  Mem  | CPU | Ent |
      +----------+----+----------+----------+----------+-------+-----+-----+
      | aix72ca> | 3  | not act> | AIX/Lin> | 00000000 |  2048 |  1  | 0.1 |
      |   nova   | 4  | running  | AIX/Lin> | Linux p> |  8192 |  2  | 0.5 |
      | s00vl99> | 5  | running  | AIX/Lin> | Linux p> | 10240 |  2  | 0.2 |
      | test-59> | 6  | not act> | AIX/Lin> | 00000000 |  2048 |  1  | 0.1 |
      +----------+----+----------+----------+----------+-------+-----+-----+
      # pvmctl list vm -d name id 
      [..]
      # pvmctl vm list -i id=4 --display-fields LogicalPartition.name
      name=aix72-1-d3707953-00000090
      # pvmctl vm list  --display-fields LogicalPartition.name LogicalPartition.id LogicalPartition.srr_enabled SharedProcessorConfiguration.desired_virtual SharedProcessorConfiguration.uncapped_weight
      name=aix72capture,id=3,srr_enabled=False,desired_virtual=1,uncapped_weight=64
      name=nova,id=4,srr_enabled=False,desired_virtual=2,uncapped_weight=128
      name=s00vl9940243,id=5,srr_enabled=False,desired_virtual=2,uncapped_weight=128
      name=test-5925058d-0000008d,id=6,srr_enabled=False,desired_virtual=1,uncapped_weight=128
      
    • Delete the virtual ethernet adapter on the partition named nova (note the --parent-id option to select the partition) with a certain uuid, which was found with pvmctl vea list:
    • # pvmctl vea delete --parent-id name=nova --object-id uuid=fe7389a8-667f-38ca-b61e-84c94e5a3c97
      
    • Power off the lpar named aix72-2:
    • # pvmctl vm power-off -i name=aix72-2-536bf0f8-00000091
      Powering off partition aix72-2-536bf0f8-00000091, this may take a few minutes.
      Partition aix72-2-536bf0f8-00000091 power-off successful.
      
    • Delete the lpar named aix72-2:
    • # pvmctl vm delete -i name=aix72-2-536bf0f8-00000091
      
    • Delete the vswitch named MGMTVSWITCH:
    • # pvmctl vswitch delete -i name=MGMTVSWITCH
      
    • Open a console:
    • #  mkvterm --id 4
      vterm for partition 4 is active.  Press Control+] to exit.
      |
      Elapsed time since release of system processors: 57014 mins 10 secs
      [..]
      
    • Power on an lpar:
    • # pvmctl vm power-on -i name=aix72capture
      Powering on partition aix72capture, this may take a few minutes.
      Partition aix72capture power-on successful.
      

    Is this a dream? No more RMC connectivity problems!

    I'm 100% sure that you always have problems with RMC connectivity due to firewall issues, ports not opened, and IDS blocking incoming or outgoing RMC traffic. NovaLink is THE solution that will solve all the RMC problems forever. I'm not joking, it's a major improvement for PowerVM. As the NovaLink partition is installed on each host, it can communicate through a dedicated IPv6 link with all the partitions hosted on that host. A dedicated virtual switch called MGMTSWITCH is used to allow the RMC flow to transit between all the lpars and the NovaLink partition. Of course this virtual switch must be created, and a virtual ethernet adapter must also be created on the NovaLink partition. These are the first two actions to perform if you want to implement this solution. Before starting, here are a few things you need to know:

    • For security reasons the MGMTSWITCH must be created in VEPA mode. If you are not aware of what the VEPA and VEB modes are, here is a reminder:
    • In VEB mode all the partitions connected to the same VLAN can communicate with each other. We do not want that, as it is a security issue.
    • VEPA mode gives us the ability to isolate lpars that are on the same subnet: lpar-to-lpar traffic is forced out of the machine. This is what we want.
    • The PVID for this VEPA network is 4094.
    • The adapter in the NovaLink partition must be a trunk adapter.
    • It is mandatory to name the VEPA vswitch MGMTSWITCH.
    • At lpar creation time, if the MGMTSWITCH exists, a new virtual ethernet adapter will be automatically created on the deployed lpar.
    • To be correctly configured, the deployed lpar needs the latest level of rsct code (3.2.1.0 for now).
    • The latest cloud-init version must be deployed on the captured lpar used to make the image.
    • You don't need to configure any address on this adapter (on the deployed lpars the adapter is configured with a link-local address, the IPv6 equivalent of the 169.254.0.0/16 addresses used in IPv4; please note that any IPv6 interface must "by design" have a link-local address).

    mgmtswitch2

    • Create the virtual switch called MGMTSWITCH in Vepa mode:
    • # pvmctl vswitch create --name MGMTSWITCH --mode=Vepa
      # pvmctl vswitch list  --display-fields VirtualSwitch.name VirtualSwitch.mode 
      name=ETHERNET0,mode=Veb
      name=vdct,mode=Veb
      name=vdcb,mode=Veb
      name=vdca,mode=Veb
      name=MGMTSWITCH,mode=Vepa
      
    • Create a virtual ethernet adapter on the NovaLink partition with PVID 4094 and a trunk priority set to 1 (it's a trunk adapter). Note that we now have two adapters on the NovaLink partition (one with an IPv4 (routable) address and the other one with an IPv6 (non-routable) address):
    • # pvmctl vea create --pvid 4094 --vswitch MGMTSWITCH --trunk-pri 1 --parent-id name=nova
      # pvmctl vea list --parent-id name=nova
      --------------------------
      | VirtualEthernetAdapter |
      --------------------------
        is_tagged_vlan_supported=False
        is_trunk=False
        loc_code=U8286.41A.216666-V3-C2
        mac=EE3B84FD1402
        pvid=666
        slot=2
        uuid=05a91ab4-9784-3551-bb4b-9d22c98934e6
        vswitch_id=1
      --------------------------
      | VirtualEthernetAdapter |
      --------------------------
        is_tagged_vlan_supported=True
        is_trunk=True
        loc_code=U8286.41A.216666-V3-C34
        mac=B6F837192E63
        pvid=4094
        slot=34
        trunk_pri=1
        uuid=fe7389a8-667f-38ca-b61e-84c94e5a3c97
        vswitch_id=4
      

      Configure the link-local IPv6 address in the NovaLink partition:

      # more /etc/network/interfaces
      [..]
      auto eth1
      iface eth1 inet manual
       up /sbin/ifconfig eth1 0.0.0.0
      # ifup eth1
      # ifconfig eth1
      eth1      Link encap:Ethernet  HWaddr b6:f8:37:19:2e:63
                inet6 addr: fe80::b4f8:37ff:fe19:2e63/64 Scope:Link
                UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
                RX packets:0 errors:0 dropped:0 overruns:0 frame:0
                TX packets:17 errors:0 dropped:0 overruns:0 carrier:0
                collisions:0 txqueuelen:1000
                RX bytes:0 (0.0 B)  TX bytes:1454 (1.4 KB)
                Interrupt:34
      

    Capture an AIX host with the latest version of rsct (3.2.1.0 or later) and the latest version of cloud-init installed. This version of RMC/rsct handles this new feature, so it is mandatory to have it installed on the captured host. When PowerVC deploys a virtual machine on a NovaLink managed host with this version of rsct installed, a new adapter with PVID 4094 in the virtual switch MGMTSWITCH will be created, and all the RMC traffic will use this adapter instead of your public IP address:

    # lslpp -L rsct*
      Fileset                      Level  State  Type  Description (Uninstaller)
      ----------------------------------------------------------------------------
      rsct.core.auditrm          3.2.1.0    C     F    RSCT Audit Log Resource
                                                       Manager
      rsct.core.errm             3.2.1.0    C     F    RSCT Event Response Resource
                                                       Manager
      rsct.core.fsrm             3.2.1.0    C     F    RSCT File System Resource
                                                       Manager
      rsct.core.gui              3.2.1.0    C     F    RSCT Graphical User Interface
      rsct.core.hostrm           3.2.1.0    C     F    RSCT Host Resource Manager
      rsct.core.lprm             3.2.1.0    C     F    RSCT Least Privilege Resource
                                                       Manager
      rsct.core.microsensor      3.2.1.0    C     F    RSCT MicroSensor Resource
                                                       Manager
      rsct.core.rmc              3.2.1.1    C     F    RSCT Resource Monitoring and
                                                       Control
      rsct.core.sec              3.2.1.0    C     F    RSCT Security
      rsct.core.sensorrm         3.2.1.0    C     F    RSCT Sensor Resource Manager
      rsct.core.sr               3.2.1.0    C     F    RSCT Registry
      rsct.core.utils            3.2.1.1    C     F    RSCT Utilities
    
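
    If you also want to double-check that cloud-init is present on the captured host, querying the RPM database should be enough (assuming cloud-init was installed as an RPM package, which is the usual way on AIX):

    # rpm -qa | grep -i cloud-init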

    When this image is deployed, a new adapter is created in the MGMTSWITCH virtual switch and an IPv6 link-local address is configured on it. You can check the cloud-init activation to see that the IPv6 address is configured at activation time:

    # pvmctl vea list --parent-id name=aix72-2-0a0de5c5-00000095
    --------------------------
    | VirtualEthernetAdapter |
    --------------------------
      is_tagged_vlan_supported=True
      is_trunk=False
      loc_code=U8286.41A.216666-V5-C32
      mac=FA620F66FF20
      pvid=3331
      slot=32
      uuid=7f1ec0ab-230c-38af-9325-eb16999061e2
      vswitch_id=1
    --------------------------
    | VirtualEthernetAdapter |
    --------------------------
      is_tagged_vlan_supported=True
      is_trunk=False
      loc_code=U8286.41A.216666-V5-C33
      mac=46A066611B09
      pvid=4094
      slot=33
      uuid=560c67cd-733b-3394-80f3-3f2a02d1cb9d
      vswitch_id=4
    # ifconfig -a
    en0: flags=1e084863,14c0
            inet 10.10.66.66 netmask 0xffffff00 broadcast 10.14.33.255
             tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1
    en1: flags=1e084863,14c0
            inet6 fe80::c032:52ff:fe34:6e4f/64
             tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1
    sit0: flags=8100041
            inet6 ::10.10.66.66/96
    [..]
    

    Note that the link-local address (the one starting with fe80) is configured at activation time:

    # more /var/log/cloud-init-output.log
    [..]
    auto eth1
    
    iface eth1 inet6 static
        address fe80::c032:52ff:fe34:6e4f
        hwaddress ether c2:32:52:34:6e:4f
        netmask 64
        pre-up [ $(ifconfig eth1 | grep -o -E '([[:xdigit:]]{1,2}:){5}[[:xdigit:]]{1,2}') = "c2:32:52:34:6e:4f" ]
            dns-search fr.net.intra
    # entstat -d ent1 | grep -iE "switch|vlan"
    Invalid VLAN ID Packets: 0
    Port VLAN ID:  4094
    VLAN Tag IDs:  None
    Switch ID: MGMTSWITCH
    
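
    If you want to double-check which addresses the RMC daemon is really using before running any failure test, looking at the sockets on the RMC port (657) should be enough; with something like the netstat below you should see the fe80 link-local addresses of the deployed LPAR and of the NovaLink partition on the established connections (just an optional check):

    # netstat -an | grep 657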

    To be sure everything is working correctly, here is a proof test. I take down the en0 interface, on which the IPv4 public address is configured. Then I launch a tcpdump on en1 (the MGMTSWITCH adapter). Finally I resize the Virtual Machine with PowerVC. AND EVERYTHING IS WORKING GREAT !!!! AWESOME !!! :-) (note the fe80-to-fe80 communication):

    # ifconfig en0 down detach ; tcpdump -i en1 port 657
    tcpdump: WARNING: en1: no IPv4 address assigned
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on en1, link-type 1, capture size 96 bytes
    22:00:43.224964 IP6 fe80::9850:f6ff:fe9c:5739.32819 > fe80::d09e:aff:fecf:a868.rmc: S 4049792650:4049792650(0) win 65535 
    22:00:43.225022 IP6 fe80::d09e:aff:fecf:a868.rmc > fe80::9850:f6ff:fe9c:5739.32819: S 2055569200:2055569200(0) ack 4049792651 win 28560 
    22:00:43.225051 IP6 fe80::9850:f6ff:fe9c:5739.32819 > fe80::d09e:aff:fecf:a868.rmc: . ack 1 win 32844 
    22:00:43.225547 IP6 fe80::9850:f6ff:fe9c:5739.32819 > fe80::d09e:aff:fecf:a868.rmc: P 1:209(208) ack 1 win 32844 
    22:00:43.225593 IP6 fe80::d09e:aff:fecf:a868.rmc > fe80::9850:f6ff:fe9c:5739.32819: . ack 209 win 232 
    22:00:43.225638 IP6 fe80::d09e:aff:fecf:a868.rmc > fe80::9850:f6ff:fe9c:5739.32819: P 1:97(96) ack 209 win 232 
    22:00:43.225721 IP6 fe80::9850:f6ff:fe9c:5739.32819 > fe80::d09e:aff:fecf:a868.rmc: P 209:377(168) ack 97 win 32844 
    22:00:43.225835 IP6 fe80::d09e:aff:fecf:a868.rmc > fe80::9850:f6ff:fe9c:5739.32819: P 97:193(96) ack 377 win 240 
    22:00:43.225910 IP6 fe80::9850:f6ff:fe9c:5739.32819 > fe80::d09e:aff:fecf:a868.rmc: P 377:457(80) ack 193 win 32844 
    22:00:43.226076 IP6 fe80::d09e:aff:fecf:a868.rmc > fe80::9850:f6ff:fe9c:5739.32819: P 193:289(96) ack 457 win 240 
    22:00:43.226154 IP6 fe80::9850:f6ff:fe9c:5739.32819 > fe80::d09e:aff:fecf:a868.rmc: P 457:529(72) ack 289 win 32844 
    22:00:43.226210 IP6 fe80::d09e:aff:fecf:a868.rmc > fe80::9850:f6ff:fe9c:5739.32819: P 289:385(96) ack 529 win 240 
    22:00:43.226276 IP6 fe80::9850:f6ff:fe9c:5739.32819 > fe80::d09e:aff:fecf:a868.rmc: P 529:681(152) ack 385 win 32844 
    22:00:43.226335 IP6 fe80::d09e:aff:fecf:a868.rmc > fe80::9850:f6ff:fe9c:5739.32819: P 385:481(96) ack 681 win 249 
    22:00:43.424049 IP6 fe80::9850:f6ff:fe9c:5739.32819 > fe80::d09e:aff:fecf:a868.rmc: . ack 481 win 32844 
    22:00:44.725800 IP6 fe80::d09e:aff:fecf:a868.rmc > fe80::9850:f6ff:fe9c:5739.rmc: UDP, length 88
    22:00:44.726111 IP6 fe80::9850:f6ff:fe9c:5739.rmc > fe80::d09e:aff:fecf:a868.rmc: UDP, length 88
    22:00:50.137605 IP6 fe80::d09e:aff:fecf:a868.rmc > fe80::9850:f6ff:fe9c:5739.rmc: UDP, length 632
    22:00:50.137900 IP6 fe80::9850:f6ff:fe9c:5739.rmc > fe80::d09e:aff:fecf:a868.rmc: UDP, length 88
    22:00:50.183108 IP6 fe80::9850:f6ff:fe9c:5739.rmc > fe80::d09e:aff:fecf:a868.rmc: UDP, length 408
    22:00:51.683382 IP6 fe80::9850:f6ff:fe9c:5739.rmc > fe80::d09e:aff:fecf:a868.rmc: UDP, length 408
    22:00:51.683661 IP6 fe80::d09e:aff:fecf:a868.rmc > fe80::9850:f6ff:fe9c:5739.rmc: UDP, length 88
    

    To be sure the security requirements are met, from the LPAR I ping the NovaLink host (the first ping), which answers, and then I ping the second LPAR (the second ping), which does not work. (And this is what we want !!!)

    # ping fe80::d09e:aff:fecf:a868
    PING fe80::d09e:aff:fecf:a868 (fe80::d09e:aff:fecf:a868): 56 data bytes
    64 bytes from fe80::d09e:aff:fecf:a868: icmp_seq=0 ttl=64 time=0.203 ms
    64 bytes from fe80::d09e:aff:fecf:a868: icmp_seq=1 ttl=64 time=0.206 ms
    64 bytes from fe80::d09e:aff:fecf:a868: icmp_seq=2 ttl=64 time=0.216 ms
    ^C
    --- fe80::d09e:aff:fecf:a868 ping statistics ---
    3 packets transmitted, 3 packets received, 0% packet loss
    round-trip min/avg/max = 0/0/0 ms
    # ping fe80::44a0:66ff:fe61:1b09
    PING fe80::44a0:66ff:fe61:1b09 (fe80::44a0:66ff:fe61:1b09): 56 data bytes
    ^C
    --- fe80::44a0:66ff:fe61:1b09 ping statistics ---
    2 packets transmitted, 0 packets received, 100% packet loss
    

    PowerVC 1.3.0.1 Dynamic Resource Optimizer

    In addition to the NovaLink part of this blog post I also wanted to talk about the killer app of 2016: Dynamic Resource Optimizer. This feature can be used on any PowerVC 1.3.0.1 managed hosts (you obviously need at least two hosts). DRO is in charge of re-balancing your Virtual Machines across all the available hosts (in the host group). To sum up, if a host is experiencing a heavy load and reaching a certain amount of CPU consumption over a period of time, DRO will move your virtual machines to re-balance the load across all the available hosts (this is done at the host level). Here are a few details about DRO:

    • The DRO configuration is done at the host level.
    • You set up a threshold (see the capture below) that, once reached, triggers Live Partition Mobility moves or mobile core movements (Power Enterprise Pool).
    • droo6
      droo3

    • To trigger an action, this threshold must be reached a certain number of times (stabilization) over a period you define (run interval); the small sketch after this list illustrates the idea.
    • You can choose to move virtual machines using Live Partition Mobility, or to move “cores” using a Power Enterprise Pool (you can do both; moving CPUs is always preferred over moving partitions).
    • DRO can be run in advise mode (nothing is done, a warning is raised in the new DRO events tab) or in active mode (which actually does the job and moves things).
      droo2
      droo1

    • Your most critical virtual machines can be excluded from DRO:
    • droo5
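
    To make the threshold/stabilization/run interval trio a bit more concrete, here is a very rough sketch of the trigger logic as I understand it. This is absolutely not IBM code, just my reading of the observed behaviour; get_host_cpu_utilization is a made-up helper returning the host CPU consumption as an integer percentage:

    # rough sketch of the DRO trigger logic (my own reading, not IBM code)
    THRESHOLD=70      # utilization threshold in percent (set per host in PowerVC)
    STABILIZATION=3   # number of consecutive samples above the threshold
    INTERVAL=600      # run interval in seconds
    count=0
    while true ; do
        util=$(get_host_cpu_utilization)   # made-up helper, integer percentage
        if [ "$util" -gt "$THRESHOLD" ]; then
            count=$((count + 1))
        else
            count=0
        fi
        if [ "$count" -ge "$STABILIZATION" ]; then
            echo "threshold reached $STABILIZATION times: move LPARs (or cores) in active mode, raise a DRO event in advise mode"
            count=0
        fi
        sleep "$INTERVAL"
    done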

    How does DRO choose which machines are moved ?

    I have been running DRO in production for one month now and had the time to check what is going on behind the scenes. How does DRO choose which machines to move when a Live Partition Mobility operation must be run to handle a heavy load on a host ? To find out, I launched 3 cpuhog processes (16 forks each, eating CPU) on three different LPARs (4 VPs, SMT4 each), all running on the same host. On PowerVC I can check that, before launching these processes, the CPU consumption on this host is ok :

    droo4

    # cat cpuhog.pl
    #!/usr/bin/perl
    # fork 16 children; each child (and the parent itself) then spins in a
    # busy loop, eating as much CPU as it can
    
    print "eating the CPUs\n";
    
    foreach $i (1..16) {
          $pid = fork();
          last if $pid == 0;     # a child leaves the fork loop immediately
          print "created PID $pid\n";
    }
    
    # the parent and all the children end up spinning here
    while (1) {
          $x++;
    }
    # perl cpuhog.pl
    eating the CPUs
    created PID 47514604
    created PID 22675712
    created PID 3015584
    created PID 21496152
    created PID 25166098
    created PID 26018068
    created PID 11796892
    created PID 33424106
    created PID 55444462
    created PID 65077976
    created PID 13369620
    created PID 10813734
    created PID 56623850
    created PID 19333542
    created PID 58393312
    created PID 3211988
    

    After waiting a couple of minutes I realize that the virtual machines on which the cpuhog processes were launched are the ones being migrated. So we can say that PowerVC moves the machines that are eating CPU (another strategy could have been to move the idle machines away and let the busy ones keep working without going through a mobility operation).

    # errpt | head -3
    IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
    A5E6DB96   0118225116 I S pmig           Client Partition Migration Completed
    08917DC6   0118225116 I S pmig           Client Partition Migration Started
    

    After the moves complete I can see that the load is back to normal on the host. DRO has done the job for me and moved the LPARs to meet the configured threshold ;-)
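
    If you prefer to follow the moves from the HMC side instead of waiting for the errpt entries in the clients, the lslparmigr command should show you the migration state of the partitions on the host (just an example, adapt the managed system name to yours):

    # lslparmigr -r lpar -m myhost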

    droo7dro_effect

    The images below show you a good example of the “power” of PowerVC and DRO. To update my Virtual I/O Servers to the latest version, PowerVC maintenance mode was used to evacuate the hosts before updating their Virtual I/O Servers. After leaving maintenance mode, DRO did the job of re-balancing the Virtual Machines across all the hosts (the red arrows symbolize the maintenance mode actions and the purple ones the DRO actions). You can also see that some LPARs were moved across 4 different hosts during this process. All these pictures are taken from real-life experience on my production systems. This is not a lab environment; this is one part of my production. So yes, DRO and PowerVC 1.3.0.1 are production ready. Hell yes!

    real1
    real2
    real3
    real4
    real5

    Conclusion

    As my environment grows bigger, the next step for me will be to move to NovaLink on my P8 hosts. Please note that the NovaLink co-management feature is today a “TechPreview” but should be GA very soon. Talking about DRO, I had been waiting for this for years and it has finally happened. I can assure you that it is production ready; to prove it I’ll just give you this number: to upgrade my Virtual I/O Servers to the 2.2.4.10 release using PowerVC maintenance mode and DRO, more than 1000 Live Partition Mobility moves were performed without any outage, on production servers, during working hours. Nobody in my company was aware of it during the operations. It was a seamless experience for everybody.

    Using the Simplified Remote Restart capability on Power8 Scale Out Servers

    A few weeks ago I had to work on simplified remote restart. I’m not lucky enough yet (because of some political decisions in my company) to have access to any E880 or E870; we just have a few scale-out machines to play with (S814). For some critical applications we need, in the future, to be able to restart a virtual machine if the system hosting it has failed (hardware problem). We decided a couple of months ago not to use remote restart because it required a reserved storage pool device, which made it too hard to manage. We now have enough P8 boxes to try and understand the new version of remote restart, called simplified remote restart, which does not need any reserved storage pool device. For those who want to understand what remote restart is, I strongly recommend checking my previous blog post about remote restart on two P7 boxes: Configuration of a remote restart partition. For the others, here is what I learned about the simplified version of this awesome feature.

    Please keep in mind that the FSP of the machine must be up to perform a simplified remote restart operation. It means that if, for instance, you lose one of your datacenters or the link between your two datacenters, you cannot use simplified remote restart to restart your partitions on the main/backup site. Simplified Remote Restart only protects you against a hardware failure of your machine. Maybe this will change in the near future, but for the moment it is the most important thing to understand about simplified remote restart.

    Updating to the latest version of firmware

    I was very surprised when I got my Power8 machines. After deploying these boxes I decided to give simplified remote restart a try but it was just not possible: since their release, the Power8 Scale Out servers were NOT simplified remote restart capable. The release of the SV830 firmware now enables Simplified Remote Restart on Power8 Scale Out machines. Please note that there is nothing about it in the patch note, so chmod666.org is the only place where you can get this information :-). Here is the patch note: here. One last word: you will read on the internet that you need Power8 to use simplified remote restart. That is only partially true. YOU NEED A P8 MACHINE WITH AT LEAST AN 820 FIRMWARE.

    The first thing to do is to update your firmware to the SV830 version (on both systems participating in the simplified remote restart operation):

    # updlic -o u -t sys -l latest -m p814-1 -r mountpoint -d /home/hscroot/SV830_048 -v
    [..]
    # lslic -m p814-1 -F activated_spname,installed_level,ecnumber
    FW830.00,48,01SV830
    # lslic -m p814-2 -F activated_spname,installed_level,ecnumber
    FW830.00,48,01SV830
    

    You can check the firmware version directly from the Hardware Management Console or in the ASMI:

    fw1
    fw3

    After the firmware upgrade, verify that you now have the Simplified Remote Restart capability set to true.

    fw2

    # lssyscfg -r sys -F name,powervm_lpar_simplified_remote_restart_capable
    p720-1,0
    p814-1,1
    p720-2,0
    p814-2,1
    

    Prerequisites

    These prerequisites apply ONLY to Scale Out systems:

    • To update to firmware SV830_048 you need the latest Hardware Management Console release, which is v8r8.3.0 plus the MH01514 PTF.
    • Obviously, on Scale Out systems, SV830_048 is the minimum firmware requirement.
    • The minimum level of the Virtual I/O Servers is 2.2.3.4 (for both source and destination systems).
    • PowerVM Enterprise edition (to be confirmed).

    Enabling simplified remote restart of an existing partition

    You will probably want to enable simplified remote restart after an LPM migration/evacuation. After migrating your virtual machine(s) to a Power8 with the Simplified Remote Restart capability, you have to enable this capability on all the virtual machines. This can only be done when the partition is shut down, so you first have to stop the virtual machines (after the live partition mobility move) if you want to enable SRR; it can’t be done without rebooting the virtual machine:

    • List the partitions running on the system and check which ones are “simplified remote restart capable” (here only one is):
    • # lssyscfg -r lpar -m p814-1 -F name,simplified_remote_restart_capable
      vios1,0
      vios2,0
      lpar1,1
      lpar2,0
      lpar3,0
      lpar4,0
      lpar5,0
      lpar6,0
      lpar7,0
      
    • For each LPAR that is not simplified remote restart capable, change the simplified_remote_restart_capable attribute using the chsyscfg command. Please note that you can’t do this from the Hardware Management Console GUI: in the latest v8r8.3.0, when you try to enable it from the GUI, it tells you that you need a reserved storage device, which is required by the classic Remote Restart capability but not by the simplified version. You have to use the command line ! (check the screenshots below)
    • You can’t change this attribute while the machine is running:
    • gui_change_to_srr

    • You can’t do it with the GUI even after the machine is shut down:
    • gui_change_to_srr2
      gui_change_to_srr3

    • The only way to enable this attribute is to use the Hardware Management Console command line (please note in the output below that running LPARs cannot be changed). If you are creating brand-new partitions you can set the attribute directly at creation time, as shown right after this list:
    • # for i in lpar2 lpar3 lpar4 lpar5 lpar6 lpar7 ; do chsyscfg -r lpar -m p824-2 -i "name=$i,simplified_remote_restart_capable=1" ; done
      An error occurred while changing the partition named lpar6.
      HSCLA9F8 The remote restart capability of the partition can only be changed when the partition is shutdown.
      An error occurred while changing the partition named lpar7.
      HSCLA9F8 The remote restart capability of the partition can only be changed when the partition is shutdown.
      # lssyscfg -r lpar -m p824-1 -F name,simplified_remote_restart_capable,lpar_env | grep -v vioserver
      lpar1,1,aixlinux
      lpar2,1,aixlinux
      lpar3,1,aixlinux
      lpar4,1,aixlinux
      lpar5,1,aixlinux
      lpar6,0,aixlinux
      lpar7,0,aixlinux
      
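
    For a brand-new partition, a minimal (and obviously truncated) mksyscfg sketch could look like the one below; the partition name and the resource values are just examples to adapt to your environment, the important part being simplified_remote_restart_capable=1:

    # mksyscfg -r lpar -m p814-1 -i "name=newlpar,profile_name=default_profile,lpar_env=aixlinux,\
    min_mem=4096,desired_mem=8192,max_mem=16384,proc_mode=shared,min_proc_units=0.5,desired_proc_units=1.0,\
    max_proc_units=2.0,min_procs=1,desired_procs=2,max_procs=4,sharing_mode=uncap,max_virtual_slots=64,\
    simplified_remote_restart_capable=1"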

    Remote restarting

    If you try to do a Live Partition Mobility operation back to a P7 or P8 box without the simplified remote restart capability, it will not be possible: enabling simplified remote restart forces the virtual machine to stay on P8 boxes with the simplified remote restart capability. This is one of the reasons why most customers are not doing it:

    # migrlpar -o v -m p814-1 -t p720-1 -p lpar2
    Errors:
    HSCLB909 This operation is not allowed because managed system p720-1 does not support PowerVM Simplified Partition Remote Restart.
    

    lpm_not_capable_anymore

    On the Hardware Management Console you can see that the virtual machine is simplified remote restart capable by checking its properties:

    gui_change_to_srr4

    You can now try to remote restart your virtual machines to another server. The state of the source server has to be different from Operating (Power Off, Error, Error – Dump in progress, Initializing). As always, my advice is to validate before restarting:

    # rrstartlpar -o validate -m p824-1 -t p824-2 -p lpar1
    # echo $?
    0
    # rrstartlpar -o restart -m p824-1 -t p824-2 -p lpar1
    HSCLA9CE The managed system is not in a valid state to support partition remote restart operations.
    
    # lssyscfg -r sys -F name,state
    p824-2,Operating
    p824-1,Power Off
    # rrstartlpar -o restart -m p824-1 -t p824-2 -p lpar1
    

    When you perform a remote restart operation the machine boots automatically. You can check in the errpt that in most cases the partition ID has changed (proving that you are on another machine):

    # errpt | more
    IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
    A6DF45AA   0618170615 I O RMCdaemon      The daemon is started.
    1BA7DF4E   0618170615 P S SRC            SOFTWARE PROGRAM ERROR
    CB4A951F   0618170615 I S SRC            SOFTWARE PROGRAM ERROR
    CB4A951F   0618170615 I S SRC            SOFTWARE PROGRAM ERROR
    D872C399   0618170615 I O sys0           Partition ID changed and devices recreat
    

    Be very careful with the ghostdev sys0 attribute. Every remote restarted VM needs to have ghostdev set to 0 to avoid an ODM wipe (if you remote restart an LPAR with ghostdev set to 1 you will lose all your ODM customizations).

    # lsattr -El sys0 -a ghostdev
    ghostdev 0 Recreate ODM devices on system change / modify PVID True
    
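
    If ghostdev is set to 1 on one of your machines, you can change it with chdev; the attribute only matters at boot time (it tells the system whether to recreate the ODM devices on a system change), so change it before you ever need to remote restart anything:

    # chdev -l sys0 -a ghostdev=0
    sys0 changed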

    When the source machine is back up and running, you have to clean the old definition of the remote restarted LPAR by launching a cleanup operation. This will wipe the old LPAR definition:

    # rrstartlpar -o cleanup -m p814-1 -p lpar1
    

    The rrMonitor script (modified version)

    There is a script delivered by IBM called rrMonitor which watches the Power System’s state and, if the system is in a particular state, remote restarts one specific virtual machine. This script is not really usable as-is because it has to be executed directly on the HMC (you need a pesh password to put the script on the HMC) and it only checks one particular virtual machine. I modified this script to ssh to the HMC and check every LPAR on the machine, not just one in particular. You can download my modified version here : rrMonitor. Here is what the script does:

    • It checks the state of the source machine.
    • If this state is not “Operating”, the script searches for every remote restartable LPAR on the machine.
    • It launches remote restart operations to restart all these partitions.
    • It tells the user which command to run to clean up the old LPAR definitions once the source machine is running again.
    # ./rrMonitor p814-1 p814-2 all 60 myhmc
    Getting remote restartable lpars
    lpar1 is rr simplified capable
    lpar1 rr status is Remote Restartable
    lpar2 is rr simplified capable
    lpar2 rr status is Remote Restartable
    lpar3 is rr simplified capable
    lpar3 rr status is Remote Restartable
    lpar4 is rr simplified capable
    lpar4 rr status is Remote Restartable
    Checking for source server state....
    Source server state is Operating
    Checking for source server state....
    Source server state is Operating
    Checking for source server state....
    Source server state is Power Off In Progress
    Checking for source server state....
    Source server state is Power Off
    It's time to remote restart
    Remote restarting lpar1
    Remote restarting lpar2
    Remote restarting lpar3
    Remote restarting lpar4
    Thu Jun 18 20:20:40 CEST 2015
    Source server p814-1 state is Power Off
    Source server has crashed and hence attempting a remote restart of the partition lpar1 in the destination server p814-2
    Thu Jun 18 20:23:12 CEST 2015
    The remote restart operation was successful
    The cleanup operation has to be executed on the source server once the server is back to operating state
    The following command can be used to execute the cleanup operation,
    rrstartlpar -m p814-1 -p lpar1 -o cleanup
    Thu Jun 18 20:23:12 CEST 2015
    Source server p814-1 state is Power Off
    Source server has crashed and hence attempting a remote restart of the partition lpar2 in the destination server p814-2
    Thu Jun 18 20:25:42 CEST 2015
    The remote restart operation was successful
    The cleanup operation has to be executed on the source server once the server is back to operating state
    The following command can be used to execute the cleanup operation,
    rrstartlpar -m sp814-1 -p lpar2 -o cleanup
    Thu Jun 18 20:25:42 CEST 2015
    [..]
    
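
    If you just want to get the idea without downloading the script, here is a very stripped-down skeleton of the loop it runs. This is not the real rrMonitor code, just a sketch; the HMC, user and machine names are placeholders:

    #!/usr/bin/ksh
    # stripped-down skeleton of the modified rrMonitor logic (not the real script)
    HMC=myhmc ; SRC=p814-1 ; DST=p814-2 ; SLEEP=60
    while true ; do
        state=$(ssh hscroot@${HMC} "lssyscfg -r sys -m ${SRC} -F state")
        echo "Source server state is ${state}"
        case "${state}" in
            "Power Off"|"Error")   # add the other states allowing a remote restart if you need them
                # find every simplified remote restart capable lpar on the source machine
                lpars=$(ssh hscroot@${HMC} "lssyscfg -r lpar -m ${SRC} -F name,simplified_remote_restart_capable" | awk -F, '$2 == 1 {print $1}')
                for l in ${lpars} ; do
                    echo "Remote restarting ${l}"
                    ssh hscroot@${HMC} "rrstartlpar -o restart -m ${SRC} -t ${DST} -p ${l}"
                    echo "cleanup to run later: rrstartlpar -m ${SRC} -p ${l} -o cleanup"
                done
                break
                ;;
        esac
        sleep ${SLEEP}
    done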

    Conclusion

    As you can see, the simplified version of the remote restart feature is simpler than the normal one. My advice is to create all your LPARs with the simplified remote restart attribute. It’s that easy :). If you plan to LPM back to P6 or P7 boxes, don’t use simplified remote restart. I think this functionality will become more popular when all the old P6 and P7 machines have been replaced by P8. As always I hope it helps.

    Here are a couple of links with great documentation about Simplified Remote Restart:

    • Simplified Remote Restart Whitepaper: here
    • Original rrMonitor: here
    • Materials about the latest HMC release and a couple of videos related to Simplified Remote Restart: here

    Configuration of a Remote Restart Capable partition

    How can we move a partition to another machine if the machine or the data-center on which the partition is hosted is totally unavailable ? This question is often asked by managers and technical people. Live Partition Mobility can’t answer this question because the source machine needs to be running to initiate the mobility. I’m sure most of you are implementing a manual solution based on a bunch of scripts recreating the partition profile by hand, but this is hard to maintain, not fully automated and not supported by IBM. A solution to this problem is to set up your partitions as Remote Restart Capable partitions. This PowerVM feature has been available since the release of VMcontrol (the IBM Systems Director plugin). Unfortunately this powerful feature is not well documented, but it will probably, in the coming months or year, become a must-have on your newly deployed AIX machines. One last word: with the new Power8 machines things are going to change about remote restart; the functionality will be easier to use and a lot of prerequisites are going to disappear. Just to be clear, this post has been written using Power7+ 9117-MMD machines; the only thing you can’t do with these machines (compared to Power8 ones) is change an existing partition to be remote restart capable without having to delete and recreate its profile.

    Prerequisites

    To create and use a remote restart partition on Power7+/Power8 machines you’ll need these prerequisites :

    • A PowerVM Enterprise license (the “PowerVM remote restart capable” capability must be true; be careful, there is another capability named “Remote restart capable” which was used by VMcontrol only, so double-check you are looking at the right one; see the check just after this list).
    • A 780 firmware or later (all Power8 firmware levels are ok, all Power7 levels >= 780 are ok).
    • Your source and destination machines must be connected to the same Hardware Management Console; you can’t remote restart between two HMCs at the moment.
    • The minimum HMC version is v8r8.0.0. Check that you have the rrstartlpar command (not the rrlpar command, which is used by VMcontrol only).
    • Better than a long post, check this video (don’t laugh at me, I’m trying to do my best but this is one of my first videos … hope it is good) :
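
    To check the first prerequisite from the HMC command line, listing the system capabilities and filtering on remote restart should show you both capabilities, the PowerVM one you want and the VMcontrol one (this is just a quick check; the exact capability names may differ depending on your HMC level):

    # lssyscfg -r sys -m source-machine -F capabilities | tr ',' '\n' | grep -i remote_restart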

    What is a remote restart capable virtual machine ?

    Better than a long explanation, check the picture below and follow each number from 1 to 4 to understand what a remote restart partition is :

    remote_restart_explanation

    Create the profile of your remote restart capable partition : Power7 vs Power8

    A good reason to move to Power8 faster than you planned is that you can change a virtual machine to be remote restart capable without having to recreate the whole profile. I don’t know why, but at the time of writing this post changing a non remote restart capable LPAR into a remote restart capable one is only possible on Power8 systems. If you are using a Power7 machine (like me in all the examples below) be careful to check this option while creating the machine. Keep in mind that if you forget to check the option you will not be able to enable the remote restart capability afterwards and you will unfortunately have to remove your profile and recreate it, sad but true … :

    • Don’t forget to check the check box to allow the partition to be remote restart capable :
    • remote_restart_capable_enabled1

    • After the partition is created you can notice in the I/O tab that remote restart capable partitions are not able to own any physical I/O adapter :
    • rr2_nophys

    • You can check in the properties that the remote restart capable feature is activated :
    • remote_restart_capable_activated

    • If you try to modify an existing profile on a Power7 machine you’ll get this error message. On a Power8 machine there will be no problem :
    • # chsyscfg -r lpar -m XXXX-9117-MMD-658B2AD -p test_lpar -i remote_restart_capable=1
      An error occurred while changing the partition named test_lpar.
      The managed system does not support changing the remote restart capability of a partition. You must delete the partition and recreate it with the desired remote restart capability.
      
    • You can verify that some of your LPARs are remote restart capable :
    • lssyscfg -r lpar -m source-machine -F name,remote_restart_capable
      [..]
      lpar1,1
      lpar2,1
      lpar3,1
      remote-restart,1
      [..]
      
    • On a Power7 machine the only way to enable remote restart on an already created machine is to delete the partition and recreate it by hand with the remote restart attribute :
    • Get the current partition profile :
    • $ lssyscfg -r prof -m s00ka9927558-9117-MMD-658B2AD --filter "lpar_names=temp3-b642c120-00000133"
      name=default_profile,lpar_name=temp3-b642c120-00000133,lpar_id=11,lpar_env=aixlinux,all_resources=0,min_mem=8192,desired_mem=8192,max_mem=8192,min_num_huge_pages=0,desired_num_huge_pages=0,max_num_huge_pages=0,mem_mode=ded,mem_expansion=0.0,hpt_ratio=1:128,proc_mode=shared,min_proc_units=2.0,desired_proc_units=2.0,max_proc_units=2.0,min_procs=4,desired_procs=4,max_procs=4,sharing_mode=uncap,uncap_weight=128,shared_proc_pool_id=0,shared_proc_pool_name=DefaultPool,affinity_group_id=none,io_slots=none,lpar_io_pool_ids=none,max_virtual_slots=64,"virtual_serial_adapters=0/server/1/any//any/1,1/server/1/any//any/1",virtual_scsi_adapters=3/client/2/s00ia9927560/32/0,virtual_eth_adapters=32/0/1659//0/0/vdct/facc157c3e20/all/0,virtual_eth_vsi_profiles=none,"virtual_fc_adapters=""2/client/1/s00ia9927559/32/c050760727c5007a,c050760727c5007b/0"",""4/client/1/s00ia9927559/35/c050760727c5007c,c050760727c5007d/0"",""5/client/2/s00ia9927560/34/c050760727c5007e,c050760727c5007f/0"",""6/client/2/s00ia9927560/35/c050760727c50080,c050760727c50081/0""",vtpm_adapters=none,hca_adapters=none,boot_mode=norm,conn_monitoring=1,auto_start=0,power_ctrl_lpar_ids=none,work_group_id=none,redundant_err_path_reporting=0,bsr_arrays=0,lpar_proc_compat_mode=default,electronic_err_reporting=null,sriov_eth_logical_ports=none
      
    • Remove the partition :
    • $ chsysstate -r lpar -o shutdown --immed -m source-server -n temp3-b642c120-00000133
      $ rmsyscfg -r lpar -m source-server -n temp3-b642c120-00000133
      
    • Recreate the partition with the remote restart attribute enabled :
    • mksyscfg -r lpar -m s00ka9927558-9117-MMD-658B2AD -i 'name=temp3-b642c120-00000133,profile_name=default_profile,remote_restart_capable=1,lpar_id=11,lpar_env=aixlinux,all_resources=0,min_mem=8192,desired_mem=8192,max_mem=8192,min_num_huge_pages=0,desired_num_huge_pages=0,max_num_huge_pages=0,mem_mode=ded,mem_expansion=0.0,hpt_ratio=1:128,proc_mode=shared,min_proc_units=2.0,desired_proc_units=2.0,max_proc_units=2.0,min_procs=4,desired_procs=4,max_procs=4,sharing_mode=uncap,uncap_weight=128,shared_proc_pool_name=DefaultPool,affinity_group_id=none,io_slots=none,lpar_io_pool_ids=none,max_virtual_slots=64,"virtual_serial_adapters=0/server/1/any//any/1,1/server/1/any//any/1",virtual_scsi_adapters=3/client/2/s00ia9927560/32/0,virtual_eth_adapters=32/0/1659//0/0/vdct/facc157c3e20/all/0,virtual_eth_vsi_profiles=none,"virtual_fc_adapters=""2/client/1/s00ia9927559/32/c050760727c5007a,c050760727c5007b/0"",""4/client/1/s00ia9927559/35/c050760727c5007c,c050760727c5007d/0"",""5/client/2/s00ia9927560/34/c050760727c5007e,c050760727c5007f/0"",""6/client/2/s00ia9927560/35/c050760727c50080,c050760727c50081/0""",vtpm_adapters=none,hca_adapters=none,boot_mode=norm,conn_monitoring=1,auto_start=0,power_ctrl_lpar_ids=none,work_group_id=none,redundant_err_path_reporting=0,bsr_arrays=0,lpar_proc_compat_mode=default,sriov_eth_logical_ports=none'
      

    Creating a reserved storage device

    The reserved storage device pool is used to store the configuration data of the remote restart partitions. At the time of writing this post those devices are mandatory and, as far as I know, they are used only to store the configuration and not the state (memory state) of the virtual machines themselves (maybe in the future, who knows ?). You can’t create or boot any remote restart partition if you do not have a reserved storage device pool created, so do this before doing anything else :

    • You first have to find, on the Virtual I/O Servers of both machines (the source and destination machines used for the remote restart operation), a bunch of devices. These have to be the same on all the Virtual I/O Servers used for the remote restart operation. The lsmemdev command is used to find those devices :
    • vios1$ lspv | grep -iE "hdisk988|hdisk989|hdisk990"
      hdisk988         00ced82ce999d6f3                     None
      hdisk989         00ced82ce999d960                     None
      hdisk990         00ced82ce999dbec                     None
      vios2$ lspv | grep -iE "hdisk988|hdisk989|hdisk990"
      hdisk988         00ced82ce999d6f3                     None
      hdisk989         00ced82ce999d960                     None
      hdisk990         00ced82ce999dbec                     None
      vios3$ lspv | grep -iE "hdisk988|hdisk989|hdisk990"
      hdisk988         00ced82ce999d6f3                     None
      hdisk989         00ced82ce999d960                     None
      hdisk990         00ced82ce999dbec                     None
      vios4$ lspv | grep -iE "hdisk988|hdisk989|hdisk990"
      hdisk988         00ced82ce999d6f3                     None
      hdisk989         00ced82ce999d960                     None
      hdisk990         00ced82ce999dbec                     None
      
      $ lsmemdev -r avail -m source-machine -p vios1,vios2
      [..]
      device_name=hdisk988,redundant_device_name=hdisk988,size=61440,type=phys,phys_loc=U2C4E.001.DBJN916-P2-C1-T1-W500507680140F32C-L3E5000000000000,redundant_phys_loc=U2C4E.001.DBJN916-P2-C2-T1-W500507680140F32C-L3E5000000000000,redundant_capable=1
      device_name=hdisk989,redundant_device_name=hdisk989,size=61440,type=phys,phys_loc=U2C4E.001.DBJN916-P2-C1-T1-W500507680140F32C-L3E6000000000000,redundant_phys_loc=U2C4E.001.DBJN916-P2-C2-T1-W500507680140F32C-L3E6000000000000,redundant_capable=1
      device_name=hdisk990,redundant_device_name=hdisk990,size=61440,type=phys,phys_loc=U2C4E.001.DBJN916-P2-C1-T1-W500507680140F32C-L3E7000000000000,redundant_phys_loc=U2C4E.001.DBJN916-P2-C2-T1-W500507680140F32C-L3E7000000000000,redundant_capable=1
      [..]
      $ lsmemdev -r avail -m dest-machine -p vios3,vios4
      [..]
      device_name=hdisk988,redundant_device_name=hdisk988,size=61440,type=phys,phys_loc=U2C4E.001.DBJN914-P2-C2-T1-W500507680140F32C-L3E5000000000000,redundant_phys_loc=U2C4E.001.DBJN914-P2-C1-T1-W500507680140F32C-L3E5000000000000,redundant_capable=1
      device_name=hdisk989,redundant_device_name=hdisk989,size=61440,type=phys,phys_loc=U2C4E.001.DBJN914-P2-C2-T1-W500507680140F32C-L3E6000000000000,redundant_phys_loc=U2C4E.001.DBJN914-P2-C1-T1-W500507680140F32C-L3E6000000000000,redundant_capable=1
      device_name=hdisk990,redundant_device_name=hdisk990,size=61440,type=phys,phys_loc=U2C4E.001.DBJN914-P2-C2-T1-W500507680140F32C-L3E7000000000000,redundant_phys_loc=U2C4E.001.DBJN914-P2-C1-T1-W500507680140F32C-L3E7000000000000,redundant_capable=1
      [..]
      
    • Create the reserved storage device pool using the chhwres command on the Hardware Management Console (create on all machines used by the remote restart operation) :
    • $ chhwres -r rspool -m source-machine -o a -a vios_names=\"vios1,vios2\"
      $ chhwres -r rspool -m source-machine -o a -p vios1 --rsubtype rsdev --device hdisk988 --manual
      $ chhwres -r rspool -m source-machine -o a -p vios1 --rsubtype rsdev --device hdisk989 --manual
      $ chhwres -r rspool -m source-machine -o a -p vios1 --rsubtype rsdev --device hdisk990 --manual
      $ lshwres -r rspool -m source-machine --rsubtype rsdev
      device_name=hdisk988,vios_name=vios1,vios_id=1,size=61440,type=phys,state=Inactive,phys_loc=U2C4E.001.DBJN916-P2-C1-T1-W500507680140F32C-L3E5000000000000,is_redundant=1,redundant_device_name=hdisk988,redundant_vios_name=vios2,redundant_vios_id=2,redundant_state=Inactive,redundant_phys_loc=U2C4E.001.DBJN916-P2-C2-T1-W500507680140F32C-L3E5000000000000,lpar_id=none,device_selection_type=manual
      device_name=hdisk989,vios_name=vios1,vios_id=1,size=61440,type=phys,state=Inactive,phys_loc=U2C4E.001.DBJN916-P2-C1-T1-W500507680140F32C-L3E6000000000000,is_redundant=1,redundant_device_name=hdisk989,redundant_vios_name=vios2,redundant_vios_id=2,redundant_state=Inactive,redundant_phys_loc=U2C4E.001.DBJN916-P2-C2-T1-W500507680140F32C-L3E6000000000000,lpar_id=none,device_selection_type=manual
      device_name=hdisk990,vios_name=vios1,vios_id=1,size=61440,type=phys,state=Inactive,phys_loc=U2C4E.001.DBJN916-P2-C1-T1-W500507680140F32C-L3E7000000000000,is_redundant=1,redundant_device_name=hdisk990,redundant_vios_name=vios2,redundant_vios_id=2,redundant_state=Inactive,redundant_phys_loc=U2C4E.001.DBJN916-P2-C2-T1-W500507680140F32C-L3E7000000000000,lpar_id=none,device_selection_type=manual
      $ lshwres -r rspool -m source-machine
      "vios_names=vios1,vios2","vios_ids=1,2"
      
    • You can also create the reserved storage device pool from the Hardware Management Console GUI :
    • After selecting the Virtual I/O Server, click select devices :
    • rr_rsd_pool_p

    • Choose the maximum and minimum size to filter the devices you can select for the creation of the reserved storage device :
    • rr_rsd_pool2_p

    • Choose the disks you want to put in your reserved storage device pool (put all the devices used by remote restart partitions in manual mode; automatic devices are used by suspend/resume operations or an AMS pool. One device cannot be shared by two remote restart partitions) :
    • rr_rsd_pool_waiting_3_p
      rr_pool_create_7_p

    • You can check afterwards that your reserved storage device pool is created and is composed of three devices :
    • rr_pool_create_9
      rr_pool_create_8_p

    Select a storage device for each remote restart partition before starting it :

    After creating the reserved storage device pool you have to select, for every partition, a device from the pool. This device will be used to store the configuration data of the partition :

    • Note that you cannot start the partition if no device has been selected !
    • To select a device of the correct size you first have to calculate the needed space for every partition using the lsrsdevsize command. This size is roughly the max memory value set in the partition profile (don’t ask me why):
    • $ lsrsdevsize -m source-machine -p temp3-b642c120-00000133
      size=8498
      
    • Select the device you want to assign to your machine (in my case there was already a device selected for this machine) :
    • rr_rsd_pool_assign_p

    • Then select the machine you want to assign the device to :
    • rr_rsd_pool_assign2_p

    • Or do it from the command line :
    • $ chsyscfg -r lpar -m source-machine -i "name=temp3-b642c120-00000133,primary_rs_vios_name=vios1,secondary_rs_vios_name=vios2,rs_device_name=hdisk988"
      $ lssyscfg -r lpar -m source-machine --filter "lpar_names=temp3-b642c120-00000133" -F primary_rs_vios_name,secondary_rs_vios_name,curr_rs_vios_name
      vios1,vios2,vios1
      $ lshwres -r rspool -m source-machine --rsubtype rsdev
      device_name=hdisk988,vios_name=vios1,vios_id=1,size=61440,type=phys,state=Active,phys_loc=U2C4E.001.DBJN916-P2-C1-T1-W500507680140F32C-L3E5000000000000,is_redundant=1,redundant_device_name=hdisk988,redundant_vios_name=vios2,redundant_vios_id=2,redundant_state=Active,redundant_phys_loc=U2C4E.001.DBJN916-P2-C2-T1-W500507680140F32C-L3E5000000000000,lpar_name=temp3-b642c120-00000133,lpar_id=11,device_selection_type=manual
      

    Launch the remote restart operation

    All the remote restart operations are launched from the Hardware Management Console with the rrstartlpar command. At the time of writing this post there is no GUI function to remote restart a machine; you can only do it from the command line :

    Validation

    As with a Live Partition Mobility move, you can validate a remote restart operation before running it. You can only perform the remote restart operation if the machine hosting the remote restart partition is shut down or in error, so the validation is very useful (and mandatory) to check that your remote restart machines are well configured without having to stop the source machine :

    $ rrstartlpar -o validate -m source-machine -t dest-machine -p rrlpar
    $ rrstartlpar -o validate -m source-machine -t dest-machine -p rrlpar -d 5
    $ rrstartlpar -o validate -m source-machine -t dest-machine -p rrlpar --redundantvios 2 -d 5 -v
    

    Execution

    As I said before, the remote restart operation can only be performed if the source machine is in a particular state; the states that allow a remote restart operation are :

    • Power Off.
    • Error.
    • Error – Dump in progress state.

    So the only way to test a remote restart operation today is to shut down your source machine :

    • Shutdown the source machine :
    • step1

      $ chsysstate -m source-machine -r sys  -o off --immed
      

      rr_step2_mod

    • You can then check on the Hardware Management Console that the Virtual I/O Servers and the remote restart LPAR are in the “Not available” state. You’re now ready to remote restart the LPAR (if the partition ID is already used on the destination machine, the next available one will be used) (you have to wait a little before remote restarting the partition, check below) :
    • $ rrstartlpar -o restart -m source-machine -t dest-machine -p rrlpar -d 5 -v
      HSCLA9CE The managed system is not in a valid state to support partition remote restart operations.
      $ rrstartlpar -o restart -m source-machine -t dest-machine -p rrlpar -d 5 -v
      Warnings:
      HSCLA32F The specified partition ID is no longer valid. The next available partition ID will be used.
      

      step3
      rr_step4_mod
      step5

    Cleanup

    When the source machine is ready to come back up (after an outage for instance), just boot the machine and its Virtual I/O Servers. After the machine is up you will notice that the rrlpar profile is still there, which can be a huge problem if somebody tries to boot this partition, because it is already running on the other machine after the remote restart operation. To prevent such an error you have to clean up your remote restart partition by using the rrstartlpar command again. Be careful not to check the option that boots the partitions when the machine is started :

    • Restart the source machine and its Virtual I/O Servers :
    • $ chsysstate -m source-machine -r sys -o on
      $ chsysstate -r lpar -m source-machine -n vios1 -o on -f default_profile
      $ chsysstate -r lpar -m source-machine -n vios2 -o on -f default_profile
      

      rr_step6_mod

    • Perform the cleanup operation to remove the profile of the remote restart partition (if you want to LPM your machine back later, you have to keep the device in the reserved storage device pool; if you do not use the --retaindev option the device will be automatically removed from the pool) :
    • $ rrstartlpar -o cleanup -m source-machine -p rrlpar --retaindev -d 5 -v --force
      

      rr_step7_mod

    Refresh the partition and profile data

    During my tests I encountered a problem: the configuration was not correctly synced between the device used in the reserved storage device pool and the current partition profile. I had to use a command named refdev (for refresh device) to synchronize the partition and profile data to the storage device.

    $ refdev -m source-machine -p temp3-b642c120-00000133 -v
    

    What’s in the reserved storage device ?

    I’m a curious guy. After playing with remote restart I asked myself a question: what is really stored in the reserved storage device assigned to the remote restart partition ? Looking at the documentation on the internet did not answer my question, so I had to look into it on my own. By ‘dd’ing the reserved storage device assigned to a partition I realized that the profile is stored in XML format. Maybe this format is the same as the one used by the HMC 8 templates library. For the moment, and during my tests on Power7+ machines, the memory state of the partition is not transferred to the destination machine, maybe because I had to shut down the whole source machine to test. Maybe the memory state is transferred to the destination machine if this one is in error state or is dumping; I had no chance to test this :

    root@vios1:/home/padmin# dd if=/dev/hdisk17 of=/tmp/hdisk17.out bs=1024 count=10
    10+0 records in
    10+0 records out
    root@vios1:/home/padmin# more hdisk17.out
    [..]
    AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
    BwEAAAAAAACAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACgDIAZAAAQAEAAgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA" Profile="H4sIAAAAA
    98VjxbxEAhNaZEqpEptPS/iMJO4cTJBdHVj38zcYvu619fTGQlQVmxY0AUICSH4A5XYorJgA1I3sGMBCx5Vs4RNd2zgXI89tpNMxslIiRzPufec853zfefk/t/osMfRBYPZRbpuF9ueUTQsShxR1NSl9dvEEPPMMgnfvPnVk
    a2ixplLuOiVCHaUKn/yYMv/PY/ydTRuv016TbgOzdVv4w6+KM0vyheMX62jgq0L7hsCXtxBH6J814WoZqRh/96+4a+ff3Br8+o3uTE0pqJZA7vYoKKnOgYnNoSsoiPECp7KzHfELTQV/lnBAgt0/Fbfs4Wd1sV+ble7Lup/c
    be0LQj01FJpoVpecaNP15MhHxpcJP8al6b7fg8hxCnPY68t8LpFjn83/eKFhcffjqF8DRUshs0almioaFK0OfHaUKCue/1GcN0ndyfg9/fwsyzQ6SblellXK6RDDaIIwem6L4iXCiCfCuBZxltFz6G4eHed2EWD2sVVx6Mth
    eEOtnzSjQoVwLbo2+uEf3T/s2emPv3z4xA16eD0AC6oRN3FXNnYoA6U7y3OfFc1g5hOIiTQsVUHSusSc43QVluEX2wKdKJZq4q2YmJXEF7hhuqYJA0+inNx3YTDab2m6T7vEGpBlAaJnU0qjWofTkj+uT2Tv3Rl69prZx/9s
    thQTBMK42WK7XSzrizqFhPL5E6FeHGVhnSJQLlKKreab1l6z9MwF0C/jTi3OfmKCsoczcJGwITgy+f74Z4Lu2OU70SDyIdXg1+JAApBWZoAbLaEj4InyonZIDbjvZGwv3H5+tb7C5tPThQA9oUdsCN0HsnWoLxWLjPHAdJSp
    Ja45pBarVb3JDyUJOn3aemXcIqtUfgPi3wCuiw76tMh6mVtNVDHOB+BxqEUDWZGtPgPrFc9oBgBhhJzEdsEVI9zC1gr0JTexhwgThzIwYEG7lLbt3dcPyHQLKQqfGzVsSNzVSvenkDJU/lUoiXGRNrdxLy2soyhtcNX47INZ
    nHKOCjYfsoeR3kpm58GdYDVxipIZXDgSmhfCDCPlKZm4dZoVFORzEX0J6CLvK4py6N7Pz94yiXlPBAArd3zqIEtjXFZ4izJzQ44sCv7hh3bTnY5TbKdnOtHGtatTjrEynTuWFNXV3ouaUKIIKfDgE5XrrpWb/SHWyWCbXMM5
    DkaHNzXVJws6csK57jnpToLopiQLZdgHJJh9wm+M+wbof7GzSRJBYvAAaV0RvE8ZlA5yxSob4fAiJiNNwwQAwu2y5/O881fvvz3HxgK70ZDwc1FS8JezBgKR0e/S4XR3ta8OwmdS56akXJITAmYBpElF5lZOdlXuO+8N0opU
    m0HeJTw76oiD8PS9QfRECUYqk0B1KGkZ+pRGQPUhPFEb12XIoe7u4WXuwdVqTAnZT8gyYrvAPlL/sYG4RkDmAx5HFZpFIVnAz9Lrlyh9tFIc4nZAColOLNGdFRKmE8GJd5zZx++zMiAoTOWNrJvBjODNo1UOGuXngzcHWjrn
    LgmkxjBXLj+6Fjy1DHFF0zV6lVH/p+VYO6pbZzYD9/ORFLouy6MwvlGuRz8Qz10ugawprAdtJ4GxWAOtmQjZXJ+Lg58T/fDy4K74bYWr9CyLIVdQiplHPLbjinZRu4BZuAENE6jxTP2zNkBVgfiWiFcv7f3xYjFqxs/7vb0P
     lpar_name="rrlpar" lpar_uuid="0D80582A44F64B43B2981D632743A6C8" lpar_uuid_gen_method="0"><SourceLparConfig additional_mac_addr_bases="" ame_capability="0" auto_start_e
    rmal" conn_monitoring="0" desired_proc_compat_mode="default" effective_proc_compat_mode="POWER7" hardware_mem_encryption="10" hardware_mem_expansion="5" keylock="normal
    "4" lpar_placement="0" lpar_power_mgmt="0" lpar_rr_dev_desc="	<cpage>		<P>1</P>
    		<S>51</S>
    		<VIOS_descri
    00010E0000000000003FB04214503IBMfcp</VIOS_descriptor>
    	</cpage>
    " lpar_rr_status="6" lpar_tcc_slot_id="65535" lpar_vtpm_status="65535" mac_addres
    x_virtual_slots="10" partition_type="rpa" processor_compatibility_mode="default" processor_mode="shared" shared_pool_util_authority="0" sharing_mode="uncapped" slb_mig_
    ofile="1" time_reference="0" uncapped_weight="128"><VirtualScsiAdapter is_required="false" remote_lpar_id="2" src_vios_slot_number="4" virtual_slot_number="4"/><Virtual
    "false" remote_lpar_id="1" src_vios_slot_number="3" virtual_slot_number="3"/><Processors desired="4" max="8" min="1"/><VirtualFibreChannelAdapter/><VirtualEthernetAdapt
    " filter_mac_address="" is_ieee="0" is_required="false" mac_address="82776CE63602" mac_address_flags="0" qos_priority="0" qos_priority_control="false" virtual_slot_numb
    witch_id="1" vswitch_name="vdct"/><Memory desired="8192" hpt_ratio="7" max="16384" memory_mode="ded" min="256" mode="ded" psp_usage="3"><IoEntitledMem usage="auto"/></M
     desired="200" max="400" min="10"/></SourceLparConfig></SourceLparInfo></SourceInfo><FileInfo modification="0" version="1"/><SriovEthMappings><SriovEthVFInfo/></SriovEt
    VirtualFibreChannelAdapterInfo/></VfcMappings><ProcPools capacity="0"/><TargetInfo concurr_mig_in_prog="-1" max_msp_concur_mig_limit_dynamic="-1" max_msp_concur_mig_lim
    concur_mig_limit="-1" mpio_override="1" state="nonexitent" uuid_override="1" vlan_override="1" vsi_override="1"><ManagerInfo/><TargetMspInfo port_number="-1"/><TargetLp
    ar_name="rrlpar" processor_pool_id="-1" target_profile_name="mig3_9117_MMD_10C94CC141109224549"><SharedMemoryConfig pool_id="-1" primary_paging_vios_id="0"/></TargetLpa
    argetInfo><VlanMappings><VlanInfo description="VkVSU0lPTj0xClZJT19UWVBFPVZFVEgKVkxBTl9JRD0zMzMxClZTV0lUQ0g9dmRjdApCUklER0VEPXllcwo=" vlan_id="3331" vswitch_mode="VEB" v
    ibleTargetVios/></VlanInfo></VlanMappings><MspMappings><MspInfo/></MspMappings><VscsiMappings><VirtualScsiAdapterInfo description="PHYtc2NzaS1ob3N0PgoJPGdlbmVyYWxJbmZvP
    mVyc2lvbj4KCQk8bWF4VHJhbmZlcj4yNjIxNDQ8L21heFRyYW5mZXI+CgkJPGNsdXN0ZXJJRD4wPC9jbHVzdGVySUQ+CgkJPHNyY0RyY05hbWU+VTkxMTcuTU1ELjEwQzk0Q0MtVjItQzQ8L3NyY0RyY05hbWU+CgkJPG1pb
    U9TcGF0Y2g+CgkJPG1pblZJT1Njb21wYXRhYmlsaXR5PjE8L21pblZJT1Njb21wYXRhYmlsaXR5PgoJCTxlZmZlY3RpdmVWSU9TY29tcGF0YWJpbGl0eT4xPC9lZmZlY3RpdmVWSU9TY29tcGF0YWJpbGl0eT4KCTwvZ2VuZ
    TxwYXJ0aXRpb25JRD4yPC9wYXJ0aXRpb25JRD4KCTwvcmFzPgoJPHZpcnREZXY+CgkJPHZEZXZOYW1lPnJybHBhcl9yb290dmc8L3ZEZXZOYW1lPgoJCTx2TFVOPgoJCQk8TFVBPjB4ODEwMDAwMDAwMDAwMDAwMDwvTFVBP
    FVOU3RhdGU+CgkJCTxjbGllbnRSZXNlcnZlPm5vPC9jbGllbnRSZXNlcnZlPgoJCQk8QUlYPgoJCQkJPHR5cGU+dmRhc2Q8L3R5cGU+CgkJCQk8Y29ubldoZXJlPjE8L2Nvbm5XaGVyZT4KCQkJPC9BSVg+CgkJPC92TFVOP
    gkJCTxyZXNlcnZlVHlwZT5OT19SRVNFUlZFPC9yZXNlcnZlVHlwZT4KCQkJPGJkZXZUeXBlPjE8L2JkZXZUeXBlPgoJCQk8cmVzdG9yZTUyMD50cnVlPC9yZXN0b3JlNTIwPgoJCQk8QUlYPgoJCQkJPHVkaWQ+MzMyMTM2M
    DAwMDAwMDAwMDNGQTA0MjE0NTAzSUJNZmNwPC91ZGlkPgoJCQkJPHR5cGU+VURJRDwvdHlwZT4KCQkJPC9BSVg+CgkJPC9ibG9ja1N0b3JhZ2U+Cgk8L3ZpcnREZXY+Cjwvdi1zY3NpLWhvc3Q+" slot_number="4" sou
    _slot_number="4"><PossibleTargetVios/></VirtualScsiAdapterInfo><VirtualScsiAdapterInfo description="PHYtc2NzaS1ob3N0PgoJPGdlbmVyYWxJbmZvPgoJCTx2ZXJzaW9uPjIuNDwvdmVyc2lv
    NjIxNDQ8L21heFRyYW5mZXI+CgkJPGNsdXN0ZXJJRD4wPC9jbHVzdGVySUQ+CgkJPHNyY0RyY05hbWU+VTkxMTcuTU1ELjEwQzk0Q0MtVjEtQzM8L3NyY0RyY05hbWU+CgkJPG1pblZJT1NwYXRjaD4wPC9taW5WSU9TcGF0
    YXRhYmlsaXR5PjE8L21pblZJT1Njb21wYXRhYmlsaXR5PgoJCTxlZmZlY3RpdmVWSU9TY29tcGF0YWJpbGl0eT4xPC9lZmZlY3RpdmVWSU9TY29tcGF0YWJpbGl0eT4KCTwvZ2VuZXJhbEluZm8+Cgk8cmFzPgoJCTxwYXJ0
    b25JRD4KCTwvcmFzPgoJPHZpcnREZXY+CgkJPHZEZXZOYW1lPnJybHBhcl9yb290dmc8L3ZEZXZOYW1lPgoJCTx2TFVOPgoJCQk8TFVBPjB4ODEwMDAwMDAwMDAwMDAwMDwvTFVBPgoJCQk8TFVOU3RhdGU+MDwvTFVOU3Rh
    cnZlPm5vPC9jbGllbnRSZXNlcnZlPgoJCQk8QUlYPgoJCQkJPHR5cGU+dmRhc2Q8L3R5cGU+CgkJCQk8Y29ubldoZXJlPjE8L2Nvbm5XaGVyZT4KCQkJPC9BSVg+CgkJPC92TFVOPgoJCTxibG9ja1N0b3JhZ2U+CgkJCTxy
    UlZFPC9yZXNlcnZlVHlwZT4KCQkJPGJkZXZUeXBlPjE8L2JkZXZUeXBlPgoJCQk8cmVzdG9yZTUyMD50cnVlPC9yZXN0b3JlNTIwPgoJCQk8QUlYPgoJCQkJPHVkaWQ+MzMyMTM2MDA1MDc2ODBDODAwMDEwRTAwMDAwMDAw
    ZmNwPC91ZGlkPgoJCQkJPHR5cGU+VURJRDwvdHlwZT4KCQkJPC9BSVg+CgkJPC9ibG9ja1N0b3JhZ2U+Cgk8L3ZpcnREZXY+Cjwvdi1zY3NpLWhvc3Q+" slot_number="3" source_vios_id="1" src_vios_slot_n
    tVios/></VirtualScsiAdapterInfo></VscsiMappings><SharedMemPools find_devices="false" max_mem="16384"><SharedMemPool/></SharedMemPools><MigrationSession optional_capabil
    les" recover="na" required_capabilities="veth_switch,hmc_compatibilty,proc_compat_modes,remote_restart_capability,lpar_uuid" stream_id="9988047026654530562" stream_id_p
    on>
    

    About the state of the source machine ?

    You have to know this before using remote restart : at the time of writing this post the remote restart feature is still young and has to evolve before being usable in real life. I’m saying this because the FSP of the source machine has to be up to perform a remote restart operation. To be clear, the remote restart feature does not cover the total loss of one of your sites. It’s just useful to restart the partitions of a system with a problem that is not an FSP problem (a memory DIMM problem or a CPU problem, for instance). It can be used in your DRP exercises, but not if your whole site is totally down, which is (in my humble opinion) one of the key cases remote restart needs to address. Don’t be afraid, read the conclusion …

    Conclusion

    This post has been written using Power7+ machines; my goal was to give you an example of remote restart operations : a summary of what it is, how it works, and where and when to use it. I’m pretty sure that a lot of things are going to change about remote restart. First, on Power8 machines you don’t have to recreate the partitions to make them remote restart aware. Second, I know that changes are on the way for remote restart on Power8 machines, especially about reserved storage devices and about the state of the source machine. I’m sure this feature has a bright future and, used with PowerVC, it can be a killer feature. Hope to see all these changes in the near future ;-). Once again I hope this post helps you.