Unleash the true potential of SRIOV vNIC using vNIC failover !

I’m always working on tight schedule, I never have the time to write documentation because we’re moving fast, very fast … but not as fast as I want to ;-). A few months ago we were asked to put the TSM servers in our PowerVC environment I thought it was a very very bad idea to put a pet among the cattle as TSM servers are very specific and super I/O intensive in our environment (and are configured with plenty of rmt devices. This means that we tried to put lan-free stuffs into Openstack which is not designed at all for this kind of things). In my previous place we tried to put the TSM servers behind a virtualized environment (this means serving network through Shared Ethernet Adapters) and this was an EPIC FAIL. A few weeks after putting the servers in production we decided to move back to physical I/O and decided to used dedicated network adapters. As we didn’t want to make the same mistake in my current place we decided not to go on Shared Ethernet Adapters. Instead of that we took the decision to use SRIOV vNIC. SRIOV vNIC have the advantage to be fully virtualized (this means LPM aware and super flexible) allowing us to have the wanted flexibility (by moving TSM servers between sites if we feel the need to put a host in maintenance mode or if we are facing any kind of outage). In my previous blog post about vNIC I was very happy with the performance but not with the reliability. I didn’t want to go on NIB adapters for network redundancy (because it is an anti-virtualization way of doing things (we do not want to manage anything inside the VM, we want to let the virtualization environment do the job for us)). Lucky for me the project was reschedule to the end of the year and we finally took the decision not to put the TSM server into our big Openstack by dedicating some hosts for the backup stuffs. The latest version of PowerVM, HMC and firmware arrived just at time to let me use SRIOV vNIC failover new feature for this new TSM environment (fortunately for me we had some data center issues allowing me to wait enough time not to go on NIB and start the production directly with SRIOV vNIC \o/). I just have delivered the first four servers to my backup team yesterday and I must admit that SRIOV vNIC failover is a killer feature for this kind of things. Let’s now see how to setup this !

Prerequisites

As always using the latest features means you need to have everything up to date. In this case the minimal requierements for SRIOV vNIC failover are Virtual I/O Servers 2.2.5.10, Hardware Management Console v8R860 with the latest patchs and finally having a firmware up to date (ie. fw 860). Note that not all AIX versions are ok with SRIOV vNIC I’m here only using AIX 7.2 TL1 SP1:

  • Check the Virtual I/O Server are installed in 2.2.5.10:
  • # ioslevel
    2.2.5.10
    
  • Check the HMC is in the latest version (V8R860)
  • hscroot@myhmc:~> lshmc -V
    "version= Version: 8
     Release: 8.6.0
     Service Pack: 0
    HMC Build level 20161101.1
    MH01655: Required fix for HMC V8R8.6.0 (11-01-2016)
    ","base_version=V8R8.6.0
    "
    

    860

  • Check the firmware version is ok on the PowerSystem:
  • # updlic -o u -t sys -l latest -m reptilian-9119-MME-659707C -r mountpoint -d /home/hscroot/860_056/ -v
    # lslic -m reptilan-9119-MME-65BA46F -F activated_level,activated_spname
    56,FW860.10
    

    fw

What’s SRIOV vNIC failover and how it works ?

I’ll not explain here what’s an SRIOV vNIC, if you want to know more about it just check my previous blog post speaking about this topic A first look at SRIOV vNIC adapters. What’s failover is adding is a feature allowing you to add as “many” backing devices as you want for a vNIC adapter (the maximum is 6 backing devices). For each backing device you have the possibility to choose on which Virtual I/O Server will be created the corresponding vnicserver and set a failover priority to determine which backing device is active. Keep in mind that priorities are working the exact same way as it is with Shared Ethernet Adapter. This means that priority 10 is an higher priority than priority 20.

vnicvisio1

On the example shown on the images above and below the vNIC is configured with two backing devices (on two differents SRIOV adapters) with priority 10 and 20. As long as there is no outage (for instance on the Virtual I/O Server or on the adapter itself) the physical port utilized will be the one with priority 10. If the adapter has for instance an hardware issue we will have the possiblity to manually fallback on the second backing device or let the hypervisor do this for us by checking the next highest priority to choose the right backing device to use. Easy. This allow us to have redundant LPM aware and high performance adapters fully virtualized. A MUST :-) !

vnicvisio2

Creating a SRIOV vNIC failover using the HMC GUI and administrating it

To create or delete an SRIOV vNIC failover adapter (I’ll call this vNIC for the rest of the blog post) the machine must be shutdown or active (this is not possible to add a vNIC when a machine is booted in OpenFirmware). The only way to do this using the HMC GUI is to used the enhanced interface (no problem as we will have no other choice in a near future). Select the machine on which you want to create the adapter and click on the “Virtual NICs” tab.

vnic1b

Click “Add Virtual NIC”:

vnic1c

Chose the “Physical Port Location Code” (the physical port of the SRIOV adapter) on which you want to create the vNIC. You can add from one to six “backup adapter” (by clicking the “Add Entry” buton). This means that only one vNIC will be active at a moment. If this one is failing (adapter issue, network issue) the vNIC will failover to the next backup adapter depending on the “Failover priority”. Be careful to spread the hosting Virtual I/O Server to be sure that having a Virtual I/O Server down will be seamless for you partition:

vnic1d

On the example above:

  • I’m creating a vNIC failover with “vNIC Auto Priority Failover” enabled.
  • Four VF will be created two on the VIOS ending with 88, two on the VIOS ending with 89.
  • Obviously four vnicservers will be created on the VIOS (2 on each).
  • The lower priority will take the lead. This means That if the first one with priority 10 is failing the active adapter will be the second one. Then if the second one with priority 20 is failing the third one will be active and so on. Keep in my that if your lower priority is ok nothing will appends if one on the other backup adapter is failing. Be smart when choosing the priorities. As Yoda says “Wise you must be!”.
  • The physical ports are located on different CECs.

vnic1e

The “Advanced Virtual NIC Settings” is applied to all the vNIC that will be created (in the example above 4). For instance I’m using vlan tagging on these port so I just need to apply the “Port VLAN ID” one time.

vnic1f

You can choose or not to allow the hypervisor to perform the failover/fallback automatically depending on the priorities you have set. If you click “enable” the hypervisor will automatically failover to the next operational backing device depending on the priorities. If it is disabled only a user can trigger a failover operation.

vnic1g

Be careful the priorities are designed the same way they are on Shared Ethernet Adapter. This means the lowest number you will have in the failover priority will be the “highest priority failover” just like it is designed for Shared Ethernet Adapter. On the image below you can notice that the “priority” 10 which is the “highest failover priority” is active (but it is the lowest number between 10 20 30 and 40)

vnic1h

After the creation of the vNIC you can check differents stuffs on the Virtual I/O Server. You will notice that every entry added for the creation of the vNIC has a corresponding VF (virtual function) and a corresponding vnicserver (each vnicserver has a VF mapped on it):

  • You can see that for each entry added when creating a vNIC you’ll have the corresponding VF device present on the Virtual I/O Servers:
  • vios1# lsdev -type adapter -field name physloc description | grep "VF"
    [..]
    ent3             U78CA.001.CSS08ZN-P1-C3-C1-T2-S5                                  PCIe3 4-Port 10GbE SR Adapter VF(df1028e21410e304)
    ent4             U78CA.001.CSS08EL-P1-C3-C1-T2-S6                                  PCIe3 4-Port 10GbE SR Adapter VF(df1028e21410e304)
    
    vios2# lsdev -type adapter -field name physloc description | grep "VF"
    [..]
    ent3             U78CA.001.CSS08ZN-P1-C4-C1-T2-S2                                  PCIe3 4-Port 10GbE SR Adapter VF(df1028e21410e304)
    ent4             U78CA.001.CSS08EL-P1-C4-C1-T2-S2                                  PCIe3 4-Port 10GbE SR Adapter VF(df1028e21410e304)
    
  • For each VF you’ll see the corresponding vnicserver devices:
  • vios1# lsdev -type adapter -virtual | grep vnicserver
    [..]
    vnicserver1      Available   Virtual NIC Server Device (vnicserver)
    vnicserver2      Available   Virtual NIC Server Device (vnicserver)
    
    vios2# lsdev -type adapter -virtual | grep vnicserver
    [..]
    vnicserver1      Available   Virtual NIC Server Device (vnicserver)
    vnicserver2      Available   Virtual NIC Server Device (vnicserver)
    
  • You can check the corresponding mapped VF for each vnicserver using the ‘lsmap’ command. You can check on funny thing: when the adapter was never “used” by using the “Make the backing Device Active” button in the GUI the corresponding client name and Client device will not be showed:
  • vios1# lsmap -all -vnic -fmt :
    [..]
    vnicserver1:U9119.MME.659707C-V2-C32898:6:lizard:AIX:ent3:Available:U78CA.001.CSS08ZN-P1-C3-C1-T2-S5:ent0:U9119.MME.659707C-V6-C6
    vnicserver2:U9119.MME.659707C-V2-C32899:6:N/A:N/A:ent4:Available:U78CA.001.CSS08EL-P1-C3-C1-T2-S6:N/A:U9119.MME.659707C-V6-C6
    
    vios2# lsmap -all -vnic
    [..]
    Name          Physloc                            ClntID ClntName       ClntOS
    ------------- ---------------------------------- ------ -------------- -------
    vnicserver1   U9119.MME.659707C-V1-C32898             6 N/A            N/A
    
    Backing device:ent3
    Status:Available
    Physloc:U78CA.001.CSS08ZN-P1-C4-C1-T2-S2
    Client device name:ent0
    Client device physloc:U9119.MME.659707C-V6-C6
    
    Name          Physloc                            ClntID ClntName       ClntOS
    ------------- ---------------------------------- ------ -------------- -------
    vnicserver2   U9119.MME.659707C-V1-C32899             6 N/A            N/A
    
    Backing device:ent4
    Status:Available
    Physloc:U78CA.001.CSS08EL-P1-C4-C1-T2-S2
    Client device name:N/A
    Client device physloc:U9119.MME.659707C-V6-C6
    
  • You can activate the device by yourself just by clicking the “Make backing Device Active Button” in the GUI and check the vnicserver is now logged:
  • vnic1i
    vnic1j

    vios2# lsmap -all -vnic -vadapter
    [..]
    vnicserver1:U9119.MME.659707C-V1-C32898:6:lizard:AIX:ent3:Available:U78CA.001.CSS08ZN-P1-C4-C1-T2-S2:ent0:U9119.MME.659707C-V6-C6
    vnicserver2:U9119.MME.659707C-V1-C32899:6:N/A:N/A:ent4:Available:U78CA.001.CSS08EL-P1-C4-C1-T2-S2:N/A:U9119.MME.659707C-V6-C6
    
  • I noticed something pretty strange for me. When you are doing a manual failover of the vNIC the auto-priority will be set to disable. Remember to re-enable it after the manual operation was performed:
  • vnic1k

    You can also check the status and the priority of the vNIC in the Virtual I/O Server using the vnicstat command. Some good information are showed by the command, the state of the device, if it is active or not (I have noticed 2 different states in my test which are “active” (meaning this is the vf/vnicserver you are using) and “config_2″ meaning the adapter is ready and available for a failover operation (there is probably another state when the link is down but I didn’t had the time to ask my network team to shut a port to verify this)) and finally the failover priority. The vnicstat command is a root command.

    vios1#  vnicstat vnicserver1
    
    --------------------------------------------------------------------------------
    VNIC Server Statistics: vnicserver1
    --------------------------------------------------------------------------------
    Device Statistics:
    ------------------
    State: active
    Backing Device Name: ent3
    
    Failover State: active
    Failover Readiness: operational
    Failover Priority: 10
    
    Client Partition ID: 6
    Client Partition Name: lizard
    Client Operating System: AIX
    Client Device Name: ent0
    Client Device Location Code: U9119.MME.659707C-V6-C6
    [..]
    
    vios2# vnicstat vnicserver1
    --------------------------------------------------------------------------------
    VNIC Server Statistics: vnicserver1
    --------------------------------------------------------------------------------
    Device Statistics:
    ------------------
    State: config_2
    Backing Device Name: ent3
    
    Failover State: inactive
    Failover Readiness: operational
    Failover Priority: 20
    [..]
    

    You can also check vnic server events in this errpt (login when failover and so on …)

    # errpt | more
    8C577CB6   1202195216 I S vnicserver1    VNIC Transport Event
    60D73419   1202194816 I S vnicserver1    VNIC Client Login
    # errpt -aj 60D73419 | more
    ---------------------------------------------------------------------------
    LABEL:          VS_CLIENT_LOGIN
    IDENTIFIER:     60D73419
    
    Date/Time:       Fri Dec  2 19:48:06 2016
    Sequence Number: 10567
    Machine Id:      00C9707C4C00
    Node Id:         vios2
    Class:           S
    Type:            INFO
    WPAR:            Global
    Resource Name:   vnicserver1
    
    Description
    VNIC Client Login
    
    Probable Causes
    VNIC Client Login
    
    Failure Causes
    VNIC Client Login
    

    Same thing using the hmc command line.

    Now we will do the same thing in command line. I warn you the commands are pretty huge !!!!

    • List the sriov adapter (you will need those to create the vNICs):
    • # lshwres -r sriov --rsubtype adapter -m reptilian-9119-MME-65BA46F
      adapter_id=3,slot_id=21010012,adapter_max_logical_ports=64,config_state=sriov,functional_state=1,logical_ports=64,phys_loc=U78CA.001.CSS08XH-P1-C3-C1,phys_ports=4,sriov_status=running,alternate_config=0
      adapter_id=4,slot_id=21010013,adapter_max_logical_ports=64,config_state=sriov,functional_state=1,logical_ports=64,phys_loc=U78CA.001.CSS08XH-P1-C4-C1,phys_ports=4,sriov_status=running,alternate_config=0
      adapter_id=1,slot_id=21010022,adapter_max_logical_ports=64,config_state=sriov,functional_state=1,logical_ports=64,phys_loc=U78CA.001.CSS08RG-P1-C3-C1,phys_ports=4,sriov_status=running,alternate_config=0
      adapter_id=2,slot_id=21010023,adapter_max_logical_ports=64,config_state=sriov,functional_state=1,logical_ports=64,phys_loc=U78CA.001.CSS08RG-P1-C4-C1,phys_ports=4,sriov_status=running,alternate_config=0
      
    • List vNIC for virtual machine “lizard”:
    • lshwres -r virtualio  -m reptilian-9119-MME-65BA46F --rsubtype vnic --level lpar --filter "lpar_names=lizard"
      lpar_name=lizard,lpar_id=6,slot_num=6,desired_mode=ded,curr_mode=ded,auto_priority_failover=0,port_vlan_id=0,pvid_priority=0,allowed_vlan_ids=all,mac_addr=6ac53577b106,allowed_os_mac_addrs=all,"backing_devices=sriov/vios1/1/3/0/2700c003/2.0/2.0/50,sriov/vios2/2/1/0/27004003/2.0/2.0/60","backing_device_states=sriov/2700c003/0/Operational,sriov/27004003/1/Operational"
      
    • Creates a vNIC with 2 backing devices first one on Virtual I/O Server 1 on adapter 1 on physical port 2 with a failover priority set to 10, second one on Virtual I/O Server 2 on adapter 3 on physical port 2 with a failover priority set to 20 (this vNIC will take the next available slot which will be 6) (WARNING: Physical port numbering starts from 0):
    • #chhwres -r virtualio -m reptilian-9119-MME-65BA46F -o a -p lizard --rsubtype vnic -v -a 'port_vlan_id=3455,auto_priority_failover=1,backing_devices="sriov/vios1//1/1/2.0/10,sriov/vios1//3/1/2.0/20"'
      #lshwres -r virtualio -m reptilian-9119-MME-65BA46F --rsubtype vnic --level lpar --filter "lpar_names=lizard"
      lpar_name=lizard,lpar_id=6,slot_num=6,desired_mode=ded,curr_mode=ded,auto_priority_failover=1,port_vlan_id=3455,pvid_priority=0,allowed_vlan_ids=all,mac_addr=6ac53577b106,allowed_os_mac_addrs=all,"backing_devices=sriov/vios1/1/1/1/2700400b/2.0/2.0/10,sriov/vios2/2/3/1/2700c008/2.0/2.0/20","backing_device_states=sriov/2700400b/1/Operational,sriov/2700c008/0/Operational"
      
    • Add two backing devices (one on each vios on adapter 2 and 4, both on physical port 2 with failover priority set to 30 and 40) on vNIC with slot 6:
    • # chhwres -r virtualio -m reptilian-9119-MME-65BA46F -o s --rsubtype vnic -p lizard -s 6 -a '"backing_devices+=sriov/vios1//2/1/2.0/30,sriov/vios2//4/1/2.0/40"'
      # lshwres -r virtualio -m reptilian-9119-MME-65BA46F --rsubtype vnic --level lpar --filter "lpar_names=lizard"
      lpar_name=lizard,lpar_id=6,slot_num=6,desired_mode=ded,curr_mode=ded,auto_priority_failover=1,port_vlan_id=3455,pvid_priority=0,allowed_vlan_ids=all,mac_addr=6ac53577b106,allowed_os_mac_addrs=all,"backing_devices=sriov/vios1/1/1/1/2700400b/2.0/2.0/10,sriov/vios2/2/3/1/2700c008/2.0/2.0/20,sriov/vios1/1/2/1/27008005/2.0/2.0/30,sriov/vios2/2/4/1/27010002/2.0/2.0/40","backing_device_states=sriov/2700400b/1/Operational,sriov/2700c008/0/Operational,sriov/27008005/0/Operational,sriov/27010002/0/Operational"
      
    • Change the failover priority of logical port 2700400b of the vNIC in slot 6 to 11:
    • # chhwres -m reptilian-9119-MME-65BA46F -r virtualio -o s --rsubtype vnicbkdev -p lizard -s 6 --logport 2700400b -a "failover_priority=11"
      # lshwres -r virtualio -m reptilian-9119-MME-65BA46F --rsubtype vnic --level lpar --filter "lpar_names=lizard"
      lpar_name=lizard,lpar_id=6,slot_num=6,desired_mode=ded,curr_mode=ded,auto_priority_failover=1,port_vlan_id=3455,pvid_priority=0,allowed_vlan_ids=all,mac_addr=6ac53577b106,allowed_os_mac_addrs=all,"backing_devices=sriov/vios1/1/1/1/2700400b/2.0/2.0/11,sriov/vios2/2/3/1/2700c008/2.0/2.0/20,sriov/vios1/1/2/1/27008005/2.0/2.0/30,sriov/vios2/2/4/1/27010002/2.0/2.0/40","backing_device_states=sriov/2700400b/1/Operational,sriov/2700c008/0/Operational,sriov/27008005/0/Operational,sriov/27010002/0/Operational"
      
    • Make logical port 27008005 active on vNIC in slot 6:
    • # chhwres -m reptilian-9119-MME-65BA46F -r virtualio -o act --rsubtype vnicbkdev -p lizard  -s 6 --logport 27008005 
      # lshwres -r virtualio -m reptilian-9119-MME-65BA46F --rsubtype vnic --level lpar --filter "lpar_names=lizard"
      lpar_name=lizard,lpar_id=6,slot_num=6,desired_mode=ded,curr_mode=ded,auto_priority_failover=0,port_vlan_id=3455,pvid_priority=0,allowed_vlan_ids=all,mac_addr=6ac53577b106,allowed_os_mac_addrs=all,"backing_devices=sriov/vios1/1/1/1/2700400b/2.0/2.0/11,sriov/vios2/2/3/1/2700c008/2.0/2.0/20,sriov/vios1/1/2/1/27008005/2.0/2.0/30,sriov/vios2/2/4/1/27010002/2.0/2.0/40","backing_device_states=sriov/2700400b/0/Operational,sriov/2700c008/0/Operational,sriov/27008005/1/Operational,sriov/27010002/0/Operational"
      
    • Re-enable automatic failover on vNIC in slot 6:
    • # chhwres -m reptilian-9119-MME-65BA46F -r virtualio -o s --rsubtype vnic -p lizard  -s 6 -a "auto_priority_failover=1"
      # lshwres -r virtualio -m reptilian-9119-MME-65BA46F --rsubtype vnic --level lpar --filter "lpar_names=lizard"
      lpar_name=lizard,lpar_id=6,slot_num=6,desired_mode=ded,curr_mode=ded,auto_priority_failover=1,port_vlan_id=3455,pvid_priority=0,allowed_vlan_ids=all,mac_addr=6ac53577b106,allowed_os_mac_addrs=all,"backing_devices=sriov/vios1/1/1/1/2700400b/2.0/2.0/11,sriov/vios2/2/3/1/2700c008/2.0/2.0/20,sriov/vios1/1/2/1/27008005/2.0/2.0/30,sriov/vios2/2/4/1/27010002/2.0/2.0/40","backing_device_states=sriov/2700400b/1/Operational,sriov/2700c008/0/Operational,sriov/27008005/0/Operational,sriov/27010002/0/Operational"
      

    Testing the failover.

    It’s now time to test is the failover is working as intended. The test will be super simple I will just shutoff one of the two Virtual I/O Server and check if I’m loosing some packets or not. I’m first checking on which VIOS is located the active adapter:

    vnic1l

    I now need to shutdown the Virtual I/O Server ending with 88 and check if the one ending with 89 is taking the lead:

    *****88# shutdown -force 
    

    Priorities 10 and 30 are on the shutted Virtual I/O Server, the highest priority is on the active Virtual I/O Server is 20. This backing device hosted on the second Virtual I/O Server is serving the network I/Os;

    vnic1m

    You can check the same thing with command line on the remaining Virtual I/O Server:

    *****89# errpt | more
    IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
    60D73419   1202214716 I S vnicserver0    VNIC Client Login
    60D73419   1202214716 I S vnicserver1    VNIC Client Login
    *****89# vnicstat vnicserver1
    --------------------------------------------------------------------------------
    VNIC Server Statistics: vnicserver1
    --------------------------------------------------------------------------------
    Device Statistics:
    ------------------
    State: active
    Backing Device Name: ent3
    
    Failover State: active
    Failover Readiness: operational
    Failover Priority: 20
    
    

    During my tests the failover was working as I expected. You can see on the picture below that during this test I only lost one ping between 64 and 66 during the failover/failback process.

    vnic1n

    In the partition I saw some messaging in the errpt during the failover:

    # errpt | mroe 
    4FB9389C   1202215816 I S ent0           VNIC Link Up
    F655DA07   1202215816 I S ent0           VNIC Link Down
    # errpt -a | more
    [..]
    SOURCE ADDRESS
    56FB 2DB8 A406
    Event
    physical link: DOWN   logical link: DOWN
    Status
    [..]
    SOURCE ADDRESS
    56FB 2DB8 A406
    Event
    physical link: UP   logical link: UP
    Status
    

    What about Live Partition Mobility.

    If you want a seamless LPM experience without having to choose the destination adapter and physical port on which to map you current vNIC backing devices on the destination, just fill the label and sublabel (most important is label) for each physical port of your SRIOV adapter. Then during the LPM if names are aligned between two systems the good physical port will be automatically chose depending on the names of the label:

    vnic1o
    vnic1p

    The LPM was working like a charm and I didn’t notice any particular problems during the move. vNIC failover and LPM are working ok as long as you take care of your SRIOV labels :-). I did notice on AIX 7.2 TL1 SP1 that there was no errpt messages in the partition itself but just in the Virtual I/O Server … weird :-)

    # errpt | more
    IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
    3EB09F5A   1202222416 I S Migration      Migration completed successfully
    

    Conlusion.

    No long story here. If you need performance AND flexibility you absolutely have to use SRIOV vNIC failover adapters. This feature offers you the best of two worlds having the possibility to dedicate 10GB adapters with a failover capability without having to be worried about LPM or about NIB configuration. It’s not applicable in all cases but it’s definitely something to have for an environment such as TSM or network I/O intensive workloads. Use it !

    About reptilians !

    Before you start reading this, keep your sense of humor and be noticed that what I say is not related to my workplace at all it’s a general way of thinking not especially based on my experience. Don’t be offended by this it’s just a personal opinion based on things I may or may have not seen during my life. You’ve been warned.

    This blog was never a place to share my opinions about life and society but I must admit that I should have done that before. Speaking about this kind of things makes you feel alive in world where everything needs to be ok and where you don’t have anymore the right to feel or express something about what you are living. There are couple of good blog posts speaking of this kind of things and related to the IT world. I agree with all of what is said in these posts. Some of the authors of these posts are just telling what they love in their daily jobs but I think it’s also a way to say what they probably won’t love in another one :-) :

    • Adam Leventhal’s “I’m not a resource”: here
    • Brendan Gregg’s “Working at Netflix in 2016″: here

    All of this to say that I work at nights, I work on weekends, I’m thinking about PowerSystems/computers when I fall asleep. I always have new ideas and I always want to learn new things, discover new technologies and features. I truly, deeply love this but being like this does not help me and will never help me in my daily job for one single reason. In this world people who have the knowledge are not people who are taking technical decisions it’s sad but true. I’m just good at working the most I can for the less money possible. Nobody cares if techs are happy, unhappy, want to stay or leave. I doesn’t make any differences for anyone driving a company. What’s important is money. Everything is meaningless. We are no one we are nothing, just number in a excel spreadsheet. I’m probably saying because I’m not good enough in anything to find an acceptable workplace. Once again sad but true.

    Even worst, if you just want to follow what’s the industry is asking you have to be everywhere and know everything. I know I’ll be forced in a very near future to move on Devops/ Linux (I love Linux I’m an RHCE certified engineer !). That’s why since a couple of years now, at night after my daily job is finished I’m working again: working to understand how Docker is working, working to install my own Openstack on my own machines, working to understand Saltstack, Ceph, Python, Ruby, Go …. it’s a never ending process. But it’s still not enough for them ! No enough to be consider as good or good enough guy to fit for a job. I remember being asked to know about Openstack, Cassandra, Hadoop, AWS, KVM, Linux, Automation tools (puppet this time), Docker and continuous integration for one single job application. First, I seriously doubt that someone will have such skills and be good at each. Second even if I’m an expert on each one if you have a look a few years ago it was the exact same thing but with different products. You have to understand and be good at every new products in minutes. All of this to understand that one or two years after you are considered as an “expert” you are bad at everything that exists in the industry. I’m really sick of this fight against something I can’t control. Being a hard worker and clever enough to understand every new features is not enough nowadays. On top of that you also need to be a beautiful person with a nice perfect smile wearing a perfect suit. You also have to be on LinkedIn and be connected with the good persons. And even if every of these boxes are checked you still need to be lucky enough to be at the right place at the right moment. I’m so sick of this. Work doesn’t pay. Only luck. I don’t want to live in this kind of world but I have to. Anyway this is just a “two-cents” way of thinking. Everything is probably a big trick orchestrated by this reptilians lizard mens ! ^^. Be good at what you do and don’t care about what people are thinking of you (even your horrible french accent during your sessions) … that’s the most important !

    picture-of-reptilian-alien

    Putting NovaLink in Production & more PowerVC (1.3.1.2) tips and tricks

    I’ve been quite busy and writing the blog is getting to be more and more difficult with the amount of work I have but I try to stick to my thing as writing these blogs posts is almost the only thing I can do properly in my whole life. So why do without ? As my place is one of the craziest place I have ever worked in -(for the good … and the bad (I’ll not talk here about how are the things organized here or how is the recognition of your work but be sure it is probably be one the main reason I’ll probably leave this place one day or another)- the PowerSystems growth is crazy and the number of AIX partitions we are managing with PowerVC never stops increasing and I think that we are one the biggest PowerVC customer in the whole world (I don’t know if it is a good thing or not). Just to give you a couple of examples we have here on the biggest Power Enterprise Pool I have ever seen (384 Power8 mobile cores), the number of partitions managed by PowerVC is around 2600 and we have a PowerVC managing almost 30 hosts. You have understand well … theses numbers are huge. It’s seems to be very funny, but it’s not ; the growth is problem, a technical problem and we are facing problems that most of you will never hit. I’m speaking about density and scalability. Hopefully for us the “vertical” design of PowerVC can now be replaced by what I call an “horizontal” design. Instead of putting all the nova instances on one single machine, we now have the possibility to spread the load on each host by using NovaLink. As we needed to solve these density and scalability problems we decided to move all the P8 hosts to NovaLink (this process is still ongoing but most of the engineering stuffs are already done). As you now know we are not deploying a host every year but generally a couple by month and that’s why we needed to find a solution to automate this. So this blog post will talk about all the things and the best practices I have learn using and implementing NovaLink in a huge production environment (automated installation, tips and tricks, post-install, migration and so on). But we will not stop here I’ll also talk about the new things I have learn about PowerVC (1.3.1.2 and 1.3.0.1) and give more tips and tricks to use the product as it best. Before going any further I’d first want to say a big thank you to the whole PowerVC team for their kindness and the precious time they gave to us to advise and educate the OpenStack noob I am. (A special thanks to Drew Thorstensen for the long discussions we had about Openstack and PowerVC. He is probably one the most passionate guy I have ever met at IBM).

    Novalink Automated installation

    I’ll not write big introduction, let’s work and let’s start with NovaLink and how to automate the Novalink installation process. Copy the content of the installation cdrom to a directory that can be served by an http server on your NIM server (I’m using my NIM server for the bootp and tftp part). Note that I’m doing this with a tar command because there are symbolic links in the iso and a simple cp will end up with a full filesystem.

    # loopmount -i ESD_-_PowerVM_NovaLink_V1.0.0.3_062016.iso -o "-V cdrfs -o ro" -m /mnt
    # tar cvf iso.tar /mnt/*
    # tar xvf ios.tar -C /export/nim/lpp_source/powervc/novalink/1.0.0.3/iso
    # ls -l /export/nim/lpp_source/powervc/novalink/1.0.0.3/iso
    total 320
    dr-xr-xr-x    2 root     system          256 Jul 28 17:54 .disk
    -r--r--r--    1 root     system          243 Apr 20 21:27 README.diskdefines
    -r--r--r--    1 root     system         3053 May 25 22:25 TRANS.TBL
    dr-xr-xr-x    3 root     system          256 Apr 20 11:59 boot
    dr-xr-xr-x    3 root     system          256 Apr 20 21:27 dists
    dr-xr-xr-x    3 root     system          256 Apr 20 21:27 doc
    dr-xr-xr-x    2 root     system         4096 Aug 09 15:59 install
    -r--r--r--    1 root     system       145981 Apr 20 21:34 md5sum.txt
    dr-xr-xr-x    2 root     system         4096 Apr 20 21:27 pics
    dr-xr-xr-x    3 root     system          256 Apr 20 21:27 pool
    dr-xr-xr-x    3 root     system          256 Apr 20 11:59 ppc
    dr-xr-xr-x    2 root     system          256 Apr 20 21:27 preseed
    dr-xr-xr-x    4 root     system          256 May 25 22:25 pvm
    lrwxrwxrwx    1 root     system            1 Aug 29 14:55 ubuntu -> .
    dr-xr-xr-x    3 root     system          256 May 25 22:25 vios
    

    Prepare the PowerVM NovaLink repository. The content of the repository can be found in the NovaLink iso image in pvm/repo/pvmrepo.tgz:

    # ls -l /export/nim/lpp_source/powervc/novalink/1.0.0.3/iso/pvm/repo/
    total 720192
    -r--r--r--    1 root     system          223 May 25 22:25 TRANS.TBL
    -rw-r--r--    1 root     system         2106 Sep 05 15:56 pvm-install.cfg
    -r--r--r--    1 root     system    368722592 May 25 22:25 pvmrepo.tgz
    

    Extract the content of this tgz file in a directory that can be served by the http server:

    # mkdir /export/nim/lpp_source/powervc/novalink/1.0.0.3/pvmrepo
    # cp /export/nim/lpp_source/powervc/novalink/1.0.0.3/iso/pvm/repo/pvmrepo.tgz
    # cd /export/nim/lpp_source/powervc/novalink/1.0.0.3/pvmrepo
    # gunzip pvmrepo.tgz
    # tar xvf pvmrepo.tar
    [..]
    x ./pool/non-free/p/pvm-core/pvm-core-dbg_1.0.0.3-160525-2192_ppc64el.deb, 54686380 bytes, 106810 media blocks.
    x ./pool/non-free/p/pvm-core/pvm-core_1.0.0.3-160525-2192_ppc64el.deb, 2244784 bytes, 4385 media blocks.
    x ./pool/non-free/p/pvm-core/pvm-core-dev_1.0.0.3-160525-2192_ppc64el.deb, 618378 bytes, 1208 media blocks.
    x ./pool/non-free/p/pvm-pkg-tools/pvm-pkg-tools_1.0.0.3-160525-492_ppc64el.deb, 170700 bytes, 334 media blocks.
    x ./pool/non-free/p/pvm-rest-server/pvm-rest-server_1.0.0.3-160524-2229_ppc64el.deb, 263084432 bytes, 513837 media blocks.
    # rm pvmrepo.tar 
    # ls -l 
    total 16
    drwxr-xr-x    2 root     system          256 Sep 11 13:26 conf
    drwxr-xr-x    2 root     system          256 Sep 11 13:26 db
    -rw-r--r--    1 root     system          203 May 26 02:19 distributions
    drwxr-xr-x    3 root     system          256 Sep 11 13:26 dists
    -rw-r--r--    1 root     system         3132 May 24 20:25 novalink-gpg-pub.key
    drwxr-xr-x    4 root     system          256 Sep 11 13:26 pool
    

    Copy the NovaLink boot files in a directory that can be served by your tftp server (I’m using /var/lib/tftpboot):

    # mkdir /var/lib/tftpboot
    # cp -r /export/nim/lpp_source/powervc/novalink/1.0.0.3/iso/pvm /var/lib/tftpboot
    # ls -l /var/lib/tftpboot
    total 1016
    -r--r--r--    1 root     system         1120 Jul 26 20:53 TRANS.TBL
    -r--r--r--    1 root     system       494072 Jul 26 20:53 core.elf
    -r--r--r--    1 root     system          856 Jul 26 21:18 grub.cfg
    -r--r--r--    1 root     system        12147 Jul 26 20:53 pvm-install-config.template
    dr-xr-xr-x    2 root     system          256 Jul 26 20:53 repo
    dr-xr-xr-x    2 root     system          256 Jul 26 20:53 rootfs
    -r--r--r--    1 root     system         2040 Jul 26 20:53 sample_grub.cfg
    

    I still don’t know why this is the case on AIX but the tftp server is searching for the grub.cfg in the root directory of your AIX system. It’s not the case for my RedHat Enterprise Linux installation but it’s the case for the NovaLink/Ubuntu installation. Copy the sample-grub.cfg to /grub.cfg and modify the content of the file:

    • As the gateway, netmask and nameserver will be provided the the pvm-install-config.cfg (the configuration file of the Novalink installer we will talk about this later) file comment those three lines.
    • The hostname will still be needed.
    • Modify the linux line and point to the vmlinux file provided in the NovaLink iso image.
    • Modify the live-installer to point to the filesystem.squashfs provided in the NovaLink iso image.
    • Modify the pvm-repo line to point to the pvm-repository directory we created before.
    • Modify the pvm-installer line to point to the NovaLink install configuration file (we will modify this one after).
    • Don’t do anything with the pvm-vios line as we are installing NovaLink on a system already having Virtual I/O Servers installed (I’m not installing Scale Out system but high end models only).
    • I’ll talk later about the pvm-disk line (this line is not by default in the pvm-install-config.template provided in the NovaLink iso image).
    # cp /var/lib/tftpboot/sample_grub.cfg /grub.cfg
    # cat /grub.cfg
    # Sample GRUB configuration for NovaLink network installation
    set default=0
    set timeout=10
    
    menuentry 'PowerVM NovaLink Install/Repair' {
     insmod http
     insmod tftp
     regexp -s 1:mac_pos1 -s 2:mac_pos2 -s 3:mac_pos3 -s 4:mac_pos4 -s 5:mac_pos5 -s 6:mac_pos6 '(..):(..):(..):(..):(..):(..)' ${net_default_mac}
     set bootif=01-${mac_pos1}-${mac_pos2}-${mac_pos3}-${mac_pos4}-${mac_pos5}-${mac_pos6}
     regexp -s 1:prefix '(.*)\.(\.*)' ${net_default_ip}
    # Setup variables with values from Grub's default variables
     set ip=${net_default_ip}
     set serveraddress=${net_default_server}
     set domain=${net_ofnet_network_domain}
    # If tftp is desired, replace http with tftp in the line below
     set root=http,${serveraddress}
    # Remove comment after providing the values below for
    # GATEWAY_ADDRESS, NETWORK_MASK, NAME_SERVER_IP_ADDRESS
    # set gateway=10.10.10.1
    # set netmask=255.255.255.0
    # set namserver=10.20.2.22
      set hostname=nova0696010
    # In this sample file, the directory novalink is assumed to exist on the
    # BOOTP server and has the NovaLink ISO content
     linux /export/nim/lpp_source/powervc/novalink/1.0.0.3/iso/install/vmlinux \
     live-installer/net-image=http://${serveraddress}/export/nim/lpp_source/powervc/novalink/1.0.0.3/iso/install/filesystem.squashfs \
     pkgsel/language-pack-patterns= \
     pkgsel/install-language-support=false \
     netcfg/disable_dhcp=true \
     netcfg/choose_interface=auto \
     netcfg/get_ipaddress=${ip} \
     netcfg/get_netmask=${netmask} \
     netcfg/get_gateway=${gateway} \
     netcfg/get_nameservers=${nameserver} \
     netcfg/get_hostname=${hostname} \
     netcfg/get_domain=${domain} \
     debian-installer/locale=en_US.UTF-8 \
     debian-installer/country=US \
    # The directory novalink-repo on the BOOTP server contains the content
    # of the pvmrepo.tgz file obtained from the pvm/repo directory on the
    # NovaLink ISO file.
    # The directory novalink-vios on the BOOTP server contains the files
    # needed to perform a NIM install of VIOS server(s)
    #  pvmdebug=1
     pvm-repo=http://${serveraddress}/export/nim/lpp_source/powervc/novalink/1.0.0.3/novalink-repo/ \
     pvm-installer-config=http://${serveraddress}/export/nim/lpp_source/powervc/novalink/1.0.0.3/mnt/pvm/repo/pvm-install.cfg \
     pvm-viosdir=http://${serveraddress}/novalink-vios \
     pvmdisk=/dev/mapper/mpatha \
     initrd /export/nim/lpp_source/powervc/novalink/1.0.0.3/mnt/install/netboot_initrd.gz
    }
    

    Modify the pvm-install.cfg, it’s the NovaLink installer configuration file. We just need to modify here the [SystemConfig],[NovaLinkGeneralSettings],[NovaLinkNetworkSettings],[NovaLinkAPTRepoConfig] and [NovaLinkAdminCredential]. My advice is to configure one NovaLink by hand (by doing an installation directly with the iso image, then after the installation your configuration file is saved in /var/log/pvm-install/novalink-install.cfg. You can copy this one as your template on your installation server. This file is filled by the answers you gave during the NovaLink installation)

    # more /export/nim/lpp_source/powervc/novalink/1.0.0.3/mnt/pvm/repo/pvm-install.cfg
    [SystemConfig]
    serialnumber = XXXXXXXX
    lmbsize = 256
    
    [NovaLinkGeneralSettings]
    ntpenabled = True
    ntpserver = timeserver1
    timezone = Europe/Paris
    
    [NovaLinkNetworkSettings]
    dhcpip = DISABLED
    ipaddress = YYYYYYYY
    gateway = ZZZZZZZZ
    netmask = 255.255.255.0
    dns1 = 8.8.8.8
    dns2 = 8.8.9.9
    hostname = WWWWWWWW
    domain = lab.chmod666.org
    
    [NovaLinkAPTRepoConfig]
    downloadprotocol = http
    mirrorhostname = nimserver
    mirrordirectory = /export/nim/lpp_source/powervc/novalink/1.0.0.3/mnt/
    mirrorproxy =
    
    [VIOSNIMServerConfig]
    novalink_private_ip = 192.168.128.1
    vios1_private_ip = 192.168.128.2
    vios2_private_ip = 192.168.128.3
    novalink_netmask = 255.255.128.0
    viosinstallprompt = False
    
    [NovaLinkAdminCredentials]
    username = padmin
    password = $6$N1hP6cJ32p17VMpQ$sdThvaGaR8Rj12SRtJsTSRyEUEhwPaVtCTvbdocW8cRzSQDglSbpS.jgKJpmz9L5SAv8qptgzUrHDCz5ureCS.
    userdescription = NovaLink System Administrator
    

    Finally modify the /etc/bootptab file and add a line matching your installation:

    # tail -1 /etc/bootptab
    nova0696010:bf=/var/lib/tftpboot/core.elf:ip=10.20.65.16:ht=ethernet:sa=10.255.228.37:gw=10.20.65.1:sm=255.255.255.0:
    

    Don’t forget to setup an http server, serving all the needed files. I know this configuration is super unsecured. But honestly I don’t care my NIM server is in a super secured network just accessible by the VIOS and NovaLink partition. So I’m good :-) :

    # cd /opt/freeware/etc/httpd/ 
    # grep -Ei "^Listen|^DocumentRoot" conf/httpd.conf
    Listen 80
    DocumentRoot "/"
    

    novaserved

    Instead of doing this over and over and over at every NovaLink installation I have written a custom script preparing my NovaLink installation file, what I do in this script is:

    • Preparing the pvm-install.cfg file.
    • Modifying the grub.cfg file.
    • Adding a line to the /etc/bootptab file.
    #  ./custnovainstall.ksh nova0696010 10.20.65.16 10.20.65.1 255.255.255.0
    #!/usr/bin/ksh
    
    novalinkname=$1
    novalinkip=$2
    novalinkgw=$3
    novalinknm=$4
    cfgfile=/export/nim/lpp_source/powervc/novalink/novalink-install.cfg
    desfile=/export/nim/lpp_source/powervc/novalink/1.0.0.3/mnt/pvm/repo/pvm-install.cfg
    grubcfg=/export/nim/lpp_source/powervc/novalink/grub.cfg
    grubdes=/grub.cfg
    
    echo "+--------------------------------------+"
    echo "NovaLink name: ${novalinkname}"
    echo "NovaLink IP: ${novalinkip}"
    echo "NovaLink GW: ${novalinkgw}"
    echo "NovaLink NM: ${novalinknm}"
    echo "+--------------------------------------+"
    echo "Cfg ref: ${cfgfile}"
    echo "Cfg file: ${cfgfile}.${novalinkname}"
    echo "+--------------------------------------+"
    
    typeset -u serialnumber
    serialnumber=$(echo ${novalinkname} | sed 's/nova//g')
    
    echo "SerNum: ${serialnumber}"
    
    cat ${cfgfile} | sed "s/serialnumber = XXXXXXXX/serialnumber = ${serialnumber}/g" | sed "s/ipaddress = YYYYYYYY/ipaddress = ${novalinkip}/g" | sed "s/gateway = ZZZZZZZZ/gateway = ${novalinkgw}
    /g" | sed "s/netmask = 255.255.255.0/netmask = ${novalinknm}/g" | sed "s/hostname = WWWWWWWW/hostname = ${novalinkname}/g" > ${cfgfile}.${novalinkname}
    cp ${cfgfile}.${novalinkname} ${desfile}
    cat ${grubcfg} | sed "s/  set hostname=WWWWWWWW/  set hostname=${novalinkname}/g" > ${grubcfg}.${novalinkname}
    cp ${grubcfg}.${novalinkname} ${grubdes}
    # nova1009425:bf=/var/lib/tftpboot/core.elf:ip=10.20.65.15:ht=ethernet:sa=10.255.248.37:gw=10.20.65.1:sm=255.255.255.0:
    echo "${novalinkname}:bf=/var/lib/tftpboot/core.elf:ip=${novalinkip}:ht=ethernet:sa=10.255.248.37:gw=${novalinkgw}:sm=${novalinknm}:" >> /etc/bootptab
    

    Novalink installation: vSCSI or NPIV ?

    NovaLink is not designed to be installed of top of NPIV it’s a fact. As it is designed to be installed on a totally new system without any Virtual I/O Servers configured the NovaLink installation is by default creating the Virtual I/O Servers and using these VIOS the installation process is creating backing devices on top of logical volumes created in the default VIOS storage pool. Then the Novalink installation partition is created on top of these two logical volumes and at the end mirrored. This is the way NovaLink is doing for Scale Out systems.

    For High End systems NovaLink is assuming your going to install the NovaLink partition on top of vSCSI (have personnaly tried with hdisk backed and SSP Logical Unit backed and both are working ok). For those like me who wants to install NovaLink on top of NPIV (I know this is not a good choice, but once again I was forced to do that) there still is a possiblity to do it. (In my humble opinion the NPIV design is done for high performance and the Novalink partition is not going to be an I/O intensive partition. Even worse our whole new design is based on NPIV for LPARs …. it’s a shame as NPIV is not a solution designed for high denstity and high scalability. Every PowerVM system administrator should remember this. NPIV IS NOT A GOOD CHOICE FOR DENSITY AND SCALABILITY USE IT FOR PERFORMANCE ONLY !!!. The story behind this is funny. I’m 100% sure that SSP is ten time a better choice to achieve density and scalability. I decided to open a poll on twitter asking this question “Will you choose SSP or NPIV to design a scalable AIX cloud based on PowerVC ?”. I was 100% sure SSP will win and made a bet with friend (I owe him beers now) that I’ll be right. What was my surprise when seeing the results. 90% of people vote for NPIV. I’m sorry to say that guys but there are two possibilities: 1/ You don’t really know what scalability and density means because you never faced it so that’s why you made the wrong choice. 2/ You know it and you’re just wrong :-) . This little story is another proof telling that IBM is not responsible about the dying of AIX and PowerVM … but unfortunately you are responsible of it not understanding that the only way to survive is to face high scalable solution like Linux is doing with Openstack and Ceph. It’s a fact. Period.)

    This said … if you are trying to install NovaLink on top of NPIV you’ll get an error. A workaround to this problem is to add the following line to the grub.cfg file

     pvmdisk=/dev/mapper/mpatha \
    

    If you do that you’ll be able to install NovaLink on your NPIV disk but still have an error the first time you’ll install it at the “grub-install step”. Just re-run the installation a second time and the grub-install command will work ok :-) (I’ll explain how to do to avoid this second issue later).

    One work-around to this second issue is to recreate the initrd by adding a line in the debian-installer config file.

    Fully automated installation by example

    • Here the core.elf file is downloaded by tftp. You can se in the capture below that the grub.cfg file is searched in / :
    • 1m
      13m

    • The installer is starting:
    • 2

    • The vmlinux is downloaded (http):
    • 3

    • The root.squashfs is downloaded (http):
    • 4m

    • The pvm-install.cfg configuration file is downloaded (http):
    • 5

    • pvm services are started. At this time if you are running in co-management mode you’ll see the Red lock in the HMC Server status:
    • 6

    • The Linux and Novalink istallation is ongoing:
    • 7
      8
      9
      10
      11
      12

    • System is ready:
    • 14

    Novalink code auto update

    When adding a NovaLink host to PowerVC the powervc packages coming from the powervc management host will be installed on the NovaLink partition. You can check this during the installation. Here is what’s going on when adding the NovaLink host to PowerVC:

    15
    16

    # cat /opt/ibm/powervc/log/powervc_install_2016-09-11-164205.log
    ################################################################################
    Starting the IBM PowerVC Novalink Installation on:
    2016-09-11T16:42:05+02:00
    ################################################################################
    
    LOG file is /opt/ibm/powervc/log/powervc_install_2016-09-11-164205.log
    
    2016-09-11T16:42:05.18+02:00 Installation directory is /opt/ibm/powervc
    2016-09-11T16:42:05.18+02:00 Installation source location is /tmp/powervc_img_temp_1473611916_1627713/powervc-1.3.1.2
    [..]
    Setting up python-neutron (10:8.0.0-201608161728.ibm.ubuntu1.375) ...
    Setting up neutron-common (10:8.0.0-201608161728.ibm.ubuntu1.375) ...
    Setting up neutron-plugin-ml2 (10:8.0.0-201608161728.ibm.ubuntu1.375) ...
    Setting up ibmpowervc-powervm-network (1.3.1.2) ...
    Setting up ibmpowervc-powervm-oslo (1.3.1.2) ...
    Setting up ibmpowervc-powervm-ras (1.3.1.2) ...
    Setting up ibmpowervc-powervm (1.3.1.2) ...
    W: --force-yes is deprecated, use one of the options starting with --allow instead.
    
    ***************************************************************************
    IBM PowerVC Novalink installation
     successfully completed at 2016-09-11T17:02:30+02:00.
     Refer to
     /opt/ibm/powervc/log/powervc_install_2016-09-11-165617.log
     for more details.
    ***************************************************************************
    

    17

    Installing the missing deb packages if NovaLink host was added before PowerVC upgrade

    If the NovaLink host was added in PowerVC 1.3.1.1 and you updated to PowerVC 1.3.1.2 you have to update the package by hand because there is a little bug during the update of some packages:

    • From the PowerVC management host copy the latest packages to the NovaLink host:
    • # scp /opt/ibm/powervc/images/powervm/powervc-powervm-compute-1.3.1.2.tgz padmin@nova0696010:~
      padmin@nova0696010's password:
      powervc-powervm-compute-1.3.1.2.tgz
      
    • Update the packages on the NovaLink host
    • # tar xvzf powervc-powervm-compute-1.3.1.2.tgz
      # cd powervc-1.3.1.2/packages/powervm
      # dpkg -i nova-powervm_2.0.3-160816-48_all.deb
      # dpkg -i networking-powervm_2.0.1-160816-6_all.deb
      # dpkg -i ceilometer-powervm_2.0.1-160816-17_all.deb
      # /opt/ibm/powervc/bin/powervc-services restart
      

    rsct and pvm deb update

    Never forget to install latest rsct and pvm packages after the installation. You can clone the official IBM repository for pvm and rsct files (you can check my previous post about Novalink for more details about cloning the repository). Then create two files in /etc/apt/sources.list.d one for pvm, the other for rsct

    # vi /etc/apt/sources.list.d/pvm.list
    deb http://nimserver/export/nim/lpp_source/powervc/novalink/nova/debian novalink_1.0.0 non-free
    # vi /etc/apt/source.list.d/rsct.list
    deb http://nimserver/export/nim/lpp_source/powervc/novalink/rsct/ubuntu xenial main
    # dpkg -l | grep -i rsct
    ii  rsct.basic                                3.2.1.0-15300                           ppc64el      Reliable Scalable Cluster Technology - Basic
    ii  rsct.core                                 3.2.1.3-16106-1ubuntu1                  ppc64el      Reliable Scalable Cluster Technology - Core
    ii  rsct.core.utils                           3.2.1.3-16106-1ubuntu1                  ppc64el      Reliable Scalable Cluster Technology - Utilities
    # # dpkg -l | grep -i pvm
    ii  pvm-cli                                   1.0.0.3-160516-1488                     all          Power VM Command Line Interface
    ii  pvm-core                                  1.0.0.3-160525-2192                     ppc64el      PVM core runtime package
    ii  pvm-novalink                              1.0.0.3-160525-1000                     ppc64el      Meta package for all PowerVM Novalink packages
    ii  pvm-rest-app                              1.0.0.3-160524-2229                     ppc64el      The PowerVM NovaLink REST API Application
    ii  pvm-rest-server                           1.0.0.3-160524-2229                     ppc64el      Holds the basic installation of the REST WebServer (Websphere Liberty Profile) for PowerVM NovaLink 
    # apt-get install rsct.core rsct.basic
    Reading package lists... Done
    Building dependency tree
    Reading state information... Done
    The following packages were automatically installed and are no longer required:
      docutils-common libpaper-utils libpaper1 python-docutils python-roman
    Use 'apt autoremove' to remove them.
    The following additional packages will be installed:
      rsct.core.utils src
    The following packages will be upgraded:
      rsct.core rsct.core.utils src
    3 upgraded, 0 newly installed, 0 to remove and 6 not upgraded.
    Need to get 9,356 kB of archives.
    After this operation, 548 kB disk space will be freed.
    [..]
    # apt-get install pvm-novalink
    Reading package lists... Done
    Building dependency tree
    Reading state information... Done
    The following packages were automatically installed and are no longer required:
      docutils-common libpaper-utils libpaper1 python-docutils python-roman
    Use 'apt autoremove' to remove them.
    The following additional packages will be installed:
      pvm-core pvm-rest-app pvm-rest-server pypowervm
    The following packages will be upgraded:
      pvm-core pvm-novalink pvm-rest-app pvm-rest-server pypowervm
    5 upgraded, 0 newly installed, 0 to remove and 1 not upgraded.
    Need to get 287 MB of archives.
    After this operation, 203 kB of additional disk space will be used.
    Do you want to continue? [Y/n] Y
    [..]
    

    After the installation, here is what you should have if everything was updated properly:

    dpkg -l | grep rsct
    ii  rsct.basic                                3.2.1.4-16154-1ubuntu1                  ppc64el      Reliable Scalable Cluster Technology - Basic
    ii  rsct.core                                 3.2.1.4-16154-1ubuntu1                  ppc64el      Reliable Scalable Cluster Technology - Core
    ii  rsct.core.utils                           3.2.1.4-16154-1ubuntu1                  ppc64el      Reliable Scalable Cluster Technology - Utilities
    dpkg -l | grep pvm
    ii  pvm-cli                                   1.0.0.3-160516-1488                     all          Power VM Command Line Interface
    ii  pvm-core                                  1.0.0.3.1-160713-2441                   ppc64el      PVM core runtime package
    ii  pvm-novalink                              1.0.0.3.1-160714-1152                   ppc64el      Meta package for all PowerVM Novalink packages
    ii  pvm-rest-app                              1.0.0.3.1-160713-2417                   ppc64el      The PowerVM NovaLink REST API Application
    ii  pvm-rest-server                           1.0.0.3.1-160713-2417                   ppc64el      Holds the basic installation of the REST WebServer (Websphere Liberty Profile) for PowerVM NovaLink
    

    Novalink post-installation (my ansible way to do that)

    You all now know that I’m not very fond of doing the same things over and over again, that’s why I have create an ansible post-install playbook especially for NovaLink post installation. You can download it here: nova_ansible. Then install ansible on a host that has an ssh access to all your NovaLink partitions and run the the ansible playbook:

    • Untar the ansible playbook:
    • # mkdir /srv/ansible
      # cd /srv/ansible
      # tar xvf novalink_ansible.tar 
      
    • Modify the group_vars/novalink.yml to fit your environment:
    • # cat group_vars/novalink.yml
      ntpservers:
        - ntpserver1
        - ntpserver2
      dnsservers:
        - 8.8.8.8
        - 8.8.9.9
      dnssearch:
        - lab.chmod666.org
      vepa_iface: ibmveth6
      repo: nimserver
      
    • Share root ssh key to the NovaLink host (be careful by default NovaLink does not allow root login you have to modify the sshd configuration file):
    • Put all your Novalink hosts into the inventory file:
    • #cat inventories/hosts.novalink
      [novalink]
      nova65a0cab
      nova65ff4cd
      nova10094ef
      nova06960ab
      
    • Run ansible-playbook and you’re done:
    • # ansible-playbook -i inventories/hosts.novalink site.yml
      

      ansible1
      ansible2
      ansible3

    More details about NovaLink

    MGMTSWITCH vswitch automatic creation

    Do not try to create the MGMTSWITCH by yourself. The NovaLink installer is doing it for you. As my Virtual I/O Servers are installed using the IBM Provisioning Toolkit for PowerVM … I was creating the MGMTSWITCH at this time but I was wrong. You can see this in the file /var/log/pvm-install/pvminstall.log on the NovaLink partition:

    # cat /var/log/pvm-install/pvminstall.log
    Fri Aug 12 17:26:07 UTC 2016: PVMDebug = 0
    Fri Aug 12 17:26:07 UTC 2016: Running initEnv
    [..]
    Fri Aug 12 17:27:08 UTC 2016: Using user provided pvm-install configuration file
    Fri Aug 12 17:27:08 UTC 2016: Auto Install set
    [..]
    Fri Aug 12 17:27:44 UTC 2016: Auto Install = 1
    Fri Aug 12 17:27:44 UTC 2016: Validating configuration file
    Fri Aug 12 17:27:44 UTC 2016: Initializing private network configuration
    Fri Aug 12 17:27:45 UTC 2016: Running /opt/ibm/pvm-install/bin/switchnetworkcfg -o c
    Fri Aug 12 17:27:46 UTC 2016: Running /opt/ibm/pvm-install/bin/switchnetworkcfg -o n -i 3 -n MGMTSWITCH -p 4094 -t 1
    Fri Aug 12 17:27:49 UTC 2016: Start setupinstalldisk operation for /dev/mapper/mpatha
    Fri Aug 12 17:27:49 UTC 2016: Running updatedebconf
    Fri Aug 12 17:56:06 UTC 2016: Pre-seeding disk recipe
    

    NPIV lpar creation problem !

    As you know my environment is crazy. Every lpar we are creating have 4 virtual fibre channels adapters. Obviously two on fabric A and two on fabric B. And obviously again each fabric must be present on each Virtual I/O Servers. So to sum up. An lpar must have access to fabric A and B using VIOS1 and to fabric A and B using VIOS2. Unfortunately there was a little bug in the current NovaLink (1.0.0.3) code and all the lpar created were created with only two adapters. The PowerVC team gave my a patch to handle this particular issue patching the npiv.py file. This patch needs to be installed on the NovaLink partition itself.:

    # cd /usr/lib/python2.7/dist-packages/powervc_nova/virt/ibmpowervm/pvm/volume
    # sdiff npiv.py.back npiv.bck
    

    npivpb

    I’m intentionally not giving you the solution here (just by copying/pasting code) because an issue is addressed and an APAR has been opened for this issue and is resolved in 1.3.1.2 version. IT16534

    From NovaLink to HMC …. and the opposite

    One of the challenge for me was to be sure everything was working ok regarding LPM and NovaLink. So I decided to test different cases:

    • From NovaLink host to Novalink host (didn’t had any trouble) :-)
    • From NovaLink host to HMC host (didn’t had any trouble) :-)
    • From HMC host to Novalink host (had a trouble) :-(

    Once again this issue avoiding HMC to Novalink LPM to work correctly is related to storage. A patch is ongoing but let me explain this issue a little bit (only if you have to absolutely move an LPAR from HMC to NovaLink and your are in the same case as I am):

    PowerVC is not correctly doing the mapping to the destination Virtual I/O Servers and is trying to map two times the fabric A on the VIOS1 and two time the fabric B on the VIOS2. Hopefully for us you can do the migration by hand :

    • Do the LPM operation from PowerVC and check on the HMC side how PowerVC is doing the mapping (log on the HMC to check this):
    • #  lssvcevents -t console -d 0 | grep powervc_admin | grep migrlpar
      time=08/31/2016 18:53:27,"text=HSCE2124 User name powervc_admin: migrlpar -m 9119-MME-656C38A -t 9119-MME-65A0C31 --id 18 --ip 10.22.33.198 -u wlp -i ""virtual_fc_mappings=6/vios1/2//fcs2,3/vios2/1//fcs2,4/vios2/1//fcs1,5/vios1/2//fcs1"",shared_proc_pool_id=0 -o m command failed."
      
    • One interesting point you can see here is that the NovaLink user used for LPM is not padmin but wlp. Have look on the Novalink machine if you are a little bit curious:
    • 18

    • If you are double checking the mapping you’ll see that PowerVC is mixing up the VIOS. Just rerun the command in the right order and you’ll see that you’re going to be able to do HMC to NovaLink LPM (By the way PowerVC is automattically detecting that the host has changed for this lpar (moved outside of PowerVC)):
    • # migrlpar -m 9119-MME-656C38A -t 9119-MME-65A0C31 --id 18 --ip 10.22.33.198 -u wlp -i '"virtual_fc_mappings=6/vios2/1//fcs2,3/vios1/2//fcs2,4/vios2/1//fcs1,5/vios1/2//fcs1"',shared_proc_pool_id=0 -o m
      # lssvcevents -t console -d 0 | grep powervc_admin | grep migrlpar
      time=08/31/2016 19:13:00,"text=HSCE2123 User name powervc_admin: migrlpar -m 9119-MME-656C38A -t 9119-MME-65A0C31 --id 18 --ip 10.22.33.198 -u wlp -i ""virtual_fc_mappings=6/vios2/1//fcs2,3/vios1/2//fcs2,4/vios2/1//fcs1,5/vios1/2//fcs1"",shared_proc_pool_id=0 -o m command was executed successfully."
      
    hmctonova

    One more time don't worry about this issue a patch is on the way. But I thought it was interessting to talk about it just to show you how PowerVC is handling this (user, key sharing, check on the HMC).

    Deep dive into the initrd

    I am curious and there is no way to change this. As I wanted to know how the NovaLink installer is working I had to check into the netboot_initrd.gz file. There are a lot of interesting stuff to check in this initrd. Run the commands below on a Linux partition if you also want to have a look:

    # scp nimdy:/export/nim/lpp_source/powervc/novalink/1.0.0.3/iso/install/netboot_initrd.gz .
    # gunzip netboot_initrd
    # cpio -i < netboot_initrd
    185892 blocks
    

    The installer is located in opt/ibm/pvm-install:

    # ls opt/ibm/pvm-install/data/
    40mirror.pvm  debpkgs.txt  license.txt  nimclient.info  pvm-install-config.template  pvm-install-preseed.cfg  rsct-gpg-pub.key  vios_diagram.txt
    # ls opt/ibm/pvm-install/bin
    assignio.py        envsetup        installpvm                    monitor        postProcessing    pvmwizardmain.py  restore.py        switchnetworkcfg  vios
    cfgviosnetwork.py  functions       installPVMPartitionWizard.py  network        procmem           recovery          setupinstalldisk  updatedebconf     vioscfg
    chviospasswd       getnetworkinfo  ioadapter                     networkbridge  pvmconfigdata.py  removemem         setupviosinstall  updatenimsetup    welcome.py
    editpvmconfig      initEnv         mirror                        nimscript      pvmtime           resetsystem       summary.py        user              wizpkg
    

    You can for instance check what's the installer is exactly doing. Let's take again the exemple of the MGMTSWITCH creation, you can see in the output below that I was right saying that:

    initrd1

    Remember that I was telling you before that I had problem with installation on NPIV. You can avoid installing NovaLink two times by modifying the debian installer directly in the initrd by adding a line in the debian installer file opt/ibm/pvm-install/data/pvm-install-preseed.cfg (you have to rebuild the initrd after doing this) :

    # grep bootdev opt/ibm/pvm-install/data/pvm-install-preseed.cfg
    d-i grub-installer/bootdev string /dev/mapper/mpatha
    # find | cpio -H newc -o > ../new_initrd_file
    # gzip -9 ../new_initrd_file
    # scp ../new_initrdfile.gz nimdy:/export/nim/lpp_source/powervc/novalink/1.0.0.3/iso/install/netboot_initrd.gz
    

    You can also find good example here of pvmctl commands:

    # grep -R pvmctl *
    pvmctl lv create --size $LV_SIZE --name $LV_NAME -p id=$vid
    pvmctl scsi create --type lv --vg name=rootvg --lpar id=1 -p id=$vid --stor-id name=$LV_NAME
    

    Troubleshooting

    NovaLink is not PowerVC so here is a little reminder of what I do to troubleshot Novalink:

    • Installation troubleshooting:
    • #cat /var/log/pvm-install/pvminstall.log
      
    • Neutron Agent log (always double check this one):
    • # cat /var/log/neutron/neutron-powervc-pvm-sea-agent.log
      
    • Nova logs for this host are not accessible on the PowerVC management host anymore, so check it on the NovaLink partition if needed:
    • # cat /var/log/nova/nova-compute.log
      
    • pvmctl logs:
    • # cat /var/log/pvm/pvmctl.log
      

    One last thing to add about NovaLink. One thing I like a lot is that Novalink is doing backups of the system and VIOS hourly/daily. These backup are stored in /var/backup/pvm :

    # crontab -l
    # VIOS hourly backups - at 15 past every hour except for midnight
    15 1-23 * * * /usr/sbin/pvm-backup --type vios --frequency hourly
    # Hypervisor hourly backups - at 15 past every hour except for midnight
    15 1-23 * * * /usr/sbin/pvm-backup --type system --frequency hourly
    # VIOS daily backups - at 15 past midnight
    15 0    * * * /usr/sbin/pvm-backup --type vios --frequency daily
    # Hypervisor daily backups - at 15 past midnight
    15 0    * * * /usr/sbin/pvm-backup --type system --frequency daily
    #ls -l /var/backups/pvm
    total 4
    drwxr-xr-x 2 root pvm_admin 4096 Sep  9 00:15 9119-MME*0265FF47B
    

    More PowerVC tips and tricks

    Let's finish this blog post with more PowerVC tips and tricks. Before giving you the tricks I have to warn you. All of these tricks are not supported by PowerVC, use them at your own risk OR contact your support before doing anything else. You may break and destroy everything if you are not aware of what you are doing. So please be very careful using all these tricks. YOU HAVE BEEN WARNED !!!!!!

    Accessing and querying the database

    This first trick is funny and will allow you to query and modify the PowerVC database. Once again do this a your own risks. One of the issue I had was strange. I do not remeber how it happends exactly but some of my luns that were not attached to any hosts and were still showing an attachmenent number equals to 1 and I didn't had the possibility to remove it. Even worse someone has deleted these luns on the SVC side. So these luns were what I called "ghost lun". Non existing but non-deletable luns. (I had also to remove the storage provider related to these luns). The only way to change this was to change the state to detached directly in the cinder database. Be careful this trick is only working with MariaDB.

    First get the database password. Get the encrypted password from /opt/ibm/powervc/data/powervc-db.conf file and decode it to have the clear password:

    # grep ^db_password /opt/ibm/powervc/data/powervc-db.conf
    db_password = aes-ctr:NjM2ODM5MjM0NTAzMTg4MzQzNzrQZWi+mrUC+HYj9Mxi5fQp1XyCXA==
    # python -c "from powervc_keystone.encrypthandler import EncryptHandler; print EncryptHandler().decode('aes-ctr:NjM2ODM5MjM0NTAzMTg4MzQzNzrQZWi+mrUC+HYj9Mxi5fQp1XyCXA==')"
    OhnhBBS_gvbCcqHVfx2N
    # mysql -u root -p cinder
    Enter password:
    MariaDB [cinder]> MariaDB [cinder]> show tables;
    +----------------------------+
    | Tables_in_cinder           |
    +----------------------------+
    | backups                    |
    | cgsnapshots                |
    | consistencygroups          |
    | driver_initiator_data      |
    | encryption                 |
    [..]
    

    Then get the lun uuid on the PowerVC gui for the lun you want to change, and follow the commands below:

    dummy

    MariaDB [cinder]> select * from volume_attachment where volume_id='9cf6d85a-3edd-4ab7-b797-577ff6566f78' \G
    *************************** 1. row ***************************
       created_at: 2016-05-26 08:52:51
       updated_at: 2016-05-26 08:54:23
       deleted_at: 2016-05-26 08:54:23
          deleted: 1
               id: ce4238b5-ea39-4ce1-9ae7-6e305dd506b1
        volume_id: 9cf6d85a-3edd-4ab7-b797-577ff6566f78
    attached_host: NULL
    instance_uuid: 44c7a72c-610c-4af1-a3ed-9476746841ab
       mountpoint: /dev/sdb
      attach_time: 2016-05-26 08:52:51
      detach_time: 2016-05-26 08:54:23
      attach_mode: rw
    attach_status: attached
    1 row in set (0.01 sec)
    MariaDB [cinder]> select * from volumes where id='9cf6d85a-3edd-4ab7-b797-577ff6566f78' \G
    *************************** 1. row ***************************
                     created_at: 2016-05-26 08:51:57
                     updated_at: 2016-05-26 08:54:23
                     deleted_at: NULL
                        deleted: 0
                             id: 9cf6d85a-3edd-4ab7-b797-577ff6566f78
                         ec2_id: NULL
                        user_id: 0688b01e6439ca32d698d20789d52169126fb41fb1a4ddafcebb97d854e836c9
                     project_id: 1471acf124a0479c8d525aa79b2582d0
                           host: pb01_mn_svc_qual
                           size: 1
              availability_zone: nova
                         status: available
                  attach_status: attached
                   scheduled_at: 2016-05-26 08:51:57
                    launched_at: 2016-05-26 08:51:59
                  terminated_at: NULL
                   display_name: dummy
            display_description: NULL
              provider_location: NULL
                  provider_auth: NULL
                    snapshot_id: NULL
                 volume_type_id: e49e9cc3-efc3-4e7e-bcb9-0291ad28df42
                   source_volid: NULL
                       bootable: 0
              provider_geometry: NULL
                       _name_id: NULL
              encryption_key_id: NULL
               migration_status: NULL
             replication_status: disabled
    replication_extended_status: NULL
        replication_driver_data: NULL
            consistencygroup_id: NULL
                    provider_id: NULL
                    multiattach: 0
                previous_status: NULL
    1 row in set (0.00 sec)
    MariaDB [cinder]> update volume_attachment set attach_status='detached' where volume_id='9cf6d85a-3edd-4ab7-b797-577ff6566f78';
    Query OK, 1 row affected (0.00 sec)
    Rows matched: 1  Changed: 1  Warnings: 0
    MariaDB [cinder]> update volumes set attach_status='detached' where id='9cf6d85a-3edd-4ab7-b797-577ff6566f78';
    Query OK, 1 row affected (0.00 sec)
    Rows matched: 1  Changed: 1  Warnings: 0
    

    The second issue I had was about having some machines in deleted state but the reality was that the HMC just rebooted and for an unknow reason these machines where seen as 'deleted' .. but they were not. Using this trick I was able to force a re-evalutation of each machine is this case:

    #  mysql -u root -p nova
    Enter password:
    MariaDB [nova]> select * from instance_health_status where health_state='WARNING';
    +---------------------+---------------------+------------+---------+--------------------------------------+--------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------+
    | created_at          | updated_at          | deleted_at | deleted | id                                   | health_state | reason                                                                                                                                                                                                                | unknown_reason_details |
    +---------------------+---------------------+------------+---------+--------------------------------------+--------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------+
    | 2016-07-11 08:58:37 | NULL                | NULL       |       0 | 1af1805c-bb59-4bc9-8b6d-adeaeb4250f3 | WARNING      | [{"resource_local": "server", "display_name": "p00ww6754398", "resource_property_key": "rmc_state", "resource_property_value": "initializing", "resource_id": "1af1805c-bb59-4bc9-8b6d-adeaeb4250f3"}]                |                        |
    | 2015-07-31 16:53:50 | 2015-07-31 18:49:50 | NULL       |       0 | 2668e808-10a1-425f-a272-6b052584557d | WARNING      | [{"resource_local": "server", "display_name": "multi-vol", "resource_property_key": "vm_state", "resource_property_value": "deleted", "resource_id": "2668e808-10a1-425f-a272-6b052584557d"}]                         |                        |
    | 2015-08-03 11:22:38 | 2015-08-03 15:47:41 | NULL       |       0 | 2934fb36-5d91-48cd-96de-8c16459c50f3 | WARNING      | [{"resource_local": "server", "display_name": "clouddev-test-754df319-00000038", "resource_property_key": "rmc_state", "resource_property_value": "inactive", "resource_id": "2934fb36-5d91-48cd-96de-8c16459c50f3"}] |                        |
    | 2016-07-11 09:03:59 | NULL                | NULL       |       0 | 3fc42502-856b-46a5-9c36-3d0864d6aa4c | WARNING      | [{"resource_local": "server", "display_name": "p00ww3254401", "resource_property_key": "rmc_state", "resource_property_value": "initializing", "resource_id": "3fc42502-856b-46a5-9c36-3d0864d6aa4c"}]                |                        |
    | 2015-07-08 20:11:48 | 2015-07-08 20:14:09 | NULL       |       0 | 54d02c60-bd0e-4f34-9cb6-9c0a0b366873 | WARNING      | [{"resource_local": "server", "display_name": "p00wb3740870", "resource_property_key": "rmc_state", "resource_property_value": "inactive", "resource_id": "54d02c60-bd0e-4f34-9cb6-9c0a0b366873"}]                    |                        |
    | 2015-07-31 17:44:16 | 2015-07-31 18:49:50 | NULL       |       0 | d5ec2a9c-221b-44c0-8573-d8e3695a8dd7 | WARNING      | [{"resource_local": "server", "display_name": "multi-vol-sp5", "resource_property_key": "vm_state", "resource_property_value": "deleted", "resource_id": "d5ec2a9c-221b-44c0-8573-d8e3695a8dd7"}]                     |                        |
    +---------------------+---------------------+------------+---------+--------------------------------------+--------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------+
    6 rows in set (0.00 sec)
    MariaDB [nova]> update instance_health_status set health_state='PENDING',reason='' where health_state='WARNING';
    Query OK, 6 rows affected (0.00 sec)
    Rows matched: 6  Changed: 6  Warnings: 0
    

    pending

    The ceilometer issue

    When updating from PowerVC 1.3.0.1 to 1.3.1.1 PowerVC is changing the database backend from DB2 to MariaDB. This is a good thing but the way the update is done is by exporting all the data in flat files and then re-inserting it in the MariaDB database records per records. I had a huge problem because of this, just because my ceilodb base was huge because of the number of machines I had and the number of operations we run on PowerVC since it is in production. The DB insert took more than 3 days and never finish. If you don't need the ceilo data my advice is to change the retention from 270 days y default to 2 hours:

    # powervc-config metering event_ttl --set 2 --unit hr 
    # ceilometer-expirer --config-file /etc/ceilometer/ceilometer.conf
    

    If this is not enough an you still experiencing problems regarding the update the best way is to flush the entire table before the update:

    # /opt/ibm/powervc/bin/powervc-services stop
    # /opt/ibm/powervc/bin/powervc-services db2 start
    # /bin/su - pwrvcdb -c "db2 drop database ceilodb2"
    # /bin/su - pwrvcdb -c "db2 CREATE DATABASE ceilodb2 AUTOMATIC STORAGE YES ON /home/pwrvcdb DBPATH ON /home/pwrvcdb USING CODESET UTF-8 TERRITORY US COLLATE USING SYSTEM PAGESIZE 16384 RESTRICTIVE"
    # /bin/su - pwrvcdb -c "db2 connect to ceilodb2 ; db2 grant dbadm on database to user ceilometer"
    # /opt/ibm/powervc/bin/powervc-dbsync ceilometer
    # /bin/su - pwrvcdb -c "db2 connect TO ceilodb2; db2 CALL GET_DBSIZE_INFO '(?, ?, ?, 0)' > /tmp/ceilodb2_db_size.out; db2 terminate" > /dev/null
    

    Multi tenancy ... how to deal with a huge environment

    As my environment is growing bigger and bigger I faced a couple people trying to force me to multiply the number of PowerVC machine we have. As Openstack is a solution designed to handle both density and scalability I said that doing this is just a "non-sense". Seriously people who still believe in this have not understand anything about the cloud, openstack and PowerVC. Hopefully we found a solution acceptable by everybody. As we are created what we are calling "building-block" we had to find a way to isolate one "block" from one another. The solution for host isolation is called mutly tenancy isolation. For the storage side we are just going to play with quotas. By doing this a user will be able to manage a couple of hosts and the associated storage (storage template) without having the right to do anything on the others:

    multitenancyisolation

    Before doing anything create the tenant (or project) and a user associated with it:

    # cat /opt/ibm/powervc/version.properties | grep cloud_enabled
    cloud_enabled = yes
    # ~/powervcrc
    export OS_USERNAME=root
    export OS_PASSWORD=root
    export OS_TENANT_NAME=ibm-default
    export OS_AUTH_URL=https://powervc.lab.chmod666.org:5000/v3/
    export OS_IDENTITY_API_VERSION=3
    export OS_CACERT=/etc/pki/tls/certs/powervc.crt
    export OS_REGION_NAME=RegionOne
    export OS_USER_DOMAIN_NAME=Default
    export OS_PROJECT_DOMAIN_NAME=Default
    export OS_COMPUTE_API_VERSION=2.25
    export OS_NETWORK_API_VERSION=2.0
    export OS_IMAGE_API_VERSION=2
    export OS_VOLUME_API_VERSION=2
    # source powervcrc
    # openstack project create hb01
    +-------------+----------------------------------+
    | Field       | Value                            |
    +-------------+----------------------------------+
    | description |                                  |
    | domain_id   | default                          |
    | enabled     | True                             |
    | id          | 90d064b4abea4339acd32a8b6a8b1fdf |
    | is_domain   | False                            |
    | name        | hb01                             |
    | parent_id   | default                          |
    +-------------+----------------------------------+
    # openstack role list
    +----------------------------------+---------------------+
    | ID                               | Name                |
    +----------------------------------+---------------------+
    | 1a76014f12594214a50c36e6a8e3722c | deployer            |
    | 54616a8b136742098dd81eede8fd5aa8 | vm_manager          |
    | 7bd6de32c14d46f2bd5300530492d4a4 | storage_manager     |
    | 8260b7c3a4c24a38ba6bee8e13ced040 | deployer_restricted |
    | 9b69a55c6b9346e2b317d0806a225621 | image_manager       |
    | bc455ed006154d56ad53cca3a50fa7bd | admin               |
    | c19a43973db148608eb71eb3d86d4735 | service             |
    | cb130e4fa4dc4f41b7bb4f1fdcf79fc2 | self_service        |
    | f1a0c1f9041d4962838ec10671befe33 | vm_user             |
    | f8cf9127468045e891d5867ce8825d30 | viewer              |
    +----------------------------------+---------------------+
    # useradd hb01_admin
    # openstack role add --project hb01 --user hb01_admin admin
    

    Then associate each host group (aggregates in Openstack terms) (you have to put your allowed hosts in an host group to enable this feature) that are allowed for this tenant using filter_tenant_id meta-data. For each allowed host group add this field to the metatadata of the host. (first find the tenant id):

    # openstack project list
    +----------------------------------+-------------+
    | ID                               | Name        |
    +----------------------------------+-------------+
    | 1471acf124a0479c8d525aa79b2582d0 | ibm-default |
    | 90d064b4abea4339acd32a8b6a8b1fdf | hb01        |
    | b79b694c70734a80bc561e84a95b313d | powervm     |
    | c8c42d45ef9e4a97b3b55d7451d72591 | service     |
    | f371d1f29c774f2a97f4043932b94080 | project1    |
    +----------------------------------+-------------+
    # openstack aggregate list
    +----+---------------+-------------------+
    | ID | Name          | Availability Zone |
    +----+---------------+-------------------+
    |  1 | Default Group | None              |
    | 21 | aggregate2    | None              |
    | 41 | hg2           | None              |
    | 43 | hb01_mn       | None              |
    | 44 | hb01_me       | None              |
    +----+---------------+-------------------+
    # nova aggregate-set-metadata hb01_mn filter_tenant_id=90d064b4abea4339acd32a8b6a8b1fdf 
    Metadata has been successfully updated for aggregate 43.
    | Id | Name    | Availability Zone | Hosts             | Metadata                                                                                                                                   
    | 43 | hb01_mn | -                 | '9119MME_1009425' | 'dro_enabled=False', 'filter_tenant_id=90d064b4abea4339acd32a8b6a8b1fdf', 'hapolicy-id=1', 'hapolicy-run_interval=1', 'hapolicy-stabilization=1', 'initialpolicy-id=4', 'runtimepolicy-action=migrate_vm_advise_only', 'runtimepolicy-id=5', 'runtimepolicy-max_parallel=10', 'runtimepolicy-run_interval=5', 'runtimepolicy-stabilization=2', 'runtimepolicy-threshold=70' |
    # nova aggregate-set-metadata hb01_me filter_tenant_id=90d064b4abea4339acd32a8b6a8b1fdf 
    Metadata has been successfully updated for aggregate 44.
    | Id | Name    | Availability Zone | Hosts             | Metadata                                                                                                                                   
    | 44 | hb01_me | -                 | '9119MME_0696010' | 'dro_enabled=False', 'filter_tenant_id=90d064b4abea4339acd32a8b6a8b1fdf', 'hapolicy-id=1', 'hapolicy-run_interval=1', 'hapolicy-stabilization=1', 'initialpolicy-id=2', 'runtimepolicy-action=migrate_vm_advise_only', 'runtimepolicy-id=5', 'runtimepolicy-max_parallel=10', 'runtimepolicy-run_interval=5', 'runtimepolicy-stabilization=2', 'runtimepolicy-threshold=70' |
    

    To make this work add the AggregateMultiTenancyIsolation to the scheduler_default_filter in nova.conf file and restart nova services:

    # grep scheduler_default_filter /etc/nova/nova.conf
    scheduler_default_filters = RamFilter,CoreFilter,ComputeFilter,RetryFilter,AvailabilityZoneFilter,ImagePropertiesFilter,ComputeCapabilitiesFilter,MaintenanceFilter,PowerVCServerGroupAffinityFilter,PowerVCServerGroupAntiAffinityFilter,PowerVCHostAggregateFilter,PowerVMNetworkFilter,PowerVMProcCompatModeFilter,PowerLMBSizeFilter,PowerMigrationLicenseFilter,PowerVMMigrationCountFilter,PowerVMStorageFilter,PowerVMIBMiMobilityFilter,PowerVMRemoteRestartFilter,PowerVMRemoteRestartSameHMCFilter,PowerVMEndianFilter,PowerVMGuestCapableFilter,PowerVMSharedProcPoolFilter,PowerVCResizeSameHostFilter,PowerVCDROFilter,PowerVMActiveMemoryExpansionFilter,PowerVMNovaLinkMobilityFilter,AggregateMultiTenancyIsolation
    # powervc-services restart
    

    We are done regarding the hosts.

    Enabling quotas

    To allow one user/tenant to create volumes only on onz storage provider we first need to enable quotas using the following commands:

    # grep quota /opt/ibm/powervc/policy/cinder/policy.json
        "volume_extension:quotas:show": "",
        "volume_extension:quotas:update": "rule:admin_only",
        "volume_extension:quotas:delete": "rule:admin_only",
        "volume_extension:quota_classes": "rule:admin_only",
        "volume_extension:quota_classes:validate_setup_for_nested_quota_use": "rule:admin_only",
    

    Then put to 0 all the non-allowed storage template for this tenant and let the only one you want to 10000. Easy:

    # cinder --service-type volume type-list
    +--------------------------------------+---------------------------------------------+-------------+-----------+
    |                  ID                  |                     Name                    | Description | Is_Public |
    +--------------------------------------+---------------------------------------------+-------------+-----------+
    | 53434872-a0d2-49ea-9683-15c7940b30e5 |               svc2 base template            |      -      |    True   |
    | e49e9cc3-efc3-4e7e-bcb9-0291ad28df42 |               svc1 base template            |      -      |    True   |
    | f45469d5-df66-44cf-8b60-b226425eee4f |                     svc3                    |      -      |    True   |
    +--------------------------------------+---------------------------------------------+-------------+-----------+
    # cinder --service-type volume quota-update --volumes 0 --volume-type "svc2" 90d064b4abea4339acd32a8b6a8b1fdf
    # cinder --service-type volume quota-update --volumes 0 --volume-type "svc3" 90d064b4abea4339acd32a8b6a8b1fdf
    +-------------------------------------------------------+----------+
    |                        Property                       |  Value   |
    +-------------------------------------------------------+----------+
    |                    backup_gigabytes                   |   1000   |
    |                        backups                        |    10    |
    |                       gigabytes                       | 1000000  |
    |              gigabytes_svc2 base template             | 10000000 |
    |              gigabytes_svc1 base template             | 10000000 |
    |                     gigabytes_svc3                    |    -1    |
    |                  per_volume_gigabytes                 |    -1    |
    |                       snapshots                       |  100000  |
    |             snapshots_svc2 base template              |  100000  |
    |             snapshots_svc1 base template              |  100000  |
    |                     snapshots_svc3                    |    -1    |
    |                        volumes                        |  100000  |
    |            volumes_svc2 base template                 |  100000  |
    |            volumes_svc1 base template                 |    0     |
    |                      volumes_svc3                     |    0     |
    +-------------------------------------------------------+----------+
    # powervc-services stop
    # powervc-services start
    

    By doing this you have enable the isolation between two tenants. Then use the appropriate user to do the appropriate task.

    PowerVC cinder above the Petabyte

    Now that quota are enabled use this command if you want to be able to have more that one petabyte of data managed by PowerVC:

    # cinder --service-type volume quota-class-update --gigabytes -1 default
    # powervc-services stop
    # powervc-services start
    

    PowerVC cinder above 10000 luns

    Change the osapi_max_limit in cinder.conf if you want to go above the 10000 lun limits (check every cinder configuration files; the cinder.conf if for the global number of volumes):

    # grep ^osapi_max_limit cinder.conf
    osapi_max_limit = 15000
    # powervc-services stop
    # powervc-services start
    

    Snapshot and consistncy group

    There is a new cool feature available with the latest version of PowerVC (1.3.1.2). This feature allows you to create snapshots of volume (only on SVC and Storwise for the moment). You now have the possibility to create consistency group (group of volumes) and create snapshots of these consistency groups (allowing for instance to make a backup of a volume group directly from OpenStack. I'm doing the example below using the command line because I think it is easier to understand with these commands rather than showing you the same thing with the rest api):

    First create a consistency group:

    # cinder --service-type volume type-list
    +--------------------------------------+---------------------------------------------+-------------+-----------+
    |                  ID                  |                     Name                    | Description | Is_Public |
    +--------------------------------------+---------------------------------------------+-------------+-----------+
    | 53434872-a0d2-49ea-9683-15c7940b30e5 |              svc2 base template             |      -      |    True   |
    | 862b0a8e-cab4-400c-afeb-99247838f889 |             p8_ssp base template            |      -      |    True   |
    | e49e9cc3-efc3-4e7e-bcb9-0291ad28df42 |               svc1 base template            |      -      |    True   |
    | f45469d5-df66-44cf-8b60-b226425eee4f |                     svc3                    |      -      |    True   |
    +--------------------------------------+---------------------------------------------+-------------+-----------+
    # cinder --service-type volume consisgroup-create --name foovg_cg "svc1 base template"
    +-------------------+-------------------------------------------+
    |      Property     |                   Value                   |
    +-------------------+-------------------------------------------+
    | availability_zone |                    nova                   |
    |     created_at    |         2016-09-11T21:10:58.000000        |
    |    description    |                    None                   |
    |         id        |    950a5193-827b-49ab-9511-41ba120c9ebd   |
    |        name       |                  foovg_cg                 |
    |       status      |                  creating                 |
    |    volume_types   | [u'e49e9cc3-efc3-4e7e-bcb9-0291ad28df42'] |
    +-------------------+-------------------------------------------+
    # cinder --service-type volume consisgroup-list
    +--------------------------------------+-----------+----------+
    |                  ID                  |   Status  |   Name   |
    +--------------------------------------+-----------+----------+
    | 950a5193-827b-49ab-9511-41ba120c9ebd | available | foovg_cg |
    +--------------------------------------+-----------+----------+
    

    Create volume in this consistency group:

    # cinder --service-type volume create --volume-type "svc1 base template" --name foovg_vol1 --consisgroup-id 950a5193-827b-49ab-9511-41ba120c9ebd 200
    # cinder --service-type volume create --volume-type "svc1 base template" --name foovg_vol2 --consisgroup-id 950a5193-827b-49ab-9511-41ba120c9ebd 200
    +------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
    |           Property           |                                                                          Value                                                                           |
    +------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
    |         attachments          |                                                                            []                                                                            |
    |      availability_zone       |                                                                           nova                                                                           |
    |           bootable           |                                                                          false                                                                           |
    |     consistencygroup_id      |                                                           950a5193-827b-49ab-9511-41ba120c9ebd                                                           |
    |          created_at          |                                                                2016-09-11T21:23:02.000000                                                                |
    |         description          |                                                                           None                                                                           |
    |          encrypted           |                                                                          False                                                                           |
    |        health_status         | {u'health_value': u'PENDING', u'id': u'8d078772-00b5-45fc-89c8-82c63e2c48ed', u'value_reason': u'PENDING', u'updated_at': u'2016-09-11T21:23:02.669372'} |
    |              id              |                                                           8d078772-00b5-45fc-89c8-82c63e2c48ed                                                           |
    |           metadata           |                                                                            {}                                                                            |
    |       migration_status       |                                                                           None                                                                           |
    |         multiattach          |                                                                          False                                                                           |
    |             name             |                                                                        foovg_vol2                                                                        |
    |    os-vol-host-attr:host     |                                                                           None                                                                           |
    | os-vol-tenant-attr:tenant_id |                                                             1471acf124a0479c8d525aa79b2582d0                                                             |
    |      replication_status      |                                                                         disabled                                                                         |
    |             size             |                                                                           200                                                                            |
    |         snapshot_id          |                                                                           None                                                                           |
    |         source_volid         |                                                                           None                                                                           |
    |            status            |                                                                         creating                                                                         |
    |          updated_at          |                                                                           None                                                                           |
    |           user_id            |                                             0688b01e6439ca32d698d20789d52169126fb41fb1a4ddafcebb97d854e836c9                                             |
    |         volume_type          |                                                                   svc1 base template                                                                     |
    +------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
    

    You're now able to attach these two volumes to a machine from the PowerVC GUI:

    consist

    # lsmpio -q
    Device           Vendor Id  Product Id       Size    Volume Name
    ------------------------------------------------------------------------------
    hdisk0           IBM        2145                 64G volume-aix72-44c7a72c-000000e0-
    hdisk1           IBM        2145                100G volume-snap1-dab0e2d1-130a
    hdisk2           IBM        2145                100G volume-snap2-5e863fdb-ab8c
    hdisk3           IBM        2145                200G volume-foovg_vol1-3ba0ff59-acd8
    hdisk4           IBM        2145                200G volume-foovg_vol2-8d078772-00b5
    # cfgmr
    # lspv
    hdisk0          00c8b2add70d7db0                    rootvg          active
    hdisk1          00f9c9f51afe960e                    None
    hdisk2          00f9c9f51afe9698                    None
    hdisk3          none                                None
    hdisk4          none                                None
    

    Then you can create a snapshot fo these two volumes. It's that easy :-) :

    # cinder --service-type volume cgsnapshot-create 950a5193-827b-49ab-9511-41ba120c9ebd
    +---------------------+--------------------------------------+
    |       Property      |                Value                 |
    +---------------------+--------------------------------------+
    | consistencygroup_id | 950a5193-827b-49ab-9511-41ba120c9ebd |
    |      created_at     |      2016-09-11T21:31:12.000000      |
    |     description     |                 None                 |
    |          id         | 20e2ce6b-9c4a-4eea-b05d-f0b0b6e4768f |
    |         name        |                 None                 |
    |        status       |               creating               |
    +---------------------+--------------------------------------+
    # cinder --service-type volume cgsnapshot-list
    +--------------------------------------+-----------+------+
    |                  ID                  |   Status  | Name |
    +--------------------------------------+-----------+------+
    | 20e2ce6b-9c4a-4eea-b05d-f0b0b6e4768f | available |  -   |
    +--------------------------------------+-----------+------+
    # cinder --service-type volume cgsnapshot-show 20e2ce6b-9c4a-4eea-b05d-f0b0b6e4768f
    +---------------------+--------------------------------------+
    |       Property      |                Value                 |
    +---------------------+--------------------------------------+
    | consistencygroup_id | 950a5193-827b-49ab-9511-41ba120c9ebd |
    |      created_at     |      2016-09-11T21:31:12.000000      |
    |     description     |                 None                 |
    |          id         | 20e2ce6b-9c4a-4eea-b05d-f0b0b6e4768f |
    |         name        |                 None                 |
    |        status       |              available               |
    +---------------------+--------------------------------------+
    

    cgsnap

    Conclusion

    Please keep in mind that the content of this blog post comes from real life and production examples. I hope you will be able to better understand that scalability, density, fast deployment, snapshots, multi tenancy are some features that are absolutely needed in the AIX world. As you can see the PowerVC team is moving fast. Probably faster than every customer I have ever seen. I must admit they are right. Doing this is the only way the face the Linux X86 offering. And I must confess this is damn fun to work on those things. I'm so happy to have the best of two worlds AIX/PowerSystem and Openstack. This is the only direction we have to take if we want AIX to survive. So please stop being scared or not convinced by these solutions they are damn good, production ready. Please face and embrace the future and stop looking at the past. As always I hope it help.

    What’s new in VIOS 2.2.4.10 and PowerVM : Part 2 Shared Processor Pool weighting

    First of all before beginning this blog post I owe you an explanation about these two months without new posts. These two months were very busy. On the personal side I was forced to move from my current apartment and had to find another one which was suitable for me (and I can assure you that this is not something really easy in Paris). As I was visiting apartments almost 3 days a week the time kept for writing blog posts (please remember that I’m doing that in my “after hours” work) was taken for something else :-(. At work things were crazy too, we had to build twelve new E870 boxes (with the provisioning toolkit and SRIOV adapters) and make them work with our current implementation of PowerVC. Then I had to do a huge vscsi to NPIV migration (more than 500 AIX machines to migrate from vscsi to NPIV and then move to P8 boxes in less than three weeks … yes more than 500 machines in less than 3 weeks (4000 zones created …). Thanks to the help of STG Lab Services consultant (Bonnie LeBarron) this was achieved using a modified version of her script (to fit our need (zoning and mapping part) (latest hmc releases)). I’m back in business now and I have planned a couple of blog posts this month. This first of this series is about the Shared Processor Pool weighting on the latest Power8 firmware versions. You’ll see that it changes a lot of things compared to P7 boxes.

    A short history of Shared Processor Pool weighting

    This long story began a few years ago for me (I’ll say at least 4 years ago) (I was planing to do a blog post about it a few years ago but decided not to do it because I was thinking this topic was considered as “sensible”, now that we have documentation and an official statement on this there is no reason to hide this anymore). I was working for a bank using two P795 with a lot of cores activated. We were using Multiple Shared Processor Pool in an unconventional way (as far as I remember two pools per customers one for Oracle and one for WAS, and we had more than 5 or 6 customers, so each box had at least 10 MSPP). As you may already know I only believe what I can see. So I decided to make tests on my own. By reading the Redbook I realized that there was not enough information about pool and partition weighting. We were like a lot of today’s customers having different weights for development (32), qualification (64), pre-production (128), production (192) and finally Virtual I/O Server (255). As we were using Shared Processor Pool I was expecting that when the Shared Processor Pool is full (contention) the weight will work and will prioritize the partition with the higher weight. What was my surprise when I realized the weighting was not working inside a Shared Processor Pool but only in the DefaultPool (Pool 0). Remember forever this statement on Power7 partition weighting is only working when the default pool is full. There is no “intelligence” in a Shared Processor Pool and you have to be very careful with the size of the pool because of that. On Power7 pools are used ONLY for licensing purpose. I then decided to contact my preferred IBM pre-sales in France to tell him about this incredible discovery. I had no answer for one month, then (as always) he came back with the answer of someone who already knows the truth about this. He introduced me to a performance expert (she was a performance expert at this time and is now specialized in security) and she was telling me that I was absolutely right with my discovery but that only a few people were aware of this. I decided to say nothing about it … but was sure that IBM realized there was a something to clarify about this. Then last year at the IBM Technical Collaboration Council I saw a PowerPoint slide telling that latest IBM Power8 firmware will add this long awaited feature. Partition weighting will work inside a Shared Processor Pool. Finally after waiting for more than four years I have what I want. As I was working on a new project in my current job I had to create a lot of Shared Processor Pool in a mixed Power7 (P770) and Power8 (E870) environment. It was the time to check if this new feature was really working and compare the differences between a Power8 (with latest firmware) and a Power7 machine (with latest firmware). The way we are implementing and monitoring the Shared Processor Pool on a Power8 will now be very different than it was on Power7 box. I think that this is really important and that everybody now needs to understand the differences for their future implementation. But let’s first have a look in the Redbooks to check the official statements:

    The Redbook talking about this is “IBM PowerVM Virtualization Introduction and Configuration”, here is the key paragraph to understand (page 113 and 114):

    redbook_statement

    It was super hard to find but there is place were IBM is talking about this. I’m below quoting this link: https://www.ibm.com/support/knowledgecenter/9119-MME/p8hat/p8hat_sharedproc.htm

    When the firmware is at level 8.3.0, or earlier, uncapped weight is used only when more virtual processors consume unused resources than the available physical processors in the shared processor pool. If no contention exists for processor resources, the virtual processors are immediately distributed across the physical processors, independent of their uncapped weights. This can result in situations where the uncapped weights of the logical partitions do not exactly reflect the amount of unused capacity.

    For example, logical partition 2 has one virtual processor and an uncapped weight of 100. Logical partition 3 also has one virtual processor, but an uncapped weight of 200. If logical partitions 2 and 3 both require more processing capacity, and there is not enough physical processor capacity to run both logical partitions, logical partition 3 receives two more processing units for every additional processing unit that logical partition 2 receives. If logical partitions 2 and 3 both require more processing capacity, and there is enough physical processor capacity to run both logical partitions, logical partition 2 and 3 receive an equal amount of unused capacity. In this situation, their uncapped weights are ignored.

    When the firmware is at level 8.4.0, or later, if multiple partitions are assigned to a shared processor pool, the uncapped weight is used as an indicator of how the processor resources must be distributed among the partitions in the shared processor pool with respect to the maximum amount of capacity that can be used by the shared processor pool. For example, logical partition 2 has one virtual processor and an uncapped weight of 100. Logical partition 3 also has one virtual processor, but an uncapped weight of 200. If logical partitions 2 and 3 both require more processing capacity, logical partition 3 receives two additional processing units for every additional processing unit that logical partition 2 receives.

    The server distributes unused capacity among all of the uncapped shared processor partitions that are configured on the server, regardless of the shared processor pools to which they are assigned. For example, if you configure logical partition 1 to the default shared processor pool and you configure logical partitions 2 and 3 to a different shared processor pool, all three logical partitions compete for the same unused physical processor capacity in the server, even though they belong to different shared processor pools.

    Testing methodology

    We now need to demonstrate that the behavior of the weighting is different between a Power7 and a Power8 machine, here is how we are going to proceed :

    • On a Power8 machine (E870 SC840_056) we create a Shared Processor Pool with a “Maximum Processing unit” set to 1.
    • On a Power7 machine we create a Shared Processor Pool with a “Maximum Processing unit” set to 1.
    • We create two partitions in the P8 pool (1VP, 0.1EC) called mspp1 and mspp2.
    • We create two partitions in the P7 pool (1VP, 0.1EC) called mspp3 and mspp4.
    • Using ncpu providev with the nstress tools (http://public.dhe.ibm.com/systems/power/community/wikifiles/PerfTools/nstress_AIX6_April_2014.tar) we create an heavy load on each partition. Obviously this load can’t be higher than 1 processing unit in total (sum of each physc).
    • We then use these testing scenarios (each test has a duration of 15 minutes, we are recording cpu and pool stats with nmon and lpar2rrd)
      1. First partition with a weight of 128, the second partition with a weight of 128 (test with the same weight).
      2. First partition with a weight of 64, the second partition with a weight of 128 (test weight multiply by two 1/2).
      3. First partition with a weight of 32, the second partition with a weight of 128 (test weight multiply by four 1/4).
      4. First partition with a weight of 1, the second partition with a weight of 2 (we try here to prove that the ratio between two values is more important that the value itself. Values of 1 and 2 should give us the same result as 64 and 128)
      5. First partition with a weight of 1, the second partition with a weight of 255 (a ratio of 1:255) (you’ll see here that the result is pretty interesting :-) ).
    • You’ll see that It will not be necessary to do all these tests on the P7 box …. :-)

    The Power8 case

    Prerequistes

    Firmware P8 SC840* or SV840* are mandatory to enable the weighting in a Shared Processor Pool on a machine without contention for processor resources (no contention in the DefaultPool). This means that all P6, P7 and P8 (with a firmware < 840) machines do not have this feature coded in the firmware. My advice is to update all your P8 machines to the latest level to enable this new behavior.

    Tests

    For each test, we prove the weight of each partition using the lparstat command, then we capture a nmon file every 30 seconds and we launch ncpu for a duration of 15 minutes with four CPUs (we are in SMT4) on both P8 and P7 box. We will show you here that weight are taken into account in a Power8 MSPP, but are not taken into account in a Power7 MSPP.

    #lparstat -i | grep -iE "Variable Capacity Weight|^Partition"
    Partition Name                             : mspp1-23bad3d7-00000898
    Partition Number                           : 3
    Partition Group-ID                         : 32771
    Variable Capacity Weight                   : 255
    Desired Variable Capacity Weight           : 255
    # /usr/bin/nmon -F /admin/nmon/$(hostname)_weight255.nmon -s30 -c30 -t ; ./ncpu -p 4 -s 900
    # lparstat 1 10
    
    • Both weights at 128, you can check in the picture below that the “physc” value are strictly equal (0.5 for both lpars) (the ratio of 1 between the two weight is respected) :
    • weight128

    • One partition to 64 and one partition to 128, you can check in the pictures below (lparstat output, and nmon analyser graph) that we now have different values for the physc value (0.36 for the mssp2 lpar and 0.64 for the mssp1 lpar). We now have a ratio of 2, mspp1 physc is two time the mspp2 physc (the weights are respected in the Shared Processor Pool):
    • weight64_128

    nmonx2

    This lpar2rrd graph show you the weighting behavior on a Power8 machine (test one: both weights equal to 128, and test two: with two different weights of 128 and 64).

    graph_p8_128128_12864

    • One partition to 32 and one partition to 128: you can check in the picture below that the ratio of 3 (32:128) is respected (physc value to 0.26 and 0.74).
    • weight32_128

    • One partition to 1 and one partition to 2. The results here are exactly the same as the second test (128 and 64 weights), it proves you that the important thing to configure are the ratio between the weights and not the value itself (using 1 2 3 weights will give you the exact same results as 2 4 6):
    • weight1_2

    • Finally one partition to 1 and one partition to 255. Be careful here the ratio is big enough to have an unresponsive lpar when loading both partitions. I do not recommend putting such high ratios because of this:
    • weight1_255

    graph_p8_12832_12_1255

    The Power7 case

    Let’s do one test on a Power7 machine with on lpar with a weight of 1 and the other one with a weight of 255 … you’ll see a huge difference here … and I think it is clear enough to avoid doing all the test scenarios on the Power7 machine.

    Tests

    You can see here that I’m doing the exact same test, weight to 1 and 255, now both partition have an equal physc value (0.5 for both partitions). On a Power7 box the weights will be taken into account only if the DefaultPool (pool0) is full (contention). The pictures below show you the reality of the Multiple Shared Processors pool running on a Power7 box. On Power7 MSPP must be used only for licensing purpose and nothing else.

    weight1_255_power7
    graph_p7_1255

    Conclusion

    I hope you better understand the Multiple Shared Processor Pools differences between Power8 and Power7. Now that you are aware of this my advice is to have different strategies when you are implementing MSPP on Power7 and Power8. On Power7 double check and monitor your MSPP to be sure the pools are never full and that you can get enough capacity to run you load. On a Power8 box setup you weights wisely on your different environments (backup, production, development). You can then be sure that the production will be prioritized whatever appends even if you reduce your MSPP sizes, by doing this you’ll maximize licensing costs. As always I hope it help.

    NovaLink ‘HMC Co-Management’ and PowerVC 1.3.0.1Dynamic Resource Optimizer

    Everybody now knows that I’m using PowerVC a lot in my current company. My environment is growing bigger and bigger and we are now managing more than 600 virtual machines with PowerVC (the goal is to reach ~ 3000 this year). Some of them were build by PowerVC itself and some of them were migrated through an homemade python script calling the PowerVC rest api and moving our old vSCSI machines to the new full NPIV/Live Partition Mobility/PowerVC environment (Still struggling with the “old mens” to move on SSP, but I’m alone versus everybody on this one). I’m happy with that but (there is always a but) I’m facing a lot problems. The first one is that we are doing more and more stuffs with PowerVC (Virtual Machine creation, virtual machines resizing, adding additional disks, moving machine with LPM, and finally using this python scripts to migrate the old machines to the new environment). I realized that the machine hosting the PowerVC was slower and slower and the more actions we do the more the PowerVC was “unresponsive”. By this I mean that the GUI was slow, creating objects was slower and slower. By looking at CPU graphs in lpar2rrd we noticed that the CPU consumption was growing as fast as we were doing stuffs on PowerVC (check the graph below). The second problem was my teams (unfortunately for me, we have here different teams doing different sort of stuffs here and everybody is using the Hardware Management Consoles it’s own way, some people are renaming the machine making them unusable with PowerVC, some people were changing the profiles disabling the synchronization, even worse we have some third party tools used for capacity planning making the Hardware Management Console unusable by PowerVC). The solution to all these problems is to use NovaLink and especially the NovaLink Co-Management. By doing this the Hardware Management Consoles will be restricted to a read-only view and PowerVC will stop querying the HMCs and will directly query the NovaLink partitions on each hosts instead of querying the Hardware Management Consoles.

    cpu_powervc

    What is NovaLink ?

    If you are using PowerVC you know that this one is based on OpenStack. Until now all the Openstack services where running on the PowerVC host. If you check on the PowerVC today you can see that there is one Nova per managed host. In the example below I’m managing ten hosts so I have ten different Nova processes running :

    # ps -ef | grep [n]ova-compute
    nova       627     1 14 Jan16 ?        06:24:30 /usr/bin/python /usr/bin/nova-compute --config-file /etc/nova/nova-9117MMD_10D6666.conf --log-file /var/log/nova/nova-compute.log --log-file /var/log/nova/nova-compute-9117MMD_10D6666.log
    nova       649     1 14 Jan16 ?        06:30:25 /usr/bin/python /usr/bin/nova-compute --config-file /etc/nova/nova-9117MMD_65E6666.conf --log-file /var/log/nova/nova-compute.log --log-file /var/log/nova/nova-compute-9117MMD_65E6666.log
    nova       664     1 17 Jan16 ?        07:49:27 /usr/bin/python /usr/bin/nova-compute --config-file /etc/nova/nova-9117MMD_1086666.conf --log-file /var/log/nova/nova-compute.log --log-file /var/log/nova/nova-compute-9117MMD_1086666.log
    nova       675     1 19 Jan16 ?        08:40:27 /usr/bin/python /usr/bin/nova-compute --config-file /etc/nova/nova-9117MMD_06D6666.conf --log-file /var/log/nova/nova-compute.log --log-file /var/log/nova/nova-compute-9117MMD_06D6666.log
    nova       687     1 18 Jan16 ?        08:15:57 /usr/bin/python /usr/bin/nova-compute --config-file /etc/nova/nova-9117MMD_6576666.conf --log-file /var/log/nova/nova-compute.log --log-file /var/log/nova/nova-compute-9117MMD_6576666.log
    nova       697     1 21 Jan16 ?        09:35:40 /usr/bin/python /usr/bin/nova-compute --config-file /etc/nova/nova-9117MMD_6556666.conf --log-file /var/log/nova/nova-compute.log --log-file /var/log/nova/nova-compute-9117MMD_6556666.log
    nova       712     1 13 Jan16 ?        06:02:23 /usr/bin/python /usr/bin/nova-compute --config-file /etc/nova/nova-9117MMD_10A6666.conf --log-file /var/log/nova/nova-compute.log --log-file /var/log/nova/nova-compute-9117MMD_10A6666.log
    nova       728     1 17 Jan16 ?        07:49:02 /usr/bin/python /usr/bin/nova-compute --config-file /etc/nova/nova-9117MMD_1016666.conf --log-file /var/log/nova/nova-compute.log --log-file /var/log/nova/nova-compute-9117MMD_1016666.log
    nova       752     1 17 Jan16 ?        07:34:45 /usr/bin/python /usr/bin/nova-compute --config-file /etc/nova/nova-9117MMD_1036666.conf --log-file /var/log/nova/nova-compute.log --log-file /var/log/nova/nova-compute-9119MHE_1036666.log
    nova       779     1 13 Jan16 ?        05:54:52 /usr/bin/python /usr/bin/nova-compute --config-file /etc/nova/nova-9117MMD_6596666.conf --log-file /var/log/nova/nova-compute.log --log-file /var/log/nova/nova-compute-9119MHE_6596666.log
    # ps -ef | grep [n]ova-compute | wc -l
    10
    

    The goal of NovaLink is to move these processes on a dedicated partition running on each managed host (each PowerSystems). This partition is called the NovaLink partition. This one is running on an Ubuntu 15.10 Linux OS (Little endian) (so only available on Power8 hosts) and is in charge to run the Openstack nova processes. By doing that you will distribute the load across all the NovaLink partitions instead of charging one PowerVC host. Even better my understanding is that the NovaLink partition is able to communicate directly with the FSP. By using NovaLink you will be able to stop using the Hardware Management Consoles anymore and avoid the slowness of theses ones. As the NovaLink partition is hosted on the host itself the RMC connections are can now use a direct link (ipv6) through the PowerHypervisor. No more RMC connection problem at all ;-), it’s just awesome. NovaLink allows you to choose between two modes of management:

    • Full Nova Management: You install your new host directly with NovaLink on it and you will not need an Hardware Management Console Anymore (In this case the NovaLink installation is in charge to deploy the Virtual I/O Servers and the SEAs).
    • Nova Co-Management: Your host is already installed and you give the write access (setmaster) to the NovaLink partition, the Hardware Management Console will be limited in this mode (you will not be able to create partition anymore or modify profile, it’s not a “read only” mode as you will be able to start and stop the partitions and still do some stuffs with HMC but you will be very limited).
    • You can still mix NovaLink and Non-NovaLink management hosts, and still have P7/P6 managed by HMCs, P8 managed by HMCs, P8 Nova Co-Managed and P8 full Nova Managed ;-).
    • Nova1

    Prerequisites

    As always upgrade your systems to the latest code level if you want to use NovaLink and NovaLink Co-Management

    • Power 8 only with firmware version 840. (or later)
    • Virtual I/O Server 2.2.4.10 or later
    • For NovaLink co-management HMC V8R8.4.0
    • Obviously install NovaLink on each NovaLink managed system (install the latest patch version of NovaLink)
    • PowerVC 1.3.0.1 or later

    NovaLink installation on an existing system

    I’ll show you here how to install a NovaLink partition on an existing deployed system. Installing a new system from scratch is also possible. My advice is that you look at this address to start: , and check this youtube video showing you how a system is installed from scratch :

    The goal of this post is to show you how to setup a co-managed system on an already existing system with Virtual I/O Servers already deployed on the host. My advice is to be very careful. The first thing you’ll need to do is to created a partition (2VP 0.5EC and 5GB Memory) (I’m calling it nova in the example below) and use the Virtual Optical device to load the NovaLink system on this one. In the example below the machine is “SSP” backed. Be very careful when do that: setup the profile name, and all the configuration stuffs before moving to co-managed mode … after that it will be harder for you to change things as the new pvmctl command will be very new to you:

    # mkvdev -fbo -vadapter vhost0
    vtopt0 Available
    # lsrep
    Size(mb) Free(mb) Parent Pool         Parent Size      Parent Free
        3059     1579 rootvg                   102272            73216
    
    Name                                                  File Size Optical         Access
    PowerVM_NovaLink_V1.1_122015.iso                           1479 None            rw
    vopt_a19a8fbb57184aad8103e2c9ddefe7e7                         1 None            ro
    # loadopt -disk PowerVM_NovaLink_V1.1_122015.iso -vtd vtopt0
    # lsmap -vadapter vhost0 -fmt :
    vhost0:U8286.41A.21AFF8V-V2-C40:0x00000003:nova_b1:Available:0x8100000000000000:nova_b1.7f863bacb45e3b32258864e499433b52: :N/A:vtopt0:Available:0x8200000000000000:/var/vio/VMLibrary/PowerVM_NovaLink_V1.1_122015.iso: :N/A
    
    • At the gurb page select the first entry:
    • install1

    • Wait for the machine to boot:
    • install2

    • Choose to perform an installation:
    • install3

    • Accept the licenses
    • install4

    • padmin user:/li>
      install5

    • Put you network configuration:
    • install6

    • Accept to install the Ubuntu system:
    • install8

    • You can then modify anything you want in the configuration file (in my case the timezone):
    • install9

      By default NovaLink (I think not 100% sure) is designed to be installed on SAS disk, so without multipathing. If like me you decide to install the NovaLink partition in a “boot-on-san” lpar my advice is to launch the installation without any multipathing enabled (only one vscsi adapter or one virtual fibre channel adapter). After the installation is completed install the Ubuntu multipathd service and configure the second vscsi or virtual fibre channel adapter. If you don’t do that you may experience problem at the installation time (RAID error). Please remember that you have to do that before enabling the co-management. Last thing about the installation it may takes a lot of time to finish. So be patient (especially the preseed step).

    install10

    Updating to the latest code level

    The iso file provider in the Entitled Software Support is not updated to the latest available NovaLink code. Make a copy of the official repository available at this address: ftp://public.dhe.ibm.com/systems/virtualization/Novalink/debian. Serve the content of this ftp server on you how http server (use the command below to copy it):

    # wget --mirror ftp://public.dhe.ibm.com/systems/virtualization/Novalink/debian
    

    Modify the /etc/apt/sources.list (and source.list.d) and comment all the available deb repository to on only keep your copy

    root@nova:~# grep -v ^# /etc/apt/sources.list
    deb http://deckard.lab.chmod666.org/nova/Novalink/debian novalink_1.0.0 non-free
    root@nova:/etc/apt/sources.list.d# apt-get upgrade
    Reading package lists... Done
    Building dependency tree
    Reading state information... Done
    Calculating upgrade... Done
    The following packages will be upgraded:
      pvm-cli pvm-core pvm-novalink pvm-rest-app pvm-rest-server pypowervm
    6 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
    Need to get 165 MB of archives.
    After this operation, 53.2 kB of additional disk space will be used.
    Do you want to continue? [Y/n]
    Get:1 http://deckard.lab.chmod666.org/nova/Novalink/debian/ novalink_1.0.0/non-free pypowervm all 1.0.0.1-151203-1553 [363 kB]
    Get:2 http://deckard.lab.chmod666.org/nova/Novalink/debian/ novalink_1.0.0/non-free pvm-cli all 1.0.0.1-151202-864 [63.4 kB]
    Get:3 http://deckard.lab.chmod666.org/nova/Novalink/debian/ novalink_1.0.0/non-free pvm-core ppc64el 1.0.0.1-151202-1495 [2,080 kB]
    Get:4 http://deckard.lab.chmod666.org/nova/Novalink/debian/ novalink_1.0.0/non-free pvm-rest-server ppc64el 1.0.0.1-151203-1563 [142 MB]
    Get:5 http://deckard.lab.chmod666.org/nova/Novalink/debian/ novalink_1.0.0/non-free pvm-rest-app ppc64el 1.0.0.1-151203-1563 [21.1 MB]
    Get:6 http://deckard.lab.chmod666.org/nova/Novalink/debian/ novalink_1.0.0/non-free pvm-novalink ppc64el 1.0.0.1-151203-408 [1,738 B]
    Fetched 165 MB in 7s (20.8 MB/s)
    (Reading database ... 72094 files and directories currently installed.)
    Preparing to unpack .../pypowervm_1.0.0.1-151203-1553_all.deb ...
    Unpacking pypowervm (1.0.0.1-151203-1553) over (1.0.0.0-151110-1481) ...
    Preparing to unpack .../pvm-cli_1.0.0.1-151202-864_all.deb ...
    Unpacking pvm-cli (1.0.0.1-151202-864) over (1.0.0.0-151110-761) ...
    Preparing to unpack .../pvm-core_1.0.0.1-151202-1495_ppc64el.deb ...
    Removed symlink /etc/systemd/system/multi-user.target.wants/pvm-core.service.
    Unpacking pvm-core (1.0.0.1-151202-1495) over (1.0.0.0-151111-1375) ...
    Preparing to unpack .../pvm-rest-server_1.0.0.1-151203-1563_ppc64el.deb ...
    Unpacking pvm-rest-server (1.0.0.1-151203-1563) over (1.0.0.0-151110-1480) ...
    Preparing to unpack .../pvm-rest-app_1.0.0.1-151203-1563_ppc64el.deb ...
    Unpacking pvm-rest-app (1.0.0.1-151203-1563) over (1.0.0.0-151110-1480) ...
    Preparing to unpack .../pvm-novalink_1.0.0.1-151203-408_ppc64el.deb ...
    Unpacking pvm-novalink (1.0.0.1-151203-408) over (1.0.0.0-151112-304) ...
    Processing triggers for ureadahead (0.100.0-19) ...
    ureadahead will be reprofiled on next reboot
    Setting up pypowervm (1.0.0.1-151203-1553) ...
    Setting up pvm-cli (1.0.0.1-151202-864) ...
    Installing bash completion script /etc/bash_completion.d/python-argcomplete.sh
    Setting up pvm-core (1.0.0.1-151202-1495) ...
    addgroup: The group `pvm_admin' already exists.
    Created symlink from /etc/systemd/system/multi-user.target.wants/pvm-core.service to /usr/lib/systemd/system/pvm-core.service.
    0513-071 The ctrmc Subsystem has been added.
    Adding /usr/lib/systemd/system/ctrmc.service for systemctl ...
    0513-059 The ctrmc Subsystem has been started. Subsystem PID is 3096.
    Setting up pvm-rest-server (1.0.0.1-151203-1563) ...
    The user `wlp' is already a member of `pvm_admin'.
    Setting up pvm-rest-app (1.0.0.1-151203-1563) ...
    Setting up pvm-novalink (1.0.0.1-151203-408) ...
    

    NovaLink and HMC Co-Management configuration

    Before adding the hosts on PowerVC you still need to do the most important thing. After the installation is finished enable the co-management mode to be able to have a system managed by NovaLink and still connected to an Hardware Management Console:

    • Enable the powerm_mgmt_capable attribute on the Nova partition:
    • # chsyscfg -r lpar -m br-8286-41A-2166666 -i "name=nova,powervm_mgmt_capable=1"
      # lssyscfg -r lpar -m br-8286-41A-2166666 -F name,powervm_mgmt_capable --filter "lpar_names=nova"
      nova,1
      
    • Enable co-management (please not here that you have to setmaster (you’ll see that the curr_master_name is the HMC) and then relmaster (you’ll see that the curr_master_name is the NovaLink Partition, this is that state where we want to be)):
    • # lscomgmt -m br-8286-41A-2166666
      is_master=null
      # chcomgmt -m br-8286-41A-2166666 -o setmaster -t norm --terms agree
      # lscomgmt -m br-8286-41A-2166666
      is_master=1,curr_master_name=myhmc1,curr_master_mtms=7042-CR8*2166666,curr_master_type=norm,pend_master_mtms=none
      # chcomgmt -m br-8286-41A-2166666 -o relmaster
      # lscomgmt -m br-8286-41A-2166666
      is_master=0,curr_master_name=nova,curr_master_mtms=3*8286-41A*2166666,curr_master_type=norm,pend_master_mtms=none
      

    Going back to HMC managed system

    You can go back to an Hardware Management Console managed system whenever you want (set the master to the HMC, delete the nova partition and release the master from the HMC).

    # chcomgmt -m br-8286-41A-2166666 -o setmaster -t norm --terms agree
    # lscomgmt -m br-8286-41A-2166666
    is_master=1,curr_master_name=myhmc1,curr_master_mtms=7042-CR8*2166666,curr_master_type=norm,pend_master_mtms=none
    # chlparstate -o shutdown -m br-8286-41A-2166666 --id 9 --immed
    # rmsyscfg -r lpar -m br-8286-41A-2166666 --id 9
    # chcomgmt -o relmaster -m br-8286-41A-2166666
    # lscomgmt -m br-8286-41A-2166666
    is_master=0,curr_master_mtms=none,curr_master_type=none,pend_master_mtms=none
    

    Using NovaLink

    After the installation you are now able to login on the NovaLink partition. (You can gain root access with “sudo su -” command). A command new called pvmctl is available on the NovaLink partition allowing you to perform any actions (stop, start virtual machine, list Virtual I/O Servers, ….). Before trying to add the host double check that the pvmctl command is working ok.

    padmin@nova:~$ pvmctl lpar list
    Logical Partitions
    +------+----+---------+-----------+---------------+------+-----+-----+
    | Name | ID |  State  |    Env    |    Ref Code   | Mem  | CPU | Ent |
    +------+----+---------+-----------+---------------+------+-----+-----+
    | nova | 3  | running | AIX/Linux | Linux ppc64le | 8192 |  2  | 0.5 |
    +------+----+---------+-----------+---------------+------+-----+-----+
    

    Adding hosts

    On the PowerVC side add the NovaLink host by choosing the NovaLink option:

    addhostnovalink

    Some deb (ibmpowervc-power)packages will be installed on configured on the NovaLink machine:

    addhostnovalink3
    addhostnovalink4

    By doing this, on each NovaLink machine you can check that a nova-compute process is here. (By adding the host the deb was installed and configured on the NovaLink host:

    # ps -ef | grep nova
    nova      4392     1  1 10:28 ?        00:00:07 /usr/bin/python /usr/bin/nova-compute --config-file /etc/nova/nova.conf --log-file /var/log/nova/nova-compute.log
    root      5218  5197  0 10:39 pts/1    00:00:00 grep --color=auto nova
    # grep host_display_name /etc/nova/nova.conf
    host_display_name = XXXX-8286-41A-XXXX
    # tail -1 /var/log/apt/history.log
    Start-Date: 2016-01-18  10:27:54
    Commandline: /usr/bin/apt-get -o Dpkg::Options::=--force-confdef -o Dpkg::Options::=--force-confold -y install --force-yes --allow-unauthenticated ibmpowervc-powervm
    Install: python-keystoneclient:ppc64el (1.6.0-2.ibm.ubuntu1, automatic), python-oslo.reports:ppc64el (0.1.0-1.ibm.ubuntu1, automatic), ibmpowervc-powervm:ppc64el (1.3.0.1), python-ceilometer:ppc64el (5.0.0-201511171217.ibm.ubuntu1.199, automatic), ibmpowervc-powervm-compute:ppc64el (1.3.0.1, automatic), nova-common:ppc64el (12.0.0-201511171221.ibm.ubuntu1.213, automatic), python-oslo.service:ppc64el (0.11.0-2.ibm.ubuntu1, automatic), python-oslo.rootwrap:ppc64el (2.0.0-1.ibm.ubuntu1, automatic), python-pycadf:ppc64el (1.1.0-1.ibm.ubuntu1, automatic), python-nova:ppc64el (12.0.0-201511171221.ibm.ubuntu1.213, automatic), python-keystonemiddleware:ppc64el (2.4.1-2.ibm.ubuntu1, automatic), python-kafka:ppc64el (0.9.3-1.ibm.ubuntu1, automatic), ibmpowervc-powervm-monitor:ppc64el (1.3.0.1, automatic), ibmpowervc-powervm-oslo:ppc64el (1.3.0.1, automatic), neutron-common:ppc64el (7.0.0-201511171221.ibm.ubuntu1.280, automatic), python-os-brick:ppc64el (0.4.0-1.ibm.ubuntu1, automatic), python-tooz:ppc64el (1.22.0-1.ibm.ubuntu1, automatic), ibmpowervc-powervm-ras:ppc64el (1.3.0.1, automatic), networking-powervm:ppc64el (1.0.0.0-151109-25, automatic), neutron-plugin-ml2:ppc64el (7.0.0-201511171221.ibm.ubuntu1.280, automatic), python-ceilometerclient:ppc64el (1.5.0-1.ibm.ubuntu1, automatic), python-neutronclient:ppc64el (2.6.0-1.ibm.ubuntu1, automatic), python-oslo.middleware:ppc64el (2.8.0-1.ibm.ubuntu1, automatic), python-cinderclient:ppc64el (1.3.1-1.ibm.ubuntu1, automatic), python-novaclient:ppc64el (2.30.1-1.ibm.ubuntu1, automatic), python-nova-ibm-ego-resource-optimization:ppc64el (2015.1-201511110358, automatic), python-neutron:ppc64el (7.0.0-201511171221.ibm.ubuntu1.280, automatic), nova-compute:ppc64el (12.0.0-201511171221.ibm.ubuntu1.213, automatic), nova-powervm:ppc64el (1.0.0.1-151203-215, automatic), openstack-utils:ppc64el (2015.2.0-201511171223.ibm.ubuntu1.18, automatic), ibmpowervc-powervm-network:ppc64el (1.3.0.1, automatic), python-oslo.policy:ppc64el (0.5.0-1.ibm.ubuntu1, automatic), python-oslo.db:ppc64el (2.4.1-1.ibm.ubuntu1, automatic), python-oslo.versionedobjects:ppc64el (0.9.0-1.ibm.ubuntu1, automatic), python-glanceclient:ppc64el (1.1.0-1.ibm.ubuntu1, automatic), ceilometer-common:ppc64el (5.0.0-201511171217.ibm.ubuntu1.199, automatic), openstack-i18n:ppc64el (2015.2-3.ibm.ubuntu1, automatic), python-oslo.messaging:ppc64el (2.1.0-2.ibm.ubuntu1, automatic), python-swiftclient:ppc64el (2.4.0-1.ibm.ubuntu1, automatic), ceilometer-powervm:ppc64el (1.0.0.0-151119-44, automatic)
    End-Date: 2016-01-18  10:28:00
    

    The command line interface

    You can do ALL the stuffs you were doing on the HMC using the pvmctl command. The syntax is pretty simple: pvcmtl |OBJECT| |ACTION| where the OBJECT can be vios, vm, vea(virtual ethernet adapter), vswitch, lu (logical unit), or anything you want and ACTION can be list, delete, create, update. Here are a few examples :

    • List the Virtual I/O Servers:
    • # pvmctl vios list
      Virtual I/O Servers
      +--------------+----+---------+----------+------+-----+-----+
      |     Name     | ID |  State  | Ref Code | Mem  | CPU | Ent |
      +--------------+----+---------+----------+------+-----+-----+
      | s00ia9940825 | 1  | running |          | 8192 |  2  | 0.2 |
      | s00ia9940826 | 2  | running |          | 8192 |  2  | 0.2 |
      +--------------+----+---------+----------+------+-----+-----+
      
    • List the partitions (note the -d for display-fields allowing me to print somes attributes):
    • # pvmctl vm list
      Logical Partitions
      +----------+----+----------+----------+----------+-------+-----+-----+
      |   Name   | ID |  State   |   Env    | Ref Code |  Mem  | CPU | Ent |
      +----------+----+----------+----------+----------+-------+-----+-----+
      | aix72ca> | 3  | not act> | AIX/Lin> | 00000000 |  2048 |  1  | 0.1 |
      |   nova   | 4  | running  | AIX/Lin> | Linux p> |  8192 |  2  | 0.5 |
      | s00vl99> | 5  | running  | AIX/Lin> | Linux p> | 10240 |  2  | 0.2 |
      | test-59> | 6  | not act> | AIX/Lin> | 00000000 |  2048 |  1  | 0.1 |
      +----------+----+----------+----------+----------+-------+-----+-----+
      # pvmctl list vm -d name id 
      [..]
      # pvmctl vm list -i id=4 --display-fields LogicalPartition.name
      name=aix72-1-d3707953-00000090
      # pvmctl vm list  --display-fields LogicalPartition.name LogicalPartition.id LogicalPartition.srr_enabled SharedProcessorConfiguration.desired_virtual SharedProcessorConfiguration.uncapped_weight
      name=aix72capture,id=3,srr_enabled=False,desired_virtual=1,uncapped_weight=64
      name=nova,id=4,srr_enabled=False,desired_virtual=2,uncapped_weight=128
      name=s00vl9940243,id=5,srr_enabled=False,desired_virtual=2,uncapped_weight=128
      name=test-5925058d-0000008d,id=6,srr_enabled=False,desired_virtual=1,uncapped_weight=128
      
    • Delete the virtual adapter on the partition name nova (note the –parent-id to select the partition) with a certain uuid which was found with (pvmclt list vea):
    • # pvmctl vea delete --parent-id name=nova --object-id uuid=fe7389a8-667f-38ca-b61e-84c94e5a3c97
      
    • Power off the lpar named aix72-2:
    • # pvmctl vm power-off -i name=aix72-2-536bf0f8-00000091
      Powering off partition aix72-2-536bf0f8-00000091, this may take a few minutes.
      Partition aix72-2-536bf0f8-00000091 power-off successful.
      
    • Delete the lpar named aix72-2:
    • # pvmctl vm delete -i name=aix72-2-536bf0f8-00000091
      
    • Delete the vswitch named MGMTVSWITCH:
    • # pvmctl vswitch delete -i name=MGMTVSWITCH
      
    • Open a console:
    • #  mkvterm --id 4
      vterm for partition 4 is active.  Press Control+] to exit.
      |
      Elapsed time since release of system processors: 57014 mins 10 secs
      [..]
      
    • Power on an lpar:
    • # pvmctl vm power-on -i name=aix72capture
      Powering on partition aix72capture, this may take a few minutes.
      Partition aix72capture power-on successful.
      

    Is this a dream ? No more RMC connectivty problem anymore

    I’m 100% sure that you always have problems with RMC connectivity due to firwall issues, ports not opened, and IDS blocking RMC ongoing or outgoing traffic. NovaLink is THE solution that will solve all the RMC problems forever. I’m not joking it’s a major improvement for PowerVM. As the NovaLink partition is installed on each hosts this one can communicate through a dedicated IPv6 link with all the partitions hosted on the host. A dedicated virtual switch called MGMTSWITCH is used to allow the RMC flow to transit between all the lpars and the NovaLink partition. Of course this Virtual Switch must be created and one Virtual Ethernet Adapter must also be created on the NovaLink partition. These are the first two actions to do if you want to implement this solution. Before starting here are a few things you need to know:

    • For security reason the MGMTSWITCH must be created in Vepa mode. If you are not aware of what are VEPA and VEB modes here is a reminder:
    • In VEB mode all the the partitions connected to the same vlan can communicate together. We do not want that as it is a security issue.
    • The VEPA mode gives us the ability to isolate lpars that are on the same subnet. lpar to lpar traffic is forced out of the machine. This is what we want.
    • The PVID for this VEPA network is 4094
    • The adapter in the NovaLink partition must be a trunk adapter.
    • It is mandatory to name the VEPA vswitch MGMTSWITCH.
    • At the lpar creation if the MGMTSWITCH exists a new Virtual Ethernet Adapter will be automatically created on the deployed lpar.
    • To be correctly configured the deployed lpar needs the latest level of rsct code (3.2.1.0 for now).
    • The latest cloud-init version must be deploy on the captured lpar used to make the image.
    • You don’t need to configure any addresses on this adapter (on the deployed lpars the adapter is configured with the local-link address (it’s the same thing as 169.254.0.0/16 addresses used in IPv4 format but for IPv6)(please note that any IPv6 adapter must “by design” have a local-link address).

    mgmtswitch2

    • Create the virtual switch called MGMTSWITCH in Vepa mode:
    • # pvmctl vswitch create --name MGMTSWITCH --mode=Vepa
      # pvmctl vswitch list  --display-fields VirtualSwitch.name VirtualSwitch.mode 
      name=ETHERNET0,mode=Veb
      name=vdct,mode=Veb
      name=vdcb,mode=Veb
      name=vdca,mode=Veb
      name=MGMTSWITCH,mode=Vepa
      
    • Create a virtual ethernet adapter on the NovaLink partition with the PVID 4094 and a trunk priorty set to 1 (it’s a trunk adapter). Note that we now have two adapters on the NovaLink partition (one in IPv4 (routable) and the other one in IPv6 (non-routable):
    • # pvmctl vea create --pvid 4094 --vswitch MGMTSWITCH --trunk-pri 1 --parent-id name=nova
      # pvmctl vea list --parent-id name=nova
      --------------------------
      | VirtualEthernetAdapter |
      --------------------------
        is_tagged_vlan_supported=False
        is_trunk=False
        loc_code=U8286.41A.216666-V3-C2
        mac=EE3B84FD1402
        pvid=666
        slot=2
        uuid=05a91ab4-9784-3551-bb4b-9d22c98934e6
        vswitch_id=1
      --------------------------
      | VirtualEthernetAdapter |
      --------------------------
        is_tagged_vlan_supported=True
        is_trunk=True
        loc_code=U8286.41A.216666-V3-C34
        mac=B6F837192E63
        pvid=4094
        slot=34
        trunk_pri=1
        uuid=fe7389a8-667f-38ca-b61e-84c94e5a3c97
        vswitch_id=4
      

      Configure the local-link IPv6 address in the NovaLink partition:

      # more /etc/network/interfaces
      [..]
      auto eth1
      iface eth1 inet manual
       up /sbin/ifconfig eth1 0.0.0.0
      # ifup eth1
      # ifconfig eth1
      eth1      Link encap:Ethernet  HWaddr b6:f8:37:19:2e:63
                inet6 addr: fe80::b4f8:37ff:fe19:2e63/64 Scope:Link
                UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
                RX packets:0 errors:0 dropped:0 overruns:0 frame:0
                TX packets:17 errors:0 dropped:0 overruns:0 carrier:0
                collisions:0 txqueuelen:1000
                RX bytes:0 (0.0 B)  TX bytes:1454 (1.4 KB)
                Interrupt:34
      

    Capture an AIX host with the latest version of rsct installed (3.2.1.0) or later and the latest version of cloud-init installed. This version of RMC/rsct handle this new feature so this is mandatory to have it installed on the captured host. When PowerVC will deploy a Virtual Machine on a Nova managed host with this version of rsct installed a new adapter with the PVID 4094 in the virtual switch MGMTSWITCH will be created and finally all the RMC traffic will use this adapter instead of your public IP address:

    # lslpp -L rsct*
      Fileset                      Level  State  Type  Description (Uninstaller)
      ----------------------------------------------------------------------------
      rsct.core.auditrm          3.2.1.0    C     F    RSCT Audit Log Resource
                                                       Manager
      rsct.core.errm             3.2.1.0    C     F    RSCT Event Response Resource
                                                       Manager
      rsct.core.fsrm             3.2.1.0    C     F    RSCT File System Resource
                                                       Manager
      rsct.core.gui              3.2.1.0    C     F    RSCT Graphical User Interface
      rsct.core.hostrm           3.2.1.0    C     F    RSCT Host Resource Manager
      rsct.core.lprm             3.2.1.0    C     F    RSCT Least Privilege Resource
                                                       Manager
      rsct.core.microsensor      3.2.1.0    C     F    RSCT MicroSensor Resource
                                                       Manager
      rsct.core.rmc              3.2.1.1    C     F    RSCT Resource Monitoring and
                                                       Control
      rsct.core.sec              3.2.1.0    C     F    RSCT Security
      rsct.core.sensorrm         3.2.1.0    C     F    RSCT Sensor Resource Manager
      rsct.core.sr               3.2.1.0    C     F    RSCT Registry
      rsct.core.utils            3.2.1.1    C     F    RSCT Utilities
    

    When this image will be deployed a new adapter will be created in the MGMTSWITCH virtual switch, an IPv6 local-link address will be configured on it. You can check the cloud-init activation to see the IPv6 address is configured at the activation time:

    # pvmctl vea list --parent-id name=aix72-2-0a0de5c5-00000095
    --------------------------
    | VirtualEthernetAdapter |
    --------------------------
      is_tagged_vlan_supported=True
      is_trunk=False
      loc_code=U8286.41A.216666-V5-C32
      mac=FA620F66FF20
      pvid=3331
      slot=32
      uuid=7f1ec0ab-230c-38af-9325-eb16999061e2
      vswitch_id=1
    --------------------------
    | VirtualEthernetAdapter |
    --------------------------
      is_tagged_vlan_supported=True
      is_trunk=False
      loc_code=U8286.41A.216666-V5-C33
      mac=46A066611B09
      pvid=4094
      slot=33
      uuid=560c67cd-733b-3394-80f3-3f2a02d1cb9d
      vswitch_id=4
    # ifconfig -a
    en0: flags=1e084863,14c0
            inet 10.10.66.66 netmask 0xffffff00 broadcast 10.14.33.255
             tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1
    en1: flags=1e084863,14c0
            inet6 fe80::c032:52ff:fe34:6e4f/64
             tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1
    sit0: flags=8100041
            inet6 ::10.10.66.66/96
    [..]
    

    Note that the local-link address is configured at the activation time (fe80 starting addresses):

    # more /var/log/cloud-init-output.log
    [..]
    auto eth1
    
    iface eth1 inet6 static
        address fe80::c032:52ff:fe34:6e4f
        hwaddress ether c2:32:52:34:6e:4f
        netmask 64
        pre-up [ $(ifconfig eth1 | grep -o -E '([[:xdigit:]]{1,2}:){5}[[:xdigit:]]{1,2}') = "c2:32:52:34:6e:4f" ]
            dns-search fr.net.intra
    # entstat -d ent1 | grep -iE "switch|vlan"
    Invalid VLAN ID Packets: 0
    Port VLAN ID:  4094
    VLAN Tag IDs:  None
    Switch ID: MGMTSWITCH
    

    To be sure all is working correctly here is a proof test. I’m taking down the en0 interface on which the IPv4 public address is configured. Then I’m launching a tcpdump on the en1 (on the MGMTSWITCH address). Finally I’m resizing the Virtual Machine with PowerVC. AND EVERYTHING IS WORKING GREAT !!!! AWESOME !!! :-) (note the fe80 to fe80 communication):

    # ifconfig en0 down detach ; tcpdump -i en1 port 657
    tcpdump: WARNING: en1: no IPv4 address assigned
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on en1, link-type 1, capture size 96 bytes
    22:00:43.224964 IP6 fe80::9850:f6ff:fe9c:5739.32819 > fe80::d09e:aff:fecf:a868.rmc: S 4049792650:4049792650(0) win 65535 
    22:00:43.225022 IP6 fe80::d09e:aff:fecf:a868.rmc > fe80::9850:f6ff:fe9c:5739.32819: S 2055569200:2055569200(0) ack 4049792651 win 28560 
    22:00:43.225051 IP6 fe80::9850:f6ff:fe9c:5739.32819 > fe80::d09e:aff:fecf:a868.rmc: . ack 1 win 32844 
    22:00:43.225547 IP6 fe80::9850:f6ff:fe9c:5739.32819 > fe80::d09e:aff:fecf:a868.rmc: P 1:209(208) ack 1 win 32844 
    22:00:43.225593 IP6 fe80::d09e:aff:fecf:a868.rmc > fe80::9850:f6ff:fe9c:5739.32819: . ack 209 win 232 
    22:00:43.225638 IP6 fe80::d09e:aff:fecf:a868.rmc > fe80::9850:f6ff:fe9c:5739.32819: P 1:97(96) ack 209 win 232 
    22:00:43.225721 IP6 fe80::9850:f6ff:fe9c:5739.32819 > fe80::d09e:aff:fecf:a868.rmc: P 209:377(168) ack 97 win 32844 
    22:00:43.225835 IP6 fe80::d09e:aff:fecf:a868.rmc > fe80::9850:f6ff:fe9c:5739.32819: P 97:193(96) ack 377 win 240 
    22:00:43.225910 IP6 fe80::9850:f6ff:fe9c:5739.32819 > fe80::d09e:aff:fecf:a868.rmc: P 377:457(80) ack 193 win 32844 
    22:00:43.226076 IP6 fe80::d09e:aff:fecf:a868.rmc > fe80::9850:f6ff:fe9c:5739.32819: P 193:289(96) ack 457 win 240 
    22:00:43.226154 IP6 fe80::9850:f6ff:fe9c:5739.32819 > fe80::d09e:aff:fecf:a868.rmc: P 457:529(72) ack 289 win 32844 
    22:00:43.226210 IP6 fe80::d09e:aff:fecf:a868.rmc > fe80::9850:f6ff:fe9c:5739.32819: P 289:385(96) ack 529 win 240 
    22:00:43.226276 IP6 fe80::9850:f6ff:fe9c:5739.32819 > fe80::d09e:aff:fecf:a868.rmc: P 529:681(152) ack 385 win 32844 
    22:00:43.226335 IP6 fe80::d09e:aff:fecf:a868.rmc > fe80::9850:f6ff:fe9c:5739.32819: P 385:481(96) ack 681 win 249 
    22:00:43.424049 IP6 fe80::9850:f6ff:fe9c:5739.32819 > fe80::d09e:aff:fecf:a868.rmc: . ack 481 win 32844 
    22:00:44.725800 IP6 fe80::d09e:aff:fecf:a868.rmc > fe80::9850:f6ff:fe9c:5739.rmc: UDP, length 88
    22:00:44.726111 IP6 fe80::9850:f6ff:fe9c:5739.rmc > fe80::d09e:aff:fecf:a868.rmc: UDP, length 88
    22:00:50.137605 IP6 fe80::d09e:aff:fecf:a868.rmc > fe80::9850:f6ff:fe9c:5739.rmc: UDP, length 632
    22:00:50.137900 IP6 fe80::9850:f6ff:fe9c:5739.rmc > fe80::d09e:aff:fecf:a868.rmc: UDP, length 88
    22:00:50.183108 IP6 fe80::9850:f6ff:fe9c:5739.rmc > fe80::d09e:aff:fecf:a868.rmc: UDP, length 408
    22:00:51.683382 IP6 fe80::9850:f6ff:fe9c:5739.rmc > fe80::d09e:aff:fecf:a868.rmc: UDP, length 408
    22:00:51.683661 IP6 fe80::d09e:aff:fecf:a868.rmc > fe80::9850:f6ff:fe9c:5739.rmc: UDP, length 88
    

    To be sure security requirements are met from the lpar I’m pinging the NovaLink host (the first one) which is answering and then I’m pinging the second lpar (the second ping) which is not working. (And this is what we want !!!).

    # ping fe80::d09e:aff:fecf:a868
    PING fe80::d09e:aff:fecf:a868 (fe80::d09e:aff:fecf:a868): 56 data bytes
    64 bytes from fe80::d09e:aff:fecf:a868: icmp_seq=0 ttl=64 time=0.203 ms
    64 bytes from fe80::d09e:aff:fecf:a868: icmp_seq=1 ttl=64 time=0.206 ms
    64 bytes from fe80::d09e:aff:fecf:a868: icmp_seq=2 ttl=64 time=0.216 ms
    ^C
    --- fe80::d09e:aff:fecf:a868 ping statistics ---
    3 packets transmitted, 3 packets received, 0% packet loss
    round-trip min/avg/max = 0/0/0 ms
    # ping fe80::44a0:66ff:fe61:1b09
    PING fe80::44a0:66ff:fe61:1b09 (fe80::44a0:66ff:fe61:1b09): 56 data bytes
    ^C
    --- fe80::44a0:66ff:fe61:1b09 ping statistics ---
    2 packets transmitted, 0 packets received, 100% packet loss
    

    PowerVC 1.3.0.1 Dynamic Resource Optimizer

    In addition to the NovaLink part of this blog post I also wanted to talk about the killer app of 2016. Dynamic Resource Optimizer. This feature can be used on any PowerVC 1.3.0.1 managed hosts (you obviously need at least to hosts). DRO is in charge to re-balance your Virtual Machines across all the available hosts (in the host-group). To sum up if a host is experiencing an heavy load and reaching a certain amount of CPU consumption over a period of time, DRO will move your virtual machines to re-balance the load across all the available hosts (this is done at a host level). Here are a few details about DRO:

    • The DRO configuration is done at a host level.
    • You setup a threshold (in the capture below) to reach to trigger the Live Partition Moblity or Mobily Cores movements (Power Entreprise Pool).
    • droo6
      droo3

    • To be triggered this threshold must be reached a certain number of time (stabilization) over a period you are defining (run interval).
    • You can choose to move virtual machines using Live Partition Mobilty, or to move “cores” using Power Entreprise Pool (you can do both; moving CPU will always be preferred as moving partitions)
    • DRO can be run in advise mode (nothing is done, a warning is thrown in the new DRO events tab) or in active mode (which is doing the job and moving things).
      droo2
      droo1

    • Your most critical virtual machines can be excluded from DRO:
    • droo5

    How is DRO choosing which machines are moved

    I’m running DRO in production since now one month and I had the time to check what is going on behind the scene. How is DRO choosing which machines are moved when a Live Partition Moblity operation must be run to face an heavy load on a host ? To do so I decided to launch 3 different cpuhog (16 forks, 4VP, SMT4) processes (which are eating CPU ressource) on three different lpars with 4VP each. On the PowerVC I can check that before launching this processes the CPU consumption is ok on this host (the three lpars are running on the same host) :

    droo4

    # cat cpuhog.pl
    #!/usr/bin/perl
    
    print "eating the CPUs\n";
    
    foreach $i (1..16) {
          $pid = fork();
          last if $pid == 0;
          print "created PID $pid\n";
    }
    
    while (1) {
          $x++;
    }
    # perl cpuhog.pl
    eating the CPUs
    created PID 47514604
    created PID 22675712
    created PID 3015584
    created PID 21496152
    created PID 25166098
    created PID 26018068
    created PID 11796892
    created PID 33424106
    created PID 55444462
    created PID 65077976
    created PID 13369620
    created PID 10813734
    created PID 56623850
    created PID 19333542
    created PID 58393312
    created PID 3211988
    

    I’m waiting a couple of minutes and I realize that the virtual machines on which the cpuhog processes were launched are the ones which are migrated. So we can say that PowerVC is moving the machine that are eating CPU (another strategy could be to move all the non-eating CPU machines to let the working ones do their job without launching a mobility operation).

    # errpt | head -3
    IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
    A5E6DB96   0118225116 I S pmig           Client Partition Migration Completed
    08917DC6   0118225116 I S pmig           Client Partition Migration Started
    

    After the moves are ok I can see that the load is now ok on the host. DRO has done the job for me and moved the lpar to met the configured thresold ;-)

    droo7dro_effect

    The images below will show you a good example of the “power” of PowerVC and DRO. To update my Virtual I/O Servers to the latest version the PowerVC maintenance mode was used to free up the Virtual I/O Servers. After leaving the maintenance mode the DRO was doing the job to re-balance the Virtual Machines across all the hosts (The red arrows symbolize the maintenance mode action and the purple ones the DRO actions). You can also see that some lpars were moved across 4 different hosts during this process. All these pictures are taken from real life experience on my production systems. This not a lab environment, this is one part of my production. So yes DRO and PowerVC 1.3.0.1 are production ready. Hell yes!

    real1
    real2
    real3
    real4
    real5

    Conclusion

    As my environment is growing bigger the next step for me will be to move on NovaLink on my P8 hosts. Please note that the NovaLink Co-Management feature is today a “TechPreview” but should be released GA very soon. Talking about DRO I was waiting for that for years and it finally happens. I can assure you that it is production ready, to prove this I’ll just give you this number. To upgrade my Virtual I/O Servers to 2.2.4.10 release using PowerVC maintenance mode and DRO more than 1000 Live Partition Mobility moves were performed without any outage on production servers and during working hours. Nobody in my company was aware of this during the operations. It was a seamless experience for everybody.