A first look at SRIOV vNIC adapters

I have the chance to participate in the current Early Shipment Program (ESP) for Power Systems, especially the software part. One of my tasks is to test a new feature called SRIOV vNIC. For those who does not know anything about SRIOV this technology is comparable to LHEA except it is based on a industry standard (and have a couple of other features). By using SRIOV adapter you can divide a physical port into what we call a Virtual Function (or a Logical Port) and map this Virtual Function to a partition. You can also set “Quality Of Service” on these Virtual Functions. At the creation you will setup the Virtual Function allowing it to take a certain percentage of the physical port. These can be very useful if you want to be sure that your production server will always have a guaranteed bandwidth instead of using a Shared Ethernet Adapter where every clients partitions are competing for the bandwidth. Customers are also using SRIOV adapters for performance purpose ; as nothing is going through the Virtual I/O Server the latency added by this action is eliminated and CPU cycles are saved on the Virtual I/O Server side (Shared Ethernet Adapter consume a lot of CPU cycles). If you are not aware of what SRIOV is I encourage you to check the IBM Redbook about it (http://www.redbooks.ibm.com/abstracts/redp5065.html?Open. Unfortunately you can’t move a partition by using Live Partition Mobility if this one have a Virtual Function assigned to it. Using vNICs allows you to use SRIOV through the Virtual I/O Servers and enable the possibility to move your partition even if you are using an SRIOV logical port. The better of two worlds : performance/qos and virtualization. Is this the end of the Shared Ethernet Adapter ?

SRIOV vNIC, what’s this ?

Before talking about the technical details it is important to understand what vNICs are. When I’m explaining this to newbies I often refer to NPIV. Imagine something similar as the NPIV but for the network part. By using SRIOV vNIC:

  • A Virtual Function (SRIOV Logical Port) is created and assigned to the Virtual I/O Server.
  • A vNIC adapter is created in the client partition.
  • The Virtual Function and the vNIC adapter are linked (mapped) together.
  • This is a one to one relationship between a Virtual Function and a vNIC (like a vfcs adapter is a one to one relationship between your vfcs and the physical fiber channel adapter).

On the image below, the vNIC lpars are the “yellow” ones, you can see here that the SRIOV adapter is divided in different Virtual Function, and some of them are mapped to the Virtual I/O Server. The relationship between the Virtual Function and the vNIC is achieved by a vnicserver (this is a special Virtual I/O Server device).
vNIC

One of the major advantage of using vNIC is that you eliminate the need of the Virtual I/O Server for data flows:

  • The network data flow is direct between the partition memory and the SRIOV adapter, there is no data copy passing through the Virtual I/O Server and it eliminate the CPU cost and the latency of doing that. This is achieved by LRDMA. Pretty cool !
  • The vNIC will inherits the bandwidth allocation of the Virtual Function (QoS). If the VF is configured with a capacity of 2% the vNIC will also have this capacity.
  • vNIC2

vNIC Configuration

Before checking all the details on how to configure an SRIOV vNIC adapter you have to check all the prerequisites. As this is a new feature you will need the latest level of …. everything. My advice is to stay up to date as much as possible.

vNIC Prerequisites

These outputs are taken from the early shipment program. All of this can be changed at the GA release:

  • Hardware Management Console v840:
  • # lshmc -V
    lshmc -V
    "version= Version: 8
     Release: 8.4.0
     Service Pack: 0
    HMC Build level 20150803.3
    ","base_version=V8R8.4.0
    "
    
  • Power 8 only, firmware 840 at least (both enterprise and scale out systems):
  • firmware

  • AIX 7.1TL4 or AIX 7.2:
  • # oslevel -s
    7200-00-00-0000
    # cat /proc/version
    Oct 20 2015
    06:57:03
    1543A_720
    @(#) _kdb_buildinfo unix_64 Oct 20 2015 06:57:03 1543A_720
    
  • Obviously at least on SRIOV capable adapter!

Using the HMC GUI

The configuration of a vNIC is done at the partition level. The configuration is only available on the enhanced version of the GUI. Select the virtual machine on which you want to add the vNIC and in the Virtual I/O tab you’ll see that a new Virtual NICs session is here. Click on “Virtual NICs” and a new panel will be opened with a new button called “Add Virtual NIC”, just click this one to add a Virtual NIC:

vnic_n1
vnic_conf2

All the SRIOV capable port will be displayed on the next screen. Choose the SRIOV port you want (a virtual function will be created on this one. Don’t do anything more, the creation of a vNIC will automatically create a Virtual Function; assign it to Virtual I/O Server and do the mapping to the vNIC for you). Choose the Virtual I/O Server that will be used for this vNIC (the vNIC server will be created on this Virtual I/O Server. Don’t worry we will talk about vNIC redundancy later in this post) and the Virtual NIC Capacity (the percentage the Phyiscal SRIOV port that will be dedicated to this vNIC)(this has to be a multiple of 2)(be careful with that it can’t be changed afterwards and you’ll have to delete your vNIC to redo the configuration) :

vnic_conf3

The “Advanced Virtual NIC Settings” allows you to choose the Virtual NIC Adapter ID, choosing a MAC Address, and configuring the vlan restrictions and vlan tagging. In the example below I’m configuring my Virtual NIC in the vlan 310:

vnic_conf4
vnic_conf5
allvnic

Using the HMC Command Line

As always the configuration can be achieved using the HMC command line, using lshwres to list vNIC and chhwres to create a vNIC.

List SRIOV adapters to get the adapter_id needed by the chhwres command:

# lshwres -r sriov --rsubtype adapter -m blade-8286-41A-21AFFFF
adapter_id=1,slot_id=21020014,adapter_max_logical_ports=48,config_state=sriov,functional_state=1,logical_ports=48,phys_loc=U78C9.001.WZS06RN-P1-C12,phys_ports=4,sriov_status=running,alternate_config=0
# lshwres -r virtualio  -m blade-8286-41A-21AFFFF --rsubtype vnic --level lpar --filter "lpar_names=72vm1"
lpar_name=72vm1,lpar_id=9,slot_num=7,desired_mode=ded,curr_mode=ded,port_vlan_id=310,pvid_priority=0,allowed_vlan_ids=all,mac_addr=ee3b8cd87707,allowed_os_mac_addrs=all,desired_capacity=2.0,backing_devices=sriov/vios1/2/1/1/27004008/2.0

Create the vNIC:

# chhwres -r virtualio -m blade-8286-41A-21AFFFF -o a -p 72vm1 --rsubtype vnic -v -a "port_vlan_id=310,backing_devices=sriov/vios2/1/1/1/2"

List the vNIC after create:

# lshwres -r virtualio  -m blade-8286-41A-21AFFFF --rsubtype vnic --level lpar --filter "lpar_names=72vm1"
lpar_name=72vm1,lpar_id=9,slot_num=7,desired_mode=ded,curr_mode=ded,port_vlan_id=310,pvid_priority=0,allowed_vlan_ids=all,mac_addr=ee3b8cd87707,allowed_os_mac_addrs=all,desired_capacity=2.0,backing_devices=sriov/vios1/2/1/1/27004008/2.0
lpar_name=72vm1,lpar_id=9,slot_num=2,desired_mode=ded,curr_mode=ded,port_vlan_id=310,pvid_priority=0,allowed_vlan_ids=all,mac_addr=ee3b8cd87702,allowed_os_mac_addrs=all,desired_capacity=2.0,backing_devices=sriov/vios2/1/1/1/2700400a/2.0

System and Virtual I/O Server Side:

  • On the Virtual I/O Server you can use two commands to check your vNIC configuration. You can first use the lsmap command to check the one to one relationship between the VF and the vNIC (you see on the output below that a VF and a vnicserver device are created)(you can also see the name of the vNIC in the client partition side) :
  • # lsdev | grep VF
    ent4             Available   PCIe2 100/1000 Base-TX 4-port Converged Network Adapter VF (df1028e214103c04)
    # lsdev | grep vnicserver
    vnicserver0      Available   Virtual NIC Server Device (vnicserver)
    # lsmap -vadapter vnicserver0 -vnic
    Name          Physloc                            ClntID ClntName       ClntOS
    ------------- ---------------------------------- ------ -------------- -------
    vnicserver0   U8286.41A.21FFFFF-V2-C32897             6 72nim1         AIX
    
    Backing device:ent4
    Status:Available
    Physloc:U78C9.001.WZS06RN-P1-C12-T4-S16
    Client device name:ent1
    Client device physloc:U8286.41A.21FFFFF-V6-C3
    
  • You can get more details (QoS, vlan tagging, port states) by using the vnicstat command:
  • # vnicstat -b vnicserver0
    [..]
    --------------------------------------------------------------------------------
    VNIC Server Statistics: vnicserver0
    --------------------------------------------------------------------------------
    Device Statistics:
    ------------------
    State: active
    Backing Device Name: ent4
    
    Client Partition ID: 6
    Client Partition Name: 72nim1
    Client Operating System: AIX
    Client Device Name: ent1
    Client Device Location Code: U8286.41A.21FFFFF-V6-C3
    [..]
    Device ID: df1028e214103c04
    Version: 1
    Physical Port Link Status: Up
    Logical Port Link Status: Up
    Physical Port Speed: 1Gbps Full Duplex
    [..]
    Port VLAN (Priority:ID): 0:3331
    [..]
    VF Minimum Bandwidth: 2%
    VF Maximum Bandwidth: 100%
    
  • On the client side you can list your vNIC and as always have details using the entstat command:
  • # lsdev -c adapter -s vdevice -t IBM,vnic
    ent0 Available  Virtual NIC Client Adapter (vnic)
    ent1 Available  Virtual NIC Client Adapter (vnic)
    ent3 Available  Virtual NIC Client Adapter (vnic)
    ent4 Available  Virtual NIC Client Adapter (vnic)
    # entstat -d ent0 | more
    [..]
    ETHERNET STATISTICS (ent0) :
    Device Type: Virtual NIC Client Adapter (vnic)
    [..]
    Virtual NIC Client Adapter (vnic) Specific Statistics:
    ------------------------------------------------------
    Current Link State: Up
    Logical Port State: Up
    Physical Port State: Up
    
    Speed Running:  1 Gbps Full Duplex
    
    Jumbo Frames: Disabled
    [..]
    Port VLAN ID Status: Enabled
            Port VLAN ID: 3331
            Port VLAN Priority: 0
    

Redundancy

You will certainly agree that having a such new cool feature without having something that is fully redundant would be a shame. Hopefully we have here a solution with the return with a great fanfare of the Network Interface Backup (NIB). As I told you before each time a vNIC is created a vnicserver is created on one of the Virtual I/O Server. (At the vNIC creation you have to choose on which Virtual I/O server it will be created). So to be fully redundant and to have a failover feature the only way is to create two vNIC adapters (one using the first Virtual I/O Server and the second one using the second Virtual I/O Server, on top of this you then have to create a Network Interface Backup, like in the old times :-) ). Here are a couple of things and best practices to know before doing this.

  • You can’t use two VF coming from the same SRIOV adapter physical port (the NIB creation will be ok, but any configuration on top of this NIB will fail).
  • You can use two VF coming from the same SRIOV adapter but with two different logical ports (this is the example I will show below).
  • The best partice is to use two VF coming from two different SRIOV adapters (you can then afford to loose one of the two SRIOV adapter).

vNIC_nib

Verify on your partition that you have two vNIC adapters and check that the status are ok using the ‘entstat‘ command:

  • Both vNIC are available on the client partition:
  • # lsdev -c adapter -s vdevice -t IBM,vnic
    ent0 Available  Virtual NIC Client Adapter (vnic)
    ent1 Available  Virtual NIC Client Adapter (vnic)
    # lsdev -c adapter -s vdevice -t IBM,vnic -F physloc
    U8286.41A.21FFFFF-V6-C2
    U8286.41A.21FFFFF-V6-C3
    
  • You can check on the first Virtual I/O Server that “Current Link State”, “Logical Port State” and “Physical Port State” are ok (all of them needs to be up):
  • # entstat -d ent0 | grep -p vnic
    -------------------------------------------------------------
    ETHERNET STATISTICS (ent0) :
    Device Type: Virtual NIC Client Adapter (vnic)
    Hardware Address: ee:3b:86:f6:45:02
    Elapsed Time: 0 days 0 hours 0 minutes 0 seconds
    
    Virtual NIC Client Adapter (vnic) Specific Statistics:
    ------------------------------------------------------
    Current Link State: Up
    Logical Port State: Up
    Physical Port State: Up
    
  • Same on the second Virtual I/O Server:
  • # entstat -d ent1 | grep -p vnic
    -------------------------------------------------------------
    ETHERNET STATISTICS (ent1) :
    Device Type: Virtual NIC Client Adapter (vnic)
    Hardware Address: ee:3b:86:f6:45:03
    Elapsed Time: 0 days 0 hours 0 minutes 0 seconds
    
    Virtual NIC Client Adapter (vnic) Specific Statistics:
    ------------------------------------------------------
    Current Link State: Up
    Logical Port State: Up
    Physical Port State: Up
    

Verify on both Virtual I/O Server that the two vNIC are coming from two different SRIOV adapters (for the purpose of this test I’m using two different ports on the same SRIOV adapters but it remains the same with two different adapters). You can see on the output below that on Virtual I/O Server 1 the vNIC is backed to the adapter on position 3 (T3) and that on Virtual I/O Server 2 the vNIC is backed to the adapter on position 4 (T4):

  • Once again use the lsmap command on the first Virtual I/O Server to check that (note that you can check the client name, and the client device):
  • # lsmap -vadapter vnicserver0 -vnic
    Name          Physloc                            ClntID ClntName       ClntOS
    ------------- ---------------------------------- ------ -------------- -------
    vnicserver0   U8286.41A.21AFF8V-V1-C32897             6 72nim1         AIX
    
    Backing device:ent4
    Status:Available
    Physloc:U78C9.001.WZS06RN-P1-C12-T3-S13
    Client device name:ent0
    Client device physloc:U8286.41A.21AFF8V-V6-C2
    
  • Same thing on the second Virtual I/O Server:
  • # lsmap -vadapter vnicserver0 -vnic -fmt :
    vnicserver0:U8286.41A.21AFF8V-V2-C32897:6:72nim1:AIX:ent4:Available:U78C9.001.WZS06RN-P1-C12-T4-S14:ent1:U8286.41A.21AFF8V-V6-C3
    

Finally create the Network Interface Backup and put and IP on top of it:

# mkdev -c adapter -s pseudo -t ibm_ech -a adapter_names=ent0 -a backup_adapter=ent1
ent2 Available
# mktcpip -h 72nim1 -a 10.44.33.223 -i en2 -g 10.44.33.254 -m 255.255.255.0 -s
en2
72nim1
inet0 changed
en2 changed
inet0 changed
[..]
# echo "vnic" | kdb
+-------------------------------------------------+
|       pACS       | Device | Link |    State     |
|------------------+--------+------+--------------|
| F1000A0032880000 |  ent0  |  Up  |     Open     |
|------------------+--------+------+--------------|
| F1000A00329B0000 |  ent1  |  Up  |     Open     |
+-------------------------------------------------+

Let’s now try different things to see if the redundancy is working ok. First let’s shutdown one of the Virtual I/O Server and let’s ping our machine from another one:

# ping 10.14.33.223
PING 10.14.33.223 (10.14.33.223) 56(84) bytes of data.
64 bytes from 10.14.33.223: icmp_seq=1 ttl=255 time=0.496 ms
64 bytes from 10.14.33.223: icmp_seq=2 ttl=255 time=0.528 ms
64 bytes from 10.14.33.223: icmp_seq=3 ttl=255 time=0.513 ms
[..]
64 bytes from 10.14.33.223: icmp_seq=40 ttl=255 time=0.542 ms
64 bytes from 10.14.33.223: icmp_seq=41 ttl=255 time=0.514 ms
64 bytes from 10.14.33.223: icmp_seq=47 ttl=255 time=0.550 ms
64 bytes from 10.14.33.223: icmp_seq=48 ttl=255 time=0.596 ms
[..]
--- 10.14.33.223 ping statistics ---
50 packets transmitted, 45 received, 10% packet loss, time 49052ms
rtt min/avg/max/mdev = 0.457/0.525/0.596/0.043 ms
# errpt | more
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
59224136   1120200815 P H ent2           ETHERCHANNEL FAILOVER
F655DA07   1120200815 I S ent0           VNIC Link Down
3DEA4C5F   1120200815 T S ent0           VNIC Error CRQ
81453EE1   1120200815 T S vscsi1         Underlying transport error
DE3B8540   1120200815 P H hdisk0         PATH HAS FAILED
# echo "vnic" | kdb
(0)> vnic
+-------------------------------------------------+
|       pACS       | Device | Link |    State     |
|------------------+--------+------+--------------|
| F1000A0032880000 |  ent0  | Down |   Unknown    |
|------------------+--------+------+--------------|
| F1000A00329B0000 |  ent1  |  Up  |     Open     |
+-------------------------------------------------+

Same test with the addition of an address to ping, and I’m only loosing 4 packets:

# ping 10.14.33.223
[..]
64 bytes from 10.14.33.223: icmp_seq=41 ttl=255 time=0.627 ms
64 bytes from 10.14.33.223: icmp_seq=42 ttl=255 time=0.548 ms
64 bytes from 10.14.33.223: icmp_seq=46 ttl=255 time=0.629 ms
64 bytes from 10.14.33.223: icmp_seq=47 ttl=255 time=0.492 ms
[..]
# errpt | more
59224136   1120203215 P H ent2           ETHERCHANNEL FAILOVER
F655DA07   1120203215 I S ent0           VNIC Link Down
3DEA4C5F   1120203215 T S ent0           VNIC Error CRQ

vNIC Live Partition Mobility

By default you can use Live Partition Mobility with SRIOV vNIC, it is super simple and it is fully supported by IBM, as always I’ll show you how to do that using the HMC GUI and the command line:

Using the GUI

First validate the mobility operation, it will allow you to choose the destination SRIOV adapter/port on which to map your current vNIC. You have to choose:

  • The adapter (if you have more than one SRIOV adapter).
  • The Physical port on which the vNIC will be mapped.
  • The Virtual I/O Server on which the vnicserver will be created.

New options are now available in the mobility validation panel:

lpmiov1

Modify each vNIC to match your destination SRIOV adapter and ports (choose the destination Virtual I/O Server here):

lpmiov2
lpmiov3

Then migrate:

lpmiov4

IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
A5E6DB96   1120205915 I S pmig           Client Partition Migration Completed
4FB9389C   1120205915 I S ent1           VNIC Link Up
F655DA07   1120205915 I S ent1           VNIC Link Down
11FDF493   1120205915 I H ent2           ETHERCHANNEL RECOVERY
4FB9389C   1120205915 I S ent1           VNIC Link Up
4FB9389C   1120205915 I S ent0           VNIC Link Up
[..]
59224136   1120205915 P H ent2           ETHERCHANNEL FAILOVER
B50A3F81   1120205915 P H ent2           TOTAL ETHERCHANNEL FAILURE
F655DA07   1120205915 I S ent1           VNIC Link Down
3DEA4C5F   1120205915 T S ent1           VNIC Error CRQ
F655DA07   1120205915 I S ent0           VNIC Link Down
3DEA4C5F   1120205915 T S ent0           VNIC Error CRQ
08917DC6   1120205915 I S pmig           Client Partition Migration Started

The ping test during the lpm show only 9 ping lost, due to etherchannel failover (on of my port was down at the destination server):

# ping 10.14.33.223
64 bytes from 10.14.33.223: icmp_seq=23 ttl=255 time=0.504 ms
64 bytes from 10.14.33.223: icmp_seq=31 ttl=255 time=0.607 ms

Using the command line

I’m moving back the partition using the HMC command line interface, check the manpage for all the details. Here is the details for the vnic_mappings: slot_num/ded/[vios_lpar_name]/[vios_lpar_id]/[adapter_id]/[physical_port_id]/[capacity]:

  • Validate:
  • # migrlpar -o v -m blade-8286-41A-21AFFFF -t  runner-8286-41A-21AEEEE  -p 72nim1 -i 'vnic_mappings="2/ded/vios1/1/1/2/2,3/ded/vios2/2/1/3/2"'
    
    Warnings:
    HSCLA291 The selected partition may have an open virtual terminal session.  The management console will force termination of the partition's open virtual terminal session when the migration has completed.
    
  • Migrate:
  • # migrlpar -o m -m blade-8286-41A-21AFFFF -t  runner-8286-41A-21AEEEE  -p 72nim1 -i 'vnic_mappings="2/ded/vios1/1/1/2/2,3/ded/vios2/2/1/3/2"'
    

Port Labelling

One thing very annoying using LPM with vNIC is that you have to do the mapping of your vNIC each time you are moving. The default choices are never ok and the GUI will always show you the first port or the first adapter and you will have to do that job by yourself. Even worse with the command line the vnic_mappings can give you some headaches :-) . Hopefully there is a feature called port labelling. You can put a label on each SRIOV Physical port and all your machines. My advice is to tag the ports that are serving the same network and the same vlan with the same label on all your machines. During the mobility operation if labels are matching between two machine the adapter/port combination matching the label will be automatically chosen for the mobility and you will have nothing to do to map on your own. Super useful. The outputs below show you how to label your SRIOV ports:

label1
label2

# chhwres -m s00ka9942077-8286-41A-21C9F5V -r sriov --rsubtype physport -o s -a "adapter_id=1,phys_port_id=3,phys_port_label=adapter1port3"
# chhwres -m s00ka9942077-8286-41A-21C9F5V -r sriov --rsubtype physport -o s -a "adapter_id=1,phys_port_id=2,phys_port_label=adapter1port2"
# lshwres -m s00ka9942077-8286-41A-21C9F5V -r sriov --rsubtype physport --level eth -F adapter_id,phys_port_label
1,adapter1port2
1,adapter1port3

At the validation time source and destination ports will automatically be matched:

labelautochoose

What about performance

One of the main reason I’m looking for SRIOV vNIC adapter is performance. As all of our design is based on the fact that we need to move all of our virtual machines from a host to one another we need a solution allowing both mobility and performance. If you have tried to run a TSM server in a virtualized environment you’ll probably understand what I mean about performance and virtualization. In the case of TSM you need a lot of network bandwidth. My current customer and my previous one tried to do that using Shared Ethernet Adapters and of course this solution did not work because a classic Virtual Ethernet Adapter is not able to provide enough bandwidth for a single Virtual I/O client. I’m not an expert about network performance but the result you will see below are pretty obvious to understand and will show you the power of vNIC and SRIOV (I know some optimization can be done on the SEA side but it’s just a super simple test).

Methodology

I will try here to compare a classic Virtual Ethernet Adapter with a vNIC in the same configuration, both environments are the same, using same machines, same switches on so on:

  • Two machines are used to do the test. In case of vNIC both are using a single vNIC bacedk to a 10Gb adapter, in case of Virtual Ethernet Adapter both are backed to a SEA build on top of a 10Gb adapter.
  • The two machines are running on two different s814.
  • Entitlement and memory are the same for source and destination machines.
  • In the case of vNIC the capacity of the VF is set at 100% and the physical port of the SRIOV adapter is dedicated to the vNIC.
  • In the case of vent the SEA is dedicated to the test virtual machine.
  • In both cases a MTU of 1500 is utilized.
  • The tool used for the performance test is iperf (MTU 1500, Window Size 64K, and 10 TCP thread)

SEA test for reference only

  • iperf server:
  • seaserver1

  • iperf client:
  • seacli1

vNIC SRIOV test

We are here running the exact same test:

  • iperf server:
  • iperf_vnic_client2

  • iperf client:
  • iperf_vnic_client

By using a vNIC I get 300% of the bandwidth I get with an virtual ethernet adapter. Just awesome ;-) no tuning (out of the box configuration). Nothing more to add about it it’s pretty obvious that the usage of vNIC for performance will be a must.

Conclusion

Are SRIOV vNICs the end of the SEAs ? Maybe, but not yet ! For some cases like performance and QoS it will be very useful and adopted (I’m pretty sure I will use that for my current customer to virtualized the TSM servers). But today in my opinion SRIOV lacks a real redundancy feature at the adapter level. What I want is a heartbeat communication between the two SRIOV adapters. Having such a feature on a SRIOV adapter will finish to convince customers to move from SEA to SRIOV vNIC. I know nothing about the future but I hope something like that will be available in the next few years. To sum up SRIOV vNICs are powerful, easy to use and simplify the configuration and management of your Power Servers. Please wait for the GA and try this new killer functionality. As always I hope it helps.