What’s new in VIOS 2.2.4.10 and PowerVM : Part 2 Shared Processor Pool weighting

First of all before beginning this blog post I owe you an explanation about these two months without new posts. These two months were very busy. On the personal side I was forced to move from my current apartment and had to find another one which was suitable for me (and I can assure you that this is not something really easy in Paris). As I was visiting apartments almost 3 days a week the time kept for writing blog posts (please remember that I’m doing that in my “after hours” work) was taken for something else :-(. At work things were crazy too, we had to build twelve new E870 boxes (with the provisioning toolkit and SRIOV adapters) and make them work with our current implementation of PowerVC. Then I had to do a huge vscsi to NPIV migration (more than 500 AIX machines to migrate from vscsi to NPIV and then move to P8 boxes in less than three weeks … yes more than 500 machines in less than 3 weeks (4000 zones created …). Thanks to the help of STG Lab Services consultant (Bonnie LeBarron) this was achieved using a modified version of her script (to fit our need (zoning and mapping part) (latest hmc releases)). I’m back in business now and I have planned a couple of blog posts this month. This first of this series is about the Shared Processor Pool weighting on the latest Power8 firmware versions. You’ll see that it changes a lot of things compared to P7 boxes.

A short history of Shared Processor Pool weighting

This long story began a few years ago for me (I’ll say at least 4 years ago) (I was planing to do a blog post about it a few years ago but decided not to do it because I was thinking this topic was considered as “sensible”, now that we have documentation and an official statement on this there is no reason to hide this anymore). I was working for a bank using two P795 with a lot of cores activated. We were using Multiple Shared Processor Pool in an unconventional way (as far as I remember two pools per customers one for Oracle and one for WAS, and we had more than 5 or 6 customers, so each box had at least 10 MSPP). As you may already know I only believe what I can see. So I decided to make tests on my own. By reading the Redbook I realized that there was not enough information about pool and partition weighting. We were like a lot of today’s customers having different weights for development (32), qualification (64), pre-production (128), production (192) and finally Virtual I/O Server (255). As we were using Shared Processor Pool I was expecting that when the Shared Processor Pool is full (contention) the weight will work and will prioritize the partition with the higher weight. What was my surprise when I realized the weighting was not working inside a Shared Processor Pool but only in the DefaultPool (Pool 0). Remember forever this statement on Power7 partition weighting is only working when the default pool is full. There is no “intelligence” in a Shared Processor Pool and you have to be very careful with the size of the pool because of that. On Power7 pools are used ONLY for licensing purpose. I then decided to contact my preferred IBM pre-sales in France to tell him about this incredible discovery. I had no answer for one month, then (as always) he came back with the answer of someone who already knows the truth about this. He introduced me to a performance expert (she was a performance expert at this time and is now specialized in security) and she was telling me that I was absolutely right with my discovery but that only a few people were aware of this. I decided to say nothing about it … but was sure that IBM realized there was a something to clarify about this. Then last year at the IBM Technical Collaboration Council I saw a PowerPoint slide telling that latest IBM Power8 firmware will add this long awaited feature. Partition weighting will work inside a Shared Processor Pool. Finally after waiting for more than four years I have what I want. As I was working on a new project in my current job I had to create a lot of Shared Processor Pool in a mixed Power7 (P770) and Power8 (E870) environment. It was the time to check if this new feature was really working and compare the differences between a Power8 (with latest firmware) and a Power7 machine (with latest firmware). The way we are implementing and monitoring the Shared Processor Pool on a Power8 will now be very different than it was on Power7 box. I think that this is really important and that everybody now needs to understand the differences for their future implementation. But let’s first have a look in the Redbooks to check the official statements:

The Redbook talking about this is “IBM PowerVM Virtualization Introduction and Configuration”, here is the key paragraph to understand (page 113 and 114):

redbook_statement

It was super hard to find but there is place were IBM is talking about this. I’m below quoting this link: https://www.ibm.com/support/knowledgecenter/9119-MME/p8hat/p8hat_sharedproc.htm

When the firmware is at level 8.3.0, or earlier, uncapped weight is used only when more virtual processors consume unused resources than the available physical processors in the shared processor pool. If no contention exists for processor resources, the virtual processors are immediately distributed across the physical processors, independent of their uncapped weights. This can result in situations where the uncapped weights of the logical partitions do not exactly reflect the amount of unused capacity.

For example, logical partition 2 has one virtual processor and an uncapped weight of 100. Logical partition 3 also has one virtual processor, but an uncapped weight of 200. If logical partitions 2 and 3 both require more processing capacity, and there is not enough physical processor capacity to run both logical partitions, logical partition 3 receives two more processing units for every additional processing unit that logical partition 2 receives. If logical partitions 2 and 3 both require more processing capacity, and there is enough physical processor capacity to run both logical partitions, logical partition 2 and 3 receive an equal amount of unused capacity. In this situation, their uncapped weights are ignored.

When the firmware is at level 8.4.0, or later, if multiple partitions are assigned to a shared processor pool, the uncapped weight is used as an indicator of how the processor resources must be distributed among the partitions in the shared processor pool with respect to the maximum amount of capacity that can be used by the shared processor pool. For example, logical partition 2 has one virtual processor and an uncapped weight of 100. Logical partition 3 also has one virtual processor, but an uncapped weight of 200. If logical partitions 2 and 3 both require more processing capacity, logical partition 3 receives two additional processing units for every additional processing unit that logical partition 2 receives.

The server distributes unused capacity among all of the uncapped shared processor partitions that are configured on the server, regardless of the shared processor pools to which they are assigned. For example, if you configure logical partition 1 to the default shared processor pool and you configure logical partitions 2 and 3 to a different shared processor pool, all three logical partitions compete for the same unused physical processor capacity in the server, even though they belong to different shared processor pools.

Testing methodology

We now need to demonstrate that the behavior of the weighting is different between a Power7 and a Power8 machine, here is how we are going to proceed :

  • On a Power8 machine (E870 SC840_056) we create a Shared Processor Pool with a “Maximum Processing unit” set to 1.
  • On a Power7 machine we create a Shared Processor Pool with a “Maximum Processing unit” set to 1.
  • We create two partitions in the P8 pool (1VP, 0.1EC) called mspp1 and mspp2.
  • We create two partitions in the P7 pool (1VP, 0.1EC) called mspp3 and mspp4.
  • Using ncpu providev with the nstress tools (http://public.dhe.ibm.com/systems/power/community/wikifiles/PerfTools/nstress_AIX6_April_2014.tar) we create an heavy load on each partition. Obviously this load can’t be higher than 1 processing unit in total (sum of each physc).
  • We then use these testing scenarios (each test has a duration of 15 minutes, we are recording cpu and pool stats with nmon and lpar2rrd)
    1. First partition with a weight of 128, the second partition with a weight of 128 (test with the same weight).
    2. First partition with a weight of 64, the second partition with a weight of 128 (test weight multiply by two 1/2).
    3. First partition with a weight of 32, the second partition with a weight of 128 (test weight multiply by four 1/4).
    4. First partition with a weight of 1, the second partition with a weight of 2 (we try here to prove that the ratio between two values is more important that the value itself. Values of 1 and 2 should give us the same result as 64 and 128)
    5. First partition with a weight of 1, the second partition with a weight of 255 (a ratio of 1:255) (you’ll see here that the result is pretty interesting :-) ).
  • You’ll see that It will not be necessary to do all these tests on the P7 box …. :-)

The Power8 case

Prerequistes

Firmware P8 SC840* or SV840* are mandatory to enable the weighting in a Shared Processor Pool on a machine without contention for processor resources (no contention in the DefaultPool). This means that all P6, P7 and P8 (with a firmware < 840) machines do not have this feature coded in the firmware. My advice is to update all your P8 machines to the latest level to enable this new behavior.

Tests

For each test, we prove the weight of each partition using the lparstat command, then we capture a nmon file every 30 seconds and we launch ncpu for a duration of 15 minutes with four CPUs (we are in SMT4) on both P8 and P7 box. We will show you here that weight are taken into account in a Power8 MSPP, but are not taken into account in a Power7 MSPP.

#lparstat -i | grep -iE "Variable Capacity Weight|^Partition"
Partition Name                             : mspp1-23bad3d7-00000898
Partition Number                           : 3
Partition Group-ID                         : 32771
Variable Capacity Weight                   : 255
Desired Variable Capacity Weight           : 255
# /usr/bin/nmon -F /admin/nmon/$(hostname)_weight255.nmon -s30 -c30 -t ; ./ncpu -p 4 -s 900
# lparstat 1 10
  • Both weights at 128, you can check in the picture below that the “physc” value are strictly equal (0.5 for both lpars) (the ratio of 1 between the two weight is respected) :
  • weight128

  • One partition to 64 and one partition to 128, you can check in the pictures below (lparstat output, and nmon analyser graph) that we now have different values for the physc value (0.36 for the mssp2 lpar and 0.64 for the mssp1 lpar). We now have a ratio of 2, mspp1 physc is two time the mspp2 physc (the weights are respected in the Shared Processor Pool):
  • weight64_128

nmonx2

This lpar2rrd graph show you the weighting behavior on a Power8 machine (test one: both weights equal to 128, and test two: with two different weights of 128 and 64).

graph_p8_128128_12864

  • One partition to 32 and one partition to 128: you can check in the picture below that the ratio of 3 (32:128) is respected (physc value to 0.26 and 0.74).
  • weight32_128

  • One partition to 1 and one partition to 2. The results here are exactly the same as the second test (128 and 64 weights), it proves you that the important thing to configure are the ratio between the weights and not the value itself (using 1 2 3 weights will give you the exact same results as 2 4 6):
  • weight1_2

  • Finally one partition to 1 and one partition to 255. Be careful here the ratio is big enough to have an unresponsive lpar when loading both partitions. I do not recommend putting such high ratios because of this:
  • weight1_255

graph_p8_12832_12_1255

The Power7 case

Let’s do one test on a Power7 machine with on lpar with a weight of 1 and the other one with a weight of 255 … you’ll see a huge difference here … and I think it is clear enough to avoid doing all the test scenarios on the Power7 machine.

Tests

You can see here that I’m doing the exact same test, weight to 1 and 255, now both partition have an equal physc value (0.5 for both partitions). On a Power7 box the weights will be taken into account only if the DefaultPool (pool0) is full (contention). The pictures below show you the reality of the Multiple Shared Processors pool running on a Power7 box. On Power7 MSPP must be used only for licensing purpose and nothing else.

weight1_255_power7
graph_p7_1255

Conclusion

I hope you better understand the Multiple Shared Processor Pools differences between Power8 and Power7. Now that you are aware of this my advice is to have different strategies when you are implementing MSPP on Power7 and Power8. On Power7 double check and monitor your MSPP to be sure the pools are never full and that you can get enough capacity to run you load. On a Power8 box setup you weights wisely on your different environments (backup, production, development). You can then be sure that the production will be prioritized whatever appends even if you reduce your MSPP sizes, by doing this you’ll maximize licensing costs. As always I hope it help.

5 thoughts on “What’s new in VIOS 2.2.4.10 and PowerVM : Part 2 Shared Processor Pool weighting

  1. Thanks thanks thanks.

    You’ve just answering to me a question i asked myself since 2011.

    I went to IBM training in 2011 about Power virtu perf (dont remember the number exactly). And i suspected a strange behaviour in MSPP.

    The test was the following :

    –> MSPP of 3 entitled
    –> An LPAR with weight of 100 executes a test program that generates cpu cycles
    –> Another LPAR with higher weight executes the same program a few minutes after the first LPAR

    I was very suprised to see that the LPAR that had higher priority did not beneficit of more CPU cycles.
    I asked the former (which was a very skilled man in perf : JY Brucker) and didn’t know why this strange behaviour occured.

    After the traning i didn’t meet such a situation and i let my question in deep of my mind for a long time…

    Now i understand that my question was legitime…

    Thanks again for your great work :-)

    • Not a bug. Work as designed.
      I think it’s just a lack of information from IBM on how it work on P7.
      What is really strange is that you totally have to change the way you use/monitor SPP between P7 and P8 … and there is NO change in the 840 firmware release note about that. TT.

  2. Wow,
    Very surprised to know about that. Just for the fun, I just redid same tests as you on my lab (2 P7+ p270 Flex) and have had the same results (with weight 64-128 and 1-255).

    Thanks again, and happy to know that IBM improved this behaviour.

    Erwin

  3. Interesting numbers. I know it has been a while since you posted this, but I would be curious about one last test. What about running the scenarios with a weight of 255 and something like 250, or even 196. The reason I think of this is, somewhere in the back of my mind, I seem to remember something like: Processes running with a priority of 255 are considered highest priority “real time” processes. As “real time”, they get their processor requirements regardless of impact and priority of any process with a lower setting. So I would be curious if it the the priority of 1 that made the difference, or simply the fact that server had the 255 ranking.

    Just curious (and I don’t have an environment I can test this on :-( )

    Ken C

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>