A few weeks ago I had to work on simplified remote restart. Because of some political decisions in my company I'm not yet lucky enough to have access to any E880 or E870; we just have a few scale-out machines to play with (S814). For some critical applications we will need to be able to restart a virtual machine on another system if the system hosting it fails (hardware problem). A couple of months ago we decided not to use remote restart because it required a reserved storage pool device, and that mandatory storage made it too hard to manage. We now have enough P8 boxes to try and understand the new version of remote restart, called simplified remote restart, which does not need any reserved storage pool device. For those who want to understand what remote restart is, I strongly recommend my previous blog post about remote restart on two P7 boxes: Configuration of a remote restart partition. For the others, here is what I learned about the simplified version of this awesome feature.
Please keep in mind that the FSP of the machine must be up to perform a simplified remote restart operation. It means that if, for instance, you lose one of your datacenters or the link between your two datacenters, you cannot use simplified remote restart to restart your partitions on the main/backup site. Simplified remote restart only protects you against a hardware failure of your machine. Maybe this will change in the near future, but for the moment it is the most important thing to understand about simplified remote restart.
Updating to the latest version of firmware
I was very surprised when I got my Power8 machines. After deploying these boxes I decided to give simplified remote restart a try, but it was just not possible. When the Power8 Scale Out servers were released they were NOT simplified remote restart capable. The release of the SV830 firmware now enables simplified remote restart on Power8 Scale Out machines. Please note that there is nothing about it in the patch note, so chmod666.org is the only place where you can get this information :-). Here is the patch note: here. One last word: you will find on the internet that you need Power8 to use simplified remote restart. That is only partially true. YOU NEED A P8 MACHINE WITH AT LEAST AN 820 FIRMWARE.
The first thing to do is to update your firmware to the SV830 version (on both systems participating in the simplified remote restart operation):
# updlic -o u -t sys -l latest -m p814-1 -r mountpoint -d /home/hscroot/SV830_048 -v
[..]
# lslic -m p814-1 -F activated_spname,installed_level,ecnumber
FW830.00,48,01SV830
# lslic -m p814-2 -F activated_spname,installed_level,ecnumber
FW830.00,48,01SV830
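If you want to script the firmware check on both systems, something like the following could work. It is only a minimal sketch: the function name fw_at_least is mine (not an HMC command), it assumes the activated_spname field always looks like FW830.00, and the default of 830 is the Scale Out requirement described above.

```shell
#!/bin/bash
# Hypothetical helper (not part of the HMC): succeeds when the release found in
# an lslic line such as "FW830.00,48,01SV830" is at least the given minimum.
# The default minimum of 830 is the Scale Out requirement from this post.
fw_at_least() {
  local line=$1 min=${2:-830}
  local rel=${line#FW}   # strip the leading "FW"
  rel=${rel%%.*}         # keep only the digits before the first dot
  [[ $rel =~ ^[0-9]+$ ]] && [ "$rel" -ge "$min" ]
}

# Example with the line shown above; on a real HMC you would pass the output of
# lslic -m <system> -F activated_spname,installed_level,ecnumber instead.
if fw_at_least 'FW830.00,48,01SV830'; then
  echo "firmware level is high enough"
else
  echo "firmware update needed"
fi
```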
You can check the firmware version directly from the Hardware Management Console or in the ASMI:
After the firmware upgrade, verify that the Simplified Remote Restart capability is now set to true.
# lssyscfg -r sys -F name,powervm_lpar_simplified_remote_restart_capable
p720-1,0
p814-1,1
p720-2,0
p814-2,1
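To keep only the capable systems from that output in a script, a small filter is enough. This is a minimal sketch, assuming the comma-separated name,flag format shown above; the function name srr_capable_systems is my own invention.

```shell
#!/bin/bash
# Hypothetical helper: reads "name,flag" lines on stdin (the format printed by
# lssyscfg -r sys -F name,powervm_lpar_simplified_remote_restart_capable)
# and prints the names of the systems whose flag is 1.
srr_capable_systems() {
  awk -F, '$2 == 1 { print $1 }'
}

# Example with the output shown above; on a real HMC you would pipe the
# lssyscfg command into the function instead of printf.
printf '%s\n' 'p720-1,0' 'p814-1,1' 'p720-2,0' 'p814-2,1' | srr_capable_systems
```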
These prerequisites are true ONLY for Scale out systems:
- To update to the firmware SV830_048 you need the latest Hardware Management Console release which is v8r8.3.0 plus MH01514 PTF.
- Obviously, on Scale out systems SV830_048 is the minimum firmware requirement.
- Minimum level of Virtual I/O Servers is 18.104.22.168 (for both source and destination systems).
- PowerVM enterprise. (to be confirmed)
Enabling simplified remote restart of an existing partition
You probably want to enable simplified remote restart after an LPM migration/evacuation. After migrating your virtual machine(s) to a Power8 system with the Simplified Remote Restart capability, you have to enable this capability on every virtual machine. This can only be done while the partition is shut down, so after the live partition mobility move you first have to stop each virtual machine before enabling SRR. There is no way to do it without stopping and restarting the virtual machine:
- List the partitions on the system and check which ones are “simplified remote restart capable” (here only one is):
# lssyscfg -r lpar -m p814-1 -F name,simplified_remote_restart_capable
vios1,0
vios2,0
lpar1,1
lpar2,0
lpar3,0
lpar4,0
lpar5,0
lpar6,0
lpar7,0
# for i in lpar2 lpar3 lpar4 lpar5 lpar6 lpar7 ; do chsyscfg -r lpar -m p824-2 -i "name=$i,simplified_remote_restart_capable=1" ; done
An error occurred while changing the partition named lpar6.
HSCLA9F8 The remote restart capability of the partition can only be changed when the partition is shutdown.
An error occurred while changing the partition named lpar7.
HSCLA9F8 The remote restart capability of the partition can only be changed when the partition is shutdown.
# lssyscfg -r lpar -m p824-1 -F name,simplified_remote_restart_capable,lpar_env | grep -v vioserver
lpar1,1,aixlinux
lpar2,1,aixlinux
lpar3,1,aixlinux
lpar4,1,aixlinux
lpar5,1,aixlinux
lpar6,0,aixlinux
lpar7,0,aixlinux
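To avoid the HSCLA9F8 errors you can select only the partitions that are already shut down before calling chsyscfg. The sketch below only prints the commands, it does not run them. The function name srr_enable_commands is hypothetical, and I am assuming that lssyscfg reports a stopped partition with the state "Not Activated"; check the exact string on your HMC.

```shell
#!/bin/bash
# Hypothetical helper: stdin is "name,lpar_env,state,flag" lines, for example from
#   lssyscfg -r lpar -m <system> -F name,lpar_env,state,simplified_remote_restart_capable
# It prints one chsyscfg command per partition that is not a Virtual I/O Server,
# is shut down ("Not Activated" is an assumed state string) and is not enabled yet.
srr_enable_commands() {
  local sys=$1
  awk -F, -v sys="$sys" '
    $2 != "vioserver" && $3 == "Not Activated" && $4 == 0 {
      printf "chsyscfg -r lpar -m %s -i \"name=%s,simplified_remote_restart_capable=1\"\n", sys, $1
    }'
}

# Example with made-up states: only lpar2 is eligible.
printf '%s\n' \
  'vios1,vioserver,Running,0' \
  'lpar1,aixlinux,Not Activated,1' \
  'lpar2,aixlinux,Not Activated,0' \
  'lpar6,aixlinux,Running,0' | srr_enable_commands p814-1
```

Printing the commands first lets you review them before pasting them on the HMC.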
Once simplified remote restart is enabled, a live partition mobility operation back to a P7 or P8 box without the simplified remote restart capability is no longer possible. Enabling it forces the virtual machine to stay on P8 boxes that have the capability. This is one of the reasons why most customers are not doing it:
# migrlpar -o v -m p814-1 -t p720-1 -p lpar2
Errors:
HSCLB909 This operation is not allowed because managed system p720-1 does not support PowerVM Simplified Partition Remote Restart.
On the Hardware Management Console you can see that the virtual machine is simplified remote restart capable by checking its properties:
You can now try to remote restart your virtual machines to another server. As always, the state of the source server has to be different from Operating (Power Off, Error, Error – Dump in progress, Initializing), and my advice is to validate before restarting:
# rrstartlpar -o validate -m p824-1 -t p824-2 -p lpar1
# echo $?
0
# rrstartlpar -o restart -m p824-1 -t p824-2 -p lpar1
HSCLA9CE The managed system is not in a valid state to support partition remote restart operations.
# lssyscfg -r sys -F name,state
p824-2,Operating
p824-1,Power Off
# rrstartlpar -o restart -m p824-1 -t p824-2 -p lpar1
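The state check can be scripted before calling rrstartlpar. This is a rough sketch: srr_source_state_ok is a name I made up, and the state strings are simply the ones listed above, so compare them with what lssyscfg really prints on your HMC. Anything starting with "Error" is accepted so that the exact spelling of the dump state does not matter.

```shell
#!/bin/bash
# Hypothetical helper: succeeds when the given managed system state is one of
# the states in which a remote restart is allowed according to this post.
# The strings are assumptions taken from the list above.
srr_source_state_ok() {
  case $1 in
    "Power Off"|Error*|"Initializing") return 0 ;;
    *) return 1 ;;
  esac
}

# Example: decide what to do for a few sample states.
for state in "Operating" "Power Off In Progress" "Power Off"; do
  if srr_source_state_ok "$state"; then
    echo "$state: remote restart can be attempted"
  else
    echo "$state: wait"
  fi
done
```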
After a remote restart operation the partition boots automatically. In most cases the errpt shows that the partition ID has changed (proving that you are on another machine):
# errpt | more
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
A6DF45AA   0618170615 I O RMCdaemon      The daemon is started.
1BA7DF4E   0618170615 P S SRC            SOFTWARE PROGRAM ERROR
CB4A951F   0618170615 I S SRC            SOFTWARE PROGRAM ERROR
CB4A951F   0618170615 I S SRC            SOFTWARE PROGRAM ERROR
D872C399   0618170615 I O sys0           Partition ID changed and devices recreat
Be very careful with the ghostdev sys0 attribute. Every remote restarted VM needs to have ghostdev set to 0 to avoid an ODM wipe (if you remote restart an lpar with ghostdev set to 1 you will lose all ODM customization).
# lsattr -El sys0 -a ghostdev
ghostdev 0 Recreate ODM devices on system change / modify PVID True
When the source machine is up and running again, you have to clean up the old definition of the remote restarted lpar by launching a cleanup operation. This wipes the old lpar definition:
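If you have many partitions to check, the ghostdev test is easy to automate. Here is a small sketch that parses the lsattr output shown above; the function names ghostdev_value and ghostdev_safe are hypothetical. On the AIX partition itself you would feed it with lsattr -El sys0 -a ghostdev, and fix a wrong value with chdev -l sys0 -a ghostdev=0.

```shell
#!/bin/bash
# Hypothetical helpers: read the output of "lsattr -El sys0 -a ghostdev" on stdin.
# ghostdev_value prints the current value, ghostdev_safe succeeds when it is 0.
ghostdev_value() {
  awk '$1 == "ghostdev" { print $2 }'
}

ghostdev_safe() {
  [ "$(ghostdev_value)" = "0" ]
}

# Example with the line shown above.
sample='ghostdev 0 Recreate ODM devices on system change / modify PVID True'
if printf '%s\n' "$sample" | ghostdev_safe; then
  echo "ghostdev is 0, the ODM will be kept after a remote restart"
else
  echo "ghostdev is not 0, run: chdev -l sys0 -a ghostdev=0"
fi
```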
# rrstartlpar -o cleanup -m p814-1 -p lpar1
The RRmonitor (modified version)
IBM delivers a script called rrMonitor. It watches the state of a Power system and, when the system enters a particular state, restarts a specific virtual machine. As delivered the script is not really usable, because it has to be executed directly on the HMC (you need a pesh password to put the script on the HMC) and it only checks one particular virtual machine. I modified it to ssh to the HMC and to check every lpar on the machine instead of just one. You can download my modified version here: rrMonitor. Here is what the script does:
- Checking the state of the source machine.
- If it is not “Operating”, the script searches for every remote restartable lpar on the machine.
- The script launches remote restart operations to restart all of these partitions.
- The script tells the user which command to run to clean up the old lpar once the source machine is running again.
# ./rrMonitor p814-1 p814-2 all 60 myhmc
Getting remote restartable lpars
lpar1 is rr simplified capable
lpar1 rr status is Remote Restartable
lpar2 is rr simplified capable
lpar2 rr status is Remote Restartable
lpar3 is rr simplified capable
lpar3 rr status is Remote Restartable
lpar4 is rr simplified capable
lpar4 rr status is Remote Restartable
Checking for source server state....
Source server state is Operating
Checking for source server state....
Source server state is Operating
Checking for source server state....
Source server state is Power Off In Progress
Checking for source server state....
Source server state is Power Off
It's time to remote restart
Remote restarting lpar1
Remote restarting lpar2
Remote restarting lpar3
Remote restarting lpar4
Thu Jun 18 20:20:40 CEST 2015
Source server p814-1 state is Power Off
Source server has crashed and hence attempting a remote restart of the partition lpar1 in the destination server p814-2
Thu Jun 18 20:23:12 CEST 2015
The remote restart operation was successful
The cleanup operation has to be executed on the source server once the server is back to operating state
The following command can be used to execute the cleanup operation,
rrstartlpar -m p814-1 -p lpar1 -o cleanup
Thu Jun 18 20:23:12 CEST 2015
Source server p814-1 state is Power Off
Source server has crashed and hence attempting a remote restart of the partition lpar2 in the destination server p814-2
Thu Jun 18 20:25:42 CEST 2015
The remote restart operation was successful
The cleanup operation has to be executed on the source server once the server is back to operating state
The following command can be used to execute the cleanup operation,
rrstartlpar -m sp814-1 -p lpar2 -o cleanup
Thu Jun 18 20:25:42 CEST 2015
[..]
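For readers who just want the idea, the steps listed above can be sketched in a few lines of shell. This is NOT the rrMonitor script itself, only a simplified illustration of the same logic. The hmc wrapper, the HMC_HOST variable, the hscroot user and the function names are all assumptions of mine, and the accepted states are the ones quoted earlier in this post.

```shell
#!/bin/bash
# Hypothetical wrapper: runs one HMC command over ssh. HMC_HOST and the hscroot
# user are assumptions; redefine this function to test without an HMC.
hmc() {
  ssh "hscroot@${HMC_HOST:?set HMC_HOST first}" "$@" </dev/null
}

# Prints the simplified remote restart capable partitions of a managed system.
rr_capable_lpars() {
  hmc lssyscfg -r lpar -m "$1" -F name,simplified_remote_restart_capable |
    awk -F, '$2 == 1 { print $1 }'
}

# Polls the source system every <interval> seconds; once it is no longer usable,
# restarts every capable partition on the destination system.
rr_monitor() {
  local src=$1 dst=$2 interval=$3 lpars state lpar
  lpars=$(rr_capable_lpars "$src")       # collected while the source is still up
  while :; do
    state=$(hmc lssyscfg -r sys -m "$src" -F state)
    echo "Source server state is $state"
    case $state in
      "Power Off"|Error*|"Initializing") break ;;   # assumed state strings
    esac
    sleep "$interval"
  done
  for lpar in $lpars; do                 # assumes partition names without spaces
    echo "Remote restarting $lpar"
    if hmc rrstartlpar -o restart -m "$src" -t "$dst" -p "$lpar"; then
      echo "Cleanup later with: rrstartlpar -o cleanup -m $src -p $lpar"
    else
      echo "Remote restart of $lpar failed"
    fi
  done
}

# Usage (not executed here): HMC_HOST=myhmc rr_monitor p814-1 p814-2 60
```

The partition list is fetched before the polling loop, like in the run shown above, so the script does not depend on being able to query the partitions of a dead system.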
As you can see, the simplified version of the remote restart feature is simpler than the normal one. My advice is to create all your lpars with the simplified remote restart attribute. It’s that easy :). If you plan to LPM back to a P6 or P7 box, don’t use simplified remote restart. I think this functionality will become more popular when all the old P7 and P6 boxes have been replaced by P8. As always, I hope it helps.
Here are a couple of links to great documentation about Simplified Remote Restart: