Configuration of a Remote Restart Capable partition

How can we move a partition to another machine if the machine or the data-center on which the partition is hosted is totally unavailable ? This question is often asked by managers and technical people. Live Partition Mobility can’t answer to this question because the source machine needs to be running to initiate the mobility. I’m sure that most of you are implementing a manual solution based on a bunch of scripts recreating the partition profile by hand but this is hard to maintain and it’s not fully automatized and not supported by IBM. A solution to this problem is to setup your partitions as Remote Restart Capable partitions. This PowerVM feature is available since the release of VMcontrol (IBM Systems Director plugin). Unfortunately this powerful feature is not well documented but will probably in the next months or in the next year be a must have on your newly deployed AIX machines. One last word : with the new Power8 machines things are going to change about remote restart, the functionality will be easier to use and a lot of prerequisites are going to disappear. Just to be clear this post has been written using Power7+ 9117-MMD machines, the only thing you can’t do with these machines (compared to Power8 ones) is changing a current partition to be remote restart capable aware without having to delete and recreate its profile.

Pre-requesite

To create and use a remote restart partition on Power7+/Power8 machines you’ll need this prerequisites :

  • A PowerVM enterprise license (Capability “PowerVM remote restart capable” to true, be careful there is another capability named “Remote restart capable” this was used by VMcontrol only, so double check the capability ok for you).
  • A firmware 780 (or later all Power8 firmware are ok, all Power7 >= 780 are ok).
  • Your source and destination machine are connected to the same Hardware Management Console, you can’t remote restart between two HMC at the moment.
  • Minimum version of HMC is 8r8.0.0. Check you have the rrstartlpar command (not the rrlpar command used by VMcontrol only).
  • Better than a long post check this video (don’t laugh at me, I’m trying to do my best but this is one of my first video …. hope it is good) :

What is a remote restart capable virtual machine ?

Better than a long text to explain you what is, check the picture below and follow each number from 1 to 4 to understand what is a remote restart partition :

remote_restart_explanation

Create the profile of you remote restart capable partition : Power7 vs Power8

A good reason to move to Power8 faster than you planed is that you can change a virtual machine to be remote restart capable without having to recreate the whole profile. I don’t know why at the time of writing this post but changing a non remote restart capable lpar to a remote restart capable lpar is only available on Power8 systems. If you are using a Power7 machine (like me in all the examples below) be carful to check this option while creating the machine. Keep in mind that if you forgot to check to option you will not be able to enable the remote restart capability afterwards and you unfortunately have to remove you profile and recreate it, sad but true … :

  • Don’t forget to check the check box to allow the partition to be remote restart capable :
  • remote_restart_capable_enabled1

  • After the partition is created you can notice in the I/O tab that all remote restart capable partition are not able to own any physical I/O adapter :
  • rr2_nophys

  • You can check in the properties that the remote restart capable feature is activated :
  • remote_restart_capable_activated

  • If you try to modify an existing profile on a Power7 machine you’ll get this error message. On a Power8 machine there will be not problem :
  • # chsyscfg -r lpar -m XXXX-9117-MMD-658B2AD -p test_lpar-i remote_restart_capable=1
    An error occurred while changing the partition named test_lpar.
    The managed system does not support changing the remote restart capability of a partition. You must delete the partition and recreate it with the desired remote restart capability.
    
  • You can verify that some of your lpar are remote restart capable :
  • lssyscfg -r lpar -m source-machine -F name,remote_restart_capable
    [..]
    lpar1,1
    lpar2,1
    lpar3,1
    remote-restart,1
    [..]
    
  • On a Power 7 machine the best way to enable remote restart on an already created machine is to delete the profile and recreate it by hand and adding it the remote restart attribute :
  • Get the current partition profile :
  • $ lssyscfg -r prof -m s00ka9927558-9117-MMD-658B2AD --filter "lpar_names=temp3-b642c120-00000133"
    name=default_profile,lpar_name=temp3-b642c120-00000133,lpar_id=11,lpar_env=aixlinux,all_resources=0,min_mem=8192,desired_mem=8192,max_mem=8192,min_num_huge_pages=0,desired_num_huge_pages=0,max_num_huge_pages=0,mem_mode=ded,mem_expansion=0.0,hpt_ratio=1:128,proc_mode=shared,min_proc_units=2.0,desired_proc_units=2.0,max_proc_units=2.0,min_procs=4,desired_procs=4,max_procs=4,sharing_mode=uncap,uncap_weight=128,shared_proc_pool_id=0,shared_proc_pool_name=DefaultPool,affinity_group_id=none,io_slots=none,lpar_io_pool_ids=none,max_virtual_slots=64,"virtual_serial_adapters=0/server/1/any//any/1,1/server/1/any//any/1",virtual_scsi_adapters=3/client/2/s00ia9927560/32/0,virtual_eth_adapters=32/0/1659//0/0/vdct/facc157c3e20/all/0,virtual_eth_vsi_profiles=none,"virtual_fc_adapters=""2/client/1/s00ia9927559/32/c050760727c5007a,c050760727c5007b/0"",""4/client/1/s00ia9927559/35/c050760727c5007c,c050760727c5007d/0"",""5/client/2/s00ia9927560/34/c050760727c5007e,c050760727c5007f/0"",""6/client/2/s00ia9927560/35/c050760727c50080,c050760727c50081/0""",vtpm_adapters=none,hca_adapters=none,boot_mode=norm,conn_monitoring=1,auto_start=0,power_ctrl_lpar_ids=none,work_group_id=none,redundant_err_path_reporting=0,bsr_arrays=0,lpar_proc_compat_mode=default,electronic_err_reporting=null,sriov_eth_logical_ports=none
    
  • Remove the partition :
  • $ chsysstate -r lpar -o shutdown --immed -m source-server -n temp3-b642c120-00000133
    $ rmsyscfg -r lpar -m source-server -n temp3-b642c120-00000133
    
  • Recreate the partition with the remote restart attribute enabled :
  • mksyscfg -r lpar -m s00ka9927558-9117-MMD-658B2AD -i 'name=temp3-b642c120-00000133,profile_name=default_profile,remote_restart_capable=1,lpar_id=11,lpar_env=aixlinux,all_resources=0,min_mem=8192,desired_mem=8192,max_mem=8192,min_num_huge_pages=0,desired_num_huge_pages=0,max_num_huge_pages=0,mem_mode=ded,mem_expansion=0.0,hpt_ratio=1:128,proc_mode=shared,min_proc_units=2.0,desired_proc_units=2.0,max_proc_units=2.0,min_procs=4,desired_procs=4,max_procs=4,sharing_mode=uncap,uncap_weight=128,shared_proc_pool_name=DefaultPool,affinity_group_id=none,io_slots=none,lpar_io_pool_ids=none,max_virtual_slots=64,"virtual_serial_adapters=0/server/1/any//any/1,1/server/1/any//any/1",virtual_scsi_adapters=3/client/2/s00ia9927560/32/0,virtual_eth_adapters=32/0/1659//0/0/vdct/facc157c3e20/all/0,virtual_eth_vsi_profiles=none,"virtual_fc_adapters=""2/client/1/s00ia9927559/32/c050760727c5007a,c050760727c5007b/0"",""4/client/1/s00ia9927559/35/c050760727c5007c,c050760727c5007d/0"",""5/client/2/s00ia9927560/34/c050760727c5007e,c050760727c5007f/0"",""6/client/2/s00ia9927560/35/c050760727c50080,c050760727c50081/0""",vtpm_adapters=none,hca_adapters=none,boot_mode=norm,conn_monitoring=1,auto_start=0,power_ctrl_lpar_ids=none,work_group_id=none,redundant_err_path_reporting=0,bsr_arrays=0,lpar_proc_compat_mode=default,sriov_eth_logical_ports=none'
    

Creating a reserved storage device

The reserved storage device pool is used to store the configuration data of the remote restart partition. At the time of writing this post thoses devices are mandatory and as far as I know they are used just to store the configuration and not the state (memory state) of the virtual machines itself (maybe in the future, who knows ?) (You can’t create or boot any remote restart partition if you do not have a reserved storage device pool created, do this before doing anything else) :

  • You have first to find on both Virtual I/O Server and on both machines (source and destination machine used for the remote restart operation) a bunch of devices. These ones have to be the same on all the Virtual I/O Server used for the remote restart operation. The lsmemdev command is used to find those devices :
  • vios1$ lspv | grep -iE "hdisk988|hdisk989|hdisk990"
    hdisk988         00ced82ce999d6f3                     None
    hdisk989         00ced82ce999d960                     None
    hdisk990         00ced82ce999dbec                     None
    vios2$ lspv | grep -iE "hdisk988|hdisk989|hdisk990"
    hdisk988         00ced82ce999d6f3                     None
    hdisk989         00ced82ce999d960                     None
    hdisk990         00ced82ce999dbec                     None
    vios3$ lspv | grep -iE "hdisk988|hdisk989|hdisk990"
    hdisk988         00ced82ce999d6f3                     None
    hdisk989         00ced82ce999d960                     None
    hdisk990         00ced82ce999dbec                     None
    vios4$ lspv | grep -iE "hdisk988|hdisk989|hdisk990"
    hdisk988         00ced82ce999d6f3                     None
    hdisk989         00ced82ce999d960                     None
    hdisk990         00ced82ce999dbec                     None
    
    $ lsmemdev -r avail -m source-machine -p vios1,vios2
    [..]
    device_name=hdisk988,redundant_device_name=hdisk988,size=61440,type=phys,phys_loc=U2C4E.001.DBJN916-P2-C1-T1-W500507680140F32C-L3E5000000000000,redundant_phys_loc=U2C4E.001.DBJN916-P2-C2-T1-W500507680140F32C-L3E5000000000000,redundant_capable=1
    device_name=hdisk989,redundant_device_name=hdisk989,size=61440,type=phys,phys_loc=U2C4E.001.DBJN916-P2-C1-T1-W500507680140F32C-L3E6000000000000,redundant_phys_loc=U2C4E.001.DBJN916-P2-C2-T1-W500507680140F32C-L3E6000000000000,redundant_capable=1
    device_name=hdisk990,redundant_device_name=hdisk990,size=61440,type=phys,phys_loc=U2C4E.001.DBJN916-P2-C1-T1-W500507680140F32C-L3E7000000000000,redundant_phys_loc=U2C4E.001.DBJN916-P2-C2-T1-W500507680140F32C-L3E7000000000000,redundant_capable=1
    [..]
    $ lsmemdev -r avail -m dest-machine -p vios3,vios4
    [..]
    device_name=hdisk988,redundant_device_name=hdisk988,size=61440,type=phys,phys_loc=U2C4E.001.DBJN914-P2-C2-T1-W500507680140F32C-L3E5000000000000,redundant_phys_loc=U2C4E.001.DBJN914-P2-C1-T1-W500507680140F32C-L3E5000000000000,redundant_capable=1
    device_name=hdisk989,redundant_device_name=hdisk989,size=61440,type=phys,phys_loc=U2C4E.001.DBJN914-P2-C2-T1-W500507680140F32C-L3E6000000000000,redundant_phys_loc=U2C4E.001.DBJN914-P2-C1-T1-W500507680140F32C-L3E6000000000000,redundant_capable=1
    device_name=hdisk990,redundant_device_name=hdisk990,size=61440,type=phys,phys_loc=U2C4E.001.DBJN914-P2-C2-T1-W500507680140F32C-L3E7000000000000,redundant_phys_loc=U2C4E.001.DBJN914-P2-C1-T1-W500507680140F32C-L3E7000000000000,redundant_capable=1
    [..]
    
  • Create the reserved storage device pool using the chhwres command on the Hardware Management Console (create on all machines used by the remote restart operation) :
  • $ chhwres -r rspool -m source-machine -o a -a vios_names=\"vios1,vios2\"
    $ chhwres -r rspool -m source-machine -o a -p vios1 --rsubtype rsdev --device hdisk988 --manual
    $ chhwres -r rspool -m source-machine -o a -p vios1 --rsubtype rsdev --device hdisk989 --manual
    $ chhwres -r rspool -m source-machine -o a -p vios1 --rsubtype rsdev --device hdisk990 --manual
    $ lshwres -r rspool -m source-machine --rsubtype rsdev
    device_name=hdisk988,vios_name=vios1,vios_id=1,size=61440,type=phys,state=Inactive,phys_loc=U2C4E.001.DBJN916-P2-C1-T1-W500507680140F32C-L3E5000000000000,is_redundant=1,redundant_device_name=hdisk988,redundant_vios_name=vios2,redundant_vios_id=2,redundant_state=Inactive,redundant_phys_loc=U2C4E.001.DBJN916-P2-C2-T1-W500507680140F32C-L3E5000000000000,lpar_id=none,device_selection_type=manual
    device_name=hdisk989,vios_name=vios1,vios_id=1,size=61440,type=phys,state=Inactive,phys_loc=U2C4E.001.DBJN916-P2-C1-T1-W500507680140F32C-L3E6000000000000,is_redundant=1,redundant_device_name=hdisk989,redundant_vios_name=vios2,redundant_vios_id=2,redundant_state=Inactive,redundant_phys_loc=U2C4E.001.DBJN916-P2-C2-T1-W500507680140F32C-L3E6000000000000,lpar_id=none,device_selection_type=manual
    device_name=hdisk990,vios_name=vios1,vios_id=1,size=61440,type=phys,state=Inactive,phys_loc=U2C4E.001.DBJN916-P2-C1-T1-W500507680140F32C-L3E7000000000000,is_redundant=1,redundant_device_name=hdisk990,redundant_vios_name=vios2,redundant_vios_id=2,redundant_state=Inactive,redundant_phys_loc=U2C4E.001.DBJN916-P2-C2-T1-W500507680140F32C-L3E7000000000000,lpar_id=none,device_selection_type=manual
    $ lshwres -r rspool -m source-machine
    "vios_names=vios1,vios2","vios_ids=1,2"
    
  • You can also create the reserved storage device pool from Hardware Management Console GUI :
  • After selecting the Virtual I/O Server, click select devices :
  • rr_rsd_pool_p

  • Choose the maximum and minimum size to filter the devices you can select for the creation of the reserved storage device :
  • rr_rsd_pool2_p

  • Choose the disk you want to put in you reserved storage device pool (put all the devices used by remote restart partitions in manual mode, automatic devices are used by suspend/resume operation or AMS pool. One device can not be shared by two remote restart partitions) :
  • rr_rsd_pool_waiting_3_p
    rr_pool_create_7_p

  • You can check afterwards that your reserved device storage pool is created and is composed by three devices :
  • rr_pool_create_9
    rr_pool_create_8_p

Select a storage device for each remote restart partition before starting it :

After creating the reserved device storage pool you have for every partition to select a device from the storage pool. This device will be used to store the configuration data of the partition :

  • You can see you cannot start the partition if no devices were selected !
  • To select the correct device size you first have to calculate the needed space for every partition using the using the lsrsdevsize command. This size around the size of max memory value set in the partition profile (don’t ask me why):
  • $ lsrsdevsize -m source-machine -p temp3-b642c120-00000133
    size=8498
    
  • Select the device you want to assign to your machine (in my case there was already a device selected for this machine) :
  • rr_rsd_pool_assign_p

  • Then select the machine you want to assign for the device :
  • rr_rsd_pool_assign2_p

  • Or do this in command line :
  • $ chsyscfg -r lpar -m source-machine -i "name=temp3-b642c120-00000133,primary_rs_vios_name=vios1,secondary_rs_vios_name=vios2,rs_device_name=hdisk988"
    $ lssyscfg -r lpar -m source-machine --filter "lpar_names=temp3-b642c120-00000133" -F primary_rs_vios_name,secondary_rs_vios_name,curr_rs_vios_name
    vios1,vios2,vios1
    $ lshwres -r rspool -m source-machine --rsubtype rsdev
    device_name=hdisk988,vios_name=vios1,vios_id=1,size=61440,type=phys,state=Active,phys_loc=U2C4E.001.DBJN916-P2-C1-T1-W500507680140F32C-L3E5000000000000,is_redundant=1,redundant_device_name=hdisk988,redundant_vios_name=vios2,redundant_vios_id=2,redundant_state=Active,redundant_phys_loc=U2C4E.001.DBJN916-P2-C2-T1-W500507680140F32C-L3E5000000000000,lpar_name=temp3-b642c120-00000133,lpar_id=11,device_selection_type=manual
    

Launch the remote restart operation

All the remote restart operations are launched from the Hardware Management Console with the rrstartlpar command. At the time of writing this post there is not GUI function to remote restart a machine and you can only do it with the command line :

Validation

Like you can do it with a Live Partition Mobility move you can validate a remote restart operation before running it. You can only perform the remote restart operation if the machine on which the remote restart machine is hosted is shutdown or in error, so the validation is very useful and mandatory to check your remote restart machine are well configured without having to stop the source machine :

$ rrstartlpar -o validate -m source-machine -t dest-machine -p rrlpar
$ rrstartlpar -o validate -m source-machine -t dest-machine -p rrlpar -d 5
$ rrstartlpar -o validate -m source-machine -t dest-machine -p rrlpar --redundantvios 2 -d 5 -v

Execution

As I said before the remote restart operation can only be performed if the source machine is in a particular state, the states that allows a remote restart operation are :

  • Power Off.
  • Error.
  • Error – Dump in progress state.

So the only way to test a remote restart operation today is to shutdown your source machine :

  • Shutdown the source machine :
  • step1

    $ chsysstate -m source-machine -r sys  -o off --immed
    

    rr_step2_mod

  • You can next check on the Hardware Management Console that Virtual I/O Servers and the remote restart lpar are in state “Not available”. You’re now ready to remote restart the lpar (if the partition id is used on the destination machine the next available one will be used) (you have to wait a little before remote restarting the partition, check below) :
  • $ rrstartlpar -o restart -m source-machine -t dest-machine -p rrlpar -d 5 -v
    HSCLA9CE The managed system is not in a valid state to support partition remote restart operations.
    $ rrstartlpar -o restart -m source-machine -t dest-machine -p rrlpar -d 5 -v
    Warnings:
    HSCLA32F The specified partition ID is no longer valid. The next available partition ID will be used.
    

    step3
    rr_step4_mod
    step5

Cleanup

When the source machine is ready to be up (after an outage for instance) just boot the machine and its Virtual I/O Server. After the machine is up you can notice that the rrlpar profile is still there and it can be a huge problem if somebody is trying to boot this machine because it is started on the other machine after the remote restart operation. To prevent such an error you have to cleanup your remote restart partition by using the rrstartlpar command again. Be careful not to check the option to boot the partitions after the machine is started :

  • Restart the source machine and its Virtual I/O Servers :
  • $ chsysstate -m source-machine -r sys -o on
    $ chsysstate -r lpar -m source-machine -n vios1 -o on -f default_profile
    $ chsysstate -r lpar -m source-machine -n vios2 -o on -f default_profile
    

    rr_step6_mod

  • Perform the cleanup operation to remove the profile of the remote restart partition (if you want later to LPM back your machine you have to keep the device of the reserved device storage pool in the pool, if you do not use the –retaindev option the device will be automatically removed from the pool) :
  • $ rrstartlpar -o cleanup -m source-machine -p rrlpar --retaindev -d 5 -v --force
    

    rr_step7_mod

Refresh the partition and profile data

During my test I encounter a problem. The configuration was not correctly synced between the device used in the reserved device storage pool and the current partition profile. I had to use a command named refdev (for refresh device) to synchronize the partition and profile data to the storage device.

$ refdev -m source-machine -p refdev -m sys1 -p temp3-b642c120-00000133 -v 

What’s in the reserved storage device ?

I’m a curious guy. After playing with remote restart I asked myself a question, what is really stored in the reserved device storage device assigned to the remote restart partition. Looking in the documentation on the internet does not answer to my question so I had to look on it on my own. By ‘dding” the reserved storage device assigned to a partition I realized that the profile is stored in xml format. Maybe this format is the same format that the one used by the HMC 8 templates library. For the moment and during my tests on Power7+ machine the state of the memory of the partition is not transferred to the destination machine, maybe because I had to shutdown the whole source machine to test. Maybe the memory state of the machine is transferred to the destination machine if this one is in error state or is dumping. I had not chance to test this :

root@vios1:/home/padmin# dd if=/dev/hdisk17 of=/tmp/hdisk17.out bs=1024 count=10
10+0 records in
10+0 records out
root@vios1:/home/padmin# more hdisk17.out
[..]
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
BwEAAAAAAACAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACgDIAZAAAQAEAAgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA" Profile="H4sIAAAAA
98VjxbxEAhNaZEqpEptPS/iMJO4cTJBdHVj38zcYvu619fTGQlQVmxY0AUICSH4A5XYorJgA1I3sGMBCx5Vs4RNd2zgXI89tpNMxslIiRzPufec853zfefk/t/osMfRBYPZRbpuF9ueUTQsShxR1NSl9dvEEPPMMgnfvPnVk
a2ixplLuOiVCHaUKn/yYMv/PY/ydTRuv016TbgOzdVv4w6+KM0vyheMX62jgq0L7hsCXtxBH6J814WoZqRh/96+4a+ff3Br8+o3uTE0pqJZA7vYoKKnOgYnNoSsoiPECp7KzHfELTQV/lnBAgt0/Fbfs4Wd1sV+ble7Lup/c
be0LQj01FJpoVpecaNP15MhHxpcJP8al6b7fg8hxCnPY68t8LpFjn83/eKFhcffjqF8DRUshs0almioaFK0OfHaUKCue/1GcN0ndyfg9/fwsyzQ6SblellXK6RDDaIIwem6L4iXCiCfCuBZxltFz6G4eHed2EWD2sVVx6Mth
eEOtnzSjQoVwLbo2+uEf3T/s2emPv3z4xA16eD0AC6oRN3FXNnYoA6U7y3OfFc1g5hOIiTQsVUHSusSc43QVluEX2wKdKJZq4q2YmJXEF7hhuqYJA0+inNx3YTDab2m6T7vEGpBlAaJnU0qjWofTkj+uT2Tv3Rl69prZx/9s
thQTBMK42WK7XSzrizqFhPL5E6FeHGVhnSJQLlKKreab1l6z9MwF0C/jTi3OfmKCsoczcJGwITgy+f74Z4Lu2OU70SDyIdXg1+JAApBWZoAbLaEj4InyonZIDbjvZGwv3H5+tb7C5tPThQA9oUdsCN0HsnWoLxWLjPHAdJSp
Ja45pBarVb3JDyUJOn3aemXcIqtUfgPi3wCuiw76tMh6mVtNVDHOB+BxqEUDWZGtPgPrFc9oBgBhhJzEdsEVI9zC1gr0JTexhwgThzIwYEG7lLbt3dcPyHQLKQqfGzVsSNzVSvenkDJU/lUoiXGRNrdxLy2soyhtcNX47INZ
nHKOCjYfsoeR3kpm58GdYDVxipIZXDgSmhfCDCPlKZm4dZoVFORzEX0J6CLvK4py6N7Pz94yiXlPBAArd3zqIEtjXFZ4izJzQ44sCv7hh3bTnY5TbKdnOtHGtatTjrEynTuWFNXV3ouaUKIIKfDgE5XrrpWb/SHWyWCbXMM5
DkaHNzXVJws6csK57jnpToLopiQLZdgHJJh9wm+M+wbof7GzSRJBYvAAaV0RvE8ZlA5yxSob4fAiJiNNwwQAwu2y5/O881fvvz3HxgK70ZDwc1FS8JezBgKR0e/S4XR3ta8OwmdS56akXJITAmYBpElF5lZOdlXuO+8N0opU
m0HeJTw76oiD8PS9QfRECUYqk0B1KGkZ+pRGQPUhPFEb12XIoe7u4WXuwdVqTAnZT8gyYrvAPlL/sYG4RkDmAx5HFZpFIVnAz9Lrlyh9tFIc4nZAColOLNGdFRKmE8GJd5zZx++zMiAoTOWNrJvBjODNo1UOGuXngzcHWjrn
LgmkxjBXLj+6Fjy1DHFF0zV6lVH/p+VYO6pbZzYD9/ORFLouy6MwvlGuRz8Qz10ugawprAdtJ4GxWAOtmQjZXJ+Lg58T/fDy4K74bYWr9CyLIVdQiplHPLbjinZRu4BZuAENE6jxTP2zNkBVgfiWiFcv7f3xYjFqxs/7vb0P
 lpar_name="rrlpar" lpar_uuid="0D80582A44F64B43B2981D632743A6C8" lpar_uuid_gen_method="0"><SourceLparConfig additional_mac_addr_bases="" ame_capability="0" auto_start_e
rmal" conn_monitoring="0" desired_proc_compat_mode="default" effective_proc_compat_mode="POWER7" hardware_mem_encryption="10" hardware_mem_expansion="5" keylock="normal
"4" lpar_placement="0" lpar_power_mgmt="0" lpar_rr_dev_desc="	<cpage>		<P>1</P>
		<S>51</S>
		<VIOS_descri
00010E0000000000003FB04214503IBMfcp</VIOS_descriptor>
	</cpage>
" lpar_rr_status="6" lpar_tcc_slot_id="65535" lpar_vtpm_status="65535" mac_addres
x_virtual_slots="10" partition_type="rpa" processor_compatibility_mode="default" processor_mode="shared" shared_pool_util_authority="0" sharing_mode="uncapped" slb_mig_
ofile="1" time_reference="0" uncapped_weight="128"><VirtualScsiAdapter is_required="false" remote_lpar_id="2" src_vios_slot_number="4" virtual_slot_number="4"/><Virtual
"false" remote_lpar_id="1" src_vios_slot_number="3" virtual_slot_number="3"/><Processors desired="4" max="8" min="1"/><VirtualFibreChannelAdapter/><VirtualEthernetAdapt
" filter_mac_address="" is_ieee="0" is_required="false" mac_address="82776CE63602" mac_address_flags="0" qos_priority="0" qos_priority_control="false" virtual_slot_numb
witch_id="1" vswitch_name="vdct"/><Memory desired="8192" hpt_ratio="7" max="16384" memory_mode="ded" min="256" mode="ded" psp_usage="3"><IoEntitledMem usage="auto"/></M
 desired="200" max="400" min="10"/></SourceLparConfig></SourceLparInfo></SourceInfo><FileInfo modification="0" version="1"/><SriovEthMappings><SriovEthVFInfo/></SriovEt
VirtualFibreChannelAdapterInfo/></VfcMappings><ProcPools capacity="0"/><TargetInfo concurr_mig_in_prog="-1" max_msp_concur_mig_limit_dynamic="-1" max_msp_concur_mig_lim
concur_mig_limit="-1" mpio_override="1" state="nonexitent" uuid_override="1" vlan_override="1" vsi_override="1"><ManagerInfo/><TargetMspInfo port_number="-1"/><TargetLp
ar_name="rrlpar" processor_pool_id="-1" target_profile_name="mig3_9117_MMD_10C94CC141109224549"><SharedMemoryConfig pool_id="-1" primary_paging_vios_id="0"/></TargetLpa
argetInfo><VlanMappings><VlanInfo description="VkVSU0lPTj0xClZJT19UWVBFPVZFVEgKVkxBTl9JRD0zMzMxClZTV0lUQ0g9dmRjdApCUklER0VEPXllcwo=" vlan_id="3331" vswitch_mode="VEB" v
ibleTargetVios/></VlanInfo></VlanMappings><MspMappings><MspInfo/></MspMappings><VscsiMappings><VirtualScsiAdapterInfo description="PHYtc2NzaS1ob3N0PgoJPGdlbmVyYWxJbmZvP
mVyc2lvbj4KCQk8bWF4VHJhbmZlcj4yNjIxNDQ8L21heFRyYW5mZXI+CgkJPGNsdXN0ZXJJRD4wPC9jbHVzdGVySUQ+CgkJPHNyY0RyY05hbWU+VTkxMTcuTU1ELjEwQzk0Q0MtVjItQzQ8L3NyY0RyY05hbWU+CgkJPG1pb
U9TcGF0Y2g+CgkJPG1pblZJT1Njb21wYXRhYmlsaXR5PjE8L21pblZJT1Njb21wYXRhYmlsaXR5PgoJCTxlZmZlY3RpdmVWSU9TY29tcGF0YWJpbGl0eT4xPC9lZmZlY3RpdmVWSU9TY29tcGF0YWJpbGl0eT4KCTwvZ2VuZ
TxwYXJ0aXRpb25JRD4yPC9wYXJ0aXRpb25JRD4KCTwvcmFzPgoJPHZpcnREZXY+CgkJPHZEZXZOYW1lPnJybHBhcl9yb290dmc8L3ZEZXZOYW1lPgoJCTx2TFVOPgoJCQk8TFVBPjB4ODEwMDAwMDAwMDAwMDAwMDwvTFVBP
FVOU3RhdGU+CgkJCTxjbGllbnRSZXNlcnZlPm5vPC9jbGllbnRSZXNlcnZlPgoJCQk8QUlYPgoJCQkJPHR5cGU+dmRhc2Q8L3R5cGU+CgkJCQk8Y29ubldoZXJlPjE8L2Nvbm5XaGVyZT4KCQkJPC9BSVg+CgkJPC92TFVOP
gkJCTxyZXNlcnZlVHlwZT5OT19SRVNFUlZFPC9yZXNlcnZlVHlwZT4KCQkJPGJkZXZUeXBlPjE8L2JkZXZUeXBlPgoJCQk8cmVzdG9yZTUyMD50cnVlPC9yZXN0b3JlNTIwPgoJCQk8QUlYPgoJCQkJPHVkaWQ+MzMyMTM2M
DAwMDAwMDAwMDNGQTA0MjE0NTAzSUJNZmNwPC91ZGlkPgoJCQkJPHR5cGU+VURJRDwvdHlwZT4KCQkJPC9BSVg+CgkJPC9ibG9ja1N0b3JhZ2U+Cgk8L3ZpcnREZXY+Cjwvdi1zY3NpLWhvc3Q+" slot_number="4" sou
_slot_number="4"><PossibleTargetVios/></VirtualScsiAdapterInfo><VirtualScsiAdapterInfo description="PHYtc2NzaS1ob3N0PgoJPGdlbmVyYWxJbmZvPgoJCTx2ZXJzaW9uPjIuNDwvdmVyc2lv
NjIxNDQ8L21heFRyYW5mZXI+CgkJPGNsdXN0ZXJJRD4wPC9jbHVzdGVySUQ+CgkJPHNyY0RyY05hbWU+VTkxMTcuTU1ELjEwQzk0Q0MtVjEtQzM8L3NyY0RyY05hbWU+CgkJPG1pblZJT1NwYXRjaD4wPC9taW5WSU9TcGF0
YXRhYmlsaXR5PjE8L21pblZJT1Njb21wYXRhYmlsaXR5PgoJCTxlZmZlY3RpdmVWSU9TY29tcGF0YWJpbGl0eT4xPC9lZmZlY3RpdmVWSU9TY29tcGF0YWJpbGl0eT4KCTwvZ2VuZXJhbEluZm8+Cgk8cmFzPgoJCTxwYXJ0
b25JRD4KCTwvcmFzPgoJPHZpcnREZXY+CgkJPHZEZXZOYW1lPnJybHBhcl9yb290dmc8L3ZEZXZOYW1lPgoJCTx2TFVOPgoJCQk8TFVBPjB4ODEwMDAwMDAwMDAwMDAwMDwvTFVBPgoJCQk8TFVOU3RhdGU+MDwvTFVOU3Rh
cnZlPm5vPC9jbGllbnRSZXNlcnZlPgoJCQk8QUlYPgoJCQkJPHR5cGU+dmRhc2Q8L3R5cGU+CgkJCQk8Y29ubldoZXJlPjE8L2Nvbm5XaGVyZT4KCQkJPC9BSVg+CgkJPC92TFVOPgoJCTxibG9ja1N0b3JhZ2U+CgkJCTxy
UlZFPC9yZXNlcnZlVHlwZT4KCQkJPGJkZXZUeXBlPjE8L2JkZXZUeXBlPgoJCQk8cmVzdG9yZTUyMD50cnVlPC9yZXN0b3JlNTIwPgoJCQk8QUlYPgoJCQkJPHVkaWQ+MzMyMTM2MDA1MDc2ODBDODAwMDEwRTAwMDAwMDAw
ZmNwPC91ZGlkPgoJCQkJPHR5cGU+VURJRDwvdHlwZT4KCQkJPC9BSVg+CgkJPC9ibG9ja1N0b3JhZ2U+Cgk8L3ZpcnREZXY+Cjwvdi1zY3NpLWhvc3Q+" slot_number="3" source_vios_id="1" src_vios_slot_n
tVios/></VirtualScsiAdapterInfo></VscsiMappings><SharedMemPools find_devices="false" max_mem="16384"><SharedMemPool/></SharedMemPools><MigrationSession optional_capabil
les" recover="na" required_capabilities="veth_switch,hmc_compatibilty,proc_compat_modes,remote_restart_capability,lpar_uuid" stream_id="9988047026654530562" stream_id_p
on>

About the state of the source machine ?

You have to know this before using remote restart : at the time of writing this post the remote restart feature is still young and have to evolve before being usable in real life, I’m saying this because the FSP of the source machine has to be up to perform a remote restart operation. To be clear the remote restart feature does not answer to the total loss of one of your site. It’s just useful to restart partitions of a system with a problem that is not an FSP problem (problem with memory DIMM, problem with CPUs for instance). It can be used in your DRP exercises but not if your whole site is totally down which is -in my humble opinion- one of the key feature that remote restart needs to answer. Don’t be afraid read the conclusion ….

Conclusion

This post have been written using Power7+ machines, my goal was to give you an example of remote restart operations : a summary of what is is, how it work, and where and when to use it. I’m pretty sure that a lot of things are going to change about remote restart. First, on Power8 machines you don’t have to recreate the partitions to make them remote restart aware. Second, I know that changes are on the way for remote restart on Power8 machines, especially about reserved storage devices and about the state of the source machine. I’m sure this feature will have a bright future and used with PowerVC it can be a killer feature. Hope to see all this changes in a near future ;-). Once again I hope this post helps you.

10 thoughts on “Configuration of a Remote Restart Capable partition

  1. Hi!
    Very nice post here… Do you have any idea when rrlpar will work with 2 hmc (i’ve seen that it doesn’t work yet, in your post, but is it planned in a near future version ?)
    And also, we don’t have svc here, but an Hitachi system that presents disks as svc does (kindof…). It’s a unique hdisk, which is remotely copied synchronously between 2 sites, but presented once (no need to split/and rw like true copy and no need of lvm copy). Will it work with a normal hdisk ?
    Anyway, thanks and bravo !

    gilles

    • Hi Gilles;

      First of all thanks for you comment and support. It’s always cool to see that people are reading you and enjoy the posts. So thank you.

      You seems to have something similar as SVC and/or VPLEX but on Hitachi. I do recommend something like that for remote restart because you need to access devices of the reserved device storage pool whatever happends even if you have an array failure. For you information my devices are masked on the VIOS trough a SVC and it works like a charm. Anyway my advice is to wait a little before deploying remote restart (or just do it for your most critical lpars (don’t forget that you’ll need devices for every machines ….) :
      1/ The reserved devices storage may dissapear in a near future (only for P8).
      2/ Don’t have any infomration for remote restart between two HMC. But I recommand to have two hmcs “seeing” machine of each site to avoid this problem.
      3/ And don’t forget the FSP issue. You have to be aware that FSP of the source machine needs to be up ….

      Once again thanks for your kind comment.

      Regards,

      B.

      • Merci pour ta réponse Benoit, il me semblait aussi que tu étais français ! Bonne continuation et bravo pour ta culture AIX/Pseries.

  2. There is also an other prerequisite that is not well indicated in the official documentation :
    “Inactive partition mobility operation (aka LPAR remote restart) must be preceded by a successful Active partition mobility (aka LPM). This is a mandatory step done to make sure WWPN is ready for migration.”

  3. Hi, this is an excellent post for remote restart. I was wondering however if remote restart supports NPIV presented LUNs LPARs? I seems VSCSI LUNs LPARs is used in the example here. If NPIV is supported for RR, can you please point me to some source for this? Our environment is NPIV and we would love to use RR on our Power7 and Power8 environments.

    Thank you.

    Ben.

    • Yes no problem at all with NPIV. I have tested this solution myself this afternoon between to P8 boxes.
      Your lpar just have to be zoned on extended fabrics and it is ok. If you can do a live partition mobility operation between the two boxes taking part in the remote restart operation it will be ok.
      If you are planing to do RR between P7 and P8 don’t use SRR … but just RR.
      Hope it helps,
      B.

  4. Pingback: Using the Simplified Remote Restart capability on Power8 Scale Out Servers | chmod666

  5. Hi Benoit,
    I wanted to experiment the Remote Restart between my two P270 (PowerFlex compute nodes) but my HMC is still on the last V7 version (V7R7.9.0 SP2 with MH01571).
    That was really misleading because I followed steps of the procedure, in particular the recreating of the LPAR with the option RR activated and at the end I realized that the HMC don’t have rrstartlpar command !
    So… I go on an upgrade of my HMC on the last V8 version (8.8.4.0) and we’ll see then !
    Anyway, thanks again for your sharing !
    Erwin

    • So I’m good now

      Just did my first RR operation. I am disappoint because I realize that the LPAR ID is not kept, even if it’s unused on the target machine !

      I have had the same message that you have in your demo although I paid attention that it was available before playing the test.

      HSCLA32F The specified partition ID is no longer valid. The next available partition ID will be used.

      What do you think : Is it normal or not ? I’m thinking about opening a PMR for that.

      • Just did again the test on my P7+ boxes with the last SP (1) of the HMC 840 version.

        No more message “HSCLA32F The specified partition ID is no longer valid. The next available partition ID will be used.”
        No more issue, all is fine now ! :-)

        PS : are you sure of your “refdev” command ?

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>