Automating systems deployment & other new features : HMC8, IBM Provisioning Toolkit for PowerVM and LPM Automation Tool

I am involved in a project where we are going to deploy dozens of Power Systems (still Power7 for the moment, and Power8 in the near future). All the systems will be the same : same models with the same slot placements and the same Virtual I/O Server configuration. To be sure that all my machines are the same, and to allow other people (who are not aware of the design or are not skilled enough to do it by themselves) to deploy them, I had to find a solution to automate the deployment of the new machines. For the virtual machines the solution is now to use PowerVC, but what about the Virtual I/O Servers ? What about the configuration of the Shared Ethernet Adapters ? In other words, what about the infrastructure deployment ? I spent a week with an IBM US STG Lab Services consultant (Bonnie Lebarron) for a PowerCare (a PowerCare is now included with every high-end machine you buy) about the IBM Provisioning Toolkit for PowerVM (a very powerful tool that allows you to deploy your Virtual I/O Servers and your virtual machines automatically) and the Live Partition Mobility Automation Tool. With the new Hardware Management Console (8R8.2.0) you now have the possibility to create templates, not just to create new virtual machines, but also to deploy, create and configure your Virtual I/O Servers. The goal of this post is to show that there are different ways to do that, but also to show you the new features embedded in the new Hardware Management Console, and to spread the word about those two wonderful STG Lab Services tools that are well known in the US but not so much in Europe. So it’s a HUGE post, just take what is useful for you in it. Here we go :

Hardware Management Console 8 : System templates

The goal of the system templates is to deploy a new server in minutes without having to log on to different servers to do some of the tasks : you now just have to connect to the HMC to do all the work. The system templates will deploy the Virtual I/O Server image by using your NIM server or by using the images stored in the Hardware Management Console media repository. Please note a few points :

  • You CAN’T deploy a “gold” mksysb of your Virtual I/O Server using the Hardware Management Console repository. I’ve tried this myself and it is impossible for the moment (if someone has a solution …). I’ve tried two different ways : creating a backupios image without the mksysb flag (it produces a tar file that is impossible to upload to the image repository, but usable by the installios command), and creating a backupios image with the mksysb flag and then using the mkcd/mkdvd commands to create iso images. Both methods failed during the installation process.
  • The current Virtual I/O Server images provided in the Electronic Software Delivery (2.2.3.4 at the moment) are provided in the .udf format and not the .iso format. This is not a huge problem, just rename both files to .iso before uploading them to the Hardware Management Console.
  • If you want to deploy your own mksysb you can still choose to use your NIM server, but you will have to manually create the NIM objects and manually configure a bosinst installation (in my humble opinion what we are trying to do is to reduce manual interventions, but you can still do that for the moment; that’s what I do because I don’t have the choice). You’ll have to give the IP address of the NIM server and the HMC will boot the Virtual I/O Servers with the network settings already configured.
  • The Hardware Management Console installation with the media repository is based on the old, well-known installios command (installios is based on NIMOL). You still need to have the NIM port opened between your HMC and the Virtual I/O Server management network (the one you will choose to install both Virtual I/O Servers). You may experience some problems if you have already installed your Virtual I/O Servers this way, and you may have to reset some things. My advice is to always run these three commands before deploying a system template :
  • # installios -F -e -R default1
    # installios -u 
    # installios -q
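
The .udf to .iso renaming mentioned in the notes above can be scripted on the workstation that holds the downloaded images. This is only a minimal sketch : the directory and the file names are assumptions for illustration, and nothing here talks to the HMC.

```python
from pathlib import Path


def rename_udf_to_iso(directory):
    """Rename every *.udf file in `directory` to *.iso and return the new paths."""
    renamed = []
    for udf in sorted(Path(directory).glob("*.udf")):
        iso = udf.with_suffix(".iso")
        if iso.exists():
            # Never overwrite an image that is already there.
            raise FileExistsError(f"{iso} already exists")
        udf.rename(iso)
        renamed.append(iso)
    return renamed
```

Calling `rename_udf_to_iso("/tmp/vios_images")` (a hypothetical download directory) leaves the content of the files untouched; only the extension changes, which is all the upload dialog needs according to the note above.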
    

Uploading an iso file on the Hardware Management Console

Upload the images to the Hardware Management Console; I’ll not explain this in detail … :

hmc_add_virtual_io_server_to_repo
hmc_add_virtual_io_server_to_repo2

Creating a system template

To create a system template you first have to copy an existing predefined template provided by the Hardware Management Console (1) and then edit this copy to fit your own needs (2) :

create_template_1

  • You can’t edit the physical I/O part when editing a new template : you first have to deploy a system with this template, choose the physical I/O for each Virtual I/O Server, and then capture this deployed system as an HMC template. Change the properties of your Virtual I/O Server :
  • create_template_2

  • Create your Shared Ethernet Adapters : let’s say we want to create one Shared Ethernet Adapter in sharing mode with four virtual adapters :
  • Adapter 1 : PVID10, vlans=1024;1025
  • Adapter 2 : PVID11, vlans=1028;1029
  • Adapter 3 : PVID12, vlans=1032;1033
  • Adapter 4 : PVID13, vlans=1036;1037
  • In the new HMC8 the terminology has changed : Virtual Network Bridge = Shared Ethernet Adapter; Load (Balance) Group = a pair of virtual adapters with the same PVID on both Virtual I/O Servers.
  • Create the Shared Ethernet Adapter with the first adapter (PVID 10) and the second adapter (PVID 11), and add the first vlan (vlan 1024 has to be added on the adapter with PVID 10) :
  • create_sea1
    create_sea2
    create_sea3

  • Add the second vlan (the vlan 1028) in our Shared Ethernet Adapter (Virtual Network Bridge) and choose to put it on the adapter with PVID 11 (Load Balance Group 11) :
  • create_sea4
    create_sea5
    create_sea6

  • Repeat this operation for the next vlan (1032), but this time we have to create new virtual adapters with PVID 12 (Load Balance Group 12) :
  • create_sea7

  • Repeat this operation for the next vlan (1036), but this time we have to create new virtual adapters with PVID 13 (Load Balance Group 13).
  • You can check on this picture our 4 virtual adapters with two vlans on each one :
  • create_sea8
    create_sea9

  • I’ll not detail the other parts, which are very simple to understand. At the end you can check that our template is created with 2 Virtual I/O Servers and 8 virtual networks.
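
The adapter layout used in the steps above can be written down as data and sanity-checked before clicking through the HMC. This is a minimal sketch : the PVIDs and vlan ids are the ones from the example, but the checks (no vlan on two adapters, no vlan id reused as a PVID) are my own assumptions about what a sane layout looks like, not something the HMC exposes.

```python
from collections import Counter

# PVID of each trunk adapter -> tagged vlans carried by it (values from the example above)
SEA_LAYOUT = {
    10: [1024, 1025],
    11: [1028, 1029],
    12: [1032, 1033],
    13: [1036, 1037],
}


def check_sea_layout(layout):
    """Return the sorted vlan ids, raising ValueError on duplicates or PVID clashes."""
    vlans = [vlan for ids in layout.values() for vlan in ids]
    dupes = sorted(v for v, count in Counter(vlans).items() if count > 1)
    if dupes:
        raise ValueError(f"vlan defined on more than one adapter: {dupes}")
    clash = sorted(set(vlans) & set(layout))
    if clash:
        raise ValueError(f"vlan id reused as PVID: {clash}")
    return sorted(vlans)


print(len(check_sea_layout(SEA_LAYOUT)), "virtual networks")  # prints "8 virtual networks"
```

Four adapters with two vlans each give the 8 virtual networks reported by the template summary.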

The Shared Ethernet Adapter problem : Are you deploying a Power8/Power7 with a 780 firmware or a Power6/7 server ?

When creating a system template you probably noticed that when you are defining your Shared Ethernet Adapters … sorry, your Virtual Network Bridges, there is no possibility to create any control channel adapter or to assign a vlan id for this control channel. If you choose to create the system template by hand with the HMC, the template will be usable by all Power8 systems and all Power7 systems with a firmware that allows you to create a Shared Ethernet Adapter without any control channel (780 firmwares). I’ve tried this myself and we will check that later. If you are deploying a system template on an older Power7 system the deployment will fail for this reason. You have two solutions to this problem : create your first system “by hand”, create your Shared Ethernet Adapters with control channels on your own and then capture the system to redeploy it on other machines; or edit the XML of your current template to add the control channel adapter in it … no comment.

failed_sea_ctl_chan

If you choose to edit the template to add the control channel on your own, export your template as an xml file, edit it by hand (here is an example in the picture below), and then re-import the modified xml file :

sea_control_channel_template

Capture an already deployed system

As you can see, creating a system template from scratch can be hard and cannot match all your needs, especially with this Shared Ethernet Adapter problem. My advice is to deploy your first system by hand or by using the toolkit, and then capture this system to create a Hardware Management Console template based on it. By doing this all the Shared Ethernet Adapters will be captured as configured, the ones with control channels and the ones without. It can match all the cases without having to edit the xml file by hand.

  • Click “Capture configuration as template with physical I/O” :
  • capture_template_with_physical_io

  • The whole system will be captured, and if you put your physical I/O in the same slots each time you deploy a new server (as we do in my team) you will not have to choose which physical I/O will belong to which Virtual I/O Server :
  • capture_template_with_physical_io_capturing

  • In the system template library you can check that the physical I/O are captured and that we do not have to define our Shared Ethernet Adapter (the screenshot below shows you 49 vlans ready to be deployed) :
  • capture_template_library_with_physical_io_and_vlan

  • To do this don’t forget to edit the template and check the box “Use captured I/O information” :
  • use_captured_io_informations

    Deploying a system template

    BE VERY CAREFUL BEFORE DEPLOYING A SYSTEM TEMPLATE : ALL THE ALREADY EXISTING VIRTUAL I/O SERVERS AND PARTITIONS WILL BE REMOVED BY DOING THIS. THE HMC WILL DISPLAY A WARNING MESSAGE. Go to the template library, right click on the template you want to deploy, then click deploy :

    reset_before_deploy1
    reset_before_deploy2

    • If you are deploying a “non captured template”, choose the physical I/O for each Virtual I/O Server :
    • choose_io1

    • If you are deploying a “captured template”, the physical I/O will be automatically chosen for each Virtual I/O Server :
    • choose_io2

    • The Virtual I/O Server profiles are carved here :
    • craving_virtual_io_servers

    • Next you have the choice to use a NIM server or the HMC image repository to deploy the Virtual I/O Servers. In both cases you have to choose the adapter used to deploy the image :
    • NIM way :
    • nim_way

    • HMC way (check the tip at the beginning of the post about installios if you are choosing this method) :
    • hmc_way

    • Click start when you are ready. The start button will invoke the lpar_netboot command with the settings you put in the previous screen :
    • start_dep

    • You can monitor the installation process by clicking monitoring vterm (on the images below you can check that the ping is successful, the bootp is ok, the tftp is downloading, and the mksysb is being restored) :
    • monitor1
      monitor2
      monitor3

    • The RMC connection has to be up on both Virtual I/O Servers to build the Shared Ethernet Adapters and the Virtual I/O Server license must be accepted. Check both are ok.
    • RMCok
      licenseok

    • Choose where the Shared Ethernet Adapters will be created and create the link aggregation device here (choose on which network adapters and network ports your Shared Ethernet Adapters will be created) :
    • choose_adapter

    • Click start on the next screen to create the Shared Ethernet Adapter automatically :
    • sea_creation_ok

    • After a successful deployment of a system template a summary will be displayed on the screen :
    • template_ok

    IBM Provisioning Toolkit for PowerVM : A tool created by the Admins for the Admins

    As you now know, the HMC templates are ok, but there are some drawbacks to this method. In my humble opinion the HMC templates are good for a beginner : the user is guided step by step, and it is much simpler for someone who doesn’t know anything about PowerVM to build a server from scratch without knowing and understanding all the features of PowerVM (Virtual I/O Server, Shared Ethernet Adapter). But the deployment is not fully automated : the HMC will not mirror your rootvg, will not set any attributes on your Fibre Channel adapters, and will never run a custom script after the installation to fit your needs. Last point, I’m sure that as a system administrator you probably prefer command line tools to a “crappy” GUI, and a template can neither be created nor deployed from the command line (change this please). There is another way to build your server, and it’s called the IBM Provisioning Toolkit for PowerVM. This tool is developed by STG Lab Services US and is not well known in Europe, but I can assure you that a lot of US customers are using it (raise your voice in the comments, US guys). This tool can help you in many ways :

    • Carving Virtual I/O Servers profiles.
    • Building and deploying Virtual I/O Servers with a NIM Server without having to create anything by hand.
    • Creating your SEA with or without control channel, failover/sharing, tagged/non-tagged.
    • Setting attributes on your Fibre Channel adapters.
    • Building and deploying Virtual I/O Clients in NPIV and vscsi.
    • Mirroring your rootvg.
    • Capturing a whole frame and redeploying it on another server.
    • A lot of other things.

    To help you understand the approach of the tool, let’s begin with an example. I want to deploy a new machine with two Virtual I/O Servers :

    • 1 (white) – I’m writing a profile file : in this one I’m putting all the information that is the same for all the machines (virtual switches, shared processor pools, Virtual I/O Server profiles, Shared Ethernet Adapter definition, image chosen to deploy the Virtual I/O Server, physical I/O adapter for each Virtual I/O Server)
    • 2 (white) – I’m writing a config file : in this one I’m putting all the information that is unique to each machine (name, ip, HMC name used to deploy, CEC serial number, and so on)
    • 3 (yellow) – I’m launching the provisioning toolkit to build my machine, the NIM objects are created (networks, standalone machines) and the bosinst operation is launched from the NIM server
    • 4 (red) – The Virtual I/O Server profiles are created and the lpar_netboot command is launched (an ssh key has to be shared between the NIM server and the Hardware Management Console)
    • 5 (blue) – The Shared Ethernet Adapters are created and the post configuration is launched on the Virtual I/O Servers (mirror creation, vfc attributes …)

    toolkit

    Let me show you a detailed example of a new machine deployment :

    • On the NIM server, the toolkit is located in /export/nim/provision. You can see the main script, called buildframe.ksh.v3.24.2, and two directories : one for the profiles (build_profiles) and one for the configuration files (config_files). The work_area directory is the log directory :
    • # cd /export/nim/provision
      # ls
      build_profiles          buildframe.ksh.v3.24.2  config_files       lost+found              work_area
      
    • Let’s check a profile file for a new Power720 deployment :
    • # vi build_profiles/p720.conf
      
    • Some variables will be set in the per-machine configuration file; put the NA value for these ones :
    • VARIABLES      (SERVERNAME)=NA
      VARIABLES      (BUILDHMC)=NA
      [..]
      VARIABLES      (BUILDUSER)=hscroot
      VARIABLES      (VIO1_LPARNAME)=NA
      VARIABLES      (vio1_hostname)=(VIO1_LPARNAME)
      VARIABLES      (VIO1_PROFILE)=default_profile
      
      VARIABLES      (VIO2_LPARNAME)=NA
      VARIABLES      (vio2_hostname)=(VIO2_LPARNAME)
      VARIABLES      (VIO2_PROFILE)=default_profile
      
      VARIABLES      (VIO1_IP)=NA
      VARIABLES      (VIO2_IP)=NA
      
    • Choose the ports that will be used to restore the Virtual I/O Server mksysb :
    • VARIABLES      (NIMPORT_VIO1)=(CEC1)-P1-C6-T1
      VARIABLES      (NIMPORT_VIO2)=(CEC1)-P1-C7-T1
      
    • In the example I’m building the Virtual I/O Server with 3 Shared Ethernet Adapters, and I’m not creating any LACP aggregation :
    • # SEA1
      VARIABLES      (SEA1VLAN1)=401
      VARIABLES      (SEA1VLAN2)=402
      VARIABLES      (SEA1VLAN3)=403
      VARIABLES      (SEA1VLAN4)=404
      VARIABLES      (SEA1VLANS)=(SEA1VLAN1),(SEA1VLAN2),(SEA1VLAN3),(SEA1VLAN4)
      # SEA2
      VARIABLES      (SEA2VLAN1)=100,101,102
      VARIABLES      (SEA2VLAN2)=103,104,105
      VARIABLES      (SEA2VLAN3)=106,107,108
      VARIABLES      (SEA2VLAN4)=109,110
      VARIABLES      (SEA2VLANS)=(SEA2VLAN1),(SEA2VLAN2),(SEA2VLAN3),(SEA2VLAN4)
      # SEA3
      VARIABLES      (SEA3VLAN1)=200,201,202,203,204,309
      VARIABLES      (SEA3VLAN2)=205,206,207,208,209,310
      VARIABLES      (SEA3VLAN3)=210,300,301,302,303
      VARIABLES      (SEA3VLAN4)=304,305,306,307,308
      VARIABLES      (SEA3VLANS)=(SEA3VLAN1),(SEA3VLAN2),(SEA3VLAN3),(SEA3VLAN4)
      # SEA DEF (I'm putting adapter ID and PVID here)
      SEADEF         seadefid=SEA1,networkpriority=S,vswitch=vdct,seavirtid=10,10,(SEA1VLAN1):11,11,(SEA1VLAN2):12,12,(SEA1VLAN3):13,13,(SEA1VLAN4),seactlchnlid=14,99,vlans=(SEA1VLANS),netmask=(SEA1NETMASK),gateway=(SEA1GATEWAY),etherchannel=NO,lacp8023ad=NO,vlan8021q=YES,seaattrid=nojumbo
      SEADEF         seadefid=SEA2,networkpriority=S,vswitch=vdca,seavirtid=15,15,(SEA2VLAN1):16,16,(SEA2VLAN2):17,17,(SEA2VLAN3):18,18,(SEA2VLAN4),seactlchnlid=19,98,vlans=(SEA2VLANS),netmask=(SEA2NETMASK),gateway=(SEA2GATEWAY),etherchannel=NO,lacp8023ad=NO,vlan8021q=YES,seaattrid=nojumbo
      SEADEF         seadefid=SEA3,networkpriority=S,vswitch=vdcb,seavirtid=20,20,(SEA3VLAN1):21,21,(SEA3VLAN2):22,22,(SEA3VLAN3):23,23,(SEA3VLAN4),seactlchnlid=24,97,vlans=(SEA3VLANS),netmask=(SEA3NETMASK),gateway=(SEA3GATEWAY),etherchannel=NO,lacp8023ad=NO,vlan8021q=YES,seaattrid=nojumbo
      # SEA PHYSICAL PORTS 
      VARIABLES      (SEA1AGGPORTS_VIO1)=(CEC1)-P1-C6-T2
      VARIABLES      (SEA1AGGPORTS_VIO2)=(CEC1)-P1-C7-T2
      VARIABLES      (SEA2AGGPORTS_VIO1)=(CEC1)-P1-C1-C3-T1
      VARIABLES      (SEA2AGGPORTS_VIO2)=(CEC1)-P1-C1-C4-T1
      VARIABLES      (SEA3AGGPORTS_VIO1)=(CEC1)-P1-C4-T1
      VARIABLES      (SEA3AGGPORTS_VIO2)=(CEC1)-P1-C5-T1
      # SEA ATTR 
      SEAATTR        seaattrid=nojumbo,ha_mode=sharing,largesend=1,large_receive=yes
      
    • I’m defining the physical I/O adapters for each Virtual I/O Server :
    • VARIABLES      (HBASLOTS_VIO1)=(CEC1)-P1-C1-C1,(CEC1)-P1-C2
      VARIABLES      (HBASLOTS_VIO2)=(CEC1)-P1-C1-C2,(CEC1)-P1-C3
      VARIABLES      (ETHSLOTS_VIO1)=(CEC1)-P1-C6,(CEC1)-P1-C1-C3,(CEC1)-P1-C4
      VARIABLES      (ETHSLOTS_VIO2)=(CEC1)-P1-C7,(CEC1)-P1-C1-C4,(CEC1)-P1-C5
      VARIABLES      (SASSLOTS_VIO1)=(CEC1)-P1-T9
      VARIABLES      (SASSLOTS_VIO2)=(CEC1)-P1-C19-T1
      VARIABLES      (NPIVFCPORTS_VIO1)=(CEC1)-P1-C1-C1-T1,(CEC1)-P1-C1-C1-T2,(CEC1)-P1-C1-C1-T3,(CEC1)-P1-C1-C1-T4,(CEC1)-P1-C2-T1,(CEC1)-P1-C2-T2,(CEC1)-P1-C2-T3,(CEC1)-P1-C2-T4
      VARIABLES      (NPIVFCPORTS_VIO2)=(CEC1)-P1-C1-C2-T1,(CEC1)-P1-C1-C2-T2,(CEC1)-P1-C1-C2-T3,(CEC1)-P1-C1-C2-T4,(CEC1)-P1-C3-T1,(CEC1)-P1-C3-T2,(CEC1)-P1-C3-T3,(CEC1)-P1-C3-T4
      
    • I’m defining the mksysb image to use and the Virtual I/O Server profiles :
    • BOSINST        bosinstid=viogold,source=mksysb,mksysb=golden-vios-2234-29122014-mksysb,spot=golden-vios-2234-29122014-spot,bosinst_data=no_prompt_hdisk0-bosinst_data,accept_licenses=yes,boot_client=no
      
      PARTITIONDEF   partitiondefid=vioPartition,bosinstid=viogold,lpar_env=vioserver,proc_mode=shared,min_proc_units=0.4,desired_proc_units=1,max_proc_units=16,min_procs=1,desired_procs=4,max_procs=16,sharing_mode=uncap,uncap_weight=255,min_mem=1024,desired_mem=8192,max_mem=12288,mem_mode=ded,max_virtual_slots=500,all_resources=0,msp=1,allow_perf_collection=1
      PARTITION      name=(VIO1_LPARNAME),profile_name=(VIO1_PROFILE),partitiondefid=vioPartition,lpar_netboot=(NIM_IP),(vio1_hostname),(VIO1_IP),(NIMNETMASK),(NIMGATEWAY),(NIMPORT_VIO1),(NIM_SPEED),(NIM_DUPLEX),NA,YES,NO,NA,NA
      PARTITION      name=(VIO2_LPARNAME),profile_name=(VIO2_PROFILE),partitiondefid=vioPartition,lpar_netboot=(NIM_IP),(vio2_hostname),(VIO2_IP),(NIMNETMASK),(NIMGATEWAY),(NIMPORT_VIO2),(NIM_SPEED),(NIM_DUPLEX),NA,YES,NO,NA,NA
      
    • Let’s now check a configuration file for a specific machine (as you can see I’m putting the Virtual I/O Server name here, the ip address and all that is specific to the new machines (CEC serial number and so on)) :
    • # cat P720-8202-E4D-1.conf
      (BUILDHMC)=myhmc
      (SERVERNAME)=P720-8202-E4D-1
      (CEC1)=WZSKM8U
      (VIO1_LPARNAME)=labvios1
      (VIO2_LPARNAME)=labvios2
      (VIO1_IP)=10.14.14.1
      (VIO2_IP)=10.14.14.2
      (NIMGATEWAY)=10.14.14.254
      (VIODNS)=10.10.10.1,10.10.10.2
      (VIOSEARCH)=lab.chmod666.org,prod.chmod666.org
      (VIODOMAIN)=chmod666.org
      
    • We are now ready to build the new machine. The first thing to do is to create the vswitches on the machine (you have to confirm all operations) :
    • # ./buildframe.ksh.v3.24.2 -p p720 -c P720-8202-E4D-1.conf -f vswitch
      150121162625 Start of buildframe DATE: (150121162625) VERSION: v3.24.2
      150121162625        profile: p720.conf
      150121162625      operation: FRAMEvswitch
      150121162625 partition list:
      150121162625   program name: buildframe.ksh.v3.24.2
      150121162625    install dir: /export/nim/provision
      150121162625    post script:
      150121162625          DEBUG: 0
      150121162625         run ID: 150121162625
      150121162625       log file: work_area/150121162625_p720.conf.log
      150121162625 loading configuration file: config_files/P720-8202-E4D-1.conf
      [..]
      Do you want to continue?
      Please enter Y or N Y
      150121162917 buildframe is done with return code 0
      
    • Let’s now build the Virtual I/O Servers, create the Shared Ethernet Adapters and let’s have a coffee ;-)
    • # ./buildframe.ksh.v3.24.2 -p p720 -c P720-8202-E4D-1.conf -f build
      [..]
      150121172320 Creating partitions
      150121172320                 --> labvios1
      150121172322                 --> labvios2
      150121172325 Updating partition profiles
      150121172325   updating VETH adapters in partition: labvios1 profile: default_profile
      150121172329   updating VETH adapters in partition: labvios1 profile: default_profile
      150121172331   updating VETH adapters in partition: labvios1 profile: default_profile
      150121172342   updating VETH adapters in partition: labvios2 profile: default_profile
      150121172343   updating VETH adapters in partition: labvios2 profile: default_profile
      150121172344   updating VETH adapters in partition: labvios2 profile: default_profile
      150121172345   updating IOSLOTS in partition: labvios1 profile: default_profile
      150121172347   updating IOSLOTS in partition: labvios2 profile: default_profile
      150121172403 Configuring NIM for partitions
      150121172459 Executing--> lpar_netboot   -K 255.255.255.0 -f -t ent -l U78AA.001.WZSKM8U-P1-C6-T1 -T off -D -s auto -d auto -S 10.20.20.1 -G 10.14.14.254 -C 10.14.14.1 labvios1 default_profile s00ka9936774-8202-E4D-845B2CV
      150121173247 Executing--> lpar_netboot   -K 255.255.255.0 -f -t ent -l U78AA.001.WZSKM8U-P1-C7-T1 -T off -D -s auto -d auto -S 10.20.20.1 -G 10.14.14.254 -C 10.14.14.2 labvios2 default_profile s00ka9936774-8202-E4D-845B2CV
      150121174028 buildframe is done with return code 0
      
    • After the mksysb is deployed you can tail the logs on each Virtual I/O Server to check what is going on :
    • [..]
      150121180520 creating SEA for virtID: ent4,ent5,ent6,ent7
      ent21 Available
      en21
      et21
      150121180521 Success: running /usr/ios/cli/ioscli mkvdev -sea ent1 -vadapter ent4,ent5,ent6,ent7 -default ent4 -defaultid 10 -attr ctl_chan=ent8  ha_mode=sharing largesend=1 large_receive=yes, rc=0
      150121180521 found SEA ent device: ent21
      150121180521 creating SEA for virtID: ent9,ent10,ent11,ent12
      [..]
      ent22 Available
      en22
      et22
      150121180523 Success: running /usr/ios/cli/ioscli mkvdev -sea ent20 -vadapter ent9,ent10,ent11,ent12 -default ent9 -defaultid 15 -attr ctl_chan=ent13  ha_mode=sharing largesend=1 large_receive=yes, rc=0
      150121180523 found SEA ent device: ent22
      150121180523 creating SEA for virtID: ent14,ent15,ent16,ent17
      [..]
      ent23 Available
      en23
      et23
      [..]
      150121180540 Success: /usr/ios/cli/ioscli cfgnamesrv -add -ipaddr 10.10.10.1, rc=0
      150121180540 adding DNS: 10.10.10.1
      150121180540 Success: /usr/ios/cli/ioscli cfgnamesrv -add -ipaddr 10.10.10.2, rc=0
      150121180540 adding DNS: 159.50.203.10
      150121180540 adding DOMAIN: lab.chmod666.org
      150121180541 Success: /usr/ios/cli/ioscli cfgnamesrv -add -dname fr.net.intra, rc=0
      150121180541 adding SEARCH: lab.chmod666.org prod.chmod666.org
      150121180541 Success: /usr/ios/cli/ioscli cfgnamesrv -add -slist lab.chmod666.org prod.chmod666.org, rc=0
      [..]
      150121180542 Success: found fcs device for physical location WZSKM8U-P1-C2-T4: fcs3
      150121180542 Processed the following FCS attributes: fcsdevice=fcs4,fcs5,fcs6,fcs7,fcs0,fcs1,fcs2,fcs3,fcsattrid=fcsAttributes,port=WZSKM8U-P1-C1-C1-T1,WZSKM8U-P1-C1-C1-T2,WZSKM8U-P1-C1-C1-T3,WZSKM8U-P1-C1-C1-T4,WZSKM8U-P1-C2-T1,WZSKM8U-P1-C2-T2,WZSKM8U-P1-C2-T3,WZSKM8U-P1-C2-T4,max_xfer_size=0x100000,num_cmd_elems=2048
      150121180544 Processed the following FSCSI attributes: fcsdevice=fcs4,fcs5,fcs6,fcs7,fcs0,fcs1,fcs2,fcs3,fscsiattrid=fscsiAttributes,port=WZSKM8U-P1-C1-C1-T1,WZSKM8U-P1-C1-C1-T2,WZSKM8U-P1-C1-C1-T3,WZSKM8U-P1-C1-C1-T4,WZSKM8U-P1-C2-T1,WZSKM8U-P1-C2-T2,WZSKM8U-P1-C2-T3,WZSKM8U-P1-C2-T4,fc_err_recov=fast_fail,dyntrk=yes
      [..]
      150121180546 Success: found device U78AA.001.WZSKM8U-P2-D4: hdisk0
      150121180546 Success: found device U78AA.001.WZSKM8U-P2-D5: hdisk1
      150121180546 Mirror hdisk0 -->  hdisk1
      150121180547 Success: extendvg -f rootvg hdisk1, rc=0
      150121181638 Success: mirrorvg rootvg hdisk1, rc=0
      150121181655 Success: bosboot -ad hdisk0, rc=0
      150121181709 Success: bosboot -ad hdisk1, rc=0
      150121181709 Success: bootlist -m normal hdisk0 hdisk1, rc=0
      150121181709 VIOmirror <- rc=0
      150121181709 VIObuild <- rc=0
      150121181709 Preparing to reboot in 10 seconds, press control-C to abort
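
To make the profile/config split more concrete, here is a minimal sketch of how the `(NAME)` placeholders seen in the files above could be resolved. It is NOT the toolkit's code (buildframe is a ksh script whose internals are not shown in this post); the function names and the override rule (per-machine config wins over the profile's NA values) are assumptions for illustration only.

```python
import re

PLACEHOLDER = re.compile(r"\(([A-Za-z0-9_]+)\)")


def load_variables(profile_text, config_text):
    """Collect (NAME)=value pairs; per-machine config values override the profile."""
    variables = {}
    for line in profile_text.splitlines():
        fields = line.split(None, 1)
        if len(fields) == 2 and fields[0] == "VARIABLES":
            name, _, value = fields[1].partition("=")
            variables[name.strip("()")] = value.strip()
    for line in config_text.splitlines():
        if line.startswith("("):
            name, _, value = line.partition("=")
            variables[name.strip("()")] = value.strip()
    return variables


def resolve(value, variables, depth=0):
    """Expand (NAME) references recursively; unknown names are left untouched."""
    if depth > 20:
        raise ValueError("placeholder loop detected")

    def substitute(match):
        name = match.group(1)
        if name not in variables:
            return match.group(0)
        return resolve(variables[name], variables, depth + 1)

    return PLACEHOLDER.sub(substitute, value)


# A few lines taken from the profile and config files shown above.
PROFILE = "\n".join([
    "VARIABLES      (VIO1_LPARNAME)=NA",
    "VARIABLES      (vio1_hostname)=(VIO1_LPARNAME)",
    "VARIABLES      (NIMPORT_VIO1)=(CEC1)-P1-C6-T1",
])
CONFIG = "\n".join([
    "(CEC1)=WZSKM8U",
    "(VIO1_LPARNAME)=labvios1",
])

variables = load_variables(PROFILE, CONFIG)
print(resolve("(vio1_hostname)", variables))  # prints "labvios1"
print(resolve("(NIMPORT_VIO1)", variables))   # prints "WZSKM8U-P1-C6-T1"
```

The real toolkit clearly does more than this (the lpar_netboot lines in the log carry a full location code such as U78AA.001.WZSKM8U-P1-C6-T1), so take this only as a way to read the two files together.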
      

    The new server was deployed with one command, and you avoid any manual mistake by using the toolkit. The example above is just one of the many ways to use the toolkit. This is a very powerful and simple tool and I really want to see other European customers using it, so ask your IBM pre-sales, ask for a PowerCare and take control of your deployments by using the toolkit. The toolkit is also used to capture and redeploy a whole frame for disaster recovery plans.
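
If you want to know where the time goes during a build, the 12-digit prefix of each log line shown earlier can be used. The sketch below assumes this prefix is a YYMMDDHHMMSS timestamp (it matches the run ID and the date of the run, but this format is my reading of the log, not documented here); the function names are hypothetical.

```python
from datetime import datetime


def parse_stamp(line):
    """Return the datetime encoded in the 12-digit prefix of a log line, or None."""
    prefix = line.split(" ", 1)[0]
    if len(prefix) == 12 and prefix.isdigit():
        return datetime.strptime(prefix, "%y%m%d%H%M%S")
    return None


def slowest_steps(lines, top=3):
    """Pair each stamped line with the seconds elapsed since the previous stamped line."""
    steps, previous = [], None
    for line in lines:
        text = line.strip()
        stamp = parse_stamp(text)
        if stamp is None:
            continue  # device names such as "ent21 Available" carry no timestamp
        if previous is not None:
            steps.append(((stamp - previous).total_seconds(), text))
        previous = stamp
    return sorted(steps, reverse=True)[:top]


LOG = [
    "150121180547 Success: extendvg -f rootvg hdisk1, rc=0",
    "150121181638 Success: mirrorvg rootvg hdisk1, rc=0",
    "150121181655 Success: bosboot -ad hdisk0, rc=0",
]
print(slowest_steps(LOG, top=1))
# prints [(651.0, '150121181638 Success: mirrorvg rootvg hdisk1, rc=0')]
```

On the three lines taken from the log above, the rootvg mirroring is the long step : about 11 minutes between the extendvg and the mirrorvg success messages.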

    Live Partition Mobility Automation Tool

    Because understanding the provisioning toolkit didn't take me one full week, we still had plenty of time with Bonnie from STG Lab Services, and we decided to give a try to another tool called the Live Partition Mobility Automation Tool. I'll not talk about it in detail, but this tool allows you to automate your Live Partition Mobility moves. It's a web interface coming with a tomcat server that you can run on a Linux server or directly on your laptop. This web application takes control of your Hardware Management Console and allows you to do a lot of LPM related things :

    • You can run a validation on every partition on a system.
    • You can move your partitions by spreading or packing them on destination servers.
    • You can "record" a move to replay it later (very very very useful for my previous customer for instance, we were making our moves by clients, all clients were hosted on two big P795)
    • You can run a dynamic platform optimizer after the moves.
    • You have an option to move the partitions back to their original location, and this is (in my humble opinion) what makes this tool so powerful.

    lpm_toolkit

    Since I have had this tool I'm running a validation of all my partitions on a weekly basis to check if there are any errors. I'm now also using it to move the partitions, and move them back, when I have to. So I really recommend the Live Partition Mobility Automation Tool.

    Hardware Management Console 8 : Other new features

    Adding a VLAN to an already existing Shared Ethernet Adapter

    With the new Hardware Management Console you can easily add a new vlan to an already existing Shared Ethernet Adapter (failover or sharing, with or without control channel : no restriction) without having to perform a dlpar operation on each Virtual I/O Server and then modify your profiles (if you do not have the synchronization enabled). Even better, by using this method to add your new vlans you will avoid any misconfiguration, for instance forgetting to add the vlan on one of the Virtual I/O Servers or not choosing the same adapter on both sides.

    • Open the Virtual Network page in the HMC and click "Add a Virtual Network". You have to remember that a Virtual Network Bridge is a Shared Ethernet Adapter, and a Load Balance Group is a pair of virtual adapters with the same PVID on both Virtual I/O Servers :
    • add_vlan5

    • Choose the name of your vlan (in my case VLAN3331), then choose bridged network (bridged network is the new name for Shared Ethernet Adapters ...), choose "yes" for vlan tagging, and put the vlan id (in my case 3331). By choosing the virtual switch, the HMC will only let you choose a Shared Ethernet Adapter configured in the virtual switch (no mistake possible). DO NOT forget to check the box "Add new virtual network to all Virtual I/O servers" to add the vlan on both sides :
    • add_vlan

    • On the next page you have to choose the Shared Ethernet Adapter on which the vlan will be added (in my case this is super easy, I ALWAYS create one Shared Ethernet Adapter per virtual switch to avoid misconfiguration and network loops created by adding the same vlan id on two different Shared Ethernet Adapters) :
    • add_vlan2

    • At last choose or create a new "Load Sharing Group". A load sharing group is one of the virtual adapters of your Shared Ethernet Adapter. In my case my Shared Ethernet Adapter was created with two virtual adapters with id 10 and 11. On this screenshot I'm telling the HMC to add the new vlan on the adapter with the id 10 on both Virtual I/O Servers. You can also create a new virtual adapter to be included in the Shared Ethernet Adapter by choosing "Create a new load sharing group" :
    • add_vlan3

    • Before applying the configuration a summary is displayed so that you can check the changes :
    • add_vlan4
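
The misconfiguration mentioned in the steps above (the same vlan id bridged by two different Shared Ethernet Adapters on the same virtual switch) is easy to check on paper before adding a vlan. This is a minimal sketch working on a hand-written inventory : the SEA names and the data structure are hypothetical, and nothing here queries the HMC.

```python
def find_vlan_conflicts(seas):
    """Return {(vswitch, vlan): [sea names]} for vlans bridged by several SEAs on one vswitch."""
    owners = {}
    for sea in seas:
        for vlan in sea["vlans"]:
            owners.setdefault((sea["vswitch"], vlan), []).append(sea["name"])
    return {key: names for key, names in owners.items() if len(names) > 1}


# Hypothetical inventory : two SEAs on vswitch vdct both carrying vlan 3331.
INVENTORY = [
    {"name": "sea_a", "vswitch": "vdct", "vlans": [401, 402, 3331]},
    {"name": "sea_b", "vswitch": "vdct", "vlans": [403, 3331]},
    {"name": "sea_c", "vswitch": "vdca", "vlans": [100, 3331]},
]
print(find_vlan_conflicts(INVENTORY))  # prints {('vdct', 3331): ['sea_a', 'sea_b']}
```

The same vlan id on a different virtual switch (sea_c here) is not reported, which matches the one-SEA-per-vswitch habit described above.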

    Partition Templates

    You can also use templates to capture and create partitions, not just systems. I'll not give you all the details because the HMC is well documented for this part and there is nothing tricky to do, just follow the GUI. One more time the HMC8 is for the noobs \o/. Here are a few screenshots of partition templates (capture and deploy) :

    create_part2
    create_part6

    A new and nice look and feel for the new Hardware Management Console

    Everybody knows that the HMC GUI is not very nice but it's working great. One of the major new things in HMC 8R8.2.0 is the new GUI. In my opinion the new GUI is awesome, the design is nice and I love it. Look at the pictures below :

    hmc8
    virtual_network_diagram

    Conclusion

    The Hardware Management Console 8 is still young but offers a lot of cool new features like system and partition templates, a performance dashboard and a new GUI. In my opinion the new GUI is slow and there are a lot of bugs for the moment; my advice is to use it when you have the time, not in a rush. Learn the new HMC on your own by trying to do all the common tasks with the new GUI (there are still things that are impossible to do ;-)). I can assure you that you will need more than a few hours to become familiar with all those new features. And don't forget to call your pre-sales to get a demonstration of the STG toolkits : both the provisioning and the LPM ones are awesome. Use them !

    What is going on in this world

    This blog is not and will never be the place for political things, but after the darkest days we had in France two weeks ago with these insane and inhuman terrorist attacks I had to say a few words about it (because even if my whole life is about AIX for the moment, I'm also a human being .... in case you doubted it). Since the tragic death of 17 men and women in France everybody is raising his voice to tell us (me ?) what is right and what is wrong without thinking seriously about it. Things like this terrorist attack should never happen again. I just wanted to say that I'm for liberty, not only for the "liberty of expression", but just liberty. While defending this liberty we have to be very careful, because in the name of this defense the things done by our government may take what we call liberty away from us forever. Are the phone and the internet going to be tapped and logged in the name of liberty ? Is this liberty ? Think about it and resist.

    Updating and backing up Virtual I/O Servers with NIM : Story of APARs IV46060, IV????? and IV?????

    I recently had to find the best solution to update a bunch of Virtual I/O Servers at a time. For a couple of months I have been using new NIM features such as DSM intensively, and my first thought was to use NIM to update all my Virtual I/O Servers. You’ve probably noticed that a new operation called “updateios” exists in the latest NIM version. With this new operation come two new types, vios (a Virtual I/O Server machine) and ios_mksysb (a mksysb created by the backupios command on the Virtual I/O Server). I’m probably the only guy using this because at the time of writing this post the updateios operation does not work. For IBMers who are reading this post, I had the chance to work with the French L3 Virtual I/O Server support on two PMRs (a big thanks to them for their skills and efficiency), you can have a look at them :

    • PMR 84369,664,706 : NIM updateios operation hanging on NIM master, resulting in two APARs (IV????? and IV?????) (these two APARs are still in validation at the time of writing).
    • PMR 84152,664,706 : NIM updateios problem with /usr/lpp/bos.sysmgt/nim/methods/c_updateios resulting in one APAR (IV46060) (http://www-01.ibm.com/support/docview.wss?crawler=1&uid=isg1IV46060).

    After a few weeks of work with the support we finally found two workarounds for these problems. This post explains the solutions we found together. If there is one lesson to remember from this post, it is this one : “Always subscribe to SWMA support because they are damn brilliant”.

    Defining Virtual I/O Server object

    If you are reading this post I hope you’ve already read my post about NIM Less known features. If you have no time to read it, here is a reminder. Before running any operation on a Virtual I/O Server, you have to create the management objects associated with it :

    • Create the HMC object :
    • # dpasswd -f myhmc_passwd -U hscroot
      Password file is /etc/ibm/sysmgt/dsm/config/myhmc_passwd
      Password:
      Re-enter password:
      Password file created.
      # dkeyexch -f /etc/ibm/sysmgt/dsm/config/myhmc_passwd -I hmc -H myhmc
      OpenSSH_6.0p1, OpenSSL 0.9.8x 10 May 2012
      # nim -o define -t hmc -a if1="find_net myhmc 0" -a passwd_file=/etc/ibm/sysmgt/dsm/config/myhmc_passwd myhmc
      
    • Create the CEC object, I’m using in this example the nimquery command to find serial number and machine type :
    • # nimquery -a hmc=myhmc -p | grep ^CEC
      [..]
      CEC SERVER1 - 8202-E4B_6565655 :
      CEC SERVER2 - 8205-E6B_0606065 :
      [..]
      # nim -o define -t cec -a hw_type=8202 -a hw_model=E4B -a hw_serial=6565655 -a mgmt_source=myhmc SERVER1 
      
    • Create the vios object, I’m using in this example the nimquery command to find the identity field :
    • # nimquery -a cec=SERVER1 -p
      [..]
      LPAR my_vios - lpar_id 2 :
              allow_perf_collection = 1
              auto_start = 0
              curr_lpar_proc_compat_mode = POWER7
              curr_profile = my_vios
              default_profile = my_vios
              desired_lpar_proc_compat_mode = default
              logical_serial_num = 6565655
              lpar_avail_priority = 191
              lpar_env = vioserver
              lpar_id = 2
              lpar_keylock = norm
              msp = 1
              name = my_vios
              os_version = VIOS 2.2.2.1
              power_ctrl_lpar_ids = none
              redundant_err_path_reporting = 0
              resource_config = 1
              rmc_ipaddr = 10.10.20.107
              rmc_state = active
              shared_proc_pool_util_auth = 1
              state = Running
              time_ref = 0
              work_group_id = none
      [..]
      # nim -o define -t vios -a if1="1020-10-10-20-0-s24-net my_vios 0" -a mgmt_source="SERVER1" -a identity=2  my_vios
      
    • Check that everything is OK with the lsnim command :
    • # lsnim -t hmc
      my_hmc      management       hmc
      # lsnim -t cec
      SERVER2     management       cec
      # lsnim -t vios
      my_vios           management       vios
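
When you have more than a couple of frames, typing the cec definitions by hand gets boring. Here is a minimal sketch, assuming the nimquery output keeps the "CEC name - type-model_serial :" layout shown above. The function name gen_cec_defines is mine (it is not a NIM command) and it only prints the nim commands, it does not run them :

```shell
# Hypothetical helper (not part of NIM): read nimquery output on stdin and
# print one "nim -o define -t cec" command per CEC line. Nothing is executed.
gen_cec_defines() {
    hmc="$1"
    # Keep only lines such as: CEC SERVER1 - 8202-E4B_6565655 :
    grep '^CEC ' | while read -r keyword name dash mtms trailer; do
        hw_type=${mtms%%-*}      # text before the first "-"   (8202)
        remainder=${mtms#*-}     # text after the first "-"    (E4B_6565655)
        hw_model=${remainder%%_*}   # text before the "_"      (E4B)
        hw_serial=${remainder#*_}   # text after the "_"       (6565655)
        printf 'nim -o define -t cec -a hw_type=%s -a hw_model=%s -a hw_serial=%s -a mgmt_source=%s %s\n' \
            "$hw_type" "$hw_model" "$hw_serial" "$hmc" "$name"
    done
}
```

On the NIM master it could be used as "nimquery -a hmc=myhmc -p | gen_cec_defines myhmc", review the printed commands and then paste the ones you want.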
      

    Setting up a Virtual I/O Server as a NIM client

    Only a few people know that a Virtual I/O Server can be set up as a NIM client. Remember that you should never have to use oem_setup_env to perform administration tasks on a Virtual I/O Server. To set up a Virtual I/O Server as a NIM client use a special command called remote_management as padmin. It’s the niminit command for a Virtual I/O Server. Keep in mind that remote_management sets up the NIM client to use the nimsh protocol (this is important for the rest of this post) :

    • You will probably have to add the NIM servers’ entries to your /etc/hosts file :
    • # hostmap -addr 10.10.20.140 -host my_nim1 my_nim1.lab.chmod666.org
      # hostmap -addr 10.10.20.141 -host my_nim2 my_nim2.lab.chmod666.org
      
    • Enable remote_management :
    • # remote_management -interface en0 my_nim1
      nimsh:2:wait:/usr/bin/startsrc -e "LIBPATH=/usr/lib" -g nimclient >/dev/console 2>&1
      0513-059 The nimsh Subsystem has been started. Subsystem PID is 7340278.
      
    • If you have to disable remote_management use the disable option :
    • # remote_management -disable
      0513-044 The nimsh Subsystem was requested to stop.
      
    • Check nimsh is running :
    • # ps -ef | grep nimsh
          root 5767198 5963976   0   Aug 23      -  0:00 /usr/sbin/nimsh -s
      

    Backing up a Virtual I/O Server by creating an ios_mksysb resource

    Before updating the Virtual I/O Server create an ios_mksysb. Most PowerVM administrators run a script from the Virtual I/O Server, but you can now invoke the backupios command from the NIM server. You can do this for all your Virtual I/O Servers and store the ios_mksysb on the NIM server, which is much easier than running a command on the Virtual I/O Server and mounting an NFS share on it …. :

    # nim -o define -t ios_mksysb -a source=my_vios -a location=/export/nim/mksysb/my_vios/my_vios-ios_mksysb  -a server=master -a mk_image=yes my_vios-ios_mksysb
    +---------------------------------------------------------------------+
                    System Backup Image Space Information
                  (Sizes are displayed in 1024-byte blocks.)
    +---------------------------------------------------------------------+
    Required = 7316181 (7145 MB)    Available = 386230180 (377178 MB)
    
    
    /tmp/7274624.mnt0/myvios-ios_mksysb  doesn't exist.
    
    Creating /tmp/7274624.mnt0/myvios-ios_mksysb
    Backup in progress.  This command can take a considerable amount of time
    to complete, please be patient...
    
    
    Creating information file (/image.data) for rootvg.
    
    Creating list of files to back up.
    ....
    Backing up 169631 files............
    51526 of 169631 files (30%)..............................
    155443 of 169631 files (91%)..
    
    169631 of 169631 files (100%)
    0512-038 savevg: Backup Completed Successfully.
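
If you have a lot of Virtual I/O Servers, the same define can be generated for each of them. This is only a sketch : gen_ios_mksysb_defines is a name I made up, the location layout simply mirrors the command above, and the function prints the commands instead of running them so you can check them first :

```shell
# Hypothetical helper: print one "nim -o define -t ios_mksysb" command per
# Virtual I/O Server name given on the command line. Nothing is executed.
gen_ios_mksysb_defines() {
    backup_root="$1"   # for example /export/nim/mksysb
    shift
    for vios in "$@"; do
        resource="${vios}-ios_mksysb"
        printf 'nim -o define -t ios_mksysb -a source=%s -a location=%s/%s/%s -a server=master -a mk_image=yes %s\n' \
            "$vios" "$backup_root" "$vios" "$resource" "$resource"
    done
}
```

For example "gen_ios_mksysb_defines /export/nim/mksysb my_vios my_vios2" prints two commands, one per Virtual I/O Server (my_vios2 is just an example name).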
    

    While this command is running you can have a look at the Virtual I/O Server. By “proctreeing” the nimsh process you can check that the backupios command is running with the mksysb flag :

    # proctree -a  9240678
    1    /etc/init
       3342492    /usr/sbin/srcmstr
          5046448    /usr/sbin/nimsh -s
             10813570    /usr/sbin/nimsh -s
                6160534    /bin/ksh /usr/lpp/bos.sysmgt/nim/methods/c_nimpush /usr/lpp/bos.sysmgt/nim/meth
                   7274624    /bin/ksh /usr/lpp/bos.sysmgt/nim/methods/c_backupios -aserver=my_nim1 -al
                      9240678    /usr/ios/cli/ioscli backupios -file /tmp/7274624.mnt0/my_vios-ios_mksysb -mk
                         10158278    /bin/ksh /usr/bin/savevg -X -i -f /tmp/7274624.mnt0/my_vios-ios_mksysb rootv
                            8585348    /bin/ksh /usr/bin/savevg -X -i -f /tmp/7274624.mnt0/my_vios-ios_mksysb rootv
                               10223832    /usr/bin/sleep 10
                            9764964    /usr/bin/cat /tmp/mksysb.10158278/.archive.list.10158278
                            11337872    backbyname -i -q -v -Z -p -U -f /tmp/7274624.mnt0/my_vios-ios_mksysb
    

    After the ios_mksysb creation you can check the source and the ioslevel of your backup :

    # lsnim -l my_vios-ios_mksysb
    my_vios-ios_mksysb:
       class         = resources
       type          = ios_mksysb
       arch          = power
       Rstate        = ready for use
       prev_state    = unavailable for use
       location      = /export/nim/mksysb/my_vios/my_vios-ios_mksysb
       version       = 6
       release       = 1
       mod           = 8
       oslevel_r     = 6100-07
       alloc_count   = 0
       server        = master
       creation_date = Mon Sep 30 11:52:35 2013
       source_image  = my_vios
       ioslevel      = 2.2.2.1
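
If you want to check the level of a backup in a script, the "attribute = value" layout of lsnim -l is easy to parse. A small sketch, assuming the output format shown above (nim_attr is a hypothetical name, not a NIM command) :

```shell
# Hypothetical helper: read "lsnim -l <object>" output on stdin and print
# the value of one attribute (the text after "attribute = ").
nim_attr() {
    attr="$1"
    sed -n "s/^[[:space:]]*${attr}[[:space:]]*=[[:space:]]*//p"
}
```

Something like "lsnim -l my_vios-ios_mksysb | nim_attr ioslevel" should then print only the ioslevel of the backup.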
    

    Committing existing updates on the Virtual I/O Server with updateios operation.

    Commit all uncommitted updates on the Virtual I/O Server. The NIM command will invoke the “ioscli updateios -commit” command on the Virtual I/O Server. Remember to remove all ifixes/efixes before committing (use emgr) :

    # /usr/sbin/emgr -r -L IV16920s02
    # nim -o updateios -a lpp_source=vios2223-fp26-sp02-lpp_source  -a accept_licenses=yes -a preview=no -a updateios_flags="-commit" -a force=yes my_vios
    

    Updating Virtual I/O Server with updateios operation.

    First of all, if the Virtual I/O Server is a member of a Shared Storage Pool cluster it can’t be updated. Stop the cluster services on this node before running the update :

    #  clstartstop -stop -n my_cluster -m my_vios
    

    You will face two problems when updating a Virtual I/O Server from NIM with the updateios operation. Running an updateios operation from the NIM server calls the script /usr/lpp/bos.sysmgt/nim/methods/c_updateios on the Virtual I/O Server. If you perform the updateios operation it will fail with this output :

    # nim -o updateios -a lpp_source=vios2223-fp26-sp02-lpp_source  -a accept_licenses=yes -a preview=no -a updateios_flags="-install" -a force=yes my_vios
    [..]
    ******************************************************************************
    End of installp PREVIEW.  No apply operation has actually occurred.
    ******************************************************************************
    
    Continue bos.rte.install installation [y|n]?
    [..]
    ******************************************************************************
    End of installp PREVIEW.  No apply operation has actually occurred.
    ******************************************************************************
    
    Continue the installation [y|n]?
    Command did not complete.
    

    As you can see in the output, the updateios command is interactive and asks TWO yes/no questions. On the Virtual I/O Server, while the updateios operation is running, you can check that /usr/lpp/bos.sysmgt/nim/methods/c_updateios is called by the nimsh process :

    # proctree 15466556
    4260044    /usr/sbin/srcmstr
       7340280    /usr/sbin/nimsh -s -c
          12451968    /usr/sbin/nimsh -s -c
             15466556    /bin/ksh /usr/lpp/bos.sysmgt/nim/methods/c_nimpush /usr/lpp/bos.sysmgt/nim/meth
                14352628    /bin/ksh /usr/lpp/bos.sysmgt/nim/methods/c_updateios -aaccept_licenses=yes -afo
                   10944754    /usr/ios/cli/ioscli updateios -install -dev /tmp/_nim_dir_14352628/mnt0 -f -acc
                      5374158    installp -e install.log -a -d /tmp/_nim_dir_14352628/mnt0 bos.rte.install
                         9961620    installp -e install.log -a -d /tmp/_nim_dir_14352628/mnt0 bos.rte.install
    

    If you edit /usr/lpp/bos.sysmgt/nim/methods/c_updateios you can see at line 130 that ‘y’ is sent only once :

    # vi /usr/lpp/bos.sysmgt/nim/methods/c_updateios
    [..]
                    -install)
                            argument="-install -dev $lpp_access ${force:+-f} ${accept_licenses:+-accept}"
                            if [[ $preview = "no" ]]; then
                                    command="eval echo 'y' | /usr/ios/cli/ioscli updateios $argument"
                            else
                                    command="eval echo 'n' | /usr/ios/cli/ioscli updateios $argument"
                            fi
                            ;;
    [..]
    

    Replace ‘y’ with ‘y\ny’ and the script will send two ‘y’, easy :-) :

    # grep -n eval /usr/lpp/bos.sysmgt/nim/methods/c_updateios | head -1
    130:                            command="eval echo 'y\ny' | /usr/ios/cli/ioscli updateios $argument"
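
The ‘y\ny’ trick works because the echo of ksh expands \n into a newline; as far as I know that is not true for every shell. If you want to try the idea outside of the Virtual I/O Server, here is a small sketch using printf, which expands \n everywhere. ask_twice is only a stand-in for an interactive command asking two questions, it has nothing to do with the real updateios :

```shell
# Stand-in for an interactive command asking two yes/no questions
# (hypothetical, only here so the sketch can run anywhere).
ask_twice() {
    read -r first || return 1    # fails if no answer is available
    read -r second || return 1   # fails if only one answer was sent
    echo "answers: $first $second"
}

# Send two "y" answers to the command given as arguments.
# printf expands \n in every POSIX shell, unlike echo.
feed_two_yes() {
    printf 'y\ny\n' | "$@"
}
```

"feed_two_yes ask_twice" gets both answers, while "printf 'y\n' | ask_twice" fails on the second question, which is the same kind of failure as the “Command did not complete.” above.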
    

    Rerun the NIM operation and the update will start.

    At the end of the installation you will probably face another problem. This one occurs only if the Virtual I/O Server NIM client is using the nimsh protocol. The NIM operation will hang forever on the NIM server : on the Virtual I/O Server a socket remains open between the NIM client and the NIM server :

    # netstat -Aan |grep 3901
    f1000e0001cb2bb8 tcp4       0      0  10.10.20.107.3901   10.10.20.140.1021   ESTABLISHED
    f1000e00098bdbb8 tcp        0      0  *.3901                *.*                   LISTEN
    # rmsock f1000e00098bdbb8 tcpcb
    The socket 0xf1000e00098bd808 is being held by proccess 8126526 (accessprocess).
    #  rmsock f1000e0001cb2bb8 tcpcb
    The socket 0xf1000e0001cb2808 is being held by proccess 12386388 (cimserver).
    #  proctree 12386388
    12386388
       8323090    /usr/ios/lpm/sbin/eventhelper --events ref_code,lpar_state,not_ivm,migration_st
    # proctree 8126526
    15269920    /usr/bin/ksh /usr/ios/lpm/sbin/lparmgr all start
       8126526    /usr/ios/lpm/sbin/accessprocess
    # ps -ef |grep 12386388
        root  8323090 12386388   0 15:44:31      -  0:00 /usr/ios/lpm/sbin/eventhelper --events ref_code,lpar_state,not_ivm,migration_state,vsp_state
        root 12386388        1   0 15:42:56      -  0:16 [cimserve]
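
To find the socket without reading the whole netstat output, the first column (the PCB address that rmsock wants) can be filtered on the nimsh port. A minimal sketch, assuming the "netstat -Aan" column layout shown above (established_pcbs is a hypothetical name) :

```shell
# Hypothetical helper: read "netstat -Aan" output on stdin and print the PCB
# address (first column) of every ESTABLISHED connection whose local
# address ends with the given port.
established_pcbs() {
    port="$1"
    awk -v port="$port" '
        # $5 is the local address (ip.port), the last field is the TCP state
        $NF == "ESTABLISHED" && $5 ~ ("\\." port "$") { print $1 }
    '
}
```

On the Virtual I/O Server "netstat -Aan | established_pcbs 3901" would give the addresses to pass to "rmsock <address> tcpcb".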
    

    The issue was found with the support : climgr, a command called by cimserver, does not correctly close its file descriptors at the end of the update. Modify this script to close all open file descriptors :

    # grep -n exec /usr/ios/sbin/climgr
    366:exec 1<&-
    367:exec 2<&-
    368:exec 5<&-
    

    Rerun the operation and everything will just work fine :-)

    Conclusion

    I assume these two problems will be fixed in the next Virtual I/O Server release, probably not the 2.2.3.0 version but the next one (on average I have to wait 6 months before a fix is applied to the current version). Once again I want to thank the IBM Support for helping me on these cases and for their efficiency. I hope it helps.

    NIM Less known features : HANIM, nimsh over ssl, DSM

    The Network Installation Manager server is one of the most important hosts in an environment. New machine installations, machine backups, backup restorations, software (filesets) and third party product installations, and in some cases volume group backups, are all done from the NIM server. Some best practices have to be respected. I’ll give you in this post a few tricks for NIM. First of all a NIM server has to be in your disaster recovery plan because it is the first server needed when you have to rebuild a crashed machine : my solution is HANIM. It has to be secured (nimsh, and nimsh authentication over SSL), and it has to be flexible and automated (DSM).

    NIM High Availability : HANIM

    Finding documentation and information about NIM High Availability is not so easy. I recommend checking the NIM from A to Z Redbook, it’s one of the only viable sources for HANIM. HANIM is simple to set up and simple to use, but there are a few things to know and to understand about it :

    HANIM Overview

    • The alternate NIM master is a backup NIM built from the NIM master.
    • Takeover operations from master to alternate are manual. PowerHA can be used to run these takeover operations but my advice is not to use it. Takeover can be performed even if the NIM master is down. HANIM does not perform any heartbeat.
    • HANIM only provides a method for replicating the NIM database and resources from master to alternate : with the replicate=yes option the resources data is copied along with the NIM database.
    • My advice is to run every NIM operation from the master (even if it is possible to run a NIM operation from the alternate).
    • Disks are not shared between the master and the alternate. When a sync operation is done, missing resources are copied over NFS from the master to the alternate, or from the alternate to the master. HANIM does not provide a filesystem takeover.
    • A takeover operation modifies the /etc/niminfo file of every NIM client. The NIM_MASTER_HOSTNAME_LIST is modified by the takeover operation and the alternate NIM master is moved to the first position. The NIM_MASTER_HOSTNAME is replaced with the alternate NIM master hostname.


    Initial setup

    On the NIM master and on the alternate NIM master some filesets have to be installed, check the presence of : bos.sysmgt.nim.master, bos.sysmgt.nim.spot, bos.sysmgt.nim.client. The NIM master and the alternate NIM master must be on the same AIX version :

    # lslpp -l | grep -i nim
      bos.sysmgt.nim.client     7.1.2.15  COMMITTED  Network Install Manager -
      bos.sysmgt.nim.master     7.1.2.15  COMMITTED  Network Install Manager -
      bos.sysmgt.nim.spot       7.1.2.15  COMMITTED  Network Install Manager - SPOT
      bos.sysmgt.nim.client     7.1.2.15  COMMITTED  Network Install Manager -
    # oslevel -s
    7100-02-02-1316
    

    Configure the NIM master

    Initialize the NIM master with the nimconfig command, you’ll need to name the first network used by NIM. nimesis daemons will be started at this step.

    # nimconfig -a pif_name=en0 -a netname=10-10-20-0-s24-net -a master_port=1058 -a verbose=3 -a cable_type=N/A
    [..]
    Checking input attributes.
    attr_ass:
            'cpuid' => '00F359164D00'
            'pif_name' => 'en0'
            'netname' => '10-10-20-0-s24-net'
            'master_port' => '1058'
            'cable_type' => 'N/A'
            'net_addr' => '10.10.20.1'
            'snm' => '255.255.255.0'
            'adpt_addr' => '667C70F7A904'
            'adpt_name' => 'ent0'
    Making sure the NIM Master package is OK.
          set_state: id=1361463886; name=; state_attr=85; new_state=5;
       checking the object definition of ;
       checking interface info for master;
    Built NIM infomation file.
          10.10.20.1 is known as nim_master
    Adding default route 10.10.20.254 to network object
             0 - /usr/lpp/bos.sysmgt/nim/methods/m_mknet
             1 - -anet_addr=10.10.20.1
             2 - -asnm=255.255.255.0
             3 - -tent
             4 - -arouting1=default 10.10.20.254
             5 - 10-10-20-0-s24-net
    Connecting NIM master to master network.
             0 - /usr/lpp/bos.sysmgt/nim/methods/m_chmaster
             1 - -aif1=10-10-20-0-s24-net nim_master 667C70F7A904
             2 - -amaster_port=1058
             3 - -aregistration_port=1059
             4 - -acable_type1=N/A
             5 - master
    Adding NIM deamons to SRC and starting....
    0513-071 The nimesis Subsystem has been added.
    0513-071 The nimd Subsystem has been added.
    0513-059 The nimesis Subsystem has been started. Subsystem PID is 9568296.
    [..]
    

    NIM resources such as spot, lpp_source and so on can be created right now, please refer to the NIM cheatsheet by chmod666.org ;-). For the purpose of this post some resources (spot, lpp_source, mksysb, network) are created, these ones will be replicated later.

    Configure the alternate NIM master

    The alternate NIM master is configured with the niminit command. If you check the NIM from A to Z Redbook, page 124, a note warns you about the synchronization : “At the time of writing, only rsh/rshd communication is supported for NIM synchronization.”. THIS STATEMENT IS FALSE : I’m using nimsh for the synchronization, and I recommend using it. We are in 2013, do not use rsh anymore.

    # niminit -a is_alternate=yes -a master=nim_master -a pif_name=en0 -a cable_type1=N/A -a connect=nimsh -a name=nim_alternate
    0513-071 The nimesis Subsystem has been added.
    0513-071 The nimd Subsystem has been added.
    0513-059 The nimesis Subsystem has been started. Subsystem PID is 10944522.
    nimsh:2:wait:/usr/bin/startsrc -g nimclient >/dev/console 2>&1
    0513-044 The nimsh Subsystem was requested to stop.
    0513-059 The nimsh Subsystem has been started. Subsystem PID is 5963998.
    

    Verification

    You’re done with the configuration, you can now start to synchronize, replicate and takeover… pretty easy. Here are some points you can verify :

    • On the NIM master, the attribute is_alternate is set to yes :
    • # lsnim -l master
      [..]
         is_alternate        = yes
      [..]
      
    • On the NIM master, a new machine object typed alternate_master is created :
    • # lsnim -t alternate_master
      nim_alternate     machines       alternate_master
      
    • After the first database synchronization, on the alternate NIM master, a new machine object typed alternate_master is created, this is the NIM master :
    • # lsnim -t alternate_master
      nim_master     machines       alternate_master
      
    • On the alternate NIM master, the attribute is_alternate does not exist :
    • # lsnim -l master | grep alternate
      

    Synchronization and replication

    The NIM master and the alternate NIM master can now communicate with each other, some resources are created on the master, and it’s now time to synchronize. Remember : HANIM only provides a method for replicating the NIM database and resources. You can, if you want, synchronize the NIM database only, or the NIM database and its resources (data included). Remember : never perform a NIM synchronization from the alternate NIM master.

    Database synchronization only

    The database synchronization is useful when objects are modified, for example when you are modifying the subnet mask of a network object. It can also be useful when objects “without files” are created, for instance a machine. On the other hand, if you synchronize the database while an object “with a file” exists, such as an lpp_source, a spot, or an fb_script, this object will not be created on the alternate : you have to copy the file before synchronizing, or use the replicate attribute :

    • On NIM master two objects are created, an fb_script and a machine:
    • # nim -o define -t fb_script -a server=master -a location=/export/nim/others/postinstall/fb_script.ksh fb_script01
      # ls -l /export/nim/others/postinstall/fb_script.ksh
      -rw-r--r--    1 root     system           35 Mar  8 18:01 /export/nim/others/postinstall/fb_script.ksh
      # lsnim ruby
      ruby     machines       standalone
      
    • A database synchronization is performed :
    • # nim -o sync -a force nim_alternate
      [..]
      The level of the NIM master fileset on this machine is: 7.1.2.15
      The level of the NIM database backup is: 7.1.2.15
      [..]
      Checking NIM resources
        Removing fb_script01
          0518-307 odmdelete: 1 objects deleted. from nim_attr (serves attr)
          0518-307 odmdelete: 0 objects deleted. from nim_attr (group memberships)
          0518-307 odmdelete: 5 objects deleted. from nim_attr (resource attributes)
          0518-307 odmdelete: 1 objects deleted. from nim_object (resource object)
        Finished removing fb_script01
      
    • On the alternate NIM master, the machine object is here but the fb_script was not replicated because the file was not present on the alternate NIM master :
    • # lsnim ruby
      ruby     machines       standalone
      # lsnim fb_script01
      0042-053 lsnim: there is no NIM object named "fb_script01"
      
    • If you copy the file before synchronizing, the resource will be created :
    • master# scp fb_script.ksh nim_alternate:/export/nim/others/postinstall
      fb_script.ksh                      100%   35     0.0KB/s   00:00
      
      master# nim -o sync -a force nim_alternate
      [..]
      Restoring the NIM database from /tmp/_nim_dir_13041674/mnt0
      x ./etc/NIM.level, 9 bytes, 1 tape blocks
      [..]
        Keeping fb_script01
      
      alternate# lsnim fb_script01
      fb_script01     resources       fb_script
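
Before a database-only synchronization it can be handy to know which resource files are missing on the alternate. This is just a sketch : missing_files is a hypothetical helper, and you have to give it the list of locations yourself (for instance the location attributes of your resources) :

```shell
# Hypothetical helper: print every path given as argument that does not
# exist on the local machine (run it on the alternate NIM master).
missing_files() {
    for path in "$@"; do
        [ -e "$path" ] || echo "$path"
    done
}
```

Run on the alternate before the sync, "missing_files /export/nim/others/postinstall/fb_script.ksh" would have printed the path of the fb_script, telling you that fb_script01 was going to be removed.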
      

      Synchronization with replication

      I encourage you not to use the database-only synchronization, but to use it with replication : it does the same job but copies the files for you. Much, much easier, just add the replicate=yes attribute to the nim command, it works like a charm :

      # lsnim -q sync nim_alternate
      
      the following attributes are optional:
              -a verbose=
              -a replicate=
              -a reset_clients=
      # nim -o sync -a force=yes -a replicate=yes nim_alternate
      

      Takeover

      If the NIM master is down, a takeover operation allows the alternate NIM master to become the NIM master for the clients. On the clients the /etc/niminfo file is modified (the NIM_MASTER_HOSTNAME and NIM_MASTER_HOSTNAME_LIST attributes are updated).

      • /etc/niminfo file and lsnim output before a takeover operation :
      • client# grep -E "NIM_MASTER_HOSTNAME_LIST|NIM_MASTER_HOSTNAME" /etc/niminfo
        export NIM_MASTER_HOSTNAME=nim_master
        export NIM_MASTER_HOSTNAME_LIST="nim_master nim_alternate"
        master# lsnim -l client | grep current_master
           current_master = nim_master
        
      • Takeover operation is initiated from the alternate NIM master :
      • alternate# nim -o takeover -a show_progress=yes nim_master
        +-----------------------------------------------------------------------------+
                              Performing "reset" Operation
        +-----------------------------------------------------------------------------+
        +-----------------------------------------------------------------------------+
                              "reset" Operation Summary
        +-----------------------------------------------------------------------------+
         Target                  Result
         ------                  ------
         client                   RESET
         client1                  RESET
         [..]
        +-----------------------------------------------------------------------------+
                              Initiating "takeover" Operation
        +-----------------------------------------------------------------------------+
         Initiating the takeover operation on machine 1 of 240: client ...
        
         Initiating the takeover operation on machine 2 of 240: client1...
        [..]
        +-----------------------------------------------------------------------------+
                              "takeover" Operation Summary
        +-----------------------------------------------------------------------------+
         Target                  Result
         ------                  ------
         client                  SUCCESS
         client1                 SUCCESS
        [..]
        alternate# lsnim -l client | grep current_master
           current_master = nim_alternate
        client# grep -E "NIM_MASTER_HOSTNAME_LIST|NIM_MASTER_HOSTNAME" /etc/niminfo
        export NIM_MASTER_HOSTNAME=nim_alternate
        export NIM_MASTER_HOSTNAME_LIST="nim_alternate nim_master"
        
      • When the NIM master is up again, initiate the takeover from the master :
      • # nim -o takeover -a show_progress=yes nim_alternate
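
To check on a client which master it is currently pointing to, the NIM_MASTER_HOSTNAME line of /etc/niminfo is enough. A minimal sketch, assuming the "export NAME=value" format shown above (current_nim_master is a hypothetical name) :

```shell
# Hypothetical helper: print the value of NIM_MASTER_HOSTNAME from a
# niminfo-style file (defaults to /etc/niminfo).
current_nim_master() {
    file="${1:-/etc/niminfo}"
    # The pattern ends with "=" so NIM_MASTER_HOSTNAME_LIST does not match.
    sed -n 's/^export NIM_MASTER_HOSTNAME=//p' "$file" | tr -d '"'
}
```

After the takeover above, "current_nim_master" on the client should print nim_alternate, and nim_master again once the master has taken control back.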
        

      Synchronization automation and other files ?

      I recommend running a NIM synchronization every day, I personally have a cronjob doing it every day at eleven PM. Most of the time a NIM synchronization is not enough and you’ll need to synchronize other files : in my case my root .profile and my /etc/hosts file, in your case whatever you want. For this need I’m using a little script based on rsync which synchronizes my master to my alternate every day :

      # crontab -l
      [..]
      0 23 * * * /export/nim/others/tools/do_sync.ksh >/dev/null 2>&1
      [..]
      # cat /export/nim/others/tools/do_sync.ksh
      [..]
          nim -o sync -a force=yes -a replicate=yes -a reset_clients=yes ${alternate}
          /export/nim/others/tools/sync_to_alternate.ksh
      [..]
      # cat /export/nim/others/tools/sync_to_alternate.ksh
      [..]
        /usr/bin/rsync -ave ssh ${a_filesystem} ${alternate_nim_master}:${a_filesystem}
      [..]
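
One remark on the crontab entry : everything is sent to /dev/null, so a failed synchronization goes unnoticed. Here is a sketch of a small wrapper that keeps a trace instead. It is not what my scripts contain (they are elided above), run_logged and the log path are only assumptions for illustration :

```shell
# Hypothetical wrapper: run a command, append its output and exit status
# to a log file, and return the same exit status to the caller.
run_logged() {
    log="$1"
    shift
    {
        echo "$(date '+%Y-%m-%d %H:%M:%S') START $*"
        "$@"
        rc=$?
        echo "$(date '+%Y-%m-%d %H:%M:%S') END rc=$rc $*"
    } >>"$log" 2>&1
    return "$rc"
}
```

Inside the sync script it could be used as "run_logged /var/log/nim_sync.log nim -o sync -a force=yes -a replicate=yes nim_alternate" (the log file name is an example), then a grep on "END rc=" tells you whether last night's sync worked.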
      

      NIM Security, use nimsh and use it over SSL

      nimsh over ssl

      NIM master configuration for nimsh over SSL

      From the NIM master enable the SSL support through the nimconfig command. Certificates will be generated in /ssl_nimsh/keys, and the OpenSSL filesets have to be installed :

      • Check OpenSSL filesets :
      • # lslpp -l | grep openssl
          openssl.base            0.9.8.2400  COMMITTED  Open Secure Socket Layer
          openssl.license         0.9.8.2400  COMMITTED  Open Secure Socket License
          openssl.man.en_US       0.9.8.2400  COMMITTED  Open Secure Socket Layer
          openssl.base            0.9.8.2400  COMMITTED  Open Secure Socket Layer
        
      • Use nimconfig to enable SSL support :
      • # nimconfig -c
        0513-029 The tftpd Subsystem is already active.
        Multiple instances are not supported.
        NIM_MASTER_HOSTNAME=nim_master
        x - /usr/lib/libssl.so.0.9.8
        x - /usr/lib/libcrypto.so.0.9.8
        Target "all" is up to date.
        Generating a 1024 bit RSA private key
        ......++++++
        .++++++
        writing new private key to '/ssl_nimsh/keys/rootkey.pem'
        -----
        Signature ok
        subject=/C=US/ST=Texas/L=Austin/O=ibm.com/CN=Root CA
        Getting Private key
        Generating a 1024 bit RSA private key
        ...............++++++
        .......++++++
        writing new private key to '/ssl_nimsh/keys/clientkey.pem'
        -----
        Signature ok
        subject=/C=US/ST=Texas/L=Austin/O=ibm.com
        Getting CA Private Key
        Generating a 1024 bit RSA private key
        ......++++++
        .............++++++
        writing new private key to '/ssl_nimsh/keys/serverkey.pem'
        -----
        Signature ok
        subject=/C=US/ST=Texas/L=Austin/O=ibm.com
        Getting CA Private Key
        
      • Check the NIM master : attribute ssl_support is now set to yes :
      • # lsnim -l master | grep ssl_support
           ssl_support         = yes
        

      NIM alternate master for nimsh over SSL

      If you’re using an alternate NIM master repeat the same operation on it (OpenSSL and nimconfig -c). The alternate NIM master is also a client of the NIM master, so its client side has to be configured too :

      # nimclient -c
      x - /usr/lib/libssl.so.0.9.8
      x - /usr/lib/libcrypto.so.0.9.8
      Received 2763 Bytes in 0.0 Seconds
      0513-044 The nimsh Subsystem was requested to stop.
      0513-077 Subsystem has been changed.
      0513-059 The nimsh Subsystem has been started. Subsystem PID is 9502954.
      

      Client configuration

      Configure all NIM clients to use SSL-encrypted authentication. If you are using an alternate NIM master, do not forget to download its certificate on the clients (get_cert operation below) :

      # rmitab nimsh 2>/dev/null 
      # rm -rf /etc/niminfo
      # niminit -aname=$(hostname) -a master=nim_master -a master_port=1058 -a registration_port=1059 -a connect=nimsh
      # nimclient -c
      # nimclient -o get_cert -a master_name=nim_alternate
      # stopsrc -s nimsh
      # startsrc -s nimsh
      

      On the NIM master itself, the client’s connect attribute is now set to “nimsh (secure)” :

      # lsnim -l ruby | grep connect
         connect        = nimsh (secure)
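
To audit several clients at once, the check above can be wrapped in a small helper. This is only a minimal sketch: the function name nimsh_is_secure is hypothetical, it parses lsnim -l output passed on stdin, and it assumes the "connect = nimsh (secure)" format shown above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the "lsnim -l <client>" output read on
# stdin contains a connect attribute set to "nimsh (secure)".
nimsh_is_secure() {
    awk '
        $1 == "connect" && $2 == "=" && $3 == "nimsh" && $4 == "(secure)" { found = 1 }
        END { exit !found }
    '
}

# Demo on the sample line shown above (no NIM master required).
if printf '   connect        = nimsh (secure)\n' | nimsh_is_secure; then
    echo "secure"
else
    echo "not secure"
fi
```

On a real NIM master the helper would be fed with something like lsnim -l ruby | nimsh_is_secure, once per client.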
      

      Are the data encrypted ?

      Check this statement in the NIM from A to Z Redbook, page 434 :

      “Any communication initiated from the NIM client (pull operation) reaches the NIM master on the request for services and registration ports (1058 and 1059, respectively). This communication is not encrypted. For any communication initiated from the NIM master (push operations), the NIM master communicates with the NIM client using the NIMSH daemon. This allows an encrypted handshake dialog during authentication. However, data packets are not encrypted.”

      To sum up :

      • Only push operations can use secure nimsh.
      • Data packets are not encrypted.
      • Secure nimsh just adds an encrypted handshake between the NIM master and its clients.

      Have a look at these two screenshots : the first one is the TCP stream of a non-secure operation, the second one is secured :

      • Non secure tcp stream of a push operation :
      • Secure tcp stream of a push operation :

      Distributed Systems Management

      Distributed Systems Management (we’ll call it DSM from now on) is a set of tools and programs used to enhance NIM capabilities. I personally use DSM for two main purposes : opening and monitoring consoles through the dconsole utility, and automating my installations. DSM adds new objects to the NIM environment, and new attributes to the NIM objects. You can also gain more control over your lpars and directly restart or maint_boot an lpar through NIM by using DSM. Hardware Management Consoles (HMC objects) and pSeries frames (CEC objects) can be added to NIM, and management profiles are added to standalone objects in order to take advantage of DSM with NIM.

      There are two main sources of information for DSM :

      • The dsm.core fileset comes with a pdf file named dsm_tech_note.pdf, page 161, chapter 5.
      • # lslpp -f dsm.core | grep dsm_tech_note.pdf
                                /opt/ibm/sysmgt/dsm/doc/dsm_tech_note.pdf
        
      • There are fully detailed examples in the IBM AIX Version 7.1 Differences Guide.

      Filesets prerequisites

      Starting with AIX 6.1 TL3, the base installation media ship with the DSM packages (dsm.core). expect, tcl, tk and xterm are required by these DSM packages :

      # lslpp -l | grep -E "dsm|tcl|tk|expect|xterm"
        X11.apps.aixterm           7.1.2.0  COMMITTED  AIXwindows aixterm Application
        X11.apps.xterm            7.1.2.15  COMMITTED  AIXwindows xterm Application
        X11.msg.en_US.apps.aixterm
                                   7.1.2.0  COMMITTED  AIXwindows aixterm Messages -
        dsm.core                  7.1.2.15  COMMITTED  Distributed Systems Management
        dsm.dsh                   7.1.2.15  COMMITTED  Distributed Systems Management
        expect.base               5.42.1.0  COMMITTED  Binary executable files of
        expect.man.en_US          5.42.1.0  COMMITTED  Expect man page documentation
        tcl.base                   8.4.7.0  COMMITTED  Binary executable files of Tcl
        tcl.man.en_US              8.4.7.0  COMMITTED  Tcl man page documentation
        tk.base                    8.4.7.0  COMMITTED  Binary executable files of Tk
        tk.man.en_US               8.4.7.0  COMMITTED  Tk man page documentation
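
The same check can be scripted so that a missing prerequisite is reported explicitly. This is a sketch under assumptions: the helper name missing_filesets is hypothetical, and the list of required filesets is simply taken from the output above.

```shell
#!/bin/sh
# Hypothetical helper: reads "lslpp -l" output on stdin and prints every
# fileset name given as an argument that does not appear in it.
missing_filesets() {
    installed=$(awk '{ print $1 }')      # first column holds the fileset name
    for fileset in "$@"; do
        printf '%s\n' "$installed" | grep -qxF "$fileset" || echo "$fileset"
    done
}

# Demo on a shortened copy of the listing above: tk.base is deliberately absent.
sample='  dsm.core                  7.1.2.15  COMMITTED  Distributed Systems Management
  expect.base               5.42.1.0  COMMITTED  Binary executable files of
  tcl.base                   8.4.7.0  COMMITTED  Binary executable files of Tcl'
printf '%s\n' "$sample" | missing_filesets dsm.core expect.base tcl.base tk.base
```

On the NIM master the input would come from lslpp -l itself, for example lslpp -l | missing_filesets dsm.core expect.base tcl.base tk.base X11.apps.xterm; an empty result means everything is installed.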
      

      Defining HMC objects

      DSM uses the HMC to start (poweron) lpars, stop (poweroff) lpars and open consoles on lpars. An HMC can be defined in NIM; an HMC object is a management object. To avoid a password prompt each time a NIM operation is performed, or each time dconsole is called, DSM provides a mechanism to manage SSH key sharing between the NIM master and the HMC. Before adding an HMC object, use the dpasswd and dkeyexch commands to enable SSH key authentication :

      • Create the authentication file with the dpasswd command. By default the file is stored in /etc/ibm/sysmgt/dsm/config :
      • # dpasswd -f hmc1_passwd -U hscroot
        Password:
        Re-enter password:
        Password file created
        # ls -l  /etc/ibm/sysmgt/dsm/config/
        total 24
        -r--r--r--    1 root     system           16 Mar 11 13:25 .key
        -r--r--r--    1 root     system           24 Mar 11 13:25 hmc1_passwd
        
      • Share the key between NIM master and HMC using dkeyexch command :
      • # dkeyexch -f /etc/ibm/sysmgt/dsm/config/hmc1_passwd -I hmc -H hmc1
        OpenSSH_6.0p1, OpenSSL 0.9.8x 10 May 2012
        
      • At this step you should be able to connect to the HMC without password prompting :
      • # ssh hscroot@hmc1
        Last login: Mon Mar 11 13:51:35 2013 from 10.10.20.21
        
      • Define the new HMC object with the nim command. The network on which the HMC is running must be defined as a NIM network :
      • # nim -o define -t ent -a net_addr=10.10.30.0 -a snm=255.255.254.0 -a routing1="default 10.10.31.254" 10-10-30-0-s23-net
        # nim -o define -t hmc -a if1="find_net hmc1 0" -a passwd_file=/etc/ibm/sysmgt/dsm/config/hmc1_passwd hmc1
        # lsnim -t hmc
        hmc1     management       hmc
        # lsnim -lF hmc1
        hmc1:
           id          = 1363005068
           class       = management
           type        = hmc
           if1         = 10-10-30-0-s23-net hmc1 0
           Cstate      = ready for a NIM operation
           prev_state  =
           Mstate      = not running
           passwd_file = /etc/ibm/sysmgt/dsm/config/hmc1_passwd
        

      Defining CEC objects

      Once an HMC object is defined, CEC objects can be defined. A NIM CEC object requires four mandatory attributes : hardware type (hw_type), hardware model (hw_model), hardware serial (hw_serial), and the HMC used to control this CEC object (mgmt_source). Query the HMC with the lssyscfg command to get the attributes, and define the new CEC object with the nim command :

      • Querying HMC to get hw_model, hw_serial, and hw_type :
      • # ssh hscroot@hmc1 "lssyscfg -r sys -F name,type_model,serial_num"
        CEC1,8203-E4A,060CE99
        
      • The lssyscfg output tells you that : hw_type=8203, hw_model=E4A and hw_serial=060CE99
      • Create the CEC object :
      • # nim -o define -t cec -a hw_type=8203 -a hw_model=E4A -a hw_serial=060CE99 -a mgmt_source=hmc1 cec1
        # lsnim -l cec1
        cec1:
           class      = management
           type       = cec
           Cstate     = ready for a NIM operation
           prev_state =
           hmc        = hmc1
           serial     = 8203-E4A*060CE99
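
With many frames to declare, the split of type_model into hw_type and hw_model can be scripted. The sketch below is an illustration only: cec_define_cmd is a hypothetical name, the NIM object name is assumed to be the lowercased CEC name (as in the example above), and the command is printed rather than executed so that it can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper: turns one "name,type_model,serial_num" line from
# lssyscfg into the matching "nim -o define -t cec" command (printed only).
cec_define_cmd() {
    line=$1                          # e.g. CEC1,8203-E4A,060CE99
    hmc=$2                           # name of an already defined NIM hmc object
    name=${line%%,*}                 # CEC1
    rest=${line#*,}                  # 8203-E4A,060CE99
    type_model=${rest%%,*}           # 8203-E4A
    serial=${rest#*,}                # 060CE99
    hw_type=${type_model%%-*}        # 8203
    hw_model=${type_model#*-}        # E4A
    # Assumption: the NIM object is named after the CEC, in lowercase.
    obj=$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]')
    echo "nim -o define -t cec -a hw_type=$hw_type -a hw_model=$hw_model -a hw_serial=$serial -a mgmt_source=$hmc $obj"
}

# Demo with the values returned by the HMC above.
cec_define_cmd "CEC1,8203-E4A,060CE99" hmc1
```

The demo prints the same nim command as the one typed by hand above. CEC names containing commas or spaces are not handled by this sketch.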
        

      Adding profile management to standalone object

      To define a standalone object with a management profile, or to add a management profile to an existing standalone object, the MAC address and the lpar id are needed. The lpar id can easily be obtained from the HMC; for the MAC address, use the dgetmacs command :

      • Get the lpar id through the HMC :
      • # ssh hscroot@infmc102 "lssyscfg -r lpar -m CEC1 -F name,lpar_id"
        lpar1,5
        lpar2,4
        vios1,3
        vios2,2
        lpar3,1
        
      • Define the machine, using 0 in place of the MAC address :
      • # nim -o define -t standalone -a if1="10-10-20-0-s24-net lpar2 0" -a net_settings1="auto auto" -a mgmt_profile1="hmc1 4 CEC1" lpar2
        
      • Retrieve the machine MAC address by using the dgetmacs command; the host will be booted into Open Firmware. If the host is already installed, get the MAC address with the entstat command directly on the machine :
      • #  dgetmacs -n lpar2 -C NIM
        Using an adapter type of "ent".
        Could not dsh to node lpar2.
        Attempting to use openfirmware method to collect MAC addresses.
        Acquiring adapter information from Open Firmware for node lpar2.
        
        # Node::adapter_type::interface_name::MAC_address::location::media_speed::adapter_duplex::UNUSED::install_gateway::ping_status::machine_type::netaddr::subnet_mask
        
        lpar1::ent_v::::2643EEBC6C04::U8203.E4A.060CE99-V4-C4-T1::auto::auto::::::n/a::secondary::::
        
      • Modify the NIM object to add the MAC address :
      • # nim -o change -a if1="10-10-20-0-s24-net lpar2 2643EEBC6C04" lpar2
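
Copying the MAC address by hand is error-prone, so the dgetmacs output can be filtered instead. This is a minimal sketch, assuming the "::"-separated layout shown above where MAC_address is the fourth field; the function name dgetmacs_mac is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads dgetmacs output on stdin and prints the
# MAC_address field (4th "::"-separated field) of each data line.
dgetmacs_mac() {
    awk -F'::' '
        /^[ \t]*#/ { next }          # skip the header line
        NF < 5     { next }          # skip status messages and blank lines
        $4 != ""   { print $4 }
    '
}

# Demo on the data line shown above.
printf '%s\n' 'lpar1::ent_v::::2643EEBC6C04::U8203.E4A.060CE99-V4-C4-T1::auto::auto::::::n/a::secondary::::' | dgetmacs_mac
```

On the NIM master the result of dgetmacs -n lpar2 -C NIM | dgetmacs_mac could then be placed in the if1 attribute of the nim -o change command.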
        

      Using dconsole to open and monitor machines consoles

      If the machine is already installed, or after the installation with a bos_inst operation, you can manage its console with the dconsole command. A few cool things come with dconsole, such as opening a console in read-only mode, opening a console in text mode or through an xterm, and logging all console output into /var/ibm/sysmgt/dsm/log/console. Here are a few examples :

      • Opening a text console in read-write mode and logging the output in /var/ibm/sysmgt/dsm/log/console :
      • # dconsole -C NIM -n lpar2 -t -l
        Starting console daemon
        [read-write session]
        
         Open in progress
        
         Open Completed.
        AIX Version 7
        Copyright IBM Corporation, 1982, 2013.
        Console login: root
        # echo test
        test
        # tail -10 /var/ibm/sysmgt/dsm/log/console/lpar2.0
        # echo test
        test
        # exit
        
      • Opening an xterm console in read-write mode on greenclient1 and logging the output in /var/ibm/sysmgt/dsm/log/console :
      • # export DISPLAY=10.10.20.35:0
        # dconsole -C NIM -n greenclient1  -l
        Starting console daemon
        

      • Opening a text console in read-only mode :
      • # dconsole -C NIM -n lpar2  -l -t -r
        Starting console daemon
        [read only session, user input discarded]
        
         Open in progress
        
         Open Completed.
        AIX Version 7
        Copyright IBM Corporation, 1982, 2013.
        Console login: [read only session, user input discarded]
        [read only session, user input discarded]
        

      bos_inst operation through NIM with DSM

      Machine installation through the bos_inst operation can be automated with DSM. If a machine has a management profile and a bos_inst operation is performed, the machine is rebooted and automatically installed. I install machines with this method and it works like a charm :

      • Install the machine lpar2 in AIX 7100-02-02; a bosinst_data resource with a no-prompt stanza was created for this installation :
      • # nim -o bos_inst -a bosinst_data=hdisk0_noprompt-bosinst_data -a source=rte -a installp_flags=agX -a accept_licenses=yes -a spot=7100-02-02-1316-spot -a lpp_source=7100-02-02-1316-lpp_source lpar2
        dnetboot Status: Invoking /opt/ibm/sysmgt/dsm/dsmbin/lpar_netboot lpar2
        dnetboot Status: Was successful network booting node lpar2.
        
      • DSM uses the HMC lpar_netboot command to install machines; the output of this command can be found in the /tmp filesystem :
      • # cat /tmp/lpar_netboot.12124286.exec.log
        lpar_netboot Status: process id is 12124286
        lpar_netboot Status: lpar_netboot -i -t ent -D -S 10.10.20.140 -G 10.10.20.254 -C 10.10.20.202 -m 2643EEBC6C04 -s auto -d auto -F /etc/ibm/sysmgt/dsm/config/hmc1_passwd -j hmc -J 10.10.30.1 4 060CE74 8203-E4A
        [..]
        IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM
        IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM
        
                  1 = SMS Menu                          5 = Default Boot List
                  8 = Open Firmware Prompt              6 = Stored Boot List
        [..]
        10.10.20.202:    24  bytes from 10.10.20.140:  icmp_seq=7  ttl=? time=21  ms
        
        10.10.20.202:    24  bytes from 10.10.20.140:  icmp_seq=8  ttl=? time=21  ms
        PING SUCCESS.
        [..]
        38300 ^MPACKET COUNT = 38400 ^MPACKET COUNT = 38500 ^MPACKET COUNT = 38600 ^MPACKET COUNT = 38700 ^MPACKET COUNT = 38800 ^MPACKET COUNT = 38900 ^MFINAL PACKET COUNT = 38913
        FINAL FILE SIZE = 19922944  BYTES
        
      • The installation progress can be monitored from the NIM master itself :
      • # lsnim -l lpar2 |grep info
           info           = BOS install 39% complete : Installing additional software.
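
To follow an installation from a script, the percentage can be extracted from the info attribute. The helper below is only a sketch: bos_inst_percent is a hypothetical name and the pattern assumes the exact "BOS install NN% complete" wording shown above.

```shell
#!/bin/sh
# Hypothetical helper: reads "lsnim -l <client>" output on stdin and prints
# the completion percentage found in the info attribute, if any.
bos_inst_percent() {
    sed -n 's/^[[:space:]]*info[[:space:]]*=.*BOS install \([0-9][0-9]*\)% complete.*/\1/p'
}

# Demo on the sample line shown above.
printf '   info           = BOS install 39%% complete : Installing additional software.\n' | bos_inst_percent
```

On the NIM master, lsnim -l lpar2 | bos_inst_percent could be called in a loop with a sleep between two polls; the helper prints nothing when no installation is in progress.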
        

      Is it free ?

      Unlike CSM, DSM is free : you do not need any license to use it. As you can see, these tools can be very powerful to automate installations for standalone clients. VMControl uses DSM and NIM to automate installations. DSM is the right tool to industrialize your NIM installations.

      Cheatsheet

      I love cheat sheets ! NIM commands are complex and hard to remember. I searched the internet for an existing NIM cheat sheet, but I haven’t found anything correct or anything that fits my needs. I’m sure that a lot of my readers already know William Favorite’s Quicksheets. I’m a huge fan of these Quicksheets and I was inspired by William when creating my own one for NIM. Feel free to contact me if you want to add or correct something in my cheat sheet, you’ll be -of course- credited if you add some useful information. Click here to download my NIM cheat sheet : chmod666 NIM Cheat Sheet

      No future ?

      I do love NIM, but in my opinion it’s a little bit outdated. Everyone is calling for an update of the Redbook (click here to call for an update ;-)) and of the product, me included. This part of the post was inspired by one of my AIX gurus; thanks to him, I’m sure he’ll recognize himself. If IBMers are reading this part of the post, please tell IBM to update NIM. Readers, please react in the comments if you agree with me on this point. Here are a few points I want to see in a future NIM release :

      • Network package repository of software : publish lpp_source over http or https. IBM can publish an official repository, and customers can create their own on the NIM server (which can be synchronized with the IBM official repository).
      • Create a client (updated nimclient) with search and download option. (Yes like yum).
      • Getting rid of bootp and tftp : download the kernel (created in /tftpboot when a new SPOT is created) and the ramdisk image through http or https.
      • Replace nfs exports by http or https (or force nfsv4) for NIM resources sharing (SPOT, lpp_source, install_script, bosinst_data…)(easier for security, and firewall ruling).
      • Allow IPL menu to be setup in dhcp.
      • Automatic dependencies checking and resolution while installing a software.
      • Simplify postinstall (script) and firstboot (fb_script). My current solution is to create a firstboot script that downloads a script and adds an entry in /etc/inittab; the downloaded script does the job and removes the entry from /etc/inittab at the end of its execution.
      • Automatic multibos creation while updating a system through NIM (or as an option).
      • Keep mksysb the way it is, this is the best bare metal backup I have ever known.
      • Getting rid of rsh, force user to use nimsh (for nimadm too).
      • Better design for high availability (HANIM auto sync for example).
      • NIM database flexibility : let the user rename a resource object (please do this !!!). Who has never experienced this problem after creating a SPOT or an lpp_source with an erroneous name ?
      • Allow allocating multiple lpp_source for different installp_bundle for installation.
      • Allow nimadm migration to be performed without the exact same level for bos.alt_disk_install.rte fileset.
      • Allow nimsh to be configured over http or https (no more multiple ports for nimsh ; easier for security, and firewall ruling).
      • Automatically enable cryptographic authentication for the NIM service handler (nimsh can use SSL certificates).
      • Easier NIM backup and restore, getting rid of m_backup_db and m_restore_db.


      Please comment and react, I do need support ;-). Hope this can help.

    Restoring mksysb image without nim using virtual optical disk and mkcd command

    Sometimes it can be very useful to restore an LPAR from CD/DVD, for example when your LPAR cannot access any known VLAN, or when you don’t have any NIM server. PowerVM virtualisation provides the ability to store images in a repository and to load them into virtual optical devices. An LPAR can be restored very easily with this method.

    Creating image repository

    First of all you need to create a repository to store the virtual media; use the mkrep command to create it.
    This repository has to be created on a storage pool; by default, the rootvg storage pool exists.

    # lssp
    Pool   Size(mb) Free(mb) Alloc Size(mb) BDs Type
    rootvg 279552   201216         256      0 LVPOOL
    # mkrep -sp rootvg -size 20G
    Virtual Media Repository Created
    Repository created within "VMLibrary" logical volume
    # lsrep
    Size(mb) Free(mb) Parent Pool Parent Size Parent Free
      20396    20396  rootvg           279552      201216

    Adding virtual scsi adapter on LPAR and on VIO Server

    On client LPAR

    On the LPAR, create a new Client SCSI Adapter as described below. The adapter IDs shown in the image below are only an example (you can choose whatever you want) :

    • Client Vscsi adapter ID is set to 100.
    • Server Vscsi adapter ID is set to 100.

    On VIO Server

    On VIO Server, create a new Server SCSI Adapter device as described below :

    As you can see in the image below, the Client Vscsi adapter ID (100) and the Server Vscsi adapter ID match :

    Post checks on VIO Server

    On the VIO Server, check that the Virtual SCSI adapter is present and matches our ID (100) :

    # lsdev -type adapter | grep vhost0
    vhost0           Available   Virtual SCSI Server Adapter
    # lsdev -slots | grep vhost0
    U9119.FHB.84F55B6-V3-C100    Virtual I/O Slot  vhost0

    Creating a DVD mksysb from an AIX LPAR

    On the host you want to restore, create an mksysb image with the mksysb command :

    # mksysb -i /mksysb_images/my_node.mksysb
    [..]

    You now have to convert this mksysb into CD/DVD images. Unfortunately it’s not possible to create one big ISO file, so you’ll have to choose the CD or DVD format. Use the mkcd command to convert the mksysb file into bootable CD/DVD images. In our example we’ll create DVD-sized ISO files.

    # mkcd -L -S -I /mksysb_images/mkcd -m /mksysb_images/my_node.mksysb
    Initializing mkcd log: /var/adm/ras/mkcd.log...
    Verifying command parameters...
    Creating temporary file system: /mkcd/cd_fs...
    Populating the CD or DVD file system...
    Building chrp boot image...
    Copying backup to the CD or DVD file system...
    ......
    Creating Rock Ridge format image: /mksysb_images/mkcd/cd_image_11010246.vol1
    Running mkisofs ...
    .......
    mkrr_fs was successful.
    
    Making the CD or DVD image bootable...
    
    Copying the remainder of the backup to the CD or DVD file system...
    Creating Rock Ridge format image: /mksysb_images/mkcd/cd_image_11010246.vol2
    Running mkisofs ...
    .......
    mkrr_fs was successful.
    
    Copying the remainder of the backup to the CD or DVD file system...
    Creating Rock Ridge format image: /mksysb_images/mkcd/cd_image_11010246.vol3
    Running mkisofs ...
    ...
    mkrr_fs was successful.

    • -L : this option is used to create DVD-sized ISO images.
    • -S : this option is used to keep image file, and avoid writing it on a real DVD.
    • -I : specify the directory where images will be stored.
    • -m : mksysb image file to convert into DVDs.

    The mkcd command has created three DVD image files :

    # ls -l
    total 53815784
    -rw-r--r--    1 root     system   4274950144 Apr 30 13:09 cd_image_22872126.vol1
    -rw-r--r--    1 root     system   4293890048 Apr 30 13:12 cd_image_22872126.vol2
    -rw-r--r--    1 root     system   4293890048 Apr 30 13:14 cd_image_22872126.vol3
    

    Adding images to VIO Server repository

    After creating the DVD image files from the mksysb file, you have to put them in the VIO Server repository : transfer them via scp or NFS, then add them to the repository :

    # mkvopt -name cd_image_22872126.vol1 -file ./cd_image_22872126.vol1
    # mkvopt -name cd_image_22872126.vol2 -file ./cd_image_22872126.vol2
    # mkvopt -name cd_image_22872126.vol3 -file ./cd_image_22872126.vol3
    # lsrep
    Size(mb) Free(mb) Parent Pool         Parent Size      Parent Free
       20397     7208 rootvg                   279552           201216
    
    Name                                    File Size Optical         Access
    cd_image_22872126.vol1                       4077 None            rw
    cd_image_22872126.vol2                       4095 None            rw
    cd_image_22872126.vol3                       4095 None            rw
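
When a backup spans many volumes, the mkvopt commands can be generated instead of typed one by one. This is a small sketch: mkvopt_cmds is a hypothetical name, the media name is assumed to be the base name of the file (as above), and the commands are only printed so that they can be reviewed and pasted into the VIO Server shell.

```shell
#!/bin/sh
# Hypothetical helper: prints one mkvopt command per image file given as an
# argument, using the base name of the file as the virtual media name.
mkvopt_cmds() {
    for image in "$@"; do
        echo "mkvopt -name $(basename "$image") -file $image"
    done
}

# Demo with the three volumes created by mkcd.
mkvopt_cmds ./cd_image_22872126.vol1 ./cd_image_22872126.vol2 ./cd_image_22872126.vol3
```

File names containing spaces are not handled by this sketch.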
    

    Create Virtual Optical Device on VIO Server

    On the VIO Server, create a Virtual Optical Device on the Server Virtual SCSI Adapter with the mkvdev command :

    # mkvdev -fbo -vadapter vhost0 -dev lpar_cdrom0
    lpar_cdrom0 Available
    

    Loading and Unloading Optical Device

    You’re almost done : load the first DVD into the Virtual Optical Device, boot the client LPAR from the Virtual SCSI Adapter and start your restore, then load the DVDs one by one :

    # loadopt -disk cd_image_22872126.vol1 -vtd lpar_cdrom0 -release
    # lsrep
    Size(mb) Free(mb) Parent Pool         Parent Size      Parent Free
       20397     7208 rootvg                   279552           201216
    
    Name                                    File Size Optical         Access
    cd_image_22872126.vol1                       4077 lpar_cdrom0     rw
    cd_image_22872126.vol2                       4095 None            rw
    cd_image_22872126.vol3                       4095 None            rw
    
    # unloadopt -release -vtd lpar_cdrom0
    # loadopt -disk cd_image_22872126.vol2 -vtd lpar_cdrom0 -release
    # lsrep
    Size(mb) Free(mb) Parent Pool         Parent Size      Parent Free
       20397     7208 rootvg                   279552           201216
    
    Name                                    File Size Optical         Access
    cd_image_22872126.vol1                       4077 None            rw
    cd_image_22872126.vol2                       4095 lpar_cdrom0     rw
    cd_image_22872126.vol3                       4095 None            rw
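
Before swapping volumes it helps to know which image is currently loaded. The helper below is a sketch that reads the lsrep listing shown above; loaded_image is a hypothetical name, and image names are assumed to contain no spaces.

```shell
#!/bin/sh
# Hypothetical helper: reads lsrep output on stdin and prints the name of
# the image currently loaded in the given virtual target device.
loaded_image() {
    vtd=$1
    # Image rows have exactly four columns: Name, File Size, Optical, Access.
    awk -v vtd="$vtd" 'NF == 4 && $3 == vtd { print $1 }'
}

# Demo on a copy of the lsrep listing above.
loaded_image lpar_cdrom0 <<'EOF'
Name                                    File Size Optical         Access
cd_image_22872126.vol1                       4077 None            rw
cd_image_22872126.vol2                       4095 lpar_cdrom0     rw
cd_image_22872126.vol3                       4095 None            rw
EOF
```

On the VIO Server the input would come from lsrep directly; an empty result means that nothing is loaded in the device.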
    

    You’re done.