NIM Less known features : HANIM, nimsh over ssl, DSM

The Network Installation Manager (NIM) server is one of the most important hosts in an environment. New machine installations, machine backups and restores, software (fileset) and third-party product installations, and in some cases volume group backups are all run from the NIM server, so some best practices have to be respected. In this post I’ll give you a few tricks for NIM. First of all, a NIM server has to be part of your disaster recovery plan, because it is the first server you need when you have to rebuild a crashed machine : my solution is HANIM. It also has to be secured (nimsh, and nimsh authentication over SSL), and it has to be flexible and automated (DSM).

NIM High Availability : HANIM

Documentation and information about NIM High Availability are not easy to find. I recommend the NIM from A to Z Redbook, one of the few reliable sources on HANIM. HANIM is simple to set up and simple to use, but there are a few things to know and understand about it :

HANIM Overview

  • The alternate NIM master is a backup NIM master built from the NIM master.
  • Takeover operations from master to alternate are manual. PowerHA can be used to run them, but my advice is not to use it. A takeover can be performed even if the NIM master is down. HANIM does not perform any heartbeat.
  • HANIM only provides a method for replicating the NIM database and resources from master to alternate : a sync copies the NIM database, and with the replicate=yes option the resource data is copied too.
  • My advice is to run every NIM operation from the master (even if it is possible to run a NIM operation from the alternate).
  • Disks are not shared between the master and the alternate. When a sync operation is done, missing resources are copied over NFS from the master to the alternate, or from the alternate to the master. HANIM does not provide a filesystem takeover.
  • A takeover operation modifies the /etc/niminfo file on every NIM client : the alternate NIM master is moved to the first position in NIM_MASTER_HOSTNAME_LIST, and NIM_MASTER_HOSTNAME is set to the alternate NIM master hostname.


Initial setup

Some filesets have to be installed on both the NIM master and the alternate NIM master; check for the presence of bos.sysmgt.nim.master, bos.sysmgt.nim.spot and bos.sysmgt.nim.client. The NIM master and the alternate NIM master must be on the same AIX version :

# lslpp -l | grep -i nim
  bos.sysmgt.nim.client     7.1.2.15  COMMITTED  Network Install Manager -
  bos.sysmgt.nim.master     7.1.2.15  COMMITTED  Network Install Manager -
  bos.sysmgt.nim.spot       7.1.2.15  COMMITTED  Network Install Manager - SPOT
  bos.sysmgt.nim.client     7.1.2.15  COMMITTED  Network Install Manager -
# oslevel -s
7100-02-02-1316

Configure the NIM master

Initialize the NIM master with the nimconfig command; you’ll need to name the first network used by NIM. The nimesis daemon is started at this step.

# nimconfig -a pif_name=en0 -a netname=10-10-20-0-s24-net -a master_port=1058 -a verbose=3 -a cable_type=N/A
[..]
Checking input attributes.
attr_ass:
        'cpuid' => '00F359164D00'
        'pif_name' => 'en0'
        'netname' => '10-10-20-0-s24-net'
        'master_port' => '1058'
        'cable_type' => 'N/A'
        'net_addr' => '10.10.20.1'
        'snm' => '255.255.255.0'
        'adpt_addr' => '667C70F7A904'
        'adpt_name' => 'ent0'
Making sure the NIM Master package is OK.
      set_state: id=1361463886; name=; state_attr=85; new_state=5;
   checking the object definition of ;
   checking interface info for master;
Built NIM infomation file.
      10.10.20.1 is known as nim_master
Adding default route 10.10.20.254 to network object
         0 - /usr/lpp/bos.sysmgt/nim/methods/m_mknet
         1 - -anet_addr=10.10.20.1
         2 - -asnm=255.255.255.0
         3 - -tent
         4 - -arouting1=default 10.10.20.254
         5 - 10-10-20-0-s24-net
Connecting NIM master to master network.
         0 - /usr/lpp/bos.sysmgt/nim/methods/m_chmaster
         1 - -aif1=10-10-20-0-s24-net nim_master 667C70F7A904
         2 - -amaster_port=1058
         3 - -aregistration_port=1059
         4 - -acable_type1=N/A
         5 - master
Adding NIM deamons to SRC and starting....
0513-071 The nimesis Subsystem has been added.
0513-071 The nimd Subsystem has been added.
0513-059 The nimesis Subsystem has been started. Subsystem PID is 9568296.
[..]

NIM resources such as spot, lpp_source and so on can now be created; please refer to the NIM cheatsheet by chmod666.org ;-). For the purpose of this post some resources (spot, lpp_source, mksysb, network) have been created, and these will be replicated later.

Configure the alternate NIM master

The alternate NIM master is configured with the niminit command. In the NIM from A to Z Redbook, page 124, a note warns you about the synchronization : “At the time of writing, only rsh/rshd communication is supported for NIM synchronization.” THIS STATEMENT IS FALSE : I’m using nimsh for the synchronization, and I recommend using it. We are in 2013, do not use rsh anymore.

# niminit -a is_alternate=yes -a master=nim_master -a pif_name=en0 -a cable_type1=N/A -a connect=nimsh -a name=nim_alternate
0513-071 The nimesis Subsystem has been added.
0513-071 The nimd Subsystem has been added.
0513-059 The nimesis Subsystem has been started. Subsystem PID is 10944522.
nimsh:2:wait:/usr/bin/startsrc -g nimclient >/dev/console 2>&1
0513-044 The nimsh Subsystem was requested to stop.
0513-059 The nimsh Subsystem has been started. Subsystem PID is 5963998.

Verification

You’re done with the configuration, you can now start to synchronize, replicate and takeover… pretty easy. Here are some points you can verify :

  • On the NIM master, the attribute is_alternate is set to yes :
    # lsnim -l master
    [..]
       is_alternate        = yes
    [..]
    
  • On the NIM master, a new machine object of type alternate_master is created :
    # lsnim -t alternate_master
    nim_alternate     machines       alternate_master
    
  • After the first database synchronization, a new machine object of type alternate_master is created on the alternate NIM master; this is the NIM master :
    # lsnim -t alternate_master
    nim_master     machines       alternate_master
    
  • On the alternate NIM master, the attribute is_alternate does not exist :
    # lsnim -l master | grep alternate
    

Synchronization and replication

The NIM master and the alternate NIM master can now communicate with each other and some resources exist on the master, so it’s time to synchronize. Remember : HANIM only provides a method for replicating the NIM database and resources. You can, if you want, synchronize the NIM database only, or the NIM database and its resources (data included). Remember : never perform a NIM synchronization from the alternate NIM master.

Database synchronization only

A database synchronization is useful when objects are modified, for example when you change the subnet mask of a network object. It is also useful when objects “without files” are created, for instance a machine. On the other hand, if you synchronize the database while an object “with a file” exists (such as an lpp_source, a spot, or an fb_script), this object will not be created on the alternate : you have to copy the file before synchronizing, or use the replicate attribute :

  • On NIM master two objects are created, an fb_script and a machine:
    # nim -o define -t fb_script -a server=master -a location=/export/nim/others/postinstall/fb_script.ksh fb_script01
    # ls -l /export/nim/others/postinstall/fb_script.ksh
    -rw-r--r--    1 root     system           35 Mar  8 18:01 /export/nim/others/postinstall/fb_script.ksh
    # lsnim ruby
    ruby     machines       standalone
    
  • A database synchronization is performed :
    # nim -o sync -a force=yes nim_alternate
    [..]
    The level of the NIM master fileset on this machine is: 7.1.2.15
    The level of the NIM database backup is: 7.1.2.15
    [..]
    Checking NIM resources
      Removing fb_script01
        0518-307 odmdelete: 1 objects deleted. from nim_attr (serves attr)
        0518-307 odmdelete: 0 objects deleted. from nim_attr (group memberships)
        0518-307 odmdelete: 5 objects deleted. from nim_attr (resource attributes)
        0518-307 odmdelete: 1 objects deleted. from nim_object (resource object)
      Finished removing fb_script01
    
  • On the alternate NIM master, the machine object is here but the fb_script was not replicated because the file was not present on the alternate NIM master :
    # lsnim ruby
    ruby     machines       standalone
    # lsnim fb_script01
    0042-053 lsnim: there is no NIM object named "fb_script01"
    
  • If you copy the file before synchronizing, the resource will be created :
    master# scp fb_script.ksh nim_alternate:/export/nim/others/postinstall
    fb_script.ksh                      100%   35     0.0KB/s   00:00
    
    master# nim -o sync -a force=yes nim_alternate
    [..]
    Restoring the NIM database from /tmp/_nim_dir_13041674/mnt0
    x ./etc/NIM.level, 9 bytes, 1 tape blocks
    [..]
      Keeping fb_script01
    
    alternate# lsnim fb_script01
    fb_script01     resources       fb_script
    

    Synchronization with replication

    I encourage you not to use the database-only synchronization, but to use synchronization with replication : it does the same job and copies the files for you. It is much easier, just add the replicate=yes attribute to the nim command, it works like a charm :

    # lsnim -q sync nim_alternate
    
    the following attributes are optional:
            -a verbose=
            -a replicate=
            -a reset_clients=
    # nim -o sync -a force=yes -a replicate=yes nim_alternate
    

    Takeover

    If the NIM master is down, a takeover operation allows the alternate NIM master to become the NIM master for the clients. On the clients, the /etc/niminfo file is modified (the NIM_MASTER_HOSTNAME and NIM_MASTER_HOSTNAME_LIST attributes).

    • /etc/niminfo file and lsnim output before a takeover operation :
      client# grep -E "NIM_MASTER_HOSTNAME_LIST|NIM_MASTER_HOSTNAME" /etc/niminfo
      export NIM_MASTER_HOSTNAME=nim_master
      export NIM_MASTER_HOSTNAME_LIST="nim_master nim_alternate"
      master# lsnim -l client | grep current_master
         current_master = nim_master
      
    • Takeover operation is initiated from the alternate NIM master :
      alternate# nim -o takeover -a show_progress=yes nim_master
      +-----------------------------------------------------------------------------+
                            Performing "reset" Operation
      +-----------------------------------------------------------------------------+
      +-----------------------------------------------------------------------------+
                            "reset" Operation Summary
      +-----------------------------------------------------------------------------+
       Target                  Result
       ------                  ------
       client                   RESET
       client1                  RESET
       [..]
      +-----------------------------------------------------------------------------+
                            Initiating "takeover" Operation
      +-----------------------------------------------------------------------------+
       Initiating the takeover operation on machine 1 of 240: client ...
      
       Initiating the takeover operation on machine 2 of 240: client1...
      [..]
      +-----------------------------------------------------------------------------+
                            "takeover" Operation Summary
      +-----------------------------------------------------------------------------+
       Target                  Result
       ------                  ------
       client                  SUCCESS
       client1                 SUCCESS
      [..]
      alternate# lsnim -l client | grep current_master
         current_master = nim_alternate
      client# grep -E "NIM_MASTER_HOSTNAME_LIST|NIM_MASTER_HOSTNAME" /etc/niminfo
      export NIM_MASTER_HOSTNAME=nim_alternate
      export NIM_MASTER_HOSTNAME_LIST="nim_alternate nim_master"
      
    • When the NIM master is back up, initiate the takeover from the master :
      # nim -o takeover -a show_progress=yes nim_alternate
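
To check from a client which master it currently points at, the NIM_MASTER_HOSTNAME line of /etc/niminfo can be read directly. Here is a minimal sketch; the function names niminfo_master and points_at_master are hypothetical helpers (not NIM commands), and it assumes the file uses the "export NAME=value" layout shown in the grep output above.

```shell
# Print the value of NIM_MASTER_HOSTNAME from a niminfo-style file.
# $1: path to the file (defaults to /etc/niminfo)
# The pattern ends with "=", so the NIM_MASTER_HOSTNAME_LIST line is ignored.
niminfo_master() {
    sed -n 's/^export NIM_MASTER_HOSTNAME=//p' "${1:-/etc/niminfo}" | tr -d '"'
}

# Succeed (exit status 0) only if the client points at the expected master.
# $1: expected master hostname, $2: optional path to the niminfo file
points_at_master() {
    [ "$(niminfo_master "$2")" = "$1" ]
}
```

After a takeover, something like "points_at_master nim_alternate || echo takeover not applied" could be run on each client to spot the ones that were unreachable during the operation.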
      

    Synchronization automation and other files ?

    I recommend running a NIM synchronization every day; I personally have a cron job doing it every day at eleven PM. Most of the time a NIM synchronization is not enough and you’ll need to synchronize other files : in my case root’s .profile and my /etc/hosts file, in your case whatever you want. For this I’m using a little rsync-based script which synchronizes my master to my alternate every day :

    # crontab -l
    [..]
    0 23 * * * /export/nim/others/tools/do_sync.ksh >/dev/null 2>&1
    [..]
    # cat /export/nim/others/tools/do_sync.ksh
    [..]
        nim -o sync -a force=yes -a replicate=yes -a reset_clients=yes ${alternate}
        /export/nim/others/tools/sync_to_alternate.ksh
    [..]
    # cat /export/nim/others/tools/sync_to_alternate.ksh
    [..]
      /usr/bin/rsync -ave ssh ${a_filesystem} ${alternate_nim_master}:${a_filesystem}
    [..]
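
The elided parts of the two scripts are site specific, so what follows is only a hedged sketch of how such a wrapper could be structured, not the actual do_sync.ksh. The function name sync_all, its arguments and the RUN variable are assumptions made for the illustration; only the nim and rsync command lines come from the listing above.

```shell
# Hypothetical wrapper: sync_all NIM_NAME HOSTNAME PATH...
#   NIM_NAME  name of the alternate_master object in the NIM database
#   HOSTNAME  hostname of the alternate NIM master, reachable over ssh
#   PATH...   extra files or directories to copy with rsync
# Set RUN=echo to print the commands instead of running them (dry run).
sync_all() {
    alternate=$1
    alternate_nim_master=$2
    shift 2

    # NIM database and resources first (same command as in the listing above).
    ${RUN:-} nim -o sync -a force=yes -a replicate=yes -a reset_clients=yes \
        "${alternate}" || return 1

    # Then every extra path, copied to the same location on the alternate.
    for a_filesystem in "$@"; do
        ${RUN:-} /usr/bin/rsync -ave ssh "${a_filesystem}" \
            "${alternate_nim_master}:${a_filesystem}" || return 1
    done
}
```

Running it once with RUN=echo before putting it in cron shows exactly which commands would be executed, which is a cheap way to catch a wrong path or hostname.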
    

    NIM Security, use nimsh and use it over SSL

    nimsh over ssl

    NIM master configuration for nimsh over SSL

    From the NIM master, enable SSL support through the nimconfig command. Certificates will be generated in /ssl_nimsh/keys, and the OpenSSL filesets have to be installed :

    • Check OpenSSL filesets :
      # lslpp -l | grep openssl
        openssl.base            0.9.8.2400  COMMITTED  Open Secure Socket Layer
        openssl.license         0.9.8.2400  COMMITTED  Open Secure Socket License
        openssl.man.en_US       0.9.8.2400  COMMITTED  Open Secure Socket Layer
        openssl.base            0.9.8.2400  COMMITTED  Open Secure Socket Layer
      
    • Use nimconfig to enable SSL support :
      # nimconfig -c
      0513-029 The tftpd Subsystem is already active.
      Multiple instances are not supported.
      NIM_MASTER_HOSTNAME=nim_master
      x - /usr/lib/libssl.so.0.9.8
      x - /usr/lib/libcrypto.so.0.9.8
      Target "all" is up to date.
      Generating a 1024 bit RSA private key
      ......++++++
      .++++++
      writing new private key to '/ssl_nimsh/keys/rootkey.pem'
      -----
      Signature ok
      subject=/C=US/ST=Texas/L=Austin/O=ibm.com/CN=Root CA
      Getting Private key
      Generating a 1024 bit RSA private key
      ...............++++++
      .......++++++
      writing new private key to '/ssl_nimsh/keys/clientkey.pem'
      -----
      Signature ok
      subject=/C=US/ST=Texas/L=Austin/O=ibm.com
      Getting CA Private Key
      Generating a 1024 bit RSA private key
      ......++++++
      .............++++++
      writing new private key to '/ssl_nimsh/keys/serverkey.pem'
      -----
      Signature ok
      subject=/C=US/ST=Texas/L=Austin/O=ibm.com
      Getting CA Private Key
      
    • Check the NIM master : attribute ssl_support is now set to yes :
      # lsnim -l master | grep ssl_support
         ssl_support         = yes
      

    NIM alternate master for nimsh over SSL

    If you’re using an alternate NIM master, repeat the same operation on it (OpenSSL and nimconfig -c). The alternate NIM master is also a client of the NIM master, so its client side has to be configured too :

    # nimclient -c
    x - /usr/lib/libssl.so.0.9.8
    x - /usr/lib/libcrypto.so.0.9.8
    Received 2763 Bytes in 0.0 Seconds
    0513-044 The nimsh Subsystem was requested to stop.
    0513-077 Subsystem has been changed.
    0513-059 The nimsh Subsystem has been started. Subsystem PID is 9502954.
    

    Client configuration

    Configure all NIM clients to use SSL-encrypted authentication; if you are using an alternate NIM master, do not forget to download the alternate’s certificates on the clients :

    # rmitab nimsh 2>/dev/null 
    # rm -rf /etc/niminfo
    # niminit -aname=$(hostname) -a master=nim_master -a master_port=1058 -a registration_port=1059 -a connect=nimsh
    # nimclient -c
    # nimclient -o get_cert -a master_name=nim_alternate
    # stopsrc -s nimsh
    # startsrc -s nimsh
    

    On the NIM server itself, the client’s connect attribute is now set to “nimsh (secure)” :

    # lsnim -l ruby | grep connect
       connect        = nimsh (secure)
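
With many clients it helps to script this check. A minimal sketch, assuming the connect line has the "connect = value" layout shown above; is_secure_nimsh is a hypothetical helper name, not a NIM command.

```shell
# Read "lsnim -l <client>" output on stdin and succeed (exit status 0)
# only if the connect attribute is exactly "nimsh (secure)".
is_secure_nimsh() {
    value=$(sed -n 's/^ *connect *= *//p' | sed 's/ *$//')
    [ "$value" = "nimsh (secure)" ]
}
```

For example, "lsnim -l ruby | is_secure_nimsh || echo ruby is not using secure nimsh" flags a client that still has to be reconfigured.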
    

    Are the data encrypted ?

    Check this statement in the NIM from A to Z Redbook, page 434 :

    “Any communication initiated from the NIM client (pull operation) reaches the NIM master on the request for services and registration ports (1058 and 1059, respectively). This communication is not encrypted. For any communication initiated from the NIM master (push operations), the NIM master communicates with the NIM client using the NIMSH daemon. This allows an encrypted handshake dialog during authentication. However, data packets are not encrypted.”

    To sum up :

    • Only push operations can use secure nimsh.
    • Data packets are not encrypted.
    • Secure nimsh just adds an encrypted handshake between the NIM master and its clients.

    Have a look at these two screenshots; the first one is the TCP stream of a non-secure operation, the second one is secured :

    • Non secure tcp stream of a push operation :
    • Secure tcp stream of a push operation :

    Distributed Systems Management

    Distributed Systems Management (we’ll call it DSM from now on) is a set of tools and programs used to enhance NIM capabilities. I personally use DSM for two main purposes : opening and monitoring consoles through the dconsole utility, and automating my installations. DSM adds new objects to the NIM environment, and new attributes to the NIM objects. You can also gain more control over your lpars and directly restart or maint_boot an lpar through NIM by using DSM. Hardware Management Consoles (HMC objects) and pSeries frames (CEC objects) can be added to NIM, and management profiles are added to standalone objects in order to take advantage of DSM with NIM.

    There are two main sources of information for DSM :

    • The dsm.core fileset comes with a pdf file named dsm_tech_note.pdf, page 161, chapter 5.
      # lslpp -f dsm.core | grep dsm_tech_note.pdf
                              /opt/ibm/sysmgt/dsm/doc/dsm_tech_note.pdf
      
    • There are fully detailed examples in the IBM AIX Version 7.1 Differences Guide.

    Filesets prerequisites

    Starting with AIX 6.1 TL3, the base installation media ship with the DSM packages (dsm.core). expect, tcl, tk, and xterm are needed by these DSM packages :

    # lslpp -l | grep -E "dsm|tcl|tk|expect|xterm"
      X11.apps.aixterm           7.1.2.0  COMMITTED  AIXwindows aixterm Application
      X11.apps.xterm            7.1.2.15  COMMITTED  AIXwindows xterm Application
      X11.msg.en_US.apps.aixterm
                                 7.1.2.0  COMMITTED  AIXwindows aixterm Messages -
      dsm.core                  7.1.2.15  COMMITTED  Distributed Systems Management
      dsm.dsh                   7.1.2.15  COMMITTED  Distributed Systems Management
      expect.base               5.42.1.0  COMMITTED  Binary executable files of
      expect.man.en_US          5.42.1.0  COMMITTED  Expect man page documentation
      tcl.base                   8.4.7.0  COMMITTED  Binary executable files of Tcl
      tcl.man.en_US              8.4.7.0  COMMITTED  Tcl man page documentation
      tk.base                    8.4.7.0  COMMITTED  Binary executable files of Tk
      tk.man.en_US               8.4.7.0  COMMITTED  Tk man page documentation
    

    Defining HMC objects

    DSM uses the HMC to start (power on) lpars, stop (power off) lpars and open consoles on lpars. An HMC can be defined in NIM; an HMC object is a management object. To avoid a password prompt each time a NIM operation is performed, or each time dconsole is called, DSM provides a mechanism to manage SSH key sharing between the NIM master and the HMC. Before adding an HMC object, use the dpasswd and dkeyexch commands to enable SSH key authentication :

    • Create the authentication file with the dpasswd command. The file is stored by default in /etc/ibm/sysmgt/dsm/config :
      # dpasswd -f hmc1_passwd -U hscroot
      Password:
      Re-enter password:
      Password file created
      # ls -l  /etc/ibm/sysmgt/dsm/config/
      total 24
      -r--r--r--    1 root     system           16 Mar 11 13:25 .key
      -r--r--r--    1 root     system           24 Mar 11 13:25 hmc1_passwd
      
    • Share the key between the NIM master and the HMC using the dkeyexch command :
      # dkeyexch -f /etc/ibm/sysmgt/dsm/config/hmc1_passwd -I hmc -H hmc1
      OpenSSH_6.0p1, OpenSSL 0.9.8x 10 May 2012
      
    • At this step you should be able to connect to the HMC without a password prompt :
      # ssh hscroot@hmc1
      Last login: Mon Mar 11 13:51:35 2013 from 10.10.20.21
      
    • Define the new HMC object with the nim command; the network on which the HMC is running must be defined as a NIM network :
      # nim -o define -t ent -a net_addr=10.10.30.0 -a snm=255.255.254.0 -a routing1="default 10.10.31.254" 10-10-30-0-s23-net
      # nim -o define -t hmc -a if1="find_net hmc1 0" -a passwd_file=/etc/ibm/sysmgt/dsm/config/hmc1_passwd hmc1
      # lsnim -t hmc
      hmc1     management       hmc
      # lsnim -lF hmc1
      hmc1:
         id          = 1363005068
         class       = management
         type        = hmc
         if1         = 10-10-30-0-s23-net hmc1 0
         Cstate      = ready for a NIM operation
         prev_state  =
         Mstate      = not running
         passwd_file = /etc/ibm/sysmgt/dsm/config/hmc1_passwd
      

    Defining CEC objects

    Once an HMC object is defined, CEC objects can be defined. A NIM CEC object requires four mandatory attributes : hardware type (hw_type), hardware model (hw_model), hardware serial (hw_serial), and the HMC used to control this CEC object (mgmt_source). Query the HMC to get the attributes with the lssyscfg command, and define the new CEC object with the nim command :

    • Querying the HMC to get hw_type, hw_model, and hw_serial :
      # ssh hscroot@hmc1 "lssyscfg -r sys -F name,type_model,serial_num"
      CEC1,8203-E4A,060CE99
      
    • lssyscfg output tells you that : hw_type=8203, hw_model=E4A and hw_serial=060CE99
    • Create the CEC object :
      # nim -o define -t cec -a hw_type=8203 -a hw_model=E4A -a hw_serial=060CE99 -a mgmt_source=hmc1 cec1
      # lsnim -l cec1
      cec1:
         class      = management
         type       = cec
         Cstate     = ready for a NIM operation
         prev_state =
         hmc        = hmc1
         serial     = 8203-E4A*060CE99
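
Splitting the lssyscfg line by hand is error-prone when there are many frames, so here is a small sketch that turns one "name,type_model,serial_num" line into the three hardware attributes. The helper name cec_attrs is hypothetical, and the sketch assumes type_model always has the TYPE-MODEL form seen above.

```shell
# Turn one "name,type_model,serial_num" line (lssyscfg -r sys output)
# into the hw_type / hw_model / hw_serial attributes of a NIM cec object.
# $1: the lssyscfg output line
cec_attrs() {
    printf '%s\n' "$1" | awk -F, '{
        split($2, tm, "-")    # e.g. 8203-E4A gives tm[1]=8203, tm[2]=E4A
        printf "-a hw_type=%s -a hw_model=%s -a hw_serial=%s\n", tm[1], tm[2], $3
    }'
}
```

The result can then be passed to the define operation, for instance: nim -o define -t cec $(cec_attrs "CEC1,8203-E4A,060CE99") -a mgmt_source=hmc1 cec1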
      

    Adding profile management to standalone object

    To define a standalone object with a management profile, or to add a management profile to an existing standalone object, the MAC address and the lpar id are needed. The lpar id can easily be obtained from the HMC; for the MAC address, use the dgetmacs command :

    • Get the lpar id through the HMC :
      # ssh hscroot@hmc1 "lssyscfg -r lpar -m CEC1 -F name,lpar_id"
      lpar1,5
      lpar2,4
      vios1,3
      vios2,2
      lpar3,1
      
    • Define the machine and replace the MAC address by 0 :
      # nim -o define -t standalone -a if1="10-10-20-0-s24-net lpar2 0" -a net_settings1="auto auto" -a mgmt_profile1="hmc1 4 CEC1" lpar2
      
    • Retrieve the machine’s MAC address with the dgetmacs command; the host will be booted to Open Firmware. If the host is already installed, get the MAC address with the entstat command directly on the machine :
      # dgetmacs -n lpar2 -C NIM
      Using an adapter type of "ent".
      Could not dsh to node lpar2.
      Attempting to use openfirmware method to collect MAC addresses.
      Acquiring adapter information from Open Firmware for node lpar2.
      
      # Node::adapter_type::interface_name::MAC_address::location::media_speed::adapter_duplex::UNUSED::install_gateway::ping_status::machine_type::netaddr::subnet_mask
      
      lpar1::ent_v::::2643EEBC6C04::U8203.E4A.060CE99-V4-C4-T1::auto::auto::::::n/a::secondary::::
      
    • Modify the NIM object to add the MAC address :
      # nim -o change -a if1="10-10-20-0-s24-net lpar2 2643EEBC6C04" lpar2
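
The two lookups above (lpar id from lssyscfg, MAC address from dgetmacs) are easy to script. A minimal sketch, with hypothetical helper names lpar_id and mac_for_node; it assumes the "name,lpar_id" and "::"-separated layouts shown in the outputs above, and the node name given to mac_for_node must match the first field of the dgetmacs data line.

```shell
# lpar_id NAME: read "name,lpar_id" lines (lssyscfg -r lpar -F name,lpar_id)
# on stdin and print the id of the lpar called NAME.
lpar_id() {
    awk -F, -v name="$1" '$1 == name { print $2 }'
}

# mac_for_node NODE: read dgetmacs output on stdin and print the MAC address
# of NODE. Fields are separated by "::" and the MAC address is the fourth
# one; header, comment and blank lines never match the node name.
mac_for_node() {
    awk -F'::' -v node="$1" '$1 == node { print $4 }'
}
```

The printed MAC address is the value to put in the third position of the if1 attribute in the nim -o change command.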
      

    Using dconsole to open and monitor machines consoles

    If the machine is already installed, or after an installation with a bos_inst operation, you can manage its console with the dconsole command. A few cool things come with dconsole, such as opening a console in read-only mode, opening a console in text mode or through an xterm, and logging all console output into /var/ibm/sysmgt/dsm/log/console; here are a few examples :

    • Opening a text console in read-write mode and logging the output in /var/ibm/sysmgt/dsm/log/console :
      # dconsole -C NIM -n lpar2 -t -l
      Starting console daemon
      [read-write session]
      
       Open in progress
      
       Open Completed.
      AIX Version 7
      Copyright IBM Corporation, 1982, 2013.
      Console login: root
      # echo test
      test
      # tail -10 /var/ibm/sysmgt/dsm/log/console/lpar2.0
      # echo test
      test
      # exit
      
    • Opening an xterm console on greenclient1 in read-write mode and logging the output in /var/ibm/sysmgt/dsm/log/console :
      # export DISPLAY=10.10.20.35:0
      # dconsole -C NIM -n greenclient1  -l
      Starting console daemon
      

    • Opening a text console in read-only mode :
      # dconsole -C NIM -n lpar2  -l -t -r
      Starting console daemon
      [read only session, user input discarded]
      
       Open in progress
      
       Open Completed.
      AIX Version 7
      Copyright IBM Corporation, 1982, 2013.
      Console login: [read only session, user input discarded]
      [read only session, user input discarded]
      

    bos_inst operation through NIM with DSM

    Machine installation and the bos_inst operation can be automated with DSM. If a machine has a management profile and a bos_inst operation is performed, the machine is rebooted and automatically installed. I install machines with this method and it works like a charm :

    • Install the machine lpar2 with AIX 7100-02-02; a bosinst_data resource with a no-prompt stanza was created for this installation :
      # nim -o bos_inst -a bosinst_data=hdisk0_noprompt-bosinst_data -a source=rte -a installp_flags=agX -a accept_licenses=yes -a spot=7100-02-02-1316-spot -a lpp_source=7100-02-02-1316-lpp_source lpar2
      dnetboot Status: Invoking /opt/ibm/sysmgt/dsm/dsmbin/lpar_netboot lpar2
      dnetboot Status: Was successful network booting node lpar2.
      
    • DSM uses the lpar_netboot command to install machines; the output of this command can be found in the /tmp filesystem :
      # cat /tmp/lpar_netboot.12124286.exec.log
      lpar_netboot Status: process id is 12124286
      lpar_netboot Status: lpar_netboot -i -t ent -D -S 10.10.20.140 -G 10.10.20.254 -C 10.10.20.202 -m 2643EEBC6C04 -s auto -d auto -F /etc/ibm/sysmgt/dsm/config/hmc1_passwd -j hmc -J 10.10.30.1 4 060CE74 8203-E4A
      [..]
      IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM
      IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM
      
                1 = SMS Menu                          5 = Default Boot List
                8 = Open Firmware Prompt              6 = Stored Boot List
      [..]
      10.10.20.202:    24  bytes from 10.10.20.140:  icmp_seq=7  ttl=? time=21  ms
      
      10.10.20.202:    24  bytes from 10.10.20.140:  icmp_seq=8  ttl=? time=21  ms
      PING SUCCESS.
      [..]
      38300 ^MPACKET COUNT = 38400 ^MPACKET COUNT = 38500 ^MPACKET COUNT = 38600 ^MPACKET COUNT = 38700 ^MPACKET COUNT = 38800 ^MPACKET COUNT = 38900 ^MFINAL PACKET COUNT = 38913
      FINAL FILE SIZE = 19922944  BYTES
      
    • The installation progress can be monitored from the NIM master itself :
      # lsnim -l lpar2 |grep info
         info           = BOS install 39% complete : Installing additional software.
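
To follow an installation from a script rather than by eye, the percentage can be extracted from the info attribute. A minimal sketch, assuming the info line always has the "BOS install NN% complete" form shown above (other phases of the installation may report different text, in which case nothing is printed); install_percent is a hypothetical helper name.

```shell
# Read "lsnim -l <machine>" output on stdin and print the BOS install
# completion percentage (digits only), or nothing if no such line exists.
install_percent() {
    sed -n 's/^ *info *= *BOS install \([0-9][0-9]*\)% complete.*/\1/p'
}
```

Used as "lsnim -l lpar2 | install_percent" inside a loop with a sleep, this gives a simple progress monitor for one machine.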
      

    Is it free ?

    Unlike CSM, DSM is free : you do not need any license to use it. As you can see, these tools can be very powerful for automating installations of standalone clients. VMControl uses DSM and NIM to automate installations. DSM is the right tool to industrialize your NIM installations.

    Cheatsheet

    I love cheat sheets ! NIM commands are complex and hard to remember. I searched the internet for an existing NIM cheat sheet, but I haven’t found anything correct or anything that fits my needs. I’m sure that a lot of my readers already know William Favorite’s Quicksheets. I’m a huge fan of these Quicksheets and I was inspired by William when creating my own one for NIM. Feel free to contact me if you want to add or correct something in my cheat sheet; you’ll of course be credited if you add some useful information. Click here to download my NIM cheat sheet : chmod666 NIM Cheat Sheet

    No future ?

    I do love NIM, but in my opinion it’s a little bit outdated; everyone is calling for an update of the Redbook (click here to call for an update ;-)) and of the product, me included. This part of the post was inspired by one of my AIX gurus, thanks to him, I’m sure he’ll recognize himself. If IBMers are reading this part of the post, please tell IBM to update NIM. Readers, please react in the comments if you agree with me on this point. Here are a few things I want to see in a future NIM release :

    • Network package repository of software : publish lpp_source over http or https. IBM could publish an official repository, and customers could create their own on the NIM server (synchronized with the official IBM repository).
    • Create a client (an updated nimclient) with search and download options. (Yes, like yum).
    • Get rid of bootp and tftp : download the kernel (created in /tftpboot when a new SPOT is created) and the ramdisk image through http or https.
    • Replace nfs exports by http or https (or force nfsv4) for NIM resource sharing (SPOT, lpp_source, install_script, bosinst_data…) (easier for security and firewall rules).
    • Allow the IPL menu to be set up in dhcp.
    • Automatic dependency checking and resolution while installing software.
    • Simplify postinstall (script) and firstboot (fb_script). My current solution is to create a firstboot script which downloads a second script and adds an entry in /etc/inittab; the downloaded script does the job and removes the entry from /etc/inittab at the end of its execution.
    • Automatic multibos creation while updating a system through NIM, or as an option.
    • Keep mksysb the way it is, this is the best bare metal backup I have ever known.
    • Get rid of rsh, force users to use nimsh (for nimadm too).
    • Better design for high availability (HANIM auto sync for example).
    • NIM database flexibility : let users rename a resource object (please do this !!!). Who has never experienced this problem while creating a SPOT or an lpp_source with an erroneous name ?
    • Allow multiple lpp_source resources to be allocated for different installp_bundle resources during an installation.
    • Allow nimadm migration to be performed without the exact same level for the bos.alt_disk_install.rte fileset.
    • Allow nimsh to be configured over http or https (no more multiple ports for nimsh; easier for security and firewall rules).
    • Automatically enable cryptographic authentication for the NIM service handler (nimsh can use SSL-encrypted certificates).
    • Easier NIM backup and restore, getting rid of m_backup_db and m_restore_db.


    Please comment and react, I do need support ;-). Hope this can help.

31 thoughts on “NIM Less known features : HANIM, nimsh over ssl, DSM”

  1. Nice article :)
    I agree on the NIM changes (ok i called for it ;)
    I’d say that nimshell could also be replaced by https.
    That would leave only 1 port to open to install/migrate/update for DMZ machines.
    Make it all https for security reasons.
    And yes on the name changes, i still hate when someone creates a network with a typo and i have to collect the database entries, remove all, recreate all in order to have it fixed.

  2. Very good article !

    I’m glad to see that some AIX administrators in the world use some of the recent improvements in NIM :)

    The class management was implemented in 6.1 TL3 (as far as I remember), but some recent improvements related to VIOS are a “must know”.

    Now, with class management and 6.1 TL7 / 7.1 TL1, you can perform backupios and updateios operations on a VIOS from NIM.

    The prerequisite is to define your VIOS as a “vios” object type.

    NIM1 $ lsnim -t vios
    vios1 management vios
    vios2 management vios

    vios1 :
    class = management
    type = vios
    connect = nimsh
    platform = chrp
    netboot_kernel = 64
    if1 = ent-Network54 vios1 0
    net_settings1 = 1000 full
    cable_type1 = N/A
    mgmt_profile1 = hmc1 1 cec1
    Cstate = ready for a NIM operation
    prev_state = not running
    Mstate = currently running
    cpuid = 00C911544C00
    Cstate_result = success

    vios:
    define = define an object
    change = change an object’s attributes
    cust = perform software customization
    bos_inst = perform a BOS installation
    lslpp = list LPP information about an object
    fix_query = perform queries on installed fixes
    reboot = reboot specified machines
    showlog = display a log in the NIM environment
    lppchk = verify installed filesets
    check = check the status of a NIM object
    remove = remove an object
    reset = reset an object’s NIM state
    allocate = allocate a resource for use
    deallocate = deallocate a resource
    updateios = perform software customization on I/O server

    At the time of writing, you cannot netboot a VIOS from NIM ; there is an APAR that fixes the issue :

    http://www-01.ibm.com/support/docview.wss?uid=isg1IV36566

    Another suggestion for your article is to recall which fields are required for an lpar that is a candidate to be entirely managed by NIM, such as mgmt_profile1, mgmt_profile2…

      • After talking to Benoit it seems to have been fixed, but cable_type1 = N/A was a source of problems for a long time, where the IP address wasn’t set on the interface if cable type wasn’t “tp”.

        • cable_type1 = N/A does not occur when the client’s adapters are managed by a vio ; exactly as the network speed.

  3. Hi everyone!!
    I’m happy to see that people love the nim server as much as I do :D

    What about this :
    The reset_clients option can be used along with the force option. For example:
    # nim -Fo sync -a reset_clients=yes masterb

    In my nim server ( also in the alternate one ) there isn’t the option reset_clients !!!
    Could be a fileset missing or something I have to install ??

    I have the same on both nim :

    # oslevel -s
    7100-01-07-1316
    # lslpp -l | grep nim
    bos.sysmgt.nim.client 7.1.1.16 APPLIED Network Install Manager –
    bos.sysmgt.nim.master 7.1.1.17 APPLIED Network Install Manager –
    bos.sysmgt.nim.spot 7.1.1.15 APPLIED Network Install Manager – SPOT
    bos.sysmgt.nim.client 7.1.1.16 APPLIED Network Install Manager -

    • Hi,

      I do not have the same problem on my nim server, your problem seems pretty strange.

      Can you raise a PMR to IBM ?

      Regards,

      Benoit

      • Hi again,
        I think the problem could be the version of the nim filesets. The post uses 7.1.2, so I’m trying with this version now.
        But the option reset_clients is only for the command lsnim .
        I haven’t opened a PMR yet, I want to try with this version first. But the question is : is it possible to initialize all the clients defined on the master for the alternate nim master ( the second nim ) ?? I don’t know what the niminfo looks like on the client with both masters declared.
        My idea was HANIM, but maybe it is easier to have two nim servers, each with its own clients and the same lppsource, spot, net … resources.

        • Hi again,
          the option reset_clients exists in AIX 7.1.2, but it is still not working in my environment. The nim database doesn’t synchronize correctly : it takes a long time and finally ends with this error.
          “””
          Before command completion, additional instructions may appear below.

          0042-001 nim: processing error encountered on “master”:
          0042-006 m_sync: (From_Master) connect Connection refused

          nconn: connect() failed, errno is 79
          nim_alternate: Connection refused
          “”””

          Here is the option that appears on AIX 7.1.2 :
          “”
          Synchronize an Alternate Master’s NIM database

          Type or select values in entry fields.
          Press Enter AFTER making all desired changes.

          [Entry Fields]
          * Target Name nim_alternate
          Force no +
          Replicate no +
          Reset NIM Clients to Alternate Master yes

          ——————————————————————————————
          do_op()
          {
              FORCE=
              REPLICATE=
              RESET_CLIENTS=

              while getopts rRFN: option
              do
                  case $option in
                      F) FORCE=yes;;
                      R) REPLICATE=yes;;
                      N) NAME=$OPTARG;;
                      r) RESET_CLIENTS=yes;;
                  esac
              done
              nim -o sync ${FORCE:+-a force=yes} ${REPLICATE:+-a replicate=yes} ${RESET_CLIENTS:+-a reset_clients=yes} $NAME
              exit $?
          }
          do_op -N 'nim_alternate'

          __________________________________________________
          “””””
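
The wrapper shown by SMIT above relies on the ${VAR:+word} expansion: each -a attribute is added to the nim command only when the matching flag was passed. The sketch below reuses the same option parsing but prints the resulting command instead of running it, which makes it easy to see what a given set of flags would send to nim. build_sync_cmd is a hypothetical helper name, not part of NIM.

```shell
# Print (rather than run) the nim sync command assembled from the flags.
# -F force, -R replicate, -r reset_clients, -N <alternate master name>
build_sync_cmd() {
    FORCE= REPLICATE= RESET_CLIENTS= NAME=
    OPTIND=1    # reset so the function can be called more than once
    while getopts rRFN: option
    do
        case $option in
            F) FORCE=yes ;;
            R) REPLICATE=yes ;;
            N) NAME=$OPTARG ;;
            r) RESET_CLIENTS=yes ;;
        esac
    done
    # ${VAR:+text} expands to text only when VAR is set and not empty.
    cmd="nim -o sync${FORCE:+ -a force=yes}${REPLICATE:+ -a replicate=yes}${RESET_CLIENTS:+ -a reset_clients=yes} $NAME"
    printf '%s\n' "$cmd"
}

# Example:
# build_sync_cmd -F -r -N nim_alternate
```

With only -N, as in the captured do_op call, no -a attribute is added at all, so it can be worth comparing the printed command with what the panel fields suggest.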

  4. In my environment the rsh service is running on the NIM master server and the nimsh service is running on the NIM client. Can anybody please explain how the communication happens in this case ? Shall I stop rsh and start the nimsh service on the NIM master server ?

    • If your clients are all configured with nimsh, you can stop rshd on the server …. but remember that nimadm is still using rsh or you have to patch your nim server with IV46746m2c.

  5. Pingback: NIM, KRB5/AD rsh, ftp …….Waldemar Mark Duszyk

    • Hi Mark !

      Thanks a lot for your support and for quoting my blog in one of your posts. You remind me why I’m doing this : to share knowledge with passionate people like you. You rule !

  6. Hi Guys, first of all, thanks for the awesome guide, much appreciated.

    We are having some issues defining the alternate_master:
    “0042-006 niminit: (init to master) rcmd Error 0”

    The two LPARs are in different subnets. Firewall rules for ports 1508/1509 and 3901/3902 are open in both directions.

    NIM routing is in place:

    graymgt_net:
    routing1 = default 156.150.181.65
    violetmgt_net:
    routing1 = default 156.150.181.1

    Who has any ideas?

    Thanks a lot

    Mike

  7. Hi All,

    i have an issue relating to defining an alternate_master. Let’s first show the setup:

    nim01 = 192.168.0.71/26
    nim02 = 192.168.0.7/26

    Networks + routing:

    graymgt_net:
    routing1 = default 192.168.0.65
    violetmgt_net:
    routing1 = default 192.168.0.1

    Ports opened in firewall; 1508,1509,3901,3902

    However, when i define the alternate_master, i get:

    # niminit -a is_alternate=yes -a master=nim01 -a pif_name=en0 -a cable_type1=tp -a connect=nimsh -a name=nim02
    0513-071 The nimesis Subsystem has been added.
    0513-071 The nimd Subsystem has been added.
    0513-059 The nimesis Subsystem has been started. Subsystem PID is 5374066.
    0042-006 niminit: (init to master) rcmd Error 0

    0513-044 The nimesis Subsystem was requested to stop.
    0513-004 The Subsystem or Group, nimd, is currently inoperative.
    0513-004 The Subsystem or Group, nimsh, is currently inoperative.
    0513-083 Subsystem has been Deleted.
    0513-083 Subsystem has been Deleted.
    0518-307 odmdelete: 4 objects deleted.
    0518-307 odmdelete: 33 objects deleted.

    Any ideas?

    • Hi,

      Can you please check that forward and reverse name resolution work on both NIM servers :

      # host -n -t A nim01
      # host -n -t A nim02
      # host -n -t PTR 192.168.0.71
      # host -n -t PTR 192.168.0.7

      Check /etc/hosts on each nim server.
      Check /etc/netsvc.conf on each nim server (prefer local4 before bind4).

      Retry your command.

      Regards,

      Benoît.

      • Hi Benoît,

        “Host not found, try again.”. We don’t have DNS in place, and the hostfile contains;

        192.168.0.71 nim01.localdomain nim01
        192.168.0.7 nim02.localdomain nim02

        /etc/netsvc.conf was “local,bind4″, changing to local4 didn’t help.

      • Hi, i will also open a call with IBM for this. When they find the answer i’ll post it here for reference :-)

        • hanim and nimsh are very sensitive to name resolution, I think your problem is related to it.

          Be careful to set your hostnames to nim01.localdomain and nim02.localdomain, or to swap the order of the names in /etc/hosts so that the short name comes first.

          Tell me when you have news of your PMR.

          Regards,

          Benoit
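
The point about name order can be checked without DNS. Here is a minimal sketch, assuming a hosts-format file like the one quoted in this thread: it prints the first name listed for an IP address and compares it with the name you expect (typically the output of hostname). canonical_name and check_alias_order are hypothetical helper names, not NIM commands.

```shell
# Print the first (canonical) name listed for an IP in a hosts-format file.
# $1 = hosts file, $2 = IP address
canonical_name() {
    awk -v ip="$2" '$1 == ip { print $2; exit }' "$1"
}

# Report whether the canonical name for an IP is the name you expect.
# $1 = hosts file, $2 = IP address, $3 = expected name
check_alias_order() {
    found=$(canonical_name "$1" "$2")
    if [ "$found" = "$3" ]
    then
        echo "OK $2 -> $found"
    else
        echo "MISMATCH $2 -> ${found:-<none>} (expected $3)"
        return 1
    fi
}

# Example, to be run on each NIM server:
# check_alias_order /etc/hosts 192.168.0.71 "$(hostname)"
```

A mismatch here does not prove that it is the cause of the niminit error, but it is a quick thing to rule out before opening a PMR.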

    • Hi Guys,

      so the answer is: open up more ports… I found all the info needed on this page: http://www-01.ibm.com/support/docview.wss?uid=isg3T1011808#6

      and especially the rsh ports:
      rsh* 513 – 1023**

      So, even though we are using nimsh in our command, it still does something via rsh… even though shell and login are not active in inetd.conf on either of the NIM servers.

      Mike

      • Uh !

        Ok, have you tried any tcpdump trace to check what is going on on which port ?

        Thanks for your answer :-).

        B.

        • Alternate NIM master, traffic from alternate master view

          1. ping from alternate to master
          2. TCP from alternate to master (1023 to 1059)
          3. TCP from master to alternate (1023 to 1022)
          4. TCP from master to alternate (1059 to 1023)
          5. ping from master to alternate

          and of course we see more packets, all on the above ports (SYN, ACK, PSH, Window Update, FIN)

          • Hi mike,

            Thanks for your help, I’ll update the post to talk about it.

            Thanks again.
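
For anyone who wants to reproduce this kind of trace, a capture filter limited to the ports discussed in this thread keeps the output readable. This is a small sketch, assuming your tcpdump build accepts the pcap portrange keyword; pcap_filter is a hypothetical helper, and the host name and interface in the example are simply the ones used in the comments above.

```shell
# Build a pcap filter for TCP traffic with one peer on the given ports/ranges.
# Arguments: peer host, then any mix of single ports (3901) and ranges (513-1023).
pcap_filter() {
    peer=$1; shift
    filter=
    for p in "$@"
    do
        case $p in
            *-*) term="portrange $p" ;;
            *)   term="port $p" ;;
        esac
        # Join the terms with "or", without a leading "or" on the first one.
        filter="${filter:+$filter or }$term"
    done
    printf 'host %s and tcp and (%s)\n' "$peer" "$filter"
}

# Example (run on the alternate master):
# tcpdump -n -i en0 "$(pcap_filter nim01 513-1023 3901 3902)"
```

Passing the ports as arguments makes it easy to add whatever else your firewall team has opened.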

  8. Hi,

    Thank you for this article!!!
    But I have a problem, at the end of synchronisation I obtain an error message:
    nim_master_recover Complete

    /usr/sbin/niminit
    0042-001 nim: processing error encountered on “master”:
    0042-020 nim: “object name” must be supplied for this
    operation
    […]
    0042-051 niminit: unable to resolve “”
    to an IP address

    rc=1
    0042-001 m_sync: processing error encountered on “SRVNIMB”:

    My name resolution works fine.
    Any idea?

  9. nim -Fo sync
    command hangs at
    “Removing NIM client “.
    My oslevel “7100-03-04-1441″ and nim sw level is 7.1.3.30.

    • Seems pretty strange. Never faced that issue. I’ll setup a test nimha this weekend and tell you if I have the same issue.
      Sorry for the delay.
      Regards,
      B.

  10. Hi, Great article thanks!
    In the A-Z redbook I see the info below, saying that I need to copy the lpp_source directory manually. Does the replicate=yes option do this for me?

    Copy the lpp_source to the backup NIM master.
    You need to create the same file system structure for the lpp_source as in the
    primary NIM master server.
    Copy all the lpp_source directory to the backup NIM master:
    From the Primary NIM master,
    # cd /export/lppsource/lpp-aix5304
    # find . -print | backup -iqvf- |rsh lpar6 -l root "cd \
    /export/lppsource/lpp-aix5304 ;restore -xqvf-"

  11. I use a script to create the client / cec and install the lpar using DSM & NIM.
    I am facing an issue when dconsole is invoked, which stops the whole process:
    dconsole from dsm.core 7.2.0.0
    lpar_netboot error message:
    “lpar_netboot Status: console command is /opt/ibm/sysmgt/dsm/bin//dconsole -c -f -t -n “1|hmc|192.168.11.81|TargetHWTypeModel=8202-E4B:TargetHWSerialNum=212824V:TargetLPARID=1|/etc/ibm/sysmgt/dsm/config/hmc-p6-1_password_file”
    Starting console daemon
    [read-write session
    lpar_netboot Status: connected to dconsole
    ]
    2760-257 [dconsoled] The Virtual Terminal session was closed by another user.
    Console connection closed by peer.
    Press “Enter”
    lpar_netboot Status: dconsole failure detected
    lpar_netboot Status: 2760-257 [dconsoled] The Virtual Terminal session was closed by another user.
    Console connection closed by peer.”

    I worked around the issue in the script by :
    1- exporting resources from NIM for the lpar
    2- launching lpar_netboot on the HMC
    3- launching mkvterm on the HMC
    but that takes 16 lines instead of 1….

    Any help on this issue is welcome :)
