Updating AIX TL and SP using Chef

Automating the update of a service pack or a technology level has always been a dream that never came true. Trust me, almost every customer I know has tried to make that dream come true. Different customers, same story everywhere: they tried to build something and then failed miserably. One fact is always true in those stories: the decision is always taken by someone who does not understand that AIX cannot be managed like a workstation or any other OS (who said Windows?). A good example is an IBM tool (and you know that I’m an IBM fan) called BigFix/TEM (for Tivoli Endpoint Manager). I’m not a TEM expert (so maybe I am wrong), but you can use it to update your Windows, your Linux, your AIX and even your iPhone or Android devices. LET ME LAUGH! How can anyone think about updating an iPhone the same way as an AIX host? A good joke! (To be clear, I support and will always support IBM, but my role is also to say what I think.) Another good example is IBM Systems Director (unfortunately … or fortunately, this one was withdrawn a couple of days ago). I tried it myself a few years ago (you can check this post). Systems Director was, in my humble opinion, the least bad solution to update an AIX host or a Virtual I/O Server in an automated way.

So how are we going to do this in a world that always asks us to do more with fewer people? A few months ago I had to find a solution to update more than 700 hosts from AIX 6.1 to AIX 7.1. The job was to create something that anybody could launch without knowing anything about AIX (one more time, who can even think this is possible?). I wrote scripts to automate nimadm and I’m pretty happy with this solution: almost 80% of the migrations were OK without any errors. But there were tons of prerequisites before launching the scripts, and we faced some inevitable problems (nimsh errors, sendmail configuration, broken filesets) that forced the AIX L3 team to fix tons of migrations.

As everybody knows by now, I have been working on Chef for a few months, and it can be the answer to what our world is asking for today: replacing hundreds of people with a single person launching a magical thing that can do everything, without knowing anything about anything, and saving money! This is obviously ironic, but unfortunately it is the reality of what happens today in France. “Money” and “resources” rule everything, without any plan for the future (to be clear, I’m talking in general here; nothing here reflects what is going on at my workplace). It is what it is, and as a good soldier I’m going to give you solutions to face the reality of this harsh world. But now it’s action time! I don’t want to be too pessimistic, but this is unfortunately what is happening today, and my anger about it only reflects the fact that I’m living in fear: the fear of becoming bad, or the fear of doing a job I really don’t like. I think I have to find a solution to this problem. The picture below is clear enough to give you a good example of what I’m trying to do with Chef.

CF8j9_dWgAAOuyC

How do we update machines

I’m not here to teach you how to update a service pack or a technology level (I’m sure everybody knows that), but to do it in an automated way we need to talk about the method and identify each step needed to perform an update. As there is always more than one way to do it, I have identified three ways to update a machine: the multibos way, the nimclient way and finally the alt_disk_copy way. To be able to update using Chef we obviously need a provider for each method (you could do this with the execute resource, but we’re here to have fun and to learn some new things). So we need one provider capable of managing multibos, one capable of managing nimclient, and one capable of managing alt_disk_copy. All three providers are available now and can be used to write different recipes doing what is necessary to update a machine. Obviously some pre-update and post-update steps are needed (removing efixes, checking filesets). Let’s identify the required steps first:

  • Verify with lppchk the consistency of all installed packages.
  • Remove any installed efixes (using emgr provider)
  • The multibos way:
    • You don’t need to create a backup of the rootvg using the multibos way.
    • Mount the SP or TL directory from the NIM server (using Chef mount resource).
    • Create the multibos instance and update using the remote mounted directory (using multibos resource).
  • The nimclient way:
    • Create a backup of your rootvg (using the altdisk resource).
    • Use nimclient to run a cust operation (using the niminit and nimclient resources).
  • The alt_disk_copy way:
    • You don’t need to create a backup of the rootvg using the alt_disk_copy way.
    • Mount the SP or TL directory from the NIM server (using Chef mount).
    • Create the altinst_rootvg volume group and update it using the remote mounted directory (using altdisk provider).
  • Reboot the machine.
  • Remove any unwanted standby bos or old_rootvg.

Reminder where to download the AIX Chef cookbook:

Before trying to do all these steps in a single recipe, let’s use the resources one by one to understand what each one is doing.

Fixes installation

This one is simple and allows you to install or remove fixes on your AIX machine. In the examples below we show how to do that with two Chef recipes: one for installing and the other for removing! Super easy.

Installing fixes

In the recipe, provide the names of all the fixes in an array and specify the directory that contains the fix packages (this can be an NFS mount point if you want). Please note that I’m using the cookbook_file resource to download the fixes; this resource lets you download a file directly from the cookbook (so from the Chef server). Imagine using this single recipe to install a fix on all your machines. Quite easy ;-)

directory "/var/tmp/fixes" do
  action :create
end

cookbook_file "/var/tmp/fixes/IV75031s5a.150716.71TL03SP05.epkg.Z" do
  source 'IV75031s5a.150716.71TL03SP05.epkg.Z'
  action :create
end

cookbook_file "/var/tmp/fixes/IV77596s5a.150930.71TL03SP05.epkg.Z" do
  source 'IV77596s5a.150930.71TL03SP05.epkg.Z'
  action :create
end

aix_fixes "installing fixes" do
  fixes ["IV75031s5a.150716.71TL03SP05.epkg.Z", "IV77596s5a.150930.71TL03SP05.epkg.Z"]
  directory "/var/tmp/fixes"
  action :install
end

directory "/var/tmp/fixes" do
  recursive true
  action :delete
end

emgr1

Removing fixes

The recipe is almost the same, but with the remove action instead of the install action. Please note that you can specify which fixes to remove or use the keyword all to remove all the installed fixes (in the recipe that updates our servers we will use “all”, as we want to remove every fix before launching the update).

aix_fixes "remove fixes IV75031s5a and IV77596s5a" do
  fixes ["IV75031s5a", "IV77596s5a"]
  action :remove
end

aix_fixes "remove all fixes" do
  fixes ["all"]
  action :remove
end

emgr2

Alternate disks

In most AIX shops I have seen, the way to back up your system before doing anything is to create an alternate disk using the alt_disk_copy command. Sometimes, in places where sysadmins love their job, this disk is updated on the fly to do a TL or SP upgrade. The altdisk resource I’ve coded for Chef takes care of this. I will not give an example for every available action and will focus on create and customize:

  • create: Create an alternate disk (the attributes are detailed in the next section).
  • cleanup: Clean up the alternate disk (remove it).
  • rename: Rename the alternate disk.
  • sleep: Put the alternate disk to sleep (unmount every /alt_inst/* filesystem and vary off the volume group).
  • wakeup: Wake up the alternate disk (vary on the volume group and mount every filesystem).
  • customize: Run a cust operation (the current resource is coded to update the alternate disk with all the filesets present in a directory).

Creation

The create action creates an alternate disk and helps you find an available disk for it. In any case only free disks will be chosen (disks with no PVID and no volume group defined). Different types are available to choose the disk on which the alternate disk will be created:

  • Size: If type is size, a disk whose size exactly matches the value attribute is used.
  • Name: If type is name, the disk whose name matches the value attribute is used.
  • Auto: In auto mode the possible values for value are bigger and equals. If bigger is chosen, the first disk found with a size bigger than the current rootvg size is used. If equals is chosen, the first disk found with a size equal to the current rootvg size is used.

aix_altdisk "cloning rootvg by name" do
  type :name
  value "hdisk3"
  action :create
end

aix_altdisk "cloning rootvg by size 66560" do
  type :size
  value "66560"
  action :create
end

aix_altdisk "removing old alternates" do
  action :cleanup
end

aix_altdisk "cloning rootvg" do
  type :auto
  value "bigger"
  action :create
end

altdisk1
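
The selection rules described above (free disks only, then size, name or auto) can be sketched in plain Ruby. This is only an illustration and not the code of the altdisk provider: the Disk struct, the pick_disk helper and the sample sizes are hypothetical names and values made up for this example, and sizes are assumed to be expressed in MB.

```ruby
# Hypothetical model of a disk: name, size in MB, PVID and volume group
# (nil when the disk has no PVID / belongs to no volume group).
Disk = Struct.new(:name, :size_mb, :pvid, :vg)

# Only disks with no PVID and no volume group are candidates.
def free_disks(disks)
  disks.select { |d| d.pvid.nil? && d.vg.nil? }
end

# Returns the first free disk matching the type/value pair, or nil.
def pick_disk(disks, type:, value:, rootvg_size_mb: nil)
  candidates = free_disks(disks)
  case type
  when :name
    candidates.find { |d| d.name == value }
  when :size
    candidates.find { |d| d.size_mb == Integer(value) }
  when :auto
    case value
    when "bigger" then candidates.find { |d| d.size_mb > rootvg_size_mb }
    when "equals" then candidates.find { |d| d.size_mb == rootvg_size_mb }
    else raise ArgumentError, "unknown auto value: #{value}"
    end
  else
    raise ArgumentError, "unknown type: #{type}"
  end
end

# Made-up inventory: hdisk0 holds rootvg, hdisk3 still carries a PVID.
disks = [
  Disk.new("hdisk0", 66560, "00c1", "rootvg"),
  Disk.new("hdisk1", 66560, nil, nil),
  Disk.new("hdisk2", 81920, nil, nil),
  Disk.new("hdisk3", 102400, "00c2", nil)
]

chosen = pick_disk(disks, type: :auto, value: "bigger", rootvg_size_mb: 66560)
puts chosen ? chosen.name : "no free disk found"
```

Note that asking for hdisk3 by name returns nothing in this sketch, because a disk that still has a PVID is never considered free.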

Customization

The customize action updates the previously created alternate disk with the filesets present in an NFS-mounted directory (from the NIM server). Please note in the recipe below that we are mounting the directory over NFS. node[:nim_server] is a node attribute telling which NIM server will be mounted. For instance, you can define one NIM server for the production environment and another one for the development environment.

# mounting /mnt
mount "/mnt" do
  device "#{node[:nim_server]}:/export/nim/lpp_source"
  fstype 'nfs'
  action :mount
end

# updating the alternate disk
aix_altdisk "altdisk_update" do
  image_location "/mnt/7100-03-05-1524"
  action :customize
end

mount "/mnt" do
  action :umount
end

altdisk_cust

niminit/nimclient

The niminit and nimclient resources are used to register the NIM client with the NIM master and then run a nimclient operation from the client. In my humble opinion this is the best way to do the update at the time of writing this blog post. One cool thing is that you can specify on which adapter the nimclient will be configured by using some ohai attributes. It’s an elegant way to do it, and one more time it shows you the power of Chef ;-) . Let’s start with some examples:

niminit

aix_niminit node[:hostname] do
  action :remove
end

aix_niminit node[:hostname] do 
  master "nimcloud"
  connect "nimsh"
  pif_name node[:network][:default_interface]
  action :setup
end

nimclient1

nimclient

nimclient can first be used to install some filesets you may need. The provider is smart enough to choose the right lpp_source for you. Please note that your lpp_source resources need a specific naming convention if you want to use this feature. To find the next/latest available SP/TL, the provider checks the current oslevel of the machine and compares it with the lpp_source resources available on your NIM server. The required naming convention is $(oslevel -s)-lpp_source (e.g. 7100-03-05-1524-lpp_source). The same principle applies to the spot when you need one.

$ lsnim -t lpp_source | grep 7100
7100-03-00-0000-lpp_source             resources       lpp_source
7100-03-01-1341-lpp_source             resources       lpp_source
7100-03-02-1412-lpp_source             resources       lpp_source
7100-03-03-1415-lpp_source             resources       lpp_source
7100-03-04-1441-lpp_source             resources       lpp_source
7100-03-05-1524-lpp_source             resources       lpp_source

If your NIM resource names follow this convention, the lpp_source attribute can be:

  • latest_sp: the latest available service pack.
  • next_sp: the next available service pack.
  • latest_tl: the latest available technology level.
  • next_tl: the next available technology level.
  • If you do not want to do this you can still specify the name of the lpp_source by hand.
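
To make the keywords above more concrete, here is a minimal plain-Ruby sketch of how such a lookup could work against names that follow the $(oslevel -s)-lpp_source convention. It is not the provider's actual code: parse_level and resolve_lpp_source are hypothetical helpers, and the choice of returning the lowest service pack of a technology level for next_tl and latest_tl is an assumption of this sketch.

```ruby
# Hypothetical parsed form of "7100-03-04-1441" or "7100-03-04-1441-lpp_source".
Level = Struct.new(:release, :tl, :sp, :build, :name)

def parse_level(str)
  m = /\A(\d{4})-(\d{2})-(\d{2})-(\d{4})(?:-lpp_source)?\z/.match(str)
  return nil unless m
  Level.new(m[1], m[2].to_i, m[3].to_i, m[4], str)
end

# keyword: "latest_sp", "next_sp", "latest_tl", "next_tl" or an explicit name.
# oslevel: output of `oslevel -s` on the client.
# lpp_sources: names of the lpp_source resources known to the NIM server.
def resolve_lpp_source(keyword, oslevel, lpp_sources)
  current = parse_level(oslevel)
  raise ArgumentError, "unexpected oslevel: #{oslevel}" if current.nil?

  # Keep only well-named resources of the same release, oldest first.
  levels = lpp_sources.map { |n| parse_level(n) }.compact
                      .select { |l| l.release == current.release }
                      .sort_by { |l| [l.tl, l.sp] }
  newer_sp = levels.select { |l| l.tl == current.tl && l.sp > current.sp }
  newer_tl = levels.select { |l| l.tl > current.tl }

  chosen =
    case keyword
    when "next_sp"   then newer_sp.first
    when "latest_sp" then newer_sp.last
    when "next_tl"   then newer_tl.first
    when "latest_tl"
      # Assumption: pick the lowest SP of the highest TL available.
      top = newer_tl.map(&:tl).max
      newer_tl.find { |l| l.tl == top }
    else
      return keyword # explicit lpp_source name given by hand
    end
  chosen && chosen.name
end

sources = %w[
  7100-03-00-0000-lpp_source 7100-03-01-1341-lpp_source
  7100-03-02-1412-lpp_source 7100-03-03-1415-lpp_source
  7100-03-04-1441-lpp_source 7100-03-05-1524-lpp_source
]
puts resolve_lpp_source("latest_sp", "7100-03-04-1441", sources)
```

With the lpp_source list shown earlier, a client at 7100-03-02-1412 would get 7100-03-03-1415-lpp_source for next_sp and 7100-03-05-1524-lpp_source for latest_sp, while next_tl returns nil because no newer technology level is present.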

Here are a few examples that install packages:

aix_nimclient "installing filesets" do
  installp_flags "aXYg"
  lpp_source "7100-03-04-1441-lpp_source"
  filesets ["openssh.base.client","openssh.base.server","openssh.license"]
  action :cust
end

aix_nimclient "installing filesets" do
  installp_flags "aXYg"
  lpp_source "7100-03-04-1441-lpp_source"
  filesets ["bos.compat.cmds", "bos.compat.libs"]
  action :cust
end

aix_nimclient "installing filesets" do
  installp_flags "aXYg"
  lpp_source "7100-03-04-1441-lpp_source"
  filesets ["Java6_64.samples"]
  action :cust
end

nimclient2

Please note that some filesets were already installed, so the resource did not converge for them ;-) . Let’s now try to update to the latest service pack:

aix_nimclient "updating to latest sp" do
  installp_flags "aXYg"
  lpp_source "latest_sp"
  fixes "update_all"
  action :cust
end

nimclient3

Tadam! The machine was updated from 7100-03-04-1441 to 7100-03-05-1524 using a single recipe, without knowing in advance which service pack was available!

multibos

I really like the multibos way and I don’t know why so few people use it today. Anyway, I know some customers who work only this way, so I thought a multibos resource was worth the effort. Here is a nice recipe creating a bos and updating it.

# creating dir for mount
directory "/var/tmp/mnt" do
  action :create
end

# mounting /var/tmp/mnt
mount "/var/tmp/mnt" do
  device "#{node[:nim_server]}:/export/nim/lpp_source"
  fstype 'nfs'
  action :mount
end

# removing standby multibos
aix_multibos "removing standby bos" do
  action :remove
end

# create multibos and update it
aix_multibos "creating bos " do
  action :create
end

aix_multibos "update bos" do
  update_device "/var/tmp/mnt/7100-03-05-1524"
  action :update
end

# unmounting /var/tmp/mnt
mount "/var/tmp/mnt" do
  action :umount
end

# deleting temp directory
directory "/var/tmp/mnt" do
  action :delete
end

multibos1
multibos2

Full recipes for updates

Let’s now write a big recipe doing everything we need for an update. Remember that if one resource fails, the recipe stops by itself. For instance, you’ll see in the recipe below that I’m running “lppchk -vm3”. If it returns something other than 0, the resource fails and so does the recipe. This is obviously the expected behavior: it makes sense not to continue if there is a problem. To sum up, here are all the steps this recipe performs: checking fileset consistency, removing all fixes, committing filesets, creating an alternate disk, configuring the nimclient, running the update, and deallocating resources.

# if the lppchk -vm3 return code is not zero
# the recipe will fail
# no guard needed here
execute "lppchk" do
  command 'lppchk -vm3'
end

# removing any efixes
aix_fixes "removing_efixes" do
  fixes ["all"]
  action :remove
end

# committing filesets
# no guard needed here
execute 'commit' do
  command 'installp -c all'
end

# cleaning existing altdisk
aix_altdisk "cleanup alternate rootvg" do
  action :cleanup
end

# creating an alternate disk using the
# first disk bigger than the actual rootvg
# bootlist to false as this disk is just a backup copy
aix_altdisk "altdisk_by_auto" do
  type :auto
  value "bigger"
  change_bootlist true
  action :create
end

# nimclient configuration
aix_niminit node[:hostname] do
  master "nimcloud"
  connect "nimsh"
  pif_name "en1"
  action :setup
end

# update to latest available tl/sp
aix_nimclient "updating to latest sp" do
  installp_flags "aXYg"
  lpp_source "latest_sp"
  fixes "update_all"
  action :cust
end

# deallocate resources
aix_nimclient "deallocating resources" do
  action :deallocate
end

How about a single point of management “knife ssh”, “pushjobs”

Chef is, and was designed as, a pull model: the client asks the server for the recipes and cookbooks and then executes them. This is the role of the chef-client. In a Linux environment, people often run the client in daemonized mode: the client wakes up at a regular interval and runs (so every change to the cookbooks is applied by the client). I’m almost sure that every AIX shop will be against this method because it is dangerous. If you do that, run the change first in a test environment, then in dev, and finally in production. To be honest this is not the model I want to build where I am working. For some actions (like updates) we want a push model. By default Chef is delivered with a feature called push jobs. Push jobs is a way to run jobs like “execute the chef-client” from your knife workstation. Unfortunately push jobs needs a plugin for the chef-client, and this plugin is only available on Linux … not yet on AIX.

Anyway we have an alternative: the knife ssh plugin. This plugin, which is included in knife by default, allows you to run commands on the nodes over ssh. Even better, if you already have an ssh gateway with key sharing enabled, knife ssh can use this gateway to communicate with the clients. Using knife ssh you can say “run chef-client on all my AIX 6.1 nodes” or “run this recipe installing this fix on all my AIX 7.1 nodes”; the possibilities are infinite. One last note about knife ssh: it creates tunnels through your ssh gateway to communicate with the nodes, so if you use a shared key you have to copy the private key to the knife workstation (it took me some time to understand that). Here are some examples:

knifessh

  • On two nodes check the current os level:
  • ssh1

  • Run the update with Chef:
  • update3

  • Alternates disk have been created:
  • update4

  • Both systems are up to date:
  • update5
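
For readers who cannot see the screenshots, the kind of knife ssh invocation described in this section can be sketched as follows. This is only an illustration that builds and prints command lines without running them. The search queries, the root user, the admin@sshgateway gateway and the aix_updates::install_fixes recipe name are all hypothetical, and the platform_version values in particular depend on what ohai reports on your nodes. The -x (ssh user) and -G (ssh gateway) options are standard knife ssh options, and -o is the chef-client override run-list option.

```ruby
require "shellwords"

# Build (but do not execute) a "knife ssh QUERY COMMAND" command line.
def knife_ssh_argv(query, command, user: nil, gateway: nil)
  argv = ["knife", "ssh", query, command]
  argv += ["-x", user] if user       # ssh user used on the nodes
  argv += ["-G", gateway] if gateway # ssh gateway used to reach the nodes
  argv
end

# "Check the current os level on all my AIX nodes" (hypothetical query).
check = knife_ssh_argv("platform:aix", "oslevel -s",
                       user: "root", gateway: "admin@sshgateway")

# "Run this recipe on all my AIX 7.1 nodes" (hypothetical query and recipe).
update = knife_ssh_argv("platform:aix AND platform_version:7.1",
                        "chef-client -o 'recipe[aix_updates::install_fixes]'",
                        user: "root", gateway: "admin@sshgateway")

[check, update].each { |argv| puts Shellwords.join(argv) }
# To actually launch one of them from the knife workstation: system(*check)
```

Keeping the command as an array and passing it to system avoids quoting problems with search queries that contain spaces.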

Conclusion

I hope this blog post helped you to better understand Chef and what it is capable of. We are still at the very beginning of the AIX Chef cookbook and I’m sure plenty of new things (recipes, providers) will come in the next few months. Try it yourself and I’m sure you’ll like the way it works. I must admit that it is difficult to learn and to get started, but if you do it right you’ll get the benefit of an automation tool working on AIX … and honestly AIX needs an automation tool. I’m almost sure it will be Chef (in fact we have no other choice). Help us write postinstall recipes, update recipes and any other recipes you can think of. We need your help and it is happening now! You have the opportunity to be a part of this, a part of something new that will help AIX in the future. We don’t want a dying OS; Chef will give AIX the opportunity to be an OS with a fully working automation tool. Go give it a try now!

5 thoughts on “Updating AIX TL and SP using Chef”

  1. Hi, nice post. Thank you very much for sharing. You recommend using the “push” model with knife ssh and I have one question: although you use knife ssh in this environment, do you still need a Chef master server?
    Thanks much!!!! Cheers from Andorra.

    • Yep, you will need a Chef server to use knife!
      If you don’t want to do that you can also use chef-zero/chef-solo to launch recipes without a server (and use an ssh loop to handle multiple servers).
