Running Docker on PowerSystems using Ubuntu and Redhat ppc64le (docker, registry, compose and swarm with haproxy)

Every blog post I have read for the last couple of months mentions Docker, that's a fact! I have never been so stressed in years, because our jobs are changing. That is not my choice or my will, but what we were doing a couple of years ago, and what we are doing now, is going to disappear sooner than I thought. The world of infrastructure as we know it is dying, and so are sysadmin jobs. I would never have thought this could happen to me during my career, but here we are. Old Unix systems are slowly dying and Linux virtual machines are becoming less and less popular. One part of my career plan was to be excellent on two different systems, Linux and AIX, but I now have to recognize I probably made a mistake thinking it would save me from unemployment or from any bullshit job. We're all gonna die, that's certain, but the reality is that I would rather work on something fun and innovative than be stuck on old stuff forever.

We have had OpenStack for a while and we now have Docker. As no employer will look at a candidate with no Docker experience I had to learn it (in fact I have been using Docker for more than a year now; my Twitter followers already know this). I don't want to be one of the social rejects of a world that is changing too fast. Computer science is living its car crisis and we are the blue collars who will be left behind. There is no choice; there won't be a place for everyone and you will not be the only one fighting in the pit trying to be hired. You have to react now or slowly die … like all the sysadmins I see in banks getting worse and worse. Moving them to OpenStack was a real challenge (still not completed); I can't imagine trying to make them work on Docker. On the other hand I am also surrounded by excellent people (I have to say I met a true genius a couple of years ago) who are doing crazy things. Unfortunately for me they are not working with me (they are in big companies (ie. RedHat/Oracle/Big Blue) or in other places where people tend to understand that something is changing and going on). I feel like I am bad at everything I do. Unemployable. But I don't want to die. I still have the energy to work on new things and Docker is a part of it.

One of my challenges was/is to migrate all our infrastructure services to Docker, not just for the fun of it but to be able to easily reproduce this infrastructure over and over again. The goal here is to run every infrastructure service in a Docker container and try, at least, to make them highly available. We are going to see how to do that on PowerSystems, using Ubuntu or RedHat ppc64le to run our Docker engine and containers. We will then create our own Docker base images (Ubuntu and RedHat ones) and push them into our custom-made registry. Then we will create containers for our applications (I'll just give some examples here: a web server and grafana/influxdb). Finally we will try Swarm to make these containers highly available by creating "global/replicas" services. This blog post is also here to prove that Power is an architecture on which you can do the exact same things as on x86. Having Ubuntu 16.04 LTS available on the ppc64le arch is a damn good thing because it provides a lot of Opensource products (graphite, grafana, influxdb, all the web servers, and so on). Let's do everything to become a killer DevOps. I have done this for sysadmin stuff, why the hell would I not be capable of providing the same effort on DevOps things. I'm not that bad, at least I try.


Installing the docker-engine

Red Hat Enterprise Linux ppc64el

Unfortunately for our "little" community the current Red Hat Enterprise repositories for the ppc64le arch do not provide the Docker packages. IBM is providing a repository at this address: http://ftp.unicamp.br/pub/ppc64el/rhel/7_1/. On my side I'm mirroring this repository on my local site (with wget) and creating my own repository, as my servers have no access to the internet. Keep in mind that this repository is not up to date with the latest version of Docker. At the time I'm writing this blog post Docker 1.13 is available, but this repository is still serving Docker 1.12. Not exactly what we want for a technology like Docker (we absolutely want to keep the engine up to date):

# wget --mirror http://ftp.unicamp.br/pub/ppc64el/rhel/7_1/docker-ppc64el/
# wget --mirror http://ftp.unicamp.br/pub/ppc64el/rhel/7_1/misc_ppc64el/
# cat docker.repo
[docker-ppc64le-misc]
name=docker-ppc64le-misc
baseurl=http://nimprod:8080/dockermisc-ppc64el/
enabled=1
gpgcheck=0
[docker-ppc64le]
name=docker-ppc64le
baseurl=http://nimprod:8080/docker-ppc64el/
enabled=1
gpgcheck=0
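
The wget mirror alone is not enough: the downloaded RPMs still have to be turned into a real yum repository and published over HTTP. A minimal sketch of that step (the /export/mirrors paths are just an example, and nimprod:8080 is simply my internal web server):

# createrepo /export/mirrors/docker-ppc64el
# createrepo /export/mirrors/dockermisc-ppc64el
# then publish both directories with any web server (httpd, nginx, ...) listening on port 8080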
# yum info docker.ppc64le
Loaded plugins: product-id, search-disabled-repos, subscription-manager
This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
Installed Packages
Name        : docker
Arch        : ppc64le
Version     : 1.12.0
Release     : 0.ael7b
Size        : 77 M
Repo        : installed
From repo   : docker-ppc64le
Summary     : The open-source application container engine
URL         : https://dockerproject.org
License     : ASL 2.0
Description : Docker is an open source project to build, ship and run any application as a
[..]
# yum search swarm
Loaded plugins: product-id, search-disabled-repos, subscription-manager
This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
============================================================================================================================== N/S matched: swarm ==============================================================================================================================
docker-swarm.ppc64le : Docker Swarm is native clustering for Docker.
[..]
# yum -y install docker
[..]
Downloading packages:
(1/3): docker-selinux-1.12.0-0.ael7b.noarch.rpm                                                                                                                                                                                                          |  27 kB  00:00:00
(2/3): libtool-ltdl-2.4.2-20.el7.ppc64le.rpm                                                                                                                                                                                                             |  50 kB  00:00:00
(3/3): docker-1.12.0-0.ael7b.ppc64le.rpm                                                                                                                                                                                                                 |  16 MB  00:00:00
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total                                                                                                                                                                                                                                            33 MB/s |  16 MB  00:00:00
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : libtool-ltdl-2.4.2-20.el7.ppc64le                                                                                                                                                                                                                            1/3
  Installing : docker-selinux-1.12.0-0.ael7b.noarch                                                                                                                                                                                                                         2/3
setsebool:  SELinux is disabled.
  Installing : docker-1.12.0-0.ael7b.ppc64le                                                                                                                                                                                                                                3/3
rhel72/productid                                                                                                                                                                                                                                         | 1.6 kB  00:00:00
  Verifying  : docker-selinux-1.12.0-0.ael7b.noarch                                                                                                                                                                                                                         1/3
  Verifying  : docker-1.12.0-0.ael7b.ppc64le                                                                                                                                                                                                                                2/3
  Verifying  : libtool-ltdl-2.4.2-20.el7.ppc64le                                                                                                                                                                                                                            3/3

Installed:
  docker.ppc64le 0:1.12.0-0.ael7b

Dependency Installed:
  docker-selinux.noarch 0:1.12.0-0.ael7b                                                                                                   libtool-ltdl.ppc64le 0:2.4.2-20.el7

Complete!
# systemctl start docker
# docker ps -a
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
# docker info
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 0
Server Version: 1.12.0
[..]
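
One thing not shown above: if you want the engine to come back automatically after a reboot, also enable the unit:

# systemctl enable docker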

Enabling the device-mapper direct disk mode (instead of loop)

By default on RHEL, after installing the docker packages and starting the engine, Docker uses an LVM loop device to create its pool (where the images and the containers will be stored). This is not recommended and not good for production usage. That's why on every docker engine host I'm creating a dockervg volume group for this pool. Red Hat provides, with the Atomic Host project, a tool called docker-storage-setup that configures the thin pool for you (on another volume group).

# git clone https://github.com/projectatomic/docker-storage-setup.git
# cd docker-storage-setup
# make install

Create a volume group on a physical volume, configure and run docker-storage-setup:

# docker-storage-setup --reset
# systemctl stop docker
# rm -rf /var/lib/docker
# pvcreate /dev/mapper/mpathb
  Physical volume "/dev/mapper/mpathb" successfully created
# vgcreate dockervg /dev/mapper/mpathb
  Volume group "dockervg" successfully created
# cat /etc/sysconfig/docker-storage-setup
# Edit this file to override any configuration options specified in
# /usr/lib/docker-storage-setup/docker-storage-setup.
#
# For more details refer to "man docker-storage-setup"
VG=dockervg
SETUP_LVM_THIN_POOL=yes
DATA_SIZE=70%FREE
# /usr/bin/docker-storage-setup
  Rounding up size to full physical extent 104.00 MiB
  Logical volume "docker-pool" created.
  Logical volume "docker-pool" changed.
# cat /etc/sysconfig/docker-storage
DOCKER_STORAGE_OPTIONS="--storage-driver devicemapper --storage-opt dm.fs=xfs --storage-opt dm.thinpooldev=/dev/mapper/dockervg-docker--pool --storage-opt dm.use_deferred_removal=true "

I don't know why, on the version of docker I am running, the DOCKER_STORAGE_OPTIONS variable (in /etc/sysconfig/docker-storage) was not read. I had to manually edit the systemd unit to let Docker use my thinpooldev (a cleaner drop-in based alternative is sketched after the docker info output below):

# vi /usr/lib/systemd/system/docker.service
ExecStart=/usr/bin/dockerd --storage-driver devicemapper --storage-opt dm.fs=xfs --storage-opt dm.thinpooldev=/dev/mapper/dockervg-docker--pool --storage-opt dm.use_deferred_removal=true
# systemctl daemon-reload
# systemctl start docker
# docker info
[..]
Storage Driver: devicemapper
 Pool Name: dockervg-docker--pool
 Pool Blocksize: 524.3 kB
 Base Device Size: 10.74 GB
 Backing Filesystem: xfs
 Data file:
 Metadata file:
 Data Space Used: 20.45 MB
 Data Space Total: 74.94 GB
 Data Space Available: 74.92 GB
 Metadata Space Used: 77.82 kB
 Metadata Space Total: 109.1 MB
 Metadata Space Available: 109 MB
 Thin Pool Minimum Free Space: 7.494 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: true
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0
 Library Version: 1.02.107-RHEL7 (2015-10-14)
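
For the record, a cleaner alternative to editing the packaged unit file is a systemd drop-in that simply reads /etc/sysconfig/docker-storage. This is only a sketch of what it could look like, not what I ended up running:

# mkdir -p /etc/systemd/system/docker.service.d
# cat /etc/systemd/system/docker.service.d/storage.conf
[Service]
EnvironmentFile=-/etc/sysconfig/docker-storage
ExecStart=
ExecStart=/usr/bin/dockerd $DOCKER_STORAGE_OPTIONS
# systemctl daemon-reload
# systemctl restart docker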

Ubuntu 16.04 LTS ppc64le

As always, on Ubuntu everything is super easy. I'm just deploying an Ubuntu 16.04 LTS and running a single apt install to install the docker engine. Neat. Just for your information, as my servers do not have any access to the internet, I'm using a tool called apt-mirror to mirror the official Ubuntu repositories. The tool can easily be found on github at this address: https://github.com/apt-mirror/apt-mirror. You then just have to specify which arch and which repositories you want to clone on your local site:

# cat /etc/apt/mirror.list
[..]
set defaultarch       ppc64el
[..]
set use_proxy         on
set http_proxy        proxy:8080
set proxy_user        benoit
set proxy_password    mypasswd
[..]
deb http://ports.ubuntu.com/ubuntu-ports xenial main restricted universe multiverse
deb http://ports.ubuntu.com/ubuntu-ports xenial-security main restricted universe multiverse
deb http://ports.ubuntu.com/ubuntu-ports xenial-updates main restricted universe multiverse
deb http://ports.ubuntu.com/ubuntu-ports xenial-backports main restricted universe multiverse
# /usr/local/bin/apt-mirror
Downloading 152 index files using 20 threads...
Begin time: Fri Feb 17 14:36:03 2017
[20]... [19]... [18]... [17]... [16]... [15]... [14]... [13]... [12]... [11]... [10]... [9]... [8]... [7]... [6]... [5].

After having downloaded the packages, create a repository based on these downloaded deb files, make it accessible through http, and install Docker:
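
I'm not detailing the web server part; a sketch of the idea (the default apt-mirror base path is /var/spool/apt-mirror, and the ubuntuppc64le.chmod666.org:8080 vhost used later in this post is simply a web server pointing at that tree):

# apt-mirror stores a ready-to-serve pool/dists layout, just expose it over HTTP
# document root: /var/spool/apt-mirror/mirror/ports.ubuntu.com
# then, on the docker hosts, point sources.list at the local mirror, for example:
deb http://ubuntuppc64le.chmod666.org:8080/ubuntu-ports-main/ xenial main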

# cat /etc/os-release
NAME="Ubuntu"
VERSION="16.04 LTS (Xenial Xerus)"
# uname -a
Linux dockermachine1 4.4.0-21-generic #37-Ubuntu SMP Mon Apr 18 18:30:22 UTC 2016 ppc64le ppc64le ppc64le GNU/Linux
# apt install docker.io
Reading package lists... Done
Building dependency tree
Reading state information... Done
[..]
Setting up docker.io (1.10.3-0ubuntu6) ...
Adding group `docker' (GID 116) ...
# docker ps -a
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES

On Ubuntu, use aufs

I strongly recommend keeping aufs as the default storage driver for containers and images. I'm simply creating and mounting /var/lib/docker (which holds the aufs directory) on another disk with a lot of space available, and that's it:

# pvcreate /dev/mapper/mpathb
  Physical volume "/dev/mapper/mpathb" successfully created
# vgcreate dockervg /dev/mapper/mpathb
  Volume group "dockervg" successfully created
# lvcreate -n dockerlv -L99G dockervg
  Logical volume "dockerlv" created.
# mkfs.ext4 /dev/dockervg/dockerlv
[..]
# echo "/dev/mapper/dockervg-dockerlv /var/lib/docker/ ext4 errors=remount-ro 0       1" >> /etc/fstab
# systemctl stop docker
# mount /var/lib/docker
# systemctl start docker
# df -h | grep docker
/dev/mapper/dockervg-dockerlv   98G   61M   93G   1% /var/lib/docker
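
Once the docker engine has been restarted on top of this new mount, a quick docker info should confirm the storage driver in use (trimmed output, it should simply report aufs):

# docker info | grep -i "storage driver"
Storage Driver: aufs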

The docker-compose case

If you're installing Docker on an Ubuntu host everything is easy, as docker-compose is available in the official Ubuntu repository. Just run an apt install docker-compose and you're done.

# apt install docker-compose
[..]
# docker-compose -v
docker-compose version 1.5.2, build unknown

On RedHat, compose is not available in the repository delivered by IBM. docker-compose is just a python program and can be downloaded and installed via pip. Download compose on a machine with internet access, then use pip to install it:

On the machine having the access to the internet:

# mkdir compose
# pip install --proxy "http://benoit:mypasswd@myproxy:8080"  --download="compose" docker-compose --force --upgrade
[..]
Successfully downloaded docker-compose cached-property six backports.ssl-match-hostname PyYAML ipaddress enum34 colorama requests jsonschema docker texttable websocket-client docopt dockerpty functools32 docker-pycreds
# scp -r compose dockerhost:~
docker_compose-1.11.1-py2.py3-none-any.whl                                                                                                                                                                                                    100%   83KB  83.4KB/s   00:00
cached_property-1.3.0-py2.py3-none-any.whl                                                                                                                                                                                                    100% 8359     8.2KB/s   00:00
six-1.10.0-py2.py3-none-any.whl                                                                                                                                                                                                               100%   10KB  10.1KB/s   00:00
backports.ssl_match_hostname-3.5.0.1.tar.gz                                                                                                                                                                                                   100% 5605     5.5KB/s   00:00
PyYAML-3.12.tar.gz                                                                                                                                                                                                                            100%  247KB 247.1KB/s   00:00
ipaddress-1.0.18-py2-none-any.whl                                                                                                                                                                                                             100%   17KB  17.1KB/s   00:00
enum34-1.1.6-py2-none-any.whl                                                                                                                                                                                                                 100%   12KB  12.1KB/s   00:00
colorama-0.3.7-py2.py3-none-any.whl                                                                                                                                                                                                           100%   19KB  19.5KB/s   00:00
requests-2.11.1-py2.py3-none-any.whl                                                                                                                                                                                                          100%  503KB 502.8KB/s   00:00
jsonschema-2.6.0-py2.py3-none-any.whl                                                                                                                                                                                                         100%   39KB  38.6KB/s   00:00
docker-2.1.0-py2.py3-none-any.whl                                                                                                                                                                                                             100%  103KB 102.9KB/s   00:00
texttable-0.8.7.tar.gz                                                                                                                                                                                                                        100% 9829     9.6KB/s   00:00
websocket_client-0.40.0.tar.gz                                                                                                                                                                                                                100%  192KB 191.6KB/s   00:00
docopt-0.6.2.tar.gz                                                                                                                                                                                                                           100%   25KB  25.3KB/s   00:00
dockerpty-0.4.1.tar.gz                                                                                                                                                                                                                        100%   14KB  13.6KB/s   00:00
functools32-3.2.3-2.zip                                                                                                                                                                                                                       100%   33KB  33.3KB/s   00:00
docker_pycreds-0.2.1-py2.py3-none-any.whl                                                                                                                                                                                                     100% 4474     4.4KB/s   00:00

On the machine running docker:

# rpm -ivh python2-pip-8.1.2-5.el7.noarch.rpm
# cd compose
# pip install docker-compose -f ./ --no-index
[..]
Successfully installed colorama-0.3.7 docker-2.1.0 docker-compose-1.11.1 ipaddress-1.0.18 jsonschema-2.6.0
# docker-compose -v
docker-compose version 1.11.1, build 7c5d5e4

Creating your docker base images and running your first application (a web server)

Regardless of which Linux distribution you have chosen, you now need a docker base image to run your first containers. You have two choices: download an image from the internet and modify it to your own needs, or create an image yourself based on your current OS.

Downloading an image from the internet

From a machine having access to the internet, install the docker engine and download the Ubuntu image. Using the docker save command, create a tar-based image. This one can then be imported on any docker engine using the docker load command:

  • On the machine having access to the internet:
  • # docker pull ppc64le/ubuntu
    # docker save ppc64le/ubuntu > /tmp/ppc64le_ubuntu.tar
    
  • On your docker engine host:
  • # docker load  < ppc64le_ubuntu.tar
    4fad21ac6351: Loading layer [==================================================>] 173.5 MB/173.5 MB
    625e647dc584: Loading layer [==================================================>] 15.87 kB/15.87 kB
    8505832e8bea: Loading layer [==================================================>] 9.216 kB/9.216 kB
    9bca281924ab: Loading layer [==================================================>] 4.608 kB/4.608 kB
    289bda1cbd14: Loading layer [==================================================>] 3.072 kB/3.072 kB
    Loaded image: ppc64le/ubuntu:latest
    # docker images
    REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
    ppc64le/ubuntu      latest              1967d889e07f        3 months ago        167.9 MB
    

The problem is that this image is not customized for your/my own needs. By this I mean the repositories used by the image are "pointing" to the official Ubuntu repositories, which will obviously not work if you have no access to the internet. We now have to modify the image for our needs. Run a container and launch a shell, then modify the sources.list with your local repository. Then commit this image to validate the changes made inside it (you will generate a new image based on the current one plus your modifications):

# docker run -it ppc64le/ubuntu /bin/bash
# rm /etc/apt/sources.list
# echo "deb http://ubuntuppc64le.chmod666.org:8080/ubuntu-ports-main/ xenial main" >> /etc/apt/sources.list
# echo "deb http://ubuntuppc64le.chmod666.org:8080/ubuntu-ports-main/ xenial-updates main" >> /etc/apt/sources.list
# echo "deb http://ubuntuppc64le.chmod666.org:8080/ubuntu-ports-main/ xenial-security main" >> /etc/apt/sources.list
# echo "deb http://ubuntuppc64le.chmod666.org:8080/ubuntu-ports/ xenial restricted" >> /etc/apt/sources.list
# echo "deb http://ubuntuppc64le.chmod666.org:8080/ubuntu-ports/ xenial-updates restricted" >> /etc/apt/sources.list
# echo "deb http://ubuntuppc64le.chmod666.org:8080/ubuntu-ports/ xenial-security restricted" >> /etc/apt/sources.list
# echo "deb http://ubuntuppc64le.chmod666.org:8080/ubuntu-ports/ xenial universe" >> /etc/apt/sources.list
# echo "#deb http://ubuntuppc64le.chmod666.org:8080/ubuntu-ports/ xenial-updates universe" >> /etc/apt/sources.list
# echo "deb http://ubuntuppc64le.chmod666.org:8080/ubuntu-ports/ xenial-security universe" >> /etc/apt/sources.list
# echo "deb http://ubuntuppc64le.chmod666.org:8080/ubuntu-ports/ xenial multiverse" >> /etc/apt/sources.list
# echo "#deb http://ubuntuppc64le.chmod666.org:8080/ubuntu-ports/ xenial-updates multiverse" >> /etc/apt/sources.list
# echo "deb http://ubuntuppc64le.chmod666.org:8080/ubuntu-ports/ xenial-security multiverse" >> /etc/apt/sources.list
# exit
# docker ps -a
# docker commit
# docker commit a9506bd5dd30 ppc64le/ubuntucust
sha256:423c13b604dee8d24dae29566cd3a2252e4060270b71347f8d306380b8b6817d
# docker images

Test that the image is working by building a new image based on the one just created. I'm creating a dockerfile to do this. I'm not explaining here how dockerfiles work, there are plenty of tutorials on the internet to learn this. To sum up, you need to know the basics of Docker to read this blog post ;-) .

# cat dockerfile
FROM ppc64le/ubuntucust

RUN apt-get -y update && apt-get -y install apache2

ENV APACHE_RUN_USER www-data
ENV APACHE_RUN_GROUP www-data
ENV APACHE_LOG_DIR /var/log/apache2
ENV APACHE_PID_FILE /var/run/apache2.pid
ENV APACHE_RUN_DIR /var/run/apache2
ENV APACHE_LOCK_DIR /var/lock/apache2

RUN mkdir -p $APACHE_RUN_DIR $APACHE_LOCK_DIR $APACHE_LOG_DIR

EXPOSE 80

CMD [ "-D", "FOREGROUND" ]
ENTRYPOINT ["/usr/sbin/apache2"]

I’m building the image calling it ubuntu_apache2 (this image will run a single apache2 server and expose the port 80):

# docker build -t ubuntu_apache2 . 
Sending build context to Docker daemon 2.048 kB
Step 1 : FROM ppc64le/ubuntucust
 ---> 423c13b604de
Step 2 : RUN apt-get -y update && apt-get -y install apache2
 ---> Running in 5f868988bf5c
Get:1 http://ubuntuppc64le.chmod666.org:8080/ubuntu-ports-main xenial InRelease [247 kB]
Get:2 http://ubuntuppc64le.chmod666.org:8080/ubuntu-ports-main xenial-updates InRelease [102 kB]
Get:3 http://ubuntuppc64le.chmod666.org:8080/ubuntu-ports-main xenial-security InRelease [102 kB]
debconf: unable to initialize frontend: Dialog
debconf: (TERM is not set, so the dialog frontend is not usable.)
debconf: falling back to frontend: Readline
Processing triggers for libc-bin (2.23-0ubuntu4) ...
Processing triggers for systemd (229-4ubuntu11) ...
Processing triggers for sgml-base (1.26+nmu4ubuntu1) ...
 ---> 4256ac36c0f7
Removing intermediate container 5f868988bf5c
Step 3 : EXPOSE 80
 ---> Running in fc72a50d3f1d
 ---> 3c273b0e2c3f
Removing intermediate container fc72a50d3f1d
Step 4 : CMD -D FOREGROUND
 ---> Running in 112d87a2f1e6
 ---> e6ddda152e97
Removing intermediate container 112d87a2f1e6
Step 5 : ENTRYPOINT /usr/sbin/apache2
 ---> Running in 6dab9b99f945
 ---> bed93aae55b3
Removing intermediate container 6dab9b99f945
Successfully built bed93aae55b3
# docker images
REPOSITORY           TAG                 IMAGE ID            CREATED              SIZE
ubuntu_apache2       latest              bed93aae55b3        About a minute ago   301.8 MB
ppc64le/ubuntucust   latest              423c13b604de        7 minutes ago        167.9 MB
ppc64le/ubuntu       latest              1967d889e07f        3 months ago         167.9 MB

Run a container with this image and expose the port 80:

# docker run -d -it -p 80:80 ubuntu_apache2
49916e3703c1cf0a671be10984b3215478973c0fd085490a61142b8959495732
# docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                NAMES
49916e3703c1        ubuntu_apache2      "/usr/sbin/apache2 -D"   12 seconds ago      Up 10 seconds       0.0.0.0:80->80/tcp   high_brattain
# ps -ef | grep -i apache
root     11282 11267  0 11:04 pts/1    00:00:00 /usr/sbin/apache2 -D FOREGROUND
33       11302 11282  0 11:04 pts/1    00:00:00 /usr/sbin/apache2 -D FOREGROUND
33       11303 11282  0 11:04 pts/1    00:00:00 /usr/sbin/apache2 -D FOREGROUND
root     11382  3895  0 11:04 pts/0    00:00:00 grep --color=auto -i apache

From another host, check that the service is running by using curl (you can see here that you get the default index page of the Ubuntu apache2 server):

# curl mydockerhost
  <body>
    <div class="main_page">
      <div class="page_header floating_element">
        <img src="/icons/ubuntu-logo.png" alt="Ubuntu Logo" class="floating_element"/>
        <span class="floating_element">
          Apache2 Ubuntu Default Page
[..]

Creating your own image

You can also create your own image from scratch. For RHEL-based systems (CentOS, Fedora), RedHat provides an awesome script doing the job for you. This script is called mkimage-yum.sh and can be downloaded directly from github. Have a look at it if you want the exact details (mknod, yum --installroot, …). The script will create a tar file and import it. After running the script you will have a new image available to use:

# wget https://raw.githubusercontent.com/docker/docker/master/contrib/mkimage-yum.sh
# chmod +x mkimage-yum.sh 
# ./mkimage-yum.sh baserehel72
[..]
+ tar --numeric-owner -c -C /tmp/base.sh.bxma2T .
+ docker import - baserhel72:7.2
sha256:f8b80847b4c7fe03d2cfdeda0756a7aa857eb23ab68e5c954cf3f0cb01f61562
+ docker run -i -t --rm baserhel72:7.2 /bin/bash -c 'echo success'
success
+ rm -rf /tmp/base.sh.bxma2T
# docker images
REPOSITORY           TAG                 IMAGE ID            CREATED              SIZE
baserhel72           7.2                 f8b80847b4c7        About a minute ago   309.1 MB
[..]

I'm running a web server to be sure everything is working ok (same thing as on Ubuntu: httpd installation and exposing port 80). Here below are the dockerfile and the image build:

# cat dockerfile
FROM baserhel72:7.2

RUN yum -y update && yum -y upgrade && yum -y install httpd

EXPOSE 80

CMD [ "-D", "FOREGROUND" ]
ENTRYPOINT ["/usr/sbin/httpd"]
# docker build -t rhel_httpd .
Sending build context to Docker daemon 2.048 kB
Step 1 : FROM baserhel72:7.2
 ---> 0c22a33fc079
Step 2 : RUN yum -y update && yum -y upgrade && yum -y install httpd
 ---> Running in 74c79763c56f
[..]
Dependency Installed:
  apr.ppc64le 0:1.4.8-3.el7                apr-util.ppc64le 0:1.5.2-6.el7
  httpd-tools.ppc64le 0:2.4.6-40.el7       mailcap.noarch 0:2.1.41-2.el7

Complete!
 ---> 73094e173c1b
Removing intermediate container 74c79763c56f
Step 3 : EXPOSE 80
 ---> Running in 045b86d1a6dc
 ---> f032c1569201
Removing intermediate container 045b86d1a6dc
Step 4 : CMD -D FOREGROUND
 ---> Running in 9edc1cc2540d
 ---> 6d5d27171cba
Removing intermediate container 9edc1cc2540d
Step 5 : ENTRYPOINT /usr/sbin/httpd
 ---> Running in 8280382d61f0
 ---> f937439d4359
Removing intermediate container 8280382d61f0
Successfully built f937439d4359

Again I'm launching a container and checking that the service is available by curling the docker host. You can see that the image is based on RedHat … and the default page is the RHEL test page:

# docker run -d -it -p 80:80 rhel_httpd
# docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                NAMES
30d090b2f0d1        rhel_httpd          "/usr/sbin/httpd -D F"   3 seconds ago       Up 1 seconds        0.0.0.0:80->80/tcp   agitated_boyd
# curl localhost
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
        <head>
                <title>Test Page for the Apache HTTP Server on Red Hat Enterprise Linux</title>
[..]

Creating your own docker registry

We now have our base Docker images but we want to make them available on every docker host without having to recreate them over and over again. To do so we are going to create what we call a docker registry. This registry will allow us to distribute our images across different docker hosts. Neat :-) . When you install Docker, the docker-distribution package is also installed and ships a binary called "registry". Why not run the registry … in a Docker container?

  • Verify you have the registry command on the system:
  • # which registry
    /usr/bin/registry
    # registry --version
    registry github.com/docker/distribution v2.3.0+unknown
    
  • The package containing the registry is docker-distribution:
  • # yum whatproviders /usr/bin/registry
    Loaded plugins: product-id, search-disabled-repos, subscription-manager
    This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
    docker-distribution-2.3.0-2.ael7b.ppc64le : Docker toolset to pack, ship, store, and deliver content
    Repo        : @docker
    Matched from:
    Filename    : /usr/bin/registry
    

It's a "chicken or egg" question, but you will need a base image to create your registry image (it's obvious). As we have created our images locally we will now use one of these images (the RedHat one) to run the docker registry in a container. Here are the steps we are going to follow.

  • Create a dockerfile based on the RedHat image we just created before. This dockerfile will contain the registry binary (COPY ./registry), the registry config file (COPY ./config.yml) and a wrapper script allowing its execution (COPY ./entrypoint.sh). We will also secure the registry with a password using an htpasswd file (RUN htpasswd). Finally we will make the volumes /var/lib/registry and /certs available (VOLUME) and expose port 5000 (EXPOSE). Obviously the necessary directories will be created (RUN mkdir) and the needed tools will be installed (RUN yum). I'm also generating the htpasswd file here with the user regimguser and the password regimguser:
  • # cat dockerfile
    FROM ppc64le/rhel72:7.2
    
    RUN yum -y update && yum -y upgrade && yum -y install httpd-tools
    RUN mkdir /etc/registry && mkdir /certs
    
    COPY ./registry /usr/bin/registry
    COPY ./entrypoint.sh /entrypoint.sh
    COPY ./config.yml /etc/registry/config.yml
    
    RUN htpasswd -b -B -c /etc/registry/registry_passwd regimguser regimguser
    
    VOLUME ["/var/lib/registry", "/certs"]
    EXPOSE 5000
    
    ENTRYPOINT ["./entrypoint.sh"]
    
    CMD ["/etc/registry/config.yml"]
    
  • Copy the registry binary to the directory containing the dockerfile:
  • # cp /usr/bin/registry .
    
  • Create an entrypoint.sh file in the directory containing the dockerfile. This script will launch the registry binary:
  • # cat entrypoint.sh
    #!/bin/sh
    
    set -e
    exec /usr/bin/registry "$@"
    
  • Create a configuration file for the registry in the directory containing the dockerfile and name it config.yml. This configuration file describes where the registry data is stored, the certificates, and the authentication method (we are using an htpasswd file):
  • version: 0.1
    storage:
      filesystem:
        rootdirectory: /var/lib/registry
      delete:
        enabled: true
    http:
      addr: :5000
      tls:
          certificate: /certs/domain.crt
          key: /certs/domain.key
    auth:
      htpasswd:
        realm: basic-realm
        path: /etc/registry/registry_passwd
    
  • Build the image:
  • # docker build -t registry .
    Sending build context to Docker daemon 13.57 MB
    Step 1 : FROM ppc64le/rhel72:7.2
     ---> 9005cbc9c7f6
    Step 2 : RUN yum update && yum upgrade && yum -y install httpd-tools
     ---> Using cache
     ---> de34fdf3864e
    Step 3 : RUN mkdir /etc/registry && mkdir /certs
     ---> Using cache
     ---> c801568b6944
    Step 4 : COPY ./registry /usr/bin/registry
     ---> Using cache
     ---> 49927e0a90b8
    Step 5 : COPY ./entrypoint.sh /entrypoint.sh
     ---> Using cache
    [..]
    Removing intermediate container 261f2b380556
    Successfully built ccef43825f21
    # docker images
    REPOSITORY                                          TAG                 IMAGE ID            CREATED             SIZE
                                                                16d35e8c1177        About an hour ago   361 MB
    registry                                            latest              4287d4e389dc        2 hours ago         361 MB
    

We now need to generate certificates and place them in the right directories to make the registry secure:

  • Generate an ssl certificate:
  • # cd /certs
    # openssl req  -newkey rsa:4096 -nodes -sha256 -keyout /certs/domain.key  -x509 -days 365 -out /certs/domain.crt
    Generating a 4096 bit RSA private key
    .............................................................................................................................................................++
    ..........................................................++
    writing new private key to '/certs/domain.key'
    -----
    You are about to be asked to enter information that will be incorporated
    into your certificate request.
    [..]
    If you enter '.', the field will be left blank.
    -----
    Country Name (2 letter code) [XX]:
    State or Province Name (full name) []:
    Locality Name (eg, city) [Default City]:
    Organization Name (eg, company) [Default Company Ltd]:
    Organizational Unit Name (eg, section) []:
    Common Name (eg, your name or your server's hostname) []:dockerengineppc64le.chmod666.org
    Email Address []:
    
  • Copy the certificates on every docker engine host that will need to access the registry:
  • # mkdir /etc/docker/certs.d/dockerengineppc64le.chmod666.org\:5000/
    # cp /certs/domain.crt /etc/docker/certs.d/dockerengineppc64le.chmod666.org\:5000/ca.crt
    # cp /certs/domain.crt /etc/pki/ca-trust/source/anchors/dockerengineppc64le.chmod666.org.crt
    # update-ca-trust
    
  • Restart docker:
  • # systemctl restart docker
    

Now that everything is ok regarding the image and the certificates, let's run the registry container and then upload and download an image to/from the registry:

  • Run the container, expose port 5000 (-p 5000:5000), be sure the registry will be started when docker starts (--restart=always), let the container access the certificates we created before (-v /certs:/certs), and store the images in /var/lib/registry (-v /var/lib/registry:/var/lib/registry):
  • # docker run -d -p 5000:5000 --restart=always -v /certs:/certs -v /var/lib/registry:/var/lib/registry --name registry registry
    51ad253616be336bcf5a1508bf48b059f01ebf20a0772b35b5686b4012600c46
    # docker ps
    CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                    NAMES
    51ad253616be        registry            "./entrypoint.sh /etc"   10 seconds ago      Up 8 seconds        0.0.0.0:5000->5000/tcp   registry
    
  • Connect to the registry using docker login (the user and password created before will be asked). Then push and pull an image to be sure everything is working. The only way to list the images available in the registry is to make a call to the registry API and check the catalog (see also the tags example right after this list):
  • # docker login https://dockerengineppc64le.chmod666.org:5000
    Username (regimguser): regimguser
    Password:
    Login Succeeded
    # docker tag grafana dockerengineppc64le.chmod666.org:5000/ppc64le/grafana
    # docker push dockerengineppc64le.chmod666.org:5000/ppc64le/grafana
    The push refers to a repository [dockerengineppc64le.chmod666.org:5000/ppc64le/grafana]
    82bca1cb11d8: Pushed
    9c1f2163c216: Pushing [==>                                                ] 22.83 MB/508.9 MB
    1df85fc1eaaf: Mounted from ppc64le/ubuntucust
    289bda1cbd14: Mounted from ppc64le/ubuntucust
    9bca281924ab: Mounted from ppc64le/ubuntucust
    8505832e8bea: Mounted from ppc64le/ubuntucust
    625e647dc584: Mounted from ppc64le/ubuntucust
    4fad21ac6351: Mounted from ppc64le/ubuntucust
    [..]
    latest: digest: sha256:88eef1b47ec57dd255aa489c8a494c11be17eb35ea98f38a63ab9f5690c26c1f size: 1984
    # curl --cacert /certs/domain.crt -X GET https://regimguser:regimguser@dockerengineppc64le.chmod666.org:5000/v2/_catalog
    {"repositories":["ppc64le/grafana","ppc64le/ubuntucust"]}
    # docker pull dockerengineppc64le.chmod666.org:5000/ppc64le/grafana
    Using default tag: latest
    latest: Pulling from ppc64le/grafana
    Digest: sha256:88eef1b47ec57dd255aa489c8a494c11be17eb35ea98f38a63ab9f5690c26c1f
    Status: Image is up to date for dockerengineppc64le.chmod666.org:5000/ppc64le/grafana:latest
    
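
The registry v2 API can also list the tags available for a given repository, which becomes handy as soon as you push more than the latest tag (same self-signed certificate and htpasswd user as above; the output should look like this):

# curl --cacert /certs/domain.crt -X GET https://regimguser:regimguser@dockerengineppc64le.chmod666.org:5000/v2/ppc64le/grafana/tags/list
{"name":"ppc64le/grafana","tags":["latest"]}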

Running a more complex application (grafana + influxdb)

One of the applications I'm running is grafana, used with influxdb as a datasource. We will see here how to run grafana and influxdb in Docker containers on a ppc64le RedHat host:

Build the grafana docker image

First create the dockerfile. You have now seen a lot of dockerfiles in this blog post so I'll not explain this one in detail. The docker engine is running on RedHat but the image used here is an Ubuntu one, as Grafana and Influxdb are available in the Ubuntu repositories.

# cat /data/docker/grafana/dockerfile
FROM ppc64le/ubuntucust

RUN apt-get update && apt-get -y install grafana gosu

VOLUME ["/var/lib/grafana", "/var/log/grafana", "/etc/grafana"]

EXPOSE 3000

COPY ./run.sh /run.sh

ENTRYPOINT ["/run.sh"]

Here is the entrypoint script that will run grafana when the docker container starts:

# cat /data/docker/grafana/run.sh
#!/bin/bash -e

: "${GF_PATHS_DATA:=/var/lib/grafana}"
: "${GF_PATHS_LOGS:=/var/log/grafana}"
: "${GF_PATHS_PLUGINS:=/var/lib/grafana/plugins}"

chown -R grafana:grafana "$GF_PATHS_DATA" "$GF_PATHS_LOGS"
chown -R grafana:grafana /etc/grafana

if [ ! -z "${GF_INSTALL_PLUGINS}" ]; then
  OLDIFS=$IFS
  IFS=','
  for plugin in ${GF_INSTALL_PLUGINS}; do
    grafana-cli plugins install ${plugin}
  done
  IFS=$OLDIFS
fi

exec gosu grafana /usr/sbin/grafana-server \
  --homepath=/usr/share/grafana             \
  --config=/etc/grafana/grafana.ini         \
  cfg:default.paths.data="$GF_PATHS_DATA"   \
  cfg:default.paths.logs="$GF_PATHS_LOGS"   \
  cfg:default.paths.plugins="$GF_PATHS_PLUGINS"

Then build the grafana image:

# cd /data/docker/grafana
# docker build -t grafana .
Step 3 : VOLUME ['/var/lib/grafana', '/var/log/grafana", "/etc/grafana']
 ---> Running in 7baf11e2a2b6
 ---> f3449dd17ad4
Removing intermediate container 7baf11e2a2b6
Step 4 : EXPOSE 3000
 ---> Running in 89e10b7bfa5e
 ---> cdc65141d2f4
Removing intermediate container 89e10b7bfa5e
Step 5 : COPY ./run.sh /run.sh
 ---> 0a75c203bc8e
Removing intermediate container 885719ef1fde
Step 6 : ENTRYPOINT /run.sh
 ---> Running in 56f8b7d1274a
 ---> 4ca5c23b9aba
Removing intermediate container 56f8b7d1274a
Successfully built 4ca5c23b9aba
# docker images
REPOSITORY           TAG                 IMAGE ID            CREATED             SIZE
grafana              latest              4ca5c23b9aba        32 seconds ago      676.8 MB
ppc64le/ubuntucust   latest              c9274707505e        12 minutes ago      167.9 MB
ppc64le/ubuntu       latest              1967d889e07f        3 months ago        167.9 MB

Run it and verify it works ok:

# docker run -d -it -p 443:3000 grafana
19bdd6c82a37a7275edc12e91668530fc1d52699542dae1e17901cce59f1230a
# docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS                   NAMES
19bdd6c82a37        grafana             "/run.sh"           26 seconds ago      Up 24 seconds       0.0.0.0:443->3000/tcp   kickass_mcclintock
# docker logs 19bdd6c82a37
2017/02/17 15:28:36 [I] Starting Grafana
2017/02/17 15:28:36 [I] Version: master, Commit: NA, Build date: 1970-01-01 00:00:00 +0000 UTC
2017/02/17 15:28:36 [I] Configuration Info
Config files:
  [0]: /usr/share/grafana/conf/defaults.ini
  [1]: /etc/grafana/grafana.ini
Command lines overrides:
  [0]: default.paths.data=/var/lib/grafana
  [1]: default.paths.logs=/var/log/grafana
Paths:
  home: /usr/share/grafana
  data: /var/lib/grafana
[..]

[Image: grafana]

Build the influxdb docker image

Same job for the influxdb image, which is also based on the Ubuntu image. Here is the dockerfile (as always: package installation, volumes, port exposure); you can see that I'm also including a configuration file for influxdb:

# cat /data/docker/influxdb/dockerfile
FROM ppc64le/ubuntucust

RUN apt-get update && apt-get -y install influxdb

VOLUME ["/var/lib/influxdb"]

EXPOSE 8086 8083

COPY influxdb.conf /etc/influxdb.conf

COPY entrypoint.sh /entrypoint.sh
ENTRYPOINT ["/entrypoint.sh"]
CMD ["/usr/bin/influxd"]
# cat influxdb.conf
[meta]
  dir = "/var/lib/influxdb/meta"

[data]
  dir = "/var/lib/influxdb/data"
  engine = "tsm1"
  wal-dir = "/var/lib/influxdb/wal"

[admin]
  enabled = true
# cat entrypoint.sh
#!/bin/bash
set -e

if [ "${1:0:1}" = '-' ]; then
    set -- influxd "$@"
fi

exec "$@"

Then build the influxdb image:

# docker build -t influxdb .
[..]
Step 3 : VOLUME ['/var/lib/influxdb']
 ---> Running in f3570a5a6c91
 ---> 014035e3134c
Removing intermediate container f3570a5a6c91
Step 4 : EXPOSE 8086 8083
 ---> Running in 590405701bfc
 ---> 25f557aae499
Removing intermediate container 590405701bfc
Step 5 : COPY influxdb.conf /etc/influxdb.conf
 ---> c58397a5ae7b
Removing intermediate container d22132ec9925
Step 6 : COPY entrypoint.sh /entrypoint.sh
 ---> 25e931d39bbc
Removing intermediate container 680eacd6597e
Step 7 : ENTRYPOINT /entrypoint.sh
 ---> Running in 0695135e81c0
 ---> 44ed7385ae61
Removing intermediate container 0695135e81c0
Step 8 : CMD /usr/bin/influxd
 ---> Running in f59cbcd5f199
 ---> 073eeeb78055
Removing intermediate container f59cbcd5f199
Successfully built 073eeeb78055
# docker images
REPOSITORY           TAG                 IMAGE ID            CREATED             SIZE
influxdb             latest              073eeeb78055        28 seconds ago      202.7 MB
grafana              latest              4ca5c23b9aba        11 minutes ago      676.8 MB
ppc64le/ubuntucust   latest              c9274707505e        23 minutes ago      167.9 MB
ppc64le/ubuntu       latest              1967d889e07f        3 months ago        167.9 MB

Run an influxdb container to verify it works ok:

# docker run -d -it -p 8080:8083 influxdb
c0c042c7bc1a361d1bcff403ed243651eac88270738cfc390e35dfd434cfc457
# docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                    NAMES
c0c042c7bc1a        influxdb            "/entrypoint.sh /usr/"   4 seconds ago       Up 1 seconds        0.0.0.0:8080->8086/tcp   amazing_goldwasser
19bdd6c82a37        grafana             "/run.sh"                10 minutes ago      Up 10 minutes       0.0.0.0:443->3000/tcp    kickass_mcclintock
#  docker logs c0c042c7bc1a

 8888888           .d888 888                   8888888b.  888888b.
   888            d88P"  888                   888  "Y88b 888  "88b
   888            888    888                   888    888 888  .88P
   888   88888b.  888888 888 888  888 888  888 888    888 8888888K.
   888   888 "88b 888    888 888  888  Y8bd8P' 888    888 888  "Y88b
   888   888  888 888    888 888  888   X88K   888    888 888    888
   888   888  888 888    888 Y88b 888 .d8""8b. 888  .d88P 888   d88P
 8888888 888  888 888    888  "Y88888 888  888 8888888P"  8888888P"

2017/02/17 15:39:08 InfluxDB starting, version 0.10.0, branch unknown, commit unknown, built unknown
2017/02/17 15:39:08 Go version go1.6rc1, GOMAXPROCS set to 16

[Image: influx]

docker-compose

Now that we have two images, one for grafana and one for influxdb, let's make them work together. To do so we will use docker-compose. docker-compose lets you describe the containers you want to run in a yml file and link them together. You can see below that there are two different entries. The influxdb one tells which image I'm going to use, the container name, the ports that will be exposed on the docker host (equivalent of -p 8080:8083 with a docker run command) and the volumes (-v with docker run). For the grafana container everything is almost the same except the "links" part. The grafana container should be able to "talk" to the influxdb one (to use influxdb as a datasource). The "links" stanza of the yml file means an entry containing the influxdb ip and name will be added to the /etc/hosts file of the grafana container. When you configure grafana you will then be able to use the "influxdb" name to access the database (we will verify this below once the stack is up):

# cat docker-compose.yml
influxdb:
  image: influxdb:latest
  container_name: influxdb
  ports:
    - "8080:8083"
    - "80:8086"
  volumes:
    - "/data/docker/influxdb/var/lib/influxdb:/var/lib/influxdb"

grafana:
  image: grafana:latest
  container_name: grafana
  ports:
    - "443:3000"
  links:
    - influxdb
  volumes:
    - "/data/docker/grafana/var/lib/grafana:/var/lib/grafana"
    - "/data/docker/grafana/var/log/grafana:/var/log/grafana"

To create the containers just run "docker-compose up" (from the directory containing the yml file); this will create all the containers described in the yml file. Same thing for destroying them: run a "docker-compose down".

# docker-compose up -d
Creating influxdb
Creating grafana
# docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED              STATUS              PORTS                                                    NAMES
5df7f3d58631        grafana:latest      "/run.sh"                About a minute ago   Up About a minute   0.0.0.0:443->3000/tcp                                    grafana
727dfc6763e1        influxdb:latest     "/entrypoint.sh /usr/"   About a minute ago   Up About a minute   8083/tcp, 0.0.0.0:80->8086/tcp, 0.0.0.0:8080->8086/tcp   influxdb
# docker-compose down
Stopping grafana ... done
Stopping influxdb ... done
Removing grafana ... done
Removing influxdb ... done

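Before tearing the stack down you can also check what the "links" stanza really does: it just adds a host entry for influxdb inside the grafana container (a quick check, output omitted):

# docker exec grafana cat /etc/hosts
# docker exec grafana getent hosts influxdb
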
Just to prove that everything is working ok I'm logging into the influxdb container and pushing some data to the database using the NOAA_data.txt file provided by the influxdb guys (these are just test data).

# docker exec -it 15845e92152f /bin/bash
# apt-get install influxdb-client
# cd /var/lib/influxdb ; influx -import -path=NOAA_data.txt -precision=s
2017/02/17 17:00:35 Processed 1 commands
2017/02/17 17:00:35 Processed 76290 inserts
2017/02/17 17:00:35 Failed 0 inserts

I'm finally logging into grafana (from a browser) and configuring the access to the database. Right after doing this I can create graphs based on the data.
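
If you prefer the command line over the web interface, the datasource can also be created through the Grafana HTTP API. A sketch only: admin:admin is the default credential, the URL depends on how you published port 3000 (443 in the compose file above), and the database name depends on what you actually imported:

# curl -u admin:admin -H "Content-Type: application/json" -X POST http://mydockerhost:443/api/datasources -d '{"name":"influxdb","type":"influxdb","url":"http://influxdb:8086","access":"proxy","database":"NOAA_water_database"}'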

[Images: grafanaok1, grafanaok2]

Creating a swarm cluster

Be very careful when starting with swarm. There are 2 different types of "swarm": the swarm before docker 1.12 (called docker-swarm) and the swarm starting from docker 1.12 (called swarm mode). As the first version of swarm is already deprecated we will use the swarm mode embedded with docker 1.12. In this case there is no need to install additional software, the swarm mode is embedded in the docker binaries. The swarm mode is used with the "docker service" commands to create what we call services (multiple docker containers running across the swarm cluster with rules/constraints applied on them: create the containers on all the hosts, only on a couple of nodes, and so on). First initialize the swarm mode on the machines (I'll only use two nodes in my swarm cluster in the examples below) and, on all the worker nodes, be sure you are logged in to the registry (certificates copied, docker login done):

We will set up the swarm cluster on two nodes just to show you a simple example of the power of this technology. The first step is to choose a leader (there is one leader among the managers; the leader is responsible for the orchestration and the management of the swarm cluster, and if it has an issue one of the other managers will take the lead) and a worker (you can have as many workers as you want in the swarm cluster). In the example below the manager/leader will be called (node1(manager)#) and the worker will be called (node2(worker)#). Use the "docker swarm init" command to create your leader. The advertise address is the public address of the machine. The command output gives you the commands to launch on the other managers or workers to allow them to join the cluster. Be sure tcp port 2377 is reachable from all the nodes to the leader/managers. Last thing to add: swarm services rely on an overlay network, you need to create it to be able to create your swarm services:

node1(manager)# docker swarm init --advertise-addr 10.10.10.49
Swarm initialized: current node (813ompnl4c7f4ilkxqy0faj59) is now a manager.

To add a worker to this swarm, run the following command:
    docker swarm join \
    --token SWMTKN-1-69tw66gb9jwfl8y46ujeemj3p5v85ikrqvwmqzb2x32kqmek8e-a9dv25loilaor6jfmcdq8je6h \
    10.10.10.49:2377

To add a manager to this swarm, run the following command:
    docker swarm join \
    --token SWMTKN-1-69tw66gb9jwfl8y46ujeemj3p5v85ikrqvwmqzb2x32kqmek8e-9e82z5k7qrzxsk2autu9ajt3r \
    10.10.10.49:2377
node1(manager)# docker node ls
ID                           HOSTNAME                   STATUS  AVAILABILITY  MANAGER STATUS
813ompnl4c7f4ilkxqy0faj59 *  swarm1.chmod666.org  Ready   Active        Leader
node1(manager)# docker network create -d overlay mynet
8mv5ydu9vokx
node1(manager)# docker network ls
8mv5ydu9vokx        mynet               overlay             swarm

On the worker node run the command to join the cluster and verify all the nodes are Ready and Active. This will mean that you are ready to use the swarm cluster:

node2(worker)# docker swarm join --token SWMTKN-1-69tw66gb9jwfl8y46ujeemj3p5v85ikrqvwmqzb2x32kqmek8e-a9dv25loilaor6jfmcdq8je6h 10.10.10.49:2377
This node joined a swarm as a worker.
node1(manager)# docker node ls
ID                           HOSTNAME                   STATUS  AVAILABILITY  MANAGER STATUS
813ompnl4c7f4ilkxqy0faj59 *  swarm1.chmod666.org        Ready   Active        Leader
bh7mhv3hg1x98b9j6lu00c3ef    swarm2.chmod666.org        Ready   Active

The cluster is up and ready. Before working with it we need to find a solution to share the data of our application among the cluster. The best solution (from my point of view) is to use gluster, but for the convenience of this blog post I’ll just create a small nfs server on the leader node and mount the data on the worker node (for a production server the nfs server should be externalized (mounted from a NAS server)):
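
For the record the quick and dirty NFS part looks more or less like this (a sketch, definitely not a production export policy):

node1(manager)# cat /etc/exports
/nfs *(rw,sync,no_root_squash)
node1(manager)# systemctl enable nfs-server ; systemctl start nfs-server
node1(manager)# exportfs -ra
node2(worker)# mount -t nfs swarm1.chmod666.org:/nfs /nfs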

node1(manager)# exportfs
/nfs            
node2(worker)# mount | grep nfs
[..]
swarm1.chmod666.org:/nfs on /nfs type nfs4 (rw,relatime,vers=4.0,rsize=524288,wsize=524288,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.10.10.48,local_lock=none,addr=10.10.10.49)

Running an application in a swarm cluster

We now have the swarm cluster ready to run some services, but we first need a service. I'll use a web application I created myself called whoami (inspired by the emilevauge/whoami application), just displaying the hostname and the ip address of the node running the service. I'm first creating a dockerfile allowing me to build a container image ready to run any cgi ksh script. The dockerfile copies a configuration file into /etc/httpd/conf.d and serves the files in /var/www/mysite on a /whoami/ alias:

# cd /data/dockerfile/httpd
# cat dockerfile
FROM swarm1.chmod666.org:5000/ppc64le/rhel72:latest

RUN yum -y install httpd
RUN mkdir /var/www/mysite && chown apache:apache /var/www/mysite

EXPOSE 80

COPY ./mysite.conf /etc/httpd/conf.d
VOLUME ["/var/www/html", "/var/www/mysite"]

CMD [ "-D", "FOREGROUND" ]
ENTRYPOINT ["/usr/sbin/httpd"]
# cat mysite.conf
Alias /whoami/ "/var/www/mysite/"
<Directory "/var/www/mysite">
  AddHandler cgi-script .ksh
  DirectoryIndex whoami.ksh
  Options Indexes FollowSymLinks ExecCGI
  AllowOverride None
  Require all granted
</Directory>

I’m then building the image and pushing it into my private registry. The image is now available for download on any node of the swarm cluster:

# docker build -t httpd .
Sending build context to Docker daemon 3.072 kB
Step 1 : FROM dockerengineppc64le.chmod666.org:5000/ppc64le/rhel72:latest
 ---> 9005cbc9c7f6
Step 2 : RUN yum -y install httpd
 ---> Using cache
 ---> 1bc91df747cd
[..]
 ---> Using cache
 ---> afb3cf77eb8a
Step 8 : ENTRYPOINT /usr/sbin/httpd
 ---> Using cache
 ---> 187da163e084
Successfully built 187da163e084
# docker tag httpd swarm1.chmod666.org:5000/ppc64le/httpd
# docker push swarm1.chmod666.org:5000/ppc64le/httpd
The push refers to a repository [swarm1.chmod666.org:5000/ppc64le/httpd]
92d958e708cc: Layer already exists
[..]
latest: digest: sha256:3b1521432c9704ca74707cd2f3c77fb342a957c919787efe9920f62a26b69e26 size: 1156

Now that the image is ready we can create the application itself: it's just a single ksh script and a css file.

# ls /nfs/docker/whoami/
table-responsive.css  whoami.ksh
# cat whoami.ksh
#!/usr/bin/bash

hostname=$(hostname)
uname=$(uname -a)
ip=$(hostname -I)
date=$(date)
env=$(env)
echo ""
echo "<html>"
echo "<head>"
echo "  <title>Docker exemple</title>"
echo "  <link href="table-responsive.css" media="screen" type="text/css" rel="stylesheet" />"
echo "</head>"
echo "<body>"
echo "<h1><span class="blue"><<span>Docker<span class="blue"><span> <span class="yellow">on PowerSystems ppc64le</pan></h1>"
echo "<h2>Created with passion by <a href="http://chmod666.org" target="_blank">chmod666.org</a></h2>"
echo "<table class="container">"
echo "  <thead>"
echo "    <tr>"
echo "      <th><h1>type</h1></th>"
echo "      <th><h1>value</h1></th>"
echo "    </tr>"
echo "  </thead>"
echo "  <tbody>"
echo "    <tr>"
echo "      <td>hostname</td>"
echo "      <td>${hostname}</td>"
echo "    </tr>"
echo "    <tr>"
echo "      <td>uname</td>"
echo "      <td>${uname}</td>"
echo "    </tr>"
echo "    <tr>"
echo "      <td>ip</td>"
echo "      <td>${ip}</td>"
echo "    </tr>"
echo "    <tr>"
echo "      <td>date</td>"
echo "      <td>${date}</td>"
echo "    </tr>"
echo "    <tr>"
echo "      <td>httpd env</td>"
echo "      <td>SERVER_SOFTWARE:${SERVER_SOFTWARE},SERVER_NAME:${SERVER_NAME},SERVER_PROTOCOL:${SERVER_PROTOCOL}</td>"
echo "    </tr>"
echo "  </tbody>"
echo "</table>"
echo "  </tbody>"
echo "</table>"
echo "</body>"
echo "</html>"

Just to be sure the web application is working, run the image on the worker node (without swarm):

# docker run -d -p 80:80 -v /nfs/docker/whoami/:/var/www/mysite --name httpd swarm1.chmod666.org:5000/ppc64le/httpd
a75095b23bc31715ac95d9bb57a7a161b06ef3e6a0f4eb4ed708cf60d03c0e5d
# curl localhost/whoami/
[..]
    
      hostname
      a75095b23bc3
    
    
      uname
      Linux a75095b23bc3 3.10.0-327.el7.ppc64le #1 SMP Thu Oct 29 17:31:13 EDT 2015 ppc64le ppc64le ppc64le GNU/Linux
    
    
      ip
      172.17.0.2 
    
    
      date
      Wed Feb 22 14:33:59 UTC 2017
#  docker rm a75095b23bc3 -f
a75095b23bc3

We are now ready to create a swarm service with our application. Verify the swarm cluster health and create a service in global mode. The global mode means swarm will create one docker container per node.

node1(manager)# docker node ls
ID                           HOSTNAME                   STATUS  AVAILABILITY  MANAGER STATUS
813ompnl4c7f4ilkxqy0faj59 *  swarm1.chmod666.org        Ready   Active        Leader
bh7mhv3hg1x98b9j6lu00c3ef    swarm2.chmod666.org        Ready   Active
node1(manager)# docker service create --name whoami --mount type=bind,source=/nfs/docker/whoami/,destination=/var/www/mysite --mode global --publish 80:80 --network mynet  swarm1.chmod666.org:5000/ppc64le/httpd
7l8c4stcl3zgiijf6oe2hvu1r
node1(manager) # docker service ls
ID            NAME    REPLICAS  IMAGE                                         COMMAND
7l8c4stcl3zg  whoami  global    swarm1.chmod666.org:5000/ppc64le/httpd

Verify there is one container available on each swarm node:

node1(manager)# docker service ps 7l8c4stcl3zg
ID                         NAME        IMAGE                                         NODE                       DESIRED STATE  CURRENT STATE          ERROR
2sa543un5v4hpvwgouyorhndm  whoami      swarm1.chmod666.org:5000/ppc64le/httpd        swarm1.chmod666.org        Running        Running 2 minutes ago
5061eogr8wimt9al6uss1wet2   \_ whoami  swarm2.chmod666.org:5000/ppc64le/httpd        swarm2.chmod666.org        Running        Running 2 minutes ago

I'm now accessing the web service through both DNS names (swarm1 and swarm2) and verifying that I reach a different container with each HTTP request:

  • When accessing swarm1.chmod666.org I'm seeing a docker hostname and ip:
  • node1

  • When accessing swarm2.chmod666.org I'm seeing a docker hostname and ip different from the first one:
  • node2

You will now say: "OK, that's great! But that's not redundant." In fact it is, because swarm comes with a very cool feature called the routing mesh. When you create a service in the swarm cluster with the --publish option, every swarm node listens on this port, even the nodes on which no container of the service is running. If you access any node on this port you will reach one of the containers; by this I mean that by accessing swarm1.chmod666.org you may reach a container running on swarm2.chmod666.org, and each new HTTP request can land on any of the containers running for this service. Let's try creating a service with 10 replicas (after removing the global one) and access the same node over and over again.

node1(manager)# docker service create --name whoami --mount type=bind,source=/nfs/docker/whoami/,destination=/var/www/mysite --replicas 10 --publish 80:80 --network mynet  swarm1.chmod666.org:5000/ppc64le/httpd
el7nyiuga1vxtfgzktpfahucw
node1(manager)# docker service ls
ID            NAME    REPLICAS  IMAGE                                         COMMAND
el7nyiuga1vx  whoami  10/10     swarm1.chmod666.org:5000/ppc64le/httpd
node2(worker)# docker service ps el7nyiuga1vx
ID                         NAME          IMAGE                                         NODE                       DESIRED STATE  CURRENT STATE                ERROR
bed84pmdjy6c0758g3r52mmsq  whoami.1      swarm1.chmod666.org:5000/ppc64le/httpd        swarm2.chmod666.org        Running        Running 46 seconds ago
dgdj4ygqdr476e156osk8dd95  whoami.2      swarm1.chmod666.org:5000/ppc64le/httpd        swarm2.chmod666.org        Running        Running 46 seconds ago  
ba2ni51fo96eo6c4qfir90t7q  whoami.3      swarm1.chmod666.org:5000/ppc64le/httpd        swarm2.chmod666.org        Running        Running 48 seconds ago
9qkwigxkrqje48do39ru3cv2h  whoami.4      swarm1.chmod666.org:5000/ppc64le/httpd        swarm2.chmod666.org        Running        Running 40 seconds ago
3hgwwdly23ovafv1g0jvegu16  whoami.5      swarm1.chmod666.org:5000/ppc64le/httpd        swarm2.chmod666.org        Running        Running 43 seconds ago
0f3y844yqfbll2lmb954ro3cy  whoami.6      swarm1.chmod666.org:5000/ppc64le/httpd        swarm1.chmod666.org        Running        Running 51 seconds ago
0955dz84rv4gpb4oqv8libahd  whoami.7      swarm1.chmod666.org:5000/ppc64le/httpd        swarm1.chmod666.org        Running        Running 42 seconds ago
c05hrs9h0mm6ghxxdxc1afco9  whoami.8      swarm1.chmod666.org:5000/ppc64le/httpd        swarm1.chmod666.org        Running        Running 50 seconds ago
03qcbiuxlk13p60we0ke6vqka  whoami.9      swarm1.chmod666.org:5000/ppc64le/httpd        swarm1.chmod666.org        Running        Running 54 seconds ago
0otgw4ncka81hlxgyt82z36zj  whoami.10     swarm1.chmod666.org:5000/ppc64le/httpd        swarm1.chmod666.org        Running        Running 48 seconds ago
node1(manager)# docker ps
CONTAINER ID        IMAGE                                                 COMMAND                  CREATED             STATUS              PORTS                    NAMES
a25404371765        swarm1.chmod666.org:5000/ppc64le/httpd:latest   "/usr/sbin/httpd -D F"   5 minutes ago       Up 4 minutes        80/tcp                   whoami.7.0955dz84rv4gpb4oqv8libahd
07c38a306a68        swarm1.chmod666.org:5000/ppc64le/httpd:latest   "/usr/sbin/httpd -D F"   5 minutes ago       Up 4 minutes        80/tcp                   whoami.4.9qkwigxkrqje48do39ru3cv2h
e88a8c8a3639        swarm1.chmod666.org:5000/ppc64le/httpd:latest   "/usr/sbin/httpd -D F"   5 minutes ago       Up 5 minutes        80/tcp                   whoami.8.c05hrs9h0mm6ghxxdxc1afco9
f73a84cc6622        swarm1.chmod666.org:5000/ppc64le/httpd:latest   "/usr/sbin/httpd -D F"   5 minutes ago       Up 5 minutes        80/tcp                   whoami.1.bed84pmdjy6c0758g3r52mmsq
757be5ec73a4        swarm1.chmod666.org:5000/ppc64le/httpd:latest   "/usr/sbin/httpd -D F"   5 minutes ago       Up 5 minutes        80/tcp                   whoami.3.ba2ni51fo96eo6c4qfir90t7q
51ad253616be        registry                                              "./entrypoint.sh /etc"   45 hours ago        Up 2 hours          0.0.0.0:5000->5000/tcp   registry
node2(worker)# docker ps
CONTAINER ID        IMAGE                                                 COMMAND                  CREATED             STATUS              PORTS               NAMES
f015b0da7f2e        swarm1.chmod666.org:5000/ppc64le/httpd:latest   "/usr/sbin/httpd -D F"   6 minutes ago       Up 5 minutes        80/tcp              whoami.5.3hgwwdly23ovafv1g0jvegu16
4b7452245406        swarm1.chmod666.org:5000/ppc64le/httpd:latest   "/usr/sbin/httpd -D F"   6 minutes ago       Up 5 minutes        80/tcp              whoami.10.0otgw4ncka81hlxgyt82z36zj
71722a2d7f38        swarm1.chmod666.org:5000/ppc64le/httpd:latest   "/usr/sbin/httpd -D F"   6 minutes ago       Up 5 minutes        80/tcp              whoami.6.0f3y844yqfbll2lmb954ro3cy
01bc73d6fdf7        swarm1.chmod666.org:5000/ppc64le/httpd:latest   "/usr/sbin/httpd -D F"   6 minutes ago       Up 5 minutes        80/tcp              whoami.9.03qcbiuxlk13p60we0ke6vqka
438c0d553550        swarm1.chmod666.org:5000/ppc64le/httpd:latest   "/usr/sbin/httpd -D F"   6 minutes ago       Up 5 minutes        80/tcp              whoami.2.dgdj4ygqdr476e156osk8dd95

Let’s now try accessing the service. I’m modifying my whoami.ksh just to print the information I need (the hostname).

cat /nfs/docker/whoami/whoami.ksh
#!/usr/bin/bash

hostname=$(hostname)
uname=$(uname -a)
ip=$(hostname -I)
date=$(date)
env=$(env)
echo ""
echo "hostname: ${hostname}"
echo "ip: ${ip}"
echo "uname:${uname}"
# for i in $(seq 1 10) ; do echo "[CALL $1]" ; curl -s http://swarm1.chmod666.org/whoami/ ; done
[CALL ]
hostname: f015b0da7f2e
ip: 10.255.0.14 10.255.0.2 172.18.0.7 10.0.0.12 10.0.0.2
uname:Linux f015b0da7f2e 3.10.0-327.el7.ppc64le #1 SMP Thu Oct 29 17:31:13 EDT 2015 ppc64le ppc64le ppc64le GNU/Linux
[CALL ]
hostname: 4b7452245406
ip: 10.255.0.11 10.255.0.2 172.18.0.6 10.0.0.9 10.0.0.2
uname:Linux 4b7452245406 3.10.0-327.el7.ppc64le #1 SMP Thu Oct 29 17:31:13 EDT 2015 ppc64le ppc64le ppc64le GNU/Linux
[CALL ]
hostname: 438c0d553550
ip: 10.0.0.5 10.0.0.2 172.18.0.4 10.255.0.7 10.255.0.2
uname:Linux 438c0d553550 3.10.0-327.el7.ppc64le #1 SMP Thu Oct 29 17:31:13 EDT 2015 ppc64le ppc64le ppc64le GNU/Linux
[CALL ]
hostname: 71722a2d7f38
ip: 10.255.0.10 10.255.0.2 172.18.0.5 10.0.0.8 10.0.0.2
uname:Linux 71722a2d7f38 3.10.0-327.el7.ppc64le #1 SMP Thu Oct 29 17:31:13 EDT 2015 ppc64le ppc64le ppc64le GNU/Linux
[CALL ]
hostname: 01bc73d6fdf7
ip: 10.255.0.6 10.255.0.2 172.18.0.3 10.0.0.4 10.0.0.2
uname:Linux 01bc73d6fdf7 3.10.0-327.el7.ppc64le #1 SMP Thu Oct 29 17:31:13 EDT 2015 ppc64le ppc64le ppc64le GNU/Linux
[CALL ]
hostname: a25404371765
ip: 10.255.0.9 10.255.0.2 172.18.0.7 10.0.0.7 10.0.0.2
uname:Linux a25404371765 3.10.0-327.el7.ppc64le #1 SMP Thu Oct 29 17:31:13 EDT 2015 ppc64le ppc64le ppc64le GNU/Linux
[CALL ]
hostname: 07c38a306a68
ip: 10.255.0.8 10.255.0.2 172.18.0.6 10.0.0.6 10.0.0.2
uname:Linux 07c38a306a68 3.10.0-327.el7.ppc64le #1 SMP Thu Oct 29 17:31:13 EDT 2015 ppc64le ppc64le ppc64le GNU/Linux
[CALL ]
hostname: e88a8c8a3639
ip: 10.255.0.4 10.255.0.2 172.18.0.5 10.0.0.3 10.0.0.2
uname:Linux e88a8c8a3639 3.10.0-327.el7.ppc64le #1 SMP Thu Oct 29 17:31:13 EDT 2015 ppc64le ppc64le ppc64le GNU/Linux
[CALL ]
hostname: f73a84cc6622
ip: 10.255.0.12 10.255.0.2 172.18.0.4 10.0.0.10 10.0.0.2
uname:Linux f73a84cc6622 3.10.0-327.el7.ppc64le #1 SMP Thu Oct 29 17:31:13 EDT 2015 ppc64le ppc64le ppc64le GNU/Linux
[CALL ]
hostname: 757be5ec73a4
ip: 10.255.0.13 10.255.0.2 172.18.0.3 10.0.0.11 10.0.0.2
uname:Linux 757be5ec73a4 3.10.0-327.el7.ppc64le #1 SMP Thu Oct 29 17:31:13 EDT 2015 ppc64le ppc64le ppc64le GNU/Linux

I'm doing ten calls here and I'm reaching a different docker container on each call, which I can see by checking the hostname. This shows that the routing mesh is working correctly.
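If you want to see how the routing mesh is wired for the service, you can also inspect the published port and the virtual IPs attached to the ingress and overlay networks (a quick check on the 1.12/1.13-era engines used here, where this information lives under the .Endpoint field of the service inspect output):

node1(manager)# docker service inspect whoami --format '{{json .Endpoint.Ports}}'
node1(manager)# docker service inspect whoami --format '{{json .Endpoint.VirtualIPs}}'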

HAproxy

To access the service through a single IP I'm installing an haproxy server on another host (an Ubuntu ppc64le host). I'm then adding my two swarm nodes to the configuration file. Haproxy will check the availability of the web application and round-robin the requests between the two docker hosts. If one of the docker swarm nodes fails, all requests will be sent to the remaining alive node.

# apt-get install haproxy
apt-get install haproxy
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
  liblua5.3-0
Suggested packages:
[..]
Setting up haproxy (1.6.3-1ubuntu0.1) ...
Processing triggers for libc-bin (2.23-0ubuntu5) ...
Processing triggers for systemd (229-4ubuntu13) ...
Processing triggers for ureadahead (0.100.0-19) ...
# cat /etc/haproxy.conf
frontend http_front
   bind *:80
   stats uri /haproxy?stats
   default_backend http_back

backend http_back
   balance roundrobin
   server swarm1.chmod666.org 10.10.10.48:80 check
   server swarm2.chmod666.org 10.10.10.49:80 check
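Note that this snippet only shows the frontend/backend part I added; it relies on the global and defaults sections already shipped in the stock Ubuntu haproxy configuration (mode http is also required there for the stats uri directive to work). If you start from an empty file you need at least a defaults section similar to this one (a minimal sketch, the timeout values are arbitrary), and you can optionally add "option httpchk GET /whoami/" to the backend to turn the plain TCP check into an HTTP check:

defaults
   mode http
   timeout connect 5s
   timeout client  30s
   timeout server  30s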

I'm again changing the whoami.ksh script just to print the hostname. Then from another host I'm running 10000 HTTP requests against the public IP of my haproxy server and counting how many requests each container served. This shows two things: the haproxy service is correctly spreading the requests across the swarm nodes (I'm reaching ten different containers), and the swarm routing mesh is working fine (the requests are almost equally spread among all the running containers). You can see the session spread in the haproxy stats page and in the curl example:

# cat /nfs/docker/whoami/whoami.ksh
#!/usr/bin/bash

hostname=$(hostname)
uname=$(uname -a)
ip=$(hostname -I)
date=$(date)
env=$(env)
echo ""
echo "${hostname}"
# for i in $(seq 1 10000) ; do curl -s http://10.10.10.50/whoami/ ; done  | sort | uniq -c
    999 01bc73d6fdf7
   1003 07c38a306a68
    993 438c0d553550
    998 4b7452245406
   1006 71722a2d7f38
    996 757be5ec73a4
   1004 a25404371765
   1004 e88a8c8a3639
    995 f015b0da7f2e
   1002 f73a84cc6622

haproxy1

I'm finally shutting down one of the worker nodes. We can see two things here. The service was created with 10 replicas, and shutting down one node results in the creation of 5 more containers on the remaining node. By checking the haproxy stats page we also see that one node is detected as down and all requests are sent to the remaining one. We now have our highly available docker service (to be totally redundant we would also need to run haproxy on two different hosts with a "floating" IP, which I'll not explain here):

# docker ps
CONTAINER ID        IMAGE                                                 COMMAND                  CREATED              STATUS              PORTS                    NAMES
82fe21465b96        swarm1.chmod666.org:5000/ppc64le/httpd:latest   "/usr/sbin/httpd -D F"   About a minute ago   Up 29 seconds       80/tcp                   whoami.5.2d0t99pjide4w7nenzrribjph
71a4c51460ef        swarm1.chmod666.org:5000/ppc64le/httpd:latest   "/usr/sbin/httpd -D F"   About a minute ago   Up 21 seconds       80/tcp                   whoami.9.5f9qkx6t47vvjt8b9k5jhj79h
5830f0696cca        swarm1.chmod666.org:5000/ppc64le/httpd:latest   "/usr/sbin/httpd -D F"   About a minute ago   Up 32 seconds       80/tcp                   whoami.6.eso8uwhx6ij2we2iabmzx3tdu
dbc2b731c547        swarm1.chmod666.org:5000/ppc64le/httpd:latest   "/usr/sbin/httpd -D F"   About a minute ago   Up 16 seconds       80/tcp                   whoami.2.8tc8zoxrpdell4f4d8zsr0rlw
050aacdf8126        swarm1.chmod666.org:5000/ppc64le/httpd:latest   "/usr/sbin/httpd -D F"   About a minute ago   Up 23 seconds       80/tcp                   whoami.10.ej8ahxzzp8bw3pybc6fib17qh
a25404371765        swarm1.chmod666.org:5000/ppc64le/httpd:latest   "/usr/sbin/httpd -D F"   2 hours ago          Up 2 hours          80/tcp                   whoami.7.0955dz84rv4gpb4oqv8libahd
07c38a306a68        swarm1.chmod666.org:5000/ppc64le/httpd:latest   "/usr/sbin/httpd -D F"   2 hours ago          Up 2 hours          80/tcp                   whoami.4.9qkwigxkrqje48do39ru3cv2h
e88a8c8a3639        swarm1.chmod666.org:5000/ppc64le/httpd:latest   "/usr/sbin/httpd -D F"   2 hours ago          Up 2 hours          80/tcp                   whoami.8.c05hrs9h0mm6ghxxdxc1afco9
f73a84cc6622        swarm1.chmod666.org:5000/ppc64le/httpd:latest   "/usr/sbin/httpd -D F"   2 hours ago          Up 2 hours          80/tcp                   whoami.1.bed84pmdjy6c0758g3r52mmsq
757be5ec73a4        swarm1.chmod666.org:5000/ppc64le/httpd:latest   "/usr/sbin/httpd -D F"   2 hours ago          Up 2 hours          80/tcp                   whoami.3.ba2ni51fo96eo6c4qfir90t7q
51ad253616be        registry                                              "./entrypoint.sh /etc"   2 days ago           Up 4 hours          0.0.0.0:5000->5000/tcp   registry
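As a side note, if you want to test this rescheduling without brutally halting a node, draining it from the manager produces the same effect and can be reverted afterwards (standard swarm commands):

node1(manager)# docker node update --availability drain swarm2.chmod666.org
node1(manager)# docker service ps whoami
node1(manager)# docker node update --availability active swarm2.chmod666.org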

haproxy2

Conclusion

docker_ascii

What we have reviewed in this blog post is pretty neat. The PowerSystems ecosystem is capable of doing the exact same things as the x86 one, and everything here proves it. PowerSystems are definitely ready to run Linux. The mighty RedHat and the incredible Ubuntu both provide a viable way to enter the world of DevOps on PowerSystems. We no longer need to recompile everything or hunt for this or that package not available on Linux. The Ubuntu repository is huge; I was super impressed by the variety of packages available and running on Power. A few days ago RedHat finally joined the OpenPower foundation and I can assure you this is big news. Maybe people still don't believe in the spread of PowerSystems, but things are slowly changing and with the first OpenPower servers running on Power9 I can assure you (at least I want to believe) that things will change. Regarding Docker, I was/am a big x86 user of the solution, I'm running this blog and all my "personal" services on Docker, and I have to recognize that the ppc64le Linux distributions provide the exact same value as x86. Hire me if you want to do such things (DevOps on Power). They probably won't want to do anything about Linux on Power in my company (I still have faith, as we have purchased 120 pairs of sockets of RedHat ppc64le ;-) ;-) ).

Last words: sorry for not publishing more blog posts these days, but I'm not living the best part of my life, at work (nobody cares about what I'm doing, I'm just nothing …) or personally (various health problems for me and the people I love). Please accept my apologies.

Unleash the true potential of SRIOV vNIC using vNIC failover !

I'm always working on a tight schedule and I never have the time to write documentation because we're moving fast, very fast … but not as fast as I want to ;-). A few months ago we were asked to put the TSM servers in our PowerVC environment. I thought it was a very, very bad idea to put a pet among the cattle, as TSM servers are very specific and super I/O intensive in our environment (and are configured with plenty of rmt devices; this means we tried to put lan-free stuff into Openstack, which is not designed at all for this kind of thing). In my previous place we tried to put the TSM servers behind a virtualized environment (meaning serving the network through Shared Ethernet Adapters) and it was an EPIC FAIL. A few weeks after putting the servers in production we decided to move back to physical I/O and to use dedicated network adapters. As we didn't want to make the same mistake in my current place we decided not to go with Shared Ethernet Adapters. Instead we took the decision to use SRIOV vNICs. SRIOV vNICs have the advantage of being fully virtualized (LPM aware and super flexible), giving us the flexibility we wanted (moving TSM servers between sites if we need to put a host in maintenance mode or if we are facing any kind of outage). In my previous blog post about vNICs I was very happy with the performance but not with the reliability. I didn't want to go with NIB adapters for network redundancy, because it is an anti-virtualization way of doing things (we do not want to manage anything inside the VM, we want to let the virtualization layer do the job for us). Luckily for me the project was rescheduled to the end of the year and we finally decided not to put the TSM servers into our big Openstack, dedicating some hosts to the backup stuff instead. The latest versions of PowerVM, HMC and firmware arrived just in time to let me use the new SRIOV vNIC failover feature for this new TSM environment (fortunately for me we had some data center issues that let me wait long enough to avoid NIB and start production directly with SRIOV vNIC \o/). I delivered the first four servers to my backup team yesterday and I must admit that SRIOV vNIC failover is a killer feature for this kind of thing. Let's now see how to set this up!

Prerequisites

As always, using the latest features means you need to have everything up to date. In this case the minimal requirements for SRIOV vNIC failover are Virtual I/O Servers 2.2.5.10, Hardware Management Console V8R8.6.0 with the latest patches, and up-to-date firmware (ie. fw 860). Note that not all AIX versions are ok with SRIOV vNIC; here I'm only using AIX 7.2 TL1 SP1:

  • Check that the Virtual I/O Servers are installed at level 2.2.5.10:
  • # ioslevel
    2.2.5.10
    
  • Check the HMC is in the latest version (V8R860)
  • hscroot@myhmc:~> lshmc -V
    "version= Version: 8
     Release: 8.6.0
     Service Pack: 0
    HMC Build level 20161101.1
    MH01655: Required fix for HMC V8R8.6.0 (11-01-2016)
    ","base_version=V8R8.6.0
    "
    

    860

  • Check the firmware version is ok on the PowerSystem:
  • # updlic -o u -t sys -l latest -m reptilian-9119-MME-659707C -r mountpoint -d /home/hscroot/860_056/ -v
    # lslic -m reptilan-9119-MME-65BA46F -F activated_level,activated_spname
    56,FW860.10
    

    fw

What is SRIOV vNIC failover and how does it work?

I'll not explain here what an SRIOV vNIC is; if you want to know more about it just check my previous blog post on this topic, A first look at SRIOV vNIC adapters. What failover adds is the possibility to define as many backing devices as you want for a vNIC adapter (the maximum is 6 backing devices). For each backing device you can choose on which Virtual I/O Server the corresponding vnicserver will be created and set a failover priority to determine which backing device is active. Keep in mind that priorities work the exact same way as with Shared Ethernet Adapters: priority 10 is a higher priority than priority 20.

vnicvisio1

In the example shown in the images above and below the vNIC is configured with two backing devices (on two different SRIOV adapters) with priorities 10 and 20. As long as there is no outage (for instance on the Virtual I/O Server or on the adapter itself) the physical port used will be the one with priority 10. If that adapter has, for instance, a hardware issue we have the possibility to manually fail over to the second backing device or to let the hypervisor do this for us by picking the backing device with the next highest priority. Easy. This gives us redundant, LPM aware, high performance, fully virtualized adapters. A MUST :-) !

vnicvisio2

Creating a SRIOV vNIC failover using the HMC GUI and administrating it

To create or delete an SRIOV vNIC failover adapter (I'll just call it vNIC for the rest of this blog post) the partition must be shut down or active (it is not possible to add a vNIC while a partition is booted in OpenFirmware). The only way to do this from the HMC GUI is to use the enhanced interface (no problem, as we will have no other choice in the near future). Select the partition on which you want to create the adapter and click on the "Virtual NICs" tab.

vnic1b

Click “Add Virtual NIC”:

vnic1c

Choose the "Physical Port Location Code" (the physical port of the SRIOV adapter) on which you want to create the vNIC. You can add from one to six backup adapters (by clicking the "Add Entry" button). Only one backing device will be active at a time; if it fails (adapter issue, network issue) the vNIC will fail over to the next backup adapter depending on the "Failover priority". Be careful to spread the backing devices across the hosting Virtual I/O Servers, to be sure that losing a Virtual I/O Server will be seamless for your partition:

vnic1d

On the example above:

  • I'm creating a vNIC failover with "vNIC Auto Priority Failover" enabled.
  • Four VFs will be created: two on the VIOS ending with 88, two on the VIOS ending with 89.
  • Obviously four vnicservers will be created on the VIOSes (2 on each).
  • The lowest priority number takes the lead. This means that if the first backing device with priority 10 fails, the active adapter will be the second one; then if the second one with priority 20 fails, the third one becomes active, and so on. Keep in mind that as long as your highest-priority (lowest number) backing device is OK, nothing happens if one of the other backup adapters fails. Be smart when choosing the priorities. As Yoda says, "Wise you must be!".
  • The physical ports are located on different CECs.

vnic1e

The "Advanced Virtual NIC Settings" are applied to all the backing devices that will be created (in the example above, 4). For instance I'm using VLAN tagging on these ports, so I just need to set the "Port VLAN ID" once.

vnic1f

You can choose whether or not to allow the hypervisor to perform the failover/fallback automatically depending on the priorities you have set. If you click "enable" the hypervisor will automatically fail over to the next operational backing device depending on the priorities. If it is disabled, only a user can trigger a failover operation.

vnic1g

Be careful: the priorities are designed the same way as on Shared Ethernet Adapters. This means the lowest number in the failover priority is the "highest failover priority", just like for Shared Ethernet Adapters. On the image below you can notice that priority 10, which is the "highest failover priority", is active (it is the lowest number among 10, 20, 30 and 40).

vnic1h

After the creation of the vNIC you can check a few things on the Virtual I/O Servers. You will notice that every entry added during the creation of the vNIC has a corresponding VF (virtual function) and a corresponding vnicserver (each vnicserver has a VF mapped onto it):

  • You can see that for each entry added when creating a vNIC you’ll have the corresponding VF device present on the Virtual I/O Servers:
  • vios1# lsdev -type adapter -field name physloc description | grep "VF"
    [..]
    ent3             U78CA.001.CSS08ZN-P1-C3-C1-T2-S5                                  PCIe3 4-Port 10GbE SR Adapter VF(df1028e21410e304)
    ent4             U78CA.001.CSS08EL-P1-C3-C1-T2-S6                                  PCIe3 4-Port 10GbE SR Adapter VF(df1028e21410e304)
    
    vios2# lsdev -type adapter -field name physloc description | grep "VF"
    [..]
    ent3             U78CA.001.CSS08ZN-P1-C4-C1-T2-S2                                  PCIe3 4-Port 10GbE SR Adapter VF(df1028e21410e304)
    ent4             U78CA.001.CSS08EL-P1-C4-C1-T2-S2                                  PCIe3 4-Port 10GbE SR Adapter VF(df1028e21410e304)
    
  • For each VF you’ll see the corresponding vnicserver devices:
  • vios1# lsdev -type adapter -virtual | grep vnicserver
    [..]
    vnicserver1      Available   Virtual NIC Server Device (vnicserver)
    vnicserver2      Available   Virtual NIC Server Device (vnicserver)
    
    vios2# lsdev -type adapter -virtual | grep vnicserver
    [..]
    vnicserver1      Available   Virtual NIC Server Device (vnicserver)
    vnicserver2      Available   Virtual NIC Server Device (vnicserver)
    
  • You can check the VF mapped to each vnicserver using the 'lsmap' command. One funny thing: when a backing device has never been made active (using the "Make the Backing Device Active" button in the GUI) the corresponding client name and client device are not shown:
  • vios1# lsmap -all -vnic -fmt :
    [..]
    vnicserver1:U9119.MME.659707C-V2-C32898:6:lizard:AIX:ent3:Available:U78CA.001.CSS08ZN-P1-C3-C1-T2-S5:ent0:U9119.MME.659707C-V6-C6
    vnicserver2:U9119.MME.659707C-V2-C32899:6:N/A:N/A:ent4:Available:U78CA.001.CSS08EL-P1-C3-C1-T2-S6:N/A:U9119.MME.659707C-V6-C6
    
    vios2# lsmap -all -vnic
    [..]
    Name          Physloc                            ClntID ClntName       ClntOS
    ------------- ---------------------------------- ------ -------------- -------
    vnicserver1   U9119.MME.659707C-V1-C32898             6 N/A            N/A
    
    Backing device:ent3
    Status:Available
    Physloc:U78CA.001.CSS08ZN-P1-C4-C1-T2-S2
    Client device name:ent0
    Client device physloc:U9119.MME.659707C-V6-C6
    
    Name          Physloc                            ClntID ClntName       ClntOS
    ------------- ---------------------------------- ------ -------------- -------
    vnicserver2   U9119.MME.659707C-V1-C32899             6 N/A            N/A
    
    Backing device:ent4
    Status:Available
    Physloc:U78CA.001.CSS08EL-P1-C4-C1-T2-S2
    Client device name:N/A
    Client device physloc:U9119.MME.659707C-V6-C6
    
  • You can activate a backing device yourself just by clicking the "Make Backing Device Active" button in the GUI and then check that the vnicserver is now logged in:
  • vnic1i
    vnic1j

    vios2# lsmap -all -vnic -vadapter
    [..]
    vnicserver1:U9119.MME.659707C-V1-C32898:6:lizard:AIX:ent3:Available:U78CA.001.CSS08ZN-P1-C4-C1-T2-S2:ent0:U9119.MME.659707C-V6-C6
    vnicserver2:U9119.MME.659707C-V1-C32899:6:N/A:N/A:ent4:Available:U78CA.001.CSS08EL-P1-C4-C1-T2-S2:N/A:U9119.MME.659707C-V6-C6
    
  • I noticed something I find pretty strange: when you perform a manual failover of the vNIC, automatic priority failover is set to disabled. Remember to re-enable it after the manual operation has been performed:
  • vnic1k

    You can also check the status and the priority of the vNIC on the Virtual I/O Server using the vnicstat command. The command shows some useful information: the state of the device, whether it is active or not (I noticed 2 different states in my tests, "active" meaning this is the vf/vnicserver currently in use and "config_2" meaning the adapter is ready and available for a failover operation; there is probably another state when the link is down, but I didn't have the time to ask my network team to shut a port to verify this) and finally the failover priority. The vnicstat command is a root command.

    vios1#  vnicstat vnicserver1
    
    --------------------------------------------------------------------------------
    VNIC Server Statistics: vnicserver1
    --------------------------------------------------------------------------------
    Device Statistics:
    ------------------
    State: active
    Backing Device Name: ent3
    
    Failover State: active
    Failover Readiness: operational
    Failover Priority: 10
    
    Client Partition ID: 6
    Client Partition Name: lizard
    Client Operating System: AIX
    Client Device Name: ent0
    Client Device Location Code: U9119.MME.659707C-V6-C6
    [..]
    
    vios2# vnicstat vnicserver1
    --------------------------------------------------------------------------------
    VNIC Server Statistics: vnicserver1
    --------------------------------------------------------------------------------
    Device Statistics:
    ------------------
    State: config_2
    Backing Device Name: ent3
    
    Failover State: inactive
    Failover Readiness: operational
    Failover Priority: 20
    [..]
    

    You can also check vNIC server events in the errpt (client logins on failover and so on …):

    # errpt | more
    8C577CB6   1202195216 I S vnicserver1    VNIC Transport Event
    60D73419   1202194816 I S vnicserver1    VNIC Client Login
    # errpt -aj 60D73419 | more
    ---------------------------------------------------------------------------
    LABEL:          VS_CLIENT_LOGIN
    IDENTIFIER:     60D73419
    
    Date/Time:       Fri Dec  2 19:48:06 2016
    Sequence Number: 10567
    Machine Id:      00C9707C4C00
    Node Id:         vios2
    Class:           S
    Type:            INFO
    WPAR:            Global
    Resource Name:   vnicserver1
    
    Description
    VNIC Client Login
    
    Probable Causes
    VNIC Client Login
    
    Failure Causes
    VNIC Client Login
    

    Same thing using the hmc command line.

    Now we will do the same thing on the command line. I warn you, the commands are pretty huge!

    • List the SRIOV adapters (you will need their IDs to create the vNICs):
    • # lshwres -r sriov --rsubtype adapter -m reptilian-9119-MME-65BA46F
      adapter_id=3,slot_id=21010012,adapter_max_logical_ports=64,config_state=sriov,functional_state=1,logical_ports=64,phys_loc=U78CA.001.CSS08XH-P1-C3-C1,phys_ports=4,sriov_status=running,alternate_config=0
      adapter_id=4,slot_id=21010013,adapter_max_logical_ports=64,config_state=sriov,functional_state=1,logical_ports=64,phys_loc=U78CA.001.CSS08XH-P1-C4-C1,phys_ports=4,sriov_status=running,alternate_config=0
      adapter_id=1,slot_id=21010022,adapter_max_logical_ports=64,config_state=sriov,functional_state=1,logical_ports=64,phys_loc=U78CA.001.CSS08RG-P1-C3-C1,phys_ports=4,sriov_status=running,alternate_config=0
      adapter_id=2,slot_id=21010023,adapter_max_logical_ports=64,config_state=sriov,functional_state=1,logical_ports=64,phys_loc=U78CA.001.CSS08RG-P1-C4-C1,phys_ports=4,sriov_status=running,alternate_config=0
      
    • List vNIC for virtual machine “lizard”:
    • lshwres -r virtualio  -m reptilian-9119-MME-65BA46F --rsubtype vnic --level lpar --filter "lpar_names=lizard"
      lpar_name=lizard,lpar_id=6,slot_num=6,desired_mode=ded,curr_mode=ded,auto_priority_failover=0,port_vlan_id=0,pvid_priority=0,allowed_vlan_ids=all,mac_addr=6ac53577b106,allowed_os_mac_addrs=all,"backing_devices=sriov/vios1/1/3/0/2700c003/2.0/2.0/50,sriov/vios2/2/1/0/27004003/2.0/2.0/60","backing_device_states=sriov/2700c003/0/Operational,sriov/27004003/1/Operational"
      
    • Create a vNIC with 2 backing devices: the first one on Virtual I/O Server 1, adapter 1, physical port 2, with a failover priority of 10; the second one on Virtual I/O Server 2, adapter 3, physical port 2, with a failover priority of 20 (this vNIC will take the next available slot, which will be 6) (WARNING: physical port numbering starts from 0):
    • # chhwres -r virtualio -m reptilian-9119-MME-65BA46F -o a -p lizard --rsubtype vnic -v -a 'port_vlan_id=3455,auto_priority_failover=1,backing_devices="sriov/vios1//1/1/2.0/10,sriov/vios2//3/1/2.0/20"'
      #lshwres -r virtualio -m reptilian-9119-MME-65BA46F --rsubtype vnic --level lpar --filter "lpar_names=lizard"
      lpar_name=lizard,lpar_id=6,slot_num=6,desired_mode=ded,curr_mode=ded,auto_priority_failover=1,port_vlan_id=3455,pvid_priority=0,allowed_vlan_ids=all,mac_addr=6ac53577b106,allowed_os_mac_addrs=all,"backing_devices=sriov/vios1/1/1/1/2700400b/2.0/2.0/10,sriov/vios2/2/3/1/2700c008/2.0/2.0/20","backing_device_states=sriov/2700400b/1/Operational,sriov/2700c008/0/Operational"
      
    • Add two backing devices (one on each VIOS, on adapters 2 and 4, both on physical port 2, with failover priorities 30 and 40) to the vNIC in slot 6:
    • # chhwres -r virtualio -m reptilian-9119-MME-65BA46F -o s --rsubtype vnic -p lizard -s 6 -a '"backing_devices+=sriov/vios1//2/1/2.0/30,sriov/vios2//4/1/2.0/40"'
      # lshwres -r virtualio -m reptilian-9119-MME-65BA46F --rsubtype vnic --level lpar --filter "lpar_names=lizard"
      lpar_name=lizard,lpar_id=6,slot_num=6,desired_mode=ded,curr_mode=ded,auto_priority_failover=1,port_vlan_id=3455,pvid_priority=0,allowed_vlan_ids=all,mac_addr=6ac53577b106,allowed_os_mac_addrs=all,"backing_devices=sriov/vios1/1/1/1/2700400b/2.0/2.0/10,sriov/vios2/2/3/1/2700c008/2.0/2.0/20,sriov/vios1/1/2/1/27008005/2.0/2.0/30,sriov/vios2/2/4/1/27010002/2.0/2.0/40","backing_device_states=sriov/2700400b/1/Operational,sriov/2700c008/0/Operational,sriov/27008005/0/Operational,sriov/27010002/0/Operational"
      
    • Change the failover priority of logical port 2700400b of the vNIC in slot 6 to 11:
    • # chhwres -m reptilian-9119-MME-65BA46F -r virtualio -o s --rsubtype vnicbkdev -p lizard -s 6 --logport 2700400b -a "failover_priority=11"
      # lshwres -r virtualio -m reptilian-9119-MME-65BA46F --rsubtype vnic --level lpar --filter "lpar_names=lizard"
      lpar_name=lizard,lpar_id=6,slot_num=6,desired_mode=ded,curr_mode=ded,auto_priority_failover=1,port_vlan_id=3455,pvid_priority=0,allowed_vlan_ids=all,mac_addr=6ac53577b106,allowed_os_mac_addrs=all,"backing_devices=sriov/vios1/1/1/1/2700400b/2.0/2.0/11,sriov/vios2/2/3/1/2700c008/2.0/2.0/20,sriov/vios1/1/2/1/27008005/2.0/2.0/30,sriov/vios2/2/4/1/27010002/2.0/2.0/40","backing_device_states=sriov/2700400b/1/Operational,sriov/2700c008/0/Operational,sriov/27008005/0/Operational,sriov/27010002/0/Operational"
      
    • Make logical port 27008005 active on vNIC in slot 6:
    • # chhwres -m reptilian-9119-MME-65BA46F -r virtualio -o act --rsubtype vnicbkdev -p lizard  -s 6 --logport 27008005 
      # lshwres -r virtualio -m reptilian-9119-MME-65BA46F --rsubtype vnic --level lpar --filter "lpar_names=lizard"
      lpar_name=lizard,lpar_id=6,slot_num=6,desired_mode=ded,curr_mode=ded,auto_priority_failover=0,port_vlan_id=3455,pvid_priority=0,allowed_vlan_ids=all,mac_addr=6ac53577b106,allowed_os_mac_addrs=all,"backing_devices=sriov/vios1/1/1/1/2700400b/2.0/2.0/11,sriov/vios2/2/3/1/2700c008/2.0/2.0/20,sriov/vios1/1/2/1/27008005/2.0/2.0/30,sriov/vios2/2/4/1/27010002/2.0/2.0/40","backing_device_states=sriov/2700400b/0/Operational,sriov/2700c008/0/Operational,sriov/27008005/1/Operational,sriov/27010002/0/Operational"
      
    • Re-enable automatic failover on vNIC in slot 6:
    • # chhwres -m reptilian-9119-MME-65BA46F -r virtualio -o s --rsubtype vnic -p lizard  -s 6 -a "auto_priority_failover=1"
      # lshwres -r virtualio -m reptilian-9119-MME-65BA46F --rsubtype vnic --level lpar --filter "lpar_names=lizard"
      lpar_name=lizard,lpar_id=6,slot_num=6,desired_mode=ded,curr_mode=ded,auto_priority_failover=1,port_vlan_id=3455,pvid_priority=0,allowed_vlan_ids=all,mac_addr=6ac53577b106,allowed_os_mac_addrs=all,"backing_devices=sriov/vios1/1/1/1/2700400b/2.0/2.0/11,sriov/vios2/2/3/1/2700c008/2.0/2.0/20,sriov/vios1/1/2/1/27008005/2.0/2.0/30,sriov/vios2/2/4/1/27010002/2.0/2.0/40","backing_device_states=sriov/2700400b/1/Operational,sriov/2700c008/0/Operational,sriov/27008005/0/Operational,sriov/27010002/0/Operational"
      

    Testing the failover.

    It's now time to test if the failover is working as intended. The test will be super simple: I will just shut off one of the two Virtual I/O Servers and check whether I'm losing packets or not. I'm first checking on which VIOS the active backing device is located:

    vnic1l

    I now need to shutdown the Virtual I/O Server ending with 88 and check if the one ending with 89 is taking the lead:

    *****88# shutdown -force 
    

    Priorities 10 and 30 were on the Virtual I/O Server that was shut down; the highest remaining priority, on the surviving Virtual I/O Server, is 20. This backing device, hosted on the second Virtual I/O Server, is now serving the network I/Os:

    vnic1m

    You can check the same thing with command line on the remaining Virtual I/O Server:

    *****89# errpt | more
    IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
    60D73419   1202214716 I S vnicserver0    VNIC Client Login
    60D73419   1202214716 I S vnicserver1    VNIC Client Login
    *****89# vnicstat vnicserver1
    --------------------------------------------------------------------------------
    VNIC Server Statistics: vnicserver1
    --------------------------------------------------------------------------------
    Device Statistics:
    ------------------
    State: active
    Backing Device Name: ent3
    
    Failover State: active
    Failover Readiness: operational
    Failover Priority: 20
    
    

    During my tests the failover worked as I expected. You can see in the picture below that during this test I only lost one ping, between sequence 64 and 66, during the failover/failback process.

    vnic1n
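    If you want to reproduce this kind of measurement, a plain ping with a fixed count run from another machine is enough: the final summary shows how many packets were lost and a gap in the icmp_seq numbers shows when the failover happened (lizard.chmod666.org is an assumed hostname for the test partition):

    # ping -c 300 lizard.chmod666.org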

    In the partition I saw some messages in the errpt during the failover:

    # errpt | more
    4FB9389C   1202215816 I S ent0           VNIC Link Up
    F655DA07   1202215816 I S ent0           VNIC Link Down
    # errpt -a | more
    [..]
    SOURCE ADDRESS
    56FB 2DB8 A406
    Event
    physical link: DOWN   logical link: DOWN
    Status
    [..]
    SOURCE ADDRESS
    56FB 2DB8 A406
    Event
    physical link: UP   logical link: UP
    Status
    

    What about Live Partition Mobility?

    If you want a seamless LPM experience without having to choose the destination adapter and physical port on which to map your current vNIC backing devices, just fill in the label and sublabel (the label is the most important) for each physical port of your SRIOV adapters. Then, during the LPM, if the names are aligned between the two systems the right physical port will be chosen automatically based on the label names:

    vnic1o
    vnic1p
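    The labels themselves can be set from the HMC GUI (SRIOV physical port properties) or from the HMC command line. The sketch below is from memory: the attribute names (phys_port_label/phys_port_sub_label) and exact syntax should be double checked with lshwres on your HMC level before use, and the managed system name and adapter/port ids are just the examples used earlier:

    # lshwres -r sriov --rsubtype physport -m reptilian-9119-MME-65BA46F --level eth -F adapter_id,phys_port_id,phys_port_label,phys_port_sub_label
    # chhwres -r sriov --rsubtype physport -m reptilian-9119-MME-65BA46F -o s -a "adapter_id=1,phys_port_id=0,phys_port_label=PROD_TSM,phys_port_sub_label=BACKUP"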

    The LPM was working like a charm and I didn't notice any particular problems during the move. vNIC failover and LPM work fine as long as you take care of your SRIOV labels :-). I did notice on AIX 7.2 TL1 SP1 that there were no errpt messages in the partition itself, only on the Virtual I/O Server … weird :-)

    # errpt | more
    IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
    3EB09F5A   1202222416 I S Migration      Migration completed successfully
    

    Conclusion.

    No long story here. If you need performance AND flexibility you absolutely have to use SRIOV vNIC failover adapters. This feature offers you the best of both worlds: the possibility to dedicate 10Gb adapters with a failover capability without having to worry about LPM or about NIB configuration. It's not applicable in all cases but it's definitely something to have for an environment such as TSM or other network I/O intensive workloads. Use it!

    About reptilians !

    Before you start reading this, keep your sense of humor and be aware that what I say is not related to my workplace at all; it's a general way of thinking, not especially based on my own experience. Don't be offended by this, it's just a personal opinion based on things I may or may not have seen during my life. You've been warned.

    This blog was never a place to share my opinions about life and society, but I must admit I should have done that before. Speaking about this kind of thing makes you feel alive in a world where everything needs to be OK and where you no longer have the right to feel or express something about what you are living through. There are a couple of good blog posts on this kind of topic related to the IT world, and I agree with everything said in them. Some of the authors are just telling what they love in their daily jobs, but I think it's also a way to say what they probably wouldn't love in another one :-) :

    • Adam Leventhal’s “I’m not a resource”: here
    • Brendan Gregg’s “Working at Netflix in 2016″: here

    All of this to say that I work at night, I work on weekends, I'm thinking about PowerSystems/computers when I fall asleep. I always have new ideas and I always want to learn new things, discover new technologies and features. I truly, deeply love this, but being like this does not help me and will never help me in my daily job, for one single reason: in this world the people who have the knowledge are not the people who take the technical decisions. Sad but true. I'm just good at working as much as I can for as little money as possible. Nobody cares if techs are happy, unhappy, want to stay or leave. It doesn't make any difference to anyone running a company. What's important is money. Everything is meaningless. We are no one, we are nothing, just numbers in an Excel spreadsheet. I'm probably saying this because I'm not good enough at anything to find an acceptable workplace. Once again, sad but true.

    Even worse, if you just want to follow what the industry is asking, you have to be everywhere and know everything. I know I'll be forced in the very near future to move to DevOps/Linux (I love Linux, I'm an RHCE certified engineer!). That's why for a couple of years now, at night after my daily job is finished, I'm working again: working to understand how Docker works, working to install my own Openstack on my own machines, working to understand Saltstack, Ceph, Python, Ruby, Go … it's a never ending process. But it's still not enough for them! Not enough to be considered a good, or good enough, guy to fit a job. I remember being asked to know about Openstack, Cassandra, Hadoop, AWS, KVM, Linux, automation tools (Puppet this time), Docker and continuous integration for one single job application. First, I seriously doubt that anyone has all of these skills and is good at each. Second, even if I were an expert at each one, look a few years back and it was the exact same thing but with different products. You have to understand and be good at every new product in minutes. All of this to understand that one or two years after you are considered an "expert", you are bad at everything that exists in the industry. I'm really sick of this fight against something I can't control. Being a hard worker and clever enough to understand every new feature is not enough nowadays. On top of that you also need to be a beautiful person with a perfect smile wearing a perfect suit. You also have to be on LinkedIn and be connected with the right people. And even if all of these boxes are checked you still need to be lucky enough to be at the right place at the right moment. I'm so sick of this. Work doesn't pay. Only luck. I don't want to live in this kind of world but I have to. Anyway, this is just a "two-cents" way of thinking. Everything is probably a big trick orchestrated by these reptilian lizard men! ^^. Be good at what you do and don't care about what people think of you (even your horrible French accent during your sessions) … that's the most important!

    picture-of-reptilian-alien

    Putting NovaLink in Production & more PowerVC (1.3.1.2) tips and tricks

    I've been quite busy and writing the blog is getting more and more difficult with the amount of work I have, but I try to stick to it, as writing these blog posts is almost the only thing I can do properly in my whole life. So why stop? As my place is one of the craziest places I have ever worked in (for the good … and the bad; I'll not talk here about how things are organized or how your work is recognized, but be sure it is probably one of the main reasons I'll leave this place one day or another), the PowerSystems growth is crazy and the number of AIX partitions we are managing with PowerVC never stops increasing; I think we are one of the biggest PowerVC customers in the whole world (I don't know if that is a good thing or not). Just to give you a couple of examples: we have here one of the biggest Power Enterprise Pools I have ever seen (384 Power8 mobile cores), the number of partitions managed by PowerVC is around 2600 and we have a PowerVC managing almost 30 hosts. You read that right … these numbers are huge. It seems very funny, but it's not; the growth is a problem, a technical problem, and we are facing problems that most of you will never hit. I'm speaking about density and scalability. Fortunately for us the "vertical" design of PowerVC can now be replaced by what I call a "horizontal" design: instead of putting all the nova instances on one single machine, we now have the possibility to spread the load on each host by using NovaLink. As we needed to solve these density and scalability problems we decided to move all the P8 hosts to NovaLink (this process is still ongoing but most of the engineering work is already done). As you now know, we are not deploying a host every year but generally a couple per month, and that's why we needed to find a solution to automate this. So this blog post will talk about all the things and best practices I have learned using and implementing NovaLink in a huge production environment (automated installation, tips and tricks, post-install, migration and so on). But we will not stop there: I'll also talk about the new things I have learned about PowerVC (1.3.1.2 and 1.3.0.1) and give more tips and tricks to use the product at its best. Before going any further I first want to say a big thank you to the whole PowerVC team for their kindness and the precious time they gave us to advise and educate the OpenStack noob I am (a special thanks to Drew Thorstensen for the long discussions we had about Openstack and PowerVC; he is probably one of the most passionate guys I have ever met at IBM).

    Novalink Automated installation

    I'll not write a big introduction; let's get to work and start with NovaLink and how to automate the NovaLink installation process. Copy the content of the installation cdrom to a directory that can be served by an http server on your NIM server (I'm using my NIM server for the bootp and tftp parts). Note that I'm doing this with a tar command because there are symbolic links in the iso and a simple cp would follow them and end up filling the filesystem.

    # loopmount -i ESD_-_PowerVM_NovaLink_V1.0.0.3_062016.iso -o "-V cdrfs -o ro" -m /mnt
    # tar cvf iso.tar /mnt/*
    # tar xvf iso.tar -C /export/nim/lpp_source/powervc/novalink/1.0.0.3/iso
    # ls -l /export/nim/lpp_source/powervc/novalink/1.0.0.3/iso
    total 320
    dr-xr-xr-x    2 root     system          256 Jul 28 17:54 .disk
    -r--r--r--    1 root     system          243 Apr 20 21:27 README.diskdefines
    -r--r--r--    1 root     system         3053 May 25 22:25 TRANS.TBL
    dr-xr-xr-x    3 root     system          256 Apr 20 11:59 boot
    dr-xr-xr-x    3 root     system          256 Apr 20 21:27 dists
    dr-xr-xr-x    3 root     system          256 Apr 20 21:27 doc
    dr-xr-xr-x    2 root     system         4096 Aug 09 15:59 install
    -r--r--r--    1 root     system       145981 Apr 20 21:34 md5sum.txt
    dr-xr-xr-x    2 root     system         4096 Apr 20 21:27 pics
    dr-xr-xr-x    3 root     system          256 Apr 20 21:27 pool
    dr-xr-xr-x    3 root     system          256 Apr 20 11:59 ppc
    dr-xr-xr-x    2 root     system          256 Apr 20 21:27 preseed
    dr-xr-xr-x    4 root     system          256 May 25 22:25 pvm
    lrwxrwxrwx    1 root     system            1 Aug 29 14:55 ubuntu -> .
    dr-xr-xr-x    3 root     system          256 May 25 22:25 vios
    

    Prepare the PowerVM NovaLink repository. The content of the repository can be found in the NovaLink iso image in pvm/repo/pvmrepo.tgz:

    # ls -l /export/nim/lpp_source/powervc/novalink/1.0.0.3/iso/pvm/repo/
    total 720192
    -r--r--r--    1 root     system          223 May 25 22:25 TRANS.TBL
    -rw-r--r--    1 root     system         2106 Sep 05 15:56 pvm-install.cfg
    -r--r--r--    1 root     system    368722592 May 25 22:25 pvmrepo.tgz
    

    Extract the content of this tgz file in a directory that can be served by the http server:

    # mkdir /export/nim/lpp_source/powervc/novalink/1.0.0.3/pvmrepo
    # cp /export/nim/lpp_source/powervc/novalink/1.0.0.3/iso/pvm/repo/pvmrepo.tgz /export/nim/lpp_source/powervc/novalink/1.0.0.3/pvmrepo/
    # cd /export/nim/lpp_source/powervc/novalink/1.0.0.3/pvmrepo
    # gunzip pvmrepo.tgz
    # tar xvf pvmrepo.tar
    [..]
    x ./pool/non-free/p/pvm-core/pvm-core-dbg_1.0.0.3-160525-2192_ppc64el.deb, 54686380 bytes, 106810 media blocks.
    x ./pool/non-free/p/pvm-core/pvm-core_1.0.0.3-160525-2192_ppc64el.deb, 2244784 bytes, 4385 media blocks.
    x ./pool/non-free/p/pvm-core/pvm-core-dev_1.0.0.3-160525-2192_ppc64el.deb, 618378 bytes, 1208 media blocks.
    x ./pool/non-free/p/pvm-pkg-tools/pvm-pkg-tools_1.0.0.3-160525-492_ppc64el.deb, 170700 bytes, 334 media blocks.
    x ./pool/non-free/p/pvm-rest-server/pvm-rest-server_1.0.0.3-160524-2229_ppc64el.deb, 263084432 bytes, 513837 media blocks.
    # rm pvmrepo.tar 
    # ls -l 
    total 16
    drwxr-xr-x    2 root     system          256 Sep 11 13:26 conf
    drwxr-xr-x    2 root     system          256 Sep 11 13:26 db
    -rw-r--r--    1 root     system          203 May 26 02:19 distributions
    drwxr-xr-x    3 root     system          256 Sep 11 13:26 dists
    -rw-r--r--    1 root     system         3132 May 24 20:25 novalink-gpg-pub.key
    drwxr-xr-x    4 root     system          256 Sep 11 13:26 pool
    

    Copy the NovaLink boot files in a directory that can be served by your tftp server (I’m using /var/lib/tftpboot):

    # mkdir /var/lib/tftpboot
    # cp -r /export/nim/lpp_source/powervc/novalink/1.0.0.3/iso/pvm /var/lib/tftpboot
    # ls -l /var/lib/tftpboot
    total 1016
    -r--r--r--    1 root     system         1120 Jul 26 20:53 TRANS.TBL
    -r--r--r--    1 root     system       494072 Jul 26 20:53 core.elf
    -r--r--r--    1 root     system          856 Jul 26 21:18 grub.cfg
    -r--r--r--    1 root     system        12147 Jul 26 20:53 pvm-install-config.template
    dr-xr-xr-x    2 root     system          256 Jul 26 20:53 repo
    dr-xr-xr-x    2 root     system          256 Jul 26 20:53 rootfs
    -r--r--r--    1 root     system         2040 Jul 26 20:53 sample_grub.cfg
    

    I still don't know why this is the case on AIX, but the tftp server searches for grub.cfg in the root directory of your AIX system. It's not the case for my RedHat Enterprise Linux installation but it is for the NovaLink/Ubuntu installation. Copy the sample_grub.cfg to /grub.cfg and modify the content of the file:

    • As the gateway, netmask and nameserver will be provided by the pvm-install.cfg file (the configuration file of the NovaLink installer, we will talk about this later), comment out those three lines.
    • The hostname will still be needed.
    • Modify the linux line and point to the vmlinux file provided in the NovaLink iso image.
    • Modify the live-installer to point to the filesystem.squashfs provided in the NovaLink iso image.
    • Modify the pvm-repo line to point to the pvm-repository directory we created before.
    • Modify the pvm-installer line to point to the NovaLink install configuration file (we will modify this one after).
    • Don't do anything with the pvm-vios line as we are installing NovaLink on a system that already has Virtual I/O Servers installed (I'm not installing Scale Out systems but high end models only).
    • I’ll talk later about the pvm-disk line (this line is not by default in the pvm-install-config.template provided in the NovaLink iso image).
    # cp /var/lib/tftpboot/sample_grub.cfg /grub.cfg
    # cat /grub.cfg
    # Sample GRUB configuration for NovaLink network installation
    set default=0
    set timeout=10
    
    menuentry 'PowerVM NovaLink Install/Repair' {
     insmod http
     insmod tftp
     regexp -s 1:mac_pos1 -s 2:mac_pos2 -s 3:mac_pos3 -s 4:mac_pos4 -s 5:mac_pos5 -s 6:mac_pos6 '(..):(..):(..):(..):(..):(..)' ${net_default_mac}
     set bootif=01-${mac_pos1}-${mac_pos2}-${mac_pos3}-${mac_pos4}-${mac_pos5}-${mac_pos6}
     regexp -s 1:prefix '(.*)\.(\.*)' ${net_default_ip}
    # Setup variables with values from Grub's default variables
     set ip=${net_default_ip}
     set serveraddress=${net_default_server}
     set domain=${net_ofnet_network_domain}
    # If tftp is desired, replace http with tftp in the line below
     set root=http,${serveraddress}
    # Remove comment after providing the values below for
    # GATEWAY_ADDRESS, NETWORK_MASK, NAME_SERVER_IP_ADDRESS
    # set gateway=10.10.10.1
    # set netmask=255.255.255.0
    # set nameserver=10.20.2.22
      set hostname=nova0696010
    # In this sample file, the directory novalink is assumed to exist on the
    # BOOTP server and has the NovaLink ISO content
     linux /export/nim/lpp_source/powervc/novalink/1.0.0.3/iso/install/vmlinux \
     live-installer/net-image=http://${serveraddress}/export/nim/lpp_source/powervc/novalink/1.0.0.3/iso/install/filesystem.squashfs \
     pkgsel/language-pack-patterns= \
     pkgsel/install-language-support=false \
     netcfg/disable_dhcp=true \
     netcfg/choose_interface=auto \
     netcfg/get_ipaddress=${ip} \
     netcfg/get_netmask=${netmask} \
     netcfg/get_gateway=${gateway} \
     netcfg/get_nameservers=${nameserver} \
     netcfg/get_hostname=${hostname} \
     netcfg/get_domain=${domain} \
     debian-installer/locale=en_US.UTF-8 \
     debian-installer/country=US \
    # The directory novalink-repo on the BOOTP server contains the content
    # of the pvmrepo.tgz file obtained from the pvm/repo directory on the
    # NovaLink ISO file.
    # The directory novalink-vios on the BOOTP server contains the files
    # needed to perform a NIM install of VIOS server(s)
    #  pvmdebug=1
     pvm-repo=http://${serveraddress}/export/nim/lpp_source/powervc/novalink/1.0.0.3/novalink-repo/ \
     pvm-installer-config=http://${serveraddress}/export/nim/lpp_source/powervc/novalink/1.0.0.3/mnt/pvm/repo/pvm-install.cfg \
     pvm-viosdir=http://${serveraddress}/novalink-vios \
     pvmdisk=/dev/mapper/mpatha \
     initrd /export/nim/lpp_source/powervc/novalink/1.0.0.3/mnt/install/netboot_initrd.gz
    }
    

    Modify the pvm-install.cfg, it's the NovaLink installer configuration file. We just need to modify the [SystemConfig], [NovaLinkGeneralSettings], [NovaLinkNetworkSettings], [NovaLinkAPTRepoConfig] and [NovaLinkAdminCredentials] sections. My advice is to configure one NovaLink by hand (by doing an installation directly with the iso image): after the installation your configuration file is saved in /var/log/pvm-install/novalink-install.cfg, filled with the answers you gave during the NovaLink installation. You can copy this one as your template on your installation server.

    # more /export/nim/lpp_source/powervc/novalink/1.0.0.3/mnt/pvm/repo/pvm-install.cfg
    [SystemConfig]
    serialnumber = XXXXXXXX
    lmbsize = 256
    
    [NovaLinkGeneralSettings]
    ntpenabled = True
    ntpserver = timeserver1
    timezone = Europe/Paris
    
    [NovaLinkNetworkSettings]
    dhcpip = DISABLED
    ipaddress = YYYYYYYY
    gateway = ZZZZZZZZ
    netmask = 255.255.255.0
    dns1 = 8.8.8.8
    dns2 = 8.8.9.9
    hostname = WWWWWWWW
    domain = lab.chmod666.org
    
    [NovaLinkAPTRepoConfig]
    downloadprotocol = http
    mirrorhostname = nimserver
    mirrordirectory = /export/nim/lpp_source/powervc/novalink/1.0.0.3/mnt/
    mirrorproxy =
    
    [VIOSNIMServerConfig]
    novalink_private_ip = 192.168.128.1
    vios1_private_ip = 192.168.128.2
    vios2_private_ip = 192.168.128.3
    novalink_netmask = 255.255.128.0
    viosinstallprompt = False
    
    [NovaLinkAdminCredentials]
    username = padmin
    password = $6$N1hP6cJ32p17VMpQ$sdThvaGaR8Rj12SRtJsTSRyEUEhwPaVtCTvbdocW8cRzSQDglSbpS.jgKJpmz9L5SAv8qptgzUrHDCz5ureCS.
    userdescription = NovaLink System Administrator
    

    Finally modify the /etc/bootptab file and add a line matching your installation:

    # tail -1 /etc/bootptab
    nova0696010:bf=/var/lib/tftpboot/core.elf:ip=10.20.65.16:ht=ethernet:sa=10.255.228.37:gw=10.20.65.1:sm=255.255.255.0:
    

    Don't forget to set up an http server serving all the needed files. I know this configuration is super insecure, but honestly I don't care: my NIM server is on a super secured network only accessible by the VIOS and NovaLink partitions. So I'm good :-) :

    # cd /opt/freeware/etc/httpd/ 
    # grep -Ei "^Listen|^DocumentRoot" conf/httpd.conf
    Listen 80
    DocumentRoot "/"
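
    A quick way to verify that the HTTP server really serves what grub.cfg references is to request a few of the URLs from any Linux box on the same network. This is just a sketch: I'm assuming curl is available and I'm simply re-using the paths configured above (replace nimserver with the address of your HTTP/NIM server). Each request should come back with an HTTP 200:

    # curl -sI http://nimserver/export/nim/lpp_source/powervc/novalink/1.0.0.3/iso/install/vmlinux
    # curl -sI http://nimserver/export/nim/lpp_source/powervc/novalink/1.0.0.3/iso/install/filesystem.squashfs
    # curl -sI http://nimserver/export/nim/lpp_source/powervc/novalink/1.0.0.3/mnt/pvm/repo/pvm-install.cfg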
    

    (image: novaserved)

    Instead of doing this over and over at every NovaLink installation, I have written a custom script that prepares my NovaLink installation files. What I do in this script is:

    • Preparing the pvm-install.cfg file.
    • Modifying the grub.cfg file.
    • Adding a line to the /etc/bootptab file.
    #  ./custnovainstall.ksh nova0696010 10.20.65.16 10.20.65.1 255.255.255.0
    #!/usr/bin/ksh
    
    novalinkname=$1
    novalinkip=$2
    novalinkgw=$3
    novalinknm=$4
    cfgfile=/export/nim/lpp_source/powervc/novalink/novalink-install.cfg
    desfile=/export/nim/lpp_source/powervc/novalink/1.0.0.3/mnt/pvm/repo/pvm-install.cfg
    grubcfg=/export/nim/lpp_source/powervc/novalink/grub.cfg
    grubdes=/grub.cfg
    
    echo "+--------------------------------------+"
    echo "NovaLink name: ${novalinkname}"
    echo "NovaLink IP: ${novalinkip}"
    echo "NovaLink GW: ${novalinkgw}"
    echo "NovaLink NM: ${novalinknm}"
    echo "+--------------------------------------+"
    echo "Cfg ref: ${cfgfile}"
    echo "Cfg file: ${cfgfile}.${novalinkname}"
    echo "+--------------------------------------+"
    
    typeset -u serialnumber
    serialnumber=$(echo ${novalinkname} | sed 's/nova//g')
    
    echo "SerNum: ${serialnumber}"
    
    cat ${cfgfile} | sed -e "s/serialnumber = XXXXXXXX/serialnumber = ${serialnumber}/g" \
                         -e "s/ipaddress = YYYYYYYY/ipaddress = ${novalinkip}/g" \
                         -e "s/gateway = ZZZZZZZZ/gateway = ${novalinkgw}/g" \
                         -e "s/netmask = 255.255.255.0/netmask = ${novalinknm}/g" \
                         -e "s/hostname = WWWWWWWW/hostname = ${novalinkname}/g" > ${cfgfile}.${novalinkname}
    cp ${cfgfile}.${novalinkname} ${desfile}
    cat ${grubcfg} | sed "s/  set hostname=WWWWWWWW/  set hostname=${novalinkname}/g" > ${grubcfg}.${novalinkname}
    cp ${grubcfg}.${novalinkname} ${grubdes}
    # nova1009425:bf=/var/lib/tftpboot/core.elf:ip=10.20.65.15:ht=ethernet:sa=10.255.248.37:gw=10.20.65.1:sm=255.255.255.0:
    echo "${novalinkname}:bf=/var/lib/tftpboot/core.elf:ip=${novalinkip}:ht=ethernet:sa=10.255.248.37:gw=${novalinkgw}:sm=${novalinknm}:" >> /etc/bootptab
    

    Novalink installation: vSCSI or NPIV ?

    NovaLink is not designed to be installed on top of NPIV, it's a fact. As it is designed to be installed on a totally new system without any Virtual I/O Servers configured, the NovaLink installation by default creates the Virtual I/O Servers and, using these VIOS, the installation process creates backing devices on top of logical volumes created in the default VIOS storage pool. The NovaLink partition is then installed on top of these two logical volumes and mirrored at the end. This is the way NovaLink does it for Scale Out systems.

    For High End systems NovaLink assumes you are going to install the NovaLink partition on top of vSCSI (I have personally tried with hdisk backed and SSP Logical Unit backed devices and both work ok). For those like me who want to install NovaLink on top of NPIV (I know this is not a good choice, but once again I was forced to do it) there still is a possibility to do so. (In my humble opinion the NPIV design is meant for high performance, and the NovaLink partition is not going to be an I/O intensive partition. Even worse, our whole new design is based on NPIV for LPARs … it's a shame as NPIV is not a solution designed for high density and high scalability. Every PowerVM system administrator should remember this. NPIV IS NOT A GOOD CHOICE FOR DENSITY AND SCALABILITY, USE IT FOR PERFORMANCE ONLY !!! The story behind this is funny. I'm 100% sure that SSP is ten times a better choice to achieve density and scalability. I decided to open a poll on twitter asking this question: "Will you choose SSP or NPIV to design a scalable AIX cloud based on PowerVC ?". I was 100% sure SSP would win and made a bet with a friend (I owe him beers now) that I'd be right. What was my surprise when seeing the results: 90% of people voted for NPIV. I'm sorry to say that guys, but there are two possibilities: 1/ You don't really know what scalability and density mean because you never faced them, so that's why you made the wrong choice. 2/ You know it and you're just wrong :-) . This little story is another proof that IBM is not responsible for the dying of AIX and PowerVM … but unfortunately you are responsible for it, by not understanding that the only way to survive is to embrace highly scalable solutions like Linux is doing with Openstack and Ceph. It's a fact. Period.)

    This said … if you are trying to install NovaLink on top of NPIV you'll get an error. A workaround to this problem is to add the following line to the grub.cfg file:

     pvmdisk=/dev/mapper/mpatha \
    

    If you do that you'll be able to install NovaLink on your NPIV disk, but you will still get an error the first time you install it, at the "grub-install" step. Just re-run the installation a second time and the grub-install command will work ok :-) (I'll explain later how to avoid this second issue).

    One work-around to this second issue is to recreate the initrd after adding a line in the debian-installer preseed file (see the "Deep dive into the initrd" section below).

    Fully automated installation by example

    • Here the core.elf file is downloaded by tftp. You can see in the captures below that the grub.cfg file is searched for in / (screenshots 1m and 13m).

    • The installer is starting (screenshot 2).

    • The vmlinux is downloaded (http) (screenshot 3).

    • The root.squashfs is downloaded (http) (screenshot 4m).

    • The pvm-install.cfg configuration file is downloaded (http) (screenshot 5).

    • The pvm services are started. At this time, if you are running in co-management mode, you'll see the red lock in the HMC Server status (screenshot 6).

    • The Linux and NovaLink installation is ongoing (screenshots 7 to 12).

    • The system is ready (screenshot 14).

    Novalink code auto update

    When adding a NovaLink host to PowerVC, the PowerVC packages coming from the PowerVC management host will be installed on the NovaLink partition. You can check this while the host is being added; here is what's going on:

    (screenshots 15 and 16)

    # cat /opt/ibm/powervc/log/powervc_install_2016-09-11-164205.log
    ################################################################################
    Starting the IBM PowerVC Novalink Installation on:
    2016-09-11T16:42:05+02:00
    ################################################################################
    
    LOG file is /opt/ibm/powervc/log/powervc_install_2016-09-11-164205.log
    
    2016-09-11T16:42:05.18+02:00 Installation directory is /opt/ibm/powervc
    2016-09-11T16:42:05.18+02:00 Installation source location is /tmp/powervc_img_temp_1473611916_1627713/powervc-1.3.1.2
    [..]
    Setting up python-neutron (10:8.0.0-201608161728.ibm.ubuntu1.375) ...
    Setting up neutron-common (10:8.0.0-201608161728.ibm.ubuntu1.375) ...
    Setting up neutron-plugin-ml2 (10:8.0.0-201608161728.ibm.ubuntu1.375) ...
    Setting up ibmpowervc-powervm-network (1.3.1.2) ...
    Setting up ibmpowervc-powervm-oslo (1.3.1.2) ...
    Setting up ibmpowervc-powervm-ras (1.3.1.2) ...
    Setting up ibmpowervc-powervm (1.3.1.2) ...
    W: --force-yes is deprecated, use one of the options starting with --allow instead.
    
    ***************************************************************************
    IBM PowerVC Novalink installation
     successfully completed at 2016-09-11T17:02:30+02:00.
     Refer to
     /opt/ibm/powervc/log/powervc_install_2016-09-11-165617.log
     for more details.
    ***************************************************************************
    

    (screenshot 17)

    Installing the missing deb packages if NovaLink host was added before PowerVC upgrade

    If the NovaLink host was added in PowerVC 1.3.1.1 and you then updated to PowerVC 1.3.1.2, you have to update the packages by hand because there is a little bug during the update of some packages:

    • From the PowerVC management host copy the latest packages to the NovaLink host:
    • # scp /opt/ibm/powervc/images/powervm/powervc-powervm-compute-1.3.1.2.tgz padmin@nova0696010:~
      padmin@nova0696010's password:
      powervc-powervm-compute-1.3.1.2.tgz
      
    • Update the packages on the NovaLink host
    • # tar xvzf powervc-powervm-compute-1.3.1.2.tgz
      # cd powervc-1.3.1.2/packages/powervm
      # dpkg -i nova-powervm_2.0.3-160816-48_all.deb
      # dpkg -i networking-powervm_2.0.1-160816-6_all.deb
      # dpkg -i ceilometer-powervm_2.0.1-160816-17_all.deb
      # /opt/ibm/powervc/bin/powervc-services restart
      

    rsct and pvm deb update

    Never forget to install the latest rsct and pvm packages after the installation. You can clone the official IBM repository for the pvm and rsct files (check my previous post about NovaLink for more details about cloning the repository). Then create two files in /etc/apt/sources.list.d, one for pvm, the other for rsct:

    # vi /etc/apt/sources.list.d/pvm.list
    deb http://nimserver/export/nim/lpp_source/powervc/novalink/nova/debian novalink_1.0.0 non-free
    # vi /etc/apt/sources.list.d/rsct.list
    deb http://nimserver/export/nim/lpp_source/powervc/novalink/rsct/ubuntu xenial main
    # dpkg -l | grep -i rsct
    ii  rsct.basic                                3.2.1.0-15300                           ppc64el      Reliable Scalable Cluster Technology - Basic
    ii  rsct.core                                 3.2.1.3-16106-1ubuntu1                  ppc64el      Reliable Scalable Cluster Technology - Core
    ii  rsct.core.utils                           3.2.1.3-16106-1ubuntu1                  ppc64el      Reliable Scalable Cluster Technology - Utilities
    # dpkg -l | grep -i pvm
    ii  pvm-cli                                   1.0.0.3-160516-1488                     all          Power VM Command Line Interface
    ii  pvm-core                                  1.0.0.3-160525-2192                     ppc64el      PVM core runtime package
    ii  pvm-novalink                              1.0.0.3-160525-1000                     ppc64el      Meta package for all PowerVM Novalink packages
    ii  pvm-rest-app                              1.0.0.3-160524-2229                     ppc64el      The PowerVM NovaLink REST API Application
    ii  pvm-rest-server                           1.0.0.3-160524-2229                     ppc64el      Holds the basic installation of the REST WebServer (Websphere Liberty Profile) for PowerVM NovaLink 
    # apt-get install rsct.core rsct.basic
    Reading package lists... Done
    Building dependency tree
    Reading state information... Done
    The following packages were automatically installed and are no longer required:
      docutils-common libpaper-utils libpaper1 python-docutils python-roman
    Use 'apt autoremove' to remove them.
    The following additional packages will be installed:
      rsct.core.utils src
    The following packages will be upgraded:
      rsct.core rsct.core.utils src
    3 upgraded, 0 newly installed, 0 to remove and 6 not upgraded.
    Need to get 9,356 kB of archives.
    After this operation, 548 kB disk space will be freed.
    [..]
    # apt-get install pvm-novalink
    Reading package lists... Done
    Building dependency tree
    Reading state information... Done
    The following packages were automatically installed and are no longer required:
      docutils-common libpaper-utils libpaper1 python-docutils python-roman
    Use 'apt autoremove' to remove them.
    The following additional packages will be installed:
      pvm-core pvm-rest-app pvm-rest-server pypowervm
    The following packages will be upgraded:
      pvm-core pvm-novalink pvm-rest-app pvm-rest-server pypowervm
    5 upgraded, 0 newly installed, 0 to remove and 1 not upgraded.
    Need to get 287 MB of archives.
    After this operation, 203 kB of additional disk space will be used.
    Do you want to continue? [Y/n] Y
    [..]
    

    After the installation, here is what you should have if everything was updated properly:

    # dpkg -l | grep rsct
    ii  rsct.basic                                3.2.1.4-16154-1ubuntu1                  ppc64el      Reliable Scalable Cluster Technology - Basic
    ii  rsct.core                                 3.2.1.4-16154-1ubuntu1                  ppc64el      Reliable Scalable Cluster Technology - Core
    ii  rsct.core.utils                           3.2.1.4-16154-1ubuntu1                  ppc64el      Reliable Scalable Cluster Technology - Utilities
    # dpkg -l | grep pvm
    ii  pvm-cli                                   1.0.0.3-160516-1488                     all          Power VM Command Line Interface
    ii  pvm-core                                  1.0.0.3.1-160713-2441                   ppc64el      PVM core runtime package
    ii  pvm-novalink                              1.0.0.3.1-160714-1152                   ppc64el      Meta package for all PowerVM Novalink packages
    ii  pvm-rest-app                              1.0.0.3.1-160713-2417                   ppc64el      The PowerVM NovaLink REST API Application
    ii  pvm-rest-server                           1.0.0.3.1-160713-2417                   ppc64el      Holds the basic installation of the REST WebServer (Websphere Liberty Profile) for PowerVM NovaLink
    

    Novalink post-installation (my ansible way to do that)

    You all know by now that I'm not very fond of doing the same things over and over again, that's why I have created an ansible post-install playbook especially for the NovaLink post installation. You can download it here: nova_ansible. Then install ansible on a host that has ssh access to all your NovaLink partitions and run the ansible playbook:

    • Untar the ansible playbook:
    • # mkdir /srv/ansible
      # cd /srv/ansible
      # tar xvf novalink_ansible.tar 
      
    • Modify the group_vars/novalink.yml to fit your environment:
    • # cat group_vars/novalink.yml
      ntpservers:
        - ntpserver1
        - ntpserver2
      dnsservers:
        - 8.8.8.8
        - 8.8.9.9
      dnssearch:
        - lab.chmod666.org
      vepa_iface: ibmveth6
      repo: nimserver
      
    • Share the root ssh key with the NovaLink hosts (be careful: by default NovaLink does not allow root login, you have to modify the sshd configuration file).
    • Put all your Novalink hosts into the inventory file:
    • # cat inventories/hosts.novalink
      [novalink]
      nova65a0cab
      nova65ff4cd
      nova10094ef
      nova06960ab
      
    • Run ansible-playbook and you’re done:
    • # ansible-playbook -i inventories/hosts.novalink site.yml
      

      (images: ansible1, ansible2, ansible3)
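
    Before running the playbook for real against every partition, a quick connectivity test and a dry run on a single host are cheap insurance. This is only a sketch using standard ansible options: nova0696010 is just one host of my inventory, and --check is only meaningful for the tasks that support check mode:

    # ansible -i inventories/hosts.novalink novalink -m ping
    # ansible-playbook -i inventories/hosts.novalink site.yml --check --limit nova0696010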

    More details about NovaLink

    MGMTSWITCH vswitch automatic creation

    Do not try to create the MGMTSWITCH by yourself, the NovaLink installer does it for you. As my Virtual I/O Servers are installed using the IBM Provisioning Toolkit for PowerVM … I was creating the MGMTSWITCH at that time, but I was wrong. You can see this in the file /var/log/pvm-install/pvminstall.log on the NovaLink partition:

    # cat /var/log/pvm-install/pvminstall.log
    Fri Aug 12 17:26:07 UTC 2016: PVMDebug = 0
    Fri Aug 12 17:26:07 UTC 2016: Running initEnv
    [..]
    Fri Aug 12 17:27:08 UTC 2016: Using user provided pvm-install configuration file
    Fri Aug 12 17:27:08 UTC 2016: Auto Install set
    [..]
    Fri Aug 12 17:27:44 UTC 2016: Auto Install = 1
    Fri Aug 12 17:27:44 UTC 2016: Validating configuration file
    Fri Aug 12 17:27:44 UTC 2016: Initializing private network configuration
    Fri Aug 12 17:27:45 UTC 2016: Running /opt/ibm/pvm-install/bin/switchnetworkcfg -o c
    Fri Aug 12 17:27:46 UTC 2016: Running /opt/ibm/pvm-install/bin/switchnetworkcfg -o n -i 3 -n MGMTSWITCH -p 4094 -t 1
    Fri Aug 12 17:27:49 UTC 2016: Start setupinstalldisk operation for /dev/mapper/mpatha
    Fri Aug 12 17:27:49 UTC 2016: Running updatedebconf
    Fri Aug 12 17:56:06 UTC 2016: Pre-seeding disk recipe
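
    If you want to double-check the result after the installation, the virtual switches can be listed from the NovaLink partition itself. A sketch from memory: I'm assuming the vswitch object type is available to pvmctl on your level of code, run pvmctl --help if in doubt:

    # pvmctl vswitch list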
    

    NPIV lpar creation problem !

    As you know my environment is crazy. Every lpar we create has 4 virtual fibre channel adapters: obviously two on fabric A and two on fabric B, and obviously again each fabric must be present on each Virtual I/O Server. So to sum up, an lpar must have access to fabrics A and B using VIOS1 and to fabrics A and B using VIOS2. Unfortunately there was a little bug in the current NovaLink (1.0.0.3) code and all the lpars were created with only two adapters. The PowerVC team gave me a patch to handle this particular issue by patching the npiv.py file. This patch needs to be installed on the NovaLink partition itself:

    # cd /usr/lib/python2.7/dist-packages/powervc_nova/virt/ibmpowervm/pvm/volume
    # sdiff npiv.py.back npiv.bck
    

    (image: npivpb)

    I'm intentionally not giving you the solution here (by just copying/pasting code) because an APAR (IT16534) has been opened for this issue and it is resolved in version 1.3.1.2.

    From NovaLink to HMC …. and the opposite

    One of the challenges for me was to be sure everything was working ok regarding LPM and NovaLink. So I decided to test different cases:

    • From NovaLink host to NovaLink host (didn't have any trouble) :-)
    • From NovaLink host to HMC host (didn't have any trouble) :-)
    • From HMC host to NovaLink host (had trouble) :-(

    Once again this issue preventing HMC to NovaLink LPM from working correctly is related to storage. A patch is ongoing, but let me explain the issue a little bit (only if you absolutely have to move an LPAR from an HMC host to a NovaLink host and you are in the same case as I am):

    PowerVC is not doing the mapping to the destination Virtual I/O Servers correctly and is trying to map fabric A two times on VIOS1 and fabric B two times on VIOS2. Fortunately for us, you can do the migration by hand:

    • Do the LPM operation from PowerVC and check on the HMC side how PowerVC is doing the mapping (log on the HMC to check this):
    • #  lssvcevents -t console -d 0 | grep powervc_admin | grep migrlpar
      time=08/31/2016 18:53:27,"text=HSCE2124 User name powervc_admin: migrlpar -m 9119-MME-656C38A -t 9119-MME-65A0C31 --id 18 --ip 10.22.33.198 -u wlp -i ""virtual_fc_mappings=6/vios1/2//fcs2,3/vios2/1//fcs2,4/vios2/1//fcs1,5/vios1/2//fcs1"",shared_proc_pool_id=0 -o m command failed."
      
    • One interesting point you can see here is that the NovaLink user used for LPM is not padmin but wlp. Have a look on the NovaLink machine if you are a little bit curious:
    • (screenshot 18)

    • If you double-check the mapping you'll see that PowerVC is mixing up the VIOS. Just rerun the command with the mappings in the right order and you'll be able to do HMC to NovaLink LPM (by the way, PowerVC automatically detects that the host has changed for this lpar (moved outside of PowerVC)):
    • # migrlpar -m 9119-MME-656C38A -t 9119-MME-65A0C31 --id 18 --ip 10.22.33.198 -u wlp -i '"virtual_fc_mappings=6/vios2/1//fcs2,3/vios1/2//fcs2,4/vios2/1//fcs1,5/vios1/2//fcs1"',shared_proc_pool_id=0 -o m
      # lssvcevents -t console -d 0 | grep powervc_admin | grep migrlpar
      time=08/31/2016 19:13:00,"text=HSCE2123 User name powervc_admin: migrlpar -m 9119-MME-656C38A -t 9119-MME-65A0C31 --id 18 --ip 10.22.33.198 -u wlp -i ""virtual_fc_mappings=6/vios2/1//fcs2,3/vios1/2//fcs2,4/vios2/1//fcs1,5/vios1/2//fcs1"",shared_proc_pool_id=0 -o m command was executed successfully."
      
    (image: hmctonova)

    One more time, don't worry about this issue, a patch is on the way. But I thought it was interesting to talk about it just to show you how PowerVC handles this (user, key sharing, checks on the HMC).

    Deep dive into the initrd

    I am curious and there is no way to change this. As I wanted to know how the NovaLink installer works, I had to look into the netboot_initrd.gz file. There is a lot of interesting stuff to check in this initrd. Run the commands below on a Linux partition if you also want to have a look:

    # scp nimdy:/export/nim/lpp_source/powervc/novalink/1.0.0.3/iso/install/netboot_initrd.gz .
    # gunzip netboot_initrd
    # cpio -i < netboot_initrd
    185892 blocks
    

    The installer is located in opt/ibm/pvm-install:

    # ls opt/ibm/pvm-install/data/
    40mirror.pvm  debpkgs.txt  license.txt  nimclient.info  pvm-install-config.template  pvm-install-preseed.cfg  rsct-gpg-pub.key  vios_diagram.txt
    # ls opt/ibm/pvm-install/bin
    assignio.py        envsetup        installpvm                    monitor        postProcessing    pvmwizardmain.py  restore.py        switchnetworkcfg  vios
    cfgviosnetwork.py  functions       installPVMPartitionWizard.py  network        procmem           recovery          setupinstalldisk  updatedebconf     vioscfg
    chviospasswd       getnetworkinfo  ioadapter                     networkbridge  pvmconfigdata.py  removemem         setupviosinstall  updatenimsetup    welcome.py
    editpvmconfig      initEnv         mirror                        nimscript      pvmtime           resetsystem       summary.py        user              wizpkg
    

    You can for instance check what the installer is exactly doing. Let's take again the example of the MGMTSWITCH creation; you can see in the output below that I was right about it:

    (image: initrd1)

    Remember that I told you before that I had a problem with the installation on NPIV. You can avoid installing NovaLink two times by modifying the debian-installer preseed directly in the initrd, adding a line in the file opt/ibm/pvm-install/data/pvm-install-preseed.cfg (you have to rebuild the initrd after doing this):

    # grep bootdev opt/ibm/pvm-install/data/pvm-install-preseed.cfg
    d-i grub-installer/bootdev string /dev/mapper/mpatha
    # find | cpio -H newc -o > ../new_initrd_file
    # gzip -9 ../new_initrd_file
    # scp ../new_initrd_file.gz nimdy:/export/nim/lpp_source/powervc/novalink/1.0.0.3/iso/install/netboot_initrd.gz
    

    You can also find good examples of pvmctl commands here:

    # grep -R pvmctl *
    pvmctl lv create --size $LV_SIZE --name $LV_NAME -p id=$vid
    pvmctl scsi create --type lv --vg name=rootvg --lpar id=1 -p id=$vid --stor-id name=$LV_NAME
    

    Troubleshooting

    NovaLink is not PowerVC, so here is a little reminder of what I do to troubleshoot NovaLink (a one-liner to watch all these logs at once is sketched right after the list):

    • Installation troubleshooting:
    • # cat /var/log/pvm-install/pvminstall.log
      
    • Neutron Agent log (always double check this one):
    • # cat /var/log/neutron/neutron-powervc-pvm-sea-agent.log
      
    • Nova logs for this host are not accessible on the PowerVC management host anymore, so check it on the NovaLink partition if needed:
    • # cat /var/log/nova/nova-compute.log
      
    • pvmctl logs:
    • # cat /var/log/pvm/pvmctl.log
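
    When I really don't know where to look, I just watch all of these logs at once. A sketch: GNU tail on the NovaLink partition accepts several files, and the paths are the ones listed above:

    # tail -f /var/log/pvm-install/pvminstall.log /var/log/neutron/neutron-powervc-pvm-sea-agent.log /var/log/nova/nova-compute.log /var/log/pvm/pvmctl.log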
      

    One last thing to add about NovaLink. One thing I like a lot is that NovaLink makes hourly/daily backups of the system and of the VIOS. These backups are stored in /var/backups/pvm:

    # crontab -l
    # VIOS hourly backups - at 15 past every hour except for midnight
    15 1-23 * * * /usr/sbin/pvm-backup --type vios --frequency hourly
    # Hypervisor hourly backups - at 15 past every hour except for midnight
    15 1-23 * * * /usr/sbin/pvm-backup --type system --frequency hourly
    # VIOS daily backups - at 15 past midnight
    15 0    * * * /usr/sbin/pvm-backup --type vios --frequency daily
    # Hypervisor daily backups - at 15 past midnight
    15 0    * * * /usr/sbin/pvm-backup --type system --frequency daily
    # ls -l /var/backups/pvm
    total 4
    drwxr-xr-x 2 root pvm_admin 4096 Sep  9 00:15 9119-MME*0265FF47B
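
    The same pvm-backup command can also be run by hand, which I find handy before any risky change. A sketch re-using the exact flags from the cron entries above:

    # /usr/sbin/pvm-backup --type system --frequency daily
    # /usr/sbin/pvm-backup --type vios --frequency daily
    # ls -l /var/backups/pvm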
    

    More PowerVC tips and tricks

    Let's finish this blog post with more PowerVC tips and tricks. Before giving you the tricks I have to warn you: none of these tricks are supported by PowerVC, use them at your own risk OR contact your support before doing anything. You may break and destroy everything if you are not aware of what you are doing. So please be very careful when using these tricks. YOU HAVE BEEN WARNED !!!!!!

    Accessing and querying the database

    This first trick is funny and will allow you to query and modify the PowerVC database. Once again, do this at your own risk. One of the issues I had was strange. I do not remember how it happened exactly, but some of my luns that were not attached to any host were still showing an attachment count equal to 1, and I didn't have the possibility to remove them. Even worse, someone had deleted these luns on the SVC side. So these luns were what I call "ghost luns": non-existing but non-deletable luns (I also had to remove the storage provider related to these luns). The only way to fix this was to change the state to detached directly in the cinder database. Be careful, this trick only works with MariaDB.

    First get the database password. Get the encrypted password from /opt/ibm/powervc/data/powervc-db.conf file and decode it to have the clear password:

    # grep ^db_password /opt/ibm/powervc/data/powervc-db.conf
    db_password = aes-ctr:NjM2ODM5MjM0NTAzMTg4MzQzNzrQZWi+mrUC+HYj9Mxi5fQp1XyCXA==
    # python -c "from powervc_keystone.encrypthandler import EncryptHandler; print EncryptHandler().decode('aes-ctr:NjM2ODM5MjM0NTAzMTg4MzQzNzrQZWi+mrUC+HYj9Mxi5fQp1XyCXA==')"
    OhnhBBS_gvbCcqHVfx2N
    # mysql -u root -p cinder
    Enter password:
    MariaDB [cinder]> show tables;
    +----------------------------+
    | Tables_in_cinder           |
    +----------------------------+
    | backups                    |
    | cgsnapshots                |
    | consistencygroups          |
    | driver_initiator_data      |
    | encryption                 |
    [..]
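
    Before changing anything in there, my advice is to dump the databases you are about to touch so you can roll back if needed. A sketch: mysqldump ships with MariaDB and uses the same root password you just decoded:

    # mysqldump -u root -p cinder > /tmp/cinder_before_changes.sql
    # mysqldump -u root -p nova > /tmp/nova_before_changes.sql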
    

    Then get the lun uuid from the PowerVC GUI for the lun you want to change, and follow the commands below:

    (image: dummy)

    MariaDB [cinder]> select * from volume_attachment where volume_id='9cf6d85a-3edd-4ab7-b797-577ff6566f78' \G
    *************************** 1. row ***************************
       created_at: 2016-05-26 08:52:51
       updated_at: 2016-05-26 08:54:23
       deleted_at: 2016-05-26 08:54:23
          deleted: 1
               id: ce4238b5-ea39-4ce1-9ae7-6e305dd506b1
        volume_id: 9cf6d85a-3edd-4ab7-b797-577ff6566f78
    attached_host: NULL
    instance_uuid: 44c7a72c-610c-4af1-a3ed-9476746841ab
       mountpoint: /dev/sdb
      attach_time: 2016-05-26 08:52:51
      detach_time: 2016-05-26 08:54:23
      attach_mode: rw
    attach_status: attached
    1 row in set (0.01 sec)
    MariaDB [cinder]> select * from volumes where id='9cf6d85a-3edd-4ab7-b797-577ff6566f78' \G
    *************************** 1. row ***************************
                     created_at: 2016-05-26 08:51:57
                     updated_at: 2016-05-26 08:54:23
                     deleted_at: NULL
                        deleted: 0
                             id: 9cf6d85a-3edd-4ab7-b797-577ff6566f78
                         ec2_id: NULL
                        user_id: 0688b01e6439ca32d698d20789d52169126fb41fb1a4ddafcebb97d854e836c9
                     project_id: 1471acf124a0479c8d525aa79b2582d0
                           host: pb01_mn_svc_qual
                           size: 1
              availability_zone: nova
                         status: available
                  attach_status: attached
                   scheduled_at: 2016-05-26 08:51:57
                    launched_at: 2016-05-26 08:51:59
                  terminated_at: NULL
                   display_name: dummy
            display_description: NULL
              provider_location: NULL
                  provider_auth: NULL
                    snapshot_id: NULL
                 volume_type_id: e49e9cc3-efc3-4e7e-bcb9-0291ad28df42
                   source_volid: NULL
                       bootable: 0
              provider_geometry: NULL
                       _name_id: NULL
              encryption_key_id: NULL
               migration_status: NULL
             replication_status: disabled
    replication_extended_status: NULL
        replication_driver_data: NULL
            consistencygroup_id: NULL
                    provider_id: NULL
                    multiattach: 0
                previous_status: NULL
    1 row in set (0.00 sec)
    MariaDB [cinder]> update volume_attachment set attach_status='detached' where volume_id='9cf6d85a-3edd-4ab7-b797-577ff6566f78';
    Query OK, 1 row affected (0.00 sec)
    Rows matched: 1  Changed: 1  Warnings: 0
    MariaDB [cinder]> update volumes set attach_status='detached' where id='9cf6d85a-3edd-4ab7-b797-577ff6566f78';
    Query OK, 1 row affected (0.00 sec)
    Rows matched: 1  Changed: 1  Warnings: 0
    

    The second issue I had was about having some machines in deleted state, but the reality was that the HMC had just rebooted and for an unknown reason these machines were seen as 'deleted' .. but they were not. Using this trick I was able to force a re-evaluation of each machine in this case:

    #  mysql -u root -p nova
    Enter password:
    MariaDB [nova]> select * from instance_health_status where health_state='WARNING';
    +---------------------+---------------------+------------+---------+--------------------------------------+--------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------+
    | created_at          | updated_at          | deleted_at | deleted | id                                   | health_state | reason                                                                                                                                                                                                                | unknown_reason_details |
    +---------------------+---------------------+------------+---------+--------------------------------------+--------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------+
    | 2016-07-11 08:58:37 | NULL                | NULL       |       0 | 1af1805c-bb59-4bc9-8b6d-adeaeb4250f3 | WARNING      | [{"resource_local": "server", "display_name": "p00ww6754398", "resource_property_key": "rmc_state", "resource_property_value": "initializing", "resource_id": "1af1805c-bb59-4bc9-8b6d-adeaeb4250f3"}]                |                        |
    | 2015-07-31 16:53:50 | 2015-07-31 18:49:50 | NULL       |       0 | 2668e808-10a1-425f-a272-6b052584557d | WARNING      | [{"resource_local": "server", "display_name": "multi-vol", "resource_property_key": "vm_state", "resource_property_value": "deleted", "resource_id": "2668e808-10a1-425f-a272-6b052584557d"}]                         |                        |
    | 2015-08-03 11:22:38 | 2015-08-03 15:47:41 | NULL       |       0 | 2934fb36-5d91-48cd-96de-8c16459c50f3 | WARNING      | [{"resource_local": "server", "display_name": "clouddev-test-754df319-00000038", "resource_property_key": "rmc_state", "resource_property_value": "inactive", "resource_id": "2934fb36-5d91-48cd-96de-8c16459c50f3"}] |                        |
    | 2016-07-11 09:03:59 | NULL                | NULL       |       0 | 3fc42502-856b-46a5-9c36-3d0864d6aa4c | WARNING      | [{"resource_local": "server", "display_name": "p00ww3254401", "resource_property_key": "rmc_state", "resource_property_value": "initializing", "resource_id": "3fc42502-856b-46a5-9c36-3d0864d6aa4c"}]                |                        |
    | 2015-07-08 20:11:48 | 2015-07-08 20:14:09 | NULL       |       0 | 54d02c60-bd0e-4f34-9cb6-9c0a0b366873 | WARNING      | [{"resource_local": "server", "display_name": "p00wb3740870", "resource_property_key": "rmc_state", "resource_property_value": "inactive", "resource_id": "54d02c60-bd0e-4f34-9cb6-9c0a0b366873"}]                    |                        |
    | 2015-07-31 17:44:16 | 2015-07-31 18:49:50 | NULL       |       0 | d5ec2a9c-221b-44c0-8573-d8e3695a8dd7 | WARNING      | [{"resource_local": "server", "display_name": "multi-vol-sp5", "resource_property_key": "vm_state", "resource_property_value": "deleted", "resource_id": "d5ec2a9c-221b-44c0-8573-d8e3695a8dd7"}]                     |                        |
    +---------------------+---------------------+------------+---------+--------------------------------------+--------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------+
    6 rows in set (0.00 sec)
    MariaDB [nova]> update instance_health_status set health_state='PENDING',reason='' where health_state='WARNING';
    Query OK, 6 rows affected (0.00 sec)
    Rows matched: 6  Changed: 6  Warnings: 0
    

    (image: pending)

    The ceilometer issue

    When updating from PowerVC 1.3.0.1 to 1.3.1.1, PowerVC changes the database backend from DB2 to MariaDB. This is a good thing, but the way the update is done is by exporting all the data to flat files and then re-inserting it in the MariaDB database record per record. I had a huge problem because of this, just because my ceilometer database was huge due to the number of machines I have and the number of operations we have run on PowerVC since it went into production. The DB insert took more than 3 days and never finished. If you don't need the ceilometer data, my advice is to change the retention from the default of 270 days to 2 hours:

    # powervc-config metering event_ttl --set 2 --unit hr 
    # ceilometer-expirer --config-file /etc/ceilometer/ceilometer.conf
    

    If this is not enough and you are still experiencing problems with the update, the best way is to flush the entire ceilometer database before the update:

    # /opt/ibm/powervc/bin/powervc-services stop
    # /opt/ibm/powervc/bin/powervc-services db2 start
    # /bin/su - pwrvcdb -c "db2 drop database ceilodb2"
    # /bin/su - pwrvcdb -c "db2 CREATE DATABASE ceilodb2 AUTOMATIC STORAGE YES ON /home/pwrvcdb DBPATH ON /home/pwrvcdb USING CODESET UTF-8 TERRITORY US COLLATE USING SYSTEM PAGESIZE 16384 RESTRICTIVE"
    # /bin/su - pwrvcdb -c "db2 connect to ceilodb2 ; db2 grant dbadm on database to user ceilometer"
    # /opt/ibm/powervc/bin/powervc-dbsync ceilometer
    # /bin/su - pwrvcdb -c "db2 connect TO ceilodb2; db2 CALL GET_DBSIZE_INFO '(?, ?, ?, 0)' > /tmp/ceilodb2_db_size.out; db2 terminate" > /dev/null
    

    Multi tenancy ... how to deal with a huge environment

    As my environment is growing bigger and bigger, I faced a couple of people trying to force me to multiply the number of PowerVC machines we have. As Openstack is a solution designed to handle both density and scalability, I said that doing this is just non-sense. Seriously, people who still believe in this have not understood anything about the cloud, Openstack and PowerVC. Fortunately, we found a solution acceptable to everybody. As we have created what we are calling "building-blocks", we had to find a way to isolate one "block" from another. The solution for host isolation is called multi tenancy isolation. For the storage side we are just going to play with quotas. By doing this a user will be able to manage a couple of hosts and the associated storage (storage templates) without having the right to do anything on the others:

    (image: multitenancyisolation)

    Before doing anything create the tenant (or project) and a user associated with it:

    # cat /opt/ibm/powervc/version.properties | grep cloud_enabled
    cloud_enabled = yes
    # cat ~/powervcrc
    export OS_USERNAME=root
    export OS_PASSWORD=root
    export OS_TENANT_NAME=ibm-default
    export OS_AUTH_URL=https://powervc.lab.chmod666.org:5000/v3/
    export OS_IDENTITY_API_VERSION=3
    export OS_CACERT=/etc/pki/tls/certs/powervc.crt
    export OS_REGION_NAME=RegionOne
    export OS_USER_DOMAIN_NAME=Default
    export OS_PROJECT_DOMAIN_NAME=Default
    export OS_COMPUTE_API_VERSION=2.25
    export OS_NETWORK_API_VERSION=2.0
    export OS_IMAGE_API_VERSION=2
    export OS_VOLUME_API_VERSION=2
    # source powervcrc
    # openstack project create hb01
    +-------------+----------------------------------+
    | Field       | Value                            |
    +-------------+----------------------------------+
    | description |                                  |
    | domain_id   | default                          |
    | enabled     | True                             |
    | id          | 90d064b4abea4339acd32a8b6a8b1fdf |
    | is_domain   | False                            |
    | name        | hb01                             |
    | parent_id   | default                          |
    +-------------+----------------------------------+
    # openstack role list
    +----------------------------------+---------------------+
    | ID                               | Name                |
    +----------------------------------+---------------------+
    | 1a76014f12594214a50c36e6a8e3722c | deployer            |
    | 54616a8b136742098dd81eede8fd5aa8 | vm_manager          |
    | 7bd6de32c14d46f2bd5300530492d4a4 | storage_manager     |
    | 8260b7c3a4c24a38ba6bee8e13ced040 | deployer_restricted |
    | 9b69a55c6b9346e2b317d0806a225621 | image_manager       |
    | bc455ed006154d56ad53cca3a50fa7bd | admin               |
    | c19a43973db148608eb71eb3d86d4735 | service             |
    | cb130e4fa4dc4f41b7bb4f1fdcf79fc2 | self_service        |
    | f1a0c1f9041d4962838ec10671befe33 | vm_user             |
    | f8cf9127468045e891d5867ce8825d30 | viewer              |
    +----------------------------------+---------------------+
    # useradd hb01_admin
    # openstack role add --project hb01 --user hb01_admin admin
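
    To verify that the role was granted as expected, you can list the role assignments for the new user. A sketch: openstack role assignment list is a standard OpenStack client command, and the names are the ones created above:

    # openstack role assignment list --user hb01_admin --project hb01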
    

    Then associate with this tenant each host group (aggregates in Openstack terms) that it is allowed to use, using the filter_tenant_id metadata (you have to put your allowed hosts in a host group to enable this feature). For each allowed host group, add this field to the metadata of the aggregate (first find the tenant id):

    # openstack project list
    +----------------------------------+-------------+
    | ID                               | Name        |
    +----------------------------------+-------------+
    | 1471acf124a0479c8d525aa79b2582d0 | ibm-default |
    | 90d064b4abea4339acd32a8b6a8b1fdf | hb01        |
    | b79b694c70734a80bc561e84a95b313d | powervm     |
    | c8c42d45ef9e4a97b3b55d7451d72591 | service     |
    | f371d1f29c774f2a97f4043932b94080 | project1    |
    +----------------------------------+-------------+
    # openstack aggregate list
    +----+---------------+-------------------+
    | ID | Name          | Availability Zone |
    +----+---------------+-------------------+
    |  1 | Default Group | None              |
    | 21 | aggregate2    | None              |
    | 41 | hg2           | None              |
    | 43 | hb01_mn       | None              |
    | 44 | hb01_me       | None              |
    +----+---------------+-------------------+
    # nova aggregate-set-metadata hb01_mn filter_tenant_id=90d064b4abea4339acd32a8b6a8b1fdf 
    Metadata has been successfully updated for aggregate 43.
    | Id | Name    | Availability Zone | Hosts             | Metadata                                                                                                                                   
    | 43 | hb01_mn | -                 | '9119MME_1009425' | 'dro_enabled=False', 'filter_tenant_id=90d064b4abea4339acd32a8b6a8b1fdf', 'hapolicy-id=1', 'hapolicy-run_interval=1', 'hapolicy-stabilization=1', 'initialpolicy-id=4', 'runtimepolicy-action=migrate_vm_advise_only', 'runtimepolicy-id=5', 'runtimepolicy-max_parallel=10', 'runtimepolicy-run_interval=5', 'runtimepolicy-stabilization=2', 'runtimepolicy-threshold=70' |
    # nova aggregate-set-metadata hb01_me filter_tenant_id=90d064b4abea4339acd32a8b6a8b1fdf 
    Metadata has been successfully updated for aggregate 44.
    | Id | Name    | Availability Zone | Hosts             | Metadata                                                                                                                                   
    | 44 | hb01_me | -                 | '9119MME_0696010' | 'dro_enabled=False', 'filter_tenant_id=90d064b4abea4339acd32a8b6a8b1fdf', 'hapolicy-id=1', 'hapolicy-run_interval=1', 'hapolicy-stabilization=1', 'initialpolicy-id=2', 'runtimepolicy-action=migrate_vm_advise_only', 'runtimepolicy-id=5', 'runtimepolicy-max_parallel=10', 'runtimepolicy-run_interval=5', 'runtimepolicy-stabilization=2', 'runtimepolicy-threshold=70' |
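
    You can double-check the metadata you just set by displaying the aggregate details. A sketch using the same novaclient as above and the aggregate ids returned by the previous outputs:

    # nova aggregate-details 43
    # nova aggregate-details 44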
    

    To make this work, add AggregateMultiTenancyIsolation to the scheduler_default_filters in the nova.conf file and restart the nova services:

    # grep scheduler_default_filter /etc/nova/nova.conf
    scheduler_default_filters = RamFilter,CoreFilter,ComputeFilter,RetryFilter,AvailabilityZoneFilter,ImagePropertiesFilter,ComputeCapabilitiesFilter,MaintenanceFilter,PowerVCServerGroupAffinityFilter,PowerVCServerGroupAntiAffinityFilter,PowerVCHostAggregateFilter,PowerVMNetworkFilter,PowerVMProcCompatModeFilter,PowerLMBSizeFilter,PowerMigrationLicenseFilter,PowerVMMigrationCountFilter,PowerVMStorageFilter,PowerVMIBMiMobilityFilter,PowerVMRemoteRestartFilter,PowerVMRemoteRestartSameHMCFilter,PowerVMEndianFilter,PowerVMGuestCapableFilter,PowerVMSharedProcPoolFilter,PowerVCResizeSameHostFilter,PowerVCDROFilter,PowerVMActiveMemoryExpansionFilter,PowerVMNovaLinkMobilityFilter,AggregateMultiTenancyIsolation
    # powervc-services restart
    

    We are done regarding the hosts.

    Enabling quotas

    To allow one user/tenant to create volumes on only one storage provider, we first need quotas to be enabled; you can check the quota rules in the cinder policy file:

    # grep quota /opt/ibm/powervc/policy/cinder/policy.json
        "volume_extension:quotas:show": "",
        "volume_extension:quotas:update": "rule:admin_only",
        "volume_extension:quotas:delete": "rule:admin_only",
        "volume_extension:quota_classes": "rule:admin_only",
        "volume_extension:quota_classes:validate_setup_for_nested_quota_use": "rule:admin_only",
    

    Then set the volume quota to 0 for every storage template that is not allowed for this tenant, and leave the one you want usable. Easy:

    # cinder --service-type volume type-list
    +--------------------------------------+---------------------------------------------+-------------+-----------+
    |                  ID                  |                     Name                    | Description | Is_Public |
    +--------------------------------------+---------------------------------------------+-------------+-----------+
    | 53434872-a0d2-49ea-9683-15c7940b30e5 |               svc2 base template            |      -      |    True   |
    | e49e9cc3-efc3-4e7e-bcb9-0291ad28df42 |               svc1 base template            |      -      |    True   |
    | f45469d5-df66-44cf-8b60-b226425eee4f |                     svc3                    |      -      |    True   |
    +--------------------------------------+---------------------------------------------+-------------+-----------+
    # cinder --service-type volume quota-update --volumes 0 --volume-type "svc2" 90d064b4abea4339acd32a8b6a8b1fdf
    # cinder --service-type volume quota-update --volumes 0 --volume-type "svc3" 90d064b4abea4339acd32a8b6a8b1fdf
    +-------------------------------------------------------+----------+
    |                        Property                       |  Value   |
    +-------------------------------------------------------+----------+
    |                    backup_gigabytes                   |   1000   |
    |                        backups                        |    10    |
    |                       gigabytes                       | 1000000  |
    |              gigabytes_svc2 base template             | 10000000 |
    |              gigabytes_svc1 base template             | 10000000 |
    |                     gigabytes_svc3                    |    -1    |
    |                  per_volume_gigabytes                 |    -1    |
    |                       snapshots                       |  100000  |
    |             snapshots_svc2 base template              |  100000  |
    |             snapshots_svc1 base template              |  100000  |
    |                     snapshots_svc3                    |    -1    |
    |                        volumes                        |  100000  |
    |            volumes_svc2 base template                 |  100000  |
    |            volumes_svc1 base template                 |    0     |
    |                      volumes_svc3                     |    0     |
    +-------------------------------------------------------+----------+
    # powervc-services stop
    # powervc-services start
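
    To verify what the tenant ends up with, display its quotas. A sketch: cinder quota-show takes the tenant id used above:

    # cinder --service-type volume quota-show 90d064b4abea4339acd32a8b6a8b1fdf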
    

    By doing this you have enabled the isolation between the two tenants. Then use the appropriate user to do the appropriate task.

    PowerVC cinder above the Petabyte

    Now that quotas are enabled, use this command if you want to be able to have more than one petabyte of data managed by PowerVC:

    # cinder --service-type volume quota-class-update --gigabytes -1 default
    # powervc-services stop
    # powervc-services start
    

    PowerVC cinder above 10000 luns

    Change the osapi_max_limit in cinder.conf if you want to go above the 10000 lun limit (check every cinder configuration file; cinder.conf is for the global number of volumes):

    # grep ^osapi_max_limit cinder.conf
    osapi_max_limit = 15000
    # powervc-services stop
    # powervc-services start
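
    To find every cinder configuration file that may carry its own osapi_max_limit, a recursive grep does the job. A sketch: I'm assuming the usual /etc/cinder location for the per storage provider configuration files on the PowerVC management host:

    # grep -R "^osapi_max_limit" /etc/cinder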
    

    Snapshot and consistency group

    There is a new cool feature available with the latest version of PowerVC (1.3.1.2). This feature allows you to create snapshots of volumes (only on SVC and Storwize for the moment). You now have the possibility to create consistency groups (groups of volumes) and create snapshots of these consistency groups, allowing you for instance to make a backup of a volume group directly from Openstack. I'm doing the example below using the command line because I think it is easier to understand with these commands rather than showing you the same thing with the REST API:

    First create a consistency group:

    # cinder --service-type volume type-list
    +--------------------------------------+---------------------------------------------+-------------+-----------+
    |                  ID                  |                     Name                    | Description | Is_Public |
    +--------------------------------------+---------------------------------------------+-------------+-----------+
    | 53434872-a0d2-49ea-9683-15c7940b30e5 |              svc2 base template             |      -      |    True   |
    | 862b0a8e-cab4-400c-afeb-99247838f889 |             p8_ssp base template            |      -      |    True   |
    | e49e9cc3-efc3-4e7e-bcb9-0291ad28df42 |               svc1 base template            |      -      |    True   |
    | f45469d5-df66-44cf-8b60-b226425eee4f |                     svc3                    |      -      |    True   |
    +--------------------------------------+---------------------------------------------+-------------+-----------+
    # cinder --service-type volume consisgroup-create --name foovg_cg "svc1 base template"
    +-------------------+-------------------------------------------+
    |      Property     |                   Value                   |
    +-------------------+-------------------------------------------+
    | availability_zone |                    nova                   |
    |     created_at    |         2016-09-11T21:10:58.000000        |
    |    description    |                    None                   |
    |         id        |    950a5193-827b-49ab-9511-41ba120c9ebd   |
    |        name       |                  foovg_cg                 |
    |       status      |                  creating                 |
    |    volume_types   | [u'e49e9cc3-efc3-4e7e-bcb9-0291ad28df42'] |
    +-------------------+-------------------------------------------+
    # cinder --service-type volume consisgroup-list
    +--------------------------------------+-----------+----------+
    |                  ID                  |   Status  |   Name   |
    +--------------------------------------+-----------+----------+
    | 950a5193-827b-49ab-9511-41ba120c9ebd | available | foovg_cg |
    +--------------------------------------+-----------+----------+
    

    Create volumes in this consistency group:

    # cinder --service-type volume create --volume-type "svc1 base template" --name foovg_vol1 --consisgroup-id 950a5193-827b-49ab-9511-41ba120c9ebd 200
    # cinder --service-type volume create --volume-type "svc1 base template" --name foovg_vol2 --consisgroup-id 950a5193-827b-49ab-9511-41ba120c9ebd 200
    +------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
    |           Property           |                                                                          Value                                                                           |
    +------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
    |         attachments          |                                                                            []                                                                            |
    |      availability_zone       |                                                                           nova                                                                           |
    |           bootable           |                                                                          false                                                                           |
    |     consistencygroup_id      |                                                           950a5193-827b-49ab-9511-41ba120c9ebd                                                           |
    |          created_at          |                                                                2016-09-11T21:23:02.000000                                                                |
    |         description          |                                                                           None                                                                           |
    |          encrypted           |                                                                          False                                                                           |
    |        health_status         | {u'health_value': u'PENDING', u'id': u'8d078772-00b5-45fc-89c8-82c63e2c48ed', u'value_reason': u'PENDING', u'updated_at': u'2016-09-11T21:23:02.669372'} |
    |              id              |                                                           8d078772-00b5-45fc-89c8-82c63e2c48ed                                                           |
    |           metadata           |                                                                            {}                                                                            |
    |       migration_status       |                                                                           None                                                                           |
    |         multiattach          |                                                                          False                                                                           |
    |             name             |                                                                        foovg_vol2                                                                        |
    |    os-vol-host-attr:host     |                                                                           None                                                                           |
    | os-vol-tenant-attr:tenant_id |                                                             1471acf124a0479c8d525aa79b2582d0                                                             |
    |      replication_status      |                                                                         disabled                                                                         |
    |             size             |                                                                           200                                                                            |
    |         snapshot_id          |                                                                           None                                                                           |
    |         source_volid         |                                                                           None                                                                           |
    |            status            |                                                                         creating                                                                         |
    |          updated_at          |                                                                           None                                                                           |
    |           user_id            |                                             0688b01e6439ca32d698d20789d52169126fb41fb1a4ddafcebb97d854e836c9                                             |
    |         volume_type          |                                                                   svc1 base template                                                                     |
    +------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
    

    You're now able to attach these two volumes to a machine from the PowerVC GUI:

    consist
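
    If you prefer the command line to the GUI, the same attachment should also be possible with the standard OpenStack nova client. This is only a sketch, assuming python-novaclient is pointed at the PowerVC endpoint; the first volume id is the foovg_vol2 id shown above, while the server name and the second id are placeholders you have to replace with your own values:

    # nova volume-attach <vm name or uuid> 8d078772-00b5-45fc-89c8-82c63e2c48ed
    # nova volume-attach <vm name or uuid> <id of foovg_vol1>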

    # lsmpio -q
    Device           Vendor Id  Product Id       Size    Volume Name
    ------------------------------------------------------------------------------
    hdisk0           IBM        2145                 64G volume-aix72-44c7a72c-000000e0-
    hdisk1           IBM        2145                100G volume-snap1-dab0e2d1-130a
    hdisk2           IBM        2145                100G volume-snap2-5e863fdb-ab8c
    hdisk3           IBM        2145                200G volume-foovg_vol1-3ba0ff59-acd8
    hdisk4           IBM        2145                200G volume-foovg_vol2-8d078772-00b5
    # cfgmgr
    # lspv
    hdisk0          00c8b2add70d7db0                    rootvg          active
    hdisk1          00f9c9f51afe960e                    None
    hdisk2          00f9c9f51afe9698                    None
    hdisk3          none                                None
    hdisk4          none                                None
    

    Then you can create a snapshot of these two volumes (a consistency group snapshot). It's that easy :-) :

    # cinder --service-type volume cgsnapshot-create 950a5193-827b-49ab-9511-41ba120c9ebd
    +---------------------+--------------------------------------+
    |       Property      |                Value                 |
    +---------------------+--------------------------------------+
    | consistencygroup_id | 950a5193-827b-49ab-9511-41ba120c9ebd |
    |      created_at     |      2016-09-11T21:31:12.000000      |
    |     description     |                 None                 |
    |          id         | 20e2ce6b-9c4a-4eea-b05d-f0b0b6e4768f |
    |         name        |                 None                 |
    |        status       |               creating               |
    +---------------------+--------------------------------------+
    # cinder --service-type volume cgsnapshot-list
    +--------------------------------------+-----------+------+
    |                  ID                  |   Status  | Name |
    +--------------------------------------+-----------+------+
    | 20e2ce6b-9c4a-4eea-b05d-f0b0b6e4768f | available |  -   |
    +--------------------------------------+-----------+------+
    # cinder --service-type volume cgsnapshot-show 20e2ce6b-9c4a-4eea-b05d-f0b0b6e4768f
    +---------------------+--------------------------------------+
    |       Property      |                Value                 |
    +---------------------+--------------------------------------+
    | consistencygroup_id | 950a5193-827b-49ab-9511-41ba120c9ebd |
    |      created_at     |      2016-09-11T21:31:12.000000      |
    |     description     |                 None                 |
    |          id         | 20e2ce6b-9c4a-4eea-b05d-f0b0b6e4768f |
    |         name        |                 None                 |
    |        status       |              available               |
    +---------------------+--------------------------------------+
    

    cgsnap

    Conclusion

    Please keep in mind that the content of this blog post comes from real life and production examples. I hope it helps you better understand that scalability, density, fast deployment, snapshots and multi-tenancy are features that are absolutely needed in the AIX world. As you can see the PowerVC team is moving fast, probably faster than any customer I have ever seen, and I must admit they are right: doing this is the only way to face the Linux x86 offering. And I must confess it is damn fun to work on these things. I'm so happy to have the best of two worlds, AIX/PowerSystems and Openstack. This is the only direction we can take if we want AIX to survive. So please stop being scared of, or unconvinced by, these solutions: they are damn good and production ready. Face and embrace the future and stop looking at the past. As always I hope it helps.

    Continuous integration for your Chef AIX cookbooks (using PowerVC, Jenkins, test-kitchen and gitlab)

    My journey to integrate Chef on AIX is still going on and I'm working more than ever on these topics. I know that using such tools is not something widely adopted by AIX customers, but what I also know is that, whatever happens, you will in a near -or distant- future use an automation tool. These tools are so widely used in the Linux world that you just can't ignore them. The way you were managing your AIX ten years ago is not the way you are managing it today, and what you do today will not be what you'll do in the future. The AIX world needs a facelift to survive; a huge step has already been done (and is still ongoing) with PowerVC thanks to a fantastic team composed of very smart people at IBM (@amarteyp, @drewthorst, @jwcroppe, and all the other persons in this team!). The AIX world is now compatible with Openstack, and with this other things are coming ... such as automation. When all of these things are ready we will be able to offer on AIX something comparable to Linux. Openstack and automation are the first bricks of what we call today "devops" (to be more specific, the ops part of the devops word).

    I will today focus on how to manage your AIX machines using Chef. By the word "how" I mean the best practices and the infrastructure to build to start using Chef on AIX. If you remember my session about Chef on AIX at the IBM Technical University in Cannes, I was saying that by using Chef your infrastructure becomes testable, repeatable and versionable. We will focus in this blog post on how to do that. To test your AIX Chef cookbooks you will need to understand the test-kitchen (we will use the test-kitchen to drive PowerVC to build virtual machines on the fly and run the Chef recipes on them). To repeat this over and over and be sure everything keeps working (code review, checking that your cookbook converges) without having to do anything by hand, we will use Jenkins to automate these tests. Then, to version your cookbook development, we will use gitlab.

    To better understand why I'm doing such a thing there is nothing better than a concrete example. My goal is to do all my AIX post-installation tasks using Chef (motd configuration, dns, device attributes, fileset installation, enabling services ... everything that you are doing today using korn shell scripts). Who has never seen someone change one of these scripts (most of the time without warning the other members of the team), resulting in a syntax error and then in an outage for all your new builds? Doing this is manageable if you are in a little team creating one machine per month, but it is inconceivable in an environment driven by PowerVC where sysadmins are not doing anything "by hand". In such an environment, if someone makes this kind of error all the new builds are failing ... even worse, you'll probably not be aware of it until someone connecting to the machine reports the error (most of the time the final customer). By using continuous integration your AIX build will be tested at every change, all these changes will be stored in a git repository and, even better, you will not be able to put a change in production without passing all these tests. Even if people using PowerVC today have the strongest incentive to work this way, people who are not can still do the same thing. By doing that you'll have a clean and proper AIX build (post-install) and no errors will be possible anymore, so I highly encourage you to do this even if you are not adopting the Openstack way or even if today you don't see the benefits. In the future this effort will pay off. Trust me.

    The test-kitchen

    What is the kitchen

    The test-kitchen is a tool that allows you to run your AIX Chef cookbooks and recipes in a quick way without having to do manual tasks. During the development of your recipes, if you don't use the test-kitchen you'll have many tasks to do manually: build a virtual machine, install the chef-client, copy the cookbook and the recipes, run them, check everything is in the state that you want. Imagine doing that on different AIX versions (6.1, 7.1, 7.2) every time you change something in your post-installation recipes (I was doing that before and I can assure you that creating and destroying machines over and over is just a waste of time). The test-kitchen is here to do the job for you. It will build the machine (using the PowerVC kitchen driver), install the chef-client (using an omnibus server), copy the content of your cookbook (the files), run a bunch of recipes (described in what we call suites) and then test it (using bats or serverspec). You can configure your kitchen to test different kinds of images (6.1, 7.1, 7.2) and different suites (cookbooks, recipes) depending on the environment you want to test. By default the test-kitchen uses a tool called Vagrant to build the VMs. Obviously Vagrant is not able to build an AIX machine, that's why we will use a modified version of the kitchen-openstack driver (modified by myself) called kitchen-powervc to build the virtual machines:

    Installing the kitchen and the PowerVC driver

    If you have access to an enterprise proxy you can directly download and install the gem files from your host (in my case this is a Linux on Power box ... so Linux on Power is working great for this).

    • Install the test kitchen :
    • # gem install --http-proxy http://bcreau:mypasswd@proxy:8080 test-kitchen
      Successfully installed test-kitchen-1.7.2
      Parsing documentation for test-kitchen-1.7.2
      1 gem installed
      
    • Install kitchen-powervc :
    • # gem install --http-proxy http://bcreau:mypasswd@proxy:8080 kitchen-powervc
      Successfully installed kitchen-powervc-0.1.0
      Parsing documentation for kitchen-powervc-0.1.0
      1 gem installed
      
    • Install kitchen-openstack :
    • # gem install --http-proxy http://bcreau:mypasswd@proxy:8080 kitchen-openstack
      Successfully installed kitchen-openstack-3.0.0
      Fetching: fog-core-1.38.0.gem (100%)
      Successfully installed fog-core-1.38.0
      Fetching: fuzzyurl-0.8.0.gem (100%)
      Successfully installed fuzzyurl-0.8.0
      Parsing documentation for kitchen-openstack-3.0.0
      Installing ri documentation for kitchen-openstack-3.0.0
      Parsing documentation for fog-core-1.38.0
      Installing ri documentation for fog-core-1.38.0
      Parsing documentation for fuzzyurl-0.8.0
      Installing ri documentation for fuzzyurl-0.8.0
      3 gems installed
      

    If you don't have access to an enterprise proxy you can still download the gems from home and install them on your work machine:

    # gem install test-kitchen kitchen-powervc kitchen-openstack -i repo --no-ri --no-rdoc
    # # copy the files (repo directory) on your destination machine
    # gem install *.gem
    

    Setup the kitchen (.kitchen.yml file)

    The kitchen configuration file is .kitchen.yml; when you run the kitchen command, the kitchen looks at this file. You have to put it in the chef-repo (where the cookbook directory is; the kitchen will copy the files from the cookbook to the test machine, that's why it's important to put this file at the root of the chef-repo). This file is separated into different sections (a complete example assembling all the sections is shown right after this list):

    • The driver section. In this section you configure how virtual machines are created, in our case how to connect to PowerVC (credentials, region). You also tell in this section which image you want to use (PowerVC images), which flavor (PowerVC template) and which network will be used at VM creation (please note that you can put some driver_config in the platform section, to tell which image or which ip you want to use for each specific platform):
      • name: the name of the driver (here powervc).
      • openstack*: the PowerVC url, user, password, region, domain.
      • image_ref: the name of the image (we will put this in driver_config in the platform section).
      • flavor_ref: the name of the PowerVC template used at the VM creation.
      • fixed_ip: the ip_address used for the virtual machine creation.
      • server_name_prefix: each vm created by the kitchen will be prefixed by this parameter.
      • network_ref: the name of the PowerVC vlan to be used at the machine creation.
      • public_key_path: The kitchen needs to connect to the machine with ssh, you need to provide the public key used.
      • private_key_path: Same but for the private key.
      • username: The ssh username (we will use root, but you can use another user and then tell the kitchen to use sudo)
      • user_data: The activation input used by cloud-init; we put the public key in it to be sure we can access the machine without a password (it's the PowerVC activation input).
      • driver:
          name: powervc
          server_wait: 100
          openstack_username: "root"
          openstack_api_key: "root"
          openstack_auth_url: "https://mypowervc:5000/v3/auth/tokens"
          openstack_region: "RegionOne"
          openstack_project_domain: "Default"
          openstack_user_domain: "Default"
          openstack_project_name: "ibm-default"
          flavor_ref: "mytemplate"
          server_name_prefix: "chefkitchen"
          network_ref: "vlan666"
          public_key_path: "/home/chef/.ssh/id_dsa.pub"
          private_key_path: "/home/chef/.ssh/id_dsa"
          username: "root"
          user_data: userdata.txt
        
        #cloud-config
        ssh_authorized_keys:
          - ssh-dss AAAAB3NzaC1kc3MAAACBAIVZx6Pic+FyUisoNrm6Znxd48DQ/YGNRgsed+fc+yL1BVESyTU5kqnupS8GXG2I0VPMWN7ZiPnbT1Fe2D[..]
        
    • The provisioner section: This section can be used to specify whether you want to use chef-zero or chef-solo as the provisioner. You can also specify an omnibus url (used to download and install the chef-client at machine creation time). In my case the omnibus url is a link to an http server "serving" a script (install.sh) that installs the chef-client fileset for AIX (more details later in the blog post). I'm also setting "sudo" to false as I'll connect with the root user:
    • provisioner:
        name: chef_solo
        chef_omnibus_url: "http://myomnibusserver:8080/chefclient/install.sh"
        sudo: false
      
    • The platform section: The platform section describes each platform that the test-kitchen can create (I'm putting here the image_ref and the fixed_ip for each platform (AIX 6.1, AIX 7.1, AIX 7.2)):
    • platforms:
        - name: aix72
          driver_config:
            image_ref: "kitchen-aix72"
            fixed_ip: "10.66.33.234"
        - name: aix71
          driver_config:
            image_ref: "kitchen-aix71"
            fixed_ip: "10.66.33.235"
        - name: aix61
          driver_config:
            image_ref: "kitchen-aix61"
            fixed_ip: "10.66.33.236"
      
    • The suite section: this section describes which cookbook and which recipes you want to run on the machines created by the test-kitchen. For the simplicity of this example I'm just running two recipes, the first one called root_authorized_keys (creating the /root directory, changing the home directory of root and putting a public key in the .ssh directory) and the second one called gem_source (we will see later in the post why I'm also calling this recipe):
    • suites:
        - name: aixcookbook
          run_list:
          - recipe[aix::root_authorized_keys]
          - recipe[aix::gem_source]
          attributes: { gem_source: { add_urls: [ "http://10.14.66.100:8808" ], delete_urls: [ "https://rubygems.org/" ] } }
      
    • The busser section: this section describes how to run your tests (more details later in the post ;-) ):
    • busser:
        sudo: false
      

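    For reference, here is the complete .kitchen.yml obtained by assembling the snippets above. Nothing new here, it is just the sections glued together (the cloud-config part stays in the separate userdata.txt file referenced by user_data); adapt the urls, ips and key paths to your own environment:

    # cat .kitchen.yml
    ---
    driver:
      name: powervc
      server_wait: 100
      openstack_username: "root"
      openstack_api_key: "root"
      openstack_auth_url: "https://mypowervc:5000/v3/auth/tokens"
      openstack_region: "RegionOne"
      openstack_project_domain: "Default"
      openstack_user_domain: "Default"
      openstack_project_name: "ibm-default"
      flavor_ref: "mytemplate"
      server_name_prefix: "chefkitchen"
      network_ref: "vlan666"
      public_key_path: "/home/chef/.ssh/id_dsa.pub"
      private_key_path: "/home/chef/.ssh/id_dsa"
      username: "root"
      user_data: userdata.txt

    provisioner:
      name: chef_solo
      chef_omnibus_url: "http://myomnibusserver:8080/chefclient/install.sh"
      sudo: false

    platforms:
      - name: aix72
        driver_config:
          image_ref: "kitchen-aix72"
          fixed_ip: "10.66.33.234"
      - name: aix71
        driver_config:
          image_ref: "kitchen-aix71"
          fixed_ip: "10.66.33.235"
      - name: aix61
        driver_config:
          image_ref: "kitchen-aix61"
          fixed_ip: "10.66.33.236"

    suites:
      - name: aixcookbook
        run_list:
          - recipe[aix::root_authorized_keys]
          - recipe[aix::gem_source]
        attributes: { gem_source: { add_urls: [ "http://10.14.66.100:8808" ], delete_urls: [ "https://rubygems.org/" ] } }

    busser:
      sudo: false
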
    After configuring the kitchen you can check that the yml file is ok by listing what is configured in the kitchen:

    # kitchen list
    Instance           Driver   Provisioner  Verifier  Transport  Last Action
    aixcookbook-aix72  Powervc  ChefSolo     Busser    Ssh        
    aixcookbook-aix71  Powervc  ChefSolo     Busser    Ssh        
    aixcookbook-aix61  Powervc  ChefSolo     Busser    Ssh        
    

    kitchen1
    kitchen2

    Anatomy of a kitchen run

    A kitchen run is divided into five steps. First we create a virtual machine (the create action), then we install the chef-client (using an omnibus url) and run some recipes (converge), then we install the testing tools on the virtual machine (in my case serverspec) (setup) and we run the tests (verify). Finally, if everything was ok, we delete the virtual machine (destroy). Instead of running all these steps one by one you can use the "test" action, which will do destroy, create, converge, setup, verify, destroy in one single pass. Let's check each step in detail:
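
    Before going through each action in detail, here is the whole cycle summed up as plain commands. These are standard test-kitchen commands; the instance name is the one shown by the "kitchen list" output above:

    # kitchen create aixcookbook-aix72     # build the VM on PowerVC
    # kitchen converge aixcookbook-aix72   # install the chef-client and run the suite
    # kitchen setup aixcookbook-aix72      # install the test tools (busser/serverspec)
    # kitchen verify aixcookbook-aix72     # run the serverspec tests
    # kitchen destroy aixcookbook-aix72    # delete the VM from PowerVC
    # kitchen test aixcookbook-aix72       # or do everything above in one single pass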

    kitchen1

    • Create: This will create the virtual machine using PowerVC. If you choose to use the "fixed_ip" option in the .kitchen.yml file this ip will be chosen at machine creation time. If you prefer to pick an ip from the network (in the pool) don't set "fixed_ip". You'll see the details in the pictures below. At the end you can test the connectivity (transport) (ssh) to the machine using "kitchen login". The ssh public key was automatically added using the userdata.txt file consumed by cloud-init at machine creation time. After the machine is created you can use the "kitchen list" command to check that the machine was successfully created:
    # kitchen create
    

    kitchencreate3
    kitchencreate1
    kitchencreate2
    kitchenlistcreate1

    • Converge: This will converge the kitchen (one more time: converge = chef-client installation and running chef-solo with the suite configuration describing which recipes will be launched). The converge action downloads the chef-client, installs it on the machine (using the omnibus url) and runs the recipes specified in the suite stanza of the .kitchen.yml file. Here is the script I use for the omnibus installation; this script is "served" by an http server:
    • # cat install.sh
      #!/usr/bin/ksh
      echo "[omnibus] [start] starting omnibus install"
      echo "[omnibus] downloading chef client http://chefomnibus:8080/chefclient/lastest"
      perl -le 'use LWP::Simple;getstore("http://chefomnibus:8080/chefclient/latest", "/tmp/chef.bff")'
      echo "[omnibus] installing chef client"
      installp -aXYgd /tmp/ chef
      echo "[omnibus] [end] ending omnibus install"
      
    • The http server is serving this install.sh file. Here is the httpd.conf configuration file for the omnibus installation on AIX:
    • # ls -l /apps/chef/chefclient
      total 647896
      -rw-r--r--    1 apache   apache     87033856 Dec 16 17:15 chef-12.1.2-1.powerpc.bff
      -rwxr-xr-x    1 apache   apache     91922944 Nov 25 00:24 chef-12.5.1-1.powerpc.bff
      -rw-------    2 apache   apache     76375040 Jan  6 11:23 chef-12.6.0-1.powerpc.bff
      -rwxr-xr-x    1 apache   apache          364 Apr 15 10:23 install.sh
      -rw-------    2 apache   apache     76375040 Jan  6 11:23 latest
      # cat httpd.conf
      [..]
           Alias /chefclient/ "/apps/chef/chefclient/"
           <Directory "/apps/chef/chefclient/">
             Options Indexes FollowSymlinks MultiViews
             AllowOverride None
             Require all granted
           </Directory>
      
    # kitchen converge
    

    kitchenconverge1
    kitchenconverge2b
    kitchenlistconverge1

    • Setup and verify: these actions will run a bunch of tests to verify the machine is in the state you want. The tests I am writing check that the root home directory was created and that the key was successfully put in the .ssh directory. In a few words, you need to write tests checking that your recipes are working well (in Chef words: "check that the machine is in the correct state"). In my case I'm using serverspec to describe my tests (there are different tools used for testing, you can also use bats). To describe the test suite just create serverspec files (describing the tests) in the chef-repo directory (in test/integration/<suite name>/serverspec, in my case ~/test/integration/aixcookbook/serverspec). All the serverspec test files are suffixed by _spec:
    • # ls test/integration/aixcookbook/serverspec/
      root_authorized_keys_spec.rb  spec_helper.rb
      
    • The "_spec" file describes the tests that will be run by the kitchen. In my very simple tests here I'm just checking that my files exist and that the content of the public key is the same as my public key (the key created by cloud-init on AIX is located in ~/.ssh and my test recipe here changes the root home directory and puts the key in the right place). By looking at the file you can see that the serverspec language is very simple to understand:
    • # ls test/integration/aixcookbook/serverspec/
      root_authorized_keys_spec.rb  spec_helper.rb
      
      # cat spec_helper.rb
      require 'serverspec'
      set :backend, :exec
      # cat root_authorized_keys_spec.rb
      require 'spec_helper'
      
      describe file('/root/.ssh') do
        it { should exist }
        it { should be_directory }
        it { should be_owned_by 'root' }
      end
      
      describe file('/root/.ssh/authorized_keys') do
        it { should exist }
        it { should be_owned_by 'root' }
        it { should contain 'from="1[..]" ssh-rsa AAAAB3NzaC1[..]' }
      end
      
    • The kitchen will try to install the ruby gems needed by serverspec (serverspec needs to be installed on the server to run the automated tests). As my server has no connectivity to the internet I need to run my own gem server. I'm lucky: all the needed gems are installed on my chef workstation (if you have no internet access from the workstation use the tip described at the beginning of this blog post). I just need to run a local gem server by running "gem server" on the chef workstation. The server is listening on port 8808 and will serve all the needed gems:
    • # gem list | grep -E "busser|serverspec"
      busser (0.7.1)
      busser-bats (0.3.0)
      busser-serverspec (0.5.9)
      serverspec (2.31.1)
      # gem server
      Server started at http://0.0.0.0:8808
      
    • If you look at the output above you can see that the recipe gem_source was executed. This recipe changes the gem source on the virtual machine (from https://rubygems.org to my own local server). In the .kitchen.yml file the urls to add to and remove from the gem sources are specified in the suite attributes:
    • # cat gem_source.rb
      ruby_block 'Changing gem source' do
        block do
          node['gem_source']['add_urls'].each do |url|
            current_sources = Mixlib::ShellOut.new('/opt/chef/embedded/bin/gem source')
            current_sources.run_command
            next if current_sources.stdout.include?(url)
            add = Mixlib::ShellOut.new("/opt/chef/embedded/bin/gem source --add #{url}")
            add.run_command
            Chef::Application.fatal!("Adding gem source #{url} failed #{add.status}") unless add.status == 0
            Chef::Log.info("Add gem source #{url}")
          end
      
          node['gem_source']['delete_urls'].each do |url|
            current_sources = Mixlib::ShellOut.new('/opt/chef/embedded/bin/gem source')
            current_sources.run_command
            next unless current_sources.stdout.include?(url)
            del = Mixlib::ShellOut.new("/opt/chef/embedded/bin/gem source --remove #{url}")
            del.run_command
            Chef::Application.fatal!("Removing gem source #{url} failed #{del.status}") unless del.status == 0
            Chef::Log.info("Remove gem source #{url}")
          end
        end
        action :run
      end
      
    # kitchen setup
    # kitchen verify
    

    kitchensetupeverify1
    kitchenlistverfied1

    • Destroy: This will destroy the virtual machine on PowerVC.
    # kitchen destroy
    

    kitchendestroy1
    kitchendestroy2
    kitchenlistdestroy1

    Now that you understand how the kitchen works and you are able to run it to create and test AIX machines, you are ready to use the kitchen to develop and create the Chef cookbook that will fit your infrastructure. To run all the steps "create, converge, setup, verify, destroy" at once, just use the "kitchen test" command:

    # kitchen test
    

    As you are going to change a lot of things in your cookbook, you'll need to version the code you are creating; for this we will use a gitlab server.

    Gitlab: version your AIX cookbook

    Unfortunately for you and for me I didn't have the time to run gitlab on a Linux on Power machine. I'm sure it is possible (if you find a way to do this please mail me). Anyway, my version of gitlab is running on an x86 box. The goal here is to allow the chef workstation user (in my environment this user is "chef") to push all the new developments (providers, recipes) to the git development branch. For this we will:

    • Allow the chef user to push its source to the git server through ssh (we are creating a chefworkstation user and adding the key to authorize this user to push changes to the git repository over ssh).
    • gitlabchefworkst

    • Create a new repository called aix-cookbook.
    • createrepo

    • Push your current work to the master branch. The master branch will be the production branch.
    • # git config --global user.name "chefworkstation"
      # git config --global user.email "chef@myworkstation.chmod666.org"
      # git init
      # git add -A .
      # git commit -m "first commit"
      # git remote add origin git@gitlabserver:chefworkstation/aix-cookbook.git
      # git push origin master
      

      masterbranch

    • Create a development branch (you'll need to push all your new developments to this branch, and you'll never have to do anything else on the master branch as Jenkins is going to do the job for us):
    • # git checkout -b dev
      # git commit -a
      # git push origin dev
      

      devbranch

    The git server is ready: we have a repository accessible by the chef user and two branches, the dev one (the one we are working on, used for all our developments) and the master branch used for production, which will never be touched by us and will only be updated (by Jenkins) if all the tests (foodcritic, rubocop and the test-kitchen) are ok.
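
    To make the workflow concrete, here is a sketch of a typical development iteration from now on (the recipe name is just an example taken from the suite configured earlier, and the local kitchen run is optional since Jenkins will run it anyway):

    # git checkout dev
    # vi cookbooks/aix/recipes/root_authorized_keys.rb
    # # optionally check the change yourself before pushing it
    # kitchen test aixcookbook-aix72
    # git add cookbooks/aix/recipes/root_authorized_keys.rb
    # git commit -m "change the root_authorized_keys recipe"
    # git push origin dev
    # # Jenkins now runs foodcritic, rubocop and the test-kitchen and, if everything is ok, pushes dev to master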

    Automating the continuous integration with Jenkins

    What is Jenkins

    The goal of Jenkins is to automate all the tests and run them over and over again every time a change is applied to the cookbook you are developing. By using Jenkins you will be sure that every change is tested and that you will never push something that is not working, or not passing the tests you have defined, into your production environment. To be sure the cookbook is working as desired we will use three different tools: foodcritic will check your Chef cookbook for common problems against a set of rules defined within the tool (these rules verify that everything is ok for the Chef execution, so you will be sure that there is no syntax error and that the coding conventions are respected), rubocop will check the ruby syntax, and then we will run a kitchen test to be sure that the development branch works with the kitchen and that all our serverspec tests are ok. Jenkins will automate the following steps (the commands behind these jobs are sketched right after the list):

    1. Pull the dev branch from git server (gitlab) if anything has changed on this branch.
    2. Run foodcritic on the code.
    3. If foodcritic tests are ok this will trigger the next step.
    4. Pull the dev branch again
    5. Run rubocop on the code.
    6. If rubocop tests are ok this will trigger the next step.
    7. Run the test-kitchen
    8. This will build a new machine on PowerVC and test the cookbook against it (kitchen test).
    9. If the test kitchen is ok push the dev branch to the master branch.
    10. You are ready for production :-)
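
    If you prefer commands to screenshots, the build steps of these three Jenkins jobs boil down to something like the sketch below. This is not the exact job configuration, just the underlying commands (the last line is one possible way of promoting dev to master):

    # # job 1: foodcritic
    # foodcritic -f correctness ./cookbooks/
    # # job 2: rubocop
    # rubocop .
    # # job 3: test-kitchen, then promote dev to master if everything is green
    # kitchen test
    # git push origin dev:master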

    kitchen2

    First: Foodcritic

    The first test we are running is foodcritic. Rather than trying to write my own explanation of this with my weird English, I prefer to quote the Chef website:

    Foodcritic is a static linting tool that analyzes all of the Ruby code that is authored in a cookbook against a number of rules, and then returns a list of violations. Because Foodcritic is a static linting tool, using it is fast. The code in a cookbook is read, broken down, and then compared to Foodcritic rules. The code is not run (a chef-client run does not occur). Foodcritic does not validate the intention of a recipe, rather it evaluates the structure of the code, and helps enforce specific behavior, detect portability of recipes, identify potential run-time failures, and spot common anti-patterns.

    # foodcritic -f correctness ./cookbooks/
    FC014: Consider extracting long ruby_block to library: ./cookbooks/aix/recipes/gem_source.rb:1
    

    In Jenkins here are the steps to create a foodcritic test:

    • Pull dev branch from gitlab:
    • food1

    • Check for changes (the Jenkins test will be triggered only if there was a change in the git repository):
    • food2

    • Run foodcritic
    • food3

    • After the build, parse the output (to archive and record the evolution of the foodcritic warnings) and trigger the rubocop project if the build is stable (passed without any errors):
    • food4

    • To configure the parser go in the Jenkins configuration and add the foodcritic compiler warnings:
    • food5

    Second: Rubocop

    The second test we are running is rubocop; it's a Ruby static code analyzer, based on the community Ruby style guide. Here is an example below:

    # rubocop .
    Inspecting 71 files
    ..CCCCWWCWC.WC..CC........C.....CC.........C.C.....C..................C
    
    Offenses:
    
    cookbooks/aix/providers/fixes.rb:31:1: C: Assignment Branch Condition size for load_current_resource is too high. [20.15/15]
    def load_current_resource
    ^^^
    cookbooks/aix/providers/fixes.rb:31:1: C: Method has too many lines. [19/10]
    def load_current_resource ...
    ^^^^^^^^^^^^^^^^^^^^^^^^^
    cookbooks/aix/providers/sysdump.rb:11:1: C: Assignment Branch Condition size for load_current_resource is too high. [25.16/15]
    def load_current_resource
    

    In Jenkins here are the steps to create a rubocop test:

    • Do the same thing as foodcritic except for the build and post-build action steps:
    • Run rubocop:
    • rubo1

    • After the build, parse the output and trigger the test-kitchen project even if the build fails (rubocop will generate tons of things to correct; once you are ok with rubocop change this to "trigger only if the build is stable", and see the .rubocop.yml example at the end of this section):
    • rubo2
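
    To keep the number of rubocop offenses under control you can also put a .rubocop.yml at the root of the chef-repo. This is only a hypothetical example: the two cop names correspond to the offenses shown above and the limits are arbitrary, tune them to your own taste:

    # cat .rubocop.yml
    Metrics/AbcSize:
      Max: 30
    Metrics/MethodLength:
      Max: 30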

    Third: test-kitchen

    I don't have to explain again what the test-kitchen is ;-) . It is the third test we are creating with Jenkins and, if this one is ok, we push the changes to production:

    • Do the same thing as foodcritic except for the build and post-build action steps:
    • Run the test-kitchen:
    • kitchen1

    • If the test kitchen is ok push dev branch to master branch (dev to production):
    • kitchen3

    More about Jenkins

    The three tests are now linked together. On the Jenkins home page you can check the current state of your tests. Here are a couple of screenshots:

    meteo
    timeline

    Conclusion

    I know that for most of you working this way is something totally new. As AIX sysadmins we are used to our ksh and bash scripts and we like the way things are today. But as the world is changing, and as you are going to manage more and more machines with fewer and fewer admins, you will understand how powerful it is to use automation and to work in a "continuous integration" way. Even if you don't like this concept or this new work habit ... give it a try and you'll see that working this way is worth the effort. First for you: you'll discover a lot of new interesting things. Second for your boss, who will discover that working this way is safer and more productive. Trust me, AIX needs to face Linux today and we are not going anywhere without having a proper fight against the Linux guys :-) (yep, it's a joke).