
Ansible Tutorial: Multi-tier Deployment with Ansible

Multi-stage with modules

Apr 10, 2019

Ansible, by Red Hat, provides automated configuration and orchestration of machine landscapes. The lightweight, flexible tool is increasingly used in DevOps toolchains and in cloud computing. This article shows how to bring a complex multi-tier setup onto servers with it.

As a configuration and provisioning framework for machines, Ansible provides automated and reproducible management of IT landscapes. The relatively young but technically mature tool belongs to the Infrastructure-as-Code (IaC) class of applications and competes with longer-established solutions such as Puppet, Chef and Salt/SaltStack.

The software was acquired by Red Hat in 2015 and is currently positioned in the market as part of the company's comprehensive open source stack for application provisioning.

The CLI-based tools of Ansible, which are also available outside of the commercial offering, are referred to as the Ansible Engine. The provisioner offers a clearly structured component model consisting of a machine inventory, modules, playbooks and roles. It deliberately follows a lightweight approach and can be used productively after a relatively short learning period.

Ansible is mainly used to control Linux, BSD and Unix machines and doesn't require an agent on the target systems. It also doesn't require a central server: it only needs to be installed and kept up to date on the user's control node, typically a workstation. To operate on a target machine, SSH access and an installed Python interpreter (Python 2.7 by default) are required there. Ansible therefore starts at a higher level than "bare metal" systems without an installed operating system, which cannot provide these prerequisites.

Ansible projects are used to move one or more machines from a specific state (usually a basic installation of an OS is assumed) to another specific state (for example, functioning as a web or database server). The software takes an imperative approach (how to achieve something). Supporters of this method point to advantages such as easier debugging and full control. However, the Ansible user always needs to know the initial state of a machine in great detail, just like an administrator, and also how to implement the required procedures ("plays") with the necessary individual steps (tasks) in the correct order in Ansible. This can become quite complex rather quickly.

Ansible

Ansible utilizes the user-friendly YAML format for scripting procedures in workflows (playbooks) and other elements. As its killer feature it offers the integrated, powerful template language Jinja2, which was developed by the Pocoo group as a stand-alone project and is also used in other Python projects. For carrying out the various administrative operations on target systems, Ansible provides a comprehensive toolbox of hundreds of supplied modules. Profound knowledge of Ansible largely means knowing which modules are included for which purpose and how they behave in detail.
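
As a small taste of this templating (a minimal sketch, not part of the article's project): variables in task arguments are rendered through Jinja2 before execution, so even the content parameter of the copy module can carry an expression:

- name: Write a templated message of the day
  copy:
    content: "Welcome to {{ ansible_hostname }}\n"
    dest: /etc/motd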

A handful of modules forms a manageable basic set for fundamental administrative tasks on a target system: uploading files or fetching them from the network, installing packages, creating users and directories, and appending lines to configuration files or changing them, as well as restarting services. Some standard modules, such as ping, are particularly suitable for being run ad hoc, without a playbook, via the provided CLI tool ansible. Furthermore, many special modules are available, for example for MySQL servers, to set up users and databases. The additional Python libraries that some modules require on the target system (e.g. for MySQL) can simply be installed as a first step in the same playbook.
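
A minimal sketch of that last pattern, assuming a Debian target and a hypothetical database name: the Python bindings that Ansible's MySQL modules need are installed first, then a module that depends on them is used in the same playbook.

- hosts: all
  tasks:
    - name: Install the MySQL bindings required by the mysql_* modules
      apt:
        name: python-mysqldb
        state: present

    - name: Create a database (possible now that the bindings are present)
      mysql_db:
        name: example_db
        state: present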


In some cases there are alternative modules for specific purposes, and sometimes there are also alternative ways to achieve the same result. Modules can also be used creatively: the subversion module, for example, can be employed to import individual directories directly from GitHub projects onto the target machines. The supplied CLI tool ansible-doc offers fast access to the documentation of all provided modules on the control node, similar to Linux man pages. The infrastructure and cloud computing modules, which are now firmly established in Ansible and only indirectly related to machine provisioning, play a major role in more recent versions of the provisioner. They use Ansible's procedural principle for their own purposes, e.g. for bringing up cloud infrastructure and remotely controlling network hardware.
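
A hedged sketch of that subversion trick: GitHub also serves repositories over the Subversion protocol (at least at the time of writing), so a single subdirectory can be exported directly onto the target machine; the repository path here is hypothetical.

- name: Export a single directory of a GitHub project onto the target
  subversion:
    repo: https://github.com/someuser/someproject/trunk/some-directory
    dest: /opt/some-directory
    export: yes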

Roles in Ansible, in the sense of "the role that a machine takes on", serve as an organizational structure for the components of projects. A complex Ansible project can contain multiple playbooks, and a single role offers a set of fixed, defined directories (Listing 1): tasks/ is for the playbooks, vars/ and defaults/ are for variable definitions, and handlers/ is for the definition of handlers; these directories hold YAML configuration files. In the directories files/ and templates/ you can provide arbitrary files and templates for installation during the Ansible run. The skeleton for a new role can be created with the included CLI tool ansible-galaxy, and the unneeded generated directories with their stub files (always main.yml) can easily be deleted; meta/, for example, is mainly intended for distributing the role via Ansible's official repository, the Galaxy.

├── group_vars
│   └── all.yml
├── hosts
├── roles
│   ├── flask
│   │   ├── files
│   │   │   ├── querier.conf
│   │   │   └── querier.wsgi
│   │   ├── handlers
│   │   │   └── main.yml
│   │   ├── tasks
│   │   │   └── main.yml
│   │   └── templates
│   │       └── querier.py.j2
│   ├── haproxy
│   │   ├── handlers
│   │   │   └── main.yml
│   │   ├── tasks
│   │   │   └── main.yml
│   │   ├── templates
│   │   │   └── haproxy.cfg.j2
│   │   └── vars
│   │       └── main.yml
│   └── mariadb
│       ├── handlers
│       │   └── main.yml
│       └── tasks
│           └── main.yml
└── site.yml

In Ansible projects the user can define arbitrary custom variables and evaluate them together with the built-in variables, which all start with ansible_. The so-called "facts" play a special role here: comprehensive information that Ansible collects from all connected machines when running a playbook, and which can be output in full for development purposes with the setup module. For example, Ansible discovers the IP addresses and hostnames of all connected machines during the run, and the user can evaluate these in templates for configuration files, for complex setups in which nodes must be able to reach each other.
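
For a quick look at the facts in practice, here is a throwaway play (a minimal sketch, not part of the example project) that prints two of them for every host:

- hosts: all
  tasks:
    - name: Show hostname and primary IPv4 address from the gathered facts
      debug:
        msg: "{{ ansible_hostname }}: {{ ansible_default_ipv4.address }}"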

Example

We will use a classic multi-tier setup, implemented with open source software and consisting of three components, as an example for a slightly more complex Ansible project with three roles (Listing 1). The MySQL fork MariaDB is a relational database server running on a backend machine, on which the test database test_db, well-known among MySQL developers, is installed. It is a fictitious personnel database of a non-existent large corporation with six tables, containing around 300,000 persons and several million salary entries. To use this database, a microservice written with the Python web framework Flask [1] is installed on frontend machines; when called, it queries this database and returns a JSON object containing the current top earner of one of the nine company departments included in the personnel data. This web application runs on an Apache 2 web server with the WSGI extension, which Flask needs as an interface to communicate with the server.

The load balancer HAProxy is installed on another node. For reliability and load-sharing, it distributes requests from the outside network to any number of dedicated frontend nodes running the same Flask application, all of which access the same backend with the database in parallel. HAProxy is a powerful enterprise-level software solution used by many well-known providers such as Twitter and GitHub. This application makes only limited sense, even beyond the fictitious personnel data, because the data never changes and the queries always return the same results. Nevertheless, it is an overall setup that is quite common in practice and often appears in variations. Ansible is well suited to bringing up this entire structure fully automatically at the push of a button, deploying the required components to the machines (hosts). The playbooks are written for basic installations of Debian 9 and contain several customized details, such as the package names used.

 

 

Backend

To install the MariaDB server you only need a playbook (Listing 2), no configuration templates. Files to be placed on the control node for uploading can also be omitted, because the sample database can be imported directly from the net.

- name: Install MariaDB and required packages
  apt:
    name: "{{ item }}"
    state: latest
    update_cache: yes
  with_items:
    - mariadb-server
    - python-mysqldb
    - unzip

- name: Create database "employees"
  mysql_db:
    name: employees
    state: present

- name: Create SQL user "employees"
  mysql_user:
    name: employees
    host: "%"
    password: "{{ employees_password }}"
    priv: "employees.*:ALL"
    state: present

- name: Check whether test_db has already been imported
  stat:
    path: /var/lib/mysql/employees/employees.frm
  register: testdb_imported

- name: Fetch test_db from GitHub
  unarchive:
    src: https://github.com/datacharmer/test_db/archive/master.zip
    dest: /tmp
    remote_src: yes
  when: testdb_imported.stat.exists == false

- name: Adjust paths in the import script
  replace:
    path: /tmp/test_db-master/employees.sql
    regexp: "source"
    replace: "source /tmp/test_db-master/"
  when: testdb_imported.stat.exists == false

- name: Import test_db
  mysql_db:
    name: all
    state: import
    target: /tmp/test_db-master/employees.sql
  when: testdb_imported.stat.exists == false

- name: Enable MariaDB for remote access
  lineinfile:
    dest: /etc/mysql/mariadb.conf.d/50-server.cnf
    regexp: "^bind-address(.+)127.0.0.1"
    line: "bind-address = 0.0.0.0"
    backrefs: yes
  notify: restart mariadb

First, the playbook installs the package mariadb-server from the official Debian archive with the apt module for the package manager. For later operations with Ansible, two more packages are needed on this host: python-mysqldb for Ansible's MySQL modules and unzip to unpack the downloaded example database. There is nothing wrong with installing all three packages in one step: Ansible provides the variable item, which can be used in conjunction with with_items to construct your own iterators, as shown here in the example.
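
As an aside, newer Ansible releases favor loop over with_items, and the apt module also accepts a list for name directly, which makes the iteration unnecessary altogether; a sketch of the equivalent step:

- name: Install MariaDB and required packages
  apt:
    name:
      - mariadb-server
      - python-mysqldb
      - unzip
    state: latest
    update_cache: yes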

In Ansible, or more precisely in Jinja2, variables are always expanded with double curly braces, and the YAML syntax used in the examples (Ansible also provides an alternative syntax that isn't YAML-conformant) requires these constructs to be enclosed in quotation marks. The database server is already activated after the package installation; the playbook then creates a new empty database and a user for it in two further steps with the modules mysql_db and mysql_user. The password, preset in the user-defined variable employees_password, is also required by the Flask application. It is therefore a good idea not to define it in both roles in parallel in vars/, but centrally at a higher level in group_vars/ (Listing 3).

employees_password: fbfad90d99d0b4 

Installing the test database is the next step. But it is best to first create a check to prevent this from happening on every new Ansible run, which would create unnecessary overhead. The stat module is well suited to this task: it checks whether the file employees.frm already exists (which is the case if the database has already been installed), and the module's return value can be captured in a variable (here testdb_imported) with register. In the next step the unarchive module fetches the database as a ZIP file from GitHub into /tmp and unpacks it, but only if (when) the value of testdb_imported.stat.exists is false. The replace module then adjusts some paths in the import script from the ZIP archive; here, as in the next step, the playbook sets the same execution condition with when. The final step uses the mysql_db module again to install the unpacked personnel database on the MariaDB server, using the import script shipped with it.

 


To make the database server accessible from the network, a line in a configuration file under /etc/mysql has to be changed, which can be done with the lineinfile module. The backrefs option for this module prevents the line from being appended again on every repeated run of this role when the regexp expression is no longer found in the file. This step triggers the handler restart mariadb via notify if required (i.e. if the module returns changed as its result); the handler is defined in Listing 4 using the service module. The handler ensures that the MariaDB server re-reads its configuration files; with enabled it is simultaneously specified that the associated systemd service should also be active after a reboot of the target machine (the module switches the service unit on if that isn't the case).

- name: restart mariadb
  service:
    name: mariadb
    state: restarted
    enabled: true

Using handlers instead of fixed steps prevents the service module from reloading or restarting the systemd service on every repeated playbook run, which simply isn't necessary as long as this part isn't developed further but stays the same. Ansible's imperative method requires you to think carefully when writing procedures and to always keep repeated runs of a playbook in mind.
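
For contrast, a hypothetical fixed step like the following would restart the service on every single run, whether or not anything had changed – exactly what the handler construction avoids:

- name: Restart MariaDB unconditionally (runs on every playbook run)
  service:
    name: mariadb
    state: restarted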

Frontend

The playbook for setting up the Flask application (Listing 5) is a bit shorter, but several files have to be provided in this role so that Ansible can upload them.

- name: Install Apache and required packages
  apt:
    name: "{{ item }}"
    state: latest
    update_cache: yes
  with_items:
    - apache2
    - libapache2-mod-wsgi
    - python-flask
    - python-mysqldb

- name: Upload WSGI starter
  copy:
    src: querier.wsgi
    dest: /var/www/querier/

- name: Create pseudo-user for the WSGI process
  user:
    name: wsgi
    shell: /bin/false
    state: present

- name: Upload application
  template:
    src: querier.py.j2
    dest: /var/www/querier/querier.py
    owner: wsgi
    mode: 0600
  notify: reload apache

- name: Upload configuration for virtual host
  copy:
    src: querier.conf
    dest: /etc/apache2/sites-available/
  notify: reload apache

- name: Enable virtual host
  file:
    src: /etc/apache2/sites-available/querier.conf
    dest: /etc/apache2/sites-enabled/querier.conf
    state: link

- name: Disable default start page
  file:
    path: /etc/apache2/sites-enabled/000-default.conf
    state: absent

Initially, several packages are again installed in one step: the web server apache2, the corresponding WSGI extension, Flask, and once more the Python library for MySQL – this time not for an Ansible module but for the Flask application. A WSGI launch script (Listing 6) is then installed on the target machine with copy. This module automatically searches the files/ folder of the role, so you don't have to specify a source path here; the target path is created automatically by the module if it doesn't exist yet. The WSGI process requires a pseudo-user on the target system so that it doesn't run with root privileges; the playbook creates it easily with the user module.

import sys
sys.path.insert(0, '/var/www/querier')
from querier import app as application

In the next step the actual application querier.py (Listing 7) can be deployed.

from flask import Flask
import json
import MySQLdb as mysqldb

app = Flask(__name__)

ipv4 = '{{ ansible_eth0.ipv4.address }}'
hostname = '{{ ansible_hostname }}'

mydb = mysqldb.connect(user = 'employees',
    host = '{{ hostvars[groups.datenbank.0].ansible_default_ipv4.address }}',
    passwd = '{{ employees_password }}',
    db = 'employees')

@app.route("/<abteilung>")
def topearners(abteilung):
    cursor = mydb.cursor()

    cursor.execute("""SELECT e.last_name, e.first_name, d.dept_no,
        max(s.salary) as max_sal FROM employees e
        JOIN salaries s ON e.emp_no = s.emp_no AND s.to_date > now()
        JOIN dept_emp d ON e.emp_no = d.emp_no
        WHERE d.dept_no = 'd%s'
        GROUP BY e.emp_no ORDER BY max_sal desc limit 1;""" % abteilung)

    results = cursor.fetchall()
    daten = (results[0])
    (nachname, vorname, abteilung, gehalt) = daten
    resultsx = (abteilung, vorname, nachname, gehalt, ipv4, hostname)
    return json.dumps(resultsx)

The Python script is set up as an Ansible template (with the extension .j2) and must be processed accordingly with the module template instead of copy. It has to be made available in templates/ for this purpose.

During installation on the target system, the variables in the template are expanded from the facts, so that the IP address (ansible_eth0.ipv4.address) and the hostname (ansible_hostname) of the respective target system end up hard-coded in the installed script and appear in its output when it runs. The Flask application returns a JSON object with the result of the database query; the accompanying IP address and hostname merely serve to check which frontend a response from the load balancer actually comes from.

With hostvars the application also gets the IP address of the database backend from the inventory (Listing 8), which it needs to query the database over the network. Just as in the backend role, employees_password is evaluated for the access password (Listing 3). This step is linked to the handler reload apache (Listing 9), which is triggered when you change the template and run the playbook again to deploy the new version. In the next step, the playbook uploads the configuration of the virtual host for the Apache web server (Listing 10) on which the Flask app will run.

Two further steps activate the virtual host using the usual Apache method. The first creates a softlink below /etc/apache2 with the file module; the second deletes the already existing softlink for the Apache default start page with the same module. Since these steps are not expected to change in the future (unless Apache itself changes), there is no need to attach the reload apache handler to them as well. This is not a problem on the first run of the playbook (where the already running Apache does not yet know about the configuration changes), because the handler is already triggered by previous steps, and Ansible always fires handlers at the end of a run; the changes to the default Apache configuration are thus guaranteed to take effect. Another subtlety is that the reloaded state of the service module not only reloads a systemd service but also starts it if it is not already running, as is the case right after installing the apache2 package.

[datenbank]
167.99.242.69

[applikation]
167.99.242.84
167.99.242.179
167.99.242.237

[load-balancer]
167.99.250.42

[all:vars]
ansible_user=root
ansible_ssh_private_key_file=~/.ssh/id_digitalocean

- name: reload apache
  service:
    name: apache2
    state: reloaded
    enabled: yes

<VirtualHost *:80>
 WSGIDaemonProcess querier user=wsgi group=wsgi threads=5
 WSGIScriptAlias / /var/www/querier/querier.wsgi
 <Directory /var/www/querier>
  WSGIProcessGroup querier
  WSGIApplicationGroup %{GLOBAL}
  Order allow,deny
  Allow from all
 </Directory>
</VirtualHost>

Load Balancer

For the installation of the load balancer you need another handler to reload the corresponding systemd service (Listing 12) and only two steps in the playbook (Listing 11), because everything crucial happens in the template for the HAProxy configuration file (Listing 13). Under backend (meaning the backend of the load balancer), the necessary nodes have to be registered. This is done with a for loop provided by Jinja2, which iterates over the nodes of the applikation group in the inventory (Listing 8) and uses hostvars to write their IP addresses and hostnames from the facts into this configuration file.

 

- name: Install the haproxy package
  apt:
    name: haproxy
    state: latest
    update_cache: yes

- name: Upload configuration
  template:
    src: haproxy.cfg.j2
    dest: /etc/haproxy/haproxy.cfg
    backup: yes
  notify: reload haproxy

- name: reload haproxy
  service:
    name: haproxy
    state: reloaded
    enabled: yes

global
  daemon
  maxconn 256

{% if statspage %}
listen stats
  bind 0.0.0.0:9000
  mode http
  stats enable
  stats uri /haproxy?stats
  stats refresh 15s
{% endif %}

defaults
  mode http
  timeout connect 10s
  timeout client 1m
  timeout server 1m

frontend http
  bind {{ balancer_listen_address }}:{{ balancer_listen_port|default('80') }}
  default_backend querier
  
backend querier
{% for host in groups['applikation'] %}
  server {{ hostvars[host].ansible_hostname }} {{ hostvars[host].ansible_default_ipv4.address }}:80 check
{% endfor %}

Jinja2's programming elements such as if and for are always enclosed in curly braces with percent signs ({% … %}), as shown above in this file at if statspage: this block for the HAProxy dashboard is only output if statspage is set to true, which the master playbook in the example does when calling the role (Listing 14). Under frontend (meaning the frontend of the load balancer), two further self-defined variables are evaluated. These are only relevant for this role, which is why they are best defined under vars/ (Listing 15). For the port on which HAProxy awaits requests, a special variable filter (default) presets the number 80; if you want to change this, you just set balancer_listen_port accordingly in the variables file (Listing 15).

- hosts: datenbank
  roles:
    - mariadb

- hosts: applikation
  roles:
    - flask

- hosts: load-balancer
  roles:
    - {role: haproxy, statspage: true}

roles/haproxy/vars/main.yml
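
The content of this vars file is not reproduced in the article. A minimal sketch of what it could contain, with placeholder values (both are assumptions, not taken from the original):

balancer_listen_address: 0.0.0.0  # placeholder: listen on all interfaces
# balancer_listen_port: 8080      # optional; the template defaults to 80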

Deployment

To utilize roles, two additional components are needed within an Ansible project: an inventory, i.e. an inventory file (with any name, but often hosts or inventory), and a master playbook (frequently site.yml). Ansible inventories are written in simple INI format; they contain IP addresses or DNS hostnames and can group the machine inventory of the project as desired (Listing 8).

The datenbank group contains only one machine. Multiple entries would also be possible and would be provisioned identically, but the application only considers the first node (groups.datenbank.0 in Listing 7). With the applikation nodes you can scale arbitrarily and go beyond the three servers of the example if an extreme request load is expected, because, as mentioned before, the template for the HAProxy configuration iterates over all nodes registered here. You can simply add more machines later if necessary; Ansible must then be run again to provision the newly added nodes and update the load balancer accordingly.

Additional load-balancer nodes are of course also possible and work in the same way, though HAProxy is precisely meant to remove the need for alternative access points. Frontend nodes that fail are automatically compensated for by the remaining ones. However, if you want to absorb the failure of the database or the load balancer, you need an even more complex setup with built-in monitoring. The example in this article uses virtual servers from DigitalOcean (Fig. 1). The username (ansible_user) and the path to the private SSH key on the workstation (ansible_ssh_private_key_file) can be given directly in the inventory.

Fig. 1: The user dashboard of DigitalOcean

The master playbook (Listing 14) links the groups from the inventory (hosts) with the roles in the project and determines the order of the deployment. To start it, simply run ansible-playbook -i hosts site.yml and Ansible will work through the entire project. During the run, a log is output that lists the individual steps and shows whether changes have taken place (which, on the first run, is the case everywhere). After a few minutes the entire setup is installed, and the load balancer can be addressed. The nine departments of the fictitious corporation are available as endpoints (001-009):

$ curl 167.99.250.42/001

["d001", "Akemi", "Warwick", 145128, "167.99.242.179", "frontend2"]

$ curl 167.99.250.42/002

["d002", "Lunjin", "Swick", 142395, "167.99.242.237", "frontend3"]

$ curl 167.99.250.42/003

["d003", "Yinlin", "Flowers", 141953, "167.99.242.84", "frontend1"]

As explained, the returned JSON object always contains the queried department first, then the first and last name of the respective top earner, this person's current annual salary, and finally the IPv4 address and hostname of the frontend from which the response originates. If you call up the HAProxy dashboard (Fig. 2), you can watch the load balancer doing its work. It always takes a few seconds until the result of a request arrives, because the backend has to work through about 160 MB of data each time. Should the request load become larger, it is better to give the MariaDB server more powerful hardware and to take more in-depth tuning measures.

Fig. 2: The statistics page of HAProxy


Conclusion

Ansible is a relevant tool for deploying applications on servers, and not only single-node setups but also multi-tier setups can be implemented with it. This is where Ansible unfolds its full strength, and roles offer a proven means of structuring complex projects. The example has shown how to use this provisioner to deploy a setup of three communicating components to five nodes, and how playbooks and the built-in modules are used to perform procedurally ordered steps on the target machines.

The example setup is not suitable for production, and for potential attackers it is a challenge at primary-school level at best: the database server is not secured (no root password is set and anonymous access is possible), the frontends are individually addressable, the internal connections are unencrypted (you would rather use a private network for this), the load balancer is not reachable via HTTPS, and so on.

Hardening the MariaDB server, along the lines of the mysql_secure_installation script included in the package, could likewise be done with Ansible's tools (see the sketch below). Of course, the setup itself is not meant to be the focus here, but rather how to get such a construct installed automatically with Ansible, and to offer starting points for deeper engagement with this tool. But be careful: like many other DevOps tools, Ansible has a certain addiction potential.
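
A hedged sketch of what such hardening could look like as Ansible tasks, modeled on what mysql_secure_installation does; the variable mariadb_root_password is an assumption, not part of the article's project:

- name: Set a password for the MariaDB root user
  mysql_user:
    name: root
    host_all: yes
    password: "{{ mariadb_root_password }}"

- name: Remove anonymous SQL users
  mysql_user:
    name: ''
    host_all: yes
    state: absent

- name: Remove the test database
  mysql_db:
    name: test
    state: absent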

Links & Literature

[1] Stender, Daniel: "Mit Flask Webapplikationen in Python entwickeln" (Developing web applications in Python with Flask), Entwickler Magazin 6.2017, pp. 68-75
