How to Provision a Multi Node Elasticsearch Cluster Using Ansible
Mar 2019
You can see the sample code for this tutorial on GitHub.
Elasticsearch is a distributed NoSQL document database built on top of Lucene. There's a lot I could say about Elasticsearch, but instead I'll focus on how to install a simple three-node cluster with an Ansible role. The following example has no security baked into it, so it's really just a starting point to get you up and running.
To follow along with this example, you'll need Ansible installed, as well as Vagrant (with VirtualBox).
Initializing an Ansible Role
I'm electing to use Molecule to initialize an Ansible role for me, with Vagrant as the VM provider:
$ molecule init role -d vagrant -r install-elasticsearch-cluster
$ cd install-elasticsearch-cluster
We can define the inventory we'll use for fleshing out our role with some convenient YAML syntax in the molecule/default/molecule.yml file. First, adjust the platforms section to look like the following:
platforms:
  - name: elasticsearchNode1
    box: ubuntu/xenial64
    memory: 4096
    provider_raw_config_args:
      - "customize ['modifyvm', :id, '--uartmode1', 'disconnected']"
    interfaces:
      - auto_config: true
        network_name: private_network
        ip: 192.168.56.101
        type: static
  - name: elasticsearchNode2
    box: ubuntu/xenial64
    memory: 4096
    provider_raw_config_args:
      - "customize ['modifyvm', :id, '--uartmode1', 'disconnected']"
    interfaces:
      - auto_config: true
        network_name: private_network
        ip: 192.168.56.102
        type: static
  - name: elasticsearchNode3
    box: ubuntu/xenial64
    memory: 4096
    provider_raw_config_args:
      - "customize ['modifyvm', :id, '--uartmode1', 'disconnected']"
    interfaces:
      - auto_config: true
        network_name: private_network
        ip: 192.168.56.103
        type: static
This will instruct Molecule to create three virtual machines, each running the ubuntu/xenial64 box, with 4GB of RAM and IP addresses of 192.168.56.101-103. By specifying the name option, we have also implicitly specified that our inventory for any local testing will contain the host names elasticsearchNode1, elasticsearchNode2, and elasticsearchNode3. This is important, as we can then define host variables for each of them in the provisioner section:
provisioner:
  name: ansible
  inventory:
    host_vars:
      elasticsearchNode1:
        node_name: es_node_1
        is_master_node: true
      elasticsearchNode2:
        node_name: es_node_2
        is_master_node: true
      elasticsearchNode3:
        node_name: es_node_3
        is_master_node: false
We will be using these variables in a bit.
Actually installing Elasticsearch is pretty straightforward if you elect to use the deb distribution file. All we need is Java 8 as a prerequisite, which is available via the package manager on xenial64. Set your tasks/main.yml file to look like:
---
- name: ensure Java is installed
  apt:
    name: openjdk-8-jdk
    state: present
    update_cache: yes
  become: yes

- name: download deb package
  get_url:
    dest: "/etc/{{ elasticsearch_deb_file }}"
    url: "https://artifacts.elastic.co/downloads/elasticsearch/{{ elasticsearch_deb_file }}"
    checksum: "sha512:https://artifacts.elastic.co/downloads/elasticsearch/{{ elasticsearch_deb_file }}.sha512"
  become: yes

- name: install from deb package
  apt:
    deb: "/etc/{{ elasticsearch_deb_file }}"
  become: yes
We also need to define, at a minimum, the variables this playbook uses. For brevity's sake, I'm including a few variables here that will only become important later. Edit your defaults/main.yml file to look like:
---
node_name: example_node
is_master_node: true
elasticsearch_deb_version: 6.3.0
elasticsearch_deb_file: "elasticsearch-{{ elasticsearch_deb_version }}.deb"
cluster_name: my_cluster_name
elasticsearch_http_port_range: 9200-9300
The deb file automatically includes a systemd service file. By default, Elasticsearch looks for its configuration in the /etc/elasticsearch/elasticsearch.yml file. The real meat of installing Elasticsearch effectively (as is the case with most tools like it) is in the configuration, so that's where we'll go next.
We can use a Jinja2 template to make this playbook more reusable, utilizing many of the variables that were previously defined. First, create a templates/elasticsearch.yml.j2 file in the role's root directory, and populate it with the following:
cluster.name: {{ cluster_name }}
network:
  publish_host: {{ ansible_facts['all_ipv4_addresses'] | last }}
  bind_host: 0.0.0.0
http.port: {{ elasticsearch_http_port_range }}
transport.tcp.port: 9300
node.master: {{ is_master_node }}
node.name: {{ node_name }}
path:
  logs: /var/log/elasticsearch
  data: /var/lib/elasticsearch
discovery:
  zen:
    ping.unicast.hosts: [ '{{ hostvars['elasticsearchNode1']['ansible_facts']['all_ipv4_addresses'] | last }}:9300', '{{ hostvars['elasticsearchNode2']['ansible_facts']['all_ipv4_addresses'] | last }}:9300', '{{ hostvars['elasticsearchNode3']['ansible_facts']['all_ipv4_addresses'] | last }}:9300' ]
    minimum_master_nodes: 2
The most critical parts (the parts that make the cluster work together) are the network.publish_host value, which must be unique to each node, and the discovery.zen.ping.unicast.hosts value, which must contain the addresses where any member of the cluster can find the other members (using the transport.tcp.port value to know which port to look at). If there are multiple IP addresses on the box you're putting Elasticsearch on (e.g. 10.0.0.1 and 192.168.56.101), the node will fail to start unless you explicitly tell Elasticsearch which publish_host you want it to advertise itself on. It can bind to multiple hosts, but it can only publish itself to one.
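One piece is still missing: tasks to render this template into /etc/elasticsearch/elasticsearch.yml and start the service. A minimal sketch, appended to tasks/main.yml (the file mode and restart strategy here are my own choices, not requirements):

```yaml
- name: deploy elasticsearch configuration
  template:
    src: elasticsearch.yml.j2
    dest: /etc/elasticsearch/elasticsearch.yml
    owner: root
    group: elasticsearch
    mode: "0660"
  become: yes

- name: ensure elasticsearch is enabled and started
  systemd:
    name: elasticsearch
    enabled: yes
    state: restarted
    daemon_reload: yes
  become: yes
```

In a larger role you'd likely move the restart into a handler notified by the template task, so the service only bounces when the configuration actually changes.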
With the above configured, you should be able to run:
$ molecule create && molecule converge
And, eventually, the cluster should come up and sync with each other. If you hit http://192.168.56.101:9200, http://192.168.56.102:9200, or http://192.168.56.103:9200, you should see the same cluster_uuid in each case, letting you know that they are working together.
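If you'd rather automate that check, you can run it as a small Ansible playbook against the same inventory. Below is a sketch using the uri and assert modules (the file name verify.yml and the retry counts are my own choices; whether Molecule runs this for you automatically depends on your Molecule version and verifier configuration):

```yaml
# Sketch of a verify.yml playbook: each node queries its own HTTP
# endpoint and asserts it joined the cluster we configured.
- name: Verify that every node joined the same cluster
  hosts: all
  tasks:
    - name: query the local Elasticsearch HTTP endpoint
      uri:
        url: "http://localhost:9200"
        return_content: yes
      register: es_response
      retries: 10
      delay: 15
      until: es_response.status == 200

    - name: check that the node reports our cluster name
      assert:
        that:
          - es_response.json.cluster_name == cluster_name
```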
Nick Fisher is a software engineer in the Pacific Northwest. He focuses on building highly scalable and maintainable backend systems.