
21 March 2016

Consul Service Discovery. Part 2

In the first part we took an in-depth look at the problems and tasks a distributed application architecture has to solve. We reviewed the tools that can be used to solve them and noted how important it is to implement discovery at the design stage of a project. We also chose Consul as the base for our discovery service implementation.

In this second part we will look at how Consul works with the DNS protocol, describe the main HTTP API requests, clarify which types of health checks can be used and, of course, find out why the K/V storage is important. Most importantly, we will try some of these features out in practice.

DNS interface

Consul can answer requests over the DNS protocol, and any DNS client can be used to query it. The DNS interface is available to components on the local host on port 8600. Besides querying Consul directly, you can specify it as a resolver in the system and use it transparently for name resolution: it will proxy all external requests to a configured upstream "full" DNS server while resolving requests in the private .consul zone itself.
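
For example, with dnsmasq the .consul zone can be forwarded to the local agent while everything else goes to the usual upstream servers; the file name below is illustrative:

# /etc/dnsmasq.d/10-consul: forward the .consul zone to the local Consul agent
server=/consul/127.0.0.1#8600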

If the catalog contains several services with the same name but different IP addresses, Consul shuffles the addresses randomly in the response, implementing a primitive form of DNS load balancing.

You can either resolve a domain name directly within the cluster or perform a lookup; both service lookups and node lookups are supported.

The domain name format for DNS requests within a Consul cluster is strictly defined and not subject to change.

Cluster node

This is a typical DNS request that returns a cluster node's IP address given its name (the node name is set at agent startup with the -node parameter).

Let's look at the node name format for a DNS request:

<node>.node[.datacenter].<domain>
  • <node> – obligatory part, the node name;
  • .node – indicates that this is a node lookup;
  • [.datacenter] – optional part, the datacenter name (out of the box Consul can provide discovery for several datacenters within one cluster; by default the name "dc1" is used. If the datacenter name is omitted, the current datacenter is used, i.e. the one in which the agent receiving the request is running);
  • <domain> – obligatory part, Consul's private top-level domain, .consul by default.

So, the domain name for a node (for example, one named nodeservice) looks like this:

nodeservice.node.consul.

As you can see, the datacenter name is omitted, but the name can also be built with it:

nodeservice.node.dc1.consul.
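
Resolving such a name is an ordinary A query against the agent's DNS port (nodeservice is the hypothetical node name from above):

root@511cdc9dd19b:~# dig @127.0.0.1 -p 8600 nodeservice.node.consul. A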

Several nodes with the same name within one datacenter are not allowed.

Service

A service lookup by name is processed on all cluster nodes. It offers more possibilities than plain name resolution: besides requesting a service's IP address (an A record), we can request an SRV record and find out the ports on which the service runs.

This is what a typical lookup for the nodes running a service (named rls) looks like:

root@511cdc9dd19b:~# dig @127.0.0.1 -p 8600 rls.service.consul.

; <<>> DiG 9.9.5-3ubuntu0.7-Ubuntu <<>> @127.0.0.1 -p 8600 rls.service.consul.
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 26143
;; flags: qr aa rd; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;rls.service.consul.        IN  A

;; ANSWER SECTION:
rls.service.consul. 0   IN  A   172.17.0.2
rls.service.consul. 0   IN  A   172.17.0.3

;; Query time: 4 msec
;; SERVER: 127.0.0.1#8600(127.0.0.1)
;; WHEN: Thu Feb 18 07:23:00 UTC 2016
;; MSG SIZE  rcvd: 104

This response shows that there are two nodes in the cluster running the service (rls) and that Consul's DNS interface returns the IP addresses of all of them. If we repeat the request several times, we will see the records swap places, which means the first position is not reserved for the first server found. This is the simple DNS load balancing mentioned above.

If we request an SRV record, the response looks like this:

root@511cdc9dd19b:/# dig @127.0.0.1 -p 8600 rls.service.consul. SRV

; <<>> DiG 9.9.5-3ubuntu0.7-Ubuntu <<>> @127.0.0.1 -p 8600 rls.service.consul. SRV
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 8371
;; flags: qr aa rd; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 2
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;rls.service.consul.        IN  SRV

;; ANSWER SECTION:
rls.service.consul. 0   IN  SRV 1 1 80 agent-two.node.dc1.consul.
rls.service.consul. 0   IN  SRV 1 1 80 agent-one.node.dc1.consul.

;; ADDITIONAL SECTION:
agent-two.node.dc1.consul. 0    IN  A   172.17.0.3
agent-one.node.dc1.consul. 0    IN  A   172.17.0.2

;; Query time: 5 msec
;; SERVER: 127.0.0.1#8600(127.0.0.1)
;; WHEN: Thu Feb 18 07:39:22 UTC 2016
;; MSG SIZE  rcvd: 244

The ANSWER SECTION contains the domain names of the nodes in Consul's format (note that these are nodes, not services!) along with the ports on which the requested service runs. The nodes' IP addresses are listed in the ADDITIONAL SECTION of the response.

The service name format for a DNS request looks like this:

[tag.]<service>.service[.datacenter].<domain>
  • [tag.] – optional part, used to filter services by tag. If there are services with the same name but different tags, adding the tag name filters the response;
  • <service> – obligatory part, the service name;
  • .service – indicates that this is a service lookup;
  • [.datacenter] – optional part, the datacenter name;
  • <domain> – obligatory part, Consul's private top-level domain.

So, a service named nginx carrying the tag "web" can be represented by the domain name:

web.nginx.service.consul
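
An SRV query against such a tagged name follows the same pattern as before (nginx and web are hypothetical names used only for illustration):

root@511cdc9dd19b:~# dig @127.0.0.1 -p 8600 web.nginx.service.consul. SRV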

SRV requests for service lookups according to RFC 2782

Besides the "usual" way of building a domain name, we can build it according to the stricter rules of RFC 2782 (https://www.ietf.org/rfc/rfc2782.txt) when requesting an SRV record. The name format looks like this:

_service._tag.service[.datacenter].<domain>

The service name and the tag are prefixed with an underscore (_). (In the original RFC the second label is a protocol name rather than a tag; the underscores help prevent collisions with regular domain names.) In the RFC 2782 format, a service named nginx with the tag "web" looks like this:

_web._nginx.service.consul

The response is the same as for a "simple" request:

root@511cdc9dd19b:/# dig @127.0.0.1 -p 8600 _rls._rails.service.consul. SRV

; <<>> DiG 9.9.5-3ubuntu0.7-Ubuntu <<>> @127.0.0.1 -p 8600 _rls._rails.service.consul. SRV
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 26932
;; flags: qr aa rd; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 2
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;_rls._rails.service.consul.    IN  SRV

;; ANSWER SECTION:
_rls._rails.service.consul. 0   IN  SRV 1 1 80 agent-one.node.dc1.consul.
_rls._rails.service.consul. 0   IN  SRV 1 1 80 agent-two.node.dc1.consul.

;; ADDITIONAL SECTION:
agent-one.node.dc1.consul. 0    IN  A   172.17.0.2
agent-two.node.dc1.consul. 0    IN  A   172.17.0.3

;; Query time: 6 msec
;; SERVER: 127.0.0.1#8600(127.0.0.1)
;; WHEN: Thu Feb 18 07:52:59 UTC 2016
;; MSG SIZE  rcvd: 268

By default, all domain names within Consul have TTL = 0, which means they are not cached at all. This is important to keep in mind.

HTTP API

The HTTP REST API is the main tool for managing a Consul cluster and offers a wide range of possibilities. The API exposes 10 endpoints, each providing access to the configuration of a particular functional aspect of Consul. The Consul documentation (https://www.consul.io/docs/agent/http.html) describes the endpoints in detail; here is a brief overview to give an idea of what the API can do:

  • acl – access control;
  • agent - Consul agent management;
  • catalog – cluster nodes and services management;
  • coordinate – network coordinates;
  • event – custom events;
  • health – availability check;
  • kv - Key/Value storage;
  • query – prepared queries;
  • session - sessions;
  • status – system status.

acl

As the name suggests, acl manages access control to Consul's facilities. It lets us regulate access to reading and changing data about services, nodes and custom events, and also control access to the k/v storage.
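
As a rough sketch, creating a read-only token with the legacy ACL API of that era could look like this; the token name and rules are illustrative, and <management-token> is a placeholder for a token with management rights:

# Create a client token allowed only to read the k/v storage (illustrative)
curl -X PUT "http://127.0.0.1:8500/v1/acl/create?token=<management-token>" \
    -d '{"Name": "readonly-kv", "Type": "client", "Rules": "key \"\" { policy = \"read\" }"}'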

agent

Manages the local Consul agent. All operations on this endpoint affect the local agent's data. It is possible to get information about the agent's current state and its role in the cluster, as well as to manage local services. Changes made to local services are synchronized with all cluster nodes.
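
For illustration, registering a local service through the agent might look like this, a sketch assuming the default HTTP API port 8500 and the service name rls from the DNS examples:

# Register a service named rls on port 80 with the local agent
curl -X PUT http://127.0.0.1:8500/v1/agent/service/register \
    -d '{"Name": "rls", "Port": 80}'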

catalog

Manages Consul's global registry; work with nodes and services is concentrated here. Within this endpoint it is possible to register and deregister services. Note, though, that for service registration the agent endpoints are usually preferable to the catalog: they are simpler to use and they participate in anti-entropy (https://www.consul.io/docs/internals/anti-entropy.html).
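
Reading from the catalog takes a couple of requests (again assuming the agent's default HTTP port 8500):

# List all known services, then all nodes in the current datacenter
curl http://127.0.0.1:8500/v1/catalog/services
curl http://127.0.0.1:8500/v1/catalog/nodes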

coordinate

Consul uses network tomography (https://en.wikipedia.org/wiki/Network_tomography) to measure network coordinates. The coordinates are used to build efficient routes within the cluster and for many other useful features, such as finding the nearest node providing a given service or switching over to the nearest datacenter in case of a failure. The API functions in this section are only used to read the current state of the network coordinates.
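
Since the coordinate endpoints are read-only, querying them is straightforward (default HTTP port 8500 assumed):

# Current network coordinates of the datacenters and of the local nodes
curl http://127.0.0.1:8500/v1/coordinate/datacenters
curl http://127.0.0.1:8500/v1/coordinate/nodes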

event

Handles custom events. Custom events are used to perform actions across the cluster: automatic deployments, service restarts, running particular scripts, and other orchestration tasks.
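
Firing an event is a single request; the event name deploy and the payload below are illustrative:

# Fire a custom event that agents can react to
curl -X PUT http://127.0.0.1:8500/v1/event/fire/deploy -d 'release-1.2.3'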

health

Checks the current state of nodes and services. This endpoint is read-only; it returns the current state of nodes and services, together with the lists of checks performed.
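
For example, to see which nodes run a healthy instance of the rls service from our earlier examples (the passing flag filters out instances with failing checks):

# Health information for the rls service, healthy instances only
curl http://127.0.0.1:8500/v1/health/service/rls?passing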

kv

This endpoint has only one method, used to manage data in the distributed key/value storage provided by Consul. The method looks like this:

/v1/kv/<key>

What happens is determined by the HTTP request method: GET returns the value stored under the key, PUT saves a new value or overwrites the old one, and DELETE removes the record.
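
A full cycle over an illustrative key looks like this (note that a plain GET returns JSON with the value base64-encoded, while the raw flag returns the bare value):

# Store, read and delete a value
curl -X PUT http://127.0.0.1:8500/v1/kv/config/db_host -d '10.0.0.5'
curl http://127.0.0.1:8500/v1/kv/config/db_host
curl http://127.0.0.1:8500/v1/kv/config/db_host?raw
curl -X DELETE http://127.0.0.1:8500/v1/kv/config/db_host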

query

Manages prepared queries. These queries allow complex manipulations of the Consul configuration to be prepared once, saved, and executed later. A saved query gets a unique ID, so it can be executed at any time without being prepared again.
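
A sketch of that flow, with illustrative names and <query-id> standing for the ID returned on creation:

# Create a prepared query for the rls service...
curl -X POST http://127.0.0.1:8500/v1/query \
    -d '{"Name": "rls-query", "Service": {"Service": "rls"}}'
# ...and execute it later by its ID
curl http://127.0.0.1:8500/v1/query/<query-id>/execute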

session

The session mechanism in Consul is used to build distributed locks. Sessions form a binding layer between nodes, health checks and the k/v storage. Each session has a name that can be saved in the storage; that name is used for locking so that nodes and services can act sequentially in a concurrent environment. How sessions work is described in the Consul documentation: https://www.consul.io/docs/internals/sessions.html.
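
A minimal lock sequence might look like this; the key name is illustrative and <session-id> stands for the ID returned by session creation:

# Create a session, then acquire and release a lock with it
curl -X PUT http://127.0.0.1:8500/v1/session/create -d '{"Name": "my-lock"}'
curl -X PUT "http://127.0.0.1:8500/v1/kv/locks/job?acquire=<session-id>" -d 'worker-1'
curl -X PUT "http://127.0.0.1:8500/v1/kv/locks/job?release=<session-id>"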

status

This endpoint is used to obtain information about the cluster status: it lets you find out the current leader and get information about the cluster members.
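
Both pieces of information are one request away (default HTTP port 8500 assumed):

# Address of the current leader and the list of peers
curl http://127.0.0.1:8500/v1/status/leader
curl http://127.0.0.1:8500/v1/status/peers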

Health Checks

Earlier we talked about distributing load evenly with the help of DNS; now let's look at the mechanism for checking the state of nodes and services. A health check is a periodic operation whose result tells us the condition of the system being tested. In effect this is automatic monitoring that keeps the cluster in working order: it removes failed nodes and services and returns them once they recover. Consul supports several types of checks (a sample check definition follows the list):

  • Script check – runs a given script on a given node at a set interval. The exit code determines the result (any non-zero code means the check failed) and turns the node or service on or off accordingly.
  • HTTP check – tries to fetch the given URL and turns the tested object on or off depending on the response code (any 2xx code is a pass, 429 Too Many Requests produces a warning, all other codes report an error).
  • TCP check – tries to establish a TCP connection to the given address and port at a set interval. A connection failure means the check failed.
  • TTL check – a passive check that must be updated periodically through the HTTP API. The service has to report regularly that it is alive; if no report arrives within the given interval, the check is considered failed.
  • Docker check – a check for services running in Docker containers. Using the Docker Exec API, Consul runs a script located inside the container; the result depends on the exit code (any non-zero code means the check failed).
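
As a minimal sketch, an HTTP check can be declared in the agent's configuration directory like this (the check name, URL and intervals below are all illustrative):

{
  "check": {
    "name": "rls-http",
    "http": "http://localhost:80/health",
    "interval": "10s",
    "timeout": "1s"
  }
}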

K/V storage

The storage provided by Consul is a distributed key/value database that can be used to keep any data accessible to every cluster member (subject to ACL rules, of course). Services can store data in it that other members of the cluster need: configuration option values, results of some computation, or, as noted above, the k/v storage can be used together with the session mechanism to implement distributed locks. Using the k/v storage makes the cluster more effective and reduces the share of manual configuration: services can adjust their state according to the information the cluster publishes in the storage. Note: do not keep data belonging to the business logic of your services in this storage. The storage Consul provides is meant for keeping and distributing meta-information about the state of cluster members, not the data those members process.

Conclusion

It is hard to overestimate the role of a discovery service in building a distributed architecture for a large project, and Consul is a perfect fit for that role. The product is constantly developing and does not stand still; a lot of useful functionality needed to comfortably maintain a system with many components has already been implemented. On top of that, Consul is written in Go and is distributed as a single executable file, which makes updating and maintenance very convenient.