# Docker: Fix for services that cannot see each other by name when they run on different swarm nodes

October 18, 2017

Recently, after one of the updates of the Docker binaries, I started to have a strange issue in a Docker Swarm setup I had configured here.
Services were no longer able to see each other if they were running on different nodes. If two services were running on the same swarm node, there was no issue.
This is wrong, as the whole idea of running services on a swarm is to hide from each service where the others are actually running.
I found nothing in the Docker documentation indicating that anything had changed in the way a swarm is deployed.
The steps I used are:

STEP 1: Initialize the swarm on a manager node

[root@nas2 ~]# docker swarm init --advertise-addr 192.168.2.22
Swarm initialized: current node (kvanolk8l14nitzm9z2w5wg6i) is now a manager.
 
To add a worker to this swarm, run the following command:
...

STEP 2: Add a worker node to the swarm

[root@nas1 storage]# docker swarm join \
>     --token SWMTKN-1-5qwfhoezr4eq7edg7fg2gvtv5jl5m66br4vnugl63u9b9kk113-79168ao7nlq8dsn5kqzfxx9eh \
>     192.168.2.22:2377

Note that the above is exactly what the official Docker documentation describes, and also what the command line suggests.

Note that in STEP 1 the manager node advertises an address to which the worker nodes connect during enrolment. It seems that this address is also used later for the management operations on the swarm. Strangely enough, the worker in STEP 2 does not advertise any address. This can be a problem if a worker has more than one IP, because we never know which one it uses to register with the swarm. Until now, the swarm seemed to be able to find the right address of the worker on its own.
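To see which address each node actually registered with, a check along the following lines can be run on a manager. This is a minimal sketch, not part of the original setup: it assumes a POSIX shell, that `docker node inspect` exposes each node's registered address as `.Status.Addr` on the Docker version in use, and that the expected network can be matched with a simple string prefix such as `192.168.2.`. The function name `check_addrs` is hypothetical.

```shell
# Hypothetical helper: read "hostname address" pairs on stdin and report
# every node whose address does not start with the expected prefix.
# The prefix is a plain string match, so keep the trailing dot
# (e.g. "192.168.2.") to avoid also matching 192.168.20.x.
check_addrs() {
  prefix="$1"
  while read -r host addr; do
    case "$addr" in
      "$prefix"*) ;;  # address is in the expected network: nothing to report
      *) echo "$host advertises $addr (expected $prefix*)" ;;
    esac
  done
}

# Usage on a manager node (requires Docker; assumes .Status.Addr holds
# the address each node registered with):
#   docker node ls -q \
#     | xargs docker node inspect --format '{{ .Description.Hostname }} {{ .Status.Addr }}' \
#     | check_addrs 192.168.2.
```

A worker that shows up in the output has registered with an address outside the expected network, which is the situation described above.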

After one Docker update something changed, and the swarm no longer finds the “good” IP. What I discovered is that the worker enrolment command also has the “--advertise-addr” option. By changing STEP 2 to the following:

STEP 2′: Add a worker node to the swarm

[root@nas1 storage]# docker swarm join \
>     --token SWMTKN-1-5qwfhoezr4eq7edg7fg2gvtv5jl5m66br4vnugl63u9b9kk113-79168ao7nlq8dsn5kqzfxx9eh \
>     192.168.2.22:2377 --advertise-addr 192.168.2.122

where 192.168.2.122 is one of the worker's IPs. Suddenly everything worked as before: all the services in the swarm can see each other, regardless of the node they are running on.
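One way to confirm the fix is to pin a service to one node and reach it by name from a container on another node. This is a sketch under stated assumptions, not the procedure used in the original setup: the network name `dnstest`, the service name `web` and the three function names are hypothetical, and the `nginx:alpine` and `alpine` images are assumed to be pullable on both nodes.

```shell
# Hypothetical names throughout: overlay network "dnstest", service "web".

# On a manager: create an attachable overlay network and pin an nginx
# service to the node whose hostname is given as $1 (e.g. nas2).
start_web_on() {
  docker network create --driver overlay --attachable dnstest
  docker service create --name web --network dnstest \
    --constraint "node.hostname==$1" nginx:alpine
}

# On a different node (e.g. nas1): fetch the nginx page by service name
# from a throwaway container on the same overlay network. This exercises
# both name resolution and cross-node traffic.
fetch_web() {
  docker run --rm --network dnstest alpine wget -q -O - http://web
}

# On a manager: remove the test service and network.
cleanup_web() {
  docker service rm web
  docker network rm dnstest
}
```

For example, run `start_web_on nas2` on the manager and then `fetch_web` on nas1. If the worker registered with an unusable address, the `wget` call would be expected to fail or time out instead of printing the nginx welcome page.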
