Crate Cluster not forming on Docker Swarm
CrateDB version:
3.0.5 (but it seems to affect any version > 2.3.11)
Environment description:
- Official Crate Docker Image, e.g. crate:3.0.5
- 2-node local swarm (also tried with 3, but 2 is faster to reproduce)
- Docker version 18.06.1-ce, build e68fc7a
- All replicas attached to a docker overlay network
Problem description:
The latest version of CrateDB (3.0.7) is not able to form a cluster when running in Docker Swarm mode, as it used to up to version 2.3.11.
The previous approach was to use Docker’s dnsrr endpoint mode and set -Cdiscovery.zen.ping.unicast.hosts to the name of the Docker service, so that discovery would eventually gather all actual container endpoints. This was combined with the -Cnetwork.host flag, which I used to set to 0.0.0.0 (none of the other options, _local_ or _site_, work now).
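To illustrate what the dnsrr-based setup relies on: with endpoint_mode: dnsrr, a lookup of the service name is expected to return one address per running task, and each address becomes a discovery candidate on the transport port (4300 in the logs below). The following Python sketch performs that lookup the way a resolver-based discovery would; the helper name resolve_unicast_host is hypothetical and not a CrateDB API. When the name does not resolve, it returns an empty list, which corresponds to the UnknownHostException in the logs.

```python
import socket
from typing import List


def resolve_unicast_host(name: str, port: int = 4300) -> List[str]:
    """Return every IPv4 address that `name` resolves to.

    Hypothetical helper: under dnsrr, the service name is expected to
    return one A record per running task, so each address is a
    candidate peer on `port` (the transport port).
    """
    try:
        infos = socket.getaddrinfo(name, port, family=socket.AF_INET,
                                   type=socket.SOCK_STREAM)
    except socket.gaierror:
        # Same situation as "UnknownHostException: crate: Name does not
        # resolve": the name is not resolvable from this container.
        return []
    # getaddrinfo may repeat an address; keep unique ones in order.
    addresses: List[str] = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]
        if ip not in addresses:
            addresses.append(ip)
    return addresses
```

Run from inside one of the task containers, e.g. resolve_unicast_host("crate"): on a healthy two-node global service one would expect two addresses, while an empty list reproduces the failure seen in the logs.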
The issue seems to be in the discovery process (see logs at the end).
I would like to understand better what the issue is under the hood, so as to judge:
- whether there is something I can do;
- which options we have.
I also thought opening this issue would be a good way to stay informed about what is decided on the matter.
Thanks for your work!
Steps to reproduce:
Running the following docker-compose.yml is enough:
version: '3.3'
services:
  crate:
    image: crate:3.0.5
    command: ["crate",
              "-Clicense.enterprise=false",
              "-Cgateway.expected_nodes=2",
              "-Cgateway.recover_after_nodes=1",
              "-Cgateway.recover_after_time=5m",
              "-Cdiscovery.zen.minimum_master_nodes=1",
              "-Cdiscovery.zen.ping.unicast.hosts=crate",
              "-Cdiscovery.zen.ping_timeout=15s",
              "-Cnetwork.host=_local_",
              "-Chttp.cors.enabled=true",
              '-Chttp.cors.allow-origin="*"']
    environment:
      - MAX_MAP_COUNT=262144
      - ES_JAVA_OPTS="-Xms1g -Xmx1g"
      - CRATE_HEAP_SIZE=1g
    deploy:
      endpoint_mode: dnsrr
      mode: global
      labels:
        - "traefik.port=4200"
        - "traefik.frontend.rule=Host:crate.mydomain.com"
        - "traefik.backend.loadbalancer.sticky=true"
        - "traefik.backend=crate"
        - "traefik.backend.loadbalancer.swarm=false"
      update_config:
        parallelism: 1
        delay: 10s
    volumes:
      - cratedata:/data
    networks:
      - backend
volumes:
  cratedata:
networks:
  backend:
    driver: overlay
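The gateway.* settings in this file explain the "delaying initial state recovery for [5m]" line in the logs. As a rough model (a simplification written for illustration, not CrateDB’s actual code; GatewaySettings and recovery_decision are hypothetical names), the decision of when to recover the initial cluster state looks like this:

```python
from dataclasses import dataclass


@dataclass
class GatewaySettings:
    expected_nodes: int        # gateway.expected_nodes
    recover_after_nodes: int   # gateway.recover_after_nodes
    recover_after_time_s: int  # gateway.recover_after_time, in seconds


def recovery_decision(s: GatewaySettings, joined_nodes: int) -> str:
    """Simplified model of when initial cluster state recovery starts."""
    if joined_nodes < s.recover_after_nodes:
        # Below the hard minimum: do not recover at all yet.
        return "wait: not enough nodes to recover"
    if joined_nodes >= s.expected_nodes:
        # All expected nodes are present: no reason to wait.
        return "recover now"
    # Minimum met but not all expected nodes: wait up to recover_after_time.
    return (f"delay {s.recover_after_time_s}s: expecting "
            f"{s.expected_nodes} nodes, have {joined_nodes}")


settings = GatewaySettings(expected_nodes=2, recover_after_nodes=1,
                           recover_after_time_s=300)
print(recovery_decision(settings, 1))  # prints "delay 300s: expecting 2 nodes, have 1"
```

With the values above and only one node joined, the model yields a 300 s delay, which matches the logged message "expecting [2] nodes, but only have [1]": the single node never sees its peer, so it waits out recover_after_time alone.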
Additional Logs:
crate3_crate.0.t16mhtw8djlh@ms-worker0 | [2018-08-30T14:03:35,812][INFO ][o.e.n.Node ] [Tête du Clotonnet] initializing ...
crate3_crate.0.t16mhtw8djlh@ms-worker0 | [2018-08-30T14:03:36,037][INFO ][o.e.e.NodeEnvironment ] [Tête du Clotonnet] using [1] data paths, mounts [[/data (/dev/sda1)]], net usable_space [15.3gb], net total_space [17.8gb], types [ext4]
crate3_crate.0.t16mhtw8djlh@ms-worker0 | [2018-08-30T14:03:36,039][INFO ][o.e.e.NodeEnvironment ] [Tête du Clotonnet] heap size [1015.6mb], compressed ordinary object pointers [true]
crate3_crate.0.t16mhtw8djlh@ms-worker0 | [2018-08-30T14:03:37,385][INFO ][i.c.plugin ] [Tête du Clotonnet] plugins loaded: [jmx-monitoring, hyperLogLog, lang-js]
crate3_crate.0.t16mhtw8djlh@ms-worker0 | SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
crate3_crate.0.t16mhtw8djlh@ms-worker0 | SLF4J: Defaulting to no-operation (NOP) logger implementation
crate3_crate.0.t16mhtw8djlh@ms-worker0 | SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
crate3_crate.0.t16mhtw8djlh@ms-worker0 | [2018-08-30T14:03:39,398][INFO ][o.e.p.PluginsService ] [Tête du Clotonnet] no modules loaded
crate3_crate.0.t16mhtw8djlh@ms-worker0 | [2018-08-30T14:03:39,410][INFO ][o.e.p.PluginsService ] [Tête du Clotonnet] loaded plugin [crate-azure-discovery]
crate3_crate.0.t16mhtw8djlh@ms-worker0 | [2018-08-30T14:03:39,411][INFO ][o.e.p.PluginsService ] [Tête du Clotonnet] loaded plugin [es-repository-hdfs]
crate3_crate.0.t16mhtw8djlh@ms-worker0 | [2018-08-30T14:03:39,411][INFO ][o.e.p.PluginsService ] [Tête du Clotonnet] loaded plugin [io.crate.plugin.BlobPlugin]
crate3_crate.0.t16mhtw8djlh@ms-worker0 | [2018-08-30T14:03:39,411][INFO ][o.e.p.PluginsService ] [Tête du Clotonnet] loaded plugin [io.crate.plugin.CrateCorePlugin]
crate3_crate.0.t16mhtw8djlh@ms-worker0 | [2018-08-30T14:03:39,411][INFO ][o.e.p.PluginsService ] [Tête du Clotonnet] loaded plugin [io.crate.plugin.HttpTransportPlugin]
crate3_crate.0.t16mhtw8djlh@ms-worker0 | [2018-08-30T14:03:39,411][INFO ][o.e.p.PluginsService ] [Tête du Clotonnet] loaded plugin [io.crate.plugin.PluginLoaderPlugin]
crate3_crate.0.t16mhtw8djlh@ms-worker0 | [2018-08-30T14:03:39,411][INFO ][o.e.p.PluginsService ] [Tête du Clotonnet] loaded plugin [io.crate.plugin.SrvPlugin]
crate3_crate.0.t16mhtw8djlh@ms-worker0 | [2018-08-30T14:03:39,412][INFO ][o.e.p.PluginsService ] [Tête du Clotonnet] loaded plugin [io.crate.udc.plugin.UDCPlugin]
crate3_crate.0.t16mhtw8djlh@ms-worker0 | [2018-08-30T14:03:39,412][INFO ][o.e.p.PluginsService ] [Tête du Clotonnet] loaded plugin [org.elasticsearch.analysis.common.CommonAnalysisPlugin]
crate3_crate.0.t16mhtw8djlh@ms-worker0 | [2018-08-30T14:03:39,414][INFO ][o.e.p.PluginsService ] [Tête du Clotonnet] loaded plugin [org.elasticsearch.discovery.ec2.Ec2DiscoveryPlugin]
crate3_crate.0.t16mhtw8djlh@ms-worker0 | [2018-08-30T14:03:39,414][INFO ][o.e.p.PluginsService ] [Tête du Clotonnet] loaded plugin [org.elasticsearch.plugin.repository.url.URLRepositoryPlugin]
crate3_crate.0.t16mhtw8djlh@ms-worker0 | [2018-08-30T14:03:39,415][INFO ][o.e.p.PluginsService ] [Tête du Clotonnet] loaded plugin [org.elasticsearch.repositories.s3.S3RepositoryPlugin]
crate3_crate.0.t16mhtw8djlh@ms-worker0 | [2018-08-30T14:03:39,415][INFO ][o.e.p.PluginsService ] [Tête du Clotonnet] loaded plugin [org.elasticsearch.transport.Netty4Plugin]
crate3_crate.0.t16mhtw8djlh@ms-worker0 | [2018-08-30T14:03:39,480][INFO ][o.e.n.Node ] [Tête du Clotonnet] node name [Tête du Clotonnet], node ID [2HtcoZYZT_m-W6Onr4sZJg]
crate3_crate.0.t16mhtw8djlh@ms-worker0 | [2018-08-30T14:03:39,548][INFO ][o.e.n.Node ] [Tête du Clotonnet] CrateDB version[3.0.5], pid[1], build[8970370/2018-07-31T06:18:44Z], OS[Linux/4.9.93-boot2docker/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/1.8.0_171/25.171-b11]
crate3_crate.0.t16mhtw8djlh@ms-worker0 | [2018-08-30T14:03:39,549][INFO ][o.e.n.Node ] [Tête du Clotonnet] JVM arguments [-Xms1g, -Xmx1g, -Djava.awt.headless=true, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -Xloggc:/data/log/gc.log, -XX:+PrintGCDetails, -XX:+PrintGCDateStamps, -XX:+PrintTenuringDistribution, -XX:+PrintGCApplicationStoppedTime, -XX:+UseGCLogFileRotation, -XX:NumberOfGCLogFiles=16, -XX:GCLogFileSize=64m, -XX:+DisableExplicitGC, -Dfile.encoding=UTF-8, -Djna.nosys=true, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Dlog4j.skipJansi=true, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -XX:+HeapDumpOnOutOfMemoryError, -XX:HeapDumpPath=/data/data, -XX:+UnlockExperimentalVMOptions, -XX:+UseCGroupMemoryLimitForHeap, -Des.cgroups.hierarchy.override=/]
crate3_crate.0.t16mhtw8djlh@ms-worker0 | [2018-08-30T14:03:43,177][INFO ][o.e.d.DiscoveryModule ] [Tête du Clotonnet] using discovery type [zen]
crate3_crate.0.t16mhtw8djlh@ms-worker0 | [2018-08-30T14:03:45,824][INFO ][i.c.p.s.SslContextProvider] HTTP SSL support is disabled.
crate3_crate.0.t16mhtw8djlh@ms-worker0 | [2018-08-30T14:03:46,878][INFO ][o.e.n.Node ] [Tête du Clotonnet] initialized
crate3_crate.0.t16mhtw8djlh@ms-worker0 | [2018-08-30T14:03:46,879][INFO ][o.e.n.Node ] [Tête du Clotonnet] starting ...
crate3_crate.0.t16mhtw8djlh@ms-worker0 | [2018-08-30T14:03:47,057][INFO ][psql ] [Tête du Clotonnet] PSQL SSL support is disabled.
crate3_crate.0.t16mhtw8djlh@ms-worker0 | [2018-08-30T14:03:47,262][INFO ][psql ] [Tête du Clotonnet] publish_address {127.0.0.1:5432}, bound_addresses {127.0.0.1:5432}
crate3_crate.0.t16mhtw8djlh@ms-worker0 | [2018-08-30T14:03:47,315][INFO ][i.c.p.h.CrateNettyHttpServerTransport] [Tête du Clotonnet] publish_address {127.0.0.1:4200}, bound_addresses {127.0.0.1:4200}
crate3_crate.0.t16mhtw8djlh@ms-worker0 | [2018-08-30T14:03:47,352][INFO ][o.e.t.TransportService ] [Tête du Clotonnet] publish_address {127.0.0.1:4300}, bound_addresses {127.0.0.1:4300}
crate3_crate.0.t16mhtw8djlh@ms-worker0 | [2018-08-30T14:03:50,145][WARN ][o.e.d.z.UnicastZenPing ] [Tête du Clotonnet] failed to resolve host [crate]
crate3_crate.0.t16mhtw8djlh@ms-worker0 | java.net.UnknownHostException: crate: Name does not resolve
crate3_crate.0.t16mhtw8djlh@ms-worker0 | at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method) ~[?:1.8.0_171]
crate3_crate.0.t16mhtw8djlh@ms-worker0 | at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928) ~[?:1.8.0_171]
crate3_crate.0.t16mhtw8djlh@ms-worker0 | at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323) ~[?:1.8.0_171]
crate3_crate.0.t16mhtw8djlh@ms-worker0 | at java.net.InetAddress.getAllByName0(InetAddress.java:1276) ~[?:1.8.0_171]
crate3_crate.0.t16mhtw8djlh@ms-worker0 | at java.net.InetAddress.getAllByName(InetAddress.java:1192) ~[?:1.8.0_171]
crate3_crate.0.t16mhtw8djlh@ms-worker0 | at java.net.InetAddress.getAllByName(InetAddress.java:1126) ~[?:1.8.0_171]
crate3_crate.0.t16mhtw8djlh@ms-worker0 | at org.elasticsearch.transport.TcpTransport.parse(TcpTransport.java:917) ~[crate-app-3.0.5.jar:3.0.5]
crate3_crate.0.t16mhtw8djlh@ms-worker0 | at org.elasticsearch.transport.TcpTransport.addressesFromString(TcpTransport.java:872) ~[crate-app-3.0.5.jar:3.0.5]
crate3_crate.0.t16mhtw8djlh@ms-worker0 | at org.elasticsearch.transport.TransportService.addressesFromString(TransportService.java:699) ~[crate-app-3.0.5.jar:3.0.5]
crate3_crate.0.t16mhtw8djlh@ms-worker0 | at org.elasticsearch.discovery.zen.UnicastZenPing.lambda$null$0(UnicastZenPing.java:213) ~[crate-app-3.0.5.jar:3.0.5]
crate3_crate.0.t16mhtw8djlh@ms-worker0 | at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_171]
crate3_crate.0.t16mhtw8djlh@ms-worker0 | at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:568) [crate-app-3.0.5.jar:3.0.5]
crate3_crate.0.t16mhtw8djlh@ms-worker0 | at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_171]
crate3_crate.0.t16mhtw8djlh@ms-worker0 | at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_171]
crate3_crate.0.t16mhtw8djlh@ms-worker0 | at java.lang.Thread.run(Thread.java:748) [?:1.8.0_171]
crate3_crate.0.t16mhtw8djlh@ms-worker0 | [2018-08-30T14:04:05,192][INFO ][o.e.c.s.MasterService ] [Tête du Clotonnet] zen-disco-elected-as-master ([0] nodes joined), reason: new_master {Tête du Clotonnet}{2HtcoZYZT_m-W6Onr4sZJg}{eX1oGcS7Q1CKRLQlpZKjow}{127.0.0.1}{127.0.0.1:4300}{http_address=127.0.0.1:4200}
crate3_crate.0.t16mhtw8djlh@ms-worker0 | [2018-08-30T14:04:05,208][INFO ][o.e.c.s.ClusterApplierService] [Tête du Clotonnet] new_master {Tête du Clotonnet}{2HtcoZYZT_m-W6Onr4sZJg}{eX1oGcS7Q1CKRLQlpZKjow}{127.0.0.1}{127.0.0.1:4300}{http_address=127.0.0.1:4200}, reason: apply cluster state (from master [master {Tête du Clotonnet}{2HtcoZYZT_m-W6Onr4sZJg}{eX1oGcS7Q1CKRLQlpZKjow}{127.0.0.1}{127.0.0.1:4300}{http_address=127.0.0.1:4200} committed version [1] source [zen-disco-elected-as-master ([0] nodes joined)]])
crate3_crate.0.t16mhtw8djlh@ms-worker0 | [2018-08-30T14:04:05,216][INFO ][o.e.g.GatewayService ] [Tête du Clotonnet] delaying initial state recovery for [5m]. expecting [2] nodes, but only have [1]
crate3_crate.0.t16mhtw8djlh@ms-worker0 | [2018-08-30T14:04:05,223][INFO ][o.e.n.Node ] [Tête du Clotonnet] started
crate3_crate.0.t16mhtw8djlh@ms-worker0 | [2018-08-30T14:05:04,564][INFO ][o.e.n.Node ] [Tête du Clotonnet] stopping ...
crate3_crate.0.t16mhtw8djlh@ms-worker0 | [2018-08-30T14:05:04,719][INFO ][o.e.n.Node ] [Tête du Clotonnet] stopped
crate3_crate.0.t16mhtw8djlh@ms-worker0 | [2018-08-30T14:05:04,720][INFO ][o.e.n.Node ] [Tête du Clotonnet] closing ...
crate3_crate.0.t16mhtw8djlh@ms-worker0 | [2018-08-30T14:05:04,786][INFO ][o.e.n.Node ] [Tête du Clotonnet] closed
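Two symptoms stand out in these logs: the node publishes its PostgreSQL, HTTP and transport ports on 127.0.0.1 (consistent with -Cnetwork.host=_local_), an address no other container can reach, and the unicast host crate does not resolve. A minimal sketch of a log check for both, assuming the log format shown above (diagnose is a hypothetical helper, not part of CrateDB):

```python
import ipaddress
import re
from typing import List

# Matches e.g. "publish_address {127.0.0.1:4300}" (IPv4 only).
PUBLISH_RE = re.compile(r"publish_address \{([0-9.]+):(\d+)\}")
# Matches e.g. "failed to resolve host [crate]".
UNRESOLVED_RE = re.compile(r"failed to resolve host \[([^\]]+)\]")


def diagnose(log_lines: List[str]) -> List[str]:
    """Flag log symptoms that keep other nodes from joining."""
    findings: List[str] = []
    for line in log_lines:
        m = PUBLISH_RE.search(line)
        if m and ipaddress.ip_address(m.group(1)).is_loopback:
            # A loopback publish address is only reachable from inside
            # the same container, so peers cannot connect to it.
            findings.append(f"port {m.group(2)} published on loopback {m.group(1)}")
        m = UNRESOLVED_RE.search(line)
        if m:
            findings.append(f"unicast host {m.group(1)!r} did not resolve")
    return findings
```

Feeding it the lines above would flag ports 5432, 4200 and 4300 on 127.0.0.1 plus the unresolved host crate; the two findings point to separate problems (binding and DNS), and either one alone would prevent the cluster from forming.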
Update: bumped the latest working (2.3.11) and non-working (3.0.7) versions on Docker Swarm.
Top GitHub Comments
@taliaga I’m having the same issue, but we have enterprise support. I’ll let you know when we get to the bottom of this behavior.
Thanks @quodt, that works with crate:2.3.11, so I’ll update the “highest working version”. I have just tried the same setting with 3.0.7, but the reported problem persists.