Elasticsearch unable to start because of java.lang.IllegalStateException: Failed to create node environment

Describe the bug

The Elasticsearch image is not able to create the “node environment” in the mounted (persistent) /usr/share/elasticsearch/data directory. This is a permission issue related to fsGroup (it was previously set to 0). The Java exception is: java.lang.IllegalStateException: Failed to create node environment

To Reproduce

Steps to reproduce the behavior:

  1. Create an Elasticsearch deployment
  2. See error

Expected behavior

The Elasticsearch pod is up and running.

Additional context

Tested by adding fsGroup: 0 to the deployment securityContext, and it works as expected.

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

1 reaction
Schnitzel commented, Apr 15, 2021

I did some more research on this, as I was confused why this has not been a problem earlier.

So it turns out that if you mount a NEW PVC into a container, without any securityContext.fsGroup setting in the Pod, this is how the PVC is mounted into the container:

[lagoonproject]dev@elasticsearch:/usr/share/elasticsearch/data$ ls -lisa
total 24
      2  4 drwxr-xr-x 3 root          root  4096 Apr 15 21:21 .
1851635  4 drwxrwxr-x 1 elasticsearch root  4096 Feb 13  2019 ..
     11 16 drwx------ 2 root          root 16384 Apr 15 21:21 lost+found

Note the drwxr-xr-x on the main folder. The container itself is running as user/group root/root, so technically it should have write access (user root is the owner of the folder and has write access), but the elasticsearch process is started under the user/group elasticsearch/root, see here:

[lagoonproject]dev@elasticsearch:/usr/share/elasticsearch$ ps -aux   
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root           1  0.0  0.0   4364   624 ?        Ss   10:41   0:00 /sbin/tini -- /lagoon/entrypoints.bash /usr/local/bin/docker-entrypoint.sh
elastic+       6  0.2  2.9 3917836 480732 ?      Sl   10:41   2:12 /opt/jdk-11.0.1/bin/java -Xms1g -Xmx1g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupan

which means the elasticsearch user has no write access to the PVC (user doesn’t match, and the group matches, but the group does not have write access).
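
The reasoning above can be sketched in a few lines of Python. This is only a simplified model of the classic owner/group/other check (it ignores ACLs, capabilities, and the superuser override), and the function name `can_write` and its parameters are made up for illustration. The mode string, owner, and group are taken from the `ls` output above.

```python
def can_write(mode, owner, group, user, user_groups):
    """Simplified POSIX write check against an `ls -l` style mode string.

    Assumption: plain owner/group/other bits only; no ACLs, no capabilities,
    and no special-casing of uid 0.
    """
    perms = mode[1:]            # drop the file-type character ('d', '-', ...)
    if user == owner:
        return perms[1] == "w"  # owner class: only the owner bits apply
    if group in user_groups:
        return perms[4] == "w"  # group class: only the group bits apply
    return perms[7] == "w"      # everyone else


# Data directory on a fresh PVC, as listed above: drwxr-xr-x root root.
# The Elasticsearch process runs as user "elasticsearch" with group "root".
print(can_write("drwxr-xr-x", "root", "root", "elasticsearch", {"root"}))
```

The user does not match the owner, so the owner bits are skipped; the group matches, but the group bits are `r-x`, so the check fails.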

Now, if you set securityContext.fsGroup: 0 inside the pod it looks like this:

[lagoonproject]dev@elasticsearch:/usr/share/elasticsearch/data$ ls -lisa
total 24
      2  4 drwxrwsr-x 3 root          root  4096 Apr 15 21:21 .
1851635  4 drwxrwxr-x 1 elasticsearch root  4096 Feb 13  2019 ..
     11 16 drwxrws--- 2 root          root 16384 Apr 15 21:21 lost+found

The big difference is the drwxrwsr-x on the . folder, meaning that the group root has write access. Therefore elasticsearch will be able to access the data folder and do its thing. So it turns out that setting securityContext.fsGroup: 0 not only sets the filesystem group to 0 (root) but also changes the permissions of the filesystem to be writable by the group.
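
To see exactly which bits differ between the two listings, here is a small sketch using Python's standard `stat` module. The octal values 0755 and 2775 are simply the numeric forms of the two mode strings shown in the `ls` output; the variable names are illustrative only.

```python
import stat

# Mode of the data folder without fsGroup (drwxr-xr-x) and with fsGroup: 0 (drwxrwsr-x).
before = stat.S_IFDIR | 0o0755
after = stat.S_IFDIR | 0o2775

print(stat.filemode(before))  # mode string for the first listing
print(stat.filemode(after))   # mode string for the second listing

# XOR leaves only the bits that changed: group-write and setgid.
changed = before ^ after
print(changed == (stat.S_ISGID | stat.S_IWGRP))
```

The `s` in the group position is the setgid bit combined with group execute. On a directory, setgid makes newly created entries inherit the directory's group.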

Now, why did this not cause more havoc? After the permissions have been set once to drwxrwsr-x on the PVC, they stay that way. So even though we removed securityContext.fsGroup: 0 recently, all PVCs that were created before the removal had drwxrwsr-x on the . folder and everything was fine. Only when a new PVC was added (like for a new project or migration) did this cause issues, right from the very beginning.
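
Since the permissions persist on the volume, one way to tell whether a given data directory is affected is to inspect its group-write bit directly. The following is a minimal sketch, not part of the original fix; the helper name `group_writable` is hypothetical, and the path passed in the example call is just the current directory.

```python
import os
import stat


def group_writable(path):
    """Return True if the group-write bit is set on `path`."""
    mode = os.stat(path).st_mode
    return bool(mode & stat.S_IWGRP)


# Inside the container this would be called on /usr/share/elasticsearch/data;
# "." is used here only so the sketch runs anywhere.
print(group_writable("."))
```

A result of False on a volume owned by root:root would match the failing case described above, where the process runs as a non-root user in group root.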

It also only causes issues with container images that switch the user of the service to something other than root, like the elasticsearch or solr images do. I still changed it in the PR for mariadb-single, mongo-single, and postgres-single though, just to be safe.

0 reactions
rocketeerbkw commented, Jul 7, 2022

Fixed in #2610
