Permission denied error when calling ray.init()

Someone ran into the following error today when calling ray.init().

The cause was that Ray tries to log to /tmp/ray, and the user was on a shared machine where a different user owned /tmp/ray. The solution was to call

ray.init(temp_dir='/tmp/something_else')

to force Ray to log somewhere else (to a directory that the user could create).
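
As a minimal sketch of that workaround, the helper below picks a directory that the current user can always create, so two users on the same machine never collide. The function names `per_user_ray_temp_dir` and `init_ray_in_private_temp_dir` and the `ray_<uid>` naming scheme are illustrative assumptions, not part of Ray; only the `temp_dir` keyword comes from the issue above.

```python
import os
import tempfile


def per_user_ray_temp_dir(base=None):
    """Return (and create) a temp directory unique to the current user.

    `base` defaults to the system temp directory (usually /tmp on Linux).
    Creating the directory here surfaces a permission problem early,
    rather than deep inside ray.init().
    """
    base = base or tempfile.gettempdir()
    # Hypothetical naming scheme: one directory per numeric user id.
    path = os.path.join(base, "ray_{}".format(os.getuid()))
    os.makedirs(path, mode=0o700, exist_ok=True)
    return path


def init_ray_in_private_temp_dir():
    """Start Ray with its session files under the per-user directory."""
    import ray  # imported lazily so the helper above works without Ray

    # `temp_dir` is the keyword accepted by the Ray version in this issue.
    return ray.init(temp_dir=per_user_ray_temp_dir())
```

Keying the directory on the user id is one possible design choice; any path that the calling user owns would serve the same purpose.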

---------------------------------------------------------------------------
PermissionError                           Traceback (most recent call last)
<ipython-input-2-3f68a533b944> in <module>
----> 1 ray.init()

/data/nileshtrip/miniconda3/lib/python3.6/site-packages/ray/worker.py in init(redis_address, num_cpus, num_gpus, resources, object_store_memory, redis_max_memory, node_ip_address, object_id_seed, num_workers, local_mode, driver_mode, redirect_worker_output, redirect_output, ignore_reinit_error, num_redis_shards, redis_max_clients, redis_password, plasma_directory, huge_pages, include_webui, driver_id, configure_logging, logging_level, logging_format, plasma_store_socket_name, raylet_socket_name, temp_dir, _internal_config, use_raylet)
   1452         global _global_node
   1453         _global_node = ray.node.Node(
-> 1454             head=True, shutdown_at_exit=False, ray_params=ray_params)
   1455         address_info["redis_address"] = _global_node.redis_address
   1456         address_info[

/data/nileshtrip/miniconda3/lib/python3.6/site-packages/ray/node.py in __init__(self, ray_params, head, shutdown_at_exit)
     84         self._webui_url = None
     85 
---> 86         self.start_ray_processes()
     87 
     88         if shutdown_at_exit:

/data/nileshtrip/miniconda3/lib/python3.6/site-packages/ray/node.py in start_ray_processes(self)
    272         logger.info(
    273             "Process STDOUT and STDERR is being redirected to {}.".format(
--> 274                 get_logs_dir_path()))
    275 
    276         # If this is the head node, start the relevant head node processes.

/data/nileshtrip/miniconda3/lib/python3.6/site-packages/ray/tempfile_services.py in get_logs_dir_path()
    104 def get_logs_dir_path():
    105     """Get a temp dir for logging."""
--> 106     logs_dir = os.path.join(get_temp_root(), "logs")
    107     try_to_create_directory(logs_dir)
    108     return logs_dir

/data/nileshtrip/miniconda3/lib/python3.6/site-packages/ray/tempfile_services.py in get_temp_root()
     92                 pid=os.getpid(), date_str=date_str),
     93             directory_name="/tmp/ray")
---> 94     try_to_create_directory(_temp_root)
     95     return _temp_root
     96 

/data/nileshtrip/miniconda3/lib/python3.6/site-packages/ray/tempfile_services.py in try_to_create_directory(directory_path)
     59         except OSError as e:
     60             if e.errno != os.errno.EEXIST:
---> 61                 raise e
     62             logger.warning(
     63                 "Attempted to create '{}', but the directory already "

/data/nileshtrip/miniconda3/lib/python3.6/site-packages/ray/tempfile_services.py in try_to_create_directory(directory_path)
     56     if not os.path.exists(directory_path):
     57         try:
---> 58             os.makedirs(directory_path)
     59         except OSError as e:
     60             if e.errno != os.errno.EEXIST:

/data/nileshtrip/miniconda3/lib/python3.6/os.py in makedirs(name, mode, exist_ok)
    218             return
    219     try:
--> 220         mkdir(name, mode)
    221     except OSError:
    222         # Cannot rely on checking for EEXIST, since the operating system

PermissionError: [Errno 13] Permission denied: '/tmp/ray/session_2019-01-29_16-18-38_28339'
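
To confirm that a traceback like the one above really comes from another user owning /tmp/ray, a quick check along the following lines may help. This is a sketch under assumptions: `describe_dir_access` is a hypothetical helper, it relies only on the standard library, and it is Unix-only because it uses the `pwd` module.

```python
import os
import pwd


def describe_dir_access(path="/tmp/ray"):
    """Report who owns `path` and whether the current user can create entries in it."""
    if not os.path.exists(path):
        return {"exists": False, "owner": None, "writable": None}
    st = os.stat(path)
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        owner = str(st.st_uid)  # uid with no passwd entry
    # Creating a session directory needs write and search (execute) permission.
    writable = os.access(path, os.W_OK | os.X_OK)
    return {"exists": True, "owner": owner, "writable": writable}
```

If `describe_dir_access()` reports an owner other than the current user and `writable` is False, the situation matches the one described in this issue.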

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Reactions: 2
  • Comments: 10

Top GitHub Comments

iglimanaj commented, Feb 18, 2021 (2 reactions)

For the latest version of Ray, the temp-dir argument takes a leading underscore, so you need to pass it like this: ray.init(_temp_dir='/tmp/something_else')
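
Code that has to run against both old and new Ray versions could choose the keyword at run time. The sketch below is one way to do that, based only on the two signatures visible in the tracebacks on this page: the 2019 one lists `temp_dir` explicitly, while the Ray 2.0.0 one does not and accepts `**kwargs`. The helper name `temp_dir_kwargs` is hypothetical.

```python
import inspect


def temp_dir_kwargs(init_fn, path):
    """Build the temp-dir keyword argument that `init_fn` appears to expect.

    Older Ray releases list `temp_dir` explicitly in the signature of
    ray.init(); when it is absent, fall back to the underscore-prefixed
    `_temp_dir` mentioned in the comment above.
    """
    params = inspect.signature(init_fn).parameters
    if "temp_dir" in params:
        return {"temp_dir": path}
    return {"_temp_dir": path}
```

With Ray installed, the call would then look like `ray.init(**temp_dir_kwargs(ray.init, '/tmp/something_else'))`.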

Break00 commented, Oct 17, 2022 (0 reactions)

@ARDivekar can you try with Ray 2.0? I just tried locally (on a Macbook) but wasn’t able to reproduce the issue.

I got the same issue with Ray 2.0.0

Code: ray.init(address='auto')

Error

File /opt/tljh/user/lib/python3.9/site-packages/ray/_private/client_mode_hook.py:105, in client_mode_hook.<locals>.wrapper(*args, **kwargs)
    103 if func.__name__ != "init" or is_client_mode_enabled_by_default:
    104     return getattr(ray, func.__name__)(*args, **kwargs)
--> 105 return func(*args, **kwargs)

File /opt/tljh/user/lib/python3.9/site-packages/ray/_private/worker.py:1475, in init(address, num_cpus, num_gpus, resources, object_store_memory, local_mode, ignore_reinit_error, include_dashboard, dashboard_host, dashboard_port, job_config, configure_logging, logging_level, logging_format, log_to_driver, namespace, runtime_env, storage, **kwargs)
   1462 ray_params = ray._private.parameter.RayParams(
   1463     node_ip_address=node_ip_address,
   1464     raylet_ip_address=raylet_ip_address,
   (...)
   1472     metrics_export_port=_metrics_export_port,
   1473 )
   1474 try:
-> 1475     _global_node = ray._private.node.Node(
   1476         ray_params,
   1477         head=False,
   1478         shutdown_at_exit=False,
   1479         spawn_reaper=False,
   1480         connect_only=True,
   1481     )
   1482 except ConnectionError:
   1483     if gcs_address == ray._private.utils.read_ray_address(_temp_dir):

File /opt/tljh/user/lib/python3.9/site-packages/ray/_private/node.py:244, in Node.__init__(self, ray_params, head, shutdown_at_exit, spawn_reaper, connect_only)
    237 self._plasma_store_socket_name = self._prepare_socket_file(
    238     self._ray_params.plasma_store_socket_name, default_prefix="plasma_store"
    239 )
    240 self._raylet_socket_name = self._prepare_socket_file(
    241     self._ray_params.raylet_socket_name, default_prefix="raylet"
    242 )
--> 244 self.metrics_agent_port = self._get_cached_port(
    245     "metrics_agent_port", default_port=ray_params.metrics_agent_port
    246 )
    247 self._metrics_export_port = self._get_cached_port(
    248     "metrics_export_port", default_port=ray_params.metrics_export_port
    249 )
    251 ray_params.update_if_absent(
    252     metrics_agent_port=self.metrics_agent_port,
    253     metrics_export_port=self._metrics_export_port,
    254 )

File /opt/tljh/user/lib/python3.9/site-packages/ray/_private/node.py:801, in Node._get_cached_port(self, port_name, default_port)
    798 # Maps a Node.unique_id to a dict that maps port names to port numbers.
    799 ports_by_node: Dict[str, Dict[str, int]] = defaultdict(dict)
--> 801 with FileLock(file_path + ".lock"):
    802     if not os.path.exists(file_path):
    803         with open(file_path, "w") as f:

File /opt/tljh/user/lib/python3.9/site-packages/filelock/_api.py:220, in BaseFileLock.__enter__(self)
    214 def __enter__(self) -> BaseFileLock:
    215     """
    216     Acquire the lock.
    217
    218     :return: the lock object
    219     """
--> 220     self.acquire()
    221     return self

File /opt/tljh/user/lib/python3.9/site-packages/filelock/_api.py:173, in BaseFileLock.acquire(self, timeout, poll_interval, poll_intervall, blocking)
    171 if not self.is_locked:
    172     _LOGGER.debug("Attempting to acquire lock %s on %s", lock_id, lock_filename)
--> 173     self._acquire()
    175 if self.is_locked:
    176     _LOGGER.debug("Lock %s acquired on %s", lock_id, lock_filename)

File /opt/tljh/user/lib/python3.9/site-packages/filelock/_unix.py:35, in UnixFileLock._acquire(self)
     33 def _acquire(self) -> None:
     34     open_mode = os.O_RDWR | os.O_CREAT | os.O_TRUNC
---> 35     fd = os.open(self._lock_file, open_mode)
     36     try:
     37         fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)

PermissionError: [Errno 13] Permission denied: '/tmp/ray/session_2022-10-17_23-56-04_168517_260622/ports_by_node.json.lock'
