Running tile-reduce across containers

The current internal workings of tile-reduce make it very good for running on a single machine. However, require('os').cpus().length reports the host's CPU count rather than the number of CPUs actually available within a Docker container, so tile-reduce can be very resource hungry when running in a container.
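To make the mismatch concrete, here is a minimal sketch of how a job could look for a container CPU limit before falling back to `os.cpus().length`. It assumes a Linux container using cgroup v2, where a hard CPU quota is exposed in `/sys/fs/cgroup/cpu.max`. The helper names `parseCpuMax` and `availableCpus` are hypothetical and are not part of tile-reduce.

```javascript
const fs = require('fs');
const os = require('os');

// Parse the contents of a cgroup v2 `cpu.max` file.
// The format is "<quota> <period>", or "max <period>" when no quota is set.
// Returns the CPU limit (possibly fractional), or null if unlimited or unparseable.
function parseCpuMax(contents) {
  const [quota, period] = contents.trim().split(/\s+/);
  if (quota === 'max') return null;
  const q = Number(quota);
  const p = Number(period);
  if (!Number.isFinite(q) || !Number.isFinite(p) || q <= 0 || p <= 0) return null;
  return q / p;
}

// Hypothetical helper: return a CPU count that respects a visible container quota.
// The default path is an assumption (cgroup v2); it will not exist on other setups.
function availableCpus(cpuMaxPath = '/sys/fs/cgroup/cpu.max') {
  const hostCpus = Math.max(1, os.cpus().length);
  let limit = null;
  try {
    limit = parseCpuMax(fs.readFileSync(cpuMaxPath, 'utf8'));
  } catch (err) {
    // File not readable (non-Linux host, cgroup v1, ...): fall back to the host count.
  }
  if (limit === null) return hostCpus;
  return Math.max(1, Math.min(hostCpus, Math.floor(limit)));
}

console.log(`host CPUs: ${os.cpus().length}, usable CPUs: ${availableCpus()}`);
```

This only detects hard quotas. A container limited by relative CPU shares would still see the host count, so an explicit worker setting remains the more predictable option.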

Rather than working with a single-machine, process-forking model for distributing work, could we write a version of tile-reduce that runs workers as their own short-lived containers that communicate with a master container over HTTP or TCP? For ECS, we could use https://github.com/mapbox/ecs-watchbot to orchestrate running new containers on an AWS ECS cluster. We could possibly create affordances for other orchestration tools (like Mesos or Kubernetes) if other people would like them.

cc/ @nickcordella @mourner @tcql @rclark

Issue Analytics

  • State: open
  • Created 7 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

2 reactions
nickcordella commented, Mar 9, 2017

I think the primary motivation for work like this would be to ease the transition of systems currently running tile-reduce on EC2 instances over to ECS.

As someone who recently lived through the use case this is meant to address, I lean toward not dedicating concerted resources to a custom tile-reduce version. It is easy to stack-parameterize MaxWorkers and tie it to reservation.cpu as a way of getting a tile-reduce job up and running on ECS. If that framework is not too offensive to the platform team, I’d suggest calling it out somewhere in the docs as a strategy to ease the transition a bit. The docs should also acknowledge that fully leveraging ECS Watchbot is a more elegant approach, whenever the developer feels comfortable with it.
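A rough sketch of the parameterized approach described above, seen from the Node side. It assumes the stack passes the parameter into the container as an environment variable named `MaxWorkers`, and that the CPU reservation is expressed in ECS CPU units (1024 units per vCPU). The function names `readMaxWorkers` and `cpuUnitsFor` are hypothetical.

```javascript
// ECS expresses CPU in units, where 1024 units correspond to one vCPU.
const CPU_UNITS_PER_VCPU = 1024;

// Read the worker count from the environment.
// The variable name `MaxWorkers` is an assumption about how the stack is wired.
function readMaxWorkers(env = process.env) {
  const workers = Number(env.MaxWorkers);
  if (!Number.isInteger(workers) || workers < 1) {
    throw new Error(`MaxWorkers must be a positive integer, got: ${env.MaxWorkers}`);
  }
  return workers;
}

// CPU units to reserve so that each worker gets roughly one vCPU.
function cpuUnitsFor(workers) {
  return workers * CPU_UNITS_PER_VCPU;
}

// Example with an explicit environment object instead of process.env.
const workers = readMaxWorkers({ MaxWorkers: '4' });
console.log(`maxWorkers=${workers}, cpu reservation=${cpuUnitsFor(workers)} units`);
```

The value returned by `readMaxWorkers` would then be passed as the `maxWorkers` option when starting the job, so the worker count and the CPU reservation both derive from the same stack parameter.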

0 reactions
morganherlocker commented, Mar 11, 2017

An HTTP/TCP based mapreduce implementation is going to have very different performance characteristics compared to tile-reduce. The communication protocol has a major effect on architecture considerations for any particular job such as tile zoom or memory usage. I think something like this is worthwhile, but it’s a very different problem than what tile-reduce is trying to solve (closer to Hadoop).

If the noisy neighbor problem is a common footgun though, I would be in favor of making maxWorkers a required parameter with no default. We can document an example that sets maxWorkers to os.cpus().length, so the normal “I’m running this on a laptop” use case is easy to achieve. In general, I do think it is always best practice to carefully consider how many workers you want to use, even on a laptop (for other reasons like RAM constraints).
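As an illustration of that suggestion, the sketch below shows what a "required, no default" check could look like. It is not how tile-reduce currently behaves, and `requireMaxWorkers` is a hypothetical name.

```javascript
const os = require('os');

// Hypothetical validation: refuse to start unless the caller chose a worker count.
function requireMaxWorkers(options) {
  const workers = options ? options.maxWorkers : undefined;
  if (!Number.isInteger(workers) || workers < 1) {
    throw new TypeError('maxWorkers is required and must be a positive integer');
  }
  return workers;
}

// The documented "laptop" example: explicitly opt in to one worker per reported CPU.
const laptopOptions = { maxWorkers: os.cpus().length };

// A container job would instead pass a number matched to its CPU reservation.
const containerOptions = { maxWorkers: 2 };
console.log(requireMaxWorkers(containerOptions));
```

Failing fast on a missing value forces every job to state its worker count, which is the behavior the comment argues for.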
