Reproduction of 80K/sec throughput

Hi, I tried to reproduce the 80K/sec throughput reported in the paper, but only got around 22K/sec.

I ran the single learner on a GPU machine (with a P40 GPU):

python experiment.py --job_name=learner --task=0 --num_actors=150 \
    --level_name=rooms_keys_doors_puzzle --batch_size=32 \
    --entropy_cost=0.0033391318945337044 \
    --learning_rate=0.00031866995608948655 \
    --total_environment_frames=10000000000 --reward_clipping=soft_asymmetric

and ran 150 actors, each on its own CPU machine (each one is actually a remote Docker machine allocated by a cloud service):

python experiment.py --job_name=actor --task=$i \
      --num_actors=150 --level_name=rooms_keys_doors_puzzle

where i denotes the i-th actor.

Could you give some hints on how to reproduce the throughput? Did you require a proprietary intranet connection?

Top GitHub Comments

1 reaction

pengsun commented, Jul 31, 2018

Thanks!

How is the 2-3GB/sec figure derived (e.g., batch_size * width * height * rollout_len * BytesOfFloat, etc.)? I’m still reading the tf.FIFOQueue code (with capacity=1) and struggling to understand the sync mechanism. I guess answering this question may help me (and others) understand how the Actor code works 😃
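For readers following along, here is a minimal sketch of the kind of back-of-envelope estimate the question describes. It is not the maintainers' derivation of the 2-3GB/sec figure. The function name and every input value (72x96 RGB frames, one byte per element, every counted frame sent to the learner) are assumptions made for illustration only.

```python
def observation_bandwidth_bytes_per_sec(frames_per_sec, height, width,
                                        channels, bytes_per_element):
    """Raw observation bytes produced per second across all actors.

    Hypothetical helper: it ignores rewards, actions, LSTM state,
    parameter fetches, and any serialization overhead.
    """
    bytes_per_frame = height * width * channels * bytes_per_element
    return frames_per_sec * bytes_per_frame


# Assumed inputs (NOT taken from the thread): 80K frames/sec,
# 72x96 RGB frames stored as uint8 (1 byte per element).
uint8_estimate = observation_bandwidth_bytes_per_sec(80_000, 72, 96, 3, 1)

# Same frames stored as float32 (4 bytes per element), as the
# "BytesOfFloat" term in the question would imply.
float32_estimate = observation_bandwidth_bytes_per_sec(80_000, 72, 96, 3, 4)

print(f"uint8:   {uint8_estimate / 1e9:.2f} GB/sec")
print(f"float32: {float32_estimate / 1e9:.2f} GB/sec")
```

Swapping in the actual frame size, element type, and frame-counting convention of a given setup shows how sensitive the estimate is to each term.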

Also, I just asked around and found that I can't get access to a P100; the best GPU I have on hand is a P40. So please feel free to close the issue.

0 reactions

lespeholt commented, Sep 18, 2018

Yes, we used 1 CPU per actor. Can you try 150 actors with 1 CPU each?

It’s a bit hard to interpret the timelines without interacting with them. Since dequeuemany is taking that much time on the learner, it looks like the learner is bottlenecked by the actors or by the bandwidth to them. I'm not sure why there is a gap between the actor steps. If the actors are waiting on enqueuing, that suggests a bottleneck in the learner or in the bandwidth; in this case it would be the network.

Can you try creating new variables for each actor, i.e. no sharing of variables? If that is significantly faster, the bottleneck is network bandwidth.
