BB3 Response Time
Hi, I'm trying to host a chat service with BlenderBot 3 (3B) across 5 NVIDIA GeForce RTX 2080 Ti GPUs. Here is my config file:
```yaml
tasks:
  default:
    onboard_world: MessengerBotChatOnboardWorld
    task_world: MessengerBotChatTaskWorld
    timeout: 1800
    agents_required: 1
task_name: chatbot
world_module: parlai.chat_service.tasks.chatbot.worlds
overworld: MessengerOverworld
max_workers: 1000
opt:
  debug: True
  models:
    blenderbot3_3B:
      model: projects.seeker.agents.seeker:ComboFidGoldDocumentAgent
      init_opt: gen/r2c2_bb3
      model_file: zoo:bb3/bb3_3B/model
      interactive_mode: True
      no_cuda: False
      override:
        init_opt: gen/r2c2_bb3
        search_server: https://www.google.com
        model_parallel: True
additional_args:
  page_id: 1 # Configure Your Own Page
```
I was wondering if there are any methods I could use to reduce the bot's response time. Currently it takes the bot around 10-15 seconds to respond with my current setup, which seems a bit slow, especially compared to the BB3 demo, which is faster and uses a larger model. Let me know if there are any other details I can provide.
Issue Analytics
- State: Closed
- Created: a year ago
- Comments: 9 (5 by maintainers)
3-4 seconds seems reasonable. And you are right that decoding takes most of the time during inference. There are some ongoing projects for improving decoding, so stay tuned on that. For serving multiple users, you may try a batching mechanism that keeps requests in a queue and batches them between each inference. Have a look at batching in this code for reference.
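As a minimal sketch of that queue-and-batch pattern: incoming requests wait in a shared queue, and a single worker drains the queue between inferences so the model sees one batch instead of many single requests. The `generate_batch` callable, batch size, and wait window below are placeholders for whatever batched inference your model agent exposes, not ParlAI API:

```python
# Sketch of queue-based request batching for a chat service.
# Each handler submits a message and blocks on a Future; a worker
# thread collects waiting messages into a batch and runs one
# forward pass for all of them.

import queue
import threading
from concurrent.futures import Future

MAX_BATCH_SIZE = 8          # tune to what fits in GPU memory
BATCH_WAIT_SECONDS = 0.05   # window for extra requests to arrive

request_queue: "queue.Queue[tuple[str, Future]]" = queue.Queue()

def submit(message: str) -> Future:
    """Called by each chat handler; resolves with the bot's reply."""
    fut: Future = Future()
    request_queue.put((message, fut))
    return fut

def batching_worker(generate_batch):
    """Drain the queue into batches and run one inference per batch."""
    while True:
        # Block until at least one request arrives.
        batch = [request_queue.get()]
        # Opportunistically collect more requests for a short window.
        try:
            while len(batch) < MAX_BATCH_SIZE:
                batch.append(request_queue.get(timeout=BATCH_WAIT_SECONDS))
        except queue.Empty:
            pass
        messages = [msg for msg, _ in batch]
        replies = generate_batch(messages)  # one pass for all users
        for (_, fut), reply in zip(batch, replies):
            fut.set_result(reply)

# Example wiring with a dummy model in place of real inference:
if __name__ == "__main__":
    def fake_generate_batch(msgs):
        return [f"echo: {m}" for m in msgs]

    threading.Thread(
        target=batching_worker, args=(fake_generate_batch,), daemon=True
    ).start()
    print(submit("hello").result())
```

The short collection window trades a small amount of added latency per request for much better GPU utilization when several users are chatting at once.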
Great! Thanks for the help with this. I think I've managed to mostly resolve the issue, so I will close it for now.