BB3 Response Time
Hi, I'm trying to host a chat service with BlenderBot 3 (3B) across 5 NVIDIA GeForce RTX 2080 Ti GPUs. Here is my config file:
```yaml
tasks:
  default:
    onboard_world: MessengerBotChatOnboardWorld
    task_world: MessengerBotChatTaskWorld
    timeout: 1800
    agents_required: 1
task_name: chatbot
world_module: parlai.chat_service.tasks.chatbot.worlds
overworld: MessengerOverworld
max_workers: 1000
opt:
  debug: True
  models:
    blenderbot3_3B:
      model: projects.seeker.agents.seeker:ComboFidGoldDocumentAgent
      init_opt: gen/r2c2_bb3
      model_file: zoo:bb3/bb3_3B/model
      interactive_mode: True
      no_cuda: False
      override:
        init_opt: gen/r2c2_bb3
        search_server: https://www.google.com
        model_parallel: True
additional_args:
  page_id: 1 # Configure Your Own Page
```
I was wondering if there are any methods I could use to reduce the bot's response time. Currently it takes the bot around 10-15 seconds to respond with my current setup, which seems a bit slow, especially compared to the BB3 demo, which is faster and uses a larger model. Let me know if there are any other details I can provide.
Issue Analytics
- State: Closed
- Created: a year ago
- Comments: 9 (5 by maintainers)
3-4 seconds seems reasonable. And you are right that decoding takes most of the time during inference. There are some ongoing projects for improving decoding, so stay tuned on that. For serving multiple users, you may try a batching mechanism that keeps requests in a queue and batches them between each inference. Have a look at batching in this code for reference.
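As a minimal sketch of that queue-and-batch pattern: incoming requests wait in a shared queue, and a single worker drains the queue between inferences so the model sees one batch instead of many single requests. The `generate_batch` callable, batch size, and wait window below are placeholders for whatever batched inference your model agent exposes, not ParlAI API:

```python
# Sketch of queue-based request batching for a chat service.
# Each handler submits a message and blocks on a Future; a worker
# thread collects waiting messages into a batch and runs one
# forward pass for all of them.

import queue
import threading
from concurrent.futures import Future

MAX_BATCH_SIZE = 8          # tune to what fits in GPU memory
BATCH_WAIT_SECONDS = 0.05   # window for extra requests to arrive

request_queue: "queue.Queue[tuple[str, Future]]" = queue.Queue()

def submit(message: str) -> Future:
    """Called by each chat handler; resolves with the bot's reply."""
    fut: Future = Future()
    request_queue.put((message, fut))
    return fut

def batching_worker(generate_batch):
    """Drain the queue into batches and run one inference per batch."""
    while True:
        # Block until at least one request arrives.
        batch = [request_queue.get()]
        # Opportunistically collect more requests for a short window.
        try:
            while len(batch) < MAX_BATCH_SIZE:
                batch.append(request_queue.get(timeout=BATCH_WAIT_SECONDS))
        except queue.Empty:
            pass
        messages = [msg for msg, _ in batch]
        replies = generate_batch(messages)  # one pass for all users
        for (_, fut), reply in zip(batch, replies):
            fut.set_result(reply)

# Example wiring with a dummy model in place of real inference:
if __name__ == "__main__":
    def fake_generate_batch(msgs):
        return [f"echo: {m}" for m in msgs]

    threading.Thread(
        target=batching_worker, args=(fake_generate_batch,), daemon=True
    ).start()
    print(submit("hello").result())
```

The short collection window trades a small amount of added latency per request for much better GPU utilization when several users are chatting at once.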
Great! Thanks for the help with this. I think I've managed to mostly resolve the issue, so I will close it for now.