Run T5 on a CPU for inference

See original GitHub issue

Hi, great work on T5!

I’m looking to run T5 on a CPU for interactive predictions only, no fine-tuning. The provided notebook gives great instructions for using T5 with a TPU, but I’m struggling to figure out how to use it with a CPU.

I’ve tried changing the notebook like this:

import t5  # plus the notebook's other setup (MODEL_DIR, train_batch_size, ...)

model = t5.models.MtfModel(
    model_dir=MODEL_DIR,
    tpu=None,  # no TPU address, hoping this falls back to local devices
    model_parallelism=model_parallelism,
    batch_size=train_batch_size,
    layout_rules="ensemble:ensemble,batch:batch,d_ff:model,heads:model,vocab:model,experts:batch,model1:batch1,cpu:0", # sometimes I include this, sometimes I don't - it doesn't seem to matter
    sequence_length={"inputs": 128, "targets": 32},
    learning_rate_schedule=0.003,
    save_checkpoints_steps=5000,
    keep_checkpoint_max=keep_checkpoint_max if ON_CLOUD else None,
    iterations_per_loop=100,
)

But I get this error:

ValueError: Tensor dimension size not divisible by mesh dimension size: tensor_shape=Shape[outer_batch=1, batch=4, length=128] tensor_layout=TensorLayout(None, 0, None)

It seems likely that this has something to do with my TensorLayout being None. Would you mind giving me some tips? Thanks a bunch in advance.
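
For reference, a minimal sketch of what a CPU-only configuration might look like. The mesh_shape and mesh_devices arguments are an assumption borrowed from the library's later GPU instructions and may not exist in every t5 version; everything else reuses names from the notebook:

import t5

# Sketch of a CPU-only MtfModel, assuming MtfModel accepts
# mesh_shape/mesh_devices (an assumption, not confirmed in this thread).
model = t5.models.MtfModel(
    model_dir=MODEL_DIR,            # checkpoint directory from the notebook
    tpu=None,                       # no TPU address: run on local devices
    model_parallelism=1,
    batch_size=1,                   # must stay divisible by the batch mesh dimension
    mesh_shape="model:1,batch:1",   # single-device mesh (assumed parameter)
    mesh_devices=["cpu:0"],         # pin that device to the CPU (assumed parameter)
    sequence_length={"inputs": 128, "targets": 32},
)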

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 6 (2 by maintainers)

Top GitHub Comments

1 reaction
nshazeer commented, Apr 1, 2020

(None, 0, None) indicates the sharding of the tensor-dimensions. The first and third tensor-dimensions are not split. The second tensor-dimension (batch=4) is split across the 0-th mesh-dimension.

Say that this is using model_parallelism=1 (i.e. pure data-parallelism) on an 8-core TPU. This would mean that the batch needs to be split 8 ways. But the batch size is 4, so this is impossible.
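
To make the arithmetic concrete, here is a small illustrative check (plain Python, not part of the t5 API; the function name is made up for this example):

# The batch dimension is split across num_cores / model_parallelism
# mesh slices, so the batch size must divide evenly by that count.
def batch_is_compatible(batch_size, num_cores, model_parallelism):
    data_parallelism = num_cores // model_parallelism
    return batch_size % data_parallelism == 0

print(batch_is_compatible(4, 8, 1))  # False -> the ValueError above
print(batch_is_compatible(8, 8, 1))  # True  -> double the batch size
print(batch_is_compatible(4, 8, 2))  # True  -> increase model parallelism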

Solutions would be to either double the batch size, or to increase model-parallelism to 2 or 4.
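
Applied to the snippet from the question, the first fix might look like this (a sketch reusing the notebook's variables; only the changed arguments are commented):

# Option A: double the batch size so it splits 8 ways evenly.
model = t5.models.MtfModel(
    model_dir=MODEL_DIR,
    tpu=None,
    model_parallelism=1,
    batch_size=8,  # 8 % (8 cores / model_parallelism 1) == 0
    sequence_length={"inputs": 128, "targets": 32},
)
# Option B: keep batch_size=4 but set model_parallelism=2,
# so the batch only needs to split 8 / 2 = 4 ways.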

On Wed, Apr 1, 2020 at 11:11 AM Colin Raffel <notifications@github.com> wrote:

@orsk-moscow Are you running on a TPU or CPU? What model parallelism are you using?


0 reactions
orsk-moscow commented, Apr 2, 2020

Hi @nshazeer, thanks for the clarification.


Top Results From Across the Web

  • Fast T5 transformer model CPU inference with ... - YouTube
    Question Generation using NLP course link: https://bit.ly/2PunWiW. The Colab notebook shown in the video is available in the course.

  • How to get Accelerated Inference API for T5 models? - Hub
    I just copy/paste the following codes in a Google Colab notebook with my TOKEN_API in order to check the inference time with the...

  • Optimizing T5 and GPT-2 for Real-Time Inference with NVIDIA ...
    This optimization leads to a 3–6x reduction in latency compared to PyTorch GPU inference, and a 9–21x compared to PyTorch CPU inference.

  • Deploy T5 11B for inference for less than $500 - philschmid
    This blog will teach you how to deploy T5 11B for inference using Hugging Face Inference Endpoints. The T5 model was presented in...

  • Optimizing the T5 Model for Fast Inference - DataToBiz
    The T5 model is an encoder-decoder model, hence we tried to optimize the encoder first and then the decoder next. For doing this...
