[BUG] Inference predictions don't match Hugging Face for GPT-J
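Describe the bug
With greedy decoding (do_sample=False), GPT-J-6B run through DeepSpeed inference with kernel injection produces different text from the plain Hugging Face model. The two outputs agree only up to the first generated word ("the"), after which the DeepSpeed output degenerates:
hf_output [{'generated_text': 'Try without sampling the data.\n\nA:\n\nYou can use the following code to get the data from the database.\n$sql = "SELECT * FROM `table`";\n$result = mysqli_query($conn,'}]
ds_output [{'generated_text': 'Try without sampling the ( � hub ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ('}]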
To Reproduce
Steps to reproduce the behavior:
```python
import torch
import deepspeed
from transformers import GPTJForCausalLM, pipeline

query_text = "Try without sampling"

# Load the float16 revision of GPT-J-6B directly in half precision.
model = GPTJForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6B",
    revision="float16",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
)

# device=0 places the model on the first GPU.
pipe = pipeline("text-generation", model=model, tokenizer="EleutherAI/gpt-j-6B",
                device=0, framework="pt")
pipe.model.half()  # weights are already float16; kept as in the original run

# Baseline: greedy decoding with the plain Hugging Face model.
hf_output = pipe(query_text, do_sample=False)

# Wrap the model in the DeepSpeed inference engine with kernel injection.
pipe.model = deepspeed.init_inference(
    pipe.model,
    mp_size=1,
    dtype=torch.half,
    replace_method="auto",
    replace_with_kernel_inject=True,
)
ds_output = pipe(query_text, do_sample=False)

print('HUGGINGFACE:', hf_output[0])
print('DEEPSPEED:', ds_output[0])
```
Expected behavior
DeepSpeed's output predictions match the HF predictions.
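Because both runs use greedy decoding (do_sample=False), the two generations should ideally agree token for token (small fp16 numeric differences can still flip a greedy choice eventually), so the first position where they differ is a natural place to start debugging. first_divergence below is a hypothetical helper, not part of transformers or DeepSpeed; it compares the two generated strings character by character. The sample strings are truncated copies of the outputs reported at the top of this issue.

```python
def first_divergence(reference: str, candidate: str) -> int:
    """Return the index of the first differing character.

    Returns -1 if the texts are identical, or the length of the shorter
    text if one is a prefix of the other.
    """
    for i, (a, b) in enumerate(zip(reference, candidate)):
        if a != b:
            return i
    if len(reference) == len(candidate):
        return -1
    return min(len(reference), len(candidate))


# Truncated copies of the generated_text values reported in this issue.
hf_text = "Try without sampling the data.\n\nA:\n\nYou can use the following code"
ds_text = "Try without sampling the ( \ufffd hub ( ( ( ( ( ( ( ("

idx = first_divergence(hf_text, ds_text)
print(repr(hf_text[:idx]))  # 'Try without sampling the '
print(repr(hf_text[idx:idx + 12]), "vs", repr(ds_text[idx:idx + 12]))
```

For these outputs the shared prefix is the prompt plus one generated word, which suggests the corruption sets in almost immediately rather than accumulating over a long generation.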
ds_report output
```
root@2f0b3a15b3d0:/fsx/huilgolr/inference/rubik# ds_report
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
sparse_attn ............ [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
 [WARNING] async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING] async_io: please install the libaio-dev package with apt
 [WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
utils .................. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/opt/conda/lib/python3.8/site-packages/torch']
torch version .................... 1.11.0+cu113
torch cuda version ............... 11.3
torch hip version ................ None
nvcc version ..................... 11.3
deepspeed install path ........... ['/deepspeed']
deepspeed info ................... 0.7.1+8b2a6371, 8b2a6371, master
deepspeed wheel compiled w. ...... torch 1.11, cuda 11.3
```
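In this report every op is listed as [NO] under "installed", and all except async_io are [OKAY] under "compatible", so per the report's own note they are JIT-compiled at runtime when needed. When comparing reports from several machines, a small parser can make that check mechanical. parse_op_report below is a hypothetical helper, not a DeepSpeed API; it assumes plain, uncoloured text as pasted in this issue (output captured straight from a terminal may contain ANSI colour codes, which this sketch does not strip).

```python
import re
from typing import Dict, Tuple

# An op row looks like "transformer_inference .. [NO] ....... [OKAY]".
_OP_ROW = re.compile(r"^(\w+)\s*\.+\s*\[(\w+)\]\s*\.+\s*\[(\w+)\]\s*$")
_TRUE = {"YES", "OKAY"}


def parse_op_report(report: str) -> Dict[str, Tuple[bool, bool]]:
    """Map each op name to (installed, compatible) from plain ds_report text."""
    ops: Dict[str, Tuple[bool, bool]] = {}
    for line in report.splitlines():
        match = _OP_ROW.match(line.strip())
        if match:  # lines like "ninja ... [OKAY]" or "[WARNING] ..." are skipped
            name, installed, compatible = match.groups()
            ops[name] = (installed in _TRUE, compatible in _TRUE)
    return ops


sample = """\
ninja .................. [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
 [WARNING] async_io: please install the libaio-dev package with apt
async_io ............... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
"""

ops = parse_op_report(sample)
# Ops that are not prebuilt but can be JIT-compiled at runtime.
jit_ops = [name for name, (installed, compatible) in ops.items()
           if not installed and compatible]
print(jit_ops)  # ['stochastic_transformer', 'transformer_inference']
```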
Screenshots
N/A
System info
- OS: Ubuntu
- GPU count and types: A100 GPU
- Interconnects: N/A
- Python version: 3.8.3

Launcher context
Inference, single process
Issue Analytics
- Created a year ago
- Comments: 25 (11 by maintainers)
Top GitHub Comments
Created a separate issue for the low_cpu_mem_usage flag problem: https://github.com/microsoft/DeepSpeed/issues/2275
Let's use this issue to track the multi-GPU correctness problem for GPT-J seen above.
HUGGINGFACE: [{'generated_text': "Try without sampling.\n\nI'm not sure if I'm doing it right.\n\nI'm not sure if I'm doing it right.\n\nI'm not sure if I'm doing it right.\n\nI'm not sure if"}]
With kernels: [{'generated_text': "Try without sampling.\n\nI'm not sure if I'm doing it right.\n\n.\n\n.\n\n.\n\n.\n\n.\n\n as"}]
Without kernels: [{'generated_text': "Try without sampling.\n\nI'm not sure if I'm doing it right.\n\nI'm not sure if I'm doing it right.\n\nI'm not sure if I'm doing it right.\n\nI'm not sure if"}]
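In both failing runs the output collapses into a very short unit repeated many times ("( " in the single-GPU run at the top of the issue, ".\n\n" in the multi-GPU run with kernels), whereas the Hugging Face baseline only repeats a whole sentence, which is common with greedy decoding. A crude way to flag the short-unit pattern automatically, for example in a regression test, is a regex that looks for a 1 to 4 character unit repeated at least five times in a row. find_short_repeat and its thresholds are hypothetical choices, not part of any library, and legitimate text such as "-----" would also trip it.

```python
import re
from typing import Optional, Tuple


def find_short_repeat(text: str, max_unit: int = 4,
                      min_repeats: int = 5) -> Optional[Tuple[str, int]]:
    """Return (unit, count) for the first run where a unit of 1..max_unit
    characters repeats at least min_repeats times back to back, else None."""
    # The lazy (.{1,N}?) tries the shortest unit first at each position;
    # \1{k,} then requires at least k further copies of that unit.
    pattern = re.compile(r"(.{1,%d}?)\1{%d,}" % (max_unit, min_repeats - 1),
                         re.DOTALL)
    match = pattern.search(text)
    if match is None:
        return None
    unit = match.group(1)
    return unit, len(match.group(0)) // len(unit)


# Rebuilt from the multi-GPU outputs reported above.
prefix = "Try without sampling.\n\nI'm not sure if I'm doing it right"
hf_multi = prefix + ".\n\nI'm not sure if I'm doing it right" * 3 + ".\n\nI'm not sure if"
ds_kernels = prefix + ".\n\n" * 6 + " as"

print(find_short_repeat(hf_multi))    # None
print(find_short_repeat(ds_kernels))  # ('.\n\n', 6)
```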