t5-11b out of memory/FileNotFoundError
First of all, this seems like a great repo that I was super excited to find!
When testing with t5-small everything works correctly, but when trying my custom t5-11b I run into out-of-memory issues.
I was running this with t5-11b as the model:
`onnx_model_paths = generate_onnx_representation("t5-11b", model=model)`
And at first I got this error:
`RuntimeError: Exporting model exceed maximum protobuf size of 2GB. Please call torch.onnx.export with use_external_data_format=True.`
So I simply added `use_external_data_format=True` to all three of the `torch.onnx.export` calls in `onnx_exporter.py` in fastT5.
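For reference, the change is just passing `use_external_data_format=True` through to `torch.onnx.export`. Below is a minimal sketch of an export call with that flag; the wrapper, file name, input shapes and opset are illustrative, not the exact fastT5 code:

```python
import torch
from transformers import T5ForConditionalGeneration

# Small wrapper so the traced graph returns a plain tensor
# (fastT5 uses its own wrapper modules; this one is only for illustration).
class EncoderWrapper(torch.nn.Module):
    def __init__(self, encoder):
        super().__init__()
        self.encoder = encoder

    def forward(self, input_ids, attention_mask):
        return self.encoder(input_ids=input_ids, attention_mask=attention_mask)[0]

model = T5ForConditionalGeneration.from_pretrained("t5-11b")
encoder = EncoderWrapper(model.encoder)

input_ids = torch.ones((1, 8), dtype=torch.long)
attention_mask = torch.ones((1, 8), dtype=torch.long)

torch.onnx.export(
    encoder,
    (input_ids, attention_mask),
    "t5-11b-encoder.onnx",          # illustrative output path
    input_names=["input_ids", "attention_mask"],
    output_names=["hidden_states"],
    opset_version=12,
    use_external_data_format=True,  # write weights to external files, bypassing the 2 GB protobuf limit
)
```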
Then I can run `onnx_model_paths = generate_onnx_representation(model_name, model=model)` and get no error. (The first time I posted this I got an error here, but it seems I had made a mistake and only had 100 GB of disk space; with 200 GB it worked.)
Then when running `quant_model_paths = quantize(onnx_model_paths)` I get the error:
```
FileNotFoundError Traceback (most recent call last)
<ipython-input-7-3a782b6d5a25> in <module>
8
9 # Step 2. (recommended) quantize the converted model for fast inference and to reduce model size.
---> 10 quant_model_paths = quantize(onnx_model_paths)
11
12 # step 3. setup onnx runtime
~/fastT5/fastT5/onnx_exporter.py in quantize(models_name_or_path)
273 activation_type=QuantType.QUInt8,
274 weight_type=QuantType.QUInt8,
--> 275 optimize_model=False,
276 ) # op_types_to_quantize=['MatMul', 'Relu', 'Add', 'Mul' ],
277 quant_model_paths.append(output_model_name)
/opt/conda/lib/python3.7/site-packages/onnxruntime/quantization/quantize.py in quantize_dynamic(model_input, model_output, op_types_to_quantize, per_channel, reduce_range, activation_type, weight_type, nodes_to_quantize, nodes_to_exclude, optimize_model, use_external_data_format)
266 op_types_to_quantize = list(IntegerOpsRegistry.keys())
267
--> 268 model = load_model(Path(model_input), optimize_model)
269 quantizer = ONNXQuantizer(
270 model,
/opt/conda/lib/python3.7/site-packages/onnxruntime/quantization/quantize.py in load_model(model_path, optimize)
51 return onnx_model.model
52
---> 53 return onnx.load(Path(model_path))
54
55
/opt/conda/lib/python3.7/site-packages/onnx/__init__.py in load_model(f, format, load_external_data)
125 if model_filepath:
126 base_dir = os.path.dirname(model_filepath)
--> 127 load_external_data_for_model(model, base_dir)
128
129 return model
/opt/conda/lib/python3.7/site-packages/onnx/external_data_helper.py in load_external_data_for_model(model, base_dir)
69 for tensor in _get_all_tensors(model):
70 if uses_external_data(tensor):
---> 71 load_external_data_for_tensor(tensor, base_dir)
72 # After loading raw_data from external_data, change the state of tensors
73 tensor.data_location = TensorProto.DEFAULT
/opt/conda/lib/python3.7/site-packages/onnx/external_data_helper.py in load_external_data_for_tensor(tensor, base_dir)
48 external_data_file_path = os.path.join(base_dir, file_location)
49
---> 50 with open(external_data_file_path, 'rb') as data_file:
51
52 if info.offset:
FileNotFoundError: [Errno 2] No such file or directory: '/home/jupyter/encoder.embed_tokens.weight'
```
Has anyone successfully exported the t5-11b version, and does anyone know how to solve this?
Update:
I tried changing the working directory to /home/jupyter/models instead of /home/jupyter/, which seems to solve the FileNotFoundError (presumably because the external weight files are looked up relative to the working directory). But then I run into the size problem again:
```
ValueError Traceback (most recent call last)
<ipython-input-10-032d95bca1c8> in <module>
1 os.chdir(r'/home/jupyter/models/')
----> 2 quant_model_paths = quantize(onnx_model_paths)
~/fastT5/fastT5/onnx_exporter.py in quantize(models_name_or_path)
273 activation_type=QuantType.QUInt8,
274 weight_type=QuantType.QUInt8,
--> 275 optimize_model=False,
276 ) # op_types_to_quantize=['MatMul', 'Relu', 'Add', 'Mul' ],
277 quant_model_paths.append(output_model_name)
/opt/conda/lib/python3.7/site-packages/onnxruntime/quantization/quantize.py in quantize_dynamic(model_input, model_output, op_types_to_quantize, per_channel, reduce_range, activation_type, weight_type, nodes_to_quantize, nodes_to_exclude, optimize_model, use_external_data_format)
278 nodes_to_quantize,
279 nodes_to_exclude,
--> 280 op_types_to_quantize)
281
282 quantizer.quantize_model()
/opt/conda/lib/python3.7/site-packages/onnxruntime/quantization/onnx_quantizer.py in __init__(self, model, per_channel, reduce_range, mode, static, weight_qType, input_qType, tensors_range, nodes_to_quantize, nodes_to_exclude, op_types_to_quantize)
30
31 # run shape inference on the model
---> 32 model = onnx.shape_inference.infer_shapes(model)
33 self.value_infos = {vi.name: vi for vi in model.graph.value_info}
34 self.value_infos.update({ot.name: ot for ot in model.graph.output})
/opt/conda/lib/python3.7/site-packages/onnx/shape_inference.py in infer_shapes(model, check_type, strict_mode)
34 def infer_shapes(model, check_type=False, strict_mode=False): # type: (ModelProto, bool, bool) -> ModelProto
35 if isinstance(model, ModelProto):
---> 36 model_str = model.SerializeToString()
37 inferred_model_str = C.infer_shapes(model_str, check_type, strict_mode)
38 return onnx.load_from_string(inferred_model_str)
ValueError: Message onnx.ModelProto exceeds maximum protobuf size of 2GB: 19459248612
```
Top GitHub Comments
Normal torch quantization works on the larger models, so anyone reading this could check it out as an alternative: https://snappishproductions.com/blog/2020/05/03/big-models-hate-this-one-weird-trick-quantization-t5--pytorch-1.4.html.html
My result was 4x smaller (with qint8) and 3x faster, so better than nothing, although I lost a little bit of accuracy.
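For reference, a minimal sketch of that approach (dynamic quantization of the `nn.Linear` layers to `qint8`, as in the linked post); the model name and example input here are illustrative:

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

model_name = "t5-11b"  # or t5-3b, etc.
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

# Quantize the weights of every nn.Linear module to 8-bit integers;
# activations stay in float and are quantized dynamically at runtime.
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

inputs = tokenizer("translate English to German: The house is wonderful.", return_tensors="pt")
outputs = quantized_model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```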
I'm getting this same error when trying to export t5-3b. Seems like this may be the more relevant onnx issue: the `infer_shapes` method doesn't work with large models and is supposed to be replaced with `infer_shapes_path`, so that would need to be fixed in the onnxruntime project. I modified the code in `onnx_quantizer` accordingly, passing a `model_name` into the method as well. The code was able to get past the shape inference step, but then failed further on with a different error.
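For anyone hitting the same 2 GB limit, here is a minimal sketch of the kind of change described above (not the commenter's actual patch): run shape inference through file paths with `onnx.shape_inference.infer_shapes_path`, which avoids serializing the whole `ModelProto` in memory. The file names are assumptions.

```python
import onnx
from onnx import shape_inference

# Hypothetical paths to an exported model whose weights live in external data files.
model_path = "t5-11b-encoder.onnx"
inferred_path = "t5-11b-encoder-inferred.onnx"

# infer_shapes_path reads and writes the model on disk instead of calling
# SerializeToString() on an in-memory ModelProto, so it is not subject to
# the 2 GB protobuf message limit that infer_shapes hits.
shape_inference.infer_shapes_path(model_path, inferred_path)

# The inferred model can then be handed to the rest of the quantization pipeline.
model = onnx.load(inferred_path)
```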