
t5-11b out of memory/FileNotFoundError

See original GitHub issue

First of all, this seems like a great repo that I was super excited to find!

When testing with t5-small everything works correctly. But when trying with my custom t5-11b I get out of memory issues.

I was running this with a t5-11b as model: onnx_model_paths = generate_onnx_representation("t5-11b",model=model)

And at first I got this error:

RuntimeError: Exporting model exceed maximum protobuf size of 2GB. Please call torch.onnx.export with use_external_data_format=True.

So I simply added use_external_data_format=True to all three of the torch.onnx.export calls in onnx_exporter.py in fastT5.
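For context, here is a minimal, self-contained sketch of that flag in isolation. The module is a toy stand-in (tracing the real 11B wrappers is beyond a snippet); in fastT5 the same keyword is simply added to the three existing export calls:

    import torch
    import torch.nn as nn

    # Toy module standing in for fastT5's wrapped T5 encoder/decoder modules.
    toy = nn.Linear(16, 16)
    dummy_input = torch.randn(1, 16)

    # use_external_data_format=True makes the exporter write large tensors to
    # separate files next to the .onnx file, sidestepping the 2 GB protobuf
    # limit. (The flag exists in the PyTorch 1.x exporter used here.)
    torch.onnx.export(
        toy,
        dummy_input,
        "toy-model.onnx",
        export_params=True,
        opset_version=12,
        use_external_data_format=True,
    )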

Then I can run onnx_model_paths = generate_onnx_representation(model_name, model=model) and get no error. (The first time I posted this I got an error, but it turned out I had made a mistake and only had 100 GB of disk space; with 200 GB it worked.)

Then when running quant_model_paths = quantize(onnx_model_paths) I get the error:

FileNotFoundError                         Traceback (most recent call last)
<ipython-input-7-3a782b6d5a25> in <module>
      8 
      9 # Step 2. (recommended) quantize the converted model for fast inference and to reduce model size.
---> 10 quant_model_paths = quantize(onnx_model_paths)
     11 
     12 # step 3. setup onnx runtime

~/fastT5/fastT5/onnx_exporter.py in quantize(models_name_or_path)
    273             activation_type=QuantType.QUInt8,
    274             weight_type=QuantType.QUInt8,
--> 275             optimize_model=False,
    276         )  # op_types_to_quantize=['MatMul', 'Relu', 'Add', 'Mul' ],
    277         quant_model_paths.append(output_model_name)

/opt/conda/lib/python3.7/site-packages/onnxruntime/quantization/quantize.py in quantize_dynamic(model_input, model_output, op_types_to_quantize, per_channel, reduce_range, activation_type, weight_type, nodes_to_quantize, nodes_to_exclude, optimize_model, use_external_data_format)
    266         op_types_to_quantize = list(IntegerOpsRegistry.keys())
    267 
--> 268     model = load_model(Path(model_input), optimize_model)
    269     quantizer = ONNXQuantizer(
    270         model,

/opt/conda/lib/python3.7/site-packages/onnxruntime/quantization/quantize.py in load_model(model_path, optimize)
     51         return onnx_model.model
     52 
---> 53     return onnx.load(Path(model_path))
     54 
     55 

/opt/conda/lib/python3.7/site-packages/onnx/__init__.py in load_model(f, format, load_external_data)
    125         if model_filepath:
    126             base_dir = os.path.dirname(model_filepath)
--> 127             load_external_data_for_model(model, base_dir)
    128 
    129     return model

/opt/conda/lib/python3.7/site-packages/onnx/external_data_helper.py in load_external_data_for_model(model, base_dir)
     69     for tensor in _get_all_tensors(model):
     70         if uses_external_data(tensor):
---> 71             load_external_data_for_tensor(tensor, base_dir)
     72             # After loading raw_data from external_data, change the state of tensors
     73             tensor.data_location = TensorProto.DEFAULT

/opt/conda/lib/python3.7/site-packages/onnx/external_data_helper.py in load_external_data_for_tensor(tensor, base_dir)
     48     external_data_file_path = os.path.join(base_dir, file_location)
     49 
---> 50     with open(external_data_file_path, 'rb') as data_file:
     51 
     52         if info.offset:

FileNotFoundError: [Errno 2] No such file or directory: '/home/jupyter/encoder.embed_tokens.weight'

Has anyone successfully exported the t5-11b version and knows how to solve this?

Update:

I tried changing the working directory to /home/jupyter/models instead of /home/jupyter/, which seems to solve the FileNotFoundError. But then I again run into problems with the size:

ValueError                                Traceback (most recent call last)
<ipython-input-10-032d95bca1c8> in <module>
      1 os.chdir(r'/home/jupyter/models/')
----> 2 quant_model_paths = quantize(onnx_model_paths)

~/fastT5/fastT5/onnx_exporter.py in quantize(models_name_or_path)
    273             activation_type=QuantType.QUInt8,
    274             weight_type=QuantType.QUInt8,
--> 275             optimize_model=False,
    276         )  # op_types_to_quantize=['MatMul', 'Relu', 'Add', 'Mul' ],
    277         quant_model_paths.append(output_model_name)

/opt/conda/lib/python3.7/site-packages/onnxruntime/quantization/quantize.py in quantize_dynamic(model_input, model_output, op_types_to_quantize, per_channel, reduce_range, activation_type, weight_type, nodes_to_quantize, nodes_to_exclude, optimize_model, use_external_data_format)
    278         nodes_to_quantize,
    279         nodes_to_exclude,
--> 280         op_types_to_quantize)
    281 
    282     quantizer.quantize_model()

/opt/conda/lib/python3.7/site-packages/onnxruntime/quantization/onnx_quantizer.py in __init__(self, model, per_channel, reduce_range, mode, static, weight_qType, input_qType, tensors_range, nodes_to_quantize, nodes_to_exclude, op_types_to_quantize)
     30 
     31         # run shape inference on the model
---> 32         model = onnx.shape_inference.infer_shapes(model)
     33         self.value_infos = {vi.name: vi for vi in model.graph.value_info}
     34         self.value_infos.update({ot.name: ot for ot in model.graph.output})

/opt/conda/lib/python3.7/site-packages/onnx/shape_inference.py in infer_shapes(model, check_type, strict_mode)
     34 def infer_shapes(model, check_type=False, strict_mode=False):  # type: (ModelProto, bool, bool) -> ModelProto
     35     if isinstance(model, ModelProto):
---> 36         model_str = model.SerializeToString()
     37         inferred_model_str = C.infer_shapes(model_str, check_type, strict_mode)
     38         return onnx.load_from_string(inferred_model_str)

ValueError: Message onnx.ModelProto exceeds maximum protobuf size of 2GB: 19459248612
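For reference, quantize_dynamic itself accepts a use_external_data_format flag (visible in its signature in the traceback above), so one option is to call it with that flag so the quantized output is also written with external data. A minimal sketch with hypothetical paths; note that, as the comments below show, the shape-inference step inside the quantizer can still hit the 2 GB limit on its own:

    from onnxruntime.quantization import QuantType, quantize_dynamic

    # Hypothetical paths; fastT5 exports one .onnx per sub-model
    # (encoder, init-decoder, decoder).
    model_input = "/home/jupyter/models/t5-11b-encoder.onnx"
    model_output = "/home/jupyter/models/t5-11b-encoder-quantized.onnx"

    quantize_dynamic(
        model_input,
        model_output,
        weight_type=QuantType.QUInt8,
        optimize_model=False,
        use_external_data_format=True,  # keep >2 GB models in external-data form
    )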

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction
ViktorThink commented, May 9, 2021

Normal torch quantization works on the larger models, so anyone reading this could check that out as an alternative: https://snappishproductions.com/blog/2020/05/03/big-models-hate-this-one-weird-trick-quantization-t5--pytorch-1.4.html.html

My result was 4x smaller (with qint8) and 3x faster, so better than nothing, although I lost a little bit of accuracy.
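For reference, "normal torch quantization" here typically means PyTorch dynamic quantization of the Linear layers; a minimal sketch (t5-small stands in for the 11B checkpoint so the example is cheap to run):

    import torch
    from transformers import T5ForConditionalGeneration

    # t5-small here only so the sketch is cheap to run; the comment above is
    # about doing the same thing with t5-11b.
    model = T5ForConditionalGeneration.from_pretrained("t5-small")
    model.eval()

    # Dynamic quantization: convert Linear weights to qint8.
    quantized = torch.quantization.quantize_dynamic(
        model, {torch.nn.Linear}, dtype=torch.qint8
    )

    torch.save(quantized.state_dict(), "t5-quantized-state.pt")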

0 reactions
samanz commented, May 3, 2021

I’m getting this same error when trying to export t5-3b. Seems like this may be the more relevant onnx issue: the infer_shapes method doesn’t work with large models and is supposed to be replaced with infer_shapes_path, so that would need to be fixed in the onnxruntime project. I modified the code in onnx_quantizer to look like:

        onnx.shape_inference.infer_shapes_path(model_name, model_name + ".inferred")
        model = onnx.load(model_name + ".inferred")

while also passing a model_name into the method. The code was able to get past the shape inference step, but now failed with this error:

Quantizing... |##########                      | 1/3
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-e72945460842> in <module>
      1 # Step 2. (recommended) quantize the converted model for fast inference and to reduce model size.
----> 2 quant_model_paths = quantize(onnx_model_paths)
      3 
      4 # step 3. setup onnx runtime
      5 model_sessions = get_onnx_runtime_sessions(quant_model_paths)

~/.local/lib/python3.6/site-packages/fastT5/onnx_exporter.py in quantize(models_name_or_path)
    274             weight_type=QuantType.QUInt8,
    275             optimize_model=False,
--> 276             use_external_data_format=True,
    277         )  # op_types_to_quantize=['MatMul', 'Relu', 'Add', 'Mul' ],
    278         quant_model_paths.append(output_model_name)

~/.local/lib/python3.6/site-packages/onnxruntime/quantization/quantize.py in quantize_dynamic(model_input, model_output, op_types_to_quantize, per_channel, reduce_range, activation_type, weight_type, nodes_to_quantize, nodes_to_exclude, optimize_model, use_external_data_format)
    281         op_types_to_quantize)
    282 
--> 283     quantizer.quantize_model()
    284     quantizer.model.save_model_to_file(model_output, use_external_data_format)
    285 

~/.local/lib/python3.6/site-packages/onnxruntime/quantization/onnx_quantizer.py in quantize_model(self)
    195                 op_quantizer = CreateDefaultOpQuantizer(self, node)
    196 
--> 197             op_quantizer.quantize()
    198 
    199         self._dequantize_outputs()

~/.local/lib/python3.6/site-packages/onnxruntime/quantization/operators/matmul.py in quantize(self)
     17 
     18         (quantized_input_names, zero_point_names, scale_names, nodes) = \
---> 19             self.quantizer.quantize_inputs(node, [0, 1])
     20 
     21         matmul_integer_output = node.output[0] + "_output_quantized"

~/.local/lib/python3.6/site-packages/onnxruntime/quantization/onnx_quantizer.py in quantize_inputs(self, node, indices, initializer_use_weight_qType)
    613             if initializer is not None:
    614                 q_weight_name, zp_name, scale_name = self.quantize_weight(
--> 615                     initializer, self.weight_qType if initializer_use_weight_qType else self.input_qType)
    616 
    617                 quantized_input_names.append(q_weight_name)

~/.local/lib/python3.6/site-packages/onnxruntime/quantization/onnx_quantizer.py in quantize_weight(self, weight, qType)
    654 
    655         # Update packed weight, zero point, and scale initializers
--> 656         weight_data = self.tensor_proto_to_array(weight)
    657         _, _, zero_point, scale, q_weight_data = quantize_data(weight_data.flatten().tolist(),
    658                                                                get_qrange_for_qType(qType, self.reduce_range), qType)

~/.local/lib/python3.6/site-packages/onnxruntime/quantization/onnx_quantizer.py in tensor_proto_to_array(initializer)
    215     def tensor_proto_to_array(initializer):
    216         if initializer.data_type == onnx_proto.TensorProto.FLOAT:
--> 217             weights = onnx.numpy_helper.to_array(initializer)
    218         else:
    219             raise ValueError('Only float type quantization is supported. Weights {} is {}. '.format(

~/.local/lib/python3.6/site-packages/onnx/numpy_helper.py in to_array(tensor)
     52         return np.frombuffer(
     53             tensor.raw_data,
---> 54             dtype=np_dtype).reshape(dims)
     55     else:
     56         data = getattr(tensor, storage_field),  # type: Sequence[np.complex64]

ValueError: cannot reshape array of size 16777216 into shape (1024,4096)
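For completeness, a self-contained sketch of the infer_shapes_path workaround described in this comment, run file-to-file outside of fastT5 (the path below is hypothetical):

    import onnx
    import onnx.shape_inference

    # Hypothetical path to one of the exported sub-models.
    model_name = "/home/jupyter/models/t5-11b-encoder.onnx"

    # infer_shapes_path works on files rather than an in-memory ModelProto,
    # so it avoids the SerializeToString call that trips the 2 GB limit.
    onnx.shape_inference.infer_shapes_path(model_name, model_name + ".inferred")

    # Load the inferred model back; external data is resolved relative to the
    # directory containing the file, which is why the working directory
    # mattered for the FileNotFoundError above.
    model = onnx.load(model_name + ".inferred")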

