Out of memory error when running torchdynamo with model
Describe the bug
Hello! I get an out-of-memory error when I try to run Whisper through torchdynamo.
- It works fine without torchdynamo.
- It works fine on openai/whisper-medium.
- It works fine with num_beams=1.
- The decoder seems to be causing the problem: if I remove optimize_model(model.model.decoder), it works.
When I set use_cache to False in generate, it segfaults instead of running out of memory.
And I don't think the minifier is working for this case.
PyTorch: 1.14.0.dev20221130+cu117 (nightly)
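The OOM message in the logs below includes a hint about fragmentation ("If reserved memory is >> allocated memory..."). As a rough check of whether that hint applies here, this small sketch parses the figures out of the message and computes how much memory is reserved by PyTorch but not allocated to tensors. The helper name `parse_oom_message` is made up for illustration, and the regular expressions assume the message layout shown in this report.

```python
import re

# Message copied from the traceback in this report.
OOM_MESSAGE = (
    "CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 22.20 GiB total "
    "capacity; 17.57 GiB already allocated; 6.12 MiB free; 20.84 GiB reserved "
    "in total by PyTorch)"
)

UNIT_TO_MIB = {"KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}


def parse_oom_message(message):
    """Return the sizes quoted in a CUDA OOM message, converted to MiB.

    Hypothetical helper: it only understands the message format seen above.
    """
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) (\w+)",
        "total": r"([\d.]+) (\w+) total capacity",
        "allocated": r"([\d.]+) (\w+) already allocated",
        "free": r"([\d.]+) (\w+) free",
        "reserved": r"([\d.]+) (\w+) reserved",
    }
    sizes = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, message)
        if match is None:
            raise ValueError(f"could not find '{name}' in message")
        value, unit = match.groups()
        sizes[name] = float(value) * UNIT_TO_MIB[unit]
    return sizes


sizes = parse_oom_message(OOM_MESSAGE)
# Memory held by the caching allocator that is not backing any live tensor.
reserved_not_allocated = sizes["reserved"] - sizes["allocated"]
print(f"requested:                  {sizes['requested']:.2f} MiB")
print(f"reserved but not allocated: {reserved_not_allocated / 1024:.2f} GiB")
```

With these numbers, a few GiB are reserved but not allocated while a 20 MiB request still fails. That gap suggests fragmentation may play a part, but it does not explain why memory use grows in the first place.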
Minimal reproduction
import torch
import torch._dynamo as torchdynamo
from datasets import load_dataset
from transformers import WhisperForConditionalGeneration, WhisperProcessor

torchdynamo.config.cache_size_limit = 512

def _compiler(gm: torch.fx.GraphModule, example_inputs):
    # No-op backend: return the captured graph unchanged.
    return gm

def optimize_model(original_model) -> None:
    # Keep the eager forward and route calls through a dynamo-optimized wrapper.
    original_model.forward2 = original_model.forward

    @torchdynamo.optimize(_compiler)
    def run(*args, **kwargs):
        return original_model.forward2(*args, **kwargs)

    original_model.forward = run

audio_dataset = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large").to("cuda")
optimize_model(model.model.encoder)
optimize_model(model.model.decoder)
processor = WhisperProcessor.from_pretrained("openai/whisper-large")
speech_data = audio_dataset[0]["audio"]["array"]
inputs = processor(speech_data, return_tensors="pt", sampling_rate=16_000).input_features.to("cuda")
with torch.inference_mode(), torch.autocast(dtype=torch.float16, cache_enabled=True, device_type="cuda"):
    predicted_ids = model.generate(inputs, min_length=25, max_length=25, num_beams=2, do_sample=False)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True, normalize=True)[0]
assert transcription == "mister quilter is the apostle of the middle classes and we are glad to welcome his gospel"
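For a sense of scale, the traceback below fails inside clone_input while copying kwargs['past_key_values'][29][3], which means dynamo is cloning the cached key/value tensors passed to the decoder. The following back-of-the-envelope sketch applies the same needed_size formula as clone_input to one such tensor. The Whisper-large dimensions (32 decoder layers, 20 heads, head size 64, 1500 encoder positions) are assumptions not stated in this report, apart from the 1280-wide LayerNorm visible in the logs. It also assumes that index 3 is a contiguous cross-attention value tensor.

```python
from math import prod

# Assumed Whisper-large decoder dimensions (only the 1280 width appears in the logs).
NUM_LAYERS = 32
NUM_HEADS = 20
HEAD_DIM = 64  # 1280 / 20
ENCODER_POSITIONS = 1500
BYTES_PER_FP16 = 2


def contiguous_strides(shape):
    """Row-major strides, in elements, for a contiguous tensor of this shape."""
    strides = []
    running = 1
    for dim in reversed(shape):
        strides.append(running)
        running *= dim
    return tuple(reversed(strides))


def clone_input_elements(shape, strides):
    """Elements allocated by clone_input, using the formula shown in the traceback."""
    needed_size = sum((dim - 1) * stride for dim, stride in zip(shape, strides))
    return needed_size + 32


def mib(num_bytes):
    return num_bytes / 2**20


num_beams = 2  # from the reproduction above; the batch size is 1
shape = (num_beams, NUM_HEADS, ENCODER_POSITIONS, HEAD_DIM)
elements = clone_input_elements(shape, contiguous_strides(shape))
one_tensor_mib = mib(elements * BYTES_PER_FP16)
# Two cross-attention tensors (key and value) per decoder layer.
all_layers_mib = mib(prod(shape) * BYTES_PER_FP16 * 2 * NUM_LAYERS)

print(f"one cloned cross-attention tensor: {one_tensor_mib:.2f} MiB")
print(f"all cross-attention tensors:       {all_layers_mib:.2f} MiB")
```

Under these assumptions, each clone is a full copy of roughly 7.3 MiB, and one pass over all layers duplicates roughly 469 MiB of cross-attention cache. The 20.00 MiB in the error message is likely the block size the caching allocator requests for an allocation of this size, though that is an inference rather than something the logs confirm.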
Error logs
x = tensor([[[[-3.9331e-01, 2.4365e-01, 4.9634e-01, ..., -3.2983e-01,
5.1117e-02, 4.9341e-01],
[...129e-01, -7.0947e-01, ..., 6.1218e-02,
2.4744e-01, -2.6904e-01]]]], device='cuda:0', dtype=torch.float16)
def clone_input(x):
"""copy while preserving strides"""
def torch_clone(x):
y = torch.clone(x)
if x.is_leaf:
y.requires_grad_(x.requires_grad)
if x.is_leaf and x.grad is not None:
y.grad = clone_input(x.grad)
return y
with torch.no_grad():
if x.device.type == "xla":
# Access data_ptr() for a xla tensor will cause crash
return torch_clone(x)
needed_size = sum(
(shape - 1) * stride for shape, stride in zip(x.size(), x.stride())
)
if x.is_quantized:
result = torch.empty_quantized((needed_size + 32,), x)
else:
result = torch.empty(needed_size + 32, dtype=x.dtype, device=x.device)
cache_line_offset = (
(x.data_ptr() - result.data_ptr()) % 32
) // x.element_size()
result.as_strided_(x.size(), x.stride(), cache_line_offset)
try:
> result.copy_(x.clone())
E torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 22.20 GiB total capacity; 17.57 GiB already allocated; 6.12 MiB free; 20.84 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
/usr/local/lib/python3.9/dist-packages/torch/_dynamo/utils.py:427: OutOfMemoryError
During handling of the above exception, another exception occurred:
code = <code object run at 0x7fe85b69cdf0, file "/kernl/test/test_torchdynamo.py", line 142>
globals = {'@py_builtins': <module 'builtins' (built-in)>, '@pytest_ar': <module '_pytest.assertion.rewrite' from '/usr/local/li...auto.AutoModelForSeq2SeqLM'>, 'AutoTokenizer': <class 'transformers.models.auto.tokenization_auto.AutoTokenizer'>, ...}
locals = {'args': (), 'kwargs': {'attention_mask': None, 'cross_attn_head_mask': None, 'encoder_hidden_states': tensor([[[-9.05...eps=1e-05, elementwise_affine=True)
)
)
(layer_norm): LayerNorm((1280,), eps=1e-05, elementwise_affine=True)
)}
builtins = {'ArithmeticError': <class 'ArithmeticError'>, 'AssertionError': <class 'AssertionError'>, 'AttributeError': <class 'AttributeError'>, 'BaseException': <class 'BaseException'>, ...}
compiler_fn = <function _compiler at 0x7fe8416fab80>, one_graph = False, export = False, guard_export_fn = None, frame = <frame at 0x7fe81ce13c80, file '/kernl/test/test_torchdynamo.py', line 142, code run>
def _compile(
code: types.CodeType,
globals,
locals,
builtins,
compiler_fn,
one_graph,
export,
guard_export_fn=None,
frame=None,
) -> Optional[GuardedCode]:
output: Optional[OutputGraph] = None
# from .utils import print_once; print_once(code.co_filename)
def transform(instructions, code_options):
nonlocal output
tracer = InstructionTranslator(
instructions,
code,
locals,
globals,
builtins,
code_options,
compiler_fn,
one_graph,
export,
)
tracer.run()
output = tracer.output
assert output is not None
assert output.output_instructions
instructions[:] = output.output_instructions
code_options.update(output.code_options)
if config.dead_code_elimination:
instructions[:] = remove_pointless_jumps(remove_dead_code(instructions))
try:
for attempt in itertools.count():
try:
> out_code = transform_code_object(code, transform)
/usr/local/lib/python3.9/dist-packages/torch/_dynamo/convert_frame.py:393:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
code = <code object run at 0x7fe85b69cdf0, file "/kernl/test/test_torchdynamo.py", line 142>, transformations = <function _compile.<locals>.transform at 0x7fe7afdbfee0>, safe = False
def transform_code_object(code, transformations, safe=False):
keys = [
"co_argcount",
"co_posonlyargcount", # python 3.8+
"co_kwonlyargcount",
"co_nlocals",
"co_stacksize",
"co_flags",
"co_code",
"co_consts",
"co_names",
"co_varnames",
"co_filename",
"co_name",
"co_firstlineno",
"co_lnotab", # changed to "co_linetable" if python 3.10+
"co_freevars",
"co_cellvars",
]
if sys.version_info < (3, 8):
keys.pop(1)
if sys.version_info >= (3, 10):
keys = list(map(lambda x: x.replace("co_lnotab", "co_linetable"), keys))
code_options = {k: getattr(code, k) for k in keys}
assert len(code_options["co_varnames"]) == code_options["co_nlocals"]
instructions = cleaned_instructions(code, safe)
propagate_line_nums(instructions)
> transformations(instructions, code_options)
/usr/local/lib/python3.9/dist-packages/torch/_dynamo/bytecode_transformation.py:341:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
instructions = [Instruction(opcode=136, opname='LOAD_DEREF', arg=0, argval='original_model', offset=0, starts_line=144, is_jump_targe...(opcode=164, opname='DICT_MERGE', arg=1, argval=1, offset=10, starts_line=144, is_jump_target=False, target=None), ...]
code_options = {'co_argcount': 0, 'co_cellvars': (), 'co_code': b'\x88\x00j\x00|\x00i\x00|\x01\xa4\x01\x8e\x01S\x00', 'co_consts': (None,), ...}
def transform(instructions, code_options):
nonlocal output
> tracer = InstructionTranslator(
instructions,
code,
locals,
globals,
builtins,
code_options,
compiler_fn,
one_graph,
export,
)
/usr/local/lib/python3.9/dist-packages/torch/_dynamo/convert_frame.py:369:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <torch._dynamo.symbolic_convert.InstructionTranslator object at 0x7fe7af6b0280>
instructions = [Instruction(opcode=136, opname='LOAD_DEREF', arg=0, argval='original_model', offset=0, starts_line=144, is_jump_targe...(opcode=164, opname='DICT_MERGE', arg=1, argval=1, offset=10, starts_line=144, is_jump_target=False, target=None), ...]
f_code = <code object run at 0x7fe85b69cdf0, file "/kernl/test/test_torchdynamo.py", line 142>
f_locals = {'args': (), 'kwargs': {'attention_mask': None, 'cross_attn_head_mask': None, 'encoder_hidden_states': tensor([[[-9.05...eps=1e-05, elementwise_affine=True)
)
)
(layer_norm): LayerNorm((1280,), eps=1e-05, elementwise_affine=True)
)}
f_globals = {'@py_builtins': <module 'builtins' (built-in)>, '@pytest_ar': <module '_pytest.assertion.rewrite' from '/usr/local/li...auto.AutoModelForSeq2SeqLM'>, 'AutoTokenizer': <class 'transformers.models.auto.tokenization_auto.AutoTokenizer'>, ...}
f_builtins = {'ArithmeticError': <class 'ArithmeticError'>, 'AssertionError': <class 'AssertionError'>, 'AttributeError': <class 'AttributeError'>, 'BaseException': <class 'BaseException'>, ...}
code_options = {'co_argcount': 0, 'co_cellvars': (), 'co_code': b'\x88\x00j\x00|\x00i\x00|\x01\xa4\x01\x8e\x01S\x00', 'co_consts': (None,), ...}, compiler_fn = <function _compiler at 0x7fe8416fab80>, one_graph = False, export = False
def __init__(
self,
instructions: List[Instruction],
f_code,
f_locals,
f_globals,
f_builtins,
code_options,
compiler_fn,
one_graph,
export,
):
super(InstructionTranslator, self).__init__(
output=OutputGraph(f_globals, code_options, compiler_fn, self),
instructions=instructions,
f_locals=f_locals,
f_globals=f_globals,
f_builtins=f_builtins,
code_options=code_options,
symbolic_locals=collections.OrderedDict(), # set below
# A global var is inserted only after a STORE_GLOBAL happens to it
symbolic_globals=collections.OrderedDict(),
f_code=f_code,
export=export,
)
self.one_graph: bool = one_graph
self.export = export
if self.export:
assert (
self.one_graph
), "Export without one graph - something has gone wrong."
vars = list(code_options["co_varnames"])
vars.extend(x for x in self.cell_and_freevars() if x not in vars)
> self.symbolic_locals = collections.OrderedDict(
(k, VariableBuilder(self, LocalSource(k))(f_locals[k]))
for k in vars
if k in f_locals
)
/usr/local/lib/python3.9/dist-packages/torch/_dynamo/symbolic_convert.py:1567:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
.0 = <list_iterator object at 0x7fe7f08c1730>
self.symbolic_locals = collections.OrderedDict(
> (k, VariableBuilder(self, LocalSource(k))(f_locals[k]))
for k in vars
if k in f_locals
)
/usr/local/lib/python3.9/dist-packages/torch/_dynamo/symbolic_convert.py:1568:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <torch._dynamo.variables.builder.VariableBuilder object at 0x7fe7af63b160>
value = {'attention_mask': None, 'cross_attn_head_mask': None, 'encoder_hidden_states': tensor([[[-9.0559e-01, 4.3369e-02, 8...8e-01, -2.8411e-01, ..., 9.2083e-01,
-2.8621e-01, 1.7712e-01]]], device='cuda:0'), 'head_mask': None, ...}
def __call__(self, value):
if value in self.tx.output.side_effects:
# TODO(jansel): add guard for alias relationship
return self.tx.output.side_effects[value]
> return self._wrap(value).clone(**self.options())
/usr/local/lib/python3.9/dist-packages/torch/_dynamo/variables/builder.py:146:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <torch._dynamo.variables.builder.VariableBuilder object at 0x7fe7af63b160>
value = {'attention_mask': None, 'cross_attn_head_mask': None, 'encoder_hidden_states': tensor([[[-9.0559e-01, 4.3369e-02, 8...8e-01, -2.8411e-01, ..., 9.2083e-01,
-2.8621e-01, 1.7712e-01]]], device='cuda:0'), 'head_mask': None, ...}
def _wrap(self, value):
make_guards = self.make_guards
if istype(value, (torch.SymInt, torch.SymFloat)):
return self.wrap_sym(value)
if istensor(value):
return self.wrap_tensor(value)
elif istype(value, (tuple, list, odict_values)) or is_namedtuple(value):
# One can index a tensor with a list/tuple. Therefore, we need to
# have a stricter match.
if istype(value, (tuple, list)) and all(
[isinstance(x, int) or is_numpy_int_type(x) or x is None for x in value]
):
guards = self.make_guards(GuardBuilder.EQUALS_MATCH)
else:
guards = self.make_guards(GuardBuilder.LIST_LENGTH)
output = [
VariableBuilder(self.tx, GetItemSource(self.get_source(), i))(
item
).add_guards(guards)
for i, item in enumerate(value)
]
result = self.list_type(value)(output, guards=guards)
if istype(value, list):
return self.tx.output.side_effects.track_list(
self.source, value, result
)
return result
elif istype(value, tuple_iterator):
guards = self.make_guards(GuardBuilder.TUPLE_ITERATOR_LEN)
output = [
VariableBuilder(
self.tx, TupleIteratorGetItemSource(self.get_source(), i)
)(tuple_iterator_getitem(value, i)).add_guards(guards)
for i in range(tuple_iterator_len(value))
]
return ListIteratorVariable(
output, mutable_local=MutableLocal(), guards=guards
)
elif istype(value, (slice, range)):
items = [
VariableBuilder(self.tx, AttrSource(self.get_source(), k))(
getattr(value, k)
)
for k in ("start", "stop", "step")
]
if isinstance(value, slice):
return SliceVariable(items, guards=make_guards(GuardBuilder.TYPE_MATCH))
else:
return RangeVariable(
items, guards=make_guards(GuardBuilder.EQUALS_MATCH)
)
elif istype(
value, (dict, collections.defaultdict, collections.OrderedDict)
) and all(
map(
lambda k: ConstantVariable.is_literal(k)
or self.tensor_can_be_dict_key(k),
value.keys(),
)
):
guards = self.make_guards(GuardBuilder.DICT_KEYS)
# store key variables in global location for reconstruction
for key in value.keys():
if self.tensor_can_be_dict_key(key):
self.tx.store_dict_key(global_key_name(key), key)
def index_source(key):
if self.tensor_can_be_dict_key(key):
return GlobalWeakRefSource(global_key_name(key))
else:
return key
result = dict(
> [
(
k,
VariableBuilder(
self.tx, GetItemSource(self.get_source(), index_source(k))
)(value[k]).add_guards(guards),
)
for k in value.keys()
]
)
/usr/local/lib/python3.9/dist-packages/torch/_dynamo/variables/builder.py:279:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
.0 = <dict_keyiterator object at 0x7fe7ec0d7130>
[
(
k,
> VariableBuilder(
self.tx, GetItemSource(self.get_source(), index_source(k))
)(value[k]).add_guards(guards),
)
for k in value.keys()
]
)
/usr/local/lib/python3.9/dist-packages/torch/_dynamo/variables/builder.py:282:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <torch._dynamo.variables.builder.VariableBuilder object at 0x7fe7af445430>
value = ((tensor([[[[-2.9565e-01, 2.6001e-01, 5.9033e-01, ..., -2.0483e-01,
6.0205e-01, 1.7151e-01],
..., -3.5449e-01, ..., -3.3234e-02,
-1.8591e-01, 6.0539e-03]]]], device='cuda:0', dtype=torch.float16)), ...)
def __call__(self, value):
if value in self.tx.output.side_effects:
# TODO(jansel): add guard for alias relationship
return self.tx.output.side_effects[value]
> return self._wrap(value).clone(**self.options())
/usr/local/lib/python3.9/dist-packages/torch/_dynamo/variables/builder.py:146:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <torch._dynamo.variables.builder.VariableBuilder object at 0x7fe7af445430>
value = ((tensor([[[[-2.9565e-01, 2.6001e-01, 5.9033e-01, ..., -2.0483e-01,
6.0205e-01, 1.7151e-01],
..., -3.5449e-01, ..., -3.3234e-02,
-1.8591e-01, 6.0539e-03]]]], device='cuda:0', dtype=torch.float16)), ...)
def _wrap(self, value):
make_guards = self.make_guards
if istype(value, (torch.SymInt, torch.SymFloat)):
return self.wrap_sym(value)
if istensor(value):
return self.wrap_tensor(value)
elif istype(value, (tuple, list, odict_values)) or is_namedtuple(value):
# One can index a tensor with a list/tuple. Therefore, we need to
# have a stricter match.
if istype(value, (tuple, list)) and all(
[isinstance(x, int) or is_numpy_int_type(x) or x is None for x in value]
):
guards = self.make_guards(GuardBuilder.EQUALS_MATCH)
else:
guards = self.make_guards(GuardBuilder.LIST_LENGTH)
> output = [
VariableBuilder(self.tx, GetItemSource(self.get_source(), i))(
item
).add_guards(guards)
for i, item in enumerate(value)
]
/usr/local/lib/python3.9/dist-packages/torch/_dynamo/variables/builder.py:220:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
.0 = <enumerate object at 0x7fe7af9b2ec0>
output = [
> VariableBuilder(self.tx, GetItemSource(self.get_source(), i))(
item
).add_guards(guards)
for i, item in enumerate(value)
]
/usr/local/lib/python3.9/dist-packages/torch/_dynamo/variables/builder.py:221:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <torch._dynamo.variables.builder.VariableBuilder object at 0x7fe7f09e64c0>
value = (tensor([[[[-0.7871, -0.1460, 0.0963, ..., -0.1921, 0.1439, -0.1443],
[-0.3533, -0.3508, 0.0318, ..., ...29e-01, -7.0947e-01, ..., 6.1218e-02,
2.4744e-01, -2.6904e-01]]]], device='cuda:0', dtype=torch.float16))
def __call__(self, value):
if value in self.tx.output.side_effects:
# TODO(jansel): add guard for alias relationship
return self.tx.output.side_effects[value]
> return self._wrap(value).clone(**self.options())
/usr/local/lib/python3.9/dist-packages/torch/_dynamo/variables/builder.py:146:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <torch._dynamo.variables.builder.VariableBuilder object at 0x7fe7f09e64c0>
value = (tensor([[[[-0.7871, -0.1460, 0.0963, ..., -0.1921, 0.1439, -0.1443],
[-0.3533, -0.3508, 0.0318, ..., ...29e-01, -7.0947e-01, ..., 6.1218e-02,
2.4744e-01, -2.6904e-01]]]], device='cuda:0', dtype=torch.float16))
def _wrap(self, value):
make_guards = self.make_guards
if istype(value, (torch.SymInt, torch.SymFloat)):
return self.wrap_sym(value)
if istensor(value):
return self.wrap_tensor(value)
elif istype(value, (tuple, list, odict_values)) or is_namedtuple(value):
# One can index a tensor with a list/tuple. Therefore, we need to
# have a stricter match.
if istype(value, (tuple, list)) and all(
[isinstance(x, int) or is_numpy_int_type(x) or x is None for x in value]
):
guards = self.make_guards(GuardBuilder.EQUALS_MATCH)
else:
guards = self.make_guards(GuardBuilder.LIST_LENGTH)
> output = [
VariableBuilder(self.tx, GetItemSource(self.get_source(), i))(
item
).add_guards(guards)
for i, item in enumerate(value)
]
/usr/local/lib/python3.9/dist-packages/torch/_dynamo/variables/builder.py:220:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
.0 = <enumerate object at 0x7fe7f054a440>
output = [
> VariableBuilder(self.tx, GetItemSource(self.get_source(), i))(
item
).add_guards(guards)
for i, item in enumerate(value)
]
/usr/local/lib/python3.9/dist-packages/torch/_dynamo/variables/builder.py:221:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <torch._dynamo.variables.builder.VariableBuilder object at 0x7fe7af4e2070>
value = tensor([[[[-3.9331e-01, 2.4365e-01, 4.9634e-01, ..., -3.2983e-01,
5.1117e-02, 4.9341e-01],
[...129e-01, -7.0947e-01, ..., 6.1218e-02,
2.4744e-01, -2.6904e-01]]]], device='cuda:0', dtype=torch.float16)
def __call__(self, value):
if value in self.tx.output.side_effects:
# TODO(jansel): add guard for alias relationship
return self.tx.output.side_effects[value]
> return self._wrap(value).clone(**self.options())
/usr/local/lib/python3.9/dist-packages/torch/_dynamo/variables/builder.py:146:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <torch._dynamo.variables.builder.VariableBuilder object at 0x7fe7af4e2070>
value = tensor([[[[-3.9331e-01, 2.4365e-01, 4.9634e-01, ..., -3.2983e-01,
5.1117e-02, 4.9341e-01],
[...129e-01, -7.0947e-01, ..., 6.1218e-02,
2.4744e-01, -2.6904e-01]]]], device='cuda:0', dtype=torch.float16)
def _wrap(self, value):
make_guards = self.make_guards
if istype(value, (torch.SymInt, torch.SymFloat)):
return self.wrap_sym(value)
if istensor(value):
> return self.wrap_tensor(value)
/usr/local/lib/python3.9/dist-packages/torch/_dynamo/variables/builder.py:210:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <torch._dynamo.variables.builder.VariableBuilder object at 0x7fe7af4e2070>
value = tensor([[[[-3.9331e-01, 2.4365e-01, 4.9634e-01, ..., -3.2983e-01,
5.1117e-02, 4.9341e-01],
[...129e-01, -7.0947e-01, ..., 6.1218e-02,
2.4744e-01, -2.6904e-01]]]], device='cuda:0', dtype=torch.float16)
def wrap_tensor(self, value: torch.Tensor):
if self.get_source().guard_source().is_nn_module():
return self.tx.output.register_attr_or_module(
value,
self.name,
source=self.get_source(),
# Guards are done inside register_attr_or_module
# guards=self.make_guards(GuardBuilder.TENSOR_MATCH),
)
else:
if not is_constant_source(self.get_source()):
self.tx.output.graphargs.append(
GraphArg(self.get_source(), value, False)
)
# Disable __torch_function__ to prevent cloning of `value` to hit
# us
with torch._C.DisableTorchFunction():
if is_constant_source(self.get_source()):
return self.tx.output.register_attr_or_module(
value,
re.sub(r"[^a-zA-Z0-9]+", "_", self.name),
source=None,
# Guards are added inside register_attr_or_module
)
> tensor_variable = wrap_fx_proxy(
tx=self.tx,
proxy=self.tx.output.create_graph_input(
re.sub(r"[^a-zA-Z0-9]+", "_", self.name), type(value)
),
example_value=value,
guards=self.make_guards(GuardBuilder.TENSOR_MATCH),
should_specialize=self.tensor_should_specialize(),
)
/usr/local/lib/python3.9/dist-packages/torch/_dynamo/variables/builder.py:568:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tx = <torch._dynamo.symbolic_convert.InstructionTranslator object at 0x7fe7af6b0280>, proxy = Proxy(kwargs_past_key_values_29_3_)
example_value = tensor([[[[-3.9331e-01, 2.4365e-01, 4.9634e-01, ..., -3.2983e-01,
5.1117e-02, 4.9341e-01],
[...129e-01, -7.0947e-01, ..., 6.1218e-02,
2.4744e-01, -2.6904e-01]]]], device='cuda:0', dtype=torch.float16)
options = {'guards': {Guard(name="kwargs['past_key_values'][29][3]", source=<GuardSource.LOCAL: 0>, create_fn=<function GuardBui...le=False, guard_types=None, code_list=None, obj_weakref=None, guarded_class_weakref=None)}, 'should_specialize': False}
def wrap_fx_proxy(tx, proxy, example_value=None, **options):
> return wrap_fx_proxy_cls(
target_cls=TensorVariable,
tx=tx,
proxy=proxy,
example_value=example_value,
**options,
)
/usr/local/lib/python3.9/dist-packages/torch/_dynamo/variables/builder.py:657:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
target_cls = <class 'torch._dynamo.variables.tensor.TensorVariable'>, tx = <torch._dynamo.symbolic_convert.InstructionTranslator object at 0x7fe7af6b0280>, proxy = Proxy(kwargs_past_key_values_29_3_)
example_value = tensor([[[[-3.9331e-01, 2.4365e-01, 4.9634e-01, ..., -3.2983e-01,
5.1117e-02, 4.9341e-01],
[...129e-01, -7.0947e-01, ..., 6.1218e-02,
2.4744e-01, -2.6904e-01]]]], device='cuda:0', dtype=torch.float16)
options = {'guards': {Guard(name="kwargs['past_key_values'][29][3]", source=<GuardSource.LOCAL: 0>, create_fn=<function GuardBui...le=False, guard_types=None, code_list=None, obj_weakref=None, guarded_class_weakref=None)}, 'should_specialize': False}
initial_example_value = tensor([[[[-3.9331e-01, 2.4365e-01, 4.9634e-01, ..., -3.2983e-01,
5.1117e-02, 4.9341e-01],
[...129e-01, -7.0947e-01, ..., 6.1218e-02,
2.4744e-01, -2.6904e-01]]]], device='cuda:0', dtype=torch.float16)
_clone_input = <function wrap_fx_proxy_cls.<locals>._clone_input at 0x7fe7af8b9dc0>
def wrap_fx_proxy_cls(target_cls, tx, proxy, example_value=None, **options):
if "guards" in options and options["guards"] is not None:
tx.output.guards.update(options["guards"])
assert "example_value" not in proxy.node.meta
if not config.dynamic_propagation:
if isinstance(example_value, torch.Tensor):
options.update(target_cls.specialize(example_value))
return target_cls(proxy, **options)
initial_example_value = example_value
def _clone_input(value):
if isinstance(value, torch.Tensor):
# tensor subclasses will not be converted to FakeTensors and need to be cloned
if not isinstance(value, torch._subclasses.fake_tensor.FakeTensor):
# NB: ensure strides are preserved
value = clone_input(value)
return value
with preserve_rng_state():
if example_value is None:
example_value = get_fake_value(proxy.node, tx)
else:
> proxy.tracer.real_value_cache[proxy.node] = _clone_input(example_value)
/usr/local/lib/python3.9/dist-packages/torch/_dynamo/variables/builder.py:694:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
value = tensor([[[[-3.9331e-01, 2.4365e-01, 4.9634e-01, ..., -3.2983e-01,
5.1117e-02, 4.9341e-01],
[...129e-01, -7.0947e-01, ..., 6.1218e-02,
2.4744e-01, -2.6904e-01]]]], device='cuda:0', dtype=torch.float16)
def _clone_input(value):
if isinstance(value, torch.Tensor):
# tensor subclasses will not be converted to FakeTensors and need to be cloned
if not isinstance(value, torch._subclasses.fake_tensor.FakeTensor):
# NB: ensure strides are preserved
> value = clone_input(value)
/usr/local/lib/python3.9/dist-packages/torch/_dynamo/variables/builder.py:685:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
x = tensor([[[[-3.9331e-01, 2.4365e-01, 4.9634e-01, ..., -3.2983e-01,
5.1117e-02, 4.9341e-01],
[...129e-01, -7.0947e-01, ..., 6.1218e-02,
2.4744e-01, -2.6904e-01]]]], device='cuda:0', dtype=torch.float16)
def clone_input(x):
"""copy while preserving strides"""
def torch_clone(x):
y = torch.clone(x)
if x.is_leaf:
y.requires_grad_(x.requires_grad)
if x.is_leaf and x.grad is not None:
y.grad = clone_input(x.grad)
return y
with torch.no_grad():
if x.device.type == "xla":
# Access data_ptr() for a xla tensor will cause crash
return torch_clone(x)
needed_size = sum(
(shape - 1) * stride for shape, stride in zip(x.size(), x.stride())
)
if x.is_quantized:
result = torch.empty_quantized((needed_size + 32,), x)
else:
result = torch.empty(needed_size + 32, dtype=x.dtype, device=x.device)
cache_line_offset = (
(x.data_ptr() - result.data_ptr()) % 32
) // x.element_size()
result.as_strided_(x.size(), x.stride(), cache_line_offset)
try:
result.copy_(x.clone())
if x.is_leaf:
result.requires_grad_(x.requires_grad)
if x.is_leaf and x.grad is not None:
result.grad = clone_input(x.grad)
except RuntimeError:
# RuntimeError: unsupported operation: more than one element of the written-to
# tensor refers to a single memory location. Please clone() the tensor before
# performing the operation.
> return torch_clone(x)
/usr/local/lib/python3.9/dist-packages/torch/_dynamo/utils.py:436:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
x = tensor([[[[-3.9331e-01, 2.4365e-01, 4.9634e-01, ..., -3.2983e-01,
5.1117e-02, 4.9341e-01],
[...129e-01, -7.0947e-01, ..., 6.1218e-02,
2.4744e-01, -2.6904e-01]]]], device='cuda:0', dtype=torch.float16)
def torch_clone(x):
> y = torch.clone(x)
E torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 22.20 GiB total capacity; 17.57 GiB already allocated; 6.12 MiB free; 20.84 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
E
E Set torch._dynamo.config.verbose=True for more information
E
E
E You can suppress this exception and fall back to eager by setting:
E torch._dynamo.config.suppress_errors = True
/usr/local/lib/python3.9/dist-packages/torch/_dynamo/utils.py:403: OutOfMemoryError
The above exception was the direct cause of the following exception:
benchmark = <kernl.benchmark.benchmark_fixture.BenchmarkFixture object at 0x7fe85b69b910>, implementation = 'optimized'
@setup_dynamo()
@pytest.mark.parametrize("implementation", ["optimized"])
def test_whisper_hf(benchmark, implementation):
audio_dataset = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large").to("cuda")
optimize_model(model.model.encoder)
optimize_model(model.model.decoder)
processor = WhisperProcessor.from_pretrained("openai/whisper-large")
speech_data = audio_dataset[0]["audio"]["array"]
inputs = processor(speech_data, return_tensors="pt", sampling_rate=16_000).input_features.to("cuda")
with torch.inference_mode(), torch.autocast(dtype=torch.float16, cache_enabled=True, device_type="cuda"):
> predicted_ids = benchmark(model.generate, inputs, min_length=25, max_length=25, num_beams=2, do_sample=False)
test/test_torchdynamo.py:162:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
src/kernl/benchmark/benchmark_fixture.py:53: in __call__
    function_to_benchmark(*args, **kwargs)
/usr/local/lib/python3.9/dist-packages/torch/autograd/grad_mode.py:34: in decorate_context
    return func(*args, **kwargs)
/usr/local/lib/python3.9/dist-packages/transformers/generation_utils.py:1577: in generate
    return self.beam_search(
/usr/local/lib/python3.9/dist-packages/transformers/generation_utils.py:2747: in beam_search
    outputs = self(
/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py:1480: in _call_impl
    return forward_call(*args, **kwargs)
/usr/local/lib/python3.9/dist-packages/transformers/models/whisper/modeling_whisper.py:1192: in forward
    outputs = self.model(
/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py:1480: in _call_impl
    return forward_call(*args, **kwargs)
/usr/local/lib/python3.9/dist-packages/transformers/models/whisper/modeling_whisper.py:1061: in forward
    decoder_outputs = self.decoder(
/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py:1480: in _call_impl
    return forward_call(*args, **kwargs)
/usr/local/lib/python3.9/dist-packages/torch/_dynamo/eval_frame.py:209: in _fn
    return fn(*args, **kwargs)
/usr/local/lib/python3.9/dist-packages/torch/_dynamo/eval_frame.py:329: in catch_errors
    return callback(frame, cache_size)
/usr/local/lib/python3.9/dist-packages/torch/_dynamo/convert_frame.py:466: in _convert_frame
    result = inner_convert(frame, cache_size)
/usr/local/lib/python3.9/dist-packages/torch/_dynamo/convert_frame.py:103: in _fn
    return fn(*args, **kwargs)
/usr/local/lib/python3.9/dist-packages/torch/_dynamo/utils.py:90: in time_wrapper
    r = func(*args, **kwargs)
/usr/local/lib/python3.9/dist-packages/torch/_dynamo/convert_frame.py:337: in _convert_frame_assert
    return _compile(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
code = <code object run at 0x7fe85b69cdf0, file "/kernl/test/test_torchdynamo.py", line 142>
globals = {'@py_builtins': <module 'builtins' (built-in)>, '@pytest_ar': <module '_pytest.assertion.rewrite' from '/usr/local/li...auto.AutoModelForSeq2SeqLM'>, 'AutoTokenizer': <class 'transformers.models.auto.tokenization_auto.AutoTokenizer'>, ...}
locals = {'args': (), 'kwargs': {'attention_mask': None, 'cross_attn_head_mask': None, 'encoder_hidden_states': tensor([[[-9.05...eps=1e-05, elementwise_affine=True)
)
)
(layer_norm): LayerNorm((1280,), eps=1e-05, elementwise_affine=True)
)}
builtins = {'ArithmeticError': <class 'ArithmeticError'>, 'AssertionError': <class 'AssertionError'>, 'AttributeError': <class 'AttributeError'>, 'BaseException': <class 'BaseException'>, ...}
compiler_fn = <function _compiler at 0x7fe8416fab80>, one_graph = False, export = False, guard_export_fn = None, frame = <frame at 0x7fe81ce13c80, file '/kernl/test/test_torchdynamo.py', line 142, code run>
    def _compile(
        code: types.CodeType,
        globals,
        locals,
        builtins,
        compiler_fn,
        one_graph,
        export,
        guard_export_fn=None,
        frame=None,
    ) -> Optional[GuardedCode]:
        output: Optional[OutputGraph] = None
        # from .utils import print_once; print_once(code.co_filename)
        def transform(instructions, code_options):
            nonlocal output
            tracer = InstructionTranslator(
                instructions,
                code,
                locals,
                globals,
                builtins,
                code_options,
                compiler_fn,
                one_graph,
                export,
            )
            tracer.run()
            output = tracer.output
            assert output is not None
            assert output.output_instructions
            instructions[:] = output.output_instructions
            code_options.update(output.code_options)
            if config.dead_code_elimination:
                instructions[:] = remove_pointless_jumps(remove_dead_code(instructions))
        try:
            for attempt in itertools.count():
                try:
                    out_code = transform_code_object(code, transform)
                    orig_code_map[out_code] = code
                    break
                except exc.RestartAnalysis:
                    log.debug("Restarting analysis ...")
                    if attempt > 100:
                        unimplemented("100+ RestartAnalysis() calls")
                except exc.SkipFrame:
                    log.debug(
                        f"Skipping frame {code.co_name} \
                        {code.co_filename} {code.co_firstlineno}"
                    )
                    if one_graph:
                        log.debug("No graph captured with one_graph=True")
                    return None
            output_codes.add(out_code)
            log.log(
                logging.CODE, # type: ignore[attr-defined]
                format_bytecode(
                    "ORIGINAL BYTECODE",
                    code.co_name,
                    code.co_filename,
                    code.co_firstlineno,
                    code,
                ),
            )
            log.log(
                logging.CODE, # type: ignore[attr-defined]
                format_bytecode(
                    "MODIFIED BYTECODE",
                    code.co_name,
                    code.co_filename,
                    code.co_firstlineno,
                    out_code,
                ),
            )
            assert output is not None
            assert output.guards is not None
            CleanupManager.instance[out_code] = output.cleanups
            check_fn = CheckFunctionManager(output, output.guards, locals, globals)
            guarded_code = GuardedCode(out_code, check_fn.check_fn)
            guard_str = "GUARDS:\n"
            guard_str += "\n".join([f" - {str(guard)}" for guard in sorted(output.guards)])
            log.log(logging.CODE, guard_str) # type: ignore[attr-defined]
            if guard_export_fn is not None:
                guard_export_fn(output.guards)
            return guarded_code
        except (
            Unsupported,
            TorchRuntimeError,
            BackendCompilerFailed,
            AssertionError,
        ) as e:
            exception_handler(e, code, frame)
            raise
        except Exception as e:
            exception_handler(e, code, frame)
>           raise InternalTorchDynamoError() from e
E           torch._dynamo.exc.InternalTorchDynamoError
/usr/local/lib/python3.9/dist-packages/torch/_dynamo/convert_frame.py:456: InternalTorchDynamoError
----------------------------------------------------------------------------------------------------------- Captured log call -----------------------------------------------------------------------------------------------------------
WARNING datasets.builder:builder.py:747 Found cached dataset librispeech_asr_dummy (/root/.cache/huggingface/datasets/hf-internal-testing___librispeech_asr_dummy/clean/2.1.0/d3bc4c2bc2078fcde3ad0f0f635862e4c0fef78ba94c4a34c4c250a097af240b)
=========================================================================================================== warnings summary ============================================================================================================
../usr/lib/python3/dist-packages/requests/__init__.py:89
/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.13) or chardet (3.0.4) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
======================================================================================================== short test summary info ========================================================================================================
FAILED test/test_torchdynamo.py::test_whisper_hf[optimized] - torch._dynamo.exc.InternalTorchDynamoError
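The test is reported as failing with `InternalTorchDynamoError`, but the log shows that exception was raised `from` an `OutOfMemoryError`, so the wrapper hides the real cause. Below is a minimal sketch of recovering the root cause by walking Python's exception chain. The two exception classes are local stand-ins (an assumption, so the snippet runs without PyTorch or a GPU), and the helper names `exception_chain` and `root_cause` are hypothetical, not part of any library.

```python
def exception_chain(exc):
    """Yield exc, then every exception linked via __cause__ / __context__."""
    seen = set()
    while exc is not None and id(exc) not in seen:
        seen.add(id(exc))
        yield exc
        # `raise X from e` sets __cause__; implicit chaining sets __context__.
        exc = exc.__cause__ or exc.__context__


def root_cause(exc):
    """Return the innermost exception in the chain."""
    *_, last = exception_chain(exc)
    return last


# Stand-ins for the real classes (assumption: same names, no PyTorch needed).
class OutOfMemoryError(RuntimeError):
    pass


class InternalTorchDynamoError(RuntimeError):
    pass


def failing_compile():
    """Mimic the wrapping pattern seen at the end of the traceback."""
    try:
        raise OutOfMemoryError("CUDA out of memory (simulated)")
    except Exception as e:
        raise InternalTorchDynamoError() from e


try:
    failing_compile()
except InternalTorchDynamoError as err:
    cause = root_cause(err)
    # prints: OutOfMemoryError - CUDA out of memory (simulated)
    print(type(cause).__name__, "-", cause)
```

In a real test, the same helper could be applied to the exception caught around `model.generate(...)` to assert on the underlying error type instead of the generic wrapper.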
Minified repro
No response
Issue Analytics
- State:
- Created 10 months ago
- Comments: 12 (5 by maintainers)
Top GitHub Comments
With "eager", I can't raise the cache_size_limit above 64 without getting OOM
With "ofi", even at cache_size_limit=64, I'm getting OOM, also a bunch of these warnings that I haven't had time to research