
tutorials: caught MemoryError when "Running in bulk" in deep/define-ml-pipeline#running-in-bulk


Please provide information about your setup

  • DVC version: 0.40.2 (installed by pip)
  • OS: Ubuntu 18.04
  • RAM: 8GB

~~I am following a tutorial in https://dvc.org/doc/tutorial/define-ml-pipeline.~~ UPDATE: This refers to http://localhost:3000/doc/tutorials/deep/define-ml-pipeline#running-in-bulk now.

In the “Running in bulk” section, the following command failed with a MemoryError.

$ dvc run -d code/featurization.py -d code/conf.py \
            -d data/Posts-train.tsv -d data/Posts-test.tsv \
            -o data/matrix-train.p -o data/matrix-test.p \
            python code/featurization.py
Running command:
	python code/featurization.py
The input data frame data/Posts-train.tsv size is (66999, 3)
Traceback (most recent call last):
  File "code/featurization.py", line 48, in <module>
    train_words = np.array(df_train.text.str.lower().values.astype('U'))
MemoryError
ERROR: failed to run command - stage 'matrix-train.p.dvc' cmd python code/featurization.py failed

Having any troubles?. Hit us up at https://dvc.org/support, we are always happy to help!
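
The failing line converts the whole text column to a NumPy fixed-width Unicode array. NumPy's `U` dtype stores every element at the width of the longest string and uses 4 bytes per character, so a single very long post inflates the allocation for all 66999 rows. A rough estimate can be sketched as follows; the helper name and the 30,000-character maximum are illustrative assumptions, not values measured on the tutorial data:

```python
def fixed_width_unicode_bytes(texts):
    """Estimate the bytes NumPy needs for np.array(texts).astype('U').

    The 'U' dtype is fixed-width: every element is padded to the longest
    string, and each character takes 4 bytes (UCS-4).
    """
    texts = list(texts)
    if not texts:
        return 0
    # NumPy uses a width of at least one character, even for empty strings.
    longest = max(1, max(len(t) for t in texts))
    return len(texts) * longest * 4


# Toy sample: one long entry sets the width for every row.
sample = ["short", "tiny", "x" * 1000]
print(fixed_width_unicode_bytes(sample), "bytes")

# Hypothetical scale-up: 66999 rows (from the log above) and an ASSUMED
# longest post of 30,000 characters.
rows, assumed_longest = 66999, 30_000
print(rows * assumed_longest * 4 / 1e9, "GB")
```

With those assumed numbers the estimate is about 8 GB for this one array, which would already exhaust the 8GB machine in the report. The real figure depends on the longest entry in data/Posts-train.tsv.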

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 30 (28 by maintainers)

Top GitHub Comments

5 reactions
ryokugyu commented, Jun 27, 2019

@Naba7 I am working on a new tutorial. It will be up soon, with a smaller dataset and lower RAM requirements.

3 reactions
depate commented, Mar 6, 2020

This issue affects me as well.

Paste from my Discord message:

What’s wrong? While running featurization.py I get some kind of buffer overflow. 16GB of RAM is consumed in seconds, and execution halts after the system freezes for a couple of seconds.

I get the output The input data frame data/Posts-train.tsv size is (66999, 3), so the code is valid up to that point. The next step most likely goes wrong, because an injected print(test) placed after train_words never shows up.

My setup includes 16GB of RAM. Contrary to the earlier reports, no MemoryError is raised for me. I think dvc may not be verbose about Python errors.
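
To tell a Python-level `MemoryError` apart from a process that was terminated from outside (for example by the operating system when memory runs out, which is only a guess for this report), one option is to run the script directly and inspect its exit status. This is a minimal sketch for POSIX systems; `describe_exit` is a hypothetical helper, and it says nothing about how DVC itself reports failures:

```python
import signal
import subprocess
import sys


def describe_exit(cmd):
    """Run cmd and summarize how it ended (POSIX semantics)."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode < 0:
        # A negative return code means the child was ended by a signal.
        return "killed by " + signal.Signals(-proc.returncode).name
    if "MemoryError" in proc.stderr:
        # The interpreter printed a traceback that names MemoryError.
        return "raised MemoryError"
    return f"exited with code {proc.returncode}"


if __name__ == "__main__":
    # Stand-in child that raises MemoryError itself. For the real case the
    # command would be [sys.executable, "code/featurization.py"].
    print(describe_exit([sys.executable, "-c", "raise MemoryError"]))
```

A result of "killed by SIGKILL" would suggest the process never had the chance to raise an exception, which would match the absence of a traceback described above.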


