Add MoleculeNet wrappers for Factors/Kinase/UV datasets

I recently ran python benchmark.py from examples/ using a single K80 GPU on an Amazon EC2 p2.xlarge instance. The benchmark failed after 18.5 hours with the following error:

Traceback (most recent call last):
  File "benchmark.py", line 1097, in <module>
    test=test)
  File "benchmark.py", line 179, in benchmark_loading_datasets
    featurizer=featurizer)
  File "/home/ubuntu/deepchem/examples/kaggle/kaggle_datasets.py", line 139, in load_kaggle
    shard_size=shard_size)
  File "/home/ubuntu/deepchem/examples/kaggle/kaggle_datasets.py", line 74, in gen_kaggle
    train_dataset = loader.featurize(train_files, shard_size=shard_size)
  File "/home/ubuntu/anaconda3/envs/deepchem/lib/python3.5/site-packages/deepchem/data/data_loader.py", line 197, in featurize
    shard_generator(), data_dir, self.tasks, verbose=self.verbose)
  File "/home/ubuntu/anaconda3/envs/deepchem/lib/python3.5/site-packages/deepchem/data/datasets.py", line 440, in create_dataset
    for shard_num, (X, y, w, ids) in enumerate(shard_generator):
  File "/home/ubuntu/anaconda3/envs/deepchem/lib/python3.5/site-packages/deepchem/data/data_loader.py", line 174, in shard_generator
    self.get_shards(input_files, shard_size)):
  File "/home/ubuntu/anaconda3/envs/deepchem/lib/python3.5/site-packages/deepchem/utils/save.py", line 98, in load_csv_files
    for df in pd.read_csv(filename, chunksize=shard_size):
  File "/home/ubuntu/anaconda3/envs/deepchem/lib/python3.5/site-packages/pandas/io/parsers.py", line 646, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/ubuntu/anaconda3/envs/deepchem/lib/python3.5/site-packages/pandas/io/parsers.py", line 389, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/ubuntu/anaconda3/envs/deepchem/lib/python3.5/site-packages/pandas/io/parsers.py", line 730, in __init__
    self._make_engine(self.engine)
  File "/home/ubuntu/anaconda3/envs/deepchem/lib/python3.5/site-packages/pandas/io/parsers.py", line 923, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/home/ubuntu/anaconda3/envs/deepchem/lib/python3.5/site-packages/pandas/io/parsers.py", line 1390, in __init__
    self._reader = _parser.TextReader(src, **kwds)
  File "pandas/parser.pyx", line 373, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:4184)
  File "pandas/parser.pyx", line 609, in pandas.parser.TextReader._setup_parser_source (pandas/parser.c:7348)
  File "/home/ubuntu/anaconda3/envs/deepchem/lib/python3.5/gzip.py", line 163, in __init__
    fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
FileNotFoundError: [Errno 2] No such file or directory: '/home/ubuntu/deepchem/examples/kaggle/KAGGLE_training_disguised_combined_full.csv.gz'

The time elapsed was: real 1125m29.127s user 1723m4.204s sys 170m16.504s

This issue is to ensure that the file /home/ubuntu/deepchem/examples/kaggle/KAGGLE_training_disguised_combined_full.csv.gz is either present before benchmarking starts or, if it is missing, that the failure is handled more gracefully.
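One way to get the fail-fast behaviour described above is to check every required input file before any long-running work begins. The following is only a minimal sketch under stated assumptions: find_missing_files and require_files are hypothetical helper names, not part of DeepChem or benchmark.py, and the list of paths would have to be supplied by the caller.

```python
import os


def find_missing_files(paths):
    """Return the subset of paths that do not exist as regular files."""
    return [p for p in paths if not os.path.isfile(p)]


def require_files(paths):
    """Raise FileNotFoundError up front, listing every missing input file.

    Hypothetical helper: meant to be called once, before the benchmark
    starts, so a missing dataset is reported immediately instead of
    hours into the run.
    """
    missing = find_missing_files(paths)
    if missing:
        raise FileNotFoundError(
            "Missing dataset file(s), aborting before the benchmark starts:\n  "
            + "\n  ".join(missing)
        )
```

Calling require_files with the Kaggle path from the traceback before the first dataset is loaded would surface the problem within seconds. A caller that prefers to skip the affected dataset rather than abort could use find_missing_files directly and log the result.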

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Comments: 13 (11 by maintainers)
