Add MoleculeNet wrappers for Factors/Kinase/UV datasets

I recently ran python benchmark.py from examples/ on a single K80 GPU on an Amazon EC2 p2.xlarge instance. The benchmark failed after 18.5 hours with the following error:

Traceback (most recent call last):
  File "benchmark.py", line 1097, in <module>
    test=test)
  File "benchmark.py", line 179, in benchmark_loading_datasets
    featurizer=featurizer)
  File "/home/ubuntu/deepchem/examples/kaggle/kaggle_datasets.py", line 139, in load_kaggle
    shard_size=shard_size)
  File "/home/ubuntu/deepchem/examples/kaggle/kaggle_datasets.py", line 74, in gen_kaggle
    train_dataset = loader.featurize(train_files, shard_size=shard_size)
  File "/home/ubuntu/anaconda3/envs/deepchem/lib/python3.5/site-packages/deepchem/data/data_loader.py", line 197, in featurize
    shard_generator(), data_dir, self.tasks, verbose=self.verbose)
  File "/home/ubuntu/anaconda3/envs/deepchem/lib/python3.5/site-packages/deepchem/data/datasets.py", line 440, in create_dataset
    for shard_num, (X, y, w, ids) in enumerate(shard_generator):
  File "/home/ubuntu/anaconda3/envs/deepchem/lib/python3.5/site-packages/deepchem/data/data_loader.py", line 174, in shard_generator
    self.get_shards(input_files, shard_size)):
  File "/home/ubuntu/anaconda3/envs/deepchem/lib/python3.5/site-packages/deepchem/utils/save.py", line 98, in load_csv_files
    for df in pd.read_csv(filename, chunksize=shard_size):
  File "/home/ubuntu/anaconda3/envs/deepchem/lib/python3.5/site-packages/pandas/io/parsers.py", line 646, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/ubuntu/anaconda3/envs/deepchem/lib/python3.5/site-packages/pandas/io/parsers.py", line 389, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/ubuntu/anaconda3/envs/deepchem/lib/python3.5/site-packages/pandas/io/parsers.py", line 730, in __init__
    self._make_engine(self.engine)
  File "/home/ubuntu/anaconda3/envs/deepchem/lib/python3.5/site-packages/pandas/io/parsers.py", line 923, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/home/ubuntu/anaconda3/envs/deepchem/lib/python3.5/site-packages/pandas/io/parsers.py", line 1390, in __init__
    self._reader = _parser.TextReader(src, **kwds)
  File "pandas/parser.pyx", line 373, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:4184)
  File "pandas/parser.pyx", line 609, in pandas.parser.TextReader._setup_parser_source (pandas/parser.c:7348)
  File "/home/ubuntu/anaconda3/envs/deepchem/lib/python3.5/gzip.py", line 163, in __init__
    fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
FileNotFoundError: [Errno 2] No such file or directory: '/home/ubuntu/deepchem/examples/kaggle/KAGGLE_training_disguised_combined_full.csv.gz'

The time elapsed was: real 1125m29.127s user 1723m4.204s sys 170m16.504s

This issue is to ensure that the /home/ubuntu/deepchem/examples/kaggle/KAGGLE_training_disguised_combined_full.csv.gz file is present before benchmarking starts or, if it is missing, that the failure is detected early and handled more gracefully.
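One way to fail fast, rather than after many hours of work, is to check for every required input file before the benchmark begins. The following is only a minimal sketch under stated assumptions: find_missing_files and require_files are hypothetical helper names, not part of DeepChem or benchmark.py, and the list of required paths would have to be assembled by the caller.

```python
import os


def find_missing_files(paths):
    """Return the entries of `paths` that do not exist as regular files."""
    return [p for p in paths if not os.path.isfile(p)]


def require_files(paths):
    """Raise a single FileNotFoundError that names every missing file.

    Hypothetical helper: intended to be called once, before any
    featurization or training starts, so a missing dataset is reported
    immediately instead of partway through a long run.
    """
    missing = find_missing_files(paths)
    if missing:
        raise FileNotFoundError(
            "Missing dataset file(s); download them before benchmarking:\n  "
            + "\n  ".join(missing)
        )


# Example call (the path is the one from the traceback above):
# require_files([
#     "/home/ubuntu/deepchem/examples/kaggle/"
#     "KAGGLE_training_disguised_combined_full.csv.gz",
# ])
```

Running a check like this at the top of the script would surface the missing Kaggle file in seconds. Whether the benchmark should then abort or skip the affected dataset is a design choice this sketch does not settle.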

Issue Analytics

Created 6 years ago
Comments: 13 (11 by maintainers)

Top GitHub Comments

rbharath commented, Jan 18, 2018

Haven’t had a chance to write a MoleculeNet load function, but the datasets are online at:

Contributions welcome for adding simple dc.molnet wrappers. Will rename the issue to reflect that.

rbharath commented, Dec 5, 2017

This one’s on me. Ping me in a few weeks if I haven’t taken care of it yet.
