Add MoleculeNet wrappers for Factors/Kinase/UV datasets
I recently ran python benchmark.py from examples/ using 1 K80 on an Amazon EC2 p2.xlarge instance. The benchmark failed after 18.5 hours with the following error:
Traceback (most recent call last):
  File "benchmark.py", line 1097, in <module>
    test=test)
  File "benchmark.py", line 179, in benchmark_loading_datasets
    featurizer=featurizer)
  File "/home/ubuntu/deepchem/examples/kaggle/kaggle_datasets.py", line 139, in load_kaggle
    shard_size=shard_size)
  File "/home/ubuntu/deepchem/examples/kaggle/kaggle_datasets.py", line 74, in gen_kaggle
    train_dataset = loader.featurize(train_files, shard_size=shard_size)
  File "/home/ubuntu/anaconda3/envs/deepchem/lib/python3.5/site-packages/deepchem/data/data_loader.py", line 197, in featurize
    shard_generator(), data_dir, self.tasks, verbose=self.verbose)
  File "/home/ubuntu/anaconda3/envs/deepchem/lib/python3.5/site-packages/deepchem/data/datasets.py", line 440, in create_dataset
    for shard_num, (X, y, w, ids) in enumerate(shard_generator):
  File "/home/ubuntu/anaconda3/envs/deepchem/lib/python3.5/site-packages/deepchem/data/data_loader.py", line 174, in shard_generator
    self.get_shards(input_files, shard_size)):
  File "/home/ubuntu/anaconda3/envs/deepchem/lib/python3.5/site-packages/deepchem/utils/save.py", line 98, in load_csv_files
    for df in pd.read_csv(filename, chunksize=shard_size):
  File "/home/ubuntu/anaconda3/envs/deepchem/lib/python3.5/site-packages/pandas/io/parsers.py", line 646, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/ubuntu/anaconda3/envs/deepchem/lib/python3.5/site-packages/pandas/io/parsers.py", line 389, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/ubuntu/anaconda3/envs/deepchem/lib/python3.5/site-packages/pandas/io/parsers.py", line 730, in __init__
    self._make_engine(self.engine)
  File "/home/ubuntu/anaconda3/envs/deepchem/lib/python3.5/site-packages/pandas/io/parsers.py", line 923, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/home/ubuntu/anaconda3/envs/deepchem/lib/python3.5/site-packages/pandas/io/parsers.py", line 1390, in __init__
    self._reader = _parser.TextReader(src, **kwds)
  File "pandas/parser.pyx", line 373, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:4184)
  File "pandas/parser.pyx", line 609, in pandas.parser.TextReader._setup_parser_source (pandas/parser.c:7348)
  File "/home/ubuntu/anaconda3/envs/deepchem/lib/python3.5/gzip.py", line 163, in __init__
    fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
FileNotFoundError: [Errno 2] No such file or directory: '/home/ubuntu/deepchem/examples/kaggle/KAGGLE_training_disguised_combined_full.csv.gz'
The time elapsed was: real 1125m29.127s user 1723m4.204s sys 170m16.504s
This issue is to ensure that the /home/ubuntu/deepchem/examples/kaggle/KAGGLE_training_disguised_combined_full.csv.gz file is either made present before benchmarking or, if it is missing, that the failure is handled more gracefully.
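One possible way to fail fast rather than 18 hours in is to check for every required input file before the benchmark starts. The following is only a minimal sketch using the standard library; find_missing_files and require_files are hypothetical helper names, not part of benchmark.py or DeepChem.

```python
import os


def find_missing_files(paths):
    """Return the entries of `paths` that do not exist as regular files."""
    return [p for p in paths if not os.path.isfile(p)]


def require_files(paths):
    """Raise FileNotFoundError up front if any required input is absent.

    Hypothetical helper: intended to be called once, before any
    long-running featurization or training begins.
    """
    missing = find_missing_files(paths)
    if missing:
        raise FileNotFoundError(
            "Missing dataset file(s); download them before benchmarking:\n  "
            + "\n  ".join(missing)
        )


# Example call (path taken from the traceback; adjust for your checkout):
# require_files([
#     "/home/ubuntu/deepchem/examples/kaggle/"
#     "KAGGLE_training_disguised_combined_full.csv.gz",
# ])
```

Running such a check at the top of the script would surface the missing file within seconds and list every absent path in one message.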
Issue Analytics
- Created 6 years ago
- Comments: 13 (11 by maintainers)
Top GitHub Comments
Haven’t had a chance to write a MoleculeNet load function, but the datasets are online at:
https://s3-us-west-1.amazonaws.com/deepchem.io/datasets/FACTORS_test1_disguised_combined_full.csv.gz
https://s3-us-west-1.amazonaws.com/deepchem.io/datasets/FACTORS_test2_disguised_combined_full.csv.gz
https://s3-us-west-1.amazonaws.com/deepchem.io/datasets/FACTORS_training_disguised_combined_full.csv.gz
https://s3-us-west-1.amazonaws.com/deepchem.io/datasets/KINASE_test1_disguised_combined_full.csv.gz
https://s3-us-west-1.amazonaws.com/deepchem.io/datasets/KINASE_test2_disguised_combined_full.csv.gz
https://s3-us-west-1.amazonaws.com/deepchem.io/datasets/KINASE_training_disguised_combined_full.csv.gz
https://s3-us-west-1.amazonaws.com/deepchem.io/datasets/UV_test1_disguised_combined_full.csv.gz
https://s3-us-west-1.amazonaws.com/deepchem.io/datasets/UV_test2_disguised_combined_full.csv.gz
https://s3-us-west-1.amazonaws.com/deepchem.io/datasets/UV_training_disguised_combined_full.csv.gz
Contributions welcome for adding simple dc.molnet wrappers. Will rename the issue to reflect that.
This one’s on me. Ping me in a few weeks if I haven’t taken care of it yet.
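As a starting point for such a wrapper, the download step could look roughly like the sketch below. It is not the dc.molnet API: dataset_filename and ensure_downloaded are hypothetical names, and only the base URL and file-name pattern come from the links posted in this thread. It uses the standard library's urlretrieve and skips files that are already on disk.

```python
import os
from urllib.request import urlretrieve

# Base URL and file-name pattern copied from the links posted in this thread.
BASE_URL = "https://s3-us-west-1.amazonaws.com/deepchem.io/datasets/"
DATASETS = ("FACTORS", "KINASE", "UV")
SPLITS = ("training", "test1", "test2")


def dataset_filename(name, split):
    """Build a file name such as UV_test1_disguised_combined_full.csv.gz."""
    return "{}_{}_disguised_combined_full.csv.gz".format(name, split)


def ensure_downloaded(name, data_dir, fetch=urlretrieve):
    """Download any missing split files for `name`; return local paths by split.

    `fetch(url, local_path)` is injectable so the logic can be exercised
    without network access (hypothetical design choice for this sketch).
    """
    if name not in DATASETS:
        raise ValueError("Unknown dataset: {!r}".format(name))
    os.makedirs(data_dir, exist_ok=True)
    paths = {}
    for split in SPLITS:
        filename = dataset_filename(name, split)
        local_path = os.path.join(data_dir, filename)
        if not os.path.exists(local_path):
            fetch(BASE_URL + filename, local_path)
        paths[split] = local_path
    return paths
```

A loader built on top of this would pass the returned paths to whatever featurization code it already uses; that part is left out here because it depends on the DeepChem version in use.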