Build dataset
Hi,
Is it possible to build my own dataset via torchani? Is it necessary to use the built-in ANI dataset to train models? I did not find any information in the documentation.
Thanks.
Issue Analytics
- Created 4 years ago
- Comments:5 (3 by maintainers)
@njzjz You can have your own dataset in any format. However, our dataset loader only supports the HDF5 format we use; as @farhadrgh mentioned, your files should have the same keys as ours. Using our data loader is not mandatory, though: you can always load your data in whatever way you prefer, convert it to PyTorch tensors, and feed it to your training pipeline.
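As a rough sketch of the "load it yourself and convert to tensors" route, the example below wraps NumPy arrays in a standard PyTorch `TensorDataset` and `DataLoader`. The toy arrays, the use of atomic numbers for species, and the assumption that every conformation has the same atoms (so no padding is needed) are illustrative choices, not something TorchANI prescribes; your model may expect a different species encoding.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy data: 8 conformations of one 3-atom molecule.
# In practice these arrays would come from your own files.
num_conformations, num_atoms = 8, 3
rng = np.random.default_rng(0)
coordinates = rng.normal(size=(num_conformations, num_atoms, 3)).astype(np.float32)
energies = rng.normal(size=(num_conformations,)).astype(np.float32)
# Assumption: species given as atomic numbers (O, H, H), repeated per conformation.
species = np.tile(np.array([8, 1, 1], dtype=np.int64), (num_conformations, 1))

# Convert the NumPy arrays to PyTorch tensors and bundle them together.
dataset = TensorDataset(
    torch.from_numpy(species),
    torch.from_numpy(coordinates),
    torch.from_numpy(energies),
)
loader = DataLoader(dataset, batch_size=4, shuffle=True)

batch_shapes = []
for species_batch, coordinates_batch, energies_batch in loader:
    # A training step (forward pass, loss, optimizer update) would go here.
    batch_shapes.append(tuple(coordinates_batch.shape))
```

If your molecules differ in size, you would need to pad the arrays or batch molecules of equal size together before building the tensors.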
Yes, you can create your own dataset (though not with TorchANI itself) and use it for training, as long as each entry has the same keys as the sample HDF5 datasets. Build the file with a library such as h5py.
The required keys for compatible datasets are: coordinates, species, energies, and forces.
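A minimal sketch of writing such a file with h5py might look like the following. Only the four key names come from this thread; the group name `molecule_0`, the array shapes, the byte-string encoding of species, and the toy values are assumptions made for illustration, so compare the result against one of the sample HDF5 datasets before relying on it.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical toy data: 2 conformations of one 3-atom molecule.
rng = np.random.default_rng(0)
species = np.array([b"O", b"H", b"H"])     # assumed encoding: element symbols as bytes
coordinates = rng.normal(size=(2, 3, 3))   # (conformations, atoms, xyz)
energies = rng.normal(size=(2,))           # one energy per conformation
forces = rng.normal(size=(2, 3, 3))        # same shape as coordinates

path = os.path.join(tempfile.mkdtemp(), "my_dataset.h5")

# Write one group per molecule, each holding the four required keys.
with h5py.File(path, "w") as f:
    mol = f.create_group("molecule_0")     # hypothetical group name
    mol.create_dataset("species", data=species)
    mol.create_dataset("coordinates", data=coordinates)
    mol.create_dataset("energies", data=energies)
    mol.create_dataset("forces", data=forces)

# Read the file back to confirm which keys were stored.
with h5py.File(path, "r") as f:
    stored_keys = sorted(f["molecule_0"].keys())
    stored_coordinates_shape = f["molecule_0"]["coordinates"].shape
```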