Generating HDF5 detections from custom dataset or bottom-up-attention TSV
I have a custom dataset and have generated the detections TSV using https://github.com/airsplay/py-bottom-up-attention, but the model requires HDF5.
Each row of the TSV contains the following fields:
{
'image_id': image_id,
'image_h': np.size(im, 0),
'image_w': np.size(im, 1),
'num_boxes': len(keep_boxes),
'boxes': base64.b64encode(cls_boxes[keep_boxes]),
'features': base64.b64encode(pool5[keep_boxes])
}
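Before anything can go into HDF5, the base64 fields have to be turned back into arrays. The following is a minimal sketch, assuming the TSV has no header row and uses the column order of the original bottom-up-attention generate_tsv.py script (the FIELDNAMES list below is that assumption), and that boxes and features were float32 before encoding. decode_row and read_tsv are hypothetical helper names.

```python
import base64
import csv

import numpy as np

# Assumption: column order of the header-less TSV, as in the original
# bottom-up-attention generate_tsv.py; check your own extraction script.
FIELDNAMES = ["image_id", "image_w", "image_h", "num_boxes", "boxes", "features"]


def decode_row(row):
    """Turn one TSV row (a dict of strings) into numpy arrays.

    Assumes boxes and features were float32 arrays before base64 encoding.
    """
    num_boxes = int(row["num_boxes"])
    boxes = np.frombuffer(base64.b64decode(row["boxes"]), dtype=np.float32)
    features = np.frombuffer(base64.b64decode(row["features"]), dtype=np.float32)
    return {
        "image_id": row["image_id"],
        "image_h": int(row["image_h"]),
        "image_w": int(row["image_w"]),
        # One (x1, y1, x2, y2) row per kept box.
        "boxes": boxes.reshape(num_boxes, 4),
        # Feature width is inferred from the buffer length rather than hard-coded.
        "features": features.reshape(num_boxes, -1),
    }


def read_tsv(path):
    """Yield decoded rows from a header-less detections TSV."""
    # Base64 feature fields easily exceed csv's default field size limit.
    # 2**31 - 1 also fits in a 32-bit C long, unlike sys.maxsize on Windows.
    csv.field_size_limit(2**31 - 1)
    with open(path, newline="") as tsv:
        reader = csv.DictReader(tsv, delimiter="\t", fieldnames=FIELDNAMES)
        for row in reader:
            yield decode_row(row)
```

Using `reshape(num_boxes, -1)` for the features means the same code works whether the extractor produced 2048-dimensional pool5 features or something narrower.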
Inspecting the COCO detections file, I see, for example:
>>> dts["35368_boxes"]
<HDF5 dataset "35368_boxes": shape (37, 4), type "<f4">
>>> dts["35368_features"]
<HDF5 dataset "35368_features": shape (37, 2048), type "<f4">
>>> dts["35368_cls_prob"]
<HDF5 dataset "35368_cls_prob": shape (37, 1601), type "<f4">
>>> dts["35368_boxes"][36]
array([349.57147, 154.07967, 420.0327 , 408.64462], dtype=float32)
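The same inspection can be done for all three datasets of one image at once. In the output above, the three datasets share their first dimension (37 detected regions); 2048 is the feature width, 1601 is presumably the Visual Genome class count used by bottom-up attention (1600 object classes plus background), and each box appears to be (x1, y1, x2, y2) pixel corners. A small sketch with h5py; describe_image and SUFFIXES are hypothetical names:

```python
import h5py
import numpy as np

# Dataset suffixes seen in the COCO detections file.
SUFFIXES = ("boxes", "features", "cls_prob")


def describe_image(h5_path, image_id):
    """Return {suffix: (shape, dtype)} for the datasets stored for one image.

    Suffixes with no matching "<image_id>_<suffix>" key are simply omitted.
    """
    info = {}
    with h5py.File(h5_path, "r") as f:
        for suffix in SUFFIXES:
            key = f"{image_id}_{suffix}"
            if key in f:
                info[suffix] = (f[key].shape, f[key].dtype)
    return info
```

For the COCO file, `describe_image(path, 35368)` should report shapes (37, 4), (37, 2048) and (37, 1601), all float32, matching the session above.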
I'll try to work out from the code how to convert my TSV to the required HDF5 format myself, but a guide would be appreciated.
Thank you.
Issue Analytics
- State:
- Created 2 years ago
- Comments:10
Top GitHub Comments
I'm working on this too and haven't done it myself yet, but I think you just need to convert the TSV into an HDF5 file; it doesn't involve the M2T or py-bottom-up-attention code. Read your TSV with csv or pandas, then use a library such as h5py to store the data in HDF5 under the names “<id>_boxes”, “<id>_features” and “<id>_cls_prob”, holding the bounding-box corner coordinates, the feature vectors and the class probabilities respectively, as specified in the M2T repo README. I believe it should be straightforward, though I don't know how long it would take. Let me know if you manage to do it.
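Combining this suggestion with the base64 decoding step gives roughly the following sketch. tsv_to_hdf5 is a hypothetical name, and the column order and float32 dtype are assumptions carried over from the TSV fields in the question. That TSV carries no class probabilities, so the sketch writes only <id>_boxes and <id>_features; if your setup needs <id>_cls_prob, the per-box class scores would have to be saved during feature extraction.

```python
import base64
import csv

import h5py
import numpy as np

# Assumption: column order of the header-less TSV (original bottom-up-attention
# generate_tsv.py); adjust it to match your extraction script.
FIELDNAMES = ["image_id", "image_w", "image_h", "num_boxes", "boxes", "features"]


def _decode(field, num_boxes):
    """Decode one base64 column into a (num_boxes, D) float32 array."""
    # Assumption: the arrays were float32 before base64 encoding.
    raw = np.frombuffer(base64.b64decode(field), dtype=np.float32)
    return raw.reshape(num_boxes, -1)


def tsv_to_hdf5(tsv_path, h5_path):
    """Write "<id>_boxes" and "<id>_features" datasets for every TSV row.

    Returns the number of images written.
    """
    csv.field_size_limit(2**31 - 1)  # base64 feature fields are very long
    count = 0
    with open(tsv_path, newline="") as tsv, h5py.File(h5_path, "w") as out:
        reader = csv.DictReader(tsv, delimiter="\t", fieldnames=FIELDNAMES)
        for row in reader:
            image_id = row["image_id"]
            num_boxes = int(row["num_boxes"])
            boxes = _decode(row["boxes"], num_boxes)
            features = _decode(row["features"], num_boxes)
            if boxes.shape[1] != 4:
                raise ValueError(f"unexpected box shape {boxes.shape} for {image_id}")
            out.create_dataset(f"{image_id}_boxes", data=boxes)
            out.create_dataset(f"{image_id}_features", data=features)
            count += 1
    return count
```

Calling `tsv_to_hdf5("my_detections.tsv", "my_detections.hdf5")` (placeholder paths) returns the number of images written. Keys use image_id exactly as it appears in the TSV, so those ids should match the ones in your caption annotations.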
Hi everyone, thank you @eugeniotonanzi for your answer; that should solve exactly this problem. Once you have an HDF5 file for your custom dataset in the same format, the model should work as expected. Let us know if you have any other issues. Best, Matteo
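To check that a converted file really has "the same format", here is a hedged sketch that compares it against the layout seen in the COCO file: a float32 <id>_features of width 2048 with a matching (N, 4) <id>_boxes. check_layout is a hypothetical helper. The numeric-id check assumes your data loader converts ids to integers, as the numeric COCO keys such as 35368 suggest; drop it if yours does not.

```python
import h5py
import numpy as np

FEATURES_SUFFIX = "_features"


def check_layout(h5_path, feature_dim=2048):
    """Return a list of layout problems in a detections HDF5 file (empty if none)."""
    problems = []
    with h5py.File(h5_path, "r") as f:
        # Every image is identified by its "<id>_features" key.
        image_ids = sorted(
            key[: -len(FEATURES_SUFFIX)] for key in f.keys() if key.endswith(FEATURES_SUFFIX)
        )
        for image_id in image_ids:
            # Assumption: the loader parses ids as integers, like the COCO keys.
            if not image_id.isdigit():
                problems.append(f"{image_id}: non-numeric image id")
            features = f[image_id + FEATURES_SUFFIX]
            if features.dtype != np.float32:
                problems.append(f"{image_id}: features dtype {features.dtype}")
            if features.ndim != 2 or features.shape[1] != feature_dim:
                problems.append(f"{image_id}: features shape {features.shape}")
            boxes_key = f"{image_id}_boxes"
            if boxes_key not in f:
                problems.append(f"{image_id}: missing {boxes_key}")
            elif f[boxes_key].shape != (features.shape[0], 4):
                problems.append(f"{image_id}: boxes shape {f[boxes_key].shape}")
    return problems
```

An empty list means every image has float32 features of the expected width and a box array with one row per region.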