
Generating HDF5 detections from custom dataset or bottom-up-attention TSV


I have a custom dataset for which I generated a detections TSV using https://github.com/airsplay/py-bottom-up-attention, but the model requires HDF5.

The TSV contains the following fields for each example:

{
   'image_id': image_id,
   'image_h': np.size(im, 0),
   'image_w': np.size(im, 1),
   'num_boxes' : len(keep_boxes),
   'boxes': base64.b64encode(cls_boxes[keep_boxes]),
   'features': base64.b64encode(pool5[keep_boxes])
}  
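For reference, here is a minimal sketch of how those base64 columns decode back into numpy arrays, assuming they hold the raw float32 bytes of the arrays (the usual py-bottom-up-attention convention); the `row` dict below is a hypothetical stand-in for one parsed TSV line:

```python
import base64

import numpy as np

# Hypothetical stand-in for one parsed TSV row: the "boxes" and "features"
# columns are assumed to be base64 of the raw float32 array bytes.
num_boxes = 3
boxes = np.random.rand(num_boxes, 4).astype(np.float32)
features = np.random.rand(num_boxes, 2048).astype(np.float32)
row = {
    "num_boxes": str(num_boxes),
    "boxes": base64.b64encode(boxes.tobytes()).decode("ascii"),
    "features": base64.b64encode(features.tobytes()).decode("ascii"),
}

# Decode back to arrays, using the stored box count to recover the shapes.
n = int(row["num_boxes"])
decoded_boxes = np.frombuffer(base64.b64decode(row["boxes"]),
                              dtype=np.float32).reshape(n, 4)
decoded_features = np.frombuffer(base64.b64decode(row["features"]),
                                 dtype=np.float32).reshape(n, -1)
```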

When examining the provided COCO dataset file, I see entries like the following:

>>> dts["35368_boxes"]
<HDF5 dataset "35368_boxes": shape (37, 4), type "<f4">
>>> dts["35368_features"]
<HDF5 dataset "35368_features": shape (37, 2048), type "<f4">
>>> dts["35368_cls_prob"]
<HDF5 dataset "35368_cls_prob": shape (37, 1601), type "<f4">
>>> dts["35368_boxes"][36]
array([349.57147, 154.07967, 420.0327 , 408.64462], dtype=float32)

I’ll try to figure out how to convert my TSV to the required HDF5 from the code myself, but a guide would be appreciated.

Thank you.

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 10

Top GitHub Comments

5 reactions
eugeniotonanzi commented, May 13, 2021

I’m working on this too. I haven’t done it myself yet, but I think you just need to convert the TSV into an HDF5 file; it has nothing to do with the M2T or py-bottom-up-attention code. Read your TSV using csv or pandas, then use a library like h5py to store your data in HDF5 format under the names “<id>_boxes”, “<id>_features”, and “<id>_cls_prob”, containing the bounding-box corners, feature vectors, and class probabilities respectively, as specified in the M2T repo README. I believe it would be straightforward, though I don’t know how much time it would take. Let me know if you manage to do it.
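A minimal sketch of that conversion with h5py, assuming the TSV columns match the dict shown in the question and the base64 fields hold raw float32 bytes; a `cls_prob` column, if your extractor saved one, would be written the same way under `<id>_cls_prob`:

```python
import base64
import csv
import sys

import h5py
import numpy as np

# Feature columns are very long base64 strings; lift the csv field limit.
csv.field_size_limit(sys.maxsize)

# Assumed column order, matching the dict written per example in the question.
FIELDNAMES = ["image_id", "image_h", "image_w", "num_boxes", "boxes", "features"]


def decode(b64_str, dtype, shape):
    """Decode one base64 TSV column back into a numpy array."""
    return np.frombuffer(base64.b64decode(b64_str), dtype=dtype).reshape(shape)


def tsv_to_hdf5(tsv_path, h5_path):
    """Write one '<id>_boxes' and '<id>_features' dataset per TSV row."""
    with open(tsv_path, newline="") as f, h5py.File(h5_path, "w") as out:
        reader = csv.DictReader(f, delimiter="\t", fieldnames=FIELDNAMES)
        for row in reader:
            n = int(row["num_boxes"])
            image_id = row["image_id"]
            out.create_dataset(f"{image_id}_boxes",
                               data=decode(row["boxes"], np.float32, (n, 4)))
            out.create_dataset(f"{image_id}_features",
                               data=decode(row["features"], np.float32, (n, -1)))
```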

2 reactions
MatteoStefanini commented, May 13, 2021

Hi everyone, thank you @eugeniotonanzi for your answer; that should solve the problem exactly. Once you have an HDF5 file for your custom dataset in the same format, the model should work as expected. Let us know if you run into any other issues. Best, Matteo


