Keeping of observations information and selection cuts on the datasets

Datasets are meant to be standalone objects that can be saved after data reduction and loaded into a gammapy session to perform modeling and fitting.

To be able to select datasets or produce diagnostic plots, we need to keep track of a number of observation conditions as well as the selections applied during dataset production.

What to store, where and how to store it should be discussed. How this relates to provenance is also an issue.

In PR #2447, I tried to add a Table on the dataset that is a copy of the DataStoreObservation.obs_info property, i.e. the corresponding row in the ObservationTable. The table can then be stacked when stacking datasets, or built on the fly when working with a Datasets object. While this is a possible solution, we decided to close the PR to get a better view of what we need.
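The stacking behaviour described above can be sketched without gammapy. This is only an illustration of the idea, not the code from the PR: `ToyDataset`, `meta_rows` and `collect_meta` are hypothetical names, plain dicts stand in for table rows, and the `OBS_ID` and `ZEN_PNT` values are made up.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDataset:
    """Hypothetical stand-in for a reduced dataset; not a gammapy class."""

    name: str
    counts: float
    # One dict per contributing observation, e.g. a copy of its obs_info row
    meta_rows: list = field(default_factory=list)

    def stack(self, other):
        """Add the other dataset's counts and append copies of its meta rows."""
        self.counts += other.counts
        self.meta_rows.extend(dict(row) for row in other.meta_rows)


def collect_meta(datasets):
    """Build a combined meta table on the fly from a list of datasets."""
    return [dict(row, dataset=ds.name) for ds in datasets for row in ds.meta_rows]


d1 = ToyDataset("ds-1", 120.0, [{"OBS_ID": 1, "ZEN_PNT": 21.5}])
d2 = ToyDataset("ds-2", 95.0, [{"OBS_ID": 2, "ZEN_PNT": 43.0}])

# Meta table for the collection, computed before anything is stacked
table = collect_meta([d1, d2])

# Stacking keeps one meta row per contributing observation
d1.stack(d2)
```

After stacking, `d1.meta_rows` holds one row per contributing observation, so the per-observation conditions survive the reduction.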

Some issues raised:

  • We also want to keep track of and store the selection cuts applied. A clear example is the ON and OFF regions used to create a SpectrumDatasetOnOff.
  • If you want to group datasets from a Datasets object, you should not have to load all datasets into memory when you are only interested in their meta information.
  • You do not want to duplicate provenance information.
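One way to address the first two points is a small index file that holds one meta record per dataset, including the selection cuts, so that datasets can be selected before any of them is loaded. The sketch below uses only the Python standard library. The file layout, the field names (`filename`, `zenith_deg`, `on_region`) and the helper functions are assumptions made for illustration, not an existing gammapy format.

```python
import json
import tempfile
from pathlib import Path


def write_index(path, entries):
    """Write one small meta record per dataset to a JSON index file."""
    Path(path).write_text(json.dumps(entries, indent=2))


def select_files(path, predicate):
    """Return the file names of datasets whose meta record passes predicate."""
    entries = json.loads(Path(path).read_text())
    return [entry["filename"] for entry in entries if predicate(entry)]


# Hypothetical ON region shared by all datasets (a stored selection cut)
on_region = {"frame": "icrs", "lon_deg": 83.63, "lat_deg": 22.01, "radius_deg": 0.11}

entries = [
    {"filename": "ds-1.fits", "obs_id": 1, "zenith_deg": 21.5, "on_region": on_region},
    {"filename": "ds-2.fits", "obs_id": 2, "zenith_deg": 43.0, "on_region": on_region},
    {"filename": "ds-3.fits", "obs_id": 3, "zenith_deg": 28.0, "on_region": on_region},
]

with tempfile.TemporaryDirectory() as tmp:
    index_path = Path(tmp) / "datasets_index.json"
    write_index(index_path, entries)
    # Only the index is read here; the dataset files themselves stay untouched
    low_zenith = select_files(index_path, lambda entry: entry["zenith_deg"] < 30.0)
```

Only the files named in `low_zenith` would then need to be read into memory. Provenance is deliberately left out of the record, in line with the third point.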

Suggestions @adonath, @Bultako, @cdeil?

Issue Analytics

  • State: open
  • Created 4 years ago
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

adonath commented, Jun 9, 2022

“can we decide on the content of the meta table for 1.0”

Honestly, I think we can’t, not in the given time frame. The topic is too big and will require a lot of discussion and documentation, especially to distinguish it from provenance info. We could add some arbitrary keywords and observation info, but that is very likely to break for v2.0. Technically, users can already fill the meta_table with any kind of info, but then they are on their own, and we should not encourage it until we have a clear vision of which info belongs there.

“adding some info about the background extraction details (eg: the fitted normalisations of each dataset for the FoV method) can be quite convenient”

I think in this case, that’s not where the information should go. It is model-specific information, and for this we already have Models. I don’t think we should ever store any model-specific information on a MapDataset. The recommended way to keep track of the background parameters is rather:

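from gammapy.modeling.models import FoVBackgroundModel, Models

# Collect the background model of every dataset in one container
models_bkg = Models()

for obs in observations:
    ...
    # Attach a background model, then let the maker adjust it for this observation
    dataset.models = [FoVBackgroundModel(dataset_name=dataset.name)]
    dataset = fov_bkg_maker.run(dataset, obs)
    models_bkg.extend(dataset.models)

# Store the background parameters separately from the datasets
models_bkg.write("bkg_models.yaml")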

AtreyeeS commented, Jun 9, 2022

Reviving this issue again: can we decide on the content of the meta table for 1.0? The current meta_table is not very useful (even though it has a lot of potential). Along with Observation meta info, adding some info about the background extraction details (eg: the fitted normalisations of each dataset for the FoV method) can be quite convenient.
