Keeping of observations information and selection cuts on the datasets

Datasets are intended as standalone objects that can be saved after data reduction and loaded into a gammapy session to perform modeling and fitting.

To be able to select datasets or produce diagnostic plots, it is necessary to keep track of a number of observation conditions, as well as some of the selections applied during dataset production.

What to store, where and how to store it should be discussed. How this relates to provenance is also an issue.

In PR #2447, I tried to add a Table on the dataset that is a copy of the DataStoreObservation.obs_info property, i.e. the corresponding line in the ObservationTable. The table can then be stacked when stacking datasets, or built on the fly when working with a Datasets object. While this is a possible solution, we decided to close the PR to get a better view of what we need.
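
The idea behind that attempt can be illustrated with a minimal sketch in plain Python. ToyDataset, obs_rows, stack and meta_table are hypothetical names used only for illustration; they are not gammapy classes and not the implementation from the PR, and the column values are made up. Each dataset carries one row of observation info, stacking concatenates the rows, and a combined table can be built on the fly for a collection of datasets.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDataset:
    """Hypothetical stand-in for a dataset carrying observation info rows."""

    name: str
    obs_rows: list = field(default_factory=list)

    def stack(self, other):
        # Stacking two datasets also stacks their observation info rows.
        self.obs_rows.extend(dict(row) for row in other.obs_rows)


def meta_table(datasets):
    # Build the combined table on the fly, tagging each row with its dataset.
    return [dict(row, dataset=d.name) for d in datasets for row in d.obs_rows]


# Illustrative values only; the column names mimic an observation table.
d1 = ToyDataset("d1", [{"OBS_ID": 1, "ZEN_PNT": 20.0}])
d2 = ToyDataset("d2", [{"OBS_ID": 2, "ZEN_PNT": 35.0}])

on_the_fly = meta_table([d1, d2])  # one row per dataset, nothing is modified
d1.stack(d2)                       # d1 now holds the rows of both datasets
```

Keeping the rows as plain records makes the stacking step a simple concatenation, which is also why duplicated provenance information becomes a concern once the same rows live in several places.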

Some issues raised:

We also want to keep track of and store the selection cuts applied. A clear example is the ON and OFF regions used to create a SpectrumDatasetOnOff.
If you want to group datasets from a Datasets object, you should not have to load all datasets into memory when you are only interested in their meta information.
You do not want to duplicate provenance information.
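
One way to picture the second point is a lightweight index of meta information stored next to the datasets, so that selection happens on the index and only the selected datasets are read. The sketch below is plain Python under stated assumptions: the JSON index layout, select_names, load_selected and fake_loader are all hypothetical and do not correspond to any gammapy API, and the values are made up.

```python
import json


def select_names(index_json, predicate):
    # Only the small index is parsed here; no dataset is read.
    index = json.loads(index_json)
    return [name for name, meta in index.items() if predicate(meta)]


def load_selected(names, loader):
    # The (potentially expensive) loader runs for the selected names only.
    return {name: loader(name) for name in names}


# Hypothetical index: observation conditions plus a selection cut per dataset.
index_json = json.dumps(
    {
        "d1": {"ZEN_PNT": 20.0, "on_region": "circle(83.63, 22.01, 0.11)"},
        "d2": {"ZEN_PNT": 45.0, "on_region": "circle(83.63, 22.01, 0.11)"},
        "d3": {"ZEN_PNT": 25.0, "on_region": "circle(83.63, 22.01, 0.11)"},
    }
)

loaded = []  # records which datasets were actually read


def fake_loader(name):
    # Stand-in for reading a dataset from disk.
    loaded.append(name)
    return f"<dataset {name}>"


names = select_names(index_json, lambda meta: meta["ZEN_PNT"] < 30)
selected = load_selected(names, fake_loader)
```

Whether such an index should live in a separate file or inside the serialised Datasets is exactly the kind of "where and how to store it" question raised above.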

Suggestions, @adonath, @Bultako, @cdeil?

Issue Analytics

State:
Created 4 years ago
Comments:6 (6 by maintainers)

Top GitHub Comments

adonath commented, Jun 9, 2022

> can we decide on the content of the meta table for 1.0

Honestly, I think we can’t, not in the given time frame. The topic is too big and will require a lot of discussion and documenting. Especially distinguishing from provenance info etc. We could add some arbitrary keywords and observation info, but it is very likely to break for v2.0. Technically users can already fill the meta_table with any kind of info, but then they are on their own. And we should not encourage it until we have a clear vision of which info belongs there.

> adding some info about the background extraction details (eg: the fitted normalisations of each dataset for the FoV method) can be quite convenient

I think in this case, that’s not where the information should go. It is model specific information and for this we already have Models. I don’t think we should ever store any model specific information on a MapDataset. The recommended way to keep track of the background parameters is rather:

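from gammapy.modeling.models import FoVBackgroundModel, Models

# Collect the fitted background model of every dataset in one container
models_bkg = Models()

for obs in observations:
    ...
    # Attach a background model tied to this dataset, then fit its normalisation
    dataset.models = [FoVBackgroundModel(dataset_names=[dataset.name])]
    dataset = fov_bkg_maker.run(dataset, obs)
    models_bkg.extend(dataset.models)

# Write all background models to a single YAML file
models_bkg.write("bkg_models.yaml")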

AtreyeeS commented, Jun 9, 2022

Reviving this issue again - can we decide on the content of the meta table for 1.0? The current meta_table is not very useful (even though it has a lot of potential). Along with Observation meta info, adding some info about the background extraction details (e.g. the fitted normalisations of each dataset for the FoV method) can be quite convenient.
