Catalog by smart classes
The pandas implementation of the catalog doesn’t work well because of two main issues:
- A full extension of the pandas classes cannot be implemented without corner cases: https://github.com/CartoDB/cartoframes/issues/1032
- The logic of the classes is delegated to the user, which becomes quite complicated as the amount of catalog data grows.
Thus, let’s move to an approach where classes are smarter.
In the definitions below, we’re only listing the properties that behave like methods. The remaining properties are not defined explicitly, but must be exposed under the same name they have in the metadata DB.
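One way to read this is that any attribute not declared on the class falls through to the metadata record. A minimal sketch, assuming each entity wraps a plain dict row from the metadata DB; the `Entity` base class and the `name` field are hypothetical and only serve the illustration:

```python
class Entity:
    """Hypothetical base class wrapping one row of the metadata DB."""

    def __init__(self, data):
        # `data` is assumed to be a dict keyed by metadata column name.
        self._data = dict(data)

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so properties
        # defined explicitly on subclasses take precedence.
        # Guard `_data` itself to avoid recursion before __init__ has run.
        if name == '_data':
            raise AttributeError(name)
        try:
            return self._data[name]
        except KeyError:
            raise AttributeError(name) from None


class Country(Entity):
    pass


usa = Country({'id': 'usa', 'name': 'United States'})
print(usa.id, usa.name)
```

With this approach, adding a column to the metadata DB would make it available on the entity without touching the class.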
[] denotes a list that will work as an EntityList (see below).
Methods are replaced by properties. I think it’s better for a catalog, e.g., Catalog.countries 👍 instead of Catalog.countries()
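Since Catalog.countries is meant to be read on the class itself, a plain `@property` is not enough in Python, because it only works on instances. A rough sketch of one possible approach, using a small custom descriptor; `classproperty` and the hard-coded return value are assumptions for illustration, not the actual cartoframes implementation:

```python
class classproperty:
    """Minimal descriptor exposing a function as a read-only class-level property."""

    def __init__(self, func):
        self.func = func

    def __get__(self, instance, owner):
        # `owner` is the class the attribute was looked up on.
        return self.func(owner)


class Catalog:
    @classproperty
    def countries(cls):
        # Hypothetical stand-in for a query against the metadata DB.
        return ['esp', 'usa']


# Accessed without parentheses, and without instantiating Catalog.
print(Catalog.countries)
```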
EntityList
- get: allows finding an entity by id: Catalog.countries.get('es')
- to_dataframe: returns a pandas DataFrame of the list.
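A minimal sketch of what such a list could look like, assuming entities are represented as plain dicts with an `id` key (the real entities would be catalog objects). The class body is hypothetical; pandas is imported lazily so that `get` does not depend on it:

```python
class EntityList(list):
    """Hypothetical list of catalog entities with lookup and export helpers."""

    def get(self, item_id):
        # Linear scan by id; returns None when nothing matches.
        for item in self:
            if item['id'] == item_id:
                return item
        return None

    def to_dataframe(self):
        # Imported here so `get` works even if pandas is not installed.
        import pandas as pd
        # One row per entity, one column per dict key.
        return pd.DataFrame(list(self))


countries = EntityList([{'id': 'es'}, {'id': 'usa'}])
print(countries.get('es'))
```

Subclassing `list` keeps iteration, indexing and `len()` for free, which is one reason to prefer it here over extending a pandas class.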
Classes
Catalog
Catalog.countries => [Country] #Static
Catalog.datasets => [Dataset] #Static
Catalog.categories => [Category] #Static
Country
Country.get(<country_id>) => Country #Static
Country.id => String
Country.categories => [Category]
Country.datasets => [Dataset]
Country.geographies => [Geography]
Category
Category.get(<category_id>)
Category.id => String
Category.datasets => [Dataset]
Category.geographies => [Geography] # Returns all the geographies with datasets for this category (country and category). This instance of Category must be created with the optional parameter category_id
Category.countries => [Country]
Dataset
Dataset.get(<dataset_id>) #Static
Dataset.id => String
Dataset.variables => [Variable]
Dataset.variables_groups => { 'group_1': [Variable], 'group_2': [Variable] } # It removes the concept of Variables Groups!
Dataset.geography => Geography
Variable
Variable.get(<variable_id>) #Static
Variable.id => String
Variable.dataset => Dataset
Geography
Geography.get(<geography_id>)
Geography.datasets => [Dataset]
Geography.support => String (admin|quadgrid|postalcodes)
Geography.support_level => 1, 2, 3, 4
Geography.country => Country
If the Geography class is instantiated with a category_id, its datasets property will return only the datasets that belong to that category.
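A rough sketch of that behaviour, assuming the filter is simply stored on the instance at creation time. The `DATASETS` rows and their column names are made up for the example and stand in for the metadata DB:

```python
# Hypothetical in-memory stand-in for the metadata DB.
DATASETS = [
    {'id': 'ds_demo', 'geography_id': 'block_groups', 'category_id': 'demographics'},
    {'id': 'ds_fin', 'geography_id': 'block_groups', 'category_id': 'financial'},
    {'id': 'ds_other', 'geography_id': 'counties', 'category_id': 'demographics'},
]


class Geography:
    def __init__(self, geography_id, category_id=None):
        self.id = geography_id
        # Filter remembered from creation; None means "no category filter".
        self._category_id = category_id

    @property
    def datasets(self):
        # Always restrict to this geography, then apply the stored filter.
        rows = [d for d in DATASETS if d['geography_id'] == self.id]
        if self._category_id is not None:
            rows = [d for d in rows if d['category_id'] == self._category_id]
        return [d['id'] for d in rows]


all_ds = Geography('block_groups').datasets
demo_ds = Geography('block_groups', category_id='demographics').datasets
print(all_ds, demo_ds)
```

The same idea would let a chained query such as a country's category's geographies carry the earlier filters along.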
Usage
Get all categories of a country
Country.get('usa').categories
Convert a list to pandas
Country.get('usa').categories.to_dataframe().head()
Country.get('usa').geographies.to_dataframe().head()
Country.get('usa').datasets.to_dataframe().head()
Get all datasets of a category
Country.get('usa').categories.get('demographics').datasets
Get all datasets of a category, starting from the category
Category.get('demographics').countries.get('usa').datasets
Get all boundaries with demographics datasets
Country.get('usa').categories.get('demographics').geographies
Get all demographics datasets for block groups of a country
Country.get('usa').categories.get('demographics').geographies.get('block_groups').datasets
Top GitHub Comments
After talking with @alasarr, we agreed on that notation and decided to let entities keep the filters they were created with, so that subsequent queries take them into account 😃
Missing methods have been added in PR #1093