Integration with GitHub as a data portal
Overview
An important step for Frictionless Framework is the ability to read and write packages from different data portals (CKAN, GitHub, Zenodo, etc.) so that users can publish and access their packages easily through a straightforward API. This issue covers the GitHub integration. The implementation is already prototyped in the v5 branch.
Specs
Read package
Read package from a repository that has a datapackage.json/yaml:
package = Package("https://github.com/datasets/population")
package = Package.from_github(...) # alias
Read a package from a repository without a datapackage.json/yaml. We probably need to filter files and add only CSV/XLS(X) to the package, and GithubControl should make this configurable. We also need to map as much of the metadata provided by the GitHub repository as possible:
package = Package("https://github.com/frictionlessdata/repository-demo")
package = Package("<link>", control=portals.GithubControl(formats=['csv']))
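A minimal sketch of how a repository without a descriptor might be turned into one, assuming the repository metadata and the file listing have already been fetched. The helper `build_descriptor`, its `formats` parameter, and the sample input are hypothetical and not part of the Frictionless API; the `name`, `description`, `html_url`, and `topics` keys follow the GitHub REST repository object.

```python
from pathlib import PurePosixPath

# Assumed default; the issue suggests keeping only CSV/XLS(X) files.
DEFAULT_FORMATS = ("csv", "xls", "xlsx")


def build_descriptor(repo, paths, formats=DEFAULT_FORMATS):
    """Build a Data Package descriptor dict from repository metadata.

    `repo` is a dict shaped like a GitHub REST repository object and
    `paths` is a list of file paths found in the repository.
    """
    allowed = {fmt.lower() for fmt in formats}
    resources = []
    for path in paths:
        file_path = PurePosixPath(path)
        # Compare extensions case-insensitively and without the leading dot.
        if file_path.suffix.lstrip(".").lower() in allowed:
            resources.append({"name": file_path.stem.lower(), "path": path})
    descriptor = {"name": repo["name"].lower(), "resources": resources}
    # Copy optional metadata only when the repository provides it.
    if repo.get("description"):
        descriptor["description"] = repo["description"]
    if repo.get("html_url"):
        descriptor["homepage"] = repo["html_url"]
    if repo.get("topics"):
        descriptor["keywords"] = list(repo["topics"])
    return descriptor


# Illustrative input only, not the real contents of the repository.
repo = {
    "name": "repository-demo",
    "description": "Demo repository",
    "html_url": "https://github.com/frictionlessdata/repository-demo",
    "topics": [],
}
paths = ["README.md", "data/table1.csv", "data/Table2.XLSX", "scripts/build.py"]
descriptor = build_descriptor(repo, paths)
csv_only = build_descriptor(repo, paths, formats=["csv"])
```

Keeping the format list as a plain parameter mirrors the `formats=['csv']` option proposed for GithubControl; how the real control stores it is left open here.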
Write package
Publish a package on GitHub (for now, only if the repo doesn’t exist). We also need to provide a way to store credentials in ENV/etc., and to map as much of the metadata provided by the Package as possible:
package.to_github(user=..., repo=..., api_key=...)
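One way the credential lookup and metadata mapping could work is sketched below: an explicit `api_key` wins, otherwise the key is read from the environment. The variable name `GITHUB_ACCESS_TOKEN` and both helper functions are assumptions made for illustration; `name`, `description`, and `homepage` are fields accepted by GitHub's repository-creation endpoint.

```python
import os

# Assumed variable name; the issue only says credentials could live in "ENV/etc".
TOKEN_ENV_VAR = "GITHUB_ACCESS_TOKEN"


def resolve_api_key(api_key=None, environ=None):
    """Return the explicit key if given, else fall back to the environment."""
    environ = os.environ if environ is None else environ
    key = api_key or environ.get(TOKEN_ENV_VAR)
    if not key:
        raise ValueError(f"No API key given and {TOKEN_ENV_VAR} is not set")
    return key


def repo_payload(descriptor, repo=None):
    """Map package metadata onto fields of a repository-creation request."""
    payload = {"name": repo or descriptor["name"]}
    # Only forward optional fields that the package actually defines.
    for field in ("description", "homepage"):
        if descriptor.get(field):
            payload[field] = descriptor[field]
    return payload


# Usage with an injected environment, so nothing real is read or sent.
key = resolve_api_key(environ={TOKEN_ENV_VAR: "example-token"})
payload = repo_payload({"name": "population", "description": "Population figures"})
```

Passing `environ` explicitly keeps the lookup easy to exercise in tests without touching the real process environment.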
Read catalog
Read a catalog from GitHub search. Design some search configuration options such as limit and offset (pagination).
catalog = Catalog(control=portals.GithubControl(search="<frictionless>"))
for package in catalog.packages:
    print(package.name)
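GitHub's REST search endpoints paginate with `page` and `per_page` (at most 100 items per page) rather than with an offset, so a limit/offset option would need translating. The sketch below shows one possible translation; `page_slices` is a hypothetical helper, not an existing Frictionless or GitHub function.

```python
def page_slices(offset, limit, per_page=100):
    """Translate offset/limit into (page, start, stop) triples.

    `page` is 1-based; `start` and `stop` slice the items returned
    for that page so that exactly `limit` items are kept overall.
    """
    if offset < 0 or limit < 0:
        raise ValueError("offset and limit must be non-negative")
    slices = []
    end = offset + limit
    position = offset
    while position < end:
        page = position // per_page + 1
        page_start = (page - 1) * per_page
        start = position - page_start
        # Stop at the end of the requested window or of the page.
        stop = min(end - page_start, per_page)
        slices.append((page, start, stop))
        position = page_start + stop
    return slices


# A window that straddles two pages needs two requests.
window = page_slices(offset=95, limit=10)
```

Each triple says which page to request and which part of its items to keep, so the catalog can fetch only the pages that overlap the requested window.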
Plan
- prototype the functionality based on the functional requirements
- get feedback from @roll on the implementation
- finish the implementation
- design the testing approach (probably using pytest.vcr for reading, but how do we test writing?)
- write a great deal of tests to be sure that the integration works correctly
- write a comprehensive tutorial - https://framework.frictionlessdata.io/docs/tutorials/tutorials-overview (new section Portals Tutorials)
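On the open question of how to test writing, one option is to inject a stand-in client so that tests never touch a real account. The sketch below is only an illustration under that assumption: `FakeGithubClient` and `publish` are hypothetical names, and the only rule taken from the spec above is that publishing is refused when the repo already exists.

```python
class FakeGithubClient:
    """In-memory stand-in for a GitHub client; records what was written."""

    def __init__(self, existing=()):
        self.repos = set(existing)
        self.created = []

    def repo_exists(self, user, repo):
        return (user, repo) in self.repos

    def create_repo(self, user, repo, files):
        self.repos.add((user, repo))
        # Store sorted file names so assertions do not depend on dict order.
        self.created.append((user, repo, sorted(files)))


def publish(client, user, repo, files):
    """Hypothetical publish step: refuse to overwrite an existing repository."""
    if client.repo_exists(user, repo):
        raise FileExistsError(f"{user}/{repo} already exists")
    client.create_repo(user, repo, files)


def test_publish_creates_new_repo():
    client = FakeGithubClient()
    files = {"datapackage.json": "{}", "table.csv": "id\n1\n"}
    publish(client, "user", "demo", files)
    assert client.created == [("user", "demo", ["datapackage.json", "table.csv"])]


def test_publish_refuses_existing_repo():
    client = FakeGithubClient(existing=[("user", "demo")])
    try:
        publish(client, "user", "demo", {})
    except FileExistsError:
        pass
    else:
        raise AssertionError("expected FileExistsError")
    assert client.created == []
```

The two test functions follow pytest's naming convention, so they would be collected as-is; a recorded-cassette approach could still cover the reading side.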
Issue Analytics
- Created a year ago
- Reactions: 3
- Comments: 6 (6 by maintainers)
Top GitHub Comments
Thanks. I have implemented to_github and from_github. I will add publish also.
Something like this:
Then we need to use it in tests for writing/publishing