OGC Disaster Pilot 2022 Sprint Meta Issue
Context
Quansight is conducting a sprint as part of the 2022 OGC Disaster Pilot. The aim of the sprint is to demonstrate how QHub/Nebari could be used to quickly spin up a data science platform on the cloud of choice, providing practitioners the computational tools needed to respond to a disaster. The sprint is scheduled for July 2022.
Audience & Structure
The event will be focused on scientists and engineers in the geospatial-ocean-met space. There will be two tracks/tutorials.
- Tutorial demonstrating the use of the [Pangeo stack](https://pangeo.io/) on QHub/Nebari
- Tutorial walking through the installation of QHub/Nebari
It is expected that some participants will not want to install QHub themselves and will only be interested in learning about the Pangeo stack. The plan is to give them accounts on a hosted QHub/Nebari, most probably via ESIP or via the OGC.
Currently, the plan is for a two-day sprint event, starting with the two tutorials on the first day and taking no more than the morning. After this, there will be an async mechanism for folks to ask questions as they either try out the Pangeo stack on the hosted QHub/Nebari or try installing QHub/Nebari for themselves.
High Priority Issues
Installation
- Clear working instructions on how to install QHub/Nebari on the clouds we decide to support
- Make sure we have sane instance sizes for the clouds we use in the demo
- Small / Large / High Mem / Cheap GPU, etc.
- Is our default conda-store pod size reasonable?
- Are our Dask workers configured correctly for our instance sizes? (See the sketch after this list.)
- We may be able to use the ESIP deployment as an example.
- Clear documentation of how to use Keycloak
- How do groups and roles work? Is there a difference?
- Explain the special groups, i.e. currently Admin/Developer/Analyst
- How do I add/remove people from QHub? (Ideally this should be doable by anyone in the admin group, not just with the root password.)
- How are groups and shared folders connected?
- conda-store
- Fix user-created conda environments not showing up for CDS Dashboards
- Fix namespace clashes between filesystem conda environments and user-created environments
- Documentation on how to create/delete environments and namespaces, and on conda-store configuration
- Rename the default and filesystem namespaces, and explain the difference between the environments that come from git and user-created environments
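As a concrete check for the Dask worker question above, here is a minimal sketch (not part of the original issue) of how a participant could inspect and use the cluster options that the deployment's dask-gateway exposes from a JupyterLab session. The exact options available depend on how the QHub/Nebari gateway profiles are configured.

```python
# Minimal sketch: inspect and use the Dask Gateway cluster options exposed by
# the deployment. Run from a JupyterLab session on the QHub/Nebari instance,
# where the default Gateway() configuration points at the in-cluster gateway.
from dask_gateway import Gateway
from dask.distributed import Client

gateway = Gateway()                     # uses the deployment's default gateway address
options = gateway.cluster_options()     # server-defined knobs (profile, conda env, ...)
options                                 # in a notebook this renders a widget of the choices

cluster = gateway.new_cluster(options)  # start a cluster with the selected options
cluster.scale(2)                        # two workers; verify memory/cores in the dashboard
client = Client(cluster)
client.scheduler_info()["workers"]      # worker resources as seen by the scheduler
```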
Demo List
@dharhas will split this out into a new issue, but at a high level we will be demonstrating:
- Dask (https://github.com/nebari-dev/nebari/issues/39)
- Xarray
- CDSDashboards (https://github.com/nebari-dev/nebari/issues/13)
- Panel/Holoviz (related to CDS https://github.com/nebari-dev/nebari/issues/13)
- VSCode (https://github.com/nebari-dev/nebari/issues/55)
- Long running workflows via kbatch (https://github.com/nebari-dev/nebari/issues/46)
- Scheduled cronjobs via kbatch (https://github.com/nebari-dev/nebari/issues/47)
- Using jupyter-ssh & tmux to access your pod and run things *
- Using a token to connect to dask-gateway from a local machine * (see the sketch after this list)
- Multistep workflow via Argo * (https://github.com/nebari-dev/nebari/issues/48 and https://github.com/nebari-dev/nebari/issues/47)
- Items marked with * are stretch goals
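For the item about connecting to dask-gateway from a local machine with a token, a rough sketch is below. The gateway address, proxy address, and port are hypothetical placeholders; the real values depend on how the QHub/Nebari instance is exposed.

```python
# Rough sketch: connect to the deployment's dask-gateway from a laptop using a
# JupyterHub API token. The URLs/port below are placeholders, not the sprint
# deployment's actual endpoints.
from dask_gateway import Gateway
from dask_gateway.auth import JupyterHubAuth

gateway = Gateway(
    address="https://qhub.example.org/gateway",    # hypothetical gateway URL
    proxy_address="tls://qhub.example.org:8786",   # hypothetical scheduler proxy
    auth=JupyterHubAuth(api_token="<token generated from the JupyterHub token page>"),
)
print(gateway.list_clusters())   # a successful call confirms the token authenticates
cluster = gateway.new_cluster()
cluster.scale(2)
```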
Open Questions
- There is a requirement to show installation capability on multiple clouds. Do we support all four, or do we say that for the sprint we will only support two?
- Which cloud install will we use in the demo? Probably AWS, since many of @rsignell-usgs's demos use AWS-hosted datasets.
- Are we comfortable with our GPU support? Preference is to …
- Do we want to refer to everything as Nebari? This is my preference.
- Some of the datasets, like ‘sentinel-1’, require AWS credentials. How do we handle this in the sprint?
Out of Scope
- Renaming the code to use Nebari is not required for this sprint. We can explain that we are in the middle of a rebranding.
References
Original Issue description from @iameskild below.
Clear deployment instructions
Most of the docs for the items listed below need to be validated and improved.
- Installation
  - 🔴 Getting setup
  - 🔴 `qhub init` and `qhub deploy` to the cloud
- Basic Keycloak user / group management overview
Although not perfect, much of the team’s effort in the past few months has gone into stabilizing the deployment process, and I think that with a few improvements and updates to the docs, we’re in good shape for the demo. Here are a few items that could be addressed as stretch goals:
Demos and clear documentation for the core services
These docs can be the demos or instructions walking users through some of the core features and services.
- Dask
- Conda-Store
- 🔴 Establish and document that the only way to handle conda environments is by adding them to the `qhub-config.yaml`.
  - Perhaps even remove the mention that environments can be created any other way…
  - This will change in the future as conda-store is developed, but given the issues with namespaces, it makes sense to simplify the messaging.
- Test / clean up current conda-store docs - user_guide
- Test / clean up current conda-store docs - management
- CDS Dashboards
- KBatch
- Argo (stretch-goal)
- Validate argo docs
- Create example workflow tutorial
Given that we have some “bare-bones” examples of much of the above, it might be worthwhile to develop a few more complex example notebooks. These could fall into the “tutorial” section of the Diátaxis documentation framework.
🔴 = deemed highest priority by @iameskild
Comments
Regarding notebooks to use for the Sprint, do any of these look interesting? https://gallery.pangeo.io/index.html
With a slight modification (creating the Dask cluster), they should all work on QHub/Nebari.
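The "slight modification" would look roughly like the snippet below: an illustrative sketch, not tested against every gallery notebook, that swaps the notebook's own cluster-creation cell for a dask-gateway cluster on the deployment.

```python
# Illustrative sketch: replace a gallery notebook's cluster-creation cell
# (e.g. LocalCluster or dask_kubernetes) with a dask-gateway cluster on
# QHub/Nebari. The adaptive scaling bounds are arbitrary examples.
from dask_gateway import Gateway
from dask.distributed import Client

gateway = Gateway()
cluster = gateway.new_cluster()          # optionally pass gateway.cluster_options()
cluster.adapt(minimum=2, maximum=10)     # adaptive scaling between 2 and 10 workers
client = Client(cluster)
client                                   # display the client/dashboard link in the notebook
```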
I also tested cloning the Element 84 geo-notebooks repo and this Planetary Computer remote sensing notebook might be nice: https://jupyter.qhub.esipfed.org/hub/user-redirect/lab/tree/shared/users/rsignell/repos/geo-notebooks/notebooks/odc-planetary-computer.ipynb
Actually the infracost work #1315 needs to be part of this sprint as well.