OGC Disaster Pilot 2022 Sprint Meta Issue
Context
Quansight is conducting a sprint as part of the 2022 OGC Disaster Pilot. The aim of the sprint is to demonstrate how QHub/Nebari could be used to quickly spin up a data science platform on the cloud of choice, providing practitioners the computational tools needed to respond to a disaster. The sprint is scheduled for July 2022.
Audience & Structure
The event will be focused on scientists and engineers in the geospatial-ocean-met space. There will be two tracks/tutorials.
- Tutorial demonstrating the use of the [Pangeo stack](https://pangeo.io/) on QHub/Nebari
- Tutorial walking through the installation of QHub/Nebari
It is expected that some participants will not want to install QHub themselves and will only be interested in learning about the Pangeo stack. The plan is to give them accounts on a hosted QHub/Nebari, most probably via ESIP or via the OGC.
Currently, the plan is for a two-day sprint event, starting with the two tutorials on the first day and taking no more than the morning. After this, there will be an async mechanism for folks to ask questions as they either try out the Pangeo stack on the hosted QHub/Nebari or try installing QHub/Nebari for themselves.
High Priority Issues
Installation
- Clear working instructions on how to install QHub/Nebari on the clouds we decide to support
- Make sure we have sane instance sizes for the clouds we use in the demo
- Small / Large / High Mem / Cheap GPU, etc.
- Is our default conda-store pod size reasonable?
- Are our Dask workers configured correctly for our instance sizes? (See the sketch after this list.)
- We may be able to use the ESIP deployment as an example.
- Clear documentation of how to use Keycloak
- How do groups and roles work? Is there a difference?
- Explain the special groups, i.e. currently Admin/Developer/Analyst
- How do I add/remove people from QHub? (Ideally this should be doable by anyone in the admin group, not just with the root password.)
- How are groups and shared folders connected?
- conda-store
- Fix user-created conda environments not showing up for CDS Dashboards
- Fix namespace clashes between filesystem conda environments and user-created environments
- Documentation on how to create/delete environments and namespaces, and on conda-store configuration
- Rename the default and filesystem namespaces, and explain the difference between the environments that come from git and user-created environments
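As a concrete check for the Dask worker question above, here is a minimal sketch (not part of the original issue) of how a participant could inspect and use the cluster options that the deployment's dask-gateway exposes from a JupyterLab session. The exact options available depend on how the QHub/Nebari gateway profiles are configured.

```python
# Minimal sketch: inspect and use the Dask Gateway cluster options exposed by
# the deployment. Run from a JupyterLab session on the QHub/Nebari instance,
# where the default Gateway() configuration points at the in-cluster gateway.
from dask_gateway import Gateway
from dask.distributed import Client

gateway = Gateway()                     # uses the deployment's default gateway address
options = gateway.cluster_options()     # server-defined knobs (profile, conda env, ...)
options                                 # in a notebook this renders a widget of the choices

cluster = gateway.new_cluster(options)  # start a cluster with the selected options
cluster.scale(2)                        # two workers; verify memory/cores in the dashboard
client = Client(cluster)
client.scheduler_info()["workers"]      # worker resources as seen by the scheduler
```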
Demo List
@dharhas will split this out into a new issue, but at a high level we will be demonstrating:
- Dask (https://github.com/nebari-dev/nebari/issues/39)
- Xarray
- CDSDashboards (https://github.com/nebari-dev/nebari/issues/13)
- Panel/Holoviz (related to CDS https://github.com/nebari-dev/nebari/issues/13)
- VSCode (https://github.com/nebari-dev/nebari/issues/55)
- Long running workflows via kbatch (https://github.com/nebari-dev/nebari/issues/46)
- Scheduled cronjobs via kbatch (https://github.com/nebari-dev/nebari/issues/47)
- Using jupyter-ssh & tmux to access your pod and run things *
- Using a token to connect to dask-gateway from a local machine * (see the sketch after this list)
- Multistep workflow via Argo * (https://github.com/nebari-dev/nebari/issues/48 and https://github.com/nebari-dev/nebari/issues/47)
- Items marked with * are stretch goals
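For the item about connecting to dask-gateway from a local machine with a token, a rough sketch is below. The gateway address, proxy address, and port are hypothetical placeholders; the real values depend on how the QHub/Nebari instance is exposed.

```python
# Rough sketch: connect to the deployment's dask-gateway from a laptop using a
# JupyterHub API token. The URLs/port below are placeholders, not the sprint
# deployment's actual endpoints.
from dask_gateway import Gateway
from dask_gateway.auth import JupyterHubAuth

gateway = Gateway(
    address="https://qhub.example.org/gateway",    # hypothetical gateway URL
    proxy_address="tls://qhub.example.org:8786",   # hypothetical scheduler proxy
    auth=JupyterHubAuth(api_token="<token generated from the JupyterHub token page>"),
)
print(gateway.list_clusters())   # a successful call confirms the token authenticates
cluster = gateway.new_cluster()
cluster.scale(2)
```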
Open Questions
- There is a requirement to show installation capability on multiple clouds. Do we support all four, or do we say that for the sprint we will only support two?
- Which cloud install will we use in the demo? Probably AWS, since many of @rsignell-usgs's demos use AWS-hosted datasets.
- Are we comfortable with our GPU support? Preference is to …
- Do we want to refer to everything as Nebari? This is my preference.
- Some of the datasets, like ‘sentinel-1’, require AWS credentials. How do we handle this in the sprint?
Out of Scope
- Renaming the code to use Nebari is not required for this sprint. We can explain that we are in the middle of a rebranding.
References
Original Issue description from @iameskild below.
Clear deployment instructions
Most of the docs for the items listed below need to be validated and improved.
- Installation
  - 🔴 Getting setup
  - 🔴 `qhub init` and `qhub deploy` to the cloud
- Basic Keycloak user / group management overview
Although not perfect, much of the team’s effort in the past few months has gone into stabilizing the deployment process, and I think that with a few improvements and updates to the docs, we’re in good shape for the demo. Here are a few items that could be addressed as stretch goals:
Demos and clear documentation for the core services
These docs can be the demos or instructions walking users through some of the core features and services.
- Dask
- Conda-Store
- 🔴 Establish and document that the only way to handle conda environments is by adding them to the `qhub-config.yaml`.
  - Perhaps even remove the mention that environments can be created any other way…
  - This will change in the future as conda-store is developed, but given the issues with namespaces, it makes sense to simplify the messaging.
- Test / clean up current conda-store docs - user_guide
- Test / clean up current conda-store docs - management
- CDS Dashboards
- KBatch
- Argo (stretch-goal)
- Validate argo docs
- Create example workflow tutorial
Given that we have some “bare-bones” examples of much of the above, it might be worthwhile to develop a few more complex example notebooks. These could fall into the “tutorial” section of the Diátaxis documentation framework.
🔴 = deemed highest priority by @iameskild
Comments
Regarding notebooks to use for the Sprint, do any of these look interesting? https://gallery.pangeo.io/index.html
With a slight modification (creating the Dask cluster), they should all work on QHub/Nebari.
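The "slight modification" would look roughly like the snippet below: an illustrative sketch, not tested against every gallery notebook, that swaps the notebook's own cluster-creation cell for a dask-gateway cluster on the deployment.

```python
# Illustrative sketch: replace a gallery notebook's cluster-creation cell
# (e.g. LocalCluster or dask_kubernetes) with a dask-gateway cluster on
# QHub/Nebari. The adaptive scaling bounds are arbitrary examples.
from dask_gateway import Gateway
from dask.distributed import Client

gateway = Gateway()
cluster = gateway.new_cluster()          # optionally pass gateway.cluster_options()
cluster.adapt(minimum=2, maximum=10)     # adaptive scaling between 2 and 10 workers
client = Client(cluster)
client                                   # display the client/dashboard link in the notebook
```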
I also tested cloning the Element 84 geo-notebooks repo and this Planetary Computer remote sensing notebook might be nice: https://jupyter.qhub.esipfed.org/hub/user-redirect/lab/tree/shared/users/rsignell/repos/geo-notebooks/notebooks/odc-planetary-computer.ipynb
Actually the infracost work #1315 needs to be part of this sprint as well.