Last week, makepath attended the first annual Dask Developer Workshop at Capital One Labs in Arlington, VA. The workshop brought together 50+ Dask and Python experts to review the state of Dask.
For those unfamiliar with the project, Dask provides advanced parallelism for scaling Python across multithreaded, multicore, and multi-machine (cluster) environments.
Dask is analogous to Apache Spark, but written in Python instead of Scala/Java.
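To give a feel for what that looks like in practice, here is a minimal sketch using `dask.delayed`. The `square` and `add` functions are made up for illustration; the point is only that Dask builds a task graph lazily and then runs independent tasks in parallel when asked.

```python
import dask


@dask.delayed
def square(x):
    # Calling a delayed function records a task instead of running it.
    return x * x


@dask.delayed
def add(a, b):
    return a + b


# Nothing has executed yet: `total` is a lazy task graph with three tasks.
total = add(square(3), square(4))

# compute() hands the graph to a scheduler. The two square() tasks do not
# depend on each other, so the threaded scheduler may run them in parallel.
result = total.compute(scheduler="threads")
print(result)
```

The same graph can be sent to a multi-machine cluster by connecting a `dask.distributed` client first, without changing the functions themselves.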
Dask unites engineers and researchers from many disciplines in the pursuit of scalable analytics. It was inspiring to see climate scientists working alongside quantitative finance experts, seismologists brainstorming with supply chain managers, and astronomers helping civil engineers crunch data. The multidisciplinary nature of the Dask community reinforces Python’s most important motto, “Programming for Everybody.”
New and popular libraries discussed included:
One of the hot topics of discussion was the desire for heterogeneous Dask workers. At the moment, Dask only supports a single worker type, where a worker type means the profile of a worker machine (number of CPUs/GPUs, RAM, threads, etc.). Many use cases could benefit from a cluster composed of several types of workers, and this is one area where I’m sure we’ll see development in the coming year.
Jim Crist-Harif gave a great talk on dask-gateway, a project anybody using Dask should check out. Dask-gateway provides Dask clusters as a service.
Some important features of dask-gateway include:
- Deployment scripts versioned as part of your code
- No need for extra infrastructure; plays well with many deployment backends (YARN, Kubernetes, Slurm)
- Just Python libraries
- A REST API for managing clusters
- SSL/TLS support for scheduler traffic
- User resource limits
- Automatic shutdown of idle clusters
- Strong interoperability with JupyterHub
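For readers who have not tried it, the client side of dask-gateway might look roughly like the sketch below. It assumes the `dask-gateway` client package is installed and that a gateway server is already running; the `launch_cluster` helper and the address in the usage note are hypothetical and not from the talk.

```python
def launch_cluster(address, n_workers=2):
    """Request a Dask cluster from a gateway server and connect to it.

    `address` is the URL of an existing dask-gateway server (an assumption
    of this sketch). Returns the cluster handle and a connected client.
    """
    # Imported here so this snippet can be loaded without dask-gateway
    # installed; the import only happens when a cluster is requested.
    from dask_gateway import Gateway

    gateway = Gateway(address)       # talks to the gateway's REST API
    cluster = gateway.new_cluster()  # ask the gateway to start a scheduler
    cluster.scale(n_workers)         # request workers from the backend
    client = cluster.get_client()    # regular dask.distributed client
    return cluster, client
```

With a server available, usage would be along the lines of `cluster, client = launch_cluster("http://gateway.example.com")`, followed by `cluster.shutdown()` when finished. Authentication and cluster options depend on how the server is configured, so they are left out here.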
The Dask Developer Workshop also gave members of the Pangeo group an opportunity to meet in person. Makepath is a proud contributor to the Pangeo project. Pangeo is a consortium of engineers and climate scientists focused on using open source tools to better understand the threats posed by climate change and the challenges of climate change adaptation.
Capital One Labs provided an amazing workspace for the workshop. Anaconda and NVIDIA sponsored our happy hours, which gave community members some downtime to interact and bond.
Special thanks to Matthew Rocklin, creator of Dask, founder of Coiled Computing, breaker of chains, father of Pythons, for organizing and leading such a special event.
Want to learn more about Dask? The Dask Examples GitHub repo is a great place to start!