The Dask Developer Workshop

Last week makepath attended the first annual Dask Developer Workshop at Capital One Labs in Arlington, VA. This unique workshop brought together 50+ Dask/Python experts to review the state of Dask.

TL;DR:

  • This was the first annual Dask Developer Workshop
  • For those unfamiliar with the project, Dask provides advanced parallelism for scaling Python across multiple threads, multiple cores, and multiple machines (clusters).
  • Dask is analogous to Apache Spark, but written in Python instead of Scala/Java.
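
For readers who have not used Dask before, here is a minimal sketch of what that parallelism looks like in practice. It assumes the `dask` package is installed; the function names `square` and `add_all` are made up for illustration. The same task graph could be handed to a multi-process or cluster scheduler instead of the thread-based one used here.

```python
from dask import delayed


@delayed
def square(x):
    # Calling this does not run it; it records a task in a graph.
    return x * x


@delayed
def add_all(values):
    # Receives the computed results of the tasks it depends on.
    return sum(values)


# Build a small task graph: five independent squares feeding one sum.
squares = [square(i) for i in range(5)]
total = add_all(squares)

# Nothing has executed yet. compute() runs the graph, here on a thread pool.
result = total.compute(scheduler="threads")
print(result)
```

The independent `square` tasks are free to run in parallel, and Dask only starts `add_all` once all of them have finished.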

Dask unites engineers and researchers of various disciplines in the pursuit of scalable analytics. It was inspiring to see climate scientists working alongside quantitative finance experts, seismologists brainstorming with supply chain managers, and astronomers helping civil engineers crunch data. The multidisciplinary nature of the Dask community reinforces Python’s most important motto, “Programming for Everybody.”

New and Popular libraries discussed included:

One of the hot topics of discussion was the desire for heterogeneous Dask workers. At the moment, Dask only supports a single worker type. By worker type, we mean the profile of a worker machine (number of CPUs/GPUs, RAM, threads, etc.). Many use cases could benefit from a cluster composed of various types of workers. This is one area where I’m sure we’ll see development in the coming year.

Jim Crist-Harif gave a great talk on dask-gateway, a project anybody using Dask should check out.  Dask-gateway provides Dask Clusters as a Service. 

Some important features of dask-gateway include:

  • Deployment scripts versioned as part of your code
  • No need for extra infrastructure; it plays well with many deployment backends (YARN, Kubernetes, Slurm)
  • Just Python libraries
  • Extensible design
  • Provides a REST API for managing clusters
  • SSL / TLS support for scheduler traffic
  • User resource limits
  • Automatic shutdown of idle clusters
  • Strong interoperability with JupyterHub
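
To give a feel for the "Clusters as a Service" idea, here is a rough sketch of the client side, assuming the `dask-gateway` package is installed and a gateway server is already running. The function name `run_on_gateway` and the address in the usage note are hypothetical, and authentication options are left out entirely.

```python
def run_on_gateway(address, n_workers=2):
    """Start a cluster through a gateway server, run one task, then shut down."""
    # Imported here so this sketch can be loaded without dask-gateway installed.
    from dask_gateway import Gateway

    gateway = Gateway(address)       # connect to the gateway server
    cluster = gateway.new_cluster()  # ask the gateway to launch a new cluster
    try:
        cluster.scale(n_workers)     # request workers from the backend
        client = cluster.get_client()
        # Submit a trivial task to the cluster and wait for its result.
        return client.submit(sum, range(10)).result()
    finally:
        cluster.shutdown()           # release the cluster's resources
```

With a gateway reachable at a placeholder address, a call might look like `run_on_gateway("http://gateway.example.com")`. The user never talks to YARN, Kubernetes, or Slurm directly; the gateway handles that behind its API.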

The Dask Developer Workshop also provided an opportunity for members of the Pangeo Group to meet in person. Makepath is a proud contributor to the Pangeo project. Pangeo is a consortium of engineers and climate scientists focused on using open source tools to better understand threats posed by climate change and the challenges of climate change adaptation.

Capital One Labs provided an amazing workspace for the workshop. Anaconda and NVIDIA sponsored our happy hours, which provided some downtime for community members to interact and bond.

Special thanks to Matthew Rocklin, creator of Dask, founder of Coiled Computing, breaker of chains, father of Pythons, for organizing and leading such a special event.

Want to learn more about Dask? The Dask Examples GitHub repo is a great place to start!
