Dask Summit 2021 Logo Banner

makepath was proud to sponsor the Dask Distributed Summit, which took place May 19-21, 2021. The summit gathered a diverse group of users from around the world to explore how Dask scales Python analytics.

Brendan Collins, Co-founder of makepath, was one of the speakers at the summit. Brendan highlighted Xarray-Spatial, an open source Python library that implements common raster analysis functions on geospatial data.

Xarray-Spatial was created to bridge the gap between GIS analysts and professionals in other disciplines, where differences in terminology can create a disconnect. For instance, the GIS world calls the first derivative of elevation “slope,” whereas someone working in NumPy would more likely describe the same quantity as the gradient.
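To make the slope and gradient connection concrete, here is a minimal sketch using plain NumPy finite differences. The elevation grid and the 10 m cell size are made-up values for illustration, and this is not necessarily the algorithm Xarray-Spatial uses internally.

```python
import numpy as np

# Hypothetical elevation grid in meters: a plane rising 1 m per cell to the east.
cell_size = 10.0  # assumed ground distance between cell centers, in meters
elevation = np.tile(np.arange(5, dtype=float), (5, 1))

# np.gradient returns the rate of change along each axis (rows first, then columns).
dz_dy, dz_dx = np.gradient(elevation, cell_size)

# What GIS calls "slope" is the magnitude of that gradient, expressed as an angle.
slope_degrees = np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))

print(slope_degrees[0, 0])
```

Because the surface rises 1 m for every 10 m of horizontal distance, every cell has the same slope of arctan(0.1), a little under 6 degrees.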

Brendan Presentation Screenshot

Brendan explored several tools within Xarray-Spatial including the Proximity, Zonal, Focal, Classification, and Multispectral tools, and emphasized the goal of having all Xarray-Spatial tools enabled to run across Dask clusters.

You can watch a recording of the talk here.

Highlight 1: Interactive Visualization and Near Real-time Analysis on Out-of-core Satellite Images

The first talk of the summit was given by Draga Doncila Pop, a key contributor to napari, an open source scientific image viewer for Python. Draga’s talk focused on interactive visualization and near real-time analysis of out-of-core satellite images using napari.

Draga used napari to demonstrate that, even with complex images and limited RAM, both new and experienced Python programmers can analyze data with minimal code. In keeping with the talk’s “Big Data, Low Effort” focus, napari is an n-dimensional viewer whose time slider makes it easier to view images over time.

Draga Presentation Screenshot

Draga showed how napari helped researchers with MonashVegMap, a Sentinel-2A land cover study of Victoria, Australia. The researchers involved in the study were having difficulty viewing their datasets over time and comparing different visualizations to validate their results.

napari enabled the researchers to view their data over time and hover over a pixel to instantly get its classification. An interactive NDVI (Normalized Difference Vegetation Index) widget is also built into napari, so users can compute NDVI layers and plot the NDVI profile of a pixel over time.
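For readers unfamiliar with NDVI, it is computed per pixel as (NIR - Red) / (NIR + Red). The following is a minimal NumPy sketch of that formula; the `ndvi` helper and the reflectance values are hypothetical and are not taken from napari or from the talk.

```python
import numpy as np

def ndvi(nir, red):
    """Return (NIR - Red) / (NIR + Red), with NaN where both bands are zero."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    total = nir + red
    out = np.full(total.shape, np.nan)
    # Only divide where the denominator is non-zero; other cells stay NaN.
    np.divide(nir - red, total, out=out, where=total != 0)
    return out

# Made-up reflectance values for a 2x2 image.
nir = np.array([[0.5, 0.4], [0.3, 0.0]])
red = np.array([[0.1, 0.4], [0.1, 0.0]])
values = ndvi(nir, red)
print(values)
```

Values near 1 suggest dense vegetation, values near 0 suggest bare ground, and the NaN marks a pixel with no signal in either band.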

You can watch a recording of the talk here.

Highlight 2: Scalable Geospatial Data Analysis with Dask

Tom Augspurger, Geospatial Infrastructure Engineer at Microsoft and key contributor on the AI for Earth team, talked about Scalable Geospatial Data Analysis with Dask.

Cloud Optimized GeoTIFFs (COGs) kept in blob storage and referenced by URL have some structure, but raw URLs are not the friendliest way to work with the data. Tom discussed a standards-based solution.

The Microsoft Planetary Computer, an initiative Tom is heavily involved in, can work in tandem with STAC (SpatioTemporal Asset Catalog), an emerging standard, to make data computations easier in a cloud-native way. STAC provides a structured way to describe spatio-temporal datasets.

Tom Presentation Screenshot 2

Tom’s Quick Pop Quiz

Tom Presentation Screenshot 1

Answer: Dask does a bit of I/O up front, but only enough to construct the metadata. Subsequent operations are lazy.
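The lazy behavior described in the answer can be sketched with a small in-memory Dask array. This example does no file I/O, so it only illustrates that metadata is available before any computation runs; the array size and chunking are arbitrary choices.

```python
import dask.array as da

# Building the array and the expression only records a task graph.
x = da.ones((4000, 4000), chunks=(1000, 1000))
total = (x + 1).sum()

# Metadata such as shape, dtype, and chunk layout is known without computing.
print(x.shape, x.dtype, x.numblocks)

# The actual work happens only when a result is requested.
result = total.compute()
print(result)
```

Until `compute()` is called, `total` is still a Dask array that describes the work to be done rather than a number.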

You can watch a recording of the talk here.

Highlight 3: Datashader for Scaling Geospatial Vector Data

Jim Bednar, Anaconda veteran and creator of Datashader, highlighted new advances for using Datashader to visualize geospatial data, as part of a Scaling Geospatial Vector Data Workshop.

Datashader is an open source graphics pipeline system that can easily render, update, and interactively select large datasets without depending on any geospatial tools. Most data formats can be quickly turned into a fixed-size raster with Datashader, and the rasterized data can then be analyzed using a raster analysis library such as Xarray-Spatial.
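As a rough illustration of what turning data into a fixed raster means, the sketch below bins random points onto a fixed-size grid with plain NumPy. It deliberately does not use Datashader's API; the point data, the 300 x 200 grid, and the count aggregation are all assumptions chosen for the example.

```python
import numpy as np

# Made-up point data in the unit square.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 100_000)
y = rng.uniform(0.0, 1.0, 100_000)

# Aggregate the points onto a fixed grid: one count per output pixel.
width, height = 300, 200
counts, y_edges, x_edges = np.histogram2d(
    y, x, bins=(height, width), range=((0.0, 1.0), (0.0, 1.0))
)

print(counts.shape, int(counts.sum()))
```

The output size depends only on the chosen grid, not on the number of input points, which is what makes the result practical to display and to pass on to raster analysis functions.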

Jim Presentation Screenshot

When asked about using HoloViews on Dask dataframes, Jim noted that HoloViews is built on top of a very large stack of libraries, so the format you use will behave consistently as long as both HoloViews and Datashader support a given data structure.

You can watch a recording of the talk here. (26:34)

Curious about more ways Dask can be used to scale Python analytics? Connect with us at contact@makepath.com.