makepath was proud to sponsor the Dask Distributed Summit, which took place from May 19-21. The summit gathered a diverse group of users from across the world to explore the scalability of Python analytics with Dask.
Brendan Collins, Co-founder of makepath, was one of the speakers at the summit. Brendan highlighted Xarray-Spatial, an open source Python library that implements common raster analysis functions on geospatial data.
Xarray-Spatial was created to bridge the gap between GIS analysts and professionals in other disciplines, where differences in terminology can create a disconnect. For instance, in the GIS world the first derivative of elevation is called slope, whereas someone working in NumPy would more likely describe the same quantity in terms of a gradient.
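To make the slope/gradient connection concrete, here is a minimal NumPy sketch. It is not Xarray-Spatial's implementation; the elevation grid and the cell size are made-up values chosen so the result is easy to check by hand.

```python
import numpy as np

# Hypothetical 4x4 elevation grid in metres: a plane rising 1 m per cell to the east.
elevation = np.tile(np.arange(4, dtype=float), (4, 1))
cell_size = 1.0  # assumed ground distance between cell centres, in metres

# np.gradient returns one array per axis: rows (y) first, then columns (x).
dz_dy, dz_dx = np.gradient(elevation, cell_size)

# A GIS-style slope is the steepness of that gradient, expressed as an angle.
slope_degrees = np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))

print(slope_degrees)
```

Because the surface rises one metre for every metre travelled east, every cell works out to a 45 degree slope. The NumPy user sees two gradient arrays, while the GIS analyst sees a single slope raster derived from them.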
The first talk of the summit was given by Draga Doncila Pop, a key contributor to napari, an open source Python scientific image viewer. Draga’s talk focused on “Interactive Visualization and Near Real-time Analysis on Out-of-core Satellite Images” using napari.
napari was used to demonstrate that even with complex images and limited RAM, new and experienced Python programmers can analyze data with minimal code. Focusing on “Big Data, Low Effort”, napari is an n-dimensional viewer that includes a time slider to solve issues related to viewing images over time.
Draga showed how napari helped researchers with MonashVegMap, a Sentinel 2A land cover study of Victoria, Australia. The researchers involved in the study were having difficulty viewing their datasets over time and comparing different visualizations to validate their results.
napari enabled the researchers to view their data over time and hover over a pixel to instantly get its classification. An interactive NDVI widget is also built into napari, so users can compute NDVI layers and profiles of a pixel over time.
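For readers new to NDVI, the index is the normalized difference between the near-infrared and red bands: (NIR - Red) / (NIR + Red). The sketch below shows that arithmetic in plain NumPy. It is not napari's widget code, and the reflectance values are invented for illustration.

```python
import numpy as np

def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    total = nir + red
    # Pixels where both bands are zero have no defined NDVI, so leave them as NaN.
    out = np.full(total.shape, np.nan)
    np.divide(nir - red, total, out=out, where=total != 0)
    return out

# Hypothetical 2x2 reflectance tiles (for Sentinel-2, NIR is band 8 and red is band 4).
nir_band = np.array([[0.8, 0.6], [0.4, 0.0]])
red_band = np.array([[0.2, 0.2], [0.4, 0.0]])

values = ndvi(nir_band, red_band)
print(values)
```

Values near 1 suggest dense, healthy vegetation, values near 0 suggest bare ground, and the NaN marks a pixel with no usable signal in either band.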
Tom Augspurger, Geospatial Infrastructure Engineer at Microsoft and key contributor on the AI for Earth team, talked about Scalable Geospatial Data Analysis with Dask.
Cloud Optimized GeoTIFFs (COGs) stored in blob storage and addressed by URL have some structure, but raw URLs are not the friendliest way to work with the data. Tom discussed a standards-based solution.
The Microsoft Planetary Computer, an initiative Tom is heavily involved in, can work in tandem with STAC (SpatioTemporal Asset Catalog), an emerging standard, to make computations on this data easier in a cloud-native way. The standard provides a structured way to describe spatio-temporal datasets.
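As a rough illustration of what a structured description buys you over a bare URL, the sketch below hand-writes a single record that loosely follows the STAC Item layout and then filters it by area and date using only the standard library. The item ID, coordinates, date, and example.com URL are all hypothetical, and the helper functions are written for this post rather than taken from any STAC client.

```python
from datetime import datetime

# Hypothetical item, loosely following the STAC Item layout (a GeoJSON Feature).
item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-scene-001",
    "bbox": [144.0, -38.5, 145.5, -37.0],  # west, south, east, north
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[144.0, -38.5], [145.5, -38.5], [145.5, -37.0],
                         [144.0, -37.0], [144.0, -38.5]]],
    },
    "properties": {"datetime": "2021-05-19T00:00:00Z"},
    "assets": {
        "red": {
            "href": "https://example.com/scenes/001/red.tif",  # placeholder URL
            "type": "image/tiff; application=geotiff; profile=cloud-optimized",
        }
    },
    "links": [],
}

def parse_utc(stamp):
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 onwards.
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))

def bbox_intersects(a, b):
    # Two boxes overlap when they overlap on both the x and the y axis.
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def matches(item, bbox, start, end):
    when = parse_utc(item["properties"]["datetime"])
    in_window = parse_utc(start) <= when <= parse_utc(end)
    return bbox_intersects(item["bbox"], bbox) and in_window

area_of_interest = [144.5, -38.0, 145.0, -37.5]
found = matches(item, area_of_interest, "2021-05-01T00:00:00Z", "2021-05-31T23:59:59Z")
cog_url = item["assets"]["red"]["href"]
print(found, cog_url)
```

The point is that the where and when of a scene live in predictable fields, so the COG URL only needs to be followed once a search has already narrowed things down.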
Tom’s Quick Pop Quiz
Answer: Dask will do a bit of I/O up front but it just does enough I/O to construct the metadata and subsequent operations are lazy.
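The lazy behaviour described in that answer can be seen with a small Dask array. This sketch uses an in-memory array of ones as a stand-in, so no file I/O is involved; the shape and chunk sizes are arbitrary choices for illustration.

```python
import dask.array as da

# Building the array records only metadata (shape, dtype, chunk layout).
x = da.ones((4000, 4000), chunks=(1000, 1000))

# Still lazy: this adds to the task graph without touching any values.
mean_of_squares = (x ** 2).mean()

# Metadata is available straight away, before any computation happens.
print(x.shape, x.numblocks, x.dtype)

# Work is only done when a result is explicitly requested.
result = mean_of_squares.compute()
print(result)
```

With file-backed data the pattern is the same: just enough is read to learn the shape, dtype, and chunking, and the pixel values are only loaded when compute is called.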
Jim Bednar, Anaconda veteran and creator of Datashader, highlighted new advances for using Datashader to visualize geospatial data, as part of a Scaling Geospatial Vector Data Workshop.
Datashader is an open source graphics pipeline system that can easily render, update, and interactively select large datasets, without depending on any geospatial tools. Most data formats can be quickly turned into a fixed raster with Datashader, and rasterized data can be analyzed using a raster analysis library such as Xarray-Spatial.
When asked about using HoloViews on Dask dataframes, Jim noted that HoloViews sits on top of a very large stack of libraries, so the format you use should behave consistently as long as both HoloViews and Datashader support that data structure.
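The core idea of turning arbitrary point data into a fixed raster can be sketched with NumPy alone. To be clear, this is a stand-in using np.histogram2d and not Datashader's API; the point count, grid size, and random coordinates are assumptions made for the example.

```python
import numpy as np

# Hypothetical point data: 10,000 random x/y locations in the unit square.
rng = np.random.default_rng(0)
x = rng.random(10_000)
y = rng.random(10_000)

# Fixed output grid: however many points come in, the raster stays 100x100.
height, width = 100, 100
counts, y_edges, x_edges = np.histogram2d(
    y, x, bins=(height, width), range=[[0.0, 1.0], [0.0, 1.0]]
)

# Each cell holds the number of points that fell inside it.
print(counts.shape, int(counts.sum()))
```

Once the points are reduced to a grid like this, the result is an ordinary raster, which is why raster analysis functions can be applied to it afterwards.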
You can watch a recording of the talk here. (26:34)
Curious about more ways Dask can be used to scale Python analytics? Let us know your thoughts or questions in the comments section below, or connect with us at email@example.com.