
We are glad we got to participate in SciPy 2021! In addition to sponsoring the conference, members of the makepath team gave a talk, led a sprint and participated in many small-scale events. It was a great opportunity to make new connections and learn about interesting projects and open source libraries in the Python scientific community.

Now that recordings of talks given during SciPy 2021 are available, we wanted to recap some of our favorite talks amongst an exceptional lineup of tracks and content.

Check out all the recordings here.

SciPy 2021 Tracks

  • General
  • Machine Learning & Data Science
  • Data Visualization & Image Processing
  • Physics & Astronomy
  • Biology & Neuroscience
  • Computational Social Science & Digital Humanities
  • Earth, Ocean, Geo, & Atmospheric Science
  • Maintainers

Day 1

Track: Data Visualization & Image Processing
Talk Title: Needles in the haystack: Easy interactive dashboards allowing single-point selection in billion-row datasets
Speakers: Jean-Luc Stevens and Jim Bednar

Mentee-mentor pair and Anaconda veterans, Jean-Luc Stevens and Jim Bednar, gave a talk on using interactive open source data visualization tools to analyze millions (and billions) of data points.

Jean-Luc highlighted that while it is common to find publicly available datasets with billions of rows, good visualization tools are needed to get intuition for what is going on within the data.

Easy interactive dashboards allowing single-point selection

He explored two problems related to large datasets: overplotting and expressing large dynamic ranges.

Overplotting: When a large dataset is plotted directly, points are drawn on top of one another and the resulting visualization loses granularity. Such visualizations can hide important nuance, as seen in the screenshot below.

Overplotting example

Expressing large dynamic ranges: Linear and log colormaps are limited in their ability to portray data whose values span a very wide range. Open source visualization libraries like Datashader address this large dynamic range problem, as seen in the screenshot below.

expressing dynamic ranges
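To make the dynamic range problem concrete, here is a minimal NumPy sketch. It is not taken from the talk and it is not Datashader's implementation. The synthetic counts and the three helper functions are assumptions made for illustration. It compares how much of a 0-to-1 color scale the bulk of the data occupies under a linear, a log and a rank-based (histogram-equalized) mapping.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic per-pixel counts: most pixels hold 1-9 points,
# while a handful of hotspots hold hundreds of thousands.
counts = np.concatenate([
    rng.integers(1, 10, size=9_990),
    rng.integers(100_000, 1_000_000, size=10),
]).astype(float)


def linear_scale(values):
    """Map values linearly onto [0, 1]."""
    return (values - values.min()) / (values.max() - values.min())


def log_scale(values):
    """Map log(1 + value) linearly onto [0, 1]."""
    return linear_scale(np.log1p(values))


def rank_scale(values):
    """Map each value to its empirical CDF (histogram equalization)."""
    ordered = np.sort(values)
    return np.searchsorted(ordered, values, side="right") / values.size


def bulk_spread(scaled):
    """Share of the color scale separating a count of 1 from a count of 9."""
    return scaled[counts == 9].max() - scaled[counts == 1].min()


linear_spread = bulk_spread(linear_scale(counts))
log_spread = bulk_spread(log_scale(counts))
rank_spread = bulk_spread(rank_scale(counts))

print(f"linear: {linear_spread:.6f}")
print(f"log:    {log_spread:.3f}")
print(f"rank:   {rank_spread:.3f}")
```

In this sketch the linear mapping squeezes almost all pixels into a sliver of the color scale, the log mapping helps somewhat, and the rank-based mapping spreads the bulk of the data across most of the scale.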

Datashader, a rasterizer, can turn an entire data frame or dataset, up to hundreds of millions of points, into an image.

The goal is to allow users not only to see patterns within the data but also to glean insights. This requires querying and actively interacting with the data.

Visit examples.pyviz.org to see more examples of compelling data visualizations and PyViz.org to learn more about open source Python tools for data visualization.

Track: General
Talk Title: Building the NetworkX developer community
Speaker: K. Jarrod Millman

Jarrod, a core contributor for NetworkX, gave an overview of the history of the library as well as guidelines on how to build a developer community around an open source project. He also shared some lessons he has learned over the 10+ years that NetworkX has been around.

NetworkX is an open source library used to analyze graphs and complex networks with large amounts of data, up to tens of millions of nodes and edges.
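For readers new to the library, here is a minimal sketch of the kind of graph analysis NetworkX supports. The toy graph and its node names are hypothetical and were not part of the talk.

```python
import networkx as nx

# Build a small undirected graph from a list of edges
graph = nx.Graph()
graph.add_edges_from([("a", "b"), ("b", "c"), ("c", "d"), ("a", "c")])

# Shortest path between two nodes, counted in edges
path = nx.shortest_path(graph, source="a", target="d")

# Degree centrality: fraction of the other nodes each node is connected to
centrality = nx.degree_centrality(graph)
most_central = max(centrality, key=centrality.get)

print(path)
print(most_central, centrality[most_central])
```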

NetworkX in the ecosystem

A major issue Jarrod emphasized in open source projects is the scenario where there are many general contributors and only a small number of core contributors. This divide leaves a few contributors doing most of the work, which was the case in the early days of NetworkX.

NetworkX Lesson 1

NetworkX Lesson 2

Some general tips Jarrod gave for aspiring contributors are:

  • Find a project where you can quickly become a core contributor: Look for low-hanging fruit, such as fun and interesting algorithms that are intuitive. You can also work on fixing minor errors within the code.
  • Find sources of funding to continue to encourage contributions: This helps to secure, sustain and train future core contributors or mentors.
  • Support for core developers is crucial: This allows work to continue and releases more time for mentoring new contributors.
  • Interdisciplinary collaborations are key: There was a geospatial component with little support within NetworkX. It turns out there was a whole ecosystem of open source Python tools for Geo applications available, so communication and collaboration across the ecosystem is key. 
  • Dependencies can make or break user experience: Thoroughly consider which dependencies to add and the potential downstream effects.

Day 3

Track: Earth, Ocean, Geo, & Atmospheric Science
Talk Title: It’s Time for the Atmospheric Science Community to ACT Together
Speaker: Adam Theisen

Adam Theisen from Argonne National Laboratory talked about leveraging open source tools to work toward solutions that benefit research efforts in the atmospheric science community. 

Argonne National Laboratory is a U.S. Department of Energy national laboratory for science and engineering research. The Atmospheric Radiation Measurement Program (ARM) at Argonne National Laboratory is working on an open source library called Atmospheric data Community Toolkit (ACT).

The goals of ACT are to:

  • Facilitate collaboration and break down the silos that exist within the atmospheric science research community.
  • Reduce duplication of effort.
  • Create mutually beneficial tools through the PyData ecosystem and standardize data across the atmospheric science community.

Atmospheric Data Community Toolkit

ACT Forecasting

Some areas of focus moving forward include improving visualization tools within ACT, giving visible credit to contributors and making citations easier for researchers who use data from ACT.

Day 4

Track: Data Visualization & Image Processing
Talk Title: Scaling up Python for Geo with Distributed Computing
Speaker: Brendan Collins

makepath’s Brendan Collins talked about all things scaling for Python-based geospatial analysis.

Scaling Geo for Distributed Computing

With Dask, users can scale up geoprocessing from one machine to many machines. Libraries like Datashader work with Dask arrays.
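The following is a minimal sketch of the chunked, lazy computation model that makes this possible. The synthetic raster and the chunk size are assumptions made for illustration. It runs on a single machine here, and the same array code can be pointed at a cluster through Dask's distributed scheduler.

```python
import dask.array as da

# Synthetic stand-in for a large raster, split into 500 x 500 chunks
raster = da.random.random((2000, 2000), chunks=(500, 500))

# These operations only build a lazy task graph; nothing is computed yet
normalized = (raster - raster.mean()) / raster.std()

# compute() executes the graph, processing chunks in parallel
result = normalized.compute()

print(raster.numblocks)
print(result.shape)
```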

Numba is useful for speeding up algorithms, which can then be run across a Dask cluster. With Numba, users also benefit from the amazing work being done with GPUs.

SpatioTemporal Asset Catalogs (STAC) come in handy and make it easy to access and discover data while also helping to speed up algorithms.

Brendan highlighted a real-world example of scaling algorithms using the open source GIS library Xarray-Spatial, connected to Microsoft’s Planetary Computer.

The Planetary Computer provides a rich set of satellite imagery and elevation datasets, such as Landsat, Sentinel and SRTM. It combines these datasets with a JupyterLab environment to allow for fast and easy connections to the data with open source tools.

Planetary Computer Data Catalog

For instance, you can use Xarray-Spatial to connect to the Planetary Computer via STAC to query SRTM elevation data on the Grand Canyon.

There are also tutorials for the Planetary Computer that allow users to access existing datasets to use as examples to implement their own analyses.

SciPy 2021 Schedule and Conference Recordings

Want to learn more about what was presented at SciPy 2021? Check out the full SciPy 2021 schedule and watch recordings from the conference here.

Have any questions about how to get involved in the SciPy ecosystem and learn about key tools, projects and people? Visit SciPy.org or let’s connect at contact@makepath.com.