Visualizing Sulfur Dioxide Pollution: a 2020 Story

graphing “the earth is healing”

Source: Reuters

Sulfur dioxide is an often-overlooked but rather harmful air pollutant. It's not quite a greenhouse gas (injecting it into the atmosphere may even have a cooling effect), but it's a leading cause of acid rain and forms nasty compounds like sulfuric and sulfurous acids. It can also combine with other compounds to form fine particulate matter, which is extremely deleterious to lung health.
Volcanic activity can spew megatons into the sky, but troubling amounts of atmospheric SO₂ are human-made, from industrial processing and power plants: most organic matter contains sulfur, which is released as sulfur dioxide when burned.

As industries across the world came to a grinding halt these past few months, nature seems to be making a comeback. Visibility increases, animals return. I thought it would be neat to have a look at SO₂ data around the world, so last week I downloaded NASA's "OMPS/NPP PCA SO2 Total Column 1-Orbit L2 Swath 50x50km V1" data, which has been providing global readings since 2012.

NASA's Suomi-NPP satellite orbits Earth about 14 times a day, and provides a lot of data.

39,000 .h5 files became 86 GB of data after a 17-hour upload to a MySQL server: 563 million observations with 10 columns. Trying to wrangle it gave me a distinct sense of hardware deficiency, so next time I'll be investigating Spark and remote big-data processing.
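Here is a minimal sketch of the loading step. It assumes each .h5 file has already been opened (for example with h5py) and that its latitude, longitude and SO2 arrays have been read into 2-D NumPy arrays shaped (scanlines, cross-track pixels). The dataset paths inside the OMPS files are not shown. The four-column schema is a simplification of the real 10-column table, and the so2_obs table name and fill_value parameter are hypothetical. Python's built-in sqlite3 stands in for MySQL so the example is self-contained. A MySQL driver's cursor offers a similar executemany, though typically with %s placeholders instead of ?.

```python
import sqlite3

import numpy as np


def swath_to_rows(lat, lon, so2, orbit_id, fill_value=None):
    """Flatten 2-D swath arrays into (orbit_id, lat, lon, so2) tuples,
    dropping non-finite readings and (optionally) a sentinel fill value."""
    lat = np.asarray(lat, dtype=float).ravel()
    lon = np.asarray(lon, dtype=float).ravel()
    so2 = np.asarray(so2, dtype=float).ravel()
    keep = np.isfinite(lat) & np.isfinite(lon) & np.isfinite(so2)
    if fill_value is not None:  # hypothetical sentinel; check the product docs
        keep &= so2 != fill_value
    return [(orbit_id, float(a), float(b), float(c))
            for a, b, c in zip(lat[keep], lon[keep], so2[keep])]


def load_orbit(conn, rows, batch_size=10_000):
    """Insert rows in fixed-size batches inside a single transaction."""
    with conn:  # sqlite3: commits on success, rolls back on error
        for start in range(0, len(rows), batch_size):
            conn.executemany(
                "INSERT INTO so2_obs (orbit_id, lat, lon, so2) VALUES (?, ?, ?, ?)",
                rows[start:start + batch_size],
            )


# Usage with a tiny synthetic 2x2 swath (not real OMPS data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE so2_obs (orbit_id INTEGER, lat REAL, lon REAL, so2 REAL)")
lat = np.array([[10.0, 10.5], [11.0, np.nan]])
lon = np.array([[20.0, 20.5], [21.0, 21.5]])
so2 = np.array([[0.3, -1.0e30], [0.8, 0.1]])
rows = swath_to_rows(lat, lon, so2, orbit_id=1, fill_value=-1.0e30)
load_orbit(conn, rows)
print(conn.execute("SELECT COUNT(*) FROM so2_obs").fetchone()[0])  # 2
```

Batching keeps each executemany call bounded in memory. Wrapping the batches in one transaction avoids paying for a commit on every row, which matters at hundreds of millions of rows.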

I pulled a sample of the data for exploratory visualization. Since each observation is a reading of particle concentration at a latitude & longitude, I figured hexbinning would be a great place to start. It’s a bit more sensitive than choropleths due to weighting and boundaries, and I’m a fan of tessellation.
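To make the "weighting" part concrete, here is a minimal pure-Python sketch of hexagonal binning with averaging. It assigns each point to the pointy-top hexagon that contains it, using axial/cube hex coordinates, and averages the SO2 values per cell. This is not matplotlib's internal implementation, although matplotlib's hexbin likewise assigns each point to its nearest hexagon centre, and with its C and reduce_C_function parameters it averages values the same way. The function names and the size parameter (hexagon circumradius, in the same units as x and y) are illustrative.

```python
import math
from collections import defaultdict


def hex_index(x, y, size):
    """Axial (q, r) index of the pointy-top hexagon (circumradius `size`)
    containing point (x, y), via cube-coordinate rounding."""
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    cx, cz = q, r
    cy = -cx - cz  # cube coordinates always sum to zero
    rx, ry, rz = round(cx), round(cy), round(cz)
    dx, dy, dz = abs(rx - cx), abs(ry - cy), abs(rz - cz)
    # Recompute the component with the largest rounding error so the sum stays 0.
    if dx > dy and dx > dz:
        rx = -ry - rz
    elif dy > dz:
        ry = -rx - rz
    else:
        rz = -rx - ry
    return rx, rz


def hexbin_mean(xs, ys, values, size):
    """Mean of `values` per hexagon: {(q, r): (centre_x, centre_y, mean, count)}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, y, v in zip(xs, ys, values):
        key = hex_index(x, y, size)
        sums[key] += v
        counts[key] += 1
    out = {}
    for (q, r), n in counts.items():
        centre_x = size * math.sqrt(3) * (q + r / 2)
        centre_y = size * 1.5 * r
        out[(q, r)] = (centre_x, centre_y, sums[(q, r)] / n, n)
    return out


bins = hexbin_mean([0.0, 0.1, math.sqrt(3)], [0.0, -0.1, 0.0], [1.0, 3.0, 5.0], size=1.0)
print(sorted(bins))      # [(0, 0), (1, 0)]
print(bins[(0, 0)][2:])  # (2.0, 2)
```

Because every cell reports a mean together with a count, sparsely sampled cells can be told apart from genuinely clean ones. A plain point-count choropleth would hide that distinction.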
Here’s a Mercator-style 40,000-row hexbin:

An interesting illustration of sinusoidal satellite patterns, certainly. I remembered that each orbit recorded ~11k observations, so 40,000 rows covers only three or four orbits; I'd need a lot more data to paint a full picture. A 10-million-row sample is better.
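The post doesn't say how its samples were drawn. One way to pull a uniform random sample of a fixed size (say, 10 million rows) from a 563-million-row table without holding the whole table in memory is reservoir sampling (Algorithm R). The sketch below is that swapped-in technique, not the author's method. It works on any iterable, such as a database cursor that streams rows, and the function name is illustrative.

```python
import random


def reservoir_sample(rows, k, seed=None):
    """Uniform random sample of k items from an iterable of unknown length
    (Algorithm R). Uses O(k) memory and a single pass over the data."""
    rng = random.Random(seed)
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            sample.append(row)       # fill the reservoir first
        else:
            j = rng.randint(0, i)    # inclusive on both ends
            if j < k:                # keep row i with probability k / (i + 1)
                sample[j] = row
    return sample


sample = reservoir_sample(range(1_000_000), k=1000, seed=42)
print(len(sample))  # 1000
```

If the source has fewer than k rows, the function simply returns all of them.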

But not much better. You can sort of make out continents if you try hard. For a global picture, we’re going to need some better visualization tools than pure matplotlib.

I looked into Basemap for a simple way to combine geographical features with my hexbin plot. It's fairly easy to get started with a Mercator-style projection, and Basemap has a native Basemap.hexbin() method.
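When you call a Basemap instance on longitude/latitude arrays, it converts them into projected map coordinates for you. The sketch below shows the spherical Mercator formula behind that kind of projection, x = Rλ and y = R·ln(tan(π/4 + φ/2)), with latitude clipped because y diverges at the poles. It is an illustration of the math, not Basemap's code. The 85° clip and the function names are assumptions.

```python
import math

EARTH_RADIUS_M = 6_378_137.0  # WGS84 equatorial radius, used as the sphere radius


def mercator(lon_deg, lat_deg, max_lat=85.0):
    """Project lon/lat (degrees) to spherical Mercator x/y (metres)."""
    lat = max(-max_lat, min(max_lat, lat_deg))  # clip: y -> infinity at the poles
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


def scale_factor(lat_deg):
    """Local linear stretch of the Mercator projection at a given latitude."""
    return 1.0 / math.cos(math.radians(lat_deg))


x, y = mercator(-74.0, 40.7)
print(round(scale_factor(60.0), 6))  # 2.0
```

This matters for hexbinning. At 60° latitude the linear scale factor is 2, so a hexagon of fixed size in projected coordinates covers only about a quarter of the ground area it would cover at the equator. Cells near the poles therefore aggregate far fewer observations.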

Now we’re getting somewhere. I’ve already filtered out volcanic activity and the South Atlantic Anomaly (though some noise remains). It looks like 2020 saw some patchy output across most continents, though it’s still a little hard to pinpoint.
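The post doesn't say how volcanic plumes and the South Atlantic Anomaly were filtered out. One simple heuristic would drop readings inside a fixed latitude/longitude box around the SAA, plus readings above an SO2 threshold, since large volcanic plumes produce very high column values. The sketch assumes column amounts in Dobson units (DU). The box corners, the 10 DU threshold and all names are hypothetical placeholders, not values from the post or from NASA, and would need tuning against the data.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float

    def contains(self, lat, lon):
        return self.lat_min <= lat <= self.lat_max and self.lon_min <= lon <= self.lon_max


# HYPOTHETICAL values for illustration only; not an official SAA boundary.
SAA_BOX = BBox(lat_min=-50.0, lat_max=0.0, lon_min=-90.0, lon_max=40.0)
VOLCANIC_THRESHOLD_DU = 10.0


def filter_observations(rows, box=SAA_BOX, threshold=VOLCANIC_THRESHOLD_DU):
    """Keep (lat, lon, so2) rows outside `box` with so2 below `threshold`."""
    return [r for r in rows if not box.contains(r[0], r[1]) and r[2] < threshold]


rows = [(10.0, 20.0, 0.5), (-25.0, -40.0, 0.4), (14.0, 40.0, 50.0)]
print(filter_observations(rows))  # [(10.0, 20.0, 0.5)]
```

A hard threshold also discards genuinely heavy industrial hotspots, so some leftover noise, or some lost signal, is expected with a rule this blunt.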

A decent, if slow, approach would be to explore specific coordinates over time. Some investigative line plots, and probably an interactive map hosted on Heroku, would be good for this, but the number crunching will likely require remote computing on larger servers.
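Exploring a coordinate over time comes down to selecting the observations near a point and aggregating them by period. The minimal sketch below assumes each observation carries a timestamp. It keeps readings within a great-circle (haversine) radius of the target and averages them per calendar month. The 50 km default loosely echoes the 50x50km footprint in the product name but is an arbitrary choice, and the function names and sample coordinates are hypothetical.

```python
import math
from collections import defaultdict
from datetime import datetime


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two lat/lon points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * radius_km * math.asin(min(1.0, math.sqrt(a)))  # clamp float drift


def monthly_series(obs, lat0, lon0, radius_km=50.0):
    """obs: iterable of (timestamp, lat, lon, so2).
    Returns {(year, month): mean so2} for readings within radius_km of (lat0, lon0)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for ts, lat, lon, so2 in obs:
        if haversine_km(lat, lon, lat0, lon0) <= radius_km:
            key = (ts.year, ts.month)
            sums[key] += so2
            counts[key] += 1
    return {k: sums[k] / counts[k] for k in sorted(counts)}


# Synthetic observations, not real readings.
obs = [
    (datetime(2020, 3, 1), 39.90, 116.40, 1.0),
    (datetime(2020, 3, 15), 39.95, 116.45, 3.0),
    (datetime(2020, 4, 2), 39.90, 116.40, 0.5),
    (datetime(2020, 4, 2), 31.20, 121.50, 9.0),  # roughly 1,000 km away: excluded
]
series = monthly_series(obs, 39.9, 116.4)
print(series)  # {(2020, 3): 2.0, (2020, 4): 0.5}
```

The resulting dictionary maps directly onto a line plot of month against mean SO2. Against the full 563-million-row table, the radius test would be better pushed into a SQL bounding-box query first.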

You can follow this project on my GitHub.

Written by

data scientist, machine learning engineer. passionate about ecology, biotech and AI.
