Hexbin Plots for Geospatial Data

tessellation, resolution and you

This is from John Nelson’s Hexperiment. He uses some wicked cool tools to draw hexagonal filters over NASA photos of the United States, further breaks those hexagons down into tessellating diamonds, and makes the whole thing fairly interactive with Chipotle location density data.

Last week I spent an entire article rambling about the intrinsic power of hexagons: how bees and our brain use the same structure to build maximally efficient grid networks.

While I work on other technically-heavy hexagonal grids, I wanted to share a simple and enjoyable plotting framework. Infuse your graphs with the apiary power of hexbins:

This is a visualization I made back in May of 2013’s yearly-averaged sulfur dioxide readings. When you’re working with latitude & longitude or other coordinate systems, there’s no visualization more powerful than the hexgrid.

Today we’ll be looking further into two options that are somewhat easier to get into than ArcGIS: Matplotlib’s hexbin() and Basemap.

Hexplotlib

I always figured the ‘mat’ in matplotlib stands for ‘matrix’ following MATLAB’s naming convention. But do hexagons translate neatly into matrices?
That’s an interesting question that’ll surely have me writing another article.

For now, check out the source code for their hexplotting example:

Image for post
Image for post

Any continuous data can technically be hexgraphed, in the same way that any numeric dataset can fit into a simple point graph or kernel density estimate plot.

We start by creating some X and Y data centered around a standard distribution with np.random.standard_normal(). Notice the use of max & min values to set a valid axis.

The standard matploglib fig, axs = plt.subplots() is what creates the figure, but to populate it we call ax.hexbin()which comes with a host of interesting parameters to explore.

Geographic Pandemic

NASA’s website is currently down (?) so rather than wait to re-download the GEODISC dataset, I browsed Kaggle for a more immediately relevant source of geospatial data. Today we’ll work with the John Hopkins COVID dataset that details the all of their data on the pandemic since 2019. It’s most recent update was October 3rd 2020, so the data should be quite interesting.

I loaded the 44MB CSV with pandas and drew a quick hexplot.

I laughed — not just exhaled air out of my nose, laughed — at the result of this first basic graph. A flattened beehive or some unfortunately asymmetric Rorschach, perhaps.

What we need to do is add nuance, namely by editing gridsize=(more,hexagons) and refining the color map & bin distribution scale.

This is more tolerable. Specifying a logarithmic mapping scale with bins='log' allows us to make better use of our ‘plasma’ colormap, which we detail by including plt.colorbar(hexbin).

But it’s still missing something important — the world map we’re trying to measure. Data is useless without context.

Basemapping Hexes

Matplotlib’s Basemap library is designed around the needs of various earth scientists, and aids immensely in fields such as weather forecasting and oceanography.

We can use it to easily draw 2D maps of the earth over our coordinate data. It invokes PROJ & Matplotlib to handle a huge amount of figure fitting and drawing behind the scenes with just a few lines of code.

Borders & coastlines make this all much easier to read. We’ve gone from hex-points floating in space to an annotated map.

But of course, when it comes to aesthetics we can always do better. Let’s invoke Basemap.shadedrelief():

Basemap draws us a beautifully blended terrain-shaded overlay to make our hexmap feel more like home.

My favorite so far is calling Basemap.bluemarble():

This is quite close, I daresay, to being useful to look at. The hexagons are small enough to look like dots, but we can easily control that resolution at a very granular level with gridsize=(x,y).

Every pandemic map begins at some point to resemble a population density graph. However, we can start to see some of the nuances of COVID-19’s overall impact throughout the world. As brighter colors indicate more cases, darker indicates fewer and none means no cases, we can get a clearer picture of how the world ended up over the past year.

Of course, this begets a host of other questions — are regions with no colored hexes free of infection? Only if they’re free of people, I’d reckon — more likely that many areas lack sufficient testing infrastructure. Hopkins can’t be reasonably expected to accurately aggregate the distributed (and often siloed) data of various countries.

Here in the US, we can see a visual representation of the spikes in coastal states such as California, Florida & NJ. A more detailed hexmap of the US may be of analytical use, considering the country’s troubled history with lockdown procedures.

In any case, there’s plenty of mapping resources to explore in the Basemap examples library, and you can start plotting your data with only the few lines printed above. I wish you all well in your own hexagonal journeys.

Written by

data scientist, machine learning engineer. passionate about ecology, biotech and AI. https://www.linkedin.com/in/mark-s-cleverley/

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store