Creating & Importing Data for ArcGIS
geographic formatting & spatial dataframes
Last time we installed ArcGIS with minimal fuss and began exploring its mapmaking tools, including raster pixel tilings, and searched for more pre-uploaded GIS items online.
However, there’s a fairly fundamental rule of data science:
The data you want usually isn’t in the place or shape you want it to be.
This is one reason why I believe data science careers are fairly safe from AI replacement. Machine learning is great at re-treading previously seen pathways, but the first 80% of the data process — ETL, including cleaning and migrations — is variable enough to require human input.
It could also be that the people researching AI algorithms aren’t interested in making themselves obsolete, but I doubt it; the drive to automate and simplify your own workflow is generally too high for such far-flung fears.
Sourcing Proper Data
Let’s explore ArcGIS’ native data creation & import settings before running some more visualizations.
For any meaningful analysis you’d like to do, the data probably isn’t already up on the ArcGIS cloud and neatly formatted for feature layering, though they do have some nice searchable layers already out there.
If you’re fortunate you’ve just grabbed a pre-cleaned CSV from some government-hosted API, or perhaps you had to painstakingly query a SQL server that hasn’t been updated since 1980.
CSV-compatible formats are by far the most common, so we'll begin working under that assumption. Even graph networks can be represented as JSON, which is the format we'll eventually upload.
Go and find some data of interest to you. For example, the Johns Hopkins University Center for Systems Science and Engineering maintains a beautifully ordered repository of domestic and international COVID-19 data:
The repository also aggregates a huge list of government and private sources; fine data to work with. I pulled down yesterday's US daily report in CSV format.
Double Check your Data
This is all made quite simple with Jupyter notebooks; nothing more than

import pandas as pd

df = pd.read_csv('../data/covid_data.csv')

and you've got a brilliant dataframe. File paths are important as well, if only to keep your project neatly structured with subdirectories. Using ../ to step up out of the working directory is quite like keeping your clothes neatly folded in drawers rather than strewn haphazardly about the room.
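As a concrete sketch of that layout (the notebooks/ and data/ directory names are my own assumption, not anything ArcGIS requires), pathlib keeps the relative path tidy and portable:

```python
from pathlib import Path

# hypothetical layout: the notebook lives in notebooks/, the CSV in a
# sibling data/ directory, so we step up one level and back down
data_path = Path('..') / 'data' / 'covid_data.csv'

print(data_path.as_posix())  # ../data/covid_data.csv
```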

The data is quite well ordered; we’ve got 58 rows, one for each US state/region.
The actual case numbers sit in their own columns and are cumulative totals ("numbers so far") rather than daily increments ("new numbers today").
If we want to compare them over time later, we could aggregate each daily CSV in the repository and diff it against the previous day. JHU also publishes a time-series CSV that appears to have taken care of this already.
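If you did want to roll your own day-over-day comparison, a minimal sketch looks like this; the numbers are invented toy data, and the column names follow the JHU daily-report schema (Province_State, Confirmed):

```python
import pandas as pd

# two consecutive daily reports, stubbed with toy numbers; real ones
# would each come from pd.read_csv on that day's file
today = pd.DataFrame({'Province_State': ['Ohio', 'Utah'],
                      'Confirmed': [110, 55]})
yesterday = pd.DataFrame({'Province_State': ['Ohio', 'Utah'],
                          'Confirmed': [100, 50]})

# align the two days by state, then diff the cumulative counts
merged = today.merge(yesterday, on='Province_State',
                     suffixes=('_today', '_yesterday'))
merged['New_Confirmed'] = (merged['Confirmed_today']
                           - merged['Confirmed_yesterday'])

print(merged[['Province_State', 'New_Confirmed']])
```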
Getting a general statistical overview with df.describe()
is also quite helpful:

The mean death count of 5,093 across 58 regions lines up well with the roughly 296,000 deaths reported so far. The mean Case_Fatality_Ratio
of 1.72 looks alarming at first glance, but per the JHU documentation it's a percentage (deaths divided by confirmed cases, times 100), so it indicates about 1.72 deaths per 100 recorded cases.
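Both summary numbers are easy to sanity-check by hand:

```python
# mean deaths per region times region count should roughly reproduce
# the national total reported at the time
mean_deaths, regions = 5093, 58
approx_total = mean_deaths * regions
print(approx_total)  # 295394, in line with ~296,000

# treating the 1.72 Case_Fatality_Ratio as a percentage, invert it to
# get the implied number of recorded cases per death
cases_per_death = 100 / 1.72
print(round(cases_per_death))  # 58
```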
Fitting Into the Pipe
Now we’ve got workable data and know (generally) what it’s about, we can put it into ArcGIS.
Their Spatially Enabled Dataframe extends Pandas’ usual Dataframe, so there isn’t too much complexity involved in working between the two.
We start by making a GIS object to interface with the various modules:
from arcgis.gis import GIS
# instantiate new GIS object
gis = GIS("https://www.arcgis.com", 'username', 'password')
ArcGIS uses Feature Collections as cloud-hostable data structures. We can convert our covid dataframe by passing it through the gis.content.import_data()
method:
# import dataframe as feature collection
covid_fc = gis.content.import_data(df)
Now we need to serialize it to JSON (since we're working with an API, after all):

import json

# create feature_collection_properties dict
covid_fc_dict = dict(covid_fc.properties)

# convert to JSON; careful with the double nesting format
covid_json = json.dumps({'featureCollection': {'layers': [covid_fc_dict]}})
Finally we assign the covid_json to some Item properties to make it more easily searchable, then create & add the final Item object to our GIS.
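A minimal sketch of that last step, assuming the authenticated gis object and the covid_json string from earlier; the title and tags are hypothetical placeholders, and the network call itself is shown commented out:

```python
import json

# stub standing in for the covid_json built in the previous step
covid_json = json.dumps({'featureCollection': {'layers': []}})

# item properties make the upload searchable on your content page;
# 'text' carries the JSON payload itself
item_properties = {
    'title': 'COVID-19 US Daily Report',  # hypothetical title
    'tags': 'covid, jhu, daily-report',   # hypothetical tags
    'type': 'Feature Collection',
    'text': covid_json,
}

# with a live connection, this publishes the searchable Item:
# covid_item = gis.content.add(item_properties)
```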

The above data is now a searchable GIS Item stored in your ArcGIS Online content. Next time we'll access the piped data to create a mapped feature layer.