Network Graphs: A Practical Introduction

also known as graph networks

Here’s a graph.

Image for post
Image for post

And here’s a graph.

Image for post
Image for post

They’re different things. The latter is also called a network (graph), which makes more sense. You might see these floating around online, like strange webs made by alien spiders. I think they’re neat, and apparently so do a bunch of mathematicians. Let’s talk about why.

Components

Image for post
Image for post

Graphs are constructed of nodes, data points that represent or store some information, and edges, connections or relationships between nodes. Sounds simple. It’s not.

You can do a lot with just those two concepts. The left image illustrates some standard patterns within graphs that are used in many algorithms.

Graph Theory and Data Science

Modern data science ends up using graphs a lot. Think of social networks: You can visualize Facebook or LinkedIn as nodes and edges, users and connections.

Image for post
Image for post

Once you’ve got this, you can start to answer some interesting questions:

“Does a path exist from user A to user B?”

“Can we identify any ‘communities’ of highly-interconnected users, and can we identify any interesting characteristics about them?”

These questions, it turns out, can be monetized extremely well. Facebook, for example, can identify your interest based on who you interact with: if you post in a group where people post about cats, you’ll see ads for cat products. If you have a tight-knit group of friends who message each other a lot, and one of them takes an interest in skateboarding (recorded by cookies + FB outlinks), Zuckerberg will send the entire group some ads about skateboards, on the logic that people who are connected to each other influence each other.

At their core, graphs are a tool to derive structure from randomness.

Enter the Matrix

Image for post
Image for post

Everything interesting in graph theory is possible because we can turn graphs into matrices. An adjacency matrix represents a binary 1/0 to store which nodes are connected to each other. A degree matrix stores the number of connections on each node.

The laplacian is more complex, and some linear algebra comes into play: we need the eigenvectors and eigenvalues to derive it. Spectral Graph Theory, a book by Fan Chung, is a great resource to read more into this.

So let’s talk about what this can be used for.

Graphs are the foundation of PageRank, google’s own eigen-solver algorithm. Using a network of webpages, with (directed) edges representing links from page to page, it decides in which order to display pages based on their importance.

Image for post
Image for post

PR(pj) is your PageRank score. d is a dampening factor, between 0 and 1, that determines arrival by link (d% of the time) or arrival randomly (1-d% of the time). L(pj) is the number of incoming edges. Through some elegant mathematics, Google is able to quickly and effectively decide the ‘importance’ of a page to their average.

Image for post
Image for post
Image for post
Image for post

Graph theory is also used in manifold learning, an unsupervised process that tries to ascertain the geometry of the space the data lies in. In some ML/AI problems, data isn’t Euclidean, but is in a curved subspace “manifold”.

Manifold recovery involves building a graph, connecting sample points based on distance measures, then using random walks or eigen decomposition with LLE or ISOMAP.

Imagine a series of images of a slowly rotating object, like a tea cup. The data appears to be high dimensional, since it is the number of pixels in the image, but the intrinsic dimension is just one! LLE or Laplacian eigenmaps can discover the intrinsic dimension of the data by building a graph.

Image for post
Image for post

Here’s IOTA’s Tangle blockchain network. It’s directed and acyclic; there aren’t any clear, distinguishable cycles, and they all head downwards. Each node is a blockchain transaction. When a new transaction is added, it chooses two unconfirmed transactions, confirms them and adds itself, creating two new edges and nodes. A confirmed transaction can’t be confirmed by new incoming transactions, so this creates downwards directionality.

The Lightning blockchain Network (first image, above) is visualized as a series of nodes and payment channels. Payment channels are undirected, so users can send transactions back and forth endlessly once they’e been established. They boast of millisecond transaction times, because connections are user-to-user (a ledger of edges) rather than cloud-ledger verified each time. They liken it to drawing up contracts to do business with other people, which saves time by not going to court for enforcement and arbitration each time.

In summary, graph theory is a huge branch of mathematics with huge applications in computing and data science. They’re a unique way of mapping real-world structure into workable formulas, and can help answer some fascinating questions.

Written by

data scientist, machine learning engineer. passionate about ecology, biotech and AI. https://www.linkedin.com/in/mark-s-cleverley/

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store