nature is cruel; statistics is worse

[image: altmodern / Getty]

The online dating giant OkCupid runs a blog here on Medium. Every now and then they put out something really fantastic, like this archived article from 2014 that highlighted some fascinating discrepancies on the battlefield of love.

It turned some heads and generated some delightfully disruptive graphs, and has since been deleted — probably because peering at the truth underneath oceans of data tends to be somewhat unpleasant.

You could, of course, argue that OkCupid’s data is biased or that this interpretation is skewed. But the dating app Hinge put out an analytics article in 2017 that reached quite similar conclusions.

One interesting quote compares ‘likes’ received by users to wealth distribution in…


black holes and temporal dimensions

[image: The Persistence of Memory, Dalí, 1931]

As we move through the world, walking or flying in physical space, we also move through time. Time is well argued to be the 4th dimension, for good reason: it takes time to do anything.

But then it gets a bit more tricky — we’re probably in something resembling Minkowski Spacetime, which at its simplest means that time is bound together with space rather than being a separate, purely impartial ‘observer’.

This may clue us in as to why time dilation occurs at the edge of a black hole, among other things.
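For reference, the standard general-relativity result for a clock held static at radius r outside a non-rotating mass M is

$$\frac{d\tau}{dt} = \sqrt{1 - \frac{2GM}{rc^{2}}}$$

which drops toward zero as r approaches the Schwarzschild radius r_s = 2GM/c², i.e. the event horizon.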

[image: visualizing stars’ impact on the ‘fabric’ of spacetime, compared to the ‘drop-off’ of the event horizon]

density = mass / volume: the singularity has zero meaningful volume, so any mass at all (it’s still genuinely contested whether black holes truly have “mass” in the conventional sense) yields functionally infinite density. …
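In symbols, that argument is just a limit (ρ for density, m for mass, V for volume):

$$\rho = \frac{m}{V}, \qquad \lim_{V \to 0^{+}} \frac{m}{V} = \infty \quad \text{for any fixed } m > 0$$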


supraspatial decisionmaking


A core data science task is classification: grouping data points into various groups based on certain shared qualities.

In a sense, it’s an exercise as old as life itself: as soon as the first protozoan developed sensory organs, it (accidentally) started to act differently based on various sensory stimuli.

On a higher biological level, it’s a monkey looking at an object hanging from a branch and deciding “food” or “not food”.
On a machine level, it’s your ML model combing through credit transactions and deciding “fraud” or “not fraud”.

[image: “Astrology and Data Science”]

You’ve probably heard of clustering as a technique for classification; it’s easy enough to visualize on a two-dimensional graph, or even with a Z axis added in. …
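As a minimal sketch of that idea (synthetic 2D points and scikit-learn’s k-means; none of this is from the original article):

```python
# Toy clustering: group synthetic 2D points into three clusters with k-means.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)  # fake 2D points
labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)
print(labels[:10])  # cluster assignment for the first ten points
```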


relationships are complicated


The connections between data can often tell us more than the data itself.

Nothing in this world exists in a vacuum — everything is a part of something else, every piece of information is interlinked with other data. Ignore context at your own risk.

But since graphs are everywhere — and there’s no shortage of ways to record them as simple data structures — how do we go about analyzing these graphs?

Well, you could start by feeding them into a neural network, experimenting until something goes horribly wrong, and then trying again with more graphs. …
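Before any neural networks get involved, here’s a toy illustration of recording a graph as a simple data structure (networkx, with made-up node names):

```python
# Build a tiny graph, then export the adjacency matrix a model could consume.
import networkx as nx

G = nx.Graph()
G.add_edges_from([("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")])  # toy edges
A = nx.to_numpy_array(G)  # one row/column per node; 1 marks an edge
print(A)
```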


unpacking Viviane Clay’s findings in reinforcement learning


I’ve written a few articles on Hierarchical Temporal Memory neural networks, which encode data into Sparse Distributed Representations to make noise-resistant predictions that draw on multiple past time-steps of the input feed.
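To make “encoding data into Sparse Distributed Representations” slightly more concrete, here is a deliberately simplified sketch of a scalar encoder. It is not Numenta’s actual encoder; the function and its parameters are invented for illustration. The idea: a wide binary array where a small block of active bits slides with the input value.

```python
# Illustrative only: encode a scalar as a sparse binary array by activating
# a small contiguous block of bits whose position tracks the value.
import numpy as np

def simple_scalar_encoder(value, min_val=0.0, max_val=100.0, n_bits=128, n_active=8):
    sdr = np.zeros(n_bits, dtype=np.uint8)
    frac = (value - min_val) / (max_val - min_val)   # normalize to [0, 1]
    start = int(frac * (n_bits - n_active))          # slide the active block
    sdr[start:start + n_active] = 1                  # only ~6% of bits are on
    return sdr

print(simple_scalar_encoder(42.0).sum())  # 8 active bits out of 128
```

Nearby values produce overlapping blocks of active bits, which is the property that gives SDR-style encodings their noise resistance.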

The key pioneer of HTM technologies, Numenta, holds weekly research meetings to discuss interesting new theories and advancements in machine learning and neuroscience.
Recently they invited Viviane Clay, a PhD student from the Institute of Cognitive Science in Osnabrück, to discuss her fascinating experiments in embodied reinforcement learning.

Her experimental reasoning can be summarized as follows:

  1. Observing nature can lead to clues on how to efficiently perform a task
  2. We can observe how humans learn to gain insight into artificial…

tessellation, resolution and you

[image: John Nelson’s Hexperiment]

This is from John Nelson’s Hexperiment. He uses some wicked cool tools to draw hexagonal filters over NASA photos of the United States, further breaks those hexagons down into tessellating diamonds, and makes the whole thing fairly interactive with Chipotle location density data.

Last week I spent an entire article rambling about the intrinsic power of hexagons: how bees and our brains use the same structure to build maximally efficient grid networks.

While I work on other technically-heavy hexagonal grids, I wanted to share a simple and enjoyable plotting framework. Infuse your graphs with the apiary power of hexbins:

[image: Greenhouse gas tracking using NASA satellite data]

This is a visualization I made back in May, showing 2013’s yearly-averaged sulfur dioxide readings. When you’re working with latitude & longitude or other coordinate systems, there’s no visualization more powerful than the hexgrid. …
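If you want to try hexbins yourself, here’s a minimal matplotlib sketch on synthetic coordinates (the points below are made up, not the NASA sulfur dioxide data):

```python
# Bin scattered coordinates into a hexagonal grid, colored by point density.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
lon = rng.normal(loc=-95, scale=10, size=5000)   # fake longitudes
lat = rng.normal(loc=38, scale=5, size=5000)     # fake latitudes

plt.hexbin(lon, lat, gridsize=40, cmap="viridis")
plt.colorbar(label="points per hexagon")
plt.xlabel("longitude")
plt.ylabel("latitude")
plt.show()
```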


bees figured this out ages ago


If you’re like me, you’ve always had a strange feeling that bees (much like dolphins) know more than they’re letting on.

They know architecture, and communicate navigational information through dance. Their elaborate caste-based hierarchy and collective survival impetus are striking. There is something mathematical about them, and that is deeply unsettling.

I’ve recently been looking into encoding data to binary representations for use in neuromorphic AI systems. I’ve always been fascinated with mathematical and artistic representations of space, distance and volume.

Searching for the intersection of these ideas has led me to the same destination as bees: the hexagon.

Let me make this quite clear: we are only starting to learn the extent to which honey-gathering insects know more than us. Bees build their hives with tessellating hexagons. What do they know? …


time is technically a graph


It is difficult to overstate exactly how versatile graphs are. Almost everything that exists can be represented as a graph. I’m not just talking the obvious cases like molecules or social networks, either:


3D modeling is essentially a graph structure, with the vertices of polygons comprising complex objects.

Fluid dynamics? Absolutely.

Architecture? You bet.

I’m quite certain that even sheet music can be represented by a graph, once someone figures out how to turn a profit from it.
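To take the section title literally, here’s a toy sketch (networkx; the labels are made up) of a run of time steps as a directed graph, each moment pointing to the next:

```python
# Time as a directed path graph: t0 -> t1 -> ... -> t5.
import networkx as nx

timeline = nx.DiGraph()
timeline.add_edges_from([(t, t + 1) for t in range(5)])
print(list(nx.topological_sort(timeline)))  # recovers the temporal order
```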

[image: this is about as dense as deriving linear algebra]

These ubiquitous and powerful structures are quite difficult to capture, however, when it comes to machine learning. I pondered my frustrations on binary encoding these structures a while back, and concluded that our brains understand graphs by constructing a neuron-synapse graph in our own grey matter. …


cats to binary arrays

[image: CaptainMeo]

We’re really good at looking at things. Image recognition & object classification is an extremely complex task, but your brain learned what certain objects are earlier in life and “generalized”.

So you can see a cat you haven’t seen before at a strange angle, yet still recognize the furry beast as such — you have a level of “abstract” understanding of which patterns of photons hitting your retinal cells constitute a feline.

This is wicked tough for a computer to do, by the way.

You’ll need thousands of cat pictures to train a “cat or not-cat” convolutional neural network, and you’d better hope the data is diverse (cats in odd lighting or at strange angles, where they don’t quite look like cats) or the model will be thrown off by new data. …
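For a sense of what that looks like in practice, here’s a minimal “cat or not-cat” convolutional network sketch in Keras (the 64×64 input size and layer widths are arbitrary illustrative choices, not settings from any real experiment):

```python
# A tiny binary-classification CNN: 1 = cat, 0 = not cat.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(64, 64, 3)),        # small RGB images
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # probability of "cat"
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```

Training it well still comes back to the data: thousands of diverse cat images, or the network memorizes the quirks of the few it has seen.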


auto-manage your portfolio with RESTful APIs


Last week I built a simple trading algorithm that buys or sells based on temporal AI-generated Bitcoin price predictions. It delivered 9% average returns in 90 days, which was quite fun to see.

However, it was a dreadfully basic implementation, for several reasons:
It didn’t simulate “noise” such as slippage or various exchange fees that a portfolio will face in the real world.

It also, quite importantly, wasn’t the present real world. I’m happy to boast of great success on historical data, but if the bot hasn’t made any real cash as of yet, many would understandably restrain their enthusiasm.
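To put rough numbers on that slippage-and-fees point, here’s a back-of-the-envelope sketch of a single round-trip trade (every price and rate below is made up):

```python
# How slippage and exchange fees eat into a naive backtest's paper profit.
buy_price, sell_price = 100.00, 109.00   # what the backtest sees
slippage = 0.002                         # assume a 0.2% worse fill each way
fee_rate = 0.001                         # assume a 0.1% fee per transaction

real_buy = buy_price * (1 + slippage)
real_sell = sell_price * (1 - slippage)
fees = (real_buy + real_sell) * fee_rate

paper_return = (sell_price - buy_price) / buy_price
real_return = (real_sell - real_buy - fees) / real_buy
print(f"paper: {paper_return:.2%}, after noise: {real_return:.2%}")
```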

Past behavior is the best predictor of future behavior — right up until it isn’t. …

About

Mark Cleverley

data scientist, machine learning engineer. passionate about ecology, biotech and AI. https://www.linkedin.com/in/mark-s-cleverley/
