Neurological Time Series/Anomaly Detection: Hierarchical Temporal Memory

predicting power consumption with a ‘closer to biology’ neural network

[Image: 3D imaged & colored section of hippocampus (University of Hong Kong)]

I really talked up Hierarchical Temporal Memory a while ago. It’s still rather new and far from the industry standard for deep learning, but its results are hard to argue with. I’m also a big believer in “emulate form to get function”, so I dove right into Numenta’s NuPIC HTM Python library to try and show some results for all my adulation.

Bad news: it's written in Python 2.7. However, the open-source HTM community has put together their own fork, HTM.Core, where they recoded the bindings (around the C++ base) to run in Python 3. I was able to install it on a Mac running Mojave 10.14 via the command-line PyPI option without too much hassle.

There are some differences in syntax and naming conventions (documented on the fork linked above), but it's the same tech as NuPIC's official package, just a bit more granular. HTM.Core feels like PyTorch compared to NuPIC's Keras.

Hitting the Gym

The real strength of HTM lies in pattern recognition, so I explored HTM.Core's hotgym example:
using a gym's power consumption & timestamps, it trains a simple model to predict the next likely power value & detect anomalous activity. It's an elegant way of showing how HTM deals with patterns, and there's a huge range of industry applications for time series & anomaly detection.

I ran into a few syntax & runtime errors and had to recode some parts, so let's go through the interesting bits. The entire code is up on my GitHub if you want to check out the whole Jupyter notebook.

The process goes something like:

  1. Get data from .CSV
  2. Create Encoder
  3. Create Spatial Pooler
  4. Create Temporal Memory
  5. Training loop, predicting on each iteration
  6. Check outputs, make some graphs

Starts with a CSV, ends with graphs. That’s how you know it’s data science.

Binary Encodings

The distinguishing factor of HTM models is that they only work with binary inputs. Specifically, Sparse Distributed Representations (SDRs): bit vectors (generally longer than 2000 places) of 1s and 0s, with only a small fraction of the bits active. These can be visualized as a square for easy comparison.
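For a feel of the format, HTM.Core's SDR class can generate one directly. A minimal sketch (the 2048-bit size and 2% sparsity are just typical choices, not requirements):

from htm.bindings.sdr import SDR

s = SDR(2048)           # a 2048-bit vector, all zeros to start
s.randomize(0.02)       # turn on a random ~2% of the bits
print(s.getSparsity())  # ~0.02
print(s.sparse[:10])    # indices of the first few active bits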

This is possible through any implementation of an Encoder: an object designed to take a data type (int, string, image etc) and convert it into an SDR. A good encoder ensures that “similar” input data creates SDRs that are also similar, by way of having overlapping bits.

Here's an example from Cortical.io: three SDRs, left to right, each representing a different mammal:

Just by looking at the shared activated bits (1s), you can see that the first two are a little closer to each other than either is to the third, but there's still some mammalian bit overlap going on. Cortical's Retina HTM model is wicked cool, but we'll talk about their semantic embedding some other time.

So we've got timestamps and power-consumption values. How do we encode these?

from htm.bindings.sdr import SDR, Metrics
from htm.encoders.date import DateEncoder
from htm.encoders.rdse import RDSE, RDSE_Parameters

dateEncoder = DateEncoder(
    timeOfDay = (30, 1),  # DateTime is a composite variable
    weekend = 21          # how many bits to allocate to each part
)

scalarEncoderParams = RDSE_Parameters()  # encoding a continuous variable
scalarEncoderParams.size = 700           # SDR size
scalarEncoderParams.sparsity = 0.02      # 2% sparsity magic number
scalarEncoderParams.resolution = 0.88
scalarEncoder = RDSE(scalarEncoderParams)  # 'random distributed scalar encoder'

encodingWidth = dateEncoder.size + scalarEncoder.size
enc_info = Metrics([encodingWidth], 999999999)  # performance-metrics storage object

And there are the Encoder objects set up; we'll combine their outputs later in the training loop.
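As a quick sanity check of that "similar inputs overlap" property, here's a minimal sketch using the scalarEncoder we just built (the exact overlap counts will vary):

a = scalarEncoder.encode(30.0)
b = scalarEncoder.encode(30.5)  # a similar consumption value
c = scalarEncoder.encode(75.0)  # a very different one
print(a.getOverlap(b))  # high: similar inputs share many active bits
print(a.getOverlap(c))  # low: distant inputs share few, if any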

A Dip in the Pool

The next step is the Spatial Pooler: the part that takes the encoded input SDR & translates it to a sparse, more ‘balanced’ SDR while maintaining spatial relationships. It’s a little harder to explain without watching Matt’s nifty video, but I’ll give a quick rundown with the image below.

The left side is the Encoder's output SDR (1s marked in blue), and the right is the Spatial Pooler's output. The mouse is hovering over one pool_cell, and circles mark every input_cell connected to that pool_cell.
As you feed SDR data to the pooler, it reinforces the connections marked by those green circles; the cells that 'match' have their synapses strengthened, and the inverse applies to 'misses'.

[Image: Spatial Pooler connection visualization; so it's Battleship, kinda]

Note how it says "Spatial Pooler Columns". Each cell is actually a column of N cells (we'll use 13); you're looking from above at the topmost cell in each column. This'll come into play later with temporal memory.

Initializing the pooler:

from htm.bindings.algorithms import SpatialPooler

sp = SpatialPooler(
    inputDimensions = (encodingWidth,),
    columnDimensions = (spParams["columnCount"],),        # 1638
    potentialPct = spParams["potentialPct"],              # 0.85
    potentialRadius = encodingWidth,
    globalInhibition = True,
    localAreaDensity = spParams["localAreaDensity"],      # 0.04
    synPermInactiveDec = spParams["synPermInactiveDec"],  # 0.006
    synPermActiveInc = spParams["synPermActiveInc"],      # 0.04
    synPermConnected = spParams["synPermConnected"],      # 0.13
    boostStrength = spParams["boostStrength"],            # 3
    wrapAround = True
)
sp_info = Metrics(sp.getColumnDimensions(), 999999999)

A lot of this setup involves well-tested default values that you can pick up from the documentation or examples. There's a lot of room for parameter tweaking via swarming, of course, but that's for another day.
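You can take the pooler for a quick dip before the real training loop. A minimal sketch (the sample timestamp and value are arbitrary; learn=False keeps the pooler from updating):

import datetime

sample = SDR(encodingWidth).concatenate([
    scalarEncoder.encode(30.0),
    dateEncoder.encode(datetime.datetime(2010, 7, 2, 9, 0))
])
activeColumns = SDR(sp.getColumnDimensions())
sp.compute(sample, False, activeColumns)  # learn=False: inference only
print(activeColumns.getSparsity())        # should land near localAreaDensity (0.04)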

Walking Down Memory Lane

Now the fancy part: Temporal Memory, which also runs on SDRs. I explained this in my last article, but again I believe HTM School’s video is invaluable for visualizing and understanding how the algorithm learns.

Remember how each pooler cell was actually a column? Now we’re looking at those columns from the side.

[Image: temporal memory columns viewed from the side (source)]

If you feed the TM the sequence A, B, C, D, it "bursts" various columns by activating all cells in each column. The four letters have distinct patterns (we're seeing just a little piece of SDRs like the ones visualized earlier). This example feeds it X, B, C, Y as well.

Each cell is randomly connected to many others, and if B follows A, the connections between the cells_involved_in_B and the cells_involved_in_A are strengthened. Those synaptic connections are what allow the TM to 'remember' patterns.

The number of cells per column is also important. Note how B_from_A uses different cells in the same columns as B_from_X. If each column had only one cell, there'd be no way of differentiating the two "past contexts".

So the number of cells per column essentially determines how much past context the TM can tell apart, i.e. how far back it can "remember".
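Before wiring up the real thing, here's a toy sketch of that idea (hypothetical sizes: four hand-made SDRs stand in for A, B, C, D, and everything else uses HTM.Core's defaults):

from htm.bindings.sdr import SDR
from htm.bindings.algorithms import TemporalMemory

toy_tm = TemporalMemory(columnDimensions=(64,), cellsPerColumn=4)

pattern = []
for i in range(4):  # four distinct SDRs, 16 active columns each
    s = SDR(64)
    s.sparse = list(range(i * 16, i * 16 + 16))
    pattern.append(s)

for epoch in range(10):
    for s in pattern:
        toy_tm.compute(s, learn=True)
    print(round(toy_tm.anomaly, 2))  # starts at 1.0, falls toward 0 as the sequence is learned

And now the real temporal memory for our gym model: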

from htm.bindings.algorithms import TemporalMemory

tm = TemporalMemory(
    columnDimensions = (spParams["columnCount"],),
    cellsPerColumn = tmParams["cellsPerColumn"],                # 13
    activationThreshold = tmParams["activationThreshold"],      # 17
    initialPermanence = tmParams["initialPerm"],                # 0.21
    connectedPermanence = spParams["synPermConnected"],
    minThreshold = tmParams["minThreshold"],                    # 19
    maxNewSynapseCount = tmParams["newSynapseCount"],           # 32
    permanenceIncrement = tmParams["permanenceInc"],            # 0.1
    permanenceDecrement = tmParams["permanenceDec"],            # 0.1
    predictedSegmentDecrement = 0.0,
    maxSegmentsPerCell = tmParams["maxSegmentsPerCell"],        # 128
    maxSynapsesPerSegment = tmParams["maxSynapsesPerSegment"]   # 64
)
tm_info = Metrics([tm.numberOfCells()], 999999999)

Time to Train

Now that we've got nearly all the parts set up, we'll put them together in a training loop that iterates over each row of data.

Not only is an HTM model unsupervised, it trains & predicts as it goes; there's no need to tinker with batching like conventional neural nets.
Each timestamp & consumption pair is encoded, spatial-pooled, temporally memorized, and used for predictions on the fly, so we'll be able to see its predictions improving as it learns from each SDR.

We make use of the encoder, spatial pooler, and temporal memory objects created earlier:

import datetime
import numpy as np
from htm.bindings.algorithms import Predictor
from htm.algorithms.anomaly_likelihood import AnomalyLikelihood

predictor = Predictor(steps=[1, 5], alpha=0.1)  # continuous-output predictor
predictor_resolution = 1
anomaly_history = AnomalyLikelihood()  # anomaly-likelihood tracker (assumed setup: not shown in the original snippet)

inputs = []  # create input/output lists
anomaly = []
anomalyProb = []
predictions = {1: [], 5: []}

predictor.reset()  # reset the predictor
for count, record in enumerate(records):  # iterate through the data
    dateString = datetime.datetime.strptime(record[0], "%m/%d/%y %H:%M")  # un-string the timestamp
    consumption = float(record[1])  # un-string the power value
    inputs.append(consumption)  # add power to inputs

    # use the encoders: create SDRs for each input value
    dateBits = dateEncoder.encode(dateString)
    consumptionBits = scalarEncoder.encode(consumption)

    # concatenate these encoded SDRs into a larger one for pooling
    encoding = SDR(encodingWidth).concatenate([consumptionBits, dateBits])
    enc_info.addData(encoding)  # enc_info is our metrics object, tracking how the encoder fares

    # create an SDR to represent active columns; it'll be populated by .compute()
    # notably, this activeColumns SDR has the same dimensions as the spatial pooler
    activeColumns = SDR(sp.getColumnDimensions())

    # throw the input into the spatial pool and hope it swims
    sp.compute(encoding, True, activeColumns)  # we're training, so learn=True
    tm_info.addData(tm.getActiveCells().flatten())

    # pass the pooled SDR through temporal memory
    tm.compute(activeColumns, learn=True)

    # make a prediction based on the current input & memory context
    pdf = predictor.infer(tm.getActiveCells())
    for n in (1, 5):
        if pdf[n]:
            predictions[n].append(np.argmax(pdf[n]) * predictor_resolution)
        else:
            predictions[n].append(float('nan'))

    anomalyLikelihood = anomaly_history.anomalyProbability(consumption, tm.anomaly)
    anomaly.append(tm.anomaly)
    anomalyProb.append(anomalyLikelihood)

    # reinforce output connections
    predictor.learn(count, tm.getActiveCells(), int(consumption / predictor_resolution))

The last piece of the puzzle is the Predictor object, which is essentially a small conventional neural network that receives the TM's output SDR and outputs the desired prediction, in our case power consumption. It gets trained through incremental weight updates like most NNs.

The predictor is the “head” of the model: at each iteration we ask the model “what level of power consumption do you think will happen next?” and record the prediction to compare with the real value later.
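In other words, the Predictor works in value buckets of width predictor_resolution. A hypothetical mini round trip, reusing names from the loop above:

bucket = int(consumption / predictor_resolution)  # value -> bucket index, fed to .learn()
value = np.argmax(pdf[1]) * predictor_resolution  # bucket likelihoods -> predicted value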

Results


We've got some neat metrics to inspect the health of our HTM model. Generally speaking, you want to keep an eye on statistics like sparsity & activation frequency.

We calculate the Root-Mean-Squared Error of the output predictions for two "versions" of the model:
1) predicting one step ahead & 2) predicting five steps ahead.
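The calculation is a sketch along the lines of HTM.Core's hotgym example (which first shifts each prediction list so predictions line up with the values they predicted):

import math

for n, preds in predictions.items():  # shift predictions n steps to align with inputs
    for _ in range(n):
        preds.insert(0, float('nan'))
        preds.pop()

rmse = {1: 0.0, 5: 0.0}
samples = {1: 0, 5: 0}
for idx, actual in enumerate(inputs):
    for n in (1, 5):
        if not math.isnan(predictions[n][idx]):
            rmse[n] += (actual - predictions[n][idx]) ** 2
            samples[n] += 1
for n in (1, 5):
    rmse[n] = (rmse[n] / samples[n]) ** 0.5
print(rmse)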

{1: 0.07548016042172133, 5: 0.0010324285729320193} # RMSE
power_consumption:
min: 10
max: 90.9
mean: 31.3

For comparison, the units of power consumption range from 10 to 90, so this RMSE is looking pretty good.
It's also nice to see that the 5-step model has dramatically lower error, reinforcing the idea that pattern recognition leads to better predictions.

But remember— if you don’t make a graph, it’s not really data science:

[Image: prediction & anomaly graphs. Anomaly Y-axis: normalized power units. I love seaborn]

The X-axis is 'timesteps': ~4,400 hourly readings from the same gym, about six months' worth. Take a look at the green 5-step line on the top graph: it starts out with some wild miscalculations, but eventually starts to predict in sync with the actual next value (red).

The model does a good job understanding daily and weekly fluctuations (that's why we encoded both timeOfDay and weekend as part of the Encoder SDR). Note that our DateEncoder doesn't actually see the month; if we also encoded the season, the model should pick up on seasonal power-consumption shifts as well.
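A hypothetical variant (the season argument's exact shape is worth double-checking in the DateEncoder docs) would allocate bits to the time of year:

dateEncoder = DateEncoder(
    timeOfDay = (30, 1),
    weekend = 21,
    season = 5  # assumed parameter: bits for time-of-year, so summer encodes unlike winter
)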

The anomaly prediction seems to pick up on some weekly signal; since there are 26 "double spikes" in the above graph, I'd reckon it's marking the start and end of each weekend as anomalous activity. For a real anomaly detection system, we'd probably want to tune that so it doesn't raise unneeded worries every week.
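A hypothetical post-processing sketch (the 0.99 cutoff is an arbitrary choice; stricter cutoffs flag less):

threshold = 0.99
flagged = [i for i, p in enumerate(anomalyProb) if p > threshold]
print(len(flagged), "suspicious timesteps out of", len(anomalyProb))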

Good job, brain-model

Not bad for fixing up an existing template. Again, this is mostly from the lovely lads behind HTM.Core; I just tidied up and commented.
There are tons of other HTM applications; it really depends on how you configure the encoder, but theoretically anything can be turned into an SDR. If you have a meat grinder, anything can be made into a sausage.

I messed around with an MNIST handwritten-digit HTM classifier, which gets ~95% accuracy (though most models do well on MNIST these days).
I'd imagine HTM image systems would excel at video-feed object recognition, a tricky task: "5-step memory" can look at the 5 frames before the current one. If the model can 'see' wings flapping, it's probably a bird, etc.

If all this sounds interesting, that’s because it is. The most efficient way to get started learning about HTM tech is Numenta’s HTM School, which I found intuitive and quite delightful.

Check out more at the official forums.
