Vectorizing Time with Time2Vec
Introduction
Machine Learning models can only work with data that is encoded in numerical form.
Real-valued quantities such as distances, mass, or population can simply be normalized and fed to a Machine Learning algorithm. Here’s a list of common “types” of data and the most “standard” way of encoding each (pun intended); a tiny example follows the table.
| Data Type | Encoding Method |
|---|---|
| Numerical - Ratio | Normalize |
| Numerical - Interval | Normalize / Label + One-Hot Encode |
| Categorical - Ordinal/Nominal | Label + One-Hot Encode |
| Images | Normalize Across Channels |
| Words | Label + One-Hot Encode |
| Graphs | Edge Index + Node Features |
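To make the first rows concrete, here is a minimal sketch (with made-up values) of normalizing a numerical feature and one-hot encoding a categorical one:

```python
import torch
import torch.nn.functional as F

# Numerical feature (e.g. distances in km) -> zero-mean, unit-variance normalization
distances = torch.tensor([1.2, 35.0, 7.8, 120.5])
normalized = (distances - distances.mean()) / distances.std()

# Categorical feature (e.g. color) -> integer labels, then one-hot vectors
labels = torch.tensor([0, 2, 1, 2])        # "red"=0, "green"=1, "blue"=2
one_hot = F.one_hot(labels, num_classes=3)  # shape: (4, 3)
```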
But things aren’t so straightforward for some types of data.
For example, how would you encode a date? Traditionally, only hand-crafted features extracted from dates have been used, such as “Weekend/Weekday”, “Season”, or “Quarter”. As you might notice, this leads to a lot of information loss. How do we encapsulate the entire information in a given date?
Consider the date 2021-01-12. Should we normalize Year, Month, and Day as separate features, or one-hot encode them? Why is this a bad idea?
- The periodicity of days, months, and years is not captured by neural nets with non-periodic activation functions (i.e. most neural nets); see the sketch below.
- Exploding/vanishing gradients, since inputs such as the year grow without bound.
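Here’s a minimal sketch of the periodicity problem (the values are made up): under naive normalization, December and January end up maximally far apart even though they are adjacent in time, whereas a periodic (sine/cosine) encoding keeps them close:

```python
import math

months = [1, 12]  # January and December, adjacent in time

# Naive normalization: scale months to [0, 1]
naive = [(m - 1) / 11 for m in months]
print(abs(naive[0] - naive[1]))  # 1.0 -> maximally far apart

# Periodic encoding: map each month onto a circle
periodic = [(math.sin(2 * math.pi * m / 12), math.cos(2 * math.pi * m / 12))
            for m in months]
print(math.dist(periodic[0], periodic[1]))  # ~0.52 -> close together, as expected
```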
Time2Vec
Say \(\tau \in \mathbb{R}^d\) is an input variable (such as time) that may exhibit periodic behavior, for which we want to learn a vector representation \(v \in \mathbb{R}^{k}\). For a scalar \(\tau\) (each input dimension is handled the same way), the Time2Vec “Layer” is defined element-wise as:

\[
t2v(\tau)[i] =
\begin{cases}
w_i \tau + \varphi_i, & \text{if } i = 0 \\
\mathcal{F}(w_i \tau + \varphi_i), & \text{if } 1 \le i \le k-1
\end{cases}
\]
Where,
- \(\mathcal{F}\) is a periodic function, e.g. \(\sin(\cdot)\) or \(\cos(\cdot)\)
- \(w_i, \varphi_{i}\) are learnable parameters
Points to Note:
- \(k-1\) components learn periodic features, \(1 \le i \le k-1\).
- One component allows learning of Non-periodic features, \(i = 0\).
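For example, with \(k = 4\) and \(\mathcal{F} = \sin\) (an illustrative choice), the representation of a scalar \(\tau\) expands to:

\[
t2v(\tau) = \big[\, w_0\tau + \varphi_0,\; \sin(w_1\tau + \varphi_1),\; \sin(w_2\tau + \varphi_2),\; \sin(w_3\tau + \varphi_3) \,\big]
\]

The single linear term lets the model capture non-periodic trends, while the sine terms capture cycles whose frequencies and phases are learned.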
Link to the Paper: “Time2Vec: Learning a Vector Representation of Time”
Time2Vec - Code
We’ll use PyTorch with my open-source implementation of Time2Vec here.
PyTorch implementation
```python
import torch
from torch import nn


def t2v(tau, f, out_features, w, b, w0, b0):
    # k-1 periodic features: (batch, in_features) -> (batch, out_features - 1)
    v1 = f(torch.matmul(tau, w) + b)
    # One non-periodic (linear) feature: (batch, in_features) -> (batch, 1)
    v2 = torch.matmul(tau, w0) + b0
    return torch.cat([v1, v2], 1)


class SineActivation(nn.Module):
    def __init__(self, in_features, out_features):
        super(SineActivation, self).__init__()
        self.out_features = out_features
        # Non-periodic (linear) component
        self.w0 = nn.parameter.Parameter(torch.randn(in_features, 1))
        self.b0 = nn.parameter.Parameter(torch.randn(1))
        # Periodic components
        self.w = nn.parameter.Parameter(torch.randn(in_features, out_features - 1))
        self.b = nn.parameter.Parameter(torch.randn(out_features - 1))
        self.f = torch.sin

    def forward(self, tau):
        return t2v(tau, self.f, self.out_features, self.w, self.b, self.w0, self.b0)
```
I think the code is self-explanatory.
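As a quick sanity check, here is how the layer defined above could be used (the shapes and values are just an illustration):

```python
# Encode a batch of 8 timestamps, each with 6 fields (hour, minute, second, year, month, day),
# into 64-dimensional Time2Vec embeddings.
layer = SineActivation(in_features=6, out_features=64)
tau = torch.randn(8, 6)  # placeholder inputs
out = layer(tau)
print(out.shape)  # torch.Size([8, 64])
```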
I have also provided a pretrained model that encodes a date-time (e.g. 13:23:30 2019-7-23) into a 64-dimensional vector. The pretraining task is “Next Date Prediction”: a dataset of pairs of date-times and their immediate successors is generated, and an encoder-decoder model is trained on it in a supervised fashion. After training, the decoder is discarded, and the encoder can now be used to generate embeddings for any date-time.
Sample code for getting embeddings from this Date2Vec package is shown below.
```python
from Model import Date2VecConvert
import torch

# Date2Vec embedder object; loads a pretrained model
d2v = Date2VecConvert(model_path="./d2v_model/d2v_98291_17.169918439404636.pth")

# Date-Time is 13:23:30 2019-7-23
x = torch.Tensor([[13, 23, 30, 2019, 7, 23]]).float()

# Get embeddings
embed = d2v(x)
print(embed, embed.shape)
```
Conclusion
We talked about an incredibly simple technique that allows Neural Nets to understand periodic features.