Self-Supervised Learning for Time Series Data
In domains such as health care, where labeled data is scarce or hard to obtain, self-supervised learning is one of the most useful and practical machine learning paradigms. In self-supervised learning, machine learning algorithms learn a representation of the data by solving pretext tasks.
Pretext tasks are carefully designed classification or regression problems that push the machine learning model to learn useful features and representations from unlabeled data. These learned representations can then be used in downstream tasks to solve the original problem.
Suppose we are given a problem \(P\) and a corresponding dataset \(D\), and we want to solve \(P\) using machine learning. Since \(D\) is unlabeled, or only a few labels are available, we cannot rely on supervised learning. Instead, we design a set of pretext tasks \(T\) on the dataset \(D\) such that each task yields input-output pairs derived from \(D\), i.e., \((x_{t_i}, y_{t_i})\), where \(t_i\) is the \(i\)-th pretext task in \(T\). Next, we train machine learning models to solve the pretext tasks and then use the representations learned by these models to solve the original problem \(P\). This can be done using transfer learning.
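To make the notation concrete, here is a minimal sketch of how one pretext task can turn unlabeled signals into input-output pairs. It assumes temporal shift prediction as the pretext task and uses plain Python lists to stay dependency-free; the names `temporal_shift_task` and `build_pretext_dataset` and the toy dataset are hypothetical and only serve as illustration.

```python
import random

def temporal_shift_task(signal, rng, max_shift=3):
    """One pretext task t_i: circularly shift the signal right by y steps.

    Returns (x, y), where x is the shifted signal and y is the shift (the label).
    Assumes max_shift is smaller than len(signal).
    """
    y = rng.randint(0, max_shift)  # label drawn from 0..max_shift (inclusive)
    x = signal[-y:] + signal[:-y] if y else list(signal)
    return x, y

def build_pretext_dataset(unlabeled, task, samples_per_signal, rng):
    """Turn the unlabeled dataset D into labeled (x, y) pairs for one pretext task."""
    return [task(sig, rng) for sig in unlabeled for _ in range(samples_per_signal)]

rng = random.Random(0)
# Toy unlabeled dataset D: three signals with eight samples each.
D = [[float(10 * s + i) for i in range(8)] for s in range(3)]
pairs = build_pretext_dataset(D, temporal_shift_task, samples_per_signal=4, rng=rng)
```

A model trained on `pairs` never sees a human-provided label: every `y` is generated from `D` itself, which is the point of a pretext task.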
The main challenge then becomes how to design these pretext tasks. A common approach is contrastive learning, of which Contrastive Predictive Coding is a well-known example.
Contrastive Methods
Contrastive methods are based on the idea of constructing pairs \((x_1, x_2)\) that are either similar or dissimilar to each other and learning to quantify or measure their similarity. Basically, given an input \(x\), we create samples \((x_i, x_j)\) from \(x\) using various strategies and then label these samples to create a dataset. The goal is that, by learning to measure the similarity between the created samples, the model learns a general representation of the original data. Contrastive methods usually employ an encoder to learn the representation of the data. After training on the pretext tasks, the learned encoder is frozen and extended with new layers for downstream tasks.
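The pair construction described above can be sketched as follows. This is only one possible strategy, assumed here for illustration: two noisy views of the same signal form a similar pair (label 1), and noisy views of two different signals form a dissimilar pair (label 0). The function names and the choice of Gaussian noise as the augmentation are assumptions, not a fixed recipe.

```python
import random

def add_noise(signal, sigma, rng):
    """Return a noisy view of the signal (i.i.d. Gaussian noise per sample)."""
    return [v + rng.gauss(0.0, sigma) for v in signal]

def make_contrastive_pairs(signals, n_pairs, rng, sigma=0.1):
    """Build ((x_i, x_j), label) samples from unlabeled signals.

    label 1: both views come from the same signal (similar pair).
    label 0: the views come from two different signals (dissimilar pair).
    Requires at least two signals.
    """
    samples = []
    for _ in range(n_pairs):
        a = rng.randrange(len(signals))
        if rng.random() < 0.5:
            b, label = a, 1
        else:
            b = rng.choice([k for k in range(len(signals)) if k != a])
            label = 0
        x_i = add_noise(signals[a], sigma, rng)
        x_j = add_noise(signals[b], sigma, rng)
        samples.append(((x_i, x_j), label))
    return samples
```

An encoder would then be trained so that its outputs for `x_i` and `x_j` are close when the label is 1 and far apart when it is 0.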
Pretext Tasks for Time Series Data
My interest in self-supervised learning originated from my struggle to train a stress classification model from limited sensor data. Below I have listed some pretext tasks that can be used for self-supervised learning with time-series data.

Temporal cut: a random contiguous section of the time-series signal is replaced with zeros.

Temporal delays: the time-series data is randomly delayed in time.

Noise: independent and identically distributed Gaussian noise is added to the signal.

Bandstop filtering: the signal content at a randomly selected frequency band is filtered out using a bandstop filter.

Signal mixing: data from another time instance or another subject is added to the signal to simulate correlated noise.

Relative positioning: segments are displaced along the time axis, and labels are created based on whether two time segments are close together or far apart.

Temporal Shuffling: three segments are created, and the label is determined by their relative ordering.

Blend Detection: detecting input blending as a multi-class classification problem.

Fusion Magnitude Prediction: predicting the magnitude that defines the blending, framed as a regression problem.

Feature Prediction from Masked Window: approximate summary statistics of a masked temporal segment within a signal using a multi-head network and Huber loss.

Transformation Recognition: a multi-class classification problem in which the network learns to recognize which one of k transformations was applied to the input.

Temporal Shift Prediction: estimate the number of steps by which the samples are circularly shifted in their temporal dimension.

Modality Denoising: decompose a signal to obtain a clean target through input reconstruction, i.e., by isolating the mixed-in noise. Data from different modalities are mixed, and an encoder-decoder network then tries to reconstruct each one.

Odd Segmentation Recognition: identify the unrelated subsegment that does not belong to the input under consideration.

Metric Learning with Triplet Loss: encourage the representations of similar inputs from different modalities to be closer together, while pushing the representations of dissimilar inputs further apart.
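A few of the transformations listed above, together with the transformation recognition task, can be sketched as follows. This is a simplified illustration with plain Python lists and made-up parameter defaults. In particular, implementing a temporal delay by zero-padding the front and truncating the end is an assumption, since other conventions are possible.

```python
import random

def identity(signal, rng):
    """No transformation; serves as class 0."""
    return list(signal)

def temporal_cut(signal, rng, cut_len=4):
    """Replace a random contiguous section of length cut_len with zeros."""
    start = rng.randrange(len(signal) - cut_len + 1)
    return signal[:start] + [0.0] * cut_len + signal[start + cut_len:]

def temporal_delay(signal, rng, max_delay=3):
    """Delay the signal by 1..max_delay steps (zero-pad front, truncate end)."""
    d = rng.randint(1, max_delay)
    return [0.0] * d + signal[:len(signal) - d]

def gaussian_noise(signal, rng, sigma=0.1):
    """Add i.i.d. Gaussian noise to every sample."""
    return [v + rng.gauss(0.0, sigma) for v in signal]

# The index in this list is the class label for transformation recognition.
TRANSFORMS = [identity, temporal_cut, temporal_delay, gaussian_noise]

def transformation_recognition_sample(signal, rng):
    """Apply one of k transformations at random and return (x, label)."""
    label = rng.randrange(len(TRANSFORMS))
    return TRANSFORMS[label](signal, rng), label
```

Each call to `transformation_recognition_sample` yields one training example for a k-class classifier, here with k = 4.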
For time-series data with spatial variation, for example data recorded from multiple electrodes of an EEG device placed around the surface of the head, some additional pretext tasks are:
 Spatial rotation: the data is rotated in space.
 Spatial shift: the data is shifted in space.
 Sensor dropout: a random subset of sensors is replaced with zeros.
 Sensor cutout: sensors in a small region of space are replaced with zeros.
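Sensor dropout and sensor cutout can be sketched as below. The sketch assumes the recording is stored as a list of channels (one list of samples per sensor) and that each sensor has a hypothetical 2D position. Real EEG montages use 3D coordinates, so the distance computation here is only illustrative.

```python
import math
import random

def sensor_dropout(data, n_drop, rng):
    """Replace a random subset of n_drop sensors (channels) with zeros."""
    dropped = set(rng.sample(range(len(data)), n_drop))
    return [[0.0] * len(ch) if i in dropped else list(ch)
            for i, ch in enumerate(data)]

def sensor_cutout(data, positions, radius, rng):
    """Zero all sensors within `radius` of a randomly chosen center sensor.

    positions: one (x, y) tuple per channel (assumed layout, for illustration).
    """
    cx, cy = positions[rng.randrange(len(positions))]
    return [[0.0] * len(ch) if math.hypot(x - cx, y - cy) <= radius else list(ch)
            for ch, (x, y) in zip(data, positions)]
```

Both functions return a new list and leave the input untouched, so the original recording can be reused to generate further views.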
Cheers!