[BUG] Extracted shapelets from multivariate dataset are always of the 1st dimension

Describe the bug

Hey there, I’ve been playing around with sktime to detect shapelets in multivariate time series datasets. I’ve been testing with the notebook multivariate_time_series_classification.ipynb; specifically, I’ve been trying the third method (bespoke estimator-specific methods) to extract shapelets from multivariate datasets.

What I’ve found is that the shapelets ShapeletTransformClassifier extracts always come from the first dimension of the multivariate dataset. This happens no matter how big the dataset is (I tried the BasicMotions dataset and a multivariate dataset of my own, growing and shrinking both to make sure that all the series are visited) and no matter how long the Python script runs (I ran tests from 5 to 30 minutes). For example, BasicMotions has 6 dimensions, yet the extracted shapelets are always from the first one.

I’ve noticed that this method (bespoke estimator-specific methods) is still under construction, but I would like to know whether this is the expected behavior or a bug.

To Reproduce

from sktime.transformers.compose import ColumnConcatenator
from sktime.transformers.shapelets import ContractedShapeletTransform
from sktime.classifiers.compose import TimeSeriesForestClassifier
from sktime.classifiers.dictionary_based.boss import BOSSEnsemble
from sktime.classifiers.compose import ColumnEnsembleClassifier
from sktime.classifiers.shapelet_based import ShapeletTransformClassifier
from sktime.datasets import load_basic_motions
from sktime.pipeline import Pipeline

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

###### FUNCTIONS ######

def plotEachShapelets(st, train_x):
    # for each of the top five shapelets (in descending order of quality/information gain)
    for s in st.shapelets[0:5]:
        print(s)

        # plot the series that the shapelet was extracted from
        # (column 0, i.e. the first dimension, is hard-coded here)
        plt.plot(
            train_x.iloc[s.series_id, 0],
            'gray'
        )

        # overlay the shapelet onto the full series
        plt.plot(
            list(range(s.start_pos, s.start_pos + s.length)),
            train_x.iloc[s.series_id, 0][s.start_pos:s.start_pos + s.length],
            'r',
            linewidth=3.0
        )
        plt.show()


def plotAllShapelets(st, train_x):
    # for each extracted shapelet (in descending order of quality/information gain)
    for i in range(0, len(st.shapelets)):
        s = st.shapelets[i]
        # summary info about the shapelet
        print("#" + str(i) + ": " + str(s))

        # overlay shapelets (again read from column 0, the first dimension)
        plt.plot(
            list(range(s.start_pos, s.start_pos + s.length)),
            train_x.iloc[s.series_id, 0][s.start_pos:s.start_pos + s.length]
        )

    plt.show()

X_train, y_train = load_basic_motions(split='TRAIN', return_X_y=True)
X_test, y_test = load_basic_motions(split='TEST', return_X_y=True)


clf = ShapeletTransformClassifier(time_contract_in_mins=5)
clf.fit(X_train, y_train)
print("--> Score = " + str(clf.score(X_test, y_test)))
print("--> Shapelets detected = " + str(len(clf.classifier[0].shapelets)))

plotEachShapelets(clf.classifier[0], X_train)
plotAllShapelets(clf.classifier[0], X_train)
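
Note that both plotting helpers above read column 0 of the nested DataFrame, so they would draw every shapelet against the first dimension even if the transform did search the others. The following is a minimal sketch of a lookup that takes the dimension as an argument. The function name shapelet_values and its dim parameter are assumptions made for illustration, the demo data is made up, and nothing here confirms that the shapelet objects in sktime 0.3.0 record which dimension they came from.

```python
import numpy as np
import pandas as pd


def shapelet_values(X, series_id, dim, start_pos, length):
    """Return the window a shapelet covers in one dimension of a nested DataFrame.

    X is assumed to be nested: one row per case, one column per dimension,
    each cell holding a pd.Series. `dim` is a hypothetical argument; the
    helpers in the issue hard-code column 0 instead.
    """
    series = X.iloc[series_id, dim]
    return series.iloc[start_pos:start_pos + length]


# Made-up dataset: 2 cases, 2 dimensions, 6 time points per series.
X_demo = pd.DataFrame({
    "dim_0": [pd.Series(np.arange(6.0)), pd.Series(np.arange(6.0) + 10)],
    "dim_1": [pd.Series(np.arange(6.0) * 2), pd.Series(np.arange(6.0) * 3)],
})

# Window of length 3 starting at position 2, taken from case 1, dimension 1.
window = shapelet_values(X_demo, series_id=1, dim=1, start_pos=2, length=3)
print(window.tolist())
```

With a lookup like this, a plot could show a shapelet against the dimension it belongs to, provided that dimension is known from somewhere.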

Expected behavior

I would expect the detected shapelets to come from the other dimensions of the multivariate dataset as well, not only from the first one.

Additional context

None.

Versions

  • Linux-4.4.0-17134-Microsoft-x86_64-with-Ubuntu-16.04-xenial
  • Python 3.6.8 (default, May 7 2019, 14:58:50)
  • [GCC 5.4.0 20160609]
  • NumPy 1.16.4
  • SciPy 1.3.0
  • sktime 0.3.0

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 9 (2 by maintainers)

Top GitHub Comments

1 reaction
DavidCorral94 commented, Sep 10, 2019

Hey @mloning, it’s ok, I just thought that you wanted me to try it out! I’ll keep waiting and following the issue in order to know when the multivariate Shapelets detection will be available. Thanks for everything!

1 reaction
mloning commented, Sep 10, 2019

Hi @DavidCorral94, sorry, I think my previous comment may have been confusing. The validate_X_y and check_X_is_univariate are helper functions used inside of estimators to check if the estimator can handle the input data, basically to avoid the original issue you described.

The basic motion data set is multivariate, so check_X_is_univariate is expected to throw an error.
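
To illustrate the kind of guard described in this comment, here is a minimal sketch of a univariate check on a nested DataFrame. It is a hypothetical stand-in named check_is_univariate, not sktime's actual check_X_is_univariate, whose signature and error message are not shown in this thread. It assumes that each column of the DataFrame is one dimension.

```python
import pandas as pd


def check_is_univariate(X):
    """Raise ValueError if X has more than one column (dimension).

    Hypothetical helper for illustration only; not sktime's own code.
    """
    n_dims = X.shape[1]
    if n_dims > 1:
        raise ValueError(
            f"Expected univariate data (1 column), got {n_dims} columns."
        )
    return X


# Made-up nested data: one case with one dimension, and one case with two.
X_uni = pd.DataFrame({"dim_0": [pd.Series([1.0, 2.0, 3.0])]})
X_multi = pd.DataFrame({
    "dim_0": [pd.Series([1.0, 2.0, 3.0])],
    "dim_1": [pd.Series([4.0, 5.0, 6.0])],
})

check_is_univariate(X_uni)  # single column: passes silently

try:
    check_is_univariate(X_multi)
    rejected = False
except ValueError as err:
    rejected = True
    print(err)
```

An estimator that calls a guard like this at the start of fit would fail loudly on BasicMotions instead of silently using only the first dimension, which is the behavior the comment says is expected.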
