The output I get looks very weird to me


Trying prefixspan-0.5.2 with Python 3.6.8.

I am running prefixspan-cli frequent 5 --closed ids.txt > seqs.ids

ids.txt: https://gist.github.com/johann-petrak/9d07e3bacd167639c26defb822dbe6aa
seqs.ids: https://gist.github.com/johann-petrak/7db1b94153816075556798db1d068069

The fourth line from the bottom of the output is “34 1 0 2 : 6”, and the line after it is “34 0 2 : 8”.

However, “34 0 2” does not occur anywhere in the input, so why is it included in the output with frequency 8?
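The comments below point to gaps as the explanation: a sequential pattern is counted whenever its items appear in order, even with other items in between. The following is a minimal sketch of that difference in plain Python. It does not call the prefixspan library, the helper names are hypothetical, and the toy data is invented for illustration (it is not the ids.txt from the gists).

```python
def is_subsequence(pattern, sequence):
    """True if pattern's items appear in sequence in order, gaps allowed."""
    it = iter(sequence)
    # `item in it` consumes the iterator up to the first match,
    # so later items can only match further to the right.
    return all(item in it for item in pattern)


def is_contiguous(pattern, sequence):
    """True if pattern appears in sequence as an unbroken run."""
    n = len(pattern)
    return any(sequence[i:i + n] == pattern
               for i in range(len(sequence) - n + 1))


def support(pattern, db, match):
    """Number of sequences in db that contain pattern under `match`."""
    return sum(1 for seq in db if match(pattern, seq))


# Toy data, invented for illustration only.
db = [
    [34, 1, 0, 2],
    [34, 1, 0, 2, 7],
    [5, 34, 3, 0, 2],
]

gapped = support([34, 0, 2], db, is_subsequence)
contiguous = support([34, 0, 2], db, is_contiguous)
longer = support([34, 1, 0, 2], db, is_subsequence)
print(gapped, contiguous, longer)
```

In this toy data, “34 0 2” never occurs as a contiguous run, yet with gaps allowed it is supported by every sequence, including those that contain “34 1 0 2”. That is consistent with the shorter pattern having a higher frequency than the longer one in the output above.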

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 5 (1 by maintainers)

Top GitHub Comments

1 reaction
LoLei commented, Jun 5, 2020

@johann-petrak I suggest you read this survey of sequential pattern mining [Fournier-Viger et al., 2017]. The introduction describes what sequential patterns are in general and why gap constraints are not universally needed; it also gives short descriptions of the various algorithms and their parameters.

The papers on PrefixSpan [Pei et al., 2004], BIDE [Wang et al., 2007] and FEAT [Gao et al., 2008] are of course also interesting.

This repository merely provides efficient implementations of these algorithms.

Adding gap constraints to an existing algorithm is not trivial, as whether it is possible depends heavily on how that specific algorithm works. For some algorithms, gap-constrained versions were proposed after the original was published, e.g. Gap-BIDE [Li and Wang, 2008].

0 reactions
johann-petrak commented, Jun 5, 2020

My expectation would have been that unless the size and number of gaps are specified, no gaps would be allowed. In many situations one just wants proper subsequences without gaps; in other situations one wants, e.g., subsequences with at most 1 gap of at most 2 elements. But this algorithm does not seem to have a command-line option to control this, nor did I see any mention of what the default settings are.

But clearly it does not make much sense in many applications to allow any number of gaps of any size.
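One way to check a candidate pattern under a gap limit is to recount its support with a constrained matcher. The sketch below is plain Python under stated assumptions: `matches_with_max_gap` and its `max_gap` parameter are hypothetical names, not options of prefixspan-cli, and the limit here applies to each gap separately rather than to the number of gaps.

```python
def matches_with_max_gap(pattern, sequence, max_gap):
    """True if pattern occurs in sequence in order, with at most max_gap
    skipped elements between any two consecutive pattern items."""
    def extend(p_idx, last_pos):
        if p_idx == len(pattern):
            return True
        # The next item must lie within max_gap + 1 positions of the last match.
        stop = min(len(sequence), last_pos + max_gap + 2)
        return any(
            sequence[pos] == pattern[p_idx] and extend(p_idx + 1, pos)
            for pos in range(last_pos + 1, stop)
        )

    if not pattern:
        return True
    # The first item may start anywhere; later items are tried with backtracking.
    return any(
        sequence[start] == pattern[0] and extend(1, start)
        for start in range(len(sequence))
    )


# max_gap=0 demands a contiguous run; larger values tolerate skipped elements.
no_gap = matches_with_max_gap([34, 0, 2], [34, 1, 0, 2], 0)
one_gap = matches_with_max_gap([34, 0, 2], [34, 1, 0, 2], 1)
print(no_gap, one_gap)
```

Recounting like this only verifies patterns that were already mined. As noted in the comment above, building the constraint into the mining algorithm itself is a separate and harder problem.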
