Algorithm outputs a series of repeated items but there are none in the training data

I have noticed some behaviour that seems strange to me. I trained the algorithm on a set of sequences with no repeated items. In other words, no item ever appears immediately after itself, the way 1 does in the sequence [3, 2, 1, 1, 5, 7, 2].

However, when I generated the most frequent sequences, I got patterns with repeated items. Is that possible?

For example, given this code:

```python
import numpy as np
from prefixspan import PrefixSpan

seqs = [[22, 16], [22, 21], [22, 16, 14, 20], [22, 16],
        [22, 16, 34, 24, 26, 24, 26, 14, 13], [22, 16], [22, 26],
        [22, 13, 34], [22, 16], [22, 21, 16]]

ps = PrefixSpan(seqs)
ps.minlen = 2
ps.maxlen = 10

# Minimum support: 10% of the sequences, rounded up (1 for these 10 sequences)
freq_ratio = 0.1
freq = np.ceil(freq_ratio * len(seqs)).astype(int)

res = ps.frequent(freq)
```

The output includes the pattern [26, 26, 14, 13].

This is just a small reproducible example. My real dataset has about 1,000 sequences, and the problem occurs there too.
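One quick way to see where [26, 26, 14, 13] comes from is to count its support by hand. The following is a minimal pure-Python sketch, not the prefixspan package's implementation. It assumes the usual sequential-pattern definition: a sequence supports a pattern if the pattern's items occur in it in order, with gaps allowed, and each sequence is counted at most once. The helpers `is_subsequence` and `support` are hypothetical names, and `math.ceil` stands in for the `np.ceil` call in the code above.

```python
import math

# The ten example sequences from the issue
seqs = [[22, 16], [22, 21], [22, 16, 14, 20], [22, 16],
        [22, 16, 34, 24, 26, 24, 26, 14, 13], [22, 16], [22, 26],
        [22, 13, 34], [22, 16], [22, 21, 16]]

def is_subsequence(pattern, seq):
    """True if the items of `pattern` occur in `seq` in order (gaps allowed)."""
    it = iter(seq)
    # `item in it` consumes the iterator up to and including the match,
    # so each later item must be found further along in `seq`
    return all(item in it for item in pattern)

def support(pattern, db):
    """Number of sequences in `db` that contain `pattern` as a subsequence."""
    return sum(is_subsequence(pattern, s) for s in db)

# Same threshold as the issue's code: ceil(0.1 * 10) = 1
minsup = math.ceil(0.1 * len(seqs))
print(minsup)                           # 1
print(support([26, 26, 14, 13], seqs))  # 1
```

With a threshold of 1, any pattern that occurs in even a single sequence is reported, and the fifth sequence contains 26 at two positions separated by a 24.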


Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

chuanconggao commented, Dec 6, 2018

Hi, you seem to have misunderstood the concept of a pattern.

For example, in one of the sequences you provided, [22, 1, 30, 1, 24, 30], the pattern [22, 30, 30] IS a sub-pattern. Other items are allowed in between.
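The distinction the maintainer draws, between a pattern that allows gaps and one that must appear as a consecutive run, can be sketched in plain Python. This is an illustration under the assumption that a sequence supports a pattern when the pattern's items appear in it in order, with other items allowed in between. `is_subsequence` and `is_contiguous` are hypothetical helper names, not part of the prefixspan package.

```python
def is_subsequence(pattern, seq):
    """Ordered match with gaps allowed (the notion of pattern described above)."""
    it = iter(seq)
    return all(item in it for item in pattern)

def is_contiguous(pattern, seq):
    """Ordered match with no gaps: `pattern` appears as a consecutive run in `seq`."""
    n = len(pattern)
    return any(seq[i:i + n] == pattern for i in range(len(seq) - n + 1))

seq = [22, 1, 30, 1, 24, 30]
pattern = [22, 30, 30]
print(is_subsequence(pattern, seq))  # True: 22 ... 30 ... 30
print(is_contiguous(pattern, seq))   # False: 30 never directly follows 30
```

So a pattern can contain the same item twice in a row even though no input sequence does, because the two occurrences may be separated by other items in the data.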

ghost commented, Nov 26, 2018

I have attached a file with some example sequences. None of them contains repeated items (i.e. the same number appearing twice in a row), but in the output I get, for example:

(156, [22, 30, 30])

Thanks for your help.

Attached file: seqs.txt
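If the goal is to keep only patterns in which no item immediately repeats, one option is to post-process the mined results. This is a hedged sketch: it assumes the results are (support, pattern) pairs with the pattern as a list, matching the (156, [22, 30, 30]) output above, and `has_adjacent_repeat` and `drop_adjacent_repeats` are hypothetical helpers, not prefixspan API.

```python
def has_adjacent_repeat(pattern):
    """True if some item is immediately followed by the same item in the pattern."""
    return any(a == b for a, b in zip(pattern, pattern[1:]))

def drop_adjacent_repeats(results):
    """Keep only (support, pattern) pairs whose pattern has no adjacent repeats."""
    return [(sup, patt) for sup, patt in results if not has_adjacent_repeat(patt)]

# (support, pattern) pairs: the first two are quoted from this thread;
# the third is another pattern the 10-sequence example also supports (support 1)
results = [(156, [22, 30, 30]), (1, [26, 26, 14, 13]), (1, [24, 26, 14, 13])]
print(drop_adjacent_repeats(results))  # [(1, [24, 26, 14, 13])]
```

This only filters the output. The supports of the remaining patterns are still counted with gaps allowed, so the filter does not change how the patterns are counted.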
