Duplicated cell names after concatenation

In Drop-seq experiments, cell names are encoded by 12-nt barcodes, so the same barcode can occur in two independent runs. It seems that no check for duplicate names is performed when merging multiple datasets in Scanpy.

>>> from collections import Counter
>>> import scanpy.api as sc

>>> f = sc.read("data1.txt").transpose()
>>> g = sc.read("data2.txt").transpose()
>>> c = f.concatenate(g)

>>> len(c.obs_names)
7932
>>> len(set(c.obs_names))
7890

>>> cc = Counter(c.obs_names)
>>> cc.most_common(10)
[('AAAAAAAAAAAA', 2), ('TCCTGTCTCTTA', 2), ('CGCAAGGGAAAG', 2), ('ACCCGTCTATGT', 2), ('CTCCTGTCTCTT', 2), ('TTCCTGTCTCTT', 2), ('CCCTGTCTCTTA', 2), ('CCGCTGTCTCTT', 2), ('GACAAACCTACC', 2), ('ACACTGTCTCTT', 2)]
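Before concatenating, it can help to list which barcodes collide across the input files. The following is a minimal, standard-library-only sketch; find_shared_barcodes is a hypothetical helper (not part of Scanpy or anndata), and the toy barcodes stand in for f.obs_names and g.obs_names.

```python
from collections import defaultdict


def find_shared_barcodes(batches):
    """Return every cell name that occurs more than once across all batches.

    `batches` is a list of iterables of cell names, one per dataset
    (hypothetical helper). Each returned name maps to the batch index of
    every occurrence, so [0, 1] means "once in batch 0, once in batch 1".
    """
    seen = defaultdict(list)
    for batch_idx, names in enumerate(batches):
        for name in names:
            seen[name].append(batch_idx)
    # Keep only names seen more than once.
    return {name: idx for name, idx in seen.items() if len(idx) > 1}


# Toy barcodes standing in for the obs_names of the two datasets.
data1 = ["AAAAAAAAAAAA", "TCCTGTCTCTTA", "GATTACAGATTA"]
data2 = ["AAAAAAAAAAAA", "TCCTGTCTCTTA", "CCCCGGGGTTTT"]
print(find_shared_barcodes([data1, data2]))
# {'AAAAAAAAAAAA': [0, 1], 'TCCTGTCTCTTA': [0, 1]}
```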

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 8 (8 by maintainers)

Top GitHub Comments

1 reaction
falexwolf commented, Dec 28, 2017

Hi Davide,

thank you! Currently, we use the default behavior of pandas concatenation. I won't be able to fix this in the next few days. @flying-sheep, could you have a look and maybe figure out a meaningful option or default to circumvent this? It should be easy to simply pass an option to pandas.concat().
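The options plain pandas offers here can be illustrated on two small DataFrames indexed by barcode. This is a sketch of pandas.concat behaviour only, not of how anndata wraps it; the column name n_counts and the values are made up.

```python
import pandas as pd

# Two toy per-dataset tables indexed by cell barcode; one barcode occurs in both.
df1 = pd.DataFrame({"n_counts": [120, 340]}, index=["AAAAAAAAAAAA", "TCCTGTCTCTTA"])
df2 = pd.DataFrame({"n_counts": [95, 210]}, index=["AAAAAAAAAAAA", "CGCAAGGGAAAG"])

# Default: row labels are stacked as-is, so the shared barcode appears twice.
merged = pd.concat([df1, df2])
print(merged.index.is_unique)  # False

# Option 1: refuse to build an index with duplicate labels.
try:
    pd.concat([df1, df2], verify_integrity=True)
except ValueError as err:
    print("refused:", err)

# Option 2: record the source of each row via keys=, giving a
# (batch, barcode) MultiIndex in which every entry is unique.
keyed = pd.concat([df1, df2], keys=["0", "1"])
print(keyed.index.is_unique)  # True
```

verify_integrity only detects the problem, while keys= resolves it by keeping the batch label next to each barcode.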

@dawe Thanks again for your pull request. I have also uploaded a new anndata version that incorporates it to PyPI.

Cheers, Alex

0 reactions
falexwolf commented, Feb 12, 2018

So, this is solved in anndata 0.5 and scanpy 0.4.3. See the release notes (https://scanpy.readthedocs.io) and https://github.com/theislab/anndata/commit/63500075e926f202e856bd04ec673df55bbd2460 and the example.

Hope this is a meaningful default. If you pass index_unique=None, the previous indices are kept, including duplicates.
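The comment does not spell out what the new default does. In later anndata releases, AnnData.concatenate defaults to index_unique='-' with batch categories '0', '1', and so on, so each cell name gets a '-<batch>' suffix (for example AAAAAAAAAAAA-0 and AAAAAAAAAAAA-1). The sketch below imitates that renaming in plain Python; concat_obs_names is a hypothetical helper, not the anndata implementation. Check the default in the version you have installed: the newer anndata.concat function, for instance, keeps the original names unless index_unique is given.

```python
def concat_obs_names(batches, sep="-", batch_categories=None):
    """Concatenate per-batch cell names, appending sep + batch label to each.

    Hypothetical helper imitating the suffix renaming described above;
    it only builds the list of names, it does not touch any data matrix.
    """
    if batch_categories is None:
        # Default labels "0", "1", ... mirror the default batch categories.
        batch_categories = [str(i) for i in range(len(batches))]
    if len(batch_categories) != len(batches):
        raise ValueError("need exactly one label per batch")
    return [
        f"{name}{sep}{label}"
        for names, label in zip(batches, batch_categories)
        for name in names
    ]


names = concat_obs_names([["AAAAAAAAAAAA", "TCCTGTCTCTTA"],
                          ["AAAAAAAAAAAA", "CGCAAGGGAAAG"]])
print(names)
# ['AAAAAAAAAAAA-0', 'TCCTGTCTCTTA-0', 'AAAAAAAAAAAA-1', 'CGCAAGGGAAAG-1']
```

The suffix only guarantees unique names if barcodes are already unique within each batch and the batch labels are distinct; passing index_unique=None corresponds to skipping the suffix entirely.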
