Duplicated cell names after concatenation
In DropSeq experiments, cell names are encoded by 12-nt barcodes. It seems that no name check is performed when merging multiple datasets in ScanPy, so barcodes that occur in more than one dataset end up duplicated in the result:
>>> from collections import Counter
>>> import scanpy.api as sc
>>> f = sc.read("data1.txt").transpose()
>>> g = sc.read("data2.txt").transpose()
>>> c = f.concatenate(g)
>>> len(c.obs_names)
7932
>>> len(set(c.obs_names))
7890
>>> cc = Counter(c.obs_names)
>>> cc.most_common(10)
[('AAAAAAAAAAAA', 2), ('TCCTGTCTCTTA', 2), ('CGCAAGGGAAAG', 2), ('ACCCGTCTATGT', 2), ('CTCCTGTCTCTT', 2), ('TTCCTGTCTCTT', 2), ('CCCTGTCTCTTA', 2), ('CCGCTGTCTCTT', 2), ('GACAAACCTACC', 2), ('ACACTGTCTCTT', 2)]
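The transcript shows 7932 names but only 7890 distinct ones, and each listed barcode appears exactly twice. Below is a minimal sketch of the kind of pre-merge name check the issue asks for, written in plain Python so it does not depend on any particular scanpy or anndata version. The function name shared_names and the barcode lists are hypothetical illustrations, not part of either library.

```python
from collections import Counter

def shared_names(*name_lists):
    """Return {name: number_of_datasets} for names found in more than one dataset."""
    counts = Counter()
    for names in name_lists:
        # set() so a name repeated inside one dataset is counted once for it
        counts.update(set(names))
    return {name: n for name, n in counts.items() if n > 1}

# Hypothetical stand-ins for f.obs_names and g.obs_names
f_names = ["AAAAAAAAAAAA", "TCCTGTCTCTTA", "GACAAACCTACC"]
g_names = ["AAAAAAAAAAAA", "CGCAAGGGAAAG", "GACAAACCTACC"]

clashes = shared_names(f_names, g_names)
if clashes:
    print(f"{len(clashes)} barcodes occur in more than one dataset")
```

Running such a check before concatenating makes the collision visible up front instead of after len(set(...)) disagrees with len(...).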
Hi Davide,
thank you! Currently, we use the default behavior of pandas concatenation, which keeps duplicate index labels. I won't be able to fix this in the next few days. @flying-sheep, could you have a look and figure out a meaningful option or default to circumvent this? It should be easy to simply pass an option to pandas.concat().
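To illustrate the pandas behaviour referred to here, the sketch below uses toy DataFrames (hypothetical data, not the datasets from the issue). By default pandas.concat stacks rows as-is and keeps duplicate labels; its real verify_integrity and keys parameters show two ways an option could avoid that. Which option anndata actually adopted is not implied by this example.

```python
import pandas as pd

# Two toy "datasets" whose cell barcodes overlap (hypothetical data)
a = pd.DataFrame({"n_genes": [120, 95]}, index=["AAAAAAAAAAAA", "TCCTGTCTCTTA"])
b = pd.DataFrame({"n_genes": [80, 101]}, index=["AAAAAAAAAAAA", "CGCAAGGGAAAG"])

# Default: rows are stacked as-is, so the shared barcode appears twice
merged = pd.concat([a, b])
print(merged.index.is_unique)  # False

# Option 1: make pandas refuse overlapping labels
try:
    pd.concat([a, b], verify_integrity=True)
except ValueError as err:
    print("rejected:", err)

# Option 2: tag rows with their source, giving a unique (batch, barcode) MultiIndex
keyed = pd.concat([a, b], keys=["0", "1"])
print(keyed.index.is_unique)  # True
```

Option 1 fails loudly, while option 2 keeps all cells and disambiguates them by origin, which is closer to what a merge of independent experiments usually needs.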
@dawe Thanks again for your pull request. I also uploaded a new anndata version that incorporates it to PyPI.
Cheers, Alex
So, this is solved in anndata 0.5 and scanpy 0.4.3. See the release notes (https://scanpy.readthedocs.io) and https://github.com/theislab/anndata/commit/63500075e926f202e856bd04ec673df55bbd2460 and the example.
Hope this is a meaningful default. If you pass index_unique=None, then it keeps the previous indices, including duplicates.
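In current anndata releases, AnnData.concatenate documents index_unique='-' as the default: each observation name is joined with its batch category ('0', '1', ... unless batch_categories is given). Whether anndata 0.5 behaved identically is an assumption. The pure-Python sketch below mimics that suffixing so the effect is easy to see; the function uniquify is hypothetical and is not anndata's implementation.

```python
def uniquify(name_lists, index_unique="-", batch_categories=None):
    """Concatenate obs-name lists, appending index_unique + batch label to each name.

    Hypothetical helper that mimics (as an assumption) anndata's suffixing;
    index_unique=None keeps the original names, duplicates included.
    """
    if batch_categories is None:
        batch_categories = [str(i) for i in range(len(name_lists))]
    out = []
    for label, names in zip(batch_categories, name_lists):
        if index_unique is None:
            out.extend(names)
        else:
            out.extend(f"{name}{index_unique}{label}" for name in names)
    return out

# Hypothetical barcodes from two datasets that share one cell name
f_names = ["AAAAAAAAAAAA", "TCCTGTCTCTTA"]
g_names = ["AAAAAAAAAAAA", "CGCAAGGGAAAG"]

print(uniquify([f_names, g_names]))
# ['AAAAAAAAAAAA-0', 'TCCTGTCTCTTA-0', 'AAAAAAAAAAAA-1', 'CGCAAGGGAAAG-1']
```

Suffixing rather than dropping keeps every cell, and the original barcode can still be recovered by splitting on the separator.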