Duplicated cell names after concatenation

In Drop-seq experiments, cell names are encoded by 12-nt barcodes. It seems that no name check is performed when merging multiple datasets in Scanpy, so barcodes that occur in more than one dataset end up as duplicated cell names:

>>> from collections import Counter
>>> import scanpy.api as sc

>>> f = sc.read("data1.txt").transpose()
>>> g = sc.read("data2.txt").transpose()
>>> c = f.concatenate(g)

>>> len(c.obs_names)
7932
>>> len(set(c.obs_names))
7890

>>> cc = Counter(c.obs_names)
>>> cc.most_common(10)
[('AAAAAAAAAAAA', 2), ('TCCTGTCTCTTA', 2), ('CGCAAGGGAAAG', 2), ('ACCCGTCTATGT', 2), ('CTCCTGTCTCTT', 2), ('TTCCTGTCTCTT', 2), ('CCCTGTCTCTTA', 2), ('CCGCTGTCTCTT', 2), ('GACAAACCTACC', 2), ('ACACTGTCTCTT', 2)]
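
The check in the session above can be reproduced without any data files. The following is a minimal sketch in plain Python; the two barcode lists are made up for illustration and merely stand in for the `obs_names` of two datasets.

```python
from collections import Counter

# Hypothetical barcodes standing in for the obs_names of two datasets.
names_a = ["AAAAAAAAAAAA", "TCCTGTCTCTTA", "GGGTACCATTCA"]
names_b = ["AAAAAAAAAAAA", "CGCAAGGGAAAG", "TCCTGTCTCTTA"]

# Plain concatenation, with no name check.
merged = names_a + names_b

# Names that occur more than once after the merge.
duplicated = sorted(name for name, n in Counter(merged).items() if n > 1)

# Surplus entries: the same quantity as 7932 - 7890 = 42 in the session above.
surplus = len(merged) - len(set(merged))

print(duplicated)
print(surplus)
```

The difference between `len(names)` and `len(set(names))` counts the surplus entries, not necessarily the number of distinct names affected; the two agree only when every repeated name occurs exactly twice.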

Issue Analytics

State:
Created 6 years ago
Comments: 8 (8 by maintainers)

Top GitHub Comments

falexwolf commented, Dec 28, 2017

Hi Davide,

thank you! Currently, we use the default behavior of pandas concatenation. I won't be able to fix this in the next few days. @flying-sheep, could you have a look and maybe figure out a meaningful option or a meaningful default to circumvent this? It should be easy to simply pass an option to pandas.concat().

@dawe Thanks again for your pull request. I also put a new anndata version that incorporates it on PyPI.

Cheers, Alex
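
For reference, the pandas behavior mentioned in this comment can be sketched as follows. The two small tables and the `n_counts` column are hypothetical; `verify_integrity` and `keys` are existing `pandas.concat` parameters, but nothing here is taken from the anndata source, and the thread does not say which option was eventually used.

```python
import pandas as pd

# Hypothetical per-cell tables indexed by barcode; one barcode is shared.
obs_a = pd.DataFrame({"n_counts": [10, 20]}, index=["AAAAAAAAAAAA", "GGGTACCATTCA"])
obs_b = pd.DataFrame({"n_counts": [30, 40]}, index=["AAAAAAAAAAAA", "CGCAAGGGAAAG"])

# Default: labels are kept as they are, so the shared barcode appears twice.
default = pd.concat([obs_a, obs_b])
print(default.index.is_unique)  # False

# verify_integrity=True raises instead of building an index with duplicates.
try:
    pd.concat([obs_a, obs_b], verify_integrity=True)
    raised = False
except ValueError:
    raised = True
print(raised)  # True

# keys=... adds an outer index level, so each (batch, barcode) pair is unique.
keyed = pd.concat([obs_a, obs_b], keys=["0", "1"])
print(keyed.index.is_unique)  # True
```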

falexwolf commented, Feb 12, 2018

So, this is solved in anndata 0.5 and scanpy 0.4.3. See the release notes (https://scanpy.readthedocs.io), the commit https://github.com/theislab/anndata/commit/63500075e926f202e856bd04ec673df55bbd2460, and the example.

Hope this is a meaningful default. If you pass index_unique=None, then it keeps the previous indices including duplicates.
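
The effect of `index_unique` can be illustrated in plain Python. This is only a sketch of the idea, not anndata's implementation: `concat_names` is a hypothetical helper, and it assumes that the default joins each name with its batch number using a separator such as `"-"`, which the comment above does not spell out.

```python
def concat_names(batches, index_unique="-"):
    """Join per-batch name lists (hypothetical helper, not part of anndata).

    If index_unique is a string, each name gets the separator and its
    batch number appended; if it is None, names are kept unchanged.
    """
    merged = []
    for batch, names in enumerate(batches):
        for name in names:
            if index_unique is None:
                merged.append(name)
            else:
                merged.append(f"{name}{index_unique}{batch}")
    return merged


# Made-up barcodes; the first one occurs in both batches.
batches = [["AAAAAAAAAAAA", "GGGTACCATTCA"], ["AAAAAAAAAAAA", "CGCAAGGGAAAG"]]

unique_names = concat_names(batches)                   # suffixed, all distinct
kept_names = concat_names(batches, index_unique=None)  # duplicates preserved

print(unique_names)
print(kept_names)
```

With the suffix, the shared barcode becomes two distinct names, one per batch; with `index_unique=None` it appears twice, as in the original report.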
