testing cooler on capture Hi-C data
Hi @nvictus,
As quickly discussed, I tried to build a cooler object for a 3 Mb Hi-C region from capture Hi-C data. I'm running into several issues in the tests I ran.
I have a bed file with 1 kb genomic intervals covering chrX:150125000-153125000.
Accordingly, I extracted my pairs within the same genomic range:
>>zcat contacts.txt.gz | head -1
NB501764:1043:HM5FKBGXF:2:23302:4878:10125 chrX 150125462 chrX 150163625 + +
>>zcat contacts.txt.gz | tail -1
NB501764:1043:HM5FKBGXF:3:21505:23621:19037 chrX 153121783 chrX 153123912 - -
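A quick sanity check on the extraction can be scripted. The sketch below parses the two records shown above, assuming whitespace-separated columns in the order read ID, chrom1, pos1, chrom2, pos2, strand1, strand2, and confirms that both mates fall inside the target window. The function name `in_region` is hypothetical and only for illustration.

```python
# Target window taken from the post: chrX:150125000-153125000
REGION = ("chrX", 150_125_000, 153_125_000)

def in_region(line, region=REGION):
    """Return True if both mates of a pair record lie inside the region."""
    fields = line.split()
    chrom1, pos1 = fields[1], int(fields[2])
    chrom2, pos2 = fields[3], int(fields[4])
    chrom, start, end = region
    return (chrom1 == chrom and chrom2 == chrom
            and start <= pos1 < end and start <= pos2 < end)

# First and last records of contacts.txt.gz, copied from the post.
first = "NB501764:1043:HM5FKBGXF:2:23302:4878:10125 chrX 150125462 chrX 150163625 + +"
last = "NB501764:1043:HM5FKBGXF:3:21505:23621:19037 chrX 153121783 chrX 153123912 - -"

print(in_region(first), in_region(last))
```

Both records pass the check, so the pairs themselves look consistent with the bed file's range.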
Then I simply tried to ingest the data with cload pairs:
>>cooler cload pairs -c1 2 -p1 3 -c2 4 -p2 5 target.bed contacts.txt.gz test.cool
INFO:cooler.create:Writing chunk 0: /data/kdi_prod/.kdi/project_workspace_0/1309/acl/01.00/downstreamAnalysis/scripts/tmpenmswn78.multi.cool::0
INFO:cooler.create:Creating cooler at "/data/kdi_prod/.kdi/project_workspace_0/1309/acl/01.00/downstreamAnalysis/scripts/tmpenmswn78.multi.cool::/0"
INFO:cooler.create:Writing chroms
INFO:cooler.create:Writing bins
INFO:cooler.create:Writing pixels
INFO:cooler.create:Writing indexes
INFO:cooler.create:Writing info
INFO:cooler.create:Done
INFO:cooler.create:Merging into test.cool
INFO:cooler.create:Creating cooler at "test.cool::/"
INFO:cooler.create:Writing chroms
INFO:cooler.create:Writing bins
INFO:cooler.create:Writing pixels
INFO:cooler.reduce:nnzs: [377101]
INFO:cooler.reduce:current: [0]
Traceback (most recent call last):
File "/data/users/nservant/projects_analysis/kdi_home/conda/nf-core-hic/bin/cooler", line 10, in <module>
sys.exit(cli())
File "/data/users/nservant/projects_analysis/kdi_home/conda/nf-core-hic/lib/python3.7/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/data/users/nservant/projects_analysis/kdi_home/conda/nf-core-hic/lib/python3.7/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/data/users/nservant/projects_analysis/kdi_home/conda/nf-core-hic/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/data/users/nservant/projects_analysis/kdi_home/conda/nf-core-hic/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/data/users/nservant/projects_analysis/kdi_home/conda/nf-core-hic/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "conda/nf-core-hic/lib/python3.7/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "conda/nf-core-hic/lib/python3.7/site-packages/cooler/cli/cload.py", line 492, in pairs
ordered=False
File "conda/nf-core-hic/lib/python3.7/site-packages/cooler/create/_create.py", line 944, in create_cooler
max_merge=max_merge)
File "conda/nf-core-hic/lib/python3.7/site-packages/cooler/create/_create.py", line 687, in create_from_unordered
**kwargs)
File "conda/nf-core-hic/lib/python3.7/site-packages/cooler/create/_create.py", line 577, in create
file_path, target, meta.columns, iterable, h5opts, lock)
File "conda/nf-core-hic/lib/python3.7/site-packages/cooler/create/_create.py", line 213, in write_pixels
for i, chunk in enumerate(iterable):
File "conda/nf-core-hic/lib/python3.7/site-packages/cooler/reduce.py", line 163, in __iter__
ignore_index=True)
File "conda/nf-core-hic/lib/python3.7/site-packages/pandas/core/reshape/concat.py", line 255, in concat
sort=sort,
File "conda/nf-core-hic/lib/python3.7/site-packages/pandas/core/reshape/concat.py", line 304, in __init__
raise ValueError("No objects to concatenate")
ValueError: No objects to concatenate
Then I tried cooler cload pairix, after sorting the pairs with cooler csort:
cooler csort -c1 2 -p1 3 -s1 6 -c2 4 -p2 5 -s2 7 -o contacts.csort.txt.gz contacts.txt.gz chrX.sizes
cooler cload pairix target.bed contacts.csort.txt.gz test.cool
INFO:cooler.cli.csort:Enumerating requested chromosomes...
INFO:cooler.cli.csort:chrX 1
INFO:cooler.cli.csort:Input: '../Jarid_NPC_B1/cooler/Jarid_NPC_B1_contacts.txt.gz'
INFO:cooler.cli.csort:Output: '../Jarid_NPC_B1/cooler/Jarid_NPC_B1_contacts.csort.txt.gz'
INFO:cooler.cli.csort:Reordering pair mates and sorting pair records...
INFO:cooler.cli.csort:Sort order: block (chrom1, chrom2, pos1, pos2)
INFO:cooler.cli.csort:sort -k2,2 -k4,4 -k3,3n -k5,5n --parallel=8 --buffer-size=50%
INFO:cooler.cli.csort:Indexing...
INFO:cooler.cli.csort:Indexer: pairix
INFO:cooler.cli.csort:pairix -f -s2 -d4 -b3 -e3 -u5 -v5 ../Jarid_NPC_B1/cooler/Jarid_NPC_B1_contacts.csort.txt.gz
INFO:cooler.cli.cload:Using 8 cores
INFO:cooler.create:Creating cooler at "../Jarid_NPC_B1/cooler/Jarid_NPC_B1_1000.cool::/"
INFO:cooler.create:Writing chroms
INFO:cooler.create:Writing bins
INFO:cooler.create:Writing pixels
INFO:cooler.create:Binning chrX:150125000-151625000|*
INFO:cooler.create:Binning chrX:151625000-153125000|*
INFO:cooler.create:Finished chrX:151625000-153125000|*
INFO:cooler.create:Finished chrX:150125000-151625000|*
Traceback (most recent call last):
File "conda/nf-core-hic/bin/cooler", line 10, in <module>
sys.exit(cli())
File "conda/nf-core-hic/lib/python3.7/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "conda/nf-core-hic/lib/python3.7/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "conda/nf-core-hic/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "conda/nf-core-hic/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "conda/nf-core-hic/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "conda/nf-core-hic/lib/python3.7/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "conda/nf-core-hic/lib/python3.7/site-packages/cooler/cli/cload.py", line 238, in pairix
ordered=True)
File "conda/nf-core-hic/lib/python3.7/site-packages/cooler/create/_create.py", line 925, in create_cooler
lock=lock)
File "conda/nf-core-hic/lib/python3.7/site-packages/cooler/create/_create.py", line 577, in create
file_path, target, meta.columns, iterable, h5opts, lock)
File "conda/nf-core-hic/lib/python3.7/site-packages/cooler/create/_create.py", line 213, in write_pixels
for i, chunk in enumerate(iterable):
File "conda/nf-core-hic/lib/python3.7/site-packages/cooler/create/_ingest.py", line 292, in _validate_pixels
"Found a bin ID that exceeds the declared number of bins. "
cooler.create._ingest.BadInputError: Found a bin ID that exceeds the declared number of bins. Check whether your bin table is correct.
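One possible reading of this error, offered as a hypothesis rather than a confirmed diagnosis: if bin IDs for fixed-width bins are derived from genomic position as `pos // binsize`, counted from the start of the chromosome, then a bin table covering only chrX:150125000-153125000 declares far fewer bins than the IDs that arithmetic produces. The sketch below is plain arithmetic on the numbers from this post, not cooler's actual code.

```python
BINSIZE = 1_000
REGION_START, REGION_END = 150_125_000, 153_125_000

# Number of 1 kb bins declared by a bed file covering only the target region.
n_declared = (REGION_END - REGION_START) // BINSIZE

# Assumed ID scheme: bins counted from position 0 of the chromosome.
pos1 = 150_125_462  # first mate of the first record in contacts.txt.gz
bin_from_chrom_start = pos1 // BINSIZE

# ID the same read would get if bins were counted from the region start.
bin_from_region_start = (pos1 - REGION_START) // BINSIZE

print(n_declared, bin_from_chrom_start, bin_from_region_start)
# prints: 3000 150125 0
```

Under that assumption, the very first read already maps to an ID (150125) well beyond the 3000 declared bins, which would be consistent with the message above.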
The csort command works (although I passed the size of the entire chrX here; I'm not sure what else to put), but cload pairix also crashed.
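Going by the csort log above ("Reordering pair mates and sorting pair records", then sorting on chrom1, chrom2, pos1, pos2), the chromsizes file appears to serve mainly to define which chromosomes are kept and in what order. The following is a rough sketch of that reorder-and-sort step in plain Python under that assumption. It is not cooler's implementation; `flip_pair` and `sort_pairs` are made-up names, the read IDs are invented, and dropping pairs on unlisted chromosomes is a guess.

```python
def flip_pair(rec, chrom_rank):
    """Swap mates so that mate 1 sorts before mate 2 (upper triangle)."""
    read_id, c1, p1, c2, p2, s1, s2 = rec
    if (chrom_rank[c1], p1) > (chrom_rank[c2], p2):
        return (read_id, c2, p2, c1, p1, s2, s1)
    return rec

def sort_pairs(records, chrom_order):
    """Keep pairs on listed chromosomes, flip mates, then sort the records."""
    rank = {c: i for i, c in enumerate(chrom_order)}
    kept = [flip_pair(r, rank) for r in records
            if r[1] in rank and r[3] in rank]
    # Mirrors `sort -k2,2 -k4,4 -k3,3n -k5,5n` from the log:
    # chrom1, chrom2 as text, then pos1, pos2 as numbers.
    return sorted(kept, key=lambda r: (r[1], r[3], r[2], r[4]))

# Illustrative records: positions come from the post, but the second record
# has its mates deliberately swapped to show the flip.
records = [
    ("read_b", "chrX", 153123912, "chrX", 153121783, "-", "-"),
    ("read_a", "chrX", 150125462, "chrX", 150163625, "+", "+"),
]
for rec in sort_pairs(records, ["chrX"]):
    print(*rec, sep="\t")
```

In this picture the chromosome length itself plays no role in the sort, so passing the full chrX size would be harmless at this step.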
Of note, I also reported the same error in cooltools earlier, when trying to bin a small genome: https://github.com/open2c/cooltools/issues/237
>>cooler -V
cooler, version 0.8.6.post0
Issue Analytics
- State:
- Created 2 years ago
- Comments:9 (4 by maintainers)
Top GitHub Comments
You don’t even need to use them for balancing, just store them in the cooler so you can see them if you want… In my limited experience, balancing small regions is unfortunately not 100% reliable in general; the filtering sometimes needs to be modified. But it would be good to assemble some test data to check how the tools work with it.
I just wanted to say that I always use whole-genome binning, as if it was whole-genome Hi-C, and never had any issues like that. Just provide a blacklist to balancing, which would make it ignore most of the genome.
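The whole-chromosome approach suggested here can be sketched as follows: bin the full chromosome at 1 kb and list everything outside the capture region as a blacklist for balancing. This is a minimal illustration in plain Python. The chromosome length below is a placeholder (the real value would come from the chrX.sizes file, which is not shown in the issue), and `make_bins` and `blacklist_outside` are hypothetical helper names.

```python
def make_bins(chrom, length, binsize):
    """Yield (chrom, start, end) bins tiling the chromosome from position 0."""
    for start in range(0, length, binsize):
        yield (chrom, start, min(start + binsize, length))

def blacklist_outside(chrom, length, region_start, region_end):
    """Return the intervals of the chromosome outside the capture region."""
    intervals = []
    if region_start > 0:
        intervals.append((chrom, 0, region_start))
    if region_end < length:
        intervals.append((chrom, region_end, length))
    return intervals

CHROM_LENGTH = 160_000_000  # placeholder, NOT the real chrX size
bins = list(make_bins("chrX", CHROM_LENGTH, 1_000))
blacklist = blacklist_outside("chrX", CHROM_LENGTH, 150_125_000, 153_125_000)

print(len(bins))
for chrom, start, end in blacklist:
    print(chrom, start, end, sep="\t")
```

Written out as BED files, the bins would take the place of target.bed in the cload call, and the blacklist would be handed to the balancing step, assuming the installed cooler version accepts a blacklist there.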