Question about TF_distance_matrix.txt

Hi, dear developer. I have some questions about TF_distance_matrix.txt.

  1. According to the explanation below:

Distance matrix used to cluster the transcription factors in the bindetect_figures-dendrograms. This is based on the overlap of individual transcription factor binding sites.

The distance value is calculated from the TFBS, so does it come from <outdir>/<TF>/beds/<TF>_all.bed? If so, is the distance tree on every page of bindetect_figures.pdf the same?

  2. What is the calculation method for the distance? Is it the Jaccard index?

  3. Can I get the distance values for a specific condition using <outdir>/<TF>/beds/<TF>_<condition>_bound.bed? I think it may offer additional information.

  4. Is "Cluster Motifs: Cluster motifs and create consensus motifs based on similarity" related to the distance value? I think clusterMotifs is based only on motif similarity, without considering the TFBS.

Thanks for this wonderful tool 👍 😃 Best wishes, Guandong Shang

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

1 reaction
msbentsen commented, Oct 28, 2020

Hi Guandong Shang,

The distance is calculated from TFBS overlap to give an idea of possible false-positive footprints within motif families. For example, if a factor such as GATA4 is found to have a footprint, the distances will show that GATA2, GATA3, etc. are also very similar. We therefore cannot be sure which of the proteins is actually causing the footprint.
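To make the idea of an overlap-based distance concrete, here is a minimal sketch. The thread does not state the exact formula TOBIAS uses, so the Jaccard-style distance on covered base pairs below is an assumption for illustration only, and the site coordinates are made up. Factors from the same motif family end up close together because their predicted sites largely coincide.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), half-open as in BED


def merge(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals on the same chromosome."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1] = (chrom, merged[-1][1], max(merged[-1][2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def covered_bp(intervals: List[Interval]) -> int:
    """Number of base pairs covered by at least one interval."""
    return sum(end - start for _, start, end in merge(intervals))


def jaccard_distance(a: List[Interval], b: List[Interval]) -> float:
    """1 - (shared bp / union bp); an assumed metric, not confirmed as TOBIAS's."""
    union = covered_bp(a + b)
    if union == 0:
        return 0.0
    shared = covered_bp(a) + covered_bp(b) - union
    return 1.0 - shared / union


# Hypothetical sites: the two GATA factors nearly coincide, SOX2 is elsewhere.
gata4 = [("chr1", 100, 110), ("chr1", 200, 210)]
gata2 = [("chr1", 102, 112), ("chr1", 200, 210)]
sox2 = [("chr2", 50, 60)]

print(round(jaccard_distance(gata4, gata2), 3))  # small distance
print(jaccard_distance(gata4, sox2))             # no shared bases
```

A small distance here only says that the two site sets overlap; it does not say which factor is actually bound.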

You could probably calculate a distance based on the cooperation of TFs, but this is a different problem from the one TOBIAS solves. You can create a TF x peaks matrix from the “TOBIAS BINDetect” output files, as you have all the information about which peaks each TFBS was found within. Sounds interesting, but it is not something that I am going to get into at this point 😃
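A rough sketch of what such a TF x peaks matrix could look like is given below. The input dictionary, the peak IDs, and the function names are hypothetical; how peak membership is read from the BINDetect output files is not covered in this thread, so that step is left out.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def tf_by_peak_matrix(
    sites: Dict[str, Set[str]]
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Binary matrix: rows are TFs, columns are peaks, 1 if the TF has a site in the peak."""
    tfs = sorted(sites)
    peaks = sorted(set().union(*sites.values()))
    matrix = [[1 if peak in sites[tf] else 0 for peak in peaks] for tf in tfs]
    return tfs, peaks, matrix


def shared_peak_counts(sites: Dict[str, Set[str]]) -> Dict[Tuple[str, str], int]:
    """Number of peaks in which both TFs of a pair have a site."""
    return {
        (a, b): len(sites[a] & sites[b])
        for a, b in combinations(sorted(sites), 2)
    }


# Hypothetical input: TF name -> IDs of the peaks containing one of its sites.
sites = {
    "GATA4": {"peak_1", "peak_2"},
    "TBX5": {"peak_2", "peak_3"},
    "SOX2": {"peak_4"},
}

tfs, peaks, matrix = tf_by_peak_matrix(sites)
for tf, row in zip(tfs, matrix):
    print(tf, row)
print(shared_peak_counts(sites))
```

Counting shared peaks is only one possible starting point for a co-occurrence measure; any normalization or statistical test on top of it would be a separate design decision.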

Best, Mette

1 reaction
msbentsen commented, Oct 23, 2020

I have just released TOBIAS 0.12.3 containing a utility script called cluster_sites_by_overlap.py, which creates the distance matrix and dendrogram for a subset of sites. As an example, to get the clustering of sites in the bound subsets, you can run it with: cluster_sites_by_overlap.py --bedfiles BINDetect_output/*/beds/*Bcell_bound.bed

Because of the internal normalization in BINDetect, it is not possible to calculate the differential footprint scores on a subset of sites (as the scores would then be shifted with regard to the background scores). So this plot only gives you the dendrogram of overlapping sites. I hope it helps you nonetheless!
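If you want to run the command above for several conditions from a script rather than relying on shell globbing, a small helper along these lines may be convenient. This is a sketch under assumptions: the helper name is hypothetical, the directory layout is taken from the example command, and only the --bedfiles option mentioned above is used.

```python
import glob
import os
from typing import List


def build_cluster_command(outdir: str, condition: str) -> List[str]:
    """Collect the <TF>_<condition>_bound.bed files and build the argument list."""
    pattern = os.path.join(outdir, "*", "beds", f"*{condition}_bound.bed")
    bedfiles = sorted(glob.glob(pattern))
    if not bedfiles:
        raise FileNotFoundError(f"no bed files match {pattern}")
    return ["cluster_sites_by_overlap.py", "--bedfiles", *bedfiles]
```

The returned list can be passed to subprocess.run(cmd, check=True), for example with build_cluster_command("BINDetect_output", "Bcell"). Raising an error when nothing matches avoids launching the script with an empty file list.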
