Soliciting feedback on plans to merge Picard + GATK4

For many years now we’ve been hearing from users of both GATK and Picard about how they’d love to see the two projects unite into a single “toolkit-to-rule-them-all”, for the sake of user convenience, to promote consistency across tools, and to minimize duplication of effort.

With the advent of GATK 4 this suddenly became a real possibility, as the decision was made to start the new GATK codebase from the Picard base classes rather than the old GATK 3.x base classes. This allows free-form Picard-style tools and GATK “walkers” built upon an engine traversal to peacefully co-exist within the same framework. Last year, a Picard engineer successfully ported all Picard tools to the GATK 4 codebase with only minor changes to the tools themselves. More recently, efforts have been made to harmonize the build systems of the two projects, resulting in Picard’s recent move to Gradle.

Importantly, the core GATK 4 codebase at https://github.com/broadinstitute/gatk is released entirely under the BSD 3-clause license, a big improvement over the confusing licensing situation in GATK 3.x, with its mix of open-source and proprietary licenses within the same repository. That is where any Picard tools moved to the GATK 4 codebase would live, remaining fully open source and free for all.

As all of the technical pieces are now in place to allow for a merger of the two projects (with the guarantee that the open-source nature of Picard code will be preserved) we are soliciting feedback from the Picard developer community about the prospect of a union with GATK. Would people here be generally in favor of such a move? Are there any strong objections to this idea? Any concerns that should be addressed before we head any further down this path?

To help start the discussion, here is the case in favor of a GATK4 + Picard merger as we see it:

  • It would greatly cut down on duplication of effort and re-invention of functionality already existing in one project or the other – Picard devs could directly use open-source GATK functionality/utilities, and vice versa.
  • Bug fixes and new features such as CRAM support could more easily propagate to all the tools in the GATK+Picard ecosystem, and the tools overall could be made more consistent and easier to use.
  • Picard and GATK devs could better share knowledge and give each other feedback, and with more eyes looking at the code and more shared developer resources available the overall quality of the tools should increase.
  • End-users’ lives would be greatly simplified by having all of the tools in one place, with a unified interface, a common set of conventions and a single place to go to for documentation, help and updates.
  • Along similar lines, the ability of the GATK/Picard support team to provide in-depth support for the tools would be greatly enhanced. Among other things this would speed up plans to provide developer support and API documentation.
  • Only minimal changes to the Picard tools would be required. The biggest change would be a switch to standard POSIX-style command-line arguments (e.g., --argName value instead of argName=value).
  • The work of porting has already largely been done, and in the process, test coverage of the ported versions of the Picard tools has been improved and bugs have been fixed.
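
To make the command-line change concrete, here is a minimal sketch of how an existing Picard-style argument list could be rewritten mechanically into the POSIX style described above. The helper name `to_posix_args` is hypothetical and is not part of Picard or GATK. The sketch assumes argument names stay the same after the switch, which may not hold for every tool, and the tool and argument names in the usage note are only illustrative.

```python
import re

# A Picard-style argument is a single token of the form NAME=value.
# Tokens that do not match (for example the tool name) are passed through.
_PICARD_ARG = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)=(.*)$", re.DOTALL)


def to_posix_args(args):
    """Rewrite each NAME=value token as the two tokens --NAME and value.

    Assumption: argument names are kept unchanged by the migration.
    Limitation: a positional token that happens to look like NAME=value
    would also be rewritten.
    """
    converted = []
    for token in args:
        match = _PICARD_ARG.match(token)
        if match:
            name, value = match.groups()
            converted.extend([f"--{name}", value])
        else:
            converted.append(token)
    return converted
```

For instance, `to_posix_args(["MergeSamFiles", "INPUT=a.bam", "OUTPUT=merged.bam"])` gives `["MergeSamFiles", "--INPUT", "a.bam", "--OUTPUT", "merged.bam"]`. A wrapper like this could ease the transition for existing workflows, though it would not cover any arguments that end up being renamed.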

In short: we’re stronger together than apart, right? 😃

We realize that such a move is never trivial, but we’re prepared to put resources into making the migration as painless as possible for everyone, and we believe that it would take us all to a better situation than what we have right now.

What do you think? Let the discussion begin!

Issue Analytics

  • State: closed
  • Created: 7 years ago
  • Comments: 13 (9 by maintainers)

Top GitHub Comments

2 reactions
lordzappo commented, Sep 9, 2016

ibtl

1 reaction
alecw commented, Sep 19, 2016

I’ve been away so I apologize for being late to this discussion. I have a few thoughts, some of which are merely echoing what others have written above:

  • I have a number of workflows that use the Picard-style CLP, both to invoke Picard tools and our own tools that use the Picard CLP. Converting all my workflows would be a non-trivial effort. In addition, there are almost 600 subscribers to the Dropseq Google group. I’m not sure how many of them are active, but any who are would eventually also need to port their workflows.
  • I am concerned that I’m going to run into issues building with all the additional dependencies. I don’t use Gradle. I don’t use Maven. I use plain old Ant, with the dependencies copied into my project.
  • I have been burned before by the enormous entourage of features that has been the GATK in the past. E.g. I spent a while figuring out that jobs were taking a long time because GATK was calling javax.crypto.Cipher.getInstance() so that it could phone home, and for some reason that process needed to list the temp directory, and the temp directory was getting filled up because of something else, etc., etc. Picard is lean. Historically, at least, GATK has not been, and I fear I will spend a lot of time dealing with the unintended consequences of something that someone thought was a good idea.

-Alec
