Soliciting feedback on plans to merge Picard + GATK4
For many years now we’ve been hearing from users of both GATK and Picard about how they’d love to see the two projects unite into a single “toolkit-to-rule-them-all”, for the sake of user convenience, to promote consistency across tools, and to minimize duplication of effort.
With the advent of GATK 4 this suddenly became a real possibility, because the decision was made to build the new GATK codebase on the Picard base classes rather than the old GATK 3.x base classes. This allows free-form Picard-style tools and GATK “walkers” built on an engine traversal to co-exist peacefully within the same framework. Last year, a Picard engineer successfully ported all Picard tools to the GATK 4 codebase with only minor changes to the tools themselves. More recently, efforts have been made to harmonize the build systems of the two projects, resulting in Picard’s recent move to Gradle.
Importantly, the core GATK 4 codebase at https://github.com/broadinstitute/gatk is released entirely under the BSD 3-clause license. This is a big improvement over the confusing licensing situation in GATK 3.x, which mixed open-source and proprietary licenses within the same repository. That repository is where any Picard tools moved to the GATK 4 codebase would live, remaining fully open source and free for all.
As all of the technical pieces are now in place to allow for a merger of the two projects (with the guarantee that the open-source nature of Picard code will be preserved) we are soliciting feedback from the Picard developer community about the prospect of a union with GATK. Would people here be generally in favor of such a move? Are there any strong objections to this idea? Any concerns that should be addressed before we head any further down this path?
To help start the discussion, here is the case in favor of a GATK4 + Picard merger as we see it:
- It would greatly cut down on duplication of effort and re-invention of functionality already existing in one project or the other – Picard devs could directly use open-source GATK functionality/utilities, and vice versa.
- Bug fixes and new features such as CRAM support could more easily propagate to all the tools in the GATK+Picard ecosystem, and the tools overall could be made more consistent and easier to use.
- Picard and GATK devs could better share knowledge and give each other feedback, and with more eyes on the code and more shared developer resources available, the overall quality of the tools should increase.
- End-users’ lives would be greatly simplified by having all of the tools in one place, with a unified interface, a common set of conventions and a single place to go to for documentation, help and updates.
- Along similar lines, the ability of the GATK/Picard support team to provide in-depth support for the tools would be greatly enhanced. Among other things this would speed up plans to provide developer support and API documentation.
- Only minimal changes to the Picard tools would be required. The biggest change would be a switch to standard POSIX-style command-line arguments (e.g., --argName value instead of argName=value).
- The work of porting has already largely been done, and in the process, test coverage of the ported versions of the Picard tools has been improved and bugs have been fixed.
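To make the argument-syntax change concrete, here is a minimal sketch of how a legacy Picard-style command line (argName=value) could be mechanically rewritten into the POSIX-style form (--argName value). The helper name to_posix_args is hypothetical and is not part of Picard or GATK. It assumes every argument becomes a long option prefixed with "--"; real tools also accept single-dash short names, which this sketch ignores.

```python
def to_posix_args(legacy_args):
    """Rewrite Picard-style NAME=VALUE tokens as POSIX-style --NAME VALUE pairs.

    Hypothetical helper for illustration only: every argument is assumed to be
    a long option ("--NAME"); short single-dash names are not handled.
    Tokens without "=" (e.g. the tool name) are passed through unchanged.
    """
    converted = []
    for token in legacy_args:
        # Split on the first "=" only, so values that contain "=" stay intact.
        name, sep, value = token.partition("=")
        if sep and name:
            converted.extend([f"--{name}", value])
        else:
            converted.append(token)
    return converted


if __name__ == "__main__":
    legacy = ["MarkDuplicates", "INPUT=in.bam", "OUTPUT=out.bam", "METRICS_FILE=m.txt"]
    # prints: MarkDuplicates --INPUT in.bam --OUTPUT out.bam --METRICS_FILE m.txt
    print(" ".join(to_posix_args(legacy)))
```

Splitting on the first "=" keeps the conversion purely syntactic. It does not check whether a given argument name still exists in the ported tool.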
In short: we’re stronger together than apart, right? 😃
We realize that such a move is never trivial, but we’re prepared to put resources into making the migration as painless as possible for everyone, and we believe that it would take us all to a better situation than what we have right now.
What do you think? Let the discussion begin!
Top GitHub Comments
I’ve been away, so I apologize for being late to this discussion. I have a few thoughts, some of which are merely echoing what others have written above:
-Alec