
Changing feature labels from gbk record


Hi Zulko,

Love the package! Exactly what I’ve been looking for.

I have a couple of queries about the best way to implement the following things:

Firstly, I’d like to colour each CDS individually using Matplotlib’s colour scales. How would you advise implementing this? I’ve been using the code below, so I’m thinking I could iterate over a list of colours and zip it with the feature list so that each feature is paired with its own colour. Any better suggestions?

Secondly, is there a way to access additional options for labelling features? In the documented examples, you’ve shown how to forcibly rename all the CDS features to a specific string ("CDS here"). I’d like to label some or all of my features with the /product qualifier from a GenBank file, as it’s more informative than the locus tags. Is there a way to access these in the .features object? I can’t see anything that corresponds to this.

More generally, could I put in a request for some more ‘fully featured’ examples in the documentation that could be deconstructed (if you get the time, of course!)? I’d like to learn how to use the package in much more depth in future.

Many thanks!

import os, platform, sys
from os.path import expanduser
home = expanduser("~")
import matplotlib
if platform.system() == "Darwin":
    matplotlib.use('TkAgg') # Avoid python framework errors on OSX


__author__ = "Joe R. J. Healey"
__version__ = "1.0.0"
__title__ = "PrettyPlotter"
__license__ = "GPLv3"
__author_email__ = "J.R.J.Healey@warwick.ac.uk"


from dna_features_viewer import BiopythonTranslator

class MyCustomTranslator(BiopythonTranslator):
    """Custom translator implementing the following theme:

    - Color terminators in green, CDS in blue, all other features in gold.
    - Do not display "gene" features.
    - Label CDS features with their /product description instead of the
      locus tag (not working yet).

    """

    def compute_feature_color(self, feature):
        if feature.type == "CDS":
            return "blue"
        # maybe zip() together iterated features and a list of colours from a colour scale rather than
        # having all features the same colour?
        elif feature.type == "terminator":
            return "green"
        else:
            return "gold"

    def compute_feature_label(self, feature):
        if feature.type == "CDS":
            # TODO: how to get the /product description to replace the CDS name/locus tag?
            pass
        # Fall back to the default label
        return super().compute_feature_label(feature)

    def compute_filtered_features(self, features):
        """Do not display "gene" features."""
        return [feature for feature in features if feature.type != "gene"]


def parse_args():
    """Parse commandline arguments"""
    import argparse

    parser = argparse.ArgumentParser(
        description='Make pretty images from sequence files (.dna/.gbk/.gff).')
    parser.add_argument(
        'seqfile',
        action='store',
        help='Input sequence file. Supported filetypes are GenBank, GFF, and SnapGene\'s .dna')
    parser.add_argument(
        'image',
        action='store',
        help='Output image filename with extension')

    # On invalid options, argparse prints a usage message and exits on its own,
    # so no try/except is needed (a bare except would also swallow --help).
    return parser.parse_args()


def main():

    args = parse_args()

    if os.path.splitext(args.seqfile)[1] == '.dna':
        print("Input file is a SnapGene DNA file. Calling snapgene_reader to convert to BioPython.")
        from snapgene_reader import snapgene_file_to_seqrecord
        seqrecord = snapgene_file_to_seqrecord(args.seqfile)
    else:
        # translate_record() also accepts a path to a GenBank file
        seqrecord = args.seqfile

    graphic_record = MyCustomTranslator().translate_record(seqrecord)
    ax, _ = graphic_record.plot(figure_width=10)
    ax.figure.tight_layout()
    ax.figure.savefig(args.image)

if __name__ == '__main__':
    main()


Top GitHub Comments


Zulko commented, Mar 26, 2018

Hey there,

I agree more examples would be a good thing. There are several possible answers to your questions.

First, keep in mind that the feature objects handled by the BiopythonTranslator are Biopython SeqFeature objects, so the Biopython docs will tell you everything about their structure.
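For reference, here is a minimal sketch of the parts of a Biopython SeqFeature that a translator usually reads: type, location, and the qualifiers dictionary, whose values are lists of strings (which is why a /product tag comes back as a list). The feature is built by hand purely for illustration, and the locus tag and product name are made up; in practice the feature would come from a parsed GenBank record or from snapgene_file_to_seqrecord.

```python
# Requires Biopython. FeatureLocation is the long-standing class name;
# newer Biopython releases also call it SimpleLocation.
from Bio.SeqFeature import SeqFeature, FeatureLocation

# A hand-built CDS feature (hypothetical locus tag and product, illustration only)
feature = SeqFeature(
    FeatureLocation(0, 300, strand=1),
    type="CDS",
    qualifiers={"locus_tag": ["ABC_0001"], "product": ["DNA polymerase III subunit beta"]},
)

print(feature.type)                          # the GenBank feature key
print(feature.location)                      # start, end and strand
print(feature.qualifiers["product"])         # a list of strings, not a plain string
print(feature.qualifiers.get("gene", []))    # use .get() for qualifiers that may be missing
```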

Also, if a feature has a “color” or “label” qualifier, the BiopythonTranslator uses it by default (unless you have a custom BiopythonTranslator that overrides this behavior). That means that instead of putting your logic in a custom translator, you can also pre-process your Biopython record before you feed it to the translator:

seqrecord = snapgene_file_to_seqrecord(args.seqfile)
for feature in seqrecord.features:
    feature.qualifiers["color"] = 'blue'
    # ... implement other rules
graphic_record = BiopythonTranslator().translate_record(seqrecord)

That being said, here are some ways of doing what you want from within the translator:

For the gene product, I would do it as follows:

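def compute_feature_label(self, feature):
    if feature.type == "CDS":
        # feature.qualifiers["product"] works too, but it returns a list of
        # strings and raises a KeyError if the feature has no product.
        # More robust: handles the case with 0 or 2+ products
        return " ".join(feature.qualifiers.get("product", [""]))
    # Default label for all other features
    return super().compute_feature_label(feature)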
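The same fallback idea can be pushed a little further. Below is a sketch of a label helper that prefers /product, then /gene, then /locus_tag, and finally the feature type. The name pick_label and the order of preference are assumptions, not part of dna_features_viewer; SimpleNamespace objects stand in for Biopython SeqFeatures (only .type and .qualifiers are used), so the snippet runs without Biopython.

```python
from types import SimpleNamespace

# Qualifiers to try, in order of preference (an assumed order; adjust to taste)
LABEL_QUALIFIERS = ("product", "gene", "locus_tag")


def pick_label(feature):
    """Return a label from the first non-empty qualifier, else the feature type.

    GenBank qualifier values are lists of strings, so multiple values are joined.
    """
    for key in LABEL_QUALIFIERS:
        values = feature.qualifiers.get(key, [])
        if values:
            return " ".join(values)
    return feature.type


# Stand-ins for Biopython SeqFeature objects (hypothetical data)
cds = SimpleNamespace(type="CDS", qualifiers={"locus_tag": ["ABC_0001"], "product": ["thioredoxin"]})
unnamed = SimpleNamespace(type="CDS", qualifiers={"locus_tag": ["ABC_0002"]})
bare = SimpleNamespace(type="terminator", qualifiers={})

for f in (cds, unnamed, bare):
    print(pick_label(f))
```

Inside a custom translator, compute_feature_label could then simply return pick_label(feature) for the feature types you care about.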

For iterating through colors, I would do it as follows:

from itertools import cycle
color_iterator = cycle(['blue', 'red', 'green', 'purple'])

def compute_feature_color(self, feature):
    if feature.type == "CDS":
        return next(color_iterator)
    # Default color for all other features
    return super().compute_feature_color(feature)
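One thing to watch with a shared cycle() is that the colour a CDS receives depends on the order in which compute_feature_color happens to be called, so translating the same record twice can colour a given gene differently. A sketch of one way around this, assuming each CDS carries a /locus_tag qualifier, is to remember the colour already given to each locus tag. The helper name make_color_picker is hypothetical, and SimpleNamespace again stands in for a Biopython SeqFeature.

```python
from itertools import cycle
from types import SimpleNamespace


def make_color_picker(palette, default="gold"):
    """Return a function mapping CDS features to colours, stable per locus tag."""
    colors = cycle(palette)
    assigned = {}  # locus tag -> colour already handed out

    def pick(feature):
        if feature.type != "CDS":
            return default
        # Assumes a /locus_tag qualifier; otherwise falls back to object identity
        key = feature.qualifiers.get("locus_tag", [str(id(feature))])[0]
        if key not in assigned:
            assigned[key] = next(colors)
        return assigned[key]

    return pick


pick = make_color_picker(["blue", "red", "green", "purple"])
a = SimpleNamespace(type="CDS", qualifiers={"locus_tag": ["ABC_0001"]})
b = SimpleNamespace(type="CDS", qualifiers={"locus_tag": ["ABC_0002"]})
print(pick(a), pick(b), pick(a))  # blue red blue
```

A translator's compute_feature_color could then just return pick(feature).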

Also have a look at matplotlib.colors for ways of generating color palettes.
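As a sketch of that suggestion, combined with the zip() idea from the question: sample n evenly spaced colours from a Matplotlib colormap, convert them to hex strings with matplotlib.colors.to_hex, and pair them with the CDS features. The colormap "viridis" and the helper name palette_from_cmap are arbitrary choices, matplotlib.colormaps assumes Matplotlib 3.5 or later, and SimpleNamespace objects stand in for the features of a real record.

```python
from types import SimpleNamespace

import matplotlib
from matplotlib.colors import to_hex


def palette_from_cmap(n, cmap_name="viridis"):
    """Return n hex colours evenly spaced along a Matplotlib colormap."""
    cmap = matplotlib.colormaps[cmap_name]  # colormap registry, Matplotlib >= 3.5
    if n == 1:
        return [to_hex(cmap(0.0))]
    return [to_hex(cmap(i / (n - 1))) for i in range(n)]


# Stand-ins for the CDS features of a record (hypothetical data)
features = [SimpleNamespace(type="CDS", qualifiers={"locus_tag": [f"ABC_{i:04d}"]}) for i in range(1, 6)]
cds_features = [f for f in features if f.type == "CDS"]
colors = palette_from_cmap(len(cds_features))

# Pair each CDS with its own colour and store it in the "color" qualifier,
# which the BiopythonTranslator reads by default
for feature, color in zip(cds_features, colors):
    feature.qualifiers["color"] = color
```

The same list could equally be passed to cycle() inside compute_feature_color in place of the fixed list of colour names.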


MeggyC commented, May 30, 2019

Brilliant, that works like a dream! Thanks!
