CodonOptimize mode 'harmonized' only optimizes first 1/3 of the sequence

Using the CodonOptimize 'harmonized' mode, I do not get a codon distribution similar to the one I specified. It seems that the last two thirds of a sequence are never optimized.

A minimal example:

from dnachisel import *

problem = DnaOptimizationProblem(
    sequence='GACGACGACAAAAAAAAAAAAAAAAAA',
    constraints=[EnforceTranslation()],
    objectives=[CodonOptimize(species='b_subtilis', mode='harmonized')]
)

problem.resolve_constraints()
problem.optimize()

print('SEQUENCE:', problem.sequence)

frequencies, positions = biotools.biotools.codons_frequencies_and_positions(problem.sequence)
print(*frequencies.items(), sep='\n')

This sequence encodes DDDKKKKKK and gives as output:

SEQUENCE: GACGATGATAAAAAAAAAAAAAAAAAA
('V', {'total': 0, 'GTA': 0.0, 'GTT': 0.0, 'GTC': 0.0, 'GTG': 0.0})
('L', {'CTT': 0.0, 'TTA': 0.0, 'CTC': 0.0, 'CTG': 0.0, 'CTA': 0.0, 'total': 0, 'TTG': 0.0})
('Q', {'CAG': 0.0, 'total': 0, 'CAA': 0.0})
('G', {'total': 0, 'GGC': 0.0, 'GGA': 0.0, 'GGT': 0.0, 'GGG': 0.0})
('P', {'CCA': 0.0, 'total': 0, 'CCT': 0.0, 'CCG': 0.0, 'CCC': 0.0})
('A', {'total': 0, 'GCG': 0.0, 'GCC': 0.0, 'GCT': 0.0, 'GCA': 0.0})
('T', {'total': 0, 'ACT': 0.0, 'ACA': 0.0, 'ACG': 0.0, 'ACC': 0.0})
('C', {'total': 0, 'TGT': 0.0, 'TGC': 0.0})
('S', {'AGC': 0.0, 'AGT': 0.0, 'TCT': 0.0, 'total': 0, 'TCG': 0.0, 'TCC': 0.0, 'TCA': 0.0})
('N', {'AAC': 0.0, 'total': 0, 'AAT': 0.0})
('H', {'CAC': 0.0, 'total': 0, 'CAT': 0.0})
('I', {'total': 0, 'ATA': 0.0, 'ATT': 0.0, 'ATC': 0.0})
('D', {'GAT': 0.6666666666666666, 'total': 3, 'GAC': 0.3333333333333333})
('M', {'total': 0, 'ATG': 0.0})
('F', {'TTT': 0.0, 'total': 0, 'TTC': 0.0})
('R', {'CGG': 0.0, 'AGG': 0.0, 'CGC': 0.0, 'CGA': 0.0, 'total': 0, 'AGA': 0.0, 'CGT': 0.0})
('K', {'total': 6, 'AAG': 0.0, 'AAA': 1.0})
('E', {'total': 0, 'GAG': 0.0, 'GAA': 0.0})
('*', {'total': 0, 'TAA': 0.0, 'TGA': 0.0, 'TAG': 0.0})
('Y', {'total': 0, 'TAC': 0.0, 'TAT': 0.0})
('W', {'total': 0, 'TGG': 0.0})

It does well for aspartic acid (D), whose target ratio is GAT 0.64 / GAC 0.36, but for lysine (K), with a target ratio of AAA 0.7 / AAG 0.3, it does nothing at all. The effect is position dependent: a sequence encoding KKKDDDDDD shows the opposite behavior.
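To make the mismatch concrete, the reported output can be checked without DnaChisel. The following is a minimal plain-Python sketch, not the library's algorithm: the helper names `codon_usage` and `rounded_target_counts` are made up for illustration, the target ratios are the ones quoted above, and the naive rounding is only an assumption about what a "harmonized" result would roughly look like.

```python
from collections import Counter


def codon_usage(sequence, codons):
    """Fraction of each codon in `codons` among the in-frame triplets of `sequence`."""
    usable = len(sequence) - len(sequence) % 3
    triplets = [sequence[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(t for t in triplets if t in codons)
    total = sum(counts.values())
    return {c: (counts[c] / total if total else 0.0) for c in codons}


def rounded_target_counts(total, targets):
    """Naive expectation: round total * frequency for each codon.

    Illustration only; rounding is not guaranteed to sum to `total` in general.
    """
    return {codon: round(total * freq) for codon, freq in targets.items()}


# Sequence reported as the optimizer's output in this issue.
optimized = "GACGATGATAAAAAAAAAAAAAAAAAA"

asp_usage = codon_usage(optimized, ["GAT", "GAC"])
lys_usage = codon_usage(optimized, ["AAA", "AAG"])
print("D:", asp_usage)
print("K:", lys_usage)

# Codon counts one would roughly expect from the quoted target ratios.
asp_expected = rounded_target_counts(3, {"GAT": 0.64, "GAC": 0.36})
lys_expected = rounded_target_counts(6, {"AAA": 0.7, "AAG": 0.3})
print("expected D counts:", asp_expected)  # {'GAT': 2, 'GAC': 1}
print("expected K counts:", lys_expected)  # {'AAA': 4, 'AAG': 2}
```

With these numbers, the three aspartic acid codons already match the expectation (2 GAT, 1 GAC), while the six lysine codons are all AAA instead of roughly 4 AAA and 2 AAG.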

While trying to debug, I printed all variants that were tried during the exhaustive search for this sequence (line 454 of DnaOptimizationProblem.py). They are:

GACGACGACAAAAAAAAAAAAAAAAAA
GATGACGACAAAAAAAAAAAAAAAAAA
GATGACGACAAAAAAAAAAAAAAAAAA
GATGACGATAAAAAAAAAAAAAAAAAA
GATGACGATAAAAAAAAAAAAAAAAAA
GATGATGATAAAAAAAAAAAAAAAAAA

This again points to the fact that the later positions are never included in the variants to be analysed.

I’m using the latest version, dnachisel 1.1.

It would be great if you could find out where the problem is and come up with a quick fix, so I can use your library for my thesis. Thank you!

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 9 (6 by maintainers)

Top GitHub Comments

Zulko commented, May 15, 2019

Regarding the naming, I am thinking of changing the terms in the future to “rank_harmonize” (match codon ranks in original host and target) and “frequence_harmonize” (the current algorithm). I’ll make the necessary version bumps and warnings.

Zulko commented, May 15, 2019

OK, this is fixed on GitHub and PyPI. I have also changed the code so that you now have the convenient compare_frequencies method to check the final result:

from dnachisel import *

protein = "DDDKKKKKK"
sequence = reverse_translate(protein)
harmonization = CodonOptimize(species='b_subtilis', mode='harmonized')
problem = DnaOptimizationProblem(
    sequence=sequence,
    constraints=[EnforceTranslation()],
    objectives=[harmonization]
)

print('Sequence_before:', sequence)
problem.optimize()
print('New sequence:', problem.sequence)

comparison = harmonization.compare_frequencies(problem.sequence, text_mode=True)
print(comparison)

Output:

Sequence_before: GACGACGACAAAAAAAAAAAAAAAAAA
New sequence: GACGATGATAAAAAGAAAAAAAAAAAG

  K: 
    total: 6
    AAA: 
      sequence: 0.67
      table: 0.7
    
    AAG: 
      sequence: 0.33
      table: 0.3
    
  
  D: 
    total: 3
    GAC: 
      sequence: 0.33
      table: 0.36
    
    GAT: 
      sequence: 0.67
      table: 0.64

Note that for this particular optimization there is a risk that DnaChisel introduces a bit of codon bias, because it solves left to right (which may cause spatial bias) and because reverse_translate always uses the same codons. To be clear, I am not certain this is the case. In any case, I have added an option to randomize the codons when reverse-translating the protein sequence: reverse_translate(sequence, randomize_codons=True).
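For readers wondering what the difference between fixed and randomized reverse translation amounts to, here is a small stand-alone sketch. It does not use or reproduce DnaChisel's implementation: `naive_reverse_translate` and the two-residue `SYNONYMOUS_CODONS` table are hypothetical, written only to illustrate the idea for the DDDKKKKKK example.

```python
import random

# Hypothetical, partial synonymous-codon table covering only D and K.
# The first codon of each list plays the role of the "always used" default.
SYNONYMOUS_CODONS = {
    "D": ["GAC", "GAT"],
    "K": ["AAA", "AAG"],
}


def naive_reverse_translate(protein, randomize_codons=False, rng=None):
    """Back-translate `protein` using the toy table above.

    With randomize_codons=False every residue gets the same default codon,
    so the starting sequence is maximally biased. With randomize_codons=True
    each residue gets a uniformly random synonymous codon.
    """
    rng = rng or random.Random()
    if randomize_codons:
        return "".join(rng.choice(SYNONYMOUS_CODONS[aa]) for aa in protein)
    return "".join(SYNONYMOUS_CODONS[aa][0] for aa in protein)


fixed = naive_reverse_translate("DDDKKKKKK")
print(fixed)  # GACGACGACAAAAAAAAAAAAAAAAAA

# A seeded generator keeps the randomized variant reproducible.
shuffled = naive_reverse_translate(
    "DDDKKKKKK", randomize_codons=True, rng=random.Random(0)
)
print(shuffled)
```

Starting from a randomized sequence means the optimizer is not always handed the same codon at every position, which is the bias the comment above is concerned about.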

Let me know if that works for you!
