Model interpretation in Python script takes several minutes

See original GitHub issue

Hello!

I am trying to implement model interpretation within my ML pipeline. The interpretation instructions in the README work perfectly well on the command line. I then wrapped them in a small function in my own code. The function works, but it takes several minutes to produce output, whereas the command line returns the result immediately. I tested this with a single compound: Clc1ccc(c(Cl)c1Cl)c2c(Cl)cc(Cl)c(Cl)c2Cl

I would really appreciate your help in fixing this. Code and output below.

Best, Vishal

Code:

import os
import sys
import tempfile
import time

import pandas as pd

sys.path.insert(0, './predictors/chemprop')
from chemprop.args import InterpretArgs
from chemprop.interpret import interpret

def get_interpretation(kekule_smiles_df):

    start = time.time()
    intrprt_df = pd.DataFrame()

    # write the input to a CSV inside a temporary directory,
    # which is removed (together with the CSV) when the block exits
    with tempfile.TemporaryDirectory() as temp_dir:

        data_path = os.path.join(temp_dir, 'smiles.csv')
        kekule_smiles_df.to_csv(data_path, index=None)

        # interpretation arguments
        intrprt_args = [
            '--data_path', data_path,
            '--checkpoint_path', './models/pampa/gcnn_model.pt',
            '--property_id', '1',
        ]

        # slightly modified the interpret method to return a dataframe for further use
        intrprt_df = interpret(args=InterpretArgs().parse_args(intrprt_args))

    end = time.time()
    print(f'{end - start} seconds to interpret {kekule_smiles_df.shape[0]} molecules')
    return intrprt_df

Output:

Loading pretrained parameter "encoder.encoder.cached_zero_vector".
Loading pretrained parameter "encoder.encoder.W_i.weight".
Loading pretrained parameter "encoder.encoder.W_h.weight".
Loading pretrained parameter "encoder.encoder.W_o.weight".
Loading pretrained parameter "encoder.encoder.W_o.bias".
Loading pretrained parameter "ffn.1.weight".
Loading pretrained parameter "ffn.1.bias".
Loading pretrained parameter "ffn.4.weight".
Loading pretrained parameter "ffn.4.bias".
smiles,score,rationale,rationale_score
ClC1=C(Cl)C(Cl)=C(C2=C(Cl)C(Cl)=C(Cl)C=C2Cl)C=C1,0.981,Clc1c[cH:1]c(Cl)[cH:1]c1[CH3:1],1.000
1469.1523442268372 seconds to interpret 1 molecules

Top GitHub Comments

hesther commented, Oct 8, 2021

I will need to run some tests. Unfortunately, the MCTS algorithm is generally not fast. However, you can parallelize it manually, e.g. by splitting your test data into multiple files (command line) or by running multiple processes within a Python script.

I will try to reproduce the odd timing behavior and, if I can, track it down.
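
The manual parallelization suggested above could look roughly like the sketch below. It uses only the standard library and is not taken from chemprop: `split_into_chunks`, `interpret_chunk`, and `interpret_in_parallel` are hypothetical names, and `interpret_chunk` is a placeholder that returns dummy rows. In a real pipeline it would presumably write its chunk to its own CSV and call `interpret` the way `get_interpretation` does in the question.

```python
from concurrent.futures import ProcessPoolExecutor
from typing import Callable, List, Sequence, Tuple


def split_into_chunks(items: Sequence[str], n_chunks: int) -> List[List[str]]:
    """Split items into at most n_chunks contiguous, near-equal, non-empty chunks."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    n_chunks = min(n_chunks, len(items)) or 1
    base, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # the first `extra` chunks each take one additional item
        size = base + (1 if i < extra else 0)
        chunks.append(list(items[start:start + size]))
        start += size
    return [chunk for chunk in chunks if chunk]


def interpret_chunk(smiles_chunk: List[str]) -> List[Tuple[str, int]]:
    """Hypothetical placeholder worker: stands in for the real interpretation call."""
    return [(smiles, len(smiles)) for smiles in smiles_chunk]


def interpret_in_parallel(
    smiles: Sequence[str],
    n_workers: int = 4,
    worker: Callable[[List[str]], list] = interpret_chunk,
) -> list:
    """Run `worker` on each chunk in a separate process and merge the rows in order."""
    chunks = split_into_chunks(smiles, n_workers)
    if not chunks:
        return []
    with ProcessPoolExecutor(max_workers=len(chunks)) as pool:
        # pool.map preserves chunk order, so rows line up with the input order
        results = pool.map(worker, chunks)
        return [row for chunk_rows in results for row in chunk_rows]
```

Calling `interpret_in_parallel(list_of_smiles, n_workers=4)` from a script (under an `if __name__ == "__main__":` guard on platforms that spawn processes) would distribute the molecules across four processes. Note that this only helps with many molecules; it would not shorten the run time for the single compound reported in the question.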

iwwwish commented, Oct 13, 2021

Thank you @hesther. I am not sure I’ll be able to do this myself completely. But if I were to get around it, I will definitely open a PR.
