When using standalone mode, limit the recursion into imported modules to only those modules that are actually utilized.


When trying to build a standalone executable, the number of included modules can quickly explode, especially if the application uses large third-party packages. Ideally, Nuitka should only recurse into the modules that are actually used by the application. Currently there is no mechanism to do so, apart from manually specifying --nofollow-import-to flags for all the unwanted modules.

As a simple example for motivation, suppose an application uses the popular machine learning package sklearn (AKA scikit-learn):

from sklearn import datasets
datasets.load_iris()

Even though the above code uses only load_iris(), Nuitka will simply look at the imports for the datasets module, which include the following:

from .base import load_breast_cancer
from .base import load_boston
from .base import load_diabetes
from .base import load_digits
from .base import load_files
from .base import load_iris
from .base import load_linnerud
from .base import load_sample_images
from .base import load_sample_image
from .base import load_wine
from .base import get_data_home
from .base import clear_data_home
from .covtype import fetch_covtype
from .kddcup99 import fetch_kddcup99
from .lfw import fetch_lfw_pairs
from .lfw import fetch_lfw_people
from .twenty_newsgroups import fetch_20newsgroups
from .twenty_newsgroups import fetch_20newsgroups_vectorized
from .mldata import fetch_mldata, mldata_filename
from .openml import fetch_openml
from .samples_generator import make_classification
from .samples_generator import make_multilabel_classification
from .samples_generator import make_hastie_10_2
from .samples_generator import make_regression
from .samples_generator import make_blobs
from .samples_generator import make_moons
from .samples_generator import make_circles
from .samples_generator import make_friedman1
from .samples_generator import make_friedman2
from .samples_generator import make_friedman3
from .samples_generator import make_low_rank_matrix
from .samples_generator import make_sparse_coded_signal
from .samples_generator import make_sparse_uncorrelated
from .samples_generator import make_spd_matrix
from .samples_generator import make_swiss_roll
from .samples_generator import make_s_curve
from .samples_generator import make_sparse_spd_matrix
from .samples_generator import make_gaussian_quantiles
from .samples_generator import make_biclusters
from .samples_generator import make_checkerboard
from .svmlight_format import load_svmlight_file
from .svmlight_format import load_svmlight_files
from .svmlight_format import dump_svmlight_file
from .olivetti_faces import fetch_olivetti_faces
from .species_distributions import fetch_species_distributions
from .california_housing import fetch_california_housing
from .rcv1 import fetch_rcv1

Nuitka will start to recurse into all of these modules, even though base.load_iris is the only one that is actually needed. Furthermore, after reading the import statements of the above scikit-learn modules, it will recurse into hundreds of additional modules.
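
To make the over-inclusion concrete, here is a minimal sketch of what a purely static view of a module's imports looks like. It uses only the standard library ast module and is not Nuitka's actual implementation; the function name static_imports and its package parameter are hypothetical names chosen for this illustration.

```python
import ast


def static_imports(source, package=""):
    """Return the sorted module names referenced by import statements in source.

    `package` is the dotted name of the package the source belongs to; it is
    only used to resolve relative imports such as `from .base import x`.
    """
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            if node.level:
                # node.level counts the leading dots of a relative import.
                base = package.split(".") if package else []
                base = base[: len(base) - (node.level - 1)]
                parts = base + ([node.module] if node.module else [])
                found.add(".".join(parts))
            elif node.module:
                found.add(node.module)
    return sorted(found)


example = """
from .base import load_iris
from .covtype import fetch_covtype
from .rcv1 import fetch_rcv1
"""
print(static_imports(example, package="sklearn.datasets"))
```

Every module named in an import statement shows up in the result, regardless of whether the application ever calls anything from it. That is the gap a runtime trace is meant to close.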

Referencing a short conversation with @kayhayen on Gitter:

The idea here would be to use the import tracing hidden in hints.py in the Nuitka git checkout: run your program with it, collect all imports, and then make a change to Nuitka where --no-follow is added to the command line together with --follow-to=all_the,names,you,saw,being,actually,imported. That way, it becomes possible to at least avoid packages that never even get imported.

I will start to look into this technique. If others are interested in contributing, please let me know.
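
As a rough sketch of the idea, the runtime trace can be approximated without hints.py by comparing sys.modules before and after a test run. This is a swapped-in technique, not what hints.py does, and the function names trace_imports and nofollow_flags are hypothetical. The only Nuitka option used is --nofollow-import-to, which is mentioned above; the --no-follow / --follow-to options from the quote are a proposal and are not used here.

```python
import runpy
import sys


def trace_imports(script_path):
    """Run the script and return the sorted names of modules it newly imported.

    Assumption: modules that were already loaded before the run (for example
    by this tracer itself) are not reported, so the result is a lower bound.
    """
    before = set(sys.modules)
    runpy.run_path(script_path, run_name="__main__")
    return sorted(set(sys.modules) - before)


def nofollow_flags(static_names, runtime_names):
    """Build one --nofollow-import-to flag per module never seen at runtime."""
    seen = set(runtime_names)
    unused = sorted(set(static_names) - seen)
    return ["--nofollow-import-to=" + name for name in unused]
```

A trace only covers the code paths exercised by that particular run, so a module imported lazily on a rarely taken branch would be reported as unused. The test run needs to exercise the application as fully as possible.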

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Reactions:4
  • Comments:25 (20 by maintainers)

Top GitHub Comments

1 reaction
JorjMcKie commented, Mar 28, 2019

Just finished a first version of the stuff mentioned in my previous post. It consists of the following components:

  • a (slightly) modified version of hints.py
  • script get-hints.py, which reads the to-be-examined script and runs it under control of hints.py. It collects the output and creates a JSON file script.json for the Python script.py. The JSON file is a sorted array of all imports made by the script during the test run. Duplicates have been removed.
  • script nuitka-hints.py, which invokes the Nuitka compiler, passing any appropriate or site-specific options to it (among them --standalone), and also --user-plugin=hinted-mods.py=script.json. File script.json is the file from the previous test run.
  • hinted-mods.py is a user plugin, which in its __init__ reads the JSON file and adds some options in response to the JSON contents, like enabling plugins. Its main logic is contained in its onModuleEncounter method, where it checks for each module handed in by the Nuitka compiler, whether it is part of the JSON file.
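
The data flow between these components can be sketched in a few lines. This is only an illustration under stated assumptions, not the code of get-hints.py or hinted-mods.py: the helper names save_hints, load_hints and should_follow are hypothetical, and the real plugin makes its decision inside onModuleEncounter, whose exact signature is not shown here.

```python
import json


def save_hints(imported_names, json_path):
    """Write the imports seen during a test run as a sorted, duplicate-free JSON array."""
    unique_sorted = sorted(set(imported_names))
    with open(json_path, "w", encoding="utf-8") as handle:
        json.dump(unique_sorted, handle, indent=2)
    return unique_sorted


def load_hints(json_path):
    """Read the JSON array back into a set for fast membership checks."""
    with open(json_path, encoding="utf-8") as handle:
        return frozenset(json.load(handle))


def should_follow(module_name, hints):
    """Assumed decision rule: follow a module only if the test run imported it."""
    return module_name in hints
```

With a hints file containing PIL and PIL.Image, should_follow("PIL.Image", hints) would be true, while should_follow("PIL.ImageQt", hints) would be false, which corresponds to the "excluding unreferenced" messages in the sample output.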

@robguinness, @kayhayen, @sannanansari: I am going to put a ZIP file of these components on the utilities repo. Maybe you are interested in trying something out yourself. Here is a sample output for a script that uses PIL / Pillow to open an image file:

D:\Jorj\Desktop\Develop\Nuitka>python nuitka-hints.py pil-test.py
NUITKA is compiling 'pil-test.py' with these options:
 --standalone
 --mingw64
 --python-flag=nosite
 --remove-output
 --experimental=use_pefile
 --disable-dll-dependency-cache
 --user-plugin=hinted-mods.py=pil-test.json

Nuitka:INFO: hinted-mods.py is adding the following options:
Nuitka:INFO: --disable-plugin=numpy-plugin
Nuitka:INFO: --recurse-not-to=numpy
Nuitka:INFO: --disable-plugin=tk-plugin
Nuitka:INFO: --recurse-not-to=PIL.ImageTk
Nuitka:INFO: --disable-plugin=qt-plugins
Nuitka:INFO: --recurse-not-to=PIL.ImageQt
Nuitka:INFO:User plugin 'hinted-mods.py' loaded.
Nuitka:INFO: excluding unreferenced tempfile
Nuitka:INFO: excluding unreferenced PIL.PyAccess
Nuitka:INFO: excluding unreferenced PIL.ImageFilter
Nuitka:INFO: excluding unreferenced PIL.ImageQt
Nuitka:INFO: excluding unreferenced PIL.ImageQt
Nuitka:INFO: excluding unreferenced PIL.ImageQt
Nuitka:INFO: excluding unreferenced PIL.ImageQt
Nuitka:INFO: excluding unreferenced PIL.ImageShow
Nuitka:INFO: excluding unreferenced colorsys
Nuitka:INFO: excluding unreferenced colorsys
Nuitka:INFO: excluding unreferenced random
Nuitka:INFO: excluding unreferenced subprocess
Nuitka:INFO: excluding unreferenced tempfile
Nuitka:INFO: excluding unreferenced subprocess
Nuitka:INFO: excluding unreferenced PIL.MpoImagePlugin
Nuitka:INFO: excluding unreferenced copy
Nuitka:INFO: excluding unreferenced subprocess
Nuitka:INFO: excluding unreferenced _cffi_backend
Nuitka:INFO: excluding unreferenced cffi.cparser
Nuitka:INFO: excluding unreferenced cffi.verifier
Nuitka:INFO: excluding unreferenced sysconfig
Nuitka:INFO: excluding unreferenced distutils.dir_util
Nuitka:INFO: excluding unreferenced cffi.recompiler
Nuitka:INFO: excluding unreferenced cffi.recompiler
Nuitka:INFO: excluding unreferenced cffi.recompiler
Nuitka:INFO: excluding unreferenced cffi.recompiler
Nuitka:INFO: excluding unreferenced ctypes.util
Nuitka:WARNING:Unresolved '__import__' call at 'C:\Users\Jorj\AppData\Local\Programs\Python\Python37\lib\site-packages\PIL\Image.py:428' may require use of '--include-plugin-directory' or '--include-plugin-files'.

1 reaction
robguinness commented, Mar 22, 2019

@sannanansari, great that you can contribute! I’m not sure how quickly I can get to this, as it seems a bit more complicated than I originally thought. But I am still definitely interested in working on it. I will let you know if I make any progress.
