Replace each nested list comprehension with a single DB query in BIDSLayout.__repr__


I think there are currently some serious performance issues with BIDSLayout. With a fairly average dataset of 132 subjects (1 session and 1 run per subject), it takes about 1 minute 15 seconds to get a layout object, using the following code:

# %lprun is provided by the line_profiler IPython extension
%load_ext line_profiler
from bids import BIDSLayout
%lprun -f BIDSLayout.__init__ BIDSLayout("/media/christian/ElementsSE/MPI-Leipzig_Mind-Brain-Body-LEMON/BIDS_LEMON/")

I get the following profiling report:

Total time: 76.4714 s
File: /home/christian/pybids/bids/layout/layout.py
Function: __init__ at line 196

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   196                                               def __init__(self, root, validate=True, absolute_paths=True,
   197                                                            derivatives=False, config=None, sources=None, ignore=None,
   198                                                            force_index=None, config_filename='layout_config.json',
   199                                                            regex_search=False, database_path=None, database_file=None,
   200                                                            reset_database=False, index_metadata=True):
   201                                                   """Initialize BIDSLayout."""
   202         1          4.0      4.0      0.0          self.root = str(root)
   203         1          2.0      2.0      0.0          self.validate = validate
   204         1          1.0      1.0      0.0          self.absolute_paths = absolute_paths
   205         1          2.0      2.0      0.0          self.derivatives = {}
   206         1          2.0      2.0      0.0          self.sources = sources
   207         1          3.0      3.0      0.0          self.regex_search = regex_search
   208         1          2.0      2.0      0.0          self.config_filename = config_filename
   209                                                   # Store original init arguments as dictionary
   210         1          3.0      3.0      0.0          self._init_args = self._sanitize_init_args(
   211         1          2.0      2.0      0.0              root=root, validate=validate, absolute_paths=absolute_paths,
   212         1          2.0      2.0      0.0              derivatives=derivatives, ignore=ignore, force_index=force_index,
   213         1         91.0     91.0      0.0              index_metadata=index_metadata, config=config)
   214                                           
   215         1          4.0      4.0      0.0          if database_path is None and database_file is not None:
   216                                                       database_path = database_file
   217                                                       warnings.warn(
   218                                                           'In pybids 0.10 database_file argument was deprecated in favor'
   219                                                           ' of database_path, and will be removed in 0.12. '
   220                                                           'For now, treating database_file as a directory.',
   221                                                           DeprecationWarning)
   222         1          4.0      4.0      0.0          if database_path:
   223                                                       database_path = str(Path(database_path).absolute())
   224                                           
   225         1         47.0     47.0      0.0          self.session = None
   226                                           
   227         1      25891.0  25891.0      0.0          index_dataset = self._init_db(database_path, reset_database)
   228                                           
   229                                                   # Do basic BIDS validation on root directory
   230         1        488.0    488.0      0.0          self._validate_root()
   231                                           
   232         1          4.0      4.0      0.0          if ignore is None:
   233         1          3.0      3.0      0.0              ignore = self._default_ignore
   234                                           
   235                                                   # Instantiate after root validation to ensure os.path.join works
   236         1          3.0      3.0      0.0          self.ignore = [os.path.abspath(os.path.join(self.root, patt))
   237                                                                  if isinstance(patt, str) else patt
   238         1        102.0    102.0      0.0                         for patt in listify(ignore or [])]
   239         1          3.0      3.0      0.0          self.force_index = [os.path.abspath(os.path.join(self.root, patt))
   240                                                                       if isinstance(patt, str) else patt
   241         1          4.0      4.0      0.0                              for patt in listify(force_index or [])]
   242                                           
   243                                                   # Initialize the BIDS validator and examine ignore/force_index args
   244         1          4.0      4.0      0.0          self._validate_force_index()
   245                                           
   246         1          1.0      1.0      0.0          if index_dataset:
   247                                                       # Create Config objects
   248         1          2.0      2.0      0.0              if config is None:
   249         1          2.0      2.0      0.0                  config = 'bids'
   250         1          1.0      1.0      0.0              config = [Config.load(c, session=self.session)
   251         1      62271.0  62271.0      0.1                        for c in listify(config)]
   252         1         15.0     15.0      0.0              self.config = {c.name: c for c in config}
   253                                                       # Missing persistence of configs to the database
   254         2          6.0      3.0      0.0              for config_obj in self.config.values():
   255         1        308.0    308.0      0.0                  self.session.add(config_obj)
   256         1      27372.0  27372.0      0.0                  self.session.commit()
   257                                           
   258                                                       # Index files and (optionally) metadata
   259         1         35.0     35.0      0.0              indexer = BIDSLayoutIndexer(self)
   260         1   28127988.0 28127988.0     36.8              indexer.index_files()
   261         1          3.0      3.0      0.0              if index_metadata:
   262         1   48226769.0 48226769.0     63.1                  indexer.index_metadata()
   263                                                   else:
   264                                                       # Load Configs from DB
   265                                                       self.config = {c.name: c for c in self.session.query(Config).all()}
   266                                           
   267                                                   # Add derivatives if any are found
   268         1          3.0      3.0      0.0          if derivatives:
   269                                                       if derivatives is True:
   270                                                           derivatives = os.path.join(root, 'derivatives')
   271                                                       self.add_derivatives(
   272                                                           derivatives, parent_database_path=database_path,
   273                                                           validate=validate, absolute_paths=absolute_paths,
   274                                                           derivatives=None, sources=self, ignore=ignore,  config=None,
   275                                                           force_index=force_index, config_filename=config_filename,
   276                                                           regex_search=regex_search, index_metadata=index_metadata,
   277                                                           reset_database=index_dataset or reset_database
   278                                                           )

For day-to-day interaction with a dataset, development tests, etc., this kind of delay seems prohibitive to me…

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 27

Top GitHub Comments

1 reaction
adelavega commented, Dec 9, 2019

Actually, looking at this more closely, it looks like __repr__ is not using the SQL queries and is instead doing some slow nested list comprehensions, so this is probably the issue. There is a TODO comment to implement this.

Calculating the number of sessions, which loops over subjects, is especially slow.

https://github.com/bids-standard/pybids/blob/35e1296202959d375e570d08078282c26ad02bc0/bids/layout/layout.py#L302
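
To illustrate the difference described above, here is a minimal sketch using Python's built-in sqlite3 module. The tags table, its columns, and both function names are hypothetical simplifications made up for this example; they are not the actual pybids schema or API. The first function deduplicates in Python after fetching every row, while the second lets the database count distinct values per entity in one query.

```python
import sqlite3

# Hypothetical, simplified tag table: one row per (file, entity, value).
# This only illustrates the idea; it is not the real pybids schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tags (file_path TEXT, entity_name TEXT, value TEXT)")

rows = []
for sub in ("01", "02", "03"):
    for ses in ("1", "2"):
        path = f"sub-{sub}/ses-{ses}/sub-{sub}_ses-{ses}_T1w.nii.gz"
        rows.append((path, "subject", sub))
        rows.append((path, "session", ses))
conn.executemany("INSERT INTO tags VALUES (?, ?, ?)", rows)


def count_entities_slow(conn, names):
    """Fetch every tag into Python, then deduplicate with comprehensions."""
    all_tags = conn.execute("SELECT entity_name, value FROM tags").fetchall()
    return {name: len({v for (n, v) in all_tags if n == name}) for name in names}


def count_entities_fast(conn):
    """Let the database count distinct values per entity in a single query."""
    query = (
        "SELECT entity_name, COUNT(DISTINCT value) "
        "FROM tags GROUP BY entity_name"
    )
    return dict(conn.execute(query).fetchall())


slow_counts = count_entities_slow(conn, ["subject", "session"])
fast_counts = count_entities_fast(conn)
```

Both functions return the same counts here; the second avoids transferring every row to Python, which is where the savings would be expected to come from on a large index.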

0 reactions
adelavega commented, Dec 10, 2019

Ah yeah, you’re right. I’m still not getting how to use just the Tag model to count the distinct combinations of subject and session though.
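
One possible way to count distinct subject/session combinations from a tag table alone is to join the table to itself on the file path. The sketch below uses sqlite3 with a hypothetical, simplified tags table (the table, column names, and file names are assumptions for illustration, not the real pybids Tag model), and it assumes each file has at most one tag row per entity. Files without a session tag are dropped by the inner join.

```python
import sqlite3

# Hypothetical tag table: one row per (file, entity, value).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tags (file_path TEXT, entity_name TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO tags VALUES (?, ?, ?)",
    [
        ("sub-01_ses-1_T1w.nii.gz", "subject", "01"),
        ("sub-01_ses-1_T1w.nii.gz", "session", "1"),
        ("sub-01_ses-1_bold.nii.gz", "subject", "01"),
        ("sub-01_ses-1_bold.nii.gz", "session", "1"),
        ("sub-01_ses-2_T1w.nii.gz", "subject", "01"),
        ("sub-01_ses-2_T1w.nii.gz", "session", "2"),
        ("sub-02_ses-1_T1w.nii.gz", "subject", "02"),
        ("sub-02_ses-1_T1w.nii.gz", "session", "1"),
    ],
)

# Pair each file's subject tag with the same file's session tag,
# keep the distinct pairs, and count them in one round trip.
PAIR_QUERY = """
SELECT COUNT(*) FROM (
    SELECT DISTINCT sub.value AS subject, ses.value AS session
    FROM tags AS sub
    JOIN tags AS ses ON sub.file_path = ses.file_path
    WHERE sub.entity_name = 'subject'
      AND ses.entity_name = 'session'
)
"""
n_pairs = conn.execute(PAIR_QUERY).fetchone()[0]
```

With an ORM, the same self-join could presumably be expressed with two aliases of the tag model, but the exact form depends on the model definition.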
