Refactor parsers
The extract methods of the parsers have become unruly now that we support so many attributes, so it would be good to refactor them into smaller chunks. One possibility is to take advantage of decorators, which would let every wrapped function update progress automatically, give us a method that returns which attributes are actually supported along with any parser-specific docstrings, and make it possible to parse just the attributes one cares about instead of the whole file.
I’ve pushed some code to my repo for this decorator and how it would be used: https://github.com/ATenderholt/cclib/commit/59453355360e3a682f10160778507cfdbfe646a3
Here’s an example:
>>> from cclib.parser import Gaussian
>>> Gaussian.get_supported_attributes().keys()
dict_keys(['newattr']) # the only attribute parsed with decorated function is newattr
>>> Gaussian.get_supported_attributes()["newattr"].__name__
'parsing_test' # the function is called 'parsing_test'
>>> Gaussian.get_supported_attributes()["newattr"].__doc__
'Parses newattr.' # and here's its docstring
>>> parser = Gaussian("nonexistant.log")
>>> parser.fupdate = 0.5 # needed in this demo because fupdate isn't set unless parse() is called
>>> parser.parsing_test("some good line", None) # let's parse a line that should actually enter this function; None is passed because there is no associated inputfile in this example
some good line
>>> parser.parsing_test("some bad line", None) # here's a block we should skip over
>>>
The only downside I see is the overhead of calling into the decorators, but I think the simplified code and the methods that help generate documentation would make it worth it.
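For readers who don't want to open the commit, here is a minimal sketch of how such a decorator could work. It is not the code from the linked commit: the names `parses`, `BaseParser`, `DemoParser`, `trigger`, `wanted` and `update_progress` are all made up for illustration, and the real cclib base class and progress handling are different.

```python
import functools


def parses(attribute, trigger):
    """Hypothetical decorator: register a method as the parser for `attribute`.

    The wrapped method only runs when `trigger` occurs in the current line.
    """
    def decorator(func):
        @functools.wraps(func)  # keeps __name__ and __doc__ of the wrapped method
        def wrapper(self, line, inputfile):
            if trigger not in line:
                return None  # not our block, skip it
            self.update_progress()
            return func(self, line, inputfile)
        wrapper.parsed_attribute = attribute
        return wrapper
    return decorator


class BaseParser:
    """Hypothetical stand-in for a parser base class (not cclib's own)."""

    def __init__(self):
        self.progress_updates = 0

    def update_progress(self):
        # A real parser would report progress here; this sketch just counts calls.
        self.progress_updates += 1

    @classmethod
    def get_supported_attributes(cls):
        """Map attribute name -> decorated function, collected over the MRO."""
        supported = {}
        for klass in reversed(cls.__mro__):
            for member in vars(klass).values():
                attribute = getattr(member, "parsed_attribute", None)
                if attribute is not None:
                    supported[attribute] = member
        return supported

    def extract(self, inputfile, wanted=None):
        """Run only the handlers for `wanted` attributes (all if None)."""
        handlers = [
            func
            for attribute, func in self.get_supported_attributes().items()
            if wanted is None or attribute in wanted
        ]
        for line in inputfile:
            for handler in handlers:
                handler(self, line, inputfile)


class DemoParser(BaseParser):
    @parses("newattr", trigger="good")
    def parsing_test(self, line, inputfile):
        """Parses newattr."""
        self.newattr = line.strip()
```

With this sketch, `DemoParser.get_supported_attributes()["newattr"].__doc__` gives back the docstring, and `DemoParser().extract(lines, wanted={"newattr"})` only dispatches to the handlers that were asked for. The per-line cost is one extra function call and one substring test per handler, which is the overhead mentioned above.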
Issue Analytics
- Created 7 years ago
- Reactions:2
- Comments:7 (7 by maintainers)
According to https://twitter.com/mmwieclaw/status/1498652945394683911/photo/1, we are 10x slower than the custom parsing code in https://github.com/mishioo/tesliper/blob/master/tesliper/extraction/gaussian_parser.py for Gaussian.
An additional reason to refactor is to find where hotspots are, because parsing can be slow (proof in https://github.com/patonlab/GoodVibes/pull/43#issuecomment-1018148575).
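As a rough illustration of how the hotspots could be located, the standard-library profiler can be wrapped around a parse call. In the sketch below `parse_lines` is only a hypothetical stand-in for a parser's extract loop, and the sample line is made up to resemble Gaussian output; with smaller extract functions, the report would attribute time to each of them separately.

```python
import cProfile
import io
import pstats
import re

# Hypothetical pattern for a Gaussian-style SCF energy line.
SCF_PATTERN = re.compile(r"SCF Done:\s+E\(\S+\)\s+=\s+(-?\d+\.\d+)")


def parse_lines(lines):
    """Stand-in for a parser's extract loop (not cclib code)."""
    energies = []
    for line in lines:
        match = SCF_PATTERN.search(line)
        if match:
            energies.append(float(match.group(1)))
    return energies


def profile_call(func, *args):
    """Run func under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(10)  # ten most expensive entries
    return result, stream.getvalue()


lines = ["some unrelated line\n"] * 1000
lines.append(" SCF Done:  E(RB3LYP) =  -76.4089533  A.U. after   10 cycles\n")
energies, report = profile_call(parse_lines, lines)
print(report)
```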