detection is really slow in some cases
Hey there, first of all, great project!
The following commands take a significant amount of time:
> python3 -m timeit -n 1 -- "from clevercsv import Detector; Detector().detect('fileurl="file://$PROJECT_DIR$/../aaaaaa_aaaaaaa_aaaaa/.aaa/." filepath=$')"
1 loop, best of 5: 13.2 sec per loop
> python3 -m timeit "from clevercsv import Detector; Detector().detect('a'*18)"
1 loop, best of 5: 8.24 sec per loop
After some benchmarking, the apparent cause is that the unix_path and url regexes in the detector are susceptible to ReDoS (regular expression denial of service through catastrophic backtracking).
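The mechanism is easiest to see on the old unix_path pattern. Unlike url, it uses only ASCII character classes, so it also runs under Python's standard-library re module. The sketch below assumes a full-string match and uses a hypothetical near-miss input: a slash, n path characters, and a trailing '!' that the pattern can never consume. Because the optional `\/?` lets the outer `+` split a run of path characters across separate iterations, a run of n characters can be grouped in 2^(n-1) ways, and a failing match tries all of them before giving up.

```python
import re
import time

# Pre-fix unix_path pattern, copied verbatim from the diff in this issue.
# It uses only ASCII classes, so the standard-library re module accepts it.
OLD_UNIX_PATH = re.compile(r"(\/|~\/|\.\/)(?:[a-zA-Z0-9\.\-\_]+\/?)+")


def near_miss(n: int) -> str:
    """Hypothetical adversarial input: '/' + n path characters + '!'."""
    return "/" + "a" * n + "!"


def seconds_to_reject(pattern: re.Pattern, text: str) -> float:
    """Time one fullmatch attempt; raise if the text unexpectedly matches."""
    start = time.perf_counter()
    result = pattern.fullmatch(text)
    elapsed = time.perf_counter() - start
    if result is not None:
        raise ValueError(f"{text!r} was expected not to match")
    return elapsed


if __name__ == "__main__":
    # The engine retries every way of splitting the run of 'a's between
    # iterations of the outer '+', so each extra character roughly
    # doubles the work before it can report "no match".
    for n in range(10, 19, 2):
        print(f"n={n:2d}: {seconds_to_reject(OLD_UNIX_PATH, near_miss(n)):.4f} s")
```

The printed times should grow steeply with n; the exact numbers depend on the Python version and the machine.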
These changes, which replace the regexes with (hopefully) equivalent ones, fix the most obvious issues:
- "url": "((https?|ftp):\/\/(?!\-))?(((([\p{L}\p{N}]*\-?[\p{L}\p{N}]+)+\.)+([a-z]{2,}|local)(\.[a-z]{2,3})?)|localhost|(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}(\:\d{1,5})?))(\/[\p{L}\p{N}_\/()~?=&%\-\#\.:]*)?(\.[a-z]+)?",
- "unix_path": "(\/|~\/|\.\/)(?:[a-zA-Z0-9\.\-\_]+\/?)+",
+ "url": "((https?|ftp):\/\/(?!\-))?(((?:[\p{L}\p{N}-]+\.)+([a-z]{2,}|local)(\.[a-z]{2,3})?)|localhost|(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}(\:\d{1,5})?))(\/[\p{L}\p{N}_\/()~?=&%\-\#\.:]*)?(\.[a-z]+)?",
+ "unix_path": "[~.]?(?:\/[a-zA-Z0-9\.\-\_]+)+\/?",
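A rough way to test the "(hopefully) equivalent" claim is to run the old and new patterns side by side on a few inputs. This sketch assumes the detector applies them as full-string matches. The unix_path pair is copied verbatim; for url only the domain part is compared, and [A-Za-z0-9] stands in for [\p{L}\p{N}] because the standard-library re module rejects \p{...} escapes (an ASCII approximation, not the exact pattern). The sample strings are hypothetical.

```python
import re
import time

# unix_path patterns, copied verbatim from the diff.
OLD_UNIX_PATH = re.compile(r"(\/|~\/|\.\/)(?:[a-zA-Z0-9\.\-\_]+\/?)+")
NEW_UNIX_PATH = re.compile(r"[~.]?(?:\/[a-zA-Z0-9\.\-\_]+)+\/?")

# Domain part of the url patterns. ASCII approximation (assumption):
# [A-Za-z0-9] replaces [\p{L}\p{N}], and groups are made non-capturing.
OLD_DOMAIN = re.compile(
    r"(?:(?:[A-Za-z0-9]*-?[A-Za-z0-9]+)+\.)+(?:[a-z]{2,}|local)(?:\.[a-z]{2,3})?"
)
NEW_DOMAIN = re.compile(r"(?:[A-Za-z0-9-]+\.)+(?:[a-z]{2,}|local)(?:\.[a-z]{2,3})?")

# Hypothetical spot-check inputs, kept short because the old patterns
# backtrack exponentially on long near-misses.
PATH_SAMPLES = ["/usr/local/bin", "/usr/local/bin/", "~/notes.txt",
                "./data/file.csv", "relative/path", "/", "/a//b"]
DOMAIN_SAMPLES = ["example.com", "sub.example.co.uk", "my-site.org",
                  "-lead.com", "a-.com", "a--b.com"]


def disagreements(old: re.Pattern, new: re.Pattern, samples: list) -> list:
    """Samples accepted by exactly one of the two patterns under fullmatch."""
    return [s for s in samples
            if (old.fullmatch(s) is None) != (new.fullmatch(s) is None)]


def seconds_to_reject(pattern: re.Pattern, text: str) -> float:
    """Time one fullmatch attempt; raise if the text unexpectedly matches."""
    start = time.perf_counter()
    result = pattern.fullmatch(text)
    elapsed = time.perf_counter() - start
    if result is not None:
        raise ValueError(f"{text[:20]!r}... was expected not to match")
    return elapsed


if __name__ == "__main__":
    print(disagreements(OLD_UNIX_PATH, NEW_UNIX_PATH, PATH_SAMPLES))  # []
    print(disagreements(OLD_DOMAIN, NEW_DOMAIN, DOMAIN_SAMPLES))  # ['a-.com', 'a--b.com']
    # The rewritten patterns reject long near-misses in linear time.
    print(seconds_to_reject(NEW_UNIX_PATH, "/" + "a" * 100_000 + "!"))
    print(seconds_to_reject(NEW_DOMAIN, "a" * 100_000))
```

For unix_path the agreement is not a coincidence: under a full match, both patterns describe an optional ~ or ., then one or more /segment parts with no empty segments, then an optional trailing slash. The url rewrite is slightly more permissive: its domain labels may end in a hyphen or contain consecutive hyphens, which the old pattern rejected.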
New results:
> python3 -m timeit -n 1 -- "from clevercsv import Detector; Detector().detect('fileurl="file://$PROJECT_DIR$/../aaaaaa_aaaaaaa_aaaaa/.aaa/." filepath=$')"
1 loop, best of 5: 4.17 msec per loop
:0: UserWarning: The test results are likely unreliable. The worst time (347 msec) was more than four times slower than the best time (4.17 msec).
> python3 -m timeit "from clevercsv import Detector; Detector().detect('a'*18)"
1 loop, best of 5: 217 usec per loop
Python version: 3.8
Top GitHub Comments
Thanks again @kaskawu for reporting this issue. I’ve updated CleverCSV using the unix_path regex you suggested above (diving into it, that regex seemed to be the problem). I’m preparing an updated release of the package now. Thanks also @lmmentel for confirming!
Same here; performance drops with Python 3.8.