detection is really slow in some cases

Hey there, first of all, great project!

The following commands take a significant amount of time:

> python3 -m timeit -n 1 -- "from clevercsv import Detector; Detector().detect('fileurl="file://$PROJECT_DIR$/../aaaaaa_aaaaaaa_aaaaa/.aaa/." filepath=$')" 
1 loop, best of 5: 13.2 sec per loop
> python3 -m timeit "from clevercsv import Detector; Detector().detect('a'*18)"
1 loop, best of 5: 8.24 sec per loop

After benchmarking a little, the apparent cause is that the unix_path and url regexes in the detector are susceptible to ReDoS (regular expression denial of service).
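
To illustrate the kind of backtracking involved, here is a minimal sketch that times the old unix_path pattern (copied from the diff in this issue) on inputs of growing length. It makes several assumptions that are not confirmed by the issue: it uses the standard library `re` module rather than whatever engine the detector uses, it anchors the pattern with `fullmatch`, and the test input (a run of `a` followed by `!`) is made up for the demonstration. The helper `time_fullmatch` is hypothetical.

```python
import re
import time

# The unix_path pattern from before the change, copied from this issue.
OLD_UNIX_PATH = re.compile(r"(\/|~\/|\.\/)(?:[a-zA-Z0-9\.\-\_]+\/?)+")


def time_fullmatch(pattern, text):
    """Return (match, seconds) for one anchored match attempt (hypothetical helper)."""
    start = time.perf_counter()
    match = pattern.fullmatch(text)
    return match, time.perf_counter() - start


timings = {}
for n in (10, 14, 18):
    # A path-like run followed by a character the pattern cannot consume.
    # The overall match has to fail, and before giving up the engine tries
    # every way of splitting the run of "a" between the inner "+" and the
    # outer "+", because the "/" separator between groups is optional.
    text = "/" + "a" * n + "!"
    match, seconds = time_fullmatch(OLD_UNIX_PATH, text)
    timings[n] = seconds
    print(f"n={n:2d} matched={match is not None} time={seconds:.4f}s")
```

The absolute numbers depend on the machine and Python version, so none are given here. The point is only that each additional character multiplies the work rather than adding to it.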

This change, which replaces the regexes with (hopefully) equivalent ones, fixes the most obvious issues:

-    "url": "((https?|ftp):\/\/(?!\-))?(((([\p{L}\p{N}]*\-?[\p{L}\p{N}]+)+\.)+([a-z]{2,}|local)(\.[a-z]{2,3})?)|localhost|(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}(\:\d{1,5})?))(\/[\p{L}\p{N}_\/()~?=&%\-\#\.:]*)?(\.[a-z]+)?",
-    "unix_path": "(\/|~\/|\.\/)(?:[a-zA-Z0-9\.\-\_]+\/?)+",
+    "url": "((https?|ftp):\/\/(?!\-))?(((?:[\p{L}\p{N}-]+\.)+([a-z]{2,}|local)(\.[a-z]{2,3})?)|localhost|(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}(\:\d{1,5})?))(\/[\p{L}\p{N}_\/()~?=&%\-\#\.:]*)?(\.[a-z]+)?",
+    "unix_path": "[~.]?(?:\/[a-zA-Z0-9\.\-\_]+)+\/?",
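
One way to gain some confidence in the "(hopefully) equivalent" claim for unix_path is a spot check that runs both patterns over a handful of strings and lists any disagreements. This is a sketch under assumptions: it uses the standard library `re` module with `fullmatch`, which may differ from how the detector applies the patterns, and the sample strings and the `disagreements` helper are invented for illustration. It is a sanity check, not a proof of equivalence.

```python
import re

# Both unix_path patterns, copied from the diff in this issue.
OLD = re.compile(r"(\/|~\/|\.\/)(?:[a-zA-Z0-9\.\-\_]+\/?)+")
NEW = re.compile(r"[~.]?(?:\/[a-zA-Z0-9\.\-\_]+)+\/?")

# Hypothetical sample inputs: a mix of valid paths and near misses.
# They are kept short so the old pattern does not backtrack for long.
SAMPLES = [
    "/usr/bin",
    "~/docs/file.txt",
    "./a/b/",
    "/",
    "~",
    "",
    "relative/path",
    "../x",
    "/a//b",
    "/a b",
    "/aaa!",
]


def disagreements(samples):
    """Return the samples on which the two patterns give different verdicts."""
    return [
        s for s in samples
        if bool(OLD.fullmatch(s)) != bool(NEW.fullmatch(s))
    ]


print(disagreements(SAMPLES))
```

An empty list means the two patterns agree on every sample. The new pattern avoids the ambiguity of the old one because each repetition of the group must begin with a literal `/`, so there is only one way to split a path into components.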

New results:

> python3 -m timeit -n 1 -- "from clevercsv import Detector; Detector().detect('fileurl="file://$PROJECT_DIR$/../aaaaaa_aaaaaaa_aaaaa/.aaa/." filepath=$')" 
1 loop, best of 5: 4.17 msec per loop
:0: UserWarning: The test results are likely unreliable. The worst time (347 msec) was more than four times slower than the best time (4.17 msec).
> python3 -m timeit "from clevercsv import Detector; Detector().detect('a'*18)" 
1 loop, best of 5: 217 usec per loop

Python version: 3.8

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 1
  • Comments: 5

Top GitHub Comments

1 reaction
GjjvdBurg commented, May 12, 2020

Thanks again @kaskawu for reporting this issue. I’ve updated CleverCSV using the unix_path regex you suggested above (diving into it, that regex seemed to be the problem). I’m preparing an updated release of the package now. Thanks also @lmmentel for confirming!

0 reactions
lmmentel commented, May 12, 2020

Same here, performance drops with Python 3.8:

python --version
Python 3.8.1
python -m timeit -n 1 -r 1 -- "from clevercsv import Detector; Detector().detect('fileurl="file://$PROJECT_DIR$/../aaaaaa_aaaaaaa_aaaaa/.aaa/." filepath=$')" 
1 loop, best of 1: 8.34 sec per loop