captcha22 client label should copy files
I came across this project and thought I’d give it a try with a very limited set of 4 captchas.
After downloading and installing using pip I ran:
captcha22 client label --input=captchas
I labeled the images and a data directory was created:
$ ls -lh data/
total 2,0M
-rw-rw-r-- 1 thijs thijs 665K dec 5 22:08 2YREA9.png
-rw-rw-r-- 1 thijs thijs 630K dec 5 22:08 6PA5K7.png
-rw-rw-r-- 1 thijs thijs 11K dec 5 22:03 NTEMYU.png
-rw-rw-r-- 1 thijs thijs 724K dec 5 22:08 XAMK6Q.png
Unfortunately, it seems all my original captchas were moved by this script, and I mistakenly deleted the data dir. I have to harvest some new ones now 😦 It would be really useful if captcha22 left the originals alone, or if the readme stated very clearly that these files will be moved.
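One way to avoid losing the originals is to label a throwaway copy of the input directory. The following is a minimal sketch, assuming the `captchas` directory from the command above; the `make_working_copy` helper and the `captchas_work` name are hypothetical and not part of captcha22:

```python
import shutil
from pathlib import Path


def make_working_copy(source_dir, work_dir):
    """Copy source_dir to work_dir and return the path of the copy.

    Hypothetical helper, not part of captcha22.
    """
    source = Path(source_dir)
    work = Path(work_dir)
    if work.exists():
        # Refuse to overwrite an earlier working copy by accident.
        raise FileExistsError(f"{work} already exists; pick another name")
    shutil.copytree(source, work)
    return work


if __name__ == "__main__":
    # Afterwards, label the copy instead of the originals:
    #   captcha22 client label --input=captchas_work
    if Path("captchas").is_dir():
        print(make_working_copy("captchas", "captchas_work"))
```

The labelling step can then move files out of `captchas_work` freely while `captchas` stays intact.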
It also seems JPG files are not supported:
INFO:Captcha22 Label Scripts:Executing CAPTCHA Typing Script
INFO:Captcha22 Typer:No png files found
Update: found the --image-type option for this:
captcha22 client label --image-type=jpg --input=captchas
Issue Analytics
- State:
- Created 3 years ago
- Comments:13 (13 by maintainers)

Hi thijstriemstra,
captcha22 moves the images because when you are labelling a large batch, say 1000 to 2000, the images take up a fairly large amount of space.
To save space, captcha22 moves the images that have already been labelled. This also ensures that if you stop the labelling process at any time, you will not have to start from the first captcha again, as the labelled captchas are now in the output directory. Captcha22 never deletes images, it just moves them.
We can, however, look at introducing a flag that leaves the images in the input directory, but if this flag is set it would probably mean that if you stop and restart the labelling process, your progress point will be lost.
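As a rough illustration of how a copy mode could still keep a resume point, the sketch below copies each labelled image and records the source file names in a small JSON file. Everything here is an assumption for illustration: `label_by_copy`, the `ask_label` callback and the `labelled.json` file are hypothetical and do not reflect how captcha22 is implemented. Note that copying does not address the disk-space concern.

```python
import json
import shutil
from pathlib import Path


def label_by_copy(input_dir, output_dir, ask_label, image_type="png",
                  progress_name="labelled.json"):
    """Copy each unlabelled image to output_dir/<LABEL>.<image_type>.

    The originals stay in place. Names of labelled source files are stored
    in a JSON file so that a restart skips them. Hypothetical sketch, not
    captcha22's implementation.
    """
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    progress_file = output_dir / progress_name

    if progress_file.exists():
        done = set(json.loads(progress_file.read_text()))
    else:
        done = set()

    for image in sorted(input_dir.glob(f"*.{image_type}")):
        if image.name in done:
            continue  # labelled in an earlier session
        label = ask_label(image)  # e.g. show the image and read the answer
        if label is None:
            break  # user stopped; progress so far is already saved
        shutil.copy2(image, output_dir / f"{label}.{image_type}")
        done.add(image.name)
        # Save after every image so an interruption loses nothing.
        progress_file.write_text(json.dumps(sorted(done)))
    return done
```

Calling `label_by_copy("captchas", "data", ask_label)` twice would only prompt for images not labelled in the first run.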
In terms of your error, ideally you should be using the API to copy the data.zip file, since it would solve the naming convention for you. The idea is to keep your captchas organised, so the name takes the form:
<username>_<captcha_name>_<captcha_version>.zip
So it would be something like:
thijstriemstra_testcaptcha_1.zip
You can always rename data.zip manually to this format before placing it in the Unsorted directory.
We will be pushing a UI at the end of this year or start of 2021 which will also make it easier to interface with the server.
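For the manual route, the naming convention described above can be sketched in a few lines. The `archive_name` and `rename_archive` helpers are hypothetical, and the character check is an assumption of this sketch (underscore is the separator, so it is rejected inside a part), not a documented captcha22 rule:

```python
import re
from pathlib import Path


def archive_name(username, captcha_name, captcha_version):
    """Build '<username>_<captcha_name>_<captcha_version>.zip'."""
    parts = [str(username), str(captcha_name), str(captcha_version)]
    for part in parts:
        # Assumption of this sketch: parts contain only letters, digits
        # and hyphens, so the underscores stay unambiguous separators.
        if not re.fullmatch(r"[A-Za-z0-9-]+", part):
            raise ValueError(f"unsupported characters in {part!r}")
    return "_".join(parts) + ".zip"


def rename_archive(zip_path, username, captcha_name, captcha_version):
    """Rename zip_path within its directory and return the new path."""
    zip_path = Path(zip_path)
    target = zip_path.with_name(
        archive_name(username, captcha_name, captcha_version))
    zip_path.rename(target)
    return target
```

For example, `rename_archive("data.zip", "thijstriemstra", "testcaptcha", 1)` renames the file to `thijstriemstra_testcaptcha_1.zip`, which can then be placed in the Unsorted directory.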
Published and thanks, forgot to do it last time.
Looking at the captchas, I’d say just adding more solved captchas to train with will be the main thing to improve your solving accuracy.
However, this is also a captcha with static noise, so another approach would be to filter the noise out. You could use a background filter that compares static values across multiple captcha images, but I think a normal gray filter would do the trick as well.
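The background-filter idea can be sketched as follows, assuming each captcha is already loaded as a 2D list of grayscale values (0 to 255) and all images share the same size. The function names and the `tolerance` parameter are hypothetical; this is one possible reading of "comparing static values", not the maintainers' code:

```python
def static_mask(images, tolerance=0):
    """Return a 2D list of booleans, True where all images (nearly) agree.

    A pixel that has the same value in every captcha is assumed to belong
    to the static noise or background rather than to the changing text.
    """
    height, width = len(images[0]), len(images[0][0])
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            values = [img[y][x] for img in images]
            row.append(max(values) - min(values) <= tolerance)
        mask.append(row)
    return mask


def remove_static(image, mask, fill=255):
    """Replace every masked (static) pixel with the fill value."""
    return [[fill if mask[y][x] else value for x, value in enumerate(row)]
            for y, row in enumerate(image)]
```

The more captchas go into `static_mask`, the less likely it is that text pixels coincide in all of them and get erased by mistake.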
This is an example of a gray filter we used previously; of course, the start and end values would have to be tweaked.
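The sketch below is not the maintainers' filter, only an illustration of the general idea under stated assumptions: the image is a 2D list of (R, G, B) tuples, and noise is taken to be every pixel whose grayscale intensity falls between `start` and `end`. The function names and the example thresholds are hypothetical.

```python
def to_gray(pixel):
    """Convert an (R, G, B) tuple to a 0-255 luminance (BT.601 weights)."""
    r, g, b = pixel
    return int(round(0.299 * r + 0.587 * g + 0.114 * b))


def gray_filter(image, start, end, fill=255):
    """Grayscale the image and whiten pixels with intensity in [start, end].

    Pixels outside the range keep their grayscale value. The start and end
    values have to be tuned per captcha type.
    """
    result = []
    for row in image:
        new_row = []
        for pixel in row:
            gray = to_gray(pixel)
            new_row.append(fill if start <= gray <= end else gray)
        result.append(new_row)
    return result
```

With dark text on a light background and mid-gray noise, a call such as `gray_filter(image, 100, 220)` would keep the dark text and whiten the noise band; the right range depends entirely on the captcha.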
We are working on it, but will only be able to get this up and running once we’ve done a full conversion of AOCR, since that is the main learning engine. I’ll create a milestone for it after we finalise the UI update.