Method for adding config settings

Look mama, no config files!

I was wrestling with config files for some of the settings when I ran across this Google Groups discussion about using Tesseract from Java, and it made my mouth water. Here’s a code snippet from their discussion:

tesseract = new Tesseract();                      
tesseract.setOcrEngineMode(TessAPI.TessOcrEngineMode.OEM_TESSERACT_ONLY);
tesseract.setPageSegMode(7);
tesseract.setTessVariable("load_system_dawg", "0");
tesseract.setTessVariable("load_freq_dawg", "0");
tesseract.setTessVariable("load_punc_dawg", "0");
tesseract.setTessVariable("load_number_dawg", "0");

At first you may think: that’s cool, I guess, but you can do the same thing by defining one long config string and reusing it whenever you need it. For example, '--psm 10 --oem 3 -c load_system_dawg=0 -c load_freq_dawg=0 -c load_punc_dawg=0 . . .'
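As a rough sketch of that "long string of configs" idea, the helper below assembles such a string from a dictionary. The name build_config is hypothetical (it is not part of pytesseract), and the sketch assumes each variable gets its own -c flag, as the tesseract command line expects. It only shows how the string is put together; it says nothing about whether tesseract will accept a given variable through -c.

```python
import shlex


def build_config(psm=None, oem=None, variables=None):
    """Hypothetical helper: build a tesseract option string.

    Each variable is emitted as its own `-c name=value` pair.
    """
    parts = []
    if psm is not None:
        parts += ["--psm", str(psm)]
    if oem is not None:
        parts += ["--oem", str(oem)]
    for name, value in (variables or {}).items():
        parts += ["-c", f"{name}={value}"]
    # Quote each piece so values with spaces survive shell-style splitting.
    return " ".join(shlex.quote(part) for part in parts)


no_dawgs = build_config(
    psm=10,
    oem=3,
    variables={"load_system_dawg": 0, "load_freq_dawg": 0, "load_punc_dawg": 0},
)
print(no_dawgs)
```

The resulting string could then be reused as the config argument of the OCR functions wherever those options are needed.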

But the Tesseract documentation says that you can’t change ‘init only’ parameters with the tesseract executable’s -c option, and some of the parameters I’ve been messing with are ‘init only’. I think most people would agree that it would be nice to set config-file variables directly in Python through a set_config_variable method instead of having to go make a config file. Since some of the variables set in the code above are in fact ‘init only’, the Java guys must be creating a config file from Java code (I did not sniff through their code to verify this, however).

I haven’t done it yet because I’m not too familiar with the code inside pytesseract, but creating a temporary config file and filling it through a set_config_variable method doesn’t seem very hard from my perspective. Here’s the high-level logic I’m thinking about:

  • When pytesseract is imported, check the config folder to see if a temp.txt file exists. If so, wipe it clean. If not, create one.
  • When someone calls the tsr.set_config_variable method, just write the variable, a space, and the value on a new line in the temp.txt file.
  • You could also have a method to delete the variable from the file and thus return tesseract to the default.
  • When any of the OCR functions are called, if the user does not manually supply another config file, use the temp.txt as the config file unless it’s empty.
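The steps above could be sketched roughly as follows. This is not pytesseract code: the class TempTessConfig and its method names are made up for illustration, and the file is placed in the system temp directory rather than a tesseract config folder. It follows the "variable, a space, and the value" line format described in the list.

```python
import os
import tempfile
from pathlib import Path


class TempTessConfig:
    """Hypothetical helper that keeps tesseract variables in a temp config file."""

    def __init__(self, path=None):
        if path is None:
            # Assumption: a fresh file in the system temp dir is good enough.
            fd, path = tempfile.mkstemp(prefix="tess_", suffix=".txt")
            os.close(fd)
        self.path = Path(path)
        self._variables = {}
        self._write()  # wipe the file clean (or create it) on start

    def set_config_variable(self, name, value):
        """Add or overwrite one variable, then rewrite the file."""
        self._variables[name] = str(value)
        self._write()

    def delete_config_variable(self, name):
        """Drop a variable so tesseract falls back to its default."""
        self._variables.pop(name, None)
        self._write()

    def is_empty(self):
        return not self._variables

    def _write(self):
        # One "name value" pair per line.
        lines = [f"{name} {value}" for name, value in self._variables.items()]
        self.path.write_text("\n".join(lines) + ("\n" if lines else ""))


cfg = TempTessConfig()
cfg.set_config_variable("load_system_dawg", 0)
cfg.set_config_variable("load_freq_dawg", 0)
print(cfg.path.read_text())
```

An OCR wrapper could then check cfg.is_empty() and, if the user supplied no config of their own, hand str(cfg.path) to tesseract as the config file. Whether pytesseract passes such a path through unchanged is an assumption here, not something checked against its source.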

Why this would be a good feature:

  • For me and others like me who wrote their first line of code 8 months ago, even little trips to the back end of config files or source code can be confusing and take a lot of time.
  • There are a lot of super ridiculously lazy people out there, just like me, who would rather not know anything about how the programs and libraries they’re using work, and just want to use them to make other interesting applications.

But maybe it’s actually not very easy to implement. Is this actually possible?

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Reactions: 6
  • Comments: 18

Top GitHub Comments

4 reactions
int3l commented, Oct 2, 2018

Hi, thank you very much for the proposal. This can be implemented, and in fact you can already implement it yourself: write your own logic for handling config files, then pass the resulting config file via the config argument.

As far as integrating this into pytesseract: if I have some free time, I will try to implement the logic for this. The only “problematic” part is where to store this temp config.

And btw, we can have this nice python approach:

config = pytesseract.temp_config(path='<custom_filepath>')
config.set_variables({'key': 'value'})
pytesseract.image_to_string('<image_filepath>', config=config)

3 reactions
EricPHamilton commented, Nov 16, 2019

I was able to supply my own config file by using the following (“words” is the name of my config file):

pytesseract.run_and_get_output(im, extension="txt", config="words")
