Next-word-predictions development discussion thread
Hi all!
The time has finally come: I am working on dictionary suggestions! As I want the development of this feature to be as transparent and open as possible, I will share the roadmap for the next 2-3 weeks with you. This issue thread serves as a room for general discussion about the implementation, and feedback given in this issue will directly influence the early development of suggestions. The development of word suggestions is a rather big task, so I split it up into four main phases. After each phase is finished, the already implemented part of the suggestions will be released, so all users on F-Droid and Google Play can try it out. If you want to, you can also install the debug artifact builds, which are created for each commit in the PR, so you can try out new stuff even faster.
During development, all preferences regarding this topic will be marked as [EXPERIMENTAL] and all string resources regarding this feature will be non-translatable. I am doing this to avoid situations where I change a string, say, 5 times and the translators get notified 5 times to translate the same string over and over again, even though often only a small thing has changed. Near the end of the last phase, I will unlock these strings for translation.
For each phase, starting with Phase 1, a separate PR will be filed and linked here.
NOTE: the phases below are currently crossed out because the time plan got completely overthrown by the OOM issues as well as by the Flictionary implementation not working out as it should. I don’t have a concrete plan for suggestions, but I think I will implement spellchecking (and suggesting spell-checked words) first, as this part is a bit easier to add than next-word predictions. The old Flictionary implementation will stay in for now until I’ve found a replacement.
Phase 0
~This phase has technically already finished. It includes collecting ideas for which sources can be used (a big thanks to everyone who commented on #34!!), providing a way to have standardized dictionary formats in the APK (see https://github.com/florisboard/dictionary-tools) as well as reading and understanding how word suggestions should work. As this phase didn’t change a thing in the application itself, no release for this phase is being made.~
Phase 1 (v0.3.8) (#329)
~This is the “real” first phase, in which the input logic code gets an overhaul. In particular, this changes the EditorInstance logic to be more performant while providing better context to the prediction logic. Also, simple tests are conducted to check whether the binary dictionaries created by the dictionary-tools are actually readable on Android and are built correctly. Then, the logic for providing words for the current input is created. Note that the predictions in this phase are quite “dumb”, as the next-word algorithm is missing.~
~Supported languages in this phase:~
- ~English~
Phase 2 (v0.3.11)
~In this phase, experiments regarding the next-word algorithms are conducted and a decision is made on whether to use a traditional “Algorithm” or a local AI module. Additionally, more languages are supported (general variants only for now, no region-specific dialects etc.):~
- ~English~
- ~German~
- ~Spanish~
- ~Portuguese~
- ~Italian~
- ~French~
Phase 3 (v0.3.12)
- ~Further improvement of the prediction and next-word algorithm.~
- ~All languages from https://github.com/LuminosoInsight/wordfreq are supported, though still without region/dialect-specific sources.~
Phase 4 (v0.3.13)
- ~Final bug-fixing~
- ~Translations get unlocked on Crowdin for the suggestion-related UI strings.~
- ~If requested, I can also look into adding languages from sources other than LuminosoInsight in this phase, to support region/dialect-specific word predictions.~
~After Phase 4, word suggestions for the “main” languages should be up and running. This of course doesn’t mean that word suggestions are finished. I will then continuously add new languages, add support for dialects of languages, improve the prediction algorithm, fix bugs, etc.~
~I hope that this roadmap gives you a clear insight into what I’ll be working on in the next few weeks. Also note that, especially in the first week, I will not work on other open bug reports/feature requests (except when a critical security bug arises which has to be fixed asap), so I can fully concentrate on suggestions.~
If you have any thoughts about this or want to give feedback, feel free to comment below!
Issue Analytics
- Created 3 years ago
- Reactions: 87
- Comments: 57 (19 by maintainers)
Top GitHub Comments
@patrickgold for the dictionary-tools are you using this project or its logic?
This is really neat, and just some thoughts…
For Learning on the Fly
Regarding @kj7rrv’s question: learning without an existing model or wordlist has been done by several projects which implement user history, such as AnySoftKeyboard, OpenBoard, and Indic Keyboard.
The potential challenge is that all of these implementations are written in Java and rely on C++ functionality through the JNI interface, as they are based on the original Android Open Source keyboard code.
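To make the user-history idea a bit more concrete, here is a minimal sketch in Python. It is not taken from any of the keyboards named above; the `UserHistory` class and its method names are hypothetical and only illustrate learning next-word candidates purely from what the user has typed, without a preloaded wordlist.

```python
from collections import Counter, defaultdict


class UserHistory:
    """Hypothetical sketch: learns next-word candidates from committed text only."""

    def __init__(self):
        # Maps a previous word to a counter of the words typed right after it.
        self.bigrams = defaultdict(Counter)

    def learn(self, sentence):
        # Record every adjacent word pair of the committed sentence.
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            self.bigrams[prev][nxt] += 1

    def suggest(self, prev_word, limit=3):
        # Return the most frequent followers of prev_word, best first.
        followers = self.bigrams.get(prev_word.lower())
        if not followers:
            return []
        return [word for word, _ in followers.most_common(limit)]


history = UserHistory()
history.learn("see you later")
history.learn("see you soon")
history.learn("see you later today")
print(history.suggest("you"))  # ['later', 'soon']
```

A real keyboard would additionally need persistence, decay of old entries, and care around private input fields, none of which is covered here.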
Prediction Algorithms
Spell Check algorithm
For spellcheck/autocorrect, there’s a very good article and implementation, and it seems to be the real deal. There isn’t a Kotlin version, though.
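For readers who want a feel for how a dictionary-based spell checker can work, below is a small Python sketch of a generic edit-distance-1 approach. It is not the implementation from the article mentioned above; the function names and the toy `FREQUENCIES` table are assumptions made up for illustration.

```python
import string


def edits1(word):
    """All strings one delete, transpose, replace, or insert away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def correct(word, frequencies):
    """Return word if known, else the most frequent known word one edit away."""
    if word in frequencies:
        return word
    candidates = [w for w in edits1(word) if w in frequencies]
    if not candidates:
        return word  # nothing better found, leave the input untouched
    return max(candidates, key=frequencies.get)


# Hypothetical toy frequency table; a real one would come from a dictionary file.
FREQUENCIES = {"keyboard": 50, "keyword": 30, "board": 20}
print(correct("keybord", FREQUENCIES))  # keyboard
```

Generating every edit is simple but can get expensive for long words or larger edit distances, which is one reason more specialized approaches exist.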
Again, neat stuff happening here!
@ftyers In one way or another, I already use a character-based model, see https://github.com/florisboard/florisboard/blob/master/app/src/main/java/dev/patrickgold/florisboard/ime/nlp/FlorisLanguageModel.kt
The code of the keyboard you linked indeed looks very usable, as it uses TensorFlow Lite to handle the memory- and computation-heavy work, which is currently one of the main concerns.
If you want to experiment with adding some of the code to FlorisBoard, you can!
I have a few questions about TensorFlow, though: