Next-word-predictions development discussion thread

Hi all!

The time has finally come: I am working on dictionary suggestions! As I want the development of this feature to be as transparent and open as possible, I will share the roadmap for the next 2-3 weeks with you. This issue thread serves as a room for general discussion about the implementation, and feedback given in this issue will directly influence the early development of suggestions. The development of word suggestions is a rather big task, so I split it up into four main phases. After each phase is finished, the already implemented part of the suggestions will be released, so all users on F-Droid and Google Play can try it out. If you want to, you can also install the debug artifact builds which get created for each commit in the PR, so you can try out new stuff even faster.

During development, all preferences regarding this topic will be marked as [EXPERIMENTAL] and all string resources regarding this feature will be non-translatable. I am doing this to prevent circumstances where I change a string like 5 times and so the translators get notified 5 times to translate the same string over and over again, even though often only a small thing has changed. Nearing the end of the last phase, I will unlock these strings for translation.

For each phase beginning from phase 1, a separate PR will be filed and linked here.

NOTE: the phases below are currently crossed out because the time plan was completely overturned by the OOM issues and by the Flictionary implementation not working out as it should. I don’t have a concrete plan for suggestions, but I think I will implement spellchecking (and suggesting spell-checked words) first, as this part is a bit easier to add than next-word predictions. The old Flictionary implementation will stay in for now until I’ve found a replacement.

Phase 0

~This phase has technically already finished. It includes collecting ideas for which sources can be used (a big thanks to everyone who commented on #34!!), providing a way to have standardized dictionary formats in the APK (see https://github.com/florisboard/dictionary-tools) as well as reading and understanding how word suggestions should work. As this phase didn’t change a thing in the application itself, no release for this phase is being made.~

Phase 1 (v0.3.8) (#329)

~This is the “real” first phase, in which the input logic code gets an overhaul. This especially changes the EditorInstance logic to be more performant while providing better context to the prediction logic. Also, simple tests are conducted to check whether the binary dictionaries created by the dictionary-tools are actually readable on Android and are built correctly. Then, the logic for providing words for the current input is created. Note that the predictions in this phase are quite “dumb”, as the next-word algorithm is missing.~

~Supported languages in this phase:~

~English~

Phase 2 (v0.3.11)

~In this phase, experiments regarding the next-word algorithms are conducted and the decision is made whether to use a traditional “Algorithm” or a local AI module. Additionally, more languages are supported (for now in their general form only, without region-specific dialects):~

~English~
~German~
~Spanish~
~Portuguese~
~Italian~
~French~

Phase 3 (v0.3.12)

~- Further improvement of the prediction and next-word algorithm.~

~All languages from https://github.com/LuminosoInsight/wordfreq are supported, though still without region- or dialect-specific sources.~

Phase 4 (v0.3.13)

~- Final bug-fixing~

~Translations get unlocked on Crowdin for the suggestion-related UI strings.~
~If requested, I can look into adding languages from other sources rather than LuminosoInsight as well in this phase to support region/dialect specific word predictions.~

~After Phase 4, word suggestions for the “main” languages should be up and running. This of course doesn’t mean that word suggestions are finished. I will then continuously add new languages, add support for dialects of languages, improve the prediction algorithm, fix bugs, etc.~

~I hope that this roadmap gives you a clear insight into what I’ll be working on in the next few weeks. Also note that, especially in the first week, I will not work on other open bug reports/feature requests (except when a critical security bug arises which has to be fixed asap), so I can fully concentrate on suggestions.~

If you have any thoughts about this or want to give feedback, feel free to comment below!

Issue Analytics

State:
Created 3 years ago
Reactions: 87
Comments: 57 (19 by maintainers)

Top GitHub Comments

3 reactions

sabzo commented, Feb 13, 2021

@patrickgold for the dictionary-tools are you using this project or its logic?

This is really neat, and just some thoughts…

For Learning on the Fly

Regarding @kj7rrv’s question, learning without an existing model or wordlist has been done by several projects which implement User History: AnySoftKeyboard, OpenBoard, and Indic Keyboard.

Studying any of these files from the Indic Keyboard, or its unit tests that use the User History feature (already supported by the Android OS), and going down the rabbit hole, which probably leads to some C++ code, may give more insight into how the user history learning happens.

The potential challenge is that all the implementations are in Java and rely on C++ functionality through the JNI interface, based on the original Android Open Source keyboard code.

Prediction Algorithms

Word Prediction: N-grams were suggested, which is good. Indic/OpenBoard use bi-grams; SwiftKey used tri-grams a few years ago. There’s a helpful article on N-grams in Kotlin along with the source code.
Neural Nets: It would be useful to leave the API open-ended so that developers may replace the underlying algorithm, for example being able to swap out a bi-gram model for a quad-gram model, or maybe a bi-gram model for a Neural Network model.
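
To make the two points above concrete, here is a minimal Java sketch of a bi-gram predictor behind a swappable interface. It is only an illustration under stated assumptions: the names `NextWordPredictor` and `BigramPredictor` are hypothetical and are not taken from FlorisBoard, OpenBoard, or any of the projects mentioned in this thread. The model starts empty and learns from whatever text it is fed, which is roughly the idea behind user-history learning.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical interface: a bi-gram, quad-gram, or neural model could all implement it,
// so the keyboard code would not depend on one specific algorithm.
interface NextWordPredictor {
    void learn(String sentence);
    List<String> predict(String previousWord, int maxResults);
}

// Minimal bi-gram model: counts how often each word follows another word.
class BigramPredictor implements NextWordPredictor {
    // previous word -> (following word -> number of times seen)
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    @Override
    public void learn(String sentence) {
        String[] words = sentence.toLowerCase(Locale.ROOT).trim().split("\\s+");
        for (int i = 0; i + 1 < words.length; i++) {
            counts.computeIfAbsent(words[i], k -> new HashMap<>())
                  .merge(words[i + 1], 1, Integer::sum);
        }
    }

    @Override
    public List<String> predict(String previousWord, int maxResults) {
        Map<String, Integer> followers = counts.get(previousWord.toLowerCase(Locale.ROOT));
        if (followers == null) {
            return new ArrayList<>(); // nothing learned for this word yet
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(followers.entrySet());
        // Most frequent follower first; ties are broken alphabetically to keep results stable.
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < entries.size() && i < maxResults; i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }
}
```

In this sketch, calling `learn` on each committed sentence and `predict` with the last typed word would be enough to get suggestions. A tri-gram or neural model could be dropped in later by implementing the same interface, at the cost of more memory.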

Spell Check algorithm

For spellcheck/autocorrect, there’s a very good article and implementation, and it seems to be the real deal. There isn’t a Kotlin version, however.
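
For readers unfamiliar with how such suggestions can be ranked, below is a small Java sketch of one generic approach: keep dictionary words within a maximum Levenshtein edit distance of the typed word, then order them by distance and word frequency. This is not the algorithm from the article mentioned above (which is not reproduced here), and the class name `SpellSuggester` and the frequency map are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: ranks dictionary words by edit distance, then by frequency.
class SpellSuggester {
    private final Map<String, Integer> frequencies; // word -> frequency (assumed input format)

    SpellSuggester(Map<String, Integer> frequencies) {
        this.frequencies = frequencies;
    }

    // Classic Levenshtein distance (insert, delete, substitute), using two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    List<String> suggest(String typed, int maxDistance, int maxResults) {
        List<String> candidates = new ArrayList<>();
        for (String word : frequencies.keySet()) {
            if (editDistance(typed, word) <= maxDistance) {
                candidates.add(word);
            }
        }
        // Closest words first, then more frequent words, then alphabetical order.
        candidates.sort((x, y) -> {
            int byDistance = Integer.compare(editDistance(typed, x), editDistance(typed, y));
            if (byDistance != 0) return byDistance;
            int byFrequency = Integer.compare(frequencies.get(y), frequencies.get(x));
            return byFrequency != 0 ? byFrequency : x.compareTo(y);
        });
        return new ArrayList<>(candidates.subList(0, Math.min(maxResults, candidates.size())));
    }
}
```

Scanning the whole dictionary like this is linear in the number of words, so a real keyboard would likely need an index or a precomputed structure to stay responsive. The sketch only shows the ranking idea.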

Again, neat stuff happening here!

2 reactions

patrickgold commented, Apr 4, 2021

@ftyers In one way or another I already use a character-based model, see https://github.com/florisboard/florisboard/blob/master/app/src/main/java/dev/patrickgold/florisboard/ime/nlp/FlorisLanguageModel.kt

The code of the keyboard you linked indeed looks very usable, as it uses TensorFlow Lite to handle the memory- and computation-heavy work, which is currently one of the main concerns.

I would be interested in working on it.

If you want to experiment with adding some of the code to FlorisBoard, you can!

I have a few questions about TensorFlow though:

Are the tflite models for word suggestions pre-generated for each supported language and then learn while the user types, or does the model start from zero for each user?
TensorFlow Lite is open-source and has an Apache 2.0 license, though it is still a Google product, thus I am always a bit cautious at first when it comes to Google’s privacy practices. As far as I understood, TensorFlow Lite runs 100% locally and requires no Internet permission, so all data learned from typing stays within the app, right? It is important that this is ensured, because the whole purpose of FlorisBoard is to put the user’s privacy first.
What size does the TensorFlow Lite library have when added to the APK? For many FlorisBoard users it is important that the APK size stays relatively small, thus I am asking beforehand.
