Maintaining Oppia's Core Data with Backend Validation Checks
Introduction
Background
Oppia’s data storage is frequently updated to support the most recent features. To ensure a smooth user and developer experience, we need to have checks in place which ensure the integrity of the data currently being stored, as well as the integrity of future data.
These checks should exist in both the frontend and the backend. Oppia’s users interact with Oppia’s data in the GUI (frontend), so we need frontend validation checks that stop the user from inputting anything invalid. In the case where the user does manage to do something bad, the inputs get routed to our Python backend, and that’s where we need backend validation checks. These backend validation checks stop the bad inputs from reaching Oppia’s data storage and are the final line of defense.
This starter task focuses on the backend half of the validation checks.
Getting Started
Review the example and instructions below on how to add a backend validation check. Then leave a message on the issue thread asking to be assigned to a validation check. We’ll assign you to the check by adding your username next to the item. If you have any questions, send a message on the issue thread, and we’ll help you out!
Example
Context
Let’s say we want to guarantee that all explorations (lessons) have titles with no more than 36 characters. We previously did not have a limit like this, but want to add it so that titles display nicely on Android phones. Since we did not have this check before, there may be explorations in Oppia’s data storage that have titles with more than 36 characters.
Phase 1
We need to figure out which explorations (if any) violate this new validation! We do this with a Beam Job. Beam Jobs audit (search through) Oppia’s data storage and find data that violates the validation check that we want to add. After the Beam Job is run, if we are lucky and find that no exploration titles violate the check, then we can move on to Phase 2. Otherwise, we need to figure out what to do with those violating explorations. Maybe we need to raise the limit from 36 to 72? Maybe we need to cut off all of the characters after the 36th? That decision is up to you to discuss with other contributors!
Phase 2
Once we know that no exploration titles that are already stored have more than 36 characters, we need to add a backend validation check to Oppia’s code that will stop any new explorations from having violating titles. Recently, we added a frontend validation check in the UI that should hopefully stop the user from creating any invalid exploration titles. However, since we want to be extra secure, we still add backend validation checks as the last line of defense. That backend validation check is Phase 2.
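In code terms, both phases revolve around the same rule. Here is a minimal sketch, assuming the 36-character limit from this example (the constant and function names are just illustrative, not Oppia's actual code):

```python
# The invariant for this example: an exploration title is valid only if it is
# at most 36 characters long. Phase 1 audits stored data for violations of
# this rule; Phase 2 rejects any new data that would violate it.
MAX_TITLE_LENGTH = 36


def exploration_title_is_valid(title: str) -> bool:
    return len(title) <= MAX_TITLE_LENGTH
```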
How to Add Backend Validation Checks for Core Models
Phase 1: Write a Beam Job
Idea: Write a script that audits all of Oppia’s existing data and finds any data that does not follow the backend validation check that we want to add. Sample PR: #14343. NOTE: don’t make separate PRs for Phase 1 and Phase 2; just modify the same PR.
- Firstly, confirm that we don’t already have a backend validation implemented for this check. You can find the relevant files in the `core/domain` folder. (For example, we would look in `exp_domain.py` for the example above.) If a backend validation already exists, then we probably don’t need a Beam Job, and you can take up another available check.
- Write the validation job, following the documentation on Beam Jobs (a minimal sketch of such a job appears after this list). Your Beam Job file should be `core.jobs.batch_jobs.<model_type>_validation_jobs.py`.
- Test your Beam Job locally as a Release Coordinator (see instructions).
- Run the Beam Job unit tests: `python -m scripts.run_backend_tests --test_target=core.jobs.batch_jobs.<model_type>_validation_jobs_test`, and ensure they all pass.
- Create a PR and wait for the review!
- After you’ve been given the OK on the PR, submit a request for your Beam Job to be tested on a production server using this form. Your Beam Job is an audit job. You can optionally read more about this here. For an example of what the Beam job instructions look like, see this Google Doc.
- Wait for an Oppia admin to send you the results of your Beam Job.
- If you receive errors, do the following:
- Check whether any of the errors correspond to the curated lessons; if so, record them in this spreadsheet. (You’ll find a list of curated exploration IDs in the spreadsheet.)
- Update the job tracker spreadsheet to keep track of the Beam job results and decisions.
Note: We follow a particular template for our Beam job results; make sure to use this template so that it is easier for us to keep track of the results.
Template for Beam job errors: `The id of {{tag}} is {{id}} and its {{field that's being validated}} is {{current data}}`
Example: `The id of exploration is 10 and its category is Test`
(reference)
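As a rough illustration of what a Phase 1 audit job boils down to, here is a minimal sketch using plain Apache Beam. This is not Oppia's actual job framework (real jobs subclass the base job classes described in the Beam Jobs wiki and read models through the framework's own I/O helpers); the 36-character limit comes from the title example above, and the data and helper names are hypothetical stand-ins.

```python
# Illustrative sketch only: Oppia's real validation jobs are built on its own
# Beam job framework (see the Beam Jobs wiki page) and read models from the
# datastore; this uses plain Apache Beam with an in-memory list so the shape
# of an audit job is easy to see.
import apache_beam as beam

MAX_TITLE_LENGTH = 36  # Hypothetical constant for the title-length example.


def report_error(exp):
    # Formats the error using the results template from the note above.
    return 'The id of exploration is %s and its title is %s' % (
        exp['id'], exp['title'])


with beam.Pipeline() as pipeline:
    _ = (
        pipeline
        # A real job would read exploration models from the datastore here.
        | beam.Create([
            {'id': '10', 'title': 'Short title'},
            {'id': '11', 'title': 'This title is far too long to display nicely on Android'},
        ])
        # Keep only the explorations that violate the proposed check.
        | beam.Filter(lambda exp: len(exp['title']) > MAX_TITLE_LENGTH)
        | beam.Map(report_error)
        | beam.Map(print)
    )
```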
Phase 2: Add the Backend Validation Check
Idea: Add a check that stops any invalid data from entering Oppia’s storage in the future. Sample PR: #14962. NOTE: don’t make separate PRs for Phase 1 and Phase 2; just modify the same PR.
- Add a backend validation to guarantee that no new data violates the issue we are working on. In our case, there should be no exploration with a title whose length is greater than 36 characters.
- You will be adding the backend validation in the `validate()` method of the relevant object class in the domain files (see the `core/domain/` folder); a minimal sketch appears after this list. This layer validates the data before finally storing it. In the example above, we will be adding the validation in `exp_domain.py`.
- Find the appropriate class which contains the field for which you will be applying the validation. In our case, it will be `class Exploration`, since that class contains the `title` field. Add your validation check to that class, and raise a validation error in case of violation.
- Add a test in the test file associated with the domain file. In our case, it’s the `exp_domain_test.py` file.
- Now, after implementing the backend validation, we need to conduct a small investigation to check whether our changes break anything. You can take a reference here. If some errors occur while doing this, make sure to add a frontend validation which handles your validation error, so that the user can fix the error before it reaches the backend.
- Once you’re done with the above, raise the backend validation PR and you are good to go!
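For orientation, here is a minimal sketch of what such a Phase 2 check and its unit test might look like. The class, constant, and error names are simplified stand-ins (the real `Exploration.validate()` lives in `core/domain/exp_domain.py`, contains many more checks, and raises Oppia's own validation error type), so treat this as an outline rather than the actual implementation.

```python
# Simplified sketch of a domain-layer validation and its unit test; names
# below are hypothetical stand-ins for the real Oppia classes.
import unittest

MAX_TITLE_LENGTH = 36  # Hypothetical constant for the title-length example.


class ValidationError(Exception):
    """Stand-in for Oppia's validation error class."""


class Exploration:
    """Greatly reduced stand-in for the real domain class."""

    def __init__(self, title):
        self.title = title

    def validate(self):
        # The Phase 2 check: reject titles longer than the allowed maximum.
        if len(self.title) > MAX_TITLE_LENGTH:
            raise ValidationError(
                'Exploration title should be at most %d characters, '
                'received: %s' % (MAX_TITLE_LENGTH, self.title))


class ExplorationValidationTest(unittest.TestCase):
    """In Oppia, a test like this would go into exp_domain_test.py."""

    def test_overlong_title_is_rejected(self):
        with self.assertRaises(ValidationError):
            Exploration('x' * (MAX_TITLE_LENGTH + 1)).validate()

    def test_short_title_is_accepted(self):
        Exploration('A valid title').validate()


if __name__ == '__main__':
    unittest.main()
```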
Validation Checks
Available Checks
In order of priority:
🏷️ General State Validation (for `Question`)
- `labelled_as_correct` should not be True if destination ID is (try again). 🏷️ `Outcome`
- The answer group should have at least one rule spec. 🏷️ `AnswerGroup`
- The default outcome should have a valid destination node. 🏷️ `DefaultOutcome`
- Answer specified in interaction should actually be a correct answer. 🏷️ `Solution`
- `destination_id` should be non-empty and match the ID of a state in the exploration. 🏷️ `Outcome`
🏷️ Core Model Validation
- `AnswerGroup`’s tagged `skill misconception IDs` should be a list of misconception IDs attached to one of the skills pointed to by the question’s `linked skill IDs`. 🏷️ `Question`: `Question.state`
- Exploration’s `title`, `category`, `objective`, `language_code`, and `tags` should all match those of the corresponding exploration summary (can use `get_exploration_summary_by_id` to find the corresponding exploration summary). 🏷️ `Exploration` and `ExplorationSummary`
- Question summary’s `interaction_id` should be a valid ID, should match the `interaction_id` of the corresponding Question’s `InteractionInstance`, and should be contained within the list of Android-allowed interactions excluding `Continue` and `EndExploration`. 🏷️ `Question` and `QuestionSummary`
- `inapplicable_skill_misconception_ids` should be a (not necessarily strict) subset of the optional misconceptions associated with the linked skills. `inapplicable_skill_misconception_ids` should not intersect with the tagged skill misconception IDs for the answer groups, but their union should be all of the misconception IDs of all the linked skills. 🏷️ `Question` and `Skill`
- Question summary’s `misconception_ids` should be the union of all `misconception_ids` for all of the corresponding question’s linked skills (can use `get_question_by_id` and `get_skill_by_id`). 🏷️ `Question` and `QuestionSummary` and `Skill`
🏷️ Low Priority
- Subtopic `skillIds` should be a list of unique strings where each string represents an existing `skill_id`. 🏷️ `Topic`
- Topic `canonical_name` should be the lowercase version of the `topic_name`. 🏷️ `Topic` <-- only needs backend validation
- Topic `practice_tab_is_displayed` is `true` only when there are at least `10` practice questions in the topic. 🏷️ `Topic` and `Question`
- Story `corresponding_topic_id` should be valid, and that topic should contain this story. 🏷️ `Topic` and `Story`
Claimed Checks
🏷️ Curated Lessons (lessons in a topic) @soumyo123-prog
- State classifier `model_id` should be `None` for curated lessons. 🏷️ `State`
- Outcome `param_changes` should be empty for curated lessons. 🏷️ `Outcome`
- Outcome `refresher_exploration_id` should be `None` for curated lessons. 🏷️ `Outcome`
- Outcome `missing_prerequisite_skill_id` should be `None` or the ID of a skill. 🏷️ `Outcome`
- Exploration `param_specs` and `param_changes` should be empty for curated lessons. 🏷️ `Exploration`
- Training data should be empty for curated lessons. 🏷️ `AnswerGroup`
🏷️ General State Validation (for `Exploration`) @lkbhitesh07
- `labelled_as_correct` should not be True if destination ID is (try again). 🏷️ `Outcome`
- The answer group should have at least one rule spec. 🏷️ `AnswerGroup`
- The default outcome should have a valid destination node. 🏷️ `DefaultOutcome`
- Answer specified in interaction should actually be a correct answer. 🏷️ `Solution`
- `destination_id` should be non-empty and match the ID of a state in the exploration. 🏷️ `Outcome`
🏷️ Core Model Validation
- Exploration title should have a max length of 36. 🏷️ `Exploration` @lkbhitesh07
- Exploration tags should be a list of at most 10 non-empty strings without duplicates, where each tag has a max length of 30. 🏷️ `Exploration` @sahiljoster32 #15086
- `AnswerGroup.tagged_skill_misconception_id` should be `None`. 🏷️ `Exploration`: `Exploration.state` @lkbhitesh07
🏷️ Low Priority
- Rubric explanations should be a list of at most 10 strings of 300 characters each. 🏷️ `Skill` @soumyo123-prog #15173
- Chapter `thumbnail` should have background color of `#F8BF74`, `#D68F78`, `#8EBBB6`, or `#B3D8F1`. 🏷️ `Story` @soumyo123-prog
- Story `notes` should have at most `5000` characters. 🏷️ `Story` @gopivaibhav #15324
- `story_is_published` should be a boolean. 🏷️ `Topic` @gopivaibhav <-- only needs backend validation
Completed Checks
- Exploration user rights (`owner_ids`, `editor_ids`, `voice_artist_ids`, `viewer_ids`) should not have any user IDs in common. @EricZLou
- Story description should have at most 1000 characters. @soumyo123-prog #15038
- Subtopic `thumbnail` should have background color of `#FFFFFF`. @Lawful2002
- `Misconception ID` should be an integer >= 0. @Lawful2002 #15039
- Topic `abbreviated_name` should have at most `39` characters. @Lawful2002 #15094
- Question state data schema version should be >= 27. @sahiljoster32 #15264
- Topic `page_title_fragment_for_web` should be non-empty, with min-length `5` and max-length `50`.
- There must be at least one explanation for the `Medium rubric`. @lkbhitesh07 #15235
- Exploration `scaled_average_rating` should be a non-negative float between 0 and 5, inclusive. 🏷️ `Exploration` @Lawful2002 #14995
- Subtopic `url fragment` should be non-empty and match the RegEx `^[a-z]+(-[a-z]+)*$` with at most 25 characters. 🏷️ `Topic` @Lawful2002 #15500
- Story `thumbnail` should have background color of `#F8BF74`, `#D68F78`, `#8EBBB6`, or `#B3D8F1`. 🏷️ `Story` @soumyo123-prog #15137
- Exploration `category` should be one of the fixed list of categories defined by `ALL_CATEGORIES` in `constants.ts`. 🏷️ `Exploration` @Lawful2002 #15342
Top GitHub Comments
@lkbhitesh07 Thank you for asking. I am working on some other issues involving discussion docs that need to be fixed quickly; I will start working on this task soon.
@gopivaibhav Thanks for the reply and for clearing the ambiguity!!