
Maintaining Oppia's Core Data with Backend Validation Checks


All Validation Check Issues:

Introduction

Background

Oppia’s data storage is frequently updated to support the most recent features. To ensure a smooth user and developer experience, we need checks in place that verify the integrity of the data currently being stored, as well as the integrity of future data.

These checks should exist in both the frontend and the backend. Oppia’s users interact with Oppia’s data through the GUI (frontend), so we need frontend validation checks that stop the user from entering anything invalid. If the user does manage to submit something invalid, the input is routed to our Python backend, and that’s where we need backend validation checks. These backend validation checks stop invalid inputs from reaching Oppia’s data storage and are the final line of defense.

This starter task focuses on the backend half of the validation checks.

Getting Started

Review the example and instructions below on how to add a backend validation check. Then leave a message on the issue thread asking to be assigned to a validation check. We’ll assign you to the check by adding your username next to the item. If you have any questions, send a message on the issue thread, and we’ll help you out!

Example

Context

Let’s say we want to guarantee that all explorations (lessons) have titles with no more than 36 characters. We previously did not have a limit like this, but want to add it so that titles display nicely on Android phones. Since we did not have this check before, there may be explorations in Oppia’s data storage that have titles with more than 36 characters.

Phase 1

We need to figure out which explorations (if any) violate this new validation! To do this, we write a Beam Job: a script that audits (searches through) Oppia’s data storage and finds data that violates the validation check we want to add. After the Beam Job is run, if we are lucky and find that no exploration titles violate it, then we can move on to Phase 2. Otherwise, we need to decide what to do with those violating explorations. Maybe we need to raise the limit from 36 to 72? Maybe we need to cut off all of the characters after the 36th? That decision is up to you to discuss with other contributors!

Phase 2

Once we know that none of the exploration titles already in storage have more than 36 characters, we need to add a backend validation check to Oppia’s code that will stop any new explorations from having violating titles. Recently, we added a frontend validation check in the UI that should stop the user from creating invalid exploration titles. However, since we want to be extra secure, we still add backend validation checks as the last line of defense. Adding that backend validation check is Phase 2.

How to Add Backend Validation Checks for Core Models

Phase 1: Write a Beam Job

Idea: Write a script that audits all of Oppia’s existing data and finds any data that does not follow the backend validation check that we want to add. Sample PR: #14343. NOTE: don’t make separate PRs for Phase 1 and Phase 2; just modify the same PR.

  1. First, confirm that we don’t already have a backend validation implemented for this check. You can find the relevant files in the core/domain folder. (For example, we would look in exp_domain.py for the example above.) If a backend validation already exists, then we probably don’t need a Beam job, and you can take up another available check.
  2. Write the validation job, following the documentation on Beam Jobs. Your Beam Job file should be core/jobs/batch_jobs/<model_type>_validation_jobs.py. (A minimal, generic sketch of such an audit appears after the note at the end of this section.)
  3. Test your Beam Job locally as a Release Coordinator (see instructions).
  4. Run the Beam Job unit tests: python -m scripts.run_backend_tests --test_target=core.jobs.batch_jobs.<model_type>_validation_jobs_test, and ensure they all pass.
  5. Create a PR and wait for the review!
  6. After you’ve been given the OK on the PR, submit a request for your Beam Job to be tested on a production server using this form. Your Beam Job is an audit job. You can optionally read more about this here. For an example of what the Beam job instructions look like, see this Google Doc.
  7. Wait for an Oppia admin to send you the results of your Beam Job.
  8. If you receive errors, do the following:
    • Check whether any of the errors correspond to the curated lessons; if so, record them in this spreadsheet. (You’ll find a list of curated exploration IDs in the spreadsheet.)
    • Update the job tracker spreadsheet to keep track of the Beam job results and decisions.

Note: we follow a particular template for Beam job error messages; please use it so that the results are easier for us to track. Template: The id of {{tag}} is {{id}} and its {{field that's being validated}} is {{current data}}. Example: The id of exploration is 10 and its category is Test (reference).
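For orientation, here is a minimal, generic Apache Beam sketch of the audit from the running example (explorations whose titles exceed 36 characters), with output that follows the error template above. It uses the plain apache_beam API and an in-memory stand-in for the datastore; Oppia’s real Beam Jobs are written against its own job framework in core/jobs (following the Beam Jobs documentation linked in step 2), so the names and data shapes below are illustrative assumptions rather than Oppia code.

```python
# Hedged sketch only: a plain Apache Beam pipeline, not Oppia's job framework.
import apache_beam as beam

MAX_TITLE_LENGTH = 36  # The new limit we want to enforce.


def report_long_titles(explorations):
    """Returns one error string per exploration whose title is too long."""
    return (
        explorations
        | 'Keep violating explorations' >> beam.Filter(
            lambda exp: len(exp['title']) > MAX_TITLE_LENGTH)
        | 'Format error messages' >> beam.Map(
            lambda exp: 'The id of exploration is %s and its title is %s' % (
                exp['id'], exp['title']))
    )


with beam.Pipeline() as pipeline:
    # Hypothetical in-memory stand-in for reading stored exploration models.
    explorations = pipeline | 'Create sample data' >> beam.Create([
        {'id': 'exp_1', 'title': 'A title that is far too long to display on Android'},
        {'id': 'exp_2', 'title': 'Short title'},
    ])
    report_long_titles(explorations) | 'Print audit results' >> beam.Map(print)
```

Running this with the default DirectRunner prints one error line for exp_1 and nothing for exp_2; a real audit job would report its results through the job framework instead of printing.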

Phase 2: Add the Backend Validation Check

Idea: Add a check that stops any invalid data from entering Oppia’s storage in the future. Sample PR: #14962. NOTE: don’t make separate PRs for Phase 1 and Phase 2; just modify the same PR.

  1. Add a backend validation to guarantee that no new data violates the issue we are working on. In our case, there should be no exploration with a title whose length is greater than 36 characters.
  2. You will be adding the backend validation in the validate() method in the relevant object class in the domain files (see the core/domain/ folder). This layer validates the data before finally storing it. In the example above, we will be adding the validation in exp_domain.py.
  3. Find the appropriate class that contains the field you are validating. In our case, it is class Exploration, since that class contains the title field. Add your validation check to that class, and raise a validation error in case of violation (see the sketch after this list).
  4. Add a test in the test file associated with the domain file. In our case, it’s the exp_domain_test.py file.
  5. After implementing the backend validation, we need to conduct a small investigation to confirm that our changes don’t break anything. You can take a reference here. If errors occur while doing this, make sure to add a frontend validation that handles your validation error, so that the user can fix the problem before it reaches the backend.
  6. Once you’re done with the above, raise the backend validation PR and you are good to go!
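To make steps 3 and 4 concrete, here is a hypothetical, self-contained sketch of the check from the running example. In Oppia the check would live inside the existing validate() method of the Exploration domain class in core/domain/exp_domain.py and raise Oppia’s own validation error type; the simplified class, constant name, and error wording below are assumptions for illustration only.

```python
# Hedged sketch only: a simplified stand-in, not Oppia's actual domain class.
MAX_TITLE_LENGTH = 36  # Illustrative constant; Oppia defines its own limits.


class ValidationError(Exception):
    """Raised when a domain object fails a backend validation check."""


class Exploration:
    """Simplified stand-in for the Exploration domain object."""

    def __init__(self, title):
        self.title = title

    def validate(self):
        """Checks the exploration's fields before the object is stored."""
        if not isinstance(self.title, str):
            raise ValidationError(
                'Expected title to be a string, received %r' % self.title)
        if len(self.title) > MAX_TITLE_LENGTH:
            raise ValidationError(
                'Exploration title should have at most %d characters, '
                'received: %s' % (MAX_TITLE_LENGTH, self.title))


# The unit test in step 4 would assert both directions: a 36-character title
# passes validation, while a 37-character title raises the validation error.
if __name__ == '__main__':
    Exploration('x' * 36).validate()  # Passes silently.
    try:
        Exploration('x' * 37).validate()
    except ValidationError as error:
        print('Caught expected error:', error)
```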

Validation Checks

Available Checks

In order of priority:

🏷️ General State Validation (for Question)

  • labelled_as_correct should not be True if destination ID is (try again). 🏷️Outcome
  • The answer group should have at least one rule spec. 🏷️AnswerGroup
  • The default outcome should have a valid destination node. 🏷️DefaultOutcome
  • Answer specified in interaction should actually be a correct answer. 🏷️Solution
  • destination_id should be non-empty and match the ID of a state in the exploration. 🏷️Outcome

🏷️ Core Model Validation

  • AnswerGroup’s tagged skill misconception IDs should be a list of misconception IDs attached to one of the skills pointed to by the question’s linked skill IDs. 🏷️ Question: Question.state
  • Exploration’s title, category, objective, language_code, and tags should all match those of the corresponding exploration summary (can use get_exploration_summary_by_id to find corresponding exploration summary). 🏷️ Exploration and ExplorationSummary
  • Question summary’s interaction_id should be a valid ID, should match the interaction_id of the corresponding Question’s InteractionInstance, and should be contained within the list of ANDROID-allowed interactions excluding Continue and EndExploration. 🏷️ Question and QuestionSummary
  • inapplicable_skill_misconception_ids should be a (not necessarily strict) subset of the optional misconceptions associated with the linked skills. inapplicable_skill_misconception_ids should not intersect with the tagged skill misconception ids for the answer groups, but their union should be all of the misconception ids of all the linked skills. 🏷️ Question and Skill
  • Question summary’s misconception_ids should be the union of all misconception_ids for all of the corresponding question’s linked skills (can use get_question_by_id and get_skill_by_id). 🏷️ Question and QuestionSummary and Skill

🏷️ Low Priority

  • Subtopic skillIds should be a list of unique strings where each string represents an existing skill_id. 🏷️ Topic
  • Topic canonical_name should be the lowercase version of the topic_name. 🏷️ Topic <-- only needs backend validation
  • Topic practice_tab_is_displayed is true only when there are at least 10 practice questions in the topic. 🏷️ Topic and Question
  • Story corresponding_topic_id should be valid, and that topic should contain this story. 🏷️ Topic and Story

Claimed Checks

🏷️ Curated Lessons (lessons in a topic) @soumyo123-prog

  • State classifier model_id should be None for curated lessons. 🏷️State
  • Outcome param_changes should be empty for curated lessons. 🏷️Outcome
  • Outcome refresher_exploration_id should be None for curated lessons. 🏷️Outcome
  • Outcome missing_prerequisite_skill_id should be None or the ID of a skill. 🏷️Outcome
  • Exploration param_specs and param_changes should be empty for curated lessons. 🏷️ Exploration
  • Training data should be empty for curated lessons. 🏷️AnswerGroup

🏷️ General State Validation (for Exploration) @lkbhitesh07

  • labelled_as_correct should not be True if destination ID is (try again). 🏷️Outcome
  • The answer group should have at least one rule spec. 🏷️AnswerGroup
  • The default outcome should have a valid destination node. 🏷️DefaultOutcome
  • Answer specified in interaction should actually be a correct answer. 🏷️Solution
  • destination_id should be non-empty and match the ID of a state in the exploration. 🏷️Outcome

🏷️ Core Model Validation

  • Exploration title should have a max length of 36. 🏷️ Exploration @lkbhitesh07
  • Exploration tags should be a list of at most 10 non-empty strings without duplicates, where each tag has a max length of 30. 🏷️ Exploration @sahiljoster32 #15086
  • AnswerGroup.tagged_skill_misconception_id should be None. 🏷️ Exploration: Exploration.state @lkbhitesh07

🏷️ Low Priority

  • Rubric explanations should be a list of at most 10 strings of 300 characters each. 🏷️ Skill @soumyo123-prog #15173
  • Chapter thumbnail should have background color of #F8BF74, #D68F78, #8EBBB6, or #B3D8F1. 🏷️ Story @soumyo123-prog
  • Story notes should have at most 5000 characters. 🏷️ Story @gopivaibhav #15324
  • story_is_published should be a boolean. 🏷️ Topic @gopivaibhav <-- only needs backend validation

Completed Checks

  • Exploration user rights (owner_ids, editor_ids, voice_artist_ids, viewer_ids) should not have any user IDs in common. @EricZLou
  • Story description should have at most 1000 characters. @soumyo123-prog #15038
  • Subtopic thumbnail should have background color of #FFFFFF. @Lawful2002
  • Misconception ID should be an integer >= 0. @Lawful2002 #15039
  • Topic abbreviated_name should have at most 39 characters. @Lawful2002 #15094
  • Question state data schema version should be >= 27. @sahiljoster32 #15264
  • Topic page_title_fragment_for_web should be non-empty, with min-length 5 and max-length 50.
  • There must be at least one explanation for the Medium rubric. @lkbhitesh07 #15235
  • Exploration scaled_average_rating should be a non-negative float between 0 and 5, inclusive. 🏷️ Exploration @Lawful2002 #14995
  • Subtopic url fragment should be non-empty and match the RegEx "^[a-z]+(-[a-z]+)*$" with at most 25 characters. 🏷️ Topic @Lawful2002 #15500
  • Story thumbnail should have background color of #F8BF74, #D68F78, #8EBBB6, or #B3D8F1. 🏷️ Story @soumyo123-prog #15137
  • Exploration category should be one of the fixed list of categories defined by ALL_CATEGORIES in constants.ts. 🏷️ Exploration @Lawful2002 #15342


Top GitHub Comments

1 reaction
chiragbaid7 commented on Mar 7, 2022

@lkbhitesh07 Thank you for asking. I am working on some other issues involving discussion docs that need to be fixed quickly; I will start working on this task soon.

0 reactions
sahiljoster32 commented on Apr 30, 2022

@gopivaibhav Thanks for the reply and for clearing the ambiguity!!

