Gather more data for the chatbot's database via publicly available datasets

Requirement

The sentences.csv file contains very little data that can be used for initial training. The aim is to gather more data from publicly available datasets and sources so that the ML models can improve the bot's responses.

Prerequisites

  • Elementary knowledge of Python
  • Elementary understanding of the available data

Dependencies

None

Description

This is an open-ended issue in which participants can explore various sources to gather the data needed to improve the bot’s NLP capabilities. Depending on the data, some elementary pre-processing may be required before it is added to the existing data. A separate issue may be created later for the pre-processing if needed.
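As a rough illustration of what "elementary pre-processing" could mean here, the sketch below normalizes whitespace and drops empty or duplicate sentences. The function names `clean_sentence` and `preprocess` are hypothetical and not part of the project; the actual steps would depend on the dataset being imported.

```python
import re
import unicodedata


def clean_sentence(text: str) -> str:
    """Normalize one raw sentence before it is added to the training data."""
    # Fold compatibility variants (e.g. full-width letters) to a canonical form.
    text = unicodedata.normalize("NFKC", text)
    # Collapse runs of whitespace, including newlines and tabs, to one space.
    return re.sub(r"\s+", " ", text).strip()


def preprocess(sentences):
    """Clean sentences, dropping empties and case-insensitive duplicates."""
    seen = set()
    cleaned = []
    for raw in sentences:
        sentence = clean_sentence(raw)
        key = sentence.casefold()
        if sentence and key not in seen:
            seen.add(key)
            cleaned.append(sentence)
    return cleaned
```

Keeping the first spelling of each duplicate is an arbitrary choice; a real pipeline might prefer a different rule.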

A good starting point is to look for common conversational examples such as ‘Hello’, ‘How’re you’, and ‘That’s good to hear’, which are labelled ‘C’ in the sentences.csv file. Searching for data one label at a time might be easier.
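Collecting data label by label could look something like the sketch below, which appends newly gathered sentences to the CSV under a single label and skips any sentence already present. The column layout is an assumption (one `sentence,label` pair per row, no header row); the issue does not describe the real structure of sentences.csv, and `append_labelled` is a hypothetical helper.

```python
import csv
from pathlib import Path


def append_labelled(csv_path, sentences, label):
    """Append (sentence, label) rows to csv_path, skipping known sentences.

    Assumes each row is `sentence,label` with no header row, and that an
    existing file ends with a newline. Adjust to the real file layout.
    Returns the number of rows that were added.
    """
    path = Path(csv_path)
    existing = set()
    if path.exists():
        with path.open(newline="", encoding="utf-8") as handle:
            existing = {row[0] for row in csv.reader(handle) if row}

    added = 0
    with path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        for sentence in sentences:
            if sentence not in existing:
                writer.writerow([sentence, label])
                existing.add(sentence)
                added += 1
    return added
```

For example, `append_labelled("sentences.csv", ["Hello", "How're you"], "C")` would add only the sentences that are not already in the file, all tagged with the conversational label.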

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 18 (12 by maintainers)

Top GitHub Comments

1 reaction
radhikasethi2011 commented, Mar 23, 2020

Update: just fixing one problem in my code. Will then be ready to create a PR.

1 reaction
radhikasethi2011 commented, Mar 20, 2020

Working on it. MySQL is giving an issue, will be making the pull request by tomorrow definitely.

Top Results From Across the Web

14 Best Chatbot Datasets for Machine Learning - iMerit
We at iMerit have compiled a list of the most successful and commonly-used datasets that are perfect for anyone looking to train a...

24 Best Machine Learning Datasets for Chatbot Training
Best ML datasets for chatbot training. Chatbot training datasets from multilingual dataset to dialogues and customer support chatbots.

Training a Chatbot: How to Decide Which Data Goes to Your AI
How to Collect Data for Your Chatbot. Gather Data from...

Top 15 Chatbot Datasets for NLP Projects - HackerNoon
Question-Answer Dataset · The WikiQA Corpus · Yahoo Language Data · TREC QA Collection ·...

Where to get Chatbot Training Data (and what it is)
On a fundamental level, a chatbot turns raw data into a conversation. This data is usually unstructured (sometimes called unlabelled data, ...
