Gather more data for the chatbot's database via publicly available datasets
Requirement
The sentences.csv file has very limited data that can be used for initial training. The aim is to gather more data from publicly available datasets and sources to help improve the bot's responses via ML models.
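Before looking for new sources, it can help to see which labels are thinnest. The sketch below counts rows per label. It assumes, hypothetically, that sentences.csv has a header row with `sentence` and `label` columns; the real schema may differ, so adjust the column names to match. The `Q` label in the sample is made up for illustration.

```python
import csv
import io
from collections import Counter


def label_counts(csv_file):
    """Count how many rows carry each label.

    Assumes (hypothetically) a header row with 'sentence' and 'label' columns.
    """
    reader = csv.DictReader(csv_file)
    return Counter(row["label"].strip() for row in reader)


# An in-memory sample standing in for sentences.csv; for the real file use
# open("sentences.csv", newline="", encoding="utf-8") instead.
sample = io.StringIO("sentence,label\nHello,C\nHow're you,C\nWhat is NLP?,Q\n")
counts = label_counts(sample)
print(counts.most_common())
```

Labels with the smallest counts are natural first targets when searching public datasets.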
Prerequisites
- Elementary knowledge of Python
- Elementary understanding of the available data
Dependencies
None
Description
This is an open-ended issue where participants can explore various sources to gather the data needed to improve the bot's NLP capabilities. Depending on the data, some elementary pre-processing may be required before it is added to the existing data. A separate issue may be created for the pre-processing later if needed.
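As an illustration of what "elementary pre-processing" might involve, the sketch below collapses runs of whitespace, drops empty strings and removes sentences already present in the existing data (compared case-insensitively). The function names `normalise` and `preprocess` are hypothetical, and this is only one possible set of cleaning steps, not a requirement of the issue.

```python
import re


def normalise(text):
    """Collapse runs of whitespace into single spaces and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def preprocess(candidates, existing):
    """Clean candidate sentences and drop empties and duplicates.

    Duplicates are detected case-insensitively against both the existing
    data and earlier candidates; the first spelling seen is kept.
    """
    seen = {normalise(s).lower() for s in existing}
    cleaned = []
    for text in candidates:
        text = normalise(text)
        key = text.lower()
        if not text or key in seen:
            continue
        seen.add(key)
        cleaned.append(text)
    return cleaned


result = preprocess(
    ["  hello  ", "Good   morning", "", "good morning", "Nice to meet you"],
    existing=["Hello"],
)
print(result)
```

Here "  hello  " is dropped because "Hello" already exists, and the second "good morning" is dropped as a duplicate of the first.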
A good place to start would be common conversational examples such as 'Hello', 'How're you' and 'That's good to hear', which are labelled 'C' in the sentences.csv file. Searching for data label by label may make the task easier.
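Once a batch of conversational sentences has been collected and cleaned, it could be appended to the dataset with the 'C' label. This sketch assumes, hypothetically, a two-column layout of sentence followed by label; check the actual column order in sentences.csv first. `append_labelled` is an illustrative helper, not part of the repository.

```python
import csv
import io


def append_labelled(csv_file, sentences, label="C"):
    """Write one (sentence, label) row per sentence; return the row count.

    Assumes (hypothetically) the CSV columns are ordered sentence, label.
    """
    writer = csv.writer(csv_file)
    written = 0
    for sentence in sentences:
        writer.writerow([sentence, label])
        written += 1
    return written


# In-memory demo; for the real file use
# open("sentences.csv", "a", newline="", encoding="utf-8").
buf = io.StringIO()
n = append_labelled(buf, ["Hello", "That's good to hear"])
print(repr(buf.getvalue()))
```

The csv module handles quoting automatically, so sentences containing commas or double quotes are written safely.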
Update: just fixing one problem in my code. Then I will be ready to create a PR.
Working on it. MySQL is giving me an issue; I will definitely make the pull request by tomorrow.