LSTM for Hate Speech
I am trying to train a classifier that detects hate speech, mainly toxic comments, in any incoming request. Once I integrate it into my system (a blog), I will use it to flag toxic text and reject those posts. Seems fair enough 😄
The statement: I am still learning the system, and as far as I know an LSTM is best suited to training on text data (suggestions welcome). I am using the Toxic training data set to train the network. The CSV file contains the text and tags such as Toxic, Insult, Severe… After cleaning the data and converting it to JSON for the API, the final training data looks like this:
# Hard language removed from the toxic example
# The processed training set has almost 150,000 entries
{
"input": "You, sir, are my hero. Any chance you remember what page that's on?",
"output": "safe"
},
{
"input": "Congratulations from me as well, use the tools well",
"output": "safe"
},
{
"input": "DONT PISS AROUND ON MY WORK",
"output": "toxic"
},
{
"input": "Your vandalism to the Matt Shirvington article has been reverted. Please don't do it again, or you will be banned.",
"output": "safe"
}
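The CSV-to-JSON step described above could be sketched roughly as follows. This is only a minimal illustration: `rowToRecord`, the `text` column name, and the `tagA`/`tagB` tag columns are hypothetical placeholders, and the rule "any flagged tag means toxic" is an assumption about how the tags were collapsed into the two outputs.

```javascript
// Hypothetical helper: turn one parsed CSV row into a training record.
// `textColumn` and `tagColumns` are placeholder names supplied by the caller;
// the real names come from the CSV header.
function rowToRecord(row, textColumn, tagColumns) {
  // Assumption: a row counts as toxic if any tag column is flagged with 1.
  const flagged = tagColumns.some((col) => Number(row[col]) === 1);
  return {
    input: String(row[textColumn]).trim(),
    output: flagged ? 'toxic' : 'safe',
  };
}

// Two made-up rows, shaped like the output of a typical CSV parser.
const sampleRows = [
  { text: ' You, sir, are my hero. ', tagA: '0', tagB: '0' },
  { text: 'DONT PISS AROUND ON MY WORK', tagA: '1', tagB: '0' },
];

const records = sampleRows.map((row) => rowToRecord(row, 'text', ['tagA', 'tagB']));
console.log(JSON.stringify(records, null, 2));
```

Wrapping the resulting records in a single array and writing them out with `JSON.stringify` should give a file that `JSON.parse` can read back in one call.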
CODE:
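// Load brain.js and the file system module
const brain = require('brain.js');
const fs = require('fs');

// Read and parse the training data from a JSON file
function readJson(datafile) {
  return JSON.parse(fs.readFileSync(datafile, 'utf8'));
}

// Choose the network type
const net = new brain.recurrent.LSTM();
const readdata = readJson('dataset-01.json');

// Train
net.train(readdata, {
  iterations: 1000,
  errorThresh: 0.005,
  log: true,
  logPeriod: 10,
  learningRate: 0.3,
  momentum: 0.1,
  callback: null,
  callbackPeriod: 10,
  timeout: Infinity
});

// Classify a sample comment and print the result
console.log(net.run('I Hate you'));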
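Training a recurrent network on almost 150,000 entries is slow, so one option is to experiment on a small, balanced subset first. The sketch below is a suggestion and not part of the original code; `balancedSample` and `perLabel` are hypothetical names, and the helper simply keeps the first few records it sees for each label.

```javascript
// Hypothetical helper: keep at most `perLabel` records for each output label,
// in the order they appear, so quick experiments see both classes.
function balancedSample(records, perLabel) {
  const counts = new Map();
  const picked = [];
  for (const record of records) {
    const seen = counts.get(record.output) || 0;
    if (seen < perLabel) {
      picked.push(record);
      counts.set(record.output, seen + 1);
    }
  }
  return picked;
}

// Small made-up data set in the same { input, output } shape as above.
const allRecords = [
  { input: 'first safe comment', output: 'safe' },
  { input: 'second safe comment', output: 'safe' },
  { input: 'first toxic comment', output: 'toxic' },
  { input: 'third safe comment', output: 'safe' },
  { input: 'second toxic comment', output: 'toxic' },
];

const subset = balancedSample(allRecords, 1);
console.log(subset);
```

Passing something like `balancedSample(readdata, 500)` to `net.train` instead of the full data set would make each experiment much shorter; the value 500 is arbitrary.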
Problem
Training takes a very long time. That would not matter much to me, since I can save the trained network to JSON or as a function, but the main issue is that instead of returning one of the two outputs, 'safe' or 'toxic', it returns garbage values, for example " /H. Kaks". Any help? I also tried another data format in which the output is an array and each index indicates a data point, but again it does not produce the desired output:
[
{
"input": "D'aww! He matches this background colour I'm seemingly stuck with. Thanks. (talk) 21:51, January 11, 2016 (UTC)",
"output": [
0,
0,
0,
0,
0
]
},
{
"input": "Hey man, I'm really not trying to edit war. It's just that this guy is constantly removing relevant information and talking to me through edits instead of my talk page. He seems to care more about the formatting than the actual info.",
"output": [
0,
0,
0,
0,
0
]
},
{
"input": "Dude, I hate your face",
"output": [
0,
0,
1,
0,
0
]
}
]
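For the array format, the output has to be mapped back to tag names at prediction time. A minimal sketch, assuming the network returns one score per index: the `LABELS` names are placeholders (the real order must match the columns used to build the arrays), and the 0.5 threshold is an arbitrary choice.

```javascript
// Placeholder label names; replace with the real tag names in the same
// order that was used to build the output arrays.
const LABELS = ['label0', 'label1', 'label2', 'label3', 'label4'];

// Return the names of all labels whose score reaches the threshold.
function decodeLabels(scores, labels, threshold = 0.5) {
  if (scores.length !== labels.length) {
    throw new Error('scores and labels must have the same length');
  }
  return labels.filter((_, i) => scores[i] >= threshold);
}

// A comment is treated as toxic if at least one label fires.
function isToxic(scores, threshold = 0.5) {
  return scores.some((score) => score >= threshold);
}

console.log(decodeLabels([0, 0, 1, 0, 0], LABELS));
console.log(isToxic([0, 0, 0, 0, 0]));
```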
Issue Analytics
- Created 4 years ago
- Comments: 17 (1 by maintainers)
Top GitHub Comments
@equan4647 don't be fat, share it with us
Hmm, even after a lot of training I am still getting random output. But now that I have read over the code again, I think I know your issue: the network interprets the data as if you want it to reproduce that text, and so that is exactly what it does. I have to go now so I cannot help directly, but take a look at this GitHub page; it is where I learned how to do what you're doing: https://github.com/bradtraversy/brainjs_examples/blob/master/02_hardware-software.js
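Because the network generates text rather than picking from a fixed set, its raw output can be any string. One defensive step, sketched here as a suggestion, is to accept the result only when it matches a known label; `toLabel` and `KNOWN_LABELS` are hypothetical names, and this check does not fix the training itself.

```javascript
// The two labels used in the training data.
const KNOWN_LABELS = ['safe', 'toxic'];

// Hypothetical helper: normalise the raw network output and return it only
// if it is one of the known labels; otherwise return null.
function toLabel(rawOutput, labels = KNOWN_LABELS) {
  const cleaned = String(rawOutput).trim().toLowerCase();
  return labels.includes(cleaned) ? cleaned : null;
}

console.log(toLabel(' Toxic '));
console.log(toLabel(' /H. Kaks'));
```

With the code from the question this would be used as `toLabel(net.run('I Hate you'))`, and a `null` result could be routed to manual review instead of being trusted.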
If you're curious, here is the output after 25000 iterations:
libe MY WORKs aellatinnone MY WORKs aellatinnone MY WORKs aellatinnone MY WORKs aellatinnone