How can I parse a string of text messages into an object

I want to parse text messages contained in a template string into the following JS object. Every text message is separated by a new line, and the author's name is followed by a colon. The content of a message can also include new lines, square brackets and colons. What is your preferred way of solving this?
let string = `[03.12.21, 16:12:52] John Doe: Two questions:
How are you? And is lunch at 7 fine?
[03.12.21, 16:14:30] Jane Doe: Im fine. 7 sounds good.`;
let data = {
"03.12.21, 16:12:52": {
"author": "John Doe",
"content": "Two questions:\nHow are you? And is lunch at 7 fine?"
},
"03.12.21, 16:14:30": {
"author": "Jane Doe",
"content": "Im fine. 7 sounds good."
},
}; // will be parseString()
function parseString() {
// ?
}

like this?
let string = `[03.12.21, 16:12:52] John Doe: Two questions:
How are you? And is lunch at 7 fine?
[03.12.21, 16:14:30] Jane Doe: Im fine. 7 sounds good.`;
let chunks = string.split(/^\[(\d\d\.\d\d\.\d\d, \d\d\:\d\d\:\d\d)\](.*?):/m);
let data = {};
for (let i = 3; i < chunks.length; i += 3) {
const date = chunks[i - 2];
const author = chunks[i - 1].trim();
const content = chunks[i].trim();
data[date] = {
author,
content,
};
}
console.log(data);

I would split the whole string into smaller pieces, like: let string = date + author + content. With that it's much simpler. Or use something like string.split().
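One way to sketch that idea is to find every "[date] author: " header first and treat whatever sits between two headers as the content. The regex below assumes each message starts on its own line in exactly the format shown in the question; other date formats would need a different pattern.

```javascript
// Sketch: parse "[date] author: content" messages into an object keyed by date.
// Assumes every message starts on its own line with "[dd.mm.yy, hh:mm:ss] ".
function parseString(text) {
  const header = /^\[(\d\d\.\d\d\.\d\d, \d\d:\d\d:\d\d)\] ([^:]+): /gm;
  const headers = [...text.matchAll(header)];
  const data = {};
  headers.forEach((match, i) => {
    const [whole, date, author] = match;
    const start = match.index + whole.length;
    // Content runs until the next header, or to the end of the text.
    const end = i + 1 < headers.length ? headers[i + 1].index : text.length;
    data[date] = { author, content: text.slice(start, end).trim() };
  });
  return data;
}

const sample = `[03.12.21, 16:12:52] John Doe: Two questions:
How are you? And is lunch at 7 fine?
[03.12.21, 16:14:30] Jane Doe: Im fine. 7 sounds good.`;

console.log(parseString(sample));
```

Note that two messages with the same timestamp would overwrite each other, because the date is used as the object key.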

Something like this?
function parseMessage(message) {
var messageObj = {};
var messageArray = message.split(" ");
messageObj.command = messageArray[0];
messageObj.args = messageArray.slice(1);
return messageObj;
}

Related

How to get a substring in JS

I have a search input and I need to bold the matching part of the string in each result.
For example:
input: mac
Search results:
mac book pro 16
iMac 27"
Important macOS tips
I tried to do something like this:
let results = document.querySelectorAll('.search-result-child');
results.forEach(result => {
let resultText = result.children[0].innerText;
let startText = resultText.toLowerCase().indexOf(searchInput.value.toLowerCase());
let matchingWord = resultText.toLowerCase().substring(startText);
let newWord = `${resultText.substring(0, startText)}<b>${matchingWord}</b>${partAfterMatchingWordHere}`;
result.children[0].innerHTML = newWord;
})
But in that case I don't know how to get the end index.
So in the string "mac book pro" the first index needs to be 0 and the last needs to be 2.
If you have a solution for it, or a better way to do that, please help me.

I was able to do it thanks to https://stackoverflow.com/users/12270289/klaycon
Code:
let results = document.querySelectorAll('.search-result-child');
results.forEach(result => {
let resultText = result.children[0].innerText;
let newWord = resultText.toLowerCase().replace(searchInput.value.toLowerCase(), `<b>${searchInput.value}</b>`);
result.children[0].innerHTML = newWord;
})

You can add the length of the search string minus one (searchInput.value.length - 1) to the start index to get the ending index.
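A minimal sketch of that index arithmetic, kept free of DOM calls so it can be tried on its own. The helper name highlightMatch is hypothetical, and the sketch assumes that lowercasing does not change the length of the text, which holds for the examples in the question.

```javascript
// Sketch: wrap the first case-insensitive match of `query` in <b> tags.
// `highlightMatch` is a hypothetical helper name, not from the question.
function highlightMatch(text, query) {
  const start = text.toLowerCase().indexOf(query.toLowerCase());
  if (query === "" || start === -1) return text; // nothing to highlight
  const end = start + query.length; // exclusive end; last matched index is end - 1
  return `${text.slice(0, start)}<b>${text.slice(start, end)}</b>${text.slice(end)}`;
}

console.log(highlightMatch("mac book pro 16", "mac"));
console.log(highlightMatch('iMac 27"', "mac"));
```

Slicing the original text keeps its capitalisation, so "iMac" stays "iMac" instead of being lowercased. If the result is assigned to innerHTML, the text should be escaped first when it can contain user-supplied markup.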

Replacing and trimming messages

I'm trying to lowercase a message, strip characters from it and then trim the result, so that something like
Hello [- the re _- wo rl d.
will turn into a string with no spaces or punctuation like
hellothereworld
I'm using this
let msg = message.content.toLowerCase()
let originalMessage = msg.split(" ");
let removePunctuation = originalMessage.toString().replace(/[.,\/#!$%\^&\*;:{}=\-_`~()]/g,"");
let checkMessage = removePunctuation.toString().trim();
then this to filter the message:
for (let i = 0; i < checkMessage.length; i++) {
if(allowed.includes(checkMessage[i])) {
continue;
}
if(curse.includes(checkMessage[i])) {
const filterViolation = new Discord.RichEmbed()
.setColor('#ff0000')
.setAuthor(message.author.username, message.author.avatarURL)
.setTitle('Filter Violation')
.setDescription(`${rule} **Profanity.** \n${ifcont}\n${ifmistake}`)
.setTimestamp()
.setFooter(copyright);
message.delete()
message.author.send(filterViolation)
message.channel.send(msgvio).then(msg => {msg.delete(15000)})
logger.write(logviolation)
}
//More code
}
Thanks to Tarazed for the help with the filter in general.
But when I type any message that goes against the filter, nothing happens and no errors are thrown in the console. Any ideas on what I did wrong?
New Code:
let checkMessage = message.content.toLowerCase().replace(/[^\w ]/g,"");
console.log(checkMessage);
// let checkMessage = msg.split(" ");
for (let i = 0; i < checkMessage.length; i++) {
if(checkMessage.includes(allowed[i])) {
continue;
}
//--------------------------------------- CURSE
if(checkMessage.includes(curse[i])) {
const filterViolation = new Discord.RichEmbed()
.setColor('#ff0000')
.setAuthor(message.author.username, message.author.avatarURL)
.setTitle('Filter Violation')
.setDescription(`${rule} **Profanity.** \n${ifcont}\n${ifmistake}`)
.setTimestamp()
.setFooter(copyright);
message.delete()
message.author.send(filterViolation)
message.channel.send(msgvio).then(msg => {msg.delete(15000)})
logger.write(logviolation)
}
}
It logs the check message correctly (console.log(checkMessage);), but the message doesn't go through the filter, neither the allowed nor the disallowed branch.
The word violates the filter, but nothing is done.
New Code 2:
if(curse.some(word => checkMessage.includes(word) && !allowed.some(allow => allow.includes(word) && checkMessage.includes(allow)))) {
const filterViolation = new Discord.RichEmbed()
.setColor('#ff0000')
.setAuthor(message.author.username, message.author.avatarURL)
.setTitle('Filter Violation')
.setDescription(`${rule} **Profanity.** \n${ifcont}\n${ifmistake}`)
.setTimestamp()
.setFooter(copyright);
message.delete()
message.author.send(filterViolation)
message.channel.send(msgvio).then(msg => {msg.delete(15000)})
logger.write(logviolation)
return;
}

Your current problem is that you're iterating over literally just a string like "test" and checking if each letter is a curse word.
That is, checkMessage[0] will be 't', checkMessage[1] will be 'e', etc. I'm guessing no single character will match anything in your curse or even allowed array.
You can get rid of the loop entirely and simply check if(curse.includes(checkMessage)), the entire message... but be wary this may easily return false positives. Heaven forbid someone sends a message like "it was a nice pic until i looked closer" and pic until put together triggers a certain c-word in your filter.
Anyway, I'd like to also point out that your code for stripping out spaces and punctuation does some rather strange things. I'll comment what I expect to be happening at each stage. Assume an input message of " Hello - world. "
let originalMessage = msg //" hello - world. "
.split(" "); //["", "hello", "-", "world.", ""]
let removePunctuation = originalMessage
.toString() //",hello,-,world.," (why even split?)
.replace(/[.,\/#!$%\^&\*;:{}=\-_`~()]/g,""); //"helloworld"
let checkMessage = removePunctuation
.toString() //"helloworld" (does nothing ever)
.trim(); //"helloworld" (does nothing ever)
You can achieve the same thing very easily by using the regex class \W which matches all non-word characters:
let checkMessage = message.content.toLowerCase().replace(/\W/g,""); //"helloworld"
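One caveat worth checking: \W treats the underscore as a word character, so it leaves "_" in place, while the original character list removed it. A small sketch, assuming underscores should go too (normalize is a hypothetical helper name):

```javascript
// Sketch: lowercase a message and strip everything except letters and digits.
// \W leaves "_" in place, so the class [\W_] removes it as well.
function normalize(content) {
  return content.toLowerCase().replace(/[\W_]/g, "");
}

console.log(normalize("Hello [- the re _- wo rl d."));
```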
It looks like there are still issues related to the loop. I suggest using Array#some to test whether the message contains any curse words:
let checkMessage = message.content.toLowerCase().replace(/[^\w ]/g,"");
if(curse.some(word => //search through the curse words
checkMessage.includes(word) && //if message has this curse word
!allowed.some(allow => //and there's no allowed word which:
allow.includes(word) && //1. contains this curse word
checkMessage.includes(allow) //2. is in the message
)
)) {
//then send violation
}
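For trying the check outside the bot, it can be wrapped in a function that returns a boolean. This is only a sketch: violatesFilter is a hypothetical name, and the two word lists are harmless placeholders rather than the asker's real lists.

```javascript
// Sketch of the check above as a standalone function.
// The word lists are placeholders, not the asker's real lists.
const curse = ["darn"];
const allowed = ["darning"];

function violatesFilter(content) {
  const checkMessage = content.toLowerCase().replace(/[^\w ]/g, "");
  return curse.some(word =>
    checkMessage.includes(word) &&
    !allowed.some(allow => allow.includes(word) && checkMessage.includes(allow))
  );
}

console.log(violatesFilter("Darn it!"));             // flagged
console.log(violatesFilter("I like darning socks")); // covered by the allowed list
```

One limitation of this logic: a message containing both an allowed word and the bare curse word is not flagged, because the allowed word excuses every occurrence.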

Since checkMessage is a string, your for loop is looping over and testing individual characters.
For example:
let checkMessage = "hellothereworld";
for (let i = 0; i < checkMessage.length; i++) {
console.log(checkMessage[i]);
}
Perhaps you'd rather keep the spaces in your string and operate on individual words:
let originalMessage = "Hello [-the world"
// Use regex that keeps spaces
let removePunctuation = originalMessage.toString().replace(/[^\w ]/g,"");
let checkMessage = removePunctuation.toString().trim();
// Split into words
let messageArray = checkMessage.split(' ');
// Loop over words
for (let i = 0; i < messageArray.length; i++) {
console.log(messageArray[i]);
}

add quotes to each item in a text file

I want to use this list of bad words for my input filtering. It's a plain list right now, but I need to convert it to JSON for my server to use.
I don't want to go through each line and add quotes and a ,. Is there a regex or fast way to add " ", to each line in a txt file?
Such that:
2g1c
2 girls 1 cup
acrotomophilia
alabama hot pocket
alaskan pipeline
Becomes
"2g1c",
"2 girls 1 cup",
"acrotomophilia",
"alabama hot pocket",
"alaskan pipeline",
...

Use backtick `
var txt=`2g1c
2 girls 1 cup
acrotomophilia
alabama hot pocket
alaskan pipeline`;
var arrayUntrimmed = txt.split("\n");
var array=arrayUntrimmed.map(function(a){return a.trim()});
(Note: template literals are an ECMAScript 6 feature, supported from Firefox 34 and Chrome 41.)
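Once the lines are in an array, JSON.stringify can add the quotes and commas, so nothing has to be edited by hand. A minimal sketch, assuming one entry per line and that blank lines should be dropped (linesToJson is a hypothetical name):

```javascript
// Sketch: turn a newline-separated word list into a JSON array string.
function linesToJson(text) {
  const words = text
    .split(/\r?\n/)               // handle both Unix and Windows line endings
    .map(line => line.trim())
    .filter(line => line !== ""); // drop blank lines, e.g. a trailing newline
  return JSON.stringify(words, null, 2);
}

console.log(linesToJson("2g1c\n2 girls 1 cup\nacrotomophilia\n"));
```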

All you have to do is split the string at the new lines and drop the last item in the array (since it's empty).
var txt = '2g1c\n2 girls 1 cup\nacrotomophilia\nalabama hot pocket\nalaskan pipeline\n';
var array = txt.split('\n').slice(0, -1);
console.log(array)
You can then use Array.prototype.some as a predicate method to find out if a given string contains one or more of the blacklisted words.
var txt = '2g1c\n2 girls 1 cup\nacrotomophilia\nalabama hot pocket\nalaskan pipeline\n';
var array = txt.split('\n').slice(0, -1);
var input1 = 'not bad';
var input2 = 'An alaskan pipeline is quite creative...';
var input1HasBadWords = array.some(function (word) {
return input1.indexOf(word) > -1;
});
var input2HasBadWords = array.some(function (word) {
return input2.indexOf(word) > -1;
});
console.log('input1 is: ' + input1HasBadWords);
console.log('input2 is: ' + input2HasBadWords);
Your controller would look something like so:
const fs = require('fs');
app.post('/route', (req, res) => {
fs.readFile('/etc/hosts', 'utf8', (err, data) => {
if (err) {
// Stop here on a read error; otherwise data is undefined below
return res.sendStatus(500);
}
const badWords = data.split('\n').slice(0, -1);
const hasBadWords = badWords.some((word) => {
return req.body.input.indexOf(word) > -1;
});
if(hasBadWords) {
res.send('Dirty mouth? Clean it with orbit!');
} else {
res.send('You are very polite');
}
});
});

http://pastebin.com/U5phzWUM
I guess the easiest way is to use a text editor for this. It took me 30 seconds to do this with Sublime Text.
http://www.sublimetext.com/docs/selection

You can use the readline module: read the file line by line and add quotes to each line.
readline: https://nodejs.org/api/readline.html

Javascript, split a string in 4 pieces, and leave the rest as one big piece

I'm building a Javascript chat bot for something, and I ran into an issue:
I use string.split() to tokenize my input like this:
tokens = message.split(" ");
Now my problem is that I need 4 tokens to make the command, and 1 token to have a message.
when I do this:
!finbot msg testuser 12345 Hello sir, this is a test message
these are the tokens I get:
["!finbot", "msg", "testuser", "12345", "Hello", "sir,", "this", "is", "a", "test", "message"]
However, how can I make it that it will be like this:
["!finbot", "msg", "testuser", "12345", "Hello sir, this is a test message"]
The reason I want it like this is because the first token (token[0]) is the call, the second (token[1]) is the command, the third (token[2]) is the user, the fourth (token[3]) is the password (as it's a password protected message thing... just for fun) and the fifth (token[4]) is the actual message.
Right now, it would just send Hello because I only use the 5th token.
The reason I can't just do message = token[4] + token[5]; etc. is that messages are not always exactly 3 or exactly 4 words long.
I hope I gave enough information for you to help me.
If you guys know the answer (or know a better way to do this) please tell me so.
Thanks!

Use the limit parameter of String.split:
tokens = message.split(" ", 4);
From there, you just need to get the message from the string. Reusing this answer for its nthIndex() function, you can get the index of the 4th occurrence of the space character, and take whatever comes after it.
var message = message.substring(nthIndex(message, ' ', 4) + 1) // + 1 skips the space itself
Or if you need it in your tokens array:
tokens[4] = message.substring(nthIndex(message, ' ', 4) + 1)
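The nthIndex() helper is not shown here, so the following is a hypothetical stand-in that assumes it returns the index of the n-th occurrence of a substring, or -1 if there are fewer than n occurrences.

```javascript
// Hypothetical stand-in for the nthIndex() helper referenced above:
// returns the index of the n-th occurrence of `pattern` in `str`, or -1.
function nthIndex(str, pattern, n) {
  let index = -1;
  for (let i = 0; i < n; i++) {
    index = str.indexOf(pattern, index + 1);
    if (index === -1) return -1;
  }
  return index;
}

const input = "!finbot msg testuser 12345 Hello sir, this is a test message";
const tokens = input.split(" ", 4);
const cut = nthIndex(input, " ", 4);
// + 1 skips the space itself; with fewer than four spaces there is no message.
tokens[4] = cut === -1 ? "" : input.substring(cut + 1);
console.log(tokens);
```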

I would probably start by taking the string like you did, and tokenizing it:
const myInput = string.split(" ");
If you're using JS ES6, you should be able to do something like:
const [call, command, userName, password, ...messageTokens] = myInput;
const message = messageTokens.join(" ");
However, if you don't have access to rest syntax in destructuring, you can do the same like this (it's just much more verbose):
const call = myInput.shift();
const command = myInput.shift();
const userName = myInput.shift();
const password = myInput.shift();
const message = myInput.join(" ");
If you need them as an array again, now you can just join those parts:
const output = [call, command, userName, password, message];

If you can use es6 you can do:
let [c1, c2, c3, c4, ...rest] = input.split (" ");
let msg = rest.join (" ");

You could fall back on a regexp, given that you defined your format as "4 tokens of non-space characters separated by spaces, followed by the message":
function tokenize(msg) {
return (/^(\S+) (\S+) (\S+) (\S+) (.*)$/.exec(msg) || []).slice(1, 6);
}
This has the perhaps unwanted behaviour of returning an empty array if your msg does not actually match the spec. Remove the ... || [] and handle accordingly, if that's not acceptable. The amount of tokens is also fixed to 4 + the required message. For a more generic approach you could:
function tokenizer(msg, nTokens) {
var token = /(\S+)\s*/g, tokens = [], match;
while (nTokens && (match = token.exec(msg))) {
tokens.push(match[1]);
nTokens -= 1; // or nTokens--, whichever is your style
}
if (nTokens) {
// exec() returned null, could not match enough tokens
throw new Error('EOL when reading tokens');
}
tokens.push(msg.slice(token.lastIndex));
return tokens;
}
This uses the global feature of regexp objects in Javascript to test against the same string repeatedly and uses the lastIndex property to slice after the last matched token for the rest.
Given
var msg = '!finbot msg testuser 12345 Hello sir, this is a test message';
then
> tokenizer(msg, 4)
[ '!finbot',
'msg',
'testuser',
'12345',
'Hello sir, this is a test message' ]
> tokenizer(msg, 3)
[ '!finbot',
'msg',
'testuser',
'12345 Hello sir, this is a test message' ]
> tokenizer(msg, 2)
[ '!finbot',
'msg',
'testuser 12345 Hello sir, this is a test message' ]
Note that an empty string will always be appended to returned array, even if the given message string contains only tokens:
> tokenizer('asdf', 1)
[ 'asdf', '' ] // An empty "message" at the end

Select word between two words

How can I create a function that selects everything between the words X and Y and pushes it to an array?
By Greili - 4 Hours and 40 Minutes ago.
#NsShinyGiveaway
0 comments
By ToneBob - 4 Hours and 49 Minutes ago.
#NsShinyGiveaway
0 comments
By hela222 - 5 Hours and 14 Minutes ago.
#NsShinyGiveaway
sure why not? XD
0 comments
By NovaSplitz - 5 Hours and 45 Minutes ago.
#NsShinyGiveaway Enjoy life off PokeHeroes buddy.
0 comments
Given the text above, I want to push each word after "By" and before SPACE onto an array. The result must be something like this:
name[0] = "Greili"
name[1] = "ToneBob"
name[2] = "hela222"

Here's a quick split and reduce:
var arr = str.split("By ").reduce(function(acc, curr) {
curr && acc.push(curr.split(" ")[0]); return acc;
}, []);
Result:
["Greili", "ToneBob", "hela222", "NovaSplitz"]

Try using a regular expression:
var regex = /By ([^\s]+)\s/g;
var s = 'string to search goes here';
var names = [];
var result;
do {
result = regex.exec(s);
if (result) {
names.push(result[1]);
}
} while (result);
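With String.prototype.matchAll the same search fits in one expression. This sketch assumes each author line starts with "By " at the beginning of a line, as in the text from the question; extractNames is a hypothetical name.

```javascript
// Sketch: collect the name that follows "By " at the start of each line.
// Anchoring with ^ and the m flag avoids matching a "By " inside a comment body.
function extractNames(text) {
  return Array.from(text.matchAll(/^By (\S+)/gm), match => match[1]);
}

const comments = `By Greili - 4 Hours and 40 Minutes ago.
#NsShinyGiveaway
0 comments
By ToneBob - 4 Hours and 49 Minutes ago.
#NsShinyGiveaway
0 comments`;

console.log(extractNames(comments));
```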

I see the word you want is always the second word, so that's an easier way of solving the problem. You could split the string on each space, and then you have an array of words, where the word at index 1 is the name you want. Then add each name to a new array.
var words = "By Greili ...".split(" ");
var name = words[1]; // "Greili"
var namesArray = [];
namesArray.push(name);
You'd need to do that for each of your comment strings, in a loop.
