.split() on elements of a sentence string, advanced separator - javascript

I want to be able to split a sentence string into an array of individual word strings.
sentenceArr = 'I take the dog to the park'
sentenceArr.split(' ');
Desired result: ['I', 'take', 'the', 'dog', 'to', 'the', 'park']
This is easy if the words are separated by single spaces, as above, but it can come unstuck if the string contains commas, double spaces, or escape sequences such as \n:
sentenceArr = 'I take,the dog to\nthe park'
sentenceArr.split(' ');
How can I modify the split() separator argument to account for these irregularities?
Ideally, I want to be able to split anywhere there isn't a letter.

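split() also accepts a regular expression as its separator argument:
sentenceArr = 'I take,the dog to\nthe park'
// \W+ matches any run of non-word characters, i.e. anything except [A-Za-z0-9_]
var r = sentenceArr.split(/\W+/);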
console.log(r)
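Since the question asks to split wherever there isn't a letter, here is a minimal sketch of that variant. It assumes "letter" means any Unicode letter (\p{L}, which requires the u flag) and uses a hypothetical helper name, splitIntoWords. filter(Boolean) drops the empty strings that split() produces when the string starts or ends with a separator. Apostrophes count as separators here, so "don't" becomes two words.

```javascript
// Hypothetical helper: split on every run of non-letter characters.
// [^\p{L}]+ matches one or more characters that are not Unicode letters
// (the u flag is required for \p{...} property escapes).
function splitIntoWords(sentence) {
  return sentence
    .split(/[^\p{L}]+/u)
    // Leading or trailing separators leave empty strings; drop them
    .filter(Boolean);
}

console.log(splitIntoWords('I take,the dog to\nthe park'));
// ['I', 'take', 'the', 'dog', 'to', 'the', 'park']
console.log(splitIntoWords('  Hi,  there!  '));
// ['Hi', 'there']
```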

Related

How to use string.includes(substring) to test presence

Suppose there are five words (substrings): cat, dog, elephant, tiger and lion, and a sentence string: "We have one cat in our home". Now I need to find out which of these substrings are present in the string.
How can I use string.includes(substring) to check whether any of the substrings is present in the string and, if so, which one?
You can store the words to find in an array and use Array.filter to filter out the items that aren't included in the string.
const wordsToFind = ['cat', 'dog', 'elephant', 'tiger', 'lion']
const string = "We have one cat in our home";
const includedWords = wordsToFind.filter(e => string.includes(e))
const doesInclude = includedWords.length != 0;
console.log('Does include? ', doesInclude)
console.log('Included words: ', includedWords)
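If you only need the yes/no answer, Array.prototype.some stops at the first hit. Also note that includes() matches substrings, so 'cat' would also be reported for "We concatenate strings". The sketch below adds a whole-word check with \b boundaries; findWholeWords is a hypothetical helper name, and \b only treats [A-Za-z0-9_] as word characters.

```javascript
const wordsToFind = ['cat', 'dog', 'elephant', 'tiger', 'lion'];
const sentence = 'We have one cat in our home';

// some() returns true as soon as one word is found
const doesInclude = wordsToFind.some(word => sentence.includes(word));
console.log('Does include? ', doesInclude); // true

// Hypothetical helper: keep only the words that occur as whole words
function findWholeWords(text, words) {
  return words.filter(word => {
    // Escape regex metacharacters so each word is matched literally
    const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    return new RegExp(`\\b${escaped}\\b`).test(text);
  });
}

console.log(findWholeWords(sentence, wordsToFind)); // ['cat']
console.log(findWholeWords('We concatenate strings', ['cat'])); // []
```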

How to split text by regex not in quotation marks

I am using text.split(' ') to split text by 'space'.
Example:
Hi my name is John
to
['Hi', 'my', 'name', 'is', 'John'];
I would like to ignore spaces inside quotation marks:
Hi pls 'DO NOT SPLIT THIS'
to
['Hi', 'pls', 'DO NOT SPLIT THIS']
How can I do this?
Thank you for any help!
How about the following?
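// Split on whitespace only where an even number of quote characters (' or ")
// follow it up to the end of the string, i.e. where it is outside quotes
const regex = /\s+(?=(?:[^'"]*['"][^'"]*['"])*[^'"]*$)/g;
"Hi pls 'DO NOT SPLIT THIS'".split(regex)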
// [ 'Hi', 'pls', "'DO NOT SPLIT THIS'" ]
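An alternative, swapped in here rather than taken from the answer above, is to match the tokens instead of splitting on the gaps: each token is either a quoted run or a run of non-space characters. This also makes it easy to strip the quotes, which the split() version keeps. splitOutsideQuotes is a hypothetical name; the sketch assumes quotes are balanced and never escaped, and a quote inside a word (as in it's) is treated as part of that word.

```javascript
// Hypothetical helper: tokenize on whitespace, keeping quoted runs intact
function splitOutsideQuotes(text) {
  // Each token is a '...' run, a "..." run, or a run of non-space characters
  const tokens = text.match(/'[^']*'|"[^"]*"|\S+/g) || [];
  // Remove the surrounding quotes from quoted tokens
  return tokens.map(t => (/^(['"]).*\1$/s.test(t) ? t.slice(1, -1) : t));
}

console.log(splitOutsideQuotes("Hi pls 'DO NOT SPLIT THIS'"));
// ['Hi', 'pls', 'DO NOT SPLIT THIS']
```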

How can I split commas and periods from words inside of string using split?

I am trying to replace a specific word in a string with something else. For example, I want to change 'John' in let name = 'Hi, my name is John.'; to 'Jack'.
I know how to split a string by words or characters. I also know how to remove commas, periods, and other symbols from a string. However, if I split the given string with a separator (" "), I get 'John.', which I do not want. (I know I can switch 'John.' with 'Jack.', but assume that I have key-value pairs in an object and I am using the values, which are names: {Father: Jack, Mother: Susan, ...}.)
I don't know how to separate a string word by word including commas and periods.
For example, if I was given an input which is a string:
'Hi, my name is John.'
I want to split the input as below:
['Hi', ',', 'my', 'name', 'is', 'John', '.']
Does anyone know how to do it?
Below is the challenge I am working on.
Create a function censor that accepts no arguments. censor will return a function that will accept either two strings, or one string. When two strings are given, the returned function will hold onto the two strings as a pair, for future use. When one string is given, the returned function will return the same string, except all instances of a first string (of a saved pair) will be replaced with the second string (of a saved pair).
//Your code here
const changeScene = censor();
changeScene('dogs', 'cats');
changeScene('quick', 'slow');
console.log(changeScene('The quick, brown fox jumps over the lazy dogs.')); // should log: 'The slow, brown fox jumps over the lazy cats.'
I think your real question is "How do I replace a substring with another string?"
Check out the replace() method:
let inputString = "Hi, my name is John.";
let switch1 = ["John", "Jack"];
let switched = inputString.replace(switch1[0], switch1[1]);
console.log(switched); // Hi, my name is Jack.
UPDATE: If you want to get ALL occurrences (g), be case insensitive (i), and use boundaries so that it isn't a word within another word (\\b), you can use RegExp:
let inputString = "I'm John, or johnny, but I prefer john.";
let switch1 = ["John", "Jack"];
let re = new RegExp(`\\b${switch1[0]}\\b`, 'gi');
console.log(inputString.replace(re, switch1[1])); // I'm Jack, or johnny, but I prefer Jack.
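For the censor challenge itself, here is one possible sketch (not taken from any of the answers). It keeps the saved pairs in a closure and, to avoid escaping regex metacharacters, replaces every occurrence with split()/join() rather than a RegExp; String.prototype.replaceAll would also work in engines that support it. Unlike the \b version above, this replaces plain substrings, case-sensitively.

```javascript
function censor() {
  // Pairs saved by earlier two-argument calls, kept in this closure
  const pairs = [];

  return function (first, second) {
    if (second !== undefined) {
      // Two strings: remember the pair for later calls
      pairs.push([first, second]);
      return undefined;
    }
    // One string: apply every saved pair, replacing all occurrences
    let result = first;
    for (const [from, to] of pairs) {
      result = result.split(from).join(to);
    }
    return result;
  };
}

const changeScene = censor();
changeScene('dogs', 'cats');
changeScene('quick', 'slow');
console.log(changeScene('The quick, brown fox jumps over the lazy dogs.'));
// The slow, brown fox jumps over the lazy cats.
```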
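You can try this:
var string = 'Hi, my name is John.';
//var arr = string.split(/,|\.| /);
// The capture group keeps each separator in the result; filter() then drops
// the empty strings and bare spaces that the split leaves behind
var arr = string.split(/([,.\s])/).filter(s => s.trim() !== '');
console.log(arr); // ['Hi', ',', 'my', 'name', 'is', 'John', '.']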
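Using 'Hi, my name is John.'.split(/[,. ]/); splits on commas, periods and spaces. Note that it discards the punctuation itself and leaves empty strings where two separators are adjacent or one ends the string: ['Hi', '', 'my', 'name', 'is', 'John', ''].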
Edit: For those who want to keep the comma and period, here is my wildly inefficient method.
var str = 'Hi, my name is John.'
str = str.replace('.', 'period');
str = str.replace(',', 'comma');
str = str.split(/[,. ]/);
for (var i = 0; i < str.length; i++) {
  if (str[i].indexOf('period') > -1) {
    str[i] = str[i].replace('period', '');
    str.splice(i+1, 0, ".");
  } else if (str[i].indexOf('comma') > -1) {
    str[i] = str[i].replace('comma', '');
    str.splice(i+1, 0, ",");
  }
}
console.log(str);
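A shorter route to the ['Hi', ',', 'my', 'name', 'is', 'John', '.'] array the question asks for, swapped in here instead of split(), is to match the tokens directly with String.prototype.match: each token is either a run of word characters or a single punctuation character. tokenize is a hypothetical name, and \w only covers [A-Za-z0-9_], so accented letters would be split off as punctuation.

```javascript
// Hypothetical helper: words and punctuation marks as separate tokens
function tokenize(text) {
  // \w+ = a run of word characters; [^\w\s] = one character that is
  // neither a word character nor whitespace (comma, period, ...)
  return text.match(/\w+|[^\w\s]/g) || [];
}

console.log(tokenize('Hi, my name is John.'));
// ['Hi', ',', 'my', 'name', 'is', 'John', '.']
```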

Why is this regex matching also words within a non-capturing group?

I have this string (notice the multi-line syntax):
var str = ` Number One: Get this
Number Two: And this`;
And I want a regex that returns (with match):
[str, 'Get this', 'And this']
So I tried str.match(/Number (?:One|Two): (.*)/g);, but that's returning:
["Number One: Get this", "Number Two: And this"]
There can be any whitespace/line-breaks before any "Number" word.
Why doesn't it return only what is inside the capturing group? Am I misunderstanding something? And how can I achieve the desired result?
Per the MDN documentation for String.match:
If the regular expression includes the g flag, the method returns an Array containing all matched substrings rather than match objects. Captured groups are not returned. If there were no matches, the method returns null.
(emphasis mine).
So what you want is not possible with match() and the g flag.
The same page adds:
if you want to obtain capture groups and the global flag is set, you need to use RegExp.exec() instead.
So if you're willing to give up on using match, you can write your own function that repeatedly applies the regex with exec(), collects the captured substrings, and builds an array.
Or, for your specific case, you could write something like this:
var these = str.split(/(?:^|\n)\s*Number (?:One|Two): /);
these[0] = str;
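Engines that support ES2020 also provide String.prototype.matchAll, which returns every match object of a g regex, captures included. Below is a sketch producing the requested [str, 'Get this', 'And this'], plus an exec() loop (the approach the MDN quote points to) for older engines; getCaptures is a hypothetical name. Both rely on . not matching line breaks, so (.*) stops at the end of each line.

```javascript
const str = ` Number One: Get this
 Number Two: And this`;
const re = /Number (?:One|Two): (.*)/g;

// matchAll yields full match objects, so m[1] is the captured group
const viaMatchAll = [str, ...Array.from(str.matchAll(re), m => m[1])];
console.log(viaMatchAll); // the input string, then 'Get this', 'And this'

// Hypothetical helper: the same result with a RegExp.exec() loop
function getCaptures(text, regex) {
  const result = [text];
  regex.lastIndex = 0; // a g regex remembers where its last search stopped
  let m;
  while ((m = regex.exec(text)) !== null) {
    result.push(m[1]);
  }
  return result;
}

console.log(getCaptures(str, re));
```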
Replace and store the result in a new string, like this:
var str = ` Number One: Get this
Number Two: And this`;
var output = str.replace(/Number (?:One|Two): (.*)/g, "$1");
console.log(output);
which outputs:
Get this
And this
If you want the match array like you requested, you can try this:
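var getMatch = function(string, split, regex) {
  // Replace each full match with its first capture followed by the marker
  var match = string.replace(regex, "$1" + split);
  match = match.split(split);
  // Put the original string first (same effect as reverse/push/reverse)
  match.unshift(string);
  // Drop the leftover text after the last marker
  match.pop();
  return match;
};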
var str = ` Number One: Get this
Number Two: And this`;
var regex = /Number (?:One|Two): (.*)/g;
var match = getMatch(str, "#!SPLIT!#", regex);
console.log(match);
which displays the array as desired:
[ ' Number One: Get this\n Number Two: And this',
' Get this',
'\n And this' ]
Here split (#!SPLIT!#) must be a marker string that never occurs in the input, since it is used to separate the matches. Note that this only works with a single capturing group. For multiple groups, add a parameter giving the number of groups and use a for loop to build the replacement "$1 $2 $3 $4 ..." + split.
Try
var str = " Number One: Get this\n Number Two: And this";
// `/\w+\s+\w+(?=\s|$)/g` matches one or more word characters,
// followed by one or more whitespace characters,
// followed by one or more word characters,
// provided the next character is whitespace or the end of input; the `g` flag
// returns every such match, so `res` is ["Get this", "And this"]
var res = str.match(/\w+\s+\w+(?=\s|$)/g);
console.log(JSON.stringify(res));

How to Split string with multiple rules in javascript

I have this string for example:
str = "my name is john#doe oh.yeh";
the end result I am seeking is this Array:
strArr = ['my','name','is','john','&#doe','oh','&yeh'];
which means 2 rules apply:
split after each space " " (I know how)
if there are special characters ("." or "#"), also split there, but add the character "&" before the word that has the special character.
I know I can use strArr = str.split(" ") for the first rule, but how do I do the other part?
thanks,
Alon
Assuming the result should be '&doe' and not '&#doe', a simple solution would be to replace every . and # with ' &' and then split on whitespace:
strArr = str.replace(/[.#]/g, ' &').split(/\s+/)
/\s+/ matches consecutive white spaces instead of just one.
If the result should be '&#doe' and '&.yeh', use the same regex with a capture group:
strArr = str.replace(/([.#])/g, ' &$1').split(/\s+/)
You have to use a regular expression to match all special characters at once. By "special", I assume you mean "not a letter".
var pattern = /([^ a-z]?)[a-z]+/gi; // Pattern
var str = "my name is john#doe oh.yeh"; // Input string
var strArr = [], match; // output array, temporary var
while ((match = pattern.exec(str)) !== null) { // <-- For each match
strArr.push( (match[1]?'&':'') + match[0]); // <-- Add to array
}
// strArr is now:
// strArr = ['my', 'name', 'is', 'john', '&#doe', 'oh', '&.yeh']
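It does not match consecutive special characters; the pattern has to be modified for that. For example, to include all consecutive special characters, use ([^ a-z]*).
It also ignores a trailing special character that has no letters after it. To include one, you could use [a-z]* instead of [a-z]+, but the pattern can then match an empty string, and since exec() does not advance lastIndex past an empty match, the loop would never end unless you skip empty matches and increment pattern.lastIndex yourself.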
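For reference, here is a sketch that also handles consecutive special characters by using ([^ a-z]*), written with matchAll (ES2020) instead of the exec() loop. markSpecialWords is a hypothetical name, and "special" still means any character that is neither a space nor a letter, so digits count as special too.

```javascript
// Hypothetical helper: prefix '&' to words preceded by special characters
function markSpecialWords(str) {
  // ([^ a-z]*) captures a run of non-space, non-letter characters (the i flag
  // makes a-z cover A-Z too); [a-z]+ is the word itself
  const pattern = /([^ a-z]*)[a-z]+/gi;
  return Array.from(str.matchAll(pattern), m => (m[1] ? '&' : '') + m[0]);
}

console.log(markSpecialWords('my name is john#doe oh.yeh'));
// ['my', 'name', 'is', 'john', '&#doe', 'oh', '&.yeh']
console.log(markSpecialWords('a##b'));
// ['a', '&##b']
```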
Use the split() method; that's what you need:
http://www.w3schools.com/jsref/jsref_split.asp
OK, I see you found it. I think you could:
1) first use split on the whitespace
2) iterate through your array and split the members again where you find # or .
3) iterate through your array again and apply str.replace("#", "&#") or str.replace(".", "&.") to the members that contain those characters
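The three steps above could look roughly like this. splitWithMarkers is a hypothetical name, and instead of replace() the sketch splits each word right before every # or . with a lookahead (so the character is kept), then prefixes those pieces with &. Array.prototype.flatMap needs ES2019.

```javascript
// Hypothetical helper implementing the three steps
function splitWithMarkers(str) {
  // Step 1: split on whitespace, ignoring leading/trailing blanks
  const words = str.split(/\s+/).filter(Boolean);
  // Step 2: split each word again just before '#' or '.', keeping the character
  const pieces = words.flatMap(w => w.split(/(?=[#.])/));
  // Step 3: put '&' in front of pieces that start with '#' or '.'
  return pieces.map(p => (p[0] === '#' || p[0] === '.' ? '&' + p : p));
}

console.log(splitWithMarkers('my name is john#doe oh.yeh'));
// ['my', 'name', 'is', 'john', '&#doe', 'oh', '&.yeh']
```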
I would think a combination of split() and replace() is what you are looking for:
str = "my name is john#doe oh.yeh";
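// '\W' as a string is just "W", and a string pattern is replaced literally and
// only once, so a global regex is needed; [^\w\s] matches # and . but not spaces
strArr = str.replace(/[^\w\s]/g, ' &');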
strArr = strArr.split(' ');
That should be close to what you asked for.
This works:
array = string.replace(/#|\./g, ' &$&').split(' ');
Take a look at demo here: http://jsfiddle.net/M6fQ7/1/
