Splitting text into an array in JavaScript

I want to split a text at every , or . in JavaScript.
My text is like this:
The cats climbed the tall tree.In this sentence, $U_SEL{} is a noun.
I want an array like this:
1.The cats climbed the tall tree.
2.In this sentence
3.$U_SEL{}
4.is a noun

Try this:
<script type="text/javascript">
var text = "The cats climbed the tall tree.In this sentence, $U_SEL{}, is a noun";
var spliteds = text.split(/[\.,]/);
alert(spliteds[0]);
alert(spliteds[1]);
alert(spliteds[2]);
alert(spliteds[3]);
</script>

The regular expression for this task would be:
var text = "The cats climbed the tall tree.In this sentence, $U_SEL{} is a noun."
var regex = /[.,]/;
text.split(regex);
For more information about regular expressions, see:
https://developer.mozilla.org/en/docs/Web/JavaScript/Guide/Regular_Expressions

Here is the regex. To also split after {}, first replace {} with {}, (or {}.), then call split.
var str = "The cats climbed the tall tree.In this sentence, $U_SEL{} is a noun";
str = str.replace("{}", "{},");
//Array with the split values
var result = str.split(/[,.]/g);
//Printing the result array
console.log(result);

'The cats climbed the tall tree.In this sentence, $U_SEL{} is a noun.'.split(/[\.,]/)
would return:
Array [ "The cats climbed the tall tree", "In this sentence", " $U_SEL{} is a noun", "" ]
Take a look at String.prototype.split()
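For what it's worth, the leading spaces and the empty last element in that result can be cleaned up after the split. A minimal sketch (the helper name splitOnPunctuation is made up for this example; it keeps $U_SEL{} attached to the text that follows it, just like the plain split does):

```javascript
// Hypothetical helper: split on "." or ",", trim each piece, drop empty pieces.
function splitOnPunctuation(text) {
  return text
    .split(/[.,]/) // split at every dot or comma
    .map(function (part) { return part.trim(); })
    .filter(function (part) { return part.length > 0; });
}

var parts = splitOnPunctuation("The cats climbed the tall tree.In this sentence, $U_SEL{} is a noun.");
console.log(parts);
// ["The cats climbed the tall tree", "In this sentence", "$U_SEL{} is a noun"]
```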

A regular expression is your best option in this case, and the other answers already cover that approach correctly. I am just leaving another approach here that gives you what you are after even if you have no idea how regular expressions work.
Keep in mind, though, that a RegExp is the better choice in your scenario. The code below is mostly to show how it can be done without one (not to mention that it gets chaotic as you add more delimiters).
var myString = "The cats climbed the tall tree.In this sentence, $U_SEL{} , is a noun";
var mySplitString = myString.split(",");
var myFinalArray = new Array();
mySplitString.forEach(function(part){
var myTemp = part.split(".");
myTemp.forEach(function(key){
myFinalArray.push(key);
});
});
console.log(myFinalArray);

Maybe split is not the right tool here, because there is no delimiter character after the third element ($U_SEL{}).
Capturing rather than splitting may work better (though I don't know whether it is wise from a performance point of view).
You could try this:
var pattern = /(([^.,]+?)([.,]|\{\})) */g;
var captures = [];
var match; // declared here so the loops below do not create an implicit global
var s = 'First capture.Second capture, $THIRD_CAPTURE{} fourth capture.';
while ( (match = pattern.exec(s)) != null ) {
if (match[3] == "." || match[3] == ",") {
captures.push(match[2]);
} else {
captures.push(match[1]);
}
}
console.log(captures);
var captures = [];
var s = 'The cats climbed the tall tree.In this sentence, $U_SEL{} is a noun.';
while ( (match = pattern.exec(s)) != null ) {
if (match[3] == "." || match[3] == ",") {
captures.push(match[2]);
} else {
captures.push(match[1]);
}
}
console.log(captures);
The principle is as follows.
Capture blocks that are either a part of the sentence ending in a dot or a comma (with no inner dot or comma) or a part ending in an empty pair of braces.
Within each block, capture both the content and the ending (a dot, a comma or an empty pair of braces).
For each resulting match, you have three captures:
At index 1, the whole block (content plus ending)
At index 2, the content without the ending
At index 3, the ending
Then, depending on the ending, either capture 1 or capture 2 is stored.
You could modify the selection in the loop to get exactly what you want, with the dot kept on the first capture and not on the last one, unless that is a typo.
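The same capture idea can also be written with String.prototype.matchAll, which avoids managing the exec loop by hand. This is only a sketch: captureParts is a made-up name, the pattern is a slightly simplified variant of the one above with two groups instead of three, and matchAll needs a reasonably recent engine (ES2020).

```javascript
// Hypothetical helper: capture each chunk together with its ending.
// Group 1 is the content, group 2 is the ending (".", "," or "{}").
function captureParts(s) {
  const pattern = /([^.,]+?)([.,]|\{\}) */g;
  const parts = [];
  for (const m of s.matchAll(pattern)) {
    const content = m[1];
    const ending = m[2];
    // Keep "{}" as part of the token, drop "." and ",".
    parts.push(ending === "{}" ? content + ending : content);
  }
  return parts;
}

console.log(captureParts("The cats climbed the tall tree.In this sentence, $U_SEL{} is a noun."));
// ["The cats climbed the tall tree", "In this sentence", "$U_SEL{}", "is a noun"]
```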

Related

Checking the presence of multiple words in a variable using JavaScript

The code below checks for the presence of a single word in a sentence, and it works fine.
var str ="My best food is beans and plantain. Yam is also good but I prefer yam porrage"
if(str.match(/(^|\W)food($|\W)/)) {
alert('Word Match');
//alert(' The matched word is' +matched_word);
}else {
alert('Word not found');
}
Here is my issue: I need to check for the presence of multiple words in a sentence (e.g. food, beans, plantain) and then also alert the matched word,
something like //alert(' The matched word is' +matched_word);
I guess I have to pass the words to search for in an array, as below:
var words_checked = ["food", "beans", "plantain"];
You can construct a regular expression by joining the array of words by |, then surround it with word boundaries \b:
var words_checked = ['foo', 'bar', 'baz']
const pattern = new RegExp(String.raw`\b(?:${words_checked.join('|')})\b`);
var str = 'fooNotAStandaloneWord baz something';
console.log('Match:', str.match(pattern)[0]);
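Note that str.match(pattern) returns null when nothing matches, and that words containing regex metacharacters would need escaping. A small sketch that handles both and reports every matched word; findMatchedWords and escapeRegExp are names made up for this example:

```javascript
// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: return the distinct whole words from `words` found in `str`.
function findMatchedWords(str, words) {
  if (words.length === 0) return [];
  const pattern = new RegExp("\\b(?:" + words.map(escapeRegExp).join("|") + ")\\b", "g");
  const matches = str.match(pattern) || []; // match() returns null when nothing is found
  return Array.from(new Set(matches));      // remove duplicates, keep first-seen order
}

const sentence = "My best food is beans and plantain. Yam is also good but I prefer yam porrage";
console.log(findMatchedWords(sentence, ["food", "beans", "plantain", "potato"]));
// ["food", "beans", "plantain"]
```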
Here's a way to solve this. Simply loop through the list of words to check, build the regex as you go, and check whether there is a match. You can read up on how to build RegExp objects here
var str = "My best food is beans and plantain. Yam is also good but I prefer yam porrage";
var words = [
"food",
"beans",
"plantain",
"potato"
]
for (let word of words) {
let regex = new RegExp(`(^|\\W)${word}($|\\W)`)
if (str.match(regex)) {
console.log(`The matched word is ${word}`);
} else {
console.log('Word not found');
}
}
var text = "I am happy, We all are happy";
var count = countOccurences(text, "happy");
// count is 2
// Pass the whole line and the word whose number of occurrences you want to count.
// The function splits the string on the word; the number of pieces minus one is the count.
function countOccurences(string, word){
return string.split(word).length - 1;
}

Search mechanism to include words as a whole [duplicate]

I have created a search mechanism that searches through an array of strings for an exact string match, however I want it to be a bit more intuitive.
I can also get it to search for a string within a string (for example, chicken in grilled chicken). However, the issue is that this allows users to type ken or ill and still get grilled chicken back.
I would like it to return grilled chicken only when I type chicken or grilled.
Does anyone have any suggestions on how to have a more intuitive search mechanism?
EDIT:
The correct answer below worked when typing one word: it searches all the individual words in a string. However, I realised it fails when you search with two words (as it only checks each word of the string individually).
I solved this by adding || search == string to the if, so that it matches not just individual words but whole strings too.
However, I still have an issue, because it searches for either:
Whole string matches
OR
Matches with individual words.
This means it fails when search = green cup and string = big green cup. Is there a way to solve this by building a collection of word groups to search within? Perhaps something similar to:
string.split(' ') but with big green and green cup also added to the array?
Try this simple code without regex:
var data = ["first string1 is here", "second string2 is here", "third string3 is here"];
var wordToSearch = "string2 is thanks";
var broken = wordToSearch.split(' ');
var result = 'not found';
if(broken.length == 1){
data.forEach(function(d){
var d1 = d.split(' ');
if(d1.includes(wordToSearch))
result = d;
});
}
else if(broken.length == 2)
{
data.forEach(function(d){
var d1 = d.split(' ');
if(d1.includes(broken[0]) && d1.includes(broken[1]))
{
result = d;
}
});
}
alert(result);
I'd use RegExp with word boundary anchor - \b.
function search(query, arr) {
var res = [];
var re = new RegExp('\\b' + query + '\\b');
arr.forEach(function (item) {
if (re.test(item)) res.push(item);
});
return res;
}
It sounds like you only want to search by whole words. If that's the case, you could split the string on the space character and then search the resulting array for matches.
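The splitting idea can be stretched to multi-word queries such as green cup against big green cup by comparing runs of consecutive words. A rough sketch, assuming words are separated by whitespace and the comparison is case-sensitive; toWords, containsPhrase and searchWholeWords are names invented for this example:

```javascript
// Split a string into its whitespace-separated words (empty input gives []).
function toWords(s) {
  return s.trim().split(/\s+/).filter(Boolean);
}

// True if queryWords occurs in itemWords as a run of consecutive whole words.
function containsPhrase(itemWords, queryWords) {
  if (queryWords.length === 0) return false;
  for (let i = 0; i + queryWords.length <= itemWords.length; i++) {
    if (queryWords.every((word, j) => itemWords[i + j] === word)) return true;
  }
  return false;
}

// Return every item that contains the query as whole, consecutive words.
function searchWholeWords(query, items) {
  const queryWords = toWords(query);
  return items.filter(item => containsPhrase(toWords(item), queryWords));
}

const menu = ["big green cup", "grilled chicken", "greenish cup"];
console.log(searchWholeWords("green cup", menu)); // ["big green cup"]
console.log(searchWholeWords("chicken", menu));   // ["grilled chicken"]
console.log(searchWholeWords("ken", menu));       // []
```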

Javascript regular expression for matching whole words including special characters

I am trying to match whole exact words using a javascript regular expression.
Given the strings: 1) "I know C++." and 2) "I know Java."
I have tried using new RegExp('\\b' + text + '\\b', 'gi'), and that works great for words without special characters, like example #2.
I've also taken a look at this url:
Regular expression for matching exact word affect the special character matching
and implemented the:
escaped = escaped.replace(/^(\w)/, "\\b$1");
escaped = escaped.replace(/(\w)$/, "$1\\b");
and that will match text = 'C++' (it will match both examples).
However, if someone makes a typo and the string is "I know C++too.", the latter regex will still match the C++, which I don't want, because the word "C++too" is not an exact match for text = 'C++'.
What changes can I make so that it does not match unless C++ is both the start and the end of the word?
You can add a range of accepted characters ([+#]) after the word characters:
var str = 'I know C++too. I know Java and C#.';
console.log(str.match(/(\w[+#]+|\w+)/g));
NB: \w[+#]+ must be placed first in the alternation expression to take precedence over the more generic \w+.
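Another option, sketched here under assumptions, is to escape the search text and replace \b with lookarounds that treat + and # as part of a word. The set [\w+#] and the helper names are my own choices for illustration, and the lookbehind needs an engine with ES2018 support:

```javascript
// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: true if `word` appears in `sentence` and is not glued
// to another word character, "+" or "#" on either side.
function containsWholeWord(sentence, word) {
  var pattern = new RegExp("(?<![\\w+#])" + escapeRegExp(word) + "(?![\\w+#])", "i");
  return pattern.test(sentence);
}

console.log(containsWholeWord("I know C++.", "C++"));    // true
console.log(containsWholeWord("I know C++too.", "C++")); // false
console.log(containsWholeWord("I know Java.", "Java"));  // true
```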
If whole words including special characters means everything but [\r\n\t\f\v ], you can simply do:
const REGEX = /([^\s]+)+/g;
function selectWords(string) {
const REGEX = /([^\s]+)+/g;
return string
// remove punctuation
.replace(/[^a-z0-9\s+#]/ig, "")
// perform the match
.match(REGEX)
// prevent null returns
|| []
;
}
var text = "Hello World"
var [first, second, ...rest] = selectWords(text);
console.log(1, {first, second, rest});
// example with punctuation
var text = "I can come today, she said, but not tomorrow."
var [first, second, third, ...rest] = selectWords(text);
console.log(2, {first, second, third, rest});
// example with possible throw
var text = ",.'\"` \r"
var [first, second, third, ...rest] = selectWords(text);
console.log(3, {first, second, third, rest});
// example with a specific word to be matched
function selectSpecificWord(string, ...words) {
return selectWords(string)
.filter(word => ~words.indexOf(word))
;
}
var expected = "C++";
var test = "I know C++";
var test1 = "I know C++AndJava";
console.log("Test Case 1", selectSpecificWord(test, expected));
console.log("Test Case 2", selectSpecificWord(test1, expected));
Use this ((?:(?:\w)+?)(?=\b|\w[-+]{2,2})(?:[-+]{2,2})?)
I've included the - symbol as an example as well. See it live.

Splitting a string by white space and a period when not surrounded by quotes

I know that similar questions have been asked many times, but my regular expression knowledge is pretty bad and I can't get it to work for my case.
So here is what I am trying to do:
I have a text and I want to separate the sentences. Each sentence ends with some white space and a period (there can be one or many spaces before the period, but there is always at least one).
At the beginning I used /\s+\./ and it worked great for separating the sentences, but then I noticed that there are cases such as this one:
"some text . some text".
Now, I don't want to separate the text in quotes. I searched and found a lot of solutions that work great for spaces (for example: /(".*?"|[^"\s]+)+(?=\s*|\s*$)/), but I was not able to modify them to separate by white space and a period.
Here is the code that I am using at the moment.
var regex = /\s+\./;
var result = regex.exec(fullText);
if(result == null) {
break;
}
var length = result[0].length;
var startingPoint = result.index;
var currentSentence = fullText.substring(0,startingPoint).trim();
fullText = fullText.substring(startingPoint+length);
I am separating the sentences one by one and removing them from the full text.
The length var represents the size of the portion that needs to be removed, and startingPoint is the position at which that portion starts. The code is part of a larger while loop.
Instead of splitting you may try and match all sentences between delimiters. In this case it will be easier to skip delimiters in quotes. The respective regex is:
(.*?(?:".*?".*?)?|.*?)(?: \.|$)
Demo: https://regex101.com/r/iS9fN6/1
The sentences then may be retrieved in this loop:
while (match = regex.exec(input)) {
console.log(match[1]); // each next sentence is in match[1]
}
BUT! This particular expression makes the loop run forever: the pattern can match the empty string at the end of the input, and regex.exec(input) does not advance lastIndex after a zero-length match, so it keeps returning a truthy result. (Looks like a candidate for one more SO question.)
So I can only suggest a workaround: remove the $ from the expression. The regex will then miss the last part, which can be extracted afterwards as the trailer the regex did not match:
var input = "some text . some text . \"some text . some text\" some text . some text";
//var regex = /(.*?(?:".*?".*?)?|.*?)(?: \.|$)/g;
var regex = /(.*?(?:".*?".*?)?|.*?) \./g;
var trailerPos = 0;
while (match = regex.exec(input)) {
console.log(match[1]); // each next sentence is in match[1]
trailerPos = match.index + match[0].length;
}
if (trailerPos < input.length) {
console.log(input.substring(trailerPos)); // the last sentence in
// input.substring(trailerPos)
}
Update:
If sentences span multiple lines, the regex won't work, since the . pattern does not match the newline character. In this case just use [\s\S] instead of .:
var input = "some \ntext . some text . \"some\n text . some text\" some text . so\nm\ne text";
var regex = /([\s\S]*?(?:"[\s\S]*?"[\s\S]*?)?|[\s\S]*?) \./g;
var trailerPos = 0;
var sentences = []
while (match = regex.exec(input)) {
sentences.push(match[1]);
trailerPos = match.index + match[0].length;
}
if (trailerPos < input.length) {
sentences.push(input.substring(trailerPos));
}
sentences.forEach(function(s) {
console.log("Sentence: -->%s<--", s);
});
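If the regex gets too fiddly, a plain character scan is another way to get the same split. This is an alternative technique, not the regex above: it toggles a flag on every double quote and cuts at a period that follows whitespace while outside quotes. splitSentences is a made-up name, and the sketch assumes quotes are balanced and never escaped:

```javascript
// Hypothetical helper: split on "whitespace + period", ignoring periods inside double quotes.
function splitSentences(input) {
  const sentences = [];
  let inQuotes = false;
  let start = 0; // index where the current sentence begins
  for (let i = 0; i < input.length; i++) {
    const ch = input[i];
    if (ch === '"') {
      inQuotes = !inQuotes; // entering or leaving a quoted section
    } else if (ch === "." && !inQuotes && i > 0 && /\s/.test(input[i - 1])) {
      sentences.push(input.slice(start, i).trim());
      start = i + 1; // continue after the period
    }
  }
  // Whatever is left after the last delimiter is the final sentence.
  const trailer = input.slice(start).trim();
  if (trailer.length > 0) sentences.push(trailer);
  return sentences;
}

console.log(splitSentences('some text . some text . "some text . some text" some text . some text'));
// ["some text", "some text", "\"some text . some text\" some text", "some text"]
```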
Use the encode and decode of javascript while sending and receiving.

Parse string with JS to pull out words between several special characters

I have a string like:
lorem ipsum !bang #hash #hash2 ^caret word @at sym
I am trying to pull out the words beginning with the various characters - sometimes they can have a space in them, and there can be multiple of each type. So I want to convert this string to a set of values such as:
text: "lorem ipsum"
!: "bang"
#: ["hash", "hash2"]
^: "caret word"
@: "at sym"
It always starts with just the plain string. So far, I've tried a few things, but can't quite get it. For example, I tried splitting the string into an array of words, which kind of works except for spaces:
var list = str.split(' ');
var hashes=[];
$.each(list,function(i,val){
if(list[i].indexOf('#') == 0){
hashes.push(list[i].substring(1)); // store the word without its leading #
}
});
//Result: [hash, hash2]
So this works, but obviously breaks as some can have spaces. It also means duplicating that code for each of the ^ # ! @.
I also tried:
str.substring(str.lastIndexOf("^")+1,str.lastIndexOf("@"));
Which does return caret word with the space correctly, but fails to take account of all the characters. I tried replacing "@" with an array, but no luck.
Any ideas on how I can achieve this?
Many thanks
Edit: Just a note that the various parts can be in any order, as could the amount of them, and they don't all have to be present. So could be ab !a #b #c #d ^f b a #a or cd ^x y !a
Please have a look at the JS code below. I hope it helps you achieve your goal.
var inputString = "lorem ipsum !bang #hash #hash2 ^caret word @at sym";
var markerPattern = /[!#^@]/;
// Everything before the first marker is the plain text.
var firstSpecialCharIndex = inputString.search(markerPattern);
if (firstSpecialCharIndex == -1)
firstSpecialCharIndex = inputString.length;
var plainText = inputString.substring(0, firstSpecialCharIndex).trim();
var result = {};
result["text"] = plainText;
// i always points at the first character after a marker.
for (var i = firstSpecialCharIndex + 1; i < inputString.length;) {
var modifiedString = inputString.substring(i);
var currentChar = inputString.charAt(i - 1);
if (result[currentChar] == null)
result[currentChar] = [];
var text = "";
var specialCharIndex = modifiedString.search(markerPattern);
if (specialCharIndex != -1) {
// Take the text up to the next marker, then jump past that marker.
text = modifiedString.substring(0, specialCharIndex);
text = text.trim();
result[currentChar].push(text);
i += specialCharIndex + 1;
} else {
// No further marker: the rest of the string belongs to the current one.
text = modifiedString.substring(0);
text = text.trim();
result[currentChar].push(text);
i = inputString.length;
}
}
console.log(result);
Will the various types of data (no special character, then !, then #, then ^, then @) always be in the same order?
lorem ipsum !bang #hash #hash2 ^caret word @at sym
If so, you could write an extractor function for each type of data that uses the bounding characters to find its section. For example, to get bang, find the position of !, then the position of #. You then know the boundaries of the data, so you can extract it and clean it up to get the words. For the # section, go from the position of the first # to the position of ^, extract the section and clean it up to get the words, and so on.
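If the parts can come in any order, a more compact alternative is to let a regex pick out each marker together with the text that follows it. A sketch under assumptions: parseTagged is a made-up name, the markers are taken to be ! # ^ and @ (the at sym in the sample suggests the last one), and matchAll needs ES2020:

```javascript
// Hypothetical helper: group the text following each marker by that marker.
function parseTagged(input) {
  const firstMarker = input.search(/[!#^@]/);
  const result = {
    // Plain text is everything before the first marker (or the whole input).
    text: (firstMarker === -1 ? input : input.slice(0, firstMarker)).trim()
  };
  // Each match is one marker followed by everything up to the next marker.
  for (const [, marker, body] of input.matchAll(/([!#^@])([^!#^@]*)/g)) {
    if (!result[marker]) result[marker] = [];
    result[marker].push(body.trim());
  }
  return result;
}

console.log(parseTagged("lorem ipsum !bang #hash #hash2 ^caret word @at sym"));
// { text: "lorem ipsum", "!": ["bang"], "#": ["hash", "hash2"], "^": ["caret word"], "@": ["at sym"] }
```

Every marker maps to an array here, even when it occurs only once, which keeps the shape of the result predictable.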
