Split string into lines and sentences, but ignoring abbreviations


There is some string content, which I have to split. First of all I need to split the string content into lines.
This is how I do that:
str.split('\n').forEach((item) => {
if (item) {
// TODO: split also each line into sentences
let data = {
type : 'item',
content: [{
content : item,
timestamp: Math.floor(Date.now() / 1000)
}]
};
// Save `data` to DB
}
});
Now I also need to split each line into sentences, and I am not sure how to do that correctly. My idea is to split each line on ". " (a dot followed by a space).
However, there is also an array of abbreviations that must NOT split the line:
const abbr = ['vs.', 'min.', 'max.']; // Just an example; there are 70 abbreviations in that array
... and there are a few more rules:
Any number followed by a dot, or a single letter followed by a dot, must not split the line either: 1., 2., 30., A., b.
Matching should be case-insensitive: Max. Lorem ipsum should not be split, and neither should Lorem max. ipsum.
Example
const str = 'Just some examples:\nThis example has min. 2 lines. Max. 10 lines. There are some words: 1. Foo and 2. bar.';
The result of that should be four data-objects:
{ type: 'item', content: [{ content: 'Just some examples:', timestamp: 123 }] }
{ type: 'item', content: [{ content: 'This example has min. 2 lines.', timestamp: 123 }] }
{ type: 'item', content: [{ content: 'Max. 10 lines.', timestamp: 123 }] }
{ type: 'item', content: [{ content: 'There are some words: 1. Foo and 2. bar.', timestamp: 123 }] }

You can first detect the abbreviations and the numberings in the string, and replace the dot by a dummy string in each one. After splitting the string on the remaining dots, which signal the end of a sentence, you can restore the original dots. Once you have the sentences, you can split each one on newline characters like you do in your original code.
The updated code allows for more than one dot in the abbreviations (as shown for p.o. and s.v.p.).
var i, j, strRegex, regex, abbrParts;
const DOT = "_DOT_";
const abbr = ["p.o.", "s.v.p.", "vs.", "min.", "max."];
var str = 'Just some examples:\nThis example s.v.p. has min. 2 lines. Max. 10 lines. There are some words: 1. Foo and 2. bar. And also A. p.o. professional letters.';
console.log("String: " + str);
// Replace dot in abbreviations found in string
for (i = 0; i < abbr.length; i++) {
abbrParts = abbr[i].split(".");
strRegex = "(\\W*\\b" + abbrParts[0] + ")"; // \b keeps "min." from matching inside "admin."
for (j = 1; j < abbrParts.length - 1; j++) {
strRegex += "(\\.)(" + abbrParts[j] + ")";
}
strRegex += "(\\.)(" + abbrParts[abbrParts.length - 1] + "\\W*)";
regex = new RegExp(strRegex, "gi");
str = str.replace(regex, function () {
var groups = arguments;
var result = groups[1];
for (j = 2; j < groups.length; j += 2) {
result += (groups[j] === "." ? DOT + groups[j+1] : "");
}
return result;
});
}
// Replace dot in numbers found in string
str = str.replace(/(\W*\d+)(\.)/gi, "$1" + DOT);
// Replace dot in letter numbering found in string
str = str.replace(/(\W+[a-zA-Z])(\.)/gi, "$1" + DOT);
// Split the string at dots
var parts = str.split(".");
// Restore dots in sentences
var sentences = [];
regex = new RegExp(DOT, "gi");
for (i = 0; i < parts.length; i++) {
if (parts[i].trim().length > 0) {
sentences.push(parts[i].replace(regex, ".").trim() + ".");
console.log("Sentence " + sentences.length + ": " + sentences[sentences.length - 1]);
}
}
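As a compact alternative, the same placeholder idea can be sketched in one function that also handles the line splitting from the question. This is only a sketch under a few assumptions: the helper name splitSentences and the NUL placeholder are made up for illustration, abbreviations are matched case-insensitively on word boundaries, and the lookbehind used for splitting needs an ES2018 engine.

```javascript
// Hypothetical helper (not from the original answer): returns one string per sentence.
function splitSentences(text, abbreviations) {
  const DOT = '\u0000'; // placeholder, assumed never to occur in the input
  const escapeRe = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // Longest abbreviations first, so "s.v.p." is tried before any shorter overlap
  const alternatives = abbreviations
    .slice()
    .sort((a, b) => b.length - a.length)
    .map(escapeRe);
  // Also protect numberings ("1.", "30.") and single letters ("A.", "b.")
  alternatives.push('\\d+\\.', '[a-z]\\.');
  const protectedRe = new RegExp('\\b(?:' + alternatives.join('|') + ')', 'gi');

  const sentences = [];
  for (const line of text.split('\n')) {
    if (!line.trim()) continue; // skip empty lines
    // Hide the dots that must not end a sentence
    const masked = line.replace(protectedRe, (m) => m.split('.').join(DOT));
    // Split after every remaining dot that is followed by whitespace
    for (const part of masked.split(/(?<=\.)\s+/)) {
      if (part.trim()) sentences.push(part.trim().split(DOT).join('.'));
    }
  }
  return sentences;
}
```

Each returned sentence can then be wrapped in a data object exactly as in the forEach loop from the question.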

Related

How to remove delimiters from a given RegEx?

my input:
str = "User-123"
o/p:
name: User
id: 123
another input:
str = "User 123"// current this works with my regex.
o/p: As above
other possible inputs:
str = "User:123"
str = "User/123"
str = "User:123"
code:
let m = value.match(/([a-z]+\s*\d+)\s+([a-z]+\s*\d+|\d+\s*[a-z]+)/i);
if (m) {return true}
else {return false}
If the input has a delimiter, the code above returns false because the regex does not match it. I want it to return true for all the scenarios listed above.
Currently it only handles whitespace. How can I make this regex handle the other delimiters as well?

It looks like you just want to split on a non-alphanumeric character:
let inputs = [
"User:123",
"User/123",
"User:123",
"User-123",
"User 123"
]
for (const i of inputs){
let [name, id] = i.split(/[^a-z0-9]/i)
console.log("name:", name, "id:", id)
}
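If the goal is to validate the input and extract both parts in one step, a single anchored regex with two capturing groups is another option. This is a minimal sketch, assuming the name consists only of letters, the id only of digits, and at least one non-alphanumeric character separates them; parseUser is a hypothetical name.

```javascript
// Hypothetical helper: returns { name, id } or null when the input does not match.
function parseUser(value) {
  // letters, then one or more delimiter characters, then digits
  const m = /^([a-z]+)[^a-z0-9]+(\d+)$/i.exec(value.trim());
  return m ? { name: m[1], id: m[2] } : null;
}

for (const input of ['User-123', 'User 123', 'User:123', 'User/123']) {
  const parsed = parseUser(input);
  console.log(input, '->', parsed.name, parsed.id);
}
```

Using Boolean(parseUser(value)) gives the true/false result the question asks for.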

You might consider simplifying your expression. With capturing groups, you can add or remove delimiters as you wish. For instance, this expression shows how you might use capturing groups:
([A-Za-z]+)(:|\/)([0-9]+)
Code
This code shows how to do so and runs a basic benchmark that repeats the replacement 1 million times.
const repeat = 1000000;
const start = Date.now();
let match;
for (let i = repeat; i > 0; i--) {
const string = 'User/123';
// [A-Za-z] instead of [A-z], which would also match [ \ ] ^ _ and the backtick
const regex = /([A-Za-z]+)(:|\/)([0-9]+)/g;
match = string.replace(regex, "$1$3");
}
const end = Date.now() - start;
console.log(match + " is a match 💚 ");
console.log(end / 1000 + " is the runtime of " + repeat + " times benchmark test. 😳 ");

How to add parentheses around every Regex match in jQuery/Javascript?

I need to add parentheses or "<>" around every regex match. I already have all the regular expressions ready. For example:
Input:
int a = 0;
Output:
<int><a><=><0>
There's one more thing: what I'm building is a "translator" that reads an arithmetic expression in C and generates its token stream. So, for example, "=" will become <assign_op> and ";" will become <end_of_statement>.
The sentence above would be written as:
<int><a><assign_op><0><end_of_statement>
Here's the code I've been working on:
function translate() {
var input = 'int a = 0;' +
'\nint b = 5;' +
'\nint a = b + 5;' +
'\nint c = a1 / 1;' +
'\ndouble a = 1;' +
'\nfloat a = 0;' +
'\na = 0;' +
'\nfloat a = b + 1;' +
'\na = (b - c) * 5;';
var regex3 = new RegExp(/(((int|long int|double|long double|float)+\s*([a-zA-Z_]+\d*)*|([a-zA-Z_]+\d*))\s*=\s*(([a-zA-Z_]*|[a-zA-Z_]+\d*)*|\d*|\d+\.\d+);)|(((int|long int|double|long double|float)+\s*([a-zA-Z_]+\d*)*|([a-zA-Z_]+\d*))\s*=(\s*\(*(([a-zA-Z_]*|[a-zA-Z_]+\d*)*|\d*|\d+\.\d+)\)*\s*[+\-/*%]\s*\(*(([a-zA-Z_]*|[a-zA-Z_]+\d*)*|\d*|\d+\.\d+)\)*)*\s*;)/g);
var text = input.match(regex3);
var varTypes = ['int', 'double', 'float', 'long int', 'long double'];
var output = '';
text.forEach(line => {
varTypes.forEach(type => {
if (line.match(type))
line = line.replace(type, '<' + type + '>');
});
if (line.match(/=/g)) {
line = line.replace(/=/g, '<assign_op>')
}
if (line.match(/;/g)) {
line = line.replace(/;/g, '<end_of_statement>');
}
if (line.match(/\(/g)) {
line = line.replace(/\(/g, '<open_parenthesis>')
}
if (line.match(/\)/g)) {
line = line.replace(/\)/g, '<close_parenthesis>')
}
if (line.match(/[+\-*/%]/g)) {
line = line.replace(/[+\-*/%]/g, '<operator>')
}
if (line.match(/\+{2}/g)) {
line = line.replace(/\+{2}/g, '<operator>')
}
output += line + '\n';
});
console.log(output);
}
Oh, sorry if I had many English writing mistakes, not an English native speaker :)

I worked on your complex string manipulation problem for quite a while...
I came up with a "dictionary" idea to make managing the replacements easier. I also used the spaces to find the string elements to wrap with < and >.
Have a look at the comments within the code.
var input =
'int a = 0;' +
'\nint b = 5;' +
'\nint a = b + 5;' +
'\nint c = a1 / 1;' +
'\ndouble a = 1;' +
'\nfloat a = 0;' +
'\na = 0;' +
'\nfloat a = b + 1;' +
'\na = (b - c) * 5;' +
'\nlong int = (w - x) * 7;' + // Added to test the two words types
'\nlong double = (x - w) * 7;'; // Added to test the two words types
var dictionary = [
{
target: "long int",
replacement: "long|int" // | to ensure keeping that space, will be restored later
},
{
target: "long double",
replacement: "long|double" // | to ensure keeping that space, will be restored later
},
{
target: /=/g,
replacement: "assign_op"
},
{
target: /;/g,
replacement: "end_of_statement"
},
{
target: /\(/g,
replacement: "open_parenthesis"
},
{
target: /\)/g,
replacement: "close_parenthesis"
},
{
target: /\+{2}/g, // Must come before the single-character operators, otherwise "++" is never matched
replacement: "operator"
},
{
target: /[+\-*/%]/g,
replacement: "operator"
}
];
function translate(input) {
//console.log(input);
// Your unchanged regex
var regex3 = new RegExp(/(((int|long int|double|long double|float)+\s*([a-zA-Z_]+\d*)*|([a-zA-Z_]+\d*))\s*=\s*(([a-zA-Z_]*|[a-zA-Z_]+\d*)*|\d*|\d+\.\d+);)|(((int|long int|double|long double|float)+\s*([a-zA-Z_]+\d*)*|([a-zA-Z_]+\d*))\s*=(\s*\(*(([a-zA-Z_]*|[a-zA-Z_]+\d*)*|\d*|\d+\.\d+)\)*\s*[+\-/*%]\s*\(*(([a-zA-Z_]*|[a-zA-Z_]+\d*)*|\d*|\d+\.\d+)\)*)*\s*;)/g);
// An array of lines created by the use of your regex
var lines_array = input.match(regex3);
//console.log(lines_array);
// The result variable
var output = '';
// Process each lines
lines_array.forEach(line => {
// Use the dictionary to replace some special cases
// It adds spaces around the replacements to ensure word separation
dictionary.forEach(translation => {
if (line.match(translation.target)) {
line = line.replace(translation.target, " "+translation.replacement+" "); // Notice the spaces
}
});
// Remove double spaces
line = line.trim().replace(/\s+/g," ");
// Use the spaces to get a word array to add the angle brackets
var words = line.split(" ");
words.forEach(word => {
output += "<"+word+">";
});
// Re-add the line return
output += '\n';
});
// Final fixes on the whole result string
output = output
.replace(/\|/g, " ") // Restore the space in the "two words types" ( was replaced by a | )
.replace(/<</g, "<") // Remove duplicate angle brackets
.replace(/>>/g, ">")
console.log(output);
}
// Run the function
translate(input);
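The replacement chain above can also be written as a single-pass tokenizer, which avoids the ordering problems between replacements. The following is only a sketch, not the answerer's method: TOKEN_RE, SYMBOL_NAMES and tokenizeLine are hypothetical names, the token grammar is an assumption based on the examples in the question, and characters that match no token are silently skipped.

```javascript
// Hypothetical single-pass tokenizer for the simple C statements in the question.
// Alternation order matters: two-word types and "++" are tried before shorter tokens.
const TOKEN_RE = /long\s+(?:int|double)\b|[A-Za-z_]\w*|\d+(?:\.\d+)?|\+\+|[+\-*\/%=;()]/g;
const SYMBOL_NAMES = {
  '=': 'assign_op',
  ';': 'end_of_statement',
  '(': 'open_parenthesis',
  ')': 'close_parenthesis',
};

function tokenizeLine(line) {
  const tokens = line.match(TOKEN_RE) || [];
  return tokens
    .map((token) => {
      if (Object.prototype.hasOwnProperty.call(SYMBOL_NAMES, token)) return SYMBOL_NAMES[token];
      if (/^(?:\+\+|[+\-*\/%])$/.test(token)) return 'operator';
      return token.replace(/\s+/g, ' '); // normalizes "long   int" to "long int"
    })
    .map((name) => '<' + name + '>')
    .join('');
}

console.log(tokenizeLine('int a = 0;'));
console.log(tokenizeLine('a = (b - c) * 5;'));
```

Because every token is matched exactly once, there is no need for the "|" placeholder or the final cleanup of doubled angle brackets.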

Replace text with html formatting using text location (text span) in Javascript

I have a string and I want to replace text with html font colors and I need to replace using a dictionary containing the start of the span (key) and the length of the span (value). Any idea on how to do this in JS so that all my text is replaced correctly with html?
str = "Make 'this' become blue and also 'that'."
// color_dict contains the START of the span and the LENGTH of the word.
// i.e. this & that are both size 4.
color_dict = {6: "4", 34: "4"};
console.log(str.slice(6, 10)); //obviously this is just a slice
console.log(str.slice(34, 38));
// This is what I would like at the end.
document.write("Make '<font color='blue'>this</font>' become blue and also '<font color='blue'>that</font>'.");
Overall, I'd like to replace the original string with some html but using the dictionary containing the text start and length of the substring.
Thank you so much!

This uses regular expressions to get the job done. The dictionary is processed in reverse order so that the indexes for replacements won't change.
var str = "Make 'this' become blue and also 'that'."
// color_dict contains the START of the span and the LENGTH of the word.
// i.e. this & that are both size 4.
var color_dict = { 6: "4", 34: "4" };
// get keys sorted numerically
var keys = Object.keys(color_dict).sort(function(a, b) {return a - b;});
// process keys in reverse order
for (var i = keys.length - 1; i > -1; i--) {
var key = keys[i];
str = str.replace(new RegExp("^(.{" + key + "})(.{" + color_dict[key] + "})"), "$1<font color='blue'>$2</font>");
}
document.write(str);

You could do it like this:
var str = "Make 'this' become blue and also 'that'.";
var new_str = '';
var replacements = [];
var prev = 0;
for (var i in color_dict) {
replacements.push(str.slice(prev, parseInt(i)-1));
prev = parseInt(i) + parseInt(color_dict[i]) + 1;
replacements.push(str.slice(parseInt(i)-1, prev));
}
for (var i = 0; i < replacements.length; i+=2) {
new_str += replacements[i] + "<font color='blue'>" + replacements[i+1] + "</font>";
}
new_str += str.substr(-1);
console.log(new_str);
//Make <font color='blue'>'this'</font> become blue and also <font color='blue'>'that'</font>.

HTML:
<div id="string">Make 'this' become blue and also 'that'.</div>
jQuery
var str = $("#string").text(); // get string
var color_dict = [{index: 6, length: 4}, {index: 34, length: 4}]; // edited your object to instead be an array of objects, sorted by index
var openTag = "<span style='color: blue'>";
var closeTag = "</span>";
for(var i = 0; i < color_dict.length; i++) {
str = str.substring(0, color_dict[i].index) +
openTag +
str.substring(color_dict[i].index, color_dict[i].length + color_dict[i].index) +
closeTag +
str.substring(color_dict[i].index + color_dict[i].length);
for(var j = i+1; j < color_dict.length; j++) {
color_dict[j].index += openTag.length + closeTag.length; // shift all later indices forward by the length of the inserted tags
}
}
$("#string").html(str); // update string
See the working example on JSFiddle.
What this does is:
Get the text
Set the dictionary
For each "word" in the dictionary, change the original string so that it is:
the text before +
<span style='color: blue'>
the dictionary text
</span>
the text after the dictionary text
On a side note, the <font> tag and its color attribute are deprecated. Use CSS instead.

Here is a function that would do it, it takes the string and the dictionary as arguments in the format you defined:
function decorateString(str, color_dict) {
// turn into more suitable array of {start, len}
var arr = Object.keys(color_dict).map(function (start) {
return {
start: parseInt(start),
len: parseInt(color_dict[start])
};
});
// sort on descending start position to ensure proper tag-insertion
arr.sort(function(a, b) {
return b.start - a.start; // a sort comparator must return a number, not a boolean
});
// build new string and return it
return arr.reduce(function(str, word) {
return str.substr(0, word.start)
+ "<font color='blue'>"
+ str.substr(word.start, word.len)
+ '</font>'
+ str.substr(word.start + word.len);
}, str);
}
Use it as follows:
str = "Make 'this' become blue and also 'that'."
color_dict = {6: "4", 34: "4"};
document.write(decorateString(str, color_dict));
Here is a JS Fiddle
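The answers above either process the spans in reverse or shift the remaining indices after each insertion. A third option is to walk the spans in ascending order and build the output piece by piece, so no index ever needs adjusting. This is a minimal sketch, assuming the spans do not overlap and the text needs no HTML escaping; colorSpans is a hypothetical name, and a CSS-styled span is used in place of the deprecated font tag.

```javascript
// Hypothetical helper: spans maps a start index to a length, as in the question.
function colorSpans(str, spans, color = 'blue') {
  const starts = Object.keys(spans).map(Number).sort((a, b) => a - b);
  let out = '';
  let pos = 0; // end of the part of str that has already been copied
  for (const start of starts) {
    const end = start + Number(spans[start]);
    out += str.slice(pos, start)
      + '<span style="color: ' + color + '">'
      + str.slice(start, end)
      + '</span>';
    pos = end;
  }
  return out + str.slice(pos); // copy whatever follows the last span
}

const text = "Make 'this' become blue and also 'that'.";
console.log(colorSpans(text, { 6: "4", 34: "4" }));
```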

Javascript: Cut string after last specific character

I'm writing some JavaScript to cut strings to at most 140 characters without breaking words, but now I want the cut text to make more sense. If the text contains a punctuation character (such as ., ,, :, ;) at a position between 110 and 140 characters, I would like to cut it there instead. Here is what I have done so far,
where texto means text, longitud means length, and arrayDeTextos means array of texts.
Thank you.
//function to cut strings
function textToCut(texto, longitud){
if(texto.length<longitud) return texto;
else {
var cortado=texto.substring(0,longitud).split(' ');
texto='';
for(key in cortado){
if(key<(cortado.length-1)){
texto+=cortado[key]+' ';
if(texto.length>110 && texto.length<140) {
alert(texto);
}
}
}
}
return texto;
}
function textToCutArray(texto, longitud){
var arrayDeTextos=[];
var i=-1;
do{
i++;
arrayDeTextos.push(textToCut(texto, longitud));
texto=texto.replace(arrayDeTextos[i],'');
}while(arrayDeTextos[i].length!=0)
arrayDeTextos.push(texto);
for(key in arrayDeTextos){
if(arrayDeTextos[key].length==0){
delete arrayDeTextos[key];
}
}
return arrayDeTextos;
}

Break the string into sentences, then check the length of the final string before appending each sentence.
var str = "Test Sentence. Test Sentence";
var arr = str.split(/[.,;:]/); // create an array of sentences delimited by .,;: (the delimiters themselves are dropped)
var final_str = '';
for (var s = 0; s < arr.length; s++) {
if (final_str.length == 0) {
final_str += arr[s];
} else if (final_str.length + arr[s].length < 140) { // compare the sentence length, not the length of the index
final_str += arr[s];
}
}
alert(final_str); // should have as many full sentences as possible in fewer than 140 characters.

I think Martin Konecny's solution doesn't work well because it excludes the delimiter and so removes lots of sense from the text.
This is my solution:
var arrTextChunks = text.split(/([,:\?!.;])/g),
finalText = "",
finalTextLength = 0;
for(var i = 0; i < arrTextChunks.length; i += 2) {
if(finalTextLength + arrTextChunks[i].length + 1 < 140) {
finalText += arrTextChunks[i] + (arrTextChunks[i + 1] || ""); // the last chunk has no delimiter after it
finalTextLength += arrTextChunks[i].length;
} else if(finalTextLength > 110) {
break;
}
}
http://jsfiddle.net/Whre/3or7j50q/3/
I'm aware that the i += 2 step only makes sense for "common" uses of punctuation (a single dot, colon, etc.) and not for something like "hi!!!?!?1!1!".

This should be a bit more efficient, since it avoids regex splits.
var truncate = function (str, maxLen, delims) {
str = str.substring(0, maxLen);
return str.substring(0, Math.max.apply(null, delims.map(function (s) {
return str.lastIndexOf(s);
})));
};
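The function above drops the delimiter and returns an empty string when no delimiter is found. A variant that keeps the delimiter and falls back to the last space could look like the sketch below. The name truncateAtDelimiter, the default delimiter list and the 110/140 defaults are assumptions taken from the question, not part of the answer above.

```javascript
// Hypothetical helper: cut at the last delimiter between minLen and maxLen characters,
// otherwise at the last space, otherwise hard at maxLen.
function truncateAtDelimiter(text, maxLen = 140, minLen = 110, delims = ['.', ',', ':', ';', '?', '!']) {
  if (text.length <= maxLen) return text;
  const head = text.slice(0, maxLen);
  // Index of the last delimiter inside the first maxLen characters (-1 if none)
  const lastDelim = Math.max(...delims.map((d) => head.lastIndexOf(d)));
  if (lastDelim + 1 >= minLen) return head.slice(0, lastDelim + 1); // keep the delimiter
  const lastSpace = head.lastIndexOf(' ');
  return lastSpace > 0 ? head.slice(0, lastSpace) : head;
}
```

Calling it in a loop on the remaining text (text.slice(chunk.length)) would give the array of chunks that textToCutArray in the question is trying to build.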

Try this regex, you can see how it works here: http://regexper.com/#%5E(%5Cr%5Cn%7C.)%7B1%2C140%7D%5Cb
str.match(/^(\r\n|.){1,140}\b/g).join('')

Javascript and regex: split string and keep the separator

I have a string:
var string = "aaaaaa<br />&dagger; bbbb<br />&Dagger; cccc"
And I would like to split this string with the delimiter <br /> followed by a special character.
To do that, I am using this:
string.split(/<br \/>&#?[a-zA-Z0-9]+;/g);
I am getting what I need, except that I am losing the delimiter.
Here is the example: http://jsfiddle.net/JwrZ6/1/
How can I keep the delimiter?

I was having a similar but slightly different problem. Anyway, here are examples of the different scenarios for where to keep the delimiter.
"1、2、3".split("、") == ["1", "2", "3"]
"1、2、3".split(/(、)/g) == ["1", "、", "2", "、", "3"]
"1、2、3".split(/(?=、)/g) == ["1", "、2", "、3"]
"1、2、3".split(/(?!、)/g) == ["1、", "2、", "3"]
"1、2、3".split(/(.*?、)/g) == ["", "1、", "", "2、", "3"]
Warning: The fourth will only work to split single characters. ConnorsFan presents an alternative:
// Split a path, but keep the slashes that follow directories
var str = 'Animation/rawr/javascript.js';
var tokens = str.match(/[^\/]+\/?|\//g);

Use (positive) lookahead so that the regular expression asserts that the special character exists, but does not actually match it:
string.split(/<br \/>(?=&#?[a-zA-Z0-9]+;)/g);
See it in action:
var string = "aaaaaa<br />&dagger; bbbb<br />&Dagger; cccc";
console.log(string.split(/<br \/>(?=&#?[a-zA-Z0-9]+;)/g));

If you wrap the delimiter in parentheses, it will be part of the returned array.
string.split(/(<br \/>&#?[a-zA-Z0-9]+;)/g);
// returns ["aaaaaa", "<br />&dagger;", " bbbb", "<br />&Dagger;", " cccc"]
Depending on which part you want to keep, change which subgroup you match:
string.split(/(<br \/>)&#?[a-zA-Z0-9]+;/g);
// returns ["aaaaaa", "<br />", " bbbb", "<br />", " cccc"]
You could improve the expression by ignoring the case of letters:
string.split(/(<br \/>)&#?[a-z0-9]+;/gi);
And you can match for predefined groups like this: \d equals [0-9] and \w equals [a-zA-Z0-9_]. This means your expression could look like this.
string.split(/<br \/>(&#?[a-z\d]+;)/gi);
There is a good Regular Expression Reference on JavaScriptKit.

If you group the split pattern, its match will be kept in the output and it is by design:
If separator is a regular expression with capturing parentheses, then
each time separator matches, the results (including any undefined
results) of the capturing parentheses are spliced into the output
array.
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/split#description
You don't need a lookahead or global flag unless your search pattern uses one.
const str = `How much wood would a woodchuck chuck, if a woodchuck could chuck wood?`
const result = str.split(/(\s+)/);
console.log(result);
// We can verify the result
const isSame = result.join('') === str;
console.log({ isSame });
You can use multiple groups. You can be as creative as you like and what remains outside the groups will be removed:
const str = `How much wood would a woodchuck chuck, if a woodchuck could chuck wood?`
const result = str.split(/(\s+)(\w{1,2})\w+/);
console.log(result, result.join(''));

I also answered this in JavaScript Split Regular Expression keep the delimiter.
Use the (?=pattern) lookahead in the regex.
Example:
var string = '500x500-11*90~1+1';
string = string.replace(/(?=[$-/:-?{-~!"^_`\[\]])/gi, ",");
string = string.split(",");
this will give you the following result.
[ '500x500', '-11', '*90', '~1', '+1' ]
Can also be directly split
string = string.split(/(?=[$-/:-?{-~!"^_`\[\]])/gi);
giving the same result
[ '500x500', '-11', '*90', '~1', '+1' ]

I made a modification to jichi's answer, and put it in a function which also supports multiple letters.
String.prototype.splitAndKeep = function(separator, method='seperate'){
var str = this;
if(method == 'seperate'){
str = str.split(new RegExp(`(${separator})`, 'g'));
}else if(method == 'infront'){
str = str.split(new RegExp(`(?=${separator})`, 'g'));
}else if(method == 'behind'){
str = str.split(new RegExp(`(.*?${separator})`, 'g'));
str = str.filter(function(el){return el !== "";});
}
return str;
};
The 3rd method in jichi's answer would not work in this function, so I took the 4th method and removed the empty strings to get the same result.
Edit:
A second version, which accepts an array so that the string is split on char1 or char2:
String.prototype.splitAndKeep = function(separator, method='seperate'){
var str = this;
function splitAndKeep(str, separator, method='seperate'){
if(method == 'seperate'){
str = str.split(new RegExp(`(${separator})`, 'g'));
}else if(method == 'infront'){
str = str.split(new RegExp(`(?=${separator})`, 'g'));
}else if(method == 'behind'){
str = str.split(new RegExp(`(.*?${separator})`, 'g'));
str = str.filter(function(el){return el !== "";});
}
return str;
}
if(Array.isArray(separator)){
var parts = splitAndKeep(str, separator[0], method);
for(var i = 1; i < separator.length; i++){
var partsTemp = parts;
parts = [];
for(var p = 0; p < partsTemp.length; p++){
parts = parts.concat(splitAndKeep(partsTemp[p], separator[i], method));
}
}
return parts;
}else{
return splitAndKeep(str, separator, method);
}
};
usage:
str = "first1-second2-third3-last";
str.splitAndKeep(["1", "2", "3"]) == ["first", "1", "-second", "2", "-third", "3", "-last"];
str.splitAndKeep("-") == ["first1", "-", "second2", "-", "third3", "-", "last"];

An extension function that splits a string by a substring or RegEx and keeps the delimiter at the start or at the end of each part, depending on the second parameter.
String.prototype.splitKeep = function (splitter, ahead) {
var self = this;
var result = [];
if (splitter != '') {
var matches = [];
// Getting matched value and its index
var replaceName = splitter instanceof RegExp ? "replace" : "replaceAll";
var r = self[replaceName](splitter, function (m, i, e) {
matches.push({ value: m, index: i });
return getSubst(m);
});
// Finds split substrings
var lastIndex = 0;
for (var i = 0; i < matches.length; i++) {
var m = matches[i];
var nextIndex = ahead == true ? m.index : m.index + m.value.length;
if (nextIndex != lastIndex) {
var part = self.substring(lastIndex, nextIndex);
result.push(part);
lastIndex = nextIndex;
}
};
if (lastIndex < self.length) {
var part = self.substring(lastIndex, self.length);
result.push(part);
};
// Substitution of matched string
function getSubst(value) {
var substChar = value[0] == '0' ? '1' : '0';
var subst = '';
for (var i = 0; i < value.length; i++) {
subst += substChar;
}
return subst;
};
}
else {
result.push(self); // arrays have push(), not add()
};
return result;
};
The test:
test('splitKeep', function () {
// String
deepEqual("1231451".splitKeep('1'), ["1", "231", "451"]);
deepEqual("123145".splitKeep('1', true), ["123", "145"]);
deepEqual("1231451".splitKeep('1', true), ["123", "145", "1"]);
deepEqual("hello man how are you!".splitKeep(' '), ["hello ", "man ", "how ", "are ", "you!"]);
deepEqual("hello man how are you!".splitKeep(' ', true), ["hello", " man", " how", " are", " you!"]);
// Regex
deepEqual("mhellommhellommmhello".splitKeep(/m+/g), ["m", "hellomm", "hellommm", "hello"]);
deepEqual("mhellommhellommmhello".splitKeep(/m+/g, true), ["mhello", "mmhello", "mmmhello"]);
});

I've been using this:
String.prototype.splitBy = function (delimiter) {
var
delimiterPATTERN = '(' + delimiter + ')',
delimiterRE = new RegExp(delimiterPATTERN, 'g');
return this.split(delimiterRE).reduce((chunks, item) => {
if (item.match(delimiterRE)){
chunks.push(item)
} else {
chunks[chunks.length - 1] += item
};
return chunks
}, [])
}
Except that you shouldn't mess with String.prototype, so here's a function version:
var splitBy = function (text, delimiter) {
var
delimiterPATTERN = '(' + delimiter + ')',
delimiterRE = new RegExp(delimiterPATTERN, 'g');
return text.split(delimiterRE).reduce(function(chunks, item){
if (item.match(delimiterRE)){
chunks.push(item)
} else {
chunks[chunks.length - 1] += item
};
return chunks
}, [])
}
So you could do:
var haystack = "aaaaaa<br />&dagger; bbbb<br />&Dagger; cccc"
var needle = '<br \/>&#?[a-zA-Z0-9]+;';
var result = splitBy(haystack , needle)
console.log( JSON.stringify( result, null, 2) )
And you'll end up with:
[
"<br />&dagger; bbbb",
"<br />&Dagger; cccc"
]

Most of the existing answers predate the introduction of lookbehind assertions in JavaScript in 2018. You didn't specify how you wanted the delimiters to be included in the result. One typical use case would be sentences delimited by punctuation ([.?!]), where one would want the delimiters to be included at the ends of the resulting strings. This corresponds to the fourth case in the accepted answer, but as noted there, that solution only works for single characters. Arbitrary strings with the delimiters appended at the end can be formed with a lookbehind assertion:
'It is. Is it? It is!'.split(/(?<=[.?!])/)
/* [ 'It is.', ' Is it?', ' It is!' ] */
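The three ways of keeping a delimiter discussed in this thread (capturing group, lookahead, lookbehind) can be wrapped in one plain function instead of extending String.prototype. This is a sketch only: splitKeepDelimiter and the position names 'separate', 'before' and 'after' are hypothetical, the lookbehind variant needs an ES2018 engine, and a RegExp delimiter is assumed to contain no capturing groups of its own.

```javascript
// Hypothetical helper; the name and the position values are made up for this sketch.
function splitKeepDelimiter(text, delimiter, position = 'separate') {
  // Plain strings are escaped; RegExp objects are used as given (without g and y flags)
  const source = delimiter instanceof RegExp
    ? delimiter.source
    : delimiter.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const flags = delimiter instanceof RegExp ? delimiter.flags.replace(/[gy]/g, '') : '';
  const patterns = {
    separate: '(' + source + ')', // delimiter becomes its own array element
    before: '(?=' + source + ')', // delimiter starts the following part
    after: '(?<=' + source + ')', // delimiter ends the preceding part
  };
  if (!Object.prototype.hasOwnProperty.call(patterns, position)) {
    throw new RangeError('Unknown position: ' + position);
  }
  return text.split(new RegExp(patterns[position], flags));
}

console.log(splitKeepDelimiter('It is. Is it? It is!', /[.?!]/, 'after'));
console.log(splitKeepDelimiter('1、2、3', '、', 'before'));
```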

I know that this is a bit late but you could also use lookarounds
var string = "aaaaaa<br />&dagger; bbbb<br />&Dagger; cccc";
var array = string.split(/(?<=<br \/>)/);
console.log(array);

I also came up with this solution. No regex needed, and very readable.
const str = "hello world what a great day today balbla"
const separatorIndex = str.indexOf("great")
const parsedString = str.slice(separatorIndex)
console.log(parsedString)
