Split string in JavaScript using regex with zero-width lookbehind

I know JavaScript regular expressions have native lookaheads but not lookbehinds.
I want to split a string at points either beginning with any member of one set of characters or ending with any member of another set of characters.
Split before ເ, ແ, ໂ, ໃ, ໄ. Split after ະ.
In: ເລື້ອຍໆມະຫັດສະຈັນເອກອັກຄະລັດຖະທູດ
Out: ເລື້ອຍໆມະ ຫັດສະ ຈັນ ເອກອັກຄະ ລັດຖະ ທູດ
I can achieve the "split before" part using zero-width lookahead:
'ເລື້ອຍໆມະຫັດສະຈັນເອກອັກຄະລັດຖະທູດ'.split(/(?=[ໃໄໂເແ])/)
["ເລື້ອຍໆມະຫັດສະຈັນ", "ເອກອັກຄະລັດຖະທູດ"]
But I can't think of a general approach to simulating zero-width lookbehind.
I'm splitting strings of arbitrary Unicode text, so I don't want to substitute in special markers in a first pass, since I can't guarantee that any given string is absent from my input.

Instead of splitting, you may consider using the match() method:
var s = 'ເລື້ອຍໆມະຫັດສະຈັນເອກອັກຄະລັດຖະທູດ',
r = s.match(/(?:(?!ະ).)+?(?:ະ|(?=[ໃໄໂເແ]|$))/g);
console.log(r); //=> [ 'ເລື້ອຍໆມະ', 'ຫັດສະ', 'ຈັນ', 'ເອກອັກຄະ', 'ລັດຖະ', 'ທູດ' ]

You could try matching rather than splitting:
> var re = /((?:(?!ະ).)+(?:ະ|$))/g;
undefined
> var str = "ເລື້ອຍໆມະຫັດສະຈັນເອກອັກຄະລັດຖະທູດ"
undefined
> var m;
undefined
> while ((m = re.exec(str)) != null) {
... console.log(m[1]);
... }
ເລື້ອຍໆມະ
ຫັດສະ
ຈັນເອກອັກຄະ
ລັດຖະ
ທູດ
Then split each element of the resulting array again, using the lookahead.
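The two passes described above can also be chained into a single expression. A minimal sketch, assuming an engine with Array.prototype.flatMap (ES2019 or later); the variable names are only illustrative:

```javascript
const str = "ເລື້ອຍໆມະຫັດສະຈັນເອກອັກຄະລັດຖະທູດ";

// Pass 1: each match is a run of non-ະ characters that ends in ະ (or at the end of the string).
// match() returns null when nothing matches, so fall back to an empty array.
const chunks = str.match(/(?:(?!ະ).)+(?:ະ|$)/g) || [];

// Pass 2: split every chunk again before ເ, ແ, ໂ, ໃ or ໄ with a zero-width lookahead,
// and flatten the nested arrays into a single list.
const parts = chunks.flatMap(chunk => chunk.split(/(?=[ໃໄໂເແ])/));

console.log(parts);
// parts holds: "ເລື້ອຍໆມະ", "ຫັດສະ", "ຈັນ", "ເອກອັກຄະ", "ລັດຖະ", "ທູດ"
```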

If you use parentheses in the delimiter regex, the captured text is included in the returned array. So you can just split on /(ະ)/ and then append each odd-indexed member of the resulting array (the captured ະ) to the preceding even-indexed member. Example:
"ເລື້ອຍໆມະຫັດສະຈັນເອກອັກຄະລັດຖະທູດ".split(/(ະ)/).reduce(function (arr, str, index) {
  if (index % 2 == 0) {
    arr.push(str);              // even index: text between two delimiters
  } else {
    arr[arr.length - 1] += str; // odd index: the captured ະ, appended to the previous piece
  }
  return arr;
}, [])
Result: ["ເລື້ອຍໆມະ", "ຫັດສະ", "ຈັນເອກອັກຄະ", "ລັດຖະ", "ທູດ"]
You can do another pass to split on the lookahead:
"ເລື້ອຍໆມະຫັດສະຈັນເອກອັກຄະລັດຖະທູດ".split(/(ະ)/).reduce(function (arr, str, index) {
  if (index % 2 == 0) {
    arr.push(str);
  } else {
    arr[arr.length - 1] += str;
  }
  return arr;
}, []).reduce(function (arr, str) {
  // Second pass: split each piece before ເ, ແ, ໂ, ໃ or ໄ and flatten the results.
  return arr.concat(str.split(/(?=[ໃໄໂເແ])/));
}, []);
Result: ["ເລື້ອຍໆມະ", "ຫັດສະ", "ຈັນ", "ເອກອັກຄະ", "ລັດຖະ", "ທູດ"]
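For completeness: lookbehind assertions were added to JavaScript in ES2018, and current versions of Chrome, Firefox, Safari (16.4 and later) and Node.js support them. Where you can rely on that, the split asked for in the question can be written directly as a zero-width match after ະ or before ເ, ແ, ໂ, ໃ, ໄ. A sketch under that assumption; splitLao is a hypothetical helper name, and an engine without lookbehind support rejects the regex literal with a SyntaxError, so this does not replace the approaches above for older targets:

```javascript
// Split at every zero-width position that is either
//   - right after ະ                (lookbehind: (?<=ະ))
//   - right before ເ, ແ, ໂ, ໃ or ໄ  (lookahead:  (?=[ໃໄໂເແ]))
const splitLao = text => text.split(/(?<=ະ)|(?=[ໃໄໂເແ])/);

const words = splitLao("ເລື້ອຍໆມະຫັດສະຈັນເອກອັກຄະລັດຖະທູດ");
console.log(words);
```

Because both alternatives are zero-width, no characters are consumed, and a position that satisfies both (ະ directly followed by ເ, for example) still produces a single split.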

Related

Reverse a string except for the characters contained within { } with javascript

I need to reverse a string, except for the characters inside "{}". I know how to reverse a string, but I'm not sure how to create the exception. Please help.
function reverseChar(string2){
let string2Array = string2.split('');
let newArray = [];
for(let x = string2Array.length-1; x >= 0; x--){
newArray.push(string2Array[x])
}
console.log(newArray)
}
reverseChar("ab{cd}efg")
reverseChar("ab{cd}ef{gh}i")
Or, maybe, this is what you want?
function reverse(str) {
return str.split("").reverse().join("").replace(/}\w+\{/g,a=>reverse(a))
}
console.log(reverse("ab{cd}efg"))
console.log(reverse("ab{cd}ef{gh}i"))
The RegExp /}\w+\{/g finds any run of word characters (letters, digits and underscore, \w+) enclosed by } and {. These patterns appear once the whole string has first been reversed with reverse(). In the callback passed to String.prototype.replace(), each matched string is then reversed again, which restores the original order inside the braces.
You can try this logic:
1. Get all the parts: runs of word characters and {...} groups.
2. If a part does not contain a brace, reverse it.
3. Reverse the array of parts.
4. Join all the parts back together and return the result.
function reverseChar(string2) {
const regex = /(\w+(?=\{|$)|\{\w+\})/g
return string2.match(regex)
.map((str) => /\{/.test(str) ? str : str.split("").reverse().join(""))
.reverse()
.join("")
}
console.log(reverseChar("ab{cd}efg"))
console.log(reverseChar("ab{cd}ef{gh}i"))
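The same parts-based idea can also be expressed with split() and a capturing group, like the split(/(ະ)/) trick used earlier on this page: the captured {...} groups land at the odd indices of the array. A sketch, assuming braces are never nested; reverseOutsideBraces is a hypothetical name. Unlike the \w-based patterns above, it accepts any characters between and inside the braces:

```javascript
// Reverse a string, but keep the contents of every {...} group in their original order.
function reverseOutsideBraces(str) {
  return str
    // A capturing group makes split() keep the separators, so the {...} groups
    // end up at the odd indices of the resulting array.
    .split(/(\{[^{}]*\})/)
    // Reverse the plain-text pieces (via the string iterator, so surrogate pairs
    // stay intact) and leave the brace groups untouched.
    .map((part, i) => (i % 2 === 1 ? part : [...part].reverse().join("")))
    // Reverse the order of the pieces themselves.
    .reverse()
    .join("");
}

console.log(reverseOutsideBraces("ab{cd}efg"));     // gfe{cd}ba
console.log(reverseOutsideBraces("ab{cd}ef{gh}i")); // i{gh}fe{cd}ba
```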

Regex to find 5 consecutive letters of the alphabet (ex. abcde, nopqr)

I have strings containing 5 letters of the alphabet. I would like to match those whose letters are consecutive in the alphabet, for example:
abcde - return match
nopqrs - return match
cdefg - return match
fghij - return match
but
abcef - do not return match
abbcd - do not return match
I could write out all the combinations, but since you can write ranges like [A-Z] in a regex, I assumed there must be a better way.
A very simple alternative would be to just use String.prototype.includes:
function isConsecutive(string) {
const result = 'abcdefghijklmnopqrstuvwxyz'.includes(string);
console.log(string, result);
}
// true
isConsecutive('abcde');
isConsecutive('nopqrs');
isConsecutive('cdefg');
isConsecutive('fghij');
// false
isConsecutive('abcef');
isConsecutive('abbcd');
If you can live with Python, this function converts the letters to their character codes and checks whether the codes are consecutive (if so, the letters are also consecutive alphabetically). Because the codes are sorted first, the letters may appear in any order, so 'edcba' also matches:
def are_letters_consequtive(text):
    nums = [ord(letter) for letter in text]
    if sorted(nums) == list(range(min(nums), max(nums)+1)):
        return "match"
    return "no match"
print(are_letters_consequtive('abcde'))
print(are_letters_consequtive('cdefg'))
print(are_letters_consequtive('fghij'))
print(are_letters_consequtive('abcef'))
print(are_letters_consequtive('abbcd'))
print(are_letters_consequtive('noprst'))
Outputs:
match
match
match
no match
no match
no match
An alternative using JavaScript:
let string1 = 'abcde'
let string2 = 'fghiz'
function conletters(string) {
if (typeof string !== 'string' || string.length > 5) throw new Error('[ERROR] not string or string greater than 5') // check the type first, so .length is never read from null/undefined
for(let i = 0; i < string.length - 1; i++) {
if(!(string.charCodeAt(i) + 1 == string.charCodeAt(i + 1)))
return false
}
return true
}
console.log('string1 is consecutive: ' + conletters(string1))
console.log('string2 is consecutive: ' + conletters(string2))
You should definitely do it with code:
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
That said, you can do better than testing all the combinations when using regexes. With lookahead expressions you can basically do an "and" operation. Since you know the length, you could do:
const myRegex = /(?=^(ab|bc)...$)(?=^.(ab|bc)..$)(?=^..(ab|bc).$)(?=^...(ab|bc)$)/
You will need to replace (ab|bc) with all 25 pairs of consecutive letters (ab|bc|cd|...|yz).
For this particular case it is actually worse than listing all the possibilities (there are only 22 of them), but it extends more easily to other situations.
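Rather than typing out the 25 pairs and one lookahead per position by hand, the pattern described above can be generated. A sketch, assuming lowercase a to z input and a length of at least 2; buildConsecutiveRegex is a hypothetical helper, not an existing API:

```javascript
const alphabet = "abcdefghijklmnopqrstuvwxyz";

// All 25 adjacent letter pairs joined into one alternation: "ab|bc|cd|...|yz".
const pairs = Array.from(alphabet.slice(0, -1), (ch, i) => ch + alphabet[i + 1]).join("|");

// One lookahead per adjacent position: for a string of `length` letters,
// lookahead i checks that characters i and i+1 form a consecutive pair
// and that the whole string has exactly `length` characters.
function buildConsecutiveRegex(length) {
  let source = "";
  for (let i = 0; i < length - 1; i++) {
    source += `(?=^.{${i}}(?:${pairs}).{${length - 2 - i}}$)`;
  }
  return new RegExp(source);
}

const fiveInARow = buildConsecutiveRegex(5);
for (const s of ["abcde", "cdefg", "fghij", "abcef", "abbcd"]) {
  console.log(s, fiveInARow.test(s));
}
```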

JavaScript: Amend the Sentence

I am having trouble with the JavaScript problem below.
Question:
You have been given a string s, which is supposed to be a sentence. However, someone forgot to put spaces between the different words, and for some reason they capitalized the first letter of every word. Return the sentence after making the following amendments:
Put a single space between the words.
Convert the uppercase letters to lowercase.
Example
For s = "CodefightsIsAwesome", the output should be "codefights is awesome";
For s = "Hello", the output should be "hello".
My current code is:
Right now, my second for-loop just manually slices the parts out of the string.
How can I make this dynamic and insert a space in front of each capital letter?
You can use String.prototype.match() with the RegExp /[A-Z][^A-Z]*/g to match an uppercase letter A-Z followed by zero or more characters that are not A-Z; chain Array.prototype.map() to call .toLowerCase() on each matched word, then .join() with the parameter " " to put a space between the matches in the resulting string.
var str = "CodefightsIsAwesome";
var res = str.match(/[A-Z][^A-Z]*/g).map(word => word.toLowerCase()).join(" ");
console.log(res);
Alternatively, as suggested by @FissureKing, you can use String.prototype.replace() with .trim() and .toLowerCase() chained:
var str = "CodefightsIsAwesome";
var res = str.replace(/[A-Z][^A-Z]*/g, word => word + ' ').trim().toLowerCase();
console.log(res);
Rather than coding a loop, I'd do it in one line with a (reasonably) simple string replacement:
function amendTheSentence(s) {
return s.replace(/[A-Z]/g, function(m) { return " " + m.toLowerCase() })
.replace(/^ /, "");
}
console.log(amendTheSentence("CodefightsIsAwesome"));
console.log(amendTheSentence("noCapitalOnFirstWord"));
console.log(amendTheSentence("ThereIsNobodyCrazierThanI"));
That is, match any uppercase letter with the regular expression /[A-Z]/, replace the matched letter with a space plus that letter in lowercase, then remove any space that was added at the start of the string.
Further reading:
String .replace() method
Regular expressions
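A zero-width lookahead split, as in the first question on this page, is another option: split in front of every uppercase letter, join the words with spaces, and lowercase the result. A minimal sketch; amendWithSplit is a hypothetical name, and like the answers above it only treats ASCII A-Z as word starts:

```javascript
// Split in front of every uppercase letter with a zero-width lookahead,
// then rejoin the words with single spaces and lowercase the result.
function amendWithSplit(s) {
  return s.split(/(?=[A-Z])/).join(" ").toLowerCase();
}

console.log(amendWithSplit("CodefightsIsAwesome"));  // codefights is awesome
console.log(amendWithSplit("Hello"));                // hello
console.log(amendWithSplit("noCapitalOnFirstWord")); // no capital on first word
```

Because the lookahead consumes nothing, a capital at the very start of the string does not produce an empty first element, so no leading space has to be trimmed.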
We can loop through once.
The code below handles the very first character separately: it is lowercased and never prefixed with a space.
For each character after that, we check whether it is uppercase. If so, we add its lowercase form to our return array, prefixed with a space. If not, we add it as-is.
Finally, we join the array back into a string and return it.
const sentence = "CodefightsIsAwesome";
const amend = function(s) {
  const ret = [];
  for (let i = 0; i < s.length; i++) {
    const char = s[i];
    if (i === 0) {
      ret.push(char.toLowerCase());            // first word: lowercase, no leading space
    } else if (char !== char.toLowerCase()) {  // only uppercase letters change when lowercased
      ret.push(` ${char.toLowerCase()}`);
    } else {
      ret.push(char);
    }
  }
  return ret.join('');
};
console.log(amend(sentence)); // codefights is awesome

Find a string surrounded by square brackets and *not* prefaced with a specific character

I would like to have a match with
[testing]
but not
![testing]
This is my query to grab a string surrounded by square brackets:
\[([^\]]+)\]
var match = /^[^!]*\[([^\]]+)\]/.exec(issueBody);
if (match)
{
$ISSUE_BODY.selectRange(match.index, match.index+match[0].length);
}
and it works marvelously.
However, I have spent a good half hour on http://regexr.com/ trying to skip strings with a "!" in front, and couldn't.
EDIT: I'm sorry guys, I didn't realize that some operations are not supported by every regex engine. I am writing in JavaScript and apparently lookbehind is not supported; I get this error:
Uncaught SyntaxError: Invalid regular expression:
/(?
Sorry for wasting time :\
You can use alternation:
(?:^|[^!])(\[[^\]]+\])
Here (?:^|[^!]) matches the start of the input (or of a line, with the m flag) OR any character that is not !.
Code:
var re = /(?:^|[^!])(\[[^\]]+\])/gm;
var str = '![foobar123]\n[xyz789]';
var m;
while ((m = re.exec(str)) !== null)
console.log(m[1]);
Output:
[xyz789]
In Javascript, where lookbehinds are not supported, you can use:
^[^!]*\[([^\]]+)\]
(with the multiline flag to match every start of a line)
You can just use capturing:
var re = /(?:^|[^!])(\[[^[\]]*])/g;
var str = '[goodtesting] ![badtesting] ';
var m;
while ((m = re.exec(str)) !== null) {
document.getElementById("r").innerHTML += m[1] + "<br/>";
}
<div id="r"></div>
The (?:^|[^!])(\[[^[\]]*]) regex matches the start of string or any character other than a ! (with a non-capturing group (?:^|[^!])) and matches and captures the substring enclosed with [ and ] that has no [ and ] inside (with (\[[^[\]]*])). When we need to get multiple matches, we need to use RegExp#exec() and access the captured groups using the indices (here, index 1).
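In engines that support String.prototype.matchAll (ES2020), the exec() loop can be replaced with a single expression that collects every captured group. A sketch; bracketRe, input and captured are illustrative names:

```javascript
// The same pattern as above: start of input or a non-"!" character, then a captured [...] group.
const bracketRe = /(?:^|[^!])(\[[^[\]]*\])/g;
const input = "[goodtesting] ![badtesting] [alsogood]";

// matchAll requires the g flag and yields one match object per match;
// m[1] is the captured "[...]" without the character in front of it.
const captured = Array.from(input.matchAll(bracketRe), m => m[1]);

console.log(captured);
```

Like the exec() loop, this consumes the character in front of each [, so a bracket group that directly follows another one (as in [a][b]) is not found.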
Also, in JS, when you only need a lookbehind and do not need to check what comes after the match, you can use the reversed-string technique: reverse the input, and use a lookahead in a reversed pattern:
function revStr(s) {
return s.split('').reverse().join('');
}
var re = /][^[\]]*\[(?!!)/g; // Here, the regex pattern is reversed, too
var str = '![badtesting] [goodtesting]';
var m;
while ((m = re.exec(revStr(str))) !== null) { // We reverse a string here
document.getElementById("res").innerHTML += revStr(m[0]); // and the matched value here
}
<div id="res"></div>
This becomes impractical for longer, more complex patterns, but this one is simple enough to make it worthwhile.
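Since ES2018, JavaScript regexes do support lookbehind, so in current engines the pattern from the question can be written with a negative lookbehind. A sketch, assuming such an engine (an older engine like the one that produced the SyntaxError above rejects it); notBangRe, text and found are illustrative names:

```javascript
// (?<!!) is a negative lookbehind: the "[" must not be preceded by "!".
// [^[\]]* then matches the bracket contents, which may not contain [ or ].
const notBangRe = /(?<!!)\[[^[\]]*\]/g;

const text = "![badtesting] [goodtesting][alsogood]";
const found = text.match(notBangRe);

console.log(found);
```

Because the lookbehind is zero-width, nothing before the [ is consumed, so adjacent groups such as [goodtesting][alsogood] are both found.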

Javascript: How to remove characters from end of string? [duplicate]

I have a string, 12345.00, and I would like it to return 12345.0.
I have looked at trim, but it looks like it only trims whitespace, and I don't see how slice would work here. Any suggestions?
You can use the substring function:
let str = "12345.00";
str = str.substring(0, str.length - 1);
console.log(str);
This is the accepted answer, but as per the conversations below, the slice syntax is much clearer:
let str = "12345.00";
str = str.slice(0, -1);
console.log(str);
You can use slice! You just have to make sure you know how to use it. Positive #s are relative to the beginning, negative numbers are relative to the end.
js>"12345.00".slice(0,-1)
12345.0
You can use the substring method of JavaScript string objects:
s = s.substring(0, s.length - 4)
It unconditionally removes the last four characters from string s.
However, if you want to conditionally remove the last four characters, only if they are exactly _bar:
var re = /_bar$/;
s = s.replace(re, ""); // strings are immutable, so assign the result
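The same conditional removal can be done without a regex, which avoids having to escape any special characters in the suffix. A minimal sketch; removeSuffix is a hypothetical helper, and String.prototype.endsWith requires ES2015:

```javascript
// Remove `suffix` from the end of `str`, but only if it is actually there.
function removeSuffix(str, suffix) {
  // Guard against an empty suffix: slice(0, -0) is slice(0, 0), which returns "".
  if (!suffix || !str.endsWith(suffix)) {
    return str;
  }
  return str.slice(0, -suffix.length);
}

console.log(removeSuffix("foo_bar", "_bar")); // foo
console.log(removeSuffix("foo_baz", "_bar")); // foo_baz
```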
The easiest method is to use the slice method of the string, which allows negative positions (corresponding to offsets from the end of the string):
const s = "your string";
const withoutLastFourChars = s.slice(0, -4);
If you needed something more general to remove everything after (and including) the last underscore, you could do the following (so long as s is guaranteed to contain at least one underscore):
const s = "your_string";
const withoutLastChunk = s.slice(0, s.lastIndexOf("_"));
console.log(withoutLastChunk);
For a number like your example, I would recommend doing this over substring:
console.log(parseFloat('12345.00').toFixed(1));
Do note that this will actually round the number, though, which I would imagine is desired but maybe not:
console.log(parseFloat('12345.46').toFixed(1));
Be aware that String.prototype.{ split, slice, substr, substring } operate on UTF-16 encoded strings
None of the previous answers are Unicode-aware.
JavaScript strings are sequences of UTF-16 code units. Code points above U+FFFF are stored as surrogate pairs, and the older, pre-existing string methods operate on UTF-16 code units, not Unicode code points.
See: Do NOT use .split('').
const string = "ẞ🦊";
console.log(string.slice(0, -1)); // "ẞ\ud83e"
console.log(string.substr(0, string.length - 1)); // "ẞ\ud83e"
console.log(string.substring(0, string.length - 1)); // "ẞ\ud83e"
console.log(string.replace(/.$/, "")); // "ẞ\ud83e"
console.log(string.match(/(.*).$/)[1]); // "ẞ\ud83e"
const utf16Chars = string.split("");
utf16Chars.pop();
console.log(utf16Chars.join("")); // "ẞ\ud83e"
In addition, RegExp methods, as suggested in older answers, don’t match line breaks at the end:
const string = "Hello, world!\n";
console.log(string.replace(/.$/, "").endsWith("\n")); // true
console.log(string.match(/(.*).$/) === null); // true
Use the string iterator to iterate characters
Unicode-aware code utilizes the string’s iterator; see Array.from and ... spread.
string[Symbol.iterator] can be used (e.g. instead of string) as well.
Also see How to split Unicode string to characters in JavaScript.
Examples:
const string = "ẞ🦊";
console.log(Array.from(string).slice(0, -1).join("")); // "ẞ"
console.log([
...string
].slice(0, -1).join("")); // "ẞ"
Use the s and u flags on a RegExp
The dotAll or s flag makes . match line break characters, the unicode or u flag enables certain Unicode-related features.
Note that, when using the u flag, you must remove unnecessary identity escapes, as these are invalid in a u regex. For example, \[ is fine, since without the backslash it would start a character class, but \: isn't, since it is just a : with or without the backslash, so you need to remove the backslash.
Examples:
const unicodeString = "ẞ🦊",
lineBreakString = "Hello, world!\n";
console.log(lineBreakString.replace(/.$/s, "").endsWith("\n")); // false
console.log(lineBreakString.match(/(.*).$/s) === null); // false
console.log(unicodeString.replace(/.$/su, "")); // ẞ
console.log(unicodeString.match(/(.*).$/su)[1]); // ẞ
// Now `split` can be made Unicode-aware:
const unicodeCharacterArray = unicodeString.split(/(?:)/su),
lineBreakCharacterArray = lineBreakString.split(/(?:)/su);
unicodeCharacterArray.pop();
lineBreakCharacterArray.pop();
console.log(unicodeCharacterArray.join("")); // "ẞ"
console.log(lineBreakCharacterArray.join("").endsWith("\n")); // false
Note that some graphemes consist of more than one code point, e.g. 🏳️‍🌈 which consists of the sequence 🏳 (U+1F3F3), VS16 (U+FE0F), ZWJ (U+200D), 🌈 (U+1F308).
Here, even Array.from will split this into four “characters”.
Matching those is made easier with RegExp set notation and properties of strings, which started as a proposal and is now standardized as the v flag in ES2024.
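For such multi-code-point graphemes, Intl.Segmenter with granularity "grapheme" (available in current browsers and Node.js) can split a string into user-perceived characters. A sketch under that assumption; dropLastGrapheme is a hypothetical name, and the exact cluster boundaries depend on the Unicode data bundled with the engine:

```javascript
// Remove the last user-perceived character (extended grapheme cluster).
function dropLastGrapheme(str) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  // segment() returns an iterable of objects whose `segment` property holds the cluster text.
  const graphemes = Array.from(segmenter.segment(str), s => s.segment);
  graphemes.pop();
  return graphemes.join("");
}

const flag = "\u{1F3F3}\uFE0F\u200D\u{1F308}"; // 🏳️‍🌈: four code points, one grapheme
console.log(dropLastGrapheme("ẞ🦊"));      // ẞ
console.log(dropLastGrapheme("a" + flag)); // a
```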
Using JavaScript's slice function:
let string = 'foo_bar';
string = string.slice(0, -4); // Slice off last four characters here
console.log(string);
This can be used to remove '_bar' from the end of a string of any length.
A regular expression is what you are looking for:
let str = "foo_bar";
console.log(str.replace(/_bar$/, ""));
Try this:
const myString = "Hello World!";
console.log(myString.slice(0, -1));
Performance
On 2020-05-13 I performed tests of selected solutions on Chrome v81.0, Safari v13.1 and Firefox v76.0 on macOS High Sierra v10.13.6.
Conclusions
slice(0,-1) (D) is the fastest or close to the fastest solution for both short and long strings, and it is recommended as a fast cross-browser solution
solutions based on substring (C) and substr (E) are fast
solutions based on regular expressions (A, B) are slow to moderately fast
solutions B, F and G are slow for long strings
solution F is the slowest for short strings, and G is the slowest for long strings
Details
I performed two tests for solutions A, B, C, D, E (ext), F and G (mine):
for an 8-character short string (from the OP's question)
for a 1M-character long string
The solutions are presented in the snippet below.
function A(str) {
return str.replace(/.$/, '');
}
function B(str) {
return str.match(/(.*).$/)[1];
}
function C(str) {
return str.substring(0, str.length - 1);
}
function D(str) {
return str.slice(0, -1);
}
function E(str) {
return str.substr(0, str.length - 1);
}
function F(str) {
let s= str.split("");
s.pop();
return s.join("");
}
function G(str) {
let s='';
for(let i=0; i<str.length-1; i++) s+=str[i];
return s;
}
// ---------
// TEST
// ---------
let log = (f)=>console.log(`${f.name}: ${f("12345.00")}`);
[A,B,C,D,E,F,G].map(f=>log(f));
This snippet only presents the solutions.
Here are example results for Chrome with the short string:
Use regex:
let aStr = "12345.00";
aStr = aStr.replace(/.$/, '');
console.log(aStr);
How about:
let myString = "12345.00";
console.log(myString.substring(0, myString.length - 1));
1. (.*) captures any characters (other than line terminators) zero or more times:
console.log("a string".match(/(.*).$/)[1]);
2. . matches the last character, in this case:
console.log("a string".match(/(.*).$/));
3. $ matches the end of the string; replacing . with .{2} removes the last two characters instead:
console.log("a string".match(/(.*).{2}$/)[1]);
https://stackoverflow.com/questions/34817546/javascript-how-to-delete-last-two-characters-in-a-string
Just use trim if you don't want spaces
"11.01 °C".slice(0,-2).trim()
Here is an alternative that I don't think I've seen in the other answers, just for fun.
var strArr = "hello i'm a string".split("");
strArr.pop();
document.write(strArr.join(""));
Not as legible or simple as slice or substring but does allow you to play with the string using some nice array methods, so worth knowing.
const debris = string.split("_"); // explode the string into an array of strings, split on "_"
debris.pop(); // pop the last element off the array (which you didn't want)
const result = debris.join("_"); // fuse the remaining items together like the sun
If you want to do generic rounding of floats, instead of just trimming the last character:
var float1 = 12345.00,
float2 = 12345.4567,
float3 = 12345.982;
var MoreMath = {
/**
* Rounds a value to the specified number of decimals
* @param float value The value to be rounded
* @param int nrDecimals The number of decimals to round value to
* @return float value rounded to nrDecimals decimals
*/
round: function (value, nrDecimals) {
var x = nrDecimals > 0 ? Math.pow(10, parseInt(nrDecimals, 10)) : 1; // 10^nrDecimals, e.g. 100 for 2 decimals
return Math.round(value * x) / x;
}
}
MoreMath.round(float1, 1) => 12345.0
MoreMath.round(float2, 1) => 12345.5
MoreMath.round(float3, 1) => 12346.0
EDIT: It seems there is a built-in function for this, as Paolo points out. That solution is obviously much cleaner than mine: use parseFloat followed by toFixed.
if(str.substring(str.length - 4) == "_bar")
{
str = str.substring(0, str.length - 4);
}
Via the slice(indexStart, indexEnd) method. Note that this does NOT CHANGE the existing string; it returns a new string.
console.clear();
let str = "12345.00";
let a = str.slice(0, str.length -1)
console.log(a, "<= a");
console.log(str, "<= str is NOT changed");
Via a regular expression. This does NOT CHANGE the existing string either; replace() returns a new string.
console.clear();
let str = "12345.00";
let regExp = /.$/g
let b = str.replace(regExp,"")
console.log(b, "<= b");
console.log(str, "<= str is NOT changed");
Via the array splice() method. This only works on arrays, and it CHANGES the existing array (so be careful with this one); you'll need to convert the string to an array first, and then back.
console.clear();
let str = "12345.00";
let strToArray = str.split("")
console.log(strToArray, "<= strToArray");
let spliceMethod = strToArray.splice(str.length-1, 1)
str = strToArray.join("")
console.log(str, "<= str is changed now");
In cases where you want to remove something that is close to the end of a string (in case of variable sized strings) you can combine slice() and substr().
I had a string with markup, dynamically built, with a list of anchor tags separated by comma. The string was something like:
var str = "<a>text 1,</a><a>text 2,</a><a>text 2.3,</a><a>text abc,</a>";
To remove the last comma I did the following:
str = str.slice(0, -5) + str.substr(-4);
You can, in fact, remove the last arr.length - 2 items of an array using arr.length = 2, which if the array length was 5, would remove the last 3 items.
Sadly, this does not work for strings, but we can use split() to split the string, and then join() to join the string after we've made any modifications.
var str = 'string'
String.prototype.removeLast = function(n) {
var string = this.split('')
string.length = string.length - n
return string.join('')
}
console.log(str.removeLast(3))
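Adding methods to String.prototype affects every string in the program and can clash with future built-ins, so a standalone helper is usually the safer choice. A sketch of the same operation with slice; removeLast here is a hypothetical function, not part of the language, and like the version above it counts UTF-16 code units:

```javascript
// Remove the last n UTF-16 code units without modifying String.prototype.
function removeLast(str, n) {
  // slice(0, -0) would return "", so treat n <= 0 as "remove nothing".
  return n > 0 ? str.slice(0, -n) : str;
}

console.log(removeLast("string", 3)); // str
```

If n is larger than the string's length, slice simply returns an empty string.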
Try using toFixed:
const str = "12345.00";
console.log((+str).toFixed(1)); // "12345.0" (a string, rounded to one decimal)
Try this:
<script>
var x="foo_foo_foo_bar";
for (var i=0; i<x.length; i++) {
if (x[i]=="_" && x[i+1]=="b") {
break;
}
else {
document.write(x[i]);
}
}
</script>
You can also try the live working example on http://jsfiddle.net/informativejavascript/F7WTn/87/.
@Jason S:
You can use slice! You just have to
make sure you know how to use it.
Positive #s are relative to the
beginning, negative numbers are
relative to the end.
js>"12345.00".slice(0,-1)
12345.0
Sorry for the long post, but this question was tagged 'jquery' earlier. Be aware that jQuery objects have their own slice() method, which selects a subset of DOM elements rather than extracting a substring, so don't confuse the two.
In other words, @Jon Erickson's answer suggests a really good solution.
Your method, however, works on plain strings, outside of any jQuery function, in simple JavaScript.
Following the discussion in the comments, I should add that jQuery is updated much more often than its parent language, ECMAScript.
There are also two other methods:
string.substring(from, to), where omitting 'to' returns the rest of the string, so:
string.substring(from) (note that substring treats negative arguments as 0)
and substr(), which takes a start position and a length (the length cannot be negative):
string.substr(start,length)
Also, some maintainers report that string.substr(start,length) does not work, or works incorrectly, in older versions of MSIE (in particular with a negative start).
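Use substring to get everything to the left of _bar. But first you have to find the index of _bar in the string:
str.substring(0, str.lastIndexOf("_bar"));
substring's arguments are a start index and an end index (exclusive), not a length: here 0 is the start and the index of _bar is the end.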
