Regex split comma except escaped [duplicate] - javascript

I have this string:
a\,bcde,fgh,ijk\,lmno,pqrst\,uv
I need a JavaScript function that will split the string by every , but only those that don't have a \ before them
How can this be done?

Here's the shortest thing I could come up with:
'a\\,bcde,fgh,ijk\\,lmno,pqrst\\,uv'.replace(/([^\\]),/g, '$1\u000B').split('\u000B')
The idea behind is to find every place where comma isn't prefixed with a backslash, replace those with string that is uncommon to come up in your strings and then split by that uncommon string.
Note that backslashes before commas have to be escaped using another backslash. Otherwise, javascript treats form \, as escaped comma and produce simply a comma out of it! In other words if you won't escape the backslash, javascript sees this: a\,bcde,fgh,ijk\,lmno,pqrst\,uv as this a,bcde,fgh,ijk,lmno,pqrst,uv.

Since regular expressions in JavaScript does not support lookbehinds, I'm not going to cook up a giant hack to mimic this behavior. Instead, you can just split() on all commas (,) and then glue back the pieces that shouldn't have been split in the first place.
Quick 'n' dirty demo:
var str = 'a\\,bcde,fgh,ijk\\,lmno,pqrst\\,uv'.split(','), // Split on all commas
out = []; // Output
for (var i = 0, j = str.length - 1; i < j; i++) { // Iterate all but last (last can never be glued to non-existing next)
var curr = str[i]; // This piece
if (curr.charAt(curr.length - 1) == '\\') { // If ends with \ ...
curr += ',' + str[++i]; // ... glue with next and skip next (increment i)
}
out.push(curr); // Add to output
}

Another ugly hack around the lack of look-behinds:
function rev(s) {
return s.split('').reverse().join('');
}
var s = 'a\\,bcde,fgh,ijk\\,lmno,pqrst\\,uv';
// Enter bizarro world...
var r = rev(s);
// Split with a look-ahead
var rparts = r.split(/,(?!\\)/);
// And put it back together with double reversing.
var sparts = [ ];
while(rparts.length)
sparts.push(rev(rparts.pop()));
for(var i = 0; i < sparts.length; ++i)
$('#out').append('<pre>' + sparts[i] + '</pre>');
Demo: http://jsfiddle.net/ambiguous/QbBfw/1/
I don't think I'd do this in real life but it works even if it does make me feel dirty. Consider this a curiosity rather than something you should really use.

In case if need remove backslashes also:
var test='a\\.b.c';
var result = test.replace(/\\?\./g, function (t) { return t == '.' ? '\u000B' : '.'; }).split('\u000B');
//result: ["a.b", "c"]

In 2022 most of browsers support lookbehinds:
https://caniuse.com/js-regexp-lookbehind
Safari should be your only concern.
With a lookbehind you can split your string this way:
"a\\,bcde,fgh,ijk\\,lmno,pqrst\\,uv".split(/(?<!\\),/)
// => ['a\\,bcde', 'fgh', 'ijk\\,lmno', 'pqrst\\,uv']

You can use regex to do the split.
Here is the link to regex in javascript http://www.w3schools.com/jsref/jsref_obj_regexp.asp
Here is the link to other post where the author have used regex for split Javascript won't split using regex
From the first link if you note you can create a regular expression using
?!n Matches any string that is not followed by a specific string n
[,]!\\

Related

Get All possible matches between forward slashes

I would like to get all possible matches of a string with forward slashes '/' using regex.
I would like to regex that matches all the possibilities of a string between slashes but excludes a part which has no ending '/'
For example a string /greatgrandparent/grandparent/parent/child
it should return something like this:
/greatgrandparent/
/greatgrandparent/grandparent/
/greatgrandparent/grandparent/parent/
The following regex that will get each word that begins with a / and a positive lookahead for the / character is this /\/\w+(?=\/)/g
You can use the match() function that will place each word it finds in an array. You can then loop through the array to combine the different results. Check out the snippet below.
var str = `/greatgrandparent/grandparent/parent/child`;
var strArr = str.match(/\/\w+(?=\/)/g);
console.log(strArr);
var strLoop = ``;
for (i = 0; i < strArr.length; i++) {
strLoop += strArr[i];
document.write(`${strLoop}<br>`);
}

Javascript Remove strings in beginning and end

base on the following string
...here..
..there...
.their.here.
How can i remove the . on the beginning and end of string like the trim that removes all spaces, using javascript
the output should be
here
there
their.here
These are the reasons why the RegEx for this task is /(^\.+|\.+$)/mg:
Inside /()/ is where you write the pattern of the substring you want to find in the string:
/(ol)/ This will find the substring ol in the string.
var x = "colt".replace(/(ol)/, 'a'); will give you x == "cat";
The ^\.+|\.+$ in /()/ is separated into 2 parts by the symbol | [means or]
^\.+ and \.+$
^\.+ means to find as many . as possible at the start.
^ means at the start; \ is to escape the character; adding + behind a character means to match any string containing one or more that character
\.+$ means to find as many . as possible at the end.
$ means at the end.
The m behind /()/ is used to specify that if the string has newline or carriage return characters, the ^ and $ operators will now match against a newline boundary, instead of a string boundary.
The g behind /()/ is used to perform a global match: so it find all matches rather than stopping after the first match.
To learn more about RegEx you can check out this guide.
Try to use the following regex
var text = '...here..\n..there...\n.their.here.';
var replaced = text.replace(/(^\.+|\.+$)/mg, '');
Here is working Demo
Use Regex /(^\.+|\.+$)/mg
^ represent at start
\.+ one or many full stops
$ represents at end
so:
var text = '...here..\n..there...\n.their.here.';
alert(text.replace(/(^\.+|\.+$)/mg, ''));
Here is an non regular expression answer which utilizes String.prototype
String.prototype.strim = function(needle){
var first_pos = 0;
var last_pos = this.length-1;
//find first non needle char position
for(var i = 0; i<this.length;i++){
if(this.charAt(i) !== needle){
first_pos = (i == 0? 0:i);
break;
}
}
//find last non needle char position
for(var i = this.length-1; i>0;i--){
if(this.charAt(i) !== needle){
last_pos = (i == this.length? this.length:i+1);
break;
}
}
return this.substring(first_pos,last_pos);
}
alert("...here..".strim('.'));
alert("..there...".strim('.'))
alert(".their.here.".strim('.'))
alert("hereagain..".strim('.'))
and see it working here : http://jsfiddle.net/cettox/VQPbp/
Slightly more code-golfy, if not readable, non-regexp prototype extension:
String.prototype.strim = function(needle) {
var out = this;
while (0 === out.indexOf(needle))
out = out.substr(needle.length);
while (out.length === out.lastIndexOf(needle) + needle.length)
out = out.slice(0,out.length-needle.length);
return out;
}
var spam = "this is a string that ends with thisthis";
alert("#" + spam.strim("this") + "#");
Fiddle-ige
Use RegEx with javaScript Replace
var res = s.replace(/(^\.+|\.+$)/mg, '');
We can use replace() method to remove the unwanted string in a string
Example:
var str = '<pre>I'm big fan of Stackoverflow</pre>'
str.replace(/<pre>/g, '').replace(/<\/pre>/g, '')
console.log(str)
output:
Check rules on RULES blotter

RegEx needed to split javascript string on "|" but not "\|"

We would like to split a string on instances of the pipe character |, but not if that character is preceded by an escape character, e.g. \|.
ex we would like to see the following string split into the following components
1|2|3\|4|5
1
2
3\|4
5
I'm expecting to be able to use the following javascript function, split, which takes a regular expression. What regex would I pass to split? We are cross platform and would like to support current and previous versions (1 version back) of IE, FF, and Chrome if possible.
Instead of a split, do a global match (the same way a lexical analyzer would):
match anything other than \\ or |
or match any escaped char
Something like this:
var str = "1|2|3\\|4|5";
var matches = str.match(/([^\\|]|\\.)+/g);
A quick explanation: ([^\\|]|\\.) matches either any character except '\' and '|' (pattern: [^\\|]) or (pattern: |) it matches any escaped character (pattern: \\.). The + after it tells it to match the previous once or more: the pattern ([^\\|]|\\.) will therefor be matches once or more. The g at the end of the regex literal tells the JavaScript regex engine to match the pattern globally instead of matching it just once.
What you're looking for is a "negative look-behind matching regular expression".
This isn't pretty, but it should split the list for you:
var output = input.replace(/(\\)?|/g, function($0,$1){ return $1?$1:$0+'\n';});
This will take your input string and replace all of the '|' characters NOT immediately preceded by a '\' character and replace them with '\n' characters.
A regex solution was posted as I was looking into this. So I just went ahead and wrote one without it. I did some simple benchmarks and it is -slightly- faster (I expected it to be slower...).
Without using Regex, if I understood what you desire, this should do the job:
function doSplit(input) {
var output = [];
var currPos = 0,
prevPos = -1;
while ((currPos = input.indexOf('|', currPos + 1)) != -1) {
if (input[currPos-1] == "\\") continue;
var recollect = input.substr(prevPos + 1, currPos - prevPos - 1);
prevPos = currPos;
output.push(recollect);
}
var recollect = input.substr(prevPos + 1);
output.push(recollect);
return output;
}
doSplit('1|2|3\\|4|5'); //returns [ '1', '2', '3\\|4', '5' ]

Regex matching JS source that's not in a string or regex literal

Do there exist comprehensive regular expressions that, when applied to JavaScript source code, will match all valid string literals (such as "say \"Hello\"") and regex literals (such as /and\/or/)? The expressions would have to cover all edge cases, including line breaks and escape sequences.
Alternatively, does anyone know of regexes for matching patterns outside of string and regex literals?
My goal is to implement a simple JavaScript syntax extension that allows macros in delimeters (e.g. {{#foo.bar}} or ##foo.bar#) to be expanded by a preprocessor. However, I'd like the macros to be processed only outside of literals.
For now, I'm trying to accomplish this using just string replacement, without having to augment an existing JavaScript lexer/parser.
This JavaScript preprocessor will itself be implemented in JavaScript.
This is the regex that I've been using to match quoted strings which is pretty good since it should work with almost all engines since it does not require backtracking or backreferences or any of that voodoo. This will match all text INSIDE literals.
"(\\.|[^"])*"
Depending on the engine, it might support non capturing groups. In that case you can use
"(?:\\.|[^"])*"
and it should be faster.
I think this is too much for regexes.
Consider var foo = "//" // /"(?:\\.|[^"])*"/. Where do the strings, comments and regex literals start and end? You would need to write a complete JavaScript parser to cover all edge cases. Of course, the parser will be using regexes...
I would probably go about doing something like the following. It will need to be improved for certain possible conditions, though.
var str = '"aaa \"sss \\t bbb" sss #3 ss# ((t sdsds)) ff ';
str += '/gg sdfd \/dsds/ {aaa bbb} {{ss}} {#sdsd#}';
var repeating = ['"','\\\'','/','\\~','\\#'];
// "example" 'example' /example/ ~example~ #example#
var enclosing = [];
enclosing.push(['\\{','\\}']);
enclosing.push(['\\{\\{','\\}\\}']);
enclosing.push(['\\[','\\]']);
enclosing.push(['\\(\\(','\\)\\)']);
// {example} {{example}} [example] ((example))
for (var forEnclosing='',i = 0 ; i < enclosing.length; i++) {
var e = enclosing[i];
var r = e[0]+'(\\\\['+e[0]+e[1]+']|[^'+e[0]+e[1]+'])*'+e[1];
forEnclosing += r + (i < enclosing.length-1 ? '|' : '');
}
for (var forRepeating='',i = 0; i < repeating.length; i++) {
var e = repeating[i];
var r = e+'(\\'+e+'|[^'+e+'])*'+e;
forRepeating += r + (i < repeating.length-1 ? '|' : '');
}
var rx = new RegExp('('+forEnclosing+'|'+forRepeating+')','g');
var m = str.match(rx);
try { for (var i = 0; i < m.length; i++) console.log(m[i]) }
catch(e) {}
Outputs:
"aaa "sss \t bbb"
#3 ss#
((t sdsds))
/gg sdfd /dsds/
{aaa bbb}
{{ss}}
{#sdsd#}
The closest you can get with a regex is to have one regex that matches EITHER a string literal (single- or double-quoted) OR a regex OR a comment (OR whatever else might contain bogus matches) OR one of your macro thingies:
"[^"\\]*(?:\\.[^"\\]*)*"
|
'[^'\\]*(?:\\.[^'\\]*)*'
|
/[^/\\]*(?:\\.[^/\\]*)*/[gim]*
|
/\*[^*]*(?:\*(?!/)[^*]*)*\*/
|
##(\w+\.\w+)#
If group #1 contains anything after the match, it must be what you're looking for. Otherwise, ignore this match and go on to the next one.

Javascript: How to remove characters from end of string? [duplicate]

I have a string, 12345.00, and I would like it to return 12345.0.
I have looked at trim, but it looks like it is only trimming whitespace and slice which I don't see how this would work. Any suggestions?
You can use the substring function:
let str = "12345.00";
str = str.substring(0, str.length - 1);
console.log(str);
This is the accepted answer, but as per the conversations below, the slice syntax is much clearer:
let str = "12345.00";
str = str.slice(0, -1);
console.log(str);
You can use slice! You just have to make sure you know how to use it. Positive #s are relative to the beginning, negative numbers are relative to the end.
js>"12345.00".slice(0,-1)
12345.0
You can use the substring method of JavaScript string objects:
s = s.substring(0, s.length - 4)
It unconditionally removes the last four characters from string s.
However, if you want to conditionally remove the last four characters, only if they are exactly _bar:
var re = /_bar$/;
s.replace(re, "");
The easiest method is to use the slice method of the string, which allows negative positions (corresponding to offsets from the end of the string):
const s = "your string";
const withoutLastFourChars = s.slice(0, -4);
If you needed something more general to remove everything after (and including) the last underscore, you could do the following (so long as s is guaranteed to contain at least one underscore):
const s = "your_string";
const withoutLastChunk = s.slice(0, s.lastIndexOf("_"));
console.log(withoutLastChunk);
For a number like your example, I would recommend doing this over substring:
console.log(parseFloat('12345.00').toFixed(1));
Do note that this will actually round the number, though, which I would imagine is desired but maybe not:
console.log(parseFloat('12345.46').toFixed(1));
Be aware that String.prototype.{ split, slice, substr, substring } operate on UTF-16 encoded strings
None of the previous answers are Unicode-aware.
Strings are encoded as UTF-16 in most modern JavaScript engines, but higher Unicode code points require surrogate pairs, so older, pre-existing string methods operate on UTF-16 code units, not Unicode code points.
See: Do NOT use .split('').
const string = "ẞ🦊";
console.log(string.slice(0, -1)); // "ẞ\ud83e"
console.log(string.substr(0, string.length - 1)); // "ẞ\ud83e"
console.log(string.substring(0, string.length - 1)); // "ẞ\ud83e"
console.log(string.replace(/.$/, "")); // "ẞ\ud83e"
console.log(string.match(/(.*).$/)[1]); // "ẞ\ud83e"
const utf16Chars = string.split("");
utf16Chars.pop();
console.log(utf16Chars.join("")); // "ẞ\ud83e"
In addition, RegExp methods, as suggested in older answers, don’t match line breaks at the end:
const string = "Hello, world!\n";
console.log(string.replace(/.$/, "").endsWith("\n")); // true
console.log(string.match(/(.*).$/) === null); // true
Use the string iterator to iterate characters
Unicode-aware code utilizes the string’s iterator; see Array.from and ... spread.
string[Symbol.iterator] can be used (e.g. instead of string) as well.
Also see How to split Unicode string to characters in JavaScript.
Examples:
const string = "ẞ🦊";
console.log(Array.from(string).slice(0, -1).join("")); // "ẞ"
console.log([
...string
].slice(0, -1).join("")); // "ẞ"
Use the s and u flags on a RegExp
The dotAll or s flag makes . match line break characters, the unicode or u flag enables certain Unicode-related features.
Note that, when using the u flag, you eliminate unnecessary identity escapes, as these are invalid in a u regex, e.g. \[ is fine, as it would start a character class without the backslash, but \: isn’t, as it’s a : with or without the backslash, so you need to remove the backslash.
Examples:
const unicodeString = "ẞ🦊",
lineBreakString = "Hello, world!\n";
console.log(lineBreakString.replace(/.$/s, "").endsWith("\n")); // false
console.log(lineBreakString.match(/(.*).$/s) === null); // false
console.log(unicodeString.replace(/.$/su, "")); // ẞ
console.log(unicodeString.match(/(.*).$/su)[1]); // ẞ
// Now `split` can be made Unicode-aware:
const unicodeCharacterArray = unicodeString.split(/(?:)/su),
lineBreakCharacterArray = lineBreakString.split(/(?:)/su);
unicodeCharacterArray.pop();
lineBreakCharacterArray.pop();
console.log(unicodeCharacterArray.join("")); // "ẞ"
console.log(lineBreakCharacterArray.join("").endsWith("\n")); // false
Note that some graphemes consist of more than one code point, e.g. 🏳️‍🌈 which consists of the sequence 🏳 (U+1F3F3), VS16 (U+FE0F), ZWJ (U+200D), 🌈 (U+1F308).
Here, even Array.from will split this into four “characters”.
Matching those is made easier with the RegExp set notation and properties of strings proposal.
Using JavaScript's slice function:
let string = 'foo_bar';
string = string.slice(0, -4); // Slice off last four characters here
console.log(string);
This could be used to remove '_bar' at end of a string, of any length.
A regular expression is what you are looking for:
let str = "foo_bar";
console.log(str.replace(/_bar$/, ""));
Try this:
const myString = "Hello World!";
console.log(myString.slice(0, -1));
Performance
Today 2020.05.13 I perform tests of chosen solutions on Chrome v81.0, Safari v13.1 and Firefox v76.0 on MacOs High Sierra v10.13.6.
Conclusions
the slice(0,-1)(D) is fast or fastest solution for short and long strings and it is recommended as fast cross-browser solution
solutions based on substring (C) and substr(E) are fast
solutions based on regular expressions (A,B) are slow/medium fast
solutions B, F and G are slow for long strings
solution F is slowest for short strings, G is slowest for long strings
Details
I perform two tests for solutions A, B, C, D, E(ext), F, G(my)
for 8-char short string (from OP question) - you can run it HERE
for 1M long string - you can run it HERE
Solutions are presented in below snippet
function A(str) {
return str.replace(/.$/, '');
}
function B(str) {
return str.match(/(.*).$/)[1];
}
function C(str) {
return str.substring(0, str.length - 1);
}
function D(str) {
return str.slice(0, -1);
}
function E(str) {
return str.substr(0, str.length - 1);
}
function F(str) {
let s= str.split("");
s.pop();
return s.join("");
}
function G(str) {
let s='';
for(let i=0; i<str.length-1; i++) s+=str[i];
return s;
}
// ---------
// TEST
// ---------
let log = (f)=>console.log(`${f.name}: ${f("12345.00")}`);
[A,B,C,D,E,F,G].map(f=>log(f));
This snippet only presents soutions
Here are example results for Chrome for short string
Use regex:
let aStr = "12345.00";
aStr = aStr.replace(/.$/, '');
console.log(aStr);
How about:
let myString = "12345.00";
console.log(myString.substring(0, myString.length - 1));
1. (.*), captures any character multiple times:
console.log("a string".match(/(.*).$/)[1]);
2. ., matches last character, in this case:
console.log("a string".match(/(.*).$/));
3. $, matches the end of the string:
console.log("a string".match(/(.*).{2}$/)[1]);
https://stackoverflow.com/questions/34817546/javascript-how-to-delete-last-two-characters-in-a-string
Just use trim if you don't want spaces
"11.01 °C".slice(0,-2).trim()
Here is an alternative that i don't think i've seen in the other answers, just for fun.
var strArr = "hello i'm a string".split("");
strArr.pop();
document.write(strArr.join(""));
Not as legible or simple as slice or substring but does allow you to play with the string using some nice array methods, so worth knowing.
debris = string.split("_") //explode string into array of strings indexed by "_"
debris.pop(); //pop last element off the array (which you didn't want)
result = debris.join("_"); //fuse the remainng items together like the sun
If you want to do generic rounding of floats, instead of just trimming the last character:
var float1 = 12345.00,
float2 = 12345.4567,
float3 = 12345.982;
var MoreMath = {
/**
* Rounds a value to the specified number of decimals
* #param float value The value to be rounded
* #param int nrDecimals The number of decimals to round value to
* #return float value rounded to nrDecimals decimals
*/
round: function (value, nrDecimals) {
var x = nrDecimals > 0 ? 10 * parseInt(nrDecimals, 10) : 1;
return Math.round(value * x) / x;
}
}
MoreMath.round(float1, 1) => 12345.0
MoreMath.round(float2, 1) => 12345.5
MoreMath.round(float3, 1) => 12346.0
EDIT: Seems like there exists a built in function for this, as Paolo points out. That solution is obviously much cleaner than mine. Use parseFloat followed by toFixed
if(str.substring(str.length - 4) == "_bar")
{
str = str.substring(0, str.length - 4);
}
Via slice(indexStart, indexEnd) method - note, this does NOT CHANGE the existing string, it creates a copy and changes the copy.
console.clear();
let str = "12345.00";
let a = str.slice(0, str.length -1)
console.log(a, "<= a");
console.log(str, "<= str is NOT changed");
Via Regular Expression method - note, this does NOT CHANGE the existing string, it creates a copy and changes the copy.
console.clear();
let regExp = /.$/g
let b = str.replace(regExp,"")
console.log(b, "<= b");
console.log(str, "<= str is NOT changed");
Via array.splice() method -> this only works on arrays, and it CHANGES, the existing array (so careful with this one), you'll need to convert a string to an array first, then back.
console.clear();
let str = "12345.00";
let strToArray = str.split("")
console.log(strToArray, "<= strToArray");
let spliceMethod = strToArray.splice(str.length-1, 1)
str = strToArray.join("")
console.log(str, "<= str is changed now");
In cases where you want to remove something that is close to the end of a string (in case of variable sized strings) you can combine slice() and substr().
I had a string with markup, dynamically built, with a list of anchor tags separated by comma. The string was something like:
var str = "<a>text 1,</a><a>text 2,</a><a>text 2.3,</a><a>text abc,</a>";
To remove the last comma I did the following:
str = str.slice(0, -5) + str.substr(-4);
You can, in fact, remove the last arr.length - 2 items of an array using arr.length = 2, which if the array length was 5, would remove the last 3 items.
Sadly, this does not work for strings, but we can use split() to split the string, and then join() to join the string after we've made any modifications.
var str = 'string'
String.prototype.removeLast = function(n) {
var string = this.split('')
string.length = string.length - n
return string.join('')
}
console.log(str.removeLast(3))
Try to use toFixed
const str = "12345.00";
return (+str).toFixed(1);
Try this:
<script>
var x="foo_foo_foo_bar";
for (var i=0; i<=x.length; i++) {
if (x[i]=="_" && x[i+1]=="b") {
break;
}
else {
document.write(x[i]);
}
}
</script>
You can also try the live working example on http://jsfiddle.net/informativejavascript/F7WTn/87/.
#Jason S:
You can use slice! You just have to
make sure you know how to use it.
Positive #s are relative to the
beginning, negative numbers are
relative to the end.
js>"12345.00".slice(0,-1)
12345.0
Sorry for my graphomany but post was tagged 'jquery' earlier. So, you can't use slice() inside jQuery because slice() is jQuery method for operations with DOM elements, not substrings ...
In other words answer #Jon Erickson suggest really perfect solution.
However, your method will works out of jQuery function, inside simple Javascript.
Need to say due to last discussion in comments, that jQuery is very much more often renewable extension of JS than his own parent most known ECMAScript.
Here also exist two methods:
as our:
string.substring(from,to) as plus if 'to' index nulled returns the rest of string. so:
string.substring(from) positive or negative ...
and some other - substr() - which provide range of substring and 'length' can be positive only:
string.substr(start,length)
Also some maintainers suggest that last method string.substr(start,length) do not works or work with error for MSIE.
Use substring to get everything to the left of _bar. But first you have to get the instr of _bar in the string:
str.substring(3, 7);
3 is that start and 7 is the length.

Categories

Resources