Javascript regex: discard end of string match - javascript

I want to split a string while preserving the newlines. The string can be anything, so the code must work in every case (newlines at the beginning of the string, at the end of the string, consecutive newlines...).
I'm using this code:
var text = "abcd\nefg\n\nhijk\n"
var matches = text.match(/.*\n?/g)
which produces the following result:
[ 'abcd\n', 'efg\n', '\n', 'hijk\n', '' ]
That is what I need, except for the last match ('').
Currently I use matches.pop() to remove it, but I wonder whether the regex could be improved to avoid that match.
Bonus points if you can explain why that match is present (I can't find any reason, but I suck at regexes :-) ).

Use an alternation, so that every match has to consume at least one character:
var text = "abcd\nefg\n\nhijk\n";
var matches = text.match(/.+\n?|\n/g);

You can use Array#filter to drop the empty match:
var matches = text.match(/.*\n?/g).filter(Boolean);
//=> [ 'abcd\n', 'efg\n', '\n', 'hijk\n' ]
Or use a slightly different regex with a non-optional \n (this assumes the last line always ends with a newline; otherwise the last line is dropped):
var matches = text.match(/.*\n/g);
//=> [ 'abcd\n', 'efg\n', '\n', 'hijk\n' ]
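As for why the empty match appears: both .* and \n? are allowed to match nothing, so once the global search has consumed the whole string it can still succeed one more time at the very end, with a zero-length match. A minimal sketch that traces each match position with exec (the variable names are illustrative, not from the question):

```javascript
const text = "abcd\nefg\n\nhijk\n";
const pattern = /.*\n?/g;

// Record where every match starts and what it contains.
const trace = [];
let m;
while ((m = pattern.exec(text)) !== null) {
  trace.push({ index: m.index, match: m[0] });
  // A zero-length match does not advance lastIndex, so step forward
  // manually to avoid looping forever on the same position.
  if (m[0] === "") pattern.lastIndex++;
}
console.log(trace);
// The final entry is { index: 15, match: '' }: a zero-length match at text.length.

// Requiring at least one character per match removes that final entry.
const withoutEmpty = text.match(/.+\n?|\n/g);
console.log(withoutEmpty); // [ 'abcd\n', 'efg\n', '\n', 'hijk\n' ]
```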

Related

javascript regex to find only numbers with hyphen from a string content

In Javascript, from a string like this, I am trying to extract only the number with a hyphen. i.e. 67-64-1 and 35554-44-04. Sometimes there could be more hyphens.
The solvent 67-64-1 is not compatible with 35554-44-04
I tried several regexes but could not get it right. For example, this regex gets only the first value.
var msg = 'The solvent 67-64-1 is not compatible with 35554-44-04';
//var regex = /\d+\-?/;
var regex = /(?:\d*-\d*-\d*)/;
var res = msg.match(regex);
console.log(res);
You just need to add the g (global) flag to your regex to match more than once in the string. Note that you should use \d+, not \d*, so that you don't match something like '3--4'. To allow for processing numbers with more hyphens, we use a repeating -\d+ group after the first \d+:
var msg = 'The solvent 67-64-1 is not compatible with 23-35554-44-04 but is compatible with 1-23';
var regex = /\d+(?:-\d+)+/g;
var res = msg.match(regex);
console.log(res);
It returns only the first match because, without the g flag, match stops after the first match.
// the g (global) flag finds all matches
var regex = /(?:\d*-\d*-\d*)/g;
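To see why \d+ is the safer choice even with the g flag, here is a small comparison of the two patterns on a made-up input (the sample string is an assumption, chosen only to include stray hyphens):

```javascript
// Hypothetical input containing bare hyphens as well as a real number.
const sample = "a--b 3--4 67-64-1";

// \d* allows zero digits, so hyphens on their own are accepted.
const loose = sample.match(/\d*-\d*-\d*/g);
console.log(loose); // [ '--', '3--4', '67-64-1' ]

// \d+ requires digits on both sides of every hyphen.
const strict = sample.match(/\d+(?:-\d+)+/g);
console.log(strict); // [ '67-64-1' ]
```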

Lookbehind alternative with both lookbehind and lookahead

I'm looking for a regex to split user supplied strings on the : character but not when the user has escaped the colon \: or it's part of a url, e.g. https://stackoverflow...
In javascript the majority of browsers don't yet support lookbehinds. Is it possible to apply some other approach for the lookbehind part?
In Clojure/ClojureScript on Chrome (which does support lookbehinds) this regex does the trick:
#"(?<!\\):(?!//)"
but not in Safari (for example).
The main problem is that not all browsers support lookbehind yet, which is required to find and negate the prefix \ so we don't include \:.
One workaround (not very pretty, but it works) is to first substitute \: with some "symbol" you know will not occur naturally in your text, do your split, and then substitute back any \:.
For example (note that this method will return an empty element "" if you have "::" in your string):
let regex = /:(?!\/\/)/
//original string literal \: has to be expressed as \\:
let str = "http://example.com::hello:dolly:12\\:00\\:PM";
//substitute out any \:
str = str.replace(/\\:/g,"<colon>"); //http://example.com::hello:dolly:12<colon>00<colon>PM
//now we split 'normally' without lookbehind
let arr = str.split(regex); //[ 'http://example.com', '', 'hello', 'dolly', '12<colon>00<colon>PM' ]
//substitute back \:
arr = arr.map(element => element.replace(/<colon>/g, "\\:")); //[ 'http://example.com', '', 'hello', 'dolly', '12\\:00\\:PM' ]
console.log(arr);
If you're just after non-empty elements, you can do an arr.filter(Boolean) on it, or use @Skeeve's matching solution, as it's more elegant for this purpose.
An alternative could be to not search for the separator but to search for the elements:
var str="this:is\\:a:test:https://stackoverflow:80:test::test";
var elements= str.match(/((?:[^\\:]|\\:|:\/\/)+)/g);
// elements= [ "this", "is\\:a", "test", "https://stackoverflow", "80", "test", "test" ]
Disadvantages:
The elements may not be empty (observe the "+" in the regexp); note how the empty element between the last two "test" entries is missing.
You forgot that a URL can contain multiple colons. What about "http://me:password@myhost.com:8080/path?value=d:f"?
Besides these, I think it should work for you.
I think you can only overcome the disadvantages with a more or less sophisticated loop using regexp-exec.
P.S. I know the grouping isn't required here, but if you want to use it in regexp-exec, you'll need it.
P.P.S. Fixed the typo @chatnoir found.
You might also make use of replace and pass a function as the second parameter.
You could use a pattern that matches what you don't want to split on and captures in a group the colon you do want to split on. Then you can replace each captured colon with a marker, just as in @chatnoir's approach, and afterwards split on that marker.
:\/\/\S+|\\:|(:)
Explanation
:\/\/\S+ Match :// followed by 1+ times a non whitespace char
| Or
\\: Match \:
| Or
(:) Capture a : in group 1
let pattern = /:\/\/\S+|\\:|(:)/g;
let str = "string\\: or https://www.example.com:8000 or split:me or te\\:st or \\:test or notsplit\\:me:splitted or \\: or ftp://example.com :";
str = str.replace(pattern, function(match, group1) {
return group1 === undefined ? match : "<split>"
});
console.log(str.split("<split>").filter(Boolean));
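For engines that do support lookbehind (it was standardized in ES2018), the Clojure pattern from the question can be written as a JavaScript regex literal and passed straight to split. This is only a sketch under that assumption; it throws a SyntaxError in engines without lookbehind, so the workarounds above remain relevant there. The variable names are illustrative.

```javascript
// Split on ":" unless it is preceded by a backslash or followed by "//".
// Assumes an engine with ES2018 lookbehind support.
const splitOnColon = /(?<!\\):(?!\/\/)/;

const input = "http://example.com::hello:dolly:12\\:00\\:PM";
const parts = input.split(splitOnColon);
console.log(parts);
// [ 'http://example.com', '', 'hello', 'dolly', '12\\:00\\:PM' ]
```

As with the substitution approach, "::" still yields an empty element, which filter(Boolean) can remove if it is not wanted.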

remove last part of string following '&&&' with JavaScript Regex

I'm trying to use a regex in JS to remove the last part of a string. This substring starts with &&&, is followed by something not &&&, and ends with .pdf.
So, for example, the final regex should take a string like:
parent&&&child&&&grandchild.pdf
and match
parent&&&child
I'm not that great with regexes, so my best effort has been something like:
.*?(?:&&&.*\.pdf)
Which matches the whole string. Can anyone help me out?
You may use this greedy regex either in replace or in match:
var s = 'parent&&&child&&&grandchild.pdf';
// using replace
var r = s.replace(/(.*)&&&.*\.pdf$/, '$1');
console.log(r);
//=> parent&&&child
// using match
var m = s.match(/(.*)&&&.*\.pdf$/)
if (m) {
console.log(m[1]);
//=> parent&&&child
}
By using the greedy pattern .* before &&&, we make sure to match the last instance of &&& in the input.
You want to remove the last portion, so replace it with an empty string:
var str = "parent&&&child&&&grandchild.pdf"
var result = str.replace(/&&&[^&]+\.pdf$/, '')
console.log(result)
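On why the attempt in the question matches the whole string: a regex always reports the leftmost match, and the lazy .*? at the front simply grows from index 0 until the rest of the pattern can succeed, so everything before the first &&& ends up inside the match. A short sketch comparing that pattern with the greedy capture (variable names are illustrative):

```javascript
const s = "parent&&&child&&&grandchild.pdf";

// The pattern from the question: the match starts at index 0 and the
// lazy prefix is part of it, so the whole string is matched.
const lazy = s.match(/.*?(?:&&&.*\.pdf)/);
console.log(lazy.index, lazy[0]); // 0 'parent&&&child&&&grandchild.pdf'

// Capturing the greedy prefix isolates everything before the last &&&.
const greedy = s.match(/(.*)&&&.*\.pdf$/);
console.log(greedy[1]); // 'parent&&&child'
```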

How does this regexp work?

RegExes give me headaches. I have a very simple regex but I don't understand how it works.
The code:
var str= "startBlablablablablaend";
var regex = /start(.*?)end/;
var match = str.match(regex);
console.log( match[0] ); //startBlablablablablaend
console.log( match[1] ); //Blablablablabla
What I ultimately want would be the second one, in other words the text between the two delimiters (start,end).
My questions:
How does it work? (each character explained please)
Why does it match two different things?
Is there a better way to get match[1]?
If I want to get all the text's between all the start-end instances, how would I go about it?
For the last question, what I mean:
var str = "startBla1end startBla2end startBla3end";
var regex = /start(.*?)end/gmi;
var match = str.match(regex);
console.log( match ); // [ "startBla1end" , "startBla2end" , "startBla3end" ]
What I need is:
console.log( match ); // [ "Bla1" , "Bla2" , "Bla3" ];
Thanks :)
How does it work?
start matches start in the string
(.*?) non-greedy match of any characters (captured as group 1)
end matches the end in the string
Matching
startBlablablablablaend
|
start
startBlablablablablaend
|
.
startBlablablablablaend
|
.
# and so on: the quantifier * matches any number of characters, and ? makes the match non-greedy
startBlablablablablaend
|
end
Why does it match two different things?
It doesn't match two different things.
match[0] contains the entire match
match[1] contains the first capture group (the part matched inside the first pair of parentheses)
Is there a better way to get match[1]?
Short answer: no.
In regex flavors that support lookarounds, it is possible:
(?<=start)(.*?)(?=end)
#Blablablablabla
Note: this won't work in older JavaScript engines, as they don't support lookbehinds (lookbehind assertions were only added to the language in ES2018).
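In an engine that implements ES2018 lookbehind, the lookaround pattern can be tried directly. The following is a sketch under that assumption (it throws a SyntaxError where lookbehind is unsupported); because neither delimiter is consumed, the matches themselves are the inner text:

```javascript
const str = "startBla1end startBla2end startBla3end";

// (?<=start) and (?=end) only assert the delimiters, they do not consume them.
// Assumes an engine with ES2018 lookbehind support.
const inner = str.match(/(?<=start).*?(?=end)/g);
console.log(inner); // [ 'Bla1', 'Bla2', 'Bla3' ]
```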
Last Question
The best that you can get from a single match statement would be
var str = "startBla1end startBla2end startBla3end";
var regex = /start(.*?)(?=end)/gmi;
var match = str.match(regex);
console.log( match ); // [ "startBla1" , "startBla2" , "startBla3" ]
You don't need to put much effort into it.
Try this regex:
start(.*)end
You can also look at this Stack Overflow question, which has already been answered:
Regular Expression to get a string between two strings in Javascript
Hope it helps.
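Note that the greedy start(.*)end and the lazy start(.*?)end from the question only agree when the string contains a single start/end pair. A quick comparison on a two-pair input (the input string is made up for illustration):

```javascript
const twoPairs = "startBla1end startBla2end";

// Greedy: .* runs to the last "end" in the string.
const greedyCapture = twoPairs.match(/start(.*)end/)[1];
console.log(greedyCapture); // 'Bla1end startBla2'

// Lazy: .*? stops at the first "end" it can.
const lazyCapture = twoPairs.match(/start(.*?)end/)[1];
console.log(lazyCapture); // 'Bla1'
```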
To solve your last question, you can split up your string and iterate:
var str = "startBla1end startBla2end startBla3end";
var str_array = str.split(" ");
Then iterate over each element of the str_array using your existing code to extract each Bla# substring.
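Another way to collect only the captured text, without assuming the instances are separated by spaces, is to call exec repeatedly on a regex with the g flag and keep match[1] each time. A minimal sketch (the captures array is an illustrative name):

```javascript
const str = "startBla1end startBla2end startBla3end";
const regex = /start(.*?)end/g;

// With the g flag, each exec call resumes after the previous match
// and returns null once there are no more matches.
const captures = [];
let m;
while ((m = regex.exec(str)) !== null) {
  captures.push(m[1]); // group 1 is the text between the delimiters
}
console.log(captures); // [ 'Bla1', 'Bla2', 'Bla3' ]
```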

using a lookahead to get the last occurrence of a pattern in javascript

I was able to build a regex to extract a part of a pattern:
var regex = /\w+\[(\w+)_attributes\]\[\d+\]\[own_property\]/g;
var match = regex.exec( "client_profile[foreclosure_defenses_attributes][0][own_property]" );
match[1] // "foreclosure_defenses"
However, I also have a situation where there will be a repetitive pattern like so:
"client_profile[lead_profile_attributes][foreclosure_defenses_attributes][0][own_property]"
In that case, I want to ignore [lead_profile_attributes] and just extract the portion of the last occurrence as I did in the first example. In other words, I still want to match "foreclosure_defenses" in this case.
Since all patterns will be like [(\w+)_attributes], I tried to do a lookahead, but it is not working:
var regex = /\w+\[(\w+)_attributes\](?!\[(\w+)_attributes\])\[\d+\]\[own_property\]/g;
var match = regex.exec("client_profile[lead_profile_attributes][foreclosure_defenses_attributes][0][own_property]");
match // null
match returns null meaning that my regex isn't working as expected. I added the following:
\[(\w+)_attributes\](?!\[(\w+)_attributes\])
Because I want to match only the last occurrence of the following pattern:
[lead_profile_attributes][foreclosure_defenses_attributes]
I just want to grab the foreclosure_defenses, not the lead_profile.
What might I be doing wrong?
I think I got it working without positive lookahead:
regex = /(\[(\w+)_attributes\])+/
/(\[(\w+)_attributes\])+/
match = regex.exec(str);
["[a_attributes][b_attributes][c_attributes]", "[c_attributes]", "c"]
I was able to also achieve it through noncapturing groups. Output from chrome console:
var regex = /(?:\w+(\[\w+\]\[\d+\])+)(\[\w+\])/;
undefined
regex
/(?:\w+(\[\w+\]\[\d+\])+)(\[\w+\])/
str = "profile[foreclosure_defenses_attributes][0][properties_attributes][0][other_stuff]";
"profile[foreclosure_defenses_attributes][0][properties_attributes][0][other_stuff]"
match = regex.exec(str);
["profile[foreclosure_defenses_attributes][0][properties_attributes][0][other_stuff]", "[properties_attributes][0]", "[other_stuff]"]
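On why the lookahead attempt returns null: the leading \w+ can only match client_profile, which forces the first [..._attributes] group to be [lead_profile_attributes], and the negative lookahead then rejects it because another [..._attributes] group follows. One possible fix, offered as a sketch rather than the only solution, is to drop the leading \w+ so that the group is tied to the [digits][own_property] suffix instead; the repeated-group pattern from the answer above also works on the question's input.

```javascript
const single = "client_profile[foreclosure_defenses_attributes][0][own_property]";
const nested = "client_profile[lead_profile_attributes][foreclosure_defenses_attributes][0][own_property]";

// Only the [..._attributes] group directly before [digits][own_property] can match.
const anchored = /\[(\w+)_attributes\]\[\d+\]\[own_property\]/;
const fromSingle = anchored.exec(single)[1];
const fromNested = anchored.exec(nested)[1];
console.log(fromSingle, fromNested); // foreclosure_defenses foreclosure_defenses

// Repeated group: the captures keep the values from the last repetition.
const repeated = /(\[(\w+)_attributes\])+/;
const lastRepeated = repeated.exec(nested)[2];
console.log(lastRepeated); // foreclosure_defenses
```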
