Javascript regex between string delimiters - javascript

I have the following string:
%||1234567890||Joe||% some text winter is coming %||1234567890||Robert||%
PROBLEM: I am trying to match all occurrences between %||....||% and process those substring matches
MY REGEX: /%([\s\S]*?)(?=%)/g
MY CODE
var a = "%||1234567890||Joe||% some text winter is coming %||1234567890||Robert||%";
var pattern = /%([\s\S]*?)(?=%)/g;
a.replace( pattern, function replacer(match){
return match.doSomething();
} );
Now the pattern seems to be selecting everything between the first and last occurrence of %|| .... ||%
WHAT I NEED:
I want to iterate over the matches
%||1234567890||Joe||%
AND
%||1234567890||Robert||%
and do something

You need to use a callback inside a String#replace and modify the pattern to only match what is inside %|| and ||% like this:
var a = "%||1234567890||Joe||% some text winter is coming %||1234567890||Robert||%";
var pattern = /%\|\|([\s\S]*?)\|\|%/g;
a = a.replace( pattern, function (match, group1){
var chunks = group1.split('||');
return "{1}" + chunks.join("-") + "{/1}";
} );
console.log(a);
The /%\|\|([\s\S]*?)\|\|%/g pattern will match:
%\|\| - a %|| substring
([\s\S]*?) - Capturing group 1 matching any 0+ chars as few as possible up to the first...
\|\|% - a ||% substring
/g - multiple times.
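If the goal is to iterate over the matches rather than rewrite the string, the same pattern can be fed to String#matchAll. This is only a minimal sketch: extractTags is a hypothetical helper name, and it assumes an engine with matchAll support (ES2020).

```javascript
// Hypothetical helper: returns the "||"-separated fields of every
// %||...||% tag found in `text`.
function extractTags(text) {
  const pattern = /%\|\|([\s\S]*?)\|\|%/g;
  const tags = [];
  for (const m of text.matchAll(pattern)) {
    // m[0] is the whole tag, m[1] is the text between %|| and ||%
    tags.push(m[1].split("||"));
  }
  return tags;
}

const a = "%||1234567890||Joe||% some text winter is coming %||1234567890||Robert||%";
console.log(extractTags(a));
// [ [ '1234567890', 'Joe' ], [ '1234567890', 'Robert' ] ]
```

Each element can then be processed in a normal loop, without going through a replace callback.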

Because your pattern treats any single % as a delimiter, and [\s\S] basically means "anything", it also matches the text between the two tags.
RegExp parts without escaping, exploded for readability
start tag : %||
first info: ([^|]*) // will stop at the first |
separator : ||
last info : ([^|]*) // will stop at the first |
end tag : ||%
Escaped RegExp:
/%\|\|([^\|]*)\|\|([^\|]*)\|\|%/g
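As a rough illustration of the two-group pattern above, the sketch below turns each tag into an object. The names parseTags, id and name are assumptions made for the example; the original string does not say what the two fields mean.

```javascript
// Hypothetical helper: one object per %||first||last||% tag.
// The property names `id` and `name` are illustrative guesses.
function parseTags(text) {
  const pattern = /%\|\|([^|]*)\|\|([^|]*)\|\|%/g;
  return Array.from(text.matchAll(pattern), (m) => ({ id: m[1], name: m[2] }));
}

const a = "%||1234567890||Joe||% some text winter is coming %||1234567890||Robert||%";
console.log(parseTags(a));
// [ { id: '1234567890', name: 'Joe' }, { id: '1234567890', name: 'Robert' } ]
```

Because each group is [^|]*, this variant only works when neither field contains a | character.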

Related

How does this regexp work?

RegExes give me headaches. I have a very simple regex but I don't understand how it works.
The code:
var str= "startBlablablablablaend";
var regex = /start(.*?)end/;
var match = str.match(regex);
console.log( match[0] ); //startBlablablablablaend
console.log( match[1] ); //Blablablablabla
What I ultimately want would be the second one, in other words the text between the two delimiters (start,end).
My questions:
How does it work? (each character explained please)
Why does it match two different things?
Is there a better way to get match[1]?
If I want to get all the texts between all the start-end instances, how would I go about it?
For the last question, what I mean:
var str = "startBla1end startBla2end startBla3end";
var regex = /start(.*?)end/gmi;
var match = str.match(regex);
console.log( match ); // [ "startBla1end" , "startBla2end" , "startBla3end" ]
What I need is:
console.log( match ); // [ "Bla1" , "Bla2" , "Bla3" ];
Thanks :)
How does it work?
start matches start in the string
(.*?) non-greedy match for any characters (as few as possible)
end matches the end in the string
Matching
startBlablablablablaend
|
start
startBlablablablablaend
|
.
startBlablablablablaend
|
.
# and so on, since the quantifier * matches any number of characters; ? makes the match non-greedy
startBlablablablablaend
|
end
Why does it match two different things?
It doesn't match two different things.
match[0] will contain the entire match
match[1] will contain the first capture group (the part matched inside the first parentheses)
Is there a better way to get match[1]?
Short answer: no.
In regex flavors that support lookarounds, you can match only the inner text:
(?<=start)(.*?)(?=end)
#Blablablablabla
Note: this won't work in older JavaScript engines, because lookbehind, (?<=...), was only added to JavaScript in ES2018.
Last Question
The best that you can get from a single match statement would be
var str = "startBla1end startBla2end startBla3end";
var regex = /start(.*?)(?=end)/gmi;
var match = str.match(regex);
console.log( match ); // [ "startBla1" , "startBla2" , "startBla3" ]
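To get only the captured text for every match, one common approach is to loop with RegExp#exec on a global regex and collect group 1 each time. A minimal sketch, where captureAll is a hypothetical helper name:

```javascript
// Hypothetical helper: collects capture group 1 of every match.
// `regex` must have the g flag, otherwise exec() would keep returning
// the first match and the loop would never end.
function captureAll(str, regex) {
  const results = [];
  let m;
  while ((m = regex.exec(str)) !== null) {
    results.push(m[1]);
  }
  return results;
}

const str = "startBla1end startBla2end startBla3end";
console.log(captureAll(str, /start(.*?)end/gi));
// [ 'Bla1', 'Bla2', 'Bla3' ]
```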
You don't need to put much effort into this.
Try this regex:
start(.*)end
You can also look at this Stack Overflow question, which has already been answered:
Regular Expression to get a string between two strings in Javascript
Hope it helps.
To solve your last question, you can split up your string and iterate:
var str = "startBla1end startBla2end startBla3end";
var str_array = str.split(" ");
Then iterate over each element of the str_array using your existing code to extract each Bla# substring.
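The split-and-iterate idea could look roughly like this. It assumes the start...end chunks are separated by single spaces and contain no spaces themselves; extractFromParts is a hypothetical name.

```javascript
// Hypothetical helper: split on spaces, then apply the non-global
// regex to each piece and keep capture group 1.
function extractFromParts(str) {
  const regex = /start(.*?)end/;
  const found = [];
  for (const part of str.split(" ")) {
    const m = part.match(regex);
    if (m !== null) {
      found.push(m[1]);
    }
  }
  return found;
}

console.log(extractFromParts("startBla1end startBla2end startBla3end"));
// [ 'Bla1', 'Bla2', 'Bla3' ]
```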

javascript replace text at second occurrence of "/"

I have this string
"/mp3/mysong.mp3"
I need to make this string look like this with javascript.
"/mp3/myusername/mysong.mp3"
My guess would be to find second occurrence of "/", then append "myusername/" there or prepend "/myusername" but I'm not sure how to do this in javascript.
Just capture the characters up to the second / symbol and store them in a group. Then replace the matched characters with the contents of group 1 plus the string /myusername
Regex:
^(\/[^\/]*)
Replacement string:
$1/myusername
DEMO
> var r = "/mp3/mysong.mp3"
undefined
> r.replace(/^(\/[^\/]*)/, "$1/myusername")
'/mp3/myusername/mysong.mp3'
OR
Use a lookahead.
> r.replace(/(?=\/[^/]*$)/, "/myusername")
'/mp3/myusername/mysong.mp3'
This (?=\/[^/]*$) matches the boundary just before the last / symbol. Replacing the matched boundary with /myusername will give you the desired result.
This works -
> "/mp3/mysong.mp3".replace(/(.*?\/)(\w+\.\w+)/, "$1myusername\/$2")
"/mp3/myusername/mysong.mp3"
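Use this:
var str = "/mp3/mysong.mp3";
var res = str.replace(/^((?:.*?\/){2})/, "$1myusername/");
console.log(res);
This inserts the text myusername/ after the 2nd /. The whole prefix up to the 2nd / has to be captured: with (.*?\/){2}, $1 would hold only the last repetition (mp3/) and the leading / would be lost.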
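The approach guessed at in the question (find the second "/" and insert there) also works without a regex. A minimal sketch, assuming the path starts with "/"; insertAfterFirstSegment is a hypothetical name.

```javascript
// Hypothetical helper: inserts `segment + "/"` right after the second "/".
// Assumes `path` starts with "/", so the search begins at index 1.
function insertAfterFirstSegment(path, segment) {
  const second = path.indexOf("/", 1);
  if (second === -1) {
    return path; // no second "/": leave the path untouched
  }
  return path.slice(0, second + 1) + segment + "/" + path.slice(second + 1);
}

console.log(insertAfterFirstSegment("/mp3/mysong.mp3", "myusername"));
// /mp3/myusername/mysong.mp3
```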

Restrict action of toLowerCase to part of a string?

I want to convert most of a string to lower case, except for those characters inside of brackets. After converting everything outside the brackets to lower case, I then want to remove the brackets. So giving {H}ell{o} World as input should give Hello world as output. Removing the brackets is simple, but is there a way to selectively make everything outside the brackets lower case with regular expressions? If there's no simple regex solution, what's the easiest way to do this in javascript?
You can try this:
var str='{H}ell{o} World';
str = str.replace(/{([^}]*)}|[^{]+/g, function (m,p1) {
return (p1)? p1 : m.toLowerCase();} );
console.log(str);
The pattern match:
{([^}]*)} # all that is between curly brackets
# and put the content in the capture group 1
| # OR
[^{]+ # anything until the regex engine meets a {
# since the character class is all characters but {
the callback function has two arguments:
m the complete match
p1 the first capturing group
it returns p1 if p1 is not empty
else the whole match m in lowercase.
Details:
"{H}" p1 contains H (first part of the alternation)
p1 is returned as is. Note that since the curly brackets are
not captured, they are not in the result. -->"H"
"ell" (second part of the alternation) p1 is empty, the full match
is returned in lowercase -->"ell"
"{o}" (first part) -->"o"
" World" (second part) -->" world"
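The walkthrough above can be wrapped in a small function. This is a sketch rather than a definitive version: lowerOutsideBraces is a hypothetical name, the braces are escaped for clarity, and it tests p1 against undefined so that an empty pair {} is simply removed.

```javascript
// Hypothetical helper: lowercases everything outside {...} and drops the braces.
function lowerOutsideBraces(input) {
  return input.replace(/\{([^}]*)\}|[^{]+/g, (m, p1) => {
    // p1 is undefined when the second alternative ([^{]+) matched
    return p1 !== undefined ? p1 : m.toLowerCase();
  });
}

console.log(lowerOutsideBraces("{H}ell{o} World")); // Hello world
console.log(lowerOutsideBraces("{H}ELLO {W}ORLD")); // Hello World
```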
I think this is probably what you are looking for:
Change case using Javascript regex
Detect on the first curly brace instead of a hyphen.
Assuming that all braces are well balanced, the parts that should be lowercased are delimited like this:
Left hand side is either the start of your string or }
Right hand side is either the end of your string or {
This is the code that would work:
var str = '{H}ELLO {W}ORLD';
str.replace(/(?:^|})(.*?)(?:$|{)/g, function($0, $1) {
return $1.toLowerCase();
});
// "Hello World"
I would amend @Jack's solution as follows:
var str = '{H}ELLO {W}ORLD';
str = str.replace (/(?:^|\})(.*?)(?:\{|$)/g, function($0, $1) {
return $1.toLowerCase ();
});
Which performs both the lower casing and the bracket removal in one operation!

Javascript (node) regex doesn't seem to match start of string

I'm struggling with regular expressions in Javascript; they don't seem to start at the beginning of the string. In the simple example below I want to get the file name and then everything after the first colon
//string
file.text:16: lots of random text here with goes on for ages
//regex
(.?)[:](.*)
// group 1 returns 't'
/^([^:]+):(.*)/.exec('file.text:16: lots of random text here with goes on for ages')
gives ....
["file.text:16: lots of random text here with goes on for ages", "file.text", "16: lots of random text here with goes on for ages"]
Try this regex:
/^([^:]+)[:](.*)/
Explanation:
^ #Start of string
( #Start of capturing group #1
[^:] #Any character other than :
+ #One or more of the previous character class
) #End of capturing group #1
[:] #One :
(.*) #Any number of characters other than newline
The ? operator captures zero or one of the previous symbol only.
You could also use string operations instead:
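var str = "file.text:16:";
var n = str.indexOf(":"); // index of the first colon
var fileName = str.slice(0, n); // text before the first colon
var everythingElse = str.slice(n + 1); // text after the first colon (colon excluded)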
The ? operator returns 0 or 1 matches. You want the * operator, and you should select everything that isn't a : in the first set
([^:]*)[:](.*)
Non-regexy answer:
var a = s.split(":");
Then join a[1] and remaining elements.
Or just get the index of the first colon and create two strings using that.
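The index-based variant might be sketched as follows. splitAtFirstColon is a hypothetical name, and returning an empty second part when there is no colon is an assumption made for the example.

```javascript
// Hypothetical helper: returns [textBeforeFirstColon, textAfterFirstColon].
function splitAtFirstColon(line) {
  const n = line.indexOf(":");
  if (n === -1) {
    return [line, ""]; // assumption: no colon means everything is the file name
  }
  return [line.slice(0, n), line.slice(n + 1)];
}

const parts = splitAtFirstColon("file.text:16: lots of random text here with goes on for ages");
console.log(parts[0]); // file.text
console.log(parts[1]); // 16: lots of random text here with goes on for ages
```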

java script Regular Expressions patterns problem

My problem starts like this:
var str='0|31|2|03|.....|4|2007'
str=str.replace(/[^|]\d*[^|]/,'5');
so the output becomes "0|5|2|03|....|4|2007", i.e. it replaces 31 with 5.
But this doesn't work for replacing other segments when I change the code like this:
str=str.replace(/[^|]{2}\d*[^|]/,'6');
It doesn't change 2 to 6.
What am I missing here? Any help?
I think a regular expression is a bad solution for that problem. I'd rather do something like this:
var str = '0|31|2|03|4|2007';
var segments = str.split("|");
segments[1] = "35";
segments[2] = "123";
Can't think of a good way to solve this with a regexp.
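To get a string back, the segments still need to be joined again. A minimal sketch of the whole round trip, with replaceSegment as a hypothetical helper name:

```javascript
// Hypothetical helper: replaces the segment at `index` (zero-based)
// in a "|"-separated string and returns the rebuilt string.
function replaceSegment(str, index, value) {
  const segments = str.split("|");
  segments[index] = value;
  return segments.join("|");
}

console.log(replaceSegment("0|31|2|03|4|2007", 1, "5")); // 0|5|2|03|4|2007
console.log(replaceSegment("0|31|2|03|4|2007", 2, "6")); // 0|31|6|03|4|2007
```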
Here is a specific regex solution which replaces the number following the first | pipe symbol with the number 5:
var re = /^((?:\d+\|){1})\d+/;
return text.replace(re, '$15');
If you want to replace the digits following the third |, simply change the {1} portion of the regex to {3}
Here is a generalized function that will replace any given number slot (zero-based index), with a specified new number:
function replaceNthNumber(text, n, newnum) {
var re = new RegExp("^((?:\\d+\\|){"+ n +'})\\d+');
return text.replace(re, '$1'+ newnum);
}
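One detail worth keeping in mind: when newnum starts with a digit, '$1' + newnum produces a replacement token such as $16, which relies on the engine falling back to group 1 followed by a literal 6. The sketch below is a variant that sidesteps that question by using a replacer function; replaceNthNumberSafe is a hypothetical name, not part of the answer above.

```javascript
// Hypothetical variant of replaceNthNumber: builds the same regex but
// uses a replacer function instead of a "$1..." replacement string.
function replaceNthNumberSafe(text, n, newnum) {
  const re = new RegExp("^((?:\\d+\\|){" + n + "})\\d+");
  // `prefix` is capture group 1: the first n numbers with their pipes
  return text.replace(re, (match, prefix) => prefix + newnum);
}

console.log(replaceNthNumberSafe("0|31|2|03|4|2007", 1, "5")); // 0|5|2|03|4|2007
console.log(replaceNthNumberSafe("0|31|2|03|4|2007", 2, "6")); // 0|31|6|03|4|2007
```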
Firstly, you don't have to escape | in the character set, because it doesn't have any special meaning in character sets.
Secondly, you don't put quantifiers in character sets.
And finally, to create a global matching expression, you have to use the g flag.
[^\|] means anything but a '|', so in your case it only matches a digit. So it will only match anything with 2 or more digits.
Second you should put the {2} outside of the []-brackets
I'm not sure what you want to achieve here.
