Say you have a string: "The ABC cow jumped over XYZ the moon" and you want to use jQuery to get the substring between the "ABC" and "XYZ", how would you do this? The substring should be "cow jumped over". Many thanks!
This has nothing to do with jQuery, which is primarily for DOM traversal and manipulation. You want a simple regular expression:
var str = "The ABC cow jumped over XYZ the moon";
var sub = str.replace(/^.*ABC(.*)XYZ.*$/m, '$1');
The idea is you're using a String.replace with a regular expression which matches your opening and closing delimiters, and replacing the whole string with the part matched between the delimiters.
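The first argument is a regular expression. The trailing m flag makes ^ and $ match at the start and end of each line instead of only at the start and end of the whole string. It does not make . match newlines, so the text between ABC and XYZ has to sit on a single line; use [\s\S]* in place of .* if it may span several lines. The rest breaks down as follows: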
^ start at the beginning of the string
.* a series of 0 or more characters
ABC your opening delimiter
(.*) match a series of 0 or more characters
XYZ your closing delimiter
.* a series of 0 or more characters
$ match to the end of the string
The second parameter, the replacement string, is '$1'. replace substitutes parenthesized submatches from your regular expression into the replacement string; here $1 refers to the (.*) portion from above. Thus the return value is the entire string replaced with the part between the delimiters (including the spaces next to them, which you can remove with trim()).
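As a hedged alternative sketch, the same extraction can be written with match and a lazy group, which hands back the captured text directly and reports a missing delimiter as null instead of returning the input unchanged. The name extractBetween is made up for illustration, and both the use of [\s\S] (so the text may contain newlines) and the final trim() are assumptions about the desired behaviour.

```javascript
// Hypothetical helper: returns the text between the first "ABC" and the
// next "XYZ", trimmed, or null when either delimiter is missing.
function extractBetween(str) {
  // [\s\S]*? matches any character, including newlines, as few times as possible
  var m = str.match(/ABC([\s\S]*?)XYZ/);
  return m ? m[1].trim() : null;
}

console.log(extractBetween("The ABC cow jumped over XYZ the moon")); // "cow jumped over"
```

Because the group is lazy, the match stops at the first XYZ after ABC, whereas the greedy (.*) in the replace version runs up to the last XYZ on the line.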
You may not need to use jQuery on this one. I'd do something like this:
function between(str, left, right) {
    if (!str || !left || !right) return null;
    var left_loc = str.indexOf(left);
    if (left_loc == -1) return null;
    // Look for the closing delimiter only after the opening one
    var right_loc = str.indexOf(right, left_loc + left.length);
    if (right_loc == -1) return null;
    return str.substring(left_loc + left.length, right_loc);
}
No guarantees the above code is bug-free, but the idea is to use the standard substring() function. In my experience these types of functions work the same across all browsers.
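If the delimiters can occur more than once, the same indexOf/substring idea extends to a loop. This is only a sketch: allBetween is a hypothetical name, and it assumes the pairs do not nest or overlap.

```javascript
// Hypothetical helper (not from the answer above): collects every substring
// found between `left` and the next `right`, scanning left to right.
function allBetween(str, left, right) {
  var results = [];
  if (!str || !left || !right) return results;
  var from = 0;
  while (true) {
    var start = str.indexOf(left, from);
    if (start === -1) break;
    start += left.length;
    // Search for the closing delimiter only after the opening one
    var end = str.indexOf(right, start);
    if (end === -1) break;
    results.push(str.substring(start, end));
    from = end + right.length;
  }
  return results;
}

console.log(allBetween("The ABC cow jumped over XYZ the moon", "ABC", "XYZ"));
```

It returns an empty array rather than null when nothing is found, which keeps the calling code free of null checks.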
Meagar, your explanation is great, and clearly explains how it works.
Just a few minor questions:
Are the () parentheses required ONLY as a way to indicate a submatch for the second parameter of the replace function, or would this also identify the submatches: /^.*ABC.*XYZ.*$/ but not work for what we are trying to do in this case?
Does this regular expression have 7 submatches:
^
.*
ABC
.*
XYZ
.*
$
Does the $1 mean to use the first parenthesized submatch? At first I thought it might mean to use the second submatch in the series (the first being $0).
Thanks,
Steve
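To make the numbering concrete, here is a small sketch (the variable names are illustrative only). Only parenthesized parts of a pattern are captured; ^, .*, ABC and so on are not submatches by themselves. In a JavaScript replacement string, $& stands for the whole match and $1 for the first parenthesized group; there is no $0.

```javascript
var str = "The ABC cow jumped over XYZ the moon";

// match() returns the whole match at index 0 and captured groups from index 1
var withGroup = str.match(/^.*ABC(.*)XYZ.*$/);
// withGroup[0] is the entire string, withGroup[1] is " cow jumped over "

// Without parentheses the pattern still matches, but nothing is captured
var withoutGroup = str.match(/^.*ABC.*XYZ.*$/);
// withoutGroup.length is 1: only the whole match is available

// In a replacement string, $1 is the first parenthesized group
var wrapped = str.replace(/ABC(.*)XYZ/, "[$1]");
// wrapped is "The [ cow jumped over ] the moon"
```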
Just to show you how you would use jQuery and meagar's regex. Let's say that you've got an HTML page with the following P tag:
<p id="grabthis">The ABC cow jumped over XYZ the moon</p>
To grab the string, you would use the following jQuery/JavaScript mix (sounds kind of stupid, since jQuery is JavaScript, but see jQuery as a JavaScript DOM library):
$(document).ready(function() { // Wait until the document has been fully loaded
var pStr=$("#grabthis").text(); // Grab the text from the P tag and put it into a JS variable
var subStr=pStr.replace(/^.*ABC(.*)XYZ.*$/m, '$1'); // Run the regex to grab the middle string
alert(subStr); // Output the grabbed middle string
});
Or the shorter version:
$(document).ready(function() {
alert($("#grabthis").text().replace(/^.*ABC(.*)XYZ.*$/m, '$1'));
});
The replace function is a JavaScript function. I hope this clears the confusion.
I've been hovering around some answers here, and I can't find a solution to my problem:
I have this regexp which matches everything inside an HTML span tag, including contents:
<span\b[^>]*>(.*?)</span>
and I want to find a way to make a search in all the text, except for what is matched with that regexp.
For example, if my text is:
var text = '...for there is a class of <span class="highlight">guinea</span> pigs which...';
... then the regexp would match:
<span class="highlight">guinea</span>
and I want to be able to make a regexp such that if I search for "class", regexp will match "...for there is a class of..."
and will not match inside the tag, like in
"... class="highlight"..."
The word to be matched ("class") might be anywhere within the text. I've tried
(?!<span\b[^>]*>(.*?)</span>)class
but it keeps searching inside tags as well.
I want to find a solution using only regexp, without dealing with the DOM or jQuery. Thanks in advance :).
Although I wouldn't recommend this, I would do something like below
(class)(?:(?=.*<span\b[^>]*>))|(?:(?<=<\/span>).*)(class)
You can see this in action here
Rubular Link for this regex
You can capture your matches from the groups and work with them as needed. If you can, use an HTML parser and then find matches in the text elements.
It's not pretty, but if I understand you correctly, this should do what you want. It's done with a single regex, but JS can't (to my knowledge) extract the result without joining the matches in a loop.
The RegEx: /(?:<span\b[^>]*>.*?<\/span>)|(.)/g
Example js code:
var str = '...for there is a class of <span class="highlight">guinea</span> pigs which...',
    pattern = /(?:<span\b[^>]*>.*?<\/span>)|(.)/g,
    match,
    res = '';
while ((match = pattern.exec(str)) !== null) {
    // match[1] is undefined when the span alternative matched; skip those
    if (match[1] !== undefined) {
        res += match[1];
    }
}
document.writeln('Result:' + res);
In English: Do a non capturing test against your tag-expression or capture any character. Do this globally to get the entire string. The result is a capture group for each character in your string, except the tag. As pointed out, this is ugly - can result in a serious number of capture groups - but gets the job done.
If you need to send it in and retrieve the result in one call, I'd have to agree with previous contributors - It can't be done!
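The same "match the tag, or capture what you actually want" alternation also works with a replace callback, which avoids rebuilding the string by hand. The following is only a sketch: markOutsideSpans is a hypothetical name, the square brackets stand in for whatever you want to do with each hit, and it assumes the search word contains no regex metacharacters and that spans are not nested.

```javascript
// Hypothetical helper: wraps every occurrence of `word` that lies outside
// <span>...</span> in square brackets.
function markOutsideSpans(text, word) {
  // First alternative swallows a whole span; second captures the word
  var pattern = new RegExp('<span\\b[^>]*>.*?<\\/span>|(' + word + ')', 'g');
  return text.replace(pattern, function (match, captured) {
    // `captured` is undefined when the span alternative matched
    return captured === undefined ? match : '[' + captured + ']';
  });
}

var sample = '...for there is a class of <span class="highlight">guinea</span> pigs which...';
console.log(markOutsideSpans(sample, 'class'));
```

Spans are returned unchanged, so the word inside class="highlight" is never touched.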
I am using regular expressions to do some basic converting of wiki markup code into copy-pastable plain text, and I'm using javascript to do the work.
However, javascript's regex engine behaves much differently to the ones I've used previously as well as the regex in Notepad++ that I use on a daily basis.
For example- given a test string:
==Section Header==
===Subsection 1===
# Content begins here.
## Content continues here.
I want to end up with:
Section Header
Subsection 1
# Content begins here.
## Content continues here.
Simply remove all equals signs.
I began with the regex setup of:
var reg_titles = /(^)(=+)(.+)(=+)/
This regex searches for lines that begin with one or more equals signs and contain another set of one or more equals signs. Rubular shows that it matches my lines accurately and does not catch equals signs in the middle of content. http://www.rubular.com/r/46PrkPx8OB
The code to replace the string based on regex
var lines = $('.tb_in').val().split('\n'); //use jquery to grab text in a textarea, and split into an array of lines based on the \n
for(var i = 0;i < lines.length;i++){
line_temp = lines[i].replace(reg_titles, "");
lines[i] = line_temp; //replace line with temp
}
$('.tb_out').val(lines.join("\n")); //rejoin and print result
My result is unfortunately:
Section Header==
Subsection 1===
# Content begins here.
## Content continues here.
I cannot figure out why the regex replace function, when it finds multiple matches, seems to only replace the first instance it finds, not all instances.
Even when my regex is updated to:
var reg_titles = /(={2,})/
"Find any two or more equals", the output is still identical. It makes a single replacement and ignores all other matches.
No other regex engine I've used behaves this way. Running the same replace multiple times has no effect.
Any advice on how to get my string replace function to replace ALL instances of the matched regex instead of just the first one?
^=+|=+$
You can use this. Do not forget to add the g and m flags. Replace with an empty string. See demo.
http://regex101.com/r/nA6hN9/28
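Applied to the whole text at once, that pattern could be used roughly like this. It is a sketch only: stripHeadingMarkers and wikiText are illustrative names, and it assumes that any run of equals signs at the start or end of a line is heading markup.

```javascript
// Hypothetical helper: removes runs of "=" at the start or end of every line.
function stripHeadingMarkers(text) {
  // g: replace every match; m: let ^ and $ match at each line boundary
  return text.replace(/^=+|=+$/gm, '');
}

var wikiText = [
  '==Section Header==',
  '===Subsection 1===',
  '# Content begins here.',
  '## Content continues here.'
].join('\n');

console.log(stripHeadingMarkers(wikiText));
```

With the m flag there is no need to split the input into lines first.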
Add the g modifier to do a global search:
var reg_titles = /^(=+)(.+?)(=+)/g
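A minimal sketch of what the g flag changes, using one of the lines from the question (the variable names are just for illustration): without g, replace stops after the first match; with g, it replaces every match.

```javascript
var line = '==Section Header==';

// Without g only the first run of "=" is replaced
var firstOnly = line.replace(/=+/, '');
// firstOnly is "Section Header=="

// With g every run of "=" is replaced
var allRuns = line.replace(/=+/g, '');
// allRuns is "Section Header"
```

This is standard String.prototype.replace behaviour in JavaScript, not a quirk of a particular browser.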
Your regex is needlessly complex, and yet doesn't actually accomplish what you set out to do. :) You might try something like this instead:
var reg_titles = /^=+(.+?)=+$/;
var lines = $('.tb_in').val().split('\n');
lines.forEach(function(v, i, a) {
    a[i] = v.replace(reg_titles, '$1');
});
$('.tb_out').val(lines.join("\n"));
I am trying to find a regular expression that will match a string when it's NOT preceded by another specific string (in my case, when it is NOT preceded by "http://"). This is in JavaScript, and I'm running on Chrome (not that it should matter).
The sample code is:
var str = 'http://www.stackoverflow.com www.stackoverflow.com';
alert(str.replace(new RegExp('SOMETHING','g'),'rocks'));
And I want to replace SOMETHING with a regular expression that means "match www.stackoverflow.com unless it's preceded by http://". The alert should then say "http://www.stackoverflow.com rocks", naturally.
Can anyone help? It feels like I tried everything found in previous answers, but nothing works. Thanks!
As JavaScript regex engines don't support 'lookbehind' assertions, it's not possible to do with plain regex. Still, there's a workaround, involving replace callback function:
var str = "As http://JavaScript regex engines don't support `lookbehind`, it's not possible to do with plain regex. Still, there's a workaround";
var adjusted = str.replace(/\S+/g, function(match) {
return match.slice(0, 7) === 'http://'
? match
: 'rocks'
});
console.log(adjusted);
You can actually create a generator for these functions:
var replaceIfNotPrecededBy = function(notPrecededBy, replacement) {
return function(match) {
return match.slice(0, notPrecededBy.length) === notPrecededBy
? match
: replacement;
}
};
... then use it in that replace instead:
var adjusted = str.replace(/\S+/g, replaceIfNotPrecededBy('http://', 'rocks'));
raina77ow's answer reflected the situation in 2013, but it is now outdated, as the proposal for lookbehind assertions got accepted into the ECMAScript spec in 2018.
See docs for it on MDN:
Characters: (?<!y)x
Meaning: Negative lookbehind assertion: matches "x" only if "x" is not preceded by "y". For example, /(?<!-)\d+/ matches a number only if it is not preceded by a minus sign. /(?<!-)\d+/.exec('3') matches "3". /(?<!-)\d+/.exec('-3') finds no match because the number is preceded by the minus sign.
Therefore, you can now express "match www.stackoverflow.com unless it's preceded by http://" as /(?<!http:\/\/)www\.stackoverflow\.com/ (the dots are escaped so that they match only a literal "."):
const str = 'http://www.stackoverflow.com www.stackoverflow.com';
console.log(str.replace(/(?<!http:\/\/)www\.stackoverflow\.com/g, 'rocks'));
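This also works for the sample input, although only by coincidence: [^...] is a negated character class, not a negated group, so the pattern merely requires that the single character before www.example.com is not one of ( h t p : / ) | s. It also cannot match at the very start of the string, where there is no preceding character: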
var variable = 'http://www.example.com www.example.com';
alert(variable.replace(new RegExp('([^(http:\/\/)|(https:\/\/)])(www.example.com)','g'),'$1rocks'));
The alert says "http://www.example.com rocks".
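For engines without lookbehind, one hedged alternative is to match the scheme optionally and let a callback decide. This sketch avoids the character class in the snippet above; replaceBareHost is a hypothetical name, and treating https:// the same as http:// is an assumption.

```javascript
// Hypothetical helper: replaces www.stackoverflow.com with `replacement`
// unless it is directly preceded by "http://" or "https://".
function replaceBareHost(str, replacement) {
  return str.replace(/(https?:\/\/)?www\.stackoverflow\.com/g, function (match, scheme) {
    // `scheme` is undefined when the host was not preceded by a scheme
    return scheme ? match : replacement;
  });
}

console.log(replaceBareHost('http://www.stackoverflow.com www.stackoverflow.com', 'rocks'));
```

Because the scheme is part of the match rather than a character before it, this also handles a bare host at the very start of the string.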
I want to convert most of a string to lower case, except for those characters inside of brackets. After converting everything outside the brackets to lower case, I then want to remove the brackets. So giving {H}ell{o} World as input should give Hello world as output. Removing the brackets is simple, but is there a way to selectively make everything outside the brackets lower case with regular expressions? If there's no simple regex solution, what's the easiest way to do this in javascript?
You can try this:
var str='{H}ell{o} World';
str = str.replace(/{([^}]*)}|[^{]+/g, function (m, p1) {
    return (p1) ? p1 : m.toLowerCase();
});
console.log(str);
The pattern match:
{([^}]*)} # all that is between curly brackets
# and put the content in the capture group 1
| # OR
[^{]+ # anything until the regex engine meets a {
# since the character class is all characters but {
The callback function has two arguments:
m the complete match
p1 the first capturing group
It returns p1 if p1 is not empty,
else the whole match m in lowercase.
Details:
"{H}" p1 contains H (first part of the alternation)
p1 is returned as is. Note that since the curly brackets are
not captured, they are not in the result. -->"H"
"ell" (second part of the alternation) p1 is empty, the full match
is returned in lowercase -->"ell"
"{o}" (first part) -->"o"
" World" (second part) -->" world"
I think this is probably what you are looking for:
Change case using Javascript regex
Detect on the first curly brace instead of a hyphen.
Assuming that all curly braces are well balanced, the parts that should be lower cased are contained like this:
Left hand side is either the start of your string or }
Right hand side is either the end of your string or {
This is the code that would work:
var str = '{H}ELLO {W}ORLD';
str.replace(/(?:^|})(.*?)(?:$|{)/g, function($0, $1) {
return $1.toLowerCase();
});
// "Hello World"
I would amend @Jack's solution as follows:
var str = '{H}ELLO {W}ORLD';
str = str.replace (/(?:^|\})(.*?)(?:\{|$)/g, function($0, $1) {
return $1.toLowerCase ();
});
Which performs both the lower casing and the bracket removal in one operation!
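For comparison, the alternation approach from the first answer can be wrapped in a small function and checked against both sample inputs. This is a sketch under assumptions: lowerOutsideBraces is a hypothetical name, the braces are assumed to be balanced, and an empty pair {} is simply dropped.

```javascript
// Hypothetical helper: lower-cases everything outside {...} and removes
// the braces, keeping the case of the text inside them.
function lowerOutsideBraces(str) {
  return str.replace(/\{([^}]*)\}|[^{]+/g, function (m, p1) {
    // p1 is undefined when the second alternative (text outside braces) matched
    return p1 !== undefined ? p1 : m.toLowerCase();
  });
}

console.log(lowerOutsideBraces('{H}ell{o} World')); // "Hello world"
console.log(lowerOutsideBraces('{H}ELLO {W}ORLD')); // "Hello World"
```

Testing p1 against undefined instead of relying on truthiness is what makes the empty {} case behave.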
I have a JS string like this:
<div id="grouplogo_nav"><br> <ul><br> <li><a class="group_hlfppt" target="_blank" href="http://www.hlfppt.org/"> </a></li><br> </ul><br> </div>
I need to remove all <br> and &nbsp; that are only between > and <. I tried to write a regular expression, but didn't get it right. Does anybody have a solution?
EDIT:
Please note I want to remove only the tags between > and <.
Avoid using regex on html!
Try creating a temporary div from the string, and using the DOM to remove any br tags from it. This is much more robust than parsing html with regex, which can be harmful to your health:
var tempDiv = document.createElement('div');
tempDiv.innerHTML = mystringwithBRin;
// getElementsByTagName finds <br> elements at any depth, not only direct children
var nodes = tempDiv.getElementsByTagName('br');
for (var nodeId = nodes.length - 1; nodeId >= 0; --nodeId) {
    nodes[nodeId].parentNode.removeChild(nodes[nodeId]);
}
var newStr = tempDiv.innerHTML;
Note that we iterate in reverse over the collected nodes so that the node IDs remain valid after removing a given node (the collection is live and shrinks as nodes are removed).
http://jsfiddle.net/fxfrt/
myString = myString.replace(/^(&nbsp;|<br>)+/, '');
... where /.../ denotes a regular expression, ^ denotes start of string, (&nbsp;|<br>) denotes "&nbsp; or <br>", and + denotes "one or more occurrences of the previous expression". And then simply replace that full match with an empty string.
s.replace(/(>)(?:&nbsp;|<br>)+(\s?<)/g,'$1$2');
Don't use this in production. See the answer from Phil H.
Edit: I try to explain it a bit and hope my english is good enough.
Basically we have two different kinds of parentheses here. The first and third pairs () are normal parentheses. They are used to remember the characters that are matched by the enclosed pattern and to group the characters together. For the second pair, we don't need to remember the characters for later use, so we disable the "remember" functionality by using the form (?:) and only group the characters to make the + work as expected. The + quantifier means "one or more occurrences", so &nbsp; or <br> must be there one or more times. The last part (\s?<) matches a whitespace character (\s), which can be missing or occur one time (?), followed by the character <. $1 and $2 are a kind of variable that is replaced by the remembered characters of the first and third parentheses.
MDN provides a nice table, which explains all the special characters.
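As a rough sketch of how that pattern behaves on markup like the one in the question (and with the same "don't use this in production" caveat), the helper below also accepts <br />, <br/> and plain whitespace between the tags. The name stripBetweenTags and those extra alternatives are assumptions, not part of the answer above.

```javascript
// Hypothetical helper: removes runs of &nbsp;, <br> variants and whitespace
// that sit directly between a ">" and the next "<".
function stripBetweenTags(html) {
  return html.replace(/(>)(?:&nbsp;|<br\s*\/?>|\s)+(<)/g, '$1$2');
}

var messy = '<div id="g"><br>&nbsp;<ul><br>&nbsp;<li><a href="#">&nbsp;</a></li><br>&nbsp;</ul><br>&nbsp;</div>';
console.log(stripBetweenTags(messy));
```

Text such as a&nbsp;b inside an element is left alone, because the run has to start right after a > and end right before a <.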
You need to replace globally. Also don't forget that the <br> tag can be written self-closed, as <br />. Try this:
myString = myString.replace(/(&nbsp;|<br>|<br \/>)/g, '');
This worked for me. Note the m flag for multi-line input:
myString = myString.replace(/(&nbsp;|<br>|<br \/>)/gm, '');
myString = myString.replace(/^(&nbsp;|<br>)+/, '');
Hope this helps.