I have a JS stirng like this
<div id="grouplogo_nav"><br> <ul><br> <li><a class="group_hlfppt" target="_blank" href="http://www.hlfppt.org/"> </a></li><br> </ul><br> </div>
I need to remove all <br> and $nbsp; that are only between > and <. I tried to write a regular expression, but didn't got it right. Does anybody have a solution.
EDIT :
Please note i want to remove only the tags b/w > and <
Avoid using regex on html!
Try creating a temporary div from the string, and using the DOM to remove any br tags from it. This is much more robust than parsing html with regex, which can be harmful to your health:
var tempDiv = document.createElement('div');
tempDiv.innerHTML = mystringwithBRin;
var nodes = tempDiv.childNodes;
for(var nodeId=nodes.length-1; nodeId >= 0; --nodeId) {
if(nodes[nodeId].tagName === 'br') {
tempDiv.removeChild(nodes[nodeId]);
}
}
var newStr = tempDiv.innerHTML;
Note that we iterate in reverse over the child nodes so that the node IDs remain valid after removing a given child node.
http://jsfiddle.net/fxfrt/
myString = myString.replace(/^( |<br>)+/, '');
... where /.../ denotes a regular expression, ^ denotes start of string, ($nbsp;|<br>) denotes " or <br>", and + denotes "one or more occurrence of the previous expression". And then simply replace that full match with an empty string.
s.replace(/(>)(?: |<br>)+(\s?<)/g,'$1$2');
Don't use this in production. See the answer from Phil H.
Edit: I try to explain it a bit and hope my english is good enough.
Basically we have two different kinds of parentheses here. The first pair and third pair () are normal parentheses. They are used to remember the characters that are matched by the enclosed pattern and group the characters together. For the second pair, we don't need to remember the characters for later use, so we disable the "remember" functionality by using the form (?:) and only group the characters to make the + work as expected. The + quantifier means "one or more occurrences", so or <br> must be there one or more times. The last part (\s?<) matches a whitespace character (\s), which can be missing or occur one time (?), followed by the characters <. $1 and $2 are kind of variables that are replaces by the remembered characters of the first and third parentheses.
MDN provides a nice table, which explains all the special characters.
You need to replace globally. Also don't forget that you can have the being closed . Try this:
myString = myString.replace(/( |<br>|<br \/>)/g, '');
This worked for me, please note for the multi lines
myString = myString.replace(/( |<br>|<br \/>)/gm, '');
myString = myString.replace(/^( |<br>)+/, '');
hope this helps
Related
I have a string of text with HTML line breaks. Some of the <br> immediately follow a number between two delimiters «...» and some do not.
Here's the string:
var str = ("«1»<br>«2»some text<br>«3»<br>«4»more text<br>«5»<br>«6»even more text<br>");
I’m looking for a conditional regex that’ll remove the number and delimiters (ex. «1») as well as the line break itself without removing all of the line breaks in the string.
So for instance, at the beginning of my example string, when the script encounters »<br> it’ll remove everything between and including the first « to the left, to »<br> (ex. «1»<br>). However it would not remove «2»some text<br>.
I’ve had some help removing the entire number/delimiters (ex. «1») using the following:
var regex = new RegExp(UsedKeys.join('|'), 'g');
var nextStr = str.replace(/«[^»]*»/g, " ");
I sure hope that makes sense.
Just to be super clear, when the string is rendered in a browser, I’d like to go from this…
«1»
«2»some text
«3»
«4»more text
«5»
«6»even more text
To this…
«2»some text
«4»more text
«6»even more text
Many thanks!
Maybe I'm missing a subtlety here, if so I apologize. But it seems that you can just replace with the regex: /«\d+»<br>/g. This will replace all occurrences of a number between « & » followed by <br>
var str = "«1»<br>«2»some text<br>«3»<br>«4»more text<br>«5»<br>«6»even more text<br>"
var newStr = str.replace(/«\d+»<br>/g, '')
console.log(newStr)
To match letters and digits you can use \w instead of \d
var str = "«a»<br>«b»some text<br>«hel»<br>«4»more text<br>«5»<br>«6»even more text<br>"
var newStr = str.replace(/«\w+?»<br>/g, '')
console.log(newStr)
This snippet assumes that the input within the brackets will always be a number but I think it solves the problem you're trying to solve.
const str = "«1»<br>«2»some text<br>«3»<br>«4»more text<br>«5»<br>«6»even more text<br>";
console.log(str.replace(/(«(\d+)»<br>)/g, ""));
/(«(\d+)»<br>)/g
«(\d+)» Will match any brackets containing 1 or more digits in a row
If you would prefer to match alphanumeric you could use «(\w+)» or for any characters including symbols you could use «([^»]+)»
<br> Will match a line break
//g Matches globally so that it can find every instance of the substring
Basically we are only removing the bracketed numbers if they are immediately followed by a line break.
I need to find a string that contains "script" with as many characters before or after, and enclosed in < and >. I can do this with:<*script.*>
I also want to match only when that string is NOT followed by a <
The closest I've come, so far, is with this: (<*script.*>)([^=?<*]*)$
However, that will fail for something like <script></script> because the last > isn't followed by a < (so it doesn't match).
How can I check if only the the first > is followed by < or not?
For example,
<script> abc () ; </script> MATCH
<< ScriPT >abc (”XXX”);//<</ ScriPT > MATCH
<script></script> DON'T MATCH
And, a case that I still am working on:
<script/script> DON'T MATCH
Thanks!
You were close with your Regex. You just needed to make your first query non-greedy using a ? after the second *. Try this out:
(?i)<*\s*script.*?>[^<]+<*[^>]+>
There is an app called Expresso that really helps with designing Regex strings. Give it a shot.
Explanation: Without the ? non-greedy argument, your second * before the first > makes the search go all the way to the end of the string and grab the > at the end right at that point. None of the other stuff in your query was even being looked at.
EDIT: Added (?i) at the beginning for case-insensitivity. If you want a javascript specific case-insensitive regex, you would do that like this:
/<*\s*script.*?>[^<]+<*[^>]+>/i
I noticed you have parenthesis in your regex to make groups but you didn't specifically say you were trying to capture groups. Do you want to capture what's between the <script> and </script>? If so, that would be:
/<*\s*script.*?>([^<]+)<*[^>]+>/i
If I understand what you are looking for give this a try:
regex = "<\s*script\s*>([^<]+)<"
Here is an example in Python:
import re
textlist = ["<script>show this</script>","<script></script>"]
regex = "<\s*script\s*>([^<]+)"
for text in textlist:
thematch = re.search(regex, text, re.IGNORECASE)
if thematch:
print ("match found:")
print (thematch.group(1))
else:
print ("no match sir!")
Explanation:
start with < then possible spaces, the word script, possible spaces, a >
then capture all (at least 1) non < and make sure that's followed by a <
Hope that helps!
This would be better solved by using substring() and/or indexOf()
JavaScript methods
I've been hoving around by some answers here, and I can't find a solution to my problem:
I have this regexp which matches everyting inside an HTML span tag, including contents:
<span\b[^>]*>(.*?)</span>
and I want to find a way to make a search in all the text, except for what is matched with that regexp.
For example, if my text is:
var text = "...for there is a class of <span class="highlight">guinea</span> pigs which..."
... then the regexp would match:
<span class="highlight">guinea</span>
and I want to be able to make a regexp such that if I search for "class", regexp will match "...for there is a class of..."
and will not match inside the tag, like in
"... class="highlight"..."
The word to be matched ("class") might be anywhere within the text. I've tried
(?!<span\b[^>]*>(.*?)</span>)class
but it keeps searching inside tags as well.
I want to find a solution using only regexp, not dealing with DOM nor JQuery. Thanks in advance :).
Although I wouldn't recommend this, I would do something like below
(class)(?:(?=.*<span\b[^>]*>))|(?:(?<=<\/span>).*)(class)
You can see this in action here
Rubular Link for this regex
You can capture your matches from the groups and work with them as needed. If you can, use a HTML parser and then find matches from the text element.
It's not pretty, but if I get you right, this should do what you wan't. It's done with a single RegEx but js can't (to my knowledge) extract the result without joining the results in a loop.
The RegEx: /(?:<span\b[^>]*>.*?<\/span>)|(.)/g
Example js code:
var str = '...for there is a class of <span class="highlight">guinea</span> pigs which...',
pattern = /(?:<span\b[^>]*>.*?<\/span>)|(.)/g,
match,
res = '';
match = pattern.exec(str)
while( match != null )
{
res += match[1];
match = pattern.exec(str)
}
document.writeln('Result:' + res);
In English: Do a non capturing test against your tag-expression or capture any character. Do this globally to get the entire string. The result is a capture group for each character in your string, except the tag. As pointed out, this is ugly - can result in a serious number of capture groups - but gets the job done.
If you need to send it in and retrieve the result in one call, I'd have to agree with previous contributors - It can't be done!
I am using regular expressions to do some basic converting of wiki markup code into copy-pastable plain text, and I'm using javascript to do the work.
However, javascript's regex engine behaves much differently to the ones I've used previously as well as the regex in Notepad++ that I use on a daily basis.
For example- given a test string:
==Section Header==
===Subsection 1===
# Content begins here.
## Content continues here.
I want to end up with:
Section Header
Subsection 1
# Content begins here.
## Content continues here.
Simply remove all equals signs.
I began with the regex setup of:
var reg_titles = /(^)(=+)(.+)(=+)/
This regex searches for lines that begin with one or more equals with another set of one or more equals. Rubular shows that it matches my lines accurately and does not catch equals signs in the middle of contet. http://www.rubular.com/r/46PrkPx8OB
The code to replace the string based on regex
var lines = $('.tb_in').val().split('\n'); //use jquery to grab text in a textarea, and split into an array of lines based on the \n
for(var i = 0;i < lines.length;i++){
line_temp = lines[i].replace(reg_titles, "");
lines[i] = line_temp; //replace line with temp
}
$('.tb_out').val(lines.join("\n")); //rejoin and print result
My result is unfortunately:
Section Header==
Subsection 1===
# Content begins here.
## Content continues here.
I cannot figure out why the regex replace function, when it finds multiple matches, seems to only replace the first instance it finds, not all instances.
Even when my regex is updated to:
var reg_titles = /(={2,})/
"Find any two or more equals", the output is still identical. It makes a single replacement and ignores all other matches.
No one regex expression executor behaves this way for me. Running the same replace multiple times has no effect.
Any advice on how to get my string replace function to replace ALL instances of the matched regex instead of just the first one?
^=+|=+$
You can use this.Do not forget to add g and m flags.Replace by ``.See demo.
http://regex101.com/r/nA6hN9/28
Add the g modifier to do a global search:
var reg_titles = /^(=+)(.+?)(=+)/g
Your regex is needlessly complex, and yet doesn't actually accomplish what you set out to do. :) You might try something like this instead:
var reg_titles = /^=+(.+?)=+$/;
lines = $('.tb_in').val().split('\n');
lines.forEach(function(v, i, a) {
a[i] = v.replace(reg_titles, '$1');
})
$('.tb_out').val(lines.join("\n"));
I'm trying to remove the whitespaces from a textarea . The below code is not appending the text i'm selecting from two dropdowns. Can somebody tell me where i'd gone wrong? I'm trying to remove multiple spaces within the string as well, will that work with the same? Dont know regular expressions much. Please help.
function addToExpressionPreview() {
var reqColumnName = $('#ddlColumnNames')[0].value;
var reqOperator = $('#ddOperator')[0].value;
var expressionTextArea = document.getElementById("expressionPreview");
var txt = document.createTextNode(reqColumnName + reqOperator.toString());
if (expressionTextArea.value.match(/^\s+$/) != null)
{
expressionTextArea.value = (expressionTextArea.value.replace(/^\W+/, '')).replace(/\W+$/, '');
}
expressionTextArea.appendChild(txt);
}
> function addToExpressionPreview() {
> var reqColumnName = $('#ddlColumnNames')[0].value;
> var reqOperator = $('#ddOperator')[0].value;
You might as well use document.getElementById() for each of the above.
> var expressionTextArea = document.getElementById("expressionPreview");
> var txt = document.createTextNode(reqColumnName + reqOperator.toString());
reqOperator is already a string, and in any case, the use of the + operator will coerce it to String unless all expressions or identifiers involved are Numbers.
> if (expressionTextArea.value.match(/^\s+$/) != null) {
There is no need for match here. I seems like you are trying to see if the value is all whitespace, so you can use:
if (/^\s*$/.test(expressionTextArea.value)) {
// value is empty or all whitespace
Since you re-use expressionTextArea.value several times, it would be much more convenient to store it an a variable, preferably with a short name.
> expressionTextArea.value = (expressionTextArea.value.replace(/^\W+/,
> '')).replace(/\W+$/, '');
That will replace one or more non-word characters at the end of the string with nothing. If you want to replace multiple white space characters anywhere in the string with one, then (note wrapping for posting here):
expressionTextArea.value = expressionTextArea.value.
replace(/^\s+/,'').
replace(/\s+$/, '').
replace(/\s+/g,' ');
Note that \s does not match the same range of 'whitespace' characters in all browsers. However, for simple use for form element values it is probably sufficient.
Whitespace is matched by \s, so
expressionTextArea.value.replace(/\s/g, "");
should do the trick for you.
In your sample, ^\W+ will only match leading characters that are not a word character, and ^\s+$ will only match if the entire string is whitespace. To do a global replace(not just the first match) you need to use the g modifier.
Refer this link, you can get some idea. Try .replace(/ /g,"UrReplacement");
Edit: or .split(' ').join('UrReplacement') if you have an aversion to REs