I am trying to replace the & in "Sydney & Melbourne" with &amp;.
But it's not working.
I have tried a few different ways as follows:
with for loop and if statement:
for(var i=0; i<str.length; i++){
  if(str[i]==="&"){
    str[i]="&amp;";
  }
}
with regex and replace:
var myRegExp = /\b&\b/;
str = str.replace(myRegExp,"&amp;");
return str;
I do understand that & and &amp; are the same thing, so the result probably comes out as & (in fact, it is happening as I write this here on Stack Overflow). But is there a way around it?
Your first try
with for loop and if statement:
for(var i=0; i<str.length; i++){
if(str[i]==="&"){
str[i]="&amp;";
}
}
Of course this won't work. JS strings are immutable, which means:
Once a string is created, it is not possible to modify it.
In non-strict code the assignment won't cause a run-time error (in strict mode it throws a TypeError), but it won't do anything. (Even if JS strings were mutable, which they are not, you could not replace one character with multiple characters this way.)
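Since the string cannot be changed in place, the loop has to build a new string instead. A minimal sketch of that idea follows; the function name escapeAmpersands is made up for illustration and is not from the question:

```javascript
// Hypothetical helper: builds a NEW string, because the original
// string cannot be modified character by character.
function escapeAmpersands(input) {
  var out = "";
  for (var i = 0; i < input.length; i++) {
    // Append either the replacement or the unchanged character.
    out += input[i] === "&" ? "&amp;" : input[i];
  }
  return out;
}

var escapedCities = escapeAmpersands("Sydney & Melbourne");
console.log(escapedCities); // Sydney &amp; Melbourne
```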
Your second try
with regex and replace:
var myRegExp = /\b&\b/;
str = str.replace(myRegExp,"&amp;");
return str;
Of course this won't work. There is no word boundary between a space and an ampersand. See the definition of word boundary:
A word boundary matches the position where a word character is not followed or preceded by another word-character.
where "word character" is equivalent to [a-zA-Z0-9_].
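A quick way to see this is to test the same pattern against a string where the ampersand touches word characters and one where it is surrounded by spaces. The variable names below are only for illustration:

```javascript
var boundaryRe = /\b&\b/;

// "a" and "b" are word characters, so there is a boundary on both sides of &.
var touchesWords = boundaryRe.test("a&b");

// Here & sits between two spaces: non-word next to non-word, so no boundary.
var betweenSpaces = boundaryRe.test("Sydney & Melbourne");

console.log(touchesWords);  // true
console.log(betweenSpaces); // false
```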
But why?
However, the real question is why you want to do this. If you simply want to insert this string into the DOM as text, then do so using textContent:
document.getElementById("city").textContent = "Sydney & Melbourne";
(instead of using innerHTML). In jQuery, if you happen to be using that, use text() instead of html(). This approach has the advantage that it won't be confused by other HTML characters in the string, notably <.
If your issue is related to & in a URL, you shouldn't be HTML-escaping it--you should be URI-encoding it, but you probably already knew that.
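For the URL case, a small sketch using the built-in encodeURIComponent; the example.com address and the city parameter are placeholders, not something from the question:

```javascript
var city = "Sydney & Melbourne";

// Hypothetical endpoint; only the encoding of the value matters here.
var searchUrl = "https://example.com/search?city=" + encodeURIComponent(city);

console.log(searchUrl);
// https://example.com/search?city=Sydney%20%26%20Melbourne
```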
If your issue is that you are passing this to a server which expects properly encoded HTML strings, then you should reconsider your API design. In general, it's better to store the raw strings on the server, and decode/encode/escape/unescape them when necessary--remember that server data might be displayed in contexts other than browser. If you absolutely do want to send the server properly HTML-escaped strings, then you need to worry about more than just &, but also the other special HTML characters. For this, you should use some utility that is probably available in your favorite library, or the standard:
function htmlEscape(str) {
var div = document.createElement('div');
div.textContent = str;
return div.innerHTML;
}
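If no DOM is available (for example outside the browser), the same idea can be approximated with plain string replacement. This is only a sketch under that assumption: the name escapeHtmlText is made up, and unlike the div-based version above it also escapes quote characters.

```javascript
// Hypothetical DOM-free variant. The ampersand must be replaced first,
// otherwise the & of the other entities would be escaped a second time.
function escapeHtmlText(str) {
  return String(str)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

console.log(escapeHtmlText("Sydney & <Melbourne>"));
// Sydney &amp; &lt;Melbourne&gt;
```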
Currently, in order to replace the &, you would need to have the & surrounded by word characters (ex: a&b).
Here is a JSFiddle that explains visually what I mean about the regex you were trying to use.
Instead, just target the & directly:
var string = 'Sydney & Melbourne';
string = string.replace(/&/g, '&amp;');
console.log(string);
I am working on an autocomplete used inside a textarea. I know that some autocompletes already exist, but anyway.
It works well, but when I'm typing something, select one or more characters and delete them, a &#65279; appears at the end of my string (or wherever I was inside it). I tried to remove it with replaceAll while retrieving my HTML, but it doesn't work (indexOf does not find this special character either). The problem is that the search doesn't find any result because of this character. Here is an example:
This is my array (a little bit cut but we don't really care)
let array = [{
name: "test",
value: "I'm a test value"
},
{
name: "valueorange",
value: "I'm just an orange"
},
// This is how I get the contents of my span (I tried both innerHTML and innerText, same results).
// Same while using .text() or .html() with jquery
let value = jqElement.find("#searching-span")[0].innerHTML.substring(1).toLowerCase();
value = value.replaceAll("&nbsp;", " ");
value = value.replaceAll("&#65279;", "");
I can replace every &nbsp; without any problems. Finally, I loop over the array and call indexOf on each value; whenever it finds something, I push that entry into a new array. But when the string contains &#65279;, I get no results.
Any idea how I can resolve this?
I tried to be clear. I hope my English wasn't too bad, sorry if I made many mistakes!
Character entities and HTML escaped characters like &nbsp; and &#65279; appearing in HTML source code are converted by the HTML parser into Unicode characters like \u00a0 and \ufeff before being inserted into the DOM.
If replacing them in JavaScript, use their Unicode characters, not HTML escape sequences, to match them in DOM strings. For example:
p.textContent = p.textContent.replaceAll("\ufeff", '*'); // U+FEFF zero width no-break space (BOM)
p.textContent = p.textContent.replaceAll("\xa0", '-'); // nbsp
<p id="p">&#65279;&nbsp;</p>
Note that U+FEFF is the zero width no-break space (byte order mark), not the zero width joiner (U+200D). Zero width joiners are used a lot in emoji character sequences, and removing them arbitrarily may break emoji character decoding (although decoding badly formed emoji strings is almost a prerequisite for handling emojis in the wild).
Second note: I am not suggesting this as a means of circumventing badly decoded characters that have been encoded using a Unicode Transformation Format. Making sure decoding is performed correctly is always a better option.
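Applied to the autocomplete case from the question, the clean-up could look roughly like this. The sample string is invented for illustration, and regex replace is used instead of replaceAll only so the sketch also runs on older engines:

```javascript
// Hypothetical DOM string: a stray U+FEFF in front and a non-breaking space inside.
var rawValue = "\ufeffvalue\u00a0orange";

// Match the Unicode characters themselves, not "&#65279;" or "&nbsp;".
var cleanedValue = rawValue
  .replace(/\ufeff/g, "")
  .replace(/\u00a0/g, " ");

console.log(rawValue.indexOf("value"));     // 1, because U+FEFF comes first
console.log(cleanedValue.indexOf("value")); // 0
console.log(cleanedValue === "value orange"); // true
```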
My server returns the value support\testing. When I receive this value on the client, it comes out as "support", a tab, and then "esting", because \t is interpreted as a tab character.
How do I avoid escaping special characters in JavaScript?
Your server needs to output the string with proper escaping.
In this case, you want a backslash character in the output; backslash is a special character, so that should be escaped.
The escape sequence for a backslash is \\ (ie two backslashes), but you shouldn't need to think about specific escape codes -- if you're outputting JS data, you should be outputting it using proper escaping for the whole string, which generally means you should be using JSON encoding.
Most server languages these days provide JSON encoding as a built-in feature. You haven't specified which language your server is using, but for example if it's written in PHP, you would output your string as json_encode($string) rather than just outputting $string directly. Other languages provide a similar feature. This will protect you not just from broken backslash characters, but also from other errors, such as quote marks or line feeds in your strings, which will also cause errors if you put them into JavaScript code as an unescaped string.
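To illustrate what proper encoding buys you, here is a small round trip in JavaScript. JSON.stringify only stands in for whatever JSON encoder the server uses (that part is an assumption); the point is that the backslash survives decoding on the client:

```javascript
// The value the server wants to send: support, one backslash, testing.
var serverValue = "support\\testing";

// Stand-in for the server-side encoder (e.g. json_encode in PHP).
var wireFormat = JSON.stringify(serverValue);
console.log(wireFormat); // "support\\testing"

// Client side: decoding restores the single backslash, no tab appears.
var clientValue = JSON.parse(wireFormat);
console.log(clientValue === serverValue); // true
console.log(clientValue.indexOf("\t"));   // -1
```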
You can use tagged template literals
var str = (s => s.raw)`support\testing`[0]
The anonymous arrow function will serve as tag and s.raw contains the original input
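The built-in String.raw tag does the same job without a custom tag function. Note the limitation, though: like any template literal, this only helps when the text is written literally in your source code, not when it arrives as a value from the server.

```javascript
// String.raw leaves backslashes in the literal untouched.
var rawPath = String.raw`support\testing`;

console.log(rawPath);                 // support\testing
console.log(rawPath.indexOf("\t"));   // -1, no tab character
```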
If you are able to change the server-side code, you should add the escape character there: "support\\testing".
That will result in the desired result.
You can do a simple replace:
str = str.replace(/\t/g, "\\t");
And do this for other characters you need replacing.
Best Solution for this
function valid(f) {
  // Characters that are not allowed in the field.
  var s = "!##$%^&*()+=-[]\\\';,./{}|\":<>?~";
  var str = f.value;
  for (var i = 0; i < str.length; i++) {
    if (s.indexOf(str.charAt(i)) != -1) {
      // Remove the first disallowed character found and reject the input.
      f.value = f.value.replace(str.charAt(i), '');
      return false;
    }
  }
}
I am trying to replace any non-encoded ampersands in a string in JavaScript and was wondering whether this is possible. I have built a regex that detects them in the string, but when I do a replace, I lose the parameter name.
Current input:
http://www.somesite.com/id/2343?paramA=1&paramB=asdf
From a textarea
<textarea id='test-box'>http://www.somesite.com/id/2343?paramA=1&paramB=asdf</textarea>
var str = $('#test-box').val();
var regex = /&[a-z]+=/gi;
str = str.replace(regex, [REPLACE &'s WITH &amp;'s]);
console.log(str);
Desired output:
http://www.somesite.com/id/2343?paramA=1&amp;paramB=asdf
How can I use JavaScript to keep the name of the parameter and simply replace the '&' with '&amp;'?
Try this regex: /&(?=[a-z]+=)/gi and this replacement: &amp;
This uses a lookahead assertion rather than eating up the parameter name.
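Put together with the URL from the question, a minimal sketch (variable names are only illustrative):

```javascript
var urlIn = "http://www.somesite.com/id/2343?paramA=1&paramB=asdf";

// The lookahead checks that "name=" follows, but does not consume it,
// so only the & itself is replaced.
var urlOut = urlIn.replace(/&(?=[a-z]+=)/gi, "&amp;");

console.log(urlOut);
// http://www.somesite.com/id/2343?paramA=1&amp;paramB=asdf
```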
If you have a URL which might be partially encoded in HTML, and you're trying to make a best effort at producing XHTML validating textarea content, then you can use the list of HTML character references to identify ampersands which are not part of an HTML character reference:
str.replace(/&(?!#(?:[0-9]|[xX][0-9A-Fa-f])|lt;|gt;|amp|...)/g, '&amp;')
where ... is replaced with the set of entities from that list that you care to recognize.
Note that most of those character references end in semicolon, so are not allowed to be followed immediately by an equals sign, so are not ambiguous with URL parameters. Only certain entities can appear without a semicolon for backwards compatibility.
If you don't care about validating, then you can just let the browser take care of it by ensuring that your URL doesn't contain the substring </textarea by doing something like
str.replace(/</g, '%3c')
Apart from a lookahead assertion, you can also use a backreference:
var regex = /&([a-z]+)=/gi;
str = str.replace(regex, '&amp;$1=');
When $n appears in the replacement string, it is replaced by the text matched by the n-th parenthesized group in the regexp. Because the pattern also consumes the =, the replacement has to put it back.
Who needs regex when you've got jQuery html(). Especially since you've got a jquery tag on your question :D
What this does is leverage the browser's innerHTML property. see api
var str = 'http://www.somesite.com/id/2343?paramA=1&paramB=asd';
$('#test-box').text(str);
$('#html-box').text($('#test-box').html());
From this q/a, I deduced that matching all instances of a given regex not inside quotes, is impossible. That is, it can't match escaped quotes (ex: "this whole \"match\" should be taken"). If there is a way to do it that I don't know about, that would solve my problem.
If not, however, I'd like to know if there is any efficient alternative that could be used in JavaScript. I've thought about it a bit, but can't come with any elegant solutions that would work in most, if not all, cases.
Specifically, I just need the alternative to work with .split() and .replace() methods, but if it could be more generalized, that would be the best.
For Example:
An input string of: +bar+baz"not+or\"+or+\"this+"foo+bar+
replacing + with #, not inside quotes, would return: #bar#baz"not+or\"+or+\"this+"foo#bar#
Actually, you can match all instances of a regex not inside quotes for any string in which each opening quote is closed again. Say, as in your example above, you want to match \+.
The key observation here is that a match is outside quotes if an even number of quotes follows it. This can be modeled as a look-ahead assertion:
\+(?=([^"]*"[^"]*")*[^"]*$)
Now, you'd like to not count escaped quotes. This gets a little more complicated. Instead of [^"]*, which advances to the next quote, you need to consider backslashes as well and use [^"\\]*. After you arrive at either a backslash or a quote, you need to skip the next character if you hit a backslash, or else advance to the next unescaped quote. That looks like (\\.|"([^"\\]*\\.)*[^"\\]*"). Combined, you arrive at
\+(?=([^"\\]*(\\.|"([^"\\]*\\.)*[^"\\]*"))*[^"]*$)
I admit it is a little cryptic. =)
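To check it against the example from the question, here is a small sketch. The variable names are made up, and the backslashes in the sample are doubled only because it is written as a JavaScript string literal:

```javascript
// The combined look-ahead pattern from above, with the g flag added.
var outsideQuotes = /\+(?=([^"\\]*(\\.|"([^"\\]*\\.)*[^"\\]*"))*[^"]*$)/g;

// Actual text: +bar+baz"not+or\"+or+\"this+"foo+bar+
var sample = '+bar+baz"not+or\\"+or+\\"this+"foo+bar+';

var hashed = sample.replace(outsideQuotes, "#");
console.log(hashed);
// #bar#baz"not+or\"+or+\"this+"foo#bar#
```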
Azmisov, resurrecting this question because you said you were looking for any efficient alternative that could be used in JavaScript and any elegant solutions that would work in most, if not all, cases.
There happens to be a simple, general solution that wasn't mentioned.
Compared with alternatives, the regex for this solution is amazingly simple:
"[^"]+"|(\+)
The idea is that we match but ignore anything within quotes to neutralize that content (on the left side of the alternation). On the right side, we capture all the + that were not neutralized into Group 1, and the replace function examines Group 1. Here is full working code:
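<script>
var subject = '+bar+baz"not+these+"foo+bar+';
var regex = /"[^"]+"|(\+)/g;
var replaced = subject.replace(regex, function(m, group1) {
    // A quoted string matched without setting Group 1: return it unchanged.
    if (!group1) return m;
    // A + captured in Group 1 is outside quotes: replace it.
    else return "#";
});
document.write(replaced);
</script>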
You can use the same principle to match or split. See the question and article in the reference, which will also point you to code samples.
Hope this gives you a different idea of a very general way to do this. :)
What about Empty Strings?
The above is a general answer to showcase the technique. It can be tweaked depending on your exact needs. If you worry that your text might contain empty strings, just change the quantifier inside the string-capture expression from + to *:
"[^"]*"|(\+)
See demo.
What about Escaped Quotes?
Again, the above is a general answer to showcase the technique. Not only can the "ignore this match" regex be refined to your needs, you can also add multiple expressions to ignore. For instance, if you want to make sure escaped quotes are adequately ignored, you can start by adding an alternation \\"| in front of the other two in order to match (and ignore) straggling escaped double quotes.
Next, within the section "[^"]*" that captures the content of double-quoted strings, you can add an alternation to ensure escaped double quotes are matched before their " has a chance to turn into a closing sentinel, turning it into "(?:\\"|[^"])*"
The resulting expression has three branches:
\\" to match and ignore
"(?:\\"|[^"])*" to match and ignore
(\+) to match, capture and handle
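Note that in other regex flavors, we could do this job more easily with lookbehind. JS did not support lookbehind when this was written; it was added in ES2018, so check your target environments before relying on it.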
The full regex becomes:
\\"|"(?:\\"|[^"])*"|(\+)
See regex demo and full script.
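As a rough sketch, the three-branch expression can be dropped into the same replace callback as before. The names here are illustrative, and the check uses === undefined so that only a real Group 1 capture triggers the replacement:

```javascript
// Branches 1 and 2 match (and thereby skip) escaped quotes and quoted
// strings; branch 3 captures the + signs we actually want to handle.
var ignoreQuoted = /\\"|"(?:\\"|[^"])*"|(\+)/g;

// Actual text: +bar+baz"not+or\"+or+\"this+"foo+bar+
var quotedInput = '+bar+baz"not+or\\"+or+\\"this+"foo+bar+';

var quotedResult = quotedInput.replace(ignoreQuoted, function (match, plus) {
  // No capture means we matched something to ignore: put it back as is.
  return plus === undefined ? match : "#";
});

console.log(quotedResult);
// #bar#baz"not+or\"+or+\"this+"foo#bar#
```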
Reference
How to match pattern except in situations s1, s2, s3
How to match a pattern unless...
You can do it in three steps.
Use a regex global replace to extract all string body contents into a side-table.
Do your comma translation
Use a regex global replace to swap the string bodies back
Code below
// Step 1
var sideTable = [];
myString = myString.replace(
/"(?:[^"\\]|\\.)*"/g,
function (_) {
var index = sideTable.length;
sideTable[index] = _;
return '"' + index + '"';
});
// Step 2, replace commas with newlines
myString = myString.replace(/,/g, "\n");
// Step 3, swap the string bodies back
myString = myString.replace(/"(\d+)"/g,
function (_, index) {
return sideTable[index];
});
If you run that after setting
myString = '{:a "ab,cd, efg", :b "ab,def, egf,", :c "Conjecture"}';
you should get
{:a "ab,cd, efg"
:b "ab,def, egf,"
:c "Conjecture"}
It works, because after step 1,
myString = '{:a "0", :b "1", :c "2"}'
sideTable = ['"ab,cd, efg"', '"ab,def, egf,"', '"Conjecture"'];
so the only commas in myString are outside strings. Step 2, then turns commas into newlines:
myString = '{:a "0"\n :b "1"\n :c "2"}'
Finally we replace the strings that only contain numbers with their original content.
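The three steps can also be wrapped into one reusable function. This is only a sketch: the name replaceOutsideStrings and its parameters are invented here, and it assumes the replacement itself never produces text that looks like a quoted number such as "0".

```javascript
// Hypothetical wrapper around the three steps described above.
function replaceOutsideStrings(text, pattern, replacement) {
  var sideTable = [];
  // Step 1: park every quoted string in the side table, leaving "<index>".
  var masked = text.replace(/"(?:[^"\\]|\\.)*"/g, function (literal) {
    sideTable.push(literal);
    return '"' + (sideTable.length - 1) + '"';
  });
  // Step 2: do the real replacement; no original string content is left to hit.
  var translated = masked.replace(pattern, replacement);
  // Step 3: swap the parked strings back in.
  return translated.replace(/"(\d+)"/g, function (_, index) {
    return sideTable[index];
  });
}

var plusExample = replaceOutsideStrings(
  '+bar+baz"not+or\\"+or+\\"this+"foo+bar+', /\+/g, "#");
console.log(plusExample);
// #bar#baz"not+or\"+or+\"this+"foo#bar#
```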
Although the answer by zx81 seems to be the cleanest and best-performing one, it needs these fixes to correctly catch escaped quotes:
var subject = '+bar+baz"not+or\\"+or+\\"this+"foo+bar+';
and
var regex = /"(?:[^"\\]|\\.)*"|(\+)/g;
Also use the already mentioned "group1 === undefined" or "!group1" check.
The second fix (the regex) seems especially important in order to take everything asked in the original question into account.
It should be mentioned though that this method implicitly requires the string to not have escaped quotes outside of unescaped quote pairs.
I'm trying to remove square-bracket tags (BBCode style) using JavaScript, in order to strip unwanted BBCode.
I tried this:
theString.replace(/\[quote[^\/]+\]*\[\/quote\]/, "")
It works with this sample string:
theString = "[quote=MyName;225]Test 123[/quote]";
but it fails with this sample:
theString = "[quote=MyName;225]Test [quote]inside quotes[/quote]123[/quote]";
If there is a solution that doesn't use regex, that's no problem.
The other 2 solutions simply do not work (see my comments). To solve this problem you first need to craft a regex which matches the innermost matching quote elements (which contain neither [QUOTE..] nor [/QUOTE]). Next, you need to iterate, applying this regex over and over until there are no more QUOTE elements left. This tested function does what you want:
function filterQuotes(text)
{ // Regex matches inner [QUOTE]non-quote-stuff[/quote] tag.
var re = /\[quote[^\[]+(?:(?!\[\/?quote\b)\[[^\[]*)*\[\/quote\]/ig;
while (text.search(re) !== -1)
{ // Need to iterate removing QUOTEs from inside out.
text = text.replace(re, "");
}
return text;
}
Note that this regex employs Jeffrey Friedl's "Unrolling the loop" efficiency technique and is not only accurate, but is quite fast to boot.
See: Mastering Regular Expressions (3rd Edition) (highly recommended).
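As an alternative to iterating, the nesting can be tracked in a single pass with a depth counter. This is a different technique from the function above and only a sketch: the name stripQuotesByDepth is made up, and it behaves differently on malformed input (an unclosed opening tag swallows the rest of the text, and a stray closing tag is simply dropped).

```javascript
// Hypothetical single-pass variant: keep text only while outside all quotes.
function stripQuotesByDepth(text) {
  var tagRe = /\[quote(?:=[^\]]*)?\]|\[\/quote\]/gi;
  var out = "";
  var depth = 0; // how many [quote] tags are currently open
  var last = 0;  // index just after the previous tag
  var m;
  while ((m = tagRe.exec(text)) !== null) {
    if (depth === 0) out += text.slice(last, m.index); // text outside quotes
    if (m[0].charAt(1) === "/") {
      if (depth > 0) depth--; // closing tag
    } else {
      depth++;                // opening tag
    }
    last = tagRe.lastIndex;
  }
  if (depth === 0) out += text.slice(last);
  return out;
}

console.log(JSON.stringify(stripQuotesByDepth(
  "[quote=MyName;225]Test [quote]inside quotes[/quote]123[/quote]"))); // ""
```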
Try this one:
/\[quote[^\/]+\].*\[\/quote\]$/
The $ sign indicates that only the closing quote element at the end of the string should be used to determine the ending of the quote you're trying to remove.
I also added a "." before the asterisk so that it matches any character in between. I tested this with your two strings and it worked.
Edit: I don't know exactly how you are using this, but as an addition: if you want the pattern to also match a string where no attributes are given, for example:
[quote]Hello[/quote]
You should change the "+" sign into an asterisk as well like this:
/\[quote[^\/]*\].*\[\/quote\]$/
This answer has flaws, see Ridgerunner's answer for a more correct one.
Here's my crack at it.
function filterQuotes(text)
{
return text.replace(/\[(\/)?quote([^\/]*)?\]/g,"");
}