Convert &# values to Unicode characters - JavaScript

I have lots of characters in the form &#182; which I would like to display as Unicode characters in my text editor.
This ought to convert them:
var newtext = doctext.replace(
/&#(\d+);/g,
String.fromCharCode(parseInt("$1", 10))
);
But doesn't seem to work. The regular expression /&#(\d+);/ is getting me the numbers out -- but the String.fromCharCode does not appear to give the results I'd like. What is up?

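The replacement part should be a function instead of an expression. The expression is evaluated only once, before replace() runs, and at that point "$1" is just a literal string, not the captured group. A function, by contrast, is called for every match and receives the captured groups as arguments: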
var newtext = doctext.replace(
/&#(\d+);/g,
function($0, $1) {
return String.fromCharCode(parseInt($1, 10));
}
);
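A small sketch that makes the difference concrete: it shows what the question's eager expression evaluates to, then the callback version. It swaps in String.fromCodePoint (ES2015) for fromCharCode so that entities above 65535, such as emoji, also decode. decodeDecimalEntities is an illustrative name, not an existing API.

```javascript
// The question's version: evaluated once, before replace() runs, with the
// literal text "$1". parseInt("$1", 10) is NaN, and fromCharCode(NaN) is
// "\u0000", so every entity would be replaced by a NUL character.
var eager = String.fromCharCode(parseInt("$1", 10));

// Hypothetical helper: the callback runs once per match and receives the
// captured digits as its second argument.
function decodeDecimalEntities(text) {
  return text.replace(/&#(\d+);/g, function (match, digits) {
    // fromCodePoint, unlike fromCharCode, handles code points above U+FFFF.
    // It throws a RangeError for values above 0x10FFFF.
    return String.fromCodePoint(parseInt(digits, 10));
  });
}

console.log(decodeDecimalEntities("&#182; and &#128512;")); // ¶ and 😀
```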

The replace method is not foolproof if you handle full HTML (i.e. you don't control what the input is). For example, the method submitted by Jack (and obviously the idea in the original post as well) works excellently if your entities are all decimal, but doesn't work for hex entities like &#x41;, and even less for named entities like &quot;.
For this, there is another trick you can do: create an element, set its innerHTML to the source, then read out its text value. Basically, browsers know what to do with entities, so we delegate. :) In jQuery it is easy:
$('<div/>').html('&amp;').text()
// => "&"
With plain JS it gets a bit more verbose:
var el = document.createElement('div'); // createElement requires a tag name
el.innerHTML = '&amp;';
el.textContent
// => "&"
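Outside a browser (in Node, say) there is no element to delegate to. A minimal sketch, assuming you only need numeric entities plus a handful of named ones: NAMED_ENTITIES below is a hypothetical, deliberately incomplete table (the HTML standard defines over two thousand names), and unknown names are left untouched rather than guessed. Inside a browser, parsing untrusted input with DOMParser rather than assigning it to innerHTML of a live element is generally safer, because inline event handlers in the input are not run.

```javascript
// Hypothetical, deliberately small table of named entities.
var NAMED_ENTITIES = {
  amp: "&", lt: "<", gt: ">", quot: '"', apos: "'", nbsp: "\u00A0"
};

function codePointOrMatch(n, match) {
  // fromCodePoint throws above U+10FFFF, so leave such entities alone.
  return n <= 0x10FFFF ? String.fromCodePoint(n) : match;
}

// Decodes &#NNN; (decimal), &#xHH; (hex) and the named entities above.
function decodeEntities(text) {
  return text.replace(/&(?:#(\d+)|#[xX]([0-9a-fA-F]+)|([a-zA-Z][a-zA-Z0-9]*));/g,
    function (match, dec, hex, name) {
      if (dec !== undefined) return codePointOrMatch(parseInt(dec, 10), match);
      if (hex !== undefined) return codePointOrMatch(parseInt(hex, 16), match);
      // Unknown names are returned unchanged.
      return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, name)
        ? NAMED_ENTITIES[name]
        : match;
    });
}

console.log(decodeEntities("&quot;a&quot; &amp; &#x41;&#66;")); // "a" & AB
```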

Related

Why the .replace() and toUppercase() did not work in the second function? [duplicate]

I want to replace smart quotes like ‘, ’, “ and ” with regular quotes. I also want to replace ©, ® and ™. I used the following code, but it doesn't help.
Kindly help me to resolve this issue.
str.replace(/[“”]/g, '"');
str.replace(/[‘’]/g, "'");
Use:
str = str.replace(/[“”]/g, '"');
str = str.replace(/[‘’]/g, "'");
or to do it in one statement:
str = str.replace(/[“”]/g, '"').replace(/[‘’]/g,"'");
In JavaScript (as in many other languages) strings are immutable - string "replacement" methods actually just return the new string instead of modifying the string in place.
The MDN JavaScript reference entry for replace states:
Returns a new string with some or all matches of a pattern replaced by a replacement.
…
This method does not change the String object it is called on. It simply returns a new string.
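The question also mentions ©, ® and ™ but not what they should become. A sketch that handles everything in one pass with a lookup table; the replacements "(c)", "(R)" and "(TM)" are placeholders chosen for illustration, and normalizeQuotes is a hypothetical name. Writing the characters as \u escapes also keeps the regex independent of the source file's encoding.

```javascript
// Placeholder targets for ©, ® and ™: the question does not specify them.
var REPLACEMENTS = {
  "\u2018": "'", "\u2019": "'",                      // ‘ ’ single smart quotes
  "\u201C": '"', "\u201D": '"',                      // “ ” double smart quotes
  "\u00A9": "(c)", "\u00AE": "(R)", "\u2122": "(TM)" // © ® ™
};

function normalizeQuotes(str) {
  // One character class covering every key above; the callback looks up
  // the replacement for whichever character matched.
  return str.replace(/[\u2018\u2019\u201C\u201D\u00A9\u00AE\u2122]/g, function (ch) {
    return REPLACEMENTS[ch];
  });
}

var str = "\u201CHi\u201D \u2018x\u2019 \u00A9";
str = normalizeQuotes(str); // reassign: replace() returns a new string
console.log(str); // "Hi" 'x' (c)
```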
replace returns the resulting string, so assign it back:
str = str.replace(/["']/, '');
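Two details in that last snippet are easy to miss: replace() leaves the original string alone, and without the g flag only the first match is replaced. A small illustration with a made-up sample string:

```javascript
var original = "'a' \"b\"";
var once = original.replace(/["']/, ""); // no g flag: first quote only
var all = original.replace(/["']/g, ""); // g flag: every quote

console.log(original); // 'a' "b"  (unchanged: strings are immutable)
console.log(once);     // a' "b"
console.log(all);      // a b
```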
The OP doesn't say why it isn't working, but there seem to be problems related to the encoding of the file. If I have an ANSI-encoded file and I do:
var s = "“This is a test” ‘Another test’";
s = s.replace(/[“”]/g, '"').replace(/[‘’]/g,"'");
document.writeln(s);
I get:
"This is a test" "Another test"
I converted the encoding to UTF-8, fixed the smart quotes (which broke when I changed encoding), then converted back to ANSI and the problem went away.
Note that when I copied and pasted the double and single smart quotes off this page into my test document (ANSI encoded) and ran this code:
var s = "“This is a test” ‘Another test’";
for (var i = 0; i < s.length; i++) {
document.writeln(s.charAt(i) + '=' + s.charCodeAt(i));
}
I discovered that all the smart quotes showed up as ? = 63.
So, to the OP: determine where the smart quotes are originating and make sure they are the character codes you expect. If they are not, consider changing the encoding of the source so they arrive as “ = 8220, ” = 8221, ‘ = 8216 and ’ = 8217. Use my loop to examine the source; if the smart quotes show up with any charCodeAt() values other than those I've listed, replace() will not work as written.
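A variant of that diagnostic loop that returns its results (so they can be logged to the console instead of written with document.writeln) and uses codePointAt, so a character outside the Basic Multilingual Plane shows up as one entry. listCodePoints is an illustrative name; the input is written with \u escapes so the check itself does not depend on the file's encoding.

```javascript
// Returns "char=code" for each character, e.g. "“=8220".
function listCodePoints(s) {
  // Array.from iterates by code point, so surrogate pairs stay together.
  return Array.from(s, function (ch) {
    return ch + "=" + ch.codePointAt(0);
  });
}

console.log(listCodePoints("\u201Ca\u201D").join(" ")); // “=8220 a=97 ”=8221
```

If the smart quotes in your text come out as ?=63 here, the characters were already lost before the script ran.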
To replace smart quotes with regular quotes, I am using a similar function. You should specify the character code, because the default settings of different computers and browsers may interpret the literal quote characters differently.
Using the character code produces exactly the character you intend, which eliminates room for error across different browsers and operating systems. This is also helpful for multilingual text (accents, etc.).
To replace smart single quotes with plain single quotes:
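function unSmartQuotify(n){
var name = n;
var apos = String.fromCharCode(39); // plain apostrophe '
var lsquo = String.fromCharCode(8216); // left single smart quote ‘
var rsquo = String.fromCharCode(8217); // right single smart quote ’
// Test the working copy (name), not the argument (n), or the loop never ends
while (name.indexOf(lsquo) > -1) name = name.replace(lsquo, apos);
while (name.indexOf(rsquo) > -1) name = name.replace(rsquo, apos);
return name;
}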
To find the other character codes you may need, check here.

Regex to find a specific string that is not in a HTML attribute

My case is: I have a string with HTML elements:
This is a text and "specific_string"
I need a regex that matches only the occurrence that is not inside an HTML attribute.
This is my current regex. It works, but it gives a false positive when the string is wrapped in double quotes:
((?!\"[\w\s]*)specific_string(?![\w\s]*\"))
If you want to get what's inside the tag, you could try split() to cut the string at every ">" and "<", basically like this:
let string = "<a href='something+specific_string' title='testing'>This is a text and 'specific_string'</a>";
string = string.split('>');
string = string[1].split('<');
console.log(string)
Then, when you want to manipulate it, just use index 0 of the resulting array. It's not a regex like you wanted, but it's an idea.
Though it can suffice in simple cases, you should know that RegExp is often said to be ill-suited for parsing HTML, and depending on your environment you could be better off using more robust techniques. (http://htmlparsing.com/ is dedicated to the topic, but it doesn't discuss JS.)
That said, the following works in Chrome 107 and Node 16.13.
(s=>s.match(/(?<=>[^<]*|^[^<]*)specific_string/))
('This is a text and "specific_string"')
It uses look-behind. Where look-behind is unavailable, you could use /(>[^<]*|^[^<]*)(specific_string)/ instead and adjust the index/length to get the position of a match.
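A sketch of that fallback, assuming no look-behind support: group 1 captures the text between the previous > (or the start of the string) and the target, so match.index + m[1].length is where the target starts. It swaps the greedy [^<]* for a lazy [^<]*? so the first occurrence in a run of text is found, and like any regex-only approach it is fooled by a > inside an attribute value. indexOutsideTags is a hypothetical name.

```javascript
// Index of the first "specific_string" that sits in text content rather
// than inside a tag, or -1 if there is none.
function indexOutsideTags(html) {
  var m = /(>[^<]*?|^[^<]*?)(specific_string)/.exec(html);
  // m.index is where group 1 starts; skip past it to reach the target.
  return m ? m.index + m[1].length : -1;
}

var sample = '<a href="specific_string">see specific_string</a>';
console.log(indexOutsideTags(sample)); // 30 (the occurrence in the link text)
```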
As you answer in a comment that you'll replace in user-provided HTML, I encourage you to consider security implications (namely XSS).
Back on the topic of parsing HTML without RegExp: we obviously have those techniques in a web browser, and I couldn't stop myself from writing a quick and dirty text-node replacer in web JS, working in Chrome 107:
((html, fun) => {
const el = document.createElement('body')
el.innerHTML = html
const X = new XPathEvaluator, R = X.evaluate('//*[text()]', el)
const A = []; for (let n; n = R.iterateNext();) A.push(n) // mutating el while iterating XPathResult is illegal
for (let n of A) fun(n)
return el.innerHTML})
('This is a text and "specific_string"',
n => n.innerHTML = n.innerHTML
.replace(/specific_string/, '<b>replaced</b>'))
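Where there is no DOM (in Node, for instance), a rough string-only alternative is to split the input into tags and text with a capturing split, then run the replacement on the text pieces only. This is a simplification, not a parser: it assumes no > inside attribute values, comments or script bodies. replaceOutsideTags is an illustrative name.

```javascript
function replaceOutsideTags(html, pattern, replacement) {
  // With a capture group, split() keeps the separators: tags land at odd
  // indices, the text between them at even indices.
  return html
    .split(/(<[^>]*>)/)
    .map(function (part, i) {
      return i % 2 === 0 ? part.replace(pattern, replacement) : part;
    })
    .join("");
}

console.log(replaceOutsideTags(
  '<a href="specific_string">see specific_string</a>',
  /specific_string/g,
  "<b>replaced</b>"
)); // <a href="specific_string">see <b>replaced</b></a>
```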

replace multiple words in string based on an array [duplicate]


Converting ampersand (&) and blank space to a dash (-) in URLs using regex

With the code below, I have converted the following names into URLs such as:
Love & Relationships to http://domain.org/love-relationships
Career & Guidance to http://domain.org/career-guidance
filter('ampToDash', function(){
return function(text){
return text ? String(text).replace(/ & /g,'-'): '';
};
}).filter('dashToAmp', function(){
return function(text){
return text ? String(text).replace(/-/g,' & '): '';
};
})
However, I have a new set of names and I can't figure out how to do both at the same time.
Being Human to http://domain.org/being-human
Competitive Exams to http://domain.org/competitive-exams
filter('ampToDash', function(){
return function(text){
return text ? String(text).replace(/ /g,'-'): '';
};
}).filter('dashToAmp', function(){
return function(text){
return text ? String(text).replace(/-/g,' '): '';
};
})
How do I combine both the regex codes so it can work hand in hand?
You may also want to extend your replacement criteria to cover all "non-word" characters, instead of just accounting for the ones you're currently aware of (& and space). This would be more future-proof, and perhaps easier to reason about:
String(text).replace(/\W+/g, '-')
(\W+ means any sequence of non-word characters.)
Example:
'Jack & Jill went up the #$%#! hill'.replace(/\W+/g, '-')
Yields:
Jack-Jill-went-up-the-hill
And because there's loss of information (i.e. you don't know what exactly leads to a '-' by looking at the transformed string), a way you can find the original string is to simply store it and look up by the transformed string. To elaborate: You're probably going to be looking up some document from this new string (a "slug", as others pointed out). Store the slug along with the document and just look up the document (and its original title) from your database.
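Putting those two ideas together, a minimal sketch: lowercase the title (the question's URLs are lowercase), collapse each run of non-word characters to one dash, trim stray dashes at the ends, and keep the original title keyed by its slug. The in-memory Map stands in for the database, and slugify, registerTitle and titleForSlug are hypothetical names. Note that \W treats _ as a word character and accented letters such as é as non-word characters.

```javascript
function slugify(text) {
  return String(text)
    .toLowerCase()
    .replace(/\W+/g, "-")     // each run of non-word characters -> one dash
    .replace(/^-+|-+$/g, ""); // drop dashes left at either end
}

// Stand-in for a database table mapping slug -> original title.
var titlesBySlug = new Map();

function registerTitle(title) {
  var slug = slugify(title);
  titlesBySlug.set(slug, title);
  return slug;
}

function titleForSlug(slug) {
  return titlesBySlug.get(slug); // undefined if the slug is unknown
}

registerTitle("Love & Relationships"); // returns "love-relationships"
titleForSlug("love-relationships");    // returns "Love & Relationships"
```

Two titles that slugify identically would overwrite each other in this sketch; a real system would need to detect such collisions.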
It looks like you simply want to replace any ampersand surrounded by whitespace, or any run of whitespace, with a single hyphen. If so, you could use the following expression:
// Replace " & " (with its surrounding whitespace) or a run of whitespace with a hyphen
return text ? String(text).replace(/(\s+&\s+|\s+)/g, '-') : '';
Example
var input = ['Love & Relationships', 'Career & Guidance', 'Being Human', 'Competitive Exams'];
// forEach rather than for...in, which iterates keys and is not meant for arrays
input.forEach(function (phrase) {
console.log(phrase + ' -> ' + phrase.replace(/(\s+&\s+|\s+)/g, '-'));
});
I think you are looking for a lib that converts a string into a slug.
You can do this manually, but you'll probably have a hard time covering other edge cases.
I would suggest using something like:
https://github.com/dodo/node-slug
Or check out this gist if you really want to stick with the regex approach: https://gist.github.com/mathewbyrne/1280286
You have two separate problems:
1. how to 'slugify' a string
2. how to undo / reverse the slugify.
To answer 1: A generic slugify method would be something like: text.replace(/\W+/g, '-')
To answer 2: you can't. Your function (ampToDash) can produce the same output from different inputs, so there is NO equivalent of dashToAmp any more.
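A quick way to see why: with the combined expression from the earlier answer, two different titles collapse to the same string, so no dashToAmp could know which one to restore. ampToDash here is a plain-function stand-in for the question's filter, not the filter itself.

```javascript
// Plain-function stand-in for the ampToDash filter, using the combined regex.
function ampToDash(text) {
  return text ? String(text).replace(/(\s+&\s+|\s+)/g, "-") : "";
}

var a = ampToDash("Love & Relationships");
var b = ampToDash("Love Relationships");
console.log(a, b, a === b); // Love-Relationships Love-Relationships true
```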

Replace with RegExp only outside tags in the string

I have strings in which some HTML tags may be present, like:
this is a nice day for bowling <b>bbbb</b>
How can I use a RegExp to replace all b characters (with :blablabla:, for example), but ONLY outside HTML tags?
So in that case the resulting string should become
this is a nice day for :blablabla:owling <b>bbbb</b>
EDIT: I would like to be more specific, based on the answers I have received. First of all, I have just a string, not a DOM element or anything else. The string may or may not contain tags (opening and closing). The main idea is to be able to replace anywhere in the text except inside tags. For example, if I have a string like
not feeling well today :/ check out this link http://example.com
the regexp should replace only the first :/ with a real smiley image, but not the second and third, because they are inside (and part of) a tag. Here's an example snippet using the regexp from one of the answers.
var s = 'not feeling well today :/ check out this link http://example.com';
var replaced = s.replace(/(?:<[^\/]*?.*?<\/.*?>)|(:\/)/g, "smiley_image_here");
document.querySelector("pre").textContent = replaced;
<pre></pre>
Strangely, the demo shows that the regexp captures the correct group, but the same regexp doesn't seem to work in the replace function.
The regex itself to replace all bs with :blablabla: is not that hard:
.replace(/b/g, ":blablabla:")
It is a bit tricky to get the text nodes where we need to perform search and replace.
Here is a DOM-based example:
function replaceTextOutsideTags(input) {
var doc = document.createDocumentFragment();
var wrapper = document.createElement('myelt');
wrapper.innerHTML = input;
doc.appendChild( wrapper );
return textNodesUnder(doc);
}
function textNodesUnder(el){
var n, walk=document.createTreeWalker(el,NodeFilter.SHOW_TEXT,null,false);
while(n=walk.nextNode())
{
if (n.parentNode.nodeName.toLowerCase() === 'myelt')
n.nodeValue = n.nodeValue.replace(/:\/(?!\/)/g, "smiley_here");
}
return el.firstChild.innerHTML;
}
var s = 'not feeling well today :/ check out this link http://example.com';
console.log(replaceTextOutsideTags(s));
Here, we only modify the text nodes that are direct children of the custom-created element named myelt.
Result:
not feeling well today smiley_here check out this link http://example.com
var input = "this is a nice day for bowling <b>bbbb</b>";
var result = input.replace(/(^|>)([^<]*)(<|$)/g, function(_,a,b,c){
return a
+ b.replace(/b/g, ':blablabla:')
+ c;
});
document.querySelector("pre").textContent = result;
<pre></pre>
You can do this:
var result = input.replace(/(^|>)([^<]*)(<|$)/g, function(_,a,b,c){
return a
+ b.replace(/b/g, ':blablabla:') // you may do something else here
+ c;
});
Note that in most (not all, but most) real, complex use cases, it's much more convenient to manipulate a parsed DOM than a plain string. If you're starting with an HTML page, you might use a library (some, like mine, accept regexes for this).
I think you can use a regex like this (only for simple, non-nested data):
/<[^\/]*?b.*?<\/.*?>|(b)/ig
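This also explains the behaviour described in the question's edit: with a string as the second argument, replace() substitutes the whole match, so whenever the tag alternative matches, the tag itself gets replaced, even though a demo shows the capture group as expected. Passing a function that returns the match unchanged whenever group 1 is empty keeps the tags. A sketch using the regex above; replaceBOutsideTags is an illustrative name.

```javascript
function replaceBOutsideTags(input) {
  // Alternative 1 matches a whole simple element (the regex from the answer
  // above); alternative 2 captures a lone b in group 1.
  return input.replace(/<[^\/]*?b.*?<\/.*?>|(b)/ig, function (match, b) {
    // Group 1 is undefined when alternative 1 matched: keep the element as-is.
    return b === undefined ? match : ":blablabla:";
  });
}

console.log(replaceBOutsideTags("this is a nice day for bowling <b>bbbb</b>"));
// this is a nice day for :blablabla:owling <b>bbbb</b>
```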
If you want to use a regex, I suggest applying the one below repeatedly until all tags are removed:
/<[^\/][^<]*>[^<]*<\/.*?>/g
then use replace to find any b.
