I'm trying to figure out a way to play a sound notification whenever the phrase "2 credits" appears in the top right corner on a specific website at: https://www.submithub.com/. As it stands, that specific area of the site oscillates between "0 credits" and "2 credits" depending on if you've submitted to the site within the last few hours. Once you are able to submit again, the text changes to "2 credits."
I have also provided a screenshot of what I am talking about:
I have been using code primarily derived from the following StackOverflow thread: How can I have a Tampermonkey userscript play a sound when a specific word appears on a page?
Here is the code that I am working with:
// ==UserScript==
// #name SubmitHub Notification
// #include https://www.submithub.com/
// #grant none
// ==/UserScript==
/* eslint-disable no-multi-spaces */
//-- Set the watched terms using the awesome power of regex.
var watchRegx = /\b2 credits\b/i;
var alrtSound = new Audio ("data:audio/mp3;base64,SUQzAwAAAAAAIVRYWFgAAAAXAAAARW5jb2RlZCBieQBMYXZmNTIuMTYuMP/7kGQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAEluZm8AAAAPAAAACAAADrAAICAgICAgICAgICAgQEBAQEBAQEBAQEBAYGBgYGBgYGBgYGBgYICAgICAgICAgICAgKCgoKCgoKCgoKCgoKDAwMDAwMDAwMDAwMDg4ODg4ODg4ODg4ODg////////////////AAAAOUxBTUUzLjk5cgGqAAAAAAAAAAAUgCQElk4AAIAAAA6wvc1zzAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAP/7kGQAAAKyHsyVJMAAOAy3uqAUAFT9LU1Zt4AIukIiwwAgAAAA9HIgBAEAwgyaBAghk56oAwtO7JkyaeeIiIjP//EeyER/4iLu71jAwCFQIFDiwff4Jh/iAEAQ/UCAIBj/lwfB8Hw+CAIAgCADB8HwfSCAIOplz/0kwAADgAQYA/5znv//////5CEIT///v/yf5CfnIRuQn/n////kOc5znP/8jeggHADAMDh8Ph5/AAAAAYUp5aDYUTi1QAWFMPETMjUzsuArMZyNpdpzhQkN4CjB1E0a1CAN6BImTqNREhUBOI6RumQDCAomOhRLDyN4nogSoN5PrcxzsBYIy6WD9OQ87qBRokvznRrTaH4YmZFIYuUq3RVh6eByx4i6hHWfVr7tuLO8tua7lJATDjemKqNPuMmH7JW92rNXzyPMg4rJTvbwW9WM9osZia3K3///i1cP///4LnJ//////////v///9/9//a1f3+jfbl//fwhFFf3XYn/ryOUk43///6nI////5oYQjBEFVSgNgMavLKm6gKLJbs5loiEGlFGFP/7kmQMAEQ0Q1Y/aeACM0AIveCIAJFg+1JtJHbQwoAitBCJusgqIYmUAsoUCIhDIVAeLquwP4GuiiKRji+fby/u+wrGlicP7S/EGK+xNjEHdvHU7I7iqZ+0Ha5McNXOSnYUKPZDzxn1Kxv8f/feRHmbV1/jUCt/8avDh3iRIUFwckm5R4j+PiA8iPJnjhmbeJ4KEUWySNyhokkTEwXbeHnEKLbkJQPX/9n79H1UoaQa95V7dUV1uj9P//Sm2QXADEEjqWAKMAAAAGRRMFLIlxy7gHgI6KFAguWWAbEwIU3C0y44W/GrUHlcHIEAVQBzYGXjBcmPgAqAAxekACC4jvSOgbpMN5UYfLkNBYHmCKJGCYfJbBMVitHS0E7hFA4SMiVxAhHUZ0fE6MkW1EuudNt2ghIAYx05e+biCz57pWF7B1Egai14IZZAGYZ/////26AxGJAwiCK4xUVvUJOh9dlv/3f1vsU1ZnNoAm5pf2N/4mq//2DppS5gYUSGwyFSp0N1QOEJOPGxgSGdIDIIDEwUqC4gADFzQ2sbBRMUMZUATFj/+5JkEA/kiETUg3lMMC1gCN0EIm6RHQdUDWFyyL82IYAAm9k4BCJmyW7RjtwEQhixgRHgHwf8zJw4oQ8jVAYo9LktasuRHJtZyVCdyuGxQLKnZbiFCyOxCIVz0ycVlVXroBWYZfZEasAQkOGmyYNJDSEk85ianyVWnubGXqU4JV/v/qpJyjG15nlx3xq+7SSUYv59rJRpZHBJKkSQ1NSBCZrrHEvNSuAHfQxQuc9/7WKZ/axL7Ptf//0/9lS9K1tgUsOCAYVAUxPY6i9BEWBRckGNQcYHAQc/BCUIIJFI6mRPGrML3NvBNAcM6GOQwLBww5EkBm1RrYVZhw0inUa6qN5lhBGl/GQl0HQeVv3bi0iapMRCtVs50s8uwxc9KaY/LDI6ZBWu5FNusdOHVUl2+ifTa97/f39xDnW/mIq4/hyhKJjlobFruTjVv8+ln0PhirMBzZKH7z/Uj/Nl9F73nP03zNz9u+d+pCYXW3zjL3HTRxkTUYUcRA5XDARkhpgK3Q0EifsdESIAIQsicE3QU7PKYMMpnDIcRAYoBmSgJjSK//uSZBOORGVC1INsVjAlQBjNBCIAEGEPVmywWlDMtOIAEI75c4JAYEMFmxo2GRQv+YQYmjgqP4gDi1CF7c1mlAFNF6xAClYVDEHxTOmgAgBeDFeB4SNO6lwuER9+7NGv9kv2HZasHB8tHl/LyzaCDScvd/zCOk65r083ydtGJpv5CChh+/EWJ8RG9sAgAnAiBIUUk+/+r6//1eS/2//9qf/0f/vbbYtLTaCpIw9APgAeOrtOcdSZgPuOIC2nDoJeVnaBKUNKTyLrDycbLnCAY65DsWMqcDfDogkMwsEtOkz4WcYakQxhCc4AQCDhwaMj2zeCYNnNvu1CpGs5PLIz26sU2nFnReldZXF5VI03Sja9KD08nFlasI5zqlZE6eraVd/mDDkAXkIEMglhv///+z/R7/kg/368jL/l+Kczfnw///X///uy9jvIjgLfva+X9f/////LfhyOlOHT+6yFxrBxypQPaqwkzTqELKNCVVauvEFZZl7oq+y46cgAkKw61IEG9BUMDssTGR6YLnApdgTjtYQaRoZEXLaWgIkLYFXyGP/7kmQehxOBQdYDLBYiKyAYvAQiAA/NB1KtJLqItQBilACIAnpRTmEYUJT2Mum63n3r5jltfx6zDub3Pnt8pCodFQVRaDg1dpr376s2m9PLCiCiIJg3ufpAAtBgABCsIIpsTR1KZ3avT+n/zWoW3X76vs+n7P/1osuLi17QfcKuDgQBHhalIC2rNL3K4ay8jIDVAYlCVWr5UlEQsfd0cHhAURBjMl1fGPyA4oCFoQIV4IVzNCAvBqqwMBrXEZBZ6pn4m2HT67a1EtyUOHPwJaj8mpJRMjMVHtbrc05JIJ8/wcTOqIi9JpLLppVkoU+8+UO+qu+rj/dGMkUMA8o4e+kwmgchrIaup1a1M7vVb/U/70vujS77rz9A0Uf/K512jqf9+OUKwmMsBoYwoJBOAgAAQQQBAIeNkJLq1X4buThK+JAaVRV9XIUUcQqAspWWCqgMMbNZYHKsplAhUQ0C4FLeoTR0ZLIs0iCiwLBpkOm0Bpsk6+igBQakTlpvZbRptvG6OoqZ2ulbLLbyMudjLeUOhwwNxjsMX3G/ZH3/a1UaIDn/+5JkOgoDvUBVKykWJjPtGKwEIp5PUQVUbJh6UM2z4nQQCslnRlEYCbDAopmAPnNfM/8/LPN8v8/5f//+j9vK/9f/4v1////////+2zWs96o82VmVDzIYc56ZoEYLJTvRaZKEEQym+/KaClyS6qz3sEmIy+rSpGhzLotlD5G4G8KxsAUqXGEUFylUxqZJMEgMWbAVQUx3/XQ0F5IZ02SDoNlsXoeUtnOLbFEXPeZyydac5R4ySMgFRpZijAjMEkby8yvS9fsyJVr8bzaHFxIkF0O9NftowAAG2uAwICBf/96y9dX//f////tv7e/+/9W/b9vovRv//r/+1nnrkVMjMzVFIgmgHTwRdQIlAAyUDA5pbQ+V2HSjkOppxxr7OYFZdMwI9A0xjAbYgFQApwAAiFgGTFHBs6HHAs4sYN3CGhlg4GKSBhfQhAwCRMcZoXCkLmUQw2JlEvmlNNkqDmK5eeyLMkhNWWmbIoLdkGRQW7XuzKX+hbbdKr2UdmzXPrUgaVCgAAUC0UAADf//p//Tr////Sn////97//p/2////////uSZEwAA8VDVcViQAIyTXidoIgAW60vPvm9gABsgCODACAA///pbb89r0MikIjvGBqdRZ1ETgQAAAAAAAADhnMGRtyGYweHciRtAUcKNGGGgsPrDm7rAjBAgGLsJPmMhgqFFwVUzGQAyVyMGBDeZAGGJhxAb2jiVQZ8VmDQJEvGpRYFPTERBQUyUOL1DSuCQoaMDHhMlDkdDBwctGXoL5EIit1nLdV2gIcCCYcB1TszTtTe0+DjVmuSFsinTCVUGGs+aekMvGM3PjsPOSzGGX/k8itdaQuloUGu9ZjcFVuxavjjrs1jqzOSiP3pTupUqy6bpN3Zmr//VjjWZuig/v///8thqJz3////0s1dd/pU3/u6f////////+v2f//+XMfA6H/F1pN//hYSf/jzygAAYAAAImCEhz0feHViByqpZUsyg6zqBqru2XRcFL1IlLD5DUjhVONqVDYZyqGAhyqLqBdAIRxJEekhJ0sr1Wq2NBTqGoay0Yk891bf9rWrr+2/mvrX4tv1/9t+sF7aJQVBUGgaiUFn/lg5wad8SgqCo//7kmQ5D/OLLctnYeAAMMAYjeAIAIAAAaQAAAAgAAA0gAAABOt21ttt0CYKwoDQlBUFQVOlQViXxYGj3EQNA0e1A19YK1/EQNf/lf//5Y9wa//+CtVMQU1FMy45OS4zVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVU=");
//-- Customize this next for your page or get lazy and use "body".
var cntnrNode = document.querySelector ("#mainArea");
if (typeof cntnrNode === "undefined") console.error ("TM: Container node not found.");
else {
var wordObsrv = new MutationObserver (alertOnWord);
wordObsrv.observe (document.body, {childList: true, subtree: true} );
}
function alertOnWord (mutationRecords) {
mutationRecords.forEach (muttn => {
if (muttn.type === "childList" && typeof muttn.addedNodes === "object") {
muttn.addedNodes.forEach (newNode => {
/*-- Restrict the kinds of nodes watched for better performance
and/or to avoid false positives.
*/
if (newNode.nodeName === "DIV" || newNode.nodeName === "SPAN" || newNode.nodeName === "P") {
if (watchRegx.test (newNode.textContent) ) {
console.log ("Found new instance of target term!");
alrtSound.play ();
}
}
} );
}
} );
}
As it stands, I cannot get "2 credits" to make any notification sounds via Regex, such as shown below, but I have been able to get just "credits" to work:
var watchRegx = /\b2 credits\b/i;
Here is the HTML of the specific part of the page where "2 credits" lies on the site:
<span class="grey black-text lighten-2 credit-counter"><span class="remaining">2</span> credits</span>
The nature of the HTML indicates that this might not be as simple as using a Regex with "2 credits." Maybe there is an easy solution. Any and all advice would be wonderful. Thanks in advance.
I simply changed the RegEx code to var watchRegx = /2 credits/; A simple enough solution. Thanks for all the feedback
Here, we might want to separately capture our desired substrings then replace it:
.+?<span class="remaining">([0-9]+)?<\/span>(.+?)<\/span>
Our desired digits are in this capturing group:
([0-9]+)
and the second part is in this group:
(.+?)
Test
const regex = /.+?<span class="remaining">([0-9]+)?<\/span>(.+?)<\/span>/gm;
const str = `<span class="grey black-text lighten-2 credit-counter"><span class="remaining">2</span> credits</span>`;
const subst = `$1$2`;
// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);
console.log(result);
DEMO
I'm attempting to make a Chrome extension that on load finds the first instance of a string and adds some details, as well as a link.
From what I can tell a TreeWalker is the best approach, so I've adapted this example (Dom TreeWalker).
This works well (though my implementation is inefficient). But I want the effect to be subtle, so I added a requirement that the string needs to appear in a paragraph to cut down on the chances of replacing the text in any menus or titles.
Now the behaviour is erratic (some replacements are made, others are not).
I know my .replace implementation at the end is not the right solution.
Any ideas?
var companies = [
"Apples",
"Black Beans",
"Sweet Potatoes"
];
var re = new RegExp(companies.join("|"), "i");
var walker = document.createTreeWalker(
document.body,
NodeFilter.SHOW_TEXT,
function(node) {
var matches = node.textContent.match(re);
//look for matching text but also accept only those in paragraphs
if((matches) && (node.parentNode.nodeName == "P")) {
return NodeFilter.FILTER_ACCEPT;
} else {
return NodeFilter.FILTER_SKIP;
}
},
false);
var nodes = [];
while(walker.nextNode()) {
nodes.push(walker.currentNode);
}
node=nodes[0]; //just replace the first hit within a paragraph
node.parentNode.innerHTML = node.parentNode.innerHTML.replace("Apples", "Apples [nutritional fact <a href='http://nutritionalinfo.com/apples'>Apples</a> ]");
node.parentNode.innerHTML = node.parentNode.innerHTML.replace("Black Beans", "Black Beans [nutritional fact <a href='http://nutritionalinfo.com/apples'>Apples</a> ]");
node.parentNode.innerHTML = node.parentNode.innerHTML.replace("Sweet Potatoes", "Sweet Potatoes[nutritional fact <a href='http://nutritionalinfo.com/apples'>Apples</a> ]");
String(s) is dynamic
It is originated from onclick event when user clicks anywhere in dom
if string(s)'s first part that is:
"login<b>user</b>account"
is enclosed in some element like this :
"<div>login<b>user</b>account</div>",
then I can get it with this:
alert($(s).find('*').andSelf().not('b,i').not(':empty').first().html());
// result is : login<b>user</b>account
But how can i get the same result in this condition when it is not enclosed in any element .i.e. when it is not enclosed in any element?
I tried this below code which works fine when first part do not include any <b></b> but it only gives "login" when it does include these tags.
var s = $.trim('login<b>user</b> account<tbody> <tr> <td class="translated">Lorem ipsum dummy text</td></tr><tr><td class="translated">This is a new paragraph</td></tr><tr><td class="translated"><b>Email</b></td></tr><tr><td><i>This is yet another text</i></td> </tr></tbody>');
if(s.substring(0, s.indexOf('<')) != ''){
alert(s.substring(0, s.indexOf('<')));
}
Note:
Suggest a generic solution that is not specific for this above string only. It should work for both the cases when there is bold tags and when there ain't any.
So it's just a b or a i, heh?
A recursive function is always the way to go. And this time, it's probably the best way to go.
var s = function getEm(elem) {
var ret = ''
// TextNode? Great!
if (elem.nodeType === 3) {
ret += elem.nodeValue;
}
else if (elem.nodeType === 1 &&
(elem.nodeName === 'B' || elem.nodeName === 'I')) {
// Element? And it's a B or an I? Get his kids!
ret += getEm(elem.firstChild);
}
// Ain't nobody got time fo' empty stuff.
if (elem.nextSibling) {
ret += getEm(elem.nextSibling);
}
return ret;
}(elem);
Jsfiddle demonstrating this: http://jsfiddle.net/Ralt/TZKsP/
PS: Parsing HTML with regex or custom tokenizer is bad and shouldn't be done.
You're trying to retrieve all of the text up to the first element that's not a <b> or <i>, but this text could be wrapped in an element itself. This is SUPER tricky. I feel like there's a better way to implement whatever it is you're trying to accomplish, but here's a solution that works.
function initialText(s){
var test = s.match(/(<.+?>)?.*?<(?!(b|\/|i))/);
var match = test[0];
var prefixed_element = test[1];
// if the string was prefixed with an element tag
// remove it (ie '<div> blah blah blah')
if(prefixed_element) match = match.slice(prefixed_element.length);
// remove the matching < and return the string
return match.slice(0,-1);
}
You're lucky I found this problem interesting and challenging because, again, this is ridiculous.
You're welcome ;-)
Try this:
if (s.substring(0, s.indexOf('<')) != '') {
alert(s.substring(0, s.indexOf('<tbody>')));
}
I need a reliable JavaScript library / function to check if an HTML snippet is valid that I can call from my code. For example, it should check that opened tags and quotation marks are closed, nesting is correct, etc.
I don't want the validation to fail because something is not 100% standard (but would work anyway).
Update: this answer is limited - please see the edit below.
Expanding on #kolink's answer, I use:
var checkHTML = function(html) {
var doc = document.createElement('div');
doc.innerHTML = html;
return ( doc.innerHTML === html );
}
I.e., we create a temporary div with the HTML. In order to do this, the browser will create a DOM tree based on the HTML string, which may involve closing tags etc.
Comparing the div's HTML contents with the original HTML will tell us if the browser needed to change anything.
checkHTML('<a>hell<b>o</b>')
Returns false.
checkHTML('<a>hell<b>o</b></a>')
Returns true.
Edit: As #Quentin notes below, this is excessively strict for a variety of reasons: browsers will often fix omitted closing tags, even if closing tags are optional for that tag. Eg:
<p>one para
<p>second para
...is considered valid (since Ps are allowed to omit closing tags) but checkHTML will return false. Browsers will also normalise tag cases, and alter white space. You should be aware of these limits when deciding to use this approach.
Well, this code:
function tidy(html) {
var d = document.createElement('div');
d.innerHTML = html;
return d.innerHTML;
}
This will "correct" malformed HTML to the best of the browser's ability. If that's helpful to you, it's a lot easier than trying to validate HTML.
None of the solutions presented so far is doing a good job in answering the original question, especially when it comes to
I don't want the validation to fail because something is not 100%
standard (but would work anyways).
tldr >> check the JSFiddle
So I used the input of the answers and comments on this topic and created a method that does the following:
checks html string tag by tag if valid
trys to render html string
compares theoretically to be created tag count with actually rendered html dom tag count
if checked 'strict', <br/> and empty attribute normalizations ="" are not ignored
compares rendered innerHTML with given html string (while ignoring whitespaces and quotes)
Returns
true if rendered html is same as given html string
false if one of the checks fails
normalized html string if rendered html seems valid but is not equal to given html string
normalized means, that on rendering, the browser ignores or repairs sometimes specific parts of the input (like adding missing closing-tags for <p> and converts others (like single to double quotes or encoding of ampersands).
Making a distinction between "failed" and "normalized" allows to flag the content to the user as "this will not be rendered as you might expect it".
Most times normalized gives back an only slightly altered version of the original html string - still, sometimes the result is quite different. So this should be used e.g. to flag user-input for further review before saving it to a db or rendering it blindly. (see JSFiddle for examples of normalization)
The checks take the following exceptions into consideration
ignoring of normalization of single quotes to double quotes
image and other tags with a src attribute are 'disarmed' during rendering
(if non strict) ignoring of <br/> >> <br> conversion
(if non strict) ignoring of normalization of empty attributes (<p disabled> >> <p disabled="">)
encoding of initially un-encoded ampersands when reading .innerHTML, e.g. in attribute values
.
function simpleValidateHtmlStr(htmlStr, strictBoolean) {
if (typeof htmlStr !== "string")
return false;
var validateHtmlTag = new RegExp("<[a-z]+(\s+|\"[^\"]*\"\s?|'[^']*'\s?|[^'\">])*>", "igm"),
sdom = document.createElement('div'),
noSrcNoAmpHtmlStr = htmlStr
.replace(/ src=/, " svhs___src=") // disarm src attributes
.replace(/&/igm, "#svhs#amp##"), // 'save' encoded ampersands
noSrcNoAmpIgnoreScriptContentHtmlStr = noSrcNoAmpHtmlStr
.replace(/\n\r?/igm, "#svhs#nl##") // temporarily remove line breaks
.replace(/(<script[^>]*>)(.*?)(<\/script>)/igm, "$1$3") // ignore script contents
.replace(/#svhs#nl##/igm, "\n\r"), // re-add line breaks
htmlTags = noSrcNoAmpIgnoreScriptContentHtmlStr.match(/<[a-z]+[^>]*>/igm), // get all start-tags
htmlTagsCount = htmlTags ? htmlTags.length : 0,
tagsAreValid, resHtmlStr;
if(!strictBoolean){
// ignore <br/> conversions
noSrcNoAmpHtmlStr = noSrcNoAmpHtmlStr.replace(/<br\s*\/>/, "<br>")
}
if (htmlTagsCount) {
tagsAreValid = htmlTags.reduce(function(isValid, tagStr) {
return isValid && tagStr.match(validateHtmlTag);
}, true);
if (!tagsAreValid) {
return false;
}
}
try {
sdom.innerHTML = noSrcNoAmpHtmlStr;
} catch (err) {
return false;
}
// compare rendered tag-count with expected tag-count
if (sdom.querySelectorAll("*").length !== htmlTagsCount) {
return false;
}
resHtmlStr = sdom.innerHTML.replace(/&/igm, "&"); // undo '&' encoding
if(!strictBoolean){
// ignore empty attribute normalizations
resHtmlStr = resHtmlStr.replace(/=""/, "")
}
// compare html strings while ignoring case, quote-changes, trailing spaces
var
simpleIn = noSrcNoAmpHtmlStr.replace(/["']/igm, "").replace(/\s+/igm, " ").toLowerCase().trim(),
simpleOut = resHtmlStr.replace(/["']/igm, "").replace(/\s+/igm, " ").toLowerCase().trim();
if (simpleIn === simpleOut)
return true;
return resHtmlStr.replace(/ svhs___src=/igm, " src=").replace(/#svhs#amp##/, "&");
}
Here you can find it in a JSFiddle https://jsfiddle.net/abernh/twgj8bev/ , together with different test-cases, including
"<a href='blue.html id='green'>missing attribute quotes</a>" // FAIL
"<a>hell<B>o</B></a>" // PASS
'hell<b>o</b>' // PASS
'<a href=test.html>hell<b>o</b></a>', // PASS
"<a href='test.html'>hell<b>o</b></a>", // PASS
'<ul><li>hell</li><li>hell</li></ul>', // PASS
'<ul><li>hell<li>hell</ul>', // PASS
'<div ng-if="true && valid">ampersands in attributes</div>' // PASS
.
9 years later, how about using DOMParser?
It accepts string as parameter and returns Document type, just like HTML.
Thus, when it has an error, the returned document object has <parsererror> element in it.
If you parse your html as xml, at least you can check your html is xhtml compliant.
Example
> const parser = new DOMParser();
> const doc = parser.parseFromString('<div>Input: <input /></div>', 'text/xml');
> (doc.documentElement.querySelector('parsererror') || {}).innerText; // undefined
To wrap this as a function
function isValidHTML(html) {
const parser = new DOMParser();
const doc = parser.parseFromString(html, 'text/xml');
if (doc.documentElement.querySelector('parsererror')) {
return doc.documentElement.querySelector('parsererror').innerText;
} else {
return true;
}
}
Testing the above function
isValidHTML('<a>hell<B>o</B></a>') // true
isValidHTML('hell') // true
isValidHTML('<a href='test.html'>hell</a>') // true
isValidHTML("<a href=test.html>hell</a>") // This page contains the following err..
isValidHTML('<ul><li>a</li><li>b</li></ul>') // true
isValidHTML('<ul><li>a<li>b</ul>') // This page contains the following err..
isValidHTML('<div><input /></div>' // true
isValidHTML('<div><input></div>' // This page contains the following err..
The above works for very simple html.
However if your html has some code-like texts; <script>, <style>, etc, you need to manipulate just for XML validation although it's valid HTML
The following updates code-like html to a valid XML syntax.
export function getHtmlError(html) {
const parser = new DOMParser();
const htmlForParser = `<xml>${html}</xml>`
.replace(/(src|href)=".*?&.*?"/g, '$1="OMITTED"')
.replace(/<script[\s\S]+?<\/script>/gm, '<script>OMITTED</script>')
.replace(/<style[\s\S]+?<\/style>/gm, '<style>OMITTED</style>')
.replace(/<pre[\s\S]+?<\/pre>/gm, '<pre>OMITTED</pre>')
.replace(/ /g, ' ');
const doc = parser.parseFromString(htmlForParser, 'text/xml');
if (doc.documentElement.querySelector('parsererror')) {
console.error(htmlForParser.split(/\n/).map( (el, ndx) => `${ndx+1}: ${el}`).join('\n'));
return doc.documentElement.querySelector('parsererror');
}
}
function validHTML(html) {
var openingTags, closingTags;
html = html.replace(/<[^>]*\/\s?>/g, ''); // Remove all self closing tags
html = html.replace(/<(br|hr|img).*?>/g, ''); // Remove all <br>, <hr>, and <img> tags
openingTags = html.match(/<[^\/].*?>/g) || []; // Get remaining opening tags
closingTags = html.match(/<\/.+?>/g) || []; // Get remaining closing tags
return openingTags.length === closingTags.length ? true : false;
}
var htmlContent = "<p>your html content goes here</p>" // Note: String without any html tag will consider as valid html snippet. If it’s not valid in your case, in that case you can check opening tag count first.
if(validHTML(htmlContent)) {
alert('Valid HTML')
}
else {
alert('Invalid HTML');
}
Using pure JavaScript you may check if an element exists using the following function:
if (typeof(element) != 'undefined' && element != null)
Using the following code we can test this in action:
HTML:
<input type="button" value="Toggle .not-undefined" onclick="toggleNotUndefined()">
<input type="button" value="Check if .not-undefined exists" onclick="checkNotUndefined()">
<p class=".not-undefined"></p>
CSS:
p:after {
content: "Is 'undefined'";
color: blue;
}
p.not-undefined:after {
content: "Is not 'undefined'";
color: red;
}
JavaScript:
function checkNotUndefined(){
var phrase = "not ";
var element = document.querySelector('.not-undefined');
if (typeof(element) != 'undefined' && element != null) phrase = "";
alert("Element of class '.not-undefined' does "+phrase+"exist!");
// $(".thisClass").length checks to see if our elem exists in jQuery
}
function toggleNotUndefined(){
document.querySelector('p').classList.toggle('not-undefined');
}
It can be found on JSFiddle.
function isHTML(str)
{
var a = document.createElement('div');
a.innerHTML = str;
for(var c= a.ChildNodes, i = c.length; i--)
{
if (c[i].nodeType == 1) return true;
}
return false;
}
Good Luck!
It depends on js-library which you use.
Html validatod for node.js https://www.npmjs.com/package/html-validator
Html validator for jQuery https://api.jquery.com/jquery.parsehtml/
But, as mentioned before, using the browser to validate broken HTML is a great idea:
function tidy(html) {
var d = document.createElement('div');
d.innerHTML = html;
return d.innerHTML;
}
Expanding on #Tarun's answer from above:
function validHTML(html) { // checks the validity of html, requires all tags and property-names to only use alphabetical characters and numbers (and hyphens, underscore for properties)
html = html.toLowerCase().replace(/(?<=<[^>]+?=\s*"[^"]*)[<>]/g,"").replace(/(?<=<[^>]+?=\s*'[^']*)[<>]/g,""); // remove all angle brackets from tag properties
html = html.replace(/<script.*?<\/script>/g, ''); // Remove all script-elements
html = html.replace(/<style.*?<\/style>/g, ''); // Remove all style elements tags
html = html.toLowerCase().replace(/<[^>]*\/\s?>/g, ''); // Remove all self closing tags
html = html.replace(/<(\!|br|hr|img).*?>/g, ''); // Remove all <br>, <hr>, and <img> tags
//var tags=[...str.matchAll(/<.*?>/g)]; this would allow for unclosed initial and final tag to pass parsing
html = html.replace(/^[^<>]+|[^<>]+$|(?<=>)[^<>]+(?=<)/gs,""); // remove all clean text nodes, note that < or > in text nodes will result in artefacts for which we check and return false
tags = html.split(/(?<=>)(?=<)/);
if (tags.length%2==1) {
console.log("uneven number of tags in "+html)
return false;
}
var tagno=0;
while (tags.length>0) {
if (tagno==tags.length) {
console.log("these tags are not closed: "+tags.slice(0,tagno).join());
return false;
}
if (tags[tagno].slice(0,2)=="</") {
if (tagno==0) {
console.log("this tag has not been opened: "+tags[0]);
return false;
}
var tagSearch=tags[tagno].match(/<\/\s*([\w\-\_]+)\s*>/);
if (tagSearch===null) {
console.log("could not identify closing tag "+tags[tagno]+" after "+tags.slice(0,tagno).join());
return false;
} else tags[tagno]=tagSearch[1];
if (tags[tagno]==tags[tagno-1]) {
tags.splice(tagno-1,2);
tagno--;
} else {
console.log("tag '"+tags[tagno]+"' trying to close these tags: "+tags.slice(0,tagno).join());
return false;
}
} else {
tags[tagno]=tags[tagno].replace(/(?<=<\s*[\w_\-]+)(\s+[\w\_\-]+(\s*=\s*(".*?"|'.*?'|[^\s\="'<>`]+))?)*/g,""); // remove all correct properties from tag
var tagSearch=tags[tagno].match(/<(\s*[\w\-\_]+)/);
if ((tagSearch===null) || (tags[tagno]!="<"+tagSearch[1]+">")) {
console.log("fragmented tag with the following remains: "+tags[tagno]);
return false;
}
var tagSearch=tags[tagno].match(/<\s*([\w\-\_]+)/);
if (tagSearch===null) {
console.log("could not identify opening tag "+tags[tagno]+" after "+tags.slice(0,tagno).join());
return false;
} else tags[tagno]=tagSearch[1];
tagno++;
}
}
return true;
}
This performs a few additional checks, such as testing whether tags match and whether properties would parse. As it does not depend on an existing DOM, it can be used in a server environment, but beware: it is slow. Also, in theory, tags can be names much more laxly, as you can basically use any unicode (with a few exceptions) in tag- and property-names. This would not pass my own sanity-check, however.
the questions says it all :)
eg. we have >, we need > using only javascript
Update: It seems jquery is the easy way out. But, it would be nice to have a lightweight solution. More like a function which is capable to do this by itself.
You could do something like this:
String.prototype.decodeHTML = function() {
var map = {"gt":">" /* , … */};
return this.replace(/&(#(?:x[0-9a-f]+|\d+)|[a-z]+);?/gi, function($0, $1) {
if ($1[0] === "#") {
return String.fromCharCode($1[1].toLowerCase() === "x" ? parseInt($1.substr(2), 16) : parseInt($1.substr(1), 10));
} else {
return map.hasOwnProperty($1) ? map[$1] : $0;
}
});
};
function decodeEntities(s){
var str, temp= document.createElement('p');
temp.innerHTML= s;
str= temp.textContent || temp.innerText;
temp=null;
return str;
}
alert(decodeEntities('<'))
/* returned value: (String)
<
*/
I know there are libraries out there, but here are a couple of solutions for browsers. These work well when placing html entity data strings into human editable areas where you want the characters to be shown, such as textarea's or input[type=text].
I add this answer as I have to support older versions of IE and I feel that it wraps up a few days worth of research and testing. I hope somebody finds this useful.
First this is for more modern browsers using jQuery, Please note that this should NOT be used if you have to support versions of IE before 10 (7, 8, or 9) as it will strip out the newlines leaving you with just one long line of text.
if (!String.prototype.HTMLDecode) {
String.prototype.HTMLDecode = function () {
var str = this.toString(),
$decoderEl = $('<textarea />');
str = $decoderEl.html(str)
.text()
.replace(/<br((\/)|( \/))?>/gi, "\r\n");
$decoderEl.remove();
return str;
};
}
This next one is based on kennebec's work above, with some differences which are mostly for the sake of older IE versions. This does not require jQuery, but does still require a browser.
if (!String.prototype.HTMLDecode) {
String.prototype.HTMLDecode = function () {
var str = this.toString(),
//Create an element for decoding
decoderEl = document.createElement('p');
//Bail if empty, otherwise IE7 will return undefined when
//OR-ing the 2 empty strings from innerText and textContent
if (str.length == 0) {
return str;
}
//convert newlines to <br's> to save them
str = str.replace(/((\r\n)|(\r)|(\n))/gi, " <br/>");
decoderEl.innerHTML = str;
/*
We use innerText first as IE strips newlines out with textContent.
There is said to be a performance hit for this, but sometimes
correctness of data (keeping newlines) must take precedence.
*/
str = decoderEl.innerText || decoderEl.textContent;
//clean up the decoding element
decoderEl = null;
//replace back in the newlines
return str.replace(/<br((\/)|( \/))?>/gi, "\r\n");
};
}
/*
Usage:
var str = ">";
return str.HTMLDecode();
returned value:
(String) >
*/
Here is a "class" for decoding whole HTML document.
HTMLDecoder = {
tempElement: document.createElement('span'),
decode: function(html) {
var _self = this;
html.replace(/&(#(?:x[0-9a-f]+|\d+)|[a-z]+);/gi,
function(str) {
_self.tempElement.innerHTML= str;
str = _self.tempElement.textContent || _self.tempElement.innerText;
return str;
}
);
}
}
Note that I used Gumbo's regexp for catching entities but for fully valid HTML documents (or XHTML) you could simpy use /&[^;]+;/g.
There is nothing built in, but there are many libraries that have been written to do this.
Here is one.
And here one that is a jQuery plugin.