Get HTML attribute value as is via JavaScript - javascript

I have a website where I feed information to an analytics engine via the meta tag as such:
<meta property="analytics-track" content="Hey&nbsp;There!">
I am trying to write a JavaScript script (no libraries) to access the content attribute and retrieve the information as is. In essence, it should include the HTML entity and not transform or strip it.
The reason is that I am using PhantomJS to find which pages have HTML entities in their metadata and remove them, because they skew my analytics data. For example, I'll have entries for both Hey&nbsp;There! and Hey There! when in fact they are the same page and should not be two separate data points.
The simplest JavaScript I have is this:
document.getElementsByTagName('meta')[4].getAttribute("content")
And when I examine it in the console, it returns the text in the following format:
"Hey There!"
What I would like it to return is:
"Hey&nbsp;There!"
How can I ensure that the returned data keeps the HTML entity? If that's not possible, is there a way to detect an HTML entity via JavaScript? I tried:
document.getElementsByTagName('meta')[4].getAttribute("content").includes('&nbsp;')
But it returns false.

Use querySelector to select the element whose property attribute is "analytics-track", outerHTML to get the element serialized as a string (the serializer writes a non-breaking space back out as &nbsp;), and match with a regex to pull out the value of the content attribute.
document.querySelector('[property=analytics-track]').outerHTML.match(/content="([^"]*)"/)[1];
See http://jsfiddle.net/sjmcpherso/mz63fnjg/
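As a hedged sketch of the same idea, the extraction step can be isolated into a small function that works on the serialized string. The helper name `extractAttribute` is hypothetical. It assumes the value is double-quoted (which is how `outerHTML` serializes attributes) and that `name` contains no regex metacharacters. Matching `[^"]*` rather than a greedy `.*` stops at the closing quote, so it also works when `content` is not the last attribute.

```javascript
// Hypothetical helper: pull one attribute value out of a serialized tag.
// Assumes double-quoted values and a plain attribute name.
function extractAttribute(html, name) {
  // \s in front keeps "content" from matching inside e.g. "data-content".
  const pattern = new RegExp('\\s' + name + '="([^"]*)"');
  const match = html.match(pattern);
  return match ? match[1] : null;
}

// In a browser this string would come from element.outerHTML.
const serialized = '<meta property="analytics-track" content="Hey&nbsp;There!">';
const rawContent = extractAttribute(serialized, 'content');
console.log(rawContent);
```

Returning `null` when nothing matches avoids the exception that indexing `[1]` on a failed `match` would throw.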

You can't; the entity isn't really there. It's just an encoding for a non-breaking space. To the document, the DOM, the web page, to everything, it looks like:
Hey There!
Except the character between the y and the T isn't a space of the sort you'd get by hitting the space bar; it's a completely different character.
Observe:
<span id='a' data-a='Hey&nbsp;There!'></span>
<span id='a1' data-a='Hey&nbsp;There!'></span>
<span id='b' data-b='Hey There!'></span>
var a = document.getElementById('a').getAttribute('data-a')
var a1 = document.getElementById('a1').getAttribute('data-a')
var b = document.getElementById('b').getAttribute('data-b')
console.log(a,b,a==b)
console.log(a,a1,a==a1)
Gives:
Hey There! Hey There! false
Hey There! Hey There! true
Instead, consider altering your method of 'equality' to treat a space and a non-breaking space as equal:
var re = /\u00A0/g; // U+00A0 is the non-breaking space
x = x.replace(re, ' ');
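A minimal sketch of both halves of that suggestion, detection and normalization, assuming the value has already been read with `getAttribute` (so the entity has been decoded to the character U+00A0). The function names `hasNbsp` and `normalizeSpaces` are hypothetical.

```javascript
const NBSP = "\u00A0"; // the character that &nbsp; decodes to

// True when the decoded attribute value contains a non-breaking space.
function hasNbsp(value) {
  return value.includes(NBSP);
}

// Replace every non-breaking space with an ordinary space.
function normalizeSpaces(value) {
  return value.replace(/\u00A0/g, " ");
}

const decoded = "Hey\u00A0There!"; // what getAttribute("content") would return
console.log(hasNbsp(decoded));
console.log(normalizeSpaces(decoded) === "Hey There!");
```

This also shows why `includes('&nbsp;')` fails: the decoded string holds one character, not the six-character entity text.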

To get the HTML of the meta tag as is, use outerHTML:
document.getElementsByTagName('meta')[4].outerHTML
Working Snippet:
console.log(document.getElementsByTagName('meta')[0].outerHTML);
<meta property="analytics-track" content="Hey&nbsp;There!">
<h3>Check your console</h3>
Element.outerHTML - Web APIs | MDN
Update 1:
To filter out the meta content, use the following:
metaInfo.match(/content="(.*)">/)[1]; // assuming that content attribute is always at the end of the meta tag
Working Snippet:
var metaInfo = document.getElementsByTagName('meta')[0].outerHTML;
console.log(metaInfo);
console.log('Meta Content = ' + metaInfo.match(/content="(.*)">/)[1]);
<meta property="analytics-track" content="Hey&nbsp;There!">
<h3>Check your console</h3>

Related

Convert String from Element to working HTML code with jQuery

I have a checkbox which is generated dynamically. The checkbox text contains a string with some HTML code inside it. The text comes directly from the database and is displayed not as HTML but as a plain string. Is it possible to convert the string to HTML so that it is displayed correctly? The checkbox:
<label for="id_122_gen">"I hereby consent to the processing of my above-mentioned data according
to the <a href="/declarationofconsent.pdf" target="_blank">declaration of consent." </label>
<input type="checkbox" name="confirm" id="id_122_gen" >
I tried to get the contained text with the $.text() method, which worked so far.
$mystring = $("#id_122_gen").text();
After that I tried the jQuery method $.parseHTML() and saved the result again.
$myhtml = $.parseHTML( $mystring );
Apparently it is saved as an array, because when I try to write the result back with the $.text() method, it displays:
[object Text],/declarationofconsent.pdf,[object Text]
That's all. There is no clickable link and the checkbox has disappeared as well. I'm a bit confused about what to do and don't know how to display the correct content with a clickable link.
The solution depends on how the HTML of your page is generated. Your label's inner HTML is either escaped or it is not.
This is important: the inspector view can mislead you by showing escaped HTML as if it were proper HTML, so make sure to check the page source.
Most likely your data is HTML-escaped, and that's why you can't see the link on the initial render.
If you see &lt; and &gt; inside the label's source, it's HTML-escaped.
If it's escaped and you just want to convert it to proper HTML and set it on the label, use this:
$("label[for='id_122_gen']").html( $("label[for='id_122_gen']").text() );
Basically this unescapes the label's value. It reads the innerHTML as text, which turns the escaped characters into real ones, and when you set that back as the label's HTML, the innerHTML becomes unescaped.
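To make "unescaping" concrete, here is a hedged sketch in plain JavaScript of what reading escaped markup as text does to the handful of entities that HTML escaping normally produces. The name `decodeBasicEntities` and the entity table are illustrative only; a browser decodes the full set of named and numeric entities.

```javascript
// Illustrative table: only the entities that HTML escaping usually emits.
const ENTITIES = {
  "&lt;": "<",
  "&gt;": ">",
  "&quot;": '"',
  "&#39;": "'",
  "&amp;": "&",
};

// Decode in a single pass so "&amp;lt;" becomes "&lt;" and not "<".
function decodeBasicEntities(escaped) {
  return escaped.replace(/&(?:lt|gt|quot|#39|amp);/g, (entity) => ENTITIES[entity]);
}

const escapedLabel = "&lt;a href=&quot;/declarationofconsent.pdf&quot;&gt;declaration of consent&lt;/a&gt;";
console.log(decodeBasicEntities(escapedLabel));
```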
If you just want to get the link, read on.
If the value inside the label is HTML escaped then you'll have to use .text() to read that value. If it contains unescaped HTML then you'll have to use .html().
Afterwards the flow is the same, you parse the html first.
Since there is text and a link, the parsed html will return as an array with multiple elements. If you just want to get the link you have to search in the array.
You can check out the code below.
$mystring = $("label[for='id_122_gen']").text(); //Use .html() for unescaped
$myhtml = $.parseHTML( $mystring );
var mylink = null;
var e;
while (e = $myhtml.pop())
{
    if (e.tagName == "A") {
        mylink = e;
        break;
    }
}
console.log(mylink);

Unterminated string constant during @Html.Partial from javascript object

var html = '@Html.Partial("_mypartialview")';
$("#container").html(html);
I am working in a single page application and I don't want to hide the container div element; instead I want to load the html content into my javascript object.
But I get the error:
Unterminated string constant.
I have tried various methods without any success. Any help as to why I get this error would be really appreciated.
Try this:
var html = "@Html.Partial("_mypartialview").ToHtmlString().Replace("\n", "<br/>").Replace(" ", "")";
The ToHtmlString() method HTML encodes your html so the quotes won't confuse your JavaScript.
The two Replace calls turn newlines into <br/> and strip spaces, so the content ends up on one line.
Update
_mypartialview
<div>
<span data-bind="text:playerInfo.Name"> </span>
</div>
Code
var html = "@Html.Partial("_mypartialview").ToString().Replace(Environment.NewLine, "").Replace(" ", "")"
This outputs the following
var html = "<div><spandata-bind="text:playerInfo.Name"></span></div>"
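The error comes from raw line breaks and quotes in the rendered markup landing inside a JavaScript string literal. As a hedged illustration in plain JavaScript (not Razor), the sketch below shows the escaping such a literal needs. The helper name `toJsStringLiteral` is hypothetical, and escaping `<` is an extra precaution assumed here so that a `</script>` inside the markup cannot end an inline script block.

```javascript
// Hypothetical helper: turn arbitrary markup into a safe JS string literal.
function toJsStringLiteral(html) {
  // JSON.stringify escapes quotes, backslashes and line breaks;
  // "<" is additionally written as \u003c (assumed precaution).
  return JSON.stringify(html).replace(/</g, "\\u003c");
}

const partial = '<div>\n  <span data-bind="text:playerInfo.Name"> </span>\n</div>';
const literal = toJsStringLiteral(partial);

// Evaluating the literal gives back the original markup, spaces included.
const roundTripped = new Function("return " + literal + ";")();
console.log(roundTripped === partial);
```

Unlike stripping every space, this keeps the separator between `span` and `data-bind`, so the markup survives unchanged.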

Using regexes to modify the text of html (with javascript)

I want to modify the text in a html file using javascript in an android webview.
Essentially, I want to do what android Linkify does to text, but I don't want to do it with java code, because I feel like that might delay the webview rendering the html (if I parse the text before sending it to the webview).
So, for example a piece of html like this:
<html>
<body>
google.com <!--these two shouldn't be linked-->
akhilcherian@gmail.com <!--these two shouldn't be linked-->
<p>www.google.com</p> <!--this should be linked-->
<p>102-232-2312 2032-122-332 </p><!-- should be linked as numbers-->
</body>
</html>
Should become this:
<html>
<body>
google.com
akhilcherian@gmail.com
<p>www.google.com</p>
<p>102-232-2312 <a href="tel:2032-122-332">2032-122-332</a> </p>
</body>
</html>
I already have the regexes to convert numbers and email ids to links, and they're working well enough. What I want to ensure is that I don't link anything that's already within tags. I've removed anchor tags, so they're not an issue, but I also need to avoid linking things like this:
<div width="1000"> <!-- Don't want this '1000' to be linked (but I do want other 4 digit numbers to be)-->
So for example if my regex for links is:
var replacePattern1 = /((https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/gim
How do I make sure that it's not within < and >? (Answers using javascript would be appreciated, but if you feel like this is a stupid way of doing it, please let me know about alternatives).
If you're answering with javascript, this question can essentially be shortened to:
How do I write a regex in javascript to search for patterns which are not surrounded by '<' '>' tags
Since you are using JavaScript, you are on the client side, and the DOM gives you free access to every object in your page.
At this stage you may not need a regex at all; you can just use the DOM.
The jQuery library makes it easy to update DOM objects.
In your case you only want the p tags.
So I suggest:
//using jquery
$("p").each(function(){
    console.log($(this));
});
//js
var paras = document.getElementsByTagName("p");
// index loop: for...in would also visit length, item and namedItem
for (var i = 0; i < paras.length; i++) {
    console.log(paras[i]);
}
As I said, the key is to manipulate the DOM. Here is an example based on your case; I'm not sure it is exactly what you are trying to get:
var paras = document.getElementsByTagName("p");
var hrefs = [];
// prefixes to put in front of the text, one per paragraph
var json_urls = {"links":["http://", "tel:"]};
// index loop: for...in would also visit length, item and namedItem
for (var i = 0; i < paras.length; i++) {
    // copy the text content of the paragraph
    var text_cp = paras[i].textContent;
    // clear the paragraph's content
    paras[i].textContent = "";
    // create an anchor element
    hrefs[i] = document.createElement("a");
    // give it a unique id
    hrefs[i].id = "_" + i;
    // set href to the prefix plus the original text
    hrefs[i].href = json_urls.links[i] + text_cp;
    hrefs[i].textContent = text_cp;
    paras[i].appendChild(hrefs[i]);
}
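For the regex side of the question (matching only text that is not inside `<` and `>`), one possible sketch splits the markup into tag and text pieces and rewrites only the text pieces. The function name `linkifyPhones` and the phone pattern are assumptions for illustration, not the asker's actual regexes, and the approach assumes no attribute value contains a literal `>`.

```javascript
// Hypothetical phone pattern, loosely based on the numbers in the question.
const PHONE = /\b\d{3,4}-\d{3}-\d{3,4}\b/g;

function linkifyPhones(html) {
  return html
    // The capture group keeps the tags in the result: even indexes are
    // text, odd indexes are tags (or comments).
    .split(/(<[^>]*>)/)
    .map((part, i) =>
      i % 2 === 1
        ? part // a tag: leave untouched
        : part.replace(PHONE, (m) => `<a href="tel:${m}">${m}</a>`)
    )
    .join("");
}

const input = '<p title="102-232-2312">call 102-232-2312</p>';
console.log(linkifyPhones(input));
```

Because attribute values live inside the tag pieces, a number in something like `title` or `width` is never rewritten.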

Javascript regular expression prevent matching inside tags

I have to match a string that is not inside tags. I am working on projects where I don't have control over the back-end HTML rendering code. What I need to do is add hover functionality for multiple dynamic words. I created a script that looks for those keywords in specific elements and adds their descriptions in title attributes for the hover. My problem is that other keywords get matched inside another keyword's title attribute.
My JS:
var str = 'match <span title="not match here">match</span> match';
str.replace( /match/gim, 'ok' );
I do not want the "match" word in the title attribute to be replaced, my desired result is:
'ok <span title="not match here">ok</span> ok'
How can I do that with JavaScript?
I tried the expression below but it's not working for me:
^((?!(".+")match)*$
You need to capture tags first to be able to avoid them:
var result = str.replace(/(<[^>]*>)|match/gi, function (_,g1) {
return (g1==undefined)? 'ok':g1;
});
But if you can, using the DOM is probably the best way.

get snippet of html text without creating a DOM?

Given an HTML string, I'd like to get the first 100 characters of text (the content without the markup).
I could create a jquery object with the html and use .text().
But the problem is that browsers may load all the images in the html.
So I wonder if there's a way to extract text snippet from html without building a DOM.
Edit:
Given some HTML (just a string of HTML, not part of the DOM yet):
<p>my lord</p><img src="some_url"><br>I'm overloaded
I could do $('<div/>').append(html).text().substr(0, 5); to get 5 characters.
But the img is downloaded by browser, and I don't want that.
var s = "<p>my lord</p><img src=\"some_url\"><br>I'm overloaded"
s = s.replace(/<[^>]+>/g,'').substr(0, 100);
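A slightly extended sketch of the same regex approach: replacing each tag with a space before collapsing whitespace keeps words from adjacent elements from running together ("my lordI'm"). The function name `textSnippet` is hypothetical. It does not decode entities and assumes no attribute value contains a literal `>`.

```javascript
// Hypothetical helper: plain-text preview of an HTML string, no DOM involved.
function textSnippet(html, maxLength) {
  return html
    .replace(/<[^>]+>/g, " ") // tags become separators
    .replace(/\s+/g, " ")     // collapse runs of whitespace
    .trim()
    .slice(0, maxLength);
}

const source = "<p>my lord</p><img src=\"some_url\"><br>I'm overloaded";
console.log(textSnippet(source, 100));
console.log(textSnippet(source, 5));
```

Since the string is never parsed into elements, the browser has no `img` to fetch.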
You could remove the image elements and then load the rest into the DOM.
Something like:
var html = "<p>my lord</p><img src=\"some_url\"><br>I'm overloaded";
html = html.replace(/<img[^>]*>/g,"");
var firstFive = $('<div/>').append(html).text().substr(0, 5);
