From a response of type application/x-javascript, I extract the required JSON portion into a variable. The JSON is:
{
"__ra":1,
"payload":null,
"data":[
[
"replace",
"",
true,
{
"__html": "\u003Cspan class=\"highlight fsm\" id=\"u_c_0\">I want this text only\u003C\/span>"
}
]
]
}
Based on answers I found on Stack Overflow, I can get the content of data like this:
var temp = JSON.parse(resp).data;
But my aim is to get only the text part of the __html value, which is "I want this text only". Can somebody help?
First you have to access the object you targeted:
var html = JSON.parse(resp).data[0][3].__html;
But the output you want is "I want this text only".
The html variable doesn't contain just that text; it contains HTML in which the text you're looking for sits inside a span.
If you're willing to include jQuery in your project, you can get that text this way:
var text = $(html).text();
Without jQuery, putting it all together:
var html = JSON.parse(resp).data[0][3].__html;
var div = document.createElement("div");
div.innerHTML = html;
var text = div.textContent || div.innerText || "";
Kudos to @Tim Down for this cross-browser approach, from the answer to: JavaScript: How to strip HTML tags from string?
First you'll need to be a bit more specific with that data to get to the string of text you want:
var temp = JSON.parse(resp).data[0][3]['__html'];
Next you'll need to search that string to extract the data you want. That will largely depend on the regularity of the response you are getting. In any case, you will probably need to use a regular expression to parse the string you get in the response.
In this case, you are trying to get the text within the <span> element in the string. If that was the case for all your responses, you could do something like:
var text = /<span[^>]*>([^<]*)<\/span>/.exec(temp)[1];
This very specifically looks for text within the opening and closing of one span tag that contains no other HTML tags.
The main part to look at in the expression here is the ([^<]*), which captures any run of characters that are not an opening angle bracket, <. Everything around it looks for a <span> with optional attributes and its closing tag. exec is the method you call on the regular expression, passing it the temp string, to return a match, and the [1] gives you the first and only capture (i.e. the text between the <span> tags).
You would need to read up more about RegExp to do something more specific (or provide more specific information in your question about the pattern of response you are looking for). But it's generally well worth reading up on regular expressions if you're going to be doing this kind of work (parsing text, looking for patterns and matches), because they are a very concise and powerful way of doing it, if a little confusing at first.
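As a minimal sketch of the two steps above (reach the __html string, then capture the span text), the following wraps them in one function. The name extractSpanText is hypothetical, and the code assumes the response always has the data[0][3] layout shown in the question; it returns null rather than throwing when the layout or the span is missing.

```javascript
// Hypothetical helper: pulls the text of the first tag-free <span> out of the
// "__html" field of a response shaped like the one in the question.
function extractSpanText(resp) {
  const data = JSON.parse(resp).data;
  // Assumed layout: data[0][3] is the object holding "__html".
  const html = data && data[0] && data[0][3] ? data[0][3].__html : null;
  if (typeof html !== "string") {
    return null;
  }
  const match = /<span[^>]*>([^<]*)<\/span>/.exec(html);
  return match ? match[1] : null; // null instead of a TypeError when nothing matches
}

// Sample input rebuilt from the JSON in the question.
const sample = JSON.stringify({
  __ra: 1,
  payload: null,
  data: [["replace", "", true, {
    __html: '<span class="highlight fsm" id="u_c_0">I want this text only</span>'
  }]]
});

console.log(extractSpanText(sample)); // I want this text only
```

The null check matters because exec returns null when the pattern does not match, and indexing [1] on null would throw.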
I want to use a regex to extract some text from website HTML that I've retrieved using Node.js. The text I received looks like this:
<body>
...
<p>text with certain format that I want.</p>
...
</body>
How should I extract the text and store it in a variable?
I need to do this because I have to retrieve the information from numerous pages, and it is impossible to do that manually.
Huge thanks in advance!
If you're just looking for the first instance of a paragraph, you can do this, but this will only fetch the content of the first paragraph. If you want a specific paragraph, you need a way to identify that paragraph as opposed to every other one in the HTML.
If you're looking for something more specific, we'll need to know more about what you're trying to do.
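var regex = /<p>(.*?)<\/p>/, // lazy, so the match stops at the first </p>
    html = '<p>text with certain format that I want.</p>', // your HTML here
    results = regex.exec(html);
console.log(results); // the match array (or null); results[1] is the paragraph text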
var text= '<p>text with certain format that I want.</p>';
jQuery('<div>' + text + '</div>').text();
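If more than the first paragraph is needed, one possible sketch is to run the same kind of pattern with the g flag and collect every capture. The function name extractParagraphs and the sample page are made up for illustration, and the pattern assumes plain <p> tags with no attributes and no nesting.

```javascript
// Hypothetical helper: returns the inner text of every attribute-free <p>...</p>.
function extractParagraphs(html) {
  const regex = /<p>([\s\S]*?)<\/p>/g; // lazy; [\s\S] also spans line breaks
  const results = [];
  let match;
  // With the g flag, each exec call resumes after the previous match.
  while ((match = regex.exec(html)) !== null) {
    results.push(match[1].trim());
  }
  return results;
}

const page =
  "<body>\n<p>first paragraph</p>\n<div>skip me</div>\n<p>second\nparagraph</p>\n</body>";

console.log(extractParagraphs(page)); // two entries, one per paragraph
```

For pages with attributes on the tags or nested markup, an HTML parser is the more dependable route, as the other answers in this thread point out.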
I'm building a search results page (in Angular) and using regular expressions to highlight the searched keywords based on a condition. I'm having trouble getting the condition right, so apologies if my current syntax is messy; I've been playing about with it for hours.
For this test I'm highlighting the word 'midlands', and I want to highlight every occurrence except those inside the href="" attribute of an <a> tag. I don't want to highlight anything that's part of a URL, because I'll be wrapping the keywords in a span and that would break the URL structure. Can anyone help? I think I'm almost there.
Here's the current RegExp I'm using:
/(\b|^|)(\s|\()midlands(\b|$)(|\))/gi
Here's a link to test what I'm after.
https://regex101.com/r/wV4gC3/2
Further info: after the view has rendered, I grab the HTML content of the repeated results and then run the search on the rendered HTML with the condition above, in case this helps anyone.
You're going about this all wrong. Don't parse HTML with regular expressions - use the DOM's built in HTML parser and explicitly run the regex on text nodes.
First we get all the text nodes. With jQuery that's:
var texts = $(elem).contents().get().filter(function(el){
return el.nodeType === 3; // 3 is text
});
Otherwise - see the answer here for code for getting all text nodes in VanillaJS.
Then, iterate them and replace the relevant text only in the text nodes:
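for (var text of texts) { // in old browsers: angular.forEach(texts, function (text) { ... })
    var html = text.textContent.replace(/midlands/g, function (m) {
        return "<b>" + m + "</b>"; // surround with <b> tags
    });
    // assigning to textContent would show the tags literally, so swap in parsed HTML
    $(text).replaceWith(html);
}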
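Where only an HTML string is available and no DOM, a rough string-based alternative (not the DOM approach described above) is to split the markup into tags and text, and replace only in the text pieces. This is a sketch under assumptions: the name highlightOutsideTags and the "highlight" class are hypothetical, and it assumes no attribute value contains a literal > character.

```javascript
// Hypothetical helper: wraps a keyword in a span, but only in text outside tags.
function highlightOutsideTags(html, keyword) {
  // Escape regex metacharacters so the keyword is matched literally.
  const escaped = keyword.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const word = new RegExp("\\b" + escaped + "\\b", "gi");
  // Splitting on a capturing group keeps the tags: odd indexes are tags,
  // even indexes are the text between them.
  return html
    .split(/(<[^>]*>)/)
    .map(function (part, index) {
      return index % 2 === 1
        ? part // a tag: leave attributes such as href untouched
        : part.replace(word, '<span class="highlight">$&</span>');
    })
    .join("");
}

const result = highlightOutsideTags(
  '<a href="/jobs/midlands">Jobs in the Midlands</a> (midlands)',
  "midlands"
);
console.log(result);
```

In this example the href keeps its original value, while both visible occurrences of the word are wrapped.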
I've been working at this for a week and I'm stumped.
I'm trying to parse an RSS feed from SharePoint using jQuery. Using $.find works great on extracting the data between valid XML tags in the feed, but unfortunately one of the tags stores several HTML tags instead of the nice and clean strings like the others.
I have the tag extracted and stored as a string using the following:
$(xml).find("item").each(function () {
var description = $(this).find('description').text();
})
Which gives me the contents of the description tag:
<![CDATA[<div><b>Title:</b> Welcome!</div>
<div><b>Modified:</b> 6/10/2014 7:58 AM</div>
<div><b>Created:</b> 6/3/2014 2:55 PM</div>
<div><b>Created By:</b> John Smith</div>
<div><b>Modified By:</b> Samuel Smith</div>
<div><b>Version:</b> 1.0</div>
<div><b>AlertContent:</b> Stop the presses.</div>
<div><b>Team:</b> USA.</div>]]>
Now my problem is extracting and storing the useful bits. Is there a way to only extract the text following AlertContent:</b>? It seems this might be possible using regular expressions, but I don't know how to make a filter that would start at the end of the bold tag and extend all the way until the start of the closing div tag. Or is there a better way through jQuery's methods?
Sure you're quite right; regular expressions can help you do that. Here is how you can do it:
var alertContent = description.replace(/^[\s\S]*AlertContent:<\/b>([^<]*)[\s\S]*$/i, '$1');
I'm sure you've heard the warnings about parsing xml with regex. Nevertheless, in case you'd like to know how to do it with regex, this simple pattern will do it:
AlertContent:<\/b>([^<]*)
We start by matching AlertContent:</b>
Then the negative character class [^<]* matches all characters that are not a < and the parentheses capture them to Group 1
All we need to do is read Group 1. Here is sample code to do it:
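var regex = /AlertContent:<\/b>([^<]*)/;
var match = regex.exec(description); // description is the string extracted in the question
var alertContent = null;
if (match != null) {
    alertContent = match[1]; // don't assign to alert, which would overwrite window.alert
}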
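If the other fields are wanted as well, the same idea can be extended to read every "Label:</b> value" pair in one pass. This is only a sketch: parseDescription is a hypothetical name, the sample string is a shortened copy of the description from the question, and it assumes each value is plain text with no nested tags.

```javascript
// Hypothetical helper: turns "<div><b>Label:</b> value</div>" rows into an object.
function parseDescription(description) {
  const fields = {};
  const row = /<b>([^<:]+):<\/b>\s*([^<]*)/g;
  let match;
  while ((match = row.exec(description)) !== null) {
    fields[match[1].trim()] = match[2].trim();
  }
  return fields;
}

const description =
  "<div><b>Title:</b> Welcome!</div>\n" +
  "<div><b>Modified:</b> 6/10/2014 7:58 AM</div>\n" +
  "<div><b>AlertContent:</b> Stop the presses.</div>\n" +
  "<div><b>Team:</b> USA.</div>";

const fields = parseDescription(description);
console.log(fields.AlertContent); // Stop the presses.
```

The label group excludes colons so that a value such as a time ("7:58 AM") stays in the value group.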
Suppose I have the text "Hello World" inside a DIV in an HTML file. I want to change the character at the sixth position of "Hello World" and write the result back to the DOM, using innerHTML or something similar.
The way I do it is:
var text = document.getElementById("divID").innerText;
Once I have the text, I change the character at a particular position using charAt and write the result back by replacing the whole string, not just the character at that position. My question: do we have to replace the whole string every time, or is there a way to extract the character at a particular position and replace only that position, rather than the whole string or text inside the div?
If you just need to insert some text into an already existing string you should use replace(). You won't really gain anything by trying to replace only one character as it will need to make a new string anyway (as strings are immutable).
var text = document.getElementById("divID").innerText;
// find and replace
document.getElementById("divID").innerText = text.replace('Hello World', 'Hello big World');
var newtext = text.replace(text[6], 'b'); works for "Hello World".
Be careful, though: text[6] is just a one-character string, and replace() with a string pattern replaces only the first occurrence of that character, wherever it is. It happens to work here because the character at index 6 does not appear earlier in the text.
Don't rely on it for arbitrary positions.
Yes, you have to replace the entire string by another, since strings are immutable in JavaScript. You can in various ways hide this behind a function call, but in the end what happens is construction of a new string that replaces the old one.
Text within a div is actually held in text nodes, so we have to explicitly manipulate their content by replacing the old content with the new.
If you are using jQuery, the article below shows one possible technique:
Replacing Text Nodes With jQuery: http://www.bennadel.com/blog/2253-Replacing-Text-Nodes-With-jQuery.htm
Behind the scenes, I would guess that jQuery still replaces the entire string for that text node.
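Since strings are immutable, replacing by position comes down to building a new string from the parts before and after the index. A minimal sketch follows; replaceAt is a hypothetical helper name, and in the browser its result would still have to be assigned back to the element's text.

```javascript
// Hypothetical helper: returns a copy of str with the character at index replaced.
function replaceAt(str, index, replacement) {
  if (index < 0 || index >= str.length) {
    return str; // out of range: nothing to replace
  }
  return str.slice(0, index) + replacement + str.slice(index + 1);
}

const text = "Hello World";

console.log(replaceAt(text, 7, "0"));    // Hello W0rld  (index 7 is changed)
console.log(text.replace(text[7], "0")); // Hell0 World  (first "o" is changed instead)
```

The second call shows why replace(text[i], ...) is not a positional replacement: it targets the first occurrence of the character found at that index.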
I have a string variable from which I would like to extract the title value of the element with id="resultcount". The output should be 2.
var str = '<table cellpadding=0 cellspacing=0 width="99%" id="addrResults"><tr></tr></table><span id="resultcount" title="2" style="display:none;">2</span><span style="font-size: 10pt">2 matching results. Please select your address to proceed, or refine your search.</span>';
I tried the following regex but it is not working:
/id=\"resultcount\" title=['\"][^'\"](+['\"][^>]*)>/
Since var str = ... is JavaScript syntax, I assume you need a JavaScript solution. As Peter Corlett said, you can't parse HTML using regular expressions, but if you are using jQuery you can take advantage of the browser's own parser with little effort:
$('#resultcount', '<div>'+str+'</div>').attr('title')
It will return undefined if resultcount is not found or if it has no title attribute.
To make sure it doesn't matter which attribute (id or title) comes first in the string, take the entire HTML element with the required id:
var tag = str.replace(/^.*(<[^<]+?id=\"resultcount\".+?\/.+?>).*$/, "$1")
Then find the title in that string:
var res = tag.replace(/^.*title=\"(\d+)\".*$/, "$1");
// res is 2
But, as people have previously mentioned, it is unreliable to use RegEx for parsing HTML: something as trivial as a different quote (single instead of double) or a space in the "wrong" place will break it.
Please see this earlier response, entitled "You can't parse [X]HTML with regex":
RegEx match open tags except XHTML self-contained tags
Well, since no one else is jumping in on this and I'm assuming you're just looking for a value and not trying to create a parser, I'll give you what works for me with PCRE. I'm not sure how to put it into JavaScript syntax for you, but I think you'll be able to do that.
span id="resultcount" title="(\d+)"
The part you're looking to get is the capturing group $1, which is the '\d+' part. It will get one or more digits between the quote marks.
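Putting the regex answers above into JavaScript, one possible sketch first isolates the tag carrying the id and then reads the attribute from it, so attribute order and quote style matter less. The name getAttributeById is hypothetical, the sample string is a shortened copy of the one in the question, and the code assumes the id and attribute names contain no regex metacharacters. It is still a regex over HTML, so the earlier warnings apply.

```javascript
// Hypothetical helper: reads one attribute from the first tag carrying the given id.
function getAttributeById(html, id, attribute) {
  // Step 1: find the opening tag that contains id="..." (either quote style).
  const tagPattern = new RegExp("<[^<>]*\\sid=[\"']" + id + "[\"'][^<>]*>");
  const tag = tagPattern.exec(html);
  if (tag === null) {
    return undefined;
  }
  // Step 2: read the attribute value from that tag only.
  const attrPattern = new RegExp("\\s" + attribute + "=(?:\"([^\"]*)\"|'([^']*)')");
  const attr = attrPattern.exec(tag[0]);
  if (attr === null) {
    return undefined;
  }
  return attr[1] !== undefined ? attr[1] : attr[2];
}

const str =
  '<table cellpadding=0 cellspacing=0 width="99%" id="addrResults"><tr></tr></table>' +
  '<span id="resultcount" title="2" style="display:none;">2</span>' +
  '<span style="font-size: 10pt">2 matching results.</span>';

console.log(getAttributeById(str, "resultcount", "title")); // 2
```

Like the jQuery version, it returns undefined when the element or the attribute is missing.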