Node.js using Regular Expression to extract certain string from the response - javascript

I want to use a regex to extract some text from the HTML of a website that I've retrieved using Node.js. The text I received looks like this:
<body>
...
<p>text with certain format that I want.</p>
...
</body>
How should I extract the text and store it in a variable?
I need to retrieve this information from numerous pages, so doing it manually is not practical.
Huge thanks in advance!

If you're just looking for the first instance of a paragraph, you can do this, but this will only fetch the content of the first paragraph. If you want a specific paragraph, you need a way to identify that paragraph as opposed to every other one in the HTML.
If you're looking for something more specific, we'll need to know more about what you're trying to do.
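var regex = /<p>(.*?)<\/p>/, // lazy match, so it stops at the first </p>
html = '<p>text with certain format that I want.</p>', // your HTML here
results = regex.exec(html);
console.log(results); // match array (results[1] is the captured text), or null if nothing matched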
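If every paragraph on a page is needed rather than just the first, the same idea can be extended with a global regex. This is only a minimal sketch: extractParagraphs is a hypothetical helper name, the sample HTML is made up, and it assumes the paragraphs contain no nested <p> tags. A real HTML parser is more robust for anything less regular.

```javascript
// Hypothetical helper: returns the inner content of every <p>...</p> in an HTML string.
function extractParagraphs(html) {
  // (?:\s[^>]*)? allows attributes on <p> without also matching tags such as <pre>.
  // [\s\S]*? lazily matches any characters, including newlines.
  var regex = /<p(?:\s[^>]*)?>([\s\S]*?)<\/p>/gi;
  var texts = [];
  var match;
  while ((match = regex.exec(html)) !== null) {
    texts.push(match[1].trim());
  }
  return texts;
}

// Made-up sample input for illustration.
var sampleHtml = '<body>\n<pre>skip me</pre>\n<p>first</p>\n<p class="x">text with certain format that I want.</p>\n</body>';
console.log(extractParagraphs(sampleHtml));
```

The function returns an empty array when no paragraph is found, so the caller does not need a null check.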

var text= '<p>text with certain format that I want.</p>';
jQuery('<div>' + text + '</div>').text();

Related

Using JavaScript, Can I upload a word file and use .replace then save as new document

Using JavaScript I would like to upload a word document and/or browse for file on local machine and view the contents... I would then like to replace the contents with different text.
Here is a snippet of the text replace I want to use.
<button onclick="myFunction()">Convert</button>
<script>
function myFunction()
{
var str = document.getElementById("source").value;
var res =
str.replace(/a/g, "ა")
.replace(/b/g, "ბ")
.replace(/g/g, "გ")
//+ more letters for entire alphabet
document.getElementById("source").value=res;
}
</script>
What I would like to know is if it's possible to get the contents of a word document file, change all of the letters into Georgian characters (whilst retaining formatting if possible) then to save as a new word document?
For docx you could use DOCX.js https://github.com/stephen-hardy/DOCX.js
If you use a .docx file this should be possible, since a .docx is a ZIP archive of XML files. Once you have the document XML as a string, you might parse it with the jQuery XML parser (http://api.jquery.com/jQuery.parseXML/). With larger documents this might not be the best solution.
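Whichever way the document text is obtained, the chain of replace calls in the question could be collapsed into a single pass over the string with a lookup table. A minimal sketch, assuming only the three mappings shown in the question (latinToGeorgian and transliterate are hypothetical names):

```javascript
// Only the three mappings from the question; the rest of the alphabet would be added here.
var latinToGeorgian = { a: 'ა', b: 'ბ', g: 'გ' };

// Replaces each lowercase Latin letter that has a mapping; everything else is left untouched.
function transliterate(str) {
  return str.replace(/[a-z]/g, function (ch) {
    return Object.prototype.hasOwnProperty.call(latinToGeorgian, ch)
      ? latinToGeorgian[ch]
      : ch;
  });
}

console.log(transliterate('bag'));
```

A single pass also avoids a later replacement accidentally rewriting the output of an earlier one.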

How to manipulate particular character in DOM in JavaScript?

Suppose I have text called "Hello World" inside a DIV in html file. I want to manipulate sixth position in "Hello World" text and replace that result in DOM, like using innerHTML or something like that.
The way I do it is:
var text = document.getElementById("divID").innerText;
Now that I have the text, I manipulate the character at a particular position using charAt, and then write the result back to the HTML by replacing the whole string, not just the element at that position. What I want to ask is: do we have to replace the whole string every time, or is there a way to extract the character at a particular position and replace the result at that position only, rather than the whole string or text inside the div?
If you just need to insert some text into an already existing string you should use replace(). You won't really gain anything by trying to replace only one character as it will need to make a new string anyway (as strings are immutable).
var text = document.getElementById("divID").innerText;
// find and replace
document.getElementById("divID").innerText = text.replace('Hello World', 'Hello big World'); // match is case-sensitive
var newtext=text.replace(text[6],'b'); works for "Hello World", but only by coincidence.
It doesn't replace all instances of that character because replace() with a string pattern only replaces the first match. text[6] is just a one-character string, not some special 'character' object, so this call replaces the first occurrence of that character in the text, which is not necessarily the one at index 6.
Yes, you have to replace the entire string by another, since strings are immutable in JavaScript. You can in various ways hide this behind a function call, but in the end what happens is construction of a new string that replaces the old one.
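One way to hide this behind a function call is to build the new string from slices around the target index. A minimal sketch (replaceAt is a hypothetical helper name, not a built-in):

```javascript
// Returns a new string with the character at `index` swapped for `replacement`.
// The original string is not modified, since strings are immutable.
function replaceAt(str, index, replacement) {
  if (index < 0 || index >= str.length) {
    return str; // out of range: nothing to replace
  }
  return str.slice(0, index) + replacement + str.slice(index + 1);
}

var greeting = 'Hello World';
console.log(replaceAt(greeting, 6, 'b')); // "Hello borld"
console.log(replaceAt(greeting, 7, 'b')); // "Hello Wbrld"
// For comparison, a string-pattern replace hits the first 'o', not the one at index 7:
console.log(greeting.replace(greeting[7], 'b')); // "Hellb World"
```

The result would then be written back to the element, for example through innerText as in the question.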
Text inside a div is actually held in text nodes, so we have to manipulate its content explicitly by replacing the older content with the newer one.
If you are using jQuery then you can refer to the below link for a possible technique:
Replacing Text Nodes With jQuery: http://www.bennadel.com/blog/2253-Replacing-Text-Nodes-With-jQuery.htm
Behind the scenes, I would guess that jQuery still replaces the entire string for that text node.

Unable to parse the JSON correctly

From a response of type application/x-javascript, I am picking the required JSON portion into a variable. Below is the JSON:
{
"__ra":1,
"payload":null,
"data":[
[
"replace",
"",
true,
{
"__html": "\u003Cspan class=\"highlight fsm\" id=\"u_c_0\">I want this text only\u003C\/span>"
}
]
]
}
From the references, which I got from Stackoverflow, I am able to pick the content inside data in the following way-
var temp = JSON.parse(resp).data;
But my aim is to get only the text part of the __html value, which is "I want this text only". Can somebody help?
First you have to access the object you targeted:
var html = JSON.parse(resp).data[0][3].__html; // note the two underscores in the key
But then the output you want is "I want this text only".
The html variable doesn't contain just that text; it holds some HTML, and the content you're looking for is the text inside a span.
If you're willing to include jQuery in your project, you can access that content this way:
var text = $(html).text();
To put it all together:
var html = JSON.parse(resp).data[0][3].__html; // note the two underscores in the key
var div = document.createElement("div");
div.innerHTML = html;
var text = div.textContent || div.innerText || "";
Kudos to @Tim Down for this answer on cross-browser innerHTML: JavaScript: How to strip HTML tags from string?
First you'll need to be a bit more specific with that data to get to the string of text you want:
var temp = JSON.parse(resp).data[0][3]['__html'];
Next you'll need to search that string to extract the data you want. That will largely depend on the regularity of the response you are getting. In any case, you will probably need to use a regular expression to parse the string you get in the response.
In this case, you are trying to get the text within the <span> element in the string. If that was the case for all your responses, you could do something like:
var text = /<span[^>]*>([^<]*)<\/span>/.exec(temp)[1];
This very specifically looks for text within the opening and closing of one span tag that contains no other HTML tags.
The main part to look at in the expression here is the ([^<]*), which will capture any run of characters that are not an opening angle bracket, <. Everything around this is looking for instances of <span> with optional attributes. The exec is the method you call on the regular expression, passing the temp string, to return a match, and the [1] will give you the first and only capture (i.e. the text between the <span> tags).
You would need to read up more about RegExp to find out how to do something more specific (or provide more specific information in your question about the pattern of response you are looking for). But it's generally well worth reading up on regular expressions if you're going to be doing this kind of work (parsing text, looking for patterns and matches), because they are a very concise and powerful way of doing it, if a little confusing at first.
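Putting the two steps together, with a guard for the case where the pattern does not match (exec returns null then, and indexing [1] would throw). This is a sketch only: extractSpanText is a hypothetical helper name, and the resp string below is the question's JSON re-escaped as a JavaScript string literal.

```javascript
// The JSON from the question, written as a JavaScript string literal
// (backslashes are doubled so that JSON.parse sees the original escapes).
var resp = '{"__ra":1,"payload":null,"data":[["replace","",true,' +
  '{"__html":"\\u003Cspan class=\\"highlight fsm\\" id=\\"u_c_0\\">I want this text only\\u003C\\/span>"}]]}';

// Hypothetical helper: returns the text inside the first tag-free <span>, or null.
function extractSpanText(response) {
  var html = JSON.parse(response).data[0][3].__html;
  var match = /<span[^>]*>([^<]*)<\/span>/.exec(html);
  return match ? match[1] : null;
}

console.log(extractSpanText(resp)); // "I want this text only"
```

Returning null instead of throwing leaves it to the caller to decide how to handle a response with a different shape.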

Javascript/Greasemonkey match(), regex

I need to grab data from this text from this page:
http://www.chess.com/home/game_archive?sortby=&show=echess&member=deckers1066
I cannot seem to get it working using:
var text = document.body;
var results = text.match(/id=[0-9]*>/g);
I need to grab all occurrences that look something like this
/echess/game?id=60942234
I'm interested more in the id number
You've got two problems with your code; one is the string you want to search is document.body.innerHTML and the other is the RegExp is looking for the end tag to the element, > without a quote before it. Try this
var results = document.body.innerHTML.match(/id=\d+/g);
Note that I completely omitted the end tag: \d+ is greedy, so it matches the whole run of digits, and you don't have to worry about what follows it in the HTML.
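Since the question is mostly interested in the id number itself, a capturing group can pull out just the digits. A minimal sketch, assuming the links always have the /echess/game?id=... shape shown in the question (extractGameIds and the sample markup are made up for illustration):

```javascript
// Hypothetical helper: collects the numeric id from every /echess/game?id=... link.
function extractGameIds(html) {
  var regex = /\/echess\/game\?id=(\d+)/g;
  var ids = [];
  var match;
  // With the g flag, exec resumes after the previous match until it returns null.
  while ((match = regex.exec(html)) !== null) {
    ids.push(match[1]);
  }
  return ids;
}

// Made-up sample; in the page itself this would be document.body.innerHTML.
var sampleLinks = '<a href="/echess/game?id=60942234">a</a> <a href="/echess/game?id=123">b</a>';
console.log(extractGameIds(sampleLinks)); // [ '60942234', '123' ]
```

The ids come back as strings; Number(id) converts them if numeric values are needed.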
Please don't use regular expressions for this. You should be using a proper DOM parser (there are many available for pretty much every language) and then selecting the IDs using that.
If you insist on using regex (which I would recommend against), Paul S's answer is the best.

How do I extract the title value from a string using Javascript regexp?

I have a string variable from which I would like to extract the title value of the id="resultcount" element. The output should be 2.
var str = '<table cellpadding=0 cellspacing=0 width="99%" id="addrResults"><tr></tr></table><span id="resultcount" title="2" style="display:none;">2</span><span style="font-size: 10pt">2 matching results. Please select your address to proceed, or refine your search.</span>';
I tried the following regex but it is not working:
/id=\"resultcount\" title=['\"][^'\"](+['\"][^>]*)>/
Since var str = ... is JavaScript syntax, I assume you need a JavaScript solution. As Peter Corlett said, you can't parse HTML using regular expressions, but if you are using jQuery you can take advantage of the browser's own parser without much effort:
$('#resultcount', '<div>'+str+'</div>').attr('title')
It will return undefined if resultcount is not found or if it has no title attribute.
To make sure it doesn't matter which attribute (id or title) comes first in a string, take entire html element with required id:
var tag = str.replace(/^.*(<[^<]+?id=\"resultcount\".+?\/.+?>).*$/, "$1")
Then find title from previous string:
var res = tag.replace(/^.*title=\"(\d+)\".*$/, "$1");
// res is 2
But, as people have previously mentioned, it is unreliable to use a regex for parsing HTML: something as trivial as a different quote (single instead of double) or a space in the "wrong" place will break it.
Please see this earlier response, entitled "You can't parse [X]HTML with regex":
RegEx match open tags except XHTML self-contained tags
Well, since no one else is jumping in on this and I'm assuming you're just looking for a value and not trying to create a parser, I'll give you what works for me with PCRE. I'm not sure how to put it into JavaScript form for you, but I think you'll be able to do that.
span id="resultcount" title="(\d+)"
The part you're looking to get is the capturing group $1, which is the '\d+' part. It will match one or more digits between the quote marks.
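In JavaScript, the same pattern could be written as a regex literal and applied with exec. This is a sketch only (extractResultCount is a hypothetical name, and the string is shortened from the question); like the pattern above, it assumes double quotes and that title immediately follows id.

```javascript
// Shortened version of the string from the question.
var str = '<table id="addrResults"><tr></tr></table>' +
  '<span id="resultcount" title="2" style="display:none;">2</span>';

// Hypothetical helper: returns the title digits as a string, or null if the pattern is absent.
function extractResultCount(html) {
  var match = /span id="resultcount" title="(\d+)"/.exec(html);
  return match ? match[1] : null; // match[1] is the first capturing group
}

console.log(extractResultCount(str)); // "2"
```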
