Javascript/Greasemonkey match(), regex - javascript

I need to grab data from this text from this page:
http://www.chess.com/home/game_archive?sortby=&show=echess&member=deckers1066
I cannot seem to get it working using:
var text = document.body;
var results = text.match(/id=[0-9]*>/g);
I need to grab all occurrences that look something like this:
/echess/game?id=60942234
I'm mostly interested in the id number.

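You've got two problems with your code. First, the string you want to search is document.body.innerHTML; document.body itself is an element, not a string, so it has no match() method. Second, the RegExp expects the > that ends the tag directly after the digits, without allowing for the quote that comes before it. Try this: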
var results = document.body.innerHTML.match(/id=\d+/g);
Note that I completely omitted the end of the tag: \d+ is greedy, so it consumes every digit of the id, and you don't have to worry about HTML parsing.
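If only the number is wanted, a capture group saves stripping the id= prefix afterwards. This is a minimal sketch, assuming the links look like /echess/game?id=60942234 as in the question; the helper name extractGameIds is made up for illustration.

```javascript
// Hypothetical helper: collects the digits from every
// "/echess/game?id=<digits>" occurrence in an HTML string.
function extractGameIds(html) {
  var pattern = /\/echess\/game\?id=(\d+)/g;
  var ids = [];
  var match;
  // With the g flag, exec() resumes after the previous match on each call.
  while ((match = pattern.exec(html)) !== null) {
    ids.push(match[1]); // group 1 holds only the digits
  }
  return ids;
}

// In a Greasemonkey script this would be called as:
//   var ids = extractGameIds(document.body.innerHTML);
var sampleHtml = '<a href="/echess/game?id=60942234">a</a> <a href="/echess/game?id=123">b</a>';
console.log(extractGameIds(sampleHtml));
```

Anchoring on the /echess/game? prefix also keeps unrelated id= attributes elsewhere in the page out of the result.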

Please don't use regular expressions for this. You should be using a proper DOM parser (there are many available for pretty much every language) and then selecting the IDs using that.
If you insist on using regex (which I would recommend against), Paul S's answer is the best.
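For a userscript, the DOM is already parsed, so the ids can be read from the link elements directly. A rough sketch, assuming a browser that supports querySelectorAll and the URL constructor; the helper name collectGameIds and its parameters are invented for this example.

```javascript
// Hypothetical helper: reads the "id" query parameter from each matching link.
// `root` is anything with querySelectorAll (for example `document`);
// `baseUrl` is only used to resolve relative hrefs.
function collectGameIds(root, baseUrl) {
  var links = root.querySelectorAll('a[href*="/echess/game?id="]');
  var ids = [];
  for (var i = 0; i < links.length; i++) {
    var url = new URL(links[i].getAttribute('href'), baseUrl);
    var id = url.searchParams.get('id');
    if (id !== null) {
      ids.push(id);
    }
  }
  return ids;
}

// In the page itself:
//   var ids = collectGameIds(document, location.href);
```

No regular expression is involved here; the browser's own URL parsing picks out the parameter.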

Related

JavaScript RegEx match unless wrapped with [nocode][/nocode] tags

My current code is:
var user_pattern = this.settings.tag;
user_pattern = user_pattern.replace(/[\-\[\]\/\{\}\(\)\*\+\?\.\\\^\$\|]/g, "\\$&"); // escape regex
var pattern = new RegExp(user_pattern.replace(/%USERNAME%/i, "(\\S+)"), "ig");
Where this.settings.tag is a string such as "[user=%USERNAME%]" or "#%USERNAME%". The code uses pattern.exec(str) to find any username in the corresponding tag and works perfectly fine. For example, if str = "Hello, [user=test]" then pattern.exec(str) will find test.
This works fine, but I want to be able to stop it from matching if the string is wrapped in [nocode][/nocode] tags. For example, if str = "[nocode]Hello, [user=test], how are you?[/nocode]" then pattern.exec(str) should not match anything.
I'm not quite sure where to start. I tried using a (?![nocode]) before and after the pattern, but to no avail. Any help would be great.
I would just test if the string starts with [nocode] first:
/^\[nocode\]/.test('[nocode]');
Then simply do not process it.
Maybe filter out [nocode] before trying to find the username(s)?
pattern.exec(str.replace(/\[nocode\](.*?)\[\/nocode\]/g,'')); // lazy .*? so that text between two separate [nocode] blocks is kept
I know this isn't exactly what you asked for because now you have to use two separate regular expressions, however code readability is important too and doing it this way is definitely better in that aspect. Hope this helps 😉
JSFiddle: http://jsfiddle.net/1f485Lda/1/
It's based on this: Regular Expression to get a string between two strings in Javascript
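Putting the two steps together, something like the following might work. It is only a sketch: the helper names escapeRegex and findUsernames are invented here, and it assumes the tag setting always contains %USERNAME% and that [nocode] blocks are not nested.

```javascript
// Escapes regex metacharacters, using the same character class as the question.
function escapeRegex(text) {
  return text.replace(/[\-\[\]\/\{\}\(\)\*\+\?\.\\\^\$\|]/g, '\\$&');
}

// Hypothetical helper: returns every username found outside [nocode] blocks.
function findUsernames(str, tag) {
  var source = escapeRegex(tag).replace(/%USERNAME%/i, '(\\S+)');
  var pattern = new RegExp(source, 'ig');
  // Lazy [\s\S]*? stops at the first closing tag and also spans newlines.
  var visible = str.replace(/\[nocode\][\s\S]*?\[\/nocode\]/gi, '');
  var names = [];
  var match;
  while ((match = pattern.exec(visible)) !== null) {
    if (match[0] === '') {
      pattern.lastIndex++; // guard against looping forever on an empty match
    }
    names.push(match[1]);
  }
  return names;
}

console.log(findUsernames('Hello, [user=test]', '[user=%USERNAME%]'));
console.log(findUsernames('[nocode]Hello, [user=test], how are you?[/nocode]', '[user=%USERNAME%]'));
```

Removing the blocks first keeps the username pattern itself unchanged, at the cost of a second pass over the string.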

Extracting HTML string within XML tag with jQuery

I've been working at this for a week and I'm stumped.
I'm trying to parse an RSS feed from SharePoint using jQuery. Using $.find works great on extracting the data between valid XML tags in the feed, but unfortunately one of the tags stores several HTML tags instead of the nice and clean strings like the others.
I have the tag extracted and stored as a string using the following:
$(xml).find("item").each(function () {
var description = $(this).find('description').text();
})
Which gives me the contents of the description tag:
<![CDATA[<div><b>Title:</b> Welcome!</div>
<div><b>Modified:</b> 6/10/2014 7:58 AM</div>
<div><b>Created:</b> 6/3/2014 2:55 PM</div>
<div><b>Created By:</b> John Smith</div>
<div><b>Modified By:</b> Samuel Smith</div>
<div><b>Version:</b> 1.0</div>
<div><b>AlertContent:</b> Stop the presses.</div>
<div><b>Team:</b> USA.</div>]]>
Now my problem is extracting and storing the useful bits. Is there a way to only extract the text following AlertContent:</b>? It seems this might be possible using regular expressions, but I don't know how to make a filter that would start at the end of the bold tag and extend all the way until the start of the closing div tag. Or is there a better way through jQuery's methods?
Sure you're quite right; regular expressions can help you do that. Here is how you can do it:
var alertContent = description.replace(/^[\s\S]*AlertContent:<\/b>([^<]*)[\s\S]*$/i, '$1'); // the / in </b> must be escaped; [\s\S] also matches the line breaks between the divs
I'm sure you've heard the warnings about parsing xml with regex. Nevertheless, in case you'd like to know how to do it with regex, this simple pattern will do it:
AlertContent:<\/b>([^<]*)
We start by matching AlertContent:</b>
Then the negated character class [^<]* matches all characters that are not a < and the parentheses capture them to Group 1
All we need to do is read Group 1. Here is sample code to do it:
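var regex = /AlertContent:<\/b>([^<]*)/;
var match = regex.exec(description); // description is the string extracted in the question
if (match !== null) {
    // Group 1 holds the text between </b> and the next tag.
    // A name other than "alert" avoids overwriting window.alert.
    var alertContent = match[1];
}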
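The same pattern can be generalised to the other fields in the description. The sketch below assumes every field has the "<b>Label:</b> value" shape shown in the question; extractField is a made-up helper name.

```javascript
// Hypothetical helper: returns the text that follows "<label>:</b>" up to the
// next tag, or null when the label is absent.
function extractField(description, label) {
  // Escape the label so characters such as "." or "(" are matched literally.
  var safeLabel = label.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  var pattern = new RegExp(safeLabel + ':</b>([^<]*)', 'i');
  var match = pattern.exec(description);
  return match ? match[1].trim() : null;
}

var sampleDescription =
  '<div><b>Version:</b> 1.0</div>\n' +
  '<div><b>AlertContent:</b> Stop the presses.</div>\n' +
  '<div><b>Team:</b> USA.</div>';
console.log(extractField(sampleDescription, 'AlertContent'));
```

Because the colon and </b> are part of the pattern, a label such as Modified does not accidentally match the Modified By field.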

Javascript regex whitespace is being wacky

I'm trying to write a regex that searches a page for any script tags and extracts the script content, and in order to accommodate any HTML-writing style, I want my regex to include script tags with any arbitrary number of whitespace characters (e.g. <script type = blahblah> and <script type=blahblah> should both be found). My first attempt ended up with funky results, so I broke down the problem into something simpler, and decided to just test and play around with a regex like /\s*h\s*/g.
When testing it out on strings, for some reason some completely arbitrary amounts of whitespace around the 'h' would match and other arbitrary amounts wouldn't, e.g. something like " h " would match but " h " wouldn't. Does anyone have an idea of why this is occurring or what error I'm making?
Since you're using JavaScript, why can't you just use getElementsByTagName('script')? That's how you should be doing it.
If you somehow have an HTML string, create an iframe and dump the HTML into it, then run getElementsByTagName('script') on it.
OK, to extend Kolink's answer, you don't need an iframe, or event handlers:
var temp = document.createElement('div');
temp.innerHTML = otherHtml;
var scripts = temp.getElementsByTagName('script');
... now scripts is a DOM collection of the script elements - and the script doesn't get executed ...
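Once the scripts are in a DOM collection, filtering by type needs no regex at all, since the parser has already dealt with the whitespace around the = sign. A small sketch; scriptContentsOfType is a hypothetical name and 'blahblah' is the placeholder type from the question.

```javascript
// Hypothetical helper: returns the text of every <script> inside `container`
// whose type attribute equals `type` (case-insensitive, outer spaces ignored).
function scriptContentsOfType(container, type) {
  var scripts = container.getElementsByTagName('script');
  var wanted = type.toLowerCase();
  var contents = [];
  for (var i = 0; i < scripts.length; i++) {
    var actual = (scripts[i].getAttribute('type') || '').trim().toLowerCase();
    if (actual === wanted) {
      contents.push(scripts[i].textContent);
    }
  }
  return contents;
}

// In a browser, continuing from the temp div above:
//   var found = scriptContentsOfType(temp, 'blahblah');
```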
Why regex is not a fantastic idea for this:
As a <script> element may not contain the string </script> anywhere, writing a regex to match them would not be difficult: /<script[\s\S]+?<\/script>/gi (note that inside a character class a . is a literal dot, so [\s\S] is needed to match any character including newlines)
It looks like you want to only match scripts with a specific type attribute. You could try to include that in your pattern too: /<script[^>]+type\s*=\s*(["']?)blahblah\1[\s\S]*?<\/script>/gi - but that is horrible. (That's what happens when you use regular expressions on irregular strings, you need to simplify)
So instead you iterate through all the basic matched scripts, extract the starting tag: result.match(/<script[^>]*>/i)[0] and within that, search for your type attribute /type\s*=\s*((["'])blahblah\2|\bblahblah\b)/.test(startTag). Oh look - it's back to horrible - simplify!
This time via normalisation:
startTag = startTag.replace(/\s*=\s*/g, '=').replace(/=([^\s"'>]+)/g, '="$1"') - now you're in danger territory, what if the = is inside a quoted string? Can you see how it just gets more and more complicated?
You can only have this work using regex if you make robust assumptions about the HTML you'll use it on (i.e. to make it regular). Otherwise your problems will grow and grow and grow!
disclaimer: I haven't tested any of the regex used to see if they do what I say they do, they're just example attempts.
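As for the original /\s*h\s*/g experiment: the question doesn't say how the pattern was tested, so this is only one possible explanation. A regex object with the g flag remembers where its last match ended (lastIndex), so repeated test() or exec() calls on the same object can succeed and fail in a way that looks arbitrary. A minimal sketch:

```javascript
var globalRe = /\s*h\s*/g;
var first = globalRe.test(' h ');    // search starts at index 0, lastIndex becomes 3
var second = globalRe.test('  h  '); // search resumes at index 3, past the 'h'
var third = globalRe.test('  h  ');  // lastIndex was reset to 0 after the failure

// Without the g flag, every call starts from index 0.
var plainRe = /\s*h\s*/;
var plainResults = [plainRe.test(' h '), plainRe.test('  h  '), plainRe.test('  h  ')];

console.log(first, second, third); // true false true
console.log(plainResults);         // every entry is true
```

If that is what happened, dropping the g flag for simple tests, or resetting lastIndex to 0 between calls, should make the results consistent.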

How do I extract the title value from a string using Javascript regexp?

I have a string variable from which I would like to extract the title value of the id="resultcount" element. The output should be 2.
var str = '<table cellpadding=0 cellspacing=0 width="99%" id="addrResults"><tr></tr></table><span id="resultcount" title="2" style="display:none;">2</span><span style="font-size: 10pt">2 matching results. Please select your address to proceed, or refine your search.</span>';
I tried the following regex but it is not working:
/id=\"resultcount\" title=['\"][^'\"](+['\"][^>]*)>/
Since var str = ... is JavaScript syntax, I assume you need a JavaScript solution. As Peter Corlett said, you can't parse HTML using regular expressions, but if you are using jQuery you can take advantage of the browser's own parser with little effort:
$('#resultcount', '<div>'+str+'</div>').attr('title')
It will return undefined if resultcount is not found or has no title attribute.
To make sure it doesn't matter which attribute (id or title) comes first in a string, take entire html element with required id:
var tag = str.replace(/^.*(<[^<]+?id=\"resultcount\".+?\/.+?>).*$/, "$1")
Then find title from previous string:
var res = tag.replace(/^.*title=\"(\d+)\".*$/, "$1");
// res is 2
But, as people have previously mentioned, it is unreliable to use RegEx for parsing HTML: something as trivial as a different quote (single instead of double) or a space in the "wrong" place will break it.
Please see this earlier response, entitled "You can't parse [X]HTML with regex":
RegEx match open tags except XHTML self-contained tags
Well, since no one else is jumping in on this and I'm assuming you're just looking for a value and not trying to create a parser, I'll give you what works for me with PCRE. I'm not sure how to put it into the java format for you but I think you'll be able to do that.
span id="resultcount" title="(\d+)"
The part you're looking to get is the non-passive group $1 which is the '\d+' part. It will get one or more digits between the quote marks.
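Carried over to JavaScript, the idea might look like the sketch below. It goes slightly beyond the PCRE pattern by accepting either quote style and either attribute order, and it still assumes the tag contains no > inside an attribute value; getResultCount is a hypothetical name.

```javascript
// Hypothetical helper: returns the digits in the title attribute of the tag
// carrying id="resultcount", or null when they cannot be found.
function getResultCount(html) {
  // Step 1: isolate the opening tag that has the wanted id.
  var tagMatch = /<[^<>]*\bid\s*=\s*(["'])resultcount\1[^<>]*>/i.exec(html);
  if (!tagMatch) {
    return null;
  }
  // Step 2: read the title value from that tag only.
  var titleMatch = /\btitle\s*=\s*(["'])(\d+)\1/i.exec(tagMatch[0]);
  return titleMatch ? titleMatch[2] : null;
}

var sampleStr = '<table id="addrResults"><tr></tr></table>' +
  '<span id="resultcount" title="2" style="display:none;">2</span>';
console.log(getResultCount(sampleStr));
```

Note that the result is the string "2", not a number; wrap it in parseInt(..., 10) if a number is needed.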

regex match different with jQuery

I am getting different results from a regex match when using regular JS and jQuery v1.4.2, and I am unable to figure out why. Only the matched string should be returned.
I use jQuery to grab the whole table via the parent ID using .html(). textToSearch is shortened here.
textToSearch = '<tr><th colspan="5">my match with spaces here (<a href=';
pattern = /(?=<th colspan="\d">).*(?= \()/i;
expected_result = 'my match with spaces here';
var match = textToSearch.match(pattern);
In regular JS I get the expected result, but in jQuery I get 'my match with spaces here'.
Am I doing something wrong, or is jQuery messing things up?
Maybe there is a better way of getting the expected result?
Edit: Solution below.
var pattern = /.*(?= \()/;
var t = $('#'+id+' th[colspan]').text();
$('#'+targetid).text(t.match(pattern)[0]);
jQuery's html() will not return the html in the format you're expecting above, and it may even differ between browsers. The string returned is constructed from the browser's representation of the DOM tree, and so will look nothing like the "view source" feature of most browsers. Parsing HTML with regular expressions isn't a great idea for this reason (and others).
It's unclear to me why you're using a regular expression. If you're grabbing the html with html(), you may as well just filter the nodes and grab their text instead. It would be a much more robust solution. For instance, the following code may do the job for you:
var result = $("table th[colspan]")[0].firstChild.nodeValue;
Or perhaps you want the entire text for that <th> element?
var result = $("table th[colspan]").text();
Either way, there's almost certainly a better way to attain the match you're after without using regular expressions.
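If the text still needs trimming at the opening parenthesis, as in the asker's own solution, that step works on plain text and can be kept separate from the DOM lookup. A sketch under that assumption; labelBeforeParen is an invented name.

```javascript
// Hypothetical helper: returns the part of `text` before the first " (",
// or the whole text (trimmed) when there is no such sequence.
function labelBeforeParen(text) {
  var match = /^([\s\S]*?) \(/.exec(text);
  return (match ? match[1] : text).trim();
}

// In the page, with jQuery as in the answer above:
//   var label = labelBeforeParen($('table th[colspan]').text());
console.log(labelBeforeParen('my match with spaces here (link)'));
```

The lazy quantifier stops at the first " (", whereas the greedy .* in the question's pattern would run to the last one.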
