I'm doing some research for a project that uses document.createTreeWalker, and I'm looking at a script that uses quite a few XPath expressions, but I'm curious where they come from. Some are obvious, and I have been able to find answers online for ones such as [@AttributeName] and [@TagName], but what are [@StoreName], [@AttributeValue1], [@AttributeValue2]? These I have not been able to look up online.
Particularly, I'm looking at these lines and not understanding:
thisURL = window.document.location.href.toString();
if(thisURL.search("[@StoreName]") != -1) { /* do something */ }
Perhaps I'm misunderstanding your question, but there's nothing functionally or syntactically different between [@AttributeName] and [@StoreName]. They're both predicates that look for elements with particular attributes. The first one looks for AttributeName attributes, while the second looks for StoreName attributes.
That said, the code you're showing isn't actually doing any XPath work. It's just calling JavaScript's string search function on the URL and doing something if there's a match. (Since search treats its argument as a regular expression, the brackets act as a character class rather than a literal character sequence.)
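One detail worth checking for yourself: String.prototype.search converts a string argument into a regular expression, so square brackets are read as a character class. Here is a small sketch comparing it with indexOf, which does a literal substring test. The URLs are made up for illustration.

```javascript
// Hypothetical URLs, not taken from the original script.
const plainUrl = 'http://example.com/shop';
const taggedUrl = 'http://example.com/[@StoreName]/shop';
const token = '[@StoreName]';

// search() compiles the string as a regex: /[@StoreName]/ matches ANY one of
// the characters @, S, t, o, r, e, N, a, m, so almost every URL "matches".
const searchOnPlain = plainUrl.search(token);

// indexOf() looks for the literal 12-character sequence instead.
const indexOnPlain = plainUrl.indexOf(token);
const indexOnTagged = taggedUrl.indexOf(token);

console.log(searchOnPlain !== -1); // true, even though the token is absent
console.log(indexOnPlain !== -1);  // false
console.log(indexOnTagged !== -1); // true
```

If the intent is "does the URL contain this exact text", indexOf (or includes) is the safer choice.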
I have a very specific problem concerning a regular expression matching in Javascript. I'm trying to match a piece of source code, more specifically a portion here:
<TD WIDTH=100% ALIGN=right>World Boards | Olympa - Trade | <b>Bump when Yasir...</b></TD>
The part I'm trying to match is boardid=106121">Olympa - Trade</a>, the part I actually need is "Olympa". So I use the following line of JS code to get a match and have "Olympa" returned:
var world = document.documentElement.innerHTML.match('/boardid=[0-9]+">([A-Z][a-z]+)( - Trade){0,1}<\/a>/i')[1];
The ( - Trade) part is optional in my problem, hence the {0,1} in the regex.
There's also no easier way to narrow down the code by e.g. getElementsByTagName, so searching the complete source code is my only option.
Now here's the funny thing. I have used two online regex matchers (of which one was for JS-regex specifically) to test my regex against the complete source code. Both times, it had a match and returned "Olympa" exactly as it should have. However, when I have Chrome include the script on the actual page, it gives the following error:
Error in event handler for 'undefined': Cannot read property '1' of null TypeError: Cannot read property '1' of null
Obviously, the first part of my line returns "null" because it does not find a match, and taking [1] of "null" doesn't work.
I figured I might not be doing the match on the source code, but when I let the script output document.documentElement.innerHTML to the console, it outputs the complete source code.
I see no reason why this regex fails, so I must be overlooking something very silly. Does anyone else see the problem?
All help appreciated,
Kenneth
You're putting your regular expression inside a string. It should not be inside a string.
var world = document.documentElement.innerHTML.match(/boardid=[0-9]+">([A-Z][a-z]+)( - Trade){0,1}<\/a>/i)[1];
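To see why the quoted version fails, here is a minimal sketch that runs without a page. The html string is an assumed stand-in for document.documentElement.innerHTML, and the href is invented apart from the boardid part quoted in the question.

```javascript
// Assumed sample markup standing in for document.documentElement.innerHTML.
const html = '<a href="board.php?boardid=106121">Olympa - Trade</a>';

// A string argument is compiled with new RegExp(string), so the leading "/"
// and the trailing "/i" become literal characters that must appear in the text.
const asString = html.match('/boardid=[0-9]+">([A-Z][a-z]+)( - Trade){0,1}<\/a>/i');

// A regex literal is parsed as intended: pattern between the slashes, flag after.
const asLiteral = html.match(/boardid=[0-9]+">([A-Z][a-z]+)( - Trade){0,1}<\/a>/i);

// Guard against null before indexing, which avoids the TypeError from the question.
const world = asLiteral ? asLiteral[1] : null;

console.log(asString); // null
console.log(world);    // "Olympa"
```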
Another thing — it appears you have a document object, in which case all this HTML is already parsed for you, and you can take advantage of that instead of reinventing a fragile wheel.
var element = document.querySelector('a[href*="boardid="]');
var world = element.textContent;
(This assumes that you don't need <=IE8 support. If you do, there remains a better way, though.)
(P.S. ? is shorthand for {0,1}.)
I'm working on making a wysiwyg editor using slate.js
I'm in a situation where I'm trying to find the first node with text.
This picture below shows what I'm talking about:
Slate.js find first text pic
In my picture, I'd want to find the node that contains "this is my title.", even if there's several empty lines before it.
Basically if I have a bunch of text written in the editor, how do I find the first text that's not an empty string?
Looking through the docs, I've found the filterDescendants and findDescendants functions which seem to do what I'm looking for.
However, I'm unclear how to use them.
I've tried something like this:
this.state.state.startBlock.findDescendant((d) => d.text !== "")
But this just returns null
The docs say that findDescendant will "Deeply find a descendant node by iterator", where iterator is a function, but there's no examples provided for what sort of function you'd pass here.
Does anyone have any ideas or examples?
Slate.js author here.
You'll likely want to do something like:
state.document.getBlocks().find(block => block.text != '')
This will search through the leaf block nodes in the document (in this case your paragraphs, headers, etc.) and find the first one that isn't empty.
The Slate data model is built with Immutable.js, so reading up on how that library works is very helpful for using Slate. In this case getBlocks() returns an immutable List, which has a find method.
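As a rough illustration of how find behaves, here is a sketch that uses a plain array in place of the Immutable List so it runs without Slate or Immutable.js. The block objects are hypothetical stand-ins, not real Slate nodes.

```javascript
// Hypothetical stand-ins for Slate block nodes; real blocks are Immutable records.
const blocks = [
  { type: 'paragraph', text: '' },
  { type: 'paragraph', text: '' },
  { type: 'heading', text: 'this is my title.' },
  { type: 'paragraph', text: 'body text' },
];

// find returns the first element whose predicate is true, or undefined if none match.
const firstWithText = blocks.find(block => block.text !== '');

console.log(firstWithText.text); // "this is my title."
```

An Immutable List's find takes a predicate in the same way, so the arrow function carries over unchanged.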
Hope that helps!
I saw a code fragment like this:
with(document)0[(getElementsByTagName('head')[0] || body).appendChild(createElement(xxx))]
I don't know how to understand with(document)0[]
This is valid only in non-strict (sloppy) JavaScript, since it relies on the with statement, and the semantics are extremely unclear.
I imagine the author meant something like this:
document.getElementsByTagName('head')[0] || document.body.appendChild(document.createElement(xxx))
"If there is a <head> tag in the document, return the first one. Otherwise, return the result of appending createElement(xxx) to the body."
It is hard to answer this question without the full code, but I'll make some assumptions here.
The first thing I'd like to say is to avoid using the with statement. It is discouraged as of ECMAScript 5 and forbidden in strict mode. One of the reasons is your fragment: code like this confuses a lot of people.
So let's rewrite it a little bit to make it more understandable:
with(document) {
0[(getElementsByTagName('head')[0] || body).appendChild(createElement(xxx))];
}
You can read how with works in the documentation for the with statement, but basically it lets us use all the properties and methods of the expression passed to with (in our case, document) directly.
So, what does this code fragment look like without with?
0[(document.getElementsByTagName('head')[0] || document.body).appendChild(document.createElement(xxx))];
The one thing I can't answer is why this code is executed inside brackets. My assumption is the following:
The fragment (document.getElementsByTagName('head')[0] || document.body).appendChild(document.createElement(xxx)) returns the newly created element node. But if we place it inside 0[], the whole expression evaluates to undefined, as the number 0 has no such property. Again, it's hard to understand every part of this code fragment without the whole picture.
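The behaviour of 0[...] can be checked without a DOM. In this sketch, makeNode is a hypothetical stand-in for the appendChild call: it has a side effect and returns an object.

```javascript
const calls = [];

// Hypothetical stand-in for (head || body).appendChild(createElement(xxx)).
function makeNode() {
  calls.push('appended'); // the side effect still happens
  return { tag: 'script' };
}

// The bracket expression is evaluated first, then its result is converted to a
// property key ("[object Object]") and looked up on the number 0.
const result = 0[makeNode()];

console.log(calls.length); // 1
console.log(result);       // undefined
```

So the wrapper runs the inner expression for its side effect and throws the returned node away.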
I have a real simple question that I can't seem to find an answer to.
I want to compress two XPath statements (that are getting attribute values). I learned about the | operator, hearing how it returns node sets.
var getdata = xmldoc.evaluate
(
'/foo/bar[@world=\''+hello+'\']/child::*/attribute::name
|/foo/bar[@world=\''hello+'\']/child::*/attribute::id',
xmldoc, null, XPathResult.ANY_TYPE, null
);
To anyone wondering, no I do not format my evaluation strings that way ... though, I sort of like it now that I typed it out. Anyways, this is how I tested it out.
alert(getItemData.iterateNext().childNodes[0].nodeValue);
That works! But it only returns the first one. While writing this, I just tried .length and made a breakthrough: it's only counting one item. Was I deceived about the concept of |? How can I get a set and then go through its items?
XML document, as requested.
<?xml version="1.0" encoding="ISO-8859-1"?>
<foo>
<bar world="hello" id="1">
<subbar name="item1" id="2">
</subbar>
</bar>
<bar world="bye" id="3">
<subbar name="item2" id="4">
</subbar>
</bar>
</foo>
Edit: I am currently using a function that grabs the element rather than the attribute, but I would really like to know the other way. Unless what I am doing is the best way.
If jQuery is an option, it might be worth your while to check out its XML traversal support. A quick search pulled up an article here. I wrote up a very rough example of what the logic may look like after you import the XML document, which is explained in the link.
var hello = "foo";
$('bar[world=' + hello + '] > subbar').each(function () {
// You'd want to save these values somewhere else, obviously.
$(this).attr('name');
$(this).attr('id');
});
The key here is the XPathResult type you use.
I have implemented a working sample for this.
Please refer to the code at http://jsbin.com/eneso3/5/edit
Basically, you have to use an iterator as the result type so that you can iterate through the nodes to get the text. Refer to the XPath reference mentioned on the working code sample page.
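A sketch of that idea, assuming xmldoc is the parsed XML document from the question. The helper names buildUnionXPath and collectValues are made up for this example, and the abbreviated /* and /@name forms are used in place of child::* and attribute::name.

```javascript
// Builds the union expression for a given value of the world attribute.
function buildUnionXPath(hello) {
  const base = "/foo/bar[@world='" + hello + "']/*";
  return base + '/@name | ' + base + '/@id';
}

// Evaluates the expression with an iterator result type and drains every
// matching node into an array, instead of reading only the first one.
function collectValues(xmlDoc, expression, iteratorType) {
  const result = xmlDoc.evaluate(expression, xmlDoc, null, iteratorType, null);
  const values = [];
  for (let node = result.iterateNext(); node; node = result.iterateNext()) {
    values.push(node.nodeValue); // for attribute nodes this is the attribute value
  }
  return values;
}

// Browser usage (assumes xmldoc already exists):
// collectValues(xmldoc, buildUnionXPath('hello'), XPathResult.ORDERED_NODE_ITERATOR_TYPE);
```

The loop stops when iterateNext returns null, which is how the iterator signals that the node set is exhausted.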
Well, your usage of the "pipe" is correct (http://www.tizag.com/xmlTutorial/xpathbar.php), so the only thing I can see that might be off is a missing + in the second XPath expression, but that might be pseudocode, so I would only count this as half an answer. As for best practice, in my opinion I would grab the subbar element and then pull its attributes out where you need them; an optimization like the one you've suggested obfuscates what data is being referenced. It seems too much of a micro-optimization, but this is just an opinion. Maybe you have a long list of attributes and you really are saving parsing time.
I have some text in an element on my page, and I want to scrape the price from that page without any of the text beside it.
I found that the page contains the price like this:
<span class="discount">now $39.99</span>
How can I filter this and get just "$39.99" using only JavaScript and regular expressions?
The question may be too easy, or may have been asked in another way before, but I know nothing about regular expressions, so I'm asking for your help :).
<script language="javascript">
window.onload = function () {
// Get all of the elements with class name "discount"
var elements = document.getElementsByClassName('discount');
// Loop over each <span class="discount">
for (var i=0; i < elements.length; i++) {
// get the text, e.g. "now $39.99"
var rawText = elements[i].innerHTML;
// Here's a regular expression to match one or more digits (\d+)
// followed by a period (\.) and one or more digits again (\d+)
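var match = rawText.match(/\d+\.\d+/);
// match is null when the element contains no price, so skip it
if (!match) continue;
var priceAsString = match[0];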
// You'll want to make the price a floating point number if you
// intend to do any calculations with it.
var price = parseFloat(priceAsString);
// Now what do you want to do with the price? I'll just write it out
// to the console (using FireBug or something similar)
console.log(price);
}
}
</script>
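If you want the string "$39.99" itself rather than a number, a variation on the same regex can include the dollar sign. This is a small sketch; extractPrice is a name made up for the example and it takes the element's text as a plain string.

```javascript
// Returns the first dollar amount found in the text (such as "$39.99"),
// or null when the text contains no price.
function extractPrice(text) {
  // \$ matches a literal dollar sign; the cents part is optional.
  const match = text.match(/\$\d+(?:\.\d+)?/);
  return match ? match[0] : null;
}

console.log(extractPrice('now $39.99'));   // "$39.99"
console.log(extractPrice('out of stock')); // null
```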
document.evaluate("//span[@class='discount']",
document,
null,
XPathResult.ANY_UNORDERED_NODE_TYPE,
null).singleNodeValue.textContent.replace("now $", "");
EDIT: This is standard XPath. I'm not sure what kind of explanation you're seeking. For outdated browsers, you will need a third-party library like Sarissa and/or Java-line.
Regexes are fundamentally bad at parsing HTML (see "Can you provide some examples of why it is hard to parse XML and HTML with a regex?" for why). What you need is an HTML parser. See "Can you provide an example of parsing HTML with your favorite parser?" for examples using a variety of parsers.
Patrick McElhaney's and Matthew Flaschen's answers are both good ways to solve the problem.
As Matthew Flaschen suggested, XPath is a better way to go if you know something about the node structure of the target document (and since you provided an example, you seem to). If you don't know the node structure, regexes are still lousy for parsing XML.
Some more resources to kick-start you:
XPath in Javascript: Introduction
DOM Parsing With XPath and JavaScript
Mozilla dev-center: Introduction to using XPath in JavaScript
I've also found the Firefox extension combo of DOM Inspector and XPather to be an invaluable tool for deriving and testing XPath expressions on a given page. (If you're using another browser, well, I don't know.)