how can i search an html page for a word fast?
and how can i get the html tag that the word is in? (so i can work with the entire tag)
To find the element that word exists in, you'd have to traverse the entire tree looking in just the text nodes, applying the same test as above. Once you find the word in a text node, return the parent of that node.
var word = "foo",
queue = [document.body],
curr
;
while (curr = queue.pop()) {
if (!curr.textContent.match(word)) continue;
for (var i = 0; i < curr.childNodes.length; ++i) {
switch (curr.childNodes[i].nodeType) {
case Node.TEXT_NODE : // 3
if (curr.childNodes[i].textContent.match(word)) {
console.log("Found!");
console.log(curr);
// you might want to end your search here.
}
break;
case Node.ELEMENT_NODE : // 1
queue.push(curr.childNodes[i]);
break;
}
}
}
this works in Firefox, no promises for IE.
What it does is start with the body element and check to see if the word exists inside that element. If it doesn't, then that's it, and the search stops there. If it is in the body element, then it loops through all the immediate children of the body. If it finds a text node, then see if the word is in that text node. If it finds an element, then push that into the queue. Keep on going until you've either found the word or there's no more elements to search.
You can iterate through DOM elements, looking for a substring within them. Neither fast nor elegant, but for small HTML might work well enough.
I'd try something recursive, like: (code not tested)
findText(node, text) {
if(node.childNodes.length==0) {//leaf node
if(node.textContent.indexOf(text)== -1) return [];
return [node];
}
var matchingNodes = new Array();
for(child in node.childNodes) {
matchingNodes.concat(findText(child, text));
}
return matchingNodes;
}
You can try using XPath, it's fast and accurate
http://www.w3schools.com/Xpath/xpath_examples.asp
Also if XPath is a bit more complicated, then you can try any javascript library like jQuery that hides the boilerplate code and makes it easier to express about what you want found.
Also, as from IE8 and the next Firefox 3.5 , there is also Selectors API implemented. All you need to do is use CSS to express what to search for.
You can probably read the body of the document tree and perform simple string tests on it fast enough without having to go far beyond that - it depends a bit on the HTML you are working with, though - how much control do you have over the pages? If you are working within a site you control, you can probably focus your search on the parts of the page likely to be different page from page, if you are working with other people's pages you've got a tougher job on your hands simply because you don't necessarily know what content you need to test against.
Again, if you are going to search the same page multiple times and your data set is large it may be worth creating some kind of index in memory, whereas if you are only going to search for a few words or use smaller documents its probably not worth the time and complexity to build that.
Probably the best thing to do is to get some sample documents that you feel will be representative and just do a whole lot of prototyping based around the approaches people have offered here.
form.addEventListener("submit", (e) => {
e.preventDefault();
var keyword = document.getElementById("search_input");
let words = keyword.value;
var word = words,
queue = [document.body],
curr;
while (curr = queue.pop()) {
if (!curr.textContent.toUpperCase().match(word.toUpperCase())) continue;
for (var i = 0; i < curr.childNodes.length; ++i) {
switch (curr.childNodes[i].nodeType) {
case Node.TEXT_NODE: // 3
if (curr.childNodes[i].textContent.toUpperCase().match(word.toUpperCase())) {
console.log("Found!");
console.log(curr);
curr.scrollIntoView();
}
break;
case Node.ELEMENT_NODE: // 1
queue.push(curr.childNodes[i]);
break;
}
}
}
});
Related
I have a nearly 400 page document that I need to [randomly] reorder the pages. (If you need to know, this is a book of single page stories that need to be randomly distributed. I created a random list of pages to input into the script.)
I've been working with a modified script I found elsewhere on the internet that creates an array and moves the pages around:
var order="...list of new page numbers...";
// Create an array out of the list:
ranges = toSeparate (order);
if (ranges.length != app.activeDocument.pages.length)
{
alert ("Page number mismatch -- "+ranges.length+" given, "+app.activeDocument.pages.length+" in document");
exit(0);
}
// Consistency check:
sorted = ranges.slice().sort(numericSort);
for (a=0; a<sorted.length-1; a++)
{
if (sorted[a] < sorted[a+1]-1 ||
sorted[a] == sorted[a+1])
alert ("Mismatch from "+sorted[a]+" to "+sorted[a+1]);
}
// alert ("New order for "+order+"\nis "+ranges.join(", "));
// Convert from 1..x to 0..x-1:
for (moveThis=0; moveThis<ranges.length; moveThis++)
ranges[moveThis]--;
for (moveThis=0; moveThis<ranges.length; moveThis++)
{
if (moveThis != ranges[moveThis])
{
try{
app.activeDocument.pages[ranges[moveThis]].move (LocationOptions.BEFORE, app.activeDocument.pages[moveThis]);
} catch(_) { alert ("problem with page "+moveThis+"/index "+ranges[moveThis]); }
}
for (updateRest=moveThis+1; updateRest<ranges.length; updateRest++)
if (ranges[updateRest] < ranges[moveThis])
ranges[updateRest]++;
}
function toSeparate (list)
{
s = list.split(",");
for (l=0; l<s.length; l++)
{
try {
if (s[l].indexOf("-") > -1)
{
indexes = s[l].split("-");
from = Number(indexes[0]);
to = Number(indexes[indexes.length-1]);
if (from >= to)
{
alert ("Cannot create a range from "+from+" to "+to+"!");
exit(0);
}
s[l] = from;
while (from < to)
s.splice (++l,0,++from);
}} catch(_){}
}
// s.sort (numericSort);
return s;
}
function numericSort(a,b)
{
return Number(a) - Number(b);
}
This code worked, except that it was consistently rearranging them into the wrong random order, which, at the end of the day, is workable, but it'll just be a bigger pain in the ass to index the stories.
I suspected the problem might be caused by starting at the begginning of the document rather than the end, so I modified the script to start at the end, but then app.activeDocument.pages[ranges[moveThis]] kept coming up as undefined.
So I gave up and tried this:
app.activeDocument.pages[298].move (LocationOptions.BEFORE, app.activeDocument.pages[366]);
app.activeDocument.pages[33].move (LocationOptions.BEFORE, app.activeDocument.pages[365]);
app.activeDocument.pages[292].move (LocationOptions.BEFORE, app.activeDocument.pages[364]);
And so on for every page. (This reminds me of my time in junior high using sendKeys to create programs in Visual Basic. Had I bothered to seriously learn JavaScript instead of creating shitty AOL chatroom scrollers, I probably wouldn't be on here today.)
Nevertheless, I received the following error:
Error Number: 30477
Error String: Invalid value for parameter 'reference' of method 'move'. Expected Page or Spread, but received nothing.
I'm trying to avoid having to manually move the pages, especially considering the amount of time I've already been working on this. Any suggestions on what I need to change? Thank you!
The issue might be that you are using more than one page per spread and then trying to shuffle them across spread. The better way is to use single page per spread.
Here is a small snippet that works on my machine
var doc = app.activeDocument;
doc.documentPreferences.facingPages = false;
for (var i =0; i < 100; i++){
var index = parseInt((Math.random() * doc.spreads.length) % doc.spreads.length + '' , 10);
doc.spreads[index].move();
}
What this does is
Disables the facing pages options and makes one page per spread. A desirable condition as you mentioned that your stories are one page each(I am assuming that your stories will not violate this assumption).
Takes a random spread from the doc and sends it to the end of the spreads in the doc. It does so 100 times.
The result is what you wanted. A script to shuffle the current SPREADS randomly.
I'm new to lexing and parsing so sorry if the title isn't clear enough.
Basically, I'm using Jison to parse some text and I am trying to get the lexer to comprehend indentation. Here's the bit in question:
(\r\n|\r|\n)+\s* %{
parser.indentCount = parser.indentCount || [0];
var indentation = yytext.replace(/^(\r\n|\r|\n)+/, '').length;
if (indentation > parser.indentCount[0]) {
parser.indentCount.unshift(indentation);
return 'INDENT';
}
var tokens = [];
while (indentation < parser.indentCount[0]) {
tokens.push('DEDENT');
parser.indentCount.shift();
}
if (tokens.length) {
return tokens;
}
if (!indentation.length) {
return 'NEWLINE';
}
%}
So far, almost all of that works as expected. The one problem is the line where I attempt to return an array of DEDENT tokens. It appears that Jison is just converting that array into a string which causes me to get a parse error like Expecting ........, got DEDENT,DEDENT.
What I'm hoping I can do to get around this is manually push some DEDENT tokens onto the stack. Maybe with a function like this.pushToken('DEDENT') or something along those lines. But the Jison documentation is not so great and I could use some help.
Any thoughts?
EDIT:
I seem to have been able to hack my way around this after looking at the generated parser code. Here's what seems to work...
if (tokens.length) {
var args = arguments;
tokens.slice(1).forEach(function () {
lexer.performAction.apply(this, args);
}.bind(this));
return 'DEDENT';
}
This tricks the lexer into performing another action using the exact same input for each DEDENT we have in the stack, thus allowing it to add in the proper dedents. However, it feels gross and I'm worried there could be unforeseen problems.
I would still love it if anyone had any ideas on a better way to do this.
After a couple of days I ended up figuring out a better answer. Here's what it looks like:
(\r\n|\r|\n)+[ \t]* %{
parser.indentCount = parser.indentCount || [0];
parser.forceDedent = parser.forceDedent || 0;
if (parser.forceDedent) {
parser.forceDedent -= 1;
this.unput(yytext);
return 'DEDENT';
}
var indentation = yytext.replace(/^(\r\n|\r|\n)+/, '').length;
if (indentation > parser.indentCount[0]) {
parser.indentCount.unshift(indentation);
return 'INDENT';
}
var dedents = [];
while (indentation < parser.indentCount[0]) {
dedents.push('DEDENT');
parser.indentCount.shift();
}
if (dedents.length) {
parser.forceDedent = dedents.length - 1;
this.unput(yytext);
return 'DEDENT';
}
return `NEWLINE`;
%}
Firstly, I modified my capture regex to make sure I wasn't inadvertently capturing extra newlines after a series of non-newline spaces.
Next, we make sure there are 2 "global" variables. indentCount will track our current indentation length. forceDedent will force us to return a DEDENT if it has a value above 0.
Next, we have a condition to test for a truthy value on forceDedent. If we have one, we'll decrement it by 1 and use the unput function to make sure we iterate on this same pattern at least one more time, but for this iteration, we'll return a DEDENT.
If we haven't returned, we get the length of our current indentation.
If the current indentation is greater than our most recent indentation, we'll track that on our indentCount variable and return an INDENT.
If we haven't returned, it's time to prepare to possible dedents. We'll make an array to track them.
When we detect a dedent, the user could be attempting to close 1 or more blocks all at once. So we need to include a DEDENT for as many blocks as the user is closing. We set up a loop and say that for as long as the current indentation is less than our most recent indentation, we'll add a DEDENT to our list and shift an item off of our indentCount.
If we tracked any dedents, we need to make sure all of them get returned by the lexer. Because the lexer can only return 1 token at a time, we'll return 1 here, but we'll also set our forceDedent variable to make sure we return the rest of them as well. To make sure we iterate on this pattern again and those dedents can be inserted, we'll use the unput function.
In any other case, we'll just return a NEWLINE.
I'm using javascript loop (using setInterval) that runs through a list of search results, highlighting the search term by adding a css styled <span> around search hits as it goes. I'm using setInterval like this to release control of the browser while it does this.
In Chrome and Firefox this works well - even with a setInterval parameter of 10-20ms; and the user has full control of the browser (i.e. scrolling, clicking links etc.) while the results are rapidly highlighted:
mylooper = setInterval(function() {
// my functionality is here
},15); // 15ms
Unfortunately, when using the dreaded IE8, the browser locks up and takes a really long time to add the <span>'s and style the search results. It also takes a long time just to load the page in the first place - shortened a great deal when this script is removed.
So far I've tried:
changing the interval values (I've read that IE8 doesn't detect intervals of sub 15ms);
using setTimeout instead of setInterval;
removing the interval to check that this is in fact what is causing the slow-down (it is!); and
swearing about Internet Explorer a lot;
var highlightLoop;
var index = 0;
highlightLoop = setInterval(function () {
var regex = RegExp(regexPhrase, "gi"); // regexPhase created elsewhere
var searchResults = resultElements.eq(index).get(0); // run through resultElements which contain alll the nodes with search results in them.
findAndReplaceDOMText( // a function that does the searching and inserting of styling
regex,
searchResults,
function (fill, matchIndex) {
called = true;
var span = document.createElement("span");
span.className = "result-highlight";
span.innerHTML = fill;
return span;
}
);
if (index == resultElements.length || searchTermUpdated == true) { // stop interval loop when search term changes or we reach the end of results - variable set elsewhere.
searchTermUpdated = false;
clearInterval(highlightLoop); // stop the loop
}
index++
}
}, 50); // 50ms does not improve performance.
Any advice on workarounds for this kind of javascripting in IE would be massively appreciated. Thanks all.
I believe you may be able to improve the performance by tweaking findAndReplaceDOMText, and maybe its callback too. I suppose findAndReplaceDOMText appends the element returned by the callback to the DOM, from within a loop of all matches. If it's doing that inside a loop, try to move it outside the loop, and apply the all changes to the DOM at once. That should result in better performance, as repainting the page after each DOM update is expensive.
Try this recursive approach instead:
get a list of all elements to be acted upon into array X (one time cost)
while the array X has length, keep repeating the next actions
shift the first element off the array
process the single element
start this process again with the new array X (now Xn - 1 length) on a setTimeout
The code looks like this in general
function processArray(array) {
var element = array.shift();
processElement(element);
if (array)
setTimeout(function(){processArray(array);},15ms);
}
There might be something else to be done with this recursion, but it works fairly well in all browsers and never blocks, because you're only initiating the repeat when the last one has had time to finish.
r = r.replace(/<TR><TD><\/TD><\/TR>/gi, rider_html);
...does not work in IE but works in all other browsers.
Any ideas or alternatives?
I've come to the conclusion that the variable r must not have the value in it you expect because the regex replacement should work fine if there is actually a match. You can see in this jsFiddle that the replace works fine if "r" actually has a match in it.
This is the code from fiddle and it shows the proper replacement in IE.
var r = "aa<TR><TD></TD></TR>bb";
var rider_html = " foo ";
r = r.replace(/<TR><TD><\/TD><\/TR>/gi, rider_html);
alert(r);
So, we can't really go further to diagnose without knowing what the value of "r" is and where it came from or knowing something more specific about the version of IE that you're running in (in which case you can just try the fiddle in that version yourself).
If r came from the HTML of the document, then string matching on it is a bad thing because IE does not keep the original HTML around. Instead it reconstitutes it when needed from the parsed page and it puts some things in different order (like attributes), different or no quotes around attributes, different capitalization, different spacing, etc...
You could do something like this:
var rows = document.getElementsByTagName('tr');
for (var i = 0; i < rows.length; i++) {
var children = rows[i].children;
if (children.length === 1 && children[0].nodeName.toLowerCase() === 'td') {
children[0].innerHTML = someHTMLdata
}
}
Note that this sets the value of the table cell, rather than replacing the whole row. If you want to do something other than this, you'll have to use DOM methods rather than innerHTML and specify exactly what you actually want.
i'm implementing a charcounter in the UI, so a user can see how many characters are left for input.
To count, i use this simple function:
function typerCount(source, layerID)
{
outPanel = GetElementByID(layerID);
outPanel.innerHTML = source.value.length.toString();
}
source contains the field which values we want to meassure
layerID contains the element ID of the object we want to put the result in (a span or div)
outPanel is just a temporary var
If i activate this function, while typing the machine really slows down and i can see that FF is using one core at 100%. you can't write fluently because it hangs after each block of few letters.
The problem, it seems, may be the value.length() function call in the second line?
Regards
I can't tell you why it's that slow, there's just not enough code in your example to determine that. If you want to count characters in a textarea and limit input to n characters, check this jsfiddle. It's fast enough to type without obstruction.
It could be having problems with outPanel. Every time you call that function, it will look up that DOM node. If you are targeting the same DOM node, that's very expensive for the browser if it's doing that every single time you type a character.
Also, this is too verbose:
source.value.length.toString();
This is sufficient:
source.value.length;
JavaScript is dynamic. It doesn't need the conversion to a string.
I doubt your problem is with the use of innerHTML or getElementById().
I would try to isolate the problem by removing parts of the function and seeing how the cpu is used. For instance, try it all these ways:
var len;
function typerCount(source, layerID)
{
len = source.value.length;
}
function typerCount(source, layerID)
{
len = source.value.length.toString();
}
function typerCount(source, layerID)
{
outPanel = GetElementByID(layerID);
outPanel.innerHTML = "test";
}
As artyom.stv mentioned in the comments, cache the result of your GetElementByID call. Also, as a side note, what is GetElementByID doing? Is it doing anything else other than calling document.getElementById?
How would you cache this you say?
var outPanelsById = {};
function getOutPanelById(id) {
var panel = outPanelsById[id];
if (!panel) {
panel = document.getElementById(id);
outPanelsById[id] = panel;
}
return panel;
};
function typerCount(source, layerId) {
var panel = getOutPanelById(layerId);
panel.innerHTML = source.value.length.toString();
};
I'm thinking there has to be something else going on though, as even getElementById calls are extremely fast in FF.
Also, what is "source"? Is it a DOMElement? Or is it something else?