How to implement fulltext search for an offline website - JavaScript

I need a client-side fulltext search for a big offline website; the site is opened directly in a browser. I've done some research and found some solutions - fullproof, fuse.js, flexsearch.js, elasticlunr.js. I searched for JS libraries because, as I understand it, that is the only option (please correct me if I'm wrong).
There are also a couple of things I can't clearly understand:
As far as I know, browsers block the execution of scripts from JavaScript files due to security policy. I couldn't run the examples from the fullproof Git repository because of this problem, but I managed to run the flexsearch example, because its script was included in the HTML code with a <script> tag. Can I implement some search system for my local website, given that I don't use any local server for hosting (like XAMPP)?
From the documentation of the different JS libraries I understand that they all build their index from either a variable holding a list of keywords or a JSON file. Maybe I lack some information, but how can I use a search system to find words/expressions across the whole website (it has a main page and a lot of linked pages with information)? Do I have to create some sort of database or some JSON file?
I'll be very grateful for your answers, explanations, solutions or examples regarding this problem. Thank you!

Try using Tipue Search; it has a very simple mechanism for implementing offline search on your website. Visit their site, download all the required repositories and follow the instructions to add the search functionality to your website.
I have implemented it successfully in a sample movie website I was designing, and the search results can be modified to include images.
https://directory.fsf.org/wiki/Tipue-Search
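For illustration, Tipue Search's static mode reads its index from a JavaScript variable; the sketch below follows that convention, but the file name, field names and jQuery initialisation should be checked against the version you download:
// tipuesearch_content.js - the static index Tipue Search reads (structure assumed)
var tipuesearch = {"pages": [
    {"title": "Home",  "text": "Welcome to the movie site ...", "tags": "movies", "url": "index.html"},
    {"title": "About", "text": "Information about the site ...", "tags": "info",   "url": "about.html"}
]};
// In the page hosting the search box (requires jQuery and the Tipue Search files):
$(document).ready(function() {
    $('#tipue_search_input').tipuesearch();
});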

First of all, browsers don't block JS script execution. Secondly, I managed to find two ways to solve my problem: keyword search and fulltext search. 1) I created a database of keywords (a JSON file) and used the flexsearch library to search it; an example of usage, as well as an example JSON file, can be found on their website. 2) This time I created a database (a JSON file) in which each record is the text content of one page of the website. Then, again, I used flexsearch to find a word in this database. Once the appropriate page was found, it was opened and the searched word was highlighted (you can find JS libraries for that on the net). Neither solution requires any internet connection, so both can be used for offline websites.
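A rough sketch of the second approach, assuming the page texts are kept in a simple JSON-like structure (the pages array, file layout and FlexSearch options here are illustrative, and the exact FlexSearch API varies a little between versions):
// pages.js - hypothetical "database": one record per page of the site
var pages = [
    { id: 0, url: "index.html", content: "Text content of the main page ..." },
    { id: 1, url: "about.html", content: "Text content of the about page ..." }
];
// Build the index in the browser (FlexSearch loaded via a <script> tag).
var index = new FlexSearch.Index({ tokenize: "forward" });
pages.forEach(function (page) {
    index.add(page.id, page.content);
});
// Search, then open the first matching page; a separate highlighting
// library can then mark the searched word on that page.
var results = index.search("keyword");   // returns an array of matching ids
if (results.length > 0) {
    window.location.href = pages[results[0]].url;
}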

Related

Running a python script in a Gsheet button

I want to use Gsheets to track accounts and group membership in a proprietary 3rd-party application. Right now I have a python script running on a server that checks the sheet for accounts and membership parameters and it works great. It will read the sheet and check the 3rd-party app and make sure they always match with the sheet being the source of truth.
However, I'm not really a Gsheet master. Recently I found that I can create buttons in Gsheets that will run JS code. I'm also not a JS master, however.
I actually have two questions. For the first one, I think the answer is yes: can I hit outside resources like REST endpoints from these buttons?
If I am right and that can work, the second question is: can I have the JS script behind a button run a Python script that I somehow keep inside the Gsheet?
The reason is that this would keep the entire process contained in the Gsheet, making distribution and adoption much easier, as it eliminates the need for ancillary resources to run the Python code.
Any help would be much appreciated!
Thanks!
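For what it's worth, buttons in Google Sheets are normally wired to Google Apps Script functions (Apps Script is JavaScript), and Apps Script can call external REST endpoints through its UrlFetchApp service; running a Python script stored inside the sheet is not something it does directly. A minimal sketch, with a placeholder endpoint URL and sheet name:
// Google Apps Script (in the sheet's script editor); assign this function to a
// drawing/button via "Assign script". The URL and sheet name are placeholders.
function syncAccounts() {
  var sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('Accounts');
  var rows = sheet.getDataRange().getValues();
  // POST the sheet data to an external REST endpoint.
  var response = UrlFetchApp.fetch('https://example.com/api/sync', {
    method: 'post',
    contentType: 'application/json',
    payload: JSON.stringify(rows)
  });
  Logger.log(response.getResponseCode());
}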

How to use a basic Google Sheets javascript script in GitHub Pages?

I'm trying to implement a piece of code to display data from a Google Spreadsheet on my GitHub Pages page. I found Sheetrock.js and the JSFiddle worked, but I'm missing how to set up a basic JavaScript file structure. I know I can name files ___.js and call them from the HTML file, but the Sheetrock.js info doesn't use this format. I remember trying this a while ago and giving up after hitting the same roadblocks, so I'm posting to hopefully save others who come from a similar search (using GSheets in a static site through JavaScript) hours of searching on such a simple problem.
I searched for hours (through Jekyll tutorials and other JS package installations). I know this seems simple now, but there was only one basic JavaScript setup I finally found that answered it, after searching on 'javascript', 'github pages', and even jQuery with GH.
So the basic answer seems to be to simply create an index.js file next to index.html; this is of course where all the generic JavaScript goes. The <script> tags and external CDN JS imports go in the HTML file.
You can use other JavaScript means, such as JSON-based access, to connect to GSheet data, but Sheetrock.js seems to be doing okay.
There are also ways to adjust the iframe to select columns and use an SQL-like syntax via the Google Visualization API Query Language described on this page. The SQL-type commands carry over to Sheetrock.js usage.
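A minimal sketch of that layout, with a placeholder spreadsheet URL and a Sheetrock call whose option names follow its documented usage but should be verified against the version you include:
<!-- index.html: the <script> tags and CDN imports live here -->
<table id="sheet-data"></table>
<script src="jquery.min.js"></script>
<script src="jquery.sheetrock.min.js"></script>
<script src="index.js"></script>
// index.js: the generic JavaScript lives here
// (spreadsheet URL and query are placeholders)
$('#sheet-data').sheetrock({
  url: 'https://docs.google.com/spreadsheets/d/YOUR_SHEET_ID/edit#gid=0',
  query: 'select A, B, C'
});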

Angularjs vs SEO vs pushState

After reading this thread I decided to use the pushState API in my AngularJS application, which is fully API-based (independent frontend and independent backend).
Here is my test site: http://huyaks.com/index.html
I created a sitemap and uploaded it to Google Webmaster Tools.
From what I can see:
Google indexed the main page and the dynamic navigation (cool!), but did not index any of the dynamic URLs.
Please take a look.
I examined the example site given in the related thread:
http://html5.gingerhost.com/london
As far as I can see, when I directly access a particular page, the content which is presumed to be dynamic is returned by the server and is therefore indexed. But that's impossible in my case, since my application is fully dynamic.
Could you please advise what the problem is in my particular case and how to fix it?
Thanks in advance.
Note: this question is about the pushState approach. Please do not advise me to use the escaped-fragment scheme or third-party services like prerender.io. I'd like to figure out how to use this approach.
Evidently Quentin didn't read the post you're referring to. The whole point of http://html5.gingerhost.com/london is that it uses pushState and proves that it doesn't require static html for the benefit of spiders.
"This site uses HTML5 wizrdry [sic] to load the 'actual content' asynchronusly [sic] to the rest of the code: this makes it faster for users, but it's still totally indexable by search engines."
Dodgy orthography aside, this demo shows that asynchronously-loaded content is indexable.
As far as I can see, when I directly access a particular page the content which is presumed to be dynamic is returned by the server
It isn't. You are loading a blank page with some JavaScript in it, and that JavaScript immediately loads the content that should appear for that URL.
You need to have the server produce the HTML you get after running the JavaScript and not depend on the JS.
Google does interpret Angular pages, as you can see on this quick demo page, where the title and meta description show up correctly in the search result.
It is very likely that if they interpret JS at all, they interpret it enough for thorough link analysis.
The fact that some pages are not indexed is due to the fact that Google does not index every page they analyze, even if you add it to a sitemap or submit it for indexing in webmaster tools. On the demo page, both the regular and the scope-bound link are currently not being indexed.
Update: so to answer the question specifically, there is no issue with pushState on the test site. Those pages simply do not contain value-adding content for Google. (See their general guidelines).
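For reference, "using pushState" in an AngularJS 1.x app of this kind usually just means enabling HTML5 mode in the router configuration, roughly as below (module, route and controller names are made up for illustration):
// Hypothetical AngularJS 1.x config: pushState-style URLs via HTML5 mode.
// Requires <base href="/"> in index.html and server rewrites to index.html.
angular.module('app', ['ngRoute'])
  .config(function ($locationProvider, $routeProvider) {
    $locationProvider.html5Mode(true);   // use history.pushState instead of #! URLs
    $routeProvider.when('/london', {
      templateUrl: 'views/london.html',
      controller: 'LondonCtrl'
    });
  });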
Sray, I recently opened up the same question in another thread and was advised that Googlebot and Bingbot do index SPAs that use pushState. I haven't seen an example that ensures my confidence, but it's what I'm told. To then cover your bases as far as Facebook is concerned, use open graph meta tags.
I'm still not confident about pushing forward without sending HTML snippets to bots, but like you I've found no tutorial telling how to do this while using pushState or even suggesting it. But here's how I imagine it would work using Symfony2...
Use prerender or another service to generate static snippets of all your pages. Store them somewhere accessible by your router.
In your Symfony2 routing file, create a route that matches your SPA. I have a test SPA running at localhost.com/ng-test/, so my route would look like this:
# Adding a trailing / to this route breaks it. Not sure why.
# This is YAML; indentation shown here with spaces.
NgTestReroute:
    path: /ng-test/{one}/{two}/{three}/{four}
    defaults:
        _controller: DriverSideSiteBundle:NgTest:ngTestReroute
        'one': null
        'two': null
        'three': null
        'four': null
    methods: [GET]
In your Symfony2 controller, check user-agent to see if it's googlebot or bingbot. You should be able to do this with the code below, and then use this list to target the bots you're interested in (http://www.searchenginedictionary.com/spider-names.shtml)...
if(strstr(strtolower($_SERVER['HTTP_USER_AGENT']), "googlebot"))
{
// what to do
}
If your controller finds a match to a bot, send it the HTML snippet. Otherwise, as in the case with my AngularJS app, just send the user to the index page and Angular will correctly do the rest.
Also, has your question been answered? If it has, please select one so I and others can tell what worked for you.
HTML snippets for AngularJS app that uses pushState?

Web crawler: Using Perl's MozRepl module to deal with Javascript

I am trying to save a couple of web pages using a web crawler. Usually I prefer doing this with Perl's WWW::Mechanize module. However, as far as I can tell, the site I am trying to crawl has a lot of JavaScript on it, which seems hard to avoid. Therefore I looked into the following Perl modules:
WWW::Mechanize::Firefox
MozRepl
MozRepl::RemoteObject
The Firefox MozRepl extension itself works perfectly. I can use the terminal to navigate the web site just the way it is shown in the developer's tutorial - in theory. However, I have no idea about JavaScript and am therefore having a hard time using the modules properly.
So here is the source I'd like to start from: Morgan Stanley
For a couple of the firms listed beneath 'Companies - as of 10/14/2011' I'd like to save their respective pages. E.g. when clicking on the first listed company (i.e. '1-800-Flowers.com, Inc'), a JavaScript function gets called with two arguments -> dtxt('FLWS.O','2011-10-14'), which produces the desired new page. That page I would now like to save locally.
With Perl's MozRepl module I thought of something like this:
use strict;
use warnings;
use MozRepl;

# Connect to the MozRepl extension running inside Firefox.
my $repl = MozRepl->new;
$repl->setup;

# Open the coverage page in a new window and switch the repl to its content.
$repl->execute('window.open("http://www.morganstanley.com/eqr/disclosures/webapp/coverage")');
$repl->repl_enter({ source => "content" });

# Call the page's own JS function that loads the company page.
$repl->execute('dtxt("FLWS.O", "2011-10-14")');
Now I would like to save the produced HTML page.
So again, the desired code should, for a couple of firms, visit their HTML pages and simply save them. (Here are e.g. three firms: MMM.N, FLWS.O, SSRX.O)
Is it correct that I cannot get around the page's JavaScript functions and therefore cannot use WWW::Mechanize?
Following on from question 1, are the mentioned Perl modules a plausible approach to take?
And finally, if you say the first two questions can be answered with yes, it would be really nice if you could help me out with the actual coding. E.g. in the above code, the essential part which is missing is a 'save' command. (Maybe using Firefox's saveDocument function?)
The web works via HTTP requests and responses. If you can discover the proper request to send, then you will get the proper response. If the target site uses JS to form the request, then you can either execute the JS, or analyse what it does so that you can do the same in the language that you are using.
An even easier approach is to use a tool that will capture the resulting request for you, whether the request is created by JS or not; then you can craft your scraping code to create the request that you want. The "Web Scraping Proxy" from AT&T is such a tool. You set it up, then navigate the website as normal to get to the page you want to scrape, and the WSP will log all requests and responses for you. It logs them in the form of Perl code, which you can then modify to suit your needs.

newbie question about javascript embed code?

I am a javascript newbie. I am trying to write a requirements document, and need some help describing what I am looking for. We want our application to generate a javascript snippet like this:
<script src="http://www.jotform.com/jsform/10511502633"></script>
This will load a web form.
So my question is:
- How does a single script load an entire web form? Is this a JSON?
- What is this called? Is this a cross browser javascript?
- Can anyone point me in the direction of learning more about what this is?
Thank you for your help!
The javascript file is just hosted on an external site. It appears to be dynamically generated, so feel free to use some fancy words ;) But basically, you just include it here, as if it was on your own site.
You could say "The application will generate the required script tags to include a dynamically generated JavaScript file from an external, third-party site".
Of course you need to take special precautions for cases when the include won't work because the other site is not reachable (the site is down, DNS does not work, the file has moved to another webserver, your application is on an intranet/behind a proxy/firewall...). Why can't you copy their file and mirror it locally? Or use a reliable Content Delivery Network, like Google or Amazon?
There are many names for this type of inclusion, the most common being widget.
What does it actually do:
- take an id of some sort as a parameter
- use the id to fetch some specific data (most likely from a database)
- generate some JS and HTML based on the id/data
- usually this involves iframes of some sort (see the rough sketch below)
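As a rough illustration only (the real jotform.com script is not shown here; the URLs and element handling below are invented), such an embed script typically carries its id and injects an iframe where the script tag sits:
// Hypothetical embed script generated per form id by the provider's server.
(function () {
  var formId = "10511502633";                        // id baked in by the server
  var iframe = document.createElement("iframe");
  iframe.src = "https://example.com/form/" + formId; // page that renders the actual form
  iframe.width = "100%";
  iframe.height = "500";
  iframe.frameBorder = "0";
  // Insert the iframe right where the <script> tag was included.
  var scripts = document.getElementsByTagName("script");
  var current = scripts[scripts.length - 1];
  current.parentNode.insertBefore(iframe, current);
})();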
Using a script rather than an HTML iframe has multiple advantages:
- you can change what is actually delivered to the users' browsers without changing the include
- you can resize the iframe to fit certain predefined sizes
- you can inject the necessary things into the page the widget is included in (of course you need to make sure this is sanctioned)
We use this all the time and we have never regretted it.
If you don't want to build the widget infrastructure yourself you can always use one of the widget providers like widgetbox:
http://www.widgetbox.com/widgets/make/
With those you are up and running in no time.
This is typically called a script include.
Google has lots of these types of items, and even they call them by many names: widgets, custom JavaScript, snippets, custom code, etc. It really depends on who you are writing for... I would go with "cross-platform embeddable JavaScript code", meaning that it would need to load all its dependencies. Also specify which browsers need to be supported and what should happen if the user has JavaScript turned off.
EDIT:
Actually, since we are talking unique IDs, you will probably need two parts: the user/site-unique "cross-platform embeddable JavaScript code" and whatever server-side code supports it. Basically this is an API that is accessed using your own JavaScript widget. Feel free to point to examples in your requirements document; programmers love examples.
