Extract document extensions from clicks - javascript

I'm using this technique to track click events on my SharePoint site. It uses jQuery and a regular expression to capture clicks and report them as events to Google Analytics.
I'm also only just past total newbie with regex. It is starting to make some sense to me, but I still have a lot to learn. So here goes.
I have a preapproved list of file types that I am interested in, based on the site listed above.
var filetypes = /\.(zip|pdf|doc.*|xls.*|ppt.*|mp3|txt|wma|mov|avi|wmv|flv|wav|jpg)$/i;
But it isn't quite working the way I need. With the $, I assume it only matches at the end of the string. However, in SharePoint we often get links like this:
example.org/sharepoint/_layouts/15/wopiframe.aspx?sourcedoc=/sharepointlibrary/the%20document%20name.docx&action=default&defaultitemopen=1
I have a few problems: I can't count on the file name coming before the query string or hash, I can't count on it being at the end, and there are all the different Microsoft Office extensions.
I found this thread on extracting extensions, but it doesn't seem to work correctly.
I've put together this version:
var filetypes = /\.(zip|pdf|doc|xls|ppt|mp3|txt|wma|mov|avi|wmv|flv|wav|jpg)[A-Za-z]*/i;
I changed the Office entries from doc.* to plain doc followed by optional letters ([A-Za-z]*), and removed the $ end anchor. It seems to work with my test sample, but I don't know whether there are gotchas I'm not aware of.
Does this seem like a good solution, or is there a better way to match a predetermined list of extensions (including, for example, Office variants like doc, docx and docm) that is either before the query string or inside one parameter of the query string?

I would go with the following, which matches the file name and extension:
/[^/]+\.(zip|pdf|doc[xm]?|xlsx?|ppt|mp3|txt|wma|mov|avi|wmv|flv|wav|jpg)/i
It outputs the%20document%20name.docx from your example.
There may be other URL formats it doesn't handle, but it should get you what you want.
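A minimal sketch building on the answer's idea, aimed at the "gotchas" part of the question. Assumptions: the extension list below, with the Office variants spelled out, is illustrative, and a file name is taken to end at the end of the URL or at a ?, # or & character. That lookahead takes the place of the $ anchor, so the name can sit either before the query string or inside a query parameter. documentFromUrl and safeDecode are hypothetical names.

```javascript
// Illustrative extension list: the answer's list with the Office variants
// (doc/docx/docm, xls/xlsx/xlsm, ppt/pptx/pptm) spelled out.
// The name may not contain / ? # & or =, and the extension must be followed
// by the end of the URL or by ?, # or & (this replaces the $ anchor).
const fileTypes =
  /[^\/?#&=]+\.(zip|pdf|docx?|docm|xlsx?|xlsm|pptx?|pptm|mp3|txt|wma|mov|avi|wmv|flv|wav|jpg)(?=$|[?#&])/i;

// Decode %20 and friends, falling back to the raw text on malformed escapes.
function safeDecode(text) {
  try {
    return decodeURIComponent(text);
  } catch {
    return text;
  }
}

// Return { name, extension } for the first matching document in the URL,
// or null if none of the listed extensions appears.
function documentFromUrl(url) {
  const match = url.match(fileTypes);
  if (!match) return null;
  return { name: safeDecode(match[0]), extension: match[1].toLowerCase() };
}
```

One gotcha with the [A-Za-z]* version in the question: because nothing has to follow the extension, it also accepts text such as .documents in example.org/team.documents/index.html. The lookahead here rejects that, and listing the Office variants explicitly keeps other unexpected suffixes out.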

Related

How to use the simpler regex-like check that Chrome extensions use for URLs?

I'm trying to limit the websites that my Chrome extension runs on (based on user settings, so this can't be set in manifest.json), but I'd like the patterns to be easy for users to enter themselves. The kind of pattern I'm looking for is the same kind Chrome uses in an extension's manifest file for the URLs the extension will run on.
For example (see https://developer.chrome.com/extensions/match_patterns):
stackoverflow.com/*
If a user enters that on the options page, how can I get JavaScript to use this kind of pattern check instead of a full regular expression (with .?*[]| and so on)?
If I can use that kind of check, I will take tab.url and the URL patterns from the user options and test whether the extension needs to run on the current page.
You will have to analyze the text yourself, or simply convert the string to an RE2 regular expression (see below).
Assuming var url = new URL(url_text) (in actual code this won't work, because the scheme may be *://, so you'll have to split the parts with a regexp), here are a few examples of the logic:
if url.pathname contains a single * at the end, use the urlPrefix and hostEquals properties of the event filter
if url.pathname is "/" and url.hostname has a * at the end, use hostPrefix
..............
This is a simple example that doesn't even check for multiple * characters in the input text. In working code, don't forget to also parse and provide the schemes parameter as well as ports, and of course the various placements of * will produce different sets of matching rules.
If none of the simple string event filters is applicable, or you're not concerned about performance (I'm not sure the penalty is significant or even measurable), then escape the string for Chrome's RE2 matching and use urlMatches or originAndPathMatches:
url_text.replace(/[{}()\[\]\\.+?^$|]/g, "\\$&").replace(/\*/g, ".*?")
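A minimal sketch of the "analyze the text yourself" route as a plain JavaScript check (not Chrome's event filters), assuming users type patterns like stackoverflow.com/* or *://*.example.org/docs/*. It splits the pattern into scheme, host and path, escapes everything except *, and anchors the result. Assumptions: a missing scheme or a * scheme means http or https (approximating Chrome's meaning for *), a missing path means any path, and a * in the host may not cross a /. It does not reproduce every rule of Chrome's match patterns; escapeGlob, patternToRegExp and shouldRunOn are hypothetical names.

```javascript
// Escape every regex metacharacter except "*", then replace "*" with `wildcard`.
function escapeGlob(text, wildcard) {
  return text
    .replace(/[{}()\[\]\\.+?^$|]/g, "\\$&")
    .replace(/\*/g, wildcard);
}

// Turn a user-entered pattern such as "stackoverflow.com/*" or
// "*://*.example.org/docs/*" into an anchored JavaScript RegExp.
function patternToRegExp(pattern) {
  let scheme = "*"; // assumption: no scheme given means the same as "*"
  let rest = pattern;
  const sep = pattern.indexOf("://");
  if (sep !== -1) {
    scheme = pattern.slice(0, sep);
    rest = pattern.slice(sep + 3);
  }
  const slash = rest.indexOf("/");
  const host = slash === -1 ? rest : rest.slice(0, slash);
  const path = slash === -1 ? "/*" : rest.slice(slash); // assumption: no path means any path

  // Approximating Chrome: a "*" scheme means http or https.
  const schemeRe = scheme === "*" ? "https?" : escapeGlob(scheme, "[^:/]*");
  const hostRe = escapeGlob(host, "[^/]*"); // "*" in the host never crosses a "/"
  const pathRe = escapeGlob(path, ".*");    // "*" in the path matches anything
  return new RegExp("^" + schemeRe + "://" + hostRe + pathRe + "$");
}

// True if the URL matches at least one of the user's patterns.
function shouldRunOn(url, patterns) {
  return patterns.some((pattern) => patternToRegExp(pattern).test(url));
}
```

In the extension you would call something like shouldRunOn(tab.url, patternsFromOptions). Handling the host separately keeps stackoverflow.com/* from matching a URL that merely mentions stackoverflow.com in its query string, which can happen when a * in the scheme position is also turned into a free wildcard.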

Run Database Stored RegEx against DOM

I have a question about how to approach a certain scenario before I get halfway through it and figure out it was not the best option.
I work for a large company that has a team that creates tools for teammates to use that aren’t official enterprise tools. We have no direct access to the database, only access to an internal server where we store the files we run, and from which we can access the main site with JavaScript (same domain).
What I am working on is a tool with a lot of options that let you select what I will call “data points” on a page.
These are things like “Account status, Balance, Name, Phone number, Email” etc., and the tool saves them to an Excel sheet.
So you input account numbers, choose what you need, and then, using IE objects, it navigates to the page and scrapes the data you requested.
My question is as follows.
I want to make the scraping part fairly dynamic. I want to be able to add new data points on the fly.
My goal is to store the regular expression needed to get each specific piece of data in the table, alongside the “data point option”.
If I choose “Name”, it looks up the expression for Name in the database and runs it against the DOM.
What would be the best way to create that type of function in JavaScript / jQuery?
I need to pass a Regex to a function, have it run against the DOM and then return the result.
I have a feeling that some things will require more than one step to get the information.
I am just trying to think of the best approach that avoids hardcoding 200+ expressions into the file, as the page may get updated and they would need to be changed.
Any ideas?
IRobotSoft scraper may be the tool you are looking for. Check this forum and see if the questions there are similar to what you are doing: http://irobotsoft.org/bb/YaBB.pl?board=newcomer. It is free.
It doesn't use regular expressions as its main language but a language called HTQL, which may be more suitable for extracting web pages. It also supports regular expressions.
It organizes all your actions well with a visual interface, so you can dynamically compose actions or tasks for changing needs.
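A minimal sketch of the approach the question itself describes, in plain JavaScript rather than with the tool suggested above: each data point is stored as a list of regex steps (pattern source and flags, as strings that could come from the database table), and each step runs against the previous step's first capture group, which covers the "more than one step" case. The dataPoints records, their field names, and the id="name"-style markup the patterns expect are all hypothetical.

```javascript
// Hypothetical data-point definitions, shaped as they might come back from the
// table described in the question. Each step is a regex source plus flags.
const dataPoints = [
  { name: "Name", steps: [{ source: 'id="name">([^<]*)<', flags: "" }] },
  { name: "Balance", steps: [{ source: 'id="balance">\\$([\\d,.]+)', flags: "" }] },
  {
    name: "Account status",
    steps: [
      // Step 1: isolate the status cell; step 2: pull the text out of it.
      { source: 'id="status">([\\s\\S]*?)</td>', flags: "i" },
      { source: ">([^<]+)<", flags: "" },
    ],
  },
];

// Run the steps in order. Each step searches the previous step's first
// capture group (or its whole match when the pattern has no groups).
function extractDataPoint(text, steps) {
  let current = text;
  for (const { source, flags } of steps) {
    const match = new RegExp(source, flags).exec(current);
    if (!match) return null; // the data point is not on this page
    const value = match.length > 1 ? match[1] : match[0];
    if (value === undefined) return null; // optional group did not take part
    current = value;
  }
  return current.trim();
}

// Extract only the data points the user asked for, keyed by name.
function scrape(text, definitions, wanted) {
  const result = {};
  for (const def of definitions) {
    if (wanted.includes(def.name)) {
      result[def.name] = extractDataPoint(text, def.steps);
    }
  }
  return result;
}
```

In the page you would pass document.body.innerHTML (or innerText) as the text, e.g. scrape(document.body.innerHTML, dataPoints, ["Name", "Balance"]). Regexes over HTML break easily when the markup changes; storing a CSS selector per data point and reading textContent via document.querySelector is a common alternative that fits the same table-driven design.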

Fulltext search ignoring comments

I want fulltext search for my JavaScript code, but I'm usually not interested in matches from the comments.
How can I have fulltext search ignoring any commented match? Such a feature would increase my productivity as a programmer.
Also, how can I do the opposite: search within the comments only?
(I'm currently using TextMate, but I'm happy to change.)
See our Source Code Search Engine (SCSE). This tool indexes your code base using the language structure to guide the indexing; it can do so for many languages, including JavaScript. Search queries are then stated in terms of abstract language tokens, e.g., to find identifiers involving the string "tax" multiplied by some constant, you'd write:
I=*tax* '*' N
This will search all indexed languages only for identifiers (in each language) followed by a '*' token, followed by some kind of number. Because the tool understands language structure, it isn't confused by whitespace, formatting or intervening comments. Because it understands comments, you can search inside just comments (say, for authors):
C=*Author*
Given a query, the SCSE finds all the hits across the code base (possibly millions of lines) and offers them as a set of choices; clicking on a choice pulls up the file with the hit in the middle and the match outlined.
If you insist on searching just raw text, the SCSE provides grep-style searches. If you have only a small set of files, this is still pretty fast. If you have a big set of files, this is a lot slower than language-structure-based searches. In both cases, grep-like searches get you more hits, usually at the cost of false positives (e.g., finding "tax" in a comment, or finding a variable named "Authorization_code"). But at least you have the choice.
While this doesn't operate from inside an editor, you can launch your editor (for most editors) on a file once you've found the hit you want.
Use UltraEdit. It fully supports full-text search that ignores comments, as well as search within comments only.
How about the NetBeans way (Find Symbol in the Navigate menu)?
It searches all variables, functions, objects, etc.
Or you could customize JSLint if you want to integrate it into a web application or something like that.
I personally use Notepad++, which is a great free code editor. It seems you need an editor that supports regular expression search (in one or many files). If you know regular expressions, you can run powerful searches, such as inside or outside JavaScript comments. The work is building the right expression and testing it on a single file containing all the different cases, to be sure it won't miss anything during the real search; or you can google for 'javascript comments regular expression' or something similar.
Then have a look at the Notepad++ plugins; one of them, 'RegEx Helper', helps with building regular expressions.
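A single regular expression that reliably tells comments apart from strings is hard to get right, so here is a minimal sketch of a small scanner instead (a swapped-in technique, not what the answers above describe). maskComments produces two copies of the source, one with comments blanked out and one with everything except comments blanked out, keeping newlines so line numbers still line up; findLines then does a plain substring search. Assumptions: regex literals and ${...} nesting inside template literals are not recognised, so, for example, a // inside a regex literal will be mistaken for a comment. Both function names are hypothetical.

```javascript
// Produce two same-length views of the source, newlines preserved:
// `code` has comments blanked out, `comments` has everything else blanked.
// Simplification: regex literals and `${...}` nesting in template literals
// are not recognised.
function maskComments(source) {
  const code = [];
  const comments = [];
  const emit = (text, isComment) => {
    const blank = text.replace(/[^\n]/g, " ");
    code.push(isComment ? blank : text);
    comments.push(isComment ? text : blank);
  };

  let i = 0;
  while (i < source.length) {
    const two = source.slice(i, i + 2);
    let end;
    if (two === "//") {
      // Line comment: runs up to (not including) the next newline.
      end = source.indexOf("\n", i);
      if (end === -1) end = source.length;
      emit(source.slice(i, end), true);
    } else if (two === "/*") {
      // Block comment: runs through the closing */ (or to the end of the file).
      end = source.indexOf("*/", i + 2);
      end = end === -1 ? source.length : end + 2;
      emit(source.slice(i, end), true);
    } else if ("\"'`".includes(source[i])) {
      // String literal: keep it as code so "//" inside it is not a comment.
      end = i + 1;
      while (end < source.length && source[end] !== source[i]) {
        end += source[end] === "\\" ? 2 : 1; // skip escaped characters
      }
      end = Math.min(end + 1, source.length);
      emit(source.slice(i, end), false);
    } else {
      end = i + 1;
      emit(source[i], false);
    }
    i = end;
  }
  return { code: code.join(""), comments: comments.join("") };
}

// 1-based numbers of the lines that contain `term`.
function findLines(text, term) {
  return text
    .split("\n")
    .flatMap((line, index) => (line.includes(term) ? [index + 1] : []));
}
```

Searching the code view gives matches outside comments only; searching the comments view gives the opposite, which is the second thing the question asks for.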

JavaScript webpage version comparison

In order to expedite our 'content update review process', which is used in approving web page content for publishing, I'm looking to implement a JavaScript function that will compare two webpage versions.
So far, I've created a page that will load the content to be compared from the new and old versions of a particular page. Is there a (relatively) simple way to iterate through the html of each using JavaScript/jQuery and highlight what content has changed or is missing?
Since there would be so many HTML-specific details (this is essentially an HTML text compare), is there a JavaScript library I can use?
I should add that my first choice would be to implement this in PHP. Unfortunately, we have many constraints that only permit us to use limited resources such as JavaScript.
Version control is a non-trivial problem. It's probably not something you should implement from scratch, either, if this is part of your "content update review process."
Instead, consider using a tool like Subversion, Git, or your favorite source control solution.
If you really want to do this, you can go from something as simple as regex matching to DOM matching. There's no "magic library" that I'm aware of that will encapsulate this for you, so it'll be work. Work that you'll probably do wrong.
Seriously consider a version control provider, or use a CMS that has built in versioning of pages. If you're feeling squirrely, check out an open source CMS (like Drupal) and try to figure out how they implement versioning, then reverse engineer/re-engineer it yourself. I hope the inefficiency in that is obvious.
I would do this in 3 steps
1/ segment the content into 2 arrays
for each page
. choose a separator, like the "." or ""
. you have the content as a big string, split it and build an array
2/ compare the arrays
loop on these 2 arrays containing the segmented content, let's say A[idxA] and B[idxB]
. if A[idxA] == B[idxB] then idxA++ and idxB++
. else, find whether there is an index where A[idxA] == B[index] (searching from idxB onward)
. if there is, mark all indexes from idxB up to, but not including, index as "B modified", then set idxB = index
. else, mark idxA as "A modified" and increment idxA
3/ display the differences
At the end you should have all the indexes where A and B are not equal. You can then join the 2 arrays after adding some markups to highlight the differences.
It is not a perfect solution; it will be wrong sometimes, but not often if you choose your separator well. If you want it to be perfect, you will have to test several possible matches and compute the number of differences in order to minimise it.
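A minimal sketch of the three steps above in JavaScript, assuming the content has already been reduced to plain text (splitting raw HTML would also cut inside tags). Step 1 is just String.prototype.split with the chosen separator; diffSegments follows the greedy rule from step 2, and highlight does step 3. The function names and the <del>/<ins> markup are illustrative choices.

```javascript
// Step 2: walk both arrays, resynchronising greedily on the next equal segment.
function diffSegments(a, b) {
  const aModified = new Set(); // indexes of A with no counterpart in B
  const bModified = new Set(); // indexes of B that were added or changed
  let idxA = 0;
  let idxB = 0;
  while (idxA < a.length && idxB < b.length) {
    if (a[idxA] === b[idxB]) {
      idxA++;
      idxB++;
      continue;
    }
    const index = b.indexOf(a[idxA], idxB);
    if (index !== -1) {
      // Everything in B before the resynchronisation point is new or changed.
      for (let k = idxB; k < index; k++) bModified.add(k);
      idxB = index;
    } else {
      // A[idxA] no longer appears in B: it was removed or changed.
      aModified.add(idxA);
      idxA++;
    }
  }
  // Whatever is left over at the end of either array also differs.
  for (; idxA < a.length; idxA++) aModified.add(idxA);
  for (; idxB < b.length; idxB++) bModified.add(idxB);
  return { aModified, bModified };
}

// Step 3: wrap the modified segments in a tag and join them back together.
function highlight(parts, modified, tag, separator) {
  return parts
    .map((part, i) => (modified.has(i) ? `<${tag}>${part}</${tag}>` : part))
    .join(separator);
}
```

For example, with ". " as the separator, comparing "The cat sat. It was happy. The end." against "The cat sat. It was very happy. A new line. The end." marks "It was happy" in the old version and "It was very happy" and "A new line" in the new one. Because the resynchronisation is greedy (the first later match in B wins), it can over-mark when segments repeat; a longest-common-subsequence diff is the usual way to get the minimal set of differences mentioned above.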

Add spell check to my website

I have an ASP-based website, and I would like to add spell-checking capabilities to the textarea elements on its pages. Most of the pages are generated by an engine, though I can add JavaScript to them, so my preferred solution is JavaScript-based. I have tried JavaScriptSpellCheck and it works okay, though I would like to see what other options I have. I also found spellchecker.net, but at $3500 for a server license it seems excessive.
Spell checking can be in a separate window and must support multiple languages (the more the better). Ultimately, I would like to pass the spell check object a collection or delimited string of textarea names or IDs (preferably names, as they already exist in the pages) and have it spell check all of them, updating the text as spelling is corrected.
Check out using Google's api for this: http://www.asp101.com/articles/jeremy/googlespell/default.asp
Here is a free, open source Javascript library for spell checking that I authored:
https://github.com/LPology/Javascript-PHP-Spell-Checker
There's a link to a live demo at the top. It's designed to have the feel of a spell checker in a desktop word processor. I wrote it after being dissatisfied with these same options.
To use, just include the JS and CSS files into your page, and then add this:
var checker = new sc.SpellChecker({
  button: 'spellcheck_button', // opens the spell checker when clicked
  textInput: 'text_box', // HTML field containing the text to spell check
  action: '/spellcheck.php' // URL of the server side script
});
It includes a PHP script for spell checking, but it could be ported to another language fairly easily as long as it returns the correct JSON response.
If I were you, I'd look into something like aspell - this is used as one of the supported spellchecking backends in TinyMCE. Personally, I use pspell because it's integrated into PHP.
EDIT
There's an aspell integration here that has a PHP or a Perl/CGI version; might be worth checking out.
If I'm not mistaken, Firefox's English dictionary for spell checking takes around 800KB of data.
If you want to do everything in JavaScript with a full-featured spell-checking engine, you need to load that 800KB of data on every page load. That's really not a good idea.
So, instead of doing that in JavaScript, send the data to the server with AJAX, check it server side, and return it back; that's the best way.
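A minimal sketch of the client side of that approach, which also covers the asker's wish to pass a delimited string of textarea names. The endpoint URL, the request body { fields: { name: text } } and the response shape are assumptions; the server side (aspell, pspell or similar) would have to agree on them. collectTextareas and spellCheckFields are hypothetical names, and document and fetch are passed in as parameters so the functions can be tried outside a browser.

```javascript
// Collect the textareas named in a comma-delimited string, e.g. "body,summary".
// Names that are not on the page are skipped.
function collectTextareas(namesCsv, doc) {
  const fields = {};
  const names = namesCsv.split(",").map((n) => n.trim()).filter(Boolean);
  for (const name of names) {
    const el = doc.getElementsByName(name)[0];
    if (el) fields[name] = el.value;
  }
  return fields;
}

// Send all collected text to a server-side checker in one request.
// ASSUMPTION: the endpoint and both JSON shapes are hypothetical.
async function spellCheckFields(namesCsv, doc, endpoint, fetchImpl) {
  const fields = collectTextareas(namesCsv, doc);
  const response = await fetchImpl(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ fields }),
  });
  if (!response.ok) throw new Error(`Spell check failed: ${response.status}`);
  return response.json(); // e.g. { body: ["teh"], summary: [] }
}
```

In the page this would be called as spellCheckFields("body,summary", document, "/spellcheck", fetch), with the returned misspellings fed into whatever correction UI you build. One request for all fields keeps the round trips down when a page has several textareas.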
Well, this is quite an old question, but my answer might help people who are looking for the latest options.
"JavaScript SpellCheck" is the industry-leading spellchecker plugin for JavaScript. It allows the developer to easily add and control spellchecking in almost any HTML environment. You can install it in about 5 minutes by copying a folder into your website.
http://www.javascriptspellcheck.com/
It also supports multiple languages: http://www.javascriptspellcheck.com/Internationalization_Demo
I might be a bit late in answering this question, but I found a solution a long while ago. You must have a spell checker installed in your browser first. Then create a bookmark with the following code as the link.
javascript:document.body.contentEditable='true'; document.designMode='on'; void 0
