When should I split a JSON into smaller parts? - javascript

I want to use the cmudict file on a website. It contains about 170,000 words with their phonetic transcriptions (in ARPAbet symbols).
http://www.speech.cs.cmu.edu/cgi-bin/cmudict
I want to use it in JSON format: search for any word the user enters and return an explanation of how to pronounce it, syllable by syllable. The second part is not very demanding in search terms, as there are only 39 different phonemes, but the first one, with 170,000 entries, may take too long if the user enters a whole text instead of a single word to transcribe.
I wonder if it's worth splitting the JSON into, for example, 26 parts (one per initial letter) and searching only in the corresponding file.
Also, I don't know if JSON is the best format for this, but I want to use it on a free blog platform like Tumblr or Blogger (or similar; the point is that I don't want to spend money on this), and JavaScript is what they support. I'm open to suggestions on this too.

Well, that is a tough call, since you must consider download size. I would shorten the names of all your properties as much as possible, so instead of repeating "description" : "the short description", I would go with "sd" : "the short description". You are trying to use JavaScript to serve a data file, which is okay since you can rely on caching and whatnot, but the initial download may be rather large. I would put something like var myDictionary = { }; at the top of the file, so that you can reference the variable from anywhere, since it is in the global scope. It is an interesting experiment for sure.
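A minimal sketch of the global-dictionary idea, assuming the data arrives in the raw cmudict text format (a word, whitespace, then its ARPAbet phonemes, one entry per line). The two sample lines and the names `buildDictionary`, `transcribe` and `shardByInitial` are hypothetical. Because the dictionary is a plain object keyed by word, each lookup is a hash-table lookup rather than a scan of all 170,000 entries, so transcribing a whole text stays cheap; splitting into per-letter files mainly reduces how much has to be downloaded.

```javascript
// Hypothetical sample lines in the cmudict text format: WORD, whitespace, phonemes.
const sampleLines = [
  "HELLO  HH AH0 L OW1",
  "WORLD  W ER1 L D",
];

// Build a plain object used as a hash map: word -> array of ARPAbet phonemes.
function buildDictionary(lines) {
  const dict = Object.create(null); // no prototype, so keys like "CONSTRUCTOR" are safe
  for (const line of lines) {
    const [word, ...phonemes] = line.trim().split(/\s+/);
    dict[word] = phonemes;
  }
  return dict;
}

// Look up every word of the user's text; unknown words get phonemes: null.
function transcribe(dict, text) {
  return text
    .toUpperCase()
    .split(/[^A-Z']+/) // keep apostrophes, as in entries like DON'T
    .filter(Boolean)
    .map((word) => ({ word, phonemes: dict[word] || null }));
}

// Optional: group entries by initial letter, e.g. to write 26 smaller JSON files.
function shardByInitial(dict) {
  const shards = {};
  for (const word of Object.keys(dict)) {
    const letter = word[0];
    if (!shards[letter]) shards[letter] = {};
    shards[letter][word] = dict[word];
  }
  return shards;
}

const myDictionary = buildDictionary(sampleLines);
console.log(transcribe(myDictionary, "Hello, world!"));
```

If you do split the file, `JSON.stringify(shardByInitial(myDictionary)[letter])` gives the contents of one per-letter file.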

Related

Storing data into URL links?

I can't figure out the keyword I'm looking for. When I google anything about URL encoding, storing data, including data, whatever, I get all kinds of results except what I'm interested in. This is the only website I could find off the top of my head that shows what I'm looking for:
http://www.pathofexile.com/passive-skill-tree/AAAAAgMA37CCEEGWBUKusyycwbTk7HYRfq9JsgLjB6Vr230Y7IpzU8BU5oERUDGIeQMI9It6EHQOXEV-Va6X9JeVUlOPpkSrPV8EB0yzLR-NGeAS3Yy1heZM2V8ucA==
after tree/ it has a long code that pretty much is full of data. What should I look into to be able to do something like that? Is one supposed to create their own method according to what they need? Or is there a way one can just take one super long text and have a library encode it to make it smaller for the URL and then decode it when it loads?
I require tons of numbers, around 100. I figured it would be something like this, first off use a symbol to separate each 'variable', in this case let's use '-' and do something like this:
www.url.com/tree/1-1-1-0-3-2-1-3-4-5-2...total of 100 numbers..1-0-2, but then it gets encoded to be much smaller to something like
www.url.com/tree/xDgdmFdmnDfjSDfjSFdKflWepLS and this url gets decoded once loaded and the data retrieved and used behind the scenes.
Is there an easier way of doing this, or does one have to do it manually depending on their needs? By easier I mean, a way of encoding it, or does one have to do the encoding themselves? For example, make it so if there are more of the same numbers next to each other then it takes them and transforms them into letters, let's say there are five 3's next to each other, it would use the letter c to show what the number is, and a capital letter for the number of times it's repeated, so cE would mean five 3's in a row.
My question is, is there a way to encode it or do I have to think of a way to encode it myself like I was writing in the example?
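Any information you have related to this subject is GREATLY appreciated! Thanks so much in advance for taking the time to read all this and reply, and sorry to bother you.
You are looking to Base64-encode the data. Note that Base64 by itself makes data larger (4 characters for every 3 bytes); the savings come from first packing the numbers into compact binary bytes instead of writing them out as text with separators.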
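A minimal sketch of that approach, assuming every value fits in one byte (0 to 255). It uses the `btoa`/`atob` functions available in browsers (and in recent Node.js), then swaps `+` and `/` for `-` and `_` and drops the `=` padding so the result is safe in a URL path (the "base64url" variant). The names `encodeNumbers` and `decodeNumbers` are hypothetical.

```javascript
// Pack small integers into one byte each, then Base64url-encode the bytes.
function encodeNumbers(numbers) {
  let binary = "";
  for (const n of numbers) {
    if (!Number.isInteger(n) || n < 0 || n > 255) {
      throw new RangeError("each value must be an integer in 0..255");
    }
    binary += String.fromCharCode(n); // one "byte" per number
  }
  return btoa(binary)
    .replace(/\+/g, "-")
    .replace(/\//g, "_")
    .replace(/=+$/, ""); // padding can be restored when decoding
}

// Reverse the steps: restore standard Base64, decode, read the bytes back.
function decodeNumbers(encoded) {
  let b64 = encoded.replace(/-/g, "+").replace(/_/g, "/");
  while (b64.length % 4 !== 0) b64 += "=";
  const binary = atob(b64);
  const numbers = [];
  for (let i = 0; i < binary.length; i++) numbers.push(binary.charCodeAt(i));
  return numbers;
}

const tree = [1, 1, 1, 0, 3, 2, 1, 3, 4, 5, 2];
const code = encodeNumbers(tree); // e.g. for www.url.com/tree/<code>
console.log(code, decodeNumbers(code));
```

For 100 values this produces a 134-character string, while writing single-digit values as `1-1-1-...` takes 199 characters. If every value fits in 0 to 15, packing two values into each byte would roughly halve the length again.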

Flash Twitter API with JSON

I have read a lot about parsing JSON with ActionScript. Originally the advice was to use this library, http://code.google.com/p/as3corelib/, but it seems Flash Player 11 now has native support for it.
My problem is that I cannot find examples or help that takes you from beginning to end of the process. Everything I have read seems to start in the middle. I have no real experience with JSON so this is a problem. I don't even know how to point ActionScript to the JSON file it needs to read.
I have a project with a tight deadline that requires me to read twitter through JSON. I need to get the three most recent tweets, along with the user who posted it, their twitter name and the time those tweets were posted.
I believe the back end has already been set up by the development team here, so my code just needs to point to the JSON (or XML) files, and then I need to display the values in the interface text boxes I have already designed and created.
Any help will be greatly appreciated... I know there are a lot of threads on this here; I just don't understand them, as they all assume some prior knowledge.
You need to:
Load the data, whatever it is.
Parse the data from a particular format.
For this you would normally:
Use the URLLoader class to load any data. (Just go to the language reference and look at the example of how to use this class.)
Use whatever parser handles the particular format you need. http://help.adobe.com/en_US/FlashPlatform/beta/reference/actionscript/3/JSON.html is the reference for the JSON API; it also shows usage examples. I'm not sure this API is in the production version of the player yet, and there may still be quite a few FP 10.x players out there, so I'd have a fallback JSON parser. For that I would recommend this library: http://www.blooddy.by/en/crypto/ over as3corelib, because it is faster. The built-in API is no different from the one you would find in a browser, so if you look up JavaScript JSON documentation, usage should generally be similar in Flash.
After you parse the JSON, you will end up with a number of objects of the following types: Object, Array, Boolean, Number, String, plus null values (JSON has a null literal, but no undefined). Basically, you will be working with native Flash data structures; you only need to take extra care because they are constructed dynamically, meaning you cannot assume any part of the data exists: you must always check that it is available.
wvxvw's answer is good, but I think it skips over a much-needed explanation of what JSON itself is. JSON (JavaScript Object Notation) is plain text; when you read it on screen it looks something like this:
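Since the AS3 JSON API mirrors the browser one, here is the parse-then-check idea sketched in JavaScript for the three most recent tweets. It is a sketch under stated assumptions: the field names `text`, `created_at`, `user.name` and `user.screen_name` are assumed to resemble Twitter's tweet objects, the array is assumed to be newest-first, and `latestTweets` is a hypothetical name. Check the real payload your back end returns.

```javascript
// Hypothetical payload: an array of tweets, newest first. The second tweet is
// deliberately missing fields to show why every access is checked.
const raw =
  '[{"text":"First","created_at":"Mon Jan 02 10:00:00 +0000 2012",' +
  '"user":{"name":"Ann","screen_name":"ann"}},' +
  '{"text":"Second","user":{"name":"Bob"}}]';

// Return a string value, or a fallback if the field is missing or not a string.
const str = (value, fallback) => (typeof value === "string" ? value : fallback);

function latestTweets(jsonText, count) {
  let data;
  try {
    data = JSON.parse(jsonText);
  } catch (err) {
    return []; // the payload was not valid JSON
  }
  if (!Array.isArray(data)) return [];
  return data.slice(0, count).map((item) => {
    // Never assume a part of the data exists: fall back to empty objects.
    const tweet = item !== null && typeof item === "object" ? item : {};
    const user = tweet.user !== null && typeof tweet.user === "object" ? tweet.user : {};
    return {
      text: str(tweet.text, ""),
      postedAt: str(tweet.created_at, "(unknown time)"),
      name: str(user.name, "(unknown)"),
      handle: typeof user.screen_name === "string" ? "@" + user.screen_name : "",
    };
  });
}

for (const t of latestTweets(raw, 3)) {
  console.log(`${t.name} ${t.handle} at ${t.postedAt}: ${t.text}`);
}
```

In Flash, the values returned would be copied into your text fields instead of logged.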
http://www.json.org/example.html
There you can see JSON and XML (both plain-text formats) side by side; essentially, JSON is a bunch of name/value pairs.
When you use JSON.parse("your JSON string goes here") it will do the conversions to AS3 "dynamic objects" which are just plain objects (whose properties can be assigned without previously being defined, hence dynamic). But to make a long story short, take the example you see in the link above, copy and paste the JSON as a string variable in AS3, use
var str:String = '{"glossary": {"title": "example glossary","GlossDiv": {"title": "S","GlossList": {"GlossEntry": {"ID": "SGML","SortAs": "SGML","GlossTerm": "Standard Generalized Markup Language","Acronym": "SGML","Abbrev": "ISO 8879:1986","GlossDef": {"para": "A meta-markup language, used to create markup languages such as DocBook.","GlossSeeAlso": ["GML", "XML"]},"GlossSee": "markup"}}}}}';
var test:Object = JSON.parse(str);
method on the string, store the result in a variable, and use the debugger to see what the resulting object is. As far as I know, there's really nothing else to JSON; it's simply this format for storing data. (You can't use E4X on it since it's not XML-based, and because of that it's slightly more concise than XML, with no closing tags, but in my opinion slightly less readable... but it is valid JavaScript.) For a nice breakdown of the performance gains/losses between AMF, JSON and XML, check out this page: http://www.jamesward.com/census2/ Though often you don't have a choice about the message format or protocol if you're not building the service, it's good to understand their performance costs.

Fulltext search ignoring comments

I want fulltext search for my JavaScript code, but I'm usually not interested in matches from the comments.
How can I have fulltext search ignoring any commented match? Such a feature would increase my productivity as a programmer.
Also, how can I do the opposite: search within the comments only?
(I'm currently using TextMate, but happy to change.)
See our Source Code Search Engine (SCSE). This tool indexes your code base using the language structure to guide the indexing; it can do so for many languages, including JavaScript. Search queries are then stated in terms of abstract language tokens; e.g., to find identifiers involving the string "tax" multiplied by some constant, you'd write:
I=*tax* '*' N
This will search all indexed languages only for identifiers (in each language) followed by a '*' token, followed by some kind of number. Because the tool understands language structure, it isn't confused by whitespace, formatting or intervening comments. Because it understands comments, you can search inside just comments (say, for authors):
C=*Author*
Given a query, the SCSE finds all the hits across the code base (possibly millions of lines) and offers them as a set of choices; clicking on a choice pulls up the file with the hit in the middle, outlined where the match occurs.
If you insist on searching just raw text, the SCSE provides grep-style searches. If you have only a small set of files, this is still pretty fast. If you have a big set of files, this is a lot slower than language-structure-based searches. In both cases, grep-like searches get you more hits, usually at the cost of false positives (e.g., finding "tax" in a comment, or finding a variable named "Authorization_code"). But at least you have the choice.
While this doesn't operate from inside an editor, you can launch your editor (for most editors) on a file once you've found the hit you want.
Use UltraEdit; it fully supports full-text search that ignores comments, as well as search within comments only.
How about the NetBeans way (Find Symbol in the Navigate menu)?
It searches all variables, functions, objects, etc.
Or you could customize JSLint if you want to integrate this into a web application or something like that.
I personally use Notepad++, which is a great free code editor. It seems you need an editor that supports regular-expression search (in one or many files). If you know regular expressions, you can do powerful searches, such as matching inside or outside JavaScript comments. The work will be to build the right expression and test it on one file containing all the different cases, to be sure it will not miss anything during a real search; or maybe you can google for 'javascript comments regular expression' or something like that.
Then have a look at the Notepad++ plugins; one of them, 'RegEx Helper', helps with building regular expressions.
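As an alternative to hand-building a regular expression, a small scanner can split JavaScript source into code and comment segments, so a search can be restricted to either one. This is a minimal sketch, not a full tokenizer: it understands `//` and `/* */` comments and quoted strings (so a `//` inside a string is not treated as a comment), but not regex literals or `${...}` inside template strings, so treat its results as approximate. `splitComments` and `searchIn` are hypothetical names.

```javascript
// Split source into { kind: "code" | "comment", text, start } segments.
function splitComments(src) {
  const segments = [];
  let i = 0;
  let codeStart = 0;
  const pushCode = (end) => {
    if (end > codeStart) segments.push({ kind: "code", text: src.slice(codeStart, end), start: codeStart });
  };
  while (i < src.length) {
    const ch = src[i];
    const next = src[i + 1];
    if (ch === "/" && (next === "/" || next === "*")) {
      pushCode(i);
      // Line comments end before the newline; block comments include "*/".
      const found = src.indexOf(next === "/" ? "\n" : "*/", i + 2);
      const end = found === -1 ? src.length : found + (next === "*" ? 2 : 0);
      segments.push({ kind: "comment", text: src.slice(i, end), start: i });
      i = end;
      codeStart = i;
    } else if (ch === "'" || ch === '"' || ch === "`") {
      // Skip the whole string literal, honoring backslash escapes.
      i++;
      while (i < src.length && src[i] !== ch) i += src[i] === "\\" ? 2 : 1;
      i++; // past the closing quote
    } else {
      i++;
    }
  }
  pushCode(src.length);
  return segments;
}

// Offsets of every occurrence of needle inside segments of the given kind.
function searchIn(src, needle, kind) {
  const hits = [];
  for (const seg of splitComments(src)) {
    if (seg.kind !== kind) continue;
    let j = seg.text.indexOf(needle);
    while (j !== -1) {
      hits.push(seg.start + j);
      j = seg.text.indexOf(needle, j + 1);
    }
  }
  return hits;
}

const src = 'let tax = 1; // tax rate\nconst s = "// not a tax comment"; /* tax */ tax++;';
console.log(searchIn(src, "tax", "code"), searchIn(src, "tax", "comment"));
```

Searching with `"code"` ignores matches in comments, and `"comment"` does the opposite, which covers both halves of the question.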

JavaScript webpage version comparison

In order to expedite our 'content update review process', which is used in approving web page content for publishing, I'm looking to implement a JavaScript function that will compare two webpage versions.
So far, I've created a page that will load the content to be compared from the new and old versions of a particular page. Is there a (relatively) simple way to iterate through the html of each using JavaScript/jQuery and highlight what content has changed or is missing?
Since there would be so many HTML-specific details (this is essentially an HTML text compare), is there a JavaScript library I can use?
I should add that my first choice would be to implement this in PHP. Unfortunately, we have many constraints that only permit us to use limited resources such as JavaScript.
Version Control is a non-trivial problem. It's probably not something you should implement from scratch, either, if this is part of your "content update review process."
Instead, consider using a tool like Subversion, Git, or your favorite source control solution.
If you really wanna do this, you can go from something as simple as Regex matching to DOM matching. There's no "magic library" that I'm aware of that will encapsulate this for you, so it'll be work. Work that you'll probably do wrong.
Seriously consider a version control provider, or use a CMS that has built in versioning of pages. If you're feeling squirrely, check out an open source CMS (like Drupal) and try to figure out how they implement versioning, then reverse engineer/re-engineer it yourself. I hope the inefficiency in that is obvious.
I would do this in 3 steps
1/ segment the content into 2 arrays
for each page
. choose a separator, like the "." or ""
. you have the content as a big string, split it and build an array
2/ compare the arrays
loop on these 2 arrays containing the segmented content, let's say A[idxA] and B[idxB]
. if A[idxA] == B[idxB] then idxA++ and idxB++
. else find if there is an index where A[idxA] == B[index]
. if there is, mark all indexes between idxB and index as "B modified"
. else, mark idxA as "A modified"
3/ display the differences
At the end you should have all the indexes where A and B are not equal. You can then join the 2 arrays after adding some markups to highlight the differences.
It is not a perfect solution; it will sometimes be wrong, but not often if you choose your separator well. If you want it perfect, you will have to try several candidate matches and keep the one that minimizes the number of differences.
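The three steps above can be sketched in JavaScript as follows. This assumes plain-text content and uses "." as the separator (the answer leaves that choice open); `segment`, `compareSegments` and `highlight` are hypothetical names, and wrapping changes in `<ins>` is just one way to mark them. Like the answer's method, it is greedy and will sometimes pick a worse alignment than a true diff would.

```javascript
// Step 1: segment a page's content into an array.
function segment(text, separator = ".") {
  return text.split(separator);
}

// Step 2: greedy comparison of the two arrays.
function compareSegments(a, b) {
  const removedFromA = new Set(); // "A modified": indexes of A with no match in B
  const changedInB = new Set();   // "B modified": indexes of B that are new or changed
  let idxA = 0;
  let idxB = 0;
  while (idxA < a.length && idxB < b.length) {
    if (a[idxA] === b[idxB]) {
      idxA++;
      idxB++;
      continue;
    }
    // Is there a later index where B matches A[idxA]?
    const index = b.indexOf(a[idxA], idxB + 1);
    if (index !== -1) {
      for (let k = idxB; k < index; k++) changedInB.add(k);
      idxB = index; // now a[idxA] === b[idxB]; the next pass advances both
    } else {
      removedFromA.add(idxA);
      idxA++;
    }
  }
  // Leftover segments on either side have no counterpart.
  for (; idxA < a.length; idxA++) removedFromA.add(idxA);
  for (; idxB < b.length; idxB++) changedInB.add(idxB);
  return { removedFromA, changedInB };
}

// Step 3: join B back together, marking the changed segments.
function highlight(b, changedInB, separator = ".") {
  return b.map((seg, i) => (changedInB.has(i) ? "<ins>" + seg + "</ins>" : seg)).join(separator);
}

const oldPage = segment("A cat sat. It was warm. The end");
const newPage = segment("A cat sat. It was cold. The end");
const diff = compareSegments(oldPage, newPage);
console.log(highlight(newPage, diff.changedInB));
```

A similar pass over `removedFromA` could mark deleted segments (for example with `<del>`) in the old version.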

Network-efficient difference between two strings in Javascript

I have a web application where a client side editor is editing a really really large text which is known on the server side.
The client can make any kind of modifications to this text.
What is the most network-efficient way to transmit the result difference in a way that the server understands? Also, since this will happen on client side (Javascript), I would also like it to be 'fast' (or at least not noticeably slow)
Some scenarios:
User modifies ONE character
User modifies several sentences in random positions
User erases everything and results in a blank text.
I cannot use diff-like syntax since it's not network-efficient: it works line by line, so examples 1 and 3 will produce horrible diffs (especially the last one, where the diff will be larger than the old text itself).
Does anyone have experience in this matter? The user operates on a really large set of data, around 3-5MB of text, and uploading the whole "new" content is a big no-no.
To be clear, I'm looking for a "protocol" of transfer, string comparison is not the issue.
I'm not very familiar with this topic but I can point you to an open source (Apache License 2.0) project which may be very useful.
It is a Diff, Match and Patch library written in several languages, including JavaScript, from a Google engineer and it is used in several online collaborative editing services.
Here are a list of resources:
The Diff, Match and Patch project
The MobWrite project (Editor implementation based on the above project)
"Differential Synchronization" (A Google Tech Talk by the engineer)
A simple approach, assuming that you know the copy on the server isn't going to change, would just be to send a list of edits (deletions and additions), with the deletions represented as a start and end index, and the additions represented as a start index and the text to insert.
If you have more than a simple diff algorithm to work with (I'm not sure exactly what you mean by "string comparison is not the issue"), you could also detect moved or copied chunks of text, and send those as the start and end index of the moved or copied piece of text, as well as the destination to insert it.
Note that you'll need to make sure to keep track of whether your indices refer to the original document, or the document as edited so far. An easy approach to avoid this problem is to always perform the edits from the end of the document towards the beginning; then earlier edits won't affect the offsets specified by later edits.
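A minimal sketch of such an edit list and the end-to-beginning trick, assuming every offset refers to the original document, edits do not overlap, and there is at most one insert per position. The `{ type, start, end, text }` shape and `applyEdits` are hypothetical, not from any library. When a delete and an insert share a start offset (a replacement), the delete is applied first so the freshly inserted text is not removed.

```javascript
// Edits use offsets into the ORIGINAL document:
//   { type: "delete", start, end }  removes original[start, end)
//   { type: "insert", start, text } inserts text before original[start]
function applyEdits(original, edits) {
  const rank = (e) => (e.type === "delete" ? 0 : 1);
  // Highest offsets first, so earlier edits never shift later offsets;
  // at equal offsets, delete before insert.
  const ordered = [...edits].sort((x, y) => y.start - x.start || rank(x) - rank(y));
  let doc = original;
  for (const e of ordered) {
    if (e.type === "delete") {
      doc = doc.slice(0, e.start) + doc.slice(e.end);
    } else if (e.type === "insert") {
      doc = doc.slice(0, e.start) + e.text + doc.slice(e.start);
    } else {
      throw new Error("unknown edit type: " + e.type);
    }
  }
  return doc;
}

// Remove "quick " and replace "brown" with "red", all in original offsets.
const result = applyEdits("The quick brown fox", [
  { type: "delete", start: 4, end: 10 },
  { type: "delete", start: 10, end: 15 },
  { type: "insert", start: 10, text: "red" },
]);
console.log(result);
```

Erasing everything is then a single `{ type: "delete", start: 0, end: original.length }`, which addresses the "blank text" scenario in the question.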
For an example of an approach like this, see the ed format that diff -e outputs. This is basically input that could be fed into the ed line-oriented text editor. If you want the absolute smallest diffs to send across you may want to do character based indexing rather than line based indexing, but the same basic approach could work.
Any edits the user's performing can be efficiently broken down into: delete from X for length Y; insert at X text "whatever". X and Y are offsets in characters from the start of the text; Y is a number of characters; "whatever" is any string of characters. You say you need no help computing the diff, but an example is here, except it's richer in its output than you need, but does identify "removals and insertions", so, just change the output part.
The exact format in which you send the data to the server can be tuned, but I don't think there's much mileage in doing that -- pending measurement, I'd start by sending the commands as D for delete or I for insert, the numbers in decimal, the inserted string in quoted form. Once you have some statistics on actual transfers being performed, you can see how much overhead is in the numbers (decimal vs binary) and quotes, but I suspect that may not be all that meaningful (if it proves to be, there are all sorts of things you can try, such as giving offsets from the latest point of insertion or deletion, rather than always from the start, to make the numbers smaller).
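A sketch of that text format, under these assumptions: one command per line, `D<x>,<y>` deletes y characters at offset x, `I<x>,<string>` inserts at offset x, and `JSON.stringify` does the quoting (it escapes newlines, so each command stays on one line). To produce the commands, `singleChange` uses a deliberately simple diff that trims the common prefix and suffix; it is not minimal for scattered edits, but a one-character change or an erase-all becomes one or two short commands. All names here are hypothetical. Offsets are JavaScript string indexes (UTF-16 code units), so the server must count the same way.

```javascript
// Find one contiguous change by trimming the common prefix and suffix.
function singleChange(oldText, newText) {
  let start = 0;
  const max = Math.min(oldText.length, newText.length);
  while (start < max && oldText[start] === newText[start]) start++;
  let oldEnd = oldText.length;
  let newEnd = newText.length;
  while (oldEnd > start && newEnd > start && oldText[oldEnd - 1] === newText[newEnd - 1]) {
    oldEnd--;
    newEnd--;
  }
  return { start, deleteLength: oldEnd - start, insertText: newText.slice(start, newEnd) };
}

// Encode as "D<offset>,<length>" and/or "I<offset>,<quoted text>" lines.
function encodeChange({ start, deleteLength, insertText }) {
  const cmds = [];
  if (deleteLength > 0) cmds.push("D" + start + "," + deleteLength);
  if (insertText.length > 0) cmds.push("I" + start + "," + JSON.stringify(insertText));
  return cmds.join("\n");
}

// Receiving side (shown in JavaScript for illustration): apply commands in order.
function applyCommands(text, encoded) {
  if (encoded === "") return text;
  for (const line of encoded.split("\n")) {
    const comma = line.indexOf(",");
    const offset = Number(line.slice(1, comma));
    const arg = line.slice(comma + 1);
    if (line[0] === "D") {
      text = text.slice(0, offset) + text.slice(offset + Number(arg));
    } else if (line[0] === "I") {
      text = text.slice(0, offset) + JSON.parse(arg) + text.slice(offset);
    } else {
      throw new Error("unknown command: " + line);
    }
  }
  return text;
}

const wire = encodeChange(singleChange("Hello world", "Hello, world"));
console.log(wire, applyCommands("Hello world", wire));
```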
You can sample what the user is doing every few seconds, and just send the incremental changes over those last few seconds (if any) -- this way, each packet you're sending will be small, and if the net connection or the user's computer/browser crash, the user won't have lost much work.
You could just send changes every 500ms, so, whatever changes were made in the last 500ms would be sent, but you only send data when there was a change.
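A sketch of that sampling loop, assuming hypothetical `getText` and `send` hooks supplied by the caller (for example, reading a textarea and POSTing to the server). It sends nothing when the text is unchanged, and hands the previous and current versions to `send`, so the caller can diff and encode them however it likes before uploading. `createChangeSender` is a hypothetical name.

```javascript
// Sample the editor every intervalMs; call send only when the text changed.
function createChangeSender(getText, send, intervalMs = 500) {
  let lastSent = getText(); // the version the server is assumed to have
  function tick() {
    const current = getText();
    if (current !== lastSent) {
      send(lastSent, current); // caller diffs/encodes the two versions
      lastSent = current;
    }
  }
  const timer = setInterval(tick, intervalMs);
  // tick is exposed so a final flush can be forced (e.g. before unload).
  return { tick, stop: () => clearInterval(timer) };
}
```

Keeping `lastSent` in memory costs one extra copy of the 3-5MB text, which is usually cheaper than re-uploading it.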
With this you could send the position of the changed word(s) along with the entire word, with the position measured from the start of the text.
Within 500ms it won't be several sentences' worth of changes, though several words may be involved; if you send them in the order they changed, the result should be consistent.
Because there are so many ways to do edits--even within short periods of time like 500ms--including dragging and dropping, or cutting and pasting, large sections of text around within the document or from outside it--I don't know if there's going to be something that will cover all scenarios really well. This is certainly a non-answer to your question at face value, but I would consider carefully the trouble of developing and maintaining something like this compared to changing the interface to restrict the text size and breaking existing texts into smaller pieces.
Maybe that's not possible in your situation, but if it is, I would guess it would be much less trouble in the end to dodge the issue in this way and just send full documents after an edit.
