Delta encoding for JSON objects [closed] - javascript

Is there a standard library or tool for computing and applying differences to JSON documents? I have a bunch of largish documents that I want to keep synchronized across a network, and I would prefer not to resend their entire state every time I synchronize, since most of the fields won't have changed. In other words, I only want to transmit the fields that changed, not retransmit the entire object. Something like the following set of methods would be convenient:
// Start with two distinct objects on the server:
//   prev is a copy of the object's state on the client
//   next is a copy of the object's state on the server
//
// 1. Compute a patch
patch = computePatch(prev, next);
// 2. Send the patch over the network
// 3. Apply the patch on the client
applyPatch(prev, patch);
// Final invariant:
//   prev is now equivalent to JSON.parse(JSON.stringify(next))
I could certainly implement one myself, but there are quite a few edge cases to consider. Here are some of the straightforward (though somewhat unsatisfactory) approaches I can think of:
Roll my own JSON patcher. Asymptotically, this is probably the best way to go, since it could support all the relevant features of JSON documents, along with specialized methods for diffing ints, doubles and strings (using relative encoding / edit distance). However, JSON has a lot of special cases, and I am leery of attempting this without extensive testing; I would much prefer to find something that already solves this problem, so that I can trust it and not worry about network Heisenbugs caused by mistakes in my JSON patching. (A naive sketch of what I mean follows this list.)
Compute the edit distance directly between the JSON strings using dynamic programming. Unfortunately, this doesn't work if the client and server have different JSON implementations (i.e. their fields could be serialized in different orders), and it is also pretty expensive, being a quadratic-time operation.
Use protocol buffers. Protocol buffers have a built-in diff method that does exactly what I want, and they are a nice binary-serializable, network-friendly format. Unfortunately, because they are strictly typed, they lack many of JSON's advantages, such as the ability to dynamically add and remove fields. This is the approach I am leaning towards right now, but it could make future maintenance really horrible, as I would need to continually update each of my objects.
Do something really nasty, like make a custom protocol for each type of object, and hope that I get it right in both places (yeah right!).
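To illustrate the "roll my own" option, here is the kind of naive sketch I mean (illustration only; the comments mark exactly the edge cases that worry me):
// Naive recursive differ (illustration only).
// Known gaps: arrays are diffed index-by-index (no splice/move support,
// and array length changes are not handled), and deletions use a marker
// key, which collides with any document that legitimately contains that
// key -- precisely the sort of edge case at issue.
var DELETED_KEY = '__deleted__'; // hypothetical sentinel

function isDeleted(v) {
  return v !== null && typeof v === 'object' && v[DELETED_KEY] === true;
}

function computePatch(prev, next) {
  var patch = {};
  Object.keys(next).forEach(function (key) {
    var p = prev[key], n = next[key];
    if (p !== null && n !== null &&
        typeof p === 'object' && typeof n === 'object') {
      var sub = computePatch(p, n);              // recurse into sub-objects
      if (Object.keys(sub).length > 0) patch[key] = sub;
    } else if (p !== n) {
      patch[key] = n;                            // added or changed value
    }
  });
  Object.keys(prev).forEach(function (key) {
    if (!(key in next)) {
      patch[key] = {};
      patch[key][DELETED_KEY] = true;            // removed key
    }
  });
  return patch;
}

function applyPatch(target, patch) {
  Object.keys(patch).forEach(function (key) {
    var v = patch[key];
    if (isDeleted(v)) {
      delete target[key];
    } else if (v !== null && typeof v === 'object' &&
               target[key] !== null && typeof target[key] === 'object') {
      applyPatch(target[key], v);                // recurse into sub-patches
    } else {
      target[key] = v;                           // plain replacement
    }
  });
}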
Of course what I am really hoping for is for someone here on Stack Overflow to come through and save the day with a reference to a space-efficient JavaScript object differ/patcher that has been well tested in production environments and across multiple browsers.
*Update*
I started writing my own patcher; an early version of it is available on GitHub here:
https://github.com/mikolalysenko/patcher.js
Since there doesn't seem to be much out there, I will instead accept, as an alternative answer, a list of interesting test cases for a JSON patcher. A few candidates of my own are below.
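For a start, here are some [prev, next] pairs I would want any patcher to survive (far from exhaustive):
var LONG = new Array(100001).join('x');          // a 100,000-character string
var testCases = [
  [ {}, {} ],                                    // no-op
  [ {a: 1}, {} ],                                // key removal
  [ {a: 1}, {a: null} ],                         // null vs. removed (not the same!)
  [ {a: [1, 2, 3]}, {a: [2, 3]} ],               // array shift: ideally a move, not 3 rewrites
  [ {a: {b: {c: 1}}}, {a: {b: {c: 2}}} ],        // deep nesting
  [ {a: 1}, {a: '1'} ],                          // type change between equal-looking values
  [ {'a/b': 1, 'a~b': 2}, {} ],                  // keys that need escaping in path syntax
  [ {a: LONG}, {a: LONG + 'y'} ]                 // long string with a one-character edit
];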

I've been maintaining a JSON diff & patch library on GitHub (yes, shameless plug):
https://github.com/benjamine/JsonDiffPatch
It handles long strings automatically using Neil Fraser's diff_match_patch lib.
It works both in browsers and on the server (unit tests run in both environments).
(The full feature list is on the project page.)
The only thing you would probably need that isn't implemented is the option to inject custom diff/patch functions for specific objects, but that doesn't sound hard to add. You're welcome to fork it, and even better, send a pull request.
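Usage is roughly like this (a sketch; the README has the authoritative API):
var jsondiffpatch = require('jsondiffpatch').create();

var prev = { name: 'south park', characters: ['kenny', 'stan'] };
var next = { name: 'south park', characters: ['kenny', 'stan', 'kyle'] };

var delta = jsondiffpatch.diff(prev, next); // undefined when nothing changed
jsondiffpatch.patch(prev, delta);           // prev now matches next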
Regards,

The JSON-patch standard has been updated.
https://datatracker.ietf.org/doc/html/draft-ietf-appsawg-json-patch-10
You can find an implementation for both applying and generating patches at https://github.com/Starcounter-Jack/Fast-JSON-Patch

I came across this question while searching for implementations of json-patch. If you are rolling your own, you might want to base it on this draft:
https://datatracker.ietf.org/doc/html/draft-pbryan-json-patch-00

Use JSON Patch, which is the standard way to do this.
JSON Patch is a format for describing changes to a JSON document. It can be used to avoid sending a whole document when only a part has changed. When used in combination with the HTTP PATCH method, it allows partial updates for HTTP APIs in a standards-compliant way.
The patch documents are themselves JSON documents.
JSON Patch is specified in RFC 6902 from the IETF.
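For example, a patch that changes one field, removes another, and appends to an array looks like this (each path is a JSON Pointer, RFC 6901, into the document):
[
  { "op": "replace", "path": "/user/name", "value": "Jane" },
  { "op": "remove",  "path": "/temp" },
  { "op": "add",     "path": "/tags/-", "value": "new" }
]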
Libraries exist for most platforms and programming languages.
At the time of writing, JavaScript, Python, PHP, Ruby, Perl, C, Java, C#, Go, Haskell and Erlang are supported (full list and libraries here).
Here is a list for JavaScript:
Fast-JSON-Patch both diffs and patches, 509,361 weekly downloads on npm
jiff both diffs and patches, 5,075 weekly downloads on npm
jsonpatch-js only applies patches, 2,014 weekly downloads on npm
jsonpatch.js only applies patches, 1,470 weekly downloads on npm
JSON8 Patch both diffs and patches, 400 weekly downloads on npm
By far the most widely used (myself included) is the Fast-JSON-Patch library. It works in Node.js and in the browser.
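A minimal round trip with Fast-JSON-Patch might look like this (a sketch; check the library's README for the exact current API):
var jsonpatch = require('fast-json-patch');

var prev = { user: { name: 'Ada', score: 1 } };
var next = { user: { name: 'Ada', score: 2 } };

// Diff on the server...
var patch = jsonpatch.compare(prev, next);
// -> [{ op: 'replace', path: '/user/score', value: 2 }]

// ...send the (small) patch over the wire, then apply it on the client.
var result = jsonpatch.applyPatch(prev, patch).newDocument;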

Related

jQuery/JavaScript solution for converting wiki-text to HTML and vice versa?

For my web front end I have to implement subsets of wiki syntax in my system. Do I need to manually specify rules and reinvent the wheel, or is there an existing JavaScript library or jQuery plugin that could help with this?
For example, a user enters == Header ==, which needs to be converted to a medium header (assuming 'medium' is defined in this context as the span below):
<span class="mediumHeader" id="Header">Header</span>
Now when the user edits the above text, I'm guessing it'll involve replacing the
<span...> ... </span> with ==...==
For every system I design this will be as per 'my rules', and I will almost always have to reinvent the wheel. Is there something I could use to ease this wiki-to/from-HTML transformation using jQuery/JavaScript? I'm sure it's a problem with a known solution.
I would prefer to customize what's acceptable and what isn't, i.e. I don't want everything to be translated into wiki syntax (or HTML), only subsets of it. Should I just roll my own for my application?
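For context, this is roughly the kind of one-off rule I end up hand-rolling each time (mediumHeader comes from my own CSS; this is a naive sketch with no escaping or nesting handled):
function wikiToHtml(text) {
  // '== Header ==' -> <span class="mediumHeader" id="Header">Header</span>
  return text.replace(/^==\s*(.+?)\s*==\s*$/gm, function (match, title) {
    return '<span class="mediumHeader" id="' + title + '">' + title + '</span>';
  });
}

function htmlToWiki(html) {
  // the inverse rule, for when the user edits the rendered text
  return html.replace(
    /<span class="mediumHeader" id="[^"]*">([^<]*)<\/span>/g,
    '== $1 ==');
}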
It's been long enough that you may not need this, but yours was the top SO hit when I started looking into it.
There are a couple of JavaScript options: you're probably looking at instaview (check out test/test.js), or maybe Wiky.js (which is less fully documented).
If you aren't limited to JavaScript, check out the exhaustive list of MediaWiki parsers at http://www.mediawiki.org/wiki/Alternative_parsers -- lots of tools for C++, Java, Perl, Ruby, and more. That's the link to watch for new developments.
At the time of writing, Parsoid seems to be the only one that translates in both directions. It also powers the visual editor on Wikipedia. But it is not a handy client-side lib to include in your app; it is a full-blown parsing and transformation server suite. A production version of Parsoid on the Wikimedia cluster can be accessed at http://parsoid-lb.eqiad.wikimedia.org/.
Other JavaScript libraries that translate from WikiText to HTML only (ordered by popularity) are:
Wiky.js - doesn't support the full WikiText syntax. (By tanin47; not to be confused with Wiki.js from Requarks, a completely different project.)
wtf_wikipedia - doesn't translate directly to HTML but to JSON, which opens up much more powerful possibilities (e.g. info-boxes as key-value pairs). This is the most up-to-date library, and "its a combination of instaview, txtwiki, and uses the inter-language data from Parsoid javascript parser."
instaview - no updates in the last 2 years.
Also check out the current and full list of alternative MediaWiki parsers.

Choosing an appropriate compression scheme for data transfer over JSON

After some comments by David, I've decided to revise my question. The original question can be found below as well as the newly revised question. I'm leaving the original question simply to have a history as to why this question was started.
Original Question (Setting LZMA properties for jslzma)
I've got some large JSON files I need to transfer with AJAX. I'm currently using jQuery and $.getJSON(). I'd like to use the jslzma library to decompress the files upon receiving them. Currently, I'm using Django with the pylzma library to compress them.
The only problem is that there's a lack of documentation for the jslzma library. There is some, but not enough. So I have two questions about how to use the library.
It gives this as an example:
LZMA.decompress(properties, inStream, outStream, outSize);
I know how to set the inStream and outStream variables, but not the properties or the outSize. So can anyone give an example of how to set the properties variable (i.e. what's expected) and how to calculate the outSize?
Thanks.
Edit #1 (Revised Question)
I'm looking for a compression scheme that lends itself to highly repetitive data, using Python (Django) and JavaScript.
The data being transferred contains elevation measurements. Each file has 1200x1200 data points, which equates to about 2.75MB in its raw, uncompressed binary form. JSON balloons it to between 5 and 6MB. I've also looked into base64 (just to cover all the bases), which would reduce the size, but I haven't had any success reading it in JS. I think the data lends itself to easy compression simply because the values are highly repetitive. For example, one file has only 83 unique elevation values to describe 1,440,000 data points.
I just haven't had much luck, mainly because I'm just starting to learn JavaScript.
So can anyone suggest a compression scheme for this type of data? The goal is to minimize the transfer time by reducing the size for the data.
Thanks.
For what it's worth, LZMA is typically very slow to compress as well as decompress; it is therefore more common to use somewhat faster compression schemes. Standard GZIP (deflate) has a reasonably good balance: its compression ratio is acceptable, and its compression speed is MUCH better than that of LZMA or bzip2.
Also: most web servers and clients support automatic handling of gzip compression, which makes it even more convenient to use.
Decompression on the client side with JavaScript can take significantly longer, and the payoff depends heavily on the client's available bandwidth. Why not implement a lesser but faster and easier-to-write decompression scheme like RLE, delta, or Golomb coding (a sketch follows below)? Or maybe you want to look into compressed JSON.
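As a sketch of that idea (my own illustration, not a library): delta-encode the elevations, then run-length encode the deltas, which should collapse nicely given only 83 unique values. The Django side would implement the mirror image in Python.
// Encode: values -> [delta, runLength, delta, runLength, ...]
function encode(values) {
  var deltas = [];
  var prev = 0;
  for (var i = 0; i < values.length; i++) {
    deltas.push(values[i] - prev); // elevations change slowly, so deltas cluster near 0
    prev = values[i];
  }
  var out = [];
  var j = 0;
  while (j < deltas.length) {
    var d = deltas[j], run = 1;
    while (j + run < deltas.length && deltas[j + run] === d) run++;
    out.push(d, run);              // one pair per run of identical deltas
    j += run;
  }
  return out;
}

// Decode: the client-side inverse.
function decode(encoded) {
  var values = [];
  var prev = 0;
  for (var i = 0; i < encoded.length; i += 2) {
    var d = encoded[i], run = encoded[i + 1];
    for (var k = 0; k < run; k++) {
      prev += d;
      values.push(prev);
    }
  }
  return values;
}

// JSON.stringify(encode(grid)) is then typically far smaller than the raw
// array, and automatic gzip on top shrinks it further.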

C.S. Basics: Understanding Data Packets, Protocols, Wireshark

The Quest
I'm trying to talk to an SRCDS server from node.js via the RCON protocol.
The RCON protocol seems to be explained well enough, and implementations can be found at the bottom of that page in every major programming language. Using those is simple enough, but understanding the protocol and developing a JS library is what I set out to do.
Background
Being a self-taught programmer, I skipped a lot of computer science basics and learned only what I needed to accomplish what I wanted. I started coding with PHP, eventually wrapped my head around OO, talked to databases, etc. I'm currently programming with JavaScript, more specifically doing web stuff with node.js.
Binary Data?!?!
I've read and understood the absolute binary basics. But when it comes to the packet data, I'm totally lost. I'd like to read and understand the Wireshark output, but I can't make any sense of it. My biggest problem is probably that I don't understand what the binary representations of the various INT and STRING (char ...) types look like from JS, and how I convert the data I get from the server into something usable in the program.
Help
So I'd be more than grateful if someone can point me to a tutorial on these topics. Tutorial as in "explanation that mere mortals can understand, preferably not written by a C.S. professor". :)
When I look at the PHP reference implementation, I see (too much) magic happening that I can't translate to JS. Sending and reading data from a socket is no problem, but I need to know how PHP's unpack function works and, correspondingly, how I can do the same in JS with node.js.
So I hope you can see what I'm trying to accomplish here. First and foremost is understanding the theory needed to make implementing the protocol a breeze. But because I'm only good with scripting languages, it would be incredibly helpful if someone could also guide me a bit on the HOWTO part in PHP/JS.
Thank you so much for your time!
I applaud the low level protocol pursuit.
I'll tell you the path I took. My approach was to use a client and server that already spoke the protocol and use libpcap to do the analysis. During this phase I created a library that was able to unpack the custom protocol I was analyzing.
It's super helpful to start with packet-layout diagrams like the TCP header diagram from the Wikipedia article on TCP. It's an incredibly useful way to visualize the structure of the binary data. It's tightly packed, so slicing it apart requires attention to detail.
Buffers and Binary
I read up on Buffer. It's the way you deal with binary data in Node: http://nodejs.org/docs/v0.4.8/api/buffers.html -- the first thing to realize here is that buffers can be accessed byte by byte via array syntax, i.e. buffer[0] and so on.
Visualization
It's helpful to be able to dump your binary data into a hex representation. I used https://github.com/a2800276/hexy.js to achieve this.
node_pcap
I grabbed https://github.com/mranney/node_pcap -- this is the equivalent of Wireshark, but you can programmatically poke at all outgoing and incoming traffic. I added UDP payload support: https://github.com/jmoyers/node_pcap/commit/2852a8123486339aa495ede524427f6e5302326d
I read through all of mranney's "unpack" code: https://github.com/mranney/node_pcap/blob/master/pcap.js#L116-171
I found https://github.com/rmustacc/node-ctype
I read through all of their "unpack" code: https://github.com/rmustacc/node-ctype/blob/master/ctio.js
Now, things to remember when you're looking through this stuff: most of the time they're taking a binary Buffer representation and converting it to a native JavaScript type, like Number or String. They'll use advanced techniques to do so -- bitwise operations like shifts and such. You don't necessarily need to understand all of that.
The key things are:
1) Endianness -- the ordering of bytes (network and host byte order can be the reverse of each other), as this determines how things are unpacked. (A short unpacking sketch follows at the end of this answer.)
2) JavaScript's Number representation is quirky -- node-ctype goes into detail in its comments about how it converts the various number types to JavaScript's Number. Integer, float, double, etc. are all Number in JavaScript land.
In the end, it's likely fine if you just USE these unpackers for your adventures. I ended up having to unpack things that weren't covered by these libraries, like GUIDs, and it was tremendously helpful to study the source.
Isolate the traffic you're looking at
Filter, filter, filter. Target one host. Target one direction. Target one message type. Focus first on stripping off data that has a known fixed length -- oftentimes the header in a protocol is a good place to start. Once you get the header unpacked from binary into a nice JSON structure, you are well on your way.
After that, it's one field at a time, top to bottom, one message at a time. You can use Buffer#slice and the unpack functions from node-ctype to grab each piece of data in turn.
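To make that concrete, here is a small sketch using modern Node Buffer methods (the three-int32-plus-string layout below is made up for illustration; substitute your protocol's actual field layout):
// Hypothetical message: three little-endian int32 fields, then a
// null-terminated ASCII body. The layout is illustrative only.
function unpackMessage(buf) {
  var size = buf.readInt32LE(0); // endianness matters: LE here, readInt32BE for network byte order
  var id   = buf.readInt32LE(4);
  var type = buf.readInt32LE(8);
  var end  = buf.indexOf(0, 12); // find the null terminator
  var body = buf.toString('ascii', 12, end === -1 ? buf.length : end);
  return { size: size, id: id, type: type, body: body };
}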

Lightweight Rules Engine in Javascript [closed]

I am looking for suggestions for a lightweight rules engine implemented in Javascript.
The reason for such an implementation is to build a very lightweight but fast browser-based simulation using a small set of rules (fewer than 20). The simulation would take half a dozen parameters, run the rules, and display results in the browser without any need to go back to the server. Think of a UI with a couple of radio buttons, checkboxes, text boxes and sliders to control the parameters. The simulation would quickly re-run based on any parameter change.
I've implemented a (more complicated) version of what you are describing in C#, and thinking back through the code, all of it would be doable in JavaScript. I agree with the comments posted that writing your own is a viable option. It can be as simple or complex as you want it to be.
General observations for this type of rules engine (in no particular order):
Non-linear (hash) lookups are your friend. In JavaScript this is easy using the obj[key] = val syntax. Once you determine the output of a rule for a given set of parameters, cache the result so that you can use it again without executing the rule again.
Determine whether or not you need to process unique combinations of inputs. For example, let's say you allow the user to enter multiple names and ask for suggestions on XYZ. In reality, you now need to run all rules against each input value. This may be irrelevant, simple, or immensely complicated (imagine a hotel reservation system that takes multiple dates, times, locations, and criteria, and makes suggestions).
setTimeout() can be used to smooth out UI behavior, but the rules you describe should execute in a few milliseconds or less, so worry about performance last. It is less of a concern than you might think for a basic rules engine.
Rule definitions will be easiest to manipulate if they are objects (or even simple object trees).
Don't tie UI elements to output results; meaning, put the results of the rule execution into a flexible object list so that you can create whatever visual output you want from it.
Customized output messages are very useful to a user. Rather than triggering a generic message when a condition is met, try inserting a real value into the output message, like: "Your credit score is only 550. You need a minimum of 600 to continue."
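To make the rules-as-objects and caching points concrete, here is a minimal sketch (the rule shape is my own invention, not a library API):
// Rules as plain objects: a predicate plus a message builder.
var rules = [
  {
    name: 'min-credit-score',
    when: function (p) { return p.creditScore < 600; },
    then: function (p) {
      return 'Your credit score is only ' + p.creditScore +
             '. You need a minimum of 600 to continue.';
    }
  }
  // ...the remaining rules, fewer than 20 in this scenario
];

var cache = {}; // the obj[key] = val lookup, keyed on the parameter combination

function runRules(params) {
  var key = JSON.stringify(params);
  if (cache[key]) return cache[key];     // reuse previously computed results
  var messages = [];
  rules.forEach(function (rule) {
    if (rule.when(params)) messages.push(rule.then(params));
  });
  return (cache[key] = messages);        // a flexible output list, not tied to UI elements
}

// runRules({creditScore: 550}) -> ['Your credit score is only 550. ...']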
That's it off the top of my head. Good luck.
Check out the nools rules engine, implemented in pure JavaScript for node.js. It has a pretty straightforward syntax for rule definitions.
Rule Reactor (https://github.com/anywhichway/rule-reactor) is a lightweight, fast, expressive forward-chaining business rules engine leveraging JavaScript internals, lazy cross-products, and Functions as objects rather than Rete. It can be used in the browser or on the server.
Here is a very simple rule engine that uses server-side JavaScript (Mozilla's Rhino engine); maybe it will be helpful to you:
http://jubyrajan.blogspot.com/2010/04/implementing-simple-deterministic-rule.html
I've made an example HTML/JavaScript rule engine for a product configurator. The rule engine is based on if-then statements, which are checked against an array. This array is filled with all possible options every time an option is changed.
Check out my blog for the example.
Link to my blog "Forward chaining javascript rule engine"
I think the "obj[key] = val" is the key to a javascript rule engine. Jquery helps with javascript handling.
Please check out JSL: https://www.npmjs.com/package/lib-jsl.
From the overview document: JSL is a JSON-based logic programming library meant for embedded use in JS programs. It uses JSON as its syntax as well as its I/O method, and provides callbacks into the host environment for performance optimisation.

Creating and parsing huge strings with javascript?

I have a simple piece of data that I'm storing on a server as a plain string. It is kind of ridiculous, but it looks like this:
name|date|grade|description|name|date|grade|description|repeat for a long time
This string can be up to 1.4MB in size. The idea is that it's a bunch of student records just strung together with a simple pipe delimiter. It's a very poor serialization method.
Once this massive string is pushed to the client, it is split along the pipes into student records again, using JavaScript.
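The client-side split is nothing fancier than this (four fields per record; it obviously breaks if a field ever contains a pipe):
function parseRecords(raw) {
  var parts = raw.split('|');
  var records = [];
  for (var i = 0; i + 3 < parts.length; i += 4) {
    records.push({
      name:        parts[i],
      date:        parts[i + 1],
      grade:       parts[i + 2],
      description: parts[i + 3]
    });
  }
  return records;
}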
I've been timing how long it takes to create and split these strings on the client side. The times are actually quite good: the slowest run I've seen on a few different machines was 0.2 seconds for 10,000 'student records', with a final string size of ~1.4MB.
I realize this is quite bizarre; I'm just wondering if there are any inherent problems with creating and splitting such large strings using JavaScript. I don't know how the different browsers implement their JavaScript engines. I've tried this in the 'major' browsers, but I don't know how it would perform on earlier versions of each.
Looking for any comments on this -- it's more for fun than anything else!
Thanks
String splitting of 1.4MB of data is not a problem on decent machines; you should instead worry about your users' internet connection speed. I've tried to do spell checking with an 800KB dictionary (half the size of your data), and the main issue was loading time.
It also looks like your student records could be put in a database, and you might not need to load everything up front. How about paginating the records shown to the user, or using AJAX requests to search for particular names?
If it's a really large string, it may pay to continuously slice the string with 'string'.slice(from, to) so that you only process a smaller subset at a time, appending each of the individual items to the output with list.push() or something similar.
String split methods are probably the most efficient way of doing this, though, even in IE. Processing individual characters using string.charAt(x) is extremely slow and will often trigger an unresponsive-script warning as it stalls the browser. Using string split methods would certainly be much faster than splitting with regular expressions.
It may also be possible to encode the data as a JSON array; some newer browsers such as IE8/WebKit/FF3.5 have fast JSON parsing built in via JSON.parse(data). But using eval(JSON) may choke the browser if there's enough data, so it's probably a bad idea. It may pay to compare the performance, though.
A much better approach in a lot of cases is to use AJAX and only load some of the data at once from the server, which would also save download time.
Besides S. Mark's excellent comments about local vs. transfer speed and the tip to re-encode using AJAX, I suggest a (long-term) move away from JavaScript in the browser (assuming that's where it runs) to either a non-browser implementation of JS or possibly another language.
Browser-based JS seems a weak link in a data-transfer chain and nothing I would want to run unmonitored, since browsers are upgraded from time to time, and breaking your JS transfer might be an unanticipated side effect!
