I am using the kafka-node (https://github.com/SOHU-Co/kafka-node) consumer to retrieve data. I think the data I get is compressed with Snappy. How do I decompress it after receiving it? I tried using node-snappy (https://github.com/kesla/node-snappy) to decompress the data, but it didn't work.
Is there any option in the library to set the compression to none?
Has anyone used the kafka-node library to get data from Kafka?
Thanks,
chandu
I also encountered these exact problems and finally found a solution: you can use kafkacat ('like netcat for Kafka'), which requires librdkafka. It lets you interact with Kafka from the command line using the librdkafka C/C++ library (no JVM required =D).
Now that we have those dependencies taken care of we can use this fun node.js repo: node-kafkacat
You'll find that there's likely enough documentation between those three libraries to get you started and unlike some of the other kafka-node modules on github, they seem to have been updated fairly recently.
I have only successfully installed on Linux and Mac so far but things are working great with our Apache/Java environment. I'm not the author of any of these packages, btw - just some guy who kept hoping your question would be answered over the last couple weeks.
By default, compression is determined by the producer through the configuration property 'compression.type'. Currently gzip, snappy and lz4 are supported. kafka-node should automatically uncompress both gzip and snappy. I just tried it with snappy and this works out of the box. lz4 appears not to be implemented by kafka-node at this point.
Compression can be configured using the configuration property 'compression.type':
Per broker: https://kafka.apache.org/documentation/#brokerconfigs. Default: producer determines compression
Per topic: https://kafka.apache.org/documentation/#brokerconfigs (scroll further down). Default: producer determines compression
Per producer: https://kafka.apache.org/documentation/#producerconfigs. Default: none
If you can't influence the producer, you could therefore consider overriding the topic configuration.
Also, note that kafka-node may not directly return the data as you expect. For instance, my (key, value) pair is of type (integer, integer), and I have to call both message.key.readIntBE() and message.value.readIntBE() to extract the integer values.
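For reference, a minimal consumer sketch along those lines (the kafkaHost, topic name and 4-byte big-endian integer layout are assumptions for illustration; check the kafka-node README for the exact consumer options your version supports):

const kafka = require('kafka-node');

const client = new kafka.KafkaClient({ kafkaHost: 'localhost:9092' });
const consumer = new kafka.Consumer(
  client,
  [{ topic: 'my-topic', partition: 0 }],
  // ask for raw Buffers so the key/value bytes can be decoded manually
  { encoding: 'buffer', keyEncoding: 'buffer' }
);

consumer.on('message', (message) => {
  // message.key and message.value are Buffers here
  const key = message.key.readInt32BE(0);
  const value = message.value.readInt32BE(0);
  console.log(key, value);
});

consumer.on('error', (err) => console.error(err));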
Hope this helps!
I am writing a login for a forum, and need to hash the password client side in javascript before sending it on to the server. I'm having trouble figuring out which SHA-256 implementation I can actually trust. I was expecting there to be some kind of authoritative script that everyone used, but I'm finding loads of different projects all with their own implementations.
I realize using other people's crypto is always a leap of faith unless you're qualified to review it yourself, and that there is no universal definition of "trustworthy", but this seems like something common and important enough that there ought to be some kind of consensus on what to use. Am I just naive?
Edit, since it comes up a lot in the comments: yes, we do a more stringent hash again on the server side. The client-side hash is not the final result that we save in the database. We hash on the client because the human client requests it. They have not given a specific reason why; probably they just like overkill.
On https://developer.mozilla.org/en-US/docs/Web/API/SubtleCrypto/digest I found this snippet that uses the browser's built-in Web Crypto API:
async function sha256(message) {
  // encode as UTF-8
  const msgBuffer = new TextEncoder().encode(message);
  // hash the message
  const hashBuffer = await crypto.subtle.digest('SHA-256', msgBuffer);
  // convert ArrayBuffer to Array
  const hashArray = Array.from(new Uint8Array(hashBuffer));
  // convert bytes to hex string
  const hashHex = hashArray.map(b => b.toString(16).padStart(2, '0')).join('');
  return hashHex;
}
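The helper returns a promise, so using it looks like:

sha256('some string').then(hex => console.log(hex));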
Note that crypto.subtle is only available over HTTPS or on localhost - for example, for local development with python3 -m http.server you need to add this line to your /etc/hosts:
0.0.0.0 localhost
Reboot - and you can open localhost:8000 with working crypto.subtle.
OUTDATED: Many modern browsers now have first-class support for crypto operations. See Vitaly Zdanevich's answer above.
The Stanford JS Crypto Library contains an implementation of SHA-256. While crypto in JS isn't really as well-vetted an endeavor as other implementation platforms, this one is at least partially developed by, and to a certain extent sponsored by, Dan Boneh, who is a well-established and trusted name in cryptography; that means the project has some oversight by someone who actually knows what he's doing. The project is also supported by the NSF.
It's worth pointing out, however...
... that if you hash the password client-side before submitting it, then the hash is the password, and the original password becomes irrelevant. An attacker needs only to intercept the hash in order to impersonate the user, and if that hash is stored unmodified on the server, then the server is storing the true password (the hash) in plain-text.
So your security is now worse because you decided to add your own improvements to what was previously a trusted scheme.
For those interested, this is code for creating SHA-256 hash using sjcl:
import sjcl from 'sjcl'
const myString = 'Hello'
const myBitArray = sjcl.hash.sha256.hash(myString)
const myHash = sjcl.codec.hex.fromBits(myBitArray)
Forge's SHA-256 implementation is fast and reliable.
To run tests on several SHA-256 JavaScript implementations, go to http://brillout.github.io/test-javascript-hash-implementations/.
The results on my machine suggest that forge is the fastest implementation, and also considerably faster than the Stanford Javascript Crypto Library (sjcl) mentioned in the accepted answer.
Forge is 256 KB big, but extracting the SHA-256 related code reduces the size to 4.5 KB, see https://github.com/brillout/forge-sha256
No, there's no way to use browser JavaScript to improve password security. I highly recommend you read this article. In your case, the biggest problem is the chicken-egg problem:
What's the "chicken-egg problem" with delivering Javascript cryptography?
If you don't trust the network to deliver a password, or, worse, don't trust the server not to keep user secrets, you can't trust them to deliver security code. The same attacker who was sniffing passwords or reading diaries before you introduce crypto is simply hijacking crypto code after you do.
[...]
Why can't I use TLS/SSL to deliver the Javascript crypto code?
You can. It's harder than it sounds, but you can safely transmit Javascript crypto to a browser using SSL. The problem is, having established a secure channel with SSL, you no longer need Javascript cryptography; you have "real" cryptography.
Which leads to this:
The problem with running crypto code in Javascript is that practically any function that the crypto depends on could be overridden silently by any piece of content used to build the hosting page. Crypto security could be undone early in the process (by generating bogus random numbers, or by tampering with constants and parameters used by algorithms), or later (by spiriting key material back to an attacker), or --- in the most likely scenario --- by bypassing the crypto entirely.
There is no reliable way for any piece of Javascript code to verify its execution environment. Javascript crypto code can't ask, "am I really dealing with a random number generator, or with some facsimile of one provided by an attacker?" And it certainly can't assert "nobody is allowed to do anything with this crypto secret except in ways that I, the author, approve of". These are two properties that often are provided in other environments that use crypto, and they're impossible in Javascript.
Basically the problem is this:
Your clients don't trust your servers, so they want to add extra security code.
That security code is delivered by your servers (the ones they don't trust).
Or alternatively,
Your clients don't trust SSL, so they want you to use extra security code.
That security code is delivered via SSL.
Note: also, SHA-256 isn't suitable for this, since unsalted, non-iterated passwords are so easy to brute-force. If you decide to do this anyway, look for an implementation of bcrypt, scrypt or PBKDF2.
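If you end up hashing client-side regardless, a sketch of PBKDF2 via the same built-in Web Crypto API might look like this (the iteration count and salt handling are placeholder assumptions, not recommendations):

async function pbkdf2Hex(password, saltBytes, iterations) {
  const enc = new TextEncoder();
  // import the raw password as key material
  const keyMaterial = await crypto.subtle.importKey(
    'raw', enc.encode(password), 'PBKDF2', false, ['deriveBits']);
  // derive 256 bits with PBKDF2-HMAC-SHA-256
  const bits = await crypto.subtle.deriveBits(
    { name: 'PBKDF2', salt: saltBytes, iterations: iterations, hash: 'SHA-256' },
    keyMaterial, 256);
  // hex-encode the result
  return Array.from(new Uint8Array(bits))
    .map(b => b.toString(16).padStart(2, '0')).join('');
}

// e.g. with a random salt that the server would also need to know
const salt = crypto.getRandomValues(new Uint8Array(16));
pbkdf2Hex('correct horse battery staple', salt, 310000).then(console.log);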
I found this implementation very easy to use. Also has a generous BSD-style license:
jsSHA: https://github.com/Caligatio/jsSHA
I needed a quick way to get the hex-string representation of a SHA-256 hash. It only took 3 lines:
var sha256 = new jsSHA('SHA-256', 'TEXT');
sha256.update(some_string_variable_to_hash);
var hash = sha256.getHash("HEX");
It is possible to use CryptoJS - https://www.npmjs.com/package/crypto-js
import sha256 from 'crypto-js/sha256'
const hash = sha256('Text')
Besides the Stanford lib that tylerl mentioned, I found jsrsasign very useful (GitHub repo here: https://github.com/kjur/jsrsasign). I don't know exactly how trustworthy it is, but I've used its APIs for SHA256, Base64, RSA, x509, etc., and it works pretty well. In fact, it includes the Stanford lib as well.
If all you want to do is SHA256, jsrsasign might be overkill. But if you have other needs in the related area, I feel it's a good fit.
js-sha256 is an npm package you can use. Unlike the popular crypto.subtle, which only works in secure contexts (localhost/HTTPS), it works regardless. Of course, having a secure connection is still best. I was using crypto.subtle and it always worked because I ran my web app on localhost, but it failed the instant I tried it on a server. I had to switch to the js-sha256 npm package as a temporary solution until a secure connection could be configured.
ethers.js has a SHA256 (https://docs.ethers.io/v5/api/utils/hashing/)
const { ethers } = require('ethers');
ethers.utils.sha256(ethers.utils.toUtf8Bytes('txt'));
This may already be answered somewhere on the site, but if it is I couldn't find it. I also couldn't find an exact answer to my question (or at least couldn't make sense of how to implement a solution) based on any of the official Node.js documentation.
Question: Is it possible to customize the length (in bytes) of each disk write that occurs while piping the input of a readable stream into a file?
I will be uploading large files (~50 GB) and it's possible that there could be many clients doing so at the same time. In order to accomplish this I'll be slicing files on the client side and then uploading a chunk at a time. Ideally I want physical writes to disk on the server side to occur in 1 MB portions - but is this possible? And if so, how can it be implemented?
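For context, the client-side slicing I have in mind is roughly this (the endpoint URL, header name and chunk size are placeholders):

async function uploadInChunks(file, url, chunkSize = 1024 * 1024) {
  for (let offset = 0; offset < file.size; offset += chunkSize) {
    // slice() does not read the file; each chunk is only read when it is sent
    const chunk = file.slice(offset, offset + chunkSize);
    await fetch(url, {
      method: 'POST',
      headers: { 'X-Chunk-Offset': String(offset) },
      body: chunk
    });
  }
}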
You will probably use a WriteStream. While it is not documented in the fs API, any Writable takes a highWaterMark option that controls how much data it buffers internally before applying backpressure. See also the details on buffering in the stream docs.
So it's just
var writeToDisk = fs.createWriteStream(path, {highWaterMark: 1024*1024});
req.pipe(writeToDisk);
Disclaimer: I would not believe in cargo-cult "server-friendly" chunk sizes. I'd go with the default (which is 16kb), and when performance becomes a problem test other sizes to find the optimal value for the current setup.
After some comments by David, I've decided to revise my question. The original question can be found below as well as the newly revised question. I'm leaving the original question simply to have a history as to why this question was started.
Original Question (Setting LZMA properties for jslzma)
I've got some large json files I need to transfer with ajax. I'm currently using jQuery and $.getJSON(). I'd like to use the jslzma library to decompress the files upon receiving them. Currently, I'm using django with the pylzma library to compress the files.
The only problem is that there's a lack of documentation for the jslzma library. There is some, but not enough. So I have two questions about how to use the library.
It gives this as an example:
LZMA.decompress(properties, inStream, outStream, outSize);
I know how to set the inStream and outStream variables, but not the properties or the outSize. So can anyone give an example (or examples) of how to set the properties variable (i.e. what's expected) and how to calculate outSize?
Thanks.
Edit #1 (Revised Question)
I'm looking for a compression scheme that lends itself to highly repeatable data using python (django) and javascript.
The data being transferred contains elevation measurements. Each file has 1200x1200 data points, which equates to about 2.75 MB in its raw uncompressed binary form. JSON balloons it to between 5-6 MB. I've also looked into base64 (just to cover all the bases), which would reduce the size, but I haven't had any success reading it in js. I think the data lends itself to easy compression just because the values are so highly repeatable. For example, one file has only 83 unique elevation values to describe 1,440,000 data points.
I just haven't had much luck, mainly because I'm just starting to learn JavaScript.
So can anyone suggest a compression scheme for this type of data? The goal is to minimize the transfer time by reducing the size for the data.
Thanks.
For what it's worth, LZMA is typically very slow to compress as well as decompress; it is thus more common to use somewhat faster compression schemes. Standard GZIP (deflate) has a reasonably good balance: its compression ratio is acceptable, and its compression speed is MUCH better than that of LZMA or bzip2.
Also: most web servers and clients support automatic handling of gzip compression, which makes it even more convenient to use.
Decompression on the client side with JavaScript can take significantly longer and depends heavily on the available bandwidth of the client's box. Why not implement a simpler but faster and easier-to-write decompression scheme such as RLE, delta or Golomb coding? Or maybe you want to look into compressed JSONs?
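To illustrate the delta idea (just a sketch; it assumes the elevations arrive as one flat numeric array):

// store the first value, then only the difference to the previous value
function deltaEncode(values) {
  const out = [values[0]];
  for (let i = 1; i < values.length; i++) out.push(values[i] - values[i - 1]);
  return out;
}

function deltaDecode(deltas) {
  const out = [deltas[0]];
  for (let i = 1; i < deltas.length; i++) out.push(out[i - 1] + deltas[i]);
  return out;
}

With only 83 distinct elevation values the deltas are mostly zeros and small integers, which gzip then squeezes very effectively on top of this.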
Is there a standard library or tool out there for computing and applying differences to JSON documents? Basically I have a bunch of largish documents that I want to keep synchronized across a network, and I would prefer to avoid having to resend their entire state each time that I want to synchronize them (since many of these variables aren't going to change). In other words, I only want to transmit the fields which changed, not retransmit the entire object. I would think that it would be convenient to have something like the following set of methods:
//Start with two distinct objects on the server
// prev represents a copy of the state of the object on the client
// next represents a copy of the state of the object on the server
//
//1. Compute a patch
patch = computePatch(prev, next);
//2. Send patch over the network
//3. Apply the patch on the client
applyPatch(prev, patch);
//Final invariant:
// prev represents an equivalent object to JSON.parse(JSON.stringify(next))
I could certainly implement one myself, but there are quite a few edge cases that need to be considered. Here are some of the straightforward (though somewhat unsatisfactory) approaches I can think of:
Roll my own JSON patcher. Asymptotically, this is probably the best way to go, since it would be possible to support all the relevant features of JSON documents, along with some specialized methods for diffing ints, doubles and strings (using relative encoding/edit distance). However, JSON has a lot of special cases, and I am a bit leery of trying to do this without a lot of testing. I would much prefer to find something that already solves this problem so that I can trust it and not have to worry about network Heisenbugs showing up due to mistakes in my JSON patching.
Just compute the edit distance directly between the JSON strings using dynamic programming. Unfortunately, this doesn't work if the client and server have different JSON implementations (i.e. the order of their fields could be serialized differently), and it is also pretty expensive, being a quadratic-time operation.
Use protocol buffers. Protocol buffers have a built in diff method which does exactly what I want, and they are a nice binary-serializable network friendly format. Unfortunately, because they are also strictly typed, they lack many of the advantages of using JSON such as the ability to dynamically add and remove fields. Right now this is the approach I am currently leaning towards, but it could make future maintenance really horrible as I would need to continually update each of my objects.
Do something really nasty, like make a custom protocol for each type of object, and hope that I get it right in both places (yeah right!).
Of course what I am really hoping for is for someone here on stackoverflow to come through and save the day with a reference to a space efficient javascript object differ/patcher that has been well tested in production environments and across multiple browsers.
*Update*
I started writing my own patcher, an early version of it is available at github here:
https://github.com/mikolalysenko/patcher.js
I guess since there doesn't seem to be much out here, I will instead accept as an alternative answer a list of interesting test cases for a JSON patcher.
I've been maintaining a JSON diff & patch library on GitHub (yes, shameless plug):
https://github.com/benjamine/JsonDiffPatch
It handles long strings automatically using Neil Fraser's diff_match_patch lib.
It works both in browsers and on the server (unit tests run in both environments).
(The full feature list is on the project page.)
The only thing you would probably need that's not implemented is the option to inject custom diff/patch functions for specific objects, but that doesn't sound hard to add; you're welcome to fork it, and even better, send a pull request.
Regards,
The JSON-patch standard has been updated.
https://datatracker.ietf.org/doc/html/draft-ietf-appsawg-json-patch-10
You can find an implementation for applying patches and generating patches at https://github.com/Starcounter-Jack/Fast-JSON-Patch
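A minimal sketch of the round trip with that library (assuming the compare/applyPatch API described in its README; the sample objects are made up):

const jsonpatch = require('fast-json-patch');

const prev = { name: 'Ada', score: 1 };
const next = { name: 'Ada', score: 2, badges: ['gold'] };

// 1. compute a patch on the server
const patch = jsonpatch.compare(prev, next);

// 2. send `patch` over the network, then 3. apply it on the client
const updated = jsonpatch.applyPatch(prev, patch).newDocument;
// updated is now deep-equal to next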
I came across this question searching for implementations of json-patch. If you are rolling your own you might want to base it on this draft.
https://datatracker.ietf.org/doc/html/draft-pbryan-json-patch-00
Use JSON Patch, which is the standard way to do this.
JSON Patch is a format for describing changes to a JSON document. It can be used to avoid sending a whole document when only a part has changed. When used in combination with the HTTP PATCH method it allows partial updates for HTTP APIs in a standards compliant way.
The patch documents are themselves JSON documents.
JSON Patch is specified in RFC 6902 from the IETF.
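To give a feel for the format, a patch document is just a JSON array of operations (the paths and values here are made up):

[
  { "op": "replace", "path": "/title", "value": "Hello" },
  { "op": "add", "path": "/tags/-", "value": "news" },
  { "op": "remove", "path": "/draft" }
]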
Libraries exist for most platforms and programming languages.
At the time of writing, Javascript, Python, PHP, Ruby, Perl, C, Java, C#, Go, Haskell and Erlang are supported (full list and libraries here).
Here is a list for javascript
Fast-JSON-Patch both diffs and patches, 509,361 weekly downloads on NPM
jiff both diffs and patches, 5,075 weekly downloads on npm
jsonpatch-js only applies patches, 2,014 weekly downloads in npm
jsonpatch.js only applies patches, 1,470 weekly downloads in npm
JSON8 Patch both diffs and patches, 400 weekly downloads on npm
By far, everybody (myself included) is using the Fast-JSON-Patch library. It works in Node.js and in the browser.
The Quest
I'm trying to talk to a SRCDS Server from node.js via the RCON Protocol.
The RCON Protocol seems to be explained well enough; implementations can be found at the bottom of the site in every major programming language. Using those is simple enough, but understanding the protocol and developing a JS library is what I set out to do.
Background
Being a self-taught programmer, I skipped a lot of Computer Science basics - I learned only what I needed to accomplish what I wanted. I started coding with PHP, eventually wrapped my head around OO, talked to databases, etc. I'm currently programming with JavaScript, more specifically doing web stuff with node.js.
Binary Data?!?!
I've read and understood the absolute binary basics. But when it comes to the packet data I'm totally lost. I'd like to read and understand the wireshark output, but I can't make any sense of it. My biggest problem is probably that I don't understand what the binary representations of the various INTs and STRINGs (char ..) look like from JS, and how I convert the data I get from the server into something usable in the program.
Help
So I'd be more than grateful if someone can point me to a tutorial on these topics. Tutorial as in "explanation that mere mortals can understand, preferably not written by a C.S. professor". :)
When I'm looking at the PHP reference implementation I see (too much) magic happening there which I can't translate to JS. Sending and reading data from a socket is no problem, but I need to know how PHP's unpack function works and, correspondingly, how I can do the same in JS with node.js.
So I hope you can see what I'm trying to accomplish here. First and foremost is understanding the theory needed to make implementing the protocol a breeze. But because I'm only good with scripting languages, it would be incredibly helpful if someone could guide me a bit through the HOWTO part in PHP/JS.
Thank you so much for your time!
I applaud the low level protocol pursuit.
I'll tell you the path I took. My approach was to use the client and server that already spoke the protocol and use libpcap to do analysis. I created a library that was able to unpack the custom protocol I was analyzing during this phase.
It's super helpful to start with diagrams like the TCP header layout from the Wikipedia article on TCP. It's an incredibly useful way to visualize the structure of the binary data. It's tightly packed, so slicing it apart requires attention to detail.
Buffers and Binary
I read up on Buffer. It's the way you deal with binary in node: http://nodejs.org/docs/v0.4.8/api/buffers.html -- the first thing to realize here is that buffers can be accessed byte by byte via array syntax, i.e. buffer[0] and such.
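A tiny sketch of what that looks like (the byte values are arbitrary):

const buf = Buffer.from([0x00, 0x00, 0x00, 0x2a, 0x68, 0x69]);
console.log(buf[3]);                          // 42 -- one byte via array syntax
console.log(buf.readInt32BE(0));              // 42 -- 4 bytes as a big-endian int
console.log(buf.slice(4).toString('ascii'));  // 'hi' -- remaining bytes as text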
Visualization
It's helpful to be able to dump your binary data into a hex representation. I used https://github.com/a2800276/hexy.js to achieve this.
node_pcap
I grabbed https://github.com/mranney/node_pcap -- this is the equivalent to wireshark, but you can programmatically poke at all outgoing and incoming traffic. I added udp payload support: https://github.com/jmoyers/node_pcap/commit/2852a8123486339aa495ede524427f6e5302326d
I read through all mranney's "unpack" code https://github.com/mranney/node_pcap/blob/master/pcap.js#L116-171
I found https://github.com/rmustacc/node-ctype
I read through all their "unpack" code https://github.com/rmustacc/node-ctype/blob/master/ctio.js
Now, things to remember when you're looking through this stuff. Most of the time they're taking a binary Buffer representation and converting to a native javascript type, like say Number or String. They'll use advanced techniques to do so -- bitwise operations like shifts and such. You don't necessarily need to understand all that.
The key things are:
1) Endianness -- the ordering of bytes (network and host byte order can be the reverse of each other), as this determines how things are unpacked.
2) JavaScript Number representation is quirky -- node-ctype goes into detail in its comments about how the various number types are converted into JavaScript's Number. Integer, float, double, etc. are all Number in JavaScript land.
In the end, it's likely fine if you just USE these unpackers for your adventures. I ended up having to unpack things that weren't covered in these libraries, like GUIDs and such, and it was tremendously helpful to study the source.
Isolate the traffic you're looking at
Filter, filter, filter. Target one host. Target one direction. Target one message type. Focus on stripping off the data that has a known fixed length first -- oftentimes the header in a protocol is a good place to start. Once you get the header unpacking from binary into a nice JSON structure, you are well on your way.
After that, it's one field at a time, top to bottom, one message at a time. You can use Buffer#slice and the unpack functions from node-ctype to grab each piece of data at a time.
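As a concrete (hedged) example: my reading of the published Source RCON packet layout is a 32-bit little-endian size, id and type, followed by a null-terminated ASCII body. A header unpacker along those lines, using only Buffer methods, could look like this -- double-check the field order against the protocol docs before relying on it:

function unpackRconPacket(buf) {
  const size = buf.readInt32LE(0);   // packet size, excluding this field
  const id = buf.readInt32LE(4);     // request id, echoed back by the server
  const type = buf.readInt32LE(8);   // packet type (auth, exec command, ...)
  // body is a null-terminated ASCII string starting at offset 12
  let bodyEnd = 12;
  while (bodyEnd < buf.length && buf[bodyEnd] !== 0) bodyEnd++;
  const body = buf.slice(12, bodyEnd).toString('ascii');
  return { size, id, type, body };
}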