JavaScript String.split on a string literal to produce an array

JavaScript String.split on a string literal to produce an array - javascript

I've seen a few javascript programmers use this pattern to produce an array:
"test,one,two,three".split(','); // => ["test", "one", "two", "three"]
They're not splitting user input or some variable holding a string value, they're splitting a hard-coded string literal to produce an array. In all of the cases I've seen a line like the above it would seem that it's perfectly reasonable to just use an array literal without relying on split to create an array from a string. Are there any reasons that the above pattern for creating an array makes sense, or is somehow more efficient than simply using an array literal?

When splitting a string at run-time instead of using an array literal, you are trading a small amount of execution time for a small amount of bandwidth.
In most cases I would argue that it is not worth it. If you are minifying and gzipping your code before publishing it, as you should be, using a single comma inside of a string versus a quote-comma-quote from two strings in an array would have little to no impact on bandwidth savings. In fact after minification and gzipping, the version using the split string could possibly be longer due to the addition of the less compressible .split(',').
Splitting a string instead of creating an array literal of strings does mean a little less typing, but we spend more time reading code than writing it. Using the array literal would be more maintainable in the future. If you wanted to add a comma to an item in the array, you just add it as another string in the array literal; using split you would have to re-write the whole string using a different separator.
The only situation that I use split and a string literal to create an array is if I need an array that consists only of single characters, i.e. the alphabet, numbers or alphanumeric characters, etc.
var letters = 'abcdefghijklmnopqrstuvwxyz'.split(''),
numbers = '0123456789'.split(''),
alphanumeric = letters.concat(numbers);
You'll notice that I used concat to create alphanumeric. If I had instead copy-pasted the contents of letters and numbers into one long string this code would compress better. The reason I did not is that that would be a micro-optimization that would hurt future maintainability. If in the future characters with accents, tildes or umlauts need to be added to the list of letters, they can be added in one place; no need to remember to copy-paste them into alphanumeric too.
Splitting a string may be useful for code golf, but in a production environment where minification and gzipping are factors and writing easily readable, maintainable code is important, just using an array literal is almost always the better choice.

For example, in ruby, the array ["test", "one", "two", "three"]
could also wrote as %w(test one two three), which save you some characters to type.
But javascript doesn't support such notation, so someone use split method to achieve it.

If a large number a arrays are built manually. It could potentially reduce the load time of your page since less characters are transmitted. But you would need a large number of arrays or large arrays to have a considerable difference. Performance an array creation might be slower but it is faster to type. For large arrays I use a spreadsheet to apply the formating around each value with a formula like this ="'"&A1&"',". I stick with the array literal.

It makes more sense to use the "split" method when you can't control the output, ie. user input or string output from a method. If you're trying to get a specific value in the output that is separated by something it is easier to use the split method. Of course, if you're controlling the values it doesn't always make sense.

Related

What is the safest/most reliable separator/delimiter to join string in javascript?

I have included special characters like #, |,~ but they also appeared as data in some values which breaks/fails my idea of joining and splitting values later in the code.

There isn't any "safest" and "most reliable" separator to join string in any languages, not just JavaScript.
It depends entirely on your dataset, meaning the "safest" choice will be different for every different set of data.
For example, if your dataset is guaranteed to contain only integers, then any non-numeric characters can be the safest choice.
However, if your dataset is a free text, then there will be no "safest" choice, because even if you choose an arbitrary combination of string as the separator, i.e. %%%, an end-user can still supply that data in a legit sentence like My preferred pronoun is "%%%", albeit highly unlikely. Thus using %%% as separator here would still break your logic.
Because of this, you can only choose a separator that gives you the least risk.
Depending on your use case, there probably are other simpler solutions that does not require separators.

Generally we avoid joining strings if you need to separate them again later, JSON notation from serializing the data is usually a good compromise and has best interoperability.
CSV can work well too, but don't just insert commas, make sure you properly escape the values if they need it.
If JSON or CSV isn't appropriate, then using a sequence of special characters is more likely to be unique, you could use || (double pipe) as that is very unlikely to occur in anything except C based code.
You could use other special characters but I would avoid $ or % as these are commonly used in replacement tokens. Also avoid any form of brackets as they are used for other container based replacement.
A 3 character code using multiple symbols is more unique again |:| just pick something that visually looks like a barrier between values and can't be confused with tokens.

JavaScript String split: fixed-width vs. delimited performance

I am building a string to be parsed into an array by JavaScript. I can make it delimited or I can make the fields fixed-width. To test it, I built this jsperf test using a data string where the fields are both fixed-width and comma-delimited:
https://jsperf.com/string-split-fixed
I have only tested on Windows with Firefox and Chrome, so please run the test from other OSes and browsers. My two test results are clear: String.prototype.split() is the winner by a large margin.
Is my fixed-width code not efficient enough, or is the built-in string split function simply superior? Is there a way to code it so that the fixed-width parsing triumphs? If this was C/C++, the fixed-width code, written properly, would be the clear winner. But I know JavaScript is an entirely different beast.

String.prototype.split() is a built-in JavaScript function. Expect it to be highly optimized for the particular JS engine and be written not in JavaScript but in C++.
It should thus not come a surprise that you can't match its performance with pure JavaScript code.
String operations like splitting a delimited string are inherently memory-bound. Hence, knowing the location of delimiters doesn't really help much, since the entire string still needs to be traversed at least once (to copy the delimited fragments). Fixed-position splitting might be faster for strings that exceed D-cache size, but your string is just 13KB long, so traversing it multiple times isn't going to matter.

'indexOf' method performance optimization

According to Is JavaScript a pass-by-reference or pass-by-value language? JavaScript passes string objects by value. Because of that, calling the indexOf method will trigger copying the content.
In my case, I am parsing a big string to look for data. I heavily use indexOf to find data. String can big as long as 100-200KB and I could need to call indexOf up to 1000 times per full scan.
I'm afraid this will cause polluting 'memory' with unnecessarily copied string and could impact performance.
Am I correct in my conclusion? If so, what is the proper way to deal with my task?
Potentially, I could use regular expressions, but at the moment that looks too complex.

Strings are a strange beast in Javascript, they seem to inhabit a middle ground between primitive types and objects. While technically they're considered primitive, they can in many circumstances be treated as if they were a reference, due to their immutability.
Given that they are immutable, I'd be very surprised if a copy of the string was passed to any function, since that would be both hideously expensive and totally unnecessary.
A seemingly easy way to find out would be to pass a string to a function and change one of its characters to see if that's reflected in the original string on return. However, as mentioned, the immutability of strings makes that impossible.
It may be possible to test this in an indirect manner by executing indexOf("a") against one of two string in a loop of many million times.
The string to search would either be "a" or "a very long string containing many thousands of characters ...".
If the strings are passed by reference, there should be no noticeable difference in the times. Passing by value should be detectable since you have to copy the string millions of times and a short string should copy faster than a long one.
But, as I said, it's probably unnecessary since the engine will most likely be as optimised as possible.

.indexOf() is merely a search of the string and just returns a numeric index into the string. There is no need to make a copy and no copy is made. This doesn't really have anything to do with value/reference in Javascript. The operation is merely a search that returns an index. There is simply no need to make a copy.
Strings in Javascript are immutable. That means they can never be changed and these indexes into the string always point at the same place in the string. Any operation that operates on a string to make a change, returns a new string leaving the old one.
This allows for some interesting optimizations in the implementation. Because a string is immutable, it can be shared among all points of code that have a reference to it. Whenever anyone calls a function to modify the string, it simply returns a new string object created from the old one plus the modification.
If you were to use the index from .indexOf() with .slice() or something like that, then you would be copying part of the original string into a new string object (likely using some additional memory).
If you want to test this for yourself, feel free to run as many .indexOf() operations as you want on a large string and watch memory usage.

Can we compare two JavaScript files using C#?

I have tried comparing two text files. If these contain the same data but there is a difference of even one space the result is showing as ‘different’.
Can anyone tell me how to compare two JavaScript files using C#?

Since JavaScript is whitespace tolerant (tolerates any amount of whitespace as long as the syntax is correct), the simplest thing to do if you want to compare everything but the whitespace is to regex-replace:
Regex _r = new Regex(#"\s+", RegexOptions.Compiled);
string result = _r.Replace(value, " ");
Run this on both files and compare the results; it replaces any sequence of standard whitespace characters (space, tab, carriage return, vertical tab etc.) with a single space. You can then compare with Equals (case sensitive or not, as you require).
Of course, whitespace IS significant inside strings, so this assumes the string handling in all the compared files does not rely on whitespace too much.
However two very different code files can have the same effects, so if that's what you're after you have a hard job ahead of you.

Do you just need to know if they are exactly the same? If so you could just load them into memory and compare the .length() property...

Technically, if one file contains an extra space they aren't "the same". I would first compare the lengths and if those don't match you'll need to do a byte by byte comparison. If you want to remove extra spaces you'll probably want to do something like a Trim() on the contents of both files first.
Here's a link to an old MS post describing how to create a file compare function:
http://support.microsoft.com/kb/320348

Javascript client-data compression

I am trying to develop a paint brush application thru processingjs.
This API has function loadPixels() that will load the RGB values in to the array.
Now i want to store the array in the server db.
The problem is the size of the array, when i convert to a string the size is 5 MB.
Is the best solution is to do compression at javascript level? How to do it?

See http://rosettacode.org/wiki/LZW_compression#JavaScript for an LZW compression example. It works best on longer strings with repeated patterns.
From the Wikipedia article on LZW:
A dictionary is initialized to contain
the single-character strings
corresponding to all the possible
input characters (and nothing else
except the clear and stop codes if
they're being used). The algorithm
works by scanning through the input
string for successively longer
substrings until it finds one that is
not in the dictionary. When such a
string is found, the index for the
string less the last character (i.e.,
the longest substring that is in the
dictionary) is retrieved from the
dictionary and sent to output, and the
new string (including the last
character) is added to the dictionary
with the next available code. The last
input character is then used as the
next starting point to scan for
substrings.
In this way, successively longer
strings are registered in the
dictionary and made available for
subsequent encoding as single output
values. The algorithm works best on
data with repeated patterns, so the
initial parts of a message will see
little compression. As the message
grows, however, the compression ratio
tends asymptotically to the
maximum.

JavaScript implementation of Gzip has a couple answers that are relevant.
Also, Javascript LZW and Huffman Coding with PHP and JavaScript are other implementations I found.

Develop Reference

JavaScript is the programming language of the Web.