Is there a way to use non-octal literals in a string? - javascript

I have to filter out characters in a form. I have therefore implemented a filtering algorithm that works quite well and uses different filters (variables) depending on the context; I also make extensive use of accented letters.
Example:
gFilterALPHA1="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'-–àâäéèêëîïôöùüûÀÂÄÉÈÊËÎÏÔÖÙÛÜæÆœŒçÇ ";
Strangely enough, letters such as é (e acute) or è (e grave) are taken into account (seen as such), while others such as à (a grave) are not. I found that a solution is to use octal literals, for instance \340 or \371 for a grave and u grave respectively.
Q1. Any clue about why é (e acute) is successfully parsed straight away while other accented letters are not?
Q2. Since writing a long string of octal literals is both cumbersome and error-prone when one wants to check or add values, does anyone have a better idea or know of a workaround?
Thanks.
OK, here is the code that thg435 thought it would be useful to take a look at.
function jFiltre_Champ(event, NomDuFiltre)
{
    var LeChamp = event.target.value; // value est de type ARRAY
    switch (NomDuFiltre)
    {
        case "NUM1":
            LeFiltre = gFiltreNUM1;
            Msg = gMessageNUM1;
            break;
        case "ALPHA1":
            LeFiltre = gFiltreALPHA1;
            Msg = gMessageALPHA1;
            break;
        case "DATE1":
            LeFiltre = gFiltreDATE1;
            Msg = gMessageDATE1;
            break;
        case "ALPHANUM1":
            LeFiltre = gFiltreALPHANUM1;
            Msg = gMessageALPHANUM1;
            break;
        case "ALPHANUM2":
            LeFiltre = gFiltreALPHANUM2;
            Msg = gMessageALPHANUM2;
            break;
    }
    Longueur = LeFiltre.length;
    for (i = 0; i < LeChamp.length; i++)
    {
        leCar = LeChamp.charAt(i);
        for (j = 0; j < Longueur; j++)
        {
            if (leCar == LeFiltre.charAt(j)) break;
        }
        if (j == Longueur)
        {
            alert(Msg);
            /* Cf doc. pour l'algorithme de la méthode slice */
            document.getElementById(event.target.id).value = event.target.value.slice(0, i);
            break;
        }
    }
}
Here is an English-style version (regarding (2)):
function jform_input_filter(event, filterName)
{
    var current_input = event.target.value; // the value is an array
    switch (filterName)
    {
        case "NUM1":
            current_filter = gFilterNUM1;
            Msg = gMessageNUM1;
            break;
        case "ALPHA1":
            current_filter = gFilterALPHA1;
            Msg = gMessageALPHA1;
            break;
        case "DATE1":
            current_filter = gFilterDATE1;
            Msg = gMessageDATE1;
            break;
        case "ALPHANUM1":
            current_filter = gFilterALPHANUM1;
            Msg = gMessageALPHANUM1;
            break;
        case "ALPHANUM2":
            current_filter = gFilterALPHANUM2;
            Msg = gMessageALPHANUM2;
            break;
    }
    length = current_filter.length;
    for (i = 0; i < current_input.length; i++)
    {
        leCar = current_input.charAt(i);
        for (j = 0; j < length; j++)
        {
            if (leCar == current_filter.charAt(j)) break;
        }
        if (j == length)
        {
            alert(Msg);
            /* see the documentation for the algorithm of the slice method */
            document.getElementById(event.target.id).value = event.target.value.slice(0, i);
            break;
        }
    }
}
Comments:
Personally, I don't think this code is needed to answer the original question;
variables and comments are in French, which may make it difficult to read for some - sorry about that;
this function is attached to an 'onchange' event within an HTML form;
the 'g' variables (e.g. gFiltreALPHANUM2) are broad-scope (global) values defined elsewhere in the same .js file so that they are accessible to the function.

Bergi is probably right: your file is most likely saved or delivered with the wrong encoding. Consider UTF-8, a well-supported encoding for the Unicode character set. To test this idea, you can temporarily adjust your script to output the a with grave accent into the page, whether in a field or as a text node. Use the verbatim character in your string literal, not its octal escape code. If it comes out garbled, then the character didn't make it in its pristine form into the browser and you've got an encoding problem.
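For instance, a quick probe along those lines (a minimal sketch; the characters are simply taken from the filter string in the question) could be:
// temporarily add this to the script: output the verbatim characters, no escapes
var probe = "àâäéèêëîïôöùüû";
document.body.appendChild(document.createTextNode(probe));
console.log(probe, probe.length, probe.charCodeAt(0)); // à should log character code 224
If the text node shows mojibake, or the first character code is not 224, the string was mangled before the script ever ran.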
If the encoding problem is confirmed, you'll need to save your file correctly, or adjust the response character encoding, which depends on your particular web server. You can find the current encoding as delivered by your web server by using Fiddler and inspecting the Content-Type response header. If the web server already thinks your file is in the right encoding (preferably, as indicated, UTF-8), then check your text editor to make sure it saves the JavaScript file in the same exact encoding.
I'm writing this as an answer because I don't think I can comment directly on the question.

How can I compare "M" and "Ｍ" (in UTF) using Javascript?

I have a situation where I have to search a grid to see if it contains a certain substring. I have a search bar where the user can type the string. The problem is that the grid contains a mix of Japanese text and Unicode characters,
for example: ＭＡＧシンチ注 ３３３ＭＢｑ .
How can I compare for content equality the letter 'M' that I type on the keyboard and the letter 'Ｍ' as in the example above? I am trying to do this using plain Javascript, not jQuery or another library. And I have to do this in Internet Explorer.
Thanks,
As mentioned in an insightful comment from @Rhymoid on the question, modern JavaScript (ES2015) includes support for normalization of Unicode. One mode of normalization is to map "compatible" letterforms from higher code pages down to their most basic representatives in lower code pages (to summarize; it's kind of involved). The .normalize("NFKD") method will map the full-width "Ｍ" from the Japanese code page down to the Latin equivalent. Thus
"ＭＡＧシンチ注 ３３３ＭＢｑ".normalize("NFKD")
will give
"MAGシンチ注 333MBq"
As of late 2016, .normalize() isn't supported by IE.
At a lower level, ES2015 also has .codePointAt() (mentioned in another good answer), which is like the older .charCodeAt() described below but which also understands UTF-16 pairs. However, .codePointAt() is (again, late 2016) not supported by Safari.
Below is the original answer, for older browsers.
You can use the .charCodeAt() method to examine the UTF-16 character codes in the string.
"M".charCodeAt(0)
is 77, while
"M".charCodeAt(0)
is 65325.
This approach is complicated by the fact that for some Unicode characters, the UTF-16 representation involves two separate character positions in the JavaScript string. The language does not provide native support for dealing with that, so you have to do it yourself. A character code between 55296 and 56319 (D800 and DBFF hex) indicates the first half of such a two-character pair, and the second half lies between 56320 and 57343 (DC00 and DFFF hex). The UTF-16 Wikipedia page has more information, and there are various other sources.
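A rough sketch of that manual pairing, assuming only charCodeAt() is available, might look like this:
function codePoints(str) {
    var points = [];
    for (var i = 0; i < str.length; i++) {
        var code = str.charCodeAt(i);
        // a high surrogate followed by a low surrogate combines into one code point
        if (code >= 0xD800 && code <= 0xDBFF && i + 1 < str.length) {
            var low = str.charCodeAt(i + 1);
            if (low >= 0xDC00 && low <= 0xDFFF) {
                points.push((code - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000);
                i++;
                continue;
            }
        }
        points.push(code);
    }
    return points;
}
codePoints("Ｍ"); // [65325], a BMP character, so no pairing was needed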
Building a dictionary should work in any browser: find the charCodes at the start of the ranges you want to transform, then remap the characters in your favourite way. For example:
function shift65248(str) {
    var dict = {}, characters = [],
        character, i;
    for (i = 0; i < 10; ++i) { // ０-９ to 0-9
        character = String.fromCharCode(65296 + i);
        dict[character] = String.fromCharCode(48 + i);
        characters.push(character);
    }
    for (i = 0; i < 26; ++i) { // Ａ-Ｚ to A-Z
        character = String.fromCharCode(65313 + i);
        dict[character] = String.fromCharCode(65 + i);
        characters.push(character);
    }
    for (i = 0; i < 26; ++i) { // ａ-ｚ to a-z
        character = String.fromCharCode(65345 + i);
        dict[character] = String.fromCharCode(97 + i);
        characters.push(character);
    }
    return str.replace(
        new RegExp(characters.join('|'), 'g'),
        function (m) { return dict[m]; }
    );
}
shift65248('ＭＡＧシンチ注 ３３３ＭＢｑ'); // "MAGシンチ注 333MBq"
I tried just moving the whole range 65248..65375 onto 0..127 but it conflicted with the other characters :(
I am assuming that you have access to those strings, by reading the DOM or some other way.
If so, codePointAt will be your friend.
console.log("Test of values");
console.log("M".codePointAt(0));
console.log("M".codePointAt(0));
console.log("Determining end of string");
console.log("M".codePointAt(10));
var str = "MAGシンチ注 333MBq .";
var idx = 0;
do {
point = str.codePointAt(idx);
idx++;
console.log(point);
} while(point !== undefined);
You could try building your own dictionary and compare function as follows:
var compareDB = {
    'm' : ['Ｍ'],
    'b' : ['Ｂ']
};
function doCompare(inputChar, searchText) {
    var inputCharLower = inputChar.toLowerCase();
    var searchTextLower = searchText.toLowerCase();
    if (searchTextLower.indexOf(inputCharLower) > -1)
        return true;
    if (compareDB[inputCharLower] !== undefined)
    {
        for (var i = 0; i < compareDB[inputCharLower].length; i++) {
            if (searchTextLower.indexOf(compareDB[inputCharLower][i].toLowerCase()) > -1)
                return true;
        }
    }
    return false;
}
console.log("searching with m");
console.log(doCompare('m', "searching text with Ｍ"));
console.log("searching with m");
console.log(doCompare('m', "searching text with Ｂ"));
console.log("searching with B");
console.log(doCompare('B', "searching text with Ｂ"));

passing values into functions in javascript

Not sure why the code below is not working. It should take in a string and convert a G to a C and an A to a T, and vice versa. However, it collects the input string but doesn't provide any output, i.e. the alert just says "here is your reverse complemented DNA:".
var dnaSequence = prompt("Enter your DNA sequence here", "");
var newSequence = reverseComplement(dnaSequence);
alert("here is your reverse complemented DNA: " + newSequence);

function reverseComplement(dnaString) {
    var reverseC = [];
    var dnaArr = dnaString.split('');
    for (var i = 0; i < dnaArr.length; i++) {
        switch (dnaArr[i]) {
            case 'A':
                reverseC.push('T');
                break;
            case 'T':
                reverseC.push('A');
                break;
            case 'C':
                reverseC.push('G');
                break;
            case 'G':
                reverseC.push('C');
                break;
        }
    }
    // Reverse and rejoin the string
    return reverseC.reverse().join('');
}
It should take in a string and convert a G to a C and an A to a T and vice versa.
Then you don't need the reverse(), because you are pushing in order.
Also, make sure that you are entering uppercase letters into the prompt.
Otherwise, you can force uppercase.
This is the code with the two fixes:
function reverseComplement(dnaString) {
    var reverseC = [];
    var dnaArr = dnaString.toUpperCase().split('');
    for (var i = 0; i < dnaArr.length; i++) {
        switch (dnaArr[i]) {
            case 'A':
                reverseC.push('T');
                break;
            case 'T':
                reverseC.push('A');
                break;
            case 'C':
                reverseC.push('G');
                break;
            case 'G':
                reverseC.push('C');
                break;
        }
    }
    // Rejoin the string (no reverse() needed, since we push in order)
    return reverseC.join('');
}
var dnaSequence = prompt("Enter your DNA sequence here", "");
var newSequence = reverseComplement(dnaSequence);
alert("here is your reverse complemented DNA: " + newSequence);
The main lesson you need here is how to test and debug your JavaScript code.
First, get familiar with the JavaScript debugger in your browser. Instead of wondering why your code doesn't work, you can see directly what it is doing. Every modern browser has built-in JavaScript debugging tools; for example here is an introduction to the Chrome DevTools.
Second, when you are testing a function like this, don't use prompt() or alert(). Instead, provide a hard-coded input string and use console.log() to display the output in the JavaScript debug console. This way you can run the same test case repeatedly. After you get one test case to work, you can add others.
There are several JavaScript testing frameworks if you want to get fancy, but to start with, simply using a hard-coded input and console.log() output, plus inspection in the JavaScript debugger, is fine.
To make it easy to debug a function when you first write it, add a debugger; statement at the beginning. Then it will stop in the debugger and you can single-step through the code to see which parts of your function actually get executed and what all your variable values are at each step of the way.
For example (since it sounds like you were mistakenly testing with lowercase input), you might do this:
var dnaSequence = 'actg';
var newSequence = reverseComplement(dnaSequence);
console.log(newSequence);

function reverseComplement(dnaString) {
    debugger;
    var reverseC = [];
    var dnaArr = dnaString.split('');
    for (var i = 0; i < dnaArr.length; i++) {
        switch (dnaArr[i]) {
            case 'A':
                reverseC.push('T');
                break;
            case 'T':
                reverseC.push('A');
                break;
            case 'C':
                reverseC.push('G');
                break;
            case 'G':
                reverseC.push('C');
                break;
        }
    }
    // Reverse and rejoin the string
    return reverseC.reverse().join('');
}
Now, if you have the DevTools open, it will stop in the debugger at the first line of your function. You can single-step through the function to see which of the case statements it actually goes to, and you will see that it doesn't go to any of them. You can also look at the value of dnaArr[i] and see whether it matches any of the case values.

How to compress URL parameters

Say I have a single-page application that uses a third party API for content. The app’s logic is in-browser only; there is no backend I can write to.
To allow deep-linking into the state of the app, I use pushState() to keep track of a few variables that determine the state of the app. (Note that Ubersicht’s public version doesn’t do this yet.)
Variables: repos, labels, milestones, username, show_open (bool), with_comments (bool), and without_comments (bool).
URL format: ?label=label_1,label_2,label_3&repos=repo_1….
Values: the usual suspects. Roughly, [a-zA-Z][a-zA-Z0-9_-], or any boolean indicator.
So far so good.
Now, since the query string can be a bit long and unwieldy and I would like to be able to pass around URLs like http://espy.github.io/ubersicht/?state=SOMOPAQUETOKENTHATLOSSLESSLYDECOMPRESSESINTOTHEORIGINALVALUES#hoodiehq, the shorter the better.
My first attempt was going to be using some zlib-like algorithm for this. Then @flipzagging pointed to antirez/smaz, which looks more suitable for short strings. (JavaScript version here.)
Since = and & are not specifically handled in the Javascript version (see line 9 of the main lib file), we might be able to tweak things a little there.
Furthermore, there is an option for encoding the values in a fixed table. With this option, the order of arguments is pre-defined and all we need to keep track of is the actual value. Example: turn a=hamster&b=cat into 7hamster3cat (length+chars) or hamster|cat (value + |), potentially before the smaz compression.
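A bare-bones sketch of that fixed-table idea (the field list and the | separator here are assumptions, not the app's actual names):
var FIELDS = ['repos', 'labels', 'milestones', 'username']; // fixed order, shared by encoder and decoder
function packParams(params) {
    // naive version: the values themselves must not contain '|'
    return FIELDS.map(function (f) { return params[f] || ''; }).join('|');
}
function unpackParams(packed) {
    var out = {}, parts = packed.split('|');
    FIELDS.forEach(function (f, i) { out[f] = parts[i] || ''; });
    return out;
}
packParams({repos: 'hoodie.js', username: 'espy'}); // "hoodie.js|||espy"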
Is there anything else I should be looking for?
A working solution putting various bits of good (or so I think) ideas together
I did this for fun, mainly because it gave me an opportunity to implement a Huffman encoder in PHP and I could not find a satisfactory existing implementation.
However, this might save you some time if you plan to explore a similar path.
Burrows-Wheeler+move-to-front+Huffman transform
I'm not quite sure BWT would be best suited for your kind of input.
This is no regular text, so recurring patterns would probably not occur as often as in source code or plain English.
Besides, a dynamic Huffman code would have to be passed along with the encoded data which, for very short input strings, would harm the compression gain badly.
I might well be wrong, in which case I would gladly see someone prove me wrong.
Anyway, I decided to try another approach.
General principle
1) define a structure for your URL parameters and strip the constant part
for instance, starting from:
repos=aaa,bbb,ccc&
labels=ddd,eee,fff&
milestones=ggg,hhh,iii&
username=kkk&
show_open=0&
show_closed=1&
show_commented=1&
show_uncommented=0
extract:
aaa,bbb,ccc|ddd,eee,fff|ggg,hhh,iii|kkk|0110
where , and | act as string and/or field terminators, while boolean values don't need any.
2) define a static repartition of symbols based on the expected average input and derive a static Huffman code
Since transmitting a dynamic table would take more space than your initial string, I think the only way to achieve any compression at all is to have a static Huffman table.
However, you can use the structure of your data to your advantage to compute reasonable probabilities.
You can start with the repartition of letters in English or other languages and throw in a certain percentage of numbers and other punctuation signs.
Testing with a dynamic Huffman coding, I saw compression rates of 30 to 50%.
This means with a static table you can expect maybe a .6 compression factor (reducing the length of your data by 1/3), not much more.
3) convert this binary Huffman code into something a URI can handle
The 70 regular ASCII 7-bit chars in that list
!'()*-.0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz
would give you an expansion factor of about 30%, practically no better than a base64 encode.
A 30% expansion would ruin the gain from a static Huffman compression, so this is hardly an option!
However, since you control the encoding client and server side, you can use about anything that is not an URI reserved character.
An interesting possibility would be to complete the above set up to 256 with whatever Unicode glyphs, which would allow you to encode your binary data with the same number of URI-compliant characters, thus replacing a painful and slow bunch of long integer divisions with a lightning-fast table lookup.
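In JavaScript terms, that lookup could be as simple as the following sketch (the code-point range used here is an arbitrary assumption; any 256 assigned, URI-legal glyphs would do):
// build the 256-entry glyph table once, here from the Latin Extended blocks
var GLYPHS = [];
for (var i = 0; i < 256; i++) GLYPHS.push(String.fromCharCode(0x100 + i));

function bytesToUri(bytes) {   // bytes: array of values 0..255
    return bytes.map(function (b) { return GLYPHS[b]; }).join('');
}
function uriToBytes(str) {
    var out = [];
    for (var i = 0; i < str.length; i++) out.push(str.charCodeAt(i) - 0x100);
    return out;
}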
Structure description
The codec is meant to be used both client and server side, so it is essential that server and clients share a common data structure definition.
Since the interface is likely to evolve, it seems wise to store a version number for upward compatibility.
The interface definition will use a very minimalistic description language, like so:
v 1 // version number (between 0 and 63)
a en // alphabet used (English)
o 10 // 10% of digits and other punctuation characters
f 1 // 1% of uncompressed "foreign" characters
s 15:3 repos // list of 3 expected strings of average length 15
s 10:3 labels
s 8:3 milestones
s 10 username // single string of average length 10
b show_open // boolean value
b show_closed
b show_commented
b show_uncommented
Each language supported will have a frequency table for all its used letters
digits and other computerish symbols like -, . or _ will have a global frequency, regardless of languages
separators (, and |) frequencies will be computed according to the number of lists and fields present in the structure.
All other "foreign" characters will be escaped with a specific code and encoded as plain UTF-8.
Implementation
The bidirectional conversion path is as follows:
list of fields <-> UTF-8 data stream <-> huffman codes <-> URI
Here is the main codec
include ('class.huffman.codec.php');
class IRI_prm_codec
{
// available characters for IRI translation
static private $translator = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyzÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöùúûüýþÿĀāĂ㥹ĆćĈĉĊċČčĎďĐđĒēĔĕĖėĘęĚěĜĝĞğĠġĢģĤĥĦħĨĩĪīĬĭĮįİıIJijĴĵĶķĸĹĺĻļĽľĿŀŁłŃńŅņŇňʼnŊŋŌōŎŏŐőŒœŔŕŖŗŘřŚśŜŝŞşŠšŢţŤťŦŧŨũŪūŬŭŮůŰűŲųŴŵŶŷŸŹźŻżŽžſƀƁƂƃƄƅ";
const VERSION_LEN = 6; // version number between 0 and 63
// ========================================================================
// constructs an encoder
// ========================================================================
public function __construct ($config)
{
$num_record_terminators = 0;
$num_record_separators = 0;
$num_text_sym = 0;
// parse config file
$lines = file($config, FILE_IGNORE_NEW_LINES|FILE_SKIP_EMPTY_LINES);
foreach ($lines as $line)
{
list ($code, $val) = preg_split('/\s+/', $line, 2);
switch ($code)
{
case 'v': $this->version = intval($val); break;
case 'a': $alphabet = $val; break;
case 'o': $percent_others = $val; break;
case 'f': $percent_foreign = $val; break;
case 'b':
$this->type[$val] = 'b';
break;
case 's':
list ($val, $field) = preg_split('/\s+/u', $val, 2);
list ($len,$num) = explode (':', $val); // "length:count" from the config line
if (!$num) $num=1;
$this->type[$field] = 's';
$num_record_terminators++;
$num_record_separators+=$num-1;
$num_text_sym += $num*$len;
break;
default: throw new Exception ("Invalid config parameter $code");
}
}
// compute symbol frequencies
$total = $num_record_terminators + $num_record_separators + $num_text_sym + 1;
$num_chars = $num_text_sym * (100-($percent_others+$percent_foreign))/100;
$num_sym = $num_text_sym * $percent_others/100;
$num_foreign = $num_text_sym * $percent_foreign/100;
$this->get_frequencies ($alphabet, $num_chars/$total);
$this->set_frequencies (" .-_0123456789", $num_sym/$total);
$this->set_frequencies ("|", $num_record_terminators/$total);
$this->set_frequencies (",", $num_record_separators/$total);
$this->set_frequencies ("\1", $num_foreign/$total);
$this->set_frequencies ("\0", 1/$total);
// create Huffman codec
$this->huffman = new Huffman_codec();
$this->huffman->make_code ($this->frequency);
}
// ------------------------------------------------------------------------
// grab letter frequencies for a given language
// ------------------------------------------------------------------------
private function get_frequencies ($lang, $coef)
{
$coef /= 100;
$frequs = file("$lang.dat", FILE_IGNORE_NEW_LINES|FILE_SKIP_EMPTY_LINES);
foreach ($frequs as $line)
{
$vals = explode (" ", $line);
$this->frequency[$vals[0]] = floatval ($vals[1]) * $coef;
}
}
// ------------------------------------------------------------------------
// set a given frequency for a group of symbols
// ------------------------------------------------------------------------
private function set_frequencies ($symbols, $coef)
{
$coef /= strlen ($symbols);
for ($i = 0 ; $i != strlen($symbols) ; $i++) $this->frequency[$symbols[$i]] = $coef;
}
// ========================================================================
// encodes a parameter block
// ========================================================================
public function encode($input)
{
// get back input values
$bools = '';
foreach (get_object_vars($input) as $prop => $val)
{
if (!isset ($this->type[$prop])) throw new Exception ("unknown property $prop");
switch ($this->type[$prop])
{
case 'b': $bools .= $val ? '1' : '0'; break;
case 's': $strings[] = $val; break;
default: throw new Exception ("Uh oh... type ".$this->type[$prop]." not handled ?!?");
}
}
// set version number and boolean values in front
$prefix = sprintf ("%0".self::VERSION_LEN."b$bools", $this->version);
// pass strings to our Huffman encoder
$strings = implode ("|", $strings);
$huff = $this->huffman->encode ($strings, $prefix, "UTF-8");
// translate into IRI characters
mb_internal_encoding("UTF-8");
$res = '';
for ($i = 0 ; $i != strlen($huff) ; $i++) $res .= mb_substr (self::$translator, ord($huff[$i]), 1);
// done
return $res;
}
// ========================================================================
// decodes an IRI string into a lambda object
// ========================================================================
public function decode($input)
{
// convert IRI characters to binary
mb_internal_encoding("UTF-8");
$raw = '';
$len = mb_strlen ($input);
for ($i = 0 ; $i != $len ; $i++)
{
$c = mb_substr ($input, 0, 1);
$input = mb_substr ($input, 1);
$raw .= chr(mb_strpos (self::$translator, $c));
}
$this->bin = '';
// check version
$version = $this->read_bits ($raw, self::VERSION_LEN);
if ($version != $this->version) throw new Exception ("Version mismatch: expected {$this->version}, found $version");
// read booleans
foreach ($this->type as $field => $type)
if ($type == 'b')
$res->$field = $this->read_bits ($raw, 1) != 0;
// decode strings
$strings = explode ('|', $this->huffman->decode ($raw, $this->bin));
$i = 0;
foreach ($this->type as $field => $type)
if ($type == 's')
$res->$field = $strings[$i++];
// done
return $res;
}
// ------------------------------------------------------------------------
// reads raw bit blocks from a binary string
// ------------------------------------------------------------------------
private function read_bits (&$raw, $len)
{
while (strlen($this->bin) < $len)
{
if ($raw == '') throw new Exception ("premature end of input");
$this->bin .= sprintf ("%08b", ord($raw[0]));
$raw = substr($raw, 1);
}
$res = bindec (substr($this->bin, 0, $len));
$this->bin = substr ($this->bin, $len);
return $res;
}
}
The underlying Huffman codec
include ('class.huffman.dict.php');
class Huffman_codec
{
public $dict = null;
// ========================================================================
// encodes a string in a given string encoding (default: UTF-8)
// ========================================================================
public function encode($input, $prefix='', $encoding="UTF-8")
{
mb_internal_encoding($encoding);
$bin = $prefix;
$res = '';
$input .= "\0";
$len = mb_strlen ($input);
while ($len--)
{
// get next input character
$c = mb_substr ($input, 0, 1);
$input = substr($input, strlen($c)); // avoid playing Schlemiel the painter
// check for foreign characters
if (isset($this->dict->code[$c]))
{
// output huffman code
$bin .= $this->dict->code[$c];
}
else // foreign character
{
// escape sequence
$lc = strlen($c);
$bin .= $this->dict->code["\1"]
. sprintf("%02b", $lc-1); // character length (1 to 4)
// output plain character
for ($i=0 ; $i != $lc ; $i++) $bin .= sprintf("%08b", ord($c[$i]));
}
// convert code to binary
while (strlen($bin) >= 8)
{
$res .= chr(bindec(substr ($bin, 0, 8)));
$bin = substr($bin, 8);
}
}
// output last byte if needed
if (strlen($bin) > 0)
{
$bin .= str_repeat ('0', 8-strlen($bin));
$res .= chr(bindec($bin));
}
// done
return $res;
}
// ========================================================================
// decodes a string (will be in the string encoding used during encoding)
// ========================================================================
public function decode($input, $prefix='')
{
$bin = $prefix;
$res = '';
$len = strlen($input);
for ($i=0 ;;)
{
$c = $this->dict->symbol($bin);
switch ((string)$c)
{
case "\0": // end of input
break 2;
case "\1": // plain character
// get char byte size
if (strlen($bin) < 2)
{
if ($i == $len) throw new Exception ("incomplete escape sequence");
$bin .= sprintf ("%08b", ord($input[$i++]));
}
$lc = 1 + bindec(substr($bin,0,2));
$bin = substr($bin,2);
// get char bytes
while ($lc--)
{
if ($i == $len) throw new Exception ("incomplete escape sequence");
$bin .= sprintf ("%08b", ord($input[$i++]));
$res .= chr(bindec(substr($bin, 0, 8)));
$bin = substr ($bin, 8);
}
break;
case null: // not enough bits to decode further
// get more input
if ($i == $len) throw new Exception ("no end of input mark found");
$bin .= sprintf ("%08b", ord($input[$i++]));
break;
default: // huffman encoded
$res .= $c;
break;
}
}
if (bindec ($bin) != 0) throw new Exception ("trailing bits in input");
return $res;
}
// ========================================================================
// builds a huffman code from an input string or frequency table
// ========================================================================
public function make_code ($input, $encoding="UTF-8")
{
if (is_string ($input))
{
// make dynamic table from the input message
mb_internal_encoding($encoding);
$frequency = array();
while ($input != '')
{
$c = mb_substr ($input, 0, 1);
$input = mb_substr ($input, 1);
if (isset ($frequency[$c])) $frequency[$c]++; else $frequency[$c]=1;
}
$this->dict = new Huffman_dict ($frequency);
}
else // assume $input is an array of symbol-indexed frequencies
{
$this->dict = new Huffman_dict ($input);
}
}
}
And the huffman dictionary
class Huffman_dict
{
public $code = array();
// ========================================================================
// constructs a dictionary from an array of frequencies indexed by symbols
// ========================================================================
public function __construct ($frequency = array())
{
// add terminator and escape symbols
if (!isset ($frequency["\0"])) $frequency["\0"] = 1e-100;
if (!isset ($frequency["\1"])) $frequency["\1"] = 1e-100;
// sort symbols by increasing frequencies
asort ($frequency);
// create an initial array of (frequency, symbol) pairs
foreach ($frequency as $symbol => $frequence) $occurences[] = array ($frequence, $symbol);
while (count($occurences) > 1)
{
$leaf1 = array_shift($occurences);
$leaf2 = array_shift($occurences);
$occurences[] = array($leaf1[0] + $leaf2[0], array($leaf1, $leaf2));
sort($occurences);
}
$this->tree = $this->build($occurences[0], '');
}
// -----------------------------------------------------------
// recursive build of lookup tree and symbol[code] table
// -----------------------------------------------------------
private function build ($node, $prefix)
{
if (is_array($node[1]))
{
return array (
'0' => $this->build ($node[1][0], $prefix.'0'),
'1' => $this->build ($node[1][1], $prefix.'1'));
}
else
{
$this->code[$node[1]] = $prefix;
return $node[1];
}
}
// ===========================================================
// extracts a symbol from a code stream
// if found : updates code stream and returns symbol
// if not found : returns null and leave stream intact
// ===========================================================
public function symbol(&$code_stream)
{
list ($symbol, $code) = $this->get_symbol ($this->tree, $code_stream);
if ($symbol !== null) $code_stream = $code;
return $symbol;
}
// -----------------------------------------------------------
// recursive search for a symbol from a Huffman code
// -----------------------------------------------------------
private function get_symbol ($node, $code)
{
if (is_array($node))
{
if ($code == '') return null;
return $this->get_symbol ($node[$code[0]], substr($code, 1));
}
return array ($node, $code);
}
}
Example
include ('class.iriprm.codec.php');
$iri = new IRI_prm_codec ("config.txt");
foreach (array (
'repos' => "discussion,documentation,hoodie-cli",
'labels' => "enhancement,release-0.3.0,starter",
'milestones' => "1.0.0,1.1.0,v0.7",
'username' => "mklappstuhl",
'show_open' => false,
'show_closed' => true,
'show_commented' => true,
'show_uncommented' => false
) as $prop => $val) $iri_prm->$prop = $val;
$encoded = $iri->encode ($iri_prm);
echo "encoded as $encoded\n";
$decoded = $iri->decode ($encoded);
var_dump($decoded);
output:
encoded as 5ĶůťÊĕCOĔƀŪļŤłmĄZEÇŽÉįóšüÿjħũÅìÇēOĪäŖÏŅíŻÉĒQmìFOyäŖĞqæŠŹōÍĘÆŤŅËĦ
object(stdClass)#7 (8) {
["show_open"]=>
bool(false)
["show_closed"]=>
bool(true)
["show_commented"]=>
bool(true)
["show_uncommented"]=>
bool(false)
["repos"]=>
string(35) "discussion,documentation,hoodie-cli"
["labels"]=>
string(33) "enhancement,release-0.3.0,starter"
["milestones"]=>
string(16) "1.0.0,1.1.0,v0.7"
["username"]=>
string(11) "mklappstuhl"
}
In that example, the input got packed into 64 unicode characters, for an input length of about 100, yielding a 1/3 reduction.
An equivalent string:
discussion,documentation,hoodie-cli|enhancement,release-0.3.0,starter|
1.0.0,1.1.0,v0.7|mklappstuhl|0110
Would be compressed by a dynamic Huffman table to 59 characters. Not much of a difference.
No doubt smart data reordering would reduce that, but then you would need to pass the dynamic table along...
Chinese to the rescue?
Drawing on ttepasse's idea, one could take advantage of the huge number of Asian characters and find a range of 0x1000 (12 bits) contiguous values, in order to code 3 bytes into 2 CJK characters, like so:
// translate into IRI characters
$res = '';
$len = strlen ($huff);
for ($i = 0 ; $i != $len ; $i++)
{
$byte = ord($huff[$i]);
$quartet[2*$i ] = $byte >> 4;
$quartet[2*$i+1] = $byte &0xF;
}
$len *= 2;
while ($len%3 != 0) $quartet[$len++] = 0;
$len /= 3;
for ($i = 0 ; $i != $len ; $i++)
{
$utf16 = 0x4E00 // CJK page base, enough range for 2**12 (0x1000) values
+ ($quartet[3*$i+0] << 8)
+ ($quartet[3*$i+1] << 4)
+ ($quartet[3*$i+2] << 0);
$c = chr ($utf16 >> 8) . chr ($utf16 & 0xFF);
$res .= $c;
}
$res = mb_convert_encoding ($res, "UTF-8", "UTF-16");
and back:
// convert IRI characters to binary
$input = mb_convert_encoding ($input, "UTF-16", "UTF-8");
$len = strlen ($input)/2;
for ($i = 0 ; $i != $len ; $i++)
{
$val = (ord($input[2*$i ]) << 8) + ord ($input[2*$i+1]) - 0x4E00;
$quartet[3*$i+0] = ($val >> 8) &0xF;
$quartet[3*$i+1] = ($val >> 4) &0xF;
$quartet[3*$i+2] = ($val >> 0) &0xF;
}
$len *= 3;
while ($len %2) $quartet[$len++] = 0;
$len /= 2;
$raw = '';
for ($i = 0 ; $i != $len ; $i++)
{
$raw .= chr (($quartet[2*$i+0] << 4) + $quartet[2*$i+1]);
}
The previous output of 64 Latin chars
5ĶůťÊĕCOĔƀŪļŤłmĄZEÇŽÉįóšüÿjħũÅìÇēOĪäŖÏŅíŻÉĒQmìFOyäŖĞqæŠŹōÍĘÆŤŅËĦ
would "shrink" to 42 Asian characters:
乙堽孴峴勀垧壩坸冫嚘佰嫚凲咩俇噱刵巋娜奾埵峼圔奌夑啝啯嶼勲婒婅凋凋伓傊厷侖咥匄冯塱僌
However, as you can see, the sheer bulk of your average ideogram makes the string actually longer (pixel-wise), so even if the idea was promising, the outcome is rather disappointing.
Picking thinner glyphs
On the other hand, you can try to pick "thin" characters as a base for URI encoding. For instance:
█ᑊᵄ′ӏᶟⱦᵋᵎiïᵃᶾ᛬ţᶫꞌᶩ᠇܂اlᶨᶾᛁ⁚ᵉʇȋʇίן᠙ۃῗᥣᵋĭꞌ៲ᛧ༚ƫܙ۔ˀȷˁʇʹĭ∕ٱ;łᶥյ;ᴶ⁚ĩi⁄ʈ█
instead of
█5ĶůťÊĕCOĔƀŪļŤłmĄZEÇŽÉįóšüÿjħũÅìÇēOĪäŖÏŅíŻÉĒQmìFOyäŖĞqæŠŹōÍĘÆŤŅËĦ█
That will shrink the length by half with proportional fonts, including in a browser address bar.
My best candidate set of 256 "thin" glyphs so far:
᠊།ᑊʲ་༌ᵎᵢᶤᶩᶪᶦᶧˡ ⁄∕เ'Ꞌꞌ꡶ᶥᵗᶵᶨ|¦ǀᴵ  ᐧᶠᶡ༴ˢᶳ⁏ᶴʳʴʵ։᛬⍮ʹ′ ⁚⁝ᵣ⍘༔⍿ᠵᥣᵋᵌᶟᴶǂˀˁˤ༑,.   ∙Ɩ៲᠙ᵉᵊᵓᶜᶝₑₔյⵏⵑ༝༎՛ᵞᵧᚽᛁᛂᛌᛍᛙᛧᶢᶾ৷⍳ɩΐίιϊᵼἰἱἲἳἴἵἶἷὶίῐῑῒΐῖῗ⎰⎱᠆ᶿ՝ᵟᶫᵃᵄᶻᶼₐ∫ª౹᠔/:;\ijltìíîïĩīĭįıĵĺļłţŧſƚƫƭǐǰȉȋțȴȷɉɨɪɫɬɭʇʈʝːˑ˸;·ϳіїјӏ᠇ᴉᵵᵻᶅᶖḭḯḷḹḻḽṫṭṯṱẗẛỉị⁞⎺⎻⎼⎽ⱡⱦ꞉༈ǁ‖༅༚ᵑᵝᵡᵦᵪา᠑⫶ᶞᚁᚆᚋᚐᚕᵒᵔᵕᶱₒⵗˣₓᶹๅʶˠ᛫ᵛᵥᶺᴊ
Conclusion
This implementation should be ported to JavaScript to allow client-server exchange.
You should also provide a way to share the structure and Huffman codes with the clients.
It is not difficult and rather fun to do, but that means even more work :).
The Huffman gain in terms of characters is around 30%.
Of course these characters are multibyte for the most part, but if you aim for the shortest URI it does not matter.
Except for the booleans that can easily be packed to 1 bit, those pesky strings seem rather reluctant to be compressed.
It might be possible to better tune the frequencies, but I doubt you will get above 50% compression rate.
On the other hand, picking thin glyphs actually does more to shrink the string.
So all in all the combination of both might indeed achieve something, though it's a lot of work for a modest result.
Just as you yourself propose, I would first get rid of all the characters that are not carrying any information, because they are part of the "format".
E.g. turn "labels=open,ssl,cypher&repository=275643&username=ryanbrg&milestones=&with_comment=yes" to
"open,ssl,cyper|275643|ryanbrg||yes".
Then use a Huffman encoding with a fixed probability vector (resulting in a fixed mapping from characters to variable-length bitstrings, with the most probable characters mapped to shorter bitstrings and less probable characters mapped to longer bitstrings).
You could even use different probability vectors for the different parameters. For example, in the "labels" parameter the alphabetic characters will have high probability, but in the "repository" parameter the numeric characters will have the highest probability. If you do this, you should consider the separator "|" a part of the preceding parameter.
And finally, turn the long bitstring (which is the concatenation of all the bitstrings to which the characters were mapped) into something you can put into a URL by base64url encoding it.
If you could send me a set of representative parameter lists, I could run them through a Huffmann coder to see how well they compress.
The probability vector (or equivalently the mapping from characters to bitstrings) should be encoded as constant arrays into the Javascript function that is sent to the browser.
Of course you could go even further and - for example - try to get a list of possible labels with their probabilities. Then you could map entire labels to bitstrings with a Huffman encoding. This will give you better compression, but you will have extra work for those labels that are new (e.g. falling back to the single-character encoding), and of course the mapping (which - as mentioned above - is a constant array in the Javascript function) will be much larger.
Why not use protocol buffers?
Protocol buffers are a flexible, efficient, automated mechanism for serializing structured data – think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages. You can even update your data structure without breaking deployed programs that are compiled against the "old" format.
ProtoBuf.js converts objects to protocol buffer messages and vice versa.
The following object converts to: CgFhCgFiCgFjEgFkEgFlEgFmGgFnGgFoGgFpIgNqZ2I=
{
    repos : ['a', 'b', 'c'],
    labels: ['d', 'e', 'f'],
    milestones : ['g', 'h', 'i'],
    username : 'jgb'
}
Example
The following example is built using require.js. Give it a try on this jsfiddle.
require.config({
    paths : {
        'Math/Long'  : '//rawgithub.com/dcodeIO/Long.js/master/Long.min',
        'ByteBuffer' : '//rawgithub.com/dcodeIO/ByteBuffer.js/master/ByteBuffer.min',
        'ProtoBuf'   : '//rawgithub.com/dcodeIO/ProtoBuf.js/master/ProtoBuf.min'
    }
})

require(['message'], function(message) {
    var data = {
        repos : ['a', 'b', 'c'],
        labels: ['d', 'e', 'f'],
        milestones : ['g', 'h', 'i'],
        username : 'jgb'
    }

    var request = new message.arguments(data);

    // Convert request data to base64
    var base64String = request.toBase64();
    console.log(base64String);

    // Convert base64 back
    var decodedRequest = message.arguments.decode64(base64String);
    console.log(decodedRequest);
});
// Protobuf message definition
// Message definition could also be stored in a .proto definition file
// See: https://github.com/dcodeIO/ProtoBuf.js/wiki
define('message', ['ProtoBuf'], function(ProtoBuf) {
    var proto = {
        package : 'message',
        messages : [
            {
                name : 'arguments',
                fields : [
                    { rule : 'repeated', type : 'string', name : 'repos',            id : 1 },
                    { rule : 'repeated', type : 'string', name : 'labels',           id : 2 },
                    { rule : 'repeated', type : 'string', name : 'milestones',       id : 3 },
                    { rule : 'required', type : 'string', name : 'username',         id : 4 },
                    { rule : 'optional', type : 'bool',   name : 'with_comments',    id : 5 },
                    { rule : 'optional', type : 'bool',   name : 'without_comments', id : 6 }
                ]
            }
        ]
    };
    return ProtoBuf.loadJson(proto).build('message')
});
I have a cunning plan! (And a drink of gin tonic)
You don't seem to care about the length of the byte stream but about the length of the resulting glyphs, i.e. the string which is displayed to the user.
Browsers are pretty good at converting an IRI to the underlying URI while still displaying the IRI in the address bar. IRIs have a greater repertoire of possible characters, while your set of possible chars is rather limited.
That means you can encode bigrams of your chars (aa, ab, ac, …, zz & special chars) into one char of the full Unicode spectrum. Say you've got 80 possible ASCII chars: the number of possible combinations of two chars is 6400. These are easily found among Unicode's assigned characters, e.g. in the Han unified CJK range:
aa → 一
ab → 丁
ac → 丂
ad → 七
…
I picked CJK because this is only (slightly) reasonable if the target chars are assigned in Unicode and have assigned glyphs on the major browsers and operating systems. For that reason the private use area is out, and the more efficient version using trigrams (whose possible combinations could use all of Unicode's 1114112 possible code points) is out.
To recap: the underlying bytes are still there and, given UTF-8 encoding, possibly even longer, but the string of displayed characters the user sees and copies is 50% shorter.
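A toy sketch of that bigram mapping (the alphabet and the CJK base offset are assumptions, and real code would have to respect the caveats listed below):
var ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789,-_|."; // assumed 41-char repertoire
var BASE = 0x4E00;                                          // CJK Unified Ideographs

function packBigrams(s) {
    if (s.length % 2) s += ALPHABET[0];                     // pad to an even length
    var out = "";
    for (var i = 0; i < s.length; i += 2) {
        var hi = ALPHABET.indexOf(s[i]), lo = ALPHABET.indexOf(s[i + 1]);
        out += String.fromCharCode(BASE + hi * ALPHABET.length + lo);
    }
    return out;
}
function unpackBigrams(s) {
    var out = "";
    for (var i = 0; i < s.length; i++) {
        var v = s.charCodeAt(i) - BASE;
        out += ALPHABET[Math.floor(v / ALPHABET.length)] + ALPHABET[v % ALPHABET.length];
    }
    return out;
}
unpackBigrams(packBigrams("labels,repo_1")); // "labels,repo_1a", the trailing "a" is the padding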
OK, OK, reasons why this solution is insane:
IRIs are not perfect. A lot of tools lesser than modern browsers have their problems.
The algorithm obviously needs a lot more work. You'll need a function which maps the bigrams to the target chars and back. And it should preferably work arithmetically, to avoid big hash tables in memory.
The target chars should be checked to see whether they are assigned and whether they are simple chars, not fancy Unicodian things like combining chars or stuff that gets lost somewhere in Unicode normalization. Also check that the target area is a continuous span of assigned chars with glyphs.
Browsers are sometimes wary of IRIs. For good reason, given the IDN homograph attacks. Are they OK with all these non-ASCII chars in their address bar?
And the biggest: people are notoriously bad at remembering characters in scripts they don't know. They are even worse at trying to (re)-type these chars. And copy'n'paste can go wrong in many different clicks. There is a reason URL shorteners use Base64 and even smaller alphabets.
… speaking of which: That would be my solution. Offloading the work of shortening links either to the user or integrating goo.gl or bit.ly via their APIs.
Small tip: Both parseInt and Number#toString support radix arguments. Try using a radix of 36 to encode numbers (or indexes into lists) in URLs.
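For example (a minimal illustration, using the repository ID that comes up in another answer on this page):
var id = 4780572;                // a numeric ID, e.g. a GitHub repository ID
var short = id.toString(36);     // "2ugpo", 5 characters instead of 7 digits
var back = parseInt(short, 36);  // 4780572 again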
Update: I released an NPM package with some more optimizations, see https://www.npmjs.com/package/@yaska-eu/jsurl2
Some more tips:
Base64 encodes with a..zA..Z0..9+/=, and un-encoded URI characters are a..zA..Z0..9-_.~. So Base64 results only need to swap +/= for -_. and they won't expand URIs (see the sketch after these tips).
You could keep an array of key names, so that objects could be represented with the first character being the offset in the array, e.g. {foo:3,bar:{g:'hi'}} becomes a3,b{c'hi'} given key array ['foo','bar','g']
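The Base64 swap from the first tip, as a small sketch (assuming btoa/atob are available, i.e. a browser, and ASCII input):
function toBase64Url(str) {
    // '+' -> '-', '/' -> '_', '=' -> '.' so the result is URI-safe as-is
    return btoa(str).replace(/\+/g, '-').replace(/\//g, '_').replace(/=/g, '.');
}
function fromBase64Url(s) {
    return atob(s.replace(/-/g, '+').replace(/_/g, '/').replace(/\./g, '='));
}
fromBase64Url(toBase64Url("repos=hoodie.js&username=espy")); // round-trips unchanged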
Interesting libraries:
JSUrl specifically encodes JSON so it can be put in a URL without changes, even though it uses more characters than specified in the RFC. {"name":"John Doe","age":42,"children":["Mary","Bill"]} becomes ~(name~'John*20Doe~age~42~children~(~'Mary~'Bill)) and with a key dictionary ['name','age','children'] that could be ~(0~'John*20Doe~1~42~2~(~'Mary~'Bill)), thus going from 101 bytes URI encoded to 38.
Small footprint, fast, reasonable compression.
lz-string uses an LZW-based algorithm to compress strings to UTF16 for storing in localStorage. It also has a compressToEncodedURIComponent() function to produce URI-safe output.
Still only a few KB of code, pretty fast, good/great compression.
So basically I'd recommend picking one of these two libraries and consider the problem solved.
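For instance, an lz-string round trip (using its documented compressToEncodedURIComponent / decompressFromEncodedURIComponent pair) might look like:
// assumes lz-string has been loaded, exposing the LZString global
var params = 'labels=enhancement,starter&repos=hoodie.js&username=mklappstuhl';
var packed = LZString.compressToEncodedURIComponent(params);
var url = 'http://espy.github.io/ubersicht/?state=' + packed + '#hoodiehq';
// ...and when the page loads:
var restored = LZString.decompressFromEncodedURIComponent(packed);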
There are two main aspects to the problem: encoding and compression.
General purpose compression doesn't seem to work well on small strings. As browsers don't provide any API to compress strings, you would also need to ship the compression library's source, which can be huge.
But a lot of characters can be saved by using an efficient encoding. I have written a library named μ to handle the encoding and decoding part.
The idea is to specify as much as information available about the structure and domain of the URL parameters as a specification. This specification can be then used to drive the encoding and decoding. For example:
booleans can be encoded using just one bit;
integers can be converted to base64 (thereby reducing the number of characters required);
object keys need not be encoded (because they can be inferred from the specification);
enums can be encoded using log2(numberOfAllowedValues) bits (a toy sketch of this bit-level packing follows below).
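As a toy illustration of that last point (this is not μ's actual API, just the underlying idea of spec-driven bit packing):
var SHOW = ['open', 'closed', 'all'];            // a 3-value enum needs 2 bits

function packState(state) {
    var bits = (state.with_comments ? 1 : 0)     // 1 bit
             | (state.without_comments ? 1 : 0) << 1
             | SHOW.indexOf(state.show) << 2;    // 2 bits
    return bits.toString(36);                    // short, URL-safe token
}
function unpackState(token) {
    var bits = parseInt(token, 36);
    return {
        with_comments: !!(bits & 1),
        without_comments: !!(bits & 2),
        show: SHOW[(bits >> 2) & 3]
    };
}
packState({ with_comments: true, without_comments: false, show: 'open' }); // "1"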
Perhaps you can find a URL shortener with a JSONP API; that way you could make all the URLs really short automatically.
http://yourls.org/ even has jsonp support.
It looks like the GitHub APIs have numeric IDs for many things under the covers (repos and users seem to have them, but labels don't). It might be possible to use those numbers instead of names wherever advantageous. You then have to figure out how best to encode those in something that'll survive in a query string, e.g. URL-safe Base64.
For example, your hoodie.js repository has ID 4780572.
Packing that into a big-endian unsigned int (as many bytes as we need) gets us \x00H\xf2\x1c.
We'll just toss the leading zero, we can always restore that later, now we have H\xf2\x1c.
Encode as URL-safe base64, and you have SPIc (toss any padding you might get).
Going from hoodiehq/hoodie.js to SPIc seems like a good-sized win!
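A sketch of that packing (minimal big-endian bytes, then URL-safe Base64; btoa/atob assumed available):
function idToToken(id) {
    var bytes = [];
    while (id > 0) {                 // minimal big-endian byte string, leading zeros dropped
        bytes.unshift(id & 0xFF);
        id = Math.floor(id / 256);
    }
    var bin = String.fromCharCode.apply(null, bytes);
    return btoa(bin).replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '');
}
function tokenToId(token) {
    var bin = atob(token.replace(/-/g, '+').replace(/_/g, '/'));
    var id = 0;
    for (var i = 0; i < bin.length; i++) id = id * 256 + bin.charCodeAt(i);
    return id;
}
idToToken(4780572); // "SPIc", as above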
More generally, if you're willing to invest the time, you can try to exploit a bunch of redundancies in your query strings. Other ideas are along the lines of packing the two boolean params into a single character, possibly along with other state (like which fields are included). If you use base64 encoding (which seems the best option here due to the URL-safe version -- I looked at base85, but it has a bunch of characters that won't survive in a URL), that gets you 6 bits of entropy per character... there's a lot you can do with that.
To add to Thomas Fuchs' note, yes, if there's some kind of inherent, immutable ordering in some of the things you're encoding, then that would obviously also help. However, that seems hard for both the labels and the milestones.
Maybe any simple JS minifier will help you. You'll only need to integrate it at the serialization and deserialization points. I think it'd be the easiest solution.
Why not use a third party link shortener?
(I am assuming you don't have a problem with URI length limits since you mentioned this is an existing application.)
It looks like you're writing a Greasemonkey script or thereabouts, so perhaps you have access to GM_xmlhttpRequest(), which would allow use of a third party link shortener.
Otherwise, you'd need to use XMLHttpRequest() and host your own link shortening service on the same server to avoid crossing the same-origin policy boundary. A quick online search for hosting your own shorteners supplied me with a list of 7 free/open source PHP link shortener scripts and one more on GitHub, though the question likely excludes this kind of approach since "The app’s logic is in-browser only, and there is no backend I can write to."
You can see example code implementing this kind of thing in the URL Shortener UserScript (for Greasemonkey), which pops up a shortened version of the current page's URL when you press SHIFT+T.
Of course, shorteners will redirect users to the long form URL, but this would be a problem in any non-server-side solution. At least a shortener can theoretically proxy (like Apache's RewriteRule with [P]) or use a <frame> tag.
Short
Use a URL packing scheme such as my own, starting only from the params section of your URL.
Longer
As others here have pointed out, typical compression systems don't work for short strings. But it's important to recognise that URLs and params are a serialization format of a data model: a human-readable text format with specific sections - we know that the scheme is first, the host is found directly after, the port is implied but can be overridden, etc.
With the underlying conceptual data model, one can serialize with a more bit-efficient serialization scheme. In fact, I have created such a serialization myself which achieves around 50% compression: see http://blog.alivate.com.au/packed-url/
Conceptually, my scheme was written with that conceptual data model in mind; it doesn't deserialize the URL into the conceptual model as a distinct step. However, that's possible, and such a formal approach might yield greater efficiencies, since the bits wouldn't need to be in the same order as in a string URL.

Firefox pref is destroying JSON

I have the following JSON: http://pastebin.com/Sh20StJY
SO removed the chars on my post, so look at the link for the real JSON
which was generated using JSON.stringify and saved on Firefox prefs (pref.setCharPref(prefName, value);)
The problem is that when I save the value, Firefox does something that corrupts the JSON. If I try a JSON.parse retrieving the value from the config I get an error:
Error: JSON.parse: bad control character in string literal
If I try to validate the above JSON (which was retrieved from the settings) I get an error at line 20, the tokens value contains two invalid characters.
If I try a JSON.parse immediately after JSON.stringify the error doesn't occur.
Do I have to set something to save in a different encoding? How can I fix it?
nsIPrefBranch.getCharPref() only works for ASCII data; your JSON data, however, contains some non-ASCII characters. You can still store Unicode data in preferences, it is merely a little bit more complicated:
var str = Components.classes["@mozilla.org/supports-string;1"]
                    .createInstance(Components.interfaces.nsISupportsString);
str.data = value;
pref.setComplexValue(prefName, Components.interfaces.nsISupportsString, str);
And to read that preference:
var str = pref.getComplexValue(prefName, Components.interfaces.nsISupportsString);
var value = str.data;
For reference: Documentation
Your JSON appears to contain non-ASCII characters such as ½. Can you check what encoding everything is being handled in?
nsIPrefBranch.setCharPref() assumes that its input is UTF-8 encoded, and the return value of nsIPrefBranch.getCharPref() is always an UTF-8 string. If your input is a bytestring or a character in some other encoding, you will either need to switch to UTF-8, or encode and decode it yourself when interacting with preferences.
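If you do stay with get/setCharPref(), one commonly used (if old-fashioned) trick for that conversion is the encodeURIComponent/unescape round trip, sketched here with a hypothetical data object:
// Unicode string -> UTF-8 byte string before saving...
function toUtf8Bytes(s)   { return unescape(encodeURIComponent(s)); }
// ...and back after reading
function fromUtf8Bytes(s) { return decodeURIComponent(escape(s)); }

pref.setCharPref(prefName, toUtf8Bytes(JSON.stringify(data)));   // data: whatever object you stringify
var restored = JSON.parse(fromUtf8Bytes(pref.getCharPref(prefName)));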
I did this in one place to fix this issue:
(function overrideJsonParse() {
    if (!window.JSON || !window.JSON.parse) {
        window.setTimeout(overrideJsonParse, 1);
        return; // this code has executed before JSON2.js, try again in a moment
    }
    var oldParse = window.JSON.parse;
    window.JSON.parse = function (s) {
        var b = "", i, l = s.length, c;
        for (i = 0; i < l; ++i) {
            c = s[i];
            if (c.charCodeAt(0) >= 32) { b += c; }
        }
        return oldParse(b);
    };
}());
This works in IE8 (using json2 or whatever), IE9, Firefox and Chrome.
The code seems correct. Try using single quotes '..': '...' instead of double quotes "..": "...".
I still couldn't find the solution, but I found a workaround:
var b = "";
[].forEach.call("{ JSON STRING }", function(c, i) {
if (c.charCodeAt(0) >= 32)
b += c;
});
Now b is the new JSON, and might work...

jQuery $.inArray returns -1 when it should be 0

Hello All,
I have an odd problem:
// dataText holds current language data that's gathered from another function
// pick one to test it out

// if English data gathered
var dataText = ["Data uploads"];
// if French data gathered
var dataText = ["Envois de données"];

function lang_lib(lang) {
    var data_fre = [13, 'Envois de données'];
    var data_eng = [14, 'Data uploads'];
    var data_lang, rep_lang;
    switch (lang) {
        case "English":
            data_lang = data_eng;
            data_rep = rep_eng;
            break;
        case "Français":
            data_lang = data_fre;
            data_rep = rep_fre;
            break;
        default:
            $('table.infobox tbody').append('<tr><td id="lang-fail"><ul class="first last"><li>User language is not available</li></ul></td></tr>');
    };
    this.data_uploads = data_lang[1];
}

_lang = new lang_lib($('#toplinks-language').text());

// if lang_lib("English")
alert($.inArray(_lang.data_uploads, dataText)); // 0
// if lang_lib("Français")
alert($.inArray(_lang.data_uploads, dataText)); // -1
I shortened the code but it should give a general idea of what I'm trying to achieve.
I know it seems weird that I would be using the same data in two arrays, but data_fre and data_eng have language-specific dataText info plus other language-specific data as well. dataText will have non-specific language data, which is why I'm testing it against data_fre or data_eng to find which language to use.
I can't figure out why it would return -1, because I have other languages set (with special characters too, like Russian text) and they all return 0.
Appreciate the help :)
-1 means 'not found'. 0 means 'at position 0'. Without knowing more about the data coming in, I expect it is working properly.
Strings do not match numbers.
Simple Test
var arr = [13, 'Envois de données'];
console.log($.inArray(13, arr));   // 0  - matches as a number
console.log($.inArray("13", arr)); // -1 - the string "13" does not match the number 13
Ok I figured out what it was.
I used $.trim() in the function that collects data for dataText. Since I couldn't see any leading or trailing spaces when I would alert() it, it was confusing why it wouldn't work.
This explains why $.inArray() wouldn't match "Envois de données" with "Envois de données ".
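For reference, the guard looks something like this (a sketch using the names from the question, not the exact code):
var needle = $.trim(_lang.data_uploads);
var haystack = $.map(dataText, function (s) { return $.trim(s); });
alert($.inArray(needle, haystack)); // 0 once both sides are trimmed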
Thanks again everybody for taking a look :)
