Chinese Sorting by Pinyin in Javascript with localeCompare? - javascript

I am facing quite a challenge here. I am to sort certain Chinese "expressions" by pinyin.
The question:
How could I sort by pinyin in Firefox?
Is there a way to sort properly in IE 9 and 10? (They are also to be supported by the website)
Example:
财经传讯公司
财经顾问
房地产及按揭
According to a translator agency, this is what the sort order of the words should be. The translations are as follows:
Financial communication agencies
Financial consultancies
Real estate and mortgages
The pronanciations in latin alphabet:
cai jing chuan xun gong si
cai jing gu wen
fang di chan ji an jie
String.localeCompare:
MDN Docs
From what I understand I am to provide a 2nd argument to the String.localeCompare method that "tells" the method to sort by pinyin in BCP 47 format which should be zh-CN-u-co-pinyin.
So the full code should look like this:
var arr = [ "财经传讯公司", "财经顾问", "房地产及按揭"];
console.dir(arr.sort(function(a, b){
return a.localeCompare(b, [ "zh-CN-u-co-pinyin" ]);
}));
jsFiddle working example
I expected this to log to console the expressions in the order I entered them in the array but the output differs.
On FX 27, the order is: 3, 1, 2
In Chrome 33: 1, 2, 3
In IE 11: 1, 2, 3
Note:
Pinyin is the official phonetic system for transcribing the Mandarin
pronunciations of Chinese characters into the Latin alphabet.

This works on Chrome:
const arr = ["博","啊","吃","世","中","超"]
arr.sort((x,y)=>x.localeCompare(y, 'zh-CN'))

In general, people will use the following method for Chinese characters pinyin sort
var list=[' king ', 'a', 'li'];
list.Sort(function (a, b) {return a.localeCompare(b); });
localeCompare () : with local specific order to compare two strings.
This approach to pinyin sort is unreliable.
Second way: very dependent on Chinese operating system
Is very dependent on the browser kernel that is to say, if your site visitors are through the Chinese system, or the Internet explorer browser (Chrome), then he will probably unable to see the pinyin sort the result we expected.
Here I'll introduce my solution to this problem, hope to be able to derive somehow:
this method supports the Unicode character set x4e00 from 0 to 0 x9fa5 area a total of 20902 consecutive from China (including Taiwan), Japan, South Korea, Chinese characters, namely, CJK (Chinese Japanese Korean) characters.
var CompareStrings={.........}
getOrderedUnicode: function (char) {
var originalUnicode=char.charCodeAt ();
if (originalUnicode >=0 x4e00 && originalUnicode <=0 x9fa5) {
var index=this.Db.IndexOf (char);
if (index >1) {
return index + 0 x4e00;
}}
return originalUnicode;
},
compare: function (a, b) {
if (a==b) {return 0; }
//here can be rewritten according to the specific needs and the writing is the empty string at the bottom the if (a.length==0) {return 1; }
if (b.length==0) {return - 1; }
var count=a.length >B.length? B.length: a.length;
for (var i=0; i<count; i++) {
var au=this.GetOrderedUnicode (a [i]);
var bu=this.GetOrderedUnicode [i] (b);
if (au >bu) {
return 1;
} else if (au <bu) {
return - 1;
}}
return a.length >B.length? 1:1;
}}
//rewriting system native localeCompare
The prototype:
LocaleCompare = function (param) {
return CompareStrings.compare said (enclosing the toString (), param);
}
you can through the links below to download the complete code
A brief introduction of the principle of implementation:
According to pinyin sort good character (db) : there are multiple ways to achieve a goal, I am done with JavaScript + c# combination, use the script first put all the enumeration of Chinese characters, and then submitted to the c #good background sort, and output to the front desk, this is just the preparation, what all can.
Identify two characters who is bigger (getOrderedUnicode) : because when ordering, not only to deal with Chinese characters, and Chinese characters outside of the characters, so the comparator must be able to identify all of the characters, we here by judging whether a character is to discriminate Chinese characters: if it is Chinese characters, then the sort good word library search index, the index value plus the Unicode character set the location of the first Chinese characters, is after the "calibration" of the Unicode character set of the index value; If not Chinese characters, then return it directly on the index value of the Unicode character set.
Compare two strings (compare) : by comparing two each of the characters (within the effective range comparison, that is, the shorter the length of the string), if you find a greater than b, it returns 1, vice return 1.
Within the effective range after the comparison if haven't the tie, just see who is longer, such as a='123', b='1234', so long b to row in the back.
EDIT
You can also use JQuery plugin:
jQuery.extend( jQuery.fn.dataTableExt.oSort, {
"chinese-string-asc" : function (s1, s2) {
return s1.localeCompare(s2);
},
"chinese-string-desc" : function (s1, s2) {
return s2.localeCompare(s1);
}
} );
See the original post.

According to MDN, locales and options arguments in localeCompare() have been added in Firefox 29. You should be able to sort by pinyin now.

Here is a solution:
<!--
pinyin_dict_notone.js and pinyinUtil.js is available in URL below:
https://github.com/sxei/pinyinjs
-->
<script src="pinyin_dict_notone.js"></script>
<script src="pinyinUtil.js"></script>
<script>
jQuery.extend(jQuery.fn.dataTableExt.oSort, {
"chinese-string-asc": function(s1, s2) {
s1 = pinyinUtil.getPinyin(s1);
s2 = pinyinUtil.getPinyin(s2);
return s1.localeCompare(s2);
},
"chinese-string-desc": function(s1, s2) {
s1 = pinyinUtil.getPinyin(s1);
s2 = pinyinUtil.getPinyin(s2);
return s2.localeCompare(s1);
}
});
jQuery(document).ready(function() {
jQuery('#mydatatable').dataTable({
"columnDefs": [
{ type: 'chinese-string', targets: 0 }
]
});
});
</script>

Related

How can I compare "M" and "M" (in UTF) using Javascript?

I have a situation where I have to search a grid if it contains a certain substring. I have a search bar where the user can type the string. The problem is that the grid contains mix of Japanese text and Unicode characters,
for example : MAGシンチ注 333MBq .
How can I compare for content equality the letter 'M' that I type from the keyboard and the letter "M" as in the example above? I am trying to do this using plain Javascript and not Jquery or other library. And I have to do this in Internet Explorer.
Thanks,
As mentioned in an insightful comment from #Rhymoid on the question, modern JavaScript (ES2015) includes support for normalization of Unicode. One mode of normalization is to map "compatible" letterforms from higher code pages down to their most basic representatives in lower code pages (to summarize, it's kind-of involved). The .normalize("NFKD") method will map the "M" from the Japanese code page down to the Latin equivalent. Thus
"MAGシンチ注 333MBq".normalize("NFKD")
will give
"MAGシンチ注 333MBq"
As of late 2016, .normalize() isn't supported by IE.
At a lower level, ES2015 also has .codePointAt() (mentioned in another good answer), which is like the older .charCodeAt() described below but which also understands UTF-16 pairs. However, .codePointAt() is (again, late 2016) not supported by Safari.
below is original answer for older browsers
You can use the .charCodeAt() method to examine the UTF-16 character codes in the string.
"M".charCodeAt(0)
is 77, while
"M".charCodeAt(0)
is 65325.
This approach is complicated by the fact that for some Unicode characters, the UTF-16 representation involves two separate character positions in the JavaScript string. The language does not provide native support for dealing with that, so you have to do it yourself. A character code between 55926 and 57343 (D800 and DFFF hex) indicates the start of a two-character pair. The UTF-16 Wikipedia page has more information, and there are various other sources.
Building a dictionary should work in any browser, find the charCodes at the start of ranges you want to transform then move the characters in your favourite way, for example
function shift65248(str) {
var dict = {}, characters = [],
character, i;
for (i = 0; i < 10; ++i) { // 0 - 9
character = String.fromCharCode(65296 + i);
dict[character] = String.fromCharCode(48 + i);
characters.push(character);
}
for (i = 0; i < 26; ++i) { // A - Z
character = String.fromCharCode(65313 + i);
dict[character] = String.fromCharCode(65 + i);
characters.push(character);
}
for (i = 0; i < 26; ++i) { // a - z
character = String.fromCharCode(65313 + i);
dict[character] = String.fromCharCode(97 + i);
characters.push(character);
}
return str.replace(
new RegExp(characters.join('|'), 'g'),
function (m) {return dict[m];}
);
}
shift65248('MAGシンチ注 333MBq'); // "MAGシンチ注 333MBq"
I tried just moving the whole range 65248..65375 onto 0..127 but it conflicted with the other characters :(
I am assuming that you have access to those strings, by reading the DOM for some other way.
If so, codePointAt will be your friend.
console.log("Test of values");
console.log("M".codePointAt(0));
console.log("M".codePointAt(0));
console.log("Determining end of string");
console.log("M".codePointAt(10));
var str = "MAGシンチ注 333MBq .";
var idx = 0;
do {
point = str.codePointAt(idx);
idx++;
console.log(point);
} while(point !== undefined);
You could try building your own dictionary and compare function as follows:
var compareDB = {
'm' : ['M'],
'b' : ['B']
};
function doCompare(inputChar, searchText){
inputCharLower = inputChar.toLowerCase();
searchTextLower = searchText.toLowerCase();
if(searchTextLower.indexOf(inputChar) > -1)
return true;
if(compareDB[inputCharLower] !== undefined)
{
for(i=0; i<compareDB[inputCharLower].length; i++){
if(searchTextLower.indexOf(compareDB[inputCharLower][i].toLowerCase()) > -1)
return true;
}
}
return false;
}
console.log("searching with m");
console.log(doCompare('m', "searching text with M"));
console.log("searching with m");
console.log(doCompare('m', "searching text with B"));
console.log("searching with B");
console.log(doCompare('B', "searching text with B"));

Searching for multiple partial phrases so that one original phrase can not match multiple search phrases

Given a predefined set of phrases, I'd like to perform a search based on user's query. For example, consider the following set of phrases:
index phrase
-----------------------------------------
0 Stack Overflow
1 Math Overflow
2 Super User
3 Webmasters
4 Electrical Engineering
5 Programming Jokes
6 Programming Puzzles
7 Geographic Information Systems
The expected behaviour is:
query result
------------------------------------------------------------------------
s Stack Overflow, Super User, Geographic Information Systems
web Webmasters
over Stack Overflow, Math Overflow
super u Super User
user s Super User
e e Electrical Engineering
p Programming Jokes, Programming Puzzles
p p Programming Puzzles
To implement this behaviour I used a trie. Every node in the trie has an array of indices (empty initially).
To insert a phrase to the trie, I first break it to words. For example, Programming Puzzles has index = 6. Therefore, I add 6 to all the following nodes:
p
pr
pro
prog
progr
progra
program
programm
programmi
programmin
programming
pu
puz
puzz
puzzl
puzzle
puzzles
The problem is, when I search for the query prog p, I first get a list of indices for prog which is [5, 6]. Then, I get a list of indices for p which is [5, 6] as well. Finally, I calculate the intersection between the two, and return the result [5, 6], which is obviously wrong (should be [6]).
How would you fix this?
Key Observation
We can use the fact that two words in a query can match the same word in a phrase only if one query word is a prefix of the other query word (or if they are same). So if we process the query words in descending lexicographic order (prefixes come after their "superwords"), then we can safely remove words from the phrases at the first match. Doing so we left no possibility to match the same phrase word twice. As I said, it is safe because prefixes match superset of phrase words what their "superwords" can match, and pair of query words, where one is not a prefix of the other, always match disjoint set of phrase words.
We don't have to remove words from phrases or the trie "physically", we can do it "virtually".
Implementation of the Algorithm
var PhraseSearch = function () {
var Trie = function () {
this.phraseWordCount = {};
this.children = {};
};
Trie.prototype.addPhraseWord = function (phrase, word) {
if (word !== '') {
var first = word.charAt(0);
if (!this.children.hasOwnProperty(first)) {
this.children[first] = new Trie();
}
var rest = word.substring(1);
this.children[first].addPhraseWord(phrase, rest);
}
if (!this.phraseWordCount.hasOwnProperty(phrase)) {
this.phraseWordCount[phrase] = 0;
}
this.phraseWordCount[phrase]++;
};
Trie.prototype.getPhraseWordCount = function (prefix) {
if (prefix !== '') {
var first = prefix.charAt(0);
if (this.children.hasOwnProperty(first)) {
var rest = prefix.substring(1);
return this.children[first].getPhraseWordCount(rest);
} else {
return {};
}
} else {
return this.phraseWordCount;
}
}
this.trie = new Trie();
}
PhraseSearch.prototype.addPhrase = function (phrase) {
var words = phrase.trim().toLowerCase().split(/\s+/);
words.forEach(function (word) {
this.trie.addPhraseWord(phrase, word);
}, this);
}
PhraseSearch.prototype.search = function (query) {
var answer = {};
var phraseWordCount = this.trie.getPhraseWordCount('');
for (var phrase in phraseWordCount) {
if (phraseWordCount.hasOwnProperty(phrase)) {
answer[phrase] = true;
}
}
var prefixes = query.trim().toLowerCase().split(/\s+/);
prefixes.sort();
prefixes.reverse();
var prevPrefix = '';
var superprefixCount = 0;
prefixes.forEach(function (prefix) {
if (prevPrefix.indexOf(prefix) !== 0) {
superprefixCount = 0;
}
phraseWordCount = this.trie.getPhraseWordCount(prefix);
function phraseMatchedWordCount(phrase) {
return phraseWordCount.hasOwnProperty(phrase) ? phraseWordCount[phrase] - superprefixCount : 0;
}
for (var phrase in answer) {
if (answer.hasOwnProperty(phrase) && phraseMatchedWordCount(phrase) < 1) {
delete answer[phrase];
}
}
prevPrefix = prefix;
superprefixCount++;
}, this);
return Object.keys(answer);
}
function test() {
var phraseSearch = new PhraseSearch();
var phrases = [
'Stack Overflow',
'Math Overflow',
'Super User',
'Webmasters',
'Electrical Engineering',
'Programming Jokes',
'Programming Puzzles',
'Geographic Information Systems'
];
phrases.forEach(phraseSearch.addPhrase, phraseSearch);
var queries = {
's': 'Stack Overflow, Super User, Geographic Information Systems',
'web': 'Webmasters',
'over': 'Stack Overflow, Math Overflow',
'super u': 'Super User',
'user s': 'Super User',
'e e': 'Electrical Engineering',
'p': 'Programming Jokes, Programming Puzzles',
'p p': 'Programming Puzzles'
};
for(var query in queries) {
if (queries.hasOwnProperty(query)) {
var expected = queries[query];
var actual = phraseSearch.search(query).join(', ');
console.log('query: ' + query);
console.log('expected: ' + expected);
console.log('actual: ' + actual);
}
}
}
One can test this code here: http://ideone.com/RJgj6p
Possible Optimizations
Storing the phrase word count in each trie node is not very memory
efficient. But by implementing compressed trie it is possible to
reduce the worst case memory complexity to O(n m), there n is the
number of different words in all the phrases, and m is the total
number of phrases.
For simplicity I initialize answer by adding all the phrases. But
a more time efficient approach is to initialize answer by adding
the phrases matched by the query word matching least number of
phrases. Then intersect with the phrases of the query word matching
second least number of phrases. And so on...
Relevant Differences from the Implementation Referenced in the Question
In trie node I store not only the phrase references (ids) matched by the subtrie, but also the number of matched words in these phrases. So, the result of the match is not only the matched phrase references, but also the number of matched words in these phrases.
I process query words in descending lexicographic order.
I subtract the number of superprefixes (query words of which the current query word is a prefix) from current match results (by using variable superprefixCount), and a phrase is considered matched by the current query word only when the resulting number of matched words in it is greater than zero. As in the original implementation, the final result is the intersection of the matched phrases.
As one can see, changes are minimal and asymptotic complexities (both time and memory) are not changed.
If the set of phrases is defined and does not contain long phrases, maybe you can create not 1 trie, but n tries, where n is the maximum number of words in one phrase.
In i-th trie store i-th word of the phrase. Let's call it the trie with label 'i'.
To process query with m words let's consider the following algorithm:
For each phrase we will store the lowest label of a trie, where the word from this phrase was found. Let's denote it as d[j], where j is the phrase index. At first for each phrase j, d[j] = -1.
Search the first word in each of n tries.
For each phrase j find the label of a trie that is greater than d[j] and where the word from this phrase was found. If there are several such labels, pick the smallest one. Let's denote such label as c[j].
If there is no such index, this phrase can not be matched. You can mark this case with d[j] = n + 1.
If there is such c[j] that c[j] > d[j], than assign d[j] = c[j].
Repeat for every word left.
Every phrase with -1 < d[j] < n is matched.
This is not very optimal. To improve performance you should store only usable values of d array. After first word, store only phrases, matched with this word. Also, instead of assignment d[j] = n + 1, delete index j. Process only already stored phrase indexes.
You can solve it as a Graph Matching Problem in a Bipartite Graph.
For each document, query pair define the graph:
G=(V,E) Where
V = {t1 | for each term t1 in the query} U { t2 | for each term t2 in the document}
E = { (t1,t2) | t1 is a match for t2 }
Intuitively: you have a vertex for each term in the query, a vertex for each term in the document, and an edge between a document term and a query term, only if the query term matches the document term. You have already solved this part with your trie.
You got yourself a bipartite graph, there are only edges between the "query vertices" and the "document vertices" (and not between two vertices of the same type).
Now, invoke a matching problem for bipartite graph, and get an optimal matching {(t1_1,t2_1), ... , (t1_k,t2_k)}.
Your algorithm should return a document d for a query q with m terms in the query, if (and only if) all m terms are satisfied, which means - you have maximal matching where k=m.
In your example, the graph for query="prog p", and document="Programming Jokes", you will get the bipartite graph with the matching: (or with Programming,p matched - doesn't matter which)
And, for the same query, and document="Programming Puzzles", you will get the bipartite graph with the matching:
As you can see, for the first example - there is no matching that covers all the terms, and you will "reject" the document. For the 2nd example - you were able to match all terms, and you will return it.
For performance issues, you can do the suggested algorithm only on a subset of the phrases, that were already filtered out by your initial approach (intersection of documents that have matching for all terms).
After some thought I came up with a similar idea to dened's - in addition to the index of a matched phrase, each prefix will refer to how many words it is a prefix of in that phrase - then that number can be reduced in the query process by the number of its superfixes among other query words, and the returned results include only those with at least the same number of matched words as the query.
We can implement an additional small tweak to avoid large cross-checks by adding (for the English language) a maximum of approximately 26 choose 2 + 26 choose 3 and even an additional 26 choose 4 special elements to the trie that refer to ordered first-letter intersections. When a phrase is inserted, the special elements in the trie referring to the 2 and 3 first-letter combinations will receive its index. Then match results from larger query words can be cross-checked against these. For example, if our query is "Geo i", the match results for "Geo" would be cross-checked against the special trie element, "g-i", which hopefully would have significantly less match results than "i".
Also, depending on the specific circumstances, large cross-checks could at times be more efficiently handled in parallel (for example, via a bitset &).

My JavaScript doesn't work with chrome

I have a javascript which work correctly in Firefox, but it doesn't work in chrome!
The code is two array list that is supposed to sort a list of movies in alphabetical order.
One list from A to Z and one list with Z to A.
Any ideas what i have done wrong?
Thanks alot
var Movies = ["Pulp Fiction", "The Dark Knight", "Fight Club", " Terminator", "Matrix", "American History X", "Memento "];
function listMovies()
{
document.writeln("<b>Movies sort from A to B:</b>");
document.writeln(Movies.sort().join("<br>"));
document.writeln("<br><b>Movies sort from B to A:</b>");
document.writeln(Movies.sort(function(a, b){return b-a}).join("<br>"));
}
listMovies();
Two issues:
You have a space in front of Terminator which is throwing off the results.
Your "B to A" function is using the - operator on strings, which is not going to give you reliable results.
To solve #2, use a function that doesn't try to subtract strings:
document.writeln(Movies.sort(function(a, b){
return a == b ? 0 : (a < b ? 1 : -1)
}).join("<br>"));
Live Example
Side note: I'd avoid using document.writeln and similar.
First, you probably shouldn't be using document.writeLn. It's almost never necessary. Use DOM methods instead.
The problem, however, is that your code is trying to subtract one string from another in calculating the relative positions:
Movies.sort(function(a, b){return b-a})
This works fine with numbers (4 - 2 === 2) but not with strings ('Terminator' - 'American History' is NaN).
You need to compare with < and > instead, which makes your function a little more complex:
function listMovies()
{
document.writeLn("<b>Movies sort from A to B:</b>");
document.writeLn(Movies.sort().join("<br>"));
document.writeLn("<br><b>Movies sort from B to A:</b>");
document.writeLn(Movies.sort(function(a, b){
if (b === a) return 0;
if (b > a) return 1;
return -1;
}).join("<br>"));
}
The other oddity you have is that Terminator has a leading space, which causes it to go the beginning of the list alphabetically.
What about this DEMO ?
var Movies = ["Pulp Fiction", "The Dark Knight", "Fight Club", " Terminator", "Matrix", "American History X", "Memento "];
function listMovies(){
for(var i=0;i<Movies.length;i++){
Movies[i] = Movies[i].trim();
}
document.writeln("<b>Movies sort from A to B:</b>");
document.writeln(Movies.sort().join("<br>"));
document.writeln("<br><b>Movies sort from B to A:</b>");
document.writeln(Movies.sort(function(a, b){return b-a}).join("<br>"));
}
listMovies();

I need a JavaScript procedure that reverses the following procedure

I have the following function that encrypts a string and I was hoping for a function that reverses the process.
function encryptStr(thisString)
{
retString = "";
/* Make retString a string of the 8-bit representations of
the ASCII values of its thisCharacters in order.
EXAMPLE: "abc" --> "011000010110001001100011"
since the ASCII values for 'a', 'b' and 'c'
are 97=01100001, 98=01100010 and 99=01100011
respectively
*/
for (i = 0, j = thisString.length; i < j; i++)
{
bits = thisString.charCodeAt(i).toString(2);
retString += new Array(8-bits.length+1).join('0') + bits;
}
/* Compress retString by taking each substring of 3, 4, ..., 9
consecutive 1's or 0's and it by the number of such consecutive
thisCharacters followed by the thisCharacter.
EXAMPLES:
"10101000010111" --> "10101401031"
"001100011111111111111" --> "0011319151"
*/
retString = retString.replace(/([01])\1{2,8}/g, function($0, $1) { return ($0.length + $1);});
return retString;
}
I tried to make a function and I'm probably doing it wrong because it's 50 lines already. I'm realizing that there's tons of error checking that needs to go on. For instance, I just realized a potential problem because JavaScript characters don't span the entire 127 ASCII values. Should I just give up? Is this a futile problem?
First, find the numbers in the string which are not 0 or 1. Then, expand them in the opposite way that the original function collapsed them. You can again use String.prototype.replace() here with a replacement function...
str.replace(/([2-9])([01])/g,
function(all, replacementCount, bit) {
return Array(+replacementCount + 1).join(bit);
});
Then, simply decode the bit stream back into characters with String.fromCharCode(). You'd need to chunk the stream into 8 bit chunks, and then perform the conversion. I chose to use Array.prototype.reduce() as it's quite suited to this task. Alternatively, you could use String.fromCharCode.apply(String, chunks.map(function(byte) { return parseInt(byte, 2); })) to get the resulting string.
Something like...
str.split(/(.{8})/g).reduce(function(str, byte) {
return str + String.fromCharCode(parseInt(byte, 2));
}, "");
Put it together, and you get a function like...
function decryptStr(thisString) {
return thisString.replace(/([2-9])([01])/g,
function (all, replacementCount, bit) {
return Array(+replacementCount + 1).join(bit);
}).split(/(.{8})/g).reduce(function (str, byte) {
return str + String.fromCharCode(parseInt(byte, 2));
}, "");
}
jsFiddle.
Also, remember to place var in front of your variable declarations, otherwise those variable identifiers will leak to the containing scope until they're resolved (which is usually the global object).

Detect difference between & and %26 in location.hash

Analyzing the location.hash with this simple javascript code:
<script type="text/javascript">alert(location.hash);</script>
I have a difficult time separating out GET variables that contain a & (encoded as %26) and a & used to separate variables.
Example one:
code=php&age=15d
Example two:
code=php%20%26%20code&age=15d
As you can see, example 1 has no problems, but getting javascript to know that "code=php & code" in example two is beyond my abilities:
(Note: I'm not really using these variable names, and changing them to something else will only work so long as a search term does not match a search key, so I wouldn't consider that a valid solution.)
There is no difference between %26 and & in a fragment identifier (‘hash’). ‘&’ is only a reserved character with special meaning in a query (‘search’) segment of a URI. Escaping ‘&’ to ‘%26’ need be given no more application-level visibility than escaping ‘a’ to ‘%61’.
Since there is no standard encoding scheme for hiding structured data within a fragment identifier, you could make your own. For example, use ‘+XX’ hex-encoding to encode a character in a component:
hxxp://www.example.com/page#code=php+20+2B+20php&age=15d
function encodeHashComponent(x) {
return encodeURIComponent(x).split('%').join('+');
}
function decodeHashComponent(x) {
return decodeURIComponent(x.split('+').join('%'));
}
function getHashParameters() {
var parts= location.hash.substring(1).split('&');
var pars= {};
for (var i= parts.length; i-->0;) {
var kv= parts[i].split('=');
var k= kv[0];
var v= kv.slice(1).join('=');
pars[decodeHashComponent(k)]= decodeHashComponent(v);
}
return pars;
}
Testing on Firefox 3.1, it looks as if the browser converts hex codes to the appropriate characters when populating the location.hash variable, so there is no way JavaScript can know how the original was a single character or a hex code.
If you're trying to encode a character like & inside of your hash variables, I would suggest replacing it with another string.
You can also parse the string in weird ways, like (JS 1.6 here):
function pairs(xs) {
return xs.length > 1 ? [[xs[0], xs[1]]].concat(pairs(xs.slice(2))) : []
}
function union(xss) {
return xss.length == 0 ? [] : xss[0].concat(union(xss.slice(1)));
}
function splitOnLast(s, sub) {
return s.indexOf(sub) == -1 ? [s] :
[s.substr(0, s.lastIndexOf(sub)),
s.substr(s.lastIndexOf(sub) + sub.length)];
}
function objFromPairs(ps) {
var o = {};
for (var i = 0; i < ps.length; i++) {
o[ps[i][0]] = ps[i][1];
}
return o;
}
function parseHash(hash) {
return objFromPairs(
pairs(
union(
location.hash
.substr(1)
.split("=")
.map(
function (s) splitOnLast(s, '&')))))
}
>>> location.hash
"#code=php & code&age=15d"
>>> parseHash(location.hash)
{ "code": "php & code", "age": "15d" }
Just do the same as you do with the first example, but after you have split on the & then call unescape() to convert the %26 to & and the %20 to a space.
Edit:
Looks like I'm a bit out of date and you should be using decodeURIComponent() now, though I don't see any clear explanation on what it does differently to unescape(), apart from a suggestion that it doesn't handle Unicode properly.
This worked fine for me:
var hash = [];
if (location.hash) {
hash = location.href.split('#')[1].split('&');
}

Categories

Resources