Javascript function to convert UTF8 substring

Following up with JavaScript function to convert UTF8 string between fullwidth and halfwidth forms, this time I want to replace only part of the string.
I think I have found all the answers that I want (from previous post and Replace substring in string with range in JavaScript), but I just can't put it all together. Please take a look at the following demonstration:
// Extend the string object to add a new method convert
String.prototype.convert = function() {
return this.replace( /[\uff01-\uff5e]/g,
function(ch) { return String.fromCharCode(ch.charCodeAt(0) - 0xfee0); }
)
};
// Test and verify it's working well:
> instr = "！ａｂｃ　ＡＢＣ！！ａｂｃ　ＡＢＣ！"
"！ａｂｃ　ＡＢＣ！！ａｂｃ　ＡＢＣ！"
> instr.substr(5, 4)
"ＡＢＣ！"
> instr.substr(5, 4).convert()
"ABC!"
// Great!
// Goal: define a decode method like this
String.prototype.decode = function(start, length) {
return this.replace(
new RegExp("^(.{" + start + "})(.{" + length + "})"), "$1" + "$2".convert());
};
// Test/verify failed:
> instr.decode(5, 4)
"！ａｂｃ　ＡＢＣ！！ａｂｃ　ＡＢＣ！"
// That failed, now define a test method to verify
String.prototype.decode = function(start, length) {
return this.replace(
new RegExp("^(.{" + start + "})(.{" + length + "})"), "$2".length);
};
> instr.decode(5, 4)
"2！ａｂｃ　ＡＢＣ！"
I.e., I believe that all my string-extending methods are defined properly (in the eyes of someone who didn't know JavaScript several days ago). But when I put them together, they don't produce what I expect (！ａｂｃ　ABC!！ａｂｃ　ＡＢＣ！).
Furthermore, in the last test, the one with "$2".length, I just can't understand why "$2".length is 2 and not 4.
Please help me out.
Thanks a lot.

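The expressions "$2".convert() and "$2".length are evaluated before replace() is even called, so they operate on the literal two-character string "$2". That is why the length is 2, and why convert() changes nothing. The $1/$2 placeholders are only substituted afterwards, inside replace(). To work with the captured text, pass a function as the replacement instead:
return this.replace(new RegExp(...), function(match, p1, p2) {
return p1 + p2.convert();
});
The function is called with the full match followed by each captured group, so it operates on the actual matched text every time.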
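Putting the pieces together, a minimal sketch of the partial conversion might look like the following. It uses standalone functions instead of extending String.prototype; the names toHalfwidth and decodeRange are made up for this illustration and do not come from the question.

```javascript
// Convert fullwidth ASCII variants (U+FF01..U+FF5E) to their halfwidth forms.
function toHalfwidth(text) {
  return text.replace(/[\uff01-\uff5e]/g, function (ch) {
    return String.fromCharCode(ch.charCodeAt(0) - 0xfee0);
  });
}

// Convert only `length` characters starting at `start` (hypothetical helper).
// Assumes the affected part of the string contains no line breaks,
// because "." does not match them.
function decodeRange(text, start, length) {
  var pattern = new RegExp("^(.{" + start + "})(.{" + length + "})");
  return text.replace(pattern, function (match, head, target) {
    // head and target are the real captured substrings here,
    // not the literal placeholders "$1" and "$2".
    return head + toHalfwidth(target);
  });
}

var instr = "！ａｂｃ　ＡＢＣ！！ａｂｃ　ＡＢＣ！";
console.log(decodeRange(instr, 5, 4)); // ！ａｂｃ　ABC!！ａｂｃ　ＡＢＣ！
```

If the string is shorter than start + length, the pattern does not match and the input is returned unchanged.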

Related

check if a string array element is a sub string of a URL

I am working on a URL whitelist in a browser extension. What I have currently works but I need to check the list in two places and I want to try to make it more efficient so as to reduce the chances of increased page load times.
I have to check the list in two places. The first check is in a page mod with an attached content script which is applied to all sites, the content script is changed if the url is in the whitelist. The second check is in a request observer to send different headers if the url is whitelisted.
I have tried to check it only once and pass the result from the page mod to the request observer, or from the request observer to the page mod, but that results in timing issues: either the headers are not correct or the modifications to the content script are not applied when they should be.
Is there a way I can improve the substring checking code below to make it faster?
I have a list of user entered sites which are sorted alphabetically before saving.
For now the format of the list is simple.
example1.com
b.example2.com/some/content.html
c.exampleN.com
and the url could be
http://example1.com/some/site/content.html
I am currently checking whether the url contains the value of any array element as a substring:
//check if a url is in the list
function listCheck(list,url){
for (var i=0; i<list.length; i++){
if (url.indexOf(list[i]) > -1)
return true;
}
return false;
};

You can use binary search on the alphabetically sorted list. This comes in handy because whitelists can grow pretty fast. However, it only finds exact matches, so you cannot use it with patterns (e.g. *.somedomain.com).
Consider using a hash table to store the whitelist. You can make it efficient and specialized by writing your own hash function.
Regex will make things easier, but can also make things slow at times. If you use regex, make sure you know what you are doing. You can shrink the comparison list first with one of the methods described above.
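As a rough illustration of the hash table idea, the sketch below uses the built-in Set rather than a hand-written hash function. It assumes every whitelist entry is a bare hostname (no paths, no wildcards) and that the URL constructor is available, as it is in current browsers; buildHostSet and isHostListed are hypothetical names.

```javascript
// Build the lookup structure once, e.g. whenever the list is saved.
function buildHostSet(list) {
  var hosts = new Set();
  list.forEach(function (entry) {
    hosts.add(entry.toLowerCase());
  });
  return hosts;
}

// Exact hostname lookup; entries containing a path would never match here.
function isHostListed(hostSet, url) {
  var host;
  try {
    host = new URL(url).hostname; // hostname is already lowercased
  } catch (e) {
    return false; // not a parseable absolute URL
  }
  return hostSet.has(host);
}

var hostSet = buildHostSet(["example1.com", "c.exampleN.com"]);
console.log(isHostListed(hostSet, "http://example1.com/some/site/content.html")); // true
```

Because the set is built once and each lookup is a single hash probe, both the page mod and the request observer could share it without rescanning the list.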
Edit: Here's the binary search I was talking about. This is applicable only if wildcards are not used.
function binarySearch(needle, haystack, startIndex, endIndex) {
//console.log("\ttrying to find " + needle + " between " +
// haystack[startIndex] + "(" + startIndex + ") and " +
// haystack[endIndex] + "(" + endIndex + ")");
// the basic case, where the list is narrowed down to 1 or 2 items
if (startIndex == endIndex || endIndex - startIndex == 1) {
if (haystack[startIndex] == needle)
return startIndex;
if (haystack[endIndex] == needle)
return endIndex;
return -1;
}
var midIndex = Math.ceil((startIndex + endIndex) / 2);
//console.log("\t\tgot " + haystack[midIndex] + "(" + midIndex +
// ") for middle of the list.");
var comparison = haystack[midIndex].localeCompare(needle);
//console.log("\t\tcomparison: " + comparison);
if (comparison > 0)
return binarySearch(needle, haystack, startIndex, midIndex);
if (comparison < 0)
return binarySearch(needle, haystack, midIndex, endIndex);
return midIndex; // (comparison == 0)
}
var sitelist = [ // the whitelist (the haystack).
"alpha.com",
"bravo.com",
"charlie.com",
"delta.com",
"echo.com",
"foxtrot.com",
"golf.com",
"hotel.com",
"india.com",
"juliet.com",
"kilo.com",
"lima.com",
"mike.com",
"november.com",
"oscar.com",
"papa.com",
"quebec.com",
"romeo.com",
"sierra.com",
"tango.com",
"uniform.com",
"victor.com",
"whiskey.com",
"xray.com",
"yankee.com",
"zulu.com"
];
function testBinarySearch(needle) {
console.log("trying to find " + needle);
var foundIndex = binarySearch(needle, sitelist, 0, sitelist.length - 1);
if (foundIndex < 0)
console.log(needle + " not found");
else
console.log(needle + " found at: " + foundIndex);
}
// note that the list is already sorted. if the list is not sorted,
// haystack.sort();
// we can find "uniform.com" using 5 comparisons, instead of 21
testBinarySearch("uniform.com");
// we can confirm the non-existence of "google.com" in 4 comparisons, not 26
testBinarySearch("google.com");
// this is an interesting (worst) case, it takes 5 comparisons, instead of 1
testBinarySearch("alpha.com");
// "zulu.com" takes 4 comparisons instead of 26
testBinarySearch("zulu.com");
As your list grows, binary search scales very well. I will not go into the other pros and cons of binary search, since they are well documented in many places.
More SO questions on JavaScript binary search:
Binary Search in Javascript
Searching a Binary Tree in JavaScript
Binary Search Code
Binary search in JSON object
javascript binary search tree implementation

Using a regexp will make things easier. With this code you just need to make ONE comparison.
function listCheck(list, url) {
var exp = new RegExp('(' + list.join('|') + ')');
if (exp.test(url)) return true;
else return false;
}
EDIT: you can get errors with symbols . or / or - in urls, so this code works better:
function listCheck(list, url) {
var exp = new RegExp('(' + list.join('|').replace(/(\/|\.|\-)/g, '\\$1') + ')');
if (exp.test(url)) return true;
else return false;
}
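The version above rebuilds the expression on every call and escapes only three characters. A possible refinement, sketched here under the assumption that the list changes rarely, is to escape all regex metacharacters and compile the pattern once. escapeRegExp and buildListMatcher are illustrative names, not part of any library.

```javascript
// Escape every character that has a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Compile the whitelist into one alternation and return a reusable checker.
function buildListMatcher(list) {
  if (list.length === 0) {
    // An empty pattern would match every URL, so handle it explicitly.
    return function () { return false; };
  }
  var pattern = new RegExp(list.map(escapeRegExp).join("|"));
  return function (url) {
    return pattern.test(url);
  };
}

var isListed = buildListMatcher(["example1.com", "b.example2.com/some/content.html"]);
console.log(isListed("http://example1.com/some/site/content.html")); // true
console.log(isListed("http://example1xcom/"));                       // false
```

This keeps the substring semantics of the original indexOf loop; only the cost of building the pattern is moved out of the per-request path.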

regex for nested parenthesis

Using JavaScript, I'm trying to make a node.js module to parse predicate logic statements.
I've been working on this regex for a bit and I just can't get it to behave the way I want:
1. /\(((?:[^{}]*\([^{}]*\))*[^{}]*?)\)/
2. .replace(/\)((?:[^{}]*\)[^{}]*\))*[^{}]*?)\(/,'):::(')
The latter works fine on things like (a&s&d&a&s&d)->(a&s&(d)&s|(d)), but I just switched the delimiters...
what I'm trying to do is change a statement like
((r|a)&(s|r))&(~r)->(r|(q&r))->q
into
1->2->q
I can certainly write a procedural function to do it, and that would be a fine solution. But I'm really stuck on this.
The only real specification is the regex needs to respect the outermost parenthesis the most, and be able to replace separate ones.

Because this is not regex friendly, I put together a couple of functions that do what you are looking for. The first matches parentheses with depth:
function match_parens(code_to_test, level, opening, closing){
var sub_match, matched;
return code_to_test.replace(new RegExp('^([^'+opening+closing+']*(.))[\\s\\S]*$'), function(full_match, matched, $2, offset, original){
if ($2 == opening){
sub_match = match_parens(original.substr(offset+matched.length), level + 1, opening, closing);
matched = matched + sub_match
}
else if (level > 1){
sub_match = match_parens(original.substr(offset+matched.length), level - 1, opening, closing);
matched += sub_match;
}
return matched;
});
}
This function takes a string and returns everything up to and including the matching closing element.
The next function pulls apart a string passed to it, replacing all content in parentheses with increasing numbers:
function pull_apart(testString){
var count = 1,
returnString = '',
tempIndex = testString.indexOf('(');
while (tempIndex !== -1){
returnString += testString.substring(0,tempIndex)+count;
count += 1;
testString = testString.substring(testString.indexOf('(') + match_parens(testString.substr(tempIndex + 1), 1, '(', ')').length+1)
tempIndex = testString.indexOf('(');
}
returnString += testString;
return returnString;
}
Running pull_apart('((r|a)&(s|r))&(~r)->(r|(q&r))->q') returns "1&2->3->q", which is what you are looking for. While this is not entirely regex, it is utilized in the paren matching function up above. I'm not sure if this fits whatever use case you had in mind, but hopefully it helps.
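Since the question says a procedural function would also be acceptable, here is a minimal sketch of that alternative: a single pass that tracks nesting depth and replaces each outermost group with a running number. The name replaceTopLevelGroups is invented for this example, and unbalanced closing parentheses are simply ignored, which is an assumption rather than a requirement from the question.

```javascript
// Replace every outermost parenthesized group with 1, 2, 3, ...
function replaceTopLevelGroups(expr) {
  var depth = 0;   // current nesting level
  var count = 0;   // number of top-level groups seen so far
  var out = "";
  for (var i = 0; i < expr.length; i++) {
    var ch = expr.charAt(i);
    if (ch === "(") {
      if (depth === 0) {
        count += 1;
        out += count; // emit the placeholder when a top-level group opens
      }
      depth += 1;
    } else if (ch === ")") {
      if (depth > 0) {
        depth -= 1;   // a stray ")" at depth 0 is dropped
      }
    } else if (depth === 0) {
      out += ch;      // copy only characters outside any group
    }
  }
  return out;
}

console.log(replaceTopLevelGroups("((r|a)&(s|r))&(~r)->(r|(q&r))->q")); // 1&2->3->q
```

It produces the same result as pull_apart for the example input without recursion or regular expressions.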

Writing an inverse function in javascript?

I ran into a situation at work today where I needed to write the inverse of a function that I had already written, but I found it inefficient to write the inverse manually because it seems like I would be repeating a lot of my code, and if I were to update the original function I would have to update the inverse with the corresponding changes. The function I am talking about looks something like this:
var f = function(id, str) {
if (id === 0) {
return str.substring(0, 4) + " " + str.substring(4, 8);
} else if (id === 1) {
return str.substring(0, 3) + "/" + str.substring(3, 8);
} else if (id === 2) {
return str.substring(0, 4) + "-" + str.substring(4, 8);
} else if (id == 3) {
return str.substring(0, 3) + "," + str.substring(3, 8);
}
}
So for example f(0, "ABCDEFGH") will return "ABCD EFGH". I need an inverse function that uses the function f(id, str) to come up with inputs from the output. So finverse(formattedStr) should return a dictionary of the corresponding inputs. For example, finverse("ABCD EFGH") should return { id: 0, str: "ABCDEFGH" }. Would it be possible to make use of the existing function f to write this inverse such that even if I were to update the original function with an extra "else if" clause, I wouldn't have to update finverse. In other words I do not want to manually construct finverse with if statements to map the outputs back to the inputs, rather I want to manipulate the original function somehow to come up with an inverse. Is this possible in javascript?

With a slight refactoring, the task is actually pretty simple. You don't need all those ifs: a lookup table does the same job, and unlike conditionals buried inside the function body, it can be read and modified from outside.
We can accomplish a translation (1 in, 1+ out) without any flow logic:
// replace all the IF logic with an externally-modifiable logic table:
f.lut=[ [4," "], [3,"/"], [4,"-"], [3,","] ]; //(id=index, 0=pos, 1=char)
// simplify f() using the table to make choices instead of conditionals:
function f(id, str) {
id = f.lut[id];
return str.substring(0, id[0]) + id[1] + str.substring(id[0], 8);
}
// use the same table in reverse to compose an inverse function:
function finverse(s){
return {
id: +f.lut.map(function(A,i){ return A[1]==s.split(/[\w]+/).filter(Boolean)[0] ?
String(i):
""
}).filter(Boolean)[0][0],
str: s.split(/[\W]+/).filter(Boolean).join('')
};
}
// first, test new version of f():
f(0, "ABCDEFGH") // ABCD EFGH
f(1, "ABCDEFGH") // ABC/DEFGH
f(2, "ABCDEFGH") // ABCD-EFGH
f(3, "ABCDEFGH") // ABC,DEFGH
// now, test the inverse:
finverse("ABCD EFGH") //{id:0, str:"ABCDEFGH"}
finverse("ABC/DEFGH") //{id:1, str:"ABCDEFGH"}
finverse("ABCD-EFGH") //{id:2, str:"ABCDEFGH"}
finverse("ABC,DEFGH") //{id:3, str:"ABCDEFGH"}
let us know if this isn't what you were wanting, i wasn't 100% sure...
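A somewhat more defensive variant of the same idea is sketched below: it walks the rule table, and confirms each candidate by running it back through the forward function, so the inverse actually reuses the formatter as the question asked. The names rules, format and invertFormat are placeholders for this sketch; the table contents are the ones from the answer above.

```javascript
// Rule table: index = id, [0] = insert position, [1] = separator.
var rules = [[4, " "], [3, "/"], [4, "-"], [3, ","]];

// Forward function driven by the table (same behaviour as f above).
function format(id, str) {
  var rule = rules[id];
  return str.substring(0, rule[0]) + rule[1] + str.substring(rule[0], 8);
}

// Inverse: try each rule and accept the first one that round-trips.
function invertFormat(formatted) {
  for (var id = 0; id < rules.length; id++) {
    var pos = rules[id][0];
    var sep = rules[id][1];
    if (formatted.slice(pos, pos + sep.length) !== sep) {
      continue; // separator is not where this rule would put it
    }
    var str = formatted.slice(0, pos) + formatted.slice(pos + sep.length);
    if (format(id, str) === formatted) {
      return { id: id, str: str };
    }
  }
  return null; // no rule produces this output
}

console.log(invertFormat("ABC/DEFGH")); // { id: 1, str: 'ABCDEFGH' }
```

Adding a rule then means adding one table entry; neither function needs to change. Unlike the map/filter version, this also works for ids of 10 and above and returns null for unrecognized input.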

There is really no way to make this work perfectly: a fully general inverse cannot be implemented with good performance. So here are two ways of approaching the problem:
First, make a global object named fRules holding the rules that are used in f().
fRules = [
{
id: 0,
char: ' ',
insertPosition: 4
},
// ... other rules ...
];
Then f() simply looks up the rule with the needed id, and fInverse iterates over the array of rules to find the matching one. When a rule changes, you no longer need to change f(), only fRules.
Second, call f.toString() to get the source text of the function and parse it into an abstract syntax tree with some parser, such as the internals of UglifyJS. You would then have to write an inverter by hand based on the function's syntax tree. It is an ugly idea.

How to achieve String Manipulation in JavaScript

The problem is this: I have a contract. On every monthly renewal, a renewal identifier should be appended to the contract name. For example, at the beginning the name is myContract; on the first renewal the name should become myContract-R1, on the next renewal myContract-R2, and so on. The name should change automatically on each renewal. How can I do this in jQuery?

This is a JavaScript question, not a jQuery question. jQuery adds little to JavaScript's built-in string manipulation.
It sounds like you want to take a string in the form "myContract" or "myContract-Rx" and have a function that appends "-R1" (if there's no "-Rx" already) or increments the number that's there.
There's no shortcut for that, you have to do it. Here's a sketch that works, I expect it could be optimized:
function incrementContract(name) {
var match = /^(.*)-R([0-9]+)$/.exec(name);
if (match) {
// Increment previous revision number
name = match[1] + "-R" + (parseInt(match[2], 10) + 1);
}
else {
// No previous revision number
name += "-R1";
}
return name;
}

You can use a regular expression for this:
s = s.replace(/(-R\d+)?$/, function(m) {
return '-R' + (m.length === 0 ? 1 : parseInt(m.substr(2), 10) + 1);
});
The pattern (-R\d+)?$ will match the revision number (-R\d+) if there is one (?), and the end of the string ($).
The replacement will return -R1 if there was no revision number before, otherwise it will parse the revision number and increment it.
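Wrapped in a function, the same replacement can be applied once per renewal. This is only a sketch; the name nextContractName is made up here.

```javascript
// Append "-R1" on the first renewal, otherwise bump the existing number.
function nextContractName(name) {
  return name.replace(/(-R\d+)?$/, function (suffix) {
    // suffix is "" when there is no revision yet, else something like "-R7"
    return "-R" + (suffix.length === 0 ? 1 : parseInt(suffix.slice(2), 10) + 1);
  });
}

var name = "myContract";
name = nextContractName(name); // "myContract-R1"
name = nextContractName(name); // "myContract-R2"
console.log(name);
```

Because the pattern is anchored at the end of the string, digits elsewhere in the contract name are left alone.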

How do you get the renewal number? Is it calculated from a date, or read from a database?
var renewal = 1,
name = 'myContract',
newname = name+'-R'+renewal;
or maybe like
$(function(){
function renew(contract){
var num_re = /\d+/,
num = contract.match(num_re);
if (num==null) {
return contract+'-R1';
} else {
return contract.replace(num_re,++num[0]);
}
}
var str = 'myContract';
var new_contract = renew(str); // myContract-R1
new_contract = renew(new_contract); // myContract-R2
new_contract = renew(new_contract); // myContract-R3
});
jQuery can't help you here. This is pure JavaScript string handling.
P.S. The regular expression I use here is a simple one that is not specific to your example (but it works). It is better to use the regular expression from T.J. Crowder's example.

javascript parseFloat '500,000' returns 500 when I need 500000

What would be a nice way of handling this?
I already thought of removing the comma and then parsing to float.
Do you know a better/cleaner way?
Thanks

parseFloat( theString.replace(/,/g,'') );

I don't know why no one has suggested this expression:
parseFloat( theString.replace(/[^\d\.]/g,'') );
It removes every character except digits and periods. Note that this also strips a leading minus sign, so negative numbers lose their sign. You don't need custom functions or loops for this; that's just overkill.

Nope. Remove the comma.

You can use the string replace method, but not in a one liner as a regexp allows.
while(str.indexOf(',')!=-1)str= str.replace(',','');
parseFloat(str);
Or, to make a single expression without a regexp:
return parseFloat(str.split(',').join(''));
I'd use the regexp.

I don't have enough reputation to add a comment, but for anyone wondering on the performance for regex vs split/join, here's a quick fiddle: https://jsfiddle.net/uh3mmgru/
var test = "1,123,214.19";
var t0 = performance.now();
for (var i = 0; i < 1000000; i++)
{
var a = parseFloat(test.replace(/,/g,''));
}
var t1 = performance.now();
document.write('Regex took: ' + (t1 - t0) + ' ms');
document.write('<br>')
var t0 = performance.now();
for (var i = 0; i < 1000000; i++)
{
var b = parseFloat(test.split(',').join(''));
}
var t1 = performance.now();
document.write('Split/join took: ' + (t1 - t0) + ' ms');
The results I get are (for 1 million loops each):
Regex: 263.335 ms
Split/join: 1035.875 ms
So, at least in my browser, regex is the faster option in this scenario.

Building on the idea from @kennebec, if you want to make sure that the commas are correct, and you don't want to replace commas, you could try something like this:
function myParse(num) {
var n2 = num.split(",");
var out = 0; // declared with var so it does not leak into the global scope
for(var i = 0; i < n2.length; i++) {
out *= 1000;
out += parseFloat(n2[i])
}
return out
}
alert(myParse("1,432,85"));
// Returns 1432085, as the comma is misplaced.
It may not be as fast, but you wanted alternatives :)
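If the goal is to reject misplaced commas rather than reinterpret them, one option is to validate the grouping first and only then strip the separators. The sketch below assumes US-style formatting (comma for thousands, period for decimals); parseGroupedNumber is a hypothetical name.

```javascript
// Accepts "1234", "1,234", "-1,234,567.89"; rejects "1,432,85" or "12,34".
function parseGroupedNumber(text) {
  // Either plain digits, or 1-3 digits followed by groups of exactly three.
  var wellFormed = /^-?(\d+|\d{1,3}(,\d{3})+)(\.\d+)?$/;
  if (!wellFormed.test(text)) {
    return NaN; // signal bad input the same way parseFloat does
  }
  return parseFloat(text.replace(/,/g, ""));
}

console.log(parseGroupedNumber("500,000"));  // 500000
console.log(parseGroupedNumber("1,432,85")); // NaN
```

Returning NaN keeps the function a drop-in replacement for parseFloat in code that already checks the result with isNaN.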

What about a simple function to solve most of the common problems?
function getValue(obj) {
var value = parseFloat( $(obj).val().replace(/,/g,'') ).toFixed(2);
return +value;
}
The above function gets values from fields (using jQuery), assuming the entered values are numeric (I prefer to validate fields while the user is entering data, so I know for sure the field content is numeric).
For floating point values that are well formatted in the field, the function returns the floating point value correctly, rounded to two decimal places by toFixed(2).
This function is far from complete, but it quickly fixes the "," (comma) issue for values entered as 1,234.56 or 1,234,567. It returns a valid number as long as the content is numeric.
The + (plus) sign in front of the variable in the return statement is a "dirty trick" used in JavaScript to make sure the returned value is numeric: toFixed() returns a string, and the unary plus converts it back to a number.
It is easy to adapt the function to other purposes, for instance to convert strings to numeric values while taking care of the "," (comma) issue:
function parseValue(str) {
var value = parseFloat( str.replace(/,/g,'') ).toFixed(2);
return +value;
}
Both operations can even be combined in one function, i.e.:
function parseNumber(item,isField=false) {
var value = (isField) ? parseFloat( $(item).val().replace(/,/g,'') ).toFixed(2) : parseFloat( item.replace(/,/g,'') ).toFixed(2);
return +value;
}
In that case, if the function is called as result = parseNumber('12,092.98'); it parses the value as a string. But if it is called as result = parseNumber('#MyField', true); it tries to obtain the value from '#MyField'.
As I said before, such functions are far from complete and can be expanded in many ways. One idea is to check the first character of the given parameter (string) and decide, based on the string format, where to obtain the value to be parsed (if the first character is '#', it is the ID of a DOM object; otherwise, if it begins with a number, it must be a string to be parsed).
Try it... Happy coding.
