Parsing with or without regular expressions? Which one is faster? - javascript

Say I have an array of strings of the following format:
"array[5] = 10"
What would be the best solution to parse it in JavaScript?
Ashamedly not being familiar with regular expressions, I can come up only with something like this:
for (var i in lines) {
var start = lines[i].indexOf("array[");
if (start >= 0) {
var pair = lines[i].substring(start + 6).trim().split('=');
var index = pair[0].trim().substring(0, pair[0].trim().length - 1);
var value = pair[1].trim();
}
}
Is there a more elegant way to parse something like this? If the answer is using regex, would it make the code slower?

Don't ask which approach is faster; measure it!
This is a regular expression that should match what you've implemented in your code:
/array\[(\d+)]\s*=\s*(.+)/
To help you learn regular expressions, you can use a tool like Regexper to visualize an expression such as the one above.
Note how for the index I assumed it should be an integer, but for the value any characters are accepted. Your code doesn't specify that either the index or value should be numbers, but I made some assumptions to that effect. I leave it as an exercise to the reader to tweak the expression to something more fitting if need be.
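One way to act on the "measure it" advice is a small timing harness. The sketch below is only illustrative: the sample data, the helper names (parseWithRegex, parseWithIndexOf, timeIt) and the round count are assumptions, and a Date.now() loop is a rough measurement, not a rigorous benchmark.

```javascript
// Hypothetical sample data; real input would come from your own array of lines.
const sampleLines = [];
for (let i = 0; i < 10000; i++) {
  sampleLines.push("array[" + i + "] = " + (i * 2));
}

// Regex version, using the expression from this answer.
function parseWithRegex(line) {
  const m = /array\[(\d+)]\s*=\s*(.+)/.exec(line);
  return m ? { index: m[1], value: m[2].trim() } : null;
}

// indexOf/substring version, roughly equivalent for well-formed input.
function parseWithIndexOf(line) {
  const start = line.indexOf("array[");
  if (start < 0) return null;
  const close = line.indexOf("]", start);
  const eq = line.indexOf("=", close);
  if (close < 0 || eq < 0) return null;
  return {
    index: line.substring(start + 6, close),
    value: line.substring(eq + 1).trim()
  };
}

// Runs one parser over the sample data several times and logs the elapsed time.
function timeIt(label, parse) {
  const started = Date.now();
  let parsed = 0;
  for (let round = 0; round < 20; round++) {
    for (const line of sampleLines) {
      if (parse(line)) parsed++;
    }
  }
  console.log(label + ": " + (Date.now() - started) + " ms for " + parsed + " parsed lines");
  return parsed;
}

timeIt("regex", parseWithRegex);
timeIt("indexOf", parseWithIndexOf);
```

The numbers depend heavily on the engine and the input, so run it against data that resembles your own before drawing conclusions.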

If you want a regular expression approach, then, something like so will do the trick: ^".*?\[(\d+)\]\s*=\s*(\d+)"$. This will match and extract the number you have in your square brackets (\[(\d+)\]) and also any numbers you will have at the end just before the " sign.
Once matched, it will put them into a group which you can then eventually access. Please check this previous SO post to see how you can access said groups.
I can't comment on speed, but usually regular expressions make string processing code more compact, the drawback of which is that the code is usually more difficult to read (depending on the complexity of the expression).
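As a minimal sketch of accessing the groups, assuming the double quotes really are part of each string (as the expression above expects), the captured values are available at positions 1 and 2 of the match result. The function name extract is made up for illustration.

```javascript
// Expression from this answer: group 1 is the index, group 2 the value.
const quotedPattern = /^".*?\[(\d+)\]\s*=\s*(\d+)"$/;

// Hypothetical helper: returns the two numbers, or null when the line does not match.
function extract(line) {
  const m = quotedPattern.exec(line);
  if (m === null) return null;
  return { index: Number(m[1]), value: Number(m[2]) };
}

console.log(extract('"array[5] = 10"')); // { index: 5, value: 10 }
console.log(extract('array[5] = 10'));   // null, because the surrounding quotes are missing
```

If the quotes are only string delimiters in your source code and not part of the data, drop the two `"` characters from the expression.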

Regex is usually slower than locating a given character by index, and that tends to hold in most languages.
In your case, don't use split but only substring at given index.
Moreover, some hints to improve performance: pair[0].trim() is called twice, and the first trim (right after substring) is unnecessary because each part is trimmed again afterwards.
It's all about algorithms…
Here is an implementation that should be faster:
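for (var i = 0; i < lines.length; i++) {
var line = lines[i];
var i1 = line.indexOf("[");
var i2 = line.indexOf("]", i1);
var i3 = line.indexOf("=", i2);
if (i1 >= 0 && i2 > i1 && i3 > i2) {
// skip the "[" itself; the end index of substring is exclusive
var index = line.substring(i1 + 1, i2);
// skip the "=" and take everything up to the end of the line
var value = line.substring(i3 + 1).trim();
}
}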

If all you want to do is extract the index and value, you don't need to parse the string (which implies tokenising and processing). Just find the bits you want and extract them.
If your strings are always like "array[5] = 10" and the values are always integers, then:
var nums = s.match(/\d+/g); // the g flag is needed to get every number, not just the first
var index = nums[0];
var value = nums[1];
should do the trick. If there is a chance that there will be no matches, then you might want:
var index = nums && nums[0];
var value = nums && nums[1];
and deal with cases where index or value are null to avoid errors.
If you genuinely want to parse the string, there's a bit more work to do.

Related

Javascript slice a string into a chunks of specified length stored in variable

I would like to slice a JavaScript string into an array of strings of a specified length (the length can vary), so I would like to have the length parameter as a separate variable:
var length = 3;
var string = 'aaabbbcccddd';
var myArray = string.match(/(.{3})/g);
How to use length variable in match?
Or any other solution similar to str_split in PHP.
My question is not a duplicate of:
"Split large string in n-size chunks in JavaScript", because I know how to slice; the question is how to use a variable in match.
I couldn't get it working with the answers to "Javascript Regex: How to put a variable inside a regular expression?" either.
Well, string.substr() is a better option if you always have to split by length only.
But in case you are curious how to do it with a regex, you can put the variable into your RegExp in the following way.
var length = 3;
let reg = new RegExp(`(.{${length}})`, 'g')
var string = 'aaabbbcccddd';
var myArray = string.match(reg);
console.log(myArray);
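The non-regex route mentioned at the top of this answer can be sketched as a plain loop. The function name chunk and its argument check are illustrative assumptions, and slice is used here instead of substr.

```javascript
// Hypothetical helper: splits text into pieces of at most `size` characters.
function chunk(text, size) {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const parts = [];
  for (let i = 0; i < text.length; i += size) {
    parts.push(text.slice(i, i + size)); // slice clamps at the end of the string
  }
  return parts;
}

console.log(chunk('aaabbbcccddd', 3)); // [ 'aaa', 'bbb', 'ccc', 'ddd' ]
console.log(chunk('aaabbbcc', 3));     // [ 'aaa', 'bbb', 'cc' ]
```

Unlike the `.{3}` pattern, which silently drops a trailing remainder such as 'cc', this keeps the last short piece. With a regex, `.{1,3}` has the same effect.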
I do not know whether you have already fixed the issue.
I read your question yesterday but could not answer it because it was on hold or marked as a duplicate; I found a fix later. I hope it helps if you have not solved it yet.
What I used is new RegExp(); please see the fiddle:
var length = 3;
var string = 'aaabbbcccddd';
var dynamicRegExp = new RegExp("(.{" + length + "})", "g");
console.log("Regex used: "+ dynamicRegExp);
var myArray = string.match(dynamicRegExp);
console.log("Output: "+ myArray);
Syntax
new RegExp(pattern[, flags])
RegExp(pattern[, flags])
Parameters
pattern
The text of the regular expression or, as of ES5, another RegExp object (or literal) to copy (the latter for the two RegExp constructor notations only).
flags
If specified, flags indicates the flags to add, or if an object is supplied for the pattern, the flags value will replace any of that object's flags (and lastIndex will be reset to 0) (as of ES2015). If flags is not specified and a regular expressions object is supplied, that object's flags (and lastIndex value) will be copied over.
What @Code Maniac posted is also correct and essentially the same as this.

How to catch empty string when using split length

Using the JS split function on an empty string returns an array of length 1; that makes sense, of course.
I had a situation in which I needed to count the number of ID's inside a comma-separated string. When just using string.split(',').length on an empty string it will return 1, which won't correspond with the actual number of ID's inside the string (but is just the default behavior of split, since a single - empty - element is returned).
To catch this, I wrote the code below. But something tells me that this isn't the most excellent solution. I would like to improve the code below and therefore get a better understanding of best practice.
Hopefully someone could help out here and provide some feedback on my issue:
What's the best way to count the number of ID's inside a comma-separated string, with respect to empty strings?
var str1 = '12,16,91,89,43';
var str2 = '';
if(!str1)
countRight = 0;
else
countRight = str1.split(',').length;
if(!str2)
countWrong = 0;
else
countWrong = str2.split(',').length;
Your code is a little verbose. How about
count = str ? str.split (',').length : 0;
or shorter but a little more obscure:
count = +(str && str.split(',').length);
or even
count = str.length && str.split (',').length;
which can be shortened to
count = (str && str.split (',')).length;
Off topic : If your strings are coming from user input I would recommend allowing spaces by splitting using the regular expression /\s*,\s*/
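Combining the empty-string check with the whitespace-tolerant split could look like the sketch below. The function name countIds is made up, and ignoring empty entries (as in "12,,16" or a trailing comma) is an assumption about what should count as an ID.

```javascript
// Hypothetical helper: counts non-empty, comma-separated entries.
function countIds(str) {
  return str
    .split(/\s*,\s*/)                       // tolerate spaces around the commas
    .filter(function (id) { return id.trim() !== ""; }) // drop empty entries
    .length;
}

console.log(countIds('12,16,91,89,43')); // 5
console.log(countIds(''));               // 0
console.log(countIds(' 12 , 16 '));      // 2
console.log(countIds('12,,16,'));        // 2
```

Because empty entries are filtered out, no separate test for the empty string is needed.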

How to compare two Strings and get Different part

I have two strings:
var str1 = "A10B1C101D11";
var str2 = "A1B22C101D110E1";
What I intend to do is to tell the difference between them, the result will look like
A10B1C101D11
A10 B22 C101 D110E1
It follows the same pattern, one character and a number. And if the character doesn't exist or the number is different between them, I will say they are different, and highlight the different part. Can regular expression do it or any other good solution? thanks in advance!
Let me start by stating that regexp might not be the best tool for this. As the strings have a simple format that you are aware of it will be faster and safer to parse the strings into tokens and then compare the tokens.
However, you can do this with a regexp, although in older JavaScript engines (before ES2018) you are hampered by the lack of lookbehind.
The way to do this is to use negative lookahead to prevent matches that are included in the other string. Where lookbehind is not available, you need to search from both directions.
We do this by concatenating the strings, with a delimiter that we can test for.
If using '|' as a delimiter the regexp becomes;
/(\D\d*)(?=(?:\||\D.*\|))(?!.*\|(.*\d)?\1(\D|$))/g
To find the tokens in the second string that are not present in the first you do;
var bothstring=str2.concat("|",str1);
var re=/(\D\d*)(?=(?:\||\D.*\|))(?!.*\|(.*\d)?\1(\D|$))/g;
var match=re.exec(bothstring);
Subsequent calls to re.exec will return later matches. So you can iterate over them as in the following example;
while (match != null) {
alert("\"" + match[0] + "\" At position " + match.index);
match = re.exec(bothstring); // keep searching the same concatenated string
}
As stated this gives tokens in str2 that are different in str1. To get the tokens in str1 that are different use the same code but change the order of str1 and str2 when you concatenate the strings.
The above code might not be safe when dealing with potentially dirty input. In particular, it might misbehave if fed a string like "A100|A100": the first A100 will not be reported as missing, because the regexp is not aware that the source is supposed to be two different strings. If this is a potential issue, search the input for occurrences of the delimiting character first.
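The token-based alternative recommended at the start of this answer can be sketched as follows. It assumes every token is one non-digit character followed by optional digits, and that "different" means the exact token does not occur anywhere in the other string. The helper names tokenize and missingFrom are illustrative.

```javascript
// Splits "A10B1C101" into ["A10", "B1", "C101"].
function tokenize(s) {
  return s.match(/\D\d*/g) || [];
}

// Returns the tokens of `source` that do not appear in `other`.
function missingFrom(source, other) {
  const known = new Set(tokenize(other));
  return tokenize(source).filter(function (token) {
    return !known.has(token);
  });
}

const first = "A10B1C101D11";
const second = "A1B22C101D110E1";

console.log(missingFrom(second, first)); // [ 'A1', 'B22', 'D110', 'E1' ]
console.log(missingFrom(first, second)); // [ 'A10', 'B1', 'D11' ]
```

This avoids the delimiter problem entirely, since the two strings are never concatenated.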
You can break each string into an array of characters
var aStr1 = str1.split('');
var aStr2 = str2.split('');
Then check which one has more characters, and save the smaller number
var totalCharacters;
if(aStr1.length > aStr2.length) {
totalCharacters = aStr2.length
} else {
totalCharacters = aStr1.length
}
And loop comparing both
var diff = [];
for(var i = 0; i<totalCharacters; i++) {
if(aStr1[i] != aStr2[i]) {
diff.push(aStr1[i]); // or something else
}
}
At the very end you can concat those last characters from the bigger String (since they obviously are different from the other one).
Does that help?

regex - get numbers after certain character string

I have a text string of arbitrary length to which I would like to append an order number, so that I can pluck the order number off again when I need it. Since the number can be of variable length, I would like a regular expression that catches everything after the = sign in the string ?order_num=
So the whole string would be
"aijfoi aodsifj adofija afdoiajd?order_num=3216545"
I've tried to use the online regular expression generator but with no luck. Can someone please help me with extracting the number on the end and putting them into a variable and something to put what comes before the ?order_num=203823 into its own variable.
I'll post some attempts of my own, but I foresee failure and confusion.
var s = "aijfoi aodsifj adofija afdoiajd?order_num=3216545";
var m = s.match(/([^\?]*)\?order_num=(\d*)/);
var num = m[2], rest = m[1];
But remember that regular expressions can be slower than plain string methods. Use indexOf and substring/slice when you can. For example:
var p = s.indexOf("?");
var num = s.substring(p + "?order_num=".length), rest = s.substring(0, p);
I see no need for regex for this:
var str="aijfoi aodsifj adofija afdoiajd?order_num=3216545";
var n=str.split("?");
n will then be an array, where index 0 is before the ? and index 1 is after.
Another example:
var str="aijfoi aodsifj adofija afdoiajd?order_num=3216545";
var n=str.split("?order_num=");
Will give you the result:
n[0] = aijfoi aodsifj adofija afdoiajd and
n[1] = 3216545
You can take the substring from the first instance of ? onward and then apply a regex to it. This removes most of the complexity from the expression and improves performance (the gain is probably negligible and not something to worry about unless you are doing this over thousands of iterations). In addition, this will match order_num= at any point within the query string, not just at the very end.
var match = s.substr(s.indexOf('?')).match(/order_num=(\d+)/);
if (match) {
alert(match[1]);
}

Splitting Nucleotide Sequences in JS with Regexp

I'm trying to split up a nucleotide sequence into amino acid strings using a regular expression. I have to start a new string at each occurrence of the string "ATG", but I don't want to actually stop the first match at the "ATG". Valid input is any ordering of a string of As, Cs, Gs, and Ts.
For example, given the input string: ATGAACATAGGACATGAGGAGTCA
I should get two strings: ATGAACATAGGACATGAGGAGTCA (the whole thing) and ATGAGGAGTCA (the first match of "ATG" onward). A string that contains "ATG" n times should result in n results.
I thought the expression /(?:[ACGT]*)(ATG)[ACGT]*/g would work, but it doesn't. If this can't be done with a regexp it's easy enough to just write out the code for, but I always prefer an elegant solution if one is available.
If you really want to use regular expressions, try this:
var str = "ATGAACATAGGACATGAGGAGTCA",
re = /ATG.*/g, match, matches=[];
while ((match = re.exec(str)) !== null) {
matches.push(match[0]); // match[0] is the matched text; match itself is the result array
re.lastIndex = match.index + 3;
}
But be careful with exec and changing the index. You can easily make it an infinite loop.
Otherwise you could use indexOf to find the indices and substr to get the substrings:
var str = "ATGAACATAGGACATGAGGAGTCA",
offset = 0, matches = [];
// always search and slice the original string so the offsets stay consistent
while ((offset = str.indexOf("ATG", offset)) > -1) {
matches.push(str.substr(offset));
offset += 3;
}
I think what you want is
var subStrings = inputString.split('ATG');
KISS :)
Splitting a string before each occurrence of ATG is simple, just use
result = subject.split(/(?=ATG)/i);
(?=ATG) is a positive lookahead assertion, meaning "Assert that you can match ATG starting at the current position in the string".
This will split GGGATGTTTATGGGGATGCCC into GGG, ATGTTT, ATGGGG and ATGCCC.
So now you have an array of (in this case four) strings. I would now take those, discard the first one if it does not start with ATG (it is whatever precedes the first ATG, so it will never contain ATG), and then join the strings no. 2 + ... + n, then 3 + ... + n etc. until you have exhausted the list.
Of course, this regex doesn't do any validation as to whether the string only contains ACGT characters as it only matches positions between characters, so that should be done before, i. e. that the input string matches /^[ACGT]*$/i.
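Put together, the validate, split and join steps might look like the sketch below. The function name suffixesFromAtg is hypothetical, and the leading chunk is dropped only when it does not start with ATG, because JavaScript's split produces no empty first element when the input itself begins with ATG.

```javascript
// Hypothetical helper: returns, for each ATG, the text from that ATG to the end.
function suffixesFromAtg(subject) {
  if (!/^[ACGT]*$/i.test(subject)) {
    throw new Error("input may only contain A, C, G and T");
  }
  // Split before each ATG; the lookahead consumes no characters.
  const parts = subject.split(/(?=ATG)/i);
  // Drop whatever precedes the first ATG, if anything.
  if (parts.length > 0 && !/^ATG/i.test(parts[0])) {
    parts.shift();
  }
  const results = [];
  for (let i = 0; i < parts.length; i++) {
    results.push(parts.slice(i).join("")); // chunk i plus everything after it
  }
  return results;
}

console.log(suffixesFromAtg("ATGAACATAGGACATGAGGAGTCA"));
// [ 'ATGAACATAGGACATGAGGAGTCA', 'ATGAGGAGTCA' ]
console.log(suffixesFromAtg("GGGATGTTTATGGGGATGCCC"));
// [ 'ATGTTTATGGGGATGCCC', 'ATGGGGATGCCC', 'ATGCCC' ]
```

Joining the chunks rebuilds each suffix at the cost of some copying; for very long sequences, recording the match positions and slicing the original string once per position would avoid that.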
Since you want to capture from every "ATG" to the end split isn't right for you. You can, however, use replace, and abuse the callback function:
var matches = [];
seq.replace(/atg/gi, function(m, pos){ matches.push(seq.substr(pos)); });
This isn't with regex, and I don't know if this is what you consider "elegant," but...
var sequence = 'ATGAACATAGGACATGAGGAGTCA';
var matches = [];
do {
matches.push('ATG' + (sequence = sequence.slice(sequence.indexOf('ATG') + 3)));
} while (sequence.indexOf('ATG') !== -1); // also continue when the next ATG is at position 0
I'm not completely sure if this is what you're looking for. For example, with an input string of ATGabcdefghijATGklmnoATGpqrs, this returns ATGabcdefghijATGklmnoATGpqrs, ATGklmnoATGpqrs, and ATGpqrs.
