regex - get numbers after certain character string - javascript

I have a text string of arbitrary length, and I would like to attach an order number to the end of it. Then I can pluck off the order number when I need to use it again. Since the number may vary in length, I would like a regular expression that catches everything after the = sign in the string ?order_num=
So the whole string would be
"aijfoi aodsifj adofija afdoiajd?order_num=3216545"
I've tried to use the online regular expression generator but with no luck. Can someone please help me with extracting the number on the end and putting them into a variable and something to put what comes before the ?order_num=203823 into its own variable.
I'll post some attempts of my own, but I foresee failure and confusion.

var s = "aijfoi aodsifj adofija afdoiajd?order_num=3216545";
var m = s.match(/([^\?]*)\?order_num=(\d*)/);
var num = m[2], rest = m[1];
But remember that regular expressions can be slower than plain string methods. Use indexOf and substring/slice when you can. For example:
var p = s.indexOf("?");
var num = s.substring(p + "?order_num=".length), rest = s.substring(0, p);
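A minimal sketch of the same indexOf idea with a guard for input that lacks the marker. The helper name splitOrderNum is made up for illustration, and it assumes the marker to honor is the last "?order_num=" in the string, since the free text in front could itself contain a "?".

```javascript
// Hypothetical helper: split a string at the last "?order_num=" marker.
function splitOrderNum(s) {
    var marker = "?order_num=";
    // lastIndexOf, because the free text before the marker may contain "?".
    var p = s.lastIndexOf(marker);
    if (p === -1) {
        return null; // marker not present
    }
    return {
        rest: s.substring(0, p),
        num: s.substring(p + marker.length)
    };
}

var parts = splitOrderNum("aijfoi aodsifj adofija afdoiajd?order_num=3216545");
// parts.rest is "aijfoi aodsifj adofija afdoiajd", parts.num is "3216545"
```

Returning null for a missing marker lets the caller decide what to do instead of silently slicing at the wrong position.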

I see no need for regex for this:
var str="aijfoi aodsifj adofija afdoiajd?order_num=3216545";
var n=str.split("?");
n will then be an array, where index 0 is before the ? and index 1 is after.
Another example:
var str="aijfoi aodsifj adofija afdoiajd?order_num=3216545";
var n=str.split("?order_num=");
Will give you the result:
n[0] = aijfoi aodsifj adofija afdoiajd and
n[1] = 3216545

You can take the substring from the first instance of ? onward and then apply a much simpler regex. This may also improve performance, although the gain is probably negligible and not worth worrying about unless you are doing this over thousands of iterations. In addition, this will match order_num= at any point within the query string, not just at the very end.
var match = s.substr(s.indexOf('?')).match(/order_num=(\d+)/);
if (match) {
alert(match[1]);
}

Related

Parsing with or without regular expressions? Which one is faster?

Say I have an array of strings of the following format:
"array[5] = 10"
What would be the best solution to parse it in JavaScript?
Ashamedly not being familiar with regular expressions, I can come up only with something like this:
for (var i in lines) {
    var start = lines[i].indexOf("array[");
    if (start >= 0) {
        var pair = lines[i].substring(start + 6).trim().split('=');
        var index = pair[0].trim().substring(0, pair[0].trim().length - 1);
        var value = pair[1].trim();
    }
}
Is there a more elegant way to parse something like this? If the answer is using regex, would it make the code slower?
Don't ask which approach is faster; measure it!
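One rough way to measure it, sketched under assumptions: the sample data, the round count, and the helper names below are all made up for illustration. Timings depend heavily on the engine and the input, so treat the printed numbers as indicative only.

```javascript
// Made-up sample input for illustration.
var sampleLines = [];
for (var k = 0; k < 1000; k++) {
    sampleLines.push("array[" + k + "] = " + (k * 2));
}

// Approach 1: regular expression.
function parseWithRegex(line) {
    var m = /array\[(\d+)\]\s*=\s*(.+)/.exec(line);
    return m ? [m[1], m[2].trim()] : null;
}

// Approach 2: indexOf and substring.
function parseWithIndexOf(line) {
    var open = line.indexOf("array[");
    var close = line.indexOf("]", open);
    var eq = line.indexOf("=", close);
    if (open < 0 || close < 0 || eq < 0) return null;
    return [line.substring(open + 6, close), line.substring(eq + 1).trim()];
}

// Run a parser over all sample lines several times and return elapsed ms.
function timeIt(parse, rounds) {
    var start = Date.now();
    for (var r = 0; r < rounds; r++) {
        for (var j = 0; j < sampleLines.length; j++) {
            parse(sampleLines[j]);
        }
    }
    return Date.now() - start;
}

console.log("regex:   " + timeIt(parseWithRegex, 200) + " ms");
console.log("indexOf: " + timeIt(parseWithIndexOf, 200) + " ms");
```

Before comparing speed, it is worth checking that both parsers return the same result for the same line; otherwise the comparison is meaningless.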
This is a regular expression that should match what you've implemented in your code:
/array\[(\d+)]\s*=\s*(.+)/
To help you learn regular expressions, you can use a tool like Regexper to visualize the expression.
Note how for the index I assumed it should be an integer, but for the value any characters are accepted. Your code doesn't specify that either the index or value should be numbers, but I made some assumptions to that effect. I leave it as an exercise to the reader to tweak the expression to something more fitting if need be.
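As a sketch of how that expression could be applied, the captured groups are available at positions 1 and 2 of the result of exec. The inputLines array and the shape of the parsed objects are assumptions for illustration.

```javascript
var pattern = /array\[(\d+)]\s*=\s*(.+)/;
// Made-up input; the second entry is there to show the no-match case.
var inputLines = ["array[5] = 10", "not a match", "array[12]=hello"];
var parsed = [];

for (var n = 0; n < inputLines.length; n++) {
    var m = pattern.exec(inputLines[n]);
    if (m) { // exec returns null when the line does not match
        parsed.push({ index: Number(m[1]), value: m[2].trim() });
    }
}
// parsed holds { index: 5, value: "10" } and { index: 12, value: "hello" }
```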
If you want a regular expression approach, then, something like so will do the trick: ^".*?\[(\d+)\]\s*=\s*(\d+)"$. This will match and extract the number you have in your square brackets (\[(\d+)\]) and also any numbers you will have at the end just before the " sign.
Once matched, it will put them into a group which you can then eventually access. Please check this previous SO post to see how you can access said groups.
I can't comment on speed, but usually regular expressions make string processing code more compact, the drawback of which is that the code is usually more difficult to read (depending on the complexity of the expression).
Regex is generally slower than finding the index of a given char and slicing there, whatever the language.
In your case, don't use split but only substring at given index.
Moreover, some hints to improve performance: pair[0].trim() is called twice, and the trim before split is redundant because pair[0] and pair[1] are trimmed afterwards anyway.
It's all about algorithms…
Here is a faster implementation :
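for (var i = 0; i < lines.length; i++) {
    var line = lines[i];
    var i1 = line.indexOf("[");
    var i2 = line.indexOf("]", i1);
    var i3 = line.indexOf("=", i2);
    if (i1 >= 0 && i2 > i1 && i3 > i2) {
        // Skip the "[" itself; stop just before "]".
        var index = line.substring(i1 + 1, i2);
        // Skip the "=" itself; take everything up to the end of the line.
        var value = line.substring(i3 + 1).trim();
    }
}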
If all you want to do is extract the index and value, you don't need to parse the string (which implies tokenising and processing). Just find the bits you want and extract them.
If your strings are always like "array[5] = 10" and the values are always integers, then:
var nums = s.match(/\d+/g); // the g flag returns every run of digits, not just the first
var index = nums[0];
var value = nums[1];
should do the trick. If there is a chance that there will be no matches, then you might want:
var index = nums && nums[0];
var value = nums && nums[1];
and deal with cases where index or value are null to avoid errors.
If you genuinely want to parse the string, there's a bit more work to do.

Regex one-liner for splitting string at nth character where n is a variable length

I've found a few similar questions, but none of them are clean one-liners, which I feel should be possible. I want to split a string at the last instance of specific character (in my case .).
var img = $('body').attr('data-bg-img-url'); // the string http://sub.foo.com/img/my-img.jpg
var finalChar = img.split( img.split(/[.]+/).length-1 ); // returns int 3 in above string example
var dynamicRegex = '/[.$`finalChar`]/';
I know I'm breaking some rules here, wondering if someone smarter than me knows the correct way to put that together and compress it?
EDIT - The end goal here is to split and store http://sub.foo.com/img/my-img and .jpg as separate strings.
In regex, .* is greedy, meaning it will match as much as possible. Therefore, if you want to match up to the last ., you could do:
/^.*\./
And from the looks, you are trying to get the file extension, so you would want to add capture:
var result = /^.*\.(.*)$/.exec( str );
var extension = result[1];
And for both parts:
var result = /^(.*)\.(.*)$/.exec( str );
var path = result[1];
var extension = result[2];
You can use the lastIndexOf() method on the period and then use the substring method to obtain the first and second string. The split() method is better used in a foreach scenario where you want to split at all instances. Substring is preferable for these types of cases where you are breaking at a single instance of the string.
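A minimal sketch of the lastIndexOf approach, using the URL from the question as a literal. The fallback for a string without any "." is an assumption about what the caller would want.

```javascript
var img = "http://sub.foo.com/img/my-img.jpg";
var dot = img.lastIndexOf(".");

// If there is no "." at all, keep the whole string as the path.
var path = dot === -1 ? img : img.substring(0, dot);
var extension = dot === -1 ? "" : img.substring(dot);
// path is "http://sub.foo.com/img/my-img", extension is ".jpg"
```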

Regular expression in Javascript (without jQuery)?

I am new to Javascript and recently I wanted to use a regular expression to get a number from a URL and store it in one var as a string and in another var as a number. For example, I want to get the number 55 from the webpage below (which is not an actual page) and store it in a var.
I tried this but it is not working
https://www.google.com/55.html
url.replace(/(\d+)(\.html)$/, function(str, p1, p2) {
    return (Number(p1) + 1) + p2;
});
Please I need help but not with jQuery because it does not make a lot of sense to me.
var numPortion = url.match(/(\d+)\.html/)[1]
(Assumes a match; if it might not match, check the results before applying the array subscript.)
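A short sketch of that check, which also covers storing the value both as a string and as a number. The variable names are made up for illustration.

```javascript
var url = "https://www.google.com/55.html";
var pageMatch = url.match(/(\d+)\.html/);

// match returns null when nothing matches, so test before subscripting.
var numAsString = pageMatch ? pageMatch[1] : null;         // "55"
var numAsNumber = pageMatch ? Number(pageMatch[1]) : null; // 55
```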
Try this
var a="https://www.google.com/55.html";
var match = a.match(/(\d+)(\.html)/);
match is an array,
match[0] contains the matched expression from your script,
match[1] is the number (the 1st parenthesis),
and so on
var url = 'http://www.google.com/55.html';
var yournumber = /(\d+)(\.html)$/.exec(url);
yournumber = yournumber && yournumber[1]; // <-- shortcut for using if else

Javascript regex match for string "game_1"

I just can't get this thing to work in javascript. So, I have a text "game_1" without the quotes and now I want to get that number out of it, so I tried this:
var idText = "game_1";
re = /game_(.*?)/;
found = idText.match(re);
var ajdi = found[1];
alert( ajdi );
But it doesn't work - please point out where am I going wrong.
If you're only matching a number, you may want to try
/game_([0-9]+)/
as your regular expression. That will match at least one digit, which seems to be what you need. You entered a regexp that allows for 0 characters (*) and lets it select the shortest possible result (?), which is a problem here: with nothing after the group, the shortest possible match is 0 characters.
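The difference can be seen by running both patterns on the same input. This is a small sketch; the variable names are illustrative.

```javascript
var idText = "game_1";

// Lazy group with nothing after it: captures as little as possible.
var lazy = idText.match(/game_(.*?)/);
// One or more digits: captures the number.
var digits = idText.match(/game_([0-9]+)/);

// lazy[1] is "" (an empty string), digits[1] is "1"
var id = Number(digits[1]);
```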
If this is the complete text, then there is no need for regular expressions:
var id = +str.split('_')[1];
or
var id = +str.replace('game_', '');
(unary + is to convert the string to a number)
If you insist on regular expression, you have to anchor the expression:
/^game_(.*?)$/
or make the * greedy by omitting the ?:
/game_(.*)/
It is better to make the expression more restrictive, as @Naltharial suggested.
Simple string manipulation:
var idText = "game_1",
adji = parseInt(idText.substring(5), 10);
* means zero or more occurrences. Combined with the lazy modifier ?, it matches as few characters as possible, and since nothing follows the group in the pattern, that is zero characters.
You could replace * with + (which means one or more occurrences), but as @Felix Kling notes, it would only match one digit.
Better to ditch the ? completely.
http://jsfiddle.net/G8Qt7/2/
Try "game_1".replace(/^(game_)/, '')
this will return the number
You can simply use this re /\d+/ to get any number inside your string

Splitting Nucleotide Sequences in JS with Regexp

I'm trying to split up a nucleotide sequence into amino acid strings using a regular expression. I have to start a new string at each occurrence of the string "ATG", but I don't want to actually stop the first match at the "ATG". Valid input is any ordering of a string of As, Cs, Gs, and Ts.
For example, given the input string: ATGAACATAGGACATGAGGAGTCA
I should get two strings: ATGAACATAGGACATGAGGAGTCA (the whole thing) and ATGAGGAGTCA (the first match of "ATG" onward). A string that contains "ATG" n times should result in n results.
I thought the expression /(?:[ACGT]*)(ATG)[ACGT]*/g would work, but it doesn't. If this can't be done with a regexp it's easy enough to just write out the code for, but I always prefer an elegant solution if one is available.
If you really want to use regular expressions, try this:
var str = "ATGAACATAGGACATGAGGAGTCA",
re = /ATG.*/g, match, matches=[];
while ((match = re.exec(str)) !== null) {
matches.push(match);
re.lastIndex = match.index + 3;
}
But be careful with exec and changing the index. You can easily make it an infinite loop.
Otherwise you could use indexOf to find the indices and substr to get the substrings:
var str = "ATGAACATAGGACATGAGGAGTCA",
    offset = 0, matches = [];
// Always search in the original string so that offset stays valid.
while ((offset = str.indexOf("ATG", offset)) > -1) {
    matches.push(str.substr(offset));
    offset += 3;
}
I think what you want is
var subStrings = inputString.split('ATG');
KISS :)
Splitting a string before each occurrence of ATG is simple, just use
result = subject.split(/(?=ATG)/i);
(?=ATG) is a positive lookahead assertion, meaning "Assert that you can match ATG starting at the current position in the string".
This will split GGGATGTTTATGGGGATGCCC into GGG, ATGTTT, ATGGGG and ATGCCC.
So now you have an array of (in this case four) strings. I would now take those, discard the first one unless the input itself starts with ATG (otherwise the first piece neither contains nor starts with ATG), and then join each remaining string with all the strings after it: no. 2 + ... + n, then 3 + ... + n etc. until you have exhausted the list.
Of course, this regex doesn't do any validation as to whether the string only contains ACGT characters as it only matches positions between characters, so that should be done before, i. e. that the input string matches /^[ACGT]*$/i.
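The split-then-join steps described above could be sketched as follows. The function name suffixesFromATG is made up, and returning an empty array for invalid input is an assumption about the desired behaviour. The first piece is kept only if it starts with ATG.

```javascript
// Hypothetical helper: every substring that runs from an "ATG" to the end.
function suffixesFromATG(subject) {
    var results = [];
    // Validate first: only A, C, G and T are allowed.
    if (!/^[ACGT]*$/i.test(subject)) {
        return results;
    }
    // Split before each ATG without consuming any characters.
    var pieces = subject.split(/(?=ATG)/i);
    for (var i = 0; i < pieces.length; i++) {
        // Skip a leading piece that does not start with ATG.
        if (/^ATG/i.test(pieces[i])) {
            // Join this piece with everything after it.
            results.push(pieces.slice(i).join(""));
        }
    }
    return results;
}

var fromQuestion = suffixesFromATG("ATGAACATAGGACATGAGGAGTCA");
// fromQuestion is ["ATGAACATAGGACATGAGGAGTCA", "ATGAGGAGTCA"]
```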
Since you want to capture from every "ATG" to the end split isn't right for you. You can, however, use replace, and abuse the callback function:
var matches = [];
seq.replace(/atg/gi, function(m, pos){ matches.push(seq.substr(pos)); });
This isn't with regex, and I don't know if this is what you consider "elegant," but...
var sequence = 'ATGAACATAGGACATGAGGAGTCA';
var matches = [];
do {
matches.push('ATG' + (sequence = sequence.slice(sequence.indexOf('ATG') + 3)));
} while (sequence.indexOf('ATG') >= 0);
I'm not completely sure if this is what you're looking for. For example, with an input string of ATGabcdefghijATGklmnoATGpqrs, this returns ATGabcdefghijATGklmnoATGpqrs, ATGklmnoATGpqrs, and ATGpqrs.
