Length of a Regex Match


I have an array of data that is being filtered into different arrays via regular expressions. One of these arrays holds data that is considered "too long" for my program. Not all of these "too long" instances are the same length, but I would like to shorten them.
I want something like DRB1*01:02.
Too long is anything like DRB1*01:02:03 or longer, including things like DRB1*01:02:03:abc:29
However, the letters at the front will not always be the same length. I will be dealing with things such as A*1:01:02 or TIM*01:02. So I am specifically looking at the sets of two integers and their preceding colon, and perhaps any letters that may follow in data that is "too long". I want the letters out front, the star, and 2 sets of numbers and the colon between them.
I want to use a regular expression to find pieces of data that are "too long", and then measure the length of the data it matches, and slice backward to remove it.
Something that will tell me that DRB1*01:02:03 matches *01:02:03 and that the length of that is 9. Same for anything like DRB1*01:02:03:abc:29, where it matches *01:02:03:abc:29 and tells me the length is 16. I am NOT trying to match a word by its length.
Is there any way to find the length of what part of the data the regular expression has matched? Including cases where the regular expression does not mark a definite end?
I am using JavaScript.

Use a capture group to get the part that matches after the *:
// The prefix class includes digits so that names such as DRB1 match.
var matches = str.match(/^[A-Z0-9]+(\*.*)$/);
if (matches) {
    var len = matches[1].length;
    alert("It's " + len + " characters long");
}

A Perl-style regex:
if (/([A-Z0-9]+\*\d+:\d+)(.+)/) {
    print "too long, prefix:$1 extra stuff:$2 length:".length($2)."\n";
}
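Since the question is about JavaScript, the same idea can be sketched there. This is only a minimal sketch: the helper name splitTooLong and the shape of its return value are assumptions made for illustration, and it assumes names always look like prefix, star, then colon-separated fields.

```javascript
// Hypothetical helper: splits a name into the part to keep
// (prefix, star, two colon-separated numeric fields) and any extra tail.
function splitTooLong(name) {
  const m = /^([A-Z0-9]+\*\d+:\d+)(.*)$/i.exec(name);
  if (!m) return null; // not in the expected shape
  return { kept: m[1], extra: m[2], extraLength: m[2].length };
}

const r = splitTooLong("DRB1*01:02:03:abc:29");
console.log(r.kept, r.extra, r.extraLength);

// Slicing backward by the measured length gives the same string as `kept`.
const original = "DRB1*01:02:03";
const parts = splitTooLong(original);
const shortened = original.slice(0, original.length - parts.extraLength);
console.log(shortened);
```

Computing the end index as original.length minus the extra length (rather than a negative slice index) keeps the result correct when nothing extra follows, as in TIM*01:02.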

Related

Regular Expression (regex, regexp) for username

(Update): I added the instructions and code as images instead. I tried the suggestions and they did not go through. (Update)
This is my first ever question on here. I have been using Stack Overflow for advice, but I'm stumped this time. I am new to regular expressions and am stuck on this assignment.
The question is as follows:
"Write a regex test such that only a username that:
has alphanumeric characters only (lower and upper case letters and numbers allowed; no spaces, no underscores),
has at minimum 2 characters,
has a number as the final character (such as 'Jason1')
is accepted via the form."
We are working with this code here...
function validate() {
    let inputStr = document.getElementById("username").value;
    // const myReg = // Uncomment this line and add your regular expression literal here
    if (myReg.test(inputStr))
        alert("Username accepted");
    else
        alert("Username must contain only alphanumeric characters, contain a minimum of two characters, and end with a digit.");
}
So we have to uncomment that line and supply a regular expression that makes myReg.test behave as described.
I tried my best with /^[a-z\d][5.12]$/i and /^[a-z\d+?]$/i
But I am completely off! What should it look like?

My solution would be this:
/^[A-Za-z0-9]+\d$/
Let me explain.
You can use [] to create lists of allowed characters (as in, each [] will match a single character, but any one in the list). Usually, I would use \w, but this is equivalent to [A-Za-z0-9_], and the question stipulates no underscores. Spell the ranges out as A-Za-z rather than shortening them to A-z: capitals come before lower-case letters in Unicode, but the characters [ \ ] ^ _ ` sit between them, so A-z would let underscores through.
{} is used to specify how many times a character must match. So, you could say a{3,6}, and that would mean that only between three and six a's would match. You can omit the upper bound (a{3,}) to say at least this many matches, or between n and unlimited times. Using this, you can match "at least one" with {1,}. This is then shortened to the equivalent +.
Finally, we say that the regex must have a digit \d at the end. The anchors ^ and $ make the pattern cover the whole input instead of any substring of it.
The "minimum length of two" is covered by requiring at least one of any accepted character with a digit at the end. One plus one is two.
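As a quick check of the rules described above, here is a minimal sketch that runs a few sample usernames through an anchored pattern. The sample names are made up for illustration, and the character class is written as A-Za-z on the assumption that underscores must be rejected (the shorter range A-z also spans [ \ ] ^ _ and `).

```javascript
// One or more letters/digits followed by a final digit, anchored at both ends.
const usernameRe = /^[A-Za-z0-9]+\d$/;

// Hypothetical sample inputs covering each rule.
const samples = ["Jason1", "a1", "1", "Jason", "Ja_son1", "Ja son1"];
const results = samples.map((s) => usernameRe.test(s));
samples.forEach((s, i) => console.log(JSON.stringify(s), results[i]));

// For comparison: the range A-z includes "_" (it lies between "Z" and "a").
const looseAccepts = /^[A-z0-9]+\d$/.test("Ja_son1");
console.log(looseAccepts);
```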

function validate() {
    let inputStr = document.getElementById("username").value;
    // One or more letters or digits, then a final digit; "i" ignores case.
    // The "g" and "m" flags are not needed to test a single whole string.
    const myReg = /^[a-z0-9]+[0-9]$/i;
    if (myReg.test(inputStr))
        alert("Username accepted");
    else
        alert("Username must contain only alphanumeric characters, contain a minimum of two characters, and end with a digit.");
}
This link should help you understand the regex: https://regex101.com/r/dX3hD4/240

Regex to count the number of capturing groups in a regex

I need a regex that examines arbitrary regex (as a string), returning the number of capturing groups. So far I have...
arbitrary_regex.toString().match(/\((|[^?].*?)\)/g).length
This works for some cases, under the assumption that any group starting with a question mark is non-capturing. It also counts empty groups.
It does not work for brackets inside character classes, for escaped brackets, and possibly for some other scenarios.

Modify your regex so that it will match an empty string, then match an empty string and see how many groups it returns:
var num_groups = (new RegExp(regex.toString() + '|')).exec('').length - 1;
Example: http://jsfiddle.net/EEn6G/
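The trick above can be wrapped in a small function. This is a sketch, not the answerer's exact code: the name countCapturingGroups is made up, and using the source property when a RegExp object is passed is an assumption added here so the delimiters and flags are not folded into the pattern.

```javascript
// Hypothetical wrapper: accepts a pattern string or a RegExp object.
function countCapturingGroups(pattern) {
  const source = pattern instanceof RegExp ? pattern.source : String(pattern);
  // Appending "|" adds an empty alternative, so exec("") always succeeds.
  // The result array holds the whole match plus one slot per capturing group.
  return new RegExp(source + "|").exec("").length - 1;
}

console.log(countCapturingGroups("(a)(b)"));
console.log(countCapturingGroups("(?:a)(b)"));
console.log(countCapturingGroups("[(]\\((a)"));
console.log(countCapturingGroups(/(\d+)-(\d+)-(\d+)/g));
```

Because the regex engine itself does the parsing, brackets inside character classes and escaped brackets are handled without any extra work.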

The accepted answer is what you should use in any production system. However, if you wanted to solve it using a regex for fun, you can do that as shown below. It assumes the regex you want the number of groups in is correct.
Note that the number of groups is just the number of non-literal (s in the regex. The strategy we're going to take is instead of matching all the correct (, we're going to split on all the incorrect stuff in between them.
re.toString().split(/(\(\?|\\\[|\[(?:\\\]|.)*?\]|\\\(|[^(])+/g).length - 1
You can see how it works on www.debuggex.com.

Javascript RegEx not returning false as expected

Not a big user of RegEx - never really understood them! However, I feel the best way to check input for a username field would be with one that only allows letters (upper or lower case), numbers and the _ character, and that must start with a letter as per the site policy. My RegEx and code are as follows:
var theCheck = /[a-zA-Z]|\d|_$/g;
alert(theCheck.test(theUsername));
Despite trying with various combinations, everything is returning "true".
Can anyone help?

Your regex is saying "does theUsername contain a letter, digit, or end with underscore".
Try this instead:
var theCheck = /^[a-z]([a-z_\d]*)$/i; // the "i" is "ignore case"
This says "theUsername starts with a letter and only contains letters, digits, or underscores".
Note: I don't think you need the "g" here, that means "all matches". We just want to test the whole string.
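To see the difference in practice, here is a small sketch comparing the original pattern with the anchored one. The sample usernames are invented for illustration. It also shows one reason to drop the "g" flag: a global regex remembers its position in lastIndex between calls to test, so repeated tests on the same object can alternate between true and false.

```javascript
const loose = /[a-zA-Z]|\d|_$/;          // original: matches any single letter or digit anywhere
const strict = /^[a-z][a-z_\d]*$/i;      // anchored: the whole string must fit

// Hypothetical sample inputs.
const names = ["john_doe1", "1john", "john doe", "_john", "!!!a"];
const report = names.map((n) => ({ name: n, loose: loose.test(n), strict: strict.test(n) }));
report.forEach((r) => console.log(r.name, r.loose, r.strict));

// With "g", test() resumes from lastIndex, so the second call fails.
const withG = /^[a-z][a-z_\d]*$/gi;
const firstCall = withG.test("abc");
const secondCall = withG.test("abc");
console.log(firstCall, secondCall);
```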

How about something like this:
^([a-zA-Z][a-zA-Z0-9_]{3,})$
To explain the entire pattern:
^ = anchors the match at the beginning of the string
() = puts the entire pattern in a group in case you need to pull it out and not just validate
[a-zA-Z] = requires the first character to be a letter
[a-zA-Z0-9_] = matches your character allowances
{3,} = requires at least 3 more characters after the first letter, so at least 4 in total
$ = anchors the match at the end, so the pattern must cover the entire line
You can add a number after the comma to set a maximum length.
You could also use a +, which would merely require at least one character to match the second pattern. A * would not enforce any length.

Use this as your regex:
^[A-Za-z][a-zA-Z0-9_]*$

How to make regex match only first occurrence of each match?

/\b(keyword|whatever)\b/gi
How can I modify the above JavaScript regex to match only the first occurrence of each word (I believe this is called non-greedy)?
That is, the first occurrence of "keyword" and the first occurrence of "whatever", and I may put more words in there.

Remove g flag from your regex:
/\b(keyword|whatever)\b/i

What you're asking for is simply unachievable with a single regular expression. Instead you will have to store every word you wish to find in an array, loop through them all searching for each one, and store any matches in a result array.
Example:
var words = ["keyword", "whatever"];
var text = "Whatever, keywords are like so, whatever... Unrelated, I now know " +
    "what it's like to be a tweenage girl. Go Edward.";
var matches = []; // An empty array to store results in.

/* The text is converted to lower case so that the search is case insensitive.
 * The built-in method String.prototype.indexOf(needle) is used to find the
 * words, as it avoids the need to escape the input for regular expression
 * metacharacters. Note that it also finds a word inside a longer one. */
var lowerCaseText = text.toLowerCase();
for (var i = 0; i < words.length; i++) { // Loop through the `words` array
    // indexOf returns -1 if no match is found
    if (lowerCaseText.indexOf(words[i]) != -1)
        matches.push(words[i]); // Add to the `matches` array
}
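If whole-word matching and the position of each first occurrence are needed, the same loop can build one small regex per word. This is a sketch under assumptions: the helper names escapeRegExp and firstOccurrences are invented here, and the sample sentence is made up.

```javascript
// Escapes regex metacharacters so a word can be embedded in a pattern safely.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: first whole-word, case-insensitive hit for each word.
function firstOccurrences(text, words) {
  const found = [];
  for (const word of words) {
    // No "g" flag, so exec() reports only the first match.
    const re = new RegExp("\\b" + escapeRegExp(word) + "\\b", "i");
    const m = re.exec(text);
    if (m) found.push({ word: word, index: m.index, match: m[0] });
  }
  return found;
}

const sample = "Whatever, keywords are like so, whatever... a keyword at last.";
const hits = firstOccurrences(sample, ["keyword", "whatever", "missing"]);
console.log(hits);
```

Here "keywords" is skipped because of the word boundary, so the first hit for "keyword" is the later standalone word.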

Remove the g modifier from your regex. Then it will find only one match.

What you're talking about can't be done with a JavaScript regex. It might be possible with advanced regex features like .NET's unrestricted lookbehind, but JavaScript's feature set is extremely limited. And even in .NET, it would probably be simplest to create a separate regex for each word and apply them one by one; in JavaScript it's your only option.
Greediness only applies to regexes that employ quantifiers, like /START.*END/. The . means "any character" and the * means "zero or more". After the START is located, the .* greedily consumes the rest of the text. Then it starts backtracking, "giving back" one character at a time until the next part of the regex, END succeeds in matching.
We call this regex "greedy" because it matches everything from the first occurrence of START to the last occurrence of END.
If there may be more than one "START"-to-"END" sequence, and you want to match just the first one, you can append a ? to the * to make it non-greedy: /START.*?END/. Now, each time the . tries to consume the next character, it first checks to see if it could match END at that spot instead. Thus it matches from the first START to the first END after that. And if you want to match all the "START"-to-"END" sequences individually, you add the 'g' modifier: /START.*?END/g.
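The three variants described above can be compared side by side. The input string below is made up for illustration.

```javascript
// Hypothetical input containing two START-to-END sequences.
const input = "START a END filler START b END";

// Greedy: runs from the first START to the last END.
const greedy = input.match(/START.*END/)[0];

// Non-greedy: stops at the first END after the first START.
const lazy = input.match(/START.*?END/)[0];

// Non-greedy with "g": every START-to-END sequence individually.
const every = input.match(/START.*?END/g);

console.log(greedy);
console.log(lazy);
console.log(every);
```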
It's a bit more complicated than that, of course. For example, what if these sequences can be nested, as in START…START…END…END? If I've gotten a little carried away with this answer, it's because understanding greediness is the first important step to mastering regexes. :-/

New to Regular Expressions need help

I need a form with one button and an input box that will check an array via a regular expression
and find an exact match of letters + numbers. Example: wxyz [some space between] 0960000
or a mix of numbers and letters [some space between] + numbers: 01xg [some space between] 0960000
The array has four objects for now.
Once a match is found, I need a function that will open a new page or window.
Thank you for your help.
Michael

To answer the JavaScript part, here's one way to "grep" through the array to find matching elements:
var matches = [];
var re = /whatever/;
foo.forEach(
    function(el) {
        if( re.exec(el) )
            matches.push(el);
    }
);
To attempt to answer the regular expression part: I don't know what "exact match" means to you, and I'm assuming "some space" belongs only in between the other terms, and I'm assuming letters means the English alphabet from 'a' to 'z' in lower and upper case and the digits should be 0-9 (otherwise, other language characters might be matched).
The first pattern would be /[a-zA-Z0-9]+\s*0960000/. Change "\s*" to "\s+" if there is at least one space, instead of zero or more space characters. Change "\s" to " " if matching the tab character (and some lesser-used space chars) is not desirable.
For the second pattern, I don't know what "numbers 01xg" means, but if it means numbers followed by that string, then the pattern would be /[a-zA-Z0-9]+\s*[0-9]+\s*01xg\s*0960000/. The same caveats apply as above.
Additionally, this will match a partial string. If the string must be matched in its entirety (if nothing may exist in the string except that which is matched), add "^" to the beginning of the pattern to anchor it to the beginning of the string, and "$" at the end to anchor it to the end of the string. For example, /[a-zA-Z0-9]+\s*0960000/ matches "foo_bar 5 0960000", but /^[a-zA-Z0-9]+\s*0960000$/ does not.
For more on regular expressions in Javascript, take a look at developer.mozilla.org's article on the RegExp object (the link takes you to JS version 1.5 reference, which should apply to all JS-capable browsers).
(edited to add): To match either situation, since they have overlapping parts, you could use the following pattern: /[a-zA-Z0-9]+(?:\s*[0-9]+\s*01xg)?\s*0960000/. The question mark says to match the part that differs -- in a non-capturing group (?:foo) -- once or zero times. (?:foo)? and (?:foo|) do the same thing in this case, but I'm not sure whether there is a performance difference; I would recommend using the one that makes the most sense to you, so you can read it later.
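Putting the combined pattern together with the array filtering, a minimal sketch might look like this. The array contents are invented for illustration, and Array.prototype.filter is used here in place of the forEach loop.

```javascript
// Combined pattern: the middle part (digits, then 01xg) is optional.
const combined = /[a-zA-Z0-9]+(?:\s*[0-9]+\s*01xg)?\s*0960000/;

// Hypothetical array with four entries.
const entries = ["wxyz 0960000", "01xg   0960000", "abc 12 01xg 0960000", "wxyz 123"];
const matched = entries.filter((entry) => combined.test(entry));
console.log(matched);

// Anchored variant: the whole string must fit the pattern.
const anchored = /^[a-zA-Z0-9]+\s*0960000$/;
console.log(anchored.test("foo_bar 5 0960000"), anchored.test("wxyz 0960000"));
```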
