Find Value in CSV with Regex

Find Value in CSV with Regex - javascript

I am trying to determine if a CSV string contains a specific number (also String), in this case the number 3. I have wrote some script to attempt this but the result always returns null. The regex works when using an online testing tool, however not when utilized via script. Can anyone determine what I'm missing?
Here is my code:
var csv = ["1,25,3","3", "1", "1,9,10", "2,4,5,6,7,11,33,3", "2,1,2,12,15,27"];
function contains(param){
var regex = /(,)?\D[3]\D(,)?/g;
return param.match(regex);
}
for(var i = 0; i < csv.length; i++){
console.log(contains(csv[i]));
}
Or if you prefer: JsFiddle

The problem is that your pattern requires a character (\D) before and after your 3. Since all 3s in your example are at the end of the string the second \D can never match. What you want is probably something like this:
var regex = /(?:^|\D)3(?!\d)/;
For the end of the string we use negative lookahead. That asserts that there is no digit. This is better than asserting that there is a non-digit character (because it works for the end of the string, too). Ideally, we would use the same for the beginning, but that is not supported by JavaScript. So we say, either we have the beginning of the string or a non-digit character. In fact (as Brad Koch pointed out), in this specific case, both conditions constitute a word boundary (a position between a character in [a-zA-Z0-9_] and one that is not or an end of the string). So you can simply use:
var regex = /\b3\b/;
However, if your input can include other characters than digits and commas (e.g. 1,text,2,a3b,somemoretext), none of these approaches are sufficient. Instead you need to check for commas explicitly:
var regex = /(?:^|,)3(?![^,])/;
Also, since you don't need the actual match, but only want to know whether there is a match, you can use test instead:
return regex.test(param);
This will give you a boolean instead of an array (which probably also makes is marginally more efficient).

Related

How to extract a string that conforms to a regex?

Say I have a RegEx like the following:
^[a-zA-Z]\w{12}$
And I have the following string:
%7AgTy!5hG^vxWa2#AgW
I would like to "pull" out of that string something that conforms to that regex. In this example we would get the following:
AgTy5hGvxWa2A
Reason: it starts with A because the regex says the first letter must be [a-zA-Z] (so it skips the first 2 characters), and then it pulls successive \ws out until it reaches 12 characters.
Is this sort of thing possible?
Edit: My apologies for being unclear. I'm not looking for a new regular expression that will give the proper output. Rather, I'm looking for a way to use the existing RegEx to extract the proper output. In my program these regular expressions are entered by hand by the user to extract a password from a long base256 hash such that it will conform to these existing password requirement regexes.

Instead of trying to match what you want and reconstructing the string, replace everything you don't want with nothing. This gives the impression that you're extracting what you need, but, in fact, it's doing the opposite; gets rid of everything you don't want to extract. I also dropped $ from the end of your original pattern otherwise it'll never match the string you present in your question.
See regex in use here
^[^a-z]+|\W+
^ Assert position at the start of the line
[^a-z]+ Matches any character that is not in the range a-z one or more times. Since the i flag is specified, this also matches A-Z
\W+ Match any non-word character one or more times
const regex = /^[^a-z]+|\W+/gi
const a = [
`%7AgTy!5hG^vxWa2#AgW`,
`%7AgTy!5hG^vxWa2#`
]
a.forEach(function(s) {
var clean = s.replace(regex, '')
var match = clean.match(/^[a-z]\w{12}/i)
console.log(match)
})

Javascript RegEx not returning false as expected

Not a big user of RegEx - never really understood them! However, I feel the best way to check input for a username field would be with one that only allows Letters (upper or lower), numbers and the _ character, and must start with a letter as per the site policy. The My RegEx and code is as such:
var theCheck = /[a-zA-Z]|\d|_$/g;
alert(theCheck.test(theUsername));
Despite trying with various combinations, everything is returning "true".
Can anyone help?

Your regex is saying "does theUsername contain a letter, digit, or end with underscore".
Try this instead:
var theCheck = /^[a-z]([a-z_\d]*)$/i; // the "i" is "ignore case"
This says "theUsername starts with a letter and only contains letters, digits, or underscores".
Note: I don't think you need the "g" here, that means "all matches". We just want to test the whole string.

How about something like this:
^([a-zA-Z][a-zA-Z0-9_]{3,})$
To explain the entire pattern:
^ = Makes sure that the first pattern in brackets is at the beginning
() = puts the entire pattern in a group in case you need to pull it out and not just validate
a-zA-Z0-9_ = matches your character allowances
$ = Makes sure that this must be the entire line
{3,} = Makes sure there are a minimum of 3 characters.
You can add a number after the comma for a character limit max
You could also use a +, which would merely enforce at least one character match the second pattern. A * would not enforce any lengths

Use this as your regex:
^[A-Za-z][a-zA-Z0-9_]*$

How to make regex match only first occurrence of each match?

/\b(keyword|whatever)\b/gi
How can I modify the above javascript regex to match only the first occurance of each word (I believe this is called non-greedy)?
First occurance of "keyword" and first occurance of "whatever" and I may put more more words in there.

Remove g flag from your regex:
/\b(keyword|whatever)\b/i

What you're doing is simply unachievable with a singular regular expression. Instead you will have to store every word you wish to find in an array, loop through them all searching for an answer, and then for any matches, store the result in an array.
Example:
var words = ["keyword","whatever"];
var text = "Whatever, keywords are like so, whatever... Unrelated, I now know " +
"what it's like to be a tweenage girl. Go Edward.";
var matches = []; // An empty array to store results in.
/* When you search the text you need to convert it to lower case to make it
searchable.
* We'll be using the built in method 'String.indexOf(needle)' to match
the strings as it avoids the need to escape the input for regular expression
metacharacters. */
//Text converted to lower case to allow case insensitive searchable.
var lowerCaseText = text.toLowerCase();
for (var i=0;i<words.length;i++) { //Loop through the `words` array
//indexOf returns -1 if no match is found
if (lowerCaseText.indexOf(words[i]) != -1)
matches.push(words[i]); //Add to the `matches` array
}

Remove the g modifier from your regex. Then it will find only one match.

What you're talking about can't be done with a JavaScript regex. It might be possible with advanced regex features like .NET's unrestricted lookbehind, but JavaScript's feature set is extremely limited. And even in .NET, it would probably be simplest to create a separate regex for each word and apply them one by one; in JavaScript it's your only option.
Greediness only applies to regexes that employ quantifiers, like /START.*END/. The . means "any character" and the * means "zero or more". After the START is located, the .* greedily consumes the rest of the text. Then it starts backtracking, "giving back" one character at a time until the next part of the regex, END succeeds in matching.
We call this regex "greedy" because it matches everything from the first occurrence of START to the last occurrence of END.
If there may be more than one "START"-to-"END" sequence, and you want to match just the first one, you can append a ? to the * to make it non-greedy: /START.*?END/. Now, each time the . tries to consume the next character, it first checks to see if it could match END at that spot instead. Thus it matches from the first START to the first END after that. And if you want to match all the "START"-to-"END" sequences individually, you add the 'g' modifier: /START.*?END/g.
It's a bit more complicated than that, of course. For example, what if these sequences can be nested, as in START…START…END…END? If I've gotten a little carried away with this answer, it's because understanding greediness is the first important step to mastering regexes. :-/

JavaScript and regular expressions: get the number of parenthesized subpattern

I have to get the number of parenthesized substring matches in a regular expression:
var reg=/([A-Z]+?)(?:[a-z]*)(?:\([1-3]|[7-9]\))*([1-9]+)/g,
nbr=0;
//Some code
alert(nbr); //2
In the above example, the total is 2: only the first and the last couple of parentheses will create grouping matches.
How to know this number for any regular expressions?
My first idea was to check the value of RegExp.$1 to RegExp.$9, but even if there are no corresponding parenthseses, these values are not null, but empty string...
I've also seen the RegExp.lastMatch property, but this one represents only the value of the last matched characters, not the corresponding number.
So, I've tried to build another regular expression to scan any RegExp and count this number, but it's quite difficult...
Do you have a better solution to do that?
Thanks in advance!

Javascripts RegExp.match() method returns an Array of matches. You might just want to check the length of that result array.
var mystr = "Hello 42 world. This 11 is a string 105 with some 2 numbers 55";
var res = mystr.match(/\d+/g);
console.log( res.length );

Well, judging from the code snippet we can assume that the input pattern is always a valid regular expression, because otherwise it would fail before the some code partm right? That makes the task much easier!
Because We just need to count how many starting capturing parentheses there are!
var reg = /([A-Z]+?)(?:[a-z]*)(?:\([1-3]|[7-9]\))*([1-9]+)/g;
var nbr = (' '+reg.source).match(/[^\\](\\\\)*(?=\([^?])/g);
nbr = nbr ? nbr.length : 0;
alert(nbr); // 2
And here is a breakdown:
[^\\] Make sure we don't start the match with an escaping slash.
(\\\\)* And we can have any number of escaped slash before the starting parenthes.
(?= Look ahead. More on this later.
\( The starting parenthes we are looking for.
[^?] Make sure it is not followed by a question mark - which means it is capturing.
) End of look ahead
Why match with look ahead? To check that the parenthes is not an escaped entity, we need to capture what goes before it. No big deal here. We know JS doens't have look behind.
Problem is, if there are two starting parentheses sticking together, then once we capture the first parenthes the second parenthes would have nothing to back it up - its back has already been captured!
So to make sure a parenthes can be the starting base of the next one, we need to exclude it from the match.
And the space added to the source? It is there to be the back of the first character, in case it is a starting parenthes.

New to Regular Expressions need help

I need a form with one button and window for input
that will check an array, via a regular expression.
And will find a exact match of letters + numbers. Example wxyz [some space btw] 0960000
or a mix of numbers and letters [some space btw] + numbers 01xg [some space btw] 0960000
The array has four objects for now.
Once found i need a function the will open a new page or window when match is found .
Thanks you for your help.
Michael

To answer the Javascript part, here's one way to "grep" through the array to find matching elements:
var matches = [];
var re = /whatever/;
foo.forEach(
function(el) {
if( re.exec(el) )
matches.push(el);
}
);
To attempt to answer the regular expression part: I don't know what "exact match" means to you, and I'm assuming "some space" belongs only in between the other terms, and I'm assuming letters means the English alphabet from 'a' to 'z' in lower and upper case and the digits should be 0-9 (otherwise, other language characters might be matched).
The first pattern would be /[a-zA-Z0-9]+\s*0960000/. Change "\s*" to "\s+" if there is at least one space, instead of zero or more space characters. Change "\s" to " " if matching the tab character (and some lesser-used space chars) is not desirable.
For the second pattern, I don't know what "numbers 01xg" means, but if it means numbers followed by that string, then the pattern would be /[a-zA-Z0-9]+\s*[0-9]+\s*01xg\s*0960000/. The same caveats apply as above.
Additionally, this will match a partial string. If the string much be matched in entirety (if nothing in the string must exist except that which is matched), add "^" to the beginning of the pattern to anchor it to the beginning of the string, and "$" at the end to anchor it to the end of the string. For example, /[a-zA-Z0-9]+\s*0960000/ matches "foo_bar 5 0960000", but /^[a-zA-Z0-9]+\s*0960000$/ does not.
For more on regular expressions in Javascript, take a look at developer.mozilla.org's article on the RegExp object (the link takes you to JS version 1.5 reference, which should apply to all JS-capable browsers).
(edited to add): To match either situation, since they have overlapping parts, you could use the following pattern: /[a-zA-Z0-9]+(?:\s*[0-9]+\s*01xg)?\s*0960000/. The question mark says to match the part that differs -- in a non-matching group (?:foo) -- once or zero times. (?:foo)? and (?:foo|) do the same thing in this case, but I'm not sure whether there is a performance difference; I would recommend to use the one that makes the most sense to you, so you can read it later.

Develop Reference

JavaScript is the programming language of the Web.