How to extract a string that conforms to a regex? - javascript

Say I have a RegEx like the following:
^[a-zA-Z]\w{12}$
And I have the following string:
%7AgTy!5hG^vxWa2#AgW
I would like to "pull" out of that string something that conforms to that regex. In this example we would get the following:
AgTy5hGvxWa2A
Reason: it starts with A because the regex says the first letter must be [a-zA-Z] (so it skips the first 2 characters), and then it pulls successive \ws out until it reaches 12 characters.
Is this sort of thing possible?
Edit: My apologies for being unclear. I'm not looking for a new regular expression that will give the proper output. Rather, I'm looking for a way to use the existing RegEx to extract the proper output. In my program these regular expressions are entered by hand by the user to extract a password from a long base256 hash such that it will conform to these existing password requirement regexes.

Instead of trying to match what you want and reconstructing the string, replace everything you don't want with nothing. This gives the impression that you're extracting what you need, but, in fact, it's doing the opposite; gets rid of everything you don't want to extract. I also dropped $ from the end of your original pattern otherwise it'll never match the string you present in your question.
See regex in use here
^[^a-z]+|\W+
^ Assert position at the start of the line
[^a-z]+ Matches any character that is not in the range a-z one or more times. Since the i flag is specified, this also matches A-Z
\W+ Match any non-word character one or more times
const regex = /^[^a-z]+|\W+/gi
const a = [
`%7AgTy!5hG^vxWa2#AgW`,
`%7AgTy!5hG^vxWa2#`
]
a.forEach(function(s) {
var clean = s.replace(regex, '')
var match = clean.match(/^[a-z]\w{12}/i)
console.log(match)
})

Related

Regexp: excluding a word but including non-standard punctuation

I want to find strings that contain words in a particular order, allowing non-standard characters in between the words but excluding a particular word or symbol.
I'm using javascript's replace function to find all instances and put into an array.
So, I want select...from, with anything except 'from' in between the words. Or I can separate select...from from select...from (, as long as I exclude nesting. I think the answer is the same for both, i.e. how do I write: find x and not y within the same regexp?
From the internet, I feel this should work: /\bselect\b^(?!from).*\bfrom\b/gi but this finds no matches.
This works to find all select...from: /\bselect\b[0-9a-zA-Z#\(\)\[\]\s\.\*,%_+-]*?\bfrom\b/gi but modifying it to exclude the parenthesis "(" at the end prevents any matches: /\bselect\b[0-9a-zA-Z#\(\)\[\]\s\.\*,%_+-]*?\bfrom\b\s*^\(/gi
Can anyone tell me how to exclude words and symbols within this regexp?
Many thanks
Emma
Edit: partial string input:
left outer join [stage].[db].[table14] o on p.Project_id = o.project_id
left outer join
(
select
different_id
,sum(costs) - ( sum(brushes) + sum(carpets) + sum(fabric) + sum(other) + sum(chairs)+ sum(apples) ) as overallNumber
from
(
select ace from [stage].db.[table18] J
Javascript:
sequel = stringInputAsAbove;
var tst = sequel.replace(/\bselect\b[\s\S]*?\bfrom\b/gi, function(a,b) { console.log('match: '+a); selects.push(b); return a; });
console.log(selects);
Console.log(selects) should print an array of numbers, where each number is the starting character of a select...from. This works for the second regexp I gave in my info, printing: [95, 251]. Your \s\S variation does the same, #stribizhev.
The first example ^(?!from).* should do likewise but returns [].
The third example \s*^\( should return 251 only but returns []. However I have just noticed that the positive expression \s*\( does give 95, so some progress! It's the negatives I'm getting wrong.
Your \bselect\b^(?!from).*\bfrom\b regex doesn't work as expected because:
^ means here beginning of a line, not negation of next part, so
the \bselect\b^ means, select word followed by beginning of a
line. After removal of ^ regex start to match something
(DEMO) but it is still invalid.
in multiline text .* without modification will not match new line,
so regex will match only select...from in single lines, but if you
change it for (.|\n)* (as a simple example) it will match
multiline, but still invalid
the * is greede quantifire, so it will match as much a possible,
but if you use reluctant quantifire *?, regex will match to first
occurance of from word, and int will start to return relativly
correct result.
\bselect\b(?!from) means match separate select word which is not
directly followed by separate from word, so it would be
selectfrom somehow composed of separate words (because
select\bfrom) so (?!from) doesn't work and it is redundant
In effect you will get regex very similar to what Stribizhev gave you: \bselect\b(.|\n)*?\bfrom\b
In third expression you meke same mistake: \bselect\b[0-9a-zA-Z#\(\)\[\]\s\.\*,%_+-]*?\bfrom\b\s*^\( using ^ as (I assume) a negation, not beginning of a line. Remove ^ and you will again get relativly valid result (match from select through from to closing parathesis ) ).
Your second regex works similar to \bselect\b(.|\n)*?\bfrom\b or \bselect\b[\s\S]*?\bfrom\b.
I wrote "relativly valid result", as I also think, that parsing SQL with regex could be very camplicated, so I am not sure if it will work in every case.
You can also try to use positive lookahead to match just position in text, like:
(?=\bselect\b(?:.|\n)*?\bfrom\b)
DEMO - the () was added to regex just to return beginning index of match in groups, so it would be easier to check it validity
Negation in regex
We use ^ as negation in character class, for example [^a-z] means match anything but not letter, so it will match number, symbol, whitespace, etc, but not letter from range a to z (Look here). But this negation is on a level of single character. I you use [^from] it will prevent regex from matching characters f,r,o and m (demo). Also the [^from]{4} will avoid matching from but also form, morf, etc.
To exlude the whole word from matching by regex, you need to use negative look ahead, like (?!from), which will fail to match, if there will be chosen word from fallowing given position. To avoid matching whole line containing from you could use ^(?!.*from.*).+$ (demo).
However in your case, you don't need to use this construction, because if you replace greedy quantifire .*\bfrom with .*?\bfrom it will match to first occurance of this word. Whats more it would couse problems. Take a look on this regex, it will not match anything because (?![\s\S]*from[\s\S]*) is not restricted by anything, so it will match only if there is no from after select, but we want to match also from! in effect this regex try to match and exclude from at once, and fail. so the (?!.*word.*) construction works much better to exclude matching line with given word.
So what to do if we don't what to match a word in a fragment of a match? I think select\b([^f]|f(?!rom))*?\bfrom\b is a good solution. With ([^f]|f(?!rom))*? it will match everything between select and from, but will not exclude from.
But if you would like to match only select...from not followed by ( then it is good idea to use (?!\() like. But in your regex (multiline, use of (.|\n)*? or [\s\S]*? it will cause to match up to next select...from part, because reluctant quantifire will chenge a plece where it need to match to make whole regex . In my opinion, good solution would be to use again:
select\b([^f]|f(?!rom))*?\bfrom\b(?!\s*?\()
which will not overlap additional select..from and will not match if there is \( after select...from - check it here

Regular expression to check contains only

EDIT: Thank you all for your inputs. What ever you answered was right.But I thought I didnt explain it clear enough.
I want to check the input value while typing itself.If user is entering any other character that is not in the list the entered character should be rolled back.
(I am not concerning to check once the entire input is entered).
I want to validate a date input field which should contain only characters 0-9[digits], -(hyphen) , .(dot), and /(forward slash).Date may be like 22/02/1999 or 22.02.1999 or 22-02-1999.No validation need to be done on either occurrence or position. A plain validation is enough to check whether it has any other character than the above listed chars.
[I am not good at regular expressions.]
Here is what I thought should work but not.
var reg = new RegExp('[0-9]./-');
Here is jsfiddle.
Your expression only tests whether anywhere in the string, a digit is followed by any character (. is a meta character) and /-. For example, 5x/- or 42%/-foobar would match.
Instead, you want to put all the characters into the character class and test whether every single character in the string is one of them:
var reg = /^[0-9.\/-]+$/
^ matches the start of the string
[...] matches if the character is contained in the group (i.e. any digit, ., / or -).
The / has to be escaped because it also denotes the end of a regex literal.
- between two characters describes a range of characters (between them, e.g. 0-9 or a-z). If - is at the beginning or end it has no special meaning though and is literally interpreted as hyphen.
+ is a quantifier and means "one or more if the preceding pattern". This allows us (together with the anchors) to test whether every character of the string is in the character class.
$ matches the end of the string
Alternatively, you can check whether there is any character that is not one of the allowed ones:
var reg = /[^0-9.\/-]/;
The ^ at the beginning of the character class negates it. Here we don't have to test every character of the string, because the existence of only character is different already invalidates the string.
You can use it like so:
if (reg.test(str)) { // !reg.test(str) for the first expression
// str contains an invalid character
}
Try this:
([0-9]{2}[/\-.]){2}[0-9]{4}
If you are not concerned about the validity of the date, you can easily use the regex:
^[0-9]{1,2}[./-][0-9]{1,2}[./-][0-9]{4}$
The character class [./-] allows any one of the characters within the square brackets and the quantifiers allow for either 1 or 2 digit months and dates, while only 4 digit years.
You can also group the first few groups like so:
^([0-9]{1,2}[./-]){2}[0-9]{4}$
Updated your fiddle with the first regex.

Find Value in CSV with Regex

I am trying to determine if a CSV string contains a specific number (also String), in this case the number 3. I have wrote some script to attempt this but the result always returns null. The regex works when using an online testing tool, however not when utilized via script. Can anyone determine what I'm missing?
Here is my code:
var csv = ["1,25,3","3", "1", "1,9,10", "2,4,5,6,7,11,33,3", "2,1,2,12,15,27"];
function contains(param){
var regex = /(,)?\D[3]\D(,)?/g;
return param.match(regex);
}
for(var i = 0; i < csv.length; i++){
console.log(contains(csv[i]));
}
Or if you prefer: JsFiddle
The problem is that your pattern requires a character (\D) before and after your 3. Since all 3s in your example are at the end of the string the second \D can never match. What you want is probably something like this:
var regex = /(?:^|\D)3(?!\d)/;
For the end of the string we use negative lookahead. That asserts that there is no digit. This is better than asserting that there is a non-digit character (because it works for the end of the string, too). Ideally, we would use the same for the beginning, but that is not supported by JavaScript. So we say, either we have the beginning of the string or a non-digit character. In fact (as Brad Koch pointed out), in this specific case, both conditions constitute a word boundary (a position between a character in [a-zA-Z0-9_] and one that is not or an end of the string). So you can simply use:
var regex = /\b3\b/;
However, if your input can include other characters than digits and commas (e.g. 1,text,2,a3b,somemoretext), none of these approaches are sufficient. Instead you need to check for commas explicitly:
var regex = /(?:^|,)3(?![^,])/;
Also, since you don't need the actual match, but only want to know whether there is a match, you can use test instead:
return regex.test(param);
This will give you a boolean instead of an array (which probably also makes is marginally more efficient).

Javascript RegEx not returning false as expected

Not a big user of RegEx - never really understood them! However, I feel the best way to check input for a username field would be with one that only allows Letters (upper or lower), numbers and the _ character, and must start with a letter as per the site policy. The My RegEx and code is as such:
var theCheck = /[a-zA-Z]|\d|_$/g;
alert(theCheck.test(theUsername));
Despite trying with various combinations, everything is returning "true".
Can anyone help?
Your regex is saying "does theUsername contain a letter, digit, or end with underscore".
Try this instead:
var theCheck = /^[a-z]([a-z_\d]*)$/i; // the "i" is "ignore case"
This says "theUsername starts with a letter and only contains letters, digits, or underscores".
Note: I don't think you need the "g" here, that means "all matches". We just want to test the whole string.
How about something like this:
^([a-zA-Z][a-zA-Z0-9_]{3,})$
To explain the entire pattern:
^ = Makes sure that the first pattern in brackets is at the beginning
() = puts the entire pattern in a group in case you need to pull it out and not just validate
a-zA-Z0-9_ = matches your character allowances
$ = Makes sure that this must be the entire line
{3,} = Makes sure there are a minimum of 3 characters.
You can add a number after the comma for a character limit max
You could also use a +, which would merely enforce at least one character match the second pattern. A * would not enforce any lengths
Use this as your regex:
^[A-Za-z][a-zA-Z0-9_]*$

New to Regular Expressions need help

I need a form with one button and window for input
that will check an array, via a regular expression.
And will find a exact match of letters + numbers. Example wxyz [some space btw] 0960000
or a mix of numbers and letters [some space btw] + numbers 01xg [some space btw] 0960000
The array has four objects for now.
Once found i need a function the will open a new page or window when match is found .
Thanks you for your help.
Michael
To answer the Javascript part, here's one way to "grep" through the array to find matching elements:
var matches = [];
var re = /whatever/;
foo.forEach(
function(el) {
if( re.exec(el) )
matches.push(el);
}
);
To attempt to answer the regular expression part: I don't know what "exact match" means to you, and I'm assuming "some space" belongs only in between the other terms, and I'm assuming letters means the English alphabet from 'a' to 'z' in lower and upper case and the digits should be 0-9 (otherwise, other language characters might be matched).
The first pattern would be /[a-zA-Z0-9]+\s*0960000/. Change "\s*" to "\s+" if there is at least one space, instead of zero or more space characters. Change "\s" to " " if matching the tab character (and some lesser-used space chars) is not desirable.
For the second pattern, I don't know what "numbers 01xg" means, but if it means numbers followed by that string, then the pattern would be /[a-zA-Z0-9]+\s*[0-9]+\s*01xg\s*0960000/. The same caveats apply as above.
Additionally, this will match a partial string. If the string much be matched in entirety (if nothing in the string must exist except that which is matched), add "^" to the beginning of the pattern to anchor it to the beginning of the string, and "$" at the end to anchor it to the end of the string. For example, /[a-zA-Z0-9]+\s*0960000/ matches "foo_bar 5 0960000", but /^[a-zA-Z0-9]+\s*0960000$/ does not.
For more on regular expressions in Javascript, take a look at developer.mozilla.org's article on the RegExp object (the link takes you to JS version 1.5 reference, which should apply to all JS-capable browsers).
(edited to add): To match either situation, since they have overlapping parts, you could use the following pattern: /[a-zA-Z0-9]+(?:\s*[0-9]+\s*01xg)?\s*0960000/. The question mark says to match the part that differs -- in a non-matching group (?:foo) -- once or zero times. (?:foo)? and (?:foo|) do the same thing in this case, but I'm not sure whether there is a performance difference; I would recommend to use the one that makes the most sense to you, so you can read it later.

Categories

Resources