Regex to find 5 consecutive letters of alphabet (ex. abcde, noprst) - javascript

I have strings containing 5 letters of the alphabet. I would like to match those whose letters are consecutive in the alphabet, for example:
abcde - return match
nopqrs - return match
cdefg - return match
fghij - return match
but
abcef - do not return match
abbcd - do not return match
I could write out all the combinations, but since a regex can express a range such as [A-Z], I assumed there must be a better way.

A very simple alternative would be to just use String.prototype.includes:
function isConsecutive(string) {
const result = 'abcdefghijklmnopqrstuvwxyz'.includes(string);
console.log(string, result);
}
// true
isConsecutive('abcde');
isConsecutive('nopqrs');
isConsecutive('cdefg');
isConsecutive('fghij');
// false
isConsecutive('abcef');
isConsecutive('abbcd');
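One caveat with includes: it also returns true for shorter inputs such as 'abc', and for the empty string. A minimal sketch that adds a length check, assuming the expected length is known in advance (the name isConsecutiveRun and the default length of 5 are illustrative, not part of the answer above):

```javascript
const ALPHABET = 'abcdefghijklmnopqrstuvwxyz';

// Hypothetical helper: true only if `text` is exactly `length` characters long
// and appears as a contiguous run inside the lowercase alphabet.
function isConsecutiveRun(text, length = 5) {
  return typeof text === 'string' &&
    text.length === length &&
    ALPHABET.includes(text);
}

console.log(isConsecutiveRun('abcde'));     // 5 consecutive letters
console.log(isConsecutiveRun('abc'));       // too short for the default length
console.log(isConsecutiveRun('nopqrs', 6)); // 6 consecutive letters
```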

If you can live with Python, this function converts the string into character codes and checks whether they are consecutive (if so, the letters are also consecutive alphabetically). Because it sorts the codes first, it also accepts the letters in any order:
def are_letters_consequtive(text):
    nums = [ord(letter) for letter in text]
    if sorted(nums) == list(range(min(nums), max(nums)+1)):
        return "match"
    return "no match"
print(are_letters_consequtive('abcde'))
print(are_letters_consequtive('cdefg'))
print(are_letters_consequtive('fghij'))
print(are_letters_consequtive('abcef'))
print(are_letters_consequtive('abbcd'))
print(are_letters_consequtive('noprst'))
Outputs:
match
match
match
no match
no match
no match

An alternative using javascript:
let string1 = 'abcde'
let string2 = 'fghiz'
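function conletters(string) {
  // Check the type first: reading .length of null or undefined would throw a TypeError
  if (typeof string != 'string' || string.length > 5) throw '[ERROR] not string or string greater than 5'
  for (let i = 0; i < string.length - 1; i++) {
    // Each character code must be exactly one more than the previous one
    if (string.charCodeAt(i) + 1 != string.charCodeAt(i + 1))
      return false
  }
  return true
}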
console.log('string1 is consecutive: ' + conletters(string1))
console.log('string2 is consecutive: ' + conletters(string2))

You should definitely do it with code:
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
That said, you can do better than testing all the combinations when using regexes. With lookahead assertions you can effectively perform an "and" operation. Since you know the length, you could do:
const myRegex = /(?=^(ab|bc)...$)(?=^.(ab|bc)..$)(?=^..(ab|bc).$)(?=^...(ab|bc)$)/
You will need to replace (ab|bc) with all the possible pairs of adjacent letters (ab, bc, cd, and so on up to yz).
For this particular case it is actually worse than testing all the possibilities (since there are only 22 possibilities) but it makes it more extensible to other situations.
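Writing out every adjacent pair by hand is tedious, so the pattern can be generated instead. The following is only a sketch of the lookahead idea above: the helper name buildConsecutiveRegex is made up for illustration, and it assumes lowercase ASCII input of a known length.

```javascript
const alphabet = 'abcdefghijklmnopqrstuvwxyz';

// All adjacent pairs: ab, bc, cd, ..., yz
const pairs = [];
for (let i = 0; i < alphabet.length - 1; i++) {
  pairs.push(alphabet.slice(i, i + 2));
}
const pairGroup = `(?:${pairs.join('|')})`;

// Hypothetical helper: one lookahead per position, each asserting that the
// two characters starting at that position form an adjacent pair.
function buildConsecutiveRegex(length) {
  let source = '';
  for (let pos = 0; pos < length - 1; pos++) {
    source += `(?=^.{${pos}}${pairGroup}.{${length - 2 - pos}}$)`;
  }
  return new RegExp(source);
}

const fiveLetters = buildConsecutiveRegex(5);
for (const s of ['abcde', 'cdefg', 'fghij', 'abcef', 'abbcd']) {
  console.log(s, fiveLetters.test(s));
}
```

Because every lookahead is anchored with ^ and $, the resulting regex only accepts strings of exactly the requested length.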

Related

Regex match apostrophe inside, but not around words, inside a character set

I'm counting how many times different words appear in a text using Regular Expressions in JavaScript. My problem is when I have quoted words: 'word' should be counted simply as word (without the quotes, otherwise they'll behave as two different words), while it's should be counted as a whole word.
(?<=\w)(')(?=\w)
This regex can identify apostrophes inside, but not around words. Problem is, I can't use it inside a character set such as [\w]+.
(?<=\w)(')(?=\w)|[\w]+
Will count it's a 'miracle' of nature as 7 words, instead of 5 (it, ', s becoming 3 different words). Also, the third word should be selected simply as miracle, and not as 'miracle'.
To make things even more complicated, I need to capture diacritics too, so I'm using [A-Za-zÀ-ÖØ-öø-ÿ] instead of \w.
How can I accomplish that?
1) You can simply use /[^\s]+/g regex
const str = `it's a 'miracle' of nature`;
const result = str.match(/[^\s]+/g);
console.log(result.length);
console.log(result);
2) If you are only counting the total number of words in a string, you can also use split:
const str = `it's a 'miracle' of nature`;
const result = str.split(/\s+/);
console.log(result.length);
console.log(result);
3) If you want each word without a quote at the start and at the end, you can do:
const str = `it's a 'miracle' of nature`;
const result = str.match(/[^\s]+/g).map((s) => {
s = s[0] === "'" ? s.slice(1) : s;
s = s[s.length - 1] === "'" ? s.slice(0, -1) : s;
return s;
});
console.log(result.length);
console.log(result);
You might use an alternation with 2 capture groups, and then check for the values of those groups.
(?<!\S)'(\S+)'(?!\S)|(\S+)
(?<!\S)' Negative lookbehind, assert a whitespace boundary to the left and match '
(\S+) Capture group 1, match 1+ non whitespace chars
'(?!\S) Match ' and assert a whitespace boundary to the right
| Or
(\S+) Capture group 2, match 1+ non whitespace chars
const regex = /(?<!\S)'(\S+)'(?!\S)|(\S+)/g;
const s = "it's a 'miracle' of nature";
Array.from(s.matchAll(regex), m => {
if (m[1]) console.log(m[1])
if (m[2]) console.log(m[2])
});
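For the original goal (counting word frequencies while keeping inner apostrophes and diacritics), a different technique from the answers above may also work: match each word directly as letters optionally joined by single apostrophes. This is a sketch under assumptions: the letter class uses the same ranges as the question's [A-Za-zÀ-ÖØ-öø-ÿ], written as escapes, and the name countWords and the lowercasing step are illustrative choices.

```javascript
// Same ranges as the question's character class, written as Unicode escapes.
const LETTER = '[A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF]';

// One or more letters, optionally followed by groups of apostrophe + letters.
// A leading or trailing quote is never part of the match.
const WORD = new RegExp(`${LETTER}+(?:'${LETTER}+)*`, 'g');

// Hypothetical helper: returns a Map of lowercased word -> occurrence count.
function countWords(text) {
  const counts = new Map();
  for (const word of text.match(WORD) || []) {
    const key = word.toLowerCase();
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

const counts = countWords("It's a 'miracle' of nature, it's \u00e9t\u00e9");
console.log([...counts.entries()]);
```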

regex to extract numbers starting from second symbol

Sorry for one more to the tons of regexp questions but I can't find anything similar to my needs. I want to output the string which can contain number or letter 'A' as the first symbol and numbers only on other positions. Input is any string, for example:
---INPUT--- -OUTPUT-
A123asdf456 -> A123456
0qw#$56-398 -> 056398
B12376B6f90 -> 12376690
12A12345BCt -> 1212345
What I tried is replace(/[^A\d]/g, '') (I use JS), which almost does the job except the case when there's A in the middle of the string. I tried to use ^ anchor but then the pattern doesn't match other numbers in the string. Not sure what is easier - extract matching characters or remove unmatching.
I think you can do it like this using a negative lookahead and then replace with an empty string.
In a non-capturing group (?:, use a negative lookahead (?! to assert that what follows is neither an A at the beginning of the string (^A) nor a digit (\d). If that is the case, match any character (.).
(?:(?!^A|\d).)+
var pattern = /(?:(?!^A|\d).)+/g;
var strings = [
"A123asdf456",
"0qw#$56-398",
"B12376B6f90",
"12A12345BCt"
];
for (var i = 0; i < strings.length; i++) {
console.log(strings[i] + " ==> " + strings[i].replace(pattern, ""));
}
You can match and capture desired and undesired characters within two different sides of an alternation, then replace those undesired with nothing:
^(A)|\D
JS code:
var inputStrings = [
"A-123asdf456",
"A123asdf456",
"0qw#$56-398",
"B12376B6f90",
"12A12345BCt"
];
console.log(
inputStrings.map(v => v.replace(/^(A)|\D/g, "$1"))
);
You can match either an A at the start of the string or any digit with /(^A|\d)/g, then join the matches:
var arr = ['A123asdf456','0qw#$56-398','B12376B6f90','12A12345BCt', 'A-123asdf456'],
result = arr.map(s => s.match(/(^A|\d)/g).join(''));
console.log(result);

What's the JS RegExp for this specific string?

I have a rather isolated situation in an inventory management program where our shelf locations have a specific format, which is always Letter: Number-Letter-Number, such as Y: 1-E-4. Most of us coworkers just type in "y1e4" and are done with it, but that obviously creates issues with inconsistent formats in a database. Are JS RegExp's the ideal way to automatically detect and format these alphanumeric strings? I'm slowly wrapping my head around JavaScript's Perl syntax, but what's a simple example of formatting one of these strings?
spec: detect string format of either "W: D-W-D" or "WDWD" and return "W: D-W-D"
This function accepts either format and returns undefined if the input doesn't match, or the formatted string if it does.
function validateInventoryCode(input) {
var regexp = /^([a-zA-Z]+)(?:\:\s*)?(\d+)-?(\w+)-?(\d+)$/
var r = regexp.exec(input);
if(r != null) {
return `${r[1]}: ${r[2]}-${r[3]}-${r[4]}`;
}
}
var possibles = ["y1e1", "y:1e1", "Y: 1r3", "y: 32e4", "1:e3e"];
possibles.forEach(function(posssiblity) {
console.log(`input(${posssiblity}), result(${validateInventoryCode(posssiblity)})`);
})
I understand the question as "convert LetterNumberLetterNumber to Letter: Number-Letter-Number".
You may use
/^([a-z])(\d+)([a-z])(\d+)$/i
and replace with $1: $2-$3-$4
Details:
^ - start of string
([a-z]) - Group 1 (referenced with $1 from the replacement pattern) capturing any ASCII letter (as /i makes the pattern case-insensitive)
(\d+) - Group 2 capturing 1 or more digits
([a-z]) - Group 3, a letter
(\d+) - Group 4, a number (1 or more digits)
$ - end of string.
var re = /^([a-z])(\d+)([a-z])(\d+)$/i;
var s = 'y1e2';
var result = s.replace(re, '$1: $2-$3-$4');
console.log(result);
OR - if the letters must be turned to upper case:
var re = /^([a-z])(\d+)([a-z])(\d+)$/i;
var s = 'y1e2';
var result = s.replace(re,
(m,g1,g2,g3,g4)=>`${g1.toUpperCase()}: ${g2}-${g3.toUpperCase()}-${g4}`
);
console.log(result);
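The spec in the question asks for both the shorthand ("y1e4") and the already formatted form ("Y: 1-E-4") to be accepted. One possible sketch that handles both with a single pattern follows; the name normalizeShelfCode, the tolerance for extra whitespace, and returning null on failure are assumptions rather than part of the answer above.

```javascript
// Letter, optional colon, number, optional hyphen, letter, optional hyphen, number.
// Whitespace between the parts is tolerated; /i makes the letters case-insensitive.
const SHELF_CODE = /^\s*([a-z])\s*:?\s*(\d+)\s*-?\s*([a-z])\s*-?\s*(\d+)\s*$/i;

// Hypothetical helper: returns the canonical "W: D-W-D" form, or null if the
// input does not look like a shelf location.
function normalizeShelfCode(input) {
  const m = SHELF_CODE.exec(input);
  if (m === null) return null;
  const [, letter1, number1, letter2, number2] = m;
  return `${letter1.toUpperCase()}: ${number1}-${letter2.toUpperCase()}-${number2}`;
}

for (const code of ['y1e4', 'Y: 1-E-4', 'y:12-e3', '1e3e']) {
  console.log(JSON.stringify(code), '=>', normalizeShelfCode(code));
}
```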
This is the function to match and replace the pattern:
function findAndFormat(text){
  var splittedText=text.split(' ');
  for(var i=0, textLength=splittedText.length; i<textLength; i++){
    // [A-Za-z] rather than [A-z], which would also match [ \ ] ^ _ and `
    var analyzed=splittedText[i].match(/[A-Za-z]\d[A-Za-z]\d$/);
    if(analyzed){
      var formattedString=analyzed[0][0].toUpperCase()+': '+analyzed[0][1]+'-'+analyzed[0][2].toUpperCase()+'-'+analyzed[0][3];
      text=text.replace(splittedText[i],formattedString);
    }
  }
  return text;
}
I think it's just as it reads:
y1e4
Letter, number, letter, number:
/([A-Za-z][0-9][A-Za-z][0-9])/g
(Note that [A-z] is not a safe shorthand here: it also matches the characters [ \ ] ^ _ and `, which sit between Z and a in ASCII.)
And yes, it's fine to use a regex in this case, as for form validation and the like. There are just some cases, such as intensive data processing, where overusing regular expressions hurts performance.
Example
"HelloY1E4world".replace(/([A-Za-z][0-9][A-Za-z][0-9])/g, ' ');
should return: "Hello world"
regxr.com always comes in handy

Remove Any Non-Digit And Check if Formatted as Valid Number

I'm trying to figure out a regex pattern that allows a string but removes anything that is not a digit, a ., or a leading -.
I am looking for the simplest way of removing any non-"number" characters from a string. This solution doesn't have to be regex.
This means that it should turn
1.203.00 -> 1.20300
-1.203.00 -> -1.20300
-1.-1 -> -1.1
.1 -> .1
3.h3 -> 3.3
4h.34 -> 4.34
44 -> 44
4h -> 4
The rule would be that the first period is a decimal point, and every following one should be removed. There should only be one minus sign in the string and it should be at the front.
I was thinking there should be a regex for it, but I just can't wrap my head around it. Most regex solutions I have figured out allow the second decimal point to remain in place.
You can use this replace approach:
In the first replace we remove all characters that are neither digits nor dots. The only exception is a leading hyphen, which we exclude from removal using a negative lookahead.
In the second replace, with a callback, we remove every dot after the first one.
Code & Demo:
var nums = ['..1', '1..1', '1.203.00', '-1.203.00', '-1.-1', '.1', '3.h3',
  '4h.34', '4.34', '44', '4h'
];
document.writeln("<pre>");
// Declare i with var so it does not leak as an implicit global
for (var i = 0; i < nums.length; i++)
  document.writeln(nums[i] + " => " + nums[i].replace(/(?!^-)[^\d.]+/g, "").
    replace(/^(-?\d*\.\d*)([\d.]+)$/,
      function($0, $1, $2) {
        return $1 + $2.replace(/[.]+/g, '');
      }));
document.writeln("</pre>");
A non-regex solution, implementing a trivial single-pass parser.
Uses ES5 Array features because I like them, but will work just as well with a for-loop.
function generousParse(input) {
var sign = false, point = false;
return input.split('').filter(function(char) {
if (char.match(/[0-9]/)) {
return sign = true;
}
else if (!sign && char === '-') {
return sign = true;
}
else if (!point && char === '.') {
return point = sign = true;
}
else {
return false;
}
}).join('');
}
var inputs = ['1.203.00', '-1.203.00', '-1.-1', '.1', '3.h3', '4h.34', '4.34', '4h.-34', '44', '4h', '.-1', '1..1'];
console.log(inputs.map(generousParse));
Yes, it's longer than multiple regex replaces, but it's much easier to understand and see that it's correct.
I can do it with a regex search-and-replace. num is the string passed in.
num.replace(/[^\d\-\.]/g, '').replace(/(.)-/g, '$1').replace(/\.(\d*\.)*/, function(s) {
return '.' + s.replace(/\./g, '');
});
OK, a weak attempt, but it seems to work:
var r = /^-?\.?\d+\.?|(?=[a-z]).*|\d+/g,
    str = "1.203.00\n-1.203.00\n-1.-1\n.1\n3.h3\n4h.34\n44\n4h",
    sar = str.split("\n").map(s=> s.match(r).join("").replace(/[a-z]/,""));
console.log(sar);
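The rules from the question (keep a leading minus, keep digits, keep only the first dot) can also be written out step by step, followed by the validity check the title asks for. This is a minimal sketch, not one of the answers above: the names cleanNumber and isValidNumber are made up, and treating "1." and ".1" as valid is an assumption.

```javascript
// Hypothetical helper: applies the question's rules one at a time.
function cleanNumber(input) {
  // Rule 1: a minus sign survives only if it is the very first character.
  const sign = input.startsWith('-') ? '-' : '';
  // Rule 2: drop everything that is neither a digit nor a dot.
  const stripped = input.replace(/[^\d.]/g, '');
  // Rule 3: the first dot is the decimal point; later dots are removed.
  const firstDot = stripped.indexOf('.');
  if (firstDot === -1) return sign + stripped;
  return sign +
    stripped.slice(0, firstDot + 1) +
    stripped.slice(firstDot + 1).replace(/\./g, '');
}

// Hypothetical check: optional minus, then digits with an optional fraction,
// or a fraction with no integer part.
function isValidNumber(text) {
  return /^-?(?:\d+\.?\d*|\.\d+)$/.test(text);
}

for (const s of ['1.203.00', '-1.203.00', '-1.-1', '.1', '3.h3', '4h.34', '44', '4h']) {
  const cleaned = cleanNumber(s);
  console.log(s, '->', cleaned, isValidNumber(cleaned));
}
```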

Split string in JavaScript using regex with zero width lookbehind

I know JavaScript regular expressions have native lookaheads but not lookbehinds.
I want to split a string at points either beginning with any member of one set of characters or ending with any member of another set of characters.
Split before ເ, ແ, ໂ, ໃ, ໄ. Split after ະ.
In: ເລື້ອຍໆມະຫັດສະຈັນເອກອັກຄະລັດຖະທູດ
Out: ເລື້ອຍໆມະ ຫັດສະ ຈັນ ເອກອັກຄະ ລັດຖະ ທູດ
I can achieve the "split before" part using zero-width lookahead:
'ເລື້ອຍໆມະຫັດສະຈັນເອກອັກຄະລັດຖະທູດ'.split(/(?=[ໃໄໂເແ])/)
["ເລື້ອຍໆມະຫັດສະຈັນ", "ເອກອັກຄະລັດຖະທູດ"]
But I can't think of a general approach to simulating zero-width lookbehind
I'm splitting strings of arbitrary Unicode text so don't want to substitute in special markers in a first pass, since I can't guarantee the absence of any string from my input.
Instead of splitting, you may consider using the match() method.
var s = 'ເລື້ອຍໆມະຫັດສະຈັນເອກອັກຄະລັດຖະທູດ',
r = s.match(/(?:(?!ະ).)+?(?:ະ|(?=[ໃໄໂເແ]|$))/g);
console.log(r); //=> [ 'ເລື້ອຍໆມະ', 'ຫັດສະ', 'ຈັນ', 'ເອກອັກຄະ', 'ລັດຖະ', 'ທູດ' ]
You could try matching rather than splitting:
> var re = /((?:(?!ະ).)+(?:ະ|$))/g;
undefined
> var str = "ເລື້ອຍໆມະຫັດສະຈັນເອກອັກຄະລັດຖະທູດ"
undefined
> var m;
undefined
> while ((m = re.exec(str)) != null) {
... console.log(m[1]);
... }
ເລື້ອຍໆມະ
ຫັດສະ
ຈັນເອກອັກຄະ
ລັດຖະ
ທູດ
Then split the elements of the resulting array again using the lookahead.
If you use capturing parentheses in the delimiter regex, the captured text is included in the returned array. So you can just split on /(ະ)/ and then concatenate each of the odd members of the resulting array to the preceding even member. Example:
"ເລື້ອຍໆມະຫັດສະຈັນເອກອັກຄະລັດຖະທູ".split(/(ະ)/).reduce(function(arr,str,index) {
if (index%2 == 0) {
arr.push(str);
} else {
arr[arr.length-1] += str
};
return arr;
},[])
Result: ["ເລື້ອຍໆມະ", "ຫັດສະ", "ຈັນເອກອັກຄະ", "ລັດຖະ", "ທູ"]
You can do another pass to split on the lookahead:
"ເລື້ອຍໆມະຫັດສະຈັນເອກອັກຄະລັດຖະທູ".split(/(ະ)/).reduce(function(arr,str,index) {
if (index%2 == 0) {
arr.push(str);
} else {
arr[arr.length-1] += str
};
return arr;
},[]).reduce(function(arr,str){return arr.concat(str.split(/(?=[ໃໄໂເແ])/));},[]);
Result: ["ເລື້ອຍໆມະ", "ຫັດສະ", "ຈັນ", "ເອກອັກຄະ", "ລັດຖະ", "ທູ"]
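Lookbehind assertions were added to JavaScript in ES2018, so in engines that implement them the split can be written directly, without simulating the lookbehind. A sketch, assuming such an engine (older runtimes will throw a SyntaxError on the (?<=...) syntax):

```javascript
const text = 'ເລື້ອຍໆມະຫັດສະຈັນເອກອັກຄະລັດຖະທູດ';

// Split before any of ເ ແ ໂ ໃ ໄ (lookahead) or after ະ (lookbehind).
// Both assertions are zero-width, so no characters are consumed.
const parts = text.split(/(?=[ເແໂໃໄ])|(?<=ະ)/);

console.log(parts);
console.log(parts.join(' '));
```

Since nothing is consumed by the delimiter, joining the parts back together reproduces the original string.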
