JavaScript Regex split at first letter? - javascript

Since regex solutions differ from case to case depending on the format of the string, I'm having a hard time finding a solution to my problem.
I have an array containing strings in the format, as an example:
"XX:XX - XX:XX Algorithm and Data Structures"
Where "XX:XX - XX:XX" is the timespan for a lecture, and each X is a digit.
I'm new to Regex and trying to split the string at the first letter occurring, like so:
let str = "08:15 - 12:50 Algorithm and Data Structures";
let re = //Some regex expression
let result = str.split(re); // Output: ["08:15 - 12:50", "Algorithm and Data Structures"]
I'm thinking it should be something like /[a-Z]/ but I'm not sure at all...
Thanks in advance!

The easiest way is probably to "mark" where you want to split and then split:
const str = '12 34 abcde 45 abcde'.replace(/^([^a-z]+)([a-z])/i, '$1,$2');
// '12 34 ,abcde 45 abcde'
str.split(',')
// [ '12 34 ', 'abcde 45 abcde' ]
This finds the place where the string starts, has a run of non a-z characters, then has an a-z character, and puts a comma right in between. Then you split by the comma. Note that this assumes the string contains no other commas.
You can also split directly with a positive look ahead but it might make the regex a bit less readable.
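A marker-free variant, as a minimal sketch: find the index of the first letter with search() and slice the string there. The helper name splitAtFirstLetter is made up for illustration, and it assumes that "letter" means ASCII a-z.

```javascript
// Hypothetical helper (not from the answer above): split at the first ASCII letter.
function splitAtFirstLetter(input) {
  const index = input.search(/[a-z]/i); // index of the first letter, or -1
  if (index === -1) return [input];     // no letter: nothing to split
  // trimEnd() drops the whitespace that separated the two parts
  return [input.slice(0, index).trimEnd(), input.slice(index)];
}

console.log(splitAtFirstLetter("08:15 - 12:50 Algorithm and Data Structures"));
// [ '08:15 - 12:50', 'Algorithm and Data Structures' ]
```

Because no marker character is inserted, commas in the lecture title are left alone.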

console.log(
"08:15 - 12:50 Algorithm and Data Structures".split(/ ([A-Za-z].*)/).filter(Boolean)
)
Or, if the format is really always XX:XX - XX:XX, it is easier to just do:
const splitTimeAndCourse = (input) => {
return [
input.slice(0, "XX:XX - XX:XX".length),
input.slice("XX:XX - XX:XX".length + 1)
]
}
console.log(splitTimeAndCourse("08:15 - 12:50 Algorithm and Data Structures"))

If the time part of the string has a fixed length, you can use a regex like this, for example:
(^.{0,13})(.*)
Check this here https://regex101.com/r/ANMHy5/1
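A sketch of how that pattern might be used from JavaScript. The trim() call is an addition of mine, since group 2 would otherwise start with the space that separates the two parts.

```javascript
const str = "08:15 - 12:50 Algorithm and Data Structures";
// Group 1 takes up to 13 characters, group 2 takes the rest of the line.
const [, timespan, rest] = str.match(/(^.{0,13})(.*)/);
// trim() is an addition here: group 2 starts with the separating space.
const parts = [timespan, rest.trim()];
console.log(parts);
// [ '08:15 - 12:50', 'Algorithm and Data Structures' ]
```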

I know you asked about regex in particular, but here is a way to do this without regex...
Provided your time span is always at the beginning of your string and is always formatted with white space between the numbers, as in XX:XX - XX:XX, you could use a function that splits the string at the white space and reconstructs the first three substrings into one chunk (the time span) and the remaining substrings into a second chunk (the lecture title), then returns the two chunks as an array.
let str = "08:15 - 12:50 Algorithm and Data Structures";
const splitString = (str) => {
// split the string at the white spaces
const strings = str.split(' ')
// define variables
let lecture = '',
timespan = '';
// structure the timespan from the first three substrings
timespan = `${strings[0]} ${strings[1]} ${strings[2]}`;
// loop over the strings
strings.forEach((str, i) => {
// concatenate the remaining strings into a new string
if (i > 2) lecture += `${str} `;
})
// place them into an array and remove white space from end of second string
return [timespan, lecture.trimEnd()]
}
console.log(splitString(str))
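The same idea can be written more compactly with slice and join. This is only a sketch under the same assumption as above (the first three space-separated pieces form the time span); the name splitBySpaces is hypothetical.

```javascript
// Hypothetical compact variant of the function above.
const splitBySpaces = (input) => {
  const words = input.split(' ');
  // first three pieces -> time span, everything after -> lecture title
  return [words.slice(0, 3).join(' '), words.slice(3).join(' ')];
};

console.log(splitBySpaces("08:15 - 12:50 Algorithm and Data Structures"));
// [ '08:15 - 12:50', 'Algorithm and Data Structures' ]
```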

For that format, you might also use 2 capture groups instead of using split.
^(\d{1,2}:\d{1,2}\s*-\s*\d{1,2}:\d{1,2})\s+([A-Za-z].*)
The pattern matches:
^ Start of string
(\d{1,2}:\d{1,2}\s*-\s*\d{1,2}:\d{1,2}) Capture group 1, match a timespan like pattern
\s+ Match 1+ whitespace chars
([A-Za-z].*) Capture group 2, start with a char A-Za-z and match the rest of the line.
let str = "08:15 - 12:50 Algorithm and Data Structures";
let regex = /^(\d{1,2}:\d{1,2}\s*-\s*\d{1,2}:\d{1,2})\s+([A-Za-z].*)/;
let [, ...groups] = str.match(regex);
console.log(groups);
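Note that str.match returns null when the line does not fit the format, in which case the destructuring above throws. A possible guard, sketched with named capture groups; the function name parseLecture and the group names timespan and title are made up for illustration.

```javascript
// Same pattern as above, with named groups (names are hypothetical).
const lecturePattern = /^(?<timespan>\d{1,2}:\d{1,2}\s*-\s*\d{1,2}:\d{1,2})\s+(?<title>[A-Za-z].*)/;

function parseLecture(line) {
  const match = line.match(lecturePattern);
  // null signals that the line is not in the expected format
  return match ? [match.groups.timespan, match.groups.title] : null;
}

console.log(parseLecture("08:15 - 12:50 Algorithm and Data Structures"));
// [ '08:15 - 12:50', 'Algorithm and Data Structures' ]
console.log(parseLecture("no timespan here"));
// null
```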
Another option using split is a lookbehind (not supported in all JavaScript engines) that asserts there are no chars a-zA-Z to the left, counting from the start of the string, followed by matching 1+ whitespace chars and asserting a char a-zA-Z to the right.
(?<=^[^a-zA-Z]+)\s+(?=[A-Za-z])
let str = "08:15 - 12:50 Algorithm and Data Structures";
let regex = /(?<=^[^a-zA-Z]+)\s+(?=[A-Za-z])/;
console.log(str.split(regex))

Related

Regex match apostrophe inside, but not around words, inside a character set

I'm counting how many times different words appear in a text using Regular Expressions in JavaScript. My problem is when I have quoted words: 'word' should be counted simply as word (without the quotes, otherwise they'll behave as two different words), while it's should be counted as a whole word.
(?<=\w)(')(?=\w)
This regex can identify apostrophes inside, but not around words. Problem is, I can't use it inside a character set such as [\w]+.
(?<=\w)(')(?=\w)|[\w]+
Will count it's a 'miracle' of nature as 7 words, instead of 5 (it, ', s becoming 3 different words). Also, the third word should be selected simply as miracle, and not as 'miracle'.
To make things even more complicated, I need to capture diacritics too, so I'm using [A-Za-zÀ-ÖØ-öø-ÿ] instead of \w.
How can I accomplish that?
1) You can simply use /[^\s]+/g regex
const str = `it's a 'miracle' of nature`;
const result = str.match(/[^\s]+/g);
console.log(result.length);
console.log(result);
2) If you are calculating total number of words in a string then you can also use split as:
const str = `it's a 'miracle' of nature`;
const result = str.split(/\s+/);
console.log(result.length);
console.log(result);
3) If you want each word without a quote at the start and at the end, you can do this:
const str = `it's a 'miracle' of nature`;
const result = str.match(/[^\s]+/g).map((s) => {
s = s[0] === "'" ? s.slice(1) : s;
s = s[s.length - 1] === "'" ? s.slice(0, -1) : s;
return s;
});
console.log(result.length);
console.log(result);
You might use an alternation with 2 capture groups, and then check for the values of those groups.
(?<!\S)'(\S+)'(?!\S)|(\S+)
(?<!\S)' Negative lookbehind, assert a whitespace boundary to the left and match '
(\S+) Capture group 1, match 1+ non whitespace chars
'(?!\S) Match ' and assert a whitespace boundary to the right
| Or
(\S+) Capture group 2, match 1+ non whitespace chars
const regex = /(?<!\S)'(\S+)'(?!\S)|(\S+)/g;
const s = "it's a 'miracle' of nature";
Array.from(s.matchAll(regex), m => {
if (m[1]) console.log(m[1])
if (m[2]) console.log(m[2])
});
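For the diacritics requirement, one possible sketch builds the word pattern from the character class given in the question: a run of letters, optionally continued by an apostrophe plus more letters. The names LETTER, WORD and countWords are made up for illustration.

```javascript
// Letters as defined in the question (ASCII plus Latin-1 letters with diacritics).
const LETTER = "[A-Za-zÀ-ÖØ-öø-ÿ]";
// A word is a run of letters, optionally continued by an apostrophe plus more letters,
// so surrounding quotes are never part of a match.
const WORD = new RegExp(`${LETTER}+(?:'${LETTER}+)*`, "g");

function countWords(text) {
  const counts = new Map();
  for (const word of text.match(WORD) || []) {
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  return counts;
}

console.log(countWords("it's a 'miracle' of nature, a miracle"));
// it's => 1, a => 2, miracle => 2, of => 1, nature => 1
```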

Regex matches numbers in date but shouldn't

Why does my regex pattern match the date part of the string? It seems like I'm not accounting for the / (slash) correctly with [^\/] to prevent the pattern from matching date strings.
const reg = new RegExp(
/(USD|\$|EUR|€|USDC|USDT)?\s?(\d+[^\/]|\d{1,3}(,\d{3})*)(\.\d+)?(k|K|m|M)?\b/,
"i"
);
const str = "02/22/2021 $50k";
console.log(reg.exec(str));
// result: ['02', undefined, '02', undefined, undefined, undefined, undefined, index: 0, input: '02/22/2021 $50k', groups: undefined]
// was expecting: [$50k,...]
You get those matches for the date part, and the undefined groups, because you use a pattern with optional parts and alternations (|).
In your pattern there is this part: (\d+[^\/]|\d{1,3}(,\d{3})*). The first part of the alternation, \d+[^\/], matches 1+ digits followed by any char except a / (which can also be a digit), so it needs at least 2 characters. That part will match 02, 22 and 2021 in the date part.
If there is 1 digit, the second part of the alternation will match it.
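To see this in isolation, here is a small sketch that runs only the first alternative against the input from the question. Without the trailing \b of the full pattern, the match for the year also picks up the following space.

```javascript
const str = "02/22/2021 $50k";
// Only the first alternative from the question's pattern, applied globally.
// [^\/] happily consumes a digit, so two-digit date parts match on their own.
const firstAlternativeMatches = str.match(/\d+[^\/]/g);
console.log(firstAlternativeMatches);
// [ '02', '22', '2021 ', '50k' ]
```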
If you also want to match bare numbers, you can assert that there is no / to the left or the right, and make the whole currency part optional (the alternatives like USD together with the optional whitespace char), so that the whitespace is only matched when a currency precedes the digits.
The last alternation can be shortened to a character class [km]? with a case insensitive flag.
See this page for the lookbehind support for Javascript.
(?:(?:USD|\$|EUR|€|USDC|USDT)\s?)?(?<!\/)\b(?:\d{1,3}(?:,\d{3})*(?:\.\d+)?|\d+)(?!\/)[KkMm]?\b
const reg = /(?:(?:USD|\$|EUR|€|USDC|USDT)\s?)?(?<!\/)\b(?:\d{1,3}(?:,\d{3})*(?:\.\d+)?|\d+)(?!\/)[KkMm]?\b/gi;
const str = "02/22/2021 $50k 1,213.3 11111111 $50,000 $50000"
const res = Array.from(str.matchAll(reg), m => m[0]);
console.log(res)
If the currency is not optional:
(?:USD|\$|EUR|€|USDC|USDT)\s?(?:\d{1,3}(?:,\d{3})*(?:\.\d+)?|\d+)[KkMm]?\b
I don't fully understand your regex, so I tried to figure out what result you would expect. Check this: the groups contain each part of your string.
const regex = /(\d{2})*\/?(\d{2})\/(\d{2,4})?\s*(USD|\$|EUR|€|USDC|USDT)?(\d*)(k|K|m|M)?\b/i
const regexNamed = /(?<day>\d{2})*\/?(?<month>\d{2})\/(?<year>\d{2,4})?\s*(?<currency>USD|\$|EUR|€|USDC|USDT)?(?<value>\d*)(?<unit>k|K|m|M)?\b/i
const str1 = '02/22/2021 $50k'
const str2 = '02/2021 €50m'
const m1 = str1.match(regex)
const m2 = str2.match(regexNamed)
console.log(m1)
console.log(m2.groups)

Regex for getting only the last N numbers in javascript

I've been trying to write a regex for these strings:
case1: test-123456789 should get 56789
case2: test-1234-123456789 should get 56789
case3: test-12345 should fail or return nothing
What I need is a way to get only the last 5 digits, and only when the string ends in 9 digits.
So far I have this:
str.match(/\d{5}$/)
It works for the first 2 cases but not for the last one.
You may use
/\b\d{4}(\d{5})$/
Get the Group 1 value.
Details
\b - word boundary (to make sure the digit chunks are 9 digits long) - if your digit chunks at the end of the string can contain more, remove \b
\d{4} - four digits
(\d{5}) - Group 1: five digits
$ - end of string.
JS demo:
var strs = ['test-123456789','test-1234-123456789','test-12345'];
var rx = /\b\d{4}(\d{5})$/;
for (var s of strs) {
var m = s.match(rx);
if (m) {
console.log(s, "=>", m[1]);
} else {
console.log("Fail for ", s);
}
}
You can try this:
var test="test-123456789";
console.log((test.match(/[^\d]\d{4}(\d{5})$/)||{1: null/*default value if not found*/})[1]);
This approach supports a default value for when no match is found (see the inline comment in the code above).
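In engines that support optional chaining and nullish coalescing, the same fallback could be sketched as follows. The helper name lastFiveOfNine is hypothetical.

```javascript
// Hypothetical helper: last 5 digits of a trailing 9-digit run, or null.
function lastFiveOfNine(input) {
  // \D requires a non-digit right before the 9 digits, as [^\d] does above.
  // ?.[1] yields undefined when match() returns null; ?? turns that into null.
  return input.match(/\D\d{4}(\d{5})$/)?.[1] ?? null;
}

console.log(lastFiveOfNine("test-123456789"));      // 56789
console.log(lastFiveOfNine("test-1234-123456789")); // 56789
console.log(lastFiveOfNine("test-12345"));          // null
```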
You can use a positive lookbehind (?<= ) to assert that your group of 5 digits is preceded by a group of 4 digits without including them in the result.
/(?<=\d{4})\d{5}$/
var inputs = [
"test-123456789", // 56789
"test-1234-123456789", // 56789
"test-12345", //fail or not giving anything
]
var rgx = /(?<=\d{4})\d{5}$/
inputs.forEach(str => {
console.log(rgx.exec(str))
})

regex to extract numbers starting from second symbol

Sorry for one more to the tons of regexp questions but I can't find anything similar to my needs. I want to output the string which can contain number or letter 'A' as the first symbol and numbers only on other positions. Input is any string, for example:
---INPUT--- -OUTPUT-
A123asdf456 -> A123456
0qw#$56-398 -> 056398
B12376B6f90 -> 12376690
12A12345BCt -> 1212345
What I tried is replace(/[^A\d]/g, '') (I use JS), which almost does the job except the case when there's A in the middle of the string. I tried to use ^ anchor but then the pattern doesn't match other numbers in the string. Not sure what is easier - extract matching characters or remove unmatching.
I think you can do it like this using a negative lookahead and then replace with an empty string.
In a non-capturing group (?:, use a negative lookahead (?! to assert that what follows is neither an A at the beginning of the string (^A) nor a digit (\d). If that is the case, match any character (.).
(?:(?!^A|\d).)+
var pattern = /(?:(?!^A|\d).)+/g;
var strings = [
"A123asdf456",
"0qw#$56-398",
"B12376B6f90",
"12A12345BCt"
];
for (var i = 0; i < strings.length; i++) {
console.log(strings[i] + " ==> " + strings[i].replace(pattern, ""));
}
You can match and capture desired and undesired characters within two different sides of an alternation, then replace those undesired with nothing:
^(A)|\D
JS code:
var inputStrings = [
"A-123asdf456",
"A123asdf456",
"0qw#$56-398",
"B12376B6f90",
"12A12345BCt"
];
console.log(
inputStrings.map(v => v.replace(/^(A)|\D/g, "$1"))
);
You can use the following regex: /(^A|\d)/g
var arr = ['A123asdf456','0qw#$56-398','B12376B6f90','12A12345BCt', 'A-123asdf456'],
// fall back to an empty array so join() is not called on null when nothing matches
result = arr.map(s => (s.match(/(^A|\d)/g) || []).join(''));
console.log(result);

Regex using javascript to return just numbers

If I have a string like "something12" or "something102", how would I use a regex in javascript to return just the number parts?
Regular expressions:
var numberPattern = /\d+/g;
'something102asdfkj1948948'.match( numberPattern )
This would return an Array with two elements inside, '102' and '1948948'. Operate as you wish. If it doesn't match any it will return null.
To concatenate them:
'something102asdfkj1948948'.match( numberPattern ).join('')
Assuming you're not dealing with complex decimals, this should suffice I suppose.
You could also strip all the non-digit characters (\D or [^0-9]):
let word_With_Numbers = 'abc123c def4567hij89'
// stripping every non-digit leaves only the digits
let numbers_Only = word_With_Numbers.replace(/\D/g, '');
console.log(numbers_Only) // '123456789'
For number with decimal fraction and minus sign, I use this snippet:
const NUMERIC_REGEXP = /[-]{0,1}[\d]*[.]{0,1}[\d]+/g;
const numbers = '2.2px 3.1px 4px -7.6px obj.key'.match(NUMERIC_REGEXP)
console.log(numbers); // ["2.2", "3.1", "4", "-7.6"]
Update: - 7/9/2018
Found a tool which allows you to edit regular expression visually: JavaScript Regular Expression Parser & Visualizer.
Update:
Here's another one with which you can even debug a regexp: Online regex tester and debugger.
Update:
Another one: RegExr.
Update:
Regexper and Regex Pal.
If you want only digits:
var value = '675-805-714';
var numberPattern = /\d+/g;
value = value.match( numberPattern ).join('');
alert(value);
//Show: 675805714
Now you get the digits joined
I guess you want to get number(s) from the string. In which case, you can use the following:
// Returns an array of numbers located in the string
function get_numbers(input) {
return input.match(/[0-9]+/g);
}
var first_test = get_numbers('something102');
var second_test = get_numbers('something102or12');
var third_test = get_numbers('no numbers here!');
alert(first_test); // 102 (the array ['102'])
alert(second_test); // 102,12 (the array ['102', '12'])
alert(third_test); // null
IMO the #3 answer at this time by Chen Dachao is the right way to go if you want to capture any kind of number, but the regular expression can be shortened from:
/[-]{0,1}[\d]*[\.]{0,1}[\d]+/g
to:
/-?\d*\.?\d+/g
For example, this code:
"lin-grad.ient(217deg,rgba(255, 0, 0, -0.8), rgba(-255,0,0,0) 70.71%)".match(/-?\d*\.?\d+/g)
generates this array:
["217","255","0","0","-0.8","-255","0","0","0","70.71"]
I've butchered an MDN linear gradient example so that it fully tests the regexp and doesn't need to scroll here. I think I've included all the possibilities in terms of negative numbers, decimals, unit suffixes like deg and %, inconsistent comma and space usage, and the extra dot/period and hyphen/dash characters within the text "lin-grad.ient". Please let me know if I'm missing something. The only thing I can see that it does not handle is a badly formed decimal number like "0..8".
If you really want an array of numbers, you can convert the entire array in the same line of code:
array = whatever.match(/-?\d*\.?\d+/g).map(Number);
My particular code, which is parsing CSS functions, doesn't need to worry about the non-numeric use of the dot/period character, so the regular expression can be even simpler:
/-?[\d\.]+/g
var result = input.match(/\d+/g).join('')
Using split and regex :
var str = "fooBar0123".split(/(\d+)/);
console.log(str[0]); // fooBar
console.log(str[1]); // 0123
The answers given don't actually match your question, which implied a trailing number. Also, remember that you're getting a string back; if you actually need a number, cast the result:
item=item.replace(/^.*\D(\d*)$/, '$1'); // a regex literal is needed: a string pattern would be matched literally
if (!/^\d+$/.test(item)) throw 'parse error: number not found';
item=Number(item);
If you're dealing with numeric item ids on a web page, your code could also usefully accept an Element, extracting the number from its id (or its first parent with an id); if you've an Event handy, you can likely get the Element from that, too.
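A rough sketch of that idea, assuming a browser-style DOM where Element.closest is available. The helper name trailingNumber is hypothetical, and the way an Event is told apart from an Element (by the presence of closest) is a simplification of mine.

```javascript
// Hypothetical helper: accepts an id string, an Element, or an Event.
function trailingNumber(source) {
  let text = source;
  if (source !== null && typeof source === "object") {
    // Elements have closest(); for an Event, use the Element it was dispatched on.
    const element = typeof source.closest === "function" ? source : source.target;
    // closest('[id]') returns the element itself or its nearest ancestor with an id attribute.
    const withId = element.closest("[id]");
    text = withId ? withId.id : "";
  }
  const match = /(\d+)$/.exec(String(text));
  if (!match) throw new Error("parse error: number not found");
  return Number(match[1]);
}

console.log(trailingNumber("item-42")); // 42
```

In an event handler this might be called as trailingNumber(event), which resolves the id of the clicked element or of its nearest ancestor that has one.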
As per @Syntle's answer, if you have only non-numeric characters you'll get an Uncaught TypeError: Cannot read property 'join' of null.
This will prevent errors if no matches are found and return an empty string:
('something'.match( /\d+/g )||[]).join('')
Here is a solution to convert the string to a valid plain or decimal number using Regex:
//something123.777.321something to 123.777321
const str = 'something123.777.321something';
//the g flag is needed to strip every run of non-numeric characters, not just the first
let initialValue = str.replace(/[^0-9.]+/g, '');
//initialValue = '123.777.321';
//characterCount just counts the occurrences of a character in a given string
const characterCount = (s, ch) => s.split(ch).length - 1;
if (characterCount(initialValue, '.') > 1) {
const splitValue = initialValue.split('.');
//splitValue = ['123','777','321'];
initialValue = splitValue.shift() + '.' + splitValue.join('');
//result i.e. initialValue = '123.777321'
}
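If you also want numbers with a decimal point, then:
\d*\.?\d*
or
[0-9]*\.?[0-9]*
Both patterns only handle the dot (not the comma), and since every part is optional they can also match an empty string.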
You can use https://regex101.com/ to test your regexes.
Everything that other solutions have, but with a little validation
// value = '675-805-714'
const validateNumberInput = (value) => {
let numberPattern = /\d+/g
let numbers = value.match(numberPattern)
if (numbers === null) {
return 0
}
return parseInt(numbers.join(''), 10)
}
// 675805714
One liner
If you do not care about decimal numbers and only need the digits, I think this one liner is rather elegant:
/**
* @param {String} str
* @returns {String} - All digits from the given `str`
*/
const getDigitsInString = (str) => str.replace(/[^\d]*/g, '');
console.log([
'?,!_:/42\`"^',
'A 0 B 1 C 2 D 3 E',
' 4 twenty 20 ',
'1413/12/11',
'16:20:42:01'
].map((str) => getDigitsInString(str)));
Simple explanation:
\d matches any digit from 0 to 9
[^n] matches anything that is not n
* matches 0 times or more the predecessor
( It is an attempt to match a whole block of non-digits all at once )
g at the end, indicates that the regex is global to the entire string and that we will not stop at the first occurrence but match every occurrence within it
Together those rules match anything but digits, which we replace with an empty string, resulting in a string that contains digits only.
