regex for nested values - javascript

I'm trying to get the numbers/strings out of a string that looks like this:
"[123][456][abc]"
Also I don't want to include "[" or "]" and I want to keep the values separate.

Try this on for size.
/\[(\d+|[a-zA-Z]+)\]/
Edit:
If your regex engine supports lookahead and lookbehind:
/(?<=\[)(\d+|[a-zA-Z]+)(?=\])/
Another edit:
Try this in JavaScript:
var text = "[12][34][56][bxe]";
var array = text.match(/(\d+|[a-zA-Z]+)/g);
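One caveat if the g flag is added to the bracketed pattern instead: match() then returns the whole matches and drops the capture groups, so the brackets come back as well. A minimal sketch using String.prototype.matchAll (ES2020) to keep only the captured group; the variable names are illustrative and not part of the answer above:

```javascript
const text = "[123][456][abc]";

// With the g flag, match() returns full matches, brackets included.
const withBrackets = text.match(/\[(\d+|[a-zA-Z]+)\]/g);

// matchAll() yields one match object per hit; index 1 holds the capture group.
const values = Array.from(
  text.matchAll(/\[(\d+|[a-zA-Z]+)\]/g),
  (m) => m[1]
);

console.log(withBrackets); // the three bracketed tokens
console.log(values);       // the three bare values
```

This keeps the brackets as anchors, so stray digits or letters outside brackets are not picked up.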

This would be a lot easier if we knew the language. For example, in Javascript you can do:
"[123][456][abc]".split(/[\[\]]/);
and similarly in Python:
>>> import re
>>> re.split(r'[\[\]]', "[123][456][abc]")
['', '123', '', '456', '', 'abc', '']
I'm sure there are ways to do this in other languages, too.
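As the Python output shows, splitting on the brackets leaves empty strings between and around the values, and the JavaScript split behaves the same way. A small sketch that drops them afterwards (the names parts and values are only for illustration):

```javascript
// Splitting on "[" or "]" leaves an empty string wherever two brackets touch
// and at both ends of the input.
const parts = "[123][456][abc]".split(/[\[\]]/);

// Keep only the non-empty pieces.
const values = parts.filter((part) => part !== "");

console.log(parts.length); // 7
console.log(values);
```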

See http://www.regular-expressions.info/javascript.html, particularly the "How to Use The JavaScript RegExp Object" section:
If you want to retrieve the part of the string that was matched, call the exec() function of the RegExp object that you created, e.g.: mymatch = myregexp.exec("subject"). This function returns an array. The zeroth item in the array will hold the text that was matched by the regular expression. The following items contain the text matched by the capturing parentheses in the regexp, if any. mymatch.length indicates the length of the match[] array, which is one more than the number of capturing groups in your regular expression. mymatch.index indicates the character position in the subject string at which the regular expression matched. mymatch.input keeps a copy of the subject string.
That explains how to access individual parenthesized groups. You can use that in conjunction with a pattern like /\[(\w+)\]/g
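A rough sketch of that combination, assuming the input string from the question. With the g flag, each exec() call resumes where the previous match ended, so it can be called in a loop; the variable names follow the quoted passage where possible and are otherwise made up:

```javascript
const myregexp = /\[(\w+)\]/g;
const subject = "[123][456][abc]";
const found = [];
let mymatch;

// exec() returns null once there are no further matches, which ends the loop.
while ((mymatch = myregexp.exec(subject)) !== null) {
  // mymatch[0] is the whole match (e.g. "[123]"), mymatch[1] the first group.
  found.push({ value: mymatch[1], index: mymatch.index });
}

console.log(found);
```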

Related

How to get content using filter and match?

I want to search the array for the string I'm looking for, and to do that I'm using match:
const search_notes = array_notes.filter(notes => notes.real_content.toUpperCase().match(note.toUpperCase()));
As you can see, search_notes will give me an array with all the strings that contain at least part of the input or match it completely. But there's a problem: when I type (, ), [, ], + or any other regex symbol in the input, I get an error.
How can I solve this?
If you look at documentation for the match method (for instance, MDN's), you'll see that it accepts a RegExp object or a string, and if you give it a string, it passes that string into new RegExp. So naturally, characters that have special meaning in a regular expression need special treatment.
You don't need match, just includes, which doesn't do that:
const search_notes = array_notes.filter(
  notes => notes.real_content.toUpperCase().includes(note.toUpperCase())
);
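To make the difference concrete, here is a small sketch with made-up sample data (only the real_content field name comes from the question). The escapeRegExp helper is a hypothetical addition for cases where a regular expression really is required; it is not something the answer above proposes:

```javascript
// Hypothetical sample data for illustration.
const array_notes = [
  { real_content: "Buy milk (2 litres)" },
  { real_content: "Call Bob" },
  { real_content: "C++ notes" },
];

// Case-insensitive substring search; the input is never parsed as a regex.
function searchNotes(notes, input) {
  const needle = input.toUpperCase();
  return notes.filter((n) => n.real_content.toUpperCase().includes(needle));
}

// Only if a regex is really needed: escape the special characters first.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const byIncludes = searchNotes(array_notes, "(2");
const byRegex = array_notes.filter((n) =>
  new RegExp(escapeRegExp("c++"), "i").test(n.real_content)
);

console.log(byIncludes.map((n) => n.real_content));
console.log(byRegex.map((n) => n.real_content));
```

Passing "(2" straight to match() would throw a SyntaxError, because the string is compiled as a pattern with an unterminated group.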

Empty value if regex match not found JavaScript

I'm attempting to extract any text characters at the beginning, and the following two numbers of a string. If the string starts with a number, I'd like to get an empty string value instead so the resulting array still contains 3 values.
String:
'M2.55X.45'
Code:
'M2.55X.45'.match(/(^[a-zA-Z]+)|((\.)?\d+[\/\d. ]*|\d)/g)
Expected:
["M", "2.55", ".45"]
Actual (correct):
["M", "2.55", ".45"]
String:
'2.55X.45'
Code:
'2.55X.45'.match(/(^[a-zA-Z]+)|((\.)?\d+[\/\d. ]*|\d)/g)
Expected:
["", "2.55", ".45"]
Actual:
["2.55", ".45"]
Use /^([a-zA-Z]?)(\d*(?:\.\d+)?)[a-zA-Z](\d*(?:\.\d+)?)$/.exec("2.55X.45") instead. exec returns an array whose first element (index 0) is the entire match, so the captured values start at index 1: match[1] is the 1st value, match[2] the 2nd, and match[3] the 3rd.
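A short sketch of how that pattern could be wrapped to return exactly the three values the question asks for. The function name parse and the null return for non-matching input are assumptions made for illustration:

```javascript
const pattern = /^([a-zA-Z]?)(\d*(?:\.\d+)?)[a-zA-Z](\d*(?:\.\d+)?)$/;

// Returns the three captured values, or null when the input does not match.
function parse(str) {
  const match = pattern.exec(str);
  // slice(1) drops the full match at index 0 and keeps only the groups.
  return match === null ? null : match.slice(1);
}

const withPrefix = parse("M2.55X.45");
const withoutPrefix = parse("2.55X.45");

console.log(withPrefix);
console.log(withoutPrefix); // first element is "" because the optional letter is absent
```

Because the ? sits inside the first group, the group always participates in the match and yields an empty string rather than undefined.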
Your current regex uses an alternate clause (|), which creates different types of grouping depending on which alternate is matched.
Here's an example (cleaned up a bit) that creates explicit groups and makes the individual groups optional.
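// Group 1 is ([a-zA-Z]*) rather than ([a-zA-Z]*)? so that a missing prefix is captured as "" and not undefined.
const regex = /^([a-zA-Z]*)(\d*(?:\.\d+)?)([a-zA-Z]+)(\d*(?:\.\d+)?)$/
console.log(regex.exec("2.55X.45"))  // groups 1-4: "", "2.55", "X", ".45"
console.log(regex.exec("M2.55X.45")) // groups 1-4: "M", "2.55", "X", ".45"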
Note that I've removed the g flag, so the regex's state isn't preserved.
I've also used exec instead of match to not discard capture groups.
You can try this pattern
(\D*)(\d+(?:\.\d+))\D+(\.\d+)
let finder = (str) => {
  return (str.match(/^(\D*)(\d+(?:\.\d+))\D+(\.\d+)/) || []).slice(1)
}
console.log(finder('M2.55X.45'))
console.log(finder("2.55X.45"))

filter an array based on regex expression [duplicate]

I'm writing a small JavaScript method that receives a list of points, and I have to read those points to create a polygon in a Google map.
I receive those points in the form:
(lat, long), (lat, long),(lat, long)
So I've written the following regex:
\(\s*([0-9.-]+)\s*,\s([0-9.-]+)\s*\)
I've tested it with RegexPal and the exact data I receive:
(25.774252, -80.190262),(18.466465, -66.118292),(32.321384, -64.75737),(25.774252, -80.190262)
and it works, so why do I get null as the result when I use this code in my JavaScript?
var polygons="(25.774252, -80.190262),(18.466465, -66.118292),(32.321384, -64.75737),(25.774252, -80.190262)";
var reg = new RegExp("/\(\s*([0-9.-]+)\s*,\s([0-9.-]+)\s*\)/g");
var result = polygons.match(reg);
I get no JavaScript errors when executing (in Google Chrome's debug mode). This code lives in a function in an included JS file, and that function is called from the OnLoad handler.
I've searched a lot, but I can't find why this isn't working. Thank you very much!
Use a regex literal [MDN]:
var reg = /\(\s*([0-9.-]+)\s*,\s([0-9.-]+)\s*\)/g;
You are making two errors when you use RegExp [MDN]:
The "delimiters" / should not be part of the expression
If you define an expression as a string, you have to escape the backslash, because it is the escape character in strings
Furthermore, modifiers are passed as the second argument to the constructor.
So if you wanted to use RegExp (which you don't have to in this case), the equivalent would be:
var reg = new RegExp("\\(\\s*([0-9.-]+)\\s*,\\s([0-9.-]+)\\s*\\)", "g");
(and I think now you see why regex literals are more convenient)
I always find it helpful to copy and paste the expression into the console and look at its output. Taking your original string, we get:
/(s*([0-9.-]+)s*,s([0-9.-]+)s*)/g
which means that the expression tries to match /, s and g literally, and the parens () are still treated as special characters.
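The effect can be checked directly. The following sketch compares the question's string, the escaped constructor form, and the literal; the variable names and the short test input "(1.5, -2)" are made up for illustration:

```javascript
// The string literal drops the backslashes before the regex engine sees them.
const asString = "/\(\s*([0-9.-]+)\s*,\s([0-9.-]+)\s*\)/g";
console.log(asString); // /(s*([0-9.-]+)s*,s([0-9.-]+)s*)/g

const broken = new RegExp(asString);
const fixed = new RegExp("\\(\\s*([0-9.-]+)\\s*,\\s([0-9.-]+)\\s*\\)", "g");
const literal = /\(\s*([0-9.-]+)\s*,\s([0-9.-]+)\s*\)/g;

// The escaped string and the literal compile to the same pattern.
console.log(fixed.source === literal.source);

const sample = "(1.5, -2)";
console.log(sample.match(broken)); // null: the pattern looks for a literal "/"
console.log(sample.match(fixed));
```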
Update: .match() returns an array:
["(25.774252, -80.190262)", "(18.466465, -66.118292)", ... ]
which does not seem to be very useful.
You have to use .exec() [MDN] to extract the numbers:
["(25.774252, -80.190262)", "25.774252", "-80.190262"]
This has to be called repeatedly until the whole string has been processed.
Example:
var reg = /\(\s*([0-9.-]+)\s*,\s([0-9.-]+)\s*\)/g;
var result, points = [];
while ((result = reg.exec(polygons)) !== null) {
  points.push([+result[1], +result[2]]);
}
This creates an array of arrays and the unary plus (+) will convert the strings into numbers:
[
  [25.774252, -80.190262],
  [18.466465, -66.118292],
  ...
]
Of course if you want the values as strings and not as numbers, you can just omit the +.
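In engines that support String.prototype.matchAll (ES2020), the same extraction can be written without the explicit loop. This is an alternative sketch, not what the answer above uses; Number() plays the role of the unary plus:

```javascript
const polygons = "(25.774252, -80.190262),(18.466465, -66.118292),(32.321384, -64.75737),(25.774252, -80.190262)";
const reg = /\(\s*([0-9.-]+)\s*,\s([0-9.-]+)\s*\)/g;

// matchAll() requires the g flag and yields one match object per point;
// m[1] and m[2] are the two captured coordinates.
const points = Array.from(polygons.matchAll(reg), (m) => [
  Number(m[1]),
  Number(m[2]),
]);

console.log(points);
```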

Extracting a complicated part of the string with plain Javascript

I have the following string:
Text
Using JavaScript, I want to extract 'pl' or 'pl_company_com' from this string.
There are a few variables:
jan_kowalski is a first name and surname; it can change, and sometimes it even has 3 elements
the country code (in this example 'pl') will change to others such as en / de / fr (this is the part of the string I want to get)
the rest of the string remains the same in every case (the beginning, plus everything from _company_com onward)
P.S. I tried to do it with split, but my knowledge of JS is very basic and I can't get what I want. Please help.
An alternative to Randy Casburn's solution using regex
let out = new URL('https://my.domain.com/personal/jan_kowalski_pl_company_com/Documents/Forms/All.aspx').href.match('.*_(.*_company_com)')[1];
console.log(out);
Or if you want to just get that string with those country codes you specified
let out = new URL('https://my.domain.com/personal/jan_kowalski_pl_company_com/Documents/Forms/All.aspx').href.match('.*_((en|de|fr|pl)_company_com)')[1];
console.log(out);
A proof of concept that this solution also works for other combinations
let urls = [
  new URL('https://my.domain.com/personal/jan_kowalski_pl_company_com/Documents/Forms/All.aspx'),
  new URL('https://my.domain.com/personal/firstname_middlename_lastname_pl_company_com/Documents/Forms/All.aspx')
]
urls.forEach(url => console.log(url.href.match('.*_(en|de|fr|pl).*')[1]))
I have had good results solving this kind of problem with regular expressions:
var string = 'Text';
var regExp = /([\w]{2})_company_com/;
var find = string.match(regExp);
console.log(find); // array with found matches
console.log(find[1]); // first group of regexp = country code
First you have the given string. Second you have a regular expression, which is delimited by a slash at the beginning and at the end. Regular expressions are mostly used for string searches (you can even use them to replace complicated text in all major editors, which can be VERY useful).
Here it matches exactly two word characters, [\w]{2}, followed directly by _company_com. \w stands for a word character (a letter, digit, or underscore), the [] groups all wanted character types (here only word characters), and the {} gives the number of characters to match. To find the wanted part, call string.match(regExp). It returns an array with the whole matched string followed by all capture groups of the regExp (which are denoted by ()), or null if nothing matches. So in this case you get the country code with find[1], which is the first and only capture group of the regular expression.
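A slightly stricter variant, sketched under the assumption that the input looks like the URL used in the other answer and that country codes are two lowercase letters. The function name countryCode and the null return for non-matching input are illustrative choices:

```javascript
// Assumed input: the URL from the other answer above.
const href =
  "https://my.domain.com/personal/jan_kowalski_pl_company_com/Documents/Forms/All.aspx";

// [a-z]{2} is narrower than [\w]{2}: it rejects digits and underscores.
// The leading "_" ties the code to the end of the name part.
function countryCode(url) {
  const match = url.match(/_([a-z]{2})_company_com/);
  return match === null ? null : match[1];
}

console.log(countryCode(href));
console.log(countryCode("https://my.domain.com/personal/unrelated")); // null
```

Checking for null before indexing avoids a TypeError when the string has a different shape.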

Replace .split() with .match() using regex in javascript

I'm having difficulties with constructing some regular expressions using Javascript.
What I need:
I have a string like: Woman|{Man|Boy} or {Girl|Woman}|Man or Woman|Man etc.
I need to split this string by '|' separator, but I don't want it to be split inside curly brackets.
Examples of strings and desired results:
// Example 1
string: 'Woman|{Man|Boy}'
result: [0] = 'Woman', [1] = '{Man|Boy}'
// Example 2
string: '{Woman|Girl}|{Man|Boy}'
result: [0] = '{Woman|Girl}', [1] = '{Man|Boy}'
I can't change "|" symbol to another inside the brackets because the given strings are the result of a recursive function. For example, the original string could be
'Nature|Computers|{{Girls|Women}|{Boys|Men}}'
try this:
var reg=/\|(?![^{}]+})/g;
Example results:
var a = 'Woman|{Man|Boy}';
var b = '{Woman|Girl}|{Man|Boy}';
a.split(reg)
["Woman", "{Man|Boy}"]
b.split(reg)
["{Woman|Girl}", "{Man|Boy}"]
For your other question:
"Now I have another, but a bit similar problem. I need to parse all containers from the string. Syntax of the each container is {sometrash}. The problem is that container can contain another containers, but I need to parse only "the most relative" container. mystring.match(/\{+.+?\}+/gi); which I use doesn't work correctly. Could you correct this regex, please? "
you can use this regex:
var reg=/\{[^{}]+\}/g;
Example results:
var a = 'Nature|Computers|{{Girls|Women}|{Boys|Men}}';
a.match(reg)
["{Girls|Women}", "{Boys|Men}"]
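You can use
.match(/\{[^}]*\}|[^|]+/g)
to match those (the braced alternative has to come first; otherwise [^|]+ would consume the opening brace and break the group apart at the inner |). However, if you have nesting of arbitrary depth, you'll need a parser; JavaScript regex isn't capable of that.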
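For the arbitrarily nested case, a parser does not have to be elaborate. A minimal sketch that tracks brace depth and splits only at depth 0; the function name splitTopLevel is made up, and the code assumes the braces in the input are balanced:

```javascript
// Splits on "|" only when outside all braces.
function splitTopLevel(input) {
  const parts = [];
  let depth = 0;
  let current = "";
  for (const ch of input) {
    if (ch === "{") depth += 1;
    if (ch === "}") depth -= 1;
    if (ch === "|" && depth === 0) {
      // A top-level separator ends the current part.
      parts.push(current);
      current = "";
    } else {
      current += ch;
    }
  }
  parts.push(current);
  return parts;
}

const nested = splitTopLevel("Nature|Computers|{{Girls|Women}|{Boys|Men}}");
const flat = splitTopLevel("Woman|{Man|Boy}");

console.log(nested);
console.log(flat);
```

Since the question mentions a recursive function, the same helper could be applied again to the inside of each braced part.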
Test this:
([a-zA-Z0-9]*\|[a-zA-Z0-9]*)|{[a-zA-Z0-9]*\|[a-zA-Z0-9]*}
