regex help in extracting values from a string


I have a string in JavaScript like
"some text #[14cd3:+Seldum Kype] things are going good for #[7f8ef3:+Kerry Williams] so its ok"
From this I want to extract the name and id for the two people, so that I get data like:
[ { id: 14cd3, name : Seldum Kype},
{ id: 7f8ef3, name : Kerry Williams} ]
How can I use a regex to extract this?
Please help.

var text = "some text #[14cd3:+Seldum Kype] things are going " +
"good for #[7f8ef3:+Kerry Williams] so its ok"
var data = text.match(/#\[.+?\]/g).map(function(m) {
var match = m.substring(2, m.length - 1).split(':+');
return {id: match[0], name: match[1]};
})
// => [ { id: '14cd3', name: 'Seldum Kype' },
// { id: '7f8ef3', name: 'Kerry Williams' } ]
// For demo
document.getElementById('output').innerText = JSON.stringify(data);
<pre id="output"></pre>

Use the pattern below; the id is in capture group 1 and the name is in capture group 2.
#\[([a-z\d]+):\+([^\[\]]+)\]
Explanation:
# Matches a literal # symbol.
\[ Matches a literal [ symbol.
([a-z\d]+) Captures one or more lowercase letters or digits.
:\+ Matches :+ literally.
([^\[\]]+) Captures one or more characters that are neither [ nor ].
\] Matches a literal ] symbol.
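As a minimal sketch of how the two capture groups above could be turned into the requested array, the pattern can be run with the g flag through String.prototype.matchAll. The variable names (pattern, people) are chosen for this illustration only.

```javascript
const text = "some text #[14cd3:+Seldum Kype] things are going " +
  "good for #[7f8ef3:+Kerry Williams] so its ok";

// Same pattern as above, with the g flag so matchAll can find every occurrence.
const pattern = /#\[([a-z\d]+):\+([^\[\]]+)\]/g;

// m[1] is the id (group 1), m[2] is the name (group 2).
const people = Array.from(text.matchAll(pattern), (m) => ({ id: m[1], name: m[2] }));

console.log(people);
```

If the string contains no tags, matchAll yields nothing and people is an empty array, so no null check is needed.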

Try the following, the key is to properly escape reserved special symbols:
#\[([\d\w]+):\+([\s\w]+)\]

Related

Regex - Extract the first word from a string

I want to parse the text below:
Recipient Name: Tracy Chan SKU: 103990
I want to extract "Tracy" only, the first word after "Recipient Name:" as the first name
So I got as far as /(?<=Recipient Name: )(.*)(?= SKU)/gm, but it only gives me "Tracy Chan". I am using the ECMAScript option in Regex101.
Appreciate any help on this.
Thanks, Tracy

Use \S+ to match a sequence of non-whitespace characters, instead of .*, to get one word.
let text = 'Recipient Name: Tracy Chan SKU: 103990';
let match = text.match(/(?<=Recipient Name: )\S+/);
console.log(match[0]);

To extract "Tracy" only, you can use the following regular expression:
/(?<=Recipient Name: )(\S+)/gm
This will match the first word (i.e., the first sequence of non-whitespace characters) after the "Recipient Name:" string.
The \S character class matches any non-whitespace character, and the + quantifier specifies that the preceding pattern should be matched one or more times until the first whitespace.
A working example:
const input = "Recipient Name: Tracy Chan SKU: 103990";
const regex = /(?<=Recipient Name: )(\S+)/gm;
const matches = regex.exec(input);
console.log(matches[0]); // "Tracy"
Update: based on your comments below, you also need to extract the last name from your string value.
I would suggest either using the original regex from your question or the one below, which matches both Tracy and Chan; you can then use the JavaScript split method to split the matched string into an array of the extracted names.
Consider the following example:
const input = "Recipient Name: Tracy Chan SKU: 103990";
const regex = /(?<=Recipient Name: )([^ ]+)\s([^ ]+)/gm;
const allMatches = input.match(regex);
let resultArray = allMatches[0].split(' ');
console.log('firstName: '+ resultArray[0]); // "Tracy"
console.log('lastName: '+ resultArray[1]); // "Chan"

(?<=Recipient Name: )([^ ]+) .*(?= SKU)
This works: the whole match is "Tracy Chan", and capture group 1 holds the first name, "Tracy".
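For environments where lookbehind is not available, the same extraction could be sketched with capture groups only. This is an illustration under the assumption that the input always has the shape "Recipient Name: <first> <rest> SKU: ..."; the group names first and last are made up for this example.

```javascript
const input = "Recipient Name: Tracy Chan SKU: 103990";

// No lookbehind: the label is matched literally and the names are captured.
// `first` takes the first run of non-whitespace; `last` lazily takes the rest up to " SKU:".
const m = input.match(/Recipient Name:\s+(?<first>\S+)\s+(?<last>.*?)\s+SKU:/);

if (m) {
  console.log("firstName: " + m.groups.first);
  console.log("lastName: " + m.groups.last);
}
```

The if (m) guard matters because match returns null when the label is missing.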

How to split a string based on a regex pattern with conditions (JavaScript)

I am trying to split a string so that I can separate it depending on a pattern. I'm having trouble getting the correct regex pattern to do so. I also need to insert the results into an array of objects. Perhaps by using a regex pattern, the string can be split into a resulting array object to achieve the objective. Note that the regex pattern must not discriminate between - or --. Or is there any better way to do this?
I tried using string split() method, but to no avail. I am trying to achieve the result below:
const example1 = `--filename test_layer_123.png`;
const example2 = `--code 1 --level critical -info "This is some info"`;
const result1 = [{ name: "--filename", value: "test_layer_123.png" }];
const result2 = [
{ name: "--code", value: "1" },
{ name: "--level", value: "critical" },
{ name: "-info", value: "This is some info" },
];

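If you really want to use a regex to solve this, try the pattern /((?:--|-)\w+)\s+"?([^-"]+)"?/g
Because [^-"]+ runs up to the next hyphen, an unquoted value is captured together with the space that follows it, so the value is trimmed below. For the same reason, a value that itself contains a hyphen is cut short.
Code example (the same pattern with named groups):
function matchAllCommands(text, pattern){
let new_array = [];
let matches = text.matchAll(pattern);
for (const match of matches){
// trim() drops the trailing space captured after unquoted values such as "1 "
new_array.push({name: match.groups.name, value: match.groups.value.trim()});
}
return new_array;
}
let RegexPattern = /(?<name>(?:--|-)\w+)\s+"?(?<value>[^-"]+)"?/g;
let text = '--code 1 --level critical -info "This is some info"';
console.log(matchAllCommands(text, RegexPattern));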

Here is a solution that splits the argument string using a positive lookahead, and creates the array of key & value pairs using a map:
function getArgs(str) {
return str.split(/(?= --?\w+ )/).map(str => {
let m = str.match(/^ ?([^ ]+) (.*)$/);
return {
name: m[1],
value: m[2].replace(/^"(.*)"$/, '$1')
};
});
}
[
'--filename test_layer_123.png', // example1
'--code 1 --level critical -info "This is some info"' // example2
].forEach(str => {
var result = getArgs(str);
console.log(JSON.stringify(result, null, ' '));
});
Positive lookahead regex for split (each token is quoted here so that the literal spaces are visible):
"(?=" -- positive lookahead start
" --?\w+ " -- expect a space, 1 or 2 dashes, 1+ word chars, a space
")" -- positive lookahead end
Match regex in map (tokens quoted in the same way):
"^" -- anchor at start of string
" ?" -- optional space
"([^ ]+)" -- capture group 1: capture everything up to the next space
" " -- a single space
"(.*)" -- capture group 2: capture everything that's left
"$" -- anchor at end of string

How to get specific text from a string in regex

I have a string from which I need to extract specific texts
let str = 'id = "Test This is" id ="second" abc 123 id ="third-123"';
let res = str.match(/[^id ="\[](.*)[^\]]/g);
console.log(res);
I want the texts in ids only ['Test This is','second','third-123']
But I am getting [ 'Test This is" id ="second" abc 123 id ="third-123"' ]
That is the whole text after the first id, which I don't want. I need help with the pattern.

Your pattern uses a negated character class, which excludes each of the listed individual characters, and it also excludes [ and ], which are not present in the example data.
That way you match the first char T in the string with [^id ="\[] and the last char " in the string with [^\]], and the greedy .* captures everything in between.
I would suggest using a negated character class to exclude matching the " instead:
\bid\s*=\s*"([^"]*)"
let str = 'id = "Test This is" id ="second" abc 123 id ="third-123"';
let res = Array.from(str.matchAll(/\bid\s*=\s*"([^"]*)"/g), m => m[1]);
console.log(res); // [ 'Test This is', 'second', 'third-123' ]

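You can simplify this down to a non-greedy regular expression independent of where the quotes fall in the string. Note that each match includes its surrounding quotes, and that any quoted text is matched, not only text that follows id =: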
let str = 'id = "Test This is" id ="second" abc 123 id ="third-123"';
let res = str.match(/".*?"/g);
console.log(res);
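If the surrounding quotes are not wanted in the result, one possible variation of the non-greedy approach is to capture the text between the quotes and keep only the group. This is a sketch, not part of the original answer; like the version above it picks up any quoted text in the string.

```javascript
const str = 'id = "Test This is" id ="second" abc 123 id ="third-123"';

// Capture group 1 holds the text between each pair of double quotes.
const values = Array.from(str.matchAll(/"(.*?)"/g), (m) => m[1]);

console.log(values);
```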

Regex to match all words but the one beginning and ending with special chars

I'm struggling with a regex for Javascript.
Here is a string from which I want to match all words but the one prefixed by \+\s and suffixed by \s\+ :
this-is my + crappy + example
The regex should match :
this-is my + crappy + example
match 1: this-is
match 2: my
match 3: example

You can use the alternation operator in context: place what you want to exclude on the left (in effect saying "throw this away, it's garbage") and what you want to match in a capturing group on the right.
\+[^+]+\+|([\w-]+)
Example:
var re = /\+[^+]+\+|([\w-]+)/g,
s = "this-is my + crappy + example",
match,
results = [];
while (match = re.exec(s)) {
results.push(match[1]);
}
console.log(results.filter(Boolean)) //=> [ 'this-is', 'my', 'example' ]
Alternatively, you could replace between the + characters and then match your words.
var s = 'this-is my + crappy + example',
r = s.replace(/\+[^+]*\+/g, '').match(/[\w-]+/g)
console.log(r) //=> [ 'this-is', 'my', 'example' ]

To get the desired output, take the matched group at index 1.
([\w-]+)|\+\s\w+\s\+
MATCH 1 this-is
MATCH 2 my
MATCH 3 example
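A short sketch of how this pattern might be applied in JavaScript: iterate over all matches and keep only those where group 1 is set, since the "+ word +" alternative matches without filling the group. The variable names are illustrative.

```javascript
const s = "this-is my + crappy + example";
const re = /([\w-]+)|\+\s\w+\s\+/g;

const words = [];
for (const m of s.matchAll(re)) {
  // Group 1 is undefined when the "+ word +" alternative matched.
  if (m[1] !== undefined) {
    words.push(m[1]);
  }
}

console.log(words);
```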

Tokenizing strings using regular expression in Javascript

Suppose I've a long string containing newlines and tabs as:
var x = "This is a long string.\n\t This is another one on next line.";
So how can we split this string into tokens, using regular expression?
I don't want to use .split(' ') because I want to learn Javascript's Regex.
A more complicated string could be this:
var y = "This #is a #long $string. Alright, lets split this.";
Now I want to extract only the valid words out of this string, without special characters, and punctuation, i.e I want these:
var xwords = ["This", "is", "a", "long", "string", "This", "is", "another", "one", "on", "next", "line"];
var ywords = ["This", "is", "a", "long", "string", "Alright", "lets", "split", "this"];

Here is a jsfiddle example of what you asked: http://jsfiddle.net/ayezutov/BjXw5/1/
Basically, the code is very simple:
var y = "This #is a #long $string. Alright, lets split this.";
var regex = /[^\s]+/g; // one or more non-whitespace characters; the g flag finds every match, not only the first
var match = y.match(regex);
for (var i = 0; i<match.length; i++)
{
document.write(match[i]);
document.write('<br>');
}
UPDATE:
Basically you can expand the list of separator characters: http://jsfiddle.net/ayezutov/BjXw5/2/
var regex = /[^\s\.,!?]+/g;
UPDATE 2:
Word characters only (\w matches letters, digits, and the underscore):
http://jsfiddle.net/ayezutov/BjXw5/3/
var regex = /\w+/g;

Use \s+ to tokenize the string.
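A minimal sketch of that suggestion, using the first string from the question. Splitting on \s+ treats any run of spaces, tabs, and newlines as one separator; the trim() call is an addition here to avoid empty tokens at the ends.

```javascript
const x = "This is a long string.\n\t This is another one on next line.";

// Any run of whitespace (space, tab, newline) acts as a single separator.
const tokens = x.trim().split(/\s+/);

console.log(tokens);
```

Punctuation stays attached to the tokens with this approach (for example "string."), so it only covers the whitespace part of the question.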

exec can loop through the matches to remove non-word (\W) characters.
var A= [], str= "This #is a #long $string. Alright, let's split this.",
rx=/\W*([a-zA-Z][a-zA-Z']*)(\W+|$)/g, words;
while((words= rx.exec(str))!= null){
A.push(words[1]);
}
A.join(', ')
/* returned value: (String)
This, is, a, long, string, Alright, let's, split, this
*/

var words = y.split(/[^A-Za-z0-9]+/);
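One detail to watch with the split approach: when the string starts or ends with a separator character, split returns an empty string at that end. A possible way to handle it, sketched here with the y string from the question, is to filter out empty entries.

```javascript
const y = "This #is a #long $string. Alright, lets split this.";

// The trailing "." is a separator, so the raw split ends with an empty string.
const raw = y.split(/[^A-Za-z0-9]+/);

// filter(Boolean) removes empty strings from the result.
const words = raw.filter(Boolean);

console.log(words);
```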

Here is a solution using regex groups to tokenise the text using different types of tokens.
You can test the code here https://jsfiddle.net/u3mvca6q/5/
/*
Basic Regex explanation:
/ Regex start
(\w+) First group, words: \w means an ASCII word character (letter, digit, or underscore), + means one or more
| or
(,|!) Second group, punctuation
| or
(\s) Third group, white spaces
/ Regex end
g "global", enables looping over the string to capture one element at a time
Regex result:
result[0] : default group : any match
result[1] : group1 : words
result[2] : group2 : punctuation , !
result[3] : group3 : whitespace
*/
var basicRegex = /(\w+)|(,|!)|(\s)/g;
/*
Advanced Regex explanation:
[a-zA-Z\u0080-\u00FF] instead of \w Supports some Unicode letters instead of ASCII letters only. Find Unicode ranges here https://apps.timwhitlock.info/js/regex
(\.\.\.|\.|,|!|\?) Identify ellipsis (...) and points as separate entities
You can improve it by adding ranges for special punctuation and so on
*/
var advancedRegex = /([a-zA-Z\u0080-\u00FF]+)|(\.\.\.|\.|,|!|\?)|(\s)/g;
var basicString = "Hello, this is a random message!";
var advancedString = "Et en français ? Avec des caractères spéciaux ... With one point at the end.";
console.log("------------------");
var result = null;
do {
result = basicRegex.exec(basicString)
console.log(result);
} while(result != null)
console.log("------------------");
var result = null;
do {
result = advancedRegex.exec(advancedString)
console.log(result);
} while(result != null)
/*
Output:
Array [ "Hello", "Hello", undefined, undefined ]
Array [ ",", undefined, ",", undefined ]
Array [ " ", undefined, undefined, " " ]
Array [ "this", "this", undefined, undefined ]
Array [ " ", undefined, undefined, " " ]
Array [ "is", "is", undefined, undefined ]
Array [ " ", undefined, undefined, " " ]
Array [ "a", "a", undefined, undefined ]
Array [ " ", undefined, undefined, " " ]
Array [ "random", "random", undefined, undefined ]
Array [ " ", undefined, undefined, " " ]
Array [ "message", "message", undefined, undefined ]
Array [ "!", undefined, "!", undefined ]
null
*/

To extract word-only characters, we use the \w symbol. Whether this matches Unicode characters is implementation-dependent, so check the documentation for your language or library. In JavaScript, \w matches only ASCII letters, digits, and the underscore.
Please see Alexander Yezutov's answer (update 2) for how to apply this in an expression.
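When accented or other non-ASCII letters need to count as word characters, one option in JavaScript engines that support Unicode property escapes (ES2018 and later) is \p{L} together with the u flag. The sample sentence is adapted from the answer above; the normalize("NFC") call is an extra precaution added for this sketch so that accented letters are single code points.

```javascript
const text = "Et en français ? Avec des caractères spéciaux ...";

// \p{L} matches any Unicode letter; the u flag is required for property escapes.
const words = text.normalize("NFC").match(/\p{L}+/gu);

console.log(words);
```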
