Regex match text between groups match - javascript

I'm writing a regex to split text into key: value pairs. The input looks like this:
QA~BlaBlaBlaWE~1235123FA~blablablaER~blabla123ZX~2342blaaa
I have been able to separate it, but I want the key in Group 3 and the value in Group 4.
Instead, for the first pair (QA~BlaBlaBla) the key (QA) ends up in Group 2 and the value (BlaBlaBla) in Group 3.
My regex is this:
((\w{2}~)?(.*?)(\w{2}~|$))
The goal is to produce a list like this:
> Key Value
> QA BlaBlaBla
> WE 1235123
> FA blablabla
> ER blabla123
> ZX 2342blaaa
Here is the example:
https://regex101.com/r/Xh8RAA/1
I can't get the regex to put every pair in Group 3 and Group 4. Can someone help me?

What you're looking for is lookahead, which will check that the current position is followed by some pattern without consuming characters in the pattern. You can also remove the unnecessary capturing group enclosing the whole regex, so you can get group 1 to contain the key, and group 2 to contain the value, without any other groups. Also, because the keys are required, the key group shouldn't be optional:
(\w{2})~(.*?)(?=\w{2}~|$)
https://regex101.com/r/Xh8RAA/6
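As a rough sketch of how the lookahead pattern above could be used from JavaScript, the snippet below collects each key and value with matchAll. The function name parsePairs and the { key, value } shape are illustrative assumptions, not part of the original answer.

```javascript
// Hypothetical helper: returns an array of { key, value } objects.
function parsePairs(input) {
  // Group 1 is the two-character key, group 2 the value. The lookahead
  // stops the lazy value just before the next "XX~" key (or at the end
  // of the input) without consuming that key.
  const pattern = /(\w{2})~(.*?)(?=\w{2}~|$)/g;
  const pairs = [];
  for (const match of input.matchAll(pattern)) {
    pairs.push({ key: match[1], value: match[2] });
  }
  return pairs;
}

const pairs = parsePairs("QA~BlaBlaBlaWE~1235123FA~blablablaER~blabla123ZX~2342blaaa");
for (const { key, value } of pairs) {
  console.log(key, value);
}
```

Note that a value which itself contains two word characters followed by ~ would be cut short at that point, since the pattern cannot tell it apart from the next key.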

You can use a positive lookahead pattern to avoid consuming the next header token:
([A-Z]{2})~(.*?)(?=[A-Z]{2}~|$)
Substitute the match with group 1 and group 2 followed by a newline and you will get the desired output.
Demo: https://regex101.com/r/Xh8RAA/2
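In JavaScript, the substitution described above might look like the following sketch, where $1 and $2 in the replacement string refer to the two capture groups. The variable names are only illustrative.

```javascript
const input = "QA~BlaBlaBlaWE~1235123FA~blablablaER~blabla123ZX~2342blaaa";

// Replace every "KEY~value" match with "KEY value" followed by a newline.
const listing = input.replace(/([A-Z]{2})~(.*?)(?=[A-Z]{2}~|$)/g, "$1 $2\n");

console.log(listing);
```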

You can try this Regular Expression:
/.{2}~[^~]+((?=..~)|$)/g
check its result below:
console.log("QA~BlaBlaBlaWE~1235123FA~blablablaER~blabla123ZX~2342blaaa".match(/.{2}~[^~]+((?=..~)|$)/g));
With the code below you can get it as object (key/val):
function CustomSplit(s) {
  var r = {};
  // Each match looks like "QA~BlaBlaBla"; split it into key and value.
  s.match(/(.{2})~([^~]+((?=..~)|$))/g).forEach(function (a) {
    a = a.split("~");
    r[a[0]] = a[1];
  });
  return r;
}
console.log(CustomSplit("QA~BlaBlaBlaWE~1235123FA~blablablaER~blabla123ZX~2342blaaa"));

Related

Javascript regex pattern match multiple strings ( AND, OR, NEAR/n, P/n )

I need to filter a collection of strings based on a rather complex query
I have query input as a string
var query1 ='Abbott near/10 (assay* OR test* ) AND BLOOD near/10 (Point P/1 Care)';
From this query INPUT string I want to collect just the important words:
var words= 'Abbott assay* test* BLOOD Point care';
The query can change for example:
var query2='(assay* OR test* OR analy* OR array) OR (Abbott p/1 Point P/1 Care)';
From this query I need to collect:
var words='assay* test* analy* array Abbott Point Care';
I'm looking for your suggestion.
Thanks.
You may just use | in your regex to capture the words and/or special characters that you want to remove:
([()]|AND|OR|(NEAR|P)\/\d+) ?
DEMO: https://regex101.com/r/rqpmXr/2
Note the /gi in the regex options, with i meaning that it's case insensitive.
EXPLANATION:
([()]|AND|OR|(NEAR|P)\/\d+) - This is a capture group containing all the words you specified in your title, plus the parentheses.
(NEAR|P)\/\d+ - Just to clear out this part, \d+ means that one or more digits are following the words NEAR or P.
 ? - This captures the possible trailing space after the captured word.
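A possible way to apply this idea in JavaScript is sketched below. It is a variation on the pattern above rather than the exact regex: the word boundaries (\b) are an added assumption so that AND or OR inside a longer word is left alone, and each operator is replaced with a space instead of an empty string so that neighbours such as "array) OR (Abbott" do not fuse into one word. The function name extractWords is hypothetical.

```javascript
// Hypothetical helper: strips operators and parentheses, keeps the search terms.
function extractWords(query) {
  // Parentheses, AND/OR as whole words, and proximity operators like NEAR/10 or P/1.
  const noise = /[()]|\b(?:AND|OR)\b|\b(?:NEAR|P)\/\d+/gi;
  return query
    .replace(noise, " ") // blank out every operator
    .trim()
    .split(/\s+/)        // split on runs of whitespace
    .join(" ");          // and rejoin with single spaces
}

const query1 = "Abbott near/10 (assay* OR test* ) AND BLOOD near/10 (Point P/1 Care)";
const query2 = "(assay* OR test* OR analy* OR array) OR (Abbott p/1 Point P/1 Care)";

console.log(extractWords(query1));
console.log(extractWords(query2));
```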

Test if a sentence is matching a text declaration using regex

I want to test if a sentence like type var1,var2,var3 is matching a text declaration or not.
So, I used the following code :
var text = "int a1,a2,a3",
reg = /int ((([a-z_A-Z]+[0-9]*),)+)$/g;
if (reg.test(text)) console.log(true);
else console.log(false)
The problem is that this regular expression returns false on text that is supposed to be true.
Could someone help me find a good regular expression matching expressions as in the example above?
You have a couple of mistakes.
As written, your regex requires a comma at the end of the line.
I suppose you also want to match int abc123 as a correct string, so letters need to be allowed alongside digits after the first character.
Avoid using capturing groups when you are only testing strings.
const str = 'int a1,a2,a3';
const regex = /int (?:[a-zA-Z_](?:[a-zA-Z0-9_])*(?:\,|$))+/g
console.log(regex.test(str));
You will need to add ? after the comma ,.
The ? quantifier matches zero or one occurrence of the preceding token.
Notice that the last name in your text, a3, has no , after it.
int ((([a-z_A-Z]+[0-9]*),?)+)$
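Putting the two answers together, a stricter check could look like the sketch below. This is one possible variant, not either answer's exact regex: it is anchored at both ends and only allows a comma between names, so a trailing comma is rejected. It also omits the g flag, because a global regex keeps its lastIndex between calls to test(), which can give surprising results when the same regex object is reused. The function name isIntDeclaration is hypothetical.

```javascript
// Hypothetical helper: true for "int a1,a2,a3" style declarations.
function isIntDeclaration(text) {
  // An identifier: a letter or underscore, then letters, digits or underscores.
  const name = "[a-zA-Z_][a-zA-Z0-9_]*";
  // "int ", one identifier, then any number of ",identifier" repetitions.
  const pattern = new RegExp("^int " + name + "(?:," + name + ")*$");
  return pattern.test(text);
}

console.log(isIntDeclaration("int a1,a2,a3"));
console.log(isIntDeclaration("int a1,a2,a3,"));
console.log(isIntDeclaration("int abc123"));
```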

Get string between tags when multiple tags present

Just trying to figure this one out as regex is nowhere near my strong point :(
Basically I'm trying to get the value between bbcode tags: That could look like either of the following:
[center]text[/center]
[left][center]text[/center][/left]
[right][left][center]text[/center][/left][/right]
And currently have this hideous if else block of code to prevent it getting large like the third option above.
if (/\[left\]|\[\/left\]/.test(text[2])) {
// set the value in the [left][/left] tags
text[2] = text[2].match(/\[left\](.*?)\[\/left\]/)[1];
} else if (/\[right\]|\[\/right\]/.test(text[2])) {
// set value in the [right][/right] tags
text[2] = text[2].match(/\[right\](.*?)\[\/right\]/)[1];
} else if (/\[center\]|\[\/center\]/.test(text[2])) {
// set value in the [center][/center] tags
text[2] = text[2].match(/\[center\](.*?)\[\/center\]/)[1];
}
What I'd like to do is shorten it down to a single regex expression to grab that value text from the above examples, I've gotten down to an expression like this:
/\[(?:center|left|right)\](.*?)\[\/(?:center|left|right)\]/
But as you can see in this RegExr demo, it doesn't match what I need it to.
How can I achieve this?
Note
It should only match left|right|center as the selected text could also have various other bbcode tags.
If the string looks like this:
[center][left][img]/link/to/img.png[/img][/left][/center]
I want to get what is between the left|center|right tags which in this case would be:
[img]/link/to/img.png[/img]
More examples:
[center][url=lintosomething.com]LINK TEXT[/url][/center]
Should only get: [url=lintosomething.com]LINK TEXT[/url]
Or
[center]egibibskdfbgfdkfbg sd fgkgb fkgbgk fhwo3g regbiurb geir so go to [url=lintosomething.com]LINK TEXT[/url] and ibgri gbenkenbieurgnerougnerogrnreog erngo[/center]
Wanting:
egibibskdfbgfdkfbg sd fgkgb fkgbgk fhwo3g regbiurb geir so go to [url=lintosomething.com]LINK TEXT[/url] and ibgri gbenkenbieurgnerougnerogrnreog erngo
Edit: Ok, I think this fits your needs.
My regex:
/[^\]\[]*\[(\w+)[=\.\"\w]*\][^\]]+\[\/\1\][^\]\[]*/g
Explanation:
1. Match 0 or more characters that aren't [ or ]
2. Match a single [
3. Match 1 or more word characters (\w+); we'll use this later as a backreference
4. Match 0 or more of = . " or word characters
5. Match a single ]
6. Match 1 or more non-] characters
7. Match a single [
8. Match a single /
9. Match the same characters as step 3 (our backreference)
10. Match a single ]
11. Match 0 or more characters that aren't [ or ]
See it in action
However I would like to state that if you're going to be parsing bbcodes you're almost certainly better off just using a bbparser.
Why not just replace all those tags with an empty string?
var rawString; // your input string
var cleanedString = rawString.replace(/\[\/?(left|right|center)\]/g, '');
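As a quick illustration of this replace approach on the examples from the question, here is a small sketch. The function name stripAlignmentTags is made up for this example, and the g flag is what makes every tag get removed rather than only the first one.

```javascript
// Hypothetical helper: removes [left], [right], [center] and their closing tags.
function stripAlignmentTags(text) {
  return text.replace(/\[\/?(?:left|right|center)\]/g, "");
}

console.log(stripAlignmentTags("[right][left][center]text[/center][/left][/right]"));
console.log(stripAlignmentTags("[center][left][img]/link/to/img.png[/img][/left][/center]"));
```

Other bbcode tags such as [img] or [url] are left untouched, which matches the note in the question. Unlike a match-based approach, this also removes alignment tags that appear in the middle of the text.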
You could use a capturing group like this:
(?:\[\w+\])*(\w+)(?:\[\/\w+\])*
Or with a capture group named "value" like this:
(?:\[\w+\])*(?<value>\w+)(?:\[\/\w+\])*
The first and last groups are non-capturing: (?: ...)
The middle group is capturing: (\w+)
The middle group is named like this: (?<value>\w+)
Note: For simplicity, I replaced your center|left|right values with \w+ but you could swap them back in with no impact.
I use an app called RegExRX. Here's a screenshot with the RegEx and captured values.
Lots of ways you could tweak it. Good luck!
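For reference, a named group can be read in JavaScript through match.groups. The sketch below is a variation on the pattern above, under a few assumptions: the center|left|right alternatives are swapped back in, the value is matched with a lazy .*? instead of \w+ so that it may contain spaces and other bbcode tags, and the pattern is anchored to the whole string. The function name innerValue is hypothetical.

```javascript
// Hypothetical helper: returns the text inside the outer alignment tags.
function innerValue(text) {
  // Leading alignment tags, the lazily matched value, then trailing closing tags.
  const pattern = /^(?:\[(?:center|left|right)\])*(?<value>.*?)(?:\[\/(?:center|left|right)\])*$/;
  const match = pattern.exec(text);
  // . does not match newlines, so multi-line input yields null here.
  return match === null ? null : match.groups.value;
}

console.log(innerValue("[right][left][center]text[/center][/left][/right]"));
console.log(innerValue("[center][url=lintosomething.com]LINK TEXT[/url][/center]"));
```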

Regex to capture everything but consecutive newlines

What is the best way to capture everything except when faced with two or more new lines?
ex:
name1
address1
zipcode

name2
address2
zipcode

name3
address3
zipcode
One regex I considered was /[^\n\n]*\s*/g. But this stops when it is faced with a single \n character.
Another way I considered was /((?:.*(?=\n\n)))\s*/g. But this seems to only capture the last line ignoring the previous lines.
What is the best way to handle similar situation?
UPDATE
You can consider replacing the variable length separator with some known fixed length string not appearing in your processed text and then split. For instance:
> var s = "Hi\n\n\nBye\nCiao";
> var x = s.replace(/\n{2,}/g, "#");
> x.split("#");
["Hi", "Bye
Ciao"]
I think it is an elegant solution. You could also use the following somewhat contrived regex
> s.match(/((?!\n{2,})[\s\S])+/g);
["Hi", "
Bye
Ciao"]
and then process the resulting array by applying the trim() string method to its members in order to get rid of any \n at the beginning/end of every string in the array.
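The match-then-trim idea could be wrapped up roughly as follows. The function name splitBlocks is an assumption for illustration, and the || [] fallback covers the case where match() returns null because nothing matched.

```javascript
// Hypothetical helper: splits text into blocks separated by two or more newlines.
function splitBlocks(text) {
  // Consume characters one at a time, stopping wherever two or more
  // newlines start at the current position.
  const chunks = text.match(/((?!\n{2,})[\s\S])+/g) || [];
  // A chunk may begin with the last newline of a separator; trim removes it.
  return chunks.map((chunk) => chunk.trim());
}

console.log(splitBlocks("Hi\n\n\nBye\nCiao"));
```

Keep in mind that trim() also strips leading and trailing spaces and tabs, not only newlines.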
((.+)\n?)*
(You probably want to make the groups non-capturing; I left them as is for readability.)
The inner part (.+)\n? means "non-empty line" (at least one non-newline character as . does not match newlines unless the appropriate flag is set, followed by an optional newline)
Then, that is repeated an arbitrary number of times (matching an entire block of non-blank lines).
However, depending on what you are doing, regexp probably is not the answer you are looking for. Are you sure just splitting the string by \n\n won't do what you want?
Do you have to use regex? The solution is simple without it.
var data = 'name1...';
var matches = data.split('\n\n');
To access an individual sub section split it by \n again.
//the first section's name
var name = matches[0].split('\n')[0];
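Extending the split approach to the sample data from the question, a sketch might look like this. It assumes every record has exactly the three lines name, address and zipcode, and it splits on /\n{2,}/ rather than '\n\n' so that three or more consecutive newlines are also treated as one separator. The function name parseRecords is hypothetical.

```javascript
// Hypothetical helper: turns blank-line separated blocks into objects.
function parseRecords(data) {
  return data
    .split(/\n{2,}/)                          // one block per record
    .filter((block) => block.trim() !== "")   // skip empty leading/trailing blocks
    .map((block) => {
      // Assumed line order within a block: name, address, zipcode.
      const [name, address, zipcode] = block.split("\n");
      return { name, address, zipcode };
    });
}

const data = "name1\naddress1\nzipcode\n\nname2\naddress2\nzipcode\n\n\nname3\naddress3\nzipcode";
console.log(parseRecords(data));
```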

Javascript (node) regex doesn't seem to match start of string

I'm struggling with regular expressions in JavaScript; they don't seem to start at the beginning of the string. In the simple example below I want to get the file name and then everything after the first colon.
//string
file.text:16: lots of random text here with goes on for ages
//regex
(.?)[:](.*)
// group 1 returns 't'
/^([^:]+):(.*)/.exec('file.text:16: lots of random text here with goes on for ages')
gives ....
["file.text:16: lots of random text here with goes on for ages", "file.text", "16: lots of random text here with goes on for ages"]
Try this regex:
/^([^:]+)[:](.*)/
Explanation:
^ #Start of string
( #Start of capturing group #1
[^:] #Any character other than :
+ #One or more of the previous character class
) #End of capturing group #1
[:] #One :
(.*) #Any number of characters other than newline
The ? operator captures zero or one of the previous symbol only.
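A minimal sketch of using this regex from code is shown below. The function name parseLine and the returned object shape are assumptions for illustration. exec() returns null when the line contains no colon (or starts with one), so that case is handled explicitly.

```javascript
// Hypothetical helper: splits "file:rest" at the first colon.
function parseLine(line) {
  const match = /^([^:]+):(.*)/.exec(line);
  if (match === null) {
    return null; // no colon, or nothing before the first colon
  }
  // match[0] is the whole match; groups 1 and 2 are the parts we want.
  const [, fileName, rest] = match;
  return { fileName, rest };
}

console.log(parseLine("file.text:16: lots of random text here with goes on for ages"));
```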
You could also use string operations instead:
var str = "file.text:16:";
var n = str.indexOf(":");
var fileName = str.slice(0, n); // "file.text"
var everythingElse = str.slice(n + 1); // "16:", skipping the colon itself
The ? operator returns 0 or 1 matches. You want the * operator, and you should select everything that isn't a : in the first set
([^:]*)[:](.*)
Non-regexy answer:
var a = s.split(":");
Then join a[1] and remaining elements.
Or just get the index of the first colon and create two strings using that.
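The split-and-join variant could be sketched like this. The function name splitAtFirstColon is hypothetical. Rejoining with ":" restores any colons that were part of the remaining text.

```javascript
// Hypothetical helper: non-regex split at the first colon.
function splitAtFirstColon(line) {
  const parts = line.split(":");
  const fileName = parts[0];
  // Everything after the first colon, with later colons put back.
  const rest = parts.slice(1).join(":");
  return { fileName, rest };
}

console.log(splitAtFirstColon("file.text:16: lots of random text here with goes on for ages"));
```

If the line has no colon at all, fileName is the whole line and rest is an empty string.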
