JavaScript regex pattern to match multiple strings (AND, OR, NEAR/n, P/n)

I need to filter a collection of strings based on a rather complex query.
I have the query input as a string:
var query1 ='Abbott near/10 (assay* OR test* ) AND BLOOD near/10 (Point P/1 Care)';
From this query input string I want to collect just the important words:
var words= 'Abbott assay* test* BLOOD Point Care';
The query can change, for example:
var query2='(assay* OR test* OR analy* OR array) OR (Abbott p/1 Point P/1 Care)';
From this query I need to collect:
var words='assay* test* analy* array Abbott Point Care';
Any suggestions?
Thanks.

You can just use | (alternation) in your regex to match the words and special characters that you want to remove:
([()]|AND|OR|(NEAR|P)\/\d+) ?
DEMO: https://regex101.com/r/rqpmXr/2
Note the /gi flags on the regex: g handles every occurrence, and i makes the match case-insensitive.
EXPLANATION:
([()]|AND|OR|(NEAR|P)\/\d+) - This is a capture group containing all the operators you listed in your title, plus the parentheses.
(NEAR|P)\/\d+ - To clarify this part: \d+ means that one or more digits follow NEAR or P and the slash.
 ? - This matches the optional trailing space after the removed word.
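A minimal JavaScript sketch of that approach, assuming the goal is to replace the matches and keep what is left. Two tweaks go beyond the answer above and are my own assumptions: \b word boundaries, so that "and"/"or" inside ordinary words (e.g. "Oracle") are left alone, and replacing each match with a space and then collapsing whitespace. With an empty replacement, the `array) OR (Abbott` part of query2 would otherwise collapse to `arrayAbbott`. The function name extractTerms is hypothetical.

```javascript
// Hypothetical helper: strips the query operators and parentheses, keeping the search terms.
function extractTerms(query) {
  return query
    // Parentheses, standalone AND/OR, and NEAR/n or P/n (case-insensitive, all occurrences)
    .replace(/[()]|\b(?:AND|OR)\b|\b(?:NEAR|P)\/\d+/gi, " ")
    // Collapse the leftover runs of whitespace into single spaces
    .replace(/\s+/g, " ")
    .trim();
}

const query1 = 'Abbott near/10 (assay* OR test* ) AND BLOOD near/10 (Point P/1 Care)';
const query2 = '(assay* OR test* OR analy* OR array) OR (Abbott p/1 Point P/1 Care)';
console.log(extractTerms(query1)); // "Abbott assay* test* BLOOD Point Care"
console.log(extractTerms(query2)); // "assay* test* analy* array Abbott Point Care"
```

If the terms are needed as an array rather than a string, `extractTerms(query).split(" ")` gives one entry per word.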

Related

Extracting a complicated part of the string with plain JavaScript

I have the following string:
Text
Using JavaScript, I want to extract 'pl' or 'pl_company_com' from this string.
There are a few variables:
jan_kowalski is a first name and surname; it can change, and sometimes it even has three parts.
The country code (in this example 'pl') will change to others: en / de / fr (this is the part of the string I want to get).
The rest of the string remains the same in every case (the beginning, plus everything from _company_com onward).
P.S. I tried to do it with split, but my knowledge of JS is very basic and I can't get what I want. Please help.
An alternative to Randy Casburn's solution using regex
let out = new URL('https://my.domain.com/personal/jan_kowalski_pl_company_com/Documents/Forms/All.aspx').href.match('.*_(.*_company_com)')[1];
console.log(out);
Or if you want to just get that string with those country codes you specified
let out = new URL('https://my.domain.com/personal/jan_kowalski_pl_company_com/Documents/Forms/All.aspx').href.match('.*_((en|de|fr|pl)_company_com)')[1];
console.log(out);
A proof of concept that this solution also works for other combinations
let urls = [
new URL('https://my.domain.com/personal/jan_kowalski_pl_company_com/Documents/Forms/All.aspx'),
new URL('https://my.domain.com/personal/firstname_middlename_lastname_pl_company_com/Documents/Forms/All.aspx')
]
urls.forEach(url => console.log(url.href.match('.*_(en|de|fr|pl).*')[1]))
I have been very successful with regular expressions on this kind of problem before:
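// Example string, assuming it is the URL used in the other answers
var string = 'https://my.domain.com/personal/jan_kowalski_pl_company_com/Documents/Forms/All.aspx';
var regExp = /([\w]{2})_company_com/;
var find = string.match(regExp);
console.log(find); // array with the match and its capture groups (null if nothing matched)
console.log(find[1]); // first capture group of the regexp = country code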
First, you have your given string. Second, you have a regular expression literal, which is delimited by a slash at the beginning and at the end. Regular expressions are mostly used for searching strings (most major editors also let you use them to find and replace complicated text, which can be VERY useful).
In this case it matches exactly two word characters, [\w]{2}, followed directly by _company_com. Here \w means a word character (a letter, digit, or underscore), the [] form a character class listing the allowed characters (here only word characters), and {2} gives the number of characters to match. To find the wanted part, call string.match(regExp). Without the g flag it returns an array containing the whole matched string followed by the text of each capture group in the regExp (the groups are denoted by ()). So in this case you get the country code with find[1], which is the first and only capture group of the regular expression.
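If a string does not contain _company_com at all, find is null and find[1] throws a TypeError. A small wrapper around the same regex that returns null in that case; the name getCountryCode is hypothetical, and the second URL below is a made-up variation for illustration:

```javascript
// Hypothetical helper: returns the two characters before "_company_com", or null.
function getCountryCode(str) {
  const match = str.match(/([\w]{2})_company_com/);
  return match ? match[1] : null; // match[1] is the first capture group
}

console.log(getCountryCode('https://my.domain.com/personal/jan_kowalski_pl_company_com/Documents/Forms/All.aspx')); // "pl"
console.log(getCountryCode('https://my.domain.com/personal/firstname_middlename_lastname_de_company_com/Documents/Forms/All.aspx')); // "de"
console.log(getCountryCode('no company here')); // null
```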

Regex match text between groups match

I'm writing a regex to split the text into key: value pairs. The input is similar to this:
QA~BlaBlaBlaWE~1235123FA~blablablaER~blabla123ZX~2342blaaa
I have been able to separate it, but when I try to take group 3 as the key and group 4 as the value, for QA~BlaBlaBla the key stays in group 2 (QA) and the value in group 3 (BlaBlaBla).
My regex is this:
((\w{2}~)?(.*?)(\w{2}~|$))
The goal is to be able to create a list like this:
> Key Value
> QA BlaBlaBla
> WE 1235123
> FA blablabla
> ER blabla123
> ZX 2342blaaa
Here is the example:
https://regex101.com/r/Xh8RAA/1
I can't get the regex right so that everything ends up in group 3 and group 4. Can someone help me?
What you're looking for is lookahead, which will check that the current position is followed by some pattern without consuming characters in the pattern. You can also remove the unnecessary capturing group enclosing the whole regex, so you can get group 1 to contain the key, and group 2 to contain the value, without any other groups. Also, because the keys are required, the key group shouldn't be optional:
(\w{2})~(.*?)(?=\w{2}~|$)
https://regex101.com/r/Xh8RAA/6
You can use a positive lookahead pattern to avoid consuming the next header token:
([A-Z]{2})~(.*?)(?=[A-Z]{2}~|$)
Substitute the match with group 1 and group 2 followed by a newline and you will get the desired output.
Demo: https://regex101.com/r/Xh8RAA/2
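In JavaScript, the lookahead pattern from the answers above can be combined with String.prototype.matchAll (which requires the g flag) to build a key/value object directly from groups 1 and 2. This is a sketch; the name toKeyValue is hypothetical, and it assumes keys are always two uppercase letters, as in the sample.

```javascript
// Hypothetical helper: parses "QA~valueWE~value..." into { QA: "value", WE: "value", ... }.
function toKeyValue(s) {
  // Group 1 = two-letter key, group 2 = lazily matched value.
  // The lookahead stops the value before the next "XX~" (or at the end) without consuming it.
  const re = /([A-Z]{2})~(.*?)(?=[A-Z]{2}~|$)/g;
  return Object.fromEntries(Array.from(s.matchAll(re), m => [m[1], m[2]]));
}

console.log(toKeyValue("QA~BlaBlaBlaWE~1235123FA~blablablaER~blabla123ZX~2342blaaa"));
```

Because the lookahead does not consume the next key, each match starts exactly at a key, so no key ends up attached to the previous match's groups.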
You can try this Regular Expression:
/.{2}~[^~]+((?=..~)|$)/g
check its result below:
console.log("QA~BlaBlaBlaWE~1235123FA~blablablaER~blabla123ZX~2342blaaa".match(/.{2}~[^~]+((?=..~)|$)/g));
With the code below you can get it as object (key/val):
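function CustomSplit(s){
  var r = {};
  // With the g flag, match() returns each full "XX~value" chunk; the capture groups are ignored
  s.match(/(.{2})~([^~]+((?=..~)|$))/g).forEach(function(a){
    a = a.split("~"); // ["XX", "value"]
    r[a[0]] = a[1];   // key -> value
  });
  return r;
}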
console.log(CustomSplit("QA~BlaBlaBlaWE~1235123FA~blablablaER~blabla123ZX~2342blaaa"));

Match a string between two other strings with regex in javascript

How can I use regex in JavaScript to match the phone number, and only the phone number, in the sample string below? The way I have it written below matches "PHONE=9878906756", but I need it to match only "9878906756". I think this should be relatively simple, but I've tried putting negation-like characters around "PHONE=" with no luck. I can get the phone number in its own group, but that doesn't help when assigning to the JavaScript var, which only cares about what matches.
REGEX:
/PHONE=([^,]*)/g
DATA:
3={STATE=, SSN=, STREET2=, STREET1=, PHONE=9878906756,
MIDDLENAME=, FIRSTNAME=Dexter, POSTALCODE=, DATEOFBIRTH=19650802,
GENDER=0, CITY=, LASTNAME=Morgan
The way you're doing it is right, you just have to get the value of the capture group rather than the value of the whole match:
var result = str.match(/PHONE=([^,]*)/); // Or result = /PHONE=([^,]*)/.exec(str);
if (result) {
console.log(result[1]); // "9878906756"
}
In the array you get back from match, the first entry is the whole match, and then there are additional entries for each capture group.
You also don't need the g flag. (With g, match returns only the full matches and drops the capture groups.)
Or just use dataAfterRegex.substring(6) to remove the first 6 characters (i.e., the PHONE= part).
Try
var str = "3={STATE=, SSN=, STREET2=, STREET1=, PHONE=9878906756, MIDDLENAME=, FIRSTNAME=Dexter, POSTALCODE=, DATEOFBIRTH=19650802, GENDER=0, CITY=, LASTNAME=Morgan";
var ph = str.match(/PHONE=\d+/)[0].slice(-10); // last 10 characters, assuming the number has exactly 10 digits
console.log(ph);
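Another option, if your target environments support ES2018 lookbehind assertions (current Node.js and modern browsers do; older engines do not), is to put PHONE= in a lookbehind. It is checked but not consumed, so the whole match is just the number and no capture group is needed:

```javascript
const str = "3={STATE=, SSN=, STREET2=, STREET1=, PHONE=9878906756, MIDDLENAME=, FIRSTNAME=Dexter, POSTALCODE=, DATEOFBIRTH=19650802, GENDER=0, CITY=, LASTNAME=Morgan";

// (?<=PHONE=) requires "PHONE=" right before the match but leaves it out of the result
const match = str.match(/(?<=PHONE=)[^,]*/);
const phone = match ? match[0] : null; // null if there is no PHONE= field
console.log(phone); // "9878906756"
```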

javascript regex capturing parentheses

I don't really get the concept of capturing parentheses in JavaScript regexes. I don't understand why we need parentheses in the following example:
var x = "{xxx} blah blah blah {yyy} and {111}";
x.replace( /{([^{}]*)}/g ,
function(match,content) {
console.log(match,content);
return "whatever";
});
//it will print
{xxx} xxx
{yyy} yyy
{111} 111
So when I drop the parentheses from my pattern, the results give a different value:
x.replace( /{[^{}]*}/g ,
function(match,content) {
console.log(match,content);
return "whatever";
});
//it will print
{xxx} 0
{yyy} 21
{111} 31
So the content values now become numbers, and I have no idea why. Can someone explain what's going on behind the scenes?
According to the MDN documentation, the parameters to the function will be, in order:
The matched substring.
Any groups that are defined, if there are any.
The index in the original string where the match was found.
The original string.
So in the first example, content will be the string which was captured in group 1. But when you remove the group in the second example, content is actually the index where the match was found.
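To see this directly, a small sketch that records every argument replace passes to the replacer function, for the pattern with and without the group. The helper name replacerArgs is hypothetical. (If a pattern contains named groups, an extra groups object is passed at the end; these patterns have none.)

```javascript
// Hypothetical helper: collects the argument list of each replacer call.
function replacerArgs(str, regex) {
  const calls = [];
  str.replace(regex, (...args) => {
    calls.push(args);
    return args[0]; // return the match unchanged; only the arguments matter here
  });
  return calls;
}

const x = "{xxx} blah blah blah {yyy} and {111}";

// With one capture group, each call gets: match, group 1, offset, whole string
console.log(replacerArgs(x, /{([^{}]*)}/g));
// Without a group, each call gets: match, offset, whole string,
// so the second argument ("content" in the question) is the offset
console.log(replacerArgs(x, /{[^{}]*}/g));
```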
Capture groups are also useful in replacement strings.
For example, I have this string "one two three four" that I want to reverse like "four three two one". To achieve that I will use this line of code:
var reversed = "one two three four".replace(/(one) (two) (three) (four)/, "$4 $3 $2 $1");
Note how $n refers to the text captured by the nth group (here, one word each).
Another example: I have the same string "one two three four" and I want to print each word twice:
var eachWordTwice = "one two three four".replace(/(one) (two) (three) (four)/, "$1 $1 $2 $2 $3 $3 $4 $4");
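Groups can also be named (ES2018 syntax, supported in current engines), and in a replacement string $<name> refers to a named group. A sketch of the reversal above with names; the group names a to d are arbitrary, and \w+ is used so that it works for any four words, not only these:

```javascript
// Each (?<name>...) is a named capture group; $<name> inserts its text.
const reversed = "one two three four".replace(
  /(?<a>\w+) (?<b>\w+) (?<c>\w+) (?<d>\w+)/,
  "$<d> $<c> $<b> $<a>"
);
console.log(reversed); // "four three two one"
```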
As for the numbers, MDN describes that argument as follows:
The offset of the matched substring within the total string being
examined. (For example, if the total string was "abcd", and the
matched substring was "bc", then this argument will be 1.)
Source:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replace
"Specifying a function as a parameter" section
Parentheses are used to capture (and reuse in the replacement) only a portion of the match. For instance, I use them to match phone numbers that may or may not have extensions. The function below matches the whole string (if the format is valid), so the entire string is replaced, but only specific types of characters in a specific order are kept, with whitespace and other ("() -x") characters allowed in the input.
It will always output a string formatted as (651) 258-9631 x1234 if given 6512589631x1234 or 1 651 258 9631 1234. It also doesn't allow (or, in this case, format) toll-free numbers, as they aren't allowed in my field.
function phoneNumber(v) {
  // Take in a string, return a formatted string such as (651) 651-6511 x1234.
  // Groups: $1 area code (800/888/877/866/855/900 rejected), $2 prefix, $3 line number, $4 extension
  var withExt = /^1?[-(\s.]?(?!800|888|877|866|855|900)([2-9][0-9]{2})[-)\s.]{0,2}([2-9][0-9]{2})[-.\s]{0,2}([0-9]{4})\s*x?([0-9]{1,5})$/i;
  var noExt = /^1?[-(\s.]?(?!800|888|877|866|855|900)([2-9][0-9]{2})[-)\s.]?([2-9][0-9]{2})[-.\s]{0,2}([0-9]{4})$/i; // no extension, at most one separator after the area code
  if (withExt.test(v)) { return v.replace(withExt, "($1) $2-$3 x$4"); }
  if (noExt.test(v)) { return v.replace(noExt, "($1) $2-$3"); }
  return v; // anything else is returned unchanged
}
What this allows me to do is gather the area code, prefix, line number, and an optional extension, and format it the way I need it (for users who can't follow directions, for instance).
So if you input 6516516511x1234 or "(651) 651-6511 x1234", it will match the first regex (the one with an extension) in this example.
As for what is happening in your code, it is as @amine-hajyoussef said: the second argument is the index where each match starts. Your use of that code would be better served by match for example one (text returned), or search for the index, as in example two. p.s.w.g's answer expands on this.

Regex to capture everything but consecutive newlines

What is the best way to capture everything except when faced with two or more new lines?
Example:
name1
address1
zipcode
name2
address2
zipcode
name3
address3
zipcode
One regex I considered was /[^\n\n]*\s*/g. But this stops when it is faced with a single \n character.
Another way I considered was /((?:.*(?=\n\n)))\s*/g. But this seems to only capture the last line ignoring the previous lines.
What is the best way to handle similar situation?
UPDATE
You can consider replacing the variable length separator with some known fixed length string not appearing in your processed text and then split. For instance:
> var s = "Hi\n\n\nBye\nCiao";
> var x = s.replace(/\n{2,}/g, "#");
> x.split("#");
["Hi", "Bye\nCiao"]
I think it is an elegant solution. You could also use the following somewhat contrived regex
> s.match(/((?!\n{2,})[\s\S])+/g);
["Hi", "\nBye\nCiao"]
and then process the resulting array by applying the trim() string method to its members in order to get rid of any \n at the beginning/end of every string in the array.
((.+)\n?)* (you probably want to make the groups non-capturing; I left them as is for readability)
The inner part (.+)\n? means "non-empty line": at least one non-newline character (. does not match newlines unless the s flag is set), followed by an optional newline.
Then, that is repeated an arbitrary number of times (matching an entire block of non-blank lines).
However, depending on what you are doing, regexp probably is not the answer you are looking for. Are you sure just splitting the string by \n\n won't do what you want?
Do you have to use regex? The solution is simple without it.
var data = 'name1...';
var matches = data.split('\n\n');
To access an individual subsection, split it by \n again.
//the first section's name
var name = matches[0].split('\n')[0];
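Since String.prototype.split also accepts a regular expression, the variable-length separator can be split on directly, without the "#" placeholder step. A sketch that also breaks each record into its lines; the helper parseRecords and the field names name, address, and zipcode are hypothetical, taken from the sample data, and it assumes \n line endings:

```javascript
// Hypothetical helper: one object per block of lines separated by blank lines.
function parseRecords(data) {
  return data
    .trim()            // drop leading/trailing newlines so no empty records appear
    .split(/\n{2,}/)   // two or more consecutive newlines separate records
    .map(block => {
      const [name, address, zipcode] = block.split("\n");
      return { name, address, zipcode };
    });
}

const data = "name1\naddress1\nzipcode\n\nname2\naddress2\nzipcode\n\n\nname3\naddress3\nzipcode\n";
console.log(parseRecords(data));
```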
