Extracting a complicated part of the string with plain Javascript

I have the following string:
https://my.domain.com/personal/jan_kowalski_pl_company_com/Documents/Forms/All.aspx
Using JavaScript, I want to extract either 'pl' or 'pl_company_com' from this string.
There are a few variables:
jan_kowalski is a first name and surname; it can change, and sometimes it even has three parts
the country code (in this example 'pl') changes to others, such as en / de / fr (this is the part of the string I want to get)
the rest of the string stays the same in every case (the beginning, plus everything from _company_com onwards)
P.S. I tried to do it with split, but my knowledge of JS is very basic and I can't get what I want. Please help.

An alternative to Randy Casburn's solution, using a regex:
let out = new URL('https://my.domain.com/personal/jan_kowalski_pl_company_com/Documents/Forms/All.aspx').href.match('.*_(.*_company_com)')[1];
console.log(out);
Or, if you only want to match the specific country codes you listed:
let out = new URL('https://my.domain.com/personal/jan_kowalski_pl_company_com/Documents/Forms/All.aspx').href.match('.*_((en|de|fr|pl)_company_com)')[1];
console.log(out);
A proof of concept showing that this solution also works for other name combinations:
let urls = [
new URL('https://my.domain.com/personal/jan_kowalski_pl_company_com/Documents/Forms/All.aspx'),
new URL('https://my.domain.com/personal/firstname_middlename_lastname_pl_company_com/Documents/Forms/All.aspx')
]
urls.forEach(url => console.log(url.href.match('.*_(en|de|fr|pl).*')[1]))
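Since the question mentions trying split, here is a minimal sketch of a split-based alternative. It assumes (the question does not say so explicitly) that the user segment is always the second path segment, i.e. /personal/<name parts>_<cc>_company_com/..., and that it always ends in _company_com; the function name countryCodeFromUrl is hypothetical.

```javascript
// Hypothetical helper: assumes URLs shaped like
// https://my.domain.com/personal/<name parts>_<cc>_company_com/...
function countryCodeFromUrl(href) {
  // pathname is "/personal/jan_kowalski_pl_company_com/Documents/Forms/All.aspx",
  // so splitting on "/" gives ["", "personal", "jan_kowalski_pl_company_com", ...]
  const segment = new URL(href).pathname.split('/')[2];
  // ["jan", "kowalski", "pl", "company", "com"]: the country code is third from
  // the end, no matter how many parts the name has
  const parts = segment.split('_');
  return parts[parts.length - 3];
}

console.log(countryCodeFromUrl('https://my.domain.com/personal/jan_kowalski_pl_company_com/Documents/Forms/All.aspx')); // → pl
console.log(countryCodeFromUrl('https://my.domain.com/personal/firstname_middlename_lastname_de_company_com/Documents/Forms/All.aspx')); // → de
```

Unlike the regex versions, this does not check that the segment really ends in _company_com; if that is not guaranteed, test segment.endsWith('_company_com') before indexing.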

I have had good results with regular expressions for this kind of problem:
const string = 'https://my.domain.com/personal/jan_kowalski_pl_company_com/Documents/Forms/All.aspx';
const regExp = /([\w]{2})_company_com/;
const find = string.match(regExp);
console.log(find); // array: the whole match followed by the capture groups
console.log(find[1]); // first capture group of regExp = country code ("pl")
First, you have the input string. Second, you have a regular expression, marked by a slash at the beginning and at the end. Regular expressions are mostly used for searching strings (all major editors even let you search and replace complicated text with them, which can be VERY useful).
In this case the regular expression matches exactly two word characters, [\w]{2}, followed directly by _company_com. Here \w stands for a word character (a letter, digit or underscore), the [] groups the allowed character types (here only word characters), and {2} gives the number of characters to match. To find the wanted part, call string.match(regExp). Without the g flag it returns an array with the whole matched string followed by every capture group in the regExp (the parts wrapped in ()), or null if nothing matches. So in this case you get the country code with find[1], the first and only capture group of the regular expression.
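Two things are worth guarding against in that answer: match returns null when nothing matches (so find[1] would throw a TypeError), and \w also accepts digits and the underscore. A minimal sketch, assuming country codes are always two lowercase letters preceded by an underscore; the function name extractCountryCode is hypothetical.

```javascript
// Hypothetical helper: returns the two-letter code before "_company_com", or null.
function extractCountryCode(text) {
  // [a-z]{2} instead of \w{2}, since \w would also accept digits and "_";
  // the leading "_" requires the code to follow a separator.
  const match = text.match(/_([a-z]{2})_company_com/);
  // match is null when the pattern is not found, so check before reading match[1]
  return match ? match[1] : null;
}

console.log(extractCountryCode('https://my.domain.com/personal/jan_kowalski_pl_company_com/Documents/Forms/All.aspx')); // → pl
console.log(extractCountryCode('no country code here')); // → null
```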

Related

How to include a variable and exclude numbers[0-9] and letters[a-zA-Z] in RegExp?

I have code that picks a random letter from a word, and I tried to write a RegExp that turns every letter of the word into '_' except the randomly picked letter.
const word = "Apple is tasty"
const randomCharacter = word[Math.floor(Math.random() * word.length)]
regex = new RegExp(/[^${randomCharacter}&\/\\#,+()$~%.'":;*?<>{}\s]/gi)
hint = word.replace(regex,'_')
I want to change all the letters to '_' except the randomly picked letter. For some reason the code above does not work; it produces A___e __ ta_t_ and I can't figure out what to do.
The final result I want is something like this: A____ __ _a___
Is there a way with a regex to change all letters and digits ('/[^a-zA-Z0-9]/g') to '_' except the randomly picked letter?
I listed every character I want to exclude explicitly in the code above because I couldn't figure out how to include and exclude at the same time when the regex uses a variable.

You can't do string interpolation inside a RegExp literal (/.../). Your placeholder ${randomCharacter} does not evaluate to its value; instead, its characters ($, {, r, a, n, d, o, m, C, h, c, t, e, }) are taken literally as part of the character class, which is why letters such as a, e and t survived in your output.
If you want to use template literals, initialize your regex variable with a RegExp constructor instead, like:
const regex = new RegExp(`[^${randomCharacter}&\\/\\\\#,+()$~%.'":;*?<>{}\\s]`, "gi");
See the MDN RegExp documentation for an explanation of the differences between the literal notation and the constructor function; most notably:
The constructor of the regular expression object [...] results in runtime compilation of the regular expression. Use the constructor function when [...] you don't know the pattern and obtain it from another source, such as user input.
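One caveat with building the pattern from a variable: if the picked character is itself special inside a character class (], \, ^ or -), the generated pattern breaks or changes meaning. A minimal sketch that escapes it first; escapeForCharClass and maskExcept are hypothetical names, and for simplicity this version masks everything except the chosen character and whitespace, rather than using the punctuation exclusion list from the answer.

```javascript
// Hypothetical helper: backslash-escapes the characters that are special
// inside a [...] character class: \ ] [ ^ -
function escapeForCharClass(ch) {
  return ch.replace(/[\\\]\[^-]/g, '\\$&');
}

// Replaces every character except `keep` (case-insensitively) and whitespace with "_"
function maskExcept(word, keep) {
  const regex = new RegExp(`[^${escapeForCharClass(keep)}\\s]`, 'gi');
  return word.replace(regex, '_');
}

console.log(maskExcept('Apple is tasty', 'a')); // → A____ __ _a___
console.log(maskExcept('x^y', '^'));            // → _^_
```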

/(?:[^A\s])/
(You can test it on regex101.)
Just replace A in [^A\s] with the character you want to exclude from the replacement.
Demo:
const word = "Apple is tasty";
const randomCharacter = 'a'; // word[Math.floor(Math.random() * word.length)];
const regex = new RegExp('(?:[^' + randomCharacter + '\\s])', 'gi');
const hint = word.replaceAll(regex, '_'); // replaceAll requires the g flag on a RegExp
console.log(hint); // A____ __ _a___
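Another way to get the result the question asks for (letters and digits become '_', except the picked letter) is to match only [a-z0-9] and decide per character in a replacement callback, so the variable never goes into the pattern and needs no escaping. A sketch; the name hintFor is hypothetical, and [a-z] only covers ASCII letters.

```javascript
// Hypothetical helper: masks ASCII letters and digits, keeping every occurrence
// of `keep` (compared case-insensitively); spaces and punctuation are untouched.
function hintFor(word, keep) {
  const target = keep.toLowerCase();
  return word.replace(/[a-z0-9]/gi, (ch) => (ch.toLowerCase() === target ? ch : '_'));
}

const word = 'Apple is tasty';
const randomCharacter = word[Math.floor(Math.random() * word.length)];
console.log(randomCharacter, hintFor(word, randomCharacter));
console.log(hintFor(word, 'a')); // → A____ __ _a___
```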

How to get content using filter and match?

I want to search the array for the string I'm looking for, and to do that I'm using match:
const search_notes = array_notes.filter(notes => notes.real_content.toUpperCase().match(note.toUpperCase()));
As you can see, search_notes gives me an array of all the notes whose content contains the input, but there's a problem: when I type (, ), [], + or any other regex symbol in the input, it gives me this error:
How can I solve this?

If you look at the documentation for the match method (for instance, MDN's), you'll see that it accepts a RegExp object or a string, and if you give it a string, it passes that string to new RegExp. So characters that have a special meaning in a regular expression need special treatment.
You don't need match, just includes, which doesn't do that:
const search_notes = array_notes.filter(
notes => notes.real_content.toUpperCase().includes(note.toUpperCase())
);
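A small sketch showing the difference, with made-up notes (array_notes and note follow the question's names; the data is hypothetical). If a regex really is needed, the user input can be escaped first; escapeRegExp below uses the widely used escaping pattern that MDN's Regular expressions guide also shows.

```javascript
// Hypothetical sample data using the question's variable names
const array_notes = [
  { real_content: 'Call (555) 010-0199' },
  { real_content: 'Buy milk' },
];
const note = '(555';

// includes() treats the input as plain text, so "(" is harmless
const search_notes = array_notes.filter(
  (notes) => notes.real_content.toUpperCase().includes(note.toUpperCase())
);
console.log(search_notes.length); // → 1

// match() would compile "(555" into a RegExp and throw a SyntaxError (unterminated group)
try {
  'anything'.match(note);
} catch (err) {
  console.log(err instanceof SyntaxError); // → true
}

// If a regex is needed, escape the special characters in the input first
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}
const safe = new RegExp(escapeRegExp(note), 'i');
console.log(array_notes.filter((n) => safe.test(n.real_content)).length); // → 1
```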

Match a string between two other strings with regex in javascript

How can I use a regex in JavaScript to match the phone number, and only the phone number, in the sample string below? The way I have written it below matches "PHONE=9878906756", but I need it to match only "9878906756". I think this should be relatively simple, but I've tried putting negation-like constructs around "PHONE=" with no luck. I can get the phone number into its own group, but that doesn't help when assigning to the JavaScript variable, which only receives the whole match.
REGEX:
/PHONE=([^,]*)/g
DATA:
3={STATE=, SSN=, STREET2=, STREET1=, PHONE=9878906756,
MIDDLENAME=, FIRSTNAME=Dexter, POSTALCODE=, DATEOFBIRTH=19650802,
GENDER=0, CITY=, LASTNAME=Morgan

The way you're doing it is right, you just have to get the value of the capture group rather than the value of the whole match:
var result = str.match(/PHONE=([^,]*)/); // Or result = /PHONE=([^,]*)/.exec(str);
if (result) {
    console.log(result[1]); // "9878906756"
}
In the array you get back from match, the first entry is the whole match, and then there are additional entries for each capture group.
You also don't need the g flag: with it, match returns only the whole matches and drops the capture groups.
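If the whole match itself should be just the number, as the question asks, a lookbehind can do that. This sketch assumes an engine with ES2018 lookbehind support (current Node.js and browsers; older Safari versions lacked it). The second part, with a hypothetical multi-record string, shows that matchAll keeps the capture groups even with the g flag.

```javascript
const str = '3={STATE=, SSN=, STREET2=, STREET1=, PHONE=9878906756, ' +
  'MIDDLENAME=, FIRSTNAME=Dexter, POSTALCODE=, DATEOFBIRTH=19650802, ' +
  'GENDER=0, CITY=, LASTNAME=Morgan';

// (?<=PHONE=) must precede the match but is not part of it
const phone = str.match(/(?<=PHONE=)[^,]*/);
console.log(phone && phone[0]); // → 9878906756

// Hypothetical input with several records: with the g flag, match() returns only
// whole matches, while matchAll() still exposes each capture group
const records = 'PHONE=111, NAME=A; PHONE=222, NAME=B';
const phones = [...records.matchAll(/PHONE=([^,]*)/g)].map((m) => m[1]);
console.log(phones); // → [ '111', '222' ]
```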

If you already have the whole match (here called dataAfterRegex), just use dataAfterRegex.substring(6) to drop its first 6 characters (i.e. the PHONE= part).

Try this (note that .slice(-10) assumes the number is always exactly 10 digits long):
var str = "3={STATE=, SSN=, STREET2=, STREET1=, PHONE=9878906756, MIDDLENAME=, FIRSTNAME=Dexter, POSTALCODE=, DATEOFBIRTH=19650802, GENDER=0, CITY=, LASTNAME=Morgan";
var ph = str.match(/PHONE\=\d+/)[0].slice(-10);
console.log(ph);

javascript regex to extract the first character after the last specified character

I am trying to extract the first character after the last underscore in a string that contains an unknown number of '_', though in my case there will always be at least one, because I added it in an earlier step of the process.
This is what I tried. I also tried the regex by itself to extract from the name, but my result was empty.
var s = "XXXX-XXXX_XX_DigitalF.pdf"
var string = match(/[^_]*$/)[1]
string.charAt(0)
So the final desired result is 'D'. If the regex can only get me what comes after the last '_', that's fine, because I know I can use charAt as shown. However, if the regex can do the whole thing, even better.

If you know there will always be at least one underscore you can do this:
var s = "XXXX-XXXX_XX_DigitalF.pdf"
var firstCharAfterUnderscore = s.charAt(s.lastIndexOf("_") + 1);
// OR, with regex
var firstCharAfterUnderscore = s.match(/_([^_])[^_]*$/)[1]
With the regex, you can extract just the one letter by using parentheses to capture that part of the match. But I think the .lastIndexOf() version is easier to read.
Either way, if the input might contain no underscores, you'd need to add some additional logic.
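A minimal sketch of that additional logic, returning null instead of a wrong character or a TypeError when there is no underscore or nothing follows the last one; both function names are hypothetical.

```javascript
// Hypothetical helper: first character after the last "_", or null if there is
// no underscore or nothing follows it.
function firstCharAfterLastUnderscore(s) {
  const i = s.lastIndexOf('_');
  if (i === -1 || i === s.length - 1) return null;
  return s.charAt(i + 1);
}

// Same result with the regex from the answer: a failed match gives null instead of throwing
function firstCharAfterLastUnderscoreRe(s) {
  const m = s.match(/_([^_])[^_]*$/);
  return m ? m[1] : null;
}

console.log(firstCharAfterLastUnderscore('XXXX-XXXX_XX_DigitalF.pdf'));   // → D
console.log(firstCharAfterLastUnderscoreRe('XXXX-XXXX_XX_DigitalF.pdf')); // → D
console.log(firstCharAfterLastUnderscore('no-underscore.pdf'));           // → null
```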

JavaScript regex back references returning an array of matches from single capture group (multiple groups)

I'm fairly sure, after spending the night trying to find an answer, that this isn't possible, and I've developed a workaround; but if someone knows of a better method, I would love to hear it...
I've gone through a lot of iterations on the code, and the following is really just a line of thought. At some point I was using the global flag, I believe in order for match() to work, and I can't remember now whether it was necessary or not.
var str = "#abc#def#ghi&jkl";
var regex = /^(?:#([a-z]+))?(?:&([a-z]+))?$/;
The idea in this simplified code is that optional group 1, which can occur an unspecified number of times, will match #abc, #def and #ghi, capturing only the alphabetic characters (there will always be one or more). Group 2 is the same, except that it matches on the & symbol. The pattern should also be anchored to the start and end of the string.
I want to be able to back reference all matches of both groups, i.e.:
result = str.match(regex);
alert(result[1]); //abc,def,ghi
alert(result[1][0]); //abc
alert(result[1][1]); //def
alert(result[1][2]); //ghi
alert(result[2]); //jkl
My mate says this works fine for him in .NET; unfortunately I simply can't get it to work. Only the last match of each group is returned in the back reference, as can be seen in the following:
(additionally, making either group optional makes a mess, as does setting global flag)
var str = "#abc#def#ghi&jkl";
var regex = /(?:#([a-z]+))(?:&([a-z]+))/;
var result = str.match(regex);
alert(result[1]); //ghi
alert(result[1][0]); //g
alert(result[2]); //jkl
The following is the solution I arrived at, capturing the whole portion in question, and creating the array myself:
var str = "#abc#def#ghi&jkl";
var regex = /^([#a-z]+)?(?:&([a-z]+))?$/;
var result = regex.exec(str);
alert(result[1]); //#abc#def#ghi
alert(result[2]); //jkl
var result1 = result[1].toString();
result[1] = result1.split('#')
alert(result[1][1]); //abc
alert(result[1][2]); //def
alert(result[1][3]); //ghi
alert(result[2]); //jkl

That's simply not how .match() works in JavaScript. The returned array is an array of simple strings. There's no "nesting" of capture groups; you just count the ( symbols from left to right.
The first string (at index [0]) is always the overall matched string. Then come the capture groups, one string (or null) per array element.
You can, as you've done, rearrange the result array to your heart's content. It's just an array.
Edit: oh, and the reason your result[1][0] was "g" is that applying array indexing notation to a string gets you its individual characters.
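If the goal is to get every #-value as an array, one sketch under these assumptions: validate the overall shape with one anchored regex, then collect each repetition with matchAll and a g-flagged pattern, since a repeated capture group in JavaScript keeps only its last value. parseTags and the returned object shape are hypothetical.

```javascript
// Hypothetical helper for strings like "#abc#def#ghi&jkl"
function parseTags(str) {
  // Anchored check of the whole string: any number of #word parts, then an optional &word
  if (!/^(?:#[a-z]+)*(?:&[a-z]+)?$/.test(str)) return null;
  // Each #word is a separate match, so every value is kept rather than only the last
  const hashes = [...str.matchAll(/#([a-z]+)/g)].map((m) => m[1]);
  const amp = str.match(/&([a-z]+)/);
  return { hashes, amp: amp ? amp[1] : null };
}

const result = parseTags('#abc#def#ghi&jkl');
console.log(result.hashes);    // → [ 'abc', 'def', 'ghi' ]
console.log(result.hashes[0]); // → abc
console.log(result.amp);       // → jkl
```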
