How to extract specific substring from a string using a one liner - javascript

Input: parent/123/child/grand-child
Expected output: child
Attempt 1: (?<=\/parent\/\d*)(.*)(?=\/.*)
Error: A quantifier inside a lookbehind makes it non-fixed width. The lookbehind does not accept *, but I don't know how many digits the number has, so I need a quantifier there.
Attempt 2 (works, but takes two lines):
const currentRoute='/parent/123/child/grand-child'
let extract = currentRoute.replace(/\/parent\/\d*/g, '');
extract = extract.substring(1, extract.lastIndexOf('/'));
console.log('Result', extract)
How do I get the extract with a one-liner, preferably using regex?

Your current pattern matches 123/child instead of only child, because a forward slash is missing after \d* (note that * means 0 or more times, so \d* can also match no digits at all).
It will also overmatch because of the .* if more forward slashes are present.
Instead, you could make use of a capturing group and use match.
parent\/\d+\/(\w+)\/
The value is in capturing group 1.
let res = "parent/123/child/grand-child".match(/parent\/\d+\/(\w+)\//);
if (res) console.log(res[1])
A pattern with a lookbehind to get the value child could be
(?<=parent\/\d*\/)([^\/]+)(?=\/)
Note that lookbehind assertions are an ECMAScript 2018 feature and are not supported in every environment.
let res = "parent/123/child/grand-child".match(/(?<=parent\/\d*\/)([^\/]+)(?=\/)/);
if (res) console.log(res[0])
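Since the question asks for a one-liner, the capturing-group pattern above can be combined with optional chaining so that a failed match yields null instead of throwing. This is only a sketch: the helper name extractSegment is made up for illustration, and it assumes an environment that supports ?. and ?? (for example Node 14 or later).

```javascript
// Hypothetical helper name; the pattern is the capturing-group one from the answer.
const extractSegment = (route) =>
  route.match(/parent\/\d+\/(\w+)\//)?.[1] ?? null;

console.log(extractSegment('/parent/123/child/grand-child')); // child
console.log(extractSegment('/somewhere/else')); // null
```

Because the pattern is not anchored, it works with or without the leading slash.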

How about
currentRoute.match(/\/parent\/(?:.*)\/(.*)\//)[1]

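If the format is fixed, then use .split("/")[2] to get the 3rd element. This assumes the string has no leading slash, as in the Input line of the question; with a leading "/" the first element is an empty string and the index becomes 3.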
console.log(currentRoute.split("/")[2]);
"child"
To match the parent part of the string use .match(/^parent\/[^\/]+\/([^\/]+)/)[1]
console.log(currentRoute.match(/^parent\/[^\/]+\/([^\/]+)/)[1]);
"child"
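The question shows the route both with and without a leading slash, which changes the index that split produces. A minimal sketch that sidesteps this by dropping empty segments first (the helper name segments is an assumption for illustration, not part of the answer):

```javascript
// Hypothetical helper: split on "/" and drop empty strings, so a leading
// (or trailing) slash does not shift the indexes.
const segments = (route) => route.split('/').filter(Boolean);

console.log('/parent/123/child/grand-child'.split('/')[2]); // 123
console.log(segments('/parent/123/child/grand-child')[2]); // child
console.log(segments('parent/123/child/grand-child')[2]); // child
```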

Related

Splitting a String with Emoji Regex Respecting Variation Selector 15

I'm trying to create a way to split a string into emoji and non-emoji chunks. I found a regex and altered it to the following to take the text variation selector into account:
(?:(?!(\u00a9|\u00ae|[\u2000-\u3300]|\ud83c[\ud000-\udfff]|\ud83d[\ud000-\udfff]|\ud83e[\ud000-\udfff])+\ufe0e))(\u00a9|\u00ae|[\u2000-\u3300]|\ud83c[\ud000-\udfff]|\ud83d[\ud000-\udfff]|\ud83e[\ud000-\udfff])+
This works with .match such as:
'🇦🇨'.match(regex) // (["0x1F1E6", "0x1F1E8"]) => ['🇦🇨']
'🇦🇨'.match(regex) // (["0x1F1E6", "0x1F1E8", "0xFE0E"]) => null
But split isn't giving me the expected results:
'🇦🇨'.split(regex) // (["", undefined, "🇨", ""]) => ['🇦🇨']
I need split to return the entire emoji in one element. What am I doing wrong?
EDIT:
I have a working regex now, except for the edge case exhibited here: https://regex101.com/r/Vki2ZS/2.
I don't want the second emoji to be matched, since it is followed by the text variation selector. I think this happens because I'm using a lookahead, as the reversed string is matched as expected, but I can't use a negative lookbehind since it's not supported by all browsers.
Your pattern does not work because the second emoji was partly matched by the +-quantified group (?:\u00a9|\u00ae|[\u2000-\u3300]|\ud83c[\ud000-\udfff]|\ud83d[\ud000-\udfff]|\ud83e[\ud000-\udfff])+: in \uD83E\uDD20\uFE0F\uD83E\uDD20\uFE0E, the substring \uD83E\uDD20\uFE0F\uD83E\uDD20 was matched in two iterations, first \uD83E\uDD20\uFE0F, then \uD83E\uDD20.
The pattern you may use with .split is
/((?:(?:\u00a9|\u00ae|[\u2000-\u3300]|\ud83c[\ud000-\udfff]|\ud83d[\ud000-\udfff]|\ud83e[\ud000-\udfff])+(?!\ufe0e)(?:\ufe0f)?(?:\u200d)?)+)/
The main goal was to fail all matches where (?:\u00a9|\u00ae|[\u2000-\u3300]|\ud83c[\ud000-\udfff]|\ud83d[\ud000-\udfff]|\ud83e[\ud000-\udfff])+ was followed by \uFE0E, so I added a negative lookahead (?!\ufe0e).
JS demo:
var regex = /((?:(?:\u00a9|\u00ae|[\u2000-\u3300]|\ud83c[\ud000-\udfff]|\ud83d[\ud000-\udfff]|\ud83e[\ud000-\udfff])+(?!\ufe0e)(?:\ufe0f)?(?:\u200d)?)+)/;
console.log('🇦🇨'.split(regex));
console.log('🤠️🤠︎'.split(regex));
// If you need to wrap the match with some tags:
console.log('🤠️🤠︎'.replace(/(?:(?:\u00a9|\u00ae|[\u2000-\u3300]|\ud83c[\ud000-\udfff]|\ud83d[\ud000-\udfff]|\ud83e[\ud000-\udfff])+(?!\ufe0e)(?:\ufe0f)?(?:\u200d)?)+/g, '<span class="special">$&</span>'))
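As background for the undefined entries in the question's split output: when the regex passed to split contains capturing groups, every group's capture is spliced into the result array, and a group that did not take part in the match contributes undefined. That is why the pattern above wraps everything in a single capturing group and keeps the inner groups non-capturing. A small sketch with deliberately simple patterns (these are illustrations, not the emoji regex):

```javascript
// One capturing group around the separator: the separators are kept in the output.
const withGroup = 'a1b2c'.split(/(\d)/);
console.log(withGroup); // [ 'a', '1', 'b', '2', 'c' ]

// A capturing group that does not participate in the match yields undefined.
const unusedGroup = 'a1b'.split(/(?:(x)|\d)/);
console.log(unusedGroup); // [ 'a', undefined, 'b' ]

// No capturing group: the separators are dropped.
const noGroup = 'a1b2c'.split(/\d/);
console.log(noGroup); // [ 'a', 'b', 'c' ]
```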

Extracting a complicated part of the string with plain Javascript

I have a following string:
Text
I want to extract from this string, with the use of JavaScript 'pl' or 'pl_company_com'
There are a few variables:
jan_kowalski is a name and surname it can change, and sometimes even have 3 elements
the country code (in this example 'pl') will change to other en / de / fr (this is that part of the string i want to get)
the rest of the string remains the same for every case (beginning + everything after starting with _company_com ...
P.S. I tried to do it with split, but my knowledge of JS is very basic and I can't get what I want. Please help.
An alternative to Randy Casburn's solution using regex
let out = new URL('https://my.domain.com/personal/jan_kowalski_pl_company_com/Documents/Forms/All.aspx').href.match('.*_(.*_company_com)')[1];
console.log(out);
Or if you want to just get that string with those country codes you specified
let out = new URL('https://my.domain.com/personal/jan_kowalski_pl_company_com/Documents/Forms/All.aspx').href.match('.*_((en|de|fr|pl)_company_com)')[1];
console.log(out);
A proof of concept that this solution also works for other combinations
let urls = [
new URL('https://my.domain.com/personal/jan_kowalski_pl_company_com/Documents/Forms/All.aspx'),
new URL('https://my.domain.com/personal/firstname_middlename_lastname_pl_company_com/Documents/Forms/All.aspx')
]
urls.forEach(url => console.log(url.href.match('.*_(en|de|fr|pl).*')[1]))
I have had good results solving this kind of problem with regular expressions:
var string = 'Text';
var regExp = /([\w]{2})_company_com/;
var find = string.match(regExp); // declared with var to avoid an implicit global
console.log(find); // array with found matches
console.log(find[1]); // first group of regexp = country code
First you have your given string. Second you have a regular expression, which is delimited by a slash at the beginning and at the end. Regular expressions are mostly used for string searches (you can even replace complicated text with them in all major editors, which can be VERY useful).
In this case it matches exactly two word characters, [\w]{2}, followed directly by _company_com (\w denotes a word character, the [] group the allowed character types, here only word characters, and the {} give the number of characters to match). To find the wanted part, call string.match(regExp). It returns an array holding the whole matched string followed by all capture groups of the regExp (which are denoted by ()). So in this case you get the country code with find[1], which is the first and only capture group of the regular expression.
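The same idea can be packaged as a small function that returns null when nothing matches instead of throwing on find[1]. This is a sketch under a few assumptions: the function name countryCode is invented here, the country code is taken to be exactly two lowercase letters, and the third URL in the usage lines is a made-up example.

```javascript
// Hypothetical helper: pull the two-letter code in front of "_company_com"
// out of the URL path. Returns null if the path has no such part.
const countryCode = (href) => {
  const m = new URL(href).pathname.match(/_([a-z]{2})_company_com(?:\/|$)/);
  return m ? m[1] : null;
};

console.log(countryCode('https://my.domain.com/personal/jan_kowalski_pl_company_com/Documents/Forms/All.aspx')); // pl
console.log(countryCode('https://my.domain.com/personal/firstname_middlename_lastname_pl_company_com/Documents/Forms/All.aspx')); // pl
// Made-up URL to show a different code:
console.log(countryCode('https://my.domain.com/personal/anna_schmidt_de_company_com/Documents/Forms/All.aspx')); // de
```

Anchoring on _company_com rather than on the name means the number of name parts does not matter.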

Match only # and not ## without negative lookbehind

Using JavaScript, I need a regex that matches any instance of #{this-format} in any string. My original regex was the following:
#{[a-z-]*}
However, I also need a way to "escape" those instances. I want it so that if you add an extra #, the match gets escaped, like ##{this}.
I originally used a negative lookbehind:
(?<!#)#{[a-z-]*}
And that would work just fine, except... lookbehinds are an ECMAScript2018 feature, only supported by Chrome.
I read some people suggesting the usage of a negated character set. So my little regex became this:
(?:^|[^#])#{[a-z-]*}
...which would have worked just as well, except it doesn't work if you put two of these together: #{foo}#{bar}
So, does anyone know how I can achieve this? Remember that these conditions need to be met:
Find #{this} anywhere in a string
Be able to escape like ##{this}
Be able to put multiple adjacent, like #{these}#{two}
Lookbehinds must not be used
If you include ## in your regex pattern as an alternate match option, it will consume the ## instead of allowing a match on the subsequent bracketed entity. Like this:
##|(#{[a-z-]*})
You can then evaluate the inner match object in javascript. Here is a jsfiddle to demonstrate, using the following code.
var targetText = '#{foo} in a #{bar} for a ##{foo} and #{foo}#{bar} things.'
var reg = /##|(#{[a-z-]*})/g;
var result;
while ((result = reg.exec(targetText)) !== null) {
  if (result[1] !== undefined) {
    alert(result[1]);
  }
}
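The same ##|(...) alternation can also be collected into an array without an explicit loop. A minimal sketch, assuming String.prototype.matchAll is available (Node 12 or later, current browsers); the function name findPlaceholders is made up for illustration:

```javascript
// Hypothetical helper: "##" is consumed by the first alternative and leaves
// group 1 undefined, so only real placeholders survive the filter.
function findPlaceholders(text) {
  const re = /##|(#{[a-z-]*})/g;
  return Array.from(text.matchAll(re), (m) => m[1]).filter((g) => g !== undefined);
}

console.log(findPlaceholders('#{foo} in a #{bar} for a ##{foo} and #{foo}#{bar} things.'));
// [ '#{foo}', '#{bar}', '#{foo}', '#{bar}' ]
console.log(findPlaceholders('##{this}')); // []
```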
You could use (?:^|[^#])# to match the start of the pattern, and capture the following #{<sometext>} in a group. Since you don't want the initial (possible) [^#] to be in the result, you'll have to iterate over the matches manually and extract the group that contains the substring you want. For example:
function test(str) {
  const re = /(?=(?:^|[^#])(#{[a-z-]*}))./g;
  let match;
  const matches = [];
  while (match = re.exec(str)) {
    matches.push(match[1]); // extract the captured group
  }
  return matches;
}
console.log(test('##{this}'))
console.log(test('#{these}#{two}'))

javascript regex insert new element into expression

I am passing a URL to a block of code in which I need to insert a new element into the regex. Pretty sure the regex is valid and the code seems right but no matter what I can't seem to execute the match for regex!
//** Incoming url's
//** url e.g. api/223344
//** api/11aa/page/2017
//** Need to match to the following
//** dir/api/12ab/page/1999
//** Hence the need to add dir at the front
var url = req.url;
//** pass in: /^\/api\/([a-zA-Z0-9-_~ %]+)(?:\/page\/([a-zA-Z0-9-_~ %]+))?$/
var re = myregex.toString();
//** Insert dir into regex: /^dir\/api\/([a-zA-Z0-9-_~ %]+)(?:\/page\/([a-zA-Z0-9-_~ %]+))?$/
var regVar = re.substr(0, 2) + 'dir' + re.substr(2);
var matchedData = url.match(regVar);
matchedData === null ? console.log('NO') : console.log('Yay');
I hope I am just missing the obvious but can anyone see why I can't match and always returns NO?
Thanks
Let's break down your regex
^\/api\/ matches the beginning of the string, followed by exactly the string "/api/"
([a-zA-Z0-9-_~ %]+) is a capturing group: it captures one or more (the +) of the characters listed inside the square brackets, so this section will match, for example, abAB25-_ %
(?:\/page\/([a-zA-Z0-9-_~ %]+)) groups multiple tokens together as well, but does not create a capturing group like the one above (the ?: makes it non-capturing). It first matches exactly "/page/", followed by a capturing group like the one described above (matching a-z, A-Z, 0-9, etc.)
?$ is at the end: the ? makes the preceding group optional (it matches 0 or 1 times), and the $ matches the end of the string
This regex will match this string, for example: /api/abAB25-_ %/page/abAB25-_ %
You may be able to take advantage of capturing groups, however, and use something like this instead to get similar results: ^\/api\/([a-zA-Z0-9-_~ %]+)\/page\/\1?$. Here, we are using \1 to reference that first capturing group and match exactly the same tokens it is matching. EDIT: actually, this probably won't work, since the text after /api/ and the text after /page/ will most likely be different, carrying on...
Afterwards, you are adding "dir" to the beginning of your search, so you can now match something like this: dir/api/abAB25-_ %/page/abAB25-_ %
You have also now converted the regex to a string, delimiters included, so as Crayon Violent pointed out in their comment, this will break your expected functionality. You can fix this by reading the pattern from .source on the original regex (myregex.source) instead of calling toString(), inserting "dir" after the leading ^ in that string, and passing the result to url.match(): https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/source
Now you can properly match a string like this: dir/api/11aa/page/2017 see this example: https://repl.it/Mj8h
As mentioned by Crayon Violent in the comments, it seems you're passing a String rather than a regular expression to the .match() function. Maybe try the following:
url.match(new RegExp(regVar, "i"));
to convert the string to a regular expression. The string must contain only the pattern, without the leading and trailing / delimiters that toString() adds. The "i" flag ignores case; I don't know whether that's what you want. Learn more here:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp
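Putting both answers together, one way to build the prefixed regex is to start from .source (which has no slash delimiters) and pass the result to the RegExp constructor. This is a sketch, not the asker's actual code: the function name withPrefix is invented, it assumes the pattern starts with ^, and it assumes the prefix contains no regex metacharacters.

```javascript
// The pattern from the question.
const myregex = /^\/api\/([a-zA-Z0-9-_~ %]+)(?:\/page\/([a-zA-Z0-9-_~ %]+))?$/;

// Hypothetical helper: insert a literal prefix right after the leading "^".
// Assumes the prefix has no regex metacharacters that would need escaping.
function withPrefix(regex, prefix) {
  const source = regex.source; // pattern text without the surrounding slashes
  if (!source.startsWith('^')) {
    throw new Error('expected a pattern anchored with ^');
  }
  return new RegExp('^' + prefix + source.slice(1), regex.flags);
}

const prefixed = withPrefix(myregex, 'dir');
const matchedData = 'dir/api/11aa/page/2017'.match(prefixed);
console.log(matchedData === null ? 'NO' : 'Yay'); // Yay
console.log(matchedData[1], matchedData[2]); // 11aa 2017
```

Reusing regex.flags keeps any flags of the original pattern on the new one.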

How to get the 1st character after a pattern using regex?

I'm trying to get the first character after the pattern.
i.e.
border-top-color
padding-top
/-[a-z]/g
selects:
border[-t]op[-c]olor
padding[-t]op
I want to select:
border-[t]op-[c]olor
padding-[t]op
How do you just get the first character after a selected pattern?
Example Here! :)
To get the t after border-, you usually match with this kind of regex:
border-(.)
You can then extract the submatch:
var characterAfter = str.match(/border-(.)/)[1];
match returns an array with the whole match as first element, and the submatches in the following positions.
To get an array of all the characters following a dash, use
var charactersAfter = str.match(/-(.)/g).map(function(s){ return s.slice(1) })
Just use a capturing group:
"border-top-color".replace(/-([a-z])/g, "-[$1]")
Result:
"border-[t]op-[c]olor"
You can use submatching like dystroy said or simply use lookbehind to match it:
/(?<=-)./
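Where lookbehind is unavailable, the capturing-group approach can be paired with matchAll, which also copes with strings that contain no dash at all (match with the g flag returns null in that case, so calling .map on it would throw). A minimal sketch, assuming matchAll is supported; the helper name charsAfterDash is hypothetical:

```javascript
// Hypothetical helper: collect group 1 (the character after each dash).
const charsAfterDash = (str) => Array.from(str.matchAll(/-(.)/g), (m) => m[1]);

console.log(charsAfterDash('border-top-color')); // [ 't', 'c' ]
console.log(charsAfterDash('padding-top')); // [ 't' ]
console.log(charsAfterDash('margin')); // []
```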
