Convert Golang regex to JS regex

I have a regex from Golang (nameComponentRegexp). How can I convert it to JavaScript style regex?
The main blocking problem for me:
How can I express optional and repeated correctly in JavaScript?
I tried copying the separator pattern from match(`(?:[._]|__|[-]*)`), but in an online regex tester it does not match a single period or a single underscore.
The description from Golang:
nameComponentRegexp restricts registry path component names to start
with at least one letter or number, with following parts able to be
separated by one period, one or two underscore and multiple dashes.
alphaNumericRegexp = match(`[a-z0-9]+`)
separatorRegexp = match(`(?:[._]|__|[-]*)`)
nameComponentRegexp = expression(
alphaNumericRegexp,
optional(repeated(separatorRegexp, alphaNumericRegexp)))
Some valid example:
a.a
a_a
a__a
a-a
a--a
a---a

See how you build the nameComponentRegexp: you start with alphaNumericRegexp and then match 1 or 0 occurrences of 1 or more sequences of separatorRegexp+alphaNumericRegexp.
optional() does the following:
// optional wraps the expression in a non-capturing group and makes the
// production optional.
func optional(res ...*regexp.Regexp) *regexp.Regexp {
return match(group(expression(res...)).String() + `?`)
}
repeated() does this:
// repeated wraps the regexp in a non-capturing group to get one or more
// matches.
func repeated(res ...*regexp.Regexp) *regexp.Regexp {
return match(group(expression(res...)).String() + `+`)
}
Thus, what you need is
/^[a-z0-9]+(?:(?:[._]|__|-*)[a-z0-9]+)*$/
See the regex demo
Details:
^ - start of string
[a-z0-9]+ - 1 or more alphanumeric symbols
(?:(?:[._]|__|-*)[a-z0-9]+)* - zero or more sequences of:
(?:[._]|__|-*) - a ., _, __, or 0+ hyphens
[a-z0-9]+ - 1 or more alphanumeric symbols
If you want to disallow strings like aaaa, you need to replace all * in the pattern (2 occurrences) with + (demo).
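As a quick sanity check of the difference between the two variants, here is a minimal sketch (the names lenient, strict and samples are only illustrative):

```javascript
// Lenient: the separator may be empty (-* matches zero hyphens),
// so a plain alphanumeric run like "aaaa" is accepted.
const lenient = /^[a-z0-9]+(?:(?:[._]|__|-*)[a-z0-9]+)*$/;
// Strict: both * replaced with +, so at least one real separator is required.
const strict = /^[a-z0-9]+(?:(?:[._]|__|-+)[a-z0-9]+)+$/;

const samples = ['aaaa', 'a.a', 'a--a', 'a..a'];
const results = samples.map((s) => ({
  s,
  lenient: lenient.test(s),
  strict: strict.test(s),
}));
console.log(results);
```

Only 'aaaa' is treated differently by the two patterns; 'a..a' is rejected by both because only one period is allowed between alphanumeric parts.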
JS demo:
var ss = ['a.a','a_a','a__a','a-a','a--a','a---a'];
var rx = /^[a-z0-9]+(?:(?:[._]|__|-*)[a-z0-9]+)*$/;
for (var s of ss) {
console.log(s,"=>", rx.test(s));
}
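If you would rather keep the compositional style of the Go source, the helpers can be mirrored in JavaScript by working on pattern source strings. This is only a sketch: expression, group, optional, repeated and anchored below are hypothetical JS counterparts written for illustration, not an existing library.

```javascript
// Hypothetical JS counterparts of the Go helpers; they operate on pattern strings.
const expression = (...parts) => parts.join('');
// Wrap in a non-capturing group, like the Go group() helper.
const group = (...parts) => `(?:${expression(...parts)})`;
// Zero or one occurrence of the grouped expression.
const optional = (...parts) => group(...parts) + '?';
// One or more occurrences of the grouped expression.
const repeated = (...parts) => group(...parts) + '+';
// Anchor at both ends and compile (an assumption: the JS answer above anchors too).
const anchored = (...parts) => new RegExp('^' + expression(...parts) + '$');

const alphaNumeric = '[a-z0-9]+';
const separator = '(?:[._]|__|[-]*)';
const nameComponent = anchored(
  alphaNumeric,
  optional(repeated(separator, alphaNumeric))
);

console.log(nameComponent.source);
for (const s of ['a.a', 'a__a', 'a---a', 'a___a', '-a']) {
  console.log(s, '=>', nameComponent.test(s));
}
```

The generated pattern has one more level of grouping than the hand-written one, but it accepts the same strings.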

Related

How to convert a camelcased string to sentence cased without excluding any special characters?

Can you suggest a regex for converting a camelCased string with special characters and numbers to sentence case? For example:
const string = `includes:SummaryFromDetailHistory1990-AsAbstract`
Expected outcome:
Includes : Summary From Detail History 1990 - As Abstract
Currently I'm using lodash startCase to convert camelCase to sentence case. The issue with this approach is that it removes special characters like brackets, numbers, parentheses, hyphens, colons, etc. (most special characters).
So the idea is to convert camelCased strings to sentence case while preserving the string's identity.
For example:
const anotherString = `thisIsA100CharactersLong:SampleStringContaining-SpecialChar(s)10&20*`
const expectedReturn = `This Is A 100 Characters Long : Sample String Containing - Special Char(s) 10 & 20 *`
Is that possible with regex?
You'll have to deal with all the cases yourself:
[a-z](?=[A-Z]): lowercase followed by uppercase
[a-zA-Z](?=[0-9]): letter followed by digit
[0-9](?=[a-zA-Z]): digit followed by letter
[a-zA-Z0-9](?=[^a-zA-Z0-9]): letter or digit followed by neither letter nor digit (\w and \W could be used, but they cover _ too, so up to you)
[^a-zA-Z0-9](?=[a-zA-Z0-9]): neither letter nor digit, followed by either letter or digit
etc.
Then, you can or them together:
([a-z](?=[A-Z])|[a-zA-Z](?=[0-9])|[0-9](?=[a-zA-Z])|[a-zA-Z0-9](?=[^a-zA-Z0-9])|[^a-zA-Z0-9](?=[a-zA-Z0-9]))
And replace with:
"$1 "
(note the space after $1).
See https://regex101.com/r/4AVbAs/1 for instance.
You will hit edge cases though, e.g. Char(s), so you'll need special rules for the parens for instance (see the following section about lookbehinds that can help for that). A bit of a tough job, quite error prone too and hardly maintainable I'm afraid.
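In JavaScript, the capture-group version could look roughly like this. It is a sketch only: toSentence is a made-up helper name, and capitalizing the first character is my addition to match the expected output in the question.

```javascript
// Each alternative captures the character that sits just before a boundary.
const boundary = /([a-z](?=[A-Z])|[a-zA-Z](?=[0-9])|[0-9](?=[a-zA-Z])|[a-zA-Z0-9](?=[^a-zA-Z0-9])|[^a-zA-Z0-9](?=[a-zA-Z0-9]))/g;

// Hypothetical helper: insert a space after every captured character,
// then upper-case the first character of the result.
const toSentence = (s) => {
  const spaced = s.replace(boundary, '$1 ');
  return spaced.charAt(0).toUpperCase() + spaced.slice(1);
};

const first = toSentence('includes:SummaryFromDetailHistory1990-AsAbstract');
const second = toSentence('thisIsA100CharactersLong:SampleStringContaining-SpecialChar(s)10&20*');
console.log(first);
console.log(second);
```

The first string comes out as expected, while the second shows the parenthesis edge case: Char(s) is split into "Char ( s )".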
If lookbehinds were allowed, you would not need to capture the first char in each group, but wrap the left patterns in (?<=...) and replace by a simple space directly:
(?<=[a-z])(?=[A-Z]): preceded by lowercase, followed by uppercase
(?<=[a-zA-Z])(?=[0-9]): preceded by letter, followed by digit
(?<=[0-9])(?=[a-zA-Z]): preceded by digit, followed by letter
(?<=[a-zA-Z0-9])(?=[^a-zA-Z0-9])(?!(?:\(s)?\)): preceded by letter or digit, followed by not letter nor digit, as well as not followed by (s) nor )
(?<=[^a-zA-Z0-9])(?<!\()(?=[a-zA-Z0-9]): preceded by not letter nor digit, as well as not preceded by (, followed by letter or digit
or-ed together:
(?<=[a-z])(?=[A-Z])|(?<=[a-zA-Z])(?=[0-9])|(?<=[0-9])(?=[a-zA-Z])|(?<=[a-zA-Z0-9])(?=[^a-zA-Z0-9])(?!(?:\(s)?\))|(?<=[^a-zA-Z0-9])(?<!\()(?=[a-zA-Z0-9])
Replace with a single space, see https://regex101.com/r/DB91DE/1.
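A sketch of the lookbehind version in JavaScript, assuming an engine with ES2018 lookbehind support (current Node and evergreen browsers). The helper name toSentenceLookbehind and the first-letter capitalization are illustrative additions.

```javascript
// Zero-width boundaries: nothing is consumed, so we replace with a plain space.
const boundaries = /(?<=[a-z])(?=[A-Z])|(?<=[a-zA-Z])(?=[0-9])|(?<=[0-9])(?=[a-zA-Z])|(?<=[a-zA-Z0-9])(?=[^a-zA-Z0-9])(?!(?:\(s)?\))|(?<=[^a-zA-Z0-9])(?<!\()(?=[a-zA-Z0-9])/g;

// Hypothetical helper: space out the boundaries, then capitalize the first letter.
const toSentenceLookbehind = (s) => {
  const spaced = s.replace(boundaries, ' ');
  return spaced.charAt(0).toUpperCase() + spaced.slice(1);
};

const sentence = toSentenceLookbehind(
  'thisIsA100CharactersLong:SampleStringContaining-SpecialChar(s)10&20*'
);
console.log(sentence);
```

Here Char(s) stays intact because of the extra (?!...) and (?<!...) conditions, but those conditions are specific to the literal "(s)" case.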
The wanted result doesn't seem to be regular: some special characters are supposed to be preceded by a space and some are not. Treating the parentheses the way you want is a bit tricky. You can use a function to handle them, like this:
let parenth = 0;
const str = `thisIsA100CharactersLong:SampleStringContaining-SpecialChar(s)10&20*`,
spaced = str.replace(/[A-Z]|\d+|\W/g, (m) => {
if (m === '(') {
parenth = 1;
return m;
}
if (parenth || m === ')') {
parenth = 0;
return m;
}
return ` ${m}`;
});
console.log(spaced);
If the data can contain other brackets, instead of just checking parentheses, use a RegExp to test for any opening bracket: if (/[({[]/.test(m)) ..., and for any closing bracket: if (/[)}\]]/.test(m)) ....
You can test the snippet with different data at jsFiddle.
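Putting the bracket tests into the callback might look like the sketch below. spaceOut and inBrackets are illustrative names, and the state flag is kept local to the function so repeated calls do not interfere with each other.

```javascript
// Hypothetical wrapper around the replace callback shown above,
// generalized from parentheses to (), {} and [].
const spaceOut = (str) => {
  let inBrackets = false;
  return str.replace(/[A-Z]|\d+|\W/g, (m) => {
    if (/[({[]/.test(m)) {
      // Opening bracket: remember it and emit it without a leading space.
      inBrackets = true;
      return m;
    }
    if (inBrackets || /[)}\]]/.test(m)) {
      // First token after an opening bracket, or a closing bracket: no space.
      inBrackets = false;
      return m;
    }
    return ` ${m}`;
  });
};

console.log(spaceOut('thisIsA100CharactersLong:SampleStringContaining-SpecialChar(s)10&20*'));
console.log(spaceOut('fooBar[s]10'));
```

Like the original snippet, this does not capitalize the first letter; that would be a separate step.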
This is impossible. You cannot do this in regex. You will have to consider exceptions...

Regex match multiple same expression multiple times

I have got this string {bgRed Please run a task, {red a list has been provided below}, I need to do a string replace to remove the braces and also the first word.
So below I would want to remove {bgRed and {red, and then the trailing brace, which I can do separately.
I have managed to create this regex, but it is only matching {bgRed and not {red. Can someone lend a hand?
/^\{.+?(?=\s)/gm
Note you are using ^ anchor at the start and that makes your pattern only match at the start of a line (mind also the m modifier). .+?(?=\s|$) is too cumbersome, you want to match any 1+ chars up to the first whitespace or end of string, use {\S+ (or {\S* if you plan to match { without any non-whitespace chars after it).
You may use
s = s.replace(/{\S*|}/g, '')
You may trim the outcome to get rid of resulting leading/trailing spaces:
s = s.replace(/{\S*|}/g, '').trim()
See the regex demo and the regex graph:
Details
{\S* - { char followed with 0 or more non-whitespace characters
| - or
} - a } char.
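A small demonstration of the points above, as a sketch. The braces are escaped here for portability, and the \s* variant in the last statement is my own suggestion rather than part of the pattern above: it also swallows the whitespace after each tag so no double space is left behind.

```javascript
const input = '{bgRed Please run a task, {red a list has been provided below}';

// The original pattern is anchored with ^, so it only finds the tag at line start.
const anchoredMatches = input.match(/^\{.+?(?=\s)/gm);

// Pattern from the answer: removes both tags and the closing brace,
// but the space after "{red" remains, leaving a double space mid-string.
const stripped = input.replace(/\{\S*|\}/g, '').trim();

// Assumed variant: also consume whitespace that follows each tag.
const tidy = input.replace(/\{\S*\s*|\}/g, '');

console.log(anchoredMatches);
console.log(JSON.stringify(stripped));
console.log(JSON.stringify(tidy));
```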
If the goal is to go from
"{bgRed Please run a task, {red a list has been provided below}"
to
"Please run a task, a list has been provided below"
a regex with two capture groups seems simplest:
const original = "{bgRed Please run a task, {red a list has been provided below}";
const rex = /\{\w+ ([^{]+)\{\w+ ([^}]+)}/g;
const result = original.replace(rex, "$1$2");
console.log(result);
\{\w+ ([^{]+)\{\w+ ([^}]+)} is:
\{ - a literal {
\w+ - one or more word characters ("bgRed")
" " - a literal space
([^{]+) - one or more characters that aren't {, captured to group 1
\{ - another literal {
\w+ - one or more word characters ("red")
" " - another literal space
([^}]+) - one or more characters that aren't }, captured to group 2
} - a literal }
The replacement uses $1 and $2 to swap in the capture group contents.

Regex to match optional parameter

I'm trying to write a regex to match an optional parameter at the end of a path.
I want to cover the first 4 paths but not the last one:
/main/sections/create-new
/main/sections/delete
/main/sections/
/main/sections
/main/sectionsextra
So far I've created this:
/\/main\/sections(\/)([a-zA-z]{1}[a-zA-z\-]{0,48}[a-zA-z]{1})?/g
This only finds the first 3. How can I make it match the first 4 cases?
You may match the string in question up to the optional part: a / followed by any 0 or more chars other than /, up to the end of the string:
\/main\/sections(?:\/[^\/]*)?$
                ^^^^^^^^^^^^^^
See the regex demo. If you really need to constrain the optional subpart to only consist of just letters and - with the - not allowed at the start/end (with length of 2+ chars), use
/\/main\/sections(?:\/[a-z][a-z-]{0,48}[a-z])?$/i
Or, to also allow 1 char subpart:
/\/main\/sections(?:\/[a-z](?:[a-z-]{0,48}[a-z])?)?$/i
Details
\/main\/sections - a literal substring /main/sections
(?:\/[^\/]*)? - an optional non-capturing group matching 1 or 0 occurrences of:
\/ - a / char
[^\/]* - a negated character class matching any 0+ chars other than /
$ - end of string.
JS demo:
var strs = ['/main/sections/create-new','/main/sections/delete','/main/sections/','/main/sections','/main/sectionsextra'];
var rx = /\/main\/sections(?:\/[^\/]*)?$/;
for (var s of strs) {
console.log(s, "=>", rx.test(s));
}
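For comparison, here is a sketch that runs the two stricter patterns against a few paths (the extra sample paths and the variable names are mine):

```javascript
const paths = [
  '/main/sections/create-new',
  '/main/sections/a',
  '/main/sections/-new',
  '/main/sections/',
  '/main/sections',
  '/main/sectionsextra',
];

// Subpart of 2+ chars: letters and hyphens, no hyphen at either end.
const twoPlus = /\/main\/sections(?:\/[a-z][a-z-]{0,48}[a-z])?$/i;
// Same, but a single-letter subpart is also allowed.
const onePlus = /\/main\/sections(?:\/[a-z](?:[a-z-]{0,48}[a-z])?)?$/i;

const table = paths.map((p) => ({
  path: p,
  twoPlus: twoPlus.test(p),
  onePlus: onePlus.test(p),
}));
console.log(table);
```

Note that both stricter patterns reject the bare trailing slash in /main/sections/, because the optional group requires at least one letter after the /. If that path must match too, the [^\/]* version is the one to use.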

How to use RegEx to ignore the first period and match all subsequent periods?

For example:
1.23 (no match)
1.23.45 (matches the second period)
1.23.45.56 (matches the second and third periods)
I am trying to limit users from entering invalid numbers. So I will be using this RegEx to replace matches with empty strings.
I currently have /[^.0-9]+/ but it is not enough to disallow . after an (optional) initial .
Constrain the number between the start anchor ^ and the end anchor $, then specify the number pattern you require, such as:
/^\d+\.?\d*$/
This allows 1 or more digits, followed by an optional period, then optional digits.
I suggest using a regex that will match 1+ digits, a period, and then any number of digits and periods capturing these 2 parts into separate groups. Then, inside a replace callback method, remove all periods with an additional replace:
var ss = ['1.23', '1.23.45', '1.23.45.56'];
var rx = /^(\d+\.)([\d.]*)$/;
for (var s of ss) {
var res = s.replace(rx, function($0,$1,$2) {
return $1+$2.replace(/\./g, '');
});
console.log(s, "=>", res);
}
Pattern details:
^ - start of string
(\d+\.) - Group 1 matching 1+ digits and a literal .
([\d.]*) - Group 2 matching zero or more digits or literal dots
$ - end of string.
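If a lookbehind is acceptable (ES2018+), a different technique from the callback above is to match the extra periods directly, which is closer to what the question literally asks for. A sketch, with illustrative names:

```javascript
// Matches a period only when another period appears somewhere before it,
// so the first period in the string is never matched.
const extraPeriods = /(?<=\..*)\./g;

// Hypothetical helper: drop every period except the first.
const keepFirstPeriod = (s) => s.replace(extraPeriods, '');

for (const s of ['1.23', '1.23.45', '1.23.45.56']) {
  console.log(s, '=>', keepFirstPeriod(s));
}
```

This only handles the periods; stripping other invalid characters would still need the existing /[^.0-9]+/ replacement from the question.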

((xn--)?[a-z0-9]+(-[a-z0-9]+)*\.)+[a-z]{2,} works on regexpal.com, but not on jsfiddle.net

So this regular expression, contained in "pattern" below, is only supposed to match what I describe in the comment below (with the most basic match being 1 letter followed by a dot, and then two letters).
var link = "Help"
// matches www-data.it -- needs at least (1 letter + '.' + 2 letters )
var pattern = '((xn--)?[a-z0-9]+(-[a-z0-9]+)*\.)+[a-z]{2,}';
var re2 = new RegExp('^' + pattern, 'i');
// if no http and there is something.something
if (link.search(re2) == 0)
{
link = link;
}
When I test this code at http://regexpal.com/ it works, e.g. only something.something passes.
When I test it at JSFiddle and in production it matches more than it should, e.g. "Help" matches.
http://jsfiddle.net/2jU4D/
what's the deal?
You should construct the regular expression with native regex syntax:
var re2 = /^((xn--)?[a-z0-9]+(-[a-z0-9]+)*\.)+[a-z]{2,}/i;
In particular, the \. in the regular expression will look like just plain . by the time you call new RegExp(). The string grammar also uses backslash for quoting, so the backslash will be "eaten" when the expression is first parsed as a string.
Alternatively:
var pattern = '((xn--)?[a-z0-9]+(-[a-z0-9]+)*\\.)+[a-z]{2,}';
var re2 = new RegExp('^' + pattern, 'i');
Doubling the backslash will leave you with the proper string to pass to the RegExp constructor.
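To see the effect, you can compare the source of the compiled expressions. This is a sketch; String.raw is shown as a third option that keeps the backslash without doubling it.

```javascript
// In a string literal, '\.' is just '.', so the dot becomes a wildcard.
const broken = new RegExp('^' + '((xn--)?[a-z0-9]+(-[a-z0-9]+)*\.)+[a-z]{2,}', 'i');
// Doubling the backslash keeps the escape intact.
const doubled = new RegExp('^' + '((xn--)?[a-z0-9]+(-[a-z0-9]+)*\\.)+[a-z]{2,}', 'i');
// String.raw leaves backslashes untouched, so no doubling is needed.
const raw = new RegExp('^' + String.raw`((xn--)?[a-z0-9]+(-[a-z0-9]+)*\.)+[a-z]{2,}`, 'i');

console.log(broken.source);  // the dot is unescaped here
console.log(doubled.source); // the dot is still escaped here
console.log(broken.test('Help'), doubled.test('Help'), raw.test('Help'));
console.log(doubled.test('www-data.it'));
```

With the unescaped dot, "Help" matches because "H" satisfies [a-z0-9]+, "e" satisfies the wildcard, and "lp" satisfies [a-z]{2,}.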
Here is a breakdown of what it matches. I would replace all the capture groups with non-capturing groups, and put all the anchors in the body of the regex (don't add them later).
The regex itself is valid; I don't know about its delimiters or the way you are using it.
Pay attention to the required parts and you will see it's not matching correctly, but I don't think that is because of the regex.
( # (1 start)
( xn-- )? # (2), optional capture 'xn--'
[a-z0-9]+ # many lower case letters or digits
( - [a-z0-9]+ )* # (3), optional many captures of '-' followed by many lower case letters or digits
\. # a dot '.'
)+ # (1 end), overwrite this capture buffer many times
[a-z]{2,} # Two or more lower case letters
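Applying those two suggestions (non-capturing groups, anchor inside the pattern) as a regex literal could look like this sketch. The name domainLike and the sample inputs are mine.

```javascript
// Same structure as the breakdown above, with (?:...) instead of capture groups
// and the ^ anchor written inside the pattern.
const domainLike = /^(?:(?:xn--)?[a-z0-9]+(?:-[a-z0-9]+)*\.)+[a-z]{2,}/i;

console.log(domainLike.test('www-data.it'));
console.log(domainLike.test('Help'));
// With no capture groups, the match array holds only the full match.
console.log('www-data.it'.match(domainLike).length);
```

Because this is a regex literal, the \. stays a literal dot and "Help" is no longer accepted.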
