Trying to split string with escaped and non-escaped delimiter - javascript

I have a string with the form of a b/c\/d\/e/f. I'm trying to split the string on the non-escaped forward slashes.
I have this regex so far (?:[^\\/])/. However it consumes the last character preceding the /. So if I'm doing a replace with "#" instead of a split, the string looks like a #c\/d\/#f. In the case of the split, I get the strings separated the same with the last character being consumed.
I tried using a non capturing group but that doesn't seem to do the trick either. Doing this in javascript.

You may use this regex in JS to match every segment that precedes an unescaped /, skipping all escaped slashes, i.e. \/. It also handles the case where the \ itself is escaped as \\.
/[^\\\/]*(?:\\.[^\\\/]*)*(?=\/|$)/gm
const regex = /[^\\\/]*(?:\\.[^\\\/]*)*(?=\/|$)/gm
const str = `\\\\\\\\\\\\/a b/c\\/d\\\\/e\\\\\\/f1/2\\\\\\\\\\\\\\/23/99`;
let m = str.match(regex).filter(Boolean)
console.log(m)
.filter(Boolean) is used to filter out empty matches.
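As a minimal sketch, the regex above can be wrapped in a small helper. The name splitOnUnescapedSlash is made up for illustration and is not part of the answer. One caveat worth keeping in mind: .filter(Boolean) also discards segments that are genuinely empty (as in a//b), which may or may not be what you want.

```javascript
// Hypothetical helper (name chosen for illustration): returns the segments
// between unescaped forward slashes, using the regex from the answer above.
function splitOnUnescapedSlash(str) {
  const regex = /[^\\\/]*(?:\\.[^\\\/]*)*(?=\/|$)/gm;
  // match() returns null when nothing matches, so fall back to an empty array.
  const matches = str.match(regex) || [];
  // Drop the empty matches produced at each delimiter position.
  return matches.filter(Boolean);
}

// Inside a string literal every backslash is doubled, so this is the
// string a b/c\/d\/e/f from the question.
const parts = splitOnUnescapedSlash("a b/c\\/d\\/e/f");
console.log(parts); // three segments: "a b", "c\/d\/e" and "f"
```

The escaped slashes are still present in the second segment; removing the backslashes would be a separate step.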

Related

Regex custom length but no whitespace allowed

I have a username field in my form. I want to not allow spaces anywhere in the string. I have used this regex:
var regexp = /^\S/;
This works for me if there are spaces between the characters. That is if username is ABC DEF. It doesn't work if a space is in the beginning, e.g. <space><space>ABC. What should the regex be?
While you have specified the start anchor and the first letter, you have not done anything for the rest of the string. You seem to want repetition of that character class until the end of the string:
var regexp = /^\S*$/; // a string consisting only of non-whitespaces
Use the + quantifier (match one or more of the preceding item), which also rejects an empty string:
var regexp = /^\S+$/
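A short sketch of the difference between the original pattern and the anchored one. The function name hasNoWhitespace is an assumption made for this example only.

```javascript
// Hypothetical validator name; the pattern is the /^\S+$/ suggested above.
function hasNoWhitespace(username) {
  return /^\S+$/.test(username);
}

// The original /^\S/ only checks the first character.
console.log(/^\S/.test("ABC DEF"));      // true
console.log(hasNoWhitespace("ABC DEF")); // false
console.log(hasNoWhitespace("  ABC"));   // false
console.log(hasNoWhitespace("ABC"));     // true
console.log(hasNoWhitespace(""));        // false (the * version would accept this)
```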
If you're using a plugin that takes a string and passes it to the RegExp constructor (i.e. new RegExp()), then the string below will work:
'^\\S*$'
It's the same regex @Bergi mentioned, just in string form for the RegExp constructor.
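To see why the backslash has to be doubled in the string form, here is a small sketch comparing the three variants. The variable names are illustrative only.

```javascript
const fromLiteral = /^\S*$/;
const fromString = new RegExp('^\\S*$'); // backslash escaped for the string literal
const unescaped = new RegExp('^\S*$');   // '\S' inside a string literal is just 'S'

console.log(fromString.source === fromLiteral.source); // true
console.log(unescaped.source);                         // ^S*$
console.log(unescaped.test("abc"));                    // false: it only accepts runs of "S"
```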
This will help to find the spaces in the beginning, middle and ending:
var regexp = /\s/g
This one will only match the input field or string if there are no spaces. If there are any spaces, it will not match at all.
/^([A-z0-9!@#$%^&*().,<>{}[\]<>?_=+\-|;:\'\"\/])*[^\s]\1*$/
Matches from the beginning of the line to the end. Accepts letters, digits, and most special characters.
If you want just alphanumeric characters then change what is in the [] like so:
/^([A-z])*[^\s]\1*$/
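One thing to be aware of with the [A-z] range used above: it spans character codes 65 to 122, which also covers [, \, ], ^, _ and the backtick. A minimal sketch of the difference, assuming a plain alphanumeric check is what is wanted (the variable names are illustrative):

```javascript
const loose = /^[A-z]+$/;        // range from "A" (65) to "z" (122)
const strict = /^[A-Za-z0-9]+$/; // letters and digits only

console.log(loose.test("a_b"));     // true: "_" sits between "Z" and "a"
console.log(loose.test("a^b"));     // true: so does "^"
console.log(strict.test("a_b"));    // false
console.log(strict.test("abc123")); // true
console.log(strict.test("abc 123")); // false: whitespace is rejected too
```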

How to replace different characters with regex and add conditionals.

Example string: George's - super duper (Computer)
Wanted new string: georges-super-duper-computer
Current regex: .replace(/\s+|'|()/g, '-')
It does not work, and when I remove the spaces and there is already a - in between, I get something like george's---super.
tl;dr Your regex is malformed. Also you can't conditionally remove ' and \s ( ) in a single expression.
Your regex is malformed since ( and ) have special meanings. They are used to form groups so you have to escape them as \( and \). You'll also have to place another pipe | in between them, otherwise you're going to match the literal "()", which is not what you want.
The proper expression would look like this: .replace(/\s+|'|\(|\)/g, '-').
However, this is not what you want. Since this would produce George-s---super-duper--Computer-. I would recommend that you use Character Classes, which will also make your expression easier to read:
.replace(/[\s'()-]+/g, '-')
This matches whitespace, ', (, ) and any additional - one or more times and replaces each run with a single -, yielding George-s-super-duper-Computer-.
This is still not quite right, so have this:
var myString = "George's - super duper (Computer)";
var myOtherString = myString
  // Remove non-whitespace, non-alphanumeric characters from the string (note: ^ inverts the character class)
  // also trim any whitespace from the beginning and end of the string (so we don't end up with hyphens at the start and end of the string)
  .replace(/^\s+|[^\s\w]+|\s+$/g, "")
  // Replace the remaining whitespace with hyphens
  .replace(/\s+/g, "-")
  // Finally make all characters lower case
  .toLowerCase();
console.log(myString, '=>', myOtherString);
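The same steps can be packaged as a reusable function. This is only a sketch: the name slugify is an assumption, and because \w includes the underscore, underscores survive in the result.

```javascript
// Hypothetical helper wrapping the replace chain from the answer above.
function slugify(text) {
  return text
    // Strip punctuation and trim leading/trailing whitespace.
    .replace(/^\s+|[^\s\w]+|\s+$/g, "")
    // Collapse each remaining run of whitespace into one hyphen.
    .replace(/\s+/g, "-")
    .toLowerCase();
}

console.log(slugify("George's - super duper (Computer)")); // georges-super-duper-computer
console.log(slugify("  Hello,   World!  "));               // hello-world
```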
You could use match instead of replace and then join the results with -. You then need a replace to remove the single quotes. The regex would be:
[a-z]+('[a-z]+)*
JS code:
var str = "George's - super duper (Computer)";
console.log(
  // Use a global regex so that every apostrophe is removed, not just the first
  str.match(/[a-z]+('[a-z]+)*/gi).join('-').replace(/'/g, "").toLowerCase()
);

Javascript string split with regex

I am trying to split a string using a regular expression for links (urls).
The regex in question is
var regex = new RegExp('(?:^(?:(?:[a-z]+:)?//)(?:\S+(?::\S*)?@)?(?:localhost|(?:(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)*(?:\.(?:[a-z\u00a1-\uffff]{2,})))(?::\d{2,5})?(?:[/?#]\S*)?$)')
If i do
regex.test("https://google.com"); // returns true
but doing -
"Go to https://google.com".split(regex);
// return ["Go to https://google.com"]
Whereas i expect it to return
["Go to ", "https://google.com"]
Any idea what's going on here?
First of all, you're building your regex from a string literal, which means that you have to escape your backslashes (a backslash has a special meaning in strings, as in \n for the line feed character):
var regex = new RegExp('(?:^(?:(?:[a-z]+:)?//)(?:\\S+(?::\\S*)?@)?(?:localhost|(?:(?:[a-z\\u00a1-\\uffff0-9]-*)*[a-z\\u00a1-\\uffff0-9]+)(?:\\.(?:[a-z\\u00a1-\\uffff0-9]-*)*[a-z\\u00a1-\\uffff0-9]+)*(?:\\.(?:[a-z\\u00a1-\\uffff]{2,})))(?::\\d{2,5})?(?:[/?#]\\S*)?$)');
Another solution is to use a regex literal, which JavaScript supports, but you then have to escape the forward slashes:
var regex = /(?:^(?:(?:[a-z]+:)?\/\/)(?:\S+(?::\S*)?@)?(?:localhost|(?:(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)*(?:\.(?:[a-z\u00a1-\uffff]{2,})))(?::\d{2,5})?(?:[\/?#]\S*)?$)/;
Then, your regex will try to match against the entire input due to the ^ and $ anchors. So if you remove them (or better, replace them with word boundaries \b), you'll be able to find URLs in a string for example:
var regex = /(?:\b(?:(?:[a-z]+:)?\/\/)(?:\S+(?::\S*)?@)?(?:localhost|(?:(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)*(?:\.(?:[a-z\u00a1-\uffff]{2,})))(?::\d{2,5})?(?:[\/?#]\S*)?\b)/;
But the main point is that you're misunderstanding how split works. Given the string "hello world", if you split by a space, you'll end up with ["hello", "world"]: the space is gone, since it was the character used to split.
That is, if you split by the URL regex, the output array won't contain the URLs anymore. It seems to me that a lookahead could suit your needs:
var regex = /(?=(?:\b(?:(?:[a-z]+:)?\/\/)(?:\S+(?::\S*)?@)?(?:localhost|(?:(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)*(?:\.(?:[a-z\u00a1-\uffff]{2,})))(?::\d{2,5})?(?:[\/?#]\S*)?\b))/;
"Go to https://google.com".split(regex) // ["Go to ", "https://google.com"]
The regex explained:
(?=(?:\b(?:(?:[a-z]+:)?//)(?:\S+(?::\S*)?@)?(?:localhost|(?:(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)*(?:\.(?:[a-z\u00a1-\uffff]{2,})))(?::\d{2,5})?(?:[/?#]\S*)?\b))
By splitting a string with a positive lookahead (?=content_of_lookahead), you split at every position between characters that is followed by the content of the lookahead. Because the lookahead consumes nothing, the matched text stays in the output.
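The behaviour is easier to see with a much simpler pattern. The sketch below uses a deliberately simplified URL pattern (https?:// followed by non-whitespace) as a stand-in for the long regex above; it is an assumption for illustration, not a robust URL matcher. It also shows a capturing group, which split includes in its output.

```javascript
// Splitting on a consumed delimiter drops the delimiter.
const words = "hello world".split(" ");

// Splitting on a lookahead keeps the URL, because nothing is consumed.
const byLookahead = "Go to https://google.com".split(/(?=https?:\/\/)/);

// Splitting on a capturing group: the captured text is added to the result.
const byCapture = "Go to https://google.com now".split(/(https?:\/\/\S+)/);

console.log(words);       // ["hello", "world"]
console.log(byLookahead); // ["Go to ", "https://google.com"]
console.log(byCapture);   // ["Go to ", "https://google.com", " now"]
```

The capturing-group form is often the more convenient one when text can follow the URL, since the lookahead form leaves that trailing text attached to the URL element.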
Take a look at 8 Regular Expressions You Should Know.
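To match a URL you can use the following regex. It has to be a regex literal (a plain string passed to split is treated as a literal separator), and match rather than split is what returns the URL:
var regex = /(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w# \.-]*)*\/?$/;
"Go to https://google.com".match(regex)[0];
// returns "https://google.com"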
Hope this helps.

Extracting url path using regexp

Scenario
I want to extract the path string from the document.location, excluding the leading slash.
So for example, if the url is:
http://stackoverflow.com/questions/ask
I would get:
questions/ask
This should be straightforward:
/* group everything after the leading slash */
var re = /\/(.+)/gi;
var matches = document.location.pathname.match(re);
console.log(matches[0]);
But if I run this snippet in the Firebug console, I still get the leading slash.
I have already tested the regexp, and the regexp engine correctly extracts the group.
Question
How to properly get the group 1 string?
You don't really need regular expressions if you just want to get pathname without leading slash. Since location.pathname always starts with / you can simply take the substring from the first index:
document.location.pathname.slice(1) // or .substring(1)
Do you mean the trailing or the leading slash? From your post it looks like the leading slash:
document.location.pathname.replace(/^\//,"")
By the way, your regexp is right; you just need to remove the gi flags and read matches[1] rather than matches[0]. With the g flag, match returns only the full matches, without the capture groups. Without it, matches[0] is the whole string matched by the regexp, while matches[1] is the part captured by the parentheses.
var matches = document.location.pathname.match(/\/(.+)/);
console.log(matches); // ["/questions/ask", "questions/ask"]
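A sketch that can be run outside the browser: it uses the URL class to stand in for document.location, which is an assumption made so the example does not depend on a page being loaded.

```javascript
// Stand-in for document.location.pathname.
const pathname = new URL("http://stackoverflow.com/questions/ask").pathname;

// With the g flag, match() returns only the full matches.
const globalMatch = pathname.match(/\/(.+)/g);
// Without it, index 0 is the full match and index 1 is the first group.
const plainMatch = pathname.match(/\/(.+)/);

console.log(globalMatch);       // ["/questions/ask"]
console.log(plainMatch[0]);     // "/questions/ask"
console.log(plainMatch[1]);     // "questions/ask"
console.log(pathname.slice(1)); // "questions/ask", no regex needed
```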
Using regex you can do:
var m = 'http://stackoverflow.com/questions/ask'.match(/\/{2}[^\/]+(\/.+)/);
console.log(m[1]); // "/questions/ask"

Split string by regex in JavaScript

I'm trying to split a string by the following array of characters:
"!", "%", "$", "#"
I thought about using regex, so I developed the following method which I thought would split the string by the characters:
var splitted = string.split(/\!|%|\$|#*/);
However, when I run the following code, the output is split by every character, not what I was hoping for:
var toSplit = "abc%123!def$456#ghi";
var splittedArray = toSplit.split(/\!|%|\$|#*/);
How could I make it so that splittedArray contains the following elements?
"abc", "123", "def", "456", "ghi"
Any help appreciated.
#* matches the empty string and there's an empty string between any two characters, so the string is split at every single character. Use + instead:
/\!|%|\$|#+/
Also if you meant the + to apply to every character and not just # then group them up:
/(\!|%|\$|#)+/
Or better yet, use a character class. This lets you omit the backslashes since none of these characters are special inside square brackets.
/[!%$#]+/
Use the following:
var splittedArray = toSplit.split(/[!%$#]+/);
Your current code will split between every character because #* will match empty strings. I am assuming since you used #* that you want to consider consecutive characters a single delimiter, which is why the + is at the end of the regex. This will only match one or more characters, so it will not match empty strings.
The [...] syntax is a character class, which is like alternation with the | character except that it only works for single characters, so [!%$#] will match either !, %, $, or #. Inside of the character class the escaping rules change a little bit, so you can just use $ instead of \$.
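A short sketch putting the variants side by side, using the string from the question plus two small inputs chosen for illustration:

```javascript
const toSplit = "abc%123!def$456#ghi";

// Character class with +: one or more delimiters act as a single separator.
const fixed = toSplit.split(/[!%$#]+/);

// The original pattern: #* matches the empty string between any two characters.
const everyChar = "ab%c".split(/\!|%|\$|#*/);

// Without the +, consecutive delimiters produce an empty element.
const withEmpty = "abc%%123".split(/[!%$#]/);
const collapsed = "abc%%123".split(/[!%$#]+/);

console.log(fixed);     // ["abc", "123", "def", "456", "ghi"]
console.log(everyChar); // ["a", "b", "c"]
console.log(withEmpty); // ["abc", "", "123"]
console.log(collapsed); // ["abc", "123"]
```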
