JavaScript Regex - Splitting a string into an array by a regex pattern

Given an input field, I'm trying to use a regex to find all the URLs in the text and turn them into links, while retaining all of the other text.
So for example, I have an input of "http://google.com hello this is my content" -> I want to split it by the white space AFTER this regex pattern from another Stack Overflow question (regexp = /(ftp|http|https)://(\w+:{0,1}\w*@)?(\S+)(:[0-9]+)?(/|/([\w#!:.?+=&%@!-/]))?/) so that I end up with an array of ['http://google.com', 'hello this is my content'].
Another example: "hello this is my content http://yahoo.com testing testing http://google.com" -> an array of ['hello this is my content', 'http://yahoo.com', 'testing testing', 'http://google.com']
How can this be done? Any help is much appreciated!

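First, turn all the groups in your regular expression into non-capturing groups ((?:...)), then wrap the whole regular expression in a single capturing group and use it to split the string. When the separator contains a capturing group, split() includes the captured text (here, each URL) in the resulting array: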
var regex = /((?:ftp|http|https):\/\/(?:\w+:{0,1}\w*@)?(?:\S+)(?::[0-9]+)?(?:\/|\/(?:[\w#!:.?+=&%@!-/]))?)/;
var result = str.split(regex);
Example:
var str = "hello this is my content http://yahoo.com testing testing http://google.com";
var regex = /((?:ftp|http|https):\/\/(?:\w+:{0,1}\w*@)?(?:\S+)(?::[0-9]+)?(?:\/|\/(?:[\w#!:.?+=&%@!-/]))?)/;
var result = str.split(regex);
console.log(result);
// ["hello this is my content ", "http://yahoo.com", " testing testing ", "http://google.com", ""]
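When the separator regex has a capturing group, split() puts the captured URLs into the result, but the text pieces keep the spaces that surrounded each URL, and a URL at either end of the string produces an empty string. To get exactly the arrays asked for in the question, the pieces can be trimmed and empty ones dropped. A minimal sketch, assuming a much simpler URL pattern than the one above (scheme, ://, then any run of non-whitespace) purely for readability; splitOnUrls is a hypothetical helper name:

```javascript
// Assumption: simplified URL pattern (scheme + "://" + run of non-whitespace).
// The capturing group is what makes split() include the URLs in its result.
const urlSplitter = /((?:ftp|https?):\/\/\S+)/;

// Hypothetical helper: split text into plain-text pieces and URL pieces, in order.
function splitOnUrls(text) {
  return text
    .split(urlSplitter)               // text, URL, text, URL, ... (text keeps its spaces)
    .map((piece) => piece.trim())     // remove the spaces that surrounded each URL
    .filter((piece) => piece !== ""); // drop empty pieces left at the ends
}

console.log(splitOnUrls("http://google.com hello this is my content"));
// [ 'http://google.com', 'hello this is my content' ]
console.log(splitOnUrls("hello this is my content http://yahoo.com testing testing http://google.com"));
// [ 'hello this is my content', 'http://yahoo.com', 'testing testing', 'http://google.com' ]
```

Swap the full pattern back into urlSplitter (keeping the outer capturing group) if you need its stricter matching.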

Your RegExp had a few unescaped forward slashes (inside a regex literal, / must be written as \/).
var str = "hello this is my content http://yahoo.com testing testing http://google.com";
var captured = str.match(/(ftp|http|https):\/\/(\w+:{0,1}\w*@)?(\S+)(:[0-9]+)?(\/|\/([\w#!:.?+=&%@!-/]))?/g) || [];
// keep every space-separated word that is not one of the captured URLs
var nonCaptured = str.split(' ').filter(v => captured.indexOf(v) === -1);
console.log(nonCaptured, captured);

Related

JavaScript regex for patterns starting with special characters [duplicate]

Hi all, I tried to create a regex with a random value.
var data = "demo purpose?"; // OR: var data = "demo purpose";
var sentence = "can I put these app as demo purpose?";
var re = new RegExp("\\b(" + data + ")\\b", "g");
console.log(sentence.match(re)); // output ["demo purpose"]
The variable data has two different values, demo purpose? and demo purpose, which differ only by the question mark. Both console outputs are the same. Can anyone give me a hint about what I should do in this case?
Thank you
You need to escape ? (i.e. write \\? in the string), or else it is interpreted as a quantifier.
Furthermore, the trailing \\b is not just unnecessary but harmful: \b only matches between a word character and a non-word character, and after the ? at the end of the sentence there is no word character, so sentence.match(new RegExp("\\b(demo purpose\\?)\\b", "g")) would return null.
If you want randomness, use Math.random: make an array of the candidate values and use a random integer (0 or 1, obtained with Math.floor) as the index.
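A minimal sketch of that suggestion, assuming the two strings from the question are the only candidates; options and pickRandom are hypothetical names. The picked value still needs its ? escaped before it goes into new RegExp(...).

```javascript
// Candidate values taken from the question.
const options = ["demo purpose?", "demo purpose"];

// Math.random() returns a float in [0, 1), so Math.floor(Math.random() * n)
// is an integer index from 0 to n - 1 (here 0 or 1).
function pickRandom(items) {
  return items[Math.floor(Math.random() * items.length)];
}

const data = pickRandom(options);
console.log(data); // either "demo purpose?" or "demo purpose"
```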
In order to pass variables into JS regex when using constructor notation, you need to escape all characters that act as regex special characters (quantifiers, group delimiters, etc.).
The following escape function is given on the MDN website:
function escapeRegExp(string){
return string.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}
Note also that \b is a word boundary, and it prevents the strings you need from matching because ? is a non-word character. If you do not need to match word boundaries, remove \b. If you need to check that the search word is a whole word, use (?:^|\W) at the beginning and (?!\w) at the end, and use exec rather than match to get access to captured group 1 (the (?:^|\W) part consumes the preceding character, so that character is part of the full match).
So, your code will become:
function escapeRegExp(string){
return string.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}
var data = "demo purpose?";
var sentence = "can I put these app as demo purpose?";
var re = new RegExp("(?:^|\\W)(" + escapeRegExp(data) + ")(?!\\w)", "g");
var m;
while ((m = re.exec(sentence)) !== null) {
console.log(m[1]); // logs: demo purpose?
}
If you search for emo purpose?, nothing is returned, because the character before emo is d, a word character, so (?:^|\W) cannot match there.
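On engines that support ECMAScript 2018 lookbehind (current browsers, Node 10+), the (?:^|\W) prefix can be replaced by (?<!\w), which checks the preceding character without consuming it. Each match is then exactly the search text, so plain match() works and no capture group or exec loop is needed. A sketch under that assumption; findWholePhrase is a hypothetical helper name:

```javascript
// Escape function as given on MDN (quoted in the answer above).
function escapeRegExp(string) {
  return string.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper. Requires ES2018 lookbehind support.
// (?<!\w): no word character right before the phrase (or start of string).
// (?!\w):  no word character right after it (or end of string).
// Neither assertion consumes characters, so each match is exactly the phrase.
function findWholePhrase(sentence, phrase) {
  const re = new RegExp("(?<!\\w)" + escapeRegExp(phrase) + "(?!\\w)", "g");
  return sentence.match(re) || [];
}

const sentence = "can I put these app as demo purpose?";
console.log(findWholePhrase(sentence, "demo purpose?")); // [ 'demo purpose?' ]
console.log(findWholePhrase(sentence, "demo purpose"));  // [ 'demo purpose' ]
console.log(findWholePhrase(sentence, "emo purpose?"));  // []
```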
This version:
var data = "demo purpose?"; // OR: var data = "demo purpose";
var sentence = "can I put these app as demo purpose?";
var re = new RegExp(/demo purpose\?/, "g");
console.log(sentence.match(re)); // output ["demo purpose?"]
returns ["demo purpose?"], because the pattern is passed as a regex literal, RegExp(/xxx/, "g"), instead of a string, RegExp("xxx", "g"). In a regex literal \? needs only one backslash, whereas in a string "\?" is just "?".
You can also do:
var data = /demo purpose\?/; // OR: var data = "demo purpose";
var sentence = "can I put these app as demo purpose?";
var re = new RegExp(data, "g");
console.log(sentence.match(re));
and you will get the same output
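To make the difference concrete: in a string passed to new RegExp, the backslash itself has to be escaped ("\\?"), while a regex literal needs only \?. Since ES2015, new RegExp(regex, flags) also accepts an existing RegExp and replaces its flags with the ones given. A small sketch using the question's sentence:

```javascript
// Regex literal: one backslash escapes "?". It has the i flag only.
const literal = /demo purpose\?/i;

// Same pattern from a string: the backslash must itself be escaped.
const fromString = new RegExp("demo purpose\\?", "g");

// ES2015+: a RegExp plus a flags argument gives a copy whose flags are replaced ("i" -> "g").
const fromRegExp = new RegExp(literal, "g");

const sentence = "can I put these app as demo purpose?";
console.log(sentence.match(fromString)); // [ 'demo purpose?' ]
console.log(sentence.match(fromRegExp)); // [ 'demo purpose?' ]
console.log(fromRegExp.flags);           // g
console.log(fromRegExp.source === fromString.source); // true
```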

Regex that allows a pattern to start with an optional, specific character, but no other character

How can I write a regex that allows a pattern to start with a specific character, but that character is optional?
For example, I would like to match all instances of the word "hello" where "hello" is either at the very start of the line or preceded by an "!", in which case it does not have to be at the start of the line. So the first three options here should match, but not the last:
hello
!hello
some other text !hello more text
ahello
I'm specifically interested in JavaScript.
Match it with: /^hello|!hello/gm
The ^ only grabs the word "hello" if it's at the beginning of a line (this needs the m flag; without it, ^ only matches at the very start of the whole string).
The | works as an OR.
var str = "hello\n!hello\n\nsome other text !hello more text\nahello";
var regex = /^hello|!hello/gm;
console.log( str.match(regex) );
Edit:
If you're trying to match the whole line beginning with "hello" or containing "!hello" as suggested in the comment below, then use the following regex:
/^.*(^hello|!hello).*$/gm
var str = "hello\n!hello\n\nsome other text !hello more text\nahello";
var regex = /^.*(^hello|!hello).*$/gm;
console.log(str.match(regex));
Final solution (hopefully)
It looks like catching all the groups of a global match (String.prototype.matchAll) is only available from ECMAScript 2020.
As a workaround I've found the following solution:
const str = `hello
!hello
some other text !hello more text
ahello
this is a test hello !hello
JvdV is saying hello
helloing or helloed =).`;
function collectGroups(regExp, str) {
const groups = [];
str.replace(regExp, (fullMatch, group1, group2) => {
groups.push(group1 || group2);
});
return groups;
}
const regex = /^(hello\b)|!(hello\b)/gm; // m: ^ matches at the start of every line
const groups = collectGroups(regex, str);
console.log(groups);
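On engines with ECMAScript 2020 support (matchAll and ??, available in current browsers and Node 14+), String.prototype.matchAll returns every match of a global regex together with its capture groups, so the replace() trick is no longer needed. A sketch under that assumption, reusing part of the sample text; the m flag makes ^ match at the start of every line rather than only at the start of the string:

```javascript
// Sample text from the answer above (shortened).
const text = `hello
!hello
some other text !hello more text
ahello
helloing or helloed =).`;

// m flag: ^ matches at the start of every line, not just the start of the text.
// \b after hello rejects words like "helloing".
// Group 1 is filled for a line-initial hello, group 2 for "!hello".
const regex = /^(hello\b)|!(hello\b)/gm;

// matchAll requires the g flag and yields one match object (with groups) per hit.
const hits = [...text.matchAll(regex)].map((m) => m[1] ?? m[2]);
console.log(hits); // [ 'hello', 'hello', 'hello' ]
```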
/(?=!)?(\bhello\b)/g should do it.
Example:
const regexp = /(?=!)?(\bhello\b)/g;
const str = `
hello
!hello
some other text !hello more text
ahello
`;
const found = str.match(regexp)
console.log(found)
Explanation:
(?=!)?
(?=!) positive lookahead for !
? makes the lookahead optional (since (?=!) can never succeed directly before the h of hello, in practice the pattern behaves like /\bhello\b/g)
(\bhello\b): capturing group
\b word boundary ensures that hello is not preceded or followed by a word character
Note: if you also want to make sure that hello is not followed by !, you could simply add a negative lookahead like so: /(?=!)?(\bhello\b)(?!!)/g.
Update
Thanks to the hint from @JvdV in the comments, I've adapted the regex, which should now meet your requirements:
/(^hello\b)|(?:!)(hello\b)/gm
Playground: https://regex101.com/r/CXXPHK/4 (The explanation can be found on the page as well).
Update 2:
Note that the non-capturing group (?:!) still consumes the !; it is only left out of the captured groups, not out of the match. So with the g flag, str.match(...) returns full matches like ["hello", "!hello", "!hello", "!hello"], with the ! included. Here is a workaround:
const regex = /(^hello\b)|(?:!)(hello\b)/gm;
const found = (str.match(regex) || []).map(m => m.replace(/^!/, ''));
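Since ECMAScript 2018, JavaScript also supports lookbehind, which solves the "! is included" problem directly: a lookbehind checks what precedes the match without making it part of the match. A sketch assuming a lookbehind-capable engine (current browsers, Node 10+); with the m flag, ^ inside the lookbehind means "start of any line":

```javascript
const text = `hello
!hello
some other text !hello more text
ahello
this is a test hello !hello
helloing or helloed =).`;

// (?<=^|!): preceded by the start of a line (m flag) or by "!".
// The lookbehind is not part of the match, so there is no "!" to strip afterwards.
// \b rejects "helloing" and "helloed".
const regex = /(?<=^|!)hello\b/gm;

console.log(text.match(regex)); // [ 'hello', 'hello', 'hello', 'hello' ]
```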

Javascript find and replace string with variable values

I have an Angular app, but one page needs to be pre-rendered with no JavaScript (for printing and PDF), and some of its content is loaded with Angular variable inputs such as {{field(10)}}.
I pre-load the content but need a way to find and replace the string so that:
{{field(10)}}
is changed into the value of this
submission.inputs[10]
So for example:
var string = 'This is some content: {{field[10]}}';
submission.inputs[10] = 'replace value';
I want the new string to be this
var newString = 'This is some content: replace value';
This is running in Node, so it has the latest version of JavaScript.
I tried this:
var newString = string.replace(/\{{(.+?)}}/g, submission.inputs[$1]);
But I don't think my syntax is correct.
Your current regex extracts all the text contained within {{}} with its capture group. But you only want the index of the replacement, which is contained within the [], and not the entire string itself. So you have two options:
Modify regex to capture only the index, so that would look like /{{field\[(.+?)\]}}/, where the capture group now only takes the number within the brackets.
Leave the original regex alone, but change the replace function to extract the number from the returned match. In this case you'll have a second regex (or some other method) to extract the number from the matched string (in this case, get "10" out of "field[10]").
Here's an example demonstrating both:
var string = 'This is some content: {{field[10]}}';
var submission = {inputs: []};
submission.inputs[10] = 'replace value';
// I want the new string to be this
// var newString = 'This is some content: replace value';
var newString = string.replace(/{{field\[(.+?)\]}}/g, (match, cap1) => submission.inputs[cap1]);
console.log(newString)
// OR:
var otherNewString = string.replace(/\{{(.+?)}}/g, (match, cap1) => submission.inputs[cap1.match(/\[(.+?)\]/)[1]]);
console.log(otherNewString)
You can use the following regex to extract the contents between {{field[ and ]}} as the snippet below shows. The snippet uses a callback in the replace function and passes the captured group's value to it so that an appropriate value may be returned (submission.inputs[b] where b is the number you want: 10 in this case).
{{[^[]+\[([^\]]+)]}}
{{ Match this literally
[^[]+ Match any character except [ one or more times
\[ Match [ literally
([^\]]+) Capture any character except ] one or more times into capture group 1. This is the value you want
]}} Match this literally
var string = 'This is some content: {{field[10]}}'
var submission = {inputs:[]}
submission.inputs[10] = 'replace value'
var newString = string.replace(/{{[^[]+\[([^\]]+)]}}/g, function(a, b) { return submission.inputs[b] })
console.log(newString)
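Both answers can be wrapped in a reusable function. The sketch below (fillFields is a hypothetical name, and submission is mocked with the question's value) only accepts digits as the index, and it leaves a placeholder untouched when there is no value for it, instead of inserting the string "undefined" as a plain inputs[cap1] lookup would:

```javascript
// Mock of the question's data (assumption: inputs is an array indexed by field number).
const submission = { inputs: [] };
submission.inputs[10] = "replace value";

// Hypothetical helper: replace each {{field[N]}} with inputs[N].
// \d+ restricts the index to digits; unknown indexes keep their placeholder.
function fillFields(template, inputs) {
  return template.replace(/\{\{field\[(\d+)\]\}\}/g, (placeholder, index) =>
    inputs[index] !== undefined ? inputs[index] : placeholder
  );
}

console.log(fillFields("This is some content: {{field[10]}}", submission.inputs));
// This is some content: replace value
console.log(fillFields("{{field[10]}} and {{field[11]}}", submission.inputs));
// replace value and {{field[11]}}
```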

Regex check both side of match but not include in match string

I want to get matches by checking both sides of the main match, without including those sides in the match.
var str = "1234 word !!! 5678 another *** 000more)))"; // get word and another
console.log(str.match(/(?!\d+\s?)\w+(?=\s?\W+)/g))
>> (3) ["word", "another", "more"]
It checks both sides but does not include them in the main match.
But with HTML it is not working:
var str = ''; // get url, url2 and url3
console.log(str.match(/(?!href=")[^"]+?(?=")/g))
>> (6) ["<a href=", "url3"]
I tried a negative lookahead (?!href=") and a positive lookahead (?=") to match only the attribute value, but it returns more than the attribute values.
Is there any way to do this? Thanks
What you could do for your example data is capture what is between the double quotes, href="([^"]+), in a capturing group and loop through the results:
var str = '';
var pattern = /href="([^"]+)/g;
var match = pattern.exec(str);
while (match != null) {
console.log(match[1]);
match = pattern.exec(str);
}
In other flavors of regex you could have used e.g. a positive lookbehind ((?<=href=")), but at the time JavaScript regex did not support lookbehinds. (They were added in ECMAScript 2018 and are supported by current browsers and Node.)
A reasonable solution is:
Match href=" as "ordinary" content, to be ignored.
Match the attribute value as a capturing group ((\w+)), to be "consumed".
Set the boundary of the above group with a positive lookahead ((?=")), just as you did.
So the whole regex can be:
href="(\w+)(?=")
and read "your" value from group 1. (\w+ only matches letters, digits and _, so for real URLs use [^"]+ instead.)
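On current engines the original goal can be reached either way. A sketch, assuming a modern browser or Node; the html string is a hypothetical stand-in because the question's HTML did not survive. The lookbehind variant needs ES2018, the capturing-group variant uses matchAll from ES2020, and both use [^"]+ rather than \w+ so that values containing / or . are matched in full:

```javascript
// Hypothetical stand-in for the question's HTML, which was lost.
const html = '<a href="url">one</a> <a href="url2">two</a> <a href="url3">three</a>';

// 1) Lookbehind (ES2018): require href=" right before the match without consuming it.
//    No lookahead is needed, because [^"]+ already stops at the closing quote.
const urls = html.match(/(?<=href=")[^"]+/g) || [];
console.log(urls); // [ 'url', 'url2', 'url3' ]

// 2) No lookaround: capture the value and collect group 1 of every match (matchAll, ES2020).
const urlsFromGroups = [...html.matchAll(/href="([^"]+)"/g)].map((m) => m[1]);
console.log(urlsFromGroups); // [ 'url', 'url2', 'url3' ]
```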
You can't parse HTML with regex, because HTML can't be parsed by regex.
Have you tried using the DOM parser that's right at your fingertips?
var str = '';
var div = document.createElement('div');
div.innerHTML = str; // parsing magic!
var links = Array.from(div.getElementsByTagName("a"));
var urls = links.map(function(a) {return a.href;});
// above returns fully-resolved absolute URLs.
// for the literal attribute value, try a.getAttribute("href")
console.log(urls);

Javascript regex to bring back all symbol matches?

I need a javascript regex object that brings back any matches of symbols in a string,
take for example the following string:
input = !"£$[]{}%^&*:#\~#';/.,<>\|¬`
then the following code:
input.match(regExObj,"g");
would return an array of matches:
[[,!,",£,$,%,^,&,*,:,#,~,#,',;,/,.,,,<,>,\,|,¬,`,]]
I have tried the following with no luck.
match(/[U+0021-U+0027]/g);
and I cannot use the following, because I need to allow non-ASCII characters, for example Chinese characters:
[^0-9a-zA-Z\s]
var re = /[!"\[\]{}%^&*:#~#';/.<>\\|`]/g;
var matches = [];
var someString = "aejih!\"£$[]{}%^&*:#\~#';/.,<>\\|¬`oejtoj%";
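var match;
while ((match = re.exec(someString)) !== null) {
matches.push(match[0]); // the pattern has no capturing group, so use the full match
}
This gives:
['!', '"', '[', ']', '{', '}', '%', '^', '&', '*', ':', '#', '~', '#', "'", ';', '/', '.', '<', '>', '\\', '|', '`', '%']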
What about
/[!"£$\[\]{}%^&*:#\\~#';\/.,<>|¬`]/g
?
