Split text by urls [duplicate] - javascript

This question already has answers here:
Javascript and regex: split string and keep the separator
(11 answers)
What is the best regular expression to check if a string is a valid URL?
(62 answers)
Closed 3 years ago.
I need to split a given text by the URLs it might contain, while keeping the URLs (the separators) in the resulting array.
For example, splitting this text:
"An example text that contains many links such us
http://www.link1.com, https://www.link2.com/path?param=value, www.link3.com and
link-4.com."
would result in this array:
["An example text that contains many links such us ", "http://www.link1.com", ", ", "https://www.link2.com/path?param=value", ", ", "www.link3.com", " and ", "link-4.com", "."]
I tried to use String.prototype.split() with a regular expression, but it's not working: the result contains unwanted parts of the URLs themselves:
var text = "An example text that contains many links such us http://www.link1.com, https://www.link2.com/path?param=value, www.link3.com and link-4.com.";
console.log(text.split(/((https?:\/\/)|([\w-]{2,}[.])+([\S]{2,})[^\s|,!$\^\*;:{}`()])+/ig));
EDIT
This question is different from the suggested ones: my purpose is not to check whether a URL is valid, but to find a regular expression that can be used with the split method and that splits the text correctly.
As for splitting a text by regex, that is already done in the sample snippet. What the suggested question proposes is more general; what I am looking for is specific to URLs.

It's not ideal, and it would be hard to find or write a perfect regex for this that handles every case, but you can quickly write something like this:
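var text2 = "An example text that contains many links such us http://www.link1.com, https://www.link2.com/path?param=value, www.link3.com and link-4.com.";
var urls = text2
// split() also returns every capture group, including undefined for groups that did not take part
.split(/(^|\s)((https?:\/\/)?[\w-]+(\.[\w-]+)+\.?(:\d+)?(\/\S*)?)/ig)
// drop undefined and empty strings
.filter(Boolean)
// keep only pieces with a dot after the first character; this also drops the plain-text segments
.filter((x)=>{ return x.indexOf('.')>0 });
console.log(urls);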
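If the goal is to keep both the text and the URLs, as in the expected array from the question, one possible variation is to wrap the whole URL in a single capturing group and make every inner group non-capturing, so that split() returns each URL exactly once and no fragments of it. This is only a sketch: the name urlPattern and the loose definition of a "URL" (optional scheme, dot-separated labels, optional port, optional path that stops at whitespace or a comma) are assumptions, and it will also treat things like version numbers or abbreviations containing a dot as URLs.

```javascript
// Sketch: one outer capturing group so split() keeps each URL once;
// all inner groups are non-capturing, so no URL fragments leak into the result.
const text = "An example text that contains many links such us http://www.link1.com, https://www.link2.com/path?param=value, www.link3.com and link-4.com.";

// Assumption: optional scheme, dot-separated labels, optional port,
// optional path that stops at whitespace or a comma.
const urlPattern = /((?:https?:\/\/)?[\w-]+(?:\.[\w-]+)+(?::\d+)?(?:\/[^\s,]*)?)/i;

// filter(Boolean) drops the empty strings produced when the text
// starts or ends with a URL.
const parts = text.split(urlPattern).filter(Boolean);
console.log(parts);
```

On the sample text this should give the nine-element array from the question, with the trailing "." as its own element.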

Related

how to get the whole string matched in regex [duplicate]

This question already has answers here:
How do I find words starting with a specific letter?
(2 answers)
Closed 2 years ago.
I have this code
const paragraph = 'my name is bright and this is a testing interface, right.';
const regex = /\b(b)/g;
const found = paragraph.match(regex);
console.log(found);
What I want is to get the whole word instead of just a single letter.
For example, the output of the code above is b, taken from the word bright in the paragraph. I don't want just the b but the whole word bright, and I still want to be able to manipulate it, e.g. make it bold. How do I do this? I have also checked other similar questions on Stack Overflow but found nothing.
Assuming you only want to match words delimited by spaces, this should do the trick.
((?:\w)+)
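Note that the pattern above matches every word in the paragraph. If only the words starting with b are wanted, as in the question, a minimal sketch could keep the \b anchor and extend the match with \w*. The variable names wordsStartingWithB and highlighted are made up for illustration, and wrapping the word in <b> tags is just one guess at what "make it bolder" means.

```javascript
const paragraph = 'my name is bright and this is a testing interface, right.';

// \b anchors at a word boundary, b is the required first letter,
// \w* consumes the rest of the word.
const wordsStartingWithB = paragraph.match(/\bb\w*/g);
console.log(wordsStartingWithB); // [ 'bright' ]

// $& in the replacement string stands for the whole match,
// so each matched word can be wrapped in markup.
const highlighted = paragraph.replace(/\bb\w*/g, '<b>$&</b>');
console.log(highlighted);
```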

Remove unquoted attribute from string [duplicate]

This question already has answers here:
Parse an HTML string with JS
(15 answers)
Closed 3 years ago.
I have an issue parsing DOM elements when the text contains something like the node below. I want to remove the highlighted text from it using JavaScript. Can you please help me with this? I want to rely on regular expressions for this.
I know how to get quoted attributes using standard string functions and also using a DOM parser.
For nodes like the one below, string functions such as replace and slice may work, but I would need to traverse the entire string, which is a performance issue.
So I want to use regular expressions to find such attributes in a node.
<p class=MsoListParagraphCxSpFirst style='text-indent:-.25in;mso-list:l0 level1 lfo1'>
In the above example I want to remove the class attribute; the class name could be anything. These nodes are generated by MS Word and are not under my control.
EDIT: Following is the pattern I am using to search unquoted text. But it is not working
var pattern = /<p class=\s*=\s*([^" >]+)/im
Regex101 Example
Regex:
\S+?=[^'"]\S*[^'"\s]
The tricky part with this one is finding the end of the unquoted attribute. In this example I'm assuming the value will not contain any whitespace characters, so I can use the first occurrence of whitespace to terminate the match.
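To actually remove the attribute in JavaScript, a small sketch along the same lines might look like the following. It uses a narrower pattern than the one above (it targets only an unquoted class attribute, together with the whitespace before it), and the names node, unquotedClass and cleaned are illustrative. It relies on the same assumption that the unquoted value contains no whitespace, and like any regex applied to HTML it can also hit a lookalike such as class=foo in plain text content.

```javascript
const node = "<p class=MsoListParagraphCxSpFirst style='text-indent:-.25in;mso-list:l0 level1 lfo1'>";

// \s          the whitespace before the attribute (removed along with it)
// class=      the attribute name, matched case-insensitively
// [^\s"'>]+   an unquoted value: runs until whitespace, a quote, or '>'
const unquotedClass = /\sclass=[^\s"'>]+/gi;

const cleaned = node.replace(unquotedClass, "");
console.log(cleaned);

// A quoted class attribute is left untouched, because the value
// would have to start with a quote character.
const quoted = '<p class="keep">'.replace(unquotedClass, "");
console.log(quoted);
```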

"Learning JavaScript" Chapter 17: Regular Expressions...Backreference examples failing [duplicate]

This question already has answers here:
What is a non-capturing group in regular expressions?
(18 answers)
Closed 4 years ago.
I'm currently reading "Learning JavaScript" by Ethan Brown (2016). I'm going through the examples in the Backreferences section and they keep coming up as 'null'. There are two examples.
Example 1: Match names that follow the pattern XYYX.
const promo = "Opening for XAAX is the dynamic GOOG! At the box office now!";
const bands = promo.match(/(?:[A-Z])(?:[A-Z])\2\1/g);
console.log('bands: '+ bands);//output was null
If I understand the text correctly, the result should be...
bands: XAAX, GOOG
Example 2: Matching single and/or double quotation marks.
//we use backticks here because we're using single and
//double quotation marks:
const html = `<img alt='A "simple" example,'>` +
`<img alt="Don't abuse it!">`;
const matches = html.match(/<img alt=(?:['"]).*?\1/g);
console.log('matches: '+ matches);//output was null
Again, if I understand the text correctly, the result should not be 'null'. The text doesn't say exactly what the result should be.
I'm at a loss trying to figure out why when I run this in Node.js it keeps giving me 'null' for these two examples. Anyone have any insight?
The problem is that your group there is
(?:['"])
the ?: indicates that it's a non-capturing group - that means that you can't backreference the group (or get the group in your match result). Use plain parentheses instead to indicate that the group should be captured:
const html = `<img alt='A "simple" example,'>` +
`<img alt="Don't abuse it!">`;
const matches = html.match(/<img alt=(['"]).*?\1/g);
console.log('matches: '+ matches);
Looks like an error in the book.
The regexes in the code snippets use non-capturing groups (see: What is a non-capturing group? What does (?:) do?).
Non-capturing groups cannot be used with backreferences. Use normal parentheses instead:
const promo = "Opening for XAAX is the dynamic GOOG! At the box office now!";
const bands = promo.match(/([A-Z])([A-Z])\2\1/g);
console.log('bands: '+ bands);//output: bands: XAAX,GOOG
The same goes for the other samples...
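For a side-by-side comparison, here is a short sketch that runs the book's pattern, the corrected one, and a named-group variant against the same string. The group names outer and inner are made up for illustration. As far as I know, without the u flag JavaScript falls back to treating \1 and \2 as legacy octal escapes when no such capturing groups exist, which would explain why the original pattern quietly returns null instead of throwing.

```javascript
const promo = "Opening for XAAX is the dynamic GOOG! At the box office now!";

// Non-capturing groups: there is nothing for \2 and \1 to refer to,
// so the pattern does not match any band name.
const nonCapturing = promo.match(/(?:[A-Z])(?:[A-Z])\2\1/g);
console.log(nonCapturing); // null

// Capturing groups: \2 repeats the second letter, \1 the first.
const capturing = promo.match(/([A-Z])([A-Z])\2\1/g);
console.log(capturing); // [ 'XAAX', 'GOOG' ]

// Named groups with \k<name> backreferences make the intent explicit.
const named = promo.match(/(?<outer>[A-Z])(?<inner>[A-Z])\k<inner>\k<outer>/g);
console.log(named); // [ 'XAAX', 'GOOG' ]
```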
Update: I have checked the original source (3rd edition) and can confirm: All samples are wrong and using non-capturing groups.
BTW: The author writes:
Grouping enables another technique called backreferences. In my
experience, this is one of the least used regex features, but there is
one instance where it comes in handy. ...
The only time I think I have ever needed to use backreferences (other
than solving puzzles) is matching quotation marks. In HTML, you can
use either single or double quotes for attribute values.
And then follows the HTML regex sample shown in the OP. Cthulhu is calling?

how to parse values from comma separated values using regex for javascript [duplicate]

This question already has an answer here:
Learning Regular Expressions [closed]
(1 answer)
Closed 5 years ago.
I have
HOSTNAMEA,HOSTNAMEB,HOSTNAMEC,...
I have a third-party workflow tool that can do the looping but can only use regex to parse values. I'd like a regex that grabs each hostname and puts it into its own variable in my workflow tool, so the results will be
HOSTNAMEA
HOSTNAMEB
HOSTNAMEC
...
I'm struggling to get a regex that just grabs the text block X between the commas
Have you heard of \w+? If you just want the strings between the commas, you can use .split(",") as well.
var str = "HOSTNAMEA,HOSTNAMEB,HOSTNAMEC";
var res = str.match(/\w+/g);
console.log(res.join(" "));
The snippet above is sample code to help you get started.
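One caveat, in case real hostnames are not purely alphanumeric: \w does not match hyphens or dots, so \w+ would cut such names into pieces. A sketch using a negated character class, [^,]+ ("everything up to the next comma"), avoids that. The hostnames below are hypothetical and only there to show the difference.

```javascript
// Hypothetical hostnames containing hyphens and dots.
const hosts = "web-01.example.com,db-02.example.com,cache-03";

// \w+ stops at every hyphen and dot, so the names are fragmented.
const fragments = hosts.match(/\w+/g);
console.log(fragments.length); // 10

// [^,]+ takes everything between the commas.
const names = hosts.match(/[^,]+/g);
console.log(names); // [ 'web-01.example.com', 'db-02.example.com', 'cache-03' ]
```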

Text between two dollar signs JavaScript Regex [duplicate]

This question already has answers here:
My regex is matching too much. How do I make it stop? [duplicate]
(5 answers)
Closed 6 years ago.
I'm trying to use RegEx to select all strings between two dollar signs.
text = text.replace(/\$.*\$/g, "meow");
I'm trying to turn all text between two dollar signs into "meow" (placeholder).
EDIT:
Original question changed because the solution was too localized, but the accepted answer is useful information.
That's pretty close to what you want, but it will fail if you have multiple pairs of $text$ in your string. If you make your .* repeater lazy, it will fix that. E.g.,
text = text.replace(/\$.*?\$/g, "meow");
I see one problem: if you have more than one "template", like
aasdasdsadsdsa $a$ dasdasdsd $b$ asdasdasdsa
your regular expression will treat '$a$ dasdasdsd $b$' as a single text between two dollar signs. You can use a negated character class instead, like
/\$[^$]*\$/g
so that $a$ and $b$ are matched as two separate strings in this example.
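To see the three variants next to each other, here is a quick sketch on the sample string from this answer (the variable names are illustrative). The lazy and negated-class patterns give the same result on a single line. They differ once a newline sits between the dollar signs, because . does not match line breaks unless the s flag is set, while [^$] does.

```javascript
const sample = "aasdasdsadsdsa $a$ dasdasdsd $b$ asdasdasdsa";

// Greedy: .* runs to the LAST dollar sign, swallowing both templates.
const greedy = sample.replace(/\$.*\$/g, "meow");
console.log(greedy); // aasdasdsadsdsa meow asdasdasdsa

// Lazy: .*? stops at the NEXT dollar sign.
const lazy = sample.replace(/\$.*?\$/g, "meow");
console.log(lazy); // aasdasdsadsdsa meow dasdasdsd meow asdasdasdsa

// Negated class: [^$]* cannot cross a dollar sign at all.
const negated = sample.replace(/\$[^$]*\$/g, "meow");
console.log(negated); // aasdasdsadsdsa meow dasdasdsd meow asdasdasdsa

// With a line break inside the template, only the negated class matches.
const multiline = "x $a\nb$ y";
const lazyMultiline = multiline.replace(/\$.*?\$/g, "meow");
const negatedMultiline = multiline.replace(/\$[^$]*\$/g, "meow");
console.log(lazyMultiline === multiline); // true (no replacement happened)
console.log(negatedMultiline); // x meow y
```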
