Regex for finding link

I need a regex for finding links, with some conditions. Here is the scenario:
I have created utils.ts, a TypeScript file. Basically, it takes an API response as input and returns formatted, HTML-supported text: bold text, emails, images, links, and so on.
Let's take one scenario that I am facing.
As the return value of utils.ts, I am getting this:
https://www.google.com <a href="https://www.google.com">Click here</a>
(Note: plain links and 'a' tag links can occur in any order.)
As you can see, the Click here part of the text above is already in HTML-supported form.
So I get the following output in the GUI:
https://www.google.com Click here
From this point, I want a regex that formats https://www.google.com but does not touch Click here, as it is already formatted.
I also want to format https://www.google.com as follows:
<a href="https://www.google.com">Google</a>
The main problem I am facing is that when I replace strings starting with 'https://..' with tags, it also replaces the links inside 'href', like this:
<a href="https://www.google.com">Google</a> <a href="<a href="https://www.google.com">Google</a>">Click me</a>
Which is not what I want.
Please share your thoughts on this.
Thank you

Links that are not yet formatted can be found using alternation. The idea is that if a link is already formatted, it is not captured in a group (don't be confused that the regex still finds something; you should only look at Group 1). Otherwise, the link is captured in a group.
The regex below is really simple, just to explain the idea. You might want to update it with a better URL search pattern.
(?:href="https?\S+")|(https?\S+)
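In JavaScript, the Group 1 idea maps onto a replacer callback: when the href alternative matches, Group 1 is undefined and the match can be returned unchanged. A minimal sketch using the simple pattern above (with the g flag added); linkifyBareUrls is a hypothetical name:

```javascript
// Wrap bare URLs in <a> tags, leaving URLs inside href="..." untouched.
// The \S+ URL pattern is the simple placeholder from the regex above,
// not a full URL grammar.
function linkifyBareUrls(html) {
  return html.replace(/(?:href="https?\S+")|(https?\S+)/g, function (match, bareUrl) {
    // bareUrl (Group 1) is undefined when the href="..." alternative matched
    if (bareUrl === undefined) return match;
    return '<a href="' + bareUrl + '">' + bareUrl + "</a>";
  });
}
```

As with the pattern itself, a URL used as the visible text of an existing anchor would still be wrapped, so a stricter URL pattern is needed for that case.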

If I understood correctly, you want to extract the web addresses that appear in the text but are not links. If so, check out the following JavaScript:
// the data (the anchor tag is the already-formatted link):
var txt1 = 'https://www.google.com <a href="https://www.google.com">Click here</a> http://other.domain.com';
// strip HTML tags, replacing each tag with a space
function stripHTML(str) {
    var reTag = /<(?:.|\s)*?>/g;
    return str.replace(reTag, " ");
}
var txt2 = stripHTML(txt1);
//console.log(txt2);
// split into whitespace-separated tokens
var tokens = txt2.split(/\s+/);
//console.log(tokens);
// build an address table of tokens that start with http:// or https://
var regex2 = /^https?:\/\/.*/;
var addresses = [];
for (var i = 0; i < tokens.length; i++) {
    if (regex2.test(tokens[i])) {
        addresses.push(tokens[i]);
    }
}
console.log(addresses);

Related

AEM <a href> not working when using JavaScript to concatenate string with currentPage.path

I'm creating a project in Adobe Experience Manager and have run into problems implementing my language switching component. The component is supposed to allow the user to click on a link and have the language of the page change. For example, if they are on the English page /content/myproject/en/home.html and they click it, they are supposed to end up on /content/myproject/fr_ca/home.html.
As part of getting it up and running, I was trying to concatenate currentPage.path and "/profile.html" so that I could at least get the component to register some change to the string in the tag.
From the English home.html page, currentPage.path produces the string "/content/myproject/en/home". Concatenating it with /profile.html should produce the string "/content/myproject/en/home/profile.html" which it does if I use Sightly to do something like <p>${langinfo.goToPage}</p>.
However, if I use that value as the href of an anchor tag, the component shows a blank anchor tag. It also blanks anything I've written between the opening and closing anchor tags.
So far I've tried returning a string I've written out by hand "/content/myproject/en/home/profile.html" as the goToPage value and it works in the anchor tag. Also, if I only return currentPage.path it works. It refuses to work like this if I try to concatenate but it will work like this: <a>It works here!.
The best I can figure at this point is that currentPage.path is a Java String object being accessed by JavaScript, and that there are problems when JS tries to convert it to a JavaScript string with +. It also doesn't work if I try to cast the statement as a string with either String(goToPage) or goToPage.toString(), and it doesn't seem to matter when I do the cast. One blog I looked at seemed to hint that this was a problem with Rhino and that I should call .toString() after the initial concatenation. That didn't work. Another post on Stack Overflow pointed out that concatenating a Java String object in JavaScript could be a problem that should be taken into account, but didn't go into how to deal with the issue.
I realize appending to a string isn't the intended end functionality of my component, but if I can't modify the string by concatenating, it seems I can hardly do a search and replace to change /en/ to /fr-ca/. If anyone has a more elegant solution to my problem than what I'm attempting, that would be appreciated as much as a fix for what I'm working on.
I've pasted my code here (as suggested) and posted screenshots of my code to help.
Javascript:
use(function() {
    var pageLang = currentPage.properties.get("jcr:language", "en");
    var otherLangText;
    var currPage = currentPage.name;
    var currPagePath = currentPage.path;
    var goPage;
    if (pageLang == "fr_ca") {
        otherLangText = "English";
        goPage = "/content/myproject/en/index/home.html";
    } else {
        otherLangText = "Français";
        goPage = "/content/myproject/fr-ca/home/home.html";
    }
    return {
        otherLanguage: otherLangText,
        goToPage: goPage
    };
})
HTML:
<nav data-sly-use.langinfo="langcontact.js">
<ul class="lang-list-container">
<li class="lang-list-item">${langinfo.otherLanguage}</li>
<li class="lang-list-item">Contact</li>
</ul>
</nav>
I'm pretty stumped here. What am I doing wrong?

The line <li class="lang-list-item">${langinfo.otherLanguage}</li>
should actually be -
<li class="lang-list-item">${langinfo.otherLanguage}</li>
What you are trying to do is pass an object to an object, which will not work. If you want to pass the extension to be used in JS, you need to do that in the use call. Refer to the samples in this blog.
Update -
Your code works fine for me as long as the link is valid.
use(function() {
    var pageLang = currentPage.properties.get("jcr:language", "en");
    var otherLangText;
    var currPage = currentPage.name;
    var currPagePath = currentPage.path;
    var goPage;
    if (pageLang == "fr_ca") {
        otherLangText = "English";
        goPage = currPagePath + "/profile.html";
    } else {
        otherLangText = "Français";
        goPage = currPagePath + "/profile.html";
    }
    return {
        otherLanguage: otherLangText,
        goToPage: goPage
    };
})
The only likely reason you are getting an empty href is that your link is not valid, so the link checker removes it. If you check on the author instance, you will see a broken-link symbol next to your link text.
Ideally you should fix the logic so that a proper, valid link is generated. In development you could disable the link checker and/or the link transformer to let all links work (even invalid ones; not recommended). Both services can be found at http://localhost:4502/system/console/configMgr by searching for Day CQ Link Checker Service and Day CQ Link Checker Transformer.
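For the search-and-replace the question ultimately wants (switching the /en/ segment to another language), a plain JavaScript helper is enough. A minimal sketch; switchLanguagePath is a hypothetical name, and wrapping the path in String(...) first is an assumption about how the Use-API exposes Java strings (harmless for ordinary JavaScript strings):

```javascript
// Hypothetical helper: swap the language segment of a content path, e.g.
// "/content/myproject/en/home" -> "/content/myproject/fr_ca/home".
function switchLanguagePath(path, fromLang, toLang) {
  // String(...) turns a Java string from the Use-API into a JavaScript string
  // (an assumption about the Rhino environment; a no-op for JS strings).
  var p = String(path);
  var from = "/" + fromLang + "/";
  var idx = p.indexOf(from);
  if (idx === -1) return p; // segment not found: return the path unchanged
  return p.slice(0, idx) + "/" + toLang + "/" + p.slice(idx + from.length);
}
```

Inside the use() function, goPage = switchLanguagePath(currentPage.path, "en", "fr_ca") + ".html" would turn /content/myproject/en/home into /content/myproject/fr_ca/home.html. As noted above, the target page must exist, or the link checker will still strip the href.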

How to detect links without an anchor element in plain text

If a user enters text in the text box and saves it, and later wants to add some more text, they can edit that text and save it again if required.
When the user first enters text with some links, I detect them and convert any URLs into links that open in a new tab. If the user then clicks Edit, adds more text and links, and saves again, I must ignore the links that are already hyperlinked with an anchor element.
Please help and advise.
For example:
what = "<span>In the task system, is there a way to automatically have any site / page URL or image URL be hyperlinked in a new window?</span><br><br><span>So If I type or copy http://www.stackoverflow.com/ for example anywhere in the description, in any of the internal messages or messages to clients, it automatically is a hyperlink in a new window.</span><br>http://www.stackoverflow.com/<br> <br><span>Or if I input an image URL anywhere in support description, internal messages or messages to cleints, it automatically is a hyperlink in a new window:</span><br> <span>https://static.doubleclick.net/viewad/4327673/1-728x90.jpg</span><br><br>https://static.doubleclick.net/viewad/4327673/1-728x90.jpg<br><br><br><span>This would save us a lot time in task building, reviewing and creating messages.</span>
Test URL's
http://www.stackoverflow.com/
http://stackoverflow.com/
https://stackoverflow.com/
www.stackoverflow.com
//stackoverflow.com/
<a href='http://stackoverflow.com/'>http://stackoverflow.com/</a>";
I've tried this code
function Linkify(what) {
    var str = what, out = "", url, href, where;
    do {
        url = str.match(/((https?:\/\/)?([a-z\-]+\.)*[\-\w]+(\.[a-z]{2,4})+(\/[\w\_\-\?\=\&\.]*)*(?![a-z]))/i);
        if (url != null) {
            // get href value, adding a scheme only if the match has none
            href = url[0];
            if (!/^https?:\/\//i.test(href)) href = "http://" + href;
            // where the match occurred
            where = str.indexOf(url[0]);
            // add the text before the match to the output
            out += str.substr(0, where);
            // link it, opening in a new tab
            out += '<a href="' + href + '" target="_blank">' + url[0] + '</a>';
            // prepare str for next round
            str = str.substr(where + url[0].length);
        } else {
            out += str;
            str = "";
        }
    } while (str.length > 0);
    return out;
}
Please help
Thanks.

Here is a regex that selects all the links that don't have anchors:
(?:(?:http(?:s)?(?:\:\/\/))?(?:www\.)?(?:\w)*(?:\.[a-zA-Z]{2,4}\/?))(?!([\/a-z<\/a>])|(\'|\"))
Here is a RegExFiddle (updated 14:41)
Quite a difficult task, because JavaScript doesn't have a lookbehind ("preceded by") assertion. :) (Lookbehind assertions were later added in ES2018.)
EDIT1: Now it detects...
http://www.abc.xy
http://abc.xy
https://www.abc.xy
https://abc.xy
www.abc.xy
abc.xy
EDIT2:
Here it is, a little shorter, along with the usage fiddle:
Regex
/((http(s)?(\:\/\/))?(www\.)?(\w)*(\.[a-zA-Z]{2,4}\/?))(?!([\/a-z<\/a>])|(\'|\"))/g
function
function Linkify(str) {
    var newStr = str.replace(/((http(s)?(\:\/\/))?(www\.)?(\w)*(\.[a-zA-Z]{2,4}\/?))(?!([\/a-z<\/a>])|(\'|\"))/g, '<a href="$1">$1</a>');
    return newStr;
}
var newData = Linkify(data);
EDIT 1.000.000 :D
/((http(s)?(\:\/\/))?(www\.)?([\w\-\.\/])*(\.[a-zA-Z]{2,3}\/?))(?!(.*a>)|(\'|\"))/g
This solves your problem now.
The only problem you will run into here is that 4 letters after a dot are not selected, e.g. .info. If you want them selected, change {2,3} to {2,4}, BUT be careful... if someone adds text like my name is.john, then is.john will be turned into a link.
EDIT 2.0
If you have really complex URLs, use this:
((http(s)?(\:\/\/))?(www\.)?([\a-zA-Z0-9-_\.\/])*(\.[a-zA-Z]{2,3}\/?))([\a-zA-Z0-9-_\/?=&#])*(?!(.*a>)|(\'|\"))
It matches, for example:
https://stackoverflow.com/questions/34170950/summernote-inserthtml?firstname=channaveer&lastname=hakari#fsdfsdf

A simpler solution is probably to strip the links you created, so the user gets exactly what they typed when they click "Edit" again.
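A minimal sketch of stripping the generated links before editing, assuming the anchors were produced by your own linkify step (plain <a ...>text</a> with no nested anchors); unlinkify is a hypothetical name:

```javascript
// Replace each <a ...>text</a> with just its text, so the edit box shows
// the plain URLs the user originally typed. Case-insensitive; the lazy
// [\s\S]*? stops at the first closing </a>.
function unlinkify(html) {
  return html.replace(/<a\b[^>]*>([\s\S]*?)<\/a>/gi, "$1");
}
```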
Another idea is to split the string at </a>. That gives you a list of strings which all end with an anchor element (except the last one). Iterate over this list, set aside the part from the last <a onwards, and linkify what comes before it.
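The split-at-</a> idea can be sketched like this. linkifyPlain and linkifyOutsideAnchors are hypothetical names, and the bare-URL pattern is deliberately simple (http/https only):

```javascript
// Hypothetical helper: wrap bare http(s) URLs in anchors that open in a new tab.
function linkifyPlain(text) {
  return text.replace(/\bhttps?:\/\/[^\s<]+/g, function (url) {
    return '<a href="' + url + '" target="_blank">' + url + "</a>";
  });
}

// Split at "</a>": every chunk except the last ends with an anchor element,
// so only the text before that chunk's last "<a" is linkified.
function linkifyOutsideAnchors(html) {
  var chunks = html.split("</a>");
  return chunks
    .map(function (chunk, i) {
      if (i === chunks.length - 1) return linkifyPlain(chunk); // no anchor left
      var anchorStart = chunk.lastIndexOf("<a");
      if (anchorStart === -1) return chunk; // stray "</a>": leave the chunk alone
      // linkify the text before the anchor; keep the anchor itself verbatim
      return linkifyPlain(chunk.slice(0, anchorStart)) + chunk.slice(anchorStart);
    })
    .join("</a>");
}
```

Because lastIndexOf("<a") also matches tags such as <abbr>, a stricter check (for example <a followed by whitespace or >) may be needed for real input.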

Find specific results from a text file

Suppose you are reading a text file with JavaScript and jQuery, the server-side developer is unwilling to give you XML or JSON, and you want to parse the file once to get relevant text that you will use later in an autocomplete, like so:
Text file (assume there are many similar listings and there are different DATABASES):
QUERY:1
DATABASE:geoquery
NL:What are the capitals of the states that border the most populated states?
SQL:something
DR:
root(ROOT-0, What-1)
cop(What-1, are-2)
det(capitals-4, the-3)
nsubj(What-1, capitals-4)
det(states-7, the-6)
prep_of(capitals-4, states-7)
nsubj(border-9, states-7)
rcmod(states-7, border-9)
det(states-13, the-10)
advmod(populated-12, most-11)
amod(states-13, populated-12)
dobj(border-9, states-13)
QUERY:2
DATABASE:geoquery
NL:What are the capitals of states bordering New York?
SQL:SELECT state.Capital FROM state JOIN border_info ON state.State_Name
DR:
root(ROOT-0, What-1)
cop(What-1, are-2)
det(capitals-4, the-3)
nsubj(What-1, capitals-4)
prep_of(capitals-4, states-6)
partmod(states-6, bordering-7)
nn(York-9, New-8)
dobj(bordering-7, York-9)
I can use a regex to peel off, say, all the NL: lines, but I first need to pare the file down so that only the NL lines associated with a specific DATABASE get read. So: read the file once, get all matches for the database the user selects from a select element, then make an array of NL values from that list to be the source of an autocomplete.
$(document).ready(function () {
    $.get('inputQueryExamples.txt', function (data) {
        // need code here to read text file first and limit results
        var queryString = data;
        var cleanString = "";
        cleanString = queryString.match(/^NL.*/gm);
        console.log(cleanString);
        $('#what').html(cleanString);
        var nlString = cleanString.map(function (el) { return el.replace('NL:', ''); });
        $('#query-list').autocomplete({
            source: nlString
        });
    }); // end get
});
Thanks for any insight.

Using regex for this is like using duct tape to patch up a severed limb.
Anyway,
by the looks of it, you want to get all of the NL lines that come from a particular database.
You would need a regex match that spans lines (with the s, or dotAll, flag, so that . also matches newlines) and a positive lookbehind for the database name; then you simply match anything after NL:, stopping at the next newline.
Example:
(?<=DATABASE:geoquery).*?(?<=NL:)(.*?)(?=[\r\n])
Online demo:
Regex101 Example
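Used from JavaScript, the lookbehind regex above could feed the autocomplete directly. A minimal sketch, assuming an engine with lookbehind (ES2018) and String.prototype.matchAll (ES2020), and that each record lists its DATABASE: line before its NL: line; nlForDatabase is a hypothetical name, and [\s\S]*? stands in for the s flag:

```javascript
// Collect the NL: questions that belong to one DATABASE.
function nlForDatabase(text, database) {
  // Escape regex metacharacters so the database name is matched literally.
  var db = database.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // (?<=DATABASE:db)(?=\r?\n)  start right after this database's line
  // [\s\S]*?(?<=NL:)           lazily skip ahead to just past the next "NL:"
  // (.*?)(?=\r?\n)             capture the rest of that line
  var re = new RegExp(
    "(?<=DATABASE:" + db + ")(?=\\r?\\n)[\\s\\S]*?(?<=NL:)(.*?)(?=\\r?\\n)",
    "g"
  );
  return Array.from(text.matchAll(re), function (m) { return m[1]; });
}
```

In the $.get callback from the question, $('#query-list').autocomplete({ source: nlForDatabase(data, selectedDatabase) }) would then replace the match/map steps, where selectedDatabase is a hypothetical variable holding the select element's value. The (?=\r?\n) after the database name keeps a name like geo from also matching DATABASE:geoquery.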

Matching a list of possible hashtags in a tweet - Javascript or jQuery

Firstly, I've looked at a lot of posts on Stack Overflow, but I don't see one that seems to be the definitive way. Someone always seems to find a flaw in the regex.
I already have retrieved my tweets and obviously they can contain any number of hashtags in each one.
If I have an array of possible hashtags that I want to find - ["#ENGLAND","#IRELAND","#wales"] etc.
What is a RELIABLE way to check whether a tweet contains these hashtags? I don't want to call the API again; I only want to check my existing tweets. I'm clicking buttons to change the filtering on the fly, and I want to avoid the rate limit if users keep clicking around for ages.
EDIT:
Example: Here is a tweet that contains #ENGLAND and #someothertag
I want to search all the tweets and show only those that CONTAIN one or more of the tags in my array. I already cache the tweets; I don't want to make a call containing any tags, just filter the existing results!

Why only hashify particular hashtags (which you need to specify and then maintain) when you can hashify any hashtag?
I usually use something like this:
var hashregex = /#([a-z0-9_\-]+)/gi;
text = text.replace(hashregex, function (value) {
    return '<a target="_blank" href="http://twitter.com/#!/search?q=' + value.replace('#', '%23') + '">' + value + '</a>';
});
Then you can just use text when you set the content of the processed tweets.

You could store the hashtags from the entities on the element, for instance
<div class='tweet' data-hashtags='england ireland'>some_tweet</div>
And filter like this when someone clicks your button:
$('div.tweet').hide();
$('div.tweet[data-hashtags~="ireland"]').show();
It's obviously greatly simplified, but the general approach should help you avoid having to parse out the tags yourself

// match a #, followed by "england", "ireland", or "wales" (case-insensitive)
var myregexp = /#(england|ireland|wales)\b/i;
var match = myregexp.exec(subject); // subject is the tweet text
var result;
if (match != null) {
    result = match[1]; // "england", "ireland", or "wales", in the case used in the tweet
} else {
    result = "";
}
If you don't know the names of the hashtags beforehand, replace
var myregexp = /#(england|ireland|wales)\b/i;
with
var myregexp = /#(\w+)/; // Use this instead
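Building on the alternation above, the question's array of tags can be turned into one pattern and used to filter the cached tweets without another API call. A minimal sketch, assuming the tweets are plain strings; tweetsWithTags and escapeRegExp are hypothetical names:

```javascript
// Escape regex metacharacters so each tag is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Return the cached tweets (plain strings) that contain at least one tag.
function tweetsWithTags(tweets, tags) {
  // Drop a leading "#" so both "#wales" and "wales" work in the tag list.
  var names = tags.map(function (t) {
    return escapeRegExp(t.replace(/^#/, ""));
  });
  // "#" + one of the names, not followed by another word character, so
  // "#wales" does not match inside "#walesfc". No "g" flag: test() stays stateless.
  var re = new RegExp("#(?:" + names.join("|") + ")(?!\\w)", "i");
  return tweets.filter(function (tweet) {
    return re.test(tweet);
  });
}
```

Leaving out the g flag matters here: a global regex keeps lastIndex between test() calls, which can make filter() skip matching tweets.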

Replace all strings "<" and ">" in a variable with "&lt;" and "&gt;"

I am currently trying to code an input form where you can type and format a text for later use as XML entries. In order to make the HTML code XML-readable, I have to replace the code brackets with the corresponding symbol codes, i.e. < with &lt; and > with &gt;.
The formatted text gets transferred as HTML code with the variable inputtext, so we have for example the text
The <b>Genji</b> and the <b>Heike</b> waged a long and bloody war.
which needs to get converted into
The &lt;b&gt;Genji&lt;/b&gt; and the &lt;b&gt;Heike&lt;/b&gt; waged a long and bloody war.
I tried it with the .replace() function:
inputxml = inputxml.replace("<", "&lt;");
inputxml = inputxml.replace(">", "&gt;");
But this only replaces the first occurrence of each bracket. I'm pretty sure I need some sort of loop for this; I also tried using the each() function from jQuery (a friend recommended I look at the jQuery package), but I'm still new to coding in general and I'm having trouble getting this to work.
How would you code a loop which would replace the code brackets within a variable as described above?
Additional information
You are, of course, right in the assumption that this is part of something larger. I am a graduate student in Japanese studies and currently, I am trying to visualize information about Japanese history in a more accessible way. For this, I am using the Simile Timeline API developed by MIT grad students. You can see a working test of a timeline on my homepage.
The Simile Timeline uses an API based on AJAX and JavaScript. If you don't want to install the AJAX engine on your own server, you can load the timeline API from MIT's server. The data for the timeline is usually provided by one or more XML or JSON files. In my case, I use XML files; you can have a look at the XML structure in this example.
Within the timeline, there are so-called "events" that you can click to reveal additional information in an info bubble popup. The text within those info bubbles comes from the XML source file. Now, if you want to do some HTML formatting within the info bubbles, you cannot use code brackets, because those will just be displayed as plain text. It does work, however, if you use the symbol codes instead of the plain brackets.
The content for the timeline will be written by people absolutely and totally not accustomed to codified markup, i.e. historians, art historians, sociologists, among them several persons of age 50 and older. I have tried to explain to them how they have to format the XML file if they want to create a timeline, but they occasionally slip up and get frustrated when the timeline doesn't load because they forgot to close a bracket or to include an apostrophe.
In order to make it easier, I have tried making an easy-to-use input form where you can enter all the information and format the text WYSIWYG style and then have it converted into XML code which you just have to copy and paste into the XML source file. Most of it works, though I am still struggling with the conversion of the text markup in the main text field.
The conversion of the code brackets into symbol code is the last thing I needed to get working in order to have a working input form.

Look here:
http://www.bradino.com/javascript/string-replace/
Just use these regexes with the g flag to replace all occurrences:
str = str.replace(/\</g, "&lt;") //for <
str = str.replace(/\>/g, "&gt;") //for >
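The same global-flag technique can be wrapped in a helper. A minimal sketch; escapeXmlText is a hypothetical name, and escaping & (first, so the & in the new entities is not escaped a second time) goes beyond the answer above but is needed for well-formed XML text:

```javascript
// Escape text for use inside an XML element with global (g) replacements.
// "&" must be handled first, otherwise "&lt;" would become "&amp;lt;".
function escapeXmlText(str) {
  return str
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}
```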

To store an arbitrary string in XML, use the native XML capabilities of the browser. It will be a hell of a lot simpler that way, plus you will never have to think about the edge cases again (for example attribute values that contain quotes or pointy brackets).
A tip for working with XML: never, ever, ever build XML from strings by concatenation if there is any way to avoid it. You will get yourself into trouble that way. There are APIs to handle XML; use them.
Going from your code, I would suggest the following:
$(function () {
    $("#addbutton").click(function () {
        var eventXml = XmlCreate("<event/>");
        var $event = $(eventXml);
        $event.attr("title", $("#titlefield").val());
        $event.attr("start", [$("#bmonth").val(), $("#bday").val(), $("#byear").val()].join(" "));
        if (parseInt($("#eyear").val()) > 0) {
            $event.attr("end", [$("#emonth").val(), $("#eday").val(), $("#eyear").val()].join(" "));
            $event.attr("isDuration", "true");
        } else {
            $event.attr("isDuration", "false");
        }
        $event.text(tinyMCE.activeEditor.getContent());
        $("#outputtext").val(XmlSerialize(eventXml));
    });
});
// helper function to create an XML DOM Document and return its root element
function XmlCreate(xmlString) {
    var x;
    if (typeof DOMParser === "function") {
        var p = new DOMParser();
        x = p.parseFromString(xmlString, "text/xml");
    } else {
        // fallback for old Internet Explorer
        x = new ActiveXObject("Microsoft.XMLDOM");
        x.async = false;
        x.loadXML(xmlString);
    }
    return x.documentElement;
}
// helper function to turn an XML DOM node into a string
function XmlSerialize(xml) {
    var s;
    if (typeof XMLSerializer === "function") {
        var x = new XMLSerializer();
        s = x.serializeToString(xml);
    } else {
        // fallback for old Internet Explorer
        s = xml.xml;
    }
    return s;
}

https://developer.mozilla.org/en/JavaScript/Reference/Global_Objects/String/replace
You might use a regular expression with the "g" (global match) flag.
var entities = {'<': '&lt;', '>': '&gt;'};
'<inputtext><anotherinputext>'.replace(
    /[<>]/g, function (s) {
        return entities[s];
    }
);

You could also surround your XML entries with the following:
<![CDATA[...]]>
See example:
<xml>
<tag><![CDATA[The <b>Genji</b> and the <b>Heike</b> waged a long and bloody war.]]></tag>
</xml>
Wikipedia Article:
http://en.wikipedia.org/wiki/CDATA
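If the CDATA route is taken programmatically, the one catch is that a CDATA section cannot contain the sequence ]]>. A minimal sketch; toCdata is a hypothetical name, and the split-and-rejoin trick is a common workaround rather than something from the answer:

```javascript
// Wrap arbitrary text in a CDATA section. Any "]]>" in the text is split
// across two adjacent CDATA sections so the first one does not end early.
function toCdata(text) {
  return "<![CDATA[" + text.split("]]>").join("]]]]><![CDATA[>") + "]]>";
}
```

A parser concatenates adjacent CDATA sections, so the text reads back unchanged.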

What you really need, as mentioned in the comments, is to XML-encode the string. If you absolutely want to do this in JavaScript, have a look at the PHP.js function htmlentities.

I created a simple JS function to replace Greater Than and Less Than characters
Here is an example dirty string: < noreply#email.com >
Here is an example cleaned string: [ noreply#email.com ]
function RemoveGLthanChar(notes) {
    var regex = /<[^>](.*?)>/g;
    // match() returns null when there are no <...> blocks, so fall back to []
    var strBlocks = notes.match(regex) || [];
    strBlocks.forEach(function (dirtyBlock) {
        var cleanBlock = dirtyBlock.replace("<", "[").replace(">", "]");
        notes = notes.replace(dirtyBlock, cleanBlock);
    });
    return notes;
}
Call it like this:
$('#form1').submit(function (e) {
    e.preventDefault();
    var dirtyBlock = $("#comments").val();
    var cleanedBlock = RemoveGLthanChar(dirtyBlock);
    $("#comments").val(cleanedBlock);
    this.submit(); // native submit, so this handler does not run again
});
