How to Extract URL from the Text with javascript [duplicate]


Does anyone have suggestions for detecting URLs in a set of strings?
arrayOfStrings.forEach(function(string){
// detect URLs in strings and do something swell,
// like creating elements with links.
});
Update (several years later): I wound up using this regex for link detection:
kLINK_DETECTION_REGEX = /(([a-z]+:\/\/)?(([a-z0-9\-]+\.)+([a-z]{2}|aero|arpa|biz|com|coop|edu|gov|info|int|jobs|mil|museum|name|nato|net|org|pro|travel|local|internal))(:[0-9]{1,5})?(\/[a-z0-9_\-\.~]+)*(\/([a-z0-9_\-\.]*)(\?[a-z0-9+_\-\.%=&]*)?)?(#[a-zA-Z0-9!$&'()*+.=-_~:#/?]*)?)(\s+|$)/gi
The full helper (with optional Handlebars support) is at gist #1654670.

First you need a good regex that matches URLs. This is hard to do. As one discussion of the problem puts it:
...almost anything is a valid URL. There are some punctuation rules for splitting it up. Absent any punctuation, you still have a valid URL.
Check the RFC carefully and see if you can construct an "invalid" URL. The rules are very flexible.
For example, ::::: is a valid URL. The path is ":::::". A pretty stupid filename, but a valid filename.
Also, ///// is a valid URL. The netloc ("hostname") is "". The path is "///". Again, stupid. Also valid. This URL normalizes to "///", which is the equivalent.
Something like "bad://///worse/////" is perfectly valid. Dumb but valid.
Anyway, this answer is not meant to give you the best regex, but rather a demonstration of how to wrap URLs inside the text with JavaScript.
OK, so let's just use this one: /(https?:\/\/[^\s]+)/g
Again, this is a bad regex. It will have many false positives. However it's good enough for this example.
function urlify(text) {
var urlRegex = /(https?:\/\/[^\s]+)/g;
return text.replace(urlRegex, function(url) {
return '<a href="' + url + '">' + url + '</a>';
});
// or alternatively
// return text.replace(urlRegex, '<a href="$1">$1</a>')
}
var text = 'Find me at http://www.example.com and also at http://stackoverflow.com';
var html = urlify(text);
console.log(html);
// html now looks like:
// "Find me at <a href="http://www.example.com">http://www.example.com</a> and also at <a href="http://stackoverflow.com">http://stackoverflow.com</a>"
So, in sum, try:
$$('#pad dl dd').each(function(element) {
element.innerHTML = urlify(element.innerHTML);
});
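One caveat: if the input is plain text (for example user input) rather than existing HTML, urlify leaves any markup in it intact, so it is interpreted once the result goes into innerHTML. A minimal sketch that escapes the text first and then applies the same simple regex; escapeHtml and urlifySafe are my own hypothetical names, not from the answer:

```javascript
// Hypothetical helper: escape the characters that are special in HTML text
// and in double-quoted attribute values.
function escapeHtml(s) {
  return s
    .replace(/&/g, '&amp;') // must run first so the other entities are not double-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Same deliberately simple pattern as urlify: "http(s)://" followed by non-whitespace.
function urlifySafe(text) {
  const urlRegex = /(https?:\/\/[^\s]+)/g;
  return escapeHtml(text).replace(urlRegex, '<a href="$1">$1</a>');
}

console.log(urlifySafe('Find me at http://www.example.com & <b>here</b>'));
```

Do not use it on element.innerHTML as in the loop above, since that text is already HTML and would be escaped twice.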

Here is what I ended up using as my regex:
var urlRegex =/(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig;
This doesn't include trailing punctuation in the URL. Crescent's function works like a charm :)
so:
function linkify(text) {
var urlRegex =/(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig;
return text.replace(urlRegex, function(url) {
return '<a href="' + url + '">' + url + '</a>';
});
}
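To see the "no trailing punctuation" behaviour of this regex, here is a small sketch that collects the matches instead of wrapping them (extractUrls is a hypothetical name). The last character class omits . , ; : ! ? so the engine backtracks off a trailing full stop or comma:

```javascript
// Same pattern as linkify above; returns every match, or [] when there is none.
function extractUrls(text) {
  const urlRegex = /(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig;
  return text.match(urlRegex) || [];
}

console.log(extractUrls('Visit http://example.com/a. Or ftp://files.example.org/x, thanks!'));
```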

I googled this problem for quite a while, then it occurred to me that there is an Android method, android.text.util.Linkify, that utilizes some pretty robust regexes to accomplish this. Luckily, Android is open source.
They use a few different patterns for matching different types of URLs. You can find them all here:
http://grepcode.com/file/repository.grepcode.com/java/ext/com.google.android/android/2.0_r1/android/text/util/Regex.java#Regex.0WEB_URL_PATTERN
If you're just concerned about URLs that match the WEB_URL_PATTERN, that is, URLs that conform to the RFC 1738 spec, you can use this:
/((?:(http|https|Http|Https|rtsp|Rtsp):\/\/(?:(?:[a-zA-Z0-9\$\-\_\.\+\!\*\'\(\)\,\;\?\&\=]|(?:\%[a-fA-F0-9]{2})){1,64}(?:\:(?:[a-zA-Z0-9\$\-\_\.\+\!\*\'\(\)\,\;\?\&\=]|(?:\%[a-fA-F0-9]{2})){1,25})?\@)?)?((?:(?:[a-zA-Z0-9][a-zA-Z0-9\-]{0,64}\.)+(?:(?:aero|arpa|asia|a[cdefgilmnoqrstuwxz])|(?:biz|b[abdefghijmnorstvwyz])|(?:cat|com|coop|c[acdfghiklmnoruvxyz])|d[ejkmoz]|(?:edu|e[cegrstu])|f[ijkmor]|(?:gov|g[abdefghilmnpqrstuwy])|h[kmnrtu]|(?:info|int|i[delmnoqrst])|(?:jobs|j[emop])|k[eghimnrwyz]|l[abcikrstuvy]|(?:mil|mobi|museum|m[acdghklmnopqrstuvwxyz])|(?:name|net|n[acefgilopruz])|(?:org|om)|(?:pro|p[aefghklmnrstwy])|qa|r[eouw]|s[abcdeghijklmnortuvyz]|(?:tel|travel|t[cdfghjklmnoprtvwz])|u[agkmsyz]|v[aceginu]|w[fs]|y[etu]|z[amw]))|(?:(?:25[0-5]|2[0-4][0-9]|[0-1][0-9]{2}|[1-9][0-9]|[1-9])\.(?:25[0-5]|2[0-4][0-9]|[0-1][0-9]{2}|[1-9][0-9]|[1-9]|0)\.(?:25[0-5]|2[0-4][0-9]|[0-1][0-9]{2}|[1-9][0-9]|[1-9]|0)\.(?:25[0-5]|2[0-4][0-9]|[0-1][0-9]{2}|[1-9][0-9]|[0-9])))(?:\:\d{1,5})?)(\/(?:(?:[a-zA-Z0-9\;\/\?\:\@\&\=\#\~\-\.\+\!\*\'\(\)\,\_])|(?:\%[a-fA-F0-9]{2}))*)?(?:\b|$)/gi;
Here is the full text of the source:
"((?:(http|https|Http|Https|rtsp|Rtsp):\\/\\/(?:(?:[a-zA-Z0-9\\$\\-\\_\\.\\+\\!\\*\\'\\(\\)"
+ "\\,\\;\\?\\&\\=]|(?:\\%[a-fA-F0-9]{2})){1,64}(?:\\:(?:[a-zA-Z0-9\\$\\-\\_"
+ "\\.\\+\\!\\*\\'\\(\\)\\,\\;\\?\\&\\=]|(?:\\%[a-fA-F0-9]{2})){1,25})?\\@)?)?"
+ "((?:(?:[a-zA-Z0-9][a-zA-Z0-9\\-]{0,64}\\.)+" // named host
+ "(?:" // plus top level domain
+ "(?:aero|arpa|asia|a[cdefgilmnoqrstuwxz])"
+ "|(?:biz|b[abdefghijmnorstvwyz])"
+ "|(?:cat|com|coop|c[acdfghiklmnoruvxyz])"
+ "|d[ejkmoz]"
+ "|(?:edu|e[cegrstu])"
+ "|f[ijkmor]"
+ "|(?:gov|g[abdefghilmnpqrstuwy])"
+ "|h[kmnrtu]"
+ "|(?:info|int|i[delmnoqrst])"
+ "|(?:jobs|j[emop])"
+ "|k[eghimnrwyz]"
+ "|l[abcikrstuvy]"
+ "|(?:mil|mobi|museum|m[acdghklmnopqrstuvwxyz])"
+ "|(?:name|net|n[acefgilopruz])"
+ "|(?:org|om)"
+ "|(?:pro|p[aefghklmnrstwy])"
+ "|qa"
+ "|r[eouw]"
+ "|s[abcdeghijklmnortuvyz]"
+ "|(?:tel|travel|t[cdfghjklmnoprtvwz])"
+ "|u[agkmsyz]"
+ "|v[aceginu]"
+ "|w[fs]"
+ "|y[etu]"
+ "|z[amw]))"
+ "|(?:(?:25[0-5]|2[0-4]" // or ip address
+ "[0-9]|[0-1][0-9]{2}|[1-9][0-9]|[1-9])\\.(?:25[0-5]|2[0-4][0-9]"
+ "|[0-1][0-9]{2}|[1-9][0-9]|[1-9]|0)\\.(?:25[0-5]|2[0-4][0-9]|[0-1]"
+ "[0-9]{2}|[1-9][0-9]|[1-9]|0)\\.(?:25[0-5]|2[0-4][0-9]|[0-1][0-9]{2}"
+ "|[1-9][0-9]|[0-9])))"
+ "(?:\\:\\d{1,5})?)" // plus option port number
+ "(\\/(?:(?:[a-zA-Z0-9\\;\\/\\?\\:\\@\\&\\=\\#\\~" // plus option query params
+ "\\-\\.\\+\\!\\*\\'\\(\\)\\,\\_])|(?:\\%[a-fA-F0-9]{2}))*)?"
+ "(?:\\b|$)";
If you want to be really fancy, you can test for email addresses as well. The regex for email addresses is:
/[a-zA-Z0-9+._%-]{1,256}@[a-zA-Z0-9][a-zA-Z0-9-]{0,64}(\.[a-zA-Z0-9][a-zA-Z0-9-]{0,25})+/gi
PS: The top-level domains supported by the above regex are current as of June 2007. For an up-to-date list you'll need to check https://data.iana.org/TLD/tlds-alpha-by-domain.txt.
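As a JavaScript regex literal, the email pattern takes single backslashes and a literal @ (the doubled backslashes belong to the Java string source). A minimal sketch of using it to collect addresses from text; extractEmails is a hypothetical helper, and the pattern keeps the Android regex's limits:

```javascript
// Local part, "@", then a host label followed by one or more ".label" parts.
const emailRegex = /[a-zA-Z0-9+._%-]{1,256}@[a-zA-Z0-9][a-zA-Z0-9-]{0,64}(\.[a-zA-Z0-9][a-zA-Z0-9-]{0,25})+/gi;

// String.prototype.match with a g regex returns all matches (and resets lastIndex).
function extractEmails(text) {
  return text.match(emailRegex) || [];
}

console.log(extractEmails('Mail john.doe+x@example.co.uk or visit example.com'));
```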

Based on Crescent Fresh's answer: if you want to detect links with or without http://, i.e. also links that start with www., you can use the following:
function urlify(text) {
var urlRegex = /(((https?:\/\/)|(www\.))[^\s]+)/g;
//var urlRegex = /(https?:\/\/[^\s]+)/g;
return text.replace(urlRegex, function(url,b,c) {
var url2 = (c == 'www.') ? 'http://' +url : url;
return '<a href="' + url2 + '" target="_blank">' + url + '</a>';
});
}

This library on NPM looks like it is pretty comprehensive https://www.npmjs.com/package/linkifyjs
Linkify is a small yet comprehensive JavaScript plugin for finding URLs in plain-text and converting them to HTML links. It works with all valid URLs and email addresses.

The function can be further improved to render images as well:
function renderHTML(text) {
var rawText = strip(text);
var urlRegex =/(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig;
return rawText.replace(urlRegex, function(url) {
if ( ( url.indexOf(".jpg") > 0 ) || ( url.indexOf(".png") > 0 ) || ( url.indexOf(".gif") > 0 ) ) {
return '<img src="' + url + '">' + '<br/>'
} else {
return '<a href="' + url + '">' + url + '</a>' + '<br/>'
}
})
}
or for a thumbnail image that links to the full-size image:
return '<a href="' + url + '"><img style="width: 100px; border: 0px; -moz-border-radius: 5px; border-radius: 5px;" src="' + url + '">' + '</a>' + '<br/>'
And here is the strip() function, which pre-processes the text string for uniformity by removing any existing HTML.
function strip(html)
{
var tmp = document.createElement("DIV");
tmp.innerHTML = html;
var urlRegex =/(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig;
return tmp.innerText.replace(urlRegex, function(url) {
return '\n' + url
})
}
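The indexOf(".jpg") > 0 checks above also fire when the extension appears elsewhere in the URL, for example in a host name such as example.jpg.example.com. A sketch that only looks at the end of the URL's path, ignoring the query string and hash; isImageUrl and renderUrl are hypothetical names and the extension list is an assumption:

```javascript
// True when the path of an absolute URL ends in .jpg/.jpeg/.png/.gif (case-insensitive).
function isImageUrl(href) {
  let pathname;
  try {
    pathname = new URL(href).pathname; // drops ?query and #hash
  } catch (e) {
    return false; // not an absolute URL
  }
  return /\.(?:jpe?g|png|gif)$/i.test(pathname);
}

// Same output shape as renderHTML's callback.
function renderUrl(url) {
  return isImageUrl(url)
    ? '<img src="' + url + '"><br/>'
    : '<a href="' + url + '">' + url + '</a><br/>';
}

console.log(renderUrl('http://example.com/pics/cat.PNG?w=100'));
```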

There is an existing npm package, url-regex. Just install it with yarn add url-regex or npm install url-regex and use it as follows:
const urlRegex = require('url-regex');
const replaced = 'Find me at http://www.example.com and also at http://stackoverflow.com or at google.com'
.replace(urlRegex({strict: false}), function(url) {
return '<a href="' + url + '">' + url + '</a>';
});

let str = 'https://example.com is a great site';
str = str.replace(/(https?:\/\/[^\s]+)/g,"<a href='$1' target='_blank' >$1</a>");
Short code, big work!
Result:
<a href='https://example.com' target='_blank' >https://example.com</a> is a great site

If you want to detect links with http:// OR without http:// OR ftp OR other possible cases like removing trailing punctuation at the end, take a look at this code.
https://jsfiddle.net/AndrewKang/xtfjn8g3/
The simplest way to use it is to install it from npm:
npm install --save url-knife

Detect URLs in text and make them clickable.
const detectURLInText = ( contentElement ) => {
const elem = document.querySelector(contentElement);
elem.innerHTML = elem.innerHTML.replace(/(https?:\/\/[^\s]+)/g, `<a class='link' href="$1">$1</a>`)
return elem
}
detectURLInText( '#myContent');
<div id="myContent">
Hello world! Detect URLs in text and make them clickable.
IP: https://123.0.1.890:8080
Web: https://any-domain.com
</div>

Try this:
function isUrl(s) {
if (!isUrl.rx_url) {
// taken from https://gist.github.com/dperini/729294
isUrl.rx_url=/^(?:(?:https?|ftp):\/\/)?(?:\S+(?::\S*)?@)?(?:(?!(?:10|127)(?:\.\d{1,3}){3})(?!(?:169\.254|192\.168)(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)*(?:\.(?:[a-z\u00a1-\uffff]{2,}))\.?)(?::\d{2,5})?(?:[/?#]\S*)?$/i;
// valid prefixes
isUrl.prefixes=['http:\/\/', 'https:\/\/', 'ftp:\/\/', 'www.'];
// taken from https://w3techs.com/technologies/overview/top_level_domain/all
isUrl.domains=['com','ru','net','org','de','jp','uk','br','pl','in','it','fr','au','info','nl','ir','cn','es','cz','kr','ua','ca','eu','biz','za','gr','co','ro','se','tw','mx','vn','tr','ch','hu','at','be','dk','tv','me','ar','no','us','sk','xyz','fi','id','cl','by','nz','il','ie','pt','kz','io','my','lt','hk','cc','sg','edu','pk','su','bg','th','top','lv','hr','pe','club','rs','ae','az','si','ph','pro','ng','tk','ee','asia','mobi'];
}
if (!isUrl.rx_url.test(s)) return false;
for (let i=0; i<isUrl.prefixes.length; i++) if (s.startsWith(isUrl.prefixes[i])) return true;
for (let i=0; i<isUrl.domains.length; i++) if (s.endsWith('.'+isUrl.domains[i]) || s.includes('.'+isUrl.domains[i]+'\/') ||s.includes('.'+isUrl.domains[i]+'?')) return true;
return false;
}
function isEmail(s) {
if (!isEmail.rx_email) {
// taken from http://stackoverflow.com/a/16016476/460084
var sQtext = '[^\\x0d\\x22\\x5c\\x80-\\xff]';
var sDtext = '[^\\x0d\\x5b-\\x5d\\x80-\\xff]';
var sAtom = '[^\\x00-\\x20\\x22\\x28\\x29\\x2c\\x2e\\x3a-\\x3c\\x3e\\x40\\x5b-\\x5d\\x7f-\\xff]+';
var sQuotedPair = '\\x5c[\\x00-\\x7f]';
var sDomainLiteral = '\\x5b(' + sDtext + '|' + sQuotedPair + ')*\\x5d';
var sQuotedString = '\\x22(' + sQtext + '|' + sQuotedPair + ')*\\x22';
var sDomain_ref = sAtom;
var sSubDomain = '(' + sDomain_ref + '|' + sDomainLiteral + ')';
var sWord = '(' + sAtom + '|' + sQuotedString + ')';
var sDomain = sSubDomain + '(\\x2e' + sSubDomain + ')*';
var sLocalPart = sWord + '(\\x2e' + sWord + ')*';
var sAddrSpec = sLocalPart + '\\x40' + sDomain; // complete RFC822 email address spec
var sValidEmail = '^' + sAddrSpec + '$'; // as whole string
isEmail.rx_email = new RegExp(sValidEmail);
}
return isEmail.rx_email.test(s);
}
This will also recognize URLs such as google.com, http://www.google.bla, http://google.bla and www.google.bla, but not google.bla.

Generic Object Oriented Solution
For people like me who use frameworks like Angular that don't allow manipulating the DOM directly, I created a function that takes a string and returns an array of url/plainText objects that can be used to create any UI representation you want.
URL regex
For URL matching I used (slightly adapted) h0mayun's regex: /(?:(?:https?:\/\/)|(?:www\.))[^\s]+/g
My function also drops punctuation characters such as . and , from the end of a URL, because I believe they will more often be actual punctuation than a legitimate URL ending (but they could be! This is not rigorous science, as other answers explain well). For that I apply the following regex to each matched URL: /^(.+?)([.,?!'"]*)$/.
TypeScript code
export function urlMatcherInText(inputString: string): UrlMatcherResult[] {
if (! inputString) return [];
const results: UrlMatcherResult[] = [];
function addText(text: string) {
if (! text) return;
const result = new UrlMatcherResult();
result.type = 'text';
result.value = text;
results.push(result);
}
function addUrl(url: string) {
if (! url) return;
const result = new UrlMatcherResult();
result.type = 'url';
result.value = url;
results.push(result);
}
const findUrlRegex = /(?:(?:https?:\/\/)|(?:www\.))[^\s]+/g;
const cleanUrlRegex = /^(.+?)([.,?!'"]*)$/;
let match: RegExpExecArray | null;
let indexOfStartOfString = 0;
do {
match = findUrlRegex.exec(inputString);
if (match) {
const text = inputString.substr(indexOfStartOfString, match.index - indexOfStartOfString);
addText(text);
const dirtyUrl = match[0];
// cleanUrlRegex always matches a non-empty, whitespace-free string, so this is never null
const urlDirtyMatch = cleanUrlRegex.exec(dirtyUrl)!;
addUrl(urlDirtyMatch[1]);
addText(urlDirtyMatch[2]);
indexOfStartOfString = match.index + dirtyUrl.length;
}
}
while (match);
const remainingText = inputString.substr(indexOfStartOfString, inputString.length - indexOfStartOfString);
addText(remainingText);
return results;
}
export class UrlMatcherResult {
public type!: 'url' | 'text';
public value!: string;
}
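In plain JavaScript the same idea can be sketched more compactly with String.prototype.matchAll, which yields each match together with its index, so the manual exec loop is not needed. This is a port under the same assumptions (same two regexes, same url/text result shape), not the author's code; splitTextAndUrls is a hypothetical name:

```javascript
// Split text into ordered { type: 'text' | 'url', value } parts.
function splitTextAndUrls(input) {
  const findUrl = /(?:https?:\/\/|www\.)[^\s]+/g; // matchAll requires the g flag
  const trailing = /^(.+?)([.,?!'"]*)$/;           // URL body + trailing punctuation
  const parts = [];
  let last = 0;
  for (const m of input.matchAll(findUrl)) {
    if (m.index > last) parts.push({ type: 'text', value: input.slice(last, m.index) });
    const [, url, punct] = trailing.exec(m[0]);
    parts.push({ type: 'url', value: url });
    if (punct) parts.push({ type: 'text', value: punct });
    last = m.index + m[0].length;
  }
  if (last < input.length) parts.push({ type: 'text', value: input.slice(last) });
  return parts;
}

console.log(splitTextAndUrls('Go to www.example.com, now'));
```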

Here is a small solution for a React app that doesn't use any library. Please note that this method only works if the URL is not attached to any other characters, because the text is split on spaces.
This component returns a paragraph with link detection!
import React from "react";
interface Props {
paragraph: string,
}
const REGEX = /^(http:\/\/www\.|https:\/\/www\.|http:\/\/|https:\/\/)?[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?$/gm;
const Paragraph: React.FC<Props> = ({ paragraph }) => {
const paragraphArray = paragraph.split(' ');
return <div>
{
paragraphArray.map((word: string, index: number) => {
return word.match(REGEX) ? (
<React.Fragment key={index}>
<a href={word}>{word}</a>{' '}
</React.Fragment>
) : word + ' '
})
}
</div>;
};
export default Paragraph;

In some browsers tmp.innerText is undefined. You should use tmp.innerHTML instead:
function strip(html)
{
var tmp = document.createElement("DIV");
tmp.innerHTML = html;
var urlRegex =/(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig;
return tmp.innerHTML.replace(urlRegex, function(url) {
return '\n' + url
})
}

You can use a regex like this to extract common URL patterns:
(https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|www\.[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9]+\.[^\s]{2,}|www\.[a-zA-Z0-9]+\.[^\s]{2,})
If you need more sophisticated patterns, use a library like this one:
https://www.npmjs.com/package/pattern-dreamer
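Used with the g flag and String.prototype.match, that pattern returns every URL-like substring at once. A minimal sketch; findUrls is a hypothetical name:

```javascript
// The pattern from the answer, as a regex literal with the g flag.
const urlPattern = /(https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|www\.[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9]+\.[^\s]{2,}|www\.[a-zA-Z0-9]+\.[^\s]{2,})/g;

function findUrls(text) {
  return text.match(urlPattern) || [];
}

console.log(findUrls('Visit www.example.com or https://foo.org/x today'));
```

Like the other simple patterns here, [^\s]{2,} also swallows trailing punctuation, so "www.example.com," would come back with the comma attached.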

Related

How to check for Forward Slash within this Regex for all special characters?

I am trying to find a regex that checks whether a string matches all of these conditions and also allows / (forward slashes).
Current code:
var specialChars = /^[a-zA-Z0-9!@#\$%\^\&*\)\(+=._-]+$/g;
This returns true if the string looks like this: 4!@#$.
However, it does not work if the string looks like this: 5/6/2019
This is how I'm implementing the check: I have a function that takes in a long string, and I'm trying to pluck out the tracking ID and then create a link out of it.
My test cases are also in the demo, the date test is the one that fails, since the linkCreator function ends up linking to the date:
https://jsfiddle.net/cojuevp5/
var linkCreator = function(value) {
var strings = value.split(' ');
var aHref = '<a href="http://www.google.com/search?q=';
var targetBlank = '" target="_blank" style="text-decoration: underline">';
var trackingString = strings.reduce(function(prevVal, currVal, idx) {
var specialChars = /^[a-zA-Z0-9!@#\$%\^\&*\)\(+=._-]+$/g;
// Does val start with number and not contain special characters including /
var link = currVal.match(/^\d/) && !currVal.match(specialChars) ?
aHref + currVal + targetBlank + currVal + '</a>' :
currVal;
return idx == 0 ? link : prevVal + ' ' + link;
}, '');
console.log(trackingString);
return trackingString;
}
const case1 = '434663008870'
const case2 = '4S4663008870'
const case3 = '4S4663008870 PS'
const case4 = 'SHD FX 462367757727 PS'
const case5 = 'SHD FX 429970755485, R'
const case6 = 'SHD HEADER TRACKING PS'
const case7 = 'N/A'
const case8 = 'AF SHD FX 462367757727 PS'
const case9 = '4/7/2019'
const case10 = '4!@#$%^&'
const value = case9
const link = linkCreator(value)
console.log(link)

You might want to add a \/ and that would likely solve your problem:
^([a-zA-Z0-9!\/@#$%^&*)(+=._-]+)$
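A quick way to check the amended character class against strings from the question. This sketch uses a-zA-Z rather than A-z (the A-z range also admits [ \ ] ^ _ and the backtick) and drops the g flag, because test() on a g regex remembers lastIndex between calls and can give alternating results for the same input; allowedChars is a hypothetical name:

```javascript
// Letters, digits, the listed symbols, and "/" (escaped for readability).
const allowedChars = /^[a-zA-Z0-9!\/@#$%^&*)(+=._-]+$/;

for (const s of ['5/6/2019', '4!@#$', '434663008870', 'N A']) {
  console.log(s, allowedChars.test(s));
}
```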
Just like Barmar says, you do not need to escape every character inside [].
I'm guessing that this may be what you want to match.
You could also use an online regex tool to design and test any expression you wish.

add to URL after last /

Using jQuery, how can I add something to a URL after the last /?
For example, add sale to:
/gender/category/brand/
so it becomes:
/gender/category/brand/sale
However, due to the way the URLs are generated and built, I can't always just add it to the end of the URL, as there are sometimes ?query strings on the end, for example:
/gender/category/brand/?collection=short&colour=red
I just can't figure out how to add sale after the final / and always before a ?query string if one exists.
Searching through Stack Overflow I've seen some bits about extracting content after the last /, but not this. Is this possible? I would really appreciate help getting this sorted.
EDIT - The solution
Thanks to all for your help, but Shree's answer was the easiest to adapt, and this did what I needed:
if (window.location.href.indexOf("sale") === -1) {
var raw = window.location.href;
var add = 'sale';
var rest = raw.substring(0, raw.lastIndexOf("/") + 1);
var last = raw.substring(raw.lastIndexOf("/") + 1, raw.length);
var newUrl = rest + add + last;
window.location.href = newUrl;
}

Use substring with lastIndexOf.
var raw = '/gender/category/brand/?collection=short&colour=red';
var add = 'sale';
var rest = raw.substring(0, raw.lastIndexOf("/") + 1);
var last = raw.substring(raw.lastIndexOf("/") + 1, raw.length);
var newUrl = rest + add + last;
console.log(newUrl);

In vanilla JavaScript:
var a = "/gender/category/brand/?collection=short&colour=red";
var lastIndexPosition = a.lastIndexOf('/');
a = a.substring(0,lastIndexPosition+1)
+"sale"
+a.substring(lastIndexPosition+1 , a.length);
console.log(a);

By using a reusable function in JavaScript:
You can use lastIndexOf to get the index of the last '/' and insert your new data there.
The lastIndexOf() method returns the position of the last occurrence of a specified value in a string.
Because the string, the text to insert and the position are all parameters, the function is reusable.
function insert(main_string, ins_string, pos) {
return main_string.slice(0, pos) + ins_string + main_string.slice(pos);
}
var url = "/gender/category/brand/?collection=short&colour=red"
url = insert(url, 'sale', url.lastIndexOf("/") + 1)
console.log(url)

An alternative is to use .split("?") to separate the URL at the ? and then combine the parts back, e.g.:
// Example with querystring
var url = '/gender/category/brand/?collection=short&colour=red'
var parts = url.split("?");
var newurl = parts[0] + "sale" + "?" + (parts[1]||"")
console.log(newurl)
// Test without querystring
var url = '/gender/category/brand/'
var parts = url.split("?");
var newurl = parts[0] + "sale" + (parts[1]||"")
console.log(newurl)
The (parts[1]||"") handles the case where there isn't a querystring.
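The lastIndexOf approaches look for the last / in the whole string, so a query string that itself contains a slash (such as ?next=/home) would break them. A sketch that works on the path only, using the standard URL API; the helper name and the placeholder base URL (needed to parse relative paths) are assumptions:

```javascript
// Insert `segment` right after the last "/" of the path,
// leaving the ?query string and #hash exactly as they were.
function addSegmentAfterLastSlash(href, segment) {
  // Placeholder origin so relative paths can be parsed; it is dropped again below.
  const url = new URL(href, 'http://placeholder.invalid');
  const path = url.pathname;
  const cut = path.lastIndexOf('/') + 1;
  url.pathname = path.slice(0, cut) + segment + path.slice(cut);
  return url.pathname + url.search + url.hash;
}

console.log(addSegmentAfterLastSlash('/gender/category/brand/?collection=short&colour=red', 'sale'));
```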

Convert String in the Parenthesis in the URL To Query Parameters: Javascript

I want to convert the string in {} in search URL to query parameters which would help users capture search terms in web analytics tools.
Here's what I am trying to do. Let's say the search URL is:
example.com/search/newyork-gyms?dev=desktop&id=1220391131
User input will be:
var search_url_format = '/search/{city}-{service}'
Output URL:
example.com/search?city=newyork&service=gyms&dev=desktop&id=1220391131
The problem is that when I use the regex {(.*)} it captures the whole string {city}-{service}.
But what I want is [{city}, {service}].
The URL format can also be like
search/{city}/{service}/
search/{city}_{service}/
What I have tried works for a single variable and returns the correct output.
E.g. URL: /search/newyork
User Input: /search/{city}
Output: /search/newyork?city=newyork
URL: /search-germany
User Input: /search-{country}
Output: /search-germany?country=germany
var search_url_format = '/search/{city}' //User Enters any variable in brackets
var urloutput = '/search/newyork' //Demo URL
//Log output
console.log(URL2Query(search_url_format)) //Output: '/search/newyork?city=newyork'
function URL2Query(url) {
var variableReg = new RegExp(/{(.*)}/)
var string_url = variableReg.exec(url)
var variable1 = string_url[0]
//Capture the variable
var reg = new RegExp(url.replace(variable1, '([^\?|\/|&]+)'))
var search_string = reg.exec(urloutput)[1]
if (location.search.length === 0) // no query parameters yet, so start the query with "?"
{
return urloutput + "?" + string_url[1] + "=" + search_string
} else {
return urloutput + "&" + string_url[1] + "=" + search_string
}
}

You are missing two things: parentheses to capture the groups, and the fact that .* is greedy and also matches "}" and "{", so it runs from the first { to the last }.
So you can use match instead of exec, like this:
var search_url_format = '/search/{city}-{service}' // user enters any variables in braces
var variableReg = new RegExp(/({\w+})/g)
var string_url = search_url_format.match(variableReg); // ["{city}", "{service}"]

You can probably assume your "variable" will be alphanumeric instead of any character. With this assumption, "{", "-", "_" etc. will be treated as punctuation.
So your grouping regexp could be /({\w+})/g.
//example
const r = /({\w+})/g;
let variable;
const url = '/search/{city}-{service}';
while ((variable = r.exec(url)) !== null) {
let msg = 'Found ' + variable[0] + '. ';
msg += 'Next match starts at ' + r.lastIndex;
console.log(msg);
}
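Putting the pieces together, here is a sketch of the whole conversion: split the format on {name} tokens, turn each token into a capture group, match the path, and move the captured values into the query string with URLSearchParams. The helper name, the placeholder base URL, and the assumption that values contain no /, ?, #, _ or - are all mine. Unlike the desired output in the question, it keeps the original path and only adds parameters, as the single-variable version does:

```javascript
// '/search/{city}-{service}' + '/search/newyork-gyms?dev=desktop'
//   -> '/search/newyork-gyms?city=newyork&service=gyms&dev=desktop'
function formatToQuery(format, path) {
  const names = [];
  const pattern = format
    .split(/(\{\w+\})/) // the capture group keeps the {name} tokens in the result
    .map((part) => {
      const token = /^\{(\w+)\}$/.exec(part);
      if (token) {
        names.push(token[1]);
        return '([^/?#_-]+)'; // one value, up to the next separator
      }
      return part.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'); // literal text, regex-escaped
    })
    .join('');

  // Placeholder origin so relative paths can be parsed; it is never output.
  const url = new URL(path, 'http://placeholder.invalid');
  const match = new RegExp('^' + pattern).exec(url.pathname);
  if (!match) return path; // the format does not apply, leave the path alone

  const params = new URLSearchParams();
  names.forEach((name, i) => params.set(name, match[i + 1]));
  url.searchParams.forEach((value, key) => params.append(key, value)); // keep dev=..., id=...
  return url.pathname + '?' + params.toString() + url.hash;
}

console.log(formatToQuery('/search/{city}-{service}', '/search/newyork-gyms?dev=desktop&id=1220391131'));
```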

How do you add a parameter to a URL and reload the page

I'm looking for the simplest way to add parameters to a URL and then reload the page via javascript/jquery. I'm trying to avoid any plugins. Essentially I want:
http://www.mysite.com/about
to become:
http://www.mysite.com/about?qa=newParam
or, if a parameter already exists, then add a second parameter:
http://www.mysite.com/about?qa=oldParam&qa=newParam

Here is a vanilla solution; it should work nicely for all cases (except invalid inputs, of course).
function replace_search(name, value) {
var str = location.search;
if (new RegExp("[&?]"+name+"([=&].+)?$").test(str)) {
str = str.replace(new RegExp("(?:[&?])"+name+"[^&]*", "g"), "")
}
str += "&";
str += name + "=" + value;
str = "?" + str.slice(1);
// the query string must come before the hash; that order is fixed by the URL syntax.
location.assign(location.origin + location.pathname + str + location.hash)
};
EDIT: if you want to add parameters and never remove anything, the function is much smaller. I'm not very fond of having multiple fields with the same name and different values, but no specification forbids it.
function replace_search(name, value) {
var str = "";
if (location.search.length == 0) {
str = "?"
} else {
str = "&"
}
str += name + "=" + value;
location.assign(location.origin + location.pathname + location.search + str + location.hash)
};

Have a look at Window.location (MDN) for information on window.location.
A quick and dirty solution is:
location += (location.search ? "&" : "?") + "qa=newParam"
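It should work for your example, but misses some edge cases: for instance, if the current URL has a #hash, the new parameter ends up inside the hash instead of the query string.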
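To cover those edge cases (an existing #hash, existing parameters), the standard URL and URLSearchParams APIs can do the bookkeeping. A minimal sketch; withParam is a hypothetical name, and append() is used rather than set() so that an existing qa value is kept, as the question asks:

```javascript
// Returns href with name=value appended to its query string.
// The URL API keeps the existing parameters and puts the query before any #hash.
function withParam(href, name, value) {
  const url = new URL(href);
  url.searchParams.append(name, value); // append, not set: qa=oldParam&qa=newParam
  return url.href;
}

// In a browser page (not run here), reload with the new parameter:
// location.assign(withParam(location.href, 'qa', 'newParam'));
console.log(withParam('http://www.mysite.com/about?qa=oldParam#top', 'qa', 'newParam'));
```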

location.href will give you the current URL. You can then edit your query string and refresh the page by doing something like this:
if (location.href.indexOf("?") === -1) {
location.href += "?qa=newParam";
}
else {
location.href += "&qa=newParam";
}

Remove sections of url to only keep file name

I want to remove everything in the URL and only keep the name of the file/image. The URL is a dynamic input/variable.
Code:
var str = "http://website.com/sudir/sudir/subdir/Image_01.jpg";
str = str.replace("http://website.com/sudir/sudir/subdir/", "")
.replace(/[^a-z\s]/gi, ' ').replace("_", " ")
.replace("subdir", "").toLowerCase().slice(0,-4);

You can do this easily with lastIndexOf():
var str = "http://website.com/sudir/sudir/subdir/Image_01.jpg";
str.substring(str.lastIndexOf("/") + 1)
//"Image_01.jpg"

This function will give you the file name without its extension:
function GetFilename(url)
{
if (url)
{
var m = url.toString().match(/.*\/(.+?)\./);
if (m && m.length > 1)
{
return m[1];
}
}
return "";
}

From How to catch the end filename only from a path with javascript?
var filename = path.replace(/.*\//, '');
From Getting just the filename from a path with Javascript
var fileNameIndex = yourstring.lastIndexOf("/") + 1;
var filename = yourstring.substr(fileNameIndex);

I know you haven't specified the exact URL format and whether this may be possible in your situation, but this may be a solution worth considering.
JavaScript
var str = "http://website.com/sudir/sudir/subdir/Image_01.jpg?x=y#abc";
console.log(str.split(/[?#]/)[0].split("/").slice(-1)[0]);
str = "http://website.com/sudir/sudir/subdir/Image_01.jpg";
console.log(str.split(/[?#]/)[0].split("/").slice(-1)[0]);
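The same idea can lean on the URL API, which separates the ?query and #hash for you; decodeURIComponent then turns escapes such as %20 back into characters. A sketch assuming the input is an absolute URL (fileNameFromUrl is a hypothetical name):

```javascript
// Last path segment of an absolute URL, with percent-escapes decoded.
function fileNameFromUrl(href) {
  const { pathname } = new URL(href); // pathname excludes ?query and #hash
  return decodeURIComponent(pathname.slice(pathname.lastIndexOf('/') + 1));
}

console.log(fileNameFromUrl('http://website.com/sudir/sudir/subdir/Image_01.jpg?x=y#abc'));
```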

You can always use a regex to extract data from strings.
The regex to extract the file name from this URL:
"http://website.com/sudir/sudir/subdir/(?<FileName>[0-9A-Za-z._]+)"

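Named groups like (?<FileName>...) are supported by JavaScript regexes (ES2018), and the captured value is available on match.groups. A minimal sketch; the dots are escaped here so they only match literal dots, and extractFileName is a hypothetical helper:

```javascript
// The answer's pattern as a JavaScript regex literal with a named group.
const fileNameRegex = /http:\/\/website\.com\/sudir\/sudir\/subdir\/(?<FileName>[0-9A-Za-z._]+)/;

function extractFileName(str) {
  const match = fileNameRegex.exec(str);
  return match ? match.groups.FileName : null; // null when the URL has another prefix
}

console.log(extractFileName('http://website.com/sudir/sudir/subdir/Image_01.jpg'));
```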