Node.js get all occurrences of a substring in a string - javascript

I have a string in the Node.js runtime, e.g.:
var content = "my content contain some URL like https://this.me/36gD6d3 or https://this.me/39Jwjd";
How can I read each URL (https://this.me/36gD6d3 and https://this.me/39Jwjd) and replace it with another URL?
A forEach loop or something similar would be best. :-)
I need to make a request to each of those URLs to get the real URL behind the shortened one. That part is not the problem.
Each of those URLs is preceded and followed by either whitespace or a period (.).
The domain https://this.me/ is constant, but the IDs (39Jwjd, 36gD6d3) change.
Looking forward to your answers! :)

You can use a regular expression to find the occurrences of this URL.
var content = "my content contain some URL like https://this.me/36gD6d3 or https://this.me/39Jwjd";
console.log(content.match(/https:\/\/this\.me\/[a-zA-Z0-9]+/g))
This outputs:
[
"https://this.me/36gD6d3",
"https://this.me/39Jwjd"
]
To replace the occurrences that were found, use the replace() function.
var content = "my content contain some URL like https://this.me/36gD6d3 or https://this.me/39Jwjd";
console.log(content.replace(/https:\/\/this\.me\/[a-zA-Z0-9]+/g, "<Replaced URL here>"))
Output:
my content contain some URL like <Replaced URL here> or <Replaced URL here>
If you want the replacement to depend on the matched value, you can either use substitution patterns in the replacement string or pass a replacement function as the second argument.
Learn more about the String.prototype.replace function at MDN.
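Both options could look roughly like the sketch below. The `lookup` table and its example.com targets are made up for illustration; in practice the values would come from wherever the real URLs are resolved.

```javascript
const content = "my content contain some URL like https://this.me/36gD6d3 or https://this.me/39Jwjd";
// The capture group isolates the ID part of each short URL.
const pattern = /https:\/\/this\.me\/([a-zA-Z0-9]+)/g;

// Option 1: substitution pattern. "$&" stands for the whole match.
const wrapped = content.replace(pattern, "<$&>");

// Option 2: replacement function. It receives the whole match and then
// each capture group, and returns the replacement text.
// Hypothetical lookup table, only for this demo.
const lookup = {
  "36gD6d3": "https://example.com/a",
  "39Jwjd": "https://example.com/b",
};
const expanded = content.replace(pattern, (match, id) => lookup[id] ?? match);

console.log(wrapped);
console.log(expanded);
```

Unknown IDs fall back to the original match, so the string is never left with an `undefined` in it.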

If you want your replacement to be asynchronous (which I'm guessing is the case when you look up the full URL), you could do something like this:
(async () => {
const str = "my content contain some URL like https://this.me/36gD6d3 or https://this.me/39Jwjd",
res = await replaceAllUrls(str);
console.log(res);
})();
function replaceAllUrls(str) {
const regex = /https?:\/\/this\.me\/[a-zA-Z0-9_-]+/g,
matches = str.match(regex) || [];
return Promise.all(matches.map(getFullUrl)).then(values => {
return str.replace(regex, () => values.shift());
});
}
function getFullUrl(u) {
// Just for the demo, use your own
return new Promise((r) => setTimeout(() => r(`{{Full URL of ${u}}}`), 100));
// If it fails (you cannot get the full URL),
// don't forget to catch the error and return the original URL!
}
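The fallback mentioned in the comments above could be sketched like this. `resolveShortUrl` is a hypothetical stand-in for the real lookup request, and `withFallback` is a helper name invented for this example; neither is part of the answer's code.

```javascript
// Wraps a resolver so that a failed lookup yields the original URL
// instead of rejecting the whole Promise.all().
function withFallback(resolve) {
  return async (url) => {
    try {
      return await resolve(url);
    } catch (err) {
      return url; // keep the short URL if the lookup fails
    }
  };
}

// Hypothetical resolver for the demo: fails for one specific ID.
async function resolveShortUrl(url) {
  if (url.endsWith("39Jwjd")) throw new Error("lookup failed");
  return `{{Full URL of ${url}}}`;
}

async function replaceAllUrls(str, resolve) {
  const regex = /https?:\/\/this\.me\/[a-zA-Z0-9_-]+/g;
  const matches = str.match(regex) || [];
  const values = await Promise.all(matches.map(withFallback(resolve)));
  let i = 0;
  // replace() visits the matches in the same order as match() returned them.
  return str.replace(regex, () => values[i++]);
}

replaceAllUrls("a https://this.me/36gD6d3 b https://this.me/39Jwjd", resolveShortUrl)
  .then((res) => console.log(res));
```

Because each lookup catches its own error, one unreachable short URL does not prevent the others from being replaced.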

Related

Inject variable into regular expression

So I am trying to extract certain values from a URL. Suppose I have the following URL:
let url = "https://gk.example.com/my-path/to/some+more/multiple.variables/moreparams";
I am trying to extract the some+more and multiple.variables parts from the URL using a Regular Expression. I came up with the following expressions:
/(?<=\/)([^/]*\+[^/]*)(?=\/)/g (for the + separator) and /(?<=\/)([^/]*\.[^/]*)(?=\/)/g (for the . separator)
results = url.match(/(?<=\/)([^/]*\+[^/]*)(?=\/)/g); // result: ['some+more']
results = url.match(/(?<=\/)([^/]*\.[^/]*)(?=\/)/g); // result: ['gk.example.com', 'multiple.variables']
This returns ['some+more'] and ['gk.example.com', 'multiple.variables'], which is a result I can work with. However, instead of using an if statement to switch between expressions, I would rather inject a variable into a generic regular expression. I tried the following (using backticks (`) to be able to inject the variable):
function getSplitUrl(sep, url) {
if (!url) url = window.location.href;
let regex = new RegExp('(?<=/)([^/]*'+sep+'[^/]*)(?=/)', `g`),
results = [];
results[0] = url.match(`(?<=/)([^/]*${sep}[^/]*)(?=/)`, `g`);
results[1] = regex.exec(url);
console.log('Regex 1: ', regex); // logs `Regex 1: /(?<=/)([^/]*${sep}[^/]*)(?=\/)/g`, where ${sep} is replaced by either \. or \+ (seemingly correct expression)
console.log(url, sep, results);
return null;
}
From the console.log(regex) it seems that it is the correct regular expression but the result is still wrong. The result is now ['gk.example.com', 'gk.example.com'].
Am I missing something obvious here?
Edit:
Somehow, url.match(regex) returns a correct result, whereas regex.exec(url) does not.
Keep it simple and don't use regex, but rather the URL API to parse urls:
const url = new URL("https://gk.example.com/my-path/to/some+more/multiple.variables/moreparams?limit=5&more=conf/fusion");
console.log(url.hostname);
console.log(url.pathname)
const parts = url.pathname.slice(1).split('/');
console.log(parts);
const find = (substring) => parts.filter(p => p.includes(substring));
console.log(find('+'));
console.log(find('.'));
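If a regular expression is still preferred, the behaviour seen in the question has two causes: `String.prototype.match` takes a single argument, so a pattern passed as a string is compiled without the `g` flag, and `RegExp.prototype.exec` returns only one match per call (the full match followed by its capture groups), even when the regex is global. A minimal sketch of the variable-injection approach follows, assuming `sep` is passed as a raw, unescaped character; `escapeRegExp` is a helper written for this example, not a built-in.

```javascript
// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function getSplitUrl(sep, url) {
  // Inside a string passed to RegExp, "/" does not need escaping.
  const regex = new RegExp(`(?<=/)[^/]*${escapeRegExp(sep)}[^/]*(?=/)`, "g");
  // With a global RegExp object, match() returns every full match.
  return url.match(regex) || [];
}

const sample = "https://gk.example.com/my-path/to/some+more/multiple.variables/moreparams";
console.log(getSplitUrl("+", sample));
console.log(getSplitUrl(".", sample));
```

To keep using `exec`, it would have to be called in a loop until it returns null.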

Part of the string is missing when trying to replace

I have a string that looks like this:
[TITLE|prefix=X|suffix=a] [STORENAME|prefix=b] [DYNAMIC|limit=10|seperator=-|random=1|reverse=1|prefix=c]
I would like to replace the values of all prefix attributes with hello. So the goal is that the string looks like this:
[TITLE|prefix=hello|suffix=a] [STORENAME|prefix=hello] [DYNAMIC|limit=10|seperator=-|random=1|reverse=1|prefix=hello]
This is what I have tried:
const obj = {};
obj.prefix = "[TITLE|prefix=a|suffix=x] [STORENAME|prefix=b] [DYNAMIC|limit=10|seperator=-|random=1|reverse=1|prefix=c]";
function replace(search, replace) {
const regex = new RegExp(`(?<=\\[${search}\\|[^\\]]*${replace}=)[^|\\]]+`);
obj.prefix = obj.prefix.replace(regex, 'hello');
}
replace('TITLE', 'prefix');
replace('STORENAME', 'prefix');
replace('DYNAMIC', 'prefix');
console.log(obj.prefix);
As you see it works fine!
I used almost the same code in my project, but there it fails. You can check my project on JSFiddle: type anything into an input field and check the console. You will see that the value of the first prefix is changed, but the two other prefix attributes are missing.
So this is what I get:
[TITLE|prefix=anything|suffix=a] [STORENAME] [DYNAMIC|limit=10|seperator=-|random=1|reverse=1]
And this is what I should get:
[TITLE|prefix=anything|suffix=a] [STORENAME|prefix=another thing] [DYNAMIC|limit=10|seperator=-|random=1|reverse=1|prefix=one more thing]
What is the reason that those attributes are missing?
Update
If I am not mistaken, my main problem is the if-statement:
if (mtPr.query[mtPr.settings.activeLang].includes(replace)) {
With this if-statement, I want to check whether TITLE, STORENAME, or DYNAMIC has the attribute prefix. But this is a bad workaround, since the value of replace is always prefix (see line numbers 241, 245 and 251). And since prefix already occurs somewhere in the WHOLE string, the condition is true every single time. So a possible solution could be to check that the parameter replace is included AND that it belongs to the tag named by the parameter search.
Try this
function replace(search, replace) {
const regex = new RegExp(`(${search}[^\\[\\]]*\\|prefix\\=)[^\\|\\[\\]]+`);
obj.prefix = obj.prefix.replace(regex, '$1hello');
}
As I described in my question, the problem really was the if-statement. This is how I resolved it:
const searchMatchesReplace = new RegExp(`(?<=${search}.+${replace}=)[^\\]|]+`);
if (searchMatchesReplace.test(mtPr.query[mtPr.settings.activeLang])) {
const regex = new RegExp(`(?<=\\[${search}\\|[^\\]]*${replace}=)[^|\\]]+`, 'g');
result = mtPr.query[mtPr.settings.activeLang].replace(regex, value);
}
// Replace parameters if they do not exist
else {
const regex = new RegExp(`(\\[${search}(?:\\|[^\\][]*)?)]`, 'gi');
result = mtPr.query[mtPr.settings.activeLang].replace(regex, `$1|${replace}=${value}]`);
}
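The same "replace if present, otherwise append" logic can be written as a small standalone function. This is only a sketch: `setAttribute` is a name chosen for this example, and it assumes that tag and attribute names contain no characters that are special in a regular expression.

```javascript
// Sets attr=value inside the [TAG|...] block, adding the attribute if missing.
function setAttribute(query, tag, attr, value) {
  // [^\]]* cannot cross a closing bracket, so the search stays inside one tag.
  const existing = new RegExp(`(\\[${tag}(?:\\|[^\\]]*)?\\|${attr}=)[^|\\]]*`);
  if (existing.test(query)) {
    // A function replacement avoids surprises if value contains "$".
    return query.replace(existing, (match, head) => head + value);
  }
  const bare = new RegExp(`(\\[${tag}(?:\\|[^\\]]*)?)\\]`);
  return query.replace(bare, (match, head) => `${head}|${attr}=${value}]`);
}

let query = "[TITLE|prefix=X|suffix=a] [STORENAME] [DYNAMIC|limit=10|seperator=-|random=1|reverse=1|prefix=c]";
for (const tag of ["TITLE", "STORENAME", "DYNAMIC"]) {
  query = setAttribute(query, tag, "prefix", "hello");
}
console.log(query);
```

The presence check and the replacement share one tag-scoped pattern, so a prefix on another tag can no longer make the check succeed.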

Rewrite URL Prefix Using Javascript / Jquery

I am retrieving some data from an external API using JavaScript and then displaying it on an HTML page.
The returned data contains a URL in the following format:
var url = "https://img.evbuc.com/moreStuff"
I need to rewrite this URL so that it is prefixed with www, like this:
var url = "https://www.img.evbuc.com/moreStuff"
I want to achieve this using either JavaScript or jQuery.
How can I achieve this? An explanation of the correct code would be great too.
You don't need regex for this; you can simply use the URL API:
let url = "https://img.evbuc.com/moreStuff"
let parsed = new URL(url)
parsed.host = parsed.host.startsWith('www.') ? parsed.host : "www."+ parsed.host
console.log(parsed)
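Wrapped in a function that returns a string, the same idea might look like this. `addWww` is a name chosen for this sketch. Note that `new URL()` throws on input without a scheme, such as a protocol-relative `//img.evbuc.com/...`, unless a base URL is supplied.

```javascript
// Returns the URL as a string with "www." prepended to the hostname,
// unless the hostname already starts with it.
function addWww(url) {
  const parsed = new URL(url);
  if (!parsed.hostname.startsWith("www.")) {
    parsed.hostname = "www." + parsed.hostname;
  }
  return parsed.href;
}

console.log(addWww("https://img.evbuc.com/moreStuff"));
console.log(addWww("https://www.img.evbuc.com/moreStuff"));
```

Using `href` gives back a plain string rather than the URL object, which is usually what is needed for an attribute such as `src`.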
You can use a regular expression to search and replace.
Following example also works with:
http://img.evbuc.com/moreStuff
//img.evbuc.com/moreStuff
https://img.evbuc.com/moreStuff//someMoreStuff
function prependUrl(url) {
return url.replace(/^([^\/]*)(\/\/)(.*)/, '$1//www.$3');
}
const urls = [
'https://img.evbuc.com/moreStuff',
'http://img.evbuc.com/moreStuff',
'//img.evbuc.com/moreStuff',
'https://img.evbuc.com/moreStuff//someMoreStuff'
];
urls.forEach((url) => console.log(`${ url } -> ${ prependUrl(url) }`));
The regular expression contains 3 capturing groups:
Select everything up to the first / (excluding)
Select the // (for protocol root)
Select the rest
The replacement value takes everything up to the first / (which may be an empty string as well)
Replace the // with //www.
Append the rest
If you want something that will work with any protocol, try this regex:
var url = "https://img.evbuc.com/moreStuff"
var new_url = url.replace(/^([a-zA-Z][a-zA-Z0-9\.\+\-]*):\/\//, "$1://www.")
console.log('new URL: ', new_url)
Simple string operations:
var url = 'https://img.evbuc.com/moreStuff'
var newUrl = url.split('//')[0] + '//www.' + url.split('//')[1]
console.log(newUrl)
Yet another way is a plain string replacement (this only handles URLs that start with https://):
var url = 'https://img.evbuc.com/moreStuff'
var newUrl = url.replace('https://', 'https://www.')
console.log(newUrl)

Extract links in a string and return an array of objects

I receive a string from a server and this string contains text and links (mainly starting with http://, https:// and www., very rarely different but if they are different they don't matter).
Example:
"simple text simple text simple text domain.ext/subdir again text text text youbank.com/transfertomealltheirmoney/witharegex text text text and again text"
I need a JS function that does the following:
- finds all the links (no matter if there are duplicates);
- returns an array of objects, each representing a link, together with keys that return where the link starts in the text and where it ends, something like:
[{link:"http://www.dom.ext/dir",startsAt:25,endsAt:47},
{link:"https://www.dom2.ext/dir/subdir",startsAt:57,endsAt:88},
{link:"www.dom.ext/dir",startsAt:176,endsAt:192}]
Is this possible? How?
EDIT: @Touffy: I tried this, but I could only get the starting index of each match, not its length. Moreover, this does not detect www:
var str = "string with many links (SO does not let me post them)";
var regex = /(\b(https?|ftp|file|www):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig;
var result, indices = [];
while ( (result = regex.exec(str)) ) {
indices.push({startsAt:result.index});
}
console.log(indices[0].link);
console.log(indices[1].link);
One way to approach this would be with the use of regular expressions. Assuming whatever input, you can do something like
var expression = /(https?:\/\/(?:www\.|(?!www))[^\s\.]+\.[^\s]{2,}|www\.[^\s]+\.[^\s]{2,})/gi;
var matches = input.match(expression);
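Then, you can iterate through the matches to discover their starting and ending points with the use of indexOf. Passing the end of the previous match as the second argument makes sure duplicate links get their own positions:
var results = [];
var cursor = 0; // where the search for the next link starts
(matches || []).forEach(function (link) { // match() returns null when nothing is found
    var start = input.indexOf(link, cursor);
    results.push({
        link: link,
        startsAt: start,
        endsAt: start + link.length
    });
    cursor = start + link.length;
});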
Of course, you may have to tinker with the regular expression itself to suit your specific needs.
You can see the results logged by console in this fiddle
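As an alternative to searching for each match again, `String.prototype.matchAll` reports the index of every match directly. The sketch below reuses the expression from this answer; `findLinks` and the sample text are made up for illustration.

```javascript
// Returns one object per link, with the link text and its position in the input.
function findLinks(input) {
  const expression = /(https?:\/\/(?:www\.|(?!www))[^\s\.]+\.[^\s]{2,}|www\.[^\s]+\.[^\s]{2,})/gi;
  return Array.from(input.matchAll(expression), (m) => ({
    link: m[0],
    startsAt: m.index,
    endsAt: m.index + m[0].length, // index just past the last character
  }));
}

const text = "see https://www.dom.ext/dir and www.dom.ext/dir now";
console.log(findLinks(text));
```

Here `endsAt` is exclusive, so `text.slice(startsAt, endsAt)` returns the link itself.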
const getLinksPool = (links) => {
//you can replace the https with any links like http or www
const linksplit = links.replace(/https:/g, " https:");
let linksarray = linksplit.split(" ");
let linkspools = linksarray.filter((array) => {
return array !== "";
});
return linkspools;
};

How to extract the filename of the URL of the current document path in JavaScript?

I'm trying to extract the current file name in Javascript without any parameters.
$(location).attr('href').match(/([a-zA-Z\-\_0-9]+\.\w+)$/);
var current_path = RegExp.$1;
if ((current_path == 'index.html') || ...) {
// something here
}
But it doesn't work at all when the page is accessed with a query string, e.g. http://example.com/index.html?lang=ja. Also, the part of the URL before the file name can change arbitrarily.
Any idea?
If you're looking for the last item in the path, try this:
var current_path = window.location.pathname.split('/').pop();
This:
window.location.pathname
will give you something like:
"/questions/6543242/how-to-extract-the-filename-of-url-in-javascript"
Then the .split() will split the string into an Array, and .pop() will give you the last item in the Array.
function filename(path){
path = path.substring(path.lastIndexOf("/")+ 1);
return (path.match(/[^.]+(\.[^?#]+)?/) || [])[0];
}
console.log(filename('http://example.com/index.html?lang=ja'));
// returned value: 'index.html'
The filename of a URL is everything following the last "/" up to one of the following: 1.) a "?" (beginning of URL query), or 2.) a "#" (beginning of URL fragment), or 3.) the end of the string (if there is no query or fragment).
This tested regex does the trick:
.match(/[^\/?#]+(?=$|[?#])/);
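The same rule (last path segment, without query or fragment) can also be expressed with the URL API, which separates the query and fragment from the path on its own. A minimal sketch, where `filenameFromUrl` is a name chosen for this example; in a browser, `window.location.href` could be passed in.

```javascript
// Returns the last segment of the URL's path, ignoring any query or fragment.
function filenameFromUrl(href) {
  const { pathname } = new URL(href);
  return pathname.split("/").pop();
}

console.log(filenameFromUrl("http://example.com/index.html?lang=ja"));
console.log(filenameFromUrl("http://google.com/img.png?arg=value#div5"));
```

For a URL whose path ends in a slash, such as http://example.com/, the function returns an empty string.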
There is a URL.js library that makes it very easy to work with URLs. I recommend it!
Example
var uri = new URI('http://example.org/foo/hello.html?foo=bar');
uri.filename(); // => 'hello.html'
Your regex isn't correct. Instead, try to be more specific:
.match(/([a-zA-Z\-\_0-9]+\.[a-zA-Z]{2,4})(?:\?|$)/);
This says:
find any number of alphanumeric characters, hyphens, or underscores ([a-zA-Z\-\_0-9]+) before a full stop that is followed by between 2 and 4 alphabetic characters ([a-zA-Z]{2,4}), which come before either the end of the string ($) or a question mark (\?). The end-of-string anchor has to sit outside a character class, because inside one \$ only matches a literal dollar sign.
Tested on:
("http://www.example.com/index.html?lang=ja").match(/([a-zA-Z\-\_0-9]+\.[a-zA-Z]{2,4})(?:\?|$)/);
var current_path = RegExp.$1;
alert(current_path);
try this:
window.location.pathname.substring(1)
You can do something more simple:
var url = "http://google.com/img.png?arg=value#div5"
var filename = url.split('/').pop().split('#')[0].split('?')[0];
Result:
filename => "img.png"
