Javascript | Link/Bookmarklet to remove variables in url - javascript

I found this and was able to do what I initially wanted.
Javascript | Link/Bookmarklet to replace current window location
javascript:(function(){var loc=location.href;loc=loc.replace('gp/product','dp'); location.replace(loc)})()
Which was to change an amazon url from the product link to the dude perfect link.
It turns this url: https://www.amazon.com/gp/product/B01NBKTPTS/
into this url: https://www.amazon.com/dp/B01NBKTPTS/
I would like to take this a step further. Is there a way to do the above switch and then also remove the string of variables after the ? essentially cleaning up
https://www.amazon.com/gp/product/B01NBKTPTS/?pf_rd_r=DQV2YXJP8FFKM1Q50KS9&pf_rd_p=eb347dce-a775-4231-8920-ae66bdd987f4&pf_rd_m=ATVPDKIKX0DER&pf_rd_t=Landing&pf_rd_i=16310101&pf_rd_s=merchandised-search-2&linkCode=ilv&tag=onamzbybcreat-20&ascsubtag=At_Home_Cooking_210426210002&pd_rd_i=B01NBKTPTS
to
https://www.amazon.com/dp/B01NBKTPTS/
Thanks!

You've almost done it yourself!
To do the second part you can use split on your /? string (i.e. URL).
In our case that will give you an array with two elements: the first element stores everything BEFORE the /? (reference [0], that's what we can use), and the other stores everything AFTER (reference [1], not needed for us)
FYI: if there were more /?, then split would produce an array with several elements. Additional information.
In addition, you shouldn't forget to escape the special character / this way: \/.
So here is the final working bookmarklet code to get the first URL part before /? letters, with gd/product replaced by dp:
javascript:(function(){
var loc=location.href;
loc=loc
.split('\/?')[0]
.replace('gp/product','dp')
+'/';
location.replace(loc);
})();

Related

How to check if string is a valid Figma link?

I'm building an app on NodeJS that uses Figma API, and I need to check if the string passed by a user is a valid Figma link. I'm currently using this simple regex expression to check the string:
/^https\:\/\/www.figma.com\/.*/i
However, it matches all links from figma.com, even the home page, not only links to the files and prototypes. Here is an example Figma link that should match:
https://www.figma.com/file/OoYmkiTlusAzIjYwAgSbv8wy/Test-File?node-id=0%3A1
Also the match should be positive if this is a prototype link, with proto instead of file in the path.
Moreover, since I'm using the Figma API, it would be useful to extract necessary parts of the URL such as the file ID and node ID at the same time.
TL;DR
✅ Use this expression to capture four most important groups (type, file id, file name and URL properties) and work from there.
/^(?:https:\/\/)?(?:www\.)?figma\.com\/(file|proto)\/([0-9a-zA-Z]{22,128})(?:\/?([^\?]+)?(.*))?$/
From the docs
This is the regex expression code provided by Figma on their developer documentation page about embeds:
/https://([w.-]+.)?figma.com/(file|proto)/([0-9a-zA-Z]{22,128})(?:/.*)?$/
🛑 However, it doesn't work in JS as the documentation is currently wrong and this expression has multiple issues:
Slashes and a dots are not escaped with backslashes.
It doesn't match from the start of the string. I added the start of string anchor ^ after VLAZ pointed it out in the comments. This way we will avoid matching strings that don't start with https, for example malicious.site/?link=https://figma.com/...
It will match not only www. subdomain but any other amount of W which is not great (e.g. wwwww.) — it can be fixed by replacing letter match with a simpler expression. Also this is a useless capturing group, I'll make it non-capturing.
It would be nice if the link matched even if it doesn't begin with https:// as some engines (e.g. Twitter) strip this part for brevity and if person is copying a link from there, it should still be valid.
After applying all the improvements, we are left with the following expression:
/^(?:https:\/\/)?(?:www\.)?figma\.com\/(file|proto)\/([0-9a-zA-Z]{22,128})(?:\/.*)?$/
There is also a dedicated NPM package that simply checks the URL against the similar pattern. However, it contains some of the flaws listed above so I don't advice using it, especially for just one line of code.
Extracting parts of the URL
This expression is extremely useful to use with Figma API as it even extracts necessary parts from the URL such as type of link (proto/file) and the file key. You can access them by indexes.
You can also add a piece of regex to match specific keys in the query such as node-id:
/^(?:https:\/\/)?(?:www\.)?figma\.com\/(file|proto)\/([0-9a-zA-Z]{22,128})(?:\/.*)?node-id=([^&]*)$/
Now you can use it in code and get all the parts of the URL separately:
var pattern = /^(?:https:\/\/)?(?:www\.)?figma\.com\/(file|proto)\/([0-9a-zA-Z]{22,128})(?:\/.*)?node-id=([^&]*)$/
var matched = 'https://www.figma.com/file/OoYmkiTlusAzIjYwAgSbv8wy/Test-File?node-id=0%3A1'.match(pattern)
console.log('url:', matched[0]) // whole matched string
console.log('type:', matched[1]) // group 1
console.log('file key:', matched[2]) // group 2
console.log('node id:', matched[3]) // group 3
Digging deeper
I spent some time recreating this expression almost from scratch so it would match as many possible Figma file/prototype URLs without breaking things. Here are three similar versions of it that would work for different cases.
✅ This version captures the URL parameters and the name of the file separately for easier processing. You can check it here. I added it in the beginning of the answer, because I think it's the cleanest and most useful solution.
/^(?:https:\/\/)?(?:www\.)?figma\.com\/(file|proto)\/([0-9a-zA-Z]{22,128})(?:\/?([^\?]+)?(.*))?$/
The groups in it are as following:
Group 1: file/proto
Group 2: file key/id
Group 3: file name (optional)
Group 4: url parameters (optional)
✅ Next up, I wanted to do the same but separating the /duplicate part that can be added in the end of any Figma URL to create a duplicate of the file upon opening.
/^(?:https:\/\/)?(?:www\.)?figma\.com\/(file|proto)\/([0-9a-zA-Z]{22,128})(?:\/?([^\?]+)?([^\/]*)(\/duplicate)?)?$/
✅ And back to the node-id parameter. The following regex expression finds and captures multiple URLs inside a multiline string successfully. The only downside that I found in the end is that it (as well as all the previous ones) doesn't check if this URL contains unencoded special characters meaning that it can potentially break things, but it can be avoided by manually encoding all parameters using encodeURI() function.
/^(?:https:\/\/)?(?:www\.)?figma\.com\/(file|proto)\/([0-9a-zA-Z]{22,128})(?:\/([^\?\n\r\/]+)?((?:\?[^\/]*?node-id=([^&\n\r\/]+))?[^\/]*?)(\/duplicate)?)?$/gm
There are six groups that can be captured by this expression:
Group 1: file/proto
Group 2: file key/id
Group 3: file name (optional)
Group 4: url parameters (optional)
Group 5: node-id (optional; only present when group 4 is present)
Group 6: /duplicate
And, finally, here is the example of a match and its groups (or try it yourself):

Matching a JS string with regex

I have a long xml raw message that is being stored in a string format. A sample is as below.
<tag1>val</tag><tag2>val</tag2><tagSomeNameXYZ/>
I'm looking to search this string and find out if it contains an empty html tag such as <tagSomeNameXYZ/>. This thing is, the value of SomeName can change depending on context. I've tried using Str.match(/tagSomeNameXYZ/g) and Str.match(/<tag.*.XYZ\/>/g) to find out if it contains exactly that string, but am able to get it return anything. I'm having trouble in writing a reg ex that matches something like <tag*XYZ/>, where * is going to be SomeName (which I'm not interested in)
Tl;dr : How do I filter out <tagSomeNameXYZ/> from the string. Format being : <constant variableName constant/>
Example patterns that it should match:
<tagGetIndexXYZ/>
<tagGetAllIndexXYZ/>
<tagGetFooterXYZ/>
The issue you have with Str.match(/<tag.*.XYZ\/>/g) is the .* takes everything it sees and does not stop at the XYZ as you wish. So you need to find a way to stop (e.g. the [^/]* means keep taking until you find a /) and then work back from there (the slice).
Does this help
testString = "<tagGetIndexXYZ/>"
res = testString.match(/<tag([^/]*)\/\>/)[1].slice(0,-3)
console.log(res)

regex to match all keywords in a string

Being noob in regex I require some support from community
Let say I have this string str
www.anysite.com hello demo try this link
anysite.com indeed demo link
http://www.anysite.com another one
www.anysite.com
http://anysite.com
Consider 1-5 as whole string str here
I want to convert all 'anysite.com' into clickable html links, for which I am using:
str = str.replace(/((http|https|ftp):\/\/[\w?=&.\/-;#~%-]+(?![\w\s?&.\/;#~%"=-]*>))/g, '$1');
This converts all space separated words starting with http/https/ftp into links as
url
So, line 3 and line 5 has been converted correctly. Now to convert all www.anysite.com into links I again used
str = str.replace(/(\b^(http|https|ftp)?(www\.)[-A-Z0-9+&##\/%?=~_|!:,.;]*[-A-Z0-9+&##\/%=~_|])/ig, '$1');
Though it only converts www.anysite.com into link if it is found at very beginning of str. So it convert line number 1 but not line number 4.
Note that I have used ^(http|https|ftp)?(www.) to find all www not
starting with http/https/ftp, as for http they already have been
converted
Also the link on line number 2, where it is neither started with http nor www rather it ends with .com, how the regex would be for that.
For reference you can try posting this whole string to you facebook timeline, it converts all five line into links. Check snapshot
Thanks for help, the final RegEx that helped me is:
//remove all http:// and https://
str = str.replace(/(http|https):\/\//ig, "");
//replace all string ending with .com or .in only into link
str = str.replace( /((www\.)?[-a-zA-Z0-9#:%._\+~#=]{2,256}\.(com|in))/ig, '$1');
I used .com and .in for my specific requirement, else the solution on this http://regexr.com/39i0i will work
Though sill there is issue like- it doesn't convert shortened url into
links perfectly. e.g http://s.ly/qhdfTyuiOP will give link till s.ly
Still any suggestions?
^(http|https|ftp)?(www\.) does not mean "all www not starting with http/https/ftp" but rather "a string that starts with an optional http/https/ftp followed by www..
Indeed, ^ in this context isn't a negation but rather an anchor representing the start of the string. I suppose you used it this way because of its meaning when used in a character class ([^...]) ; it is rather tricky since its meaning change depending on the context it is found in.
You could just remove it and you should be fine, as I see no point of making sure the string does not start with http/https/ftp (you transformed those occurrences just before, there should be none left).
Edit : I mentioned lookbehind but forgot it's not available in JS...
If you wanted to make some kind of negation, the easiest way would be to use a negative lookbehind :
(?<!http|https|ftp)www\.
This matches "www." only when it's not preceded by http, https nor ftp.

Only match regex if it doesnt start with a pattern in javascript

I have a bit of a strange one here, I basically have a large chunk of text which may or may not contain links to images.
So lets say it does I have a pattern which will extract the image url fine, however once a match is found it is replaced with a element with the link as the src. Now the problem is there may be multiple matches within the text and this is where it gets tricky. As the url pattern will now match the src tags url, which will basically just enter an infinite loop.
So is there a way to ONLY match in regex if it doesnt start with a pattern like ="|=' ? as then it would match the url in something like:
some image http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6
but not
some image <img src="http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6">
I am not sure if it is possible, but if it is could someone point me in the right direction? A replace by itself will not suffice in this scenario as the url matched needs to be used elsewhere too so it needs to be used like a capture.
The main scenarios I need to account for are:
Many links in one block of varied text
A single link without any other text
A single link with other varied text
== edit ==
Here is the current regex I am using to match urls:
(\b(https?|ftp|file):\/\/[-A-Z0-9+&##\/%?=~_|!:,.;]*(?:png|jpeg|jpg|gif|bmp))
== edit 2 ==
Just so everyone understands why I cannot use the /g command here is an answer which explains the issue, if I could use this /g like I originally tried then it would make things a lot simpler.
Javascript regex multiple captures again
What you are looking for is a negative look behind, but Javascript doesn't support any kind of look behinds, so you will either have to use a callback function to check what was matched and make sure it is not preceded by a ' or ", or you can use the following regex:
(?:^|[^"'])(\b(https?|ftp|file):\/\/[-a-zA-Z0-9+&##\/%?=~_|!:,.;]*(?:png|jpeg|jpg|gif|bmp))
which has a single problem, that is in the case of a successful match it will catch one more character, the one right before the (\b(https?|ftp|file) pattern in the input, but I think you can deal with this easily.
Regex101 Demo
Using the /ig command at the end should work... the g is for global replace and the i is for case-insensitivity, which is necessary as you've only got A-Z instead of a-zA-Z.
Using the following vanilla JS appears to work for me (see jsfiddle)...
var test="some image http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6 some image http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6 some image http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6";
var re = new RegExp(/(\b(https?|ftp|file):\/\/[-A-Z0-9+&##\/%?=~_|!:,.;]*(?:png|jpeg|jpg|gif|bmp))/ig);
document.getElementById("output").innerHTML = test.replace(re,"<img src=\"$1\"/>");
Although, what it does highlight is that the query string part of the URL (the ?v=6 is not being picked up with your RegEx).
For jQuery, it would be (see jsfiddle)...
$(document).ready(function(){
var test="some image http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6 some image http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6 some image http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6";
var re = new RegExp(/(\b(https?|ftp|file):\/\/[-A-Z0-9+&##\/%?=~_|!:,.;]*(?:png|jpeg|jpg|gif|bmp))/ig);
$("#output").html(test.replace(re,"<img src=\"$1\"/>"));
});
Update
Just in case my example of using the same image URL in the example doesn't convince you - it also works with different URLs... see this jsfiddle update
var test="http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6 http://cdn.sstatic.net/serverfault/img/sprites.png?v=7";
var re = new RegExp(/(\b(https?|ftp|file):\/\/[-A-Z0-9+&##\/%?=~_|!:,.;]*(?:png|jpeg|jpg|gif|bmp))/ig);
document.getElementById("output").innerHTML = test.replace(re,"<img src=\"$1\"/>");
Couldn't you just see if there is a whitespace in front of the url, instead of that word-boundary? seems to work, although you will have to remove the matched whitespace later.
(\s(https?|ftp|file):\/\/[-A-Z0-9+&##\/%?=~_|!:,.;]*(?:png|jpeg|jpg|gif|bmp))
http://rubular.com/r/9wSc0HNWas
Edit: Damn, too slow :) I'll still leave this here as my regex is shorter ;)
as was said by freefaller, you might use /g flag to just find all matches in one go, if exec is not a must.
otherwise: you can add (="|=')? to the beginning of your regex, and check if $1 is undefined. if it is undefined, then it was not started with a ="|=' pattern

Building a Hashtag in Javascript without matching Anchor Names, BBCode or Escaped Characters

I would like to convert any instances of a hashtag in a String into a linked URL:
#hashtag -> should have "#hashtag" linked.
This is a #hashtag -> should have "#hashtag" linked.
This is a [url=http://www.mysite.com/#name]named anchor[/url] -> should not be linked.
This isn't a pretty way to use quotes -> should not be linked.
Here is my current code:
String.prototype.parseHashtag = function() {
return this.replace(/[^&][#]+[A-Za-z0-9-_]+(?!])/, function(t) {
var tag = t.replace("#","")
return t.link("http://www.mysite.com/tag/"+tag);
});
};
Currently, this appears to fix escaped characters (by excluding matches with the amperstand), handles named anchors, but it doesn't link the #hashtag if it's the first thing in the message, and it seems to grab include the 1-2 characters prior to the "#" in the link.
Halp!
How about the following:
/(^|[^&])#([A-Za-z0-9_-]+)(?![A-Za-z0-9_\]-])/g
matches the hashtags in your example. Since JavaScript doesn't support lookbehind, it tries to either match the start of the string or any character except & before the hashtag. It captures the latter so it can later be replaced. It also captures the name of the hashtag.
So, for example:
subject.replace(/(^|[^&])#([A-Za-z0-9_-]+)(?![A-Za-z0-9_\]-])/g, "$1http://www.mysite.com/tag/$2");
will transform
#hashtag
This is a #hashtag and this one #too.
This is a [url=http://www.mysite.com/#name]named anchor[/url]
This isn't a pretty way to use quotes
into
http://www.mysite.com/tag/hashtag
This is a http://www.mysite.com/tag/hashtag and this one http://www.mysite.com/tag/too.
This is a [url=http://www.mysite.com/#name]named anchor[/url]
This isn't a pretty way to use quotes
This probably isn't what t.link() (which I don't know) would have returned, but I hope it's a good starting point.
There is an open-source Ruby gem to do this sort of thing (hashtags and #usernames) called twitter-text. You might get some ideas and regexes from that, or try out this JavaScript port.
Using the JavaScript port, you'll want to just do:
var linked = TwitterText.auto_link_hashtags(text, {hashtag_url_base: "http://www.mysite.come/tag/"});
Tim, your solution was almost perfect. Here's what I ended up using:
subject.replace(/(^| )#([A-Za-z0-9_-]+)(?![A-Za-z0-9_\]-])/g, "$1#$2");
The only change is the first conditional, changed it to match the beginning of the string or a space character. (I tried \s, but that didn't work at all.)

Categories

Resources