RegEx for matching the first instance of a URL

Say I have the HTML in a string variable htmlString and I want to find the first instance of an mp3 link in the html, and store that link in a variable.
<html>
...
src="https://example.com/mp3s/2342344?id=24362456"
...
</html>
The link https://example.com/mp3s/2342344?id=24362456 will be extracted.
Note there are lots of other urls in the html, but I just want the one in this format.
How do I get this?

Parsing HTML with regular expressions is generally discouraged, but if you wish or have to, the following expression may help you design one that extracts the first mp3 URL:
^(src=\x22(https:\/\/[a-z]+\.com\/mp3s\/[0-9]+\?id=[0-9]+)\x22)[\s\S]*
I have added several boundaries to be safe. You can remove or simplify them; the desired URL is in the second capturing group:
(https:\/\/[a-z]+\.com\/mp3s\/[0-9]+\?id=[0-9]+)
The key is the trailing [\s\S]*, which consumes everything after the first URL, so that replacing the whole match with $2 leaves only that URL. Note that the leading ^ anchors the match to the start of the input, so the input must begin with src=.
JavaScript Demo with 10 million times performance benchmark
const repeat = 10000000;
// The input and the regex do not change, so build them once outside the loop.
const string = 'src="https://example.com/mp3s/2342344?id=24362456" src="https://example.com/mp3s/08103480132984?id=0a0f8ad0f8" src="https://example.com/mp3s/2342344?id=24362456" href="https://example.com/mp3s/2342344?id=91847890" src="https://example.com/mp3s/2342344?id0980184"';
const regex = /^(src=\x22(https:\/\/[a-z]+\.com\/mp3s\/[0-9]+\?id=[0-9]+)\x22)[\s\S]*/;
let match;
const start = Date.now();
for (let i = 0; i < repeat; i++) {
    // Replace the whole input with the second capturing group (the URL).
    match = string.replace(regex, "$2");
}
const end = Date.now() - start;
console.log(match + " is a match 💚 ");
console.log(end / 1000 + " is the runtime of " + repeat + " times benchmark test. 😳 ");
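If the goal is only to read the first matching link out of htmlString, a non-global regex with String.prototype.match may be simpler than the replace approach above: it returns the first match and does not require the input to start with src=. The following is a minimal sketch. The helper name findFirstMp3Url and the sample HTML are made up for illustration, and the pattern assumes the links always look like the example in the question.

```javascript
// Hypothetical helper: returns the first mp3-style URL in an HTML string, or null.
function findFirstMp3Url(htmlString) {
  // No "g" flag, so match() returns the first match together with its capture groups.
  const pattern = /src="(https:\/\/[a-z]+\.com\/mp3s\/[0-9]+\?id=[0-9]+)"/;
  const found = htmlString.match(pattern);
  return found ? found[1] : null;
}

// Sample input (an assumption): other URLs surround the one we want.
const sampleHtml = [
  '<html>',
  '<img src="https://example.com/images/logo.png">',
  '<audio src="https://example.com/mp3s/2342344?id=24362456"></audio>',
  '<audio src="https://example.com/mp3s/999?id=111"></audio>',
  '</html>',
].join("\n");

console.log(findFirstMp3Url(sampleHtml));
```

The URL sits in capture group 1, so no replacement step is needed, and a null result signals that no link in this format was found.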

Related

Transform number to specific string using regex

I can't seem to get my head around javascript regex, so I need your help!
I need to transform the following:
1234567891230
Into:
urn:epc:id:sgln:12345678.9123.0
I already did it with a normal javascript algorithm (see underneath), but we need to be able to configure this transformation. I just need it for the default configuration value!
Using slice it would be:
var result = "urn:epc:id:sgln:" + myString.slice(0, 8) + "." +
myString.slice(8, 12) + "." + myString.slice(12);
If you can include an explanation in your answer I would be grateful :)

If you want to use a regex for this, try the following:
var regex = /(\d{8})(\d{4})(\d)/;
var parts = regex.exec("1234567891230");
var result = "urn:epc:id:sgln:" + parts[1] + "." + parts[2] + "." + parts[3];
console.log(result);
But I would recommend the slice approach you already have.

You could use a regex that captures 3 groups for 12345678, 9123 and 0, with a word boundary \b at the beginning and at the end.
The array returned by match contains the full match as its first element, which we don't need, so use slice(1) to keep only the captured groups.
After that, join the elements of the array using the dot as the separator.
\b(\d{8})(\d{4})(\d)\b
let str = "1234567891230";
let prefix = "urn:epc:id:sgln:";
console.log(prefix + str.match(/\b(\d{8})(\d{4})(\d)\b/).slice(1).join("."));
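Since the question asks for a configurable transformation, one option is to store the pattern and a replacement template as plain strings and apply them with String.prototype.replace. This is only a sketch: the defaultConfig object and the function name applyTransform are assumptions and do not come from either answer.

```javascript
// Hypothetical default configuration: a regex source string and a replacement template.
const defaultConfig = {
  pattern: "^(\\d{8})(\\d{4})(\\d)$",
  replacement: "urn:epc:id:sgln:$1.$2.$3",
};

function applyTransform(input, config) {
  const regex = new RegExp(config.pattern);
  // $1, $2 and $3 in the template are filled from the capture groups.
  // replace() returns the input unchanged when the pattern does not match.
  return input.replace(regex, config.replacement);
}

console.log(applyTransform("1234567891230", defaultConfig));
// urn:epc:id:sgln:12345678.9123.0
```

Because both values are strings, they can be kept in a settings file and changed without touching the code.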

Split a text and put some text between them

I have some JavaScript code from an extension I'm creating, and I need to split the text I select into two parts.
For example, this is the code I use for every page I need:
function child1_7Search(info,tab) {
    chrome.tabs.create({
        url: "www.blablabla.com/" + info.selectionText,
    });
}
But I have to split the selected code in two. For example, if the selected code is 1854BIGGER000208, the first four characters need to go in one place in the URL and the remaining twelve characters in another.
The page URL needs to look something like this:
https://www.urbano.com.ar/urbano3/wbs/cespecifica/?shi_codigo=001854&cli_codigo=BIGGER000208
where shi_codigo is the first part prefixed with two zeros, and cli_codigo is the rest of the code.
The selected code is always the same length!

You can try concatenating the parts like this:
// this is the original text / code that you get
var text = "1854BIGGER000208";
// here we `slice` off the first 4 chars
var pret = text.slice(0, 4);
// here we take the rest of the text
var post = text.slice(4);
// and here we concatenate them into the final URL part
var final = "shi_codigo=" + "00" + pret + "&cli_codigo=" + post;
console.log( final );
I guess you will also want to add the first part of the URL; you can prepend it with the + operator, the same way the other parts are joined above.

Here's a simple solution using .substring() method:
var code = "1854BIGGER000208";
var shi = "00" + code.substring(0, 4);
var cli = code.substring(4);
var url = "https://www.urbano.com.ar/urbano3/wbs/cespecifica/?shi_codigo=" + shi + "&cli_codigo=" + cli;
console.log(url);
Note:
code.substring(0, 4) extracts the first four characters of the selection and returns 1854.
code.substring(4) extracts the remaining characters of the selection and returns BIGGER000208.
The two zeros are written as the string "00" so that they are concatenated with the shi code; as numbers, 00 + 1854 would be evaluated as 1854.

JavaScript has a number of string functions you can use:
https://www.w3schools.com/jsref/jsref_obj_string.asp
In particular, you may want to use the slice function. The syntax is as follows:
var num = info.selectionText.slice(0, 4);
var end = info.selectionText.slice(4);
The arguments passed to slice are the start and end indexes of the part you want. The end index is exclusive, and if it is omitted the slice runs to the end of the string. As usual, indexes start at 0.

Solution using ES6 syntax
let yourString = "yourstringthatneedstobesliced";
let initial = yourString.slice(0,4);
let post = yourString.slice(4);
let char = `shi_codigo=00${initial}&cli_codigo=${post}`;
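The answers above build only the query part, or join everything by hand. As a rough sketch of how the whole URL could be assembled, the built-in URLSearchParams class can take care of the = and & separators and of encoding. The function name buildTrackingUrl is an assumption; the base URL and parameter names are taken from the question.

```javascript
// Hypothetical helper: builds the full URL from a 16-character selected code.
function buildTrackingUrl(code) {
  const params = new URLSearchParams({
    shi_codigo: "00" + code.slice(0, 4), // first four characters, prefixed with two zeros
    cli_codigo: code.slice(4),           // the remaining characters
  });
  return "https://www.urbano.com.ar/urbano3/wbs/cespecifica/?" + params.toString();
}

console.log(buildTrackingUrl("1854BIGGER000208"));
// https://www.urbano.com.ar/urbano3/wbs/cespecifica/?shi_codigo=001854&cli_codigo=BIGGER000208
```

Inside the extension, the result could be passed as the url option of chrome.tabs.create, with info.selectionText as the argument.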

Regex to match any substring longer than 2 characters

I need a regex pattern that matches any substring of the provided input that is longer than two characters, wherever that exact substring occurs in the text.
Only pot or potato should be highlighted, not ota or ot, when the user types pota and clicks the search button.
Please find the code below, which highlights the matched string.
// Core function
function buildRegexFor(find) {
    var regexStr = find.substr(0,3);
    for (var i = 1; i < find.length - 2; i++) {
        regexStr = '(' + regexStr + find.substr(i+2,1) + '?|' + find.substr(i,3) + ')';
    }
    return regexStr;
}
// Handle button click event
document.querySelector('button').onclick = function () {
    // (1) read input
    var find = document.querySelector('input').value;
    var str = document.querySelector('textarea').value;
    // (2) build regular expression using above function
    var regexStr = buildRegexFor(find);
    // (3) apply regular expression to text and highlight all found instances
    str = str.replace(new RegExp(regexStr, 'g'), "<strong class='boldtxt'>$1</strong>");
    // (4) output
    document.querySelector('span').textContent = regexStr;
    document.querySelector('div').innerHTML = str;
};
Consider "meter & parameter" as the text: if the user types meter in the input box and clicks the search button, meter should be highlighted, and so should the meter in parameter. Thanks in advance.

Your for loop starts at i = 1 and runs while i is less than find.length - 2. For the input pota, find.length is 4, and 4 - 2 is 2, so the loop runs from i = 1 while i is less than 2. In other words, it executes exactly once. I'm not sure what you intended that loop to do, but I doubt this was it.
Before the loop, regexStr is set to the string pot (the first three characters of find). The first (and only) time through the loop, it is set to a new value: a left parenthesis, the existing value (pot), the fourth character of find (a), a question mark, a vertical bar, three characters of find starting with the second (ota), and a right parenthesis. Put those together, and regexStr comes out as:
(pota?|ota)
That regex matches either the string "pota" (with the a being optional, so "pot" also matches) or the string "ota". So any instances of pota, pot, or ota will be found and highlighted.
If you just want "pota?", remove the right half of that line inside the for loop. Better yet, replace the entire function with a single line that appends the ? character to the find string (keep the surrounding parentheses, because the replacement string refers to $1).
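Appending a single ? only makes the last character optional. One possible generalization, sketched below, keeps the first three characters mandatory and makes every later character optional through nested groups, so that pota becomes pot(?:a)? and meter becomes met(?:e(?:r)?)?. The function names escapeRegex, buildPrefixPattern and highlight are assumptions, as is the reading that "longer than 2 characters" means a minimum of three matched characters.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical builder: the first minLength characters are required,
// each following character is optional, but only if all earlier ones matched.
function buildPrefixPattern(find, minLength = 3) {
  const chars = Array.from(find);
  let optionalTail = "";
  for (let i = chars.length - 1; i >= minLength; i--) {
    optionalTail = "(?:" + escapeRegex(chars[i]) + optionalTail + ")?";
  }
  return escapeRegex(chars.slice(0, minLength).join("")) + optionalTail;
}

// Wrap every match in the same <strong> markup the question uses.
function highlight(text, find) {
  const regex = new RegExp(buildPrefixPattern(find), "g");
  // $& inserts the whole match, so no capturing group is required.
  return text.replace(regex, "<strong class='boldtxt'>$&</strong>");
}

console.log(buildPrefixPattern("pota"));
console.log(highlight("meter & parameter", "meter"));
```

With this pattern a lone ota or ot is never matched, because every match has to start with the first three characters of the search term.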

Get javascript node raw content

I have a javascript node in a variable, and if I log that variable to the console, I get this:
"asekuhfas eo"
Just some random string in a javascript node. I want to get that literally to be a string. But the problem is, when I use textContent on it, I get this:
asekuhfas eo
The special character is converted. I need to get the string to appear literally like this:
asekuhfas eo
This way, I can deal with the special character (recognize when it exists in the string).
How can I get that node object to be a string LITERALLY as it appears?

As VisionN has pointed out, it is not possible to recover the original character reference from the text, because the parser has already decoded it.
However, by using charCodeAt() you can probably still achieve your goal.
Say you have your textContent. By iterating over the characters, retrieving each char code, and wrapping it in "&#" and ";", you can get the desired result. The obvious downside of this method is that every character ends up in this notation, even those that do not require it. Introducing some kind of threshold restricts it to the exotic characters only.
A very naive approach would be something like this:
var a = div.textContent;
var result = "";
// char codes above this value are written as numeric character references
var threshold = 1000;
for (var i = 0; i < a.length; i++) {
    if (a.charCodeAt(i) > threshold)
        result += "&#" + a.charCodeAt(i) + ";";
    else
        result += a[i];
}

textContent returns everything correctly, as &#8203; is the Unicode Character 'ZERO WIDTH SPACE' (U+200B), which is:
commonly abbreviated ZWSP
this character is intended for invisible word separation and for line break control; it has no width, but its presence between two characters does not prevent increased letter spacing in justification
It can be easily proven with:
var div = document.createElement('div');
div.innerHTML = '&#8203;xXx';
console.log( div.textContent ); // "xXx" (with an invisible zero-width space in front)
console.log( div.textContent.length ); // 4
console.log( div.textContent[0].charCodeAt(0) ); // 8203
As Eugen Timm mentioned in his answer, it is a bit tricky to convert Unicode characters back to HTML entities, and his solution is completely valid for non-standard characters with a char code higher than 1000. As an alternative, I can propose a shorter RegExp solution that gives the same result:
var result = div.textContent.replace(/./g, function(x) {
    var code = x.charCodeAt(0);
    return code > 1e3 ? '&#' + code + ';' : x;
});
console.log( result ); // "&#8203;xXx"
For a better solution you may have a look at this answer which can handle all HTML special characters.
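Both snippets above rely on a threshold of 1000 and on charCodeAt, which reports surrogate halves separately for characters outside the Basic Multilingual Plane. As a hedged variation (not the linked answer), the sketch below encodes every non-ASCII character and uses codePointAt together with the u flag, so such characters become a single reference. The function name encodeNonAscii is an assumption.

```javascript
// Hypothetical helper: replaces each non-ASCII character with a numeric character reference.
function encodeNonAscii(text) {
  // The "u" flag makes the character class match whole code points, not surrogate halves.
  return text.replace(/[^\x00-\x7F]/gu, function (ch) {
    return "&#" + ch.codePointAt(0) + ";";
  });
}

console.log(encodeNonAscii("\u200BxXx")); // &#8203;xXx
```

Plain ASCII text passes through unchanged, so the result stays readable while invisible characters such as the zero-width space become easy to spot.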

How can I replace string of numbers in document.body.innerHTML with hyperlink

I am trying to write a JavaScript function that searches document.body.innerHTML for a number that is always in the same format.
Ex. 00000000123456
When it finds this number, I want it to strip the eight leading 0's so that only 123456 remains.
Once only the last 6 digits remain, I would like to turn them into a hyperlink that searches for those digits on a specific page.
If it is easier to code, removing the eight 0's before making the hyperlink is not essential, as long as the hyperlink itself searches for the last 6 digits only.
I've tried a few different ways but none work. Please go easy on me, as I am fairly new to this.
edit :
Example :
The page contains the following:
Name : john johnson
account # : 00000000123456
email :john#johnson.com
I need to find the account number, remove the 0's from the beginning, and replace 00000000123456 with 123456. The 123456 will then become a hyperlink that takes me to that account's page.

Hopefully this is what you are looking for (hint):
document.write("<p>Link: " + txt.link("http://www.example.com") + "</p>");

I believe this does what you want:
document.body.innerHTML = document.body.innerHTML.replace(/0+(\d{6})/g, function(match, number){
return "<a href='/search/" + number + "'>" + number + "</a>";
});
This finds any instance of a six-digit number preceded by one or more zeroes, and turns it into a link to '/search/[the six-digit number]'. You can run it in your browser's javascript console to test it out.
You may want to look into replacing only the relevant portion of the page, rather than the entire document (in the interest of performance).
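The pattern above also matches zeros and digits inside longer numbers. If the format is always exactly eight zeros followed by six digits, a stricter variant with word boundaries may be safer. The sketch below keeps the replacement as a pure string function so it can be tried outside the browser; the name linkAccountNumbers is an assumption, and the /search/ path is the placeholder used in the answer above.

```javascript
// Hypothetical helper: links numbers made of exactly eight zeros plus six digits.
function linkAccountNumbers(html) {
  // \b on both sides prevents matches inside longer digit sequences.
  return html.replace(/\b0{8}(\d{6})\b/g, function (match, number) {
    return "<a href='/search/" + number + "'>" + number + "</a>";
  });
}

console.log(linkAccountNumbers("account # : 00000000123456"));
// account # : <a href='/search/123456'>123456</a>
```

In the page, the function could be applied to the innerHTML of the smallest element that contains the account number rather than to the whole body.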

This could be one way to do it:
var body = document.getElementsByTagName('body')[0];
// search through each node in the DOM starting from the body element
(function search(child){
    for(var i = 0;i < child.childNodes.length;i++){
        search(child.childNodes[i]);
    }
    // if it's a text node
    if(child.nodeType === 3){
        // if the text starts with the number, replace the node with a link
        var result = /^0{8}(\d+)/.exec(child.textContent);
        if(result){
            var link = document.createElement('a');
            // change this to be the href you want
            link.href = result[1];
            // change this to be the text you want
            link.innerHTML = 'search '+result[1];
            child.parentNode.replaceChild(link, child);
        }
    }
})(body);
Demo: http://jsfiddle.net/louisbros/6a8fA/
