URL extraction from string - javascript

I found a regular expression that is supposed to capture URLs, but it doesn't capture some of them.
$("#links").change(function() {
    //var matches = new array();
    var linksStr = $("#links").val();
    var pattern = new RegExp("^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$","g");
    var matches = linksStr.match(pattern);
    for(var i = 0; i < matches.length; i++) {
        // use matches[i] here
    }
});
It doesn't capture this URL (and I need it to):
But it does capture this one:

Several things:
1. The main reason it didn't work is that when you pass a string to RegExp(), the string literal itself consumes one level of backslashes, so you need to double them. So this:
"^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$"
Should be:
"^(https?:\/\/)?([\\da-z\\.-]+)\\.([a-z\\.]{2,6})([\/\\w \\.-]*)*\/?$"
2. Next, you said that Firefox reported "Regular expression too complex". This suggests that linksStr contains several lines of URL candidates. Therefore, you also need to pass the m flag to RegExp().
3. The existing regex rejects legitimate values, e.g. "HTTP://STACKOVERFLOW.COM". So also pass the i flag to RegExp().
4. Whitespace always creeps in, especially in multi-line values. Use a leading \s* and $.trim() to deal with it.
5. Are relative links, e.g. /file/63075291/LlMlTL355-EN6-SU8S.rar, not allowed?
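You can see the backslash problem directly by inspecting `source`, which shows what the regex engine actually received. This is a minimal sketch that runs in any modern JavaScript engine. The host name is a made-up test value.

```javascript
// In a string literal, "\d" is just "d" and "\." is just ".", so the
// backslashes never reach the regex engine unless they are doubled.
const broken = new RegExp("^[\da-z\.-]+$");  // engine sees ^[da-z.-]+$
const fixed = new RegExp("^[\\da-z\\.-]+$"); // engine sees ^[\da-z\.-]+$
const literal = /^[\da-z\.-]+$/;             // regex literal: no doubling needed

const host = "file63075291.wupload.com"; // hypothetical host containing digits
console.log(broken.source, broken.test(host));   // ^[da-z.-]+$ false
console.log(fixed.source, fixed.test(host));     // ^[\da-z\.-]+$ true
console.log(literal.source, literal.test(host)); // ^[\da-z\.-]+$ true
```

Using a regex literal (/.../flags) avoids the double escaping entirely whenever the pattern does not need to be built at runtime.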
Putting it all together (except for item 5), it becomes:
var linksStr = "http://www.wupload.com/file/63075291/LlMlTL355-EN6-SU8S.rar \n"
+ " http://XXXupload.co.uk/fun.exe \n "
+ " WWW.Yupload.mil ";
var pattern = new RegExp (
    "^\\s*(https?:\/\/)?([\\da-z\\.-]+)\\.([a-z\\.]{2,6})([\/\\w \\.-]*)*\/?$"
    , "img"
);
var matches = linksStr.match(pattern);
for (var J = 0, L = matches.length; J < L; J++) {
    console.log ( $.trim (matches[J]) );
}
Which yields:

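Why not just do:
var urls = str.match(/https?:[^\s]+/ig);

This will locate every URL in the text that starts with http: or https: (scheme-less ones such as www.example.com are not matched).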
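Here is a short sketch of the one-liner applied to the sample linksStr from the earlier answer. It also shows what it does and does not pick up. The `|| []` guard is an addition: match() returns null when nothing matches.

```javascript
// Sample input copied from the earlier answer.
const linksStr =
  "http://www.wupload.com/file/63075291/LlMlTL355-EN6-SU8S.rar \n" +
  " http://XXXupload.co.uk/fun.exe \n " +
  " WWW.Yupload.mil ";

// [^\s]+ runs until the next whitespace character, so each URL must be
// separated from its neighbours by spaces or newlines.
const urls = linksStr.match(/https?:[^\s]+/gi) || [];
console.log(urls);
// [ 'http://www.wupload.com/file/63075291/LlMlTL355-EN6-SU8S.rar',
//   'http://XXXupload.co.uk/fun.exe' ]
```

WWW.Yupload.mil is skipped because it has no scheme. Trailing punctuation such as a sentence-ending "." would be kept as part of the URL.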


Javascript regex invalid quantifier error to find 8 digit number in PDF

I have the following javascript code:
/* Extract pages to folder */
// Regular expression used to acquire the base name of file
var re = /\.pdf$/i;
// filename is the base name of the file Acrobat is working on
var filename = this.documentFileName.replace(re,"");
try {
    for (var i = 0; i < this.numPages; i++) {
        var id = /\ (?<!\d)\d{8}(?!\d)/;
        this.extractPages({
            nStart: i,
            cPath: "/J/my file path/" + "SBIC_" + id + ".pdf"
        });
    }
} catch (e) { console.println("Aborted: " + e) }
I get an "invalid quantifier" error on this line of code: var id = /\ (?<!\d)\d{8}(?!\d)/
However, the same regex pulls out the ID 22001188 when I test it on https://regex101.com/ against the text "I.D. Control 22001188".
Do I have to integrate the regex a different way in the code for it to search through the text in the document?
UPDATED 1/30/2023
I am using the regex below to find the 8-digit ID I need. First, I put all of the PDF's text into a string, and then I use a search query to find it. Now I just need to figure out how to store the result in a variable so I can extract each page of the PDF by its ID.
/* Extract pages to folder */
// function padLeft(s,len,c){c=c || '0'; while(s.length< len) s= c+s; return s; }
// Regular expression used to acquire the base name of file
var re = /\.pdf$/i;
// filename is the base name of the file Acrobat is working on
var filename = this.documentFileName.replace(re,"");
for (var i = 0; i < this.numPages; i++) { // Loop through the entire document
    var numWords = this.getPageNumWords(i); // Find out how many words are on the page
    var WordString = ""; // Prepare a string
    for (var j = 0; j < numWords; j++) { // Put all the words on the page into a string
        WordString = WordString + " " + this.getPageNthWord(i, j);
    }
    if (WordString.match(/\b\d{8}\b/)) { // Search for an 8-digit ID in the string
        search.matchWholeWord = true; // If we got here, search for that ID in the document
        search.query(WordString.match(/\b\d{8}\b/)[0], "ActiveDoc");
    }
}
UPDATED 2/2/2023
Below is the working code that extracts every page from the PDF and names each one after the 8-digit ID found in the text of that page.
// Regular expression used to acquire the base name of file
var re = /\.pdf$/i;
// filename is the base name of the file Acrobat is working on
var filename = this.documentFileName.replace(re,"");
for (var i = 0; i < this.numPages; i++) { // Loop through the entire document
    var numWords = this.getPageNumWords(i); // Find out how many words are on the page
    var WordString = ""; // Prepare a string
    for (var j = 0; j < numWords; j++) // Put all the words on the page into a string
    {WordString = WordString + " " + this.getPageNthWord(i, j);}
    // ID is a match array (or null); concatenating it below turns it into the matched digits
    var ID = WordString.match(/\b\d{8}\b/); // Search for the ID control # in the string
    this.extractPages({
        nStart: i,
        cPath: "/J/Middle Office Read/Operational Support/SBA Spreadsheets & Forms/Funded SBAs/" + "SBIC_" + ID + ".pdf"
    });
}
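In the loop above, a page with no 8-digit ID makes ID null, so the output file becomes "SBIC_null.pdf". The sketch below pulls the lookup into plain functions that can be tested outside Acrobat and adds a fallback name. findControlId and outputName are hypothetical helpers, not part of the Acrobat API. The page-number fallback is an assumed naming scheme. The code sticks to ES5 syntax (var, no arrow functions) on the assumption that Acrobat's older JavaScript engine may not accept newer syntax.

```javascript
// Hypothetical helper: given the words of one page (as collected with
// getPageNthWord()), return the first standalone 8-digit ID, or null.
function findControlId(words) {
  var pageText = words.join(" ");
  var match = pageText.match(/\b\d{8}\b/);
  return match ? match[0] : null;
}

// Hypothetical file-name builder: falls back to the 1-based page number
// so that a page without an ID does not become "SBIC_null.pdf".
function outputName(words, pageIndex) {
  var id = findControlId(words);
  return "SBIC_" + (id !== null ? id : "page" + (pageIndex + 1)) + ".pdf";
}

console.log(outputName(["I.D.", "Control", "22001188"], 0)); // SBIC_22001188.pdf
console.log(outputName(["No", "ID", "here"], 4));            // SBIC_page5.pdf
```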
The sequence (?<! starts a negative lookbehind, which is not yet supported by every browser or JavaScript engine (lookbehind was only added to the language in ES2018).
It seems that it is not supported in your case.
You may use word boundaries in regex as given below to extract 8-digit numbers from your string:
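This is a minimal sketch of the word-boundary approach, using the sample text from the question. It is our own example, written in ES5 style and runnable in any JavaScript engine.

```javascript
// \b\d{8}\b: eight digits with a word boundary on each side.
// No lookbehind is involved, so older engines accept it.
var text = "I.D. Control 22001188";
var match = text.match(/\b\d{8}\b/);
console.log(match ? match[0] : "no ID found"); // 22001188

// A longer digit run is rejected: there is no boundary between two digits.
console.log(/\b\d{8}\b/.test("Ref 123456789")); // false

// Unlike (?<!\d), \b also needs a non-word character (or string edge) before
// the digits, so an ID glued to letters is not found.
console.log(/\b\d{8}\b/.test("ID22001188")); // false
```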
The (?<!\d) part is probably the problem. A lookahead such as (?!\d) is standard JavaScript, but a lookbehind such as (?<!\d) is only supported by newer regex engines.
You can instead use ^\d{8}$ to match a string (or, with the m flag, a line) that consists of exactly 8 digits, or \b\d{8}\b to match 8 digits surrounded by word boundaries, as suggested in ayush-s's answer.
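The anchored form only works if the ID sits on its own line. Without the m flag, ^ and $ anchor the whole string. This sketch uses made-up multi-line text.

```javascript
var pageText = "I.D. Control\n22001188\nPage 1"; // hypothetical page text
console.log(/^\d{8}$/.test(pageText));      // false: ^ and $ anchor the whole string
console.log(/^\d{8}$/m.test(pageText));     // true: with m they anchor each line
console.log(pageText.match(/^\d{8}$/m)[0]); // 22001188
```

In the Acrobat loop above, the words are joined with spaces rather than newlines, so \b\d{8}\b is the variant that fits there.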

Javascript: Get first number substring for each semi-colon separated substring

I am writing a script that does time calculations on data from MySQL, because I don't want to run them server-side in PHP.
I am fetching the data and parsing it as JSON, which gives me a string of values for the column and row data. The format of this data looks like:
I need to split this string on semicolons and then extract the first VARCHAR number (the part before the first comma) from each piece, to use it in subsequent calculations.
So for example, I would like to extract the following from the data above:
[1548145153, 1548145209, 1548148072, 1548161279, 1548145161, 1548148082, 1548161291]
I used the following kind of for-loop, but it is not working as I wanted:
for (var i=0; i < words.length; i++) {
    var1 = words[i];
}
The string and the for-loop together look like this:
var processData = function(data) {
    for(var a = 0; a < data.length; a++) {
        var obj = data[a];
        var str= obj.report // something like 1548145153,1548145165,End,Day;1548145209,1548145215,End,Day;1548148072,1548148086,End,Day;1548161279,1548161294,End,Day;1548145161,1548145163,End,Day;1548148082,1548148083,End,Day;1548161291,1548161293,End,Day
        words = str.split(',');
        words = str.split(';');
        for (var i=0; i < words.length; i++) {
            var1 = words[i];
            var2 = var1[0];
        }
    }
};
Here is an approach based on a regular expression:
const str = "1548145153,1548145165,End,Day;1548145209,1548145215,End,Day;1548148072,1548148086,End,Day;1548161279,1548161294,End,Day;1548145161,1548145163,End,Day;1548148082,1548148083,End,Day;1548161291,1548161293,End,Day";
const ids = str.match(/(?<=;)(\d+)|(^\d+(?=,))/gi)
The general idea here is to classify the first VARCHAR value as either:
- a number sequence directly preceded by a ; character (see 1 below), or, for the edge case,
- the very first number sequence of the input string, directly followed by a , character (see 2 below).
These two cases are expressed as follows:
1. Match any number sequence that is preceded by a ; using the positive lookbehind (?<=;)(\d+), where ; is the character that must directly precede the number sequence \d+ for it to match.
2. Match the number sequence at the very start of the input string that has a , directly after it, using the lookahead (^\d+(?=,)), where \d+ is the number sequence and , is the character that must directly follow it for it to match.
These building blocks 1 and 2 are combined with the | operator to achieve the final result.
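A runnable version of the regex approach is shown below. It assumes an engine with ES2018 lookbehind support (current browsers and Node.js). The i flag is dropped because the pattern contains no letters. .map(Number) is added to turn the matched strings into the numbers listed in the question.

```javascript
const str = "1548145153,1548145165,End,Day;1548145209,1548145215,End,Day;1548148072,1548148086,End,Day;1548161279,1548161294,End,Day;1548145161,1548145163,End,Day;1548148082,1548148083,End,Day;1548161291,1548161293,End,Day";

// With the g flag, match() returns every full match: digits after a ";",
// plus the digits at the very start of the string that are followed by ",".
const ids = str.match(/(?<=;)(\d+)|(^\d+(?=,))/g).map(Number);
console.log(ids);
// [1548145153, 1548145209, 1548148072, 1548161279, 1548145161, 1548148082, 1548161291]
```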
The first problem is that you overwrite words with the result of str.split(';'), so the earlier split(',') is thrown away and words won't hold what you expect. To break the string into chunks, split by ; first, then iterate over the resulting array and, inside the loop, split each chunk by ,.
const str= "1548145153,1548145165,End,Day;1548145209,1548145215,End,Day;1548148072,1548148086,End,Day;1548161279,1548161294,End,Day;1548145161,1548145163,End,Day;1548148082,1548148083,End,Day;1548161291,1548161293,End,Day";
const lines = str.split(';');
lines.forEach(line => {
    const parts = line.split(',');
    console.log(parts[0]); // first number of this chunk
});
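The same split-then-split idea can collect the results into an array instead of logging them. This is a sketch. The empty-chunk filter is an added safeguard in case the report string ends with a trailing ";", which would otherwise produce an extra 0.

```javascript
const str = "1548145153,1548145165,End,Day;1548145209,1548145215,End,Day;1548148072,1548148086,End,Day;1548161279,1548161294,End,Day;1548145161,1548145163,End,Day;1548148082,1548148083,End,Day;1548161291,1548161293,End,Day";

const firstNumbers = str
  .split(";")                                  // one chunk per record
  .filter(chunk => chunk !== "")               // ignore an empty chunk after a trailing ";"
  .map(chunk => Number(chunk.split(",")[0]));  // first comma-separated field, as a number

console.log(firstNumbers);
// [1548145153, 1548145209, 1548148072, 1548161279, 1548145161, 1548148082, 1548161291]
```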
What you are doing is not correct: you have to split the string twice, because there are two separators, a semicolon and a comma.
I think you need a nested loop for that.
var str = "1548145153,1548145165,End,Day;1548145209,1548145215,End,Day;1548148072,1548148086,End,Day;1548161279,1548161294,End,Day;1548145161,1548145163,End,Day;1548148082,1548148083,End,Day;1548161291,1548161293,End,Day"
let words = str.split(';');
for (var i=0; i < words.length; i++) {
    let varChars = words[i].split(',');
    for (var j=0; j < varChars.length; j++) {
        // work with varChars[j] here; varChars[0] is the first number of this chunk
    }
}
I hope this helps. Please don't forget to mark the answer.

Javascript respecting backslashes in input: negative lookbehind

In Javascript, I have a situation where I get input which I .split(/[ \n\t]/g) into an array. The point is that if a space is directly preceded by a backslash, I don't want the split to happen there.
E.g. is_multiply___spaced_text -> ['is','multiply','','','spaced','text']
But: is\_multiply\___spaced_text -> ['is multiply ','','spaced','text']
(Underscores used for spaces for clarity)
If this weren't JavaScript (which doesn't support lookbehinds in regexes), I'd just use /(?<!\\)[ \n\t]/g. That doesn't work, so what would be the best way to handle this?
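For reference, engines that implement ES2018 lookbehind (current Chrome, Firefox, Safari and Node.js) do accept the pattern from the question. This is a sketch under that assumption. The second step, which strips the escaping backslashes, is our own addition to reproduce the expected output shown above.

```javascript
const input = "is\\ multiply\\   spaced text"; // i.e. is\ multiply\   spaced text

const parts = input
  .split(/(?<!\\)[ \n\t]/)                           // split only on whitespace not preceded by "\"
  .map(part => part.replace(/\\([ \n\t])/g, "$1")); // drop the backslash in front of escaped whitespace

console.log(parts); // [ 'is multiply ', '', 'spaced', 'text' ]
```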
You can reverse the string, then use negative lookahead and then reverse the strings in the array:
var pre_results = "is\\ multiply\\   spaced text".split('').reverse().join('').split(/[ \t](?!\\)/);
var results = [];
for(var i = 0; i < pre_results.length; i++) {
    results.push(pre_results[i].split('').reverse().join(''));
}
for(var i = 0; i < results.length; i++) {
    document.write(results[i] + "<br>");
}
In this example, the result should be as follows (note that the pieces come out in reverse order):
['text', 'spaced', '', 'is\\ multiply\\ ']
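The reverse trick can be wrapped in a reusable function that also puts the pieces back in their original order. This is a sketch, not the answer's exact code: splitUnescaped is a hypothetical name, and newlines are added to the split class to match the question. Reversing with split("") assumes plain ASCII-like text (it would break surrogate pairs such as emoji).

```javascript
// Lookbehind-free split: in the reversed string, "preceded by \" becomes
// "followed by \", which a plain negative lookahead can express.
function splitUnescaped(input) {
  const reversed = input.split("").reverse().join("");
  return reversed
    .split(/[ \n\t](?!\\)/)                          // split on whitespace not followed by "\"
    .map(part => part.split("").reverse().join("")) // un-reverse each piece
    .reverse();                                      // restore the original piece order
}

console.log(splitUnescaped("is\\ multiply\\   spaced text"));
// [ 'is\\ multiply\\ ', '', 'spaced', 'text' ]
```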
"is\_multiply\___spaced_text".replace(/\_/, " ").replace(/_/, " ").split("_");

Yet Another document.referrer.pathname Thing

I'm looking for the equivalent of "document.referrer.pathname". I know there are other questions that are similar to this on SO, but none of them handle all the use cases. For example:
All examples should return:
Some folks may want the trailing slash included, but I don't because I'm matching against a list of referrers.
I've started with:
and am struggling to add the query-string parsing.
If you have URLs as strings, you can create an empty anchor element, assign the URL to its href, and then read its pathname:
var url = 'http://example.com/RESULT?query=string', // or document.referrer
a = document.createElement('a');
a.href = url;
var result = a.pathname.replace(/(^\/|\/$)/g,'');
I set up a test example for you here: http://jsfiddle.net/eWydy/
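Where the WHATWG URL class is available (modern browsers and Node.js), it parses the string without creating a DOM element. This sketch assumes that. The function name referrerPath is hypothetical, and the empty-string guard is added because document.referrer is "" when there is no referrer and new URL("") throws.

```javascript
function referrerPath(url) {
  if (!url) return ""; // no referrer
  // pathname excludes the query string and hash; strip leading/trailing slashes
  return new URL(url).pathname.replace(/^\/+|\/+$/g, "");
}

console.log(referrerPath("http://example.com/RESULT?query=string")); // RESULT
console.log(referrerPath("https://example.com/a/b/#top"));           // a/b
console.log(referrerPath("http://example.com"));                     // (empty string)
```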
Try this regular expression:
If you don't want to create a new element for it or rely on a.pathname, I'd suggest using indexOf and slice.
function getPath(s) {
    var i = s.indexOf('://') + 3, j;
    i = s.indexOf('/',i) + 1; // find first / (ie. after .com) and start at the next char
    if( i === 0 ) return '';
    j = s.indexOf('?',i); // find first ? after first / (as before doesn't matter anyway)
    if( j == -1 ) j = s.length; // if no ?, use until end of string
    while( s[j-1] === '/' ) j = j - 1; // get rid of ending /s
    return s.slice(i, j); // return what we've ended up at
}
If you want a regex, though, maybe this:
which means "find the first ://, keep going until the next /, then capture everything that isn't a ? up to a ?, the last /, or the end of the string". That is basically the same as the function above.

Javascript Regex: Get everything from inside / tags

What I want
From the above subject I want to get search=adam and page=content and message=2.
What I have tried so far
But this is not good, because sometimes the subject ends with nothing, whereas in my case there must be a /.
But this is not good either, because it goes past the (\/*?) and shows me everything after /search=.
Tool Tip:
Regex Tester
Use String.split(), no regex required:
var A = '/search=adam/page=content/message=2'.split('/');
Note that you may have to discard the first array item using .slice(1).
Then you can iterate through the name-value pairs using something like:
for(var x = 0; x < A.length; x++) {
    var nameValue = A[x].split('=');
    if(nameValue[0] == 'search') {
        // do something with nameValue[1]
    }
}
This assumes that the values contain no equals signs. Hopefully that is the case; if not, you could use nameValue.slice(1).join('=') instead of nameValue[1].
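Put together, the split approach can build a lookup object in one pass. This is a sketch: parsePathParams is a hypothetical name. Filtering out empty pieces replaces the .slice(1) step and also copes with a trailing slash.

```javascript
function parsePathParams(subject) {
  const params = {};
  subject
    .split("/")
    .filter(pair => pair !== "") // skip empty pieces from leading/trailing slashes
    .forEach(pair => {
      const nameValue = pair.split("=");
      params[nameValue[0]] = nameValue.slice(1).join("="); // keeps any "=" inside the value
    });
  return params;
}

console.log(parsePathParams("/search=adam/page=content/message=2"));
// { search: 'adam', page: 'content', message: '2' }
```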
shows me everyting what's after /search=
You used a greedy .* that will happily match slashes as well. You can use a non-greedy .*?, or a character class that excludes the slash:
Here the front and end may be either a slash or the start/end (^/$) of the string. (I removed the +s, as I can't work out at all what they're supposed to be doing.)
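The sketch below illustrates the "character class that excludes the slash" idea. The patterns are our own and not necessarily the ones from the answer above.

```javascript
const subject = "/search=adam/page=content/message=2";

// [^\/]+ is a run of non-slash characters, so each match stops at the next "/".
const pairs = subject.match(/[^\/]+/g);
console.log(pairs); // [ 'search=adam', 'page=content', 'message=2' ]

// Capture one value: the name must follow a "/" or the start of the string,
// and the value runs until the next "/" or the end of the string.
const search = subject.match(/(?:^|\/)search=([^\/]*)/);
console.log(search ? search[1] : null); // adam
```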
Alternatively, forget the regex:
var params= {};
var pieces= subject.split('/');
for (var i= pieces.length; i-->0;) {
    var ix= pieces[i].indexOf('=');
    if (ix!==-1)
        params[pieces[i].slice(0, ix)]= pieces[i].slice(ix+1);
}
Now you can just say params.search, params.page etc.

