Data extraction from a generated <script> and process the results - javascript

string Url= "https://www.audiusa.com/dealers-webapp/map/dealer/423E99";
HtmlWeb web = new HtmlWeb();
ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;
HtmlDocument doc = web.Load(Url);
var scriptGoogleTagManager = doc.DocumentNode.SelectNodes("//script").Where(x => x.InnerHtml.Contains("window.Audi.Vars.searchType"));
if (scriptGoogleTagManager )
{
foreach(var tag in scriptGoogleTagManager)
{
var s = tag.InnerText;
Regex r = new Regex("\\s+window\\.Audi\\.Vars\\.searchResult\\s+\\=\\s+");
Match m = r.Match(s.ToLower());
}
}
In the script above, I want to extract the values that follow window.Audi.Vars.searchResult = and window.Audi.Vars.dealers =. I am having trouble with the regex because I don't know much about it. Please help.

I understand you want to get rid of e.g.
window.Audi.Vars.searchResult =
var extract = s.Substring(31); // the string "window.Audi.Vars.searchResult =" has 31 chars
Your code is C#, so use Substring() rather than JavaScript's slice(). Substring(startIndex) returns a new string that starts at the given position and runs to the end of the original; an optional second parameter gives the number of characters to extract. The first character has position 0, the second has position 1, and so on. Regex is, in my opinion, good for replacing or removing characters in a string; here a simpler method works. It only works if s begins with exactly that text, so check the printed content first.
Modify your code and post the console result:
var scriptGoogleTagManager = doc.DocumentNode.SelectNodes("//script").Where(x => x.InnerHtml.Contains("window.Audi.Vars.searchType"));
foreach (var tag in scriptGoogleTagManager)
{
    var s = tag.InnerText;
    Console.WriteLine("[content of s] " + s);
    var extract = s.Substring(31); // skip "window.Audi.Vars.searchResult ="
}
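Cutting at a fixed offset breaks as soon as the script text has leading whitespace or other statements before the assignment. A minimal sketch of a more tolerant approach, written in JavaScript (the language this page is tagged with) on a made-up sample string: the `sampleScript` contents and the `extractAssignment` helper are hypothetical, and it assumes each assignment sits on one line and ends with a semicolon.

```javascript
// Hypothetical sample of what the script text might look like.
var sampleScript = [
  'window.Audi.Vars.searchType = "dealer";',
  'window.Audi.Vars.searchResult = {"total": 1};',
  'window.Audi.Vars.dealers = [{"id": "423E99"}];'
].join('\n');

// Returns the text between "<name> =" and the semicolon that ends that line,
// or null when the assignment is not found.
function extractAssignment(source, name) {
  // Escape regex metacharacters (here, the dots) in the variable name.
  var escaped = name.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // "m" makes $ match at the end of each line, not only at the end of the text.
  var re = new RegExp(escaped + '\\s*=\\s*(.*?);\\s*$', 'm');
  var m = re.exec(source);
  return m ? m[1] : null;
}

console.log(extractAssignment(sampleScript, 'window.Audi.Vars.searchResult'));
console.log(extractAssignment(sampleScript, 'window.Audi.Vars.dealers'));
```

The captured text can then be passed to a JSON parser if the page really emits JSON there, which is an assumption about the page and not something the question confirms.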

Related

Find and replace a string with a number attached

Sample URL:
.com/projects.php?&filterDate=this_week?page=5
Query strings like the one above may or may not contain ?page=5. I'm looking for a way to grab the URL (done), check whether it has the ?page=# query string (also done), add it if it's not there (also done), and, if it is there, replace it with a different number (this is where I need help). The code below currently doesn't change the query string (i.e. page=5 doesn't change to page=6 or anything else). The regex passed to .replace doesn't seem to be correct (see the current_window_location3 variable).
//Get the current URL
var current_window_location = window.location.href;
if(current_window_location.match("\\?page=([^&]+)")){
//Replace this query string
var current_window_location3 = current_window_location.replace("\\?page=([^&]+)", new_page_num);
//Go to this newly replaced location
window.location = current_window_location3;
}else{
//Add clicked '?page=#' query string to the URL
var current_window_location2 = current_window_location + "?page="+new_page_num;
//Go to this new location
window.location = current_window_location2;
}
String.prototype.replace() takes the search pattern as its first argument, either a string or a regex. If a string ("abc?") is given, it searches for that literal text. If a regex (/abc?/) is given, the search is done using regular expressions. Your code passes a string, so the pattern is never interpreted as a regex.
Change your
var current_window_location3 = current_window_location.replace("\\?page=([^&]+)", new_page_num);
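to the following, which uses a regex literal and keeps the ?page= prefix in the replacement (otherwise the whole match, prefix included, is replaced by the bare number):
var current_window_location3 = current_window_location.replace(/\?page=([^&]+)/, "?page=" + new_page_num);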
Here's an illustrating snippet:
var current_window_location = '.com/projects.php?&filterDate=this_week?page=5',
    window_location = '',
    new_page_num = 123;
if (/\?page=[^&]+/.test(current_window_location)) {
    // Replace the existing page number, keeping the "?page=" prefix
    window_location = current_window_location.replace(/\?page=[^&]+/, "?page=" + new_page_num);
} else {
    // Append the '?page=#' query string to the URL
    window_location = current_window_location + "?page=" + new_page_num;
}
console.log("window_location = " + window_location);

regex: get string in url login/test

I have a url
https://test.com/login/param2
How do I get the second parameter "param2" from the URL using a regex?
the url can also be
https://test.com/login/param2/
So the regex should work for both urls.
I tried
var loc = window.location.href;
var locParts = loc.split('/');
and then looping through locParts, but that seems inefficient.
The "param2" can be have number, alphatical character from a-z, and a dash.
Use String#match method with regex /[^\/]+(?=\/?$)/.
var a = 'https://test.com/login/facebook',
b = 'https://test.com/login/facebook/';
var reg = /[^\/]+(?=\/?$)/;
console.log(
a.match(reg)[0],
b.match(reg)[0]
)
Or use String#split and take the last non-empty element.
var a = 'https://test.com/login/facebook',
b = 'https://test.com/login/facebook/';
var splita = a.split('/'),
splitb = b.split('/');
console.log(
splita.pop() || splita.pop(),
splitb.pop() || splitb.pop()
)
If you don't mind using plain string methods (so no regex), you can use this:
var lastParameter = window.location.href.split('/').slice(-1)[0];
Basically, like you, I fetch the URL and split it on the / character, but then I use the slice function to get the last element of the resulting array. Note that this yields an empty string when the URL ends with a trailing slash.
Regular expressions might be compact, but they're certainly not automatically efficient if you can do what you want without.
Here's how you can change your code:
var loc = 'https://test.com/login/facebook/'; // window.location.href;
var locParts = loc.split('/').filter(function(str) {return !!str});
var faceBookText = locParts.pop();
console.log(faceBookText);
The filter removes the last empty item you would get if the url ends with '/'. That's all you need, then just take the last item.
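For comparison, here is a sketch of a stricter regex that uses the character set given in the question (digits, letters, dashes) and anchors on the "/login/" prefix. The `loginParam` name is hypothetical, and the "/login/" anchor is an assumption taken from the sample URLs.

```javascript
// Hypothetical helper: returns the segment after "/login/", with or without
// a trailing slash, or null when the URL does not have that shape.
function loginParam(url) {
  var m = url.match(/\/login\/([a-z0-9-]+)\/?$/i);
  return m ? m[1] : null;
}

console.log(loginParam('https://test.com/login/param2'));
console.log(loginParam('https://test.com/login/param2/'));
console.log(loginParam('https://test.com/login/'));
```

Returning null instead of indexing the match directly avoids a TypeError when the URL does not match.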

regex to find specific strings in javascript

disclaimer - absolutely new to regexes....
I have a string like this:
subject=something||x-access-token=something
For this I need to extract two values. Subject and x-access-token.
As a starting point, I wanted to collect two strings: subject= and x-access-token=. For this here is what I did:
/[a-z,-]+=/g.exec(mystring)
It returns only one element, subject=. I expected both of them. What am I doing wrong?
exec returns only one match per call, even with the g modifier; the flag just makes each call resume where the previous one stopped. To get all matches at once, use the match method:
mystring.match(/[a-z,-]+=/g)
No regex necessary. Write a tiny parser, it's easy.
function parseValues(str) {
var result = {};
str.split("||").forEach(function (item) {
var parts = item.split("=");
result[ parts[0] /* key */ ] = parts[1]; /* value */
});
return result;
}
usage
var obj = parseValues("subject=something||x-access-token=something-else");
// -> {subject: "something", x-access-token: "something-else"}
var subj = obj.subject;
// -> "something"
var token = obj["x-access-token"];
// -> "something-else"
Additional complications may arise when there is an escaping scheme involved that allows you to have || inside a value, or when a value can contain an =.
You will hit these complications with the regex approach as well, but with a parser-based approach they will be much easier to solve.
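As an illustration of the second complication, a value that contains "=" can be handled by splitting each item only at its first "=". This is a sketch, not a full parser: `parsePairs` is a hypothetical name, and it still assumes "||" never appears inside a value.

```javascript
// Hypothetical variant of the parser that keeps "=" characters inside values.
function parsePairs(str) {
  var result = {};
  str.split('||').forEach(function (item) {
    var eq = item.indexOf('=');
    if (eq === -1) return; // skip segments that are not key=value
    // Everything before the first "=" is the key, everything after is the value.
    result[item.slice(0, eq)] = item.slice(eq + 1);
  });
  return result;
}

var parsed = parsePairs('subject=a=b||x-access-token=abc==');
console.log(parsed.subject);
console.log(parsed['x-access-token']);
```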
You have to execute exec twice to get 2 extracted strings.
According to MDN: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/exec
If your regular expression uses the "g" flag, you can use the exec() method multiple times to find successive matches in the same string.
Usually, people extract all strings matching the pattern one by one with a while loop. Run the following code in the browser console to see how it works.
var regex = /[a-z,-]+=/g;
var string = "subject=something||x-access-token=something";
var matched;
while ((matched = regex.exec(string)) !== null) console.log(matched);
You can convert the string into a valid JSON string, then parse it to retrieve an object containing the expected data.
var str = 'subject=something||x-access-token=something';
var obj = JSON.parse('{"' + str.replace(/=/g, '":"').replace(/\|\|/g, '","') + '"}');
console.log(obj);
I don't think you need regexp here, just use the javascript builtin function "split".
var s = "subject=something1||x-access-token=something2";
var r = s.split('||'); // r now is an array: ["subject=something1", "x-access-token=something2"]
var i;
for(i=0; i<r.length; i++){
// for each array's item, split again
r[i] = r[i].split('=');
}
At the end you have a matrix like the following:
        x=0               x=1
y=0     subject           something1
y=1     x-access-token    something2
And you can access the elements as r[y][x]:
"subject" == r[0][0]
"x-access-token" == r[1][0]
"something2" == r[1][1]
If you really want to do it with a pure regexp:
var input = 'subject=something1||x-access-token=something2'
var m = /subject=(.*)\|\|x-access-token=(.*)/.exec(input)
var subject = m[1]
var xAccessToken = m[2]
console.log(subject);
console.log(xAccessToken);
However, it would probably be cleaner to split it instead:
console.log('subject=something||x-access-token=something'
.split(/\|\|/)
.map(function(a) {
a = a.split(/=/);
return { key: a[0], val: a[1] }
}));

How can we split a string using starts with regular expression (/^myString/g)

I have a case where I need to split a given string using a starts-with regex (/^'searchString'/), which is not working. For example:
"token=123412acascasdaASDFADS".split('token=')
Here I want to extract the token value, but there might be other parameters, such as
"reset_token=SDFDFdsf12313ADADF".split('token=')
Here the string is also split on 'token=', which is why I need a regex that splits the string only where it starts with the given string.
Thanks..
EDITED
Thanks for your valuable responses. This issue can be resolved using /\btoken=/, but it does not work if 'token=' is stored as a string in a variable, such as
sParam = 'token=';
"token=123412acascasdaASDFADS".split(/\bsParam/);
This does not work.
You can use regex in split with word boundary:
"token=123412acascasdaASDFADS".split(/\btoken=/)
If token is stored in a variable then use RegExp constructor:
var sParam = "token";
var re = new RegExp("\\b" + sParam + "=");
Then use it:
var tokens = "token=123412acascasdaASDFADS".split( re );
This is the use case for the \b anchor:
\btoken=
It ensures there's no other word character before token (a word character being [a-zA-Z0-9_])
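When the parameter name comes from a variable, any regex metacharacters in it would be interpreted by the RegExp constructor. A minimal sketch that escapes the name first; `escapeRegExp` and `splitOnParam` are hypothetical helper names.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: splits only where paramName starts at a word boundary.
function splitOnParam(input, paramName) {
  var re = new RegExp('\\b' + escapeRegExp(paramName) + '=');
  return input.split(re);
}

console.log(splitOnParam('token=123412acascasdaASDFADS', 'token'));
console.log(splitOnParam('reset_token=SDFDFdsf12313ADADF', 'token'));
```

The second call leaves the string unsplit because the underscore before "token" is a word character, so there is no word boundary at that position.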
You need to split the string using the & parameter delimiter, then loop through those parameters:
var token;
$.each(params.split('&'), function() {
var parval = this.split('=');
if (parval[0] == "token") {
token = parval[1];
return false; // end the $.each loop
}
});
If you just use token= as the split delimiter, you'll include all the other parameters after it in the value.
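The same loop can be written without jQuery. This is a sketch under the assumption that the input is a plain "a=1&b=2" style string; `getParam` is a hypothetical name.

```javascript
// Hypothetical helper: returns the value of the named parameter,
// or null when the parameter is not present.
function getParam(query, name) {
  var pairs = query.split('&');
  for (var i = 0; i < pairs.length; i++) {
    var eq = pairs[i].indexOf('=');
    var key = eq === -1 ? pairs[i] : pairs[i].slice(0, eq);
    // Compare the whole key, so "reset_token" never matches "token".
    if (key === name) {
      return eq === -1 ? '' : pairs[i].slice(eq + 1);
    }
  }
  return null;
}

var query = 'reset_token=SDFDFdsf12313ADADF&token=123412acascasdaASDFADS&someval=foo';
console.log(getParam(query, 'token'));
console.log(getParam(query, 'reset_token'));
```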
It's not clear what you need, but this may be an idea to work with?
var reqstr = "token=12345&reset_token=SDFDFdsf12313ADADF&someval=foo"
.split(/[&=]/)
,req = [];
reqstr.map( function (v, i) {
if (i%2==0) {
var o = {};
o[/token/i.test(v) ? 'token' : v] = reqstr[i+1];
this.push(o);
} return v
}, req);
/* => req now contains:
[ { token: '12345' },
{ token: 'SDFDFdsf12313ADADF' },
{ someval: 'foo' } ]
*/
You can try the String#match() function and get the captured group from index 1.
Sample code:
var re = /^token=(.*)$/;
var str = 'token=123412acascasdaASDFADS';
console.log(str.match(re)[1]);
output:
123412acascasdaASDFADS
If token is dynamic then use RegExp
var token='token=';
var re = new RegExp("^"+token+"(.*)$");
var str = 'token=123412acascasdaASDFADS';
console.log(str.match(re)[1]);

Split string into two parts [duplicate]

This question already has answers here:
Split string once in javascript?
(17 answers)
Closed 8 years ago.
I know there are several ways to split an array in jQuery but I have a special case:
If I have, for example, these two strings:
"G09.4 What"
"A04.3 A new Code"
When I split the first on ' ', I can simply take the code at the front with [0], which is G09.4. And when I use [1] I get the text: What
But when I do the same with the second string, [1] gives me A, whereas I want to retrieve A new Code.
So how can I retrieve the code and the text separately for each string?
Use
var someString = "A04.3 A new Code";
var index = someString.indexOf(" "); // Gets the index of the first space
var id = someString.substr(0, index); // Gets the first part
var text = someString.substr(index + 1); // Gets the text part
You can split the string and shift off the first entry in the returned array. Then join the leftovers e.g.
var chunks = "A04.3 A new Code".split(/\s+/);
var arr = [chunks.shift(), chunks.join(' ')];
// arr[0] = "A04.3"
// arr[1] = "A new Code"
Instead of splitting the string on the space, use a combination of indexOf and slice:
var s = "A04.3 A new Code";
var i = s.indexOf(' ');
var partOne = s.slice(0, i).trim();
var partTwo = s.slice(i + 1, s.length).trim();
You can use match() and capture what you need via a regular expression:
"G09.4 What".match(/^(\S+)\s+(.+)/)
// => ["G09.4 What", "G09.4", "What"]
"A04.3 A new Code".match(/^(\S+)\s+(.+)/)
// => ["A04.3 A new Code", "A04.3", "A new Code"]
As you can see the two items you want are in [1] and [2] of the returned arrays.
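Since match() returns null when the pattern does not match, indexing the result directly can throw. A small sketch that wraps the same pattern and guards against that case; `splitCode` is a hypothetical helper name.

```javascript
// Hypothetical helper: splits "CODE some text" into its two parts,
// or returns null when the string contains no whitespace-separated text.
function splitCode(str) {
  var m = str.match(/^(\S+)\s+(.+)$/);
  return m ? { code: m[1], text: m[2] } : null;
}

console.log(splitCode('G09.4 What'));
console.log(splitCode('A04.3 A new Code'));
console.log(splitCode('NoSpaceHere'));
```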
What about this one:
function split2(str, delim) {
var parts=str.split(delim);
return [parts[0], parts.splice(1,parts.length).join(delim)];
}
Or for more performance, try this:
function split2s(str, delim) {
var p=str.indexOf(delim);
if (p !== -1) {
return [str.substring(0,p), str.substring(p+delim.length)]; // skip the whole delimiter, not just one character
} else {
return [str];
}
}
You can get the code and then remove it from the original string leaving you with both the code and the string without the code.
var originalString = "A04.3 A new Code",
stringArray = originalString.split(' '),
code,
newString;
code = stringArray[0];
newString = originalString.replace(code, '').trim(); // trim the space left behind after the code
