Detect URL pattern - JavaScript

I have data-collecting software.
The data: site visits/views.
So I have a lot of view data: page URL, date, visitor info.
Most of the URLs are just different filters or something similar, i.e. the URLs are the same but have dynamic parameters.
For example:
site1.com/?search=something
site1.com/?search=some_word
site1.com/?search=hello
site1.com/?search=world
These should be "detected" as site1.com/?search={variable}.
So here is the question:
Are there any algorithms to auto-detect URL patterns?
Or some analyzing classes/functions? Any programming language is fine.
I need a solution that can process big batches of URLs without any manual pattern definitions (because I don't know the patterns and can't define them by hand for many different sites).
UPDATE
For example:
I have many different URLs from many sites, and I don't know how these sites work. So I need to take, for example, 500 URLs from one site, compare them, and group them by their common parts to get 10 unique URLs as a result, with any dynamic URL parts automatically replaced by {var}.

I think you won't get much out of a simple pattern, and you'll have to write a somewhat complex algorithm, something along the lines of:
Break each URI into its parts: domain, page, and query string (as key-value pairs).
Group all URIs from the same domain.
If there is a page, group by that too (most sites today use URL rewrite rules, so there often isn't a real "page").
Here comes the "hard part":
Match the query-string variables between the grouped URIs.
If a variable appears in all (or almost all) URIs, it might be meaningful to the content.
If all (or almost all) of them have the same value, it might be something less meaningful...
Note: you should also pre-check some common variable names such as search, q, query, id, itemId, etc.
One last thing: as I mentioned, today parts of the URL other than the query string can also carry dynamic parameters (e.g. eBay items: www.ebay.com/itm/9125483 -> www.ebay.com/itm/{itemId}).
But hey, that's what you're paid for: to think about all those issues :p
Good luck.
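A minimal JavaScript sketch of the grouping steps above, under stated assumptions: the threshold MAX_DISTINCT and the helper names are hypothetical, a query variable counts as "dynamic" when it takes more distinct values than the threshold within one host+path group, and purely numeric path segments are treated as ids (as in the eBay example). It is one way to implement the idea, not a definitive algorithm.

```javascript
// Assumptions: MAX_DISTINCT is a hypothetical threshold (tune it for your data);
// purely numeric path segments are treated as ids.
const MAX_DISTINCT = 3;

// Replace purely numeric path segments, e.g. /itm/9125483 -> /itm/{id}
function normalizePath(pathname) {
  return pathname
    .split("/")
    .map((seg) => (/^\d+$/.test(seg) ? "{id}" : seg))
    .join("/");
}

function detectPatterns(urls) {
  // 1. Parse each URL and group it by host + normalized path.
  const groups = new Map(); // "host/path" -> array of URLSearchParams
  for (const raw of urls) {
    // The question's URLs have no scheme, so assume http:// when it is missing.
    const u = new URL(/^[a-z][a-z0-9+.-]*:\/\//i.test(raw) ? raw : "http://" + raw);
    const base = u.host + normalizePath(u.pathname);
    if (!groups.has(base)) groups.set(base, []);
    groups.get(base).push(u.searchParams);
  }

  const patterns = new Map(); // pattern -> number of URLs it covers
  for (const [base, paramsList] of groups) {
    // 2. Count the distinct values of each query variable within the group.
    const distinct = new Map(); // key -> Set of values
    for (const params of paramsList) {
      for (const [k, v] of params) {
        if (!distinct.has(k)) distinct.set(k, new Set());
        distinct.get(k).add(v);
      }
    }
    // 3. Rebuild each URL, replacing highly variable values with {var}.
    for (const params of paramsList) {
      const parts = [];
      for (const [k, v] of params) {
        parts.push(`${k}=${distinct.get(k).size > MAX_DISTINCT ? "{var}" : v}`);
      }
      parts.sort(); // parameter order should not create separate patterns
      const pattern = parts.length ? `${base}?${parts.join("&")}` : base;
      patterns.set(pattern, (patterns.get(pattern) || 0) + 1);
    }
  }
  return patterns;
}
```

With the question's four site1.com/?search=... URLs this yields a single pattern, site1.com/?search={var}, covering 4 URLs. Pre-checking well-known names such as search, q or id would fit into step 2.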

Here is a kind of proof of concept :)
Split each URL on "?".
Parse the parameters.
Count the distinct values of each parameter.
Get the Nth percentile of those counts.
Rebuild the URLs, replacing parameters whose count is at or above the Nth percentile (and that have more than one value) with $var.
For small data like the sandbox example here, the 50th percentile is enough to group some URLs.
For big real-world data, use the 90th to 95th percentile.
For example, I use the 90th percentile for 5000 links and get ~200 links as a result.
<?php
$stats = [];
// Array of objects whose "page" property holds a URL
$pages = [
    (object)['page' => 'http://example.com/?page=123'],
    (object)['page' => 'http://example.com/?page=123'],
    (object)['page' => 'http://example.com/?page=123'],
    (object)['page' => 'http://example.com/?page=321'],
    (object)['page' => 'http://example.com/?page=321'],
    (object)['page' => 'http://example.com/?page=321'],
    (object)['page' => 'http://example.com/?page=qwas'],
    (object)['page' => 'http://example.com/?page=safa15'],
];

// Count how often each value occurs for each query parameter
$params_counter = [];
foreach ($pages as $page) {
    $components = explode('?', $page->page, 2);
    if (!empty($components[1])) {
        parse_str($components[1], $params);
        foreach ($params as $key => $val) {
            if (is_array($val)) {
                continue; // skip array parameters such as a[]=1
            }
            if (!isset($params_counter[$key][$val])) {
                $params_counter[$key][$val] = 0;
            }
            $params_counter[$key][$val]++;
        }
    }
}

function percentile($percentile, $array)
{
    sort($array);
    $n = count($array);
    if ($n === 0) {
        return 0;
    }
    $index = ($percentile / 100) * $n;
    if (floor($index) == $index && $index > 0 && $index < $n) {
        // The index falls exactly between two elements: average them
        $result = ($array[(int)$index - 1] + $array[(int)$index]) / 2;
    } else {
        // Otherwise take the element at the rounded-down (clamped) index
        $result = $array[min((int)floor($index), $n - 1)];
    }
    return $result;
}

// Number of distinct values seen for each parameter
$some_data = [];
foreach ($params_counter as $key => $val) {
    $some_data[$key] = count($val);
}
$threshold = percentile(90, $some_data);

foreach ($pages as $page) {
    $components = explode('?', $page->page, 2);
    if (!empty($components[1])) {
        parse_str($components[1], $params);
        foreach ($params as $key => $val) {
            // ">=" so the most variable parameter can be replaced too;
            // a parameter with a single value is never treated as dynamic
            if (isset($some_data[$key]) && $some_data[$key] > 1 && $some_data[$key] >= $threshold) {
                $params[$key] = '$var';
            }
        }
        ksort($params); // fixed parameter order, so equal patterns match
        $pattern = http_build_query($params);
        // Keep the part before "?" so URLs from different pages/sites stay apart
        $new_url = $components[0] . urldecode('?' . $pattern);
        if (!isset($stats[$new_url])) {
            $stats[$new_url] = 0;
        }
        $stats[$new_url]++;
    }
}
arsort($stats);
print_r($stats);

I think what the OP wants is a regex: first you find the domain part of the URL using a regex, then you remove it, and whatever is left after the matched part is what varies.
For instance,
/^\w*\.\w*(\.\w*)?\/\?search=/
will match the domain part of the URL up to and including ?search=, so you can group URLs by the matched part, and removing it from each URL leaves the variable part.
However, it may also match other domain-like strings, so you might want to adjust it so that you don't remove a necessary part.
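A minimal sketch of that idea in JavaScript, using the regex above with "." and "/" escaped. It only handles ?search= URLs like the question's example; the helper names are hypothetical and the {variable} placeholder comes from the question.

```javascript
// Regex from the answer: a domain, then "/?search=".
const DOMAIN_SEARCH = /^\w*\.\w*(\.\w*)?\/\?search=/;

// Split a URL into the matched prefix and the remaining (variable) part.
function splitUrl(url) {
  const m = url.match(DOMAIN_SEARCH);
  if (!m) return null; // not a "domain/?search=" URL
  return { prefix: m[0], value: url.slice(m[0].length) };
}

// Replace the variable part with a placeholder; other URLs are returned unchanged.
function toPattern(url) {
  const parts = splitUrl(url);
  return parts ? parts.prefix + "{variable}" : url;
}
```

For example, toPattern("site1.com/?search=hello") returns "site1.com/?search={variable}", so all four URLs from the question collapse to the same string.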

Unfortunately, I think you're out of luck without using pattern matching. You could use a library, or someone else's code for now, but there are just too many variations to solve this problem otherwise. Try this on for size:
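function getURLQueryString(url) {
    var query_list = {};
    var match = url.match(/\?([^#]*)/); // text after the first "?", up to any "#"
    if (!match || !match[1]) {
        return query_list; // no query string
    }
    var query_strings = match[1].split('&');
    for (var i = 0; i < query_strings.length; i++) {
        var param = query_strings[i].split('=');
        // decode "+" and %XX escapes; keep any "=" inside the value
        var key = decodeURIComponent(param[0].replace(/\+/g, ' '));
        var value = param.length > 1 ? decodeURIComponent(param.slice(1).join('=').replace(/\+/g, ' ')) : '';
        query_list[key] = value;
    }
    return query_list;
}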
You'll get back an object in which every key-value pair is a parameter from the query string.
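As an alternative to a hand-written regex, modern browsers and Node.js ship the URL and URLSearchParams APIs, which do the splitting and decoding for you. A small sketch; the function name and the placeholder base URL are assumptions, and the base only matters for inputs without a scheme, like the question's site1.com/?search=hello.

```javascript
// Parse the query string with the built-in URL API.
// "http://placeholder.invalid/" is only used to resolve scheme-less input.
function getQueryObject(url) {
  const parsed = new URL(url, "http://placeholder.invalid/");
  // searchParams decodes %XX escapes and "+"; for repeated keys the last value wins
  return Object.fromEntries(parsed.searchParams);
}
```

getQueryObject("site1.com/?search=some_word&page=2") gives { search: "some_word", page: "2" }. If repeated keys matter, iterate parsed.searchParams or use getAll(key) instead of Object.fromEntries.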


I need to assign a URL to every student name in JavaScript

The name list looks like this:
Rose : 35621548
Jack : 32658495
Lita : 63259547
Seth : 27956431
Cathy: 75821456
Suppose I have a variable StudentCode that contains the list above (I think a const will do), like:
const StudentCode = {
[Jack]: [32658495],
[Rose]: [35621548],
[Lita]: [63259547],
[Seth]: [27956431],
[Cathy]:[75821456],
};
So here are the questions:
1st: How can I put them into the URL below:
https://www.mylist.com/student=?StudentCode
So the link for example for Jack will be:
https://www.mylist.com/student=?32658495
The URL is imaginary. Don't click on it please.
2nd: By the way, the overall list has more than 800 people, and I'm planning to save it in an external .js file to be loaded by the current code. So please tell me about that too. Thanks a million!
Given
const StudentCode = {
"Jack": "32658495",
"Rose": "35621548",
"Lita": "63259547",
"Seth": "27956431",
"Cathy": "75821456",
};
You can construct urls like:
const urls = Object.values(StudentCode).map((c) => `https://www.mylist.com?student=${c}`)
// urls: ['https://www.mylist.com?student=32658495', 'https://www.mylist.com?student=35621548', 'https://www.mylist.com?student=63259547', 'https://www.mylist.com?student=27956431', 'https://www.mylist.com?student=75821456']
To get the url for a specific student simply do:
const url = `https://www.mylist.com?student=${StudentCode["Jack"]}`
// url: 'https://www.mylist.com?student=32658495'
I'm not sure I understand your second question. 800 is a rather low number, so there won't be any performance issues, if that is what you are asking.
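If you prefer not to concatenate strings, the URL API can build the same ?student= links and percent-encode values when needed. A sketch assuming this answer's URL format; studentUrl and studentUrls are hypothetical names.

```javascript
const StudentCode = {
  Jack: "32658495",
  Rose: "35621548",
  Lita: "63259547",
  Seth: "27956431",
  Cathy: "75821456",
};

// Build one student's URL, or return null for an unknown name.
function studentUrl(name) {
  // hasOwnProperty ignores inherited keys such as "toString"
  if (!Object.prototype.hasOwnProperty.call(StudentCode, name)) {
    return null;
  }
  const url = new URL("https://www.mylist.com/");
  url.searchParams.set("student", StudentCode[name]); // encoded if needed
  return url.toString();
}

// All URLs at once, keyed by student name
const studentUrls = Object.fromEntries(
  Object.keys(StudentCode).map((name) => [name, studentUrl(name)])
);
```

studentUrl("Jack") returns "https://www.mylist.com/?student=32658495"; the "/" before "?" is how the URL API serializes an empty path, and browsers treat it the same as the version without it.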
The properties of the object can be looped through using a for...in loop (see: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/for...in).
This gives you each key of the object, and the value held under that key can be referenced using objectName[key]. So you can loop through your object with something like:
for (const key in StudentCode) {
    const keyString = key; // e.g. "Jack"
    const keyValue = StudentCode[key]; // e.g. 32658495
    // build the urls and links
}
To build the URLs, template literals simplify the process (see: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals) by letting you substitute values into your string, e.g.:
let url = `https://www.mylist.com/student=?${StudentCode[key]}`;
Note the use of backticks and ${} for the substitutions.
Lastly, to build active links, create an element and set its innerHTML property to markup built with further template literals:
let link = `<a href="${url}">${keyValue}</a>`;
These steps are combined in the working snippet here:
const StudentCode = {
Jack: 32658495,
Rose: 35621548,
Lita: 63259547,
Seth: 27956431,
Cathy: 75821456,
};
const studentLinks = [];
for (const key in StudentCode) {
    const url = `https://www.mylist.com/student=?${StudentCode[key]}`;
    console.log(url);
    studentLinks.push(`<a href="${url}">${key}</a>`); // use the built URL, not the literal "url"
}
const output = document.createElement('div');
output.innerHTML = studentLinks.join("<br>");
document.body.appendChild(output);

Perform "javascript/jQuery-like" functions using PHP

I'm trying to move some processing from the client side to the server side.
I am doing this via AJAX.
In this case t is a URL like this: https://itunes.apple.com/us/podcast/real-crime-profile/id1081244497?mt=2&uo=2.
First problem: I need to send a bunch of these URLs through this little function to pull out just "1081244497" (using my example). The following accomplishes this in JavaScript, but I'm not sure how to make it loop in PHP.
var e = t.match(/id(\d+)/);
if (e) {
podcastid= e[1];
} else {
podcastid = t.match(/\d+/);
}
The next part is trickier. I can pass one of these podcastid at a time into AJAX and get back what I need, like so:
$.ajax({
url: 'https://itunes.apple.com/lookup',
data: {
id: podcastid,
entity: 'podcast'
},
type: 'GET',
dataType: 'jsonp',
timeout: 5000,
success: function(data) {
console.log(data.results);
},
});
What I don't know is how to accomplish the same thing in PHP, ideally using the whole list of podcastids rather than passing them one at a time (though that might be the only way).
Thoughts on how to get started here?
MAJOR EDIT
Okay... let me clarify what I need now, given some of the comments.
I have this in PHP:
$sxml = simplexml_load_file($url);
$jObj = json_decode($json);
$new = new stdClass(); // create a new object
foreach( $sxml->entry as $entry ) {
$t = new stdClass();
$t->id = $entry->id;
$new->entries[] = $t; // create an array of objects
}
$newJsonString = json_encode($new);
var_dump($new);
This gives me:
object(stdClass)#27 (1) {
["entries"]=>
array(2) {
[0]=>
object(stdClass)#31 (1) {
["id"]=>
object(SimpleXMLElement)#32 (1) {
[0]=>
string(64) "https://itunes.apple.com/us/podcast/serial/id917918570?mt=2&uo=2"
}
}
[1]=>
object(stdClass)#30 (1) {
["id"]=>
object(SimpleXMLElement)#34 (1) {
[0]=>
string(77) "https://itunes.apple.com/us/podcast/real-crime-profile/id1081244497?mt=2&uo=2"
}
}
}
}
What I need now is to pull out each of the strings (the URLs) and run them through a function like the following, to end up with just "917918570,1081244497": the ID piece of each URL, joined by commas.
I have this function to get the ID number for one URL at a time, but I'm struggling with how the foreach would work (plus I know there has to be a better way to write this function):
$t="https://itunes.apple.com/us/podcast/real-crime-profile/id1081244497?mt=2&uo=2";
$some =(parse_url($t));
$newsome = ($some['path']);
$bomb = explode("/", $newsome);
$newb = ($bomb[4]);
$mrbill = (str_replace("id","",$newb,$i));
print_r($mrbill);
//outputs 1081244497
Use preg_match() to find the match, http_build_query() to turn an array into a query string, file_get_contents() to request the data, and json_decode() to parse the JSON response (into objects by default, or associative arrays if you pass true as the second argument).
In the end it should look something like this:
if (preg_match("/id(\d+)/", $string, $matches)) {
    $podcast_id = $matches[1]; // the captured digits; $matches[0] would be "id1081244497"
    $json_array = json_decode(file_get_contents('https://itunes.apple.com/lookup?' . http_build_query(['id' => $podcast_id, 'entity' => 'podcast'])));
}
You may have to tweak this a little, but it should get you on the right track. If you have problems, you can always use print_r() or var_dump() to debug.
As far as the Apple API goes, use commas to separate the IDs:
https://itunes.apple.com/lookup?id=909253,284910350
You will get multiple results back in an array, and you can use a foreach() loop to parse them out.
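For comparison with the question's JavaScript, a sketch of the same two steps on the client: extract the IDs with the question's /id(\d+)/ regex and build one comma-separated lookup URL. The function names are hypothetical. URLSearchParams percent-encodes the commas as %2C, which is standard form encoding (PHP's http_build_query encodes them the same way).

```javascript
// Pull the numeric ID out of each iTunes podcast URL and join them with commas.
function podcastIds(urls) {
  const ids = [];
  for (const url of urls) {
    const m = url.match(/id(\d+)/);
    if (m) ids.push(m[1]); // m[1] is the captured digits, m[0] includes "id"
  }
  return ids.join(",");
}

// One lookup URL for all IDs, with the same parameters as the question's $.ajax call.
function lookupUrl(ids) {
  const u = new URL("https://itunes.apple.com/lookup");
  u.searchParams.set("id", ids);
  u.searchParams.set("entity", "podcast");
  return u.toString();
}
```

With the two URLs from the question's var_dump, podcastIds returns "917918570,1081244497".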
EDIT
Here is a full example that gets the artist name from a list of URLs:
$urls = [
    'https://itunes.apple.com/us/podcast/real-crime-profile/id1081244497?mt=2&uo=2',
    'https://itunes.apple.com/us/podcast/dan-carlins-hardcore-history/id173001861?mt=2'
];
$podcast_ids = [];
$info = [];
foreach ($urls as $string) {
    if (preg_match('/id(\d+)/', $string, $match)) {
        $podcast_ids[] = $match[1]; // the captured digits
    }
}
// One request for all IDs, joined with commas
$json_array = json_decode(file_get_contents('https://itunes.apple.com/lookup?' . http_build_query(['id' => implode(',', $podcast_ids)])));
foreach ($json_array->results as $item) {
    $info[] = $item->artistName;
}
print '<pre>';
print_r($info);
print '</pre>';
EDIT 2
To put your object into an array, just run it through this:
$urls = [];
foreach ($new->entries as $entry) {
    $urls[] = (string) $entry->id; // cast the SimpleXMLElement to a plain string
}
When you access an object you use ->; when you access an array you use []. JSON and XML parse out into a combination of objects and arrays, so you just need to follow the object's path and put the right keys in the right places to unlock that gate.

Extra slashes come in JSON array

I'm working on a nestable drag and drop. When I drag and drop tiles, it generates an array in a textarea: [{},{"id":267},{"id":266}]. When I post this array to the action page, it arrives as [{},{\"id\":267},{\"id\":266}]. Why do these extra slashes appear in the array? On the action page I convert this array using json_decode. How can I remove these slashes from the array, or otherwise get json_decode to decode it successfully?
$(document).ready(function()
{
var updateOutput = function(e)
{
var list = e.length ? e : $(e.target),
output = list.data('output');
if (window.JSON) {
output.val(window.JSON.stringify(list.nestable('serialize')));//, null, 2));
} else {
output.val('JSON browser support required for this demo.');
}
};
// activate Nestable for list 1
$('#rightservices').nestable({
group: 1
})
.on('change', updateOutput);
// output initial serialised data
updateOutput($('#rightservices').data('output', $('#siteservices')));
//$('#nestable3').nestable();
});
Sounds like Magic Quotes is enabled on the server. This is an old, deprecated PHP feature (deprecated in PHP 5.3.0 and removed in 5.4.0) that automatically escapes any request data with slashes, regardless of what it is. You can follow the instructions listed here to disable it. From that page, any of these should work, depending on what you have access to:
In php.ini
This is the most efficient option, if you have access to php.ini.
; Magic quotes for incoming GET/POST/Cookie data.
magic_quotes_gpc = Off
In .htaccess
If you don't have access to php.ini:
php_flag magic_quotes_gpc Off
At runtime
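This is inefficient; only use it if you can't use the settings above.
<?php
// get_magic_quotes_gpc() was removed in PHP 8.0, so check that it exists first
if (function_exists('get_magic_quotes_gpc') && get_magic_quotes_gpc()) {
    $process = array(&$_GET, &$_POST, &$_COOKIE, &$_REQUEST);
    // Walk the list by index instead of the deprecated each(); nested arrays
    // are appended to $process while looping, and count() picks them up
    for ($key = 0; $key < count($process); $key++) {
        $val = $process[$key]; // iterate over a copy while modifying the original
        foreach ($val as $k => $v) {
            unset($process[$key][$k]);
            if (is_array($v)) {
                $process[$key][stripslashes($k)] = $v;
                $process[] = &$process[$key][stripslashes($k)];
            } else {
                $process[$key][stripslashes($k)] = stripslashes($v);
            }
        }
    }
    unset($process);
}
?>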
The code below removes the first (empty) object from the array, but it doesn't really solve the real issue of why the slashes are being added in the first place.
var arr = [{}, {"id": 267}, {"id": 266}];
arr.splice(0,1);
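If the goal is to drop empty placeholder objects wherever they appear, not only at index 0, a filter is a slightly more general sketch. It assumes the textarea value has already been parsed with JSON.parse; dropEmptyObjects is a hypothetical name, and like splice it does not address the slashes themselves.

```javascript
// Keep only objects that have at least one own property.
function dropEmptyObjects(items) {
  return items.filter((item) => Object.keys(item).length > 0);
}
```

dropEmptyObjects(JSON.parse('[{},{"id":267},{"id":266}]')) returns [{ id: 267 }, { id: 266 }] without mutating the input array.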

AJAX, JSON: loading data from an external URL

I have a problem, please help.
I have two sites and I want to send data between them.
First site:
var site_adres = $(location).attr('href');
var myArray = site_adres.split('/');
var site_adi_al = myArray[2];
$.getJSON('xxx.com/site/admin.php?site_adres=' + site_adi_al + '',
    {},
    function (data) {
        $.each(data, function (i, val) {
            var id = val['id'];
            var site_adi = val['site_adi'];
            $(".site_adi").append('<li>' + id + ' >> <a href="' + site_adi + '" target="_blank">' + site_adi + '</a></li>');
        });
    });
Second site:
$site_adi = $_GET["site_adi"];
/* query */
$query = mysql_query("SELECT * FROM site WHERE site_adi = '$site_adi'");
if ( mysql_affected_rows() ){
$row = mysql_fetch_object($query);
$json = array(
"id" => $row->id,
"site_adi" => $row->site_adi
);
}else{
$json["hata"] = "Nothing !";
}
header("access-control-allow-origin: *");
echo json_encode($json);
I get zero results. What is wrong? Please help.
You have two basic problems (aside from the security issues explained in the comments on the question).
You are sending site_adres but reading $_GET["site_adi"]. You can't use different names for the same thing without explicitly writing code to link them somehow.
You are looping over data with $.each( data, function ( i, val ) { as if it was an array of objects, but your PHP is only sending a single object (which isn't in an array). You should be accessing the properties of data directly and not using each or val.
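A sketch of the two fixes on the first site, with hypothetical helper names. It sends the same parameter name the PHP reads (site_adi) and reads the single returned object directly. It also assumes the endpoint is written with an explicit scheme: 'xxx.com/site/admin.php' without one is resolved relative to the current page.

```javascript
// Build the request URL using the parameter name PHP reads: $_GET["site_adi"]
function buildRequestUrl(endpoint, siteName) {
  const url = new URL(endpoint); // endpoint must include http:// or https://
  url.searchParams.set("site_adi", siteName);
  return url.toString();
}

// The PHP returns ONE object ({id, site_adi} or {hata}), not an array,
// so its properties are read directly instead of looping with $.each.
function renderSite(data) {
  if (data.hata) return null; // the "Nothing !" case
  return `<li>${data.id} >> <a href="${data.site_adi}" target="_blank">${data.site_adi}</a></li>`;
}

// In the page (jQuery), something like:
// $.getJSON(buildRequestUrl("http://xxx.com/site/admin.php", location.host), function (data) {
//   const li = renderSite(data);
//   if (li) $(".site_adi").append(li);
// });
```

location.host matches what split('/')[2] extracted from the page URL in the question (host plus any port).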
You should set up CORS on your web servers to allow them to fetch data from each other. Since you're using PHP, I'm going to assume you're using Apache (the Header directive needs mod_headers):
Header set Access-Control-Allow-Origin "*"
Replace the * with the origin of your other website (its scheme plus host name or IP address, e.g. http://203.0.113.10), and vice versa.

Multiple keys query in IndexedDB (Similar to OR in sql)

I have a store with a multiEntry index on tags.
{ tags: [ 'tag1', 'tag2', 'tag3' ] }
And I have a query that is also a list of tags.
[ 'tag2', 'tag1', 'tag4' ]
I need to get all records that contain at least one of the tags in the query (similar to an SQL OR statement).
Currently I can't find any solution other than iterating over the tags in the query and searching the store by each tag.
Is there any better solution?
Thank you.
You cannot retrieve all results with one query except by iterating. You can narrow the search by opening the index over the range from the lowest to the highest value:
IDBKeyRange.bound('tag1', 'tag4');
Another IndexedDB feature you can use is to open multiple requests and combine the results when they complete. This can be much faster than iterating.
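A sketch of the "multiple requests, combined" approach in plain IndexedDB, with hypothetical function names. It runs one IDBIndex.getAllKeys() request per tag on the multiEntry index (getAllKeys is part of IndexedDB 2.0), waits for all of them, and merges the primary keys without duplicates. It assumes primary keys are numbers or strings, because a Set compares array or Date keys by reference.

```javascript
// Wrap an IDBRequest in a Promise.
function requestToPromise(request) {
  return new Promise((resolve, reject) => {
    request.onsuccess = () => resolve(request.result);
    request.onerror = () => reject(request.error);
  });
}

// OR query as a union: every getAllKeys() request is started at once,
// then the primary keys are merged in tag order without duplicates.
async function primaryKeysWithAnyTag(index, tags) {
  const keyLists = await Promise.all(
    tags.map((tag) => requestToPromise(index.getAllKeys(tag)))
  );
  const seen = new Set();
  for (const keys of keyLists) {
    for (const key of keys) seen.add(key);
  }
  return [...seen];
}

// In a browser, something like:
// const index = db.transaction("store name").objectStore("store name").index("tags");
// primaryKeysWithAnyTag(index, ["tag2", "tag1", "tag4"]).then((keys) => console.log(keys));
```

The records themselves can then be fetched by primary key, which mirrors the key-first approach described in the next answer.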
IndexedDB only has range queries, as Deni Mf answered.
An OR query is simply the union of multiple queries. That may be OK.
If you want an efficient query, you have to iterate the cursor and seek the cursor position as necessary. Using my library, it would be:
var tags = ['tag2', 'tag1', 'tag4'];
tags.sort();
var iter = new ydn.db.KeyIterator('store name', 'tags', IDBKeyRange.bound(tags[0], tags[tags.length - 1]));
var keys = [];
var i = 0;
var req = db.open(iter, function(cursor) {
    var indexKey = cursor.indexKey();
    if (tags.indexOf(indexKey) >= 0) {
        // we got a result
        if (keys.indexOf(cursor.key()) == -1) { // remove duplicates
            keys.push(cursor.key());
        }
    } else {
        // skip the wanted tags that sort before the current key, then jump
        // to the next wanted tag (a plain tags[++i] could point backwards)
        while (tags[i] < indexKey) {
            i++;
        }
        return tags[i]; // jump to the next index position
    }
});
req.done(function() {
    db.list('store name', keys).done(function(results) {
        console.log(results);
    });
});
Notice that the algorithm retrieves no false positives. A key-only query is performed first, so no time is wasted on deserialization. The records are retrieved only after all the primary keys have been collected and duplicates removed.
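The same cursor-seeking idea can be sketched in plain IndexedDB without a library. This is a swapped-in technique, not the library's API, and the function name is hypothetical. A key cursor over the multiEntry index walks the range from the smallest to the largest wanted tag. When the current index key is not wanted, IDBCursor.continue(key) jumps straight to the next wanted tag. It assumes tags are strings, for which JavaScript's default sort matches IndexedDB key order.

```javascript
// OR query over a multiEntry index using a key cursor that seeks forward.
function primaryKeysWithAnyTagSeek(index, tags) {
  // Sorted, de-duplicated wanted tags (string keys: default sort = IndexedDB order)
  const wanted = [...new Set(tags)].sort();
  return new Promise((resolve, reject) => {
    if (wanted.length === 0) return resolve([]);
    const found = new Set(); // matching primary keys, de-duplicated
    const range = IDBKeyRange.bound(wanted[0], wanted[wanted.length - 1]);
    const request = index.openKeyCursor(range);
    let i = 0; // position in `wanted`
    request.onerror = () => reject(request.error);
    request.onsuccess = () => {
      const cursor = request.result;
      if (!cursor) return resolve([...found]); // no more entries in range
      // Skip wanted tags that sort before the cursor's current index key.
      while (i < wanted.length && wanted[i] < cursor.key) i++;
      if (wanted[i] === cursor.key) {
        found.add(cursor.primaryKey); // a match: record its primary key
        cursor.continue(); // step to the next index entry
      } else {
        cursor.continue(wanted[i]); // seek forward to the next wanted tag
      }
    };
  });
}
```

Because the range ends at the largest wanted tag, every non-matching key has a larger wanted tag to seek to. The records can then be fetched by the collected primary keys, as in the answer.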
