Web Scraping Using data-title and Without Using Classes or Ids - javascript

I am trying to scrape a website. The issue is that the specific elements do not have classes or ids, but they do have a data-title attribute. I need help referring to these when I am choosing an element to scrape.
Here is the HTML that I am scraping.
<div data-v-1234567z="">
<table data-v-1234567z="" align="center">
<thead data-v-1234567z="">
<tr data-v-1234567z="">
<th data-v-1234567z="">User</th>
<th data-v-1234567z="" style="cursor: pointer;">Money</th>
<th data-v-1234567z="" style="cursor: pointer;">Watch Time (minutes)</th>
</tr>
</thead>
<tbody data-v-1234567z="">
<tr data-v-1234567z="">
<td data-v-1234567z="" data-title="User" class="user-cell">
<span data-v-821a25a2="" data-v-1234567z="" class="mini-user-display">
<img data-v-821a25a2="" src="https://image.com" class="mini-user-profile-image" />
<span data-v-821a25a2="" class="mini-user-name">user1234</span>
</span>
</td>
<td data-v-1234567z="" data-title="Money">100,000</td>
<td data-v-1234567z="" data-title="WatchTime">678</td>
</tr>
</tbody>
</table>
</div>
I need to scrape the Money, WatchTime, and username.
Here is the code that I am using for the scraper.
async function pageFunction(context) {
const $ = context.jQuery;
const results = [];
$('tbody').each(function() {
results.push({
userName: $(this).find(".mini-user-name").text(),
watchTime: $(this).find("data-title-watchTime").text()
});
});
return results;
}
There are several issues with this code. userName does return the usernames, but there is no break between the names; they come back as one big blob.
The bigger issue is that I can't get any data back from watchTime, because I can't figure out how to properly select the WatchTime data-title in JavaScript.
I have looked for a few hours and I can't figure it out.

To select an element by a data attribute in jQuery, use a CSS attribute selector, just as you would in a stylesheet. In your case, watchTime would be selected like this:
watchTime: $(this).find('[data-title="WatchTime"]').text()
As for your output coming out as one blob: you are pushing objects to an array, and until you iterate over that array and convert the objects to a readable format, it will just be a blob of data. Without seeing what calls this function and how you handle the returned data, it's hard to help much more with that.
EDIT:
Here's an example of how you might want to return your data using template literals for a more readable output.
const results = [];
$('tbody').each(function() {
results.push(
`Username: ${$(this).find(".mini-user-name").text()}
WatchTime: ${$(this).find('[data-title="WatchTime"]').text()}
Money: ${$(this).find('[data-title="Money"]').text()}`
);
});
return results;
This would give an output like:
Username: user1234
WatchTime: 678
Money: 100,000
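On the "one big blob" of usernames: jQuery's .text() returns the combined text of every matched element, and the loop in the question runs once per tbody rather than once per row, so every username in the table ends up in a single string. A minimal sketch that iterates over rows instead is below. It assumes each user is one tr inside tbody, as in the posted HTML. extractRows is a hypothetical helper name, and $ is passed in as a parameter so it could be called from pageFunction as extractRows(context.jQuery).

```javascript
// Hypothetical helper: returns one plain object per table row.
// Assumes `$` is a jQuery instance (e.g. context.jQuery in the scraper).
function extractRows($) {
  return $('tbody tr')
    .map(function () {
      const row = $(this);
      return {
        userName: row.find('.mini-user-name').text().trim(),
        money: row.find('[data-title="Money"]').text().trim(),
        watchTime: row.find('[data-title="WatchTime"]').text().trim()
      };
    })
    .get(); // convert the jQuery collection into a plain array
}
```

Because each object now describes exactly one row, whatever consumes the returned array gets one username per entry instead of a concatenated string.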

Related

How can I split the following string into an array that I can use to populate a html table

I have a string that looks like:
var str = '{ "abc": {
"decline_reason": "Business rule switched off"
},
"def": {
"decline_reason": "No response by interface"
},
"ghi": {
"decline_reason": "Requested incorrect size" }';
I would like to split that string into an array that I can use to populate a table on a webpage. I intend to use the initial reference ('abc'), with the reason ('Business rule switched off') on row 1, initial reference ('def'), with the reason ('No response by interface') on row 2, etc...
I have tried regex to break it down, and I've managed to find one that removes quotes, but not to break the string down.
I intend to populate the table with code like:
<table id="declinesTable">
<tr>
<th onclick="sortTable(0)">Reference Code</th>
<th>Decline Reason</th>
</tr>
<tr id="lender1">
<td id="lender1"><script>document.getElementById("lender1").innerHTML = declines[0];</script>
</td>
<td id="declineReason1"><script>document.getElementById("declineReason1").innerHTML = declines[2];</script>
</td>
</tr>
</table>
skipping out the value "decline_reason" from the table.
Any suggestions?
A couple of things: your string is missing a final }. I'm not sure where you're getting the string from, but it's in JSON format, so use JSON.parse to turn it into an object, then iterate over the object to handle each nested object. I would strongly recommend using a library like jQuery to help you append the rows to the table; it can be added to a page with a single script tag. See below.
function stringParse(str) {
  // Parse the JSON text into an object keyed by reference code.
  const json = JSON.parse(str);
  // Build one <tr> per entry: reference code, then its decline reason.
  const html = Object.entries(json).reduce((h, [k, v]) =>
    h + `<tr><td>${k}</td><td>${v.decline_reason}</td></tr>`
  , "");
  $('#declinesTable').append(html);
}
const str = '{ "abc": {"decline_reason": "Business rule switched off"},"def": {"decline_reason": "No response by interface"},"ghi": {"decline_reason": "Requested incorrect size"}}';
stringParse(str);
<table id="declinesTable">
<tr>
<th>Reference Code</th>
<th>Decline Reason</th>
</tr>
</table>
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
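One caveat with building rows by string interpolation: anything placed inside the template is parsed as markup when it is appended. If the reference codes or reasons could ever contain characters such as < or &, escaping them first is safer. The sketch below is one way to do that; escapeHtml and buildRows are hypothetical names, and the function only returns a string, which could then be passed to $('#declinesTable').append(...).

```javascript
// Hypothetical helper: escape characters that are special in HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Hypothetical helper: turn the JSON text into a string of <tr> rows.
function buildRows(jsonText) {
  const data = JSON.parse(jsonText); // throws SyntaxError on malformed JSON
  return Object.entries(data)
    .map(([code, details]) =>
      `<tr><td>${escapeHtml(code)}</td><td>${escapeHtml(details.decline_reason)}</td></tr>`)
    .join('');
}

const sample = '{"abc": {"decline_reason": "Business rule switched off"}}';
console.log(buildRows(sample));
// prints: <tr><td>abc</td><td>Business rule switched off</td></tr>
```

Since JSON.parse throws on malformed input, a string with a missing closing brace like the one in the question fails loudly instead of producing a half-filled table.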

Display array items in one td with each item on a new line

I have an array from a datatable populating a table in my Bootstrap modal.
When displayed in the modal it displays as the below:
This is my current jQuery to populate my table in my modal:
$('#selectedReportDataTable').on('click', 'button[name="deleteContentButton"]', function () {
var deleteRunData = selectedReportDataTable.row($(this).closest('tr')).data();
$('#deleteModal').modal('show');
$('span[name=selectedReport]').text(reportSelectedLowerCased);
$('td[name=modalPeriod]').text(deleteRunData.period);
$('td[name=modalSpecParams]').text(deleteRunData.specialParams);
$('td[name=modalFreq]').text(deleteRunData.frequency);
$('td[name=modalTimeFrame]').text(deleteRunData.timeFrame);
$('td[name=modalTime]').text(deleteRunData.time);
$('td[name=modalRecipients]').text(deleteRunData.recipient);
$('#deleteModal').on('shown.bs.modal', function () {
$('#deleteModalNoButton').focus();
});
})
It's the last line:
$('td[name=modalRecipients]').text(deleteRunData.recipient);
that populates the email column.
This is the code I have tried:
var abc = deleteRunData.recipient
var def = deleteRunData.recipient.toString().split(", ").join("<br/>");
var ghi = $('td[name=modalRecipients]').text();
var jkl = def.replace(/,/g, "\n")
console.log(abc)
console.log(def)
console.log(ghi)
console.log(jkl)
console.log(abc.join('\r\n'));
and this gives me the following:
If I replace:
$('td[name=modalRecipients]').text(deleteRunData.recipient);
with the following (as an example):
$('td[name=modalRecipients]').text(def.replace(/,/g, "\n"));
It looks like the below:
It's replaced the comma with a space, not what I was after. I want each entry on a new line - what am I doing wrong?
HTML just in case:
<table class="table" style="table-layout: fixed; width: 100%">
<tr>
<th class="modalTable" style="width: 50px">Period</th>
<th class="modalTable" style="width: 85px">Additional details</th>
<th class="modalTable" style="width: 55px">Frequency</th>
<th class="modalTable" style="width: 45px">Time frame</th>
<th class="modalTable" style="width: 25px">Time</th>
<th class="modalTable">Recipient(s)</th>
</tr>
<tr>
<td name="modalPeriod" class="modalTable"></td>
<td name="modalSpecParams" class="modalTable"></td>
<td name="modalFreq" class="modalTable"></td>
<td name="modalTimeFrame" class="modalTable"></td>
<td name="modalTime" class="modalTable"></td>
<td name="modalRecipients" class="modalTable" style="word-wrap: break-word"></td>
</tr>
</table>
Damn it. As soon as I hit submit, the answer came to me. The code below works:
for (var i = 0; i < deleteRunData.recipient.length; i++) {
$('td[name=modalRecipients]').append('<div>' + deleteRunData.recipient[i] + '</div>');
}
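You should replace $('td[name=modalRecipients]').text(def.replace(/,/g, "\n")); with $('td[name=modalRecipients]').html(def.replace(/,/g, "<br />"));
With .text() the string is inserted as plain text, and a newline inside a table cell is rendered as ordinary whitespace. With .html() the <br /> tags are parsed as markup, so each entry starts on a new line. Try this.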
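Both .html() and .append() parse their string argument as markup, so values that come from data are best escaped before they are inserted. A minimal sketch, assuming deleteRunData.recipient is an array of strings (as the loop above suggests); recipientsToHtml is a hypothetical helper name.

```javascript
// Hypothetical helper: escape each recipient, then join with <br>.
function recipientsToHtml(recipients) {
  const escape = (text) =>
    String(text)
      .replace(/&/g, '&amp;')
      .replace(/</g, '&lt;')
      .replace(/>/g, '&gt;');
  return recipients.map(escape).join('<br>');
}

// Usage with jQuery (assumed to be loaded on the page):
// $('td[name=modalRecipients]').html(recipientsToHtml(deleteRunData.recipient));
```

Because .html() replaces the cell's content, reopening the modal does not accumulate entries, whereas a bare .append() loop adds to whatever is already in the cell.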
Edit: I started this reply before the accepted answer arrived, but I thought I would leave it here in case other people run into this and need help.
Original Answer
I totally understand the code blindness, especially with divs and CSS, because this is the most frustrating part of coding the back end! As I understand it, you want multiple emails to display in the email column, one per line. So, as an example, if there were two recipients, tony#tony.com and crisps#cristony.com, then we would expect:
tony#tony.com
crisps#cristony.com
Note: no commas between the emails.
When I come across this problem, I would normally write the following JavaScript:
for (var i = 0; i < deleteRunData.recipient.length; i++) {
$('td[name=modalRecipients]').append('<div>' + deleteRunData.recipient[i] +
'</div>');
}
This works when deleteRunData exists; if it does not, we have a problem. Sometimes it does not exist because the people we rely on (the server guys) do not provide it. In the case where deleteRunData does not exist, what I do is create an image of all possible combinations of emails with new lines.
So for your example, I would make a JPEG image of the two emails in Photoshop or PaintShop Pro, and then I would do:
$('td[name=modalRecipients]').append('<img src="http://en.wikipedia.org/wiki/Potato_chip#/media/File:Potato-Chips.jpg" width="500" height="600">')
Works for me.
Just two extra things that I noticed after regaining my sight:
1. Why is tony#test.com receiving five emails about their evening call costs? I would have thought one would suffice.
2. jQuery is known to be dangerous when mixed with CSS and php-sass. Please make sure it's the right back-end technology for your use case!
Hope this helps.

How to read a list of html tables in JavaScript

I have a list of HTML tables given by pandas data frame in the format of:
list_html =
[<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>score</th>
<th>id</th>
<th>name</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>0.776959</td>
<td>grade</td>
<td>grade</td>
</tr>
<tr>
<th>1</th>
<td>0.414527</td>
<td>class</td>
<td>class</td>
</tr>, ... , ... ]
I am trying to visualize this data in an HTML page and have not managed to. I do not have much experience in web development. My goal is to use JavaScript to loop through each item in the list and display the tables below each other in HTML. It would be great if anybody could help!
This is what I have tried so far; it's probably completely wrong:
var list_html = list_html // list of html codes as a javascript variable.
var arrayLength = analysis.length;
for (var i in list_html) {
document.getElementById("analysis_1").innerHTML = list_html[i];
}
Given a valid array of strings list_html (actually list_html is not a valid array of strings, since the markup in each entry is not wrapped in quotes) and a container in the DOM with id "analysis_1" it's simply a matter of:
var container = document.getElementById('analysis_1');
for (var i = 0; i < list_html.length; i++) {
container.innerHTML += list_html[i];
}
UPDATE:
well... in your scenario there is no need at all for a loop, you can simply inject a single string by joining the elements in the array:
document.getElementById('analysis_1').innerHTML = list_html.join('');
fast and simple! :)
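For reference, the loop in the question shows only one table because each plain assignment to innerHTML replaces whatever was there before. The sketch below illustrates the difference. It uses a plain object as a stand-in for the DOM element so that it runs outside a browser, and the two sample strings are made up.

```javascript
// Stand-in for a DOM element so the sketch runs outside a browser.
const container = { innerHTML: '' };

// Made-up sample data: each entry must be a string of markup.
const listHtml = [
  '<table><tr><td>a</td></tr></table>',
  '<table><tr><td>b</td></tr></table>'
];

// Assigning inside the loop overwrites, so only the last table survives.
for (const html of listHtml) {
  container.innerHTML = html;
}
const overwritten = container.innerHTML;

// Joining first and assigning once keeps every table.
container.innerHTML = listHtml.join('');
const joined = container.innerHTML;
```

In a real page, container would be document.getElementById('analysis_1'), and a single assignment also means the browser parses the markup only once.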
Using jQuery's selectors:
Give each 'td' that contains the data a class name, e.g. 'MyTd'.
Select them all: $('.MyTd').text()
Done!

Formatting data before render it

I am displaying some data in the view, but I need to format it first. I was doing something like
val.toFixed(2), and that works, but the problem is that val sometimes contains letters, and toFixed(2) does not account for that, so the letters are not displayed.
So I need something that handles both letters and numbers. The letters should not change, only the numbers, which come in like 234235.345345435 and which I need like this: 234235.34.
Here is some of the code I am using
<table>
<tr>
<th ng-repeat='header in headers'>{{header.th}}</th>
</tr>
<tr>
<td ng-repeat='data in headers'>
<div ng-repeat='inner in data.td'>
<span ng-repeat='(prop, val) in inner'>{{val.toFixed(2)}}</span>
</div>
</td>
</tr>
</table>
and in the controller
$scope.LoadMyJson = function() {
for (var s in myJson){
$scope.data.push(s);
if ($scope.headers.length < 1)
for (var prop in myJson[s]){
prop.data = [];
$scope.headers.push({th:prop, td: []});
}
}
for (var s in $scope.data){
for (var prop in $scope.headers){
var header = $scope.headers[prop].th;
var data = myJson[$scope.data[s]][header];
$scope.headers[prop].td.push(data);
console.log($scope.headers[prop].td);
}
}
};
and I prepared this Fiddle
the way it is right now, is displaying the table properly, but as you see, the table is missing the name, it is because of the toFixed method.
So, what can I do ?
Create a custom filter to use on your template.
<table>
<tr>
<th ng-repeat='header in headers'>{{header.th}}</th>
</tr>
<tr>
<td ng-repeat='data in headers'>
<div ng-repeat='inner in data.td'>
<span ng-repeat='(prop, val) in inner'>{{val|formatValue}}</span>
</div>
</td>
</tr>
</table>
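angular.module('whatever').filter('formatValue', function () {
  return function (value) {
    // Values that do not start with a number (e.g. names) pass through unchanged.
    if (isNaN(parseFloat(value))) {
      return value;
    }
    // Numeric values are rounded to two decimal places.
    return parseFloat(value).toFixed(2);
  };
});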
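Two details of this approach are worth knowing. parseFloat accepts a numeric prefix, so a mixed value such as "12abc" is treated as 12 and loses its letters. Also, toFixed rounds rather than truncates, so 234235.345345435 becomes "234235.35", not "234235.34". The sketch below is a stricter plain-function variant that leaves mixed text alone. The stricter check is an assumption about the desired behaviour, not part of the answer above, and formatValue here is a standalone function that a filter could return, e.g. .filter('formatValue', function () { return formatValue; }).

```javascript
// Plain-function sketch of the filter body with a stricter numeric check.
function formatValue(value) {
  // Only numbers and strings are candidates for formatting.
  if (typeof value !== 'number' && typeof value !== 'string') {
    return value;
  }
  // Number('') and Number('  ') are 0, so keep blank strings as they are.
  if (typeof value === 'string' && value.trim() === '') {
    return value;
  }
  const number = Number(value); // NaN for text such as "John" or "12abc"
  if (!Number.isFinite(number)) {
    return value;
  }
  return number.toFixed(2); // rounds to two decimal places
}

console.log(formatValue(234235.345345435)); // prints: 234235.35
console.log(formatValue('John'));           // prints: John
console.log(formatValue('12abc'));          // prints: 12abc
```

If truncation to 234235.34 really is required, toFixed alone will not do it, and the value would need to be cut off before formatting.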
You can try this. It is a clean way to render formatted data in the view using AngularJS as an MVC front-end framework:
1. Create a filter in your Angular application.
2. Include your filter in your index.html.
3. Use your filter like this: {{somedata | filterName}}
Here is a simple Angular filter that solves your problem; I hope it helps:
angular.module('app')
.filter('formatHeader', function() {
return function(data) {
if(angular.isNumber(data)) {
return data.toFixed(2);
}
return data;
}
});
And use it like this:
<table>
<tr>
<th ng-repeat='header in headers'>{{header.th}}</th>
</tr>
<tr>
<td ng-repeat='data in headers'>
<div ng-repeat='inner in data.td'>
<span ng-repeat='(prop, val) in inner'>{{val | formatHeader}}</span>
</div>
</td>
</tr>
</table>
You can take a look at these references:
angular functions
filter doc.
angular tutorials

Knockout.js: Updating objects loaded with mapping plugin

I want to render a table containing a list of objects my server is sending me. I'm currently doing this:
<table>
<thead>
<tr>
<th>id</th>
<th>Name</th>
<th>Status</th>
</tr>
</thead>
<tbody data-bind="foreach: services">
<tr>
<td data-bind="text: id"></td>
<td data-bind="text: name"></td>
<td data-bind="text: status"></td>
</tr>
</tbody>
</table>
And the Knockout.js binding part:
var mappedData = komapping.fromJSON('{{{ services }}}');
ko.applyBindings({services: mappedData});
services is a variable containing JSON data and the whole page is rendered with handlebars. So far so good. I'm able to render the data received in the table.
Now the problem: I'd like to receive a notification which tells me that the status of a service has changed, and update the corresponding object within mappedData. The problem is that mappedData seems pretty opaque and I'm unable to retrieve an object and update it given its id.
Help appreciated!
Your mappedData variable at this point will be a knockout array with a bunch of objects that contain knockout observables.
So all you have to do is change the status observable in the correct object from the array.
function updateServiceStatus(id, status) {
var service = mappedData().filter(function(e) { return e.id() == id; });
if (service.length) {
service[0].status(status);
}
}
To get the object, you can write a helper function that retrieves a service object for you. You could do something like this (assuming mappedData is an observableArray and id is an observable):
function get_service_by_id(service_id){
for(var i=0;i<mappedData().length;i++){
if (mappedData()[i].id() === service_id){
return mappedData()[i];
}
}
return false;
}
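Both answers do a linear search; Array.prototype.find expresses the same idea, stops at the first match, and returns undefined when nothing matches. The sketch below combines lookup and update. The observable function is a simplified stand-in for ko.observable so that the example runs without Knockout, and updateServiceStatus takes the unwrapped array as a parameter; in the page it would be called with mappedData().

```javascript
// Simplified stand-in for ko.observable: call with no argument to read,
// with one argument to write. Real code would use Knockout's observables.
function observable(initial) {
  let current = initial;
  return function (...args) {
    if (args.length > 0) {
      current = args[0];
    }
    return current;
  };
}

// Hypothetical helper: `services` is the unwrapped array, e.g. mappedData().
function updateServiceStatus(services, id, status) {
  const service = services.find((s) => s.id() === id);
  if (!service) {
    return false; // no service with that id
  }
  service.status(status); // with Knockout, this write updates the bound cell
  return true;
}

// Made-up sample data for illustration.
const services = [
  { id: observable(1), status: observable('up') },
  { id: observable(2), status: observable('up') }
];
updateServiceStatus(services, 2, 'down');
```

Note the strict comparison: if the notification carries the id as a string while the JSON holds a number (or the reverse), the id would need to be converted before the lookup.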
