I'm trying to figure out the best way to accomplish the following:
Download a large XML (1GB) file on a daily basis from a third-party website
Convert that XML file to a relational database on my server
Add functionality to search the database
For the first part, is this something that would need to be done manually, or could it be accomplished with a cron?
Most of the questions and answers related to XML and relational databases refer to Python or PHP. Could this be done with javascript/nodejs as well?
If this question is better suited for a different StackExchange forum, please let me know and I will move it there instead.
Below is a sample of the xml code:
<case-file>
  <serial-number>123456789</serial-number>
  <transaction-date>20150101</transaction-date>
  <case-file-header>
    <filing-date>20140101</filing-date>
  </case-file-header>
  <case-file-statements>
    <case-file-statement>
      <code>AQ123</code>
      <text>Case file statement text</text>
    </case-file-statement>
    <case-file-statement>
      <code>BC345</code>
      <text>Case file statement text</text>
    </case-file-statement>
  </case-file-statements>
  <classifications>
    <classification>
      <international-code-total-no>1</international-code-total-no>
      <primary-code>025</primary-code>
    </classification>
  </classifications>
</case-file>
Here's some more information about how these files will be used:
All XML files will be in the same format. There are probably a few dozen elements within each record. The files are updated by a third party on a daily basis (and are available as zipped files on the third-party website). Each day's file represents new case files as well as updated case files.
The goal is to allow a user to search for information and organize those search results on the page (or in a generated pdf/excel file). For example, a user might want to see all case files that include a particular word within the <text> element. Or a user might want to see all case files that include primary code 025 (<primary-code> element) and that were filed after a particular date (<filing-date> element).
The only data entered into the database will be from the XML files--users won't be adding any of their own information to the database.
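As a rough illustration of the second example search (primary code 025, filed after a given date), here is a minimal sketch of a parameterized query. The table and column names (case_file, classification, serial_number, filing_date, primary_code) are hypothetical and not part of the feed; the $1/$2 placeholders assume PostgreSQL.

```javascript
// Hypothetical schema: case_file(serial_number, filing_date) and
// classification(serial_number, primary_code).
// Dates are kept in the feed's YYYYMMDD form, which sorts correctly as text.
function buildSearchQuery(primaryCode, filedAfter) {
  return {
    text:
      'SELECT DISTINCT cf.serial_number, cf.filing_date ' +
      'FROM case_file cf ' +
      'JOIN classification c ON c.serial_number = cf.serial_number ' +
      'WHERE c.primary_code = $1 AND cf.filing_date > $2 ' +
      'ORDER BY cf.filing_date',
    values: [primaryCode, filedAfter]
  };
}

// All case files with primary code 025 filed after 31 December 2013
var query = buildSearchQuery('025', '20131231');
console.log(query.text);
console.log(query.values);
```

Keeping the user's input in values rather than concatenating it into the SQL text avoids injection problems.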
All steps could certainly be accomplished using node.js. There are modules available that will help you with each of these tasks:
node-cron: lets you easily set up cron tasks in your node program. Another option would be to set up a cron task on your operating system (lots of resources available for your favourite OS).
download: module to easily download files from a URL.
xml-stream: allows you to stream a file and register events that fire when the parser encounters certain XML elements. I have successfully used this module to parse KML files (granted they were significantly smaller than your files).
node-postgres: node client for PostgreSQL (I am sure there are clients for many other common RDBMS, PG is the only one I have used so far).
Most of these modules have pretty great examples that will get you started. Here's how you would probably set up the XML streaming part:
var fs = require('fs');
var XmlStream = require('xml-stream');

var xml = fs.createReadStream('path/to/file/on/disk'); // or stream directly from your online source
var xmlStream = new XmlStream(xml);

// Fires once for each complete <case-file> element
xmlStream.on('endElement case-file', function(element) {
  // create and execute SQL query/queries here for this element
});

xmlStream.on('end', function() {
  // done reading elements
  // do further processing / query database, etc.
});
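Inside the case-file handler, each element has to be split into rows for the relational tables. The following is a minimal sketch of that step only. It assumes the parser hands over a plain nested object keyed by element name (repeated children may arrive as a single object or as an array, depending on parser settings), and the row and column names are hypothetical.

```javascript
// Normalise "one object or an array of objects" into an array.
function toArray(value) {
  if (value === undefined || value === null) return [];
  return Array.isArray(value) ? value : [value];
}

// Split one parsed <case-file> object into rows for three hypothetical tables.
function caseFileToRows(element) {
  var serial = element['serial-number'];
  var header = element['case-file-header'] || {};
  var statements = toArray((element['case-file-statements'] || {})['case-file-statement']);
  var classifications = toArray((element['classifications'] || {})['classification']);

  return {
    caseFile: {
      serial_number: serial,
      transaction_date: element['transaction-date'],
      filing_date: header['filing-date']
    },
    statements: statements.map(function (s) {
      return { serial_number: serial, code: s.code, text: s.text };
    }),
    classifications: classifications.map(function (c) {
      return {
        serial_number: serial,
        international_code_total_no: c['international-code-total-no'],
        primary_code: c['primary-code']
      };
    })
  };
}

// The sample record from the question, in the assumed object shape
var sample = {
  'serial-number': '123456789',
  'transaction-date': '20150101',
  'case-file-header': { 'filing-date': '20140101' },
  'case-file-statements': {
    'case-file-statement': [
      { code: 'AQ123', text: 'Case file statement text' },
      { code: 'BC345', text: 'Case file statement text' }
    ]
  },
  classifications: {
    classification: { 'international-code-total-no': '1', 'primary-code': '025' }
  }
};
console.log(JSON.stringify(caseFileToRows(sample), null, 2));
```

Because each day's file also contains updated case files, the inserts would probably need to be upserts keyed on the serial number.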
Are you sure you need to put the data in a relational database, or do you just want to search it in general?
There don't seem to be any actual relations in the data, so it might be simpler to put it in a document search index such as ElasticSearch.
Any automatic XML to JSON converter would probably produce suitable output. The large file size is an issue. This library, despite its summary saying "not streaming", is actually streaming if you inspect the source code, so it would work for you.
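To make the document-index idea a bit more concrete, here is a small sketch that turns already-converted case-file objects into a request body in the newline-delimited format used by the ElasticSearch bulk API. The index name "case-files" and the choice of the serial number as document id are assumptions for illustration.

```javascript
// Build a bulk request body: one action line plus one source line per
// document, each terminated by a newline.
function toBulkBody(indexName, docs) {
  return docs.map(function (doc) {
    var action = { index: { _index: indexName, _id: doc['serial-number'] } };
    return JSON.stringify(action) + '\n' + JSON.stringify(doc) + '\n';
  }).join('');
}

var body = toBulkBody('case-files', [
  {
    'serial-number': '123456789',
    'filing-date': '20140101',
    'primary-code': '025'
  }
]);
console.log(body);
```

Using the serial number as the id means a later file containing an updated case file would overwrite the earlier document instead of duplicating it.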
I had a similar task with XML files. These are the principles I used:
I stored all incoming files as-is in the DB (XMLTYPE), because I needed the source file info;
I parsed all incoming files with an XSL transformation. For example, I see three entities here: fileInfo, fileCases, fileClassification. You can write an XSL transformation that compiles the source file info into 3 entity types (in tags FileInfo, FileCases, FileClassification);
Once you have the transformed XML output, you can write 3 procedures that insert the data into the DB (each entity into its own area of the DB).
I'm coding a webpage that needs to read some data from different CSV files in a directory, depending on the user's country.
The paths look something like this:
./csv/m2-2022-10-25_13_45_55_es.csv
m2-2022-10-25_13_45_56_fr.csv
m2-2022-10-25_13_46_04_it.csv
etc
Those files will be replaced regularly; the only part that will always stay the same is the country code (es, fr, it, etc.).
So what I need is to list all the files in that directory into an array, then loop through the array to check whether the last characters of the filename are $countryCode + ".csv", and run some code when they are.
But I can't find out how. All the solutions I find use Node.js; is there a solution using only JavaScript (or jQuery)?
Regards!
You cannot do that with pure JavaScript in the browser: if a page could search the files on a computer using only JavaScript, it would be a huge security hole.
You must use Node.js to open the files, but you can build an API on the Node.js side, call it from your browser JavaScript, and send the content of the file back as the response.
Here are some links that might help you:
FS : https://nodejs.org/api/fs.html
NodeJS api : https://medium.com/swlh/how-to-create-a-simple-restful-api-in-node-js-ae4bfddea158
You can check a similar question here:
Get list of filenames in folder with Javascript
You can't access the filesystem from the frontend. That would be a huge security hole, because anyone could read your filesystem tree.
You have to write a function in the backend that builds the array you want and sends it to the frontend.
If you create a backend function that returns the array of files in the folder, you can call it from the frontend via XMLHttpRequest or Fetch and use the array in your JS file.
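A minimal Node.js sketch of such a backend function is shown below. The directory ./csv comes from the question; the function names, the country query parameter, and the choice to return the newest matching file are assumptions for illustration. The match uses "_" + countryCode + ".csv" so that a code cannot accidentally match the tail of another name.

```javascript
var fs = require('fs');
var path = require('path');

var CSV_DIR = './csv';

// Return the path of the newest CSV for a country code, or null if none.
// The timestamp in the file name sorts lexicographically, so the last
// entry after sorting is the most recent one.
function findCsvForCountry(dir, countryCode) {
  var suffix = '_' + countryCode + '.csv';
  var matches = fs.readdirSync(dir).filter(function (name) {
    return name.endsWith(suffix);
  });
  matches.sort();
  return matches.length ? path.join(dir, matches[matches.length - 1]) : null;
}

// Request handler: GET /?country=es streams the matching CSV back.
function handleCsvRequest(req, res) {
  var country = new URL(req.url, 'http://localhost').searchParams.get('country') || '';
  // Only accept two lowercase letters, so user input never reaches the path.
  var file = /^[a-z]{2}$/.test(country) ? findCsvForCountry(CSV_DIR, country) : null;
  if (!file) {
    res.writeHead(404, { 'Content-Type': 'text/plain' });
    res.end('No CSV for that country');
    return;
  }
  res.writeHead(200, { 'Content-Type': 'text/csv' });
  fs.createReadStream(file).pipe(res);
}

// To serve it:
//   require('http').createServer(handleCsvRequest).listen(3000);
// And from the page:
//   fetch('/?country=es').then(function (r) { return r.text(); })
```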
I'm new to Excel Web Add-Ins and want to figure out if it's possible to make an add-in that can export a custom file.
I've looked around and all I find are Excel-specific commands like Workbook.SaveAs(), but I can't find anything on making custom export functions. I need to convert the file into XML with a specific structure, so I could simply process the data before saving it as XML. But again, I can't find much of anything to suggest that this is supported.
How would I go about writing a file to disk from Excel that isn't just the Workbook?
There's no API that supports exporting a custom file to disk. There does seem to be a workaround, but it only works in Excel Online.
Please see this link:
How to create a file in memory for user to download, but not through server?
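One common way to build a file in memory in the browser is sketched below: serialize the data to a string, wrap it in a Blob, and trigger a download through a temporary link. The element names (rows, row, cell) and the function names are hypothetical; the 2D array stands in for cell values that the add-in would have read from the worksheet beforehand.

```javascript
// Escape characters that are special in XML text and attributes.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Turn a 2D array of cell values into a simple XML document.
function rowsToXml(rows) {
  var body = rows.map(function (row) {
    var cells = row.map(function (cell) {
      return '    <cell>' + escapeXml(cell) + '</cell>';
    }).join('\n');
    return '  <row>\n' + cells + '\n  </row>';
  }).join('\n');
  return '<?xml version="1.0" encoding="UTF-8"?>\n<rows>\n' + body + '\n</rows>\n';
}

// Browser only: offer the text as a file download without a server round trip.
function downloadTextFile(filename, text) {
  var blob = new Blob([text], { type: 'application/xml' });
  var url = URL.createObjectURL(blob);
  var link = document.createElement('a');
  link.href = url;
  link.download = filename;
  document.body.appendChild(link);
  link.click();
  document.body.removeChild(link);
  URL.revokeObjectURL(url);
}

// Usage inside the add-in, once the values are available:
//   downloadTextFile('export.xml', rowsToXml(values));
console.log(rowsToXml([['Name', 'Total'], ['A & B', 42]]));
```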
The closest thing there is for what you want to do is:
Office.context.document.getFileAsync(Office.FileType.Compressed, (result) => {
const file = result.value;
// do whatever ...
});
The file variable in this case is a File object that gives access to the entire document in Office Open XML (OOXML) format; its contents are read in slices, each of which is returned as a byte array.
I need to store a file pairing colours and images for use in my JavaScript. I would have liked to use a simple CSV file and Papa Parse, but PP requires either text or a File object as input, and I can find no way of opening a File object, nor of reading the text from the CSV file. Surely my code should be allowed to read files that reside under the web site, not randomly among the file system?
My alternative is to have the end user, non-technical, edit a JSON file that is parsed by my code.
Am I wrong, or is this the case? Then maybe I should build an editor for the JSON file that simplifies the data editing for the end user.
All the file has to store is colour/image name pairs.
Have you tried using something like this? Are you getting an error? Sorry, I cannot leave comments yet.
StreamWriter _testData = new StreamWriter(Server.MapPath("~/data.txt"), true);
_testData.WriteLine(TextBox1.Text); // Write the file.
_testData.Flush();
_testData.Close(); // Close the instance of StreamWriter.
_testData.Dispose(); // Dispose from memory.
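For the original question of reading a CSV that lives under the web site, a page can request the file over HTTP like any other resource and work with the response text. Below is a minimal sketch assuming one "colour,image" pair per line with no quoted commas; the function names and the URL are hypothetical. The same text could instead be handed to Papa Parse, which accepts text input.

```javascript
// Parse simple "colour,image" lines into objects; blank lines are skipped.
function parseColourPairs(csvText) {
  return csvText
    .split(/\r?\n/)
    .map(function (line) { return line.trim(); })
    .filter(function (line) { return line.length > 0; })
    .map(function (line) {
      var parts = line.split(',');
      return { colour: parts[0].trim(), image: (parts[1] || '').trim() };
    });
}

// Fetch the CSV from the site and parse it.
function loadColourPairs(url) {
  return fetch(url)
    .then(function (response) {
      if (!response.ok) throw new Error('HTTP ' + response.status);
      return response.text();
    })
    .then(parseColourPairs);
}

// Usage in the page:
//   loadColourPairs('data/colours.csv').then(function (pairs) { ... });
console.log(parseColourPairs('red, apple.png\nblue, sky.png\n'));
```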
I've been looking around for quite some time now and I can't seem to find a way to get a list of subfolders from a specific directory.
An example would be, if I'm at www.mysite.com/projects and inside projects there are several folders containing individual project files.
the reason I want to do this is I was going to make a script that would add new project's names to a menu using the sub folder names.
Am I missing something? Is this even possible with jQuery or JavaScript?
I've gone as far as getting pathnames and locations, and I also had a look at ActiveXObjects, but I can't get anything to work on either my PC or the server.
Any help would be appreciated.
There is no such thing as a directory in HTTP. Only resources.
Some of those resources might be an HTML document that lists some other resources (which are in a particular directory on a file system on a computer running the HTTP server). Most HTTP servers will generate such documents for you automatically.
You need to have your server generate a suitable response for a suitable request. Then use (since you mention jQuery) the ajax() method to make that request.
Then you need to parse the response. You can either use the default directory index page and then parse the HTML returned, or you can write a server side program to generate the data in a nicer format (such as JSON).
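If you go the route of parsing the default directory index page, a rough sketch could look like this. It assumes the index lists each subfolder as a relative link ending in a slash (the exact markup varies between servers), and the function name is hypothetical. A regular expression is used here to keep the example short; in the browser, parsing the HTML into a document and querying its links would be more robust.

```javascript
// Extract subfolder names from an auto-generated directory index page.
// Matches relative hrefs made of a single path segment followed by "/".
function extractSubfolders(indexHtml) {
  var pattern = /href="([^"\/?#:]+)\/"/g;
  var names = [];
  var match;
  while ((match = pattern.exec(indexHtml)) !== null) {
    var name = decodeURIComponent(match[1]);
    // Skip links to the current and parent directory.
    if (name !== '.' && name !== '..') names.push(name);
  }
  return names;
}

// Usage with jQuery:
//   $.ajax({ url: '/projects/', dataType: 'html' }).done(function (html) {
//     var folders = extractSubfolders(html);
//   });
var sampleIndex =
  '<a href="../">Parent Directory</a>\n' +
  '<a href="alpha/">alpha/</a>\n' +
  '<a href="my%20project/">my project/</a>\n' +
  '<a href="notes.txt">notes.txt</a>';
console.log(extractSubfolders(sampleIndex)); // [ 'alpha', 'my project' ]
```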
That said…
the reason I want to do this is I was going to make a script that would add new project's names to a menu using the sub folder names.
You would almost certainly be better off doing that on the server. You'll get more reliable, faster and search engine friendly results.
ActiveX is a technology that gives JScript (the Microsoft implementation of JavaScript) more access to the client's computer, and it only works in Internet Explorer.
Folders on the server are not like folders in a filesystem. Any folder/subfolder has the potential to contain an index.html which outputs some text (not necessarily the list of subfolders it contains).
Also, most web server configurations have an option enabled that hides the subfolder list even when no index.html is present.
What you can do is place an index.php file in that folder with the following code:
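<?php
// List only the subdirectories of this folder, skipping '.' and '..'
$entries = array_diff(scandir('.'), array('.', '..'));
$directories = array_values(array_filter($entries, 'is_dir'));
header('Content-Type: application/json');
echo json_encode($directories);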
And you can receive this content as such:
$.getJSON('http://domain.com/path/to/folder/', function(directories) {
  do_something(directories);
});
I hope the following isn't too tricky.
I have a simple html button. Now I want to open a filechooser as soon as a user clicks on this button.
I do this like the following:
$('.button').click(function()
{
  // Create a file input on the fly and trigger it to open the chooser
  $('<input type="file"/>').click();
});
This opens a filechooser, but I want this file-chooser to only show files on the server, not on the client. I've searched the net but couldn't find an adequate solution so far.
Any proposals are welcome :)
Impossible, sorry. You'd need to use server side code to make a tool that allows the end user to browse the server's files.
The file input is used for the end user to choose file(s) on their machine. It has no knowledge of the server's files.
It's not tricky, but you can't use an input tag for it. The steps are:
Create a module, in whatever server implementation you choose, that traverses the directory on your server and outputs the result in JSON format
Create a REST endpoint to give the browser the JSON output from step #1
Use AJAX to call this REST webservice and get the directory listings
Use a tree widget to build the file structure from the JSON (I am sure that if you look, there is probably one out there already for you to use)
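Steps 1 and 2 could be sketched in Node.js roughly as follows. The function name, the shape of the JSON (name, type, children), and the root directory are assumptions for illustration; any server-side language would do the same job.

```javascript
var fs = require('fs');
var path = require('path');

// Step 1: walk a directory recursively and describe it as nested objects.
function listTree(dir) {
  return fs.readdirSync(dir, { withFileTypes: true })
    .sort(function (a, b) { return a.name.localeCompare(b.name); })
    .map(function (entry) {
      if (entry.isDirectory()) {
        return {
          name: entry.name,
          type: 'directory',
          children: listTree(path.join(dir, entry.name))
        };
      }
      return { name: entry.name, type: 'file' };
    });
}

// Step 2: expose the listing as JSON. BROWSE_ROOT is a hypothetical folder;
// only that folder is ever listed, so the client cannot pick arbitrary paths.
var BROWSE_ROOT = './files';
function handleTreeRequest(req, res) {
  res.writeHead(200, { 'Content-Type': 'application/json' });
  res.end(JSON.stringify(listTree(BROWSE_ROOT)));
}

// To serve it:
//   require('http').createServer(handleTreeRequest).listen(3000);
```

The browser side (steps 3 and 4) then only needs to request this endpoint and feed the returned array to whichever tree widget is used.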
There's no simple way to do it. If you use jQuery UI you can use a plugin like this:
http://gusc.lv/jquery/gcmedia.html
With a server-side script that outputs a list of the files you want to make browsable.