I am having a hard time managing the storage and access of a large dataset within a Ruby on Rails application. Here is my application in a nutshell: I run Dijkstra's algorithm over a road network and then display the nodes it visits using the Google Maps API. I am using an open dataset of the US road network to construct the graph by iterating over the two txt files given in the link, but I am having trouble storing this data in my app.
I am under the impression that a large dataset like this is not a good fit for ActiveRecord objects - I don't need to modify the contents of this data, only access it and cache it locally in a hash so I can run Ruby methods on it. I have tried a few things, but I am running into trouble.
I figured that it would make the most sense to parse the txt files and store the graph in YAML format. I would then be able to load the graph into a DB as seed data and grab the whole graph with Node.all, or something along those lines. Unfortunately, the YAML file becomes too large for Rails to handle: running the Rake task pegs the system at 100% CPU indefinitely...
Next I figured, since I don't need to modify the data, I could just build the graph every time the application loads, as part of its initialization. But I don't know exactly where to put this code; I need to run some methods, or at least a block of code, and then store the result in some sort of global/session variable that I can access in all controllers/methods. I don't want to pass this large dataset around, just have access to it from anywhere.
The way I am currently doing it is just not acceptable: I parse the text files and build the graph inside a controller action, and hope that it finishes computing before the server times out.
Ideally, I would store the graph in a database and grab its entire contents to use locally. Or at least parse the data only once as the application loads, and then be able to access it from different page views, etc. I feel like this would be the most efficient approach, but I am running into hurdles at the moment.
Any ideas?
You're on the right path. There are a couple of ways to do this. One is, in your model class, outside of any method, set up constants like these examples:
MY_MAP = Hash[ActiveRecord::Base.connection.select_all('SELECT thingone, thingtwo from table').map{|one| [one['thingone'], one['thingtwo']]}]
RAW_DATA = `cat the_file` # However you read and parse your file
CA = State.find_by_name 'California'
NY = State.find_by_name 'New York'
These will get executed once in a production app: when the model's class is loaded. Another option: do this initialization in an initializer or other config file. See the config/initializers directory.
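For example, a minimal sketch of such an initializer, assuming hypothetical file names and a simple "from to weight" edge-list format (the parsing details are placeholders for however your txt files are laid out):

# config/initializers/road_graph.rb
# Hypothetical sketch: build the adjacency hash once at boot.
ROAD_GRAPH = Hash.new { |h, k| h[k] = [] }

File.foreach(Rails.root.join('data', 'edges.txt')) do |line|
  from, to, weight = line.split
  ROAD_GRAPH[from] << [to, weight.to_f]
end

The constant is then visible from any controller or model without passing the dataset around.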
Related
I am writing my first web application with JavaScript and WebGL. For now I am running the app on localhost from Apache. The app needs to work with data that is available immediately. Until now I have worked with AJAX calls made at runtime, which no longer works for my purposes. So instead of serving individual files from server to client on request, I want the application to load all files from the server to the client at initialization time (I want this to happen automatically at startup so I don't have to add every new file as a URL in the HTML index).

I understand I should do this with server-side scripting, probably with PHP since I have an Apache localhost? I have different folders which hold my necessary resources in a uniform data format (.txt, .png and .json). So what I want to do is, before the JavaScript app starts, look through each folder and send one object per folder that holds filenames as keys bound to file data.

Is my intuition right that I need to do that with PHP? If yes, where do I start telling my application what to do when (first serve the files with PHP, then start the JavaScript app)? How do I do this on localhost? Should I already think about extending my toolset (e.g. using Node.js on the server side, locally for now)? If so, what lightweight tools do you propose for this kind of work? I feel I am missing some design principles here.
EDIT:
Keep in mind that I don't want to specifically call a single file... I am already doing that. What I need is a script that automatically serves all the files of a certain folder on the server to the client side at init time of the app, before the program logic of the actual application starts.
Your question is kind of broad, so I'll try my best. Why does AJAX not work for real-time data, but loading all the files once does? If you're working with real-time data, why not look into a WebSocket or, at the bare minimum, AJAX queries?
If you want to pass data from the server to the client, you will need to use an HTTP request no matter what. A GET or POST request is necessary for the client to request data from the server and receive it as a response.
You could theoretically pass the data from PHP straight to the view of the application (which is technically done through a GET request whenever a user requests something like a .php file from the server), but this isn't as flexible as giving JavaScript access to the data. You can hack around it and 'transfer' the data from the view to JavaScript with some .value methods, but this isn't ideal and can be prone to security holes. It also means the data is only passed once.
So the data would need to be processed upon initialization and then immediately transferred to the client by way of JavaScript and HTTP requests.
So if you want JavaScript to have access to the data and to use it in variables or manipulate it further, you need an HTTP request such as GET or POST issued from JavaScript. Otherwise, you have to pass the data to the view upon initialization (through PHP), but then you can't work with real-time data, because the data is only passed once per page refresh.
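For example, a minimal client-side sketch, assuming a hypothetical PHP endpoint named loadfiles.php that returns the JSON object described further below, could request everything once before the app starts:

// Hypothetical: request all the preprocessed file data once, then start the app
fetch('loadfiles.php')
    .then(function (response) { return response.json(); })
    .then(function (filesByName) {
        startApp(filesByName); // placeholder for your WebGL entry point
    });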
Example of scandir():
<?php
// scandir() returns filenames (including '.' and '..'), not the data from the files
$fileArray = scandir('datafolder/'); // relative path reference to the folder 'datafolder'
$finalArray = [];
foreach ($fileArray as $filename) {
    if ($filename === '.' || $filename === '..') {
        continue; // skip the directory entries scandir() always returns
    }
    // scandir() only returns the filename, not the path, so prepend the folder so fopen() knows where to look
    $file = fopen('datafolder/' . $filename, 'r');
    $tempArray = fgetcsv($file, 1024); // temp array holding the first CSV line of this file
    fclose($file);
    array_push($finalArray, $tempArray); // store the data for later use
}
The data can then be used however you need, depending on what it is. Say, if you need to combine the data from multiple .csv files, you can read each file and append it to a single array. If you want to read multiple distinct files and preserve the independence of each one, you can create multiple arrays and then pass back a single JSON-encoded object that contains each file's data as a separate attribute, such as:
{
    "dataOne": [0, 1, 2, 3, 4, 5, 6, ...],
    "dataTwo": ["new", "burger", "milkshake"],
    "dataThree": ["Mary", "Joe", "Pam", "Eric"]
}
Which can be created with a PHP associative array using one of the following methods:
// assuming $arrayOne is already assigned from reading a file and storing its contents
$data['dataOne'] = $arrayOne;
// or
$data = array_merge($data, ['dataTwo' => $arrayTwo]);
// or
$data += [
    'dataThree' => ['Mary', 'Joe', 'Pam', 'Eric']
];
Then $data, a single array containing all the different sets of data, can simply be JSON-encoded and passed back if each set needs to stay distinct.
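To hand that structure back to JavaScript as the JSON object shown above, the PHP script could simply end with something like this (a sketch, assuming $data holds the associative array built above):

header('Content-Type: application/json');
echo json_encode($data);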
This may be a "stupid" question to ask, but I am working with "a lot" of data for the first time.
What I want to do: query the World Bank API.
Problem: The API is very inflexible when it comes to searching/filtering... I could query every country/indicator by itself, but that would generate a lot of calls. So I wanted to download all information about a country or indicator at once and then sort it on the machine.
My question: Where/how should I store the data? Can I simply put it into an array, and do I have to worry about size? Should I write it to a temporary JSON file? Or do you have another idea?
Thanks for your time!
Example:
20 Countries, 15 Indicators
If I queried every country by itself I would generate 20*15 = 300 API calls; if I requested ALL countries for one indicator at a time it would take only 15 API calls, but I would get a lot of "junk" data :/
You can keep the data in RAM in an appropriate data structure (array or object) if the following are true:
The data is only needed temporarily (during one particular operation) or can easily be retrieved again if your server restarts.
You have enough available RAM in your node.js process to hold the data. In a typical server environment, there might be more than a GB of RAM available. I wouldn't recommend using all of that, but you could easily use 100 MB of it for data storage.
Keeping it in RAM will likely make it faster and easier to interact with than storing it on disk. The data will, obviously, not be persistent across server restarts if it is in RAM.
If the data is needed long term, and you only want to fetch it once and then have access to it over and over again even if your server restarts, or if the data is more than hundreds of MBs, or if your server environment does not have a lot of RAM, then you will want to write the data to an appropriate database where it will persist and where you can query it as needed.
If you don't know how large your data will be, you can write code to temporarily put it in an array/object and observe the memory usage of your node.js process after the data has been loaded.
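A rough sketch of that check in node.js (the array name and loading step are placeholders):

// Hypothetical: after loading all the API responses into `data`,
// log how much heap the process is actually using
const data = []; // filled with parsed API responses elsewhere
const usedMb = process.memoryUsage().heapUsed / 1024 / 1024;
console.log('Heap used: ' + usedMb.toFixed(1) + ' MB');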
I would suggest storing it in a NoSQL database, since you'll be working with JSON, and querying it from there.
MongoDB is very 'node friendly' - there's the native driver - https://github.com/mongodb/node-mongodb-native
or mongoose
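For example, a hedged sketch with the native driver, assuming a local MongoDB instance and hypothetical database/collection names:

// Hypothetical: store the parsed API responses in a local MongoDB collection
const { MongoClient } = require('mongodb');

async function storeIndicators(records) {
    const client = await MongoClient.connect('mongodb://localhost:27017');
    try {
        const collection = client.db('worldbank').collection('indicators');
        await collection.insertMany(records); // records = parsed JSON from the API
    } finally {
        await client.close();
    }
}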
Storing data from an external source you don't control brings with it the complexity of keeping the data in sync if the data happens to change. Without knowing your use case or the API it's hard to make recommendations. For example, are you sure you need the entire data set? Is there a way to filter down the data based on information you already have (user input, etc)?
I am using AngularJS and I have to import/export an array.
I can export the array by converting it into a JSON object and then using the FileSave.js library to save the file locally.
Now I can't find any information about how to import this JSON file from my PC into my application and then convert it into an object so I can display the array.
Thanks
Client-side JavaScript is unable to access the local file system by design, for security reasons. As far as I am aware, there are 4 possible solutions for you. I've listed them below in order of ease.
1) Create a variable in your program, and simply copy-paste the contents of your JSON file into your JS file as the value. This will take two seconds, but it can get really messy if your JSON file is large or if you need to use multiple JSON files.

var localJSONFile = [literally copy-pasted JSON text]
2) Check out Brackets by Adobe. I just did some quick googling and found this page that shows how to access local files. Open that and do a Ctrl+F for 'local' and you'll find it. This is my recommended approach, as it's fast and easy. You will have to switch your IDE, but if you are just starting out, most IDEs (Brackets, Sublime, VS Code, Atom) will feel the same anyway.
3) Create a basic Angular service to inject into your program with the sole purpose of storing copy-pasted JSON files as variables. This is ultimately the same as 1), but it will keep the files you are working in less cluttered and easier to manage. This is probably the best option if you don't want to switch IDEs and will only have a couple of JSON files to work with.
4) Get a local server going. There are tons of different options. When I was in your position I went the Node.js route. There is definitely a learning curve involved, as there is with setting up any server, but at least with Node you are still using JavaScript, so you won't have to learn a new language. This is the recommended approach if you know you will have lots of different data files flowing back and forth in the project you are working on; if that is the case, you will ideally have a back-end developer joining you soon.

If not, you can set up a server quickly by downloading Node.js and npm (which comes with it) and using npm from your command prompt to install Express, and then express-generator. With express-generator you can run an init command from your command line and it will build an entire, fully functioning web server for you, including the local folder structure. Then you just go to the file it provides for your routes and adjust it.

Node.js CAN read your local file system, so you could set up a quick route that, when hit, reads a file from the file system and sends it to the requester; a rough sketch of such a route is shown below. That would let you move forward immediately. If you need to add a database later on, you will need to install one locally, get the plugins from npm for that database (there are tons, so no worries there), and then update your route to read from the database instead.
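A rough sketch of such a route, with hypothetical file and port names:

// Hypothetical Express route: read a local JSON file and hand it to the requester
const express = require('express');
const fs = require('fs');

const app = express();

app.get('/myJsonFile.json', function (req, res) {
    fs.readFile('data/myJsonFile.json', 'utf8', function (err, contents) {
        if (err) {
            return res.status(500).send(err.message);
        }
        res.type('application/json').send(contents);
    });
});

app.listen(3000); // the port is arbitrary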
This seems too easy, so forgive me if I'm oversimplifying:
$http.get('/myJsonFile.json').
success(function(data, status, headers, config) {
$scope.myJsonData = data;
});
Or, if your response headers aren't set up to serve application/json:
$http.get('/myJsonFile.json').
success(function(data, status, headers, config) {
$scope.myJsonData = JSON.parse(data);
});
I have a fairly large application and I'm currently trying to find a way around having to pass data from PHP (user tokens for third-party APIs and such) through the DOM. Currently I use data-* attributes on a single element and parse the data from that, but it's pretty messy.
I've considered just making the contents of the element encoded JSON with all the config in it, which would greatly improve the structure and effectiveness, but at the same time storing sensitive information in the DOM isn't ideal or secure whatsoever.
Getting the data via AJAX is also not very feasible, as the application requires this information all the time, on any page, so running an AJAX request on every page load before allowing user input or control would be a pain for users and add load to my server.
Something I've considered is making an initial request for the information, storing it in the cache/localStorage along with a checksum of the data, and including the checksum for the up-to-date data in the DOM. On every page load it would compare the checksums, and if they are different (JavaScript has out-of-date data stored in cache/localStorage), it would send another request.
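Roughly, I imagine it working something like this on the client (the endpoint, storage keys and attribute names are just placeholders):

// Hypothetical sketch: the server renders the current checksum into the DOM,
// e.g. <body data-config-checksum="abc123">
var storedChecksum = localStorage.getItem('configChecksum');
var currentChecksum = document.body.dataset.configChecksum;

if (storedChecksum !== currentChecksum) {
    // cached copy is stale or missing: fetch fresh data once and cache it
    fetch('/api/config')
        .then(function (res) { return res.json(); })
        .then(function (config) {
            localStorage.setItem('configChecksum', currentChecksum);
            localStorage.setItem('configData', JSON.stringify(config));
        });
}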
I'd rather not have to go down this route, and I'd like to know if there are any better methods that you can think of. I can't find any alternative methods in other questions/Google, so any help is appreciated.
You could also create a PHP file and set its header to the JavaScript content type, then request it as a normal JavaScript file: <script src="config.js.php"></script> (assuming the filename is config.js.php). You can structure your JavaScript code in it and simply assign the values dynamically.
For security, especially if login is required, this file should only return the values once the user is logged in; otherwise you simply return a blank file.
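A minimal sketch of what config.js.php might look like (the session check and the $config values are placeholders):

<?php
header('Content-Type: application/javascript');

session_start();
if (empty($_SESSION['user_id'])) {
    exit; // not logged in: return a blank file
}

$config = ['apiToken' => $_SESSION['api_token']]; // placeholder values
?>
var AppConfig = <?php echo json_encode($config); ?>;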
You could also just emit the JSON you need in your template and assign it to a JavaScript global.
This would be especially easy if you were using a templating system that supports inheritance, like Twig. You could then do something like this in the base template for your application:
<script>
MyApp = {};
MyApp.cfg = {{cfg | tojson | safe}};
</script>
where cfg is a PHP array in the templating context. Those filters aren't Twig-specific; they're there to give you an idea.
It wouldn't be safe if you were storing sensitive information, but it would be easier than storing the info in local storage.
I'm building a Grails application and loading my backend by converting my CSV tables into JSON files and rendering them to index.gsp for reading.
Most of the functionality of the dashboard is done, but there is a major flaw in my code: I'm loading JSON files of around 55 MB into the browser every time a selection is made. This is absolutely not recommended; there should be a middle tier (a socket, etc.) or something which takes the main JSON file and gives the browser exactly what is needed to show the data visualization, whose size should be in kB for best performance.
I am very new to this and was trying to resolve it in the JavaScript part of my code, but the problem lies with my Groovy controller part: I need a placeholder to store the JSON file and pull only the relevant data into my browser to prevent a crash.
Any suggestions/approaches to this problem?
UPDATE:
So, after consulting with a JavaScript guy, I will have to use AJAX calls in both index.gsp and the Groovy controller part, so that only the relevant data is picked up in the browser while the remaining data stays in the controller; every time a filter is changed, only the relevant data comes to the browser.
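Something like this is what I have in mind on the controller side (the controller, action and field names are just placeholders):

import grails.converters.JSON

class DashboardController {

    // Hypothetical action: return only the rows matching the selected filter
    def filteredData() {
        def allRows = loadFullDataset() // placeholder for wherever the parsed CSV/JSON lives
        def subset = allRows.findAll { it.category == params.category }
        render subset as JSON
    }
}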
I'm a beginner in AJAX, Groovy and Grails.
After searching for a while, it seems I can use AJAX-driven selects in GSP and also remoteFunction().
Also, it seems the filters plugin or some kind of "params" has to be used to solve this issue.
Any suggestions/approaches as to how to proceed will be appreciated.
If by loading the backend you mean that you are creating objects in your database or memory, you can do this very easily through BootStrap.groovy in the conf folder.
Look at the "Creating Test Data" part of this page for more information:
https://grails.org/Quick+Start
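A minimal sketch of that approach (the Record domain class and the CSV layout are hypothetical):

// grails-app/conf/BootStrap.groovy
class BootStrap {

    def init = { servletContext ->
        // Hypothetical: seed the database once at startup from a CSV file
        new File('data/records.csv').splitEachLine(',') { fields ->
            new Record(name: fields[0], value: fields[1]).save(failOnError: true)
        }
    }

    def destroy = {
    }
}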
Hope this helps.