How to Properly merge 2 JS Objects on a single property - javascript

I have 3 JSON files. Let's call them file1.json, file2.json and file3.json.
They all look very similar and are in this structure:
"orgItems":[...]
There is a top-level orgItems property. What I'm struggling with is that these are 3 large files, almost 20-30 MB each. I want to concatenate all of these into 1 file.
What's the best way to grab the orgItems array out of each of the 3 files and combine them all into one object in one file? I just want a single file, combinedResponses.json, that should have
combinedResponses.json
"orgItems":[...file1.orgItems,...file2.orgItems,...file3.orgItems]

Take a look at JSONStream. It provides an easy way to stream JSON files (so you don't have to load them all into memory at once) and receive events for matching paths. You could pipe the input stream to an output stream that writes the concatenated file.
This is a bit outside my wheelhouse and I'm sure it could be improved, but I think something like this does what you want. You could tweak the JSONStream.parse argument to get only the objects you care about.
const JSONStream = require('JSONStream');
const fs = require('fs');

// Stream one source file, writing each matched value to the output stream.
// Returns a promise so the files can be appended one after another instead
// of having their writes interleave.
function appendData(sourceFile, outputStream) {
  return new Promise((resolve, reject) => {
    const file = fs.createReadStream(sourceFile);
    const jsonStream = JSONStream.parse('*');
    jsonStream.on('data', data => outputStream.write(JSON.stringify(data)));
    jsonStream.on('end', resolve);
    jsonStream.on('error', reject);
    file.pipe(jsonStream);
  });
}

const out = fs.createWriteStream('data/concatted.json', {autoClose: true});

appendData('data/data-1.json', out)
  .then(() => appendData('data/data-2.json', out))
  .then(() => appendData('data/data-3.json', out))
  .then(() => out.end());
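Note that the writes above land in the output file back to back, so the result won't itself be valid JSON. If you need the exact combinedResponses.json shape from the question, a minimal sketch of one way to do it (assuming JSONStream.parse('orgItems.*') so each 'data' event is one array element) is to write the wrapper and separators yourself:

const out = fs.createWriteStream('data/combinedResponses.json');
out.write('{"orgItems":[');

let first = true;
function writeItem(item) {
  // comma-separate every element after the first
  if (!first) out.write(',');
  first = false;
  out.write(JSON.stringify(item));
}

// wire writeItem up to each jsonStream's 'data' events as above,
// and once the last file has ended:
// out.write(']}');
// out.end();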


How to insert a javascript file in another javascript at build time using comments

Is there something on npm or in VS Code or anywhere that can bundle or concatenate JavaScript files based on comments, like:
function myBigLibrary(){
    //#include util.js
    //#include init.js
    function privateFunc(){...}
    function publicFunc(){
        //#include somethingElse.js
        ...
    }
    return {
        init, publicFunc, etc
    }
}
Or something like that? I would think that such a thing is common when your javascript files get very large. All I can find are complicated things like webpack.
I'm looking for any equivalent solution that allows you to include arbitrary code in arbitrary positions in other code. I suppose that would cause trouble for intellisense, but an extension could handle that.
I'm not sure what you really mean, but if you mean referencing variables from other JavaScript files, this will probably help you:
const { variableName } = require('the/javascript/file/path')
example:
in index.js
const { blue } = require('./../js/blue.js')
console.log(blue)
Meanwhile in blue.js
const blue = "dumbass"
module.exports = { blue }
If this doesn't help you, just ignore this.
So here is a bare-bones way to do what I wanted. I have been learning more about what you can do with esbuild and other bundlers, but I didn't quite figure out something that fit my needs, and this is simpler and more flexible. It works for any file type. You can rebuild automatically when files change by running this code with nodemon instead of node.
const fs = require('fs')

/////////////////////////
const input = 'example.js'
const output = 'output.js'
const ext = '.js'
// looks for file with optional directory as: //== dir/file
const regex = /\/\/== *([\w-\/]+)/g

const fileContent = fs.readFileSync(input).toString()

// replace the comment references with the corresponding file content
const replacement = fileContent.replace(regex, (match, group) => {
    const comment = '//////// ' + group + ext + ' ////////\n\n'
    const replace = fs.readFileSync(group + ext).toString()
    return comment + replace
})

// write replacement to a file
fs.writeFileSync(output, replacement)
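For example, with a hypothetical example.js sitting next to a util.js, running the script above replaces the marker in place:

// example.js (input)
function myBigLibrary(){
    //== util
    function publicFunc(){ /* ... */ }
}

// output.js (generated): the //== util marker is replaced by
// "//////// util.js ////////" followed by the full contents of util.js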

How to loop through Cheerio content inside an async function and populate an outside variable?

I need to create an API that web-scrapes GitHub repos, getting the following data:
File name;
File extension;
File size (bytes, KB, MB, etc.);
File number of lines;
I'm using Node with TypeScript so, to get the most out of it, I decided to create an interface called FileInterface, that has the four attributes mentioned above.
And of course, the variable is an array of that interface:
let files: FileInterface[] = [];
Let's take my own repo to use as an example: https://github.com/raphaelalvarenga/git-hub-web-scraping
So far so good.
I'm already pointing to the HTML's files section with the request-promise dependency and storing it in a Cheerio variable so I can traverse the "tr" tags in a loop. As you might expect, those "tr" tags represent the files/folders inside a "table" tag (if you inspect the page, it can easily be found). The loop will fill a temp variable called:
let tempFile: FileInterface;
And at the end of every cycle of the loop, the array will be populated:
files.push(tempFile);
On a GitHub repo's initial page, we can find the file names and their extensions, but not the size and total lines. Those are found by clicking a file, which redirects to the file's own page. Let's say we clicked on README.md:
Ok, now we can see README.md has 2 lines and 91 Bytes.
My problem is that, since this will take a long time, it needs to be an async function, but I can't handle the Cheerio loop inside the async function.
Things that I've tried:
Using map and each methods to loop through it and push in the array files;
Using await before the loop. I knew this one wouldn't actually work since it's just a loop that doesn't return anything;
The last thing I tried, and believed would work, was Promises. But TypeScript complains that the Promise resolves to the "unknown" type, and I'm not allowed to assign the result to the files array, since the types "unknown" and "FileInterface[]" are not compatible.
Below I'll put the code I created so far. I'll upload it to the repo in case you want to download and test (the link is at the beginning of this post), but I need to warn you that this code is in the "repo-request-bad-loop" branch, not in master. Don't forget, because the master branch doesn't have any of what I mentioned =)
I'm making a request in Insomnia to the route "/" and passing this object:
{
    "action": "getRepoData",
    "url": "https://github.com/raphaelalvarenga/git-hub-web-scraping"
}
index-controller.ts file:
As you can see, it calls getRowData, the problematic function. And here it is.
getRowData.ts file:
I will try to help you, although I do not know TypeScript. I redid the getRowData function a bit and now it works for me:
import cheerio from "cheerio";
import FileInterface from "../interfaces/file-interface";
import getFileRemainingData from "../routines/getFileRemaningData";

const getRowData = async (html: string): Promise<FileInterface[]> => {
    const $ = cheerio.load(html);
    const promises: any[] = $('.files .js-navigation-item').map(async (i: number, item: CheerioElement) => {
        const tempFile: FileInterface = {name: "", extension: "", size: "", totalLines: ""};
        const svgClasses = $(item).find(".icon > svg").attr("class");
        const isFile = svgClasses?.split(" ")[1] === "octicon-file";
        if (isFile) {
            // Get the file name
            const content: Cheerio = $(item).find("td.content a");
            tempFile.name = content.text();
            // Get the extension. In case the name is such as ".gitignore", the whole name will be considered
            const [filename, extension] = tempFile.name.split(".");
            tempFile.extension = filename === "" ? tempFile.name : extension;
            // Get the total lines and the size. A new request to the file screen will be needed
            const relativeLink = content.attr("href");
            const FILEURL = `https://github.com${relativeLink}`;
            const fileRemainingData: {totalLines: string, size: string} = await getFileRemainingData(FILEURL, tempFile);
            tempFile.totalLines = fileRemainingData.totalLines;
            tempFile.size = fileRemainingData.size;
        } else {
            // not a file
        }
        return tempFile;
    }).get();
    const files: FileInterface[] = await Promise.all(promises);
    return files;
}

export default getRowData;
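For completeness, getFileRemainingData lives in the asker's repo and is not shown in the post. A hypothetical sketch of it (the .file-info selector and the regexes are assumptions about GitHub's markup at the time, not confirmed by the post) could look like:

import rp from "request-promise";
import cheerio from "cheerio";

const getFileRemainingData = async (url, file) => {
    // fetch the file's own page and scrape the header text,
    // which reads something like "2 lines (2 sloc)  91 Bytes"
    const html = await rp(url);
    const $ = cheerio.load(html);
    const info = $(".file-info").text();
    const totalLines = (info.match(/(\d+)\s+lines?/) || [])[1] || "";
    const size = (info.match(/([\d.]+\s*(?:Bytes|[KMG]B))/i) || [])[1] || "";
    return { totalLines, size };
};

export default getFileRemainingData;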

Return array with fast-csv in Node

I am attempting to parse a large file using the fast-csv library and return its values as an array to a config.js file. Please help, as the value of countries in the config's module.exports section ends up being undefined.
Parser:
import csv from 'fast-csv';

export function getCountries() {
  let countries = [];
  csv.fromPath('./src/config/csv_configs/_country.csv')
    .on('data', function (data) {
      countries.push(data);
    })
    .on('end', function () {
      return countries;
    });
}
Config:
import {getCountries} from '../tools/csv_parser';

let countryList = [];

module.exports = {
  port: process.env.PORT || 8000,
  token: '',
  countries: getCountryList()
};

function getCountryList() {
  if (countryList.length === 0) {
    countryList = getCountries();
  }
  return countryList;
}
I understand this is due to me attempting to return a value from the anonymous function on(), however I do not know the proper approach.
You're correct that returning values from the callback in .on('end') is the source of your problem.
Streams are asynchronous. If you want to use this fast-csv library, you're going to need to return a promise from getCountries(). However, I'm assuming that's not what you want, since you're using the result in a config file, which is synchronous.
Either you need to read your csv synchronously, or you need to refactor the way your application works to be able to have your config be asynchronous. I'm assuming the second option isn't possible.
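For reference, if the asynchronous route were acceptable, the promise-returning version is a small change (a sketch reusing the same fast-csv calls as the question; callers would then have to await it):

export function getCountries() {
  return new Promise((resolve, reject) => {
    let countries = [];
    csv.fromPath('./src/config/csv_configs/_country.csv')
      .on('data', (data) => countries.push(data))
      .on('error', reject)
      // resolve only once the stream has finished reading the file
      .on('end', () => resolve(countries));
  });
}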
You probably want to look into using another CSV library that doesn't use streams, and is synchronous. Two examples from a quick Google search are:
https://www.npmjs.com/package/csv-load-sync
https://www.npmjs.com/package/csvsync
I haven't used either of these libraries personally, but it looks like they'd support what you're trying to do. I'm assuming your CSV file is small enough to all be stored in memory at once, if not, you're going to have to explore more complicated options.
As a side note, is there any specific reason that the data has to be in CSV format? It would seem to be much easier to store it in JSON format. JSON can be imported to your config file directly with require; no external libraries needed.
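For example, assuming a hypothetical countries.json sitting next to the config:

module.exports = {
  port: process.env.PORT || 8000,
  token: '',
  // require() reads and parses the JSON file synchronously at load time
  countries: require('./csv_configs/countries.json')
};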

Trading RAM for CPU (performance issue)

I'm working on a program that deals with files; I can do several things like rename them, read their contents, etc.
Today I'm initializing it as follows:
return new Promise((resolve, reject) => {
    glob("path/for/files/**/*", {
        nodir: true
    }, (error, files) => {
        files = files.map((file) => {
            // properties like full name, basename, extension, etc.
        });
        resolve(files);
    });
});
So I read the contents of a specific directory, return all files within an array, and then use Array.map to iterate over the array and replace each path with an object of properties.
Sometimes I work with 200,000 text files, so this is becoming a problem because it consumes too much RAM.
So I want to replace this with a constructor function that lazy-loads the properties, but I have never done that before, so I'm looking for a helping hand.
That's my code:
const path = require('path');

class File {
    constructor(path) {
        this.path = path;
    }

    extension() {
        return path.extname(this.path);
    }
    // etc
}
So, my main question is: should I just return the evaluated property, or should I replace the method with its value, like this:
extension() {
this.extension = path.extname(this.path);
}
I understand this is a trade-off: I'm going to trade memory for CPU usage.
Thank you.
If you want to reduce RAM usage, I suggest you store an extra meta-data file for each path, as follows:
Keep the paths in memory, or some of them, as necessary.
Save file properties to the hard drive:
files.forEach((file) => {
    // collect the properties you want for the file
    // ...
    var json = { path: file, extension: extension /* , ... */ };
    // mark the metadata file so you can access it later, for example: put it in the same path with a suffix
    var metaFile = file + '_meta.json';
    fs.writeFile(metaFile, JSON.stringify(json), (err) => {
        if (err) throw err;
    });
});
Now all the meta data is on hard drive. This way, I believe, you trade memory for disk space and CPU calls.
If you wish to get properties for a file, just read and JSON.parse its corresponding meta data file.
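A minimal read-back sketch, using the same _meta.json suffix (synchronous variant for brevity):

function getMeta(file) {
    return JSON.parse(fs.readFileSync(file + '_meta.json', 'utf8'));
}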
There's no reason to trade CPU for space. Just walk the tree and process files as they're found. The space needed for walking the tree is proportional to the tree depth if it's done depth first. This almost certainly has the same overhead as just creating the list of paths in your existing code.
For directory walking, the Node.js FAQ recommends node-findit. The documentation there is pretty clear. Your code will look something like:
var finder = require('findit')(root_directory);
var path = require('path');

var basenames = [];
finder.on('file', function (file, stat) {
    basenames.push(path.basename(file));
    // etc
});
Or you can wrap the captured values in an object if you like.
If you store only the path property, the Node.js class instances in your example take about 200k * (path.length * 2 + 6) bytes of memory.
If you want lazy loading for basenames, extensions, etc., use lazy getters:
const path = require('path');

class File {
    constructor(path) {
        this.path = path;
        this._basename = null;
        this._extname = null;
    }

    get extname() {
        // computed on first access, then cached on the instance
        return this._extname || (this._extname = path.extname(this.path));
    }

    get basename() {
        return this._basename || (this._basename = path.basename(this.path));
    }
}
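Usage is then plain property access; each value is computed on first read and cached on the instance:

const file = new File('/some/dir/report.txt');
console.log(file.extname); // ".txt", computed via path.extname and cached
console.log(file.extname); // ".txt", served from this._extname, no recomputation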

NodeJS & Gulp Streams & Vinyl File Objects - Gulp Wrapper for NPM package producing incorrect output

Goal
I am currently trying to write a Gulp wrapper for NPM Flat that can be easily used in Gulp tasks. I feel this would be useful to the Node community and would also accomplish my goal. The repository is here for everyone to view, contribute to, play with and pull request. I am attempting to make flattened (using dot notation) copies of multiple JSON files, then write them to the same folder with the file extension changed from *.json to *.flat.json.
My problem
The results I am getting back in my JSON files look like vinyl-files or byte code. For example, I expect output like
"views.login.usernamepassword.login.text": "Login", but I am getting something like {"0":123,"1":13,"2":10,"3":9,"4":34,"5":100,"6":105 ...etc
My approach
I am brand new to developing Gulp tasks and node modules, so definitely keep your eyes out for fundamentally wrong things.
The repository will be the most up to date code, but I'll also try to keep the question up to date with it too.
Gulp-Task File
var gulp = require('gulp'),
    plugins = require('gulp-load-plugins')({camelize: true});
var gulpFlat = require('gulp-flat');
var gulpRename = require('gulp-rename');
var flatten = require('flat');

gulp.task('language:file:flatten', function () {
    return gulp.src(gulp.files.lang_file_src)
        .pipe(gulpFlat())
        .pipe(gulpRename(function (path) {
            path.extname = '.flat.json';
        }))
        .pipe(gulp.dest("App/Languages"));
});
Node module's index.js (A.k.a what I hope becomes gulp-flat)
var through = require('through2');
var gutil = require('gulp-util');
var flatten = require('flat');
var PluginError = gutil.PluginError;

// consts
const PLUGIN_NAME = 'gulp-flat';

// plugin level function (dealing with files)
function flattenGulp() {
    // creating a stream through which each file will pass
    var stream = through.obj(function (file, enc, cb) {
        if (file.isBuffer()) {
            //FIXME: I believe this is the problem line!!
            var flatJSON = new Buffer(JSON.stringify(
                flatten(file.contents)));
            file.contents = flatJSON;
        }
        if (file.isStream()) {
            this.emit('error', new PluginError(PLUGIN_NAME, 'Streams not supported! NYI'));
            return cb();
        }
        // make sure the file goes through the next gulp plugin
        this.push(file);
        // tell the stream engine that we are done with this file
        cb();
    });
    // returning the file stream
    return stream;
}

// exporting the plugin main function
module.exports = flattenGulp;
Resources
https://github.com/gulpjs/gulp/blob/master/docs/writing-a-plugin/README.md
https://github.com/gulpjs/gulp/blob/master/docs/writing-a-plugin/using-buffers.md
https://github.com/substack/stream-handbook
You are right about where the error is. The fix is simple. You just need to parse file.contents, since the flatten function operates on an object, not on a Buffer.
...
var flatJSON = new Buffer(JSON.stringify(
flatten(JSON.parse(file.contents))));
file.contents = flatJSON;
...
That should fix your problem.
And since you are new to the Gulp plugin thing, I hope you don't mind if I make a suggestion. You might want to consider giving your users the option to prettify the JSON output. To do so, just have your main function accept an options object, and then you can do something like this:
...
var flatJson = flatten(JSON.parse(file.contents));
var jsonString = JSON.stringify(flatJson, null, options.pretty ? 2 : null);
file.contents = new Buffer(jsonString);
...
You might find that the options object comes in useful for other things, if you plan to expand on your plugin in future.
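For instance, the plugin's main function could accept and default the options like this (a sketch; the default value shown is an assumption):

function flattenGulp(options) {
    // merge caller-supplied options with defaults
    options = Object.assign({ pretty: false }, options);

    return through.obj(function (file, enc, cb) {
        // ... as before, but pass options.pretty when stringifying
    });
}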
Feel free to have a look at the repository for a plugin I wrote called gulp-transform. I am happy to answer any questions about it. (For example, I could give you some guidance on implementing the streaming-mode version of your plugin if you would like).
Update
I decided to take you up on your invitation for contributions. You can view my fork here and the issue I opened up here. You're welcome to use as much or as little as you like, and in case you really like it, I can always submit a pull request. Hopefully it gives you some ideas at least.
Thank you for getting this project going.
