I've got a CSV file with 3 million+ rows.
The format is supposed to be like so:
date, name , num1, num2
e.g.
"2019-05-07, New york, 10, 3
2019-05-08, New york, 15, 5,
2019-05-09, New york, 12, 6"
and so on...
The problem is every 5,000 rows or so, the "Name" column will have commas in its value.
e.g.
2019-05-09, Denver, Colorado, 10, 9
My script expects 4 columns, so on these rows it reads 5 and fails.
Some values in the name column even have 3 commas.
Note the Name column values are not enclosed in quotes, so that's why it's giving me the error.
Is there a way to detect these extra commas? I don't think there is, so I'm beginning to think this 3m+ row file is impossible to parse.
To parse, you can split into an array, then use shift and pop for the peripheral fields. Finally, you can just join on what's left:
let line = '2019-05-09, Denver, Colorado, 10, 9';
let entries = line.split(',');
let parsed = {
date: entries.shift().trim(),
num2: entries.pop().trim(),
num1: entries.pop().trim(),
name: entries.join(',').trim()
}
console.log(parsed);
So, to answer your question: No, your csv file is not unreadable, FOR NOW. If columns can be appended in the future, and such columns suffer the same issue as "name", you're in trouble. It's probably wiser to push back on the developer of the file and get them to properly quote it. You would not be out of line.
Well, nothing is impossible per se... you can, for example, work backwards and look for the first column (delimited by the first comma), the last two columns (by looking for the last 2 commas) and treat everything in between as the name. But you'll need to implement your own parsing function as I doubt a library would deal with invalid CSV like the one you have.
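A minimal sketch of that idea, assuming every row has exactly one date field at the start and two numeric fields at the end. The function name parseRow is made up for illustration, and rows with fewer than three commas are reported as null so they can be logged and inspected:

```javascript
// Hypothetical helper: locate the first comma and the last two commas,
// and treat everything in between as the name.
function parseRow(line) {
  const first = line.indexOf(',');
  const last = line.lastIndexOf(',');
  const secondLast = line.lastIndexOf(',', last - 1);

  // Fewer than three commas means the row cannot hold all four fields.
  if (first === -1 || secondLast <= first) {
    return null;
  }

  return {
    date: line.slice(0, first).trim(),
    name: line.slice(first + 1, secondLast).trim(),
    num1: line.slice(secondLast + 1, last).trim(),
    num2: line.slice(last + 1).trim(),
  };
}

console.log(parseRow('2019-05-09, Denver, Colorado, 10, 9'));
console.log(parseRow('2019-05-07, New york, 10, 3'));
```

Because it only searches for three comma positions, this avoids building an intermediate array for each of the 3 million rows. It does not handle a stray trailing comma at the end of a row; such rows would need to be cleaned first.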
It's not very efficient, but if the column in question is always cities and states, you could do a find/replace for any states in the file before running your script (e.g. find ", Colorado" and replace it with " Colorado").
I'm trying to take some text in the form:
ONLINE TRANSACTION
PERSON2 , MAILBOXES , VIA ONLINE - PYMT , FP 03/04/21 10 , 34130520496279000N
306.00
ONLINE TRANSACTION
PERSON3 , COM BULDING 1391 , VIA ONLINE - PYMT , FP 10/04/21 10 , 59190636993136000N
7,200.00
AUTOMATED CREDIT
PERSON4 , GLEN 14 , FP 18/04/21 2028 ,
00151146632BBHGBLK
1,675.00
ONLINE TRANSACTION
A COMPANY , INS , VIA ONLINE - PYMT , FP 17/04/21 10 , 13221513368328000N
673.36
The data has been derived from a PDF, via google docs and getTables so there are some random whitespace and new line chars in there which help to split each entry into what is in effect 4 elements:
TRANSACTION TYPE
PAYEE/CREDITOR
PAID IN
PAID OUT
Ideally I want to return a 2d array of each entry with each sub array split for each of the above elements
The regex I've come up with is /(.*)\s\n([\w\W]*?)\n(\n|[0-9,]+\.\d{2}\s\n)(\n|[0-9,]+\.\d{2})/g and seems to work fine using regex101.com, but I can't figure out what to use in JavaScript to get the result. I've tried using split and match but only seem to get one result, and that's not split up.
Thanks in advance for any help
EDIT1: Getting somewhere with this:
let regEx = /([^\s].*)\s\n([\w\W]*?)\n(\n|[0-9,]+\.\d{2}\s\n)(\n|[0-9,]+\.\d{2})/g;
let result = tableText.match(regEx)
let entries = result.map(result => {let retRes = result.split(regEx); retRes.pop(); retRes.shift(); return retRes;} );
Not entirely sure why I'm getting back a 6 element array (hence the pop and shift)
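On the 6 elements: when split is given a regex with capture groups, the captured text is included in the result, so a string that is matched in full comes back as an empty string, the four captures, and another empty string. One way to skip the match-then-split step is matchAll, which yields the capture groups for every match directly. The sketch below uses a simplified, hypothetical sample and pattern (three lines per entry) rather than the regex and data from the post, so the pattern would need adapting to the real whitespace:

```javascript
// Hypothetical, simplified sample: type line, payee line, amount line.
const tableText = [
  'ONLINE TRANSACTION',
  'PERSON2 , MAILBOXES',
  '306.00',
  'AUTOMATED CREDIT',
  'PERSON4 , GLEN 14',
  '1,675.00',
].join('\n');

// Simplified pattern (an assumption, not the regex from the post).
// The m flag lets ^ and $ match at line boundaries.
const entryPattern = /^(.+)\n(.+)\n([0-9,]+\.\d{2})$/gm;

// Each match is an array: [fullMatch, group1, group2, ...].
// slice(1) keeps only the capture groups.
const parsedEntries = Array.from(tableText.matchAll(entryPattern), (m) => m.slice(1));

console.log(parsedEntries);
```

matchAll requires the g flag on the regex and is available in Node 12+ and current browsers.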
I have an array like so:
const testArray = [ 'blah', 'abctesttt', 'atestc', 'testttttt' ]
I would like to split the string once it reaches a certain character count; for example, let's use 10 characters. Also, I would like the output to reorder the items so that each group fits within 10 characters. Please see the expected output below if this doesn't really make sense to you. For the purpose of this example, please assume that no item in the array is longer than 10 characters.
So once the testArray reaches 10 characters, I would like the next item to go under a new variable, maybe? Not sure if that's the best way of doing this.
Something like this maybe? Again this may be very inefficient, if so please feel free to use another method.
const testArray = [ 'blah', 'abctesttt', 'atestc', 'testttttt' ]
if (testArray.join('\n').length >= 10) {
/* split the string into parts and store it under a variable maybe?
console.log((the_splitted_testArray).join('\n')); */
}
Expected output:
"blah
atestc" //instead of using "abctesttt" it would use "atestc" as it's the next element in the array and it also avoids reaching the 10 character limit, if adding "atestc" caused the character limit to go over 10, I would like it to check the next element and so on
"abctesttt" // it can't add the remaining "testttttt" since that would cause the character limit to be reached
"testttttt"
First of all, as you can't create a new variable out of nowhere at run time, you are probably going to use a "parent" array, which then contains the actual strings, each with a length of at most 10.
For the grouping you probably have to design an algorithm yourself. My first idea for an algorithm is something like below. Probably not the best and most efficient way (as the description of "efficient" depends on your personal priorities), but feel free to optimise it yourself :)
Walk through $testArray[], sort all strings into a new two-dimensional array: $stringLength[$messagesWithSameLength[]]. Like array(1=>array('.','a'),2=>array('hi','##',...),...)
Now, always try to get as many strings together as possible. Start with one of the longest strings, calculate the remaining space and get a string suiting best into it. If none fits, start a new group.
Always try to use the space as well as possible.
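A rough sketch of that greedy idea in JavaScript. The function name groupByLength is made up, and it assumes the limit counts only the characters of the strings themselves, not the newline used to join them. It starts each group with the longest remaining string and keeps adding the longest string that still fits, so the groups come out in a different order than in the question's expected output:

```javascript
// Hypothetical helper: greedy grouping, longest strings first.
function groupByLength(strings, limit) {
  // Sort a copy by length, longest first, so the input is left untouched.
  const remaining = [...strings].sort((a, b) => b.length - a.length);
  const groups = [];

  while (remaining.length > 0) {
    // Start a new group with the longest string that is left.
    const group = [remaining.shift()];
    let space = limit - group[0].length;

    // The list is sorted, so the first string that fits is the longest one that fits.
    let index = remaining.findIndex((s) => s.length <= space);
    while (index !== -1) {
      const [next] = remaining.splice(index, 1);
      group.push(next);
      space -= next.length;
      index = remaining.findIndex((s) => s.length <= space);
    }

    groups.push(group);
  }

  return groups;
}

const testArray = ['blah', 'abctesttt', 'atestc', 'testttttt'];
console.log(groupByLength(testArray, 10).map((group) => group.join('\n')));
```

This is a heuristic and is not guaranteed to produce the smallest possible number of groups for every input.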
I need to make a JSON file from a whitespace-delimited txt file.
However:
1. the whitespaces are inconsistent in length and
2. some of the data of each "column" is missing.
A single row looks like this in the txt file:
5653 Phrakhtaes Phrakhtaes 34.56717 33.02724 L LCTY GB 05 0 32 Asia/Nicosia 2014-09
Ultimately, this data will go onto Redis. But without some means of creating keys for each "column", I don't see how I can work with this data.
Please, I could really use the help!
Thanks in advance!
Simply split wherever there are 2 or more spaces between your data:
var line = "5653 Phrakhtaes Phrakhtaes 34.56717 33.02724 L LCTY GB 05 0 32 Asia/Nicosia 2014-09";
console.log(line.split(/ +/));
As for the missing data, I'd recommend you just check the length of the array, and if it is less than the number of expected fields, simply discard the row. The only other option, if there is a variable number of spaces between data points, is to loop through the fields and judge which one may be missing (based on the type of each value: whether it's an integer, uppercase, etc.).
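To get from the split array to JSON with named keys, something along these lines might work. The column names and the sample rows here are made up for illustration (the post does not name its columns), and rows that do not yield the expected number of fields are dropped, as suggested above:

```javascript
// Hypothetical column names; the real file has more columns.
const columns = ['id', 'name', 'lat', 'lng'];

// Hypothetical rows with inconsistent runs of spaces between fields.
const lines = [
  '5653    Phrakhtaes   34.56717  33.02724',
  '5654    34.60000  33.10000', // name is missing, so this row is discarded
];

function parseLine(line, keys) {
  // Split on runs of two or more spaces.
  const fields = line.trim().split(/ {2,}/);
  if (fields.length !== keys.length) {
    return null;
  }
  // Pair each key with the field at the same position.
  return Object.fromEntries(keys.map((key, i) => [key, fields[i]]));
}

const rows = lines
  .map((line) => parseLine(line, columns))
  .filter((row) => row !== null);

console.log(JSON.stringify(rows, null, 2));
```

Each resulting object has one key per column, which should make it straightforward to store as a Redis hash or a JSON string.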
Overview:
I'm not a programmer, but I managed to get some serious coding into a Gsheets file to track my team's projects, so we have multiple-variable dropdown menus and integration with Google Calendar to track project development and all that.
Why I'm at stackoverflow:
I kind of lack the knowledge to start the code from scratch. I usually find spare parts of code through forums on the internet and glue them together, and it usually works surprisingly well, but this time I couldn't find much information.
What I need:
I have 5 cells, and we can put as below,
Date start - Date end - date code* - number** - Priority***
*script to add the date range to gcalendar
** & *** The number is an array that's based on the word written in the priority cell. For example, if the priority is Weekly, then the number column will show 7 in the cell to the left, and so on (monthly = 30, etc.).
So I'd like to know if someone could give a hand with a script that would work (at least in my head) as following:
If I set the priority to weekly, it will show 7 in the number column, and then, every time the "Date end" has passed, it will automatically add 7 days to the "Date start" and "Date end" again.
That way I could keep the projects on a loop where I'll be able to track them constantly.
Thanks in advance for any insights provided,
PS: I've seen some posts about this for SQL, but I have no idea how to take advantage of the proposals that were presented there.
Edit:
Spreadsheet picture
Edit 2:
Spreadsheet with an increment column
Pertinent to the data set and description, you probably do not need any VBA as the increment could be achieved by adding +1 to the reference pointing to previous cell. For example, assuming cell A1 is formatted as Date, enter in cell B1: =A1+1 , then in cell C1: =B1+1 and so on. The result should be as shown below
A B C
9/1/2017 9/2/2017 9/3/2017
It could be further extended with simple logic that displays the incremented value only if the previous cell is not empty, like: =IF(A1,A1+1,"")
In your case, it could be cell F1 containing =IF(E1,E1+1,"").
FYI, the underlying value of Date is just an Integer value (Time is represented as decimal part), so the arithmetic operations could be applied.
A more generic solution would be based on the Excel DATE() worksheet function, as in the sample below (adding 1 month to the date entered in cell A1):
=DATE(YEAR(A1), MONTH(A1)+1, DAY(A1))
In order to implement additional logic, you may consider using the Excel worksheet IF() function. For example, cell B1 could contain:
=A1+IF(C1="week",7,1)
A B C
9/1/2017 9/8/2017 week
so based on the IF() condition it will add either 7 days if C1 contains the word "week" or 1 day otherwise. It could be further extended with nested IF().
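If a script turns out to be needed after all (Google Sheets scripts are written in JavaScript), the same logic could be sketched as a plain function. Everything here is an assumption for illustration: the names PRIORITY_DAYS and rollForward are made up, the priority words and day counts follow the question (weekly = 7, monthly = 30), and reading from and writing to the sheet is left out:

```javascript
// Hypothetical mapping from the priority word to a number of days.
const PRIORITY_DAYS = { weekly: 7, monthly: 30 };
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Moves both dates forward by the priority interval until the end date
// is no longer in the past relative to "today".
function rollForward(start, end, priority, today) {
  const days = PRIORITY_DAYS[priority.toLowerCase()];
  if (!days) {
    throw new Error('Unknown priority: ' + priority);
  }
  const step = days * MS_PER_DAY;
  let startMs = start.getTime();
  let endMs = end.getTime();
  while (endMs < today.getTime()) {
    startMs += step;
    endMs += step;
  }
  return { start: new Date(startMs), end: new Date(endMs) };
}

// Dates are built in UTC so daylight saving changes do not shift them.
const rolled = rollForward(
  new Date(Date.UTC(2017, 8, 1)),  // 2017-09-01
  new Date(Date.UTC(2017, 8, 7)),  // 2017-09-07
  'Weekly',
  new Date(Date.UTC(2017, 8, 20))  // "today" is 2017-09-20
);
console.log(rolled.start.toISOString().slice(0, 10), rolled.end.toISOString().slice(0, 10));
```

Passing "today" in as a parameter, rather than reading the clock inside the function, keeps the function easy to check by hand.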
Hope this will help.
So I know how to format a string or integer like 2000 to 2K, but how do I reverse it?
I want to do something like:
var string = "$2K".replace("/* K with 000 and remove $ symbol in front of 2 */");
How do I start? I am not very good with regular expressions, but I have been taking some more time out to learn them. If you can help, I'd certainly appreciate it. Is it possible to do the same thing for M for millions (adding 000000 at the end) or B for billions (adding 000000000 at the end)?
var string = "$2K".replace(/\$(\d+)K/, "$1000");
will give output as
2000
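For the M and B part of the question, one possible extension is to capture the suffix as well and look up a multiplier. This is only a sketch: the names MULTIPLIERS and parseAbbreviated are made up, and it assumes an optional leading $, a plain decimal number, and at most one K, M, or B suffix:

```javascript
// Hypothetical lookup table for the supported suffixes.
const MULTIPLIERS = { K: 1e3, M: 1e6, B: 1e9 };

// Converts strings such as "$2K" or "1.5M" back to a number.
// Returns NaN when the input does not match the expected shape.
function parseAbbreviated(text) {
  const match = /^\$?(\d+(?:\.\d+)?)([KMB])?$/i.exec(text.trim());
  if (!match) {
    return NaN;
  }
  const suffix = match[2] ? match[2].toUpperCase() : null;
  return Number(match[1]) * (suffix ? MULTIPLIERS[suffix] : 1);
}

console.log(parseAbbreviated('$2K'));
console.log(parseAbbreviated('$1.5M'));
console.log(parseAbbreviated('3B'));
```

Returning a number instead of a padded string also handles values with a decimal part, where simply appending zeros would give the wrong result.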
I'm going to take a different approach to this, as the best way to do this is to change your app so that it does not lose the original numeric information. I recognize that this isn't always possible (for example, if you're scraping formatted values...), but it could be a useful way to think about it for other users with similar questions.
Instead of just storing the numeric values or the display values (and then trying to convert back to the numeric values later on), try to update your app to store both in the same object:
var value = {numeric: 2000, display: '2K'}
console.log(value.numeric); // 2000
console.log(value.display); // 2K
The example here is a bit simplified, but if you pass around your values like this, you don't need to convert back in the first place. It also allows you to have your formatted values change based on locale, currency, or rounding, and you don't lose the precision of your original values.