Given the source text below, I need to match the following excerpt using JavaScript.
My Regex (what I have so far):
subject.match(/\s\([A-Z]{3}\)[\w\s]+\([A-Z]{3}\)[\s\S]*?\([A-Z]\)/)
Excerpt (match I need to get):
Atlanta (ATL) to Charlotte (CLT) — Wed, Dec 17
American Airlines Inc. 658
Dep: 5:50 am
Arr: 6:57 am
1h 7m
Airbus A321
Economy (L)
Source (group of text to get match from, taken from textarea):
Atlanta (ATL) to Cancun (CUN) — Wed, Dec 17
Long layover
Atlanta (ATL) to Charlotte (CLT) — Wed, Dec 17
American Airlines Inc. 658
Dep: 5:50 am
Arr: 6:57 am
1h 7m
Airbus A321
Economy (L)
OPERATED BY US AIRWAYS
Layover in CLT
2h 33m
Charlotte (CLT) to Cancun (CUN) — Wed, Dec 17
American Airlines Inc. 883
Dep: 9:30 am
Arr: 11:26 am
2h 56m
Boeing 767
Economy (L)
Food for Purchase
OPERATED BY US AIRWAYS
Cancun (CUN) to Atlanta (ATL) — Wed, Dec 24
Long layover
Cancun (CUN) to Miami (MIA) — Wed, Dec 24
American Airlines Inc. 1157
Dep: 12:01 pm
Arr: 2:40 pm
1h 39m
Boeing 737
Economy (G)
Layover in MIA
3h 40m
Miami (MIA) to Atlanta (ATL) — Wed, Dec 24
American Airlines Inc. 349
Dep: 6:20 pm
Arr: 8:25 pm
2h 5m
Boeing 737
Economy (G)
My problem: my regex matches from the wrong starting point when certain lines are duplicated in the source text. See screenshot below taken from RegexBuddy's test panel for a better explanation.
How can I change my regex to match starting at the point indicated?
I solved that problem with this:
subject.match(/\s\([A-Z]{3}\)[\w\s]+\([A-Z]{3}\).*\n(?:.{3,}\n)*.*\([A-Z]\)/)
I completed the first line with zero or more non-newline characters, matched exactly one newline, and then matched every following line of three or more characters up to the last line, which also has to be matched explicitly.
PS: There's a non-capturing group in there; it is harmless.
One of the problems with your regex is that it allows anything, including newlines, before the second right parenthesis. That is why the match runs on until it reaches the string "(L)". If you can add the requirement that at least three lines of text are needed for a match, then the following may work for you:
subject.match(/\s\([A-Z]{3}\)[^\r\n]+\([A-Z]{3}\)([^\r\n]+[\r\n]){3,}/);
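Another option is sketched below, assuming the source has no blank lines between entries (as in the text shown in the question) and that body lines never contain a three-letter code in parentheses. It anchors on a header line and refuses to cross a second header line before it reaches the fare-class line such as "(L)". The names `source` and `segment` are made up for illustration, and the dash in the sample data is written as `\u2014`.

```javascript
// Sample text shortened from the question; "\u2014" is the dash in the header lines.
const source = [
  "Atlanta (ATL) to Cancun (CUN) \u2014 Wed, Dec 17",
  "Long layover",
  "Atlanta (ATL) to Charlotte (CLT) \u2014 Wed, Dec 17",
  "American Airlines Inc. 658",
  "Dep: 5:50 am",
  "Arr: 6:57 am",
  "1h 7m",
  "Airbus A321",
  "Economy (L)",
  "OPERATED BY US AIRWAYS",
  "Layover in CLT",
].join("\n");

// 1. ^.*\([A-Z]{3}\)[\w ]+\([A-Z]{3}\).*\n  a header line with two airport codes
// 2. (?:(?!.*\([A-Z]{3}\)).*\n)*?           as few lines as possible, none holding an airport code
// 3. .*\([A-Z]\)$                           the line that ends with the single-letter class
const segment = /^.*\([A-Z]{3}\)[\w ]+\([A-Z]{3}\).*\n(?:(?!.*\([A-Z]{3}\)).*\n)*?.*\([A-Z]\)$/m;

const match = source.match(segment);
console.log(match ? match[0] : "no match");
```

The negative lookahead is what makes the first "Atlanta (ATL) to Cancun (CUN)" line fail as a starting point: the next header line would have to be consumed as a body line, which the lookahead forbids.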
The US Treasury posts results of its auctions on the website linked below:
https://www.treasurydirect.gov/auctions/auction-query/
What I would like to do is select all the columns from the "Show / Hide Columns" button, show 1000 rows using the "Show rows" list box in the bottom right corner, and download a CSV file through the script that runs when the "CSV" button is clicked. I will write a for loop to perform the task 10 times, so that I can download all the data at once every time.
I read that the GET() and POST() functions from the 'httr' package may help with this task, but I have no idea where to start, and any guidance will be appreciated.
Query result in Chrome Developer
My experience with web-related activities in R is limited to the 'rvest' package and scraping HTML data from websites. I tried adding arguments to the POST() function, but it returned nothing.
You can use the API link that I found in the Network section of the developer tools. The code below scrapes the first 1000 rows. Inspect the parameters embedded in the URL and loop over them to gather all observations.
library(tidyverse)
library(httr2)
"https://www.treasurydirect.gov/TA_WS/securities/jqsearch?format=json&callback=jQuery360005083887372902929_1675098529599&filterscount=0&groupscount=0&pagenum=0&pagesize=1000&recordstartindex=0&recordendindex=1000" %>%
request() %>%
req_perform() %>%
resp_body_json(simplifyVector = TRUE) %>%
pluck("securityList") %>%
as_tibble() %>%
type_convert()
Output:
# A tibble: 1,000 × 118
cusip issueDate secur…¹ secur…² maturityDate inter…³ refCp…⁴ refCp…⁵ announcementDate auctionDate aucti…⁶
<chr> <dttm> <chr> <chr> <dttm> <dbl> <dbl> <dbl> <dttm> <dttm> <dbl>
1 912796Y… 2023-02-02 00:00:00 Bill 13-Week 2023-05-04 00:00:00 NA NA NA 2023-01-26 00:00:00 2023-01-30 00:00:00 2023
2 912796Y… 2023-02-02 00:00:00 Bill 26-Week 2023-08-03 00:00:00 NA NA NA 2023-01-26 00:00:00 2023-01-30 00:00:00 2023
3 912796Y… 2023-01-31 00:00:00 Bill 4-Week 2023-02-28 00:00:00 NA NA NA 2023-01-24 00:00:00 2023-01-26 00:00:00 2023
4 912796Z… 2023-01-31 00:00:00 Bill 8-Week 2023-03-28 00:00:00 NA NA NA 2023-01-24 00:00:00 2023-01-26 00:00:00 2023
5 91282CG… 2023-01-31 00:00:00 Note 7-Year 2030-01-31 00:00:00 3.5 NA NA 2023-01-19 00:00:00 2023-01-26 00:00:00 2023
6 912797F… 2023-01-31 00:00:00 Bill 17-Week 2023-05-30 00:00:00 NA NA NA 2023-01-24 00:00:00 2023-01-25 00:00:00 2023
7 91282CG… 2023-01-31 00:00:00 Note 2-Year 2025-01-31 00:00:00 NA NA NA 2023-01-19 00:00:00 2023-01-25 00:00:00 2023
8 91282CG… 2023-01-31 00:00:00 Note 5-Year 2028-01-31 00:00:00 3.5 NA NA 2023-01-19 00:00:00 2023-01-25 00:00:00 2023
9 91282CG… 2023-01-31 00:00:00 Note 2-Year 2025-01-31 00:00:00 4.12 NA NA 2023-01-19 00:00:00 2023-01-24 00:00:00 2023
10 912796Z… 2023-01-26 00:00:00 Bill 52-Week 2024-01-25 00:00:00 NA NA NA 2023-01-19 00:00:00 2023-01-24 00:00:00 2023
# … with 990 more rows, 107 more variables: datedDate <dttm>, accruedInterestPer1000 <dbl>, accruedInterestPer100 <dbl>,
# adjustedAccruedInterestPer1000 <dbl>, adjustedPrice <dbl>, allocationPercentage <dbl>, allocationPercentageDecimals <dbl>,
# announcedCusip <chr>, auctionFormat <chr>, averageMedianDiscountRate <dbl>, averageMedianInvestmentRate <lgl>,
# averageMedianPrice <lgl>, averageMedianDiscountMargin <dbl>, averageMedianYield <dbl>, backDated <chr>, backDatedDate <dttm>,
# bidToCoverRatio <dbl>, callDate <lgl>, callable <chr>, calledDate <lgl>, cashManagementBillCMB <chr>, closingTimeCompetitive <time>,
# closingTimeNoncompetitive <time>, competitiveAccepted <dbl>, competitiveBidDecimals <dbl>, competitiveTendered <dbl>,
# competitiveTendersAccepted <chr>, corpusCusip <chr>, cpiBaseReferencePeriod <chr>, currentlyOutstanding <dbl>, …
# ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
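The looping step can be sketched as follows, written in JavaScript and limited to building the page URLs. It assumes that `recordstartindex` and `recordendindex` advance by `pagesize` for each `pagenum`, which is inferred from the single URL above rather than from any documentation, and it leaves out the `callback` parameter on the assumption that the endpoint still answers without it. `buildPageUrl` is a hypothetical helper name.

```javascript
// Endpoint taken from the URL in the answer above.
const BASE = "https://www.treasurydirect.gov/TA_WS/securities/jqsearch";

// Builds the URL for one page of results (assumed paging scheme, see lead-in).
function buildPageUrl(pageNum, pageSize = 1000) {
  const start = pageNum * pageSize;
  const params = new URLSearchParams({
    format: "json",
    filterscount: "0",
    groupscount: "0",
    pagenum: String(pageNum),
    pagesize: String(pageSize),
    recordstartindex: String(start),
    recordendindex: String(start + pageSize),
  });
  return `${BASE}?${params}`;
}

// Ten pages, matching the ten iterations mentioned in the question.
const pageUrls = Array.from({ length: 10 }, (_, page) => buildPageUrl(page));
console.log(pageUrls[0]);
```

Each URL would then be requested in turn and the `securityList` arrays concatenated, stopping early once a page comes back empty.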
JavaScript allows you to see what time it is in another time zone if you specify the IANA name of that time zone. For example:
let strTime = new Date().toLocaleString("en-US", {timeZone: "America/Chicago"});
console.log(strTime);
Below you can see that IANA provides multiple names within each general timezone:
America/New_York Eastern (most areas)
America/Detroit Eastern - MI (most areas)
America/Kentucky/Louisville Eastern - KY (Louisville area)
America/Kentucky/Monticello Eastern - KY (Wayne)
America/Indiana/Indianapolis Eastern - IN (most areas)
America/Indiana/Vincennes Eastern - IN (Da, Du, K, Mn)
America/Indiana/Winamac Eastern - IN (Pulaski)
America/Indiana/Marengo Eastern - IN (Crawford)
America/Indiana/Petersburg Eastern - IN (Pike)
America/Indiana/Vevay Eastern - IN (Switzerland)
America/Chicago Central (most areas)
America/Indiana/Tell_City Central - IN (Perry)
America/Indiana/Knox Central - IN (Starke)
America/Menominee Central - MI (Wisconsin border)
America/North_Dakota/Center Central - ND (Oliver)
America/North_Dakota/New_Salem Central - ND (Morton rural)
America/North_Dakota/Beulah Central - ND (Mercer)
America/Denver Mountain (most areas)
America/Boise Mountain - ID (south); OR (east)
America/Phoenix MST - Arizona (except Navajo)
America/Los_Angeles Pacific
America/Anchorage Alaska (most areas)
America/Juneau Alaska - Juneau area
America/Sitka Alaska - Sitka area
America/Metlakatla Alaska - Annette Island
America/Yakutat Alaska - Yakutat
America/Nome Alaska (west)
America/Adak Aleutian Islands
Pacific/Honolulu Hawaii
Why is that necessary?
For example, both America/Detroit and America/New_York are (generally) in the Eastern Time Zone. Why don't these two locations share a single IANA timezone name?
Are there occasions during the year when the time in New York is different from that in Detroit?
If not, then why allow more timezone names than the exact number of variances?
I'll use your example:
For example, both America/Detroit and America/New_York are in the Eastern Time Zone. Why don't these two locations share a single timezone name?
In the TZDB, the Zone entry for America/New_York looks like this:
# Zone NAME GMTOFF RULES FORMAT [UNTIL]
Zone America/New_York -4:56:02 - LMT 1883 Nov 18 12:03:58
-5:00 US E%sT 1920
-5:00 NYC E%sT 1942
-5:00 US E%sT 1946
-5:00 NYC E%sT 1967
-5:00 US E%sT
While the Zone entry for America/Detroit looks like this:
# Zone NAME GMTOFF RULES FORMAT [UNTIL]
Zone America/Detroit -5:32:11 - LMT 1905
-6:00 - CST 1915 May 15 2:00
-5:00 - EST 1942
-5:00 US E%sT 1946
-5:00 Detroit E%sT 1973
-5:00 US E%sT 1975
-5:00 - EST 1975 Apr 27 2:00
-5:00 US E%sT
To fully decipher this, one also needs the Rule entries for US, NYC, and Detroit (which I won't copy/paste here, but you can follow the links).
As you can see, Detroit has had variations from New York, the last of which was in 1975 when Detroit started daylight saving time slightly later than most of the Eastern time zone (Apr 27 shown here vs Feb 23rd given by Rule US).
Since then, however, they have been the same. The TZDB guidelines give a separate zone to each region whose clocks have disagreed with every other zone at some point since 1970. Detroit deviated from New York in 1973 and 1975, so the two require separate zone identifiers.
One can see this difference in JavaScript like so:
var d = new Date("1975-03-01T00:00:00.000Z"); // Midnight UTC on March 1st
d.toLocaleString("en-US", {timeZone: "America/New_York"}) //=> "2/28/1975, 8:00:00 PM"
d.toLocaleString("en-US", {timeZone: "America/Detroit"}) //=> "2/28/1975, 7:00:00 PM"
Of course, if in your application, you never deal with dates going back that far, then you can just use America/New_York to represent the US Eastern time zone, and omit America/Detroit (and a few others) - but this is entirely your decision to make.
You may also be interested in reading the Theory file within the tzdb itself, which explains the concepts and principles of the time zone database in a lot more detail.
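If you want to check whether two identifiers are interchangeable for the date range your application cares about, a rough probe like the one below can help. It is only a sketch: it samples noon UTC on the 1st and 15th of each month, so it can miss short-lived differences, and its results depend on the time zone data bundled with your JavaScript runtime. `firstDisagreement` is a made-up helper name.

```javascript
// Formats a date as wall-clock time in the given IANA zone.
function formatIn(date, timeZone) {
  return date.toLocaleString("en-US", { timeZone });
}

// Returns the first sampled Date on which the two zones show different
// wall-clock times, or null if every sample agrees.
function firstDisagreement(zoneA, zoneB, fromYear, toYear) {
  for (let year = fromYear; year <= toYear; year++) {
    for (let month = 0; month < 12; month++) {
      for (const day of [1, 15]) {
        const sample = new Date(Date.UTC(year, month, day, 12));
        if (formatIn(sample, zoneA) !== formatIn(sample, zoneB)) {
          return sample;
        }
      }
    }
  }
  return null;
}

console.log(firstDisagreement("America/New_York", "America/Detroit", 1975, 1975));
console.log(firstDisagreement("America/New_York", "America/Detroit", 2015, 2020));
```

A `null` result for your range suggests, but does not prove, that the two zones can be treated as one in that range.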
I'm looking for someone to point me in the right direction on a small project I'm working on in JavaScript. The idea is that the user pastes some raw data (copied from a website) into a form box or other input on day one, and then does the same on day two, and so on.
What I would like the JS to do is compare the two sets of data and return any changes. For example:
Day One Raw Data: (copy and pasted from a website)
Welcome
Home
Contact
Info
A
Andy 29 5'9 $3000 low
B
Betty 19 4'8 $2800 low
Bella 23 5'2 £4300 medium
C
Charles 43 5'3 $5000 high
Your local date/time is Thu Jan 11 2018 20:58:14 GMT+0000 (GMT Standard Time).
Current server date/time is 11-01-2018 | 21:58
Logout
Day Two Raw Data: (copy and pasted from a website)
Welcome
Home
Contact
Info
A
Andy 29 5'9 $3200 low
B
Betty 19 4'8 $2900 low
Bella 23 5'2 £3900 high
C
Charles 43 5'3 $7000 high
Carrie 18 5'8 $1000 medium
Your local date/time is Thu Jan 11 2018 20:58:14 GMT+0000 (GMT Standard Time).
Current server date/time is 11-01-2018 | 21:58
Logout
The only data I'm looking to compare is the name + information lines,
Andy 29 5'9 $3200 low
for example. The rest of the raw data is just noise that should always be the same: the links at the top of the page, the footer at the bottom, and the A, B, C etc. entries, which are alphabet links.
What i would like the outcome to be is something like the following:
Results: (printed to page)
Andy 29 5'9 $3200 low --- (+ $200)
Betty 19 4'8 $2900 low --- (+ $100)
Bella 23 5'2 £3900 high --- (- $400 medium)
Charles 43 5'3 $7000 high --- (+ $2000)
Carrie 18 5'8 $1000 medium --- (**New Entry**)
How the results are displayed and the actual figures are irrelevant. I'm looking for suggestions for methods of achieving this kind of data comparison, where I ignore certain parts of the raw input and compare only the parts that matter, then report new entries, removed entries, and changes to entries present on both days. The only thing that will ever change in the raw data is the list of people; the headers, footers and alphabet tags will always be there.
Hopefully I've explained this well enough to be pointed in the right direction. Thanks in advance for any help.
OK, this is messy (it's late), but I think it will do what you want.
There is a lot of room for cleaning this up, so take it as a steer in the right direction. The key is that you need a regex to analyse the strings. After that there's a fair amount of manipulation to compare the results.
<script>
var dayOne = `Welcome
Home
Contact
Info
A
Andy 29 5'9 $3000 low
B
Betty 19 4'8 $2800 low
Bella 23 5'2 £4300 medium
C
Charles 43 5'3 $5000 high
Your local date/time is Thu Jan 11 2018 20:58:14 GMT+0000 (GMT Standard Time).
Current server date/time is 11-01-2018 | 21:58
Logout `;
var dayTwo = `
Welcome
Home
Contact
Info
A
Andy 29 5'9 $3200 low
B
Betty 19 4'8 $2900 low
Bella 23 5'2 £3900 high
C
Charles 43 5'3 $7000 high
Carrie 18 5'8 $1000 medium
Your local date/time is Thu Jan 11 2018 20:58:14 GMT+0000 (GMT Standard Time).
Current server date/time is 11-01-2018 | 21:58
Logout `;
/**
 * Converts an array to an object with keys for later comparison
 */
function convertNamesToKeys(arr){
    var obj = {}
    for(var i=0, j=arr.length; i<j; i+=1){
        var name = arr[i].substring(0,arr[i].indexOf(' '));
        obj[name] = arr[i];
    }
    return obj;
}
/**
 * Count object length
 */
function getObjectLength(obj) {
    var length = 0;
    for( var key in obj ) {
        if( obj.hasOwnProperty(key) ) {
            length+=1;
        }
    }
    return length;
};
/**
 * Compares two objects for differences in values
 * retains objects with different keys
 */
function compareObjectValue(primaryObject, secondaryObject){
    for(var name in primaryObject){
        if( primaryObject.hasOwnProperty(name)
            && name in secondaryObject){
            if(primaryObject[name] === secondaryObject[name]){
                delete primaryObject[name];
            }
        }
    }
    //This is your final array which should contain just unique values between the two days
    console.log(primaryObject);
}
//split the large string into lines for manageability and simplicity of regex
var dayOneArray = dayOne.match(/[^\r\n]+/g);
var dayTwoArray = dayTwo.match(/[^\r\n]+/g);
//discard any lines which are noise
var regex = /^[a-z\s0-9']+(\$|£)[0-9\sa-z]+$/i
var dayOneFiltered = dayOneArray.filter(line => regex.test(line));
var dayTwoFiltered = dayTwoArray.filter(line => regex.test(line));
//convert the arrays into objects using name as key for easy comparison
var dayOneConverted = convertNamesToKeys(dayOneFiltered);
var dayTwoConverted = convertNamesToKeys(dayTwoFiltered);
//Determine which of the two objects is the larger and loop that one
//We will unset keys which have values that are the same and leave keys
//in the larger array which must be unique - not sure if you want that?
if( getObjectLength(dayOneConverted) > getObjectLength(dayTwoConverted)){
    compareObjectValue(dayOneConverted, dayTwoConverted)
}
else {
    compareObjectValue(dayTwoConverted, dayOneConverted);
}
</script>
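As a possible next step, the sketch below also reports removed entries and the size of each change. It assumes every data line has the shape name, age, height, currency symbol plus amount, level, as in the sample data. `ENTRY`, `parseEntries` and `diffDays` are illustrative names and not part of the script above.

```javascript
// One data line, e.g. "Andy 29 5'9 $3000 low" (layout assumed from the sample data).
const ENTRY = /^(\S+)\s+(\d+)\s+(\S+)\s+([$£])(\d+)\s+(\w+)$/;

// Keeps only the lines that look like entries, keyed by name.
function parseEntries(raw) {
  const entries = new Map();
  for (const line of raw.split(/\r?\n/)) {
    const m = ENTRY.exec(line.trim());
    if (m) {
      entries.set(m[1], { line: m[0], amount: Number(m[5]), level: m[6] });
    }
  }
  return entries;
}

// Compares two raw pastes and lists added, removed and changed entries.
function diffDays(rawOne, rawTwo) {
  const before = parseEntries(rawOne);
  const after = parseEntries(rawTwo);
  const report = { added: [], removed: [], changed: [] };
  for (const [name, entry] of after) {
    const old = before.get(name);
    if (!old) {
      report.added.push(entry.line);
    } else if (old.line !== entry.line) {
      report.changed.push({
        line: entry.line,
        amountDelta: entry.amount - old.amount,
        // null when the level is unchanged, otherwise the previous level
        levelWas: old.level === entry.level ? null : old.level,
      });
    }
  }
  for (const [name, entry] of before) {
    if (!after.has(name)) report.removed.push(entry.line);
  }
  return report;
}
```

Calling `diffDays(dayOne, dayTwo)` with the two strings from the script above returns an object you can format however you like. Keying by first name means two people with the same name would overwrite each other, so a real version would need a better key.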
I have time series data that associates a measurement with a time. Suppose that it is an app where a user enters their height whenever they want. Based on past behavior, I not only want to predict what their next height measurement will be, but I also want to predict when that measurement will be entered.
Sample data for a single person:
Date | Measurement
-------------|------------
Nov 8, 2014 | 1.42 m
Nov 23, 2014 | 1.43 m
Mar 8, 2015 | 1.48 m
Jun 15, 2015 | 1.52 m
Dec 18, 2015 | 1.52 m
Mar 1, 2016 | 1.59 m
Nov 8, 2016 | 1.60 m
What I want to predict is the next data point in this series. For example, it might be (Dec 8, 2016, 1.61 m).
My initial thoughts have been to make two separate models, one that is simply the time data with x values being indices. For example
0 | Nov 8, 2014
1 | Nov 23, 2014
2 | Mar 8, 2015
3 | Jun 15, 2015
4 | Dec 18, 2015
5 | Mar 1, 2016
6 | Nov 8, 2016
(where the dates have been converted to minutes since 1970 or something).
Use this model to predict the next time point, then the original model to predict, at that time point, what will be the measurement.
As for the algorithm, I was thinking of using a Kalman filter for both models.
My concern is that I feel like I am missing something, or possibly overcomplicating the problem. Does anyone have an idea for an alternative solution?
I will be implementing this in JavaScript, hopefully with no external libraries.
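The two-model idea can be tried out in plain JavaScript with a much simpler stand-in for the Kalman filter. The sketch below uses ordinary least-squares lines, one mapping entry index to date and one mapping date to height. This is a baseline under the assumption that both trends are roughly linear, not a recommendation over a Kalman filter, and `fitLine` and `predictNext` are illustrative names.

```javascript
// Ordinary least-squares fit of y = intercept + slope * x; returns a predictor.
function fitLine(xs, ys) {
  const n = xs.length;
  const meanX = xs.reduce((sum, x) => sum + x, 0) / n;
  const meanY = ys.reduce((sum, y) => sum + y, 0) / n;
  let sxy = 0;
  let sxx = 0;
  for (let i = 0; i < n; i++) {
    sxy += (xs[i] - meanX) * (ys[i] - meanY);
    sxx += (xs[i] - meanX) ** 2;
  }
  const slope = sxy / sxx;
  const intercept = meanY - slope * meanX;
  return (x) => intercept + slope * x;
}

// Sample data from the question, dates in ISO form.
const samples = [
  ["2014-11-08", 1.42], ["2014-11-23", 1.43], ["2015-03-08", 1.48],
  ["2015-06-15", 1.52], ["2015-12-18", 1.52], ["2016-03-01", 1.59],
  ["2016-11-08", 1.60],
];

const DAY_MS = 24 * 60 * 60 * 1000;
const days = samples.map(([date]) => Date.parse(date + "T00:00:00Z") / DAY_MS);
const heights = samples.map(([, height]) => height);
const indices = samples.map((_, i) => i);

const predictDay = fitLine(indices, days);    // model 1: index -> time
const predictHeight = fitLine(days, heights); // model 2: time -> measurement

// Predicts when the next entry will arrive and what it will be.
function predictNext() {
  const nextDay = predictDay(samples.length);
  return { date: new Date(nextDay * DAY_MS), height: predictHeight(nextDay) };
}

console.log(predictNext());
```

Working in days rather than minutes since 1970 keeps the numbers small. Either predictor can later be swapped for a Kalman filter without changing `predictNext`.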
How can I work out local time using the IANA (Internet Assigned Numbers Authority) data files?
On this website I found the following data:
https://www.timeanddate.com/time/zone/uk/london
2015  Sun, 29 Mar, 01:00  GMT → BST  +1 hour (DST start)  UTC+1h
      Sun, 25 Oct, 02:00  BST → GMT  -1 hour (DST end)    UTC
2016  Sun, 27 Mar, 01:00  GMT → BST  +1 hour (DST start)  UTC+1h
      Sun, 30 Oct, 02:00  BST → GMT  -1 hour (DST end)    UTC
2017  Sun, 26 Mar, 01:00  GMT → BST  +1 hour (DST start)  UTC+1h
      Sun, 29 Oct, 02:00  BST → GMT  -1 hour (DST end)    UTC
2018  Sun, 25 Mar, 01:00  GMT → BST  +1 hour (DST start)  UTC+1h
      Sun, 28 Oct, 02:00  BST → GMT  -1 hour (DST end)    UTC
2019  Sun, 31 Mar, 01:00  GMT → BST  +1 hour (DST start)  UTC+1h
      Sun, 27 Oct, 02:00  BST → GMT  -1 hour (DST end)    UTC
As you can see, the Europe/London time change falls on a different date every year: in 2018 it applies from 25 March, in 2019 from 31 March, and so on.
But I cannot find this information in the IANA data distribution.
https://www.iana.org/time-zones
From tzdata2016h.tar.gz, extracted file europe:
# Zone NAME GMTOFF RULES FORMAT [UNTIL]
Zone Europe/London -0:01:15 - LMT 1847 Dec 1 0:00s
0:00 GB-Eire %s 1968 Oct 27
1:00 - BST 1971 Oct 31 2:00u
0:00 GB-Eire %s 1996
0:00 EU GMT/BST
Maybe I do not understand how to use the IANA data. How do I extract this information from the IANA data file?
You first look at the Zone entry for Europe/London:
# Zone NAME GMTOFF RULES FORMAT [UNTIL]
Zone Europe/London -0:01:15 - LMT 1847 Dec 1 0:00s
0:00 GB-Eire %s 1968 Oct 27
1:00 - BST 1971 Oct 31 2:00u
0:00 GB-Eire %s 1996
0:00 EU GMT/BST
Each row ends in an [UNTIL] date, except for the last one. When there is no [UNTIL] date, infinitely far in the future is implied. So London is currently governed by the last row in the table above, which says that the UTC offset is 0:00, that the daylight saving rules are governed by the Rule EU, and that the abbreviations to use are GMT for "standard" time and BST for daylight saving time.
Now go find the Rule EU:
# Rule NAME FROM TO TYPE IN ON AT SAVE LETTER/S
Rule EU 1977 1980 - Apr Sun>=1 1:00u 1:00 S
Rule EU 1977 only - Sep lastSun 1:00u 0 -
Rule EU 1978 only - Oct 1 1:00u 0 -
Rule EU 1979 1995 - Sep lastSun 1:00u 0 -
Rule EU 1981 max - Mar lastSun 1:00u 1:00 S
Rule EU 1996 max - Oct lastSun 1:00u 0 -
You're looking for the Rules that are currently in effect. There are two of them at the moment, the last two rows.
The second to last row says that for each year, starting with 1981, on the last Sunday of March at 01:00 UTC, 1:00 is added to the UTC offset (specified by the Zone). The last column which contains S is not used in this example. But if the abbreviation specified in the Zone contained a %s, then this letter would be substituted in for the %s.
The last row says that for each year, starting with 1996, on the last Sunday of October at 01:00 UTC, 0:00 is added to the UTC offset (specified by the Zone).
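The two rows currently in effect can be turned into dates with a few lines of JavaScript. This is a minimal sketch that hard-codes only those two rules (Mar lastSun and Oct lastSun, both at 1:00u), so it is valid only for years from 1996 onward. It does not parse the tzdata files, and the function names are made up for illustration.

```javascript
// Returns the last Sunday of the given month (0-based) at 01:00 UTC.
function lastSundayAt0100Utc(year, monthIndex) {
  // Day 0 of the following month is the last day of this month.
  const date = new Date(Date.UTC(year, monthIndex + 1, 0, 1, 0, 0));
  // getUTCDay() is 0 for Sunday, so stepping back that many days lands on a Sunday.
  date.setUTCDate(date.getUTCDate() - date.getUTCDay());
  return date;
}

// The two Rule EU rows currently in effect.
function euTransitions(year) {
  return {
    dstStart: lastSundayAt0100Utc(year, 2), // Mar lastSun 1:00u, SAVE 1:00
    dstEnd: lastSundayAt0100Utc(year, 9),   // Oct lastSun 1:00u, SAVE 0
  };
}

// Europe/London offset from UTC in minutes: GMTOFF 0:00 plus the SAVE in effect.
function londonOffsetMinutes(date) {
  const { dstStart, dstEnd } = euTransitions(date.getUTCFullYear());
  return date >= dstStart && date < dstEnd ? 60 : 0;
}

console.log(euTransitions(2018).dstStart.toISOString());
console.log(euTransitions(2019).dstEnd.toISOString());
```

The results line up with the timeanddate.com table in the question: the autumn change is listed there as 02:00 BST, which is the same instant as 01:00 UTC.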
Matt Johnson adds in the comments below:
You may also be interested in iana.org/time-zones/repository/tz-how-to.html
I thought this was such an important comment that it should be in the answer for higher visibility. Thanks Matt!