get last 6 characters of a match (javascript regex) - javascript

I am trying to parse txt files with JS + regex, and my problem is as follows:
I have multiple txt files, and inside each one I need to search for an ID made of 6 characters (numbers + letters).
This is the string inside one of those files:
**IFCPROPERTYSINGLEVALUE('codice sito',$,IFCTEXT('I013FR'),$);**
I need to extract the I013FR only, and so far the closest js-regex I wrote is:
(codice sito',\$,IFCTEXT\('[a-zA-Z\d]{6})
using that, I get in return:
codice sito',$,IFCTEXT('I372TO
Now I need to "add something" at the end of the regex in order to take only the last 6 characters of the match.
Is that possible? Am I on the right track, or is there a better way to do that?

To extract just that sequence of characters, put it in parentheses. This construct is called a "capturing group".
/codice sito',\$,IFCTEXT\('([a-zA-Z\d]{6})/g
Then you can get your id using the RegExp.prototype.exec() method; the captured group is at index 1 of the result.
const str = "**IFCPROPERTYSINGLEVALUE('codice sito',$,IFCTEXT('I013FR'),$);**";
const regex = /codice sito',\$,IFCTEXT\('([a-zA-Z\d]{6})/g;
const id = regex.exec(str)[1];
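If some files might not contain the property at all, exec() returns null and indexing [1] throws. Here is a minimal sketch of a guarded version. The helper name extractSiteId is hypothetical, and the closing quote in the pattern is an added assumption that the id is exactly six characters long:

```javascript
// Hypothetical helper (not from the original answer): returns the 6-character
// id that follows "codice sito", or null when the text has no such property.
function extractSiteId(text) {
  // No "g" flag: one match is enough, and a global regex would keep
  // lastIndex state between calls.
  const regex = /codice sito',\$,IFCTEXT\('([a-zA-Z\d]{6})'/;
  const match = regex.exec(text);
  return match ? match[1] : null;
}

console.log(extractSiteId("IFCPROPERTYSINGLEVALUE('codice sito',$,IFCTEXT('I013FR'),$);")); // I013FR
console.log(extractSiteId("no id here")); // null
```

Returning null instead of throwing lets the caller decide how to treat files that have no id.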

Related

Extracting a complicated part of the string with plain Javascript

I have a following string:
Text
Using JavaScript, I want to extract 'pl' or 'pl_company_com' from this string.
There are a few variables:
jan_kowalski is a name and surname; it can change, and sometimes it even has 3 elements
the country code (in this example 'pl') will change to others such as en / de / fr (this is the part of the string I want to get)
the rest of the string remains the same for every case (beginning + everything after, starting with _company_com ...)
P.S. I tried to do it with split, but my knowledge of JS is very basic and I can't get what I want. Please help.
An alternative to Randy Casburn's solution using regex
let out = new URL('https://my.domain.com/personal/jan_kowalski_pl_company_com/Documents/Forms/All.aspx').href.match('.*_(.*_company_com)')[1];
console.log(out);
Or if you want to just get that string with those country codes you specified
let out = new URL('https://my.domain.com/personal/jan_kowalski_pl_company_com/Documents/Forms/All.aspx').href.match('.*_((en|de|fr|pl)_company_com)')[1];
console.log(out);
A proof of concept that this solution also works for other combinations
let urls = [
new URL('https://my.domain.com/personal/jan_kowalski_pl_company_com/Documents/Forms/All.aspx'),
new URL('https://my.domain.com/personal/firstname_middlename_lastname_pl_company_com/Documents/Forms/All.aspx')
]
urls.forEach(url => console.log(url.href.match('.*_(en|de|fr|pl).*')[1]))
I have been very successful with this kind of problem before using regular expressions:
var string = 'Text';
var regExp = /([\w]{2})_company_com/;
var find = string.match(regExp);
console.log(find); // array with found matches
console.log(find[1]); // first group of regexp = country code
First comes the given string. Second comes a regular expression, which is delimited by a slash at the beginning and a slash at the end. Regular expressions are mostly used for string searches (you can even search and replace complicated text with them in all major editors, which can be VERY useful).
In this case it matches exactly two word characters, [\w]{2}, followed directly by _company_com (\w denotes a word character, that is, a letter, digit, or underscore; the [] group the wanted character types, here only word characters; and the {} give the number of characters to match). To find the wanted part, call string.match(regExp). For a regular expression without the g flag, it returns an array with the whole matched string followed by all capture groups in the regExp (which are denoted by ()), or null if nothing matches. So in this case you get the country code with find[1], which is the first and only capture group of the regular expression.
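The explanation above can be tried end to end with a concrete input. This is only a sketch: the sample URL is borrowed from the other answer (the original input string is not shown), and the variable names are illustrative.

```javascript
// Sample URL borrowed from the other answer; an assumption, since the
// original input string is not shown.
const url = 'https://my.domain.com/personal/jan_kowalski_pl_company_com/Documents/Forms/All.aspx';

// Two word characters immediately before "_company_com"; the parentheses capture them.
const countryCodeRegExp = /(\w{2})_company_com/;

const found = url.match(countryCodeRegExp);
// match() returns null when nothing matches, so guard before indexing.
const countryCode = found ? found[1] : null;

console.log(found && found[0]); // pl_company_com
console.log(countryCode);       // pl
```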

Parsing file names with javascript

I have file names like the following:
SEM_VSE_SKINSHARPS_555001881_181002_1559_37072093.DAT
SEM_VSE_SECURITY_555001881_181002_1559_37072093.DAT
SEM_VSE_MEDICALCONDEMERGENCIES_555001881_181002_1559_37072093.DAT
SEM_REASONS_555001881_181002_1414_37072093.DAT
SEM_PSE_NPI_SECURITY_555001881_181002_1412_37072093.DAT
and I need to strip the numbers from the end. This will happen daily, and the numbers will change. I HAVE to do it in JavaScript. The problem is, I know really nothing about JavaScript. I've looked at both split and slice and I'm not sure either will work. These files come from a government entity, which means the file name will probably not be consistent.
expected output:
SEM_VSE_SKINSHARPS
SEM_VSE_SECURITY
SEM_VSE_MEDICALCONDEMERGENCIES
SEM_REASONS
SEM_PSE_NPI_SECURITY
Any help is greatly appreciated.
This is a good use case for regular expressions. For example,
var oldFileName = 'SEM_VSE_SKINSHARPS_555001881_181002_1559_37072093.DAT',
newFileName;
newFileName = oldFileName.replace(/[_0-9]+(?=\.DAT$)/, ''); // SEM_VSE_SKINSHARPS.DAT
This says to replace as many characters as it can from the set _ and 0-9, with the requirement that the replaced portion must be followed by .DAT and the end of the string.
If you want to strip the .DAT as well, use /[_0-9]+\.DAT$/ as the regular expression instead of the one above.
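To apply the second variant to a whole batch, a small wrapper is enough. A minimal sketch follows; the function name stripTrailingNumbers and the samples array are illustrative, and the dot is escaped so it matches only a literal ".":

```javascript
// Hypothetical helper: removes the trailing run of underscores and digits
// that sits directly before ".DAT", together with the extension itself.
function stripTrailingNumbers(fileName) {
  // \. matches a literal dot; $ anchors the match to the end of the name.
  return fileName.replace(/[_0-9]+\.DAT$/, '');
}

const samples = [
  'SEM_VSE_SKINSHARPS_555001881_181002_1559_37072093.DAT',
  'SEM_REASONS_555001881_181002_1414_37072093.DAT',
  'SEM_PSE_NPI_SECURITY_555001881_181002_1412_37072093.DAT',
];

console.log(samples.map(stripTrailingNumbers).join('\n'));
// SEM_VSE_SKINSHARPS
// SEM_REASONS
// SEM_PSE_NPI_SECURITY
```

A name that does not end in digits plus .DAT is returned unchanged, which makes unexpected inputs easy to spot.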
If all the files end in a three-letter extension such as .DAT and follow the given pattern, this might also work:
var filename = "SEM_VSE_SKINSHARPS_555001881_181002_1559_37072093.DAT"
filename.slice(0,-4).split("_").filter(x => !+x).join("_")
results in:
"SEM_VSE_SKINSHARPS"
This is how it works:
drop the last 4 chars (.DAT)
split by _
filter out the numbers (+x converts a part to a number; non-numeric parts become NaN, which is falsy, so !+x keeps them)
join what is remaining with another _
You can also create a function out of this solution (or the other ones) and use it to process all the files provided they are in an array:
var fileTrimmer = filename => filename.slice(0,-4).split("_").filter(x => !+x).join("_")
var result = array_of_filenames.map(fileTrimmer)
Below is a solution that assumes you have your file name strings stored in an array. It creates a new array of formatted file names by calling Array.prototype.map on the original array. The map callback first grabs the extension so that it can be appended to the file name later. Next, it splits the fileName string into an array of parts on the _ character. Finally, the filter callback returns true for a part that contains no digits, which keeps that part in the new array; for a part that contains a digit it returns false, which drops it. Note that the result keeps the .DAT extension.
var fileNames = ['SEM_VSE_SKINSHARPS_555001881_181002_1559_37072093.DAT', 'SEM_VSE_SECURITY_555001881_181002_1559_37072093.DAT', 'SEM_VSE_MEDICALCONDEMERGENCIES_555001881_181002_1559_37072093.DAT', 'SEM_REASONS_555001881_181002_1414_37072093.DAT', 'SEM_PSE_NPI_SECURITY_555001881_181002_1412_37072093.DAT'];
var formattedFileNames = fileNames.map(fileName => {
var ext = fileName.substring(fileName.indexOf('.'), fileName.length);
var parts = fileName.split('_');
return parts.filter(part => !part.match(/[0-9]/g)).join('_') + ext;
});
console.log(formattedFileNames);

How would I write a Regular Expression to capture the value between Last Slash and Query String?

Problem:
Extract image file name from CDN address similar to the following:
https://cdnstorage.api.com/v0/b/my-app.com/o/photo%2FB%_2.jpeg?alt=media&token=4e32-a1a2-c48e6c91a2ba
Two-stage Solution:
I am using two regular expressions to retrieve the file name:
var postLastSlashRegEx = /[^\/]+$/,
preQueryRegEx = /^([^?]+)/;
var fileFromURL = urlString.match(postLastSlashRegEx)[0].match(preQueryRegEx)[0];
// fileFromURL = "photo%2FB%_2.jpeg"
Question:
Is there a way I can combine both regular expressions?
I've tried using capture groups, but haven't been able to produce a working solution.
From my comment
You can use a lookahead to find the "?" and use [^/] to match any non-slash characters.
/[^/]+(?=\?)/
To remove the dependency on the URL containing a "?", you can make the lookahead match either a question mark or the end of the string (represented by $), but make sure the quantifier on the character class is non-greedy; a greedy one would run past the "?" to the end of the string.
/[^/]+?(?=\?|$)/
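As a quick check of the second pattern, here is a sketch that wraps it in a function and tries it with and without a query string. The name fileNameFromUrl and the second sample URL are made up for illustration:

```javascript
// Hypothetical helper: the text between the last slash and the query string,
// or up to the end of the string when there is no "?".
function fileNameFromUrl(urlString) {
  // Lazy [^/]+? stops at the first position followed by "?" or the end of input.
  const match = urlString.match(/[^/]+?(?=\?|$)/);
  return match ? match[0] : null;
}

const cdnUrl = 'https://cdnstorage.api.com/v0/b/my-app.com/o/photo%2FB%_2.jpeg?alt=media&token=4e32-a1a2-c48e6c91a2ba';
console.log(fileNameFromUrl(cdnUrl));                          // photo%2FB%_2.jpeg
console.log(fileNameFromUrl('https://example.com/img/a.png')); // a.png
```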
You don't have to use regex; you can just use split and substring.
var str = "https://cdnstorage.api.com/v0/b/my-app.com/o/photo%2FB%_2.jpeg?alt=media&token=4e32-a1a2-c48e6c91a2ba".split("?")[0];
var fileName = str.substring(str.lastIndexOf('/') + 1);
but if regex is important to you, then:
str.match(/[^?]*\/([^?]+)/)[1]
Using the substring method, and assuming the URL always contains a "?", the code would look like the following:
var fileFromURL = urlString.substring(urlString.lastIndexOf('/') + 1, urlString.lastIndexOf('?'))
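When the URL has no "?", lastIndexOf('?') returns -1, and substring() then swaps its arguments and returns the start of the URL instead of the file name. A sketch of a more tolerant variant, using only string methods (the helper name fileNameBySubstring is hypothetical):

```javascript
// Hypothetical helper: like the substring() one-liner, but it also handles
// URLs that have no query string.
function fileNameBySubstring(urlString) {
  const queryStart = urlString.indexOf('?');
  // Cut the query string off first, so a "/" inside it is never taken for the last slash.
  const withoutQuery = queryStart === -1 ? urlString : urlString.slice(0, queryStart);
  return withoutQuery.slice(withoutQuery.lastIndexOf('/') + 1);
}

console.log(fileNameBySubstring('https://cdnstorage.api.com/v0/b/my-app.com/o/photo%2FB%_2.jpeg?alt=media&token=4e32-a1a2-c48e6c91a2ba')); // photo%2FB%_2.jpeg
console.log(fileNameBySubstring('https://example.com/img/a.png')); // a.png
```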

Regex one-liner for splitting string at nth character where n is a variable length

I've found a few similar questions, but none of them are clean one-liners, which I feel should be possible. I want to split a string at the last instance of a specific character (in my case .).
var img = $('body').attr('data-bg-img-url'); // the string http://sub.foo.com/img/my-img.jpg
var finalChar = img.split( img.split(/[.]+/).length-1 ); // returns int 3 in above string example
var dynamicRegex = '/[.$`finalChar`]/';
I know I'm breaking some rules here, wondering if someone smarter than me knows the correct way to put that together and compress it?
EDIT - The end goal here is to split and store http://sub.foo.com/img/my-img and .jpg as separate strings.
In regex, .* is greedy, meaning it will match as much as possible. Therefore, if you want to match up to the last ., you could do:
/^.*\./
And from the looks of it, you are trying to get the file extension, so you would want to add a capture group:
var result = /^.*\.(.*)$/.exec( str );
var extension = result[1];
And for both parts:
var result = /^(.*)\.(.*)$/.exec( str );
var path = result[1];
var extension = result[2];
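Since the stated goal is to keep the dot with the extension (.jpg), the second group can be widened to include it. A sketch under that assumption, using the sample string from the question; the function name splitImagePath is hypothetical:

```javascript
// Hypothetical helper: splits at the last "." and keeps the dot with the extension.
function splitImagePath(str) {
  // Group 1: everything before the last "."; group 2: the last "." and what follows it.
  const result = /^(.*)(\.[^.]*)$/.exec(str);
  // exec() returns null when the string contains no "." at all.
  return result ? { path: result[1], extension: result[2] } : null;
}

const parts = splitImagePath('http://sub.foo.com/img/my-img.jpg');
console.log(parts.path);      // http://sub.foo.com/img/my-img
console.log(parts.extension); // .jpg
```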
You can use the lastIndexOf() method to find the last period and then use the substring method to obtain the first and second strings. The split() method is better suited to cases where you want to split at every instance and loop over the pieces. Substring is preferable for cases like this one, where you are breaking the string at a single position.
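A minimal sketch of that approach, using the sample string from the question. The helper name splitAtLastDot is hypothetical, and returning an empty extension when there is no dot is an assumption about the desired behavior:

```javascript
// Hypothetical helper: splits a string at its last "." into [base, extension].
function splitAtLastDot(str) {
  const dot = str.lastIndexOf('.');
  // No dot at all: everything is the base and the extension is empty.
  if (dot === -1) return [str, ''];
  return [str.substring(0, dot), str.substring(dot)];
}

const [base, ext] = splitAtLastDot('http://sub.foo.com/img/my-img.jpg');
console.log(base); // http://sub.foo.com/img/my-img
console.log(ext);  // .jpg
```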

javascript regex to extract the first character after the last specified character

I am trying to extract the first character after the last underscore in a string. The number of '_' characters in the string is unknown, but in my case there will always be at least one, because I added it in another step of the process.
What I tried is this. I also tried the regex by itself to extract from the name, but my result was empty.
var s = "XXXX-XXXX_XX_DigitalF.pdf"
var string = match(/[^_]*$/)[1]
string.charAt(0)
So the final desired result is 'D'. If the RegEx can only get me what is behind the last '_' that is fine because I know I can use the charAt like currently shown. However, if the regex can do the whole thing, even better.
If you know there will always be at least one underscore you can do this:
var s = "XXXX-XXXX_XX_DigitalF.pdf"
var firstCharAfterUnderscore = s.charAt(s.lastIndexOf("_") + 1);
// OR, with regex
var firstCharAfterUnderscore = s.match(/_([^_])[^_]*$/)[1]
With the regex, you can extract just the one letter by using parentheses to capture that part of the match. But I think the .lastIndexOf() version is easier to read.
Either way, if there's a possibility of no underscores in the input, you'd need to add some additional logic.
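One way that additional logic could look, as a sketch only. The function name firstCharAfterLastUnderscore and the choice to return null are assumptions, not part of the original answer:

```javascript
// Hypothetical helper: first character after the last "_", or null when the
// string has no underscore or the underscore is the final character.
function firstCharAfterLastUnderscore(s) {
  const index = s.lastIndexOf('_');
  if (index === -1 || index === s.length - 1) return null;
  return s.charAt(index + 1);
}

console.log(firstCharAfterLastUnderscore('XXXX-XXXX_XX_DigitalF.pdf')); // D
console.log(firstCharAfterLastUnderscore('nounderscore.pdf'));          // null
```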
