Extracting substring from string based on delimiter

I am trying to extract the data out of an encoded 2D barcode. The extraction part is working fine, and I can get the value in a text input.
E.g., the decoded string is
]d20105000456013482172012001000001/:210000000001
Based on the following rules (couldn't get the proper table markdown thus attaching a picture), I am trying to extract the substrings from the string mentioned above.
Substrings I want to extract:
05000456013482 (which is after the delimiter 01)
201200 (which is after delimiter 17)
00001 (which is after delimiter 10)
0000000001 (which is after delimiter 21)
P.S. The first 3 characters of the original string (]d2) are always the same, since they simply signify the decoding method.
Now some quirks:
1) The number of characters after delimiter 10 is not fixed. In the example above it is 00001, but it could just as well be 001. Similarly, the number of characters after delimiter 21 is not fixed and can vary in length.
For these variable-length fields, I have added a constant /: to mark where the field ends when scanning with a handheld device.
So I have to look for /: after delimiter 10 and extract the string until it hits /: or EOL, then find delimiter 21 and extract the string until it hits /: or EOL.
2) The number of characters after delimiters 01 and 17 is always fixed (14 characters and 6 characters respectively), as shown in the table.
Note: The position of the delimiters could change. In other words, the encoded barcode could be written in a different sequence.
]d20105000456013482172012001000001/:210000000001 - Note: No /: sign after the 21 group since it is EOL
]d2172012001000001/:210000000001/:0105000456013482 - Note: Both the 10 and 21 groups have a /: sign to signify we have to extract until that sign
]d21000001/:210000000001/:010500045601348217201200 - First two are of varying length, and the next two are of fixed length.
I am not an expert in regex, and so far I have only tried simple patterns like (01)(\d*)(21)(\d*)(10)(\d*)(17)(\d*)$, which doesn't work in the given example since it looks for 10 as the first 2 chars. Also, the substring(x, x) method only works for a fixed-length string, where I know in advance which indexes to pluck the string from.
P.S. Either JS or jQuery help is appreciated.

While you could try to write a very complicated regex to do this, it is more readable and maintainable to parse the string in steps.
The basic steps are:
1) Remove the decode method characters (]d2).
2) Split off the first two characters from the result of step 1.
3) Use them to choose which method extracts the data.
4) Remove and save that data from the string, then go back to step 2 and repeat until the string is exhausted.
Now, since you have a table of the structure of the AI/data, you can write several methods to extract the different forms of data.
For instance, since AIs 01, 11, 15 and 17 are all fixed length, you can just use the string's slice method with the length
str.slice(0,14); //for 01
str.slice(0,6); //for 11 15 17
While the variable ones like AI 21, would be something like
var fnc1 = "/:";
var fnc1Index = str.indexOf(fnc1);
str.slice(0,fnc1Index);
Demo
var dataNames = {
'01': 'GTIN',
'10': 'batchNumber',
'11': 'prodDate',
'15': 'bestDate',
'17': 'expireDate',
'21': 'serialNumber'
};
var input = document.querySelector("input");
document.querySelector("button").addEventListener("click",function(){
var str = input.value;
console.log( parseGS1(str) );
});
function parseGS1(str) {
  var fnc1 = "/:";
  var data = {};
  //remove ]d2
  str = str.slice(3);
  while (str.length) {
    //get the AI identifier: 01,10,11 etc
    let aiIdent = str.slice(0, 2);
    //get the name we want to use for the data object
    let dataName = dataNames[aiIdent];
    //update the string
    str = str.slice(2);
    switch (aiIdent) {
      case "01":
        data[dataName] = str.slice(0, 14);
        str = str.slice(14);
        break;
      case "10":
      case "21": {
        let fnc1Index = str.indexOf(fnc1);
        //eol or fnc1 cases
        if (fnc1Index == -1) {
          data[dataName] = str.slice(0);
          str = "";
        } else {
          data[dataName] = str.slice(0, fnc1Index);
          str = str.slice(fnc1Index + 2);
        }
        break;
      }
      case "11":
      case "15":
      case "17":
        data[dataName] = str.slice(0, 6);
        str = str.slice(6);
        break;
      default:
        //unknown AI: report it and stop parsing
        console.log("unexpected ident encountered:", aiIdent);
        return false;
    }
  }
  return data;
}
<input><button>Parse</button>
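The same stepwise approach can also be driven by a lookup table instead of a switch. The following is only a sketch under assumptions: the names AI_TABLE and parseByTable are made up for illustration, the table covers only the six AIs discussed here, and "/:" is taken to be the terminator the asker chose.

```javascript
// Hypothetical lookup table: fixed-length AIs carry a length,
// variable-length AIs (10 and 21) do not.
const AI_TABLE = {
  "01": { name: "GTIN", length: 14 },
  "10": { name: "batchNumber" },
  "11": { name: "prodDate", length: 6 },
  "15": { name: "bestDate", length: 6 },
  "17": { name: "expireDate", length: 6 },
  "21": { name: "serialNumber" },
};

// Returns an object of named fields, or null when an unknown AI is met.
function parseByTable(raw, terminator = "/:") {
  // Drop the "]d2" symbology prefix if it is present.
  let rest = raw.startsWith("]d2") ? raw.slice(3) : raw;
  const data = {};
  while (rest.length > 0) {
    const spec = AI_TABLE[rest.slice(0, 2)];
    if (!spec) return null;
    rest = rest.slice(2);
    if (spec.length !== undefined) {
      // Fixed-length field: take exactly spec.length characters.
      data[spec.name] = rest.slice(0, spec.length);
      rest = rest.slice(spec.length);
    } else {
      // Variable-length field: read up to the terminator or end of input.
      const end = rest.indexOf(terminator);
      data[spec.name] = end === -1 ? rest : rest.slice(0, end);
      rest = end === -1 ? "" : rest.slice(end + terminator.length);
    }
  }
  return data;
}
```

Adding another AI then only means adding a row to the table, which is the main reason to prefer this shape over a growing switch.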

Ok, here's my take on this. I created a regex that will match all possible patterns. That way all parts are split correctly, all that remains is to use the first two digits to know what it means.
^\]d2(?:((?:10|21)[a-zA-Z0-9]{1,20}(?:\/:|$))|(01[0-9]{14})|((?:11|15|17)[0-9]{6}))*
I suggest you copy it into regex101.com to read the full descriptors and test it out against different possible results.
There are 3 main parts:
((?:10|21)[a-zA-Z0-9]{1,20}(?:\/:|$))
Which tests for the sections starting in 10 and 21. It looks for alphanumerical entities between 1 and 20 times. It should end either with EOL or /:
(01[0-9]{14})
Looks up for the GTIN, pretty straightforward.
((?:11|15|17)[0-9]{6})
Looks up for the 3 date fields.
As we expect those 3 segments to come in any order, I've joined them with | to express an OR, and I expect this big sequence to repeat (the * at the end means 0 or more; we could define an exact minimum and maximum for more reliability).
I am unsure whether this will work for everything, as the test strings you gave do not include identifiers inside actual values. It could very well happen that a product's best before date is in January, so there will be a 01 in its value. But forcing the regex to execute in this manner should circumvent some of those problems.
EDIT: Capturing groups only capture the last occurrence, so we need to split their definitions:
^\]d2(?:(21[a-zA-Z0-9]{1,20}(?:\/:|$))|(10[a-zA-Z0-9]{1,20}(?:\/:|$))|(01[0-9]{14})|(11[0-9]{6})|(15[0-9]{6})|(17[0-9]{6}))*
EDIT AGAIN: Javascript seems to cause us some headaches... I am not sure of the correct way to handle it, but here's an example code that could work.
var str = "]d20105000456013482172012001000001/:210000000001";
var r = new RegExp("(21[a-zA-Z0-9]{1,20}(?:\/:|$))|(10[a-zA-Z0-9]{1,20}(?:\/:|$))|(01[0-9]{14})|(11[0-9]{6})|(15[0-9]{6})|(17[0-9]{6})", "g");
var match; // declared so it does not leak into the global scope
while ((match = r.exec(str)) !== null) {
  console.log(match[0]);
}
I am not very happy with how it turns out though. There might be better solutions.
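One possible way to tidy this up is the sticky flag (y), which forces every match to start exactly where the previous one ended, so an identifier that happens to appear inside a value cannot start a match by accident. This is a sketch only: FIELD and splitFields are illustrative names, and it assumes the same six AIs and the "/:" terminator from the question.

```javascript
// One alternative per field shape; the sticky flag (y) anchors each
// match at lastIndex instead of letting the engine scan ahead.
const FIELD = /(01)(\d{14})|(1[157])(\d{6})|(10|21)([a-zA-Z0-9]{1,20})(?:\/:|$)/y;

// Returns an object keyed by AI, or null if some part of the input
// is not a recognised field.
function splitFields(raw) {
  const body = raw.startsWith("]d2") ? raw.slice(3) : raw;
  const fields = {};
  let pos = 0;
  while (pos < body.length) {
    FIELD.lastIndex = pos;
    const m = FIELD.exec(body);
    if (m === null) return null;
    // Exactly one alternative matched, so only one AI/value pair is set.
    const ai = m[1] || m[3] || m[5];
    const value = m[2] || m[4] || m[6];
    fields[ai] = value;
    pos = FIELD.lastIndex;
  }
  return fields;
}
```

For the first sample string this should yield 05000456013482 under "01", 201200 under "17", 00001 under "10" and 0000000001 under "21".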

Related

how to substring in javascript but excluding symbols

I need to use the substring method in my JavaScript networking project, but excluding the colon (:) symbol. The input is a hexadecimal IP address, and I don't want colons to count as characters in the substring; I want to ignore them. How can I do that?
This is the example IPV6 in the input field:
2001:2eb8:0dc1:54ed:0000:0000:0000:0f31
after substring from 1 to 12:
001:2eb8:0d
as you can see it accepted colons also, but in fact, I need this result:
2001:2eb8:0dc1
So by excluding these two symbols it would have given me the result above, but I don't know how to do that.
IpAddressInput is just a normal input field in which I write the IP address.
Here is the code:
var IpValue = $('#IpAddressInput').val();
alert(IpValue.substr(1, (12) -1));

Answer 1: I don't think there is a built-in function that gives the result you want, but this approach will help. I count the number of colons from index 0 to 12 and then slice the source string from 0 to 12 plus that number. Here is the code:
let val = "2001:2eb8:0dc1:54ed:0000:0000:0000:0f31";
let numOfColons = (val.slice(0, 12).match(/:/g) || []).length; // match() returns null when there is no colon
let result = val.slice(0, 12 + numOfColons);
console.log(result)
Answer 2: If you are sure that there is a colon after exactly every 4 characters, a better solution will be this. The idea is to remove all colons from the string, slice from index 0 to 12, and add a colon after every 4 characters. Finally, it removes the last colon. Here is the code:
let value = "2001:2eb8:0dc1:54ed:0000:0000:0000:0f31";
let valueExcludeColon = value.replace(/:/g, ''); // '20012eb80dc154ed0000000000000f31'
let result = valueExcludeColon.slice(0,12).replace(/(.{4})/g, "$1:"); // '2001:2eb8:0dc1:'
let finalResult = result.slice(0, -1); // 2001:2eb8:0dc1
console.log(finalResult)
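A third option that does not depend on the colons sitting at regular positions is to walk the string and count only the characters that are not colons. This is a minimal sketch; takeIgnoring is a made-up helper name, not a built-in.

```javascript
// Returns the shortest prefix of `text` that contains `count` characters
// other than `ignored`. Ignored characters are kept in the output but
// do not count towards the total.
function takeIgnoring(text, count, ignored = ":") {
  let kept = 0;
  let end = 0;
  while (end < text.length && kept < count) {
    if (text[end] !== ignored) kept++;
    end++;
  }
  return text.slice(0, end);
}

console.log(takeIgnoring("2001:2eb8:0dc1:54ed:0000:0000:0000:0f31", 12));
```

With the sample address and a count of 12 this gives 2001:2eb8:0dc1, since the walk stops right after the twelfth hexadecimal digit.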

Match hyphenated floats

I'm working on JavaScript code, and I need to extract float or int numbers between hyphens. Like this:
var string="someText-180.5-200.70-someOtherText";
I need to get:
var number1="180.5";
var number2="200.70";

While other RegExp based solutions work, I'm going to propose a solution that does not use regular expressions.
My motivation for this is that regular expressions can be difficult to read, modify and/or debug sometimes.
This code returns an array with all the floats from the string (assuming the floats are positive)
"someText-180.5-200.70-someOtherText".split("-"). //splits by hyphens
map(function(elem){
return parseFloat(elem);//convert each part to float
}).filter(function(elem){
return elem===elem;//filter out the not-a-numbers,
// that is, stuff converting to float failed on
});
This could also be done without map/filter in a less 'functional' way if you're more comfortable with that. I believe code readability is very important and you should feel comfortable with your own code. You can do
var str = "someText-180.5-200.70-someOtherText";
var splitStr = str.split("-");
var arrayOfFloats = [];
for(var i=0;i<splitStr.length;i++){
var asFloat = parseFloat(splitStr[i]);
if(asFloat===asFloat){ //check for NaN, that is parse float fails
arrayOfFloats.push(asFloat);
}
}
//now arrayOfFloats contains all the floats in your expression
If you would like not to match the first and last parts (that is, only floats that are strictly between hyphens) elements you can slice them first :)
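A sketch of that last variant, assuming the first and last pieces should always be dropped (floatsBetweenHyphens is an illustrative name). Note that parseFloat accepts a numeric prefix, so a piece such as "12abc" would still come out as 12.

```javascript
// Split on hyphens, drop the first and last pieces, then keep only
// the pieces that parse as a number.
function floatsBetweenHyphens(text) {
  return text
    .split("-")
    .slice(1, -1) // only pieces with a hyphen on both sides
    .map((part) => Number.parseFloat(part))
    .filter((value) => !Number.isNaN(value));
}

console.log(floatsBetweenHyphens("someText-180.5-200.70-someOtherText"));
```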

One of the problems is that once the regex matches "-180.5-", the next match is searched for in the remaining string "200.70-someOtherText", so the next search won't match 200.70.
We have to do a bit more than write a regex.
In my solution, I cut off the examined part of the string and run the regex again on the rest, for as long as there is a match.
See the code below:
function findAllINeed(str){
  var result = [], match; // declared locally instead of leaking globals
  while ((match = /(?:\-)(([0-9]+)\.?([0-9]*))(?:\-)/.exec(str)) != null) {
    str = str.substr(match.index + match[1].length)
    result.push(parseFloat(match[1]));
  }
  return result;
}
I tried this:
findAllINeed("someText-180.5-200.70-someOtherText sd sdf -6 -6.777 7- s 4.55 -4-sdfsdfsdf -45.77-4-")
[180.5, 200.7, 4, 45.77, 4]
It does not match -6, -6.777, 4.55, 7- ...
But it finds every positive float or integer between '-' characters.
I hope this helped you out.
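The overlap problem described above can also be avoided by not consuming the trailing hyphen at all: a lookahead only checks that it is there, so it stays available as the leading hyphen of the next match. A sketch, with numbersBetweenHyphens as an illustrative name; it returns the numbers as strings so that a trailing zero as in 200.70 is preserved.

```javascript
// The leading hyphen is consumed, the trailing one is only asserted
// with (?=-), so adjacent numbers can share a hyphen.
function numbersBetweenHyphens(text) {
  const pattern = /-(\d+(?:\.\d+)?)(?=-)/g;
  return Array.from(text.matchAll(pattern), (m) => m[1]);
}

console.log(numbersBetweenHyphens("someText-180.5-200.70-someOtherText"));
```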

You can use the following regex to accomplish what you are looking for:
[0-9]*\.?,?[0-9]+
Input:
someText-180.5-200.70-someOtherText
Output:
180.5
200.70

You can use this regex:
var string="someText-180.5-200.70-someOtherText";
var match = string.match(/.*?(\d+(\.\d+)?).*?(\d+(\.\d+)?)/);
console.log(match[1]); // prints 180.5
console.log(match[3]); // prints 200.70

var numbers = string.match(/[\d.]+/g)
Or if you're concerned about multiple periods creating invalid numbers:
var numbers = string.match(/\d+(\.\d+)?/g)

getting contents of string between digits

have a regex problem :(
what i would like to do is to find out the contents between two or more numbers.
var string = "90+*-+80-+/*70"
im trying to edit the symbols in between so it only shows up the last symbol and not the ones before it. so trying to get the above variable to be turned into 90+80*70. although this is just an example i have no idea how to do this. the length of the numbers, how many "sets" of numbers and the length of the symbols in between could be anything.
many thanks,
Steve,

The trick is in matching '90+*-+' and '80-+/*' separately, and selecting only the number and the last non-digit.
The expression for finding a number followed by 1 or more non-numbers would be
\d+[^\d]+
To select the number and the last non-number, add parens:
(\d+)[^\d]*([^\d])
Finally add a /g to repeat the procedure for each match, and replace it with the 2 matched groups for each match:
js> '90+*-+80-+/*70'.replace(/(\d+)[^\d]*([^\d])/g, '$1$2');
90+80*70
js>

Or you can use lookahead assertion and simply remove all non-numerical characters which are not last: "90+*-+80-+/*70".replace(/[^0-9]+(?=[^0-9])/g,'');

You can use a regular expression to match the non-digits and a callback function to process the match and decide what to replace:
var test = "90+*-+80-+/*70";
var out = test.replace(/[^\d]+/g, function(str) {
return(str.substr(-1));
})
alert(out);
See it work here: http://jsfiddle.net/jfriend00/Tncya/
This works by using a regular expression to match sequences of non-digits and then replacing that sequence of non-digits with the last character in the matched sequence.

i would use this tutorial, first, then review this for javascript-specific regex questions.

This should do it -
var string = "90+*-+80-+/*70"
var result = '';
var arr = string.split(/(\d+)/)
for (var i = 0; i < arr.length; i++) {
if (!isNaN(arr[i])) result = result + arr[i];
else result = result + arr[i].slice(arr[i].length - 1, arr[i].length);
}
alert(result);
Working demo - http://jsfiddle.net/ipr101/SA2pR/

Similar to #Arnout Engelen
var string = "90+*-+80-+/*70";
string = string.replace(/(\d+)[^\d]*([^\d])(?=\d+)/g, '$1$2');
This was my first thinking of how the RegEx should perform, it also looks ahead to make sure the non-digit pattern is followed by another digit, which is what the question asked for (between two numbers)
Similar to #jfriend00
var string = "90+*-+80-+/*70";
string = string.replace( /(\d+?)([^\d]+?)(?=\d+)/g
, function(){
return arguments[1] + arguments[2].substr(-1);
});
Instead of only matching on non-digits, it matches on non-digits between two numbers, which is what the question asked
Why would this be any better?
If your equation was embedded in a paragraph or string of text. Like:
This is a test where I want to clean up something like 90+*-+80-+/*70 and don't want to scrap the whole paragraph.
Result (Expected) :
This is a test where I want to clean up something like 90+80*70 and don't want to scrap the whole paragraph.
Why would this not be any better?
There is more pattern matching, which makes it theoretically slower (negligible)
It would fail if your paragraph had embedded numbers. Like:
This is a paragraph where Sally bought 4 eggs from the supermarket, but only 3 of them made it back in one piece.
Result (Unexpected):
This is a paragraph where Sally bought 4 3 of them made it back in one piece.
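One way around the embedded-numbers problem is to stop matching "anything that is not a digit" and match only runs of operator characters. This is a sketch under the assumption that the only operators are +, -, * and /; keepLastOperator is an illustrative name. Text that contains runs such as "--" would still be shortened.

```javascript
// Remove every operator run that is itself followed by another operator,
// which leaves only the last operator of each run in place.
function keepLastOperator(text) {
  return text.replace(/[+\-*\/]+(?=[+\-*\/])/g, "");
}

console.log(keepLastOperator("90+*-+80-+/*70"));
```

Because words and spaces are never matched, the Sally paragraph above passes through unchanged.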

Splitting Nucleotide Sequences in JS with Regexp

I'm trying to split up a nucleotide sequence into amino acid strings using a regular expression. I have to start a new string at each occurrence of the string "ATG", but I don't want to actually stop the first match at the "ATG". Valid input is any ordering of a string of As, Cs, Gs, and Ts.
For example, given the input string: ATGAACATAGGACATGAGGAGTCA
I should get two strings: ATGAACATAGGACATGAGGAGTCA (the whole thing) and ATGAGGAGTCA (the first match of "ATG" onward). A string that contains "ATG" n times should result in n results.
I thought the expression /(?:[ACGT]*)(ATG)[ACGT]*/g would work, but it doesn't. If this can't be done with a regexp it's easy enough to just write out the code for, but I always prefer an elegant solution if one is available.

If you really want to use regular expressions, try this:
var str = "ATGAACATAGGACATGAGGAGTCA",
re = /ATG.*/g, match, matches=[];
while ((match = re.exec(str)) !== null) {
matches.push(match[0]); // match[0] is the matched text itself
re.lastIndex = match.index + 3;
}
But be careful with exec and changing the index. You can easily make it an infinite loop.
Otherwise you could use indexOf to find the indices and substr to get the substrings:
var str = "ATGAACATAGGACATGAGGAGTCA",
    offset = 0, matches = [];
// always search the original string so that offset stays valid
while ((offset = str.indexOf("ATG", offset)) > -1) {
  matches.push(str.substr(offset));
  offset += 3;
}

I think what you want is
var subStrings = inputString.split('ATG');
KISS :)

Splitting a string before each occurrence of ATG is simple, just use
result = subject.split(/(?=ATG)/i);
(?=ATG) is a positive lookahead assertion, meaning "Assert that you can match ATG starting at the current position in the string".
This will split GGGATGTTTATGGGGATGCCC into GGG, ATGTTT, ATGGGG and ATGCCC.
So now you have an array of (in this case four) strings. I would now take those, discard the first one if it does not start with ATG (it holds whatever came before the first ATG), and then join the strings no. 2 + ... + n, then 3 + ... + n etc. until you have exhausted the list.
Of course, this regex doesn't do any validation as to whether the string only contains ACGT characters as it only matches positions between characters, so that should be done before, i. e. that the input string matches /^[ACGT]*$/i.
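The split-then-join procedure might look like the sketch below. suffixesFromATG is an illustrative name, and the validation step is the /^[ACGT]*$/i check mentioned above. The first piece is discarded only when it does not itself start with ATG.

```javascript
// Returns every suffix of `seq` that begins at an occurrence of ATG.
function suffixesFromATG(seq) {
  if (!/^[ACGT]*$/i.test(seq)) {
    throw new Error("sequence may only contain A, C, G and T");
  }
  // Zero-width split: each piece after the first starts with ATG.
  const parts = seq.split(/(?=ATG)/i);
  // Skip the leading piece when it is just the text before the first ATG.
  const start = /^ATG/i.test(parts[0]) ? 0 : 1;
  const result = [];
  for (let i = start; i < parts.length; i++) {
    result.push(parts.slice(i).join(""));
  }
  return result;
}

console.log(suffixesFromATG("ATGAACATAGGACATGAGGAGTCA"));
```

For the question's input this gives the whole sequence followed by ATGAGGAGTCA.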

Since you want to capture from every "ATG" to the end split isn't right for you. You can, however, use replace, and abuse the callback function:
var matches = [];
seq.replace(/atg/gi, function(m, pos){ matches.push(seq.substr(pos)); });

This isn't with regex, and I don't know if this is what you consider "elegant," but...
var sequence = 'ATGAACATAGGACATGAGGAGTCA';
var matches = [];
do {
matches.push('ATG' + (sequence = sequence.slice(sequence.indexOf('ATG') + 3)));
} while (sequence.indexOf('ATG') !== -1); // also continue when the next ATG is at index 0
I'm not completely sure if this is what you're looking for. For example, with an input string of ATGabcdefghijATGklmnoATGpqrs, this returns ATGabcdefghijATGklmnoATGpqrs, ATGklmnoATGpqrs, and ATGpqrs.

Using Regular Expressions with Javascript replace method

Friends,
I'm new to both Javascript and Regular Expressions and hope you can help!
Within a Javascript function I need to check to see if a comma(,) appears 1 or more times. If it does then there should be one or more numbers either side of it.
e.g.
1,000.00 is ok
1,000,00 is ok
,000.00 is not ok
1,,000.00 is not ok
If these conditions are met I want the comma to be removed so 1,000.00 becomes 1000.00
What I have tried so far is:
var x = '1,000.00';
var regex = new RegExp("[0-9]+,[0-9]+", "g");
var y = x.replace(regex,"");
alert(y);
When run the alert shows ".00" Which is not what I was expecting or want!
Thanks in advance for any help provided.
Edit
Thanks all for the input so far and the 3 answers given. Unfortunately I don't think I explained my question well enough.
What I am trying to achieve is:
If there is a comma in the text and there are one or more numbers either side of it then remove the comma but leave the rest of the string as is.
If there is a comma in the text and there is not at least one number either side of it then do nothing.
So using my examples from above:
1,000.00 becomes 1000.00
1,000,00 becomes 100000
,000.00 is left as ,000.00
1,,000.00 is left as 1,,000.00
Apologies for the confusion!

Your regex isn't going to be very flexible with higher orders than 1000 and it has a problem with inputs which don't have the comma. More problematically you're also matching and replacing the part of the data you're interested in!
Better to have a regex which matches the forms which are a problem and remove them.
The following matches (in order) commas at the beginning of the input, at the end of the input, preceded by a number of non digits, or followed by a number of non digits.
var y = x.replace(/^,|,$|[^0-9]+,|,[^0-9]+/g,'');
As an aside, all of this is much easier with lookbehind, but older JS implementations don't support it (it was only added to the language in ES2018).
Edit based on question update:
Ok, I won't attempt to understand why your rules are as they are, but the regex gets simpler to solve it:
var y = x.replace(/(\d),(\d)/g, '$1$2');
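One detail of the (\d),(\d) pattern: it consumes the digit after the comma, so in an input like 1,2,3 that digit is no longer available as the "digit before" for the next comma. A possible variant, offered as a sketch with stripDigitCommas as an illustrative name, checks the following digit with a lookahead instead of consuming it.

```javascript
// Remove a comma only when it has a digit on both sides. The digit
// after the comma is asserted, not consumed, so runs like 1,2,3 work.
function stripDigitCommas(text) {
  return text.replace(/(\d),(?=\d)/g, "$1");
}

console.log(stripDigitCommas("1,000.00"));
```

The four examples from the question behave as requested: the first two lose their commas, and ,000.00 and 1,,000.00 are left as they are.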

I would use something like the following:
^[0-9]{1,3}(,[0-9]{3})*(\.[0-9]+)?$
[0-9]{1,3}: 1 to 3 digits
(,[0-9]{3})*: [Optional] More digit triplets separated by a comma
(\.[0-9]+)?: [Optional] Dot + more digits
If this regex matches, you know that your number is valid. Just replace all commas with the empty string afterwards.
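Validate-then-strip could be sketched as follows. GROUPED_NUMBER and normalizeGrouped are illustrative names, and the pattern assumes the decimal part is optional. This check is stricter than what the question asks for: 1,000,00 is rejected because 00 is not a full triplet, and an ungrouped 1000 is rejected as well.

```javascript
// 1 to 3 digits, then comma-separated triplets, then an optional fraction.
const GROUPED_NUMBER = /^[0-9]{1,3}(,[0-9]{3})*(\.[0-9]+)?$/;

// Returns the number without commas, or null when the grouping is invalid.
function normalizeGrouped(text) {
  return GROUPED_NUMBER.test(text) ? text.replace(/,/g, "") : null;
}

console.log(normalizeGrouped("1,000.00"));
```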

It seems to me you have three error conditions
",1000"
"1000,"
"1,,000"
If any one of these is true then you should reject the field, If they are all false then you can strip the commas in the normal way and move on. This can be a simple alternation:
^,|,,|,$

I would just remove anything except digits and the decimal separator ([^0-9.]) and send the output through parseFloat():
var y = parseFloat(x.replace(/[^0-9.]+/g, ""));

// invalid cases:
// - standalone comma at the beginning of the string
// - comma next to another comma
// - standalone comma at the end of the string
var i,
inputs = ['1,000.00', '1,000,00', ',000.00', '1,,000.00'],
invalid_cases = /(^,)|(,,)|(,$)/;
for (i = 0; i < inputs.length; i++) {
if (inputs[i].match(invalid_cases) === null) {
// wipe out everything but digits and the dot
inputs[i] = inputs[i].replace(/[^\d.]+/g, '');
}
}
console.log(inputs); // ["1000.00", "100000", ",000.00", "1,,000.00"]
