Split a string based on a condition

Split a string based on a condition - javascript

I would like to split a spreadsheet cell reference (eg, A10, AB100, ABC5) to two string parts: column reference and row reference.
A10 => A and 10
AB100 => AB and 100 ...
Does anyone know how to do this by string functions?

var res = "AA123";
//Method 1
var arr = res.match(/[a-z]+|[^a-z]+/gi);
document.write(arr[0] + "<br>" + arr[1]);
//Method 2 (as deceze suggested)
var arr = res.match(/([^\d]+)(\d+)/);
document.write("<br>" + arr[1] + "<br>" + arr[2]);
//Note here [^\d] is the same as \D

This is easiest to do with a regular expression (regex). For example:
var ref = "AA100";
var matches = ref.match(/^([a-zA-Z]+)([1-9][0-9]*)$/);
if (matches) {
    var column = matches[1];
    var row = Number(matches[2]);
    console.log(column); // "AA"
    console.log(row); // 100
} else {
    throw new Error('invalid ref "' + ref + '"');
}
The important part here is the regex literal, /^([a-zA-Z]+)([1-9][0-9]*)$/. I'll walk you through it.
^ anchors the regex to the start of the string. Otherwise you might match something like "123ABC456".
[a-zA-Z]+ matches one or more characters from a-z or A-Z.
[1-9][0-9]* matches exactly one character from 1-9, and then zero or more characters from 0-9. This makes sure that the number you are matching never starts with zero (i.e. "A001" is not allowed).
$ anchors the regex to the end of the string, so that you don't match something like "ABC123DEF".
The parentheses around ([a-zA-Z]+) and ([1-9][0-9]*) "capture" the strings inside them, so that we can later find them using matches[1] and matches[2].
This example is strict about only matching valid cell references. If you trust the data you receive to always be valid then you can get away with a less strict regex, but it is good practice to always validate your data anyway in case your data source changes or you use the code somewhere else.
It is also up to you to decide what you want to do if you receive invalid data. In this example I make the script throw an error, but there might be better choices in your situation (e.g. prompt the user to enter another value).
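As one possible alternative to throwing, the same regex can be wrapped in a small helper that returns null for invalid input and leaves the decision to the caller. This is only a sketch: the name parseCellRef and the shape of the returned object are assumptions made for illustration, not part of the original answer.

```javascript
// Hypothetical helper: returns { column, row } for a valid reference, or null otherwise.
function parseCellRef(ref) {
    var matches = /^([a-zA-Z]+)([1-9][0-9]*)$/.exec(ref);
    if (!matches) {
        return null; // the caller decides how to handle invalid input
    }
    return { column: matches[1], row: Number(matches[2]) };
}

var parsed = parseCellRef("AB100");
console.log(parsed.column); // "AB"
console.log(parsed.row); // 100
console.log(parseCellRef("A001")); // null (row may not start with zero)
console.log(parseCellRef("123ABC456")); // null (rejected by the ^ anchor)
```

Returning null keeps the validation strict while letting the calling code choose between prompting the user again, skipping the value, or raising an error.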

Related

How to allow only certain words consecutively with Regex in javascript

I'm trying to write a regex that will return true if it matches the format below, otherwise, it should return false. It should only allow words as below:
Positive match (return true)
UA-1234-1,UA-12345-2,UA-34578-2
Negative match (return false or null)
Note: A is missing after U
UA-1234-1,U-12345-2
It should always give me true when the string passed to regex is
UA-1234-1,UA-12345-2,UA-34578-2,...........
Below is what I am trying to do but it is matching only the first element and not returning null.
var pattern=/^UA-[0-9]+(-[0-9]+)?/g;
pattern.match("UA-1234-1,UA-12345-2,UA-34578-2");
pattern.exec("UA-1234-1,UA-12345-2,UA-34578-2")
Thanks in advance. Help is greatly appreciated.

The pattern you need is one enclosed in anchors (^ for start of string and $ for end of string) that first matches the initial "block" and then matches zero or more occurrences of a comma followed by the block pattern.
It looks like /^BLOCK(?:,BLOCK)*$/. You may allow optional whitespace in between, e.g. /^BLOCK(?:,\s*BLOCK)*$/.
In the end, the pattern looks like ^UA-[0-9]+(?:-[0-9]+)?(?:,UA-[0-9]+(?:-[0-9]+)?)*$. It is best to build it dynamically to keep it readable and easy to maintain:
const block = "UA-[0-9]+(?:-[0-9]+)?";
let rx = new RegExp(`^${block}(?:,${block})*$`); // RegExp("^" + block + "(?:," + block + ")*$") // for non-ES6
let tests = ['UA-1234-1,UA-12345-2,UA-34578-2', 'UA-1234-1,U-12345-2'];
for (var s of tests) {
    console.log(s, "=>", rx.test(s));
}

Split the string by commas, and test each element instead.
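A minimal sketch of that idea, assuming the block pattern from the question (UA-, digits, and an optional -digits suffix). The function name isValidIdList is made up for this example.

```javascript
// Hypothetical helper: true only if every comma-separated item is a valid block.
function isValidIdList(input) {
    // No g flag here, so test() keeps no state between calls.
    var block = /^UA-[0-9]+(?:-[0-9]+)?$/;
    return input.split(',').every(function (item) {
        return block.test(item);
    });
}

console.log(isValidIdList("UA-1234-1,UA-12345-2,UA-34578-2")); // true
console.log(isValidIdList("UA-1234-1,U-12345-2")); // false
```

An empty input splits into a single empty string, which fails the block test, so it is rejected as well.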

Regexp to capture comma separated values

I have a string that can be a comma separated list of \w, such as:
abc123
abc123,def456,ghi789
I am trying to find a JavaScript regexp that will return ['abc123'] (first case) or ['abc123', 'def456', 'ghi789'] (without the comma).
I tried:
^(\w+,?)+$ -- Nope, as only the last repeating pattern will be matched, 789
^(?:(\w+),?)+$ -- Same story. I am using non-capturing bracket. However, the capturing just doesn't seem to happen for the repeated word
Is what I am trying to do even possible with regexp? I tried pretty much every combination of grouping, using capturing and non-capturing brackets, and still not managed to get this happening...

If you want to discard the whole input when there is something wrong, the simplest way is to validate, then split:
if (/^\w+(,\w+)*$/.test(input)) {
    var values = input.split(',');
    // Process the values here
}
If you want to allow empty value, change \w+ to \w*.
Trying to match and validate at the same time with a single regex requires emulating the \G feature, which asserts the position where the last match ended. Why is \G required? Because it prevents the engine from retrying the match at the next position and thereby bypassing your validation. Remember that ECMAScript regex (before ES2018) doesn't have look-behind, so you can't differentiate between the position of an invalid character and the character(s) after it:
something,=bad,orisit,cor&rupt
          ^^             ^^
When you can't differentiate between the 2 positions, you can't rely on the engine to do a match-all operation alone. While it is possible to use a while loop with RegExp.exec and assert the position of last match yourself, why would you do so when there is a cleaner option?
If you want to salvage whatever is available, torazaburo's answer is a viable option.
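For comparison, here is a rough sketch of what the manual loop with RegExp.exec could look like. It is not taken from the original answer: the helper name matchAllValues, the pattern /(\w+)(,|$)/g, and the choice to return null on malformed input are all assumptions for illustration.

```javascript
// Hypothetical helper: returns the list of values, or null if the input is malformed.
function matchAllValues(input) {
    var re = /(\w+)(,|$)/g; // one value followed by a comma or the end of the string
    var values = [];
    var expected = 0; // index at which the next match must start (emulates \G)
    var sawEnd = false; // becomes true once a value is followed by the end of input
    var m;
    while (!sawEnd && (m = re.exec(input)) !== null) {
        if (m.index !== expected) {
            return null; // the engine skipped over an invalid character
        }
        values.push(m[1]);
        expected = re.lastIndex;
        sawEnd = m[2] === '';
    }
    return sawEnd ? values : null;
}

console.log(matchAllValues("abc123,def456,ghi789")); // ["abc123", "def456", "ghi789"]
console.log(matchAllValues("something,=bad,orisit,cor&rupt")); // null
```

The bookkeeping around expected and sawEnd is the part that validate-then-split avoids, which is why the shorter version is usually preferable.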

Try this regex:
'/([^,]+)/'
Alternatively, strings in JavaScript have a split method that can split a string based on a delimiter:
s.split(',')

Split on the comma first, then filter out results that do not match:
str.split(',').filter(function(s) { return /^\w+$/.test(s); })

This regex pattern picks out each run of digits from input that contains special characters such as ".", "," and "#", so that each numeric value can be written on its own line.
var val = [1234,1213.1212, 1.3, 1.4]
var re = /[0-9]*[0-9]/gi;

var str = "abc123,def456, asda12, 1a2ass, yy8,ghi789";
var re = /[a-z]{3}\d{3}/g;
var list = str.match(re);
document.write("<BR> list.length: " + list.length);
for (var i = 0; i < list.length; i++) {
    document.write("<BR>list(" + i + "): " + list[i]);
}
This puts only items of the "abc123" form (three lowercase letters followed by three digits) into the list and nothing else.

Maybe you can use the split function:
var st = "abc123,def456,ghi789";
var res = st.split(',');

Put the filename and the filetype in an array

I want to put the filename and the filetype in an array. I know the answer is split, but I don't know how to find the last dot before the extension begins.
Examples: Funny - SMS 02.jpg should give Funny - SMS 02 in one array element and jpg in another. But when I try to split the name of a file that already contains dots, the trouble begins. Funny - When you see it....jpg prints Funny - When you see it in, for example, fname[0] and jpg in fname[1].
How can I make it print Funny - When you see it... as fname[0] and jpg as fname[1]?
Thanks in advance.

function getFnameExt(filename) {
    var parts = filename.split('.'), ext = parts.pop(), fname = parts.join('.');
    return [ fname, ext ];
}
console.log( getFnameExt("Funny - When you see it....jpg") );

var array = [];
var s = "Funny - When you see it....jpg";
var lastDot = s.lastIndexOf(".");
array[0] = s.substring(0, lastDot);
array[1] = s.substring(lastDot + 1);
alert(array[0] + "---" + array[1]);
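One thing to watch with the lastIndexOf approach: when the name contains no dot at all, lastIndexOf returns -1, so the whole name ends up in the extension slot. A small sketch that guards against this follows. The function name splitFilename and the choice to return an empty extension in that case are assumptions for illustration only.

```javascript
// Hypothetical helper: returns [name, extension]; extension is "" when there is no dot.
function splitFilename(filename) {
    var lastDot = filename.lastIndexOf('.');
    if (lastDot === -1) {
        return [filename, '']; // no dot, so treat the whole string as the name
    }
    return [filename.substring(0, lastDot), filename.substring(lastDot + 1)];
}

console.log(splitFilename("Funny - When you see it....jpg")); // ["Funny - When you see it...", "jpg"]
console.log(splitFilename("README")); // ["README", ""]
```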

For these kinds of tasks splitting is usually cumbersome. Regular expressions are more powerful:
var matches = /^(.*)\.([^.]*)$/g.exec("Funny - When you see it....jpg");
matches.shift();
// matches:
// ["Funny - When you see it...", "jpg"]
This matches the string against the regexp, which results in an array with three elements. The first is the full match which is not needed, so shift it.
^ begin of string
.* any amount of any character
\. a dot
[^.]* any amount of any character except a dot
$ end of string
With the begin/end of string anchors, .* must contain all characters before the last dot.
( and ) denote a group, which adds the matched substring to the array.

var fname = "Funny - When you see it....jpg";
var parts = fname.split(/\.(?=[^.]*$)/);
// parts = ["Funny - When you see it...", "jpg"]
?= is called a 'lookahead' and basically means "followed by". So, the above reads: "split at a dot if there are no more dots after it".

Using regex to search for keywords at the beginning of words only

I have a searching system that splits the keyword into chunks and searches for it in a string like this:
var regexp_school = new RegExp("(?=.*" + split_keywords[0] + ")(?=.*" + split_keywords[1] + ")(?=.*" + split_keywords[2] + ").*", "i");
I would like to modify this so that it only matches the keywords at the beginning of words.
For example if the string is:
"Bbe be eb ebb beb"
And the keyword is: "be eb"
Then I want only these to hit "be ebb eb"
In other words I want to combine the above regexp with this one:
var regexp_school = new RegExp("^" + split_keywords[0], "i");
But I'm not sure what the syntax would look like.
I'm also using the split function to split the keywords, but I don't want to set a length since I don't know how many words there are in the keyword string.
split_keywords = school_keyword.split(" ", 3);
If I leave the 3 out, will it have dynamic length or just length of 1? I tried doing a
alert(split_keywords.lenght);
But didn't get a desired response

You should use the special word boundary character \b to match the beginning of a word. To create the expression for an arbitrary number of keywords, you can generate it in a loop.
var regex = '';
for (var i = split_keywords.length; i--; ) {
    // two slashes are needed to insert `\` literally
    regex += "(?=.*\\b" + split_keywords[i] + ")";
}
var regexp_school = new RegExp(regex, "i");
I'm not sure about performance, but you can also consider using indexOf to test whether a substring is contained in a string.
Update:
If \b does not work for you (because of other "special" characters), and all your words are separated by a white space, you can use
"(?=.*\\s" + split_keywords[i] + ")"
or
"(?=.* " + split_keywords[i] + ")"
But for this to work you have to prepend the text you are searching in with a white space:
" " + textYouSearchIn
or you can write a more complex expression:
"(?=(^|.*\\s)" + split_keywords[i] + ")"

A couple points. First, you need to anchor the regex to the start of the string. Otherwise, if there is no match, there are a LOT of combinations that the regex engine must try before declaring a match failure (it must check all of them, in fact). Second, when splitting the string, use /\s+/ instead of a single space - this prevents getting empty matches in the resulting array in case there are multiple spaces between any keywords. Third, if there are empty strings in the array of keywords, you do not want to add them to the regex. Felix's solution is pretty close to the mark, but does not actually match the string once all the positive lookahead assertions are finished. That said, here's my proposed solution:
var split_keywords = school_keyword.split(/\s+/);
var regex = "^"; // Anchor to start of string.
for (var i = 0, len = split_keywords.length; i < len; ++i) {
    if (split_keywords[i]) { // Skip empty keyword strings.
        regex += "(?=.*?\\b" + split_keywords[i] + ")";
    }
}
regex += ".*$"; // Add ending to actually match the line.
var regexp_school = new RegExp(regex, "i");
I've also changed the greedy quantifier to lazy. This is one case where it is applicable.
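Both answers concatenate the keywords into the pattern as-is, so a keyword containing a regex metacharacter (for example "c++") would change or break the expression. A sketch that wraps the approach above in a function and escapes each keyword first is shown below. The names escapeRegExp and buildKeywordRegex are made up for this example, and the escaping character class is a commonly used one rather than something from the original answers.

```javascript
// Hypothetical helper: escape characters that have a special meaning in a regex.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: builds a regex that requires every keyword to start some word.
function buildKeywordRegex(keywordString) {
    var keywords = keywordString.split(/\s+/);
    var source = '^'; // anchor to the start of the string
    for (var i = 0; i < keywords.length; i++) {
        if (keywords[i]) { // skip empty strings from leading or trailing spaces
            source += '(?=.*?\\b' + escapeRegExp(keywords[i]) + ')';
        }
    }
    return new RegExp(source + '.*$', 'i');
}

var rx = buildKeywordRegex('be eb');
console.log(rx.test('Bbe be eb ebb beb')); // true
console.log(rx.test('Bbe abe web')); // false
```

Because split is called without a limit, the resulting array simply has as many entries as there are keywords.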

Splitting Nucleotide Sequences in JS with Regexp

I'm trying to split up a nucleotide sequence into amino acid strings using a regular expression. I have to start a new string at each occurrence of the string "ATG", but I don't want to actually stop the first match at the "ATG". Valid input is any ordering of a string of As, Cs, Gs, and Ts.
For example, given the input string: ATGAACATAGGACATGAGGAGTCA
I should get two strings: ATGAACATAGGACATGAGGAGTCA (the whole thing) and ATGAGGAGTCA (from the second occurrence of "ATG" onward). A string that contains "ATG" n times should result in n results.
I thought the expression /(?:[ACGT]*)(ATG)[ACGT]*/g would work, but it doesn't. If this can't be done with a regexp it's easy enough to just write out the code for, but I always prefer an elegant solution if one is available.

If you really want to use regular expressions, try this:
var str = "ATGAACATAGGACATGAGGAGTCA",
re = /ATG.*/g, match, matches=[];
while ((match = re.exec(str)) !== null) {
    matches.push(match[0]); // keep the matched string, not the whole match array
    re.lastIndex = match.index + 3;
}
But be careful with exec and changing the index. You can easily make it an infinite loop.
Otherwise you could use indexOf to find the indices and slice to get the substrings:
var str = "ATGAACATAGGACATGAGGAGTCA",
offset=0, match=str, matches=[];
while ((offset = match.indexOf("ATG", offset)) > -1) {
    match = match.slice(offset);
    matches.push(match);
    offset = 3; // match now starts with "ATG", so resume the search just past it
}

I think what you want is
var subStrings = inputString.split('ATG');
KISS :)

Splitting a string before each occurrence of ATG is simple, just use
result = subject.split(/(?=ATG)/i);
(?=ATG) is a positive lookahead assertion, meaning "Assert that you can match ATG starting at the current position in the string".
This will split GGGATGTTTATGGGGATGCCC into GGG, ATGTTT, ATGGGG and ATGCCC.
So now you have an array of (in this case four) strings. I would now take those, discard the first one if it does not start with ATG (the first piece starts with ATG only when the whole input does), and then join the strings no. 2 + ... + n, then 3 + ... + n etc. until you have exhausted the list.
Of course, this regex doesn't do any validation as to whether the string only contains ACGT characters as it only matches positions between characters, so that should be done before, i. e. that the input string matches /^[ACGT]*$/i.
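The split-then-join procedure described above could be sketched roughly as follows. The function name suffixesFromATG and the decision to return null for invalid input are assumptions for this example, not part of the original answer.

```javascript
// Hypothetical helper: returns every suffix of the sequence that starts at an "ATG".
function suffixesFromATG(subject) {
    if (!/^[ACGT]*$/i.test(subject)) {
        return null; // validate first, since the split regex does no validation
    }
    var parts = subject.split(/(?=ATG)/i);
    var results = [];
    for (var i = 0; i < parts.length; i++) {
        // Only pieces that begin with ATG start a result; this skips a leading non-ATG piece.
        if (/^ATG/i.test(parts[i])) {
            results.push(parts.slice(i).join(''));
        }
    }
    return results;
}

console.log(suffixesFromATG("ATGAACATAGGACATGAGGAGTCA"));
// ["ATGAACATAGGACATGAGGAGTCA", "ATGAGGAGTCA"]
console.log(suffixesFromATG("GGGATGTTTATGGGGATGCCC"));
// ["ATGTTTATGGGGATGCCC", "ATGGGGATGCCC", "ATGCCC"]
```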

Since you want to capture from every "ATG" to the end split isn't right for you. You can, however, use replace, and abuse the callback function:
var matches = [];
seq.replace(/atg/gi, function(m, pos){ matches.push(seq.substr(pos)); });

This isn't with regex, and I don't know if this is what you consider "elegant," but...
var sequence = 'ATGAACATAGGACATGAGGAGTCA';
var matches = [];
do {
    matches.push('ATG' + (sequence = sequence.slice(sequence.indexOf('ATG') + 3)));
} while (sequence.indexOf('ATG') !== -1); // !== -1 rather than > 0, so back-to-back "ATGATG" is not missed
I'm not completely sure if this is what you're looking for. For example, with an input string of ATGabcdefghijATGklmnoATGpqrs, this returns ATGabcdefghijATGklmnoATGpqrs, ATGklmnoATGpqrs, and ATGpqrs.
