How to split a string by a character not directly preceded by a character of the same type? - javascript

Let's say I have a string: "We.need..to...split.asap". What I would like to do is to split the string by the delimiter ., but I only wish to split by the first . and include any recurring .s in the succeeding token.
Expected output:
["We", "need", ".to", "..split", "asap"]
In other languages, I know that this is possible with a look-behind /(?<!\.)\./ but Javascript unfortunately does not support such a feature.
I am curious to see your answers to this question. Perhaps there is a clever use of look-aheads that presently evades me?
I was considering reversing the string, then re-reversing the tokens, but that seems like too much work for what I am after... plus controversy: How do you reverse a string in place in JavaScript?
Thanks for the help!

Here's a variation of the answer by guest271314 that handles more than two consecutive delimiters:
var text = "We.need.to...split.asap";
var re = /(\.*[^.]+)\./;
var items = text.split(re).filter(function(val) { return val.length > 0; });
It relies on the fact that if the split expression includes a capture group, the captured items are included in the returned array. These captures are actually the only thing we are interested in; apart from the final token, the pieces between them are all empty strings, which we filter out.
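To make that behaviour concrete, here is a small sketch that prints the raw array returned by split before filtering. It uses the same text and regex as above; the variable name raw is only for illustration.

```javascript
const text = "We.need.to...split.asap";
const re = /(\.*[^.]+)\./;

// Because the pattern has a capture group, each captured token is
// spliced into the result between the (mostly empty) split pieces.
const raw = text.split(re);
console.log(raw);
// ["", "We", "", "need", "", "to", "", "..split", "asap"]

// Dropping the empty strings leaves only the tokens we want.
const items = raw.filter((val) => val.length > 0);
console.log(items);
// ["We", "need", "to", "..split", "asap"]
```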
EDIT: Unfortunately there's perhaps one slight bug with this. If the text to be split starts with a delimiter, that will be included in the first token. If that's an issue, it can be remedied with:
var re = /(?:^|(\.*[^.]+))\./;
var items = text.split(re).filter(function(val) { return !!val; });
(I think this regex is ugly and would welcome an improvement.)

You can do this without any lookaheads:
var subject = "We.need.to....split.asap";
var regex = /\.?(\.*[^.]+)/g;
var matches, output = [];
while(matches = regex.exec(subject)) {
output.push(matches[1]);
}
document.write(JSON.stringify(output));
It seemed like it'd work in one line, as it did on https://regex101.com/r/cO1dP3/1, but had to be expanded in the code above because the /g option by default prevents capturing groups from returning with .match (i.e. the correct data was in the capturing groups, but we couldn't immediately access them without doing the above).
See: JavaScript Regex Global Match Groups
An alternative solution with the original one liner (plus one line) is:
document.write(JSON.stringify(
"We.need.to....split.asap".match(/\.?(\.*[^.]+)/g)
.map(function(s) { return s.replace(/^\./, ''); })
));
Take your pick!
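If your environment provides String.prototype.matchAll (ES2020, so this is an assumption about the target runtime), the exec loop can probably be collapsed back into a single expression, since matchAll keeps the capture groups even with the g flag:

```javascript
const subject = "We.need.to....split.asap";

// matchAll requires the g flag and yields one match object per hit,
// each carrying its capture groups; we keep only group 1.
const output = Array.from(subject.matchAll(/\.?(\.*[^.]+)/g), (m) => m[1]);
console.log(output);
// ["We", "need", "to", "...split", "asap"]
```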

Note: This answer can't handle more than 2 consecutive delimiters, since it was written according to the example in the revision 1 of the question, which was not very clear about such cases.
var text = "We.need.to..split.asap";
// split "." if followed by "."
var res = text.split(/\.(?=\.)/).map(function(val, key) {
// if `val[0]` does not begin with "." split "."
// else split "." if not followed by "."
return val[0] !== "." ? val.split(/\./) : val.split(/\.(?!.*\.)/)
});
// concat arrays `res[0]` , `res[1]`
res = res[0].concat(res[1]);
document.write(JSON.stringify(res));
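For completeness: engines that implement ES2018 lookbehind assertions accept the look-behind from the question directly. This is a minimal sketch that assumes such an engine; older browsers will throw a syntax error on the regex literal.

```javascript
const text = "We.need..to...split.asap";

// Split on a "." that is not immediately preceded by another ".".
const tokens = text.split(/(?<!\.)\./);
console.log(tokens);
// ["We", "need", ".to", "..split", "asap"]
```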

Related

check whether csv form or not [duplicate]

What is the regular expression to validate a comma delimited list like this one:
12365, 45236, 458, 1, 99996332, ......
I suggest doing it the following way:
(\d+)(,\s*\d+)*
which would work for a list containing 1 or more elements.
This regex extracts an element from a comma separated list, regardless of contents:
(.+?)(?:,|$)
If you just replace the comma with something else, it should work for any delimiter.
It depends a bit on your exact requirements. I'm assuming: all numbers, of any length; numbers cannot have leading zeros or contain commas or decimal points; individual numbers are always separated by a comma then a space; and the last number does NOT have a comma and space after it. Any of these being wrong would simplify the solution.
([1-9][0-9]*,[ ])*[1-9][0-9]*
Here's how I built that mentally:
[0-9] any digit.
[1-9][0-9]* leading non-zero digit followed by any number of digits
[1-9][0-9]*, as above, followed by a comma
[1-9][0-9]*,[ ] as above, followed by a space
([1-9][0-9]*,[ ])* as above, repeated 0 or more times
([1-9][0-9]*,[ ])*[1-9][0-9]* as above, with a final number that doesn't have a comma.
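As a rough check of that pattern, the sketch below wraps it in ^ and $ so that the whole input must match. The anchors and the name listPattern are my additions, not part of the answer above.

```javascript
// Anchored version of the pattern built up above.
const listPattern = /^([1-9][0-9]*,[ ])*[1-9][0-9]*$/;

console.log(listPattern.test("12365, 45236, 458, 1, 99996332")); // true
console.log(listPattern.test("12365,45236")); // false: no space after the comma
console.log(listPattern.test("012, 5"));      // false: leading zero
console.log(listPattern.test("1, 2, "));      // false: trailing comma and space
```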
Match duplicate comma-delimited items:
(?<=,|^)([^,]*)(,\1)+(?=,|$)
Reference.
This regex can be used to split the values of a comma-delimited list. List elements may be quoted, unquoted or empty. Commas inside a pair of quotation marks are not matched.
,(?!(?<=(?:^|,)\s*"(?:[^"]|""|\\")*,)(?:[^"]|""|\\")*"\s*(?:,|$))
Reference.
/^\d+(?:, ?\d+)*$/
I used this for a list of items that had to be alphanumeric without underscores at the front of each item.
^(([0-9a-zA-Z][0-9a-zA-Z_]*)([,][0-9a-zA-Z][0-9a-zA-Z_]*)*)$
You might want to specify language just to be safe, but
(\d+, ?)+(\d+)?
ought to work
I had a slightly different requirement, to parse an encoded dictionary/hashtable with escaped commas, like this:
"1=This is something, 2=This is something,,with an escaped comma, 3=This is something else"
I think this is an elegant solution, with a trick that avoids a lot of regex complexity:
if (string.IsNullOrEmpty(encodedValues))
{
return null;
}
else
{
var retVal = new Dictionary<int, string>();
var reFields = new Regex(@"([0-9]+)\=(([A-Za-z0-9\s]|(,,))+),");
foreach (Match match in reFields.Matches(encodedValues + ","))
{
var id = match.Groups[1].Value;
var value = match.Groups[2].Value;
retVal[int.Parse(id)] = value.Replace(",,", ",");
}
return retVal;
}
I think it can be adapted to the original question with an expression like @"([0-9]+),\s?" and parse on Groups[0].
I hope it's helpful to somebody and thanks for the tips on getting it close to there, especially Asaph!
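Since most of this page is about JavaScript, here is a hedged sketch of the same trick ported to JavaScript. The function name parseEncodedValues and the use of a Map are my own choices, and matchAll assumes an ES2020 runtime.

```javascript
// Hypothetical helper: parses "id=value" pairs where ",," is an escaped comma.
function parseEncodedValues(encodedValues) {
  if (!encodedValues) {
    return null;
  }
  const result = new Map();
  const reFields = /([0-9]+)=((?:[A-Za-z0-9\s]|,,)+),/g;
  // Appending a comma lets the last field be terminated like all the others.
  for (const match of (encodedValues + ",").matchAll(reFields)) {
    // Un-escape doubled commas in the captured value.
    result.set(Number(match[1]), match[2].replace(/,,/g, ","));
  }
  return result;
}

const parsed = parseEncodedValues(
  "1=This is something, 2=This is something,,with an escaped comma, 3=This is something else"
);
console.log(parsed.get(2));
// "This is something,with an escaped comma"
```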
In JavaScript, use split to help out, and catch any negative digits as well:
'-1,2,-3'.match(/(-?\d+)(,\s*-?\d+)*/)[0].split(',');
// ["-1", "2", "-3"]
// may need trimming if digits are space-separated
The following will match any comma delimited word/digit/space combination
(((.)*,)*)(.)*
Why don't you work with groups:
^(\d+(, )?)+$
If you have a more complicated regex, e.g. for valid URLs rather than just numbers, you could do the following: loop through each element and test it individually against your regex:
const validRelativeUrlRegex = /^(^$|(?!.*(\W\W))\/[a-zA-Z0-9\/-]+[^\W_]$)/;
const relativeUrls = "/url1,/url-2,url3";
const startsWithComma = relativeUrls.startsWith(",");
const endsWithComma = relativeUrls.endsWith(",");
const areAllURLsValid = relativeUrls
.split(",")
.every(url => validRelativeUrlRegex.test(url));
const isValid = areAllURLsValid && !endsWithComma && !startsWithComma

regex to remove certain characters at the beginning and end of a string

Let's say I have a string like this:
...hello world.bye
But I want to remove the first three dots and replace .bye with !
So the output should be
hello world!
it should only match if both conditions apply (... at the beginning and .bye at the end)
And I'm trying to use js replace method. Could you please help? Thanks
First match the dots, capture and lazy-repeat any character until you get to .bye, and match the .bye. Then, you can replace with the first captured group, plus an exclamation mark:
const str = '...hello world.bye';
console.log(str.replace(/\.\.\.(.*?)\.bye/, '$1!'));
The lazy-repeat is there to ensure you don't match too much, for example:
const str = `...hello world.bye
...Hello again! Goodbye.`;
console.log(str.replace(/\.\.\.(.*?)\.bye/g, '$1!'));
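The question asks for a replacement only when the dots are at the beginning and .bye is at the end. One way to enforce that, as a sketch under the assumption that the input is a single line (. does not match newlines), is to anchor the pattern with ^ and $. The function name replaceIfWrapped is made up for this example.

```javascript
function replaceIfWrapped(str) {
  // ^ and $ require the dots at the very start and ".bye" at the very end;
  // if the pattern does not match, replace returns the string unchanged.
  return str.replace(/^\.\.\.(.*)\.bye$/, "$1!");
}

console.log(replaceIfWrapped("...hello world.bye"));     // "hello world!"
console.log(replaceIfWrapped("say ...hello world.bye")); // unchanged
console.log(replaceIfWrapped("...hello world.bye now")); // unchanged
```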
You don't actually need a regex to do this. Although it's a bit inelegant, the following should work fine (obviously the function can be called whatever makes sense in the context of your application):
function manipulate(string) {
if (string.slice(0, 3) === "..." && string.slice(-4) === ".bye") {
return string.slice(3, -4) + "!";
}
return string;
}
(Apologies if I made any stupid errors with indexing there, but the basic idea should be obvious.)
This, to me at least, has the advantage of being easier to reason about than a regex. Of course if you need to deal with more complicated cases you may reach the point where a regex is best - but I personally wouldn't bother for a simple use-case like the one mentioned in the OP.
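The same regex-free idea can also be written with startsWith and endsWith, which some may find a little easier to read. This is only a sketch; the length guard is my addition, so that the prefix and suffix cannot overlap in very short inputs such as "...bye".

```javascript
function manipulate(str) {
  // 3 characters of prefix plus 4 of suffix: anything shorter would overlap.
  const longEnough = str.length >= 7;
  if (longEnough && str.startsWith("...") && str.endsWith(".bye")) {
    // Drop the three leading dots and the trailing ".bye".
    return str.slice(3, -4) + "!";
  }
  return str;
}

console.log(manipulate("...hello world.bye")); // "hello world!"
console.log(manipulate("hello world.bye"));    // unchanged
```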
Your regex would be
const rx = /\.\.\.([\s\S]*?)\.bye/g
const out = '\n\nfoobar...hello world.bye\nfoobar...ok.bye\n...line\nbreak.bye\n'.replace(rx, `$1!`)
console.log(out)
In English: find three dots, then anything (matched lazily) in a group, ending with .bye.
The replacement uses the first capture group $1 and appends ! using a template literal.
An arguably simpler solution:
const str = '...hello world.bye'
const newStr = /\.\.\.(.+)\.bye/.exec(str)
const formatted = newStr ? newStr[1] + '!' : str
console.log(formatted)
If the string doesn't match the regex it will just return the string.

Regexp to capture comma separated values

I have a string that can be a comma separated list of \w, such as:
abc123
abc123,def456,ghi789
I am trying to find a JavaScript regexp that will return ['abc123'] (first case) or ['abc123', 'def456', 'ghi789'] (without the comma).
I tried:
^(\w+,?)+$ -- Nope, as only the last repeating pattern will be matched, 789
^(?:(\w+),?)+$ -- Same story. I am using non-capturing bracket. However, the capturing just doesn't seem to happen for the repeated word
Is what I am trying to do even possible with regexp? I tried pretty much every combination of grouping, using capturing and non-capturing brackets, and still not managed to get this happening...
If you want to discard the whole input when there is something wrong, the simplest way is to validate, then split:
if (/^\w+(,\w+)*$/.test(input)) {
var values = input.split(',');
// Process the values here
}
If you want to allow empty value, change \w+ to \w*.
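Wrapped up as a function, validate-then-split might look like the sketch below. The name parseWordList and the choice to return null for invalid input are assumptions made for illustration.

```javascript
// Hypothetical helper: returns the list of values, or null if the input
// is not a well-formed comma separated list of \w+ items.
function parseWordList(input) {
  return /^\w+(,\w+)*$/.test(input) ? input.split(",") : null;
}

console.log(parseWordList("abc123"));               // ["abc123"]
console.log(parseWordList("abc123,def456,ghi789")); // ["abc123", "def456", "ghi789"]
console.log(parseWordList("abc123,,ghi789"));       // null
```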
Trying to match and validate at the same time with a single regex requires emulating the \G feature, which asserts the position of the last match. Why is \G required? Because it prevents the engine from retrying the match at the next position and bypassing your validation. Remember that ECMAScript regex doesn't have look-behind, so you can't differentiate between the position of an invalid character and the character(s) after it:
something,=bad,orisit,cor&rupt
          ^^             ^^
When you can't differentiate between the 2 positions, you can't rely on the engine to do a match-all operation alone. While it is possible to use a while loop with RegExp.exec and assert the position of last match yourself, why would you do so when there is a cleaner option?
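For illustration only, the manual approach could be sketched roughly as follows: each match must start exactly where the previous one ended, otherwise the engine has skipped over something invalid. The helper name matchValues and the null return value are assumptions, and validate-then-split remains the simpler option.

```javascript
// Hypothetical helper: returns the values, or null if any part is invalid.
function matchValues(input) {
  const re = /(\w+)(,|$)/g;
  const values = [];
  let expectedIndex = 0;     // where the next match must begin
  let endedWithComma = true; // makes empty input count as invalid
  let m;
  while (expectedIndex < input.length && (m = re.exec(input)) !== null) {
    if (m.index !== expectedIndex) {
      return null; // the engine skipped over invalid characters
    }
    values.push(m[1]);
    endedWithComma = m[2] === ",";
    expectedIndex = re.lastIndex;
  }
  // Valid only if everything was consumed and no comma is left dangling.
  const consumedAll = expectedIndex === input.length;
  return consumedAll && !endedWithComma ? values : null;
}

console.log(matchValues("abc123,def456,ghi789"));           // ["abc123", "def456", "ghi789"]
console.log(matchValues("something,=bad,orisit,cor&rupt")); // null
```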
If you want to salvage whatever is available, torazaburo's answer is a viable option.
Try this regex:
'/([^,]+)/'
Alternatively, strings in JavaScript have a split method that can split a string based on a delimiter:
s.split(',')
Split on the comma first, then filter out results that do not match:
str.split(',').filter(function(s) { return /^\w+$/.test(s); })
This regex pattern separates out the numeric values, one per match, from input containing special characters such as ., , and #:
var val = [1234,1213.1212, 1.3, 1.4]
var re = /[0-9]*[0-9]/gi;
var str = "abc123,def456, asda12, 1a2ass, yy8,ghi789";
var re = /[a-z]{3}\d{3}/g;
var list = str.match(re);
document.write("<BR> list.length: " + list.length);
for(var i=0; i < list.length; i++) {
document.write("<BR>list(" + i + "): " + list[i]);
}
This will pick up only items of the form "abc123" (three letters followed by three digits) and nothing else.
Maybe you can use the split function:
var st = "abc123,def456,ghi789";
var res = st.split(',');

How to compare two Strings and get Different part

Now I have two strings:
var str1 = "A10B1C101D11";
var str2 = "A1B22C101D110E1";
What I intend to do is to tell the difference between them, the result will look like
A10B1C101D11
A10 B22 C101 D110E1
It follows the same pattern, one character and a number. And if the character doesn't exist or the number is different between them, I will say they are different, and highlight the different part. Can regular expression do it or any other good solution? thanks in advance!
Let me start by stating that regexp might not be the best tool for this. As the strings have a simple format that you are aware of it will be faster and safer to parse the strings into tokens and then compare the tokens.
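A minimal sketch of that token-based approach is shown below. It assumes every token is one non-digit character followed by digits, as in the question; the helper names tokenize and diffTokens are made up for this example.

```javascript
// Split a string such as "A10B1C101D11" into ["A10", "B1", "C101", "D11"].
function tokenize(s) {
  return s.match(/\D\d*/g) || [];
}

// Return the tokens of `a` that do not appear anywhere in `b`.
function diffTokens(a, b) {
  const inB = new Set(tokenize(b));
  return tokenize(a).filter((token) => !inB.has(token));
}

const str1 = "A10B1C101D11";
const str2 = "A1B22C101D110E1";

console.log(diffTokens(str1, str2)); // ["A10", "B1", "D11"]
console.log(diffTokens(str2, str1)); // ["A1", "B22", "D110", "E1"]
```

Highlighting is then a matter of wrapping the returned tokens however your output format requires.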
However you can do this with Regexp, although in javascript you are hampered by the lack of lookbehind.
The way to do this is to use negative lookahead to prevent matches that are included in the other string. However since javascript does not support lookbehind you might need to go search from both directions.
We do this by concatenating the strings, with a delimiter that we can test for.
If using '|' as a delimiter the regexp becomes:
/(\D\d*)(?=(?:\||\D.*\|))(?!.*\|(.*\d)?\1(\D|$))/g
To find the tokens in the second string that are not present in the first you do:
var bothstring=str2.concat("|",str1);
var re=/(\D\d*)(?=(?:\||\D.*\|))(?!.*\|(.*\d)?\1(\D|$))/g;
var match=re.exec(bothstring);
Subsequent calls to re.exec will return later matches, so you can iterate over them as in the following example:
while (match!=null){
alert("\""+match+"\" At position "+match.index);
match=re.exec(bothstring);
}
As stated this gives tokens in str2 that are different in str1. To get the tokens in str1 that are different use the same code but change the order of str1 and str2 when you concatenate the strings.
The above code might not be safe if dealing with potentially dirty input. In particular it might misbehave if fed a string like "A100|A100": the first A100 will not be considered as having a missing object, because the regexp is not aware that the source is supposed to be two different strings. If this is a potential issue, search for occurrences of the delimiting character first.
You can break the string into an array
var aStr1 = str1.split('');
var aStr2 = str2.split('');
Then check which one has more characters, and save the smaller number
var totalCharacters;
if(aStr1.length > aStr2.length) {
totalCharacters = aStr2.length
} else {
totalCharacters = aStr1.length
}
And loop comparing both
var diff = [];
for(var i = 0; i<totalCharacters; i++) {
if(aStr1[i] != aStr2[i]) {
diff.push(aStr1[i]); // or something else
}
}
At the very end you can concat the remaining characters from the longer string (since they obviously differ from the other one).
Does that help?

How to read all string inside parentheses using regex

I wanted to get all strings inside a parentheses pair. for example, after applying regex on
"fun('xyz'); fun('abcd'); fun('abcd.ef') { temp('no'); "
output should be
['xyz','abcd', 'abcd.ef'].
I tried many options but was not able to get the desired result.
One option is
/fun\((.*?)\)/gi.exec("fun('xyz'); fun('abcd'); fun('abcd.ef')").
Store the regex in a variable, and run it in a loop...
var re = /fun\((.*?)\)/gi,
string = "fun('xyz'); fun('abcd'); fun('abcd.ef')",
matches = [],
match;
while(match = re.exec(string))
matches.push(match[1]);
Note that this only works for global regex. If you omit the g, you'll have an infinite loop.
Also note that it'll give an undesired result if there is a ) between the quotation marks.
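One possible way around that limitation, assuming the arguments are always single-quoted and never contain an escaped quote, is to capture everything up to the closing quote instead of up to the first ). This sketch also assumes String.prototype.matchAll (ES2020) is available.

```javascript
const source = "fun('xyz'); fun('abcd'); fun('abcd.ef') { temp('no'); ";

// [^']* stops at the closing quote, so a ")" inside the quotes is harmless.
const pattern = /fun\('([^']*)'\)/gi;
const args = Array.from(source.matchAll(pattern), (m) => m[1]);
console.log(args);
// ["xyz", "abcd", "abcd.ef"]

// A ")" between the quotation marks no longer cuts the match short.
const tricky = Array.from("fun('a)b')".matchAll(pattern), (m) => m[1]);
console.log(tricky);
// ["a)b"]
```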
This code will almost do the job:
"fun('xyz'); fun('abcd'); fun('abcd.ef')".match(/'.*?'/gi);
You'll get ["'xyz'", "'abcd'", "'abcd.ef'"] which contains extra ' around the string.
The easiest way to find what you need is to use this RegExp: /[\w.]+(?=')/g
var string = "fun('xyz'); fun('abcd'); fun('abcd.ef')";
string.match(/[\w.]+(?=')/g); // ['xyz','abcd', 'abcd.ef']
It works with alphanumeric characters and the dot; to allow more symbols, extend the character class [\w.]+.
