How can I split a string on both commas and semicolons in one function, but treat a comma-separated phrase that sits between semicolons as a single value?
String of words:
var words = "apple;banana;apple, banana;fruit"
Regex that splits on , and ;
var result = words.split(/[,;]+/);
Result from that function is:
[ "apple", "banana", "apple", " banana", "fruit" ]
But what I want is this, with "banana, apple" treated as a single value:
[ "apple", "banana, apple", " banana", "fruit" ]
So is it possible to combine 2 cases in that function to output as the second example? Or maybe some other elegant way?
A combination of match and replace can be used.
With match we pick up the two patterns we have:
Values that span a delimiter: (?:\w+;\w+,)
Values that do not span a delimiter: \w+;?
Then, based on which group matched, we rewrite each value into the desired format using replace.
let words = `apple;banana;apple, banana;fruit`
let op = words.match(/(?:\w+;\w+,)|\w+;?/g)
.map(e=>e.replace(/(?:(\w+);(\w+),)|(\w+);?/g, (m,g1,g2,g3)=> g1 ? g1+', '+g2 : g3 ))
console.log(op)
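To see what each step contributes, it can help to log the intermediate array that match returns before replace is applied. This is a small sketch using the same input and patterns as above; the names `pieces` and `formatted` are only illustrative.

```javascript
const input = "apple;banana;apple, banana;fruit";

// Step 1: match either "word;word," (a pair ending in a comma)
// or a single word with an optional trailing ";".
const pieces = input.match(/(?:\w+;\w+,)|\w+;?/g);
console.log(pieces); // [ "apple;", "banana;apple,", "banana;", "fruit" ]

// Step 2: rewrite each piece. g1 and g2 are set for a pair, g3 for a single word.
const formatted = pieces.map(piece =>
  piece.replace(/(?:(\w+);(\w+),)|(\w+);?/g, (m, g1, g2, g3) => (g1 ? g1 + ", " + g2 : g3))
);
console.log(formatted); // [ "apple", "banana, apple", "banana", "fruit" ]
```

Note that the space before the last banana is dropped, because \w+ never matches whitespace.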
Array math is easier than regex.
Objective
apple;banana;apple, banana;fruit > [ "apple", "banana, apple", " banana", "fruit" ]
Remove spaces, split at ;, split at ,.
Solution
const words = 'apple;banana;apple, banana;fruit';
const result = words.split(' ').join('').split(';').join(',').split(',');
console.log(result); // [ "apple", "banana", "apple", "banana", "fruit" ]
According to the updated question (the second apple-banana example), try
words.split(';');
var words = "apple;banana;apple, banana;fruit"
var result = words.split(';');
console.log(result);
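If the values may carry stray spaces around the semicolons (an assumption about the input, not something the question states), one option is to split on a semicolon with optional surrounding whitespace. The sample string below is made up for illustration.

```javascript
// Hypothetical input with inconsistent spacing around the semicolons.
const fruitList = "apple; banana ;apple, banana;fruit";

// \s*;\s* consumes the semicolon together with any whitespace next to it,
// while commas inside a value are left untouched.
const values = fruitList.split(/\s*;\s*/);
console.log(values); // [ "apple", "banana", "apple, banana", "fruit" ]
```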
Related
I have a string like
var string = "#developers must split #hashtags";
I want to split it wherever a word starts with the # symbol.
I tried these two examples:
var example1 = string.split(/(?=#)/g);
//result is ["#developers must split ", "#hashtags"]
var example2 = string.split(/(?:^|[ ])#([a-zA-Z]+)/g);
// result is ["", "developers", "must split", "hashtags", ""]
The result must look like this:
var description = ["#developers", "must split", "#hashtags"]
JSFiddle example
I have a solution, but it is a bit long; I would like a shorter one using regex. Thank you.
When you split, the captured groups are included in the resulting array. So you can capture the #word delimiter and leave out the whitespace before and after it with an expression like \s*(#\S+)\s*. Remove the empty strings by filtering with a function that tests the truthiness of each string (e.g. x => x).
let result = "#developers must split #hashtags".split(/\s*(#\S+)\s*/g).filter(x => x);
console.log(result);
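The effect of the capture group is easier to see on a tiny input first. The sketch below contrasts a split with and without a capture group, then shows the unfiltered result for the hashtag string; the variable names are illustrative.

```javascript
// Without a capture group, the delimiters are dropped from the result.
const plain = "a1b2c".split(/\d/);
console.log(plain); // [ "a", "b", "c" ]

// With a capture group, each matched delimiter is kept as its own element.
const kept = "a1b2c".split(/(\d)/);
console.log(kept); // [ "a", "1", "b", "2", "c" ]

// The hashtag case: the captured #word is kept, the surrounding whitespace is not.
const raw = "#developers must split #hashtags".split(/\s*(#\S+)\s*/);
console.log(raw); // [ "", "#developers", "must split", "#hashtags", "" ]

// Empty strings at the edges are falsy, so filter removes them.
const tags = raw.filter(x => x);
console.log(tags); // [ "#developers", "must split", "#hashtags" ]
```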
I am trying to get this result: 'Summer-is-here'. Why does the code below generate extra spaces? (Current result: '-Summer--Is- -Here-').
function spinalCase(str) {
var newA = str.split(/([A-Z][a-z]*)/).join("-");
return newA;
}
spinalCase("SummerIs Here");
You are using a variety of split where the regexp contains a capturing group (inside parentheses), which has a specific meaning, namely to include all the splitting strings in the result. So your result becomes:
["", "Summer", "", "Is", " ", "Here", ""]
Joining that with - gives you the result you see. But you can't just remove the unnecessary capture group from the regexp, because then the split would give you
["", "", " ", ""]
because the capitalized words themselves are now the delimiters, so only the text between them (mostly empty strings) is left. So this doesn't really work.
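Both results can be reproduced directly, which shows where the extra dashes in the question come from. A short sketch with illustrative variable names:

```javascript
const phrase = "SummerIs Here";

// With the capture group, the matched words are kept between the leftovers.
const withGroup = phrase.split(/([A-Z][a-z]*)/);
console.log(withGroup);           // [ "", "Summer", "", "Is", " ", "Here", "" ]
console.log(withGroup.join("-")); // "-Summer--Is- -Here-"

// Without the group, only the leftovers between the words remain.
const withoutGroup = phrase.split(/[A-Z][a-z]*/);
console.log(withoutGroup);        // [ "", "", " ", "" ]
```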
If you want to use split, try splitting on zero-width or whitespace-only matches that are followed by an uppercase letter, using a look-ahead:
> "SummerIs Here".split(/\s*(?=[A-Z])/)
                            ^^^^^^^^^ LOOK-AHEAD
< ["Summer", "Is", "Here"]
Now you can join that to get the result you want, but without the lowercase mapping, which you could do with:
"SummerIs Here"
  .split(/\s*(?=[A-Z])/)
  .map(function(elt, i) { return i ? elt.toLowerCase() : elt; })
  .join('-')
which gives you what you want.
Using replace as suggested in another answer is also a perfectly viable solution. In terms of best practices, consider the following code from Ember:
var DECAMELIZE_REGEXP = /([a-z\d])([A-Z])/g;
var DASHERIZE_REGEXP = /[ _]/g;
function decamelize(str) {
return str.replace(DECAMELIZE_REGEXP, '$1_$2').toLowerCase();
}
function dasherize(str) {
return decamelize(str).replace(DASHERIZE_REGEXP, '-');
}
First, decamelize inserts an underscore _ between a lower-case letter (or digit) and the upper-case letter that follows it. Then, dasherize replaces each underscore or space with a dash. This works perfectly except that it lower-cases the first word in the string. You can sort of combine decamelize and dasherize here with
var SPINALIZE_REGEXP = /([a-z\d])\s*([A-Z])/g;
function spinalCase(str) {
return str.replace(SPINALIZE_REGEXP, '$1-$2').toLowerCase();
}
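A quick check of how these helpers behave on the question's input, with the definitions repeated so the snippet runs on its own. The `capitalizeFirst` helper is a hypothetical addition, shown only as one way to restore the leading capital that the question asks for.

```javascript
const DECAMELIZE_REGEXP = /([a-z\d])([A-Z])/g;
const DASHERIZE_REGEXP = /[ _]/g;
const SPINALIZE_REGEXP = /([a-z\d])\s*([A-Z])/g;

function decamelize(str) {
  return str.replace(DECAMELIZE_REGEXP, '$1_$2').toLowerCase();
}

function dasherize(str) {
  return decamelize(str).replace(DASHERIZE_REGEXP, '-');
}

function spinalCase(str) {
  return str.replace(SPINALIZE_REGEXP, '$1-$2').toLowerCase();
}

// Hypothetical helper (not part of Ember): upper-case only the first character.
function capitalizeFirst(str) {
  return str.charAt(0).toUpperCase() + str.slice(1);
}

console.log(dasherize("SummerIs Here"));                   // "summer-is-here"
console.log(spinalCase("SummerIs Here"));                  // "summer-is-here"
console.log(capitalizeFirst(spinalCase("SummerIs Here"))); // "Summer-is-here"
```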
You want to separate capitalized words, but you are splitting the string on the capitalized words themselves; that is why you get those empty strings and spaces.
I think you are looking for this:
var newA = str.match(/[A-Z][a-z]*/g).join("-");
([A-Z][a-z]*) *(?!$|[a-z])
You can simply replace each match with $1-. See the demo:
https://regex101.com/r/nL7aZ2/1
var re = /([A-Z][a-z]*) *(?!$|[a-z])/g;
var str = 'SummerIs Here';
var subst = '$1-';
var result = str.replace(re, subst);
var newA = str.split(/ |(?=[A-Z])/).join("-");
You can change the regex like:
/ |(?=[A-Z])/ or /\s*(?=[A-Z])/
Result:
Summer-Is-Here
I want to split a string into an array using a comma followed by a space (", ") as the separator. By looking through some similar questions I figured out how to make them work as one separator. However, I want them to work ONLY together, so I do not want the array to be split on a comma alone or on a space alone.
So I'd like the string "txt1, txt2,txt3 txt4, t x t 5" to become the array "txt1", "txt2,txt3 txt4", "t x t 5"
Here is my current code which doesn't do this:
var array = string.split(/(?:,| )+/)
Here is a link to the jsFiddle: http://jsfiddle.net/MSQxk/
Just do: var array = string.split(", ");
You can use this
var array = string.split(/,\s*/);
//=> ["txt1", "txt2", "txt3 txt4", "t x t 5"]
This will compensate for strings like
// comma separated
foo,bar
// comma and optional space
foo,bar, hello
If you wanted to compensate for optional whitespace on each side of the comma, you could use this:
"foo,bar, hello , world".split(/\s*,\s*/);
// => ['foo', 'bar', 'hello', 'world']
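For comparison, here is how the literal ", " separator and the two regex variants behave on the string from the question. This is a sketch with illustrative variable names.

```javascript
const text = "txt1, txt2,txt3 txt4, t x t 5";

// Literal separator: splits only where a comma is followed by exactly one space.
const exact = text.split(", ");
console.log(exact); // [ "txt1", "txt2,txt3 txt4", "t x t 5" ]

// Comma with optional whitespace: also splits on the bare comma.
const loose = text.split(/,\s*/);
console.log(loose); // [ "txt1", "txt2", "txt3 txt4", "t x t 5" ]

// Optional whitespace on both sides of the comma.
const padded = "foo,bar, hello , world".split(/\s*,\s*/);
console.log(padded); // [ "foo", "bar", "hello", "world" ]
```

Only the literal ", " separator keeps "txt2,txt3 txt4" together, which is what the question asks for.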
I have this string for example:
str = "my name is john#doe oh.yeh";
the end result I am seeking is this Array:
strArr = ['my','name','is','john','&#doe','oh','&yeh'];
which means 2 rules apply:
split after each space " " (I know how)
if there are special characters ("." or "#"), then also split there, but add the character "&" before the word with the special character.
I know I can use strArr = str.split(" ") for the first rule, but how do I do the other trick?
thanks,
Alon
Assuming the result should be '&doe' and not '&#doe', a simple solution would be to replace every . and # with ' &' and then split on whitespace:
strArr = str.replace(/[.#]/g, ' &').split(/\s+/)
/\s+/ matches consecutive white spaces instead of just one.
If the result should be '&#doe' and '&.yeh', use the same regex and add a capture group:
strArr = str.replace(/([.#])/g, ' &$1').split(/\s+/)
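Running both variants on the sample string side by side makes the difference visible. The variable names in this sketch are illustrative.

```javascript
const sample = "my name is john#doe oh.yeh";

// Variant 1: the special character is replaced by " &".
const dropped = sample.replace(/[.#]/g, ' &').split(/\s+/);
console.log(dropped); // [ "my", "name", "is", "john", "&doe", "oh", "&yeh" ]

// Variant 2: $1 re-inserts the captured special character after the "&".
const keptChars = sample.replace(/([.#])/g, ' &$1').split(/\s+/);
console.log(keptChars); // [ "my", "name", "is", "john", "&#doe", "oh", "&.yeh" ]
```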
You have to use a regular expression to match all special characters at once. By "special", I assume that you mean "non-letters".
var pattern = /([^ a-z]?)[a-z]+/gi; // Pattern
var str = "my name is john#doe oh.yeh"; // Input string
var strArr = [], match; // output array, temporary var
while ((match = pattern.exec(str)) !== null) { // <-- For each match
strArr.push( (match[1]?'&':'') + match[0]); // <-- Add to array
}
// strArr is now:
// strArr = ['my', 'name', 'is', 'john', '&#doe', 'oh', '&.yeh']
It does not match consecutive special characters; the pattern has to be modified for that. E.g., if you want to include all consecutive characters, use ([^ a-z]+?).
Also, it does not include a trailing special character. If you want to include that one as well, use [a-z]* and remove !== null.
Use the split() method. That's what you need:
http://www.w3schools.com/jsref/jsref_split.asp
OK, I see you already found it. Then I think:
1) first use split on the whitespace
2) iterate through your array and split each member again wherever you find # or .
3) iterate through your array again and apply str.replace("#", "&#") and str.replace(".", "&.") wherever you find one
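The three steps above can be sketched as one chain. This is one possible reading of the steps, not the answerer's own code; it uses a look-ahead in the second split so that the # or . stays attached to the word that follows it, and the names are illustrative.

```javascript
const sentence = "my name is john#doe oh.yeh";

const parts = sentence
  .split(/\s+/)                                // 1) split on whitespace
  .flatMap(word => word.split(/(?=[#.])/))     // 2) split again just before each # or .
  .map(part => part.replace(/^[#.]/, "&$&"));  // 3) prefix a leading # or . with &

console.log(parts); // [ "my", "name", "is", "john", "&#doe", "oh", "&.yeh" ]
```

In the replacement string, $& stands for the matched character, so "&$&" turns "#" into "&#".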
I would think a combination of split() and replace() is what you are looking for:
str = "my name is john#doe oh.yeh";
strArr = str.replace(/[#.]/g, ' &'); // a regex literal is needed; the string '\W' would only match a literal "W"
strArr = strArr.split(' ');
That should be close to what you asked for.
This works:
array = string.replace(/#|\./g, ' &$&').split(' ');
Take a look at demo here: http://jsfiddle.net/M6fQ7/1/
Suppose I've a long string containing newlines and tabs as:
var x = "This is a long string.\n\t This is another one on next line.";
So how can we split this string into tokens, using regular expression?
I don't want to use .split(' ') because I want to learn JavaScript's regular expressions.
A more complicated string could be this:
var y = "This #is a #long $string. Alright, lets split this.";
Now I want to extract only the valid words from this string, without special characters and punctuation, i.e. I want these:
var xwords = ["This", "is", "a", "long", "string", "This", "is", "another", "one", "on", "next", "line"];
var ywords = ["This", "is", "a", "long", "string", "Alright", "lets", "split", "this"];
Here is a jsfiddle example of what you asked: http://jsfiddle.net/ayezutov/BjXw5/1/
Basically, the code is very simple:
var y = "This #is a #long $string. Alright, lets split this.";
var regex = /[^\s]+/g; // One or more non-whitespace characters; the g flag finds every match, not just the first
var match = y.match(regex);
for (var i = 0; i<match.length; i++)
{
document.write(match[i]);
document.write('<br>');
}
UPDATE:
Basically you can expand the list of separator characters: http://jsfiddle.net/ayezutov/BjXw5/2/
var regex = /[^\s\.,!?]+/g;
UPDATE 2:
Word characters only (letters, digits and underscore):
http://jsfiddle.net/ayezutov/BjXw5/3/
var regex = /\w+/g;
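Applied to the two strings from the question, the \w+ pattern with match yields the requested word lists. A short sketch, using the question's variable names:

```javascript
const x = "This is a long string.\n\t This is another one on next line.";
const y = "This #is a #long $string. Alright, lets split this.";

// match with the g flag returns every run of word characters as an array.
const xwords = x.match(/\w+/g);
const ywords = y.match(/\w+/g);

console.log(xwords);
// [ "This", "is", "a", "long", "string", "This", "is", "another", "one", "on", "next", "line" ]
console.log(ywords);
// [ "This", "is", "a", "long", "string", "Alright", "lets", "split", "this" ]
```

Keep in mind that match returns null rather than an empty array when nothing matches.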
Use \s+ to tokenize the string.
exec can loop through the matches to remove non-word (\W) characters.
var A= [], str= "This #is a #long $string. Alright, let's split this.",
rx=/\W*([a-zA-Z][a-zA-Z']*)(\W+|$)/g, words;
while((words= rx.exec(str))!= null){
A.push(words[1]);
}
A.join(', ')
/* returned value: (String)
This, is, a, long, string, Alright, let's, split, this
*/
var words = y.split(/[^A-Za-z0-9]+/);
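One thing to watch with the split approach: when the string ends (or starts) with a separator, the result contains an empty string at that edge. A sketch of the effect and of one way to drop it, with illustrative names:

```javascript
const y = "This #is a #long $string. Alright, lets split this.";

// The final "." is a separator, so an empty string follows it.
const rawWords = y.split(/[^A-Za-z0-9]+/);
console.log(rawWords);
// [ "This", "is", "a", "long", "string", "Alright", "lets", "split", "this", "" ]

// filter(Boolean) removes empty strings because they are falsy.
const cleanWords = rawWords.filter(Boolean);
console.log(cleanWords);
// [ "This", "is", "a", "long", "string", "Alright", "lets", "split", "this" ]
```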
Here is a solution using regex groups to tokenise the text using different types of tokens.
You can test the code here https://jsfiddle.net/u3mvca6q/5/
/*
Basic Regex explanation:
/ Regex start
(\w+) First group, words: \w means an ASCII word character (letter, digit or underscore), + means one or more
| or
(,|!) Second group, punctuation
| or
(\s) Third group, white spaces
/ Regex end
g "global", enables looping over the string to capture one element at a time
Regex result:
result[0] : default group : any match
result[1] : group1 : words
result[2] : group2 : punctuation , !
result[3] : group3 : whitespace
*/
var basicRegex = /(\w+)|(,|!)|(\s)/g;
/*
Advanced Regex explanation:
[a-zA-Z\u0080-\u00FF] instead of \w Supports some Unicode letters instead of ASCII letters only. Find Unicode ranges here https://apps.timwhitlock.info/js/regex
(\.\.\.|\.|,|!|\?) Identify ellipsis (...) and points as separate entities
You can improve it by adding ranges for special punctuation and so on
*/
var advancedRegex = /([a-zA-Z\u0080-\u00FF]+)|(\.\.\.|\.|,|!|\?)|(\s)/g;
var basicString = "Hello, this is a random message!";
var advancedString = "Et en français ? Avec des caractères spéciaux ... With one point at the end.";
console.log("------------------");
var result = null;
do {
result = basicRegex.exec(basicString)
console.log(result);
} while(result != null)
console.log("------------------");
var result = null;
do {
result = advancedRegex.exec(advancedString)
console.log(result);
} while(result != null)
/*
Output:
Array [ "Hello", "Hello", undefined, undefined ]
Array [ ",", undefined, ",", undefined ]
Array [ " ", undefined, undefined, " " ]
Array [ "this", "this", undefined, undefined ]
Array [ " ", undefined, undefined, " " ]
Array [ "is", "is", undefined, undefined ]
Array [ " ", undefined, undefined, " " ]
Array [ "a", "a", undefined, undefined ]
Array [ " ", undefined, undefined, " " ]
Array [ "random", "random", undefined, undefined ]
Array [ " ", undefined, undefined, " " ]
Array [ "message", "message", undefined, undefined ]
Array [ "!", undefined, "!", undefined ]
null
*/
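The same idea can be written more compactly with named capture groups and matchAll, which avoids the manual exec loop. This is an alternative sketch, not the code from the fiddle; the group names and the `tokenize` function are made up for illustration.

```javascript
// Each alternative gets a name, so the token type can be read from match.groups.
const tokenPattern = /(?<word>\w+)|(?<punctuation>[,!])|(?<whitespace>\s)/g;

function tokenize(text) {
  // matchAll yields one match object per token; exactly one named group is defined in each.
  return Array.from(text.matchAll(tokenPattern), match => {
    const [type, value] = Object.entries(match.groups).find(([, v]) => v !== undefined);
    return { type, value };
  });
}

const tokens = tokenize("Hello, this!");
console.log(tokens);
// [ { type: "word", value: "Hello" },
//   { type: "punctuation", value: "," },
//   { type: "whitespace", value: " " },
//   { type: "word", value: "this" },
//   { type: "punctuation", value: "!" } ]
```

Characters that match none of the alternatives are simply skipped, as in the exec loop.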
To extract word characters only, we use the \w symbol. Whether this matches Unicode characters is implementation-dependent, and you can use this reference to see what the case is for your language/library.
Please see Alexander Yezutov's answer (update 2) on how to apply this into an expression.
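In JavaScript specifically, \w covers only ASCII letters, digits and the underscore, so accented words get cut apart. Assuming an engine that supports Unicode property escapes (the u flag with \p{...}), a sketch of a Unicode-aware alternative looks like this; the sample sentence and variable names are illustrative.

```javascript
const french = "Et en français ? Avec des caractères spéciaux";

// \w stops at accented letters, so some words are split into fragments.
const asciiWords = french.match(/\w+/g);
console.log(asciiWords.length); // more entries than there are words

// \p{L} = any letter, \p{M} = combining marks, \p{N} = any number (requires the u flag).
const unicodeWords = french.match(/[\p{L}\p{M}\p{N}_]+/gu);
console.log(unicodeWords);
// [ "Et", "en", "français", "Avec", "des", "caractères", "spéciaux" ]
```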