How to split comma, semicolon and comma separated phrases with semicolons around?

How can I split a string on commas and semicolons, while treating a comma-separated phrase that sits between semicolons as a single value, all in one function?
String of words:
var words = "apple;banana;apple, banana;fruit"
A regex split that separates on , and ;
var result = words.split(/[,;]+/);
Result from that function is:
[ "apple", "banana", "apple", " banana", "fruit" ]
But what I am looking to get is this, with "banana, apple" treated as a separate value
[ "apple", "banana, apple", " banana", "fruit" ]
So is it possible to combine 2 cases in that function to output as the second example? Or maybe some other elegant way?

A combination of match and replace can be used.
With match we pick up the two patterns we have:
A pair of values around a semicolon that is followed by a comma: (?:\w+;\w+,)
A single value with an optional trailing semicolon: \w+;?
Then, based on which group matched, replace rewrites each value into the desired format.
let words = `apple;banana;apple, banana;fruit`
let op = words.match(/(?:\w+;\w+,)|\w+;?/g)
.map(e=>e.replace(/(?:(\w+);(\w+),)|(\w+);?/g, (m,g1,g2,g3)=> g1 ? g1+', '+g2 : g3 ))
console.log(op)
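To see what each stage contributes, the sketch below splits the one-liner above into its match step and its replace step. The names matched and result are only illustrative and are not part of the original answer.

```javascript
const words = "apple;banana;apple, banana;fruit";

// Stage 1: match either "value;value," or a single value with an optional ";".
const matched = words.match(/(?:\w+;\w+,)|\w+;?/g);
console.log(matched); // [ 'apple;', 'banana;apple,', 'banana;', 'fruit' ]

// Stage 2: rewrite each match. g1 and g2 are set for the paired form, g3 otherwise.
const result = matched.map(e =>
  e.replace(/(?:(\w+);(\w+),)|(\w+);?/g, (m, g1, g2, g3) => (g1 ? g1 + ", " + g2 : g3))
);
console.log(result); // [ 'apple', 'banana, apple', 'banana', 'fruit' ]
```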

Array math is easier than regex.
Objective
apple;banana;apple, banana;fruit > [ "apple", "banana, apple", " banana", "fruit" ]
Remove spaces, split at ;, split at ,.
Solution
const words = 'apple;banana;apple, banana;fruit';
const result = words.split(' ').join('').split(';').join(',').split(',');
console.log(result); // ["apple", "banana", "apple", "banana", "fruit"]

According to the question's second apple-banana example (after the question was updated), try
words.split(';');
var words = "apple;banana;apple, banana;fruit"
var result = words.split(';');
console.log(result);

Related

split mixed description hashtags and text

I have a string like
var string = "#developers must split #hashtags";
I want to split it when a word starts with # symbol
I tried these two examples
var example1 = string.split(/(?=#)/g);
//result is ["#developers must split ", "#hashtags"]
var example2 = string.split(/(?:^|[ ])#([a-zA-Z]+)/g);
// result is ["", "developers", "must split", "hashtags", ""]
The result must look like this
var description = ["#developers", "must split", "#hashtags"]
I have a solution, but it is a bit long; I want a short one with regex. Thank you.

When you split, captured groups are included in the resulting array. So you can capture the #word delimiter and leave the whitespace before and after it out of the capture with an expression like \s*(#\S+)\s*. Remove empty strings by filtering with a function that tests the truthiness of each string (e.g. x => x).
let result = "#developers must split #hashtags".split(/\s*(#\S+)\s*/g).filter(x => x);
console.log(result);
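As a small illustration of why the filter step is needed, the sketch below prints the array before and after filtering. The variable names raw and description are assumptions chosen for this example.

```javascript
const input = "#developers must split #hashtags";

// The captured group (#\S+) is kept in the output; the surrounding \s* is not.
const raw = input.split(/\s*(#\S+)\s*/);
console.log(raw); // [ '', '#developers', 'must split', '#hashtags', '' ]

// Empty strings appear because the input starts and ends with a delimiter.
const description = raw.filter(Boolean);
console.log(description); // [ '#developers', 'must split', '#hashtags' ]
```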

Separating words with Regex

I am trying to get this result: 'Summer-is-here'. Why does the code below generate extra spaces? (Current result: '-Summer--Is- -Here-').
function spinalCase(str) {
var newA = str.split(/([A-Z][a-z]*)/).join("-");
return newA;
}
spinalCase("SummerIs Here");

You are using a variety of split where the regexp contains a capturing group (inside parentheses), which has a specific meaning, namely to include all the splitting strings in the result. So your result becomes:
["", "Summer", "", "Is", " ", "Here", ""]
Joining that with - gives you the result you see. But you can't just remove the unnecessary capture group from the regexp, because then the split would give you
["", "", " ", ""]
because the capitalized words themselves are now the separators and are dropped, leaving only the text between them. So this doesn't really work.
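The two arrays discussed above can be reproduced with a short sketch. The names withGroup and withoutGroup are illustrative only.

```javascript
const input = "SummerIs Here";

// With a capturing group, the matched words are kept between the split pieces.
const withGroup = input.split(/([A-Z][a-z]*)/);
console.log(withGroup); // [ '', 'Summer', '', 'Is', ' ', 'Here', '' ]
console.log(withGroup.join("-")); // -Summer--Is- -Here-

// Without the group, the words act as separators and are discarded.
const withoutGroup = input.split(/[A-Z][a-z]*/);
console.log(withoutGroup); // [ '', '', ' ', '' ]
```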
If you want to use split, try splitting on zero-width or whitespace-only matches that look ahead to an uppercase letter:
> "SummerIs Here".split(/\s*(?=[A-Z])/)
                            ^^^^^^^^^ LOOK-AHEAD
< ["Summer", "Is", "Here"]
Now you can join that to get the result you want, but without the lowercase mapping, which you could do with:
"SummerIs Here" .
split(/\s*(?=[A-Z])/) .
map(function(elt, i) { return i ? elt.toLowerCase() : elt; }) .
join('-')
which gives you what you want.
Using replace as suggested in another answer is also a perfectly viable solution. In terms of best practices, consider the following code from Ember:
var DECAMELIZE_REGEXP = /([a-z\d])([A-Z])/g;
var DASHERIZE_REGEXP = /[ _]/g;
function decamelize(str) {
return str.replace(DECAMELIZE_REGEXP, '$1_$2').toLowerCase();
}
function dasherize(str) {
return decamelize(str).replace(DASHERIZE_REGEXP, '-');
}
First, decamelize inserts an underscore _ between a lower-case letter (or digit) and the upper-case letter that follows it, then lower-cases the whole string. Then, dasherize replaces underscores and spaces with dashes. This works, except that it also lower-cases the first word in the string. You can more or less combine decamelize and dasherize here with
var SPINALIZE_REGEXP = /([a-z\d])\s*([A-Z])/g;
function spinalCase(str) {
return str.replace(SPINALIZE_REGEXP, '$1-$2').toLowerCase();
}
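A usage sketch follows, repeating the definitions above so that it runs on its own. The input string is the one from the question; nothing here is taken from Ember beyond what the answer already quotes.

```javascript
const DECAMELIZE_REGEXP = /([a-z\d])([A-Z])/g;
const DASHERIZE_REGEXP = /[ _]/g;
const SPINALIZE_REGEXP = /([a-z\d])\s*([A-Z])/g;

function decamelize(str) {
  return str.replace(DECAMELIZE_REGEXP, "$1_$2").toLowerCase();
}

function dasherize(str) {
  return decamelize(str).replace(DASHERIZE_REGEXP, "-");
}

function spinalCase(str) {
  return str.replace(SPINALIZE_REGEXP, "$1-$2").toLowerCase();
}

console.log(decamelize("SummerIs Here")); // summer_is here
console.log(dasherize("SummerIs Here")); // summer-is-here
console.log(spinalCase("SummerIs Here")); // summer-is-here
```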

You want to separate capitalized words, but you are splitting the string on capitalized words; that's why you get those empty strings and spaces.
I think you are looking for this:
var newA = str.match(/[A-Z][a-z]*/g).join("-");

([A-Z][a-z]*) *(?!$|[a-z])
You can simply do a replace with $1-. See demo.
https://regex101.com/r/nL7aZ2/1
var re = /([A-Z][a-z]*) *(?!$|[a-z])/g;
var str = 'SummerIs Here';
var subst = '$1-';
var result = str.replace(re, subst);

var newA = str.split(/ |(?=[A-Z])/).join("-");
You can change the regex like:
/ |(?=[A-Z])/ or /\s*(?=[A-Z])/
Result:
Summer-Is-Here

How do I split a string with multiple separators acting only as one in javascript?

I want to split a string into an array using "a comma followed by a space" (", ") as the separator. By looking through some similar questions I figured out how to make them work as one separator. However, I want them to work ONLY as one. So I do not want the array to be split on commas alone or on spaces alone.
So I'd like the string "txt1, txt2,txt3 txt4, t x t 5" to become the array ["txt1", "txt2,txt3 txt4", "t x t 5"]
Here is my current code which doesn't do this:
var array = string.split(/(?:,| )+/)
Here is a link to the jsFiddle: http://jsfiddle.net/MSQxk/

Just do: var array = string.split(", ");

You can use this
var array = string.split(/,\s*/);
//=> ["txt1", "txt2", "txt3 txt4", "t x t 5"]
This handles strings like
// comma separated
foo,bar
// comma and optional space
foo,bar, hello
If you also want to allow optional whitespace on each side of the comma, you could use this:
"foo,bar, hello , world".split(/\s*,\s*/);
// => ['foo', 'bar', 'hello', 'world']
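For comparison, here is a sketch that runs the literal ", " separator and the two regex separators side by side on the question's input. The variable names are illustrative. Only the literal separator leaves "txt2,txt3 txt4" intact, which is what the question asks for.

```javascript
const string = "txt1, txt2,txt3 txt4, t x t 5";

// Literal separator: splits only where a comma is directly followed by a space.
const literal = string.split(", ");
console.log(literal); // [ 'txt1', 'txt2,txt3 txt4', 't x t 5' ]

// Comma with optional trailing whitespace: also splits on the bare comma.
const commaRegex = string.split(/,\s*/);
console.log(commaRegex); // [ 'txt1', 'txt2', 'txt3 txt4', 't x t 5' ]

// Optional whitespace on both sides of the comma.
const padded = "foo,bar, hello , world".split(/\s*,\s*/);
console.log(padded); // [ 'foo', 'bar', 'hello', 'world' ]
```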

How to Split string with multiple rules in javascript

I have this string for example:
str = "my name is john#doe oh.yeh";
the end result I am seeking is this Array:
strArr = ['my','name','is','john','&#doe','oh','&yeh'];
which means 2 rules apply:
split after each space " " (I know how)
if there are special characters ("." or "#") then also split there, but add the character "&" before the word with the special character.
I know I can use strArr = str.split(" ") for the first rule, but how do I do the other trick?
thanks,
Alon

Assuming the result should be '&doe' and not '&#doe', a simple solution would be to replace every . and # with ' &' and then split on whitespace:
strArr = str.replace(/[.#]/g, ' &').split(/\s+/)
/\s+/ matches consecutive white spaces instead of just one.
If the result should be '&#doe' and '&.yeh', use the same regex and add a capture:
strArr = str.replace(/([.#])/g, ' &$1').split(/\s+/)
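Both variants can be checked with a quick sketch. The names dropped and kept are assumptions made for this example.

```javascript
const str = "my name is john#doe oh.yeh";

// Variant 1: the special character is replaced by " &" and therefore dropped.
const dropped = str.replace(/[.#]/g, " &").split(/\s+/);
console.log(dropped); // [ 'my', 'name', 'is', 'john', '&doe', 'oh', '&yeh' ]

// Variant 2: the capture ($1) keeps the special character after the "&".
const kept = str.replace(/([.#])/g, " &$1").split(/\s+/);
console.log(kept); // [ 'my', 'name', 'is', 'john', '&#doe', 'oh', '&.yeh' ]
```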

You have to use a regular expression to match all special characters at once. By "special", I assume that you mean "not a letter".
var pattern = /([^ a-z]?)[a-z]+/gi; // Pattern
var str = "my name is john#doe oh.yeh"; // Input string
var strArr = [], match; // output array, temporary var
while ((match = pattern.exec(str)) !== null) { // <-- For each match
strArr.push( (match[1]?'&':'') + match[0]); // <-- Add to array
}
// strArr is now:
// strArr = ['my', 'name', 'is', 'john', '&#doe', 'oh', '&.yeh']
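It does not match consecutive special characters; the pattern has to be modified for that. E.g., to capture a whole run of consecutive special characters, use ([^ a-z]*) for the group.
Also, it does not include a trailing special character (one with no letters after it). Using [a-z]* instead of [a-z]+ would match it, but the pattern can then match the empty string, so the loop must also skip empty matches and advance pattern.lastIndex itself to avoid looping forever.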
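On engines that support String.prototype.matchAll (ES2020), the same loop can be written more compactly. This is a sketch of an alternative, not the original answer's code; it uses the same pattern and the same prefixing rule.

```javascript
const str = "my name is john#doe oh.yeh";
const pattern = /([^ a-z]?)[a-z]+/gi;

// m[1] is the optional special character, m[0] is the whole match.
const strArr = [...str.matchAll(pattern)].map(m => (m[1] ? "&" : "") + m[0]);

console.log(strArr); // [ 'my', 'name', 'is', 'john', '&#doe', 'oh', '&.yeh' ]
```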

Use the split() method. That's what you need:
http://www.w3schools.com/jsref/jsref_split.asp
OK, I see you already found it. Then I think:
1) first split on the whitespace
2) iterate through your array and split each member again where you find # or .
3) iterate through your array again and apply str.replace("#", "&#") and str.replace(".", "&.") where those characters occur
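The three steps above can be sketched as follows. The lookahead split in step 2 and the use of flatMap (ES2019) are choices made for this illustration; the answer itself does not specify how to split the members.

```javascript
const str = "my name is john#doe oh.yeh";

const strArr = str
  // 1) split on whitespace
  .split(" ")
  // 2) split each member just before "#" or ".", keeping the character
  .flatMap(word => word.split(/(?=[#.])/))
  // 3) prefix members that start with "#" or "." with "&"
  .map(part => part.replace(/^[#.]/, "&$&"));

console.log(strArr); // [ 'my', 'name', 'is', 'john', '&#doe', 'oh', '&.yeh' ]
```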

I would think a combination of split() and replace() is what you are looking for:
str = "my name is john#doe oh.yeh";
strArr = str.replace(/[^\w\s]/g, ' &'); // regex literal with the g flag; the string '\W' would only match a literal "W"
strArr = strArr.split(' ');
That should be close to what you asked for.

This works:
array = string.replace(/#|\./g, ' &$&').split(' ');
Take a look at demo here: http://jsfiddle.net/M6fQ7/1/

Tokenizing strings using regular expression in Javascript

Suppose I've a long string containing newlines and tabs as:
var x = "This is a long string.\n\t This is another one on next line.";
So how can we split this string into tokens, using regular expression?
I don't want to use .split(' ') because I want to learn Javascript's Regex.
A more complicated string could be this:
var y = "This #is a #long $string. Alright, lets split this.";
Now I want to extract only the valid words out of this string, without special characters, and punctuation, i.e I want these:
var xwords = ["This", "is", "a", "long", "string", "This", "is", "another", "one", "on", "next", "line"];
var ywords = ["This", "is", "a", "long", "string", "Alright", "lets", "split", "this"];

Here is a jsfiddle example of what you asked: http://jsfiddle.net/ayezutov/BjXw5/1/
Basically, the code is very simple:
var y = "This #is a #long $string. Alright, lets split this.";
var regex = /[^\s]+/g; // one or more non-whitespace characters; the g flag finds every match in the string, not just the first
var match = y.match(regex);
for (var i = 0; i<match.length; i++)
{
document.write(match[i]);
document.write('<br>');
}
UPDATE:
Basically you can expand the list of separator characters: http://jsfiddle.net/ayezutov/BjXw5/2/
var regex = /[^\s\.,!?]+/g;
UPDATE 2:
Word characters only (letters, digits and underscore):
http://jsfiddle.net/ayezutov/BjXw5/3/
var regex = /\w+/g;
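Applied to the two strings from the question, the last pattern gives the word lists that were asked for. This is a minimal check, with xwords and ywords named after the question's variables.

```javascript
const x = "This is a long string.\n\t This is another one on next line.";
const y = "This #is a #long $string. Alright, lets split this.";

// match with the g flag returns every run of word characters.
const xwords = x.match(/\w+/g);
const ywords = y.match(/\w+/g);

console.log(xwords);
console.log(ywords);
```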

Use \s+ to tokenize the string.

exec can loop through the matches to remove non-word (\W) characters.
var A= [], str= "This #is a #long $string. Alright, let's split this.",
rx=/\W*([a-zA-Z][a-zA-Z']*)(\W+|$)/g, words;
while((words= rx.exec(str))!= null){
A.push(words[1]);
}
A.join(', ')
/* returned value: (String)
This, is, a, long, string, Alright, let's, split, this
*/

var words = y.split(/[^A-Za-z0-9]+/);
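One thing to watch with this split: when the string ends (or starts) with a separator, the result contains an empty string. The sketch below shows this and filters it out; the filter step is an addition for illustration, not part of the answer.

```javascript
const y = "This #is a #long $string. Alright, lets split this.";

const rawWords = y.split(/[^A-Za-z0-9]+/);
console.log(rawWords[rawWords.length - 1] === ""); // true, because of the final "."

// Drop empty entries left by leading or trailing separators.
const words = rawWords.filter(Boolean);
console.log(words);
```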

Here is a solution using regex groups to tokenise the text using different types of tokens.
You can test the code here https://jsfiddle.net/u3mvca6q/5/
/*
Basic Regex explanation:
/ Regex start
(\w+) First group, words: \w is an ASCII letter, digit or underscore, and + means one or more of them
| or
(,|!) Second group, punctuation
| or
(\s) Third group, white spaces
/ Regex end
g "global", enables looping over the string to capture one element at a time
Regex result:
result[0] : default group : any match
result[1] : group1 : words
result[2] : group2 : punctuation , !
result[3] : group3 : whitespace
*/
var basicRegex = /(\w+)|(,|!)|(\s)/g;
/*
Advanced Regex explanation:
[a-zA-Z\u0080-\u00FF] instead of \w Supports some Unicode letters instead of ASCII letters only. Find Unicode ranges here https://apps.timwhitlock.info/js/regex
(\.\.\.|\.|,|!|\?) Identify ellipsis (...) and points as separate entities
You can improve it by adding ranges for special punctuation and so on
*/
var advancedRegex = /([a-zA-Z\u0080-\u00FF]+)|(\.\.\.|\.|,|!|\?)|(\s)/g;
var basicString = "Hello, this is a random message!";
var advancedString = "Et en français ? Avec des caractères spéciaux ... With one point at the end.";
console.log("------------------");
var result = null;
do {
result = basicRegex.exec(basicString)
console.log(result);
} while(result != null)
console.log("------------------");
var result = null;
do {
result = advancedRegex.exec(advancedString)
console.log(result);
} while(result != null)
/*
Output:
Array [ "Hello", "Hello", undefined, undefined ]
Array [ ",", undefined, ",", undefined ]
Array [ " ", undefined, undefined, " " ]
Array [ "this", "this", undefined, undefined ]
Array [ " ", undefined, undefined, " " ]
Array [ "is", "is", undefined, undefined ]
Array [ " ", undefined, undefined, " " ]
Array [ "a", "a", undefined, undefined ]
Array [ " ", undefined, undefined, " " ]
Array [ "random", "random", undefined, undefined ]
Array [ " ", undefined, undefined, " " ]
Array [ "message", "message", undefined, undefined ]
Array [ "!", undefined, "!", undefined ]
null
*/
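A variation on the same idea, assuming an engine with named capture groups (ES2018) and matchAll (ES2020): naming the groups lets each token carry its type instead of relying on the group index. The tokenize function and the group names are hypothetical, introduced only for this sketch.

```javascript
const tokenRegex = /(?<word>\w+)|(?<punctuation>[,!])|(?<whitespace>\s)/g;

function tokenize(text) {
  const tokens = [];
  for (const m of text.matchAll(tokenRegex)) {
    // Exactly one named group is defined per match; its name is the token type.
    const [type, value] = Object.entries(m.groups).find(([, v]) => v !== undefined);
    tokens.push({ type, value });
  }
  return tokens;
}

const tokens = tokenize("Hello, this!");
console.log(tokens.map(t => t.type + ":" + JSON.stringify(t.value)).join(" "));
// word:"Hello" punctuation:"," whitespace:" " word:"this" punctuation:"!"
```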

To extract only word characters, we use the \w symbol. Whether it also matches non-ASCII Unicode letters depends on the language or library; in JavaScript, \w is limited to [A-Za-z0-9_], so accented letters are not matched.
Please see Alexander Yezutov's answer (update 2) on how to apply this into an expression.
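To illustrate the difference in JavaScript, the sketch below compares \w with the Unicode property escape \p{L}, which needs the u flag and an ES2018-capable engine. Using \p{L} is a suggestion of this example rather than something the answers above propose.

```javascript
const text = "Et en fran\u00e7ais"; // "Et en français"

// \w stops at the accented letter, so the word is cut in two.
const asciiWords = text.match(/\w+/g);
console.log(asciiWords); // [ 'Et', 'en', 'fran', 'ais' ]

// \p{L} matches any Unicode letter when the u flag is set.
const unicodeWords = text.match(/\p{L}+/gu);
console.log(unicodeWords); // [ 'Et', 'en', 'français' ]
```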
