Regex split string into multiple parts - javascript

I have the following string
133. Alarm (Peep peep)
My goal is to split the string using regex into 3 parts and store it as a json object, like
{
"id": "133",
"Title": "Alarm",
"Subtitle": "Peep peep"
}
I can get the number using
function getID(text){
let numberPattern = /\d+/g;
let id = text.match(numberPattern);
if(id){
return id[0];
}
}
and the text between braces using
function getSubtitle(text){
let braces = /\((.*)\)/i;
let subtitle = text.match(braces);
if(subtitle){
return subtitle[1];
}
}
I'm wondering if I can get the three values from the string using a single regex expression (assuming that I will apply it on a long list of that string shape)

You can do this:
const data = '133. Alarm (Peep peep)'
const getInfo = data => {
let [,id, title, subtitle] = data.match(/(\d+)\.\s*(.*?)\s*\((.*?)\)/)
return { id, title, subtitle }
}
console.log(getInfo(data))
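Since the question mentions applying the pattern to a long list, here is a minimal sketch of that, assuming every entry follows the "<number>. <title> (<subtitle>)" shape. The names ENTRY_PATTERN and parseEntry and the sample list are made up for illustration. Entries that do not match produce null instead of throwing, so they can be filtered out.

```javascript
// Hypothetical pattern for one "<id>. <title> (<subtitle>)" entry.
const ENTRY_PATTERN = /^\s*(?<id>\d+)\.\s*(?<title>.*?)\s*\((?<subtitle>.*?)\)\s*$/;

// Hypothetical helper: returns { id, title, subtitle } or null.
function parseEntry(text) {
  const match = ENTRY_PATTERN.exec(text);
  // exec() returns null when the text does not have the expected shape.
  return match ? { ...match.groups } : null;
}

// Sample data invented for this example.
const entries = ['133. Alarm (Peep peep)', '7. Doorbell (Ding dong)', 'not an entry'];

// Parse every entry and drop the ones that did not match.
const parsed = entries.map(parseEntry).filter(Boolean);
console.log(JSON.stringify(parsed, null, 2));
```

Named groups keep the property names next to the part of the pattern that captures them, which may be easier to maintain than positional destructuring.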

Something like
let partsPattern = /(\d+)\.\s*(.*\S)\s*\((.*)\)/
JavaScript regular expressions do not support POSIX character classes such as [:space:], so the pattern uses \S (any non-whitespace character) to make the title end on a non-space. You could also exclude just the space character itself if you know that no other whitespace is expected.
This should capture all the three parts inside the respective submatches (numbers 1, 2 and 3).

You could use one function. exec() returns null if no match is found; otherwise it returns an array containing the matched string followed by the captured groups. The check id && id[1] avoids accessing id[1] when there is no match and id is null.
id[1] is used instead of id[0] because id[0] is the whole matched string, which includes the dot and whitespace that helped find the match.
var str = "133. Alarm (Peep peep)";
function getData(str) {
var id = (/(\d+)\./).exec(str);
var title = (/\s+(.+)\s+\(/).exec(str);
var subtitle = (/\((.+)\)/).exec(str);
return {
"id": id && id[1],
"Title": title && title[1],
"Subtitle": subtitle && subtitle[1]
};
}
console.log(getData(str));

Related

Extract values as placeholder variables in JavaScript

This is the opposite problem to Efficient JavaScript String Replacement. That solution covers the insertion of data into placeholders whilst this question covers the matching of strings and the extraction of data from placeholders.
I have a problem in understanding how to do the following in ES6 JavaScript.
I'm trying to figure out a way to match strings with placeholders and extract the contents of the placeholders as properties of an object. Perhaps an example will help.
Given the pattern:
my name is {name} and I live in {country}
It would match the string:
my name is Mark and I live in England
And provide an object:
{
name: "Mark",
country: "England"
}
The aim is to take a string and check against a number of patterns until I get a match and then have access to the placeholder values.
Can anyone point me in the right direction...
You can use named capture groups for that problem e.g.
const string = "my name is Mark and I live in England";
const regEx = /name is\s(?<name>\w+?)\b.*?live in (?<country>\w+?)\b/i;
const match = regEx.exec(string);
console.log(match?.groups);
I would be surprised if it can be done with a regex.
The way I would think about it is as follows:
Split the template by { or }.
Iterate over the resulting template parts (every other one, starting at index 1).
In each iteration, get the key, its prefix, and its postfix (or the next prefix).
From these, compute the start and end indices needed to extract the value from the string.
const extract = (template, str) => {
const templateParts = template.split(/{|}/);
const extracted = {};
for (let index = 1; index < templateParts.length; index += 2) {
const
possibleKey = templateParts[index],
keyPrefix = templateParts[index - 1],
nextPrefix = templateParts[index + 1];
const substringStartIndex = str.indexOf(keyPrefix) + keyPrefix.length;
// Search for the next literal part only after the current value starts.
const substringEndIndex = nextPrefix ? str.indexOf(nextPrefix, substringStartIndex) : str.length;
extracted[possibleKey] = str.substring(substringStartIndex, substringEndIndex);
}
return extracted;
}
console.log( extract('my name is {name} and I live in {country}', 'my name is Mark and I live in England') );
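For comparison, a template can also be compiled into a regular expression with named capture groups. This is only a sketch under some assumptions: placeholder names are valid JavaScript identifiers, each name appears once per template, and values are matched lazily, so two placeholders with no literal text between them would be ambiguous. The names escapeRegExp, templateToRegExp and matchTemplate are made up for this example.

```javascript
// Escape characters that have a special meaning in a regular expression.
const escapeRegExp = (text) => text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Turn "my name is {name}" into /^my name is (?<name>.+?)$/.
const templateToRegExp = (template) => {
  const source = template
    .split(/\{(\w+)\}/) // odd indices hold the placeholder names
    .map((part, index) => (index % 2 ? `(?<${part}>.+?)` : escapeRegExp(part)))
    .join('');
  return new RegExp(`^${source}$`);
};

// Try each template in turn and return the values of the first one that matches.
const matchTemplate = (templates, str) => {
  for (const template of templates) {
    const match = templateToRegExp(template).exec(str);
    if (match) return { ...match.groups };
  }
  return null; // no template matched
};

// Templates invented for this example, checked in order.
const templates = [
  'I am {age} years old',
  'my name is {name} and I live in {country}',
];
console.log(matchTemplate(templates, 'my name is Mark and I live in England'));
```

Because the whole string is anchored with ^ and $, a template only matches when its literal parts line up exactly, which fits the goal of checking a string against several patterns until one matches.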

Finding multiple groups in one string

Consider the following string, which contains a list of HTML <a> elements separated by commas. How can I get a list of {href, title} objects for the elements between 'start' and 'end'?
not thisstartfoo, barendnot this
The following regex gives only the last <a> element.
/start((?:<a href="(?<href>.*?)" title="(?<title>.*?)">.*?<\/a>(?:, )?)+)end/g
How can I get the whole list?
This should give you what you need.
https://regex101.com/r/isYIeR/1
/(?:start)*(?:<a href=(?<href>.*?)\s+title=(?<title>.*?)>.*?<\/a>)+(?:,|end)/
UPDATE
This does not meet the requirement: when a capture group sits inside a repeated group, the value returned for it is the last one captured.
I do not think this can be done in one regex match. Here is a JavaScript solution that uses two regexes to get a list of {href, title}:
var sample='startfoo, bar,barendstart<img> something end\n' +
'beginfoo, bar,barend\n'+
'startfoo again, bar again,bar2 againend';
var reg = /start((?:\s*<a href=.*?\s+title=.*?>.*?<\/a>,?)+)end/gi;
var regex2 = /href=(?<href>.*?)\s+title=(?<title>.*?)>/gi;
var step1, step2 ;
var hrefList = [];
while( (step1 = reg.exec(sample)) !== null) {
while((step2 = regex2.exec(step1[1])) !== null) {
hrefList.push({href:step2.groups["href"], title:step2.groups["title"]});
}
}
console.log(hrefList);
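The same two-step idea can be written with String.prototype.matchAll, which yields every match together with its named groups. This is a sketch under assumptions: the markup in the sample strings above was lost, so the input below is invented to match the shape described in the question, the attributes are double-quoted and appear in the order href, title, and only the first start...end section is read. The name linksBetween is hypothetical. For real HTML, a DOM parser is usually more robust than a regex.

```javascript
// Hypothetical input, invented to match the shape described in the question.
const html =
  'not this <a href="x" title="X">x</a> ' +
  'start<a href="foo" title="Foo">foo</a>, <a href="bar" title="Bar">bar</a>end' +
  ' not this';

function linksBetween(text) {
  // Step 1: isolate what lies between the first "start" and the next "end".
  const section = /start(.*?)end/s.exec(text);
  if (!section) return [];
  // Step 2: matchAll returns one match per <a> tag, each with its own groups.
  const anchor = /<a href="(?<href>[^"]*)" title="(?<title>[^"]*)">/g;
  return Array.from(section[1].matchAll(anchor), (m) => ({ ...m.groups }));
}

console.log(linksBetween(html));
```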
If the format is constant, i.e. each tag has only href and title, you can use this regex with a lookahead to match every run of characters other than " that is followed by " and then whitespace or > (regex101):
const str = 'startfoo, barend';
const result = str.match(/[^"]+(?="[\s>])/gi);
console.log(result);
This regex:
<.*?>
matches every HTML tag, so replacing each match with an empty string removes the tags.
So for example
<h1>1. This is a title </h1><ul><a href='www.google.com'>2. Click here </a></ul>
After the replacement you will get:
1. This is a title 2. Click here
Not sure if this answers your question though.

Split String with 1 known and some unknown characters in Javascript

I am wanting / needing to split a string by a specific character, for instance a '/' that I can reliably expect, but I need to know what the characters directly in front of that character are up to the space before those characters.
For example:
let myStr = "bob u/ used cars nr/ no resale value i/ information is attached to the vehicle tag bb/ Joe's wrecker service"
So, I can split by the '/' already using
mySplitStr = myStr.split('/');
But now mySplitStr is an array like
mySplitStr[0] = "bob u"
mySplitStr[1] = " used cars nr"
mySplitStr[2] = " no resale value i"
etc
I need, however, to know what the characters are just prior to the '/'.
u
nr
i
etc
so that I know what to do with the information following the '/'.
Any help is greatly appreciated.
You could use this regular expression argument for the split:
let parts = myStr.split(/\s*(\S+)\/\s*/);
Now you will have the special characters at every odd position in the resulting array.
let myStr = "bob u/ used cars nr/ no resale value i/ information is attached to the vehicle tag bb/ Joe's wrecker service";
let parts = myStr.split(/\s*(\S+)\/\s*/);
console.log(parts);
For a more structured result, you could use these special character combinations as keys of an object:
let myStr = "bob u/ used cars nr/ no resale value i/ information is attached to the vehicle tag bb/ Joe's wrecker service";
let obj = myStr.split(/\s*(\S+)\/\s*/).reduceRight( (acc, v) => {
if (acc.default === undefined) {
acc.default = v;
} else {
acc[v] = acc.default;
acc.default = undefined;
}
return acc;
}, {});
console.log(obj);
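If the order of the parts matters, or the same code can appear more than once, an object keyed by code would lose information. A possible alternative, sketched here with a hypothetical helper name splitByCodes, is to pair each captured code with the text that follows it:

```javascript
const myStr = "bob u/ used cars nr/ no resale value i/ information is attached to the vehicle tag bb/ Joe's wrecker service";

// Hypothetical helper: returns the leading text plus ordered [code, text] pairs.
function splitByCodes(input) {
  // The capture group makes split() keep the codes in the result array.
  const [leading, ...rest] = input.split(/\s*(\S+)\/\s*/);
  const pairs = [];
  for (let i = 0; i < rest.length; i += 2) {
    pairs.push([rest[i], rest[i + 1]]); // captured code, then the text after it
  }
  return { leading, pairs };
}

const { leading, pairs } = splitByCodes(myStr);
console.log(leading);
for (const [code, text] of pairs) {
  console.log(code, '->', text);
}
```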
I think, this is what you're looking for:
"bob u/ used cars nr/ no resale value i/ information is attached to the vehicle tag bb/ Joe's wrecker service"
.split('/')
.map(splitPart => {
const wordsInPart = splitPart.split(' ');
return wordsInPart[wordsInPart.length - 1];
});
// Produces: ["u", "nr", "i", "bb", "service"]
Splitting by '/' is not enough. You also need to visit every part of your split result and extract the last "word" from it.
After you split your string, you indeed get an array, where the last set of characters is the one you want to know, and you can grab it with this:
let mySplitStr = myStr.split('/');
for(let i = 0; i < mySplitStr.length; i++) {
let mySplitStrEl = mySplitStr[i].split(" "); // Split current text element
let lastCharsSet = mySplitStrEl[mySplitStrEl.length -1]; // Grab its last set of characters
let myCurrentStr = mySplitStrEl.splice(mySplitStrEl.length -1, 1); // Remove last set of characters from text element
myCurrentStr = mySplitStrEl.join(" "); // Join text element back into a string
switch(lastCharsSet) {
case "u":
// Your code here
break; // without break, execution falls through to the next case
case "nr":
// Your code here
break;
case "i":
// Your code here
break;
}
}
Inside the loop, for the first iteration:
// lastCharsSet is "u"
// myCurrentStr is "bob"

Apply array of string with string.replace

Let's say I have a string like so:
const sentence = "This is my custom string";
I want to highlight the words from an input field inside this sentence.
Let's say a user typed a string and I have converted the separate words into an array like so:
["custom", "string", "is"]
I now want to replace the words in my sentence with a highlighted version of the words in my array. For a single word I would do something like this:
const word = 'custom';
const searchFor = new RegExp(`(${word})`, 'gi');
const replaceWith = '<strong class="highlight">$1</strong>';
const highlightedSentence = sentence.replace(searchFor, replaceWith);
How can I apply this logic with an array to the entire sentence?
I can't simply loop through it, because after the first pass the string will contain my highlight markup, which will itself be processed in the second loop, third loop, etc.
This means that on a second loop, if a user were to type:
"high custom"
I would highlight my highlighted class, leading to highlight inception.
For an example of what I mean try commenting/uncommenting the 2 highlighter functions:
https://jsfiddle.net/qh9ttvp2/1/
Your problem is that while replacing words, you also replace text inside the already added HTML tag with the class 'highlight'.
A solution here could be to replace only text that is not part of an HTML tag. Replace this line in your jsfiddle example:
const searchFor = new RegExp(`(${word})(?!([^<]+)?>)`, 'gi');
You can split your sentence into an array and check whether each element is already highlighted:
let sentence = "This is a some type of long string with all kinds of words in it, all kinds.";
let sentenceArr = sentence.split(' '); // make an array
const query = "kinds words all type";
function highlighter(query, sentence) {
const words = query.match(/\S+/g);
words.forEach((word) => {
// Create a capture group since we are searching case insensitive.
const searchFor = new RegExp(`(${word})`, 'gi');
const replaceWith = '<strong class="highlight">$1</strong>';
sentenceArr = sentenceArr.map(sw => (sw.indexOf('strong class="highlight"') === -1) ? sw.replace(searchFor, replaceWith) : sw); // if already highlighted - skip
//sentence = sentence.replace(searchFor, replaceWith);
});
// console.log(sentence);
document.querySelector('.highlighted-sentence').innerHTML = sentenceArr.join(' '); // notice sentenceArr
}
// Works.
//highlighter('kinds words all type', sentence);
// Doesn't work.
highlighter('kinds words high', sentence);
<div class="highlighted-sentence"></div>
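Another way to avoid re-highlighting the inserted markup is to do all replacements in a single pass, by joining the words into one alternation. The sketch below assumes the sentence is plain text (no existing markup) and that the query words should be matched literally; the helper names escapeRegExp and highlightAll are made up for this example.

```javascript
// Escape characters that have a special meaning in a regular expression.
const escapeRegExp = (text) => text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Hypothetical helper: highlight every query word in one replace() call.
function highlightAll(query, sentence) {
  const words = (query.match(/\S+/g) || []).map(escapeRegExp);
  if (words.length === 0) return sentence;
  // Longer words first, so "kinds" is tried before "kind" at the same position.
  words.sort((a, b) => b.length - a.length);
  const searchFor = new RegExp(`(${words.join('|')})`, 'gi');
  // replace() scans the original text once, so inserted tags are never re-scanned.
  return sentence.replace(searchFor, '<strong class="highlight">$1</strong>');
}

console.log(highlightAll('high custom', 'This is my custom string'));
```

Because the output of one replacement is never fed back into another, a query word such as "high" cannot match inside the class name of an earlier highlight.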

How to extract a string using JavaScript Regex?

I'm trying to extract a substring from a file with JavaScript Regex. Here is a slice from the file :
DATE:20091201T220000
SUMMARY:Dad's birthday
The field I want to extract is "SUMMARY". Here is my approach:
extractSummary : function(iCalContent) {
/*
input : iCal file content
return : Event summary
*/
var arr = iCalContent.match(/^SUMMARY\:(.)*$/g);
return(arr);
}
function extractSummary(iCalContent) {
var rx = /\nSUMMARY:(.*)\n/g;
var arr = rx.exec(iCalContent);
return arr[1];
}
You need these changes:
Put the * inside the parentheses, as suggested above. Otherwise your matching group will contain only one character.
Get rid of the ^ and $. Without the m (multiline) flag they match only at the start and end of the full string, rather than at the start and end of each line. Match on explicit newlines instead.
I suppose you want the matching group (what's inside the parentheses) rather than the full array? arr[0] is the full match ("\nSUMMARY:...") and the following indexes contain the group matches.
String.match(regexp) returns an array of matches, but with the g flag that array holds only the full matches, not the groups (this is what I see in Safari on Mac). RegExp.exec(string) does return the groups.
You need to use the m flag:
multiline; treat beginning and end characters (^ and $) as working
over multiple lines (i.e., match the beginning or end of each line
(delimited by \n or \r), not only the very beginning or end of the
whole input string)
Also put the * in the right place:
"DATE:20091201T220000\r\nSUMMARY:Dad's birthday".match(/^SUMMARY\:(.*)$/gm);
//------------------------------------------------------------------^ ^
//-----------------------------------------------------------------------|
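To get the captured values rather than the full matches when both g and m are set, String.prototype.matchAll can be used instead of match. A minimal sketch, assuming a file with several events; the function name extractSummaries and the second event in the sample are invented for illustration:

```javascript
// Sketch: collect every SUMMARY value in an iCal string.
function extractSummaries(iCalContent) {
  // g: find all matches; m: let ^ and $ match at each line boundary.
  const pattern = /^SUMMARY:(.*)$/gm;
  // Each matchAll result keeps its capture groups; index 1 is the value.
  return Array.from(iCalContent.matchAll(pattern), (match) => match[1]);
}

// Sample content; the second event is made up for this example.
const sample =
  "DATE:20091201T220000\r\nSUMMARY:Dad's birthday\r\n" +
  "DATE:20091224T180000\r\nSUMMARY:Christmas Eve";

console.log(extractSummaries(sample));
```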
Your regular expression most likely wants to be
/\nSUMMARY:(.*)$/g
A helpful little trick I like to use is to default assign on match with an array.
var arr = iCalContent.match(/\nSUMMARY:(.*)$/g) || [""]; //could also use null for empty value
return arr[0];
This way you don't get annoying type errors when you go to use arr
This code works:
let str = "governance[string_i_want]";
let res = str.match(/[^governance\[](.*)[^\]]/g);
console.log(res);
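res will equal ["string_i_want"]. Note that res is an array, so do not treat it like a string.
Be aware that [^governance\[] is a negated character class: it matches any single character that is not one of those letters or [, rather than excluding the word "governance". The pattern works here only because the wanted text starts with a character outside that set and ends with one that is not ].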
You can try it out here: https://www.w3schools.com/jsref/tryit.asp?filename=tryjsref_match_regexp
Good luck.
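A capture group makes the intent explicit and does not depend on which letters the prefix happens to contain. A minimal sketch, assuming the wanted text is the content of the first pair of square brackets (betweenBrackets is a hypothetical name):

```javascript
// Sketch: return the text inside the first [...] pair, or null if there is none.
function betweenBrackets(text) {
  // \[ and \] match literal brackets; [^\]]* captures everything up to the closing one.
  const match = /\[([^\]]*)\]/.exec(text);
  return match ? match[1] : null;
}

console.log(betweenBrackets('governance[string_i_want]'));
```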
(.*) instead of (.)* would be a start. The latter will only capture the last character on the line.
Also, no need to escape the :.
You should use this:
var arr = iCalContent.match(/^SUMMARY:(.*)$/m);
return arr && arr[1];
This is how you can parse iCal files with JavaScript:
function calParse(str) {
function parse() {
var obj = {};
while(str.length) {
var p = str.shift().split(":");
var k = p.shift();
p = p.join(":"); // re-join with ":" so values that contain colons stay intact
switch(k) {
case "BEGIN":
obj[p] = parse();
break;
case "END":
return obj;
default:
obj[k] = p;
}
}
return obj;
}
str = str.replace(/\n /g, " ").split("\n");
return parse().VCALENDAR;
}
example =
'BEGIN:VCALENDAR\n'+
'VERSION:2.0\n'+
'PRODID:-//hacksw/handcal//NONSGML v1.0//EN\n'+
'BEGIN:VEVENT\n'+
'DTSTART:19970714T170000Z\n'+
'DTEND:19970715T035959Z\n'+
'SUMMARY:Bastille Day Party\n'+
'END:VEVENT\n'+
'END:VCALENDAR\n'
cal = calParse(example);
alert(cal.VEVENT.SUMMARY);
