Regex to get parts of a string - javascript

All, I am struggling to get sections of a string with regex without using split or any other similar function. Here is my scenario:
I have this text U:BCCNT.3;GO which I want to split into its different sections, divided by the symbols in between. I have managed to get the first one with this regex: /(.+):/.exec(value). This gives me the first word up to the colon (:). These are the different variations of the value:
Second section: BCCNT
The value might also be BCCNT.3;GO (without the U:), so the string may not contain a colon at all. For the second section, the logic would be: any text between : and ., or any text ending with . with nothing in front of it.
Third section: .3 -> any text starting with a . and ending with nothing, or any text starting with a . and ending with a ; (semicolon).
Fourth section: ;GO -> any text starting with a ; and ending with nothing.
EDIT
And preferably in separate variables, like:
const sectionOne = regex.exec(value);
const sectionTwo = regex.exec(value);
const sectionThree = regex.exec(value);
const sectionFour = regex.exec(value);
and whichever value doesn't match the pattern, that variable would just be undefined, null or an empty string.

Here is a regex approach using 4 separate optional capture groups for each possible component:
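var input = "U:BCCNT.3;GO";
// Groups: 1 = "U:", 2 = "BCCNT", 3 = ".3", 4 = ";GO"; each one is optional.
// No g flag: with g, a reused regex would start the next exec() at lastIndex.
var re = /^([^:]+:)?([^.]+)?(\.[^;]+)?(;.*)?$/;
var m = re.exec(input);
if (m) {
  console.log(m[1], m[2], m[3], m[4]);
}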
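Since the question's EDIT asks for the sections in separate variables, here is a minimal sketch that destructures the exec() result of the same pattern (without the g flag). It assumes that a missing section should simply leave its variable undefined; getSections is a hypothetical helper name, and note that this pattern keeps the delimiters (U:, .3, ;GO) inside the captured values.

```javascript
// Same four optional groups as the answer above: "U:", "BCCNT", ".3", ";GO".
const re = /^([^:]+:)?([^.]+)?(\.[^;]+)?(;.*)?$/;

// Hypothetical helper: returns each section, or undefined when it is absent.
function getSections(value) {
  // exec() returns null on no match; fall back to an empty array so every
  // destructured variable is simply undefined in that case.
  const [, sectionOne, sectionTwo, sectionThree, sectionFour] = re.exec(value) || [];
  return { sectionOne, sectionTwo, sectionThree, sectionFour };
}

// sectionOne "U:", sectionTwo "BCCNT", sectionThree ".3", sectionFour ";GO"
console.log(getSections("U:BCCNT.3;GO"));
// sectionOne undefined, the other three as above
console.log(getSections("BCCNT.3;GO"));
```

Because every group is optional and the pattern is anchored with ^ and $, a missing section shows up as undefined rather than making the whole match fail.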

Something like
/^(?:([^:]*):)?([^.]*)\.(?:([^;]*);(.*))?/
For example:
const s = 'U:BCCNT.3;GO';
const m = s.match(/^(?:([^:]*):)?([^.]*)\.(?:([^;]*);(.*))?/);
console.log(m);
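To make the difference from the first answer concrete, here is a small check of what this pattern captures. Here the delimiters sit outside the groups, so the values come back without :, . or ;. Note, however, that the literal \. is mandatory, so an input with no dot does not match at all, and groups 3 and 4 are only filled when a ; is present. The helper name sections is hypothetical.

```javascript
const pattern = /^(?:([^:]*):)?([^.]*)\.(?:([^;]*);(.*))?/;

// Hypothetical helper: the four groups, or null when the pattern does not match.
function sections(s) {
  const m = s.match(pattern);
  return m ? m.slice(1, 5) : null;
}

console.log(sections("U:BCCNT.3;GO")); // [ 'U', 'BCCNT', '3', 'GO' ]
console.log(sections("BCCNT.3"));      // [ undefined, 'BCCNT', undefined, undefined ]
console.log(sections("U:BCCNT"));      // null (there is no "." to match)
```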

Related

Finding multiple groups in one string

Consider the following string: it's a list of HTML a elements separated by commas. How do I get a list of {href, title} for the links between 'start' and 'end'?
not thisstartfoo, barendnot this
The following regex gives only the last iteration of the a element:
/start((?:<a href="(?<href>.*?)" title="(?<title>.*?)">.*?<\/a>(?:, )?)+)end/g
How can I get the whole list?
This should give you what you need.
https://regex101.com/r/isYIeR/1
/(?:start)*(?:<a href=(?<href>.*?)\s+title=(?<title>.*?)>.*?<\/a>)+(?:,|end)
UPDATE
This does not meet the requirement.
The Returned Value for a Given Group is the Last One Captured
I do not think this can be done with a single regex match. Here is a JavaScript solution that uses two regex matches to get a list of {href, title}:
var sample='startfoo, bar,barendstart<img> something end\n' +
'beginfoo, bar,barend\n'+
'startfoo again, bar again,bar2 againend';
var reg = /start((?:\s*<a href=.*?\s+title=.*?>.*?<\/a>,?)+)end/gi;
var regex2 = /href=(?<href>.*?)\s+title=(?<title>.*?)>/gi;
var step1, step2 ;
var hrefList = [];
while( (step1 = reg.exec(sample)) !== null) {
  while((step2 = regex2.exec(step1[1])) !== null) {
    hrefList.push({href:step2.groups["href"], title:step2.groups["title"]});
  }
}
console.log(hrefList);
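The markup in the question's sample strings appears to have been lost, so the sketch below uses a hypothetical input built to fit the question's regex (the href and title values are made up). It follows the same two-step idea as the answer above, written with String.prototype.matchAll (ES2020): the outer pattern isolates each start...end block, and the inner one pulls {href, title} out of it. It assumes double-quoted attributes with href before title; extractLinks is a hypothetical name.

```javascript
// Hypothetical input: the real markup from the question was not preserved.
const sample =
  'not this start<a href="foo" title="Foo">foo</a>, <a href="bar" title="Bar">bar</a>end not this';

// Step 1: the text between "start" and "end" (lazy, so separate blocks don't merge).
const blockRe = /start(.*?)end/g;
// Step 2: every href/title pair inside one block (assumes double quotes, href first).
const linkRe = /<a href="(?<href>[^"]*)" title="(?<title>[^"]*)">/g;

function extractLinks(text) {
  const links = [];
  // matchAll() works on a copy of the regex, so lastIndex never leaks between loops.
  for (const block of text.matchAll(blockRe)) {
    for (const link of block[1].matchAll(linkRe)) {
      links.push({ href: link.groups.href, title: link.groups.title });
    }
  }
  return links;
}

console.log(extractLinks(sample));
```

Each repetition of a quantified group only keeps its last capture, which is why the question's single regex returned only the last link; iterating over matches sidesteps that.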
If the format is constant, i.e. only href and title for each tag, you can use this regex, which uses a lookahead to find runs of characters that are not " and are followed by a " and then a whitespace character or > (regex101):
const str = 'startfoo, barend';
const result = str.match(/[^"]+(?="[\s>])/gi);
console.log(result);
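Since the sample string above lost its markup, here is the same lookahead pattern applied to a hypothetical input of the shape the question describes (the values are made up). The matches come back as one flat list in document order; pairing them up as href, title assumes every tag has exactly those two attributes in that order.

```javascript
// Hypothetical input (the original markup was not preserved).
const str = 'start<a href="foo" title="Foo">foo</a>, <a href="bar" title="Bar">bar</a>end';

// A run of non-quote characters, followed by a closing quote and then
// whitespace or ">" (checked with a lookahead, so neither is consumed).
const values = str.match(/[^"]+(?="[\s>])/g);

// Assumes strict href, title order for every tag.
const pairs = [];
for (let i = 0; i + 1 < values.length; i += 2) {
  pairs.push({ href: values[i], title: values[i + 1] });
}

console.log(values); // [ 'foo', 'Foo', 'bar', 'Bar' ]
console.log(pairs);
```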
This regex:
<.*?>
removes all HTML tags when used with replace() and the g flag.
So, for example,
<h1>1. This is a title </h1><ul><a href='www.google.com'>2. Click here </a></ul>
After applying the regex, you will get:
1. This is a title 2. Click here
Not sure if this answers your question though.
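A minimal sketch of that replacement on the example above. The lazy .*? stops at the first >, so each tag is removed on its own (a greedy <.*> would delete everything from the first < to the last >). As with any regex over HTML, this assumes no > appears inside attribute values.

```javascript
const html = "<h1>1. This is a title </h1><ul><a href='www.google.com'>2. Click here </a></ul>";

// Lazy .*? ends each match at the first ">"; the g flag removes every tag.
const text = html.replace(/<.*?>/g, "");

console.log(text); // → "1. This is a title 2. Click here " (note the trailing space)
```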

Regex to get the first element of each line

I'm trying to get the first element of each line, be it a number or a string, but when the line starts with a number my current attempt still includes the rest of the line:
const totalWords = "===========\n\n 1-test\n\n 2-ests \n\n 1 zfzrf";
const firstWord = totalWords.replace(/\s.*/,'')
The output I get:
1-test
2-ests
1 zfzrf
The output I would like:
1
2
1
Alternatively, if you are interested in a non-regex version (it should be faster):
var str = "===========\n\n 1-test\n\n 2-ests \n\n 1 zfzrf";
var res = str.split("\n");
for (const row of res) {
  let words = row.trim().split(' ');
  let firstWord = words[0].trim();
  // Get the first character, parse it to an int, and check that it is in fact an integer
  let element = firstWord.charAt(0);
  if (Number.isInteger(parseInt(element, 10))) {
    console.log('Row', row);
    console.log('Element: ', element);
  }
}
Your regex should skip leading whitespace and then capture everything up to a space or a dash, so you might want to go with ^\s*([^ -]+).
(See https://regex101.com/r/u7ELiw/1 for application to your examples)
If you additionally know that you are looking for exactly one digit, you can go for ^\s*(\d) instead
(See https://regex101.com/r/IWhEQ1/1 again for applications)
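A minimal sketch of both patterns applied in code, assuming the input is first split into lines (regex101 tests each line separately). Splitting matters here: [^ -] does not exclude "\n", so running ^\s*([^ -]+) once over the whole string could swallow line breaks. firstElements is a hypothetical helper name.

```javascript
const totalWords = "===========\n\n 1-test\n\n 2-ests \n\n 1 zfzrf";

// Hypothetical helper: apply the pattern line by line and keep capture group 1.
function firstElements(text, pattern) {
  return text
    .split("\n")
    .map((line) => line.match(pattern))
    .filter((m) => m !== null) // empty lines do not match
    .map((m) => m[1]);
}

console.log(firstElements(totalWords, /^\s*([^ -]+)/)); // [ '===========', '1', '2', '1' ]
console.log(firstElements(totalWords, /^\s*(\d)/));     // [ '1', '2', '1' ]
```

The single-digit variant is the one that reproduces the desired output, because it also skips the ===== line.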
Maybe I'm not quite sure what you are asking, but why use something as convoluted as a regex when
line.charAt(0)
works pretty well?

Calling characters from a string In JS

I am currently trying to grab a 9-character string from a title. It always starts with "BC-" followed by six digits, so a complete one looks like "BC-004352". My problem is that I can grab everything after "BC-", but if there is something after it, as in "Words Words BC-004352 Words words", it grabs "BC-004352 Words Words". This will mess up my program, so is there any way of capturing only "BC-004352"? Also, how could I make the script run by itself? At the moment it runs off a button, and that isn't helpful.
<!--BC-Check six digit-->
<script type="text/javascript">
function bc_check() {
var str = "FUCKCKCKKC BC-040300 Has broken";
var res = str.substring(str.indexOf("BC-") + 0);
document.getElementById("recognize").innerHTML = res;
}
</script>
Or you can do this with Regular Expressions:
const testString = "FCKCKCKKC BC-040300 Has broken";
const regex = /.*?BC-(\d+).*?/; //Capture any number of digits following BC-
const matches = testString.match(regex); //Get the match collection
console.log(matches[1]); //Match collection index 1 holds your number
substring has a second parameter, indexEnd. The character at indexEnd is not included, so the end index must be one past the last character you want. "BC-" plus six digits is 9 characters, so in this case pass the index of "BC-" plus 9 as the end index.
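Both approaches side by side, as a minimal sketch on the string from the regex answer. The regex version uses /BC-\d{6}/ (exactly six digits, with "BC-" kept in the match) instead of the answer's /.*?BC-(\d+).*?/; both versions return null when no code is present.

```javascript
const title = "FCKCKCKKC BC-040300 Has broken";

// Regex: "BC-" followed by exactly six digits; match[0] is the whole code.
const match = title.match(/BC-\d{6}/);
const viaRegex = match ? match[0] : null;

// substring: the end index is exclusive, so start + 9 covers
// "BC-" (3 characters) plus six digits.
const start = title.indexOf("BC-");
const viaSubstring = start > -1 ? title.substring(start, start + 9) : null;

console.log(viaRegex, viaSubstring); // BC-040300 BC-040300
```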

Javascript get all text in between string

I have string content that gets delivered to me via TCP. This is only relevant because it means I do not consistently receive the same string. I use <start> and <stop> separators to ensure that whenever I get data via TCP, I output the full content.
My incoming content looks like so:
<start>Apple Bandana Cadillac<stop>
I want to get everything in between <start> and <stop>. So just Apple Bandana Cadillac.
My script to do this looks like so:
servercsv.on("connection", function(socket){
  let d_basic = "";
  socket.on('data', function(data){
    d_basic += data.toString();
    let d_csvindex = d_basic.indexOf('<stop>');
    while (d_csvindex > -1){
      try {
        let strang = d_basic.substring(0, d_csvindex);
        let dyson = strang.replace(/<start>/g, '');
        let dson = papaparse.parse(dyson);
        myfunction(dson);
      }
      catch(e){ console.log(e); }
      d_basic = d_basic.substring(d_csvindex+1);
      d_csvindex = d_basic.indexOf('<stop>');
    }
  });
});
What this means is that I get everything before the <stop> string and output it. I have also included the line let dyson = strang.replace(/<start>/g, ''); because I want to remove the <start> text.
However, because this is TCP, I am not guaranteed to get all parts of this string. As a result, I frequently get back stop>Apple Bandana Cadillac<stop> or some variation of it (such as start>Apple Bandana Cadillac<stop>). It is not consistent enough that I can just do strang.replace("start>", "").
Ideally, I would like my separator to select the content between <start> and <stop>, not just the content before <stop>. However, I am unsure how to do so.
Alternatively, I could settle for a regex that finds every partial combination of <start> and <stop> during my while loop and just deletes it, i.e. checks for <, s, t, a, r, t individually and so forth. But I am unsure how to use a regex to delete portions of a string.
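One detail in the loop above seems worth checking: after each frame it advances with d_basic.substring(d_csvindex+1), which skips only the < of <stop>, so the next frame begins with the leftover stop>. That matches at least the stop>... variant of the symptom. Below is a minimal sketch of the buffering with that step fixed, written as a standalone function so it can be tested without a socket; extractFrames is a hypothetical name, and the two chunks simulate a frame split across TCP packets.

```javascript
// Hypothetical helper (not from the question): pulls every complete
// <start>...<stop> frame out of an accumulated buffer.
const START = "<start>";
const STOP = "<stop>";

function extractFrames(buffer) {
  const frames = [];
  let stopIndex = buffer.indexOf(STOP);
  while (stopIndex > -1) {
    const chunk = buffer.substring(0, stopIndex);
    // Keep only what follows the last <start> in this chunk, if there is one.
    const startIndex = chunk.lastIndexOf(START);
    frames.push(startIndex > -1 ? chunk.substring(startIndex + START.length) : chunk);
    // Skip the whole "<stop>" marker (6 characters), not just its "<".
    buffer = buffer.substring(stopIndex + STOP.length);
    stopIndex = buffer.indexOf(STOP);
  }
  // Whatever is left is an incomplete frame; keep it for the next "data" event.
  return { frames, rest: buffer };
}

// Two simulated chunks that split the second frame in the middle.
let pending = "";
for (const chunk of ["<start>Apple Bandana Cadillac<stop><start>Kebab", " Matador Pencil<stop>"]) {
  const { frames, rest } = extractFrames(pending + chunk);
  pending = rest;
  frames.forEach((f) => console.log(f));
}
```

In the socket handler, the idea would be to call extractFrames on d_basic, store rest back into d_basic, and pass each frame to papaparse.parse and myfunction as before.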
Assuming you get full response:
var test = "<start>Apple Bandana Cadillac<stop>";
var testRE = test.match("<start>(.*)<stop>");
testRE[1] //"Apple Bandana Cadillac"
If there are newlines between <start> and <stop>:
var test = "<start>Apple Bandana Cadillac<stop>";
var testRE = test.match("<start>([\\S\\s]*)<stop>");
testRE[1] //"Apple Bandana Cadillac"
Using regular expressions capturing group here.
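If the buffer may hold more than one frame, the greedy (.*) above would return everything from the first <start> to the last <stop> as a single match. A minimal sketch that uses a lazy [\s\S]*? with matchAll (ES2020) instead, so each frame comes back separately and newlines are allowed; the input string is made up for illustration.

```javascript
// Made-up buffer holding two complete frames, the second one spanning a newline.
const stream = "<start>Apple Bandana Cadillac<stop><start>Kebab\nMatador Pencil<stop>";

// [\s\S] also matches "\n"; the lazy *? stops at the first <stop> after each <start>.
const frameRe = /<start>([\s\S]*?)<stop>/g;
const frames = [...stream.matchAll(frameRe)].map((m) => m[1]);

// frames: "Apple Bandana Cadillac" and "Kebab\nMatador Pencil"
console.log(frames);
```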
Try this regex with the replace() method:
/<st.*?>(.*?)(?!<st)/g
<st    Literal "<st"
.*?    Any character, zero or more times, lazily
>      Literal ">"
(      Begin capture group
.*?    Any character, zero or more times, lazily
)      End capture group
(?!    Begin negative lookahead
<st    Literal "<st"
)      End negative lookahead
In the demo below, notice that the test example spans multiple lines and contains variants of <start> and <stop> (basically anything starting with <st).
Demo 1
var rgx = /<st.*?>(.*?)(?!<st)/g;
var str = `<start>Apple Bandana Cadillac<stop>
<stop>Grapes Trampoline Ham<stop>
<start>Kebab Matador Pencil<start>`;
var res = str.replace(rgx, `$1`);
console.log(res);
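Note that in Demo 1 the capture group always ends up empty on this input: .*? is lazy and the lookahead only inspects the next position, so nothing forces the group to grow, and replacing with $1 simply deletes each <st...> tag. The sketch below checks that against a plainer tag-stripping pattern; /<st[^>]*>/g is a substitute chosen here for comparison, not part of the answer.

```javascript
const demo = `<start>Apple Bandana Cadillac<stop>
<stop>Grapes Trampoline Ham<stop>
<start>Kebab Matador Pencil<start>`;

// The regex from Demo 1: (.*?) matches the empty string here,
// so "$1" replaces each <st...> tag with nothing.
const fromDemo = demo.replace(/<st.*?>(.*?)(?!<st)/g, "$1");

// A plainer way to say the same thing: delete every tag that begins with "<st".
const plain = demo.replace(/<st[^>]*>/g, "");

console.log(fromDemo === plain); // true
console.log(plain);
```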
Update
"say I have op>Grapes Trampoline Ham<stop>...still trying to remove all parts of the string <stop>"
/^(.*?>)(.*?)(<.*?)$/gm;
A simple explanation will have to do since a step-by-step such as Demo 1 would take too much time.
This regex uses the multiline flag (/m), so ^ and $ match at the start and end of every line.
^         Beginning of a line.
(.*?>)    Lazily capture everything up to and including a literal >. [Returned as $1]
(.*?)     Then lazily capture everything up to the next <. [Returned as $2]
(<.*?)    A literal <, then lazily everything up to the end of the line. [Returned as $3]
$         End of a line.
The trick is to replace each whole match with the second capture, $2, so $1 and $3 are dropped.
Demo 2
var rgx = /^(.*?>)(.*?)(<.*?)$/gm;
var str = `<start>Apple Bandana Cadillac<stop>
<stop>Grapes Trampoline Ham<stop>
<start>Kebab Matador Pencil<start>
op>Score False Razor<stop>
`;
var res = str.replace(rgx, `$2`);
console.log(res);

Javascript regular expression is returning # character even though it's not captured

text = 'ticket number #1234 and #8976 ';
r = /#(\d+)/g;
var match = r.exec(text);
log(match); // ["#1234", "1234"]
In the above case I would like to capture both 1234 and 8976. How do I do that? Also, the sentence can have any number of '#' followed by integers, so the solution should not be hard-coded to assume at most two occurrences.
Update:
Just curious: check out the following two cases.
var match = r.exec(text); // ["#1234", "1234"]
var match = text.match(r); //["#1234", "#8976"]
Why am I getting # in the second case, even though I am not capturing it? It looks like string.match does not obey capturing rules.
Call exec multiple times to get the rest:
while((match = r.exec(text)))
log(match);
Use String.prototype.match instead of RegExp.prototype.exec:
var match = text.match(r);
That will give you all matches at once (requires g flag) instead of one match at a time.
Here's another way
var text = 'ticket number #1234 and #8976 ';
var r = /#(\d+)/g;
var matches = [];
text.replace( r, function( all, first ) {
matches.push( first )
});
log(matches);
// ["1234", "8976"]
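On the "Update" part of the question: when the regex has the g flag, String.prototype.match returns only the full matches and ignores capture groups, which is why the # shows up. Where String.prototype.matchAll is available (ES2020), it yields one exec-style result per match, groups included. A minimal sketch, with the question's log replaced by console.log:

```javascript
const text = "ticket number #1234 and #8976 ";
const r = /#(\d+)/g; // matchAll() throws a TypeError if the g flag is missing

// Each item is like an exec() result: m[0] is "#1234", m[1] is "1234".
const numbers = Array.from(text.matchAll(r), (m) => m[1]);

console.log(numbers); // [ '1234', '8976' ]
```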
