Javascript regular expression is returning # character even though it's not captured


text = 'ticket number #1234 and #8976 ';
r = /#(\d+)/g;
var match = r.exec(text);
log(match); // ["#1234", "1234"]
In the above case I would like to capture both 1234 and 8976. How do I do that? The sentence can also contain any number of '#' followed by integers, so the solution should not be hard-coded on the assumption that there will be at most two occurrences.
Update:
Just curious. Check out the following two cases.
var match = r.exec(text); // ["#1234", "1234"]
var match = text.match(r); //["#1234", "#8976"]
Why am I getting # in the second case even though I am not capturing it? It looks like string.match does not obey capturing rules.

exec it multiple times to get the rest.
while ((match = r.exec(text)))
    log(match);

Use String.prototype.match instead of RegExp.prototype.exec:
var match = text.match(r);
That will give you all matches at once (requires g flag) instead of one match at a time.
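When the pattern has the g flag, match() returns only the full matches, which is why the # still shows up in that result. A minimal sketch, assuming an environment that supports String.prototype.matchAll (ES2020), that collects just the captured digits:

```javascript
const text = 'ticket number #1234 and #8976 ';
const r = /#(\d+)/g;

// With the g flag, match() returns whole matches and drops capture groups.
const fullMatches = text.match(r);

// matchAll() yields one match array per occurrence; index 1 is the first group.
const numbers = Array.from(text.matchAll(r), (m) => m[1]);

console.log(fullMatches); // ["#1234", "#8976"]
console.log(numbers);     // ["1234", "8976"]
```

matchAll() throws a TypeError if the regex lacks the g flag, so the flag is still required here.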

Here's another way
var text = 'ticket number #1234 and #8976 ';
var r = /#(\d+)/g;
var matches = [];
text.replace( r, function( all, first ) {
    matches.push( first );
});
log(matches);
// ["1234", "8976"]

Related

Javascript get all text in between string

I have string content that gets delivered to me via TCP. This info is only relevant because it means that I do not consistently retrieve the same string. I have a <start> and <stop> separator to ensure that any time I get the data via TCP, I am outputting the full content.
My incoming content looks like so:
<start>Apple Bandana Cadillac<stop>
I want to get everything in between <start> and <stop>. So just Apple Bandana Cadillac.
My script to do this looks like so:
servercsv.on("connection", function(socket){
    let d_basic = "";
    socket.on('data', function(data){
        d_basic += data.toString();
        let d_csvindex = d_basic.indexOf('<stop>');
        while (d_csvindex > -1){
            try {
                let strang = d_basic.substring(0, d_csvindex);
                let dyson = strang.replace(/<start>/g, '');
                let dson = papaparse.parse(dyson);
                myfunction(dson);
            }
            catch(e){ console.log(e); }
            d_basic = d_basic.substring(d_csvindex+1);
            d_csvindex = d_basic.indexOf('<stop>');
        }
    });
});
What this means is that I am getting everything before the <stop> string and outputting it. I have also included the line let dyson = strang.replace(/<start>/g, ''); because I want to remove the <start> text.
However, because this is TCP, I am not guaranteed to get all parts of this string. As a result, I frequently get back stop>Apple Bandana Cadillac<stop> or some variation of this (such as start>Apple Bandana Cadillac<stop>). It is not consistent enough that I can just do strang.replace("start>", "").
Ideally, I would like my separator to select content that is in between <start> and <stop>. Not just <stop>. However, I am unsure how to do so.
Alternatively, I can also settle for a regex that retrieves all combination of <start><stop> strings during my while loop, and just delete them. So check for <, s, t, a, r, t individually and so forth. But unsure how to implement regex to delete portions of a whole string.
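One detail in the loop above may explain the stray stop> prefix: after a frame is handled, the buffer is advanced by d_csvindex+1, which skips only the < of <stop> and leaves stop> at the front of the buffer for the next pass. Below is a minimal sketch of a buffer-draining helper that consumes both delimiters in full. extractFrames is a hypothetical name, and the parsing step (papaparse, myfunction) is left out:

```javascript
const START = '<start>';
const STOP = '<stop>';

// Hypothetical helper: pulls every complete <start>...<stop> frame out of
// the buffer and returns the unconsumed remainder for the next 'data' event.
function extractFrames(buffer) {
  const frames = [];
  let stopAt = buffer.indexOf(STOP);
  while (stopAt > -1) {
    // Find the closest <start> that precedes this <stop>.
    const startAt = buffer.lastIndexOf(START, stopAt);
    if (startAt > -1) {
      frames.push(buffer.substring(startAt + START.length, stopAt));
    }
    // Skip the whole delimiter, not just its first character.
    buffer = buffer.substring(stopAt + STOP.length);
    stopAt = buffer.indexOf(STOP);
  }
  return { frames, rest: buffer };
}

// Two frames arriving split across three chunks.
let pending = '';
const received = [];
for (const chunk of ['<start>Apple Ban', 'dana Cadillac<stop><sta', 'rt>Grapes<stop>']) {
  const result = extractFrames(pending + chunk);
  received.push(...result.frames);
  pending = result.rest;
}
console.log(received); // ["Apple Bandana Cadillac", "Grapes"]
```

Inside the socket handler, the same idea would mean keeping the returned rest as the new d_basic and handing each frame to the parser.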

Assuming you get full response:
var test = "<start>Apple Bandana Cadillac<stop>";
var testRE = test.match("<start>(.*)<stop>");
testRE[1] //"Apple Bandana Cadillac"
If there are new lines between <start> and <stop>
var test = "<start>Apple Bandana Cadillac<stop>";
var testRE = test.match("<start>([\\S\\s]*)<stop>");
testRE[1] //"Apple Bandana Cadillac"
Using regular expressions capturing group here.

Try this regex with replace() method:
/<st.*?>(.*?)(?!<st)/g
Literal.................................................: <st
Any char zero or more times lazily...: .*?
Literal..................................................: >
Begin capture group..........................: (
Any char zero or more times lazily...: .*?
End capture group.............................: )
Begin negative lookahead.................: (?!
Literal...................................................: <st
End negative lookahead....................: )
In the Demo below notice that the test example consists of multiple lines, and variances of <start> and <stop> (basically <st).
Demo 1
var rgx = /<st.*?>(.*?)(?!<st)/g;
var str = `<start>Apple Bandana Cadillac<stop>
<stop>Grapes Trampoline Ham<stop>
<start>Kebab Matador Pencil<start>`;
var res = str.replace(rgx, `$1`);
console.log(res);
Update
"say I have op>Grapes Trampoline Ham<stop>...still trying to remove all parts of the string <stop>"
/^(.*?>)(.*?)(<.*?)$/gm;
A simple explanation will have to do since a step-by-step such as Demo 1 would take too much time.
This RegEx is multiline. /m
^..........Begin line.
(.*?>)..Lazily capture everything until literal >........[Return as $1]
(.*?)...Then lazily capture everything until................[Return as $2]
(<.*?)..Literal < and lazily capture everything until..[Return as $3]
$...........End line.
The trick is to replace each whole match with the second capture $2, which keeps $2 and discards $1 and $3.
Demo 2
var rgx = /^(.*?>)(.*?)(<.*?)$/gm;
var str = `<start>Apple Bandana Cadillac<stop>
<stop>Grapes Trampoline Ham<stop>
<start>Kebab Matador Pencil<start>
op>Score False Razor<stop>
`;
var res = str.replace(rgx, `$2`);
console.log(res);

Regex match cookie value and remove hyphens

I'm trying to extract out a group of words from a larger string/cookie that are separated by hyphens. I would like to replace the hyphens with a space and set to a variable. Javascript or jQuery.
As an example, the larger string has a name and value like this within it:
facility=34222%7CConner-Department-Store;
(notice the leading "C")
So first, I need to match()/find facility=34222%7CConner-Department-Store; with regex. Then break it down to "Conner Department Store"
var cookie = document.cookie;
var facilityValue = cookie.match( REGEX ); ??

var test = "store=874635%7Csomethingelse;facility=34222%7CConner-Department-Store;store=874635%7Csomethingelse;";
var test2 = test.replace(/^(.*)facility=([^;]+)(.*)$/, function(matchedString, match1, match2, match3){
return decodeURIComponent(match2);
});
console.log( test2 );
console.log( test2.split('|')[1].replace(/[-]/g, ' ') );

If I understood it correctly, you want to make a phrase by getting all the words between hyphens and disallowing two successive Uppercase letters in a word, so I'd prefer using Regex in that case.
This is a Regex solution that works dynamically with any cookie in the same format and extracts the wanted sentence from it:
var matches = str.match(/([A-Z][a-z]+)-?/g);
console.log(matches.map(function(m) {
return m.replace('-', '');
}).join(" "));
Demo:
var str = "facility=34222%7CConner-Department-Store;";
var matches = str.match(/([A-Z][a-z]+)-?/g);
console.log(matches.map(function(m) {
return m.replace('-', '');
}).join(" "));
Explanation:
Use this Regex /([A-Z][a-z]+)-?/g to match the words between -.
Remove any - occurrence from the matched words.
Then just join the array of matches with white space.

Ok,
first, you should decode this string as follows:
var str = "facility=34222%7CConner-Department-Store;"
var decoded = decodeURIComponent(str);
// decoded = "facility=34222|Conner-Department-Store;"
Then you have multiple possibilities to split up this string.
The easiest way is to use substring()
var solution1 = decoded.substring(decoded.indexOf('|') + 1, decoded.length)
// solution1 = "Conner-Department-Store;"
solution1 = solution1.replace(/-/g, ' '); // a string pattern would replace only the first hyphen
// solution1 = "Conner Department Store;"
As you can see, substring(arg1, arg2) returns the part of the string starting at index arg1 and ending just before index arg2.
If you want to cut the last ; just set decoded.length - 1 as arg2 in the snippet above.
decoded.substring(decoded.indexOf('|') + 1, decoded.length - 1)
//returns "Conner-Department-Store"
or all above in just one line:
decoded.substring(decoded.indexOf('|') + 1, decoded.length - 1).replace(/-/g, ' ')
If you want still to use a regular Expression to retrieve (perhaps more) data out of the string, you could use something similar to this snippet:
var solution2 = "";
var regEx= /([A-Za-z]*)=([0-9]*)\|(\S[^:\/?#\[\]\#\;\,']*)/;
if (regEx.test(decoded)) {
solution2 = decoded.match(regEx);
/* returns
[0:"facility=34222|Conner-Department-Store",
1:"facility",
2:"34222",
3:"Conner-Department-Store",
index:0,
input:"facility=34222|Conner-Department-Store;"
length:4] */
solution2 = solution2[3].replace(/-/g, ' ');
// "Conner Department Store"
}
I have applied some rules for the regex to work; feel free to modify them according to your needs.
facility can be any Word built with alphabetical characters lower and uppercase (no other chars) at any length
= needs to be the char =
34222 can be any number but no other characters
| needs to be the char |
Conner-Department-Store can be any characters except one of the following (reserved delimiters): :/?#[]#;,'
Hope this helps :)
edit: to find only the part
facility=34222%7CConner-Department-Store; just modify the regex to
match facility= instead of ([A-Za-z]*)=:
/(facility)=([0-9]*)\|(\S[^:\/?#\[\]\#\;\,']*)/
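The steps above (find the pair, decode it, take the part after |, swap hyphens for spaces) can be sketched as one helper. readFacilityName is a hypothetical name, and the cookie string is passed in as an argument, standing in for document.cookie, so the sketch also runs outside a browser:

```javascript
// Hypothetical helper: extracts the readable name from a cookie pair such as
// facility=34222%7CConner-Department-Store
function readFacilityName(cookieString, key) {
  // Cookie pairs are separated by ';', optionally followed by a space.
  const pair = cookieString
    .split(';')
    .map((part) => part.trim())
    .find((part) => part.startsWith(key + '='));
  if (pair === undefined) {
    return null;
  }
  const decoded = decodeURIComponent(pair.substring(key.length + 1));
  // Keep what follows the first '|'; a global regex replaces every hyphen.
  const name = decoded.substring(decoded.indexOf('|') + 1);
  return name.replace(/-/g, ' ');
}

const sample = 'store=874635%7Csomethingelse; facility=34222%7CConner-Department-Store';
console.log(readFacilityName(sample, 'facility')); // "Conner Department Store"
```

In a page, the call would presumably be readFacilityName(document.cookie, 'facility').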

You can use cookies.js, a mini framework from MDN (Mozilla Developer Network).
Simply include the cookies.js file in your application, and write:
docCookies.getItem("facility"); // getItem takes the cookie name and returns its decoded value

Extract word between '=' and '('

I have the following string
234234=AWORDHERE('sdf.'aa')
where I need to extract AWORDHERE.
Sometimes there can be space in between.
234234= AWORDHERE('sdf.'aa')
Can I do this with a regular expression?
Or should I do it manually by finding indexes?
The datasets are huge, so it's important to do it as fast as possible.

Try this regex:
\d+=\s?(\w+)\(
In JavaScript it would look like this:
var myString = "234234=AWORDHERE('sdf.'aa')";// or 234234= AWORDHERE('sdf.'aa')
var myRegexp = /\d+=\s?(\w+)\(/g;
var match = myRegexp.exec(myString);
console.log(match[1]); // AWORDHERE

You could do this at least three ways. You need to benchmark to see what's fastest.
Substring w/ indexes
function extract(from) {
var ixEq = from.indexOf("=");
var ixParen = from.indexOf("(");
return from.substring(ixEq + 1, ixParen);
}
Splits
function extract(from) {
var spEq = from.split("=");
var spParen = spEq[1].split("(");
return spParen[0];
}
Regex
Here is some sample regex you could use
/[^=]+=([^(]+).*/g
This says
[^=]+ - One or more character which is not an =
= - The = itself
( - creates a matching group so you can access your match in code
[^(]+ - One or more character which is not a (
) - closes the matching group
.* - Matches the rest of the line
The /g on the end makes the match global: the regex finds every occurrence in the input instead of stopping at the first one.
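Putting the breakdown above into code, as a minimal sketch: the g flag is left off because only one match per string is needed, the trailing .* is dropped because it does not affect group 1, and trim() is an added assumption to handle the optional space after =. extractWord is a hypothetical name.

```javascript
// Hypothetical wrapper around the pattern described above.
function extractWord(from) {
  const match = /[^=]+=([^(]+)/.exec(from);
  // exec returns null when nothing matches.
  return match === null ? null : match[1].trim();
}

console.log(extractWord("234234=AWORDHERE('sdf.'aa')"));  // "AWORDHERE"
console.log(extractWord("234234= AWORDHERE('sdf.'aa')")); // "AWORDHERE"
```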

Using lookaround, you can search for a string preceded by = and followed by ( as follows.
Regex: (?<==)[A-Z ]+(?=\()
Explanation:
(?<==) checks if [A-Z ] is preceded by an =.
[A-Z ]+ matches your pattern.
(?=\() checks if matched pattern is followed by a (.
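In JavaScript, lookbehind is only available in engines that implement ES2018, so the following is a sketch under that assumption. Because the character class includes a space, the match keeps any leading space, which trim() removes here. wordBetween is a hypothetical name.

```javascript
const pattern = /(?<==)[A-Z ]+(?=\()/;

// Returns the text between '=' and '(' or null when the pattern is absent.
function wordBetween(input) {
  const match = input.match(pattern);
  return match === null ? null : match[0].trim();
}

console.log(wordBetween("234234=AWORDHERE('sdf.'aa')"));  // "AWORDHERE"
console.log(wordBetween("234234= AWORDHERE('sdf.'aa')")); // "AWORDHERE"
```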

var str = "234234= AWORDHERE('sdf.'aa')";
var regexp = /.*=\s*(\w+)\(.*\)/g; // \s* so the space after = is optional
var match = regexp.exec(str);
alert( match[1] );

I made my solution for this just a little more general than you asked for, but I don't think it takes much more time to execute. I didn't measure. If you need greater efficiency than this provides, comment and I or someone else can help you with that.
Here's what I did, using the command prompt of node:
> var s = "234234= AWORDHERE('sdf.'aa')"
undefined
> var a = s.match(/(\w+)=\s*(\w+)\s*\(.*/)
undefined
> a
[ '234234= AWORDHERE(\'sdf.\'aa\')',
'234234',
'AWORDHERE',
index: 0,
input: '234234= AWORDHERE(\'sdf.\'aa\')' ]
>
As you can see, this matches the number before the = in a[1], and it matches the AWORDHERE name as you requested in a[2]. This will work with any number of spaces (including zero) after the = and before the (.

Regex extracting multiple matches for string [duplicate]

I'm trying to obtain all possible matches from a string using regex with javascript. It appears that my method of doing this is not matching parts of the string that have already been matched.
Variables:
var string = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y';
var reg = /A[0-9]+B[0-9]+Y:A[0-9]+B[0-9]+Y/g;
Code:
var match = string.match(reg);
All matched results I get:
A1B1Y:A1B2Y
A1B5Y:A1B6Y
A1B9Y:A1B10Y
Matched results I want:
A1B1Y:A1B2Y
A1B2Y:A1B3Y
A1B5Y:A1B6Y
A1B6Y:A1B7Y
A1B9Y:A1B10Y
A1B10Y:A1B11Y
In my head, I want A1B1Y:A1B2Y to be a match along with A1B2Y:A1B3Y, even though A1B2Y in the string will need to be part of two matches.

Without modifying your regex, you can set it to start matching at the beginning of the second half of the match after each match using .exec and manipulating the regex object's lastIndex property.
var string = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y';
var reg = /A[0-9]+B[0-9]+Y:A[0-9]+B[0-9]+Y/g;
var matches = [], found;
while (found = reg.exec(string)) {
matches.push(found[0]);
reg.lastIndex -= found[0].split(':')[1].length;
}
console.log(matches);
//["A1B1Y:A1B2Y", "A1B2Y:A1B3Y", "A1B5Y:A1B6Y", "A1B6Y:A1B7Y", "A1B9Y:A1B10Y", "A1B10Y:A1B11Y"]
As per Bergi's comment, you can also take the index of the last match and increment it by 1, so that instead of resuming from the second half of the match, the regex resumes from the second character of each match:
reg.lastIndex = found.index+1;
The final outcome is the same. Though, Bergi's update has a little less code and performs slightly faster. =]

You cannot get the direct result from match, but it is possible to produce the result via RegExp.exec and with some modification to the regex:
var regex = /A[0-9]+B[0-9]+Y(?=(:A[0-9]+B[0-9]+Y))/g;
var input = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y'
var arr;
var results = [];
while ((arr = regex.exec(input)) !== null) {
results.push(arr[0] + arr[1]);
}
I used zero-width positive look-ahead (?=pattern) in order not to consume the text, so that the overlapping portion can be rematched.
Actually, it is possible to abuse the replace method to achieve the same result:
var input = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y'
var results = [];
input.replace(/A[0-9]+B[0-9]+Y(?=(:A[0-9]+B[0-9]+Y))/g, function ($0, $1) {
results.push($0 + $1);
return '';
});
However, since it is replace, it does extra useless replacement work.

Unfortunately, it's not quite as simple as a single string.match.
The reason is that you want overlapping matches, which the /g flag doesn't give you.
You could use lookahead:
var re = /A\d+B\d+Y(?=:A\d+B\d+Y)/g;
But now you get:
string.match(re); // ["A1B1Y", "A1B2Y", "A1B5Y", "A1B6Y", "A1B9Y", "A1B10Y"]
The reason is that lookahead is zero-width, meaning that it just says whether the pattern comes after what you're trying to match or not; it doesn't include it in the match.
You could use exec to try and grab what you want. If a regex has the /g flag, you can run exec repeatedly to get all the matches:
// using re from above to get the overlapping matches
var m;
var matches = [];
var re2 = /A\d+B\d+Y:A\d+B\d+Y/g; // make another regex to get what we need
while ((m = re.exec(string)) !== null) {
// m is a match object, which has the index of the current match
matches.push(string.substring(m.index).match(re2)[0]);
}
matches == [
"A1B1Y:A1B2Y",
"A1B2Y:A1B3Y",
"A1B5Y:A1B6Y",
"A1B6Y:A1B7Y",
"A1B9Y:A1B10Y",
"A1B10Y:A1B11Y"
];
Alternatively, you could split the original string on :, then loop through the resulting array, pulling out the pairs where array[i] and array[i+1] both match the pattern you want.
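The split-based alternative can be sketched as follows. The single-token pattern /^A\d+B\d+Y$/ is an assumption derived from each half of the original regex:

```javascript
const string = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y';
const token = /^A\d+B\d+Y$/; // no g flag, so test() keeps no state between calls

const parts = string.split(':');
const pairs = [];
for (let i = 0; i < parts.length - 1; i++) {
  // Keep a pair only when both neighbours are valid tokens.
  if (token.test(parts[i]) && token.test(parts[i + 1])) {
    pairs.push(parts[i] + ':' + parts[i + 1]);
  }
}
console.log(pairs);
// ["A1B1Y:A1B2Y", "A1B2Y:A1B3Y", "A1B5Y:A1B6Y", "A1B6Y:A1B7Y", "A1B9Y:A1B10Y", "A1B10Y:A1B11Y"]
```

Anchoring the token with ^ and $ makes this slightly stricter than the original pattern, which could also match in the middle of a longer segment.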

Prevent regex group from including previous character?

I'm attempting to get any words starting with #, such as in "#word", but only get the "word" value.
My sample text is:
#bob asodija qwwiq qwe #john #cat asdasd#qeqwe
My current regex is:
/\B#(\w+)/gi
This works perfectly, except that "#" is still being captured. The output of this match is:
"#bob"
"#john"
"#cat"
I've tried putting the # in a non-capturing group, but it's still including the # in the results.
/\B(?:#)(\w+)/gi

You want to use the match array returned from exec
var teststr = '#bob asodija qwwiq qwe #john #cat asdasd#qeqwe';
var exp = /\B#(\w+)/gi;
var match = exp.exec(teststr);
while(match != null){
alert(match[1]); // match 1 = 1st group captured
match = exp.exec(teststr);
}

Here's a neat trick using the String.replace method, which can take a function as the replacement.
var matches = [];
var str = "#bob asodija qwwiq qwe #john #cat asdasd#qeqwe";
str.replace( /\B#(\w+)/g, function( all, firstCaptureGroup ) {
matches.push( firstCaptureGroup );
});
console.log( matches ); //["bob", "john", "cat"]

Here is a better solution that needs no extra processing beyond the regular expression itself (it relies on lookbehind, which requires an ES2018-capable engine):
(?<=\B#)(\w+)
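A minimal sketch of how that pattern could be used, assuming an engine with ES2018 lookbehind support. Since the # is checked by the lookbehind rather than consumed, a global match() returns the bare words directly, so the capture group is no longer needed:

```javascript
const str = '#bob asodija qwwiq qwe #john #cat asdasd#qeqwe';

// The lookbehind requires '#' at a non-word boundary without consuming it.
const words = str.match(/(?<=\B#)\w+/g);

console.log(words); // ["bob", "john", "cat"]
```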
