Extract string when preceding number or combo of preceding characters is unknown

Extract string when preceding number or combo of preceding characters is unknown - javascript

Here's an example string:
++++#foo+bar+baz++#yikes
I need to extract foo and only foo from there or a similar scenario.
The + and the # are the only characters I need to worry about.
However, regardless of what precedes foo, it needs to be stripped or ignored. Everything else after it needs to as well.

try this:
/\++#(\w+)/
and catch the capturing group one.

You can simply use the match() method.
var str = "++++#foo+bar+baz++#yikes";
var res = str.match(/\w+/g);
console.log(res[0]); // foo
console.log(res); // foo,bar,baz,yikes
Or use exec
var str = "++++#foo+bar+baz++#yikes";
var match = /(\w+)/.exec(str);
alert(match[1]); // foo
Using exec with a g modifier (global) is meant to be used in a loop getting all sub matches.
var str = "++++#foo+bar+baz++#yikes";
var re = /\w+/g;
var match;
while (match = re.exec(str)) {
// In array form, match is now your next match..
}

How exactly do + and # play a role in identifying foo? If you just want any string that follows # and is terminated by + that's as simple as:
var foostring = '++++#foo+bar+baz++#yikes';
var matches = (/\#([^+]+)\+/g).exec(foostring);
if (matches.length > 1) {
// all the matches are found in elements 1 .. length - 1 of the matches array
alert('found ' + matches[1] + '!'); // alerts 'found foo!'
}
To help you more specifically, please provide information about the possible variations of your data and how you would go about identifying the token you want to extract even in cases of differing lengths and characters.
If you are just looking for the first segment of text preceded and followed by any combination of + and #, then use:
var foostring = '++++#foo+bar+baz++#yikes';
var result = foostring.match(/[^+#]+/);
// will be the single-element array, ['foo'], or null.
Depending on your data, using \w may be too restrictive as it is equivalent to [a-zA-z0-9_]. Does your data have anything else such as punctuation, dashes, parentheses, or any other characters that you do want to include in the match? Using the negated character class I suggest will catch every token that does not contain a + or a #.

Related

Regex match cookie value and remove hyphens

I'm trying to extract out a group of words from a larger string/cookie that are separated by hyphens. I would like to replace the hyphens with a space and set to a variable. Javascript or jQuery.
As an example, the larger string has a name and value like this within it:
facility=34222%7CConner-Department-Store;
(notice the leading "C")
So first, I need to match()/find facility=34222%7CConner-Department-Store; with regex. Then break it down to "Conner Department Store"
var cookie = document.cookie;
var facilityValue = cookie.match( REGEX ); ??

var test = "store=874635%7Csomethingelse;facility=34222%7CConner-Department-Store;store=874635%7Csomethingelse;";
var test2 = test.replace(/^(.*)facility=([^;]+)(.*)$/, function(matchedString, match1, match2, match3){
return decodeURIComponent(match2);
});
console.log( test2 );
console.log( test2.split('|')[1].replace(/[-]/g, ' ') );

If I understood it correctly, you want to make a phrase by getting all the words between hyphens and disallowing two successive Uppercase letters in a word, so I'd prefer using Regex in that case.
This is a Regex solution, that works dynamically with any cookies in the same format and extract the wanted sentence from it:
var matches = str.match(/([A-Z][a-z]+)-?/g);
console.log(matches.map(function(m) {
return m.replace('-', '');
}).join(" "));
Demo:
var str = "facility=34222%7CConner-Department-Store;";
var matches = str.match(/([A-Z][a-z]+)-?/g);
console.log(matches.map(function(m) {
return m.replace('-', '');
}).join(" "));
Explanation:
Use this Regex (/([A-Z][a-z]+)-?/g to match the words between -.
Replace any - occurence in the matched words.
Then just join these matches array with white space.

Ok,
first, you should decode this string as follows:
var str = "facility=34222%7CConner-Department-Store;"
var decoded = decodeURIComponent(str);
// decoded = "facility=34222|Conner-Department-Store;"
Then you have multiple possibilities to split up this string.
The easiest way is to use substring()
var solution1 = decoded.substring(decoded.indexOf('|') + 1, decoded.length)
// solution1 = "Conner-Department-Store;"
solution1 = solution1.replace('-', ' ');
// solution1 = "Conner Department Store;"
As you can see, substring(arg1, arg2) returns the string, starting at index arg1 and ending at index arg2. See Full Documentation here
If you want to cut the last ; just set decoded.length - 1 as arg2 in the snippet above.
decoded.substring(decoded.indexOf('|') + 1, decoded.length - 1)
//returns "Conner-Department-Store"
or all above in just one line:
decoded.substring(decoded.indexOf('|') + 1, decoded.length - 1).replace('-', ' ')
If you want still to use a regular Expression to retrieve (perhaps more) data out of the string, you could use something similar to this snippet:
var solution2 = "";
var regEx= /([A-Za-z]*)=([0-9]*)\|(\S[^:\/?#\[\]\#\;\,']*)/;
if (regEx.test(decoded)) {
solution2 = decoded.match(regEx);
/* returns
[0:"facility=34222|Conner-Department-Store",
1:"facility",
2:"34222",
3:"Conner-Department-Store",
index:0,
input:"facility=34222|Conner-Department-Store;"
length:4] */
solution2 = solution2[3].replace('-', ' ');
// "Conner Department Store"
}
I have applied some rules for the regex to work, feel free to modify them according your needs.
facility can be any Word built with alphabetical characters lower and uppercase (no other chars) at any length
= needs to be the char =
34222 can be any number but no other characters
| needs to be the char |
Conner-Department-Store can be any characters except one of the following (reserved delimiters): :/?#[]#;,'
Hope this helps :)
edit: to find only the part
facility=34222%7CConner-Department-Store; just modify the regex to
match facility= instead of ([A-z]*)=:
/(facility)=([0-9]*)\|(\S[^:\/?#\[\]\#\;\,']*)/

You can use cookies.js, a mini framework from MDN (Mozilla Developer Network).
Simply include the cookies.js file in your application, and write:
docCookies.getItem("Connor Department Store");

Extract word between '=' and '('

I have the following string
234234=AWORDHERE('sdf.'aa')
where I need to extract AWORDHERE.
Sometimes there can be space in between.
234234= AWORDHERE('sdf.'aa')
Can I do this with a regular expression?
Or should I do it manually by finding indexes?
The datasets are huge, so it's important to do it as fast as possible.

Try this regex:
\d+=\s?(\w+)\(
Check Demo
in Javascript it would like that:
var myString = "234234=AWORDHERE('sdf.'aa')";// or 234234= AWORDHERE('sdf.'aa')
var myRegexp = /\d+=\s?(\w+)\(/g;
var match = myRegexp.exec(myString);
console.log(match[1]); // AWORDHERE

You could do this at least three ways. You need to benchmark to see what's fastest.
Substring w/ indexes
function extract(from) {
var ixEq = from.indexOf("=");
var ixParen = from.indexOf("(");
return from.substring(ixEq + 1, ixParen);
}
.
Splits
function extract(from) {
var spEq = from.split("=");
var spParen = spEq[1].split("(");
return spParen[0];
}
Regex (demo)
Here is some sample regex you could use
/[^=]+=([^(]+).*/g
This says
[^=]+ - One or more character which is not an =
= - The = itself
( - creates a matching group so you can access your match in code
[^(]+ - One or more character which is not a (
) - closes the matching group
.* - Matches the rest of the line
the /g on the end tells it to perform the match on all lines.

Using look around you can search for string preceded by = and followed by ( as following.
Regex: (?<==)[A-Z ]+(?=\()
Explanation:
(?<==) checks if [A-Z ] is preceded by an =.
[A-Z ]+ matches your pattern.
(?=\() checks if matched pattern is followed by a (.
Regex101 Demo

var str = "234234= AWORDHERE('sdf.'aa')";
var regexp = /.*=\s+(\w+)\(.*\)/g;
var match = regexp.exec(str);
alert( match[1] );

I made my solution for this just a little more general than you asked for, but I don't think it takes much more time to execute. I didn't measure. If you need greater efficiency than this provides, comment and I or someone else can help you with that.
Here's what I did, using the command prompt of node:
> var s = "234234= AWORDHERE('sdf.'aa')"
undefined
> var a = s.match(/(\w+)=\s*(\w+)\s*\(.*/)
undefined
> a
[ '234234= AWORDHERE(\'sdf.\'aa\')',
'234234',
'AWORDHERE',
index: 0,
input: '234234= AWORDHERE(\'sdf.\'aa\')' ]
>
As you can see, this matches the number before the = in a[1], and it matches the AWORDHERE name as you requested in a[2]. This will work with any number (including zero) spaces before and/or after the =.

Regex extracting multiple matches for string [duplicate]

I'm trying to obtain all possible matches from a string using regex with javascript. It appears that my method of doing this is not matching parts of the string that have already been matched.
Variables:
var string = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y';
var reg = /A[0-9]+B[0-9]+Y:A[0-9]+B[0-9]+Y/g;
Code:
var match = string.match(reg);
All matched results I get:
A1B1Y:A1B2Y
A1B5Y:A1B6Y
A1B9Y:A1B10Y
Matched results I want:
A1B1Y:A1B2Y
A1B2Y:A1B3Y
A1B5Y:A1B6Y
A1B6Y:A1B7Y
A1B9Y:A1B10Y
A1B10Y:A1B11Y
In my head, I want A1B1Y:A1B2Y to be a match along with A1B2Y:A1B3Y, even though A1B2Y in the string will need to be part of two matches.

Without modifying your regex, you can set it to start matching at the beginning of the second half of the match after each match using .exec and manipulating the regex object's lastIndex property.
var string = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y';
var reg = /A[0-9]+B[0-9]+Y:A[0-9]+B[0-9]+Y/g;
var matches = [], found;
while (found = reg.exec(string)) {
matches.push(found[0]);
reg.lastIndex -= found[0].split(':')[1].length;
}
console.log(matches);
//["A1B1Y:A1B2Y", "A1B2Y:A1B3Y", "A1B5Y:A1B6Y", "A1B6Y:A1B7Y", "A1B9Y:A1B10Y", "A1B10Y:A1B11Y"]
Demo
As per Bergi's comment, you can also get the index of the last match and increment it by 1 so it instead of starting to match from the second half of the match onwards, it will start attempting to match from the second character of each match onwards:
reg.lastIndex = found.index+1;
Demo
The final outcome is the same. Though, Bergi's update has a little less code and performs slightly faster. =]

You cannot get the direct result from match, but it is possible to produce the result via RegExp.exec and with some modification to the regex:
var regex = /A[0-9]+B[0-9]+Y(?=(:A[0-9]+B[0-9]+Y))/g;
var input = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y'
var arr;
var results = [];
while ((arr = regex.exec(input)) !== null) {
results.push(arr[0] + arr[1]);
}
I used zero-width positive look-ahead (?=pattern) in order not to consume the text, so that the overlapping portion can be rematched.
Actually, it is possible to abuse replace method to do achieve the same result:
var input = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y'
var results = [];
input.replace(/A[0-9]+B[0-9]+Y(?=(:A[0-9]+B[0-9]+Y))/g, function ($0, $1) {
results.push($0 + $1);
return '';
});
However, since it is replace, it does extra useless replacement work.

Unfortunately, it's not quite as simple as a single string.match.
The reason is that you want overlapping matches, which the /g flag doesn't give you.
You could use lookahead:
var re = /A\d+B\d+Y(?=:A\d+B\d+Y)/g;
But now you get:
string.match(re); // ["A1B1Y", "A1B2Y", "A1B5Y", "A1B6Y", "A1B9Y", "A1B10Y"]
The reason is that lookahead is zero-width, meaning that it just says whether the pattern comes after what you're trying to match or not; it doesn't include it in the match.
You could use exec to try and grab what you want. If a regex has the /g flag, you can run exec repeatedly to get all the matches:
// using re from above to get the overlapping matches
var m;
var matches = [];
var re2 = /A\d+B\d+Y:A\d+B\d+Y/g; // make another regex to get what we need
while ((m = re.exec(string)) !== null) {
// m is a match object, which has the index of the current match
matches.push(string.substring(m.index).match(re2)[0]);
}
matches == [
"A1B1Y:A1B2Y",
"A1B2Y:A1B3Y",
"A1B5Y:A1B6Y",
"A1B6Y:A1B7Y",
"A1B9Y:A1B10Y",
"A1B10Y:A1B11Y"
];
Here's a fiddle of this in action. Open up the console to see the results
Alternatively, you could split the original string on :, then loop through the resulting array, pulling out the the ones that match when array[i] and array[i+1] both match like you want.

How to match one, but not two characters using regular expressions

Using javascript regular expressions, how do you match one character while ignoring any other characters that also match?
Example 1: I want to match $, but not $$ or $$$.
Example 2: I want to match $$, but not $$$.
A typical string that is being tested is, "$ $$ $$$ asian italian"
From a user experience perspective, the user selects, or deselects, a checkbox whose value matches tags found in in a list of items. All the tags must be matched (checked) for the item to show.
function filterResults(){
// Make an array of the checked inputs
var aInputs = $('.listings-inputs input:checked').toArray();
// alert(aInputs);
// Turn that array into a new array made from each items value.
var aValues = $.map(aInputs, function(i){
// alert($(i).val());
return $(i).val();
});
// alert(aValues);
// Create new variable, set the value to the joined array set to lower case.
// Use this variable as the string to test
var sValues = aValues.join(' ').toLowerCase();
// alert(sValues);
// sValues = sValues.replace(/\$/ig,'\\$');
// alert(sValues);
// this examines each the '.tags' of each item
$('.listings .tags').each(function(){
var sTags = $(this).text();
// alert(sTags);
sSplitTags = sTags.split(' \267 '); // JavaScript uses octal encoding for special characters
// alert(sSplitTags);
// sSplitTags = sTags.split(' \u00B7 '); // This also works
var show = true;
$.each(sSplitTags, function(i,tag){
if(tag.charAt(0) == '$'){
// alert(tag);
// alert('It begins with a $');
// You have to escape special characters for the RegEx
tag = tag.replace(/\$/ig,'\\$');
// alert(tag);
}
tag = '\\b' + tag + '\\b';
var re = new RegExp(tag,'i');
if(!(re.test(sValues))){
alert(tag);
show = false;
alert('no match');
return false;
}
else{
alert(tag);
show = true;
alert('match');
}
});
if(show == false){
$(this).parent().hide();
}
else{
$(this).parent().show();
}
});
// call the swizzleRows function in the listings.js
swizzleList();
}
Thanks in advance!

Normally, with regex, you can use (?<!x)x(?!x) to match an x that is not preceded nor followed with x.
With the modern ECMAScript 2018+ compliant JS engines, you may use lookbehind based regex:
(?<!\$)\$(?!\$)
See the JS demo (run it in supported browsers only, their number is growing, check the list here):
const str ="$ $$ $$$ asian italian";
const regex = /(?<!\$)\$(?!\$)/g;
console.log( str.match(regex).length ); // Count the single $ occurrences
console.log( str.replace(regex, '<span>$&</span>') ); // Enclose single $ occurrences with tags
console.log( str.split(regex) ); // Split with single $ occurrences

\bx\b
Explanation: Matches x between two word boundaries (for more on word boundaries, look at this tutorial). \b includes the start or end of the string.
I'm taking advantage of the space delimiting in your question. If that is not there, then you will need a more complex expression like (^x$|^x[^x]|[^x]x[^x]|[^x]x$) to match different positions possibly at the start and/or end of the string. This would limit it to single character matching, whereas the first pattern matches entire tokens.
The alternative is just to tokenize the string (split it at spaces) and construct an object from the tokens which you can just look up to see if a given string matched one of the tokens. This should be much faster per-lookup than regex.

Something like that:
q=re.match(r"""(x{2})($|[^x])""", 'xx')
q.groups() ('xx', '')
q=re.match(r"""(x{2})($|[^x])""", 'xxx')
q is None True

How can I remove all characters up to and including the 3rd slash in a string?

I'm having trouble with removing all characters up to and including the 3 third slash in JavaScript. This is my string:
http://blablab/test
The result should be:
test
Does anybody know the correct solution?

To get the last item in a path, you can split the string on / and then pop():
var url = "http://blablab/test";
alert(url.split("/").pop());
//-> "test"
To specify an individual part of a path, split on / and use bracket notation to access the item:
var url = "http://blablab/test/page.php";
alert(url.split("/")[3]);
//-> "test"
Or, if you want everything after the third slash, split(), slice() and join():
var url = "http://blablab/test/page.php";
alert(url.split("/").slice(3).join("/"));
//-> "test/page.php"

var string = 'http://blablab/test'
string = string.replace(/[\s\S]*\//,'').replace(/[\s\S]*\//,'').replace(/[\s\S]*\//,'')
alert(string)
This is a regular expression. I will explain below
The regex is /[\s\S]*\//
/ is the start of the regex
Where [\s\S] means whitespace or non whitespace (anything), not to be confused with . which does not match line breaks (. is the same as [^\r\n]).
* means that we match anywhere from zero to unlimited number of [\s\S]
\/ Means match a slash character
The last / is the end of the regex

var str = "http://blablab/test";
var index = 0;
for(var i = 0; i < 3; i++){
index = str.indexOf("/",index)+1;
}
str = str.substr(index);
To make it a one liner you could make the following:
str = str.substr(str.indexOf("/",str.indexOf("/",str.indexOf("/")+1)+1)+1);

You can use split to split the string in parts and use slice to return all parts after the third slice.
var str = "http://blablab/test",
arr = str.split("/");
arr = arr.slice(3);
console.log(arr.join("/")); // "test"
// A longer string:
var str = "http://blablab/test/test"; // "test/test";

You could use a regular expression like this one:
'http://blablab/test'.match(/^(?:[^/]*\/){3}(.*)$/);
// -> ['http://blablab/test', 'test]
A string’s match method gives you either an array (of the whole match, in this case the whole input, and of any capture groups (and we want the first capture group)), or null. So, for general use you need to pull out the 1th element of the array, or null if a match wasn’t found:
var input = 'http://blablab/test',
re = /^(?:[^/]*\/){3}(.*)$/,
match = input.match(re),
result = match && match[1]; // With this input, result contains "test"

let str = "http://blablab/test";
let data = new URL(str).pathname.split("/").pop();
console.log(data);

Develop Reference

JavaScript is the programming language of the Web.

Extract string when preceding number or combo of preceding characters is unknown - javascript

try this: /\++#(\w+)/ and catch the capturing group one.

Related

Regex match cookie value and remove hyphens

Extract word between '=' and '('

Regex extracting multiple matches for string [duplicate]

How to match one, but not two characters using regular expressions

How can I remove all characters up to and including the 3rd slash in a string?

Categories

Resources