regex pattern Parse multiple strings - javascript

I need to filter a collection of strings based on a rather complex query. I have the query input as a string:
var query = "ti,su,ab(((study OR trail OR research pre/2 challeng*) n/1 (design* AND method*)) (behavior* n/1 behaviour*) OR ((behavior* or behaviour*)n/1 (change* near/6 modification*)))";
The query can change.
From this query input string I want to collect just the important words.
The result that I expect: study trail research challeng* n/1design* method* behavior* behaviour* behavior* behaviour* n/1change* modification*
My result: study trail research challeng* design* method*behavior* behaviour* behavior* behaviour*change* modification*
My problem is that sometimes two words get concatenated, for example method*behavior* and behaviour*change*, which is wrong.
DEMO:
This is my regexp. It deletes these words from the query: ti, ab, su, AND, OR, NEAR/n, P/n, PRE/n and n/n, plus the brackets () and the comma ,
/ ?[()]|\b(AND|OR|(NEAR|n|PRE|P)\/\d+)(\s|$)|\b(ti|ab|su|,)\b ?/gi
var query = "ti,ab,su(((study OR trail OR research pre/2 challeng*) n/1 (design* AND method*)) (behavior* n/1 behaviour*) OR ((behavior* or behaviour*)n/1 (change* near/6 modification*)))";
var subst= "";
var str = query.replace(/ ?[()]|\b(AND|OR|(NEAR|n|PRE|P)\/\d+)(\s|$)|\b(ti|ab|su|,)\b ?/gi,subst);
console.log(str)
Every single word needs to be separated by whitespace.
I'm looking for your suggestions.
Thanks.

Replace the matched parts with a space instead of an empty string, then collapse the repeated spaces afterwards.
var query = "ti,ab,su(((study OR trail OR research pre/2 challeng*) n/1 (design* AND method*)) (behavior* n/1 behaviour*) OR ((behavior* or behaviour*)n/1 (change* near/6 modification*)))";
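var subst= " "; // replace each match with a space so neighbouring words stay apart
var str = query.replace(/ ?[()]|\b(AND|OR|(NEAR|n|PRE|P)\/\d+)(\s|$)|\b(ti|ab|su|,)\b ?/gi,subst);
// trim both ends first (" +$" must come before "( ) +", otherwise a trailing run
// collapses to one space instead of being removed), then collapse inner runs to one space
str = str.replace(/^ +| +$|( ) +/g,"$1");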
console.log(str)
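For comparison, here is a sketch of a different technique (not the one from the answer above): instead of deleting operators with one large regex, split the query into tokens on brackets, commas and whitespace, then drop the tokens that are operators or field codes. The function name extractTerms and the exact operator list (AND, OR, NEAR/n, PRE/n, P/n, N/n and the field codes ti, ab, su) are assumptions taken from the question; adjust them if your query syntax has more.

```javascript
// Hypothetical helper: pull the search terms out of a boolean query string.
// Assumed syntax: field codes ti/ab/su, operators AND, OR, NEAR/n, PRE/n, P/n, N/n.
const OPERATOR = /^(AND|OR|(NEAR|PRE|P|N)\/\d+)$/i;
const FIELD = /^(ti|ab|su)$/i;

function extractTerms(query) {
  return query
    .split(/[\s(),]+/) // brackets, commas and whitespace all act as separators
    .filter((token) => token !== "" && !OPERATOR.test(token) && !FIELD.test(token));
}

const query = "ti,ab,su(((study OR trail OR research pre/2 challeng*) n/1 (design* AND method*)) (behavior* n/1 behaviour*) OR ((behavior* or behaviour*)n/1 (change* near/6 modification*)))";
console.log(extractTerms(query).join(" "));
```

Because every bracket acts as a separator, words cannot be glued together the way method*behavior* was. The trade-off is that a search term which itself contains one of those separator characters would be split apart.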

Related

JavaScript get first name and last name from string as array

I have a string that has the following format: <strong>FirstName LastName</strong>
How can I change this into an array with the first element firstName and second lastName?
I did this, but no luck, it won't produce the right result:
var data = [myString.split('<strong>')[1], myString.split('<strong>')[2]]
How can I produce ["firstName", "lastName"] for any string with that format?
In order to parse HTML, use the best HTML parser out there, the DOM itself!
// create a random element, it doesn't have to be 'strong' (e.g., it could be 'div')
var parser = document.createElement('strong');
// set the innerHTML to your string
parser.innerHTML = "<strong>FirstName LastName</strong>";
// get the text inside the element ("FirstName LastName")
var fullName = parser.textContent;
// split it into an array, separated by the space in between FirstName and LastName
var data = fullName.split(" ");
// voila!
console.log(data);
EDIT
As @RobG pointed out, you could also use a DOMParser explicitly rather than relying on an element's innerHTML:
var parser = new DOMParser();
var doc = parser.parseFromString("<strong>FirstName LastName</strong>", "text/html");
console.log(doc.body.textContent.split(" "));
However, both methods work perfectly fine; it all comes down to preference.
Just match everything between <strong> and </strong>.
var matches = "<strong>FirstName LastName</strong>".match(/<strong>(.*)<\/strong>/);
console.log(matches[1].split(' '));
The preferred approach would be to use DOM methods: create an element, read its .textContent, then match one or more word characters or split on the space character. Without the DOM, you can split the string directly:
let str = '<strong>FirstName LastName</strong>';
let [,first, last] = str.split(/<[/\w\s-]+>|\s/g);
console.log(first, last);
/<[/\w\s-]+>|\s/g
The regex splits on < followed by one or more word, slash, whitespace or dash characters and then >, or on a single whitespace character (the space between the two words).
The leading comma in the destructuring pattern [,first, last] is an elision, not the comma operator: it skips the first element of the .split() result ["", "FirstName", "LastName", ""].
This is my approach to your problem. Hope it helps!
var str = "<strong>FirstName LastName</strong>";
var result = str.slice(0, -9).substr(8).split(" ");
Edit: it will only work for this specific example.
Another way to do this, in case your input is something other than HTML:
var string = "<strong>FirstName LastName</strong>";
string = string.slice(0, -9); // remove last 9 chars
string = string.substr(8); // remove first 8 chars
string = string.split(" "); // split into an array at space
console.log(string);
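The slice answers above hard-code 8 and 9, the lengths of <strong> and </strong>. A minimal sketch that derives those offsets from the tag name instead; innerWords is a hypothetical name, and it still assumes the whole input is a single <tag>text</tag> with no attributes.

```javascript
// Hypothetical helper: generalize the slice approach by computing the tag lengths
// instead of hard-coding 8 and 9. Assumes the input is exactly "<tag>text</tag>".
function innerWords(html, tag) {
  const open = "<" + tag + ">";
  const close = "</" + tag + ">";
  if (!html.startsWith(open) || !html.endsWith(close)) {
    return null; // not in the expected format
  }
  return html
    .slice(open.length, html.length - close.length) // drop the opening and closing tags
    .trim()
    .split(/\s+/); // split on any run of whitespace
}

console.log(innerWords("<strong>FirstName LastName</strong>", "strong"));
```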

Regex match cookie value and remove hyphens

I'm trying to extract out a group of words from a larger string/cookie that are separated by hyphens. I would like to replace the hyphens with a space and set to a variable. Javascript or jQuery.
As an example, the larger string has a name and value like this within it:
facility=34222%7CConner-Department-Store;
(notice the leading "C")
So first, I need to match()/find facility=34222%7CConner-Department-Store; with regex. Then break it down to "Conner Department Store"
var cookie = document.cookie;
var facilityValue = cookie.match( REGEX ); ??
var test = "store=874635%7Csomethingelse;facility=34222%7CConner-Department-Store;store=874635%7Csomethingelse;";
var test2 = test.replace(/^(.*)facility=([^;]+)(.*)$/, function(matchedString, match1, match2, match3){
return decodeURIComponent(match2);
});
console.log( test2 );
console.log( test2.split('|')[1].replace(/[-]/g, ' ') );
If I understood it correctly, you want to build a phrase from the words between the hyphens, where each word is a single uppercase letter followed by lowercase letters (so the stray C from %7C is not picked up). I'd use a regex in that case.
This regex solution works with any cookie in the same format and extracts the wanted phrase from it:
var str = "facility=34222%7CConner-Department-Store;";
var matches = str.match(/([A-Z][a-z]+)-?/g);
console.log(matches.map(function(m) {
return m.replace('-', '');
}).join(" "));
Explanation:
Use the regex /([A-Z][a-z]+)-?/g to match each capitalized word together with the hyphen after it, if there is one.
Remove the - from each matched word.
Then join the resulting array with spaces.
Ok,
first, you should decode this string as follows:
var str = "facility=34222%7CConner-Department-Store;"
var decoded = decodeURIComponent(str);
// decoded = "facility=34222|Conner-Department-Store;"
Then you have multiple possibilities to split up this string.
The easiest way is to use substring()
var solution1 = decoded.substring(decoded.indexOf('|') + 1, decoded.length)
// solution1 = "Conner-Department-Store;"
solution1 = solution1.replace(/-/g, ' '); // the g flag replaces every hyphen, not just the first
// solution1 = "Conner Department Store;"
As you can see, substring(arg1, arg2) returns the part of the string starting at index arg1 and ending just before index arg2 (see the MDN documentation for String.prototype.substring).
If you want to cut off the trailing ;, just pass decoded.length - 1 as arg2 in the snippet above:
decoded.substring(decoded.indexOf('|') + 1, decoded.length - 1)
//returns "Conner-Department-Store"
or all above in just one line:
decoded.substring(decoded.indexOf('|') + 1, decoded.length - 1).replace(/-/g, ' ')
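One detail to keep in mind with replace: when String.prototype.replace is given a plain string as the pattern, it replaces only the first occurrence. A quick comparison using the value from this question:

```javascript
// replace() with a string pattern changes only the FIRST occurrence;
// a regex with the g flag, or replaceAll() (ES2021), changes every occurrence.
const value = "Conner-Department-Store";

const firstOnly = value.replace("-", " ");
const withGlobalRegex = value.replace(/-/g, " ");
const withReplaceAll = value.replaceAll("-", " ");

console.log(firstOnly);       // Conner Department-Store
console.log(withGlobalRegex); // Conner Department Store
console.log(withReplaceAll);  // Conner Department Store
```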
If you still want to use a regular expression to retrieve (perhaps more) data from the string, you could use something like this:
var solution2 = "";
var regEx= /([A-Za-z]*)=([0-9]*)\|(\S[^:\/?#\[\]\#\;\,']*)/;
if (regEx.test(decoded)) {
solution2 = decoded.match(regEx);
/* returns
[0:"facility=34222|Conner-Department-Store",
1:"facility",
2:"34222",
3:"Conner-Department-Store",
index:0,
input:"facility=34222|Conner-Department-Store;"
length:4] */
solution2 = solution2[3].replace(/-/g, ' '); // replace every hyphen
// "Conner Department Store"
}
I have applied some rules for the regex to work; feel free to modify them according to your needs.
facility can be any word made of lowercase and uppercase letters (no other characters), of any length
= needs to be the character =
34222 can be any sequence of digits, but no other characters
| needs to be the character |
Conner-Department-Store can be any characters except the following reserved delimiters: : / ? # [ ] ; , '
Hope this helps :)
Edit: to match only the facility=34222%7CConner-Department-Store; part (after decoding, facility=34222|Conner-Department-Store), modify the regex to match facility= instead of ([A-Za-z]*)=:
/(facility)=([0-9]*)\|(\S[^:\/?#\[\]\#\;\,']*)/
You can use cookies.js, a small library from MDN (Mozilla Developer Network).
Simply include the cookies.js file in your application, and look the cookie up by its name:
docCookies.getItem("facility"); // pass the cookie's name; the value still needs the "|" split and hyphen replacement
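If you would rather not include a library, a minimal sketch of the same lookup in plain JavaScript: find the cookie by name, decode it, drop everything up to the |, and turn every hyphen into a space. The function name cookieWords is hypothetical, and it assumes the id|Words-With-Hyphens value format from the question; in a browser you would pass document.cookie as the first argument.

```javascript
// Hypothetical helper: look up one cookie by name and turn its
// "id|Words-With-Hyphens" value into "Words With Hyphens".
function cookieWords(cookieString, name) {
  for (const pair of cookieString.split(";")) {
    const eq = pair.indexOf("=");
    if (eq === -1) continue;                       // skip empty or malformed entries
    if (pair.slice(0, eq).trim() !== name) continue; // document.cookie uses "; " separators
    const value = decodeURIComponent(pair.slice(eq + 1)); // "%7C" becomes "|"
    const bar = value.indexOf("|");
    return value.slice(bar + 1).replace(/-/g, " "); // replace every hyphen
  }
  return null; // no cookie with that name
}

const test = "store=874635%7Csomethingelse;facility=34222%7CConner-Department-Store;store=874635%7Csomethingelse;";
console.log(cookieWords(test, "facility"));
```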

String.slice and string.substring

I am brand new at programming, especially JS. I seem to be stuck on a split string.
I need to split a string into two separate strings. I know I can do so with slice and substr, as in the sample below. Assuming my name is Paul Johnson, the parameters I set up give me Paul as the first name and Johnson as the last name.
var str = document.getElementById("fullName").value;
var firstname = str.slice(0, 4);
var lastname = str.substr(4, 13);
My question: I'm getting hung up on how to find the space and split the string there, so that both the first name and the last name are cut cleanly.
Are there any good resources that clearly define how I can do that?
Thanks!
What you're after is String split. It will let you split on spaces.
http://www.w3schools.com/jsref/jsref_split.asp
var str = "John Smith";
var res = str.split(" ");
This returns the array ['John', 'Smith'].
str.indexOf(' ') returns the index of the first space.
There is a string split() method in Javascript, and you can split on the space in any two-word name like so:
var splitName = str.split(" ");
var firstName = splitName[0];
var lastName = splitName[1];
http://www.w3schools.com/jsref/jsref_split.asp
A good way to parse strings that are space separated is as follows:
pieces = string.split(' ')
Pieces will then contain an array of all the different strings. Check out the following example:
string_to_parse = 'this,is,a,comma,separated,list';
strings = string_to_parse.split(',');
alert(strings[3]); // makes an alert box containing the string "comma"
Use str.split().
The syntax of split is: string.split(separator,limit)
split() returns an array of strings.
Called without a separator, split() does not split on whitespace: it returns an array containing the whole string. Pass " " as the separator to split on spaces.
Example:
var str = "Your Name";
var pieces = str.split(" ");
var firstName = pieces[0];
var lastName = pieces[1];
pieces will be equal to ['Your', 'Name'].
firstName will be equal to 'Your'.
lastName will be equal to 'Name'.
I figured it out:
var str = document.getElementById("fullName").value;
var space = str.indexOf(" ");
var firstname = str.slice(0, space);
var lastname = str.substr(space + 1); // + 1 skips the space itself
Thank you all!
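Building on the indexOf approach above, here is a slightly more defensive sketch. The function name splitName and its behaviour (trimming the input, treating a name with no space as a first name only, and keeping everything after the first space as the last name) are assumptions, not something the question specifies.

```javascript
// Hypothetical helper: split a full name at the first space using indexOf and slice.
function splitName(fullName) {
  const name = fullName.trim();
  const space = name.indexOf(" ");
  if (space === -1) {
    return { first: name, last: "" }; // no space: treat the whole input as a first name
  }
  return {
    first: name.slice(0, space),        // characters before the space
    last: name.slice(space + 1).trim(), // skip the space itself
  };
}

console.log(splitName("Paul Johnson"));
```

Keeping everything after the first space means a name such as "Dick van Dyke" yields the last name "van Dyke" rather than just "van".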

How to remove the last matched regex pattern in javascript

I have a text which goes like this...
var string = '~a=123~b=234~c=345~b=456'
I need to extract the string such that it splits into
['~a=123~b=234~c=345','']
That is, I need to split the string on the /b=.*/ pattern, but only at its last occurrence. How can I achieve this using RegEx?
Note: the numbers after the equals sign are randomly generated.
Edit:
The above one was just an example. I did not make the question clear I guess.
Generalized String being...
<word1>=<random_alphanumeric_word>~<word2>=<random_alphanumeric_word>..~..~..<word2>=<random_alphanumeric_word>
All values have random lengths, every <wordN> consists only of letters, and the length of the whole string is not fixed. The only known text is <word2>, hence I need a RegEx for it, with the pattern being /<word2>=.*/
This doesn't sound like a job for regexen considering that you want to extract a specific piece. Instead, you can just use lastIndexOf to split the string in two:
var lio = str.lastIndexOf('b=');
var arr = [];
arr[0] = str.substr(0, lio); // everything before the last 'b='
arr[1] = str.substr(lio);    // the last 'b=...' pair
http://jsfiddle.net/NJn6j/
I don't think I'd personally use a regex for this type of problem, but you can extract the last option pair with a regex like this:
var str = '~a=123~b=234~c=345~b=456';
var matches = str.match(/^(.*)~([^=]+=[^=]+)$/);
// matches[1] = "~a=123~b=234~c=345"
// matches[2] = "b=456"
Demo: http://jsfiddle.net/jfriend00/SGMRC/
This assumes the format is (~, alphanumeric name, =, numbers) repeated an arbitrary number of times. The most important assumption is that ~ appears exactly once per name-value pair and never inside a name.
You can remove the last token by a simple replacement:
str.replace(/(.*)~.*/, '$1')
This works by using the greedy property of * to force it to match the last ~ in the input.
This can also be achieved with lastIndexOf, since you only need to know the index of the last ~:
str.substring(0, (str.lastIndexOf('~') + 1 || str.length + 1) - 1)
(I'm not sure the code above is idiomatic JS; I would rather write it in a few lines. It just shows a one-liner solution.)
A RegExp that gives a result you might be able to use is:
string.match(/[a-z]*?=(.*?((?=~)|$))/gi);
// ["a=123", "b=234", "c=345", "b=456"]
But in your case the simplest solution is to split the string before extracting the content:
var results = string.split('~'); // ["", "a=123", "b=234", "c=345", "b=456"]
Now it is easy to extract each key and value and add them to an object:
var myObj = {};
results.forEach(function (item) {
if(item) {
var r = item.split('=');
if (!myObj[r[0]]) {
myObj[r[0]] = [r[1]];
} else {
myObj[r[0]].push(r[1]);
}
}
});
console.log(myObj);
Object:
a: ["123"]
b: ["234", "456"]
c: ["345"]
(?=.*(~b=[^~]*))\1
will get it done in one match, but if there are duplicate entries it will go to the first. Performance also isn't great and if you string.replace it will destroy all duplicates. It would pass your example, but against '~a=123~b=234~c=345~b=234' it would go to the first 'b=234'.
.*(~b=[^~]*)
will run a lot faster, but it requires another step because the match comes out in a group:
var re = /.*(~b=[^~]*)/.exec(string);
var result = re[1]; //~b=234
var array = string.split(re[1]);
This method has the same problem with exact duplicates. Another option is:
var regex = /.*(~b=[^~]*)/g;
var re = regex.exec(string);
var result = re[1];
// if you want an array from either side of the string:
var array = [string.slice(0, regex.lastIndex - re[1].length), string.slice(regex.lastIndex, string.length)];
This finds the exact location of the last match and cuts it out: regex.lastIndex is the index just past the match, and because the captured group ends the match, regex.lastIndex - re[1].length is where the last ~b=... begins. For '~a=123~b=234~c=345~b=456' this gives ['~a=123~b=234~c=345', ''].
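For the generalized form in the edit (only <word2> is known), here is a sketch without a regex that searches for the last "~<word2>=" marker with lastIndexOf, along the lines of the first answer. The function name splitAtLastKey is hypothetical; it assumes ~ only ever appears as the pair separator, and it returns everything from the last <word2>= onwards as the second element rather than an empty string.

```javascript
// Hypothetical helper: split "~key=value~key=value..." at the LAST occurrence of a known key.
function splitAtLastKey(str, key) {
  // Including "~" stops key "b" from matching inside a longer name such as "ab".
  const marker = "~" + key + "=";
  const at = str.lastIndexOf(marker);
  if (at === -1) {
    return [str, ""]; // key not present: nothing to split off
  }
  return [str.slice(0, at), str.slice(at + 1)]; // + 1 drops the "~" separator
}

console.log(splitAtLastKey("~a=123~b=234~c=345~b=456", "b"));
```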

Exclude characters from displaying?

I want to exclude characters from displaying in a vbulletin template.
For example, if a user writes:
"[Hello World] How are you?"
I want to exclude "[", "]" and everything inside them, so that it only displays:
"How are you?"
Is there a way to do this?
Use the JavaScript string methods .indexOf() and .substring(). Get the position of the opening bracket and the position of the closing bracket, split the string into 3 substrings (the middle one being the bracketed part), and then join just the first and third substrings. Like this:
var string = "[Hello World] How are you?";
var bracket1 = string.indexOf("[");
var bracket2 = string.indexOf("]");
var substring1 = string.substring(0, bracket1); // text before "["
var substring2 = string.substring(bracket1, bracket2 + 1); // "[Hello World]", the part to drop
var substring3 = string.substring(bracket2 + 1, string.length); // text after "]"
var solution = (substring1 + substring3).trim(); // "How are you?"
At least, that's the concept. Everything may not be right on, but you could futz with the numbers a little to get it perfect.
Or if you don't need to worry about what comes before [], simply use .split():
var string = "[Hello World] How are you?";
var solutionArray = string.split("]");
var solution = solutionArray[1].trim(); // "How are you?" (trim removes the space after "]")
Hope this helps!
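A regex alternative to both answers, as a sketch: remove every [...] block, together with the whitespace after it, in a single replace. stripBracketed is a hypothetical name, and the pattern assumes brackets are never nested (an input like "[a [b] c]" would not be handled correctly).

```javascript
// Hypothetical helper: drop every "[...]" block and any whitespace that follows it.
// [^\]]* matches everything up to the next "]", so nested brackets are not supported.
function stripBracketed(text) {
  return text.replace(/\[[^\]]*\]\s*/g, "").trim();
}

console.log(stripBracketed("[Hello World] How are you?"));
```

Unlike the split("]") version, this also works when the bracketed part is in the middle of the text or appears more than once.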
