JavaScript regular expression matching city name

I have the following array of data named cityList:
var cityList = [
"Anaa, French Polynesia (AAA)",
"Arrabury, Australia (AAB)",
"Al Arish, Egypt (AAC)",
"Ad-Dabbah, Sudan (AAD)",
"Annaba, Algeria (AAE)",
"Apalachicola, United States (AAF)",
"Arapoti, Brazil (AAG)",
"Aachen, Germany (AAH)",
"Arraias, Brazil (AAI)",
"Awaradam, Suriname (AAJ)",
"Aranuka, Kiribati (AAK)",
"Aalborg, Denmark (AAL)"
];
I want to first search the city name starting at the beginning of the string.
Next I want to search the code portion of the string: AAA, AAB, AAC, etc...
I want to apply a search pattern as a javascript regular expression, first to the city name, and second to the city code.
Here are my regular expressions:
// this regular expression is used to search by city name
var matcher = new RegExp("^" + re, "i");
// this regular expression is used to search by city code
var matcher = new RegExp("([(*)])" + re, "i");
How do I combine these two regular expressions into a single regex that works as described?

I suggest this:
var myregexp = /^([^,]+),[^(]*\(([^()]+)\)/;
var match = myregexp.exec(subject); // subject: one entry of cityList
var city, code;
if (match !== null) {
  city = match[1]; // e.g. "Aachen"
  code = match[2]; // e.g. "AAH"
}
Explanation:
^ # Start of string
( # Match and capture (group number 1):
[^,]+ # One or more characters except comma (alternatively insert city name)
) # End of group 1
, # Match a comma
[^(]* # Match any number of characters except an opening parenthesis
\( # Match an opening parenthesis
( # Match and capture (group number 2):
[^()]+ # One or more characters except parentheses (alt. insert city code)
) # End of group 2
\) # Match a closing parenthesis
This assumes that no city name will ever contain a comma (otherwise this regex would only capture the part before the comma), so you'd need to check your data if that's ever possible. I can't think of an example, but that's not saying anything :)
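To use the two capture groups for the search the question asks for, one option is to run the regex on each entry and compare the search term against the captured name and code. A minimal sketch, assuming a case-insensitive prefix match on both the name and the code; filterCities is a hypothetical helper, and a plain string comparison is used so the term needs no regex escaping:

```javascript
// Tim's pattern: group 1 = text before the first comma, group 2 = text inside the parentheses
var cityPattern = /^([^,]+),[^(]*\(([^()]+)\)/;

// Hypothetical helper: keep entries whose city name OR code starts with term
function filterCities(list, term) {
  var needle = term.toLowerCase();
  return list.filter(function (item) {
    var match = cityPattern.exec(item); // no g flag, so exec() is stateless
    if (match === null) {
      return false; // not in "City, Country (CODE)" form
    }
    var city = match[1].toLowerCase();
    var code = match[2].toLowerCase();
    return city.indexOf(needle) === 0 || code.indexOf(needle) === 0;
  });
}

var cityList = [
  "Anaa, French Polynesia (AAA)",
  "Arrabury, Australia (AAB)",
  "Aachen, Germany (AAH)",
  "Aalborg, Denmark (AAL)"
];

console.log(filterCities(cityList, "Aach")); // [ 'Aachen, Germany (AAH)' ]
console.log(filterCities(cityList, "aab"));  // [ 'Arrabury, Australia (AAB)' ]
```

Comparing the captured groups as plain strings keeps the name rule ("starts with") and the code rule separate, which is harder to read in a single combined pattern.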

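$("#leavingCity").autocomplete({
  source: function(req, responseFn) {
    var re = $.ui.autocomplete.escapeRegex(req.term);
    // match the term at the start of the string (city name)
    // or right after the opening parenthesis (city code);
    // no "g" flag: with "g", test() keeps lastIndex between calls and skips entries
    var matcher = new RegExp("^" + re + "|\\(" + re, "i");
    var a = $.grep(cityList, function(item, index) { return matcher.test(item); });
    responseFn(a);
  }
});
Try this; the term is inserted at the two positions suggested in Tim Pietzcker's regex explanation (the start of the city name and the start of the code).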

This is the most elegant way I can do it:
var cityList = ["Anaa, French Polynesia (AAA)","Arrabury, Australia (AAB)","Al Arish, Egypt (AAC)","Ad-Dabbah, Sudan (AAD)","Annaba, Algeria (AAE)","Apalachicola, United States (AAF)","Arapoti, Brazil (AAG)","Aachen, Germany (AAH)","Arraias, Brazil (AAI)","Awaradam, Suriname (AAJ)","Aranuka, Kiribati (AAK)","Aalborg, Denmark (AAL)"];
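var regex = /([a-z].+?),.+?\(([A-Z]{3})\)/gi, match, newList = [];
// exec() expects a string; passing the array would silently join it with commas
var text = cityList.join(",");
while ((match = regex.exec(text)) !== null) {
  newList.push(match[1] + " - " + match[2]);
}
alert(newList[7]);
// shows Aachen - AAH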
If you don't understand how to use parentheses in your regex, I suggest you check out the site I learned from: http://www.regular-expressions.info/

Here I suggest a completely different approach (ECMA-262 standard).
As using the regex requires a linear search anyway, if you can pre-process the data, you can set up an array of city objects:
function City(name, country, code){
this.cityName = name;
this.cityCountry = country;
this.cityCode = code;
}
var cities = [];
cities.push(new City('Anaa', 'French Polynesia', 'AAA'));
// ... push the other cities
And a search function:
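function GetCity(cityToSearch, cities){
  var res = null;
  for (var i = 0; i < cities.length; i++){
    // compare (===) instead of assigning (=), and use the cityName property set by City
    if (cities[i].cityName === cityToSearch) {
      res = cities[i];
      break; // the first match is enough
    }
  }
  return res;
}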
At run time:
var codeFound = '';
var cityFound = GetCity('Arraias', cities);
if(cityFound != null)
codeFound = cityFound.cityCode;
Remark
In both cases, if you are going to fill the cities array with every city in the world, the city name is not a unique key! For instance, there are many places named 'Springfield' in the USA. In that case a better approach is to use a two-field key.
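The remark does not spell out which two fields to use. A minimal sketch, assuming the second field is the country: cities are grouped by name in a Map, and the country narrows the lookup. The helper names (addCity, findCities) and the Springfield entries with their placeholder countries and codes are hypothetical. If two cities shared both name and country, a further field (such as the code) would still be needed.

```javascript
// Same shape as the City constructor above.
function City(name, country, code) {
  this.cityName = name;
  this.cityCountry = country;
  this.cityCode = code;
}

// name -> array of City objects, because a name alone is not unique
var citiesByName = new Map();

function addCity(city) {
  var bucket = citiesByName.get(city.cityName);
  if (bucket === undefined) {
    bucket = [];
    citiesByName.set(city.cityName, bucket);
  }
  bucket.push(city);
}

// Look up by name; if a country is given, use it as the second key field.
function findCities(name, country) {
  var bucket = citiesByName.get(name) || [];
  if (country === undefined) {
    return bucket;
  }
  return bucket.filter(function (c) {
    return c.cityCountry === country;
  });
}

addCity(new City("Arraias", "Brazil", "AAI"));
// Hypothetical duplicates with placeholder countries and codes:
addCity(new City("Springfield", "Country A", "XXA"));
addCity(new City("Springfield", "Country B", "XXB"));

console.log(findCities("Springfield").length);                   // 2
console.log(findCities("Springfield", "Country B")[0].cityCode); // XXB
```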

I think you want to accomplish this in a few simple steps:
Split each string in your array before and after the first parenthesis
Apply your first regex to the first part of the string. Store the result as a boolean variable, perhaps named matchOne
Apply your second regex to the second part of the string (don't forget to remove the closing parenthesis). Store the result as a boolean variable, perhaps named matchTwo.
Test if either of the two matches succeeded: return ( matchOne || matchTwo );
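The four steps above can be sketched as a single function. matchesCity is a hypothetical name, and the term is assumed to be escaped already (for example with $.ui.autocomplete.escapeRegex). Both parts are tested with "^" + term: once the opening parenthesis has been split off, an anchor at the start of the code part is assumed to do the job the question's second regex was aiming for.

```javascript
// Hypothetical helper following the four steps above.
function matchesCity(item, term) {
  var open = item.indexOf("(");
  if (open === -1) {
    return false; // no code part to split off
  }
  // Step 1: split before and after the first parenthesis
  var namePart = item.slice(0, open);                   // e.g. "Aachen, Germany "
  var codePart = item.slice(open + 1).replace(")", ""); // e.g. "AAH", closing parenthesis removed
  // Steps 2 and 3: apply an anchored, case-insensitive regex to each part
  var matchOne = new RegExp("^" + term, "i").test(namePart);
  var matchTwo = new RegExp("^" + term, "i").test(codePart);
  // Step 4: either part may match
  return matchOne || matchTwo;
}

console.log(matchesCity("Aachen, Germany (AAH)", "aach")); // true
console.log(matchesCity("Aachen, Germany (AAH)", "AAH"));  // true
console.log(matchesCity("Aachen, Germany (AAH)", "germ")); // false
```

In the jQuery UI case it could be called as $.grep(cityList, function (item) { return matchesCity(item, re); }).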

Use indexOf
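It's more efficient and states the expectation explicitly; a regex is unnecessary. Note that indexOf has to be called on each string, because Array.prototype.indexOf only finds elements that are exactly equal to its argument:
const isMatchX = cityList.findIndex(city => city.indexOf('AAB') !== -1); // first entry containing 'AAB', or -1
const isMatchY = cityList.findIndex(city => city.indexOf('Awar') === 0); // first entry starting with 'Awar', or -1
Alternatively you could do something like this, but it's way overkill when you can use indexOf:
const search = (cityList, re) => {
  // '¬' marks the start of each entry in the joined string
  const strRegPart1 = "¬" + re + "[^¬]*"; // city name starts with re
  const strRegPart2 = "¬[^¬]*\\([^\\)]*" + re + "[^\\)]*\\)(?=$|¬)"; // code contains re; the lookahead leaves the next '¬' unconsumed
  const regSearch = RegExp("(" + strRegPart1 + "|" + strRegPart2 + ")", "gi");
  const strCityListMarked = '¬' + cityList.join('¬');
  const arrMatch = strCityListMarked.match(regSearch);
  // with the g flag, match() returns every full match; take the first and drop its leading '¬'
  return arrMatch && arrMatch[0].slice(1);
}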

Related

How to replace all values in an object with strings using JS?

I'm trying to get an array of JSON objects. To do that, I'm trying to make the input I have parsable, then parse it and push it to that array using a for loop. The inputs I have to work with look like this:
firstname: Chris, lastname: Cheshire, email: chris#cmdcheshire.com, viewerlink: audiencematic.com/viewer?v\u003dTESTSHOW\u0026push\u003d8A043B5A, tempid: 8A043B5A, permaid: F8tGYNx, showid: TESTSHOW
I've gotten it to the point where each loop produces something like this:
{ "firstname": First Name, "lastname": Last Name, "email": sample#gmail.com, "viewerlink": audiencematic.com/viewer?v=TESTSHOW&push=715B3074, "tempid": 715B3074, "permaid": F8tGYNx, "showid": TESTSHOW }
But got stuck on the last bit, making the values strings. I want it to look like this, so I can use JSON.parse():
{ "firstname": "First Name", "lastname": "Last Name", "email": "sample#gmail.com", "viewerlink": "audiencematic.com/viewer?v=TESTSHOW&push=715B3074", "tempid": "715B3074", "permaid": "F8tGYNx", "showid": "TESTSHOW" }
I tried a couple of different methods I found on here, but one of the values is a URL and the period is screwing with the replace expressions. I tried using the replace function like this:
var jsonStr2 = jsonStr.replace(/(: +\w)|(:+\w)/g, function(matchedStr) {
return ':"' + matchedStr.substring(2, matchedStr.length) + '"';
});
But it just becomes this:
{ "firstname":""irst Name, "lastname":""ast Name, "email":""ample#gmail.com, "viewerlink":""udiencematic.com/viewer?v=TESTSHOW&push=715B3074, "tempid":""15B3074, "permaid":""8tGYNx, "showid":""ESTSHOW }
How should I change my replace function?
(I tried that code because I'm using
var jsonStr = string.replace(/(\w+:)|(\w+ :)/g, function(matchedStr) {
return '"' + matchedStr.substring(0, matchedStr.length - 1) + '":';
});
to put quotation marks around the keys, and that seems to work.)
FIGURED IT OUT!! SEE MY ANSWER BELOW.
One option might be to try using a deserialized version of the string, alter the values associated with the properties of the object, and then convert back to a string.
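var person = '{"fname":"John", "lname":"Doe", "age":25}'; // JSON.parse requires quoted keys
var obj = JSON.parse(person);
for (var x in obj) {
  obj[x] = String(obj[x]); // turn every value into a string
}
var result = JSON.stringify(obj);
// result: {"fname":"John","lname":"Doe","age":"25"}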
It's a little longer than doing a string replacement, but I find it a little easier to follow.
I figured it out! I just had to mess around in regexr to figure out what conditions I needed. Here's the working for loop code:
for (var i = 0; i < audiencelistdirty.feed.openSearch$totalResults.$t; i++) {
var string = '{ ' + audiencelistdirty.feed.entry[i].content.$t + ' }';
var jsonStr = string.replace(/(\w+:)|(\w+ :)/g, function(matchedStr) {
return '"' + matchedStr.substring(0, matchedStr.length - 1) + '":';
});
var jsonStr1 = jsonStr.replace(/(:(.*?),)|(:\s(.*?)\s)/g, function(matchedStr) {
return ':"' + matchedStr.substring(2, matchedStr.length - 1) + '",';
});
var jsonStr2 = jsonStr1.replace(/(",})/g, function(matchedStr) {
return '" }';
});
var newObj = JSON.parse(jsonStr2);
audiencelist.push(newObj);
};
It's pretty ugly but it works.
EDIT: Sorry, I completely misread the question. To replace the values with quoted strings use this regex replace function:
const str =
'firstname: Chris, lastname: Cheshire, email: chris#cmdcheshire.com, viewerlink: audiencematic.com/viewer?v\u003dTESTSHOW\u0026push\u003d8A043B5A, tempid: 8A043B5A, permaid: F8tGYNx, showid: TESTSHOW'
const json = (() => {
const result = str
.replace(/\w+:\s(.*?)(?:,|$)/g, function (match, subStr) {
return match.replace(subStr, `"${subStr}"`)
})
.replace(/(\w+):/g, function (match, subStr) {
return match.replace(subStr, `"${subStr}"`)
})
return '{' + result + '}'
})()
Wrap the input string in commas, then use a regex to identify the keys (between , and :) and their associated values (between : and ,), and construct the object directly, as in the example below:
const input = ' firstname : Chris , lastname: Cheshire, email: chris#cmdcheshire.com, viewerlink: audiencematic.com/viewer?v\u003dTESTSHOW\u0026push\u003d8A043B5A, tempid: 8A043B5A, permaid: F8tGYNx, showid: TESTSHOW ';
const wrapped = `,${input},`;
const re = /,\s*([^:\s]*)\s*:\s*(.*?)\s*(?=,)/g;
const obj = {}
Array.from(wrapped.matchAll(re)).forEach((match) => obj[match[1]] = match[2]);
console.log(obj)
String.prototype.matchAll() was only added in ES2020, so older JavaScript engines and older browsers do not implement it. If you need to support them, you can use the old-school way:
const input = ' firstname : Chris , lastname: Cheshire, email: chris#cmdcheshire.com, viewerlink: audiencematic.com/viewer?v\u003dTESTSHOW\u0026push\u003d8A043B5A, tempid: 8A043B5A, permaid: F8tGYNx, showid: TESTSHOW ';
const wrapped = `,${input},`;
const re = /,\s*([^:\s]*)\s*:\s*(.*?)\s*(?=,)/g;
const obj = {}
let match = re.exec(wrapped);
while (match) {
obj[match[1]] = match[2];
match = re.exec(wrapped);
}
console.log(obj);
The anatomy of the regex used above
The regular expression piece by piece:
/ # regex delimiter; not part of the regex but JavaScript syntax
, # match a comma
\s # match a white space character (space, tab, new line)
* # the previous symbol zero or more times
( # start the first capturing group; does not match anything
[ # start a character class...
^ # ... that matches any character not listed inside the class
: # ... i.e. any character but a colon...
\s # ... and white space character
] # end of the character class; the entire class matches only one character
* # the previous symbol zero or more times
) # end of the first capturing group; does not match anything
\s*:\s* # zero or more white space characters before and after the colon
( # start of the second capturing group
.* # any character, any number of times; this is greedy by default
? # make it not greedy
) # end of the second capturing group
\s* # zero or more spaces
(?= # positive lookahead assertion; matches but does not consume the matched substring
, # matches a comma
) # end of the assertion
/ # regex delimiter; not part of the regex but JavaScript syntax
g # regex flag; 'g' for 'global' is needed to find all matches
Read about the syntax of regular expressions in JavaScript. For a more comprehensive description of the regex patterns I recommend reading the PHP documentation of PCRE (Perl-Compatible Regular Expressions).
You can see the regex in action and play with it on regex101.com.

Regex match cookie value and remove hyphens

I'm trying to extract a group of words, separated by hyphens, from a larger string/cookie. I would like to replace the hyphens with spaces and assign the result to a variable. JavaScript or jQuery.
As an example, the larger string has a name and value like this within it:
facility=34222%7CConner-Department-Store;
(notice the leading "C")
So first, I need to match()/find facility=34222%7CConner-Department-Store; with regex. Then break it down to "Conner Department Store"
var cookie = document.cookie;
var facilityValue = cookie.match( REGEX ); ??
var test = "store=874635%7Csomethingelse;facility=34222%7CConner-Department-Store;store=874635%7Csomethingelse;";
var test2 = test.replace(/^(.*)facility=([^;]+)(.*)$/, function(matchedString, match1, match2, match3){
return decodeURIComponent(match2);
});
console.log( test2 );
console.log( test2.split('|')[1].replace(/[-]/g, ' ') );
If I understood correctly, you want to build a phrase from the words between the hyphens, where each word is an uppercase letter followed by lowercase letters (so the stray "C" left over from %7C is not glued onto the first word). I'd use a regex in that case.
This regex solution works on any string in the same format and extracts the wanted phrase:
var str = "facility=34222%7CConner-Department-Store;";
var matches = str.match(/([A-Z][a-z]+)-?/g);
console.log(matches.map(function(m) {
return m.replace('-', '');
}).join(" "));
Explanation:
Use the regex /([A-Z][a-z]+)-?/g to match each capitalized word together with an optional trailing -.
Remove the - from each matched word.
Then join the resulting array with spaces.
OK, first you should decode the string as follows:
var str = "facility=34222%7CConner-Department-Store;"
var decoded = decodeURIComponent(str);
// decoded = "facility=34222|Conner-Department-Store;"
Then there are several ways to split up this string.
The easiest way is to use substring()
var solution1 = decoded.substring(decoded.indexOf('|') + 1, decoded.length)
// solution1 = "Conner-Department-Store;"
solution1 = solution1.replace(/-/g, ' '); // /-/g replaces every hyphen; replace('-', ' ') would only replace the first
// solution1 = "Conner Department Store;"
As you can see, substring(arg1, arg2) returns the part of the string starting at index arg1 and ending just before index arg2 (the end index is exclusive). See the full documentation of String.prototype.substring() for details.
If you want to cut the trailing ;, just pass decoded.length - 1 as arg2 in the snippet above:
decoded.substring(decoded.indexOf('|') + 1, decoded.length - 1)
//returns "Conner-Department-Store"
or all of the above in one line (the regex /-/g replaces every hyphen, while the string '-' would replace only the first):
decoded.substring(decoded.indexOf('|') + 1, decoded.length - 1).replace(/-/g, ' ')
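To make the end-exclusive behaviour of substring() concrete, here is a small check on the decoded cookie value used in this answer; the short "abcdef" string is just an illustrative input:

```javascript
// substring(start, end) keeps index start and stops before index end.
var s = "abcdef";
console.log(s.substring(1, 3)); // bc (indices 1 and 2 only)

var decoded = "facility=34222|Conner-Department-Store;";
var start = decoded.indexOf("|") + 1; // 15, the first index after "|"
console.log(decoded.substring(start, decoded.length));     // Conner-Department-Store;
console.log(decoded.substring(start, decoded.length - 1)); // Conner-Department-Store
```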
If you still want to use a regular expression to retrieve (perhaps more) data from the string, you could use something similar to this snippet:
var solution2 = "";
var regEx= /([A-Za-z]*)=([0-9]*)\|(\S[^:\/?#\[\]\#\;\,']*)/;
if (regEx.test(decoded)) {
solution2 = decoded.match(regEx);
/* returns
[0:"facility=34222|Conner-Department-Store",
1:"facility",
2:"34222",
3:"Conner-Department-Store",
index:0,
input:"facility=34222|Conner-Department-Store;"
length:4] */
solution2 = solution2[3].replace(/-/g, ' '); // the g flag replaces every hyphen
// "Conner Department Store"
}
I have applied some rules for the regex to work; feel free to modify them according to your needs.
facility can be any word made of upper- and lowercase letters (no other characters), of any length
= needs to be the character =
34222 can be any sequence of digits, but no other characters
| needs to be the char |
Conner-Department-Store can be any characters except one of the following (reserved delimiters): :/?#[]#;,'
Hope this helps :)
Edit: to find only the part facility=34222%7CConner-Department-Store;, just modify the regex to match facility= instead of ([A-Za-z]*)=:
/(facility)=([0-9]*)\|(\S[^:\/?#\[\]\#\;\,']*)/
You can use cookies.js, a mini framework from MDN (Mozilla Developer Network).
Simply include the cookies.js file in your application, and write:
docCookies.getItem("facility"); // look up by the cookie's name ("facility"), not by its value
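If you would rather not include a library, a minimal sketch of the same lookup by hand. getCookieValue is a hypothetical helper; it takes the cookie string as a parameter (in a browser you would pass document.cookie) so the example also runs outside a browser, and the sample string mirrors the one from the first answer:

```javascript
// Hypothetical helper: find one cookie by name in a "name=value; name=value" string.
function getCookieValue(cookieString, name) {
  var pairs = cookieString.split(";");
  for (var i = 0; i < pairs.length; i++) {
    var pair = pairs[i].trim();      // document.cookie separates pairs with "; "
    var eq = pair.indexOf("=");
    if (eq !== -1 && pair.slice(0, eq) === name) {
      return decodeURIComponent(pair.slice(eq + 1)); // %7C becomes "|"
    }
  }
  return null; // no cookie with that name
}

var cookieString = "store=874635%7Csomethingelse; facility=34222%7CConner-Department-Store";
var value = getCookieValue(cookieString, "facility");   // "34222|Conner-Department-Store"
var storeName = value.split("|")[1].replace(/-/g, " "); // "Conner Department Store"
console.log(storeName);
```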

What's the JS RegExp for this specific string?

I have a rather isolated situation in an inventory management program where our shelf locations have a specific format, which is always Letter: Number-Letter-Number, such as Y: 1-E-4. Most of my coworkers just type in "y1e4" and are done with it, but that obviously creates issues with inconsistent formats in a database. Are JS RegExps the ideal way to automatically detect and format these alphanumeric strings? I'm slowly wrapping my head around JavaScript's Perl-style regex syntax, but what's a simple example of formatting one of these strings?
spec: detect string format of either "W: D-W-D" or "WDWD" and return "W: D-W-D"
This function accepts input in any of these formats and returns the formatted string if it matches, or undefined if it doesn't.
function validateInventoryCode(input) {
  // letter(s), optional ":" plus spaces, digits, optional "-", letter(s), optional "-", digits
  var regexp = /^([a-zA-Z]+)(?::\s*)?(\d+)-?([a-zA-Z]+)-?(\d+)$/;
  var r = regexp.exec(input);
  if (r != null) {
    return `${r[1]}: ${r[2]}-${r[3]}-${r[4]}`;
  }
  // no match: falls through and returns undefined
}
var possibles = ["y1e1", "y:1e1", "Y: 1r3", "y: 32e4", "1:e3e"];
possibles.forEach(function(possibility) {
  console.log(`input(${possibility}), result(${validateInventoryCode(possibility)})`);
});
I understand the question as "convert LetterNumberLetterNumber to Letter: Number-Letter-Number".
You may use
/^([a-z])(\d+)([a-z])(\d+)$/i
and replace with $1: $2-$3-$4
Details:
^ - start of string
([a-z]) - Group 1 (referenced with $1 from the replacement pattern) capturing any ASCII letter (as /i makes the pattern case-insensitive)
(\d+) - Group 2 capturing 1 or more digits
([a-z]) - Group 3, a letter
(\d+) - Group 4, a number (1 or more digits)
$ - end of string.
See the regex demo.
var re = /^([a-z])(\d+)([a-z])(\d+)$/i;
var s = 'y1e2';
var result = s.replace(re, '$1: $2-$3-$4');
console.log(result);
OR - if the letters must be turned to upper case:
var re = /^([a-z])(\d+)([a-z])(\d+)$/i;
var s = 'y1e2';
var result = s.replace(re,
(m,g1,g2,g3,g4)=>`${g1.toUpperCase()}: ${g2}-${g3.toUpperCase()}-${g4}`
);
console.log(result);
This function finds every Letter-Digit-Letter-Digit word in a text and reformats it:
function findAndFormat(text){
var splittedText=text.split(' ');
for(var i=0, textLength=splittedText.length; i<textLength; i++){
// ^...$ anchors the whole word; [A-z] would also match [ \ ] ^ _ and `, so list both letter ranges
var analyzed=splittedText[i].match(/^[A-Za-z]\d[A-Za-z]\d$/);
if(analyzed){
var formattedString=analyzed[0][0].toUpperCase()+': '+analyzed[0][1]+'-'+analyzed[0][2].toUpperCase()+'-'+analyzed[0][3];
text=text.replace(splittedText[i],formattedString);
}
}
return text;
}
I think it's just as it reads:
y1e4
Letter, number, letter, number:
/([A-Za-z][0-9][A-Za-z][0-9])/g
And yes, it's fine to use a regex in this case, for form validation and the like. It's just that in some cases abusing regular expressions gives you bad performance (intensive data processing and the like).
Example
"HelloY1E4world".replace(/([A-Za-z][0-9][A-Za-z][0-9])/g, ' ');
should return: "Hello world"
regexr.com always comes in handy
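Since the answer mentions form validation, here is a minimal sketch of a check that a value is already in the canonical Letter: Number-Letter-Number form (for example before saving it). The pattern and the isCanonicalShelf name are assumptions; it allows one or more digits per number, as the formatting answers above do:

```javascript
// Hypothetical validator for the canonical shelf format, e.g. "Y: 1-E-4".
var canonicalShelf = /^[A-Za-z]: \d+-[A-Za-z]-\d+$/;

function isCanonicalShelf(value) {
  return canonicalShelf.test(value); // no g flag, so test() is stateless
}

console.log(isCanonicalShelf("Y: 1-E-4")); // true
console.log(isCanonicalShelf("y1e4"));     // false
```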

Intelligent regex to understand input

Following "Split string that used to be a list", I am doing this:
console.log(lines[line]);
var regex = /(-?\d{1,})/g;
var cluster = lines[line].match(regex);
console.log(cluster);
which will give me this:
((3158), (737))
["3158", "737"]
where 3158 will later be treated as the ID in my program and 737 as the associated data.
I am wondering if there was a way to treat inputs of this kind too:
((3158, 1024), (737))
where the ID will be a pair, and do something like this:
var single_regex = regex_for_single_ID;
var pair_regex = regex_for_pair_ID;
if(single_regex)
// do my logic
else if(pair_regex)
// do my other logic
else
// bad input
Is that possible?
Clarification:
What I am interested in is treating the two cases differently. For example one solution would be to have this behavior:
((3158), (737))
["3158", "737"]
and for pairs, concatenate the ID:
((3158, 1024), (737))
["31581024", "737"]
A simple way is to use .replace(/(\d+)\s*,\s*/g, '$1') to merge/concatenate the numbers of a pair, and then apply the simple regex match you are already using.
Example:
var v1 = "((3158), (737))"; // singular string
var v2 = "((3158, 1024), (737))"; // paired number string
var arr1 = v1.replace(/(\d+)\s*,\s*/g, '$1').match(/-?\d+/g)
//=> ["3158", "737"]
var arr2 = v2.replace(/(\d+)\s*,\s*/g, '$1').match(/-?\d+/g)
//=> ["31581024", "737"]
We use this regex in .replace:
/(\d+)\s*,\s*/
It matches and captures 1 or more digits followed by a comma, with optional spaces before and after the comma.
In the replacement we use $1, the backreference to the matched number, so the comma and the spaces after the number are removed.
You may use an alternation operator to match either a pair of numbers (capturing them into separate capturing groups) or a single one:
/\((-?\d+), (-?\d+)\)|\((-?\d+)\)/g
See the regex demo
Details:
\((-?\d+), (-?\d+)\) - a (, a number (captured into Group 1), a ,, space, another number of the pair (captured into Group 2) and a )
| - or
\((-?\d+)\) - a (, then a number (captured into Group 3), and a ).
var re = /\((-?\d+), (-?\d+)\)|\((-?\d+)\)/g;
var str = '((3158), (737)) ((3158, 1024), (737))';
var res = [];
var m; // declared explicitly; an implicit global would throw in strict mode
while ((m = re.exec(str)) !== null) {
if (m[3]) {
res.push(m[3]);
} else {
res.push(m[1]+m[2]);
}
}
console.log(res);

Extract string when preceding number or combo of preceding characters is unknown

Here's an example string:
++++#foo+bar+baz++#yikes
I need to extract foo and only foo from there or a similar scenario.
The + and the # are the only characters I need to worry about.
However, regardless of what precedes foo, it needs to be stripped or ignored. Everything after it needs to be stripped as well.
Try this:
/\++#(\w+)/
and take capturing group 1.
You can simply use the match() method.
var str = "++++#foo+bar+baz++#yikes";
var res = str.match(/\w+/g);
console.log(res[0]); // foo
console.log(res); // ["foo", "bar", "baz", "yikes"]
Or use exec
var str = "++++#foo+bar+baz++#yikes";
var match = /(\w+)/.exec(str);
alert(match[1]); // foo
Using exec with the g (global) flag is meant for a loop that collects all matches one by one.
var str = "++++#foo+bar+baz++#yikes";
var re = /\w+/g;
var match;
while (match = re.exec(str)) {
// In array form, match is now your next match..
}
How exactly do + and # play a role in identifying foo? If you just want any string that follows # and is terminated by + that's as simple as:
var foostring = '++++#foo+bar+baz++#yikes';
var matches = /#([^+]+)\+/.exec(foostring);
if (matches !== null) {
  // exec() returns null when nothing matches; matches[1] holds the text between # and the next +
  alert('found ' + matches[1] + '!'); // alerts 'found foo!'
}
To help you more specifically, please provide information about the possible variations of your data and how you would go about identifying the token you want to extract even in cases of differing lengths and characters.
If you are just looking for the first segment of text preceded and followed by any combination of + and #, then use:
var foostring = '++++#foo+bar+baz++#yikes';
var result = foostring.match(/[^+#]+/);
// will be the single-element array, ['foo'], or null.
Depending on your data, using \w may be too restrictive, as it is equivalent to [a-zA-Z0-9_]. Does your data have anything else such as punctuation, dashes, parentheses, or any other characters that you do want to include in the match? Using the negated character class I suggest will catch every token that does not contain a + or a #.
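To see the difference described above, here is a quick comparison on a hypothetical input where the wanted token contains a hyphen:

```javascript
// Hypothetical input: the token between "#" and "+" contains a hyphen.
var sample = "++++#foo-bar+baz++#yikes";

// \w+ stops at the first character outside [a-zA-Z0-9_]
console.log(sample.match(/\w+/)[0]);    // foo
// the negated class stops only at "+" or "#"
console.log(sample.match(/[^+#]+/)[0]); // foo-bar
```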
