Parsing out Salutation & First Name from Full Name field - javascript

I have a string that contains Full Name.
The format of the full name may or may not have the salutation. Also there may or may not be a period after the salutation as well (could display as Mr. or Mr). For example, I could receive:
"Mrs. Ella Anderson"
"Ella Anderson"
"Miss Jennifer Sply"
"Mr. Dan Johnson"
"Damien Hearst"
My goal is to remove the salutation from the Full Name string. Once the salutation is removed, I want to parse out the First Name from the Full Name. I am kinda new to regex, but I do understand how to parse out the First Name. The one part I am just not sure how to do is get rid of the salutation.
var string = "Ella Anderson"
var first = string.replace(/\s.*$/, "").toUpperCase().trim();

This regex should work.
// Anchored to the start of the string and case-insensitive, so only a leading title is removed.
var regex = /^(Mr|Ms|Miss|Mrs|Dr|Sir)\.?\s+/i,
    fullNames = ["Mrs. Ella Anderson", "Ella Anderson", "Miss Jennifer Sply", "Mr. Dan Johnson", "Damien Hearst"];
var names = fullNames.map(function(name) {
    // Names without a salutation are returned unchanged.
    return name.replace(regex, "");
});
console.log(names);
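Putting the two steps from the question together (strip the salutation, then take the first name), a minimal sketch might look like this. The helper name `getFirstName` and the list of titles are assumptions for illustration; the pattern is anchored to the start of the string so that a title is only removed when it leads the name.

```javascript
// Hypothetical helper, not part of the original answers.
// The list of titles is an assumption; extend it to match your data.
var SALUTATION = /^\s*(?:mr|mrs|ms|miss|dr|sir)\.?\s+/i;

function getFirstName(fullName) {
  // Remove the salutation only when it appears at the very start.
  var withoutTitle = fullName.replace(SALUTATION, "").trim();
  // The first name is everything up to the first run of whitespace.
  return withoutTitle.split(/\s+/)[0];
}

console.log(getFirstName("Mrs. Ella Anderson")); // "Ella"
console.log(getFirstName("Miss Jennifer Sply")); // "Jennifer"
console.log(getFirstName("Damien Hearst"));      // "Damien"
```

Because the pattern requires whitespace after the title, a first name such as "Missy" is left alone.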

The problem is that the full name is in a string in the first place. If at all possible, you should change that to just use separate fields.
There's no telling what users will enter in a text box. Nor is it reliably possible to determine what part of the remaining name is the first name, and what part is the surname.
If the input data is separated properly, you won't have to figure out what is what, any more.
So, if possible, change the way the name is entered to something like:
<select name="select">
<option>Miss</option>
<option>Mrs</option>
<option>Mr</option>
<option>etc...</option>
</select>
<input placeholder="First name" />
<input placeholder="Surname" />

You can use this regexp: /((Mrs|Mr|Miss)\.? )?([^ ]*) ?([^ ]*)/
Examples:
var regex = /((Mrs|Mr|Miss)\.? )?([^ ]*) ?([^ ]*)/;
regex.exec('Mrs. Ella Anderson'); // ["Mrs. Ella Anderson", "Mrs. ", "Mrs", "Ella", "Anderson"]
regex.exec("Ella Anderson");      // ["Ella Anderson", undefined, undefined, "Ella", "Anderson"]
regex.exec("Miss Jennifer Sply"); // ["Miss Jennifer Sply", "Miss ", "Miss", "Jennifer", "Sply"]
regex.exec("Mr. Dan Johnson");    // ["Mr. Dan Johnson", "Mr. ", "Mr", "Dan", "Johnson"]
regex.exec("Damien Hearst");      // ["Damien Hearst", undefined, undefined, "Damien", "Hearst"]
regex.exec("Missy Jennifer");     // ["Missy Jennifer", undefined, undefined, "Missy", "Jennifer"]
If you want the first name and the last name, you just have to look at the last two values of the array.
Of course, this regexp will not work with something like "Mr. John Smith Junior". If you want something generic, don't use a regexp.

It's a pretty complicated regex:
/^(?:(Miss|M[rs]{1,2})\.?\s+)?(\S+)\s+(\S+)$/
If you also want middle names or initials, or suffixes like jr. or sr., it gets a little trickier, but it's mostly all doable. There's some question about how to deal with hyphenated names.
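For middle names and suffixes, one alternative to a single large regex is to split the name into tokens and peel off the title and suffix first. The sketch below uses that tokenizing approach rather than the regex above; the function name `parseName` and both word lists are assumptions for illustration. Hyphenated names stay in one piece because the split is on whitespace only.

```javascript
// Illustrative, non-exhaustive lists (assumptions, not from the answers above).
var TITLES = ["mr", "mrs", "ms", "miss", "dr"];
var SUFFIXES = ["jr", "sr", "junior", "senior", "ii", "iii", "iv"];

// Lowercase a token and drop periods/commas so "Jr." and "jr" compare equal.
function normalizeToken(token) {
  return token.toLowerCase().replace(/[.,]/g, "");
}

function parseName(fullName) {
  var parts = fullName.trim().split(/\s+/);
  var result = { title: null, first: null, middle: [], last: null, suffix: null };

  // A leading token is a title only if something follows it.
  if (parts.length > 1 && TITLES.indexOf(normalizeToken(parts[0])) !== -1) {
    result.title = parts.shift();
  }
  // A trailing token is a suffix only if a first and last name remain.
  if (parts.length > 2 && SUFFIXES.indexOf(normalizeToken(parts[parts.length - 1])) !== -1) {
    result.suffix = parts.pop();
  }
  result.first = parts.shift() || null;
  result.last = parts.length ? parts.pop() : null;
  result.middle = parts; // whatever is left between first and last
  return result;
}

console.log(parseName("Mr. John Smith Junior"));
// { title: "Mr.", first: "John", middle: [], last: "Smith", suffix: "Junior" }
```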

You can use this regexp:
^[ \t]*(?<title>(Shri|Leu|DR|mrs|SMT|Major|Gen){1,10}(\.|,))?\s*(?<LstName>[A-Z][a-z-']{2,20}),? +(?<FstName>[A-Z,a-z]+)*[ \t]*[^\n]*
Tested on the following test data:
Major. Amator Gary L
Mrs. Grundy Ronald
Dr. Domsky Alan
Shri. Worden Scott Allen
Rodriguez Howard W
NEHME ALLEN
RODRIGUEZ CHARLES G
VERGARA WILLIAM F J
EVELYN J
Leu. GLICK, JACOB L.
SMT. Taylor-garcia Dottielou

Related

Extract information from string - JavaScript

I am currently implementing google places autocomplete and the module I am using in React Native gives me the address as a whole string and not in address components. However, I need to have the postal code and city separate. The example response always look like this:
address: 'Calle Gran Vía, 8, 28013 Madrid, Spain'
From this string I would need to have an object that looks like this:
{
city: 'Madrid',
postal_code: 28013,
}
How could I achieve this?
It's not the most "clean" or "smooth" answer, but it's something:
var response = "address: 'Calle Gran Vía, 8, 28013 Madrid, Spain";
var subStr = response.split(",")[2];
var obj = {
city: subStr.split(" ")[2],
postal_code: subStr.split(" ")[1]
};
console.log(obj);
For the city, I think the best way is to keep an array of known cities and search for each of them in the string:
var str = "Calle Gran Vía, 8, 28013 Madrid, Spain";
var cities = ["Paris", "Berlin", "Madrid"];
// First known city that occurs in the string, or null if none does.
var city = cities.filter(function(item) {
    return str.indexOf(item) !== -1;
})[0] || null;
For the postal code, you should use a regex that depends on the country (a good list of regex by country)
Probably split the string by ',' with array methods, take the third element of the array and split that by ' ', then you have your data points.
If you can always count on it being in that same format, you can do the following.
var splitAdress = address.split(",");
//This will give you ["Calle Gran Vía", " 8", " 28013 Madrid", " Spain"]
splitAdress = splitAdress[2].split(" ");
//This will give you ["", "28013", "Madrid"]
You'll first split the string into an array based on the comma and then follow it up by splitting on the space. The extra element in the second array is due to the space. This is an example of what @CBroe pointed out in the comments.
var list = address.split(",")[2].trim().split(" ");
list[0] gives you the postal code
list[1] gives you the city name
It depends on whether there is always a comma in "Calle Gran Vía, 8". If not, you can count from the end instead: var list = address.split(",").slice(-2)[0].trim().split(" ");
You might want to try this.
var address="Calle Gran Vía, 8, 28013 Madrid, Spain";
var splits = address.split(',')[2].trim().split(' ');
var newAdd = {
city : splits[1],
postal_code : splits[0]
}
console.log(newAdd);
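The split-based answers above all depend on the number of commas before the postal code. A hedged alternative is to look for the postal code directly with a regex. The sketch below assumes a 5-digit postal code immediately followed by the city name, as in the Spanish example; the function name `extractCityAndPostalCode` is made up for illustration, and other countries would need a different pattern.

```javascript
// Assumption: a 5-digit postal code followed by the city, e.g. "28013 Madrid".
function extractCityAndPostalCode(address) {
  // (\d{5}) captures the postal code, ([^,]+) captures everything up to the next comma.
  var match = /\b(\d{5})\s+([^,]+)/.exec(address);
  if (!match) {
    return null; // the address does not follow the assumed format
  }
  return { city: match[2].trim(), postal_code: match[1] };
}

console.log(extractCityAndPostalCode("Calle Gran Vía, 8, 28013 Madrid, Spain"));
// { city: "Madrid", postal_code: "28013" }
```

The postal code is kept as a string so that codes with a leading zero are not mangled; convert it with Number() if a numeric value is really needed.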

How to count occurrence of multiple sub-string in a long string with JavaScript

I am new to JavaScript. I have tried a lot, but could not find an answer or any information on how to count the occurrences of multiple sub-strings in a long string in one go.
Further information: I need the number of occurrences of each of these sub-strings, and if a sub-string occurs too often I need to replace it, so I need all the counts at once.
Here is an example:
The long text is as follows:
Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi's Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the "golden anniversary" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as "Super Bowl L"), so that the logo could prominently feature the Arabic numerals 50.
The sub-string is a question, but what I need is to count the occurrences of each word of that question in one go, for example the words "name", "NFL", "championship", "game", "is" and "the" in this question:
What is the name of NFL championship game?
One of the problems is that some of the words are not in the text at all, while others appear many times (those are the ones I might replace).
The code I have tried is below. It is wrong; I have tried many different ways, but with no good results.
$(".showMoreFeatures").click(function(){
var text= $(".article p").text(); // This is to get the text.
var textCount = new Array();
// match() returns null for words that are not in the text (such as the first word, "what"),
// so I need to guard against that. My plan was to get the count for each word and,
// if it is more than 7, replace the word.
var qus = item2.question; // the question (the sub-string)
var checkQus = qus.split(" "); // split the question into words
var newCheckQus = new Array();
// The array where I planned to put the words that occur fewer than 7 times, which are the ones I really need.
var count = new Array();
// The question contains many words, so I planned to store each word's count in this array.
for(var k =0; k < checkQus.length; k++){
textCount = text.match(checkQus[k],"g")
if(textCount == null){
continue;
}
for(var j =0; j<checkQus.length;j++){
count[j] = textCount.length;
}
//count++;
}
});
I tried many different ways and searched a lot, but with no good results. The code above is only meant to show what I have tried and my thinking (which might be totally wrong), and it does not actually work. If you know how to implement this and solve my problem, please just tell me; there is no need to correct my code.
Thanks very much.
If I have understood the question correctly then it seems you need to count the number of times the words in the question (que) appear in the text (txt)...
var txt = "Super Bowl 50 was an American ...etc... Arabic numerals 50.";
var que = "What is the name of NFL championship game?";
I'll go through this in vanilla JavaScript and you can transpose it for JQuery as required.
First of all, to focus on the text we can make things a little simpler by changing the strings to lowercase and removing some of the punctuation.
// both strings to lowercase
txt = txt.toLowerCase();
que = que.toLowerCase();
// remove punctuation
// using double \\ for proper regular expression syntax
var puncArray = ["\\,", "\\.", "\\(", "\\)", "\\!", "\\?"];
puncArray.forEach(function(P) {
// create a regular expresion from each punctuation 'P'
var rEx = new RegExp( P, "g");
// replace every 'P' with empty string (nothing)
txt = txt.replace(rEx, '');
que = que.replace(rEx, '');
});
Now we can create a cleaner array from txt and que as well as a hash table from que like so...
// Arrays: split at every space
var txtArray = txt.split(" ");
var queArray = que.split(" ");
// Object, for storing 'que' counts
var queObject = {};
queArray.forEach(function(S) {
// create 'queObject' keys from 'queArray'
// and set value to zero (0)
queObject[S] = 0;
});
queObject will be used to hold the words counted. If you were to console.debug(queObject) at this point it would look something like this...
console.debug(queObject);
/* =>
queObject = {
what: 0,
is: 0,
the: 0,
name: 0,
of: 0,
nfl: 0,
championship: 0,
game: 0
}
*/
Now we want to test each element in txtArray to see if it contains any of the elements in queArray. If the test is true we'll add +1 to the equivalent queObject property, like this...
// go through each element in 'queArray'
queArray.forEach(function(A) {
// create regular expression for testing
var rEx = new RegExp( A );
// test 'rEx' against elements in 'txtArray'
txtArray.forEach(function(B) {
// is 'A' in 'B'?
if (rEx.test(B)) {
// increase 'queObject' property 'A' by 1.
queObject[A]++;
}
});
});
We use RegExp test method here rather than String match method because we just want to know if "is A in B == true". If it is true then we increase the corresponding queObject property by 1. This method will also find words inside words, such as 'is' in 'San Francisco' etc.
All being well, logging queObject to the console will show you how many times each word in the question appeared in the text.
console.debug(queObject);
/* =>
queObject = {
what: 0,
is: 2,
the: 17,
name: 0,
of: 2,
nfl: 1,
championship: 0,
game: 4
}
*/
Hope that helped. :)
See MDN for more information on:
Array.forEach()
Object.keys()
RegExp.test()
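If words inside words (such as 'is' in 'San Francisco') should not be counted, one option is to compare whole words instead of testing with a regular expression. The following is a minimal sketch of that variation, not the method used above: it splits both strings into lowercase words and walks the text once. The function name `countQuestionWords` and the short sample strings are assumptions for illustration, and the word pattern assumes plain English text.

```javascript
// Counts whole-word occurrences of each word of `question` in `text`.
function countQuestionWords(text, question) {
  var counts = {};
  // Split both strings into lowercase words, ignoring punctuation.
  var textWords = text.toLowerCase().match(/[a-z0-9']+/g) || [];
  var questionWords = question.toLowerCase().match(/[a-z0-9']+/g) || [];

  // Start every question word at zero, so missing words are reported too.
  questionWords.forEach(function (word) {
    counts[word] = 0;
  });
  // One pass over the text: count only the words we care about.
  textWords.forEach(function (word) {
    if (Object.prototype.hasOwnProperty.call(counts, word)) {
      counts[word]++;
    }
  });
  return counts;
}

var result = countQuestionWords(
  "The game is on. This is the game of the year.",
  "What is the name of the game?"
);
console.log(result);
// { what: 0, is: 2, the: 3, name: 0, of: 1, game: 2 }
```

Note that "this" does not add to the count for "is" here, because only complete words are compared.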

Javascript regex with varying input

I want to filter the following information out of a long piece of text, which I copy and paste into a text field and then want to process into a table with these columns:
Name
Address
Status
Example snippet (names, addresses etc. are more or less randomized):
Thuisprikindeling voor: Vrijdag 15 Mei 2015 DE SMART BON 22 afspraken
Pagina 1/4
Persoonlijke mededeling:
Algemene mededeling:
Prikpostgegevens: REEK-Eeklo extern, (-)
Telefoonnummer Fax Mobiel 0499/9999999 Email dummy.dummy@gmail.com
DUMMY FOO V Stationstreet 2 8000 New York F N - Sober BSN: 1655
THUIS Analyses: Werknr: PIN: 000000002038905
Opdrachtgever: Laboratorium Arts:
Mededeling: Some comments // VERY DIFFICULT
FO DUMMY FOO V Butterstreet 6 8740 Melbourne F N - Sober BSN: 15898
THUIS Analyses: Werknr: AFD 3 PIN: 000000002035900
Opdrachtgever: Laboratorium Arts:
Mededeling: ZH BLA / BLA BLA - AFD 3 - SOCIAL BEER
JOHN FOOO V Waterstreet 1 9990 Rome F N - Sober BSN: 17878
THUIS / Analyses: Werknr: K111 PIN: 000000002037888
Opdrachtgever: Laboratorium Arts:
Mededeling: TRYOUT/FOO
FO SMOOTH M.FOO M Queen Elisabethstreet 19 9990 Paris F NN - Not Sober BSN: 14877
What I want to get out of it is this:
DUMMY FOO Stationstreet 2 8000 New York Sober
FO DUMMY FOO Butterstreet 6 8740 Melbourne Sober
JOHN FOOO Waterstreet 1 9990 Rome Sober
FO SMOOTH M.FOO Queen Elisabethstreet 19 9990 Paris Not sober
My strategy for the moment is using the following:
Filter all the lines with at least two words in capitals at the beginning of the line. AND a 4 digit postal code.
Then discard all the other lines as I only need the lines with the names and addresses
Then I strip out all the information needed for that line
Strip the name / address / status
I use the following code:
//Regular expressions
//Filter all lines which start with at least two UPPERCASE words following a space
pattern = /^(([A-Z'.* ]{2,} ){2,}[A-Z]{1,})(?=.*BSN)/;
postcode = /\d{4}/;
searchSober= /(N - Sober)+/;
searchNotSober= /(NN - Not sober)+/;
adres = inputText.split('\n');
for (var i = 0; i < adres.length; i++) {
// If in one line And a postcode and which starts with at least
// two UPPERCASE words following a space
temp = adres[i]
if ( pattern.test(temp) && postcode.test(temp)) {
//Remove BSN in order to be able to use digits to sort out the postal code
temp = temp.replace( /BSN.*/g, "");
// Example: DUMMY FOO V Stationstreet 2 8000 New York F N - Sober
//Selection of the name, always take first part of the array
// DUMMY FOO
var name = temp.match(/^([-A-Z'*.]{2,} ){1,}[-A-Z.]{2,}/)[0];
//remove the name from the string
temp = temp.replace(/^([-A-Z'*.]{2,} ){1,}[-A-Z.]{2,}/, "");
// V Stationstreet 2 8000 New York F N - Sober
//filter out gender
//Using jquery trim for whitespace trimming
// V
var gender = $.trim(temp.match(/^( [A-Z'*.]{1} )/)[0]);
//remove gender
temp = temp.replace(/^( [A-Z'*.]{1} )/, "");
// Stationstreet 2 8000 New York F N - Sober
//looking for status
var status = "unknown";
if ( searchNotSober.test(temp) ) {
status = "Not sober";
}
else if ( searchSober.test(temp) ) {
status = "Sober";
}
else {
status = "unknown";
}
//Selection of the address /^.*[0-9]{4}.[\w-]{2,40}/
//Stationstreet 2 8000 New York
var address = $.trim(temp.match(/^.*[0-9]{4}.[\w-]{2,40}/gm));
//assemble into person object.
var person={name: name + "", address: address + "", gender: gender +"", status:status + "", location:[] , marker:[]};
result.push(person);
}
}
The problem I have now is that:
Sometimes the names are not written in CAPITALS
Sometimes the postal code is not added so my code just stops working.
Sometimes they put a * in front of the name
A broader question: what strategy can you take to tackle these types of messy input problems?
Should I make cases for every mistake I see in the snippets I get? I feel like I don't really know what I will get out of this piece of code each time I run it with different input.
Here is a general way of handling it:
Find all lines that are most likely matches. Match on "Sober" or whatever makes it unlikely to miss a match, even if it gives you false positives.
Filter out false positives. You will have to update and tweak this step as you go. Make sure you only filter out what isn't relevant at all.
Filter the input strictly. Whatever doesn't match gets logged/reported for manual handling; whatever does match now conforms to a known strict pattern.
Normalize and extract the data. This should now be much easier, since the possible input at this stage is limited.
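The four steps above could be sketched roughly as follows. Everything specific here is an assumption read off the sample lines in the question: the name `processLines`, the idea that comment lines start with "Mededeling:", the single-letter gender (V or M) after the name, and the fixed " F N - Sober" / " F NN - Not Sober" tail. Real input will need the patterns tuned.

```javascript
// Step 1: loose pattern that should not miss any person line.
var CANDIDATE = /sober/i;
// Step 2: known false positives (assumed here: free-text comment lines).
var FALSE_POSITIVE = /^Mededeling:/;
// Step 3: strict pattern: optional "*", name, gender, address, "F", status.
var STRICT = /^\*?\s*(.+?)\s+([VM])\s+(.+?)\s+F\s+N{1,2}\s+-\s+(Not\s+[Ss]ober|Sober)\b/;

function processLines(inputText) {
  var accepted = [];
  var rejected = [];
  inputText.split("\n").forEach(function (line) {
    if (!CANDIDATE.test(line) || FALSE_POSITIVE.test(line)) {
      return; // not a person line
    }
    var m = STRICT.exec(line);
    if (!m) {
      rejected.push(line); // report for manual handling
      return;
    }
    // Step 4: normalize the extracted fields.
    accepted.push({
      name: m[1].toUpperCase(),
      gender: m[2],
      address: m[3],
      status: /^Not/.test(m[4]) ? "Not sober" : "Sober"
    });
  });
  return { accepted: accepted, rejected: rejected };
}
```

Because the name and address groups accept any characters, lowercase names, a leading * and a missing postal code no longer stop the extraction; lines that still do not fit end up in `rejected` instead of silently disappearing.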

title casing and Abbreviations in javascript

I am trying to Titlecase some text which contains corporate names and their stock symbols.
Example (these strings are concatenated as corporate name, which gets title cased and the symbol in parens): AT&T (T)
John Deere Inc. (DE)
These corporate names come from our database, which draws them from a stock pricing service. I have it working EXCEPT for when the name is an abbreviation like AT&T.
That is returned, you guessed it, as At&t. How can I preserve casing in abbreviations? I thought of using indexOf to get the position of any &'s and uppercasing the characters on either side of it, but that seems hackish.
Along the lines of (pseudo code):
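var indexPos = myString.indexOf("&");
// uppercase the characters on either side of the "&" (assumes indexPos > 0)
var fixedString = myString.slice(0, indexPos - 1) + myString.slice(indexPos - 1, indexPos + 2).toUpperCase() + myString.slice(indexPos + 2);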
Oops, forgot to include my titlecase function
function toTitleCase(str) {
return str.replace(/([^\W_]+[^\s-]*) */g, function (txt) {
return txt.charAt(0).toUpperCase() + txt.substr(1).toLowerCase();
});
}
Any better suggestions?
A better title case function may be
function toTitleCase(str) {
return str.replace(
/(\b.)|(.)/g,
function ($0, $1, $2) {
return ($1 && $1.toUpperCase()) || $2.toLowerCase();
}
);
}
toTitleCase("foo bAR&bAz a.e.i."); // "Foo Bar&Baz A.E.I."
This will still transform AT&T to At&T, but there's no information in the way it's written to know what to do, so finally
// specific fixes
if (str === "At&T" ) str = "AT&T";
else if (str === "Iphone") str = "iPhone";
// etc
// or
var dict = {
"At&T": "AT&T",
"Iphone": "iPhone"
};
str = dict[str] || str;
Though of course if you can do it right when you enter the data in the first place it will save you a lot of trouble
This is a general solution for title case, without taking your extra requirements of "abbreviations" into account:
// String.toUpperCase is not a standard function, so pass a small callback to replace()
var fixedString = String(myString).toLowerCase().replace(/\b\w/g, function (c) { return c.toUpperCase(); });
Although I agree with other posters that it's better to start with the data in the correct format in the first place. Not all proper names conform to title case, with just a couple examples being "Wernher von Braun" and "Ronald McDonald." There's really no algorithm you can program into a computer to handle the often arbitrary capitalization of proper names, just like you can't really program a computer to spell check proper names.
However, you can certainly program in some exception cases, although I'm still not sure that simply assuming any word with an ampersand in it should be in all caps is always appropriate. But that can be accomplished like so:
function upper(s) { return s.toUpperCase(); }
var titleCase = String(myString).toLowerCase().replace(/\b\w/g, upper);
var fixedString = titleCase.replace(/\b\w*&\w*\b/g, upper);
Note that your second example of "John Deere Inc. (DE)" still isn't handled properly, though. I suppose you could add some other logic to, say, put any word between parentheses in all caps, like so:
function upper(s) { return s.toUpperCase(); }
var titleCase = String(myString).toLowerCase().replace(/\b\w/g, upper);
var titleCaseCapAmps = titleCase.replace(/\b\w*&\w*\b/g, upper);
var fixedString = titleCaseCapAmps.replace(/\(.*\)/g, upper);
Which will at least handle your two examples correctly.
How about this: Since the number of companies registered with the stock exchange is finite, and there's a well-defined mapping between stock symbols and company names, your best bet is probably to program that mapping into your code and look up the company name by the ticker abbreviation, something like this:
var TickerToName =
{
A: "Agilent Technologies",
AA: "Alcoa Inc.",
// etc., etc.
}
Then it's just a simple lookup to get the company name from the ticker symbol:
var symbol = "T";
var CompanyName = TickerToName[symbol] || "Unknown ticker symbol: " + symbol;
Of course, I would be very surprised if there was not already some kind of Web Service you could call to get back a company name from a stock ticker symbol, something like in this thread:
Stock ticker symbol lookup API
Or maybe there's some functionality like this in the stock pricing service you're using to get the data in the first place.
The last time I faced this situation, I decided that it was less trouble to simply include the few exceptions here and there as needed.
var titleCaseFix = {
    "At&t": "AT&T"
};
function fixit(str) {
    for (var oldCase in titleCaseFix) {
        var newCase = titleCaseFix[oldCase];
        // Replace every occurrence of oldCase; split/join avoids regex escaping.
        // Look here for various other string replace options:
        // http://stackoverflow.com/questions/542232/in-javascript-how-can-i-perform-a-global-replace-on-string-with-a-variable-insi
        str = str.split(oldCase).join(newCase);
    }
    return str;
}
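The exception list can also be applied while title-casing, word by word, so that no second pass over the string is needed. This is only a sketch under assumptions: the function name `toTitleCaseWithExceptions` and the entries in `EXCEPTIONS` are examples, words are taken to be whitespace-delimited, and the list would have to be maintained by hand.

```javascript
// Hypothetical exception list keyed by the lowercase form of each word.
var EXCEPTIONS = {
  "at&t": "AT&T",
  "iphone": "iPhone",
  "von": "von"
};

function toTitleCaseWithExceptions(str) {
  // Handle each whitespace-delimited word on its own.
  return str.replace(/\S+/g, function (word) {
    var key = word.toLowerCase();
    if (Object.prototype.hasOwnProperty.call(EXCEPTIONS, key)) {
      return EXCEPTIONS[key]; // known special casing wins
    }
    // Default: first character upper case, the rest lower case.
    return key.charAt(0).toUpperCase() + key.slice(1);
  });
}

console.log(toTitleCaseWithExceptions("AT&T"));              // "AT&T"
console.log(toTitleCaseWithExceptions("JOHN DEERE INC."));   // "John Deere Inc."
console.log(toTitleCaseWithExceptions("wernher VON braun")); // "Wernher von Braun"
```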

JavaScript find names in strings

What's a good JavaScript library for searching a given string for a large list of names?
For example, given a list of 1000 politicians names find every instance in a string and wrap it in a span.
Priorities are performance with a growing list of names, and accuracy in determining difference between eg, "Tony Blair", "Tony Blair III".
For example, this:
["Tony Blair", "Margaret Thatcher", "Tony Blairite", "Tony Blair III", etc...]
"The best PM after Tony Blair was Margaret Thatcher."
Becomes:
"The best PM after <span class="mp">Tony Blair</span> was <span class="mp">Margaret Thatcher</span>."
var names = ['foo','bar'];
var content = "this foo is bar foobar foo ";
for (var c=0,l=names.length;c<l;c++) {
var r = new RegExp("\\b("+names[c]+")\\b","gi");
content = content.replace(r,'<span class="mp">$1</span>');
}
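With a long list, running one replace per name can wrap a name that is already inside a span, and the result depends on the order of the list. A possible variation is to build a single regular expression from all names, longest first, so that "Tony Blair III" is tried before "Tony Blair". The sketch below assumes the names are plain strings; `escapeRegExp` and `wrapNames` are helper names made up for illustration.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function wrapNames(content, names) {
  if (names.length === 0) {
    return content; // nothing to look for
  }
  // Longest names first, so the most specific alternative is tried first.
  var sorted = names.slice().sort(function (a, b) {
    return b.length - a.length;
  });
  // One pattern for all names; \b keeps "Tony Blair" from matching inside "Tony Blairs".
  var pattern = new RegExp("\\b(" + sorted.map(escapeRegExp).join("|") + ")\\b", "g");
  return content.replace(pattern, '<span class="mp">$1</span>');
}

var names = ["Tony Blair", "Margaret Thatcher", "Tony Blairite", "Tony Blair III"];
console.log(wrapNames("The best PM after Tony Blair was Margaret Thatcher.", names));
// The best PM after <span class="mp">Tony Blair</span> was <span class="mp">Margaret Thatcher</span>.
```

The content is scanned once regardless of how many names there are, and the pattern only needs to be rebuilt when the list changes.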
