Javascript Regex, Add space before and after math operators if there is none - javascript

I'm trying to make the perfect math parser for my Discord bot.
Currently I have a simple parser function that takes a string and runs it through a chain of .replace calls to strip junk and formatting left over from Discord, plus quality-of-life substitutions such as replacing {} with ().
var parseArgs = args.toLowerCase().replace(/ -o/g, "").replace(/x/g, "*").replace(/[a-z]/g, "")
.replace(/{/g, "(").replace(/}/g, ")").replace(/\[/g, "(").replace(/]/g, ")").replace(/\+=/g, "+")
.replace(/-=/g, "-").replace(/'/g, "").replace(/`/g, "").replace(/"/g, "");
var origArgs = args.toLowerCase().replace(/`/g, "").replace(/ -o/g, "");
const output = parseMath(parseArgs);
This is nice and all, but if you input an equation like this:
!math 1 + 1aaa+aaaa2{55>>2}
The parser will output:
1 + 1+2*(55>>2)
I want it to output:
1 + 1 + 2 * (55 >> 2)
Either form is easily parsed by my function, but the equation is also sent to the chat, and the unspaced version is quite ugly.
I'm asking whether there's a simple regex to check whether a math operator (+ - / * x ( ) >> ^ += -= == ===) sits between numbers and to add the missing spaces around it,
so that 1+2/3(4>>2) and 3>>4===3*4 turn into 1 + 2 / 3 (4 >> 2) and 3 >> 4 === 3 * 4 respectively.
Edit: I see how crappy my replaces are, so I simplified them:
var parseArgs = args.toLowerCase().replace(/x/g, "*").replace(/ -o|[a-z]|"|'|`/g, "")
.replace(/{|\[/g, "(").replace(/}|]/g, ")").replace(/\+=/g, "+").replace(/-=/g, "-");
var origArgs = args.toLowerCase().replace(/ -o|`/g, "");

First replace anything that isn't mathematical (anything that isn't a digit or a possible operator) with a space. Then use .replace to match zero or more spaces, followed by a run of operator characters, followed by zero or more spaces again, and replace the match with the operator surrounded by one space on each side:
const parse = (args) => {
const argsWithOnlyMath = args.replace(/[^\d+\-\/*x()>^=]/g, ' ');
const spacedArgs = argsWithOnlyMath
.replace(/\s*(\D+)\s*/g, ' $1 ') // add spaces
.replace(/ +/g, ' ') // ensure no duplicate spaces
.replace(/\( /g, '(') // remove space after (
.replace(/ \)/g, ')'); // remove space before )
console.log(spacedArgs);
};
parse('!math 1 + 1aaa+aaaa2(55>>2)');
parse(' 1+2/3(4>>2) ');
parse('3>>4===3*4');
To also add spaces before ( and after ), just add more .replaces:
const parse = (args) => {
const argsWithOnlyMath = args.replace(/[^\d+\-\/*x()>^=]/g, ' ');
const spacedArgs = argsWithOnlyMath
.replace(/\s*(\D+)\s*/g, ' $1 ') // add spaces
.replace(/\(/g, ' (') // add space before (
.replace(/\)/g, ') ') // add space after )
.replace(/ +/g, ' ') // ensure no duplicate spaces
.replace(/\( /g, '(') // remove space after (
.replace(/ \)/g, ')'); // remove space before )
console.log(spacedArgs);
};
parse('!math 1 + 1aaa+aaaa2(55>>2)');
parse(' 1+2/3(4>>2) *()');
parse('3*()');
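Because \D+ treats any run of non-digits as one operator, a variant that lists the operators explicitly may be easier to reason about. The following is a minimal sketch under that assumption; the name spaceOperators and the exact operator list are illustrative and not part of the answer above. Longer operators come first in the alternation so that === is not split into == and =.

```javascript
// Hypothetical helper, not from the original answer.
// Multi-character operators are listed before single-character ones.
const OPERATORS = /\s*(===|==|>>|\+=|-=|[+\-*\/^])\s*/g;

function spaceOperators(expr) {
  return expr
    .replace(OPERATORS, ' $1 ')     // one space on each side of every operator
    .replace(/(\d)\s*\(/g, '$1 (')  // space between a number and an opening parenthesis
    .replace(/\)\s*(\d)/g, ') $1')  // space between a closing parenthesis and a number
    .replace(/ +/g, ' ')            // collapse runs of spaces
    .trim();
}

console.log(spaceOperators('1+2/3(4>>2)'));      // 1 + 2 / 3 (4 >> 2)
console.log(spaceOperators('3>>4===3*4'));       // 3 >> 4 === 3 * 4
console.log(spaceOperators('1 + 1+2*(55>>2)'));  // 1 + 1 + 2 * (55 >> 2)
```

Note that this sketch does not distinguish a unary minus from a binary one, so -3 would come out as "- 3".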

Related

RegEx Data Values Javascript white Space

I am trying to add the correct white space to data I am receiving. Currently it shows like this:
NotStarted
ReadyforPPPDReview
This is the code I am using:
.replace(/([A-Z])/g, ' $1')
"NotStarted" correctly shows as "Not Started", but "ReadyforPPPDReview" shows as "Readyfor P P P D Review" when it should look like "Ready for PPPD Review".
What is the best way to handle both of these using one regex or function?
You would need an NLP engine to handle this properly. Here are two approaches with simple regex, both have limitations:
1. Use list of stop words
We blindly add spaces before and after the stop words:
var str = 'NotStarted, ReadyforPPPDReview';
var wordList = 'and, for, in, on, not, review, the'; // stop words
var wordListRe = new RegExp('(' + wordList.replace(/, */g, '|') + ')', 'gi');
var result1 = str
.replace(wordListRe, ' $1 ') // add space before and after stop words
.replace(/([a-z])([A-Z])/g, '$1 $2') // add space between lower case and upper case chars
.replace(/ +/g, ' ') // remove excessive spaces
.trim(); // remove spaces at start and end
console.log('str: ' + str);
console.log('result1: ' + result1);
As you can imagine, the stop words approach has some severe limitations. For example, the words "formula input" would result in "for mula in put".
2. Use a mapping table
The mapping table lists words that need to be spaced out (no drugs involved), as in this code snippet:
var str = 'NotStarted, ReadyforPPPDReview';
var spaceWordMap = {
NotStarted: 'Not Started',
Readyfor: 'Ready for',
PPPDReview: 'PPPD Review'
// add more as needed
};
var spaceWordMapRe = new RegExp('(' + Object.keys(spaceWordMap).join('|') + ')', 'gi');
var result2 = str
.replace(spaceWordMapRe, function(m, p1) { // m: matched snippet, p1: first group
return spaceWordMap[p1] // replace key in spaceWordMap with its value
})
.replace(/([a-z])([A-Z])/g, '$1 $2') // add space between lower case and upper case chars
.replace(/ +/g, ' ') // remove excessive spaces
.trim(); // remove spaces at start and end
console.log('str: ' + str);
console.log('result2: ' + result2);
This approach is suitable if you have a deterministic list of words as input.
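For comparison, here is a minimal sketch that uses case boundaries only, with no word list. The name splitCaseBoundaries is illustrative. It keeps runs of capitals such as PPPD together, but it cannot split words that are joined in lower case, so "Readyfor" stays as it is and would still need one of the two approaches above.

```javascript
// Hypothetical helper: splits on lower-to-upper boundaries and after acronyms.
function splitCaseBoundaries(str) {
  return str
    .replace(/([a-z])([A-Z])/g, '$1 $2')        // "tS" -> "t S", "rP" -> "r P"
    .replace(/([A-Z]+)([A-Z][a-z])/g, '$1 $2')  // "PPPDRe" -> "PPPD Re"
    .trim();
}

console.log(splitCaseBoundaries('NotStarted'));          // Not Started
console.log(splitCaseBoundaries('ReadyforPPPDReview'));  // Readyfor PPPD Review
```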

Javascript: Regex to replace two characters and all their surrounding whitespace

I have a string and would like to replace all instances of the two characters "<" and ">" together with all its surrounding whitespace (no tabs, no newlines, possibly empty) by " < " and " > ", respectively.
Can I do this with a one-liner replace regex expression?
The slow and hard way would be
while (entry.value.indexOf(" <") > -1) {
entry.value = entry.value.replace(" <","<");
}
while (entry.value.indexOf("< ") > -1) {
entry.value = entry.value.replace("< ","<");
}
while (entry.value.indexOf(" >") > -1) {
entry.value = entry.value.replace(" >",">");
}
while (entry.value.indexOf("> ") > -1) {
entry.value = entry.value.replace("> ",">");
}
entry.value = entry.value.replace("<"," < ").replace(">"," > ");
Shortening the whitespace is explained at Regex to replace multiple spaces with a single space, but I do not assume whitespaces around the two characters.
The use case I have are saving math expressions in a database to be presented on a website using MathJax. Doing so, one runs into exactly this problem, see http://docs.mathjax.org/en/latest/tex.html#tex-and-latex-in-html-documents.
Typical expressions are
"Let $i$ such that $i<j$..."
"Let $<I>$ be an ideal in..."
(the latter wouldn't even render here in the preview in normal text mode.)
Copy-pasting Wiktor's comment here: use /\s*([<>])\s*/g with ' $1 ' as the replacement. \s matches any whitespace character, * matches 0 or more of those whitespace characters, [<>] matches either < or >, the g flag makes the replace global instead of replacing only the first match, and the parentheses create a capture group so that $1 in the replacement string refers to the matched < or >.
See some example input output below.
'<>' // => ' <  > ' (two spaces between the angle brackets)
'<\t\t\n\ \n<' // => ' <  < ' (again two spaces)
'>a \t b< ' // => ' > a \t b < '
'a>\n b <c ' // => 'a > b < c '
a = 'fpo< \n>\naf ja\tb<>\t<><>asd\npfi b.<< > >';
b = a.replace(/\s*([<>])\s*/g, ' $1 ');
console.log(b);
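Since \s also matches tabs and newlines, and the question asks about spaces only, a variant restricted to the space character may be closer to the request. This is a small sketch under that reading; the name padAngleBrackets is illustrative.

```javascript
// Only literal spaces around < and > are consumed; tabs and newlines survive.
const padAngleBrackets = (text) => text.replace(/ *([<>]) */g, ' $1 ');

console.log(padAngleBrackets('Let $i$ such that $i<j$...'));   // Let $i$ such that $i < j$...
console.log(padAngleBrackets('Let $<I>$ be an ideal in...'));  // Let $ < I > $ be an ideal in...
```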

How to replace a double 2 consecutive quotes by a space in JavaScript

I have a string which represents longitude/latitude in the following format:
dd°mm'ss''W (note 2 single quotes after ss).
And I am using the following code to transform the string in its decimal representation:
function dmsTodegrees(val) {
var s = val.replace('°', ' ');
s = s.replace("'", ' ');
s = s.replace("''", ' ');
var tokens = s.split(' ');
var result = Number.parseFloat(tokens[0]) + Number.parseFloat(tokens[1]) / 60 + Number.parseFloat(tokens[2]) / 3600;
if (tokens[3] === 'W' || tokens[3] === 'S') result = -result;
return result;
}
However, it seems that s = s.replace("''", ' '); is not doing what is intended, and the 2 single quotes (') are not replaced by a space. I'm not sure what I am doing wrong here.
Note that I have omitted all the error handling here.
You can use /'{1,2}/g to replace every run of one or two ' in the string. If you don't care about the count and want to replace any run of single quotes, you can just use /'+/g
function dmsTodegrees(val) {
var s = val.replace('°', ' ');
s = s.replace(/'{1,2}/g, ' ');
return s;
}
console.log(dmsTodegrees("dd°mm'ss''W"));
So, you are trying to replace °, ' and '' with space in order to split the string with space. Instead of replacing them with space and splitting the string with space, why don't you just split directly?
...
var tokens = val.split(/°|'{1,2}/);
...
It works like this:
"12.34°56.78'90.12''W".split(/°|'{1,2}/)
=> (4) ["12.34", "56.78", "90.12", "W"]
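Putting the split into the original conversion could look like the sketch below. It assumes well-formed input in the dd°mm'ss''W format from the question and, like the question, omits error handling; the name dmsToDegrees is illustrative.

```javascript
// Sketch: convert "dd°mm'ss''W" to signed decimal degrees.
// Assumes the input always has degrees, minutes, seconds and a hemisphere letter.
function dmsToDegrees(val) {
  const [deg, min, sec, hemisphere] = val.trim().split(/°|'{1,2}/);
  const result =
    Number.parseFloat(deg) +
    Number.parseFloat(min) / 60 +
    Number.parseFloat(sec) / 3600;
  // West and South are negative by convention.
  return hemisphere === 'W' || hemisphere === 'S' ? -result : result;
}

console.log(dmsToDegrees("12°30'36''W")); // approximately -12.51
```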
For these lines:
s = s.replace("'", ' ');
s = s.replace("''", ' ');
You might expect the first to replace every ' with a space, so that the second never finds two quotes together.
Except that replace with a string pattern only replaces the first occurrence it finds.
So
"1,2,3,'',4,'',5,''".replace("'", " ").replace("''", ' ')
gives you
"1,2,3, ',4, ,5,''"
Check the answer from @Dij for the better way.
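The difference between a string pattern and a regex with the g flag can be seen in a short sketch; the sample input is made up for illustration.

```javascript
const input = "10'20''30'";

// A string pattern replaces only the first occurrence.
const firstOnly = input.replace("'", ' ');
// A regex with the g flag replaces every occurrence.
const everyQuote = input.replace(/'/g, ' ');
// '+ treats a run of quotes as a single separator.
const quoteRuns = input.replace(/'+/g, ' ');

console.log(JSON.stringify(firstOnly));   // "10 20''30'"
console.log(JSON.stringify(everyQuote));  // "10 20  30 "
console.log(JSON.stringify(quoteRuns));   // "10 20 30 "
```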
I would suggest using escaped single quotes (\') inside single-quoted strings in the replace calls:
function dmsTodegrees(val) {
var s = val.replace('°', ' ');
s = s.replace('\'', ' ');
s = s.replace('\'\'', ' ');
console.log(s);
var tokens = s.split(' ');
var result = Number.parseFloat(tokens[0]) + Number.parseFloat(tokens[1]) / 60 + Number.parseFloat(tokens[2]) / 3600;
if (tokens[3] === 'W' || tokens[3] === 'S') result = -result;
return result;
}
console.log(dmsTodegrees("20°10'30''4"));
You need to replace the two single quotes first and then the one single quote.
If you replace the single quote first, one quote of the pair is consumed and the pair is never matched (as is the case with you).
s = s.replace("''"," ");
s = s.replace("'"," ");

How to replace css('background-image')

I want to replace css('background-image') path.
The problem:
for the same variable oldBgImg = this.element.css('background-image')
Firefox returns:
"url("http://mySite/images/file1.png")"
but Chrome returns it without the quotes:
"url(http://mySite/images/file1.png)"
Here is the solution I use. Can you please help me make it simpler?
var oldBgImg = this.element.css('background-image');
// => FF: "url("http://mySite/images/file1.png")"
// Chrome: "url(http://mySite/images/file1.png)"
// According to http://www.w3.org/TR/CSS2/syndata.html#value-def-uri :
// quotes are optional, so Chrome does not use them, but FF does . . .
var n1 = oldBgImg.lastIndexOf("("); n1 += 1; // now points to the char after the "("
var n2 = oldBgImg.lastIndexOf(")"); n2 -= 1; // now points to the char before the ")"
var c1 = oldBgImg.substring(n1, n1 + 1); // test the first Char after the "("
var c2 = oldBgImg.substring(n2, n2 + 1); // test the last Char before the ")"
if ( (c1 == "\"") || (c1 == "\'") ) { n1 += 1; }
if ( (c2 == "\"") || (c2 == "\'") ) { n2 -= 1; }
var oldBgImgPath = oldBgImg.substring(n1, n2 + 1); // [ (" ] .. [ ") ]
var n = oldBgImgPath.lastIndexOf("/");
var newBgImgPath = oldBgImgPath.substring(0, n + 1) + "file2.gif";
// if needed, should also add :
// var path = encodeURI(newBgImgPath);
this.element.css('background-image', 'url(' + newBgImgPath + ')');
Notes:
According to http://www.w3.org/TR/CSS2/syndata.html#value-def-uri
one can use single quote or double-quote or no quote sign
I am looking for a general solution, also for relative path (without "http" or with "file") , I just want to replace the fileName within the URL.
Here's an example of how to do it with regular expressions.
The expression:
("?)(http:.*?)\1\)
The match
url = 'url("http://mySite/images/file1.png")'.match(/("?)(http:.*?)\1\)/)[2];
You can then reconstruct your property.
$(this).css( 'background-image', 'url("' + url + '")' );
This should work on all browsers.
I did it with regular expressions. I use this code:
var re = /url\(['"]?(.+?)[^\/]+['"]?\)/;
var regs = re.exec(oldBgImg);
var newBgImgPath = regs[1] + "file2.png";
I'll explain the RE.
It starts with a /, this will indicate it's a RE.
Then there's url\(. It matches the text url(. ( is escaped because it is a reserved character.
Then there is ['"]?. ['"] matches ' or " and the ? makes it optional.
A ( starts a RE group, that can be referred to.
In .+? the . matches any character except a newline. The + means there must be at least one of them. Finally, the ? makes the + non-greedy, so it matches as few characters as possible while still letting the whole RE match.
A ) ends the group.
[^\/] matches any non-/ character. Then there's a + again. It has no ? after it, because we want to match as many non-/ characters (the file name) from the end as we can.
Finally, another optional quote, an escaped ) for the closing bracket in url(...) and a / to end the RE.
Now re.exec(oldBgImg) returns an array with the first element being the whole matched string and the next elements being the matched RE groups (created by () brackets). Then I can just take regs[1], which is the first matched group and contains the pathname.
You could replace the quotes in oldBgImg with nothing like this.
oldBgImg = oldBgImg.replace(/\"/g, "");
That way the URL is always the same no matter what browser retrieved it.
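The ideas from these answers can be combined into one helper that works on the plain string, whichever quoting the browser returns. This is a sketch, assuming the property holds a single url(...) value (no multiple backgrounds or gradients); the name replaceBgFileName is illustrative.

```javascript
// Hypothetical helper: swap the file name inside a CSS url(...) value.
// The backreference \1 requires the closing quote to match the opening one (or none).
function replaceBgFileName(cssValue, newFileName) {
  const match = /^url\((['"]?)(.*?)\1\)$/.exec(cssValue.trim());
  if (!match) return cssValue;                 // e.g. "none": leave untouched
  const path = match[2];
  const dir = path.substring(0, path.lastIndexOf('/') + 1); // '' when there is no slash
  return 'url("' + dir + newFileName + '")';
}

console.log(replaceBgFileName('url("http://mySite/images/file1.png")', 'file2.gif'));
// url("http://mySite/images/file2.gif")
console.log(replaceBgFileName('url(images/file1.png)', 'file2.gif'));
// url("images/file2.gif")
```

The result could then be passed to this.element.css('background-image', ...) as in the question.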

Regular Expression to reformat a US phone number in Javascript

I'm looking to reformat (replace, not validate - there are many references for validating) a phone number for display in Javascript. Here's an example of some of the data:
123 4567890
(123) 456-7890
(123)456-7890
123 456 7890
123.456.7890
(blank/null)
1234567890
Is there an easy way to use a regular expression to do this? I'm looking for the best way to do this. Is there a better way?
I want to reformat the number to the following: (123) 456-7890
Assuming you want the format "(123) 456-7890":
function formatPhoneNumber(phoneNumberString) {
var cleaned = ('' + phoneNumberString).replace(/\D/g, '');
var match = cleaned.match(/^(\d{3})(\d{3})(\d{4})$/);
if (match) {
return '(' + match[1] + ') ' + match[2] + '-' + match[3];
}
return null;
}
Here's a version that allows the optional +1 international code:
function formatPhoneNumber(phoneNumberString) {
var cleaned = ('' + phoneNumberString).replace(/\D/g, '');
var match = cleaned.match(/^(1|)?(\d{3})(\d{3})(\d{4})$/);
if (match) {
var intlCode = (match[1] ? '+1 ' : '');
return [intlCode, '(', match[2], ') ', match[3], '-', match[4]].join('');
}
return null;
}
formatPhoneNumber('+12345678900') // => "+1 (234) 567-8900"
formatPhoneNumber('2345678900') // => "(234) 567-8900"
Possible solution:
function normalize(phone) {
//normalize string and remove all unnecessary characters
phone = phone.replace(/[^\d]/g, "");
//check if number length equals to 10
if (phone.length == 10) {
//reformat and return phone number
return phone.replace(/(\d{3})(\d{3})(\d{4})/, "($1) $2-$3");
}
return null;
}
var phone = '(123)4567890';
phone = normalize(phone); //(123) 456-7890
var x = '301.474.4062';
x = x.replace(/\D+/g, '')
.replace(/(\d{3})(\d{3})(\d{4})/, '($1) $2-$3');
alert(x);
This answer borrows from maerics' answer. It differs primarily in that it accepts partially entered phone numbers and formats the parts that have been entered.
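// Wrapped in a function so the return is valid; the name formatPartialPhone is illustrative.
function formatPartialPhone(value) {
let phone = value.replace(/\D/g, '');
const match = phone.match(/^(\d{1,3})(\d{0,3})(\d{0,4})$/);
if (match) {
phone = `${match[1]}${match[2] ? ' ' : ''}${match[2]}${match[3] ? '-' : ''}${match[3]}`;
}
return phone;
}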
I'm using this function to format US numbers.
function formatUsPhone(phone) {
var phoneTest = new RegExp(/^((\+1)|1)? ?\(?(\d{3})\)?[ .-]?(\d{3})[ .-]?(\d{4})( ?(ext\.? ?|x)(\d*))?$/);
phone = phone.trim();
var results = phoneTest.exec(phone);
if (results !== null && results.length > 8) {
return "(" + results[3] + ") " + results[4] + "-" + results[5] + (typeof results[8] !== "undefined" ? " x" + results[8] : "");
}
else {
return phone;
}
}
It accepts almost all imaginable ways of writing a US phone number. The result is formatted to a standard form of (987) 654-3210 x123
thinking backwards
Take the last digits only (up to 10) ignoring first "1".
function formatUSNumber(entry = '') {
const match = (entry
.replace(/\D+/g, '').replace(/^1/, '')
.match(/([^\d]*\d[^\d]*){1,10}$/) || [''])[0] // fall back to '' when no digits are left
const part1 = match.length > 2 ? `(${match.substring(0,3)})` : match
const part2 = match.length > 3 ? ` ${match.substring(3, 6)}` : ''
const part3 = match.length > 6 ? `-${match.substring(6, 10)}` : ''
return `${part1}${part2}${part3}`
}
example input / output as you type
formatUSNumber('+1333')
// (333)
formatUSNumber('333')
// (333)
formatUSNumber('333444')
// (333) 444
formatUSNumber('3334445555')
// (333) 444-5555
2021
libphonenumber-js
Example
import parsePhoneNumber from 'libphonenumber-js'
const phoneNumber = parsePhoneNumber('+12133734253')
phoneNumber.formatInternational() === '+1 213 373 4253'
phoneNumber.formatNational() === '(213) 373-4253'
phoneNumber.getURI() === 'tel:+12133734253'
Based on David Baucum's answer, here is a version that tries to improve auto-replacement "as you type", for example in a React onChange event handler:
function formatPhoneNumber(phoneNumber) {
const cleanNum = phoneNumber.toString().replace(/\D/g, '');
const match = cleanNum.match(/^(\d{3})(\d{0,3})(\d{0,4})$/);
if (match) {
return '(' + match[1] + ') ' + (match[2] ? match[2] + "-" : "") + match[3];
}
return cleanNum;
}
//...
onChange={e => setPhoneNum(formatPhoneNumber(e.target.value))}
It will insert (###) as soon as there are 3 numbers and then it will keep following the RegEx until it looks like this (###) ###-####
I've extended David Baucum's answer to include support for extensions up to 4 digits in length. It also includes the parentheses requested in the original question. This formatting will work as you type in the field.
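// Wrapped in a function so the return is valid; the name formatPhoneWithExtension is illustrative.
function formatPhoneWithExtension(phone) {
phone = phone.replace(/\D/g, '');
const match = phone.match(/^(\d{1,3})(\d{0,3})(\d{0,4})(\d{0,4})$/);
if (match) {
phone = `(${match[1]}${match[2] ? ') ' : ''}${match[2]}${match[3] ? '-' : ''}${match[3]}${match[4] ? ' x' : ''}${match[4]}`;
}
return phone;
}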
Almost all of these have issues when the user tries to backspace over the delimiters, particularly from the middle of the string.
Here's a jQuery solution that handles that, and also makes sure the cursor stays in the right place as you edit:
//format text input as phone number (nnn) nnn-nnnn
$('.myPhoneField').on('input', function (e){
var $phoneField = e.target;
var cursorPosition = $phoneField.selectionStart;
var numericString = $phoneField.value.replace(/\D/g, '').substring(0, 10);
// let user backspace over the '-'
if (cursorPosition === 9 && numericString.length > 6) return;
// let user backspace over the ') '
if (cursorPosition === 5 && numericString.length > 3) return;
if (cursorPosition === 4 && numericString.length > 3) return;
var match = numericString.match(/^(\d{1,3})(\d{0,3})(\d{0,4})$/);
if (match) {
var newVal = '(' + match[1];
newVal += match[2] ? ') ' + match[2] : '';
newVal += match[3] ? '-' + match[3] : '';
// to help us put the cursor back in the right place
var delta = newVal.length - Math.min($phoneField.value.length, 14);
$phoneField.value = newVal;
$phoneField.selectionEnd = cursorPosition + delta;
} else {
$phoneField.value = '';
}
})
var numbers = "(123) 456-7890".replace(/[^\d]/g, ""); //This strips all characters that aren't digits
if (numbers.length != 10) { //wrong format
//handle error
}
var phone = "(" + numbers.substr(0, 3) + ") " + numbers.substr(3, 3) + "-" + numbers.substr(6); //Create format with substrings
Here is one that will accept both phone numbers and phone numbers with extensions.
function phoneNumber(tel) {
var toString = String(tel),
phoneNumber = toString.replace(/[^0-9]/g, ""),
countArrayStr = phoneNumber.split(""),
numberVar = countArrayStr.length,
closeStr = countArrayStr.join("");
if (numberVar == 10) {
var phone = closeStr.replace(/(\d{3})(\d{3})(\d{4})/, "$1.$2.$3"); // Change number symbols here for numbers 10 digits in length. Just change the periods to what ever is needed.
} else if (numberVar > 10) {
var howMany = closeStr.length,
subtract = (10 - howMany),
phoneBeginning = closeStr.slice(0, subtract),
phoneExtention = closeStr.slice(subtract),
disX = "x", // Change the extension symbol here
phoneBeginningReplace = phoneBeginning.replace(/(\d{3})(\d{3})(\d{4})/, "$1.$2.$3"), // Change number symbols here for numbers greater than 10 digits in length. Just change the periods and to what ever is needed.
array = [phoneBeginningReplace, disX, phoneExtention],
afterarray = array.splice(1, 0, " "),
phone = array.join("");
} else {
var phone = "invalid number US number";
}
return phone;
}
phoneNumber("1234567891"); // Your phone number here
For international phone numbers with a country code of up to 3 digits, we can change the original answer a little bit as below.
For first match instead of looking for '1' we should look for 1-3 digits.
export const formatPhoneNumber = (phoneNumberString) => {
var cleaned = ('' + phoneNumberString).replace(/\D/g, '');
var match = cleaned.match(/^(\d{1,3}|)?(\d{3})(\d{3})(\d{4})$/);
if (match) {
var intlCode = (match[1] ? `+${match[1]} ` : '');
return [intlCode, '(', match[2], ') ', match[3], '-', match[4]].join('');
}
return null;
}
console.log( formatPhoneNumber('16464765278') )//+1 (646) 476-5278
console.log( formatPhoneNumber('+2549114765278')) //+254 (911) 476-5278
console.log( formatPhoneNumber('929876543210') )//+92 (987) 654-3210
Fulfils my requirement.
For US Phone Numbers
/^\(?(\d{3})\)?[- ]?(\d{3})[- ]?(\d{4})$/
Let’s divide this regular expression into smaller fragments to make it easy to understand.
/^\(?: Means that the phone number may begin with an optional (.
(\d{3}): After the optional ( there must be 3 numeric digits. If the phone number does not have a (, it must start with 3 digits. E.g. (308 or 308.
\)?: Means that the phone number can have an optional ) after first 3 digits.
[- ]?: Next the phone number can have an optional hyphen (-) or space after ) if present, or after the first 3 digits.
(\d{3}): Then there must be 3 more numeric digits. E.g (308)-135 or 308-135 or 308135
[- ]?: After the second set of 3 digits the phone number can have another optional hyphen (-) or space. E.g (308)-135- or 308-135- or 308135-
(\d{4})$/: Finally, the phone number must end with four digits. E.g (308)-135-7895 or 308-135-7895 or 308135-7895 or 3081357895.
Reference :
http://www.zparacha.com/phone_number_regex/
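Although the expression above is written for validation, its three capture groups can also drive the reformatting the question asks for. A minimal sketch, with the illustrative name reformatUsPhone; note that this pattern does not accept dots as separators, so 123.456.7890 from the question's sample data is rejected unless non-digits are stripped first, as in the other answers.

```javascript
const US_PHONE = /^\(?(\d{3})\)?[- ]?(\d{3})[- ]?(\d{4})$/;

// Returns "(123) 456-7890" style output, or null when the input does not match.
function reformatUsPhone(input) {
  const m = US_PHONE.exec(String(input).trim());
  return m ? `(${m[1]}) ${m[2]}-${m[3]}` : null;
}

console.log(reformatUsPhone('123 4567890'));    // (123) 456-7890
console.log(reformatUsPhone('(123)456-7890'));  // (123) 456-7890
console.log(reformatUsPhone('123.456.7890'));   // null
```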
You can use these functions to check for valid phone numbers and normalize them:
let formatPhone = (dirtyNumber) => {
return dirtyNumber.replace(/\D+/g, '').replace(/(\d{3})(\d{3})(\d{4})/, '($1) $2-$3');
}
let isPhone = (phone) => {
//normalize string and remove all unnecessary characters
phone = phone.replace(/\D+/g, '');
return phone.length === 10;
}
The solutions above are superior, especially if you are using JavaScript or dealing with numbers longer than 10 digits, such as those with an international code prefix or an extension. This solution is basic (I'm a beginner in the regex world), is designed with US phone numbers in mind, and is only useful for strings that contain exactly 10 digits, with or without formatting characters. As such, I would recommend it only for semi-automatic applications. I personally prefer to store numbers as just 10 digits without formatting characters, but I also want to be able to convert phone numbers at will to the standard format that people and apps/phones recognize instantly.
I came across this post while looking for something I could use with a text cleaner app that has PCRE regex capabilities (but no JavaScript functions). I am posting this for people who could use a simple, pure-regex solution that works in a variety of text editors, cleaners, expanders, or even some clipboard managers. I personally use Sublime and TextSoap. This solution was made for TextSoap, which lives in the menu bar and provides a drop-down menu where you can trigger text manipulation actions on the current selection or the clipboard contents.
My approach is essentially two substitution/search and replace regexes. Each substitution search and replace involves two regexes, one for search and one for replace.
Substitution/ Search & Replace #1
The first substitution strips the non-digit characters from a formatted 10-digit number, leaving a plain 10-digit string.
First Substitution/ Search Regex: \D
This search regex matches every character that is not a digit.
First Substitution/ Replace Regex: "" (nothing, not even a space)
Leave the substitute field completely blank, with no white space of any kind. All matched non-digit characters are then deleted. You should have gone in with 10 digits plus formatting characters prior to this operation and come out with 10 digits sans formatting characters.
Substitution/ Search & Replace #2
The search part of the second substitution captures three groups: the area code as $1, the second set of three digits as $2, and the last set of four digits as $3. The replace part inserts US phone number formatting between the captured groups of digits.
Second Substitution/ Search Regex: (\d{3})(\d{3})(\d{4})
Second Substitution/ Replace Regex: \($1\) $2\-$3
The backslash \ escapes the special characters (, ) , (<-whitespace), and - since we are inserting them between our captured numbers in capture groups $1, $2, & $3 for US phone number formatting purposes.
In TextSoap I created a custom cleaner that includes the two substitution operation actions, so in practice it feels identical to executing a script. I'm sure this solution could be improved but I expect complexity to go up quite a bit. An improved version of this solution is welcomed as a learning experience if anyone wants to add to this.
