JS Regex Add Spaces to string - javascript

I have a two-part string whose parts are always delimited by a space and a |, like this:
teststring | secondstring
Is it possible to add a predefined number of spaces between the parts using ONLY JavaScript's regex replace()?
I tried something like this:
([^\|]+)(\s){0,17}(?(R2)\s|\s)([\|a-zA-Z0-9]+)
And Substitution:
$1$2$3
Is it possible to repeat a capture group in the substitution, e.g. $2{17}, or to match the same space multiple times?
EDIT:
I have a function:
function InvokeRegexp(originalString, pattern, replaceExpr)
{
return originalString.replace(pattern, replaceExpr);
}
and I want to pass it the two-part text together with either a pattern or a replaceExpr that encodes the number of spaces, and get the result: firstpart | secondpart

A non-regex answer:
str.split("|").join(
" ".repeat(9 /*whitespaces*/) + "|"
)
Or with a regex, it's probably:
str.replace(/\|/," ".repeat(9)+"|")

You could use padStart and padEnd, since you said you want the parts to have a certain length.
const input = 'teststring | secondstring';
// Split the input variable and select spaces as well.
// 1. Select multiple spaces: \s+
// 2. Select pipe: \|
// 3. Select all following spaces: \s+
const parts = input.split( /\s+\|\s+/ );
// So every part should be at least 20 chars in this example.
const len = 20;
const output = `${ parts[ 0 ].padEnd( len ) }|${ parts[ 1 ].padStart( len )}`;
console.log( output );
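For the InvokeRegexp helper from the question, one possible approach is to put the spaces into the replacement expression. A replacement string has no repetition syntax ($2{17} would insert group 2 once, followed by the literal text {17}), so the spaces have to be generated beforehand. The sketch below assumes the parts are separated by a single | with optional whitespace around it; spacedPipe is a hypothetical helper, not part of the question.

```javascript
// Helper from the question.
function InvokeRegexp(originalString, pattern, replaceExpr) {
  return originalString.replace(pattern, replaceExpr);
}

// Hypothetical helper: builds a replacement expression that puts
// `count` spaces before the pipe and a single space after it.
function spacedPipe(count) {
  return "$1" + " ".repeat(count) + "| $2";
}

// Group 1 = first part, group 2 = second part. The whitespace around
// the pipe is matched but not captured, so it is dropped.
const pattern = /^(.*?)\s*\|\s*(.*)$/;

const padded = InvokeRegexp("teststring | secondstring", pattern, spacedPipe(5));
console.log(JSON.stringify(padded)); // "teststring     | secondstring"
```

Because the count lives only in the replacement expression, the same pattern can be reused for any spacing.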

Related

Regex match apostrophe inside, but not around words, inside a character set

I'm counting how many times different words appear in a text using Regular Expressions in JavaScript. My problem is when I have quoted words: 'word' should be counted simply as word (without the quotes, otherwise they'll behave as two different words), while it's should be counted as a whole word.
(?<=\w)(')(?=\w)
This regex can identify apostrophes inside, but not around words. Problem is, I can't use it inside a character set such as [\w]+.
(?<=\w)(')(?=\w)|[\w]+
This counts it's a 'miracle' of nature as 7 words instead of 5 (it, ' and s become 3 different words). Also, the third word should be selected simply as miracle, and not as 'miracle'.
To make things even more complicated, I need to capture diacritics too, so I'm using [A-Za-zÀ-ÖØ-öø-ÿ] instead of \w.
How can I accomplish that?
1) You can simply use the /[^\s]+/g regex:
const str = `it's a 'miracle' of nature`;
const result = str.match(/[^\s]+/g);
console.log(result.length);
console.log(result);
2) If you only need the total number of words in a string, you can also use split:
const str = `it's a 'miracle' of nature`;
const result = str.split(/\s+/);
console.log(result.length);
console.log(result);
3) If you want each word without a quote at the start and at the end, you can do this:
const str = `it's a 'miracle' of nature`;
const result = str.match(/[^\s]+/g).map((s) => {
s = s[0] === "'" ? s.slice(1) : s;
s = s[s.length - 1] === "'" ? s.slice(0, -1) : s;
return s;
});
console.log(result.length);
console.log(result);
You might use an alternation with 2 capture groups, and then check for the values of those groups.
(?<!\S)'(\S+)'(?!\S)|(\S+)
(?<!\S)' Negative lookbehind, assert a whitespace boundary to the left and match '
(\S+) Capture group 1, match 1+ non whitespace chars
'(?!\S) Match ' and assert a whitespace boundary to the right
| Or
(\S+) Capture group 2, match 1+ non whitespace chars
See a regex demo.
const regex = /(?<!\S)'(\S+)'(?!\S)|(\S+)/g;
const s = "it's a 'miracle' of nature";
Array.from(s.matchAll(regex), m => {
if (m[1]) console.log(m[1])
if (m[2]) console.log(m[2])
});
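Since the question also asks for diacritics, another option is to describe a word as a run of letters that may contain apostrophes only between letters. This is a sketch rather than a drop-in answer: the character class is the one given in the question, and countWords is a hypothetical helper name.

```javascript
// Letter class taken from the question (ASCII letters plus Latin-1 diacritics).
const LETTER = "[A-Za-zÀ-ÖØ-öø-ÿ]";

// A word: one or more letters, optionally followed by groups of
// "apostrophe + letters". Apostrophes at the edges are never matched.
const WORD = new RegExp(`${LETTER}+(?:'${LETTER}+)*`, "g");

// Hypothetical helper: counts how often each word occurs.
function countWords(text) {
  const counts = new Map();
  for (const word of text.match(WORD) || []) {
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  return counts;
}

const words = "it's a 'miracle' of nature".match(WORD);
console.log(words.length); // 5
console.log(words);        // [ "it's", 'a', 'miracle', 'of', 'nature' ]
console.log(countWords("'word' word it's").get("word")); // 2
```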

Regex to add string to beginning of every word based on condition

I have a string which looks like this
someString = "#3Hello there! How many #4candies did you sell today? Do have any #4candies left?"
lookupDict = {"Hello there": "#3", "candies": "#4"}
Now I want to prefix with #0 every term in someString that is not in the dictionary lookupDict. I can't split by a space " " since that would make terms like Hello there appear as two different words, Hello and there, which would never match my condition.
I know how to apply a basic regex that adds #0 in front of every word, for example:
let regex = /(\b\w+\b)/g;
someString = someString.replace(regex, '#0$1');
But that would blindly add #0 to every term without looking anything up in the dictionary lookupDict.
Is there any way I can combine the regex with a lookup in the dictionary and assign the #0 accordingly? Basically, the end result would be something like
someString = "#3Hello there! #0How #0many #4candies #0did #0you #0sell #0today? #0Do #0have #0any #4candies #0left?"
Note: Spaces can be considered as word boundaries here.
You may use the following logic:
Build an array of substrings you need to skip that are concatenated values and keys of the associative array
Sort the items by length in the descending order since the word boundaries might not work well with phrases containing whitespace
Compile a regex pattern that will consist of two alternatives: the first will match the array items (escaped for use in a regex pattern) enclosed with a capturing group, and the other will match the rest of the "words"
When a match is found, check if Group 1 matched. If group 1 matches, just return the match value, else, add #0 to the match value.
Here is the implementation:
let someString = "#3Hello there! How many #4candies did you sell today? Do have any #4candies left? #0how #0much";
const lookupDict = {"Hello there": "#3", "candies": "#4", "how": "#0", "much": "#0"};
let patternDict = []; // Substrings to skip
for (var key in lookupDict) {
patternDict.push( `${lookupDict[key]}${key}` ); // Values + keys
}
patternDict.sort(function(a, b){ // Sorting by length, descending
return b.length - a.length;
});
var rx = new RegExp("(?:^|\\W)(" + patternDict.map(function(m) { // Building the final pattern
return m.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');}
).join("|") + ")(?!\\w)|\\w\\S*", "gi");
// rx = /(?:^|\W)(#3Hello there|#4candies|#0much|#0how)(?!\w)|\w\S*/gi
someString = someString.replace(rx, (x, y) => y ? x : `#0${x}` );
console.log(someString);
// => #3Hello there! #0How #0many #4candies #0did #0you #0sell #0today? #0Do #0have #0any #4candies #0left? #0how #0much
The regex will look like
/(?:^|\W)(#3Hello there|#4candies|#0much|#0how)(?!\w)|\w\S*/gi
See the regex demo (PHP option chosen to highlight groups green).
Details
(?:^|\W) - a non-capturing group matching either the start of the string (^) or (|) any non-word char (= a char other than an ASCII letter, digit or _)
(#3Hello there|#4candies|#0much|#0how) - Capturing group 1 matching any of the lookupDict concatenated values + keys
(?!\w) - a negative lookahead that fails the match if, immediately to the right of the current location, there is a word char
| - or
\w\S* - a word char followed by 0+ non-whitespace chars. Requiring a leading word char keeps standalone punctuation, such as the ! right after #3Hello there, from being prefixed with #0.
This way, there is no need to worry about the length of the lookupDict keys or anything else:
let someString =
"#3Hello there! How many #4candies did you sell today? #3Hello there! Do have any #4candies left?#3Hello there! #7John Doe! some other text with having #7John Doe person again";
const lookupDict = { "Hello there": "#3", candies: "#4", "John Doe": "#7" };
Object.keys(lookupDict).map((key, i) => {
const regex = new RegExp(key, "g");
someString = someString.replace(regex, lookupDict[key]); // replace each key with its value: Hello there => #3
});
someString = someString.replace(/ /gi, " #0"); // replace each space
Object.keys(lookupDict).map((key, i) => {
const regex = new RegExp(lookupDict[key] + lookupDict[key], "g");
someString = someString.replace(regex, `${lookupDict[key]}${key}`); // roll the doubled value back to value + key
});
someString = someString.replace(/#0#/gi, "#"); // replace #0 for each lookupDict key value
console.log(someString, '<TheResult/>');
You can pass a function to .replace as the second parameter and check the matched token against the dictionary.
I've changed the regex so that it does not include # in the results.
Hello there is problematic: how long can a single term be? At most 2 words?
someString = "#3Hello there! How many #4candies did you sell today? Do have any #4candies left?"
let regex = /(?<!#)(\b\w+\b)/g;
someString = someString.replace(regex, x => {
// check x in dict
return `#0${x}`
});
console.log(someString)
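The callback idea can be completed by letting the pattern itself recognise the already tagged dictionary terms, so that multi-word keys such as Hello there are handled too. This is a minimal sketch, assuming every dictionary term appears in the text already prefixed with a # tag; escapeRegExp and tagUnknownWords are hypothetical helper names.

```javascript
const lookupDict = { "Hello there": "#3", "candies": "#4" };

// Escape regex metacharacters so a key is matched literally.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Hypothetical helper: prefixes every word that is not a tagged
// dictionary term with #0.
function tagUnknownWords(text, dict) {
  // Longest keys first, so multi-word terms win over shorter ones.
  const keys = Object.keys(dict)
    .sort((a, b) => b.length - a.length)
    .map(escapeRegExp);
  // Alternative 1: "#<digits>" followed by a dictionary key (left as is).
  // Alternative 2: any other word (gets the #0 prefix).
  const rx = new RegExp(`#\\d+(?:${keys.join("|")})\\b|\\b\\w+\\b`, "g");
  return text.replace(rx, (m) => (m.startsWith("#") ? m : `#0${m}`));
}

const tagged = tagUnknownWords(
  "#3Hello there! How many #4candies did you sell today? Do have any #4candies left?",
  lookupDict
);
console.log(tagged);
```

The callback only needs to look at the first character of the match, because the first alternative is the only one that can start with #.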

Masking phone number with regex in javascript

My application has a specific phone number format that looks like 999.111.222, and I have a regex pattern to mask it on the front end:
/[0-9]{3}\.[0-9]{3}\.([0-9]{3})/
But recently, the format was changed to allow the middle three digits to have one less digit, so now both 999.11.222 and 999.111.222 match. How can I change my regex accordingly?
"999.111.222".replace(/[0-9]{3}\.[0-9]{3}\.([0-9]{3})/, '<div>xxx.xxx.$1</div>')
expected output:
"999.111.222" // xxx.xxx.222
"999.11.222" // xxx.xx.222
Replace {3} with {2,3} to match two or three digits.
/[0-9]{3}\.[0-9]{2,3}\.([0-9]{3})/
For reference see e.g. MDN
Use
console.log(
  "999.11.222".replace(/[0-9]{3}\.([0-9]{2,3})\.([0-9]{3})/, function ($0, $1, $2) {
    return '<div>xxx.' + $1.replace(/\d/g, 'x') + '.' + $2 + '</div>';
  })
)
The first capturing group, ([0-9]{2,3}), matches 2 or 3 digits, and in the callback used as the replacement argument, all the digits from the first group are replaced with x.
You may further customize the pattern for the first set of digits, too.
In fact, you should change not only your regex but also your callback replace function:
const regex = /[0-9]{3}\.([0-9]{2,3})\.([0-9]{3})/;
const cbFn = (all, g1, g2) =>`<div>xxx.xx${(g1.length === 3 ? 'x' : '')}.${g2}</div>`;
const a = "999.11.222".replace(regex, cbFn);
const b = "999.111.222".replace(regex, cbFn);
console.log(a, b);
To change the regex, you could add a term with a {2,3} quantifier, as already suggested, and wrap it in a new group. Then, in the replace callback, you can use the group's length to decide whether to add another x.
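Both answers can be generalised by deriving every mask from the length of the matched group. This is a small sketch, assuming only the last group should stay visible; maskPhone is a hypothetical helper name, and the <div> wrapper from the question is left out for brevity.

```javascript
// Hypothetical helper: masks the first two digit groups, keeps the last one.
// Input that does not match the expected format is returned unchanged.
function maskPhone(phone) {
  return phone.replace(
    /^(\d{3})\.(\d{2,3})\.(\d{3})$/,
    (match, g1, g2, g3) => `${"x".repeat(g1.length)}.${"x".repeat(g2.length)}.${g3}`
  );
}

console.log(maskPhone("999.111.222")); // xxx.xxx.222
console.log(maskPhone("999.11.222"));  // xxx.xx.222
console.log(maskPhone("999.1.222"));   // 999.1.222 (no match, unchanged)
```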

regex to extract numbers starting from second symbol

Sorry for adding one more to the tons of regexp questions, but I can't find anything similar to my needs. I want to output a string that may contain a digit or the letter 'A' as the first symbol and only digits in the other positions. The input is any string, for example:
---INPUT--- -OUTPUT-
A123asdf456 -> A123456
0qw#$56-398 -> 056398
B12376B6f90 -> 12376690
12A12345BCt -> 1212345
What I tried is replace(/[^A\d]/g, '') (I use JS), which almost does the job except the case when there's A in the middle of the string. I tried to use ^ anchor but then the pattern doesn't match other numbers in the string. Not sure what is easier - extract matching characters or remove unmatching.
I think you can do it like this using a negative lookahead and then replace with an empty string.
In a non-capturing group (?:, use a negative lookahead (?! to assert that what follows is neither an A at the beginning of the string (^A) nor a digit (\d). If that assertion holds, match any character with the dot.
(?:(?!^A|\d).)+
var pattern = /(?:(?!^A|\d).)+/g;
var strings = [
"A123asdf456",
"0qw#$56-398",
"B12376B6f90",
"12A12345BCt"
];
for (var i = 0; i < strings.length; i++) {
console.log(strings[i] + " ==> " + strings[i].replace(pattern, ""));
}
You can match and capture desired and undesired characters within two different sides of an alternation, then replace those undesired with nothing:
^(A)|\D
JS code:
var inputStrings = [
"A-123asdf456",
"A123asdf456",
"0qw#$56-398",
"B12376B6f90",
"12A12345BCt"
];
console.log(
inputStrings.map(v => v.replace(/^(A)|\D/g, "$1"))
);
You can use the following regex: /(^A|\d)/g. It matches an A at the start of the string or any digit, and joining the matches gives the result.
var arr = ['A123asdf456','0qw#$56-398','B12376B6f90','12A12345BCt', 'A-123asdf456'],
result = arr.map(s => s.match(/(^A|\d)/g).join(''));
console.log(result);
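If a single pattern feels too clever, the rule can also be split into two steps: decide about the first character, then strip every non-digit from the rest. This is a minimal sketch; normalize is a hypothetical helper name.

```javascript
// Hypothetical helper: keeps a leading "A" or digit, then digits only.
function normalize(input) {
  // Step 1: the first character survives only if it is "A" or a digit.
  const first = /^[A\d]/.test(input) ? input[0] : "";
  // Step 2: everything after the first character is reduced to its digits.
  const rest = input.slice(1).replace(/\D/g, "");
  return first + rest;
}

const samples = ["A123asdf456", "0qw#$56-398", "B12376B6f90", "12A12345BCt"];
const normalized = samples.map(normalize);
console.log(normalized);
// [ 'A123456', '056398', '12376690', '1212345' ]
```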

Regex match cookie value and remove hyphens

I'm trying to extract out a group of words from a larger string/cookie that are separated by hyphens. I would like to replace the hyphens with a space and set to a variable. Javascript or jQuery.
As an example, the larger string has a name and value like this within it:
facility=34222%7CConner-Department-Store;
(notice the leading "C")
So first, I need to match()/find facility=34222%7CConner-Department-Store; with regex. Then break it down to "Conner Department Store"
var cookie = document.cookie;
var facilityValue = cookie.match( REGEX ); ??
var test = "store=874635%7Csomethingelse;facility=34222%7CConner-Department-Store;store=874635%7Csomethingelse;";
var test2 = test.replace(/^(.*)facility=([^;]+)(.*)$/, function(matchedString, match1, match2, match3){
return decodeURIComponent(match2);
});
console.log( test2 );
console.log( test2.split('|')[1].replace(/[-]/g, ' ') );
If I understood it correctly, you want to make a phrase by getting all the words between hyphens and disallowing two successive Uppercase letters in a word, so I'd prefer using Regex in that case.
This is a regex solution that works with any cookie in the same format and extracts the wanted phrase from it:
var matches = str.match(/([A-Z][a-z]+)-?/g);
console.log(matches.map(function(m) {
return m.replace('-', '');
}).join(" "));
Demo:
var str = "facility=34222%7CConner-Department-Store;";
var matches = str.match(/([A-Z][a-z]+)-?/g);
console.log(matches.map(function(m) {
return m.replace('-', '');
}).join(" "));
Explanation:
Use the regex /([A-Z][a-z]+)-?/g to match the words between the hyphens.
Remove the - from each matched word.
Then join the array of matches with a space.
Ok,
first, you should decode this string as follows:
var str = "facility=34222%7CConner-Department-Store;"
var decoded = decodeURIComponent(str);
// decoded = "facility=34222|Conner-Department-Store;"
Then you have multiple possibilities to split up this string.
The easiest way is to use substring()
var solution1 = decoded.substring(decoded.indexOf('|') + 1, decoded.length)
// solution1 = "Conner-Department-Store;"
solution1 = solution1.replace(/-/g, ' '); // a string pattern would only replace the first hyphen
// solution1 = "Conner Department Store;"
As you can see, substring(arg1, arg2) returns the part of the string that starts at index arg1 and ends just before index arg2. See the MDN documentation for String.prototype.substring for details.
If you want to cut the last ; just set decoded.length - 1 as arg2 in the snippet above.
decoded.substring(decoded.indexOf('|') + 1, decoded.length - 1)
//returns "Conner-Department-Store"
or all above in just one line:
decoded.substring(decoded.indexOf('|') + 1, decoded.length - 1).replace(/-/g, ' ')
If you want still to use a regular Expression to retrieve (perhaps more) data out of the string, you could use something similar to this snippet:
var solution2 = "";
var regEx= /([A-Za-z]*)=([0-9]*)\|(\S[^:\/?#\[\]\#\;\,']*)/;
if (regEx.test(decoded)) {
solution2 = decoded.match(regEx);
/* returns
[0:"facility=34222|Conner-Department-Store",
1:"facility",
2:"34222",
3:"Conner-Department-Store",
index:0,
input:"facility=34222|Conner-Department-Store;"
length:4] */
solution2 = solution2[3].replace(/-/g, ' ');
// "Conner Department Store"
}
I have applied some rules for the regex to work; feel free to modify them according to your needs.
facility can be any Word built with alphabetical characters lower and uppercase (no other chars) at any length
= needs to be the char =
34222 can be any number but no other characters
| needs to be the char |
Conner-Department-Store can be any characters except one of the following (reserved delimiters): :/?#[]#;,'
Hope this helps :)
edit: to find only the part
facility=34222%7CConner-Department-Store; just modify the regex to
match facility= instead of ([A-Za-z]*)=:
/(facility)=([0-9]*)\|(\S[^:\/?#\[\]\#\;\,']*)/
You can use cookies.js, a mini framework from MDN (Mozilla Developer Network).
Simply include the cookies.js file in your application and pass the cookie's name (not its value) to getItem:
docCookies.getItem("facility");
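Putting the steps from the answers above together (find the cookie, decode it, take the part after the |, replace every hyphen), a self-contained sketch could look like this. It takes the cookie string as a parameter so that it can run outside a browser; in a page you would pass document.cookie. getFacilityName is a hypothetical helper name, and the value format id|Name-With-Hyphens is assumed from the question.

```javascript
// Hypothetical helper: extracts the readable facility name from a cookie string.
// Returns null when the cookie or the "|" separated name part is missing.
function getFacilityName(cookieString) {
  // The cookie must start the string or follow a ";" (with optional spaces).
  const match = cookieString.match(/(?:^|;\s*)facility=([^;]*)/);
  if (!match) return null;
  // decodeURIComponent turns %7C into "|"; it throws on malformed escapes.
  const decoded = decodeURIComponent(match[1]);
  const name = decoded.split("|")[1];
  // The g flag is needed so that every hyphen is replaced, not just the first.
  return name === undefined ? null : name.replace(/-/g, " ");
}

const sampleCookie =
  "store=874635%7Csomethingelse;facility=34222%7CConner-Department-Store;store=874635%7Csomethingelse;";
console.log(getFacilityName(sampleCookie)); // Conner Department Store
console.log(getFacilityName("store=1"));    // null
```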
