RegExp replacement with variable and anchor

I've done some research, like 'How do you use a variable in a regular expression?' but no luck.
Here is the given input
const input = 'abc $apple 123 apple $banana'
Each variable has its own value.
Currently I'm able to query all variables from the given string using
const variables = input.match(/([$]\w+)/g)
Replacing them by looping over the variables array with the following code does not work:
const r = `/(\$${variable})/`;
const target = new RegExp(r, 'gi');
input.replace(target, value);
However, with the name hard-coded instead of taken from the variable, it works:
const target = new RegExp(/(\$apple)/, 'gi');
input.replace(target, value);
I also changed the variable marker from $ to % or #, and it works with the following code:
// const target = new RegExp(`%${variable}`, 'gi');
// const target = new RegExp(`#${variable}`, 'gi');
input.replace(target, value);
How to match the $ symbol with variable in this case?

If I understand correctly, you should use (\\$${variable}).
With only one backslash, the template literal consumes the escape (\$ becomes $), so the resulting RegExp is /($apple)/gi, and an unescaped $ in a regex means end of input.
So the solution is to add another backslash. Also note that the string passed to new RegExp must not include the surrounding / delimiters; they would be matched literally.
See the demo below:
const input = 'abc $apple 123 apple $banana'
let variable = 'apple'
let value = '#test#'
const r = `(\\$${variable})`;
const target = new RegExp(r, 'gi');
console.log('template literals', r, `(\$${variable})`)
console.log('regex:', target, new RegExp(`(\$${variable})`, 'gi'))
console.log(input.replace(target, value))
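A related pitfall is that the variable name itself may contain regex metacharacters. As a minimal sketch (the helper names escapeRegExp and replaceVariable are illustrative and not part of the question's code), the name can be escaped before the pattern is built, and a replacer function keeps any $ in the replacement value from being treated specially:

```javascript
// Hypothetical helper: escape characters that are special in a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: replace every "$name" in input with value.
function replaceVariable(input, name, value) {
  // '\\$' in a normal string yields the two characters \$ for the regex.
  const target = new RegExp('\\$' + escapeRegExp(name) + '\\b', 'gi');
  // A function replacer inserts value verbatim, even if it contains "$&".
  return input.replace(target, () => value);
}

const input = 'abc $apple 123 apple $banana';
console.log(replaceVariable(input, 'apple', '#test#'));
// abc #test# 123 apple $banana
```

The plain word apple is left alone because the pattern requires the leading $.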

I have been using regular expressions just about every day for almost a year now. I'll post my thoughts and just say:
Regular expressions are most useful for finding parts of a text or data file.
If there is a text file that contains the word "apple" or some derivative of it, regular expressions can be a great tool. I use them every day for parsing HTML content, as I write foreign-news translations (based in HTML).
I believe the code that was posted is JavaScript (I saw the replace() call and the "gi" flags, which I know are used in that language). I use Java's java.util.regex package myself, and the syntax is just slightly different.
If all you want to do is put a placeholder inside a String, this code could work, I guess, but again, understanding why regular expressions are necessary seems like the real question. In this example I have used the ampersand ('&') as the placeholder marker, since I know for a fact it is not one of the reserved symbols used by (most, but not necessarily all) regular-expression compilers.
var s1 = "&VAR1"; // Unused variable, left here for show!
var myString = "An example text-string with &VAR1, a particular kind of fruit.";
myString = myString.replace(/&VAR1/gi, "apple"); // replace() returns a new string; the original is not modified
If you want a great way to practice with Regular-Expressions, go to this web-site and play around with them:
https://regexr.com/
Here are the rules for "reserved key symbols" of RegEx Patterns (Copied from that Site):
The following characters have special meaning, and should be preceded by a \ (backslash) to represent a literal character:
+*?^$.[]{}()|/
Within a character set, only \, -, and ] need to be escaped.
Also, and perhaps most importantly: regular expressions are "compiled" in Java. I'm not exactly sure how they work in JavaScript, but there is no such concept as a "variable" in the compiled part of a regular expression, only in the text and data it analyzes. That means that if you want to change what you are searching for in a particular piece of text or data in Java, you must re-compile your expression using:
Pattern p = Pattern.compile(regExString, flags);
There is no easy way to "dynamically change" particular values of text in the expression. The amount of complexity it would add would be astronomical, and the value minimal. Instead, just compile another expression in your code and search again. Another option is to better understand things like .* .+ .*? .+? and even (.*) (.+) (.*?) (.+?) so that things that change, do change, and things that don't change, won't!
For instance if you used this pattern to match different-variables:
input.replace(/&VAR.*VAREND/gi, "apple");
All of your variables could be identified by the re-used pattern: "&VAR-stuff-VAREND" but this is just one of millions of ways to change your idea - skin the cat.
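One caveat with the "&VAR-stuff-VAREND" idea is the difference between .* and .*? mentioned above. The following is only a sketch with a made-up input string: the greedy form swallows everything between the first &VAR and the last VAREND, while the lazy form stops at the nearest VAREND.

```javascript
// Made-up sample text containing two placeholders.
const text = 'I like &VAR1VAREND and &VAR2VAREND.';

// Greedy: .* runs to the last VAREND, so both placeholders become one match.
const greedy = text.replace(/&VAR.*VAREND/gi, 'apple');

// Lazy: .*? stops at the first VAREND, so each placeholder is replaced separately.
const lazy = text.replace(/&VAR.*?VAREND/gi, 'apple');

console.log(greedy); // I like apple.
console.log(lazy);   // I like apple and apple.
```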

Using a replace callback function you can avoid building a separate regex for each variable and replace them all at once:
const input = 'abc $apple 123 apple $banana'
const vars = {
apple: 'Apfel',
banana: 'Banane'
}
let result = input.replace(/\$(\w+)/g, (_, v) => vars[v])
console.log(result)
This won't work if apple and banana are local variables rather than properties of an object, but looking values up by name among locals is a bad idea anyway.
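One thing to watch with the callback approach is a placeholder that has no entry in the lookup object: vars[v] is then undefined and the text "undefined" ends up in the result. A possible variation, sketched here with an illustrative function name (fillIn) and an extra made-up placeholder ($cherry), leaves unknown placeholders untouched:

```javascript
const vars = {
  apple: 'Apfel',
  banana: 'Banane'
};

// Hypothetical helper: replace known "$name" placeholders, keep unknown ones.
function fillIn(input, table) {
  return input.replace(/\$(\w+)/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(table, name) ? table[name] : match
  );
}

console.log(fillIn('abc $apple 123 apple $banana $cherry', vars));
// abc Apfel 123 apple Banane $cherry
```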

Related

How to include a variable and exclude numbers[0-9] and letters[a-zA-Z] in RegExp?

I have a code that generates a random letter based on the word and I have tried to create a RegExp code to turn all the letters from the word to '_' except the randomly generated letter from the word.
const word = "Apple is tasty"
const randomCharacter = word[Math.floor(Math.random() * word.length)]
regex = new RegExp(/[^${randomCharacter}&\/\\#,+()$~%.'":;*?<>{}\s]/gi)
hint = word.replace(regex,'_')
I want to change all the letters to '_' except the randomly generated letter. The above code for some reason does not work and shows the result: A___e __ ta_t_ and I'm not able to figure out what to do.
The final result I want is something like this: A____ __ _a___
Is there a way with regex to change all the alphabets and numbers '/[^a-zA-Z0-9]/g' to '_' except the randomly generated letter?
I'm listing all the expressions I want to include on my above code because I'm not able to figure out a way to do include and exclude at the same time using the variable with regex.

You can't do string interpolation inside of a RegExp literal (/.../). Meaning your placeholder ${randomCharacter} will not evaluate to its value in the template, but is instead interpreted literally as the string "${randomCharacter}".
If you want to use template literals, initialize your regex variable with a RegExp constructor instead, like:
const regex = new RegExp(`[^${randomCharacter}&\\/\\\#,+()$~%.'":;*?<>{}\\s]`, "gi");
See the MDN RegExp documentation for an explanation on the differences between the literal notation and constructor function, most notably:
The constructor of the regular expression object [...] results in runtime compilation of the regular expression. Use the constructor function when [...] you don't know the pattern and obtain it from another source, such as user input.

/(?:[^A\s])/
test it on regex101
Just replace A in [^A\s] with the character that you want to omit from the replacement.
demo:
const word = "Apple is tasty";
const randomCharacter = 'a';//word[Math.floor(Math.random() * word.length)];
const regex = new RegExp('(?:[^' + randomCharacter + '\\s])', 'gi');
const hint = word.replaceAll(regex, '_');
console.log(hint)
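If the kept character can be anything from the word, it may be worth escaping it before placing it inside the character class, since characters such as ], ^, - and \ are special there. The sketch below assumes that requirement; the function names makeHint and escapeForCharClass are illustrative only.

```javascript
// Hypothetical helper: escape the characters that are special inside [...].
function escapeForCharClass(ch) {
  return ch.replace(/[\\\]^-]/g, '\\$&');
}

// Hypothetical helper: replace everything except keep (any case) and whitespace.
function makeHint(word, keep) {
  const regex = new RegExp(`[^${escapeForCharClass(keep)}\\s]`, 'gi');
  return word.replace(regex, '_');
}

console.log(makeHint('Apple is tasty', 'a')); // A____ __ _a___
```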

Extracting a complicated part of the string with plain Javascript

I have a following string:
Text
From this string I want to extract, using JavaScript, either 'pl' or 'pl_company_com'.
There are a few variables:
jan_kowalski is a first name and surname; it can change, and sometimes even has 3 elements
the country code (in this example 'pl') will change to others such as en / de / fr (this is the part of the string I want to get)
the rest of the string remains the same in every case (the beginning + everything after it, starting with _company_com ...)
PS: I tried to do it with split, but my knowledge of JS is very basic and I can't get what I want. Please help.

An alternative to Randy Casburn's solution using regex
let out = new URL('https://my.domain.com/personal/jan_kowalski_pl_company_com/Documents/Forms/All.aspx').href.match('.*_(.*_company_com)')[1];
console.log(out);
Or if you want to just get that string with those country codes you specified
let out = new URL('https://my.domain.com/personal/jan_kowalski_pl_company_com/Documents/Forms/All.aspx').href.match('.*_((en|de|fr|pl)_company_com)')[1];
console.log(out);
A proof of concept that this solution also works for other combinations
let urls = [
new URL('https://my.domain.com/personal/jan_kowalski_pl_company_com/Documents/Forms/All.aspx'),
new URL('https://my.domain.com/personal/firstname_middlename_lastname_pl_company_com/Documents/Forms/All.aspx')
]
urls.forEach(url => console.log(url.href.match('.*_(en|de|fr|pl).*')[1]))

I have been very successful before in solving this kind of problem with regular expressions:
var string = 'Text';
var regExp = /([\w]{2})_company_com/;
var find = string.match(regExp);
console.log(find); // array with found matches
console.log(find[1]); // first group of regexp = country code
First you have your given string. Second you have a regular expression, which is delimited by a slash at the beginning and at the end. Regular expressions are mostly used for string searches (you can even replace complicated text with them in all major editors, which can be VERY useful).
In this case it matches exactly two word characters, [\w]{2}, followed directly by _company_com (\w denotes a word character, the [] lists the wanted character types, here only word characters, and the {} gives the number of characters to be found). To find the wanted part, string.match(regExp) has to be called. It returns an array with the whole matched string followed by all capture groups within the regExp (which are denoted by ()), or null if nothing matches. So in this case you get the country code with find[1], which is the first and only capture group of the regular expression.
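Because match returns null when the pattern is not found, reading find[1] directly can throw. A guarded version might look like the sketch below; the function name countryCode is illustrative, the two-letter assumption comes from the question, and the sample URLs are the ones used in the other answer.

```javascript
// Hypothetical helper: return the two-letter code before "_company_com", or null.
function countryCode(href) {
  const found = href.match(/_([a-z]{2})_company_com/i);
  return found ? found[1] : null;
}

const samples = [
  'https://my.domain.com/personal/jan_kowalski_pl_company_com/Documents/Forms/All.aspx',
  'https://my.domain.com/personal/firstname_middlename_lastname_pl_company_com/Documents/Forms/All.aspx',
  'Text'
];

samples.forEach(s => console.log(countryCode(s)));
// pl
// pl
// null
```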

What are the actual uses of ES6 Raw String Access?

What are the actual uses of String.raw Raw String Access introduced in ECMAScript 6?
// String.raw(callSite, ...substitutions)
function quux (strings, ...values) {
strings[0] === "foo\n"
strings[1] === "bar"
strings.raw[0] === "foo\\n"
strings.raw[1] === "bar"
values[0] === 42
}
quux `foo\n${ 42 }bar`
String.raw `foo\n${ 42 }bar` === "foo\\n42bar"
I went through the below docs.
http://es6-features.org/#RawStringAccess
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/template_strings
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/raw
http://www.2ality.com/2015/01/es6-strings.html
https://msdn.microsoft.com/en-us/library/dn889830(v=vs.94).aspx
The only thing I understand is that it is used to get the raw string form of template strings, and that it is useful for debugging template strings.
When can this be used in real-world development? They call this a tag function. What does that mean?
What concrete use cases am I missing?

The best, and very nearly only, use case for String.raw I can think of is if you're trying to use something like Steven Levithan's XRegExp library that accepts text with significant backslashes. Using String.raw lets you write something semantically clear rather than having to think in terms of doubling your backslashes, just like you can in a regular expression literal in JavaScript itself.
For instance, suppose I'm doing maintenance on a site and I find this:
var isSingleUnicodeWord = /^\w+$/;
...which is meant to check if a string contains only "letters." Two problems: A) There are thousands of "word" characters across the realm of human language that \w doesn't recognize, because its definition is English-centric; and B) It includes _, which many (including the Unicode consortium) would argue is not a "letter."
So if we're using XRegExp on the site, since I know it supports \pL (\p for Unicode categories, and L for "letter"), I might quickly swap this in:
var isSingleUnicodeWord = XRegExp("^\pL+$"); // WRONG
Then I wonder why it didn't work, facepalm, and go back and escape that backslash, since it's being consumed by the string literal.
Easy enough in that simple regex, but in something complicated, remembering to double all those backslashes is a maintenance pain. (Just ask Java programmers trying to use Pattern.)
Enter String.raw:
let isSingleUnicodeWord = XRegExp(String.raw`^\pL+$`);
Example:
let isSingleUnicodeWord = XRegExp(String.raw`^\pL+$`); // L: Letter
console.log(isSingleUnicodeWord.test("Русский")); // true
console.log(isSingleUnicodeWord.test("日本語")); // true
console.log(isSingleUnicodeWord.test("العربية")); // true
console.log(isSingleUnicodeWord.test("foo bar")); // false
<script src="https://cdnjs.cloudflare.com/ajax/libs/xregexp/3.1.1/xregexp-all.min.js"></script>
Now I just kick back and write what I mean. I don't even really have to worry about ${...} constructs used in template literals to do substitution, because the odds of my wanting to apply a quantifier {...} to the end-of-line assertion ($) are...low. So I can happily use substitutions and still not worry about backslashes. Lovely.
Having said that, though, if I were doing it a lot, I'd probably want to write a function and use a tagged template instead of String.raw itself. But it's surprisingly awkward to do correctly:
// My one-time tag function
function xrex(strings, ...values) {
let raw = strings.raw;
let max = Math.max(raw.length, values.length);
let result = "";
for (let i = 0; i < max; ++i) {
if (i < raw.length) {
result += raw[i];
}
if (i < values.length) {
result += values[i];
}
}
console.log("Creating with:", result);
return XRegExp(result);
}
// Using it, with a couple of substitutions to prove to myself they work
let category = "L"; // L: Letter
let maybeEol = "$";
let isSingleUnicodeWord = xrex`^\p${category}+${maybeEol}`;
console.log(isSingleUnicodeWord.test("Русский")); // true
console.log(isSingleUnicodeWord.test("日本語")); // true
console.log(isSingleUnicodeWord.test("العربية")); // true
console.log(isSingleUnicodeWord.test("foo bar")); // false
<script src="https://cdnjs.cloudflare.com/ajax/libs/xregexp/3.1.1/xregexp-all.min.js"></script>
Maybe the hassle is worth it if you're using it in lots of places, but for a couple of quick ones, String.raw is the simpler option.
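The same backslash convenience applies to the built-in RegExp constructor. As a sketch that assumes an engine with Unicode property escapes (ES2018 and later) and does not use XRegExp at all, the pattern can be written with \p{L} and the u flag:

```javascript
// Built-in RegExp, no library: \p{L} needs the "u" flag.
// String.raw keeps the backslash, so no doubling is needed.
const isSingleUnicodeWord = new RegExp(String.raw`^\p{L}+$`, 'u');

console.log(isSingleUnicodeWord.test('Русский')); // true
console.log(isSingleUnicodeWord.test('日本語'));    // true
console.log(isSingleUnicodeWord.test('foo bar')); // false
```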

First, a few things:
Template strings is the old name for template literals.
A tag is a function.
String.raw is a method.
String.raw `foo\n${ 42 }bar` is a tagged template literal.
Template literals are basically fancy strings.
Template literals can interpolate.
Template literals can be multi-line without using \.
String.raw keeps the escape character \ as-is instead of interpreting it.
Try putting a string that contains the newline escape \n through a function that interprets newline characters.
console.log("This\nis\nawesome"); // "This\nis\nawesome"
console.log(String.raw`This\nis\nawesome`); // "This\\nis\\nawesome"
If you are wondering, console.log is not one of them. But alert is. Try running these through http://learnharmony.org/ .
alert("This\nis\nawesome");
alert(String.raw`This\nis\nawesome`);
But wait, that's not the use of String.raw.
Possible uses of String.raw method:
To show string without interpretation of backslashed characters (\n, \t) etc.
To show code for the output. (As in example below)
To be used in regex without escaping \.
To print Windows directory/sub-directory locations without using \\ too much. (They use \ remember. Also, lol)
Here we can show output and code for it in single alert window:
alert("I printed This\nis\nawesome with " + String.raw`This\nis\nawesome`);
Though, it would have been great if its main use could have been to get back the original string. Like:
var original = String.raw`This is awesome.`;
where original would have become: This\tis \tawesome.. This isn't the case sadly.
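To make the difference concrete without relying on how a particular console or alert renders the text, the two strings can be compared by length and by splitting on real newlines. This is only an illustrative sketch; the variable names are made up.

```javascript
const cooked = "This\nis\nawesome";       // \n is interpreted as a newline
const raw = String.raw`This\nis\nawesome`; // \n stays as backslash + n

console.log(cooked.length);             // 15
console.log(raw.length);                // 17
console.log(cooked.split('\n').length); // 3
console.log(raw.split('\n').length);    // 1
console.log(raw === "This\\nis\\nawesome"); // true
```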
References:
http://exploringjs.com/es6/ch_template-literals.html
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/raw

Template strings can be useful in many situations, which I will explain below. With that in mind, String.raw prevents escapes from being interpreted. This is useful in any template string that should contain the escape character without escaping it. A simple example could be the following:
var templateWithBackslash = String.raw `someRegExp displayed in template /^\//`
There are a few things inside that are nice to note with template strings.
They can contain unescaped line breaks without problems.
They can contain "${}". Inside these curly braces the javascript is interpreted instead.
(Note: running these will output the result to your console [in browser dev tools])
Example using line breaks:
var myTemplate = `
<div class="myClass">
<pre>
My formatted text
with multiple lines
{
asdf: "and some pretty printed json"
}
</pre>
</div>
`
console.log(myTemplate)
If you wanted to do the above with a normal string in Javascript it would look like the following:
var myTemplate = "\n\
<div class=\"myClass\">\n\
<pre>\n\
My formatted text\n\
with multiple lines\n\
{\n\
asdf: \"and some pretty printed json\"\n\
}\n\
</pre>\n\
</div>\n"
console.log(myTemplate)
You will notice the first probably looks much nicer (no need to escape line breaks).
For the second I will use the same template string but also insert some pretty-printed JSON.
var jsonObj = {asdf: "and some pretty printed json", deeper: {someDeep: "Some Deep Var"}}
var myTemplate = `
<div class="myClass">
<pre>
My formatted text
with multiple lines
${JSON.stringify(jsonObj, null, 2)}
</pre>
</div>
`
console.log(myTemplate)

In NodeJS it is extremely handy when it comes to filepath handling:
var fs=require('fs');
var s = String.raw`C:\Users\<username>\AppData\Roaming\SomeApp\someObject.json`;
var username = "bob"
s=s.replace("<username>",username)
fs.readFile(s,function(err,result){
if (err) throw err;
console.log(JSON.parse(result))
})
It improves readability of filepaths on Windows. \ is also a fairly common separator, so I can definitely see why it would be useful in general. However it is pretty stupid how \ still escapes `... So ultimately:
String.raw`C:\Users\` //#==> C:\Users\`
console.log(String.raw`C:\Users\`) //#==> SyntaxError: Unexpected end of input.
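One way around the trailing-backslash problem, offered as a sketch and not the only option, is to supply the final backslash through a substitution, since substituted values are inserted as they are. The paths below are made up for illustration.

```javascript
// A path with no trailing backslash works directly.
const file = String.raw`C:\Users\bob\notes.txt`;

// For a trailing backslash, pass it in as a substitution
// so it cannot escape the closing backtick.
const dir = String.raw`C:\Users${'\\'}`;

console.log(file); // C:\Users\bob\notes.txt
console.log(dir);  // C:\Users\
```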

In addition to its use as a tag, String.raw is also useful in implementing new tag functions as a tool to do the interleaving that most people do with a weird loop. For example, compare:
function foo(strs, ...xs) {
let result = strs[0];
for (let i = 0; i < xs.length; ++i) {
result += useFoo(xs[i]) + strs[i + 1];
}
return result;
}
with
function foo(strs, ...xs) {
return String.raw({raw: strs}, ...xs.map(useFoo));
}
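To check that the two forms agree, both can be run on the same template. In this sketch useFoo is a stand-in that upper-cases its argument; the original does not say what useFoo does, so that part is purely an assumption.

```javascript
// Assumed stand-in for useFoo: upper-case the substituted value.
const useFoo = (x) => String(x).toUpperCase();

// Loop version: interleave strings and processed values by hand.
function fooLoop(strs, ...xs) {
  let result = strs[0];
  for (let i = 0; i < xs.length; ++i) {
    result += useFoo(xs[i]) + strs[i + 1];
  }
  return result;
}

// String.raw version: let String.raw do the interleaving.
// Passing { raw: strs } uses the cooked strings as the "raw" parts.
function fooRaw(strs, ...xs) {
  return String.raw({ raw: strs }, ...xs.map(useFoo));
}

const name = 'world';
console.log(fooLoop`hello ${name}!`); // hello WORLD!
console.log(fooRaw`hello ${name}!`);  // hello WORLD!
```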

The Use
(Requisite knowledge: tstring §.)
Instead of:
console.log(`\\a\\b\\c\\n\\z\\x12\\xa9\\u1234\\u00A9\\u{1234}\\u{00A9}`);
you can write:
console.log(String.raw`\a\b\c\n\z\x12\xa9\u1234\u00A9\u{1234}\u{00A9}`);
"Escaping"
<\\u> is fine, yet a bare <\u> needs "escaping" through a substitution, e.g.:
console.log(String.raw`abc${'\\u'}abc`);
Ditto for:
<\\x> is fine, <\x> needs: console.log(String.raw`abc${`\\x`}abc`);
<\`> is fine, <`> needs: console.log(String.raw`abc${`\``}abc`);
<\${> is fine, <${> needs: console.log(String.raw`abc${`$\{`}abc`);
<\\1> (through <\\7>) is fine, <\1> needs: console.log(String.raw`abc${`\\1`}abc`);
<\\> is fine, a trailing <\> needs: console.log(String.raw`abc${`\\`}`);
Nb
There's also a new "latex" string. Cf §.

I've found it to be useful for testing
my RegExps. Say I have a RegExp which
should match end-of-line comments because
I want to remove them. BUT, it must not
match source-code for a regexp like /// .
If your code contains /// it is not the
start of an EOL comment but a RegExp, as
per the rules of JavaScript syntax.
I can test whether my RegExp in variable patEOLC
matches or doesn't /// with:
String.raw`/\//` .match (patEOLC)
In other words it is a way to let my
code "see" code the way it exists in
source-code, not the way it exists
in memory after it has been read
into memory from source-code, with
all backslashes removed.
It is a way to "escape escaping" but
without having to do it separately
for every backslash in a string, but
for all of them at the same time.
It is a way to say that in a given
(back-quoted) string backslash
shall behave just like any other
character, it has no special
meaning or interpretation.

Embed comments within JavaScript regex like in Perl

Is there any way to embed a comment in a JavaScript regex, like you can do in Perl? I'm guessing there is not, but my searching didn't find anything stating you can or can't.

You can't embed a comment in a regex literal.
You may insert comments in a string construction that you pass to the RegExp constructor :
var r = new RegExp(
"\\b" + // word boundary
"A=" + // A=
"(\\d+)"+ // what is captured : some digits
"\\b" // word boundary again
, 'i'); // case insensitive
But a regex literal is so much more convenient (notice how I had to escape the \) I'd rather separate the regex from the comments : just put some comments before your regex, not inside.
EDIT 2018: This question and answer are very old. EcmaScript now offers new ways to handle this, and more precisely template strings.
For example I now use this simple utility in node:
module.exports = function(tmpl){
let [, source, flags] = tmpl.raw.toString()
.replace(/\s*(\/\/.*)?$\s*/gm, "") // remove comments and spaces at both ends of lines
.match(/^\/?(.*?)(?:\/(\w+))?$/); // extracts source and flags
return new RegExp(source, flags);
}
which lets me do things like this:
const regex = rex`
^ // start of string
[a-z]+ // some letters
bla(\d+)
$ // end
/ig`;
console.log(regex); // /^[a-z]+bla(\d+)$/gi
console.log("Totobla58".match(regex)); // [ 'Totobla58' ]
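The utility above reads tmpl.raw directly, so it does not handle ${...} substitutions. A possible variation, sketched here as a plain function instead of a module export, joins the raw parts and the values with String.raw first; the variable unit is made up for the example.

```javascript
// Variation of the tag above that also accepts ${...} substitutions.
function rex(tmpl, ...values) {
  const [, source, flags] = String.raw(tmpl, ...values)
    .replace(/\s*(\/\/.*)?$\s*/gm, "") // remove comments and spaces at both ends of lines
    .match(/^\/?(.*?)(?:\/(\w+))?$/);  // extract source and flags
  return new RegExp(source, flags);
}

const unit = 'bla'; // hypothetical value inserted into the pattern
const regex = rex`
    ^            // start of string
    [a-z]+       // some letters
    ${unit}(\d+)
    $            // end
    /ig`;

console.log(regex.source); // ^[a-z]+bla(\d+)$
console.log(regex.flags);  // gi
console.log("Totobla58".match(regex)); // [ 'Totobla58' ]
```

Values are inserted verbatim, so any value that contains regex metacharacters would need escaping first.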

Now with the grave backticky things, you can do inline comments with a little finagling. Note that in the example below there are some assumptions being made about what won't appear in the strings being matched, especially regarding the whitespace. But I think often you can make intentional assumptions like that, if you write the process() function carefully. If not, there are probably creative ways to define the little "mini-language extension" to regexes in such a way as to make it work.
function process() {
var regex = new RegExp("\\s*([^#]*?)\\s*#.*$", "mg");
var output = "", result; // declare result so it does not leak as a global
while ((result = regex.exec(arguments[0])) !== null ){
output += result[1];
}
return output;
}
var a = new RegExp(process `
^f # matches the first letter f
.* # matches stuff in the middle
h # matches the letter 'h'
`);
console.log(a);
console.log(a.test("fish"));
console.log(a.test("frog"));
Here's a codepen.
Also, to the OP, just because I feel a need to say this, this is neato, but if your resulting code turns out just as verbose as the string concatenation or if it takes you 6 hours to figure out the right regexes and you are the only one on your team who will bother to use it, maybe there are better uses of your time...
I hope you know that I am only this blunt with you because I value our friendship.

Javascript string validation using the regex object

I am a complete novice at regex and JavaScript. I have the following problem: I need to check a text field for one (1) or many (n) consecutive * (asterisk) characters, e.g. * or ** or *** or any number (n) of *. Allowed strings are e.g. *tomato or tomato* or **tomato or tomato**, or any number (n) of * before and after tomato. So far I have tried the following:
var str = 'a string'
var value = encodeURIComponent(str);
var reg = /([^\s]\*)|(\*[^\s])/;
if (reg.test(value) == true ) {
alert ('Watch out your asterisks!!!')
}

By your question it's hard to decipher what you're after... But let me try:
Only allow asterisks at beginning or at end
If you only allow an arbitrary number (at least one) of asterisks either at the beginning or at the end (but not on both sides) like:
*****tomato
tomato******
but not **tomato*****
Then use this regular expression:
reg = /^(?:\*+[^*]+|[^*]+\*+)$/;
Match front and back number of asterisks
If you require that the number of asterisks at the beginning matches the number of asterisks at the end, like
*****tomato*****
*tomato*
but not **tomato*****
then use this regular expression:
reg = /^(\*+)[^*]+\1$/;
Results?
It's unclear from your question what should happen when each of these regular expressions matches. Whether strings that test positive against them count as valid or invalid depends on your requirements. As long as the regular expressions are correct, you can build the functionality you need on top of them.
I've also written my regular expressions to just exclude asterisks within the string. If you also need to reject spaces or anything else simply adjust the [^...] parts of above expressions.
Note: both regular expressions are untested but should get you started to build the one you actually need and require in your code.
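Since the expressions are described as untested, a quick check against the examples listed above may help. This is only a sketch; the variable names edgeOnly and balanced are made up here.

```javascript
// Asterisks at the beginning OR at the end, but not both.
const edgeOnly = /^(?:\*+[^*]+|[^*]+\*+)$/;
// The same number of asterisks at the beginning and at the end.
const balanced = /^(\*+)[^*]+\1$/;

console.log(edgeOnly.test('*****tomato'));      // true
console.log(edgeOnly.test('tomato******'));     // true
console.log(edgeOnly.test('**tomato*****'));    // false

console.log(balanced.test('*****tomato*****')); // true
console.log(balanced.test('*tomato*'));         // true
console.log(balanced.test('**tomato*****'));    // false
```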

If I understand correctly you're looking for a pattern like this:
var pattern = /\**[^\s*]+\**/;
This won't match strings like ***** or ** ***, but will match ***d*** *d or all of the examples that you say are valid (***tomatos etc.). If I misunderstood, let me know and I'll see what I can do to help. PS: we all started out as newbies at some point, nothing to be ashamed of, let alone apologize for :)
After the edit to your question I gather the use of an asterisk is required, either at the beginning or end of the input, but the string must also contain at least 1 other character, so I propose the following solution:
var pattern = /^\*+[^\s*]+|[^\s*]+\*+$/;
pattern.test('****'); // false
pattern.test(' ***tomato**'); // true
If, however *tomato* is not allowed, you'll have to change the regex to:
var pattern = /^\*+[^\s*]+$|^[^\s*]+\*+$/;
Here's a handy site to help you find your way in the magical world of regular expressions.
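The practical difference between the last two patterns shows up on input with asterisks on both sides. The sketch below runs both on a few sample strings; the names loose and strict are made up for the comparison.

```javascript
// First proposal: asterisks at the start or at the end (both sides still pass).
const loose = /^\*+[^\s*]+|[^\s*]+\*+$/;
// Second proposal: asterisks on exactly one side.
const strict = /^\*+[^\s*]+$|^[^\s*]+\*+$/;

for (const sample of ['*tomato', 'tomato*', '*tomato*', '****']) {
  console.log(sample, loose.test(sample), strict.test(sample));
}
// *tomato true true
// tomato* true true
// *tomato* true false
// **** false false
```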
