Converting user input strings with Special characters to regular expression


Note: there was a similar question, but it did not involve special characters, and the problem here is only with special characters, so I posted a new question.
I have a list of regular expression representations (from a user-input textarea):
Example (simplified):
// starting point is a string like this one
let str = `/ab+c/
/Chapter (\d+)\.\d*/
/\d+/
/d(b+)d/`;
I need to split them into an array so that I can check each line for validity and prepare it, e.g. (simplified):
let arr = str.split(/[\r\n]+/);
for (let i = 0, len = arr.length; i < len; i++) {
arr[i] = arr[i].slice(1, -1); // removing the start/end slashes
// problem with double slashing the Special characters
// it doesn't give the required result
arr[i] = arr[i].replace(/\\/g, '\\$&');
// same result with replace(/\\/g, '\\\\')
}
Finally, convert them into one RegEx object
let regex = new RegExp(arr.join('|'), 'i');
console.log(regex.test('dbbbd')); // true
console.log(regex.test('256')); // false
I must be missing something here.
Update
I missed the point that the data that comes from a textarea (or similar) doesn't need to be escaped at all. When I was testing the code, I was testing it like above which didn't work.
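One way to reproduce the textarea behaviour in a quick test, without a page, is String.raw, which keeps the backslashes of a template literal exactly as typed. This is only a sketch; buildRegex is a made-up helper name, not something from the original code.

```javascript
// String.raw leaves backslashes untouched, so this text is what a
// textarea's value would contain if the user typed the same lines.
const rawInput = String.raw`/ab+c/
/Chapter (\d+)\.\d*/
/\d+/
/d(b+)d/`;

// Hypothetical helper: strip the surrounding slashes from each line and
// join the sources into a single case-insensitive alternation.
function buildRegex(text) {
  const sources = text
    .split(/[\r\n]+/)
    .filter(line => line.length > 0)
    .map(line => line.slice(1, -1));
  return new RegExp(sources.join('|'), 'i');
}

const combined = buildRegex(rawInput);
console.log(combined.test('dbbbd')); // true
console.log(combined.test('256'));   // true, because \d+ survived intact
```

With an ordinary quoted or template string, the same text would lose its backslashes before the RegExp constructor ever sees it, which is why the test in the question returned false for '256'.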

Let's use the "change" event on the <textarea>: once the user changes the content and clicks outside, we read its value property and construct the composite RegExp object from it. I haven't needed to escape the \ characters at all.
Just copy and paste the following into the textarea and click outside.
/ab+c/
/Chapter (\d+)\./
/\d+/
/d(b+)d/
var myTextarea = document.getElementById("ta");
myTextarea.addEventListener("change", function(e) {
var str = e.currentTarget.value.split(/[\r\n]+/)
.map(s => s.slice(1, -1))
.join("|");
var rgx = new RegExp(str, "i");
console.log(`Derived RegExp object: ${rgx}`);
console.log(`Testing for 'dbbbd': ${rgx.test('dbbbd')}`); // true
console.log(`Testing for '256': ${rgx.test('256')}`); // true
});
#ta {
width: 33vw;
height: 50vh;
margin-left: 33vw;
}
<textarea id="ta"></textarea>
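Since the question also mentions checking each line for validity, here is a rough sketch of how that could be done before joining. compileLines is a hypothetical helper, and wrapping each source in (?:...) is my own choice to keep the alternatives from interfering with each other.

```javascript
// Hypothetical helper: compile each "/source/" line on its own so that one
// bad pattern is reported instead of breaking the combined expression.
function compileLines(text) {
  const sources = [];
  const invalid = [];
  for (const rawLine of text.split(/[\r\n]+/)) {
    const line = rawLine.trim();
    if (line === '') continue;
    const match = /^\/(.+)\/$/.exec(line);
    if (!match) {
      invalid.push(line); // not wrapped in slashes
      continue;
    }
    try {
      new RegExp(match[1]); // throws SyntaxError on an invalid pattern
      sources.push(`(?:${match[1]})`);
    } catch (err) {
      invalid.push(line);
    }
  }
  const regex = sources.length > 0 ? new RegExp(sources.join('|'), 'i') : null;
  return { regex, invalid };
}

const result = compileLines(String.raw`/ab+c/
/(unclosed/
/\d+/`);
console.log(result.invalid);           // [ '/(unclosed/' ]
console.log(result.regex.test('256')); // true
```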

Related

How to define a line break in extendscript for Adobe Indesign

I am using extendscript to build some invoices from downloaded plaintext emails (.txt)
At points in the file there are lines of text that look like "Order Number: 123456" and then the line ends. I have a script made from parts I found on this site that finds the end of "Order Number:" in order to get a starting position of a substring. I want to use where the return key was hit to go to the next line as the second index number to finish the substring. To do this, I have another piece of script from the helpful people of this site that makes an array out of the indexes of every instance of a character. I will then use whichever array object is a higher number than the first number for the substring.
It's a bit convoluted, but I'm not great with Javascript yet, and if there is an easier way, I don't know it.
What is the character I need to use to emulate a return key in a txt file in javascript for extendscript for indesign?
Thank you.
I have tried things like \n and \r\n and ^p both with and without quotes around them but none of those seem to show up in the array when I try them.
//Load Email as String
var b = new File("~/Desktop/Test/email.txt");
b.open('r');
var str = "";
while (!b.eof)
str += b.readln();
b.close();
var orderNumberLocation = str.search("Order Number: ") + 14;
var orderNumber = str.substring(orderNumberLocation, ARRAY NUMBER GOES HERE)
var loc = orderNumberLocation.lineNumber
function indexes(source, find) {
var result = [];
for (i = 0; i < source.length; ++i) {
// If you want to search case insensitive use
// if (source.substring(i, i + find.length).toLowerCase() == find) {
if (source.substring(i, i + find.length) == find) {
result.push(i);
}
}
alert(result)
}
indexes(str, NEW PARAGRAPH CHARACTER GOES HERE)
I want all my line breaks to show up as an array of indexes in the variable "result".
Edit: My method of importing stripped all line breaks from the document. Using the code below instead works better. Now \n works.
var file = File("~/Desktop/Test/email.txt", "utf-8");
file.open("r");
var str = file.read();
file.close();
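With the line breaks preserved, the order number can be cut out without building the full index array. The function below is only a sketch in plain ES3-style JavaScript (which ExtendScript should accept); extractAfterLabel is a made-up name, and it assumes the value runs to the end of the line.

```javascript
// Hypothetical helper: return the text between `label` and the end of its line.
function extractAfterLabel(text, label) {
  var start = text.indexOf(label);
  if (start === -1) {
    return null; // label not found
  }
  start += label.length;
  // The line may end with "\n", "\r" or "\r\n"; stop at whichever comes first.
  var end = text.length;
  var lf = text.indexOf("\n", start);
  var cr = text.indexOf("\r", start);
  if (lf !== -1) { end = Math.min(end, lf); }
  if (cr !== -1) { end = Math.min(end, cr); }
  return text.substring(start, end);
}

var sample = "Hello\r\nOrder Number: 123456\r\nThanks";
var orderNumber = extractAfterLabel(sample, "Order Number: "); // "123456"
```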

You need to use regular expressions. Depending on the fields you need to search for, you'll need to tweak the regular expression, but I can give you a starting point. If the fields in the email are separated by new lines, something like this will work:
var str; //your string
var fields = {}
var lookFor = /(Order Number:|Adress:).*?\n/g;
str.replace(lookFor, function(match){
var order = match.split(':');
var field = order[0].replace(/\s/g, '');//remove all spaces
var value = order[1];
fields[field]= value;
})
With (Order Number:|Adress:) you are looking for the fields; you can add more fields, separated by the or character |, inside the parentheses. The .*?\n part matches any characters up to the first line break. The g flag indicates that you want to look for all matches. Then you call str.replace, because it lets you perform a task on each match. So, if the separator between the field and the value is a colon ':', you split the match into an array of two strings, e.g. ['Order Number', ' 12345\n'], and then store the value in an object under the field name. That code will produce something like the following (the values are strings and keep their surrounding whitespace unless you trim them):
fields = {
OrderNumber: " 12345\n",
Adress: " my fake adress 000\n"
}
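A variation on the same idea, sketched with exec instead of replace so that the captured groups can be used directly and the values come out without the leading space and the line break. parseFields and its names parameter are made up for illustration, and the field names are assumed to contain no regex special characters.

```javascript
// Hypothetical helper: collect "Name: value" lines into an object.
function parseFields(text, names) {
  // ^ and $ match at line boundaries because of the m flag.
  var pattern = new RegExp("^(" + names.join("|") + "):[ \\t]*(.*)$", "gm");
  var fields = {};
  var match;
  while ((match = pattern.exec(text)) !== null) {
    var key = match[1].replace(/\s/g, ""); // "Order Number" -> "OrderNumber"
    fields[key] = match[2];
  }
  return fields;
}

var email = "Hi,\nOrder Number: 12345\nAdress: my fake adress 000\nBye";
var fields = parseFields(email, ["Order Number", "Adress"]);
// fields.OrderNumber is "12345", fields.Adress is "my fake adress 000"
```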

Please try \n and \r
Example: indexes(str, "\r");
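To collect every position rather than check one slice at a time, the function from the question can be reduced to an indexOf loop that returns its result. A minimal sketch, with indexesOf as an illustrative name:

```javascript
// Hypothetical helper: return the index of every occurrence of `find`.
function indexesOf(source, find) {
  var result = [];
  if (find.length === 0) {
    return result; // an empty search string would never advance
  }
  var pos = source.indexOf(find);
  while (pos !== -1) {
    result.push(pos);
    pos = source.indexOf(find, pos + find.length);
  }
  return result;
}

var breaks = indexesOf("a\nb\nc", "\n"); // [1, 3]
```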

If I've understood correctly, what you need is str.split():
function indexes(source, find) {
var order;
var result = [];
var orders = source.split('\n'); //returns an array of strings: ["order: 12345", "order:54321", ...]
for (var i = 0, l = orders.length; i < l; i++)
{
order = orders[i];
if (order.indexOf(find) !== -1) { // note: /find/ would match the literal text "find"
result.push(i)
}
}
return result;
}

How do I handle an error for an input string to not allow numbers or symbols

I'm going through an exercise, and one of the tasks is to write a function that takes a string as input and returns the first non-repeating character.
I've done that already, but I would like to make it smarter by rejecting input that contains a number or symbol. I tried everything I could, but nothing seems to work. Only letters and spaces should be accepted.
Here is what I have so far.
function TheOutput(word){
var a =word.length;
for(var i=0; i < a; i++){
var char=word.charAt(i);
if(word.indexOf(char)===word.lastIndexOf(char)){
result = (char + " is not a number <br/>");
break;
}
return result;
}
}

You can use a regular expression if you want to test whether a string contains only letters and spaces.
var onlyLettersAndSpace = 'Valid string';
var stringWithNumbers = 'Invalid string 123';
var RegExpression = new RegExp(/^[a-zA-Z\s]*$/);
console.log(RegExpression.test(onlyLettersAndSpace));
// true
console.log(RegExpression.test(stringWithNumbers));
// false
In your case, you could do something like:
function testWord(word) {
var RegExpression = new RegExp(/^[a-zA-Z\s]*$/);
return RegExpression.test(word);
}
You can do a lot more with regular expressions, see RegExp MDN: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp
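Putting the check together with the original exercise might look roughly like this. It is a sketch rather than the asker's final code: firstNonRepeating is a made-up name, and throwing a TypeError on bad input is just one possible way to report the error.

```javascript
// Hypothetical combination of the letters-and-spaces check with the exercise.
function firstNonRepeating(word) {
  if (typeof word !== 'string' || !/^[a-zA-Z\s]*$/.test(word)) {
    throw new TypeError('Input must contain only letters and spaces');
  }
  for (var i = 0; i < word.length; i++) {
    var ch = word.charAt(i);
    if (word.indexOf(ch) === word.lastIndexOf(ch)) {
      return ch; // first character that occurs exactly once
    }
  }
  return null; // every character repeats
}

console.log(firstNonRepeating('aabbc')); // "c"
```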

Develop vscode extension for replace text

I have a list of data that I need to put a ' symbol at the start of the line and at the end of the line. So the original data look like
11223334444xxx55555
11xxx223334444555xxx55
11xxxx22333444xxx455555
11xxxxx22333444455555
11xxxxxx223334444555xxx55
and I want all the line to look like
11223334444yyyy55555
11yyyy223334444555yyyy55
11yyyyx22333444yyyy455555
11yyyyxx22333444455555
11yyyyyyyy223334444555yyyy55
That is, 'yyyy' replaces 'xxx'. How do I write the code? Either TypeScript or JavaScript is fine.
Sorry, my bad. I want to develop an extension to do this, and the above is just an example. Many of the answers below miss the part about selecting the full text.
const textEditor = vscode.window.activeTextEditor;
if (!textEditor) {
return; // No open text editor
}
var firstLine = textEditor.document.lineAt(0);
var lastLine = textEditor.document.lineAt(textEditor.document.lineCount - 1);
var textRange = new vscode.Range(0,
firstLine.range.start.character,
textEditor.document.lineCount - 1,
lastLine.range.end.character);
textEditor.edit(function (editBuilder) {
editBuilder.replace(textRange, '$1');
});
});
The replace function above takes only one replacement argument, not two (a pattern and a replacement). How can I do the replacement?

Try this:
let checkD = `11223334444xxx55555
11xxx223334444555xxx55
11xxxx22333444xxx455555
11xxxxx22333444455555
11xxxxxx223334444555xxx55`;
// replace returns a new string, so assign the result
checkD = checkD.replace(/xxx/g, 'yyyy');

You can try it with regex. Read more at the MDN.
Here is an array example:
let data = [
"11223334444xxx55555",
"11xxx223334444555xxx55",
"11xxxx22333444xxx455555",
"11xxxxx22333444455555",
"11xxxxxx223334444555xxx55"
];
let max = data.length;
for (let i = 0; i < max; i++) {
let regex = new RegExp("xxx", "g")
data[i] = data[i].replace(regex, "yyyy")
}
console.log(data);
Here is a single string example:
let data = `11223334444xxx55555
11xxx223334444555xxx55
11xxxx22333444xxx455555
11xxxxx22333444455555
11xxxxxx223334444555xxx55`;
let regex = new RegExp("xxx", "g")
data = data.replace(regex, "yyyy")
console.log(data);

I know it's late, but I was struggling with this sort of an issue and got myself to resolve it, using JS.
If I understood your question right.
You want to replace three 'x' letters with four 'y' letters.
I.e., turn this
xxxxxx1xxx2xx into this yyyyyyyy1yyyy2xx
const textEditor = vscode.window.activeTextEditor;
if (!textEditor) {
vscode.window.showErrorMessage("Editor Does Not Exist");
return;
}
let fullText = textEditor.document.getText();
const regex = /xxx/gm; // 'g' flag is for global search & 'm' flag is for multiline.
// Search for 'xxx' and replace every occurrence with 'yyyy'.
let textReplace = fullText.replace(regex, `yyyy`);
// Create a new range with startLine, startCharacter & endLine, endCharacter.
let invalidRange = new vscode.Range(0, 0, textEditor.document.lineCount, 0);
// Ensure that the range above is completely contained in this document.
let validFullRange = textEditor.document.validateRange(invalidRange);
// A single edit replaces the whole document text; no loop over the matches is needed.
textEditor.edit(editBuilder => {
editBuilder.replace(validFullRange, textReplace);
}).then(undefined, err => console.log(err));
Regexp g flag is for global search to match all occurrences, without it, it'll only check for the first.
Regexp m flag is for multiline, it makes ^ and $ match at line beginnings and line endings(instead of at string), respectively.
Reference: global - JavaScript | MDN ,
multiline - JavaScript | MDN
Also, consider looking at the VSCode API Doc for Range & Validate Range

that is 'yyyy' replace 'xxx', how I write my code
split and join is what I use e.g.
str.split('xxx').join('yyyy');
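For comparison, here is a small sketch of the split/join route next to the regex route. split/join treats the search text literally, whereas a regex built from arbitrary text needs its special characters escaped first. replaceLiteral and escapeRegExp are illustrative names, not built-in functions.

```javascript
// split/join treats the search text literally.
function replaceLiteral(text, search, replacement) {
  return text.split(search).join(replacement);
}

// For the regex route, escape characters that are special in a pattern first.
function escapeRegExp(literal) {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

console.log(replaceLiteral('xxxxxx1xxx2xx', 'xxx', 'yyyy')); // yyyyyyyy1yyyy2xx
console.log('a.b axb'.replace(new RegExp(escapeRegExp('a.b'), 'g'), 'Z')); // Z axb
```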

Regex to validate a textarea input which must be URLs separated by new lines

I am trying to create a regex which will ultimately be used with Google Forms to validate a textarea input.
The rules are:
The input area can have one or more URLs (http or https)
URLs must be separated by one or more new lines
Each line that has text must be a single valid URL
The last URL may or may not have new line characters after it
So far, I have written this regex ^(https?://.+[\r\n]+)*(https?://.+[\r\n]+?)$ but the problem is that it also validates a line that has more than one URL.
Here is my testing playground: http://goo.gl/YPdvBH.

Here is what you are looking for
Demo , Demo with your URLS
function validate(ele) {
var str = ele.value;
str = str.replace(/\r/g, "");
while (/\s\n/.test(str)) {
str = str.replace(/\s\n/g, "\n");
}
while (/\n\n/.test(str)) {
str = str.replace(/\n\n/g, "\n");
}
ele.value = str;
str = str.replace(/\n/g, "_!_&_!_").split("_!_&_!_")
var result = [], counter = 0;
for (var i = 0; i < str.length; i++) {
str[i] = str[i].replace(/(?:(?:^|\n)\s+|\s+(?:$|\n))/g, '').replace(/\s+/g, ' ');
if(str[i].length !== 0){
if (isValidAddress(str[i])) {
result.push(str[i]);
}
counter += 1;
}
}
function isValidAddress(s) {
return /^(https?|ftp):\/\/(((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:)*#)?(((\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5]))|((([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.?)(:\d*)?)(\/((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)+(\/(([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)*)*)?)?(\?((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)|[\uE000-\uF8FF]|\/|\?)*)?(\#((([a-z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|#)|\/|\?)*)?$/i.test(s)
}
return (result.length === str.length);
}
var ele = document.getElementById('urls');
validate(ele);

This is closer to the regex you are looking for:
^(https?://[\S]+[\r\n]+)*(https?://[\S]+[\r\n]+?)$
The difference between your regex and this one is that you use .+ which will match all characters except newline whereas I use [\S]+ (note it is a capital S) which will match all non-whitespace characters. So, this doesn't match more than one token on one line. Hence, on each line you can match at max one token and that must be of the form that you have defined.
For a regex to match a single URL, look at this question on StackOverflow:
What is the best regular expression to check if a string is a valid URL?
I don't know whether Google Forms has a length limit on patterns. If it does, a full URL-validation regex is almost sure to run into it.
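To try the idea outside Google Forms, the pattern can be written as a JavaScript literal. This is only a sketch: the regex dialect Google Forms uses may behave differently, the slashes have to be escaped in a literal, and I have used [\r\n]* on the last line (my own adjustment) so that trailing newlines are optional.

```javascript
// One URL-like token per line; \S+ cannot cross a space or a line break.
const urlLines = /^(https?:\/\/\S+[\r\n]+)*https?:\/\/\S+[\r\n]*$/;

console.log(urlLines.test('http://a.com\nhttps://b.org')); // true
console.log(urlLines.test('http://a.com http://b.com'));   // false
console.log(urlLines.test('http://a.com\n\n'));            // true
```

Two URLs run together without any whitespace would still pass, since \S+ only checks for the absence of whitespace.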

If I understand correctly, your regexp is missing the m flag for multiline, so you need something like this:
/^(https?://.+this your reg exp for one url)$/m
A sample with the regexp from "Javascript URL validation regex":
/^(ht|f)tps?:\/\/[a-z0-9-\.]+\.[a-z]{2,4}\/?([^\s<>\#%"\,\{\}\\|\\\^\[\]`]+)?$/m
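Note that with the m flag, test succeeds as soon as any one line matches, so it cannot by itself prove that every line is a URL. In plain JavaScript, one option is to split the text and check each line. The sketch below uses a deliberately loose single-URL pattern, and allLinesAreUrls is a made-up name.

```javascript
// Hypothetical helper: every non-empty line must look like an http(s) URL.
function allLinesAreUrls(text) {
  const lines = text
    .split(/\r?\n/)
    .map(line => line.trim())
    .filter(line => line !== '');
  // Loose stand-in for a real single-URL pattern.
  const singleUrl = /^https?:\/\/\S+$/;
  return lines.length > 0 && lines.every(line => singleUrl.test(line));
}

console.log(allLinesAreUrls('http://a.com\n\nhttps://b.org\n')); // true
console.log(allLinesAreUrls('http://a.com\nnot a url'));          // false
```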

regular expression javascript returning unexpected results

In the code below, I want to validate messageText against the validation patterns and display the corresponding message from the validationPatterns array. The pattern and the message are separated by the pipe "|" character.
For this I am using the code below, and I always get the wrong result. Can someone look at this and help me?
var messageText = "Message1234";
var validationPatterns = [
['\/^.{6,7}$/|message one'],
['\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b|message two']
];
for (var i = 0; i < validationPatterns.length; i++) {
var validationvalues = validationPatterns[i].toString();
var expr = validationvalues.split("|")[0];
console.log(expr.constructor);
if(expr.test(messageText)) {
console.log("yes");
} else {
console.log("no");
}
}
I know that we cannot use pipe as separator as pipe is also part of regular expression. However I will change that later.

Your validationPatterns are strings. That means:
The backslashes get eaten, because in a string literal they just escape the following character: "\d" is equivalent to "d", and "\b" is a backspace character rather than a word boundary. You would need to double-escape them: "\\b"
You cannot call the test method on them. You would need to construct RegExp objects out of them.
While it's possible to fix this, it would be better if you just used regex literals and separated them from the message as distinct properties of an object (or in an array).
var inputText = "Message1234";
var validationPatterns = [
[/^.{6,7}$/, 'message one'],
[/\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b/, 'message two']
];
for (var i = 0; i < validationPatterns.length; i++) {
var expr = validationPatterns[i][0],
message = validationPatterns[i][1];
console.log(expr.constructor); // RegExp now, not String
if(expr.test(inputText)) {
console.log(message+": yes");
} else {
console.log(message+": no");
}
}
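The escaping issue is easy to see in isolation. A short sketch (the variable names are only for illustration):

```javascript
// In a string literal, an unknown escape such as \d simply drops the backslash.
const singleEscaped = '\d{1,3}';
const doubleEscaped = '\\d{1,3}';

console.log(singleEscaped === 'd{1,3}');            // true
console.log(new RegExp(singleEscaped).test('192')); // false, it looks for the letter d
console.log(new RegExp(doubleEscaped).test('192')); // true
console.log('\b'.charCodeAt(0));                    // 8, a backspace character
```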

Your expr variable is still just a string (validationvalues.split("|")[0] will return a string). That's the reason it does not work as a regular expression.
You need to add a line after the initial definition of expr.
expr = new RegExp(expr, 'i');
The 'i' is just an example of how you would use a case-insensitive flag or other flags. Use an empty string if you want a case-sensitive search (the default).
Also, you need to take out the / and / which are surrounding your first pattern. They are only needed when using regular expression literals in JavaScript code, and are not needed when converting strings into regular expressions.
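If the patterns really have to stay as strings (for example because they come from a configuration file), the two steps above could be wrapped in a small helper. This is a sketch under assumptions: toRegExp is a hypothetical name, the entry uses String.raw so its backslashes would survive, and the message is assumed to contain no "|".

```javascript
// Hypothetical helper: accept either "/source/flags" or a bare source string.
function toRegExp(text, defaultFlags) {
  const literal = /^\/(.*)\/([a-z]*)$/.exec(text);
  if (literal) {
    return new RegExp(literal[1], literal[2] || defaultFlags || '');
  }
  return new RegExp(text, defaultFlags || '');
}

const entry = String.raw`/^.{6,7}$/|message one`;
const separator = entry.lastIndexOf('|'); // assumes the message contains no "|"
const expr = toRegExp(entry.slice(0, separator));
const message = entry.slice(separator + 1);

console.log(expr.test('Message'));     // true, 7 characters
console.log(expr.test('Message1234')); // false, 11 characters
```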
