RegExp & PCRE convert to tree with own syntax


I am looking for a pre-processor that lets me define my own regular expression syntax on top of RegExp and PCRE syntax, so that the result can be compiled down to plain PCRE syntax. Examples are at the end.
I think I need a regular expression processor that outputs a tree structure representing the expression, so I can traverse the tree, hot-swap some parts, and then compile it back into a regular expression string.
The processor must also allow custom syntax parsing/processing to be added.
Has someone already made a processor like this? I wrote one myself some time ago, but I am looking for a more professional solution.
Of course, we are talking about node.js/JavaScript.
Yes, node.js has no built-in support for PCRE, but there is an npm module for using PCRE with node.js, and it works great!
Why would someone need it?
For example, you can build a big regular expression out of smaller ones:
(John (like|love)s every (animal|creature) on earth: (#animals))
(#...) is a hashtag group: it is replaced in place by another regular expression containing alternatives for all animals.
Another example: you can create more sophisticated kinds of groups:
(#(a|x)(b)(c))
A permutation group matches all of its bracketed sub-groups (three here, but there could be fewer or more) in any order:
(a|x)(b)(c)
(a|x)(c)(b)
(b)(a|x)(c)
(b)(c)(a|x)
(c)(a|x)(b)
(c)(b)(a|x)
I have more, but I guess I've made my point.
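A minimal sketch of how the two custom groups above could be expanded with plain string processing, before any tree-based parser is involved. The function names and the `definitions` object are hypothetical, nested hash groups are not handled, and the permutation expansion grows factorially with the number of sub-groups.

```javascript
// Hypothetical helpers; the names are illustrative and not taken from any library.

// Return every ordering of the entries in `items`.
function permutations(items) {
  if (items.length <= 1) return [items.slice()];
  const result = [];
  items.forEach((item, i) => {
    const rest = items.slice(0, i).concat(items.slice(i + 1));
    for (const tail of permutations(rest)) result.push([item, ...tail]);
  });
  return result;
}

// Expand a permutation group into one alternation that lists all orderings.
function expandPermutationGroup(groups) {
  return "(?:" + permutations(groups).map((p) => p.join("")).join("|") + ")";
}

// Replace each (#name) with the named sub-expression, wrapped in a
// non-capturing group so the surrounding pattern is not disturbed.
function expandHashGroups(source, definitions) {
  return source.replace(/\(#(\w+)\)/g, (match, name) => {
    if (!Object.prototype.hasOwnProperty.call(definitions, name)) {
      throw new Error("Unknown hash group: " + name);
    }
    return "(?:" + definitions[name] + ")";
  });
}

const expanded = expandHashGroups(
  "John (like|love)s every (animal|creature) on earth: (#animals)",
  { animals: "cat|dog|horse" }
);
console.log(expanded);
console.log(expandPermutationGroup(["(a|x)", "(b)", "(c)"]));
```

The expanded strings can then be handed to `new RegExp(...)` or to a PCRE binding. A tree-based processor would do the same substitutions on nodes instead of on raw text.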

Related

Small regex issue with validating phone numbers (Javascript) [duplicate]

This question already has answers here:
How to validate phone numbers using regex
(43 answers)
Closed 6 years ago.
Wanting to validate phone numbers with the following criteria.
-Minimum of 6 digits.
-Can only have the following symbols "+", "(", ")", "-".
-Contain no more than n consecutive symbols, but numbers are OK.
Here are some examples of what I consider valid:
07519767576
+447519767576
(02380) 346450
(+44) 7519767576
I have been trying to do this myself for quite a while but keep hitting a brick wall. Here is what I have tried so far:
^(?=.{9,}$)(?=[^0-9]*[0-9])(?:([\d\s\+\(\)\-])\1?(?!\1{5}))+?$
This kind of works, but it's a bit of a hack because it also limits the number of consecutive digits.
I am not able to do this check in PHP; sadly, it has to be done in JS. Is this even possible without needing a degree in regex?

At least one of your requirements is beyond what traditional regular languages in general can do. As pointed out in the comments, counting the number of digits across patterns, groups or regular expressions is not possible in traditional regular languages, which essentially use Deterministic Finite Automata (also known as DFAs) to compute regular expression matches.
Perl-style (PCRE-like) regular expressions, which is what languages such as JavaScript and Python support, add functionality such as backtracking, lookahead matching, grouping, counting for a single group, and so on.
These extensions enlarge the set of patterns such regular expressions can match, or more technically the set of languages the expression will accept. But to the best of my knowledge, none of them let you do counting in the way you want here, at least not directly.
It turns out that matching PCRE-style regular expressions with backreferences is NP-complete in theory, but that doesn't mean it's easy or even feasible to write a regular expression for a given problem.
In most cases one would write a small hand-rolled parser in a Turing-complete programming language, which can do what you need fairly easily.
The OP mentioned that doing this is not an option, and thus the problem has come to a standstill.
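For illustration, a hand-rolled check of the kind mentioned above might look like the following sketch. The function name and the `maxRun` parameter (the question's "n") are hypothetical. It assumes that spaces are allowed, since the valid examples contain them, and that a space or a digit ends a run of symbols; the question does not specify either point.

```javascript
// Hypothetical validator: a single pass that counts digits and
// tracks the length of the current run of consecutive symbols.
function isValidPhone(input, maxRun = 2) {
  let digits = 0; // total digits seen so far
  let run = 0;    // length of the current run of symbols
  for (const ch of input) {
    if (ch >= "0" && ch <= "9") {
      digits++;
      run = 0;
    } else if ("+()-".includes(ch)) {
      run++;
      if (run > maxRun) return false; // too many symbols in a row
    } else if (ch === " ") {
      run = 0; // assumption: a space is allowed and ends a symbol run
    } else {
      return false; // any other character is rejected
    }
  }
  return digits >= 6;
}

for (const sample of ["07519767576", "+447519767576", "(02380) 346450", "(+44) 7519767576"]) {
  console.log(sample, isValidPhone(sample));
}
```

Because the digit count and the symbol run are tracked separately, limiting consecutive symbols no longer limits consecutive digits, which was the side effect of the regex attempt in the question.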

RegExp for parsing a Math Expression?

Hey I've written a fractal-generating program in JavaScript and HTML5 (here's the link), which was about a 2 year process including all the research I did on Complex math and fractal equations, and I was looking to update the interface, since it is quite intimidating for people to look at. While looking through the code I noticed that some of my old techniques for going about doing things were very inefficient, such as my Complex.parseFunction.
I'm looking for a way to use RegExp to parse components of the expression such as functions, operators, and variables, as well as implementing the proper order of operations for the expression. An example below might demonstrate what I mean:
//the first example parses an expression with two variables and outputs to string
console.log(Complex.parseFunction("i*-sinh(C-Z^2)", ["Z","C"], false))
"Complex.I.mult(Complex.neg(Complex.sinh(C.sub(Z.cPow(new Complex(2,0,2,0))))))"
//the second example parses the same expression but outputs to function
console.log(Complex.parseFunction("i*-sinh(C-Z^2)", ["Z","C"], true))
function(Z,C){
return Complex.I.mult(Complex.neg(Complex.sinh(C.sub(Z.cPow(new Complex(2,0,2,0))))));
}
I know how to handle RegExp using String.prototype.replace and all that, all I need is the RegExp itself. Please note that it should be able to tell the difference between the subtraction operator (e.g. "C-Z^2") and the negative function (e.g. "i*-(Z^2+C)") by noting whether it is directly after a variable or an operator respectively.

While you can use regular expressions as part of an expression parser, for example to break out tokens, regular expressions do not have the computational power to parse properly nested mathematical expressions. That is essentially one of the core results of computing theory (finite state automata vs. push down automata). You probably want to look at something like recursive-descent or LR parsing.
I also wouldn't worry too much about the efficiency of parsing an expression provided you only do it once. Given all of the other math you are doing, I doubt it is material.
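As a rough sketch of the recursive-descent approach, the following uses a regular expression only to split the input into tokens and handles precedence and nesting in ordinary functions. The output names (`add`, `sub`, `mult`, `div`, `neg`, `pow`) are placeholders and not the asker's `Complex` API, and the grammar is an assumption: one argument per function call, `^` binding tighter than unary minus.

```javascript
// Hypothetical parser: turns an infix expression into a nested call string.
function parseExpression(src) {
  // Tokens: numbers, identifiers, or any other single non-space character.
  const tokens = src.match(/\d+(?:\.\d+)?|[A-Za-z_]\w*|\S/g) || [];
  let pos = 0;
  const peek = () => tokens[pos];
  const next = () => tokens[pos++];
  const expect = (t) => {
    if (next() !== t) throw new SyntaxError("Expected " + t);
  };

  // expr := term (('+' | '-') term)*
  function expr() {
    let left = term();
    while (peek() === "+" || peek() === "-") {
      const op = next() === "+" ? "add" : "sub";
      left = `${op}(${left}, ${term()})`;
    }
    return left;
  }
  // term := unary (('*' | '/') unary)*
  function term() {
    let left = unary();
    while (peek() === "*" || peek() === "/") {
      const op = next() === "*" ? "mult" : "div";
      left = `${op}(${left}, ${unary()})`;
    }
    return left;
  }
  // unary := '-' unary | power
  // A '-' seen here follows an operator or starts the input, so it is negation.
  function unary() {
    if (peek() === "-") {
      next();
      return `neg(${unary()})`;
    }
    return power();
  }
  // power := primary ('^' unary)?
  function power() {
    const base = primary();
    if (peek() === "^") {
      next();
      return `pow(${base}, ${unary()})`;
    }
    return base;
  }
  // primary := number | name | name '(' expr ')' | '(' expr ')'
  function primary() {
    const t = next();
    if (t === undefined) throw new SyntaxError("Unexpected end of input");
    if (t === "(") {
      const inner = expr();
      expect(")");
      return inner;
    }
    if (/^\d/.test(t)) return t;
    if (/^[A-Za-z_]/.test(t)) {
      if (peek() !== "(") return t; // plain variable or constant
      next();
      const arg = expr();
      expect(")");
      return `${t}(${arg})`;
    }
    throw new SyntaxError("Unexpected token " + t);
  }

  const result = expr();
  if (pos < tokens.length) throw new SyntaxError("Unexpected token " + peek());
  return result;
}

console.log(parseExpression("i*-sinh(C-Z^2)"));
```

Subtraction and negation are told apart by position in the grammar rather than by inspecting neighbouring characters: a `-` reached in `expr` follows an operand, while a `-` reached in `unary` does not.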

Is it possible to parse regex strings with a regex

Just out of curiosity, is it possible to parse a string that is totally made out of random but valid regular expressions with a single regular expression?
given the string of regex:
<[^>]*>\xA9
parses to:
<[^>]*>
\xA9
where the first one matches an HTML tag and the second one matches a copyright symbol.
Edit:
I found a similar question asked on SO claiming that it may be possible. Here, I'm referring to regex in JavaScript (ECMA-262) only.

No, it is not possible: regular expression language allows parenthesized expressions representing capturing and non-capturing groups, lookarounds, etc., where parentheses must be balanced. It is not possible even in theory to write a regular expression that verifies if parentheses are balanced in a given string. Without an ability to do that you wouldn't know where one regexp ends and the other one starts.
In general, regex grammar is relatively complex. To get an idea of just how complex it is, take a look at the parser in the source of Java's Pattern class.
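To make the balanced-parentheses point concrete, here is a small sketch of the kind of check that needs a counter, and therefore ordinary code rather than a single JavaScript regular expression. The function name is hypothetical, and the scan is deliberately simplified: it only understands backslash escapes and character classes, not the full ECMA-262 pattern grammar.

```javascript
// Hypothetical check: are the group parentheses in a regex source balanced?
function hasBalancedGroups(source) {
  let depth = 0;       // number of currently open groups
  let inClass = false; // true while inside [...]
  for (let i = 0; i < source.length; i++) {
    const ch = source[i];
    if (ch === "\\") {
      i++; // skip the escaped character, whatever it is
      continue;
    }
    if (inClass) {
      if (ch === "]") inClass = false; // parentheses are literal inside a class
      continue;
    }
    if (ch === "[") inClass = true;
    else if (ch === "(") depth++;
    else if (ch === ")") {
      depth--;
      if (depth < 0) return false; // closing parenthesis without an opener
    }
  }
  return depth === 0 && !inClass;
}

console.log(hasBalancedGroups("<[^>]*>\\xA9"));
console.log(hasBalancedGroups("(a(b)c"));
```

The `depth` variable is exactly the unbounded memory that a finite automaton lacks. Splitting a string of concatenated regexes would need the same bookkeeping plus rules for where one expression ends.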

What Javascript Regular Expression features are unique to Javascript?

I hope this question isn't too broad, but then again I would expect JavaScript's (and other languages') regular expression engines to share most of their functionality with what is considered standard/expected regular expression behavior.
I made a statement about C# having unique regular expression capabilities in this post: RegEx match open tags except XHTML self-contained tags
Specifically, here is the statement:
C# is unique when it comes to regular expressions in that it supports Balancing Group Definitions.
See Matching Balanced Constructs with .NET Regular Expressions
See .NET Regular Expressions: Regex and Balanced Matching
See Microsoft's docs on Balancing Group Definitions
I'm curious what unique regular expression capabilities JavaScript has, if any.

Although JavaScript’s regular expression library supports features that are considered common (see comparison table), there is one particular expression that I haven’t seen in other flavors:
/[^]/
This matches any arbitrary character, similar to /[\s\S]/ (or any other union of complementary character classes). It can be handy because JavaScript historically had no s modifier, as other flavors do, to make . match line breaks too (the s/dotAll flag was only added in ES2018).
Similar to that:
/[]/
This evaluates to an empty character set and can’t match anything at all.
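A quick way to see both character classes in action. The variable names are only for illustration, and the last line assumes an engine with ES2018 support for the `s` (dotAll) flag.

```javascript
const anyChar = /[^]/; // negated empty class: matches any character
const nothing = /[]/;  // empty class: can never match

const results = {
  dotMatchesNewline: /./.test("\n"),          // false: . skips line terminators
  anyCharMatchesNewline: anyChar.test("\n"),  // true
  emptyClassMatches: nothing.test("a"),       // false
  dotAllMatchesNewline: /a.b/s.test("a\nb"),  // true with the ES2018 s flag
};
console.log(results);
```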

JavaScript regexes are a subset of Perl regexes.
This means they have no unique features, but they are missing quite a few.

Javascript regular expressions are modeled on Perl's regular expressions.
See: http://www.regular-expressions.info/javascript.html

JavaScript's regex engine is merely a subset of Perl's engine, meaning that it doesn't add anything new and is missing many of the features Perl contains.
You can read more about it here: http://www.regular-expressions.info/javascript.html.

Best way to generate javascript code in ruby (RoR)

I have seen some rails plugins which generate javascript code dynamically using ruby.
1.
%Q ( mixed block of javascript and ruby )
2.
<<-CODE
some mixed ruby and javascript code
CODE
Being a Java developer, I don't understand what these strange-looking syntaxes mean.
Is one way better than the other?
Can anyone point me to proper documentation about such things?

The first syntax is Ruby's string literal syntax. Specifically, the %Q (capital Q as opposed to lower-case) means that the string will be interpolated. eg:
%Q[Here's a string with #{a_variable} interpolated!]
Note that you can use any arbitrary characters as the open and close delimiters.
The second syntax is Ruby's heredoc syntax. The dash after the opening << allows the closing delimiter to be indented; it does not strip whitespace from the beginning of the lines in the heredoc body (the <<~ form, available since Ruby 2.3, does that).
Ruby on Rails ships with the Prototype JavaScript framework built-in already. It also ships with JS generator helper methods which generate the Prototype code dynamically based on Ruby code.
You needn't use these if you don't want to. In fact, I rarely use them or Prototype at all, as jQuery is my JS framework of choice. So one way is not "better" than the other (except in the general sense that heredoc is better than the string literal syntax for certain cases).

In Ruby %Q provides a double quote delimited string, so:
%Q(mixed block of javascript and ruby) #=> "mixed block of javascript and ruby"
<<-CODE is what Ruby calls a Here Document, or simply heredoc. This is a mechanism for creating free format strings whilst preserving special characters such as new lines and tabs.
A heredoc is created by preceding the text with << followed by the delimiter string you wish to use to mark the end of the text.
text = <<-DOC
To be, or not to be: that is the question
William Shakespeare
DOC
When this string is printed it appears exactly as it was entered, together with all the new lines and tabs:
To be, or not to be: that is the question
William Shakespeare

%Q is the equivalent of a "" string in Ruby, but with the %Q syntax you don't need to escape double quotes.
<<-CODE is a heredoc declaration. You also don't need to escape quotes there.
Strings in Ruby.

Here you can find the details.
Ruby with javascript
