Regex - How to count word in sql text? - javascript

I want to count the word "CREATE" in SQL text, but I don't want to count occurrences inside comments.
This is my working PHP regex pattern, but it doesn't work in JavaScript or C#. How can I convert it to a JavaScript or C# regex pattern? I want to get the match count in JavaScript or C#.
https://regex101.com/r/oB7pA2/7#pcre
This regex pattern does not work in C#:
var asd = Regex.Matches("CREATE TABLE TEST (COLUMNA NUMBER);", "(\\/\\*(?:(?!\\/\\*).|(?1))*?\\*\\/)(*SKIP)(*F)|^\\s*CREATE").Count;
Best Regards

The simplest way would be to split the process into three steps:
1. Remove the comments
I've only quickly tested this, but it does seem to work with my nested-comment test cases (it handles one level of nesting). You will want to write your own test cases and prove that this regex works for your cases.
Note that this doesn't "require" comments to be closed with */, because that could cause "catastrophic backtracking". If it is important to you to raise an error when the string contains an unclosed comment, you should write a separate regex for that.
Regular Expression
(?:/\*(?:[^*/]+|\*[^/]|/[^*]|/\*(?:[^*/]+|\*[^/]|/[^*])*(?:\*/)?)*(?:\*/)?|-- [^\r\n]+)
https://regex101.com/r/sO5vR1/2
Code
var modified = original.replace(/(?:\/\*(?:[^*\/]+|\*[^\/]|\/[^*]|\/\*(?:[^*\/]+|\*[^\/]|\/[^*])*(?:\*\/)?)*(?:\*\/)?|-- [^\r\n]+)/ig, "");
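If you do want to detect unclosed comments, a small scanner that counts nesting depth may be easier to get right than a regex. The sketch below swaps in that technique (it is not a regex); hasUnclosedComment is a hypothetical name, and it assumes nested /* */ comments while ignoring quoted strings and -- line comments.

```javascript
// Sketch, not a regex: walk the string and track /* */ nesting depth.
// hasUnclosedComment is a hypothetical helper. It ignores quoted strings and
// "-- " line comments, so a "/*" inside a string literal is counted as well.
function hasUnclosedComment(sql) {
  let depth = 0;
  for (let i = 0; i < sql.length - 1; i++) {
    const pair = sql[i] + sql[i + 1];
    if (pair === "/*") {
      depth++;
      i++; // skip the "*" so "/*/" is not also read as "*/"
    } else if (pair === "*/" && depth > 0) {
      depth--;
      i++; // skip the "/" so "*/*" is not also read as "/*"
    }
  }
  // Any depth left over means at least one comment was never closed.
  return depth > 0;
}
```

You could run it before step 1 and stop with an error when it returns true.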
2. Remove quoted strings
We don't want to count CREATE if it's inside a quoted string.
NOTE: You might also want to handle brackets ([ ]) if you believe they might contain the word CREATE.
Regular Expression
'[^']*(?:''[^']*)*'|"[^"\\]*(?:\\.[^"\\]*)*"
https://regex101.com/r/rZ9nV7/1
Code
modified = modified.replace(/'[^']*(?:''[^']*)*'|"[^"\\]*(?:\\.[^"\\]*)*"/ig, "");
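For the brackets mentioned in the NOTE, one possible extra removal step is sketched below. It assumes SQL Server-style bracketed identifiers, where ]] inside the brackets stands for a literal ]; removeBracketedIdentifiers is a hypothetical name.

```javascript
// Hypothetical extra step: strip SQL Server-style [bracketed] identifiers,
// where "]]" inside the brackets is an escaped "]".
function removeBracketedIdentifiers(sql) {
  return sql.replace(/\[[^\]]*(?:\]\][^\]]*)*\]/g, "");
}

// Usage, right after step 2:
// modified = removeBracketedIdentifiers(modified);
```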
3. Count occurrences of "CREATE"
Code
var count = (modified.match(/\bCREATE\b/ig) || []).length;
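Put together, the three steps might look like the following sketch. countCreate is a hypothetical name; the regexes are taken from this answer unchanged, except that the i flag is dropped from steps 1 and 2, where it has no effect.

```javascript
// Sketch: the three steps combined into one function.
// countCreate is a hypothetical helper name.
function countCreate(sql) {
  // 1. Remove /* */ block comments (one level of nesting) and "-- " line comments.
  const withoutComments = sql.replace(
    /(?:\/\*(?:[^*\/]+|\*[^\/]|\/[^*]|\/\*(?:[^*\/]+|\*[^\/]|\/[^*])*(?:\*\/)?)*(?:\*\/)?|-- [^\r\n]+)/g,
    ""
  );
  // 2. Remove 'single-quoted' strings ('' escapes) and "double-quoted" strings.
  const withoutStrings = withoutComments.replace(
    /'[^']*(?:''[^']*)*'|"[^"\\]*(?:\\.[^"\\]*)*"/g,
    ""
  );
  // 3. Count whole-word, case-insensitive occurrences of CREATE.
  return (withoutStrings.match(/\bCREATE\b/gi) || []).length;
}
```

Because of the \b anchors, identifiers such as CREATED_AT are not counted.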

Related

Javascript RegEx to match everything up until and after a specific pattern

I need a regex to match everything in a string except a sub-string of a given pattern, which may appear several times.
Example text:
A lot of text before my pattern.
Perhaps several lines...
...then [my pattern here]. Then maybe [my pattern here] again and some more text to end.
The pattern in question is anything starting with "000." followed by any number of characters other than a space. So, for example, valid tokens would be:
000.a
000.1a
000.SomeLongWordHere!123
First, I matched the pattern itself, which I managed with /000\.[^ ]+/g. Then I tried to negate it with /(?!000\.[^ ]+)/g and variations of that, such as adding .+ before it, or both before and after it, but none of them does what I need.
I looked into several other questions regarding regex (such as this and this), but wasn't lucky (or didn't quite understand how to apply the answers to my need).
I'm using regex101.com to test.
Using the above example text, the desired result is:
A lot of text before my pattern.
Perhaps several lines...
...then . Then maybe again and some more text to end.
Any help is greatly appreciated. Thanks in advance!
As @Doqnach mentioned in the comments, it seems you just want to do a replace.
Using your regex:
text.replace(/000\.[^ ]+/g, "");
And here's a working example:
const text = `A lot of text before my pattern.
Perhaps several lines...
...then 000.SomeLongWordHere!123 Then maybe 000.1a again and some more text to end.`;
// Easiest way
console.log("=== Using Replace ===\n\n");
console.log(text.replace(/000\.[^ ]+/g, ""));
// Using regex exec
console.log("\n\n=== Regex exec ===\n\n");
const regex = /(?:^|(?:000\.[^ ]+))((?:(?!000\.[^ ]+)[\S\s])*)/igm;
let content = "";
let result; // declared explicitly instead of leaking an implicit global
while ((result = regex.exec(text)) !== null) {
  // Group 1 holds the content that does not match the token pattern
  content += result[1];
}
console.log(content);
A breakdown of the second method: https://regex101.com/r/nR9tV6/16
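If you literally need the parts of the text that are not tokens (what the question asks a regex to "match"), String.prototype.split with the same pattern returns them as an array, and joining that array gives the same string as the replace. piecesOutsideTokens is a hypothetical helper name.

```javascript
// Sketch: split() with the token pattern returns the pieces between tokens.
// piecesOutsideTokens is a hypothetical helper name.
function piecesOutsideTokens(text) {
  return text.split(/000\.[^ ]+/);
}

console.log(piecesOutsideTokens("...then 000.SomeLongWordHere!123 Then maybe 000.1a again."));
// → ["...then ", " Then maybe ", " again."]
```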

Understand code from string using Regex or something - Js

I want to run a string replace function on a piece of code using JavaScript, while making sure that all the strings in the code stay intact and unchanged. For example, if I have code like the following:
var a = "I am ok";
if (a == "I am ok") {
alert("That's great to know");
}
Now I want to run a string replace on this code block, but it should only affect the code itself, not the strings in double quotes. Can this be done using a regex or any other method?
AST
To avoid any chance of error when manipulating code, an Abstract Syntax Tree (AST) based solution is best. One example implementation is UglifyJS2, a JavaScript parser, minifier, compressor and beautifier toolkit.
RegEx
Alternatively, if an AST is overkill for your specific task, you can use a regex.
But do you have to contend with comments too?
The process might look like this:
1. Use a carefully formed regex to split the JavaScript code string on the following, in this order:
   - comment blocks
   - comment lines
   - quoted strings, both single and double quotes (remembering to handle escaped characters)
2. Iterate through the split components. If a component is a string (begins with " or ') or a comment (begins with // or /*), ignore it; otherwise, run your replacement on it.
3. Join the array of strings back together (the simple part).
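A minimal sketch of that split / iterate / join process, assuming the input contains no template literals or regex literals and that every comment and string is closed. replaceInCodeOnly is a hypothetical name, and search should be a regex with the g flag so that every occurrence in a code fragment is replaced.

```javascript
// Sketch of split / iterate / join. replaceInCodeOnly is a hypothetical helper.
// Not handled: template literals, regex literals, unclosed comments/strings.
function replaceInCodeOnly(code, search, replacement) {
  // One capturing group, so split() keeps each comment/string in the array.
  // Alternatives in order: block comment, line comment, "double", 'single'.
  const protectedParts =
    /(\/\*[\s\S]*?\*\/|\/\/[^\r\n]*|"(?:[^"\\\r\n]|\\.)*"|'(?:[^'\\\r\n]|\\.)*')/;
  return code
    .split(protectedParts)
    // With one capturing group, split() alternates: code, captured, code, ...
    // so odd indices are comments/strings and are left untouched.
    .map((part, i) => (i % 2 === 1 ? part : part.replace(search, replacement)))
    .join("");
}
```

For example, replaceInCodeOnly(source, /\ba\b/g, "status") renames the variable a in the question's snippet without touching "I am ok" or "That's great to know". The helper decides by array position rather than by first character, since split() with one capturing group always puts the captured comments and strings at odd indices.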
You would have to place the function code in a string variable, run a normal regex operation over that string, and then convert it to a function afterwards with:
var func = new Function('a', 'b', 'return a + b');
EDIT: Use regex to exclude the text between double quotes if you need to.

JavaScript RegEx match unless wrapped with [nocode][/nocode] tags

My current code is:
var user_pattern = this.settings.tag;
user_pattern = user_pattern.replace(/[\-\[\]\/\{\}\(\)\*\+\?\.\\\^\$\|]/g, "\\$&"); // escape regex
var pattern = new RegExp(user_pattern.replace(/%USERNAME%/i, "(\\S+)"), "ig");
Where this.settings.tag is a string such as "[user=%USERNAME%]" or "#%USERNAME%". The code uses pattern.exec(str) to find any username in the corresponding tag and works perfectly fine. For example, if str = "Hello, [user=test]" then pattern.exec(str) will find test.
This works fine, but I want to be able to stop it from matching if the string is wrapped in [nocode][/nocode] tags. For example, if str = "[nocode]Hello, [user=test], how are you?[/nocode]" then pattern.exec(str) should not match anything.
I'm not quite sure where to start. I tried using a (?![nocode]) before and after the pattern, but to no avail. Any help would be great.
I would just test if the string starts with [nocode] first:
/^\[nocode\]/.test('[nocode]');
Then simply do not process it.
Maybe filter out [nocode] before trying to find the username(s)?
pattern.exec(str.replace(/\[nocode\][\s\S]*?\[\/nocode\]/g, '')); // lazy, so separate blocks are removed one by one
I know this isn't exactly what you asked for, because now you have to use two separate regular expressions; however, code readability matters too, and this approach is clearly better in that respect. Hope this helps 😉
JSFiddle: http://jsfiddle.net/1f485Lda/1/
It's based on this: Regular Expression to get a string between two strings in Javascript
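A sketch of the filter-first approach wrapped in a function that collects every username. findUsernames is a hypothetical name, and its tag parameter stands in for this.settings.tag. It uses a lazy [\s\S]*? so that two separate [nocode] blocks are removed individually and blocks spanning several lines are removed too; a greedy (.*) would also delete any text between two blocks on the same line.

```javascript
// Sketch: drop [nocode]...[/nocode] blocks, then collect usernames.
// findUsernames is a hypothetical helper; tag plays the role of this.settings.tag.
function findUsernames(str, tag) {
  // Escape regex metacharacters in the tag, as in the question's code.
  const escaped = tag.replace(/[\-\[\]\/\{\}\(\)\*\+\?\.\\\^\$\|]/g, "\\$&");
  const pattern = new RegExp(escaped.replace(/%USERNAME%/i, "(\\S+)"), "ig");
  // Lazy [\s\S]*? removes each block on its own, even across newlines.
  const visible = str.replace(/\[nocode\][\s\S]*?\[\/nocode\]/g, "");
  const names = [];
  let m;
  while ((m = pattern.exec(visible)) !== null) {
    names.push(m[1]);
  }
  return names;
}
```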

Regex appears to ignore multiple piped characters

Apologies for the awkward question title. I have the following JavaScript:
var wordRe = new RegExp('\\b(?:(?![<^>"])fox|hello(?![<\/">]))\\b', 'g'); // Words regex
console.log('<span>hello</span> <hello>fox</hello> fox link hello my name is fox'.replace(wordRe, 'foo'));
What I'm trying to do is replace any word that isn't nested in an HTML tag or part of an HTML tag itself, i.e. I only want to match "plain" text. The expression seems to ignore the rule for the first alternative, "fox", and replaces it when it shouldn't.
Can anyone point out why this is? I think I might have organised the expression incorrectly (at least the negative lookahead).
Here is the JSFiddle.
I'd also like to add that I am aware of the implications of using regex with HTML :)
For this task you want lookbehind. However, as of this writing, this feature is not supported in JavaScript (lookbehind assertions were later added in ECMAScript 2018).
Here is a workaround:
Instead of matching what we want, we will match what we don't want and remove it from our input string. Later, we can perform the replace on the cleaned input string.
var nonWordRe = new RegExp('<([^>]+).*?>[^<]+?</\\1>', 'g');
var test = '<span>hello</span> <hello>fox</hello> fox link hello my name is fox';
var cleanedTest = test.replace(nonWordRe, '');
var final = cleanedTest.replace(/fox|hello/g, 'foo'); // once trimmed, final == 'foo link foo my name is foo'
NOTE:
I have built this workaround based on your sample, but here are some points you may need to explore if you run into them:
- you may need to remove self-closing tags (<([^>]+).*?/\>) from the test string
- you may need to trim the final string (final)
- you may need a decent HTML parser if tags can contain other tags, since HTML allows this.
JavaScript does not, again as of this writing, support recursive patterns.
Demo
http://jsfiddle.net/yXd82/2/
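A variant of the same "match what you don't want" idea keeps the HTML in the output instead of deleting it: one regex matches either a simple <tag>text</tag> element or a lone tag (returned unchanged) or one of the target words (replaced), and a replace callback tells them apart. Like the workaround, it assumes tags are not nested; replaceOutsideTags is a hypothetical name.

```javascript
// Sketch: leave simple <tag>text</tag> elements and lone tags untouched,
// replace fox/hello only in the plain text between them.
// replaceOutsideTags is a hypothetical helper; nested tags are not supported.
function replaceOutsideTags(html, replacement) {
  const re = /(<([^>\s\/]+)[^>]*>[^<]*<\/\2>|<[^>]+>)|\b(?:fox|hello)\b/g;
  return html.replace(re, (match, tagged) =>
    // Group 1 is set only when an element or tag matched: return it as-is.
    tagged !== undefined ? tagged : replacement
  );
}
```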

How to search csv string and return a match by using a Javascript regex

I'm trying to extract, from a semicolon-separated string, the first user right that matches a pattern.
User rights are stored in this format:
LAA;LA_1;LA_2;LE_3;
The string is empty if the user does not have any rights.
My best solution so far is to use the following regex in a regex.replace statement:
.*?;(LA_[^;]*)?.*
(The question mark after the group makes it optional, so the whole line is still matched when the user does not have the right, and it is replaced with an empty string to signal that she doesn't have it.)
However, it doesn't work correctly when the right being searched for is in the first position:
LA_1;LA_2;LE_3;
It is easy to fix by just adding a semicolon at the beginning of the line before the regex replace, but my question is: why doesn't the following regex match it?
.*?(?:(?:^|;)(LA_[^;]*))?.*
I have tried numerous other regular expressions to find the solution but so far without success.
I'm not sure I understand your question correctly, but the regular expressions you are using are more complicated than they need to be (at least, I can't see a reason for the extra complexity). You might want something like:
function getFirstRight(rights) {
var m = rights.match(/(^|;)(LA_[^;]*)/)
return m ? m[2] : "";
}
You could also split the string first and take the first right that starts with LA_:
function getFirstRight(rights)
{
return rights.split(";").find(r => r.startsWith("LA_")) || "";
}
To answer the specific question "why doesn't the following regex match it?", one problem is the mix of this at the beginning:
.*?
eventually followed by:
^|;
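That is like saying: skip over any characters until you reach either the start or a semicolon. But you can't skip over anything and then later arrive at the start (unless it involves newlines in a multiline string). There is a second problem: because the group is optional and .*? is lazy, the engine is satisfied at the very first position by skipping the group entirely, so for a string such as LAA;LA_1; the whole line matches with an empty group.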
Something like this works:
.*?(\bLA_[^;]*).*
Meaning: skip over characters until a word boundary followed by "LA_", then capture the rest of that right up to the next semicolon.
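A sketch comparing the match-based helper with a replace-based one built on the word-boundary idea. Both names are hypothetical; the replace-based one uses [^;]* so that multi-character rights such as LA_12 are captured whole, and it assumes rights are made of word characters, so that \b sits right before each one.

```javascript
// Sketch: two ways to get the first LA_ right. Both helper names are hypothetical.
function firstRightByMatch(rights) {
  // LA_ either at the very start or right after a semicolon.
  const m = rights.match(/(?:^|;)(LA_[^;]*)/);
  return m ? m[1] : "";
}

function firstRightByReplace(rights) {
  // Lazy .*? stops at the first word boundary followed by LA_;
  // the whole string is then replaced by the captured right.
  const re = /.*?(\bLA_[^;]*).*/;
  return re.test(rights) ? rights.replace(re, "$1") : "";
}
```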
