Regular Expression to identify all camel cased strings in a document - javascript

I am rusty on regular expressions and need some help. A JS code base I inherited uses a mix of camel case and snake case for things like variable names and object properties.
I am trying to formulate a regular expression that will identify all the camel cased strings, so that I can then replace those strings with snake casing. The part I am struggling with is identifying the camel cased strings under the conditions I have.
Identifying which strings are camel case: In this document, all camel cased strings start with either a lowercase letter, an underscore, or a $, and then use a capital letter at some point later in the string. Examples are: someCamelCasedString, _someCamelCasedString, and $someCamelCasedString. The regular expression needs to take into account that some of the strings I am trying to match may be object properties, so it should be able to identify things like Foo._someCamelCasedString.bar or Foo[_someCamelCasedString].bar.

This identifies all occurrences of "strict" camel case (only letters). Whether they start with _ or $ or foofoo doesn't matter.
[a-z]+[A-Z][a-zA-Z]*
An edge case is cameL. Is that proper camel case? I have assumed it is, but we can change that.
If you want to allow other characters in the string (digits etc) then we can add them in the character classes. So this is a starting point to be refined depending on your requirements.
For instance if you know that you're happy with digits and underscores, you can go with this:
[a-z]\w*?[A-Z]\w*
If you also want to allow dollars in the name (a character that @Jongware says js strings allow) you can go with this:
[a-z][\w$]*[A-Z][\w$]*
Then there is the question of what constitutes the boundary of a valid string, so that we can perhaps devise some anchor (perhaps with sneaky lookaheads, since older js engines don't support lookbehinds; lookbehind only arrived with ES2018) in order to avoid false positives.
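To illustrate the boundary idea together with the replacement step the question asks about, here is a minimal sketch. The names camelPattern, camelToSnake, and snakeCaseIdentifiers are hypothetical, and the pattern only allows letters and digits after the first character; both are assumptions to adjust for your code base. Instead of a lookbehind, the character before the identifier is captured and put back during the replacement.

```javascript
// Hypothetical helpers, not taken from any of the answers above.
// Group 1: start of input or one character that cannot be part of an identifier.
// Group 2: an identifier that starts with a lowercase letter, _ or $,
//          and contains at least one capital letter later on.
const camelPattern = /(^|[^\w$])([a-z_$][a-zA-Z0-9]*[A-Z][a-zA-Z0-9]*)/g;

// Turn each capital letter into "_" plus its lowercase form.
// Runs of capitals are split letter by letter (parseHTML -> parse_h_t_m_l).
function camelToSnake(name) {
  return name.replace(/[A-Z]/g, (ch) => "_" + ch.toLowerCase());
}

// Rewrite every camel cased identifier found in a piece of source text,
// restoring the captured boundary character in front of it.
function snakeCaseIdentifiers(source) {
  return source.replace(
    camelPattern,
    (match, boundary, name) => boundary + camelToSnake(name)
  );
}

console.log(snakeCaseIdentifiers("Foo._someCamelCasedString.bar"));
console.log(snakeCaseIdentifiers("Foo[_someCamelCasedString].bar"));
```

Because the boundary group rejects word characters, names such as FooBar (capitalized first letter) and existing snake_case names are left alone. A plain text replacement like this knows nothing about strings, comments, or external APIs, so the result should be reviewed before committing it.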

Maybe something like this:
/(\w|\$)+([A-Z])\w+/gm
You can play around with it here and see the examples: http://regexr.com/38qkq The site also explains what each piece means in regular expressions.

/(?:^|\s|[^\w$])([a-z_$][a-zA-Z]*[A-Z][a-zA-Z]*)/gm
Test http://regex101.com/r/pH1aB7

Related

Regex expression excludes links with weird URL

I have this regex expression (Java / JavaScript)
/(http|ftp|https):\/\/([\w+?\.\w+])+([a-zA-Z0-9\\~\\!\\#\\#\\$\\%\\^\\&\\*\\(\\)_\-\\=\\+\\\\\/\\?\\.\\:\\;\\'\\,]*\.(?:jpg|JPG|jpeg|JPEG|gif|GIF|png|PNG|bmp|BMP|tiff|TIFF))?/
But it seems to have issues with a URL like this one:
https://cdn.vox-cdn.com/thumbor/C07imD1SHmAnbObkg-nJ92N6sD8=/0x0:4799x3199/920x613/filters:focal(2017x1217:2783x1983):format(webp)/cdn.vox-cdn.com/uploads/chorus_image/image/62871037/seattle.0.jpg
What do you think is missing in my expression?
I want to accept any valid image URL.
Your expression works for me in the validator I tested with (regex101.com), however, it matches as 3 separate capture groups. To capture it all as a single match, just wrap the whole statement in a set of parentheses.
Note: to be clear, there are simpler ways to do this, but to answer the specific question that the OP asked, this will make their statement match their supplied link.
((http|ftp|https):\/\/([\w+?\.\w+])+([a-zA-Z0-9\\~\\!\\#\\#\\$\\%\\^\\&\\*\\(\\)_\-\\=\\+\\\\\/\\?\\.\\:\\;\\'\\,]*\.(?:jpg|JPG|jpeg|JPEG|gif|GIF|png|PNG|bmp|BMP|tiff|TIFF))?)
EDIT: After assisting the OP in narrowing down the scope of their issue, a more appropriate regex statement would be something like this:
/^(((http(s?))|((s?)ftp)):)([\w \D~!##$%^&*\\_/-=+/?.:;',]){1,}\.(jpg|gif|png)$/i
Let's break this down:
First, it says the string must start with either 'http' with an optional 's' or, if that isn't there, 'ftp' with an optional 's' prefix to account for secure forms of FTP. This must be followed by a colon. The next set accepts just about any commonly used character or symbol in a URL path. Finally, it ensures that the string ends with an actual image extension. Wrapping the expression in /{expression}/i makes it case insensitive, so it will match upper or lower case in any combination.
As a further note, you may also want to account for other image formats such as .jpeg, .tif, etc.
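As one of the simpler ways mentioned above, here is a sketch that swaps the single large regex for the built-in URL parser plus a short extension check. The name isImageUrl and the list of extensions are assumptions for illustration; it only looks at the scheme and the end of the path, and does not verify that the resource really is an image.

```javascript
// Hypothetical helper; the accepted extensions are an assumption.
const IMAGE_EXTENSION = /\.(?:jpe?g|gif|png|bmp|tiff?)$/i;
const ALLOWED_PROTOCOLS = ["http:", "https:", "ftp:"];

function isImageUrl(candidate) {
  let url;
  try {
    // The URL constructor throws a TypeError for strings it cannot parse.
    url = new URL(candidate);
  } catch {
    return false;
  }
  // Check the scheme, then test only the path, so a query string
  // or fragment does not interfere with the extension check.
  return ALLOWED_PROTOCOLS.includes(url.protocol) && IMAGE_EXTENSION.test(url.pathname);
}

const sample =
  "https://cdn.vox-cdn.com/thumbor/C07imD1SHmAnbObkg-nJ92N6sD8=/0x0:4799x3199/920x613/" +
  "filters:focal(2017x1217:2783x1983):format(webp)/cdn.vox-cdn.com/uploads/chorus_image/" +
  "image/62871037/seattle.0.jpg";
console.log(isImageUrl(sample));
```

Letting the parser handle the host and path avoids having to enumerate every character a URL may contain, which is where the original expression ran into trouble.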

Regular expression to find if all letters in a set or range exist anywhere in string

I am writing a regular expression in JavaScript to find if each of the letters in 'abt' is available anywhere in the string.
console.log(/(?=.*a)(?=.*t)(?=.*b)/i.test("at good and bad"));
If I have more characters to identify, the regular expression gets long.
Can anyone suggest how I can optimize this?
If I have to match a specific range like a-z, what should I do?
If I understand what you're trying to do, I think the following should work:
(.?[a]{1,}.?)|(.?[b]{1,}.?)|(.?[t]{1,}.?)
This will match any string that contains any of those letters at any position, in any word within the string. If you have to do ranges, say a-c, change the [a] to [a-c] and that will still work. This is case sensitive, so if you want a check that is not case sensitive it would be [a-cA-C] or [aA].
Tested working at regex101.com
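Since the question asks whether every letter of the set is present, here is a sketch of two ways to avoid writing the long expression by hand. The function names are hypothetical, and both assume the set consists of plain letters, so nothing is escaped. The first skips regular expressions entirely; the second generates the lookahead pattern from the question.

```javascript
// Hypothetical helpers; both assume `letters` contains only plain letters.

// Plain string check: every letter must occur somewhere in the text.
function containsAllLetters(text, letters) {
  const haystack = text.toLowerCase();
  return [...letters.toLowerCase()].every((ch) => haystack.includes(ch));
}

// Build the question's pattern, one (?=.*x) lookahead per letter.
function buildAllLettersRegex(letters) {
  const lookaheads = [...letters].map((ch) => `(?=.*${ch})`).join("");
  return new RegExp(lookaheads, "i");
}

// Expand a range such as a-z into its letters for use with either helper.
function lettersInRange(first, last) {
  let result = "";
  for (let code = first.charCodeAt(0); code <= last.charCodeAt(0); code++) {
    result += String.fromCharCode(code);
  }
  return result;
}

console.log(containsAllLetters("at good and bad", "abt"));
console.log(buildAllLettersRegex("abt").test("at good and bad"));
console.log(containsAllLetters("at good and bad", lettersInRange("a", "d")));
```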

Regular expression anything but letters (javascript)

I want to validate a form field by checking whether the input contains any letters. All other characters and numbers should be allowed. I'm quite bad at regular expressions, and I can't find a correct solution anywhere.
I've tried this:
/[^A-Za-z]/g
but this only returns false if the string consists of only letters (i.e. 432ad32d should return false as well).
Could anyone tell me how to do this?
Using a whitelist of allowed characters is the best approach in your case:
/^[-+\d(), ]+$/
Unicode has many things it calls a letter, so it is better not to mess with that in the first place. Older JavaScript regex engines aren't well suited to handling these (they lack things like \p{L} unless you use an external library; ES2018 added \p{L} behind the u flag).
Also, by using the whitelist approach you can be sure about the kinds of inputs which will be accepted by your form. You can't predict the kind of mess users could input otherwise. Think about things like this:
TO͇̹̺ͅƝ̴ȳ̳ TH̘Ë͖́̉ ͠P̯͍̭O̚​N̐Y̡ H̸̡̪̯ͨ͊̽̅̾̎Ȩ̬̩̾͛ͪ̈́̀́͘ ̶̧̨̱̹̭̯ͧ̾ͬC̷̙̲̝͖ͭ̏ͥͮ͟Oͮ͏̮̪̝͍M̲̖͊̒ͪͩͬ̚̚͜Ȇ̴̟̟͙̞ͩ͌͝S̨̥̫͎̭ͯ̿̔̀ͅ
:-)
/[^A-Za-z]/
This regex matches a single non-letter, which isn't very useful. Yura Yakym's answer matches the beginning of the string, any number of non-letters, and then the end of the string, which is useful when it matches: it means your string contains only those things.
Another useful regex is:
/[A-Za-z]/
This matches a single letter, which is useful when it doesn't match: it means your string does not contain any letters at all.
For your question in general, "how can I ensure a string lacks letters?", I would use that second regex: I would try to match a letter, and hopefully fail to do so. For input validation though, I'd prefer a regex that describes all possible valid inputs. If /^[^A-Za-z]*$/ does so, then use that. If you have additional requirements, add those to it. Don't have multiple "no letters? OK. no non-dash special characters? OK." ... well, unless you want to provide error messages precisely about such things.
Try this regular expression: ^[^A-Za-z]*$
You need to include anchors:
/^[^A-Za-z]+$/g
This ensures the whole string, from start to end, consists of one or more numbers/special characters.
You forgot about the start and end markers. Also, you don't need the g flag.
/^[^A-Za-z]*$/
Anyway, it's odd that I can still enter Cyrillic letters.
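The approaches above can be compared side by side in a short sketch. The function names are hypothetical. The last helper addresses the Cyrillic observation using the \p{L} property escape, which assumes an engine with ES2018 support for the u flag.

```javascript
// Hypothetical helpers for comparing the answers above.

// "Try to match a letter and hope to fail": ASCII letters only.
function hasNoAsciiLetters(input) {
  return !/[A-Za-z]/.test(input);
}

// Anchored form: the whole string must consist of non-letters (may be empty).
function isOnlyNonAsciiLetters(input) {
  return /^[^A-Za-z]*$/.test(input);
}

// Unicode-aware variant; assumes ES2018 property escapes are available.
function hasNoLetters(input) {
  return !/\p{L}/u.test(input);
}

console.log(hasNoAsciiLetters("432ad32d")); // contains a, d
console.log(hasNoAsciiLetters("привет"));   // Cyrillic slips through the ASCII check
console.log(hasNoLetters("привет"));        // caught by \p{L}
```

For a form field, the whitelist pattern from the first answer is still the stricter choice, since it states exactly which characters are accepted.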

Solving regular expression recursive strings

The Problem
I could match this string
(xx)
using this regex
\([^()]*\)
But it wouldn't match
(x(xx)x)
So, this regex would
\([^()]*\([^()]*\)[^()]*\)
However, this would fail to match
(x(x(xx)x)x)
But again, this new regex would
\([^()]*\([^()]*\([^()]*\)[^()]*\)[^()]*\)
This is where you can notice the replication: the part of the second regex after the first \( and before the last \) is copied and replaces the center-most [^()]*. Of course, this last regex wouldn't match
(x(x(x(xx)x)x)x)
But you could always replace the center-most [^()]* with [^()]*\([^()]*\)[^()]*, like we did for the last regex, and it will capture more (xx) groups. The more you add to the regex the more it can handle, but it will always be limited by how much you add.
So, how do you get around this limitation and capture a group of parentheses (or any two characters, for that matter) that can contain extra groups within it?
Falsely Assumed Solutions
I know you might think to just use
\(.*\)
But this will match all of
(xx)xx)
when it should only match the sub-string (xx).
Even this
\([^)]*\)
will not match pairs of parentheses that have pairs nested like
(xx(xx)xx)
From this, it'll only match up to (xx(xx).
Is it possible?
So is it possible to write a regex that can match groups of parentheses? Or is this something that must be handled by a routine?
Edit
The solution must work in the JavaScript implementation of Regular Expressions
If you want to match only if the round brackets are balanced, you cannot do it with a regex alone.
A better way would be to:
1>match the string using \(.*\)
2>count the number of ( and ) characters and check whether they are equal; if they are, you have the match
3>if they are not equal, use \([^()]*\) to match the required string
Formally speaking, this isn't possible using regular expressions! Regular expressions define regular languages, and regular languages can't have balanced parentheses.
However, it turns out that this is the sort of thing people need to do all the time, so lots of regex engines have been extended beyond formal regular expressions, and some of them can match balanced brackets. This article shows the approach for .NET: http://weblogs.asp.net/whaggard/archive/2005/02/20/377025.aspx . Note that it relies on .NET balancing groups; the standard JavaScript regex engine has no balancing groups or recursive patterns, so the technique does not carry over to JavaScript directly.
Personally though, I think it's best to solve a complex problem like this with your own function rather than leveraging the extended features of a Regex engine.
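A minimal sketch of such a function, assuming single-character delimiters and no escaping or quoting rules. The name findBalancedGroups is hypothetical. It walks the string once and tracks nesting depth, returning every outermost balanced group.

```javascript
// Hypothetical helper: collect outermost balanced groups with a depth counter.
function findBalancedGroups(text, open = "(", close = ")") {
  const groups = [];
  let depth = 0;   // current nesting level
  let start = -1;  // index where the current outermost group began

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (ch === open) {
      if (depth === 0) start = i; // a new outermost group begins here
      depth++;
    } else if (ch === close && depth > 0) {
      depth--;
      // Back at level zero: the outermost group is complete.
      if (depth === 0) groups.push(text.slice(start, i + 1));
    }
    // A closing delimiter seen at depth 0 has no partner and is ignored.
  }
  // An outermost opening delimiter that is never closed yields no group.
  return groups;
}

console.log(findBalancedGroups("(x(x(x(xx)x)x)x)"));
console.log(findBalancedGroups("(xx)xx)"));
```

Unlike the growing regexes in the question, the depth counter handles any nesting level, and the default parameters can be swapped for any other pair of characters.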

Match altered version of first match with only one expression?

I'm writing a brush for Alex Gorbatchev's Syntax Highlighter to get highlighting for Smalltalk code. Now, consider the following Smalltalk code:
aCollection do: [ :each | each shout ]
I want to find the block argument ":each" and then match "each" every time it occurs afterwards (for simplicity, let's say every occurrence and not just inside the brackets).
Note that the argument can have any name, e.g. ":myArg".
My attempt to match ":each":
\:([\d\w]+)
This seems to work. The problem is for me to match the occurrences of "each". I thought something like this could work:
\:([\d\w]+)|\1
But the right hand side of the alternation seems to be treated as an independent expression, so backreferencing doesn't work.
Is it even possible to accomplish what I want in a single expression? Or would I have to use the backreference within a second expression (via another function call)?
You could do it in languages that support variable-length lookbehind (AFAIK only the .NET framework languages do, Perl 6 might). There you could highlight a word if it matches (?<=:(\w+)\b.*)\1. But JavaScript didn't support lookbehind at all when this was written (it was added later, in ES2018).
But anyway this regex would be very inefficient (I just checked a simple example in RegexBuddy, and the regex engine needs over 60 steps for nearly every character in the document to decide between match and non-match), so this is not a good idea if you want to use it for code highlighting.
I'd recommend you use the two-step approach you mentioned: First match :(\w+)\b (word boundary inserted for safety, \d is implied in \w), then do a literal search for match result \1.
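The two-step approach might look like the following sketch. The function name findBlockArgOccurrences is hypothetical, and it only handles the first block argument in the given code. Because \w+ can only capture word characters, the captured name can be placed into the second expression without escaping.

```javascript
// Hypothetical helper: returns the start index of every later occurrence
// of the first block argument found in `code`.
function findBlockArgOccurrences(code) {
  // Step 1: find the declaration, e.g. ":each", and capture the name.
  const declaration = /:(\w+)\b/.exec(code);
  if (!declaration) return [];
  const name = declaration[1];

  // Step 2: search for the captured name as a whole word.
  const occurrence = new RegExp(`\\b${name}\\b`, "g");
  // Start searching right after the declaration itself.
  occurrence.lastIndex = declaration.index + declaration[0].length;

  const positions = [];
  let match;
  while ((match = occurrence.exec(code)) !== null) {
    positions.push(match.index);
  }
  return positions;
}

const smalltalk = "aCollection do: [ :each | each shout ]";
console.log(findBlockArgOccurrences(smalltalk));
```

In a highlighter brush the positions would then be turned into highlighted ranges; how that hooks into Syntax Highlighter is not covered here.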
I believe the only thing stored by the Regex engine between matches is the position of the last match. Therefore, when looking for the next match, you cannot use a backreference to the match before.
So, no, I do not think that this is possible.
