Java Regex replace function not working as intended - javascript

I need some help with a JS Regex.
Here's the string I'm passing, I want to delete everything before 'Hanyuu-sama' with JS Replace.
Hanyuu","dj":{"id":18,"djname":"Hanyuu-sama
The first and second "Hanyuu" can change, the id number can change. This has already been cropped quite a bit with regular expressions.
Now I've tried a few and surprisingly it's failing when I do simple and complex regexes:
I've tried:
.*\"
And it does nothing, I've tried disgusting stuff in my desperation:
.*\","dj\":{\"id":.*,\"djname\":\"
And nada.
Here's a JS Fiddle and here's a http://regex101.com/r/tE2uY0/1 Regex JS matching platform.
Does anyone know why this isn't working?
I know this is likely bad practice, I'm just trying to learn Regexes.
Bonus points if anyone can refer me to a good source to learn Regular expressions. I'd love a solution but I'd like to learn how to do this myself in the future and why this one failed even more.

Your method call should look like this:
source = source.replace(/.*"/, "");
Regular expressions in JavaScript are written as literals between /.../, not inside quotes as "/.../" like they are in many other languages. When replace is given a string, it searches for that literal text rather than treating it as a pattern.
If your string is always structured like that and contains no further characters, your regex should do the trick. That's because the * quantifier is greedy by default, so .* runs as far as it can and the match ends at the last " in the string.
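As a rough illustration of the two points above, here is a minimal sketch using the sample string from the question. The variable names are made up for the example, and the lazy variant is included only for contrast.

```javascript
// Sample string from the question.
const source = 'Hanyuu","dj":{"id":18,"djname":"Hanyuu-sama';

// A string pattern is searched for literally, so nothing is replaced here.
const withStringPattern = source.replace('.*"', "");

// A regex literal with greedy .* consumes everything up to the LAST quote.
const greedy = source.replace(/.*"/, "");

// A lazy .*? stops at the FIRST quote instead.
const lazy = source.replace(/.*?"/, "");

console.log(withStringPattern === source); // true
console.log(greedy);                       // Hanyuu-sama
console.log(lazy);                         // ,"dj":{"id":18,"djname":"Hanyuu-sama
```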

Related

Confusion regarding RegExp matches, HTML tags, and newlines

I am attempting to create a Markdown-to-HTML parser. I am trying to use regular expressions to match an input string that may or may not contain HTML tags and whitespace/newlines. I have encountered an interesting case that I do not understand at all.
My regular expression is regex = /\*([\w\s]+|<.+>)\*/g.
The following works:
'*words\nmorewords*'.match(regex)
'*<b>words</b>*'.match(regex)
However, this does not work:
'*<b>words\nmore words</b>*'.match(regex)
If anyone can help me understand why this is so, I would appreciate it.
Edit: I see my faulty logic, thanks to Ry. The expression regex = /\*(<[a-z]+>)?[\w\s]+(<\/[a-z]+>)?\*/g solves this case.
This should work for your purpose:
\*(<.+>)?([\w\s]+)(<.+>)?\*
The HTML tags can exist or not (<.+>)?. The \n is matched by the \s (whitespace).
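A small sketch comparing the pattern from the question with the one suggested here, run against the failing input. The variable names are only for illustration.

```javascript
// Pattern from the question: EITHER word/whitespace characters OR one tag-like run.
const original = /\*([\w\s]+|<.+>)\*/g;
// Pattern suggested above: optional tag, then text, then optional tag.
const suggested = /\*(<.+>)?([\w\s]+)(<.+>)?\*/g;

const input = '*<b>words\nmore words</b>*';

// "." does not match a newline, so <.+> cannot span the "\n",
// and [\w\s]+ cannot match "<" or "/": neither alternative fits.
console.log(input.match(original));  // null

// Here the tags and the text are matched by separate groups.
console.log(input.match(suggested)); // one match covering the whole input
```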
I'm also going to link the canonical don't parse HTML with regex answer, because regex is not suitable for (or even capable of) parsing HTML beyond fairly restricted subsets. Have a read, it's informative (and funny)!
Recall the Chomsky hierarchy. Regular expressions can parse regular languages. HTML is not a regular language (arbitrarily nested tags need at least the next level up, context-free).
There are extensions to some regular expression engines that give them recursive capability. You can probably parse HTML with these, but there are better ways, such as using a proper HTML parser, for example DOMParser.

Regexp for finding all regexps in project

I need to optimize all regexps in a JavaScript project. I found all the ones created with new RegExp with a simple search. The problem is the ones created as literals: /asd/.
I am using PhpStorm, so the regexp engine is Java. That means we have lookbehind. So I came up with this:
(?<=[\s=(,\[\?:;|)])\/[^*\n/][^\n/]*[^*]\/
This translates to: give me everything that looks like /.../ and is preceded by one of the following characters: whitespace, =, (, a comma, [, ?, :, ;, | or ).
Can a regexp be preceded by anything else?
Do you have a better idea?
Searching for the methods of the String and RegExp classes (exec, replace...) is not acceptable, because finding the declaration in some projects is very hard and takes a lot of time. Plus, the same regexp can be used in multiple places.
My regexp was a bit off. I used this eventually:
(?<=[\s=(,\[\?:;|)])\/[^\n/].*?\/
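For a quick check outside the IDE, the final pattern can be ported to JavaScript. This is only a sketch: it assumes an engine with lookbehind support (ES2018 or later), and the sample source text is hypothetical.

```javascript
// JavaScript port of the final search pattern; lookbehind needs an ES2018+ engine.
const regexLiteral = /(?<=[\s=(,\[?:;|)])\/[^\n\/].*?\//g;

// Hypothetical source text to scan.
const sample = [
  'const a = /ab+c/;',
  'const b = str.replace(/\\d+/g, "");',
  'const c = total / count;',
].join('\n');

// The division on the last line is skipped because no second "/" follows it
// on the same line. Flags such as the trailing "g" are not part of the match.
const found = sample.match(regexLiteral);
console.log(found);
```

A division followed by another slash on the same line would still match, so the hits need a manual look.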

Regex - PHP to Javascript

I have the following regex:
<div\s*class="selectionDescription">(.*?)<\/div>
Which works perfectly with PHP. I am aware that JavaScript does not support the s flag.
I have tried using the g flag; however, my pattern is not matched.
I am looking to match everything inside the div in the following string:
<div class="selectionDescription">Text to match</div>
I receive the following error in javascript:
Uncaught SyntaxError: Invalid flags supplied to RegExp constructor 's'(…)
Your pattern seems to work.
s was not a valid flag in JavaScript at the time (the dotAll flag s was only added in ES2018), so if you are trying something like new RegExp('<div\s*class="selectionDescription">(.*?)<\/div>', 's') in an older engine then yes, you would get that error.
You do not need to add any flags, except perhaps the g flag to capture this div many times. (Check it out)
Maybe check out a quick primer on Javascript's regular expressions?
If you are spanning multiple lines, and you mean the single line mode that s provides, you can emulate that with [\S\s] or some other similar "all inclusive, all exclusive" style: [\d\D], [\W\w], etc.
That will allow it to span multiple lines and still match:
<div\s*class="selectionDescription">([\S\s]*?)<\/div>
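A minimal sketch of the difference, assuming a hypothetical input whose div content spans two lines:

```javascript
// Hypothetical input where the div content contains a newline.
const html = '<div class="selectionDescription">Text\nto match</div>';

// "." does not match line terminators, so this fails across the newline.
const dotPattern = /<div\s*class="selectionDescription">(.*?)<\/div>/;
// [\S\s] matches any character, including "\n".
const anyCharPattern = /<div\s*class="selectionDescription">([\S\s]*?)<\/div>/;

const dotResult = html.match(dotPattern);
const anyCharResult = html.match(anyCharPattern);

console.log(dotResult);        // null
console.log(anyCharResult[1]); // the content, newline included
```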
You need to be wary of using lazy *? quantifiers, however. Take a look at https://regex101.com/r/xD2jV8/1 where the number of steps is 220.
If the content between <div> and </div> tags is very large, this becomes very computationally expensive, very fast.
While slightly less readable,
<div\s*class="selectionDescription">((?:[^<]+|<(?!\/div>))*)<\/div>
would do the same but within only 69 steps.
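The step counts come from regex101 and are not reproduced here. The sketch below only checks that both patterns capture the same text on a hypothetical input.

```javascript
// Lazy version and the alternative version, as given above.
const lazyPattern = /<div\s*class="selectionDescription">([\S\s]*?)<\/div>/;
const unrolledPattern = /<div\s*class="selectionDescription">((?:[^<]+|<(?!\/div>))*)<\/div>/;

// Hypothetical page fragment with nested markup and a newline inside the div.
const page = '<p>intro</p><div class="selectionDescription">Text <b>to</b>\nmatch</div><p>outro</p>';

const lazyCapture = lazyPattern.exec(page)[1];
const unrolledCapture = unrolledPattern.exec(page)[1];

console.log(lazyCapture === unrolledCapture); // true
console.log(unrolledCapture);
```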
And at that point, https://regex101.com/r/xD2jV8/3 slightly optimizes it further, but regex really isn't the best way to handle HTML. jQuery could perform this quickly and much more safely: $('div.selectionDescription').html()
Of course, you may not have access to jQuery at this point, but regex is usually not the best thing to use for parsing HTML.

Comma Operator to Semicolons

I have a chunk of javascript that has many comma operators, for example
"i".toString(), "e".toString(), "a".toString();
Is there a way with JavaScript to convert these to semicolons?
"i".toString(); "e".toString(); "a".toString();
This might seem like a cop-out answer... but I'd suggest against trying it. Doing any kind of string manipulation to change it would be virtually impossible. In addition to function definition argument lists, you'd also need to skip text in string literals or regex literals or function calls or array literals or object literals or variable declarations.... maybe even more. Regex can't handle it, turning on and off as you see keywords can't handle it.
If you want to actually convert these, you really have to actually parse the code and figure out which ones are the comma operator. Moreover, there might be some cases where the comma's presence is relevant:
var a = (10, 20);
is not the same as
var a = 10; 20;
for example.
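To make the difference concrete, here is a small sketch. It uses a parenthesized comma expression so that the declaration parses, and the function names are made up for the example.

```javascript
function withComma() {
  var a = (10, 20); // comma operator: evaluates 10, then 20, and yields 20
  return a;
}

function withSemicolon() {
  var a = 10; 20;   // a is 10; the 20 is a separate expression statement
  return a;
}

console.log(withComma());     // 20
console.log(withSemicolon()); // 10
```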
So I really don't think you should try it. But if you do want to, I'd start by searching for a javascript parser (or writing one, it isn't super hard, but it'd probably take the better part of a day and might still be buggy). I'm pretty sure the more advanced minifiers like Google's include a parser, maybe their source will help.
Then, you parse it to find the actual comma expressions. If the return value is used, leave it alone. If not, go ahead and replace them with expression statements, then regenerate the source code string. You could go ahead and format it based on scope indentation at this time too. It might end up looking pretty good. It'll just be a fair chunk of work.
Here's a parser library written in JS: http://esprima.org/ (thanks to @torazaburo for this comment)

Getting parts of a URL in JavaScript

I have to match URLs in a text, linkify them, and then display only the host--domain name or IP address--to the user. How can I proceed with JavaScript?
Thanks.
PS: please don't tell me about this; those regular expressions are so buggy they can't match http://google.com
If you don't want to use regular expressions, then you'll need to use things like indexOf instead. For instance, search for "://" in the text of every element. If you find it and the bit in front of it looks like a protocol (or "scheme"), grab it and the following characters that are valid URI characters (RFC 2396). If the result ends in a dot or question mark, remove that character (it probably ends a sentence). There's not really a lot more to say.
Update: Ah, I see from your edit that you don't have a problem with regular expressions, just the ones in the answers to that question. Fair enough.
This may well be one of those places where trying to do it all with a regular expression is more work than it should be, but using regular expressions as part of the solution is helpful. For instance,
/[a-zA-Z][a-zA-Z0-9+\-.]*:\/\//
...may well be a helpful way to find the beginning of a URL, since the scheme portion must start with an alpha and then can have zero or more alpha, digit, +, -, or . prior to the : (section 3.1).
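Putting the pieces together, one possible sketch: find candidates with the scheme pattern, trim a trailing dot or question mark, and let the built-in URL class extract the host. The character class after "://" and the function name extractHostnames are assumptions made for this example and do not come from the RFC.

```javascript
// Scheme per RFC 2396 section 3.1, then "://", then a run of characters that
// are unlikely to appear inside running text (this tail is an assumption).
const urlCandidate = /[a-zA-Z][a-zA-Z0-9+\-.]*:\/\/[^\s<>"]+/g;

function extractHostnames(text) {
  const hostnames = [];
  for (const candidate of text.match(urlCandidate) || []) {
    // A trailing dot or question mark most likely ends the sentence.
    const trimmed = candidate.replace(/[.?]+$/, "");
    try {
      hostnames.push(new URL(trimmed).hostname);
    } catch (e) {
      // Not parseable as a URL after all; skip it.
    }
  }
  return hostnames;
}

console.log(extractHostnames(
  "See http://google.com. Also https://192.168.0.1:8080/path?x=1 works."
));
// [ 'google.com', '192.168.0.1' ]
```

The hostname property drops the port, which fits the requirement of showing only the domain name or IP address.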
