Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I am currently using the following to convert [url=][/url] to an HTML link:
s = message.replace(/\[url=([^\]]+)\]\s*(.*?)\s*\[\/url\]/gi, "<a href='$1'>$2</a>")
That works fine.
I then added on another replace function using a regex to replace www with http://www like so:
s = message.replace(/\[url=([^\]]+)\]\s*(.*?)\s*\[\/url\]/gi, "<a href='$1'>$2</a>")
.replace(/www/g, "http://www");
This is probably not the best/efficient method and also does not support https:// which is not a priority at the moment but is something I would like to include at some point. Could someone please advise me on what I could do to improve the regex?
Couple of things:
First, there are some problems with your second pattern. It is not case insensitive (whereas the first pattern is), so it won't catch things like 'WWW'. Maybe that's desirable, maybe not. But it's also global and is not anchored to the beginning of the URL, so it will replace www anywhere in the message, including in the middle of a URL or in the link text. Changing it to something like /href=\'www/ and then changing the replace string to href='http://www should solve these problems.
Secondly, in a case like this, using 2 regular expressions may not be bad. You can fold it into one regex if you want, but that doesn't mean it is any more efficient for the computer or the poor human who happens to be reading it.
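The two-step version with the anchored second pattern might look like the sketch below. The sample message is made up for illustration, and the capture group around www is an addition so that the original casing is kept when the i flag is used.

```javascript
// Hypothetical sample input; `message` is the variable name used in the question.
const message = "Visit [url=www.example.com/www-docs]the docs[/url]";

const s = message
  .replace(/\[url=([^\]]+)\]\s*(.*?)\s*\[\/url\]/gi, "<a href='$1'>$2</a>")
  // Only a www that directly follows href=' is touched; $1 keeps its original casing.
  .replace(/href='(www)/gi, "href='http://$1");

console.log(s);
```

The second www inside the path is left alone because it does not directly follow href='.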
All that said, one way to accomplish this with a single regular expression, is to do something like:
s = message.replace(/\[url=(?:http:\/\/)?([^\]]+)\]\s*(.*?)\s*\[\/url\]/gi, "<a href='http://$1'>$2</a>")
This may accomplish what you want. It does not key off of "www", though, it simply prepends "http://" to any URL that does not already begin with that string. It also doesn't support https, as you mentioned, but I'm not sure how you will support that anyway. If all you are given is a URL with no protocol, how do you determine whether or not to make it https?
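If links that already carry https:// should keep it, one possible variation is to capture the optional protocol and fall back to http:// only when none was given. This is a sketch, not part of the original answer; the function name bbUrlToLink is hypothetical.

```javascript
// Hypothetical helper: keeps an explicit http:// or https:// and
// defaults to http:// when the URL has no protocol.
function bbUrlToLink(message) {
  return message.replace(
    /\[url=(https?:\/\/)?([^\]]+)\]\s*(.*?)\s*\[\/url\]/gi,
    // `protocol` is undefined when the optional group did not match.
    (match, protocol, rest, label) =>
      "<a href='" + (protocol || "http://") + rest + "'>" + label + "</a>"
  );
}

console.log(bbUrlToLink("[url=www.example.com]plain[/url]"));
console.log(bbUrlToLink("[url=https://example.com]secure[/url]"));
```

This still cannot guess https for a URL that was given without any protocol.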
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 2 years ago.
I sometimes run into situations where I would need a best practice to define a long string. I'm talking about something like this:
const text = 'This is indeed a very long string. Some might say that it is really, really long.'
My problem here is that the string is just too wide. I would prefer a solution where the column width is considered. I would usually use one of two solutions:
a.)
const text = 'This is indeed a very long string. ' +
'Some might say that it is really, really long.'
The problem with this one is that it uses an unnecessary concatenation.
b.)
const text = `This is indeed a very long string.
Some might say that it is really, really long.`
And the problem with this one is that the resulting string will actually contain a new line, which might not be wanted in some situations.
I realize that this might be a question for opinionated answers, but I still feel that I'm missing something, or that there is a better solution out there. Please show me if you have one!
You can use the string continuation character \ (single backslash) to do that.
const longString = "This is a really really \
long string that should \
not be split in multiple lines."
console.log(longString)
See the documentation in the Long literal strings section for details.
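One thing to watch with the backslash approach is that any indentation on the continued line ends up inside the string. The sketch below compares it with joining an array of pieces, which is offered here only as a possible alternative and is not from the answer above.

```javascript
// Backslash continuation: the line break is dropped, but the next line
// must start at column 0, otherwise its indentation becomes part of the string.
const continued = "This is indeed a very long string. \
Some might say that it is really, really long.";

// Alternative that tolerates indentation: join the pieces explicitly.
const joined = [
  "This is indeed a very long string.",
  "Some might say that it is really, really long.",
].join(" ");

console.log(continued === joined); // true
```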
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I'm somewhat new to regex. I understand most of the basics but what I'm trying to do is beyond my knowledge, and may not even be possible.
I'm trying to make a regex in JavaScript that can match a series of function calls in the following pattern.
Name.Name(Params).Name(Params)
The names could be any standard java function name. I understand how to do this part. The params, though, can be a different number of parameters (currently only 0-2).
My biggest issue however is that params could potentially take ANY string with either a single or double quotation mark, or variable names. I have added some examples below as I need all of these to work with my regular expression (if Possible).
Examples:
Func.Foo().Bar()
Foo.Bar('foo', bar).Foobar()
Foo.Bar("foo", "bar").bar(')')
Foo.Bar('/"foo/"').bar("foo(bar/")")
My main concern here is that I can't just look for an opening and closing parenthesis, or even two quotation marks.
Is it possible to use a regex so that I can parse the function call and parameters out?
The short answer to the question in the title is yes, you can build a regex that matches any substring. But unfortunately that is not what you want. If you allow arbitrary substrings, your regex will either match many cases you don't want to match or it will become extremely complex (see the email regex for an example).
What you want is a tokenizer! (https://medium.freecodecamp.org/how-to-build-a-math-expression-tokenizer-using-javascript-3638d4e5fbe9)
Edit: for the solutions in the comments: the ast parser is for java, the author wants to use javascript.
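A tokenizer for the call chains in the question could be sketched roughly as follows. The function name tokenize and the token shape are hypothetical, and the sketch assumes that quotes inside strings are escaped with a backslash, which the question does not spell out.

```javascript
// Minimal tokenizer sketch for chains like Foo.Bar('foo', bar).Foobar().
// Assumption: strings use backslash escapes (\" and \').
function tokenize(input) {
  const tokens = [];
  let i = 0;
  while (i < input.length) {
    const ch = input[i];
    if (/\s/.test(ch)) { i++; continue; } // skip whitespace
    if (ch === "." || ch === "(" || ch === ")" || ch === ",") {
      tokens.push({ type: "punct", value: ch });
      i++;
      continue;
    }
    if (ch === '"' || ch === "'") {
      // Read until the matching, unescaped quote.
      let j = i + 1;
      let value = "";
      while (j < input.length && input[j] !== ch) {
        if (input[j] === "\\" && j + 1 < input.length) j++; // drop the backslash, keep the escaped char
        value += input[j];
        j++;
      }
      if (j >= input.length) throw new Error("Unterminated string at " + i);
      tokens.push({ type: "string", value: value });
      i = j + 1;
      continue;
    }
    const m = /^[A-Za-z_$][\w$]*/.exec(input.slice(i));
    if (m) {
      tokens.push({ type: "name", value: m[0] });
      i += m[0].length;
      continue;
    }
    throw new Error("Unexpected character " + ch + " at " + i);
  }
  return tokens;
}

console.log(tokenize("Foo.Bar(\"foo\", \"bar\").bar(')')"));
```

Because the string branch consumes everything up to the closing quote, a parenthesis inside a parameter such as ')' never reaches the punctuation branch. A small parser can then walk the token list to group names and parameters.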
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
I found this code in an example script:
'regex':'^[a-zA-Z]{2} *\\d{6}|[a-zA-Z]{2} *\\d{3} *[a-zA-Z]{2}|[a-zA-Z]{2}[a-zA-Z]{2}[0-9]{6}|[a-zA-Z]{2}[0-9]{1}[a-zA-Z]{5}|[a-zA-Z]{2}[a-zA-Z]{1}[0-9]{5}|[a-zA-Z]{2}[0-9]{1}[a-zA-Z]{1}[0-9]{4}|[a-zA-Z]{2}[0-9]{5}[a-zA-Z]{1}|[a-zA-Z]{2}[0-9]{2}[a-zA-Z]{1}[0-9]{3}$',
but I really do not know what it means...
Regex is short for regular expression, a pattern used, for example, to dissect parts of text from a larger collection of text.
Say, for example, that you want to find all names in a newspaper. Instead of reading the entire thing looking for each name, you can write a regex that finds every word starting with a capital letter that does not come right after punctuation.
In your example, the ^ at the start of the regex means it looks for something that starts with the following:
- A small or capital letter between a and z
- Exactly two of these letters
Read a couple of examples and you'll get the hang of it.
http://www.dreambank.net/regex.html
It is simply a model of how text is built up.
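To make this concrete, here is a small sketch that tests only the first alternative of the posted pattern, with a closing $ added so it can be tried on its own. It also shows a detail worth knowing when reading the full pattern: without a group around the alternatives, ^ applies only to the first alternative and $ only to the last. The sample strings are made up.

```javascript
// First alternative of the posted pattern, anchored on both ends:
// two letters, optional spaces, then six digits.
const firstAlternative = /^[a-zA-Z]{2} *\d{6}$/;
console.log(firstAlternative.test("AB123456"));  // true
console.log(firstAlternative.test("AB 123456")); // true
console.log(firstAlternative.test("A1234567"));  // false

// Precedence: | binds more loosely than ^ and $.
const loose = /^ab|cd$/;      // "starts with ab" OR "ends with cd"
const strict = /^(?:ab|cd)$/; // exactly "ab" or exactly "cd"
console.log(loose.test("abxx"));  // true
console.log(strict.test("abxx")); // false
```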
If you just want the explanation of the posted regex, you should try regex101. It breaks down the posted regex into capture groups and gives a pretty detailed explanation and matches for a given input and regex.
Like many have already suggested, it would be better if you start by reading about how regex works. I'm sure you'll find plenty of related questions on Stack Overflow.
However, I have created a simple demo of your posted regex on regex101, like I said. I'm going to refrain from posting the entire explanation here. It would be a good exercise to read the explanation and try to understand it on your own.
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
I've taken over the development of a project and have noticed a weird snippet in the JavaScript where the developer wrote the following:
/* ... code */
var el = document.getElementById('foo');
el.href = "http://" + "w" + "w" + "w" + "." + "d" + "o" + "main.com/foobar/";
/* ... code */
I have some hunches as to what the purpose is, but will refrain from expressing it so as to not misguide, probably better, answers ...
What is the purpose of concatenating the domain?
In terms of JavaScript itself, this has practically no effect - the result is the same.
But the reason may be different than to accomplish some task in JavaScript. I guess there are two possibilities that are most likely to be the case here:
To mislead other programmers, so that the domain name is not easily found by a simple text search. Similar (but far more complex) techniques are used by worms to insert code into a website without showing what it contains unless you put a lot more effort into analysing it.
To try to mislead crawlers, on the assumption that they do not execute the JavaScript and so never see the resulting URL. This may be the case if, for example, the programmer feared that the code would be indexed, so that anyone searching for this domain name in a search engine could find out it was mentioned in the code of the site you are describing.
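Both points can be checked with a short sketch: the concatenation evaluates to the ordinary URL, while the source text of the code never contains the domain as one literal. The function name buildHref is hypothetical.

```javascript
// Hypothetical wrapper around the concatenation from the question.
function buildHref() {
  return "http://" + "w" + "w" + "w" + "." + "d" + "o" + "main.com/foobar/";
}

// The runtime value is the plain URL.
console.log(buildHref() === "http://www.domain.com/foobar/"); // true

// The function's source text does not contain the domain in one piece,
// so a plain text search of the code would not find it.
console.log(buildHref.toString().includes("www.domain.com")); // false
```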
Well normally you would be concatenating this because you would use variables within it...
el.href = "http://www." + domain + "." + ext + "/" + additionalUrl;
Otherwise separating letter by letter like that serves no purpose. I'm assuming the previous programmer was just bored :)
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
I would like to create a similar tool to Instapaper or Readability and I wonder what is the best way to find and get text from a web page. Do you have any ideas?
The question is too broad to give a concrete answer to, but you can separate this question into three concerns:
A way to grab web resources. libcurl for example, or just about anything able to talk HTTP.
A DOM parser. Python has xml.dom.minidom, for example.
An algorithm for traversing the DOM tree and extracting text. Be it scanning for elements with class=article, or <div>s with more than 1024 characters etc., is entirely up to you. You will need experimentation to get this right.
I suggest asking separate questions for each of these concerns. After doing research on each, of course. :)
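As a rough illustration of the third concern only, the sketch below walks an already parsed tree and returns the innermost <div> elements whose text exceeds a length threshold. The node shape and the function names textOf and findArticleDivs are hypothetical; a real DOM parser would supply its own node types, and the heuristic itself would need the experimentation mentioned above.

```javascript
// Hypothetical node shape:
//   element:   { tag: "div", children: [...] }
//   text node: { text: "..." }
function textOf(node) {
  if (typeof node.text === "string") return node.text;
  return (node.children || []).map(textOf).join("");
}

// Collect the innermost <div>s holding more than minLength characters.
// A <div> is skipped when one of its descendants already qualified.
function findArticleDivs(node, minLength, found = []) {
  const before = found.length;
  (node.children || []).forEach((child) => findArticleDivs(child, minLength, found));
  const noneBelow = found.length === before;
  if (node.tag === "div" && noneBelow && textOf(node).length > minLength) {
    found.push(node);
  }
  return found;
}

// Made-up page: a short navigation block and a longer article block.
const page = {
  tag: "body",
  children: [{
    tag: "div",
    children: [
      { tag: "div", children: [{ text: "Home" }] },
      { tag: "div", children: [{ tag: "p", children: [{ text: "x".repeat(30) }] }] },
    ],
  }],
};

console.log(findArticleDivs(page, 20).map(textOf));
```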
Here is an idea to get you started in Ruby. I just tested the code below and it works fine for me. Have a look; it might help you.
require 'open-uri'
require 'cgi'
require 'nokogiri'
url = 'http://www.stackoverflow.com'
# URI.open comes from open-uri; on Ruby 3 and later, plain open(url) no longer fetches URLs
raw_contents = URI.open(url).read
# strip the fetched web page of all HTML tags and encoded chars
text = Nokogiri::HTML(CGI.unescapeHTML(raw_contents)).content
# stack.txt now contains a stripped, pure text file which you can manipulate further
File.write('c:\ruby193\bin\web-content\stack.txt', text)
# double quotes so that \n is printed as an actual newline
puts "Here is the stripped text of your webpage\n" + text