How to extract a string from a larger string?

Using jQuery how would I find/extract the "Spartan" string if the outputted HTML page had the following..
<a href="/mytlnet/logout.php?t=e22df53bf4b5fc0a087ce48897e65ec0">
<b>Logout</b>
</a> : Spartan<br>

Regular Expressions. Or by splitting the string in a more tedious fashion.
Since I'm not a big regex-junkie, I would likely get the text-equivalent using .text(), then split the result on ":", and grab the second index (which would be the 'Spartan' text).
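The split-on-colon idea can be sketched as below. This is only an illustration: `secondPartAfterColon` is a made-up helper name, and the sample string stands in for whatever `.text()` would return for the element that contains the link and the name.

```javascript
// Hypothetical helper: split the text on ":" and return the second piece.
function secondPartAfterColon(text) {
  var parts = text.split(":");
  if (parts.length < 2) {
    return ""; // no colon found
  }
  // Remove the white space left around the name.
  return parts[1].replace(/^\s+|\s+$/g, "");
}

// Stand-in for the result of $(container).text() on the markup in the question.
var sampleText = "\nLogout\n : Spartan";
console.log(secondPartAfterColon(sampleText));
```

This only holds up while the name itself contains no colon; if it might, taking everything after the first colon would be a safer variation.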

If the pattern is going to be consistent, you can use a regex;
or, if the markup is going to stay the same, you can try the jQuery HTML Parser.

As well as using regular expressions, I've also abusively used these functions to do the things it seems you want to do (strip html and such from a text string):
//removes all HTML tags
function striptags(stringToStrip) {
return stringToStrip.replace(/(<([^>]+)>)/ig,"");
}
//standard trim function for JavaScript--removes leading and trailing white space
function trim(stringToTrim) {
return stringToTrim.replace(/^\s+|\s+$/g,"");
}
Incredible regular expressions that have saved me a lot of time.
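Applied to the markup from the question, the two helpers might be combined roughly like this. `textAfterColon` is a hypothetical name, and the approach assumes the wanted text always follows the last colon in the fragment.

```javascript
// The two helpers from the answer above.
function striptags(stringToStrip) {
  return stringToStrip.replace(/(<([^>]+)>)/ig, "");
}
function trim(stringToTrim) {
  return stringToTrim.replace(/^\s+|\s+$/g, "");
}

// Hypothetical helper: strip the tags, then keep what follows the last ":".
function textAfterColon(html) {
  var text = striptags(html);
  var colon = text.lastIndexOf(":");
  return colon === -1 ? "" : trim(text.slice(colon + 1));
}

var html = '<a href="/mytlnet/logout.php?t=e22df53bf4b5fc0a087ce48897e65ec0">\n' +
           '<b>Logout</b>\n' +
           '</a> : Spartan<br>';
console.log(textAfterColon(html));
```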

Related

Getting functions content with regex

I'm trying to get the functions from an input string by using regular expressions.
So far, I managed to get the JavaScript kind of function declarations with the following regex:
/(\b(f|F)unction(.*?)\((.*?)\)\s*\{)/g
Which when applied like this, will return me whole declaration on the first index:
var re = /(\b(f|F)unction(.*?)\((.*?)\)\s*\{)/g;
while ((m = re.exec(text)) !== null) {
//m[0] contains the function declaration
declarations.push(m[0]);
}
Now I would like the returned match to include the whole content of each function, so I can work with it later on (remove it, wrap it...).
I haven't managed to find a regex that does this. So far, I have this:
(\b(f|F)unction(.*?)\((.*?)\)\s*\{)(.*?|\n)*\}
But of course, it catches the first closing bracket } instead of the one at the end of each of the functions.
Any idea of how to get the closing } of each function?

Any idea of how to get the closing } of each function?
This will be very hard with a regex. Because the function body can include any number of possibly nested brace pairs. And then consider strings containing unmatched braces in the function body.
To parse a non-regular language you need something more powerful than regular expressions: a parser for that language.
(Some regex variants have some ability to match paired characters, but firstly JavaScript's regex engine isn't one of them; and secondly there are still those strings….)
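Short of a full parser, a small hand-written scanner can at least count nested braces and skip over quoted strings. The sketch below is an assumption-laden illustration, not a JavaScript parser: `findMatchingBrace` is a made-up name, and comments, regex literals and template literals are deliberately not handled.

```javascript
// Returns the index of the "}" that closes the "{" at openIndex, or -1.
// Braces inside '...' or "..." strings are ignored; comments, regex literals
// and template literals are NOT handled in this sketch.
function findMatchingBrace(text, openIndex) {
  var depth = 0;
  var quote = null; // the quote character of the string we are inside, if any
  for (var i = openIndex; i < text.length; i++) {
    var ch = text.charAt(i);
    if (quote) {
      if (ch === "\\") {
        i++; // skip the escaped character
      } else if (ch === quote) {
        quote = null; // end of the string literal
      }
    } else if (ch === '"' || ch === "'") {
      quote = ch;
    } else if (ch === "{") {
      depth++;
    } else if (ch === "}") {
      depth--;
      if (depth === 0) {
        return i;
      }
    }
  }
  return -1;
}

var source = 'function f(a) { if (a) { return "}"; } return 1; } var x = 2;';
var open = source.indexOf("{");
var close = findMatchingBrace(source, open);
console.log(source.slice(open, close + 1));
```

The regex from the question could still be used to locate each declaration's opening brace, with the scanner then finding the matching close.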

Javascript doesn't interpret Hair space as a space with regex

I use a regex for my split function.
string.split(/\s/)
But &#8202; (which is a Hair Space) is not recognised. How can I make sure it is, without putting that exact code in the regular expression?

Per MDN, the definition of \s in a regex (in the Firefox browser) is this:
[ \f\n\r\t\v\u00a0\u1680\u180e\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a\u2028\u2029\u202f\u205f\u3000]
So, if you want to split on something in addition to this (e.g. an HTML entity), then you will need to add that to your own regex. Remember, string.split() is not an HTML function, it's a string function so it doesn't know anything special about HTML. If you want to split on certain HTML tags or entities, you will have to code up a regex that includes the things you want to split on.
You can code for it yourself like this:
string.split(/\s|&#8202;/);
Working demo: http://jsfiddle.net/jfriend00/nAQ97/
If what you really want to do is to have your HTML parsed and converted to text by the browser (which will process all entities and HTML tags), then you can do this:
function getPlainText(str) {
var x = document.createElement("div");
x.innerHTML = str;
return (x.textContent || x.innerText);
}
Then, you could split your string like this:
getPlainText(str).split(/\s/);
Working demo: http://jsfiddle.net/jfriend00/KR2aa/
If you want to make absolutely sure this works in older browsers, you have three choices: test one of the functions above in every browser you care about; with the first option, use a custom regex that lists all the entities you want to split on; or, with the second option, search/replace every Unicode character you want to split on, turning it into a regular space before doing the split. Because older browsers weren't very consistent here, there is no free lunch if you want safe compatibility with them.
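The last suggestion, turning the Unicode space characters into a regular space before splitting, might look something like this. The character list is taken from the MDN definition quoted above (minus the ASCII white space that `\s` already covers everywhere); `splitOnAnySpace` is a hypothetical name, and splitting on runs of white space (`\s+`) rather than single characters is a choice made for this sketch.

```javascript
// Unicode space characters from the list quoted above, written out explicitly
// so the result does not depend on how a given engine defines \s.
var UNICODE_SPACES = /[\u00a0\u1680\u180e\u2000-\u200a\u2028\u2029\u202f\u205f\u3000]/g;

// Hypothetical helper: normalise to plain spaces, then split.
function splitOnAnySpace(str) {
  return str.replace(UNICODE_SPACES, " ").split(/\s+/);
}

// \u200a is the hair space, \u3000 the ideographic space.
console.log(splitOnAnySpace("a\u200ab c\u3000d"));
```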

Regex simply not working in IE 8, 9 and 10

Here is our text to replace:
<IMG src="https://domain.com/images/siteheader.jpg">
Using javascript .replace, we try to replace with blank space using the following:
.replace ("/<A href=\"http:\/\/domain.com\"><IMG src=\"https:\/\/domain.com\/images\/siteheader.jpg\"><\/A>/i"," ");
In all other browsers this seems to work, but not in IE. I even tried using this online regex validator: http://www.online-toolz.com/tools/regexp-editor.php and it says it's valid. Kind of stumped. Is IE doing something out of the norm?

You either use a string (the literal form of which looks like "...") or a regular expression (the literal form of which looks like /.../) with replace. You're trying to do both simultaneously. Remove the quotes:
.replace (/<A href="http:\/\/domain.com"><IMG src="https:\/\/domain.com\/images\/siteheader.jpg"><\/A>/i, " ");
When you use a string, it's just matched literally, no regular expression processing is done.
I haven't validated the entire contents of the regex, just removed the surrounding " and removed the \ in front of the embedded ".
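The difference between the two forms is easy to see on a small string. In this sketch (the sample text is made up), the quoted version searches for the literal characters `/world/i`, which never occur in the input, so nothing is replaced:

```javascript
var text = "Hello WORLD";

// A string pattern is matched literally, slashes and flags included.
var withString = text.replace("/world/i", "there");

// A regex literal is compiled as a pattern; the i flag ignores case.
var withRegex = text.replace(/world/i, "there");

console.log(withString); // the input comes back unchanged
console.log(withRegex);  // the match is replaced
```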

Regexes are literals and should not have quotes around them:
.replace(/your regex here/,'replacement')
That being said, where is the text coming from? If it's coming from .innerHTML, browsers may return a string that is different from what you literally have in the source (for instance, attribute names may be uppercased, or the attributes themselves swapped). I believe older versions of IE strip out quotes around single-word attribute values, which would also mess with your regex.
In short, you should not use a regex for this. You could try this instead:
var toRemove = document.querySelector("a[href='http://domain.com']"),
parent = toRemove.parentNode;
parent.removeChild(toRemove);

removing phpbb tag using regex javascript

I'm trying to remove square-bracket (BBCode-style) tags using JavaScript, in order to strip unwanted BBCode.
I tried this:
theString .replace(/\[quote[^\/]+\]*\[\/quote\]/, "")
It works with this sample string:
theString = "[quote=MyName;225]Test 123[/quote]";
but it fails with this one:
theString = "[quote=MyName;225]Test [quote]inside quotes[/quote]123[/quote]";
A solution that doesn't use regex would be fine too.

The other 2 solutions simply do not work (see my comments). To solve this problem you first need to craft a regex which matches the innermost matching quote elements (which contain neither [QUOTE..] nor [/QUOTE]). Next, you need to iterate, applying this regex over and over until there are no more QUOTE elements left. This tested function does what you want:
function filterQuotes(text)
{ // Regex matches inner [QUOTE]non-quote-stuff[/quote] tag.
var re = /\[quote[^\[]+(?:(?!\[\/?quote\b)\[[^\[]*)*\[\/quote\]/ig;
while (text.search(re) !== -1)
{ // Need to iterate removing QUOTEs from inside out.
text = text.replace(re, "");
}
return text;
}
Note that this regex employs Jeffrey Friedl's "Unrolling the loop" efficiency technique and is not only accurate, but is quite fast to boot.
See: Mastering Regular Expressions (3rd Edition) (highly recommended).
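Since the question also allows non-regex-centred solutions, here is a different technique for comparison: a single pass that counts nesting depth and keeps only the text outside any quote. It is a sketch, not a replacement for the function above. `stripQuotesByDepth` is a made-up name, a small regex is still used to find the tags, and unbalanced input is handled differently (a stray closing tag is dropped, and an unclosed opening tag swallows the rest of the text).

```javascript
// Hypothetical alternative: track [quote] nesting depth in one pass and
// copy only the text found at depth 0.
function stripQuotesByDepth(text) {
  var tag = /\[quote\b[^\]]*\]|\[\/quote\]/ig; // an opening or closing quote tag
  var out = "";
  var depth = 0;
  var last = 0; // index just past the previous tag
  var m;
  while ((m = tag.exec(text)) !== null) {
    if (depth === 0) {
      out += text.slice(last, m.index); // text outside any quote is kept
    }
    if (m[0].charAt(1) === "/") {
      if (depth > 0) {
        depth--;
      }
    } else {
      depth++;
    }
    last = tag.lastIndex;
  }
  if (depth === 0) {
    out += text.slice(last); // trailing text after the last tag
  }
  return out;
}

console.log(stripQuotesByDepth("[quote=MyName;225]Test [quote]inside quotes[/quote]123[/quote]"));
console.log(stripQuotesByDepth("before [quote]x[/quote] after"));
```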

Try this one:
/\[quote[^\/]+\].*\[\/quote\]$/
The $ sign means that only the closing quote element at the end of the string is used to determine where the quote you're trying to remove ends.
I also added a "." before the asterisk so that it matches any character in between. I tested this with your two strings and it worked.
Edit: I don't know exactly how you are using this, but as an addition: if you also want the pattern to match a string with no attributes, for example:
[quote]Hello[/quote]
you should change the "+" sign into an asterisk as well, like this:
/\[quote[^\/]*\].*\[\/quote\]$/

This answer has flaws, see Ridgerunner's answer for a more correct one.
Here's my crack at it.
function filterQuotes(text)
{
return text.replace(/\[(\/)?quote([^\/]*)?\]/g,"");
}

How do I escape a string inside JavaScript code inside an onClick handler?

Maybe I'm just thinking about this too hard, but I'm having a problem figuring out what escaping to use on a string in some JavaScript code inside a link's onClick handler. Example:
<a href="#" onclick="SelectSurveyItem('<%itemid%>', '<%itemname%>'); return false;">Select</a>
The <%itemid%> and <%itemname%> are where template substitution occurs. My problem is that the item name can contain any character, including single and double quotes. Currently, if it contains single quotes it breaks the JavaScript code.
My first thought was to use the template language's function to JavaScript-escape the item name, which just escapes the quotes. That will not fix the case of the string containing double quotes which breaks the HTML of the link. How is this problem normally addressed? Do I need to HTML-escape the entire onClick handler?
If so, that would look really strange since the template language's escape function for that would also HTMLify the parentheses, quotes, and semicolons...
This link is being generated for every result in a search results page, so creating a separate method inside a JavaScript tag is not possible, because I'd need to generate one per result.
Also, I'm using a templating engine that was home-grown at the company I work for, so toolkit-specific solutions will be of no use to me.

In JavaScript you can encode single quotes as "\x27" and double quotes as "\x22". Therefore, with this method you can, once you're inside the (double or single) quotes of a JavaScript string literal, use the \x27 \x22 with impunity without fear of any embedded quotes "breaking out" of your string.
\xXX is for chars < 127, and \uXXXX for Unicode, so armed with this knowledge you can create a robust JSEncode function for all characters that are out of the usual whitelist.
For example,
Select
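A JSEncode function along these lines might look like the sketch below. It is written in JavaScript purely for illustration; in practice the same logic would have to run wherever the template is rendered. The name `jsEncode` and the exact whitelist are assumptions, and the cut-off between `\xXX` and `\uXXXX` is set at 0x80 here, although `\xXX` is valid for any code up to 0xFF.

```javascript
// Hypothetical encoder: everything outside a small whitelist becomes an
// escape sequence, so no quote, backslash or angle bracket survives as-is.
function jsEncode(value) {
  return String(value).replace(/[^A-Za-z0-9 _.,-]/g, function (ch) {
    var code = ch.charCodeAt(0);
    var hex = code.toString(16);
    if (code < 0x80) {
      return "\\x" + ("0" + hex).slice(-2);  // e.g. ' becomes \x27
    }
    return "\\u" + ("000" + hex).slice(-4);  // e.g. é becomes \u00e9
  });
}

console.log(jsEncode("O'Reilly \"x\""));
console.log(jsEncode("caf\u00e9"));
```

Because characters such as `"`, `<` and `&` are encoded too, the output should also be safe to drop inside a quoted HTML attribute, which is the situation in the question.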

Depending on the server-side language, you could use one of these:
.NET 4.0
string result = System.Web.HttpUtility.JavaScriptStringEncode(jsString);
Java
import org.apache.commons.lang.StringEscapeUtils;
...
String result = StringEscapeUtils.escapeJavaScript(jsString);
Python
import json
result = json.dumps(jsString)
PHP
$result = strtr($jsString, array('\\' => '\\\\', "'" => "\\'", '"' => '\\"',
"\r" => '\\r', "\n" => '\\n' ));
Ruby on Rails
<%= escape_javascript(jsString) %>

Use hidden spans, one each for each of the parameters <%itemid%> and <%itemname%> and write their values inside them.
For example, the span for <%itemid%> would look like <span id='itemid' style='display:none'><%itemid%></span> and in the javascript function SelectSurveyItem to pick the arguments from these spans' innerHTML.

If it's going into an HTML attribute, you'll need to both HTML-encode it (as a minimum: > to &gt;, < to &lt; and " to &quot;) and escape single quotes (with a backslash) so they don't interfere with your JavaScript quoting.
Best way to do it is with your templating system (extending it, if necessary), but you could simply make a couple of escaping/encoding functions and wrap them both around any data that's going in there.
And yes, it's perfectly valid (correct, even) to HTML-escape the entire contents of your HTML attributes, even if they contain javascript.
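The two layers described above could be sketched as follows, again in JavaScript only for illustration, since the real work would happen in the templating system. All three function names are made up, the escaping is a minimum rather than an exhaustive list, and the order matters: escape for the JavaScript string first, then HTML-encode the whole attribute value.

```javascript
// Layer 1: escape for a single-quoted JavaScript string literal.
function escapeJsString(value) {
  return String(value)
    .replace(/\\/g, "\\\\")
    .replace(/'/g, "\\'")
    .replace(/\r/g, "\\r")
    .replace(/\n/g, "\\n");
}

// Layer 2: HTML-encode for a double-quoted attribute value.
function escapeHtmlAttribute(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: builds the text that goes between onclick="...".
function buildOnclick(itemId, itemName) {
  var js = "SelectSurveyItem('" + escapeJsString(itemId) + "', '" +
           escapeJsString(itemName) + "'); return false;";
  return escapeHtmlAttribute(js);
}

console.log(buildOnclick("42", "Bob's \"best\""));
```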

Try to avoid using string literals in your HTML, and use JavaScript to bind JavaScript events.
Also, avoid 'href=#' unless you really know what you're doing. It breaks so much usability for compulsive middleclickers (tab opener).
<a id="tehbutton" href="somewhereToGoWithoutWorkingJavascript.com">Select</a>
My JavaScript library of choice just happens to be jQuery:
<script type="text/javascript">//<!-- <![CDATA[
jQuery(function($){
$("#tehbutton").click(function(){
SelectSurveyItem('<%itemid%>', '<%itemname%>');
return false;
});
});
//]]>--></script>
If you happen to be rendering a list of links like that, you may want to do this:
<a id="link_1" href="foo">Bar</a>
<a id="link_2" href="foo2">Baz</a>
<script type="text/javascript">
jQuery(function($){
var l = [[1,'Bar'],[2,'Baz']];
$(l).each(function(k,v){
$("#link_" + v[0] ).click(function(){
SelectSurveyItem(v[0],v[1]);
return false;
});
});
});
</script>

Another interesting solution might be to do this:
Select
Then you can use a standard HTML-encoding on both the variables, without having to worry about the extra complication of the javascript quoting.
Yes, this does create HTML that is strictly invalid. However, it is a valid technique, and all modern browsers support it.
If it were me, I'd probably go with my first suggestion, and ensure the values are HTML-encoded and have single quotes escaped.

Declare separate functions in the <head> section and invoke those in your onClick method. If you have lots you could use a naming scheme that numbers them, or pass an integer in in your onClicks and have a big fat switch statement in the function.

Any good templating engine worth its salt will have an "escape quotes" function. Ours (also home-grown, where I work) also has a function to escape quotes for javascript. In both cases, the template variable is then just appended with _esc or _js_esc, depending on which you want. You should never output user-generated content to a browser that hasn't been escaped, IMHO.

I have faced this problem as well. I made a script to convert single quotes into escaped double quotes that won't break the HTML.
function noQuote(text)
{
var newtext = "";
for (var i = 0; i < text.length; i++) {
if (text[i] == "'") {
newtext += "\"";
}
else {
newtext += text[i];
}
}
return newtext;
}

Use the Microsoft Anti-XSS library which includes a JavaScript encode.

First, it would be simpler if the onclick handler was set this way:
<a id="someLinkId" href="#">Select</a>
<script type="text/javascript">
// The DOM property is all lowercase: "onclick", not "onClick".
document.getElementById("someLinkId").onclick =
    function() {
        SelectSurveyItem('<%itemid%>', '<%itemname%>'); return false;
    };
</script>
Then itemid and itemname need to be escaped for JavaScript (that is, " becomes \", etc.).
If you are using Java on the server side, you might take a look at the class StringEscapeUtils from jakarta's common-lang. Otherwise, it should not take too long to write your own 'escapeJavascript' method.

Is the answer here that you can't escape quotes using JavaScript, and that you need to start with escaped strings?
In other words, is there no way for JavaScript to handle the string 'Marge said "I'd look that was" to Peter', so your data needs to be cleaned before it is offered to the script?

I faced the same problem, and I solved it in a tricky way. First make global variables, v1, v2, and v3. And in the onclick, send an indicator, 1, 2, or 3 and in the function check for 1, 2, 3 to put the v1, v2, and v3 like:
onclick="myfun(1)"
onclick="myfun(2)"
onclick="myfun(3)"
// "var" is a reserved word and cannot be a parameter name, so "which" is used instead.
function myfun(which)
{
    if (which == 1)
        alert(v1);
    if (which == 2)
        alert(v2);
    if (which == 3)
        alert(v3);
}
