JavaScript regular expressions

I'm having a small problem with a quick "Search and Highlight" script that I'm working on. I'm using regular expressions because I'd like to do all the searching on the client side, after the document has loaded. My search/highlight function goes like this:
function highlight(word, colour, container) {
var regex = new RegExp("(>[^<]*?)(" + word + ")", "ig");
var replace = "$1<span name='searchTerm' style='background-color: " + colour + "'>$2</span>";
if (regex.exec(container.innerHTML)) {
container.innerHTML = container.innerHTML.replace(regex, replace);
return true;
}
return false;
}
word is the word to search for, colour is the colour to highlight it and container is the element to search in.
Consider an element that contained this:
<ul>
<li>Set the setting to the correct setting.</li>
</ul>
Say I passed the word "set" to the highlight function. In its current state, it only finds the first instance of set due to lazy repetition.
So what if I change the regex to this:
var regex = new RegExp("(>[^<]*?)?(" + word + ")", "ig");
This now works great, it highlights all instances of the string "set". But if I pass the search word "li" then it will replace the text inside the tags!
Is there a quick fix for this regular expression to get the behaviour I want? I need it to replace all instances of the search string but not those found as part of a tag. I'd like to keep it client-side using regex.
Thanks!

You shouldn't be using regex to parse HTML. Walk the DOM tree properly and do a search and replace on pure text.
By the way there's a jQuery plugin that does what you want; you could use it or look at it to get an idea on how to do it:
http://johannburkard.de/blog/programming/javascript/highlight-javascript-text-higlighting-jquery-plugin.html
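A minimal sketch of the DOM-walking approach suggested above, not the linked plugin's implementation. The helper names escapeRegExp and splitByTerm are made up for illustration, and the span gets a class rather than the name attribute used in the question. The string-splitting part is pure, so it can be checked without a browser; highlight itself assumes a browser DOM.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Split plain text into pieces and flag the ones that match the term.
function splitByTerm(text, term) {
  if (!term) return [{ text: text, match: false }];
  var regex = new RegExp(escapeRegExp(term), 'gi');
  var pieces = [];
  var last = 0;
  var m;
  while ((m = regex.exec(text)) !== null) {
    if (m.index > last) {
      pieces.push({ text: text.slice(last, m.index), match: false });
    }
    pieces.push({ text: m[0], match: true });
    last = m.index + m[0].length;
  }
  if (last < text.length) {
    pieces.push({ text: text.slice(last), match: false });
  }
  return pieces;
}

// Browser only: wrap every match found in the text nodes under container.
// Tag names and attributes are never touched because only text nodes are read.
function highlight(word, colour, container) {
  var doc = container.ownerDocument;
  var walker = doc.createTreeWalker(container, NodeFilter.SHOW_TEXT);
  var textNodes = [];
  var node;
  // Collect first, so that replacing nodes does not disturb the walk.
  while ((node = walker.nextNode())) textNodes.push(node);

  var found = false;
  textNodes.forEach(function (textNode) {
    var pieces = splitByTerm(textNode.nodeValue, word);
    if (!pieces.some(function (p) { return p.match; })) return;
    found = true;
    var frag = doc.createDocumentFragment();
    pieces.forEach(function (p) {
      if (p.match) {
        var span = doc.createElement('span');
        span.className = 'searchTerm';
        span.style.backgroundColor = colour;
        span.textContent = p.text;
        frag.appendChild(span);
      } else {
        frag.appendChild(doc.createTextNode(p.text));
      }
    });
    textNode.parentNode.replaceChild(frag, textNode);
  });
  return found;
}
```

With the list from the question, highlight('li', 'yellow', container) would find nothing to wrap, because "li" only occurs as a tag name and never in a text node.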

Related

mark text in a string with regular expression but exclude links

I have some text, and when a user searches for a term, I want the term to be highlighted by wrapping it in a mark tag.
JavaScript to wrap the matched term:
var sampleText = window.document.getElementById('test').innerHTML;
var _keywordHighlight = function (text, term) {
var pattern = new RegExp('('+term+')', 'gi');
text = text.replace(pattern, '<mark>$1</mark>');
return text;
};
var newText = _keywordHighlight(sampleText, 'sample');
window.document.getElementById('test').innerHTML = newText;
jsfiddle.net link:
https://jsfiddle.net/homa/j0Lgk6pf/
The problem is that the search term inside the URL also gets wrapped in a mark tag, which breaks the link.
How can I exclude links from being wrapped in a mark tag?
Use a negative lookahead to add an additional constraint that the term is not followed by a > without first having a <. This will effectively exclude matches within <...> markup.
var pattern = new RegExp('('+term+')(?![^<]*>)', 'gi');
https://jsfiddle.net/qdk80o0k/
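One thing the pattern above leaves open is that term goes into the RegExp unescaped, so a search for something like "c++" or "(" would throw or match the wrong thing. A small sketch of the same lookahead with the term escaped first; the function names escapeRegExp and markOutsideTags are illustrative, not from the original answer. The lookahead is still a heuristic: it assumes a literal > never appears in the text itself.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Wrap each occurrence of term in <mark>, skipping occurrences inside <...>.
function markOutsideTags(html, term) {
  // (?![^<]*>) rejects a match if a ">" follows before any "<",
  // which is the case when the match sits inside a tag.
  var pattern = new RegExp('(' + escapeRegExp(term) + ')(?![^<]*>)', 'gi');
  return html.replace(pattern, '<mark>$1</mark>');
}

console.log(markOutsideTags('<a href="/sample">a sample</a>', 'sample'));
// prints: <a href="/sample">a <mark>sample</mark></a>
```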
You're reinventing the wheel
Using innerHTML will destroy events
Using innerHTML will trigger regeneration of the DOM
To make things easy you should use an existing plugin. There are many jQuery plugins out there, but as you haven't added the jquery tag I assume that you're looking for a plain JS solution. Then the only plugin is mark.js.
Example of your use case

Replace with RegExp only outside tags in the string

I have strings in which some HTML tags may be present, like
this is a nice day for bowling <b>bbbb</b>
How can I use a RegExp to replace all b characters with, for example, :blablabla:, but ONLY outside HTML tags?
So in that case the resulting string should become
this is a nice day for :blablabla:owling <b>bbbb</b>
EDIT: I would like to be more specific, based on the answers I have received. So first of all I have just a string, not DOM element, or anything else. The string may or may not contain tags (opening and closing). The main idea is to be able to replace anywhere in the text except inside tags. For example if I have a string like
not feeling well today :/ check out this link <a href="http://example.com">http://example.com</a>
the regexp should replace only the first :/ with a real smiley image, but should not replace the second and third, because they are inside (and part of) a tag. Here's an example snippet using the regexp from one of the answers.
var s = 'not feeling well today :/ check out this link <a href="http://example.com">http://example.com</a>';
var replaced = s.replace(/(?:<[^\/]*?.*?<\/.*?>)|(:\/)/g, "smiley_image_here");
document.querySelector("pre").textContent = replaced;
<pre></pre>
It is strange: the DEMO shows that it captured the correct group, but the same regexp in the replace function does not seem to work.
The regex itself to replace all bs with :blablabla: is not that hard:
.replace(/b/g, ":blablabla:")
It is a bit tricky to get the text nodes where we need to perform search and replace.
Here is a DOM-based example:
function replaceTextOutsideTags(input) {
var doc = document.createDocumentFragment();
var wrapper = document.createElement('myelt');
wrapper.innerHTML = input;
doc.appendChild( wrapper );
return textNodesUnder(doc);
}
function textNodesUnder(el){
var n, walk=document.createTreeWalker(el,NodeFilter.SHOW_TEXT,null,false);
while(n=walk.nextNode())
{
if (n.parentNode.nodeName.toLowerCase() === 'myelt')
n.nodeValue = n.nodeValue.replace(/:\/(?!\/)/g, "smiley_here");
}
return el.firstChild.innerHTML;
}
var s = 'not feeling well today :/ check out this link <a href="http://example.com">http://example.com</a>';
console.log(replaceTextOutsideTags(s));
Here, we only modify the text nodes that are direct children of the custom-created element named myelt.
Result:
not feeling well today smiley_here check out this link <a href="http://example.com">http://example.com</a>
var input = "this is a nice day for bowling <b>bbbb</b>";
var result = input.replace(/(^|>)([^<]*)(<|$)/g, function(_,a,b,c){
return a
+ b.replace(/b/g, ':blablabla:')
+ c;
});
document.querySelector("pre").textContent = result;
<pre></pre>
You can do this:
var result = input.replace(/(^|>)([^<]*)(<|$)/g, function(_,a,b,c){
return a
+ b.replace(/b/g, ':blablabla:') // you may do something else here
+ c;
});
Note that in most (not all, but most) real, complex use cases, it's much more convenient to manipulate a parsed DOM rather than just a string. If you're starting with an HTML page, you might use a library (some, like mine, accept regexes to do so).
I think you can use a regex like this (only for simple, non-nested data):
/<[^\/]*?b.*?<\/.*?>|(b)/ig
[Regex Demo]
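A pattern of this shape matches either a whole element or the target, and only the target lands in a capture group. Passing a plain replacement string to replace would therefore overwrite the elements too, which may be why the snippet in the question did not behave like the demo. A sketch that uses a callback to put elements back unchanged; the function name replaceOutsideElements and the tag-matching pattern are my own simplification and, like the regex above, only cope with simple, non-nested markup.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Replace target everywhere except inside simple, non-nested elements.
function replaceOutsideElements(input, target, replacement) {
  // Alternative 1: a complete element such as <b>...</b> (consumed, kept as is).
  // Alternative 2: the target itself, captured in the second group.
  var pattern = new RegExp(
    '<([a-z][a-z0-9]*)\\b[^>]*>[\\s\\S]*?<\\/\\1>|(' + escapeRegExp(target) + ')',
    'gi'
  );
  return input.replace(pattern, function (match, tagName, hit) {
    // hit is undefined when the element alternative matched.
    return hit !== undefined ? replacement : match;
  });
}

console.log(replaceOutsideElements(
  'this is a nice day for bowling <b>bbbb</b>', 'b', ':blablabla:'
));
// prints: this is a nice day for :blablabla:owling <b>bbbb</b>
```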
If you want to use a regex, I suggest applying the regex below repeatedly until all tags have been removed:
/<[^\/][^<]*>[^<]*<\/.*?>/g
then use a replace for finding any b.

Is there any way for me to work with this 100,000 item new-line separated string of words?

I've got a 100,000+ long list of English words in plain text. I want to use split() to convert the list into an array, which I can then convert to an associative array, giving each list item a key equal to its own name, so I can very efficiently check whether or not a string is an English word.
Here's the problem:
The list is new-line separated.
aa
aah
aahed
aahing
aahs
aal
aalii
aaliis
aals
This means that var list = ' <copy/paste list> ' isn't going to work, because JavaScript quotes don't work multi-line.
Is there any way for me to work with this 100,000 item new-line separated string?
Replace the newlines with commas in any text editor before copying to your js file.
One workaround would be to paste the list into Notepad++, then select all and choose Edit>Line Operations>Join lines.
This removes new lines and replaces them with spaces.
If you're doing this client side, you can use jQuery's get function to get the words from a text file and do the processing there:
jQuery.get('wordlist.txt', function(results){
//Do your processing on results here
});
If you're doing this in Node.js, follow the guide here to see how to read a file into memory.
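Whichever way the file is loaded, the processing step the question asks about (split the text, then build a lookup structure) could look roughly like this. It uses a Set instead of an associative array, which assumes an ES2015-capable engine; buildWordSet is a made-up name.

```javascript
// Turn newline-separated text into a Set for fast membership checks.
function buildWordSet(text) {
  var words = new Set();
  // Accept both Unix (\n) and Windows (\r\n) line endings.
  text.split(/\r?\n/).forEach(function (line) {
    var word = line.trim();
    if (word) words.add(word); // skip blank lines
  });
  return words;
}

var english = buildWordSet('aa\naah\r\naahed\n\naahing\n');
console.log(english.has('aahed')); // true
console.log(english.has('ahh'));   // false
```

In the jQuery callback above, results could be passed straight to buildWordSet.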
You can use notepad++ or any semi-advanced text editor.
Go to notepad++ and push Ctrl+H to bring up the Replace dialog.
Towards the bottom, select the "Extended" Search Mode
You want to find "\r\n" and replace it with ", "
This will remove the newlines and replace them with commas
jsfiddle Demo
This addresses the problem purely from the angle of having a string and trying to work with it in JavaScript through copy and paste, specifically the points "This means that var list = ' ' isn't going to work, because JavaScript quotes don't work multi-line." and "Is there any way for me to work with this 100,000 item new-line separated string?".
You can treat the string like a string in a comment in JavaScript. Although counter-intuitive, this is an interesting approach. Here is the main function:
function convertComment(c) {
return c.toString().
replace(/^[^\/]+\/\*!?/, '').
replace(/\*\/[^\/]+$/, '');
}
It can be used in your situation as follows:
var s = convertComment(function() {
/*
aa
aah
aahed
aahing
aahs
aal
aalii
aaliis
aals
*/
});
At which point you may do whatever you like with s. The demo simply places it into a div for displaying.
jsFiddle Demo
Further, here is an example of taking the list of words, getting them into an array, and then referencing a single word in the array.
//previously shown code
var all = s.match(/[^\r\n]+/g);
var rand = Math.floor(Math.random() * all.length);
document.getElementById("random").innerHTML = "Random index #"+rand+": "+all[rand];
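As an aside that is not part of the answer above: in engines that support ES2015 template literals, a backtick-delimited string may span several lines, so the pasted list can be used directly without the comment trick. A short sketch under that assumption:

```javascript
// A template literal keeps the embedded newlines as they were pasted.
var s = `
aa
aah
aahed
aahing
aahs
aal
aalii
aaliis
aals
`;

// Same extraction step as before: every run of non-newline characters is a word.
var all = s.match(/[^\r\n]+/g);
console.log(all.length); // 9
```

This only works as is if the pasted list contains no backticks, backslashes or ${ sequences, which should hold for a plain word list.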
If the words are in a separate file, you can load them directly into the page and go from there. I've used a script element with a MIME type that should mean browsers ignore the content (provided it's in the head):
<script type="text/plain" id="wordlist">
aa
aah
aahed
aahing
aahs
aal
aalii
aaliis
aals
</script>
<script>
var words = (function() {
var words = '\n' + document.getElementById('wordlist').textContent + '\n';
return {
checkWord: function (word) {
return words.indexOf('\n' + word + '\n') != -1;
}
}
}());
console.log(words.checkWord('aaliis')); // true
console.log(words.checkWord('ahh')); // false
</script>
The result is an object with one method, checkWord, that has access to the word list in a closure. You could add more methods like addWord or addVariant, whatever.
Note that textContent may not be supported in all browsers, you may need to feature detect and use innerText or an alternative for some.
For variety, another solution is to put the unaltered content into
A data attribute - HTML attributes can contain newlines
or a "non-script" script - eg. <SCRIPT TYPE="text/x-wordlist">
or an HTML comment node
or another hidden element that allows content
Then the content could be read and split/parsed. Since this would be done outside of JavaScript's string literal parsing it doesn't have the issue regarding embedded newlines.

Changing the color of the first letter of some text in JavaScript

I have the following string: "*Username".
Is there a way to change the color of the asterisk while keeping the "Username" part as is?
I decided to write an answer using proper DOM methods to do that, it's not the shortest but I don't use regular expressions or string concatenation and manipulation.
The algorithm is quite simple.
You have to select all the elements that contain the text you want to highlight.
Create a highlighted asterisk.
Replace the old content with the new one where the first character is an asterisk.
Here's the JavaScript part.
// Create the span containing the highlighted asterisk
var asterisk = document.createElement('span');
asterisk.className = 'highlight';
asterisk.appendChild(document.createTextNode('*'));
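/*
 * users is a NodeList in my case.
 * Walk the users (teehee) and check if the first character is an asterisk
 */
for (var i = 0; i < users.length; ++i) {
  var user = users[i];
  var text = user.textContent;
  if (text.charAt(0) == '*') {
    // Assumes the element's only child is the text node holding the name
    user.removeChild(user.firstChild);
    // Clone the span: a node can only sit in one place in the document,
    // so appending the same node again would move it away from the previous user
    user.appendChild(asterisk.cloneNode(true));
    user.appendChild(document.createTextNode(text.slice(1)));
  }
}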
Note: I'm aware of the implications of using .textContent, if you want to support IE<9 please shim it.
You'll also need a CSS class that defines what a highlighted asterisk span will look like.
Here is a JSFiddle you can play around with.
You'd have to wrap it in a containing element, a really basic example might be:
var str = '*Username';
str = str.replace(/^\*(.*)/, '<span style="color:red">*</span>$1');
jsFiddle
The JavaScript way
You can use the .charAt() and .slice() methods to recolor the first character (no matter what the first character is nor how many asterisks are there):
var oldString = document.getElementById('element').innerHTML;
var newString = "<span style='color:red'>"
+ oldString.charAt(0)
+ "</span>"
+ "<span style='color:white'>"
+ oldString.slice(1)
+ "</span>";
document.getElementById('element').innerHTML = newString;
jsFiddle here
The CSS-only way
If you don't want to use JavaScript, or the JavaScript way seems too long to you, there is no need for JavaScript at all. However, this method requires that the asterisk (*) is not written in the HTML markup. The following code will add a red asterisk before every element with class='required':
.required:before
{
color: red;
content: "* ";
}
jsFiddle here

RegExp: how to exclude matched groups from $N?

I've made a working regexp, but I think it's not the best approach:
el = '<div style="color:red">123</div>';
el.replace(/(<div.*>)(\d+)(<\/div>)/g, '$1<b>$2</b>$3');
// expecting result: <div style="color:red"><b>123</b></div>
After googling I found that (?: ... ) in a regexp marks a non-capturing group, thus:
el.replace(/(?:<div.*>)(\d+)(?:<\/div>)/g, '<b>$1</b>');
// returns <b>123</b>
but I need the expected result from the 1st example.
Is there a way to exclude them, so that I can just write replace(/.../, '<b>$1</b>')?
This is just a small case for understanding how to exclude groups in a regexp. And I know that we can't parse HTML with regexps :)
So you want to get the same result while only using the replacement <b>$1</b>?
In your case just replace(/\d+/, '<b>$&</b>') would suffice.
But if you want to make sure there are div tags around the number, you could use lookarounds and \K, as in the following expression. Note that JS does not support \K, and lookbehind only arrived with ES2018, so in engines without it you have to use a capturing group for that in JS.
<div[^>]*>\K\d+(?=</div>)
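The \K expression above will not run in JavaScript. In engines that implement ES2018 lookbehind assertions, a rough equivalent can be written with (?<=...) in its place; this is a sketch under that assumption, and boldDivNumbers is a made-up name.

```javascript
// Wrap a number in <b> only when it is the entire content of a div.
function boldDivNumbers(html) {
  // (?<=<div[^>]*>) requires an opening div tag to end right before the digits,
  // (?=<\/div>) requires the closing tag to follow; neither is part of the match,
  // so $& (the whole match) is just the number.
  return html.replace(/(?<=<div[^>]*>)\d+(?=<\/div>)/g, '<b>$&</b>');
}

console.log(boldDivNumbers('<div style="margin-left: 10px">123</div>'));
// prints: <div style="margin-left: 10px"><b>123</b></div>
```

The 10 inside the style attribute is left alone because no opening tag ends immediately before it.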
There's nothing wrong with a replacement value of '$1<b>$2</b>$3'. I would just change your regex to this:
el = '<div style="color:red">123</div>';
el.replace(/(<div[^>]*>)(\d+)(<\/div>)/g, '$1<b>$2</b>$3');
Changing how it matches the first div keeps the full match on the div tags, but makes sure it matches the minimum possible before the closing > of the first div tag rather than the maximum possible.
With your regex, you would not get what you wanted with this input string:
el = '<div style="color:red">123</div><div style="color:red">456</div>';
The problem with using something like:
el.replace(/\d+/, '<b>$&</b>')
is that it doesn't work properly with things like this:
el = '<div style="margin-left: 10px">123</div>'
because it picks up the numbers inside the div tag.
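The difference between the greedy .* and [^>]* described above can be checked with the two-div input. A small sketch; the variable names are only for illustration.

```javascript
var el = '<div style="color:red">123</div><div style="color:red">456</div>';

// Greedy .* runs to the last possible ">", so the first group swallows the
// whole first div and only the last number gets wrapped.
var greedy = el.replace(/(<div.*>)(\d+)(<\/div>)/g, '$1<b>$2</b>$3');

// [^>]* cannot cross the end of the opening tag, so each div matches separately.
var bounded = el.replace(/(<div[^>]*>)(\d+)(<\/div>)/g, '$1<b>$2</b>$3');

console.log(greedy);
// prints: <div style="color:red">123</div><div style="color:red"><b>456</b></div>
console.log(bounded);
// prints: <div style="color:red"><b>123</b></div><div style="color:red"><b>456</b></div>
```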
