regex not matching in findText() - javascript

I was trying to change the color of some text in a document, and I needed a regexp for it.
So I tried the findText function to locate my text within the selected text, but I am having trouble getting regexps to match.
Look at this sample script:
function onOpen() {
  DocumentApp.getUi().createMenu('Test')
      .addItem('Find regex in sel', 'findRegex')
      .addToUi();
}
function findRegex() {
  var ui = DocumentApp.getUi();
  var selection = DocumentApp.getActiveDocument().getSelection();
  var txt = selection.getSelectedElements()[0].getElement().asText();
  var inp = ui.prompt('Regex to search:').getResponseText();
  var regex = new RegExp(inp);
  ui.alert('Found in "' + txt.getText()
      + '"\n Re: ' + txt.findText(regex)
      + '\n In: ' + txt.findText(inp));
}
This prompts for something to search, then builds a regex out of it. Then both the regex and the original string are used to search in the selected text.
However, I cannot get the regex to match; I always get null. If the text to be searched is "foobarbaz" and I input foo, only the plain string matches, not the RegExp object.
If instead I input /foo/, clearly nothing matches.
How should I use regexps to search using findText?
Note that I have to "compose" the regex, like /foobar/ + /\d+/, where foobar is the user-entered pattern.

OK, I think I have found the crux: the pattern passed to findText must always be a string, even when it contains a regular expression.
I tried searching "fo+" in the text "fooo" and it matched correctly.
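As a rough sketch of what that means for the "composed" pattern: build the whole pattern as a plain string and hand that string to findText. The helper names `composePattern` and `findComposed` are made up for this illustration, and `textElement` is assumed to be an Apps Script Text element (anything with a `findText(pattern)` method will do).

```javascript
// Build the pattern as a *string*: the user's pattern followed by digits.
// The backslash is doubled because this is a string literal, not a regex literal.
function composePattern(userPattern) {
  return userPattern + '\\d+';
}

// `textElement` is assumed to expose findText(patternString),
// as the Text element in the question does.
function findComposed(textElement, userPattern) {
  return textElement.findText(composePattern(userPattern));
}
```

Note that simple concatenation changes the meaning of a user pattern containing a top-level `|`; whether grouping constructs are accepted depends on the regex subset findText supports, so that part is left out here.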

Related

Regex with multiple start and end characters that must be the same

I would like to be able to search for strings inside a special tag in a string in JavaScript. Strings in JavaScript can start with either " or ' character.
Here is an example to illustrate what I want to do. My custom tag is called <my-tag>. My regex is /('|")*?<my-tag>((.|\n)[^"']*?)<\/my-tag>*?('|")/g. I use this regex pattern on the following strings:
var a = '<my-tag>Hello World</my-tag>'; //is found as expected
var b = "<my-tag>Hello World" + '</my-tag>'; //is NOT found, this is good!
var c = "<my-tag>Hello World</my-tag>"; //is found as expected
var d = '<my-tag>something "special"</my-tag>'; //here the " char causes a problem
var e = "<my-tag>something 'special'</my-tag>"; //here the ' char causes a problem
It works well with a and c, where it finds the tag with the containing text. It also does not find the text in b, which is what I want. But in cases d and e the tag with its content is not found because of the " and ' characters. What I want is a regex where " is allowed inside the tag if the string starts with ', and vice versa.
Is it possible to achieve this with one regex, or is my only option to work with two separate regular expressions like
/(")*?<my-tag>((.|\n)[^']*?)<\/my-tag>*?(")/g and /(')*?<my-tag>((.|\n)[^"]*?)<\/my-tag>*?(')/g ?
It's not pretty, but I think this would work:
/("<my-tag>((.|\n)[^"]*?)<\/my-tag>"|'<my-tag>((.|\n)[^']*?)<\/my-tag>')/g
You should be able to capture the opening quote with ('|") and reuse it through a backreference for the closing quote. Something like the following:
/('|")<my-tag>.*?<\/my-tag>\1/g
This should make sure to match the same character at the beginning and the end.
But you really shouldn't use regex for parsing HTML.
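To make the backreference idea concrete, here is a small sketch run against the five sample strings from the question. The names `sameQuoteTag`, `samples` and `tagContent` are invented for this example, and the tag content is captured in a second group purely for illustration.

```javascript
// \1 must match whatever quote character group 1 captured.
const sameQuoteTag = /(['"])<my-tag>(.*?)<\/my-tag>\1/;

// The right-hand sides of the question's assignments, as source text.
const samples = {
  a: `'<my-tag>Hello World</my-tag>'`,
  b: `"<my-tag>Hello World" + '</my-tag>'`,
  c: `"<my-tag>Hello World</my-tag>"`,
  d: `'<my-tag>something "special"</my-tag>'`,
  e: `"<my-tag>something 'special'</my-tag>"`,
};

// Returns the text inside the tag, or null when the quotes do not pair up.
function tagContent(source) {
  const match = sameQuoteTag.exec(source);
  return match ? match[2] : null;
}
```

With this pattern a, c, d and e yield their tag content, while b yields null because its opening and closing quotes differ. Since `.` does not match line breaks, multi-line content would need a different inner pattern.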

Regex replace text outside html tag

I'm working on an autocomplete component that highlights all occurrences of the searched text. What I do is split the input text into words, and wrap every occurrence of those words in a <strong> tag.
My code looks like this:
inputText = 'marriott st';
text = "Marriott east side";
textSearch = inputText.split(' ');
for (var i in textSearch) {
  var regexSearch = new RegExp('(?!<\/?strong>)' + textSearch[i], "i");
  var textReplaced = regexSearch.exec(text);
  text = text.replace(regexSearch, '<strong>' + textReplaced + '</strong>');
}
For example, given the result: "marriott east side"
And the input text: "marriott st"
I should get
<strong>marriott</strong> ea<strong>st</strong> side
And I'm getting
<<strong>st</strong>rong>marriot</<strong>st </strong>rong>ea<<strong>st</strong> rong>s</strong> side
Any ideas how I can improve my regex to avoid matching occurrences inside the HTML tags? Thanks
/(?!<\/?strong>)st/
I would process the string in one pass. You can create one regular expression out of the search string:
var search_pattern = '(' + inputText.replace(/\s+/g, '|') + ')';
// `search_pattern` is now `(marriott|st)`
text = text.replace(RegExp(search_pattern, 'gi'), '<strong>$1</strong>');
You could even split the search string first, sort the words by length and combine them, to give a higher precedence to longer matches.
You definitely should escape special regex characters inside the string: How to escape regular expression special characters using javascript?.
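Putting those suggestions together (one pass, longer words first, special characters escaped) might look like the sketch below. `escapeRegExp` and `highlightWords` are names chosen for this example, not part of any library.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Wrap every occurrence of any search word in <strong>, in a single pass.
function highlightWords(text, inputText) {
  const words = inputText
    .split(/\s+/)
    .filter(Boolean)                        // drop empty pieces
    .sort((a, b) => b.length - a.length)    // longer words take precedence
    .map(escapeRegExp);
  if (words.length === 0) return text;
  const pattern = new RegExp('(' + words.join('|') + ')', 'gi');
  return text.replace(pattern, '<strong>$1</strong>');
}
```

Because the replacement happens in one call on the untouched text, the inserted <strong> tags are never searched themselves.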
Before each search, I suggest getting (or saving) the original, unhighlighted text to work on each time. In your current case, that means you could replace all '<strong>' and '</strong>' tags with ''. This will help keep your regex simple, especially if you decide to add other HTML tags and formatting in the future.

Converting XML tags to uppercase using a javascript regex

I'm trying to convert XML tags to uppercase, while preserving the case of attributes and text. So for example
<Mytag Category="Parent">Value1</Mytag>
Becomes
<MYTAG Category="Parent">Value1</MYTAG>
I have a regex which matches the XML tags correctly, but the upperCase function does not seem to be working.
myXmlElement.replace(/<(\/)*([a-zA-Z_0-9]+)([^>]*)>/g,"<$1" + "$2".toUpperCase() + "$3>")
I've also tried using String.prototype.toUpperCase.apply("$2"), as well as passing a function as the replace argument
myXmlElement.replace(/<[\/]*([a-zA-Z_0-9]+)[^>]*>/g,
function($1,$2,$3){return "<$1" + $2.toUpperCase() + "$3>"})
But this doesn't work, as $1,$2,$3 appear to refer to the entire matching elements ($1 = , $2 = )
I'm sure there is something trivial I am overlooking here, can anybody help out?
If you want to match the characters before and after your tag name, they need to be put into capturing parentheses within the pattern:
var pattern = /<([\/]*)([a-zA-Z_0-9]+)([^>]*)>/g;
var newTag = myXmlElement.replace(pattern, function(full, before, tag, after) {
  return "<" + before + tag.toUpperCase() + after + ">";
});
The replacement function receives the full match as its first argument, so you can simply ignore it.
After that, each capture group of your pattern is passed as a further parameter.
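A self-contained version of this approach, applied to the sample from the question, could look like the following sketch. The function name `upperCaseTags` is made up here.

```javascript
// Upper-case tag names only; attributes and text content keep their case.
function upperCaseTags(xml) {
  const pattern = /<(\/?)([a-zA-Z_0-9]+)([^>]*)>/g;
  // Arguments: full match, then one argument per capture group.
  return xml.replace(pattern, (full, slash, tag, rest) =>
    '<' + slash + tag.toUpperCase() + rest + '>');
}

const converted = upperCaseTags('<Mytag Category="Parent">Value1</Mytag>');
```

The original attempt failed because `"$2".toUpperCase()` is evaluated before replace runs, on the literal two-character string "$2", so the case change has no effect.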

Preserving case / capitalization with JavaScript replace method

I'm continuing work on a search term suggestion tool using jQuery UI. I am now working on displaying the results with the search term pattern in bold. I have implemented this functionality by patching the Autocomplete's _renderItem method. The problem I have now is that the replaced characters have the same case as those typed by the user in the input (e.g. if the user typed an "A" and the returned result was "America", the replaced text would be AmericA). Here's the code:
var exp = new RegExp(this.term, "gi") ;
var rep = item.label.replace( exp, "<span style='font-weight:bold;color:Black;'>"
+ this.term + "</span>");
As always, thanks in advance for any help.
You can use:
var rep = item.label.replace(exp,
"<span style='font-weight:bold;color:Black;'>$&</span>");
When replacing a string, $& means "the whole match", so you don't have to repeat the search term (in some cases you don't know it). In other flavors, you may use $0 or \0.
Also, remember to escape special characters in this.term.
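Combining `$&` with escaping, a minimal sketch might look like this. `escapeRegExp` and `boldTerm` are hypothetical helper names, and the span markup is copied from the question.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Wrap each case-insensitive occurrence of `term` in a bold span.
// $& inserts the text as it appears in `label`, so its case is preserved.
function boldTerm(label, term) {
  if (!term) return label; // an empty pattern would match at every position
  const exp = new RegExp(escapeRegExp(term), 'gi');
  return label.replace(exp,
    "<span style='font-weight:bold;color:Black;'>$&</span>");
}
```

For the label "America" and the typed term "a", the capital A and the final a are each wrapped with their original case.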
You can put your expression in a group by wrapping it in parentheses:
var exp = new RegExp("(" + this.term + ")", "gi");
var rep = item.label.replace(exp, "<span style='font-weight:bold'>$1</span>");
You can then refer to that group using $1.
See here for more details about backreferences.

Javascript regular expressions

Having a small problem for a quick "Search and Highlight" script that I'm working on. I'm using regular expressions because I'd like to do the searching all on client side, after the document has loaded. My search/highlight function goes like this:
function highlight(word, colour, container) {
  var regex = new RegExp("(>[^<]*?)(" + word + ")", "ig");
  var replace = "$1<span name='searchTerm' style='background-color: " + colour + "'>$2</span>";
  if (regex.exec(container.innerHTML)) {
    container.innerHTML = container.innerHTML.replace(regex, replace);
    return true;
  }
  return false;
}
word is the word to search for, colour is the colour to highlight it and container is the element to search in.
Consider an element that contained this:
<ul>
<li>Set the setting to the correct setting.</li>
</ul>
Say I passed the word "set" to the highlight function. In its current state, it only finds the first instance of set due to lazy repetition.
So what if I change the regex to this:
var regex = new RegExp("(>[^<]*?)?(" + word + ")", "ig");
This now works great, it highlights all instances of the string "set". But if I pass the search word "li" then it will replace the text inside the tags!
Is there a quick fix for this regular expression to get the behaviour I want? I need it to replace all instances of the search string but not those found as part of a tag. I'd like to keep it client-side using regex.
Thanks!
You shouldn't be using regex to parse HTML. Walk the DOM tree properly and do a search and replace on pure text.
By the way there's a jQuery plugin that does what you want; you could use it or look at it to get an idea on how to do it:
http://johannburkard.de/blog/programming/javascript/highlight-javascript-text-higlighting-jquery-plugin.html
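For reference, walking the DOM and touching only text nodes could be sketched roughly as follows. This is not the plugin's code; `escapeRegExp` and `splitByWord` are names invented for this example, and `highlight` keeps the signature from the question. It assumes a browser environment with `document` available.

```javascript
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Split text into pieces, flagging the ones that matched `word`.
// With one capture group, split() puts matches at the odd indexes.
function splitByWord(text, word) {
  if (!word) return [{ text: text, match: false }];
  const regex = new RegExp('(' + escapeRegExp(word) + ')', 'gi');
  return text.split(regex)
    .map((piece, i) => ({ text: piece, match: i % 2 === 1 }))
    .filter((piece) => piece.text !== '');
}

// Highlight `word` inside `container` without ever touching tag names.
function highlight(word, colour, container) {
  const walker = document.createTreeWalker(container, NodeFilter.SHOW_TEXT);
  const textNodes = [];
  while (walker.nextNode()) textNodes.push(walker.currentNode);

  let found = false;
  for (const node of textNodes) {
    const pieces = splitByWord(node.nodeValue, word);
    if (!pieces.some((p) => p.match)) continue;
    found = true;
    const fragment = document.createDocumentFragment();
    for (const piece of pieces) {
      if (piece.match) {
        const span = document.createElement('span');
        span.setAttribute('name', 'searchTerm');
        span.style.backgroundColor = colour;
        span.textContent = piece.text;
        fragment.appendChild(span);
      } else {
        fragment.appendChild(document.createTextNode(piece.text));
      }
    }
    node.parentNode.replaceChild(fragment, node);
  }
  return found;
}
```

The text nodes are collected before any replacement so that the newly inserted spans are not visited again. Searching for "li" then only affects text content, never the <li> tags themselves.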
