Using regexes to modify the text of html (with javascript)

I want to modify the text in a html file using javascript in an android webview.
Essentially, I want to do what android Linkify does to text, but I don't want to do it with java code, because I feel like that might delay the webview rendering the html (if I parse the text before sending it to the webview).
So, for example a piece of html like this:
<body> <!--these two shouldn't be linked--> <!--these two shouldn't be linked-->
<p></p> <!--this should be linked-->
<p>102-232-2312 2032-122-332 </p><!-- should be linked as numbers-->
Should become this:
<p>102-232-2312 <a href="tel:2032-122-332">2032-122-332</a> </p>
I already have the regexes to convert numbers and email ids to links, and they're working well enough. What I want to ensure is that I don't link anything that's already within tags. I've removed anchor tags, so they're not an issue, but I also need to avoid linking things like this:
<div width="1000"> <!-- Don't want this '1000' to be linked (but I do want other 4 digit numbers to be)-->
So for example if my regex for links is:
var replacePattern1 = /((https?|ftp|file):\/\/[-A-Z0-9+&##\/%?=~_|!:,.;]*[-A-Z0-9+&##\/%=~_|])/gim
How do I make sure that it's not within < and >? (Answers using javascript would be appreciated, but if you feel like this is a stupid way of doing it, please let me know about alternatives).
If you're answering with javascript, this question can essentially be shortened to:
How do I write a regex in javascript to search for patterns which are not surrounded by '<' '>' tags

Since you are using JavaScript, this runs on the client side, so the page's DOM gives you free access to all of the page's objects and events.
At this step you may not need a regex at all; working with the DOM may be enough.
The jQuery library makes it easy to update DOM objects.
In your case you only want the p tags.
So I suggest:
//using jquery
var paras = document.getElementsByTagName("p");
for(p in paras){

As I said, the idea is to manipulate the DOM. Here is an example based on your case; I don't know if it is exactly what you are trying to get:
var paras = document.getElementsByTagName("p");
var hrefs = [];
// prefixes to put in front of the text when building each href
var json_urls = {"links":["http://", "tel:"]};
// index loop: for...in on an HTMLCollection would also visit "length", "item", etc.
for (var p = 0; p < paras.length; p++) {
  // copy the text content of the current p
  var text_cp = paras[p].textContent;
  // clear the content of the p
  paras[p].textContent = "";
  // create the a element
  hrefs[p] = document.createElement("a");
  // give it a unique id
  hrefs[p].id = "_" + p;
  // build the href from a prefix plus the copied text
  // (links[p] is only defined for the first two paragraphs with this list)
  hrefs[p].href = json_urls.links[p] + text_cp;
  hrefs[p].textContent = text_cp;
  // put the new link back inside the p
  paras[p].appendChild(hrefs[p]);
}
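
If the HTML is still a string at this point (before the webview parses it), the "not inside < and >" requirement from the question can also be approximated without the DOM. The following is only a rough sketch: replaceOutsideTags and the four-digit pattern are made up for illustration, and it assumes no attribute value contains a ">" character.

```javascript
// Hypothetical helper: apply `pattern` only to the text between tags.
function replaceOutsideTags(html, pattern, replacement) {
  // Splitting on a capturing group keeps the tags in the result:
  // even indexes are text, odd indexes are the tags themselves.
  return html
    .split(/(<[^>]*>)/)
    .map(function (part, i) {
      return i % 2 === 1 ? part : part.replace(pattern, replacement);
    })
    .join("");
}

// Illustrative pattern only; the real number/email regexes would go here.
var fourDigits = /\b\d{4}\b/g;
var input = '<div width="1000"><p>call 2032 now</p></div>';
// "$&" in the replacement string stands for the matched text.
var output = replaceOutsideTags(input, fourDigits, '<a href="tel:$&">$&</a>');
console.log(output);
```

The width="1000" attribute is left alone because it sits in an odd-indexed (tag) piece. Text inside script or style elements is still treated as ordinary text by this sketch, so the DOM approach is safer when those are present.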


Replace non-code text on webpage

I searched through a bunch of related questions that help with replacing site innerHTML using JavaScript, but most rely on targeting the ID or class of the text. However, my text can be inside either a span or a td tag, and possibly elsewhere. I was finally able to gather a few resources to make the following code work:
$("body").children().each(function() {
The problem with the above code is that I randomly see some code artifacts or other issues on the loaded page. I think it has something to do with there being multiple "$" characters in the website code, which the above script converts to %, hence breaking things.
Is there any way to modify the code (JavaScript/jQuery) so that it does not affect code elements and only replaces the visible text (i.e. >Here<)?
It looks like the reason I'm getting a conflict with some other code is this error: "Uncaught TypeError: Cannot read property 'innerText' of undefined". So I'm guessing there are some elements that don't have innerText (even though they don't meet the regex criteria), and that breaks other inline script code.
Is there anything I can add to or change in the code so that it does not try the .replace when the text doesn't match the regex, or does not replace when the value is undefined?
Wholesale regex modifications to the DOM are a little dangerous; it's best to limit your work to only the DOM nodes you're certain you need to check. In this case, you want text nodes only (the visible parts of the document.)
This answer gives a convenient way to select all text nodes contained within a given element. Then you can iterate through that list and replace nodes based on your regex, without having to worry about accidentally modifying the surrounding HTML tags or attributes:
var getTextNodesIn = function(el) {
  return $(el)
    .find(":not(iframe, script)") // skip <script> and <iframe> tags
    .addBack()                    // include the element itself
    .contents()                   // all child nodes, text nodes included
    .filter(function() {
      return this.nodeType == 3;  // text nodes only
    });
};

getTextNodesIn($('#foo')).each(function() {
  var txt = $(this).text().trim(); // trimming surrounding whitespace
  txt = txt.replace(/^\$\d$/g,"%"); // your regex
  this.nodeValue = txt; // write the result back into the text node
});

console.log($('#foo').html()); // tags and attributes were not changed
<script src=""></script>
<div id="foo"> Some sample data, including bits that a naive regex would trip up on:
foo<span data-attr="$1">bar<i>$1</i>$12</span><div>baz</div>
<!-- $1 -->
// embedded script tag:
console.log("<b>$1</b>"); // won't be replaced
I solved it slightly differently and test each value against the regex before attempting to replace it:
var regEx = new RegExp(/^\$\d$/);
var allElements = document.querySelectorAll("*");
for (var i = 0; i < allElements.length; i++){
  var allElementsText = allElements[i].innerText;
  var regExTest = regEx.test(allElementsText);
  if (regExTest === true) {
    var newText = allElementsText.replace(regEx, '%');
    allElements[i].innerText = newText; // write the replacement back
  }
}
Does anyone see any potential issues with this?
One issue I found is that it does not work if part of the page refreshes after the page has loaded. Is there any way to have it re-run the script when new content is generated on page?

Browser automatically closing list tags

I'm trying to make a html list using jQuery but browsers are prematurely closing my ul tag and li tags.
As a simple example, I tried to make this list:
How are you?
So I wrote the code:
$("#test").append("How are you?");
But the code resulted in:
How are you?
I know that instead I could simply write:
$("#test").append("<ul><li>Hello</li><li>How are you?</li></ul>");
but my project requires the code to be on several lines.
Here's a jsfiddle:
try like this
var ul = $("<ul></ul>");
ul.append('<li>how r u</li>');
You should avoid appending only parts of the markup, because jQuery will close the open tags automatically.
So there are two easy solutions that come to mind:
The first would be to build the string to append:
var html = ""; // "try" is a reserved word in JavaScript and cannot be a variable name
html = html + "<ul>";
html = html + "<li>";
html = html + "Hello";
//And so on
Or something like this:
You can go further (and avoid many ids) by selecting the child to append to with :nth-child, or, as in the other answer, by saving it in a variable.
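
The string-building idea above can be sketched as follows. This is only an illustration: buildList is a made-up helper name, and it assumes the item texts are trusted, since nothing is escaped.

```javascript
// Hypothetical helper: turn an array of strings into one complete <ul> string.
function buildList(items) {
  var parts = ["<ul>"];
  for (var i = 0; i < items.length; i++) {
    parts.push("<li>" + items[i] + "</li>");
  }
  parts.push("</ul>");
  // Join only at the end, so the markup is complete before it reaches jQuery.
  return parts.join("");
}

var listHtml = buildList(["Hello", "How are you?"]);
console.log(listHtml);
```

In the page, the finished string would then be appended in a single call, for example $("#test").append(listHtml), so jQuery never sees an unclosed tag.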

Get dynamically created names of dynamical links

I need to create a link for a set of documents. They are created dynamically, so the names also differ, e.g. Test, Test2, and so on.
I need to show the link like "Document TestN", where the link changes according to the current document. I can currently create the links with <a href="id" onclick="bla+bla+bla">, but the name does not change. Instead of 'Dashboard' I need to get 'Dashboard of "ConcreteSite"', where I can get the names from pageHeader:
<script language="javascript" type="text/javascript">
var siteNameAsParam =;
var scrt_var = siteNameAsParam.split("siteName=")[1];
<p>You are here: Dashboard </p>
Based on your code I think this is what you're after but more detail on what you're trying to do would be great.
<p>You are here: Dashboard </p>
<p>You are here: Dashboard </p>
document.addEventListener('DOMContentLoaded', function(event) {
var links = document.getElementsByTagName("a");
for (i = 0; i < links.length; i++) {
var siteNameAsParam =;
var scrt_var = siteNameAsParam.split("siteName=")[1];
links[i].href = links[i].href + '?siteName=' + scrt_var;
links[i].innerText += ' fred';
}, false);
This does the following:
On page load gets all links on the page
loops through the links and grabs the query strings from the url
splits the query string on siteName
sets each link url to add the query string
updates each link's text to append the query string (or undefined if it doesn't exist; see note below)
Note: your code implies you already have a query string of siteName=SITENAMEHERE in the URL. Also, depending on what you're trying to achieve, there are probably much better approaches. I hope this answers your current question, but I think you should review how others achieve what you're after.
Here is a jsfiddle with a different working sample of what I think you might want. Hopefully it helps. There are comments in the fiddle. I think you want to try doing more when the link is created (set the event listener there, update the text as desired, etc.) instead of on the click event.
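
As a small sketch of the query-string step described above, and assuming the site name really does arrive as siteName=... in the page URL, URLSearchParams can pull the value out without splitting by hand. The function names here are made up for illustration; in the browser the argument would be window.location.search.

```javascript
// Hypothetical helper: read the siteName parameter from a query string.
function getSiteName(search) {
  // URLSearchParams accepts the string with or without the leading "?".
  return new URLSearchParams(search).get("siteName"); // null when absent
}

// Hypothetical helper: build the label the question asks for.
function dashboardLabel(search) {
  var name = getSiteName(search);
  return name === null ? "Dashboard" : 'Dashboard of "' + name + '"';
}

console.log(dashboardLabel("?siteName=ConcreteSite"));
console.log(dashboardLabel("?page=2"));
```

Unlike split("siteName=")[1], this does not pick up any parameters that follow siteName in the URL, and it gives null rather than undefined when the parameter is missing.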

get snippet of html text without creating a DOM?

Given some HTML, I'd like to get the first 100 characters of its text (the content without the markup).
I could create a jquery object with the html and use .text().
But the problem is that browsers may load all the images in the html.
So I wonder if there's a way to extract text snippet from html without building a DOM.
given a html (just a string of html, not part of DOM yet)
<p>my lord</p><img src="some_url"><br>I'm overloaded
I could do $('<div/>').append(html).text().substr(0, 5); to get 5 characters.
But the img is downloaded by browser, and I don't want that.
var s = "<p>my lord</p><img src=\"some_url\"><br>I'm overloaded"
s = s.replace(/<[^>]+>/g,'').substr(0, 100);
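
A slightly more defensive variant of the same regex idea, as a sketch only: textSnippet is a made-up name, and it assumes the HTML has no ">" inside attribute values and that entities such as &amp; do not need decoding. Replacing each tag with a space keeps "my lord" and "I'm" from running together.

```javascript
// Hypothetical helper: first `maxLength` characters of the text in `html`.
function textSnippet(html, maxLength) {
  return html
    .replace(/<[^>]+>/g, " ") // a space instead of each tag keeps words apart
    .replace(/\s+/g, " ")     // collapse runs of whitespace
    .trim()
    .slice(0, maxLength);
}

var sample = "<p>my lord</p><img src=\"some_url\"><br>I'm overloaded";
console.log(textSnippet(sample, 100));
console.log(textSnippet(sample, 5));
```

Because the string is never parsed into elements, the img is not requested. The trade-off is that an inline tag in the middle of a word would also be turned into a space.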
You could remove the image elements and then load it to the dom
Something like
var html = "<p>my lord</p><img src=\"some_url\"><br>I'm overloaded";
html = html.replace(/<img[^>]*>/g,"");
var firstFive = $('<div/>').append(html).text().substr(0, 5);

Is there a cross-browser css way to decorate only last word in a span block?

I need to show a list of users which should look like the example below:
Helen Burns Edward
Fairfax Rochester Bertha
Antoinetta Mason Adèle
Is there a way to achieve this without using javascript? Each row should be one span, i.e. <span>Helen</span><span>Burns</span> is not acceptable.
No, there is not. You are going to have to use some form of scripting to accomplish this if you don't want your last names to be in their own tags.
To the browser, each row is an element, and the "words" themselves have no separate meaning as far as CSS is concerned. You must place the words in different tags in order to do what you want.
The browser does not automagically know what part of the name is the last name so you have to add extra markup to achieve what you want.
There's no solution for commonly used browsers for now using only CSS. You should use JavaScript or HTML + CSS as you already did.
with pure CSS this is impossible (as you don't want a separation in the markup)...
<span>Monty Burns</span><br />
<span>Bart Simpson</span>
<script type="text/javascript">
$(document).ready(function() {
  var spans = $('span');
  spans.each(function(index, element) {
    var span = $(element);
    var spanText = span.text();
    var spanTextArray = spanText.split(' ');
    // take the last word off the array so it is not repeated in firstName
    var lastName = spanTextArray.pop();
    var firstName = spanTextArray.join(' ');
    var spanLastName = $('<span/>');
    spanLastName.css('font-weight', 'bold');
    spanLastName.css('margin-left', '5px');
    spanLastName.text(lastName);
    // replace the row's text with the first names, then add the styled last name
    span.text(firstName).append(spanLastName);
  });
});
</script>
working demo.
edit: if you do not want an extra span-tag in there, just change
var spanLastName = $('<span/>');
spanLastName.css('font-weight', 'bold');
to
var spanLastName = $('<strong/>');
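
The splitting step on its own, without jQuery, might look like the sketch below. The function names are made up for illustration, and it assumes the last whitespace-separated word is the one to decorate.

```javascript
// Hypothetical helper: split a full name into the last word and everything before it.
function splitLastWord(fullName) {
  var words = fullName.trim().split(/\s+/);
  var last = words.pop(); // removes the last word from the array
  return { rest: words.join(" "), last: last };
}

// Hypothetical helper: build the markup for one row, with the last word in <strong>.
function decorateLastWord(fullName) {
  var parts = splitLastWord(fullName);
  var prefix = parts.rest ? parts.rest + " " : "";
  return prefix + "<strong>" + parts.last + "</strong>";
}

console.log(decorateLastWord("Edward Fairfax Rochester"));
console.log(decorateLastWord("Adèle"));
```

Using pop() before join() is what keeps the last name out of the first part, so it is not printed twice.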
I don't think this is possible with CSS because your example doesn't show any order:
Helen Burns
Edward Fairfax Rochester
Bertha Antoinetta Mason
Adèle Varens
I don't know why you might desire not to have an extra tag surrounding the last name (as other answers and comments have suggested), but if you are looking simply for minimalist mark-up, this works (no span even used):
First Middle <strong>Last</strong>
First Middle <strong>Last</strong>
First Middle <strong>Last</strong>
strong:after {content:' '; display: block;}
Which creates "rows" with your desired styling without anything more than a single tag (which could be a span rather than a strong if you desired).
Edit: Of course, this will not work for IE7 or under.

