How to use replace in NODEJS? - javascript

How would I use replace in Node.js?
HTML:
<p> https://www.abc.co/ Lorem ipsum dolor sit amet, praesent justo sem suscipit dolor,https://www.abc.co/ maecenas pellentesque ligula vestibulum in vivamus eu </p>
I tried:
const variableabc = document.getElementsByTagName('p');
const result = variableabc.replace('/(https?:\/\/[^\s]+)/g','$1');
Expected output:
https://www.abc.co/
https://www.abc.co/
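A minimal sketch of one way to get the expected output in Node.js, assuming the paragraph text is already available as a plain string. Node has no `document`, and `getElementsByTagName` returns a collection rather than a string, so `replace` cannot be called on it. The regex is also written as a literal here, not wrapped in quotes. The variable names are illustrative only.

```javascript
// Assumed input: the paragraph text as a plain string.
const text = ' https://www.abc.co/ Lorem ipsum dolor sit amet, praesent justo sem suscipit dolor,https://www.abc.co/ maecenas pellentesque ligula vestibulum in vivamus eu ';

// match() with a global regex returns every match, or null when there is none.
const urls = text.match(/https?:\/\/[^\s]+/g) || [];

// Print one URL per line.
console.log(urls.join('\n'));
```

Since the goal is to extract the URLs rather than change the text, `match` is probably a better fit than `replace`.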

Related

Get text from <a> tags in text using javascript

I'm getting HTML content from an API.
A sample message could look like the one below:
Lorem ipsum dolor sit amet example.com
Pellentesque porta ligula et justo condimentum, nec tincidunt libero tempor.
Pellentesque nunc justo, tincidunt sit amet suscipit sit amet, auctor google.com
I need my message to look like below, plain text with
Lorem ipsum dolor sit amet example.com
Pellentesque porta ligula et justo condimentum, nec tincidunt libero tempor.
Pellentesque nunc justo, tincidunt sit amet suscipit sit amet, auctor google.com
I've tried to use regex with groups, js code below
const r = /^<a href.*>(.*?)<\/a>$/gm
let link = `google.com test test.com`
let result
while((result = r.exec(link)) !== null) {
const match = result[1];
link = link.replace(r, match)
}
console.log(link)
I also tried simple code like below
const r = /^<a href.*>(.*?)<\/a>$/gm
let link = `google.com test test.com`
link = link.replaceAll(r, "$1")
console.log(link)
Unfortunately, in both cases console.log prints "test.com" after running my code, not the whole message.
Are there any better solutions?
You do not need a regular expression for this. You can use the DOM to remove the links and any other HTML tags.
const htmlString = `Lorem ipsum dolor sit amet example.com
Pellentesque porta ligula et justo condimentum, nec tincidunt libero tempor.
Pellentesque nunc justo, tincidunt sit amet suscipit sit amet, auctor google.com`
const parser = new DOMParser();
const doc = parser.parseFromString(htmlString, "text/html");
const text = doc.body.textContent;
console.log(text);
If you just want to remove links and leave other HTML tags that is also possible.
const htmlString = `Lorem ipsum dolor sit amet example.com
Pellentesque <b>porta</b> ligula <em>et justo</em> condimentum, nec tincidunt libero tempor.
Pellentesque nunc justo, tincidunt sit amet suscipit sit amet, auctor google.com`
const parser = new DOMParser();
const doc = parser.parseFromString(htmlString, "text/html");
const anchors = doc.body.querySelectorAll("a");
anchors.forEach(node => node.replaceWith(...node.childNodes));
const htmlWithAnchorsRemoved = doc.body.innerHTML;
console.log(htmlWithAnchorsRemoved);
The pattern for removing all anchor tags from a text would be something like this:
<a.*?</a>
with the global flag.
It searches for every anchor tag in your string and matches globally (i.e. across the whole text you are using). You can use this regex with the replaceAll function. Pass it as a regex literal with the g flag, because a string pattern is matched literally. Use "$1" as the replacement instead of "" if you want to keep the link text:
let value = string.replaceAll(/<a[^>]*>(.*?)<\/a>/g, "");
Hope this helps. Let me know if you have any queries.
Regards
Using regexp to parse html is never a good path to follow. Maybe the following will help you?
const html=`Lorem ipsum dolor sit amet example.com
Pellentesque porta ligula et justo condimentum, nec tincidunt libero tempor.
Pellentesque nunc justo, tincidunt sit amet suscipit sit amet, auctor google.com`;
function html2text(html){
const o=document.createElement("div");
o.innerHTML=html;
return o.textContent;
}
console.log(html2text(html));
Thanks for all the answers. The solution from the #bobble-bubble comment works for me.
Code snippet below:
const replaceHTML = (text) => {
const rLink = /<\/?a\b[^><]*>/gi
text = text.replace(rLink, "")
return text
}
console.log(replaceHTML(`google.com`))
temp = document.createElement('template');
temp.innerHTML = text;
temp.content.querySelectorAll('a').forEach(e=>{e.replaceWith(e.href)});
console.log(temp.innerHTML);
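A small sketch comparing the anchored pattern from the question with the tag-only pattern from the asker's final solution. The original markup of the sample message did not survive, so the input string and its href values below are hypothetical.

```javascript
// Hypothetical input: the hrefs are made up for illustration.
const message = '<a href="https://google.com">google.com</a> test <a href="https://test.com">test.com</a>';

// Pattern from the question: ^ and $ anchor the whole line, and the greedy .*
// runs up to the last ">" that still lets the rest of the pattern match.
const anchored = /^<a href.*>(.*?)<\/a>$/gm;
const anchoredResult = message.replace(anchored, '$1');

// Pattern from the final solution: matches only the <a ...> and </a> tags,
// so everything between and around them is kept.
const tagOnly = /<\/?a\b[^><]*>/gi;
const tagOnlyResult = message.replace(tagOnly, '');

console.log(anchoredResult); // test.com
console.log(tagOnlyResult);  // google.com test test.com
```

With this input, the anchored pattern consumes the entire line as a single match, which would explain why only the last link text was printed.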

occurrences of any letters / javascript

I am playing around with dom manipulation and js and I am running into a problem.
Let's say I have <p id = "description"> Lorem ipsum dolor sit amet, consectetur adipiscing elit. Cras facilisis, felis et sagittis eleifend, justo ante maximus augue, id porta massa elit a ligula. </p>
and I want to write a function that counts the number of repeated letters in a paragraph. I figured out how to do that with a string, but not with a paragraph.
function recurringLetters() {
var myParagraph = document.getElementById("description").innerHTML;
}
Any thoughts?
This is how far I have gotten.
Hope this answers your question.
Just copy and paste it into an HTML file for testing.
function WORD_COUNT( _THIS_ , _WORD_ ){
var TEMP = _THIS_.innerHTML;
var COUNT= 0;
// IF TEMP search result finds nothing, return is -1, so -1 is our stopping point
while(TEMP.search(_WORD_)>-1){
TEMP = TEMP.replace(_WORD_,'');
COUNT++;
document.getElementById('output').innerHTML=COUNT;
}}
<p onmouseover="WORD_COUNT(this,'us');">PUT YOUR MOUSE OVER ME.<BR> Lorem ipsum dolor sit amet, consectetur adipiscing elit. Cras facilisis, felis et sagittis eleifend, justo ante maximus augue, id porta massa elit a ligula. </p>
<p ID="output">Output area</p>
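For the original question (counting repeated letters rather than one fixed word), here is a minimal sketch. It assumes "repeated" means a letter that occurs more than once, ignoring case, and that only a to z count as letters. The helper name countLetters is hypothetical. A paragraph is just a source of a string, so the same function works once the text has been read from the element.

```javascript
// Count how often each letter a-z occurs; case is folded, other characters are ignored.
function countLetters(text) {
  const counts = {};
  for (const ch of text.toLowerCase()) {
    if (ch >= 'a' && ch <= 'z') {
      counts[ch] = (counts[ch] || 0) + 1;
    }
  }
  return counts;
}

// Keep only the letters that occur more than once.
function recurringLetters(text) {
  return Object.fromEntries(
    Object.entries(countLetters(text)).filter(([, n]) => n > 1)
  );
}

console.log(recurringLetters('Lorem ipsum'));

// In a browser, the paragraph text could be passed in like this:
//   recurringLetters(document.getElementById('description').textContent);
```

Reading textContent instead of innerHTML avoids counting letters that belong to markup inside the paragraph.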

Node.js fs cheerio read and write multiple files

I have the following code adapted from here that I am using with Node.js and Cheerio to read html files and split large source files into small chunks. The code is working well for a single file.
Now I need to read multiple large html files and split them one after the other and output the resulting files in a folder.
How can I read and write every file in the folder and then split it?
Here is the code:
var cheerio = require('cheerio'),
fs = require('fs');
fs.readFile('./sourceHtml2/testone.html', 'utf8', dataLoaded);
function dataLoaded(err, data) {
$ = cheerio.load(data);
$('#toplevel > div').each(function (i, elem) {
var id = $(elem).attr('id'),
filename = id + '.html',
content = $.html(elem);
fs.writeFile('./output2/' + filename, content, function (err) {
console.log('Written html to ' + filename);
});
});
}
Here is my sample source file
<!DOCTYPE html SYSTEM "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Lorem Ipsum</title>
</head>
<body>
<div id="toplevel">
<div id="1-1">
<h1>HTML Ipsum Presents One</h1>
<p>
<strong>Pellentesque habitant morbi tristique</strong>senectus et netus et malesuada fames ac turpis egestas. Vestibulum tortor quam, feugiat vitae, ultricies eget, tempor sit amet, ante. Donec eu libero sit amet quam egestas semper.
<h2>Header Level 2</h2>
<ol>
<li>Lorem ipsum dolor sit amet, consectetuer adipiscing elit.</li>
<li>Aliquam tincidunt mauris eu risus.</li>
</ol>
<h3>Header Level 3</h3>
<ul>
<li>Lorem ipsum dolor sit amet, consectetuer adipiscing elit.</li>
<li>Aliquam tincidunt mauris eu risus.</li>
</ul>
</div>
<div id="1-2">
<h1>HTML Ipsum Presents Two</h1>
<p>
<strong>Pellentesque habitant morbi tristique</strong>senectus et netus et malesuada fames ac turpis egestas. Vestibulum tortor quam, feugiat vitae, ultricies eget, tempor sit amet, ante. Donec eu libero sit amet quam egestas semper.
<h2>Header Level 2</h2>
<ol>
<li>Lorem ipsum dolor sit amet, consectetuer adipiscing elit.</li>
<li>Aliquam tincidunt mauris eu risus.</li>
</ol>
<blockquote>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus magna. Cras in mi at felis aliquet congue. Ut a est eget ligula molestie gravida. Curabitur massa. Donec eleifend, libero at sagittis mollis, tellus est malesuada tellus,
at luctus turpis elit sit amet quam. Vivamus pretium ornare est.</p>
</blockquote>
<h3>Header Level 3</h3>
<ul>
<li>Lorem ipsum dolor sit amet, consectetuer adipiscing elit.</li>
<li>Aliquam tincidunt mauris eu risus.</li>
</ul>
</div>
<div id="1-3">
<h1>HTML Ipsum Presents Three</h1>
<p>
<strong>Pellentesque habitant morbi tristique</strong>senectus et netus et malesuada fames ac turpis egestas. Vestibulum tortor quam, feugiat vitae, ultricies eget, tempor sit amet, ante. Donec eu libero sit amet quam egestas semper.
<h2>Header Level 2</h2>
<ol>
<li>Lorem ipsum dolor sit amet, consectetuer adipiscing elit.</li>
<li>Aliquam tincidunt mauris eu risus.</li>
</ol>
<blockquote>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus magna. Cras in mi at felis aliquet congue. Ut a est eget ligula molestie gravida. Curabitur massa. Donec eleifend, libero at sagittis mollis, tellus est malesuada tellus,
at luctus turpis elit sit amet quam. Vivamus pretium ornare est.</p>
</blockquote>
<h3>Header Level 3</h3>
<ul>
<li>Lorem ipsum dolor sit amet, consectetuer adipiscing elit.</li>
<li>Aliquam tincidunt mauris eu risus.</li>
</ul>
</div>
</div>
</body>
</html>
Your help will be greatly appreciated.
You need to process the files in the input directory as an array, and you'll also want to prevent filename collisions in the output folder.
The code below addresses both issues. HTML files (.htm and .html) are read from the 'input' subfolder and the generated files are written to the 'output' subfolder.
var cheerio = require('cheerio'),
fs = require('fs');
// process files found in the 'input' folder
fs.readdir('./input', 'utf8', findHtmlFiles);
function findHtmlFiles(err, files) {
// stop early if the directory could not be read
if (err) throw err;
if (files.length) {
files.forEach(function (fullFilename) {
var pattern = /\.[0-9a-z]{1,5}$/i;
var ext = (fullFilename).match(pattern);
// only process '.htm' and '.html' files
// (ext is null for names without an extension)
if (ext && (ext[0] == '.htm' || ext[0] == '.html')) {
fs.readFile('./input/' + fullFilename, 'utf8', function (err, data) {
if (err)
throw err
else {
// add the file name to prevent collisions
// in the output folder
var fileData = {
file: fullFilename.slice(0, (ext[0].length * -1)),
data: data
};
dataLoaded(null, fileData);
}
});
}
});
}
}
function dataLoaded(err, fd) {
$ = cheerio.load(fd.data);
$('#toplevel > div').each(function (i, elem) {
var id = $(elem).attr('id'),
filename = fd.file + '_' + id + '.html',
content = $.html(elem);
fs.writeFile('./output/' + filename, content, function (err) {
console.log('Written html to ' + filename);
});
});
}
Sample console output:
Written html to testone_1-1.html
Written html to testone_1-2.html
Written html to testone_1-3.html
Written html to testtwo_1-1.html
Written html to testtwo_1-2.html
Written html to testtwo_1-3.html
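As a variation on the extension check and output naming above, here is a small sketch that uses Node's built-in path module instead of a regular expression. The helper names selectHtmlFiles and outputName are hypothetical and only cover the filename logic, not the reading, parsing, or writing.

```javascript
const path = require('path');

// Return only the names ending in .htm or .html (case-insensitive).
function selectHtmlFiles(names) {
  return names.filter(function (name) {
    const ext = path.extname(name).toLowerCase();
    return ext === '.htm' || ext === '.html';
  });
}

// Build "<source base name>_<div id>.html" to avoid collisions in the output folder.
function outputName(sourceName, id) {
  const base = path.basename(sourceName, path.extname(sourceName));
  return base + '_' + id + '.html';
}

console.log(selectHtmlFiles(['testone.html', 'notes.txt', 'README', 'testtwo.HTM']));
console.log(outputName('testone.html', '1-1'));
```

path.extname returns an empty string for names without an extension, so no null check is needed.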

Optimizing javascript (innerhtml, insert elements to textnodes)

I'm making a Firefox add-on to highlight words and regular expressions, and I'm having some trouble optimizing it.
This was the 1st attempt:
function highlight (searchText, replacement) {
var walker = document.createTreeWalker(document.body);
while(walker.nextNode()){
if(walker.currentNode.nodeType === 3 && searchText.test(walker.currentNode.nodeValue)){
var html = walker.currentNode.data.replace(searchText, replacement);
var wrap = document.createElement('div');
var frag = document.createDocumentFragment();
wrap.innerHTML = html;
while (wrap.firstChild) {
frag.appendChild(wrap.firstChild);
}
walker.currentNode.parentNode.replaceChild(frag,walker.currentNode);
}
}
}
But the walker.currentNode.parentNode.replaceChild(frag,walker.currentNode); line replaces the current node so the while(walker.nextNode()) stopped working.
I've solved it like this, but I was looking for a cleaner solution:
function highlight (searchText, replacement) {
var walker = document.createTreeWalker(document.body);
var nextnode=true;
while(nextnode){
if(walker.currentNode.nodeType === 3 && searchText.test(walker.currentNode.nodeValue)){
//1~2 ms
var html = walker.currentNode.data.replace(searchText, replacement);
//~11-12 ms
var wrap = document.createElement('div');
var frag = document.createDocumentFragment();
//~11-12 ms
wrap.innerHTML = html;
//~36-37 ms
while (wrap.firstChild) {
frag.appendChild(wrap.firstChild);
}
//73~74 ms
var nodeToReplace=walker.currentNode;
nextnode=walker.nextNode();
nodeToReplace.parentNode.replaceChild(frag,nodeToReplace);
//83~85 ms
}else{
nextnode=walker.nextNode();
}
}
}
I'm also trying to improve performance, so I ran some tests to find the slowest parts of the code (using a 1.64 MB lorem ipsum document). Here are my questions:
Is there a faster alternative to wrap.innerHTML = html;, which adds 25 ms to the code?
I'm pretty sure that while (wrap.firstChild) {frag.appendChild(wrap.firstChild);} can't be optimized, but it adds 37 ms, so suggestions are welcome.
Feel free to use this code. The snippet is a working example and shows how to use it.
Edited to show the latest changes; you may need to edit the excludes to be less restrictive.
var regexp = /lorem|amet/gi;
highlight (regexp,'<span style="Background-color:#33FF33">$&</span>');
function highlight (searchText, replacement) {
var excludes = 'html,head,style,title,link,script,noscript,object,iframe,canvas,applet';
var wrap = document.createElement('div');
var frag = document.createDocumentFragment();
var walker = document.createTreeWalker(document.body, NodeFilter.SHOW_TEXT);
var nextnode=true;
while(nextnode){
if(searchText.test(walker.currentNode.nodeValue)
&& (excludes + ',').indexOf(walker.currentNode.parentNode.nodeName.toLowerCase() + ',') === -1
){
var html = walker.currentNode.data.replace(searchText, replacement);
wrap.innerHTML = html;
while (wrap.firstChild) {
frag.appendChild(wrap.firstChild);
}
var nodeToReplace=walker.currentNode;
nextnode=walker.nextNode();
nodeToReplace.parentNode.replaceChild(frag,nodeToReplace);
}else{
nextnode=walker.nextNode();
}
}
}
<h1>HTML Ipsum Presents</h1>
<p><strong>Pellentesque habitant morbi tristique</strong> senectus et netus et malesuada fames ac turpis egestas. Vestibulum tortor quam, feugiat vitae, ultricies eget, tempor sit amet, ante. Donec eu libero sit amet quam egestas semper. <em>Aenean ultricies mi vitae est.</em> Mauris
placerat eleifend leo. Quisque sit amet est et sapien ullamcorper pharetra. Vestibulum erat wisi, condimentum sed, <code>commodo vitae</code>, ornare sit amet, wisi. Aenean fermentum, elit eget tincidunt condimentum, eros ipsum rutrum orci, sagittis
tempus lacus enim ac dui. Donec non enim in turpis pulvinar facilisis. Ut felis.</p>
<h2>Header Level 2</h2>
<ol>
<li>Lorem ipsum dolor sit amet, consectetuer lorem adipiscing elit.</li>
<li>Aliquam tincidunt mauris eu risus.</li>
</ol>
<blockquote>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus magna. Cras in mi at felis aliquet congue. Ut a est eget ligula molestie gravida. Curabitur massa. Donec eleifend, libero at sagittis mollis, tellus est malesuada tellus, at luctus turpis
elit sit amet quam. Vivamus pretium ornare est.</p>
</blockquote>
<h3>Header Level 3</h3>
<ul>
<li>Lorem ipsum dolor sit amet, consectetuer lorem adipiscing elit.</li>
<li>Aliquam tincidunt mauris eu risus.</li>
</ul>
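One possible direction for the innerHTML question, not measured here: build the replacement nodes directly instead of parsing an HTML string. The sketch below only covers the string-splitting step, which runs without a DOM. The helper name splitByMatches is hypothetical, and the browser part is shown as comments because it was not benchmarked and may or may not be faster in practice.

```javascript
// Split text into alternating plain and matched segments.
// The regex must have the global flag, otherwise matchAll throws.
function splitByMatches(text, regexp) {
  const segments = [];
  let last = 0;
  for (const m of text.matchAll(regexp)) {
    if (m[0].length === 0) continue; // ignore empty matches
    if (m.index > last) {
      segments.push({ text: text.slice(last, m.index), match: false });
    }
    segments.push({ text: m[0], match: true });
    last = m.index + m[0].length;
  }
  if (last < text.length) {
    segments.push({ text: text.slice(last), match: false });
  }
  return segments;
}

console.log(splitByMatches('Lorem ipsum amet.', /lorem|amet/gi));

// In a browser, the segments could be turned into nodes without innerHTML:
//   const frag = document.createDocumentFragment();
//   for (const s of splitByMatches(node.data, /lorem|amet/gi)) {
//     if (s.match) {
//       const span = document.createElement('span');
//       span.style.backgroundColor = '#33FF33';
//       span.textContent = s.text;
//       frag.appendChild(span);
//     } else {
//       frag.appendChild(document.createTextNode(s.text));
//     }
//   }
//   node.parentNode.replaceChild(frag, node);
```

Because the nodes are appended straight to the fragment, this variant would also drop the intermediate wrap element and the loop that moves its children.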

Wrap unwrapped parts of HTML string with Javascript

I have string in variable (Javascript/jQuery) containing content like this:
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
<p>Morbi a faucibus magna. Donec lacinia, leo eget</p>
Pellentesque aliquet luctus lobortis.
<p>Morbi a faucibus magna. Donec lacinia, leo eget</p>
massa iaculis leo, nec auctor
How can I wrap all unwrapped content in p tags?
So that string looks like:
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit.</p>
<p>Morbi a faucibus magna. Donec lacinia, leo eget</p>
<p>Pellentesque aliquet luctus lobortis.</p>
<p>Morbi a faucibus magna. Donec lacinia, leo eget</p>
<p>massa iaculis leo, nec auctor</p>
Thank you!
Something like
var str = 'your string';
var div = $('<div />', {html: str});
div.contents().filter(function() {
return this.nodeType === 3;
}).wrap('<p />');
var new_str = div.html();
This uses a new jQuery object to parse the string as HTML, filters for the unwrapped text nodes, wraps them in paragraphs, and outputs the changed HTML as the new string.
Here's a jQuery-free way to do it using only string methods (no DOM required).
var text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit.<p>Morbi a faucibus magna. Donec lacinia, leo eget</p>Pellentesque aliquet luctus lobortis.<p>Morbi a faucibus magna. Donec lacinia, leo eget</p>massa iaculis leo, nec auctor",
unwrapped = text.split(/<p>\b[^>]*<\/p>/g), //regex to split on all p wrapped text
i;
for (i=0; i < unwrapped.length; i++) {
text = text.replace(unwrapped[i], '<p>' + unwrapped[i] + '</p>');
};
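Another string-only sketch, under the assumption that the input contains only flat, non-nested p elements. It splits on the p blocks with a capture group, so the blocks stay in the result and repeated text fragments are handled by position rather than by a second search. The function name wrapLooseText is hypothetical.

```javascript
function wrapLooseText(html) {
  return html
    // The capture group keeps each <p>...</p> block in the split result.
    .split(/(<p\b[^>]*>[\s\S]*?<\/p>)/i)
    .map(function (part) { return part.trim(); })
    // Drop pieces that were only whitespace between blocks.
    .filter(function (part) { return part.length > 0; })
    // Wrap anything that is not already a p block.
    .map(function (part) {
      return /^<p\b/i.test(part) ? part : '<p>' + part + '</p>';
    })
    .join('\n');
}

const input = 'Lorem ipsum.\n<p>Morbi a faucibus magna.</p>\nPellentesque aliquet.\n<p>Donec lacinia.</p>\nmassa iaculis leo';
console.log(wrapLooseText(input));
```

For markup with nested or other block-level elements, the DOM-based approach shown earlier is likely the safer choice.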
