I want to write an GnuPG extension for Google Chrome. So far, everything works as expected: If I detect ASCII armored crypt-text, I parse it with my extension and then replace it. (after password has been entered)
Gmail however litters the message body with an insane amount of tags, so my simple JS approach doesn't work anymore. Is there something which can select an certain amount of visible text, no matter how many tags are contained in it, and replace it with some other text? (the tags don't need to survive). ie I want to unencrypt the mailbody in place.
what do you need is something like this:
/<[^>]+>/g
this regexp will remove all tags, an leave plain text...
just gotta replace for nothing... something like this:
"<p>text <b>full</b> of <i>junk</i> and <u>unwanted</u> tags</p>".replace(/<[^>]+>/g, "");
...and about selecting an specific part you can use substring, I guess!
What I really needed to do was a little different:
expand my regex so it didn't care about tags:
var re = /-----[\s\S]+?-----[\s\S]+?-----[\s\S]+?-----/gm;
store all the matches, with tags
use the regex provided by gibatronic to remove tags and then further process the cleaned text using gpg
use body.innerHTML.replace() to replace the matches from 1) with the processed text from 3)
It works now, the only problem is it breaks Gmail. Site layout stays intact, but all buttons and links become defunct. Only solution is to reload the page. Gotta fix this :S
Related
I have a simple search engine on one of our older websites. This site is running IIS 6.0 on Windows Server 2003. The search functionality is provided by Microsoft Indexing Service.
You can see the search functionality on our website. (Just type in "speakers" and you will see some hits.
I would like to use the "FullHit" feature offered by the indexing service. When using this feature the Indexing service inserts the full hit results in between "begindetail" and "enddetail" on a target web page.
The problem that I have is that the documents that are being returned have HTML. This looks messy. (Just click on "Hit Locator Tool" in the search results above to see what I mean.)
I would like to create a DIV section such as ...
<DIV name="target">
begindetail
enddetail
</DIV>
Then, after the page is populated I would like to use javascript to strip out all of the HTML elements (but not the data) between the opening and closing DIV.
For example, <FONT color="magenta">Good Data</FONT> would be modified to only show Good Data.
I can also use Classic ASP if necessary.
Please let me know if you have any suggestions or know of any functions that I can add to the target page to accomplish this task.
Thanks in advance.
I inspected your webpage, and there definitely must be some logic errors in your ASP code. (1) Instead of something like <div></div> being passed to the browser, it is HTML entities for special characters, so it is being passed like <DIV> </DIV>, which is very ugly and is why it is rendering as text instead of HTML code. In your ASP code, you must not be parsing the search result text before passing it to the browser. (2) All of this improperly-formatted code is inserted after the first closing html tag, and then there are closing body and html tags after the improperly-formatted code, so somewhere in your ASP code, you are telling it to append the code to the end of the document, rather than insert it inside the original <body></body>.
If you want to decode the mixture of HTML entities, <br> tags, and text into rendered HTML, this JavaScript may work:
window.onload = function() {
var text = decodeHTMLEntities(document.body.innerText);
document.write(text);
}
function decodeHTMLEntities(text) {
var entities = [
['amp', '&'],
['apos', '\''],
['#x27', '\''],
['#x2F', '/'],
['#39', '\''],
['#47', '/'],
['lt', '<'],
['gt', '>'],
['nbsp', ' '],
['quot', '"']
];
for (var i = 0, max = entities.length; i < max; ++i)
text = text.replace(new RegExp('&'+entities[i][0]+';', 'g'), entities[i][1]);
return text;
}
jsFiddle: https://jsfiddle.net/6ohc1tkr/
But first things first, you need to fix your ASP code, or whatever you use to parse and then display the search results. That's what is causing the improper formatting and display of the HTML. Show us your back-end code and then we can help you.
This is what I used to accomplish what you are trying to do.
string-strip-html
It worked pretty well for me.
I now have the search feature working as expected. I would like to thank everyone for their insightful comments. This feedback helped me identify and fix the problem.
OS: Windows Server 2003
IIS: 6.0
Microsoft Index Server
The hit locator tool will only work properly for HTML pages. If you use this tool with a simple TXT file then the results will not be displayed correctly.
I know there are other questions on editable divs, but I couldn't find one specific to the Markdown-related issue I have.
User will be typing inside a ContentEditable div. And he may choose to do any number of Markdown-related things like code blocks, headers, and whatever.
I am having issues extracting the source properly and storing it into my database to be displayed again later by a standard Markdown parser. I have tried two ways:
$('.content').text()
In this method, the problem is that all the line breaks are stripped out and of course that is not okay.
$('.content').html()
In this method, I can get the line breaks working fine by using regex to replace <br\> with \n before inserting into database. But the browser also wraps things like ## Heading Here with divs, like this: <div>## Heading Here</div>. This is problematic for me because when I go to display this afterwards, I don't get the proper Markdown formatting.
What's the best (most simple and reliable) way to solve this problem as of 2015?
EDIT: Found a potential solution here: http://www.davidtong.me/innerhtml-innertext-textcontent-html-and-text/
if you check the documentation of jquery's .text() method,
The result of the .text() method is a string containing the combined text of all matched elements. (Due to variations in the HTML parsers in different browsers, the text returned may vary in newlines and other white space.)
so getting whitespaces is not guaranteed in all browsers.
try using the innerText property of the element.
document.getElementsByClassName('content')[0].innerText
this returns the text with all white spacing intact. But this is not cross browser compatible. It works in IE and Chrome, but not in Firefox.
the innerText equivalent for Firefox is textContent (link), but that strips out the whitespaces.
This is what I've been able to come up with using that link I posted above in my edit. It's in Coffeescript.
div = $('.content')[0]
if div.innerText
text = div.innerText
else
escapedText = div.innerHTML
.replace(/(?:\r\<br\>|\r|\<br\>)/g, '\n')
.replace(/(\<([^\>]+)\>)/gi, "")
text = _.unescape(escapedText)
Basically, I'm checking whether or not innerText works, and if it doesn't then we do this other thing where we:
Take the HTML, which has escaped text.
Replace all the <br> tags with line breaks.
Strip out any tags (escaped ones won't be stripped, i.e. the stuff the user types).
Unescape the escaped text.
When replacing things in my chat room it comes up in the box as the 'HTML Character Entities'. However, I want it to revert back and actually show the character typed in when it is then shown in the chat room. So I am using the following code to stop any html from being entered and damaging the chat room by replacing certain html character with there entities (I want to get one or two working before I look at the others I know there are many more.) ....
Javascript
var str1 = this.value.replace(/>/g, '<');
if (str1!=this.value) this.value=str1;
var str2 = this.value.replace(/</g, '>');
if (str2!=this.value) this.value=str2;
and then the following code then displays the text after it has been entered into the database etc. and on updating the chat box it uses the following to add in the the updated messages ...
Returned from php and then displayed through the following javascript
$('#chatroomarea').append($("<p>"+ data.text[i] +"</p>"));
I have messed around with this a few times changing it to val and using
.html(.append($("<p>"+ data.text[i] +"</p>")));
Etc. But I have had no luck. I'm not quite sure how to do this I just need the HTML Character Entities to actually show up back in there true Character instead of displaying something such as... '>'
This might be something I need to actually put within the replacing code where it will include code of it's own on replacing such as (this is just an example I'm not exactly sure on how I would write it) ....
var str1 = this.value.replace(/>/g, '.html(<)');
Any help on this would be much appreciated, Thank you.
$('#chatroomarea').append($("<xmp>"+ data.text[i] +"</xmp>"));
HTML xmp tag
The use is deprecated, but supported in most browsers.
Another option will be to use a styled textarea , To my knowledge these two are the tags that doesn't bother rendering html tags as it is.
I am playing around with creating an HTML-textarea based plain text editor to edit my scripts (using e.g. Mozilla Prism + a localhost install/ webserver). It works fine so far, but when I want to insert something at the cursor position, it gets slow in Firefox when there is a lot of text in the textarea (Chrome works fine). E.g. with 133k filled in the textarea it takes around 1 sec to perform inserting 4 spaces.
I already have and use elm.selectionStart and elm.selectionEnd. Based on these I then copy the text, manipulate it, and set the value back into the textarea -- perhaps that is what's causing the bottleneck (I'm using the similar approach as answered on this site before). Ideally, I would probably like to have something like elm.selectedText = 'foobar' but can't find this...
It doesn't necessarily need to be crossbrowser...
Can someone help?
According to this article on codemirror, using designMode is faster than using a textarea, because you can edit parts of the content instead of editing the whole text in one go.
There's an API that replaces the selected text: textarea.setRangeText('text').
Here's a demo:
const textarea = document.querySelector('textarea');
textarea.addEventListener('click', () => {
textarea.setRangeText('WOW');
});
<textarea rows="10" cols="40">Click anywhere or select any text in here. It will be replaced by WOW</textarea>
There's also document.execCommand('insertText') with undo support but it's not cross-browser. Try insert-text-textarea for a cross-browser solution.
I want to replace a string in HTML page using JavaScript but ignore it, if it is in an HTML tag, for example:
visit google search engine
you can search on google tatatata...
I want to replace google by <b>google</b>, but not here:
visit google search engine
you can search on <b>google</b> tatatata...
I tried with this one:
regex = new RegExp(">([^<]*)?(google)([^>]*)?<", 'i');
el.innerHTML = el.innerHTML.replace(regex,'>$1<b>$2</b>$3<');
but the problem: I got <b>google</b> inside the <a> tag:
visit <b>google</b> search engine
you can search on <b>google</b> tatatata...
How can fix this?
You'd be better using an html parser for this, rather than regex. I'm not sure it can be done 100% reliably.
You may or may not be able to do with with a regexp. It depends on how precisely you can define the conditions. Saying you want the string replaced except if it's in an HTML tag is not narrow enough, since everything on the page is presumably within some HTML tag (BODY if nothing else).
It would probably work better to traverse the DOM tree for this instead of trying to use a regexp on the HTML.
Parsing HTML with a regular expression is not going to be easy for anything other than trivial cases, since HTML isn't regular.
For more details see this Stackoverflow question (and answers).
I think you're all missing the question here...
When he says inside the tag, he means inside the opening tag, as in the <a href="google.com"> tag...This is something quite different than text, say, inside a <p> </p> tag pair or <body> </body>. While I don't have the answer yet, I'm struggling with this same problem and I know it has to be solvable using regex. Once I figure it out, i'll come back and post.
WORKAROUND
If You can't use a html parser or are quite confident about Your html structure try this:
do the "bad" changing
repeat replace (<[^>]*)(<[^>]+>) to $1 a few times (as much as You need)
It's a simple workaround, but works for me.
Cons?
Well... You have to do the replace twice for the case ... ...> as it removes only first unwanted tag from every tag on the page
[edit:]
SOLUTION
Why not use jQuery, put the html code into the page and do something like this:
$(containerOrSth).find('a').each(function(){
if($(this).children().length==0){
$(this).text($(this).text().replace('google','evil'));
}else{
//here You have to care about children tags, but You have to know where to expect them - before or after text. comment for more help
}
});
I'm using
regex = new RegExp("(?=[^>]*<)google", 'i');
you can't really do that, your "google" is always in some tag, either replace all or none
Well, since everything is part of a tag, your request makes no real sense. If it's just the <a /> tag, you might just check for that part. Mainly by making sure you don't have a tailing </a> tag before a fresh <a>
You can do that using REGEX, but filtering blocks like STYLE, SCRIPT and CDATA will need more work, and not implemented in the following solution.
Most of the answers state that 'your data is always in some tags' but they are missing the point, the data is always 'between' some tags, and you want to filter where it is 'in' a tag.
Note that tag characters in inline scripts will likely break this, so if they exist, they should be processed seperately with this method. Take a look at here :
complex html string.replace function
I can give you a hacky solution…
Pick a non printable character that’s not in your string…. Dup your buffer… now overwrite the tags in your dup buffer using the non printable character… perform regex to find position and length of match on dup buffer … Now you know where to perform replace in original buffer