Opening/Closing tag formatting mismatch - javascript

When I receive HTML from the backend, the escaped characters sometimes contain whitespace. This prevents the tag from closing and the text from being displayed, as shown in the example below.
<p&#6 2;Please check for BAC and get customer to confirm via secure phone line the account change. "customer has wrote wrong bankname" does not confirm the account change.</p>
The whitespace can appear in various places in the tag (<p&#62 ; <p& #62; <p >) and it can also appear in the opening tags.
Is there a way to remove this whitespace so that the tag is formed correctly and the text can be formatted?

As @kissu suggested, the cleanest solution would be to fix this on the back end, especially since the closing tag is fine.
If that's not an option, you can use JavaScript to remove the extra space, e.g. with something like this:
const response = '<p&#6 2;Please check for BAC...</p>';
const fixedResponse = response.replace(/(<.*)\s(.*;)/, "$1$2");
This finds a "<" followed later by a ";" and removes one whitespace character between them.
If the response contains several issues like this, you will need to make the regex more robust.
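A more robust variant could look like the sketch below. It assumes the broken pieces are always decimal character references such as &#62; with stray whitespace after the "&", after the "#", between the digits, or before the ";". The function name fixBrokenEntities is made up for illustration. Hex references and whitespace inside the tag itself (as in <p >) are not handled.

```javascript
// Hypothetical helper: removes whitespace inside decimal character
// references, e.g. "&#6 2;", "&#62 ;" and "& #62;" all become "&#62;".
function fixBrokenEntities(html) {
  // "&", optional spaces, "#", optional spaces, digits that may be
  // separated or followed by spaces, then ";"
  return html.replace(/&\s*#\s*((?:\d\s*)+);/g, function (match, digits) {
    return '&#' + digits.replace(/\s+/g, '') + ';';
  });
}

const response = '<p&#6 2;Please check for BAC...</p>';
const fixedResponse = fixBrokenEntities(response);
```

Because the pattern uses the g flag, every broken reference in the response is repaired in one pass, not just the first one.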


Hide characters in a string

I was wondering if there is a way to hide a run of characters inside a string. I found control characters, which work for hiding a single character:
var hidden = "\26"
but what I would like is to escape a whole string of characters so that they do not show up, like this:
var hidden = "\26cantseethis\26"
Is there any such method using ASCII characters?
What I am trying to do is give state to a Google Doc. I have a workflow-type Google Apps Script attached to a form that creates a doc. The doc is immediately viewable by the administrator, so I don't want to put a bunch of special strings like &UserOneAgreed in the doc, mostly because someone could go in and modify that string. I have another script that will go in and modify the related text once some user input is gathered.
You cannot do that. A control-character escape applies to a single character only, so you would need to escape each character separately to hide them.

Converting special characters in c# string to html special characters

I am using .NET:
fulltext = File.ReadAllText(@location);
to read the text content of any file at a given location.
The result is:
fulltext="# vdk10.syx\t1.1 - 3/15/94\r# #(#)Copyright (C) 1987-1993 Verity, Inc.\r#\r# Synonym-list Database Descriptor\r#\r$control: 1\rdescriptor:\r{\r data-table: syu\r {\r worm:\tTHDSTAMP\t\tdate\r worm:\tQPARSER\t\t\ttext\r\t/_hexdata = yes\r varwidth:\tWORD\t\tsyv\r fixwidth:\tEXPLEN\t\t\t2 unsigned-integer\r varwidth:\tEXPLIST\t\tsyx\r\t/_hexdata = yes\r }\r data-table: syw\r {\r varwidth:\tSYNONYMS\tsyz\r }\r}\r\r ";
Now I want to display this fulltext in an HTML page so that the special characters are rendered properly. For example, \r should be replaced by a line break tag so that the text is properly formatted when the page is displayed.
Is there any .NET class to do this? I am looking for a universal method, since I am reading a file and it can contain any special characters. Thanks in advance for any help or direction.
You're trying to solve two problems:
1. Ensure special characters are properly encoded
2. Pretty-print your text
Solve them in this order:
1. First, encode the text by importing the System.Web namespace and using HttpUtility.HtmlEncode (this has been asked on Stack Overflow before). Use the result in step 2.
2. Pretty-printing is trickier and depends on how much formatting you want. Here are a few approaches, in increasing order of difficulty:
- Put the text in a pre element. This should preserve newlines, tabs and spaces. You can still adjust the font with CSS if you first put a CSS class on the pre.
- Replace every \r\n, and then any remaining \r or \n, with <br/>.
- Study the structure of your text, parse it according to this structure, and provide specific tags in specific contexts. For example, the tab characters in your example may be indicative of a list of items. HTML provides the ol and ul elements for lists. Similarly, consecutive line breaks may indicate paragraphs, for which HTML provides the well-known p element.
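The encode-then-replace order described above can be sketched as follows. This is written in JavaScript rather than C#, purely as an illustration of the two steps. The function names escapeHtml and textToHtml are made up, and the set of escaped characters is a minimal assumption, not a reproduction of what HttpUtility does.

```javascript
// Step 1: encode the characters that are special in HTML.
// "&" must be handled first so already-produced entities are not re-encoded.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Step 2: turn each \r\n, lone \r or lone \n into a single <br/>.
// "\r\n" is listed first so a Windows line ending gives one break, not two.
function textToHtml(text) {
  return escapeHtml(text).replace(/\r\n|\r|\n/g, '<br/>');
}
```

Doing the steps in the opposite order would encode the inserted <br/> tags themselves, which is why encoding comes first.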
Thanks, everyone, for your valuable comments. I solved my formatting problem on the client side with the following code.
document.getElementById('textView').innerText = fulltext;
Here textView is the div where I want to display fulltext. I don't think I need to replace the special characters in the string fulltext. The output is as shown in the figure.

Why are there backward slashes in the parameter element of the javascript object?

I was inspecting this site in Firebug. Inside the third <script/> tag in the head section of the page, I found an object variable declared in the following way (truncated here by me):
var EM={
The utility of the variable is unknown to me. What struck me is the 3 urls presented there. Why are the backward slashes present there? Couldn't it be something like :
"ajaxurl" : ""
In a script element there are various character sequences (depending on the version of HTML) that will terminate the element. </script> will always do this.
<\/script> will not.
Escaping / characters will not change the meaning of the JS, but will prevent any such HTML from ending the script.
The \/\/ is there to avoid the following scenario:
when the URL looks something like "ajaxurl" : "</script>"
Try pasting the escaped URL into the browser's address bar; it is handled correctly. Without the escaping, you might end up with script errors and the page might not work as you expected.
Imagine DOM manipulators copying the value as-is into the src attribute of a script tag, and the JavaScript engine then reporting multiple errors because the referenced script cannot be loaded from an incorrectly defined src value.
Hope this helps.
Life would be hectic without these lil things
It is used to escape characters.
The backslash (\) can be used to insert apostrophes, new lines, quotes, and other special characters into a string.
var str = " Hello "World" !! ";
This won't work.
You have to escape the quotes first:
var str = " Hello \"World\" !! ";
alert(str); // This works
As far as JavaScript is concerned, </ and <\/ are identical inside a string. As far as HTML is concerned, </ starts an end tag but <\/ does not.
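A short sketch of both points, for illustration only. The URL is a made-up placeholder, not the one from the inspected site, and toInlineScriptJson is a hypothetical helper name.

```javascript
// Inside a JavaScript string, "\/" is just another way to write "/".
const escapedUrl = "http:\/\/example.com\/admin-ajax.php"; // placeholder URL
const plainUrl = "http://example.com/admin-ajax.php";
const sameString = escapedUrl === plainUrl; // the two literals are equal

// Hypothetical helper: serialise a value for embedding in an inline
// <script> block, escaping "</" so the HTML parser cannot see "</script>".
function toInlineScriptJson(value) {
  return JSON.stringify(value).replace(/<\//g, '<\\/');
}

const embedded = toInlineScriptJson({ ajaxurl: '</script>' });
// JSON also treats "\/" as "/", so parsing gives back the original value.
const roundTripped = JSON.parse(embedded).ajaxurl;
```

The escaped form changes nothing for the script that reads the value; it only changes what the HTML parser sees before the script runs.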

Javascript regex whitespace is being wacky

I'm trying to write a regex that searches a page for any script tags and extracts the script content, and in order to accommodate any HTML-writing style, I want my regex to include script tags with any arbitrary number of whitespace characters (e.g. <script type = blahblah> and <script type=blahblah> should both be found). My first attempt ended up with funky results, so I broke down the problem into something simpler, and decided to just test and play around with a regex like /\s*h\s*/g.
When I tested it on a string, seemingly arbitrary amounts of whitespace around the 'h' would match and other amounts wouldn't, e.g. something like " h " would match but " h " wouldn't. Does anyone have an idea why this is occurring, or what error I'm making?
Since you're using JavaScript, why can't you just use getElementsByTagName('script')? That's how you should be doing it.
If you somehow have an HTML string, create an iframe and dump the HTML into it, then run getElementsByTagName('script') on it.
OK, to extend Kolink's answer, you don't need an iframe, or event handlers:
var temp = document.createElement('div');
temp.innerHTML = otherHtml;
var scripts = temp.getElementsByTagName('script');
... now scripts is a DOM collection of the script elements - and the script doesn't get executed ...
Why regex is not a fantastic idea for this:
As a <script> element may not contain the string </script> anywhere, writing a regex to match them would not be difficult: /<script[\s\S]+?<\/script>/gi
It looks like you want to match only scripts with a specific type attribute. You could try to include that in your pattern too: /<script[^>]+type\s*=\s*(["']?)blahblah\1[\s\S]*?<\/script>/gi - but that is horrible. (That's what happens when you use regular expressions on irregular strings; you need to simplify.)
So instead you iterate through all the basic matched scripts, extract the starting tag: result.match(/<script[^>]*>/i)[0] and within that, search for your type attribute /type\s*=\s*((["'])blahblah\2|\bblahblah\b)/.test(startTag). Oh look - it's back to horrible - simplify!
This time via normalisation:
startTag = startTag.replace(/\s*=\s*/g, '=').replace(/=([^\s"'>]+)/g, '="$1"') - now you're in danger territory, what if the = is inside a quoted string? Can you see how it just gets more and more complicated?
You can only have this work using regex if you make robust assumptions about the HTML you'll use it on (i.e. to make it regular). Otherwise your problems will grow and grow and grow!
disclaimer: I haven't tested any of the regex used to see if they do what I say they do, they're just example attempts.
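For HTML where those assumptions hold, a minimal sketch of the basic extraction might look like this. It assumes no attribute value contains a ">" and that no script body contains the text "</script". The name extractScripts is made up for illustration. With a real document, the DOM approach above remains the safer choice.

```javascript
// Hypothetical helper: returns the text content of every <script> element
// found in an HTML string, using a regex instead of a DOM parser.
function extractScripts(html) {
  // <script, any attributes up to ">", lazily captured body, closing tag.
  // [\s\S] is used because "." does not match line breaks by default.
  const pattern = /<script\b[^>]*>([\s\S]*?)<\/script\s*>/gi;
  return Array.from(html.matchAll(pattern), function (m) {
    return m[1];
  });
}

const sample =
  '<p>x</p><script type = "a">one()</script>' +
  '<SCRIPT>two()\n</SCRIPT>';
const found = extractScripts(sample);
```

The [^>]* part is what tolerates arbitrary whitespace around attributes, since it accepts anything between the tag name and the closing ">".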

Regular expression for detecting hyperlinks

I've got this regex pattern from WMD showdown.js file.
and the code is:
text = text.replace(/<((https?|ftp|dict):[^'">\s]+)>/gi,"$1");
But when I set text to a URL, it does not turn it into an anchor; it returns the original text value as is.
P.S: I've tested it with RegexPal and it does not match.
Your code is searching for a URL wrapped in <>.
Just change it to /((https?|ftp|dict):[^'">\s]+)/gi if you don't want it to search for the <>.
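The difference between the two patterns can be seen in a small sketch. The linkify function and the anchor markup in the replacement string are assumptions made for illustration, and example.com is a placeholder.

```javascript
// Pattern from the question: only matches URLs wrapped in <...>.
const bracketed = /<((https?|ftp|dict):[^'">\s]+)>/gi;
// Pattern from the answer: matches bare URLs as well.
const bare = /((https?|ftp|dict):[^'">\s]+)/gi;

// Hypothetical helper: wraps every match of the pattern in an anchor.
function linkify(text, pattern) {
  return text.replace(pattern, '<a href="$1">$1</a>');
}

const bareInput = 'see http://example.com now';
const wrappedInput = 'see <http://example.com> now';

const unchanged = linkify(bareInput, bracketed);       // no <>, so no match
const linkedBare = linkify(bareInput, bare);           // bare URL is linked
const linkedWrapped = linkify(wrappedInput, bracketed); // <> is consumed
```

With the bracketed pattern the angle brackets are part of the match, so they disappear from the output; with the bare pattern any surrounding text is left untouched.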
As long as you know your URLs start with http:// or https:// or whatever, you can use:
The expression will match until it encounters a character not allowed in the URL, i.e. one that is not A-Za-z\.\-. It will not, however, detect anything of the form or anything that comes after the domain name, such as parameters or subdirectory paths. If that is your requirement, then you can simply change the terminating condition to the one you have above in your regex.
I know it seems pointless but it may be useful if you want the display name to be something abbreviated rather than the whole url in case of complex urls.
You could use:
var re = /(http|https|ftp|dict)(:\/\/\S+?)(\.?\s|\.?$)/gi;
el.innerHTML = el.innerHTML.replace(re, '<a href=\'$1$2\'>$1$2<\/a>$3');
to also match URLs at the end of sentences.
But you need to be very careful with this technique, make sure the content of the element is more or less plain text and not complex markup. Regular expressions are not meant for, nor are they good at, processing or parsing HTML.
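To see how the end-of-sentence handling behaves without touching the DOM, the same pattern can be applied to a plain string. This is only a sketch: linkifyText is a made-up name and the URLs are placeholders.

```javascript
// Hypothetical helper: applies the pattern above to a plain-text string.
function linkifyText(text) {
  // $1 = scheme, $2 = rest of the URL (lazy), $3 = optional trailing "."
  // plus the whitespace or end of string that terminates the URL.
  const re = /(http|https|ftp|dict)(:\/\/\S+?)(\.?\s|\.?$)/gi;
  return text.replace(re, "<a href='$1$2'>$1$2</a>$3");
}

const midSentence = linkifyText('Visit http://example.com. Bye');
const atEnd = linkifyText('See https://example.org/a?b=1');
```

The third group is what keeps a sentence-ending period outside the link: it is matched separately and written back after the closing tag.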

