I want to mark part of a plain-text string for emphasis without putting extra characters around it.
I figured out that I can use combining characters and tried \u0332 (combining low line), but when I tried to emphasize a string that includes digits and white space, I found that the character doesn't combine with them (it attaches to the next character instead):
console.log("8 o'clock".replace(/./g, a => a + "\u{0332}"));
I'm looking for a combining character that works here, a way to make \u0332 combine with every character, or any similar idea for marking plain text.
Assume we have text such as the following.
Title: (some text)
My Title [abc]
Content: (some test)
My long content paragraph. With multiple sentences. [abc]
Short Content: (some text)
Short content [abc]
Using Javascript and RegEx, is it possible to extract the text so that it would be as follows.
Title: My Title
Content: My long content paragraph. With multiple sentences.
Short Content: Short content
Basically ignoring new lines and text in the () and [] brackets?
I've tried to use RegEx but I can't get it to do exactly what I'd like. I also have the problem that when I match Content: I get a match for both Content: and Short Content:, whereas I only want to match the occurrence that is an exact match.
EDIT:
I'm new to RegEx. So far to extract the titles such as Title:, Content: and so on I have
/[A-Za-z]+:|[A-Za-z]+ [A-Za-z]+:|[A-Za-z]+ [A-Za-z]+ [A-Za-z]+:|[A-Za-z]+ [A-Za-z]+ [0-9]+:/g
And then I loop through and use this
[TITLENAME]:.*\n.*
I'm struggling to get past this. My next step would be to loop through the text that is matched above and then remove the bracket stuff. I'm sure there is a better way to do this!
You could use String.replace( /(\(|\)|\[|\])/g , '')
If you take a string and call the replace method with these two arguments, it will return a string with the ()[] characters removed. I have escaped them all with \ since they are special characters in regex. It might be a little overzealous.
Also, the g flag makes the regular expression global, so it will remove all instances.
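As a minimal sketch of what that call does: the first pattern is the one from the answer and strips only the bracket characters themselves. The second pattern, `BRACKETED_TEXT`, is a hypothetical variant (not from the answer) that also drops the text between the brackets, which is closer to the output the question asks for.

```javascript
// Pattern from the answer: matches any single (, ), [ or ] character.
const BRACKET_CHARS = /(\(|\)|\[|\])/g;

// Hypothetical variant (an assumption, not part of the answer): matches a whole
// (...) or [...] group, plus any white space in front of it.
const BRACKETED_TEXT = /\s*(\([^)]*\)|\[[^\]]*\])/g;

const sample = "My Title [abc] (some text)";

// Only the bracket characters are removed; the enclosed text stays.
const withoutBrackets = sample.replace(BRACKET_CHARS, "");

// The bracketed groups are removed entirely.
const withoutGroups = sample.replace(BRACKETED_TEXT, "");

console.log(withoutBrackets); // My Title abc some text
console.log(withoutGroups);   // My Title
```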
If the text within the parentheses or brackets (e.g. 'abc') is fixed and has a special meaning, you can also go with: '/(\(some text\)\n|\(some test\)\n|(\[abc\]))|(^$\n)/gm'.
This way you would allow parentheses in the real text that you want to preserve, e.g. some text (this I want to preserve) and other text.
Please note the multiline m flag.
https://regex101.com/r/cS3pRR/1
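Putting the pieces of this thread together, here is one possible sketch of the whole extraction. It assumes that every label sits on its own line ending in a colon and that its value is on the next non-empty line; the function name `extractFields` is made up for illustration. Anchoring the label pattern at the start of the line is what keeps Content: from also matching inside Short Content:.

```javascript
// Sketch only: assumes "Label: (note)" lines, each followed by a value line.
function extractFields(text) {
  // Split into trimmed, non-empty lines.
  const lines = text
    .split(/\r?\n/)
    .map(line => line.trim())
    .filter(line => line !== "");

  const result = [];
  for (let i = 0; i < lines.length; i++) {
    // ^ anchors the label at the start of the line, so "Content" cannot
    // match in the middle of "Short Content".
    const label = /^([A-Za-z][A-Za-z0-9 ]*):/.exec(lines[i]);
    if (label && i + 1 < lines.length) {
      // Drop (...) and [...] groups from the value line.
      const value = lines[i + 1]
        .replace(/\s*(\([^)]*\)|\[[^\]]*\])/g, "")
        .trim();
      result.push(label[1] + ": " + value);
      i++; // The value line has been consumed.
    }
  }
  return result;
}

const input = [
  "Title: (some text)",
  "My Title [abc]",
  "Content: (some test)",
  "My long content paragraph. With multiple sentences. [abc]",
  "Short Content: (some text)",
  "Short content [abc]"
].join("\n");

console.log(extractFields(input).join("\n"));
```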
I have a text box with a bunch of comments, all separated by a specific character string as a means of splitting them to display each comment individually.
The string in question is | but I can change this to accommodate whatever will work. My only requirement is that it is not likely to be a string of characters someone will type in an everyday sentence.
I believe I need to use the split method and possibly some regex but all the other questions I've seen only seem to mention splitting by one character or a number of different characters, not a specific set of characters in a row.
Can anyone point me in the right direction?
.split() should work for that purpose:
var comments = "this is a comment|and here is another comment|and yet another one";
var parsedComments = comments.split('|');
This will give you all comments in an array which you can then loop over or do whatever you have to do.
Keep in mind you could also change | to something like <--NEWCOMMENT--> and it will still work fine inside the split('<--NEWCOMMENT-->') method.
Remember that split() removes the delimiter it splits on, so your resulting array won't contain any instances of <--NEWCOMMENT-->.
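A short sketch of the multi-character delimiter variant described above. The trimming and empty-entry filtering are extras added here as an assumption about what a comment list would usually need; they are not required by split() itself.

```javascript
// Multi-character delimiter, as suggested in the answer.
const DELIMITER = "<--NEWCOMMENT-->";

const rawComments =
  "this is a comment<--NEWCOMMENT--> and here is another comment <--NEWCOMMENT-->and yet another one<--NEWCOMMENT-->";

// split() takes the whole string as one delimiter and drops it from the result.
// Trimming and filtering are optional clean-up (an addition, not part of the answer).
const parsedComments = rawComments
  .split(DELIMITER)
  .map(comment => comment.trim())
  .filter(comment => comment.length > 0);

console.log(parsedComments);
```

No regular expression is needed here, because split() accepts a plain string of any length as the separator.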
I've searched Stack Overflow and found similar questions, but none of them is really what I want.
Given an XML string "<a b=\"c\"></a>" in a JavaScript context, I want to create a regex that will capture the attribute value, including the quotation marks.
NOTE: this is similar if you're using single quotation marks.
Currently I have a regular expression tailored to the XML specification:
[_A-Za-z][\w\.\-]*(?:=\"[^\"]*\")?
[_A-Za-z][\w\.\-]* //This will match the attribute name.
(?:=\"[^\"]*\")? //This will match the attribute value.
\"[^\"]*\" //This part concerns me.
My question now is, what if the xml string looks like this:
<shout statement="Hi! \"Richeve\"."></shout>
I know this may be a naive question, but I want to handle the rare cases where this can happen. I know the coder could use single quotes here, but sometimes we can't know the attribute value in advance because it changes dynamically at runtime.
So to make this clearer, the result of that using the correct regex should be:
"Hi! \"Richeve\"."
I hope my question is clear. Thanks for all the help!
PS: Note that the language is JavaScript. I know it is tempting to use lookbehinds, but they are currently not supported.
PS: I know it is really hard to parse XML, but I have an elegant solution for that :) so I just need this small problem solved. The only focus of this question is capturing quoted string tokens that contain quotation marks inside the token.
The standard pattern for content with matching delimiters and embedded escaped delimiters goes like this:
"[^"\\]*(?:\\.[^"\\]*)*"
Ignoring the obvious first and last characters in the pattern, here's how the rest of the pattern works:
[^"\\]*: Consume all characters until a delimiter OR backslash (matching Hi! in your example)
(?:\\.[^"\\]*)*: Try to consume a single escaped character \\. followed by a series of non-delimiter/non-backslash characters, repeatedly (matching \"Richeve first and then \". next in your example)
That's it.
You can try to use a more generic delimiter approach using (['"]) and back references, or you can just allow for an alternate pattern with single quotes like so:
("[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*')
Here's another description of this technique that might also help (see the section called Strings): http://www.regular-expressions.info/examplesprogrammer.html
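To see the pattern at work in JavaScript, here is a small sketch that applies the combined double-quote/single-quote alternative to the example from the question. The names `QUOTED` and `xml` are chosen for illustration; String.raw is used only so the backslashes in the sample stay literal.

```javascript
// The alternation from the answer: a double-quoted or single-quoted token
// that may contain backslash-escaped characters.
const QUOTED = /"[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*'/;

// String.raw keeps every backslash as written, so the sample contains \" literally.
const xml = String.raw`<shout statement="Hi! \"Richeve\"."></shout>`;

const match = QUOTED.exec(xml);

// The match includes the surrounding quotes and the escaped inner quotes.
console.log(match[0]); // "Hi! \"Richeve\"."
```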
Description
I'm pretty sure embedding double quotes inside a double-quoted attribute value is not legal XML. You could use a character reference for the double quote instead, such as &quot; or &#x22;, inside the value.
However to answer the question, this expression will:
allow escaped quotes inside attribute values
capture the value of the statement attribute
allow attributes to appear in any order inside the tag
avoid many of the edge cases which will trip up pattern matching inside HTML text
not use lookbehinds
<shout\b(?=\s)(?=(?:[^>=]|='(?:[^']|\\')*'|="(?:[^"]|\\")*"|=[^'"][^\s>]*)*?\sstatement=(['"])((?:\\['"]|.)*?)\1(?:\s|\/>|>))(?:[^>=]|='(?:[^']|\\')*'|="(?:[^"]|\\")*"|=[^'"][^\s>]*)*>.*?<\/shout>
Example
Pretty Rubular
Ugly RegexPlanet set to Javascript
Sample Text
Note the difficult edge case in the first attribute :)
<shout onmouseover=' statement="He said \"I am Inside the onMouseOver\" " ; if ( 6 > a ) { funRotate(statement) } ; ' statement="Hi! \"Richeve\"." title="sometitle">SomeString</shout>
Matches
Group 0 gets the entire tag from open to close
Group 1 gets the quote surrounding the statement attribute value, this is used to match the closing quote correctly
Group 2 gets the statement attribute value which may include escaped quotes like \" but not including the surrounding quotes
[0][0] = <shout onmouseover=' statement="He said \"I am Inside the onMouseOver\" " ; if ( 6 > a ) { funRotate(statement) } ; ' statement="Hi! \"Richeve\"." title="sometitle">SomeString</shout>
[0][1] = "
[0][2] = Hi! \"Richeve\".
I am using .NET:
fulltext = File.ReadAllText(@location);
to read the text content of any file at a given location.
I got result as:
fulltext="# vdk10.syx\t1.1 - 3/15/94\r# #(#)Copyright (C) 1987-1993 Verity, Inc.\r#\r# Synonym-list Database Descriptor\r#\r$control: 1\rdescriptor:\r{\r data-table: syu\r {\r worm:\tTHDSTAMP\t\tdate\r worm:\tQPARSER\t\t\ttext\r\t/_hexdata = yes\r varwidth:\tWORD\t\tsyv\r fixwidth:\tEXPLEN\t\t\t2 unsigned-integer\r varwidth:\tEXPLIST\t\tsyx\r\t/_hexdata = yes\r }\r data-table: syw\r {\r varwidth:\tSYNONYMS\tsyz\r }\r}\r\r ";
Now I want to display this fulltext in an HTML page so that special characters are handled properly. For example, \r should be replaced by a line-break tag
so that the text is properly formatted when the page is displayed.
Is there a .NET class that does this? I am looking for a universal method, since I am reading from a file and it can contain any special characters. Thanks in advance for any help or direction.
You're trying to solve two problems:
Ensure special characters are properly encoded
Pretty-print your text
Solve them in this order:
First, encode the text by importing the System.Web namespace and using HttpUtility.HtmlEncode (covered in another Stack Overflow question). Use the result in step 2.
Pretty-printing is trickier, depending on the amount of pretty-printing that you want. Here are a few approaches, in increasing order of difficulty:
Put the text in a pre element. This should preserve newlines, tabs, spaces. You can still adjust the font used using CSS if you first slap a CSS class on the pre.
Replace all \r, \r\n and remaining \n with <br/>.
Study the structure of your text, parse it according to this structure, and provide specific tags in specific contexts. For example, the tab characters in your example may be indicative of a list of items. HTML provides the ol and ul elements for lists. Similarly, consecutive line breaks may indicate paragraphs, for which HTML provides the well known p element.
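Since the asker ended up handling the display on the client, here is a rough JavaScript sketch of the first two ideas: encode first, then turn line breaks into <br/>. The function names `escapeHtml` and `toHtmlWithBreaks` are made up for this example, and the escaping covers only the five characters most commonly encoded; it is not a substitute for HttpUtility on the server side.

```javascript
// Step 1: encode characters that have a special meaning in HTML.
// & must be replaced first so already-produced entities are not re-encoded.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Step 2: replace \r\n, lone \r and lone \n with a line-break tag.
// \r\n is listed first so it becomes one <br/>, not two.
function toHtmlWithBreaks(text) {
  return escapeHtml(text).replace(/\r\n|\r|\n/g, "<br/>");
}

console.log(toHtmlWithBreaks("# vdk10.syx\r$control: 1\rdescriptor: <none>\r"));
```

Tabs and runs of spaces are still collapsed by the browser with this approach, which is why the pre element remains the simpler option when the exact layout matters.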
Thanks, everyone, for your valuable comments. I solved my formatting problem on the client side with the following code.
document.getElementById('textView').innerText = fulltext;
Here textView is the div where I want to display my fulltext. I don't think I need to replace special characters in the fulltext string. The output is shown in the figure.
Being obsessed with neatness in JavaScript lately, I was curious whether there is some common practice for dealing with lines that run past 80 columns because of string length. With innerHTML I can mark line breaks with a backslash and the indentation spaces won't show up in the content of the element, but that doesn't seem to hold for e.g. console.log().
Are there any conventions for this or should I just learn to live with lines longer than 80 cols? :)
There's no universal convention. With modern high-res monitors you can easily fit 160 columns and still have room for IDE toolbars without needing to scroll, so I wouldn't be concerned about sticking to 80 columns.
Some people go out of their way to never have any line of code go past n columns, where n might be 80, or 160, or some other arbitrary number based on what fits for their preferred font and screen resolution. Some people I work with don't care and have lines that go way off to the right regardless of whether it is due to a long string or a function with lots of parameters or whatever.
I try to avoid any horizontal scrolling but I don't obsess about it so if I have a string constant that is particularly long I will probably put it all on one line. If I have a string that is built up by concatenating constants and variables I will split it over several lines, because that statement will already have several + operators that are a natural place to add line breaks. If I have a function with lots of parameters, more than would fit without scrolling, I will put each parameter on a newline. For an if statement with a lot of conditions I'd probably break that over several lines.
Regarding what you mentioned about innerHTML versus console.log(): if you break a string constant across lines in your source code by including a backslash and newline then any indenting spaces you put on the second line will become part of the string:
var myString1 = "This has been broken \
into two lines.";
// equivalent to
var myString2 = "This has been broken into two lines.";
// NOT equivalent to
var myString3 = "This has been broken \
                 into two lines.";
If you use that string for innerHTML the spaces will be treated the same as spaces in your HTML source, i.e., the browser will display it with multiple spaces compressed down to a single space. But for any other uses of the string in your code including console.log() the space characters will all be included.
If horizontal scrolling really bothers you and you have a long string the following method lets you have indenting without extra spaces in the string:
var myString4 = "Hi there, this has been broken"
+ " into several lines by concatenating"
+ " a number of shorter strings.";
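As an alternative to chaining + operators, the same idea can be sketched by joining an array of short strings. This is a different technique from the one in the answer, shown only for comparison; the variable name `joinedMessage` is made up here.

```javascript
// Each piece sits on its own indented source line; the indentation is
// outside the string literals, so it never ends up in the result.
const joinedMessage = [
  "Hi there, this has been broken",
  "into several lines by joining",
  "an array of shorter strings."
].join(" ");

console.log(joinedMessage);
```

The separator passed to join() supplies the spaces between the pieces, so the individual strings do not need leading or trailing spaces.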