Regexp to read HTML and match only specific words in text

I have this String:
<body>
<span class="open crack-opener o_open i_opens ng-open" style='open'>Open opens openes "Open opens openes" clopened</span>
</body>
I need to select only the words OPEN, OPENS, or OPENES, and only where they appear inside the text. I tried the following RegExp, but it selects only the tags. I need to negate this and select the words.
/(<\/?\w+((\s+\w+(\s*=\s*(?:\".*?"|'.*?'|[^'\">\s]+))?)+\s*|\s*)?>)/ig
How do I negate this match and insert the word open?
Thanks in advance

To begin with: do not use regex to parse HTML. It is not a good idea, because no regular expression can parse arbitrary HTML correctly :)
But back to your question:
var str="<body><span class=\"open crack-opener o_open i_opens ng-open\" style='open'>Open opens openes \"Open opens openes\" clopened</span></body>";
var words=str.match(/(\bopen\b|\bopens\b|\bopenes\b)(?=[^>]*<)/ig);
This searches for your words followed by any run of characters other than > and then by a <. A word inside a tag always reaches a > before the next <, so it is skipped. This solution is not the best, but you cannot expect regex to do something it was not designed for.
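The same lookahead idea can be wrapped in a small helper. This is only a sketch: the function name matchWordsInText and its parameters are made up for illustration, and it assumes that text nodes never contain a literal > and that every text node is followed by a tag.

```javascript
// Hypothetical helper: returns whole-word matches that sit outside of tags.
// Assumes text never contains a raw ">" and is always followed by a tag.
function matchWordsInText(html, words) {
  // Escape regex metacharacters so each word is matched literally.
  const escaped = words.map((w) => w.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'));
  // A word counts only if a "<" comes before the next ">".
  const pattern = new RegExp('\\b(?:' + escaped.join('|') + ')\\b(?=[^>]*<)', 'gi');
  return html.match(pattern) || [];
}
```

Called with the string from the question and ['open', 'opens', 'openes'], it returns the six words from the text and none of the attribute values.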

Related

replace \n from tag attribute using javascript regex with empty character

I have a tag like <span style="font-size:10.5pt;\nfont-family:\nKaiTi"> and I want to remove every \n inside the tag (replace it with an empty string).
Note: The tag could be anything (it is not fixed).
I want a regular expression that does this in JavaScript.

You should be able to strip out the \n character before applying this HTML to the page.
Having said that, try this (\\n)
You can see it here: regex101
Edit: A bit of refinement and I have this (\W\\n). It works with the example you provided. It breaks down if you have spaces in the body of the tags (<span> \n </span>).
I've tried everything I know to do. Perhaps someone with more regex experience can assist?
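One way around the problem with text between tags is to match each tag first and strip the newlines only inside that match. The sketch below assumes that attribute values never contain a literal >, and it handles both the two-character sequence \n and real line breaks because the question does not say which one is meant. The name stripNewlinesInTags is made up for illustration.

```javascript
// Hypothetical helper: removes "\n" only inside tags, never in the text between them.
// Assumes no attribute value contains a raw ">".
function stripNewlinesInTags(html) {
  return html.replace(/<[^>]*>/g, (tag) =>
    // Remove the literal two-character sequence \n as well as real line breaks.
    tag.replace(/\\n|\r?\n/g, '')
  );
}
```

With this, a \n inside <span style="..."> is removed while the \n in <span> \n </span> is left alone.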

Remove some HTML tags from a string if it is a duplicate from the previous tag

I have a string like this:
The user have to press the button </br> </br> </br> Then find the
right folder </br> </br> </br> Those the correct name.
The problem is all the spaces. I want just one space. Somehow I have to check for duplicates, but I don't know how.
I have tried replacing all the HTML tags, but then none of the tags are shown:
htmlstring.replace(/<[^>]*>/g, ' ');
I have also tried to just replace the br tags:
htmlstring.replace(/<[//]{0,1}(B|b)[^><]*>/g,"");
I want the string to look like this:
The user have to press the button </br> Then find the
right folder </br> Those the correct name.

You can try this one:
htmlstring.replace(/((<\/br>\s?){2,})/g, '$2')
It finds only the substrings that contain two or more consecutive line breaks ({2,}).
It replaces each of them with a single line break (a reference to the result of the second group).
It does this repeatedly across the whole string (the g flag).
P.S. I often use regexpal to test my regexes. It is a fast and informative way to write them.

For this simple example you could use:
htmlstring.replace(/((<[^>]*>)\s*){1,}/g, '$1')
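Note that the pattern above also collapses a run of different tags into the last one. If only true duplicates should be removed, a backreference can require that the repeated tag is identical to the previous one. This is a sketch, and the name collapseRepeatedTags is made up for illustration.

```javascript
// Hypothetical helper: collapses a run of identical tags (separated only by
// whitespace) into a single tag. Runs of different tags are left untouched.
function collapseRepeatedTags(html) {
  // \1 must repeat exactly the tag captured by the first group.
  return html.replace(/(<[^>]+>)(?:\s*\1)+/g, '$1');
}
```

Applied to the string from the question, each run of three </br> becomes one </br>, while something like <b><i> stays as it is.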

match every character until a pattern occurs in the beginning of the line (javascript)

I have this text:
<a>
a lot of text here with all types of symbols ! : . %& < >
</a>
<a>
another text here with all types of symbols ! : . %& < >
</a>
I want to match the tag name and its contents, so the pattern I'm using is:
<([^]*?)>(?:([^]*)<\/\1>)?
NOTE: I make the group at the end optional because the closing tag can be omitted, for example:
<a>
<a>
another text here with all types of symbols ! : . %& < >
</a>
But my problem is that the regex tries to consume every character, so it opens and closes the tag, and the contents of the tag become:
<a>
another text here with all types of symbols ! : . %& < >
when I wanted to detect two matches: one for the isolated tag and the other for the multiline tag.
NOTE2: This is NOT HTML or XML, so I don't need to parse it as such.
NOTE3: My idea was to replace this part of the regex:
(?:([^]*)....
with something that would match every character until '<' appears at the beginning of a line (in the text I'm parsing there can't be tags inside tags), so I thought that would be good, but I can't seem to find a regex for that :(

I think what you want is /<([a-z0-9-]+)>([^]*?)(?:(<\/\1>)|$|(?=(?:<[a-zA-Z0-9\-]+>)))/gi

I suggest you parse it programmatically:
1. Match the first occurrence of any opening tag:
<([a-z0-9]+)>
With this, you can get the tag's name.
2. Get the position of the second occurrence of any opening tag, and the position of the first occurrence of the closing tag with the same name as the one you just read.
3. Compare these positions and decide whether it was a single-line, open-only tag or a multi-line tag with both an opening and a closing tag.
4. Get the content enclosed between the first opening tag and the lowest position obtained in step 2.
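The steps above can be sketched roughly as follows. The function name parseTags and the shape of the returned records are made up for illustration, and the sketch assumes lowercase tag names and no tags nested inside tags, as the question states.

```javascript
// Hypothetical helper: returns one { name, content } record per opening tag.
// content is null for an isolated tag that has no closing tag of its own.
function parseTags(text) {
  const results = [];
  const openTag = /<([a-z0-9]+)>/g;
  let pos = 0;
  while (true) {
    // Step 1: find the next opening tag and read its name.
    openTag.lastIndex = pos;
    const open = openTag.exec(text);
    if (!open) break;
    const name = open[1];
    const contentStart = open.index + open[0].length;

    // Step 2: locate the following opening tag and the matching closing tag.
    openTag.lastIndex = contentStart;
    const next = openTag.exec(text);
    const nextOpenPos = next ? next.index : Infinity;
    const closeTag = '</' + name + '>';
    const closeIndex = text.indexOf(closeTag, contentStart);
    const closePos = closeIndex === -1 ? Infinity : closeIndex;

    // Steps 3 and 4: whichever comes first decides the kind of tag.
    if (closePos < nextOpenPos) {
      results.push({ name, content: text.slice(contentStart, closePos).trim() });
      pos = closePos + closeTag.length;
    } else {
      results.push({ name, content: null });
      pos = contentStart;
    }
  }
  return results;
}
```

For the second example in the question this yields two records: an isolated a tag with no content, followed by an a tag whose content is the line of text.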

Use Javascript to get the Sentence of a Clicked Word

This is a problem I'm running into and I'm not quite sure how to approach it.
Say I have a paragraph:
"This is a test paragraph. I love cats. Please apply here"
And I want a user to be able to click any one of the words in a sentence, and then return the entire sentence that contains it.

You would first have to split your paragraph into elements, as you can't (easily) detect clicks on text without elements:
$('p').each(function() {
  $(this).html($(this).text().split(/([\.\?!])(?= )/).map(
    function(v){return '<span class=sentence>'+v+'</span>'}
  ));
});
Note that it correctly splits paragraphs like this one:
<p>I love cats! Dogs are fine too... Here's a number : 3.4. Please apply here</p>
Then you would bind the click:
$('.sentence').click(function(){
  alert($(this).text());
});
Demonstration
I don't know whether ":" counts as a separator between sentences in English. If it does, it can of course be added to the regex.

First of all, be prepared to accept a certain level of inaccuracy. This may seem simple on the surface, but trying to parse natural languages is an exercise in madness. Let us assume, then, that all sentences are punctuated by ., ?, or !. We can forget about interrobangs and so forth for the moment. Let's also ignore quoted punctuation like "!", which doesn't end the sentence.
Also, let's try to grab quotation marks after the punctuation, so that "Foo?" ends up as "Foo?" and not "Foo?.
Finally, for simplicity, let's assume that there are no nested tags inside the paragraph. This is not really a safe assumption, but it will simplify the code, and dealing with nested tags is a separate issue.
$('p').each(function() {
  var sentences = $(this)
    .text()
    .replace(/([^.!?]*[^.!?\s][.!?]['"]?)(\s|$)/g,
      '<span class="sentence">$1</span>$2');
  $(this).html(sentences);
});
$('.sentence').on('click', function() {
  console.log($(this).text());
});
It's not perfect (for example, quoted punctuation will break it), but it will work 99% of the time.
Live demo: http://jsfiddle.net/SmhV3/
Slightly amped-up version that can handle quoted punctuation: http://jsfiddle.net/pk5XM/1/
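One more caveat: the snippet feeds the result of .text() back into .html(), so a paragraph containing characters such as < or & would be reinterpreted as markup. A possible workaround is to escape the text before wrapping it. This is a sketch only: escapeHtml and wrapSentences are made-up names, and the sentence regex is the one from this answer.

```javascript
// Hypothetical helper: escapes the characters that matter inside element content.
function escapeHtml(text) {
  return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Hypothetical helper: wraps each punctuated sentence of plain text in a span.
// A trailing fragment without ., ! or ? is left unwrapped.
function wrapSentences(text) {
  return escapeHtml(text).replace(
    /([^.!?]*[^.!?\s][.!?]['"]?)(\s|$)/g,
    '<span class="sentence">$1</span>$2'
  );
}
```

The returned string can then be passed to .html() in place of the unescaped one.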

Match the sentences. You can use a regex along the lines of /[^!.?]+[!.?]/g for this.
Replace each sentence with a wrapping span that has a click event to alert the entire span.
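The matching step can also be tried without the DOM. The sketch below finds the sentence that contains a given character offset, for example the position of the clicked word. The name sentenceAtOffset is made up for illustration, and the regex is extended (an assumption beyond the one above) so that a final sentence without punctuation is still matched.

```javascript
// Hypothetical helper: returns the sentence containing the character at `offset`,
// or null if the offset is outside the text.
function sentenceAtOffset(text, offset) {
  // A sentence is a run of non-terminators plus its terminators,
  // or a trailing run of text with no terminator at all.
  const sentenceRe = /[^!.?]+[!.?]+|[^!.?]+$/g;
  let match;
  while ((match = sentenceRe.exec(text)) !== null) {
    const end = match.index + match[0].length;
    if (offset >= match.index && offset < end) {
      return match[0].trim();
    }
  }
  return null;
}
```

For the paragraph in the question, passing the offset of "cats" returns "I love cats." and passing the offset of "apply" returns "Please apply here".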

I suggest you take a look at Selection and ranges in JavaScript.
There is no built-in method that returns the currently selected sentence, so you have to code that on your own...
Rangy is a JavaScript library for working with selections and ranges across browsers.

I'm not sure how to get the complete sentence, but you can try this to get the text word by word, splitting on spaces.
<div id="myDiv" onmouseover="splitToSpans(this)" onclick="alert(event.target.innerHTML)">This is a test paragraph. I love cats. Please apply here</div>
function splitToSpans(element){
  if($(element).children().length)
    return;
  var arr = new Array();
  $($(element).text().split(' ')).each(function(){
    arr.push($('<span>'+this+' </span>'));
  });
  $(element).text('');
  $(arr).each(function(){$(element).append(this);});
}

jQuery match first letter in a string and wrap with span tag

I'm trying to get the first letter in a paragraph and wrap it with a <span> tag. Notice I said letter and not character, as I'm dealing with messy markup that often has blank spaces.
Existing markup (which I can't edit):
<p> Actual text starts after a few blank spaces.</p>
Desired result:
<p> <span class="big-cap">A</span>ctual text starts after a few blank spaces.</p>
How do I ignore anything but /[a-zA-Z]/ ? Any help would be greatly appreciated.

$('p').html(function (i, html)
{
  return html.replace(/^[^a-zA-Z]*([a-zA-Z])/g, '<span class="big-cap">$1</span>');
});
Demo: http://jsfiddle.net/mattball/t3DNY/

I would vote against using JS for this task. It will make your page slower, and using JS for presentation purposes is bad practice.
Instead, I suggest using the ::first-letter pseudo-element (written :first-letter in older CSS) to assign additional styles to the first letter of a paragraph. Here is the demo: http://jsfiddle.net/e4XY2/. It should work in all modern browsers except IE7.

Matt Ball's solution is good, but if your paragraph has an image, markup, or quotes, the regex will not just fail but break the HTML,
for instance:
<p><strong>Important</strong></p>
or
<p>"Important"</p>
You can avoid breaking the HTML in these cases by adding "'< to the excluded initial characters, though in that case no span will be wrapped around the first character.
return html.replace(/^[^a-zA-Z'"<]*([a-zA-Z])/g, '<span class="big-cap">$1</span>');
I think that, optimally, you may wish to wrap the first character after a ' or ".
I would, however, consider it best not to wrap the character if it is already inside markup, but that probably requires a second replace pass.
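A variant that skips over leading tags and punctuation, and puts them back in front of the span, could look like the sketch below. The name wrapFirstLetter is made up for illustration. It assumes that no attribute value contains a literal > and that no HTML entity such as &nbsp; appears before the first letter. Unlike the suggestion above, it does wrap a first letter that is already inside markup.

```javascript
// Hypothetical helper: wraps the first letter of the text, keeping any leading
// whitespace, punctuation, and tags in place.
// Assumes no ">" inside attribute values and no entities before the first letter.
function wrapFirstLetter(html) {
  // Group 1: any mix of complete tags and non-letter characters.
  // Group 2: the first letter that follows them.
  return html.replace(
    /^((?:<[^>]*>|[^a-zA-Z<])*)([a-zA-Z])/,
    '$1<span class="big-cap">$2</span>'
  );
}
```

It can be used as the body of the callback passed to $('p').html(...), in place of the html.replace(...) call.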

I do not seem to have permission to reply to an answer, so forgive me for doing it like this. The answer given by Matt Ball will not work if the P contains another element as its first child. Go to the fiddle and add an IMG (very common) as the first child of the P, and the "i" of "img" will turn into a drop cap.

If you use the x flag (JavaScript regular expressions do not support it, so this applies only to engines that do), you can have the engine ignore whitespace in the pattern. Then use something like this:
/^([a-zA-Z]).*$/
You know what format your first character should be, and it should grab only that character into a group. If you could have other characters other than whitespace before your first letter, maybe something like this:
/.*?([a-zA-Z]).*/
Conditionally catch other characters first, and then capture the first letter into a group, which you could then wrap around a span tag.
