Regex in an XML string - javascript

I would like to get the "MyDocType" string/value of the following string using regex. Yes, it is an XML node, but I really need to get it using regex! :)
<cmis:propertyString propertyDefinitionId="ctp:DocType" displayName="document type" queryName="ctp:DocType"><cmis:value>MyDocType</cmis:value></cmis:propertyString>
Please note that I may have other matches, and I want all of them. Also, there are other tags in the string, but the following rules may be sufficient to get the string:
starts with:
it has some text between
but the value is between the <cmis:value> tags.
I can't just look at the <cmis:value> tags because there are occurrences of it in other places that I don't want to match.
Finally, I need to get the link after the href=" text. I can also have many.
<link rel="self" href="http://localhost:8080/alfresco/s/cmis/s/workspace:SpacesStore/i/6c0dc826-179b-4ed6-9c3a-300f45d6556b"/>
thank you very much!
Miguel
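A possible approach, sketched under assumptions: the exact "starts with" rule is not shown above, so this sketch assumes the wanted node is identified by propertyDefinitionId="ctp:DocType", that attribute values never contain ">", and that the value itself contains no nested tags. The sample input, the second propertyString node, and the helper name allCaptures are made up for illustration.

```javascript
// Hypothetical sample input modelled on the snippets in the question.
const xml =
  '<cmis:propertyString propertyDefinitionId="ctp:DocType" displayName="document type" queryName="ctp:DocType">' +
  '<cmis:value>MyDocType</cmis:value></cmis:propertyString>' +
  '<cmis:propertyString propertyDefinitionId="cmis:name"><cmis:value>other</cmis:value></cmis:propertyString>' +
  '<link rel="self" href="http://localhost:8080/alfresco/s/cmis/s/workspace:SpacesStore/i/6c0dc826-179b-4ed6-9c3a-300f45d6556b"/>';

// Collect capture group 1 of every match of a global regex.
function allCaptures(text, regex) {
  return Array.from(text.matchAll(regex), (m) => m[1]);
}

// [^>]* stays inside the opening tag, so only the ctp:DocType node qualifies.
const docTypeRe =
  /<cmis:propertyString\b[^>]*\bpropertyDefinitionId="ctp:DocType"[^>]*>\s*<cmis:value>([^<]*)<\/cmis:value>/g;

// Every href value that appears inside a <link ...> tag.
const hrefRe = /<link\b[^>]*\bhref="([^"]*)"/g;

const docTypes = allCaptures(xml, docTypeRe);
const links = allCaptures(xml, hrefRe);
console.log(docTypes, links);
```

Because both patterns use the g flag with matchAll, every occurrence is returned, not just the first one.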

Related

Regex CSS full path

I would like to extract the full path of a CSS file with a regex. For example, I have this:
<link rel="stylesheet" href="../assets/myCssFile.css"/>
I would like to match: /assets/myCssFile.css
What I tried is this: /(?:href)=("|').*?([\w.]+\.(?:css))\1/gi
but it returns this: href="../assets/myCssFile.css"
Can someone help me out with the regex?
BTW: the input is the response text of an AJAX request, which returns the HTML page as a string.
href=\"(.*\.css)"
This will return "../assets/myCssFile.css" in a capture group.
https://regex101.com/r/a44Axz/1
All it is doing is saying:
I am only interested in text within quotes that immediately follows an "href" and only if it is a ".css" file.
Late, but maybe it will help someone. :)
Another regex that matches css paths:
href[ \t]{0,}=[ \t]{0,}"(.{1,}\.css)"
This also matches cases where tabs or spaces appear between href and =, or between = and the CSS filename.
Also, if you want to match css and javascript filenames in one regex you can use something like this:
href[ \t]{0,}=[ \t]{0,}"(.{1,}\.css)"|src[ \t]{0,}=[ \t]{0,}"(.{1,}\.js)"
The two patterns are combined with the alternation ("or") operator |.
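A small sketch of how these patterns could be applied in JavaScript. It swaps the greedy .{1,} for [^"]+ so that two links on the same line are not merged into one match; the second stylesheet in the sample string and the variable names are assumptions made for illustration.

```javascript
// Hypothetical response text with two stylesheets on the same line.
const html =
  '<link rel="stylesheet" href="../assets/myCssFile.css"/>' +
  '<link rel="stylesheet" href = "css/print.css"/>';

// [^"]+ cannot cross a closing quote, so each href is captured separately.
const cssRe = /href[ \t]*=[ \t]*"([^"]+\.css)"/gi;
const cssPaths = Array.from(html.matchAll(cssRe), (m) => m[1]);

// Optionally turn leading "../" segments into a single "/" to get
// "/assets/myCssFile.css", the form asked for in the question.
const rooted = cssPaths.map((p) => p.replace(/^(?:\.\.\/)+/, "/"));
console.log(cssPaths, rooted);
```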

Regular Expression matching extra unwanted content

I'm trying to get a parameter stored in an HTML comment using regex. However, when I execute the expression it returns the widest string possible instead of the narrowest match.
So I have some content that might include this string:
<!--url:/new--><!--title:My Title-->
I use the following simple expression to get the url I need:
/<!--url:(.*)-->/
The issue I have is that the result also matches part of the title, which is of course valid but not what I was looking for:
["<!--url:/new--><!--title:My Title-->", "/new--><!--title:My Title"]
There are workarounds I can use, like making sure there is a line break after each parameter line, but I prefer to have a solid regex and, of course, to understand what I am missing.
PS: Please comment if you come up with a better title.
Make the regex non-greedy:
/<!--url:(.*?)-->/
You can test this regex on Regex101.
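To make the difference concrete, here is a short sketch that runs both patterns against the string from the question. The last part, which collects every name:value comment into an object, is an extra illustration and assumes parameter names consist of word characters only.

```javascript
const content = "<!--url:/new--><!--title:My Title-->";

// Greedy: .* runs to the LAST "-->" in the string.
const greedy = content.match(/<!--url:(.*)-->/)[1];

// Lazy: .*? stops at the FIRST "-->" that lets the pattern succeed.
const lazy = content.match(/<!--url:(.*?)-->/)[1];

console.log(greedy); // "/new--><!--title:My Title"
console.log(lazy);   // "/new"

// Collect all parameters at once (names assumed to match \w+).
const params = Object.fromEntries(
  Array.from(content.matchAll(/<!--(\w+):(.*?)-->/g), (m) => [m[1], m[2]])
);
console.log(params); // { url: "/new", title: "My Title" }
```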

Javascript Regex, Removing unclosed tags

I'm looking for a JavaScript regex solution to remove unclosed tags, for example:
<div></div><span>
As you can see, I want to remove the <span> element. I know it's a bad idea to use regex on markup, but it's required for my project. This is the regex pattern I made, but it didn't work:
/<([a-z]+?)>([\s\S]*?)(?!<\/\1>)/g
I'm using JavaScript's replace to replace all matches with "". What I'm trying to do with my pattern is match only unclosed tags. About the pattern:
[a-z]: I know HTML tags can contain =, ", etc. I'm looking for a simple pattern that I can play with and edit, so I started with [a-z].
I used a negative lookahead, (?!...), to reject matches that are followed by the closing tag.
I know my pattern isn't working. If anyone has an idea, I will be very thankful.
Edit:
I'm aware that tags may be nested. If so, I want to remove the whole nested tree; I only want to keep one level of HTML. For example:
<div><span></span></div><p></p>
So if the next tag after the <div> is not </div> remove it.
First of all, let's see what the OP said:
I know it's a bad idea to use regex on markup but it's required for my project.
I only want to keep 1 level of html
This can be achieved.
You were on the right track. However, you shouldn't have used the negative lookahead (?!...) to reject closing tags; you want to require them. That way the pattern only matches tags that are closed, so unclosed tags are left out of the result, which is our goal after all.
Now, your regex will look like this.
/<([a-z]+?)>([\s\S]*?)(<\/\1>)/g
We can remove the second and third brackets as they are not necessary:
/<([a-z]+?)>[\s\S]*?<\/\1>/g
If we test this regex on the provided code, we will get the following:
"<div><span></span></div><p></p>".match(/<([a-z]+?)>[\s\S]*?<\/\1>/g)
["<div><span></span></div>", "<p></p>"]
It seems that our regex matches too many characters. We must stop the match at the "<" symbol, as it starts a new tag. [^<] means "any character except <".
"<div><span></span></div><p></p>".match(/<([a-z]+?)>[^<]*?<\/\1>/g)
["<span></span>", "<p></p>"]
Finally we can just join the matched results.
"<div><span></span></div><p></p>".match(/<([a-z]+?)>[^<]*?<\/\1>/g).join("")
"<span></span><p></p>"
Wohoooo. I will leave the first part of regex to you as it was not part of the question. I hope this was helpful. I am open for further questions.
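The steps above can be wrapped in a small helper. This is only a sketch: the function name keepClosedPairs is made up, and the "|| []" guard is an addition, needed because String.prototype.match returns null when a global regex finds nothing, in which case calling .join would throw.

```javascript
// Keep only tag pairs of the form <x>text</x> with no nested markup inside.
function keepClosedPairs(markup) {
  const pairs = markup.match(/<([a-z]+?)>[^<]*?<\/\1>/g) || [];
  return pairs.join("");
}

console.log(keepClosedPairs("<div></div><span>"));               // "<div></div>"
console.log(keepClosedPairs("<div><span></span></div><p></p>")); // "<span></span><p></p>"
console.log(keepClosedPairs("<span>"));                          // ""
```

Note that in the nested example the outer <div> is dropped and the inner <span> pair is kept, exactly as in the output shown above.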

Regexp with variables: how to convert xml with attributes to html using javascript replace method?

Apologies if this is a dup, I searched but couldn't quite find the info I was looking for.
Using javascript, I want to search a string, find a tag, match attributes in the tag, and store them as variables for rewriting. Here's the part of the string I'm looking for:
<my_child name="view" gso="g--" type="Application.View">
that I'd like to convert to:
<tr><td>view</td><td>Application.View</td><td>g--</td></tr>
Here's the regexp I'm trying. I don't think it's actually finding a match though:
objString = objString.replace(/<my_child name="(.*)" gso="(.*)" type="(.*)">/g, '<tr><td>'+RegExp.$1+'</td><td>'+RegExp.$3+'</td><td>'+RegExp.$2+'</td><td>');
EDIT: SOLVED.
Thanks for the advice kiamlaluno. Turns out I was also not taking into account the indeterminate number of spaces between attributes. Here's the updated regexp:
/<my_child[ ]+name="([^"]*)"[ ]+gso="([^"]*)"[ ]+type="([^"]*)"[ ]*>/g, '<tr><td>$1</td><td>$3</td><td>$2</td></tr>'
you can replace
'<tr><td>'+RegExp.$1+'</td><td>'+RegExp.$3+'</td><td>'+RegExp.$2+'</td><td>'
with
'<tr><td>$1</td><td>$3</td><td>$2</td><td>'
and the (.*) matches should probably be ([^"]*) instead
maybe that will help get you closer?
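Putting the two suggestions together, a minimal sketch could look like this. The input string has extra spaces added to exercise the [ ]+ parts, the variable names rowRe and row are made up, and the replacement ends with </tr> so that the output matches the target row shown in the question. The original attempt fails because '...'+RegExp.$1+'...' is evaluated before replace runs, so it cannot see the groups of the current match; $1, $2 and $3 inside the replacement string are substituted per match.

```javascript
// Sample tag from the question, with irregular spacing between attributes.
const objString = '<my_child   name="view" gso="g--"  type="Application.View" >';

// [^"]* keeps each capture inside its own pair of quotes.
const rowRe = /<my_child[ ]+name="([^"]*)"[ ]+gso="([^"]*)"[ ]+type="([^"]*)"[ ]*>/g;

// $1..$3 are filled in by replace() for every match.
const row = objString.replace(rowRe, "<tr><td>$1</td><td>$3</td><td>$2</td></tr>");
console.log(row); // "<tr><td>view</td><td>Application.View</td><td>g--</td></tr>"
```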

How do I extract the title value from a string using Javascript regexp?

I have a string variable from which I would like to extract the title value of the element with id="resultcount". The output should be 2.
var str = '<table cellpadding=0 cellspacing=0 width="99%" id="addrResults"><tr></tr></table><span id="resultcount" title="2" style="display:none;">2</span><span style="font-size: 10pt">2 matching results. Please select your address to proceed, or refine your search.</span>';
I tried the following regex but it is not working:
/id=\"resultcount\" title=['\"][^'\"](+['\"][^>]*)>/
Since var str = ... is JavaScript syntax, I assume you need a JavaScript solution. As Peter Corlett said, you can't parse HTML using regular expressions, but if you are using jQuery you can take advantage of the browser's own parser without much effort:
$('#resultcount', '<div>'+str+'</div>').attr('title')
It will return undefined if resultcount is not found or if it has no title attribute.
To make sure it doesn't matter which attribute (id or title) comes first in the string, take the entire HTML element with the required id:
var tag = str.replace(/^.*(<[^<]+?id=\"resultcount\".+?\/.+?>).*$/, "$1")
Then find title from previous string:
var res = tag.replace(/^.*title=\"(\d+)\".*$/, "$1");
// res is 2
But, as people have previously mentioned, it is unreliable to use regex for parsing HTML: something as trivial as a different quote (single instead of double) or a space in the "wrong" place will break it.
Please see this earlier response, entitled "You can't parse [X]HTML with regex":
RegEx match open tags except XHTML self-contained tags
Well, since no one else is jumping in on this and I'm assuming you're just looking for a value and not trying to create a parser, I'll give you what works for me with PCRE. I'm not sure how to put it into the java format for you but I think you'll be able to do that.
span id="resultcount" title="(\d+)"
The part you're looking for is capture group $1, which is the '\d+' part. It matches one or more digits between the quote marks.
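For a plain JavaScript version of the two-step idea (find the tag first, then read its title), something like the following might work. It is a sketch with the usual caveats about regex and HTML: the function name getResultCount is made up, and it assumes the element is a span, that attribute values contain no ">", and that quotes are not mixed inside a value.

```javascript
const str = '<table cellpadding=0 cellspacing=0 width="99%" id="addrResults"><tr></tr></table><span id="resultcount" title="2" style="display:none;">2</span><span style="font-size: 10pt">2 matching results. Please select your address to proceed, or refine your search.</span>';

// Returns the title of the span with id "resultcount", or undefined.
function getResultCount(html) {
  // Step 1: isolate the opening tag, whatever the attribute order.
  const tag = html.match(/<span\b[^>]*\bid=["']resultcount["'][^>]*>/);
  if (!tag) return undefined;
  // Step 2: read the title attribute from that tag only.
  const title = tag[0].match(/\btitle=["']([^"']*)["']/);
  return title ? title[1] : undefined;
}

console.log(getResultCount(str)); // "2"
```

The result is the string "2"; wrap it in Number(...) if a numeric value is needed.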
