It's difficult to describe because I'm not an expert with regular expressions, so I'll explain my case.
In HTML, I want to distribute the classes from the class attribute into different data-xyz attributes. The problem is getting all classes for each match. Take the following HTML, for example:
<span class="note-123 index-3 green">Hello</span> <span class="index-456 red">World<span>
So far my regular expression is /<span class="([^\"\s]*)\s*/ and it matches only the first class of each element, in this case note-123 and index-456.
To get all classes per element I could use /<span class="([^\"\s]*)\s*([^\"\s]*)\s*([^\"\s]*)\s*/. That works for up to three classes; for the second element it returns index-456, red and an empty string.
Is it possible to always get all classes per match, no matter how many classes there are? Something similar to a nested loop in JavaScript?
I would appreciate any help.
You can get the classes without a regex: use querySelectorAll to find the elements you want, then read the class names from classList.
You can then change them with, for example, the add or remove methods.
If the HTML is only available as a string, you could parse it with a DOMParser first.
Note that the last span in your example needs to be closed.
let elms = document.querySelectorAll("span");
elms.forEach(e => {
for (let value of e.classList.values()) {
console.log(value);
}
});
<span class="note-123 index-3 green">Hello</span> <span class="index-456 red">World</span>
Use the regex to extract the value of the class attribute and split it at whitespace sequences:
let as_classes
, as_matches
, n_i
, re_classes
, s_classes
, s_test
;
re_classes = new RegExp ( "<span class=\u0022([^\\u0022]*)", "g" );
s_test = '<span class="note-123 index-3 green">Hello</span> <span class="index-456 red">World<span>';
n_i=0;
while ((as_matches = re_classes.exec(s_test)) !== null) {
n_i++;
s_classes = as_matches[1];
as_classes = s_classes.split(/[\s]+/g);
console.log(`match #${n_i}, classes: ${JSON.stringify(as_classes)}.`);
}
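The same extract-then-split idea can be written more compactly with String.prototype.matchAll. This is only a sketch: the helper name extractSpanClasses is made up for illustration, and it assumes that class is the first attribute of each span and is written with double quotes, as in the question's sample.

```javascript
// Hypothetical helper: returns one array of class names per <span class="..."> found.
// Assumption: class is the first attribute and uses double quotes.
function extractSpanClasses(html) {
  const re = /<span\s+class="([^"]*)"/g;
  return Array.from(html.matchAll(re), (m) =>
    // Split the attribute value at whitespace runs and drop empty pieces.
    m[1].trim().split(/\s+/).filter(Boolean)
  );
}

const sample =
  '<span class="note-123 index-3 green">Hello</span> <span class="index-456 red">World</span>';
console.log(extractSpanClasses(sample));
```

Each inner array can then be looped over, which gives the "nested loop" behaviour the question asks for, regardless of how many classes an element has.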
Warning
In general, extracting information from HTML with regular expressions is not a robust approach.
Related
I'm trying to get all elements with id feed_item_{n} where {n} can be any integer greater than 0.
I know that I can use document.querySelectorAll('[id^="feed_item_"]') but that doesn't really help because I get also elements with these ids: feed_item_0_x, feed_item_x_0 where x may be any string
Is there a quick way to achieve that in a single line, rather than looping over all the elements returned by the previous command and filtering them?
Since it's not possible to use a regex within attribute selectors, the only way is to filter your querySelectorAll result; there you can use a regex that accepts only digits after feed_item_.
It would look something like this:
let items = [...document.querySelectorAll('[id^="feed_item_"]')].filter(
(item) => /^feed_item_\d+$/.test(item.id)
);
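Because the question restricts {n} to integers greater than 0, the id test can be made a little stricter. The sketch below assumes ids without leading zeros (so feed_item_01 is rejected); the names isFeedItemId and filterFeedItems are invented for this example.

```javascript
// Assumption: valid ids look exactly like "feed_item_<n>" with n >= 1 and no leading zeros.
const FEED_ITEM_ID = /^feed_item_[1-9]\d*$/;

// Hypothetical helper: true only for ids such as feed_item_1 or feed_item_42.
function isFeedItemId(id) {
  return FEED_ITEM_ID.test(id);
}

// Works on a NodeList or on any array-like of objects that have an `id` property.
function filterFeedItems(elements) {
  return Array.from(elements).filter((el) => isFeedItemId(el.id));
}

// Browser usage (not executed here):
// const items = filterFeedItems(document.querySelectorAll('[id^="feed_item_"]'));
```

The attribute selector still does the coarse pre-selection in the browser; the regex only refines the result.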
You can also use the :not() pseudo-class to exclude items whose id contains _x, with :not([id*="_x"]). Note that this only helps when the unwanted ids literally contain _x.
const items = document.querySelectorAll('[id^="feed_item_"]:not([id*="_x"])')
console.log([...items])
<div id="feed_item_1">feed_item_1</div>
<div id="feed_item_2">feed_item_2</div>
<div id="feed_item_x_1">feed_item_x_1</div>
<div id="feed_item_1_x">feed_item_1_x</div>
Is there a way to get an element by its content (a word it contains)?
For example, get all the elements containing the letter "F" and put them in an array of elements.
I highly recommend using jQuery for this kind of DOM element search.
Then you can use this:
var foos = $("div:contains('foo')");
This returns a jQuery collection (an array-like object) of all divs containing the text 'foo'.
One fairly easy way is to select the elements you're interested in and then use 'filter' to look at the innerText. You can make this case insensitive with toLowerCase
var result = $('div').filter( (i,e) => e.innerText.toLowerCase().indexOf("f")>-1);
console.log("Items with 'F':",result.length);
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<div>Forest</div>
<div>Fortnight</div>
<div>Trees</div>
<div>Africa</div>
The simpler way is using :contains('F') as a selector - but that is always case sensitive (which may be fine for your case).
You can use :contains as a selector. For example, to filter all divs of a special class that also contains your text, you can use $("div.myclass:contains('searched text')")
I think you can "bruteforce" it by iterating all DOM items. e.g.:
let arrayDom = Array.from(document.getElementsByTagName("*"));
arrayDom.forEach(element => {
// String.prototype.contains does not exist; use includes. textContent avoids matching markup.
if (element.textContent.includes('F')){
// Do something
}
})
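The filtering step itself does not need jQuery. Here is a minimal sketch of a reusable filter; filterByText is a hypothetical name, and the function only assumes that each item exposes a textContent property, as DOM elements do. Keep in mind that with a "*" query, ancestors such as body also contain the text of their descendants, so a narrower selector usually gives more useful results.

```javascript
// Hypothetical helper: keeps the elements whose text contains `needle`.
// `elements` can be a NodeList or any array-like of objects with a textContent property.
function filterByText(elements, needle, caseSensitive = false) {
  const wanted = caseSensitive ? needle : needle.toLowerCase();
  return Array.from(elements).filter((el) => {
    const text = el.textContent || '';
    return (caseSensitive ? text : text.toLowerCase()).includes(wanted);
  });
}

// Browser usage (not executed here):
// const withF = filterByText(document.querySelectorAll('div'), 'F');
```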
Is there a way to do a wildcard element name match using querySelector or querySelectorAll?
The XML document I'm trying to parse is basically a flat list of properties
I need to find elements that have certain strings in their names.
I see support for wildcards in attribute queries but not for the elements themselves.
Any solution except going back to using the apparently deprecated XPath (IE9 dropped it) is acceptable.
[id^='someId'] will match all ids starting with someId.
[id$='someId'] will match all ids ending with someId.
[id*='someId'] will match all ids containing someId.
If you're looking for the name attribute just substitute id with name.
If you're talking about the tag name of the element I don't believe there is a way using querySelector
I was experimenting with one-liners involving querySelector() and ended up here. I have a possible answer to the OP's question using tag names and querySelector(), with credit to @JaredMcAteer for answering my own question, namely how to get regex-like matches with querySelector() in vanilla JavaScript.
I hope the following will be useful and fit the OP's needs or anyone else's:
// before:
var youtubeDiv = document.querySelector('iframe[src="http://www.youtube.com/embed/Jk5lTqQzoKA"]')
// after
var youtubeDiv = document.querySelector('iframe[src^="http://www.youtube.com"]');
// or even, for my needs
var youtubeDiv = document.querySelector('iframe[src*="youtube"]');
Then, we can, for example, get the src stuff, etc ...
console.log(youtubeDiv.src);
//> "http://www.youtube.com/embed/Jk5lTqQzoKA"
console.debug(youtubeDiv);
//> (...)
Set the tagName as an explicit attribute:
for(var i=0,els=document.querySelectorAll('*'); i<els.length;
els[i].setAttribute('tagName',els[i++].tagName) );
I needed this myself for an XML document with nested tags ending in _Sequence. See JaredMcAteer's answer for more details.
document.querySelectorAll('[tagName$="_Sequence"]')
I didn't say it would be pretty :)
PS: I would recommend using tag_name rather than tagName, so that it does not interfere with the implicit, computer-generated DOM property of the same name.
I just wrote this short script; seems to work.
/**
* Find all the elements with a tagName that matches.
* @param {RegExp} regEx regular expression to match against tagName
* @returns {Array} elements in the DOM that match
*/
function getAllTagMatches(regEx) {
return Array.prototype.slice.call(document.querySelectorAll('*')).filter(function (el) {
return el.tagName.match(regEx);
});
}
getAllTagMatches(/^di/i); // Returns an array of all elements whose tag name begins with "di", e.g. "div"
I was looking for a selector that combines regex-like matching, :not and multiple classes, and this is what I came up with.
I hope this helps someone looking for the same thing!
// class attribute contains the substring abc
"div[class*='abc']"
// has the exact class abc
"div[class~='abc']"
// has the exact class abc and contains def (case-insensitive)
"div[class~='abc'][class*='DeF'i]"
// has the exact class abc but does not contain def (case-insensitive)
"div[class~='abc']:not([class*='DeF'i])"
css selector doc: https://developer.mozilla.org/en-US/docs/Web/CSS/Attribute_selectors
simple test: https://codepen.io/BIgiCrab/pen/BadjbZe
I liked many of the answers above, but I prefer my queries to run only on classes/IDs so they don't have to iterate over every element. This is a combination of code from both @bigiCrab and @JaredMcAteer:
// class exactly matches abc
const exactAbc = document.querySelectorAll("[class='abc']")
// class begins with abc
const startsAbc = document.querySelectorAll("[class^='abc']")
// class contains abc
const containsAbc = document.querySelectorAll("[class*='abc']")
// class contains white-space separated word exactly matching abc
const wordAbc = document.querySelectorAll("[class~='abc']")
// class ends with abc
const endsAbc = document.querySelectorAll("[class$='abc']")
Substitute "class" with "id" or "href" to get other matches. Read the article linked below for further examples.
Reference:
CSS attribute selectors on MDN: https://developer.mozilla.org/en-US/docs/Web/CSS/Attribute_selectors
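One detail that is easy to miss is that ^=, $= and *= compare against the whole attribute value, while ~= compares against individual whitespace-separated words. The sketch below emulates the five operators on plain strings to make the difference visible; attrMatches is a hypothetical helper, not a browser API, and it uses the JavaScript \s class as an approximation of CSS whitespace.

```javascript
// Hypothetical emulation of CSS attribute selector operators on a plain string.
// Per the selector rules, an empty needle never matches for ^=, $=, *= and ~=.
function attrMatches(value, op, needle) {
  switch (op) {
    case '=':  return value === needle;
    case '^=': return needle !== '' && value.startsWith(needle);
    case '$=': return needle !== '' && value.endsWith(needle);
    case '*=': return needle !== '' && value.includes(needle);
    case '~=': return needle !== '' && !/\s/.test(needle) &&
                      value.split(/\s+/).includes(needle);
    default: throw new Error('unsupported operator: ' + op);
  }
}

// class="btn abc-large": only the substring operator matches 'abc',
// even though one of the classes starts with abc.
const cls = 'btn abc-large';
for (const op of ['=', '^=', '$=', '*=', '~=']) {
  console.log(`[class${op}'abc']`, attrMatches(cls, op, 'abc'));
}
```

So [class^='abc'] only finds elements whose class attribute begins with abc, which fails as soon as another class is listed first.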
There is a way, by saying what it is not: just make the :not() argument something the element will never be. A good CSS selector reference:
https://www.w3schools.com/cssref/css_selectors.asp which shows the :not selector as follows:
:not(selector) :not(p) Selects every element that is not a <p> element
Here is an example: any direct child of a div (anything but a z tag)
div > :not(z){
border:1px solid pink;
}
I am having a tough time listing all spans that have class="ansspans". There may be one or more spans with the ansspans class; I need to get all of them with their content and iterate through them. Can you tell me how to do it? Regex, jQuery, anything is OK.
The content is in a string variable (not in the DOM). IE 9 drops the quotes from attributes, so I can't use getElementsByClassName. I followed this answer on an innerHTML workaround to get the quotes back, and now the string is displayed with quotes. So I need to get all the ansspans spans in an array, so that I can iterate over it and get the text content of each span.
<span id="sss_ctl00_ctl06_lblanswertext">
assignment
<span class="ansspans">submission </span>
date : 10:07:51 AM
</span>
In this example, the expected output is one span object, so that I can iterate over it.
Update: I can't use the DOM because we are in quirks mode, so IE 9 ignores attribute quotes and I can't traverse using getElementsByClassName. So I need to match all spans in a string variable. I hope everyone understands my problem ;(
(As has been said on this site before, sometimes it's OK to parse a limited, known set of XML with regex.)
//assumes no <span id="stupid>" class="ansspans">, and no embedded span.ansspans
var data = ' <span id="sss_ctl00_ctl06_lblanswertext"> assignment \n date : 10:07:51 AM \n <span class="ansspans">ONE has a new \n line in it</span><span class="ansspans">TWO</span><span class="ansspans">3 </span><span class="ansspans">4 </span><span class="ansspans">5 </span>';
var myregexp = /<span[^>]+?class="ansspans".*?>([\s\S]*?)<\/span>/g;
var match = myregexp.exec(data);
var result = "spans found:\n";
while (match != null) {
result += "match:" + match[1] + ',\n'; // match[1] replaces the deprecated RegExp.$1
match = myregexp.exec(data);
}
alert(result);
(edited to capture inner html instead of whole tag)
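Since the question asks for an array to iterate over, the same loop can collect the captures instead of building a string. This is a sketch under the same assumptions as above (double-quoted class attribute, no nested span inside an ansspans span); extractAnsSpans is a made-up name, and the code sticks to ES5 syntax because the question targets IE 9.

```javascript
// Hypothetical helper: returns the inner HTML of every <span class="ansspans"> in a string.
// Assumptions: double-quoted class attribute, no span nested inside an ansspans span.
function extractAnsSpans(html) {
  var re = /<span\b[^>]*\bclass="ansspans"[^>]*>([\s\S]*?)<\/span>/g;
  var found = [];
  var match;
  while ((match = re.exec(html)) !== null) {
    found.push(match[1]); // capture group 1 is the content between the tags
  }
  return found;
}

var html = '<span id="sss_ctl00_ctl06_lblanswertext"> assignment ' +
  '<span class="ansspans">submission </span> date : 10:07:51 AM </span>';
var spans = extractAnsSpans(html);
for (var i = 0; i < spans.length; i++) {
  console.log(i + ': ' + spans[i]);
}
```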
The jQuery selector $('span.ansspans') will get all the <span class="ansspans"> elements on the page.
If you need something for a specific element, add a prefix of the appropriate selector, i.e. $('#sss_ctl00_ctl06_lblanswertext span.ansspans')
If this needs to be done in a more dynamic way - look into functions like find(), filter(), etc.
Solution using jQuery:
var tempArray = $("span.ansspans");//selects all span elements with class 'ansspans' and returns them as an array-like jQuery object
var len = tempArray.length;//calculate the length of the array
for(var index = 0;index<len;index++){
var reqString = $(tempArray[index]).html();
}
These string values can either be pushed into an array or used right away inside the loop.
If you want textContent, use .text() instead of .html()