How to highlight text from paragraph using protractor? - javascript

I have a notes section that holds user-related data. The data is dynamic. I want to select one or two words from that notes section.
The text is separated index-wise, e.g. the word 'Any' has 3 indexes. All these notes are under one div tag.
Please suggest how to select text or a word from the paragraph there.
I tried the following:
1. browser.actions().keyDown(protractor.Key.CTRL).sendKeys('a').perform()
2. var Key = protractor.Key; browser.actions().sendKeys(Key.chord(Key.CONTROL, 's')).perform(); browser.actions().sendKeys(Key.chord(Key.CONTROL, Key.SHIFT, 'm')).perform(); browser.actions().sendKeys(Key.chord(Key.CONTROL, 'o')).perform();

Elements have a getText() function, so get the element and then call getText() on it:
async function someName(elementID) {
  // Look up the element by its id and resolve to its visible text
  const elem = element(by.id(elementID));
  return await elem.getText();
}
You can then parse the returned string however you like.
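As a sketch of that parsing step, the snippet below finds where a given word occurs in a string. It assumes the string has already been obtained from getText(); the helper name findWordOffsets and the sample text are made up for illustration and are not part of Protractor.

```javascript
// Hypothetical helper: returns the start offset of every whole-word,
// case-sensitive match of `word` in `text`.
function findWordOffsets(text, word) {
  // Escape regex metacharacters so the word is matched literally.
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp(`\\b${escaped}\\b`, "g");
  return Array.from(text.matchAll(pattern), (match) => match.index);
}

// Stand-in for the string that `await elem.getText()` would resolve to.
const notes = "Any user can add any note. Any note can be edited.";
console.log(findWordOffsets(notes, "Any")); // offsets of the capitalised "Any" only
```

The offsets only locate the word inside the string; turning them into an on-screen selection is a separate step that this sketch does not cover.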

Word Add-in Get full Document text WITH INDICATOR?

There is already a question related to this topic: Word Add-in Get full Document text?
However, that method can't extract the indicators/bullet points.
Is there a way to do this? I expect the text to be exactly what we get when we manually select all and copy from a Word document.
The reason behind this: I'm building a question bank from a Microsoft Word document. Several tools offer text extraction, but they usually ignore the bullet points.
I use keywords like A. B. C. D. etc. to detect the choices. However, if the author writes the choices using indicators/bullet points, this method fails.
You can convert the numbered lists (list paragraphs) to plain text with a simple piece of VBA.
See here: convert lists to text
For each paragraph in the document, you can identify whether it is a list item by calling isListItem.
If it is, you can call listItem to get the item.
The listString property in Word.ListItem class can help you get the list item bullet, number, or picture as a string.
Here is an example of how to extract the bullets in the document.
Word.run(async (context) => {
  var paragraphs = context.document.body.paragraphs;
  paragraphs.load("$none");
  await context.sync();
  for (let i = 0; i < paragraphs.items.length; i++) {
    paragraphs.items[i].load("isListItem");
    paragraphs.items[i].load("text");
    await context.sync();
    if (paragraphs.items[i].isListItem) {
      // List paragraph: load the list item to read its bullet/number string
      paragraphs.items[i].load("listItem");
      await context.sync();
      console.log(paragraphs.items[i].listItem.listString + " " + paragraphs.items[i].text);
    } else {
      console.log(paragraphs.items[i].text);
    }
  }
});
The document is printed to the console paragraph by paragraph with all bullets retained.

Google Scripts - keep track of element [duplicate]

Update: This is a better way of asking the following question.
Is there an ID-like attribute for an Element in a Document that I can use to reach that element at a later time? Let's say I inserted a paragraph into a document as follows:
var myParagraph = 'This should be highlighted when user clicks a button';
body.insertParagraph(0, myParagraph);
Then the user inserts another one at the beginning manually (i.e. by typing or pasting). Now the childIndex of my paragraph changes from 0 to 1. I want to reach that paragraph at a later time and highlight it, but because of the insertion the childIndex is no longer valid. There is no ID-like attribute on the Element interface or on any type implementing it. CacheService and PropertiesService only accept String data, so I can't store myParagraph as an Object.
Do you guys have any idea to achieve what I want?
Thanks,
Old version of the same question (Optional Read):
Imagine that a user selects a word and presses the highlight button of my add-on. Then she does the same thing for several more words. Then she edits the document in a way that changes the start and end indexes of those highlighted words.
At this point she presses the remove highlighting button. My add-on should disable highlighting on all previously selected words. The problem is that I don't want to scan the entire document and find any highlighted text. I just want direct access to those that previously selected.
Is there a way to do that? I tried caching selected elements. But when I get them back from the cache, I get TypeError: Cannot find function insertText in object Text. error. It seems like the type of the object or something changes in between cache.put() and cache.get().
var elements = selection.getSelectedElements();
for (var i = 0; i < elements.length; ++i) {
  if (elements[i].isPartial()) {
    Logger.log('partial');
    var element = elements[i].getElement().asText();
    var cache = CacheService.getDocumentCache();
    cache.put('element', element);
    var startIndex = elements[i].getStartOffset();
    var endIndex = elements[i].getEndOffsetInclusive();
  }
  // ...
}
When I get back the element I get TypeError: Cannot find function insertText in object Text. error.
var cache = CacheService.getDocumentCache();
cache.get('text').insertText(0, ':)');
I hope I have clearly explained what I want to achieve.
One direct way is to add a bookmark, which is not dependent on subsequent document changes. It has a disadvantage: a bookmark is visible for everyone...
A more interesting way is to add a named range with a unique name. Sample code is below:
function setNamedParagraph() {
  var doc = DocumentApp.getActiveDocument();
  // Suppose you want to remember the paragraph that is currently third
  var par = doc.getBody().getParagraphs()[2];
  Logger.log(par.getText());
  // build() turns the RangeBuilder into the Range that addNamedRange expects
  var rng = doc.newRange().addElement(par).build();
  doc.addNamedRange("My Unique Paragraph", rng);
}
function getParagraphByName() {
  var doc = DocumentApp.getActiveDocument();
  var rng = doc.getNamedRanges("My Unique Paragraph")[0];
  if (rng) {
    var par = rng.getRange().getRangeElements()[0].getElement().asParagraph();
    Logger.log(par.getText());
  } else {
    Logger.log("Deleted!");
  }
}
The first function "marks" the third paragraph as a named range. The second one retrieves that paragraph by the range name, regardless of subsequent document changes. We do need to handle the case where our "unique paragraph" has been deleted.
Not sure if cache is the best approach. Cache is volatile, so it might happen that the cached value doesn't exist anymore. Probably PropertiesService is a better choice.
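The TypeError in the question is consistent with a store that only holds strings: the element is converted to a string on the way in, so what comes back has no element methods. The sketch below imitates that behaviour in plain JavaScript. StringOnlyCache and the toy element are hypothetical stand-ins, not the Apps Script CacheService or a real Text element, and the coercion with String() is an assumption about how such a store behaves.

```javascript
// Hypothetical stand-in for a string-only store; NOT the real CacheService.
class StringOnlyCache {
  constructor() {
    this.store = new Map();
  }
  put(key, value) {
    // Assumption: a string-only store coerces whatever it is given to a string.
    this.store.set(key, String(value));
  }
  get(key) {
    return this.store.has(key) ? this.store.get(key) : null;
  }
}

// Toy object with a method, loosely imitating a document element.
const element = {
  content: "hello",
  insertText(offset, text) {
    this.content = this.content.slice(0, offset) + text + this.content.slice(offset);
  },
  toString() {
    return "Text";
  },
};

const cache = new StringOnlyCache();
cache.put("element", element);
const restored = cache.get("element");

console.log(typeof restored);            // the restored value is a plain string
console.log(typeof restored.insertText); // the method did not survive the round trip
```

This is why a string identifier, such as the name of a named range, is a better thing to store than the element itself.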

How to get the value of the current selected element in taiko test?

In my taiko test script I selected an input element with the proximity parameter.
Then I can write some text in the input element.
After that, I want to make an assertion, that the text was actually written into that element by checking the value. How do I do that?
Here is an excerpt of my test script:
await click($(`input`, below('someHeader')));
await write('abc');
The input field has no id.
How can I write a check, that the value is 'abc' ?
textBox is the selector that represents all text input fields in Taiko.
A proximity selector can be used to find the input field, write to it, and read its value back, like below:
await write('abc',into(textBox(below('someHeader'))));
await textBox(below('someHeader')).value();
You can get the stored text using any selector. If there is no ID, you can try the label name, textBox, or an XPath:
const myText = await $("xpath_value").text();
assert.strictEqual(myText, 'abc')

NodeJS: Extract a sentence from html text based on a phrase

I have some text stored in a database, which looks something like below:
let text = "<p>Some people live so much in the future they they lose touch with reality.</p><p>They don't just <strong>lose touch</strong> with reality, they get obsessed with the future.</p>"
The text can have many paragraphs and HTML tags.
Now, I also have a phrase:
let phrase = 'lose touch'
What I want to do is search for the phrase in text, and return the complete sentence containing the phrase in strong tag.
In the above example, even though the first para also contains the phrase 'lose touch', it should return the second sentence because it is in the second sentence that the phrase is inside strong tag. The result will be:
They don't just <strong>lose touch</strong> with reality, they get obsessed with the future.
On the client-side, I could create a DOM tree with this HTML text, convert it into an array and search through each item in the array, but in NodeJS document is not available, so this is basically just plain text with HTML tags. How do I go about finding the right sentence in this blob of text?
I think this might help you.
No need to involve DOM in this if I understood the problem correctly.
This solution would work even if the p or strong tags have attributes in them.
And if you want to search for tags other than p, simply update the regex for it and it should work.
const search_phrase = "lose touch";
// Backslashes are doubled because "\s" inside a string or template literal collapses to a plain "s".
// No "g" flag on strong_regex: test() on a global regex keeps state in lastIndex between calls.
const strong_regex = new RegExp(`<\\s*strong[^>]*>${search_phrase}<\\s*/\\s*strong>`);
const paragraph_regex = new RegExp("<\\s*p[^>]*>(.*?)<\\s*/\\s*p>", "g");
const text = "<p>Some people live so much in the future they they lose touch with reality.</p><p>They don't just <strong>lose touch</strong> with reality, they get obsessed with the future.</p>";
const paragraphs = text.match(paragraph_regex);
if (paragraphs && paragraphs.length) {
  const paragraphs_with_strong_text = paragraphs.filter(paragraph => {
    return strong_regex.test(paragraph);
  });
  console.log(paragraphs_with_strong_text);
  // prints an array containing only the second paragraph
}
P.S. The code is not optimised, you can change it as per the requirement in your application.
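The regex approach above can be hardened a little. The sketch below is one possible variant, not the answerer's code: the helper names escapeRegExp and findParagraphsWithStrongPhrase are made up for illustration. It escapes the phrase so that characters such as . or ? are matched literally, ignores case in tag names, and also matches paragraphs that span several lines.

```javascript
// Escape regex metacharacters so the phrase is matched literally.
function escapeRegExp(value) {
  return value.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: returns every <p>...</p> block in which `phrase`
// is the full content of a <strong> tag.
function findParagraphsWithStrongPhrase(html, phrase) {
  // [\s\S] also matches newlines; \b keeps <p> from matching tags like <pre>.
  const paragraphRegex = /<\s*p\b[^>]*>[\s\S]*?<\s*\/\s*p\s*>/gi;
  // No "g" flag: test() on a global regex keeps state in lastIndex.
  const strongRegex = new RegExp(
    `<\\s*strong\\b[^>]*>\\s*${escapeRegExp(phrase)}\\s*<\\s*/\\s*strong\\s*>`,
    "i"
  );
  const paragraphs = html.match(paragraphRegex) || [];
  return paragraphs.filter((paragraph) => strongRegex.test(paragraph));
}

const html =
  "<p>Some people live so much in the future they they lose touch with reality.</p>" +
  "<p>They don't just <strong>lose touch</strong> with reality, they get obsessed with the future.</p>";
console.log(findParagraphsWithStrongPhrase(html, "lose touch"));
```

Like the original, this returns whole paragraphs rather than single sentences, and it remains a regex heuristic; for arbitrary HTML a real parser is the more robust choice.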
There is cheerio which is something like server-side jQuery. So you can get your page as text, build DOM, and search inside of it.
First you could do var arr = text.split("<p>") in order to be able to work with each sentence individually.
Then you could loop through your array and search for your phrase inside strong tags:
for (var i = 0; i < arr.length; i++) {
  if (arr[i].search("<strong>" + phrase + "</strong>") != -1) {
    console.log("<p>" + arr[i]);
    // arr[i] is the entire sentence containing the phrase inside strong tags, minus "<p>"
  }
}

Python Selenium Scraping Javascript - Element not found

I am trying to scrape the following Javascript frontend website to practise my Javascript scraping skills:
https://www.oplaadpalen.nl/laadpaal/112618
I am trying to find two different elements by their XPath. The first one is the title, which it does find. The second one is the actual text itself, which it somehow fails to find. It's strange, since I just copied the XPaths from the Chrome browser.
from selenium import webdriver
link = 'https://www.oplaadpalen.nl/laadpaal/112618'
driver = webdriver.PhantomJS()
driver.get(link)
# It could find the right element
xpath_attribute_title = '//*[@id="main-sidebar-container"]/div/div[1]/div[2]/div/div[' + str(3) + ']/label'
next_page_elem_title = driver.find_element_by_xpath(xpath_attribute_title)
print(next_page_elem_title.text)
# It fails to find the right element
xpath_attribute_value = '//*[@id="main-sidebar-container"]/div/div[1]/div[2]/div/div[' + str(3) + ']/text()'
next_page_elem_value = driver.find_element_by_xpath(xpath_attribute_value)
print(next_page_elem_value.text)
I have tried a couple of things: change "text()" into "text", "(text)", but none of them seem to work.
I have two questions:
Why doesn't it find the correct element?
What can we do to make it find the correct element?
Selenium's find_element_by_xpath() method returns the first element node matching the given XPath query, if any. However, XPath's text() function returns a text node, not the element node that contains it.
To extract the text using Selenium's finder methods, you'll need to find the containing element, then extract the text from the returned object.
Keeping your own logic intact, you can extract the labels and the associated values as follows:
for x in range(3, 8):
    label = driver.find_element_by_xpath("//div[@class='labels']//following::div[%s]/label" % x).get_attribute("innerHTML")
    value = driver.find_element_by_xpath("//div[@class='labels']//following::div[%s]" % x).get_attribute("innerHTML").split(">")[2]
    print("Label is %s and value is %s" % (label, value))
Console Output :
Label is Paalcode: and value is NewMotion 04001157
Label is Adres: and value is Deventerstraat 130
Label is pc/plaats: and value is 7321cd Apeldoorn
I would suggest a slightly different approach. I would grab the entire text and then split one time on :. That will get you the title and the value. The code below will get Paalcode through openingstijden labels.
for x in range(2, 8):
    # find_elements (plural) returns a list, which can then be indexed
    s = driver.find_elements_by_css_selector("div.leftblock > div.labels > div")[x].text
    t = s.split(":", 1)
    print(t[0])  # title
    print(t[1])  # value
You don't want to split more than once because Status contains more colons.
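The same split-once idea can be sketched in JavaScript, the language of the page being scraped. Note that JavaScript's "a:b:c".split(":", 1) returns only the first piece and drops the rest, so it is not equivalent to Python's split(":", 1). The helper name splitOnce and the sample strings are made up for illustration; they are not scraped output.

```javascript
// Hypothetical helper: split `line` at the first occurrence of `separator` only.
function splitOnce(line, separator = ":") {
  const index = line.indexOf(separator);
  if (index === -1) {
    return [line, ""]; // no separator: treat the whole line as the title
  }
  const title = line.slice(0, index);
  const value = line.slice(index + separator.length).trim();
  return [title, value];
}

// Made-up sample strings; later colons stay inside the value.
console.log(splitOnce("Adres: Deventerstraat 130"));
console.log(splitOnce("Status: checked at 10:30"));
```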
Going with @JeffC's approach, if you want to first select all those elements using XPath instead of a CSS selector, you may use this code:
xpath_title_value = "//div[@class='labels']//div[label[contains(text(),':')] and not(div) and not(contains(@class,'toolbox'))]"
title_and_value_elements = driver.find_elements_by_xpath(xpath_title_value)
Notice the plural elements in the find_elements_by_xpath method. The XPath above selects div elements that are descendants of a div element that has a class attribute of "labels". The nested label of each selected div must contain a colon. Furthermore, the div itself may not have a class of "toolbox" (something that certain other divs on the page have), nor may it contain any additional nested divs.
Following which, you can extract the text within the individual div elements (which also contain the text from the nested label elements) and then split them using ":\n" which separates the title and value in the raw text string.
for element in title_and_value_elements:
    element = element.text
    title, value = element.split(":\n")
    print(title)
    print(value, "\n")
Since you want to practise your JS skills, you can also do this in JS. The divs actually contain more data, as you can see if you paste this into the browser console:
labels = document.querySelectorAll(".labels");
divs = labels[0].querySelectorAll("div");
for (div of divs) console.log(div.firstChild, div.textContent);
You can push the results to an array, keeping only the divs whose first child is a label, and return the resulting array into a Python variable:
labels_value_pair = driver.execute_script('''
scrap = [];
labels = document.querySelectorAll(".labels");
divs = labels[0].querySelectorAll("div");
for (div of divs) if (div.firstChild && div.firstChild.tagName === "LABEL") scrap.push(div.firstChild.textContent, div.textContent);
return scrap;
''')
