Suppose you are reading a text file with JavaScript and jQuery, the server-side developer is unwilling to give you XML or JSON, and you want to parse the file once to extract the text you will later use in an autocomplete, like so:
Text file (assume there are many similar listings and there are different DATABASES):
QUERY:1
DATABASE:geoquery
NL:What are the capitals of the states that border the most populated states?
SQL:something
DR:
root(ROOT-0, What-1)
cop(What-1, are-2)
det(capitals-4, the-3)
nsubj(What-1, capitals-4)
det(states-7, the-6)
prep_of(capitals-4, states-7)
nsubj(border-9, states-7)
rcmod(states-7, border-9)
det(states-13, the-10)
advmod(populated-12, most-11)
amod(states-13, populated-12)
dobj(border-9, states-13)
QUERY:2
DATABASE:geoquery
NL:What are the capitals of states bordering New York?
SQL:SELECT state.Capital FROM state JOIN border_info ON state.State_Name
DR:
root(ROOT-0, What-1)
cop(What-1, are-2)
det(capitals-4, the-3)
nsubj(What-1, capitals-4)
prep_of(capitals-4, states-6)
partmod(states-6, bordering-7)
nn(York-9, New-8)
dobj(bordering-7, York-9)
I can use a regex to peel off all the NL: lines, for example, but I first need to pare the file down so that only the NL lines associated with a specific DATABASE get read. So: read the file once, collect all matches for the database the user picks from a select element, then build an array of NL strings from those matches to serve as the source of an autocomplete.
$(document).ready(function(){
$.get('inputQueryExamples.txt',function(data){
// need code here to read text file first and limit results
var queryString = data;
var cleanString = "";
cleanString = queryString.match(/^NL.*/gm);
console.log(cleanString);
$('#what').html(cleanString);
var nlString = cleanString.map(function(el) {return el.replace('NL:','');});
$('#query-list').autocomplete({
source:nlString
});
});//end get
});
Thanks for any insight.
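Using regex for this is like using duct tape to patch up a severed limb.
Anyway,
By the looks of it, you want to get all of the NL lines that belong to a particular database.
You would need a regex with a positive lookbehind for the database name, then match everything after NL: up to the next newline. Two caveats: the DATABASE and NL lines sit on different lines, so the dot has to be allowed to match line breaks (the single-line/dotall "s" flag), and lookbehind needs an engine that supports it (in JavaScript, both arrived with ES2018).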
Example:
(?<=DATABASE:geoquery).*?(?<=NL:)(.*?)(?=[\r\n])
Online demo:
Regex101 Example
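As an alternative to one large regex, here is a minimal sketch that splits the file into records first and then filters them. It assumes every record starts with a QUERY: line and has at most one DATABASE: and one NL: line, as in the sample above; the function name nlForDatabase is made up for illustration.

```javascript
// Return the NL text of every record whose DATABASE matches `database`.
// Assumes each record begins with a line starting with "QUERY:".
function nlForDatabase(text, database) {
  return text
    // Zero-width split: cut the text right before each "QUERY:" line.
    .split(/^(?=QUERY:)/m)
    // Keep only records that belong to the requested database.
    .filter(function (record) {
      var db = record.match(/^DATABASE:(.*)$/m);
      return db !== null && db[1].trim() === database;
    })
    // Pull the NL line out of each remaining record.
    .map(function (record) {
      var nl = record.match(/^NL:(.*)$/m);
      return nl ? nl[1].trim() : null;
    })
    // Drop records that had no NL line.
    .filter(function (nl) { return nl !== null; });
}
```

Inside the $.get callback, the result of nlForDatabase(data, selectedDatabase) could be passed directly as the autocomplete source, where selectedDatabase stands for whatever value the select element currently holds.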
Related
I wrote a Google Apps Script that pulls a single spreadsheet cell from new Google Form entries and sends its contents via email.
The script is working nicely, but the cell content is sent as a single block of text (newlines, paragraphs, etc are dropped). The Google Form entry is of "Paragraph/Long Text" type and I'd like to maintain the authors' formatting in the generated email.
I am tinkering with string types, but can't quite find the right combination. Any advice will be immensely appreciated. Code below.
function SendEmail() {
// find out how many rows exist
var numRows = SpreadsheetApp.getActiveSheet().getRange("A:A").getLastRow();
// fetch entry
var messageRange = SpreadsheetApp.getActiveSpreadsheet().getSheetByName("Form Response").getRange("D" + numRows);
// build message
var message = {
to: "destination#email.net",
subject: "New Entry Posted",
htmlBody: 'Hello, a new entry was posted.<p> <p>' + messageRange.getValues() + "<p> <p>Link here."
};
// send
MailApp.sendEmail(message);
}
Keeping Line-Breaks
To preserve line breaks, the issue is that text-based line breaks (carriage return, \r, or newline, \n) are ignored in HTML. The best ways to create spacing between lines are to either use the dedicated "line break" element, or separate the text into individual elements and add spacing with CSS.
Luckily, converting Google Sheets line breaks into HTML is really easy. Simply change messageRange.getValue() to messageRange.getValue().replace(/[\r\n]{1,2}/g,"<br>").
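One thing to watch with the {1,2} quantifier: two consecutive \n characters (a blank line between paragraphs) are matched together and collapse into a single <br>. If blank lines should survive, a minimal sketch of a variant that replaces each line break on its own could look like this; the helper name toHtmlLineBreaks is hypothetical.

```javascript
// Replace every line break (\r\n, \r, or \n) with exactly one <br>,
// so a blank line in the cell becomes two <br> elements.
function toHtmlLineBreaks(text) {
  return String(text).replace(/\r\n|\r|\n/g, "<br>");
}
```

In the script this would be used as toHtmlLineBreaks(messageRange.getValue()).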
Advanced Formatting
If you are looking to preserve more advanced formatting, such as colors and images, the solution gets a bit trickier. Both range.getValue() and even range.getRichTextValue().getText() return plain text. To convert to HTML, you need to use a bunch of other methods, like range.getFontColors(), parse the output into CSS, and combine it with the plain text value. There is a dedicated library called SheetConverter to accomplish this, and you can see this SO answer for details.
Other things I noticed:
I noticed a few other things about your code that you might want to change. You might have noticed that my solution uses messageRange.getValue(), whereas your code has messageRange.getValues(). You want a single value, but range.getValues() returns a two-dimensional array and is meant for reading multiple values from a range of more than one cell.
You also have malformed HTML in your htmlBody. You open a bunch of <p> tags, but never close them with </p>.
Another thing is that the way you get the last row doesn't really make sense. You check the last row of "A:A" in the active sheet, but the active sheet can change and there is no guarantee that it is the same as "Form Response", or that A:A has the same last row as D:D. I think a safer solution would be something like this (which also includes the above recommended changes):
function SendEmail() {
var formResponseSheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName("Form Response");
// fetch entry
var messageRange = formResponseSheet.getRange("D" + formResponseSheet.getLastRow());
// build message
var message = {
to: "destination#email.net",
subject: "New Entry Posted",
htmlBody: 'Hello, a new entry was posted.<p>' + messageRange.getValue().replace(/[\r\n]{1,2}/g,"<br>") + "</p>Link here."
};
// send
MailApp.sendEmail(message);
}
I am trying to find a regex that matches links under certain conditions. Here is the scenario:
I have created utils.ts, a TypeScript file. It takes an API response as input and returns HTML-formatted text: bold text, emails, images, links.
Here is the case I am stuck on.
The function in utils.ts returns this:
https://www.google.com Click here
(Note: plain links and 'a' tag links can occur in any order.)
As you can see, the Click here part of the text above is already formatted as an HTML link.
So I will get the following output on GUI
https://www.google.com Click here
From this point, I want a regex that formats https://www.google.com but does not touch Click here, as it is already formatted.
I also want to format https://www.google.com as follows:
Google
The main problem is that when I replace strings starting with 'https://..' with tags, the links inside 'href' get replaced too, like this:
Google Google">Click me</a>
Which is what I don't want.
Please share your thought on this.
Thank you
Not yet formatted links can be found using alternations. The idea is - if a link is formatted it's not captured to a group (don't be confused that the regex still finds something - you should only look at Group 1). Otherwise, the link is captured to a group.
The regex below is really simple, just to explain the idea. You might want to update it with a better URL search pattern.
demo
(?:href="https?\S+")|(https?\S+)
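A minimal sketch of how the alternation can drive a replacement: the callback receives Group 1 only for bare URLs, so href attributes are returned unchanged. The function name linkifyBareUrls and the choice to reuse the URL as the link text are assumptions for illustration, and the URL pattern is deliberately simple.

```javascript
// Wrap bare http(s) URLs in <a> tags while leaving href="..." values alone.
function linkifyBareUrls(html) {
  // First alternative consumes an existing href attribute (no capture).
  // Second alternative captures a bare URL into Group 1.
  var pattern = /href="https?[^"]*"|(https?:\/\/[^\s<"]+)/g;
  return html.replace(pattern, function (match, url) {
    // `url` is undefined when the href alternative matched.
    return url ? '<a href="' + url + '">' + url + '</a>' : match;
  });
}
```

This sketch does not handle a URL used as the visible text of an existing link; that case would need an extra alternative that consumes whole <a>...</a> elements.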
If I understood correctly, you want to extract the web addresses that appear in the text and are not already links. If so, check out the following JavaScript:
//the data:
var txt1='https://www.google.com Click here http://other.domain.com';
// strip html tags (a plain function, so String.prototype is left untouched)
function stripHTML(text) {
    var reTag = /<(?:.|\s)*?>/g;
    return text.replace(reTag, " ");
}
var txt2=stripHTML(txt1);
//console.log(txt2);
//split tokens on runs of whitespace
var regex1 = /\s+/;
var tokens = txt2.split(regex1);
//console.log(tokens);
//build an address table from the tokens that look like URLs
var regex2 = /^https?:\/\/.*/;
var addresses = tokens.filter(function (token) {
    return regex2.test(token);
});
console.log(addresses);
I want to build a webpage/custom search engine that takes user input and queries Google for the results, except that for a number of keywords I want to append some predefined strings (which will be stored in a dictionary or file).
I tried using a form and then submitting the query to Google, but I want to do it as beautifully (and in much the same way) as these people have done.
They just add Zlatan at the beginning; I want to append variable strings.
To show you what I've tried, here is the link to GitHub: https://github.com/google/google-api-php-client/blob/master/examples/simple-query.php
Any useful links, knowledge, suggestions, or steps would be heartily appreciated.
This is how they do it. Just do the same, but append any word from your list of words as you like. It's not clear how you're supposed to select which words to add. Specify that and we can be more helpful.
var form = document.querySelector('form');
var input = document.getElementById('field');
form.addEventListener('submit', function(ev) {
ev.preventDefault();
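// encodeURIComponent keeps characters such as & and # from breaking the query string
var redirect = 'https://google.com/search?q=zlatan+' + encodeURIComponent(input.value);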
window.location = redirect;
});
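Since the selection rule is not specified, the following is only a sketch under one assumed rule: each word the user types is looked up in a dictionary, and any predefined string found for it is appended to the query. The names buildSearchUrl and extraTerms, and the dictionary contents, are hypothetical.

```javascript
// Build a Google search URL, appending the predefined string for every
// typed word that has an entry in `extraTerms` (keys are lowercase).
function buildSearchUrl(query, extraTerms) {
  var words = query.trim().split(/\s+/).filter(Boolean);
  var extras = [];
  words.forEach(function (word) {
    var key = word.toLowerCase();
    // hasOwnProperty avoids picking up inherited keys like "constructor".
    if (Object.prototype.hasOwnProperty.call(extraTerms, key) &&
        extras.indexOf(extraTerms[key]) === -1) {
      extras.push(extraTerms[key]);
    }
  });
  var fullQuery = words.concat(extras).join(' ');
  return 'https://www.google.com/search?q=' + encodeURIComponent(fullQuery);
}
```

In the submit handler above, window.location could then be set to buildSearchUrl(input.value, extraTerms) instead of the hard-coded string.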
When I use user-defined tags with uppercase names, like "<ABC> test </ABC>", in CKEditor and click Source, they are displayed as "<abc> test </abc>". Please help me get the expected output, which should be <ABC> test </ABC>, and guide me to where the code should be modified. Thank you.
(Continued from comments) I propose post-processing the content and not trying to bend CKEditor to produce Case Sensitive output.
I don't know your languages or your architecture, but if you get the data from CKEditor with getData(), you can do something like this if you want to do the conversion in the client side:
// JavaScript
var i = CKEDITOR.instances.editor1;
var d = i.getData();
// (\/?) keeps the optional slash, so closing tags are converted as well
var correctData = d.replace(/<(\/?)abc\b/ig, '<$1ABC');
In the backend you can do something similar
// C# (untested)
string result = Regex.Replace(
htmlStringFromAJAX,
@"<(/?)abc\b",
"<${1}ABC",
RegexOptions.IgnoreCase
);
// PHP (untested)
$result = str_ireplace(array("<abc", "</abc"), array("<ABC", "</ABC"), $htmlStringFromAJAX);
(I hope you either have just this one abc tag or a small static amount of tags - if not, this will be a very annoying solution to maintain.)
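If there do turn out to be several such tags, one way to keep the post-processing manageable is to drive it from a list of canonical tag names. This is a sketch only; restoreTagCase and the tag list are hypothetical, and it assumes tag names made of letters, digits, underscores, and hyphens.

```javascript
// Restore the canonical casing of known custom tags in an HTML string.
function restoreTagCase(html, tagNames) {
  // Map lowercase name -> canonical name, e.g. "abc" -> "ABC".
  var canonical = {};
  tagNames.forEach(function (name) {
    canonical[name.toLowerCase()] = name;
  });
  // Match the start of any opening or closing tag and its name.
  return html.replace(/<(\/?)([a-z][a-z0-9_-]*)/gi, function (match, slash, name) {
    var key = name.toLowerCase();
    return Object.prototype.hasOwnProperty.call(canonical, key)
      ? '<' + slash + canonical[key]
      : match; // unknown tags are left exactly as they were
  });
}
```

For example, restoreTagCase(CKEDITOR.instances.editor1.getData(), ['ABC']) would cover the single-tag case from the question.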
First, I've looked at a lot of posts on Stack Overflow, but I don't see one that seems to be the definitive way. Someone always seems to find a flaw in the regex.
I have already retrieved my tweets, and obviously each one can contain any number of hashtags.
Say I have an array of possible hashtags that I want to find: ["#ENGLAND","#IRELAND","#wales"] etc.
What is a RELIABLE way to check whether a tweet contains these hashtags? I don't want to call the API again; I only want to check my existing tweets. I'm clicking buttons to change the filtering on the fly, and I want to avoid hitting the rate limit if users keep clicking around for ages.
EDIT:
Example: Here is a tweet that contains #ENGLAND and #someothertag
I want to search all the tweets and just show the tweets that CONTAIN one or more of my array of tags, I already cache the tweets, I don't want to make a call containing any tags just filter the existing results!
Why only hashify particular hashtags (which you need to specify and then maintain) when you can hashify any hashtag?
I usually use something like this:
var hashregex = /#([a-z0-9_\-]+)/gi;
// text is assumed to already hold the tweet text
text = text.replace(hashregex, function (value) {
return '<a target="_blank" href="http://twitter.com/#!/search?q=' + value.replace('#', '%23') + '">' + value + '</a>';
});
Then you can just use text when you set the content to the processed tweets
You could store the hashtags from the entities on the element, for instance
<div class='tweet' data-hashtags='england ireland'>some_tweet</div>
And filter like this when someone clicks your button:
$('div.tweet').hide();
$('div.tweet[data-hashtags~="ireland"]').show();
It's obviously greatly simplified, but the general approach should help you avoid having to parse out the tags yourself
// match a #, followed by "england", "ireland", or "wales"
var myregexp = /#(england|ireland|wales)\b/i;
var match = myregexp.exec(subject);
if (match != null) {
result = match[1]; // will contain "england","ireland", or "wales"
} else {
result = "";
}
If you don't know the names of the hashtags on hand
replace
var myregexp = /#(england|ireland|wales)\b/i;
with
var myregexp = /#(\w+)/; // Use this instead
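To tie this back to the question, here is a minimal sketch that builds the alternation from the array of hashtags and filters the cached tweets with it. It assumes the tweets are plain strings; filterTweetsByHashtags and escapeRegExp are names made up for illustration.

```javascript
// Escape characters that have a special meaning inside a regex.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Return the tweets that contain at least one of the given hashtags,
// case-insensitively and as whole tags (#england does not match #englandfans).
function filterTweetsByHashtags(tweets, hashtags) {
  if (hashtags.length === 0) {
    return [];
  }
  var names = hashtags.map(function (tag) {
    return escapeRegExp(tag.replace(/^#/, ''));
  });
  // The negative lookahead rejects matches that continue with a word character.
  var pattern = new RegExp('#(?:' + names.join('|') + ')(?![A-Za-z0-9_])', 'i');
  return tweets.filter(function (tweet) {
    return pattern.test(tweet);
  });
}
```

The regex has no g flag, so repeated calls to test() carry no state between tweets, and no API call is needed when the user switches filters.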