how to extract this kind of data and put them into a nice array? - javascript

I got a string like this one:
var tweet ="#fadil good:))RT #finnyajja: what a nice day RT #fadielfirsta: how are you? #finnyajja yay";
What code would extract every word that starts with the # character and also remove any special characters at the end of those words? The result should be an array like this:
(#fadil, #finnyajja, #fadielfirsta, #finnyajja);
I have tried the following code:
var users = $.grep(tweet.split(" "), function(a){return /^#/.test(a)});
it returns this:
(#fadil, #finnyajja:, #fadielfirsta:, #finnyajja)
There is still a colon ':' at the end of some words. What should I do? Any solutions? Thanks

Here is code that is more straightforward than trying to use split:
var tweet_text ="#fadil good:))RT #finnyajja: what a nice day RT #fadielfirsta: how are you? #finnyajja yay";
var result = tweet_text.match(/#\w+/g);

The easiest way without changing your current code too much would be to remove all colons before calling split. Note that replace with a string pattern only replaces the first occurrence, so use a regex with the g flag:
var users = $.grep(tweet.replace(/:/g, "").split(" "), function(a){return /^#/.test(a)});
You could also write a regex to do all the work for you using match. Something like this:
var regex = /#[a-z0-9]+/gi;
var matches = tweet.match(regex);
This assumes that you only want letters and numbers. If other characters are allowed, the regex will need to be modified.
http://jsfiddle.net/YHM87/
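As a minimal sketch of the match approach from the answers above, the following wraps it in a helper. The function name extractTags is made up for illustration, and the `|| []` fallback is an addition: match returns null when there are no matches, which would otherwise break code that expects an array.

```javascript
// Hypothetical helper (name is illustrative): collect every "#word" token.
function extractTags(text) {
  // \w+ stops at the first character that is not a letter, digit or underscore,
  // so a trailing ":" is never included.
  // match() returns null when nothing matches, so fall back to an empty array.
  return text.match(/#\w+/g) || [];
}

const tweet = "#fadil good:))RT #finnyajja: what a nice day RT #fadielfirsta: how are you? #finnyajja yay";
const tags = extractTags(tweet);
console.log(tags); // ["#fadil", "#finnyajja", "#fadielfirsta", "#finnyajja"]
```

Duplicates are kept, as in the desired output from the question.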

Related

How would I write a Regular Expression to capture the value between Last Slash and Query String?

Problem:
Extract image file name from CDN address similar to the following:
https://cdnstorage.api.com/v0/b/my-app.com/o/photo%2FB%_2.jpeg?alt=media&token=4e32-a1a2-c48e6c91a2ba
Two-stage Solution:
I am using two regular expressions to retrieve the file name:
var postLastSlashRegEx = /[^\/]+$/,
preQueryRegEx = /^([^?]+)/;
var fileFromURL = urlString.match(postLastSlashRegEx)[0].match(preQueryRegEx)[0];
// fileFromURL = "photo%2FB%_2.jpeg"
Question:
Is there a way I can combine both regular expressions?
I've tried using capture groups, but haven't been able to produce a working solution.
From my comment
You can use a lookahead to find the "?" and use [^/] to match any non-slash characters.
/[^/]+(?=\?)/
To remove the dependency on the URL containing a "?", you can make the lookahead match either a question mark or the end of the input (represented by $), but make sure the first quantifier is non-greedy.
/[^/]+?(?=\?|$)/
You don't have to use regex, you can just use split and substr.
var str = "https://cdnstorage.api.com/v0/b/my-app.com/o/photo%2FB%_2.jpeg?alt=media&token=4e32-a1a2-c48e6c91a2ba".split("?")[0];
var fileName = str.substr(str.lastIndexOf('/')+1);
but if regex is important to you, then:
str.match(/[^?]*\/([^?]+)/)[1]
The code using the substring method would look like the following -
var fileFromURL = urlString.substring(urlString.lastIndexOf('/') + 1, urlString.lastIndexOf('?'))
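Here is a sketch that puts the split-based and the lookahead-based approaches from the answers side by side. The function names are invented for illustration. Both versions also handle a URL without a query string, a case where lastIndexOf('?') returns -1 and the plain substring call above would not give the file name.

```javascript
// Hypothetical helpers (names are illustrative, not from the original answers).

// Drop the query string first, then take everything after the last slash.
function fileNameBySplit(urlString) {
  const path = urlString.split("?")[0];
  return path.substring(path.lastIndexOf("/") + 1);
}

// Single regex: a lazy run of non-slash characters that is followed
// by "?" or by the end of the input.
function fileNameByRegex(urlString) {
  const m = urlString.match(/[^/]+?(?=\?|$)/);
  return m ? m[0] : "";
}

const withQuery = "https://cdnstorage.api.com/v0/b/my-app.com/o/photo%2FB%_2.jpeg?alt=media&token=4e32-a1a2-c48e6c91a2ba";
const withoutQuery = "https://cdnstorage.api.com/v0/b/my-app.com/o/photo%2FB%_2.jpeg";

console.log(fileNameBySplit(withQuery));    // "photo%2FB%_2.jpeg"
console.log(fileNameByRegex(withQuery));    // "photo%2FB%_2.jpeg"
console.log(fileNameBySplit(withoutQuery)); // "photo%2FB%_2.jpeg"
console.log(fileNameByRegex(withoutQuery)); // "photo%2FB%_2.jpeg"
```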

Splitting string with javascript using '>' character

I acknowledge that this question has probably been asked so many times before and I have tried searching all over StackOverflow for a solution, but so far nothing has worked for me.
I want to split a string, but it's not working properly: it spits out individual characters as each item in the array. The string I get from my CMS uses ">" characters as separators, and I am using a regex to replace the 'greater than' symbol with a comma, which works. I sourced this solution from "Regex that detects greater than ">" and less than "<" in a string".
However, the arrays remain incorrectly formed, like the split() function does not even work:
var myString = "TEST Public Libraries Connect > News Blog > A new item"
var regEx = /<|>/g;
var myNewString = (myString.replace(regEx,","))
alert(myNewString);
myNewString.split(",");
alert(myNewString[0]);
alert(myNewString[1]);
alert(myNewString[2]);
I've put it up in a Fiddle as well. I'm just confused as to why the split won't work properly. Is it because there are spaces in the string?
This should work:
var myNewString = myString.split(">");
https://jsfiddle.net/2j56cva0/3/
In your fiddle, you were splitting myNewString instead of the actual string.
myNewString.split(",");
You need to assign the result of the split to something. It does not just change the string itself into an array.
var parts = myNewString.split(",");
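The following is a small sketch of the point made in the answers: split returns a new array and leaves the string untouched, so the result has to be stored. The trim step is an addition that is not in the original answers. It removes the spaces around each ">" and assumes that surrounding whitespace is not wanted.

```javascript
const myString = "TEST Public Libraries Connect > News Blog > A new item";

// split() does not modify myString; it returns a new array.
// trim() is an extra step (an assumption) to drop the spaces around ">".
const parts = myString.split(">").map(function (part) {
  return part.trim();
});

console.log(parts[0]); // "TEST Public Libraries Connect"
console.log(parts[1]); // "News Blog"
console.log(parts[2]); // "A new item"
```

Indexing the original string, as in the question, returns single characters, which explains the output the asker saw.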

How do I get a list of strings ending in a newline or ending in the end of the string in javascript regex?

I'm pretty frustrated with regex right now. Given:
var text = "This is a sentence.\nThis is another sentence\n\nThis is the last sentence!"
I want regex to return to me:
{"This is a sentence.\n", "This is another sentence\n\n", "This is the last sentence!"}
I think i should use
var matches = text.match(/.+[\n+\Z]/)
but \Z doesn't seem to work. Does javascript have an end of string matcher?
You can use the following regex.
var matches = text.match(/.+\n*/g);
Or you could match a newline sequence "one or more" times or the end of the string.
var matches = text.match(/.+(?:\n+|$)/g);
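As a quick check of the two patterns above, this sketch runs both against the sample text from the question. The variable names are illustrative. In JavaScript, "." does not match a newline unless the s flag is set, which is why ".+" stops at the end of each line.

```javascript
const text = "This is a sentence.\nThis is another sentence\n\nThis is the last sentence!";

// A line followed by zero or more newlines.
const viaStar = text.match(/.+\n*/g);

// A line followed by one or more newlines, or by the end of the string.
const viaAlternation = text.match(/.+(?:\n+|$)/g);

console.log(viaStar);
// ["This is a sentence.\n", "This is another sentence\n\n", "This is the last sentence!"]
console.log(viaAlternation);
// same three items
```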
Try this one: /(.+\n*)/g
See it here: http://regex101.com/r/wK8oX3/1
If you wanted an array and didn't want to keep the "\n" around you could do...
var strings = text.split("\n");
which would yield
["This is a sentence.", "This is another sentence", "", "This is the last sentence!"]
If you wanted to get rid of that empty string, chain a filter onto the split...
var strings = text.split("\n").filter(function(s){ return s !== ""; });
Maybe not what you want tho, also not as efficient as the regex options already proposed.
Edit: as torazaburo pointed out using Boolean as the filter function is cleaner than a callback.
var strings = text.split("\n").filter(Boolean);
Edit Again: I keep getting one upped, using the /\n+/ expression is even cooler.
var strings = text.split(/\n+/);
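The three split variants above can be compared directly. This is a sketch using the sample text from the question, with illustrative variable names.

```javascript
const text = "This is a sentence.\nThis is another sentence\n\nThis is the last sentence!";

// Split on single newlines, then drop empty strings with a callback.
const withCallback = text.split("\n").filter(function (s) { return s !== ""; });

// Same idea; Boolean("") is false, so empty strings are filtered out.
const withBoolean = text.split("\n").filter(Boolean);

// Treat any run of newlines as one separator.
const withRegex = text.split(/\n+/);

console.log(withRegex);
// ["This is a sentence.", "This is another sentence", "This is the last sentence!"]
```

One difference to keep in mind: if the text starts or ends with a newline, split(/\n+/) still produces an empty string at that end, while the filter versions remove it.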
To get an array of sentences:
var matches = text.match(/.+?(?:\n+|$)/g);
You can try this,
text.match(/.+/g)

javascript regex to extract the first character after the last specified character

I am trying to extract the first character after the last underscore in a string with an unknown number of '_' in the string but in my case there will always be one, because I added it in another step of the process.
What I tried is this. I also tried the regex by itself to extract from the name, but my result was empty.
var s = "XXXX-XXXX_XX_DigitalF.pdf"
var string = match(/[^_]*$/)[1]
string.charAt(0)
So the final desired result is 'D'. If the RegEx can only get me what is behind the last '_' that is fine because I know I can use the charAt like currently shown. However, if the regex can do the whole thing, even better.
If you know there will always be at least one underscore you can do this:
var s = "XXXX-XXXX_XX_DigitalF.pdf"
var firstCharAfterUnderscore = s.charAt(s.lastIndexOf("_") + 1);
// OR, with regex
var firstCharAfterUnderscore = s.match(/_([^_])[^_]*$/)[1]
With the regex, you can extract just the one letter by using parentheses to capture that part of the match. But I think the .lastIndexOf() version is easier to read.
Either way if there's a possibility of no underscores in the input you'd need to add some additional logic.
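One possible shape for that additional logic is sketched below. The function name and the choice to return null are assumptions made for illustration. The function returns null both when there is no underscore and when the underscore is the last character.

```javascript
// Hypothetical helper: first character after the last "_", or null if there is none.
function firstCharAfterLastUnderscore(s) {
  const i = s.lastIndexOf("_");
  // -1 means no underscore; i === s.length - 1 means nothing follows it.
  if (i === -1 || i === s.length - 1) {
    return null;
  }
  return s.charAt(i + 1);
}

console.log(firstCharAfterLastUnderscore("XXXX-XXXX_XX_DigitalF.pdf")); // "D"
console.log(firstCharAfterLastUnderscore("NoUnderscore.pdf"));          // null
```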

regex: read character behind number

I have this regex code:
<script type="text/javascript">
var str = "kw-xr55und";
var patt1 = /[T|EE|EJU].*D/i;
document.write(str.match(patt1));
</script>
it can read:
str= "KD-R55UND" -> as UND
but if I use:
str= "kw-tc800h2und" -> the result is tc800h2und // the script matches the T in front of 800
I want the result to be UND.
How can I make the code check only the characters after the 800?
EDIT
After this code it can work:
<script type="text/javascript">
var str = "kw-tc800h2und";
var patt1 = /[EJTUG|]\D*D/i;
document.write(str.match(patt1));
</script>
but it leads to the next problem. I can't get a result if:
str= "kw-tc800un2d"
I want a result like -> UN2D
Try this:
var patt1 = /(T|EE|EJU)\D*$/i;
It will match a sequence of non-digit characters starting with T, EE or EJU, and finishing at the end of the string. If the string has to end with D as in your examples, you can add that in:
var patt1 = /(T|EE|EJU)\D*D$/i;
If you want to match it anywhere, not just at the end of the string, try this:
var patt1 = /(T|EE|EJU)\D*D/i;
EDIT: Oops! No, of course that doesn't work. I tried to guess what you meant by [T|EE|EJU], because it's a character class that matches one of the characters E, J, T, U or | (equivalent to [EJTU|]), and I was sure that couldn't be what you meant. But what the heck, try this:
var patt1 = /[EJTU|]\D*D/i;
I still don't understand what you're trying to do, but sometimes trial and error is the only way to move ahead. At least I tested it this time! :P
EDIT: Okay, so the match can contain digits, it just can't start with one. Try this:
var patt1 = /[EJTU|]\w*D/i;
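To illustrate the difference between the character class and the grouped alternation discussed in this answer, here is a small sketch. The anchors ^ and $ are added only so that each sample has to match as a whole.

```javascript
// [T|EE|EJU] is a character class: exactly ONE of the characters T, |, E, J, U.
const charClass = /^[T|EE|EJU]$/i;

// (?:T|EE|EJU) is an alternation: the sequence "T", "EE" or "EJU".
const alternation = /^(?:T|EE|EJU)$/i;

["T", "E", "|", "EE", "EJU"].forEach(function (sample) {
  console.log(sample, charClass.test(sample), alternation.test(sample));
});
// T   true  true
// E   true  false
// |   true  false
// EE  false true
// EJU false true
```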
For PCRE, try this:
/(?<=\d)\D*/
It uses a lookbehind to find a set of non-digit characters that comes immediately after a digit.
For Javascript, try this:
/\D+$/
It will match any characters that are not digits from the end of the text backwards.
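A short sketch of how this pattern behaves on the strings from the question follows. The helper name is made up for illustration. It gives the wanted UND for the first two inputs, but only "d" for the last one, because the trailing run of non-digits there is a single character.

```javascript
// Hypothetical helper: the run of non-digit characters at the end of the string.
function trailingNonDigits(s) {
  const m = s.match(/\D+$/);
  return m ? m[0] : ""; // match() returns null if the string ends with a digit
}

console.log(trailingNonDigits("KD-R55UND"));     // "UND"
console.log(trailingNonDigits("kw-tc800h2und")); // "und"
console.log(trailingNonDigits("kw-tc800un2d"));  // "d"
```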
Depending on what flavor of regex you're using, you can use a look behind assertion. It will essentially say 'match this if it's right after a number'.
In Python it's like this:
(?<=\d)\D*
Oh, also, regex is case sensitive unless you set it not to be.
Try /[TEUJ]\D*\d*\D*D$/i if it can have 1 digit in it, but not 2. Getting any more specific would require additional information, such as the maximum length of the string, or what exactly differs between parsing tc800h2und and h2und.
