reformat characters in json data - javascript

I am retrieving data from reddit's JSON, and some of the data looks like this:
The actual resolution of this image is 3067x2276, not 4381x3251. See [this](https://www.reddit.com/r/EarthPorn/wiki/index#wiki_resolution.3F_what_is_that_and_how_can_i_find_it.3F) page for information on how to find out what the resolution of an image is.
I want to insert the data into a <p></p> on my page, but the link stays as it is above (not clickable).
Notice that when I try to post it on Stack Overflow, it is very nicely reformatted into a clickable link. How do I do that?
Reformatted by Stack Overflow:
The actual resolution of this image is 3067x2276, not 4381x3251. See this page for information on how to find out what the resolution of an image is.
How do I achieve that?

I feel like I cheated, but inspecting the OP in my browser, I get...
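<p>The actual resolution of this image is 3067x2276, not 4381x3251. See <a href="https://www.reddit.com/r/EarthPorn/wiki/index#wiki_resolution.3F_what_is_that_and_how_can_i_find_it.3F">this</a> page for information on how to find out what the resolution of an image is.</p>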
In other words, if you find [words](URL), replace it with:
<a href="URL">words</a>
This little regex tries to capture the contents of [] followed by (). Checking for http may be insufficient depending on the sort of links you expect...
let regex = /\[(.*?)\]\(([^)]+)\)/;
let matches = regex.exec(line);
// matches is null when nothing is found; otherwise matches[1] holds the words and matches[2] a potential url
if (matches && /^https?:\/\//.test(matches[2])) {
  // matches[2] is probably a url (http or https), so...
  let replace = `<a href="${matches[2]}">${matches[1]}</a>`;
  // ...
}
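To handle every link on a line rather than only the first, one option is String.prototype.replace with a global regex and a callback. The following is a minimal sketch, not the answerer's own code: the names linkify and escapeHtml are hypothetical, and it assumes links always start with http:// or https:// and contain no whitespace or closing parentheses.

```javascript
// Hypothetical helper: escape characters that are special inside HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: turn every [words](url) into <a href="url">words</a>.
function linkify(line) {
  // Group 1: the text inside []; group 2: an http(s) URL inside ().
  const regex = /\[([^\]]*)\]\((https?:\/\/[^\s)]+)\)/g;
  return line.replace(regex, (whole, words, url) =>
    `<a href="${escapeHtml(url)}">${escapeHtml(words)}</a>`);
}

console.log(linkify("See [this](https://example.com/wiki#top) page."));
```

The result is meant to be assigned to the paragraph's innerHTML. Text outside the links is left untouched, so it should be escaped separately if it can contain markup.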

Start with regular expressions, which are basically wildcards on steroids.
The pattern /\[.*\]\(.*\)/, while it looks weird, will find [*](*) where * can be a string of any length. On its own, all this does is locate the first place the pattern appears. I tried looking further, but I'm not the best with JS.
https://www.w3schools.com/js/js_regexp.asp

Related

Splitting Web Page URL but Not URL After the Hash in JavaScript

So I tried getting the text after my URL like this "example.com/#!/THISPART" with the code below:
var webSplit = window.location.hash.split('/')[1];
It works, but if I supply another URL after it, like "example.com/#!/https://example.com/", the output is "https:", which I don't want. I want the whole URL. Can someone help me out? Thanks!
window.location.hash.split('!/')[1]; // Outputs "https://example.com/"
Meet REGEX.
var part = location.hash.replace(/#!\//, '');
That'll give you whatever's after #!/.
The actual problem is that your approach to the split is incorrect.
With your example, window.location.hash returns #!/https://example.com/. What you're doing is splitting that string by / and using the second element of the result.
The result of the split is 5 elements ([ "#!", "https:", "", "example.com", "" ]), and the second element is just https:. To fix it, you need to either find another delimiter or find another way to extract the URL.
With your current example you could use !/ for the split to get two elements back ([ "#", "https://example.com/" ]), where the second one is the whole URL, as long as the URL doesn't itself contain the string !/. Another approach would be to find the first occurrence of http from the left and take a substring from that position until the end.
Other answers show some different solutions.
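The difference between the two delimiters, and a variant of the substring idea, can be sketched as follows. The helper name afterHashBang and the sampleHash value are assumptions for illustration. In a browser, the input would be window.location.hash.

```javascript
// Hypothetical helper: return whatever follows "#!/" in a hash string, or null if absent.
function afterHashBang(hash) {
  const marker = "#!/";
  return hash.startsWith(marker) ? hash.slice(marker.length) : null;
}

// Sample value standing in for window.location.hash.
const sampleHash = "#!/https://example.com/";

console.log(sampleHash.split("/"));     // splitting on "/" breaks the URL apart
console.log(sampleHash.split("!/"));    // splitting on "!/" keeps the URL whole
console.log(afterHashBang(sampleHash)); // slicing after the marker avoids splitting at all
```

Slicing after a fixed marker sidesteps the question of which delimiter is safe, since nothing inside the trailing URL is ever examined.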
var webSplit = window.location.href.split('/#!/')[1];
The output will be https://example.com/, as you want.

Matching a JS string with regex

I have a long raw XML message that is stored as a string. A sample is below.
<tag1>val</tag1><tag2>val</tag2><tagSomeNameXYZ/>
I'm looking to search this string and find out whether it contains an empty tag such as <tagSomeNameXYZ/>. The thing is, the value of SomeName can change depending on context. I've tried using Str.match(/tagSomeNameXYZ/g) and Str.match(/<tag.*.XYZ\/>/g) to find out whether it contains exactly that string, but I am unable to get it to return anything. I'm having trouble writing a regex that matches something like <tag*XYZ/>, where * is going to be SomeName (which I'm not interested in).
Tl;dr: How do I filter out <tagSomeNameXYZ/> from the string? The format is: <constant variableName constant/>
Example patterns that it should match:
<tagGetIndexXYZ/>
<tagGetAllIndexXYZ/>
<tagGetFooterXYZ/>
The issue you have with Str.match(/<tag.*.XYZ\/>/g) is that .* is greedy: it takes everything it sees and does not stop at the first XYZ as you wish. So you need a way to stop (e.g. [^/]* means keep taking characters until you find a /) and then work back from there (the slice).
Does this help?
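const testString = "<tagGetIndexXYZ/>";
// capture everything between <tag and />, then drop the trailing "XYZ"
const res = testString.match(/<tag([^/]*)\/>/)[1].slice(0, -3);
console.log(res); // "GetIndex"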
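If the goal is to check a whole message and collect every SomeName in it, a global regex with matchAll is one option. This is a sketch under assumptions: the helper name findEmptyTagNames is hypothetical, and it assumes the variable part consists only of letters, digits, or underscores.

```javascript
// Hypothetical helper: list the variable part of every <tag...XYZ/> in the message.
function findEmptyTagNames(xml) {
  // \w+ cannot cross ">" or "/", so a match never spills over into a neighbouring tag.
  const pattern = /<tag(\w+)XYZ\/>/g;
  return Array.from(xml.matchAll(pattern), (m) => m[1]);
}

const message = "<tag1>val</tag1><tag2>val</tag2><tagGetIndexXYZ/><tagGetFooterXYZ/>";
const names = findEmptyTagNames(message);
console.log(names.length > 0, names);
```

An empty array means the message contains no such tag, so the same call answers both "is it there?" and "which ones?".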

regex replace urls with anchors ONLY if not already in anchors

I've seen similar questions asked before, but none with a working solution.
I am trying to replace all urls on a page with anchor tags, but only those which aren't already within anchor tags.
So http://google.com should be replaced with
<a href="http://google.com">http://google.com</a>
But only if it's not already within an anchor tag.
Any thoughts?
I think you need to do a two-pass operation. Split the source into
PART1 <a href=...>blah</a> PART2 <a href=...>blah</a> PART3...
Then replace URLs with <a href="url"> in each of PART1, PART2, etc., and paste it all back together.
Doing it within a single regex is going to be a headache, if not impossible, depending on your dialect.
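A rough sketch of the two-pass idea on an HTML string follows. The function name linkifyOutsideAnchors and the simple URL pattern are assumptions for illustration. It does not handle URLs inside attributes of other tags (such as an img src), nor trailing punctuation.

```javascript
// Hypothetical helper: wrap bare http(s) URLs in anchors, leaving existing anchors alone.
function linkifyOutsideAnchors(html) {
  // Pass 1: split on existing anchors. Because the pattern has a capturing group,
  // the anchors stay in the result at the odd indexes.
  const parts = html.split(/(<a\b[^>]*>[\s\S]*?<\/a>)/i);
  const bareUrl = /https?:\/\/[^\s<]+/g;
  // Pass 2: replace URLs only in the pieces between anchors, then join everything again.
  return parts
    .map((part, i) =>
      i % 2 === 1 ? part : part.replace(bareUrl, (url) => `<a href="${url}">${url}</a>`))
    .join("");
}

console.log(linkifyOutsideAnchors(
  'Go to http://google.com or <a href="http://example.com">http://example.com</a> now'
));
```

Running the function on its own output leaves it unchanged, since every URL is then inside an anchor.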
For jobs like this, I normally recommend people do it with code rather than regex because regex gets really messy, really fast. However, if you do want a regex, here is a workable solution. Please go to the link to get a full understanding and view of test cases I used.
http://regex101.com/r/kL3iL7
(?:http([s]?):\/\/)?((\w+[.])+\w+(\/\w*)*(\?[^\s]*)*)(?![^\s]*>)
with replacement
\2
I do not promise that it is perfect, but it does handle a lot of cases. Let me know if there are any test cases it needs to be fixed for.
If you are doing it on the client side, it might be worth doing it by walking the document tree.
Look through the text nodes (nodeName is "#text"), and if one contains a substring starting with http/https and its parent tag is not A, replace it with the pattern (\1 etc.).
Consider this to start:
// get all elements that contain text with 'http' and are not links
var textTags = [].slice.call(document.getElementsByTagName('*'))
  .filter(function (n) {
    return !n.children.length
      && n.nodeName != 'A' && n.nodeName != 'INPUT'
      && n.innerHTML.indexOf('http') > -1;
  });
// iterate by index; for...in on an array would also visit any extra enumerable properties
for (var i = 0; i < textTags.length; i++) {
  // your code to replace links in textTags[i] with whatever you want
}

Only match regex if it doesnt start with a pattern in javascript

I have a bit of a strange one here. I basically have a large chunk of text which may or may not contain links to images.
So let's say it does. I have a pattern which extracts the image URL fine; however, once a match is found it is replaced with an <img> element with the link as the src. Now the problem is that there may be multiple matches within the text, and this is where it gets tricky: the URL pattern will now also match the URL in the src attribute, which basically just enters an infinite loop.
So is there a way to ONLY match in regex if the match doesn't start with a pattern like ="|='? Then it would match the URL in something like:
some image http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6
but not
some image <img src="http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6">
I am not sure if it is possible, but if it is could someone point me in the right direction? A replace by itself will not suffice in this scenario as the url matched needs to be used elsewhere too so it needs to be used like a capture.
The main scenarios I need to account for are:
Many links in one block of varied text
A single link without any other text
A single link with other varied text
== edit ==
Here is the current regex I am using to match urls:
(\b(https?|ftp|file):\/\/[-A-Z0-9+&##\/%?=~_|!:,.;]*(?:png|jpeg|jpg|gif|bmp))
== edit 2 ==
Just so everyone understands why I cannot use the /g flag, here is an answer which explains the issue. If I could use /g like I originally tried, it would make things a lot simpler.
Javascript regex multiple captures again
What you are looking for is a negative lookbehind, but JavaScript engines that predate ES2018 don't support any kind of lookbehind, so there you will either have to use a callback function to check what was matched and make sure it is not preceded by a ' or ", or you can use the following regex:
(?:^|[^"'])(\b(https?|ftp|file):\/\/[-a-zA-Z0-9+&##\/%?=~_|!:,.;]*(?:png|jpeg|jpg|gif|bmp))
which has a single problem: in the case of a successful match it will also catch one extra character, the one right before the (\b(https?|ftp|file) pattern in the input. I think you can deal with this easily.
Regex101 Demo
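One way to deal with the extra character is to capture it and put it back in a replace callback. The sketch below adapts the regex above. The function name wrapBareImageUrls is hypothetical, and the character class is a simplified copy of the one in the question, used with the i flag.

```javascript
// Group 1: start of input or any character that is not a quote (the "extra" character).
// Group 2: the image URL itself.
const imageUrl =
  /(^|[^"'])\b((?:https?|ftp|file):\/\/[-A-Z0-9+&#\/%?=~_|!:,.;]*(?:png|jpeg|jpg|gif|bmp))/gi;

// Hypothetical helper: wrap image URLs that are not already inside a quoted attribute.
function wrapBareImageUrls(text) {
  return text.replace(imageUrl, (whole, before, url) => `${before}<img src="${url}">`);
}

const once = wrapBareImageUrls("some image http://cdn.sstatic.net/stackoverflow/img/sprites.png here");
console.log(once);
console.log(wrapBareImageUrls(once) === once);
```

Because a URL that follows a quote is skipped, a second pass over the result changes nothing, which is what avoids the loop described in the question.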
Using the /ig flags at the end should work... the g is for global matching and the i is for case-insensitivity, which is necessary as you've only got A-Z instead of a-zA-Z.
Using the following vanilla JS appears to work for me (see jsfiddle)...
var test="some image http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6 some image http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6 some image http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6";
var re = new RegExp(/(\b(https?|ftp|file):\/\/[-A-Z0-9+&##\/%?=~_|!:,.;]*(?:png|jpeg|jpg|gif|bmp))/ig);
document.getElementById("output").innerHTML = test.replace(re,"<img src=\"$1\"/>");
Although what it does highlight is that the query string part of the URL (the ?v=6) is not being picked up by your regex.
For jQuery, it would be (see jsfiddle)...
$(document).ready(function(){
var test="some image http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6 some image http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6 some image http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6";
var re = new RegExp(/(\b(https?|ftp|file):\/\/[-A-Z0-9+&##\/%?=~_|!:,.;]*(?:png|jpeg|jpg|gif|bmp))/ig);
$("#output").html(test.replace(re,"<img src=\"$1\"/>"));
});
Update
Just in case my example of using the same image URL in the example doesn't convince you - it also works with different URLs... see this jsfiddle update
var test="http://cdn.sstatic.net/stackoverflow/img/sprites.png?v=6 http://cdn.sstatic.net/serverfault/img/sprites.png?v=7";
var re = new RegExp(/(\b(https?|ftp|file):\/\/[-A-Z0-9+&##\/%?=~_|!:,.;]*(?:png|jpeg|jpg|gif|bmp))/ig);
document.getElementById("output").innerHTML = test.replace(re,"<img src=\"$1\"/>");
Couldn't you just check whether there is whitespace in front of the URL, instead of that word boundary? It seems to work, although you will have to remove the matched whitespace later.
(\s(https?|ftp|file):\/\/[-A-Z0-9+&##\/%?=~_|!:,.;]*(?:png|jpeg|jpg|gif|bmp))
http://rubular.com/r/9wSc0HNWas
Edit: Damn, too slow :) I'll still leave this here as my regex is shorter ;)
As freefaller said, you might use the /g flag to find all matches in one go, if exec is not a must.
Otherwise, you can add (="|=')? to the beginning of your regex and check whether $1 is undefined. If it is undefined, then the match did not start with a ="|=' pattern.
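The optional-prefix check can be sketched like this. The function name bareImageUrls is hypothetical, the character class is a simplified copy of the one in the question, and matchAll is used here for brevity. The same test on group 1 applies to a single exec call.

```javascript
// Group 1: an optional =" or =' prefix; group 2: the image URL.
const prefixedUrl =
  /(="|=')?((?:https?|ftp|file):\/\/[-A-Z0-9+&#\/%?=~_|!:,.;]*(?:png|jpeg|jpg|gif|bmp))/gi;

// Hypothetical helper: collect image URLs that do not sit inside a quoted attribute.
function bareImageUrls(text) {
  const urls = [];
  for (const m of text.matchAll(prefixedUrl)) {
    // m[1] is undefined when the URL was not preceded by =" or ='.
    if (m[1] === undefined) urls.push(m[2]);
  }
  return urls;
}

console.log(bareImageUrls('a http://x.com/a.png b <img src="http://y.com/b.jpg">'));
```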

please extract a bit of info from this string (without regex so that i can understand it)

On my web app, I look at the current URL, and if the current URL is of a form like this:
http://www.domain.com:11000/invite/abcde16989/root/index.html
-> All I need is to extract the ID which consists of 5 letters and 5 numbers (abcde16989) in another variable for further use.
So I need this:
var current_url = "the whole path, not just the hostname";
if (current_url has ID)
var ID = abcde16989;
You could always use split using / as the delimiter if the ID is always going to be in the same position, eg
var parts = current_url.split('/');
var id = parts[4];
Though your requirement of matching "5 letters and 5 numbers" really does suit a regex match.
var id = current_url.match(/[a-zA-Z]{5}[0-9]{5}/); // returns an array (id[0] is the matched ID), or null if not found
I'm assuming you don't need the full URL, but just the pathname to get your ID. Use the following:
var current_url = window.location.pathname; // gets the pathname
var split_url = current_url.split('/'); // splits the path at each /
var current_id = split_url[2]; // index 0 is "" (before the leading /), 1 is "invite", 2 is your id, 3 is "root"
alert(current_id);
Firstly, this doesn't need JQuery; this is simple Javascript. I'll amend your tags after I've replied to reflect this.
A regex would actually be quite an easy way to achieve this, and I don't think a simple one like this would be as difficult to understand as you think.
So I'll answer with the regex option anyway and then move on to other options:
var url = "http://www.domain.com:11000/invite/abcde16989/root/index.html";
//first method:
var id = url.match('^http://www.domain.com:11000/invite/(.+)/root/index.html$')[1];
//second method: (if you don't know exact format of the rest of the URL but you do know the format of the ID string)
var id = url.match('/([a-z]{5}[0-9]{5})/')[1];
The first method will get the string in the position you specified within the URL. It won't check the formatting; it just looks at the rest of the URL and grabs the bit of it you're asking for. This should be really easy to understand: It's basically just your URL, but with (.+) where your ID goes.
The second method looks specifically for a string in the format you asked for -- ie five letters and then five numbers. This is admittedly a bit harder to read, but should be fairly self explanatory if you look at it given those criteria.
In both cases, the regex itself will return an array of results, with array element zero being the whole string (ie in the first case, including the rest of the URL). This is where the (brackets) come in (ie the bit where we said (.+)). This tells the match function to put the contents of the brackets into another array element so we can use it. In both cases, this means that we can read the ID in array element [1].
Okay, so how about the non-regex options:
The simplest way to do it without regex is split(): when you pass it a plain string separator such as '/', no regular expression is involved (split() only uses a regex if you give it one). A couple of other people have already given you answers using this.
I'm going to guess that one of these answers will be good enough for you (either mine or, more likely, one of the answers using split()). However, if you want to avoid split() as well, you're going to have to do some slightly more manual string manipulation, probably using substring() (though there are other ways to do it).
Something along the lines of this:
var prefixstring="http://www.domain.com:11000/invite/";
var prefixlen=prefixstring.length;
var idlen=10;
var id = url.substring(prefixlen,idlen+prefixlen);
This gets the length of the portion of the URL in front of the ID, and then uses substring() to snip out the required bit. But I'm sure you'll agree that the regex options are simpler? ;-)
Hope that helps. (and I hope it helps you feel less afraid of regex!)
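For completeness, here is a sketch that extracts the ID and also checks the "5 letters then 5 numbers" shape without any regex. The function name extractInviteId is hypothetical, and it assumes the ID is always the path segment right after invite. Note that new URL() throws if the string is not a valid absolute URL.

```javascript
// Hypothetical helper: return the invite ID, or null if the URL does not contain one.
function extractInviteId(currentUrl) {
  // For the sample URL, pathname is "/invite/abcde16989/root/index.html".
  const parts = new URL(currentUrl).pathname.split("/"); // ["", "invite", "abcde16989", ...]
  if (parts[1] !== "invite" || parts[2] === undefined) return null;

  const id = parts[2];
  if (id.length !== 10) return null;

  const isLetter = (c) => (c >= "a" && c <= "z") || (c >= "A" && c <= "Z");
  const isDigit = (c) => c >= "0" && c <= "9";

  // The first five characters must be letters and the last five must be digits.
  const lettersOk = id.slice(0, 5).split("").every(isLetter);
  const digitsOk = id.slice(5).split("").every(isDigit);
  return lettersOk && digitsOk ? id : null;
}

console.log(extractInviteId("http://www.domain.com:11000/invite/abcde16989/root/index.html"));
```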
