Javascript RegEx named group and combine

I want to parse a file containing multiple lines, looping through the lines to extract the information I need. I am trying to write an m3u parser that uses regex; I already successfully made one using string positions and substring.
I want to parse this line:
#EXTINF:-1 tvg-ID="NPO 2 HD NL" tvg-name="NL : Npo 2" tvg-logo="http://www.iptv-plus.net:25461/images/Astra23picon/1_0_19_17C0_C82_3_EB0000_0_0_0.png" group-title="Holland",NL : Npo 2
So I came up with these regular expressions:
(?<=group-title=")(.*?)(?=")
(?<=tvg-name=")(.*?)(?=")
(?<=tvg-logo=")(.*?)(?=")
(?<=tvg-ID=")(.*?)(?=")
(?<=,)(.*?)$
These extract the tvg tags, and extract the name from the end of the line (always after the last comma).
What I would like is to put all these regular expressions into one regex, so I can get an array containing all the elements. I tried using |, but I believe that is an OR. I can use them separately, but I think it might be faster to put them all in one. I would also like to use named groups. But when I change it to
(?<group><=group-title=")(.*?)(?=")
it treats the lookbehind (<=) as literal text. Can I combine a lookbehind with named groups, and if so, how? I would like to use the regex in JavaScript, but I could also implement a PHP script that parses the m3u file and returns JSON.
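In JavaScript, a lookbehind `(?<=...)` and a named capturing group `(?<name>...)` are separate constructs. Writing `(?<group><=...` opens the named group and then treats `<=...` as literal text, which is the behaviour described above. A minimal sketch, assuming attribute values never contain a double quote: keep the lookbehind outside and name the group that follows it. Joining the patterns with `|` does create an alternation (each match is one of the alternatives), so to collect everything in one pass this sketch instead matches generic `key="value"` pairs. The names `groupRe`, `attrRe` and `info` are illustrative only.

```javascript
// Sample #EXTINF line from the question.
const line = '#EXTINF:-1 tvg-ID="NPO 2 HD NL" tvg-name="NL : Npo 2" tvg-logo="http://www.iptv-plus.net:25461/images/Astra23picon/1_0_19_17C0_C82_3_EB0000_0_0_0.png" group-title="Holland",NL : Npo 2';

// Lookbehind outside, named group around the value you want.
const groupRe = /(?<=group-title=")(?<group>[^"]*)(?=")/;
console.log(line.match(groupRe).groups.group); // Holland

// One pass over every key="value" attribute, using named groups for key and value.
// Assumption: values contain no double quotes.
const attrRe = /(?<key>[\w-]+)="(?<value>[^"]*)"/g;
const info = {};
for (const m of line.matchAll(attrRe)) {
  info[m.groups.key] = m.groups.value;
}

// The channel name is everything after the last comma.
info.name = line.match(/,(?<name>[^,]*)$/).groups.name;
console.log(info);
```

If attribute order can vary between lines, the generic pair pattern is less fragile than one long regex that lists every attribute in a fixed order.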

Related

How do I replace string within quotes in javascript?

I have this in a JavaScript/jQuery string. (The string is read from the value of an HTML element, $('#shortcode'), which can change if the user clicks some buttons.)
[csvtohtml_create include_rows="1-10"
debug_mode="no" source_type="visualizer_plugin" path="map"
source_files="bundeslander_staple.csv" include cols="1,2,4" exclude cols="3"]
In a textbox (named incl_sc) I have the value:
include cols="2,4"
I want to replace include_cols="1,2,4" from the above string with the value from the textbox.
so basically:
How do I replace include_cols values here? (include_cols="2,4" instead of include_cols="1,2,4") I'm great at many things but regex is not one of them. I guess regex is the thing to use here?
I'm trying this:
var s = $('#shortcode').html();
//I want to replace include cols="1,2,4" exclude cols="3"
//with include_cols="1,2" exclude_cols="3" for example
s.replace('/([include="])[^]*?\1/g', incl_sc.val() );
but I don't get any replacement at all (the string s is the same as $("#shortcode").html()). Obviously I'm doing something really dumb. Please help :-)
In short, what you need is:
s = s.replace(/include cols="[^"]+"/g, incl_sc.val());
There were a couple of problems with your code:
To use a regex with String.prototype.replace, you must pass a regex as the first argument, but you were actually passing a string.
This is a regex literal: /regex/, while '/actually a string/' is just a string.
In the text you supplied in your question, include_cols is written as include cols (with a space).
And your regex was formed wrong. I recommend testing regexes on this website, where you can also learn more if you want.
The code above replaces the part include cols="1,2,4" with whatever is in the textbox, regardless of what's between the quotes (as long as it doesn't contain another quote). Note that replace() returns a new string rather than changing s, so the result has to be assigned back.
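A small runnable sketch of the two points above, using the shortcode text from the question flattened onto one line. jQuery is not available here, so `newCols` is a hypothetical stand-in for `incl_sc.val()`.

```javascript
// Sample shortcode from the question, on one line; note "include cols" has a space.
const original =
  '[csvtohtml_create include_rows="1-10" debug_mode="no" source_type="visualizer_plugin" ' +
  'path="map" source_files="bundeslander_staple.csv" include cols="1,2,4" exclude cols="3"]';

// Hypothetical stand-in for incl_sc.val().
const newCols = 'include cols="2,4"';

// A string first argument is searched for literally, so nothing is replaced.
const unchanged = original.replace('/include cols="[^"]+"/g', newCols);

// A regex literal is treated as a pattern. replace() returns a NEW string
// (strings are immutable), so keep the return value.
const updated = original.replace(/include cols="[^"]+"/g, newCols);
console.log(updated);
```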
First of all, I think you need to remove the quotes around the regex and fix the regex a little.
// [^"]* stops at the closing quote, so a following exclude_cols="..." is left alone
const r = /(include_cols=")([^"]*)(")/g;
s = s.replace(r, `$1${incl_sc.val()}$3`);
Basically, I group the first and last parts in order to include them in the replacement. You can also avoid creating the first and last groups and put that text literally in the second argument of the replace function, like this:
const r = /include_cols="[^"]*"/g;
s = s.replace(r, `include_cols="${incl_sc.val()}"`);
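Why the pattern between the quotes matters: a greedy `.*` can run past the closing quote when another quoted attribute follows on the same line, while a negated class `[^"]*` stops at the first closing quote. A minimal sketch on a hypothetical one-line sample that uses the underscore spelling; `newValue` stands in for `incl_sc.val()`.

```javascript
// Hypothetical sample where both attributes use underscores and share one line.
const s = 'include_cols="1,2,4" exclude_cols="3"';
const newValue = "2,4"; // stand-in for incl_sc.val()

// Greedy (.*) runs to the LAST quote on the line, so exclude_cols is swallowed.
const greedy = s.replace(/include_cols="(.*)"/g, `include_cols="${newValue}"`);

// [^"]* cannot cross a quote, so only the include_cols value is replaced.
const safe = s.replace(/include_cols="[^"]*"/g, `include_cols="${newValue}"`);

console.log(greedy); // include_cols="2,4"
console.log(safe);   // include_cols="2,4" exclude_cols="3"
```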

Get a string between two strings in Javascript

I have the string below, and I need help pulling an ID out of it in Presto. Presto's regex functions use Java pattern syntax, which behaves like JavaScript regex for simple patterns like these. I've searched multiple options, including:
JavaScript text between double quotes
Javascript regex to extract all characters between quotation marks following a specific word
I need to pull the GA Client ID which looks like this:
75714ae471df63202106404675dasd800097erer1849995367
Below is a snippet showing where it sits in the string.
The struggle is that the "s:38:" is not constant. The number can be anything. For example, it could be s:40: or s:1000: etc. I need it to return just the alphanumeric id.
String Snippet
"GA_ClientID__c";s:38:"75714ae471df63202106404675dasd800097erer1849995367";
Full string listed below
99524";s:9:"FirstName";s:2:"John";s:8:"LastName";s:8:"Doe";s:7:"Company";s:10:"Sample";s:5:"Email";s:20:"xxxxx#gmail.com";s:5:"Phone";s:10:"8888888888";s:7:"Country";s:13:"United States";s:5:"Title";s:8:"Creative";s:5:"State";s:2:"NC";s:13:"Last_Asset__c";s:40:"White Paper: Be a More Strategic Partner";s:16:"Last_Campaign__c";s:18:"70160000000q6TgAAI";s:16:"Referring_URL__c";s:8:"[direct]";s:19:"leadPriorityMarketo";s:2:"P2";s:18:"ProductInterest__c";s:9:"sample";s:14:"landingpageurl";s:359:"https://www.sample.com;mkt_tok=samplesamplesamplesample";s:14:"GA_ClientID__c";s:38:"75714ae471df63202106404675dasd800097erer1849995367";s:13:"Drupal_SID__c";s:36:"e1380c07-0258-47de-aaf8-82d4d8061e1a";s:4:"form";s:4:"1046";}
This works for your sample
"GA_ClientID__c";[^"]*"([^"]*)"
https://regex101.com/r/Q4Orj6/1
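How the pattern handles the varying `s:NN:` prefix: `[^"]*` skips whatever sits between `"GA_ClientID__c";` and the next quote, and `([^"]*)` captures the quoted ID, so the length number never has to be known. A minimal JavaScript sketch; `extractClientId` is a hypothetical helper name.

```javascript
// Returns the GA client ID from a serialized string, or null if absent.
function extractClientId(text) {
  // [^"]* skips the variable s:NN: part; ([^"]*) captures the quoted ID.
  const match = text.match(/"GA_ClientID__c";[^"]*"([^"]*)"/);
  return match ? match[1] : null;
}

const snippet = '"GA_ClientID__c";s:38:"75714ae471df63202106404675dasd800097erer1849995367";';
console.log(extractClientId(snippet)); // 75714ae471df63202106404675dasd800097erer1849995367
```

In Presto itself, the same pattern would presumably be passed to `regexp_extract(column, pattern, 1)`, where the third argument selects capture group 1.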

JavaScript split string by specific character string

I have a text box with a bunch of comments, all separated by a specific character string as a means of splitting them to display each comment individually.
The string in question is | but I can change this to accommodate whatever will work. My only requirement is that it is not likely to be a string of characters someone will type in an everyday sentence.
I believe I need to use the split method and possibly some regex but all the other questions I've seen only seem to mention splitting by one character or a number of different characters, not a specific set of characters in a row.
Can anyone point me in the right direction?
.split() should work for that purpose:
var comments = "this is a comment|and here is another comment|and yet another one";
var parsedComments = comments.split('|');
This will give you all comments in an array which you can then loop over or do whatever you have to do.
Keep in mind you could also change | to something like <--NEWCOMMENT--> and it will still work fine inside the split('<--NEWCOMMENT-->') method.
Remember that split() removes the separator it splits on, so the resulting array won't contain any instances of <--NEWCOMMENT-->.
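A short sketch with a multi-character separator. Passing a string to `split()` matches the whole sequence literally, so no regex escaping is needed. The trim and filter steps are an optional addition, assuming you want to drop surrounding whitespace and empty entries (for example from a trailing separator).

```javascript
const SEPARATOR = "<--NEWCOMMENT-->";
const comments =
  "first comment<--NEWCOMMENT--> second comment <--NEWCOMMENT-->third<--NEWCOMMENT-->";

const parsed = comments
  .split(SEPARATOR)             // the whole string is the delimiter, not each character
  .map((c) => c.trim())         // drop stray whitespace around each comment
  .filter((c) => c.length > 0); // ignore the empty entry left by the trailing separator

// parsed holds ["first comment", "second comment", "third"]
console.log(parsed);
```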

regex replace on JSON is removing an Object from Array

I'm trying to improve my understanding of Regex, but this one has me quite mystified.
I started with some text defined as:
var txt = "{\"columns\":[{\"text\":\"A\",\"value\":80},{\"text\":\"B\",\"renderer\":\"gbpFormat\",\"value\":80},{\"text\":\"C\",\"value\":80}]}";
and do a replace as follows:
txt.replace(/\"renderer\"\:(.*)(?:,)/g,"\"renderer\"\:gbpFormat\,");
which results in:
"{"columns":[{"text":"A","value":80},{"text":"B","renderer":gbpFormat,"value":80}]}"
What I expected was for the renderer attribute value to have its quotes removed, which has happened, but the C column is also completely missing! I'd really love for someone to explain how my regex has removed column C.
As an extra bonus, if you could explain how to remove the quotes around any value for renderer (i.e. so I don't have to hard-code the value gbpFormat in the regex) that'd be fantastic.
You are using a greedy operator while you need a lazy one. Change this:
"renderer":(.*)(?:,)
              ^---- add the '?' here to make it lazy
To
"renderer":(.*?)(?:,)
Your code should be:
txt.replace(/\"renderer\"\:(.*?)(?:,)/g,"\"renderer\"\:gbpFormat\,");
If you are learning regex, take a look at this documentation to learn more about greediness. A nice extract to understand this is:
Watch Out for The Greediness!
Suppose you want to use a regex to match an HTML tag. You know that
the input will be a valid HTML file, so the regular expression does
not need to exclude any invalid use of sharp brackets. If it sits
between sharp brackets, it is an HTML tag.
Most people new to regular expressions will attempt to use <.+>. They
will be surprised when they test it on a string like This is a
<EM>first</EM> test. You might expect the regex to match <EM> and when
continuing after that match, </EM>.
But it does not. The regex will match <EM>first</EM>. Obviously not
what we wanted. The reason is that the plus is greedy. That is, the
plus causes the regex engine to repeat the preceding token as often as
possible. Only if that causes the entire regex to fail, will the regex
engine backtrack. That is, it will go back to the plus, make it give
up the last iteration, and proceed with the remainder of the regex.
Like the plus, the star and the repetition using curly braces are
greedy.
Try like this:
txt = txt.replace(/"renderer":"(.*?)"/g,'"renderer":$1');
The issue in the expression you were using was this part:
(.*)(?:,)
By default, the * quantifier is greedy, which means it gobbles up as much as it can, so it runs up to the last comma in your string. The easiest solution is to make it non-greedy by adding a question mark after the asterisk, changing that part of your expression to:
(.*?)(?:,)
For the solution I proposed at the top of this answer, I also removed the part matching the comma, because I think it's easier just to match everything between the quotes. As for your bonus question, to reuse the matched value instead of hard-coding gbpFormat, I used a backreference ($1), which inserts the first captured group into the replacement string.
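The three replacements discussed above can be compared side by side on the question's input. The greedy version reproduces the missing column C, the lazy version keeps it, and the quoted-capture version strips the quotes from any renderer name without hard-coding it. As later pointed out, none of these outputs is valid JSON any more.

```javascript
const txt = '{"columns":[{"text":"A","value":80},{"text":"B","renderer":"gbpFormat","value":80},{"text":"C","value":80}]}';

// Greedy (.*) backtracks only to the LAST comma, so the match swallows
// the rest of column B and the start of column C.
const greedy = txt.replace(/"renderer":(.*)(?:,)/g, '"renderer":gbpFormat,');

// Lazy (.*?) stops at the FIRST comma after "renderer":.
const lazy = txt.replace(/"renderer":(.*?)(?:,)/g, '"renderer":gbpFormat,');

// Capture the quoted value and re-insert it without quotes via $1.
const generic = txt.replace(/"renderer":"(.*?)"/g, '"renderer":$1');

console.log(greedy);
console.log(lazy);
console.log(generic);
```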
Don't manipulate JSON with regexp. It's too likely that you will break it, as you have found, and more importantly there's no need to.
In addition, once you have changed
'{"columns": [..."renderer": "gbpFormat", ...]}'
into
'{"columns": [..."renderer": gbpFormat, ...]}' // remove quotes from gbpFormat
then this is no longer valid JSON. (JSON requires that property values be numbers, quoted strings, objects, arrays, true, false, or null.) So you will not be able to parse it, or send it anywhere and have it interpreted correctly.
Therefore you should parse it to start with, then manipulate the resulting actual JS object:
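var object = JSON.parse(txt);
object.columns.forEach(function(column) {
  // Only touch columns that have a renderer; gbpFormat is assumed to be defined elsewhere
  if (column.renderer) {
    column.renderer = gbpFormat;
  }
});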
If you want to replace any quoted value of the renderer property with the value itself, then you could try
column.renderer = window[column.renderer];
Assuming that the value is available in the global namespace.
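A runnable sketch of the parse-then-manipulate approach that does not depend on globals. It assumes a hypothetical `renderers` lookup table in place of `window[...]` (which only exists in browsers), and the `gbpFormat` body is invented purely for illustration. A `Map` is used so that names like `"toString"` cannot resolve to inherited object properties.

```javascript
const txt = '{"columns":[{"text":"A","value":80},{"text":"B","renderer":"gbpFormat","value":80},{"text":"C","value":80}]}';

// Hypothetical registry of renderer functions keyed by the names used in the JSON.
const renderers = new Map([
  ["gbpFormat", (value) => `£${value.toFixed(2)}`], // illustrative formatter
]);

const config = JSON.parse(txt);
for (const column of config.columns) {
  // Replace only string renderer names that the registry knows about.
  if (typeof column.renderer === "string" && renderers.has(column.renderer)) {
    column.renderer = renderers.get(column.renderer);
  }
}

console.log(config.columns[1].renderer(config.columns[1].value));
```

All three columns survive because the JSON is never edited as text.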
This question falls into the category of "I need a regexp, or I wrote one and it's not working, and I'm not really sure why it has to be a regexp, but I heard they can do all kinds of things, so that's just what I imagined I must need." People use regexps to try to do far too many complex matching, splitting, scanning, replacement, and validation tasks, including on complex languages such as HTML, or in this case JSON. There is almost always a better way.
The only time I can imagine wanting to manipulate JSON with regexps is if the JSON is broken somehow, perhaps due to a bug in server code, and it needs to be fixed up in order to be parseable.

Javascript - parse formatted text and extract values in order?

I have a field with wiki style rendering on it that I'd like to bust up in Javascript.
The text I'm trying to parse looks like this:
{color:#47B}_name1_{color}
{color:#555}description1{color}
---
{color:#47B}_name2_{color}
{color:#555}description2{color}
---
{color:#47B}_name3_{color}
{color:#555}description3{color}
---
etc
Where name1 and description1 belong together, name2 and description2 belong together, and so forth. The values for name and description are user supplied values, with description potentially spanning multiple lines.
My end goal is to be able to extract the values of each name and each description from the text (and be able to reliably associated name1 with description1, etc).
My question is: If I used a regex to match all the names into an array and all the descriptions into an array, can I be ensured that the items in the array are in the correct order? That is, will names[0] always be the first name in the parsed text (assuming I did a javascript regex match into the names array)? Also- is this bad practice/should I do this another way?
The regular expression I'm trying to use to match names is:
/^(\{color\:#47B\})(_)(\s*?)(.*?)(\s*?)(_)(\{color\})$/
And the regular expression I'm using to match descriptions is:
/(\{color\:#555\})(.*?)(\{color\})/
A regex search will always return matches in source order (i.e. in the order in which they occur in the source text.)
I assume you are asking this question because you're hoping to do two regex matches (one for name, one for description) and then get two result arrays, and guarantee that namesmatch[i] always goes with descriptionmatch[i]. However, this will only be true if your source text is always exactly perfect.
In this case it may be better or safer either to use a single regex that matches both at once, or to split your source up by those --- delimiters and then match within each block. The reason it may be safer is that your source text may contain errors, and this way you can detect them and keep as much good data as possible.
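A minimal sketch of the split-then-match approach, assuming each block is separated by a line containing only `---` and that the colours are exactly `#47B` and `#555` as in the sample. One regex per block captures the name and description together, so a malformed block yields `null` instead of shifting every later pair. The sample text and variable names are hypothetical.

```javascript
const text = [
  "{color:#47B}_name1_{color}",
  "{color:#555}description1{color}",
  "---",
  "{color:#47B}_name2_{color}",
  "{color:#555}description2",
  "spans two lines{color}",
  "---",
].join("\n");

// Name and description captured in one match; [\s\S]*? lets text cross newlines.
const entryRe = /\{color:#47B\}_\s*([\s\S]*?)\s*_\{color\}\s*\{color:#555\}([\s\S]*?)\{color\}/;

const entries = text
  .split(/^---$/m)                       // m flag: ^ and $ match at line boundaries
  .map((block) => block.trim())
  .filter((block) => block.length > 0)   // drop the empty tail after the last ---
  .map((block) => {
    const m = block.match(entryRe);
    return m ? { name: m[1], description: m[2] } : null;
  });

console.log(entries);
```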
A note about your regexes. The . does not match newlines, so if the text between your {color} braces might have a newline you need to include newlines explicitly. [\s\S] and [^] are common idioms for this. Alternatively, if all . in a regex should match newlines, set the dotAll flag (s).
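The three options for multi-line descriptions can be compared on a two-line sample: a plain `.` fails to cross the newline, while `[\s\S]`, `[^]`, and the `s` flag all succeed. The `s` flag needs an ES2018+ engine.

```javascript
const text = "{color:#555}first line\nsecond line{color}";

// . does not match \n by default, so this finds nothing.
const plain = text.match(/\{color:#555\}(.*?)\{color\}/);

// [\s\S] and [^] each match any character, including newlines.
const anyChar = text.match(/\{color:#555\}([\s\S]*?)\{color\}/);
const caretClass = text.match(/\{color:#555\}([^]*?)\{color\}/);

// The s (dotAll) flag makes . match newlines too.
const dotAll = text.match(/\{color:#555\}(.*?)\{color\}/s);

console.log(plain, anyChar[1], caretClass[1], dotAll[1]);
```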
