Javascript RegEx Mysteries (for a poor C programmer)


This is clearly an RTFM issue, but I have read the manual repeatedly and still can't get the damn thing to work, so this seems like one of those times when asking for help makes sense:
var text = "KEY:01 VAL:1.10,KEY:02 VAL:2.20,KEY:03 VAL:3.30";
var pattern = '/KEY:(\S+) VAL:([^,]+)/g';
//var pattern = '/KEY:(\S+) VAL:(.?+)(?:(?=,KEY:)|$)/g';
//var pattern = '/KEY:(\S+) VAL:(.+)$/g';
//pattern.compile(pattern);
var kv = null;
var row = 0, col = 0;
while((kv = pattern.exec(text) != null))
{
row = kv[1].charAt(0) - '0';
col = kv[1].charAt(1) - '0';
e = document.getElementById('live').rows[row].cells;
e[col].innerHTML = kv[2].slice(0, kv[2].indexOf(","));
}
kv[1] is supposed to give "01"
kv[2] is supposed to give "1.10"
...and of course kv[] should list all the values of 'text'
to fill the table called 'live'.
But I can't get pattern.exec() to do that.
Where is the glitch?

First, a RegExp literal is delimited by slashes alone. Wrapping it in ' quotes turns it into a plain string, and a string has no exec() method. To get your exec to run properly you should have:
var pattern = /KEY:(\S+) VAL:([^,]+)/g;
Second, you're assigning a boolean to kv, which you don't want: != binds more tightly than =, so kv receives the result of the comparison rather than the match array. The comparison is also redundant, because exec() returns null (which is falsy) once there are no more matches. Instead you just need:
while (kv = pattern.exec(text)) {
That should get your code to work as you desire.
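Putting both fixes together, a minimal sketch of the corrected loop might look like this. It collects the matches into an array instead of writing to the 'live' table so that it runs outside a browser; the parsePairs name and the shape of the returned objects are illustrative assumptions, not part of the question.

```javascript
// Hypothetical helper: collects every KEY/VAL pair found in the input string.
function parsePairs(text) {
  // A regex literal, not a quoted string. The g flag lets exec() resume
  // after the previous match on each call.
  var pattern = /KEY:(\S+) VAL:([^,]+)/g;
  var pairs = [];
  var kv;
  // The extra parentheses make the assignment happen before the comparison.
  while ((kv = pattern.exec(text)) !== null) {
    pairs.push({ key: kv[1], val: kv[2] });
  }
  return pairs;
}

var pairs = parsePairs("KEY:01 VAL:1.10,KEY:02 VAL:2.20,KEY:03 VAL:3.30");
console.log(pairs);
```

Because [^,]+ already stops at the comma, kv[2] needs no further slicing; calling slice(0, kv[2].indexOf(",")) on it would find no comma and cut off the last character instead.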

The syntax for RegExp literals doesn't include quotes. Write it like this:
var pattern=/KEY:(\S+) VAL:([^,]+)/g;
http://www.w3schools.com/jsref/jsref_regexp_exec.asp

It should be
var pattern = /KEY:(\S+) VAL:([^,]+)/g;
http://www.regular-expressions.info/ is a good place to start.

Related

Looking for the easiest way to extract an unknown substring from within a string. (terms separated by slashes)

The initial string:
initString = '/digital/collection/music/bunch/of/other/stuff'
What I want: music
Specifically, I want any term (will never include slashes) that would come between collection/ and /bunch
How I'm going about it:
if(initString.includes('/digital/collection/')){
let slicedString = initString.slice(20); //results in 'music/bunch/of/other/stuff'
let indexOfSlash = slicedString.indexOf('/'); //results, in this case, in 5
let desiredString = slicedString.slice(0, indexOfSlash); //results in 'music'
}
Question:
How the heck do I accomplish this in javascript in a more elegant way?
I looked for something like an endIndexOf() that would replace my hardcoded .slice(20)
lastIndexOf() isn't what I'm looking for, because I want the index at the end of the first instance of my substring /digital/collection/
I'm looking to keep the number of lines down, and I couldn't find anything like a .getStringBetween('beginCutoff', 'endCutoff')
Thank you in advance!

Your title says "index" but your example shows you wanting to return a string. If you do in fact want the string, try this:
if(initString.includes('/digital/collection/')) {
var components = initString.split('/');
return components[3];
}

If the path always has the same shape, and the field you want is the one after the third /, then you can use split. Because the string starts with a slash, the first element of the resulting array is an empty string, so the field is at index 3.
var initString = '/digital/collection/music/bunch/of/other/stuff';
var collection = initString.split("/")[3]; // fourth element: 'music'
In the real world, you will want to check that the index exists before using it.
var collections = initString.split("/");
var collection = "";
if (collections.length > 3) {
  collection = collections[3];
}

You can use const desiredString = initString.slice(20, 25); if it's always music you are looking for.

If you need to find the path segment that comes right after '/digital/collection/', regardless of where '/digital/collection/' lies in the path:
first use split to get a path array,
then use find to return the element whose two preceding elements are digital and collection respectively.
const initString = '/digital/collection/music/bunch/of/other/stuff'
const pathArray = initString.split('/')
const path = pathArray.length >= 3
? pathArray.find((elm, index)=> pathArray[index-2] === 'digital' && pathArray[index-1] === 'collection')
: 'path is too short'
console.log(path)

Think about this logically: the "end index" is just the "start index" plus the length of the substring, right? So... do that :)
const sub = '/digital/collection/';
const startIndex = initString.indexOf(sub);
if (startIndex >= 0) {
let desiredString = initString.substring(startIndex + sub.length);
}
That'll give you everything from the end of the substring to the end of the full string; you can always split at / and take index 0 to get just the first directory name from what remains.
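As a rough sketch, the two steps (skip past the prefix, then keep the first remaining segment) can be wrapped in one small function. The name segmentAfter and the choice to return null when the prefix is missing are assumptions made for illustration.

```javascript
// Hypothetical helper: returns the path segment that directly follows
// `prefix`, or null when `prefix` does not occur in `path`.
function segmentAfter(path, prefix) {
  var start = path.indexOf(prefix);
  if (start < 0) {
    return null;
  }
  // The "end index" of the prefix is its start index plus its length.
  var rest = path.substring(start + prefix.length);
  // Keep only the text up to the next slash (or everything, if there is none).
  return rest.split('/')[0];
}

var found = segmentAfter('/digital/collection/music/bunch/of/other/stuff', '/digital/collection/');
console.log(found);
```

This avoids any hardcoded offset, so it keeps working if the prefix changes length or appears later in the path.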

You can also use a regular expression for this.
const initString = '/digital/collection/music/bunch/of/other/stuff';
const result = initString.match(/\/digital\/collection\/([a-zA-Z]+)\//)[1];
console.log(result);
The console output is:
music
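One caveat with the regex approach: match() returns null when the pattern is not found, so indexing the result with [1] throws a TypeError for any path that lacks the prefix. A more defensive sketch follows; the collectionName name and the looser [^/]+ capture (which also accepts digits and hyphens and does not require a trailing slash) are assumptions, not part of the answer above.

```javascript
// Hypothetical helper: extracts the segment after /digital/collection/.
function collectionName(path) {
  // [^\/]+ accepts any run of characters other than a slash; the capture
  // ends at the next slash or at the end of the string.
  var m = /\/digital\/collection\/([^\/]+)/.exec(path);
  // exec() returns null when nothing matches, so guard before indexing.
  return m ? m[1] : null;
}

console.log(collectionName('/digital/collection/music/bunch/of/other/stuff'));
console.log(collectionName('/some/other/path'));
```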

If you know the initial string, and you have the part before the string you seek, then the following snippet returns you the string you seek. You need not calculate indices, or anything like that.
// strip the known prefix, then take the first remaining segment
// we should get: music
const initString = '/digital/collection/music/bunch/of/other/stuff'
const firstPart = '/digital/collection/'
const lastIndexOf = (s1, s2) => {
return s1.replace(s2, '').split('/')[0]
}
console.log(lastIndexOf(initString, firstPart))

JavaScript Error in Pentaho - Cannot call method "toUpperCase" of null

// Sets all the letters as uppercase
var str = ae_a_asset_e$manufacture_code.toUpperCase();
var str2 = ae_a_asset_e$serial_no.toUpperCase();
var str3 = ae_a_asset_e$manu_part_number.toUpperCase();
var str4 = ae_a_asset_e$region_code.toUpperCase();
var str5 = ae_a_asset_e$fac_id.toUpperCase();
Any idea how to fix this? I would think there would have to be a way to say if value = null then don't worry about it.

First you have to think whether it is correct that some values are null or not, such as ae_a_asset_e$manufacture_code.
If they can be null you can access them in a safe way like this (extend this code to all other vars as required):
var str = ae_a_asset_e$manufacture_code ? ae_a_asset_e$manufacture_code.toUpperCase() : "";
If they cannot be null, then you should check your data integrity first and only then run this script (which assumes they are never null).
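If several fields need the same treatment, the null check can be factored into a small function so the ternary is not repeated for every field. This is only a sketch: upperOrEmpty is a made-up name, and the plain sample values stand in for the Pentaho field variables so the snippet can run outside Pentaho.

```javascript
// Hypothetical helper: upper-cases a value, treating null and undefined as "".
function upperOrEmpty(value) {
  // `== null` is true for both null and undefined.
  return value == null ? "" : String(value).toUpperCase();
}

// Sample values standing in for the real field variables.
var str = upperOrEmpty("acme");
var str2 = upperOrEmpty(null);
console.log(str, "[" + str2 + "]");
```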

How do you divide a string of random letters into pairs of substrings and replace with new characters continuously in Javascript?

So I'm working on this project that involves translating a string of text such as "als;kdfja;lsjkdf" into regular characters like "the big dog" by parsing for certain pairs of letters that translate (e.g. "fj" = "D").
The catch is I can't simply use the .replace() function in javascript, because there are many occurrences where it's given the text "fjkl", needs to find "jk", and interprets the collision of "fj" and "kl" to say that it's found it. This won't work for me, because for my purposes it didn't find it: I am only trying to look at pairs taken 2 characters at a time (i.e. "fjkl" could only yield "fj" and "kl").
(In the end I intend to use just the 8 characters "asdfjkl;" and map pairs of characters to actual letters. In this substitution method, FYI, "fj" OR "jf" would actually be "_" (space).)
In trying to figure out this task in javascript (I don't know if another language might handle it more efficiently), I tried using the "split" function in the following way. (Disclaimer: I'm not sure if this is formatted 100% perfectly.)
<textarea id="textbox"></textarea>
<script>
var text = document.getElementById("textbox").value; //getting string from the textarea
var pairs = text.split(/(..)/).filter(String); //splitting string into pairs
if(pairs == "fj"){replace(pairs, " ")} //some sort of substitution
</script>
Additionally, if possible, I would like the replaced characters to be fed directly into the textarea continuously as the user types, so the translation happens almost simultaneously. (I'm assuming this will use some sort of setInterval() function?)
If any tips can be given on which tools I should use in javascript, that would be outstanding. Thanks in advance.
If you're interested, here is the full list of substitutions I'm making in the end of this project:
syntax:(X OR Y == result)
AJ JA = F
AK KA = V
AL LA = B
A; ;A = Y
SJ JS = N
SK KS = M
SL LS = S
S; ;S = P
DJ JD = A
DK KD = U
DL LD = D
D; ;D = G
FJ JF = _
FK KF = I
FL LF = T
F; ;F = K
AS SA = C
SD DS = L
DF FD = E
JK KJ = O
KL LK = R
L; ;L = Z
AD DA = -
SF FS = ,
AF FA = .
JL LJ = !
K; ;K = :
J; ;J = ?
-Daniel Rehman

I have prepared some code for your requirement. You can bind a function to keydown to allow continuous changes as you type in the textarea.
I am using a replacePair method that, as a placeholder, replaces a pair of characters with its uppercase equivalent. You can inject your own custom logic there.
var tb = document.getElementById('tb');
var processedLength = 0;
var pairEntered=false;
tb.onkeydown = function (e) {
pairEntered=!pairEntered;
if (pairEntered) {
var nextTwoChars = this.value.substr(this.value.length - 2, 2);
var prevPart=this.value.substr(0,this.value.length-2);
var finalText=prevPart+ replacePair(nextTwoChars);
this.value=finalText;
processedLength+=2;
}
}
function replacePair(str){
return str.toUpperCase();
}
jsfiddle:http://jsfiddle.net/218fq7t2/
updated fiddle as per your replacement logic: http://jsfiddle.net/218fq7t2/3/

If you can be assured that certain pairs always translate to the same character, then perhaps a dictionary object can help.
var dict = { as:"A", kl:"B", al:"C", fj:"D" ... };
And, if your 'decryption' algorithm is 'lazy' (evaluates the first pair it encounters), then you can just travel through the input string.
var outputString = "", c, cl;
for (c = 1, cl = inputString.length; c < cl; c += 2) {
outputString += dict[inputString[c-1] + inputString[c]] || "";
}
If your replacement algorithm is not any more complicated than simply looking up which letter the pair represents, then this should do alright for you. No real logic necessary.
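Here is a small runnable sketch of that lookup loop. The dictionary is deliberately partial (a handful of pairs taken from the question's table, lowercased), and decodePairs is a hypothetical name; unknown pairs and a trailing unpaired character are simply dropped, which is an assumption about the desired behavior.

```javascript
// Partial dictionary for illustration only; the remaining pairs from the
// question's table would be added the same way.
var dict = { fj: " ", jf: " ", dj: "A", jd: "A", fl: "T", lf: "T" };

// Hypothetical helper: decodes the input two characters at a time.
function decodePairs(input) {
  var out = "";
  // Step by 2 so pairs never overlap: "fjkl" is read as "fj" then "kl",
  // never as "jk". A trailing unpaired character is ignored.
  for (var i = 0; i + 1 < input.length; i += 2) {
    out += dict[input.slice(i, i + 2)] || "";
  }
  return out;
}

var decoded = decodePairs("fldjfl"); // pairs: fl, dj, fl
console.log(decoded);
```

Stepping the index by two is what keeps "jk" from being found inside "fjkl", which a plain .replace() cannot guarantee.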

Couldn't you do it as follows:
var text = document.getElementById("textbox").value;
var i, pair;
// stop one short of the end so text[i+1] always exists
for (i = 0; i < text.length - 1; i++) {
  if (text[i] == "j") {
    if (text[i+1] == "f") {
      pair = "jf";
      text = text.replace(pair, "_");
    }
  }
}
What this would do is it would always, when checking any letter, also check the letter after it during the same step in the procedure. When it finds both letter i and letter i+1 matching up with a pair you are looking for, then the letters will be replaced by a space (or whatever you want), meaning that when the for-loop reaches the next run after a pair was found, the size of the text string will have been reduced by one. Thus, when it increments i, it will automatically skip the letter that made up the second component of the found pair. Thus, "jfkl" will be identified as two different pairs and your algorithm will not be confused.
Of course, you would also have to work the other pairs/codewords into the for loop so that they are all checked in some way.

I had hoped my previous answer was enough to get you started. I was merely providing an algorithm that you could then use to your liking (wrap it in a function and add your own event listeners, etc).
Here is the solution to your problem. I did not write the entire dictionary. You will need to complete that.
var dictionary = { "aj":"F", "ja":"F", "ak":"V", "ka":"V", "al":"B", "la":"B", "a;":"Y", ";a":"Y" }
var input, output;
function init() {
input = document.getElementById("input");
output = document.getElementById("output");
input.addEventListener("keyup", decrypt, false);
}
function decrypt () {
if (!input || !output) {
return;
}
var i = input.value, o = "", c, cl;
for (c = 1, cl = i.length; c < cl; c += 2) {
o += dictionary[ i[c-1] + i[c] ] || "";
}
while (output.hasChildNodes()) {
output.removeChild(output.firstChild);
}
output.appendChild(document.createTextNode(o));
}
window.addEventListener("load", init, false);
<textarea id="input"></textarea>
<div id="output"></div>

Javascript splitting string using only last splitting parameter

An example of what I'm trying to get:
String1 - 'string.co.uk' - would return 'string' and 'co.uk'
String2 - 'random.words.string.co.uk' - would return 'string' and 'co.uk'
I currently have this:
var split= [];
var tld_part = domain_name.split(".");
var sld_parts = domain_name.split(".")[0];
tld_part = tld_part.slice(1, tld_part.length);
split.push(sld_parts);
split.push(tld_part.join("."));
With my current code, the split is taken from the beginning; I want to reverse it if possible. With my current code it does this:
String1 - 'string.co.uk' - returns 'string' and 'co.uk'
String2 - 'random.words.string.co.uk' - would return 'random' and 'words.string.co.uk'
Any suggestions?

To expand upon elclanrs's comment:
function getParts(str) {
var temp = str.split('.').slice(-3) // grabs the last 3 elements
return {
tld_parts : [temp[1],temp[2]].join("."),
sld_parts : temp[0]
}
}
getParts("foo.bar.baz.co.uk") would return { tld_parts : "co.uk", sld_parts : "baz" }
and
getParts("i.got.99.terms.but.a.bit.aint.one.co.uk") would return { tld_parts : "co.uk", sld_parts : "one" }

Try this:
var str = 'string.co.uk'; // or 'random.words.string.co.uk'
var part = str.split('.');
// join the last two labels; the original joined the last label with itself
var result = part[part.length - 2] + '.' + part[part.length - 1]; // 'co.uk'
alert(result);

One way that comes to mind is the following:
var tld_part = domain_name.split(".");
var name = tld_part[tld_part.length - 3];
var tld = tld_part[tld_part.length - 2] + "." + tld_part[tld_part.length - 1];
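Counting from the end of the array is easy to get wrong by one, so here is a minimal sketch of the index arithmetic as a testable function. It assumes the suffix is always the last two labels (as in "co.uk"); splitDomain is a hypothetical name, and returning null for short inputs is an illustrative choice.

```javascript
// Hypothetical helper: assumes the suffix is always the last two labels,
// which does not hold for names like "example.com" or "localhost".
function splitDomain(name) {
  var labels = name.split(".");
  var n = labels.length;
  if (n < 3) {
    return null; // not enough labels for a name plus a two-label suffix
  }
  // labels[n - 1] is the last label; labels[n] would be undefined.
  return [labels[n - 3], labels[n - 2] + "." + labels[n - 1]];
}

console.log(splitDomain("string.co.uk"));
console.log(splitDomain("random.words.string.co.uk"));
```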

Depending on your use case, performing direct splits might not be a good idea. For example, how would the above code handle .com or even just localhost? In this respect I would go down the RegExp route:
function stripSubdomains( str ){
var regs; return (regs = /([^.]+)(\.co)?(\.[^.]+)$/i.exec( str ))
? regs[1] + (regs[2]||'') + regs[3]
: str
;
};
Before the Regular Expression Police reprimand me for not being specific enough, a disclaimer:
The above can be tightened as a check against domain names by checking, instead of [^.], for the specific characters allowed in a domain at that point. However, my own perspective on matters like these is to be more open at the point of capture and tougher at a later filtering stage. This lets you keep an eye on what people might be trying, because you can never be 100% certain your validation isn't blocking valid requests unless you have an army of user testers at your disposal. At the end of the day, it all depends on where this code is being used, so the above is an illustrative example only.

Ambiguous interface of RegExp

Something very strange.
var body="Received: from ([195.000.000.0])\r\nReceived: from ([77.000.000.000]) by (6.0.000.000)"
var lastMath="";
var subExp = "[\\[\\(](\\d+\\.\\d+\\.\\d+\\.\\d+)[\\]\\)]"
var re = new RegExp("Received\\: from.*?"+subExp +".*", "mg");
var re1 = new RegExp(subExp , "mg");
while(ares= re.exec(body))
{
print(ares[0])
while( ares1 = re1.exec(ares[0]))
{
if(!IsLocalIP(ares1[1]))
{
print(ares1[1])
lastMath=ares1[1];
break ;
}
}
}
print(lastMath)
It outputs:
Received: from ([195.000.000.0])
195.000.000.0
Received: from ([77.000.000.000]) by (6.0.000.000)
6.0.000.000
6.0.000.000
But I think it should be:
Received: from ([195.000.000.0])
195.000.000.0
Received: from ([77.000.000.000]) by (6.0.000.000)
77.000.000.000
77.000.000.000
Because obviously "77.000.000.000" goes first. If I comment "break", output order is correct.
What's wrong with my code?

Note that regex grouping in Javascript (and most languages) does not behave in an obvious way with the * or + operators. For example:
js>r = /^(ab[0-9])+$/
/^(ab[0-9])+$/
js>"ab1ab2ab3ab4".match(r)
ab1ab2ab3ab4,ab4
In this case, you get only the last repetition that the group matched, and that's it. I'm not sure where this behavior is specified, but it can vary from language to language.
edit: What does IsLocalIP() do?
OK, I think the problem has to do with exec's statefulness (which may be why I don't use it; I use String.match()). If you're going to do this, you need to manually reset the regex's lastIndex property to 0; otherwise you get this behavior:
function weird(dobreak)
{
var s = "Received: from ([77.000.000.000]) by (6.0.000.000)"
var re1 = /[\[\(](\d+\.\d+\.\d+\.\d+)[\]\)]/mg
while (s2 = re1.exec(s))
{
writeln("s2="+s2);
if (dobreak)
break;
}
}
produces this result:
js>weird(true)
js>weird(true)
s2=[77.000.000.000],77.000.000.000
js>weird(true)
s2=(6.0.000.000),6.0.000.000
js>weird(true)
js>
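You'll note that the same function gets three different results, which implies statefulness is mucking things up (Javascript is caching/interning the regex somehow? I'm using JSDB, which uses SpiderMonkey, Firefox's javascript engine). This is consistent with ECMAScript 3, in which a regex literal denotes a single shared RegExp object, so its lastIndex survives between calls; ECMAScript 5 changed literals to create a new object on each evaluation.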
So if I change the code to the following:
function notweird(dobreak)
{
var s = "Received: from ([77.000.000.000]) by (6.0.000.000)"
var re1 = /[\[\(](\d+\.\d+\.\d+\.\d+)[\]\)]/mg
re1.lastIndex = 0;
while (s2 = re1.exec(s))
{
writeln("s2="+s2);
if (dobreak)
break;
}
}
Then I get the expected behavior:
js>notweird(true)
s2=[77.000.000.000],77.000.000.000
js>notweird(true)
s2=[77.000.000.000],77.000.000.000
js>notweird(true)
s2=[77.000.000.000],77.000.000.000
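The same effect can be reproduced without relying on how the engine treats regex literals, by reusing one global regex on two different strings, which is what the question's inner loop does with re1. The sketch below uses a simplified pattern and made-up input strings purely for illustration.

```javascript
var re = /\[(\d+)\]/g;

// The first call matches "[1]" and leaves lastIndex just past it.
var first = re.exec("[1] [2]");
var savedIndex = re.lastIndex;

// Reusing the same regex on a different string resumes from lastIndex,
// so the leading "[7]" is skipped and "[8]" is returned instead.
var skipped = re.exec("[7] [8]");

// Resetting lastIndex makes the search start from the beginning again.
re.lastIndex = 0;
var fresh = re.exec("[7] [8]");

console.log(first[1], savedIndex, skipped[1], fresh[1]);
```

In the question's code, the break leaves re1.lastIndex pointing into the middle of the previous line, so the next inner loop starts searching past "77.000.000.000". Setting re1.lastIndex = 0 just before the inner while loop should restore the expected order.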
