Creating AST with OMetaJS that includes token value and position - javascript

I'm trying to parse a DSL with OMetaJS and produce an AST that includes a token's value as well as its index in the original stream.
I know I can use the Index Capture Rule syntax ( #<rule> ) to give me an object containing the indices framing the token but is it possible to capture that as well as the token value?
E.g., for the grammar:
export ometa Test {
start = #<identifier>,
identifier = (letter | digit)+
}
Parsing "Bob" gives:
{ fromIdx : 0, toIdx : 3 }
If I remove the '#' from 'identifier' then parsing gives "Bob" as the result. What I'd ideally like to get is a combination of the two:
{ fromIdx : 0, toIdx : 3, value: 'Bob' }
I could of course hack the source, but is there a better way to do this?
I want to have both value and position because I'm trying to create a visual representation of the DSL which allows editing of identifier names for example. In this case I need to know where in the original source the identifier appeared so I can modify it.

I think what you're asking for is pretty useful, and probably deserves to have its own syntactic sugar. I'll definitely think about it. In the meantime, you could do something like this:
ometa Test {
parse :r = #<apply(r):value>:node !(node.value = value) -> node,
identifier = (letter | digit)+,
start = parse("identifier")
}
Hope that helps!

Given that you want the thing, and the span, what about using the peek operator &? That will return the token, but not consume the input. So perhaps something like
spannedThing = (&identifier:token #identifier:span) -> combineThemSomehow(token, span)
might do what you want? (Warning: my OMeta's rusty; the above might not use correct grammar.) You could turn that into a parameterised rule.
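Whichever grammar trick is used, the span on its own is enough to recover the token text, because fromIdx and toIdx index into the original input. A minimal sketch in plain JavaScript, assuming the input is a string and the span has the shape shown in the question; attachValue is a hypothetical helper name, not part of OMetaJS:

```javascript
// Hypothetical helper (not an OMetaJS API): copies a span of the form
// { fromIdx, toIdx } and adds the text it covers in the original source.
function attachValue(source, span) {
  return {
    fromIdx: span.fromIdx,
    toIdx: span.toIdx,
    // toIdx is treated as exclusive, as in the question's example ("Bob" -> 0..3).
    value: source.slice(span.fromIdx, span.toIdx)
  };
}

const node = attachValue("Bob", { fromIdx: 0, toIdx: 3 });
console.log(node.value); // Bob
```

This could be applied as a post-processing pass over the AST, at the cost of keeping the source string around.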

Related

Create function REGEX for optimization

I've been asked to optimize the speed of my query. I currently have this regex in my query, which is checking for a pattern and returning substring within that pattern. To clarify I have a table with multiple columns that I have to look through to check for this value: [v= and return the numbers within that list.
This is looking through several 'name..' columns that look something like this: xyzzy [v=123] but I only want to return 123, the below works:
COALESCE(REGEXP_SUBSTR(NAME, '[[]v=([0-9]+)', 1, 1, 'ie'),
REGEXP_SUBSTR(NAME_5, '[[]v=([0-9]+)', 1, 1, 'ie'),
REGEXP_SUBSTR(NAME_4, '[[]v=([0-9]+)', 1, 1, 'ie')) as display_vertical_code
But to optimize this, I thought of creating a function. Unfortunately I don't know JavaScript :/ and I'm not sure the formatting is correct. I'm having some difficulty creating it; this is what I've tried. Can someone tell me if I'm missing something?
CREATE OR REPLACE FUNCTION dfp.regex(NAME VARCHAR)
RETURNS OBJECT
LANGUAGE javascript
STRICT AS '
return new RegExp(NAME,"[[]v=([0-9]+)",1 ,1,"ie")
';
When I try to use the above function in my below query:
COALESCE(
GET(DFP.REGEX(NAME)),
GET(DFP.REGEX(NAME_5)),
GET(DFP.REGEX(NAME_4)),
GET(DFP.REGEX(NAME_3)),
GET(DFP.REGEX(NAME_2)),
GET(DFP.REGEX(NAME_1)),
GET(DFP.REGEX(NAME_0))
) as display_vertical_code
I see this error:
error line 3 at position 8 not enough arguments for function
[GET(REGEX(Tablename.NAME))], expected 2, got 1
This should do it.
CREATE OR REPLACE FUNCTION regex(NAME VARCHAR)
RETURNS string
LANGUAGE javascript
STRICT IMMUTABLE AS
$$
const regex = /[[]\s{0,5}v\s{0,5}=\s{0,5}([0-9]+)/i;
let s = NAME.match(regex);
if (s != null) {
return s[0].split('=')[1].trim();
} else {
return null;
}
$$;
select regex('xyzzy [v=123]');
-- Alternate permutation
select regex('xyzzy [ v = 123]');
You want to return a string, not an object. Adding the IMMUTABLE option tells Snowflake that the same input results in the same output every time. It can sometimes help with performance.
Edit... This one's a bit more fault tolerant and allows whitespace (if that could be a problem). If you want to get rid of allowing whitespace, delete the \s{0,5} expressions.
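The regular expression can be tried in plain JavaScript before it goes into the UDF. A minimal sketch, assuming the same pattern as above; extractVerticalCode is a hypothetical name, and it reads capture group 1 directly instead of splitting the match on '=':

```javascript
// Plain-JavaScript sketch of the extraction logic (no Snowflake involved).
// extractVerticalCode is a hypothetical name used only for this example.
function extractVerticalCode(name) {
  // Same pattern as the UDF: "[", optional whitespace, "v", "=", then digits.
  const match = /\[\s{0,5}v\s{0,5}=\s{0,5}([0-9]+)/i.exec(name);
  // Group 1 holds only the digits, so no split/trim is needed.
  return match ? match[1] : null;
}

console.log(extractVerticalCode("xyzzy [v=123]"));    // 123
console.log(extractVerticalCode("xyzzy [ v = 123]")); // 123
console.log(extractVerticalCode("no code here"));     // null
```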

Replacing Variables in String at Runtime

Currently I am using the following to evaluate variables that are placed in strings at runtime:
newVal = eval("`" + newVal + "`");
So if I have the string:
"Hello from channel: ${erpVars["CommandChannel"]["name"]}"
And erpVars["CommandChannel"]["name"] has value home, then the resulting string is:
Hello from channel: home
There are other objects than just erpVars that could be holding matching values for the string, but this is just one example. It's also important to note that each string could have more than one variable that needs replacing.
I am trying to achieve the same thing without using eval(), as some of the variable values come from user input.
Your case sounds super nasty (you should never ever use eval in JS! It poses a major security threat! Also, it looks weird that you want to replace this sort of string). Perhaps if you told me more about where you get your inputs from and in what form, we could find a much better solution together. That said, this is how I would solve your issue in its current form.
const newVal = 'Hello from channel: ${erpVars["CommandChannel"]["name"]}';
const strings = {
erpVars: {
CommandChannel: {
name: "home"
}
}
};
const vars = newVal.match(/\$\{.+?\}/g);
let result = newVal;
vars.forEach(v => {
// Build the property path from the current placeholder, not a hard-coded one.
let valuePath = v.match(/[\w\d]+/g).join('.');
result = result.replace(v, _.get(strings, valuePath));
});
console.log(result);
Note that I'm skipping here the edge scenarios, like getting a null result from the newVal.match when there are no variables in the newVal, but that's easy to handle.
Also note that I'm using the lodash library here for _.get() (https://lodash.com/docs/4.17.15#get). It's super popular for these kinds of small tasks. Of course, there are a lot of other tools that let you extract a value based on a property path like erpVars.CommandChannel.name (which is what the valuePath variable holds), as well as plenty of instructions on how to do it yourself.
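For reference, the same idea can be sketched without lodash. This is only an illustration under a few assumptions: interpolate and scope are hypothetical names, every object that placeholders may reference is collected in scope, and only own properties are followed so a placeholder cannot reach things like constructor:

```javascript
// Sketch: replace ${a["b"]["c"]} placeholders without eval or lodash.
// `scope` holds every object that placeholders are allowed to reference.
function interpolate(template, scope) {
  return template.replace(/\$\{(.+?)\}/g, (placeholder, expr) => {
    // 'erpVars["CommandChannel"]["name"]' -> ['erpVars', 'CommandChannel', 'name']
    const path = expr.match(/\w+/g) || [];
    // Walk the path one key at a time, following own properties only.
    const value = path.reduce((obj, key) => {
      if (obj == null) return undefined;
      return Object.prototype.hasOwnProperty.call(obj, key) ? obj[key] : undefined;
    }, scope);
    // Leave unknown placeholders untouched instead of printing "undefined".
    return value === undefined ? placeholder : String(value);
  });
}

const scope = { erpVars: { CommandChannel: { name: "home" } } };
console.log(interpolate('Hello from channel: ${erpVars["CommandChannel"]["name"]}', scope));
// Hello from channel: home
```

Using a replacer function also covers strings with several placeholders, and strings with none, without a separate null check.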

How to return multiple tokens with Jison lexer

I'm new to lexing and parsing so sorry if the title isn't clear enough.
Basically, I'm using Jison to parse some text and I am trying to get the lexer to comprehend indentation. Here's the bit in question:
(\r\n|\r|\n)+\s* %{
parser.indentCount = parser.indentCount || [0];
var indentation = yytext.replace(/^(\r\n|\r|\n)+/, '').length;
if (indentation > parser.indentCount[0]) {
parser.indentCount.unshift(indentation);
return 'INDENT';
}
var tokens = [];
while (indentation < parser.indentCount[0]) {
tokens.push('DEDENT');
parser.indentCount.shift();
}
if (tokens.length) {
return tokens;
}
if (!indentation.length) {
return 'NEWLINE';
}
%}
So far, almost all of that works as expected. The one problem is the line where I attempt to return an array of DEDENT tokens. It appears that Jison is just converting that array into a string which causes me to get a parse error like Expecting ........, got DEDENT,DEDENT.
What I'm hoping I can do to get around this is manually push some DEDENT tokens onto the stack. Maybe with a function like this.pushToken('DEDENT') or something along those lines. But the Jison documentation is not so great and I could use some help.
Any thoughts?
EDIT:
I seem to have been able to hack my way around this after looking at the generated parser code. Here's what seems to work...
if (tokens.length) {
var args = arguments;
tokens.slice(1).forEach(function () {
lexer.performAction.apply(this, args);
}.bind(this));
return 'DEDENT';
}
This tricks the lexer into performing another action using the exact same input for each DEDENT we have in the stack, thus allowing it to add in the proper dedents. However, it feels gross and I'm worried there could be unforeseen problems.
I would still love it if anyone had any ideas on a better way to do this.
After a couple of days I ended up figuring out a better answer. Here's what it looks like:
(\r\n|\r|\n)+[ \t]* %{
parser.indentCount = parser.indentCount || [0];
parser.forceDedent = parser.forceDedent || 0;
if (parser.forceDedent) {
parser.forceDedent -= 1;
this.unput(yytext);
return 'DEDENT';
}
var indentation = yytext.replace(/^(\r\n|\r|\n)+/, '').length;
if (indentation > parser.indentCount[0]) {
parser.indentCount.unshift(indentation);
return 'INDENT';
}
var dedents = [];
while (indentation < parser.indentCount[0]) {
dedents.push('DEDENT');
parser.indentCount.shift();
}
if (dedents.length) {
parser.forceDedent = dedents.length - 1;
this.unput(yytext);
return 'DEDENT';
}
return 'NEWLINE';
%}
Firstly, I modified my capture regex to make sure I wasn't inadvertently capturing extra newlines after a series of non-newline spaces.
Next, we make sure there are 2 "global" variables. indentCount will track our current indentation length. forceDedent will force us to return a DEDENT if it has a value above 0.
Next, we have a condition to test for a truthy value on forceDedent. If we have one, we'll decrement it by 1 and use the unput function to make sure we iterate on this same pattern at least one more time, but for this iteration, we'll return a DEDENT.
If we haven't returned, we get the length of our current indentation.
If the current indentation is greater than our most recent indentation, we'll track that on our indentCount variable and return an INDENT.
If we haven't returned, it's time to prepare for possible dedents. We'll make an array to track them.
When we detect a dedent, the user could be attempting to close 1 or more blocks all at once. So we need to include a DEDENT for as many blocks as the user is closing. We set up a loop and say that for as long as the current indentation is less than our most recent indentation, we'll add a DEDENT to our list and shift an item off of our indentCount.
If we tracked any dedents, we need to make sure all of them get returned by the lexer. Because the lexer can only return 1 token at a time, we'll return 1 here, but we'll also set our forceDedent variable to make sure we return the rest of them as well. To make sure we iterate on this pattern again and those dedents can be inserted, we'll use the unput function.
In any other case, we'll just return a NEWLINE.
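The indent-stack logic can also be tried outside Jison. The following is a rough sketch, not lexer code: indentTokens is a hypothetical function that takes the indentation width found after each line break and returns the tokens that, as I read the rule above, it would emit. Because the matched text is unput after every DEDENT, the same text is scanned once more after the last DEDENT and then falls through to INDENT or NEWLINE, and the sketch mirrors that:

```javascript
// Standalone sketch of the indent-stack logic (no Jison involved).
// widths: indentation width seen after each line break.
function indentTokens(widths) {
  const stack = [0]; // plays the role of parser.indentCount
  const tokens = [];
  for (const width of widths) {
    // One DEDENT per block being closed.
    while (width < stack[0]) {
      stack.shift();
      tokens.push("DEDENT");
    }
    // Classify the line break itself (this is also what the re-scan
    // after the final DEDENT does in the lexer rule).
    if (width > stack[0]) {
      stack.unshift(width);
      tokens.push("INDENT");
    } else {
      tokens.push("NEWLINE");
    }
  }
  return tokens;
}

console.log(indentTokens([2, 4, 4, 0]).join(" "));
// INDENT INDENT NEWLINE DEDENT DEDENT NEWLINE
```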

Form handling and validation in pure JavaScript

My intention is to get your thoughts and criticism about the script below, as regards the algorithm's design, performance and cross-browser compatibility.
I have just started getting into JavaScript having missed out on its awesomeness for quite a while. My background and experience is in developing C/C++/PHP based RESTful backends.
In order to understand the language and the right way of using it, I decided to do something which I am sure has been done many times before. But learning to use a new language and paradigm often entails pain anyway.
This is my attempt to create a normal form processing and validation script/ function.
In order to reduce complexity and keep code simple/clean, I decided to use HTML5 Custom Data Attributes (data-*) to assign metadata for each element in the form:
Data-Required: True or False. If set to true, this parameter makes the form-field required and so it cannot be empty. A value set to false indicates that the field is optional. Default is false.
Data-Type: Type of validation to be performed. Examples include 'email', 'password', 'numbers' or any other 'regexp'.
A fairly simple example of such a form would be:
<form action="postlistings" id="postlistings" enctype='multipart/form-data' method="post" class="postlistings">
<ul class="login-li">
<li>
<input class="title" name="title" type="title" id="title" data-required="true" data-type="title"></a>
</li>
<li>
<textarea name="body" id="elm1" class="elm1" name="elm1" data-type="body" data-required="true" >
</textarea>
</li>
<li>
<span class="nav-btn-question">Add Listing</span>
</li>
</ul>
</form>
Reminder: This is my first piece of JavaScript code.
The idea is to call Form while passing the form name to retrieve and validate all the field values in one loop for performance. The validation involves two steps as can be guessed from the Data-* attributes described above:
i. Check for required form fields.
If a value fails the step 1 requirement, an error message for that specific form value is pulled from the configuration. Thus, for all values that fail this requirement, an array of error messages is collected and passed on to the View.
ii. Perform respective validations.
Validations are only performed if all the values passed step 1. Otherwise, they follow the same steps as indicated in 1 above.
function Form(){
var args = Array.prototype.slice.call(arguments),
formName = args[0],
callback = args.pop(),
userError = [{type: {}, param: {}}],
requiredDataParam = 'required',
typeDataParam = 'type',
form = document.forms[formName],
formLength = form.length || null,
formElement = {id: {}, name: {}, value: {}, required: {}, type: {}};
function getFormElements(){
var num = 0;
var emptyContent = false;
for (var i = 0; i < formLength; i += 1) {
var formField = form[i];
formElement.id[i] = inArray('id', formField) ? formField.id : null;
formElement.name[i] = inArray('name', formField) ? formField.name : null;
formElement.value[i] = inArray('value', formField) ? formField.value : null;
formElement.required[i] = getDataAttribute(formField, requiredDataParam);
formElement.type[i] = getDataAttribute(formField, typeDataParam);
if (formElement.required[i] === true){
if(!formElement.type[i]) {
error('Validation rule not defined!');
}
else if (!formElement.value[i]) {
userError[num++] = {'type': 'required', 'param': form[i]};
emptyContent = true;
}
}
if (emptyContent === false) {
// Perform validations only if no empty but required form values were found.
// This is so that we can collect all the empty
// inputs and their corresponding error messages.
}
}
if (userError) {
// Return empty form errors and their corresponding error messages.
}
return formElement;
};
// Removed the getFormParam function that was not used at all.
return {
getFormElements: getFormElements
}
};
Two outside functions that are used in the JS script above (from JQuery source):
var inArray = function(elem, array){
if (array.indexOf){
return array.indexOf(elem);
}
for (var i = 0, length = array.length; i < length; i++){
if (array[i] === elem){
return i;
}
}
return -1;
}
// This is a cross-platform way to retrieve HTML5 custom attributes.
// Source: JQuery
var getDataAttribute = function(elem, key, data) {
if (data === undefined && elem.nodeType === 1) {
data = elem.getAttribute("data-" + key);
if (typeof data === "string") {
data = data === "true" ? true :
data === "false" ? false :
data === "null" ? null :
!CheckType.isNaN ? parseFloat(data) :
CheckType.rbrace.test(data) ? parseJSON(data) :
data;
}
else {
data = undefined;
}
}
return data;
}
An example of Config Error messages can be set as follows:
var errorMsgs = {
ERROR_email: "Please enter a valid email address.",
ERROR_password: "Your password must be at least 6 characters long. Please try another",
ERROR_user_exists: "The requested email address already exists. Please try again."
};
As I post this for your review, please ignore any styling conventions that I might not have followed. My intention is to get your expert reviews on anything I should be doing differently or could do better concerning the code itself, and the algorithm.
Besides the styling conventions, all criticism and questions are welcome.
First I'd like to clear up a common misconception. Forgive me if you already understand this clearly; maybe it will be helpful for someone else.
Learning and using jQuery or a similar library does not preclude or conflict with learning the JavaScript language. jQuery is simply a DOM manipulation library which takes away many of the pain points of using the DOM. There's plenty of room to learn and use JavaScript, the language, even if you use a library to abstract away some of the DOM details.
In fact, I would argue that using the DOM directly is likely to teach bad JavaScript coding habits, because the DOM is very much not a "JavaScript-ish" API. It was designed to work identically in JavaScript and Java and potentially other languages, and so it completely fails to make good use of the features of the JavaScript language.
Of course as you said, you're using this as a learning exercise; I just don't want you to fall into the trap that I've seen many people fall into of thinking, "I don't want to learn jQuery, because I want to learn JavaScript instead!" That's a false dichotomy: you have to learn JavaScript in either case, and using jQuery for the DOM doesn't interfere with that at all.
Now some details...
While it's OK to quote property names in an object literal and when you reference the properties, it's customary - and more readable - not to quote them when they are valid JavaScript names. e.g. in your formElement object
formElement = { id: {}, name: {}, value: {}, required: {}, type: {} };
(there was a missing semicolon at the end there too)
and where you use the names you can do:
formElement.id[i] = ...
formElement.name[i] = ...
etc.
Don't run your loops backwards unless the program logic requires it. It doesn't make the code faster except possibly in the case of an extremely tight loop, and it makes it unclear whether you're just prematurely optimizing or actually need the backwards loop.
Speaking of optimization, that loop has several inArray() calls. Since each of those loops through an array, that could be more of a performance impact than the outer loop. I imagine these arrays are probably pretty short? So performance wouldn't matter at all anyway, but this is something to think about in cases where you have longer arrays and objects. In some cases you can use an object with property names and values for a faster lookup - but I didn't look closely enough at what you're doing to suggest anything.
In any case, you're using inArray() wrong! But not your fault, that is a ridiculously named function in jQuery. The name clearly suggests a boolean return value, but the function returns the zero-based array index or -1 if the value is not found. I strongly recommend renaming this function as indexOf() to match the native Array method, or arrayIndex(), or some such.
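To make the pitfall concrete, here is a small sketch using the native Array method. contains is a hypothetical wrapper name; the point is only that an index has to be compared against -1 before it can serve as a yes/no answer:

```javascript
// indexOf-style helpers return a position; -1 means "not found".
const fieldNames = ["id", "name", "value"];

console.log(fieldNames.indexOf("id"));      // 0  (found, but falsy in a condition)
console.log(fieldNames.indexOf("missing")); // -1 (not found, but truthy in a condition)

// A boolean wrapper makes the intent explicit. `contains` is a hypothetical name.
function contains(array, elem) {
  return array.indexOf(elem) !== -1;
}

console.log(contains(fieldNames, "id"));      // true
console.log(contains(fieldNames, "missing")); // false
```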
That same loop has form[i] repeated numerous times. You could do this at the top of the loop:
var field = form[i];
and then use field throughout, e.g. field.id instead of form[i].id. This is generally faster, if it matters (which it probably doesn't here), but more importantly it's easier to read.
Do not use strict boolean comparisons like if( foo === true ) and if( bar === false) unless you really need to - and those cases are rare. The code sends a signal to the reader that there is something going on that's different from the usual boolean test. The only time these particular tests should be used is when you have a variable that may contain a boolean value or may contain some other type of value, and you need to distinguish which is which.
A good example of a case where you should use tests like these is an optional parameter that defaults to true:
// Do stuff unless 'really' is explicitly set to false, e.g.
// stuff(1) will do stuff with 1, but stuff(1,false) won't.
function stuff( value, really ) {
if( really === false ) {
// don't do stuff
}
else {
// do stuff
}
}
That specific example doesn't make a lot of sense, but it should give you the idea.
Similarly, an === true test could be used in a case where you need to distinguish an actual boolean true value from some other "truthy" value. Indeed, it looks like this line is a valid case for that:
if (formElement['required'][i] === true){
given that formElement['required'][i] comes from the getDataAttribute() function, which may return a boolean or some other type.
If you are just testing for truthiness, though - and this should be most of the time - simply use if( foo ) or if( ! foo ). Or similarly in a conditional expression: foo ? x : y or !foo ? x : y.
The above was a long-winded way of saying that you should change this:
if (empty_content === false) {
to:
if (!empty_content) {
Your getFormParam() function goes to some work to convert an undefined result to null. There is usually no reason to do this. I don't see any place where that function is called, so I can't advise specifically, but in general you'd be testing for truthiness on something like this, so null and undefined would both be treated as false. Or in cases where you do need to distinguish null/undefined from other values (say, an explicit false), you can easily do it with != null or == null. This is one case where the "looser" comparison performed by == and != is very useful: both null and undefined evaluate the same with these operators.
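A quick sketch of that loose comparison, with isMissing as a hypothetical helper name:

```javascript
// `== null` is true for exactly two values: null and undefined.
function isMissing(value) {
  return value == null;
}

console.log(isMissing(undefined)); // true
console.log(isMissing(null));      // true
console.log(isMissing(false));     // false
console.log(isMissing(0));         // false
console.log(isMissing(""));        // false
```

Other falsy values such as false, 0 and the empty string are not caught, which is what makes this test useful when an explicit false has to be told apart from "no value".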
You asked to ignore coding style, but one little suggestion here: You have a mix of camelCaseNames and names_with_underscores. In JavaScript, camelCaseNames are more idiomatic for function and variable names, with PascalCaseNames for constructor functions. Of course feel free to use underscores where they make more sense, for example if you're writing code that works with database columns in that format you may want your variable names to match the column names.
Hope that helps! Keep up the good work.
Update for your new code
I'm having a bit of trouble following the logic in the code, and I think I know part of the reason. It's a combination of naming conventions and inside-out objects.
First, the name formElement is really confusing. When I see element in JavaScript, I think of either a DOM element (HTMLElement) or an array element. I'm not sure if this formElement represents one or the other or neither.
So I look at the code to figure out what it's doing, and I see it has id:{}, name:{}, ... properties, but the code later treats each of those as an Array and not an Object:
formElement.id[i] = ...
formElement.name[i] = ...
formElement.value[i] = ...
formElement.required[i] = ...
formElement.type[i] = ...
(where i is an integer index)
If that code is right, those should be arrays instead: id:[], name:[], ....
But this is a red flag. When you see yourself creating parallel arrays in JavaScript, you're probably doing it wrong. In most cases you're better off replacing the parallel arrays with a single array of objects. Each of the objects in that array represents a single slice through all your parallel arrays, with a property for each of the previous arrays.
So, this object (where I've made the correction from {} to [] to match its current use):
formElement = { id: [], name: [], value: [], required: [], type: [] };
should be:
formInfo = [];
and then where you have the code that goes:
formElement.id[i] = ...;
formElement.name[i] = ...;
formElement.value[i] = ...;
formElement.required[i] = ...;
formElement.type[i] = ...;
It should be:
var info = {
id: ...,
name: ...,
value: ...,
required: ...,
type: ...
};
formInfo.push( info );
and adjust the rest of the code to suit. For example:
formElement.required[i]
would be:
formInfo[i].required
or even simpler since it's in the same function:
info.required
And note: I'm not saying info and formInfo are great names :-) they are just placeholders so you can think of a better name. The main idea is to create an array of objects instead of a set of parallel arrays.
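Here is a runnable sketch of the array-of-objects shape. It is only an illustration: plain objects stand in for the real form fields, and sampleFields, collectFormInfo and missingRequired are hypothetical names:

```javascript
// Plain objects stand in for real form fields so the sketch runs anywhere.
const sampleFields = [
  { id: "title", name: "title", value: "", required: true, type: "title" },
  { id: "elm1", name: "body", value: "Some text", required: true, type: "body" },
  { id: "note", name: "note", value: "", required: false, type: "text" }
];

// One record per field instead of five parallel arrays.
function collectFormInfo(fields) {
  return fields.map(function (field) {
    return {
      id: field.id,
      name: field.name,
      value: field.value,
      required: field.required,
      type: field.type
    };
  });
}

// Required fields that were left empty.
function missingRequired(formInfo) {
  return formInfo.filter(function (info) {
    return info.required && !info.value;
  });
}

const formInfo = collectFormInfo(sampleFields);
console.log(missingRequired(formInfo).map(function (info) { return info.name; }));
// [ 'title' ]
```

With one object per field, the "required but empty" check becomes a single filter over the records.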
One last thing and then I'm out of time for now.
That getDataAttribute() function is a complicated little piece of work. You don't need it! It would be simpler to just call the underlying function directly where you need it:
var info = {
...
required: formField.getAttribute('data-required') === 'true',
type: formField.getAttribute('data-type')
};
This also gives you full control of how the attributes are interpreted - as in the === 'true' test above. (This gives you a proper boolean value, so when you test the value later you don't have to use === true on it.)
On a stylistic note, yes, I did hard code the two 'data-xxxx' names right there, and I think that's a better and clearer way to do it. Don't let your C experience throw you off here. There's no advantage to defining a string "constant" in this particular case, unless it's something that you want to make configurable, which this isn't.
Also, even if you do make a string constant, there's a minor advantage to having the complete 'data-whatever' string instead of just 'whatever'. The reason is that when somebody reads your HTML code, they may see a string in it and search the JS code for that string. But when they search for data-whatever they won't find it if the data- prefix is automagically prepended in the JS code.
Oh, I forgot one last thing. This code:
function Form(){
var args = Array.prototype.slice.call(arguments),
formName = args[0],
callback = args.pop(),
is working way too hard! Just do this instead:
function Form( formName, callback ) {
(and keep the var for the remaining variable declarations of course)
I cannot add comments yet so here is a little tip. I would separate the getFormElements() into smaller private functions. And I would add the errorMsgs to the Form function.
But for a first script in JavaScript, it is very impressive. This is actually the real reason I respond. I think it deserves more upvotes, and I would be very interested in a JS ninja responding to this question.
Good luck!

Convert string into storable variable names and values (as strings, and objects)

2015 Edit Don't do this. Be a good person and Just Use JSON.parse() :)
I am trying to take a string which contains variables and values in a javascript-like syntax, and store them in a global object (gv). My issue is just with the parsing of the string.
String (everything inside the <div>):
<div id="gv">
variableName = "variableValue,NoSpacesThough";
portal = "TheCakeIsALie";
</div>
Script (parses string above, places values into global object):
var s = (document.getElementById("gv").innerHTML).split(';');
for (var i = 0; i < s.length; i++) {
if (s[i] !== "\n" || "") {
s[i] = s[i].replace(/^\s*/gm, "");
var varName = s[i].substr(0, s[i].indexOf('=') - 1),
varValue = (s[i].substr((s[i].indexOf('"') + 1), s[i].length)).replace('"', "");
gv[varName] = varValue;
}
}
Result:
console.log(gv.variableName); //returns: variableValue,NoSpacesThough
console.log(gv.portal); //returns: TheCakeIsALie
Q: How can I modify this script to correctly store these variables:
exampleVariable = { name: "string with spaces", cake:lie };
variableName = "variableValue,NoSpacesThough";
portal = "The Cake Is A Lie";
The directly above has an object containing: A string with spaces (and "), a reference
Thanks.
Four options / thoughts / suggestions:
1. Use JSON
If you're in control of the source format, I'd recommend using JSON rather than rolling your own. Details on that page. JSON is now part of the ECMAScript (JavaScript) standard with standard methods for creating JSON strings from object graphs and vice-versa. With your example:
exampleVariable = { name: "string with spaces", cake:lie };
variableName = "variableValue,NoSpacesThough";
portal = "The Cake Is A Lie";
here's what the JSON equivalent would look like:
{
"exampleVariable": { name: "string with spaces", cake:lie },
"variableName": "variableValue,NoSpacesThough",
"portal": "The Cake Is A Lie"
}
As you can see, the only differences are:
You wrap the entire thing in curly braces ({}).
You put the "variable" names (property names) in double quotes.
You use a colon rather than an equal sign after the property name.
You use a comma rather than a semicolon to separate properties (just as in the object literal you have on your exampleVariable line).
You ensure that any string values use double, rather than single, quotes (JavaScript allows either; JSON is more restrictive). Your example uses double quotes, but I mention it just in case...
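Once the data is strict JSON, reading it is a single call. A minimal sketch, with one assumption to flag: JSON has no way to express a bare reference such as lie, so this example stores it as the string "lie", and the nested property names are quoted as well:

```javascript
// Strict JSON: all property names quoted, strings in double quotes.
// Assumption: the bare reference `lie` is stored as the string "lie".
const text = `{
  "exampleVariable": { "name": "string with spaces", "cake": "lie" },
  "variableName": "variableValue,NoSpacesThough",
  "portal": "The Cake Is A Lie"
}`;

const gv = JSON.parse(text);

console.log(gv.portal);               // The Cake Is A Lie
console.log(gv.exampleVariable.name); // string with spaces
console.log(gv.variableName);         // variableValue,NoSpacesThough
```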
2. Pre-process it into JSON with regular expressions
If you're not in control of the source format, but it's exactly as you've shown, you could reformat it as JSON fairly easily via regular expressions, and then deserialize it with the JSON stuff. But if the format is more complicated than you've quoted, that starts getting hairy very quickly.
Here's an example (live copy) of transforming what you've quoted to JSON:
function transformToJSON(str) {
var rexSplit = /\r?\n/g,
rexTransform = /^\s*([a-zA-Z0-9_]+)\s*=\s*(.+);\s*$/g,
rexAllWhite = /\s+/g,
lines,
index,
line;
lines = str.split(rexSplit);
index = 0;
while (index < lines.length) {
line = lines[index];
if (line.replace(rexAllWhite, '').length === 0) {
// Blank line, remove it
lines.splice(index, 1);
}
else {
// Transform it
lines[index] = line.replace(rexTransform, '"$1": $2');
++index;
}
}
// Return directly; assigning to an undeclared `result` would create a global.
return "{\n" + lines.join(",\n") + "\n}";
}
...but beware as, again, that relies on the format being exactly as you showed, and in particular it relies on each value being on a single line and any string values being in double quotes (a requirement of JSON). You'll probably need to handle complexities the above doesn't handle, but you can't do it with things like your first line var s = (document.getElementById("gv").innerHTML).split(';');, which will break lines on ; regardless of whether the ; is within quotes...
3. Actually parse it by modifying a JSON parser to support your format
If you can't change the format, and it's less precise than the examples you've quoted, you'll have to get into actual parsing; there are no shortcuts (well, no reliable ones). Actually parsing JavaScript literals (I'm assuming there are not expressions in your data, other than the assignment expression of course) isn't that bad. You could probably take a JSON parser and modify it to your needs, since it will already have nearly all the logic for literals. There are two on Crockford's github page (Crockford being the inventor of JSON), one using recursive descent and another using a state machine. Take your choice and start hacking.
4. The evil eval
I suppose I should mention eval here, although I don't recommend you use it. eval runs arbitrary JavaScript code from a string. But because it runs any code you give it, it's not a good choice for deserializing things like this, and any free variables (like the ones you've quoted) would end up being globals. Really very ugly, I mostly mention it in order to say: Don't use it. :-)
