What Unicode character can I use to "flag" a string?

I want to represent an object that has several text properties, each one holding the same text value in a different language. When the user modifies a single field, the other fields should be revised. I'm thinking of adding a single Unicode character at the beginning of the string in the other fields; then, to find the fields that need attention, I just have to check the value at obj.text_prop[0].
Which Unicode character can I use for this purpose? Ideally, it would be non-printable, supported in JS and JSON.

Such flagging should be done some other way, at a protocol level rather than at the character level. For example, consider making each language version an object rather than just a string; the object could then have properties such as needsAttention in addition to the property that contains the string.
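A minimal sketch of that object-based approach follows. The names greeting, updateText and languagesNeedingAttention are hypothetical and only illustrate the idea; the flag lives next to the text instead of inside it.

```javascript
// Hypothetical shape: one entry per language, each carrying its own flag.
const greeting = {
  en: { text: "Hello", needsAttention: false },
  de: { text: "Willkommen", needsAttention: false },
  fr: { text: "Bonjour", needsAttention: false }
};

// Update one language and flag every other language for review.
function updateText(entry, langId, newText) {
  for (const id of Object.keys(entry)) {
    entry[id].needsAttention = id !== langId;
  }
  entry[langId].text = newText;
}

// List the languages whose text still has to be revised.
function languagesNeedingAttention(entry) {
  return Object.keys(entry).filter((id) => entry[id].needsAttention);
}

updateText(greeting, "en", "Hi");
console.log(languagesNeedingAttention(greeting)); // ["de", "fr"]
console.log(JSON.stringify(greeting.en)); // {"text":"Hi","needsAttention":false}
```

Because the flag is an ordinary boolean property, it survives JSON serialization and the text itself never has to be cleaned before display.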
But if you need to embed such information in the string itself, you could use ZERO WIDTH SPACE, U+200B. By definition it marks a line break opportunity, but that should not cause trouble here. The main problem is probably that old versions of IE may display it as a small rectangle.
Alternatively, you could use a noncharacter code point such as U+FFFF, if you can make sure that the string is never sent anywhere from the program without removing this code point. As described in Ch. 16 of the Unicode Standard, Special Areas and Format Characters, noncharacter code points are reserved for internal use in an application and should never be used in text interchange.
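If the flag does have to live in the string, the idea could look roughly like this. FLAG, flag, isFlagged and unflag are hypothetical helper names, and the sketch assumes that no legitimate value starts with U+200B.

```javascript
// Assumption: U+200B (ZERO WIDTH SPACE) never starts a legitimate value.
const FLAG = "\u200B";

function isFlagged(text) {
  return text.charAt(0) === FLAG;
}

// Prepend the flag once; flagging an already flagged string is a no-op.
function flag(text) {
  return isFlagged(text) ? text : FLAG + text;
}

// Remove the flag before the text is displayed or sent anywhere.
function unflag(text) {
  return isFlagged(text) ? text.slice(1) : text;
}

const obj = { text_en: "Hi", text_de: flag("Willkommen") };
console.log(isFlagged(obj.text_de)); // true
console.log(isFlagged(obj.text_en)); // false

// The flag survives a JSON round trip.
const roundTripped = JSON.parse(JSON.stringify(obj));
console.log(isFlagged(roundTripped.text_de)); // true
console.log(unflag(roundTripped.text_de)); // Willkommen
```

The same helpers would work with a noncharacter such as "\uFFFF" as FLAG, provided unflag is always applied before the string leaves the program.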

I would suggest not using strange characters at the beginning of the string. You can implement something like this instead:
<script type="text/javascript">
function LocalizationSet() {
    // Per-instance state. Putting these containers on the prototype
    // would share them between every LocalizationSet instance.
    this.localizationItems = {};
    this.itemsNeedAttention = {};
}
// Store the text for a language and mark that language as needing attention.
LocalizationSet.prototype.setLocalization = function (langId, text) {
    this.localizationItems[langId] = text;
    this.itemsNeedAttention[langId] = true;
};
LocalizationSet.prototype.getLocalization = function (langId) {
    return this.localizationItems[langId];
};
// Languages that were never set do not need attention.
LocalizationSet.prototype.needsAttention = function (langId) {
    return this.itemsNeedAttention[langId] === true;
};
// Clear the flag on every language.
LocalizationSet.prototype.unsetAttentionFlags = function () {
    for (var it in this.itemsNeedAttention) {
        this.itemsNeedAttention[it] = false;
    }
};
// Example
var set = new LocalizationSet();
set.setLocalization("en", "Hello");
set.setLocalization("de", "Willkommen");
alert(set.needsAttention("en"));
alert(set.needsAttention("de"));
set.unsetAttentionFlags();
alert(set.needsAttention("en"));
set.setLocalization("en", "Hi");
alert(set.needsAttention("en"));
// Shows true, true, false, true
</script>

Related

Create a Range spanning the character before cursor in Word Javascript API / Office.js

I'm trying to create an Office Javascript Add-in which will examine the character before the cursor and replace that character depending on what it is. So I need to create a Range of the character before the cursor. I can do this easily with a VBA macro, but unfortunately, I can't find a way to do this with the new javascript api. Is this possible?
If this is possible, it would also be helpful if I could look at the 5 characters before and after the cursor for added context.
Thanks.

A couple of months ago I tried something similar. In short, there is no good way to do it. You could try what I describe below, but I would advise against it. The example is not thought through and will most likely contain a number of bugs. Additionally, I find this an incredibly inefficient way to do something so simple.
Limitations in the API that prevent an easy solution:
There is no cursor, only selections. This means that you need to assume that the cursor is always at the beginning of a selection.
Selections cannot be directly modified through the Office.js API. So it is not possible to expand the selection to include the previous character.
The 'Range' object can be extended in both directions, but it requires another range as input. This means an earlier range needs to be created or found (i.e. a range object before the current selection).
You can only navigate outside of the selection through the property 'parentBody', which gives you the entire body of the document. This needs to be processed in order to isolate a range before the cursor that could help us replace the character.
From what I can tell, it is not possible to create a range for a single character. So a bigger range needs to be taken before the cursor and replaced entirely.
Example
// WARNING: Incredibly inefficient and poor code. Do not use directly!
// WARNING: Edge cases are not tackled in this example.
function replaceCharacterBeforeCursor() {
    return Word.run(function (context) {
        var selection = context.document.getSelection();
        // Assumption: Cursor always starts at the beginning of a selection.
        var cursor = selection.getRange('Start');
        // Create a new range that covers everything before the cursor (or the beginning of the selection).
        var startDocument = selection.parentBody.getRange('Start');
        var rangeBeforeSelection = startDocument.expandTo(cursor);
        // Capture parent paragraph.
        var parentParagraph = rangeBeforeSelection.paragraphs.getLast();
        context.load(parentParagraph);
        // Return every promise so that Word.run waits for the whole chain.
        return context
            .sync()
            .then(function () {
                // Create range that captures everything from the beginning of the parent
                // paragraph until the cursor.
                var paragraphStart = parentParagraph.getRange('Start');
                var wordRangeBeforeCursor = paragraphStart.expandTo(cursor);
                context.load(wordRangeBeforeCursor);
                return context
                    .sync()
                    .then(function () {
                        // Replace last character.
                        var oldText = wordRangeBeforeCursor.text;
                        var wordLength = oldText.length;
                        var lastCharacter = oldText.substring(wordLength - 1);
                        if (lastCharacter !== " ") {
                            var newText = oldText.substring(0, wordLength - 1) + "test";
                            wordRangeBeforeCursor.insertText(newText, 'Replace');
                            return context.sync();
                        }
                    });
            });
    });
}
Another way to do it is through text ranges. This would be substantially more inefficient. Either way I hope this will help you in finding a solution that fits your needs.

How to use lastIndexOf() function with symbols

Just started using indexOf() and lastIndexOf() functions and I know why they are used, however, the result doesn't make me feel happy :)
let str = $('#info').html();
// WORKS
//alert(str.lastIndexOf('√'));
// DOESN'T WORK
alert(str.lastIndexOf('&#x0221A;'));
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<div id="info">&#x0221A;</div>
The problem is I get the alert result as "-1", which means the √ couldn't be found in the str variable. Using simple symbol √ it works, however, I'm not sure if it's a good practice using this symbol here.
In my opinion, another approach about this problem would be encoding √ symbol in the HTML to √, so using "Inspect element" feature you would see √.
What do you think?

There is no direct way to achieve this. But if you still want to do it this way, you simply need to build the hex character reference from the character code:
let str = ascii_to_hexa($('#info').html());
str = '&#x0' + str.toUpperCase() + ';';
alert(str.lastIndexOf('&#x0221A;'));
// Convert every character of the string to its hex character code.
function ascii_to_hexa(str) {
    var arr1 = [];
    for (var n = 0, l = str.length; n < l; n++) {
        var hex = Number(str.charCodeAt(n)).toString(16);
        arr1.push(hex);
    }
    return arr1.join('');
}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<div id="info">√</div>

When the browser reads and parses your HTML, it builds up a DOM, without retaining the exact HTML you provided. Later, if you ask for HTML, it builds a new HTML string using its own rules for doing that.
That's why str.lastIndexOf('&#x0221A;') doesn't work: The browser isn't under any obligation to give you back the character in the same form you used when you supplied it. It could give it back as just a character (√) or a named character reference (&radic; in this case) or a decimal numeric character reference (&#8730;), rather than the hex numeric character reference you're looking for.
You'll have to test on your target browsers to see what they give you, and then look for that. I suspect most if not all will return the actual character, and so your str.lastIndexOf('√') (or str.lastIndexOf('\u221A')) will be the way to go.
<div>√</div>
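If the markup may arrive in either form, one option is to normalize numeric character references before searching. The sketch below is plain JavaScript without a DOM; decodeNumericRefs is a hypothetical helper that handles only hex and decimal references with valid code points, not named ones such as &radic;.

```javascript
// Replace &#xHHHH; and &#DDDD; references with the characters they denote.
function decodeNumericRefs(html) {
  return html.replace(/&#(x[0-9a-f]+|[0-9]+);/gi, (match, body) => {
    const isHex = body[0] === "x" || body[0] === "X";
    const codePoint = isHex ? parseInt(body.slice(1), 16) : parseInt(body, 10);
    return String.fromCodePoint(codePoint);
  });
}

const asCharacter = "<b>x</b> \u221A";
const asHexRef = "<b>x</b> &#x0221A;";
const asDecimalRef = "<b>x</b> &#8730;";

console.log(asCharacter.lastIndexOf("\u221A")); // 9
console.log(decodeNumericRefs(asHexRef).lastIndexOf("\u221A")); // 9
console.log(decodeNumericRefs(asDecimalRef).lastIndexOf("\u221A")); // 9
```

After normalization, all three inputs are the same string, so a single search for the character (or '\u221A') covers every case.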

How to return multiple tokens with Jison lexer

I'm new to lexing and parsing so sorry if the title isn't clear enough.
Basically, I'm using Jison to parse some text and I am trying to get the lexer to comprehend indentation. Here's the bit in question:
(\r\n|\r|\n)+\s* %{
parser.indentCount = parser.indentCount || [0];
var indentation = yytext.replace(/^(\r\n|\r|\n)+/, '').length;
if (indentation > parser.indentCount[0]) {
parser.indentCount.unshift(indentation);
return 'INDENT';
}
var tokens = [];
while (indentation < parser.indentCount[0]) {
tokens.push('DEDENT');
parser.indentCount.shift();
}
if (tokens.length) {
return tokens;
}
if (!indentation.length) {
return 'NEWLINE';
}
%}
So far, almost all of that works as expected. The one problem is the line where I attempt to return an array of DEDENT tokens. It appears that Jison is just converting that array into a string which causes me to get a parse error like Expecting ........, got DEDENT,DEDENT.
What I'm hoping I can do to get around this is manually push some DEDENT tokens onto the stack. Maybe with a function like this.pushToken('DEDENT') or something along those lines. But the Jison documentation is not so great and I could use some help.
Any thoughts?
EDIT:
I seem to have been able to hack my way around this after looking at the generated parser code. Here's what seems to work...
if (tokens.length) {
var args = arguments;
tokens.slice(1).forEach(function () {
lexer.performAction.apply(this, args);
}.bind(this));
return 'DEDENT';
}
This tricks the lexer into performing another action using the exact same input for each DEDENT we have in the stack, thus allowing it to add in the proper dedents. However, it feels gross and I'm worried there could be unforeseen problems.
I would still love it if anyone had any ideas on a better way to do this.

After a couple of days I ended up figuring out a better answer. Here's what it looks like:
(\r\n|\r|\n)+[ \t]* %{
    parser.indentCount = parser.indentCount || [0];
    parser.forceDedent = parser.forceDedent || 0;
    if (parser.forceDedent) {
        parser.forceDedent -= 1;
        this.unput(yytext);
        return 'DEDENT';
    }
    var indentation = yytext.replace(/^(\r\n|\r|\n)+/, '').length;
    if (indentation > parser.indentCount[0]) {
        parser.indentCount.unshift(indentation);
        return 'INDENT';
    }
    var dedents = [];
    while (indentation < parser.indentCount[0]) {
        dedents.push('DEDENT');
        parser.indentCount.shift();
    }
    if (dedents.length) {
        parser.forceDedent = dedents.length - 1;
        this.unput(yytext);
        return 'DEDENT';
    }
    return 'NEWLINE';
%}
Firstly, I modified my capture regex to make sure I wasn't inadvertently capturing extra newlines after a series of non-newline spaces.
Next, we make sure there are 2 "global" variables. indentCount will track our current indentation length. forceDedent will force us to return a DEDENT if it has a value above 0.
Next, we have a condition to test for a truthy value on forceDedent. If we have one, we'll decrement it by 1 and use the unput function to make sure we iterate on this same pattern at least one more time, but for this iteration, we'll return a DEDENT.
If we haven't returned, we get the length of our current indentation.
If the current indentation is greater than our most recent indentation, we'll track that on our indentCount variable and return an INDENT.
If we haven't returned, it's time to prepare for possible dedents. We'll make an array to track them.
When we detect a dedent, the user could be attempting to close 1 or more blocks all at once. So we need to include a DEDENT for as many blocks as the user is closing. We set up a loop and say that for as long as the current indentation is less than our most recent indentation, we'll add a DEDENT to our list and shift an item off of our indentCount.
If we tracked any dedents, we need to make sure all of them get returned by the lexer. Because the lexer can only return 1 token at a time, we'll return 1 here, but we'll also set our forceDedent variable to make sure we return the rest of them as well. To make sure we iterate on this pattern again and those dedents can be inserted, we'll use the unput function.
In any other case, we'll just return a NEWLINE.
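The bookkeeping described above can be tried outside Jison with a small standalone sketch. indentationTokens is a hypothetical function that takes the indentation width found after each newline and returns the tokens the rule would emit over its repeated matches, including the NEWLINE that the final re-scan after unput falls through to.

```javascript
// Standalone illustration only: no Jison involved.
function indentationTokens(indentWidths) {
  const stack = [0]; // most recent indentation first, like indentCount
  const tokens = [];
  for (const width of indentWidths) {
    if (width > stack[0]) {
      // Deeper than before: open one block.
      stack.unshift(width);
      tokens.push("INDENT");
      continue;
    }
    // Shallower than before: close one block per stacked level.
    while (width < stack[0]) {
      stack.shift();
      tokens.push("DEDENT");
    }
    // Same level, or all dedents already emitted.
    tokens.push("NEWLINE");
  }
  return tokens;
}

console.log(indentationTokens([2, 4, 4, 0]));
// ["INDENT", "INDENT", "NEWLINE", "DEDENT", "DEDENT", "NEWLINE"]
```

In the lexer the same sequence is spread over several matches of the rule, because each match can return only one token; forceDedent and unput exist to replay the match until the queue is empty.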

CodeMirror - Using RegEx with overlay

I can't seem to find an example of anyone using RegEx matches to create an overlay in CodeMirror. The Moustaches example matching one thing at a time seems simple enough, but in the API, it says that the RegEx match returns the array of matches and I can't figure out what to do with it in the context of the structure in the moustaches example.
I have a regular expression which finds all the elements I need to highlight: I've tested it and it works.
Should I be loading up the array outside of the token function and then matching each one? Or is there a way to work with the array?
The other issue is that I want to apply different styling depending on the (biz|cms) option in the regex - one for 'biz' and another for 'cms'. There will be others but I'm trying to keep it simple.
This is as far as I have got. The comments show my confusion.
CodeMirror.defineMode("tbs", function(config, parserConfig) {
var tbsOverlay = {
token: function(stream, state) {
tbsArray = match("^<(biz|cms).([a-zA-Z0-9.]*)(\s)?(\/)?>");
if (tbsArray != null) {
for (i = 0; i < tbsArray.length; i++) {
var result = tbsArray[i];
//Do I need to stream.match each element now to get hold of each bit of text?
//Or is there some way to identify and tag all the matches?
}
}
//Obviously this bit won't work either now - even with regex
while (stream.next() != null && !stream.match("<biz.", false)) {}
return null;
}
};
return CodeMirror.overlayMode(CodeMirror.getMode(config, parserConfig.backdrop || "text/html"), tbsOverlay);
});

It returns the array as produced by RegExp.exec or String.prototype.match (see for example https://developer.mozilla.org/en-US/docs/JavaScript/Reference/Global_Objects/String/match), so you probably don't want to iterate through it, but rather pick out the specific elements that correspond to groups in your regexp (if (result[1] == "biz") ...)
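To illustrate picking a style from a capture group, here is a sketch in plain JavaScript that uses RegExp.exec directly rather than the CodeMirror stream. The helper styleFor and the style names "tbs-biz" and "tbs-cms" are hypothetical, and the dot after the prefix is escaped so that it matches only a literal dot.

```javascript
// Pattern from the question, written as a RegExp literal with the dot escaped.
const tagPattern = /^<(biz|cms)\.([a-zA-Z0-9.]*)(\s)?(\/)?>/;

// Hypothetical helper: maps the first capture group to a style name.
function styleFor(text) {
  const result = tagPattern.exec(text);
  if (result === null) return null;
  // result[0] is the whole match; result[1] is the (biz|cms) group.
  return result[1] === "biz" ? "tbs-biz" : "tbs-cms";
}

console.log(styleFor("<biz.customer.name />")); // tbs-biz
console.log(styleFor("<cms.page.title>")); // tbs-cms
console.log(styleFor("<div>")); // null
```

Inside an overlay's token function, the same check would presumably be applied to the array returned by stream.match, with the chosen style name as the return value.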

Look at the implementation of CodeMirror's match() method and you'll see that it handles two types of parameter: string and RegExp.
Your constant in
stream.match("<biz.")
is of string type.
Define it as a RegExp instead:
tbsArray = /<biz./g
That way, your stream will be matched against the RegExp.

Convert string into storable variable names and values (as strings, and objects)

2015 Edit Don't do this. Be a good person and Just Use JSON.parse() :)
I am trying to take a string which contains variables and values in a javascript-like syntax, and store them in a global object (gv). My issue is just with the parsing of the string.
String (everything inside the <div>):
<div id="gv">
variableName = "variableValue,NoSpacesThough";
portal = "TheCakeIsALie";
</div>
Script (parses string above, places values into global object):
var s = (document.getElementById("gv").innerHTML).split(';');
for (var i = 0; i < s.length; i++) {
if (s[i] !== "\n" || "") {
s[i] = s[i].replace(/^\s*/gm, "");
var varName = s[i].substr(0, s[i].indexOf('=') - 1),
varValue = (s[i].substr((s[i].indexOf('"') + 1), s[i].length)).replace('"', "");
gv[varName] = varValue;
}
}
Result:
console.log(gv.variableName); //returns: variableValue,NoSpacesThough
console.log(gv.portal); //returns: TheCakeIsALie
Q: How can I modify this script to correctly store these variables:
exampleVariable = { name: "string with spaces", cake:lie };
variableName = "variableValue,NoSpacesThough";
portal = "The Cake Is A Lie";
The example directly above has an object containing a string with spaces (and quotes) and a reference.
Thanks.

Four options / thoughts / suggestions:
1. Use JSON
If you're in control of the source format, I'd recommend using JSON rather than rolling your own. Details on that page. JSON is now part of the ECMAScript (JavaScript) standard with standard methods for creating JSON strings from object graphs and vice-versa. With your example:
exampleVariable = { name: "string with spaces", cake:lie };
variableName = "variableValue,NoSpacesThough";
portal = "The Cake Is A Lie";
here's what the JSON equivalent would look like:
{
    "exampleVariable": { "name": "string with spaces", "cake": "lie" },
    "variableName": "variableValue,NoSpacesThough",
    "portal": "The Cake Is A Lie"
}
As you can see, the only differences are:
You wrap the entire thing in curly braces ({}).
You put the "variable" names (property names) in double quotes, including the names inside nested objects.
You use a colon rather than an equal sign after the property name.
You use a comma rather than a semicolon to separate properties (just as in the object literal you have on your exampleVariable line).
You ensure that any string values use double, rather than single, quotes (JavaScript allows either; JSON is more restrictive). Your example uses double quotes, but I mention it just in case...
You replace the bare reference lie with a string: JSON has no variables or references, so your code has to resolve "lie" to the real object after parsing.
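For illustration, this is roughly how such a JSON document would be read with JSON.parse. The sketch assumes that nested property names are quoted and that the bare reference lie is stored as the string "lie"; the references lookup table is hypothetical.

```javascript
// Assumption: the source text is strict JSON, with "lie" stored as a string.
const source = `{
  "exampleVariable": { "name": "string with spaces", "cake": "lie" },
  "variableName": "variableValue,NoSpacesThough",
  "portal": "The Cake Is A Lie"
}`;

const gv = JSON.parse(source);
console.log(gv.exampleVariable.name); // string with spaces
console.log(gv.portal); // The Cake Is A Lie

// JSON cannot express references, so resolve the name after parsing.
// The lookup table is a hypothetical stand-in for wherever "lie" lives.
const references = { lie: { isTrue: false } };
gv.exampleVariable.cake = references[gv.exampleVariable.cake];
console.log(gv.exampleVariable.cake.isTrue); // false
```

JSON.parse throws a SyntaxError on anything that is not strict JSON, so malformed input is reported instead of being silently misread.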
2. Pre-process it into JSON with regular expressions
If you're not in control of the source format, but it's exactly as you've shown, you could reformat it as JSON fairly easily via regular expressions, and then deserialize it with the JSON stuff. But if the format is more complicated than you've quoted, that starts getting hairy very quickly.
Here's an example (live copy) of transforming what you've quoted to JSON:
function transformToJSON(str) {
    var rexSplit = /\r?\n/g,
        rexTransform = /^\s*([a-zA-Z0-9_]+)\s*=\s*(.+);\s*$/g,
        rexAllWhite = /\s+/g,
        lines,
        index,
        line,
        result; // declared locally so it does not leak as a global
    lines = str.split(rexSplit);
    index = 0;
    while (index < lines.length) {
        line = lines[index];
        if (line.replace(rexAllWhite, '').length === 0) {
            // Blank line, remove it
            lines.splice(index, 1);
        }
        else {
            // Transform `name = value;` into `"name": value`
            lines[index] = line.replace(rexTransform, '"$1": $2');
            ++index;
        }
    }
    result = "{\n" + lines.join(",\n") + "\n}";
    return result;
}
...but beware as, again, that relies on the format being exactly as you showed, and in particular it relies on each value being on a single line and any string values being in double quotes (a requirement of JSON). You'll probably need to handle complexities the above doesn't handle, but you can't do it with things like your first line var s = (document.getElementById("gv").innerHTML).split(';');, which will break lines on ; regardless of whether the ; is within quotes...
3. Actually parse it by modifying a JSON parser to support your format
If you can't change the format, and it's less precise than the examples you've quoted, you'll have to get into actual parsing; there are no shortcuts (well, no reliable ones). Actually parsing JavaScript literals (I'm assuming there are no expressions in your data, other than the assignment expression of course) isn't that bad. You could probably take a JSON parser and modify it to your needs, since it will already have nearly all the logic for literals. There are two on Crockford's github page (Crockford being the inventor of JSON), one using recursive descent and another using a state machine. Take your choice and start hacking.
4. The evil eval
I suppose I should mention eval here, although I don't recommend you use it. eval runs arbitrary JavaScript code from a string. But because it runs any code you give it, it's not a good choice for deserializing things like this, and any free variables (like the ones you've quoted) would end up being globals. Really very ugly, I mostly mention it in order to say: Don't use it. :-)
