Replace new line characters with \r - javascript

I'm using hl7parser to parse ADM files.
The documentation states that to create a new Message object, a string should be passed:
var message = hl7parser.create("MSH|^~\&|||||20121031232617||ADT^A04|20381|P|2.3||||NE\rEVN|A04|20121031162617||01\rPID|1|16194|16194||Jones^Bob");
Notice that the string uses '\r' to separate segments (MSH, EVN, PID).
I'm fetching the data from a server, which returns for instance the following data.
MSH|^~\&|EPICADT|DH|LABADT|DH|201301011226||ADT^A01|HL7MSG00001|P|2.3.1|
EVN|A01|201301011223||
PID|||MRN12345^5^M11||APPLESEED^JOHN^A^III||19710101|M||C|1 CATALYZE STREET^^MADISON^WI^53005-1020|GL|(414)379-1212|(414)271-3434||S||MRN12345001^2^M10|123456789|987654^NC|
NK1|1|APPLESEED^BARBARA^J|WIFE||||||NK^NEXT OF KIN
PV1|1|I|2000^2012^01||||004777^GOOD^SIDNEY^J.|||SUR||||ADM|A0|
Replacing the \n with \r with replace() doesn't make the parsing work, neither does split('\n') and join('\r').
I noticed that there is a difference when logging the string passed in the example and the string after replacing with \r
With string in example:
PID|1|16194|16194||Jones^BobADT^A04|20381|P|2.3||||NE
It's only printing the last segment apparently because of the \r characters
With my replacement method:
PID|||MRN12345^5^M11||APPLESEED^JOHN^A^III||19710101|M||C|1 CATALYZE STREET^^MADISON^WI^53005-1020|GL|(414)379-1PV1|1|I|2000^2012^01||||004777^GOOD^SIDNEY^J.|||SUR||||ADM|A0|
The entire string is printed, not just the last segment.
I'm not sure why there is a difference when printing them. Is there a difference between passing a literal string with \r character and "adding" \r to a string?

Doing this should work:
const lines = "A\nB\nC";
const result = lines.split("\n").join("\r");
console.log(result);
The confusion probably comes from the output: it looks like the replacement did nothing, because what gets printed appears to be just ABC.
However, if we check out the length of the string produced:
const lines = "A\nB\nC";
const result = lines.split("\n").join("\r");
console.log(result);
console.log(result.length);
Notice that it is 5 characters long, not 3. The \r is there. It's just that most output targets hide it, because a \r doesn't render as anything visible on its own.
It is a "carriage return", and only classic Mac OS (before OS X) used it as a newline character. Windows uses the combination \r\n to render a newline, and Linux (and macOS) uses \n.
If the parser wanted a literal backslash followed by r in the string, you'd need to use an escaped one (though this is almost certainly not what it expects):
const lines = "A\nB\nC";
const result = lines.split("\n").join("\\r");
console.log(result);
console.log(result.length);
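One more thing worth checking is whether the server actually sends \r\n rather than \n, because replacing only \n would then leave \r\r between segments. Below is a minimal sketch, assuming the fetched text may contain any mix of line endings. The helper name toCarriageReturns and the sample data are made up for illustration, and JSON.stringify is used only to make the \r characters visible when logging.

```javascript
// Hypothetical helper: turn CRLF, lone LF, or lone CR into a single CR.
function toCarriageReturns(text) {
  // Match \r\n first so a Windows line ending becomes one \r, not two.
  return text.replace(/\r\n|\n|\r/g, "\r");
}

// Made-up sample with mixed line endings.
const fetched = "MSH|a\r\nEVN|b\nPID|c";
const normalized = toCarriageReturns(fetched);

// JSON.stringify shows control characters as escape sequences.
console.log(JSON.stringify(normalized)); // "MSH|a\rEVN|b\rPID|c"
console.log(normalized.split("\r").length); // 3
```

Logging through JSON.stringify is a quick way to confirm what is really in the string before handing it to the parser.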

function replaceLfWithCr(text) {
return text.replace(/\n/g, '\r');
}

Related

Escaping backslash in a string containing backslash

I have a string containing I\u2019m (with backslashes not escaped)
var myString = 'I\\u2019m'; // I\u2019m
But then I need a function that interprets the backslash escapes in that string, so the function I'm looking for would return I'm
backslashString(myString); // I'm
I've tried using eval:
function backslashString(input){
input = input.replace(/'/g, "\\'"); // Replace ' with \' that's going to mess up eval
return eval(`'${input}'`);
}
But is there a proper way of doing it? I'm looking for a function that interprets the backslash escapes in a string, turning I\u2019m into I'm, and that also copes with a stray backslash (A lost \ backslash)
EDIT:
I did not ask what I meant from the start. This not only applies to unicode characters, but applies to all backslash characters including \n
The backslashes aren’t the real problem here - the real problem is the difference between code and data.
\uXXXX is JavaScript syntax to write the Unicode codepoint of a character in a text literal. It gets replaced with the actual character, when the JavaScript parser interprets this code.
Now you have a variable that contains the value I\u2019m already - that is data. This does not get parsed as JavaScript, so it does mean the literal characters I\u2019m, and not I’m. eval can “fix” that, because the missing step of interpreting this as code is simply what eval does.
If you do not want to use eval (and thereby invite all the potential risks that entails, if the input data is not completely under your control), then you can parse those numeric values from the string using regular expressions, and then use String.fromCharCode to create the actual Unicode character from the given code point:
var myString = 'I\\u2019m and I\\u2018m';
var myNewString = myString.replace(/\\u([0-9a-fA-F]{4})/g, function(m, n) {
// n holds the four hexadecimal digits captured after \u
return String.fromCharCode(parseInt(n, 16));
});
console.log(myNewString)
/\\u([0-9a-fA-F]{4})/g - regular expression to match this \uXXXX format (X = hexadecimal digit, so a-f must be allowed as well as 0-9), g modifier to replace all matches instead of stopping after the first.
parseInt(n, 16) - to convert the hexadecimal value to a decimal first, because String.fromCharCode wants the latter.
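Since the edit to the question says this should cover other escapes such as \n too, the regex approach can be stretched a little. This is only a sketch under assumptions: the function name unescapeBackslashes and the lookup table are invented here, and only \uXXXX, \n, \r, \t, \\, \' and \" are handled. Any other backslash (such as a stray one followed by a space) is left as it is.

```javascript
// Hypothetical lookup table for the single-character escapes handled here.
const SIMPLE_ESCAPES = { n: "\n", r: "\r", t: "\t", "\\": "\\", "'": "'", '"': '"' };

// Hypothetical helper: interpret a small set of backslash escapes without eval.
function unescapeBackslashes(input) {
  return input.replace(/\\(u([0-9a-fA-F]{4})|[nrt\\'"])/g, (match, body, hex) =>
    hex !== undefined
      ? String.fromCharCode(parseInt(hex, 16)) // \uXXXX
      : SIMPLE_ESCAPES[body]                   // \n, \t, \\ and friends
  );
}

console.log(unescapeBackslashes("I\\u2019m"));                     // I’m
console.log(JSON.stringify(unescapeBackslashes("a\\nb")));         // "a\nb"
console.log(unescapeBackslashes("A lost \\ backslash"));           // A lost \ backslash
```

Unlike eval, this never executes the input, so untrusted data can at worst produce an odd string.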
decodeURIComponent(JSON.parse('"I\\u2019m"'));
OR for multiple
'I\\\u2019m'.split('\\').join().replace(/,/g,'');
'I\u2019m'.split('\\').join().replace(/,/g,'');
Looks like there's no other way other than eval (JSON.parse doesn't like new lines in strings)
NOTE: The function would return false if it has a trailing backslash
function backslashString(input){
input = input.replace(/`/g, '\\`'); // Escape quotes for input to eval
try{
return eval('`'+input+'`');
}catch(e){ // Will return false if input has errors in backslashing
return false;
}
}

Line endings (also known as Newlines) in JS strings

It is well known that Unix-like systems use LF characters for newlines, whereas Windows uses CR+LF.
However, when I test this code from a local HTML file on my Windows PC, it seems that JS treats all newlines as LF. Is that a correct assumption?
var string = `
foo
bar
`;
// There should be only one blank line between foo and bar.
// \n - Works
// string = string.replace(/^(\s*\n){2,}/gm, '\n');
// \r\n - Doesn't work
string = string.replace(/^(\s*\r\n){2,}/gm, '\r\n');
alert(string);
// That is, it seems that JS treat all newlines as separated with
// `LF` instead of `CR+LF`?
I think I found an explanation.
You are using an ES6 Template Literal to construct your multi-line string.
According to the ECMAScript specs a
.. template literal component is interpreted as a sequence of Unicode
code points. The Template Value (TV) of a literal component is
described in terms of code unit values (SV, 11.8.4) contributed by the
various parts of the template literal component. As part of this
process, some Unicode code points within the template component are
interpreted as having a mathematical value (MV, 11.8.3). In
determining a TV, escape sequences are replaced by the UTF-16 code
unit(s) of the Unicode code point represented by the escape sequence.
The Template Raw Value (TRV) is similar to a Template Value with the
difference that in TRVs escape sequences are interpreted literally.
And below that, it is defined that:
The TRV of LineTerminatorSequence::<LF> is the code unit 0x000A (LINE
FEED).
The TRV of LineTerminatorSequence::<CR> is the code unit 0x000A (LINE FEED).
My interpretation here is, you always just get a line feed - regardless of the OS-specific new-line definitions when you use a template literal.
Finally, in JavaScript's regular expressions a
\n matches a line feed (U+000A).
which describes the observed behavior.
However, if you define a string literal '\r\n' or read text from a file stream, etc that contains OS-specific new-lines you have to deal with it.
Here are some tests that demonstrate the behavior of template literals:
`a
b`.split('')
.map(function (char) {
console.log(char.charCodeAt(0));
});
(String.raw`a
b`).split('')
.map(function (char) {
console.log(char.charCodeAt(0));
});
'a\r\nb'.split('')
.map(function (char) {
console.log(char.charCodeAt(0));
});
"a\
b".split('')
.map(function (char) {
console.log(char.charCodeAt(0));
});
Interpreting the results:
char(97) = a, char(98) = b
char(10) = \n, char(13) = \r
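For text that does not come from a template literal (a file, a textarea, a network response), one way to "deal with it" is to normalize the line endings first and only then collapse blank lines. A rough sketch, where the function name collapseBlankLines and the choice of LF as the internal line ending are my own assumptions:

```javascript
// Hypothetical helper: allow at most one blank line between lines of text,
// whatever line-ending style the input uses.
function collapseBlankLines(text) {
  // Step 1: turn CRLF and lone CR into LF.
  const normalized = text.replace(/\r\n?/g, "\n");
  // Step 2: a line break followed by two or more blank lines
  // (optionally containing spaces or tabs) becomes one blank line.
  return normalized.replace(/\n(?:[ \t]*\n){2,}/g, "\n\n");
}

console.log(JSON.stringify(collapseBlankLines("foo\r\n\r\n\r\n\r\nbar"))); // "foo\n\nbar"
console.log(JSON.stringify(collapseBlankLines("foo\n\nbar")));             // "foo\n\nbar"
```

If the output has to go back to a Windows-style file, the LF characters can be converted to \r\n as a last step.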
You could use the regular expression: /^\s*[\r\n]/gm
Code example:
let string = `
foo
bar
`;
string = string.replace(/^\s*[\r\n]/gm, '\r\n');
console.log(string);

How to insert a <CR> (Carriage Return) in a Javascript String?

I need to insert a Carriage Return in a String. As far as I know the \r should do it but here is the problem:
I wrote this in the browser console: 1\r2 then I get: 12 in return. Now I copy/paste it on Notepad++ https://imgur.com/a/pUW8p
There is a CR and a LF there. How I can add just a CR?
Note that you can replace the LF(\n) in notepad, save the file and the LFs are gone.
In ES6 template literals you can type the line break directly as written text; it ends up in the string as a line feed (\n).
In quoted strings you can use \n for a line feed. Note that a carriage return on its own is \r, not \n.
const str = `l
e`;
console.log(str);
const str2 = 'l\ne';
console.log(str2);
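To check that "\r" really inserts only a carriage return (and that any LF seen after pasting comes from the console or the editor), you can look at the character codes. A small sketch, with variable names chosen just for this example:

```javascript
// "\r" in a quoted string is a single carriage return, code 13.
const withCr = "1\r2";

// List the UTF-16 code of every character in the string.
const codes = Array.from(withCr, (ch) => ch.charCodeAt(0));
console.log(codes); // [ 49, 13, 50 ]

// The same character can be produced from its code.
console.log(String.fromCharCode(13) === "\r"); // true
console.log(withCr.includes("\n"));            // false
```

Code 10 (LF) does not appear anywhere, so the string itself holds exactly one CR between the two digits.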
Line breaks in strings are usually written like this...
\n
and you can use more than one to go down multiple lines. Strictly speaking, \n is a line feed (LF); a bare carriage return (CR) is written \r.
It's the equivalent of pressing the enter/return key after you type one line and you want to move to the next one.
For example,
var astring = "This is a \n test";
console.log(astring);
You would get:
This is a
test
Hope it answers your question,
Alexander B.

Parse JSON but preserve \n in strings

I have this JSON string:
{\"text\":\"Line 1\\nLine 2\",\"color\":\"black\"}
I can parse it when I do this:
pg = JSON.parse(myJSONString.replace(/\\/g, ""));
But when I access pg.text the value is:
Line 1nLine 2.
But I want the value to be exactly:
Line 1\nLine 2
The JSON string is valid in terms of the target program which interprets it as part of a larger command. It's Minecraft actually. Minecraft will render this as you would expect with Line 1 and Line 2 on separate lines.
But I'm making a editor that needs to read the \n back in as is. Which will be displayed in an html input field.
Just as some context here is the full command which contains some JSON code.
/summon zombie ~ ~1 ~ {HandItems:[{id:"minecraft:written_book",Count:1b,tag:{title:"",author:"",pages:["{\"text\":\"Line 1\\nLine 2\",\"color\":\"black\"}"]}},{}]}
If the JSON text is written as a JavaScript string literal, the literal itself already consumes one level of escaping: \" becomes " and \\n becomes the two characters \n. The resulting value is valid JSON, so you don't need replace() at all. (The pattern /\\[1]/g matches a backslash followed by the digit 1, which never occurs here, so it changes nothing.) Keep in mind that JSON.parse then turns \n into an actual line feed character.
var myString = '{\"text\":\"Line 1\\nLine 2\",\"color\":\"black\"}';
var parsed = JSON.parse(myString); // no replace() needed
console.log(parsed);
console.log(parsed.text); // "Line 1", a line feed, then "Line 2"
Your string is not valid JSON, and ideally you should fix the code that generates it, or contact the provider of it.
If the issue is that there is always one backslash too many, then you could do this:
// Need to escape the backslashes in this string literal to get the actual input:
var myJSONString = '{\\"text\\":\\"Line 1\\\\nLine 2\\",\\"color\\":\\"black\\"}';
console.log(myJSONString);
// Only replace backslashes that are not preceded by another:
var fixedJSON = myJSONString.replace(/([^\\])\\/g, "$1");
console.log(fixedJSON);
var pg = JSON.parse(fixedJSON);
console.log(pg);
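Once the JSON parses, pg.text contains a real line feed, whereas the editor needs to show the two characters \n in an input field. One possible way to get that back, sketched here with made-up variable names, is to re-encode just that one value with JSON.stringify and strip the surrounding quotes. Be aware that this also re-escapes quotes and backslashes inside the text, which may or may not be what the editor should display.

```javascript
// Valid JSON text (the backslash is doubled only because this is a JS literal).
const jsonText = '{"text":"Line 1\\nLine 2","color":"black"}';
const page = JSON.parse(jsonText);

// page.text now contains a real line feed character.
console.log(page.text.includes("\n")); // true

// Re-encode the value and drop the surrounding double quotes
// to get the escaped form for display in an input field.
const editable = JSON.stringify(page.text).slice(1, -1);
console.log(editable); // Line 1\nLine 2

// Going the other way when the user is done editing:
const restored = JSON.parse('"' + editable + '"');
console.log(restored === page.text); // true
```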

What are the actual uses of ES6 Raw String Access?

What are the actual uses of String.raw Raw String Access introduced in ECMAScript 6?
// String.raw(callSite, ...substitutions)
function quux (strings, ...values) {
strings[0] === "foo\n"
strings[1] === "bar"
strings.raw[0] === "foo\\n"
strings.raw[1] === "bar"
values[0] === 42
}
quux `foo\n${ 42 }bar`
String.raw `foo\n${ 42 }bar` === "foo\\n42bar"
I went through the below docs.
http://es6-features.org/#RawStringAccess
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/template_strings
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/raw
http://www.2ality.com/2015/01/es6-strings.html
https://msdn.microsoft.com/en-us/library/dn889830(v=vs.94).aspx
The only thing that I understand is that it is used to get the raw string form of template strings and for debugging them.
When can this be used in real-world development? They call this a tag function. What does that mean?
What concrete use cases am I missing?
The best, and very nearly only, use case for String.raw I can think of is if you're trying to use something like Steven Levithan's XRegExp library that accepts text with significant backslashes. Using String.raw lets you write something semantically clear rather than having to think in terms of doubling your backslashes, just like you can in a regular expression literal in JavaScript itself.
For instance, suppose I'm doing maintenance on a site and I find this:
var isSingleUnicodeWord = /^\w+$/;
...which is meant to check if a string contains only "letters." Two problems: A) There are thousands of "word" characters across the realm of human language that \w doesn't recognize, because its definition is English-centric; and B) It includes _, which many (including the Unicode consortium) would argue is not a "letter."
So if we're using XRegExp on the site, since I know it supports \pL (\p for Unicode categories, and L for "letter"), I might quickly swap this in:
var isSingleUnicodeWord = XRegExp("^\pL+$"); // WRONG
Then I wonder why it didn't work, facepalm, and go back and escape that backslash, since it's being consumed by the string literal.
Easy enough in that simple regex, but in something complicated, remembering to double all those backslashes is a maintenance pain. (Just ask Java programmers trying to use Pattern.)
Enter String.raw:
let isSingleUnicodeWord = XRegExp(String.raw`^\pL+$`);
Example:
let isSingleUnicodeWord = XRegExp(String.raw`^\pL+$`); // L: Letter
console.log(isSingleUnicodeWord.test("Русский")); // true
console.log(isSingleUnicodeWord.test("日本語")); // true
console.log(isSingleUnicodeWord.test("العربية")); // true
console.log(isSingleUnicodeWord.test("foo bar")); // false
<script src="https://cdnjs.cloudflare.com/ajax/libs/xregexp/3.1.1/xregexp-all.min.js"></script>
Now I just kick back and write what I mean. I don't even really have to worry about ${...} constructs used in template literals to do substitution, because the odds of my wanting to apply a quantifier {...} to the end-of-line assertion ($) are...low. So I can happily use substitutions and still not worry about backslashes. Lovely.
Having said that, though, if I were doing it a lot, I'd probably want to write a function and use a tagged template instead of String.raw itself. But it's surprisingly awkward to do correctly:
// My one-time tag function
function xrex(strings, ...values) {
let raw = strings.raw;
let max = Math.max(raw.length, values.length);
let result = "";
for (let i = 0; i < max; ++i) {
if (i < raw.length) {
result += raw[i];
}
if (i < values.length) {
result += values[i];
}
}
console.log("Creating with:", result);
return XRegExp(result);
}
// Using it, with a couple of substitutions to prove to myself they work
let category = "L"; // L: Letter
let maybeEol = "$";
let isSingleUnicodeWord = xrex`^\p${category}+${maybeEol}`;
console.log(isSingleUnicodeWord.test("Русский")); // true
console.log(isSingleUnicodeWord.test("日本語")); // true
console.log(isSingleUnicodeWord.test("العربية")); // true
console.log(isSingleUnicodeWord.test("foo bar")); // false
<script src="https://cdnjs.cloudflare.com/ajax/libs/xregexp/3.1.1/xregexp-all.min.js"></script>
Maybe the hassle is worth it if you're using it in lots of places, but for a couple of quick ones, String.raw is the simpler option.
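The same backslash-doubling issue shows up with the built-in RegExp constructor, so the idea can be tried without XRegExp. This sketch assumes an engine that supports Unicode property escapes with the u flag (ES2018 and later); it uses \p{L} rather than XRegExp's \pL shorthand, and the variable names are only illustrative.

```javascript
// With a normal string literal every backslash has to be doubled.
const doubled = new RegExp("^\\p{L}+$", "u");

// With String.raw the pattern is written the way a regex literal would be.
const rawPattern = new RegExp(String.raw`^\p{L}+$`, "u");

console.log(doubled.source === rawPattern.source); // true
console.log(rawPattern.test("Русский")); // true
console.log(rawPattern.test("日本語"));   // true
console.log(rawPattern.test("foo bar")); // false
```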
First, a few things:
Template strings is the old name for template literals.
A tag is a function.
String.raw is a method.
String.raw `foo\n${ 42 }bar` is a tagged template literal.
Template literals are basically fancy strings.
Template literals can interpolate.
Template literals can be multi-line without using \.
String.raw is what lets you write the escape character \ without having to escape it.
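To make "a tag is a function" concrete, here is a tiny tag that just reports what it receives. The name inspect is invented for this example. The first argument holds the "cooked" string pieces (escapes interpreted), its .raw property holds the same pieces as typed in the source, and the remaining arguments are the substitution values.

```javascript
// Hypothetical tag function: returns its inputs so they can be inspected.
function inspect(strings, ...values) {
  return {
    cooked: [...strings],   // escapes such as \n already interpreted
    raw: [...strings.raw],  // text exactly as written in the source
    values,                 // results of the ${...} expressions
  };
}

const parts = inspect`foo\n${42}bar`;

console.log(parts.cooked[0] === "foo\n");  // true: ends with a real line feed
console.log(parts.raw[0] === "foo\\n");    // true: backslash followed by n
console.log(parts.values);                 // [ 42 ]
```

String.raw is simply a built-in tag that joins the raw pieces with the values.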
Try putting a string that contains a new-line character \n through a function that consumes newline character.
console.log("This\nis\nawesome"); // "This\nis\nawesome"
console.log(String.raw`This\nis\nawesome`); // "This\\nis\\nawesome"
If you are wondering, console.log is not one of them. But alert is. Try running these through http://learnharmony.org/ .
alert("This\nis\nawesome");
alert(String.raw`This\nis\nawesome`);
But wait, that's not the use of String.raw.
Possible uses of String.raw method:
To show string without interpretation of backslashed characters (\n, \t) etc.
To show code for the output. (As in example below)
To be used in regex without escaping \.
To print Windows directory/sub-directory locations without using \\ too much. (They use \, remember. Also, lol)
Here we can show the output and the code for it in a single alert window:
alert("I printed This\nis\nawesome with " + String.raw`This\nis\nawesome`);
Though, it would have been great if its main use could have been to get back the original string. Like:
var original = String.raw`This is awesome.`;
where original would have become: This\tis \tawesome.. This isn't the case sadly.
References:
http://exploringjs.com/es6/ch_template-literals.html
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/raw
Template strings can be useful in many situations which I will explain below. Considering this, the String.raw prevents escapes from being interpreted. This can be useful in any template string in which you want to contain the escape character but do not want to escape it. A simple example could be the following:
var templateWithBackslash = String.raw `someRegExp displayed in template /^\//`
There are a few things inside that are nice to note with template strings.
They can contain unescaped line breaks without problems.
They can contain "${}". Inside these curly braces the javascript is interpreted instead.
(Note: running these will output the result to your console [in browser dev tools])
Example using line breaks:
var myTemplate = `
<div class="myClass">
<pre>
My formatted text
with multiple lines
{
asdf: "and some pretty printed json"
}
</pre>
</div>
`
console.log(myTemplate)
If you wanted to do the above with a normal string in Javascript it would look like the following:
var myTemplate = '\n\
<div class="myClass">\n\
<pre>\n\
My formatted text\n\
with multiple lines\n\
{\n\
asdf: "and some pretty printed json"\n\
}\n\
</pre>\n\
</div>\n'
console.log(myTemplate)
You will notice the first probably looks much nicer (no need to escape line breaks).
For the second I will use the same template string but also insert some pretty-printed JSON.
var jsonObj = {asdf: "and some pretty printed json", deeper: {someDeep: "Some Deep Var"}}
var myTemplate = `
<div class="myClass">
<pre>
My formatted text
with multiple lines
${JSON.stringify(jsonObj, null, 2)}
</pre>
</div>
`
console.log(myTemplate)
In NodeJS it is extremely handy when it comes to filepath handling:
var fs=require('fs');
var s = String.raw`C:\Users\<username>\AppData\Roaming\SomeApp\someObject.json`;
var username = "bob"
s=s.replace("<username>",username)
fs.readFile(s,function(err,result){
if (err) throw err; // rethrow the error passed to the callback
console.log(JSON.parse(result))
})
It improves readability of filepaths on Windows. \ is also a fairly common separator, so I can definitely see why it would be useful in general. However it is pretty stupid how \ still escapes `... So ultimately:
String.raw`C:\Users\` //#==> C:\Users\`
console.log(String.raw`C:\Users\`) //#==> SyntaxError: Unexpected end of input.
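A possible workaround for the trailing-backslash problem is to supply the final \ through a substitution (or plain concatenation), since substitution values are inserted as they are. A small sketch with a made-up path:

```javascript
// Backslashes inside the template are kept as typed.
const dir = String.raw`C:\Users\bob`;
console.log(dir);        // C:\Users\bob
console.log(dir.length); // 12

// A trailing backslash cannot be typed directly before the closing backtick,
// but it can be passed in as a substitution value.
const withTrailing = String.raw`C:\Users\bob${"\\"}`;
console.log(withTrailing);                // C:\Users\bob\
console.log(withTrailing.endsWith("\\")); // true
```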
In addition to its use as a tag, String.raw is also useful in implementing new tag functions as a tool to do the interleaving that most people do with a weird loop. For example, compare:
function foo(strs, ...xs) {
let result = strs[0];
for (let i = 0; i < xs.length; ++i) {
result += useFoo(xs[i]) + strs[i + 1];
}
return result;
}
with
function foo(strs, ...xs) {
return String.raw({raw: strs}, ...xs.map(useFoo));
}
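To see that the two versions behave the same, here is a runnable comparison. useFoo is not defined in the answer, so a stand-in that upper-cases its argument is assumed here, and the two functions are renamed fooLoop and fooRaw so they can sit side by side.

```javascript
// Stand-in for the answer's useFoo (assumption: any value-to-string transform).
const useFoo = (value) => String(value).toUpperCase();

// Manual interleaving with a loop.
function fooLoop(strs, ...xs) {
  let result = strs[0];
  for (let i = 0; i < xs.length; ++i) {
    result += useFoo(xs[i]) + strs[i + 1];
  }
  return result;
}

// Same interleaving delegated to String.raw; passing the cooked strings
// as the raw property makes it join them without further processing.
function fooRaw(strs, ...xs) {
  return String.raw({ raw: strs }, ...xs.map(useFoo));
}

console.log(fooLoop`a${"x"}b${"y"}c`); // aXbYc
console.log(fooRaw`a${"x"}b${"y"}c`);  // aXbYc
```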
The Use
(Requisite knowledge: tstring §.)
Instead of:
console.log(`\\a\\b\\c\\n\\z\\x12\\xa9\\u1234\\u00A9\\u{1234}\\u{00A9}`);
you can write:
console.log(String.raw`\a\b\c\n\z\x12\xa9\u1234\u00A9\u{1234}\u{00A9}`);
"Escaping"
<\\u> is fine, yet <\u> needs "escaping", eg:
console.log(String.raw`abc${'\\u'}abc`);
Ditto for <\\x> and <\x>:
console.log(String.raw`abc${`\\x`}abc`);
Ditto for <\`> and <`>:
console.log(String.raw`abc${`\``}abc`);
Ditto for <\${> and <${>:
console.log(String.raw`abc${`$\{`}abc`);
Ditto for <\\1> (through <\\7>) and <\1>:
console.log(String.raw`abc${`\\1`}abc`);
Ditto for <\\> and a trailing <\>:
console.log(String.raw`abc${`\\`}`);
Nb
There's also a new "latex" string. Cf §.
I've found it to be useful for testing
my RegExps. Say I have a RegExp which
should match end-of-line comments because
I want to remove them. BUT, it must not
match source-code for a regexp like /\// .
If your code contains /\// it is not the
start of an EOL comment but a RegExp, as
per the rules of JavaScript syntax.
I can test whether my RegExp in variable patEOLC
matches /\// or not with:
String.raw`/\//` .match (patEOLC)
In other words it is a way to let my
code "see" code the way it exists in
source-code, not the way it exists
in memory after it has been read
into memory from source-code, with
all backslashes removed.
It is a way to "escape escaping" but
without having to do it separately
for every backslash in a string, but
for all of them at the same time.
It is a way to say that in a given
(back-quoted) string backslash
shall behave just like any other
character, it has no special
meaning or interpretation.
