Convert escaped unicode sequence to Emoji in JS - javascript

I have a string in JS as follows. I am having a hard time converting these surrogate pairs to emoji. Can someone help?
I have tried to get a solution online by searching almost everything that I could, but in vain.
var text = 'CONGRATS! Your task has been completed! Tell us how we did \\uD83D\\uDE4C \\uD83D\\uDC4D \\uD83D\\uDC4E'
This is a node.js code. Is there any easy way to convert these codes to emojis without using an external helper utility?
EDIT:
I updated my code and the regex as follows:
var text = 'CONGRATS! Your task has been completed! Tell us how we did {2722} {1F44D} {1F44E}';
text.replace(/\{[^}]*\}/ig, (_, g) => String.fromCodePoint(`0x${g}`))
What am I doing wrong?

One option could be to replace all Unicode escape sequences with their HEX representations and use String.fromCharCode() to replace it with its associated character:
const text = 'CONGRATS! Your task has been completed! Tell us how we did \\uD83D\\uDE4C \\uD83D\\uDC4D \\uD83D\\uDC4E';
const res = text.replace(/\\u([0-9A-F]{4})/ig, (_, g) => String.fromCharCode(`0x${g}`));
console.log(res);
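A variation on the snippet above, as a minimal sketch: it parses the hex digits explicitly with parseInt(hex, 16) instead of relying on the implicit string-to-number coercion of `0x${g}`. The helper name decodeUnicodeEscapes is made up for illustration and is not part of the original answer.

```javascript
// Hypothetical helper (the name is illustrative only).
function decodeUnicodeEscapes(str) {
  // Each literal \uXXXX becomes one UTF-16 code unit; two adjacent
  // surrogate halves then combine into a single emoji in the result.
  return str.replace(/\\u([0-9a-fA-F]{4})/g, (_, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );
}

const escaped = 'Tell us how we did \\uD83D\\uDE4C \\uD83D\\uDC4D';
console.log(decodeUnicodeEscapes(escaped));
```

Text without escape sequences passes through unchanged, so the helper should be safe to apply to mixed input.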
As for your edit, your issue is with your regular expression. You can change it to be /\{([^}]*)\}/g, which means:
\{ - match an open curly brace.
([^}]*) - match and group the contents after the open curly brace which is not a closed curly brace }.
\} - match a closed curly brace.
g - match the expression globally (so all occurrences of the expression, not just the first)
The entire regular expression will match {CONTENTS}, whereas the group will contain only the contents between the two curly braces, so CONTENTS. The match is the first argument provided to the .replace() callback function whereas the group (g) is provided as the second argument and is what we use:
const text = 'CONGRATS! Your task has been completed! Tell us how we did {2722} {1F44D} {1F44E}';
const res = text.replace(/\{([^}]*)\}/g, (_, g) => String.fromCodePoint(`0x${g}`));
console.log(res);
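String.fromCodePoint() throws a RangeError for values above 0x10FFFF, so placeholders that do not hold a valid code point may need guarding. The following is only a sketch under that assumption; the function name expandBracedCodePoints and the choice to leave invalid placeholders untouched are illustrative, not part of the original answer.

```javascript
// Hypothetical helper: expands {HEX} placeholders into characters.
function expandBracedCodePoints(str) {
  // Only 1 to 6 hex digits are accepted; anything else is not matched.
  return str.replace(/\{([0-9a-fA-F]{1,6})\}/g, (match, hex) => {
    const codePoint = parseInt(hex, 16);
    // Leave the placeholder as-is if it is outside the Unicode range.
    return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
  });
}

console.log(expandBracedCodePoints('How did we do? {1F44D} {1F44E} {not hex}'));
```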

Related

Extract content of code which start with a curly bracket and ends with a curly bracket followed by closing parenthesis

I'm completely lost with regular expressions right now (lack of practice).
I'm writing a node script, which goes through a bunch of js files, each file calls a function, with one of the arguments being a json. The aim is to get all those json arguments and place them in one file. The problem I'm facing at the moment is the extraction of the argument part of the code, here is the function call part of that string:
$translateProvider.translations('de', {
WASTE_MANAGEMENT: 'Abfallmanagement',
WASTE_TYPE_LIST: 'Abfallarten',
WASTE_ENTRY_LIST: 'Abfalleinträge',
WASTE_TYPE: 'Abfallart',
TREATMENT_TYPE: 'Behandlungsart',
TREATMENT_TYPE_STATUS: 'Status Behandlungsart',
DUPLICATED_TREATMENT_TYPE: 'Doppelte Behandlungsart',
TREATMENT_TYPE_LIST: 'Behandlungsarten',
TREATMENT_TARGET_LIST: 'Ziele Behandlungsarten',
TREATMENT_TARGET_ADD: 'Ziel Behandlungsart hinzufügen',
SITE_TARGET: 'Gebäudeziel',
WASTE_TREATMENT_TYPES: 'Abfallbehandlungsarten',
WASTE_TREATMENT_TARGETS: '{{Abfallbehandlungsziele}}',
WASTE_TREATMENT_TYPES_LIST: '{{Abfallbehandlungsarten}}',
WASTE_TYPE_ADD: 'Abfallart hinzufügen',
UNIT_ADD: 'Einheit hinzufügen'
})
So I'm trying to write a regular expression which matches the segment of the js code, which starts with "'de', {" and ends with "})", while it can have any characters between(single/double curly brackets included).
I tried something like this \'de'\s*,\s*{([^}]*)})\ , but that doesn't work. The furthest I got was with this \'de'\s*,\s*{([^})]*)}\ , but this ends at the first closing curly bracket within the json, which is not what I want.
It seems that I have completely forgotten even the regular expression concepts I used to understand.
Any help is much appreciated.
You did not state the desired output. Here is a solution that parses the text, and creates an array of arrays. You can easily transform that to a desired output.
const input = `$translateProvider.translations('de', {
WASTE_MANAGEMENT: 'Abfallmanagement',
WASTE_TYPE_LIST: 'Abfallarten',
WASTE_ENTRY_LIST: 'Abfalleinträge',
WASTE_TYPE: 'Abfallart',
TREATMENT_TYPE: 'Behandlungsart',
TREATMENT_TYPE_STATUS: 'Status Behandlungsart',
DUPLICATED_TREATMENT_TYPE: 'Doppelte Behandlungsart',
TREATMENT_TYPE_LIST: 'Behandlungsarten',
TREATMENT_TARGET_LIST: 'Ziele Behandlungsarten',
TREATMENT_TARGET_ADD: 'Ziel Behandlungsart hinzufügen',
SITE_TARGET: 'Gebäudeziel',
WASTE_TREATMENT_TYPES: 'Abfallbehandlungsarten',
WASTE_TREATMENT_TARGETS: '{{Abfallbehandlungsziele}}',
WASTE_TREATMENT_TYPES_LIST: '{{Abfallbehandlungsarten}}',
WASTE_TYPE_ADD: 'Abfallart hinzufügen',
UNIT_ADD: 'Einheit hinzufügen'
})`;
const regex1 = /\.translations\([^{]*\{\s+(.*?)\s*\}\)/s;
const regex2 = /',[\r\n]+\s*/;
const regex3 = /: +'/;
let result = [];
let m = input.match(regex1);
if(m) {
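// The captured text still ends with the closing quote of the last value, so strip it before splitting
result = m[1].replace(/'$/, '').split(regex2).map(line => line.split(regex3));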
}
console.log(result);
Explanation of regex1:
\.translations\( -- literal .translations(
[^{]* -- anything not {
\{\s+ -- { and all whitespace
(.*?) -- capture group 1 with non-greedy scan up to:
\s*\}\) -- whitespace, followed by })
s flag to make . match newlines
Explanation of regex2:
',[\r\n]+\s* -- ',, followed by newlines and space (to split lines)
Explanation of regex3:
: +' -- literal : ' (to split key/value)
Learn more about regex: https://twiki.org/cgi-bin/view/Codev/TWikiPresentation2018x10x14Regex
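Since the stated goal is to collect the arguments in one file, the array of [key, value] pairs can be turned into an object and serialized. A minimal sketch, assuming the pairs have already been extracted; the two sample pairs below are shortened stand-ins for the real result.

```javascript
// Sample pairs standing in for the parsed result (assumed shape: [key, value]).
const pairs = [
  ['WASTE_TYPE', 'Abfallart'],
  ['UNIT_ADD', 'Einheit hinzufügen']
];

// Object.fromEntries builds { key: value, ... } from the pairs.
const translations = Object.fromEntries(pairs);

// Pretty-printed JSON, ready to be written to a file.
const json = JSON.stringify(translations, null, 2);
console.log(json);
```

Objects from several files could then be merged with Object.assign() or the spread syntax before writing the output.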
This can be done with lookahead, lookbehind, and boundary-type assertions:
/(?<=^\$translateProvider\.translations\('de', {)[\s\S]*(?=}\)$)/
(?<=^\$translateProvider\.translations\('de', {) is a lookbehind assertion that checks for '$translateProvider.translations('de', {' at the beginning of the string.
(?=}\)$) is a lookahead assertion that checks for '})' at the end of the string.
[\s\S]* is a character class that matches any sequence of space and non-space characters between the two assertions.
Here is the regex101 link for you to test
Hope this helps.
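As a rough usage sketch of the lookaround pattern above: the sample string is shortened, and the variable names are illustrative. Note that ^ and $ are used without the m flag, so this assumes the string starts with the function call and ends with "})".

```javascript
// Shortened sample input; the real file content is assumed to have this shape.
const source = "$translateProvider.translations('de', {\n  A: 'x',\n  B: '{{y}}'\n})";

// Lookbehind anchors the start, lookahead anchors the end, [\s\S]* takes the rest.
const bodyRegex = /(?<=^\$translateProvider\.translations\('de', {)[\s\S]*(?=}\)$)/;

const found = source.match(bodyRegex);
// Trim the line breaks left over around the object body.
const body = found ? found[0].trim() : null;
console.log(body);
```

Because [\s\S]* is greedy and the lookahead is tied to the end of the string, the inner {{...}} braces do not end the match early.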

How to delete brackets after a special letter in regex

Hi, I am having a problem trying to remove these square brackets.
I figured out how to find square brackets, but I need to match them only when they are preceded by #, like this.
By the way, I am using .replace in JavaScript to remove them, in case that helps.
#[john_doe]
The result must be #john_doe.
I don't want to remove other brackets, such as these:
[something written here]
Here is the link of the regex
You need a regular expression replace solution like
text = text.replace(/#\[([^\][]*)]/g, "#$1")
See the regex demo.
Pattern details
#\[ - a #[ text
([^\][]*) - Group 1 ($1): any zero or more chars other than [ and ] (the ] is special inside character classes (in the ECMAScript regex standard, even at the start position) and needs escaping)
] - a ] char.
See the JavaScript demo:
let text = '#[john_doe] and [something written here]';
text = text.replace(/#\[([^\][]*)]/g, "#$1");
console.log(text);
You can use the regular expression /#\[([^\]]*)\]/g:
const texts = ['#[john_doe]', '[something written here]']
texts.forEach(t => {
// Match # followed by a bracketed group, and keep only the # and the group's contents
const result = t.replace(/#\[([^\]]*)\]/g, '#$1')
console.log(result)
})
let name = "#[something_here]"
// The g flag removes both brackets; note that this strips every bracket in the string
name = name.replace(/[\[\]]/g, "")

Using regex to place dashes at specific indexes of a string containing numbers and letters

So the formatted code needs to look like this:
"US-XXX-12-12345"
So that's two letters, followed by 3 letters/numbers, followed by 2 numbers, then followed by 5 letters/numbers.
I was able to get it to work using only numbers with the following:
return testString
.replace(/(\d{2})(\d)/, '$1-$2')
.replace(/(\d{3})(\d)/, '$1-$2')
.replace(/(\d{2})(\d{2})/, '$1-$2')
.replace(/(\d{4})(\d{2})/, '$1-$2')
This returns:
"12-345-67-89012"
I tried switching the lowercase d's for uppercase ones (representing letters) and it adds all sorts of extra dashes and does not let me backspace. Any and all help is much appreciated!
EDIT:
Ended up solving it like this:
const clean = new RegExp(/[^a-zA-Z0-9]/, 'gi')
return testString
.replace(clean, '')
.replace(/([a-zA-Z0-9]{2})([a-zA-Z0-9])/, '$1-$2')
.replace(/(-[a-zA-Z0-9]{3})([a-zA-Z0-9])/, '$1-$2')
.replace(/(-[a-zA-Z0-9]{3})(-[a-zA-Z0-9]{2})([a-zA-Z0-9])/, '$1$2-$3')
Thanks to all who tried to help, I really appreciate it <3
I would use a function like this:
function mask(string, model){
let i = 0;
return model.replace(/#/g, () => string[i++] || "");
}
use:
mask("USXXX1212345", "##-###-##-#####") => 'US-XXX-12-12345'
the first parameter of the function is the string you want to format, and the next parameter is how it will be formatted
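With incomplete input, the function above keeps emitting the remaining separators from the model. If that matters for as-you-type formatting, one possible variation stops as soon as the input runs out. This is only a sketch; maskPartial is a hypothetical name, not part of the original answer.

```javascript
// Hypothetical variation of mask() that stops when the input is exhausted.
function maskPartial(input, model) {
  let i = 0;
  let out = '';
  for (const ch of model) {
    // Nothing left to place: stop before adding further separators.
    if (i >= input.length) break;
    out += ch === '#' ? input[i++] : ch;
  }
  return out;
}

console.log(maskPartial('USX', '##-###-##-#####'));
console.log(maskPartial('USXXX1212345', '##-###-##-#####'));
```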
This isn't quite what you asked for but regex probably isn't the best solution to this question. A better option would probably be to use something like react-input-mask because you mentioned using this in real time in a React form.
import { useState } from "react";
import InputMask from "react-input-mask";
const Example = () => {
const [inputText, setInputText] = useState("");
return (
<InputMask
mask="aa-***-99-*****"
value={inputText}
onChange={(e) => setInputText(e.target.value)}
/>
);
};
And here's a CodeSandbox example because I can't for the life of me embed a functional example in a SO answer: https://codesandbox.io/s/react-input-mask-example-554lcn?file=/src/App.js
If the string is known to have the correct pattern, you can replace matches of the following regular expression with a hyphen.
(?<=^.{2})|(?<=^.{5})|(?<=^.{7})
If lowercase letters are permitted the case-indifferent flag (i) needs to be set.
This expression contains an alternation comprised of three positive lookbehind expressions. It reads, "match the position following the second, fifth or (|) seventh character of the string". Think of (?<=^.{2}) as matching the location between the second and third character of the string. Because it does not match any characters, it is referred to as a zero-width match.
Demo
If the string is not known to have the correct pattern, I suggest the string be first matched against the following regular expression to confirm its pattern is correct.
^[A-Z]{2}[A-Z\d]{3}\d{2}[A-Z\d]{5}$
Demo
This regular expression (with the case-insensitive flag set) can be broken down as follows.
^ # match the beginning of the string
[A-Z]{2} # match two letters
[A-Z\d]{3} # match three letters or digits
\d{2} # match two digits
[A-Z\d]{5} # match five letters or digits
$ # match end of string
The first regular expression can be modified to also confirm that the string has the correct pattern but it complicates the expression considerably. If you would like to see that please let me know.
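The two expressions can be combined in JavaScript roughly as follows: validate first, then insert hyphens at the zero-width positions. This is a sketch only; formatCode and the decision to return null for invalid input are assumptions made for illustration.

```javascript
// Shape check: 2 letters, 3 letters/digits, 2 digits, 5 letters/digits.
const shape = /^[A-Z]{2}[A-Z\d]{3}\d{2}[A-Z\d]{5}$/i;
// Zero-width positions after the 2nd, 5th and 7th character.
const gaps = /(?<=^.{2})|(?<=^.{5})|(?<=^.{7})/g;

// Hypothetical helper: returns the formatted code, or null if the shape is wrong.
function formatCode(raw) {
  return shape.test(raw) ? raw.replace(gaps, '-') : null;
}

console.log(formatCode('USXXX1212345'));
console.log(formatCode('12345'));
```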
Here's a solution that does what you want and works on partial input.
It tolerates strings where the dashes are already in the right spot.
Invalid input is left as is.
const rx = /^(?:(\d{1,2})(?:-?([A-Za-z]{1,3})(?:-?(\d{1,2})(?:-?(\d{1,5}))?)?)?)?/
function addDashes(x) {
return x.replace(rx, function(...args) {
const captures = args.slice(1, 5) // get rid of the cruft
return captures.filter(x => x !== undefined).join("-")
})
}
const test = (x) => console.log(x, addDashes(x))
test("")
test("invalid123")
test("1")
test("12a")
test("12-a")
test("12abc")
test("12-abc")
test("12abc12")
test("12-abc12")
test("12-abc-12")
test("12abc1254")
test("12-abc1254")
test("12abc-1254")
test("12-abc-1254")
test("12abc12543")
test("12-abc12-543")
test("12abc1254321")
The RegExp itself is a bit complex, if you want to see how I built it you can have a look at this flems sandbox, where you can tweak it to your liking.

How to include a variable and exclude numbers[0-9] and letters[a-zA-Z] in RegExp?

I have a code that generates a random letter based on the word and I have tried to create a RegExp code to turn all the letters from the word to '_' except the randomly generated letter from the word.
const word = "Apple is tasty"
const randomCharacter = word[Math.floor(Math.random() * word.length)]
regex = new RegExp(/[^${randomCharacter}&\/\\#,+()$~%.'":;*?<>{}\s]/gi)
hint = word.replace(regex,'_')
I want to change all the letters to '_' except the randomly generated word. The above code for some reason does not work and shows the result: A___e __ ta_t_ and I'm not able to figure out what to do.
The final result I want is something like this: A____ __ _a___
Is there a way with regex to change all the alphabets and numbers '/[^a-zA-Z0-9]/g' to '_' except the randomly generated letter?
I'm listing all the expressions I want to include on my above code because I'm not able to figure out a way to do include and exclude at the same time using the variable with regex.
You can't do string interpolation inside of a RegExp literal (/.../). Meaning your placeholder ${randomCharacter} will not evaluate to its value in the template, but is instead interpreted literally as the string "${randomCharacter}".
If you want to use template literals, initialize your regex variable with a RegExp constructor instead, like:
const regex = new RegExp(`[^${randomCharacter}&\\/\\\#,+()$~%.'":;*?<>{}\\s]`, "gi");
See the MDN RegExp documentation for an explanation on the differences between the literal notation and constructor function, most notably:
The constructor of the regular expression object [...] results in runtime compilation of the regular expression. Use the constructor function when [...] you don't know the pattern and obtain it from another source, such as user input.
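One caveat when interpolating a character into a character class: if the random character happens to be ], \, ^ or -, it changes the meaning of the class. A minimal sketch that escapes it first and keeps only that letter and whitespace; the helper names escapeForCharClass and makeHint are hypothetical.

```javascript
// Escape the few characters that are special inside a character class.
function escapeForCharClass(ch) {
  return ch.replace(/[\\\]^-]/g, '\\$&');
}

// Hypothetical helper: replace everything except `keep` and whitespace with "_".
function makeHint(word, keep) {
  const regex = new RegExp(`[^${escapeForCharClass(keep)}\\s]`, 'gi');
  return word.replace(regex, '_');
}

console.log(makeHint('Apple is tasty', 'a'));
```

Further characters to preserve (punctuation, for example) could be appended inside the class in the same way.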
/(?:[^A\s])/
test it on regex101
Just replace A in [^A\s] with the character that you want to omit from the replacement.
demo:
const word = "Apple is tasty";
const randomCharacter = 'a';//word[Math.floor(Math.random() * word.length)];
const regex = new RegExp('(?:[^' + randomCharacter + '\\s])', 'gi');
const hint = word.replaceAll(regex, '_');
console.log(hint)

RegExp replacement with variable and anchor

I've done some research, like 'How do you use a variable in a regular expression?' but no luck.
Here is the given input
const input = 'abc $apple 123 apple $banana'
Each variable has its own value.
Currently I'm able to query all variables from the given string using
const variables = input.match(/([$]\w+)/g)
Replacement through looping the variables array with the following codes is not successful
const r = `/(\$${variable})/`;
const target = new RegExp(r, 'gi');
input.replace(target, value);
However, without using the variable, it will be executed,
const target = new RegExp(/(\$apple)/, 'gi');
input.replace(target, value);
I also changed the variable flag from $ to % or #, and it works with the following codes,
// const target = new RegExp(`%{variable}`, 'gi');
// const target = new RegExp(`#{variable}`, 'gi');
input.replace(target, value);
How to match the $ symbol with variable in this case?
If I understand correctly, use (\\$${variable}).
See the demo below. With only one backslash, the template literal consumes the escape (\$ becomes $), so the resulting RegExp is /($apple)/gi, and an unescaped $ in a regex asserts the end of the string.
So the solution is to add another backslash.
Like below demo:
const input = 'abc $apple 123 apple $banana'
let variable = 'apple'
let value = '#test#'
const r = `(\\$${variable})`;
const target = new RegExp(r, 'gi');
console.log('template literals', r, `(\$${variable})`)
console.log('regex:', target, new RegExp(`(\$${variable})`, 'gi'))
console.log(input.replace(target, value))
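If the variable name may itself contain regex metacharacters, it can be escaped before it is interpolated. The sketch below assumes exactly that situation; replaceVariable is a hypothetical helper, and the replacement is passed as a function so that any $ sequences in the value are inserted literally.

```javascript
// Hypothetical helper: replaces every "$name" occurrence with `value`.
function replaceVariable(input, name, value) {
  // Escape regex metacharacters in the name before building the pattern.
  const escaped = name.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // "\\$" in the template literal yields "\$" in the pattern: a literal dollar sign.
  const target = new RegExp(`\\$${escaped}`, 'gi');
  // A function replacement avoids special handling of "$&", "$1", etc. in value.
  return input.replace(target, () => value);
}

console.log(replaceVariable('abc $apple 123 apple $banana', 'apple', '#test#'));
```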
I have been using regular expressions just about everyday for almost a year now. I'll post my thoughts and just say:
Regular Expressions are most useful at finding parts of a text or data file.
If there is a text-file that contains the word "apple" or some derivative there-in, regular-expressions can be a great tool. I use them everyday for parsing HTML content, as I write foreign-news translations (based in HTML).
I believe the code that was posted was in JavaScript (because I saw the replace(//,"gi") function, which I know is used in that scripting language). I use Java's java.util.regex package myself, and the syntax is just slightly different.
If all you want to do is put a "place-holder" inside of a String - this code could work, I guess - but again, understanding why "regular-expressions" are necessary seems like the real question. In this example, I have used the ampersand ('&') as the placeholder - since I know for a fact it is not one of the "reserved key words" used by (most, but not necessarily all of) the Regular Expression Compiler and Processor.
var s1 = "&VAR1"; // Un-used variable - leaving it here for show!
var myString = "An example text-string with &VAR1, a particular kind of fruit.";
myString = myString.replace(/&VAR1/gi, "apple"); // replace() returns a new string, so assign the result
If you want a great way to practice with Regular-Expressions, go to this web-site and play around with them:
https://regexr.com/
Here are the rules for "reserved key symbols" of RegEx Patterns (Copied from that Site):
The following characters have special meaning, and should be preceded by a \ (backslash) to represent a literal character:
+*?^$.[]{}()|/
Within a character set, only \, -, and ] need to be escaped.
Also, sort of "most importantly" - Regular Expressions are "compiled" in Java. I'm not exactly sure about how they work in Java-Script - but there is no such concept as a "Variable" in the Compiled-Expression Part of a Regular Expression - just in the text and data it analyzes. What that means is - if you want to change what you are searching for in a particular piece of Text or Data in Java, you must re-compile your expression using:
Pattern p = Pattern.compile(regExString, flags);
There is not an easy way to "dynamically change" particular values of text in the expression. The amount of complexity it would add would be astronomical, and the value, minimal. Instead, just compile another expression in your code, and search again. Another option is to better understand things like .* .+ .*? .+? and even (.*) (.+) (.*?) (.+?) so that things that change, do change, and things that don't change, won't!
For instance if you used this pattern to match different-variables:
input.replace(/&VAR.*VAREND/gi, "apple");
All of your variables could be identified by the re-used pattern: "&VAR-stuff-VAREND" but this is just one of millions of ways to change your idea - skin the cat.
Using a replace callback function you can avoid building a separate regex for each variable and replace them all at once:
const input = 'abc $apple 123 apple $banana'
const vars = {
apple: 'Apfel',
banana: 'Banane'
}
let result = input.replace(/\$(\w+)/g, (_, v) => vars[v])
console.log(result)
Note that this won't work if apple and banana are local variables, but using locals for this is a bad idea anyway.
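With the callback above, a placeholder that has no entry in the lookup is replaced by the string "undefined". One way to handle that, as a sketch: keep unknown placeholders unchanged. The Map and the name fillTemplate are assumptions for illustration.

```javascript
// Lookup table for known variables (a Map avoids inherited object keys).
const lookup = new Map([
  ['apple', 'Apfel'],
  ['banana', 'Banane']
]);

// Hypothetical helper: replaces known $variables and leaves unknown ones as-is.
function fillTemplate(input) {
  return input.replace(/\$(\w+)/g, (match, name) =>
    lookup.has(name) ? lookup.get(name) : match
  );
}

console.log(fillTemplate('abc $apple 123 apple $banana $cherry'));
```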
