I need to split a string of text into its component words, so I'm using a regex to split it on the spaces (in a TypeScript file, by the way).
splitIntoWords(text: string) : Array<string> {
const separator = ' ';
const words = text.split(new RegExp(separator, 'g'));
return words;
}
This mostly works, but I've noticed that I regularly get words in the array that still contain spaces. If I paste the text into the Chrome console as a literal and call split(' ') on it, I get the correct number of words, but when I use the variable (even in the console) it invariably fails in some cases. I can't work out what the difference is. This is an example of my text:
"Le coronavirus en France : la décrue se poursuit en réanimation, la reprise économique au cœur des préoccupations. La mise en œuvre du plan de déconfinement élaboré par le gouvernement doit encore faire l’objet, jeudi, d’un « travail de concertation et d’adaptation aux réalités de terrain » avec les responsables et les élus locaux."
The regex never manages to split the substring "économique au" into two components, for instance. Does anyone know why this is happening?
It sounds like the whitespace is occasionally not a plain space (U+0020) but some other whitespace character, such as a non-breaking space. You can split on all whitespace by using \s for the separator instead, which matches any whitespace character, including spaces, tabs, and non-breaking spaces.
const text = "Le coronavirus en France : la décrue se poursuit en réanimation, la reprise économique au cœur des préoccupations. La mise en œuvre du plan de déconfinement élaboré par le gouvernement doit encore faire l’objet, jeudi, d’un « travail de concertation et d’adaptation aux réalités de terrain » avec les responsables et les élus locaux.";
const words = text.split(/\s+/); // \s+ treats a run of whitespace as a single separator
console.log(words);
Another option would be to use match instead of split, and match non-whitespace characters.
const text = "Le coronavirus en France : la décrue se poursuit en réanimation, la reprise économique au cœur des préoccupations. La mise en œuvre du plan de déconfinement élaboré par le gouvernement doit encore faire l’objet, jeudi, d’un « travail de concertation et d’adaptation aux réalités de terrain » avec les responsables et les élus locaux.";
const words = text.match(/\S+/g);
console.log(words);
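To check which whitespace characters a string actually contains, you can list their code points. This is only a diagnostic sketch: the sample string below is hypothetical and assumes the gap between "économique" and "au" is a non-breaking space (U+00A0), which the question does not confirm.

```javascript
// Hypothetical sample: the gap between "économique" and "au" is assumed
// to be U+00A0 (non-breaking space) rather than U+0020 (plain space).
const sample = "reprise économique\u00A0au cœur";

// Splitting on a literal space does not split at the non-breaking space.
const bySpace = sample.split(' ');

// \s matches U+00A0 as well, so every word is separated.
const byWhitespace = sample.split(/\s+/);

// List the code point of every whitespace character in the string.
const whitespaceCodes = [...sample]
  .filter(ch => /\s/.test(ch))
  .map(ch => 'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'));

console.log(bySpace.length);      // 3
console.log(byWhitespace.length); // 4
console.log(whitespaceCodes);     // [ 'U+0020', 'U+00A0', 'U+0020' ]
```

Running the same code-point listing on the real variable should show whether anything other than U+0020 is present.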
I have this string that I receive from a get request:
Rekryteringstest för anställning
Det här är rekryteringstestet och samtidigt den sida som är data till programmet som ska skrivas
Uppgiften är ganska generellt skriven för att passa både för de som löser den i t ex Java och de som löser den som t ex en webbsida.
Skriv en lösning som:
1. Öppnar ett fönster (om inte resultatet visas i t ex webbläsare)
2. Laddar webbadresser till bilder med tillhörande kommentar (längst ner på den här sidan, nya bilder varje gång sidan laddas!)
3. Laddar och visar bilderna med tillhörande kommentar
4. Laddar om data (från den här sidan!) automatiskt var 30:e sekund, vid omladdning kan gamla bilder tas bort
5. Har en knapp för att manuellt trigga omladdning
6. Visar någon form av status när data laddas
7. Har en knapp för att avsluta applikationen
8. Har en 'Om'-dialog som visar kontaktinformation till dig
9. Lösningen ska vara enkel att testköra och om applicerbart EN körbar fil
A. Skicka in lösningen inklusive all kod till Bouvet
Hur applikationen ser ut är inte lika viktigt som hur applikationen
med tillhörande unit-test är skriven och fungerar.
Data:
https://images.unsplash.com/photo-1514125067037-8e669dd37638?ixlib=rb-0.3.5&ixid=eyJhcHBfaWQiOjEyMDd9&s=1e2adb26fb5dc49fc14efd7f6aeca128&auto=format&fit=crop&w=1650&q=80 Mer publik
---------- END OF THE RESPONSE STRING ------------
Every time I make a request, the https link and the text after the link change.
How can I easily get only these values in this big string?
I have tried this
let splittedArray = response.data.split( "Data:" );
And then I get this
<URL kommentar>
http://3.bp.blogspot.com/-_gbAWeYsKP4/T899GpY3CSI/AAAAAAAAACw/du8qLqu4xEo/s1600/empty.jpg Lådan
https://images.unsplash.com/photo-1514125067037-8e669dd37638?ixlib=rb-0.3.5&ixid=eyJhcHBfaWQiOjEyMDd9&s=1e2adb26fb5dc49fc14efd7f6aeca128&auto=format&fit=crop&w=1650&q=80 Mer publik
for example.
From here, I would like to separate each link from the text that follows it so I can use them easily. At the moment I cannot call split again, because the result (the last part) is in an array.
As per the clarifications in the comments, let's start with some example data:
let splittedArray = [
"part to be discarded",
// A double-quoted string cannot contain a raw line break, so the two lines are joined with "\n".
"<URL kommentar> http://3.bp.blogspot.com/-_gbAWeYsKP4/T899GpY3CSI/AAAAAAAAACw/du8qLqu4xEo/s1600/empty.jpg Lådan\n" +
"https://images.unsplash.com/photo-1514125067037-8e669dd37638?ixlib=rb-0.3.5&ixid=eyJhcHBfaWQiOjEyMDd9&s=1e2adb26fb5dc49fc14efd7f6aeca128&auto=format&fit=crop&w=1650&q=80 Mer publik"
];
You can't call split directly on the variable splittedArray, because it is an array rather than a string.
If you want to do further manipulation on the second part (the string that actually contains the links), you need to get that part by referring to it as splittedArray[1].
Then you can split it on whitespace and keep the pieces that start with 'http'.
splittedArray[1].split(/\s+/)
let splittedArray = [
"part to be discarded",
"<URL kommentar> http://3.bp.blogspot.com/-_gbAWeYsKP4/T899GpY3CSI/AAAAAAAAACw/du8qLqu4xEo/s1600/empty.jpg Lådan \
https://images.unsplash.com/photo-1514125067037-8e669dd37638?ixlib=rb-0.3.5&ixid=eyJhcHBfaWQiOjEyMDd9&s=1e2adb26fb5dc49fc14efd7f6aeca128&auto=format&fit=crop&w=1650&q=80 Mer publik"
];
let splittedSecondPart = splittedArray[1].split(/\s+/);
let filteredByHttp = splittedSecondPart.filter(x => x.startsWith('http'));
console.log(filteredByHttp);
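If you also need the comment that follows each link, one option is to match whole lines instead of filtering words. The sketch below is an assumption-laden example rather than a drop-in solution: responseData is a hypothetical, shortened stand-in for response.data, the example.com URLs are placeholders, and it assumes each image sits on its own line as "URL comment".

```javascript
// Hypothetical, shortened stand-in for response.data (placeholder URLs).
const responseData = [
  "Rekryteringstest för anställning",
  "Data:",
  "<URL kommentar>",
  "http://example.com/empty.jpg Lådan",
  "https://example.com/photo.jpg?w=1650&q=80 Mer publik"
].join("\n");

// Keep only the text after the "Data:" marker.
const dataPart = responseData.split("Data:")[1];

// With the m flag, ^ and $ match at line boundaries, so each line that
// starts with a URL yields one match: group 1 is the URL, group 2 the comment.
const images = [...dataPart.matchAll(/^(https?:\/\/\S+)[ \t]+(.*)$/gm)]
  .map(([, url, comment]) => ({ url, comment: comment.trim() }));

console.log(images);
// [ { url: 'http://example.com/empty.jpg', comment: 'Lådan' },
//   { url: 'https://example.com/photo.jpg?w=1650&q=80', comment: 'Mer publik' } ]
```

Each entry can then be used as images[0].url and images[0].comment. Lines that do not start with http, such as the "<URL kommentar>" header, are skipped.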
Expected Input/Output
Input: Longines, retailed by Barth, Zurich, ref. 22127, movement no. 5770083,
Desired Output: 5770083
Only the digits. From this I will build: {"Movement Number": 5770083}
I believe I will need to run multiple regexes against each string, as I need to know the following:
Which title belongs to which value, i.e. movement no. = 5770083, etc.
Multiple different languages will be used for the same title, for example:
Movement number variations:
Movement no.
mouvement signés.Numérotée
no
MVT
jewels #
Werk-Nr.
Current regex: /movement no. ([^\s]+)/
With the above regex, the match also picks up the trailing comma.
It is also case insensitive.
Test String
Longines. A very fine and rare stainless steel water-resistant
chronograph wristwatch with black dial and original box\nSigned
Longines, retailed by Barth, Zurich, ref. 22127, movement no. 5770083,
case no. 46, circa 1941\nCal. 13 ZN nickel-finished lever movement, 17
jewels, the black dial with Arabic numerals, outer railway five minute
divisions and tachymetre scale, two subsidiary dials indicating
constant seconds and 30 minutes register, in large circular
water-resistant-type case with flat bezel, downturned lugs, screw
back, two round chronograph buttons in the band, case and movement
signed by maker, dial signed by maker and retailer\n37 mm. diam.
Test String French
MONTRE BRACELET D'HOMME CHRONOGRAPHE EN OR, PAR LONGINES\n\nDe forme
ronde, le cadran noir à chiffres arabes, cadran auxiliaire pour les
secondes à neuf heures et totalisateur de minutes à trois heures,
mouvement mécanique 13 Z N, vers 1960, poids brut: 44.49 gr., monture
en or jaune 18K (750)\n\nCadran Longines, mouvement no. 3872616, fond
de boîte no. 5872616\nVeuillez noter que les bracelets de montre
pouvant être en cuirs exotiques provenant d'espèces protégées, tels le
crocodile, ils ne sont pas vendus avec les montre même s'ils sont
exposés avec celles-ci. Christie's devra retirer et conserver ces
bracelets avant leur collecte par les acheteur
You can use
\b((?:Movement|mouvement) no\.|mouvement signés\.Numérotée|no|MVT|jewels #|Werk-Nr\.) (\d+)
https://regex101.com/r/thL0wt/1
Start at a word boundary, then inside a capturing group, alternate between all the different possible phrases you want before a number - then, match a space, and capture numeric characters in another group. Your desired result will be in the first and second capturing groups.
const input = `Longines. A very fine and rare stainless steel water-resistant chronograph wristwatch with black dial and original box\nSigned Longines, retailed by Barth, Zurich, ref. 22127, movement no. 5770083, case no. 46, circa 1941\nCal. 13 ZN nickel-finished lever movement, 17 jewels, the black dial with Arabic numerals, outer railway five minute divisions and tachymetre scale, two subsidiary dials indicating constant seconds and 30 minutes register, in large circular water-resistant-type case with flat bezel, downturned lugs, screw back, two round chronograph buttons in the band, case and movement signed by maker, dial signed by maker and retailer\n37 mm. diam.
MONTRE BRACELET D'HOMME CHRONOGRAPHE EN OR, PAR LONGINES\n\nDe forme ronde, le cadran noir à chiffres arabes, cadran auxiliaire pour les secondes à neuf heures et totalisateur de minutes à trois heures, mouvement mécanique 13 Z N, vers 1960, poids brut: 44.49 gr., monture en or jaune 18K (750)\n\nCadran Longines, mouvement no. 3872616, fond de boîte no. 5872616\nVeuillez noter que les bracelets de montre pouvant être en cuirs exotiques provenant d'espèces protégées, tels le crocodile, ils ne sont pas vendus avec les montre même s'ils sont exposés avec celles-ci. Christie's devra retirer et conserver ces bracelets avant leur collecte par les acheteur`;
const matches = {};
let match;
const pattern = /\b((?:Movement|mouvement) no\.|mouvement signés\.Numérotée|no|MVT|jewels #|Werk-Nr\.) (\d+)/gmi;
while (match = pattern.exec(input)) {
matches[match[1]] = match[2];
// or, if you only want a single object:
const obj = {
[match[1]]: match[2]
};
}
console.log(matches);
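For movement no. specifically, you'll want this regex to get rid of the comma ([^\s\W] matches the same characters as \w, and the dot is escaped so that it matches only a literal "."):
movement no\. ([^\s\W]+)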
As for the languages, the only way I can think of is a set of if statements, each testing against the appropriate term, unless the RegExp object allows for string substitution. Sorry for not being more help in that area.
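On the string substitution point: the RegExp constructor does accept a pattern string, so the label variants can be kept in a list and joined into one alternation. The following is a rough sketch under assumptions, not a complete solution. The titles table, the "Case Number" entry, and the function names are all made up for illustration, and the label lists are only a subset of the variants in the question.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical table: output title -> label variants seen in the listings.
const titles = {
  'Movement Number': ['movement no.', 'mouvement no.', 'Werk-Nr.', 'MVT'],
  'Case Number': ['case no.', 'fond de boîte no.']
};

// Returns an object such as {"Movement Number": 5770083} for one listing.
function extractNumbers(text) {
  const result = {};
  for (const [title, labels] of Object.entries(titles)) {
    const alternatives = labels.map(escapeRegExp).join('|');
    // Label, optional whitespace, then the digits we want to capture.
    const pattern = new RegExp('(?:' + alternatives + ')\\s*(\\d+)', 'i');
    const match = pattern.exec(text);
    if (match) {
      result[title] = Number(match[1]);
    }
  }
  return result;
}

const englishListing = 'Signed Longines, retailed by Barth, Zurich, ref. 22127, movement no. 5770083, case no. 46, circa 1941';
const frenchListing = 'Cadran Longines, mouvement no. 3872616, fond de boîte no. 5872616';

console.log(extractNumbers(englishListing)); // { 'Movement Number': 5770083, 'Case Number': 46 }
console.log(extractNumbers(frenchListing));  // { 'Movement Number': 3872616, 'Case Number': 5872616 }
```

Because only \d+ is captured, the trailing comma never ends up in the result, and the i flag takes care of case differences.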
You are using a negated character class, [^\s]+, which matches everything except whitespace. So if there's another character you don't want to match, such as a comma, add it to the class: [^\s,].
You can follow the same logic for any other character you don't want to match.
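As a quick check of that idea, here is a small sketch using the input line from the question. The variable names are only illustrative, and the i flag is added on the assumption that case should not matter.

```javascript
const input = "Longines, retailed by Barth, Zurich, ref. 22127, movement no. 5770083, case no. 46";

// [^\s,]+ stops at the first whitespace character or comma.
const match = input.match(/movement no\. ([^\s,]+)/i);

// match is null when the label is absent, so guard before reading group 1.
const movementNumber = match ? match[1] : null;

console.log(movementNumber); // "5770083"
```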
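var input = "Longines, retailed by Barth, Zurich, ref. 22127, movement no. 5770083";
// The lookbehind requires "movement no. " before the digits without including it in the match.
var output = input.match(/(?<=movement no\. )\d+/); // output[0] is "5770083"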
I have this string:
[[Fil:Hoganas_hamn.jpg|miniatyr|Höganäs Hamn.]] [[Fil:Hoganas_hamn_kvickbadet.jpg|miniatyr|Höganäs Hamn - Kvickbadet.]] [[Fil:Höganäs Jefast ny redigerad-1.jpg|miniatyr|Jefasthuset sett från väster med en del av den nya bryggan vid Kvickbadet.]] '''Höganäs''' är en [[tätort]] och [[centralort]] i [[Höganäs kommun]] i [[Skåne län]]. Höganäs blev stad 1936. Ursprungligen är Höganäs ett [[fiskeläge]] kring vilket en [[gruvindustri]] utvecklades för brytning av [[kol (bränsle)|kol]] och [[lera|leror]] för tillverkning av [[eldfast]] [[keramik]] ([[Höganäskrus]]). Gruvindustrin är numera nedlagd.
I want to remove every instance of [[Fil: + dynamic text]] and every [[ and ]], but keep the word itself when it is only [[word]] without the "Fil:" in it.
I've begun doing a regex for it but I'm stuck.
\[\[\Fil:|\]\]
The output I'm after should look like this:
'''Höganäs''' är en tätort och centralort i Höganäs kommun i Skåne län. Höganäs blev stad 1936. Ursprungligen är Höganäs ett fiskeläge kring vilket en gruvindustri utvecklades för brytning av kol (bränsle)|kol och lera|leror för tillverkning av eldfast keramik (Höganäskrus). Gruvindustrin är numera nedlagd.
I have jQuery available, but I think .replace should do the trick?
Try replacing all matches for this Regex with an empty string:
\[\[Fil:[^\]]*\]\]|\[\[|\]\]
To break this down:
\[\[Fil:[^\]]*\]\] matches [[Fil:...]]
\[\[ matches remaining [[
\]\] matches remaining ]]
| combines with OR
To get your exact output, you may need to strip some whitespace as well:
\[\[Fil:[^\]]*\]\]\s+|\[\[|\]\]
So, in JavaScript, you could write:
x.replace(/\[\[Fil:[^\]]*\]\]\s+|\[\[|\]\]/g, '');
Try this; you may also want to adjust the spaces:
var string = "[[Fil:Hoganas_hamn.jpg|miniatyr|Höganäs Hamn.]] [[Fil:Hoganas_hamn_kvickbadet.jpg|miniatyr|Höganäs Hamn - Kvickbadet.]] [[Fil:Höganäs Jefast ny redigerad-1.jpg|miniatyr|Jefasthuset sett från väster med en del av den nya bryggan vid Kvickbadet.]] '''Höganäs''' är en [[tätort]] och [[centralort]] i [[Höganäs kommun]] i [[Skåne län]]. Höganäs blev stad 1936. Ursprungligen är Höganäs ett [[fiskeläge]] kring vilket en [[gruvindustri]] utvecklades för brytning av [[kol (bränsle)|kol]] och [[lera|leror]] för tillverkning av [[eldfast]] [[keramik]] ([[Höganäskrus]]). Gruvindustrin är numera nedlagd.";
var result = string.replace(/\[\[Fil:.*?\]\]/g, '').replace(/\[\[(.*?)\]\]/g, '$1');
console.log(result);
You can use a regex like this
\[\[.*?\]\]
Then use the callback function version of replace to check whether the match starts with Fil:, and decide whether to return an empty string to erase it or just the word itself.
Alternately, use 2 regexes. Replace the Fil: ones with a blank string first, and then the rest with just the word. You can use
\[\[(\w+)\]\]
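Or something similar to catch the [[word]] ones, and then replace each with a backreference to the word; in JavaScript's replace, $1 refers to what's in the parentheses. Note that \w+ matches only ASCII letters, digits, and underscores, so entries such as [[Höganäs kommun]] need a broader pattern.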
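The callback approach could look roughly like the sketch below. The function name stripLinks and the shortened sample text are made up for illustration, and it assumes the [[...]] links are never nested.

```javascript
// Remove [[Fil:...]] links entirely and unwrap every other [[...]] link.
function stripLinks(text) {
  return text.replace(/\[\[(.*?)\]\](\s*)/g, (whole, inner, trailing) =>
    // Drop Fil: links together with the whitespace after them;
    // otherwise keep the inner text and the original whitespace.
    inner.startsWith('Fil:') ? '' : inner + trailing
  );
}

// Shortened sample based on the string in the question.
const wikitext = "[[Fil:Hoganas_hamn.jpg|miniatyr|Höganäs Hamn.]] '''Höganäs''' är en [[tätort]] och [[centralort]] i [[Höganäs kommun]].";

console.log(stripLinks(wikitext));
// '''Höganäs''' är en tätort och centralort i Höganäs kommun.
```

Because the pattern uses .*? rather than \w+, it also handles link text containing spaces or non-ASCII letters.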
Hello I'm having trouble with the function setUpTranslation().
//The purpose of this function is to place the French phrases into the document and set up the event handlers for the mousedown and mouseup events.
//These are the arrays of the French phrases and English phrases that I have to place into the document:
var english = new Array();
english[0] = "This hotel isn't far from the Eiffel Tower.";
english[1] = "What time does the train arrive?";
english[2] = "We have been waiting for the bus for one half-hour.";
english[3] = "This meal is delicious";
english[4] = "What day is she going to arrive?";
english[5] = "We have eleven minutes before the train leaves!";
english[6] = "Living in a foreign country is a good experience.";
english[7] = "Excuse me! I'm late!";
english[8] = "Is this taxi free?";
english[9] = "Be careful when you go down the steps.";
var french = new Array();
french[0] = "Cet hôtel n'est pas loin de la Tour Eiffel.";
french[1] = "A quelle heure arrive le train?";
french[2] = "Nous attendons l'autobus depuis une demi-heure.";
french[3] = "Ce repas est délicieux";
french[4] = "Quel jour va-t-elle arriver?";
french[5] = "Nous avons onze minutes avant le départ du train!";
french[6] = "Habiter dans un pays étranger est une bonne expérience.";
french[7] = "Excusez-moi! Je suis en retard!";
french[8] = "Est-ce que ce taxi est libre?";
french[9] = "Faites attention quand vous descendez l'escalier.";
//function I'm having trouble with
function setUpTranslation(){
var phrases = document.getElementByTagName("p");
for (i =0; i<phrases.length; i++){
phrases[i].number =i;
phrases[i].childNodes[1].innerHTML =french[i];
phrases[i].childNodes[1].onmousedown =function(){
swapFE(event);
phrases[i].childNodes[1].onmouseup =function(){
swapEF(event);
};
};
}
//Below are the other two functions swapFE() and swapEF(). The purpose of the function swapFE() is to exchange the French phrase for the English translation
//The purpose of the function swapEF() is to exchange the English translation for the French phrase.
function swapFE(e){
var phrase =e.srcElement;
var parent =phrase.parentNode;
var idnum =parent.childNodes[0];
var phrasenum =parseInt(idnum.innerHTML)-1;
phrase.innerText =english[phrasenum];
}
function swapEF(e){
var phrase =e.srcElement;
var parent =phrase.parentNode;
var idnum =parent.childNodes[0];
var phrasenum =parseInt(idnum.innerHTML)-1;
phrase.innerText =french[phrasenum];
}
//Not sure if these are right. Thanks in advance!
Assuming that your HTML looks like this
<p><span>1</span><span></span></p>
<p><span>2</span><span></span></p>
...
<p><span>10</span><span></span></p>
Then all you need to do is add the missing closing curly bracket (};) after swapFE(event); (points for Mr Plunkett) and replace getElementByTagName with getElementsByTagName (you're missing an 's' in there).
One additional thing to note: If the English phrase is shorter than the French, the container might shrink when the onmousedown event fires. If this shrinkage causes the mouse cursor to be positioned outside the container, the subsequent onmouseup event will not be triggered. Of course, if you are using block elements (e.g. a <div>) instead of my assumed <span>, that likely isn't an issue. In any case, it's probably better to attach the event listeners to the <p> tags instead.
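Putting those points together, a corrected setup might look something like the sketch below. It is one possible arrangement, not the only fix. It assumes the two-span HTML shown above, shortens the phrase arrays to two entries, attaches the handlers to the <p> elements, and swaps text by loop index instead of reading the number from the first span. It also uses children rather than childNodes so that whitespace text nodes between the spans cannot shift the index.

```javascript
// Shortened phrase lists for illustration; the real arrays have ten entries.
const english = ["This meal is delicious", "Is this taxi free?"];
const french = ["Ce repas est délicieux", "Est-ce que ce taxi est libre?"];

// `paragraphs` would normally be document.getElementsByTagName("p").
function setUpTranslation(paragraphs) {
  // `let` gives each iteration its own i, so the handlers see the right index.
  for (let i = 0; i < paragraphs.length; i++) {
    const paragraph = paragraphs[i];
    const phraseSpan = paragraph.children[1]; // the second <span>

    phraseSpan.textContent = french[i];

    // Handlers sit on the <p>, so a shrinking <span> cannot swallow mouseup.
    paragraph.onmousedown = () => { phraseSpan.textContent = english[i]; };
    paragraph.onmouseup = () => { phraseSpan.textContent = french[i]; };
  }
}

// Only touch the DOM when one is available (e.g. in a browser).
if (typeof document !== "undefined") {
  setUpTranslation(document.getElementsByTagName("p"));
}
```

In a page, you would still want to call setUpTranslation after the paragraphs exist, for example from a load handler.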