`I have an API which send video data in chunks for large files. How can i append those data in a single blob so that i can create a final URL?
I have tried pushing data to blob parts and it is also get appended with correct data size in final blob. But video only play for 7 sec out of 31sec. I think only first chunk is getting appended only.
var MyBlobBuilder = function (this: any) {
this.parts = []
}
MyBlobBuilder.prototype.append = function (part) {
this.parts.push(part)
this.blob = undefined
}
MyBlobBuilder.prototype.getBlob = function () {
if (!this.blob) {
this.blob = new Blob(this.parts, { type: 'video/mp4' })
}
return this.blob
}
let myBlobBuilder = new MyBlobBuilder()
function handleResourceClick(resource: string) {
dispatch(fetchFileType(resource)).then((x: any) => {
let group = x.payload.data.sizeInKB / 5
let start = 0
for (let i = 0; i < 5 && start <= x.payload.data.sizeInKB; i++) {
dispatch(
getVideoUrl({ pdfUrl: resource, offset: start, length: group })
).then((res: any) => {
if (isApiSuccess(res)) {
myBlobBuilder.append(res.payload?.data)
}
})
start = start + group
}
})
}]
Related
I am using JSZip in my little React application. I need to fetch attachments from sharepoint list. Attachments size about 3 gb so I made a decision to download it in parts (200mb). But I got big RAM consumption (more than 3500 mb). I can't find memory leak
Source code. This function outside of React component:
var JSZip = require("jszip");
async function testUploadAllAttachments() {
const ceilSizebytes = 209715200;
// actually it count blobs sizes
let blobCounter = 0;
const filterItemsDate = await sp.web.lists.getByTitle("Reports").items.getAll();
console.log("filterItemsDate: ", filterItemsDate);
let zip = new JSZip();
for (const item of filterItemsDate) {
let allItemAttachments = await sp.web.lists.getByTitle("Reports").items.getById(item.Id).attachmentFiles();
let itemFolder = zip.folder(item.Id);
console.log("itemFolder: ", itemFolder);
console.log("blobCounter: ", blobCounter);
for (const attach of allItemAttachments) {
let urlToFetch = attach.ServerRelativePath.DecodedUrl;
let blob = await fetch(urlToFetch).then(response => {
return response.blob();
});
// Blob size control
blobCounter += blob.size;
if (blobCounter > ceilSizebytes) {
blobCounter = 0;
zip.remove(item.Id);
await zip.generateAsync({ type: "blob",
compression: "DEFLATE",
compressionOptions:{
level: 6
} })
.then(function (content) {
// saveAs(content, "examplePart.zip");
// content = null;
})
zip = null;
zip = new JSZip();
// Recreate missing item on removing step
let itemFolderReset = zip.folder(item.Id);
for (let i = 0; i < allItemAttachments.length; i++) {
let urlToFetchReset = allItemAttachments[i].ServerRelativePath.DecodedUrl;
let blobReset = await fetch(urlToFetch).then(response => {
return response.blob();
});
itemFolderReset.file(allItemAttachments[i].FileName, blobReset );
blobCounter += blobReset.size;
}
continue;
}
else {
itemFolder.file(attach.FileName, blob);
}
}
}
await zip.generateAsync({ type: "blob" })
.then(function (content) {
saveAs(content, "example.zip");
})
}
I would like to send the data by chunks
now what i'm sending to the server look like this
for loop - 1, 2, 3
what the server receives: 3,1,2 -> asynchronous.
and i need to send it synchronic so the server will receive as my for loop order: 1, 2, 3
How can i do it ?
//52428800
const chunkSize = 1377628
let beginUpload = data;
let component = this;
let start = 0;
let startCount = 0;
let callStoreCouunt = 0;
for (start; start < zipedFile.length; start += chunkSize) {
const chunk = zipedFile.slice(start, start + chunkSize + 1)
startCount +=1;
// debugger
// var base64Image = new Buffer( zipedFile ).toString('base64');
var base64Image = new Buffer( chunk ).toString('base64');
console.log(chunk, startCount);
let uploadPackage: documentInterfaces.UploadPackage = {
transaction: {
documentId: {value: data.documentId.value},
transactionId: data.transactionId,
fileGuid: data.fileGuid
},
packageBuffer: base64Image
};
// debugger
component.$store.dispatch('documents/uploadPackage', uploadPackage)
.then(({ data, status }: { data: documentInterfaces.ReciveAttachScene , status: number }) => {
// debugger
if(status !== 200){
component.$message({
message: data,
type: "error"
});
component.rejectUpload(beginUpload);
}
else{
callStoreCouunt+=1;
console.log(chunk, "res" + callStoreCouunt)
debugger
if(callStoreCouunt === startCount){
let commitPackage = {
transaction: {
documentId: {value: uploadPackage.transaction.documentId.value},
transactionId: uploadPackage.transaction.transactionId,
fileGuid: uploadPackage.transaction.fileGuid
}
};
debugger
component.commitUpload(commitPackage);
}
}
});
}
You cannot control which chunk of data reaches the server first. If there's a network problem somewhere on its way, it might go around the planet multiple times before it reaches the server.
Even if the 1st chunk was sent 5 ms earlier than the 2nd one, the 2nd chunk might reach the server earlier.
But there's a few ways you can solve this:
Method 1:
Wait for the server response before sending the next chunk:
let state = {
isPaused: false
}
let sentChunks = 0
let totalChunks = getTotalChunksAmount()
let chunkToSend = ...
setInterval(() => {
if (!isPaused && sentChunks < totalChunks) {
state.isPaused = true
send(chunkToSend)
sentChunks += 1
}
}, 100)
onServerReachListener(response => {
if (response === ...) {
state.isPaused = false
}
})
Method 2:
If you don't need to process chunks sequentially in real time, you can just wait for all of them to arrive on the server, then sort them before processing:
let chunks = []
onChunkReceived (chunk) {
if (chunk.isLast) {
chunks.push(chunk)
chunks.sort()
processChunks()
}
else {
chunks.push(chunk)
}
}
Method 3:
If you do need to process chunks sequentially in real time, give all the chunks an id property and processing them sequentially, while storing the other ones for later:
let chunksToProcess = []
let lastProcessedChunkId = -1
onChunkReceived (chunk) {
if (chunk.id === lastProcessedChunkId) {
processChunk()
lastProcessedChunkId += 1
processStoredChunks()
}
else {
chunksToProcess.push(chunk)
}
}
I want to sore an image file to my local storage in order to reuse it on a different page. Im getting a picture or a file when on desktop. That should be transferred to a file and then I want to store it the onImagePicked function. I have connected that function with a button so the image should be transformed to a file and also be stored then. But somehow the func. is not called because I don't get my console.log. I suppose it could have to do something that Im just testing with pictures that come directly from my desktop and not with pictures from a device. But I don't know.
Here is my code: (The rest with getting and displaying the picture works)
import { Storage } from '#ionic/storage';
// convert base64 image into a file
function base64toBlob(base64Data, contentType) {
contentType = contentType || '';
const sliceSize = 1024;
const byteCharacters = atob(base64Data);
const bytesLength = byteCharacters.length;
const slicesCount = Math.ceil(bytesLength / sliceSize);
const byteArrays = new Array(slicesCount);
for (var sliceIndex = 0; sliceIndex < slicesCount; ++sliceIndex) {
const begin = sliceIndex * sliceSize;
const end = Math.min(begin + sliceSize, bytesLength);
const bytes = new Array(end - begin);
for (let offset = begin, i = 0; offset < end; ++i, ++offset) {
bytes[i] = byteCharacters[offset].charCodeAt(0);
}
byteArrays[sliceIndex] = new Uint8Array(bytes);
}
return new Blob(byteArrays, { type: contentType });
}
export const IMAGE_DATA = 'image_data';
...
#ViewChild('filePicker') filePickerRef: ElementRef<HTMLInputElement>;
#Output() imagePick = new EventEmitter<string | File>();
selectedImage: string;
usePicker = false;
...
takePicture() {
if (this.usePicker) {
this.filePickerRef.nativeElement.click();
}
Plugins.Camera.getPhoto({
quality: 80,
source: CameraSource.Prompt,
correctOrientation: true,
saveToGallery: true,
allowEditing: true,
resultType: CameraResultType.Base64,
direction: CameraDirection.Front
}).then(image => {
this.selectedImage = image.base64String;
this.imagePick.emit(image.base64String);
}).catch(error => {
console.log(error);
return false;
});
}
onImagePicked(imageData: string | File) {
let imageFile;
if (typeof imageData === 'string') {
try {
imageFile = base64toBlob(imageData.replace('data:image/jpeg;base64,', ''), 'image/jpeg');
this.storage.set('image_data', imageFile);
console.log('stored');
} catch (error) {
console.log(error);
}
} else {
imageFile = imageData;
}
}
onFileChosen(event: Event) {
const pickedFile = (event.target as HTMLInputElement).files[0];
const fr = new FileReader();
fr.onload = () => {
const dataUrl = fr.result.toString();
this.selectedImage = dataUrl;
this.imagePick.emit(pickedFile);
};
fr.readAsDataURL(pickedFile);
}
html (where I try to call the function)
<app-make-photo (imagePick)="onImagePicked($event)"></app-make-photo>
I have audio data streaming from the server to the client. It starts as a Node.js buffer (which is a Uint8Array) and is then sent to the AudiWorkletProcessor via port.postMessage(), where it is converted into a Float32Array and stored in this.data. I have spent hours trying to set the output to the audio data contained in the Float32Array. Logging the Float32Array pre-processing shows accurate data, but logging it during processing shows that it is not changing when the new message is posted. This is probably a gap in my low-level audio-programming knowledge.
When data arrives in the client, the following function is called:
process = (data) => {
this.node.port.postMessage(data)
}
As an aside, (and you can let me know) maybe I should be using parameter descriptors instead of postMessage? Anyways, here's my AudioWorkletProcessor:
class BypassProcessor extends AudioWorkletProcessor {
constructor() {
super();
this.isPlaying = true;
this.port.onmessage = this.onmessage.bind(this)
}
static get parameterDescriptors() {
return [{ // Maybe we should use parameters. This is not utilized at present.
name: 'stream',
defaultValue: 0.707
}];
}
convertBlock = (incomingData) => { // incoming data is a UInt8Array
let i, l = incomingData.length;
let outputData = new Float32Array(incomingData.length);
for (i = 0; i < l; i++) {
outputData[i] = (incomingData[i] - 128) / 128.0;
}
return outputData;
}
onmessage(event) {
const { data } = event;
let ui8 = new Uint8Array(data);
this.data = this.convertBlock(ui8)
}
process(inputs, outputs) {
const input = inputs[0];
const output = outputs[0];
if (this.data) {
for (let channel = 0; channel < output.length; ++channel) {
const inputChannel = input[channel]
const outputChannel = output[channel]
for (let i = 0; i < inputChannel.length; ++i) {
outputChannel[i] = this.data[i]
}
}
}
return true;
}
}
registerProcessor('bypass-processor', BypassProcessor);
How can I simply set the output of the AudioWorkletProcessor to the data coming through?
The AudioWorkletProcessor process only each 128 bytes, so you need to manage your own buffers to make sure that is the case for an AudioWorklet, probably by adding a FIFO.
I resolved something like this using a RingBuffer(FIFO) implemented in WebAssembly, in my case I was receiving a buffer with 160 bytes.
Look my AudioWorkletProcessor implementation
import Module from './buffer-kernel.wasmodule.js';
import { HeapAudioBuffer, RingBuffer, ALAW_TO_LINEAR } from './audio-helper.js';
class SpeakerWorkletProcessor extends AudioWorkletProcessor {
constructor(options) {
super();
this.payload = null;
this.bufferSize = options.processorOptions.bufferSize; // Getting buffer size from options
this.channelCount = options.processorOptions.channelCount;
this.inputRingBuffer = new RingBuffer(this.bufferSize, this.channelCount);
this.outputRingBuffer = new RingBuffer(this.bufferSize, this.channelCount);
this.heapInputBuffer = new HeapAudioBuffer(Module, this.bufferSize, this.channelCount);
this.heapOutputBuffer = new HeapAudioBuffer(Module, this.bufferSize, this.channelCount);
this.kernel = new Module.VariableBufferKernel(this.bufferSize);
this.port.onmessage = this.onmessage.bind(this);
}
alawToLinear(incomingData) {
const outputData = new Float32Array(incomingData.length);
for (let i = 0; i < incomingData.length; i++) {
outputData[i] = (ALAW_TO_LINEAR[incomingData[i]] * 1.0) / 32768;
}
return outputData;
}
onmessage(event) {
const { data } = event;
if (data) {
this.payload = this.alawToLinear(new Uint8Array(data)); //Receiving data from my Socket listener and in my case converting PCM alaw to linear
} else {
this.payload = null;
}
}
process(inputs, outputs) {
const output = outputs[0];
if (this.payload) {
this.inputRingBuffer.push([this.payload]); // Pushing data from my Socket
if (this.inputRingBuffer.framesAvailable >= this.bufferSize) { // if the input data size hits the buffer size, so I can "outputted"
this.inputRingBuffer.pull(this.heapInputBuffer.getChannelData());
this.kernel.process(
this.heapInputBuffer.getHeapAddress(),
this.heapOutputBuffer.getHeapAddress(),
this.channelCount,
);
this.outputRingBuffer.push(this.heapOutputBuffer.getChannelData());
}
this.outputRingBuffer.pull(output); // Retriving data from FIFO and putting our output
}
return true;
}
}
registerProcessor(`speaker-worklet-processor`, SpeakerWorkletProcessor);
Look the AudioContext and AudioWorklet instances
this.audioContext = new AudioContext({
latencyHint: 'interactive',
sampleRate: this.sampleRate,
sinkId: audioinput || "default"
});
this.audioBuffer = this.audioContext.createBuffer(1, this.audioSize, this.sampleRate);
this.audioSource = this.audioContext.createBufferSource();
this.audioSource.buffer = this.audioBuffer;
this.audioSource.loop = true;
this.audioContext.audioWorklet
.addModule('workers/speaker-worklet-processor.js')
.then(() => {
this.speakerWorklet = new AudioWorkletNode(
this.audioContext,
'speaker-worklet-processor',
{
channelCount: 1,
processorOptions: {
bufferSize: 160, //Here I'm passing the size of my output, I'm just saying to RingBuffer what size I need
channelCount: 1,
},
},
);
this.audioSource.connect(this.speakerWorklet).connect(this.audioContext.destination);
}).catch((err)=>{
console.log("Receiver ", err);
})
Look how I'm receiving and sending the data from Socket to audioWorklet
protected onMessage(e: any): void { //My Socket message listener
const { data:serverData } = e;
const socketId = e.socketId;
if (this.audioWalking && this.ws && !this.ws.isPaused() && this.ws.info.socketId === socketId) {
const buffer = arrayBufferToBuffer(serverData);
const rtp = RTPParser.parseRtpPacket(buffer);
const sharedPayload = new Uint8Array(new SharedArrayBuffer(rtp.payload.length)); //sharing javascript buffer memory between main thread and worklet thread
sharedPayload.set(rtp.payload, 0);
this.speakerWorklet.port.postMessage(sharedPayload); //Sending data to worklet
}
}
For help people I putted on Github the piece important of this solution
audio-worklet-processor-wasm-example
I followed this example, it have the all explanation how the RingBuffer works
wasm-ring-buffer
I'm new to ES6 and Promise. I'm trying pdf.js to extract texts from all pages of a pdf file into a string array. And when extraction is done, I want to parse the array somehow. Say pdf file(passed via typedarray correctly) has 4 pages and my code is:
let str = [];
PDFJS.getDocument(typedarray).then(function(pdf) {
for(let i = 1; i <= pdf.numPages; i++) {
pdf.getPage(i).then(function(page) {
page.getTextContent().then(function(textContent) {
for(let j = 0; j < textContent.items.length; j++) {
str.push(textContent.items[j].str);
}
parse(str);
});
});
}
});
It manages to work, but, of course, the problem is my parse function is called 4 times. I just want to call parse only after all 4-pages-extraction is done.
Similar to https://stackoverflow.com/a/40494019/1765767 -- collect page promises using Promise.all and don't forget to chain then's:
function gettext(pdfUrl){
var pdf = pdfjsLib.getDocument(pdfUrl);
return pdf.then(function(pdf) { // get all pages text
var maxPages = pdf.pdfInfo.numPages;
var countPromises = []; // collecting all page promises
for (var j = 1; j <= maxPages; j++) {
var page = pdf.getPage(j);
var txt = "";
countPromises.push(page.then(function(page) { // add page promise
var textContent = page.getTextContent();
return textContent.then(function(text){ // return content promise
return text.items.map(function (s) { return s.str; }).join(''); // value page text
});
}));
}
// Wait for all pages and join text
return Promise.all(countPromises).then(function (texts) {
return texts.join('');
});
});
}
// waiting on gettext to finish completion, or error
gettext("https://cdn.mozilla.net/pdfjs/tracemonkey.pdf").then(function (text) {
alert('parse ' + text);
},
function (reason) {
console.error(reason);
});
<script src="https://npmcdn.com/pdfjs-dist/build/pdf.js"></script>
A bit more cleaner version of #async5 and updated according to the latest version of "pdfjs-dist": "^2.0.943"
import PDFJS from "pdfjs-dist";
import PDFJSWorker from "pdfjs-dist/build/pdf.worker.js"; // add this to fit 2.3.0
PDFJS.disableTextLayer = true;
PDFJS.disableWorker = true; // not availaible anymore since 2.3.0 (see imports)
const getPageText = async (pdf: Pdf, pageNo: number) => {
const page = await pdf.getPage(pageNo);
const tokenizedText = await page.getTextContent();
const pageText = tokenizedText.items.map(token => token.str).join("");
return pageText;
};
/* see example of a PDFSource below */
export const getPDFText = async (source: PDFSource): Promise<string> => {
Object.assign(window, {pdfjsWorker: PDFJSWorker}); // added to fit 2.3.0
const pdf: Pdf = await PDFJS.getDocument(source).promise;
const maxPages = pdf.numPages;
const pageTextPromises = [];
for (let pageNo = 1; pageNo <= maxPages; pageNo += 1) {
pageTextPromises.push(getPageText(pdf, pageNo));
}
const pageTexts = await Promise.all(pageTextPromises);
return pageTexts.join(" ");
};
This is the corresponding typescript declaration file that I have used if anyone needs it.
declare module "pdfjs-dist";
type TokenText = {
str: string;
};
type PageText = {
items: TokenText[];
};
type PdfPage = {
getTextContent: () => Promise<PageText>;
};
type Pdf = {
numPages: number;
getPage: (pageNo: number) => Promise<PdfPage>;
};
type PDFSource = Buffer | string;
declare module 'pdfjs-dist/build/pdf.worker.js'; // needed in 2.3.0
Example of how to get a PDFSource from a File with Buffer (from node types) :
file.arrayBuffer().then((ab: ArrayBuffer) => {
const pdfSource: PDFSource = Buffer.from(ab);
});
Here's a shorter (not necessarily better) version:
async function getPdfText(data) {
let doc = await pdfjsLib.getDocument({data}).promise;
let pageTexts = Array.from({length: doc.numPages}, async (v,i) => {
return (await (await doc.getPage(i+1)).getTextContent()).items.map(token => token.str).join('');
});
return (await Promise.all(pageTexts)).join('');
}
Here, data is a string or buffer (or you could change it to take the url, etc., instead).
Here's another Typescript version with await and Promise.all based on the other answers:
import { getDocument } from "pdfjs-dist";
import {
DocumentInitParameters,
PDFDataRangeTransport,
TypedArray,
} from "pdfjs-dist/types/display/api";
export const getPdfText = async (
src: string | TypedArray | DocumentInitParameters | PDFDataRangeTransport
): Promise<string> => {
const pdf = await getDocument(src).promise;
const pageList = await Promise.all(Array.from({ length: pdf.numPages }, (_, i) => pdf.getPage(i + 1)));
const textList = await Promise.all(pageList.map((p) => p.getTextContent()));
return textList
.map(({ items }) => items.map(({ str }) => str).join(""))
.join("");
};
If you use the PDFViewer component, here is my solution that doesn't involve any promise or asynchrony:
function getDocumentText(viewer) {
let text = '';
for (let i = 0; i < viewer.pagesCount; i++) {
const { textContentItemsStr } = viewer.getPageView(i).textLayer;
for (let item of textContentItemsStr)
text += item;
}
return text;
}
I wouldn't know how to do it either, but thanks to async5 I did it. I copied his code and updated it to the new version of pdf.js.
I made minimal corrections and also took the liberty of not grouping all the pages into a single string. In addition, I used a regular expression that removes many of the empty spaces that PDF unfortunately ends up creating (it does not solve all cases, but the vast majority).
The way I did it should be the way that most will feel comfortable working, however, feel free to remove the regex or make any other changes.
// pdf-to-text.js v1, require pdf.js ( https://mozilla.github.io/pdf.js/getting_started/#download )
// load pdf.js and pdf.worker.js
function pdfToText(url, separator = ' ') {
let pdf = pdfjsLib.getDocument(url);
return pdf.promise.then(function(pdf) { // get all pages text
let maxPages = pdf._pdfInfo.numPages;
let countPromises = []; // collecting all page promises
for (let i = 1; i <= maxPages; i++) {
let page = pdf.getPage(i);
countPromises.push(page.then(function(page) { // add page promise
let textContent = page.getTextContent();
return textContent.then(function(text) { // return content promise
return text.items.map(function(obj) {
return obj.str;
}).join(separator); // value page text
});
}));
};
// wait for all pages and join text
return Promise.all(countPromises).then(function(texts) {
for(let i = 0; i < texts.length; i++){
texts[i] = texts[i].replace(/\s+/g, ' ').trim();
};
return texts;
});
});
};
// example of use:
// waiting on pdfToText to finish completion, or error
pdfToText('files/pdf-name.pdf').then(function(pdfTexts) {
console.log(pdfTexts);
// RESULT: ['TEXT-OF-PAGE-1', 'TEXT-OF-PAGE-2', ...]
}, function(reason) {
console.error(reason);
});