I have an Electron application that performs facial recognition to decide whether or not a person may enter a given location, and for that I'm using Amazon Rekognition.
Everything had been working fine for a few months until, two days ago, a customer reported that the app was behaving strangely, as if it weren't responding to facial recognition requests.
After several tests, I discovered that the problem is a timeout error that occurs on every API call, whether searching for faces (SearchFacesByImage) or registering new faces (IndexFaces).
The error says:
{
  "message": "connect ETIMEDOUT 3.226.60.54:443",
  "errno": -4039,
  "code": "TimeoutError",
  "syscall": "connect",
  "address": "3.226.60.54",
  "port": 443,
  "time": "2022-12-14T13:50:10.909Z",
  "region": "us-east-1",
  "hostname": "rekognition.us-east-1.amazonaws.com",
  "retryable": true
}
What intrigues me is that everything was working fine until this behavior simply started happening (and I haven't made any code changes or updates to the app running on my client's computer).
What intrigues me even more is that the behavior occurs completely at random and only on that particular client's machine. Sometimes the API calls work correctly (returning whether or not the person was recognized), but most of the time they take about 90 seconds to return the timeout error. When I run the same code on my machine (same methods and same CollectionId) everything works normally, with no timeout error at any point, while at the exact same moment the behavior continues on my client's machine.
I was using aws-sdk and then switched to @aws-sdk/client-rekognition (thinking that might solve the problem), but the code only worked for the first few API calls; a few minutes later the timeout errors returned.
The code I'm using to configure and make calls to Rekognition is basically this:
const { RekognitionClient, IndexFacesCommand, SearchFacesByImageCommand } = require('@aws-sdk/client-rekognition')

const rekognitionClient = new RekognitionClient({
  credentials: {
    accessKeyId: 'accessKeyId',
    secretAccessKey: 'secretAccessKey'
  },
  region: 'us-east-1'
})
const registerFaceOnRekognition = async (bytes, userId) => {
  const params = {
    CollectionId: 'collectionId',
    Image: { Bytes: bytes },
    ExternalImageId: userId,
    MaxFaces: 1,
    QualityFilter: 'HIGH'
  }

  const command = new IndexFacesCommand(params)

  try {
    const { FaceRecords } = await rekognitionClient.send(command)

    if (!FaceRecords.length) {
      console.log('No faces detected.')
      return
    }

    console.log('Face created:')
    console.log(FaceRecords[0].Face.FaceId)
  } catch (error) {
    console.error(error) // timeout error
  }
}
const searchFaceByImageOnRekognition = async (bytes) => {
  const params = {
    CollectionId: 'collectionId',
    Image: { Bytes: bytes },
    MaxFaces: 1,
    FaceMatchThreshold: 99,
    QualityFilter: 'HIGH'
  }

  const command = new SearchFacesByImageCommand(params)

  try {
    const { FaceMatches } = await rekognitionClient.send(command)

    if (!FaceMatches.length) {
      console.log('This face has not been registered yet')
      return
    }

    console.log('Face found:')
    console.log(FaceMatches[0].Face.ExternalImageId)
  } catch (error) {
    console.error(error) // timeout error
  }
}
// Method called from the renderer process, which has a canvas where the webcam view is rendered
const onTakePicture = (event, data) => {
  const bytes = Buffer.from(data.dataURL.replace('data:image/jpeg;base64,', ''), 'base64')

  // If there is a userId, register the face in the image
  if (data.userId) {
    registerFaceOnRekognition(bytes, data.userId)
    return
  }

  // Else, search for the face in the image
  searchFaceByImageOnRekognition(bytes)
}
Note that during all tests on my client's computer the internet connection was stable and working properly.
What is the best way to investigate and resolve this issue?
UPDATE:
I enabled Rekognition debug logs and they can be found at: https://gist.github.com/IgorSamer/4e58e09f3fa615401f85ca325b794245
In it, the first three requests (2022-12-16T13:48:45.932Z, 2022-12-16T13:53:20.325Z and 2022-12-16T14:19:12.479Z) complete normally. All subsequent requests, however, fail with the timeout error; in fact, no data comes back after the [DEBUG] App: endpoints Resolved endpoint: step.
As previously mentioned, the internet connection was working fine. I was also able to reproduce the error via remote access, meaning the machine's internet connection was OK at the time of the error.
Could a block imposed by my client's firewall/network be preventing the SDK's requests from going out after a few successful ones? If so, what is the best way to investigate this?
Exploration
This is what I would do initially to gather some info:
- Verify if this is happening ALL the time with that specific client.
- Verify if this is happening ONLY with one client, or with more.
- Verify if this is happening in one or multiple regions (e.g., us-east-1).
- Verify if Amazon Rekognition has had, or is having, issues in the affected region during the time window of interest.
  - Check Rekognition's status in the Service Health Dashboard in your AWS console.
- Use the Amazon Rekognition Guidelines and Quotas as a reference to determine if your app/service usage of Rekognition is under the set limits.
  - Note there's a TPS limit per resource (e.g., SearchFacesByImage, IndexFaces) per account.
Possible approaches
- Verify if there was a change in the client's network/firewall. Just ask.
- Replicate your app's API call with the AWS CLI and study the logs (see the sketch after this list).
- Remotely access your client's device.
  - Set up temporary AWS credentials (remember to revoke access after the test).
  - Send an API call to the Rekognition endpoint. Note that even a 4XX error would be good news, since you'd at least be getting some response.
- Set up proper logging for your app (CloudWatch logs may not be enough to troubleshoot).
  - Check Splunk's APM and New Relic's APM.
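For the CLI check mentioned above, a minimal sketch (the bucket and object names are placeholders; I'm using the S3Object form of --image to keep the example simple):

aws rekognition search-faces-by-image \
  --collection-id collectionId \
  --image '{"S3Object":{"Bucket":"my-test-bucket","Name":"face.jpg"}}' \
  --region us-east-1

Separately, while you investigate, you can make failures surface in seconds instead of hanging for ~90 seconds by giving the v3 SDK explicit timeouts and capping retries. A sketch assuming @aws-sdk/node-http-handler; the 3000 ms values and maxAttempts: 2 are arbitrary choices, not recommendations:

const { RekognitionClient } = require('@aws-sdk/client-rekognition')
const { NodeHttpHandler } = require('@aws-sdk/node-http-handler')

const rekognitionClient = new RekognitionClient({
  region: 'us-east-1',
  maxAttempts: 2, // cap the SDK's automatic retries
  requestHandler: new NodeHttpHandler({
    connectionTimeout: 3000, // ms allowed to establish the TCP connection
    socketTimeout: 3000 // ms of socket inactivity before aborting the request
  })
})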
I hope this helps you at least put together a troubleshooting strategy.
Related
I am attempting to write a web application with a persistent echo connection to a laravel-echo-server instance, and it needs to detect disconnections and attempt to reconnect gracefully. The scenario I am attempting to overcome now is that a user's machine has gone to sleep and reawoken, and their session key has been invalidated (the echo server requires an active session in our app). Detecting this situation from an HTTP perspective is solved: I set up a regular keepAlive, and if that keepAlive detects a 400-level error, it reconnects and updates the session auth_token.
When my Laravel session dies, however, I cannot tell that this has happened from the echo perspective. The best I've found is that I can attach to the 'disconnect' event, but that only gets triggered if the server-side laravel-echo-server process dies, not when the session becomes invalid:
this.echoConnection.connector.socket.on('connect', function() {
  logger.log('info', `Echo server running`);
});
this.echoConnection.connector.socket.on('disconnect', function() {
  logger.log('warn', `Echo server disconnected`);
});
On the laravel-echo-server side, I can tell that the connection is dead - it will show this error:
⚠ [7:03:30 PM] - 5TwHN2qUys5VEFP5AAAG could not be authenticated to private.1
I cannot figure out how to catch this failure event programmatically from the client. Is there a way to capture it? Again, I can eventually tell the session is dead because I poll the server regularly via an HTTP keepAlive function, but I would definitely also like to tell directly from the echo connection if possible, as it polls at a much higher natural rate.
As a second (more important) question: if I detect that my session has died, what should I do to recycle the echo connection (after I have logged in again via HTTP and gotten a new auth_token)? Is there anything specific I should call? I've had some success calling disconnect() and then setting up the connection again from scratch, but I do see errors such as:
websocket.js:201 WebSocket is already in CLOSING or CLOSED state.
Here is my current (naive) reconnection code, which is my initial connection code with an attempt to disconnect first stapled onto it:
async attemptEchoReconnect() {
  if (this.echoConnection !== null) {
    this.echoConnection.disconnect();
    this.echoConnection = null;
  }

  const thisConnectionParams = this.props.connections[this.connectionName];
  const curThis = this;

  this.echoConnection = new Echo({
    broadcaster: 'socket.io',
    host: thisConnectionParams.echoHost,
    authEndpoint: 'api/broadcasting/auth',
    auth: {
      headers: {
        Authorization: `Bearer ` + thisConnectionParams.authToken
      }
    }
  });

  this.echoConnection.connector.socket.on('connect', function() {
    logger.log('info', `Echo server running`);
  });
  this.echoConnection.connector.socket.on('disconnect', function() {
    logger.log('warn', `Echo server disconnected`);
  });

  this.echoConnection.join('everywhere')
    .here(users => {
      logger.log('info', `Rejoined presence channel`);
    });
  this.echoConnection.private(`private.${this.props.id}`)
    .listen(...);

  setTimeout(() => { this.keepAlive() }, 120 * 1000);
}
Any help would be so great. These APIs are not well documented for what I'm trying to do, and I'm hoping I can get some stability with this connection rather than having to do something ugly like forcing a restart.
For anyone who needs help with this problem: my echo reconnection code above seems to be pretty stable, along with a keepAlive function (sketched below) to determine the state of the HTTP connection. I am still a bit uncertain of the origin of the console errors I am seeing, but I suspect they have to do with connection loss during a sleep cycle, which is not something I am particularly worried about.
I'd still be interested in hearing other thoughts if anyone has any. I am somewhat inclined to believe long-term stability of an echo connection is possible, though it does appear you have to proactively monitor it with what tools you have available.
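For reference, the keepAlive is along these lines (a sketch: '/api/ping' and refreshAuthToken() are stand-ins for this app's real endpoint and re-login helper):

// Poll an authenticated endpoint; on a 400-level response, re-authenticate
// over HTTP and recycle the echo connection via attemptEchoReconnect().
async keepAlive() {
  try {
    const response = await fetch('/api/ping', {
      headers: {
        Authorization: `Bearer ` + this.props.connections[this.connectionName].authToken
      }
    });
    if (response.status >= 400 && response.status < 500) {
      await this.refreshAuthToken(); // log in again and store the new auth_token
      await this.attemptEchoReconnect(); // rebuild the echo connection from scratch
    }
  } catch (err) {
    logger.log('warn', `keepAlive request failed: ${err.message}`);
  }
  setTimeout(() => { this.keepAlive() }, 120 * 1000); // matches the interval used above
}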
I'm using firebase-functions/lib/logger to log client-side firebase/firestore activity, like so:
const { log, error } = require("firebase-functions/lib/logger");

export const addData = async (userId, dataId) => {
  try {
    const collectionRef = firestore.collection("docs");
    await collectionRef.add({
      dataId,
    });
    log(`Data added`, { userId, dataId });
  } catch (err) {
    error(`Unable to add new data`, { userId, dataId });
    throw new Error(err);
  }
};
When I run this locally, the log shows up in my browser console. Will this happen in non-local environments, i.e., for real users? Will these logs also show up automatically in Stackdriver, or are they stuck on the client side? I want to be able to view the logs in either Stackdriver or the Firebase console, but have them not show up in the browser for real users. How should I accomplish this?
Messages logged in Cloud Functions will not show up in the client app at all (that would probably be a security hole for your app). They will show up in the Cloud Functions console on the log tab, and in Stackdriver.
Messages logged in your client app, on the other hand, will not show up in any Google Cloud product. They are constrained to the device that generated them. Cloud Functions does not support collecting them; if you want cloud logging of client activity, you will need to investigate other solutions or build something yourself.
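If you do want certain client events to land in Stackdriver, one roll-your-own approach is a small callable function that the client invokes, so the actual logging happens server-side. This is just a sketch, not an official Firebase feature; clientLog and the payload shape are made up here:

// Server side (functions/index.js): write client-reported events to the
// function's own log, which Cloud Functions forwards to Stackdriver.
const functions = require("firebase-functions");
const { log } = require("firebase-functions/lib/logger");

exports.clientLog = functions.https.onCall((data, context) => {
  log(`[client] ${data.message}`, {
    uid: context.auth ? context.auth.uid : null, // who reported it, if signed in
    ...data.meta,
  });
  return { ok: true };
});

// Client side: forward the event instead of logging it in the browser.
const logToCloud = firebase.functions().httpsCallable("clientLog");
logToCloud({ message: "Data added", meta: { userId, dataId } });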
For some reason, documents created in my app are not showing up in my remote CouchDB database.
I am using the following:
import PouchDB from 'pouchdb-react-native'
import { AsyncStorage } from 'react-native' // assuming React Native's built-in AsyncStorage

// (inside an async function)
let company_id = await AsyncStorage.getItem('company_id');
let device_db = new PouchDB(company_id, {auto_compaction: true});
let remote_db = new PouchDB('https://' + API_KEY + '@' + SERVER + '/' + company_id, {ajax: {timeout: 180000}});

device_db.replicate.to(remote_db).then((resp) => {
  console.log(JSON.stringify(resp));
  console.log("Device to Remote Server - Success");
  return resp;
}, (error) => {
  console.log("Device to Remote Server - Error");
  return false;
});
I get a successful response:
{
  "ok": true,
  "start_time": "2018-05-17T15:19:05.179Z",
  "docs_read": 0,
  "docs_written": 0,
  "doc_write_failures": 0,
  "errors": [],
  "last_seq": 355,
  "status": "complete",
  "end_time": "2018-05-17T15:19:05.555Z"
}
When I go to my remote database, document IDs that I am able to search and fetch in the application do not show up.
Is there something I am not taking into account?
Is there anything I can do to check why this might be happening?
This worked when I used the same approach in Ionic; it was only after switching to React Native that I noticed this behavior.
NOTE: When I do .from() to get data from remote to the device, I receive the data. For some reason it just isn't pushing data out.
"Is there anything I can do to check why this might be happening?"
I would try switching on PouchDB's debug logging:
PouchDB.debug.enable('*');
This should allow you to view debug messages in your browser's JavaScript console.
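Since it's the push direction that is failing, it may also help to attach handlers to the replication itself. replicate.to() returns an event emitter, and the 'denied' event in particular fires when the remote refuses a document write. A sketch using your existing device_db and remote_db:

device_db.replicate.to(remote_db)
  .on('change', (info) => console.log('pushed:', info.docs_written, 'docs'))
  .on('denied', (err) => console.log('remote rejected a doc:', JSON.stringify(err)))
  .on('error', (err) => console.log('replication error:', JSON.stringify(err)))
  .on('complete', (info) => console.log('replication complete:', JSON.stringify(info)));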
I'm developing a web app using ReactJS for the frontend and Express for the backend, and I'm deploying it to Azure.
To test whether my requests are going through, I wrote two different API requests.
The first one is very simple:
router.get('/test', (req, res) => {
  res.send('test was a success');
});
Then in the frontend I have a button which, when clicked, makes the request, and I get the response 'test was a success'. This works every time.
The second test is:
router.post('/test-email', (req, res) => {
  let current_template = 'reset';

  readHTMLFile(__dirname + '/emails/' + current_template + '.html', function(err, html) {
    if (err) {
      // Bail out early if the template can't be read
      res.status(500).send(err.message);
      return;
    }

    let template = handlebars.compile(html);
    let replacements = {
      name: req.body.name
    };
    let htmlSend = template(replacements);

    let mailOptions = {
      from: 'email@email.com',
      to: 'someone@email.com',
      subject: 'Test Email',
      html: htmlSend
    };

    transporter.sendMail(mailOptions)
      .then(response => {
        res.send(response);
      })
      .catch(console.error);
  });
});
Then, once the app is deployed, I make a call to each of these tests. The first one, as I mentioned, always succeeds. The second one, which is supposed to send a very simple email, fails most of the time with the error "iisnode encountered an error when processing the request. HRESULT: 0x6d HTTP status: 500 HTTP subStatus: 1013". The strange thing is that every once in a while the email does send, but this happens very rarely. Most of the time the request takes exactly two minutes before responding with the error.
I should note that in development on localhost both tests work all the time with no issues whatsoever; it's only in production (deployed to Azure) that this happens.
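For completeness, transporter is set up along these lines (a sketch, assuming nodemailer's built-in Gmail service transport; the credentials are placeholders):

const nodemailer = require('nodemailer');

// Sketch of the transporter referenced above; credentials are placeholders.
const transporter = nodemailer.createTransport({
  service: 'gmail',
  auth: {
    user: 'email@email.com',
    pass: 'password'
  }
});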
I've been digging around for the last few days and have come up with nothing. Any help or directions would be greatly appreciated.
I found out what the problem was. I'm using Gmail to send my test emails, and by default Gmail will block any attempt to use an account if it thinks the app making the request is not secure. This can be easily fixed by simply clicking the link they automatically send you when you make your first attempt. What is not immediately obvious is that when you go into production they add another level of security, which in this case I believe is a captcha; so while you'll be able to send emails in development, as soon as you deploy your app this no longer works.
Anyway, after digging around a little more I found the option to disable the captcha, and now my emails send fine!
Link to that option: https://accounts.google.com/b/0/DisplayUnlockCaptcha
Hopefully this will help someone.
I can't figure out how to debug WebRTC. I keep getting 'ICE Failed' errors, but I doubt that's the issue. Here's my code: https://github.com/wamoyo/webrtc-cafe/tree/master/2.1%20Establishing%20a%20Connection%20%28within%20a%20Local%20Area%20Network%29
I'm using node.js/express/socket.io for setting up rooms and connecting peers, and then some default public servers for signalling.
The strange thing is that I appear to have the remoteStream on the client.
Here are the two errors I'm getting (by the way, for now I'm just trying to connect from my phone to my laptop, or between two browser tabs, all within a LAN):
HTTP "Content-Type" of "text/html" is not supported. Load of media resource http://192.168.1.2:3000/%5Bobject%20MediaStream%5D failed.
ICE failed, see about:webrtc for more details
Any help would rock!
I've made a few comments already, but I think it's also worthwhile to write an answer.
There are 3 big things I see after my first quick read of your code. I haven't tried to actually run or debug your code beyond a superficial reading.
First, you should set remoteVideo.src in the same way as you do for the local video stream. This is also what's behind your "Content-Type" error: the video's src is currently the string "[object MediaStream]", which the browser then tries to load as a URL:

pc.onaddstream = function(media) { // runs when the remote stream is added
  console.log(media);
  remoteVideo.src = window.URL.createObjectURL(media.stream);
};
Second, you should pass a constraints object to the createOffer() and createAnswer() methods of RTCPeerConnection. The constraints should/could look like this:
var constraints = {
  mandatory: {
    OfferToReceiveAudio: true,
    OfferToReceiveVideo: true
  }
};
And you pass this after the success and error callback arguments:
pc.createOffer(..., ..., constraints);
and:
pc.createAnswer(..., ..., constraints);
Lastly, you are not exchanging ICE candidates between your peers. ICE candidates can be part of the offer/answer SDP, but not always. To ensure that you send all of them, you should implement an onicecandidate handler on the RTCPeerConnection:
pc.onicecandidate = function (event) {
  if (event.candidate) {
    socket.emit("ice candidate", event.candidate);
  }
};
You will have to implement "ice candidate" message relaying between clients in your server.js.
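A minimal sketch of both sides (the room-tracking property socket.room and the exact event name here are assumptions about your app):

// server.js: relay a peer's ICE candidate to everyone else in its room.
socket.on('ice candidate', function (candidate) {
  socket.broadcast.to(socket.room).emit('ice candidate', candidate);
});

// Client: feed relayed candidates into the peer connection.
socket.on('ice candidate', function (candidate) {
  pc.addIceCandidate(new RTCIceCandidate(candidate));
});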
Hope this helps, and good luck!