For my game I need functions to translate between two coordinate systems. It's mainly a math question, but what I need is the C++ code to do it and a bit of explanation of how to solve my issue.
Screen coordinates:
a) top left corner is 0,0
b) no minus values
c) right += x (the larger the x value, the farther right the point is)
d) bottom += y
Cartesian 2D coordinates:
a) middle point is (0, 0)
b) minus values do exist
c) right += x
d) bottom -= y (the smaller y is, the lower the point is)
I need an easy way to translate from one system to another and vice versa. To do that, (I think) I need some knowledge like where is the (0, 0) [top left corner in screen coordinates] placed in the cartesian coordinates.
However, there is a problem: for some points in cartesian coordinates, the position after translating to screen coordinates may be negative, which is nonsense. I can't put the top left corner of screen coordinates at (-infinity, +infinity) in cartesian coords...
How can I solve this? The only solution I can think of is to place screen (0, 0) at cartesian (0, 0) and only use quadrant IV of the cartesian system, but in that case using the cartesian system is pointless...
I'm sure there are ways of translating screen coordinates into cartesian coordinates and vice versa, but I'm doing something wrong in my thinking about those negative values.
The basic algorithm to translate from cartesian coordinates to screen coordinates is
screenX = cartX + screen_width/2
screenY = screen_height/2 - cartY
But as you mentioned, cartesian space is infinite, and your screen space is not. This can be solved easily by changing the resolution between screen space and cartesian space. The above algorithm makes 1 unit in cartesian space = 1 unit/pixel in screen space. If you allow for other ratios, you can "zoom" out or in your screen space to cover all of the cartesian space necessary.
This would change the above algorithm to
screenX = zoom_factor*cartX + screen_width/2
screenY = screen_height/2 - zoom_factor*cartY
Now you handle negative (or overly large) screenX and screenY by modifying your zoom factor until all your cartesian coordinates will fit on the screen.
You could also allow for panning of the coordinate space too, meaning, allowing the center of cartesian space to be off-center of the screen. This could also help in allowing your zoom_factor to stay as tight as possible but also fit data which isn't evenly distributed around the origin of cartesian space.
This would change the algorithm to
screenX = zoom_factor*cartX + screen_width/2 + offsetX
screenY = screen_height/2 - zoom_factor*cartY + offsetY
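A minimal sketch of the final formulas together with their inverse, written in JavaScript for brevity (the arithmetic carries over to C++ unchanged). The names makeView, cartToScreen, and screenToCart are hypothetical; zoom, offsetX, and offsetY play the roles of zoom_factor, offsetX, and offsetY above.

```javascript
// Hypothetical helper: bundles the screen size with the zoom and pan settings.
function makeView(width, height, zoom = 1, offsetX = 0, offsetY = 0) {
  return { width, height, zoom, offsetX, offsetY };
}

// Cartesian -> screen, using the same formulas as above.
function cartToScreen(view, cartX, cartY) {
  return {
    x: view.zoom * cartX + view.width / 2 + view.offsetX,
    y: view.height / 2 - view.zoom * cartY + view.offsetY,
  };
}

// Screen -> Cartesian, obtained by solving the two formulas for cartX and cartY.
function screenToCart(view, screenX, screenY) {
  return {
    x: (screenX - view.width / 2 - view.offsetX) / view.zoom,
    y: (view.height / 2 + view.offsetY - screenY) / view.zoom,
  };
}
```

Keeping the zoom and offsets in one object means both directions always use the same settings, so a round trip returns the starting point.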
You must know the size of the screen in order to be able to convert
Convert to Cartesian:
cartesianx = screenx - screenwidth / 2;
cartesiany = -screeny + screenheight / 2;
Convert to Screen:
screenx = cartesianx + screenwidth / 2;
screeny = -cartesiany + screenheight / 2;
For cases where you have a negative screen value:
I would not worry about this: this content will simply be clipped, so the user will not see it. If this is a problem, I would add some constraints that prevent the cartesian coordinate from being too large. Another solution, since you can't have the edges be +/- infinity, would be to scale your coordinates (e.g. 1 pixel = 10 cartesian units). Let's call this scalefactor. The equations are now:
Convert to Cartesian with scale factor:
cartesianx = scalefactor * (screenx - screenwidth / 2);
cartesiany = scalefactor * (screenheight / 2 - screeny);
Convert to Screen with scale factor:
screenx = cartesianx / scalefactor + screenwidth / 2;
screeny = screenheight / 2 - cartesiany / scalefactor;
You need to know the width and height of the screen.
Then you can do:
cartX = screenX - (width / 2);
cartY = -(screenY - (height / 2));
screenX = cartX + (width / 2);
screenY = -cartY + (height / 2);
You will always have the problem that the result could be off the screen -- either as a negative value, or as a value larger than the available screen size.
Sometimes that won't matter: e.g., if your graphical API accepts negative values and clips your drawing for you. Sometimes it will matter, and for those cases you should have a function that checks if a set of screen coordinates is on the screen.
You could also write your own clipping functions that try to do something reasonable with coordinates that fall off the screen (such as truncating negative screen coordinates to 0, and coordinates that are too large to the maximum onscreen coordinate). However, keep in mind that "reasonable" depends on what you're trying to do, so it might be best to hold off on defining such functions until you actually need them.
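As a rough sketch of the two helpers described above (in JavaScript; the function names are hypothetical, and clamping to the nearest edge pixel is only one possible definition of "reasonable"):

```javascript
// True when (x, y) addresses a pixel of a width x height screen.
function isOnScreen(x, y, width, height) {
  return x >= 0 && y >= 0 && x < width && y < height;
}

// Pulls an off-screen coordinate back to the nearest on-screen pixel.
function clampToScreen(x, y, width, height) {
  return {
    x: Math.min(Math.max(x, 0), width - 1),
    y: Math.min(Math.max(y, 0), height - 1),
  };
}
```

Clamping moves the point, so it suits things like keeping a cursor or label visible; for drawn geometry, letting the graphics API clip is usually the better choice.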
In any case, as other answers have noted, you can convert between the coordinate systems as:
cart.x = screen.x - width/2;
cart.y = height/2 - screen.y;
screen.x = cart.x + width/2;
screen.y = height/2 - cart.y;
I've got some Boost C++ for you, based on a Microsoft article:
You just need to know two screen points and the two corresponding points in your coordinate system. Then you can convert a point from one system to the other.
#include <boost/numeric/ublas/vector.hpp>
#include <boost/numeric/ublas/vector_proxy.hpp>
#include <boost/numeric/ublas/matrix.hpp>
#include <boost/numeric/ublas/triangular.hpp>
#include <boost/numeric/ublas/lu.hpp>
#include <boost/numeric/ublas/io.hpp>
/* Matrix inversion routine.
   Uses lu_factorize and lu_substitute in uBLAS to invert a matrix */
template<class T>
bool InvertMatrix(const boost::numeric::ublas::matrix<T>& input, boost::numeric::ublas::matrix<T>& inverse)
{
    typedef boost::numeric::ublas::permutation_matrix<std::size_t> pmatrix;
    // create a working copy of the input
    boost::numeric::ublas::matrix<T> A(input);
    // create a permutation matrix for the LU-factorization
    pmatrix pm(A.size1());
    // perform LU-factorization
    int res = lu_factorize(A, pm);
    if (res != 0)
        return false;
    // create identity matrix of "inverse"
    inverse.assign(boost::numeric::ublas::identity_matrix<T> (A.size1()));
    // backsubstitute to get the inverse
    lu_substitute(A, pm, inverse);
    return true;
}
PointF ConvertCoordinates(PointF pt_in,
    PointF pt1, PointF pt2, PointF pt1_, PointF pt2_)
{
    // coefficient matrix built from the two reference points in the source system
    float matrix1[] = {
         pt1.X, pt1.Y, 1.0f, 0.0f,
        -pt1.Y, pt1.X, 0.0f, 1.0f,
         pt2.X, pt2.Y, 1.0f, 0.0f,
        -pt2.Y, pt2.X, 0.0f, 1.0f
    };
    boost::numeric::ublas::matrix<float> M(4, 4);
    CopyMemory(&M.data()[0], matrix1, sizeof(matrix1));
    boost::numeric::ublas::matrix<float> M_1(4, 4);
    InvertMatrix<float>(M, M_1);
    // the same two reference points expressed in the target system
    boost::numeric::ublas::vector<float> u(4);
    boost::numeric::ublas::vector<float> u1(4);
    u(0) = pt1_.X;
    u(1) = pt1_.Y;
    u(2) = pt2_.X;
    u(3) = pt2_.Y;
    // solve for the four transform coefficients
    u1 = boost::numeric::ublas::prod(M_1, u);
    PointF pt;
    pt.X = u1(0)*pt_in.X + u1(1)*pt_in.Y + u1(2);
    pt.Y = u1(1)*pt_in.X - u1(0)*pt_in.Y + u1(3);
    return pt;
}
I have a requirement to make some annotations on an image. This image is scalable (can be zoomed in and out). Now the challenge is that the annotations should also move with the scaling. How can I achieve this? I understand that the 'direction' of zooming depends on the point considered as the 'centre' when zooming, so assuming that this 'centre' is the absolute centre of the image container (width/2, height/2), how do I get the coordinates of the same point on the image after zooming?
As an example, consider the following two images:
Image-1 (Normal scale):
Image-2 (Zoomed-in):
If I know the coordinates of the red point in Image-1 (which is at normal scale), how do I get the coordinates (x,y) of the same red point in Image-2? Note that the image container's width and height will remain same throughout the zooming process.
This function should return your new X and Y measured from the top left of the image.
Bear in mind that the new coordinates can be outside the width/height of your image, as the point you picked might be "zoomed off the edge".
/**
 * width: integer, width of image in px
 * height: integer, height of image in px
 * x: integer, horizontal distance from left
 * y: integer, vertical distance from top
 * scale: float, scale factor (1.5 = 150%)
 */
const scaleCoordinates = (width, height, x, y, scale) => {
    const centerX = width / 2;
    const centerY = height / 2;
    const relX = x - centerX;
    const relY = y - centerY;
    const scaledX = relX * scale;
    const scaledY = relY * scale;
    return {x: scaledX + centerX, y: scaledY + centerY};
};
console.log(scaleCoordinates(100, 100, 25, 50, 1.2));
First, you'd want to determine the coordinates of the annotation with respect to the center of the image.
So for example on an image of 200 x 100, the point (120,60) with the origin in the top left corner would be (20,10) when you take the center of the image as your origin.
If you scale the image to 150%, your new coordinates would be those coordinates multiplied by 1.5 (=150%).
In our example that would be (30,15).
Then you can convert that back to absolute values, with the original point of origin: (130,65).
I want to render a bunch of 3d points into 2d canvas without webgl.
I thought clip space and screen space were the same thing, and that the camera is used to convert from 3d world space to 2d screen space,
but apparently they are not.
So in WebGL, when setting gl_Position, it's in clip space,
later this position is converted to screen space by WebGL, and gl_FragCoord is set.
How is this calculation done, and where?
And the camera matrix and view projection matrices have nothing to do with converting clip space to screen space.
I can have a 3d world space that fits into clip space, and I wouldn't need to use a camera, right?
If all my assumptions are true, I need to learn how to convert from clip space into screen space.
Here's my code:
const uMatrix = mvpMatrix(modelMatrix(transform));
// transform each vertex into 2d screen space
vertices = vertices.map(vertex => {
    let res = mat4.multiplyVector(uMatrix, [...vertex, 1.0]);
    // res is vec4 element, in clip space,
    // how to transform this into screen space?
    return [res[0], res[1]];
});
// viewProjectionMatrix calculation
const mvpMatrix = modelMatrix => {
    const { pos: camPos, target, up } = camera;
    const { fov, aspect, near, far } = camera;
    let camMatrix = mat4.lookAt(camPos, target, up);
    let viewMatrix = mat4.inverse(camMatrix);
    let projectionMatrix = mat4.perspective(fov, aspect, near, far);
    let viewProjectionMatrix = mat4.multiply(projectionMatrix, viewMatrix);
    return mat4.multiply(viewProjectionMatrix, modelMatrix);
};
The camera mentioned in this article transforms clip space to screen space. If so, it shouldn't be named a camera, right?
First the geometry is clipped according to the clip space coordinate (gl_Position). The clip space coordinate is a homogeneous coordinate. The condition for a homogeneous coordinate to be in clip space is:
-w <= x, y, z <= w.
The clip space coordinate is transformed to a Cartesian coordinate in normalized device space by the perspective divide:
ndc_position = gl_Position.xyz / gl_Position.w
The normalized device space is a cube, with the left bottom front of (-1, -1, -1) and the right top back of (1, 1, 1).
The x and y components of the normalized device space coordinate are linearly mapped to the viewport, which is set by gl.viewport (see WebGL Viewport). The viewport is a rectangle with an origin (x, y) and a width and a height:
xw = (ndc_position.x + 1) * (width / 2) + x
yw = (ndc_position.y + 1) * (height / 2 ) + y
xw and yw can be accessed by gl_FragCoord.xy in the fragment shader.
The z component of the normalized device space coordinate is linearly mapped to the depth range, which is [0.0, 1.0] by default but can be set by gl.depthRange. See Viewport Depth Range. The depth range consists of a near value and a far value. far must not be smaller than near, and both values have to be in [0.0, 1.0]:
depth = (ndc_position.z + 1) * (far-near) / 2 + near
The depth can be accessed by gl_FragCoord.z in the fragment shader.
All these operations are done automatically in the rendering pipeline and are part of the Vertex Post-Processing.
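Without WebGL, the same steps have to be done by hand. A minimal sketch, assuming a 2D canvas whose viewport starts at (0, 0): the function name clipToCanvas is hypothetical, the y axis is flipped because a 2D canvas has its origin at the top left with y pointing down, and whole points are simply rejected when they fail the clip test (real clipping of lines and triangues against the clip volume is more involved and not shown).

```javascript
// clip: [x, y, z, w] in clip space; width/height: canvas size in pixels.
// Returns null when the point fails the -w <= x, y, z <= w test.
function clipToCanvas(clip, width, height) {
  const [x, y, z, w] = clip;
  if (w <= 0 || Math.abs(x) > w || Math.abs(y) > w || Math.abs(z) > w) {
    return null;
  }
  // Perspective divide: clip space -> normalized device coordinates.
  const ndcX = x / w;
  const ndcY = y / w;
  const ndcZ = z / w;
  // Viewport mapping with the viewport origin at (0, 0).
  // Assumption: y is flipped because a 2D canvas has y pointing down.
  return {
    x: (ndcX + 1) * (width / 2),
    y: (1 - ndcY) * (height / 2),
    depth: (ndcZ + 1) / 2, // default depth range [0, 1]
  };
}
```

In the question's code, the vec4 produced by multiplying with the model-view-projection matrix would be passed in as clip, and the returned x and y used for drawing.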
I'm struggling to find a method/strategy to handle drawing with stored coordinates and the variation in canvas dimensions across various devices and screen sizes for my web app.
Basically I want to display an image on the canvas. The user will mark two points on an area of image and the app records where these markers were placed. The idea is that the user will use the app every odd day, able to see where X amount of previous points were drawn and able to add two new ones to the area mentioned in places not already marked by previous markers. The canvas is currently set up for height = window.innerHeight and width = window.innerWidth/2.
My initial thought was recording the coordinates of each pair of points and retrieving them as required so they can be redrawn. But these coordinates don't match up if the canvas changes size, as discovered when I tested the web page on different devices. How can I record the previous coordinates and use them to mark the same area of the image regardless of canvas dimensions?
Use percentages! Example:
So let's say on Device 1 the canvas size is 150x200.
The user puts a marker on pixel 25x30. You can do some math to get the percentage.
And then you SAVE that percentage, not the location:
let userX = 25; //where the user placed a marker
let canvasWidth = 150;
//Use a calculator to verify :D
let percent = 100 / (canvasWidth / userX); //16.666%
And now that you have the percent you can set the marker's location based on that percent.
let markerX = (canvasWidth * percent) / 100; //about 25
canvasWidth = 400; //Lets change the canvas size!
markerX = (canvasWidth * percent) / 100; //about 66.67
And voila :D just grab the canvas size and you can determine marker's location every time.
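The same idea applied to both axes might look like the sketch below. The function names are hypothetical, fractions in the range 0..1 are stored instead of percentages, and it assumes the image always fills the canvas in the same way, so that a fraction of the canvas is also a fraction of the image.

```javascript
// Store a marker as fractions (0..1) of the canvas size.
function toFraction(x, y, canvasWidth, canvasHeight) {
  return { fx: x / canvasWidth, fy: y / canvasHeight };
}

// Recover pixel coordinates for whatever size the canvas has now.
function fromFraction(marker, canvasWidth, canvasHeight) {
  return { x: marker.fx * canvasWidth, y: marker.fy * canvasHeight };
}
```

A marker saved with toFraction(25, 30, 150, 200) can later be redrawn on any canvas by calling fromFraction with that canvas's current width and height.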
Virtual Canvas
You must define a virtual canvas. This is the ideal canvas with a predefined size; all coordinates are relative to this canvas. The center of this virtual canvas is coordinate 0,0.
When a coordinate is entered it is converted to virtual coordinates and stored. When rendered, the stored coordinates are converted to the device screen coordinates.
Different devices have various aspect ratios, and even a single device can be tilted, which changes the aspect. That means that the virtual canvas will not exactly fit on all devices. The best you can do is ensure that the whole virtual canvas is visible without stretching it in the x or y direction. This is called scale to fit.
Scale to fit
To render to the device canvas you need to scale the coordinates so that the whole virtual canvas can fit. You use the canvas transform to apply the scaling.
To create the device scale matrix:
const vWidth = 1920; // virtual canvas size
const vHeight = 1080;
function scaleToFitMatrix(dWidth, dHeight) {
    const scale = Math.min(dWidth / vWidth, dHeight / vHeight);
    return [scale, 0, 0, scale, dWidth / 2, dHeight / 2];
}
const scaleMatrix = scaleToFitMatrix(innerWidth, innerHeight);
Scale position not pixels
A point is defined as a position on the virtual canvas. However, the transform will also scale line widths and feature sizes, which you would not want on very low or high res devices.
To position features in virtual coordinates but still draw them at a fixed pixel size, use the inverse scale for their dimensions and reset the transform just before you stroke, as follows (4 pixel box centered over point):
const point = {x : 0, y : 0}; // center of virtual canvas
const point1 = {x : -vWidth / 2, y : -vHeight / 2}; // top left of virtual canvas
const point2 = {x : vWidth / 2, y : vHeight / 2}; // bottom right of virtual canvas
function drawPoint(ctx, matrix, vX, vY, pW, pH) { // vX, vY virtual coordinate
    const invScale = 1 / matrix[0]; // to scale to pixel size
    ctx.setTransform(...matrix); // apply the scale-to-fit transform
    ctx.lineWidth = 1; // width of line
    ctx.beginPath();
    ctx.rect(vX - pW * 0.5 * invScale, vY - pH * 0.5 * invScale, pW * invScale, pH * invScale);
    ctx.setTransform(1,0,0,1,0,0); // reset transform for line width to be correct
    ctx.stroke();
}
const ctx = canvas.getContext("2d");
drawPoint(ctx, scaleMatrix, point.x, point.y, 4, 4);
Transforming via CPU
To convert a point from the device coordinates to the virtual coordinates you need to apply the inverse matrix to that point. For example you get the pageX, pageY coordinates from a mouse, you convert using the scale matrix as follows
function pointToVirtual(matrix, point) {
    point.x = (point.x - matrix[4]) / matrix[0];
    point.y = (point.y - matrix[5]) / matrix[3];
    return point;
}
To convert from virtual to device
function virtualToPoint(matrix, point) {
    point.x = (point.x * matrix[0]) + matrix[4];
    point.y = (point.y * matrix[3]) + matrix[5];
    return point;
}
Check bounds
There may be an area above/below or left/right of the canvas that is outside the virtual canvas coordinates. To check if inside the virtual canvas call the following
function isInVirtual(vPoint) {
    return ! (vPoint.x < -vWidth / 2 ||
        vPoint.y < -vHeight / 2 ||
        vPoint.x >= vWidth / 2 ||
        vPoint.y >= vHeight / 2);
}
const dPoint = {x: page.x, y: page.y}; // coordinate in device coords
if (isInVirtual(pointToVirtual(scaleMatrix, dPoint))) {
    console.log("Point inside");
} else {
    console.log("Point out of bounds.");
}
Extra points
The above assumes that the canvas is aligned to the screen.
Some devices will be zoomed (pinch scaled). You will need to check the device pixel scale for the best results.
It is best to set the virtual canvas size to the max screen resolution you expect.
Always work in virtual coordinates, only convert to device coordinates when you need to render.
The app allows users to upload a photo of themselves and then place a pair of glasses over their face to see what it looks like. For the most part, it is working fine. After the user selects the location of the 2 pupils, I auto zoom the image based on the ratio between the distance between the pupils and the already known distance between the center points of the glasses. All is working fine there, but now I need to automatically place the glasses image over the eyes.
I am using KineticJS, but the problem is not with regard to that library or JavaScript; it is more of an algorithm requirement.
Distance between pupils (eyes)
Distance between pupils (glasses)
Glasses width
Glasses height
Zoom ratio
//.. code before here just zooms the image, etc..
//problem is here (this is wrong, but I need to know what is the right way to calculate this)
var newLeftEyeX = self.leftEyePosition.x * ratio;
var newLeftEyeY = self.leftEyePosition.y * ratio;
//create a blue dot for testing (remove later)
var newEyePosition = new Kinetic.Circle({
    radius: 3,
    fill: "blue",
    stroke: "blue",
    strokeWidth: 0,
    x: newLeftEyeX,
    y: newLeftEyeY
});
var glassesWidth = glassesImage.getWidth();
var glassesHeight = glassesImage.getHeight();
// this code below works perfect, as I can see the glasses center over the blue dot created above
newGlassesPosition.x = newLeftEyeX - (glassesWidth / 4);
newGlassesPosition.y = newLeftEyeY - (glassesHeight / 2);
A math genius to give me the algorithm to determine where the new left eye position should be AFTER the image has been resized
After researching this for the past 6 hours or so, I think I need to do some sort of "translate transform", but the examples I see only allow setting this by x and y amounts.. whereas I will only know the scale of the underlying image. Here's the example I found (which cannot help me):
and here is something which looks interesting, but it is for Silverlight:
Get element position after transform
Is there perhaps some way to do the same in HTML5 and/or KineticJS? Or perhaps I am going down the wrong road here... any ideas, people?
I tried this:
// if zoomFactor > 1, then picture got bigger, so...
if (zoomFactor > 1) {
    // if x = 10 (for example) and if zoomFactor = 2, that means new x should be 5
    // current x / zoomFactor => 10 / 2 = 5
    newLeftEyeX = self.leftEyePosition.x / zoomFactor;
    // same for y
    newLeftEyeY = self.leftEyePosition.y / zoomFactor;
}
else {
    // else picture got smaller, so...
    // if x = 10 (for example) and if zoomFactor = 0.5, that means new x should be 20
    // current x * (1 / zoomFactor) => 10 * (1 / 0.5) = 10 * 2 = 20
    newLeftEyeX = self.leftEyePosition.x * (1 / zoomFactor);
    // same for y
    newLeftEyeY = self.leftEyePosition.y * (1 / zoomFactor);
}
That didn't work, so then I tried an implementation of Rody Oldenhuis' suggestion (thanks Rody):
var xFromCenter = self.leftEyePosition.x - self.xCenter;
var yFromCenter = self.leftEyePosition.y - self.yCenter;
var angle = Math.atan2(yFromCenter, xFromCenter);
var length = Math.hypotenuse(xFromCenter, yFromCenter);
var xNew = zoomFactor * length * Math.cos(angle);
var yNew = zoomFactor * length * Math.sin(angle);
newLeftEyeX = xNew + self.xCenter;
newLeftEyeY = yNew + self.yCenter;
However, that is still not working as expected. So, I am not sure what the issue is currently. If anyone has worked with KineticJS before and has an idea of what the issue may be, please let me know.
I checked Rody's calculations on paper and they seem fine, so there is obviously something else here messing things up.. I got the coordinates of the left pupil at zoom factors 1 and 2. With those coordinates, maybe someone can figure out what the issue is:
Zoom Factor 1: x = 239, y = 209
Zoom Factor 2: x = 201, y = 133
OK, since it's an algorithmic question, I'm going to keep this generic and only write pseudo code.
If I understand you correctly, what you want is the following:
Transform all coordinates such that the origin of your coordinate system is at the zoom center (usually, central pixel)
Compute the angle a line drawn from this new origin to a point of interest makes with the positive x-axis. Compute also the length of this line.
The new x and y coordinates after zooming are defined by elongating this line, such that the new line is the zoom factor times the length of the original line.
Transform the newly found x and y coordinates back to a coordinate system that makes sense to the computer (e.g., top left pixel = 0,0)
Repeat for all points of interest.
In pseudo-code (with formulas):
x_center = image_width/2
y_center = image_height/2
x_from_zoom_center = x_from_topleft - x_center
y_from_zoom_center = y_from_topleft - y_center
angle = atan2(y_from_zoom_center, x_from_zoom_center)
length = hypot(x_from_zoom_center, y_from_zoom_center)
x_new = zoom_factor * length * cos(angle)
y_new = zoom_factor * length * sin(angle)
x_new_topleft = x_new + x_center
y_new_topleft = y_new + y_center
Note that this assumes the number of pixels used for length and width stays the same after zooming. Note also that some rounding should take place (keep everything double precision, and only round to int after all calculations)
In the code above, atan2 is the four-quadrant arctangent, available in most programming languages, and hypot is simply sqrt(x*x + y*y), but computed more carefully (e.g., to avoid overflow), and also available in most programming languages.
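Since length * cos(angle) is just the x offset from the zoom center and length * sin(angle) is the y offset, the pseudo-code collapses to a plain scaling of the offsets. A minimal JavaScript sketch of that shortcut (zoomAboutCenter is a hypothetical name, and like the pseudo-code it assumes the zoom center is the image center and the image keeps its pixel dimensions):

```javascript
// Position of (x, y) after zooming by zoomFactor about the image center.
// Coordinates are measured from the top-left corner, before and after.
function zoomAboutCenter(x, y, imageWidth, imageHeight, zoomFactor) {
  const xCenter = imageWidth / 2;
  const yCenter = imageHeight / 2;
  // Scaling the offsets directly is equivalent to the angle/length form,
  // so atan2 and hypot never need to be computed explicitly.
  return {
    x: zoomFactor * (x - xCenter) + xCenter,
    y: zoomFactor * (y - yCenter) + yCenter,
  };
}
```

Round to integer pixels only after this call, as noted above.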
Is this indeed what you were after?
Before you think "why is this guy asking for help on this problem, surely this has been implemented 1000x" - while you are mostly correct, I have attempted to solve this problem with several open source libs yet here I am.
I am attempting to implement an SVG based "zoom in on mouse wheel, focusing on the mouse" from scratch.
I know there are many libraries that accomplish this, d3 and svg-pan-zoom to name a couple. Unfortunately, my implementations using those libs are falling short of my expectations. I was hoping that I could get some help from the community with the underlying mathematical model for this type of UI feature.
Basically, the desired behavior is like Google Maps, a user has their mouse hovering over a location, they scroll the mouse wheel (inward), and the scale of the map image increases, while the location being hovered over becomes the horizontal and vertical center of the viewport.
Naturally, I have access to the width / height of the viewport and the x / y of the mouse.
In this example, I will only focus on the x axis: the viewport is 900 units wide, the square is 100 units wide, its x offset is 400 units, and the scale is 1:1.
<g transform="translate(0 0) scale(1)">
Assuming the mouse x position was at or near 450 units, if a user wheels in until scale reached 2:1, I would expect the x offset to reach -450 units, centering the point of focus like so.
<g transform="translate(-450 0) scale(2)">
The x and y offsets need to be recalculated on each increment of wheel scroll as a function of the current scale / mouse offsets.
All of my attempts have fallen utterly short of the desired behavior, any advice is appreciated.
While I appreciate any help, please refrain from answering with suggestions to 3rd party libraries, jQuery plugins and things of that nature. My aim here is to understand the mathematical model behind this problem in a general sense, my use of SVG is primarily illustrative.
What I usually do is maintain three variables: offset x, offset y, and scale. They are applied as a transform to a container group, like your element <g transform="translate(0 0) scale(1)">.
If the mouse were over the origin, the new translation would be trivial to calculate. You just multiply the offset x and y by the ratio of the new scale to the old one:
offsetX = offsetX * newScale/scale
offsetY = offsetY * newScale/scale
What you could do is translate the offset so that the mouse is at the origin, then scale, and then translate everything back. Have a look at this TypeScript class, which has a scaleRelativeTo method to do just what you want:
export class Point implements Interfaces.IPoint {
    x: number;
    y: number;
    public constructor(x: number, y: number) {
        this.x = x;
        this.y = y;
    }
    add(p: Interfaces.IPoint): Point {
        return new Point(this.x + p.x, this.y + p.y);
    }
    snapTo(gridX: number, gridY: number): Point {
        var x = Math.round(this.x / gridX) * gridX;
        var y = Math.round(this.y / gridY) * gridY;
        return new Point(x, y);
    }
    scale(factor: number): Point {
        return new Point(this.x * factor, this.y * factor);
    }
    scaleRelativeTo(point: Interfaces.IPoint, factor: number): Point {
        return this.subtract(point).scale(factor).add(point);
    }
    subtract(p: Interfaces.IPoint): Point {
        return new Point(this.x - p.x, this.y - p.y);
    }
}
So if you have a transform given by translate(offsetX,offsetY) scale(scale) and a scroll event took place at (mouseX, mouseY), leading to a new scale newScale, you would calculate the new transform by:
offsetX = (offsetX - mouseX) * newScale/scale + mouseX
offsetY = (offsetY - mouseY) * newScale/scale + mouseY
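Wrapped up as a function, the two formulas might look like the sketch below. The names zoomAt and toTransform and the shape of the state object are assumptions made for illustration; only the arithmetic comes from the formulas above.

```javascript
// state: { offsetX, offsetY, scale } as used in translate(offsetX offsetY) scale(scale).
// Returns the state after zooming to newScale while keeping the content
// under (mouseX, mouseY) fixed on screen.
function zoomAt(state, mouseX, mouseY, newScale) {
  const k = newScale / state.scale;
  return {
    offsetX: (state.offsetX - mouseX) * k + mouseX,
    offsetY: (state.offsetY - mouseY) * k + mouseY,
    scale: newScale,
  };
}

// Hypothetical helper that formats the state as an SVG transform attribute.
function toTransform(state) {
  return `translate(${state.offsetX} ${state.offsetY}) scale(${state.scale})`;
}
```

Starting from translate(0 0) scale(1) with the mouse at x = 450 and zooming to scale 2, this yields translate(-450 0) scale(2), which is the result the question expects. Calling zoomAt on every wheel event with the current state keeps the hovered location pinned under the cursor.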