Creating a Simple Fitness Game with MoveNet and TypeScript
In the realm of interactive applications, real-time physical interaction stands out as a dynamic and engaging way to blend the digital and physical worlds. One cutting-edge technology at the forefront of this integration is pose estimation, which opens up exciting possibilities for fitness and gaming alike.
In this tutorial, we’ll dive into the process of creating a simple fitness game leveraging TensorFlow’s MoveNet and TypeScript. This guide is designed to be accessible, emphasizing ease of understanding and implementation without the need for extensive prior experience with additional frameworks or libraries.
For those eager to see the complete code and follow along more interactively, a companion repository is available. This repo includes a commit history that aligns with each step of the development process, ensuring you can see exactly how the project evolves from start to finish. If at any point something isn’t working as expected, the commit log is an excellent resource to help troubleshoot and understand the changes in context.
Setting Up the Project
We set up the project using Vite, a modern build tool that provides hot reloading and sane defaults for TypeScript. This way, we can focus on the essentials without getting bogged down by configuration details. The project is initiated with the following command:
yarn create vite fitness-game --template react-ts
Note: You’ll need Node 18+ and Yarn 1.22+ for this.
This sets up a basic TypeScript and React environment where we can start building our game. The React part isn’t really important, since we won’t be using any React today; we only need the structure generated by the template. That’s why we’ll directly go ahead and delete all files in src/ except main.tsx and vite-env.d.ts.
Creating a Camera Mirror
Accessing the user’s webcam is the first step in our application. We use the getUserMedia API to capture video from the user’s camera.
// src/utils.ts
export async function captureCamera(): Promise<MediaStream> {
  // Ask for the highest resolution the camera supports, up to 4K.
  const constraints: MediaStreamConstraints = {
    audio: false,
    video: {
      width: {ideal: 4096},
      height: {ideal: 2160}
    }
  };
  return await navigator.mediaDevices.getUserMedia(constraints);
}
We could just display the resulting MediaStream with a regular <video> element, but we’re going with a <canvas> element instead. The reason is simple: we’ll need to render additional game elements and UI later, so we’ll need the canvas anyway. But we also need a video element as a source for the pose estimation. For this reason, let’s set the video element to display: none and only show the canvas.
<!-- index.html -->
...
<div>
  <video id="video" style="display: none"></video>
  <canvas id="canvas"></canvas>
</div>
...
Here’s how we can pipe the MediaStream into the video element and render that on the canvas:
// src/CanvasRenderer.ts
export class CanvasRenderer {
  private videoElement: HTMLVideoElement;
  private canvasElement: HTMLCanvasElement;
  private context: CanvasRenderingContext2D;

  constructor(stream: MediaStream, video: HTMLVideoElement, canvas: HTMLCanvasElement) {
    this.canvasElement = canvas;
    this.context = canvas.getContext("2d") as CanvasRenderingContext2D;
    this.videoElement = video;
    this.videoElement.srcObject = stream;
    // Match the video and canvas size to the actual camera resolution.
    const settings = stream.getVideoTracks()[0].getSettings();
    this.setResolution(settings.width || 1920, settings.height || 1080);
    this.videoElement.play();
  }

  renderCanvas() {
    // Flip horizontally so the canvas behaves like a mirror.
    this.context.setTransform(-1, 0, 0, 1, this.canvasElement.width, 0);
    this.context.drawImage(this.videoElement, 0, 0, this.videoElement.width, this.videoElement.height);
    requestAnimationFrame(this.renderCanvas.bind(this));
  }

  setResolution(width: number, height: number) {
    if (this.videoElement && this.canvasElement) {
      this.videoElement.width = width;
      this.videoElement.height = height;
      this.canvasElement.width = width;
      this.canvasElement.height = height;
    }
  }
}
Now all we have to do is to connect everything in main.tsx.
// src/main.tsx
import {captureCamera} from "./utils.ts";
import {CanvasRenderer} from "./CanvasRenderer.ts";

captureCamera().then(stream => {
  const video: HTMLVideoElement = document.getElementById("video") as HTMLVideoElement;
  const canvas: HTMLCanvasElement = document.getElementById("canvas") as HTMLCanvasElement;
  const renderer = new CanvasRenderer(stream, video, canvas);
  renderer.renderCanvas();
});
Integrating TensorFlow Pose Detection
For pose detection, we integrate TensorFlow’s keypoint-based MoveNet model, which ships with the @tensorflow-models/pose-detection library. Keypoints are specific points on a person’s body that the software identifies and tracks. These points typically represent major joints like elbows and knees, or the tip of the nose. By detecting these points, MoveNet provides real-time spatial data, allowing us to understand the user’s posture and movements within the game space. This functionality is crucial for interactive applications where user movement directly controls or influences the game mechanics.
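To make this more concrete, here is roughly what a detection result from the pose-detection API looks like (the Pose type comes from @tensorflow-models/pose-detection; the values below are made up for illustration):

// Illustrative shape of a MoveNet detection result (values are made up)
const poses: Pose[] = [
  {
    score: 0.91,
    keypoints: [
      {x: 642, y: 358, score: 0.97, name: "nose"},      // keypoint 0
      {x: 611, y: 337, score: 0.95, name: "left_eye"},  // keypoint 1
      // ...15 more keypoints: ears, shoulders, elbows, wrists, hips, knees, ankles
    ],
  },
];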
Let’s install the dependencies:
yarn add @mediapipe/pose @tensorflow-models/pose-detection @tensorflow/tfjs-core @tensorflow/tfjs-converter @tensorflow/tfjs-backend-webgl @tensorflow/tfjs-backend-webgpu
Next we add the PoseDetectionContainer class, which manages the pose detection logic. That includes the creation of a pose detector and the continuous estimation of poses from the video stream. The setup involves importing necessary TensorFlow modules and configuring the MoveNet model:
// src/PoseDetectionContainer.ts
import {
  createDetector,
  movenet,
  MoveNetModelConfig,
  Pose,
  PoseDetector,
  SupportedModels
} from "@tensorflow-models/pose-detection";
import '@tensorflow/tfjs-core';
import '@tensorflow/tfjs-backend-webgl';

type PosesCallback = (poses: Pose[]) => void;

export class PoseDetectionContainer {
  private videoElement: HTMLVideoElement;
  private detectorPromise: Promise<PoseDetector>;
  private newPosesCallback: PosesCallback;

  constructor(video: HTMLVideoElement, callback: PosesCallback) {
    this.videoElement = video;
    this.newPosesCallback = callback;
    const detectionModel = SupportedModels.MoveNet;
    // MULTIPOSE_LIGHTNING detects multiple people in a single pass.
    const moveNetMultiConfig: MoveNetModelConfig = {
      modelType: movenet.modelType.MULTIPOSE_LIGHTNING
    };
    console.log("creating detector");
    this.detectorPromise = createDetector(detectionModel, moveNetMultiConfig);
  }

  // Allows swapping the callback after construction (the GameManager uses this later).
  setCallback(callback: PosesCallback) {
    this.newPosesCallback = callback;
  }

  async startDetection() {
    const detector = await this.detectorPromise;
    console.log("detector functional");
    // Estimate poses roughly 20 times per second and hand them to the callback.
    setInterval(async () => {
      this.newPosesCallback(await detector.estimatePoses(this.videoElement));
    }, 50);
  }
}
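To try the detection on its own, you could wire the container up in main.tsx right next to the CanvasRenderer. Here is a minimal sketch that just logs the detected poses; the companion repo hands them to the GameManager instead, as we’ll see below:

// src/main.tsx (sketch - just logs the detected poses)
import {captureCamera} from "./utils.ts";
import {CanvasRenderer} from "./CanvasRenderer.ts";
import {PoseDetectionContainer} from "./PoseDetectionContainer.ts";

captureCamera().then(stream => {
  const video = document.getElementById("video") as HTMLVideoElement;
  const canvas = document.getElementById("canvas") as HTMLCanvasElement;
  const renderer = new CanvasRenderer(stream, video, canvas);
  renderer.renderCanvas();
  const container = new PoseDetectionContainer(video, poses => console.log(poses));
  container.startDetection();
});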
One important thing to note here is that you need to import tfjs-core and one of the backends (e.g. tfjs-backend-webgl), even if you don’t intend to use any of their classes or functions. This is because importing the modules initializes them, and the pose detection library expects them to be present. It’s not really straightforward; just stick to the documentation.
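If you prefer to be explicit about which backend ends up being used, tfjs-core also lets you select and await it manually. This is optional and shown here only as an illustration:

// Optional: explicitly select a backend instead of relying on the default
import * as tf from "@tensorflow/tfjs-core";
import "@tensorflow/tfjs-backend-webgl";

await tf.setBackend("webgl"); // or "webgpu" with the webgpu backend imported
await tf.ready();             // resolves once the backend is initialized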
Rendering Sprites on the HTMLCanvas
For the sake of simplicity, I didn’t want to get into a full-blown game engine for this project. Instead, I implemented some rudimentary features you would find in a game engine, like handling sprites. Sprites are graphical elements that represent objects in the game, such as targets or other interactive elements. Here’s how we manage them:
// src/CanvasRenderer.ts
export type Sprite = {
  image: CanvasImageSource | undefined,
  x: number,
  y: number,
  width: number,
  height: number,
}

export class CanvasRenderer {
  //...
  private spritesMap: Map<string, Sprite>;

  constructor(stream: MediaStream, video: HTMLVideoElement, canvas: HTMLCanvasElement) {
    //...
    this.spritesMap = new Map<string, Sprite>();
  }

  renderCanvas() {
    //...
    this.drawSprites();
  }

  drawSprites() {
    for (const [, sprite] of this.spritesMap) {
      if (sprite.image) {
        this.context.drawImage(sprite.image, sprite.x, sprite.y, sprite.width, sprite.height);
      } else {
        // Fall back to a plain rectangle when a sprite has no image yet.
        this.context.fillRect(sprite.x, sprite.y, sprite.width, sprite.height);
      }
    }
  }

  addSprite(sprite: Sprite, id?: string): string {
    id = id ? id : new Date().toString();
    this.spritesMap.set(id, sprite);
    return id;
  }

  moveSprite(id: string, x: number, y: number) {
    if (this.spritesMap.has(id)) {
      const entry = this.spritesMap.get(id) as Sprite;
      this.spritesMap.set(id, {...entry, x: x, y: y});
    } else {
      throw Error("Tried moving a sprite that doesn't exist");
    }
  }

  deleteSprite(id: string) {
    if (this.spritesMap.has(id)) {
      this.spritesMap.delete(id);
    } else {
      throw Error("Tried deleting a sprite that doesn't exist");
    }
  }

  hasSprite(id: string): boolean {
    return this.spritesMap.has(id);
  }
}
Manipulating Sprites with Keypoints
Now to the juicy part: interaction. We can now add the GameManager class to use the keypoints detected by MoveNet to manipulate sprites on the canvas. Each keypoint represents a part of the user’s body, and we map these points to corresponding sprites to create some gameplay. Let’s use the nose for a start. According to the COCO keypoints map, that’s keypoint index 0.
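For reference, this is the COCO keypoint order MoveNet uses, so you know which index to pick for other body parts (a plain lookup for convenience, not something the library requires):

// MoveNet returns 17 keypoints in COCO order:
// 0: nose, 1: left_eye, 2: right_eye, 3: left_ear, 4: right_ear,
// 5: left_shoulder, 6: right_shoulder, 7: left_elbow, 8: right_elbow,
// 9: left_wrist, 10: right_wrist, 11: left_hip, 12: right_hip,
// 13: left_knee, 14: right_knee, 15: left_ankle, 16: right_ankle
const NOSE_KEYPOINT = 0;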
I’ll admit the next code sample is quite a lot to take in, but I’ll explain more below:
// src/GameManager.ts
import {Pose} from "@tensorflow-models/pose-detection";
import {CanvasRenderer} from "./CanvasRenderer.ts";
import {PoseDetectionContainer} from "./PoseDetectionContainer.ts";

export class GameManager {
  renderer: CanvasRenderer;
  detectionContainer: PoseDetectionContainer;
  selectedKeypoint: number;
  currentPoses: Pose[] = [];
  previousPoses: Pose[] = [];

  constructor(renderer: CanvasRenderer, detectionContainer: PoseDetectionContainer, selectedKeypoint: number) {
    this.renderer = renderer;
    this.detectionContainer = detectionContainer;
    this.detectionContainer.setCallback(this.updatePoses.bind(this));
    this.selectedKeypoint = selectedKeypoint;
  }

  updatePoses(poses: Pose[]) {
    this.previousPoses = this.currentPoses;
    this.currentPoses = poses;
    this.updateSpritesForSelectedKeypoint();
  }

  updateSpritesForSelectedKeypoint() {
    // Ensure the number of sprites matches the number of detected poses
    if (this.currentPoses.length > this.previousPoses.length) {
      for (let i = this.previousPoses.length; i < this.currentPoses.length; i++) {
        const identifier = this.getIdForKeypoint(i, this.selectedKeypoint);
        this.renderer.addSprite({image: undefined, x: 0, y: 0, width: 50, height: 50}, identifier);
      }
    } else if (this.currentPoses.length < this.previousPoses.length) {
      for (let i = this.currentPoses.length; i < this.previousPoses.length; i++) {
        const identifier = this.getIdForKeypoint(i, this.selectedKeypoint);
        this.renderer.deleteSprite(identifier);
      }
    }
    // Move each pose's sprite to the location of the selected keypoint
    if (this.currentPoses.length > 0) {
      for (let i = 0; i < this.currentPoses.length; i++) {
        const selectedKeypoint = this.currentPoses[i].keypoints[this.selectedKeypoint];
        this.renderer.moveSprite(
          this.getIdForKeypoint(i, this.selectedKeypoint),
          selectedKeypoint.x,
          selectedKeypoint.y
        );
      }
    }
  }

  getIdForKeypoint(poseIndex: number, keypointIndex: number): string {
    return `pose${poseIndex}point${keypointIndex}`;
  }
}
Okay, what’s going on with updateSpritesForSelectedKeypoint()? Why add and remove sprites? Why are there multiple poses?

If you remember how we started, we set the MoveNet variant to MULTIPOSE_LIGHTNING, which allows us to track up to six people. So when somebody enters or leaves the video, we need to ensure the number of sprites matches the number of poses. That might sound like overkill, but it copes well with pictures of humans in the background that get detected as extra poses - and it’s an easy way to get some local multiplayer.
Adding Game Mechanics
What’s missing for a game now? Some entities to interact with. Let’s say you have to catch something falling from above. For this, we need to detect when our selected pose sprite collides with such a game entity. And we need to move those entities downwards, like they’re falling. Lastly, we might want to keep track of some score.
Here’s how I implemented that in the GameManager:
// src/GameManager.ts
export class GameManager {
  //...
  startGame() {
    this.score = 0;
    const width = this.renderer.getResolution()[0];
    const offset = width / 8;
    // Spawn a new falling target every second; its speed (and score value) scales with the current score.
    this.spawnInterval = setInterval(() => {
      const position = Math.random() * (width - offset * 2);
      const speed = this.baseSpeed + this.score;
      this.targets.push({
        x: offset + position,
        y: 0,
        speed: speed,
        score: speed / 3,
        id: new Date().toString()
      });
    }, 1000);
  }

  stopGame() {
    clearInterval(this.spawnInterval);
    this.targets = [];
  }

  moveTargets(passedTimeInMilliseconds: number) {
    for (let i = 0; i < this.targets.length; i++) {
      const target = this.targets[i];
      if (!this.renderer.hasSprite(target.id)) {
        this.renderer.addSprite(
          {
            image: undefined,
            x: target.x,
            y: target.y,
            width: 30,
            height: 30,
          },
          target.id,
        );
      } else {
        target.y += target.speed * passedTimeInMilliseconds / 1000; // CSS pixels per second
        this.renderer.moveSprite(target.id, target.x, target.y);
      }
      // Remove targets that fell past the bottom edge and deduct their score.
      if (target.y > this.renderer.getResolution()[1]) {
        this.renderer.deleteSprite(target.id);
        this.updateScore(-target.score);
        this.targets.splice(i, 1);
        i--; // Stay on the same index after removing an element.
      }
    }
  }

  calculateHits() {
    let collisionIDs: string[] = [];
    for (let i = 0; i < this.currentPoses.length; i++) {
      collisionIDs = collisionIDs.concat(this.renderer.getOverlappingSprites(this.getIdForKeypoint(i, this.selectedKeypoint)));
    }
    // Delete each target whose id matches one of the collisions and award its score.
    for (const collisionID of collisionIDs) {
      this.targets = this.targets.filter(target => {
        const match = target.id === collisionID;
        if (match) {
          this.updateScore(target.score);
          this.renderer.deleteSprite(target.id);
        }
        return !match;
      });
    }
  }

  updateScore(delta: number) {
    // Never let the score drop below zero.
    if (this.score + delta > 0) {
      this.score += delta;
    } else {
      this.score = 0;
    }
    console.log(this.score);
  }
}
There’s more going on in the CanvasRenderer to make this work. For example, I added the getOverlappingSprites() function for collision detection. And I added something I called registerRenderFunction() to make it possible to hook into the requestAnimationFrame() loop. If you’re interested, just take a look at the companion repo.
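In case you’re curious, here is a minimal sketch of what getOverlappingSprites() could look like, using a simple axis-aligned bounding-box test; the version in the repo may differ in its details:

// src/CanvasRenderer.ts (sketch - the repo's implementation may differ)
export class CanvasRenderer {
  //...
  getOverlappingSprites(id: string): string[] {
    const subject = this.spritesMap.get(id);
    if (!subject) return [];
    const overlapping: string[] = [];
    for (const [otherId, other] of this.spritesMap) {
      if (otherId === id) continue;
      // Two rectangles overlap if they intersect on both the x and the y axis.
      const overlapsX = subject.x < other.x + other.width && subject.x + subject.width > other.x;
      const overlapsY = subject.y < other.y + other.height && subject.y + subject.height > other.y;
      if (overlapsX && overlapsY) overlapping.push(otherId);
    }
    return overlapping;
  }
}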
One more thing to note is that newly spawned targets have a speed depending on the score; that's in startGame(). So the better you do at the game, the harder the targets are to catch. This way, we get adaptive difficulty.
Enhancing the Game with UI Elements
Finally, to make the game more visually appealing and user-friendly, we add some actual images for the sprites. I thought it would be fun to have cookies raining from above that you need to catch with your mouth. These need to be HTMLImageElements for the canvas, so we just do the same trick as with the video:
<!-- index.html -->
...
<div style="width: 100%; display: flex; justify-content: center;">
  <video id="video" style="display: none"></video>
  <img id="mouth" alt="mouth" src="/mouth.png" style="display: none">
  <img id="cookie" alt="cookie" src="/cookie.png" style="display: none">
  <canvas id="canvas"></canvas>
</div>
...
Then we pass them on to the GameManager, so they are used when targets or new players show up:
// src/main.tsx
//...
const cookie: HTMLImageElement = document.getElementById("cookie") as HTMLImageElement;
const mouth: HTMLImageElement = document.getElementById("mouth") as HTMLImageElement;

captureCamera().then(stream => {
  const renderer = new CanvasRenderer(stream, video, canvas);
  renderer.renderCanvas();
  const manager = new GameManager(renderer, container, 0, [mouth, cookie]);
  manager.startGame();
});
Conclusion
By following this tutorial, you’ve taken a significant step towards understanding and utilizing real-time pose estimation in web applications. We’ve explored how to set up a basic project with Vite, access and manipulate the webcam feed, integrate TensorFlow’s MoveNet for pose detection, and creatively use these poses to interact with game elements rendered on a canvas.
But the journey doesn’t have to end here. The flexibility of the technologies we’ve discussed allows for vast creativity and innovation. Consider enhancing the game with additional features like multiple pose tracking, integrating more complex game mechanics, or optimizing performance for various devices.
If you found this guide helpful or have developed something unique based on this foundation, don’t hesitate to share your results or contribute to the companion repository. Engaging with a community can provide valuable feedback and new ideas, and help others on their development journey. Let’s continue to push the boundaries of what’s possible with interactive web technologies!