Self-Tracking Pan & Tilt Gimbal for Video Conferencing

Video conference calls are shifting from infrequent, relatively short “video telephone calls” to much longer remote presentations and interactions among equal participants.

As the amount of time spent on camera has increased, it has become evident that participants need more advanced cameras, ones able to track their motion around a space without intervention.

This is a robustly solved problem for non-live video applications: there are many powered gimbals available for phones and cameras. However, most of these solutions require a dedicated application on the device, which can be used for recording but not with video conferencing solutions.

The goal of this project is to make a gimbal that tracks a target independently and to which any camera can be attached: web camera, phone, DSLR, camcorder, etc.

Active vs Passive Tracking

How is the target subject to be tracked?

This could be done by detecting the target’s inherent properties or influences: reflected light, heat signature, etc.

Or the target could be required to carry an active emitter to which the gimbal is highly tuned.

Active tracking should produce a much higher signal-to-noise ratio than passive tracking, and be more reliable and simpler to implement. A disadvantage is that the tracking transmitter may interfere with recording equipment: IR LEDs may show up in video footage, and RF may cause interference in both the audio and video.

Passive tracking has the advantage of requiring fewer parts in the final system. It also shouldn’t interfere with recording equipment.

Active Tracking

Possible implementation:

  • IR transmitter (LED) as tracking target
  • Wii controller as sensor. It’s capable of tracking up to 4 IR point sources
  • Wii controller connects via Bluetooth to the gimbal controller
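
As a rough sketch of how a reported IR blob position could drive the gimbal, the snippet below converts a sensor pixel coordinate into pan and tilt errors using a simple pinhole model, then applies one proportional correction step. The sensor resolution (1024x768), the field of view (about 33 by 23 degrees), the y-axis direction, and the gain are all assumptions for illustration and would need to be measured on real hardware. It also assumes the sensor rides on the gimbal, so the error is relative to where the camera currently points. Reading the blob over Bluetooth is not shown.

```python
import math

# Assumed sensor parameters (approximate values, not measured for this project).
SENSOR_WIDTH = 1024    # horizontal coordinate range reported by the IR sensor
SENSOR_HEIGHT = 768    # vertical coordinate range
HFOV_DEG = 33.0        # assumed horizontal field of view
VFOV_DEG = 23.0        # assumed vertical field of view


def pixel_to_angles(px, py):
    """Convert an IR blob position to (pan_error, tilt_error) in degrees.

    Positive pan means the target is right of center. Positive tilt means
    the target is above center, assuming y grows downward on the sensor.
    """
    # Focal lengths in pixel units, derived from the assumed field of view.
    fx = (SENSOR_WIDTH / 2) / math.tan(math.radians(HFOV_DEG / 2))
    fy = (SENSOR_HEIGHT / 2) / math.tan(math.radians(VFOV_DEG / 2))
    pan = math.degrees(math.atan((px - SENSOR_WIDTH / 2) / fx))
    tilt = math.degrees(math.atan((SENSOR_HEIGHT / 2 - py) / fy))
    return pan, tilt


def step_gimbal(pan_deg, tilt_deg, px, py, gain=0.5):
    """One proportional control step: move a fraction of the measured error."""
    pan_err, tilt_err = pixel_to_angles(px, py)
    return pan_deg + gain * pan_err, tilt_deg + gain * tilt_err
```

A gain below 1 trades response speed for smoother motion, which is probably preferable on a live video call.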


  • Wii controller needs to be mounted close to the camera
  • if that’s not possible, may need to use 2 Wii controllers so that the target can be triangulated. With known relative locations of the Wii controllers and the gimbal, trigonometry can be used to calculate the correct orientation of the gimbal
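
The triangulation idea in the last bullet can be sketched in a top-down 2D view: each sensor contributes a bearing toward the target, the two rays are intersected to locate the target, and the pan angle is then computed from the gimbal position. This is a minimal illustration and not a tested design. The function names, the shared room coordinate frame, and the convention that bearings are measured counterclockwise from the +x axis are all assumptions. Tilt could be handled the same way in a vertical plane.

```python
import math


def triangulate(p1, bearing1_deg, p2, bearing2_deg):
    """Intersect two bearing rays in a shared top-down 2D frame.

    p1, p2 are (x, y) sensor positions. Bearings are in degrees,
    counterclockwise from the +x axis. Returns the (x, y) target
    position, or None if the rays are (nearly) parallel.
    """
    d1 = (math.cos(math.radians(bearing1_deg)), math.sin(math.radians(bearing1_deg)))
    d2 = (math.cos(math.radians(bearing2_deg)), math.sin(math.radians(bearing2_deg)))
    # Solve p1 + t1*d1 = p2 + t2*d2 for t1 using the 2D cross product.
    denom = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(denom) < 1e-9:
        return None
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    t1 = (dx * d2[1] - dy * d2[0]) / denom
    return (p1[0] + t1 * d1[0], p1[1] + t1 * d1[1])


def pan_to_target(gimbal_pos, target):
    """Pan angle in degrees (counterclockwise from +x) from gimbal to target."""
    return math.degrees(math.atan2(target[1] - gimbal_pos[1], target[0] - gimbal_pos[0]))


# Example: sensors 2 units apart, gimbal midway between them.
target = triangulate((-1.0, 0.0), 45.0, (1.0, 0.0), 135.0)
if target is not None:
    pan = pan_to_target((0.0, 0.0), target)
```

The wider the baseline between the two sensors relative to the target distance, the less sensitive the intersection is to small bearing errors.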


Passive Tracking

Quick run through: OpenCV is an image processing toolkit. It has face/object detection algorithms built in. It uses a Haar cascade… need to research the details, but in short, it’s looking at images with a certain type of wavelet-like filter to detect features. The filters are hand-designed, although the cascade itself is trained on example images.
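
A minimal sketch of how the OpenCV approach might feed the gimbal: detect faces with the frontal-face Haar cascade bundled with the opencv-python package, pick the largest one, and report how far its center sits from the center of the frame. The helper names, the choice of the largest face as the target, and the detection parameters are assumptions for illustration. OpenCV is imported inside the detection function so the two pure helpers can be used without it.

```python
def largest_box(boxes):
    """Return the (x, y, w, h) box with the largest area, or None if empty."""
    boxes = list(boxes)
    if not boxes:
        return None
    return max(boxes, key=lambda b: b[2] * b[3])


def normalized_offset(box, frame_w, frame_h):
    """Offset of the box center from the frame center, scaled to [-1, 1] per axis."""
    x, y, w, h = box
    cx = x + w / 2
    cy = y + h / 2
    return ((cx - frame_w / 2) / (frame_w / 2),
            (cy - frame_h / 2) / (frame_h / 2))


def detect_face_offset(frame, cascade=None):
    """Find the largest face in a BGR frame and return its normalized offset.

    Returns None when no face is found. Pass a preloaded cascade to avoid
    reloading the XML file on every frame.
    """
    import cv2  # imported here so the helpers above work without OpenCV

    if cascade is None:
        cascade = cv2.CascadeClassifier(
            cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # scaleFactor and minNeighbors are common starting values, not tuned ones.
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    box = largest_box(faces)
    if box is None:
        return None
    frame_h, frame_w = gray.shape[:2]
    return normalized_offset(box, frame_w, frame_h)
```

The normalized offset can be used directly as the error signal for the pan and tilt motors, since zero means the face is centered.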

The alternative is a “trained” neural network implemented with TensorFlow. The example TensorFlow network above can label many things in an image, so you can use it to track a dog, bird, person, cup, … Dedicated accelerator hardware exists for TensorFlow models, so it can potentially run much faster than OpenCV’s cascade detector.
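
Because a multi-class detector labels everything it sees, the tracker has to choose which detection to follow. The sketch below shows one way that selection might look. The detection format (a dict with "label", "score", and a normalized "box") is hypothetical and is not the output format of any particular TensorFlow model; real model output would need to be converted into this shape first.

```python
def pick_target(detections, label, min_score=0.5):
    """Return the highest-scoring detection with the given label, or None.

    detections: list of dicts in a hypothetical format:
        {"label": str, "score": float, "box": (x_min, y_min, x_max, y_max)}
    with box coordinates normalized to the range 0..1.
    """
    candidates = [d for d in detections
                  if d["label"] == label and d["score"] >= min_score]
    if not candidates:
        return None
    return max(candidates, key=lambda d: d["score"])


def box_center(box):
    """Center (x, y) of a normalized (x_min, y_min, x_max, y_max) box."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2, (y_min + y_max) / 2)


# Example: follow the person and ignore the dog and the low-confidence hit.
detections = [
    {"label": "dog", "score": 0.9, "box": (0.0, 0.5, 0.25, 1.0)},
    {"label": "person", "score": 0.75, "box": (0.25, 0.0, 0.75, 1.0)},
    {"label": "person", "score": 0.25, "box": (0.5, 0.5, 1.0, 1.0)},
]
target = pick_target(detections, "person")
center = box_center(target["box"]) if target else None
```

Changing the label argument is all it takes to follow a different kind of subject, which is the main flexibility this approach offers over a face-only cascade.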

An article contrasting these approaches:

General References

Tracking Products