Human tracking is a complex vision problem for computers to emulate.
We propose the use of a second camera to increase the robustness of tracking ability.
The system presented is capable of 3D tracking, which has potential in fields such as augmented reality.
The algorithm presented in this paper is designed to detect and track human movement in real time from 3D footage. We discuss techniques that show potential for a tracking system when combined with stereoscopic video capture, making use of the extra depth information contained in the footage. This information allows for the production of a robust and reliable system.
One of the major issues associated with this problem is computational expense. Real-time tracking systems that work from a single image have been designed for tasks such as video surveillance. To use 3D imagery, two separate images must be analysed and combined, and the human motion detected and extracted. The two images are recorded as if by a pair of eyes, with the cameras approximately 6-7 cm apart. The greatest benefit of this system is the additional image, which provides information that conventional systems cannot access, such as depth perception in the overlapping field of view of the cameras.
In this paper, we describe the motivation behind using 3D footage and the technical complexity of the problem. The system is shown tracking a human in indoor and outdoor scenes, with video output of the detected regions. This first prototype has further uses in the fields of motion capture, computer gaming and augmented reality.
Key words: 3D Image, human detection, human tracking, foreground detection.
Computer vision is a challenging field of computing in which the ability of an algorithm to produce a valid output is often not the only measure of success. One of the biggest problems in computer vision is frequently the computational cost of running the algorithm in real time. Real-time human tracking is a problem that has seen many advances in recent years. The majority of current systems use a single camera and so have no depth perception. The goal of the system presented is to take advantage of the depth perception gained by adding a second camera, with the two cameras spaced just under the interocular distance apart. Recent advances in 3D televisions and films have created demand, and with it an industrial requirement for further innovation.
Multiple-camera human tracking is not a new area of research, with many researchers and companies trying to find a robust system that is easy to set up. Typically, these systems are made up of multiple cameras that see the human from different viewpoints [Ama99]. A growing number of these systems use stereoscopic cameras at fixed locations, applying background subtraction techniques and creating a disparity map from the resulting image. Figure 1‑i shows how two cameras give an overlapping region in which a three-dimensional view exists, assuming webcams with a 90° field of view. All objects within the 3D region have a different parallax: closer objects have a larger parallax than distant objects. This knowledge is used in the creation of depth maps. Here a system is presented that uses this knowledge of differing parallax to detect a person who is close to the camera.
Figure 1‑i: Image showing the 3D region created by using two webcams
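The relationship between parallax and distance described above can be illustrated with a minimal sketch. It assumes an idealised pinhole model with two parallel, rectified cameras, which is a simplification and not necessarily the model used by the system in this paper. The 6.5 cm baseline is taken as the midpoint of the 6-7 cm spacing mentioned earlier, the 640-pixel image width is assumed to apply to each view, and all function names are hypothetical.

```python
import math


def focal_length_px(image_width_px, hfov_deg):
    """Focal length in pixels for an ideal pinhole camera (assumed model)."""
    return (image_width_px / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)


def depth_from_disparity(disparity_px, baseline_m, focal_px):
    """Distance to a point from its horizontal parallax (disparity) in pixels.

    Assumes parallel, rectified cameras: depth = focal * baseline / disparity.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity_px


def overlap_start_distance(baseline_m, hfov_deg):
    """Distance at which the two fields of view begin to overlap."""
    return (baseline_m / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)


if __name__ == "__main__":
    baseline = 0.065                      # assumed: midpoint of 6-7 cm
    focal = focal_length_px(640, 90.0)    # 90 degree field of view, 640 px wide
    print("overlap begins at", overlap_start_distance(baseline, 90.0), "m")
    for disparity in (5, 20, 80):
        # Larger disparity (parallax) corresponds to a closer object.
        print(disparity, "px ->", depth_from_disparity(disparity, baseline, focal), "m")
```

Under these assumptions depth is inversely proportional to disparity, so thresholding the disparity is one simple way to isolate a person standing close to the cameras.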
Human tracking is a complex vision problem for computers to simulate. Computers lack depth perception without the use of specialist equipment such as the Microsoft Kinect. The goal of the system developed here is to give the computer depth perception in the way humans have it, by placing two cameras eye distance apart.
The system developed here will be used in further work on augmented reality, where 3D analysis of a scene has the potential to create more robust systems than are currently available. In this scenario, users of the system can be shown frames generated separately for each eye.
Throughout this paper, all statistics reported are from tests performed on a virtual machine running Windows XP, with access to a single 3.4 GHz processor core and 1 GB of memory (369 MB used by the OS). The camera used is a stereoscopic camera recording at VGA resolution (640x480) at 60 fps (frames per second).
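To put the computational expense in context, the following sketch works out the real-time budget implied by these figures. It assumes that the 640x480 resolution applies to each of the two views and that every frame is processed, neither of which is stated explicitly here. The function names are hypothetical.

```python
def frame_budget_ms(fps):
    """Time available to process one frame (or stereo pair) in milliseconds."""
    return 1000.0 / fps


def pixels_per_second(width, height, cameras, fps):
    """Raw pixel throughput the system must handle."""
    return width * height * cameras * fps


def cycles_per_pixel(clock_hz, pixel_rate):
    """Upper bound on processor cycles available per pixel on a single core."""
    return clock_hz / pixel_rate


if __name__ == "__main__":
    rate = pixels_per_second(640, 480, cameras=2, fps=60)  # assumed: VGA per view
    print("budget per stereo pair (ms):", frame_budget_ms(60))
    print("pixels per second:", rate)
    print("cycles per pixel at 3.4 GHz:", cycles_per_pixel(3.4e9, rate))
```

The cycle count is an upper bound only, since it ignores memory access, virtualisation overhead and the operating system's own load.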