Augmented Reality
Gerald Nielson
Abstract
Augmented Reality (AR) is the technology of combining real-world images and/or video with computer-generated information. AR has the potential to improve lives in many ways by helping people learn, navigate, and search their environment. Until recently, the available technology limited how much meaningful development was possible in this field. With advancements in mobile phone technology, which now incorporates GPS data, a video camera, a compass, and an internet connection, the benefits of AR are becoming available to more and more people every day. Mobile AR allows users to integrate the information of the internet with their real lives. Many useful applications are currently available, each using different computer vision techniques and various approaches to development. The future of augmented reality looks promising, and as technology advances it may someday be an important part of many people’s lives.
What is Augmented Reality?
“Augmented” means amplified, improved, or enhanced, and “reality” means the world or state of things as they exist, or things that are actually experienced. Augmented reality, therefore, is simply the improvement or enhancement of a user’s reality. More formally, Augmented Reality is a live view of a physical, real environment whose components are enhanced or amplified by computer-generated sensory input such as sound, video, or GPS data [8]. There are many examples of this type of technology, and many people may not even know that they are using it. Television uses it indirectly to enhance a show or sporting event for its viewers. The NFL makes watching American football games easier and more enjoyable by overlaying a yellow first-down marker across the width of the field; no one at the field can see it, but everyone watching on television has this augmented view of the game. There are also many applications available for use with the internet. The increased use of mobile phones and other portable devices has made it easy for augmented reality to become a part of many people’s lives.
Types of Augmented Reality
There are two main types of augmented reality applications: marker-based and GPS-based. Marker-based applications use a marker that the camera searches for and then overlay that marker with content, graphics, or information. GPS-based applications use GPS information and other data from your mobile phone to provide relevant information about the environment around you. This information can easily be viewed through the camera in a smartphone.
Marker Based
Examples of this type of application include the USPS Priority Mail simulator and many advertisements in magazines. The USPS application uses your webcam and a marker printed from the USPS website. It then renders a 3-dimensional model of a shipping box over the marker so the user can determine whether the box will be large enough to ship the desired item. Some advertisements also give instructions on their websites that let users view 3-dimensional models of the advertised product. This allows a user to see a new car not only as a 2-dimensional picture, but also as a 3-D model that gives a better feel for what the car is like.
GPS Based
A number of mobile browsers for smartphones allow many useful augmented reality applications to be downloaded and used. Layar, Wikitude, and Junaio are just a few examples. Most of these applications are free to use and also provide ways to create your own. GPS-based applications use the coordinates from your phone to locate points of interest near you and display information such as the distance from your location to each point of interest, directions, contact information, and user ratings, all pulled from the internet.
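As a rough illustration of how a GPS-based application might compute the distance to nearby points of interest, the sketch below uses the haversine great-circle formula. The coordinates, POI names, and the helper names `haversine_m` and `nearby_pois` are hypothetical and are not taken from any of the browsers mentioned above.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearby_pois(user, pois, radius_m):
    """Return (name, distance) pairs within radius_m of the user, nearest first."""
    found = []
    for name, lat, lon in pois:
        d = haversine_m(user[0], user[1], lat, lon)
        if d <= radius_m:
            found.append((name, d))
    return sorted(found, key=lambda item: item[1])


# Hypothetical data: a user position and three made-up points of interest.
user = (40.0000, -75.0000)
pois = [
    ("Cafe", 40.0010, -75.0000),
    ("Museum", 40.0000, -75.0100),
    ("Airport", 40.5000, -75.0000),
]

for name, dist in nearby_pois(user, pois, 1000):
    print(f"{name}: {dist:.0f} m")
```

A real application would pull the POI list from a web service rather than a hard-coded list; the filtering and sorting step would stay the same.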
Computer Vision
Computer vision has great potential in augmented reality, but the problems it must solve depend on the type of application. Computer vision is mainly used for marker-based and marker-less applications, which involve detecting specific features in an image and then matching them to descriptions in a database. Because computer vision can rely on visual features that are naturally present to register the camera, it does not require engineering the environment [4] or using markers that have to be set up before running the application. To register the camera, i.e. to enable the application to recognize what the camera is viewing, the feature detection and matching process is used. This process relies on a marker or natural feature: the unique image or feature that specifies what information to display and where to place it.
Tracking by Detection vs. Recursive Tracking
Most of the current approaches to 3D tracking are based on what can be called recursive tracking. This method uses the result from the previous camera frame, meaning it searches each new frame based on what was found in the last one. This approach has a few problems. First, the system must either be initialized by hand or require the camera to be very close to a specified position. Second, it makes the system very fragile. If something goes wrong between two consecutive frames, for example due to a complete occlusion of the target object or a very fast motion, the system can be lost and must be re-initialized in the same fashion [4]. Recursive tracking may be more accurate, and it suits video tracking applications or image stitching, where the image being searched for was already found in the previous frame, so the system already knows what it is searching for. In tracking-by-detection approaches, by contrast, feature points are first extracted from incoming frames at run-time and matched against a database of feature points for which the 3D locations are known [4].
Figure 1: What kinds of features should be detected? [6]
Feature Detection and Matching
The feature detection and matching process involves several steps that make sure the image being viewed is detected properly and its corresponding match is found as quickly as possible. A simple augmented reality application uses markers that have already been registered with the application and have specific images linked to them for display at the marker position. Instead of using predetermined markers for object recognition, augmented reality applications can rely on natural features to register the camera. Natural features are simply parts of an object being viewed that can help identify it in the detection and matching process. The process has three steps: feature detection, feature description, and a final step that takes one of two forms, feature matching or feature tracking. A feature is a specific location or unique point in an image that is used to find a small set of corresponding locations in different images. In figure 1, some features have more distinguishing characteristics than others, such as the mountain edge compared to the point in the sky. The point in the sky does not have much change in color or intensity in any direction and would be very hard to match. Features could be specific points, straight lines, regions, or edges, and which one(s) to use depends on the application and technique. To give an overview of this process and some of its problems, it is described here using specific points as the features. These kinds of localized features are often called keypoint features or interest points (or even corners) and are often described by the appearance of patches of pixels surrounding the point location [6].
Feature Detection
This step of the process is referred to as the extraction phase: each image is searched for locations that are likely to match well in other images. As figure 1 shows, some features are more easily matched than others, so correspondences can be established faster and with greater success. Patches with large contrast changes (gradients) are easier to localize, and patches with gradients in at least two (significantly) different orientations are the easiest to localize [6]. The points detected in the sky will be very hard to localize and match because the surrounding pixels change very little. The points along the mountain edge have large contrast, which makes them more distinctive and easier to localize. In order to establish these matches, detectors and descriptors are used. A detector finds the points for which descriptors are created, and it needs to be repeatable, meaning the same feature needs to be detected in two or more different images of the same scene despite lighting and/or viewpoint changes.
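One common way to quantify "gradients in at least two different orientations" is the smaller eigenvalue of a patch's gradient auto-correlation matrix (the Shi-Tomasi criterion). The sketch below is only an illustration of that idea, not necessarily the detector used by the works cited here; the synthetic patches and the helper names `min_eigenvalue` and `make_patch` are assumptions made for the example.

```python
import math


def min_eigenvalue(patch):
    """Smaller eigenvalue of the gradient auto-correlation matrix of a patch.

    A large value means the patch has strong gradients in two different
    orientations, which makes it easy to localize.
    """
    h, w = len(patch), len(patch[0])
    sxx = sxy = syy = 0.0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Central differences approximate the image gradient.
            ix = (patch[y][x + 1] - patch[y][x - 1]) / 2.0
            iy = (patch[y + 1][x] - patch[y - 1][x]) / 2.0
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
    # Closed-form eigenvalues of the symmetric 2x2 matrix [[sxx, sxy], [sxy, syy]].
    trace = sxx + syy
    det = sxx * syy - sxy * sxy
    disc = math.sqrt(max(trace * trace / 4.0 - det, 0.0))
    return trace / 2.0 - disc


def make_patch(size, fn):
    """Build a size x size patch whose pixel at (x, y) is fn(x, y)."""
    return [[float(fn(x, y)) for x in range(size)] for y in range(size)]


# Three synthetic patches: uniform sky, a vertical edge, and a corner.
flat = make_patch(7, lambda x, y: 10)
edge = make_patch(7, lambda x, y: 100 if x >= 3 else 0)
corner = make_patch(7, lambda x, y: 100 if (x >= 3 and y >= 3) else 0)

for name, patch in (("flat", flat), ("edge", edge), ("corner", corner)):
    print(name, min_eigenvalue(patch))
```

The flat and edge patches score zero because they have gradients in at most one orientation, while the corner patch scores high, mirroring the sky point and the mountain edge in figure 1.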
Feature Description
A descriptor is a description of a distinctive point from the image, stored in the database, application, or service. The two main requirements for a good descriptor are distinctiveness, i.e. feature points corresponding to two different physical points result in different descriptors, and invariance to changes in viewpoint and view direction, illumination, and image noise [3]. Stable interest points are selected in the image during the detection step. In the description step, each interest point is represented by a feature vector, which is a description of the point. To obtain image information, image gradients are used. Image gradients give details on the directional change of the intensity or color in an image. To build the descriptor (figure 2), an oriented quadratic grid with 4x4 square sub-regions is laid over the interest point (left). For each square, the wavelet responses are computed. The 2x2 sub-divisions of each square correspond to the actual fields of the descriptor. These are the sums dx, |dx|, dy, and |dy|, computed relative to the orientation of the grid (right). In figure 3, the left image shows the case of a homogeneous region, where all values are relatively low. The middle image shows frequencies in the x direction: the value of |dx| is high, but all others remain low. If the intensity is gradually increasing in the x direction, both dx and |dx| are high [1].
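The behaviour of the four sub-region entries can be reproduced with a small sketch. It is a simplification under stated assumptions: plain forward differences stand in for the Haar wavelet responses of SURF [1], and the grid orientation and any weighting are ignored. The function name `subregion_entries` and the three test patches are made up for illustration.

```python
def subregion_entries(patch):
    """Return (sum dx, sum |dx|, sum dy, sum |dy|) for one sub-region.

    Forward differences are used here as a stand-in for wavelet responses.
    """
    h, w = len(patch), len(patch[0])
    sdx = sadx = sdy = sady = 0.0
    for y in range(h - 1):
        for x in range(w - 1):
            dx = patch[y][x + 1] - patch[y][x]
            dy = patch[y + 1][x] - patch[y][x]
            sdx += dx
            sadx += abs(dx)
            sdy += dy
            sady += abs(dy)
    return sdx, sadx, sdy, sady


# Three 5x5 intensity patterns corresponding to the cases in figure 3.
homogeneous = [[5.0] * 5 for _ in range(5)]
stripes = [[10.0 * (x % 2) for x in range(5)] for _ in range(5)]  # alternates in x
ramp = [[10.0 * x for x in range(5)] for _ in range(5)]           # increases in x

for name, patch in (("homogeneous", homogeneous), ("stripes", stripes), ("ramp", ramp)):
    print(name, subregion_entries(patch))
```

The striped patch yields a zero sum of dx but a large sum of |dx|, while the ramp yields large values for both, which is why storing the absolute sums alongside the signed sums makes the entries more distinctive.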
Figure 2: Wavelet responses and descriptors [1].
Figure 3: The descriptor entries of a sub-region represent the nature of the underlying intensity pattern. [1].
Figure 4: Benefits of Gravity Aligned Feature Descriptors
Gravity Aligned Feature Descriptors (GAFD)
Because images can be viewed from varying perspectives, for example under camera rotation, an image transformation is generally used to normalize the pixels in the region around a feature point. The descriptor is then computed based on this normalized region [3]. Including the gravity vector with the feature descriptor increases the speed and matching rate when detecting image features. In figure 4 above [3], the camera of the left mobile phone captures the window, and the four corners act as feature points for which a descriptor is computed. On the left, all four corners could be considered the same. Due to invariance to rotation, as schematically illustrated with the normalized regions of the features in the left column, an ideal feature descriptor would describe these features in exactly the same way, making them indistinguishable [3]. On the right, when the gravity vector is added to the descriptor, it is possible to determine which corner is being viewed. There are three different approaches to taking advantage of gravity in the descriptor computation process, which are depicted in figure 5 and described below.
Figure 5: (a) Each feature point has a local orientation Ol (red) and a global orientation Og (blue).
Regular feature descriptors with relative gravity orientation
Using regular feature descriptors with relative gravity orientation involves computing the feature descriptor as before, normalized by the local orientation (Ol), and additionally storing the relative global orientation (Ogl = Og - Ol), depicted in figure 5(b). The local orientation (Ol) of a feature is computed from the intensities of neighboring pixels, usually such that it provides the same normalized region at any viewpoint and view direction [3]. The relative orientation is stored as part of the descriptor and used to “rule out” the comparison of descriptors for features whose relative orientations (Ogl) are significantly different. The process becomes faster because only features with similar relative global orientation (Ogl) are compared against each other.
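The pruning step can be sketched as follows. This is a minimal illustration of the idea, not the implementation from [3]: each feature is reduced to a hypothetical (Og, Ol) pair in radians, the 10-degree tolerance is an arbitrary choice, and the function names are invented for the example.

```python
import math


def wrap_angle(a):
    """Wrap an angle in radians to the interval [-pi, pi]."""
    return math.atan2(math.sin(a), math.cos(a))


def relative_gravity_orientation(o_g, o_l):
    """Ogl = Og - Ol, wrapped so that angles can be compared safely."""
    return wrap_angle(o_g - o_l)


def candidate_pairs(query, database, max_diff):
    """Index pairs (i, j) whose relative orientations differ by at most max_diff.

    Only these pairs would go on to the more expensive descriptor comparison.
    """
    pairs = []
    for i, (og_q, ol_q) in enumerate(query):
        ogl_q = relative_gravity_orientation(og_q, ol_q)
        for j, (og_d, ol_d) in enumerate(database):
            ogl_d = relative_gravity_orientation(og_d, ol_d)
            if abs(wrap_angle(ogl_q - ogl_d)) <= max_diff:
                pairs.append((i, j))
    return pairs


def deg(*values):
    """Convert an (Og, Ol) pair given in degrees to radians."""
    return tuple(math.radians(v) for v in values)


# Hypothetical window corners: gravity points down (-90 degrees) in the reference
# image, and each corner has a different local orientation.
database = [deg(-90, 45), deg(-90, 135), deg(-90, -135), deg(-90, -45)]
# The same corners seen by a camera rotated by 30 degrees: both angles shift.
query = [deg(-60, 75), deg(-60, 165), deg(-60, -105), deg(-60, -15)]

print(candidate_pairs(query, database, math.radians(10)))
```

Because camera rotation shifts Og and Ol by the same amount, their difference stays constant, so each query corner is paired only with its true counterpart and the other twelve descriptor comparisons are skipped.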
Gravity aligned feature descriptors
Another possibility is to use the global orientation in place of the local orientation, which is shown in figure 5(c). This approach improves matching results. It also saves time because the local orientation does not need to be computed.
Gravity aligned feature descriptors with relative local orientation
If the orientation of gravity (Og) is accurate, Og can be used for the normalization, and the relative global orientation (Ogl) is stored with every feature for use in the matching process. If there is more than one dominant local orientation (Ol), these are all stored relative to gravity. In figure 5 (c), only the descriptors of features with one or more matching relative orientations (Ogl) need to be compared, which makes the process faster and more accurate.
Feature Matching
Once features and their descriptors have been extracted from two or more images, the next step is to establish feature matches between these images. The approach we take depends partially on the application, e.g., different strategies may be preferable for matching images that are known to overlap (e.g., in image stitching) vs. images that may have no correspondence whatsoever (e.g., when trying to recognize objects from a database) [6]. This step can be split into two parts. First, select a matching strategy based on the application. Then devise efficient ways to perform the matching as quickly as possible. The simplest method is to set a threshold (maximum distance) and return only the matches from other images that are within this distance. Setting the threshold too high results in too many false positives, i.e., incorrect matches being returned. Setting the threshold too low results in too many false negatives, i.e., too many correct matches being missed [6]. (Figure 4) The threshold can be evaluated by counting the false positives (incorrect matches accepted), false negatives (correct matches not detected), true positives (correct matches accepted), and true negatives (non-matches correctly rejected), and then choosing the threshold with the best match rate based on these numbers.
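The threshold evaluation described above can be sketched in a few lines. The descriptor distances and ground-truth labels below are invented for illustration, and accuracy, (TP + TN) / total, is only one of several reasonable ways to define the "best match rate".

```python
def confusion_counts(distances, is_correct, threshold):
    """Count (TP, FP, FN, TN) when accepting candidates with distance <= threshold."""
    tp = fp = fn = tn = 0
    for d, correct in zip(distances, is_correct):
        accepted = d <= threshold
        if accepted and correct:
            tp += 1      # correct match accepted
        elif accepted and not correct:
            fp += 1      # incorrect match accepted
        elif not accepted and correct:
            fn += 1      # correct match missed
        else:
            tn += 1      # non-match correctly rejected
    return tp, fp, fn, tn


def best_threshold(distances, is_correct, candidates):
    """Pick the candidate threshold with the highest accuracy."""
    def accuracy(t):
        tp, fp, fn, tn = confusion_counts(distances, is_correct, t)
        return (tp + tn) / len(distances)
    return max(candidates, key=accuracy)


# Hypothetical descriptor distances with hand-labelled ground truth.
distances = [0.10, 0.20, 0.25, 0.40, 0.55, 0.60, 0.80, 0.90]
is_correct = [True, True, True, False, True, False, False, False]
candidates = [0.1, 0.3, 0.5, 0.7, 0.9]

for t in candidates:
    print(t, confusion_counts(distances, is_correct, t))
print("best:", best_threshold(distances, is_correct, candidates))
```

With this data a low threshold misses correct matches (false negatives), a high threshold accepts wrong ones (false positives), and the best trade-off lies in between.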
Feature Tracking
The feature tracking step is an alternative to the feature matching step. Rather than independently finding features in all possible images and then matching them, feature tracking locates a set of likely feature locations in a first image and then searches for their matching locations in the following images. This kind of detect-then-track approach is more widely used for video tracking applications, where the amount of motion and appearance deformation between adjacent frames is expected to be small (or at least bounded) [6].
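A minimal sketch of the detect-then-track idea is shown below, assuming motion between frames is bounded by a small search radius. It compares a patch from the previous frame against nearby positions in the next frame using the sum of squared differences (SSD), which is one simple similarity measure among many. The frames, patch values, and function names are made up for the example.

```python
def ssd(frame, patch, top, left):
    """Sum of squared differences between patch and the frame region at (top, left)."""
    total = 0.0
    for dy, row in enumerate(patch):
        for dx, value in enumerate(row):
            diff = frame[top + dy][left + dx] - value
            total += diff * diff
    return total


def track(prev_frame, next_frame, top, left, size, radius):
    """Find where the size x size patch at (top, left) in prev_frame moved to.

    Only positions within +/- radius pixels of the old location are searched.
    """
    patch = [row[left:left + size] for row in prev_frame[top:top + size]]
    h, w = len(next_frame), len(next_frame[0])
    best = None
    for t in range(max(0, top - radius), min(h - size, top + radius) + 1):
        for l in range(max(0, left - radius), min(w - size, left + radius) + 1):
            score = ssd(next_frame, patch, t, l)
            if best is None or score < best[0]:
                best = (score, t, l)
    return best[1], best[2]


def make_frame(size, top, left):
    """Blank frame with a distinctive 2x2 feature placed at (top, left)."""
    frame = [[0.0] * size for _ in range(size)]
    blob = [[9.0, 5.0], [3.0, 7.0]]
    for dy in range(2):
        for dx in range(2):
            frame[top + dy][left + dx] = blob[dy][dx]
    return frame


prev_frame = make_frame(8, 2, 2)   # feature detected at row 2, column 2
next_frame = make_frame(8, 3, 4)   # feature has moved by (+1, +2)

print(track(prev_frame, next_frame, top=2, left=2, size=2, radius=2))
```

If the feature moves farther than the search radius, for example during very fast motion, this search cannot find it, which is the fragility of frame-to-frame tracking noted earlier.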
Augmented Reality Development
There are many tools for developing augmented reality applications, both GPS-based and marker-based or marker-less. The development of GPS-based applications has become more relevant recently because of the growth in the number of people using smartphones. These phones have everything needed to create augmented reality. Almost all of them have a camera to capture what is being viewed, GPS services to identify the user’s location and nearby points of interest, a compass to determine the direction the user is facing, an accelerometer to determine the orientation of the phone, and an internet connection to provide relevant information. Approximately one third of American adults own smartphones, which makes for a very large market for augmented reality applications. Development of marker-based applications is also fairly simple and can be done in many different ways.
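To show how the GPS and compass readings might be combined, the sketch below computes the compass bearing from the user to a point of interest and maps it to a horizontal screen position. It is a simplified illustration: the linear angle-to-pixel mapping, the 60-degree field of view, and the function names are assumptions, and the accelerometer (phone tilt) is ignored.

```python
import math


def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial compass bearing in degrees (0 = north, 90 = east) from point 1 to point 2."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlam = math.radians(lon2 - lon1)
    y = math.sin(dlam) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlam)
    return math.degrees(math.atan2(y, x)) % 360.0


def screen_x(heading_deg, poi_bearing_deg, fov_deg, screen_width_px):
    """Horizontal pixel position of a POI label, or None if it is outside the view."""
    # Signed angle from the viewing direction to the POI, in [-180, 180).
    offset = (poi_bearing_deg - heading_deg + 180.0) % 360.0 - 180.0
    if abs(offset) > fov_deg / 2.0:
        return None
    return (offset / fov_deg + 0.5) * screen_width_px


# Hypothetical reading: user at (40, -75) facing north, POI slightly to the north-east.
heading = 0.0
poi_bearing = bearing_deg(40.0, -75.0, 40.01, -74.998)
print(round(poi_bearing, 1), screen_x(heading, poi_bearing, 60.0, 480))
```

A POI directly ahead lands in the middle of the screen, and one behind the user returns None so that no label is drawn.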
Marker-Based Development
There are many free software tools for creating your own marker-based augmented reality application. FLARToolKit is an ActionScript 3 (AS3) port of ARToolKit, based on NyARToolkit, which is the Java port of ARToolKit. ARToolKit is a software library with classes that use computer vision algorithms to help create augmented reality applications. You can create your own 3-dimensional models to display, and helper classes for the major Flash 3D engines are included. There are also many free Flash development environments available. FlashDevelop is one that I found helpful and easy to use.
GPS based Development
Augmented reality browsers are available for smartphones and have a large number of applications available. Browsers such as Layar, Junaio, and Wikitude all offer free ways of creating your own application for use through their specific browser. The steps involved in making your own Layar application are:
- Sign up to be a developer; it is free to join and publish on Layar’s site.
- Define and edit a layer on the publishing site.
- Prepare the database that stores the POI (point of interest) information.
- Gather the POI information; Google Maps can be used to get the GPS coordinates, but other sources can be used as well.
- Build a web service to fetch the POI information, which needs to be formatted in JSON (JavaScript Object Notation).
- Test the layer.
- Publish the layer.
These steps are fairly straightforward and the site also offers a forum and documentation for any questions you have or any issues you may encounter in the process.
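The web service step amounts to turning database rows into a JSON document. The sketch below shows only that serialization, using Python's standard json module. The field names ("layer", "pois", "title", and so on), the layer name, and the sample rows are placeholders and are not the schema Layar actually requires; the developer documentation [7] defines the real response format.

```python
import json

# Hypothetical POI rows as they might come out of the database.
POI_ROWS = [
    {"id": 1, "title": "Campus Library", "lat": 44.0001, "lon": -92.0002},
    {"id": 2, "title": "Coffee Shop", "lat": 44.0010, "lon": -92.0015},
]


def build_response(layer_name, rows):
    """Serialize POI rows to a JSON string.

    The field names are placeholders, not the schema that Layar requires.
    """
    payload = {
        "layer": layer_name,
        "pois": [
            {"id": str(r["id"]), "title": r["title"], "lat": r["lat"], "lon": r["lon"]}
            for r in rows
        ],
    }
    return json.dumps(payload, indent=2)


print(build_response("examplelayer", POI_ROWS))
```

In a deployed service this string would be returned as the body of an HTTP response, with the rows filtered by the location the browser sends in its request.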
The future of Augmented Reality
The field of augmented reality has a very bright future, especially given the fast rate of development in mobile phone technology. Google has plans to sell heads-up-display (HUD) glasses by the end of 2012, allowing the wearer to have information streamed to their eyes in real time. Contact lenses with electronic displays are also being developed. Some augmented reality applications are already useful, and it is likely that more will become useful in the future. The tools that enable us to use this type of technology are getting better and less expensive every day, making it more available to the everyday user.
Advertising
Advertising is an area of augmented reality that has not been fully tapped yet. Some matters, such as the laws and regulations that govern advertising, are not yet defined in the augmented reality world. There is also a lot of potential for marketers to find new ways of attracting people to products and creating new types of engaging advertisements.
Virtual Air Rights
Augmented reality has created a new ‘realm’ of advertising, and with it a need for new laws and regulations. As of right now, these issues are not solved, and applications could be violating copyright laws without any penalty. For example, consider a mobile browser application that uses GPS coordinates from your phone to display advertisements directing you towards certain types of businesses. The “virtual advertising space” currently has no regulations or laws to keep one party from interfering with another company’s advertisement. In the ‘real world’, real estate developers pay a lot of money to secure air rights for the empty space above their buildings. Besides monetizing that space by building up (as opposed to out) in crowded areas like Manhattan, they also get to dictate what advertisements appear in the air that they control. Augmented reality has made it possible for this same type of advertising to exist on your smartphone. Multiple apps can show ads on your mobile screen as mini virtual billboards linked to GPS coordinates and points of interest. Currently there is nothing stopping, for example, a Layar application from displaying a “Coca-Cola” ad where a Wikitude application would otherwise have displayed a “Pepsi” ad in the same virtual air space. There is also an issue when applications manipulate a company’s logo as viewed through a mobile phone. Nothing currently stops someone from making an app that displays an unflattering image over a company’s logo. The issue is that it is ‘your phone’, but the company’s logo is protected by copyright law from being altered by anyone.
Conclusion
Until recently, the limits of technology slowed both the progress of and the desire for augmented reality development. With advancements in mobile phone technology, which now incorporates GPS data, a video camera, a compass, and an internet connection, the benefits of AR are becoming available to more and more people every day. Mobile AR allows users to integrate the information of the internet with their real lives, making access to a large amount of relevant data quick and easy wherever they may be. Many useful applications are currently available, each using different computer vision techniques depending on its purpose. These techniques are constantly being improved, which should lead to more relevant and enhanced applications. The outlook for augmented reality is optimistic, and it may someday be an important part of many people’s lives.
References
[1] Bay, Herbert, Andreas Ess, Tinne Tuytelaars, and Luc Van Gool. "Speeded-Up Robust Features (SURF)." Web. 2012.
[2] Frommelt, Daniel M. "Augmented Reality." Augmented Reality. 2009. Web. 21 Jan. 2012.
[3] Kurz, Daniel, and Selim Benhimane. "Inertial Sensor-aligned Visual Feature Descriptors." Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2011. 161-66. Http://da.nielkurz.de/. Web. 4 Feb. 2012.
[4] Lepetit, Vincent. On Computer Vision for Augmented Reality. Tech. Web. 4 Feb. 2012.
[5] "Re: FLARTOOLKIT/FLASH AUGMENTED REALITY GETTING STARTED." Web log comment. Mikko Haapoja's Blog. 2008. Web. 2012.
[6] Szeliski, Richard. "Feature Detection and Matching." Computer Vision: Algorithms and Applications. 205-35. Http://szeliski.org/Book/. 3 Sept. 2010. Web. 11 Feb. 2012.
[7] "Developers." Layar. Web. 11 Feb. 2012.
[8] Wikipedia Editors. "Augmented Reality." Wikipedia. Web. 4 Feb. 2012. http://en.wikipedia.org/wiki/Augmented_reality.