A seminar report submitted by Nidish Kumar R V, RA1911003020205


APPENDIX 3 (A typical specimen of table of contents)




TABLE OF CONTENTS

CHAPTER NO.  TITLE                                                                 PAGE

             ABSTRACT                                                              iii
             LIST OF FIGURES                                                       vi
             LIST OF TABLES                                                        vii
             LIST OF SYMBOLS                                                       viii

1            INTRODUCTION
             1.1 INTRODUCTION                                                      9
             1.2 DRAWBACKS                                                         9
             1.3 CHALLENGES                                                        10
             1.4 SCOPE OF WORK                                                     10

2            LITERATURE SURVEY
             2.1 Real-Time Embedded Traffic Sign Recognition Using Efficient
                 Convolutional Neural Network                                      12
             2.2 A Cascaded R-CNN With Multiscale Attention and Imbalanced
                 Samples for Traffic Sign Detection                                14
             2.3 Image Recognition and Safety Risk Assessment of Traffic Sign
                 Based on Deep Convolution Neural Network                          15
             2.4 MR-CNN: A Multi-Scale Region-Based Convolutional Neural
                 Network for Small Traffic Sign Recognition                        16
             2.5 Automatic Recognition of Traffic Signs Based on Visual
                 Inspection                                                        19

3            METHODOLOGY
             3.1 REGION EXTRACTION                                                 21
             3.2 HOG FEATURE                                                       22
             3.2.1 Preprocessing                                                   22
             3.2.2 HOG calculation                                                 23
             3.2.3 Cell division                                                   24
             3.2.4 Normalization of 16x16 blocks                                   25
             3.2.5 Calculation of HOG feature of gradient direction                26
             3.3 STACKED CAPSNET                                                   26
             3.4 DATA AUGMENTATION                                                 29

4            EXPERIMENTS
             4.1 DATASET                                                           30
             4.2 EXPERIMENTAL SETTINGS                                             30
             4.3 RESULTS                                                           31

5            CONCLUSION                                                            34

6            REFERENCES                                                            36


LIST OF FIGURES

Figure 1. (a) ENet (Mixed Kernel). The network structure of step 6. The mixed kernel reduced the loss and improved the accuracy, although the prediction time increased. (b) ENet-V1. The network structure of step 7. The shortcut reduced the loss and improved the accuracy, although the prediction time increased. This structure, called ENet-V1, is the most accurate model. (c) ENet-V2. The network structure of step 8. The depthwise separable convolutions markedly reduced the prediction time, although the loss increased and the accuracy dropped. This structure, called ENet-V2, is the most efficient model.

Figure 2. Illustration of the eight steps of building ENet. The dotted box represents an unselected option. 

Figure 3. Cascaded R-CNN with multiscale attention and imbalanced samples.

Figure 4. Multiscale attention.

Figure 5. The compositions of RNN and LSTM network structure.

Figure 6. Changes in the safety risks of traffic accidents when the vehicle speed decreases.

Figure 7. To enhance the recognition accuracy of small traffic signs, they constructed a multi-scale fused feature map using deconvolution in the detection stage and leveraged contextual information for the given region proposals in the classification stage.

Figure 8. Different types of traffic signs.

Figure 9. A stacked CAPSNET.

Figure 10.

Figure 11. Gradient magnitude map (left: x direction; middle: y direction; right: gradient magnitude).

Figure 12. Each RGB cell and its gradient (left) [30]; gradient magnitude and direction (right).

Figure 13. The visualized HOG feature.

Figure 14. The stacked model.

Figure 15.

Figure 16. (a) Original and preprocessed images; (b) 68 errors of our model.

Figure 17. The radar chart of five performance indices.

LIST OF TABLES

Table 1. Comparison of the Recognition Performance of Five Methods on 45 Categories. (in %)

Table 2. Detection Performance Regarding Different Feature Maps for 500 Region Proposals (IoU=0.5)

Table 3. The Performance of Each Model.

Table 4. The Recognition Accuracy of Each Technique in Our Model.

LIST OF SYMBOLS AND ABBREVIATIONS

ROI – Region of Interest

HOG – Histogram of Oriented Gradients

CAPSNET – Capsule Neural Network

CNN – Convolutional Neural Network

SVM – Support Vector Machine

R-FCN – Region-Based Fully Convolutional Network

DSC – Depthwise Separable Convolution

RSE – Root Square Error

MR-CNN – Multiscale Region-Based Convolutional Neural Network

CHAPTER 1

INTRODUCTION

1.1 INTRODUCTION

Traffic sign recognition is a research hotspot in the application of visual navigation and computer vision to intelligent driving. Under multiple constraints, traffic sign recognition must achieve various goals with high accuracy through complex implementation methods, and even a minor classification error of traffic signs can bring disastrous consequences. In automatic driving, most targets, including traffic lights, routes, special vehicles, and the gestures of traffic police, are recognized by cameras or vehicle-to-everything (V2X) communication; radar, by contrast, is intrinsically unable to identify signals such as speed limits and stop signs. Cameras are installed on the dashboards of many autonomous vehicles and driver assistance systems, and are used to capture real-time images or videos that feed the machine learning model of the car system. The deep learning algorithm of the model must be robust and reliable, so that the model can capture traffic signs in different directions and poses. After all, the speed and geographic location change continuously as the vehicle drives through different environments and lighting conditions.



1.2 DRAWBACKS

However, traditional traffic sign recognition algorithms are largely task-driven, relying on color detection, shape recognition, and machine learning, and most are applied only in fully or semi-enclosed environments such as expressways. Even the most popular traffic sign recognition algorithm, the convolutional neural network (CNN), cannot effectively capture traffic sign features such as pose, angle, and direction, owing to a defect in the max-pooling layer. On the software side, image quality is degraded when images are collected or transmitted on a computer. In addition, the quality of images collected by the imaging sensor varies greatly: it is generally poor on rainy or foggy days, at night, and under very low light.



1.3 CHALLENGES

CNN-based machine learning models are unable to cope with the above challenges. It is therefore urgent to bring traffic signs into the hierarchical contours of computer vision and to improve the accuracy and stability of traffic sign recognition. When observing scene images, attention should be focused on the targets or regions of interest (ROIs). These targets or regions must carry striking visual features, such as edge contour, detailed texture, color gradient direction, color intensity, and spatial location.



1.4 SCOPE OF WORK

On this basis, this paper designs a method to extract candidate regions from traffic sign images through content analysis and key information acquisition. In complex scenes, the method can extract salient foreground targets with universal significance from the input image and recognize multiple traffic sign images. However, the input images must meet two requirements:



  1. Each image must suit the perception mechanism of human eyes; that is, the visual target area must be clearly distinct from the background.

  2. The collaborative visual targets in multiple images must have obvious similarities.

Angle is a thorny issue in image recognition, especially when there are illumination changes and occlusions; in such cases, it is difficult to obtain useful features for classification. Suppose our goal is to design a detector for the buttons on shirts or jackets, which are usually round (or oval in images) with several holes. Through edge detection, it is easy to judge whether a target in the image is a button based on its edges. In this example, edge information is useful, while color information is not. The useful features should also be discriminable: good features extracted from an image should differentiate buttons from other round objects (e.g., coins and wheels). Thus, the Histogram of Oriented Gradients (HOG) was designed for the target image, and the distribution of gradient directions was treated as a feature, in order to solve the projection distortion in image recognition. The gradient of the image (its derivatives in the x and y directions) is very useful, because the edges and corners of the image (regions where the intensity changes sharply) have large magnitudes. Compared with other areas on the same plane, edges and corners contain more information about the shape of the object. To recognize images across weather changes, a data augmentation method is used.
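The gradient computation described above can be sketched as follows. This is a minimal, illustrative numpy version of the first HOG step (per-pixel gradient magnitude and direction), not the report's actual implementation; the function name and the toy image are assumptions.

```python
import numpy as np

def gradient_magnitude_direction(image):
    """Compute per-pixel gradient magnitude and direction (degrees) for a
    grayscale image, the first step of the HOG feature described above."""
    img = image.astype(np.float64)
    # Central-difference derivatives in x and y, using the simple
    # [-1, 0, 1] kernel commonly associated with HOG.
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]
    gy[1:-1, :] = img[2:, :] - img[:-2, :]
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation in [0, 180), the usual HOG convention.
    direction = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    return magnitude, direction

# A toy vertical edge: intensity jumps along x, so gx dominates and the
# gradient points horizontally (0 degrees) with large magnitude.
img = np.zeros((4, 4))
img[:, 2:] = 255.0
mag, ang = gradient_magnitude_direction(img)
```

Edges and corners then show up exactly as the regions with large magnitude, which is why they carry most of the shape information.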

To sum up, the traditional CNN cannot effectively recognize traffic signs in images taken under different environments, illuminations, speeds, positions, poses, angles, or directions. This paper puts forward a CapsNet-based traffic sign learning system. The neural system can dynamically capture the poses and directions of vehicles on the road, effectively identify traffic signs at different angles and directions, and give case descriptions of traffic signs. The main contributions of this paper are as follows:



  1. To improve visual inspection effect, we designed a method to extract candidate regions from the input image through content analysis and key information recognition.

  2. To prevent projection distortion, we developed an HOG method for actual images.

  3. To identify traffic sign images during weather changes, we developed a data augmentation method.

  4. A CapsNet-based traffic sign learning system was created to effectively capture the poses and directions of traffic signs.

  5. Through repeated experiments, our method proved better than the traditional CNN, the support vector machine (SVM), and the region-based fully convolutional network (R-FCN) ResNet 101 in traffic sign recognition.
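The weather-oriented data augmentation in contribution 3 can be illustrated with a small sketch. The exact transformations the report uses are not specified here, so the brightness range and noise level below are assumptions chosen only to mimic lighting and sensor degradation.

```python
import numpy as np

def augment(image, rng):
    """Return a randomly perturbed copy of an image, mimicking lighting and
    weather variation (illustrative sketch, not the authors' exact pipeline)."""
    img = image.astype(np.float64)
    # Random global brightness change (dark night vs. bright day).
    img *= rng.uniform(0.5, 1.5)
    # Additive Gaussian noise, standing in for sensor noise on rainy/foggy days.
    img += rng.normal(0.0, 10.0, size=img.shape)
    return np.clip(img, 0.0, 255.0).astype(np.uint8)

rng = np.random.default_rng(0)
sign = np.full((32, 32, 3), 128, dtype=np.uint8)   # hypothetical 32x32 RGB sign
batch = [augment(sign, rng) for _ in range(4)]     # four augmented variants
```

Note that horizontal flipping, a common augmentation elsewhere, is deliberately omitted: many traffic signs are not symmetric, so a flipped sign may change meaning.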


CHAPTER 2

LITERATURE SURVEY

2.1 REAL-TIME EMBEDDED TRAFFIC SIGN RECOGNITION USING EFFICIENT CONVOLUTIONAL NEURAL NETWORK

In this paper, they introduced a new efficient traffic sign classification (TSC) network called ENet (efficient network) and a traffic sign detection (TSD) network called EmdNet (efficient network using multiscale operation and depthwise separable convolution). They used data mining and multiscale operation to improve the accuracy and generalization ability, and used depthwise separable convolution (DSC) to improve the speed. The resulting ENet possesses 0.9M parameters (1/15 the parameters of the state-of-the-art method) while still achieving an accuracy of 98.6% on the German Traffic Sign Recognition Benchmark (GTSRB). In addition, they designed EmdNet's backbone network according to the principles of ENet. EmdNet with the SSD framework possesses only 6.3M parameters, which is similar to MobileNet's scale.
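The parameter savings from depthwise separable convolution can be checked with simple arithmetic: a standard k x k convolution mixes all input channels into every output channel, while DSC splits this into a per-channel k x k filter plus a 1 x 1 pointwise mix. The layer sizes below are assumptions chosen only for illustration.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution layer (biases ignored)."""
    return k * k * c_in * c_out

def dsc_params(c_in, c_out, k):
    """Depthwise separable convolution: k x k depthwise + 1 x 1 pointwise."""
    return k * k * c_in + c_in * c_out

# Hypothetical layer: 128 input channels, 128 output channels, 3x3 kernel.
standard = conv_params(128, 128, 3)   # 9 * 128 * 128 = 147456
separable = dsc_params(128, 128, 3)   # 1152 + 16384  = 17536
ratio = standard / separable          # roughly 8.4x fewer parameters
```

This is the mechanism behind the ENet/EmdNet parameter counts quoted above: DSC trades a small accuracy loss for a large reduction in weights and prediction time.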

In order to improve the efficiency of the algorithm, they analysed the characteristics of the dataset, which is itself a kind of data mining. They divided the validation set and conducted data augmentation more reasonably and effectively based on this analysis.





Figure 1. (a) ENet (Mixed Kernel). The network structure of step 6. The mixed kernel reduced the loss and improved the accuracy, although the prediction time increased. (b) ENet-V1. The network structure of step 7. The shortcut reduced the loss and improved the accuracy, although the prediction time increased. This structure, called ENet-V1, is the most accurate model. (c) ENet-V2. The network structure of step 8. The depthwise separable convolutions markedly reduced the prediction time, although the loss increased and the accuracy dropped. This structure, called ENet-V2, is the most efficient model.



Figure 2. Illustration of the eight steps of building ENet. The dotted box represents an unselected option. 

In this paper they showed the experimental process of building ENet's structure. The resulting ENet possesses 0.9M parameters (1/15 the parameters of the state-of-the-art method) while still achieving an accuracy of 98.6% on the GTSRB. Furthermore, they designed EmdNet's backbone network according to the principles of ENet. The resulting EmdNet, which with the SSD framework is similar in scale to MobileNet, possesses 6.3M parameters. These experimental results show that an efficient neural network architecture with adequate accuracy, generalization, and speed can be designed for real-time embedded traffic sign recognition.



While these efforts are effective in recognizing traffic signs based on graphical methods, the solutions do not work well in complex scenarios (e.g., low light, partially obscured signs) and are particularly ineffective at recognizing signs with different orientations or viewpoints.

2.2 A CASCADED R-CNN WITH MULTISCALE ATTENTION AND IMBALANCED SAMPLES FOR TRAFFIC SIGN DETECTION

To reduce missed detections and false detections, in this paper they proposed a cascaded R-CNN to obtain multiscale features in pyramids. Each layer of the cascaded network, except the first, fuses the output bounding box of the previous layer for joint training, which contributes to traffic sign detection. They then proposed a multiscale attention method that obtains weighted multiscale features by dot-product and softmax; the weighted features are summed to refine the representation, highlight traffic sign features, and improve detection accuracy. Finally, they increased the number of difficult negative samples for dataset balance and data augmentation in training, to relieve the interference from complex environments and similar false traffic signs. The data augmentation method expands the German traffic sign training dataset by simulating complex environmental changes. They conducted numerous experiments to verify the effectiveness of their proposed algorithm. The accuracy and recall rate of their method were 98.7% and 90.5% on GTSDB, 99.7% and 83.62% on CCTSDB, and 98.9% and 85.6% on the LISA dataset, respectively.
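The dot-product-and-softmax fusion can be sketched at vector level. This is one plausible reading of the paper's multiscale attention, not its exact implementation; the query vector and toy scale features are assumptions.

```python
import numpy as np

def softmax(x, axis=0):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multiscale_attention(features, query):
    """Weight per-scale feature vectors by the softmax of their dot-product
    similarity with a query vector, then sum them (illustrative sketch of
    the dot-product-and-softmax fusion described above)."""
    feats = np.stack(features)    # (num_scales, channels)
    scores = feats @ query        # one similarity score per scale
    weights = softmax(scores)     # normalized attention weights
    return weights @ feats        # weighted sum over scales, (channels,)

# Three pooled feature vectors from three pyramid scales (toy values).
scales = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]
fused = multiscale_attention(scales, query=np.array([1.0, 1.0]))
```

Scales that agree with the query receive larger weights, so the fused feature emphasizes the pyramid levels where the sign is most salient.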



Figure 3. Cascaded R-CNN with multiscale attention and imbalanced samples.



Figure 4. Multiscale attention.

Finally, to alleviate the interference of environmental factors and improve detection accuracy, they increased the number of hard negative samples during the training stage and expanded the GTSDB training dataset by generating real-world pictures containing traffic signs under conditions such as lighting and weather changes.

2.3 IMAGE RECOGNITION AND SAFETY RISK ASSESSMENT OF TRAFFIC SIGN BASED ON DEEP CONVOLUTION NEURAL NETWORK

In this paper, a dual-path deep CNN (TDCNN) traffic sign recognition model is first built based on the convolutional neural network (CNN), and the cost function and recognition accuracy are selected as indicators to analyze the model's training results. Second, the recurrent neural network (RNN) and the long short-term memory (LSTM) RNN are used to assess road traffic safety risks, and their prediction and evaluation performance is compared. Finally, changes in the safety risks of road traffic accidents are analyzed based on two key influencing factors: the number of road intersections and vehicle speed. The results show that the learning rate of the network model and the number of hidden neurons in the fully connected layer directly affect the training results, and the best choices differ between the early and late training periods. Compared with the RNN, the LSTM network model has higher evaluation accuracy, with a corresponding root square error (RSE) of 0.36. Rational control of the number of intersections and of vehicle speed has a significant impact on improving the safety level and promoting road traffic efficiency. The VR image recognition algorithm and the safety risk prediction method based on a neural network model positively affect the construction of an intelligent transport network.
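The reason LSTM outperforms a plain RNN here is its gated cell state, which can carry information over long horizons. A single LSTM time step can be sketched in numpy as below; the sizes and zero-initialized weights are assumptions chosen to keep the example deterministic, not values from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One LSTM time step. W: (4H, D) input weights, U: (4H, H) recurrent
    weights, b: (4H,) bias, with the four gate blocks stacked in order
    input / forget / output / candidate."""
    z = W @ x + U @ h + b
    H = h.shape[0]
    i = sigmoid(z[0:H])        # input gate: how much new content to write
    f = sigmoid(z[H:2*H])      # forget gate: how much old state to keep
    o = sigmoid(z[2*H:3*H])    # output gate: how much state to expose
    g = np.tanh(z[3*H:4*H])    # candidate cell update
    c_new = f * c + i * g      # gated cell state carries long-term memory
    h_new = o * np.tanh(c_new)
    return h_new, c_new

# Toy sizes: 3 input features, hidden size 2; zero weights give a
# deterministic first step.
D, H = 3, 2
h, c = lstm_step(np.ones(D), np.zeros(H), np.zeros(H),
                 np.zeros((4*H, D)), np.zeros((4*H, H)), np.zeros(4*H))
```

In a plain RNN the hidden state is overwritten at every step; the additive update `c_new = f * c + i * g` is what lets gradients and risk-relevant history survive over many time steps.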



Figure 5. The compositions of RNN and LSTM network structure.



Figure 6. Changes in the safety risks of traffic accidents when the vehicle speed decreases.

2.4 MR-CNN: A MULTI-SCALE REGION-BASED CONVOLUTIONAL NEURAL NETWORK FOR SMALL TRAFFIC SIGN RECOGNITION

In this paper, they proposed the multiscale region-based convolutional neural network (MR-CNN). At the detection stage, MR-CNN uses a multiscale deconvolution operation to up-sample the features of the deeper convolution layers and concatenates them with those of the shallow layer to construct the fused feature map. The fused feature map can generate fewer region proposals while achieving higher recall values. At the classification stage, they leveraged multi-scale contextual regions to exploit the information surrounding a given object proposal and construct the fused feature for the fully connected layers. Inside the region proposal network (RPN), the fused feature map focuses primarily on improving image resolution and semantic information for small traffic sign detection, while outside the RPN, the fused feature enhances the feature representation by leveraging contextual information. Finally, they evaluated MR-CNN on the largest dataset suitable for their problem, Tsinghua-Tencent 100K, which is more challenging than the GTSDB and GTSRB datasets. The final experimental results indicate that MR-CNN is superior at detecting small traffic signs and achieves state-of-the-art performance compared with existing methods.
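The detection-stage fusion can be sketched at tensor level. For simplicity the learned deconvolution is replaced here by nearest-neighbour upsampling, which is an assumption: MR-CNN learns its upsampling weights, but the shape bookkeeping is the same.

```python
import numpy as np

def upsample2x(feat):
    """Nearest-neighbour 2x upsampling, standing in for the learned
    deconvolution MR-CNN uses (illustrative simplification)."""
    return feat.repeat(2, axis=1).repeat(2, axis=2)

def fuse(shallow, deep):
    """Upsample the deeper (lower-resolution, more semantic) feature map to
    the shallow map's resolution and concatenate along the channel axis."""
    return np.concatenate([shallow, upsample2x(deep)], axis=0)

# Toy feature maps in (channels, height, width) layout.
shallow = np.zeros((64, 32, 32))   # from an early, high-resolution layer
deep = np.zeros((128, 16, 16))     # from a deeper, semantically richer layer
fused = fuse(shallow, deep)        # combined map at the shallow resolution
```

The fused map keeps the shallow layer's spatial resolution (helpful for small signs) while inheriting the deep layer's semantic channels.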



Figure 7. To enhance the recognition accuracy of small traffic signs, they constructed a multi-scale fused feature map using deconvolution in the detection stage and leveraged contextual information for the given region proposals in the classification stage.



Table 1. Comparison of the Recognition Performance of Five Methods on 45 Categories. (in %)

Small traffic sign detection is a challenging problem in computer vision because small signs are more difficult to localize due to their low resolution. In an object detection framework, when the targets do not appear in the region proposal set, the subsequent classification is invalid. To highlight the impact of the proposed fused feature map and the advantage gained by the architecture, they compared the detection performance of region proposals for different feature maps in Table 2.
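The IoU = 0.5 criterion used for Table 2 is the standard intersection-over-union match between a proposal and its ground-truth box, and can be computed directly; the boxes in the example are made-up values.

```python
def iou(box_a, box_b):
    """Intersection over Union for axis-aligned boxes given as
    (x1, y1, x2, y2); a proposal counts as correct when IoU >= 0.5."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap extents, clamped to zero when the boxes do not intersect.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

# A proposal shifted by half a side overlaps its ground truth with IoU = 1/3,
# so it would be rejected under the 0.5 threshold.
score = iou((0, 0, 10, 10), (5, 0, 15, 10))
```

This also shows why small signs are hard: for a tiny box, a localization error of a few pixels is enough to push IoU below 0.5.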



Table 2. Detection Performance Regarding Different Feature Maps for 500 Region Proposals (IoU=0.5)
