ROCCA uses a random forest classifier model based on the open-source statistical software package WEKA (http://www.cs.waikato.ac.nz/ml/weka/index.html). For more information on random forests and the WEKA package, please refer to Witten et al. (2011).
Whistle Classification
ROCCA measures 50 features from each whistle contour. See Appendix A for a description of each of these variables.
ROCCA’s Random Forest classifier was trained using these 50 variables, measured from single-species schools of dolphins that had visual confirmation of species identity (see Oswald et al. 2007 and Oswald 2013 for details on the training datasets). During whistle classification, the features measured from a whistle contour are run through the Random Forest model, and each tree in the forest produces a species classification. Each tree can be considered one ‘vote’ for a given species classification. Votes are tallied over all trees, and the whistle is classified as the species with the most ‘votes’. In addition to classifying individual whistles, ROCCA classifies encounters based on the number of tree classifications for each species, summed over all of the whistles that were analyzed for that encounter.
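The voting scheme described above can be sketched in Python as follows. This is a minimal illustration, not ROCCA's implementation: it does not use WEKA, and it assumes the per-tree outputs are already available as a list of species codes for each whistle. The function names `classify_whistle` and `classify_encounter` are hypothetical.

```python
from collections import Counter


def classify_whistle(tree_votes):
    """Return (species, vote counts) for one whistle.

    tree_votes: one species code per tree in the forest (hypothetical
    input; ROCCA obtains these from its WEKA random forest model).
    """
    counts = Counter(tree_votes)
    # The species with the most tree votes wins. On a tie, Counter keeps
    # the species encountered first (a tie rule assumed for this sketch).
    species, _ = counts.most_common(1)[0]
    return species, counts


def classify_encounter(whistles):
    """Sum tree votes over all whistles in an encounter and pick the winner."""
    total = Counter()
    for tree_votes in whistles:
        total.update(tree_votes)
    species, _ = total.most_common(1)[0]
    return species, total


w1 = ["Tt"] * 6 + ["Gm"] * 4   # 10-tree forest, made-up votes
w2 = ["Gm"] * 7 + ["Tt"] * 3
print(classify_whistle(w1)[0])            # Tt
print(classify_encounter([w1, w2])[0])    # Gm (Gm 11 votes vs. Tt 9)
```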
The number of tree classifications for the predicted species is also used as a measure of the certainty of the classification: the greater the percentage of trees that classify the whistle as a particular species, the higher the certainty of that classification. The ‘strong whistle threshold’ (specified in the ROCCA Parameters window) is the percentage of trees that must classify the whistle as a given species for that classification to be considered reliable. If the percentage of trees classifying the whistle as a particular species falls below the strong whistle threshold, the whistle is classified as ambiguous. Similarly, an encounter is classified as ambiguous unless the percentage of tree votes for the predicted species (summed over all of the whistles in the encounter) exceeds the ‘strong school threshold’ (see Section 3.2.2 for details on how to set the strong whistle and strong school thresholds).
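A minimal sketch of the threshold test follows, assuming thresholds are expressed in percent and that the number of tree votes per species is available. The boundary case is an assumption: the text says a whistle is ambiguous when its share "falls below" the strong whistle threshold, so a share exactly equal to the threshold is treated as reliable here, whereas the wording for encounters ("exceeds") implies the opposite tie rule. The function name and the 60 percent threshold in the usage lines are illustrative only.

```python
AMBIGUOUS = "Ambig"  # code used for ambiguous results in Appendix B


def apply_threshold(vote_counts, threshold_pct):
    """Return (label, percent) for the species with the most tree votes.

    vote_counts: dict mapping species code -> number of tree votes.
    For a whistle, pass the votes from one forest; for an encounter,
    pass the votes summed over all of its whistles.
    """
    total = sum(vote_counts.values())
    species = max(vote_counts, key=vote_counts.get)
    pct = 100.0 * vote_counts[species] / total
    # Assumption: a share exactly equal to the threshold counts as reliable.
    if pct < threshold_pct:
        return AMBIGUOUS, pct
    return species, pct


print(apply_threshold({"Tt": 70, "Gm": 30}, 60))  # ('Tt', 70.0)
print(apply_threshold({"Tt": 55, "Gm": 45}, 60))  # ('Ambig', 55.0)
```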
School Classification
The School Stats output file contains a list of possible species based on the classifier model used. Two values are stored for each species: the number of times a whistle has been classified as that species (also displayed on the ROCCA sidebar) and a cumulative total of the percentage of tree votes for the species (not displayed on the ROCCA sidebar). When a new whistle classification is saved to a School Stats file, the number of whistles classified as that species is increased by one, and the percentage of tree votes for each species is added to the corresponding cumulative total. ROCCA classifies an encounter as the species with the highest cumulative percentage of tree votes. If the highest cumulative percentage of tree votes falls below the school threshold (as specified in the ROCCA Parameters window, Section 3.2.2), the encounter is classified as Ambiguous.
Note! The species with the highest cumulative percentage of tree votes may be different than the species with the greatest number of whistle classifications (the value shown in the sidebar species list).
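The School Stats bookkeeping, and the point made in the note above, can be illustrated with a small hypothetical class. It keeps the two values per species (whistle count and cumulative percentage of tree votes) and classifies the encounter from the cumulative votes. How ROCCA normalizes the cumulative total before comparing it with the school threshold is not stated here; this sketch assumes the winning species' share of all cumulative votes is compared, in percent.

```python
from collections import defaultdict


class EncounterStats:
    """Hypothetical stand-in for one encounter row in the School Stats file."""

    def __init__(self, species):
        self.species = list(species)            # species in the classifier model
        self.whistle_counts = defaultdict(int)  # whistles classified as each species
        self.cum_vote_pct = defaultdict(float)  # summed % of tree votes per species

    def add_whistle(self, classified_as, vote_pct):
        """Record one classified whistle.

        classified_as: species code (or "Ambig") assigned to the whistle.
        vote_pct: dict species -> % of trees voting for it on this whistle.
        """
        self.whistle_counts[classified_as] += 1
        for sp in self.species:
            self.cum_vote_pct[sp] += vote_pct.get(sp, 0.0)

    def classify(self, school_threshold_pct):
        best = max(self.species, key=lambda sp: self.cum_vote_pct[sp])
        total = sum(self.cum_vote_pct[sp] for sp in self.species)
        # Assumption: compare the winner's share of all cumulative votes,
        # and treat a share equal to the threshold as reliable.
        share = 100.0 * self.cum_vote_pct[best] / total
        return best if share >= school_threshold_pct else "Ambig"


enc = EncounterStats(["Gm", "Tt"])
enc.add_whistle("Tt", {"Tt": 51.0, "Gm": 49.0})   # made-up vote percentages
enc.add_whistle("Tt", {"Tt": 52.0, "Gm": 48.0})
enc.add_whistle("Gm", {"Tt": 10.0, "Gm": 90.0})
print(dict(enc.whistle_counts))  # {'Tt': 2, 'Gm': 1}
print(enc.classify(50))          # Gm
```

Here Tt has more whistle classifications (2 versus 1), but Gm has the higher cumulative vote total (187 versus 113), so the encounter is classified as Gm.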
The results of individual whistle classifications are grouped into encounters as defined by the user. Each group must be given a name, which serves as its encounter number. In addition to classifying individual whistles, ROCCA also classifies the overall encounter: the percentage of trees voting for each species is summed over all of the whistles classified in that encounter, and the species with the highest cumulative percentage of tree votes is the species classification for that encounter.
Figure 13. The ROCCA sidebar.
Encounter number: the current encounter number. This is the encounter number used when a new whistle is selected from the spectrogram display. Any combination of numbers and letters can be used to specify the encounter number.
Scroll buttons: allow you to scroll through the list of encounter numbers.
Classification results: displays a tally of the number of whistles classified as each species for the current encounter number. The list of possible species is based on the currently loaded classifier model. Species are denoted by the first letters of the genus and species names (e.g., Gm = Globicephala macrorhynchus), and the number beside each species code indicates the number of whistles classified as that species. See Appendix B for a list of the species included in the tropical Pacific and Atlantic classifiers, along with their genus-species codes.
School classification: displays the species classification for the current encounter.
Rename Encounter: renames the current encounter. Any previously saved output files that use the old encounter number in the filename will be renamed using the new encounter number.
Note! The information contained within the whistle Contour Stats file is NOT updated; you must modify any references to the old encounter number manually. Also note that encounter numbers cannot be duplicated.
Save Encounter: overwrites the current School Stats file (as defined in the ROCCA Parameters window) with the current list of encounters and classification results. School classification results are also saved automatically every five minutes.
New Encounter: creates a new encounter.
Whistle Start: lists the time and frequency of the first user-selected point on the spectrogram.
Whistle End: lists the time and frequency of the second user-selected point on the spectrogram.
Note! Once you select the second point, the portion of the spectrogram in between the first and second points is captured in a new popup window.
OUTPUT
ROCCA saves three different files during whistle classification: whistle clip, contour points, and contour features. ROCCA also saves the encounter classification results (the School Stats file) automatically every five minutes, and whenever the Save Detection button in the ROCCA sidebar (Section 7) is clicked.
If a database module is being used, ROCCA will also save the data in two tables: Rocca_Whistle_Stats and Rocca_Detection_Stats.
Whistle Clip
ROCCA saves the whistle clip as a .wav file in the output directory. The start and end of the clip correspond to the start and end points that you originally selected in the spectrogram popup window. The channels saved to the clip file are specified in the ROCCA Parameters window (Section 3.2.1). ROCCA names the file according to the filename defined in the ROCCA Parameters window (Section 3.2.4).
Contour Points
ROCCA saves the time/frequency pair for each extracted contour point in a .csv file in the output directory. The duty cycle, the energy in a frequency band around the peak frequency (as defined in the ROCCA Parameters window), and the RMS value of the amplitude are also saved. ROCCA saves the file according to the filename defined in the ROCCA Parameters window (Section 3.2.4).
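To post-process contour points files outside PAMGuard, a sketch like the one below could read them with Python's standard csv module. It assumes each file has a header row using the column names listed in Appendix C ('Time [ms]' and 'Peak Frequency [Hz]'); check these against your own files before relying on them. The function name `read_contour_points` is hypothetical, and the sample rows are made-up values for illustration.

```python
import csv
import io


def read_contour_points(f):
    """Return (times_ms, freqs_hz) lists from a contour points CSV.

    Assumes a header row with 'Time [ms]' and 'Peak Frequency [Hz]'
    columns; the other columns (duty cycle, energy, RMS) are ignored.
    """
    times, freqs = [], []
    for row in csv.DictReader(f):
        times.append(float(row["Time [ms]"]))
        freqs.append(float(row["Peak Frequency [Hz]"]))
    return times, freqs


sample = io.StringIO(
    "Time [ms],Peak Frequency [Hz],Duty Cycle,Energy,WindowRMS\n"
    "1000,8000,0,0,0\n"
    "1010,8200,0,0,0\n"
    "1020,8500,0,0,0\n"
)
times, freqs = read_contour_points(sample)
duration_s = (times[-1] - times[0]) / 1000.0  # ms -> s
print(duration_s)  # 0.02
```

For a real file, pass an object opened with `open(path, newline="")` instead of the in-memory sample.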
Contour Features
ROCCA saves the features measured from the current contour, as well as the classification results (the percentage of trees voting for each species), in a .csv format Contour Stats file in the output directory. The information from each classified whistle is appended to the end of the file, and the file is never overwritten. Thus, this file will continue to collect classification information every time ROCCA is run.
Other information that is saved for each whistle includes the sound source, date and time, and encounter number. The end of each row in the Contour Stats file lists the name of the random forest model, the percentage of trees voting for each species, and a corresponding list of the species names. The species names are added to each row instead of to the header line because the header is created based on information from the first whistle contour analyzed. If you use a different classification model for the analysis of subsequent whistles, the species list may be different and may no longer match the header. By including the species list in the row, you are always able to verify which species were included in the classification algorithm for a particular whistle contour.
ROCCA saves the file according to the filename specified in the ROCCA Parameters window (Section 3.2.3). If a database module is being used, the data will also be saved to the Rocca_Whistle_Stats table.
School Stats
ROCCA saves classification results for all encounters in a .csv format School Stats file in the output directory. For each encounter, ROCCA includes the cumulative random forest tree vote totals for each species, a list of species in the classifier, and the overall school classification (based on the species with the highest cumulative tree vote total).
Each time the School Stats file is saved, either through the auto-save function or by pressing the Save Detection button, ROCCA overwrites the file in order to update any renamed encounter numbers. Since an encounter number can be renamed but never deleted, no information will be lost when overwriting an old file during a single PAMGuard session. HOWEVER, if PAMGuard is closed and restarted, the file will be overwritten with blank data and all prior information will be lost. ROCCA searches for the file at startup. If the file exists, you are given the opportunity to rename it before it is lost, and/or load the existing data back into the system.
Note! When examining the classification results for a particular encounter number, you should refer to the species list at the end of the row instead of the species listed in the header. The header information is taken from the first encounter number listed. If subsequent encounter numbers use different classification models, the included species may change and this change is not reflected in the header.
ROCCA saves the School Stats file according to the filename specified in the ROCCA Parameters window (Section 3.2.3). If a database module is being used, the data will also be saved to the Rocca_Detection_Stats table.
LITERATURE CITED
Gillespie, D., J. Gordon, R. McHugh, D. McLaren, D.K. Mellinger, P. Redmond, A. Thode, P. Trinder, and D. Xiao. (2008). PAMGUARD: Semiautomated, open-source software for real-time acoustic detection and localization of cetaceans. Proceed. Instit. Acoust. 30, Part 5. 9 pp.
Oswald, J.N. (2013). Development of a Classifier for the Acoustic Identification of Delphinid Species in the Northwest Atlantic Ocean. Final Report. Submitted to HDR Environmental, Operations and Construction, Inc. Norfolk, Virginia under Contract No. CON005-4394-009, Subproject 164744, Task Order 003, Agreement # 105067. Prepared by Bio-Waves, Inc., Encinitas, California.
Oswald, J.N., S. Rankin, J. Barlow, and M.O. Lammers. (2007). A tool for real-time acoustic species identification of delphinid whistles. J. Acoust. Soc. Am. 122, 587-595.
Witten, I.H., E. Frank, and M.A. Hall. (2011). Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann Publishers, ISBN: 978-0-12-374856-0.
Appendix A:
Variables Measured by ROCCA
Variable | Explanation
Begsweep | slope of the beginning sweep (1 = positive, -1 = negative, 0 = zero)
Begup | binary variable: 1 = beginning slope is positive, 0 = beginning slope is negative
Begdwn | binary variable: 1 = beginning slope is negative, 0 = beginning slope is positive
Endsweep | slope of the end sweep (1 = positive, -1 = negative, 0 = zero)
Endup | binary variable: 1 = ending slope is positive, 0 = ending slope is negative
Enddwn | binary variable: 1 = ending slope is negative, 0 = ending slope is positive
Beg | beginning frequency (Hz)
End | ending frequency (Hz)
Min | minimum frequency (Hz)
Dur | duration (sec)
Range | maximum frequency - minimum frequency (Hz)
Max | maximum frequency (Hz)
mean freq | mean frequency (Hz)
median freq | median frequency (Hz)
std freq | standard deviation of the frequency (Hz)
Spread | difference between the 75th and the 25th percentiles of the frequency (Hz)
quart freq | frequency at one quarter of the duration (Hz)
half freq | frequency at one half of the duration (Hz)
Threequart | frequency at three quarters of the duration (Hz)
Centerfreq | minimum frequency + (maximum frequency - minimum frequency)/2 (Hz)
rel bw | relative bandwidth: (max freq - min freq)/center freq
Maxmin | max freq/min freq
Begend | beg freq/end freq
Cofm | coefficient of frequency modulation: take 20 frequency measurements equally spaced in time, then subtract each frequency value from the one before it; COFM is the sum of the absolute values of these differences, divided by 10,000
tot step | number of steps (an increase or decrease in frequency of 10 percent or more over two contour points)
tot inflect | number of inflection points (changes from positive to negative or negative to positive slope)
max delta | maximum time between inflection points
min delta | minimum time between inflection points
maxmin delta | max delta/min delta
mean delta | mean time between inflection points
std delta | standard deviation of the time between inflection points
median delta | median of the time between inflection points
mean slope | overall mean slope
mean pos slope | mean positive slope
mean neg slope | mean negative slope
mean absslope | mean absolute value of the slope
Posneg | mean positive slope/mean negative slope
perc up | percent of the whistle that has a positive slope
perc dwn | percent of the whistle that has a negative slope
perc flt | percent of the whistle that has zero slope
up dwn | number of inflection points that go from positive slope to negative slope
dwn up | number of inflection points that go from negative slope to positive slope
up flt | number of times the slope changes from positive to zero
dwn flt | number of times the slope changes from negative to zero
flt dwn | number of times the slope changes from zero to negative
flt up | number of times the slope changes from zero to positive
step up | number of steps that have increasing frequency
step dwn | number of steps that have decreasing frequency
step.dur | number of steps/duration
inflect.dur | number of inflection points/duration
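To make several of these definitions concrete, the sketch below computes a subset of the variables from a contour given as parallel lists of times (seconds) and peak frequencies (Hz). It is an illustration under stated assumptions, not ROCCA's code: COFM samples are obtained by linear interpolation, Spread uses the default method of Python's `statistics.quantiles` (Python 3.8+), a step is measured relative to the earlier of the two points, and an inflection is counted only where the slope sign flips between adjacent segments (a flat segment in between is not counted). Centerfreq is taken as the midpoint between the minimum and maximum frequency. The function names are hypothetical.

```python
import statistics


def interp(t, times, freqs):
    """Linearly interpolate the contour frequency at time t (times increasing)."""
    if t <= times[0]:
        return freqs[0]
    for i in range(1, len(times)):
        if t <= times[i]:
            t0, t1 = times[i - 1], times[i]
            f0, f1 = freqs[i - 1], freqs[i]
            return f0 + (f1 - f0) * (t - t0) / (t1 - t0)
    return freqs[-1]


def contour_features(times, freqs, n_cofm=20, step_frac=0.10):
    """Compute some Appendix A variables; needs >= 2 points and nonzero Hz."""
    fmin, fmax = min(freqs), max(freqs)
    center = fmin + (fmax - fmin) / 2
    q1, _, q3 = statistics.quantiles(freqs, n=4)

    # COFM: n_cofm frequencies equally spaced in time; sum of the absolute
    # differences between neighbouring samples, divided by 10,000.
    t0, t1 = times[0], times[-1]
    samples = [interp(t0 + k * (t1 - t0) / (n_cofm - 1), times, freqs)
               for k in range(n_cofm)]
    cofm = sum(abs(b - a) for a, b in zip(samples, samples[1:])) / 10000.0

    # Steps: change of at least step_frac relative to the earlier point.
    step_up = step_dwn = 0
    for a, b in zip(freqs, freqs[1:]):
        if b - a >= step_frac * a:
            step_up += 1
        elif a - b >= step_frac * a:
            step_dwn += 1

    # Slope sign of each segment: 1 rising, -1 falling, 0 flat.
    signs = [(b > a) - (b < a) for a, b in zip(freqs, freqs[1:])]
    pairs = list(zip(signs, signs[1:]))
    up_dwn = sum(1 for s, u in pairs if s == 1 and u == -1)
    dwn_up = sum(1 for s, u in pairs if s == -1 and u == 1)

    dur = t1 - t0
    return {
        "Beg": freqs[0], "End": freqs[-1], "Min": fmin, "Max": fmax,
        "Dur": dur, "Range": fmax - fmin, "Centerfreq": center,
        "rel bw": (fmax - fmin) / center, "Maxmin": fmax / fmin,
        "Begend": freqs[0] / freqs[-1], "Spread": q3 - q1, "Cofm": cofm,
        "step up": step_up, "step dwn": step_dwn,
        "tot step": step_up + step_dwn,
        "up dwn": up_dwn, "dwn up": dwn_up, "tot inflect": up_dwn + dwn_up,
        "step.dur": (step_up + step_dwn) / dur,
        "inflect.dur": (up_dwn + dwn_up) / dur,
    }


f = contour_features([0.0, 0.1, 0.2, 0.3, 0.4],
                     [8000.0, 10000.0, 8800.0, 9300.0, 9200.0])
print(f["Centerfreq"], f["tot step"], f["tot inflect"])  # 9000.0 2 3
```

For a straight upsweep the summed differences telescope to the total frequency change, so a 5,000 to 15,000 Hz sweep gives a COFM of about 1.0.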
Appendix B:
Genus Species Codes for the Tropical Pacific and Atlantic Classifiers
Tropical Pacific Classifier
Code | Scientific Name | Common Name
Ambig | n/a | Ambiguous
Dc_Dd | Delphinus capensis and D. delphis | Long- and short-beaked common dolphin
Gm | Globicephala macrorhynchus | Short-finned pilot whale
Pc | Pseudorca crassidens | False killer whale
Sa | Stenella attenuata | Pantropical spotted dolphin
Sb | Steno bredanensis | Rough-toothed dolphin
Sc | Stenella coeruleoalba | Striped dolphin
Sl | Stenella longirostris | Spinner dolphin
Tt | Tursiops truncatus | Bottlenose dolphin
Atlantic Classifier
Code | Scientific Name | Common Name
Ambig | n/a | Ambiguous
Dd | Delphinus delphis | Short-beaked common dolphin
Sc | Stenella coeruleoalba | Striped dolphin
Tt | Tursiops truncatus | Bottlenose dolphin
Sf | Stenella frontalis | Atlantic spotted dolphin
Gm | Globicephala macrorhynchus | Short-finned pilot whale
Appendix C:
Description of CSV File Columns
Contour Points File
Header | Description
Time [ms] | Time elapsed (since PAMGuard started)
Peak Frequency [Hz] | Frequency with the highest amplitude in the time slice
Duty Cycle, Energy, WindowRMS | Variables used internally by ROCCA
Contour Features File
Header | Description
Source | Source of acoustic data (sound card, filename, etc.)
Date-Time | Local (computer) date and time when the whistle was captured
Detection Count | Running tally of whistles captured since ROCCA was started; the number is incremented each time a whistle is sent to ROCCA
Encounter Number | Encounter number as specified by the user
Classified Species | Species classification of the whistle
FREQMAX … STEPDUR | Features measured by ROCCA and used as input to the random forest classifier
Classifier | Name of the classifier used
{no header} | The remaining columns contain the percentage of trees voting for each species. The final column lists the order of the species in the voting columns. For example, if the final column contains Gm-Dd-Sc-Sf-Tt, the first voting column contains the percentage of trees voting for Gm, the second voting column contains the percentage of trees voting for Dd, and so on.
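Because the voting columns have no header, pairing them with species codes has to rely on the species-order column at the end of each row. A minimal sketch, assuming the voting columns begin immediately after the 'Classifier' column and that the species codes in the final column are separated by hyphens (as in the Gm-Dd-Sc-Sf-Tt example); the function name, the shortened header (feature columns omitted), and the row values are hypothetical.

```python
def vote_columns(header, row):
    """Map species codes to vote percentages for one Contour Stats row.

    Assumes the unlabeled voting columns start right after the
    'Classifier' column and that the last field lists the species
    order separated by '-'.
    """
    start = header.index("Classifier") + 1
    *votes, order = row[start:]
    species = order.split("-")
    if len(species) != len(votes):
        raise ValueError("species list does not match the voting columns")
    return {sp: float(v) for sp, v in zip(species, votes)}


# Shortened, made-up example: the feature columns are left out.
header = ["Source", "Encounter Number", "Classified Species", "Classifier"]
row = ["clip.wav", "E1", "Dd", "example_model",
       "10", "55", "5", "20", "10", "Gm-Dd-Sc-Sf-Tt"]
votes = vote_columns(header, row)
print(max(votes, key=votes.get))  # Dd
```

Reading the species order from each row, rather than from the file header, matches the advice above that the header may not reflect the classifier used for later whistles.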
School Stats File
Header | Description
Encounter Number | Encounter number as specified by the user
{list of species, starting with Ambig} | The number of whistles classified as each species in the current encounter number
{list of species votes, starting with Ambig} | Percentage of trees voting for each species, summed over all whistles in the current encounter number. The species with the highest total percentage of votes is the overall encounter classification
Encounter Classification | Overall species classification for the current encounter