Submitted to: Naval Facilities Engineering Command Atlantic under HDR Environmental, Operations and Construction, Inc. Contract No. N62470-10-D-3011, Task Order 03


6. WHISTLE AND SCHOOL CLASSIFICATION





ROCCA uses a random forest classifier model built with the open-source machine learning software package WEKA (http://www.cs.waikato.ac.nz/ml/weka/index.html). For more information on random forests and the WEKA package, refer to Witten et al. (2011).
6.1 Whistle Classification


ROCCA measures 50 features from each whistle contour. See Appendix A for a description of each variable.

ROCCA’s random forest classifier was trained using the 50 variables measured from single-species schools of dolphins whose species identity had been visually confirmed (see Oswald et al. 2007 and Oswald 2013 for details on the training datasets). During whistle classification, the features measured from a whistle contour are run through the random forest model, and each tree in the forest produces a species classification. Each tree can be considered one ‘vote’ for a given species. Votes are tallied over all trees, and the whistle is classified as the species with the most votes. In addition to classifying individual whistles, ROCCA classifies each encounter based on the tree votes for each species, summed over all of the whistles analyzed for that encounter.
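The per-whistle vote tally can be sketched in a few lines of Python. This is a minimal illustration, not WEKA's or ROCCA's code. It assumes the per-tree predictions are already available as a list of species codes, and the function name `tally_tree_votes` is hypothetical. Ties go to the species that appears first in the list, which is a rule the manual does not specify.

```python
from collections import Counter


def tally_tree_votes(tree_predictions):
    """Return (winning species, {species: percent of trees}) for one whistle.

    tree_predictions: one species code per tree in the forest (hypothetical input).
    """
    counts = Counter(tree_predictions)
    n_trees = len(tree_predictions)
    # Convert raw vote counts to the percentage of trees voting for each species
    vote_pct = {sp: 100.0 * n / n_trees for sp, n in counts.items()}
    # The species with the most votes wins; ties go to the species seen first (an assumption)
    winner = max(vote_pct, key=vote_pct.get)
    return winner, vote_pct


# Hypothetical forest of 10 trees voting on a single whistle
votes = ["Tt"] * 6 + ["Sc"] * 3 + ["Gm"]
species, pct = tally_tree_votes(votes)
print(species, pct)  # Tt {'Tt': 60.0, 'Sc': 30.0, 'Gm': 10.0}
```

The percentages, rather than raw counts, are what ROCCA accumulates per encounter, so the sketch returns both the winner and the full percentage breakdown.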



The number of trees voting for the predicted species is also used as a measure of classification certainty: the greater the percentage of trees that classify the whistle as a particular species, the higher the certainty of that classification. The ‘strong whistle threshold’ (specified in the ROCCA Parameters window) is the percentage of trees that must classify the whistle as a given species for that classification to be considered reliable. If the percentage of trees voting for the predicted species falls below the strong whistle threshold, the whistle is classified as ambiguous. Similarly, an encounter is classified as ambiguous unless the percentage of tree votes for the predicted species (summed over all of the whistles in the encounter) exceeds the ‘strong school threshold’ (see Section 3.2.2 for details on setting the strong whistle and strong school thresholds).
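The strong whistle threshold check can be sketched as follows. The function `classify_whistle` and the example thresholds are hypothetical. The manual states only that a whistle is ambiguous when the winning vote percentage falls below the threshold, so this sketch treats a percentage exactly equal to the threshold as reliable.

```python
def classify_whistle(vote_pct, strong_whistle_threshold):
    """Return the top-voted species, or "Ambig" if its vote percentage is below the threshold.

    vote_pct: {species code: percentage of trees voting for it} for one whistle.
    strong_whistle_threshold: percentage set in the ROCCA Parameters window.
    """
    species = max(vote_pct, key=vote_pct.get)
    # Below the threshold, the classification is not considered reliable
    if vote_pct[species] < strong_whistle_threshold:
        return "Ambig"
    return species


example = {"Tt": 60.0, "Sc": 30.0, "Gm": 10.0}
print(classify_whistle(example, 50.0))  # Tt
print(classify_whistle(example, 70.0))  # Ambig
```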
6.2 School Classification


The School Stats output file contains a list of the possible species in the classifier model used. Two values are stored for each species. The first is the number of whistles classified as that species, which is also displayed on the ROCCA sidebar. The second is a cumulative total of the percentage of tree votes for that species, which is not displayed on the ROCCA sidebar. When a new whistle classification is saved to a School Stats file, the whistle count for the assigned species is increased by one, and the percentage of tree votes for each species is added to that species’ cumulative total. ROCCA classifies an encounter as the species with the highest cumulative percentage of tree votes. If the highest cumulative percentage of tree votes falls below the strong school threshold (specified in the ROCCA Parameters window, Section 3.2.2), the encounter is classified as Ambiguous.

Note! The species with the highest cumulative percentage of tree votes may differ from the species with the greatest number of whistle classifications (the value shown in the sidebar species list).
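The sketch below mimics the two running totals a School Stats file keeps for one encounter. It reproduces the situation described in the note: one species has the most whistle classifications, while another has the highest cumulative tree votes. The class `SchoolStats` and the example vote percentages are hypothetical. The manual does not say exactly how the cumulative total is compared with the strong school threshold. This sketch assumes the winning species' cumulative votes are expressed as a share of all summed votes. It also treats a share equal to the threshold as ambiguous, following the wording "unless ... exceeds".

```python
class SchoolStats:
    """Running totals for one encounter, modeled on the School Stats file columns."""

    def __init__(self, species_list):
        # One whistle count and one cumulative vote total per species (list starts with "Ambig")
        self.whistle_counts = {sp: 0 for sp in species_list}
        self.cum_votes = {sp: 0.0 for sp in species_list}

    def add_whistle(self, assigned_species, vote_pct):
        """Record one classified whistle.

        assigned_species: the whistle's classification (may be "Ambig").
        vote_pct: percentage of trees voting for each species for this whistle.
        """
        self.whistle_counts[assigned_species] += 1
        for sp, pct in vote_pct.items():
            self.cum_votes[sp] += pct

    def classify(self, strong_school_threshold):
        """Return the species with the highest cumulative votes, or "Ambig"."""
        votes = {sp: v for sp, v in self.cum_votes.items() if sp != "Ambig"}
        best = max(votes, key=votes.get)
        total = sum(votes.values())
        # Assumption: compare the winner's share of all summed votes with the threshold
        share = 100.0 * votes[best] / total if total > 0 else 0.0
        return best if share > strong_school_threshold else "Ambig"


stats = SchoolStats(["Ambig", "Gm", "Sc", "Tt"])
stats.add_whistle("Tt", {"Gm": 15.0, "Sc": 40.0, "Tt": 45.0})
stats.add_whistle("Tt", {"Gm": 15.0, "Sc": 40.0, "Tt": 45.0})
stats.add_whistle("Sc", {"Gm": 5.0, "Sc": 90.0, "Tt": 5.0})
print(stats.whistle_counts)  # {'Ambig': 0, 'Gm': 0, 'Sc': 1, 'Tt': 2}
print(stats.cum_votes)       # {'Ambig': 0.0, 'Gm': 35.0, 'Sc': 170.0, 'Tt': 95.0}
print(stats.classify(50.0))  # Sc
```

Tt wins the whistle count (2 of 3 whistles), but Sc's single high-confidence whistle gives it the largest cumulative vote total, so the encounter is classified as Sc.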

7. DISPLAYING THE RESULTS – THE ROCCA SIDEBAR


Individual whistle classification results are grouped into encounters defined by the user. Each group must be given a name, which serves as the encounter number. In addition to classifying individual whistles, ROCCA also classifies the overall encounter. It sums the percentage of trees voting for each species over all of the whistles classified in that encounter. The species with the highest cumulative percentage of tree votes becomes the classification for that encounter.

Figure 13. The ROCCA sidebar.



  1. Encounter number: the current encounter number. This is the encounter number used when a new whistle is selected from the spectrogram display. Any combination of numbers and letters can be used to specify the encounter number.

  2. Scroll buttons: allow you to scroll through the list of encounter numbers.

  3. Classification results: displays a tally of the number of whistles classified as each species for the current encounter number. The list of possible species is based on the currently loaded classifier model. Species are denoted by the first letter of the genus name followed by the first letter of the species name (e.g., Gm = Globicephala macrorhynchus). The number beside each species code indicates the number of whistles classified as that species. See Appendix B for a list of the species included in the tropical Pacific and Atlantic classifiers, along with their genus-species codes.

  4. School classification: displays the species classification for the current encounter.

  5. Rename Encounter: renames the current encounter. Any previously saved output files that use the old encounter number in the filename will be renamed using the new encounter number.

Note! The information contained within the whistle Contour Stats file is NOT updated; you must modify any references to the old encounter number manually. Also note that duplicate encounter numbers are not allowed.

  6. Save Encounter: overwrites the current School Stats file (as defined in the ROCCA Parameters window) with the current list of encounters and classification results. School classification results are also saved automatically every five minutes.

  7. New Encounter: creates a new encounter.

  8. Whistle Start: lists the time and frequency of the first user-selected point on the spectrogram.

  9. Whistle End: lists the time and frequency of the second user-selected point on the spectrogram.

Note! Once you select the second point, the portion of the spectrogram in between the first and second points is captured in a new popup window.
8. OUTPUT


ROCCA saves three different files during whistle classification: the whistle clip, the contour points, and the contour features (Contour Stats file). ROCCA also saves the School Stats file automatically every five minutes, as well as whenever the Save Encounter button is clicked in the ROCCA sidebar (Section 7).

If a database module is being used, ROCCA will also save the data in two tables: Rocca_Whistle_Stats and Rocca_Detection_Stats.


8.1 Whistle Clip


ROCCA saves the whistle clip as a .wav file in the output directory. The start and end of the clip are defined by the start and end points that you originally selected in the spectrogram popup window. The channels saved to the clip file are specified in the ROCCA Parameters window (Section 3.2.1). ROCCA names the file according to the filename defined in the ROCCA Parameters window (Section 3.2.4).
8.2 Contour Points


ROCCA saves the time/frequency pair for each extracted contour point in a .csv file in the output directory. The duty cycle, the energy in a frequency band around the peak frequency (as defined in the ROCCA Parameters window), and the RMS value of the amplitude are also saved. ROCCA saves the file according to the filename defined in the ROCCA Parameters window (Section 3.2.4).
8.3 Contour Features


ROCCA saves the features measured from the current contour, as well as the classification results (the percentage of trees voting for each species), in a .csv format Contour Stats file in the output directory. The information from each classified whistle is appended to the end of the file, and the file is never overwritten. Thus, this file will continue to collect classification information every time ROCCA is run.

Other information that is saved for each whistle includes the sound source, date and time, and encounter number. The end of each row in the Contour Stats file lists the name of the random forest model, the percentage of trees voting for each species, and a corresponding list of the species names. The species names are added to each row instead of to the header line because the header is created based on information from the first whistle contour analyzed. If you use a different classification model for the analysis of subsequent whistles, the species list may be different and may no longer match the header. By including the species list in the row, you are always able to verify which species were included in the classification algorithm for a particular whistle contour.
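Because the species order travels with each row, a script can pair the vote columns with species codes without relying on the header. The sketch below assumes the row layout described in Appendix C: the Classifier column, then one vote column per species, then a final column listing the species codes separated by hyphens. The function name `parse_vote_columns`, the model filename and the sample row are hypothetical, and the 50 feature columns are abbreviated to two values.

```python
import csv
import io


def parse_vote_columns(row):
    """Return (classifier name, {species code: percent of trees}) for one Contour Stats row."""
    species = row[-1].split("-")                # final column, e.g. "Gm-Dd-Sc-Sf-Tt"
    n = len(species)
    votes = [float(v) for v in row[-1 - n:-1]]  # the n vote columns before the species list
    classifier = row[-2 - n]                    # the Classifier column precedes the votes
    return classifier, dict(zip(species, votes))


# Hypothetical row; the 50 feature columns are abbreviated to two values
line = "SoundCard,2013-06-01 12:00:00,1,E01,Tt,12000.0,0.35,Atlantic.model,10.0,5.0,15.0,0.0,70.0,Gm-Dd-Sc-Sf-Tt"
row = next(csv.reader(io.StringIO(line)))
print(parse_vote_columns(row))
# ('Atlantic.model', {'Gm': 10.0, 'Dd': 5.0, 'Sc': 15.0, 'Sf': 0.0, 'Tt': 70.0})
```

Counting columns back from the end of the row, rather than forward from the header, is what keeps this robust when different rows were produced by classifiers with different species lists.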

ROCCA saves the file according to the filename specified in the ROCCA Parameters window (Section 3.2.3). If a database module is being used, the data will also be saved to the Rocca_Whistle_Stats table.

8.4 School Stats


ROCCA saves classification results for all encounters in a .csv format School Stats file in the output directory. For each encounter, ROCCA includes the cumulative random forest tree vote totals for each species, a list of species in the classifier, and the overall school classification (based on the species with the highest cumulative tree vote total).

Each time the School Stats file is saved, either through the auto-save function or by pressing the Save Encounter button, ROCCA overwrites the file in order to update any renamed encounter numbers. Because an encounter number can be renamed but never deleted, no information is lost when the file is overwritten during a single PAMGuard session. HOWEVER, if PAMGuard is closed and restarted, the file will be overwritten with blank data and all prior information will be lost. ROCCA searches for the file at startup. If the file exists, you are given the opportunity to rename it before it is lost and/or to load the existing data back into the system.



Note! When examining the classification results for a particular encounter number, you should refer to the species list at the end of the row instead of the species listed in the header. The header information is taken from the first encounter number listed. If subsequent encounter numbers use different classification models, the included species may change and this change is not reflected in the header.

ROCCA saves the School Stats file according to the filename specified in the ROCCA Parameters window (Section 3.2.3). If a database module is being used, the data will also be saved to the Rocca_Detection_Stats table.



9. LITERATURE CITED


Gillespie, D., J. Gordon, R. McHugh, D. McLaren, D.K. Mellinger, P. Redmond, A. Thode, P. Trinder, and D. Xiao. (2008). PAMGUARD: Semiautomated, open-source software for real-time acoustic detection and localization of cetaceans. Proceed. Instit. Acoust. 30, Part 5. 9 pp.

Oswald, J.N. (2013). Development of a Classifier for the Acoustic Identification of Delphinid Species in the Northwest Atlantic Ocean. Final Report. Submitted to HDR Environmental, Operations and Construction, Inc. Norfolk, Virginia under Contract No. CON005-4394-009, Subproject 164744, Task Order 003, Agreement # 105067. Prepared by Bio-Waves, Inc., Encinitas, California.

Oswald, J.N., S. Rankin, J. Barlow, and M.O. Lammers. (2007). A tool for real-time acoustic species identification of delphinid whistles. J. Acoust. Soc. Am. 122, 587-595.

Witten, I.H., E. Frank, and M.A. Hall. (2011). Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann Publishers, ISBN: 978-0-12-374856-0.





Appendix A:

Variables Measured by ROCCA





| Variable | Explanation |
|---|---|
| Begsweep | slope of the beginning sweep (1 = positive, -1 = negative, 0 = zero) |
| Begup | binary variable: 1 = beginning slope is positive, 0 = beginning slope is negative |
| Begdwn | binary variable: 1 = beginning slope is negative, 0 = beginning slope is positive |
| Endsweep | slope of the end sweep (1 = positive, -1 = negative, 0 = zero) |
| Endup | binary variable: 1 = ending slope is positive, 0 = ending slope is negative |
| Enddwn | binary variable: 1 = ending slope is negative, 0 = ending slope is positive |
| Beg | beginning frequency (Hz) |
| End | ending frequency (Hz) |
| Min | minimum frequency (Hz) |
| Dur | duration (sec) |
| Range | maximum frequency - minimum frequency (Hz) |
| Max | maximum frequency (Hz) |
| mean freq | mean frequency (Hz) |
| median freq | median frequency (Hz) |
| std freq | standard deviation of the frequency (Hz) |
| Spread | difference between the 75th and 25th percentiles of the frequency (Hz) |
| quart freq | frequency at one quarter of the duration (Hz) |
| half freq | frequency at one half of the duration (Hz) |
| Threequart | frequency at three quarters of the duration (Hz) |
| Centerfreq | minimum frequency + (maximum frequency - minimum frequency)/2 (Hz) |
| rel bw | relative bandwidth: (max freq - min freq)/center freq |
| Maxmin | max freq/min freq |
| Begend | beg freq/end freq |
| Cofm | coefficient of frequency modulation: take 20 frequency measurements equally spaced in time, then subtract each frequency value from the one before it; COFM is the sum of the absolute values of these differences, divided by 10,000 |
| tot step | number of steps (a 10 percent or greater increase or decrease in frequency over two contour points) |
| tot inflect | number of inflection points (changes from positive to negative or negative to positive slope) |
| max delta | maximum time between inflection points |
| min delta | minimum time between inflection points |
| maxmin delta | max delta/min delta |
| mean delta | mean time between inflection points |
| std delta | standard deviation of the time between inflection points |
| median delta | median of the time between inflection points |
| mean slope | overall mean slope |
| mean pos slope | mean positive slope |
| mean neg slope | mean negative slope |
| mean absslope | mean absolute value of the slope |
| Posneg | mean positive slope/mean negative slope |
| perc up | percent of the whistle that has a positive slope |
| perc dwn | percent of the whistle that has a negative slope |
| perc flt | percent of the whistle that has zero slope |
| up dwn | number of inflection points that go from positive slope to negative slope |
| dwn up | number of inflection points that go from negative slope to positive slope |
| up flt | number of times the slope changes from positive to zero |
| dwn flt | number of times the slope changes from negative to zero |
| flt dwn | number of times the slope changes from zero to negative |
| flt up | number of times the slope changes from zero to positive |
| step up | number of steps that have increasing frequency |
| step dwn | number of steps that have decreasing frequency |
| step.dur | number of steps/duration |
| inflect.dur | number of inflection points/duration |
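Several of the variables above are simple arithmetic on the contour. The Python sketch below computes a subset of them from lists of contour times (sec) and frequencies (Hz). It is an illustration under stated assumptions, not ROCCA's implementation:

- the 20 COFM samples come from linear interpolation between contour points;
- steps are counted between consecutive contour points, relative to the earlier point;
- Spread uses the default quartile method of Python's `statistics.quantiles`, which may differ from ROCCA's percentile calculation.

All function names are hypothetical.

```python
import statistics


def interp_freq(t, times, freqs):
    """Linearly interpolate the contour frequency at time t (times strictly increasing)."""
    for i in range(1, len(times)):
        if t <= times[i]:
            t0, t1 = times[i - 1], times[i]
            f0, f1 = freqs[i - 1], freqs[i]
            return f0 + (f1 - f0) * (t - t0) / (t1 - t0)
    return freqs[-1]  # guards against t overshooting the last time by rounding


def cofm(times, freqs, n_samples=20):
    """Sum of absolute differences between 20 equally spaced frequencies, divided by 10,000."""
    dt = (times[-1] - times[0]) / (n_samples - 1)
    samples = [interp_freq(times[0] + k * dt, times, freqs) for k in range(n_samples)]
    return sum(abs(samples[k] - samples[k - 1]) for k in range(1, n_samples)) / 10_000


def count_steps(freqs, threshold_pct=10.0):
    """Count frequency jumps of 10 percent or more between consecutive contour points."""
    up = down = 0
    for f0, f1 in zip(freqs, freqs[1:]):
        change_pct = 100.0 * (f1 - f0) / f0
        if change_pct >= threshold_pct:
            up += 1
        elif change_pct <= -threshold_pct:
            down += 1
    return up, down


def contour_features(times, freqs):
    """Return a dict with a subset of the Appendix A variables."""
    fmin, fmax = min(freqs), max(freqs)
    dur = times[-1] - times[0]
    center = fmin + (fmax - fmin) / 2
    q1, _, q3 = statistics.quantiles(freqs, n=4)
    step_up, step_dwn = count_steps(freqs)
    return {
        "Beg": freqs[0],
        "End": freqs[-1],
        "Min": fmin,
        "Max": fmax,
        "Dur": dur,
        "Range": fmax - fmin,
        "mean freq": statistics.mean(freqs),
        "median freq": statistics.median(freqs),
        "Spread": q3 - q1,
        "Centerfreq": center,
        "rel bw": (fmax - fmin) / center,
        "Maxmin": fmax / fmin,
        "Begend": freqs[0] / freqs[-1],
        "Cofm": cofm(times, freqs),
        "tot step": step_up + step_dwn,
        "step up": step_up,
        "step dwn": step_dwn,
        "step.dur": (step_up + step_dwn) / dur,
    }


# Hypothetical five-point contour
times = [0.0, 0.1, 0.2, 0.3, 0.4]
freqs = [8000.0, 9000.0, 10000.0, 12000.0, 11000.0]
print(contour_features(times, freqs))
```

Because the 20 COFM samples need not land exactly on the contour's peak, COFM here can come out slightly smaller than the contour's total frequency excursion divided by 10,000.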



Appendix B:

Genus Species Codes for the Tropical Pacific and Atlantic Classifiers




Tropical Pacific Classifier



| Code | Scientific Name | Common Name |
|---|---|---|
| Ambig | n/a | Ambiguous |
| Dc_Dd | Delphinus capensis and D. delphis | Long- and short-beaked common dolphin |
| Gm | Globicephala macrorhynchus | Short-finned pilot whale |
| Pc | Pseudorca crassidens | False killer whale |
| Sa | Stenella attenuata | Pantropical spotted dolphin |
| Sb | Steno bredanensis | Rough-toothed dolphin |
| Sc | Stenella coeruleoalba | Striped dolphin |
| Sl | Stenella longirostris | Spinner dolphin |
| Tt | Tursiops truncatus | Bottlenose dolphin |

Atlantic Classifier

| Code | Scientific Name | Common Name |
|---|---|---|
| Ambig | n/a | Ambiguous |
| Dd | Delphinus delphis | Short-beaked common dolphin |
| Sc | Stenella coeruleoalba | Striped dolphin |
| Tt | Tursiops truncatus | Bottlenose dolphin |
| Sf | Stenella frontalis | Atlantic spotted dolphin |
| Gm | Globicephala macrorhynchus | Short-finned pilot whale |


Appendix C:

Description of CSV File Columns




Contour Points File



| Header | Description |
|---|---|
| Time [ms] | Time elapsed since PAMGuard started |
| Peak Frequency [Hz] | Frequency with the highest amplitude in the time slice |
| Duty Cycle, Energy, WindowRMS | Variables used internally by ROCCA |

Contour Features File

| Header | Description |
|---|---|
| Source | Source of acoustic data (sound card, filename, etc.) |
| Date-Time | Local (computer) date and time when the whistle was captured |
| Detection Count | Running tally of whistles captured since ROCCA was started. The number is incremented each time a whistle is sent to ROCCA. |
| Encounter Number | Encounter number as specified by the user |
| Classified Species | Species classification of the whistle |
| FREQMAX … STEPDUR | Features measured by ROCCA and used as input to the random forest classifier |
| Classifier | Name of the classifier used |
| {no header} | The remaining columns contain the percentage of trees voting for each species. The final column contains the order of the species shown in the voting columns. For example, if the final column contains Gm-Dd-Sc-Sf-Tt, the first voting column contains the percentage of trees voting for Gm, the second voting column contains the percentage of trees voting for Dd, etc. |

School Stats File

| Header | Description |
|---|---|
| Encounter Number | Encounter number as specified by the user |
| {list of species, starting with Ambig} | The number of whistles classified as each species in the current encounter number |
| {list of species votes, starting with Ambig} | Percentage of trees voting for each species, summed over all whistles in the current encounter number. The species with the highest total percentage of votes is the overall encounter classification. |
| Encounter Classification | Overall species classification for the current encounter |


1 References to sections within this user’s manual have been hyperlinked.

2 For questions and requests related to a new classifier based on custom data, please contact Dr. Julie Oswald at Bio-Waves, Inc. at: julie.oswald@bio-waves.net.

