Defense Date

8-23-2010

Graduation Date

Fall 2010

Availability

Immediate Access

Submission Type

thesis

Degree Name

MS

Department

Computational Mathematics

School

McAnulty College and Graduate School of Liberal Arts

Committee Chair

Frank D'Amico

Committee Member

John Kern

Committee Member

John Fleming

Keywords

Adolescent obesity, Hierarchical clustering, Multivariate Outlier, NHIS, Outlier Mining, Similarity (Distance) Measure

Abstract

Outlier mining is a fundamental issue in many statistical analyses, especially in multivariate cases. Outliers may exert undue influence on the outcomes of an analysis. In most cases, it is a considerable challenge to reveal the pattern of the outliers and their "outlyingness". There are several approaches and methods for detecting anomalous data points, but no single method is perfect for every data set, especially when the dimension and volume of the data are high. In this thesis, I review distance-based clustering methods for multivariate outlier mining and demonstrate their usefulness in a medical setting. Specifically, I discuss hierarchical clustering and the multivariate methods of determining the appropriate cluster(s). After mining the multivariate outliers, I examine and describe the characteristics of the variables for those outliers. Finally, I demonstrate the application of these methods using the National Health Interview Survey (NHIS) 2008 database to study adolescent obesity.
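
As a rough illustration of the general idea described in the abstract, the following minimal sketch flags outlier candidates with distance-based hierarchical clustering. It is not taken from the thesis: the single-linkage rule, the Euclidean distance, the `cut_distance` threshold, the `max_outlier_size` parameter, the function names, and the toy data are all assumptions made for the example, and no NHIS data are used.

```python
import math
from itertools import combinations


def single_linkage_clusters(points, cut_distance):
    """Agglomerative clustering (single linkage, Euclidean distance).

    Merging stops once the two closest clusters are farther apart than
    cut_distance. Returns a list of clusters, each a list of point indices.
    The linkage rule and the stopping rule are assumptions for this sketch.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        (ia, ib), d = min(
            (((a, b), linkage(clusters[a], clusters[b]))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > cut_distance:
            break
        # ia < ib always holds, so deleting ib leaves index ia valid.
        clusters[ia] = clusters[ia] + clusters[ib]
        del clusters[ib]
    return clusters


def outlier_candidates(points, cut_distance, max_outlier_size=1):
    """Return indices of points that end up in very small clusters."""
    clusters = single_linkage_clusters(points, cut_distance)
    return sorted(i for c in clusters if len(c) <= max_outlier_size for i in c)


# Toy data (hypothetical): four nearby points and one distant point.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.15, 0.15), (5.0, 5.0)]
flagged = outlier_candidates(points, cut_distance=1.0)
print(flagged)  # [4]
```

In this sketch a point counts as an outlier candidate when it remains in a cluster of at most `max_outlier_size` members after merging stops. Both thresholds would have to be chosen for the data at hand, and other linkage rules or distance measures could give different candidates.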

Format

PDF

Language

English
