A Chinese Version of an Authorship Attribution Analysis Program

Author

Mengjia Zhao

Defense Date

7-21-2008

Graduation Date

Summer 1-1-2008

Availability

Campus Only

Submission Type

thesis

Degree Name

MS

Department

Computational Mathematics

School

McAnulty College and Graduate School of Liberal Arts

Committee Chair

Patrick Juola

Committee Member

Mark S. Mazur

Committee Member

Carl Toews

Keywords

Java, Chinese, Authorship, cross-entropy, FMM

Abstract

This thesis introduces the authorship attribution problem in Chinese and its background, then describes how we extended the existing JGAAP framework, with a few modifications, to handle the special problems that Chinese poses for authorship attribution. We tested a variety of methods on a corpus of 32 Chinese novels by four authors. In our tests, characters or forward maximum matching (FMM)-segmented words, in conjunction with K-Nearest Neighbor calculated using nominal KS, worked best.
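
For readers unfamiliar with forward maximum matching, the following is a minimal sketch of the general technique, not the thesis's or JGAAP's actual implementation. The class name FmmSketch, the method segment, the toy dictionary, and the maximum word length of 3 are all assumptions made for illustration. At each position the segmenter takes the longest dictionary word that starts there and falls back to a single character when nothing matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class FmmSketch {
    // Hypothetical helper: greedy forward maximum matching over a dictionary.
    // Assumes every character fits in one Java char (no surrogate pairs).
    public static List<String> segment(String text, Set<String> dictionary, int maxWordLength) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            // Start with the longest candidate allowed at this position.
            int end = Math.min(text.length(), i + maxWordLength);
            // Shrink the candidate until it is a dictionary word
            // or only a single character is left.
            while (end > i + 1 && !dictionary.contains(text.substring(i, end))) {
                end--;
            }
            tokens.add(text.substring(i, end));
            i = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Toy dictionary, invented for this example only.
        Set<String> dictionary = Set.of("研究", "研究生", "生命", "起源");
        System.out.println(segment("研究生命起源", dictionary, 3));
    }
}
```

With this toy dictionary the input 研究生命起源 is segmented as [研究生, 命, 起源] rather than 研究 / 生命 / 起源, which shows the greedy behavior of the method: the longest match at the current position wins even when a shorter match would give a better overall segmentation.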

Format

PDF

Language

English
