Zesting Up Stylometry with MapLemon: A Corpus for Stylometric Demographic Identification

DOI

10.16995/DSCN.9665

Document Type

Journal Article

Publication Date

1-1-2023

Publication Title

Digital Studies/Le champ numérique

Volume

13

Issue

3

Abstract

MapLemon, now in its second iteration, is a corpus created to provide a baseline for studying linguistic variation among English-speaking North Americans. It currently holds upwards of 21,000 words from 185 participants, representing 10+ linguistic backgrounds and 40+ US states and Canadian provinces. MapLemon also includes writing from 91 transgender and non-binary individuals. MapLemon presents a unique method for collecting data in the virtual written medium, and the corpus has proven useful for stylometry, that is, identifying demographic information from writing style.

Open Access

Gold
