
Introduction

The ReadMe software package for R takes three inputs. The first is a set of text documents (such as speeches, blog posts, newspaper articles, judicial opinions, or movie reviews). The second is a categorization scheme chosen by the user (e.g., ordered positive-to-negative sentiment ratings, unordered policy topics, or any other mutually exclusive and exhaustive set of categories). The third is a small subset of the documents hand-classified into the given categories. The hand-classified subset need not be a random sample and can differ in dramatic but specific ways from the population of documents. If used properly, ReadMe will report, normally within sampling error of the truth, the proportion of documents in each of the given categories among those not hand-coded.

ReadMe computes the proportion of documents in each category without the more error-prone intermediate step of classifying individual documents. Not producing individual classifications is an important limitation for some purposes, but not for most social science applications. For example, we have been unable to locate many published examples of content analysis in political science where the ultimate goal was individual-level classification rather than the generalizations provided by the proportion of documents within each category, or perhaps the proportion within each category in subsets of the documents (such as policy areas or years). A similar point appears to apply to most of the social sciences and related academic fields. Thus, for example, our method cannot be used to classify letters to a legislative representative by policy area, but it could accurately estimate the distribution of letters by policy area. That makes the method useless for helping the legislator route each letter to the most informed employee to draft a response, but useful for a political scientist tracking the intensity of this form of constituency expression by policy area.
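To make the idea of estimating category proportions without classifying any single document more concrete, here is a minimal Python sketch. It assumes the general approach of the paper cited below: the population frequency of each word-stem profile, P(S), is treated as a mixture of the within-category profile frequencies, P(S|D), which are estimated from the hand-coded documents and weighted by the unknown category proportions, P(D). This is a simplified illustration, not ReadMe's R interface. It assumes only two categories, a single fixed set of binary word-stem features, and an unconstrained least-squares solve clipped to [0, 1]. The function names `profile_freqs` and `estimate_proportion` and the toy data are hypothetical.

```python
from collections import Counter


def profile_freqs(profiles):
    """Relative frequency of each word-stem profile (a tuple of 0/1 flags)."""
    counts = Counter(profiles)
    n = len(profiles)
    return {p: c / n for p, c in counts.items()}


def estimate_proportion(labeled, unlabeled):
    """Estimate the share of category "A" among unlabeled documents.

    Hypothetical two-category sketch: labeled is a list of (profile, category)
    pairs with category "A" or "B"; unlabeled is a list of profiles.
    Model assumed here: P(S=k) = b_k + pi * (a_k - b_k), where a_k = P(S=k | A)
    and b_k = P(S=k | B) come from the hand-coded documents.
    """
    fa = profile_freqs([p for p, c in labeled if c == "A"])  # P(S | D=A)
    fb = profile_freqs([p for p, c in labeled if c == "B"])  # P(S | D=B)
    fu = profile_freqs(unlabeled)                            # P(S) in population
    num = den = 0.0
    for k in set(fa) | set(fb) | set(fu):
        a, b, u = fa.get(k, 0.0), fb.get(k, 0.0), fu.get(k, 0.0)
        # Least squares for a single unknown pi:
        # pi = sum((a-b)(u-b)) / sum((a-b)^2)
        num += (a - b) * (u - b)
        den += (a - b) ** 2
    if den == 0:
        raise ValueError("profiles do not distinguish the two categories")
    # Clip to a valid proportion (the sketch does not do constrained regression).
    return min(1.0, max(0.0, num / den))


# Toy data: each profile flags two hypothetical stems, e.g. ("good", "bad").
# The hand-coded set is split 50/50 between A and B.
labeled = ([((1, 0), "A")] * 3 + [((0, 1), "A")]
           + [((1, 0), "B")] + [((0, 1), "B")] * 3)
# Unlabeled documents generated as 25% A and 75% B.
unlabeled = [(1, 0)] * 3 + [(0, 1)] * 5

print(estimate_proportion(labeled, unlabeled))  # → 0.25
```

Although the hand-coded documents are evenly split, the sketch recovers the 25% share in the unlabeled set, because it borrows only P(S|D), not the category balance, from the hand-coded documents. It still assumes that P(S|D) is the same in both sets, which is the sense in which the hand-coded subset may differ from the population only in specific ways.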

The specific procedures implemented in ReadMe are described in

Daniel Hopkins and Gary King. 2007. "Extracting Systematic Social Science Meaning from Text," http://GKing.Harvard.edu/.



Gary King 2011-07-12