Ansel MacLaughlin

Applied Scientist at Amazon

Content-based Models of Quotation: Datasets

The archive data_content-based_models_of_quotation.zip contains two of the datasets (KJB-CA and LAT-EJC) from our paper Content-based Models of Quotation (EACL-21). Unfortunately, we are unable to share any data from the JSTOR Understanding Series (KJB-JA, SHAK-JA, ABL-JA). Please contact JSTOR to discuss access to that data.

If you use either of our datasets, please cite our paper:

@inproceedings{maclaughlin-smith-2021-content,
  author={Ansel MacLaughlin and David A. Smith},
  title={Content-based Models of Quotation},
  booktitle={EACL},
  year={2021}
}

kjb-ca.jsonl: King James Bible - Chronicling America

lat-ejc.jsonl: Latin Text - JSTOR Early Journal Content
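
The .jsonl extension suggests that both files use the JSON Lines layout, with one JSON value per line. Assuming that holds, the minimal sketch below shows one way to read such a file in Python. The helper name read_jsonl and the keys in the in-memory sample are hypothetical and chosen only for illustration; they are not the datasets' actual field names, which are not documented here.

```python
import io
import json


def read_jsonl(lines):
    """Yield one parsed JSON value per non-blank line of an iterable of strings."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        try:
            yield json.loads(line)
        except json.JSONDecodeError as err:
            # Report which line failed so a damaged download is easy to spot.
            raise ValueError(f"line {line_number}: invalid JSON ({err.msg})") from err


# In-memory stand-in for a data file. The keys "id" and "text" are made up
# for this demo and are NOT the real field names of kjb-ca.jsonl or lat-ejc.jsonl.
sample = io.StringIO('{"id": 1, "text": "a"}\n\n{"id": 2, "text": "b"}\n')
records = list(read_jsonl(sample))
print(len(records), sorted(records[0]))
```

To inspect a real file after unzipping the archive, pass an open file handle instead, for example `with open("kjb-ca.jsonl", encoding="utf-8") as f: first = next(read_jsonl(f))`, and print `sorted(first)` to see which fields a record carries, provided each record is a JSON object. Reading lazily, one line at a time, avoids loading a whole file into memory.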