Datasets
In the interest of space and clarity, not every available dataset is listed in the subcategories below. The Speech Resources Consortium page, for example, provides dozens of corpora, and the Japan Data Catalog for the Humanities and Social Sciences had nearly 8,000 open-access datasets as of June 2022. Please refer to their individual pages for up-to-date information on available datasets.
Repositories and Portals
Text Data
- Dataset of Premodern Japanese Text (text and page images)
- Dataset of Edo Cooking Recipes (text and page images)
- e-Stat (statistical information from government agencies)
- Statistics Japan (statistical information from the Statistics Bureau)
- Nichibun Haikai Database (text in HTML)
- Nichibun Renga Database (text in HTML)
- Nichibun Waka Database (text in HTML)
- Electoral Datasets (Amy Catalinac, Harvard Dataverse Repository)
OCR Training
- KMNIST Dataset (kuzushiji)
- Dataset of Modern Magazines (includes 東洋学芸雑誌, 国民之友, 明六雑誌)
Maps/GIS
Stopwords
Image & Video Data
IIIF
datasets.txt · Last modified: 2022/07/16 02:07 by prcurtis