Datasets
Repositories and Portals
Text Data
- Dataset of Premodern Japanese Text (text and page images)
- Dataset of Edo Cooking Recipes (text and page images)
- eStat (statistical information from government agencies)
- Statistics Japan (statistical information from the Statistics Bureau)
- [[https://clrd.ninjal.ac.jp/bccwj/en/index.html|Balanced Corpus of Contemporary Written Japanese (BCCWJ)]]
- [[https://www2.ninjal.ac.jp/cojads/index.html|Corpus of Japanese Dialects (COJADS)]]
- [[https://clrd.ninjal.ac.jp/csj/en/index.html|Corpus of Spontaneous Japanese (CSJ)]]
- [[https://www2.ninjal.ac.jp/conversation/cejc.html|Corpus of Everyday Japanese Conversation (CEJC)]]
- [[https://www2.ninjal.ac.jp/jll/lsaj/|International Corpus of Japanese as a Second Language (I-JAS)]]
- [[https://mmsrv.ninjal.ac.jp/nucc/|Nagoya University Conversation Corpus (NUCC)]]
- [[https://www2.ninjal.ac.jp/conversation/shokuba.html|Gen-Nichi-Ken Corpus of Workplace Conversation (CWPC)]]
- [[https://masayu-a.github.io/NWJC/|NINJAL Web Japanese Corpus (NWJC)]]
- [[https://clrd.ninjal.ac.jp/cmj/index.html|Corpus of Modern Japanese (CMJ)]]
- [[https://masayu-a.github.io/anno/|Annotation Data (Anno)]]
- [[https://www2.ninjal.ac.jp/conversation/showaCorpus/|Showa Speech Corpus (SSC)]]
- [[https://clrd.ninjal.ac.jp/chj/overview-en.html|Corpus of Historical Japanese (CHJ)]]
OCR Training
- KMNIST Dataset (kuzushiji)
- Dataset of Modern Magazines (includes 東洋学芸雑誌, 国民之友, 明六雑誌)
Maps/GIS
Image Data
IIIF
Last modified: 2022/06/07 15:56 by prcurtis