datasets

Differences

This shows you the differences between two versions of the page.

datasets [2022/06/07 16:45] – [Text Data] prcurtis
datasets [2022/07/16 02:07] (current) – [Text Data] prcurtis
Line 1: Line 1:
 ======Datasets======
  
-Please note that in the interest of space and clarity not every dataset available will be listed in the subcategories below. The Speech Resources Consortium page, for example, provides dozens of corpora, as does the Japan Data Catalog for the Humanities and Social Sciences. Please refer to their individual pages for more updated information on available datasets.
+Please note that in the interest of space and clarity not every dataset available will be listed in the subcategories below. The Speech Resources Consortium page, for example, provides dozens of corpora, as does the Japan Data Catalog for the Humanities and Social Sciences (which had nearly 8,000 open-access datasets as of June 2022). Please refer to their individual pages for more updated information on available datasets.
  
 =====Repositories and Portals=====
Line 42: Line 42:
   * [[https://www.nii.ac.jp/dsc/idr/recruit/|Recruit Co. Ltd. Hot Pepper Beauty Dataset]]
   * [[https://www.nii.ac.jp/dsc/idr/nico/|Nico Nico video comment Dataset]]
 +  * [[https://lapis.nichibun.ac.jp/haikai/menu.html|Nichibun Haikai Database]] (text in HTML) 
 +  * [[https://lapis.nichibun.ac.jp/renga/menu.html|Nichibun Renga Database]] (text in HTML) 
 +  * [[https://lapis.nichibun.ac.jp/waka/menu.html|Nichibun Waka Database]] (text in HTML) 
 +  * [[https://dataverse.harvard.edu/dataverse/amycatalinac|Electoral Datasets]] (Amy Catalinac, Harvard Dataverse Repository)
 =====OCR Training=====
  
Line 53: Line 56:
  
   * [[http://geoshape.ex.nii.ac.jp/|Geoshape Repository (Geoshapeリポジトリ)]]
 +  * [[https://www.gsi.go.jp/ENGLISH/index.html|Geospatial Information Authority of Japan]]
   * [[https://sites.fas.harvard.edu/~chgis/data/japan/|Japan Historical Map Data (Harvard)]]
   * [[https://www.jodc.go.jp/aboutJODC_work_data.html|Japan Oceanographic Data Center (JODC)]]
Line 63: Line 67:
   * [[http://codh.rois.ac.jp/edo-spots/|Edo Sightseeing Guide]]
  
 +=====Stopwords=====
  
 +**[[https://github.com/stopwords-iso/stopwords-ja/blob/master/stopwords-ja.txt|Common Stopwords for Japanese]]**
 +[[https://github.com/stopwords-iso|Stopwords ISO]]
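
A stopword list like the one linked above is typically applied after tokenization, by dropping any token that appears in the list. The following is only a minimal sketch: it assumes the list is a plain-text file with one word per line, and it uses a tiny made-up sample (`SAMPLE_STOPWORDS`) in place of the real file. The function names and the pre-tokenized input are illustrative assumptions, not part of the Stopwords ISO project.

```python
def parse_stopwords(text: str) -> set[str]:
    """Parse stopword text (assumed: one word per line) into a set."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def remove_stopwords(tokens: list[str], stopwords: set[str]) -> list[str]:
    """Return the tokens that are not in the stopword set, in order."""
    return [token for token in tokens if token not in stopwords]


# Hypothetical three-word sample standing in for the downloaded list.
SAMPLE_STOPWORDS = "の\nに\nは\n"

stopwords = parse_stopwords(SAMPLE_STOPWORDS)

# Japanese text has no spaces between words, so this sketch assumes the
# sentence was already split into tokens by a morphological analyzer.
tokens = ["私", "は", "図書館", "に", "行く"]
filtered = remove_stopwords(tokens, stopwords)
print(filtered)
```

In practice the sample string would be replaced by the contents of the downloaded list, read with UTF-8 encoding.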
  
 =====Image & Video Data=====
datasets.1654620308.txt.gz · Last modified: 2022/06/07 16:45 by prcurtis