datasets

Differences

This shows you the differences between two versions of the page.

datasets [2022/06/07 16:28] – [Image Data] prcurtis
datasets [2022/07/16 02:07] (current) – [Text Data] prcurtis
Line 1: Line 1:
 ======Datasets======
  
-Please note that in the interest of space and clarity not every dataset available will be listed in the subcategories below. The Speech Resources Consortium page, for example, provides dozens of corpora, as does the Japan Data Catalog for the Humanities and Social Sciences. Please refer to their individual pages for more updated information on available datasets.
+Please note that in the interest of space and clarity not every dataset available will be listed in the subcategories below. The Speech Resources Consortium page, for example, provides dozens of corpora, as does the Japan Data Catalog for the Humanities and Social Sciences (which had nearly 8,000 open-access datasets as of June 2022). Please refer to their individual pages for more updated information on available datasets.
  
 =====Repositories and Portals=====
Line 32: Line 32:
   * [[https://www.nii.ac.jp/dsc/idr/en/rakuten/|NII- Rakuten Datasets ]]
   * [[https://www.nii.ac.jp/dsc/idr/en/rdata/Hazumi/|NII- Osaka University Multimodal Dialogue Corpus (Hazumi)]]
 +  * [[https://www.nii.ac.jp/dsc/idr/rdata/Ritsumei-ARC/|Ritsumeikan ARC Ukiyo-e Database]]
 +  * [[https://www.nii.ac.jp/dsc/idr/jast/|JAST Medical Dataset]]
 +  * [[https://www.nii.ac.jp/dsc/idr/athome/|At Home Co. Ltd. Real Estate Dataset]]
 +  * [[https://www.nii.ac.jp/dsc/idr/bengo4/|Bengo4.com Lawyer Dataset]]
 +  * [[https://www.nii.ac.jp/dsc/idr/diet-review/|Diet products Dataset]]
 +  * [[https://www.nii.ac.jp/dsc/idr/oricon/|Oricon customer satisfaction Dataset]]
 +  * [[https://www.nii.ac.jp/dsc/idr/intage/|INTAGE retail Dataset]]
 +  * [[https://www.nii.ac.jp/dsc/idr/fuman/|Insight Tech Co. Ltd. Dissatisfaction Inquiry Dataset]]
 +  * [[https://www.nii.ac.jp/dsc/idr/recruit/|Recruit Co. Ltd. Hot Pepper Beauty Dataset]]
 +  * [[https://www.nii.ac.jp/dsc/idr/nico/|Nico Nico video comment Dataset]]
 +  * [[https://lapis.nichibun.ac.jp/haikai/menu.html|Nichibun Haikai Database]] (text in HTML)
 +  * [[https://lapis.nichibun.ac.jp/renga/menu.html|Nichibun Renga Database]] (text in HTML)
 +  * [[https://lapis.nichibun.ac.jp/waka/menu.html|Nichibun Waka Database]] (text in HTML)
 +  * [[https://dataverse.harvard.edu/dataverse/amycatalinac|Electoral Datasets]] (Amy Catalinac, Harvard Dataverse Repository)
 =====OCR Training=====
  
Line 42: Line 56:
  
   * [[http://geoshape.ex.nii.ac.jp/|Geoshape Repository (Geoshapeリポジトリ)]]
 +  * [[https://www.gsi.go.jp/ENGLISH/index.html|Geospatial Information Authority of Japan]]
   * [[https://sites.fas.harvard.edu/~chgis/data/japan/|Japan Historical Map Data (Harvard)]]
   * [[https://www.jodc.go.jp/aboutJODC_work_data.html|Japan Oceanographic Data Center (JODC)]]
Line 52: Line 67:
   * [[http://codh.rois.ac.jp/edo-spots/|Edo Sightseeing Guide]]
  
 +=====Stopwords=====
  
 +**[[https://github.com/stopwords-iso/stopwords-ja/blob/master/stopwords-ja.txt|Common Stopwords for Japanese]]**
 +[[https://github.com/stopwords-iso|Stopwords ISO]]
  
 =====Image & Video Data=====
Line 63: Line 81:
   * [[https://www.nii.ac.jp/dsc/idr/rdata/KoSign/|Kokugakuin University Japanese Sign Language Database (KoSign)]]
   * [[https://www.nii.ac.jp/dsc/idr/rdata/Hazumi/|Osaka University Multimodal Dialogue Corpus (Hazumi)]]
 +  * [[https://www.nii.ac.jp/dsc/idr/rdata/TDU-NEDO/|Group Communication Corpus (TDU-NEDO)]]
 +  * [[https://www.nii.ac.jp/dsc/idr/trigger/|Trigger Co. Ltd. Animation Dataset]]
 +  * [[https://www.nii.ac.jp/dsc/idr/sansan/|Sansan business card Dataset]]
 =====IIIF=====
  
   * [[http://bauddha.dhii.jp/SAT/iiifmani/show.php|IIIF Manifests for Buddhist Studies]]
  
datasets.1654619299.txt.gz · Last modified: 2022/06/07 16:28 by prcurtis