tutorials

Differences

This shows you the differences between two versions of the page.

tutorials [2022/06/22 18:18] – [Text Segmentation] prcurtis
tutorials [2022/12/10 17:10] (current) – [OCR & Kuzushiji Reading] prcurtis
Line 8: Line 8:
 **[[http://darthcrimson.org/digital-japanese-literature-aozora-bunko/|Digital Japanese Literature: Aozora Bunko]]** (removing ruby from texts)
 [[https://experiencing.art/|Christopher Morse]]
 +
 +===== Dictionaries for Word Segmentation =====
 +
 +**[[https://www.dampfkraft.com/nlp/japanese-tokenizer-dictionaries.html|An Overview of Japanese Tokenizer Dictionaries]]**
 +[[https://www.dampfkraft.com/|Paul McCann]]
  
 ===== Encoding =====
Line 24: Line 29:
 [[https://www.mstavros.com/home|Matthew Stavros]]
  
 +**[[https://digitalorientalist.com/2021/11/09/i-just-want-the-data-a-short-guide-to-gsi-japan-for-non-japanese-speaking-users/|“I Just Want the Data!”: A Short Guide to GSI Japan for Non-Japanese-Speaking Users]]**
 +[[https://digitalorientalist.com/author/pulpbindandbond/|Matthew Hayes]]
 ===== OCR & Kuzushiji Reading=====
  
Line 55: Line 62:
 **[[https://clrd.ninjal.ac.jp/tutorial.html|Tutorials on linguistic corpora (J)]]**
 [[https://www.ninjal.ac.jp/english/|National Institute for Japanese Language and Linguistics (国立国語研究所)]]
 +
 +
 +**[[https://digitalorientalist.com/2022/12/09/genius-loci-extracting-names-and-places-from-japanese-texts/|Genius loci: extracting names and places from Japanese texts]]**
 +[[https://digitalorientalist.com/about-anna-oskina/|Anna Oskina]], //[[https://digitalorientalist.com|The Digital Orientalist]]//
  
 ===== Text Mining =====
Line 67: Line 78:
 **[[https://leanpub.com/japanesenlp|Introduction to Japanese Natural Language Processing]]**
 [[https://twitter.com/mhagiwara|Masato Hagiwara]] and [[https://www.dampfkraft.com/|Paul O'Leary McCann]]
 +
 +
 +**[[https://digitalorientalist.com/2022/12/09/genius-loci-extracting-names-and-places-from-japanese-texts/|Genius loci: extracting names and places from Japanese texts]]**
 +[[https://digitalorientalist.com/about-anna-oskina/|Anna Oskina]], //[[https://digitalorientalist.com|The Digital Orientalist]]//
  
 **[[https://steviepoppe.net/blog/2020/04/a-quick-guide-to-data-mining-textual-analysis-of-japanese-twitter/|A Quick Guide to Data-mining & (Textual) Analysis of (Japanese) Twitter Part 1: Twitter Data Collection]]
tutorials.1655921887.txt.gz · Last modified: 2022/06/22 18:18 by prcurtis