
tutorials

Differences

This shows you the differences between two versions of the page.

tutorials [2022/06/21 23:24] – [Text Mining] prcurtis
tutorials [2022/12/10 17:10] (current) – [OCR & Kuzushiji Reading] prcurtis
Line 8: Line 8:
 **[[http://darthcrimson.org/digital-japanese-literature-aozora-bunko/|Digital Japanese Literature: Aozora Bunko]]** (removing ruby from texts)
 [[https://experiencing.art/|Christopher Morse]]
 +
 +===== Dictionaries for Word Segmentation =====
 +
 +**[[https://www.dampfkraft.com/nlp/japanese-tokenizer-dictionaries.html|An Overview of Japanese Tokenizer Dictionaries]]**
 +[[https://www.dampfkraft.com/|Paul McCann]]
  
 ===== Encoding =====
Line 24: Line 29:
 [[https://www.mstavros.com/home|Matthew Stavros]]
  
 +**[[https://digitalorientalist.com/2021/11/09/i-just-want-the-data-a-short-guide-to-gsi-japan-for-non-japanese-speaking-users/|“I Just Want the Data!”: A Short Guide to GSI Japan for Non-Japanese-Speaking Users]]**
 +[[https://digitalorientalist.com/author/pulpbindandbond/|Matthew Hayes]]
 ===== OCR & Kuzushiji Reading =====
  
Line 46: Line 53:
 **[[https://slideslive.com/38939744/fugashi-a-tool-for-japanese-tokenization|fugashi: A Tool for Japanese Tokenization]]**
 [[https://www.dampfkraft.com/|Paul McCann]]
 +
 +**[[https://towardsdatascience.com/how-japanese-tokenizers-work-87ab6b256984|How Japanese Tokenizers Work]]**
 +[[https://medium.com/@wanasit?source=post_page-----87ab6b256984--------------------------------|Wanasit Tanakitrungruang]]
  
 **[[https://digitalorientalist.com/2021/05/11/basic-python-for-japanese-studies-using-fugashi-for-text-segmentation/|Basic Python for Japanese Studies: Using fugashi for Text Segmentation]]**
Line 52: Line 62:
 **[[https://clrd.ninjal.ac.jp/tutorial.html|Tutorials on linguistic corpora (J)]]**
 [[https://www.ninjal.ac.jp/english/|National Institute for Japanese Language and Linguistics (国立国語研究所)]]
 +
 +
 +**[[https://digitalorientalist.com/2022/12/09/genius-loci-extracting-names-and-places-from-japanese-texts/|Genius loci: extracting names and places from Japanese texts]]**
 +[[https://digitalorientalist.com/about-anna-oskina/|Anna Oskina]], //[[https://digitalorientalist.com|The Digital Orientalist]]//
  
 ===== Text Mining =====
Line 65: Line 79:
 [[https://twitter.com/mhagiwara|Masato Hagiwara]] and [[https://www.dampfkraft.com/|Paul O'Leary McCann]]
  
 +
 +**[[https://digitalorientalist.com/2022/12/09/genius-loci-extracting-names-and-places-from-japanese-texts/|Genius loci: extracting names and places from Japanese texts]]**
 +[[https://digitalorientalist.com/about-anna-oskina/|Anna Oskina]], //[[https://digitalorientalist.com|The Digital Orientalist]]//
 +
 +**[[https://steviepoppe.net/blog/2020/04/a-quick-guide-to-data-mining-textual-analysis-of-japanese-twitter/|A Quick Guide to Data-mining & (Textual) Analysis of (Japanese) Twitter Part 1: Twitter Data Collection]]**
 +**[[https://steviepoppe.net/blog/2020/05/a-quick-guide-to-data-mining-textual-analysis-of-japanese-twitter-part-2/|A Quick Guide to Data-mining & (Textual) Analysis of (Japanese) Twitter Part 2: Basic Metrics & Graphs]]**
 +**[[https://steviepoppe.net/blog/2020/06/a-quick-guide-to-data-mining-textual-analysis-of-japanese-twitter-part-3/|A Quick Guide to Data-mining & (Textual) Analysis of (Japanese) Twitter Part 3: Natural Language Processing With MeCab, Neologd and KH Coder]]**
 +**[[https://steviepoppe.net/blog/2020/07/a-quick-guide-to-data-mining-textual-analysis-of-japanese-twitter-part-4/|A Quick Guide to Data-mining & (Textual) Analysis of (Japanese) Twitter Part 4: Natural Language Processing With MeCab, Neologd and NLTK]]**
 +[[https://steviepoppe.net/|Stevie Poppe]]
 ===== Webscraping =====
  
tutorials.1655853864.txt.gz · Last modified: 2022/06/21 23:24 by prcurtis