Tutorials & Reviews
Cleaning Data
Taiyō Project: First Steps with Data
Molly Des Jardin
Digital Japanese Literature: Aozora Bunko (removing ruby from texts)
Christopher Morse
Regular Expressions (regex) for Japanese
Provided by Hoyt Long. The patterns are also available as a text file: jpn_reg.txt
HTML TAGS: <[^<]+?>
WORD1 OR WORD2: 学校|學校
SENTENCE: [^!?。]*[!?。」]
QUOTATION: 「[^」]*」
ALL HIRAGANA: [ぁ-ゟ ]+
ALL KATAKANA: [゠-ヿ]+
ALL KANJI: [\u4E00-\u9FEF]
METAPHOR (?): .{3}(のように|みたいに).{3}
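The patterns above can be tried out with Python's standard `re` module. What follows is a minimal sketch, not part of the original list. The dictionary keys and the helper names `find_all` and `strip_tags` are hypothetical, chosen only for illustration. The patterns themselves are copied unchanged.

```python
import re

# Patterns copied verbatim from the list above.
# The dictionary keys are illustrative names, not part of the source.
PATTERNS = {
    "html_tag": re.compile(r"<[^<]+?>"),
    "word": re.compile(r"学校|學校"),
    "sentence": re.compile(r"[^!?。]*[!?。」]"),
    "quotation": re.compile(r"「[^」]*」"),
    "hiragana": re.compile(r"[ぁ-ゟ ]+"),  # the class also allows a space, as in the source
    "katakana": re.compile(r"[゠-ヿ]+"),
    "kanji": re.compile(r"[\u4E00-\u9FEF]"),  # matches one character at a time
    "simile": re.compile(r".{3}(のように|みたいに).{3}"),
}


def find_all(name: str, text: str) -> list[str]:
    """Return every full match of the named pattern in text.

    finditer with group(0) is used instead of findall because the
    simile pattern contains a capturing group, and findall would
    return only that group rather than the whole match.
    """
    return [m.group(0) for m in PATTERNS[name].finditer(text)]


def strip_tags(text: str) -> str:
    """Remove anything that looks like an HTML tag."""
    return PATTERNS["html_tag"].sub("", text)


if __name__ == "__main__":
    sample = "<p>彼は「学校へ行く」と言った。カタカナのテストです。</p>"
    plain = strip_tags(sample)
    for name in ("sentence", "quotation", "hiragana", "katakana", "kanji"):
        print(name, find_all(name, plain))
    print("simile", find_all("simile", "彼女は花のように笑った。"))
```

Two points to keep in mind when adapting these patterns: the kanji pattern has no `+`, so a run of kanji is returned as separate characters, and the last pattern needs at least three characters of context on each side of のように or みたいに before it will match.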
Encoding
Encodings of Japanese
Alexandre Elias
IIIF
Mapping
OCR & Kuzushiji Reading
Cursive Japanese and OCR: Using KuroNet
James Harry Morris, The Digital Orientalist
An Introduction to the miwo kuzushiji app
ROIS-DS CODH
Google Docs and OCR: Some Experiments Transcribing Japanese Language Texts
James Harry Morris, The Digital Orientalist
Text Segmentation
Japanese Text Segmentation and Analysis with Web ChaMame
James Harry Morris, The Digital Orientalist
Basic Python for Japanese Studies: Using fugashi for Text Segmentation
James Harry Morris, The Digital Orientalist
Tutorials on linguistic corpora (in Japanese)
National Institute for Japanese Language and Linguistics (国立国語研究所)
Text Mining
An Introduction to Japanese Text Mining
Mark Ravina (UT Austin)
Using Voyant Tools with Historical Japanese Texts
James Harry Morris, The Digital Orientalist