Table of Contents
Tutorials & Reviews
Cleaning Data
Taiyō Project: First Steps with Data
Molly Des Jardin
Digital Japanese Literature: Aozora Bunko (removing ruby from texts)
Christopher Morse
Dictionaries for Word Segmentation
Encoding
Encodings of Japanese
Alexandre Elias
IIIF
Mapping
From Bunkachō to Google Maps
Matthew Stavros
“I Just Want the Data!”: A Short Guide to GSI Japan for Non-Japanese-Speaking Users
Matthew Hayes
OCR & Kuzushiji Reading
An Introduction to the miwo kuzushiji app
ROIS-DS CODH
Cursive Japanese and OCR: Using KuroNet
James Harry Morris, The Digital Orientalist
Practicing Reading Cursive Japanese with Miwo
James Harry Morris, The Digital Orientalist
Google Docs and OCR: Some Experiments Transcribing Japanese Language Texts
James Harry Morris, The Digital Orientalist
Text Segmentation
Japanese Text Segmentation and Analysis with Web ChaMame
James Harry Morris, The Digital Orientalist
fugashi, a Tool for Tokenizing Japanese in Python
fugashi: A Tool for Japanese Tokenization
Paul McCann
How Japanese Tokenizers Work
Wanasit Tanakitrungruang
Basic Python for Japanese Studies: Using fugashi for Text Segmentation
James Harry Morris, The Digital Orientalist
Tutorials on linguistic corpora (J)
National Institute for Japanese Language and Linguistics (国立国語研究所)
Genius loci: extracting names and places from Japanese texts
Anna Oskina, The Digital Orientalist
Text Mining
An Introduction to Japanese Text Mining
Mark Ravina (UT Austin)
Using Voyant Tools with Historical Japanese Texts
James Harry Morris, The Digital Orientalist
Introduction to Japanese Natural Language Processing
Masato Hagiwara and Paul O'Leary McCann
Genius loci: extracting names and places from Japanese texts
Anna Oskina, The Digital Orientalist
A Quick Guide to Data-mining & (Textual) Analysis of (Japanese) Twitter Part 1: Twitter Data Collection
A Quick Guide to Data-mining & (Textual) Analysis of (Japanese) Twitter Part 2: Basic Metrics & Graphs
A Quick Guide to Data-mining & (Textual) Analysis of (Japanese) Twitter Part 3: Natural Language Processing With MeCab, Neologd and KH Coder
A Quick Guide to Data-mining & (Textual) Analysis of (Japanese) Twitter Part 4: Natural Language Processing With MeCab, Neologd and NLTK
Stevie Poppe