Table of Contents

Tutorials & Reviews

Cleaning Data

Taiyō Project: First Steps with Data
Molly Des Jardin

Digital Japanese Literature: Aozora Bunko (removing ruby from texts)
Christopher Morse

Dictionaries for Word Segmentation

An Overview of Japanese Tokenizer Dictionaries
Paul McCann

Encoding

Encodings of Japanese
Alexandre Elias

IIIF

IIIF Curation Platform (ICP) Tutorial
Suzuki Chikahiko

Mapping

From Bunkachō to Google Maps
Matthew Stavros

“I Just Want the Data!”: A Short Guide to GSI Japan for Non-Japanese-Speaking Users
Matthew Hayes

OCR & Kuzushiji Reading

An Introduction to the miwo kuzushiji app
ROIS-DS CODH

Cursive Japanese and OCR: Using KuroNet
James Harry Morris, The Digital Orientalist

Practicing Reading Cursive Japanese with Miwo
James Harry Morris, The Digital Orientalist

Google Docs and OCR: Some Experiments Transcribing Japanese Language Texts
James Harry Morris, The Digital Orientalist

Text Segmentation

Japanese Text Segmentation and Analysis with Web ChaMame
James Harry Morris, The Digital Orientalist

fugashi, a Tool for Tokenizing Japanese in Python
fugashi: A Tool for Japanese Tokenization
Paul McCann

How Japanese Tokenizers Work
Wanasit Tanakitrungruang

Basic Python for Japanese Studies: Using fugashi for Text Segmentation
James Harry Morris, The Digital Orientalist

Tutorials on linguistic corpora (in Japanese)
National Institute for Japanese Language and Linguistics (国立国語研究所)

Genius loci: extracting names and places from Japanese texts
Anna Oskina, The Digital Orientalist

Text Mining

An Introduction to Japanese Text Mining
Mark Ravina (UT Austin)

Using Voyant Tools with Historical Japanese Texts
James Harry Morris, The Digital Orientalist

Introduction to Japanese Natural Language Processing
Masato Hagiwara and Paul O'Leary McCann

Genius loci: extracting names and places from Japanese texts
Anna Oskina, The Digital Orientalist

A Quick Guide to Data-mining & (Textual) Analysis of (Japanese) Twitter Part 1: Twitter Data Collection
A Quick Guide to Data-mining & (Textual) Analysis of (Japanese) Twitter Part 2: Basic Metrics & Graphs
A Quick Guide to Data-mining & (Textual) Analysis of (Japanese) Twitter Part 3: Natural Language Processing With MeCab, Neologd and KH Coder
A Quick Guide to Data-mining & (Textual) Analysis of (Japanese) Twitter Part 4: Natural Language Processing With MeCab, Neologd and NLTK
Stevie Poppe

Webscraping

Crawling Aozora Bunko
Molly Des Jardin

Web Scraping with Python for Beginners
James Harry Morris, The Digital Orientalist