====== Tutorials & Reviews ======
===== Cleaning Data =====
Taiyō Project: First Steps with Data
Molly Des Jardin
Digital Japanese Literature: Aozora Bunko (removing ruby from texts)
Christopher Morse
===== Dictionaries for Word Segmentation =====
An Overview of Japanese Tokenizer Dictionaries
Paul McCann
===== Encoding =====
Encodings of Japanese
Alexandre Elias
===== IIIF =====
IIIF Curation Platform (ICP) Tutorial
Suzuki Chikahiko
===== Mapping =====
From Bunkachō to Google Maps
Matthew Stavros
===== OCR & Kuzushiji Reading =====
An Introduction to the miwo kuzushiji app
ROIS-DS CODH
Cursive Japanese and OCR: Using KuroNet
James Harry Morris, The Digital Orientalist
Practicing Reading Cursive Japanese with Miwo
James Harry Morris, The Digital Orientalist
Google Docs and OCR: Some Experiments Transcribing Japanese Language Texts
James Harry Morris, The Digital Orientalist
===== Text Segmentation =====
Japanese Text Segmentation and Analysis with Web ChaMame
James Harry Morris, The Digital Orientalist
fugashi, a Tool for Tokenizing Japanese in Python
fugashi: A Tool for Japanese Tokenization
Paul McCann
How Japanese Tokenizers Work
Wanasit Tanakitrungruang
Basic Python for Japanese Studies: Using fugashi for Text Segmentation
James Harry Morris, The Digital Orientalist
Tutorials on linguistic corpora (J)
National Institute for Japanese Language and Linguistics (国立国語研究所)
===== Text Mining =====
An Introduction to Japanese Text Mining
Mark Ravina (UT Austin)
Using Voyant Tools with Historical Japanese Texts
James Harry Morris, The Digital Orientalist
Introduction to Japanese Natural Language Processing
Masato Hagiwara and Paul O'Leary McCann
A Quick Guide to Data-mining & (Textual) Analysis of (Japanese) Twitter Part 1: Twitter Data Collection
A Quick Guide to Data-mining & (Textual) Analysis of (Japanese) Twitter Part 2: Basic Metrics & Graphs
A Quick Guide to Data-mining & (Textual) Analysis of (Japanese) Twitter Part 3: Natural Language Processing With MeCab, Neologd and KH Coder
A Quick Guide to Data-mining & (Textual) Analysis of (Japanese) Twitter Part 4: Natural Language Processing With MeCab, Neologd and NLTK
Stevie Poppe
===== Webscraping =====
Crawling Aozora Bunko
Molly Des Jardin
Web Scraping with Python for Beginners
James Harry Morris, The Digital Orientalist