Tutorials & Reviews

Cleaning Data

Taiyō Project: First Steps with Data
Molly Des Jardin

Digital Japanese Literature: Aozora Bunko (removing ruby from texts)
Christopher Morse

Regular Expressions (regex) for Japanese
Provided by Hoyt Long. The patterns below are also available as a downloadable text file, jpn_reg.txt.

HTML TAGS: <[^<]+?>
 
WORD1 OR WORD2: 学校|學校
 
SENTENCE: [^!?。]*[!?。」]
 
QUOTATION: 「[^」]*」
 
ALL HIRAGANA: [ぁ-ゟ ]+  
 
ALL KATAKANA: [゠-ヿ]+   
 
ALL KANJI: [\u4E00-\u9FEF]
 
METAPHOR (?): .{3}(のように|みたいに).{3}
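
The patterns above can be tried out with Python's built-in re module. The following is only a minimal sketch, not part of the original list: the dictionary keys, the helper names find_all and strip_tags, and the sample sentences are all illustrative assumptions. The patterns themselves are copied unchanged from the list.

```python
import re

# Patterns copied from the list above. The dictionary keys are
# illustrative names chosen for this sketch, not part of the original.
PATTERNS = {
    "html_tag": r"<[^<]+?>",
    "word1_or_word2": r"学校|學校",
    "sentence": r"[^!?。]*[!?。」]",
    "quotation": r"「[^」]*」",
    "hiragana": r"[ぁ-ゟ ]+",
    "katakana": r"[゠-ヿ]+",
    "kanji": r"[\u4E00-\u9FEF]",
    "metaphor": r".{3}(のように|みたいに).{3}",
}


def find_all(name, text):
    """Return the full text of every match of the named pattern.

    finditer + group(0) is used instead of findall so that patterns
    containing a capturing group (such as "metaphor") still return the
    whole matched span rather than only the group.
    """
    return [m.group(0) for m in re.finditer(PATTERNS[name], text)]


def strip_tags(text):
    """Remove anything matched by the HTML TAGS pattern."""
    return re.sub(PATTERNS["html_tag"], "", text)


# Hypothetical sample sentences, used only to exercise the patterns.
plain = strip_tags("<p>彼は学校へ行った。</p>")
kanji = find_all("kanji", plain)
quotes = find_all("quotation", "彼は「行こう」と言った。")
sentences = find_all("sentence", "雨だ。帰ろう。")
similes = find_all("metaphor", "彼女は花のように笑った。")
```

Because the KANJI pattern has no + quantifier, it returns one character per match, while the HIRAGANA and KATAKANA patterns return whole runs. The METAPHOR pattern requires exactly three characters on each side of the marker, so a marker closer than three characters to the start or end of the text is not matched.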

Encoding

Encodings of Japanese
Alexandre Elias

IIIF

Mapping

OCR & Kuzushiji Reading

Text Segmentation

Text Mining

Webscraping

tutorials.1655822865.txt.gz · Last modified: 2022/06/21 14:47 by prcurtis