====== Tutorials & Reviews ======

===== Cleaning Data =====

**[[https://www.mollydesjardin.com/blog/taiyo-project-first-steps-with-data/|Taiyō Project: First Steps with Data]]**
[[https://www.mollydesjardin.com/|Molly Des Jardin]]

**[[http://darthcrimson.org/digital-japanese-literature-aozora-bunko/|Digital Japanese Literature: Aozora Bunko]]** (removing ruby from texts)
[[https://experiencing.art/|Christopher Morse]]

===== Dictionaries for Word Segmentation =====

**[[https://www.dampfkraft.com/nlp/japanese-tokenizer-dictionaries.html|An Overview of Japanese Tokenizer Dictionaries]]**
[[https://www.dampfkraft.com/|Paul McCann]]

===== Encoding =====

**[[https://www.sljfaq.org/afaq/encodings.html|Encodings of Japanese]]**
Alexandre Elias

===== IIIF =====

**[[http://www.ch-suzuki.com/icpt/index.html.en|IIIF Curation Platform (ICP) Tutorial]]**
[[http://www.ch-suzuki.com/index.html|Suzuki Chikahiko]]

===== Mapping =====

**[[https://www.mstavros.com/mapping-guide|From Bunkachō to Google Maps]]**
[[https://www.mstavros.com/home|Matthew Stavros]]

**[[https://digitalorientalist.com/2021/11/09/i-just-want-the-data-a-short-guide-to-gsi-japan-for-non-japanese-speaking-users/|“I Just Want the Data!”: A Short Guide to GSI Japan for Non-Japanese-Speaking Users]]**
[[https://digitalorientalist.com/author/pulpbindandbond/|Matthew Hayes]]

===== OCR & Kuzushiji Reading =====

**[[https://www.youtube.com/watch?v=ZUS7rSKGscc|An Introduction to the miwo kuzushiji app]]**
[[http://codh.rois.ac.jp/|ROIS-DS CODH]]

**[[https://digitalorientalist.com/2020/02/18/cursive-japanese-and-ocr-using-kuronet/|Cursive Japanese and OCR: Using KuroNet]]**
[[https://digitalorientalist.com/author/morrisjh/|James Harry Morris]], //[[https://digitalorientalist.com|The Digital Orientalist]]//

**[[https://digitalorientalist.com/2022/06/03/practicing-reading-cursive-japanese-with-miwo/|Practicing Reading Cursive Japanese with Miwo]]**
[[https://digitalorientalist.com/author/morrisjh/|James Harry Morris]],
//[[https://digitalorientalist.com|The Digital Orientalist]]//

**[[https://digitalorientalist.com/2021/04/09/google-docs-and-ocr-some-experiments-transcribing-japanese-language-texts/|Google Docs and OCR: Some Experiments Transcribing Japanese Language Texts]]**
[[https://digitalorientalist.com/author/morrisjh/|James Harry Morris]], //[[https://digitalorientalist.com|The Digital Orientalist]]//

===== Text Segmentation =====

**[[https://digitalorientalist.com/2019/03/19/japanese-text-segmentation-with-web-chamame/|Japanese Text Segmentation and Analysis with Web ChaMame]]**
[[https://tsukuba.academia.edu/JamesMorris|James Harry Morris]], //[[https://digitalorientalist.com|The Digital Orientalist]]//

**[[https://aclanthology.org/2020.nlposs-1.7/|fugashi, a Tool for Tokenizing Japanese in Python]]**
**[[https://slideslive.com/38939744/fugashi-a-tool-for-japanese-tokenization|fugashi: A Tool for Japanese Tokenization]]**
[[https://www.dampfkraft.com/|Paul McCann]]

**[[https://towardsdatascience.com/how-japanese-tokenizers-work-87ab6b256984|How Japanese Tokenizers Work]]**
[[https://medium.com/@wanasit?source=post_page-----87ab6b256984--------------------------------|Wanasit Tanakitrungruang]]

**[[https://digitalorientalist.com/2021/05/11/basic-python-for-japanese-studies-using-fugashi-for-text-segmentation/|Basic Python for Japanese Studies: Using fugashi for Text Segmentation]]**
[[https://tsukuba.academia.edu/JamesMorris|James Harry Morris]], //[[https://digitalorientalist.com|The Digital Orientalist]]//

**[[https://clrd.ninjal.ac.jp/tutorial.html|Tutorials on linguistic corpora (J)]]**
[[https://www.ninjal.ac.jp/english/|National Institute for Japanese Language and Linguistics (国立国語研究所)]]

**[[https://digitalorientalist.com/2022/12/09/genius-loci-extracting-names-and-places-from-japanese-texts/|Genius loci: extracting names and places from Japanese texts]]**
[[https://digitalorientalist.com/about-anna-oskina/|Anna Oskina]], //[[https://digitalorientalist.com|The Digital Orientalist]]//

===== Text Mining =====

**[[http://laits.utexas.edu/~mr56267/Japanese_Text_Mining/Harvard_Jtextmining_intro.html|An Introduction to Japanese Text Mining]]**
[[https://liberalarts.utexas.edu/history/faculty/mr56267/|Mark Ravina]] (UT Austin)

**[[https://digitalorientalist.com/2021/06/18/using-voyant-tools-with-historical-japanese-texts/|Using Voyant Tools with Historical Japanese Texts]]**
[[https://digitalorientalist.com/author/morrisjh/|James Harry Morris]], //[[https://digitalorientalist.com|The Digital Orientalist]]//

**[[https://leanpub.com/japanesenlp|Introduction to Japanese Natural Language Processing]]**
[[https://twitter.com/mhagiwara|Masato Hagiwara]] and [[https://www.dampfkraft.com/|Paul O'Leary McCann]]

**[[https://digitalorientalist.com/2022/12/09/genius-loci-extracting-names-and-places-from-japanese-texts/|Genius loci: extracting names and places from Japanese texts]]**
[[https://digitalorientalist.com/about-anna-oskina/|Anna Oskina]], //[[https://digitalorientalist.com|The Digital Orientalist]]//

**[[https://steviepoppe.net/blog/2020/04/a-quick-guide-to-data-mining-textual-analysis-of-japanese-twitter/|A Quick Guide to Data-mining & (Textual) Analysis of (Japanese) Twitter Part 1: Twitter Data Collection]]**
**[[https://steviepoppe.net/blog/2020/05/a-quick-guide-to-data-mining-textual-analysis-of-japanese-twitter-part-2/|A Quick Guide to Data-mining & (Textual) Analysis of (Japanese) Twitter Part 2: Basic Metrics & Graphs]]**
**[[https://steviepoppe.net/blog/2020/06/a-quick-guide-to-data-mining-textual-analysis-of-japanese-twitter-part-3/|A Quick Guide to Data-mining & (Textual) Analysis of (Japanese) Twitter Part 3: Natural Language Processing With MeCab, Neologd and KH Coder]]**
**[[https://steviepoppe.net/blog/2020/07/a-quick-guide-to-data-mining-textual-analysis-of-japanese-twitter-part-4/|A Quick Guide to Data-mining & (Textual) Analysis of (Japanese) Twitter Part 4: Natural Language Processing With MeCab, Neologd and NLTK]]**
[[https://steviepoppe.net/|Stevie Poppe]]

===== Web Scraping =====

**[[https://www.mollydesjardin.com/blog/crawling-aozora-bunko/|Crawling Aozora Bunko]]**
[[https://www.mollydesjardin.com/|Molly Des Jardin]]

**[[https://digitalorientalist.com/2020/01/14/web-scraping-with-python-for-beginners/|Web Scraping with Python for Beginners]]**
[[https://tsukuba.academia.edu/JamesMorris|James Harry Morris]], //[[https://digitalorientalist.com|The Digital Orientalist]]//