===== Regular Expressions (Regex) for Japanese =====

The regular expressions provided below are separated by source and included as code blocks for easy copy-pasting. They can also be downloaded as text files by clicking the link in the tab of each code block.

//Expressions below provided by Hoyt Long (University of Chicago).//

HTML TAGS: <[^<]+?>

WORD1 OR WORD2: 学校|學校

SENTENCE: [^!?。]*[!?。」]

QUOTATION: 「[^」]*」

ALL HIRAGANA: [ぁ-ゟ ]+

ALL KATAKANA: [゠-ヿ]+

ALL KANJI: [\u4E00-\u9FEF]

METAPHOR (?): .{3}(のように|みたいに).{3}

//Additional formulas from [[https://regex101.com/r/xhHFs2/1|Regular Expressions 101]].//

//Expressions below collected from the defunct [[https://web.archive.org/web/20120422073323/http://crunchytoast.com/2009/12/12/japanese-regex-alzheimers-and-why-cant-i-remember/|Crunchytoast page]] and Terrance Snyder's [[https://gist.github.com/terrancesnyder/1345094|Github repository]].//

Regex for matching ALL Japanese common & uncommon Kanji (4e00 – 9fcf)

([一-龯])

Regex for matching Hiragana or Katakana

([ぁ-んァ-ン])

Regex for matching Non-Hiragana or Non-Katakana

([^ぁ-んァ-ン])

Regex for matching Hiragana or Katakana or basic punctuation (、。’)

([ぁ-んァ-ン\w])

Regex for matching Hiragana or Katakana and random other characters

([ぁ-んァ-ン!:/])

Regex for matching Hiragana

([ぁ-ん])

Regex for matching full-width Katakana (zenkaku 全角)

([ァ-ン])

Regex for matching half-width Katakana (hankaku 半角)

([ｧ-ﾝﾞﾟ])

Regex for matching full-width Numbers (zenkaku 全角)

([０-９])

Regex for matching full-width Letters (zenkaku 全角)

([Ａ-ｚ])

Regex for matching Hiragana codespace characters (includes non-phonetic characters)

([ぁ-ゞ])

Regex for matching full-width (zenkaku) Katakana codespace characters (includes non-phonetic characters)

([ァ-ヶ])

Regex for matching half-width (hankaku) Katakana codespace characters (this is an old character set, so the order is inconsistent with the hiragana)

([ｦ-ﾟ])

Regex for matching Japanese Post Codes

/^\d{3}\-\d{4}$/

/^\d{3}-\d{4}$|^\d{3}-\d{2}$|^\d{3}$/

Regex for matching Japanese mobile phone numbers (keitai bangou)

/^\d{3}-\d{4}-\d{4}$|^\d{11}$/

/^0\d0-\d{4}-\d{4}$/

Regex for matching Japanese fixed line phone numbers

/^[0-9-]{6,9}$|^[0-9-]{12}$/

/^\d{1,4}-\d{4}$|^\d{2,5}-\d{1,4}-\d{4}$/

Update from 2014 by user cb372

Hiragana = [ぁ-ゔゞ゛゜ー] // 0x3041-0x3094, 0x309E, 0x309B, 0x309C, 0x30FC

Katakana = [ァ-・ヽヾ゛゜ー] // 0x30A1-0x30FB, 0x30FD, 0x30FE, 0x309B, 0x309C, 0x30FC

Hiragana or katakana = [ぁ-ゔゞァ-・ヽヾ゛゜ー] // 0x3041-0x3094, 0x309E, 0x30A1-0x30FB, 0x30FD, 0x30FE, 0x309B, 0x309C, 0x30FC

Update from 2019 by user minhloc2011

Updated the full-width Katakana range to 「30A1」~「30FE」 (Unicode:30FB).

Regex for matching full-width Katakana (zenkaku 全角)

([ァ-ン])

Replace with:

([ァ-ヾ])
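As a rough illustration of how a few of the expressions above might be used in practice, here is a minimal Python sketch. The variable and function names (''KANJI'', ''script_runs'', and so on) and the sample sentence are assumptions made for this example and do not come from any of the sources. The hiragana pattern omits the space found in Hoyt Long's version so that each run contains kana only, and the post code pattern is compiled with ''re.ASCII'' because Python's ''\d'' would otherwise also accept full-width digits.

```python
import re

# Patterns taken from the lists above; the names are illustrative only.
KANJI = re.compile(r"[\u4E00-\u9FEF]+")
HIRAGANA = re.compile(r"[ぁ-ゟ]+")   # space dropped from the original class
KATAKANA = re.compile(r"[゠-ヿ]+")
QUOTATION = re.compile(r"「[^」]*」")
# re.ASCII restricts \d to 0-9 (Python's default also matches full-width digits).
POST_CODE = re.compile(r"^\d{3}-\d{4}$", re.ASCII)


def script_runs(text):
    """Return the kanji, hiragana, and katakana runs found in text."""
    return {
        "kanji": KANJI.findall(text),
        "hiragana": HIRAGANA.findall(text),
        "katakana": KATAKANA.findall(text),
    }


sample = "彼は「コーヒーを飲みたい」と言った。"
runs = script_runs(sample)
quotes = QUOTATION.findall(sample)

print(runs["kanji"])      # ['彼', '飲', '言']
print(runs["hiragana"])   # ['は', 'を', 'みたい', 'と', 'った']
print(runs["katakana"])   # ['コーヒー']
print(quotes)             # ['「コーヒーを飲みたい」']
print(bool(POST_CODE.match("100-0001")))  # True
print(bool(POST_CODE.match("1000001")))   # False
```

The same approach should carry over to the other character classes listed above. The phone number and post code expressions written as ''/.../'' are JavaScript-style literals, so the surrounding slashes are dropped when the pattern is passed to ''re.compile''.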