===== Regular Expressions (Regex) for Japanese =====
The regular expressions provided below are separated by source and included as code blocks for easy copy-pasting. They can also be downloaded as text files by clicking the link in the tab of each code block.
//Expressions below provided by Hoyt Long (University of Chicago).//
HTML TAGS: <[^<]+?>
WORD1 OR WORD2: 学校|學校
SENTENCE: [^!?。]*[!?。」]
QUOTATION: 「[^」]*」
ALL HIRAGANA: [ぁ-ゟ ]+
ALL KATAKANA: [゠-ヿ]+
ALL KANJI: [\u4E00-\u9FEF]
METAPHOR (?): .{3}(のように|みたいに).{3}
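A minimal sketch of how the SENTENCE, QUOTATION and METAPHOR (?) expressions might be applied with Python's standard `re` module, assuming Python 3.9+. The sample sentence and the helper names (`sentences`, `quotations`, `metaphors`) are invented for illustration.

```python
import re

# Patterns copied from the list above (Hoyt Long).
SENTENCE = re.compile(r"[^!?。]*[!?。」]")
QUOTATION = re.compile(r"「[^」]*」")
METAPHOR = re.compile(r".{3}(のように|みたいに).{3}")


def sentences(text: str) -> list[str]:
    """Split text into chunks ending in !, ?, 。 or 」."""
    return SENTENCE.findall(text)


def quotations(text: str) -> list[str]:
    """Return every 「…」 quotation, brackets included."""
    return QUOTATION.findall(text)


def metaphors(text: str) -> list[str]:
    """Return the whole window (3 chars + marker + 3 chars) for each match.

    findall() would return only the captured marker, so finditer() is used.
    """
    return [m.group(0) for m in METAPHOR.finditer(text)]


if __name__ == "__main__":
    # Hypothetical sample text, made up for this example.
    sample = "彼は「雪のように白い」と言った。空は海みたいに青かった。"
    print(sentences(sample))
    print(quotations(sample))
    print(metaphors(sample))
```

Because `[^!?。]*` is greedy, a 」 in the middle of a sentence does not end the match; the sentence runs on to the next 。, so the first sentence above keeps its quotation intact.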
//Additional formulas from [[https://regex101.com/r/xhHFs2/1|Regular Expressions 101]].//
//Expressions below collected from the defunct [[https://web.archive.org/web/20120422073323/http://crunchytoast.com/2009/12/12/japanese-regex-alzheimers-and-why-cant-i-remember/|Crunchytoast page]] and Terrance Snyder's [[https://gist.github.com/terrancesnyder/1345094|Github repository]].//
Regex for matching common & uncommon Japanese Kanji (4e00 – 9faf)
([一-龯])
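A short Python sketch that uses this class to pull out runs of consecutive kanji and confirms the code points of the range endpoints; `kanji_runs` is a hypothetical helper name.

```python
import re

# [一-龯] is the literal code point range U+4E00..U+9FAF.
KANJI_RUN = re.compile(r"[一-龯]+")


def kanji_runs(text: str) -> list[str]:
    """Return each maximal run of consecutive characters in U+4E00..U+9FAF."""
    return KANJI_RUN.findall(text)


if __name__ == "__main__":
    print(hex(ord("一")), hex(ord("龯")))  # endpoints of the range
    print(kanji_runs("東京都に住んでいます"))
    print(kanji_runs("人々"))  # 々 (U+3005) lies outside the range
```

Note that the iteration mark 々 and kanji in the CJK extension blocks fall outside this range, so a word like 人々 is only partly matched.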
Regex for matching Hiragana or Katakana
([ぁ-んァ-ン])
Regex for matching any character that is neither Hiragana nor Katakana
([^ぁ-んァ-ン])
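One possible use of the kana class and its negation, sketched in Python: measuring how much of a string is kana and deleting everything else. The helper names `kana_ratio` and `strip_non_kana` are hypothetical.

```python
import re

KANA = re.compile(r"[ぁ-んァ-ン]")       # U+3041..U+3093 and U+30A1..U+30F3
NON_KANA = re.compile(r"[^ぁ-んァ-ン]")  # every other character


def kana_ratio(text: str) -> float:
    """Share of characters in text matched by the kana class (0.0 for "")."""
    if not text:
        return 0.0
    return len(KANA.findall(text)) / len(text)


def strip_non_kana(text: str) -> str:
    """Delete every character the kana class does not match."""
    return NON_KANA.sub("", text)


if __name__ == "__main__":
    print(strip_non_kana("カタカナと漢字、ABC"))
    print(strip_non_kana("コーヒー"))  # ー (U+30FC) is outside [ァ-ン]
    print(kana_ratio("あア漢字"))
```

Since ン ends at U+30F3, the prolonged sound mark ー is treated as "non-kana" here, which matters for loanwords written in Katakana.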
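Regex for matching Hiragana, Katakana, or word characters (\w, which does not include punctuation such as 、。’)
([ぁ-んァ-ン\w])
Regex for matching Hiragana, Katakana, or basic punctuation (、。’)
([ぁ-んァ-ン、。’])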
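A quick Python check of what `\w` adds to a kana class. In Python 3, `\w` on str patterns is Unicode-aware, so it matches kanji and Latin letters, but Japanese punctuation still has to be listed explicitly.

```python
import re

WITH_W = re.compile(r"[ぁ-んァ-ン\w]")
EXPLICIT = re.compile(r"[ぁ-んァ-ン、。’]")

sample = "あ、漢。’a"
print(WITH_W.findall(sample))    # \w adds 漢 and a, but none of 、 。 ’
print(EXPLICIT.findall(sample))  # the explicit class matches 、 。 ’
```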
Regex for matching Hiragana, Katakana, or the characters !, : and /
([ぁ-んァ-ン!:/])
Regex for matching Hiragana
([ぁ-ん])
Regex for matching full-width Katakana (zenkaku 全角)
([ァ-ン])
Regex for matching half-width Katakana (hankaku 半角)
([ｧ-ﾝﾞﾟ])
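Half-width characters are easily converted to full-width by copy-paste or normalization, so this sketch writes the half-width class with `\u` escapes (ｧ..ﾝ is U+FF67..U+FF9D, ﾞ and ﾟ are U+FF9E and U+FF9F). It then converts each matched run to full-width with NFKC from Python's `unicodedata` module; the helper name `to_zenkaku` is hypothetical.

```python
import re
import unicodedata

# Half-width katakana ｧ..ﾝ (U+FF67..U+FF9D) plus the half-width voiced and
# semi-voiced marks ﾞ (U+FF9E) and ﾟ (U+FF9F), written as escapes.
HANKAKU_KANA = re.compile(r"[\uFF67-\uFF9D\uFF9E\uFF9F]+")


def to_zenkaku(text: str) -> str:
    """Replace each half-width katakana run with its NFKC (full-width) form."""
    return HANKAKU_KANA.sub(lambda m: unicodedata.normalize("NFKC", m.group(0)), text)


if __name__ == "__main__":
    print(to_zenkaku("\uFF83\uFF9E\uFF70\uFF80"))  # ﾃﾞｰﾀ
```

NFKC also recombines a base character with a following ﾞ or ﾟ, so ﾃﾞ becomes the single character デ rather than テ plus a mark. This class leaves out ｦ (U+FF66), which the half-width codespace range further down does include.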
Regex for matching full-width Numbers (zenkaku 全角)
([０-９])
Regex for matching full-width Letters (zenkaku 全角)
([Ａ-Ｚａ-ｚ])
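A Python sketch for normalizing full-width digits and Latin letters to ASCII. It uses two letter ranges because a single range from Ａ to ｚ (U+FF21..U+FF5A) would also match ［＼］＾＿｀, just as ASCII [A-z] matches [\]^_`. The helper names are hypothetical.

```python
import re
import unicodedata

ZENKAKU_DIGITS = re.compile(r"[０-９]+")           # U+FF10..U+FF19
# Two ranges, so the symbols U+FF3B..U+FF40 between Ｚ and ａ are not matched.
ZENKAKU_LETTERS = re.compile(r"[Ａ-Ｚａ-ｚ]+")


def _nfkc(match: re.Match) -> str:
    """NFKC maps each full-width character to its ASCII counterpart."""
    return unicodedata.normalize("NFKC", match.group(0))


def to_ascii_alnum(text: str) -> str:
    """Rewrite full-width digits and Latin letters as ASCII; leave the rest."""
    return ZENKAKU_LETTERS.sub(_nfkc, ZENKAKU_DIGITS.sub(_nfkc, text))


if __name__ == "__main__":
    print(to_ascii_alnum("ＡＢＣ１２３ｘ"))
    print(to_ascii_alnum("Ａ＿Ｂ"))  # the full-width underscore is kept
```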
Regex for matching Hiragana codespace characters
(includes non-phonetic characters)
([ぁ-ゞ])
Regex for matching full-width (zenkaku) Katakana codespace characters
(includes non-phonetic characters)
([ァ-ヶ])
Regex for matching half-width (hankaku) Katakana codespace characters
(this is an older character set, so its order differs from that of the Hiragana block)
([ｦ-ﾟ])
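To see what the wider codespace ranges pick up compared with the plain [ぁ-ん] and [ァ-ン] classes, a small Python comparison can help. The half-width range is written with escapes (U+FF66..U+FF9F), and `codespace_extras` is a hypothetical helper.

```python
import re

HIRAGANA_PHONETIC = re.compile(r"[ぁ-ん]")          # U+3041..U+3093
HIRAGANA_CODESPACE = re.compile(r"[ぁ-ゞ]")         # U+3041..U+309E
KATAKANA_PHONETIC = re.compile(r"[ァ-ン]")          # U+30A1..U+30F3
KATAKANA_CODESPACE = re.compile(r"[ァ-ヶ]")         # U+30A1..U+30F6
HANKAKU_CODESPACE = re.compile(r"[\uFF66-\uFF9F]")  # ｦ..ﾟ


def codespace_extras(text: str, wide: re.Pattern, narrow: re.Pattern) -> list[str]:
    """Characters of text matched by the wider class but not the narrower one."""
    return [ch for ch in text if wide.fullmatch(ch) and not narrow.fullmatch(ch)]


if __name__ == "__main__":
    print(codespace_extras("あゔゝゞ゛", HIRAGANA_CODESPACE, HIRAGANA_PHONETIC))
    print(codespace_extras("アヴヵヶー", KATAKANA_CODESPACE, KATAKANA_PHONETIC))
```

The extras include ゔ and ヴ as well as the marks ゝゞ゛ and the small ヵヶ, while ー (U+30FC) is still outside the Katakana codespace range.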
Regex for matching Japanese Post Codes
/^\d{3}-\d{4}$/
/^\d{3}-\d{4}$|^\d{3}-\d{2}$|^\d{3}$/
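A sketch of the post-code patterns in Python. Copies of these patterns often show ¥ where a backslash belongs, most likely because the byte 0x5C is displayed as a yen sign on Japanese code pages. Two Python-specific assumptions: `re.ASCII` is set because Python's `\d` otherwise also matches full-width ０-９, and `fullmatch()` replaces `^...$` because `$` also matches just before a trailing newline. `is_postcode` is a hypothetical helper.

```python
import re

# re.ASCII restricts \d to 0-9; fullmatch() plays the role of ^...$.
POSTCODE = re.compile(r"\d{3}-\d{4}", re.ASCII)
POSTCODE_LOOSE = re.compile(r"\d{3}-\d{4}|\d{3}-\d{2}|\d{3}", re.ASCII)


def is_postcode(text: str, loose: bool = False) -> bool:
    """True if the whole of text matches the strict (or loose) post-code pattern."""
    pattern = POSTCODE_LOOSE if loose else POSTCODE
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ["100-0001", "100-00", "１００-０００１"]:
        print(candidate, is_postcode(candidate), is_postcode(candidate, loose=True))
```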
Regex for matching Japanese mobile phone numbers (keitai bangou)
/^\d{3}-\d{4}-\d{4}$|^\d{11}$/
/^0\d0-\d{4}-\d{4}$/
Regex for matching Japanese fixed line phone numbers
/^[0-9-]{6,9}$|^[0-9-]{12}$/
/^\d{1,4}-\d{4}$|^\d{2,5}-\d{1,4}-\d{4}$/
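The mobile and fixed-line patterns overlap, and the first fixed-line pattern checks only length and character set. A Python sketch makes this easy to see by reporting which patterns accept a given string. It follows the same assumptions as a plain reading of the patterns (¥ read as a backslash, `re.ASCII`, `fullmatch()` in place of `^...$`). The pattern names and `matching` are hypothetical.

```python
import re

PATTERNS = {
    # Mobile (keitai bangou) patterns.
    "mobile": re.compile(r"\d{3}-\d{4}-\d{4}|\d{11}", re.ASCII),
    "mobile_strict": re.compile(r"0\d0-\d{4}-\d{4}", re.ASCII),
    # Fixed-line patterns.
    "fixed_loose": re.compile(r"[0-9-]{6,9}|[0-9-]{12}"),
    "fixed": re.compile(r"\d{1,4}-\d{4}|\d{2,5}-\d{1,4}-\d{4}", re.ASCII),
}


def matching(number: str) -> list[str]:
    """Names of the patterns that accept the whole of number."""
    return [name for name, pattern in PATTERNS.items() if pattern.fullmatch(number)]


if __name__ == "__main__":
    for number in ["090-1234-5678", "09012345678", "03-1234-5678", "------"]:
        print(number, matching(number))
```

A hyphenated mobile number such as 090-1234-5678 is also accepted by the second fixed-line pattern, and a string of six hyphens passes the loose fixed-line pattern. Combining a pattern with a prefix check may therefore be safer than relying on either one alone.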
Update from 2014 by user cb372
Hiragana = [ぁ-ゔゞ゛゜ー] // 0x3041-0x3094, 0x309E, 0x309B, 0x309C, 0x30FC
Katakana = [ァ-・ヽヾ゛゜ー] // 0x30A1-0x30FB, 0x30FD, 0x30FE, 0x309B, 0x309C, 0x30FC
Hiragana or katakana = [ぁ-ゔゞァ-・ヽヾ゛゜ー] // 0x3041-0x3094, 0x309E, 0x30A1-0x30FB, 0x30FD, 0x30FE, 0x309B, 0x309C, 0x30FC
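A Python sketch that classifies single characters with the cb372 Hiragana and Katakana classes. It shows that ー, ゛ and ゜ belong to both. The function name `classify` is hypothetical.

```python
import re

HIRAGANA = re.compile(r"[ぁ-ゔゞ゛゜ー]")
KATAKANA = re.compile(r"[ァ-・ヽヾ゛゜ー]")
KANA = re.compile(r"[ぁ-ゔゞァ-・ヽヾ゛゜ー]")  # the combined class


def classify(ch: str) -> str:
    """Report which of the two classes accept the single character ch."""
    hira = HIRAGANA.fullmatch(ch) is not None
    kata = KATAKANA.fullmatch(ch) is not None
    if hira and kata:
        return "both"
    if hira:
        return "hiragana"
    if kata:
        return "katakana"
    return "neither"


if __name__ == "__main__":
    for ch in "あアー゛漢ゝヽ":
        print(ch, classify(ch))
```

As written, the Hiragana class includes ゞ (U+309E) but not the unvoiced iteration mark ゝ (U+309D), while the Katakana class includes both ヽ and ヾ. Add ゝ explicitly if you need it.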
Update from 2019 by user minhloc2011
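Updated the full-width Katakana range so that it runs from U+30A1 to U+30FE instead of stopping at ン (U+30F3). The wider range also covers ヴ, the small ヵ and ヶ, the middle dot ・ (U+30FB), the prolonged sound mark ー (U+30FC), and the iteration marks ヽ and ヾ.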
Regex for matching full-width Katakana (zenkaku 全角)
([ァ-ン])
Replace with:
([ァ-ヾ])
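A quick Python comparison of the old and updated full-width Katakana classes on a loanword that contains ー, ・ and ヴ.

```python
import re

OLD = re.compile(r"[ァ-ン]+")  # U+30A1..U+30F3
NEW = re.compile(r"[ァ-ヾ]+")  # U+30A1..U+30FE

word = "コーヒー・ヴァイオリン"
print(OLD.findall(word))  # ー, ・ and ヴ break the old class into pieces
print(NEW.findall(word))  # the updated class keeps the word in one run
```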