Regular Expressions (Regex) for Japanese
The regular expressions provided below are separated by source and included as code blocks for easy copy-pasting. They can also be downloaded as text files by clicking the link in the tab of each code block.
Expressions below provided by Hoyt Long (University of Chicago).
- jpn_reg.txt
HTML TAGS: <[^<]+?>
WORD1 OR WORD2: 学校|學校
SENTENCE: [^!?。]*[!?。」]
QUOTATION: 「[^」]*」
ALL HIRAGANA: [ぁ-ゟ ]+
ALL KATAKANA: [゠-ヿ]+
ALL KANJI: [\u4E00-\u9FEF]
METAPHOR (?): .{3}(のように|みたいに).{3}
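As a rough illustration of how the expressions above might be used, the following Python sketch applies the SENTENCE, QUOTATION, ALL KANJI, and METAPHOR patterns to a short sample string. The variable names and the sample text are assumptions made for this example and are not part of the original list.

```python
import re

# Patterns copied from the list above; the names are illustrative only.
SENTENCE = re.compile(r"[^!?。]*[!?。」]")
QUOTATION = re.compile(r"「[^」]*」")
KANJI = re.compile(r"[\u4E00-\u9FEF]")
METAPHOR = re.compile(r".{3}(のように|みたいに).{3}")

# Hypothetical sample text, chosen only to exercise the patterns.
text = "彼は「行こう」と言った。鳥のように空を飛びたい。"

# Split the text into sentence-like chunks ending in 。
sentences = SENTENCE.findall(text)

# Pull out quoted passages, brackets included.
quotes = QUOTATION.findall(text)

# Collect every kanji character, in order of appearance.
kanji = KANJI.findall(text)

# The METAPHOR pattern has a capture group, so use finditer and
# group(0) to get the whole match with its three characters of context.
metaphors = [m.group(0) for m in METAPHOR.finditer(text)]

print(sentences)
print(quotes)
print("".join(kanji))
print(metaphors)
```

Because the METAPHOR pattern requires exactly three characters on each side, it will miss a marker that sits within three characters of the start or end of the text.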
Additional formulas from Regular Expressions 101.
Expressions below collected from the defunct Crunchytoast page and Terrance Snyder's Github repository.
- jpn_reg_crunchytoast.txt
Regex for matching ALL Japanese common & uncommon Kanji (4E00 – 9FAF)
([一-龯])
Regex for matching Hiragana or Katakana
([ぁ-んァ-ン])
Regex for matching Non-Hiragana or Non-Katakana
([^ぁ-んァ-ン])
Regex for matching Hiragana or Katakana or basic punctuation (、。’)
([ぁ-んァ-ン\w])
Regex for matching Hiragana or Katakana and random other characters
([ぁ-んァ-ン!:/])
Regex for matching Hiragana
([ぁ-ん])
Regex for matching full-width Katakana (zenkaku 全角)
([ァ-ン])
Regex for matching half-width Katakana (hankaku 半角)
([ｧ-ﾝﾞﾟ])
Regex for matching full-width Numbers (zenkaku 全角)
([０-９])
Regex for matching full-width Letters (zenkaku 全角)
([Ａ-ｚ])
Regex for matching Hiragana codespace characters (includes non phonetic characters)
([ぁ-ゞ])
Regex for matching full-width (zenkaku) Katakana codespace characters (includes non phonetic characters)
([ァ-ヶ])
Regex for matching half-width (hankaku) Katakana codespace characters (this is an old character set so the order is inconsistent with the hiragana)
([ｦ-ﾟ])

Regex for matching Japanese Post Codes
/^\d{3}\-\d{4}$/
/^\d{3}-\d{4}$|^\d{3}-\d{2}$|^\d{3}$/
Regex for matching Japanese mobile phone numbers (keitai bangou)
/^\d{3}-\d{4}-\d{4}$|^\d{11}$/
/^0\d0-\d{4}-\d{4}$/
Regex for matching Japanese fixed line phone numbers
/^[0-9-]{6,9}$|^[0-9-]{12}$/
/^\d{1,4}-\d{4}$|^\d{2,5}-\d{1,4}-\d{4}$/

Update from 2014 by user cb372
Hiragana = [ぁ-ゔゞ゛゜ー] // 0x3041-0x3094, 0x309E, 0x309B, 0x309C, 0x30FC
Katakana = [ァ-・ヽヾ゛゜ー] // 0x30A1-0x30FB, 0x30FD, 0x30FE, 0x309B, 0x309C, 0x30FC
Hiragana or katakana = [ぁ-ゔゞァ-・ヽヾ゛゜ー] // 0x3041-0x3094, 0x309E, 0x30A1-0x30FB, 0x30FD, 0x30FE, 0x309B, 0x309C, 0x30FC

Update from 2019 by user minhloc2011
Just updated full-width Katakana from「30A1」~「30FE」 (Unicode:30FB).
Regex for matching full-width Katakana (zenkaku 全角)
([ァ-ン])
Replace with:
([ァ-ヾ])
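The character classes above can be combined into a simple script counter. The Python sketch below is one possible way to do that, not a method taken from the sources: it uses the code points given in the 2014 update (written as `\u` escapes), the kanji range 4E00 to 9FAF, and the first post code pattern. The function names `classify` and `script_counts` are hypothetical, and the order of the checks is an assumption: katakana is tested before hiragana so that the shared marks (ー, ゛, ゜) are counted as katakana.

```python
import re

# Character classes from the 2014 update, written with \u escapes.
KATAKANA = re.compile(r"[\u30A1-\u30FB\u30FD\u30FE\u309B\u309C\u30FC]")
HIRAGANA = re.compile(r"[\u3041-\u3094\u309E\u309B\u309C\u30FC]")
KANJI = re.compile(r"[\u4E00-\u9FAF]")  # same range as [一-龯]

# First post code pattern. re.ASCII restricts \d to 0-9; without it,
# Python's \d would also accept full-width digits such as １.
POST_CODE = re.compile(r"^\d{3}-\d{4}$", re.ASCII)


def classify(ch):
    """Return a script label for one character (hypothetical helper)."""
    if KANJI.fullmatch(ch):
        return "kanji"
    if KATAKANA.fullmatch(ch):  # checked first, so ー counts as katakana
        return "katakana"
    if HIRAGANA.fullmatch(ch):
        return "hiragana"
    return "other"


def script_counts(text):
    """Count how many characters of each script appear in text."""
    counts = {"kanji": 0, "katakana": 0, "hiragana": 0, "other": 0}
    for ch in text:
        counts[classify(ch)] += 1
    return counts


counts = script_counts("東京タワーに行った")
print(counts)

print(bool(POST_CODE.match("100-0001")))    # half-width digits
print(bool(POST_CODE.match("１００-０００１")))  # full-width digits
```

Text that mixes full-width and half-width forms may be easier to handle if it is normalized first (for example with `unicodedata.normalize("NFKC", text)`), at the cost of losing the distinction the half-width and full-width patterns are meant to detect.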
regex.txt · Last modified: 2022/06/21 18:06 by prcurtis