Regular Expressions (Regex) for Japanese

The regular expressions provided below are separated by source and included as code blocks for easy copy-pasting. They can also be downloaded as text files by clicking the link in the tab of each code block.

The expressions below were provided by Hoyt Long (University of Chicago).

HTML TAGS: <[^<]+?>
WORD1 OR WORD2: 学校|學校
SENTENCE: [^!?。]*[!?。」]
QUOTATION: 「[^」]*」
ALL HIRAGANA: [ぁ-ゟ ]+  
ALL KATAKANA: [゠-ヿ]+   
ALL KANJI: [\u4E00-\u9FEF]
METAPHOR (?): .{3}(のように|みたいに).{3}
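
As a rough illustration of how the expressions above might be applied, here is a minimal Python sketch using the standard `re` module. The sample strings and variable names are made up for this example and are not part of the original list.

```python
import re

# Patterns copied from the list above.
HTML_TAG = re.compile(r"<[^<]+?>")
SENTENCE = re.compile(r"[^!?。]*[!?。」]")
QUOTATION = re.compile(r"「[^」]*」")
HIRAGANA = re.compile(r"[ぁ-ゟ ]+")
KATAKANA = re.compile(r"[゠-ヿ]+")
KANJI = re.compile(r"[\u4E00-\u9FEF]")
METAPHOR = re.compile(r".{3}(のように|みたいに).{3}")

# Hypothetical sample input (an assumption for this sketch).
html = "<p>彼は「コーヒーを飲みたい」と言った。</p>"

# Strip the markup first, then run the other patterns on plain text.
plain = HTML_TAG.sub("", html)

sentences = SENTENCE.findall(plain)    # whole sentences
quotes = QUOTATION.findall(plain)      # text inside 「」 brackets
hiragana = HIRAGANA.findall(plain)     # runs of hiragana (and spaces)
katakana = KATAKANA.findall(plain)     # runs of katakana
kanji = KANJI.findall(plain)           # single kanji characters

# METAPHOR contains a capturing group, so take the whole match
# with group(0) instead of relying on findall().
simile = METAPHOR.search("彼女は花のように笑っていた。")
context = simile.group(0) if simile else None

print(plain, sentences, quotes, hiragana, katakana, kanji, context)
```

Note that the KANJI pattern has no `+` quantifier, so it returns one character per match, while the kana patterns return whole runs.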

Additional formulas from Regular Expressions 101.

The expressions below were collected from the defunct Crunchytoast page and Terrance Snyder's GitHub repository.

Regex for matching ALL Japanese common & uncommon Kanji (4e00 – 9fcf)
Regex for matching Hiragana or Katakana
Regex for matching Non-Hiragana or Non-Katakana
Regex for matching Hiragana or Katakana or basic punctuation (、。’)
Regex for matching Hiragana or Katakana and random other characters
Regex for matching Hiragana
Regex for matching full-width Katakana (zenkaku 全角)
Regex for matching half-width Katakana (hankaku 半角)
Regex for matching full-width Numbers (zenkaku 全角)
Regex for matching full-width Letters (zenkaku 全角)
Regex for matching Hiragana codespace characters
(includes non-phonetic characters)
Regex for matching full-width (zenkaku) Katakana codespace characters
(includes non-phonetic characters)
Regex for matching half-width (hankaku) Katakana codespace characters 
(this is an old character set so the order is inconsistent with the hiragana)
Regex for matching Japanese Post Codes
Regex for matching Japanese mobile phone numbers (keitai bangou)
Regex for matching Japanese fixed line phone numbers
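
The following Python sketch shows what patterns for a few of the categories listed above could look like. These are assumptions built from standard Unicode block ranges and the common NNN-NNNN postal code format. They are not the original expressions from the Crunchytoast page or the GitHub repository, and all names here are hypothetical.

```python
import re

# Assumed patterns, not the original expressions from the sources above.
KANJI = re.compile(r"[\u4e00-\u9fcf]+")                 # range quoted in the list above
FULLWIDTH_DIGITS = re.compile(r"[\uff10-\uff19]+")      # ０ to ９
FULLWIDTH_LETTERS = re.compile(r"[\uff21-\uff3a\uff41-\uff5a]+")  # Ａ to Ｚ, ａ to ｚ
HALFWIDTH_KATAKANA = re.compile(r"[\uff66-\uff9f]+")    # ｦ to ﾟ

# [0-9] is used instead of \d because \d also matches full-width
# digits when the pattern is a Python str.
POST_CODE = re.compile(r"[0-9]{3}-[0-9]{4}")

# Hypothetical sample string mixing several scripts.
sample = "東京都ＡＢＣ１２３ｶﾀｶﾅ"

kanji_runs = KANJI.findall(sample)
letter_runs = FULLWIDTH_LETTERS.findall(sample)
digit_runs = FULLWIDTH_DIGITS.findall(sample)
hankaku_runs = HALFWIDTH_KATAKANA.findall(sample)

def is_post_code(text: str) -> bool:
    """Return True if the whole string looks like NNN-NNNN."""
    return POST_CODE.fullmatch(text) is not None

print(kanji_runs, letter_runs, digit_runs, hankaku_runs)
print(is_post_code("100-0001"), is_post_code("1000001"))
```

Using `fullmatch` for the post code check avoids accepting a longer string that merely contains a seven-digit code.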
Update from 2014 by user cb372
Hiragana = [ぁ-ゔゞ゛゜ー]  // 0x3041-0x3094, 0x309E, 0x309B, 0x309C, 0x30FC
Katakana = [ァ-・ヽヾ゛゜ー]  // 0x30A1-0x30FB, 0x30FD, 0x30FE, 0x309B, 0x309C, 0x30FC
Hiragana or katakana = [ぁ-ゔゞァ-・ヽヾ゛゜ー]  // 0x3041-0x3094, 0x309E, 0x30A1-0x30FB, 0x30FD, 0x30FE, 0x309B, 0x309C, 0x30FC
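
A short Python sketch of one way the three character classes from the 2014 update might be used to check whether a whole string is written in kana. The function names and test words are hypothetical. The `+` quantifier and the use of `fullmatch` are additions for this example.

```python
import re

# Character classes from the 2014 update, with "+" added to match runs.
HIRAGANA = re.compile(r"[ぁ-ゔゞ゛゜ー]+")
KATAKANA = re.compile(r"[ァ-・ヽヾ゛゜ー]+")
KANA = re.compile(r"[ぁ-ゔゞァ-・ヽヾ゛゜ー]+")

def is_hiragana(text: str) -> bool:
    """True if every character falls in the hiragana class."""
    return HIRAGANA.fullmatch(text) is not None

def is_katakana(text: str) -> bool:
    """True if every character falls in the katakana class."""
    return KATAKANA.fullmatch(text) is not None

def is_kana(text: str) -> bool:
    """True if every character is hiragana or katakana."""
    return KANA.fullmatch(text) is not None

print(is_hiragana("らーめん"), is_katakana("ラーメン"), is_kana("漢字"))
```

Because the long vowel mark ー (0x30FC) appears in both classes, a word such as らーめん passes the hiragana check even though ー belongs to the Katakana block.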
Update from 2019 by user minhloc2011
Just updated full-width Katakana from「30A1」~「30FE」 (Unicode:30FB).
Regex for matching full-width Katakana (zenkaku 全角)
Replace with:
regex.txt · Last modified: 2022/06/21 18:06 by prcurtis