regex

Differences

This shows you the differences between two versions of the page.

regex [2022/06/21 15:01] – created prcurtis
regex [2022/06/21 15:12] prcurtis
Line 1: Line 1:
 ===== Regular Expressions (Regex) for Japanese =====
  
-//Expressions below provided by Hoyt Long. Download as text file by clicking the link in the tab.//
+The regular expressions provided below are separated by source and included as code blocks for easy copy-pasting. They can also be downloaded as text files by clicking the link in the tab of each code block.
 + 
 +//Expressions below provided by Hoyt Long (University of Chicago).//
 <file text jpn_reg.txt>
 HTML TAGS: <[^<]+?>
Line 18: Line 20:
  
 METAPHOR (?): .{3}(のように|みたいに).{3}
 +</file>
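As a rough illustration of how two of the expressions in jpn_reg.txt behave, the sketch below applies the HTML TAGS and METAPHOR patterns with Python's re module. The helper names and the sample sentence are made up for this example and are not part of the original file.

```python
import re

# Patterns copied from jpn_reg.txt.
HTML_TAG = re.compile(r"<[^<]+?>")
METAPHOR = re.compile(r".{3}(のように|みたいに).{3}")

def strip_tags(text):
    """Remove anything that looks like an HTML tag (hypothetical helper)."""
    return HTML_TAG.sub("", text)

def metaphor_contexts(text):
    """Return each simile marker with three characters of context on each side."""
    return [m.group(0) for m in METAPHOR.finditer(text)]

sample = "<p>彼女は花のように笑った。</p>"  # invented sample sentence
plain = strip_tags(sample)
print(plain)                     # 彼女は花のように笑った。
print(metaphor_contexts(plain))  # ['女は花のように笑った']
```

Because the pattern requires exactly three characters on both sides, a marker that sits within three characters of the start or end of a line is not matched, and `.` does not cross line breaks unless re.DOTALL is set.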
 +
 +//Expressions below collected from the defunct [[https://web.archive.org/web/20120422073323/http://crunchytoast.com/2009/12/12/japanese-regex-alzheimers-and-why-cant-i-remember/|Crunchytoast page]] and Terrance Snyder's [[https://gist.github.com/terrancesnyder/1345094|GitHub gist]].//
 +<file text jpn_reg_crunchytoast.txt>
 +Regex for matching ALL Japanese common & uncommon Kanji (4e00 – 9faf)
 +([一-龯])
 +
 +Regex for matching Hiragana or Katakana
 +([ぁ-んァ-ン])
 +
 +Regex for matching Non-Hiragana or Non-Katakana
 +([^ぁ-んァ-ン])
 +
 +Regex for matching Hiragana or Katakana or word characters (\w; this does not match punctuation such as 、。’)
 +([ぁ-んァ-ン\w])
 +
 +Regex for matching Hiragana or Katakana and random other characters
 +([ぁ-んァ-ン!:/])
 +
 +Regex for matching Hiragana
 +([ぁ-ん])
 +
 +Regex for matching full-width Katakana (zenkaku 全角)
 +([ァ-ン])
 +
 +Regex for matching half-width Katakana (hankaku 半角)
 +([ｧ-ﾝﾞﾟ])
 +
 +Regex for matching full-width Numbers (zenkaku 全角)
 +([０-９])
 +
 +Regex for matching full-width Letters (zenkaku 全角)
 +([Ａ-ｚ])
 +
 +Regex for matching Hiragana codespace characters 
 +(includes non phonetic characters)
 +([ぁ-ゞ])
 +
 +Regex for matching full-width (zenkaku) Katakana codespace characters 
 +(includes non phonetic characters)
 +([ァ-ヶ])
 +
 +Regex for matching half-width (hankaku) Katakana codespace characters 
 +(this is an old character set so the order is inconsistent with the hiragana)
 +([ｦ-ﾟ])
 +
 +Regex for matching Japanese Post Codes
 +/^\d{3}\-\d{4}$/
 +/^\d{3}-\d{4}$|^\d{3}-\d{2}$|^\d{3}$/
 +
 +Regex for matching Japanese mobile phone numbers (keitai bangou)
 +/^\d{3}-\d{4}-\d{4}$|^\d{11}$/
 +/^0\d0-\d{4}-\d{4}$/
 +
 +Regex for matching Japanese fixed line phone numbers
 +/^[0-9-]{6,9}$|^[0-9-]{12}$/
 +/^\d{1,4}-\d{4}$|^\d{2,5}-\d{1,4}-\d{4}$/
 +
 +Update from 2014 by user cb372
 +Hiragana = [ぁ-ゔゞ゛゜ー]  // 0x3041-0x3094, 0x309E, 0x309B, 0x309C, 0x30FC
 +Katakana = [ァ-・ヽヾ゛゜ー]  // 0x30A1-0x30FB, 0x30FD, 0x30FE, 0x309B, 0x309C, 0x30FC
 +Hiragana or katakana = [ぁ-ゔゞァ-・ヽヾ゛゜ー]  // 0x3041-0x3094, 0x309E, 0x30A1-0x30FB, 0x30FD, 0x30FE, 0x309B, 0x309C, 0x30FC
 +
 +Update from 2019 by user minhloc2011
 +Extended the full-width Katakana range to 「30A1」~「30FE」 (this includes Unicode 30FB, the middle dot ・).
 +Regex for matching full-width Katakana (zenkaku 全角)
 +([ァ-ン])
 +Replace with:
 +([ァ-ヾ])
 </file>
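The character classes above can be tried out with a short script. The following is only a sketch in Python, assuming the ranges as listed: the half-width Katakana range is written with escapes (U+FF66 to U+FF9F) so that it survives copy-pasting through tools that normalize width, and the name script_of is a hypothetical helper, not something from the sources.

```python
import re

# Character classes taken from the listing; code points are noted so the
# ranges can be checked even if the glyphs are altered in transit.
KANJI = re.compile(r"[一-龯]")                      # U+4E00 to U+9FAF
HIRAGANA = re.compile(r"[ぁ-ん]")                    # U+3041 to U+3093
KATAKANA = re.compile(r"[ァ-ヾ]")                    # U+30A1 to U+30FE (2019 update)
HALFWIDTH_KATAKANA = re.compile("[\uff66-\uff9f]")   # half-width block, as escapes

# Seven-digit post code. In Python, \d also matches full-width digits
# unless re.ASCII is set, so the flag is added here as a precaution.
POSTCODE = re.compile(r"\d{3}-\d{4}", re.ASCII)

def script_of(ch):
    """Name the first class that matches a single character (hypothetical helper)."""
    for name, pattern in (
        ("kanji", KANJI),
        ("hiragana", HIRAGANA),
        ("katakana", KATAKANA),
        ("half-width katakana", HALFWIDTH_KATAKANA),
    ):
        if pattern.fullmatch(ch):
            return name
    return "other"

print([script_of(ch) for ch in "\u6f22\u3058\u30ab\uff76a"])
print(bool(POSTCODE.fullmatch("100-0001")))
```

The first line prints the class of one kanji, one hiragana, one full-width katakana, one half-width katakana, and one Latin letter, in that order. Other regex engines treat \d and \w differently, so the re.ASCII choice is specific to Python.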
regex.txt · Last modified: 2022/06/21 18:06 by prcurtis