regex

Differences

This shows you the differences between two versions of the page.

regex [2022/06/21 15:01] – created prcurtis
regex [2022/06/21 18:06] (current) prcurtis
Line 1: Line 1:
 ===== Regular Expressions (Regex) for Japanese =====
  
-//Expressions below provided by Hoyt Long. Download as text file by clicking the link in the tab.//
+The regular expressions provided below are separated by source and included as code blocks for easy copy-pasting. They can also be downloaded as text files by clicking the link in the tab of each code block.
 + 
 +//Expressions below provided by Hoyt Long (University of Chicago).//
 <file text jpn_reg.txt> <file text jpn_reg.txt>
 HTML TAGS: <[^<]+?>
Line 18: Line 20:
  
 METAPHOR (?): .{3}(のように|みたいに).{3}
 +</file>
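As a rough illustration of how the HTML TAGS and METAPHOR patterns above might be applied, here is a minimal Python sketch. The function names `strip_tags` and `find_similes` and the sample sentence are hypothetical and do not come from the original expression list.

```python
import re
from typing import List

# Patterns copied from the jpn_reg.txt block above.
HTML_TAG = re.compile(r"<[^<]+?>")
METAPHOR = re.compile(r".{3}(のように|みたいに).{3}")


def strip_tags(text: str) -> str:
    """Remove anything that looks like an HTML tag (hypothetical helper)."""
    return HTML_TAG.sub("", text)


def find_similes(text: str) -> List[str]:
    """Return each simile marker with three characters of context on both sides."""
    return [m.group(0) for m in METAPHOR.finditer(text)]


sample = "<p>彼女は花のように笑った。</p>"  # made-up example sentence
clean = strip_tags(sample)
print(clean)                # 彼女は花のように笑った。
print(find_similes(clean))  # ['女は花のように笑った']
```

Because the METAPHOR pattern requires exactly three characters on each side, a marker that sits within three characters of the start or end of a line will not be matched.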
 +
 +//Additional formulas from [[https://regex101.com/r/xhHFs2/1|Regular Expressions 101]].//
 +
 +//Expressions below collected from the defunct [[https://web.archive.org/web/20120422073323/http://crunchytoast.com/2009/12/12/japanese-regex-alzheimers-and-why-cant-i-remember/|Crunchytoast page]] and Terrance Snyder's [[https://gist.github.com/terrancesnyder/1345094|Github repository]].//
 +<file text jpn_reg_crunchytoast.txt>
 +Regex for matching ALL Japanese common & uncommon Kanji (4e00 – 9faf)
 +([一-龯])
 +
 +Regex for matching Hiragana or Katakana
 +([ぁ-んァ-ン])
 +
 +Regex for matching Non-Hiragana or Non-Katakana
 +([^ぁ-んァ-ン])
 +
 +Regex for matching Hiragana or Katakana or word characters (\w matches letters, digits and underscore, not the punctuation 、。’)
 +([ぁ-んァ-ン\w])
 +
 +Regex for matching Hiragana or Katakana and random other characters
 +([ぁ-んァ-ン!:/])
 +
 +Regex for matching Hiragana
 +([ぁ-ん])
 +
 +Regex for matching full-width Katakana (zenkaku 全角)
 +([ァ-ン])
 +
 +Regex for matching half-width Katakana (hankaku 半角)
 +([ｧ-ﾝﾞﾟ])
 +
 +Regex for matching full-width Numbers (zenkaku 全角)
 +([０-９])
 +
 +Regex for matching full-width Letters (zenkaku 全角)
 +([Ａ-Ｚａ-ｚ])
 +
 +Regex for matching Hiragana codespace characters 
 +(includes non phonetic characters)
 +([ぁ-ゞ])
 +
 +Regex for matching full-width (zenkaku) Katakana codespace characters 
 +(includes non phonetic characters)
 +([ァ-ヶ])
 +
 +Regex for matching half-width (hankaku) Katakana codespace characters 
 +(this is an old character set so the order is inconsistent with the hiragana)
 +([ｦ-ﾟ])
 +
 +Regex for matching Japanese Post Codes
 +/^\d{3}\-\d{4}$/
 +/^\d{3}-\d{4}$|^\d{3}-\d{2}$|^\d{3}$/
 +
 +Regex for matching Japanese mobile phone numbers (keitai bangou)
 +/^\d{3}-\d{4}-\d{4}$|^\d{11}$/
 +/^0\d0-\d{4}-\d{4}$/
 +
 +Regex for matching Japanese fixed line phone numbers
 +/^[0-9-]{6,9}$|^[0-9-]{12}$/
 +/^\d{1,4}-\d{4}$|^\d{2,5}-\d{1,4}-\d{4}$/
 +
 +Update from 2014 by user cb372
 +Hiragana = [ぁ-ゔゞ゛゜ー]  // 0x3041-0x3094, 0x309E, 0x309B, 0x309C, 0x30FC
 +Katakana = [ァ-・ヽヾ゛゜ー]  // 0x30A1-0x30FB, 0x30FD, 0x30FE, 0x309B, 0x309C, 0x30FC
 +Hiragana or katakana = [ぁ-ゔゞァ-・ヽヾ゛゜ー]  // 0x3041-0x3094, 0x309E, 0x30A1-0x30FB, 0x30FD, 0x30FE, 0x309B, 0x309C, 0x30FC
 +
 +Update from 2019 by user minhloc2011
 +Updated the full-width Katakana range to 30A1–30FE, which also covers the middle dot ・ (Unicode 30FB).
 +Regex for matching full-width Katakana (zenkaku 全角)
 +([ァ-ン])
 +Replace with:
 +([ァ-ヾ])
 </file> </file>
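The character classes in the block above can be combined into a quick script profile of a string. The following is a minimal Python sketch under a few assumptions: the helper names `count_scripts` and `is_post_code` and the sample text are hypothetical, the Katakana class uses the widened range from the 2019 update, and the post-code pattern is written with `\d`, the backslash form that Python's `re` module expects.

```python
import re

# Character classes taken from the list above.
KANJI = re.compile(r"[一-龯]")     # U+4E00 to U+9FAF
HIRAGANA = re.compile(r"[ぁ-ん]")
KATAKANA = re.compile(r"[ァ-ヾ]")  # widened range, includes ー (U+30FC)

# re.ASCII restricts \d to 0-9; without it Python's \d also matches
# full-width digits such as １.
POST_CODE = re.compile(r"^\d{3}-\d{4}$", re.ASCII)


def count_scripts(text: str) -> dict:
    """Count kanji, hiragana and katakana characters (hypothetical helper)."""
    return {
        "kanji": len(KANJI.findall(text)),
        "hiragana": len(HIRAGANA.findall(text)),
        "katakana": len(KATAKANA.findall(text)),
    }


def is_post_code(value: str) -> bool:
    """Check for the seven-digit NNN-NNNN post code format."""
    return POST_CODE.match(value) is not None


print(count_scripts("東京タワーに行きました"))
# {'kanji': 3, 'hiragana': 5, 'katakana': 3}
print(is_post_code("100-0001"))  # True
print(is_post_code("1000001"))   # False
```

Whether full-width digits should be accepted in a post code depends on the input source. Dropping `re.ASCII` would let `\d` match them as well.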
regex.1655823715.txt.gz · Last modified: 2022/06/21 15:01 by prcurtis