docs:8000_appendices:0500_regular_expressions [2022/09/13 18:15] (current)
 +====== Appendix E. Regular Expressions ======
 +
 +=== Summary of regular-expression constructs ===
 +
 +^Construct^Matches^
 +^  Characters  ^^
 +|x|The character x|
 +|\\|The backslash character|
 +|\0n|The character with octal value 0n <nowiki>(0 <= n <= 7)</nowiki>|
 +|\0nn|The character with octal value 0nn <nowiki>(0 <= n <= 7)</nowiki>|
 +|\0mnn|The character with octal value 0mnn <nowiki>(0 <= m <= 3, 0 <= n <= 7)</nowiki>|
 +|\xhh|The character with hexadecimal value 0xhh|
 +|\uhhhh|The character with hexadecimal value 0xhhhh|
 +|\t|The tab character ('\u0009')|
 +|\n|The newline (line feed) character ('\u000A')|
 +|\r|The carriage-return character ('\u000D')|
 +|\f|The form-feed character ('\u000C')|
 +|\a|The alert (bell) character ('\u0007')|
 +|\e|The escape character ('\u001B')|
 +|\cx|The control character corresponding to x|
 +^  Character classes  ^^
 +|[abc]|a, b, or c (simple class)|
 +|<nowiki>[^abc]</nowiki>|Any character except a, b, or c (negation)|
 +|[a-zA-Z]|a through z or A through Z, inclusive (range)|
 +|[a-d[m-p]]|a through d, or m through p: [a-dm-p] (union)|
 +|[a-z&&[def]]|d, e, or f (intersection)|
 +|<nowiki>[a-z&&[^bc]]</nowiki>|a through z, except for b and c: [ad-z] (subtraction)|
 +|<nowiki>[a-z&&[^m-p]]</nowiki>|a through z, and not m through p: [a-lq-z] (subtraction)|
 +^  Predefined character classes  ^^
 +|.|Any character (may or may not match [[http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/#lt|line terminators]])|
 +|\d|A digit: [0-9]|
 +|\D|A non-digit: <nowiki>[^0-9]</nowiki>|
 +|\s|A whitespace character: [ \t\n\x0B\f\r]|
 +|\S|A non-whitespace character: <nowiki>[^\s]</nowiki>|
 +|\w|A word character: [a-zA-Z_0-9]|
 +|\W|A non-word character: <nowiki>[^\w]</nowiki>|
 +^  POSIX character classes (US-ASCII only)  ^^
 +|\p{Lower}|A lower-case alphabetic character: [a-z]|
 +|\p{Upper}|An upper-case alphabetic character: [A-Z]|
 +|\p{ASCII}|All ASCII: [\x00-\x7F]|
 +|\p{Alpha}|An alphabetic character: [\p{Lower}\p{Upper}]|
 +|\p{Digit}|A decimal digit: [0-9]|
 +|\p{Alnum}|An alphanumeric character: [\p{Alpha}\p{Digit}]|
 +|\p{Punct}|Punctuation: One of <nowiki>!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~ </nowiki>|
 +|\p{Graph}|A visible character: [\p{Alnum}\p{Punct}]|
 +|\p{Print}|A printable character: [\p{Graph}]|
 +|\p{Blank}|A space or a tab: [ \t]|
 +|\p{Cntrl}|A control character: [\x00-\x1F\x7F]|
 +|\p{XDigit}|A hexadecimal digit: [0-9a-fA-F]|
 +|\p{Space}|A whitespace character: [ \t\n\x0B\f\r]|
 +^  Classes for Unicode blocks and categories  ^^
 +| \p{InGreek}|A character in the Greek block (simple [[http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/#ubc|block]])|
 +| \p{Lu}|An uppercase letter (simple [[http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/#ubc|category]])|
 +| \p{Sc}|A currency symbol|
 +| \P{InGreek}|Any character except one in the Greek block (negation)|
 +|<nowiki>[\p{L}&&[^\p{Lu}]]</nowiki>|Any letter except an uppercase letter (subtraction)|
 +^  Boundary matchers  ^^
 +|<nowiki>^</nowiki>|The beginning of a line|
 +|$|The end of a line|
 +|\b|A word boundary|
 +|\B|A non-word boundary|
 +|\A|The beginning of the input|
 +|\G|The end of the previous match|
 +|\Z|The end of the input but for the final [[http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/#lt|terminator]], if any|
 +|\z|The end of the input|
 +^  Greedy quantifiers  ^^
 +|X?|X, once or not at all|
 +|X*|X, zero or more times|
 +|X+|X, one or more times|
 +|X{n}|X, exactly n times|
 +|X{n,}|X, at least n times|
 +|X{n,m}|X, at least n but not more than m times|
 +^  Reluctant quantifiers  ^^
 +|X??|X, once or not at all|
 +|X*?|X, zero or more times|
 +|X+?|X, one or more times|
 +|X{n}?|X, exactly n times|
 +|X{n,}?|X, at least n times|
 +|X{n,m}?|X, at least n but not more than m times|
 +^  Possessive quantifiers  ^^
 +|X?+|X, once or not at all|
 +|X*+|X, zero or more times|
 +|X++|X, one or more times|
 +|X{n}+|X, exactly n times|
 +|X{n,}+|X, at least n times|
 +|X{n,m}+|X, at least n but not more than m times|
 +^  Logical operators  ^^
 +|XY|X followed by Y|
 +|<nowiki>X|Y</nowiki>|Either X or Y|
 +|(X)|X, as a [[http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/#cg|capturing group]]|
 +^  Back references  ^^
 +|\n|Whatever the n<sup>th</sup> [[http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/#cg|capturing group]] matched|
 +^  Quotation  ^^
 +|\|Nothing, but quotes the following character|
 +|\Q|Nothing, but quotes all characters until \E|
 +|\E|Nothing, but ends quoting started by \Q|
 +^  Special constructs (non-capturing)  ^^
 +|(?:X)|X, as a non-capturing group|
 +|(?idmsux-idmsux)|Nothing, but turns match flags on - off|
 +|(?idmsux-idmsux:X)|X, as a [[http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/#cg|non-capturing group]] with the given flags on - off|
 +|(?=X)|X, via zero-width positive lookahead|
 +|(?!X)|X, via zero-width negative lookahead|
 +|<nowiki>(?<=X)</nowiki>|X, via zero-width positive lookbehind|
 +|(?<!X)|X, via zero-width negative lookbehind|
 +|(?>X)|X, as an independent, non-capturing group|
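
The greedy, reluctant, and possessive quantifier rows above carry identical descriptions, so the difference only shows up in how much text each one consumes. The following is a minimal sketch, assuming the patterns are run through the standard ''java.util.regex'' classes; the class name ''QuantifierDemo'' and the helper ''firstMatch'' are illustrative names, not part of any documented API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative helper class (hypothetical name) comparing quantifier styles.
class QuantifierDemo {
    // Returns the first substring of input matched by regex, or null if none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "<a><b>";
        // Greedy: .+ takes as much as possible, then backtracks to find '>'.
        System.out.println(firstMatch("<.+>", input));
        // Reluctant: .+? takes as little as possible, stopping at the first '>'.
        System.out.println(firstMatch("<.+?>", input));
        // Possessive: .++ takes everything and never gives any back,
        // so the trailing '>' can never match.
        System.out.println(firstMatch("<.++>", input));
    }
}
```

On this input the greedy form matches the whole string, the reluctant form matches only the first tag, and the possessive form finds no match at all.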
 +
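
The union, intersection, and subtraction forms listed under "Character classes" can be checked one character at a time. This is a sketch under the assumption that the classes are evaluated by ''java.util.regex.Pattern''; ''CharClassDemo'' and ''matching'' are hypothetical names introduced only for illustration.

```java
import java.util.regex.Pattern;

// Illustrative helper class (hypothetical name) for probing character classes.
class CharClassDemo {
    // Returns the characters of candidates that the character class accepts,
    // in their original order.
    static String matching(String charClass, String candidates) {
        Pattern p = Pattern.compile(charClass);
        StringBuilder sb = new StringBuilder();
        for (char c : candidates.toCharArray()) {
            if (p.matcher(String.valueOf(c)).matches()) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(matching("[^abc]", "abcd"));               // negation
        System.out.println(matching("[a-d[m-p]]", "abcdefmnopqz"));   // union
        System.out.println(matching("[a-z&&[def]]", "abcdefg"));      // intersection
        System.out.println(matching("[a-z&&[^bc]]", "abcd"));         // subtraction
    }
}
```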
 +
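
The boundary matchers depend on match flags: in ''java.util.regex'', ''^'' and ''$'' match at every line only when the ''m'' (multiline) flag from the "Special constructs" section is on, and ''\Z'' differs from ''\z'' in whether a final line terminator is tolerated. A minimal sketch follows, assuming the Java engine; ''BoundaryDemo'' and ''count'' are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative helper class (hypothetical name) for boundary matchers.
class BoundaryDemo {
    // Counts how many times regex is found in input.
    static int count(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String input = "one\ntwo\n";
        // Default mode: ^ and $ anchor to the whole input, so no single
        // line can satisfy both anchors here.
        System.out.println(count("^\\w+$", input));
        // (?m) turns on multiline mode: ^ and $ match at each line.
        System.out.println(count("(?m)^\\w+$", input));
        // \Z accepts the position before the final line terminator.
        System.out.println(count("two\\Z", input));
        // \z accepts only the very end of the input.
        System.out.println(count("two\\z", input));
    }
}
```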
 +
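
Quotation with ''\Q...\E'' and back references such as ''\1'' are easiest to see against a whole-string match. The sketch below assumes ''java.util.regex.Pattern.matches''; the class name ''QuoteBackrefDemo'' and the method ''matchesAll'' are hypothetical names used only for this example.

```java
import java.util.regex.Pattern;

// Illustrative helper class (hypothetical name) for quotation and back references.
class QuoteBackrefDemo {
    // True if regex matches the entire input.
    static boolean matchesAll(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Unquoted, '+' is a quantifier, so the literal text does not match.
        System.out.println(matchesAll("1+1=2", "1+1=2"));
        // Between \Q and \E every character is taken literally.
        System.out.println(matchesAll("\\Q1+1=2\\E", "1+1=2"));
        // \1 must repeat exactly what the first capturing group matched.
        System.out.println(matchesAll("(\\w+) \\1", "hello hello"));
        System.out.println(matchesAll("(\\w+) \\1", "hello world"));
    }
}
```

Note that in Java source code each backslash in a pattern is doubled, because the string literal is unescaped before the regular-expression engine sees it.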