class Regex

Overview

A Regex represents a regular expression, a pattern that describes the contents of strings. A Regex can determine whether or not a string matches its description, and extract the parts of the string that match.

A Regex can be created using the literal syntax, in which it is delimited by forward slashes (/):

/hay/ =~ "haystack"   # => 0
/y/.match("haystack") # => Regex::MatchData("y")

See Regex literals in the language reference.

Interpolation works in regular expression literals just as it does in string literals. Be aware that if the resulting string is not a valid regular expression, an exception is raised at runtime.

x = "a"
/#{x}/.match("asdf") # => Regex::MatchData("a")
x = "("
/#{x}/ # raises ArgumentError
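Because the error only surfaces at runtime, code that interpolates untrusted text into a pattern may want to handle it. A minimal sketch, assuming the ArgumentError described above; the method name match_user_pattern is hypothetical:

```crystal
# Hypothetical helper: interpolate untrusted input into a regex literal and
# return nil instead of propagating the error when the pattern is invalid.
def match_user_pattern(user_input : String, text : String) : Regex::MatchData?
  /#{user_input}/.match(text)
rescue ArgumentError
  nil # the interpolated string was not a valid regular expression
end

match_user_pattern("st.ck", "haystack") # => Regex::MatchData("stack")
match_user_pattern("(", "haystack")     # => nil
```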

When we check to see if a particular regular expression describes a string, we can say that we are performing a match or matching one against the other. If we find that a regular expression does describe a string, we say that it matches, and we can refer to a part of the string that was described as a match.

Here "haystack" does not contain the pattern /needle/, so it doesn't match:

/needle/.match("haystack") # => nil

Here "haystack" contains the pattern /hay/, so it matches:

/hay/.match("haystack") # => Regex::MatchData("hay")

Regex methods that perform a match usually return a truthy value if there was a match and nil if there was no match. After performing a match, the special variable $~ will be an instance of Regex::MatchData if it matched, nil otherwise.

When matching a regular expression using #=~ (either String#=~ or Regex#=~), the returned value will be the index of the first match in the string if the expression matched, nil otherwise.

/stack/ =~ "haystack"  # => 3
"haystack" =~ /stack/  # => 3
$~                     # => Regex::MatchData("stack")
/needle/ =~ "haystack" # => nil
"haystack" =~ /needle/ # => nil
$~                     # raises Exception

When matching a regular expression using #match (either String#match or Regex#match), the returned value will be a Regex::MatchData if the expression matched, nil otherwise.

/hay/.match("haystack")    # => Regex::MatchData("hay")
"haystack".match(/hay/)    # => Regex::MatchData("hay")
$~                         # => Regex::MatchData("hay")
/needle/.match("haystack") # => nil
"haystack".match(/needle/) # => nil
$~                         # raises Exception
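Since reading $~ after a failed match raises, it is often simpler to bind the return value of the match directly. A short sketch contrasting the two return types, using only #=~ and #match as documented above:

```crystal
text = "haystack"

# =~ gives the start index of the match (an Int32), or nil.
index = /stack/ =~ text
puts "stack starts at #{index}" if index

# #match gives the MatchData itself, so there is no need to read $~ afterwards.
if md = /hay/.match(text)
  puts "matched #{md[0].inspect}"
end
```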

Regular expressions have their own language for describing strings.

Many programming languages and tools implement their own regular expression language, but Crystal uses PCRE2, a popular C library, with JIT compilation enabled, to provide regular expressions. Here we give a brief summary of the most basic features of regular expressions (grouping, repetition, and alternation), but the feature set of PCRE2 extends far beyond these, and we don't attempt to describe it in full here. For more information, refer to the PCRE2 documentation, especially the full pattern syntax or syntax quick reference.

NOTE Prior to Crystal 1.8 the compiler expected regex literals to follow the original PCRE pattern syntax. The following summary applies to both PCRE and PCRE2.

The regular expression language can be used to match much more than just the static substrings in the above examples. Certain characters, called metacharacters, are given special treatment in regular expressions, and can be used to describe more complex patterns. To match metacharacters literally in a regular expression, they must be escaped by being preceded with a backslash (\). Regex.escape will do this automatically for a given String.
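As an illustration of why escaping matters when a pattern is built from arbitrary text, the sketch below compares a string used verbatim as a pattern with the same string passed through Regex.escape. The price string is just sample data:

```crystal
price = "$4.99"
text = "Total: $4.99"

# Used verbatim, "$" is an end-of-subject anchor and "." matches any character,
# so this pattern can never match text that continues after the "$".
unescaped = Regex.new(price)
unescaped.matches?(text) # => false

# Regex.escape backslash-escapes the metacharacters, giving a literal pattern.
escaped = Regex.new(Regex.escape(price))
escaped.matches?(text) # => true
```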

A group of characters (often called a capture group or subpattern) can be identified by enclosing it in parentheses (()). The contents of each capture group can be extracted on a successful match:

/a(sd)f/.match("_asdf_")                     # => Regex::MatchData("asdf" 1:"sd")
/a(sd)f/.match("_asdf_").try &.[1]           # => "sd"
/a(?<grp>sd)f/.match("_asdf_")               # => Regex::MatchData("asdf" grp:"sd")
/a(?<grp>sd)f/.match("_asdf_").try &.["grp"] # => "sd"

Capture groups are indexed starting from 1. Methods that accept a capture group index will usually also accept 0 to refer to the full match. Capture groups can also be given names, using the (?<name>...) syntax, as in the previous example.
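A slightly larger sketch mixing named and unnamed groups. It uses #match! (documented further down) so the result is not nilable, and shows that named groups also receive a number in pattern order:

```crystal
date_re = /(?<year>\d{4})-(?<month>\d{2})-(\d{2})/
md = date_re.match!("Released 2023-04-11")

md[0]      # => "2023-04-11" (group 0 is the full match)
md["year"] # => "2023"
md[2]      # => "04" (the named group "month" is also group 2)
md[3]      # => "11"
```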

Following a match, the special variables $N (e.g., $1, $2, $3, ...) can be used to access a capture group. Trying to access an invalid capture group will raise an exception. Note that it is possible to have a successful match with a nil capture:

/(spice)(s)?/.match("spice") # => Regex::MatchData("spice" 1:"spice" 2:nil)
$1                           # => "spice"
$2                           # raises Exception

This can be mitigated by using the nilable variants $N? (e.g., $1?, $2?, $3?, ...). Changing the above to use $2? instead of $2 would return nil, so $2?.nil? would return true.
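A small sketch of the nilable accessor in practice, reusing the pattern from the previous example and falling back to a default when the optional group did not participate:

```crystal
/(spice)(s)?/.match("spice")

singular = $1      # => "spice"
plural = $2?       # => nil, because the optional group did not participate
suffix = $2? || "" # falls back to an empty string instead of raising
```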

A character or group can be repeated or made optional using an asterisk (*, zero or more), a plus sign (+, one or more), integer bounds in curly braces ({n,m}: at least n, no more than m), or a question mark (?, zero or one).

/fo*/.match("_f_")         # => Regex::MatchData("f")
/fo+/.match("_f_")         # => nil
/fo*/.match("_foo_")       # => Regex::MatchData("foo")
/fo{3,}/.match("_foo_")    # => nil
/fo{1,3}/.match("_foo_")   # => Regex::MatchData("foo")
/fo*/.match("_foooooooo_") # => Regex::MatchData("foooooooo")
/fo{,3}/.match("_foooo_")  # => nil
/f(op)*/.match("fopopo")   # => Regex::MatchData("fopop" 1:"op")
/foo?bar/.match("foobar")  # => Regex::MatchData("foobar")
/foo?bar/.match("fobar")   # => Regex::MatchData("fobar")
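The quantifiers above apply only to the single item immediately before them, unless that item is a group. A short sketch contrasting the two cases, plus a bounded repetition:

```crystal
# A quantifier applies to the single item before it: only "b" repeats.
only_b = /ab+/.match!("abbbab")[0] # => "abbb"

# When the item is a group, the whole group repeats.
whole_group = /(ab)+/.match!("ababab_")[0] # => "ababab"

# Bounded repetition is greedy: it takes as many as the upper bound allows.
bounded = /x{2,3}/.match!("xxxxx")[0] # => "xxx"
```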

Alternatives can be separated using a vertical bar (|). Any single character can be represented by dot (.). When matching only one character, specific alternatives can be expressed as a character class, enclosed in square brackets ([]):

/foo|bar/.match("foo")     # => Regex::MatchData("foo")
/foo|bar/.match("bar")     # => Regex::MatchData("bar")
/_(x|y)_/.match("_x_")     # => Regex::MatchData("_x_" 1:"x")
/_(x|y)_/.match("_y_")     # => Regex::MatchData("_y_" 1:"y")
/_(x|y)_/.match("_(x|y)_") # => nil
/_._/.match("_x_")         # => Regex::MatchData("_x_")
/_[xyz]_/.match("_x_")     # => Regex::MatchData("_x_")
/_[a-z]_/.match("_x_")     # => Regex::MatchData("_x_")
/_[^a-z]_/.match("_x_")    # => nil
/_[^a-wy-z]_/.match("_x_") # => Regex::MatchData("_x_")
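One consequence worth spelling out is that the vertical bar has the lowest precedence in a pattern, so anchors bind to the nearest alternative rather than to the whole alternation. A sketch using only #matches?:

```crystal
# Without grouping this means: "cat" at the start OR "dog" at the end.
loose = /^cat|dog$/
# Grouping restricts the alternation: the whole string must be "cat" or "dog".
strict = /^(cat|dog)$/

loose.matches?("catalog")  # => true
strict.matches?("catalog") # => false
strict.matches?("dog")     # => true
```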

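Regular expressions can be defined with these 3 optional flags: i for ignore case (Regex::CompileOptions::IGNORE_CASE), m for multiline (Regex::CompileOptions::MULTILINE), and x for extended (Regex::CompileOptions::EXTENDED).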

/asdf/ =~ "ASDF"    # => nil
/asdf/i =~ "ASDF"   # => 0
/^z/i =~ "ASDF\nZ"  # => nil
/^z/im =~ "ASDF\nZ" # => 5
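The examples above cover i and m. The sketch below shows the effect of x (whitespace in the pattern is ignored) and passes a flag at runtime through Regex.new with Regex::CompileOptions, as in the constructor documentation:

```crystal
# With x (EXTENDED), unescaped whitespace in the pattern is ignored,
# so the spaces below only serve readability.
spaced = /(\d{3}) - (\d{4})/x
spaced.matches?("555-1234") # => true

# Without x, the same spaces are literal and must appear in the input.
plain = /(\d{3}) - (\d{4})/
plain.matches?("555-1234") # => false

# Flags can also be passed at runtime through Regex::CompileOptions.
ci = Regex.new("hello", Regex::CompileOptions::IGNORE_CASE)
ci.matches?("HELLO") # => true
```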

PCRE2 supports other encodings, but Crystal strings are UTF-8 only, so Crystal regular expressions are also UTF-8 only (by default). Crystal strings are expected to contain only valid UTF-8, but that's not guaranteed: a string can contain invalid bytes. Data read from external sources in particular must not be trusted to be validly encoded. The regex engine demands valid UTF-8, so it checks the encoding for every match. This check is unnecessary if the string has already been validated (for example via String#valid_encoding?, or because it has already been used in a previous regex match). In that case, it's profitable to skip UTF-8 validation via MatchOptions::NO_UTF_CHECK (or CompileOptions::NO_UTF_CHECK for the pattern).
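A minimal sketch of the validate-once pattern described above, assuming the input is an ordinary String that arrived from an external source; the sample strings are illustrative only:

```crystal
# Pretend this string arrived from an external source.
input = "café au lait"

# Validate the encoding once, up front...
raise ArgumentError.new("input is not valid UTF-8") unless input.valid_encoding?

# ...then skip PCRE2's per-match UTF-8 check for the following matches.
skip = Regex::MatchOptions::NO_UTF_CHECK
starts_cafe = /^café/.matches?(input, 0, skip) # => true
has_tea = /tea/.matches?(input, 0, skip)       # => false

# A string holding an invalid byte fails validation, so NO_UTF_CHECK
# must not be used with it.
broken = String.new(Bytes[0x63, 0xff])
broken_valid = broken.valid_encoding? # => false
```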

PCRE2 optionally permits named capture groups (named subpatterns) to not be unique. Crystal exposes the name table of a Regex as a Hash of String => Int32, and therefore requires named capture groups to have unique names within a single Regex.

Included Modules

Regex::PCRE2

Defined in:

json/any.cr
regex.cr
regex/match_data.cr
yaml/any.cr

Constant Summary

SPECIAL_CHARACTERS = {' ', '.', '\\', '+', '*', '?', '[', '^', ']', '$', '(', ')', '{', '}', '=', '!', '<', '>', '|', ':', '-'}

List of metacharacters that need to be escaped.

See Regex.needs_escape? and Regex.escape.

Instance methods inherited from module Regex::PCRE2

finalize

Class methods inherited from module Regex::PCRE2

current_jit_stack : Crystal::ValueWithFinalizer(Pointer(LibPCRE2::JITStack)), current_match_data : Crystal::ValueWithFinalizer(Pointer(LibPCRE2::MatchData)), match_context : Pointer(LibPCRE2::MatchContext), supports_compile_flag?(options), supports_match_flag?(options), version : String, version_number : Tuple(Int32, Int32)

Instance methods inherited from class Reference

==(other : self), ==(other : JSON::Any), ==(other : YAML::Any), ==(other), dup, hash(hasher), initialize, inspect(io : IO) : Nil, object_id : UInt64, pretty_print(pp) : Nil, same?(other : Reference) : Bool, same?(other : Nil), to_s(io : IO) : Nil

Constructor methods inherited from class Reference

new, unsafe_construct(address : Pointer, *args, **opts) : self

Class methods inherited from class Reference

pre_initialize(address : Pointer)

Instance methods inherited from class Object

! : Bool, !=(other), !~(other), ==(other), ===(other : JSON::Any), ===(other : YAML::Any), ===(other), =~(other), as(type : Class), as?(type : Class), class, dup, hash(hasher), hash, in?(collection : Object) : Bool, in?(*values : Object) : Bool, inspect(io : IO) : Nil, inspect : String, is_a?(type : Class) : Bool, itself, nil? : Bool, not_nil!(message), not_nil!, pretty_inspect(width = 79, newline = "\n", indent = 0) : String, pretty_print(pp : PrettyPrint) : Nil, responds_to?(name : Symbol) : Bool, tap(&), to_json(io : IO) : Nil, to_json : String, to_pretty_json(indent : String = " ") : String, to_pretty_json(io : IO, indent : String = " ") : Nil, to_s(io : IO) : Nil, to_s : String, to_yaml(io : IO) : Nil, to_yaml : String, try(&), unsafe_as(type : T.class) forall T

Class methods inherited from class Object

from_json(string_or_io : String | IO, root : String), from_json(string_or_io : String | IO), from_yaml(string_or_io : String | IO)

Macros inherited from class Object

class_getter(*names, &block), class_getter!(*names), class_getter?(*names, &block), class_property(*names, &block), class_property!(*names), class_property?(*names, &block), class_setter(*names), def_clone, def_equals(*fields), def_equals_and_hash(*fields), def_hash(*fields), delegate(*methods, to object), forward_missing_to(delegate), getter(*names, &block), getter!(*names), getter?(*names, &block), property(*names, &block), property!(*names), property?(*names, &block), setter(*names)

Constructor Detail

def self.literal(pattern : String, *, i : Bool = false, m : Bool = false, x : Bool = false) : self

Creates a new Regex instance from a literal consisting of a pattern and the named parameter modifiers.


def self.new(source : String, options : Options = Options::None) : self

Creates a new Regex out of the given source String.

Regex.new("^[a-z]+:\\s+\\w+")                        # => /^[a-z]+:\s+\w+/
Regex.new("cat", Regex::CompileOptions::IGNORE_CASE) # => /cat/i
options = Regex::CompileOptions::IGNORE_CASE | Regex::CompileOptions::EXTENDED
Regex.new("dog", options) # => /dog/ix

def self.union(patterns : Enumerable(Regex | String)) : self

Union. Returns a Regex that matches any of patterns.

All capture groups in the patterns after the first one will have their indexes offset.

re = Regex.union([/skiing/i, "sledding"])
re.match("Skiing")   # => Regex::MatchData("Skiing")
re.match("sledding") # => Regex::MatchData("sledding")
re = Regex.union({/skiing/i, "sledding"})
re.match("Skiing")   # => Regex::MatchData("Skiing")
re.match("sledding") # => Regex::MatchData("sledding")

def self.union(*patterns : Regex | String) : self

Union. Returns a Regex that matches any of patterns.

All capture groups in the patterns after the first one will have their indexes offset.

re = Regex.union(/skiing/i, "sledding")
re.match("Skiing")   # => Regex::MatchData("Skiing")
re.match("sledding") # => Regex::MatchData("sledding")
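To make the index offset concrete, here is a sketch where both patterns contain one capture group. After the union, the second pattern's group 1 is addressed as group 2:

```crystal
combined = Regex.union(/(\d+) apples/, /(\d+) pears/)
combined.capture_count # => 2

md = combined.match!("7 pears")
md[1]? # => nil (the first pattern's group did not participate)
md[2]  # => "7" (group 1 of the second pattern is now group 2)
```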

Class Method Detail

def self.error?(source : String) : String | Nil

Determines whether source is a valid regular expression. If it is, nil is returned. If it's not, a String containing the error message is returned.

Regex.error?("(foo|bar)") # => nil
Regex.error?("(foo|bar")  # => "missing closing parenthesis at 8"
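Regex.error? makes it possible to check a pattern before constructing it, instead of rescuing an exception. A sketch of that idea; compile_or_report is a hypothetical helper name, and the exact error text depends on the regex engine:

```crystal
# Hypothetical helper: build a Regex from untrusted source without raising.
def compile_or_report(source : String) : Regex?
  if message = Regex.error?(source)
    STDERR.puts "rejected #{source.inspect}: #{message}"
    nil
  else
    Regex.new(source)
  end
end

compile_or_report("(foo|bar)") # => /(foo|bar)/
compile_or_report("(foo|bar")  # => nil, after printing the error message
```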

def self.escape(str : String) : String

Returns a String constructed by escaping any metacharacters in str.

string = Regex.escape("*?{}.") # => "\\*\\?\\{\\}\\."
/#{string}/                    # => /\*\?\{\}\./

def self.needs_escape?(char : Char) : Bool

Returns true if char needs to be escaped, false otherwise.

Regex.needs_escape?('*') # => true
Regex.needs_escape?('@') # => false

def self.needs_escape?(str : String) : Bool

Returns true if str needs to be escaped, false otherwise.

Regex.needs_escape?("10$") # => true
Regex.needs_escape?("foo") # => false

def self.supports_compile_options?(options : CompileOptions) : Bool

Returns true if the regex engine supports all options flags when compiling a pattern.


def self.supports_match_options?(options : MatchOptions) : Bool

Returns true if the regex engine supports all options flags when matching a pattern.


Instance Method Detail

def +(other) : Regex

Union. Returns a Regex that matches either of the operands.

All capture groups in the second operand will have their indexes offset.

re = /skiing/i + /sledding/
re.match("Skiing")   # => Regex::MatchData("Skiing")
re.match("sledding") # => Regex::MatchData("sledding")

def ==(other : Regex) : Bool

Equality. Two regexes are equal if their sources and options are the same.

/abc/ == /abc/i  # => false
/abc/i == /ABC/i # => false
/abc/i == /abc/i # => true

def ===(other : String) : Bool

Case equality. This is equivalent to #match or #=~ but only returns true or false. Used in case expressions. The special variable $~ will contain a Regex::MatchData if there was a match, nil otherwise.

a = "HELLO"
b = case a
    when /^[a-z]*$/
      "Lower case"
    when /^[A-Z]*$/
      "Upper case"
    else
      "Mixed case"
    end
b # => "Upper case"
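Because === sets $~, each when branch can read the captures of the pattern that matched. A minimal sketch; route is a hypothetical method name:

```crystal
# Hypothetical router: each `when` uses Regex#===, which sets $~,
# so the branch body can read the capture through $1.
def route(request_line : String) : String
  case request_line
  when /^GET (\S+)/
    "fetch #{$1}"
  when /^POST (\S+)/
    "submit to #{$1}"
  else
    "unsupported"
  end
end

route("GET /index.html") # => "fetch /index.html"
route("POST /login")     # => "submit to /login"
route("DELETE /x")       # => "unsupported"
```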

def ===(other : JSON::Any)

def ===(other : YAML::Any)

def =~(other : String) : Int32 | Nil

Match. Matches a regular expression against other and returns the starting position of the match if other is a matching String, otherwise nil. $~ will contain a Regex::MatchData if there was a match, nil otherwise.

/at/ =~ "input data" # => 7
/ax/ =~ "input data" # => nil

def =~(other) : Nil

Match. When the argument is not a String, always returns nil.

/at/ =~ 42  # => nil
/at/ =~ nil # => nil

def capture_count : Int32

Returns the number of (named and non-named) capture groups.

/(?:.+)/.capture_count     # => 0
/(?<foo>.+)/.capture_count # => 1
/(.)/.capture_count        # => 1
/(.)|(.)/.capture_count    # => 2

def clone : Regex

def dup
Description copied from class Reference

Returns a shallow copy of this object.

This allocates a new object and copies the contents of self into it.


def hash(hasher)

def inspect(io : IO) : Nil

Prints to io an unambiguous string representation of this regular expression object.

Uses the regex literal syntax with basic option flags if sufficient (i.e. no options other than IGNORE_CASE, MULTILINE, or EXTENDED are set). Otherwise the syntax presents a Regex.new call.

/ab+c/ix.inspect                     # => "/ab+c/ix"
Regex.new("ab+c", :anchored).inspect # => Regex.new("ab+c", Regex::Options::ANCHORED)

def match(str : String, pos : Int32 = 0, options : Regex::MatchOptions = :none) : MatchData | Nil

Match at character index. Matches a regular expression against String str. Starts at the character index given by pos if given, otherwise at the start of str. Returns a Regex::MatchData if str matched, otherwise nil. $~ will contain the same value that was returned.

/(.)(.)(.)/.match("abc").try &.[2]   # => "b"
/(.)(.)/.match("abc", 1).try &.[2]   # => "c"
/(.)(.)/.match("クリスタル", 3).try &.[2] # => "ル"
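The pos argument makes it possible to resume a search where the previous match ended. A sketch that collects every run of digits this way; all_numbers is a hypothetical helper, and MatchData#end is used to get the character index just past each match:

```crystal
# Hypothetical helper: collect every run of digits by resuming each search
# at the character index where the previous match ended.
def all_numbers(text : String) : Array(String)
  found = [] of String
  pos = 0
  while md = /\d+/.match(text, pos)
    found << md[0]
    pos = md.end # character index just past the current match
  end
  found
end

all_numbers("a1 b22 c333") # => ["1", "22", "333"]
```

Since \d+ can never match an empty string, the position always advances and the loop terminates. For this common case String#scan returns all the matches directly; the loop above only makes the role of pos explicit.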

def match(str, pos, _options) : MatchData | Nil

Match at character index. Matches a regular expression against String str. Starts at the character index given by pos if given, otherwise at the start of str. Returns a Regex::MatchData if str matched, otherwise nil. $~ will contain the same value that was returned.

/(.)(.)(.)/.match("abc").try &.[2]   # => "b"
/(.)(.)/.match("abc", 1).try &.[2]   # => "c"
/(.)(.)/.match("クリスタル", 3).try &.[2] # => "ル"

DEPRECATED Use the overload with Regex::MatchOptions instead.


def match(str, pos = 0, *, options) : MatchData | Nil

Match at character index. Matches a regular expression against String str. Starts at the character index given by pos if given, otherwise at the start of str. Returns a Regex::MatchData if str matched, otherwise nil. $~ will contain the same value that was returned.

/(.)(.)(.)/.match("abc").try &.[2]   # => "b"
/(.)(.)/.match("abc", 1).try &.[2]   # => "c"
/(.)(.)/.match("クリスタル", 3).try &.[2] # => "ル"

DEPRECATED Use the overload with Regex::MatchOptions instead.


def match!(str : String, pos : Int32 = 0, *, options : Regex::MatchOptions = :none) : MatchData

Matches a regular expression against str. This starts at the character index pos if given, otherwise at the start of str. Returns a Regex::MatchData if str matched, otherwise raises Regex::Error. $~ will contain the same value if matched.

/(.)(.)(.)/.match!("abc")[2]   # => "b"
/(.)(.)/.match!("abc", 1)[2]   # => "c"
/(.)(タ)/.match!("クリスタル", 3)[2] # raises Exception
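#match! suits inputs that are required to match, where a mismatch is an error rather than a nil to check. A sketch under that assumption; parse_version and the "v<major>.<minor>" tag format are hypothetical:

```crystal
# Hypothetical parser: tags must look like "v<major>.<minor>",
# so a mismatch is treated as an error rather than a nil to check.
def parse_version(tag : String) : {Int32, Int32}
  md = /^v(\d+)\.(\d+)$/.match!(tag)
  {md[1].to_i, md[2].to_i}
end

parse_version("v1.8") # => {1, 8}

begin
  parse_version("latest")
rescue ex : Regex::Error
  puts "not a version tag: #{ex.message}"
end
```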

def match_at_byte_index(str : String, byte_index : Int32 = 0, options : Regex::MatchOptions = :none) : MatchData | Nil

Match at byte index. Matches a regular expression against String str. Starts at the byte index given by byte_index if given, otherwise at the start of str. Returns a Regex::MatchData if str matched, otherwise nil. $~ will contain the same value that was returned.

/(.)(.)(.)/.match_at_byte_index("abc").try &.[2]   # => "b"
/(.)(.)/.match_at_byte_index("abc", 1).try &.[2]   # => "c"
/(.)(.)/.match_at_byte_index("クリスタル", 3).try &.[2] # => "ス"
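The difference between character and byte offsets only shows up with multibyte characters. The sketch below reaches the same character both ways, assuming the byte offset falls on a character boundary (here it is computed from the bytesize of the preceding text):

```crystal
text = "クリスタル"

# Each of these katakana characters takes 3 bytes in UTF-8,
# so character index 2 corresponds to byte index 6.
byte_offset = "クリ".bytesize # => 6

by_char = /./.match(text, 2).try &.[0]                        # => "ス"
by_byte = /./.match_at_byte_index(text, byte_offset).try &.[0] # => "ス"
```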

def match_at_byte_index(str, byte_index, _options) : MatchData | Nil

Match at byte index. Matches a regular expression against String str. Starts at the byte index given by byte_index if given, otherwise at the start of str. Returns a Regex::MatchData if str matched, otherwise nil. $~ will contain the same value that was returned.

/(.)(.)(.)/.match_at_byte_index("abc").try &.[2]   # => "b"
/(.)(.)/.match_at_byte_index("abc", 1).try &.[2]   # => "c"
/(.)(.)/.match_at_byte_index("クリスタル", 3).try &.[2] # => "ス"

DEPRECATED Use the overload with Regex::MatchOptions instead.


def match_at_byte_index(str, byte_index = 0, *, options) : MatchData | Nil

Match at byte index. Matches a regular expression against String str. Starts at the byte index given by byte_index if given, otherwise at the start of str. Returns a Regex::MatchData if str matched, otherwise nil. $~ will contain the same value that was returned.

/(.)(.)(.)/.match_at_byte_index("abc").try &.[2]   # => "b"
/(.)(.)/.match_at_byte_index("abc", 1).try &.[2]   # => "c"
/(.)(.)/.match_at_byte_index("クリスタル", 3).try &.[2] # => "ス"

DEPRECATED Use the overload with Regex::MatchOptions instead.


def matches?(str : String, pos : Int32 = 0, options : Regex::MatchOptions = :none) : Bool

Match at character index. It behaves like #match, but returns a Bool value. It neither returns a MatchData nor assigns one to the $~ variable.

/foo/.matches?("bar") # => false
/foo/.matches?("foo") # => true

# `$~` is not set even if the last match succeeded.
$~ # raises Exception
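A typical use is filtering, where only a yes/no answer per element is needed. A short sketch with sample data:

```crystal
words = ["apple", "banana", "cherry", "avocado"]

# Only a yes/no answer is needed, so #matches? is enough:
# no MatchData is returned and $~ is left untouched.
a_words = words.select { |word| /^a/.matches?(word) }
a_words # => ["apple", "avocado"]
```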

def matches?(str, pos, _options) : Bool

Match at character index. It behaves like #match, but returns a Bool value. It neither returns a MatchData nor assigns one to the $~ variable.

/foo/.matches?("bar") # => false
/foo/.matches?("foo") # => true

# `$~` is not set even if the last match succeeded.
$~ # raises Exception

DEPRECATED Use the overload with Regex::MatchOptions instead.


def matches?(str, pos = 0, *, options) : Bool

Match at character index. It behaves like #match, but returns a Bool value. It neither returns a MatchData nor assigns one to the $~ variable.

/foo/.matches?("bar") # => false
/foo/.matches?("foo") # => true

# `$~` is not set even if the last match succeeded.
$~ # raises Exception

DEPRECATED Use the overload with Regex::MatchOptions instead.


def matches_at_byte_index?(str : String, byte_index : Int32 = 0, options : Regex::MatchOptions = :none) : Bool

Match at byte index. It behaves like #match_at_byte_index, but returns a Bool value. It neither returns a MatchData nor assigns one to the $~ variable.


def matches_at_byte_index?(str, byte_index, _options) : Bool

Match at byte index. It behaves like #match_at_byte_index, but returns a Bool value. It neither returns a MatchData nor assigns one to the $~ variable.

DEPRECATED Use the overload with Regex::MatchOptions instead.


def matches_at_byte_index?(str, byte_index = 0, *, options) : Bool

Match at byte index. It behaves like #match_at_byte_index, but returns a Bool value. It neither returns a MatchData nor assigns one to the $~ variable.

DEPRECATED Use the overload with Regex::MatchOptions instead.


def name_table : Hash(Int32, String)

Returns a Hash where the values are the names of capture groups and the keys are their indexes. Non-named capture groups will not have entries in the Hash. Capture groups are indexed starting from 1.

/(.)/.name_table                         # => {}
/(?<foo>.)/.name_table                   # => {1 => "foo"}
/(?<foo>.)(?<bar>.)/.name_table          # => {2 => "bar", 1 => "foo"}
/(.)(?<foo>.)(.)(?<bar>.)(.)/.name_table # => {4 => "bar", 2 => "foo"}
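Since name_table maps each group index to its name, it can be combined with a match to look up every named capture by index. A sketch of that idea with a sample key=value pattern:

```crystal
re = /(?<key>\w+)=(?<value>\w+)/
md = re.match!("color=red")

# name_table maps group index => name; look each index up in the match.
named = {} of String => String
re.name_table.each do |index, name|
  named[name] = md[index]
end

named # => {"key" => "color", "value" => "red"}
```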

def options : Options

Returns a Regex::CompileOptions representing the optional flags applied to this Regex.

/ab+c/ix.options      # => Regex::CompileOptions::IGNORE_CASE | Regex::CompileOptions::EXTENDED
/ab+c/ix.options.to_s # => "IGNORE_CASE | EXTENDED"

def source : String

Returns the original String representation of the Regex pattern.

/ab+c/x.source # => "ab+c"
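Together, #source and #options describe a regex completely enough to rebuild it, and == compares exactly these two properties. A sketch, which also queries individual flags through the predicate methods generated for the CompileOptions flags enum:

```crystal
original = /ab+c/ix

# source and options together describe the regex, so they can rebuild it.
rebuilt = Regex.new(original.source, original.options)
same = rebuilt == original # => true (== compares source and options)

# Individual flags can be queried with the enum's predicate methods.
ignores_case = original.options.ignore_case? # => true
multiline = original.options.multiline?      # => false
```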

def to_s(io : IO) : Nil

Converts to a String in subpattern format. Produces a String which can be embedded in another Regex via interpolation, where it will be interpreted as a non-capturing subexpression.

re = /A*/i                 # => /A*/i
re.to_s                    # => "(?i-msx:A*)"
"Crystal".match(/t#{re}l/) # => Regex::MatchData("tal")
re = /A*/                  # => /A*/
re.to_s                    # => "(?-imsx:A*)"
"Crystal".match(/t#{re}l/) # => nil
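The scoped flags make it possible to build larger patterns from smaller ones without losing each piece's options. A sketch where a case-insensitive piece is embedded twice in a pattern that has no flags of its own:

```crystal
word = /[a-z]+/i

# Interpolation embeds word.to_s, so the case-insensitive flag
# stays scoped to the embedded parts.
pair = /^#{word}=#{word}$/
pair_matches = pair.matches?("Key=VALUE") # => true

# The same pattern written out without the flag does not match.
plain_matches = /^[a-z]+=[a-z]+$/.matches?("Key=VALUE") # => false
```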