A.4 Looking ahead and back
Lookahead specifies a pattern that must be matched but is not returned. A lookahead is written as a subexpression: the group opens with ?=, and the text to match follows the = sign. Some refer to this behaviour as “match but not consume”, in the sense that lookahead and lookbehind match a pattern after or before what we actually want to extract, but do not return it.
In the following example, we only want to match “my homepage” when it is followed by </title>, and we do not want </title> in the result:
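To make “match but not consume” concrete, the following is a minimal sketch (assuming the stringr package is loaded, and using a single illustrative string `x` that is not part of the original examples). It compares a pattern that consumes the closing tag with one that only looks ahead for it, and uses `str_locate()` to show where the lookahead match ends.

```r
library(stringr)

# illustrative input; the name `x` is chosen only for this sketch
x <- "<title>my homepage</title>"

# consuming pattern: the closing tag is part of the returned match
str_extract(x, "my homepage</title>")
#> [1] "my homepage</title>"

# lookahead: the closing tag must be present, but it is not returned
str_extract(x, "my homepage(?=</title>)")
#> [1] "my homepage"

# the match spans only "my homepage" (characters 8 to 18)
str_locate(x, "my homepage(?=</title>)")
#>      start end
#> [1,]     8  18
```

The lookahead itself has zero width: it acts as a condition on the position right after “my homepage” and contributes no characters to the match.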
text <- c("<title>my homepage</title>", "<p>my homepage</p>")
str_extract(text, "my homepage(?=</title>)")
#> [1] "my homepage" NA
# lookahead (and lookbehind) must be wrapped in parentheses;
# without them, "e?" is read as an optional "e" and "=" as a literal character
str_extract(text, "my homepage?=</title>")
#> [1] NA NA
Similarly, ?<= is interpreted as the lookbehind operator, which specifies a pattern before the text we actually want to extract. Following is an example. A database search lists products, and you need only the prices.
text <- c("ABC01: $23.45", 
          "HGG42: $5.31", 
          "CFMX1: $899.00", 
          "XTC99: $69.96", 
          "Total items found: 4")
str_extract(text, "(?<=\\$)[0-9]+")
#> [1] "23"  "5"   "899" "69"  NA
Lookahead and lookbehind operations may be combined, as in the following example:
str_extract("<title>my homepage</title>", "(?<=<title>)my homepage(?=</title>)")
#> [1] "my homepage"
Additionally, (?=) and (?<=) are known as positive lookahead and positive lookbehind. A less commonly used variant is the negative form of these two operators, which looks for text that does not match the specified pattern.
| operator | description |
|---|---|
| (?=) | positive lookahead |
| (?!) | negative lookahead |
| (?<=) | positive lookbehind |
| (?<!) | negative lookbehind |
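As a brief sketch of the negative forms, the two strings from the lookahead example can be reused (the vector name `tags` is introduced here only for illustration, and stringr is assumed to be loaded). Each negative pattern keeps “my homepage” only where the title tag is absent.

```r
library(stringr)

# same strings as in the lookahead example; `tags` is a name chosen for this sketch
tags <- c("<title>my homepage</title>", "<p>my homepage</p>")

# negative lookahead: "my homepage" NOT followed by </title>
str_extract(tags, "my homepage(?!</title>)")
#> [1] NA            "my homepage"

# negative lookbehind: "my homepage" NOT preceded by <title>
str_extract(tags, "(?<!<title>)my homepage")
#> [1] NA            "my homepage"
```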
Suppose we want to extract just the quantities but not the prices in the following text: