Chapter 3: Quantifiers and Repetition

Haiyue

Learning Objectives

  1. Master basic quantifiers: *, +, ?
  2. Learn to use precise quantifiers: {n}, {n,}, {n,m}
  3. Understand the difference between greedy and non-greedy matching
  4. Master the use of non-greedy quantifiers: *?, +?, ??

3.1 Overview of Quantifiers

Quantifiers specify how many times the preceding character or pattern must match. They are what let a single short pattern describe input of varying length.

Target Objects of Quantifiers

Quantifiers apply to the character or group immediately preceding them:

a*          # * applies to character 'a'
(abc)*      # * applies to group 'abc'
[0-9]+      # + applies to character class [0-9]
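As a rough illustration of scope, the sketch below applies the same quantifier to a single character and to a group. The names charOnly and grouped are made up for this example, and the ^ and $ anchors are added only so that the whole string has to match.

```javascript
// Without a group, * applies only to the single character before it.
const charOnly = /^ab*$/;   // "a" followed by zero or more "b"

// With a group, * applies to the whole sequence "ab".
const grouped = /^(ab)*$/;  // zero or more repetitions of "ab"

console.log(charOnly.test("abbb")); // true
console.log(charOnly.test("abab")); // false
console.log(grouped.test("abab"));  // true
console.log(grouped.test("abbb"));  // false
```

The two patterns differ only by the parentheses, yet they accept different strings.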

3.2 Asterisk (*) - Zero or More Times

The asterisk matches the preceding character or group zero or more times.

Basic Usage

a*          # Matches "", "a", "aa", "aaa", ...
ab*         # Matches "a", "ab", "abb", "abbb", ...
[0-9]*      # Matches "", "1", "123", "999999", ...

Practical Applications

// Match optional whitespace characters
const pattern1 = /\s*/;

// Match zero or more digits (also matches the empty string)
const pattern2 = /\d*/;

// Match filename (starts with letter, followed by any number of letters or digits)
const filename = /[a-zA-Z][a-zA-Z0-9]*/;

Common Pitfalls

const text = "abc";
const result = text.match(/a*b*/);
console.log(result[0]); // "ab"

const text2 = "xyz";
const result2 = text2.match(/a*b*/);
console.log(result2[0]); // "" (matches empty string)
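One consequence of the pitfall above is that a pattern built only from * items succeeds on every input. The sketch below shows this with test(). The names onlyStars and anchored are illustrative, and the anchored pattern is just one possible way to require real content.

```javascript
const onlyStars = /a*b*/;

// test() is true for any input, because an empty match always succeeds.
console.log(onlyStars.test("xyz")); // true

// The empty match is found at index 0, even when "ab" appears later.
const m = "xxab".match(onlyStars);
console.log(m.index, m[0].length); // 0 0

// Requiring at least one "a" and anchoring both ends rejects unrelated input.
const anchored = /^a+b*$/;
console.log(anchored.test("xyz")); // false
console.log(anchored.test("aab")); // true
```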

3.3 Plus (+) - One or More Times

The plus sign matches the preceding character or group one or more times.

Basic Usage

a+          # Matches "a", "aa", "aaa", ... (doesn't match empty string)
ab+         # Matches "ab", "abb", "abbb", ...
[0-9]+      # Matches "1", "123", "999999", ... (at least one digit)

Practical Applications

// Match one or more digits
const numbers = /\d+/g;
const text = "I have 123 apples and 456 oranges";
console.log(text.match(numbers)); // ["123", "456"]

// Match words (at least one letter)
const words = /[a-zA-Z]+/g;
const sentence = "Hello world! 123";
console.log(sentence.match(words)); // ["Hello", "world"]

Difference from Asterisk

const text = "bcd";

// Using a*
console.log(/a*/.exec(text)[0]); // "" (matches empty string)

// Using a+
console.log(/a+/.exec(text)); // null (no match)

3.4 Question Mark (?) - Zero or One Time

The question mark matches the preceding character or group zero or one time. It is used to mark an item as optional.

Basic Usage

a?          # Matches "" or "a"
ab?         # Matches "a" or "ab"
colou?r     # Matches "color" or "colour"

Practical Applications

// Optional protocol part
const url = /https?:\/\//;
console.log(url.test("http://example.com"));  // true
console.log(url.test("https://example.com")); // true

// Optional negative sign
const number = /-?\d+/;
console.log(number.test("123"));  // true
console.log(number.test("-123")); // true

// British and American spelling
const spelling = /colou?r/;
console.log(spelling.test("color"));  // true
console.log(spelling.test("colour")); // true
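The question mark can also follow a group, which makes the whole group optional. The sketch below extends the optional negative sign example with an optional fractional part. The name decimal and the exact pattern are assumptions made for illustration.

```javascript
// Optional sign, one or more digits, then an optional group: a dot followed by digits.
const decimal = /^-?\d+(\.\d+)?$/;

console.log(decimal.test("42"));    // true
console.log(decimal.test("-3.14")); // true
console.log(decimal.test("3."));    // false (a dot must be followed by digits)
console.log(decimal.test(".5"));    // false (at least one digit before the dot)
```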

3.5 Precise Quantifiers {n}, {n,}, {n,m}

Precise quantifiers let you specify an exact count or a range of counts.

{n} - Exactly n Times

a{3}        # Matches "aaa"
\d{4}       # Matches exactly 4 digits
[A-Z]{2}    # Matches exactly 2 uppercase letters

{n,} - At Least n Times

a{3,}       # Matches "aaa", "aaaa", "aaaaa", ...
\d{2,}      # Matches at least 2 digits
\w{5,}      # Matches at least 5 word characters (letters, digits, or underscore)

{n,m} - Between n and m Times

a{2,5}      # Matches "aa", "aaa", "aaaa", "aaaaa"
\d{3,6}     # Matches 3 to 6 digits
[a-z]{1,10} # Matches 1 to 10 lowercase letters

Practical Applications

// Validate phone number (11 digits)
const phone = /^\d{11}$/;
console.log(phone.test("13812345678")); // true

// Validate password length (6-20 characters)
const password = /^.{6,20}$/;
console.log(password.test("123456")); // true

// Match postal code (6 digits)
const zipCode = /^\d{6}$/;
console.log(zipCode.test("100000")); // true

// Validate hexadecimal color (# followed by exactly 3 or 6 hex digits)
const hexColor = /^#([0-9a-fA-F]{3}|[0-9a-fA-F]{6})$/;
console.log(hexColor.test("#F00"));    // true
console.log(hexColor.test("#FF0000")); // true
console.log(hexColor.test("#FF00"));   // false
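A precise quantifier limits how much the pattern itself consumes, not how long the input may be. The sketch below shows that an unanchored {n} still succeeds inside a longer run. The names fourDigits and exactlyFour are illustrative.

```javascript
// Unanchored: succeeds as soon as any 4 consecutive digits are found.
const fourDigits = /\d{4}/;
console.log(fourDigits.test("123456"));     // true
console.log("123456".match(fourDigits)[0]); // "1234"

// Anchored: the whole string must consist of exactly 4 digits.
const exactlyFour = /^\d{4}$/;
console.log(exactlyFour.test("123456")); // false
console.log(exactlyFour.test("2024"));   // true
```

For validation, anchoring with ^ and $ is usually what makes "exactly n" apply to the whole input.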

3.6 Greedy vs Non-Greedy Matching

Whether a quantifier is greedy or non-greedy determines how much text it consumes when more than one match length is possible.

Greedy Matching (Default Behavior)

Greedy quantifiers match as many characters as possible:

const text = "<div>Hello</div><div>World</div>";

// Greedy matching
const greedy = /<div>.*<\/div>/;
console.log(text.match(greedy)[0]);
// Result: "<div>Hello</div><div>World</div>"
// Matches from the first <div> to the last </div>

Problem with Greedy Matching

const html = '<p class="text">Content 1</p><p class="text">Content 2</p>';
const greedyPattern = /<p.*<\/p>/;
const result = html.match(greedyPattern);
console.log(result[0]);
// Matches the entire string, not individual <p> tags

Non-Greedy Matching (Lazy Matching)

Adding ? after a quantifier makes it non-greedy:

const text = "<div>Hello</div><div>World</div>";

// Non-greedy matching
const nonGreedy = /<div>.*?<\/div>/;
console.log(text.match(nonGreedy)[0]);
// Result: "<div>Hello</div>"
// Only matches the first complete div tag

// Match all div tags
const allDivs = text.match(/<div>.*?<\/div>/g);
console.log(allDivs);
// Result: ["<div>Hello</div>", "<div>World</div>"]
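A common alternative to a lazy quantifier is a negated character class that cannot run past the next tag. The sketch below compares the two on the same input. The name negated is illustrative, and the approach assumes the content between the tags contains no "<" character.

```javascript
const text = "<div>Hello</div><div>World</div>";

const lazy = /<div>.*?<\/div>/g;       // as few characters as possible
const negated = /<div>[^<]*<\/div>/g;  // any characters except "<"

console.log(text.match(lazy));    // ["<div>Hello</div>", "<div>World</div>"]
console.log(text.match(negated)); // ["<div>Hello</div>", "<div>World</div>"]

// The two differ on multi-line content: . does not match a newline, [^<] does.
const multiline = "<div>line1\nline2</div>";
console.log(multiline.match(lazy));           // null
console.log(multiline.match(negated).length); // 1
```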

3.7 All Non-Greedy Quantifiers

Non-Greedy Quantifier List

*?          # Zero or more times (non-greedy)
+?          # One or more times (non-greedy)
??          # Zero or one time (non-greedy)
{n,m}?      # n to m times (non-greedy)
{n,}?       # At least n times (non-greedy)

Practical Comparison Examples

const text = "aaaa";

// Greedy matching
console.log(text.match(/a+/)[0]);     // "aaaa"
console.log(text.match(/a{2,}/)[0]);  // "aaaa"

// Non-greedy matching
console.log(text.match(/a+?/)[0]);    // "a"
console.log(text.match(/a{2,}?/)[0]); // "aa"
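Non-greedy does not mean "always the minimum": a lazy quantifier starts with the fewest repetitions and takes more only when the rest of the pattern needs them. The sketch below is a small set of illustrative cases showing this.

```javascript
// On its own, a+? stops after one character.
console.log("aaaa".match(/a+?/)[0]);   // "a"

// With $ after it, a+? has to expand until the end of the string is reached.
console.log("aaaa".match(/^a+?$/)[0]); // "aaaa"

// The match still starts at the leftmost position, so all the a's are included.
console.log("aaab".match(/a+?b/)[0]);  // "aaab"

// ?? prefers zero repetitions, while ? prefers one.
console.log("aaaa".match(/a??/)[0]);   // ""
console.log("aaaa".match(/a?/)[0]);    // "a"
```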

HTML Tag Extraction Example

const html = `
  <h1>Title 1</h1>
  <p>Paragraph 1</p>
  <h1>Title 2</h1>
  <p>Paragraph 2</p>
`;

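// Greedy matching - fragile approach
const greedyTags = html.match(/<h1>.*<\/h1>/g);
console.log(greedyTags);
// Here each <h1> sits on its own line and . does not match newlines,
// so this also returns ["<h1>Title 1</h1>", "<h1>Title 2</h1>"].
// If both headings shared one line, it would return a single over-long match.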

// Non-greedy matching - correct approach
const lazyTags = html.match(/<h1>.*?<\/h1>/g);
console.log(lazyTags);
// ["<h1>Title 1</h1>", "<h1>Title 2</h1>"]

3.8 Practical Use Cases

Extracting Content in Quotes

const text = 'He said "Hello" and she replied "Hi there!"';

// Use non-greedy matching to extract quoted content
const quotes = text.match(/".*?"/g);
console.log(quotes); // ['"Hello"', '"Hi there!"']

// Extract only the text inside quotes (without quotes)
const quotedText = text.match(/"(.*?)"/g).map(match => match.slice(1, -1));
console.log(quotedText); // ['Hello', 'Hi there!']
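Instead of slicing the quotes off each match, the capture group can be read directly. The sketch below uses String.prototype.matchAll, which requires the g flag and yields one match object per occurrence. The name inner is illustrative.

```javascript
const text = 'He said "Hello" and she replied "Hi there!"';

// Each match object holds the full match at [0] and the first capture group at [1].
const inner = Array.from(text.matchAll(/"(.*?)"/g), (m) => m[1]);
console.log(inner); // ["Hello", "Hi there!"]
```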

Matching Repeated Characters

// Match repeated characters
const repeatedChars = /(.)\1+/g;
const text = "aabbcccddddd";
console.log(text.match(repeatedChars)); // ["aa", "bb", "ccc", "ddddd"]

Format Validation

// QQ number validation (5-11 digits, cannot start with 0)
const qqPattern = /^[1-9]\d{4,10}$/;

// Username validation (3-16 characters in total: a letter first, then letters, digits, or underscores)
const usernamePattern = /^[a-zA-Z][a-zA-Z0-9_]{2,15}$/;

// Password validation (8-16 letters or digits, with at least one letter and one digit)
const passwordPattern = /^(?=.*[a-zA-Z])(?=.*\d)[a-zA-Z\d]{8,16}$/;
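To check where the boundaries of these quantifiers fall, the patterns can be run against a few inputs. The sketch below repeats the three patterns so that it stands alone. The sample values are made up for illustration.

```javascript
const qqPattern = /^[1-9]\d{4,10}$/;
const usernamePattern = /^[a-zA-Z][a-zA-Z0-9_]{2,15}$/;
const passwordPattern = /^(?=.*[a-zA-Z])(?=.*\d)[a-zA-Z\d]{8,16}$/;

console.log(qqPattern.test("10001"));        // true  (5 digits, the minimum)
console.log(qqPattern.test("1234"));         // false (only 4 digits)
console.log(qqPattern.test("01234"));        // false (starts with 0)
console.log(qqPattern.test("123456789012")); // false (12 digits)

console.log(usernamePattern.test("a_1"));    // true  (3 characters, the minimum)
console.log(usernamePattern.test("ab"));     // false (too short)
console.log(usernamePattern.test("1abc"));   // false (starts with a digit)

console.log(passwordPattern.test("abcd1234"));  // true
console.log(passwordPattern.test("abcdefgh"));  // false (no digit)
console.log(passwordPattern.test("12345678"));  // false (no letter)
console.log(passwordPattern.test("abcd_1234")); // false (underscore not allowed)
```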

3.9 Performance Considerations

Performance Issues with Greedy Matching

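// Potential performance issue: the two adjacent a* compete for the same run of a's.
// This input matches quickly, but if the final "b" were missing, a backtracking
// engine could try many ways of splitting the a's between them before failing.
const text = "a".repeat(10000) + "b";
const problematicPattern = /a*a*b/;

// Better approach: a single a* matches the same strings with less backtracking
const betterPattern = /a*b/;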
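Before replacing one pattern with a simpler one, it helps to confirm that both find the same matches. The sketch below compares the two patterns on a handful of made-up samples. It checks results only and makes no timing claims.

```javascript
const problematicPattern = /a*a*b/;
const betterPattern = /a*b/;

// Illustrative inputs: some contain a match, some do not.
const samples = ["b", "aab", "xaab", "aaa", ""];

const sameResults = samples.every((s) => {
  const m1 = s.match(problematicPattern);
  const m2 = s.match(betterPattern);
  // Treat "no match" as null and compare the matched text otherwise.
  return (m1 && m1[0]) === (m2 && m2[0]);
});

console.log(sameResults);                    // true
console.log("xaab".match(betterPattern)[0]); // "aab"
console.log("aaa".match(betterPattern));     // null
```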

Quantifier Efficiency Comparison

// Less efficient: multiple individual character matches
const inefficient = /a?a?a?aaa/;

// More efficient: using appropriate quantifiers
const efficient = /a{3,6}/;
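The two patterns above describe the same thing: three to six consecutive a's. The sketch below checks this for strings of 0 to 8 a's. The ^ and $ anchors are added here, as an assumption for the comparison, so that the whole string has to match.

```javascript
const inefficient = /^a?a?a?aaa$/;
const efficient = /^a{3,6}$/;

const accepted = [];
for (let n = 0; n <= 8; n++) {
  const s = "a".repeat(n);
  const r1 = inefficient.test(s);
  const r2 = efficient.test(s);
  if (r1 !== r2) console.log("patterns disagree at length", n);
  if (r1 && r2) accepted.push(n);
}

console.log(accepted); // [3, 4, 5, 6]
```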

3.10 Common Mistakes and Pitfalls

Mistake 1: Forgetting the Scope of Quantifiers

// Here only 's' is optional, which is wrong if the whole word should be optional
const wrong = /cats?/; // Matches "cat" or "cats"

// To make the entire word optional, group it first
const correct = /(cats)?/; // Matches "" or "cats"

Mistake 2: Over-Matching with Greedy Matching

// Wrong: greedy .* runs from the first <!-- to the last --> on the line
const wrong = /<!--.*-->/;

// Correct: non-greedy .*? stops at the first --> so each comment is matched separately
const correct = /<!--.*?-->/;

Mistake 3: Edge Cases with Quantifiers

// Note empty string matching
const pattern = /\d*/;
console.log("abc".match(pattern)[0]); // "" (empty string)

// If you need at least one digit
const pattern2 = /\d+/;
console.log("abc".match(pattern2)); // null
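The empty-match edge case becomes more visible with the g flag, where every position that holds no digit contributes an empty string. The sketch below uses a made-up input to compare \d* and \d+ in a global match.

```javascript
// \d* matches the empty string at "a", at "b", and at the end of the input.
console.log("a1b".match(/\d*/g)); // ["", "1", "", ""]

// \d+ returns only the real runs of digits.
console.log("a1b".match(/\d+/g)); // ["1"]

// With the g flag, match() returns null when nothing is found, so a fallback helps.
const found = "abc".match(/\d+/g) ?? [];
console.log(found.length); // 0
```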

3.11 Practice Exercises

Exercise 1: Basic Quantifier Usage

Write regular expressions:

  1. Match domain names with an optional “www.” prefix
  2. Match one or more consecutive digits
  3. Match a numeric password of exactly 8 digits

// Answers
const domain = /(www\.)?[a-zA-Z0-9-]+\.[a-z]{2,}/;
const numbers = /\d+/;
const password = /^\d{8}$/;

Exercise 2: Greedy vs Non-Greedy

Given an HTML string, extract every element, including its opening tag, content, and closing tag:

const html = "<p>Paragraph 1</p><div>Content</div><p>Paragraph 2</p>";

// Use non-greedy matching
const tags = html.match(/<[^>]+>.*?<\/[^>]+>/g);
console.log(tags);
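The pattern above accepts any closing tag, so it would also match a mismatched pair such as <p>...</div>. One possible refinement, sketched below, captures the tag name and requires the same name in the closing tag through the backreference \1. The names loose, paired, and elements are illustrative, and the pattern assumes simple tags without attributes or nesting.

```javascript
const html = "<p>Paragraph 1</p><div>Content</div><p>Paragraph 2</p>";

const loose = /<[^>]+>.*?<\/[^>]+>/g;   // any opening tag, any closing tag
const paired = /<(\w+)>(.*?)<\/\1>/g;   // closing tag must repeat the captured name

// Group 1 is the tag name, group 2 is the content between the tags.
const elements = Array.from(html.matchAll(paired), (m) => ({ tag: m[1], content: m[2] }));
console.log(elements.map((e) => e.tag));     // ["p", "div", "p"]
console.log(elements.map((e) => e.content)); // ["Paragraph 1", "Content", "Paragraph 2"]

// A mismatched pair passes the loose pattern but not the paired one.
console.log("<p>text</div>".match(loose));  // ["<p>text</div>"]
console.log("<p>text</div>".match(paired)); // null
```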

Exercise 3: Practical Application

Write regular expressions to validate:

  1. Chinese mobile phone number (11 digits, starting with 1)
  2. ID card number (18 digits)
  3. Email address (simple version)

// Answers
const phone = /^1\d{10}$/;
const idCard = /^\d{18}$/;
const email = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
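The answers can be checked against a few inputs to see what each quantifier accepts and rejects. The sketch below repeats the three patterns so that it stands alone. All sample values are made up for illustration.

```javascript
const phone = /^1\d{10}$/;
const idCard = /^\d{18}$/;
const email = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

console.log(phone.test("13812345678")); // true
console.log(phone.test("23812345678")); // false (does not start with 1)
console.log(phone.test("1381234567"));  // false (only 10 digits)

console.log(idCard.test("123456789012345678")); // true  (18 digits)
console.log(idCard.test("12345678901234567X")); // false (this pattern allows digits only)

console.log(email.test("user@example.com"));  // true
console.log(email.test("user@example"));      // false (no dot after the @)
console.log(email.test("user @example.com")); // false (contains a space)
```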

Summary

Quantifiers are important tools for controlling match counts in regular expressions:

  1. Basic Quantifiers: * (zero or more), + (one or more), ? (zero or one)
  2. Precise Quantifiers: {n}, {n,}, {n,m} provide exact count control
  3. Greedy vs Non-Greedy: Default is greedy matching, add ? for non-greedy
  4. Performance Considerations: Proper use of quantifiers can improve matching efficiency
  5. Common Pitfalls: Pay attention to quantifier scope and edge cases

Mastering quantifier usage is a key step in writing efficient and accurate regular expressions.