Chapter 1: Introduction to Regular Expressions

Haiyue

Learning Objectives

  1. Understand the basic concepts and uses of regular expressions
  2. Master the basic syntax structure of regular expressions
  3. Learn to use online tools to test regular expressions
  4. Understand regular expression support in different programming languages

1.1 What are Regular Expressions

Regular expressions (regex or regexp) are patterns used to match character combinations in strings. They are a powerful text processing tool that can be used to:

  • Validate input formats (such as email addresses, phone numbers)
  • Search and replace text
  • Extract specific information
  • Clean data and convert between formats
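
As a rough illustration of the first two uses, here is a minimal Python sketch. The order-ID format (two uppercase letters followed by four digits) and the helper names is_order_id and americanize are invented for this example; the syntax it uses is explained in the sections that follow.

```python
import re

# Hypothetical format: two uppercase letters followed by four digits.
ORDER_ID = re.compile(r"[A-Z]{2}[0-9]{4}")

def is_order_id(text):
    """Validate: True only if the whole string has the assumed order-ID shape."""
    return ORDER_ID.fullmatch(text) is not None

def americanize(text):
    """Search and replace: swap every literal "colour" for "color"."""
    return re.sub(r"colour", "color", text)

print(is_order_id("AB1234"))                      # True
print(is_order_id("AB12345"))                     # False
print(americanize("colour chart, colour wheel"))  # color chart, color wheel
```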

1.2 Basic Structure of Regular Expressions

Regular expressions consist of the following parts:

  • Literal characters: Characters that match themselves, such as hello, which matches “hello” in the text
  • Metacharacters: Characters with a special meaning, such as . * + ?
  • Character classes: Sets of characters enclosed in square brackets, such as [abc], which matches any one of a, b, or c
  • Quantifiers: Specify how many times the preceding element repeats, such as {3} (exactly 3 times) or {2,5} (2 to 5 times)
  • Anchors: Match a position rather than a character, such as ^ for the start and $ for the end
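
To see each building block at work, the sketch below runs one small pattern per category against the same sample string. The sample text is made up for illustration, and Python's re.findall is used only as a convenient way to list the matches.

```python
import re

text = "hello 2024, room 42, code 12345"

# Literal characters: match exactly what is written.
print(re.findall(r"hello", text))       # ['hello']

# Metacharacter: each "." stands for any single character.
print(re.findall(r"r..m", text))        # ['room']

# Character class: "r" or "c", followed by a literal "o".
print(re.findall(r"[rc]o", text))       # ['ro', 'co']

# Quantifier: runs of 2 to 5 digits.
print(re.findall(r"[0-9]{2,5}", text))  # ['2024', '42', '12345']

# Anchors: "^" ties the match to the start, "$" to the end.
print(re.findall(r"^hello", text))      # ['hello']
print(re.findall(r"^room", text))       # [] because "room" is not at the start
print(re.findall(r"12345$", text))      # ['12345']
```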

1.3 Regular Expression Syntax Examples

Basic Examples

cat         # Matches "cat"
c.t         # Matches "cat", "cot", "cut", etc.
c[ao]t      # Matches "cat" or "cot"
c[a-z]t     # Matches c + any lowercase letter + t
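
The four patterns above can be checked with a short Python sketch. The word list and the helper whole_word_matches are assumptions made for this example; re.fullmatch is used so that a word counts only when the pattern covers it from start to end.

```python
import re

words = ["cat", "cot", "cut", "c@t", "cart", "cAt"]

def whole_word_matches(pattern):
    """Return the words that the pattern matches from start to end."""
    return [w for w in words if re.fullmatch(pattern, w)]

print(whole_word_matches(r"cat"))      # ['cat']
print(whole_word_matches(r"c.t"))      # ['cat', 'cot', 'cut', 'c@t', 'cAt']
print(whole_word_matches(r"c[ao]t"))   # ['cat', 'cot']
print(whole_word_matches(r"c[a-z]t"))  # ['cat', 'cot', 'cut']
```

Note that c.t accepts "c@t" and "cAt" because the dot matches any character, while c[a-z]t rejects both because only lowercase letters are allowed in the middle position.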

Common Metacharacters

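  • . - Matches any single character (except a newline, by default)
  • * - Matches the preceding element (a character, character class, or group) 0 or more times
  • + - Matches the preceding element 1 or more times
  • ? - Matches the preceding element 0 or 1 time
  • ^ - Matches the start of the string (or the start of each line in multiline mode)
  • $ - Matches the end of the string (or the end of each line in multiline mode)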
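
The behavior of these metacharacters can be tried out with the following Python sketch. The sample strings are invented for illustration. In Python, ^ and $ anchor to the whole string unless the re.MULTILINE flag is passed; other languages expose a similar option under a different name.

```python
import re

samples = ["ct", "cat", "caat"]

# "*" allows zero or more "a", "+" needs at least one, "?" allows at most one.
print([s for s in samples if re.fullmatch(r"ca*t", s)])  # ['ct', 'cat', 'caat']
print([s for s in samples if re.fullmatch(r"ca+t", s)])  # ['cat', 'caat']
print([s for s in samples if re.fullmatch(r"ca?t", s)])  # ['ct', 'cat']

# "." does not match a newline by default.
print(re.search(r"a.b", "a\nb"))                         # None

text = "first line\nsecond line"

# Without re.MULTILINE, "^" and "$" refer to the whole string.
print(re.findall(r"^[a-z]+", text))                      # ['first']
print(re.findall(r"line$", text))                        # ['line']

# With re.MULTILINE, they also match at the start and end of each line.
print(re.findall(r"^[a-z]+", text, re.MULTILINE))        # ['first', 'second']
print(re.findall(r"line$", text, re.MULTILINE))          # ['line', 'line']
```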

1.4 Online Testing Tools

Recommended regular expression testing websites:

  1. RegexPal (https://regexpal.com/)
  2. Regex101 (https://regex101.com/)
  3. RegExr (https://regexr.com/)
  4. RegexTester (https://www.regextester.com/)

Common features of these tools (the exact set varies by tool):

  • Real-time testing and match result display
  • Syntax highlighting and error hints
  • Detailed match explanations
  • Support for the regex syntax of several programming languages

1.5 Programming Language Support

JavaScript

const regex = /hello/;
// match() returns an array describing the match, or null if nothing matches
const result = "hello world".match(regex);

Python

import re
pattern = r"hello"
# search() returns a Match object, or None if nothing matches
result = re.search(pattern, "hello world")

Java

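import java.util.regex.*;
// Fragment: the statements below belong inside a method
Pattern pattern = Pattern.compile("hello");
Matcher matcher = pattern.matcher("hello world");
boolean found = matcher.find(); // true, because "hello" occurs in the input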

PHP

$pattern = "/hello/";
// preg_match() returns 1 on a match and stores the matched text in $matches[0]
preg_match($pattern, "hello world", $matches);

1.6 Why We Need Regular Expressions

Limitations of Traditional String Operations

// Traditional way to validate email - complex and incomplete
function validateEmailOld(email) {
    return email.includes("@") &&
           email.includes(".") &&
           email.indexOf("@") > 0 &&
           email.lastIndexOf(".") > email.indexOf("@");
}

Advantages of Using Regular Expressions

// Regular expression approach - concise and stricter, though still only a basic format check
function validateEmailNew(email) {
    const regex = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
    return regex.test(email);
}
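
To make the difference concrete, here is a Python port of the two JavaScript functions above, run on a few sample inputs. The function names are direct translations and the test addresses are invented. The sketch also shows that the regex remains a shape check rather than full email validation.

```python
import re

# Same pattern as the JavaScript version; fullmatch() plays the role of ^...$.
EMAIL_SHAPE = re.compile(r"[^\s@]+@[^\s@]+\.[^\s@]+")

def validate_email_old(email):
    """String-method check, ported from validateEmailOld."""
    return (
        "@" in email
        and "." in email
        and email.find("@") > 0
        and email.rfind(".") > email.find("@")
    )

def validate_email_new(email):
    """Regex check, ported from validateEmailNew."""
    return EMAIL_SHAPE.fullmatch(email) is not None

for candidate in ["user@example.com", "a b@c.d", "a@@b.c", "user@example."]:
    print(candidate, validate_email_old(candidate), validate_email_new(candidate))

# The regex still accepts some questionable input, such as a doubled dot.
print(validate_email_new("user@example..com"))  # True
```

The string-method version accepts all four candidates, while the regex version accepts only the first one.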

1.7 Application Scenarios for Regular Expressions

  1. Data Validation

    • Email address validation
    • Phone number format checking
    • Password strength validation
  2. Text Search and Replace

    • Code refactoring
    • Batch text processing
    • Log analysis
  3. Data Extraction

    • Web data scraping
    • Log information extraction
    • Configuration file parsing
  4. Data Cleaning

    • Removing extra whitespace
    • Unifying data formats
    • Removing special characters
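
As one possible illustration of the extraction and cleaning scenarios, the Python sketch below works on a few log lines. The log format and its contents are hypothetical and exist only for this example.

```python
import re

# Hypothetical log lines: date, time, level, then a free-form message.
log_lines = [
    "2024-05-01 12:00:03 ERROR   disk   full",
    "2024-05-01 12:00:07 INFO    backup started",
    "2024-05-02 08:15:42 ERROR   network  timeout",
]

# Data extraction: take the leading date from every line that contains ERROR.
DATE = r"[0-9]{4}-[0-9]{2}-[0-9]{2}"
error_dates = [
    re.match(DATE, line).group()
    for line in log_lines
    if re.search(r"ERROR", line)
]
print(error_dates)  # ['2024-05-01', '2024-05-02']

# Data cleaning: collapse every run of spaces into a single space.
cleaned = [re.sub(r" +", " ", line) for line in log_lines]
print(cleaned[0])   # 2024-05-01 12:00:03 ERROR disk full
```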

1.8 Learning Suggestions

  1. Step by Step: Start with simple literal matching, then gradually move on to more complex syntax
  2. Practice More: Use online tools frequently to practice a variety of patterns
  3. Practical Application: Tie your learning to the requirements of real projects
  4. Reference Documentation: Get familiar with the regular expression documentation for your target programming language

Summary

Regular expressions are a powerful tool for text processing. Although the syntax may look complex at first, systematic study and practice make it manageable, and fluency can greatly improve text processing efficiency. Understanding the basic concepts and structure is the foundation for further in-depth learning.

Practice Exercises

  1. Use online tools to test the following regular expressions:

    • cat against the text “The cat is sleeping”
    • c.t against the text “cat, cot, cut, c@t”
    • ^hello against the text “hello world”
  2. Try writing a simple regular expression to match 3 digits.