Chapter 7: Modifiers and Patterns

Haiyue

Learning Objectives

  1. Master case-insensitive matching (i)
  2. Learn to use multi-line mode (m)
  3. Understand the role of single-line mode (s)
  4. Master the use of global matching (g)
  5. Understand the function of other common modifiers

7.1 Overview of Modifiers

Modifiers (also called flags) are options that change how a regular expression matches. In a regex literal they appear after the closing slash; with the RegExp constructor they are passed as the second argument.

Syntax of Modifiers

// Literal syntax
const regex = /pattern/flags;

// Constructor syntax
const regex = new RegExp('pattern', 'flags');

// Common modifiers
const examples = {
    'i': /hello/i,           // Case-insensitive
    'g': /hello/g,           // Global match
    'm': /^hello/m,          // Multi-line mode
    's': /hello.world/s,     // Single-line mode (dotAll)
    'u': /\u{1f600}/u,       // Unicode mode
    'y': /hello/y            // Sticky matching
};
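A regular expression object reports its own flags through read-only properties, which is a handy way to confirm what the syntax above produced. The following is a minimal sketch using only standard RegExp properties; the variable names are illustrative.

```javascript
// Inspect which flags a regular expression carries
const re = /hello/gi;

console.log(re.source);     // "hello"
console.log(re.flags);      // "gi"
console.log(re.global);     // true
console.log(re.ignoreCase); // true
console.log(re.multiline);  // false

// An unknown flag is rejected when the expression is created
let flagError = null;
try {
    new RegExp('hello', 'x');
} catch (e) {
    flagError = e;
}
console.log(flagError instanceof SyntaxError); // true
```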

7.2 Case-Insensitive (i)

The i modifier makes the regular expression ignore case differences.

Basic Usage

// Case-sensitive (default)
const caseSensitive = /hello/;
console.log(caseSensitive.test("Hello")); // false
console.log(caseSensitive.test("hello")); // true

// Case-insensitive
const caseInsensitive = /hello/i;
console.log(caseInsensitive.test("Hello")); // true
console.log(caseInsensitive.test("HELLO")); // true
console.log(caseInsensitive.test("HeLLo")); // true

Practical Applications

// Search function: user input is case-insensitive
function searchText(content, query) {
    const searchRegex = new RegExp(query.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'), 'gi');
    return content.match(searchRegex) || [];
}

const content = "JavaScript is a Programming Language. Java is different from JavaScript.";
console.log(searchText(content, "javascript")); 
// ["JavaScript", "JavaScript"]

// Email validation: domain part is case-insensitive
const emailPattern = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/i;
console.log(emailPattern.test("User@Example.COM")); // true

// URL matching: protocol and domain are case-insensitive
const urlPattern = /^https?:\/\/[\w.-]+/i;
console.log(urlPattern.test("HTTPS://EXAMPLE.COM")); // true

Case in Character Classes

// Character classes automatically include corresponding cases
const letterPattern = /[a-z]/i;
console.log(letterPattern.test("A")); // true
console.log(letterPattern.test("a")); // true

// Equivalent to
const equivalentPattern = /[a-zA-Z]/;
console.log(equivalentPattern.test("A")); // true
console.log(equivalentPattern.test("a")); // true

// Case of Unicode characters
const unicodePattern = /café/i;
console.log(unicodePattern.test("CAFÉ")); // true
console.log(unicodePattern.test("Café")); // true
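For non-ASCII characters, the result of case-insensitive matching can depend on whether the u modifier is also set. The sketch below uses the Kelvin sign (U+212A) as an illustration; the variable names are made up for this example.

```javascript
// U+212A KELVIN SIGN looks like "K" but is a separate character
const kelvin = '\u212A';

// Without u, the Kelvin sign is not treated as a case variant of "k"
const withoutUnicode = /k/i.test(kelvin);
console.log(withoutUnicode); // false

// With u, Unicode case folding maps the Kelvin sign to "k"
const withUnicode = /k/iu.test(kelvin);
console.log(withUnicode); // true
```

If a search has to treat such look-alike characters as equal, combining i with u is one option to consider.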

7.3 Global Match (g)

The g modifier makes the regular expression match all possible occurrences, rather than just the first one.

Basic Comparison

const text = "cat bat rat";

// Without global match - only matches the first
const nonGlobal = /\w+at/;
console.log(text.match(nonGlobal)); // ["cat"]

// With global match - matches all
const global = /\w+at/g;
console.log(text.match(global)); // ["cat", "bat", "rat"]

Different Behavior of match() Method

const text = "The year 2023 and 2024";

// Non-global match: returns detailed information
const nonGlobalMatch = text.match(/\d{4}/);
console.log(nonGlobalMatch);
// ["2023", index: 9, input: "The year 2023 and 2024", groups: undefined]

// Global match: returns an array of all matches
const globalMatch = text.match(/\d{4}/g);
console.log(globalMatch); // ["2023", "2024"]

exec() Method with Global Match

const text = "apple 123, banana 456, cherry 789";
const pattern = /(\w+) (\d+)/g;

// Use exec() to iterate and get detailed information
let match;
while ((match = pattern.exec(text)) !== null) {
    console.log(`${match[1]}: ${match[2]}`);
    // apple: 123
    // banana: 456
    // cherry: 789
}

// Use matchAll() to get all detailed matches (ES2020)
const allMatches = [...text.matchAll(/(\w+) (\d+)/g)];
allMatches.forEach(match => {
    console.log(`${match[1]}: ${match[2]}`);
});
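The exec() loop above works because a global regular expression remembers its position in lastIndex. The same state can surprise you when one global regex object is reused with test(). A small sketch, with illustrative names:

```javascript
// A global regex keeps its position between calls
const hasDigit = /\d/g;

const first = hasDigit.test("a1");  // true, lastIndex is now 2
const second = hasDigit.test("a1"); // false, the search started at index 2
const third = hasDigit.test("a1");  // true, lastIndex was reset to 0 after the failure

console.log(first, second, third); // true false true
```

For a plain yes/no check, leaving out the g modifier avoids this state entirely.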

Global Match in Replacement

const text = "Hello World Hello Universe";

// Non-global replacement - only replaces the first
const singleReplace = text.replace(/Hello/, "Hi");
console.log(singleReplace); // "Hi World Hello Universe"

// Global replacement - replaces all
const globalReplace = text.replace(/Hello/g, "Hi");
console.log(globalReplace); // "Hi World Hi Universe"

// Global replacement with a callback function
const numbered = text.replace(/Hello/g, (match, index) => `${match}(${index})`);
console.log(numbered); // "Hello(0) World Hello(12) Universe"
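Global replacement is often combined with capture groups, which the replacement string can reference as $1, $2, and so on. The sketch below reorders dates; the sample text is invented for illustration.

```javascript
// Reorder every YYYY-MM-DD date into DD/MM/YYYY
const dates = "2023-12-01 and 2024-01-15";
const reordered = dates.replace(/(\d{4})-(\d{2})-(\d{2})/g, "$3/$2/$1");

console.log(reordered); // "01/12/2023 and 15/01/2024"
```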

7.4 Multi-line Mode (m)

The m modifier changes ^ and $ so that they also match at the start and end of each line, not only at the start and end of the whole string.

Default Behavior vs. Multi-line Mode

const multilineText = `First line
Second line
Third line`;

// Default mode: ^ and $ match the beginning and end of the entire string
console.log(/^Second/.test(multilineText)); // false
console.log(/line$/.test(multilineText));   // true (matches the last line)

// Multi-line mode: ^ and $ match the beginning and end of each line
console.log(/^Second/m.test(multilineText)); // true
console.log(/^First/m.test(multilineText));  // true
console.log(/line$/m.test(multilineText));   // true (already matches at the end of the first line)
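Because ^ and $ match at every line boundary in multi-line mode, they can also be used to edit or collect text line by line. A short sketch with an invented sample string:

```javascript
const poem = "roses\nviolets\nsugar";

// With g and m, ^ matches at the start of every line
const quoted = poem.replace(/^/gm, "> ");
console.log(quoted); // "> roses\n> violets\n> sugar"

// Without m, ^ matches only at the start of the string
const quotedFirstOnly = poem.replace(/^/g, "> ");
console.log(quotedFirstOnly); // "> roses\nviolets\nsugar"

// $ with m picks up the last word of each line
const lastWords = poem.match(/\w+$/gm);
console.log(lastWords); // ["roses", "violets", "sugar"]
```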

Practical Applications

// Processing configuration files: matching configuration items per line
const configText = `
# Configuration file
host=localhost
port=3000
# Database configuration
db_host=127.0.0.1
db_port=5432
`;

// Extract all configuration items (key=value format at the beginning of each line)
const configPattern = /^(\w+)=(.+)$/gm;
const configs = {};
let match;

while ((match = configPattern.exec(configText)) !== null) {
    configs[match[1]] = match[2];
}

console.log(configs);
// { host: "localhost", port: "3000", db_host: "127.0.0.1", db_port: "5432" }
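The same extraction can be written without a manual exec() loop. The sketch below is one possible alternative using matchAll() with named capture groups and Object.fromEntries(); the sample text and the group names key and value are chosen for illustration.

```javascript
const sampleConfig = "# comment\nhost=localhost\nport=3000";

// Named groups make the captured parts self-describing
const linePattern = /^(?<key>\w+)=(?<value>.+)$/gm;

const entries = [...sampleConfig.matchAll(linePattern)]
    .map(m => [m.groups.key, m.groups.value]);
const parsed = Object.fromEntries(entries);

console.log(parsed); // { host: "localhost", port: "3000" }
```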

// Find lines that are missing a semicolon at the end
const codeText = `console.log("Hello");
let x = 42;
const y = "world"`;

const missingSemicolon = codeText.split('\n').filter(line => 
    line.trim() && !/;$/m.test(line.trim())
);
console.log(missingSemicolon); // ['const y = "world"']

Comment Processing

// Remove comments at the beginning of each line
const textWithComments = `// This is a comment
console.log("code");
// Another comment
let x = 42;
// Final comment`;

// Consume the line break as well, so removed comments leave no blank lines behind
const withoutComments = textWithComments.replace(/^[ \t]*\/\/.*(?:\r?\n|$)/gm, '').trim();
console.log(withoutComments);
// console.log("code");
// let x = 42;

7.5 Single-Line Mode/dotAll (s)

The s modifier makes the dot character . match any character, including line breaks.

Default Behavior vs. Single-line Mode

const text = `Hello
World`;

// Default: dot does not match line breaks
console.log(/Hello.World/.test(text)); // false

// Single-line mode: dot matches all characters including line breaks
console.log(/Hello.World/s.test(text)); // true

// Equivalent approaches without the s modifier
console.log(/Hello[\s\S]World/.test(text)); // true
console.log(/Hello[^]World/.test(text));    // true ([^] is specific to JavaScript)
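Despite the similar names, single-line mode (s) and multi-line mode (m) are independent: s affects only the dot, and m affects only ^ and $. A small sketch with illustrative names:

```javascript
const twoLines = "a\nb";

const dotDefault = /a.b/.test(twoLines); // false
const dotWithM = /a.b/m.test(twoLines);  // false: m does not change the dot
const dotWithS = /a.b/s.test(twoLines);  // true

const caretWithS = /^b/s.test(twoLines); // false: s does not change ^
const caretWithM = /^b/m.test(twoLines); // true

// Both modifiers can be combined when both behaviors are needed
const both = /^a.b$/ms.test(twoLines);   // true
```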

Practical Applications

// Matching HTML tags and their content (potentially multi-line)
const html = `<div class="content">
  <p>This is a paragraph</p>
  <p>Another paragraph</p>
</div>`;

// Use single-line mode to match multi-line HTML tags
const tagPattern = /<div.*?<\/div>/s;
const match = html.match(tagPattern);
console.log(match[0]); // Matches the entire div and its content

// Extract multi-line comments
const code = `function hello() {
  /*
   * This is a multi-line
   * comment block
   */
  console.log("Hello");
}`;

const multilineComment = /\/\*.*?\*\//s;
const comment = code.match(multilineComment);
console.log(comment[0]);
// /*
//  * This is a multi-line
//  * comment block
//  */

Combination with Other Modifiers

// Combine multiple modifiers
const logText = `[INFO] Starting application
[ERROR] Database connection failed
Error details: connection timeout
[WARN] Retrying connection`;

// Match ERROR level logs and their details (potentially multi-line)
const errorPattern = /\[ERROR\].*?(?=\[|\s*$)/gs;
const errors = logText.match(errorPattern);
console.log(errors);
// ["[ERROR] Database connection failed\nError details: connection timeout\n"]

7.6 Unicode Mode (u)

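The u modifier enables full Unicode support: the pattern is processed by code point rather than by UTF-16 code unit, and the \u{...} and \p{...} escapes become available.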

Unicode Character Matching

// Without the u modifier, \u{1F600} is not read as a code point escape
const withoutU = /\u{1F600}/; // No error, but it matches the literal text "u{1F600}"

// With u modifier
const withU = /\u{1F600}/u; // 😀
console.log(withU.test("😀")); // true

// Match various Unicode characters
const emojiPattern = /\p{Emoji}/u;
console.log(emojiPattern.test("😀")); // true
console.log(emojiPattern.test("🎉")); // true

// Match Chinese characters
const chinesePattern = /\p{Script=Han}/u;
console.log(chinesePattern.test("中文")); // true
console.log(chinesePattern.test("中")); // true
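The u modifier also changes what the dot counts as one character. The sketch below writes the emoji as an escape so the code units are explicit; the variable names are illustrative.

```javascript
const emoji = "\u{1F600}"; // 😀, stored as two UTF-16 code units

// Without u, the dot matches a single code unit
const oneDotNoU = /^.$/.test(emoji);   // false
const twoDotsNoU = /^..$/.test(emoji); // true

// With u, the dot matches a whole code point
const oneDotWithU = /^.$/u.test(emoji); // true

console.log(oneDotNoU, twoDotsNoU, oneDotWithU); // false true true
```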

Character Property Matching

// Match different types of characters
const patterns = {
    letter: /\p{Letter}/u,
    number: /\p{Number}/u,
    punctuation: /\p{Punctuation}/u,
    currency: /\p{Currency_Symbol}/u,
    emoji: /\p{Emoji}/u // Note: the Emoji property also covers the digits 0-9, "#" and "*"
};

const text = "Hello123! $100 😀";
Object.entries(patterns).forEach(([name, pattern]) => {
    console.log(`${name}: ${text.match(new RegExp(pattern, 'gu'))}`);
});

Correct Handling of Length Calculation

// Correct handling of Unicode surrogate pairs
const text = "👨‍👩‍👧‍👦"; // Family emoji (compound character)

// length counts UTF-16 code units, not characters
console.log(text.length); // 11 (four two-unit emoji joined by three zero-width joiners)

// Spreading a string iterates by code point, so surrogate pairs stay together,
// but a joined sequence is still split into its parts
function countUnicodeCharacters(str) {
    return [...str].length;
}

console.log(countUnicodeCharacters(text)); // 7 code points, although it displays as one character
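To count what a reader sees as one character, code points are not enough. One option, assuming a runtime that provides Intl.Segmenter, is to count grapheme clusters. The sketch below spells the family emoji out with escapes so its structure is visible; the variable names are illustrative.

```javascript
// Man + ZWJ + woman + ZWJ + girl + ZWJ + boy
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}";

const codeUnits = family.length;       // 11 UTF-16 code units
const codePoints = [...family].length; // 7 code points

// Grapheme segmentation groups the joined sequence into one unit
const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
const graphemes = [...segmenter.segment(family)].length;

console.log(codeUnits, codePoints, graphemes); // 11 7 1
```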

7.7 Sticky Matching (y)

The y modifier performs “sticky” matching: a match is attempted only at the exact position given by lastIndex, without scanning ahead.

Basic Usage

const text = "cat bat rat";
const pattern = /\w+/y;

// Sticky matching must start from the current position
pattern.lastIndex = 0;
console.log(pattern.exec(text)); // ["cat"]

pattern.lastIndex = 4; // Position of 'b'
console.log(pattern.exec(text)); // ["bat"]

pattern.lastIndex = 3; // Position of the space (no word character here)
console.log(pattern.exec(text)); // null

Difference from Global Matching

const text = "a1b2c3";

// Global matching: skips non-matching characters
const globalPattern = /\d/g;
console.log(globalPattern.exec(text)); // ["1"]
console.log(globalPattern.exec(text)); // ["2"]

// Sticky matching: must match from the exact position
const stickyPattern = /\d/y;
stickyPattern.lastIndex = 1;
console.log(stickyPattern.exec(text)); // ["1"]
stickyPattern.lastIndex = 3;
console.log(stickyPattern.exec(text)); // ["2"]
stickyPattern.lastIndex = 2; // 'b' is not a digit
console.log(stickyPattern.exec(text)); // null
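A sticky regular expression also updates lastIndex by itself: a successful match moves it past the match, and a failed match resets it to 0. The sketch below traces this; the names are illustrative.

```javascript
const digit = /\d/y;
const sample = "a1b2c3";

digit.lastIndex = 1;
const hit = digit.exec(sample);    // ["1"]
const afterHit = digit.lastIndex;  // 2, just past the match

const miss = digit.exec(sample);   // null, index 2 holds "b"
const afterMiss = digit.lastIndex; // 0, reset after the failure

console.log(hit[0], afterHit, miss, afterMiss); // "1" 2 null 0
```

This is why code that drives a sticky regex usually sets lastIndex explicitly before each attempt.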

Practical Application: Lexer

// Simple lexer
class Tokenizer {
    constructor(text) {
        this.text = text;
        this.pos = 0;
        this.tokens = [];
    }
    
    tokenize() {
        const patterns = [
            { type: 'NUMBER', regex: /\d+/y },
            { type: 'IDENTIFIER', regex: /[a-zA-Z_]\w*/y },
            { type: 'OPERATOR', regex: /[+\-*/=]/y },
            { type: 'WHITESPACE', regex: /\s+/y },
            { type: 'SEMICOLON', regex: /;/y }
        ];
        
        while (this.pos < this.text.length) {
            let matched = false;
            
            for (const { type, regex } of patterns) {
                regex.lastIndex = this.pos;
                const match = regex.exec(this.text);
                
                if (match) {
                    if (type !== 'WHITESPACE') { // Ignore whitespace
                        this.tokens.push({ type, value: match[0], pos: this.pos });
                    }
                    this.pos = regex.lastIndex;
                    matched = true;
                    break;
                }
            }
            
            if (!matched) {
                throw new Error(`Unexpected character at position ${this.pos}`);
            }
        }
        
        return this.tokens;
    }
}

const tokenizer = new Tokenizer("let x = 42;");
console.log(tokenizer.tokenize());
// [
//   { type: 'IDENTIFIER', value: 'let', pos: 0 },
//   { type: 'IDENTIFIER', value: 'x', pos: 4 },
//   { type: 'OPERATOR', value: '=', pos: 6 },
//   { type: 'NUMBER', value: '42', pos: 8 },
//   { type: 'SEMICOLON', value: ';', pos: 10 }
// ]

7.8 Combined Use of Modifiers

Multiple modifiers can be combined to create more powerful matching patterns.

Common Combinations

// Global + case-insensitive
const globalIgnoreCase = /hello/gi;
const text = "Hello world, HELLO universe, hello there";
console.log(text.match(globalIgnoreCase)); // ["Hello", "HELLO", "hello"]

// Multi-line + global
const multilineGlobal = /^\w+/gm;
const lines = `first
second
third`;
console.log(lines.match(multilineGlobal)); // ["first", "second", "third"]

// Single-line + global + case-insensitive
const html = `<DIV>
  Content here
</DIV>
<div>
  More content
</div>`;
const tagPattern = /<div.*?<\/div>/gis;
console.log(html.match(tagPattern)); // Matches all div tags (case-insensitive, multi-line)
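When combining modifiers, the order in which they are written does not matter, but each one may appear only once. A brief sketch with illustrative names:

```javascript
// The same flags written in a different order give the same expression
const a = /hello/gi;
const b = /hello/ig;
console.log(a.flags, b.flags); // "gi" "gi"

// Repeating a flag is a syntax error
let duplicateError = null;
try {
    new RegExp("hello", "gg");
} catch (e) {
    duplicateError = e;
}
console.log(duplicateError instanceof SyntaxError); // true
```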

Practical Example

// Log analysis: extract all error messages (case-insensitive, multi-line, global)
const logContent = `2023-12-01 10:00:00 INFO Application started
2023-12-01 10:01:00 ERROR Database connection failed
Error details: timeout after 30 seconds
2023-12-01 10:02:00 error Network timeout
2023-12-01 10:03:00 WARN Low memory`;

// Anchor on the level field so the "Error details" continuation line is not matched
const errorPattern = /^[\d-]+ [\d:]+ error\b.*$/gim;
const errors = logContent.match(errorPattern);
console.log(errors);
// [
//   "2023-12-01 10:01:00 ERROR Database connection failed",
//   "2023-12-01 10:02:00 error Network timeout"
// ]

// Code comment extraction: extract all types of comments
const sourceCode = `
// Single-line comment
function test() {
  /* Multi-line comment
     spanning multiple lines */
  return true;
}
// Another comment
`;

const commentPattern = /\/\/.*$|\/\*[\s\S]*?\*\//gm;
const comments = sourceCode.match(commentPattern);
console.log(comments);

7.9 Dynamic Modifiers

In some cases, we need to set modifiers dynamically based on conditions.

Building Dynamic Regular Expressions

function createSearchPattern(query, options = {}) {
    const {
        caseSensitive = false,
        wholeWord = false,
        global = true
    } = options;
    
    // Escape special characters
    const escapedQuery = query.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    
    // Build pattern
    let pattern = wholeWord ? `\\b${escapedQuery}\\b` : escapedQuery;
    
    // Build modifiers
    let flags = '';
    if (!caseSensitive) flags += 'i';
    if (global) flags += 'g';
    
    return new RegExp(pattern, flags);
}

// Usage example
const text = "JavaScript is great. I love JavaScript programming.";

const regex1 = createSearchPattern("JavaScript", { caseSensitive: true });
console.log(text.match(regex1)); // ["JavaScript", "JavaScript"]

const regex2 = createSearchPattern("java", { caseSensitive: false });
console.log(text.match(regex2)); // ["Java", "Java"]

const regex3 = createSearchPattern("Script", { wholeWord: true, caseSensitive: false });
console.log(text.match(regex3)); // null (because Script is not a whole word)
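Flags cannot be changed on an existing regular expression, so adding one dynamically means building a new object from the old source and flags. The helper below, withFlag, is a hypothetical name used only for this sketch.

```javascript
// Hypothetical helper: return a copy of `regex` that also carries `flag`
function withFlag(regex, flag) {
    const flags = regex.flags.includes(flag) ? regex.flags : regex.flags + flag;
    return new RegExp(regex.source, flags);
}

const base = /java/i;
const all = withFlag(base, "g");

console.log(all.flags);               // "gi"
console.log(base.global);             // false, the original is unchanged
console.log("Java java".match(all));  // ["Java", "java"]
```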

Conditional Modifier Application

class FlexibleRegex {
    constructor(pattern) {
        this.pattern = pattern;
        this.flags = new Set();
    }
    
    ignoreCase(enable = true) {
        if (enable) this.flags.add('i');
        else this.flags.delete('i');
        return this;
    }
    
    global(enable = true) {
        if (enable) this.flags.add('g');
        else this.flags.delete('g');
        return this;
    }
    
    multiline(enable = true) {
        if (enable) this.flags.add('m');
        else this.flags.delete('m');
        return this;
    }
    
    dotAll(enable = true) {
        if (enable) this.flags.add('s');
        else this.flags.delete('s');
        return this;
    }
    
    build() {
        return new RegExp(this.pattern, Array.from(this.flags).join(''));
    }
}

// Usage example
const builder = new FlexibleRegex('hello');
const regex = builder.ignoreCase().global().build();
console.log("Hello HELLO hello".match(regex)); // ["Hello", "HELLO", "hello"]

7.10 Exercises

Exercise 1: Search Function Implementation

Implement a text search function that supports the following options:

  • Case-sensitive/case-insensitive
  • Whole word/partial match
  • Return match position and context

// Answer
function advancedSearch(text, query, options = {}) {
    const {
        caseSensitive = false,
        wholeWord = false,
        contextLength = 20
    } = options;
    
    let pattern = query.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    if (wholeWord) pattern = `\\b${pattern}\\b`;
    
    let flags = 'g';
    if (!caseSensitive) flags += 'i';
    
    const regex = new RegExp(pattern, flags);
    const results = [];
    let match;
    
    while ((match = regex.exec(text)) !== null) {
        const start = Math.max(0, match.index - contextLength);
        const end = Math.min(text.length, match.index + match[0].length + contextLength);
        
        results.push({
            match: match[0],
            index: match.index,
            context: text.slice(start, end),
            before: text.slice(start, match.index),
            after: text.slice(match.index + match[0].length, end)
        });
    }
    
    return results;
}

const text = "JavaScript is a powerful programming language. Many developers love JavaScript.";
console.log(advancedSearch(text, "javascript", { caseSensitive: false, wholeWord: true }));

Exercise 2: Configuration File Parser

Write a configuration file parser that supports comments and multi-line values.

// Answer
function parseConfig(configText) {
    const config = {};
    const lines = configText.split('\n');
    let currentKey = null;
    let multilineValue = [];
    let inMultiline = false;
    
    const keyValuePattern = /^(\w+)\s*=\s*(.*)$/;
    const commentPattern = /^\s*#.*$/;
    const emptyLinePattern = /^\s*$/;
    const multilineStartPattern = /"""$/;
    const multilineEndPattern = /^"""/;
    
    for (const line of lines) {
        // Skip comments and empty lines, but keep them inside multi-line values
        if (!inMultiline && (commentPattern.test(line) || emptyLinePattern.test(line))) {
            continue;
        }
        
        if (inMultiline) {
            if (multilineEndPattern.test(line)) {
                config[currentKey] = multilineValue.join('\n');
                inMultiline = false;
                currentKey = null;
                multilineValue = [];
            } else {
                multilineValue.push(line);
            }
        } else {
            const match = line.match(keyValuePattern);
            if (match) {
                const [, key, value] = match;
                if (multilineStartPattern.test(value)) {
                    currentKey = key;
                    inMultiline = true;
                } else {
                    config[key] = value.replace(/^["'](.*)["']$/, '$1'); // Remove quotes
                }
            }
        }
    }
    
    return config;
}

const configText = `
# Application configuration
app_name=MyApp
app_version="1.0.0"

# Database configuration  
db_host=localhost
description="""
This is a multi-line
description text
can contain line breaks
"""

# Other settings
debug=true
`;

console.log(parseConfig(configText));

Summary

Modifiers are an important part of regular expressions; they change the matching behavior of regular expressions:

  1. i modifier: Case-insensitive matching, suitable for searching and validation
  2. g modifier: Global matching, gets all occurrences instead of just the first
  3. m modifier: Multi-line mode, changes the behavior of ^ and $
  4. s modifier: Single-line mode, makes the dot match line breaks
  5. u modifier: Unicode mode, correctly handles Unicode characters
  6. y modifier: Sticky matching, used for precise position matching
  7. Modifier combinations: Multiple modifiers can be used in combination
  8. Dynamic modifiers: Dynamically build regular expressions based on conditions

Understanding and mastering the use of modifiers can make our regular expressions more flexible and powerful.