Chapter 7: Modifiers and Patterns
Learning Objectives
- Master case-insensitive matching (i)
- Learn to use multi-line mode (m)
- Understand the role of single-line mode (s)
- Master the use of global matching (g)
- Understand the function of other common modifiers
7.1 Overview of Modifiers
Modifiers (also called flags) are options that change how a regular expression matches. In a regex literal they appear after the closing slash; with the RegExp constructor they are passed as the second argument.
Syntax of Modifiers
// Literal syntax
const regex = /pattern/flags;
// Constructor syntax
const regex = new RegExp('pattern', 'flags');
// Common modifiers
const examples = {
'i': /hello/i, // Case-insensitive
'g': /hello/g, // Global match
'm': /^hello/m, // Multi-line mode
's': /hello.world/s, // Single-line mode (dotAll)
'u': /\u{1f600}/u, // Unicode mode
'y': /hello/y // Sticky matching
};
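The flags set on a regular expression can also be read back at runtime. The short sketch below uses only standard RegExp properties (flags, source, global, ignoreCase, multiline); the variable names are illustrative.

```javascript
// Flags are exposed as read-only properties on the RegExp object
const original = /hello/gi;
console.log(original.flags);      // "gi"
console.log(original.global);     // true
console.log(original.ignoreCase); // true
console.log(original.multiline);  // false

// Flags cannot be changed in place; build a new regex from the same source instead
const withMultiline = new RegExp(original.source, original.flags + 'm');
console.log(withMultiline.flags); // "gim"
```

Because the flags are fixed at construction time, adding or removing one always means creating a new RegExp object.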
7.2 Case-Insensitive (i)
The i modifier makes the regular expression ignore case differences.
Basic Usage
// Case-sensitive (default)
const caseSensitive = /hello/;
console.log(caseSensitive.test("Hello")); // false
console.log(caseSensitive.test("hello")); // true
// Case-insensitive
const caseInsensitive = /hello/i;
console.log(caseInsensitive.test("Hello")); // true
console.log(caseInsensitive.test("HELLO")); // true
console.log(caseInsensitive.test("HeLLo")); // true
Practical Applications
// Search function: user input is case-insensitive
function searchText(content, query) {
const searchRegex = new RegExp(query.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'), 'gi');
return content.match(searchRegex) || [];
}
const content = "JavaScript is a Programming Language. Java is different from JavaScript.";
console.log(searchText(content, "javascript"));
// ["JavaScript", "JavaScript"]
// Email validation: domain part is case-insensitive
const emailPattern = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/i;
console.log(emailPattern.test("User@Example.COM")); // true
// URL matching: protocol and domain are case-insensitive
const urlPattern = /^https?:\/\/[\w.-]+/i;
console.log(urlPattern.test("HTTPS://EXAMPLE.COM")); // true
Case in Character Classes
// Character classes automatically include corresponding cases
const letterPattern = /[a-z]/i;
console.log(letterPattern.test("A")); // true
console.log(letterPattern.test("a")); // true
// Equivalent to
const equivalentPattern = /[a-zA-Z]/;
console.log(equivalentPattern.test("A")); // true
console.log(equivalentPattern.test("a")); // true
// Case of Unicode characters
const unicodePattern = /café/i;
console.log(unicodePattern.test("CAFÉ")); // true
console.log(unicodePattern.test("Café")); // true
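Case-insensitive matching behaves slightly differently once the u modifier is added, because the engine then compares characters using Unicode case folding. A small illustration, using the Kelvin sign (U+212A) as the example character:

```javascript
const kelvin = '\u212A'; // KELVIN SIGN: looks like "K" but is a separate code point

// Without u, the Kelvin sign is not treated as a case variant of "k"
console.log(/k/i.test(kelvin));  // false

// With u, Unicode case folding maps the Kelvin sign to "k"
console.log(/k/iu.test(kelvin)); // true
```

For plain ASCII and common accented letters such as é, the i modifier gives the same result with or without u.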
7.3 Global Match (g)
The g modifier makes the regular expression match all possible occurrences, rather than just the first one.
Basic Comparison
const text = "cat bat rat";
// Without global match - only matches the first
const nonGlobal = /\w+at/;
console.log(text.match(nonGlobal)); // ["cat"]
// With global match - matches all
const global = /\w+at/g;
console.log(text.match(global)); // ["cat", "bat", "rat"]
Different Behavior of match() Method
const text = "The year 2023 and 2024";
// Non-global match: returns detailed information
const nonGlobalMatch = text.match(/\d{4}/);
console.log(nonGlobalMatch);
// ["2023", index: 9, input: "The year 2023 and 2024", groups: undefined]
// Global match: returns an array of all matches
const globalMatch = text.match(/\d{4}/g);
console.log(globalMatch); // ["2023", "2024"]
exec() Method with Global Match
const text = "apple 123, banana 456, cherry 789";
const pattern = /(\w+) (\d+)/g;
// Use exec() to iterate and get detailed information
let match;
while ((match = pattern.exec(text)) !== null) {
console.log(`${match[1]}: ${match[2]}`);
// apple: 123
// banana: 456
// cherry: 789
}
// Use matchAll() to get all detailed matches (ES2020)
const allMatches = [...text.matchAll(/(\w+) (\d+)/g)];
allMatches.forEach(match => {
console.log(`${match[1]}: ${match[2]}`);
});
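The exec() loop above works because a global regex remembers where the previous match ended in its lastIndex property. The same state also affects test(), which can be surprising when one regex object is reused. A minimal demonstration:

```javascript
const digits = /\d+/g;

console.log(digits.test("abc 123")); // true
console.log(digits.lastIndex);       // 7

// The next search resumes at index 7, so the same input no longer matches
console.log(digits.test("abc 123")); // false
console.log(digits.lastIndex);       // 0 (reset after the failed match)

console.log(digits.test("abc 123")); // true again
```

When a regex is only used for yes/no checks with test(), leaving off the g modifier avoids this behavior entirely.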
Global Match in Replacement
const text = "Hello World Hello Universe";
// Non-global replacement - only replaces the first
const singleReplace = text.replace(/Hello/, "Hi");
console.log(singleReplace); // "Hi World Hello Universe"
// Global replacement - replaces all
const globalReplace = text.replace(/Hello/g, "Hi");
console.log(globalReplace); // "Hi World Hi Universe"
// Global replacement with a callback function
const numbered = text.replace(/Hello/g, (match, index) => `${match}(${index})`);
console.log(numbered); // "Hello(0) World Hello(12) Universe"
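String.prototype.replaceAll() (ES2021) is a related method worth knowing in this context: it accepts a plain string, but when given a regular expression it requires the g modifier. A short sketch, assuming a runtime that supports replaceAll():

```javascript
const greeting = "Hello World Hello Universe";

// A string pattern replaces every occurrence, no regex needed
console.log(greeting.replaceAll("Hello", "Hi")); // "Hi World Hi Universe"

// A regex pattern must carry the g modifier
console.log(greeting.replaceAll(/hello/gi, "Hi")); // "Hi World Hi Universe"

// Without g, replaceAll() throws instead of silently replacing once
let errorName = null;
try {
  greeting.replaceAll(/Hello/, "Hi");
} catch (error) {
  errorName = error.name;
}
console.log(errorName); // "TypeError"
```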
7.4 Multi-line Mode (m)
The m modifier changes the behavior of ^ and $ to match the start and end of each line.
Default Behavior vs. Multi-line Mode
const multilineText = `First line
Second line
Third line`;
// Default mode: ^ and $ match the beginning and end of the entire string
console.log(/^Second/.test(multilineText)); // false
console.log(/line$/.test(multilineText)); // true (matches the last line)
// Multi-line mode: ^ and $ match the beginning and end of each line
console.log(/^Second/m.test(multilineText)); // true
console.log(/^First/m.test(multilineText)); // true
console.log(/line$/mg.test(multilineText)); // true
Practical Applications
// Processing configuration files: matching configuration items per line
const configText = `
# Configuration file
host=localhost
port=3000
# Database configuration
db_host=127.0.0.1
db_port=5432
`;
// Extract all configuration items (key=value format at the beginning of each line)
const configPattern = /^(\w+)=(.+)$/gm;
const configs = {};
let match;
while ((match = configPattern.exec(configText)) !== null) {
configs[match[1]] = match[2];
}
console.log(configs);
// { host: "localhost", port: "3000", db_host: "127.0.0.1", db_port: "5432" }
// Find lines that are missing a trailing semicolon
const codeText = `console.log("Hello");
let x = 42;
const y = "world"`;
const missingSemicolon = codeText.split('\n').filter(line =>
line.trim() && !/;$/m.test(line.trim())
);
console.log(missingSemicolon); // ['const y = "world"']
Comment Processing
// Remove comments at the beginning of each line
const textWithComments = `// This is a comment
console.log("code");
// Another comment
let x = 42;
// Final comment`;
// \n? also consumes the line break, so removed comments do not leave blank lines behind
const withoutComments = textWithComments.replace(/^\s*\/\/.*\n?/gm, '').trim();
console.log(withoutComments);
// console.log("code");
// let x = 42;
7.5 Single-Line Mode/dotAll (s)
The s modifier makes the dot character . match any character, including line breaks.
Default Behavior vs. Single-line Mode
const text = `Hello
World`;
// Default: dot does not match line breaks
console.log(/Hello.World/.test(text)); // false
// Single-line mode: dot matches all characters including line breaks
console.log(/Hello.World/s.test(text)); // true
// Equivalent traditional way
console.log(/Hello[\s\S]World/.test(text)); // true
console.log(/Hello[^]World/.test(text)); // true ([^] matches any character; this syntax is JavaScript-specific)
Practical Applications
// Matching HTML tags and their content (potentially multi-line)
const html = `<div class="content">
<p>This is a paragraph</p>
<p>Another paragraph</p>
</div>`;
// Use single-line mode to match multi-line HTML tags
const tagPattern = /<div.*?<\/div>/s;
const match = html.match(tagPattern);
console.log(match[0]); // Matches the entire div and its content
// Extract multi-line comments
const code = `function hello() {
/*
* This is a multi-line
* comment block
*/
console.log("Hello");
}`;
const multilineComment = /\/\*.*?\*\//s;
const comment = code.match(multilineComment);
console.log(comment[0]);
// /*
// * This is a multi-line
// * comment block
// */
Combination with Other Modifiers
// Combine multiple modifiers
const logText = `[INFO] Starting application
[ERROR] Database connection failed
Error details: connection timeout
[WARN] Retrying connection`;
// Match ERROR level logs and their details (potentially multi-line)
const errorPattern = /\[ERROR\].*?(?=\[|\s*$)/gs;
const errors = logText.match(errorPattern);
console.log(errors);
// ["[ERROR] Database connection failed\nError details: connection timeout\n"]
7.6 Unicode Mode (u)
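The u modifier makes the pattern operate on Unicode code points instead of UTF-16 code units, and enables \u{...} code point escapes and \p{...} property escapes.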
Unicode Character Matching
// Without u modifier
const withoutU = /\u{1F600}/; // No error, but it matches the literal text "u{1F600}", not the emoji
// With u modifier
const withU = /\u{1F600}/u; // 😀
console.log(withU.test("😀")); // true
// Match various Unicode characters
const emojiPattern = /\p{Emoji}/u;
console.log(emojiPattern.test("😀")); // true
console.log(emojiPattern.test("🎉")); // true
// Match Chinese characters
const chinesePattern = /\p{Script=Han}/u;
console.log(chinesePattern.test("中文")); // true
console.log(chinesePattern.test("中")); // true
Character Property Matching
// Match different types of characters
const patterns = {
letter: /\p{Letter}/u,
number: /\p{Number}/u,
punctuation: /\p{Punctuation}/u,
currency: /\p{Currency_Symbol}/u,
emoji: /\p{Emoji}/u
};
const text = "Hello123! $100 😀";
Object.entries(patterns).forEach(([name, pattern]) => {
console.log(`${name}: ${text.match(new RegExp(pattern, 'gu'))}`);
});
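One detail of the property list above deserves a note: the Emoji property also covers the ASCII digits, "#" and "*", because they can form keycap emoji. When only pictographic symbols are wanted, the Extended_Pictographic property is a common alternative. A brief comparison (variable names are illustrative):

```javascript
const emojiProperty = /\p{Emoji}/u;
const pictographic = /\p{Extended_Pictographic}/u;

// Digits carry the Emoji property, so this is true
console.log(emojiProperty.test("1")); // true

// Extended_Pictographic excludes digits but still matches the smiley
console.log(pictographic.test("1"));  // false
console.log(pictographic.test("😀")); // true

console.log("Hello123! $100 😀".match(/\p{Extended_Pictographic}/gu)); // ["😀"]
```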
Correct Handling of Length Calculation
// String length counts UTF-16 code units, not user-visible characters
const text = "👨\u200D👩\u200D👧\u200D👦"; // Family emoji: four emoji joined by zero-width joiners (U+200D)
console.log(text.length); // 11 (UTF-16 code units)
// Spreading a string iterates by code point, the same unit a u-mode regex works with
function countUnicodeCharacters(str) {
return [...str].length;
}
console.log(countUnicodeCharacters(text)); // 7 (four emoji plus three joiners)
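Counting by code point still does not give the number of characters a reader sees when an emoji is built from several code points. The sketch below compares three ways of counting, assuming a runtime that provides Intl.Segmenter; the family emoji is written with explicit escapes so the zero-width joiners are visible.

```javascript
// Man + ZWJ + woman + ZWJ + girl + ZWJ + boy, rendered as a single family emoji
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}";

console.log(family.length);      // 11 UTF-16 code units
console.log([...family].length); // 7 code points

// Grapheme segmentation groups the whole sequence into one visible character
const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
const graphemeCount = [...segmenter.segment(family)].length;
console.log(graphemeCount);      // 1
```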
7.7 Sticky Matching (y)
The y modifier performs “sticky” matching: a match is attempted only at the exact position given by lastIndex, and the engine does not scan forward to find a later match.
Basic Usage
const text = "cat bat rat";
const pattern = /\w+/y;
// Sticky matching must start from the current position
pattern.lastIndex = 0;
console.log(pattern.exec(text)); // ["cat"]
pattern.lastIndex = 4; // Position of 'b'
console.log(pattern.exec(text)); // ["bat"]
pattern.lastIndex = 3; // Position of the space (no word character here)
console.log(pattern.exec(text)); // null
Difference from Global Matching
const text = "a1b2c3";
// Global matching: skips non-matching characters
const globalPattern = /\d/g;
console.log(globalPattern.exec(text)); // ["1"]
console.log(globalPattern.exec(text)); // ["2"]
// Sticky matching: must match from the exact position
const stickyPattern = /\d/y;
stickyPattern.lastIndex = 1;
console.log(stickyPattern.exec(text)); // ["1"]
stickyPattern.lastIndex = 3;
console.log(stickyPattern.exec(text)); // ["2"]
stickyPattern.lastIndex = 2; // 'b' is not a digit
console.log(stickyPattern.exec(text)); // null
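The "exact position" behavior can be wrapped in a small helper. The function below, matchesAt, is a hypothetical name introduced for illustration; it copies the given regex with the y modifier added and checks a single position.

```javascript
// Hypothetical helper: does `regex` match `text` starting exactly at `pos`?
function matchesAt(regex, text, pos) {
  // Copy the regex with the sticky flag so the caller's object is not modified
  const flags = regex.flags.includes('y') ? regex.flags : regex.flags + 'y';
  const sticky = new RegExp(regex.source, flags);
  sticky.lastIndex = pos;
  return sticky.test(text);
}

console.log(matchesAt(/\d/, "a1b2c3", 1)); // true  ("1" is at index 1)
console.log(matchesAt(/\d/, "a1b2c3", 2)); // false ("b" is at index 2)
```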
Practical Application: Lexer
// Simple lexer
class Tokenizer {
constructor(text) {
this.text = text;
this.pos = 0;
this.tokens = [];
}
tokenize() {
const patterns = [
{ type: 'NUMBER', regex: /\d+/y },
{ type: 'IDENTIFIER', regex: /[a-zA-Z_]\w*/y },
{ type: 'OPERATOR', regex: /[+\-*/=]/y },
{ type: 'WHITESPACE', regex: /\s+/y },
{ type: 'SEMICOLON', regex: /;/y }
];
while (this.pos < this.text.length) {
let matched = false;
for (const { type, regex } of patterns) {
regex.lastIndex = this.pos;
const match = regex.exec(this.text);
if (match) {
if (type !== 'WHITESPACE') { // Ignore whitespace
this.tokens.push({ type, value: match[0], pos: this.pos });
}
this.pos = regex.lastIndex;
matched = true;
break;
}
}
if (!matched) {
throw new Error(`Unexpected character at position ${this.pos}`);
}
}
return this.tokens;
}
}
const tokenizer = new Tokenizer("let x = 42;");
console.log(tokenizer.tokenize());
// [
// { type: 'IDENTIFIER', value: 'let', pos: 0 },
// { type: 'IDENTIFIER', value: 'x', pos: 4 },
// { type: 'OPERATOR', value: '=', pos: 6 },
// { type: 'NUMBER', value: '42', pos: 8 },
// { type: 'SEMICOLON', value: ';', pos: 10 }
// ]
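A possible variation on the lexer above, sketched under the assumption that named capture groups (ES2018) are available: all token patterns are merged into one sticky regex, and the name of the group that matched gives the token type. The names TOKEN and tokenizeCombined are illustrative and not part of the original class.

```javascript
// One alternative per token type; order matters when patterns overlap
const TOKEN = /(?<NUMBER>\d+)|(?<IDENTIFIER>[a-zA-Z_]\w*)|(?<OPERATOR>[+\-*/=])|(?<WHITESPACE>\s+)|(?<SEMICOLON>;)/y;

function tokenizeCombined(text) {
  const tokens = [];
  let pos = 0;
  while (pos < text.length) {
    TOKEN.lastIndex = pos;
    const match = TOKEN.exec(text);
    if (!match) {
      throw new Error(`Unexpected character at position ${pos}`);
    }
    // Exactly one named group is defined for each successful match
    const [type, value] = Object.entries(match.groups).find(([, v]) => v !== undefined);
    if (type !== 'WHITESPACE') {
      tokens.push({ type, value, pos });
    }
    pos = TOKEN.lastIndex;
  }
  return tokens;
}

console.log(tokenizeCombined("let x = 42;"));
```

For the input shown, this should produce the same five tokens as the class-based version; the trade-off is a single regex execution per token against a pattern that is harder to read.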
7.8 Combined Use of Modifiers
Multiple modifiers can be combined to create more powerful matching patterns.
Common Combinations
// Global + case-insensitive
const globalIgnoreCase = /hello/gi;
const text = "Hello world, HELLO universe, hello there";
console.log(text.match(globalIgnoreCase)); // ["Hello", "HELLO", "hello"]
// Multi-line + global
const multilineGlobal = /^\w+/gm;
const lines = `first
second
third`;
console.log(lines.match(multilineGlobal)); // ["first", "second", "third"]
// Single-line + global + case-insensitive
const html = `<DIV>
Content here
</DIV>
<div>
More content
</div>`;
const tagPattern = /<div.*?<\/div>/gis;
console.log(html.match(tagPattern)); // Matches all div tags (case-insensitive, multi-line)
Practical Example
// Log analysis: extract all error messages (case-insensitive, multi-line, global)
const logContent = `2023-12-01 10:00:00 INFO Application started
2023-12-01 10:01:00 ERROR Database connection failed
Error details: timeout after 30 seconds
2023-12-01 10:02:00 error Network timeout
2023-12-01 10:03:00 WARN Low memory`;
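// Anchor on the timestamp so only lines whose level is ERROR match, not the "Error details" continuation line
const errorPattern = /^[\d-]+ [\d:]+ error\b.*$/gim;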
const errors = logContent.match(errorPattern);
console.log(errors);
// [
// "2023-12-01 10:01:00 ERROR Database connection failed",
// "2023-12-01 10:02:00 error Network timeout"
// ]
// Code comment extraction: extract all types of comments
const sourceCode = `
// Single-line comment
function test() {
/* Multi-line comment
spanning multiple lines */
return true;
}
// Another comment
`;
const commentPattern = /\/\/.*$|\/\*[\s\S]*?\*\//gm;
const comments = sourceCode.match(commentPattern);
console.log(comments);
7.9 Dynamic Modifiers
In some cases, we need to set modifiers dynamically based on conditions.
Building Dynamic Regular Expressions
function createSearchPattern(query, options = {}) {
const {
caseSensitive = false,
wholeWord = false,
global = true
} = options;
// Escape special characters
const escapedQuery = query.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
// Build pattern
let pattern = wholeWord ? `\\b${escapedQuery}\\b` : escapedQuery;
// Build modifiers
let flags = '';
if (!caseSensitive) flags += 'i';
if (global) flags += 'g';
return new RegExp(pattern, flags);
}
// Usage example
const text = "JavaScript is great. I love JavaScript programming.";
const regex1 = createSearchPattern("JavaScript", { caseSensitive: true });
console.log(text.match(regex1)); // ["JavaScript", "JavaScript"]
const regex2 = createSearchPattern("java", { caseSensitive: false });
console.log(text.match(regex2)); // ["Java", "Java"]
const regex3 = createSearchPattern("Script", { wholeWord: true, caseSensitive: false });
console.log(text.match(regex3)); // null (because Script is not a whole word)
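The wholeWord option above relies on \b, which only recognizes boundaries between word characters (letters, digits, underscore) and everything else. Queries that begin or end with punctuation, such as "C++", therefore never match as whole words. One possible workaround, assuming lookbehind support (ES2018), replaces \b with lookarounds; escapeRegExp and wholeWordPattern are hypothetical helper names.

```javascript
// Escape characters that have special meaning in a regular expression
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Whole-word match defined as "not preceded or followed by a word character"
function wholeWordPattern(query, flags = 'g') {
  return new RegExp(`(?<!\\w)${escapeRegExp(query)}(?!\\w)`, flags);
}

const sentence = "I use C++ daily, not C or C#.";

// \b fails after "++" because "+" and the following space are both non-word characters
const withBoundary = new RegExp(`\\b${escapeRegExp("C++")}\\b`, 'g');
console.log(sentence.match(withBoundary));            // null

console.log(sentence.match(wholeWordPattern("C++"))); // ["C++"]
```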
Conditional Modifier Application
class FlexibleRegex {
constructor(pattern) {
this.pattern = pattern;
this.flags = new Set();
}
ignoreCase(enable = true) {
if (enable) this.flags.add('i');
else this.flags.delete('i');
return this;
}
global(enable = true) {
if (enable) this.flags.add('g');
else this.flags.delete('g');
return this;
}
multiline(enable = true) {
if (enable) this.flags.add('m');
else this.flags.delete('m');
return this;
}
dotAll(enable = true) {
if (enable) this.flags.add('s');
else this.flags.delete('s');
return this;
}
build() {
return new RegExp(this.pattern, Array.from(this.flags).join(''));
}
}
// Usage example
const builder = new FlexibleRegex('hello');
const regex = builder.ignoreCase().global().build();
console.log("Hello HELLO hello".match(regex)); // ["Hello", "HELLO", "hello"]
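The builder above stores flags in a Set, so a flag can never be added twice. When flags come from elsewhere (user input, configuration), duplicates or unknown letters make the RegExp constructor throw a SyntaxError. A minimal sketch of guarding against that; tryBuild is a hypothetical helper name.

```javascript
// Returns a RegExp, or null when the pattern or flags are rejected
function tryBuild(pattern, flags) {
  try {
    return new RegExp(pattern, flags);
  } catch (error) {
    if (error instanceof SyntaxError) return null;
    throw error; // anything else is unexpected
  }
}

console.log(tryBuild('hello', 'gi').flags); // "gi"
console.log(tryBuild('hello', 'gg'));       // null (duplicate flag)
console.log(tryBuild('hello', 'gx'));       // null (unknown flag)
```

Returning null is only one option; rethrowing with a clearer message may suit an application better.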
7.10 Exercises
Exercise 1: Search Function Implementation
Implement a text search function that supports the following options:
- Case-sensitive/case-insensitive
- Whole word/partial match
- Return match position and context
// Answer
function advancedSearch(text, query, options = {}) {
const {
caseSensitive = false,
wholeWord = false,
contextLength = 20
} = options;
let pattern = query.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
if (wholeWord) pattern = `\\b${pattern}\\b`;
let flags = 'g';
if (!caseSensitive) flags += 'i';
const regex = new RegExp(pattern, flags);
const results = [];
let match;
while ((match = regex.exec(text)) !== null) {
const start = Math.max(0, match.index - contextLength);
const end = Math.min(text.length, match.index + match[0].length + contextLength);
results.push({
match: match[0],
index: match.index,
context: text.slice(start, end),
before: text.slice(start, match.index),
after: text.slice(match.index + match[0].length, end)
});
}
return results;
}
const text = "JavaScript is a powerful programming language. Many developers love JavaScript.";
console.log(advancedSearch(text, "javascript", { caseSensitive: false, wholeWord: true }));
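One edge case in exec() loops like the one in this answer: if the pattern can match an empty string (for example, when query is empty), lastIndex does not advance and the loop never ends. A common guard is to step lastIndex forward manually after an empty match. The helper name findAllIndexes below is illustrative.

```javascript
// Collect the start index of every match, including empty ones, without looping forever
function findAllIndexes(regex, text) {
  // Work on a global copy so the caller's lastIndex is untouched
  const flags = regex.flags.includes('g') ? regex.flags : regex.flags + 'g';
  const scanner = new RegExp(regex.source, flags);
  const indexes = [];
  let match;
  while ((match = scanner.exec(text)) !== null) {
    indexes.push(match.index);
    if (match[0] === '') {
      scanner.lastIndex++; // step past an empty match
    }
  }
  return indexes;
}

console.log(findAllIndexes(/a*/, "baac"));       // [0, 1, 3, 4]
console.log(findAllIndexes(new RegExp(''), "ab")); // [0, 1, 2]
```

matchAll() applies the same advance-on-empty rule internally, so it is a simpler choice when the extra control of exec() is not needed.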
Exercise 2: Configuration File Parser
Write a configuration file parser that supports comments and multi-line values.
// Answer
function parseConfig(configText) {
const config = {};
const lines = configText.split('\n');
let currentKey = null;
let multilineValue = [];
let inMultiline = false;
const keyValuePattern = /^(\w+)\s*=\s*(.*)$/;
const commentPattern = /^\s*#.*$/;
const emptyLinePattern = /^\s*$/;
const multilineStartPattern = /"""$/;
const multilineEndPattern = /^"""/;
for (const line of lines) {
// Skip comments and empty lines, but keep them when they are part of a multi-line value
if (!inMultiline && (commentPattern.test(line) || emptyLinePattern.test(line))) {
continue;
}
if (inMultiline) {
if (multilineEndPattern.test(line)) {
config[currentKey] = multilineValue.join('\n');
inMultiline = false;
currentKey = null;
multilineValue = [];
} else {
multilineValue.push(line);
}
} else {
const match = line.match(keyValuePattern);
if (match) {
const [, key, value] = match;
if (multilineStartPattern.test(value)) {
currentKey = key;
inMultiline = true;
} else {
config[key] = value.replace(/^["'](.*)["']$/, '$1'); // Remove quotes
}
}
}
}
return config;
}
const configText = `
# Application configuration
app_name=MyApp
app_version="1.0.0"
# Database configuration
db_host=localhost
description="""
This is a multi-line
description text
can contain line breaks
"""
# Other settings
debug=true
`;
console.log(parseConfig(configText));
Summary
Modifiers are an important part of regular expressions; they change the matching behavior of regular expressions:
- i modifier: Case-insensitive matching, suitable for searching and validation
- g modifier: Global matching, gets all occurrences instead of just the first
- m modifier: Multi-line mode, makes ^ and $ match at the start and end of each line
- s modifier: Single-line mode, makes the dot match line breaks
- u modifier: Unicode mode, correctly handles Unicode characters
- y modifier: Sticky matching, used for precise position matching
- Modifier combinations: Multiple modifiers can be used in combination
- Dynamic modifiers: Dynamically build regular expressions based on conditions
Understanding and mastering the use of modifiers can make our regular expressions more flexible and powerful.