Chapter 7: Flags and Modes

Haiyue · 2025/9/1 · about 10 minutes

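Learning objectives

Master case-insensitive matching (i)
Learn to use multiline mode (m)
Understand what single-line mode (s) does
Master global matching (g)
Get to know the other commonly used flags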

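7.1 Flags overview

Flags (also called modifiers) are options that change how a regular expression matches. In literal syntax they follow the closing slash; with the constructor they are passed as the second argument.

Flag syntax

// Literal syntax: /pattern/flags
const literalRegex = /pattern/gi;

// Constructor syntax: new RegExp('pattern', 'flags')
const constructedRegex = new RegExp('pattern', 'gi');

// Common flags
const examples = {
    'i': /hello/i,           // case-insensitive
    'g': /hello/g,           // global matching
    'm': /^hello/m,          // multiline mode
    's': /hello.world/s,     // single-line mode (dotAll)
    'u': /\u{1f600}/u,       // Unicode mode
    'y': /hello/y            // sticky matching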
};
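
When debugging, it can help to check which flags a RegExp object actually carries. The sketch below reads them back through the standard `flags` property and the per-flag boolean properties; the helper name `describeFlags` is made up for illustration.

```javascript
// Hypothetical helper: report which flags a RegExp was created with.
function describeFlags(regex) {
    return {
        flags: regex.flags,            // all flags as one string, in a canonical order
        ignoreCase: regex.ignoreCase,  // i
        global: regex.global,          // g
        multiline: regex.multiline,    // m
        dotAll: regex.dotAll,          // s
        unicode: regex.unicode,        // u
        sticky: regex.sticky           // y
    };
}

const info = describeFlags(/hello/gi);
console.log(info.flags);      // "gi"
console.log(info.global);     // true
console.log(info.multiline);  // false
```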

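7.2 Case-insensitive matching (i)

The i flag makes the regular expression ignore differences in letter case.

Basic usage

// Case-sensitive (the default)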
const caseSensitive = /hello/;
console.log(caseSensitive.test("Hello")); // false
console.log(caseSensitive.test("hello")); // true

// Case-insensitive
const caseInsensitive = /hello/i;
console.log(caseInsensitive.test("Hello")); // true
console.log(caseInsensitive.test("HELLO")); // true
console.log(caseInsensitive.test("HeLLo")); // true

Practical applications

// Search feature: user input is matched case-insensitively
function searchText(content, query) {
    const searchRegex = new RegExp(query.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'), 'gi');
    return content.match(searchRegex) || [];
}

const content = "JavaScript is a Programming Language. Java is different from JavaScript.";
console.log(searchText(content, "javascript")); 
// ["JavaScript", "JavaScript"]

// Email validation: with i the whole pattern is case-insensitive, including the domain part
const emailPattern = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/i;
console.log(emailPattern.test("User@Example.COM")); // true

// URL matching: the scheme and host are case-insensitive
const urlPattern = /^https?:\/\/[\w.-]+/i;
console.log(urlPattern.test("HTTPS://EXAMPLE.COM")); // true

Case in character classes

// With i, a character class also matches the other case of each letter
const letterPattern = /[a-z]/i;
console.log(letterPattern.test("A")); // true
console.log(letterPattern.test("a")); // true

// Equivalent to
const equivalentPattern = /[a-zA-Z]/;
console.log(equivalentPattern.test("A")); // true
console.log(equivalentPattern.test("a")); // true

// Case folding also applies to non-ASCII letters
const unicodePattern = /café/i;
console.log(unicodePattern.test("CAFÉ")); // true
console.log(unicodePattern.test("Café")); // true

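7.3 Global matching (g)

The g flag makes the regular expression find every match instead of stopping at the first one.

Basic comparison

const text = "cat bat rat";

// Without global matching: only the first match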
const nonGlobal = /\w+at/;
console.log(text.match(nonGlobal)); // ["cat"]

// With global matching: all matches
const global = /\w+at/g;
console.log(text.match(global)); // ["cat", "bat", "rat"]

How match() behaves differently

const text = "The year 2023 and 2024";

// Non-global match: returns the first match with details
const nonGlobalMatch = text.match(/\d{4}/);
console.log(nonGlobalMatch);
// ["2023", index: 9, input: "The year 2023 and 2024", groups: undefined]

// Global match: returns an array of all matched strings
const globalMatch = text.match(/\d{4}/g);
console.log(globalMatch); // ["2023", "2024"]

exec() with global matching

const text = "apple 123, banana 456, cherry 789";
const pattern = /(\w+) (\d+)/g;

// Iterate with exec() to get the details of each match
let match;
while ((match = pattern.exec(text)) !== null) {
    console.log(`${match[1]}: ${match[2]}`);
    // apple: 123
    // banana: 456
    // cherry: 789
}

// Use matchAll() to get every match with details (ES2020)
const allMatches = [...text.matchAll(/(\w+) (\d+)/g)];
allMatches.forEach(match => {
    console.log(`${match[1]}: ${match[2]}`);
});
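
The exec() loop works because a global regex remembers where it stopped in its lastIndex property, and both exec() and test() resume from there. Reusing one global regex for simple yes/no checks can therefore give alternating results. A minimal sketch of the effect (variable names are made up for illustration):

```javascript
const digits = /\d/g;

const first = digits.test("a1");          // finds "1" at index 1
const indexAfterFirst = digits.lastIndex; // search position moved past the match
const second = digits.test("a1");         // starts at index 2, finds nothing
const indexAfterSecond = digits.lastIndex; // reset after the failed attempt

console.log(first, indexAfterFirst);   // true 2
console.log(second, indexAfterSecond); // false 0

// One possible fix: drop the g flag when only a yes/no answer is needed
const hasDigit = /\d/;
const stableFirst = hasDigit.test("a1");
const stableSecond = hasDigit.test("a1");
console.log(stableFirst, stableSecond); // true true
```

Resetting `lastIndex` to 0 before each call, or creating a fresh regex, would be other ways to avoid the surprise.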

Global matching in replacements

const text = "Hello World Hello Universe";

// Non-global replace: only the first occurrence
const singleReplace = text.replace(/Hello/, "Hi");
console.log(singleReplace); // "Hi World Hello Universe"

// Global replace: every occurrence
const globalReplace = text.replace(/Hello/g, "Hi");
console.log(globalReplace); // "Hi World Hi Universe"

// Global replace with a callback function
const numbered = text.replace(/Hello/g, (match, index) => `${match}(${index})`);
console.log(numbered); // "Hello(0) World Hello(12) Universe"
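
A related detail: String.prototype.replaceAll() (ES2021) accepts a plain string without any flags, but when it is given a RegExp the g flag is required, otherwise it throws a TypeError. A small sketch, with variable names chosen only for illustration:

```javascript
const dashed = "a-b-c";

// With a string pattern, replaceAll() needs no flags
const viaString = dashed.replaceAll("-", "+");

// With a RegExp pattern, the g flag is mandatory
const viaRegex = dashed.replaceAll(/-/g, "+");

// Without g, replaceAll() refuses to run
let errorName = null;
try {
    dashed.replaceAll(/-/, "+");
} catch (err) {
    errorName = err.name;
}

console.log(viaString); // "a+b+c"
console.log(viaRegex);  // "a+b+c"
console.log(errorName); // "TypeError"
```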

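7.4 Multiline mode (m)

The m flag changes the behavior of ^ and $ so that they match at the start and end of every line.

Default behavior vs multiline mode

const multilineText = `First line
Second line
Third line`;

// Default mode: ^ and $ match only at the start and end of the whole string
console.log(/^Second/.test(multilineText)); // false
console.log(/line$/.test(multilineText));   // true (matches at the end of the last line)

// Multiline mode: ^ and $ match at the start and end of each line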
console.log(/^Second/m.test(multilineText)); // true
console.log(/^First/m.test(multilineText));  // true
console.log(/line$/mg.test(multilineText));  // true

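Practical applications

// Processing a config file: match the setting on each line
const configText = `
# Config file
host=localhost
port=3000
# Database config
db_host=127.0.0.1
db_port=5432
`;

// Extract every setting (key=value at the start of a line)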
const configPattern = /^(\w+)=(.+)$/gm;
const configs = {};
let match;

while ((match = configPattern.exec(configText)) !== null) {
    configs[match[1]] = match[2];
}

console.log(configs);
// { host: "localhost", port: "3000", db_host: "127.0.0.1", db_port: "5432" }

// Find lines that do not end with a semicolon
const codeText = `console.log("Hello");
let x = 42;
const y = "world"`;

const missingSemicolon = codeText.split('\n').filter(line => 
    line.trim() && !/;$/m.test(line.trim())
);
console.log(missingSemicolon); // ['const y = "world"']

Handling comments

// Remove lines that consist only of a // comment (including their line break)
const textWithComments = `// This is a comment
console.log("code");
// Another comment
let x = 42;
// Final comment`;

const withoutComments = textWithComments.replace(/^[ \t]*\/\/.*\n?/gm, '').trim();
console.log(withoutComments);
// console.log("code");
// let x = 42;

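7.5 Single-line mode / dotAll (s)

The s flag makes the dot . match any character, including line terminators.

Default behavior vs single-line mode

const text = `Hello
World`;

// Default: the dot does not match a line break
console.log(/Hello.World/.test(text)); // false

// Single-line mode: the dot matches every character, including line breaks
console.log(/Hello.World/s.test(text)); // true

// Equivalent traditional spellings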
console.log(/Hello[\s\S]World/.test(text)); // true
console.log(/Hello[^]*World/.test(text));   // true ([^] is a JavaScript-specific idiom)
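
Despite their names, "multiline" (m) and "single-line" (s) are not opposites: m only affects the anchors ^ and $, while s only affects the dot. The following sketch checks each combination on a two-line string; the variable names are illustrative only.

```javascript
const twoLines = "one\ntwo";

const anchorsDefault = /^two$/.test(twoLines);   // false: ^ only matches at the string start
const anchorsWithM = /^two$/m.test(twoLines);    // true:  m lets ^ match after the line break
const anchorsWithS = /^two$/s.test(twoLines);    // false: s does not change the anchors

const dotDefault = /one.two/.test(twoLines);     // false: the dot skips line breaks
const dotWithM = /one.two/m.test(twoLines);      // false: m does not change the dot
const dotWithS = /one.two/s.test(twoLines);      // true:  s lets the dot match "\n"

console.log(anchorsDefault, anchorsWithM, anchorsWithS); // false true false
console.log(dotDefault, dotWithM, dotWithS);             // false false true
```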

Practical applications

// Match an HTML element and its contents (which may span lines)
const html = `<div class="content">
  <p>This is a paragraph</p>
  <p>Another paragraph</p>
</div>`;

// Use single-line mode to match an HTML element that spans lines
const tagPattern = /<div.*?<\/div>/s;
const match = html.match(tagPattern);
console.log(match[0]); // the whole div and its contents

// Extract a multi-line comment
const code = `function hello() {
  /*
   * This is a multi-line
   * comment block
   */
  console.log("Hello");
}`;

const multilineComment = /\/\*.*?\*\//s;
const comment = code.match(multilineComment);
console.log(comment[0]);
// /*
//  * This is a multi-line
//  * comment block
//  */

Combining with other flags

// Using several flags together
const logText = `[INFO] Starting application
[ERROR] Database connection failed
Error details: connection timeout
[WARN] Retrying connection`;

// Match ERROR-level log entries together with their details (which may span lines)
const errorPattern = /\[ERROR\].*?(?=\[|\s*$)/gs;
const errors = logText.match(errorPattern);
console.log(errors);
// ["[ERROR] Database connection failed\nError details: connection timeout\n"]

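7.6 Unicode mode (u)

The u flag enables full Unicode support.

Matching Unicode characters

// Without the u flag
const withoutU = /\u{1F600}/; // not a syntax error, but it matches the literal text "u{1F600}" rather than the emoji

// With the u flag
const withU = /\u{1F600}/u; // 😀
console.log(withU.test("😀")); // true

// Match various Unicode characters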
const emojiPattern = /\p{Emoji}/u;
console.log(emojiPattern.test("😀")); // true
console.log(emojiPattern.test("🎉")); // true

// Match Chinese characters
const chinesePattern = /\p{Script=Han}/u;
console.log(chinesePattern.test("中文")); // true
console.log(chinesePattern.test("中")); // true
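
Part of what "full Unicode support" means in practice is that, with u, the pattern works on whole code points instead of UTF-16 code units. The sketch below shows the difference for the dot on a single emoji; the variable names are made up for illustration.

```javascript
const smiley = "\u{1F600}"; // one emoji, stored as two UTF-16 code units

const dotWithoutU = /^.$/.test(smiley);  // false: . consumes only one code unit
const dotWithU = /^.$/u.test(smiley);    // true:  . consumes the whole code point

const piecesWithoutU = smiley.match(/./g).length;  // 2 (two surrogate halves)
const piecesWithU = smiley.match(/./gu).length;    // 1 (one code point)

console.log(smiley.length);               // 2
console.log(dotWithoutU, dotWithU);       // false true
console.log(piecesWithoutU, piecesWithU); // 2 1
```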

Matching by character property

// Match different kinds of characters
const patterns = {
    letter: /\p{Letter}/u,
    number: /\p{Number}/u,
    punctuation: /\p{Punctuation}/u,
    currency: /\p{Currency_Symbol}/u,
    emoji: /\p{Emoji}/u
};

const text = "Hello123! $100 😀";
Object.entries(patterns).forEach(([name, pattern]) => {
    console.log(`${name}: ${text.match(new RegExp(pattern, 'gu'))}`);
});

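Handling length correctly

// Surrogate pairs and combined emoji
const text = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}"; // family emoji: four emoji joined by zero-width joiners

// length counts UTF-16 code units, which is rarely the expected result
console.log(text.length); // 11

// Spreading iterates by code point, the same unit the u flag matches on
function countUnicodeCharacters(str) {
    return [...str].length;
}

console.log(countUnicodeCharacters(text)); // 7 code points (4 emoji + 3 joiners), still not 1 visible character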
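
Neither code units nor code points correspond to what a reader sees as one character when an emoji is built from several joined code points. One way to count user-perceived characters (grapheme clusters) is the standard Intl.Segmenter API, assuming a runtime that ships it (recent browsers and Node.js versions do). The helper name `countGraphemes` is hypothetical.

```javascript
// Hypothetical helper: count grapheme clusters with Intl.Segmenter.
function countGraphemes(str) {
    const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
    return [...segmenter.segment(str)].length;
}

// Family emoji written with escapes so the joiners are explicit
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}";

console.log(family.length);          // 11 UTF-16 code units
console.log([...family].length);     // 7 code points
console.log(countGraphemes(family)); // 1 grapheme cluster
console.log(countGraphemes("abc"));  // 3
```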

7.7 Sticky matching (y)

The y flag performs "sticky" matching: a match is attempted only at the position given by lastIndex.

Basic usage

const text = "cat bat rat";
const pattern = /\w+/y;

// A sticky match must start exactly at lastIndex
pattern.lastIndex = 0;
console.log(pattern.exec(text)); // ["cat"]

pattern.lastIndex = 4; // position of 'b'
console.log(pattern.exec(text)); // ["bat"]

pattern.lastIndex = 3; // position of the space (not a word character)
console.log(pattern.exec(text)); // null

Differences from global matching

const text = "a1b2c3";

// Global matching: skips over characters that do not match
const globalPattern = /\d/g;
console.log(globalPattern.exec(text)); // ["1"]
console.log(globalPattern.exec(text)); // ["2"]

// Sticky matching: must match at the exact position
const stickyPattern = /\d/y;
stickyPattern.lastIndex = 1;
console.log(stickyPattern.exec(text)); // ["1"]
stickyPattern.lastIndex = 3;
console.log(stickyPattern.exec(text)); // ["2"]
stickyPattern.lastIndex = 2; // 'b' is not a digit
console.log(stickyPattern.exec(text)); // null

Practical application: a lexer

// A simple lexer
class Tokenizer {
    constructor(text) {
        this.text = text;
        this.pos = 0;
        this.tokens = [];
    }
    
    tokenize() {
        const patterns = [
            { type: 'NUMBER', regex: /\d+/y },
            { type: 'IDENTIFIER', regex: /[a-zA-Z_]\w*/y },
            { type: 'OPERATOR', regex: /[+\-*/=]/y },
            { type: 'WHITESPACE', regex: /\s+/y },
            { type: 'SEMICOLON', regex: /;/y }
        ];
        
        while (this.pos < this.text.length) {
            let matched = false;
            
            for (const { type, regex } of patterns) {
                regex.lastIndex = this.pos;
                const match = regex.exec(this.text);
                
                if (match) {
                    if (type !== 'WHITESPACE') { // skip whitespace
                        this.tokens.push({ type, value: match[0], pos: this.pos });
                    }
                    this.pos = regex.lastIndex;
                    matched = true;
                    break;
                }
            }
            
            if (!matched) {
                throw new Error(`Unexpected character at position ${this.pos}`);
            }
        }
        
        return this.tokens;
    }
}

const tokenizer = new Tokenizer("let x = 42;");
console.log(tokenizer.tokenize());
// [
//   { type: 'IDENTIFIER', value: 'let', pos: 0 },
//   { type: 'IDENTIFIER', value: 'x', pos: 4 },
//   { type: 'OPERATOR', value: '=', pos: 6 },
//   { type: 'NUMBER', value: '42', pos: 8 },
//   { type: 'SEMICOLON', value: ';', pos: 10 }
// ]

7.8 Combining flags

Several flags can be combined to build more powerful patterns.

Common combinations

// Global + case-insensitive
const globalIgnoreCase = /hello/gi;
const text = "Hello world, HELLO universe, hello there";
console.log(text.match(globalIgnoreCase)); // ["Hello", "HELLO", "hello"]

// Multiline + global
const multilineGlobal = /^\w+/gm;
const lines = `first
second
third`;
console.log(lines.match(multilineGlobal)); // ["first", "second", "third"]

// Single-line + global + case-insensitive
const html = `<DIV>
  Content here
</DIV>
<div>
  More content
</div>`;
const tagPattern = /<div.*?<\/div>/gis;
console.log(html.match(tagPattern)); // matches both div elements (case-insensitive, across lines)
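
When combining flags, the order in which they are written does not matter, but each flag may appear only once. A short sketch of both rules (variable names are illustrative only):

```javascript
const orderA = /hello/gi;
const orderB = /hello/ig;

// The flags property reports the same canonical string for both
const sameFlags = orderA.flags === orderB.flags;

// Repeating a flag is rejected when the regex is constructed
let duplicateError = null;
try {
    new RegExp("hello", "gg");
} catch (err) {
    duplicateError = err.name;
}

console.log(orderA.flags, orderB.flags); // "gi" "gi"
console.log(sameFlags);                  // true
console.log(duplicateError);             // "SyntaxError"
```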

Practical examples

// Log analysis: extract all error messages (case-insensitive, multiline, global)
const logContent = `2023-12-01 10:00:00 INFO Application started
2023-12-01 10:01:00 ERROR Database connection failed
Error details: timeout after 30 seconds
2023-12-01 10:02:00 error Network timeout
2023-12-01 10:03:00 WARN Low memory`;

const errorPattern = /^.*error.*$/gim;
const errors = logContent.match(errorPattern);
console.log(errors);
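// [
//   "2023-12-01 10:01:00 ERROR Database connection failed",
//   "Error details: timeout after 30 seconds",
//   "2023-12-01 10:02:00 error Network timeout"
// ]
// The continuation line is included because it contains "Error".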

// Comment extraction: extract every kind of comment
const sourceCode = `
// Single-line comment
function test() {
  /* Multi-line comment
     spanning several lines */
  return true;
}
// Another comment
`;

const commentPattern = /\/\/.*$|\/\*[\s\S]*?\*\//gm;
const comments = sourceCode.match(commentPattern);
console.log(comments);

7.9 Dynamic flags

In some cases we need to choose flags dynamically, based on conditions.

Building a regular expression dynamically

function createSearchPattern(query, options = {}) {
    const {
        caseSensitive = false,
        wholeWord = false,
        global = true
    } = options;
    
    // Escape special characters
    const escapedQuery = query.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    
    // Build the pattern
    let pattern = wholeWord ? `\\b${escapedQuery}\\b` : escapedQuery;
    
    // Build the flags
    let flags = '';
    if (!caseSensitive) flags += 'i';
    if (global) flags += 'g';
    
    return new RegExp(pattern, flags);
}

// Usage example
const text = "JavaScript is great. I love JavaScript programming.";

const regex1 = createSearchPattern("JavaScript", { caseSensitive: true });
console.log(text.match(regex1)); // ["JavaScript", "JavaScript"]

const regex2 = createSearchPattern("java", { caseSensitive: false });
console.log(text.match(regex2)); // ["Java", "Java"]

const regex3 = createSearchPattern("Script", { wholeWord: true, caseSensitive: false });
console.log(text.match(regex3)); // null (Script never appears as a whole word)

Applying flags conditionally

class FlexibleRegex {
    constructor(pattern) {
        this.pattern = pattern;
        this.flags = new Set();
    }
    
    ignoreCase(enable = true) {
        if (enable) this.flags.add('i');
        else this.flags.delete('i');
        return this;
    }
    
    global(enable = true) {
        if (enable) this.flags.add('g');
        else this.flags.delete('g');
        return this;
    }
    
    multiline(enable = true) {
        if (enable) this.flags.add('m');
        else this.flags.delete('m');
        return this;
    }
    
    dotAll(enable = true) {
        if (enable) this.flags.add('s');
        else this.flags.delete('s');
        return this;
    }
    
    build() {
        return new RegExp(this.pattern, Array.from(this.flags).join(''));
    }
}

// Usage example
const builder = new FlexibleRegex('hello');
const regex = builder.ignoreCase().global().build();
console.log("Hello HELLO hello".match(regex)); // ["Hello", "HELLO", "hello"]
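
Flags on an existing RegExp object are read-only, so changing them means building a new regex from the old one's source and flags. The following is a minimal sketch of that idea; the helper name `withFlag` is hypothetical and not part of any library.

```javascript
// Hypothetical helper: return a copy of `regex` with `flag` added if it is missing.
function withFlag(regex, flag) {
    const flags = regex.flags.includes(flag) ? regex.flags : regex.flags + flag;
    return new RegExp(regex.source, flags);
}

const once = /hello/i;
const everywhere = withFlag(once, "g");

console.log(everywhere.flags);                // "gi"
console.log("Hello hello".match(everywhere)); // ["Hello", "hello"]
console.log(once.flags);                      // "i" (the original is unchanged)
```

Checking for the flag first avoids the SyntaxError that a duplicated flag would cause.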

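7.10 Exercises

Exercise 1: Implementing a search feature

Implement a text search function that supports the following options:

Case-sensitive / case-insensitive matching
Whole-word / partial matching
Returning the match position and surrounding context

// Solution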
function advancedSearch(text, query, options = {}) {
    const {
        caseSensitive = false,
        wholeWord = false,
        contextLength = 20
    } = options;
    
    let pattern = query.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    if (wholeWord) pattern = `\\b${pattern}\\b`;
    
    let flags = 'g';
    if (!caseSensitive) flags += 'i';
    
    const regex = new RegExp(pattern, flags);
    const results = [];
    let match;
    
    while ((match = regex.exec(text)) !== null) {
        const start = Math.max(0, match.index - contextLength);
        const end = Math.min(text.length, match.index + match[0].length + contextLength);
        
        results.push({
            match: match[0],
            index: match.index,
            context: text.slice(start, end),
            before: text.slice(start, match.index),
            after: text.slice(match.index + match[0].length, end)
        });
    }
    
    return results;
}

const text = "JavaScript is a powerful programming language. Many developers love JavaScript.";
console.log(advancedSearch(text, "javascript", { caseSensitive: false, wholeWord: true }));

Exercise 2: Config file parser

Write a config file parser that supports comments and multi-line values.

// Solution
function parseConfig(configText) {
    const config = {};
    const lines = configText.split('\n');
    let currentKey = null;
    let multilineValue = [];
    let inMultiline = false;
    
    const keyValuePattern = /^(\w+)\s*=\s*(.*)$/;
    const commentPattern = /^\s*#.*$/;
    const emptyLinePattern = /^\s*$/;
    const multilineStartPattern = /"""$/;
    const multilineEndPattern = /^"""/;
    
    for (const line of lines) {
        // Skip comments and blank lines, but keep them inside a multi-line value
        if (!inMultiline && (commentPattern.test(line) || emptyLinePattern.test(line))) {
            continue;
        }
        
        if (inMultiline) {
            if (multilineEndPattern.test(line)) {
                config[currentKey] = multilineValue.join('\n');
                inMultiline = false;
                currentKey = null;
                multilineValue = [];
            } else {
                multilineValue.push(line);
            }
        } else {
            const match = line.match(keyValuePattern);
            if (match) {
                const [, key, value] = match;
                if (multilineStartPattern.test(value)) {
                    currentKey = key;
                    inMultiline = true;
                } else {
                    config[key] = value.replace(/^["'](.*)["']$/, '$1'); // strip surrounding quotes
                }
            }
        }
    }
    
    return config;
}

const configText = `
# Application config
app_name=MyApp
app_version="1.0.0"

# Database config  
db_host=localhost
description="""
This is a multi-line
description text
that can contain line breaks
"""

# Other settings
debug=true
`;

console.log(parseConfig(configText));

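Summary

Flags are an important part of regular expressions; they change how a pattern matches:

i flag: case-insensitive matching, useful for search and validation
g flag: global matching, finds every match instead of only the first
m flag: multiline mode, makes ^ and $ match at line boundaries
s flag: single-line (dotAll) mode, lets the dot match line breaks
u flag: Unicode mode, handles Unicode characters correctly
y flag: sticky matching, for matching at an exact position
Combining flags: several flags can be used together
Dynamic flags: build regular expressions from conditions at run time

Understanding and mastering flags makes our regular expressions more flexible and powerful.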