Chapter 4: Position Matching and Boundaries

Haiyue · 2025/9/1 · about 8 minutes

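Learning objectives

Master the line-start (^) and line-end ($) anchors
Learn to use the word boundary (\b) and the non-word boundary (\B)
Understand the string-start (\A) and string-end (\Z) positions
Grasp the basic concepts of lookahead and lookbehind assertions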

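4.1 Anchor overview

Anchors do not match any characters; they match positions. They ensure that a pattern occurs at a specific position, which is essential for precise matching.

Characteristics of anchors

Zero-width: they consume no characters
Positional: they match a position rather than a character
Boundary-defining: they set the boundary conditions for a match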
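To make "zero-width" concrete, here is a minimal sketch in JavaScript. The variable names are illustrative only. It shows that an anchor match has an index but no length, and that replacing it inserts text without removing anything.

```javascript
// An anchor match is an empty string: it has a position (index) but no length.
const startMatch = /^/.exec("hello");
const startLength = startMatch[0].length; // 0
const startIndex = startMatch.index;      // 0

const endMatch = /$/.exec("hello");
const endIndex = endMatch.index;          // 5

// Replacing a zero-width match inserts text without removing any characters.
const bracketed = "hello".replace(/^/, "[").replace(/$/, "]");
console.log(startLength, startIndex, endIndex, bracketed); // 0 0 5 [hello]
```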

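4.2 Line-start anchor (^)

The line-start anchor ^ matches the position at the beginning of a line. In JavaScript, without the m flag it matches only at the start of the whole string.

Basic usage

^hello      # matches lines that start with "hello"
^\d+        # matches lines that start with digits
^[A-Z]      # matches lines that start with an uppercase letter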

Practical use

// Check whether the input starts with a given string (case-insensitive)
const startsWithHello = /^hello/i;
console.log(startsWithHello.test("Hello world"));  // true
console.log(startsWithHello.test("Say hello"));    // false

// Find the lines that start with a digit
const text = `123 is a number
abc is letters
456 is also a number`;

const numberLines = text.split('\n').filter(line => /^\d/.test(line));
console.log(numberLines); // ["123 is a number", "456 is also a number"]

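^ in multiline mode

const multilineText = `Line one
Line two
Line three`;

// Without the m flag: ^ matches only at the start of the whole string
const singleMode = /^Line/;
console.log(singleMode.test(multilineText)); // true (matches at the first line)

// With the m flag: ^ matches at the start of every line
const multiMode = /^Line/gm;
console.log(multilineText.match(multiMode)); // ["Line", "Line", "Line"]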
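Because ^ is a position, the m flag can also be used to insert text at the start of every line. The helper below is a hypothetical sketch (quoteLines is not a built-in function) that prefixes each line with "> ".

```javascript
// Hypothetical helper: prefix every line with "> " by replacing the
// zero-width line-start position (needs both the g and m flags).
function quoteLines(text) {
    return text.replace(/^/gm, "> ");
}

const quoted = quoteLines("first\nsecond\nthird");
console.log(quoted);
// > first
// > second
// > third

// Without m, only the start of the whole string is a match.
const quotedOnce = "first\nsecond".replace(/^/g, "> ");
console.log(JSON.stringify(quotedOnce)); // "> first\nsecond"
```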

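4.3 Line-end anchor ($)

The line-end anchor $ matches the position at the end of a line. In JavaScript, without the m flag it matches only at the end of the whole string.

Basic usage

world$      # matches lines that end with "world"
\d+$        # matches lines that end with digits
[.!?]$      # matches lines that end with a period, exclamation mark, or question mark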

Practical use

// Validate a file extension
const isTextFile = /\.txt$/;
console.log(isTextFile.test("document.txt"));  // true
console.log(isTextFile.test("document.pdf"));  // false

// Validate an email address (simplified)
const emailPattern = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
console.log(emailPattern.test("user@example.com"));     // true
console.log(emailPattern.test("user@example"));         // false (no dot after the @)
// Note: "user@example.com." also passes, because [^\s@]+ accepts dots

// Find the lines that end with a given suffix
const text = `file1.txt
file2.pdf
file3.txt
file4.doc`;

const txtFiles = text.split('\n').filter(line => /\.txt$/.test(line));
console.log(txtFiles); // ["file1.txt", "file3.txt"]

4.4 Combining line start and line end

Matching a whole line exactly

// Match a line that is exactly "hello"
const exactMatch = /^hello$/;
console.log(exactMatch.test("hello"));        // true
console.log(exactMatch.test("hello world"));  // false
console.log(exactMatch.test("say hello"));    // false

// Match an empty line
const emptyLine = /^$/;

// Match a line that is empty or contains only whitespace
const blankLine = /^\s*$/;
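A short usage sketch for the two patterns above, with made-up sample lines. It shows that /^$/ accepts only truly empty lines, while /^\s*$/ also accepts lines made of spaces or tabs.

```javascript
const emptyLine = /^$/;
const blankLine = /^\s*$/;

// Sample data: one empty line, two whitespace-only lines, two real lines.
const lines = ["alpha", "", "   ", "\t", "beta"];

const emptyCount = lines.filter(line => emptyLine.test(line)).length;
const blankCount = lines.filter(line => blankLine.test(line)).length;
const nonBlank = lines.filter(line => !blankLine.test(line));

console.log(emptyCount, blankCount, nonBlank); // 1 3 [ 'alpha', 'beta' ]
```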

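Common validation patterns

// Mobile number (11 digits)
const phonePattern = /^\d{11}$/;

// Postal code (6 digits)
const zipPattern = /^\d{6}$/;

// Username (3-16 letters, digits, or underscores, starting with a letter)
const usernamePattern = /^[a-zA-Z][a-zA-Z0-9_]{2,15}$/;

// IP address (simplified: does not check that each part is within 0-255)
const ipPattern = /^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$/;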
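The simplified IP pattern accepts values such as 999.999.999.999. One possible stricter variant is sketched below. The names octet and strictIpPattern are illustrative, and rejecting leading zeros is an assumption made here rather than a rule taken from the text above.

```javascript
// One octet: 250-255, 200-249, 100-199, or 0-99 without a leading zero.
const octet = String.raw`(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)`;

// Four octets separated by dots, anchored at both ends of the string.
const strictIpPattern = new RegExp(`^${octet}(?:\\.${octet}){3}$`);

console.log(strictIpPattern.test("192.168.1.100"));   // true
console.log(strictIpPattern.test("999.999.999.999")); // false
console.log(strictIpPattern.test("256.1.1.1"));       // false
console.log(strictIpPattern.test("1.2.3"));           // false
```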

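4.5 Word boundary (\b)

The word boundary \b matches the position between a word character and a non-word character, or between a word character and the start or end of the string.

What counts as a word character

Word characters: letters, digits, and the underscore [a-zA-Z0-9_]
Non-word characters: spaces, punctuation, special characters, and so on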
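In JavaScript, \w (and therefore \b) is limited to [A-Za-z0-9_], so accented letters and CJK characters count as non-word characters. The sketch below shows the effect, followed by one possible workaround built from lookarounds and Unicode property escapes. The workaround is an assumption about what "word" should mean, not a built-in feature, and it needs an engine that supports lookbehind and the u flag.

```javascript
// \w covers only [A-Za-z0-9_], so "é" and CJK characters are non-word characters.
const asciiBoundary = /\bcafé\b/;
console.log(asciiBoundary.test("café"));      // false: no boundary after "é"
console.log(/\b中文\b/.test("中文"));          // false: no word characters at all

// Assumed alternative: lookarounds over Unicode letters and digits (u flag).
const unicodeWord = /(?<![\p{L}\p{N}_])café(?![\p{L}\p{N}_])/u;
console.log(unicodeWord.test("a café here")); // true
console.log(unicodeWord.test("cafés"));       // false
```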

Basic usage

\bword\b    # matches the whole word "word"
\bcat       # matches "cat" at the start of a word
cat\b       # matches "cat" at the end of a word

Practical use

// Match a whole word and avoid partial matches
const text = "I have a cat and a caterpillar";

// Without word boundaries: also matches the "cat" in "caterpillar"
const withoutBoundary = /cat/g;
console.log(text.match(withoutBoundary)); // ["cat", "cat"]

// With word boundaries: matches only the standalone "cat"
const withBoundary = /\bcat\b/g;
console.log(text.match(withBoundary)); // ["cat"]

// Find words that start with a given prefix
const startsWithPre = /\bpre\w*/g;
const text2 = "prefix, prepare, represent, pretty";
console.log(text2.match(startsWithPre)); // ["prefix", "prepare", "pretty"]

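Where word boundaries fall

const text = "hello world";
//            ^    ^^    ^
//            1    23    4
// Each caret marks the character right after the boundary
// Position 1: start of string, before "h" (word boundary)
// Position 2: between "o" and the space (word boundary)
// Position 3: between the space and "w" (word boundary)
// Position 4: after "d", end of string (word boundary)

const boundaries = /\b/g;
console.log(text.replace(boundaries, "|")); // "|hello| |world|"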
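The boundary positions can also be listed as indexes. This small sketch uses String.prototype.matchAll with zero-width patterns; the variable names are illustrative.

```javascript
const sample = "hello world";

// Every index where \b matches: start, after "hello", before "world", end.
const boundaryIndexes = [...sample.matchAll(/\b/g)].map(m => m.index);
console.log(boundaryIndexes); // [ 0, 5, 6, 11 ]

// Every remaining index is a non-word boundary (\B).
const nonBoundaryIndexes = [...sample.matchAll(/\B/g)].map(m => m.index);
console.log(nonBoundaryIndexes); // [ 1, 2, 3, 4, 7, 8, 9, 10 ]
```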

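4.6 Non-word boundary (\B)

The non-word boundary \B matches any position that is not a word boundary, such as between two word characters or between two non-word characters.

Basic usage

\Bword\B    # matches "word" surrounded by word characters
\Bcat       # matches "cat" that is not at the start of a word
cat\B       # matches "cat" that is not at the end of a word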

Practical use

// Match a pattern inside a word
const text = "JavaScript and Java are different";

// Match "ava" inside a word
const internal = /\Bava\B/g;
console.log(text.match(internal)); // ["ava"] (from "JavaScript")

// For comparison: match the whole word "Java"
const wholeWord = /\bJava\b/g;
console.log(text.match(wholeWord)); // ["Java"]

// Replace a pattern that occurs inside a word
const result = text.replace(/\Bava\B/g, "XXX");
console.log(result); // "JXXXScript and Java are different"

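4.7 String start and end (\A, \z, \Z)

Some regular expression engines provide more precise string-boundary anchors.

\A - start of string

// Like ^, but never matches at a line start in multiline mode
// Note: JavaScript does not support \A; use ^ without the m flag instead

\z and \Z - end of string

// \z - the absolute end of the string
// \Z - the end of the string, or just before a final newline (in PCRE-style engines)
// Note: JavaScript supports neither; $ without the m flag matches only at the absolute end, like \z

Emulating them in JavaScript

// Emulate \A: anchor the whole pattern at the start of the string
function stringStart(pattern) {
    // Wrap the source in (?:...) so that ^ applies to every alternative (e.g. a|b)
    // Dropping m also changes any ^ or $ inside the pattern itself
    return new RegExp('^(?:' + pattern.source + ')', pattern.flags.replace('m', ''));
}

// Emulate \z: anchor the whole pattern at the end of the string
function stringEnd(pattern) {
    return new RegExp('(?:' + pattern.source + ')$', pattern.flags.replace('m', ''));
}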
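When the m flag has to stay on, another option is to build these anchors from lookarounds. The sketch below is one commonly used idiom, not an official API. The constant names are made up for illustration, and lookbehind needs an ES2018-capable engine.

```javascript
// Lookaround-based stand-ins that ignore the m flag (names are illustrative).
const START = String.raw`(?<![\s\S])`;                      // like \A: nothing before
const END = String.raw`(?![\s\S])`;                         // like \z: nothing after
const END_OR_FINAL_NEWLINE = String.raw`(?=\n?(?![\s\S]))`; // like PCRE's \Z

const text = "alpha\nbeta\n";

// With m, ^ matches at every line start, but START matches only at index 0.
const lineStarts = text.match(/^\w+/gm);                               // ["alpha", "beta"]
const stringStartOnly = text.match(new RegExp(START + String.raw`\w+`, "gm")); // ["alpha"]

// "beta" is followed by a final newline: \Z-style matches, \z-style does not.
const beforeFinalNewline = text.match(new RegExp(String.raw`\w+` + END_OR_FINAL_NEWLINE, "gm")); // ["beta"]
const absoluteEnd = text.match(new RegExp(String.raw`\w+` + END, "gm"));                         // null

console.log(lineStarts, stringStartOnly, beforeFinalNewline, absoluteEnd);
```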

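4.8 A preview of assertions

Assertions are a more advanced form of position matching. They are introduced briefly here and covered in detail in Chapter 6.

Lookahead

foo(?=bar)  # matches "foo" that is followed by "bar"
foo(?!bar)  # matches "foo" that is not followed by "bar"

Lookbehind

(?<=foo)bar # matches "bar" that is preceded by "foo"
(?<!foo)bar # matches "bar" that is not preceded by "foo"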

A simple example

// Lookahead example
const text = "foo123 foobar foobaz";

// Match "foo" followed by a digit
const followedByDigit = /foo(?=\d)/g;
console.log(text.match(followedByDigit)); // ["foo"]

// Match "foo" not followed by "bar"
const notFollowedByBar = /foo(?!bar)/g;
console.log(text.match(notFollowedByBar)); // ["foo", "foo"] (from "foo123" and "foobaz")
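The example above covers lookahead only. For symmetry, here is a small lookbehind sketch with made-up sample text. Lookbehind is available in ES2018 and later engines.

```javascript
const order = "Price: $42, qty 7";

// Positive lookbehind: digits preceded by "$" (the "$" is not part of the match).
const prices = order.match(/(?<=\$)\d+/g);

// Negative lookbehind: whole numbers that are not preceded by "$".
const otherNumbers = order.match(/(?<!\$)\b\d+\b/g);

console.log(prices, otherNumbers); // [ '42' ] [ '7' ]
```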

4.9 Practical examples

Password validation

// The password must contain uppercase letters, lowercase letters, and digits, and be 8-16 characters long
function validatePassword(password) {
    const hasUpper = /[A-Z]/.test(password);
    const hasLower = /[a-z]/.test(password);
    const hasNumber = /\d/.test(password);
    const validLength = /^.{8,16}$/.test(password);
    
    return hasUpper && hasLower && hasNumber && validLength;
}

// A more concise version using lookaheads
const passwordPattern = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,16}$/;
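The single lookahead pattern answers only yes or no. If the caller needs to know which rule failed, separate patterns are easier to report on. The rule table and function below are a hypothetical sketch; the rule names are invented for illustration.

```javascript
// Hypothetical rule table; the names are illustrative only.
const passwordRules = [
    { name: "lowercase", pattern: /[a-z]/ },
    { name: "uppercase", pattern: /[A-Z]/ },
    { name: "digit", pattern: /\d/ },
    { name: "length 8-16", pattern: /^.{8,16}$/ },
];

// Return the names of the rules that the password does not satisfy.
function failedPasswordRules(password) {
    return passwordRules
        .filter(rule => !rule.pattern.test(password))
        .map(rule => rule.name);
}

const passwordPattern = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,16}$/;
console.log(passwordPattern.test("Passw0rdOk"));   // true
console.log(failedPasswordRules("Passw0rdOk"));    // []
console.log(failedPasswordRules("password"));      // [ 'uppercase', 'digit' ]
console.log(failedPasswordRules("Ab1"));           // [ 'length 8-16' ]
```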

Extracting data

// Extract the IP address and timestamp from a log entry
const logEntry = "2023-12-01 10:30:45 192.168.1.100 GET /api/users";

// Extract the IP address (word boundaries ensure a complete match)
const ipPattern = /\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b/;
const ip = logEntry.match(ipPattern)[0]; // "192.168.1.100"

// Extract the timestamp (^ pins the match to the start of the entry)
const timePattern = /^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}/;
const timestamp = logEntry.match(timePattern)[0]; // "2023-12-01 10:30:45"
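When several fields are needed at once, a single anchored pattern with named capture groups can replace the separate matches. This is a sketch that assumes every entry has exactly the layout of the sample line; parseLogLine and the group names are hypothetical.

```javascript
const logLine = "2023-12-01 10:30:45 192.168.1.100 GET /api/users";

// ^ and $ force the whole entry to fit the assumed layout.
const logPattern = /^(?<date>\d{4}-\d{2}-\d{2}) (?<time>\d{2}:\d{2}:\d{2}) (?<ip>\d{1,3}(?:\.\d{1,3}){3}) (?<method>[A-Z]+) (?<path>\S+)$/;

// Hypothetical helper: returns the named fields, or null if the line does not fit.
function parseLogLine(line) {
    const match = logPattern.exec(line);
    return match ? { ...match.groups } : null;
}

const parsed = parseLogLine(logLine);
console.log(parsed.ip, parsed.method, parsed.path); // 192.168.1.100 GET /api/users
console.log(parseLogLine("not a log line"));        // null
```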

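Text cleanup

// Clean up extra whitespace
// [ \t] is used instead of \s because \s also matches \n and would swallow line breaks
function cleanText(text) {
    return text
        .replace(/^[ \t]+/gm, '')      // remove leading spaces and tabs on each line
        .replace(/[ \t]+$/gm, '')      // remove trailing spaces and tabs on each line
        .replace(/\n{3,}/g, '\n\n');   // collapse runs of blank lines into one blank line
}
// To drop blank lines entirely, use .replace(/\n{2,}/g, '\n') as the last step instead

// Remove HTML tags but keep their content (suitable for simple markup only)
const removeHtmlTags = /<[^>]*>/g;
const htmlText = "<p>Hello <strong>world</strong>!</p>";
const plainText = htmlText.replace(removeHtmlTags, ''); // "Hello world!"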

4.10 Common pitfalls and caveats

Pitfall 1: ^ and $ in multiline mode

const text = "first line\nsecond line";

// Without the m flag
console.log(/^second/.test(text)); // false

// With the m flag
console.log(/^second/m.test(text)); // true

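Pitfall 2: how a word boundary is defined

const text = "hello-world";

// \b treats the hyphen as a boundary (because - is not a word character)
console.log(text.match(/\bhello\b/)); // ["hello"]
console.log(text.match(/\bworld\b/)); // ["world"]

// So a search for the word "hello" also hits the hyphenated "hello-world",
// which may not be what we want. The full compound still matches as expected:
console.log(text.match(/\bhello-world\b/)); // ["hello-world"]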
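If hyphenated compounds should count as single words, \b is the wrong tool. One possible approach, sketched here under the assumption that a hyphen is part of a word, is to replace \b with lookarounds that also reject a neighbouring hyphen.

```javascript
// Assumption: a "word" may contain hyphens, so a neighbouring letter, digit,
// underscore, or hyphen means we are still inside the same word.
const standaloneHello = /(?<![\w-])hello(?![\w-])/;

console.log(standaloneHello.test("hello-world")); // false
console.log(standaloneHello.test("say hello!"));  // true
console.log(/\bhello\b/.test("hello-world"));     // true
```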

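Pitfall 3: empty strings and boundaries

// An empty string matches both line start and line end
console.log(/^$/.test("")); // true

// But note how \b behaves on an empty string
console.log(/\b/.test("")); // false (there is no word character)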

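4.11 Exercises

Exercise 1: validating formats

Write regular expressions that validate the following formats:

Chinese mobile number (11 digits, starting with 1)
Email address (username@domain.suffix)
URL (starting with http:// or https://)

// Answers
const phonePattern = /^1\d{10}$/;
const emailPattern = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
const urlPattern = /^https?:\/\/.+/;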

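Exercise 2: text processing

Extract the word at the start of each sentence (one sentence per line)
Match file names that end with a given suffix
Remove the numbering at the start of each line

// Answers
const firstWords = /^\w+/gm;   // \w+ already ends at a word boundary, so no \b is needed
const imageFiles = /\.(jpg|png|gif|bmp)$/i;
const removeNumbers = /^\d+\.\s*/gm;

const text = "1. First item\n2. Second item\n3. Third item";
const cleaned = text.replace(removeNumbers, ''); // "First item\nSecond item\nThird item"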
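A quick way to check the first two answers is to run them on sample input. The data below is made up, and the first-word pattern is written here as /^\w+/gm.

```javascript
const firstWords = /^\w+/gm;
const imageFiles = /\.(jpg|png|gif|bmp)$/i;

// With the m flag, ^ picks up the first word of every line.
const poem = "Roses are red\nViolets are blue";
const starts = poem.match(firstWords);
console.log(starts); // [ 'Roses', 'Violets' ]

// $ rejects names where the image suffix is not at the very end.
const names = ["cat.PNG", "notes.txt", "photo.jpg.bak", "logo.gif"];
const images = names.filter(name => imageFiles.test(name));
console.log(images); // [ 'cat.PNG', 'logo.gif' ]
```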

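Exercise 3: advanced application

Write a function that checks whether a password meets the following requirements:

8-20 characters long
Contains an uppercase letter
Contains a lowercase letter
Contains a digit
Contains a special character

// Answer
function validateStrongPassword(password) {
    const patterns = [
        /^.{8,20}$/,           // length check
        /[A-Z]/,               // uppercase letter
        /[a-z]/,               // lowercase letter
        /\d/,                  // digit
        /[!@#$%^&*(),.?":{}|<>]/ // special character
    ];
    
    return patterns.every(pattern => pattern.test(password));
}

// Or, more concisely, with lookaheads
const strongPasswordPattern = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%^&*(),.?":{}|<>]).{8,20}$/;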

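Summary

Position matching and boundaries are key tools for precise matching in regular expressions:

Line-start and line-end anchors: ^ and $ match the start and end of a line
Word boundaries: \b matches a word boundary, \B matches a non-word boundary
Combined use: combining anchors enables exact format validation
Multiline mode: changes how ^ and $ behave
Assertion preview: groundwork for advanced position matching

Mastering these concepts is essential for writing accurate regular expressions, because they give precise control over where a match begins and ends.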