Chapter 3: Quantifiers and Repetition

Haiyue · 2025/9/1 · about 7 min read


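Learning Objectives

Master the basic quantifiers: *, +, ?
Use the counted quantifiers: {n}, {n,}, {n,m}
Understand the difference between greedy and lazy (non-greedy) matching
Use the lazy quantifiers: *?, +?, ??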

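3.1 Overview of Basic Quantifiers

A quantifier specifies how many times the preceding character or pattern must match. Quantifiers are at the heart of what makes regular expressions flexible and powerful.

What a quantifier applies to

A quantifier applies only to the single character, character class, or group immediately before it:

a*          # * applies to the character 'a'
(abc)*      # * applies to the group 'abc'
[0-9]+      # + applies to the character class [0-9]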
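One quick way to see the difference is to run the same + against a character, a group, and a character class. The sample string below is made up for illustration.

```javascript
// The same + quantifier repeats a different unit depending on what precedes it
const sample = "ababab aaa bbb";

const charOnly = sample.match(/ab+/g);     // + repeats only the "b"
const wholeGroup = sample.match(/(ab)+/g); // + repeats the group "ab"
const anyOfClass = sample.match(/[ab]+/g); // + repeats any single "a" or "b"

console.log(charOnly);   // ["ab", "ab", "ab"]
console.log(wholeGroup); // ["ababab"]
console.log(anyOfClass); // ["ababab", "aaa", "bbb"]
```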

3.2 Asterisk (*) - Zero or More Times

The asterisk matches the preceding character or group zero or more times.

Basic usage

a*          # matches "", "a", "aa", "aaa", ...
ab*         # matches "a", "ab", "abb", "abbb", ...
[0-9]*      # matches "", "1", "123", "999999", ...

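Practical use

// Match optional whitespace
const pattern1 = /\s*/;

// Match digits (the empty string also matches)
const pattern2 = /\d*/;

// Match a filename (a letter followed by any number of letters or digits)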
const filename = /[a-zA-Z][a-zA-Z0-9]*/;
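When used with test(), these unanchored patterns succeed if any part of the string fits. The sketch below compares the filename pattern with an anchored variant that uses ^ and $. The sample names are made up.

```javascript
// Unanchored: succeeds if any substring fits the pattern
const filenameLoose = /[a-zA-Z][a-zA-Z0-9]*/;
// Anchored with ^ and $: the whole string must fit
const filenameStrict = /^[a-zA-Z][a-zA-Z0-9]*$/;

for (const name of ["report2025", "2025report", "my-file"]) {
  console.log(name, filenameLoose.test(name), filenameStrict.test(name));
}
// report2025 true true
// 2025report true false
// my-file true false
```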

Common pitfall

const text = "abc";
const result = text.match(/a*b*/);
console.log(result[0]); // "ab"

const text2 = "xyz";
const result2 = text2.match(/a*b*/);
console.log(result2[0]); // "" (a* and b* both match zero times, so the match succeeds but is empty)
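The empty-match behavior is easier to see with the g flag. A pattern that can match zero characters produces an empty match at every position where it finds nothing else. A short sketch:

```javascript
// a* can match zero characters, so with /g it also reports empty matches
const allMatches = "xaay".match(/a*/g);
console.log(allMatches); // ["", "aa", "", ""]

// Keep only the non-empty runs (or use /a+/g instead)
const nonEmpty = allMatches.filter((m) => m.length > 0);
console.log(nonEmpty); // ["aa"]
```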

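3.3 Plus (+) - One or More Times

The plus sign matches the preceding character or group one or more times.

Basic usage

a+          # matches "a", "aa", "aaa", ... (never the empty string)
ab+         # matches "ab", "abb", "abbb", ...
[0-9]+      # matches "1", "123", "999999", ... (at least one digit)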

Practical use

// Match one or more digits
const numbers = /\d+/g;
const text = "I have 123 apples and 456 oranges";
console.log(text.match(numbers)); // ["123", "456"]

// Match words (at least one letter)
const words = /[a-zA-Z]+/g;
const sentence = "Hello world! 123";
console.log(sentence.match(words)); // ["Hello", "world"]
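Because + consumes the whole run of digits, each match is a complete number that can be converted directly. Here is a minimal sketch using an English version of the sample sentence. The `|| []` guards against `match()` returning null when there are no digits.

```javascript
const sentenceWithNumbers = "I have 123 apples and 456 oranges";

// \d+ grabs each full run of digits; Number() converts each match
const values = (sentenceWithNumbers.match(/\d+/g) || []).map(Number);
console.log(values); // [123, 456]

const total = values.reduce((sum, n) => sum + n, 0);
console.log(total); // 579
```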

Difference from the asterisk

const text = "bcd";

// With a*
console.log(/a*/.exec(text)[0]); // "" (matches the empty string at index 0)

// With a+
console.log(/a+/.exec(text)); // null (no match)

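3.4 Question Mark (?) - Zero or One Time

The question mark matches the preceding character or group zero times or once, which makes it optional.

Basic usage

a?          # matches "" or "a"
ab?         # matches "a" or "ab"
colou?r     # matches "color" or "colour"

Practical use

// Optional "s" in the protocol: http or https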
const url = /https?:\/\//;
console.log(url.test("http://example.com"));  // true
console.log(url.test("https://example.com")); // true

// Optional minus sign
const number = /-?\d+/;
console.log(number.test("123"));  // true
console.log(number.test("-123")); // true

// British and American spellings
const spelling = /colou?r/;
console.log(spelling.test("color"));  // true
console.log(spelling.test("colour")); // true
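? also applies to a whole group, which is how an optional fractional part is usually written. The sketch below is a simple decimal-number check. It is an illustration, not a complete number grammar, and it rejects forms such as ".5" or "1e3".

```javascript
// -?        optional minus sign
// \d+       integer part: at least one digit
// (\.\d+)?  optional group: a dot followed by at least one digit
const decimal = /^-?\d+(\.\d+)?$/;

for (const s of ["42", "-3.14", "3.", ".5"]) {
  console.log(s, decimal.test(s));
}
// 42 true
// -3.14 true
// 3. false
// .5 false
```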

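3.5 Counted Quantifiers {n}, {n,}, {n,m}

Counted quantifiers let you state exactly how many times something must repeat, or give a minimum and a maximum.

{n} - exactly n times

a{3}        # matches "aaa"
\d{4}       # matches exactly 4 digits
[A-Z]{2}    # matches exactly 2 uppercase letters

{n,} - at least n times

a{3,}       # matches "aaa", "aaaa", "aaaaa", ...
\d{2,}      # matches 2 or more digits
\w{5,}      # matches 5 or more word characters (letters, digits, or underscore)

{n,m} - between n and m times (inclusive)

a{2,5}      # matches "aa", "aaa", "aaaa", "aaaaa"
\d{3,6}     # matches 3 to 6 digits
[a-z]{1,10} # matches 1 to 10 lowercase letters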

Practical use

// Validate a mobile phone number (11 digits)
const phone = /^\d{11}$/;
console.log(phone.test("13812345678")); // true

// Validate password length (6 to 20 characters)
const password = /^.{6,20}$/;
console.log(password.test("123456")); // true

// Match a postal code (6 digits)
const zipCode = /^\d{6}$/;
console.log(zipCode.test("100000")); // true

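// Match a hex color (3 or 6 hex digits); ^ and $ require the whole string to fit,
// so inputs such as "#FF00" are rejected
const hexColor = /^#([0-9a-fA-F]{3}|[0-9a-fA-F]{6})$/;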
console.log(hexColor.test("#F00"));    // true
console.log(hexColor.test("#FF0000")); // true
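The ^ and $ anchors in these examples matter. A counted quantifier limits only the run it matches, not the length of the whole string. A sketch with made-up inputs:

```javascript
const twelveDigits = "138123456789"; // one digit too many for a mobile number

// Unanchored: finds 11 digits inside the 12, so test() succeeds
const unanchoredHit = /\d{11}/.test(twelveDigits);
console.log(unanchoredHit); // true

// Anchored: the whole string must be exactly 11 digits
const anchoredHit = /^\d{11}$/.test(twelveDigits);
console.log(anchoredHit); // false

// {n,m} is greedy: it takes as many repetitions as possible, up to m
const longestRun = "1234567".match(/\d{3,6}/)[0];
console.log(longestRun); // "123456"
```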

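3.6 Greedy vs. Lazy Matching

This is a key concept in regular expressions. When a quantifier could match either more or fewer characters and the overall match would still succeed, greediness decides which it does.

Greedy matching (the default)

A greedy quantifier matches as many characters as possible. It gives characters back only when the rest of the pattern would otherwise fail:

const text = "<div>Hello</div><div>World</div>";

// Greedy
const greedy = /<div>.*<\/div>/;
console.log(text.match(greedy)[0]);
// Result: "<div>Hello</div><div>World</div>"
// .* first runs to the end of the string, then backs off only as far as the last </div>,
// so the match spans from the first <div> to the last </div>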

The problem with greedy matching

const html = '<p class="text">Content 1</p><p class="text">Content 2</p>';
const greedyPattern = /<p.*<\/p>/;
const result = html.match(greedyPattern);
console.log(result[0]);
// Matches the entire string instead of a single <p> element
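Lazy quantifiers are one fix. Another common one is a negated character class such as [^<]*, which cannot run past the next "<". The sketch below uses that different technique, not the lazy approach this section teaches. It assumes the element content contains no nested tags.

```javascript
const paragraphs = '<p class="text">Content 1</p><p class="text">Content 2</p>';

// [^>]* stays inside the opening tag; [^<]* stops at the next "<"
const perElement = paragraphs.match(/<p[^>]*>[^<]*<\/p>/g);
console.log(perElement);
// ['<p class="text">Content 1</p>', '<p class="text">Content 2</p>']
```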

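Lazy (non-greedy) matching

Adding ? after a quantifier makes it lazy. It matches as few characters as possible and extends the match only when the rest of the pattern would otherwise fail:

const text = "<div>Hello</div><div>World</div>";

// Lazy
const nonGreedy = /<div>.*?<\/div>/;
console.log(text.match(nonGreedy)[0]);
// Result: "<div>Hello</div>"
// Only the first complete div element is matched

// Match every div element
const allDivs = text.match(/<div>.*?<\/div>/g);
console.log(allDivs);
// Result: ["<div>Hello</div>", "<div>World</div>"]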
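Lazy quantifiers work the same way inside replace(). The sketch below rewrites each div element using its captured content, where $1 in the replacement string refers to the first capture group.

```javascript
const divs = "<div>Hello</div><div>World</div>";

// (.*?) captures each element's content; $1 inserts it into the replacement
const bracketed = divs.replace(/<div>(.*?)<\/div>/g, "[$1]");
console.log(bracketed); // "[Hello][World]"
```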

3.7 All Lazy Quantifiers

List of lazy quantifiers

*?          # zero or more (lazy)
+?          # one or more (lazy)
??          # zero or one (lazy)
{n,m}?      # n to m times (lazy)
{n,}?       # at least n times (lazy)

Side-by-side comparison

const text = "aaaa";

// Greedy
console.log(text.match(/a+/)[0]);     // "aaaa"
console.log(text.match(/a{2,}/)[0]);  // "aaaa"

// Lazy
console.log(text.match(/a+?/)[0]);    // "a"
console.log(text.match(/a{2,}?/)[0]); // "aa"
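A lazy quantifier still has to let the rest of the pattern match, and the engine always prefers the leftmost starting position. So lazy does not mean "shortest possible match overall". A sketch:

```javascript
// +? would stop after one "a", but a "b" must follow, so it keeps extending
const plusLazy = "aaab".match(/a+?b/)[0];
console.log(plusLazy); // "aaab"

// The leftmost starting position that works (index 0) wins,
// even though the shorter match "ab" exists further to the right
const starLazy = "aaab".match(/a*?b/)[0];
console.log(starLazy); // "aaab"

// ?? tries zero first and takes the "a" only when "b" needs it
const optLazyAlone = "ab".match(/a??/)[0];
const optLazyThenB = "ab".match(/a??b/)[0];
console.log(optLazyAlone); // ""
console.log(optLazyThenB); // "ab"
```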

Extracting HTML tags

const html =
  "<h1>Title 1</h1>" +
  "<p>Paragraph 1</p>" +
  "<h1>Title 2</h1>" +
  "<p>Paragraph 2</p>";

// Greedy - the wrong approach
const greedyTags = html.match(/<h1>.*<\/h1>/g);
console.log(greedyTags);
// ["<h1>Title 1</h1><p>Paragraph 1</p><h1>Title 2</h1>"] (one overlong match)

// Lazy - the right approach
const lazyTags = html.match(/<h1>.*?<\/h1>/g);
console.log(lazyTags);
// ["<h1>Title 1</h1>", "<h1>Title 2</h1>"]
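A related detail: by default . does not match line breaks, so neither .* nor .*? can cross a newline. When element content spans several lines, the s (dotAll) flag, available since ES2018, lets . match newlines as well. A sketch:

```javascript
const multiLine = "<h1>Title\ncontinued</h1>";

// Without the s flag, . cannot match "\n", so the pattern finds nothing
const withoutS = multiLine.match(/<h1>.*?<\/h1>/);
console.log(withoutS); // null

// With the s flag, . matches "\n" as well, so the whole element matches
const withS = multiLine.match(/<h1>.*?<\/h1>/s);
console.log(withS[0] === multiLine); // true
```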

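3.8 Practical Examples

Extracting quoted text

const text = 'He said "Hello" and she replied "Hi there!"';

// Use lazy matching to extract each quoted string
const quotes = text.match(/".*?"/g);
console.log(quotes); // ['"Hello"', '"Hi there!"']

// Extract only the text inside the quotes (without the quote marks)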
const quotedText = text.match(/"(.*?)"/g).map(match => match.slice(1, -1));
console.log(quotedText); // ['Hello', 'Hi there!']
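Instead of slicing off the quote marks, a capture group can return the inner text directly. The sketch below uses matchAll() (ES2020), which requires the g flag.

```javascript
const dialogue = 'He said "Hello" and she replied "Hi there!"';

// Each match object m holds the whole match in m[0] and the group in m[1]
const inner = [...dialogue.matchAll(/"(.*?)"/g)].map((m) => m[1]);
console.log(inner); // ["Hello", "Hi there!"]
```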

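Matching repeated characters

// (.) captures any character; \1 refers back to it, so \1+ requires at least one more copy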
const repeatedChars = /(.)\1+/g;
const text = "aabbcccddddd";
console.log(text.match(repeatedChars)); // ["aa", "bb", "ccc", "ddddd"]
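Changing + to * lets the same backreference split a string into runs of any length, including 1. The sketch below does simple run-length encoding with replace() and a callback. The output format (character followed by count) and the function name `runLengthEncode` are only for illustration.

```javascript
// (.)\1* matches one character plus any further copies of it
function runLengthEncode(s) {
  return s.replace(/(.)\1*/g, (run, ch) => ch + run.length);
}

console.log(runLengthEncode("aabbcccddddd")); // "a2b2c3d5"
console.log(runLengthEncode("abc"));          // "a1b1c1"
```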

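Validating formats

// QQ number (5 to 11 digits, no leading 0)
const qqPattern = /^[1-9]\d{4,10}$/;

// Username (starts with a letter; 3 to 16 letters, digits, or underscores in total)
const usernamePattern = /^[a-zA-Z][a-zA-Z0-9_]{2,15}$/;

// Password (8 to 16 letters and digits, with at least one letter and one digit;
// the two (?=...) lookaheads check the "contains" conditions)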
const passwordPattern = /^(?=.*[a-zA-Z])(?=.*\d)[a-zA-Z\d]{8,16}$/;
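A simple way to check validation patterns like these is to run them against inputs that should pass and inputs that should fail. A sketch with made-up sample values:

```javascript
const qqPattern = /^[1-9]\d{4,10}$/;
const usernamePattern = /^[a-zA-Z][a-zA-Z0-9_]{2,15}$/;
const passwordPattern = /^(?=.*[a-zA-Z])(?=.*\d)[a-zA-Z\d]{8,16}$/;

// [pattern, input, expected result]
const cases = [
  [qqPattern, "12345", true],           // 5 digits
  [qqPattern, "01234", false],          // leading 0
  [qqPattern, "123456789012", false],   // 12 digits, too long
  [usernamePattern, "abc", true],
  [usernamePattern, "ab", false],       // too short
  [usernamePattern, "1abc", false],     // must start with a letter
  [passwordPattern, "abc12345", true],
  [passwordPattern, "abcdefgh", false], // no digit
  [passwordPattern, "abc123!@", false], // "!" and "@" are not allowed
];

for (const [pattern, input, expected] of cases) {
  const ok = pattern.test(input) === expected;
  console.log(ok ? "ok  " : "FAIL", input);
}
```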

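3.9 Performance Considerations

Performance problems with greedy quantifiers

// Potential performance problem: in a*a*b, two quantifiers compete for the same "a"s.
// When no "b" follows, a backtracking engine may retry every way of splitting
// the run of "a"s between them, and it does so at every start position.
const text = "a".repeat(10000); // no "b", so the match has to fail
const problematicPattern = /a*a*b/;

// Better: a single quantifier leaves only one way to consume the "a"s
const betterPattern = /a*b/;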
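A rough way to observe the effect is to time the failing case at a few input sizes. The sketch below keeps the sizes small so it finishes quickly. Actual timings depend on the engine, since some engines optimize such patterns, so no numbers are claimed here. The helper name `timeFailure` is hypothetical.

```javascript
// Hypothetical helper: time how long a pattern takes to fail on n "a"s with no "b"
function timeFailure(pattern, n) {
  const input = "a".repeat(n);
  const start = Date.now();
  const result = pattern.test(input); // always false: there is no "b"
  return { n, result, ms: Date.now() - start };
}

for (const n of [100, 200, 400]) {
  console.log("a*a*b", timeFailure(/a*a*b/, n));
  console.log("a*b  ", timeFailure(/a*b/, n));
}
```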

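Comparing quantifier efficiency

// Less efficient: a chain of individually optional characters
const inefficient = /a?a?a?aaa/;

// More efficient: one counted quantifier (both match 3 to 6 "a"s)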
const efficient = /a{3,6}/;
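The two patterns accept the same strings. The sketch below checks this for runs of 0 to 8 "a"s, anchoring both with ^ and $ so the whole string is compared.

```javascript
const chained = /^a?a?a?aaa$/;
const counted = /^a{3,6}$/;

for (let n = 0; n <= 8; n++) {
  const s = "a".repeat(n);
  console.log(n, chained.test(s), counted.test(s));
}
// Both columns are true exactly for n = 3 through 6
```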

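3.10 Common Mistakes and Pitfalls

Mistake 1: Forgetting what a quantifier applies to

// Wrong if the whole word should be optional: ? applies only to "s"
const wrong = /cats?/; // matches "cat" or "cats"

// Right: group the word so ? applies to all of "cats"
const correct = /(cats)?/; // matches "" or "cats"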
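Testing both patterns on a few words shows the scope difference. It also reveals a second trap: a pattern that can match the empty string, such as /(cats)?/, succeeds on any input unless it is anchored. A sketch:

```javascript
const words = ["cat", "cats", "dog"];
const onlyS = /cats?/;        // ? applies to "s"
const wholeWord = /(cats)?/;  // ? applies to the group, so it can match ""
const anchored = /^(cats)?$/; // the whole string must be "" or "cats"

for (const w of words) {
  console.log(w, onlyS.test(w), wholeWord.test(w), anchored.test(w));
}
// cat true true false
// cats true true true
// dog false true false
```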

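Mistake 2: Overmatching caused by greedy quantifiers

// Wrong: runs from the first <!-- to the last --> on the line,
// swallowing any text between two comments
const wrong = /<!--.*-->/;

// Right: use lazy matching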
const correct = /<!--.*?-->/;
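The difference shows up as soon as one line contains two comments. A sketch with a made-up input string:

```javascript
const line = "a <!-- one --> b <!-- two --> c";

const greedyComment = line.match(/<!--.*-->/)[0];
const lazyComment = line.match(/<!--.*?-->/)[0];
console.log(greedyComment); // "<!-- one --> b <!-- two -->"
console.log(lazyComment);   // "<!-- one -->"

// Stripping comments: the greedy version would also delete the " b "
const stripped = line.replace(/<!--.*?-->/g, "");
console.log(stripped); // "a  b  c"
```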

Mistake 3: Quantifier edge cases

// Watch out for empty matches
const pattern = /\d*/;
console.log("abc".match(pattern)[0]); // "" (empty string)

// If at least one digit is required
const pattern2 = /\d+/;
console.log("abc".match(pattern2)); // null

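3.11 Exercises

Exercise 1: Basic quantifiers

Write regular expressions that:

Match a domain name with an optional leading "www."
Match one or more consecutive digits
Match a numeric password of exactly 8 digits

// Answers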
const domain = /(www\.)?[a-zA-Z0-9-]+\.[a-z]{2,}/;
const numbers = /\d+/;
const password = /^\d{8}$/;
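Here are a few sample inputs to check these answers against. The inputs are made up. Note that `domain` is not anchored, so it also finds a domain inside a longer string.

```javascript
const domain = /(www\.)?[a-zA-Z0-9-]+\.[a-z]{2,}/;
const numbers = /\d+/;
const password = /^\d{8}$/;

console.log(domain.test("www.example.com"));   // true
console.log(domain.test("example.com"));       // true
console.log("abc 2025 def".match(numbers)[0]); // "2025"
console.log(password.test("12345678")); // true
console.log(password.test("1234567"));  // false (only 7 digits)
console.log(password.test("1234abcd")); // false (contains letters)
```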

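Exercise 2: Greedy vs. lazy

Given an HTML string, extract every element (opening tag, content, and closing tag):

const html = "<p>Paragraph 1</p><div>Content</div><p>Paragraph 2</p>";

// Answer: use lazy matching
const tags = html.match(/<[^>]+>.*?<\/[^>]+>/g);
console.log(tags);
// ["<p>Paragraph 1</p>", "<div>Content</div>", "<p>Paragraph 2</p>"]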
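The answer above does not check that the closing tag matches the opening one, so "<p>x</div>" would also be accepted. Combining a lazy quantifier with a backreference, as in 3.8, is one way to require matching names. This is a sketch that assumes simple tags without attributes.

```javascript
const pageHtml = "<p>Paragraph 1</p><div>Content</div><p>Paragraph 2</p>";

// (\w+) captures the tag name; <\/\1> requires the same name in the closing tag
const sameTag = /<(\w+)>.*?<\/\1>/g;

const elements = pageHtml.match(sameTag);
console.log(elements);
// ["<p>Paragraph 1</p>", "<div>Content</div>", "<p>Paragraph 2</p>"]

const mismatched = "<p>x</div>".match(sameTag);
console.log(mismatched); // null
```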

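Exercise 3: Real-world validation

Write regular expressions that validate:

A Chinese mobile phone number (11 digits, starting with 1)
A Chinese resident ID number (18 characters; the last one is a check character that may be X)
An email address (simple version)

// Answers
const phone = /^1\d{10}$/;
const idCard = /^\d{17}[\dXx]$/; // 17 digits plus a final digit or X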
const email = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
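Here are sample checks for these answers, using made-up inputs. The ID pattern below allows a final X and checks only the format; it does not verify the embedded date or the check digit. The email pattern is deliberately loose.

```javascript
const phone = /^1\d{10}$/;
const idCard = /^\d{17}[\dXx]$/; // 17 digits plus a check character (digit or X)
const email = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

console.log(phone.test("13812345678"));         // true
console.log(phone.test("23812345678"));         // false (does not start with 1)
console.log(idCard.test("11010519491231002X")); // true (format only)
console.log(idCard.test("1101051949123100"));   // false (16 characters)
console.log(email.test("user@example.com"));    // true
console.log(email.test("user@example"));        // false (no dot after @)
```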

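Summary

Quantifiers are the main tool for controlling how many times part of a pattern matches:

Basic quantifiers: * (zero or more), + (one or more), ? (zero or one)
Counted quantifiers: {n}, {n,}, {n,m} give precise control over the number of repetitions
Greedy vs. lazy: quantifiers are greedy by default; appending ? makes them lazy
Performance: choosing quantifiers carefully (for example, not stacking several over the same characters) keeps matching efficient
Common pitfalls: watch what each quantifier applies to, and remember that * and ? can match the empty string

Mastering quantifiers is a key step toward writing efficient, accurate regular expressions.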