Chapter 6: Advanced Assertion Techniques

Haiyue

Learning Objectives

  1. Understand positive lookahead assertions (?=) in depth
  2. Master negative lookahead assertions (?!)
  3. Learn to use positive lookbehind assertions (?<=)
  4. Master negative lookbehind assertions (?<!)
  5. Understand techniques for combining assertions

6.1 Overview of Assertions

Assertions are zero-width matches that don’t consume characters; they only check whether a specific condition is met at a position. Assertions enable more precise pattern matching.

Types of Assertions

  • Lookahead Assertions: Check content after the current position
  • Lookbehind Assertions: Check content before the current position
  • Positive Assertions: Check if a specific pattern matches
  • Negative Assertions: Check if a specific pattern doesn’t match
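
The four combinations can be compared side by side on one string. The following is a minimal sketch; the sample string and variable names are illustrative assumptions, not part of the chapter's later examples.

```javascript
// Illustrative sample string (an assumption for this sketch)
const sample = "foobar foobaz xbar";

// Lookahead inspects the text after the current position
const positiveLookahead = /foo(?=bar)/.exec(sample);   // "foo" followed by "bar"
const negativeLookahead = /foo(?!bar)/.exec(sample);   // "foo" not followed by "bar"

// Lookbehind inspects the text before the current position
const positiveLookbehind = /(?<=foo)bar/.exec(sample); // "bar" preceded by "foo"
const negativeLookbehind = /(?<!foo)bar/.exec(sample); // "bar" not preceded by "foo"

console.log(positiveLookahead.index);  // 0
console.log(negativeLookahead.index);  // 7
console.log(positiveLookbehind.index); // 3
console.log(negativeLookbehind.index); // 15
```

Each pattern matches the same literal text ("foo" or "bar"); only the assertion decides which occurrence qualifies.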

Characteristics of Assertions

  • Zero-width: The text an assertion inspects is not included in the match result
  • Positional: An assertion succeeds or fails at a position; it does not advance the match
  • Conditional: The surrounding pattern matches only where the assertion's condition holds
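
The zero-width behavior can be observed directly from the match result. This is a small sketch with an illustrative input string; the variable names are assumptions.

```javascript
const input = "foobar";

// The lookahead checks for "bar", but only "foo" ends up in the match
const withLookahead = /foo(?=bar)/.exec(input);
console.log(withLookahead[0]);        // "foo"
console.log(withLookahead[0].length); // 3

// A pattern made of nothing but an assertion matches an empty string at a position
const positionOnly = /(?=bar)/.exec(input);
console.log(positionOnly[0] === ""); // true
console.log(positionOnly.index);     // 3

// Because "bar" was not consumed, the rest of the pattern can still match it
const reuse = /foo(?=bar)bar/.test(input);
console.log(reuse); // true
```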

6.2 Positive Lookahead (?=)

Positive lookahead (?=pattern) matches a position immediately followed by a specific pattern.

Basic Syntax

// Syntax: mainPattern(?=lookaheadCondition)
const pattern = /foo(?=bar)/;
console.log("foobar".match(pattern));   // ["foo"]
console.log("foobaz".match(pattern));   // null

// Position matching example
const text = "foobar";
console.log(text.replace(/foo(?=bar)/, "XXX")); // "XXXbar"

Practical Applications

Password Strength Validation

// Validate password contains uppercase, lowercase, and digits, 8-16 characters long
const strongPassword = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,16}$/;

function validatePassword(password) {
    return strongPassword.test(password);
}

console.log(validatePassword("Password123"));  // true
console.log(validatePassword("password"));     // false (missing uppercase and digits)
console.log(validatePassword("PASSWORD123"));  // false (missing lowercase)

Finding Words in Specific Context

// Find words followed by numbers
const wordBeforeNumber = /\w+(?=\s+\d+)/g;
const text = "item 123, product 456, name abc, count 789";
console.log(text.match(wordBeforeNumber)); // ["item", "product", "count"]

// Find attribute names in an HTML tag
const attributePattern = /[\w-]+(?==)/g; // Attribute name (hyphens allowed) followed by an equals sign
const html = '<div class="container" id="main" data-value="test">';
console.log(html.match(attributePattern)); // ["class", "id", "data-value"]

Complex Lookahead Assertions

// Match "the" unless the next word is "end" (this uses a negative lookahead, covered in Section 6.3)
const theNotFollowedByEnd = /\bthe(?!\s+end\b)/gi;
const text = "the beginning, the middle, the end";
console.log(text.match(theNotFollowedByEnd)); // ["the", "the"]

// Match filename without extension
const filenameWithoutExt = /[\w-]+(?=\.\w+)/g;
const files = "document.pdf image.jpg script.js readme.txt";
console.log(files.match(filenameWithoutExt)); // ["document", "image", "script", "readme"]

6.3 Negative Lookahead (?!)

Negative lookahead (?!pattern) matches a position not followed by a specific pattern.

Basic Usage

// Match "foo" not followed by "bar"
const pattern = /foo(?!bar)/;
console.log("foobar".match(pattern));   // null
console.log("foobaz".match(pattern));   // ["foo"]

// Practical example
const text = "Java JavaScript Python";
// Match "Java" but not "JavaScript"
const javaNotScript = /Java(?!Script)/g;
console.log(text.match(javaNotScript)); // ["Java"]

Practical Applications

Validate String Without Specific Characters

// Validate password doesn't contain common weak password patterns
const noWeakPattern = /^(?!.*(?:123|abc|password|admin)).+$/i;

function isNotWeakPassword(password) {
    return noWeakPattern.test(password);
}

console.log(isNotWeakPassword("mypassword123")); // false (contains "password" and "123")
console.log(isNotWeakPassword("StrongPass456")); // true

Match Non-Comment Lines

// Match lines not starting with //
const nonCommentLine = /^(?!\s*\/\/).*\S.*/gm;
const code = `
// This is a comment
console.log("Hello");
// Another comment
const x = 42;
`;

const codeLines = code.match(nonCommentLine);
console.log(codeLines); // ['console.log("Hello");', 'const x = 42;']
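
Negative lookahead is also a common way to exclude whole words while still allowing longer words that merely start with them. The sketch below assumes a hypothetical list of reserved words; the list and the variable names are illustrative only.

```javascript
// Hypothetical list of words to skip (an assumption for this sketch)
const reserved = ["if", "else", "for"];

// (?!(?:if|else|for)\b) rejects a position where a complete reserved word starts;
// the inner \b keeps longer words such as "iffy" or "format" matchable
const identifier = new RegExp(
    `\\b(?!(?:${reserved.join("|")})\\b)[A-Za-z_]\\w*\\b`,
    "g"
);

const source = "if ready else format for iffy";
console.log(source.match(identifier)); // ["ready", "format", "iffy"]
```

Without the inner \b, the pattern would also reject "format" and "iffy", because each begins with a reserved word.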

6.4 Positive Lookbehind (?<=)

Positive lookbehind (?<=pattern) matches a position preceded by a specific pattern.

Basic Usage

// Match numbers preceded by "$"
const pricePattern = /(?<=\$)\d+(\.\d{2})?/g;
const text = "The price is $19.99 and tax is 5%.";
console.log(text.match(pricePattern)); // ["19.99"]

// Match content within tags
const tagContent = /(?<=<b>).*?(?=<\/b>)/g;
const html = "<b>Bold text</b> and <b>more bold</b>";
console.log(html.match(tagContent)); // ["Bold text", "more bold"]

Practical Applications

Extract Data in Specific Format

// Extract username after @ symbol
const username = /(?<=@)\w+/g;
const text = "Contact @john or @sarah for help";
console.log(text.match(username)); // ["john", "sarah"]

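// Extract content between matching quotes.
// Lookarounds alone do not work here: they never consume the quote characters,
// so a closing quote can be reused as the opening quote of the next match.
// Consume the quotes and read the content from a capture group instead.
const quotedContent = /(["'])(.*?)\1/g;
const text2 = 'He said "Hello" and she replied \'Hi there\'';
console.log([...text2.matchAll(quotedContent)].map(m => m[2])); // ["Hello", "Hi there"]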

Format Replacement

// Add currency symbol before numbers, but only for specific context
const text = "Price: 100, Cost: 50, ID: 123";
// Only add $ symbol to numbers after Price and Cost
const formatted = text.replace(/(?<=Price: |Cost: )\d+/g, "$$$&");
console.log(formatted); // "Price: $100, Cost: $50, ID: 123"

6.5 Negative Lookbehind (?<!)

Negative lookbehind (?<!pattern) matches a position not preceded by a specific pattern.

Basic Usage

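// Match numbers that are not part of a $ amount.
// The lookbehind \$[\d.]* also rejects the "99" in "$19.99"; checking only for
// a "$" directly before the digits would let that fragment through.
const nonPriceNumbers = /(?<!\$[\d.]*)\b\d+(?:\.\d{2})?\b/g;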
const text = "Price $19.99, quantity 5, ID 12345";
console.log(text.match(nonPriceNumbers)); // ["5", "12345"]

Practical Applications

Avoid Matching Patterns in Specific Context

// Match words that are not inside an HTML tag
const nonTagText = /(?<!<[^>]*)\b\w+\b(?![^<]*>)/g;
const html = '<div class="container">Hello world</div>';
console.log(html.match(nonTagText)); // ["Hello", "world"]
// Note: this works for simple markup only; a ">" inside an attribute value would defeat it

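// More practical example: match console.log calls that are not inside a // comment.
// The lookbehind checks whether "//" appears earlier on the same line. A trailing
// comment after the call does not matter, so no lookahead is needed.
const nonCommentCode = /(?<!\/\/.*)\bconsole\.log\b/g;
const code = `
console.log("active"); // This is active
// console.log("commented");
let x = console.log("also active");
`;
console.log(code.match(nonCommentCode)); // ["console.log", "console.log"]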

Data Cleaning

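// Remove whitespace that is not inside quotes.
// Lookarounds see only the neighboring characters, so they cannot tell whether
// a space is inside a quoted string. Matching quoted strings first and returning
// them unchanged is more reliable.
function removeUnquotedSpaces(text) {
    return text.replace(/"[^"]*"|'[^']*'|\s+/g, match =>
        /^\s/.test(match) ? '' : match
    );
}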

// Match quote characters that are not escaped with a backslash
const unescapedSpecial = /(?<!\\)['"]/g;
const text = 'This is a "quote" and this is \\"escaped\\"';
console.log(text.match(unescapedSpecial)); // ['"', '"']

6.6 Combining Assertions

Assertions can be combined to create complex matching conditions.

Multiple Lookahead Assertions

// Validate password: at least 8 characters, contains uppercase, lowercase, digits, and special characters
const complexPassword = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%^&*]).{8,}$/;

function validateComplexPassword(password) {
    return complexPassword.test(password);
}

console.log(validateComplexPassword("Password123!")); // true
console.log(validateComplexPassword("password123"));  // false (missing uppercase and special characters)
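
Because every lookahead is evaluated independently at the start of the string, each one can also be tested on its own to report which requirement failed. The following is a sketch under that assumption; the requirements array and the missingRequirements helper are hypothetical names introduced for illustration.

```javascript
// Each requirement is a standalone lookahead anchored at the start of the string
const requirements = [
    { label: "lowercase letter", pattern: /^(?=.*[a-z])/ },
    { label: "uppercase letter", pattern: /^(?=.*[A-Z])/ },
    { label: "digit", pattern: /^(?=.*\d)/ },
    { label: "special character", pattern: /^(?=.*[!@#$%^&*])/ },
    { label: "at least 8 characters", pattern: /^.{8,}$/ },
];

// Return the labels of every requirement the password does not meet
function missingRequirements(password) {
    return requirements
        .filter(({ pattern }) => !pattern.test(password))
        .map(({ label }) => label);
}

console.log(missingRequirements("Password123!")); // []
console.log(missingRequirements("password123"));  // ["uppercase letter", "special character"]
```

A single combined pattern only answers yes or no; splitting it this way trades compactness for more useful feedback.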

Combining Lookahead and Lookbehind

// Extract content surrounded by specific markers
const betweenMarkers = /(?<=START:).*?(?=:END)/g;
const text = "START:important data:END and START:more data:END";
console.log(text.match(betweenMarkers)); // ["important data", "more data"]

// Match words in specific context
const contextualWord = /(?<=\b(?:Mr|Mrs|Ms)\.?\s+)\w+(?=\s)/g;
const names = "Mr. John Smith, Mrs. Jane Doe, Ms. Sarah Connor";
console.log(names.match(contextualWord)); // ["John", "Jane", "Sarah"]

Chaining Positive and Negative Assertions

// Complex URL validation: ensure protocol is included but not localhost
const validUrl = /^(?=https?:\/\/)(?!.*localhost)(?!.*127\.0\.0\.1).+$/;

console.log(validUrl.test("https://example.com"));     // true
console.log(validUrl.test("https://localhost:3000"));  // false
console.log(validUrl.test("http://127.0.0.1:8080"));   // false
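
An assertion can also contain another assertion. A widely used instance is inserting thousands separators, where a negative lookahead sits inside a positive lookahead. This is a minimal sketch for plain integer strings; the groupDigits helper is a hypothetical name.

```javascript
// \B            : not at a word boundary, so never before the first digit
// (?=(\d{3})+   : followed by one or more complete groups of three digits
// (?!\d))       : and those groups must run to the end of the number
const thousands = /\B(?=(\d{3})+(?!\d))/g;

// Assumes the input is an integer written as digits only (no decimal part)
function groupDigits(integerString) {
    return integerString.replace(thousands, ",");
}

console.log(groupDigits("1234567")); // "1,234,567"
console.log(groupDigits("1000"));    // "1,000"
console.log(groupDigits("999"));     // "999"
```

The pattern matches only positions, so replace inserts commas without removing any digits.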

6.7 Practical Use Cases

Code Analysis Tool

// Find unused variables (simplified version)
function findUnusedVariables(code) {
    // Collect declared names from the capture group
    const declared = [...code.matchAll(/\b(?:let|const|var)\s+(\w+)/g)]
        .map(match => match[1]);

    // Collect identifiers that do not directly follow a declaration keyword
    const usages = new Set(
        code.match(/(?<!\b(?:let|const|var)\s+)\b[A-Za-z_]\w*\b/g) ?? []
    );

    // A real tool needs a parser: this ignores scope, strings, comments, and destructuring
    return declared.filter(name => !usages.has(name));
}

Data Validation

// Validate credit card number layout (format check only; this does not perform the Luhn checksum)
const creditCardPattern = /^(?=(?:\d{4}[-\s]?){3}\d{4}$)(?!.*0{4})[\d\s-]+$/;

// Validate IP address (stricter version)
const ipAddress = /^(?=(?:(?:25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)\.){3}(?:25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)$)/;

// Validate email (with common domain restrictions)
const emailWithDomain = /^(?=.{1,64}@.{4,253}$)[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?=.{1,63}\.)[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/;
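
A regular expression can check the layout of a card number but not its checksum. If a Luhn check is wanted, it has to be done in code after the pattern matches. The sketch below shows one way to combine the two; cardLayout, passesLuhn, and isPlausibleCardNumber are hypothetical names, and the sample numbers are common test values, not real cards.

```javascript
// Layout only: four groups of four digits, optionally separated by "-" or whitespace
const cardLayout = /^(?:\d{4}[-\s]?){3}\d{4}$/;

// Hypothetical helper: true when the digits pass the Luhn checksum
function passesLuhn(cardNumber) {
    const digits = cardNumber.replace(/[-\s]/g, "");
    if (!/^\d+$/.test(digits)) return false;

    let sum = 0;
    // Walk from the rightmost digit, doubling every second one
    for (let i = 0; i < digits.length; i++) {
        let value = Number(digits[digits.length - 1 - i]);
        if (i % 2 === 1) {
            value *= 2;
            if (value > 9) value -= 9;
        }
        sum += value;
    }
    return sum % 10 === 0;
}

function isPlausibleCardNumber(cardNumber) {
    return cardLayout.test(cardNumber) && passesLuhn(cardNumber);
}

console.log(isPlausibleCardNumber("4111 1111 1111 1111")); // true
console.log(isPlausibleCardNumber("4111 1111 1111 1112")); // false (checksum fails)
console.log(isPlausibleCardNumber("4111-1111-1111"));      // false (layout fails)
```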

Text Processing

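// Smart word boundaries (treating accented Latin letters as word characters)
function smartWordBoundary(text) {
    // \w covers only ASCII, so the lookarounds use the same extended class as the
    // word itself; U+00C0 to U+017F spans Latin-1 Supplement letters and Latin Extended-A
    const wordPattern = /(?<![\w\u00C0-\u017F])[\w\u00C0-\u017F]+(?![\w\u00C0-\u017F])/g;
    return text.match(wordPattern);
}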

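// Extract code comments but exclude comment symbols in strings.
// A lookbehind such as (?<!["']) inspects only the character directly before the
// marker, so it cannot tell that "//" sits inside a string. Matching string
// literals first and keeping only the comment capture group avoids that.
const commentPattern = /"(?:\\.|[^"\\\n])*"|'(?:\\.|[^'\\\n])*'|(\/\*[\s\S]*?\*\/|\/\/.*$)/gm;
const codeWithComments = `
console.log("This // is not a comment");
// This is a comment
/* This is a
   multi-line comment */
const url = "http://example.com"; // End of line comment
`;

const comments = [...codeWithComments.matchAll(commentPattern)]
    .map(match => match[1])
    .filter(Boolean);
console.log(comments);
// ["// This is a comment", "/* This is a\n   multi-line comment */", "// End of line comment"]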

6.8 Browser Compatibility and Alternatives

Lookbehind Compatibility

// Check if lookbehind is supported
function supportsLookbehind() {
    try {
        new RegExp('(?<=x)y');
        return true;
    } catch (e) {
        return false;
    }
}

// Alternative when lookbehind is not supported
function extractAfterPattern(text, pattern, target) {
    const combined = new RegExp(`${pattern}(${target})`, 'g');
    const matches = [];
    let match;

    while ((match = combined.exec(text)) !== null) {
        matches.push(match[1]);
    }

    return matches;
}

// Usage example
const text = "Price: $19.99, Cost: $25.50";
if (supportsLookbehind()) {
    console.log(text.match(/(?<=\$)\d+\.\d{2}/g));
} else {
    console.log(extractAfterPattern(text, '\\$', '\\d+\\.\\d{2}'));
}
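
Negative lookbehind can be approximated in a similar way: capture the unwanted prefix in an optional group and discard the matches where that group participated. The sketch below is one possible approach, not a drop-in replacement for every (?<!...) pattern; numbersWithoutDollar is a hypothetical helper name.

```javascript
// The optional group stands in for the negative lookbehind (?<!\$):
// group 1 is defined only when a "$" sits directly before the number
const numberWithOptionalDollar = /(\$)?\b\d+(?:\.\d+)?\b/g;

function numbersWithoutDollar(text) {
    const results = [];
    let match;

    // Reset lastIndex so the shared global regex starts from the beginning
    numberWithOptionalDollar.lastIndex = 0;
    while ((match = numberWithOptionalDollar.exec(text)) !== null) {
        if (match[1] === undefined) {
            results.push(match[0]);
        }
    }

    return results;
}

console.log(numbersWithoutDollar("Price $19.99, quantity 5, ID 12345")); // ["5", "12345"]
```

Because the prefixed number is consumed as a whole, its fractional part cannot be picked up later as a separate match.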

6.9 Performance Considerations

Performance Impact of Assertions

// Efficient assertion usage
const efficient = /^(?=.*\d)(?=.*[a-z]).{6,}$/; // Simple conditions first

// Potentially inefficient assertion usage
const inefficient = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%^&*])(?=.*[0-9]).*$/; // Duplicate conditions

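// Optimization recommendation: drop the duplicate check and keep each requirement in its own lookahead.
// Merging \d and the special characters into one class would require only one of the two.
// Negated classes such as [^a-z]* limit the backtracking that .* can cause.
const optimized = /^(?=[^a-z]*[a-z])(?=[^A-Z]*[A-Z])(?=\D*\d)(?=[^!@#$%^&*]*[!@#$%^&*]).{8,}$/;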

Alternative Comparison

// Using assertions
function validateWithAssertion(password) {
    return /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}$/.test(password);
}

// Using step-by-step validation (sometimes more efficient)
function validateStepByStep(password) {
    return password.length >= 8 &&
           /[a-z]/.test(password) &&
           /[A-Z]/.test(password) &&
           /\d/.test(password);
}

// Performance testing
console.time('assertion');
for (let i = 0; i < 100000; i++) {
    validateWithAssertion('Password123');
}
console.timeEnd('assertion');

console.time('stepByStep');
for (let i = 0; i < 100000; i++) {
    validateStepByStep('Password123');
}
console.timeEnd('stepByStep');

6.10 Exercises

Exercise 1: Password Validation

Write a regular expression to validate passwords:

  • 8-20 characters long
  • At least one lowercase letter
  • At least one uppercase letter
  • At least one digit
  • At least one special character (!@#$%^&*)
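  • Cannot contain username

// Answer
function createPasswordValidator(username) {
    // Escape regex metacharacters so the username is matched literally
    const escaped = username.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    const rules = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%^&*]).{8,20}$/;
    // Only the username check is case-insensitive: with the 'i' flag on the
    // whole pattern, [a-z] and [A-Z] would each match letters of either case
    const noUsername = new RegExp(`^(?!.*${escaped})`, 'i');
    return password => rules.test(password) && noUsername.test(password);
}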

const validator = createPasswordValidator('john');
console.log(validator('JohnPass123!')); // false (contains username)
console.log(validator('SecurePass123!')); // true

Exercise 2: Data Extraction

Extract price information from the following text format:

  • Price format: $XX.XX or ¥XX.XX
  • Ignore prices within parentheses or quotes

// Answer
const pricePattern = /(?<![("'])[\$¥](\d+\.?\d{0,2})(?![)"'])/g;

function extractPrices(text) {
    const matches = [...text.matchAll(pricePattern)];
    return matches.map(match => ({
        currency: match[0][0],
        amount: parseFloat(match[1])
    }));
}

const text = 'Price $19.99, ("$10.00" is fake), ¥25.50, another $30';
console.log(extractPrices(text));
// [
//   { currency: '$', amount: 19.99 },
//   { currency: '¥', amount: 25.5 },
//   { currency: '$', amount: 30 }
// ]

Exercise 3: Code Analysis

Write a regular expression to find JavaScript function definitions but exclude function definitions in comments.

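// Answer (simplified version; the second lookbehind rejects any function that appears after a /* anywhere earlier in the text, even a closed one)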
const functionPattern = /(?<!\/\/.*?)(?<!\/\*[\s\S]*?)function\s+(\w+)\s*\(/g;

// More practical method is step-by-step processing
function findFunctionDefinitions(code) {
    // First remove comments
    const withoutComments = code
        .replace(/\/\*[\s\S]*?\*\//g, '')
        .replace(/\/\/.*$/gm, '');

    // Then find function definitions
    const functions = [...withoutComments.matchAll(/function\s+(\w+)\s*\(/g)];
    return functions.map(match => match[1]);
}

const code = `
function activeFunction() {}
// function commentedFunction() {}
/* function blockCommentedFunction() {} */
const arrow = () => {};
function anotherActive() {}
`;

console.log(findFunctionDefinitions(code)); // ['activeFunction', 'anotherActive']

Summary

Advanced assertion techniques are powerful tools for regular expressions:

  1. Lookahead Assertions: (?=) positive, (?!) negative, check content ahead
  2. Lookbehind Assertions: (?<=) positive, (?<!) negative, check content behind
  3. Combining: Multiple assertions can be combined to create complex conditions
  4. Practical Applications: Password validation, data extraction, code analysis, etc.
  5. Compatibility Considerations: Lookbehind was added in ES2018 and is missing from older engines, which need an alternative
  6. Performance Optimization: Use assertions deliberately and avoid redundant or overly complex conditions

Mastering assertion techniques enables us to write more precise and powerful regular expressions to implement complex matching requirements.