Chapter 6: Advanced Assertion Techniques
Learning Objectives
- Gain a deep understanding of positive lookahead assertions (?=)
- Master negative lookahead assertions (?!)
- Learn to use positive lookbehind assertions (?<=)
- Master negative lookbehind assertions (?<!)
- Understand techniques for combining assertions
6.1 Overview of Assertions
Assertions are zero-width: they do not consume characters, they only check whether a condition holds at the current position. This lets a pattern require (or forbid) certain surrounding text without including that text in the match.
Types of Assertions
- Lookahead Assertions: Check content after the current position
- Lookbehind Assertions: Check content before the current position
- Positive Assertions: Check if a specific pattern matches
- Negative Assertions: Check if a specific pattern doesn’t match
Characteristics of Assertions
- Zero-width: Add no characters to the match result
- Positional: Test a condition at the current position without advancing past it
- Conditional: The overall match succeeds only if the assertion's condition holds
6.2 Positive Lookahead (?=)
Positive lookahead (?=pattern) matches a position immediately followed by a specific pattern.
Basic Syntax
// Syntax: mainPattern(?=lookaheadCondition)
const pattern = /foo(?=bar)/;
console.log("foobar".match(pattern)); // ["foo"]
console.log("foobaz".match(pattern)); // null
// Position matching example
const text = "foobar";
console.log(text.replace(/foo(?=bar)/, "XXX")); // "XXXbar"
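To see that the lookahead is zero-width, inspect the match object. Only "foo" is matched, and a pattern made of nothing but a lookahead matches an empty position. A small illustrative sketch:

```javascript
// The lookahead checks for "bar" but does not include it in the match
const m = /foo(?=bar)/.exec("foobar");
console.log(m[0], m.index, m[0].length); // foo 0 3

// A pattern that is only a lookahead matches the empty position before "bar",
// so replace() inserts text there without removing anything
const hyphenated = "foobar".replace(/(?=bar)/, "-");
console.log(hyphenated); // foo-bar
```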
Practical Applications
Password Strength Validation
// Validate password contains uppercase, lowercase, and digits, 8-16 characters long
const strongPassword = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,16}$/;
function validatePassword(password) {
return strongPassword.test(password);
}
console.log(validatePassword("Password123")); // true
console.log(validatePassword("password")); // false (missing uppercase and digits)
console.log(validatePassword("PASSWORD123")); // false (missing lowercase)
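Each `(?=...)` group in the password pattern is checked independently from the start of the string. That means the pattern can be split apart to tell the user which rule failed. A minimal sketch; `passwordRules` and `missingRequirements` are hypothetical names, not part of any library:

```javascript
// Hypothetical helper: each rule is one of the lookaheads from the password pattern,
// tested on its own so the caller learns which requirement failed
const passwordRules = [
  { name: "lowercase letter", pattern: /^(?=.*[a-z])/ },
  { name: "uppercase letter", pattern: /^(?=.*[A-Z])/ },
  { name: "digit", pattern: /^(?=.*\d)/ },
  { name: "8-16 characters", pattern: /^.{8,16}$/ },
];

function missingRequirements(password) {
  return passwordRules
    .filter(rule => !rule.pattern.test(password))
    .map(rule => rule.name);
}

console.log(missingRequirements("Password123")); // []
console.log(missingRequirements("password"));    // ["uppercase letter", "digit"]
console.log(missingRequirements("PASSWORD123")); // ["lowercase letter"]
```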
Finding Words in Specific Context
// Find words followed by numbers
const wordBeforeNumber = /\w+(?=\s+\d+)/g;
const text = "item 123, product 456, name abc, count 789";
console.log(text.match(wordBeforeNumber)); // ["item", "product", "count"]
// Find attributes in HTML tags
const attributePattern = /[\w-]+(?==)/g; // Attribute name (word characters or hyphens) followed by an equals sign
const html = '<div class="container" id="main" data-value="test">';
console.log(html.match(attributePattern)); // ["class", "id", "data-value"]
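This builds on the word-before-number example above. A capturing group placed inside a lookahead still records the text it looked at, even though nothing is consumed. A small sketch that pairs each word with the number after it; the `wordAndNumber` pattern and `inventory` string are illustrative:

```javascript
// Group 1 is consumed (the word); group 2 sits inside the lookahead (the number),
// so it is captured but not consumed
const wordAndNumber = /([a-z]+)(?=\s+(\d+))/g;
const inventory = "item 123, product 456, name abc, count 789";
const pairs = [...inventory.matchAll(wordAndNumber)].map(m => [m[1], Number(m[2])]);
console.log(pairs); // [["item", 123], ["product", 456], ["count", 789]]
```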
Complex Lookahead Assertions
// Match "the" not followed by the word "end" (this uses negative lookahead, covered in 6.3)
const theNotFollowedByEnd = /\bthe(?!\s+end\b)/gi;
const text = "the beginning, the middle, the end";
console.log(text.match(theNotFollowedByEnd)); // ["the", "the"]
// Match filename without extension
const filenameWithoutExt = /[\w-]+(?=\.\w+)/g;
const files = "document.pdf image.jpg script.js readme.txt";
console.log(files.match(filenameWithoutExt)); // ["document", "image", "script", "readme"]
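The filename pattern above stops at the first dot, so `archive.tar.gz` would yield `archive` and `tar`. When each name is handled separately, anchoring the lookahead to the end of the string strips only the last extension. A sketch; `stripExtension` is a hypothetical helper:

```javascript
// (?=\.[^.]+$) requires "a dot, then no further dots, up to the end of the string",
// so the greedy .+ backtracks only as far as the last extension
function stripExtension(file) {
  const m = file.match(/^.+(?=\.[^.]+$)/);
  return m ? m[0] : file; // no extension: return the name unchanged
}

console.log(stripExtension("document.pdf"));   // document
console.log(stripExtension("archive.tar.gz")); // archive.tar
console.log(stripExtension("README"));         // README
```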
6.3 Negative Lookahead (?!)
Negative lookahead (?!pattern) matches a position not followed by a specific pattern.
Basic Usage
// Match "foo" not followed by "bar"
const pattern = /foo(?!bar)/;
console.log("foobar".match(pattern)); // null
console.log("foobaz".match(pattern)); // ["foo"]
// Practical example
const text = "Java JavaScript Python";
// Match "Java" but not "JavaScript"
const javaNotScript = /Java(?!Script)/g;
console.log(text.match(javaNotScript)); // ["Java"]
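A negative lookahead can also come before the part that is consumed. Placed there, it filters what the next token may be. A small sketch that skips a few keywords; the keyword list is only an example:

```javascript
// At each word start, refuse to match if the whole word is one of the listed keywords;
// the \b inside the lookahead keeps words like "iffy" from being excluded
const nonKeyword = /\b(?!(?:if|else|return)\b)\w+/g;
const words = "if ready return value else fallback".match(nonKeyword);
console.log(words); // ["ready", "value", "fallback"]
```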
Practical Applications
Validate Strings That Exclude Specific Substrings
// Validate password doesn't contain common weak password patterns
const noWeakPattern = /^(?!.*(?:123|abc|password|admin)).+$/i;
function isNotWeakPassword(password) {
return noWeakPattern.test(password);
}
console.log(isNotWeakPassword("mypassword123")); // false (contains "password" and "123")
console.log(isNotWeakPassword("StrongPass456")); // true
Match Non-Comment Lines
// Match lines not starting with //
const nonCommentLine = /^(?!\s*\/\/).*\S.*/gm;
const code = `
// This is a comment
console.log("Hello");
// Another comment
const x = 42;
`;
const codeLines = code.match(nonCommentLine);
console.log(codeLines); // ['console.log("Hello");', 'const x = 42;']
6.4 Positive Lookbehind (?<=)
Positive lookbehind (?<=pattern) matches a position preceded by a specific pattern.
Basic Usage
// Match numbers preceded by "$"
const pricePattern = /(?<=\$)\d+(\.\d{2})?/g;
const text = "The price is $19.99 and tax is 5%.";
console.log(text.match(pricePattern)); // ["19.99"]
// Match content within tags
const tagContent = /(?<=<b>).*?(?=<\/b>)/g;
const html = "<b>Bold text</b> and <b>more bold</b>";
console.log(html.match(tagContent)); // ["Bold text", "more bold"]
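JavaScript allows lookbehind patterns of variable length, including quantifiers and alternation. Some other engines, such as Python's `re` module, accept only fixed-width lookbehind. A short illustration with an assumed currency-code format:

```javascript
// The lookbehind contains an alternation and \s*, so its length varies
const amountAfterCode = /(?<=\b(?:USD|EUR)\s*)\d+(?:\.\d{2})?/g;
const amounts = "Total: USD 42.50, EUR12, GBP 7".match(amountAfterCode);
console.log(amounts); // ["42.50", "12"]
```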
Practical Applications
Extract Data in Specific Format
// Extract username after @ symbol
const username = /(?<=@)\w+/g;
const text = "Contact @john or @sarah for help";
console.log(text.match(username)); // ["john", "sarah"]
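// Extract content between matching quotes
// The lookbehind captures the opening quote in group 1, and (?=\1) requires the same quote to close it.
// This simple pattern does not truly pair quotes: with adjacent strings such as "a" "b" the gap
// between them also matches, so for robust parsing prefer a capturing pattern like /(["'])(.*?)\1/g.
const quotedContent = /(?<=(["']))[^"']*(?=\1)/g;
const text2 = 'He said "Hello" and she replied \'Hi there\'';
console.log(text2.match(quotedContent)); // ["Hello", "Hi there"]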
Format Replacement
// Add currency symbol before numbers, but only for specific context
const text = "Price: 100, Cost: 50, ID: 123";
// Only add $ symbol to numbers after Price and Cost
// In the replacement string, "$$" inserts a literal "$" and "$&" inserts the matched text
const formatted = text.replace(/(?<=Price: |Cost: )\d+/g, "$$$&");
console.log(formatted); // "Price: $100, Cost: $50, ID: 123"
6.5 Negative Lookbehind (?<!)
Negative lookbehind (?<!pattern) matches a position not preceded by a specific pattern.
Basic Usage
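// Match numbers not preceded by $ symbol
// The lookbehind also excludes digits and ".", otherwise the engine could start a match
// partway through "$19.99" and return the cents ("99")
const nonPriceNumbers = /(?<![$\d.])\b\d+(?:\.\d{2})?\b/g;
const text = "Price $19.99, quantity 5, ID 12345";
console.log(text.match(nonPriceNumbers)); // ["5", "12345"]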
Practical Applications
Avoid Matching Patterns in Specific Context
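// Match words that are not inside an HTML tag
const nonTagText = /(?<!<[^>]*)\b\w+\b(?![^<]*>)/g;
const html = '<div class="container">Hello world</div>';
// JavaScript lookbehind allows variable-length patterns such as <[^>]*, so this works as written
// for simple, well-formed markup; it is not a replacement for an HTML parser
console.log(html.match(nonTagText)); // ["Hello", "world"]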
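// More practical example: match console.log calls not preceded by "//" on the same line.
// . does not match newlines, so the lookbehind only looks back within the current line.
// Limitation: a "//" earlier on the line inside a string (e.g. a URL) also suppresses the match.
const nonCommentCode = /(?<!\/\/.*)\bconsole\.log\b/g;
const code = `
console.log("active"); // This is active
// console.log("commented");
let x = console.log("also active");
`;
console.log(code.match(nonCommentCode)); // ["console.log", "console.log"] (the two active calls)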
Data Cleaning
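// Remove whitespace that is outside double-quoted strings
function removeUnquotedSpaces(text) {
  // Checking only the neighbouring character, as (?<!["']) would, cannot tell whether whitespace
  // is inside quotes; what matters is how many quotes surround it. This lookahead requires an even
  // number of " characters between the whitespace and the end of the string, i.e. the whitespace
  // is not inside a quoted section. Assumes quotes are balanced and never escaped.
  return text.replace(/\s+(?=(?:[^"]*"[^"]*")*[^"]*$)/g, '');
}
// removeUnquotedSpaces('a "b c" d') returns 'a"b c"d'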
// Match quote characters that are not escaped (not preceded by a backslash)
const unescapedQuote = /(?<!\\)["']/g;
const text = 'This is a "quote" and this is \\"escaped\\"';
console.log(text.match(unescapedQuote)); // ['"', '"']
6.6 Combining Assertions
Assertions can be combined to create complex matching conditions.
Multiple Lookahead Assertions
// Validate password: at least 8 characters, contains uppercase, lowercase, digits, and special characters
const complexPassword = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%^&*]).{8,}$/;
function validateComplexPassword(password) {
return complexPassword.test(password);
}
console.log(validateComplexPassword("Password123!")); // true
console.log(validateComplexPassword("password123")); // false (missing uppercase and special characters)
Combining Lookahead and Lookbehind
// Extract content surrounded by specific markers
const betweenMarkers = /(?<=START:).*?(?=:END)/g;
const text = "START:important data:END and START:more data:END";
console.log(text.match(betweenMarkers)); // ["important data", "more data"]
// Match words in specific context
const contextualWord = /(?<=\b(?:Mr|Mrs|Ms)\.?\s+)\w+(?=\s)/g;
const names = "Mr. John Smith, Mrs. Jane Doe, Ms. Sarah Connor";
console.log(names.match(contextualWord)); // ["John", "Jane", "Sarah"]
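The markers sit inside the lookarounds rather than in the match. As a result, `replace()` rewrites only the text between them and leaves `START:` and `:END` in place. A small sketch reusing the marker format from the example above:

```javascript
// Only the content between the markers is matched, so only it is transformed
const record = "START:important data:END and START:more data:END";
const shouted = record.replace(/(?<=START:).*?(?=:END)/g, s => s.toUpperCase());
console.log(shouted); // "START:IMPORTANT DATA:END and START:MORE DATA:END"
```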
Stacked Positive and Negative Lookaheads
// Simple URL check: require an http(s) protocol and reject "localhost" or "127.0.0.1" anywhere in the string
const validUrl = /^(?=https?:\/\/)(?!.*localhost)(?!.*127\.0\.0\.1).+$/;
console.log(validUrl.test("https://example.com")); // true
console.log(validUrl.test("https://localhost:3000")); // false
console.log(validUrl.test("http://127.0.0.1:8080")); // false
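For comparison, a genuinely nested assertion puts one lookaround inside another. A common sketch inserts thousands separators: `\B(?=(?:\d{3})+(?!\d))` matches the empty position inside a number that is followed by a multiple of three digits and then no further digit. `addThousandsSeparators` is a hypothetical helper and assumes integers only, since digits after a decimal point would also be grouped.

```javascript
// Positive lookahead: a multiple of three digits follows...
// ...nested negative lookahead: and then no further digit
function addThousandsSeparators(text) {
  return text.replace(/\B(?=(?:\d{3})+(?!\d))/g, ",");
}

console.log(addThousandsSeparators("Population: 1234567")); // "Population: 1,234,567"
console.log(addThousandsSeparators("999"));                 // "999"
```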
6.7 Practical Use Cases
Code Analysis Tool
// Find unused variables (simplified version)
function findUnusedVariables(code) {
  // Names introduced by let/const/var declarations
  const declared = [...code.matchAll(/\b(?:let|const|var)\s+(\w+)/g)].map(m => m[1]);
  // Whole identifiers not directly preceded by a declaration keyword and not directly
  // followed by "=" (the trailing \b stops backtracking into partial words)
  const usages = code.match(/(?<!\b(?:let|const|var)\s+)\b\w+\b(?=\s*[^\s=])/g) ?? [];
  // Actual application would need a parser for correct analysis (scopes, destructuring,
  // and comparisons like "x == 1", which this pattern does not count as a use)
  return declared.filter(name => !usages.includes(name));
}
Data Validation
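// Validate credit card number format (simplified): four groups of four digits,
// optionally separated by "-" or spaces, and not all zeros.
// This checks the format only; the Luhn checksum requires arithmetic and cannot be done with a regex.
const creditCardPattern = /^(?=(?:\d{4}[-\s]?){3}\d{4}$)(?![0\s-]+$)[\d\s-]+$/;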
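// Validate IP address (stricter version: each octet 0-255, no leading zeros)
// The whole pattern is a single lookahead, so use it with test(); match() would return an empty string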
const ipAddress = /^(?=(?:(?:25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)\.){3}(?:25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)$)/;
// Validate email (with length limits: local part up to 64 characters, domain 4 to 253 characters)
const emailWithDomain = /^(?=.{1,64}@.{4,253}$)[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-zA-Z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?=.{1,63}\.)[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/;
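A format pattern cannot verify a card's check digit. The sketch below computes the Luhn checksum in plain JavaScript and combines it with a format-only pattern. `luhnValid`, `cardFormat`, and `isPlausibleCardNumber` are hypothetical names. Passing both checks only means the number is well-formed, not that the card exists.

```javascript
// Luhn checksum: from the rightmost digit, double every second digit,
// subtract 9 from any doubled value above 9, and require the total to be divisible by 10
function luhnValid(cardNumber) {
  const digits = cardNumber.replace(/[\s-]/g, "");
  let sum = 0;
  for (let i = 0; i < digits.length; i++) {
    let d = Number(digits[digits.length - 1 - i]);
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return sum % 10 === 0;
}

// Format check via lookahead: four groups of four digits, optional "-" or space separators
const cardFormat = /^(?=(?:\d{4}[-\s]?){3}\d{4}$)[\d\s-]+$/;

function isPlausibleCardNumber(input) {
  return cardFormat.test(input) && luhnValid(input);
}

console.log(isPlausibleCardNumber("4111 1111 1111 1111")); // true
console.log(isPlausibleCardNumber("4111 1111 1111 1112")); // false
```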
Text Processing
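// Word matching that includes accented Latin letters (\w and \b treat "é" as a non-word character)
function smartWordBoundary(text) {
  // Use the same character class in both lookarounds and for the word itself
  // (U+00C0-U+017F covers Latin-1 Supplement and Latin Extended-A letters)
  const wordPattern = /(?<![\w\u00C0-\u017F])[\w\u00C0-\u017F]+(?![\w\u00C0-\u017F])/g;
  return text.match(wordPattern);
}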
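The Latin ranges above still miss other scripts. Engines that support Unicode property escapes (ES2018, with the `u` flag) offer `\p{L}` (letters), `\p{M}` (combining marks), and `\p{N}` (numbers), which cover all scripts. A sketch along the same lines; `unicodeWord` is an illustrative name:

```javascript
// Letters, combining marks, numbers and "_" from any script count as word characters
const unicodeWord = /(?<![\p{L}\p{M}\p{N}_])[\p{L}\p{M}\p{N}_]+(?![\p{L}\p{M}\p{N}_])/gu;
const words = "naïve café Straße Ελλάδα".match(unicodeWord);
console.log(words); // ["naïve", "café", "Straße", "Ελλάδα"]
```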
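// Extract code comments but skip comment markers inside string literals.
// A lookbehind such as (?<!["']) only checks the single character before "//", so it cannot
// tell that "//" sits inside a string. Instead, match string literals and comments in one
// alternation and keep only the comments (captured in group 1).
// Template literals and regex literals are not handled.
const tokenPattern = /"(?:\\.|[^"\\\n])*"|'(?:\\.|[^'\\\n])*'|(\/\*[\s\S]*?\*\/|\/\/.*$)/gm;
const codeWithComments = `
console.log("This // is not a comment");
// This is a comment
/* This is a
multi-line comment */
const url = "http://example.com"; // End of line comment
`;
const comments = [...codeWithComments.matchAll(tokenPattern)]
  .filter(m => m[1] !== undefined)
  .map(m => m[1]);
console.log(comments);
// ["// This is a comment", "/* This is a\nmulti-line comment */", "// End of line comment"]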
6.8 Browser Compatibility and Alternatives
Lookbehind Compatibility
// Check if lookbehind is supported (it was standardized in ES2018; Safari added it only in version 16.4)
function supportsLookbehind() {
try {
new RegExp('(?<=x)y');
return true;
} catch (e) {
return false;
}
}
// Alternative when lookbehind is not supported
function extractAfterPattern(text, pattern, target) {
const combined = new RegExp(`${pattern}(${target})`, 'g');
const matches = [];
let match;
while ((match = combined.exec(text)) !== null) {
matches.push(match[1]);
}
return matches;
}
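// Usage example
const text = "Price: $19.99, Cost: $25.50";
if (supportsLookbehind()) {
  // Build the lookbehind with new RegExp: a regex literal containing (?<=...) would be a
  // SyntaxError when the script is parsed by an engine without lookbehind support
  console.log(text.match(new RegExp('(?<=\\$)\\d+\\.\\d{2}', 'g'))); // ["19.99", "25.50"]
} else {
  console.log(extractAfterPattern(text, '\\$', '\\d+\\.\\d{2}')); // ["19.99", "25.50"]
}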
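The same capture-instead-of-lookbehind idea applies to replacements: capture the prefix and write it back with `$1`. A sketch that reproduces the `Price:`/`Cost:` formatting from section 6.4 without lookbehind:

```javascript
// "$1" re-inserts the captured prefix, "$$" is a literal "$", "$2" is the number
const line = "Price: 100, Cost: 50, ID: 123";
const withCurrency = line.replace(/(Price: |Cost: )(\d+)/g, "$1$$$2");
console.log(withCurrency); // "Price: $100, Cost: $50, ID: 123"
```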
6.9 Performance Considerations
Performance Impact of Assertions
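// Reasonably efficient: few conditions, each checked once
const efficient = /^(?=.*\d)(?=.*[a-z]).{6,}$/;
// Wasteful: (?=.*\d) and (?=.*[0-9]) test the same thing, so the string is scanned twice for it
const inefficient = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%^&*])(?=.*[0-9]).*$/;
// Optimization recommendation: remove duplicate conditions, and put the lookahead most likely
// to fail first so the remaining ones are skipped for invalid input. Do not merge different
// requirements into one class: [\d!@#$%^&*] would mean "digit OR special character".
const optimized = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%^&*]).{8,}$/;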
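A related, commonly used optimization replaces `.*` in each lookahead with a negated class. `(?=.*[a-z])` first runs `.*` to the end of the string and then backtracks to find a lowercase letter. `(?=[^a-z]*[a-z])` scans forward and stops at the first one. The sketch below checks that both forms accept the same sample inputs; any speed difference depends on the engine and should be measured.

```javascript
// Same requirements, two spellings of the lookaheads
const dotStar = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}$/;
const negatedClass = /^(?=[^a-z]*[a-z])(?=[^A-Z]*[A-Z])(?=\D*\d).{8,}$/;

const samples = ["Password123", "password123", "PASSWORD123", "Pass1", "NoDigitsHere"];
for (const s of samples) {
  // Both columns should agree for every sample
  console.log(s, dotStar.test(s), negatedClass.test(s));
}
```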
Alternative Comparison
// Using assertions
function validateWithAssertion(password) {
return /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}$/.test(password);
}
// Using step-by-step validation (sometimes more efficient)
function validateStepByStep(password) {
return password.length >= 8 &&
/[a-z]/.test(password) &&
/[A-Z]/.test(password) &&
/\d/.test(password);
}
// Rough timing comparison (results vary by engine and input, so measure in your target environment)
console.time('assertion');
for (let i = 0; i < 100000; i++) {
validateWithAssertion('Password123');
}
console.timeEnd('assertion');
console.time('stepByStep');
for (let i = 0; i < 100000; i++) {
validateStepByStep('Password123');
}
console.timeEnd('stepByStep');
6.10 Exercises
Exercise 1: Password Validation
Write a regular expression to validate passwords:
- 8-20 characters long
- At least one lowercase letter
- At least one uppercase letter
- At least one digit
- At least one special character (!@#$%^&*)
- Cannot contain username
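// Answer
// The 'i' flag must not be applied to the character-class rules: with it, [a-z] and [A-Z]
// would both match any letter. The username check gets its own case-insensitive regex,
// with the username escaped so characters like "." are matched literally.
function createPasswordValidator(username) {
  const rules = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%^&*]).{8,20}$/;
  const escaped = username.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const noUsername = new RegExp(`^(?!.*${escaped})`, 'i');
  return password => rules.test(password) && noUsername.test(password);
}
const validator = createPasswordValidator('john');
console.log(validator('JohnPass123!')); // false (contains username)
console.log(validator('SecurePass123!')); // true
console.log(validator('securepass123!')); // false (no uppercase letter)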
Exercise 2: Data Extraction
Extract price information from the following text format:
- Price format: $XX.XX or ¥XX.XX
- Ignore prices within parentheses or quotes
// Answer
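// The lookarounds only inspect the character directly next to the price, so "(the $10.00 price)"
// would still be matched. Excluding digits and "." in the lookahead stops the engine from
// backtracking to a shorter match such as "$10.0" when the full price is followed by a quote.
const pricePattern = /(?<![("'])[\$¥](\d+\.?\d{0,2})(?![\d.)"'])/g;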
function extractPrices(text) {
const matches = [...text.matchAll(pricePattern)];
return matches.map(match => ({
currency: match[0][0],
amount: parseFloat(match[1])
}));
}
const text = 'Price $19.99, ("$10.00" is fake), ¥25.50, another $30';
console.log(extractPrices(text));
// [
// { currency: '$', amount: 19.99 },
// { currency: '¥', amount: 25.5 },
// { currency: '$', amount: 30 }
// ]
Exercise 3: Code Analysis
Write a regular expression to find JavaScript function definitions but exclude function definitions in comments.
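// Answer (regex-only version)
// (?<!\/\/.*) skips matches after "//" on the same line; (?<!\/\*(?:(?!\*\/)[\s\S])*) skips
// matches after a "/*" that has not yet been closed by "*/" (a lookahead nested inside a lookbehind)
const functionPattern = /(?<!\/\/.*)(?<!\/\*(?:(?!\*\/)[\s\S])*)function\s+(\w+)\s*\(/g;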
// More practical method is step-by-step processing
function findFunctionDefinitions(code) {
// First remove comments (this simple removal also strips "//" inside strings, such as URLs)
const withoutComments = code
.replace(/\/\*[\s\S]*?\*\//g, '')
.replace(/\/\/.*$/gm, '');
// Then find function definitions
const functions = [...withoutComments.matchAll(/function\s+(\w+)\s*\(/g)];
return functions.map(match => match[1]);
}
const code = `
function activeFunction() {}
// function commentedFunction() {}
/* function blockCommentedFunction() {} */
const arrow = () => {};
function anotherActive() {}
`;
console.log(findFunctionDefinitions(code)); // ['activeFunction', 'anotherActive']
Summary
Advanced assertion techniques are powerful tools for regular expressions:
- Lookahead Assertions: (?=) positive, (?!) negative; check content ahead
- Lookbehind Assertions: (?<=) positive, (?<!) negative; check content behind
- Combining: Multiple assertions can be combined to create complex conditions
- Practical Applications: Password validation, data extraction, code analysis, etc.
- Compatibility Considerations: Older environments may not support lookbehind, so feature-detect it and provide an alternative
- Performance Optimization: Remove duplicate conditions and avoid overly complex assertions
Mastering assertion techniques enables us to write more precise and powerful regular expressions to implement complex matching requirements.