marcel/mermaid


mirror of https://github.com/mermaid-js/mermaid.git synced 2025-09-28 03:39:38 +02:00


Knut Sveidqvist 4a5e1a3250 Better handling of special characters

2025-06-13 08:09:54 +02:00


Chevrotain Parser Implementation Plan

Current Status: 86% Complete ✅

Progress: 174/203 tests passing (86% success rate)

Major Achievements:

✅ Fixed grammar ambiguity issues
✅ Added standaloneLinkStatement to statement rule with proper lookahead
✅ Core parser architecture is working
✅ Most single node, vertex, and basic edge tests are passing

Remaining Issues: 29 Tests (3 Core Problems)

✅ COMPLETED: Phase 3 - Special Characters (4 tests)

Status: FIXED - All special character tests now passing
Solution: Removed conflicting punctuation tokens from lexer main mode
Impact: +2 tests (174/203 passing)

1. Node Creation in Edges (17 tests) - HIGH PRIORITY

Problem: Tests fail with "Cannot read properties of undefined (reading 'id')"
Root Cause: When parsing edges like A-->B, vertices A and B are not created in the vertices map

Examples of Failing Tests:

should handle basic arrow (A-->B)
should handle multiple edges (A-->B; B-->C)
should handle chained edges (A-->B-->C)

Solution Strategy:

Investigate which grammar rule is actually being used for failing tests
Add vertex creation to all edge processing paths:
- standaloneLinkStatement visitor (already has ensureVertex())
- vertexStatement with link chains
- Any other edge processing methods
Test the fix incrementally with one failing test at a time

Implementation Steps:

// In flowAst.ts - ensure all edge processing creates vertices
private ensureVertex(nodeId: string): void {
  if (!this.vertices[nodeId]) {
    this.vertices[nodeId] = {
      id: nodeId,
      text: nodeId,
      type: 'default',
    };
  }
}

// Add to ALL methods that process edges:
// - standaloneLinkStatement ✅ (already done)
// - vertexStatement (when it has link chains)
// - linkChain processing
// - Any other edge creation paths
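
One way to guarantee the comment list above is covered is to route every edge-producing path through a single helper that calls ensureVertex() for both endpoints. The following is a minimal, self-contained sketch of that idea. FlowAstSketch, addEdge, visitStandaloneLink and visitLinkChain are hypothetical names used for illustration only; they are not taken from flowAst.ts, and the real visitor methods receive Chevrotain CST contexts rather than plain strings.

```typescript
// Hypothetical, simplified stand-ins for the types used in flowAst.ts.
interface Vertex {
  id: string;
  text: string;
  type: string;
}

interface Edge {
  start: string;
  end: string;
}

class FlowAstSketch {
  // Null-prototype object so IDs such as "constructor" do not hit inherited keys.
  readonly vertices: Record<string, Vertex> = Object.create(null);
  readonly edges: Edge[] = [];

  // Same logic as ensureVertex() in the plan: create the vertex only if it is missing.
  private ensureVertex(nodeId: string): void {
    if (!this.vertices[nodeId]) {
      this.vertices[nodeId] = { id: nodeId, text: nodeId, type: 'default' };
    }
  }

  // Single choke point: no edge can be stored without both endpoints existing.
  private addEdge(start: string, end: string): void {
    this.ensureVertex(start);
    this.ensureVertex(end);
    this.edges.push({ start, end });
  }

  // Stand-in for the standaloneLinkStatement path (A-->B).
  visitStandaloneLink(start: string, end: string): void {
    this.addEdge(start, end);
  }

  // Stand-in for vertexStatement with a link chain (A-->B-->C).
  visitLinkChain(nodeIds: string[]): void {
    let previous: string | undefined;
    for (const current of nodeIds) {
      this.ensureVertex(current);
      if (previous !== undefined) {
        this.addEdge(previous, current);
      }
      previous = current;
    }
  }
}

const ast = new FlowAstSketch();
ast.visitLinkChain(['A', 'B', 'C']); // models A-->B-->C
console.log(Object.keys(ast.vertices), ast.edges.length);
```

With a single addEdge() helper, a newly added grammar path cannot forget vertex creation as long as it never pushes to the edge list directly.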

2. Arrow Text Parsing (10 tests) - MEDIUM PRIORITY

Problem: Parse error: Expecting token of type --> EOF <-- but found --> '|' <--
Root Cause: The lexer does not properly handle the pipe character | in arrow text patterns like A-->|text|B

Examples of Failing Tests:

should handle arrow with text (A-->|text|B)
should handle edges with quoted text (A-->|"quoted text"|B)

Solution Strategy:

Fix lexer mode switching for pipe characters
Follow original JISON grammar for arrow text patterns
Implement proper tokenization of LINK + PIPE + text + PIPE sequences

Implementation Steps:

// In flowLexer.ts - fix pipe character handling
// Current issue: PIPE token conflicts with text content
// Solution: Use lexer modes or proper token precedence

// 1. Check how JISON handles |text| patterns
// 2. Implement similar tokenization in Chevrotain
// 3. Ensure link text is properly captured and processed
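
The mode-switching idea can be illustrated without Chevrotain. The sketch below is a small hand-written tokenizer with two modes: a main mode, and an edge-text mode that is entered on the opening pipe and left on the closing pipe. Everything in it (tokenizeSketch, the mode names, the simplified patterns) is an assumption made for illustration; it is not the code in flowLexer.ts, and it only knows the --> link form.

```typescript
type SketchTokenType = 'NODE_STRING' | 'LINK' | 'PIPE' | 'TEXT';
type SketchMode = 'main' | 'edgeText';

interface SketchToken {
  type: SketchTokenType;
  image: string;
}

interface SketchRule {
  type: SketchTokenType | 'SKIP';
  pattern: RegExp;
  next?: SketchMode; // mode to switch to after this rule matches
}

// Rules are tried in order within the current mode; the first match wins.
const SKETCH_MODES: Record<SketchMode, SketchRule[]> = {
  main: [
    { type: 'SKIP', pattern: /^\s+/ },
    { type: 'LINK', pattern: /^-->/ },
    { type: 'PIPE', pattern: /^\|/, next: 'edgeText' },
    { type: 'NODE_STRING', pattern: /^[A-Za-z0-9_]+/ },
  ],
  edgeText: [
    { type: 'PIPE', pattern: /^\|/, next: 'main' },
    // Inside |...| everything up to the closing pipe is a single TEXT token.
    { type: 'TEXT', pattern: /^[^|]+/ },
  ],
};

function tokenizeSketch(input: string): SketchToken[] {
  const tokens: SketchToken[] = [];
  let mode: SketchMode = 'main';
  let pos = 0;

  while (pos < input.length) {
    const rest = input.slice(pos);
    let matched = false;
    for (const rule of SKETCH_MODES[mode]) {
      const m = rule.pattern.exec(rest);
      if (m === null) continue;
      if (rule.type !== 'SKIP') {
        tokens.push({ type: rule.type, image: m[0] });
      }
      if (rule.next !== undefined) {
        mode = rule.next;
      }
      pos += m[0].length;
      matched = true;
      break;
    }
    if (!matched) {
      throw new Error(`Unexpected character '${input[pos]}' at offset ${pos}`);
    }
  }
  return tokens;
}

console.log(tokenizeSketch('A-->|text|B').map((t) => t.type).join(' '));
```

Because the TEXT pattern only exists in the edge-text mode, it cannot compete with NODE_STRING or punctuation tokens in the main mode. In this sketch a quoted label such as |"quoted text"| is kept verbatim, quotes included, as one TEXT token.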

3. Special Characters at Node Start (4 tests) - LOW PRIORITY (completed, see the Phase 3 status above)

Problem: Node IDs that start with one of the characters :, &, comma, or - are not parsed
Root Cause: Token precedence issues, where punctuation tokens override NODE_STRING

Examples of Failing Tests:

Node IDs starting with :, &, ,, -

Solution Strategy:

Adjust token precedence in lexer
Modify NODE_STRING pattern to handle special characters
Test with each special character individually
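
The precedence problem can be reproduced with a tiny first-match lexer, where the first rule in the list that matches wins (Chevrotain's lexer likewise resolves alternatives by definition order rather than by longest match). The sketch below is not the project's lexer: firstMatchTokenize, the token names and the patterns are hypothetical and simplified.

```typescript
interface PrecedenceRule {
  name: string;
  pattern: RegExp;
}

// Tries the rules in order at each position; the first one that matches wins.
function firstMatchTokenize(rules: PrecedenceRule[], input: string): string[] {
  const out: string[] = [];
  let pos = 0;
  while (pos < input.length) {
    const rest = input.slice(pos);
    let advanced = false;
    for (const rule of rules) {
      const m = rule.pattern.exec(rest);
      if (m !== null && m[0].length > 0) {
        out.push(`${rule.name}(${m[0]})`);
        pos += m[0].length;
        advanced = true;
        break;
      }
    }
    if (!advanced) {
      throw new Error(`No rule matches at offset ${pos}`);
    }
  }
  return out;
}

const COLON: PrecedenceRule = { name: 'COLON', pattern: /^:/ };
const AMP: PrecedenceRule = { name: 'AMP', pattern: /^&/ };
const COMMA: PrecedenceRule = { name: 'COMMA', pattern: /^,/ };
const MINUS: PrecedenceRule = { name: 'MINUS', pattern: /^-/ };
// Simplified NODE_STRING that accepts the special characters anywhere in the ID.
const NODE_STRING: PrecedenceRule = { name: 'NODE_STRING', pattern: /^[A-Za-z0-9_:&,-]+/ };

// Punctuation listed first: the leading character is split off the node ID.
const punctuationFirst = [COLON, AMP, COMMA, MINUS, NODE_STRING];
// NODE_STRING listed first: the whole ID is one token.
const nodeStringFirst = [NODE_STRING, COLON, AMP, COMMA, MINUS];

console.log(firstMatchTokenize(punctuationFirst, ':node'));
console.log(firstMatchTokenize(nodeStringFirst, ':node'));
```

The plan's status section reports that the actual fix removed the conflicting punctuation tokens from the lexer main mode; reordering is shown here only because it makes the conflict easy to see. A real NODE_STRING pattern also has to stop before link tokens such as -->, which this sketch does not attempt.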

Execution Plan

Phase 1: Fix Node Creation (Target: +17 tests = 189/203 passing)

Timeline: 1-2 hours
Priority: HIGH - This affects the most tests

Debug which grammar rule is being used for failing edge tests

# Add logging to AST visitor methods to see which path is taken
vitest packages/mermaid/src/diagrams/flowchart/parser/flow-chev-arrows.spec.js -t "should handle basic arrow" --run

Add vertex creation to all edge processing paths
- Check vertexStatement when it processes link chains
- Check linkChain processing
- Ensure ensureVertex() is called for all edge endpoints

Test incrementally

# Test one failing test at a time
vitest packages/mermaid/src/diagrams/flowchart/parser/flow-chev-arrows.spec.js -t "should handle basic arrow" --run

Phase 2: Fix Arrow Text Parsing (Target: +10 tests = 199/203 passing)

Timeline: 2-3 hours
Priority: MEDIUM - Complex lexer issue

Analyze original JISON grammar for arrow text patterns

# Check how flow.jison handles |text| patterns
grep -n "EdgeText\|PIPE" packages/mermaid/src/diagrams/flowchart/parser/flow.jison

Fix lexer tokenization for pipe characters
- Implement proper mode switching or token precedence
- Ensure A-->|text|B tokenizes as NODE_STRING LINK PIPE TEXT PIPE NODE_STRING
Update grammar rules to handle arrow text
- Ensure link rules can consume pipe-delimited text
- Test with various text patterns (quoted, unquoted, complex)
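
On the grammar side, the link rule needs an optional PIPE TEXT PIPE group between the link and the target node. The sketch below shows that shape as a small hand-written parser over an already tokenized stream. parseEdgeChain and its types are hypothetical and stand in for the Chevrotain rules; the error message format merely imitates the one quoted in this plan.

```typescript
type EdgeTokenType = 'NODE_STRING' | 'LINK' | 'PIPE' | 'TEXT';

interface EdgeToken {
  type: EdgeTokenType;
  image: string;
}

interface ParsedEdge {
  start: string;
  end: string;
  text: string; // empty string when the edge has no label
}

// Grammar sketched here: node (LINK (PIPE TEXT PIPE)? node)*
function parseEdgeChain(tokens: EdgeToken[]): ParsedEdge[] {
  let pos = 0;

  // Consumes one token of the expected type or throws a parse error.
  const consume = (expected: EdgeTokenType): string => {
    const token = tokens[pos];
    if (token === undefined || token.type !== expected) {
      const found = token === undefined ? 'EOF' : `'${token.image}'`;
      throw new Error(`Expecting token of type ${expected} but found ${found}`);
    }
    pos++;
    return token.image;
  };

  const edges: ParsedEdge[] = [];
  let start = consume('NODE_STRING');
  while (pos < tokens.length) {
    consume('LINK');
    let text = '';
    // Optional pipe-delimited label: only taken when the next token is a PIPE.
    if (tokens[pos]?.type === 'PIPE') {
      consume('PIPE');
      text = consume('TEXT');
      consume('PIPE');
    }
    const end = consume('NODE_STRING');
    edges.push({ start, end, text });
    start = end; // chained edges reuse the previous target as the next source
  }
  return edges;
}

// A-->|text|B-->C as a token stream
const sampleTokens: EdgeToken[] = [
  { type: 'NODE_STRING', image: 'A' },
  { type: 'LINK', image: '-->' },
  { type: 'PIPE', image: '|' },
  { type: 'TEXT', image: 'text' },
  { type: 'PIPE', image: '|' },
  { type: 'NODE_STRING', image: 'B' },
  { type: 'LINK', image: '-->' },
  { type: 'NODE_STRING', image: 'C' },
];
console.log(parseEdgeChain(sampleTokens));
```

A one-token lookahead on PIPE is enough to decide whether the optional label is present, so the labelled and unlabelled forms can share one rule.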

Phase 3: Fix Special Characters (Target: +4 tests = 203/203 passing)

Timeline: 1 hour
Priority: LOW - Affects fewest tests

Identify token conflicts for each special character
Adjust lexer token order or patterns
Test each character individually
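
The phase targets are cumulative. The short sketch below reproduces them under one assumption that is inferred from the numbers and not stated in the plan: the targets appear to be counted from a 172/203 baseline, that is, the current 174 passing tests minus the 2 gained by the special-character fix. The names PhaseGain, cumulativeTargets and passRate are illustrative only.

```typescript
const TOTAL_TESTS = 203;

interface PhaseGain {
  name: string;
  gain: number; // number of tests the phase is expected to fix
}

const plannedPhases: PhaseGain[] = [
  { name: 'Phase 1: Node Creation', gain: 17 },
  { name: 'Phase 2: Arrow Text Parsing', gain: 10 },
  { name: 'Phase 3: Special Characters', gain: 4 },
];

// Running total of passing tests after each phase, starting from a baseline.
function cumulativeTargets(baseline: number, phases: PhaseGain[]): number[] {
  const targets: number[] = [];
  let passing = baseline;
  for (const phase of phases) {
    passing += phase.gain;
    targets.push(passing);
  }
  return targets;
}

// Pass rate as a whole-number percentage.
function passRate(passing: number, total: number): number {
  return Math.round((passing / total) * 100);
}

// Assumed baseline: 174 currently passing minus the +2 from the Phase 3 fix.
console.log(cumulativeTargets(172, plannedPhases)); // [189, 199, 203]
console.log(passRate(174, TOTAL_TESTS)); // 86
```

Counted from the current 174 passing tests, Phases 1 and 2 alone would give 201/203, so the per-phase test counts and the 29 remaining failures do not line up exactly and are worth re-checking against an actual test run.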

Success Criteria

Phase 1 Success:

All basic edge tests pass (A-->B, A-->B-->C, etc.)
Vertices are created for all edge endpoints
No regression in currently passing tests

Phase 2 Success:

All arrow text tests pass (A-->|text|B)
Lexer properly tokenizes pipe-delimited text
Grammar correctly parses arrow text patterns

Phase 3 Success:

All special character tests pass
Node IDs can start with :, &, ,, -
No conflicts with other tokens

Final Success:

203/203 tests passing (100%)
Full compatibility with original JISON parser
All existing functionality preserved

Risk Mitigation

High Risk: Breaking Currently Passing Tests

Mitigation: Run full test suite after each change

vitest packages/mermaid/src/diagrams/flowchart/parser/*flow*-chev*.spec.js --run

Medium Risk: Lexer Changes Affecting Other Patterns

Mitigation: Test with diverse input patterns, not just failing tests
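
One way to act on this mitigation is a differential check: feed the same inputs to a reference parser and to the new parser, then list every input where the results differ. The sketch below assumes both parsers can be wrapped in a function returning vertex IDs and edge pairs. ParseResult, ParseFn, findMismatches and the two toy parsers are hypothetical; in practice the wrappers would call the JISON and Chevrotain parsers.

```typescript
interface ParseResult {
  vertices: string[];
  edges: Array<[string, string]>;
}

type ParseFn = (input: string) => ParseResult;

// Canonical string form: vertex order is ignored, edge order is preserved.
function normalizeResult(result: ParseResult): string {
  const vertices = result.vertices.slice().sort();
  const edges = result.edges.map(([from, to]) => `${from}->${to}`);
  return JSON.stringify({ vertices, edges });
}

// Returns the inputs for which the two parsers disagree.
// An exception thrown by either parser is counted as a mismatch.
function findMismatches(inputs: string[], reference: ParseFn, candidate: ParseFn): string[] {
  const mismatches: string[] = [];
  for (const input of inputs) {
    let same: boolean;
    try {
      same = normalizeResult(reference(input)) === normalizeResult(candidate(input));
    } catch {
      same = false;
    }
    if (!same) {
      mismatches.push(input);
    }
  }
  return mismatches;
}

// Toy reference parser: splits a chain such as A-->B-->C on the arrow.
const referenceParser: ParseFn = (input) => {
  const ids = input.split('-->');
  const edges = ids.slice(1).map((to, i): [string, string] => [ids[i], to]);
  return { vertices: ids, edges };
};

// Toy candidate with a deliberate bug: only the first vertex is registered.
const buggyParser: ParseFn = (input) => {
  const result = referenceParser(input);
  return { vertices: result.vertices.slice(0, 1), edges: result.edges };
};

console.log(findMismatches(['A', 'A-->B', 'A-->B-->C'], referenceParser, buggyParser));
```

The input list can then grow beyond the currently failing cases (subgraphs, styles, different link types) without changing the comparison logic.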

Low Risk: Performance Impact

Mitigation: The current implementation is already efficient, so the planned changes should have little performance impact

Tools and Commands

Run Specific Test:

vitest packages/mermaid/src/diagrams/flowchart/parser/flow-chev-arrows.spec.js -t "should handle basic arrow" --run

Run All Chevrotain Tests:

vitest packages/mermaid/src/diagrams/flowchart/parser/*flow*-chev*.spec.js --run

Debug Lexer Tokenization:

// In flowParserAdapter.ts
const lexResult = FlowChevLexer.tokenize(input);
console.debug('Tokens:', lexResult.tokens.map(t => [t.image, t.tokenType.name]));
console.debug('Errors:', lexResult.errors);
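
For longer inputs the raw token dump is easier to read as one compact line that can be compared directly with an expected sequence such as NODE_STRING LINK PIPE TEXT PIPE NODE_STRING. The helper below is a hypothetical addition, not part of flowParserAdapter.ts. It relies only on the image and tokenType.name fields already used in the snippet above, so it is typed structurally and needs no Chevrotain import.

```typescript
// Minimal structural subset of a lexer token: just the fields the dump needs.
interface TokenLike {
  image: string;
  tokenType: { name: string };
}

// One-line dump, e.g. "NODE_STRING(A) LINK(-->) NODE_STRING(B)".
function formatTokens(tokens: TokenLike[]): string {
  return tokens.map((t) => `${t.tokenType.name}(${t.image})`).join(' ');
}

// Token type names only, for comparison with an expected sequence.
function tokenTypeNames(tokens: TokenLike[]): string {
  return tokens.map((t) => t.tokenType.name).join(' ');
}

// Hand-built tokens standing in for lexResult.tokens.
const demoTokens: TokenLike[] = [
  { image: 'A', tokenType: { name: 'NODE_STRING' } },
  { image: '-->', tokenType: { name: 'LINK' } },
  { image: 'B', tokenType: { name: 'NODE_STRING' } },
];

console.debug(formatTokens(demoTokens));
console.debug(tokenTypeNames(demoTokens));
```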

Check Grammar Rule Usage:

// Add logging to AST visitor methods
console.debug('Using standaloneLinkStatement for:', ctx);

Next Actions

Start with Phase 1 - Fix node creation (highest impact)
Debug the exact grammar path being taken for failing tests
Add vertex creation to all edge processing methods
Test incrementally to avoid regressions
Move to Phase 2 only after Phase 1 is complete

This systematic approach ensures we fix the most impactful issues first while maintaining the stability of the 86% of tests that are already passing.
