RegEx : Complted lectures 1-7. NOT TESTED.

This commit is contained in:
Edward R. Gonzalez 2022-07-16 19:55:57 -04:00
parent d48610d5b8
commit 0dbc2c04ba
6 changed files with 368 additions and 0 deletions

View File

@ -0,0 +1,35 @@
# Automata Theory: Building a RegExp machine
## Content:
State Machines
Formal Grammars
Implement a regular expression processor
## History:
*Pioneers:*
1951 - Stephen Kleene invented reg exp (sets).
Reuglar Langauge : Langauge recognized by a finite automata (state machines).
Kleene's Therem : Equivalence of regular expressions and finite automata.
Has a notation named after him:
Kleene-Closure (AKA: Kleene star) : A* (Stands for repetition)
1956 - Chomsky defines his hiearchy fo grammers
Regular grammers are considered a type 3.
See: https://en.wikipedia.org/wiki/Chomsky_hierarchy
![img](https://i.imgur.com/Pj2aFeg.png)
Thus they are the weakest form of grammars.
1968 - Ken Thompson used them for pattern matching in strings, and
lexical analysis (scanners)
NFA - Thompson construction

View File

@ -0,0 +1,74 @@
# Symbols, alphabets, and langauges and Regular Grammars
Alphabet : A set of characters.
Sigma = { a, b }
Langauge : A set of strings over a particular alphabet.
L1(Sigma) = { a, aa, b, ab, ba, bba, .. } (Infinite)
L2(Sigma) = { aa, bb, ab, ba }; (Length = 2, Finite)
Any time you constraint a langauge you are
defining a formal grammar.
## Formal Grammars:
FormalGrammer = (Non-Terminals, Terminals, Productions, Starting Symbol)
Non-Terminals : Variables (can be subsituted with a value)
Terminals : Cannot be replaced by anything (constant)
Productions : Rule in the grammar
**G = (N, T, P, S)**
Ex:
```
S -> aX
X -> b
```
**(This notation is known as BNF : Bakus-Naur Form)**
Ex.Non-Terminals = S, X
Ex.Terminals = a, b
Ex.Productions = S -> aX, X -> b (2)
Ex.Starting Symbol = S
Only valid string : "ab"
## Chomsky Hierachy :
0. Unrestricted : Natural Langauges, Turing Machines
1. Context-Sensitive : Programming Languages (Almost all in production)
2. Context-Free : Programming Langauges (Parsing Syntax only)
3. Regular : Regular Expressions
The lower in the hiearchy the less expressive it is.
RegExp is a vomit inducing terse notation that is equivalent to BNF.
BNF : RegExp
S -> aS :
S -> bA : `a*bc*`
A -> epsilon :
A -> cA :
epsilon : "The empty string".
Regular expressions may only have one non-terminal:
* A the very right side (right-linear, RHS)
* At the very left side (left-linear, LHS)
Regular expression have no support for *NESTING*
They can be *RECURSIVE*
Context-free grammers support nesting.
Ex:
(( () ))
`Parenthesis balacing`
Non-regular RegExp can support nesting but are not pure
finite automata and are slower implementation.

View File

@ -0,0 +1,85 @@
# Finite Automata
***(AKA: Finite State Machine)***
Mechanism and abstraction used behind regular grammars.
Usually has its state represented using nodes and edges.
Regular grammar:
```
S -> bA
A -> epsilon
```
Equivalent to: `\b\`
State transition:
--label--> : Transition symbol
O : State Symbol
(o) : Accepting State
->O.Start : Starting State (State transition to Start)
Ex:
->O.*Start* --*transition*--> (o).*Accepting*
*ε* - Epsilon (Empty String)
`I will be spelling it out as I do not enjoy single glyth representation`
Two main types of Finite Automtata :
FA w/ output
* Moore machine
* Mealy machine
FA w/o output
* DFA - Deterministic
* NFA - Non-deterministic
* epsilon-NFA - (Epsilon Transition) special case
NFA : Non-deterministic FA - Allos transition on the same symbol to
different states
```
a->o
/
->o.1---b-->o
\
a->o
```
epsilon-NFA : Extension of NFA that allows *epsilon* transitions
```
a--->o---epsi--->(o)
/ /
->o----b-->epsi--->o
\
a-->o--epsi-->(o)
```
DFA : A state machine which forbids multiple transitions on the same symbol, and *epsilon* transitions
```
a--->o
/
->o----b-->o
```
Use case:
Implementation Transformations:
```RegExp -> epsilon-NFA -> ... -> DFA```
## Formal Definition:
Non-deterministic finite automata is a tuple of five elements:
* All possible states
* Alphabet
* Transition Function
* Starting State
* Set of accepting states
NFA = ( States, Alphabet, TransitionFunction, StartingState, AcceptingStates )
NFA = ( Q, Σ, Δ, q0, F )

View File

@ -0,0 +1,28 @@
# Basic NFA Fragments
### Single Character
RegExp: `/^A$/`
Psuedo:
`str.start glyph(A) str.end`
^ : Beginning of string : Str.Start
$ : End of a string : Str.End
Machine:
->o.*Start* ---**Glyph**---> (o).*Accepting*
### Epsilon-Transition
RegExp: `/^$/`
Psuedo: `str.start str.end`
Machine:
```
->o --epsilon--> (o)
```
Everyhing else can be built on top of these machines.
```
Start = Input, Accepting = Output
```

View File

@ -0,0 +1,39 @@
## Concatenation
Regex : `/^AB%/`
Psuedo: `str.start str(AB) str.end`
Machine:
```
->o --A--> o --epsilon--> o --B--> (o)
Submachine_A --epsilon--> Submachine_B
```
## Union
Regex : `/^A|B$/`
Psuedo: `str.start glyph(A) | glyph(B) str.end`
Machine:
```
epsilon--> o --A--> o --epsilon
/ \
->o ->(o)
\ /
epsilon--> o --B--> o --epsilon
```
## Kleene Closure
Regex : `/^A*$/`
Psuedo: `str.start glyph(A).repeating str.end`
Machine:
```
<------episolon-------
/ \
->o --epsilon--> o --A--> o --epsilon--> (o)
\ /
-------------epsilon---------------->
```

View File

@ -0,0 +1,107 @@
extends Object
const epsilon = 'ε'
class State :
var accepting : bool = false
var transitionMap : Dictionary
func _init(accepting : bool):
self.accepting = accepting
transitionMap[epsilon] = Array.new()
func add_Transition(symbol : string, state : State):
if symbol == epsilon :
transitionMap[symbol].append(state)
return
transitionMap[symbol] = state
func get_Transition(symbol : string) :
return transitionMap[symbol]
class NFA :
var input : State
var output : State
func _init(input : State, output : State):
self.input = input
self.output = output
func test(string : String) :
return input.test(string)
func concat(first : NFA, rest : Array):
for entry in rest :
first = concat_pair(first, entry)
return first
func concat_pair(first : NFA, second : NFA):
first.output.accepting = false
second.output.accepting = true
first.output.add_Transition(epsilon, second.input)
return NFA.new(first.input, second.output)
# Epsilon-Transition machine
func empty():
return glyph(epsilon)
# Single character machine.
func glyph(symbol : string):
var start = State.new(false)
var accepting = State.new(true)
start.add_Transition(symbol, accepting)
return NFA.new(start, accepting)
func repeat(entry : NFA)
var start = State.new(false)
var accepting = State.new(true)
start.add_Transition(epsilon, entry.input)
entry.output.accepting(false)
entry.output.add_Transition(epsilon, entry.input) # Repeater transition
entry.output.add_Transition(epsilon, accepting)
return NFA.new(start, accepting)
func union(first : NFA, rest : Array):
for entry in rest :
first = union_pair(first, entry)
return first
func union_pair(a : NFA, b : NFA):
var start = State.new(false)
var accepting = State.new(true)
start.add_Transition(epsilon, a.input)
start.add_Transition(epsilon, b.output)
a.output.accepting = false
b.output.accepting = false
a.output.add_Transition(epsilon, accepting)
b.output.add_Transition(epsilon, accepting)
return NFA.new(start, accepting)
func test():
var state_1 = State.new(false)
var state_2 = State.new(true)
state_1.add_Transition('A', state_2)
print("State 1 Transition for " + "A: " + state_1.get_Transition('A'))