Mirror of https://github.com/Ed94/LangStudies.git
Synced 2025-01-22 04:23:45 -08:00
RegEx : Completed lectures 1-7. NOT TESTED.
Parent: d48610d5b8
Commit: 0dbc2c04ba
35
App/RegM/Lectures/Lecture.1.Notes.md
Normal file
@@ -0,0 +1,35 @@
# Automata Theory: Building a RegExp machine

## Content:

* State Machines
* Formal Grammars
* Implementing a regular expression processor

## History:

*Pioneers:*

1951 - Stephen Kleene invented regular expressions (regular sets).

* Regular Language : A language recognized by a finite automaton (state machine).
* Kleene's Theorem : Regular expressions and finite automata are equivalent (they describe exactly the same class of languages).

Has a notation named after him:

* Kleene-Closure (AKA: Kleene star) : `A*` (zero or more repetitions of `A`)

1956 - Chomsky defines his hierarchy of grammars.

Regular grammars are considered type 3. See: https://en.wikipedia.org/wiki/Chomsky_hierarchy

![img](https://i.imgur.com/Pj2aFeg.png)

Thus they are the weakest (least expressive) form of grammars.

1968 - Ken Thompson used them for pattern matching in strings, and for lexical analysis (scanners).

NFA - Thompson construction (translates a regular expression into an NFA)
|
74
App/RegM/Lectures/Lecture.2.Notes.md
Normal file
@@ -0,0 +1,74 @@
# Symbols, alphabets, languages, and Regular Grammars

Alphabet : A finite set of characters (symbols).

Sigma = { a, b }

Language : A set of strings over a particular alphabet.

* L1(Sigma) = { a, aa, b, ab, ba, bba, .. } (Infinite)
* L2(Sigma) = { aa, bb, ab, ba } (Length = 2, Finite)
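The Python sketch below lists L2(Sigma) by brute force. It is written in Python rather than the repo's GDScript so that it runs on its own. The helper name `strings_of_length` is hypothetical and does not come from the lectures.

```python
from itertools import product


def strings_of_length(alphabet, length):
    """All strings of exactly `length` symbols over `alphabet` (hypothetical helper)."""
    return {"".join(chars) for chars in product(sorted(alphabet), repeat=length)}


sigma = {"a", "b"}
print(sorted(strings_of_length(sigma, 2)))  # ['aa', 'ab', 'ba', 'bb']
```

A finite language like L2 can simply be listed this way. An infinite one like L1 cannot, which is why it has to be described by rules instead.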
Any time you specify rules that constrain which strings belong to a language, you are defining a formal grammar.

## Formal Grammars:

FormalGrammar = (Non-Terminals, Terminals, Productions, Starting Symbol)

* Non-Terminals : Variables (can be substituted with other symbols)
* Terminals : Cannot be replaced by anything (constants)
* Productions : Rules in the grammar (how a non-terminal may be rewritten)

**G = (N, T, P, S)**

Ex:

```
S -> aX
X -> b
```

**(This notation is known as BNF : Backus-Naur Form)**

* Ex.Non-Terminals = S, X
* Ex.Terminals = a, b
* Ex.Productions = S -> aX, X -> b (2 rules)
* Ex.Starting Symbol = S

Only valid string : "ab"
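One way to see that G produces only "ab" is to apply the productions mechanically. This Python sketch uses a hypothetical encoding: each key of the grammar dict is a single-character non-terminal, every other character is a terminal, and `""` stands for epsilon. It also assumes right-linear rules, so every sentential form holds at most one non-terminal and the search stays finite.

```python
from collections import deque

# Hypothetical encoding: dict keys are single-character non-terminals,
# every other character is a terminal, and "" stands for epsilon.
GRAMMAR = {
    "S": ["aX"],  # S -> aX
    "X": ["b"],   # X -> b
}


def generate(grammar, start="S", max_len=6):
    """Expand the leftmost non-terminal breadth-first; collect strings up to max_len.

    Assumes a right-linear grammar (at most one non-terminal per form),
    which keeps the number of reachable forms finite.
    """
    results, seen = set(), {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        index = next((i for i, ch in enumerate(form) if ch in grammar), None)
        if index is None:
            results.add(form)  # Only terminals left: a string of the language
            continue
        for rhs in grammar[form[index]]:
            new_form = form[:index] + rhs + form[index + 1:]
            terminal_count = sum(ch not in grammar for ch in new_form)
            if terminal_count <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return results


print(generate(GRAMMAR))  # {'ab'}
```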
## Chomsky Hierarchy :

0. Unrestricted : Natural Languages, Turing Machines
1. Context-Sensitive : Programming Languages (Almost all in production)
2. Context-Free : Programming Languages (Parsing Syntax only)
3. Regular : Regular Expressions

The higher the type number, the less expressive the grammar: type 3 (regular) is the most restricted.

RegExp is a vomit-inducingly terse notation that is equivalent to a regular grammar written in BNF.

| BNF | RegExp |
| --- | --- |
| S -> aS | |
| S -> bA | `a*bc*` |
| A -> epsilon | |
| A -> cA | |

epsilon : "The empty string".
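Every production in this grammar has the form `X -> tY` or `X -> epsilon`. That means each non-terminal can be read as a state and each rule as a transition.

The Python sketch below walks the rules that way and cross-checks the result against Python's `re.fullmatch` with `a*bc*`. The table layout and the `derives` helper are hypothetical. The approach only works because no non-terminal has two rules that start with the same terminal. A non-deterministic grammar would need the set-of-states simulation that NFAs use.

```python
import re

# Hypothetical table encoding of the grammar for a*bc*:
# (non-terminal, terminal) -> non-terminal that follows it.
PRODUCTIONS = {
    ("S", "a"): "S",  # S -> aS
    ("S", "b"): "A",  # S -> bA
    ("A", "c"): "A",  # A -> cA
}
ACCEPTING = {"A"}     # A -> epsilon: the derivation may stop at A


def derives(text, start="S"):
    """Consume `text` one terminal at a time; each non-terminal acts as a state."""
    symbol = start
    for ch in text:
        symbol = PRODUCTIONS.get((symbol, ch))
        if symbol is None:
            return False  # No production can produce this character here
    return symbol in ACCEPTING


for sample in ["b", "aabcc", "abcb", "ac", ""]:
    # The grammar and the RegExp should agree on every sample.
    assert derives(sample) == (re.fullmatch(r"a*bc*", sample) is not None)
    print(repr(sample), derives(sample))
```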
In a regular grammar, each production may contain at most one non-terminal on its right-hand side, and that non-terminal must sit:

* At the very right end (right-linear), e.g. `S -> aS`
* At the very left end (left-linear), e.g. `S -> Sa`

(A single grammar has to stick to one of the two forms.)

Regular expressions have no support for *NESTING*.
They can be *RECURSIVE*, but only in this linear way (e.g. `S -> aS`).

Context-free grammars support nesting.

Ex: `(( () ))` (`Parenthesis balancing`)

Non-regular RegExp (engine extensions that go beyond regular languages) can support nesting, but they are no longer pure finite automata and their implementations are typically slower.
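Balancing parentheses only needs a counter. That counter is unbounded, though, and unbounded memory is exactly what a finite automaton lacks. Here is a minimal Python sketch; the `is_balanced` name is illustrative.

```python
def is_balanced(text):
    """Check '(' / ')' nesting with a depth counter (a one-symbol stack).

    A finite automaton has a fixed number of states, so it cannot remember
    an arbitrarily large depth; this counter is the extra memory that a
    context-free (pushdown) recognizer adds. Other characters are ignored.
    """
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False  # Closed more than we opened
    return depth == 0


print(is_balanced("(( () ))"))  # True
print(is_balanced("(()"))       # False
print(is_balanced(")("))        # False
```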
|
85
App/RegM/Lectures/Lecture.3.Notes.md
Normal file
@@ -0,0 +1,85 @@
# Finite Automata

***(AKA: Finite State Machine)***

The mechanism (and abstraction) behind regular grammars.

Usually drawn as a graph: states are nodes and transitions are edges.

Regular grammar:

```
S -> bA
A -> epsilon
```

Equivalent to: `/b/` (the language containing only "b")

State transition:

* --label--> : Transition symbol
* O : State Symbol
* (o) : Accepting State
* ->O.Start : Starting State (the incoming arrow marks the start)

Ex:

->O.*Start* --*transition*--> (o).*Accepting*

*ε* - Epsilon (Empty String)

`I will be spelling it out as I do not enjoy single glyph representation`

Two main types of Finite Automata :

FA w/ output

* Moore machine (output depends only on the current state)
* Mealy machine (output depends on the current state and the input symbol)

FA w/o output

* DFA - Deterministic
* NFA - Non-deterministic
* epsilon-NFA - (Epsilon Transition) special case

NFA : Non-deterministic FA - Allows transitions on the same symbol from one state to different states

```
        --a-->o
       /
->o.1 -----b--->o
       \
        --a-->o
```

epsilon-NFA : Extension of NFA that allows *epsilon* transitions

```
      a--->o---epsi--->(o)
     /               /
->o----b-->epsi--->o
     \
      a-->o--epsi-->(o)
```

DFA : A state machine which forbids *epsilon* transitions and forbids multiple transitions on the same symbol from the same state

```
      a--->o
     /
->o----b-->o
```
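The difference between the NFA and DFA drawings can be expressed as a check over a transition table. This Python sketch assumes a hypothetical encoding: the table maps `(state, symbol)` to a set of target states, and the empty string marks epsilon. The state numbers are made up to mirror the drawings. It checks only the two constraints named above; stricter definitions of a DFA also require a transition for every state/symbol pair.

```python
EPSILON = ""  # Hypothetical marker: the empty string labels an epsilon transition.


def is_deterministic(delta):
    """delta maps (state, symbol) -> set of next states.

    A DFA forbids epsilon transitions and forbids a state from moving
    to more than one state on the same symbol.
    """
    for (_state, symbol), targets in delta.items():
        if symbol == EPSILON or len(targets) > 1:
            return False
    return True


# Mirrors the NFA drawing: state 1 goes to two different states on 'a'.
nfa = {(1, "a"): {2, 3}, (1, "b"): {4}}
# Mirrors the DFA drawing: each symbol leads to exactly one state.
dfa = {(1, "a"): {2}, (1, "b"): {3}}

print(is_deterministic(nfa))  # False
print(is_deterministic(dfa))  # True
```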

Use case (implementation transformations):

```RegExp -> epsilon-NFA -> ... -> DFA```

## Formal Definition:

A non-deterministic finite automaton is a tuple of five elements:

* All possible states
* Alphabet
* Transition Function
* Starting State
* Set of accepting states

NFA = ( States, Alphabet, TransitionFunction, StartingState, AcceptingStates )

NFA = ( Q, Σ, Δ, q0, F )
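The 5-tuple maps directly onto data. In the usual definition, the transition function has the shape Δ : Q × (Σ ∪ {ε}) → P(Q), so it returns a *set* of states.

The Python sketch below encodes a small, made-up NFA in that form. It simulates the NFA by tracking every state the machine could be in at once, following epsilon transitions through an epsilon-closure. The particular states, transitions, and the empty-string epsilon marker are illustrative assumptions, not taken from the lectures.

```python
EPSILON = ""  # Hypothetical marker: the empty string labels an epsilon transition.

# NFA = (Q, Sigma, Delta, q0, F) for a small made-up machine.
Q = {0, 1, 2, 3}          # Listed for completeness; the simulation only needs Delta
SIGMA = {"a", "b"}
DELTA = {                 # (state, symbol) -> set of next states
    (0, "a"): {1, 2},     # Non-determinism: 'a' leads to two states
    (1, "b"): {3},
    (2, EPSILON): {3},    # Epsilon transition: moves without consuming input
}
Q0 = 0
F = {3}


def epsilon_closure(states):
    """Every state reachable from `states` using epsilon transitions alone."""
    closure, stack = set(states), list(states)
    while stack:
        state = stack.pop()
        for nxt in DELTA.get((state, EPSILON), set()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return closure


def accepts(text):
    """Track every state the NFA could be in at once."""
    current = epsilon_closure({Q0})
    for ch in text:
        if ch not in SIGMA:
            return False  # Symbol outside the alphabet
        moved = set()
        for state in current:
            moved |= DELTA.get((state, ch), set())
        current = epsilon_closure(moved)
    return bool(current & F)


print(accepts("a"))   # True: 0 -a-> 2 -epsilon-> 3
print(accepts("ab"))  # True: 0 -a-> 1 -b-> 3
print(accepts("b"))   # False
```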
28
App/RegM/Lectures/Lecture.4.Notes.md
Normal file
@@ -0,0 +1,28 @@
# Basic NFA Fragments

### Single Character

RegExp: `/^A$/`

Pseudo:

`str.start glyph(A) str.end`

* ^ : Beginning of string : Str.Start
* $ : End of a string : Str.End

Machine:

->o.*Start* ---**Glyph**---> (o).*Accepting*

### Epsilon-Transition

RegExp: `/^$/`

Pseudo: `str.start str.end`

Machine:

```
->o --epsilon--> (o)
```

Everything else can be built on top of these machines.

```
Start = Input, Accepting = Output
```
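Here is a Python sketch of the two basic fragments for comparison with `FiniteAutomata.gd`. `glyph` and `empty` mirror the GDScript function names. The `State`/`NFA` shape and the `test` matcher (an epsilon-closure simulation) are assumptions added so the sketch runs on its own; they are not the repo's implementation.

```python
EPSILON = "ε"  # Same epsilon marker as FiniteAutomata.gd


class State:
    def __init__(self, accepting=False):
        self.accepting = accepting
        self.transitions = {}  # symbol -> list of next states

    def add_transition(self, symbol, state):
        self.transitions.setdefault(symbol, []).append(state)


class NFA:
    def __init__(self, start, accepting):
        self.input = start       # Start = Input
        self.output = accepting  # Accepting = Output

    def test(self, text):
        """Follow epsilon transitions freely; consume one character per step otherwise."""
        def closure(states):
            stack, seen = list(states), set(states)
            while stack:
                for nxt in stack.pop().transitions.get(EPSILON, []):
                    if nxt not in seen:
                        seen.add(nxt)
                        stack.append(nxt)
            return seen

        current = closure({self.input})
        for ch in text:
            current = closure({nxt for s in current for nxt in s.transitions.get(ch, [])})
        return any(state.accepting for state in current)


def glyph(symbol):
    """Single character machine: ->o --symbol--> (o)"""
    start, accepting = State(), State(accepting=True)
    start.add_transition(symbol, accepting)
    return NFA(start, accepting)


def empty():
    """Epsilon-transition machine: ->o --epsilon--> (o)"""
    return glyph(EPSILON)


a = glyph("A")
print(a.test("A"), a.test(""), a.test("AA"))  # True False False
e = empty()
print(e.test(""), e.test("A"))                # True False
```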
|
39
App/RegM/Lectures/Lecture.5.6.7.Notes.md
Normal file
@@ -0,0 +1,39 @@
## Concatenation

Regex : `/^AB$/`

Pseudo: `str.start str(AB) str.end`

Machine:

```
->o --A--> o --epsilon--> o --B--> (o)

Submachine_A --epsilon--> Submachine_B
```

## Union

Regex : `/^(A|B)$/` (the parentheses matter: `/^A|B$/` parses as `^A` or `B$`, because alternation has the lowest precedence)

Pseudo: `str.start (glyph(A) | glyph(B)) str.end`

Machine:

```
      --epsilon--> o --A--> o --epsilon--
     /                                   \
->o                                       ->(o)
     \                                   /
      --epsilon--> o --B--> o --epsilon--
```

## Kleene Closure

Regex : `/^A*$/`

Pseudo: `str.start glyph(A).repeating str.end`

Machine:

```
                   <--epsilon--
                  /            \
->o --epsilon--> o -----A-----> o --epsilon--> (o)
   \                                           /
    ------------------epsilon----------------->
```
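Concatenation, union, and Kleene closure only add new states and epsilon edges around existing fragments (Thompson's construction). The self-contained Python sketch below wires them up the way the diagrams do. Function names follow `FiniteAutomata.gd`. `matches` is a hypothetical helper that simulates the resulting NFA, and the Python classes are an illustration rather than the repo's code.

```python
EPSILON = "ε"  # Same epsilon marker as FiniteAutomata.gd


class State:
    def __init__(self, accepting=False):
        self.accepting = accepting
        self.transitions = {}  # symbol -> list of next states

    def add_transition(self, symbol, state):
        self.transitions.setdefault(symbol, []).append(state)


class NFA:
    def __init__(self, start, accepting):
        self.input = start        # Start = Input
        self.output = accepting   # Accepting = Output


def glyph(symbol):
    """Single character fragment: ->o --symbol--> (o)"""
    start, accepting = State(), State(accepting=True)
    start.add_transition(symbol, accepting)
    return NFA(start, accepting)


def concat_pair(first, second):
    """Submachine_A --epsilon--> Submachine_B"""
    first.output.accepting = False
    first.output.add_transition(EPSILON, second.input)
    return NFA(first.input, second.output)


def union_pair(a, b):
    """New start fans out to both branches; both branches join a new accepting state."""
    start, accepting = State(), State(accepting=True)
    for branch in (a, b):
        start.add_transition(EPSILON, branch.input)
        branch.output.accepting = False
        branch.output.add_transition(EPSILON, accepting)
    return NFA(start, accepting)


def repeat(entry):
    """Kleene closure: a skip edge (zero times) plus a back edge (one or more)."""
    start, accepting = State(), State(accepting=True)
    start.add_transition(EPSILON, entry.input)
    start.add_transition(EPSILON, accepting)           # Zero repetitions
    entry.output.accepting = False
    entry.output.add_transition(EPSILON, entry.input)  # Repeater transition
    entry.output.add_transition(EPSILON, accepting)
    return NFA(start, accepting)


def matches(machine, text):
    """Simulate the NFA by tracking the epsilon-closed set of current states."""
    def closure(states):
        stack, seen = list(states), set(states)
        while stack:
            for nxt in stack.pop().transitions.get(EPSILON, []):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    current = closure({machine.input})
    for ch in text:
        current = closure({nxt for s in current for nxt in s.transitions.get(ch, [])})
    return any(state.accepting for state in current)


print(matches(concat_pair(glyph("A"), glyph("B")), "AB"))  # /^AB$/    -> True
print(matches(union_pair(glyph("A"), glyph("B")), "B"))    # /^(A|B)$/ -> True
print(matches(repeat(glyph("A")), ""))                     # /^A*$/    -> True
print(matches(repeat(glyph("A")), "AAB"))                  # /^A*$/    -> False
```

Composition rewires the fragments it is given. Each fragment should therefore be used only once: build a fresh `glyph(...)` for every occurrence of a character in the pattern.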
107
App/RegM/Scripts/FiniteAutomata.gd
Normal file
@@ -0,0 +1,107 @@
extends Object


const epsilon = 'ε'


class State :
	var accepting : bool = false
	var transitionMap : Dictionary = {}

	func _init(accepting : bool):
		self.accepting = accepting

		# Epsilon transitions may fan out to several states, so they are kept in an array.
		transitionMap[epsilon] = []

	func add_Transition(symbol : String, state : State):
		if symbol == epsilon :
			transitionMap[symbol].append(state)
			return

		# Any other symbol maps to a single target state.
		transitionMap[symbol] = state

	func get_Transition(symbol : String) :
		# Returns null instead of raising an error when no transition exists.
		return transitionMap.get(symbol)


class NFA :
	var input : State
	var output : State

	func _init(input : State, output : State):
		self.input = input
		self.output = output

	func test(string : String) :
		# NOTE: State does not define test() yet, so this call fails until matching is implemented.
		return input.test(string)


func concat(first : NFA, rest : Array):
	for entry in rest :
		first = concat_pair(first, entry)

	return first

func concat_pair(first : NFA, second : NFA):
	first.output.accepting = false
	second.output.accepting = true

	# Submachine_A --epsilon--> Submachine_B
	first.output.add_Transition(epsilon, second.input)

	return NFA.new(first.input, second.output)

# Epsilon-Transition machine
func empty():
	return glyph(epsilon)

# Single character machine.
func glyph(symbol : String):
	var start = State.new(false)
	var accepting = State.new(true)

	start.add_Transition(symbol, accepting)

	return NFA.new(start, accepting)

# Kleene closure machine.
func repeat(entry : NFA):
	var start = State.new(false)
	var accepting = State.new(true)

	start.add_Transition(epsilon, entry.input)
	start.add_Transition(epsilon, accepting) # Skip transition: allows zero repetitions

	entry.output.accepting = false
	entry.output.add_Transition(epsilon, entry.input) # Repeater transition
	entry.output.add_Transition(epsilon, accepting)

	return NFA.new(start, accepting)

func union(first : NFA, rest : Array):
	for entry in rest :
		first = union_pair(first, entry)

	return first

func union_pair(a : NFA, b : NFA):
	var start = State.new(false)
	var accepting = State.new(true)

	# Both branches start from their input states.
	start.add_Transition(epsilon, a.input)
	start.add_Transition(epsilon, b.input)

	a.output.accepting = false
	b.output.accepting = false

	a.output.add_Transition(epsilon, accepting)
	b.output.add_Transition(epsilon, accepting)

	return NFA.new(start, accepting)


func test():
	var state_1 = State.new(false)
	var state_2 = State.new(true)

	state_1.add_Transition('A', state_2)

	# str() is required: a State object cannot be concatenated to a String directly.
	print("State 1 Transition for " + "A: " + str(state_1.get_Transition('A')))