mirror of
https://github.com/Ed94/LangStudies.git
synced 2025-06-15 03:21:46 -07:00
RegEx : Completed lectures 1-7. NOT TESTED.
35
App/RegM/Lectures/Lecture.1.Notes.md
Normal file
@ -0,0 +1,35 @@
# Automata Theory: Building a RegExp machine

## Content:
State Machines
Formal Grammars
Implement a regular expression processor

## History:

*Pioneers:*

1951 - Stephen Kleene invented regular expressions (regular sets).

Regular Language : A language recognized by a finite automaton (state machine).
Kleene's Theorem : Equivalence of regular expressions and finite automata.

Has a notation named after him:
Kleene-Closure (AKA: Kleene star) : A* (Stands for zero or more repetitions)

1956 - Chomsky defines his hierarchy of grammars

Regular grammars are considered type 3.
See: https://en.wikipedia.org/wiki/Chomsky_hierarchy

Thus they are the weakest (least expressive) form of grammar in the hierarchy.

1968 - Ken Thompson used them for pattern matching in strings, and
lexical analysis (scanners)

NFA - Thompson construction
74
App/RegM/Lectures/Lecture.2.Notes.md
Normal file
@ -0,0 +1,74 @@
# Symbols, Alphabets, Languages, and Regular Grammars

Alphabet : A set of characters.

Sigma = { a, b }

Language : A set of strings over a particular alphabet.

L1(Sigma) = { a, aa, b, ab, ba, bba, ... } (Infinite)
L2(Sigma) = { aa, bb, ab, ba } (Length = 2, Finite)

Any time you constrain a language you are
defining a formal grammar.
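
The two example languages can be enumerated mechanically. The sketch below is only an illustration: the helper names `strings_of_length` and `strings_up_to` are made up for this note, and the infinite L1 is cut off at a maximum length so it can be listed at all.

```python
from itertools import product

SIGMA = ("a", "b")  # the alphabet Sigma from the notes

def strings_of_length(alphabet, n):
    """Return every string of exactly n symbols over the alphabet."""
    return {"".join(symbols) for symbols in product(alphabet, repeat=n)}

def strings_up_to(alphabet, max_length):
    """Finite slice of L1: all non-empty strings of at most max_length symbols."""
    result = set()
    for n in range(1, max_length + 1):
        result |= strings_of_length(alphabet, n)
    return result

L2 = strings_of_length(SIGMA, 2)
print(sorted(L2))                     # ['aa', 'ab', 'ba', 'bb']
print(len(strings_up_to(SIGMA, 3)))   # 2 + 4 + 8 = 14
```

The "Length = 2" constraint is what turns the infinite set into the finite L2.
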

## Formal Grammars:

FormalGrammar = (Non-Terminals, Terminals, Productions, Starting Symbol)

Non-Terminals : Variables (can be substituted with a value)
Terminals : Cannot be replaced by anything (constant)
Productions : Rules in the grammar

**G = (N, T, P, S)**

Ex:
```
S -> aX
X -> b
```
**(This notation is known as BNF : Backus-Naur Form)**

Ex.Non-Terminals = S, X
Ex.Terminals = a, b
Ex.Productions = S -> aX, X -> b (2)
Ex.Starting Symbol = S

Only valid string : "ab"
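
To see why "ab" is the only valid string, the productions can be applied by brute force. This is a rough sketch, not part of the lectures: the `derive` function and the dictionary layout are assumptions chosen for illustration, and non-terminals are assumed to be single characters.

```python
# Productions of the example grammar: non-terminal -> list of right-hand sides.
PRODUCTIONS = {
    "S": ["aX"],
    "X": ["b"],
}

def derive(start, productions, max_steps=10):
    """Rewrite the leftmost non-terminal repeatedly.

    Returns the terminal-only strings reachable within max_steps rewrites.
    """
    finished = set()
    frontier = {start}
    for _ in range(max_steps):
        next_frontier = set()
        for form in frontier:
            # Position of the leftmost non-terminal, or None if only terminals remain.
            position = next(
                (i for i, symbol in enumerate(form) if symbol in productions), None
            )
            if position is None:
                finished.add(form)
                continue
            for rhs in productions[form[position]]:
                next_frontier.add(form[:position] + rhs + form[position + 1:])
        frontier = next_frontier
    # Keep any forms that became terminal-only on the very last step.
    finished |= {f for f in frontier if not any(s in productions for s in f)}
    return finished

print(derive("S", PRODUCTIONS))  # {'ab'}
```
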

## Chomsky Hierarchy :

0. Unrestricted : Natural Languages, Turing Machines
1. Context-Sensitive : Programming Languages (Almost all in production)
2. Context-Free : Programming Languages (Parsing Syntax only)
3. Regular : Regular Expressions

The higher the type number (the lower in the hierarchy), the less expressive the grammar is.

RegExp is a vomit inducing terse notation that is equivalent to a regular grammar written in BNF.

BNF          : RegExp
S -> aS      :
S -> bA      : `a*bc*`
A -> epsilon :
A -> cA      :

epsilon : "The empty string".
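
The equivalence between the four productions and `a*bc*` can be spot-checked by comparing both on every short string. This is a sanity check under assumptions, not a proof: `generated_by_grammar` is a hypothetical helper that walks the grammar with one state per non-terminal, and only strings up to length 5 are tried.

```python
import re
from itertools import product

def generated_by_grammar(text):
    """Recognize strings of the grammar S -> aS | bA, A -> cA | epsilon."""
    state = "S"
    for symbol in text:
        if state == "S" and symbol == "a":
            state = "S"       # S -> aS
        elif state == "S" and symbol == "b":
            state = "A"       # S -> bA
        elif state == "A" and symbol == "c":
            state = "A"       # A -> cA
        else:
            return False      # no production applies
    return state == "A"       # A -> epsilon ends the derivation

PATTERN = re.compile(r"a*bc*")

# Compare both recognizers on every string over {a, b, c} up to length 5.
mismatches = [
    "".join(symbols)
    for length in range(6)
    for symbols in product("abc", repeat=length)
    if generated_by_grammar("".join(symbols)) != bool(PATTERN.fullmatch("".join(symbols)))
]
print(mismatches)  # []
```
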

A production in a regular grammar may contain only one non-terminal, and a single grammar must not mix the two forms:
* At the very right side (right-linear, RHS)
* At the very left side (left-linear, LHS)

Regular expressions have no support for *NESTING*
They can be *RECURSIVE*

Context-free grammars support nesting.
Ex:
(( () ))
`Parenthesis balancing`
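
A small sketch of why balancing needs more than a finite automaton: the checker below keeps a depth counter that can grow without bound, which a machine with a fixed number of states cannot do. The function name `is_balanced` is made up for this note.

```python
def is_balanced(text):
    """Check parenthesis nesting with a depth counter; other glyphs are ignored."""
    depth = 0
    for symbol in text:
        if symbol == "(":
            depth += 1
        elif symbol == ")":
            depth -= 1
            if depth < 0:     # closed more than was opened
                return False
    return depth == 0

print(is_balanced("(( () ))"))  # True
print(is_balanced("(()"))       # False
```
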

Non-regular RegExp engines can support nesting, but they are not pure
finite automata and are typically slower implementations.
85
App/RegM/Lectures/Lecture.3.Notes.md
Normal file
@ -0,0 +1,85 @@
# Finite Automata
***(AKA: Finite State Machine)***

The mechanism and abstraction behind regular grammars.

Usually drawn as a graph, with states as nodes and transitions as edges.

Regular grammar:
```
S -> bA
A -> epsilon
```
Equivalent to: `/b/`

State transition:

--label--> : Transition symbol
O : State Symbol
(o) : Accepting State
->O.Start : Starting State (State transition to Start)

Ex:

->O.*Start* --*transition*--> (o).*Accepting*

*ε* - Epsilon (Empty String)
`I will be spelling it out as I do not enjoy single glyph representation`

Two main types of Finite Automata :

FA w/ output
* Moore machine
* Mealy machine

FA w/o output
* DFA - Deterministic
* NFA - Non-deterministic
* epsilon-NFA - (Epsilon Transition) special case

NFA : Non-deterministic FA - Allows transitions on the same symbol to
different states

```
       a->o
      /
->o.1---b-->o
      \
       a->o
```

epsilon-NFA : Extension of NFA that allows *epsilon* transitions

```
       a--->o---epsi--->(o)
      /               /
->o----b-->epsi--->o
      \
       a-->o--epsi-->(o)
```

DFA : A state machine which forbids both multiple transitions on the same symbol from one state and *epsilon* transitions

```
      a--->o
     /
->o----b-->o
```
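
A minimal table-driven sketch of the DFA drawn above. The state names `q0`, `q1`, `q2` are invented, and treating both target states as accepting is an assumption, since the diagram does not mark any accepting state.

```python
# Transition table keyed by (state, symbol). A dict key maps to exactly one
# target, so multiple transitions on the same symbol cannot be expressed,
# and there is no epsilon key at all.
TRANSITIONS = {
    ("q0", "a"): "q1",
    ("q0", "b"): "q2",
}
START = "q0"
ACCEPTING = {"q1", "q2"}   # assumption: not shown in the diagram

def dfa_accepts(text):
    """Follow one transition per input symbol; reject if none exists."""
    state = START
    for symbol in text:
        state = TRANSITIONS.get((state, symbol))
        if state is None:
            return False
    return state in ACCEPTING

print(dfa_accepts("a"), dfa_accepts("b"), dfa_accepts("ab"))  # True True False
```

Because there is never more than one current state, matching costs one table lookup per input symbol.
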

Use case:

Implementation Transformations:
```RegExp -> epsilon-NFA -> ... -> DFA```

## Formal Definition:

A non-deterministic finite automaton is a tuple of five elements:
* All possible states
* Alphabet
* Transition Function
* Starting State
* Set of accepting states

NFA = ( States, Alphabet, TransitionFunction, StartingState, AcceptingStates )

NFA = ( Q, Σ, Δ, q0, F )
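
The five-element tuple maps directly onto a small data structure. The sketch below is one possible encoding, not the course's implementation: the field names, the `accepts` helper, and the example machine (strings over {a, b} ending in "ab") are all assumptions, and epsilon transitions are left out to keep it short.

```python
from typing import NamedTuple

class NFA(NamedTuple):
    states: frozenset       # Q
    alphabet: frozenset     # Sigma
    transitions: dict       # Delta: (state, symbol) -> set of next states
    start: str              # q0
    accepting: frozenset    # F

def accepts(nfa, text):
    """Track the set of states the NFA could be in after each symbol."""
    current = {nfa.start}
    for symbol in text:
        current = {
            target
            for state in current
            for target in nfa.transitions.get((state, symbol), set())
        }
    return bool(current & nfa.accepting)

# Hypothetical machine: strings over {a, b} that end in "ab".
ends_in_ab = NFA(
    states=frozenset({"q0", "q1", "q2"}),
    alphabet=frozenset({"a", "b"}),
    transitions={
        ("q0", "a"): {"q0", "q1"},   # two targets on "a": non-determinism
        ("q0", "b"): {"q0"},
        ("q1", "b"): {"q2"},
    },
    start="q0",
    accepting=frozenset({"q2"}),
)

print(accepts(ends_in_ab, "aab"), accepts(ends_in_ab, "aba"))  # True False
```
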
28
App/RegM/Lectures/Lecture.4.Notes.md
Normal file
@ -0,0 +1,28 @@
# Basic NFA Fragments

### Single Character
RegExp: `/^A$/`
Pseudo:
`str.start glyph(A) str.end`

^ : Beginning of string : Str.Start
$ : End of a string : Str.End

Machine:
->o.*Start* ---**Glyph**---> (o).*Accepting*

### Epsilon-Transition
RegExp: `/^$/`
Pseudo: `str.start str.end`

Machine:
```
->o --epsilon--> (o)
```

Everything else can be built on top of these machines.

```
Start = Input, Accepting = Output
```
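
The two basic fragments can be sketched as tiny edge lists. Everything here is an illustrative assumption rather than the course's code: a fragment is a `(start, accepting, edges)` triple, `None` stands for epsilon, and `accepts` is a naive simulator added only so the fragments can be tried out.

```python
import itertools

EPSILON = None                   # marker for an epsilon transition
_state_ids = itertools.count()   # hypothetical helper: fresh state numbers

def char(symbol):
    """Fragment ->o --symbol--> (o), returned as (start, accepting, edges)."""
    start, accepting = next(_state_ids), next(_state_ids)
    return start, accepting, [(start, symbol, accepting)]

def epsilon():
    """Fragment ->o --epsilon--> (o)."""
    return char(EPSILON)

def closure(states, edges):
    """Add every state reachable through epsilon transitions alone."""
    states = set(states)
    changed = True
    while changed:
        changed = False
        for source, symbol, target in edges:
            if symbol is EPSILON and source in states and target not in states:
                states.add(target)
                changed = True
    return states

def accepts(fragment, text):
    """Simulate the fragment on the whole text (the ^ and $ anchors)."""
    start, accepting, edges = fragment
    current = closure({start}, edges)
    for glyph in text:
        moved = {t for s, symbol, t in edges if symbol == glyph and s in current}
        current = closure(moved, edges)
    return accepting in current

print(accepts(char("A"), "A"), accepts(char("A"), "B"))   # True False
print(accepts(epsilon(), ""), accepts(epsilon(), "A"))    # True False
```

Each fragment exposes exactly one input (start) and one output (accepting) state, which is what lets larger machines be wired together from them.
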
39
App/RegM/Lectures/Lecture.5.6.7.Notes.md
Normal file
@ -0,0 +1,39 @@
## Concatenation

Regex : `/^AB$/`
Pseudo: `str.start str(AB) str.end`

Machine:
```
->o --A--> o --epsilon--> o --B--> (o)

Submachine_A --epsilon--> Submachine_B
```
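
A minimal sketch of the concatenation wiring, assuming fragments are `(start, accepting, edges)` triples with `None` as the epsilon marker. The helper names (`char`, `concat`, `accepts`) are invented for this note and the simulator is deliberately naive.

```python
import itertools

EPSILON = None                   # marker for an epsilon transition
_state_ids = itertools.count()   # hypothetical helper: fresh state numbers

def char(symbol):
    """Single-character fragment, returned as (start, accepting, edges)."""
    start, accepting = next(_state_ids), next(_state_ids)
    return start, accepting, [(start, symbol, accepting)]

def concat(first, second):
    """Submachine_A --epsilon--> Submachine_B."""
    start_1, accepting_1, edges_1 = first
    start_2, accepting_2, edges_2 = second
    glue = [(accepting_1, EPSILON, start_2)]
    return start_1, accepting_2, edges_1 + edges_2 + glue

def accepts(fragment, text):
    """Simulate the fragment on text, following epsilon edges for free."""
    start, accepting, edges = fragment

    def closure(states):
        states = set(states)
        changed = True
        while changed:
            changed = False
            for source, symbol, target in edges:
                if symbol is EPSILON and source in states and target not in states:
                    states.add(target)
                    changed = True
        return states

    current = closure({start})
    for glyph in text:
        current = closure({t for s, symbol, t in edges if symbol == glyph and s in current})
    return accepting in current

ab = concat(char("A"), char("B"))
print(accepts(ab, "AB"), accepts(ab, "A"), accepts(ab, "BA"))  # True False False
```

The accepting state of the first submachine stops being the overall output; only the second submachine's accepting state is returned.
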

## Union

Regex : `/^(A|B)$/`
Pseudo: `str.start glyph(A) | glyph(B) str.end`

Machine:
```
     epsilon--> o --A--> o --epsilon
    /                               \
->o                                  ->(o)
    \                               /
     epsilon--> o --B--> o --epsilon
```
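
The union diagram can be sketched the same way: a new start state fans out over epsilon edges and both branches fan back in to a new accepting state. As before, the `(start, accepting, edges)` triple, the `None` epsilon marker, and all helper names are assumptions made for illustration.

```python
import itertools

EPSILON = None                   # marker for an epsilon transition
_state_ids = itertools.count()   # hypothetical helper: fresh state numbers

def char(symbol):
    """Single-character fragment, returned as (start, accepting, edges)."""
    start, accepting = next(_state_ids), next(_state_ids)
    return start, accepting, [(start, symbol, accepting)]

def union(first, second):
    """New start fans out to both submachines; both fan in to a new accepting state."""
    start_1, accepting_1, edges_1 = first
    start_2, accepting_2, edges_2 = second
    start, accepting = next(_state_ids), next(_state_ids)
    glue = [
        (start, EPSILON, start_1),
        (start, EPSILON, start_2),
        (accepting_1, EPSILON, accepting),
        (accepting_2, EPSILON, accepting),
    ]
    return start, accepting, edges_1 + edges_2 + glue

def accepts(fragment, text):
    """Simulate the fragment on text, following epsilon edges for free."""
    start, accepting, edges = fragment

    def closure(states):
        states = set(states)
        changed = True
        while changed:
            changed = False
            for source, symbol, target in edges:
                if symbol is EPSILON and source in states and target not in states:
                    states.add(target)
                    changed = True
        return states

    current = closure({start})
    for glyph in text:
        current = closure({t for s, symbol, t in edges if symbol == glyph and s in current})
    return accepting in current

a_or_b = union(char("A"), char("B"))
print(accepts(a_or_b, "A"), accepts(a_or_b, "B"), accepts(a_or_b, "AB"))  # True True False
```
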

## Kleene Closure

Regex : `/^A*$/`
Pseudo: `str.start glyph(A).repeating str.end`

Machine:
```
                   <------epsilon--------
                  /                      \
->o --epsilon--> o --A--> o --epsilon--> (o)
   \                                    /
    -------------epsilon---------------->
```
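
A sketch of the Kleene closure wiring under the same assumptions as the other fragments (`(start, accepting, edges)` triples, `None` as epsilon, invented helper names). The back edge is taken here to run from the new accepting state to the inner start, which is one reading of the diagram; looping from the inner accepting state instead would accept the same strings.

```python
import itertools

EPSILON = None                   # marker for an epsilon transition
_state_ids = itertools.count()   # hypothetical helper: fresh state numbers

def char(symbol):
    """Single-character fragment, returned as (start, accepting, edges)."""
    start, accepting = next(_state_ids), next(_state_ids)
    return start, accepting, [(start, symbol, accepting)]

def star(inner):
    """Zero or more repetitions of the inner fragment."""
    inner_start, inner_accepting, inner_edges = inner
    start, accepting = next(_state_ids), next(_state_ids)
    glue = [
        (start, EPSILON, inner_start),          # enter the submachine
        (inner_accepting, EPSILON, accepting),  # leave the submachine
        (start, EPSILON, accepting),            # skip it: zero repetitions
        (accepting, EPSILON, inner_start),      # go back: one more repetition
    ]
    return start, accepting, inner_edges + glue

def accepts(fragment, text):
    """Simulate the fragment on text, following epsilon edges for free."""
    start, accepting, edges = fragment

    def closure(states):
        states = set(states)
        changed = True
        while changed:
            changed = False
            for source, symbol, target in edges:
                if symbol is EPSILON and source in states and target not in states:
                    states.add(target)
                    changed = True
        return states

    current = closure({start})
    for glyph in text:
        current = closure({t for s, symbol, t in edges if symbol == glyph and s in current})
    return accepting in current

a_star = star(char("A"))
print(accepts(a_star, ""), accepts(a_star, "AAA"), accepts(a_star, "AB"))  # True True False
```
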