RegEx : Completed lectures 1-7. NOT TESTED.

2022-07-16 19:55:57 -04:00
parent d48610d5b8
commit 0dbc2c04ba
6 changed files with 368 additions and 0 deletions

@ -0,0 +1,35 @@
# Automata Theory: Building a RegExp machine
## Content:
State Machines
Formal Grammars
Implement a regular expression processor
## History:
*Pioneers:*
1951 - Stephen Kleene invented regular expressions (regular sets).
Regular Language : A language recognized by a finite automaton (state machine).
Kleene's Theorem : Equivalence of regular expressions and finite automata.
Has a notation named after him:
Kleene-Closure (AKA: Kleene star) : A* (zero or more repetitions of A)
1956 - Chomsky defines his hierarchy of grammars.
Regular grammars are type 3.
See: https://en.wikipedia.org/wiki/Chomsky_hierarchy
![img](https://i.imgur.com/Pj2aFeg.png)
Thus they are the weakest form of grammars.
1968 - Ken Thompson used them for pattern matching in strings, and
lexical analysis (scanners)
NFA - Thompson construction

@ -0,0 +1,74 @@
# Symbols, Alphabets, Languages, and Regular Grammars
Alphabet : A set of characters.
Sigma = { a, b }
Language : A set of strings over a particular alphabet.
L1(Sigma) = { a, aa, b, ab, ba, bba, ... } (Infinite)
L2(Sigma) = { aa, bb, ab, ba } (Length = 2, Finite)
Any time you constrain a language you are
defining a formal grammar.
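
The finite language L2 above can be enumerated mechanically. A minimal Python sketch (not from the lectures; `SIGMA` and `strings_of_length` are names made up for this example):

```python
from itertools import product

SIGMA = ("a", "b")  # the alphabet Sigma = { a, b }

def strings_of_length(alphabet, n):
    """Return every string of exactly n symbols over the alphabet."""
    return {"".join(chars) for chars in product(alphabet, repeat=n)}

# Constraining "all strings over Sigma" to length 2 gives the finite language L2.
L2 = strings_of_length(SIGMA, 2)
print(sorted(L2))
```

The constraint here is just "length = 2"; a grammar is a more general way of writing such constraints down.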
## Formal Grammars:
FormalGrammar = (Non-Terminals, Terminals, Productions, Starting Symbol)
Non-Terminals : Variables (can be substituted with a value)
Terminals : Cannot be replaced by anything (constant)
Productions : Rules of the grammar
**G = (N, T, P, S)**
Ex:
```
S -> aX
X -> b
```
**(This notation is known as BNF : Backus-Naur Form)**
Ex.Non-Terminals = S, X
Ex.Terminals = a, b
Ex.Productions = S -> aX, X -> b (2)
Ex.Starting Symbol = S
Only valid string : "ab"
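
To check the claim that "ab" is the only valid string, the example grammar can be expanded by hand or with a few lines of Python. This is a sketch under the assumption that the grammar is not recursive (otherwise the expansion would never stop); `derive` and the constant names are hypothetical, not part of the lectures:

```python
NON_TERMINALS = {"S", "X"}
PRODUCTIONS = {"S": ["aX"], "X": ["b"]}  # S -> aX, X -> b
START = "S"

def derive(form, productions, non_terminals):
    """Expand the leftmost non-terminal until only terminals remain."""
    for i, symbol in enumerate(form):
        if symbol in non_terminals:
            results = set()
            for body in productions[symbol]:
                expanded = form[:i] + body + form[i + 1:]
                results |= derive(expanded, productions, non_terminals)
            return results
    return {form}  # no non-terminals left: a finished string

print(derive(START, PRODUCTIONS, NON_TERMINALS))
```

The derivation is S, then aX, then ab, so the set holds a single string.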
## Chomsky Hierarchy :
0. Unrestricted : Natural Languages, Turing Machines
1. Context-Sensitive : Programming Languages (Almost all in production)
2. Context-Free : Programming Languages (Parsing Syntax only)
3. Regular : Regular Expressions
The lower in the hierarchy (the higher the type number), the less expressive the grammar.
RegExp is a vomit-inducing terse notation that is equivalent to a regular grammar written in BNF.
BNF : RegExp
S -> aS :
S -> bA : `a*bc*`
A -> epsilon :
A -> cA :
epsilon : "The empty string".
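
The equivalence between the four productions and `a*bc*` can be spot-checked in Python. The sketch below is an illustration only: `RULES`, `CAN_END` and `grammar_accepts` are made-up names, and the standard `re` module stands in for a regular expression engine.

```python
import re

# Right-linear grammar: S -> aS | bA, A -> cA | epsilon
RULES = {
    "S": {"a": "S", "b": "A"},
    "A": {"c": "A"},
}
CAN_END = {"A"}  # non-terminals that have an epsilon production

def grammar_accepts(text):
    """Apply one production per input character, starting from S."""
    current = "S"
    for ch in text:
        if ch not in RULES[current]:
            return False
        current = RULES[current][ch]
    return current in CAN_END

PATTERN = re.compile(r"a*bc*")

# Both notations should agree on every sample string.
for sample in ["b", "aabcc", "abca", "", "ac"]:
    assert grammar_accepts(sample) == bool(PATTERN.fullmatch(sample))
```

Each non-terminal behaves like a state and each production like a transition, which is the link to finite automata.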
Regular grammar productions may only have one non-terminal:
* At the very right side (right-linear)
* At the very left side (left-linear)
Regular expressions have no support for *NESTING*
They can be *RECURSIVE*
Context-free grammars support nesting.
Ex:
(( () ))
`Parenthesis balancing`
Non-regular RegExp engines can support nesting, but they are not pure
finite automata and are slower implementations.
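
Why nesting needs more than a finite automaton: the checker has to remember how deep it currently is, and that depth has no upper bound. A minimal Python sketch (the function name `is_balanced` is hypothetical), where an integer counter plays the role of the extra memory:

```python
def is_balanced(text):
    """Check parenthesis nesting with an unbounded depth counter."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' closed something that was never opened
                return False
    return depth == 0  # every '(' must have been closed

print(is_balanced("(( () ))"))
print(is_balanced("(()"))
```

A machine with a fixed number of states can only count up to that number, so it cannot replace `depth` for arbitrarily deep input.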

@ -0,0 +1,85 @@
# Finite Automata
***(AKA: Finite State Machine)***
Mechanism and abstraction used behind regular grammars.
Usually has its state represented using nodes and edges.
Regular grammar:
```
S -> bA
A -> epsilon
```
Equivalent to: `/b/`
State transition:
--label--> : Transition symbol
O : State Symbol
(o) : Accepting State
->O.Start : Starting State (State transition to Start)
Ex:
->O.*Start* --*transition*--> (o).*Accepting*
*ε* - Epsilon (Empty String)
`I will be spelling it out as I do not enjoy single glyph representation`
Two main types of Finite Automata :
FA w/ output
* Moore machine
* Mealy machine
FA w/o output
* DFA - Deterministic
* NFA - Non-deterministic
* epsilon-NFA - (Epsilon Transition) special case
NFA : Non-deterministic FA - Allows transitions on the same symbol to
different states
```
      a->o
     /
->o.1---b-->o
     \
      a->o
```
epsilon-NFA : Extension of NFA that allows *epsilon* transitions
```
    a--->o---epsi--->(o)
   /                /
->o----b-->epsi--->o
   \
    a-->o--epsi-->(o)
```
DFA : A state machine which forbids multiple transitions on the same symbol, and *epsilon* transitions
```
    a--->o
   /
->o----b-->o
```
Use case:
Implementation Transformations:
```RegExp -> epsilon-NFA -> ... -> DFA```
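
One well-known way to get from an epsilon-NFA to a DFA is the subset construction, where every DFA state is a set of NFA states. The Python sketch below uses it as an illustration; it is not necessarily the route the lectures take for the elided steps, and the sample machine (an epsilon-NFA for "a or b") plus all names are assumptions made for this example.

```python
EPSILON = "epsilon"  # assumption: a label that is never an input symbol

# Hypothetical epsilon-NFA for a|b, written as {state: {symbol: {targets}}}.
NFA = {
    0: {EPSILON: {1, 3}},
    1: {"a": {2}},
    2: {EPSILON: {5}},
    3: {"b": {4}},
    4: {EPSILON: {5}},
    5: {},
}
NFA_START, NFA_ACCEPTING = 0, {5}

def epsilon_closure(states):
    """All states reachable from `states` through epsilon transitions only."""
    stack, closure = list(states), set(states)
    while stack:
        for target in NFA[stack.pop()].get(EPSILON, ()):
            if target not in closure:
                closure.add(target)
                stack.append(target)
    return frozenset(closure)

def to_dfa(alphabet):
    """Subset construction: each DFA state is a frozenset of NFA states."""
    start = epsilon_closure({NFA_START})
    table, pending = {}, [start]
    while pending:
        current = pending.pop()
        if current in table:
            continue
        table[current] = {}
        for symbol in alphabet:
            moved = {t for s in current for t in NFA[s].get(symbol, ())}
            if moved:
                target = epsilon_closure(moved)
                table[current][symbol] = target
                pending.append(target)
    accepting = {state for state in table if state & NFA_ACCEPTING}
    return start, table, accepting

def dfa_accepts(text, alphabet=("a", "b")):
    """Run the generated DFA: exactly one possible move per symbol."""
    start, table, accepting = to_dfa(alphabet)
    state = start
    for ch in text:
        if ch not in table[state]:
            return False
        state = table[state][ch]
    return state in accepting

print(dfa_accepts("a"), dfa_accepts("ab"))
```

The resulting table has no epsilon entries and at most one target per symbol, which is the DFA restriction described above.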
## Formal Definition:
A non-deterministic finite automaton is a tuple of five elements:
* All possible states
* Alphabet
* Transition Function
* Starting State
* Set of accepting states
NFA = ( States, Alphabet, TransitionFunction, StartingState, AcceptingStates )
NFA = ( Q, Σ, Δ, q0, F )
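
The five elements map directly onto plain data. The Python sketch below is an illustration under assumptions: the example machine (strings over {a, b} that end in "ab") and the names `Q`, `SIGMA`, `DELTA`, `Q0`, `F`, `nfa_accepts` are chosen for this note, not taken from the lectures.

```python
# NFA = (Q, Sigma, Delta, q0, F) for "strings over {a, b} that end in ab".
Q = {"q0", "q1", "q2"}
SIGMA = {"a", "b"}
DELTA = {
    ("q0", "a"): {"q0", "q1"},  # non-deterministic: two targets on 'a'
    ("q0", "b"): {"q0"},
    ("q1", "b"): {"q2"},
}
Q0 = "q0"
F = {"q2"}

def nfa_accepts(text):
    """Track every state the NFA could be in after each symbol."""
    current = {Q0}
    for ch in text:
        if ch not in SIGMA:
            return False
        current = {t for s in current for t in DELTA.get((s, ch), set())}
    return bool(current & F)  # accept if any possible state is accepting

print(nfa_accepts("aab"), nfa_accepts("aba"))
```

The transition function returns a set of states instead of a single state; that is the only difference from the DFA version of the tuple.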

@ -0,0 +1,28 @@
# Basic NFA Fragments
### Single Character
RegExp: `/^A$/`
Pseudo:
`str.start glyph(A) str.end`
^ : Beginning of string : Str.Start
$ : End of a string : Str.End
Machine:
->o.*Start* ---**Glyph**---> (o).*Accepting*
### Epsilon-Transition
RegExp: `/^$/`
Pseudo: `str.start str.end`
Machine:
```
->o --epsilon--> (o)
```
Everything else can be built on top of these machines.
```
Start = Input, Accepting = Output
```
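
The two basic machines can be sketched in Python as follows. This is one possible encoding, not the lectures' implementation: a fragment is a `(start, accept, edges)` tuple, and `char`, `epsilon`, `accepts`, `EPSILON` and `_new_state` are hypothetical names.

```python
import itertools

EPSILON = "epsilon"             # assumption: no single character equals this label
_new_state = itertools.count()  # hands out fresh state numbers

def char(symbol):
    """->o --symbol--> (o)"""
    start, accept = next(_new_state), next(_new_state)
    return start, accept, [(start, symbol, accept)]

def epsilon():
    """->o --epsilon--> (o)"""
    return char(EPSILON)

def accepts(fragment, text):
    """Simulate the fragment on `text`, following epsilon edges for free."""
    start, accept, edges = fragment

    def closure(states):
        states = set(states)
        while True:
            extra = {d for s, sym, d in edges if sym == EPSILON and s in states}
            if extra <= states:
                return states
            states |= extra

    current = closure({start})
    for ch in text:
        current = closure({d for s, sym, d in edges if sym == ch and s in current})
    return accept in current

print(accepts(char("A"), "A"), accepts(epsilon(), ""))
```

Every fragment exposes exactly one input (start) and one output (accepting) state, which is what lets bigger machines be wired together from smaller ones.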

@ -0,0 +1,39 @@
## Concatenation
Regex : `/^AB$/`
Pseudo: `str.start str(AB) str.end`
Machine:
```
->o --A--> o --epsilon--> o --B--> (o)
Submachine_A --epsilon--> Submachine_B
```
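
A Python sketch of the concatenation wiring, assuming fragments are encoded as `(start, accept, edges)` tuples (an encoding chosen for this note; `char`, `concat` and `accepts` are hypothetical names):

```python
import itertools

EPSILON = "epsilon"             # assumption: no single character equals this label
_new_state = itertools.count()  # hands out fresh state numbers

def char(symbol):
    """->o --symbol--> (o)"""
    start, accept = next(_new_state), next(_new_state)
    return start, accept, [(start, symbol, accept)]

def concat(first, second):
    """Submachine_A --epsilon--> Submachine_B"""
    a_start, a_accept, a_edges = first
    b_start, b_accept, b_edges = second
    bridge = (a_accept, EPSILON, b_start)  # A's output feeds B's input
    return a_start, b_accept, a_edges + b_edges + [bridge]

def accepts(fragment, text):
    """Simulate the fragment on `text`, following epsilon edges for free."""
    start, accept, edges = fragment

    def closure(states):
        states = set(states)
        while True:
            extra = {d for s, sym, d in edges if sym == EPSILON and s in states}
            if extra <= states:
                return states
            states |= extra

    current = closure({start})
    for ch in text:
        current = closure({d for s, sym, d in edges if sym == ch and s in current})
    return accept in current

ab = concat(char("A"), char("B"))
print(accepts(ab, "AB"), accepts(ab, "A"))
```

Only B's output stays accepting; A's old accepting state becomes an ordinary inner state of the combined machine.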
## Union
Regex : `/^(A|B)$/`
Pseudo: `str.start (glyph(A) | glyph(B)) str.end`
Machine:
```
   epsilon--> o --A--> o --epsilon
  /                               \
->o                                ->(o)
  \                               /
   epsilon--> o --B--> o --epsilon
```
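
A Python sketch of the union wiring, again assuming a `(start, accept, edges)` tuple per fragment (an encoding chosen for this note; `char`, `union` and `accepts` are hypothetical names):

```python
import itertools

EPSILON = "epsilon"             # assumption: no single character equals this label
_new_state = itertools.count()  # hands out fresh state numbers

def char(symbol):
    """->o --symbol--> (o)"""
    start, accept = next(_new_state), next(_new_state)
    return start, accept, [(start, symbol, accept)]

def union(first, second):
    """New start forks into both submachines; both join in a new accept."""
    a_start, a_accept, a_edges = first
    b_start, b_accept, b_edges = second
    start, accept = next(_new_state), next(_new_state)
    glue = [
        (start, EPSILON, a_start), (start, EPSILON, b_start),
        (a_accept, EPSILON, accept), (b_accept, EPSILON, accept),
    ]
    return start, accept, a_edges + b_edges + glue

def accepts(fragment, text):
    """Simulate the fragment on `text`, following epsilon edges for free."""
    start, accept, edges = fragment

    def closure(states):
        states = set(states)
        while True:
            extra = {d for s, sym, d in edges if sym == EPSILON and s in states}
            if extra <= states:
                return states
            states |= extra

    current = closure({start})
    for ch in text:
        current = closure({d for s, sym, d in edges if sym == ch and s in current})
    return accept in current

a_or_b = union(char("A"), char("B"))
print(accepts(a_or_b, "A"), accepts(a_or_b, "B"), accepts(a_or_b, "AB"))
```

The four epsilon edges in `glue` correspond one to one to the four epsilon arrows in the diagram.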
## Kleene Closure
Regex : `/^A*$/`
Pseudo: `str.start glyph(A).repeating str.end`
Machine:
```
                   <------epsilon-------
                  /                     \
->o --epsilon--> o --A--> o --epsilon--> (o)
  \                                      /
   -------------epsilon---------------->
```
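
A Python sketch of the Kleene closure wiring, assuming fragments are `(start, accept, edges)` tuples and reading the diagram as: the lower edge skips the submachine, the upper edge loops from the accepting state back to the submachine's input. That reading, and the names `char`, `star` and `accepts`, are assumptions made for this note.

```python
import itertools

EPSILON = "epsilon"             # assumption: no single character equals this label
_new_state = itertools.count()  # hands out fresh state numbers

def char(symbol):
    """->o --symbol--> (o)"""
    start, accept = next(_new_state), next(_new_state)
    return start, accept, [(start, symbol, accept)]

def star(inner):
    """Zero or more repetitions of the inner fragment."""
    i_start, i_accept, i_edges = inner
    start, accept = next(_new_state), next(_new_state)
    glue = [
        (start, EPSILON, i_start),    # enter the submachine
        (i_accept, EPSILON, accept),  # leave the submachine
        (start, EPSILON, accept),     # skip it: zero repetitions
        (accept, EPSILON, i_start),   # loop back: one more repetition
    ]
    return start, accept, i_edges + glue

def accepts(fragment, text):
    """Simulate the fragment on `text`, following epsilon edges for free."""
    start, accept, edges = fragment

    def closure(states):
        states = set(states)
        while True:
            extra = {d for s, sym, d in edges if sym == EPSILON and s in states}
            if extra <= states:
                return states
            states |= extra

    current = closure({start})
    for ch in text:
        current = closure({d for s, sym, d in edges if sym == ch and s in current})
    return accept in current

a_star = star(char("A"))
print(accepts(a_star, ""), accepts(a_star, "AAA"), accepts(a_star, "AB"))
```

The skip edge is what makes the empty string match; the loop edge is what allows any number of repetitions.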