LangStudies/App/RegM/Lectures/Lecture.2.Notes.md

1.7 KiB

Symbols, alphabets, and langauges and Regular Grammars

Alphabet : A set of characters.

Sigma = { a, b }

Langauge : A set of strings over a particular alphabet.

L1(Sigma) = { a, aa, b, ab, ba, bba, .. } (Infinite) L2(Sigma) = { aa, bb, ab, ba }; (Length = 2, Finite)

Any time you constrain a langauge you are defining a formal grammar.

Formal Grammars:

FormalGrammer = (Non-Terminals, Terminals, Productions, Starting Symbol)

Non-Terminals : Variables (can be subsituted with a value) Terminals : Cannot be replaced by anything (constant) Productions : Rule in the grammar

G = (N, T, P, S)

Ex:

S -> aX
X -> b

(This notation is known as BNF : Bakus-Naur Form)

Ex.Non-Terminals = S, X Ex.Terminals = a, b Ex.Productions = S -> aX, X -> b (2) Ex.Starting Symbol = S

Only valid string : "ab"

Chomsky Hierachy :

  1. Unrestricted : Natural Langauges, Turing Machines
  2. Context-Sensitive : Programming Languages (Almost all in production)
  3. Context-Free : Programming Langauges (Parsing Syntax only)
  4. Regular : Regular Expressions

The lower in the hiearchy the less expressive it is.

RegExp is a vomit inducing terse notation that is equivalent to BNF.

BNF : RegExp S -> aS : S -> bA : a*bc* A -> epsilon : A -> cA :

epsilon : "The empty string".

Regular expressions may only have one non-terminal:

  • A the very right side (right-linear, RHS)
  • At the very left side (left-linear, LHS)

Regular expression have no support for NESTING They can be RECURSIVE

Context-free grammers support nesting. Ex: (( () )) Parenthesis balacing

Non-regular RegExp can support nesting but are not pure finite automata and are slower implementation.