Ada 2012 Tutorial #2

Ada 2012 Tutorial
Parsing (and Streams)

Ada Lovelace, the namesake of the Ada programming language, considered the world’s first computer programmer

We left off the previous tutorial at parsing input from a user or a file, so we’re going to address that today. First, however, I need to introduce Streams.

Streams are a method to read or write any object to any medium, and thus they are doubly generalized. This also means that you are bounded by the most restrictive set of operations common to all mediums. As an example, you cannot provide position control in a general manner because not all transmission modes are random-access (like receiving a radio-signal), and not all streams are bi-directional (like a light-sensor).

In the informal parlance we’ve adopted we can just sat that all types have stream attributes, accessed with ‘Read and ‘Write, because all elementary types have them and the compiler knows how to compose compound types from elementary types, so you don’t normally have to keep track of elements in a compound type. (You do have to keep track of them if you’re writing both read and write that must be be functionally, rather than perfectly, inverse-operations; this is not a deficiency, but because you are implementing a protocol.)

So let’s see how to do it.

Here’s the spec for lists with the appropriate modifications.

Use All Type LISP.Elements.Element;

Package LISP.Lists is

    -- The following package, "implementation", is the
    -- instantiation of a generic package; a generic must be
    -- explicitly instantiated before its use in your program
    -- text.
    Package Implementation is new Ada.Containers.Indefinite_Vectors
      (Index_Type => Positive,  Element_Type => LISP.Elements.Element);
    -- PS: The above is using named association, which can be helpful.

    -- The following line declares inheritance, with no
    -- extension to the data of the base object/class; there
    -- are, however, new 'methods' which are added below.
    Type LIST is new Implementation.Vector with null record
	with Read => Read, Write => Write;
    -- For some reason, the subtype-rename causes an error on
    -- visibility; therefore, the type-extension above is used.
    --     SubType List is Implementation.Vector;

    -- Primitive operations are those subprograms which
    -- interact with a type and are declared before the type is
    -- frozen. (Read as: in the same declaring unit, before
    -- some point which the compiler must commit to a type.)
    -- The following are primitive operations for LIST.
    Function Evaluate(Input : List) Return List;
    Function Head(Input : List) Return LISP.Elements.Element;
    Function Tail(Input : List) Return List;
    Function To_String  (Input : List)   Return String;
    Function Homoginized_List( Input : List ) Return Boolean;

    Use Implementation, Ada.Streams;

    -- Read and Write are ptivate because the user can access
    -- them via LIST'Read and LIST'Write stream operations.

    procedure Read(
		    Stream : not null access Root_Stream_Type'Class;
		    Item   : out LIST

    procedure Write(
		    Stream : not null access Root_Stream_Type'Class;
		    Item   : in LIST

    -- The two following comment lines are the old method of
    -- specifying custom subprograms be used for stream
    -- interaction:

    -- For LIST'Read  Use Read;
    -- For List'Write Use Write;

End Lisp.Lists;

Then, in the body, we add the following lines:

Portions of lisp-lists.adp
    -- Write is simple: wirte out the string representation of
    -- the list to the stream.
    procedure Write(
            Stream : not null access Root_Stream_Type'Class;
            Item   : in LIST
           )  is
        String'Write( Stream, Item.To_String );
    end Write;

    -- Because reading is itself more complicated we will
    -- make the function itself SEPARATE.
    procedure Read(
            Stream : not null access Root_Stream_Type'Class;
            Item   : out LIST
          ) is separate;

Here is the completion-file for the separate. As you can see it’s quite a bit more complicated than the write-procedure, I trust the comments will explain the reasoning behind this.


Separate (LISP.Lists)

-- Read is perhaps overly complicated for a parsing example,
-- this is unavoidable given the constraints we are working
-- with in choosing to use strings AND to read/save directly in
-- valid LISP-text formant; namely:
--    (A) Generalized IO,
--        (1) No look-ahead is possible,
--        (2) No "unreading" is possible,
--    (B) LISP-text
--        (1) No way to tell, in advance, the length of
--            a list or element thereof.
--        (2) No way to tell, in advance, the type of an
--            element.
-- Therefore, the solution below is presented in a rather step-
-- wise manner: read the list from the stream, use that string
-- as the input of a string-to-list subprogram [consisting of
-- element recognition and list-construction]. Yes, this all
-- *COULD* [and should] be done in a single pass; that however,
-- might be too much for a set of "intro into the language"
-- tutorials. (Though arguably an interpreter is, too.)
-- Lastly, it would be remiss of me to fail to explain the
-- advantages in the above after detailing the disadvantages:
--    (A) Generalized IO
--        (1) Because it is general, a stream may be
--            generated by anything from a disk-drive
--            to a radio-receiver to a network itself.
--        (2) Because is is general, this method allows
--            us to swap out bodies for read, so we
--            can test multiple implementations of a
--            stream-parser AND different file-formats.
--    (B) LISP Program-Text
--        (1) There need not be differing functions for
--            reading in source-code and reading in a
--            line typed by the user -- both may use
--            the same stream-functionality.
--        (2) Output is absurdly simple -- just pipe all
--            the list's elements's To_String output and
--            make sure to output the '(' & ')' for
--            (respectively) before and after writing.

procedure Read(
        Stream : not null access Root_Stream_Type'Class;
        Item   : out LIST
          ) is
    Use ASCII, Ada.Characters.Handling, Ada.Streams.Stream_IO, LISP.Strings;

    -- The actual ASCII Whitespace characters.
    Subtype Whitespace is Character Range NUL..' '
    with Static_Predicate => Whitespace in NUL | HT..CR |  ' ';

    -- The following function gets the next non white-space character.
    -- Used to "prime the pump" of the stream, as any leading
    -- whitespace should be disregarded.
    function Get_Next_Character Return Character is
        Return Result : Character:= ASCII.NUL do
            Character'Read( Stream, Result );
            Exit When Result not in Whitespace;
            end loop;
        End return;

        When End_Error => Raise Parse_Error;
    end Get_Next_Character;

    -- Look_Ahead contains the imminent character from the stream.
    -- Current_Character, the working character.
    -- Stream_End marks whether the stream has been exhausted.
    Current_Character    : Character:= ASCII.NUL;
    Stream_End        : Boolean:= False;

    -- Advance shifts the old look_ahead into current_character
    -- and reads in a new character from the stream. In case of
    -- an empty stream a flag is set and NUL is substituted.
    -- The END_ERROR exception propagates if the flag is set.
    procedure Advance is
        Current_Character:= Look_Ahead;
        Character'Read( Stream, Look_Ahead );
        Stream_End:= False; -- Reset the flag.
        when END_ERROR =>
            if not Stream_End then
                Stream_End:= True;
                Look_Ahead:= ASCII.NUL;
                Raise END_ERROR;
            end if;
    end Advance;

    -- Build_String reads the characters from the stream, and
    -- returns the resultant string.
    Function Build_String Return String is

        Function Advance_Build(
               Open_Parens : Natural;
               In_String   : Boolean
             ) return String is
            OP : Constant Character:= '(';
            CP : Constant Character:= ')';
            C  : Constant Character:= Current_Character;

            -- Active_Paren handles the increment and decriment
            -- of the Open_Paren being passed to Build_String.
            Function Active_Paren Return Natural is
              ( if In_String then Open_Parens
                elsif C = CP then Open_Parens - 1
                elsif C = OP then Open_Parens + 1
                else Open_Parens -- Any other character

            -- Inside_String is merely a shorthand for
            -- determining when we enter/exit a string.
            Function Inside_String Return Boolean is
              (if C = '"' then Not In_String else In_String);

            if Active_Paren = 0 then
                return "";
            end if;

            return C & Advance_Build(
                    Open_Parens => Active_Paren,
                    In_String   => Inside_String
            When End_Error => return "";
        end Advance_Build;

        return Advance_Build(0, False);
    end Build_String;

    -- Here we "prime the pump" for our stream processing.
    Look_Ahead:= Get_Next_Character;

    -- Ensure we are starting a new list.
    if Look_Ahead /= '(' then
        raise Parse_Error;
    end if;

    -- We advance once to move '(' to the current position.

        -- String_Value holds a List's string representation.
        String_Value : Constant String:= Build_String;

        -- Parse_String returns the List from the string which
        -- has been passed as the input.
        Function Parse_String( Input : String:= String_Value ) Return List is

            Function Is_Numeric(S:String) Return Boolean Renames

            Function Element_Type( Input : String ) Return Data_Type is
                Initial : Constant Character := Input(Input'First);
                if Initial = '"' then
                    Return String_Type;
                elsif Initial = '(' then
                    Return List_Type;
                elsif To_Upper(Input) = "NULL" then
                    Return Empty_Type;
                elsif Is_Numeric(Input) then
                    -- If we add in floating types we'll need
                    -- to differentiate between integers and
                    -- the floating-type here.
                    Return Integer_Type;
                elsif Valid_ID(Input) then
                    Return Name_Type;
                end if;
                -- Since the above covers all possible values
                -- for Data_Types, if we made it here it
                -- indicates something went wrong in parsing.
                Raise PARSE_ERROR;
            end Element_Type;

        subtype Internal_Indecies is Positive Range

        -- Scan is a procedure that, taking some index,
        -- returns the next index of interest -- there are
        -- two such functions: one for getting the start of
        -- a new element, and one for getting the end of it.
        Procedure Scan( Index : in out Internal_Indecies ) is
            while Input(Index) in Whitespace
              and Internal_Indecies'Last > Index loop
                Index:= Internal_Indecies'Succ(Index);
            end loop;
        end Scan;

        Function Scan( Old_Index : in  Internal_Indecies;
                       New_Index : out Positive ) Return LISP.Elements.Element is

            Stop      : Internal_Indecies:= Old_Index;

            Function This Return Character is
              ( Input(Stop) );
            Function Element Return String is
              ( Input(Start..Stop) );

            Function Is_List  Return Boolean is
              (Input(Start) = '(');
            Function Is_String Return Boolean is
              (Input(Start) = '"');

            -- Because the element separators ", " and " "
            -- are equivalent, we will strip out any initial
            -- comma found and proceed as if it were whitespace.
            if Input(Start) = ',' then
                Start:= Start + 1;
            end if;

            -- Consume leading whitespace, because we might
            -- have something like ( Bobby , Dave, ,Jo).
            Stop:= Start;

            if Is_List then
                -- if it is a list we need to balance parentheses.
                    Paren_Count   : Natural:= 1;
                    Inside_String : Boolean:= False;
                    -- We don't want our start-index to
                    -- terminate the loop prematurely.
                        Stop:= Internal_Indecies'Succ(Stop);

                        case this is
                        when '(' =>
                            if not Inside_String then
                                Paren_Count:= Natural'Succ(Paren_Count);
                            end if;
                        when ')' =>
                            if not Inside_String then
                                Paren_Count:= Natural'Pred(Paren_Count);
                            end if;
                        when '"' =>
                            Inside_String:= not Inside_String;
                        when others => null;
                        end case;

                        exit when Paren_Count not in Positive
                               or Stop = Internal_Indecies'Last;
                    end loop;

                    if this /= ')' then
                        Raise PARSE_ERROR;
                    end if;
                -- If element is a string, then we need to
                -- search for the closing quote, otherwise
                -- we need to search for the next character
                -- delimiting items [whitespace | ','].
                while ( if Is_String then (if stop > start then this = '"')
                        else this not in Whitespace | ',' ) loop
                    Stop:= Stop + 1;
                end loop;

                -- Now we correct for consuming delimiters,
                -- or not consuming them (in string's case),
                -- as we certainly don't want to have
                -- "BOBBY " or "BOBBY," as an function-name
                -- or leave the closing quote on our index.
                Stop:= (if Is_String then Stop + 1 else Stop - 1);
            end if;

            -- Set the output-parameter.
            New_Index:= 1 + Stop;

            -- Yes, we already know whether or not the
            -- element is a string; however, using a case-
            -- statement [without OTHERS] ensures that all
            -- choices are covered... so if we change the
            -- what types the interpreter handles, we will
            -- receive a compiler error informing us of it.
            case Element_Type(Element) is
            when Empty_Type   =>
                Return Create;
            when Integer_Type =>
                    Value : Integer:= Integer'Value( Element );
                    Return Create(Value);
            when Name_Type | String_Type   =>
                -- Strings and identifiers are both strings,
                -- the difference is that one is surrounded
                -- by quotes, the other not; the following
                -- conditional statement handles that for
                -- our call to Create.
                Return Create(
                        Item => (if Is_String
                                 then Get_Internals(Element)
                                 else Element),
                        Identifier => not Is_String
            when List_Type    =>
                -- Recursion, Oy!
                Return Create( Parse_String(Element) );
            end case;
        end Scan;

        Index : Positive:= Internal_Indecies'First;

        Return Result: List do
                Scan( Index );
                    Working : LISP.Elements.Element
                                Renames Scan( Index, Index );
                    -- Yes, we could get rid of this whole
                    -- declare-block by saying:
                    --    Result.Append( Scan( Index, Index ) );
                    Result.Append( Working );
                    exit PARSE_ELEMENTS when
                          Index not in Internal_Indecies'First..
            end loop PARSE_ELEMENTS;
        End return;
    end Parse_String;

        if String_Value'Length < 2 then raise PARSE_ERROR; end if;
        Item:= Parse_String;
end Read;

Then finally modify your Testbed.adb to use the ‘Read and ‘Write attributes. Here’s what my testbed (yes, it’s a different name) looks like after stripping out the more “for beginner’s” comments:
Ada.Streams.Stream_IO, Ada.Text_IO.Text_Streams,
Ada.Text_IO;      -- Ada.Text_IO: exactly what it says.

-- This names the current compilation unit "working".
Procedure Working is

    Function Open( File_Name : String ) Return Ada.Text_IO.File_Type is
        Use Ada.Text_IO;
        Return Result : File_Type do
            Open(File => Result,
                  Mode => In_File,
                  Name => File_Name
        End return;
    end open;

    Output_Stream : Ada.Text_IO.Text_Streams.Stream_Access :=
      Ada.Text_IO.Text_Streams.Stream( Ada.Text_IO.Standard_Output );

    Input_Stream :  Ada.Text_IO.Text_Streams.Stream_Access :=
      Ada.Text_IO.Text_Streams.Stream( Open("test.lsp") );

   -- Stubs for Read, Eval, and Print.

   -- Hack to terminate the loop; we will replace this later, with an
   -- orderly shutdown-mechanism.
   Loop_Count : Positive:= 1;

    Use LISP.Lists, LISP.Elements;

    -- Our working list.
    Working : List;

Procedure Read is
    Function Create_ID( ID : String ) Return LISP.Elements.Element is
      ( Create( Item => ID, Identifier => True) );
  case Loop_Count is
    When 8  => Working.Prepend( Create_ID("+") );
    When 10 => Working:= Create_ID("define") &
                          Create_ID("bob") &
                          Create_ID("car") &
                            Create("Fire") &
                            Create("Ice") &
              -- Above: (DEFINE BOB CAR ("Fire" "Ice" "Air"))
              -- Result: BOB => ("Fire" "Ice" "Air")
    When Others => Working.Append( Create( Loop_Count ) );
  end case;
end Read;

    Procedure Eval is
        -- Eval replaces our working list with the result of the evaluation.
        Working:= Working.Evaluate;
    end Eval;

    Procedure Print is
    -- To print we simply pass to Put_Line the result of Working.To_String
    -- Note:  When working with tagged-types we may use an alternitive to
    --        the more common object-dot-method; it can be called just like
    --        the subprogram it is.
        Ada.Text_IO.Put_Line( To_String(Working) );
    end Print;

    Sub_Loop : array (Positive Range <>) of Not Null Access Constant String:=
      ( New String'("car"), New String'("cdr"), New String'("cadr"),
        New String'("cdar"), New String'("caddr"), New String'("BOB") );

    -- Testing functions -- Between the call to print and the loop's exit.
    if Loop_Count = 10 then
        Working:= Create("A") & Create("B") & Create("C");
        For E of Sub_Loop loop
            Use Type LISP.Lists.Implementation.Vector;
            Temp : LIST := Create(E.All, True) & Working.Copy;
            Use Ada.Text_IO;
            Put_Line( ASCII.HT & Temp.To_String );
            Temp:= Temp.Evaluate;
            Put_Line( E.All & ASCII.HT & "=>" & Temp.To_String );
        End loop;
    end if;
      -- Set up the exit-condition; the name is optional but useful when dealing
      -- with nested loops.
      Exit REPL when Loop_Count = 10;
      -- Update the loop-counter;
      Loop_Count:= Loop_Count + 1;
   end loop REPL;

    Ada.Text_IO.Put_Line( "Stream output of Working:" );
    LIST'Write( Output_Stream, Working );

    Ada.Text_IO.Put_Line( "Stream input of Working:" );
    LIST'Read( Input_Stream, Working );
    LIST'Write( Output_Stream, Working );

   -- Print some text to indicate an orderly shutdown was achieved.
   Ada.Text_IO.Put_Line( "Goodbye." );

End Working;

When run with the input below, it produces the output shown.

Input (test.lsp) Output
(DEFINE maybe ("A", "B" ,"C" , "G"))
[Count: 1]
[Count: 2]
(1, 2)
[Count: 3]
(1, 2, 3)
[Count: 4]
(1, 2, 3, 4)
[Count: 5]
(1, 2, 3, 4, 5)
[Count: 6]
(1, 2, 3, 4, 5, 6)
[Count: 7]
(1, 2, 3, 4, 5, 6, 7)
[Count: 8]
[Count: 9]
(28, 9)
[Count: 10]
("BOB", "Defined")
    (car, "A", "B", "C")
car    =>("A")
    (cdr, "A", "B", "C")
cdr    =>("B", "C")
    (cadr, "A", "B", "C")
cadr    =>("B")
    (cdar, "A", "B", "C")
cdar    =>()
    (caddr, "A", "B", "C")
caddr    =>("C")
    (BOB, "A", "B", "C")
BOB    =>("Fire", "Ice", "Air")
Stream output of Working:
("A", "B", "C")
Stream input of Working:
(DEFINE, maybe, ("A", "B", "C", "G"))

Everything there works as expected.

You won’t be able to just take the source from the previous examples and these, compile and run though. This is because I did some restructuring, in particular consolidating the internal functions for dealing with strings into a single child package which you may have noticed in the with clause of lisp-lists-read: LISP.Strings. I also separated the string_type and name_type fields so there is now an internal difference and strong-typing will prevent us from accidentally conflating the two.

Because all the changes would require showing the contents of everything you’ve already seen with relatively little change I will forego that in favor of leaving you with a link to my github page for this tutorial’s repository.