Archive
Turbo Pascal Compiler Secrets
Compiler Design – Practical Compiler Construction
Good optimizing compiler is a must for any computer platform. The availability and quality of compilers will determine the success of the platform. Compiler design is a science. There are numerous books written about compiler principles, compiler design, compiler construction, optimizing compilers, etc. Usually compilers are part of the integrated development environment (IDE). You write a program in some high-level language, click compile and a moment later you get executable code, assembler listing, map file and a detailed report on memory usage.
As a programmer and IDE user you expect fast compilation and optimized generated code. Usually you are not interested in compiler internals. However, to build your own compiler you need a lot of knowledge, skills and experience. Compiler construction is a science. Compilation is a symphony of data structures and algorithms. Storing compiler data in proper structures and using smart algorithms will determine the quality of the compiler. Compilers usually have no user interface, they process source files and generate output files: executable code, object file, assembler source, or any other needed file.
Why are compiler data structures so important?
Compiler needs to store identifiers in symbol tables. Each identifier has some attributes like type, size, value, scope, visibility, etc. Compiler must be able to quickly search the symbol table for particular identifier, store new identifier into symbol table with minimum changes to it, etc. To satisfy all these requirements it is mandatory to carefully design symbol table structure. Usually hash tables are used for quick search and linked lists for simple addition or deletion of elements. Symbol table management is one of the critical elements of any compiler. Another important internal data structure is intermediate code representation. It is a code that is generated from the source language and from which the target code is generated. Intermediate code is usually used to apply optimizations, so right form of intermediate code with general syntax and detailed information is crucial for successful code optimizations.
To convert source code to target instructions is not a big deal. A limited set of registers can sometimes require some smart register saving techniques, but in general each source language statement can be translated into a set of target instructions to perform required effect. If you do not take actions to optimize the generated code you will have a very inefficient code with many redundant loads and register transfers. Excellent optimizations are the shinning jewel of any compiler. With some average optimizations you can easily reduce generated code size for 10% or more. Code size reduction in loops means also increased execution speed. There are many algorithms for compiler optimizations. Most are based on control flow and data flow analysis. Optimizations like constant folding, integer arithmetic optimizations, dead code elimination, branch elimination, code-block reordering, loop-invariant code motion, loop inversion, induction variable elimination, instruction selection, instruction combining, register allocation, common sub-expression elimination, peephole optimization and many others can make initially generated code almost perfect.
If you have ever tried to write a compiler then you probably already know that this is not a simple task. You can use some compiler generators, or write the compiler from scratch. Compiler construction kits, parser generators, lexical analyzer generators (lexers), optimizer generators and similar tools provide the environment where you define your language and enable the compiler construction tools to generate the source code for your compiler. However, to make a fast and compact compiler you need to design your own compiler building blocks, from architecture of symbol tables and scanner to code generator and linker. Before you start reinventing the wheel it is a good idea to read some books about compiler design and examine the source code of some existing compiler. Writing your own compiler can be a great fun.
There are many excellent books on compiler design and implementation. However, the best book on compiler design is the compiler itself. Take a look at Turbo Pascal compiler written in Turbo Pascal. This source code shows all the beauty of the Pascal programming language. It reveals all the tricks needed to build a fast and compact compiler for any language, not just Pascal.
Turbo Pascal Download – How to Compile Old Projects
Turbo Pascal was probably the most widely used Pascal compiler of all times. Borland released it in early 1980s and at that time it was available on the CP/M and PC platform. It featured fast compiler, integrated development environment and a very affordable price. Its syntax, known also as Object Pascal, has become standard and the concept of units is still used in all modern implementations. Until recently Pascal programming language was taught in many schools. For many people it was the first step into computer programming. It is a language that is easy to write and easy to read so you need very few comments to understand what the program does.
Turbo Pascal in 1990s evolved into Delphi. This is a rapid application development tool for Windows. It still uses Object Pascal with many additional features. However, because of popularity in early years, there are many projects that were developed with Borland DOS compilers. If you would like to compile such project you would need the original compiler, most likely version 7.0 which was the last released DOS version. Unfortunately, this version is no longer available. Borland has some time ago released old versions of compilers free of charge: 1.0, 3.02 and 5.5. The last version 7.0 is not yet available.
If you would like to compile the old Pascal code you can either try to find the original compiler from one of the illegal sources or you can use the open-source Free Pascal in compatible mode. There is also a third option. You can use the TPC32 command line compiler which is available as part of the demo package of the TPC32 source files and can be downloaded for free. This is not some limited version, it is a fully functional compiler compatible with TPC.EXE command line compiler. It is called demo because the full version includes complete source files which are not available for free.
TPC32 is a successor of the TPC16, a compatible compiler written in Turbo Pascal 7. It is compatible with the original Borland compiler in all aspects. Compiles the same source files and generates binary compatible unit and executable files. TPC32 is still the same compiler, the sources were slightly modified to be compatible with Delphi 7 which doesn't use the old segment-offset memory model. TPC32 still generates 16-bit x86 code. Source code of both compilers is available for purchase. You can use this source code to understand the internal data structures and algorithms of the famous Borland product or to make your own compiler. Both compilers are also available in demo versions which include fully functional compiled executable files. Because TPC16 is a DOS application it has some memory limitations. The TPC32 is a Win32 application and uses flat memory model with very few limitations. Both compilers can be used to compile the old projects created with the original Borland tools.
Pascal programming language is now rarely used in schools, but for some programmers it is still very popular and many old projects are still maintained. TPC32 compiler might be a solution for those who need a cheap and legal solution to compile the old Pascal sources.
I'm a big fan of Pascal programming language. Therefore I have a lot of old projects created with Turbo/Borland Pascal. If you need a free compatible solution to compile the old Pascal projects you can download a demo version of the TPC32, Turbo Pascal compiler written in Delphi, which contains a fully functional command line compiler. You can also get the TPC32 source code. It can be used for your own project or as a great book on compiler design and implementation.
Compiler Design – Hash Functions and Tables
The purpose of every compiler is to read the input file in one programming language and convert it to one or more output files. Output file can be a different programming language, object code or executable code. The process of compilation must first examine the input file. This means reading all characters, identifying keywords, expressions, statements and storing all the data into symbol tables for future use. Symbol table is one of the most important data structures in any compiler.
Symbol table stores identifiers and its attributes. Every time the compiler finds a new identifier in the source code it needs to check if this identifier is already in the table, and if not it needs to store it there. This means a lot of searches and comparisons with every symbol table item. Search is always a very time consuming operation. Our goal is to have a fast compiler. Therefore we should find a way to make the searches across the symbol table as fast as possible.
One of the simple yet effective approaches is to use some hash function and to create a hash table. For every identifier in the table we apply the hash function and calculate some number. The hash function is an arbitrary function that for each identifier returns some number. It can be a simple sum of the ASCII codes of the identifier or some more complex one. Then we use, for example, the last 4 bits of this hash value to determine where to search for our identifier. 4 bits of hash value mean that we have 16 different linked lists of identifiers. We search only the list that belongs to the calculated hash value. This means that we only have to search a small list of identifiers which have the same hash value. Using more bits of hash value and consequently having more linked lists means faster search but we need some more space for bigger hash table.
If we don't find our identifier in this list then we can be sure that the identifier is not in the table because all other identifiers have different hash values so they must be different. In such case we simply add the new identifier at the end of the list of identifiers which belongs to the calculated hash value. Using hash functions and hash tables is a very effective way to speed up searches in symbol tables. Hash functions and hash tables are used in almost all compilers because their implementation is pretty simple and the gain in search speed is huge.
There are many excellent books on compiler design. However, the best book on compiler design is the compiler itself. Take a look at Turbo Pascal compiler source code – a Turbo Pascal compiler written in Turbo Pascal. This source code shows all the beauty of the Pascal programming language and reveals all the tricks needed to build a fast and compact compiler for any language, not just Pascal.
Turbo Pascal – A Brief History
Turbo Pascal is a famous Pascal compiler developed by Borland in early 1980s. It was the first compiler that included also an Integrated Development Environment (IDE). Because of this it was possible to write the code, compile it, run it and debug it without ever leaving the editor and running other tools. Another strength of the Turbo Pascal compiler was its speed. Comparing to other compilers at that time it was very fast.
Turbo Pascal was developed by Anders Hejlsberg who initially developed Blue Label Pascal and then Compas Pascal compiler. It was available on CP/M and MS-DOS platforms. Borland then licensed the compiler core and added user interface and editor. Anders Hejlsberg joined Borland where he was the chief architect of all versions of Turbo Pascal and first versions of its successor, Delphi. Anders Hejlsberg is now forking for Microsoft where he works as the lead architect of the C# programming language.
The first version of the Turbo Pascal compiler was released in November 1983. It was sold for $49.95 and was very affordable comparing to other Pascal compilers. It generated .com executable files which were also pretty fast. This was a direct consequence of the quality of the generated code by the compiler. Included Integrated Development Environment, fast compilation, fast development cycle (edit, compile, debug), quality of the generated code and affordable price contributed to additional popularity of the Pascal programming language in the 1980s. At that time Pascal was also used as the programming language for teaching in high schools and universities.
The compiler development went on. Later versions introduced a full-screen user interface with pull-down menus, generated .exe files, supported inline assembly instructions and object oriented programming. Many advanced features were added to ease software development. The last version of the compiler for DOS, Turbo Pascal 7, had everything needed to get the most out of a DOS program.
One of the most important contributions to the popularity of Pascal language made by Borland was a clever approach to add some simple extensions of the language which filled the gaps in the standard Pascal. The most important extension was the support for units. Unit is a separate file which could also be compiled separately. Usually a complex program is divided into logical units which make code writing and program development easier. Second important extension was support for strings. Strings are character arrays which can be used for anything not just for characters. Borland also added support for object oriented programming, access to absolute memory locations, support for interrupt procedures, inline assembly instructions, etc.
If you are interested in compiler construction and Turbo Pascal internals you can examine the Turbo Pascal compiler source code. This is not the original source code but a compatible compiler written in Turbo Pascal. This source code can be also used as a great book on compiler design and implementation.
Pascal Programming Language
Initially, it was intended to teach computer programming because it encourages structured programming and using data structures. This is how the famous “hello world” program looks like:
Program HelloWorld (Output); begin Writeln ('Hello, world!'); end.
Definitions for data types and structures are simple and clear. Language provides an orthogonal and recursive approach to data structures. You can declare arrays of arrays (multidimensional arrays), arrays of records, records containing arrays, file elements can be records, arrays, records containing arrays of records, etc. Some examples:
Type TDay = 1..31; TMonth = 1..12; SimpleArray = Array [1..1000] of Integer; 2D_Array = Array [1..1000, 0..3] of Real; Friend = (Barbara, Alice, Rebecca, Laura); DateType = Record Day: TDay; Month: TMonth; Year: Integer; end; Var Birthday: Array [Friend] of DateType;
The first major milestone in Pascal history was Turbo Pascal. Based on the compiler written by Anders Hejlsberg, the product was licensed to Borland, and integrated into an IDE. It was also available on the CP/M platform, but the biggest success Borland achieved with Turbo Pascal for DOS. Borland continued the tradition of successful Pascal compilers with Delphi. Delphi is a visual rapid application development environment using Object Pascal as programming language. There is also an open-source compiler available: Free Pascal. It is a 32 and 64-bit compiler for various processors like Intel x86, Amd64, PowerPC, Sparc and ARM. Pascal is also used for development of embedded systems. Compilers are available for microcontrollers like 8051, AVR and ARM.
In 1983 the language was standardized as IEC/ISO 7185. In 1990 an extended standard was created as ISO/IEC 10206. The ISO Pascal is somehow limited because it lacks some features like strings and units. The most known and used syntax is the Borland Turbo Pascal syntax which added necessary features to fill the gaps in the ISO standard. There is also a derivative of Turbo Pascal known as Object Pascal (used in Borland Delphi) which was designed for object oriented programming.
Because of it’s popularity among many programmers some older versions of the Turbo Pascal compiler are now available for download.
Today, Pascal is still popular in various areas but not that much it was decades ago. It was replaced mainly with C which is available for almost any platform. Nevertheless, there is still a large community that finds Pascal programming language an excellent choice. It is easy to learn and easy to read. If you give your identifiers meaningful names you can read the program almost like a plain English text. Therefore it is very easy to transfer any algorithm into program. In addition to this the language is not case sensitive which is another step closer to English language.
There are endless debates and comparisons of Pascal and C programming languages. Some favor Pascal, other like C. There is no winner. Both languages are used to describe an algorithm. It is up to the programmer to choose his preferred language.