AN EXTENSIBLE APPROACH TO GENERATE FLOWCHARTS FROM SOURCE CODE

Source code written by one developer may not make sense to another. How understandable a piece of source code is depends on the language proficiency and the logical thinking patterns of both the person who wrote it and the person trying to understand it. However, in distributed software development and in software maintenance, developers often need to read and understand source code that was written by someone else, sometimes long after it was written. Flowcharts depict the logical flow of processes and can be an effective tool for representing the control flow of software programs. This paper presents a novel approach to generating flowcharts from program snippets. It demonstrates that, by using an intermediate abstract representation that is independent of any programming language, flowcharts can be generated for programs written in any programming language. The feasibility of the proposed approach was demonstrated by developing a prototype system of compilers that generates flowcharts for source code written in the PHP language.


Introduction
Programming can be considered both a science and an art (Knuth D. E., 1974). Programming languages are built to instruct a computer to perform a sequence of computations. Syntax varies from one programming language to another. However, irrespective of the programming language used to code an algorithm, there is only a finite number of abstract constructs that can be used to develop a program. How these constructs are combined to produce the expected result is an art, and it can be done in many different ways in different programming languages, depending on the developer's software development maturity and knowledge of the programming language.
The statements of a program define a logical flow of instructions. For a computer, what matters is the sequence in which the instructions are to be executed. However, for a developer who is writing or reading the code, what is important is the logical flow of the program instructions. It is hard for a developer to conceptualize the logic defined by a program just by reading the code. Even an experienced programmer may find it difficult to understand, at a later time, the logic of a program that he or she wrote. This is also an inherent problem in distributed software development environments, where the original author of the code is not the only one responsible for maintaining it; the same code may have to be modified by others.
Http://www.granthaalayah.com ©International Journal of Research -GRANTHAALAYAH [506]
Visualization has proved to be a good way to understand a program. A flowchart representation of a source code snippet makes it easier for a developer to understand, debug and check the validity of the logic of the code. There are numerous software packages that generate code from flowchart diagrams, but tools that generate flowcharts from source code are scarce. This paper presents a novel approach to generating flowcharts for program snippets. The proposed approach is based on compiler construction techniques (Aho, Lam, Sethi, & Ullman, 2013) and can be easily extended to generate flowcharts for program snippets written in any programming language. The feasibility of the proposed approach is demonstrated by building a prototype application that generates flowcharts for PHP code snippets.
Very little research in this domain has been reported in the literature. Even in the reported work, no generalized approach is presented for translating source code written in any language into flowcharts. We claim that the proposed approach is novel because we found no prior research that translates source code snippets into flowcharts by constructing a compiler that can be easily extended to any programming language.
One of the applications reported to generate pictorial representations of source code is AutoDia (AutoDia, 2017). It is designed to generate Unified Modelling Language (UML) class diagrams from source code written in a selected set of programming languages. The parser of this application searches for specific pre-defined programming language constructs and generates the output on the fly. The output of the parser is in a proprietary format, which is used by the drawing algorithm to generate the class diagrams. The drawing algorithm is tightly coupled with the application. Since the output generated by the parser does not conform to a specification, it cannot be used with any other drawing tools. Also, to extend the application to other source/input languages, sub-components have to be developed and the drawing algorithm has to be modified.
[...] language is described with examples used to evaluate the prototype implementation. The last section draws conclusions and suggestions.

Methodology
An application intended to generate flowcharts for program snippets can be constructed using one of two approaches:
a) Develop code from scratch to generate flowcharts for programs written in any language.
b) Develop an intermediate representation for flowcharts that is independent of any programming language, develop a single back-end component to generate flowcharts from this intermediate representation, and finally develop a front-end translator for each programming language to convert program snippets in that language to the intermediate representation.
Approach a), generating flowcharts for source code in any programming language by using a programming language such as Python, can be depicted by a T-diagram as in Figure 1. This approach is complex and is not easily extensible: supporting each new programming language may require complex modifications to the code.
Approach b) offers many advantages over approach a). Firstly, there are many popular tools available for generating different types of diagrams (Visio, 2018) (Gansner & Ellson, 2017) (Lucidchart, 2018). Thus, there is no need to invest time and money in developing an application for data visualization. However, the internal representations that these tools use to store data differ. Thus, visualizing a flowchart held in the intermediate representation requires a back-end component for each visualization application that uses a different data representation. Secondly, the approach can be easily extended to program snippets written in any language. In this case, a front-end translator must be developed for each programming language.
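The division of labour in approach b) can be illustrated with a minimal Python sketch. It is not the prototype described in this paper: the names FlowNode, FRONT_ENDS, front_end, back_end and generate are hypothetical, the "translators" simply split the source text into statements, and the back end renders a text outline rather than a diagram. The point is only that adding a language means registering one more front end, while the back end stays untouched.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class FlowNode:
    """One language-independent flowchart element (hypothetical IR)."""
    kind: str   # e.g. "process" or "decision"
    label: str  # text shown inside the flowchart symbol


# Registry mapping a source-language name to its front-end translator.
FRONT_ENDS: Dict[str, Callable[[str], List[FlowNode]]] = {}


def front_end(language: str):
    """Decorator that registers a translator for one source language."""
    def register(func):
        FRONT_ENDS[language] = func
        return func
    return register


@front_end("php")
def php_front_end(source: str) -> List[FlowNode]:
    # Toy translator: every ';'-terminated statement becomes a process box.
    statements = [s.strip() for s in source.split(";") if s.strip()]
    return [FlowNode("process", s) for s in statements]


@front_end("python")
def python_front_end(source: str) -> List[FlowNode]:
    # Toy translator: every non-empty line becomes a process box.
    lines = [ln.strip() for ln in source.splitlines() if ln.strip()]
    return [FlowNode("process", ln) for ln in lines]


def back_end(nodes: List[FlowNode]) -> str:
    """Single shared back end: renders the IR as a text outline."""
    return " -> ".join(f"[{n.label}]" for n in nodes)


def generate(language: str, source: str) -> str:
    """Pick the front end for the language, then run the shared back end."""
    return back_end(FRONT_ENDS[language](source))
```

For example, generate("php", "$a = 1; $b = 2;") and generate("python", "a = 1\nb = 2") both pass through the same back_end function; only the registered front end differs.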
Approach b), applied to generating flowcharts for programs coded in a programming language such as PHP, is depicted in Figure 3. The application shown in Figure 3 comprises three compilers. The first compiler, termed the front end of the application, converts a given PHP snippet into an Abstract Syntax Tree (AST), which is the intermediate representation.
[...] On the other hand, the deep analyzer recursively evaluates the nested structures embedded within each code block. The output of the shallow analyzer is passed to the deep analyzer, which in turn carries out a shallow analysis of each enclosed block, followed by a deep analysis if that block contains any other code blocks.
Therefore, the intermediate representation (AST) used to encode a flowchart needs features for representing nested code blocks recursively.
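The interplay between the two analyzers can be sketched in Python as follows. This is an illustration under stated assumptions, not the prototype's Yacc-based implementation: it assumes that the shallow analyzer only separates the top-level statements and brace-delimited blocks of a snippet without looking inside them, and that braces are balanced and never appear inside string literals. The names Block, shallow and deep are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Block:
    """A statement or a block header together with its nested contents."""
    header: str                                   # e.g. "if (x)" or "b"
    body: List["Block"] = field(default_factory=list)


def shallow(source: str) -> List[Tuple[str, Optional[str]]]:
    """Split source into top-level items only.

    Returns (header, raw_body) pairs; raw_body is None for a plain
    statement and the unparsed text between the braces for a block.
    """
    items: List[Tuple[str, Optional[str]]] = []
    depth, start, header, body_start = 0, 0, "", 0
    for i, ch in enumerate(source):
        if ch == "{":
            if depth == 0:  # a top-level block opens here
                header, body_start = source[start:i].strip(), i + 1
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:  # the top-level block closes here
                items.append((header, source[body_start:i]))
                start = i + 1
        elif ch == ";" and depth == 0:  # a top-level statement ends here
            items.append((source[start:i].strip(), None))
            start = i + 1
    return items


def deep(source: str) -> List[Block]:
    """Run a shallow pass, then recurse into every enclosed block."""
    result = []
    for header, raw_body in shallow(source):
        body = deep(raw_body) if raw_body is not None else []
        result.append(Block(header, body))
    return result
```

Calling deep("a; if (x) { b; if (y) { c; } } d;") yields three top-level items, the second of which contains a nested block of its own, which is the recursive shape the intermediate representation has to be able to encode.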
Any flowchart can be converted to a directed graph data structure in which the nodes represent the individual items in the flowchart and the arrows represent the flow directions. An example grammar of a language developed to represent the AST of flowcharts (the intermediate representation) is given below. In this AST, the program snippet for which a flowchart is to be generated is considered as a class.
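The following is not the grammar used by the authors. It is a minimal Python sketch of the directed-graph view of a flowchart only, together with one possible serialization to the DOT language accepted by Graphviz. The class FlowGraph, the helper quote and the function if_else_example are hypothetical names, and node identifiers are assumed to be valid DOT identifiers.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


def quote(text: str) -> str:
    """Wrap text in double quotes, escaping embedded double quotes."""
    return '"' + text.replace('"', '\\"') + '"'


@dataclass
class FlowGraph:
    """Directed graph: nodes are flowchart items, edges are flow arrows."""
    nodes: Dict[str, Tuple[str, str]] = field(default_factory=dict)  # id -> (shape, label)
    edges: List[Tuple[str, str, str]] = field(default_factory=list)  # (src, dst, label)

    def add_node(self, node_id: str, label: str, shape: str = "box") -> None:
        self.nodes[node_id] = (shape, label)

    def add_edge(self, src: str, dst: str, label: str = "") -> None:
        self.edges.append((src, dst, label))

    def to_dot(self) -> str:
        """Serialize the graph as DOT text."""
        lines = ["digraph flowchart {"]
        for node_id, (shape, label) in self.nodes.items():
            lines.append(f"  {node_id} [shape={shape}, label={quote(label)}];")
        for src, dst, label in self.edges:
            attr = f" [label={quote(label)}]" if label else ""
            lines.append(f"  {src} -> {dst}{attr};")
        lines.append("}")
        return "\n".join(lines)


def if_else_example() -> FlowGraph:
    """Flowchart for: if ($a > 0) { echo 'positive'; } else { echo 'not positive'; }"""
    g = FlowGraph()
    g.add_node("start", "Start", "oval")
    g.add_node("cond", "$a > 0", "diamond")      # decision symbol
    g.add_node("then_branch", "echo 'positive'")
    g.add_node("else_branch", "echo 'not positive'")
    g.add_node("end", "End", "oval")
    g.add_edge("start", "cond")
    g.add_edge("cond", "then_branch", "yes")
    g.add_edge("cond", "else_branch", "no")
    g.add_edge("then_branch", "end")
    g.add_edge("else_branch", "end")
    return g
```

The text returned by if_else_example().to_dot() can be saved to a file and rendered with the Graphviz dot tool.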

Results
The feasibility of the proposed approach was verified by building a prototype that translates source code snippets written in the PHP programming language into flowcharts and by evaluating the correctness of the generated flowcharts.
The front-end and middle compilers of our prototype architecture were developed using the tool "Yet Another Compiler-Compiler" (Yacc), and the Graphviz software was used as the back-end compiler to generate the flowcharts.
The following subset of PHP constructs was used to build our prototype. The experiment designed to evaluate the correctness of the outputs comprised the following phases:
1) Design a flowchart representation of the logic flow of a program by hand.
2) Encode the logic flow in the PHP programming language.
3) Use the encoded program as input to the compiler and generate the flowchart output.
4) Compare the output generated in step 3 with the flowchart drawn manually in step 1 for correctness.
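The comparison in the final phase was carried out by hand in this work. Purely as an illustration of how such a check could be mechanized, the Python sketch below compares two flowcharts as sets of labelled edges. It assumes that every node label is unique within a flowchart, so that label equality is enough and no graph-isomorphism search is needed; the names Edge, normalise and compare are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

# (source node label, target node label, branch label such as "yes"/"no")
Edge = Tuple[str, str, str]


def normalise(edges: Iterable[Edge]) -> Set[Edge]:
    """Collapse whitespace so purely cosmetic differences are ignored."""
    def clean(text: str) -> str:
        return " ".join(text.split())
    return {(clean(s), clean(t), clean(b)) for s, t, b in edges}


def compare(manual: Iterable[Edge], generated: Iterable[Edge]) -> Dict[str, object]:
    """Report whether the two flowcharts have exactly the same edges."""
    m, g = normalise(manual), normalise(generated)
    return {
        "match": m == g,
        "missing": sorted(m - g),      # drawn by hand but not generated
        "unexpected": sorted(g - m),   # generated but not drawn by hand
    }
```

A report whose "match" entry is True corresponds to a generated flowchart that agrees with the manual one; the other two entries list the arrows that differ.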
The experiments listed above were performed across a number of standard and mixed programming language constructs, as described in the following sections.

Test 1: If-Else Construct
An if-else construct comprises a single if block and a single else block. Either of these two blocks may be empty. A manual flowchart and the PHP code constructed to represent an if-else construct are shown in Figure 2 and Listing 3 respectively, and the flowchart generated by the prototype implementation is shown in Figure 3.

Test 2: Nested If-Else Construct
A nested if-else construct comprises an if-else construct embedded within the if block or the else block of a basic if-else construct. A manual flowchart and the PHP code constructed to represent a nested if-else construct are shown in Figure 4 and Listing 4 respectively, and the flowchart generated by the prototype implementation is shown in Figure 5.

Test 3: Switch construct
The switch construct can be considered an extension of the nested if-else construct. It typically comprises multiple logical tests and an action associated with each test. A switch-case can be represented in a flowchart using a series of if-else constructs. A manual flowchart and the PHP code constructed to represent a switch construct are shown in Figure 6 and Listing 5 respectively, and the flowchart generated by the prototype implementation is shown in Figure 7.
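The representation of a switch as a series of if-else tests can be sketched in Python as follows. This is an illustrative lowering, not the prototype's code: the function switch_to_decisions is a hypothetical name, every case is assumed to end with a break (no fall-through), and the flowchart is returned as a list of (source, target, branch label) arrows.

```python
from typing import List, Optional, Tuple

Edge = Tuple[str, str, str]  # (source, target, branch label)


def switch_to_decisions(
    subject: str,
    cases: List[Tuple[str, str]],
    default: Optional[str] = None,
) -> List[Edge]:
    """Lower `switch (subject)` into a chain of decision symbols.

    cases is a list of (value, action) pairs in source order.
    Assumes each case ends with `break`, so control never falls through.
    """
    edges: List[Edge] = []
    previous, label = "Start", ""
    for value, action in cases:
        # PHP's switch compares with loose equality, hence "==".
        test = f"{subject} == {value}?"
        edges.append((previous, test, label))   # arrive at this test
        edges.append((test, action, "yes"))     # test succeeds: run the action
        edges.append((action, "End", ""))       # break: leave the switch
        previous, label = test, "no"            # test fails: try the next one
    fallthrough = default if default is not None else "End"
    edges.append((previous, fallthrough, label))
    if default is not None:
        edges.append((default, "End", ""))
    return edges
```

For a switch on $x with cases 1 and 2 and a default action, the result is a chain in which the "no" arrow of the first test leads to the second test and the "no" arrow of the second test leads to the default action.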

Test 5: Mixed Constructs
A flowchart created by mixing different flow control structures is given in Figure 10. The corresponding source code written in PHP and the generated flowchart are given in Listing 7 and Figure 11 respectively.
Table 1 summarizes the overall results produced by each of the experiments. It lists the subject of each experiment and Boolean values stating whether the generated flowcharts were correct with regard to the original logic flow and source code, and whether they were valid with regard to the rules of drawing flowcharts.

Conclusion
The experiments and their results verified that the generated flowcharts are valid and correct. On the basis of these results, we conclude that the proposed architecture is feasible for constructing a compiler that translates source code into flowcharts.
This research can be extended in various ways.
The application can be extended to support other source languages by writing the appropriate lexers and parsers. The output of each parser should be an AST that complies with the AST specification composed in this research.
The look and feel of the flowcharts could be enhanced in future work, either by modifying the Graphviz library or by constructing another library that takes the Dot language as its input.
Optimization of the generated Dot language representation was not considered in this research and could be done as an improvement.
Only a selected set of programming language constructs was used in the prototype system built to prove the concepts presented. Support for the remaining programming language constructs can be added by extending the code generator.
Finally, a user interface (UI) can be implemented so that novice developers can use the prototype compiler as a product.