Visualizing C/C++ data structures Project Proposal

Lap Chung Lam, Mianqian Lin, and Jianhua Yang

INTRODUCTION

Researches have been done on software visualization since the production of the motion picture Sorting Out Sorting (SOS) in 1981. Many software visualization systems have been also developed. Software visualization is defined as the method using typography, graphic image, sound, and cinematography to understand computer software. Because of the limitation of the current software visualization systems, software visualization is still under usage; however, eventually software visualization will definitely become a powerful tool for students, professors, algorithm designers, and software developers.

Roughly in the macroscopic domain we can classify the software visualization into three categories as algorithm visualization, program visualization, and the hybrid of the two. Algorithm visualization is the visualization of high level abstraction of data, operations, and semantics of a computer program. In contrast to algorithm visualization, program visualization displays the underlining source code and data structure of a program.

In the paper A Principled Taxonomy of Software Visualization (Baecker & Small), twelve software visualization systems have been studied. Most systems are not automatic which means that great efforts are required to create the visualization of a given piece of software. This involves manually inserting function calls that generate events into the program being visualized, and writing a script file to control the view of the visualization. Events are used to inform a visualization system what is going on in the visualized process, and the system will generate visualization according the events. In other word, creating visualization involves the expertise of a system. This is not practical for software programmers who don't know the system at all, especially for the beginner.

Amount the twelve software visualization systems studied, three of them can actually generate visualization automatically or with small help of a programmer. However, two of them are only for Pascal or subset of Pascal language that is somewhat obsolete. Since Pascal is no more a productive language or a popular teaching language, the practical usage of the systems are limited. The final one is commercial software though it is for C language, and this limits the public access to the system.

When developing an application, usually the programmers have already encounter great difficulties of implementing the application. If they are required to learn to use a software visualization systems, struggle to insert library function calls and write scripts, there is no doubt they will give up using the system. Software visualization systems should separate programmers from developing software and creating visualization of the software, and it should provide powerful help instead imposing the programmer with extra burdens.

Our project goal is to develop a software visualization system to visually display the relevant data structures of C or C++ programs with some limited algorithm animation. We believe that when teaching algorithms in computer science, showing how an algorithm works is not good enough because the implementation of the algorithm is also very important. However, most algorithm visualization systems only show how an algorithm works in high level abstraction. They do not show any information of the underlining data structure. This is also not practical for debugging an algorithm.

For the practical reason, the users of our system do not need to know what function calls to insert. They only need to specify which data structures of his/her program needed to be visualized, and then the system will automatically insert visualization system library function calls into the right places to produce an annotated program. When the annotated program is compiled and run, it will visually draw the specified data structures and give reasonable animation for the operation of the data.

Developing such a software visualization system involves compiler technology, programming language principle, and graphic technology. It is also a long-term development and requires human power. As a short-term class project, we only focus on coming up with a clean neat project design framework so the followers can easily extend it into a master project or a Ph.D. thesis.

PROBLEMS TO ADDRESS

Developing an automatic system is more difficult than the non-automatic one. For a non-automatic system, a visualizer (who generates visualization) takes care of the semantics of the input program, then generates the mapping from algorithms to the corresponding images according to the semantics. For an automatic system, the system has to analyze the semantics of the input program, derives the semantics from the contents of the program, then generate the mapping.

Before we will go further to address the difficulties, we should look at the conventional systems such as TANGO. TANGO system consists of two processes, the input process (the program being animated) and the animating process. The input process generates a series of events, and then the events are passed to the animating process by the means of UNIX inter-process communication. The events drive the view showing on the animating process's display window.

To produce the animation for an input program using TANGO, a visualizer has to perform three tasks: (Stacko, 1998) 1. Annotate the program with a set of abstract algorithm operations. 2. Design specific image display and animation scenes. 3. Map the algorithm operations to the animation scenes. The first task is to insert function calls into the input program to generate events. The second task is to write functions to create desired images (or animation scenes). Finally the visualizer needs to write a script file called view control file to map each event to its corresponding scenes. When the system is run, it reads in the control file. Whenever the animating process receives an event from the animated process, it looks at the control file to find view scene mapped to the event, then display the scene.

The TANGO system only provides high level graphical library functions for visualizers. A visualizer will design the scenes using the library function according to his/her understanding of an animated program. For this process, sometimes a visualizer has to write more codes than the animated program.

The systems like TANGO do not provide much help for most programmers, especially for the beginners. We will design an automatic software visualization system that requires no extra programming for the users. The system will perform all three tasks mentioned previously with a condition that the users specify which data structures to be visualized.

Unlike the human being, which can have perfect understanding of the semantics of a given input program, a software system can only guess the semantics from the contexts of the program. Therefore, our first problem is to come up with a systematic method of analyzing the semantics of computer software. We will use this method to guide the system inserting correct codes into right plays of any input program.

The main purpose of the system is to visualize the data structure of C or C++. However, showing the contents of a given data structure sometimes does not help much because we also need to know how the contents of the data structure are changed or used by the algorithm. This means algorithm animation is also required. Since the semantics is guessed by the system, we do not expect to have perfect animation like TANGO, but at least we need some reasonable animation to annotate the give data structure. Therefore, we need to find out a way to map data structures to images, and the operations of the data structures to the operations of the images.

PRIMITIVE DESIGN

Temporarily we decide to use the TANGO system as our template. The system consists of two processes, a process for a visualized program and a visualizing process that performs the visualization. The visualized process will generate a series of events and pass them to the visualizing process. Each even should contain the data structure type, the contents of the data structure, and the operation on the data structure. When the visualizing process receives the events, it will choose the right image for the data type to display, and use the right operations to animate the images.

To visualize data structures with some proper animation, we need to know how to map data structures to images, and the operations of the data structures to the operations of images. All images and the operations of images should be predefined so that we can map any given data structure to a proper image. This is unlike TANGO, images are defined by visualizer. Of course, since all images are predefined, the algorithm annotation of our system will not be as flexible as TANGO. Although the algorithm annotation is limited in cur system, it will help programmers to debug their programs because the users will be able to see what is going wrong with the data of their programs. It also can help programming beginner to understand algorithms.

Our system will define a set of data types, a set of operations for each data type, a set of images (or image types), and a set of operations for each image type. Figure 1 depicts the relationship

Figure 1 System Structure Overview

between each set. When visualizing a program, for any given data structure the system will identify the data type, and look for a corresponding image type in the image library. As the input program (process) is continuing to run, it will inform the visualizing process each time it encounters an operation on the data structure by using event passing, then the visualizing process will map the operation to the corresponding operation for the image type. Image type plus image operation will produce animation for the data structure. We can not say the system can generate animation for the algorithm operating on the data structure because the system is not able to have knowledge about the algorithm. It can only know how the data are being used or modified from the contexts of the program.

The classification of data type is the first task to be accomplished. When we have a solid classification of data type, then we can derive image types. The primary data types for C and C++ are integer, character, floating point, long integer and so on. The most difficult data types to be classified are the user-defined types such as structure and class. Inside a structure can be other structures, and inside a class can be other class types. Class type also imposes another difficulty because of the private data elements. First task is to identify a set of basic types, then find a pattern of building composition types (probably including tree type, linked-list type, stack type, queue type, digraphic type, and graphic type, etc.) from the basic types. Accordingly we can construct a basic set of primary image types, then build the more complex image types using the basic image types. By using this scheme we can represent any data types using graphic images. The difficult arise with the private data in the class type because we need to access those data when building visualization. To access the private data elements in a class type, the only way is to insert methods operating on private data elements into the class definition.

Probably the operations on each data type are less complicate. Now we can roughly classify the operations as assignment, comparison, swapping, adding, subtracting, etc. According to those operations, then we can define a set of operations for images. Those operations mostly should be drawing images, changing images size, color, position, and text contents (for displaying the contents of the corresponding data structure).

Each visualized program should be inserted function calls automatically to generate events. However, since the system will automatically generate the insertion, it has to understand the semantics of the program. Probably it does not know what algorithm is operating the data structures, but it has to know how data structures are operated, then inserts the right function call in the place where data structures are operated. However, the nature of programming language imposes a great burden for the system developers. Usually for a give program, it will consist of functions and function calls. For a give function, it can operate on different set of data of the same type. For this situation, we have to handle the even generation carefully. For example, if we are visualizing two different data of the same type, we have to create two different animations for the data. When come to a function call, the inserted function call has to know how to generate a correct even to drive the right animation. The alias (pointer etc.) of C or C++ language is also a headache for understanding the semantics of a program. This is also a difficult subject in compiler research. For example, an user specifies a data structure to be visualized at the beginning of the program, then later all operations on the data structure are through pointers or aliases. This kind of situation is also going to be solved in our system to produce correction data visualization.

The final problem needs to be solved is the view control. An animated image includes points, lines, circles, rectangles, colors, positions and movements. This problem is not that difficult comparing to the previous problems. We can apply the existing graphic library to implement the display of images. The more difficult issue is the timing control. Timing control is definitely need to be implemented because sometimes users need to look at the visualization slowly to understand an algorithm or find out bugs in a program. Event passing can also be use to solve this problem. A visualized process needs to wait for a continue event from the visualizing process to continue the process. When an users need to slow down 2 seconds, then the visualizing program needs to wait for 2 seconds before send back the continue event to the visualized process.

CONCLUSION

Designing such a system is not an easy task because of the complex nature of the system, but we have the foresight that our C/C++ visualization system will be a great help for teaching or learning algorithms. It also can help monitoring the data operations when doing debugging. Software visualization for debugging is also an important research topic currently. Our system framework can be easily extended to this field without many efforts.

BIBLIOGRAPHY

John Thomas Stacko, (1998) TANGO: A Framework and System for Algorithm Animation, Brown University, Dept. of Computer Science. Technical Report No. CS-89-30.

Price, B.A., Baecker, R.M., and Small, I.S. AA principled Taxonomy of Software Visualization@ Journal of Visual Languages and Computing page 211-266.