Nov 24, 2009 - 07:18 AM  
XVCL :: Technology for Reuse based on Bassett's frames  
 


Clone Miner & Clone Analyzer

A Technology for Detection and Analysis of Design-Level Software Similarities

 

The Problem Addressed

Similarities are inherent in software. We repeatedly apply similar design solutions to solve similar problems. Architecture-centric and pattern-driven development, encouraged by modern platforms (.NET™ and J2EE™), leads to standardization of program solutions. Programmers make heavy use of copy-paste-modify for quick performance gains during development and maintenance. Many of these repetitions cannot be avoided, because conventional techniques fail to unify them with simple enough generic design solutions. In our experiments, we typically find 50%-90% of code contained in so-called clones: similar program structures, repeated many times within or across programs in variant forms [2][3][4][5][6][7][8]. Whatever the reasons they arise, such cloned structures hinder future maintenance. They complicate programs, make it difficult to trace the impact of a change (the ripple effect), and increase the risk of update anomalies. Other problems triggered by clones include replication of unknown bugs, code bloat, and dead code, all symptoms of the “software aging” phenomenon.

These are some of the reasons why it is useful to know where the clones are, especially in legacy code that undergoes extensive maintenance, needs re-engineering into a more maintainable form, or must be ported to a new language or platform (e.g., from VB to VB.NET, from ASP to ASP.NET, or from Java to J2EE).

 

Our Solution: CM/CA Method and Tools

We have developed a method and two tools, Clone Miner (CM) and Clone Analyzer (CA), for detecting clones at all levels, from cloned code fragments to design-level similarities. CA is language independent, whereas CM is easily configurable for different languages, as it works on lexical tokens rather than parsed structures. Currently, our tools work with Java, C/C++ and VB.
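
To illustrate why working on lexical tokens makes a detector easy to reconfigure, here is a minimal Python sketch. It is not CM's implementation: the `LANGUAGES` table, the `normalize` function and the abbreviated keyword sets are all hypothetical, chosen only to show that switching languages can amount to swapping a small lexical configuration.

```python
import re

# Hypothetical per-language configuration: only the keyword set differs.
# The sets are deliberately tiny; a real configuration would list every keyword.
LANGUAGES = {
    "java": {"keywords": {"int", "for", "if", "return", "class", "void"}},
    "vb": {"keywords": {"Dim", "For", "If", "Return", "Sub", "End"}},
}

# An identifier or keyword, an integer literal, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def normalize(source, language):
    """Turn source text into tokens, abstracting identifiers and numbers."""
    keywords = LANGUAGES[language]["keywords"]
    tokens = []
    for tok in TOKEN_RE.findall(source):
        if tok in keywords:
            tokens.append(tok)    # keywords are kept as they are
        elif tok[0].isdigit():
            tokens.append("NUM")  # every integer literal looks the same
        elif tok[0].isalpha() or tok[0] == "_":
            tokens.append("ID")   # every identifier looks the same
        else:
            tokens.append(tok)    # operators and punctuation are kept
    return tokens
```

With this normalization, `int total = a + 1;` and `int sum = b + 2;` yield the same token sequence, so a copy that was pasted and then renamed can still be matched without parsing the program.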

Clone Miner (CM) is used first to detect clone candidates [1]. CM uses efficient algorithms and heuristics to extract the most useful cloning information in the shortest time. First, CM looks for cloned code fragments. Then, CM uses data mining to identify configurations of cloned code fragments that may signify design-level similarities.
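
The two stages can be sketched in Python as follows. This is a simplified stand-in under stated assumptions, not the algorithms CM actually uses: stage 1 hashes fixed-size token windows, and stage 2 replaces data mining with a plain pairwise comparison of files. The function names and the `window` and `min_classes` parameters are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def find_clone_classes(files, window=4):
    """Stage 1: group identical token windows.

    files maps a file name to its token list. Returns a dict that maps each
    repeated window to the list of (file, position) pairs where it occurs.
    """
    occurrences = defaultdict(list)
    for name, tokens in files.items():
        for i in range(len(tokens) - window + 1):
            occurrences[tuple(tokens[i:i + window])].append((name, i))
    # A window is a clone class only if it occurs more than once.
    return {frag: locs for frag, locs in occurrences.items() if len(locs) > 1}


def find_recurring_configurations(clone_classes, min_classes=2):
    """Stage 2: find groups of clone classes that recur together in files.

    Returns a dict that maps each shared group of clone classes to the set
    of files containing the whole group.
    """
    classes_per_file = defaultdict(set)
    for frag, locs in clone_classes.items():
        for name, _ in locs:
            classes_per_file[name].add(frag)
    configurations = defaultdict(set)
    for a, b in combinations(sorted(classes_per_file), 2):
        shared = frozenset(classes_per_file[a] & classes_per_file[b])
        if len(shared) >= min_classes:
            configurations[shared].update((a, b))
    return configurations
```

Files that share several clone classes at once are the candidates for a design-level similarity, which is the kind of configuration the second stage looks for. Whether a candidate is meaningful still has to be judged by a person.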

Useful design-level similarities cannot be found without user involvement. Clone Analyzer (CA) applies visualization and abstraction to filter the information produced by CM. The principle behind CA is to help the user identify the design-level similarities that are useful for the maintenance or re-engineering task at hand. CA provides a diff tool, navigation tables, queries and overview charts.
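
As a rough illustration of what filtering clone information with a query might look like, consider the Python sketch below. It does not reflect CA's actual data model or query facilities: the `CloneClass` record, the `query` function and its parameters are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CloneClass:
    """Hypothetical record for one group of cloned fragments."""
    class_id: int
    token_length: int  # size of each cloned fragment, in tokens
    files: tuple       # files that contain an instance of the fragment


def query(clone_classes, min_tokens=0, min_files=1, path_prefix=""):
    """Keep clone classes that match the filters, largest footprint first."""
    selected = [
        c for c in clone_classes
        if c.token_length >= min_tokens
        and len(set(c.files)) >= min_files
        and any(f.startswith(path_prefix) for f in c.files)
    ]
    # Rank by fragment size times number of instances.
    return sorted(selected, key=lambda c: c.token_length * len(c.files), reverse=True)
```

A user preparing to refactor one subsystem could, for example, call `query(classes, min_tokens=50, min_files=2, path_prefix="db/")` to see only sizeable clones that span at least two files and touch that subsystem.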

 

The Benefits

The benefits of identifying similarity patterns in software are numerous:

-    During maintenance, knowledge of the similarities in a system helps the user control the impact of a change. This reduces update anomalies and limits the ripple effect of changes to the system.

-    Knowledge of design-level similarities can lead to better program understanding and can reveal areas where the design and code quality of the system can be improved. These similarities are candidates for refactoring that will enhance the maintainability of the system. They also aid the re-engineering of legacy code.

-    Large-granularity, design-level similarity patterns often create opportunities for reuse of design solutions within a given system, or even across similar systems. This form of reuse is natural and enhances current architecture-centric, component-based reuse methods. XVCL [9] (http://xvcl.comp.nus.edu.sg), a meta-programming technique developed in our lab, has been designed to reap the benefits offered by similarity patterns. We can represent similarity patterns (and any kind of counter-productive repetition) as compact XVCL-enabled generic solutions that are easy to understand and easy to work with. Evidence from lab studies and industrial applications shows that XVCL-enabled reuse can considerably reduce development and maintenance effort [4][6][7][8].
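
The first benefit, controlling the impact of change, can be made concrete with a small Python sketch. It assumes that clone information is available as lists of (file, start line, end line) instances; that layout and the `sibling_instances` function are hypothetical and are not taken from the CM/CA tools.

```python
def sibling_instances(clone_classes, file, line):
    """List the other instances of every clone class covering (file, line).

    clone_classes is a list of clone classes; each class is a list of
    (file, start_line, end_line) tuples, one per cloned fragment.
    """
    siblings = []
    for instances in clone_classes:
        # Instances of this class that contain the edited line.
        hit = [
            inst for inst in instances
            if inst[0] == file and inst[1] <= line <= inst[2]
        ]
        if hit:
            # Every other instance may need the same change.
            siblings.extend(inst for inst in instances if inst not in hit)
    return siblings
```

When a developer edits a line that lies inside a clone, the returned locations are the places to review for the same change, which is one way to reduce the risk of update anomalies.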

 

References

[1]     Basit, H.A. and Jarzabek, S. “Detecting Higher-level Similarity Patterns in Programs,” ESEC-FSE'05, European Software Engineering Conference and ACM SIGSOFT Symposium on the Foundations of Software Engineering, ACM Press, September 2005, Lisbon, pp. 156-165.

[2]     Basit, H.A., Rajapakse, D.C. and Jarzabek, S. “Beyond Templates: a Study of Clones in the STL and Some General Implications,” Int. Conf. Software Engineering, ICSE’05, St. Louis, USA, May 2005, pp. 451-459.

[3]     Jarzabek, S. and Li, S. “Eliminating Redundancies with a ‘Composition with Adaptation’ Meta-programming Technique,” Proc. ESEC-FSE'03, European Software Engineering Conference and ACM SIGSOFT Symposium on the Foundations of Software Engineering, ACM Press, September 2003, Helsinki, pp. 237-246; paper received ACM Distinguished Paper award.

[4]     Pettersson, U. and Jarzabek, S. “Industrial Experience with Building a Web Portal Product Line using a Lightweight, Reactive Approach,” ESEC-FSE'05, European Software Engineering Conference and ACM SIGSOFT Symposium on the Foundations of Software Engineering, ACM Press, September 2005, Lisbon, pp. 326-335.

[5]     Zhang, H. and Jarzabek, S. “An XVCL-based Approach to Software Product Line Development,” Proc. 15th International Conference on Software Engineering and Knowledge Engineering (SEKE’03), San Francisco, USA, 1-3 July 2003.

[6]     Zhang, W. and Jarzabek, S. “Reuse without Compromising Performance: Experience from RPG Software Product Line for Mobile Devices,” 9th Int. Software Product Line Conference, SPLC’05, September 2005, Rennes, France, pp. 57-69.

[7]     Zhang, H. and Jarzabek, S. “A Mechanism for Handling Variants in Software Product Lines,” special issue on Software Variability Management of Elsevier’s journal Science of Computer Programming, Volume 53, Issue 3, Dec. 2004, pp. 255-436.

[8]     Yang, J. and Jarzabek, S. “Applying a Generative Technique for Enhanced Reuse on J2EE Platform,” 4th Int. Conf. on Generative Programming and Component Engineering, GPCE'05, Sep 29 - Oct 1, 2005, pp. 237-255.

[9]     XVCL: Technology Summary

 

Contact: Stan Jarzabek

Department of Computer Science, School of Computing, National University of Singapore

3 Science Drive 2, Singapore 117543;

e-mail: stan@comp.nus.edu.sg; http://www.comp.nus.edu.sg/~stan

fax: 65-6779-4580; tel: 65-6874-2863 (office) 65-96255863 (mobile)

 