Static analysis tools are widely used because they are effective at finding programming defects. They work by analyzing the source code of a program without executing it, so they do not require test cases. Loosely speaking, these tools operate in two phases. First, they parse the source code to create a model of the program; then they use a variety of techniques that examine the model to find defects. These techniques range in sophistication from simple syntactic analysis through whole-program path-sensitive symbolic execution. They can find simple defects such as violations of spelling conventions, or more serious run-time errors such as null pointer exceptions, resource leaks, and data races. The most valuable analysis techniques are those that find the defects that are most damaging when they show up in the field. Of course this can vary tremendously depending on the application domain: a program intended for use in a secure environment will be vulnerable in very different ways from a program designed to be used in a trusted setting. This article is intended to be domain-agnostic, so I discuss the kinds of bugs that are likely to be important in many kinds of applications.
Although languages do share many classes of programming error, the potential for damage varies greatly from one language to another. For example, the consequences of a run-time error such as a buffer overrun can be catastrophic in memory-unsafe languages such as C and C++, but languages such as Java are more disciplined, so the risk of severe and unpredictable effects from such an error is much lower. This article explores these differences for C++ and Java, and explains which analysis techniques are best suited to each language. Tools need sophisticated path-sensitive analysis techniques to find the most common and risky defects in C and C++ code, but even fairly lightweight analysis algorithms are very effective for Java.
Risks of C and C++
C was designed as a language for systems programming at a time when it was important to be able to wring as much speed as possible from systems software. Unfortunately, the same language features that make it possible for compilers to generate very fast code also make C a very risky language to use. Although C++ is better in many respects, it has inherited many of the problematic features of C. (From here on, when I refer to “C”, I mean to include the subset of C++ that shares these issues.) The two chief complaints are weak type safety and unchecked pointer arithmetic. Weak type safety means it is possible to cast a value of one type to another type with no guarantee that the conversion is legal. Pointer arithmetic means it is possible to access essentially any location in the address space of the program. These two features can be used to create programs that are extremely fast and efficient, but the consequences of misuse can be severe.
In C it is easy to write programs that stray outside the bounds of defined behavior, with hugely unpredictable results. For example, consider the well-known buffer overrun bug; this bug is one of the most notorious in the history of computing, because it has enabled innumerable security breaches. The ability to do unchecked pointer arithmetic is at the root of why buffer overruns are so dangerous.
It is easy for a programmer to inadvertently introduce a buffer overrun—all it takes is to forget to check whether incoming data can fit in the available space. What makes it so hazardous is that C has no built-in protection against the damage it can wreak. A carefully-crafted input vector can allow attackers to hijack the running program and force it to do as they wish.
In contrast, the same defect in a Java program is relatively benign. The Java specification is explicit in requiring that every buffer access be checked, and an exception thrown if the access is determined to be illegal. In Java, even an unhandled exception has predictable consequences and Java programmers are accustomed to thinking about what exceptions may occur and how to handle them. As a result it is conventional for programs to be written to log exceptions and either recover or restart. This turns a buffer overrun from a potentially catastrophic bug into a relatively minor annoyance that is fairly easy to debug.
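The behavior described above can be seen in a few lines. The following is only a minimal sketch: the class name `BoundsCheckDemo`, the 8-byte buffer, and the recovery policy (report and carry on) are assumptions made for illustration, not something the article prescribes.

```java
class BoundsCheckDemo {
    // Copies input into the buffer WITHOUT checking that it fits:
    // the same mistake that produces a buffer overrun in C.
    static void copyUnchecked(byte[] input, byte[] buffer) {
        for (int i = 0; i < input.length; i++) {
            buffer[i] = input[i]; // the runtime checks every index used here
        }
    }

    // Returns true if the copy succeeded, false if the overrun was caught.
    static boolean handle(byte[] input) {
        byte[] buffer = new byte[8]; // hypothetical fixed-size buffer
        try {
            copyUnchecked(input, buffer);
            return true;
        } catch (ArrayIndexOutOfBoundsException e) {
            // Report the problem and recover instead of corrupting memory.
            System.err.println("input too large for buffer: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(handle(new byte[4]));  // true
        System.out.println(handle(new byte[16])); // false
    }
}
```

The oversized input never writes past the end of the buffer; the failure surfaces at the exact statement that went wrong, which is what makes it comparatively easy to debug.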
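There are several other classes of programming defect that have serious consequences for C programs but are mostly harmless or nonexistent in Java code. For example, one of the more difficult aspects of C/C++ programming is dynamic memory management. The programmer is entirely responsible for allocating and freeing blocks of memory: when this is done incorrectly, the program can leak memory, and the heap can even be corrupted if an inappropriate memory location is freed. In Java programs, automatic garbage collection makes memory management mostly a non-issue.
Risks of Java
In contrast to C, the most common errors in Java programs are less likely to cause crashes than to introduce subtle semantic errors. Java has a very rich set of libraries, ranging from basic data types such as maps and regular expressions, through UI toolkits, to enterprise architecture frameworks. As with any API, there are ways in which the libraries can be misused, and such misuse can introduce defects.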
For example, all classes in Java are derived from the Object class, which defines the methods equals() and hashCode(). A very important invariant in Java is that objects that are considered equal must also have equal hash codes. If this invariant is violated, it is considered a bug because classes such as Hashtable will then not work as expected. This defect will never trigger a run-time error directly; instead it is more likely to cause puzzling symptoms. For example, an object placed in a hash table may become impossible to retrieve. Unfortunately, there are many ways that a programmer can inadvertently write code that violates the invariant; for example, a subclass that overrides equals() but not hashCode() is very likely to have this problem.
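A small sketch of the symptom may help. The classes `BrokenPoint` and `FixedPoint` are hypothetical, and `HashMap` stands in for any hash-based collection. Because `BrokenPoint` inherits `Object.hashCode()`, the failed lookup is what one would expect in practice rather than something the language guarantees.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class HashInvariantDemo {
    // Hypothetical class: overrides equals() but inherits Object.hashCode(),
    // so two equal points will usually have different hash codes.
    static class BrokenPoint {
        final int x, y;
        BrokenPoint(int x, int y) { this.x = x; this.y = y; }
        @Override public boolean equals(Object o) {
            return o instanceof BrokenPoint
                && ((BrokenPoint) o).x == x && ((BrokenPoint) o).y == y;
        }
    }

    // The same class with the invariant restored.
    static class FixedPoint {
        final int x, y;
        FixedPoint(int x, int y) { this.x = x; this.y = y; }
        @Override public boolean equals(Object o) {
            return o instanceof FixedPoint
                && ((FixedPoint) o).x == x && ((FixedPoint) o).y == y;
        }
        @Override public int hashCode() { return Objects.hash(x, y); }
    }

    // Stores a value under one key, then looks it up with an equal but distinct key.
    static String lookupBroken() {
        Map<BrokenPoint, String> map = new HashMap<>();
        map.put(new BrokenPoint(1, 2), "found");
        return map.get(new BrokenPoint(1, 2));
    }

    static String lookupFixed() {
        Map<FixedPoint, String> map = new HashMap<>();
        map.put(new FixedPoint(1, 2), "found");
        return map.get(new FixedPoint(1, 2));
    }

    public static void main(String[] args) {
        // Typically null: the equal key hashes to a different bucket.
        System.out.println(lookupBroken());
        // Equal hash codes, so the lookup succeeds.
        System.out.println(lookupFixed());
    }
}
```

No exception is thrown in the broken case; the entry is simply unreachable, which matches the puzzling symptoms described above.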
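Similarly, many classes in Java implement the Serializable interface, which allows objects to be written to and read from persistent storage. This interface can be very useful, but implementing it can be tricky, so there are conventions that reduce the likelihood of mistakes. For example, it is conventional to define the serialVersionUID field explicitly in all serializable classes in order to take advantage of the runtime's built-in compatibility checking. Code that does not do so is perfectly legal, but is vulnerable to error.
Implications for Static Analyzers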
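It is fair to say that the most common serious defects in C programs are memory access errors, whereas a large proportion of the serious bugs in Java programs arise from misuse of standard APIs. Not surprisingly, the techniques best suited to finding these classes of defect are entirely different.
To find a memory access error in a C or C++ program, an analysis tool must find the program execution path along which the error manifests. Most programs work correctly on most execution paths, so defects typically occur only on unusual or lightly exercised paths. These often correspond to corner cases that are hard to cover with testing. The relevant part of the path can begin in one part of the program, wind through various loops, and involve chains of procedure calls that cross compilation unit and module boundaries. Consequently, whole-program path-sensitive analysis capable of symbolic execution is one of the most effective techniques for finding these defects statically. Implementing such an analysis is not for the faint of heart: vendors that offer such tools have devoted several person-years of effort to developing sophisticated analysis algorithms. Almost as important as finding defects is helping users understand them in the context of the whole program, so the tools provide features that help with this. Figure 1 below shows a screenshot from one such tool.
Figure 1. Screenshot of a static analysis tool. The left pane shows a visualization of the call graph, with red highlighting to indicate the modules in which the analysis found the most problems.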
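To make the idea of a defect that lives on only one execution path concrete, here is a small sketch. It is written in Java for consistency with the other examples in this article, and the methods `lookupUnit` and `describe` are hypothetical. In C the corresponding dereference of a null pointer would be undefined behavior rather than a predictable exception.

```java
class RarePathDemo {
    // Hypothetical lookup that returns null for keys it does not know.
    static String lookupUnit(String key) {
        if (key.equals("temp")) return "C";
        if (key.equals("pressure")) return "kPa";
        return null;
    }

    static String describe(String key, boolean verbose) {
        String unit = lookupUnit(key); // null only for unknown keys
        String label = key;
        if (verbose) {
            // The only dereference of unit. It fails only when BOTH conditions hold:
            // the key is unknown (decided in another method) and verbose is true.
            label = key + " [" + unit.toLowerCase() + "]";
        }
        return label;
    }

    public static void main(String[] args) {
        System.out.println(describe("temp", true));     // temp [c]
        System.out.println(describe("unknown", false)); // unknown
        try {
            describe("unknown", true);                  // the one failing path
        } catch (NullPointerException e) {
            System.out.println("defect reached on the rare path");
        }
    }
}
```

Three of the four combinations of inputs behave correctly, so casual testing is likely to miss the failure. A tool that tracks which values are possible along each path, across the call to `lookupUnit`, is in a position to report it without running the program.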
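In contrast, the Java problems discussed above require much less sophisticated analysis, because they are mostly simple properties of the structure of the code. Consequently, a great deal of benefit can be had from relatively lightweight tools. For Java, the leading tool in this domain is FindBugs (findbugs.sourceforge.net). It is a free open-source tool that is very easy to use, either standalone or integrated with other tools. All Java developers can benefit from running FindBugs on their code from time to time, and the price means there is no good reason not to.
Figure 2. A report from FindBugs indicating that a class does not follow best practice for serializable classes.
Tools of this type are effective because they have a knowledge base of hundreds of patterns that indicate bad practice, dubious correctness, potential security vulnerabilities, or simply suspicious code. Both of the Java problems discussed earlier can be detected by FindBugs (Figure 2).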
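To show how structural these two patterns are, the sketch below checks for them using plain reflection. This is not how FindBugs is implemented, and the class and method names are assumptions chosen for illustration; the point is only that neither check needs any knowledge of execution paths.

```java
import java.io.Serializable;

class PatternCheckDemo {
    static boolean declaresMethod(Class<?> c, String name, Class<?>... params) {
        try {
            c.getDeclaredMethod(name, params);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    // Pattern 1: the class itself declares equals(Object) but not hashCode().
    static boolean equalsWithoutHashCode(Class<?> c) {
        return declaresMethod(c, "equals", Object.class) && !declaresMethod(c, "hashCode");
    }

    // Pattern 2: the class is Serializable but declares no serialVersionUID field.
    static boolean serializableWithoutUid(Class<?> c) {
        if (!Serializable.class.isAssignableFrom(c)) return false;
        try {
            c.getDeclaredField("serialVersionUID");
            return false;
        } catch (NoSuchFieldException e) {
            return true;
        }
    }

    // Hypothetical class exhibiting both problems.
    static class Suspicious implements Serializable {
        int id;
        @Override public boolean equals(Object o) {
            return o instanceof Suspicious && ((Suspicious) o).id == id;
        }
    }

    // Hypothetical class that follows both conventions.
    static class Clean implements Serializable {
        private static final long serialVersionUID = 1L;
        int id;
        @Override public boolean equals(Object o) {
            return o instanceof Clean && ((Clean) o).id == id;
        }
        @Override public int hashCode() { return Integer.hashCode(id); }
    }

    public static void main(String[] args) {
        System.out.println(equalsWithoutHashCode(Suspicious.class));  // true
        System.out.println(serializableWithoutUid(Suspicious.class)); // true
        System.out.println(equalsWithoutHashCode(Clean.class));       // false
        System.out.println(serializableWithoutUid(Clean.class));      // false
    }
}
```

Each check inspects only the declarations of a single class, which is why tools built on this kind of pattern matching can be fast and easy to run.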
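Caveats
As discussed above, problems such as API misuse and memory access errors occur across many application domains. However, if your program is intended for use in an environment that is especially vulnerable to certain defects, it is worth considering multiple techniques and tools to find and eliminate those defects. For example, if Java is used for an Internet-enabled database application, defects may leave the application vulnerable to SQL injection or cross-site scripting attacks. In that case, it is advisable to use a tool that specializes in identifying these problems.
Concurrency defects such as data races, starvation, and deadlock are in a category of their own, because they are very serious in all languages and are hard to find and diagnose. Data races are especially problematic because they are usually very sensitive to timing. A program with a data race bug can run hundreds of times in the same environment with the same inputs, yet still fail with confusing and mysterious symptoms on a single execution.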
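The timing sensitivity is easy to see in a toy program. The sketch below is illustrative only: the class name, thread count, and iteration count are arbitrary assumptions. It increments one counter without synchronization and another through `AtomicInteger`.

```java
import java.util.concurrent.atomic.AtomicInteger;

class DataRaceDemo {
    static int unsafeCount; // shared and unsynchronized: a data race
    static final AtomicInteger safeCount = new AtomicInteger();

    // Runs the same task on several threads and waits for all of them.
    static void runWorkers(Runnable task, int threads) {
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(task);
            workers[i].start();
        }
        try {
            for (Thread t : workers) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    static int racyTotal(int threads, int perThread) {
        unsafeCount = 0;
        // unsafeCount++ is a read, an add, and a write; updates can be lost.
        runWorkers(() -> { for (int i = 0; i < perThread; i++) unsafeCount++; }, threads);
        return unsafeCount;
    }

    static int atomicTotal(int threads, int perThread) {
        safeCount.set(0);
        runWorkers(() -> { for (int i = 0; i < perThread; i++) safeCount.incrementAndGet(); }, threads);
        return safeCount.get();
    }

    public static void main(String[] args) {
        // The racy total varies from run to run and may even be correct by chance.
        System.out.println("racy:   " + racyTotal(4, 100_000));
        System.out.println("atomic: " + atomicTotal(4, 100_000)); // atomic: 400000
    }
}
```

The racy version can produce the right answer on some runs and the wrong answer on others with identical inputs, which is why such defects are so hard to reproduce through testing alone.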
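Detecting concurrency defects statically requires algorithms that are more sophisticated than those for finding memory errors. The most sophisticated tools for C and C++ do already have this capability, although their effectiveness varies widely and scalability can be problematic. For Java, such capabilities appear only in commercial tools and have not yet been introduced in the free open-source tools.
Conclusion
Despite the many problems with C and C++, for many applications there is still no widely accepted alternative high-level language. This is especially true for embedded systems because of the prevalence of special-purpose processors. C is the standard language because the cost of developing a C toolchain for these processors is relatively low. It is reasonable to expect that C and C++ will be with us for a long time. Unfortunately, despite widespread knowledge of the defects to which C and C++ programs are particularly prone, programmers continue to make the same mistakes with disappointing regularity. Advanced static analysis tools for C and C++ offer a way to find these defects before they show up in the field.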
A more encouraging trend for embedded systems development is that programmers are increasingly using safer languages such as Java when appropriate, such as for the desktop or mobile applications used to communicate with the embedded systems. Of course such programs are prone to bugs too, but even the worst defects will tend to be less devastating than those found in C and C++ programs. The good news is the availability of low-cost static analysis tools which make it possible to find those bugs early in the development cycle.