Due to the popularity of small computing devices such as smartphones, users are surrounded by diverse software more than ever. Software on small computing devices usually shows the characteristics of small size, low power consumption and smaller number of functions. Compared to these lightweight software systems, the conventional software systems show the characteristics of bigger size and more complex functionality. In the conventional computing environment, users use only a small number of software systems intensively. So it had been enough to check and guarantee the functional correctness and safety of a small number of giant systems in order to protect the user systems and their information inside the systems from outside attacks.
However, checking the correctness and safety of giant systems is not enough anymore, since users are using various software systems or web services provided by unskilled developers. To prove or guarantee the safety of software system, a lot of research has been conducted in diverse areas of computer science. Testing or behavior monitoring has been one of the best approaches for insuring the safety and correctness of software system for a long time. Those approaches take a large amount of time and money for verification: paying time and money was worthy since the verified software was used for a long time, and the developers were rich enough to pay the verification cost.
In the era of small software applications with relatively short lifetime, verifying the already- implemented software is not a viable option anymore. Verifying the code in the middle of development or guiding developers to write a safer code would be a more feasible solution. If the language semantics could guarantee that the compiled code is always safe with a rigid proof, software developers would not need time-consuming testing or runtime monitoring any more.
Due to the manifold definition of security, there exists an argument about the meaning of secure software. What does secure software or a secure software system mean? Is the software made by the development process which guarantees the security? Is the software going to be used for guaranteeing the security of users? Or will the software not undermine the security of the underlying system or the user information inside?
Depending on which definition of secure software is used, the approaches toward building secure software will vary. In this paper, the term trustworthy software is used to refer a software system which will not do the harmful things if compiled.
We will discuss the on-going approaches for guaranteeing or verifying the safety of software systems in this paper. Fig. 1 shows the research topics and mechanisms of software verification we will examine in this paper. In Section 2, current software verification approaches are categorized by their checking targets, target platforms, type of languages. From Section 3 to Section 5, we will discuss the state-of-theart approaches of trustworthy software.
In Section 6, the examples of trustworthy software research will be reviewed to demonstrate the viability of trustworthy software development. We also discuss the future research challenge which must be solved with better solutions in the near future. We will conclude our review in Section 7.
Taxonomy of software verification research.
2. Categories of Software Verification
Categorizing the methods of software verification can be very tricky since so many algorithms, platforms, or frameworks have been proposed for the exactly same purpose: verifying software in order to guarantee its safe or not-hostile behavior. From the view point of programming language, one simple criterion is whether or not the language semantics is concerned for the resulting code safety.
If the safety of implemented software is checked by the language semantics, it is not necessary to verify the resulting software after development is done. That means, if the code compilation is successful, the code will not do any harm to the system. With these approaches, verification is usually done when writing code and compiling it. Programming languages in this category are usually equipped with fully-verified operational semantics, and dedicated certifying compiler produces runnable machine code and a proof of its safety. Verification of the software safety ends up with checking the validity of the accompanying proof. The operational semantics of these languages and the proofs are usually written in mathematical logic. The proofs can be written by human experts, but it can be a tedious job for a human expert to write down long logical formulas for herself. Producing a proof for a snippet of code and verifying the proof automatically is one of the most interesting issues in semantics-based software verification, and software systems for this purpose are called theorem provers. We will discuss the research trend of theorem provers in Section 6.2.
However, most of the programming languages which are used for software implementation do not have the safety-sensitive operational semantics. In order to overcome the weakness of conventional programming languages such as C or to re-enforce the safe feature of conventional programming languages such as Java, constraint specification languages have been proposed. Constraint specification languages are used for specifying the pre- and post-conditions of functional procedures written in conventional programming language. Constraint are dropped or transformed into a bunch of conditional branch code by the dedicated pre-processors. In the runtime, constraints defined by developers are checked to see whether all the required constraints remain valid. Even though adding annotation is weaker than writing a program in a programming language with safety-sensitive semantics, adding annotation can be a good option when implementing all the software system is already done. We will discuss the constraint specification languages in Section 4.
If the source code is not available for annotation, the annotating approach cannot be applied. In this case, it is inevitable to determine the safety of a software system based on its observable behavior. Software testing and address space protection are two of the most popular software verification techniques which can be used without available source code. We will discuss these techniques in Section 5.
Some security features are so important that some approaches cross our classification boundaries have been proposed. These are: memory space protection and information flow security. Keeping a notverified application within a limited memory space prevents the untrusted application from underlying system infrastructure or other applications. Sandboxing is monitoring the addressing violation in order to protect the memory address boundary of every application. Since it is a runtime monitoring approach, sandboxing can be applied to software with no source code. Software fault isolation is another approach to protect address boundaries. In this scheme, annotations checking address violation are inserted into the source code of application automatically before compiling. Recently the researchers have started checking the memory safety by language semantics and checkable proofs. Separation logic is the most up-and-coming research in this area. With separation logic, more rigid and tight reasoning can be achieved in protecting memory address spaces. We will discuss sandboxing in Section 5.2, software fault isolation in Section 4.1, and separation logic in Section 3.4.
Information flow security is another good example to which diverse approaches are applied. Information breach of program variables can be successfully handled by program annotation, whereas covert channels such as timing or power consumption can be detected by runtime monitoring. We will discuss the approaches for information flow security in Section 4.2 and Section 5.3. Table 1 shows the classification of software verification we proposed in this paper.
After extensive overview of software verification mechanisms, we will discuss the research topics which must be explored for the success of software verification. We will present the related topics with some pioneering research examples in Section 6.
Classification features of software verification
3. Checking by Language Semantics
Infusing safety feature into language semantics can be very useful to build safe and trustworthy software. Researches in this area usually span in two ways: formalizing the semantics of (a subset of) existing language or designing a new language with formalized language semantics.
3.1 Proof-Carrying Code
Necula and his colleague [1,2] proposed proof-carrying code (PCC) in which a host system can execute a code from untrusted outside party. In this mechanism, the host system must determine with certainty that it is safe to execute a program supplied (possibly in binary form) by an untrusted (and maybe malicious) outsider. The untrusted code producer must supply with the code a safety proof that claims that the code satisfies a previously defined safety policy.
Appel and his colleague [3,4] proposed foundational proof-carrying code (FPCC), in which code is verified with the smallest possible set of axioms, using the simplest possible verifier and the smallest possible runtime system. The paper describes the mathematical and engineering problems to be solved and how to applying PCC mechanism to general programs.
Recent research of Vanegue  made comparison between PCC  and FPCC [3,4]. While in PCC, it is possible to make use of type rules directly in the axioms of the system, in FPCC, each type rule should be first defined from ground axioms. In addition, FPCC suggests the use of type-preserving compilers, such as the one in CompCert . Vanague  argued that PCC is more vulnerable to unconventional machine instructions than FPCC since PCC can lose soundness because of its dependency to the underlying arbitrary type system.
3.2 Typed Assembly Language
Morrisett and his colleagues [7-9] proposed typed assembly language (TAL) which has been based on other machine code verification. The typed assembly language is based on a conventional RISC assembly language, but its static type system even supports high-level language abstractions, such as closures, tuples, and user-defined abstract data types. The type system ensures that well-typed programs cannot violate these abstractions, and the safety of underlying system.
The TAL has played an important role in verifying low-level system software . Lattner and Adve  proposed a compiler framework called LLVM (Low Level Virtual Machine). The framework was designed to support transparent program analysis and transformation for arbitrary programs. It provides high-level information to compiler transformations at compile-time, link-time, and runtime.
Patrignani et al.  suggested a compilation technique which prohibits an attacker operating at the target language level from bypassing security features of the source language. In their compilation scheme, whose source language is an object-oriented, high-level language and whose target language is an extended typed assembly language.
3.3 Dependently Typed Functional Language
In type theory, a dependent type is a type whose definition depends on a value. In intuitionistic type theory, dependent types are used to encode logic's quantifiers like "for all" and "there exists" in intuitionistic type theory. Dependent types can enhance the expressive power of type systems, and the improved type system is powerful enough to prevent software bugs . Several functional programming languages support dependent types such as Agda , Cayenne , Coq [16,17], Epigram , and Idris .
Coq is one of the most popular proof assistant based on dependently typed functional language. Huet and Herbelin  summarized the research related to Coq over 30 year. Researchers in diverse research areas are adopting Coq to verify formally their implementation. Athalye  suggested a framework based on Coq, which can be used for formalizing the theory of IO automata, including refinement, simulation relations, and composition. Chatzikyriakidis and Luo  utilized Coq for formalizing natural language inference based on the formal semantics in modern type theories (MTTs).
3.4 Separation Logic
It is commonly accepted that verifying a program written in a programming language allowing memory address manipulation such as C is extremely hard or even impossible. However, researchers have demonstrated that carefully designed logic or type system can reason about the safety of program with pointers. Separation logic by Reynolds [24-26] aims at expediting programs that manipulate pointer data structures. The separation logic is used for specifications and proofs of a program component, and it reasons about only the portion of memory used by the component, and not the entire global state of the system.
For checking the proofs of a program, Berdine et al.  proposed an automatic proof checker of separation logic. In the proposed separation logic checker, called Smallfoot, the assertions describe only the shapes of data structures rather than their detailed contents.
Separation logic has gained widespread popularity because of its ability to succinctly express complex invariants of program’s heap configurations. Qui and his colleagues [28,29] proposed a framework which extends the Vcc framework  to provide an automated deductive framework against separation logic specifications for C programs based on natural proofs.
4. Annotation and Transformation
Implementing a software system in a programming language with rigorous semantics is the best way of building trustworthy software. However, there still exist a large number of legacy systems written in not-so-much safe programming languages. For some mysterious reasons, a lot of software systems are implemented in unsafe programming languages. To enhance the trustworthiness, adding annotation to the code of classic programming language is a simple and viable option. Usually annotated code is transformed into a code of original programming language by a dedicated pre-processor.
4.1 Software Fault Isolation
It has been believed that protecting memory space or memory pages is the duty of the OS/kernel or the hardware: virtual memory mechanism of hardware or user/admin mode of operating system .
However, memory boundaries can be preserved by program annotation. Wahbe et al.  proposed a memory protection based on program rewriting called software fault isolation (SFI). In SFI scheme, untrusted object code is modified by rewriting algorithm automatically in order for the untrusted code not to write or jump to an address outside its fault domain.
Recently, SFI scheme has expanded to various intermediate languages or hardware platforms. While conventional SFI relies on analysis of assembly-level programs, Kroll et al.  suggested an improved scheme of analyzing and rewriting programs in a compiler intermediate language, the Cminor language of the CompCert C compiler. Since the CompCert compiler has been formally certified that its transformation process preserves the safety property of a program, it is formally guaranteed that resulting binary modules satisfy the SFI memory safety. At the same time, resulting binary modules can be any of the supported architectures of CompCert compiler.
4.2 Information Flow Security
Confidentiality is one of the most important factors of standard system security. An end-to-end confidentiality policy might assert that secret input data cannot be inferred by an attacker when the attacker can observe the system output. To enforce this information flow policy, access control and encryption have been conventionally used. In addition to these conventional protections, the use of programming language techniques for specifying and enforcing information flow policies has been proposed. Sabelfeld and Myers  have surveyed language-based approaches for information flow security and identified open challenges in this area of software verification.
Costanzo et al.  designed and implemented a system which can be used to formally verify that the end-to-end behavior of the computing system really satisfies various information-flow policies. The input of their system includes a software system that consists of both C and assembly programs, and they demonstrate that their system constructed an end-to-end security proof, fully formalized in the Coq proof assistant.
Static analysis approach can be also applied to catching covert channels. Doychev et al.  suggested a framework for the automatic, static analysis of cache side channels. In their approach, observed cache states, traces of hits and misses, and execution times are used to generate formal, quantitative security guarantees.
5. Checking Behavior
If the source code of software system is not available, or if annotating is not expressive enough to cover the safety feature of stakeholders, the safety of software must be determined by observing the behavior of software. Testing or monitoring is not complete, but it is useful to find some (but not all) defects of software system if testing or monitoring is used widely and intensively.
Fuzzing or fuzz testing might be the most popular testing techniques in these days. It is one of the automated software testing techniques, and the software conducting fuzzing test is called fuzzer. Fuzzers provides unexpected or random data as inputs to a computer program, and monitors the program running for crashes or potential memory leaks. Depending on the existence of source code, fuzzing is classified as white-box fuzzing or black-box fuzzing [37,38]. Generating test cases randomly is less powerful than considering program running path in test case generation, but it is also known that fuzzing is much more cost-effective .
Godefroid et al.  proposed a grammar-based specification technique for generated random, but valid inputs for fuzzing. They also demonstrated that the complex structured-input guided by their proposed specification enhanced the effectiveness of white-box fuzzing. Stephens et al.  adopted symbolic execution when white-box fuzzing is performed. They showed that their fuzzer is efficient enough to identifying the same number of vulnerabilities, in the same time, as the top-scoring team of the qualifying event of the DARPA Cyber Grand Challenge.
Sandboxing is a security mechanism for separating running programs, and protecting the programs from each other. It is usually used for executing untrusted or unverified code from outside of the system . An untrusted program is put into a sandbox with a tightly controlled set of resources such as limited number of available APIs, restricted memory space, reduced network access or/and limited access to device drivers. In the sense of providing a highly controlled environment, sandboxing may be seen as a specific example of virtualization.
5.3 Information Flow Security
Preventing user information from owing to non-legitimate third-party gets more important than ever as more and more sensitive personal information is gathered into small smart phones. Covert channels, that means, leaking information through program termination, execution time, or exceptions, have become the major source of wrong information flow.
Since the source code of smartphone apps are not usually available for static analysis of potential information leaks, tracking or monitoring information flow is still the most viable ways of protecting the system from malicious information leaks. Enck et al.  implemented a system-wide dynamic taint tracking and analysis system, which is able to track multiple sources of sensitive data simultaneously. Their system also gave to users a visual analysis of how third-party applications collect and share their private data.
5.4 Property-Based Random Testing
Property-based random testing (PBRT) is a form of black-box testing with formal statements of its intended behavior properties. The testing case is derived from the formal statements, and the software system is tested with a large number of random test cases. PBRT was popularized in the functional programming society by the QuickCheck library for Haskell . Pacheco and Ernst  proposed PBRT for java called RANDOOP. Their system generates unit tests for Java code using feedbackdirected random test generation. It also provides an annotation-based interface for specifying configuration parameters.
The concept of random testing and testing directed by feedback has improved to automated random testing. Godefroid et al.  proposed the concept of directed automated random testing. They argued that with their system, testing can be performed completely automatically on any program that compiles—there is no need to write any test driver or harness code.
6. Research Challenge
6.1 Trusted Computing Base
To guarantee the safety of implemented software, the underlying software systems must operate correctly. For example, if a compiler does not produce a correct code because of its internal bugs, all the care and attention a software developer pays would get useless. Some software such as compilers, operating systems or cryptography algorithms must be trusted, so they are called trusted computing base. There have been research cases showing the successful demonstration of verifying the trust- worthiness of base software. For compilers, Leroy's CompCert project [6,52] demonstrates the viability of automatic or machine-assisted program verification. In this project, the correctness of an optimizing C compiler is specified, implemented and proved. They expressed the operational semantics of all the involved languages, source language and all intermediate languages, as inductive relations in the Coq [16,17] proof assistant. The Vellvm project  formalized the LLVM compiler's intermediate language and proved the correctness of some key optimizations.
In operating systems, the CertiKOS project [54,55] has built a new, fully verified and secure hypervisor kernel. Cerritos is hypervisor architecture that leverages formal certification to ensure correctness and counter information leakage in cloud computing. CertiKOS shows an example of applying recent advances in certified software design to implementation of a modular and evolvable certified kernel. Through machine-checkable proof certificates and runtime monitoring, CertiKOS provides users the assurance of correct and leak-free execution of their cloud services. Klein et al. [56,57] verified seL4 microkernel from an abstract specification down to its C implementation. A thirdgeneration microkernel of L4 provenance, seL4, comprises 8,700 lines of C code and 600 lines of assembler, and they built a formal, machine-checkable verification. In their verification, they assumed correctness of compiler, assembly code, and hardware. It also shows that verification of individual software in software hierarchy must depend on trusted computing base.
6.2 Theorem Prover
Tools for reasoning about programs ranges from fully automatic program analysis tools, called automated theorem prover, to interactive tools where the human is closely involved in the proof process, called proof assistant. Early versions of theorem provers were first-order. The first-order theorem provers were applied to the problem of verifying the correctness of computer programs in languages such as Pascal, Ada, and Java etc. First-order theorem proving is one of the most mature areas of automated theorem proving. CARINE (Computer Aided Reasoning engINE)  and Larch Prover  are notable automated theorem provers in their performance and rigorous.
While other logics, such as higher-order logics, have more expressive power, theorem proving for these logics is far more difficult. Therefore, proving or/and checking higher-order logics are usually conducted by proof assistant systems with human involvement. Because of the flexibility and expressive power of higher-order logics, proof assistants are more prevalent than fully automated theorem provers: Coq [16,60], Isabelle [61,62], NuPRL , PVS , and Twelf [65,66].
6.3 Policy Specification
It is not very easy to determine which property will be essential to safe software not to intrude the system security. It depends on what is expected from a software system. It would be very useful for a user or diverse stakeholders to specify their policy requirement in a dedicated language, and then other parts such as certifying compilers and theorem provers can generate and check the proofs. REMS project  has calibrate formal specifications against observed behavior successfully. Formal languages like Z , Alloy , or AADL  are used for specifying properties of system models. Higher-order logic and type theories in modern proof systems will be successfully used for specifying arbitrary concepts as abstraction boundaries.
We have discussed the on-going approaches for guaranteeing or verifying the safety of software systems in this paper. We have also discussed that more vigorous research is needed in some areas such as assisting automatic or semi-automatic theorem proving, specifying arbitrary policy policies, and making the infrastructure software more trustworthy.
Building trustworthy software has been the aim of most of software developers since the dawn of computer science. Due to the intensive research on programming language semantics and formal verification as well as increasing hardware power and the ability of fast data analysis, we believe that we are close to machine-verifiable trustworthy software.