Coverity's Zack Samocha: Software Quality and the Open Source Advantage
"At the end of the day, every software has issues," explained Zack Samocha, director of the Coverity Scan project. "What is important is the impact those issues have. That is the question that developers have to ask. That answer depends on what you use the software for. Obviously, it is more critical if the software is used in machines that control X-rays or you send it to Mars."
Software quality is a topic close to most developers' hearts, whether they work with open source or proprietary code.
Assessing quality, however, isn't a simple matter. As a result, several efforts have sprung up to tackle the challenge, including the Coverity Scan project.
Coverity began work in 2006 on the open source project, which is a joint endeavor with the Department of Homeland Security designed to enable software developers -- both community-based and commercial -- to upload their code for analysis of the number of defects and other glitches.
This year's Scan Project analyzed more than 850 million lines of code from more than 300 open source projects. The participants include Linux, PHP and Apache developers, plus an anonymous sample of nearly 300 of Coverity's customers. Among its conclusions was that Linux code is the "benchmark of quality," in the project's own words.
In this interview, LinuxInsider talks to Zack Samocha, director of the Coverity Scan project, about the many and varied issues involved in ridding software of bad code.
LinuxInsider: How have developers been motivated by the scanning efforts of groups like the Sustainable Computing Consortium and the Coverity Scan project?
Zack Samocha: In our case we host the scan project in the cloud and let developers use it for free. The more ways that exist to measure code quality, and the more organizations that pursue this, the more it will help to strengthen all of the open source communities.
LI: Does your testing of open source code show it to be more or less reliable and error-filled than commercial products?
Samocha: I think it is hard to give a definite answer to that. What we have seen is that the ability to scan and review so many open source projects makes the code at least equal to if not better than proprietary products. The reality of being successful in delivering quality code is more than scanning it. That may be fine with smaller projects, but with really big projects, once you cross beyond 500,000 lines of code, just having more people involved is not going to be enough.
You need to have secure infrastructure. You need processes. You need to monitor nightly builds and weekly builds. You need to have a reliable system that will pull the code after checking it. All of these things are required. Naturally, they also exist in proprietary code, because that is how organizations develop. I have seen open source communities that are able to do all of this in a very successful way.
LI: So how do the two compare?
Samocha: Linux is a good example. Those distribution communities have so many contributors. Imagine how much it would cost for proprietary developers to submit their code constantly for scanning. So having a code-scanning option for open source is definitely an advantage, but it does not replace the need for having some sort of methodology and a way to do things to ensure code quality.
If you look into the facts, it is very obvious that the defect densities in open source and proprietary code are very much in line. Once you go over one million lines of code, proprietary code is a bit better but not significantly. When the code base is below 100,000 lines of code, you can see that open source is actually better.
LI: What key findings did the 2012 Coverity Scan project reveal about any key areas of weakness or concerns about software quality?
Samocha: I think we have identified two key things. The first one is benchmarking. When we started with the Coverity Scan project, we wanted to identify quality ratings. We decided that 1,000 lines of code without errors was good, and gave it a one. What we began to notice over the last few years is that many software projects had developers who paid attention to the ratings and really started to lower them. Today good software quality is 0.7 defects per 1,000 lines of code. I think that shows that organizations, both in commercial and in open source, are understanding the need to do more testing and development. As a result, they are getting the good quality ratings by doing that. It is a clear trend over the years.
The second thing is the rate of adoption of code scan in the open source community. It just shows the maturity of organizations and developers in the open source community.
LI: What changes in software development have these assessments brought about?
Samocha: The use of scan is growing like crazy. You know, developers did not like to fix things like this before. Now they do it willingly. They are concentrating more on fixing issues and growing the quality of development.
LI: Is author T. Capers Jones correct in his view that no method of removing software defects or errors is 100 percent effective?
Samocha: Another thing we saw in the report is that once you reach a certain size, you really have to have your resources together. You can do well with 100,000 lines of code, but once you go over one million lines of code, we see very clearly that you can still have good quality if you have an organization controlling it. If you do not, then just having talent and many eyes looking over the code is not enough.
Still, today you cannot keep your code limit small. It is the nature of code to keep growing and going. You decide to add a component of open source code from somebody else, and suddenly, boom. You instantly have a large chunk of code, but you do not know that code that well, and your developers are not really experts with it.
LI: How close to perfect can developers expect to get their software if they apply all of the scans and fixing?
Samocha: Regarding T. Capers Jones' views that you can never remove all software defects, I really appreciate his writing. I think at the end of the day, every software has issues. What is important is the impact those issues have. That is the question that developers have to ask. That answer depends on what you use the software for. Obviously, it is more critical if the software is used in machines that control X-rays or other medical tools that look inside of people's bodies, or you send it to Mars. In those areas the impact can be very severe. They demand a totally different level of quality.
Not every tool out there can find everything. At the end of the day, you do the very best that you can. Even then, you will not be fully successful with every task. As an organization, you want to understand the issues and the impacts they can make.
LI: What role has the Department of Homeland Security played in changing the quality assessments of both commercial and open source software?
Samocha: Homeland Security was active in the early days of the project, but they are not really active in it now. They were mostly associated through MITRE. MITRE would encourage software engineering departments to use our Scan to find defects. MITRE is funded by the government so it gave developers a reason to fix the issue because it is a standard, but to be honest with you, developers fix what they really think is a problem.
LI: Given the level of disclosures recently about what some call government abuses with surveillance, has DHS pushed developers to include back doors or other secret access to content obtained with various software projects?
Samocha: I can honestly say no. DHS and the Scan project have been involved with none of that. Keep in mind that the success of the Coverity Scan Project depends on the open source community to willingly come to us. Developers have to choose to do it. They have to spend some time and effort downloading our scan software and using it. We cannot control who is uploading their software to us or verify how complete the software submitted is.
LI: What major challenges do open source communities face in developing high-quality software?
Samocha: We talk to the open source community all the time. For the open source community, it is important to find the right talent. To have a good project you have to have a good architect and good people behind it. The advantage of an open source community is the participants are so excited about what they are doing. It is not always easy to find good talent. The second aspect is the infrastructure. This involves things like how do you monitor for defects and how do you monitor your process. The good thing is there are many tools out there that are giving a free service to the open source community.
LI: What problems lie ahead in ensuring better quality control for software?
Samocha: Developers have to have the maturity to know that they really have to do it this way. The big advantage open source has is the developers really do not have the pressure to rush to market, but they have the financial challenge of getting money to set up their infrastructure machines and all of that.