
# Ideal integration with OSS-Fuzz 
OSS projects have different build and test systems, so we cannot expect them
to share a unified way of implementing and maintaining fuzz targets and integrating
them with OSS-Fuzz. However, we still try to recommend the preferred approaches.

Here are several features (starting from the easiest) that will make automated fuzzing
simple and efficient, and will make it possible to catch regressions early in the development cycle.

## Fuzz Target
The code of the [fuzz target(s)](http://libfuzzer.info/#fuzz-target) should be part of the project's source code repository. 
All fuzz targets should be easily discoverable (e.g. reside in the same directory or follow the same naming pattern).

This makes it easy to maintain the fuzzers and minimizes breakages that can arise as source code changes over time.

Make sure to fuzz the target locally for a short period of time to ensure that
it does not crash, hang, or run out of memory immediately.
See details at http://libfuzzer.info and http://tutorial.libfuzzer.info.

Examples: 
[boringssl](https://github.com/google/boringssl/tree/master/fuzz),
[SQLite](https://www.sqlite.org/src/artifact/ad79e867fb504338),
[s2n](https://github.com/awslabs/s2n/tree/master/tests/fuzz),
[openssl](https://github.com/openssl/openssl/tree/master/fuzz),
[FreeType](http://git.savannah.gnu.org/cgit/freetype/freetype2.git/tree/src/tools/ftfuzzer),
[re2](https://github.com/google/re2/tree/master/re2/fuzzing),
[harfbuzz](https://github.com/behdad/harfbuzz/tree/master/test/fuzzing),
[pcre2](http://vcs.pcre.org/pcre2/code/trunk/src/pcre2_fuzzsupport.c?view=markup),
[ffmpeg](https://github.com/FFmpeg/FFmpeg/blob/master/doc/examples/decoder_targeted.c).
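For orientation, here is a minimal sketch of what a fuzz target looks like. The function `parse_record_length` and its record format are hypothetical stand-ins for real project code; only the `LLVMFuzzerTestOneInput` entry point comes from libFuzzer. Note that the file has no `main()`: the fuzzing engine (or a regression-test driver) provides it. The `assert` checks an invariant, so a logic bug is reported like a crash as long as the build does not define `NDEBUG`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical library function under test: reads a record made of a 1-byte
 * tag, a 1-byte payload length and the payload itself. Returns the payload
 * length, or -1 if the input is too short or the length field is wrong. */
static int parse_record_length(const uint8_t *buf, size_t len) {
  if (len < 2) return -1;           /* need at least tag + length bytes */
  size_t payload = buf[1];
  if (payload > len - 2) return -1; /* declared payload exceeds the input */
  return (int)payload;
}

/* Fuzzing entry point: the engine calls this once per generated input.
 * There is deliberately no main() in this file. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  int n = parse_record_length(data, size);
  /* Invariant check: a failure here is reported like any other crash. */
  assert(n == -1 || (size_t)n + 2 <= size);
  return 0;
}
```

With a Clang release that ships libFuzzer, something like `clang -g -fsanitize=fuzzer,address target.c -o target` followed by `./target -max_total_time=60` is one way to do the short local fuzzing run described above.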


## Seed Corpus
The *corpus* is a set of inputs for the fuzz target (stored as individual files).
When starting the fuzzing process, one should have a "seed corpus",
i.e. a set of inputs to "seed" the mutations.
The quality of the seed corpus has a huge impact on fuzzing efficiency, as it allows the fuzzer
to discover new code paths more easily.

The ideal corpus is a minimal set of inputs that provides maximal code coverage.
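To make "minimal set, maximal coverage" concrete, the sketch below applies a greedy set-cover heuristic: it repeatedly keeps the input that adds the most not-yet-covered code. It assumes a hypothetical coverage representation (a 64-bit mask of covered edges per input) and is only an approximation, since finding a truly minimal cover is NP-hard. It is not a description of how any particular fuzzing engine reduces its corpus.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Counts set bits by repeatedly clearing the lowest one. */
static int popcount64(uint64_t x) {
  int count = 0;
  while (x != 0) {
    x &= x - 1; /* clear the lowest set bit */
    count++;
  }
  return count;
}

/* Greedy corpus minimization. cov[i] is a (hypothetical) bitmask of up to 64
 * code edges covered by input i. Chosen indices are written to keep[], which
 * must have room for n entries. Returns the number of inputs kept. */
static size_t minimize_corpus(const uint64_t *cov, size_t n, size_t *keep) {
  uint64_t covered = 0;
  size_t kept = 0;
  for (;;) {
    size_t best = n;
    int best_gain = 0;
    for (size_t i = 0; i < n; i++) {
      int gain = popcount64(cov[i] & ~covered);
      if (gain > best_gain) {
        best_gain = gain;
        best = i;
      }
    }
    if (best == n) break; /* no remaining input adds new coverage */
    covered |= cov[best];
    keep[kept++] = best;
  }
  /* The kept subset covers everything the full corpus covered. */
  for (size_t i = 0; i < n; i++) assert((cov[i] & ~covered) == 0);
  return kept;
}
```

For example, with masks `{0x3, 0x1, 0x6, 0x8}` the second input is dropped because everything it covers is already covered by the first. On real corpora, libFuzzer's `-merge=1` mode is the practical tool for this kind of reduction.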

For better OSS-Fuzz integration,
the seed corpus should be available in revision control (either in the same repository as the source code or in a separate one).
It should be regularly extended with inputs that trigger (or used to trigger) bugs and/or exercise new parts of the code.

Examples: 
[boringssl](https://github.com/google/boringssl/tree/master/fuzz),
[openssl](https://github.com/openssl/openssl/tree/master/fuzz),
[nss](https://github.com/mozilla/nss-fuzzing-corpus) (corpus in a separate repo) 


## Regression Testing
The fuzz targets should be regularly tested (not necessarily fuzzed!) as part of the project's regression testing process.
One way to do so is to link the fuzz target with a simple driver
(e.g. [this one](https://github.com/llvm-mirror/llvm/tree/master/lib/Fuzzer/standalone))
that runs the provided inputs, and to run this driver on the seed corpus created in the previous step.
It is recommended to use the [sanitizers](https://github.com/google/sanitizers) during regression testing.

Examples: [SQLite](https://www.sqlite.org/src/artifact/d9f1a6f43e7bab45),
[openssl](https://github.com/openssl/openssl/blob/master/fuzz/test-corpus.c)
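A driver of this kind can be very small. The following is a sketch under our own assumptions, not a copy of the LLVM standalone driver: `run_corpus_files` is a hypothetical name, and the stand-in `LLVMFuzzerTestOneInput` exists only to make the sketch self-contained. It reads each file fully and passes its bytes to the fuzz target exactly once, with no mutation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in fuzz target (hypothetical); a real driver is linked against the
 * project's own LLVMFuzzerTestOneInput instead. */
static size_t g_inputs_run = 0;
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  assert(size == 0 || data != NULL);
  g_inputs_run++;
  return 0;
}

/* Feeds the full contents of each file in paths[0..count) to the fuzz target
 * once. Returns 0 on success, 1 if any file cannot be read. */
static int run_corpus_files(int count, char **paths) {
  for (int i = 0; i < count; i++) {
    FILE *f = fopen(paths[i], "rb");
    if (f == NULL) {
      fprintf(stderr, "cannot open %s\n", paths[i]);
      return 1;
    }
    long len = -1;
    if (fseek(f, 0, SEEK_END) == 0) len = ftell(f);
    if (len < 0 || fseek(f, 0, SEEK_SET) != 0) {
      fprintf(stderr, "cannot determine size of %s\n", paths[i]);
      fclose(f);
      return 1;
    }
    /* malloc(0) may return NULL, so always allocate at least one byte. */
    uint8_t *buf = malloc(len > 0 ? (size_t)len : 1);
    if (buf == NULL) {
      fclose(f);
      return 1;
    }
    size_t got = fread(buf, 1, (size_t)len, f);
    fclose(f);
    if (got != (size_t)len) {
      fprintf(stderr, "short read on %s\n", paths[i]);
      free(buf);
      return 1;
    }
    LLVMFuzzerTestOneInput(buf, got);
    free(buf);
  }
  return 0;
}
```

In a real setup the stand-in target is deleted, `main` simply returns `run_corpus_files(argc - 1, argv + 1)`, and the driver is built with sanitizer flags such as `-fsanitize=address` and linked against the project's fuzz target. Running it over the seed corpus then turns every corpus file into a regression test.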

## Fuzzing dictionary

For some input types, a simple dictionary of tokens used by the input language
may have a dramatic positive effect on fuzzing.
For example, when fuzzing an XML parser, a dictionary of XML tokens will help.
AFL has a [collection](https://github.com/rc0r/afl-fuzz/tree/master/dictionaries)
of such dictionaries for some of the popular data formats.
Ideally, a dictionary should be maintained alongside the fuzz target.
The syntax is described [here](http://libfuzzer.info/#dictionaries).
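If a project generates its dictionary from a token list kept next to the fuzz target, each entry has to be written in the `name="value"` form with escaping. The sketch below assumes the escaping rules from the libFuzzer dictionary documentation (`\\` for a backslash, `\"` for a quote, `\xAB` for a hex byte). `format_dict_entry` and `format_dict_cstr` are hypothetical helpers, not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Formats one dictionary entry as name="value", escaping backslashes, double
 * quotes and non-printable bytes (as \xNN). Writes at most cap bytes to out
 * and returns the entry length, or -1 if out is too small. */
static int format_dict_entry(const char *name, const unsigned char *tok,
                             size_t tok_len, char *out, size_t cap) {
  assert(out != NULL && cap > 0);
  int n = snprintf(out, cap, "%s=\"", name);
  if (n < 0 || (size_t)n >= cap) return -1;
  size_t pos = (size_t)n;
  for (size_t i = 0; i < tok_len; i++) {
    unsigned char c = tok[i];
    if (c == '\\' || c == '"')
      n = snprintf(out + pos, cap - pos, "\\%c", c);
    else if (c < 0x20 || c > 0x7e)
      n = snprintf(out + pos, cap - pos, "\\x%02X", (unsigned)c);
    else
      n = snprintf(out + pos, cap - pos, "%c", c);
    if (n < 0 || (size_t)n >= cap - pos) return -1;
    pos += (size_t)n;
  }
  n = snprintf(out + pos, cap - pos, "\"");
  if (n < 0 || (size_t)n >= cap - pos) return -1;
  return (int)(pos + (size_t)n);
}

/* Convenience wrapper for NUL-terminated tokens such as XML keywords. */
static int format_dict_cstr(const char *name, const char *tok, char *out,
                            size_t cap) {
  return format_dict_entry(name, (const unsigned char *)tok, strlen(tok), out,
                           cap);
}
```

For an XML parser, `format_dict_cstr("doctype", "<!DOCTYPE", buf, sizeof buf)` produces the line `doctype="<!DOCTYPE"`, which can be appended to the dictionary file maintained alongside the target.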

## Build support
A plethora of different build systems exist in the open-source world,
and the less OSS-Fuzz needs to know about them, the better it can scale.

An ideal build integration for OSS-Fuzz would look like this:
* For every fuzz target `foo` in the project, there is a build rule that builds `foo_fuzzer.a`,
an archive that contains the fuzzing entry point (`LLVMFuzzerTestOneInput`)
and all the code it depends on, but not the `main()` function.
* The build system supports changing the compiler and passing extra compiler
flags so that the build command for a `foo_fuzzer.a` looks similar to this:

```
CC="clang $FUZZER_FLAGS" CXX="clang++ $FUZZER_FLAGS" make_or_whatever_other_command foo_fuzzer.a
```

In this case, linking the target with e.g. libFuzzer will look like `clang++ foo_fuzzer.a libFuzzer.a`.
This keeps the OSS-Fuzz-specific configuration minimal and therefore more robust.

There is no point in hardcoding the exact compiler flags in the build system because they 
a) may change and b) are different depending on the fuzzing target and the sanitizer being used. 

## Not a project member?

If you are a member of the project you want to fuzz, most of the steps above are simple.
However, in some cases someone outside the project team may want to fuzz the code
while the project maintainers are not interested in helping.

In such cases we can host the fuzz targets, dictionaries, etc., in this
repository and reference them in the Dockerfile.
Examples: [libxml2](../targets/libxml2), [c-ares](../targets/c-ares), [expat](../targets/expat).
This is far from ideal because the fuzz targets will not be continuously tested 
and hence may quickly bitrot.

If you are not a project maintainer, we may not be able to CC you on security bugs found by OSS-Fuzz.