diff --git a/docs/images/llm_framework.png b/docs/images/llm_framework.png new file mode 100644 index 000000000..38baec505 Binary files /dev/null and b/docs/images/llm_framework.png differ diff --git a/docs/images/tinyxml2_examples.png b/docs/images/tinyxml2_examples.png new file mode 100644 index 000000000..f817b73b2 Binary files /dev/null and b/docs/images/tinyxml2_examples.png differ diff --git a/docs/research/target_generation.md b/docs/research/target_generation.md new file mode 100644 index 000000000..d350733fd --- /dev/null +++ b/docs/research/target_generation.md @@ -0,0 +1,1727 @@ +--- +layout: default +nav_exclude: true +permalink: /research/llms/target_generation/ +--- +# Fuzz target generation using LLMs + +# Background + +[OSS-Fuzz](http://github.com/google/oss-fuzz) performs continuous fuzzing of [1000+](https://github.com/google/oss-fuzz/tree/master/projects) open source projects across most major languages. To integrate a new project, a human typically analyzes the attack surface of a library and writes [fuzz targets](https://github.com/google/fuzzing/blob/master/docs/glossary.md#fuzz-target) (also called fuzzing harnesses ) to exercise the relevant code. Linked with a [fuzzing engine](https://github.com/google/fuzzing/blob/master/docs/glossary.md#fuzzing-engine) (e.g. libFuzzer, AFL, Centipede), this enables coverage-guided fuzzing for all OSS-Fuzz projects. Depending on the complexity of the project, writing fuzz targets typically requires several hours of manual work and sufficient background knowledge of the project. + +Additionally, the main challenge for most integrated OSS-Fuzz projects is ensuring high code coverage. Most OSS-Fuzz projects have fairly low runtime coverage ([~30%](http://introspector.oss-fuzz.com/)) despite millions of hours of CPU time. This means we are not finding vulnerabilities in approximately 70% of each project that we’re fuzzing. Our [preliminary research](https://dl.acm.org/doi/abs/10.1145/3605157.3605177) found that many[ fuzz blockers](https://github.com/ossf/fuzz-introspector/blob/main/doc/Glossary.md#fuzz-blockers) (as determined by [FuzzIntrospector](https://introspector.oss-fuzz.com/)) are because of deficiencies in existing targets, rather than deficiencies in fuzzing engines. + +Generating fuzz targets via LLMs can reduce the manual effort required to more thoroughly fuzz existing projects in OSS-Fuzz as well as integrating new projects into OSS-Fuzz. + + +# Goals + +Our ideal end state of this research is to enable two use cases for OSS-Fuzz projects leveraging LLMs: + + +1. Completely automatic fuzz target generation (or modification of existing targets) for existing OSS-Fuzz projects to unblock fuzz blockers and increase project code coverage (and bugs found) for free. +2. Completely automatic fuzz target generation for completely new OSS-Fuzz projects. This is much more challenging than 1, and is an extension of it. + +Our current experiments focus on the first use case for C/C++ projects. This report serves as a preliminary investigation into how effective LLMs are for this use case. More detailed results and the experimentation framework for our research will be published at a later date. + + +# Experiment framework + +To discover whether an LLM could successfully write new fuzz targets, we built an evaluation framework that connects OSS-Fuzz to Google’s LLMs, conducts the experiment, and evaluates the results. The steps look like this: + + + +![experiment framework]({{ site.baseurl }}/images/llm_framework.png "image_tooltip") + + + + +1. OSS-Fuzz’s [Fuzz Introspector tool](http://introspector.oss-fuzz.com/) identifies an under-fuzzed, high-potential, portion of the sample project’s code and passes the code to the evaluation framework. +2. The evaluation framework creates a prompt that the LLM will use to write the new fuzz target. The prompt includes project specific information. +3. The evaluation framework takes the fuzz target generated by the LLM and runs the new target. +4. The evaluation framework observes the run for any change in code coverage. +5. In the event that the fuzz target fails to compile, the evaluation framework prompts the LLM to write a revised fuzz target that addresses the compilation errors. + + +## 1. Identifying high potential portions of the project’s code + +We leverage [Fuzz Introspector](https://introspector.oss-fuzz.com/) ([example JSON endpoint](https://introspector.oss-fuzz.com/api/far-reach-but-low-coverage?project=tinyxml2)) to provide us with a list of functions with low runtime coverage (but high potential to reach more code coverage). These are turned into benchmark YAML files, which consist of an OSS-Fuzz project, and a list of function signatures to generate new targets for. + +We will start with a small set of benchmarks, gradually scaling this to larger, automated sets of benchmarks taken from all of OSS-Fuzz as we improve the function selection and prompt generation process. + +Example benchmark (YAML): + +```yaml +functions: +- XML_Parser XMLCALL XML_ExternalEntityParserCreate(XML_Parser oldParser, const XML_Char + *context, const XML_Char *encodingName) +- XML_Parser XMLCALL XML_ParserCreateNS(const XML_Char *encodingName, XML_Char nsSep) +- XML_Bool XMLCALL XML_ParserReset(XML_Parser parser, const XML_Char *encodingName) +- static enum XML_Error PTRCALL externalParEntInitProcessor(XML_Parser parser, const + char *s, const char *end, const char **nextPtr) +project: expat +target_path: /src/expat/expat/fuzz/xml_parse_fuzzer.c +target_name: xml_parse_fuzzer_UTF-8 +``` + +## 2. Prompt generation + +We dynamically generate a prompt based on a template ([example](https://storage.googleapis.com/oss-fuzz-llm-targets-public/jsoncpp-json-value-removeindex/prompts.txt)). + +As part of our experimentation, we tried various different prompt approaches. So far, the best results have come from including: + + + +* One example of an existing function signature and fuzz target from the project under test, formatted into problem and solution structure. +* Two examples from other projects in OSS-Fuzz, formatted in the same way. +* Examples of how to leverage [FuzzedDataProvider](https://github.com/google/fuzzing/blob/master/docs/split-inputs.md#fuzzed-data-provider) to generate inputs for function arguments. +* A priming that gives the task context. +* Examples of code antipatterns to avoid. + +The dynamically generated sections today include examples of existing fuzz targets from both other projects on OSS-Fuzz as well as one example from the project under test. We have other unexplored ideas including more structured information about the function under test, such as: + + + +* Relevant data structure definitions +* Function implementations of the function under test and related functions +* Usages of the function under test and related functions + + +## 3. Build and run + +We leverage the OSS-Fuzz build infrastructure to build new targets by replacing an existing target’s source code with the newly generated target source code. + +OSS-Fuzz projects often have strict compiler flags on by default. To make compilation easier, we also implemented a [compiler wrapper](https://github.com/google/oss-fuzz/blob/d78275b4e2e17d3d8f12b99db2b51de4b114edb3/infra/base-images/base-builder/jcc.go) that: + + + +* Turns off compiler warnings to prevent trivial issues such as missing pointer casts from blocking compilation. +* Re-compile targets as C++ (to leverage [FuzzedDataProvider](https://github.com/google/fuzzing/blob/master/docs/split-inputs.md#fuzzed-data-provider)). + + +## 4. Measuring quality of generated targets + +An important part of our research is to define metrics to measure the quality of generated targets. + +These metrics are: + +* Syntax correctness and project consistency. This is measured by its compilation result. For example: whether it compiles successfully, does it call functions in the project correctly without hallucination. +* Whether it crashes instantly or within the fuzz target. This often means that there is some miscalled API and the crashes are likely to be false positives. +* **New code coverage**. This is measured by the new lines it covered compared to all existing targets in OSS-Fuzz for the same project. + +All of these metrics can be automatically computed for a given generated target. + + +## 5. LLM Code Fixer + +The fuzz targets generated by LLM often contain various trivial defects, which can be fixed by a separate LLM query. + +The prompt of the code fixing query is structured as follows, where the raw code and error are respectively replaced with the fuzz target source code generated by the LLM and the build error messages extracted from pages of build logs: + +```` +Given the following code and its build error message, fix the code without affecting its functionality. +First explain the reason, then output the whole fixed code. +If a function is missing, fix it by including the related libraries. +Code: +``` +{raw_code} +``` +Build error message: +``` +{error} +``` +Fixed code: +```` + +Several rounds of code fixing queries are required for some cases. For example, when multiple defects incurs several error messages, sometimes LLM tends to only fix one of them at a time. Similarly, new defects may be introduced during code fixing. In these cases, we found iteratively querying LLM with the same prompt structure will gradually fix all errors. + +LLM often proposes several responses for each query, we prefer the one with the longest code. This is an implementation decision to avoid a quadratically increasing number of targets to build (e.g. the LLMs could propose 4 new targets across N iterations) and to avoid the LLM deleting the function code to fix build failures. + +Additionally, we also check that the generated target includes a call to the requested function to test. If it does not, this is surfaced as an error to the LLM. + + +### Example + +[Prompt](https://storage.googleapis.com/oss-fuzz-llm-targets-public/tinyxml2-tinyxml2-xmldocument-print/fixes/04-F4/prompt.txt): Incorrect target with missing arguments passed to target function. + +[After fix](https://storage.googleapis.com/oss-fuzz-llm-targets-public/tinyxml2-tinyxml2-xmldocument-print/fixes/04-F4/01.cpp): Correct function argument added. + +# Results + +Initially, getting any compilable output was a challenge. We were able to improve this via prompt engineering and our compiler wrapper to having [14/31 tested OSS-Fuzz projects](https://storage.googleapis.com/oss-fuzz-llm-targets-public/index.html?prefix=benchmarks/) successfully compile new targets and increase coverage. The successful examples and prompts are published [here](https://storage.googleapis.com/oss-fuzz-llm-targets-public/index.html). + +We see a wide range of coverage improvements from 0-31% code coverage increases. + +The top coverage increases, aggregated across all benchmarks per OSS-Fuzz project are: + + + + + + + + + + + + + + + + + + + + + + + + + + + +
tinyxml2 + 31% +
cjson + 6% +
expat + 4% +
libplist + 4% +
libxml2 + 1% +
elfutils + 1% +
+ + +The best result we’ve had is with the TinyXML2 project, where we managed to increase fuzz coverage from [38%](https://storage.googleapis.com/oss-fuzz-coverage/tinyxml2/reports/20230801/linux/report.html) line coverage to [69%](https://storage.googleapis.com/oss-fuzz-coverage/llm-results/tinyxml2/linux/report.html) line coverage without any interventions. + + + + +![alt_text]({{ site.baseurl }}/images/tinyxml2_examples.png "image_tooltip") + + +Additionally, we targeted OpenSSL from the perspective of discovering past vulnerabilities that were not found due to lack of fuzzing coverage. We were able to replicate [a similar fuzz target](https://storage.googleapis.com/oss-fuzz-llm-targets-public/openssl-ossl_punycode_decode/targets/15.c) that rediscovered [CVE-2022-3602](https://nvd.nist.gov/vuln/detail/CVE-2022-3602). + + +# Future work + +We’ve seen very promising early results in this space and will continue our research. + + +## 1. Continue research + +There are a number of areas we’d like to further research on: + + + +* Expand benchmarks to all of OSS-Fuzz. We’d like to expand the set of benchmarks to cover all of OSS-Fuzz. +* Continued prompt engineering and experimentation with project-specific context, such as more structured context (e.g. structure definitions, implementations) around the relevant function to test. +* Model fine-tuning +* Expand to other languages beyond C/C++ +* Expand research to completely new projects with no existing OSS-Fuzz integration. + + +## 2. Open source evaluation framework + +We plan to open source the evaluation framework we’ve built to help test arbitrary auto-fuzz target generation capabilities. We hope that OSS-Fuzz can serve as a valuable benchmarking platform for researchers in this space. + + +## 3. OSS-Fuzz integration + +Ultimately, the goal is to integrate the results of this research into OSS-Fuzz, to provide: + + + +* Free coverage increases for existing projects +* Automated onboarding of new projects, and tools to help maintainers write manual fuzz targets. + + +# Appendix + + +## Successful benchmark results + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Project + Function + Prompt + Fix iterations + Final fuzz targets + Build rate + Max Coverage + Max Line coverage diff + Reports +
tinyxml2 + tinyxml2-xmldocument-print + Prompt + Fixes + Targets +

+50

+ +

+29.74

+ +

+11.16

+ +
Reports +
tinyxml2 + tinyxml2-xmldocument-deepcopy + Prompt + Fixes + Targets +

+25

+ +

+26.8

+ +

+4.45

+ +
Reports +
tinyxml2 + tinyxml2-xmlelement-setattribute + Prompt + Fixes + Targets +

+75

+ +

+26.08

+ +

+3.77

+ +
Reports +
libplist + plist_print + Prompt + Fixes + Targets +

+25

+ +

+12.88

+ +

+3.42

+ +
Reports +
tinyxml2 + tinyxml2-xmlelement-doubletext + Prompt + Fixes + Targets +

+62.5

+ +

+25.61

+ +

+3.28

+ +
Reports +
tinyxml2 + tinyxml2-xmlelement-booltext + Prompt + Fixes + Targets +

+87.5

+ +

+26.18

+ +

+2.9

+ +
Reports +
tinyxml2 + tinyxml2-xmlelement-insertnewunknown + Prompt + Fixes + Targets +

+25

+ +

+25.67

+ +

+2.64

+ +
Reports +
tinyxml2 + tinyxml2-xmlelement-int64text + Prompt + Fixes + Targets +

+87.5

+ +

+25.91

+ +

+2.64

+ +
Reports +
cjson + cjson_compare + Prompt + Fixes + Targets +

+75

+ +

+29.68

+ +

+2.47

+ +
Reports +
tinyxml2 + tinyxml2-xmlelement-floattext + Prompt + Fixes + Targets +

+62.5

+ +

+25.09

+ +

+2.45

+ +
Reports +
tinyxml2 + tinyxml2-xmlelement-inttext + Prompt + Fixes + Targets +

+75

+ +

+26.2

+ +

+2.41

+ +
Reports +
tinyxml2 + tinyxml2-xmlelement-unsigned64text + Prompt + Fixes + Targets +

+37.5

+ +

+25.74

+ +

+2.22

+ +
Reports +
tinyobjloader + tinyobj-objreader-parsefromfile + Prompt + Fixes + Targets +

+37.5

+ +

+5.7

+ +

+2.16

+ +
Reports +
tinyxml2 + tinyxml2-xmlelement-unsignedtext + Prompt + Fixes + Targets +

+50

+ +

+25.53

+ +

+2.15

+ +
Reports +
tinyxml2 + tinyxml2-xmlelement-shallowclone + Prompt + Fixes + Targets +

+50

+ +

+25.03

+ +

+2.07

+ +
Reports +
cjson + cjson_replaceiteminobject + Prompt + Fixes + Targets +

+37.5

+ +

+27.56

+ +

+1.98

+ +
Reports +
tinyxml2 + tinyxml2-xmlelement-gettext + Prompt + Fixes + Targets +

+37.5

+ +

+25.42

+ +

+1.96

+ +
Reports +
tinyxml2 + tinyxml2-xmlelement-shallowequal + Prompt + Fixes + Targets +

+62.5

+ +

+25.58

+ +

+1.96

+ +
Reports +
cjson + cjson_duplicate + Prompt + Fixes + Targets +

+62.5

+ +

+27.4

+ +

+1.89

+ +
Reports +
expat + xml_externalentityparsercreate + Prompt + Fixes + Targets +

+12.5

+ +

+1.25

+ +

+1.88

+ +
Reports +
cjson + cjson_replaceiteminobjectcasesensitive + Prompt + Fixes + Targets +

+87.5

+ +

+25.54

+ +

+1.85

+ +
Reports +
tinyxml2 + tinyxml2-xmlelement-insertnewcomment + Prompt + Fixes + Targets +

+62.5

+ +

+25.61

+ +

+1.85

+ +
Reports +
expat + xml_parsercreatens + Prompt + Fixes + Targets +

+12.5

+ +

+45.6

+ +

+1.84

+ +
Reports +
tinyxml2 + tinyxml2-xmlelement-deleteattribute + Prompt + Fixes + Targets +

+50

+ +

+25.73

+ +

+1.7

+ +
Reports +
tinyxml2 + tinyxml2-xmlelement-insertnewdeclaration + Prompt + Fixes + Targets +

+50

+ +

+25

+ +

+1.7

+ +
Reports +
tinyxml2 + tinyxml2-xmlelement-insertnewchildelement + Prompt + Fixes + Targets +

+50

+ +

+24.88

+ +

+1.51

+ +
Reports +
tinyxml2 + tinyxml2-xmlnode-previoussiblingelement + Prompt + Fixes + Targets +

+87.5

+ +

+26.08

+ +

+1.43

+ +
Reports +
tinyobjloader + tinyobj-loadobj + Prompt + Fixes + Targets +

+25

+ +

+4.33

+ +

+1.35

+ +
Reports +
tinyobjloader + tinyobj-material_t-material_t + Prompt + Fixes + Targets +

+25

+ +

+23.91

+ +

+1.35

+ +
Reports +
tinyxml2 + tinyxml2-xmlelement-insertnewtext + Prompt + Fixes + Targets +

+62.5

+ +

+24.42

+ +

+1.32

+ +
Reports +
elfutils + dwfl_module_relocate_address + Prompt + Fixes + Targets +

+87.5

+ +

+7.81

+ +

+1.1

+ +
Reports +
libxml2 + xmlschemavalidatefile + Prompt + Fixes + Targets +

+50

+ +

+4.18

+ +

+0.93

+ +
Reports +
tinyxml2 + tinyxml2-xmldocument-loadfile + Prompt + Fixes + Targets +

+25

+ +

+2.17

+ +

+0.9

+ +
Reports +
speex + ogg_stream_packetin + Prompt + Fixes + Targets +

+25

+ +

+8.06

+ +

+0.55

+ +
Reports +
libxml2 + xmltextreadersetschema + Prompt + Fixes + Targets +

+37.5

+ +

+1.44

+ +

+0.48

+ +
Reports +
libxml2 + xmltextreaderschemavalidate + Prompt + Fixes + Targets +

+37.5

+ +

+4.03

+ +

+0.36

+ +
Reports +
libplist + plist_dict_merge + Prompt + Fixes + Targets +

+12.5

+ +

+0.8

+ +

+0.35

+ +
Reports +
libxml2 + xmltextreaderschemavalidatectxt + Prompt + Fixes + Targets +

+25

+ +

+1.19

+ +

+0.33

+ +
Reports +
cjson + cjson_printpreallocated + Prompt + Fixes + Targets +

+75

+ +

+26.44

+ +

+0.22

+ +
Reports +
libucl + ucl_parser_insert_chunk + Prompt + Fixes + Targets +

+25

+ +

+10.5

+ +

+0.22

+ +
Reports +
libucl + ucl_object_compare + Prompt + Fixes + Targets +

+12.5

+ +

+1.35

+ +

+0.21

+ +
Reports +
elfutils + dwarf_getlocations + Prompt + Fixes + Targets +

+62.5

+ +

+7.22

+ +

+0.2

+ +
Reports +
libucl + ucl_parser_add_fd_priority + Prompt + Fixes + Targets +

+71.43

+ +

+19.61

+ +

+0.18

+ +
Reports +
jsoncpp + json-value-resize + Prompt + Fixes + Targets +

+37.5

+ +

+2.23

+ +

+0.17

+ +
Reports +
mosquitto + mosquitto_topic_matches_sub2 + Prompt + Fixes + Targets +

+87.5

+ +

+28.28

+ +

+0.16

+ +
Reports +
xvid + xvid_encore + Prompt + Fixes + Targets +

+31.25

+ +

+0.22

+ +

+0.16

+ +
Reports +
cjson + cjson_parse + Prompt + Fixes + Targets +

+75

+ +

+25.72

+ +

+0.13

+ +
Reports +
cjson + cjson_parsewithlength + Prompt + Fixes + Targets +

+62.5

+ +

+25.72

+ +

+0.13

+ +
Reports +
jsoncpp + json-value-removeindex + Prompt + Fixes + Targets +

+25

+ +

+5.11

+ +

+0.13

+ +
Reports +
libucl + ucl_object_merge + Prompt + Fixes + Targets +

+25

+ +

+0.65

+ +

+0.12

+ +
Reports +
speex + ogg_stream_pageout_fill + Prompt + Fixes + Targets +

+6.25

+ +

+0

+ +

+0.07

+ +
Reports +
libsndfile + sf_command + Prompt + Fixes + Targets +

+25

+ +

+3.41

+ +

+0.06

+ +
Reports +
mosquitto + mosquitto_topic_matches_sub + Prompt + Fixes + Targets +

+62.5

+ +

+4.03

+ +

+0.05

+ +
Reports +
libsndfile + sf_format_check + Prompt + Fixes + Targets +

+12.5

+ +

+0.1

+ +

+0.04

+ +
Reports +
libucl + ucl_object_replace_key + Prompt + Fixes + Targets +

+50

+ +

+7.14

+ +

+0.04

+ +
Reports +
libdwarf + dwarf_find_die_given_sig8 + Prompt + Fixes + Targets +

+37.5

+ +

+11.62

+ +

+0.01

+ +
Reports +
+ + diff --git a/research/LLM/target_generation.md b/research/LLM/target_generation.md deleted file mode 100644 index ef9fb20a7..000000000 --- a/research/LLM/target_generation.md +++ /dev/null @@ -1 +0,0 @@ -This is a placeholder.