---
layout: default
title: Corpora
parent: Advanced topics
nav_order: 3
permalink: /advanced-topics/corpora/
---

# Accessing Corpora
{: .no_toc}

If you want to access the corpora that we are using for your fuzz targets
(synthesized by the fuzzing engines), follow these steps.

- TOC
{:toc}
---

## Obtain access

To get access to a project's corpora, you must be listed as the
primary contact or as an auto CC in the project's `project.yaml` file, as described
in the [New Project Guide]({{ site.baseurl }}/getting-started/new-project-guide/#projectyaml).
If you aren't listed there, most of the links below won't work.

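As a quick local check, the sketch below searches a checkout's `project.yaml` for your email address. It assumes the contacts are stored under the `primary_contact` and `auto_ccs` keys described in the New Project Guide. The function name `is_listed_contact` and the file path in the usage line are hypothetical. Because this is a plain text search rather than a YAML parser, it matches the address anywhere in the file, so treat a hit as a hint rather than proof of access.

```bash
# Return 0 if the email address ($1) appears as a whole word in the
# given project.yaml ($2), non-zero otherwise.
# Assumption: a plain text search is good enough; no YAML parsing is done.
is_listed_contact() {
  grep -qwF -- "$1" "$2"
}
```

For example, `is_listed_contact you@example.com projects/<project>/project.yaml && echo "listed"` prints `listed` only when the address is present.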
## Install Google Cloud SDK

The corpora for fuzz targets are stored on [Google Cloud
Storage](https://cloud.google.com/storage/). To access them, you need to
[install the gsutil
tool](https://cloud.google.com/storage/docs/gsutil_install), which is part of
the Google Cloud SDK. Follow the instructions on the installation page to
log in with the Google account listed in your project's `project.yaml` file.

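Before continuing, it can help to confirm that the tools are on your `PATH`. This is a minimal sketch. The helper name `have_tool` is hypothetical, and it only checks that a command exists, not that you are logged in with the right account.

```bash
# Succeed if the named command can be found on PATH.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

# Report whether the Cloud SDK commands used on this page are available.
for tool in gsutil gcloud; do
  if have_tool "$tool"; then
    echo "$tool: found"
  else
    echo "$tool: not found; install the Google Cloud SDK first"
  fi
done
```

If you need to sign in again later, `gcloud auth login` starts the browser-based login flow.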
## Viewing the corpus for a fuzz target

The fuzzer statistics page for your project on
[ClusterFuzz]({{ site.baseurl }}/further-reading/clusterfuzz)
contains a link to the Google Cloud console for your corpus under the
**corpus_size** column. Click the link to browse and download individual test inputs in the
corpus.

![viewing_corpus](https://raw.githubusercontent.com/google/oss-fuzz/master/docs/images/viewing_corpus.png)

## Downloading the corpus

If you want to download the entire corpus, click the link in the **corpus_size** column, then
copy the **Buckets** path at the top of the page:

![corpus_path](https://raw.githubusercontent.com/google/oss-fuzz/master/docs/images/corpus_path.png)

Copy the corpus to a directory on your
machine by running the following command:

```bash
$ gsutil -m cp -r gs://<bucket_path> <local_directory>
```

Using the expat example above, this would be:

```bash
$ gsutil -m cp -r \
    gs://expat-corpus.clusterfuzz-external.appspot.com/libFuzzer/expat_parse_fuzzer \
    <local_directory>
```

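The expat path suggests a naming pattern for corpus locations. The helper below is a sketch that assumes every project follows the same pattern (`<project>-corpus.clusterfuzz-external.appspot.com/libFuzzer/<directory>`), which this page does not state explicitly. If the path shown in the Cloud console differs, use that one instead. The function name `corpus_url` is hypothetical.

```bash
# Build a corpus URL from a project name ($1) and a corpus directory name ($2).
# Assumption: the bucket layout matches the expat example on this page.
corpus_url() {
  printf 'gs://%s-corpus.clusterfuzz-external.appspot.com/libFuzzer/%s\n' "$1" "$2"
}

# Print the download command for the expat example instead of running it.
echo "gsutil -m cp -r $(corpus_url expat expat_parse_fuzzer) <local_directory>"
```

To preview the contents before downloading, `gsutil ls "$(corpus_url expat expat_parse_fuzzer)"` lists the objects at that path.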
## Corpus backups

We keep daily zipped backups of your corpora. You can access them from the
**corpus_backup** column of the fuzzer statistics page. Downloading a backup can
be significantly faster than running `gsutil -m cp -r` on the corpus bucket.
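
Once a backup archive is downloaded, you can unpack it into a local corpus directory. This is a minimal sketch, assuming the archive is a standard zip file and that `unzip` is installed. The function name `extract_backup` and both paths in the usage line are hypothetical.

```bash
# Unpack a downloaded corpus backup ($1) into a directory ($2),
# creating the directory if needed.
extract_backup() {
  if [ ! -f "$1" ]; then
    echo "backup archive not found: $1" >&2
    return 1
  fi
  # -q: quiet, -o: overwrite existing files without prompting, -d: target directory
  mkdir -p "$2" && unzip -q -o "$1" -d "$2"
}
```

For example, `extract_backup ~/Downloads/corpus_backup.zip ./corpus` would place the test inputs under `./corpus`.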