Bedtools

## BED format

Format definition

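The BED format is a tab-delimited text format that defines intervals within a genome. It has three required columns, and can support additional data columns.
The three mandatory columns are chromosome (or contig/scaffold/assembly as appropriate), start position, and end position. The start is 0-based and the end is exclusive, so the length of an interval is end minus start: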

chr1  213941196  213942363
chr1  213942363  213943530
chr1  213943530  213944697
chr2  158364697  158365864
chr2  158365864  158367031
chr3  127477031  127478198
chr3  127478198  127479365
chr3  127479365  127480532
chr3  127480532  127481699


Additional columns may include information such as interval name, strand, and settings affecting display in genome browsers.
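
As a rough illustration of the layout described above, the following Python sketch parses a single BED line into its required columns plus the optional name and strand columns. The `BedInterval` type and `parse_bed_line` function are hypothetical names made up for this example, not part of bedtools.

```python
from typing import NamedTuple, Optional


class BedInterval(NamedTuple):
    chrom: str
    start: int                    # 0-based, inclusive
    end: int                      # exclusive
    name: Optional[str] = None    # column 4, if present
    strand: Optional[str] = None  # column 6, if present


def parse_bed_line(line: str) -> BedInterval:
    """Parse one BED record (hypothetical helper, whitespace-separated)."""
    # Splitting on any whitespace is a simplification: it would break
    # on feature names that contain spaces.
    fields = line.split()
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    name = fields[3] if len(fields) > 3 else None
    strand = fields[5] if len(fields) > 5 else None
    return BedInterval(chrom, start, end, name, strand)


def interval_length(interval: BedInterval) -> int:
    """Length in bases; valid because the end coordinate is exclusive."""
    return interval.end - interval.start


example = parse_bed_line("chr1  213941196  213942363")
print(example.chrom, interval_length(example))  # prints: chr1 1167
```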

## Bedtools

Bedtools manual

Bedtools is a suite of command-line tools designed for working with BED files. It can retrieve FASTA sequence from a reference based on intervals in a BED file, or it can perform various operations (such as merging, finding overlaps, etc.) on multiple BED files.

The -hist option (as accepted by bedtools coverage) creates a histogram of coverage (data only, must be graphed with another tool), which is useful as a summary. The summary doesn't give position or interval specific information, just the percentage of the whole BED file covered at each read depth (see example).
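
To make the idea of a coverage histogram concrete, here is a minimal Python sketch that computes the fraction of a region covered at each depth. It is only an illustration of the concept, not how bedtools works internally; `depth_histogram` and the toy read coordinates are assumptions made up for this example, and a single chromosome is assumed.

```python
from collections import Counter


def depth_histogram(region_start, region_end, reads):
    """Fraction of bases in [region_start, region_end) at each depth.

    `reads` is a list of (start, end) pairs, 0-based with exclusive end.
    """
    depth = [0] * (region_end - region_start)
    for read_start, read_end in reads:
        # Clip each read to the region before counting.
        lo = max(read_start, region_start)
        hi = min(read_end, region_end)
        for pos in range(lo, hi):
            depth[pos - region_start] += 1
    counts = Counter(depth)
    total = len(depth)
    return {d: counts[d] / total for d in sorted(counts)}


# Toy data: a 10 bp region and two overlapping reads.
hist = depth_histogram(0, 10, [(0, 6), (4, 8)])
for d, fraction in hist.items():
    print(d, fraction)
```

As in the description above, the result says how much of the region sits at each depth, but not where.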

In BAM files, does each read count as a feature? If so, a BAM file could be passed to bedtools genomecov (which accepts BAM input via -ibam) to get depth of coverage.

intersect - returns the overlapping regions between two BED files (i.e. the regions that are covered in both files).

$ bedtools intersect -a <file1> -b <file2>

Works on BAM files, returning a BAM of the intersection region? BED output can be forced using -bed (e.g. to check coverage in another tool). I think this means both BAM vs. BED and BAM vs. BAM intersections are possible. VCF and GFF are also supported (see docs for full support). Possibly useful if you have a VCF of variants and quickly want to see if they're covered? The -b option supports multiple files.

complement - returns the regions of a reference not covered by a BED file.

$ bedtools complement -i <file.bed> -g <ref.txt>


The reference should be a genome file: a tab-delimited list of the expected chromosome names and their lengths, NOT a BED file with start and end coordinates.
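
The intersect and complement operations described above can be sketched in plain Python. This is an illustration of what the operations compute, under simplifying assumptions, and not bedtools' own implementation; the function names and toy coordinates are made up for the example.

```python
def parse_genome(text):
    """Parse genome-file text (name, length per line) into a dict."""
    genome = {}
    for line in text.splitlines():
        if line.strip():
            name, length = line.split()[:2]
            genome[name] = int(length)
    return genome


def intersect(a, b):
    """Overlapping portions of two lists of (chrom, start, end) tuples."""
    out = []
    for chrom_a, start_a, end_a in a:
        for chrom_b, start_b, end_b in b:
            if chrom_a != chrom_b:
                continue
            start, end = max(start_a, start_b), min(end_a, end_b)
            if start < end:  # keep only non-empty overlaps
                out.append((chrom_a, start, end))
    return out


def complement(intervals, genome):
    """Regions of each chromosome in `genome` not covered by `intervals`."""
    out = []
    for chrom, length in genome.items():
        cursor = 0  # first position not yet known to be covered
        for _, start, end in sorted(iv for iv in intervals if iv[0] == chrom):
            if start > cursor:
                out.append((chrom, cursor, start))
            cursor = max(cursor, end)
        if cursor < length:
            out.append((chrom, cursor, length))
    return out


genome = parse_genome("chr1\t100\nchr2\t50\n")
a = [("chr1", 10, 30), ("chr1", 50, 70)]
b = [("chr1", 20, 60)]
print(intersect(a, b))
print(complement(a, genome))
```

Note that the complement needs the chromosome lengths to know where the last uncovered region ends, which is why a list of lengths is required rather than another BED file.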

jaccard - a simple statistical measure of the similarity between two files:

\begin{align} jaccard = {length(A \cap B) \over length(A \cup B)} = {length(intersection) \over length(A) + length(B) - length(intersection)} \end{align}

So a jaccard score of 1.0 represents a complete overlap, while 0.0 represents no overlap.
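
As a worked example of the statistic, the Python sketch below computes it for two small interval sets. It assumes all intervals lie on a single chromosome and are given as (start, end) pairs with an exclusive end; the helper names are hypothetical and this is not bedtools' implementation.

```python
def merge(intervals):
    """Merge overlapping or touching (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def total_length(intervals):
    return sum(end - start for start, end in intervals)


def jaccard(a, b):
    """Intersection length divided by union length (single chromosome)."""
    a, b = merge(a), merge(b)
    inter = 0
    for start_a, end_a in a:
        for start_b, end_b in b:
            inter += max(0, min(end_a, end_b) - max(start_a, start_b))
    # Union = bases in A + bases in B, counting shared bases only once.
    union = total_length(a) + total_length(b) - inter
    return inter / union if union else 0.0


print(jaccard([(0, 10)], [(0, 10)]))   # identical sets
print(jaccard([(0, 10)], [(20, 30)]))  # disjoint sets
print(jaccard([(0, 10)], [(5, 15)]))   # 5 shared bases out of 15
```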

## List of bedtools tools

Full details of each tool are available from the online manual, or by running $ bedtools <tool name>

annotate
bamtobed
bamtofastq
bed12tobed6
bedpetobam
bedtobam
closest
cluster
complement
coverage
expand
flank
fisher
genomecov
getfasta
groupby
igv
intersect
jaccard
makewindows
map
merge
multicov
multiinter
nuc
overlap
pairtobed
pairtopair
random
reldist
shuffle
slop
sort
subtract
tag
unionbedg
window

page revision: 15, last edited: 27 May 2015 12:55