Package 'valr' reference manual

Title:	Genome Interval Arithmetic
Description:	Read and manipulate genome intervals and signals. Provides functionality similar to command-line tool suites within R, enabling interactive analysis and visualization of genome-scale data. Riemondy et al. (2017) <doi:10.12688/f1000research.11997.1>.
Authors:	Jay Hesselberth [aut], Kent Riemondy [aut, cre], RNA Bioscience Initiative [fnd, cph] (https://ror.org/03wmf1y16)
Maintainer:	Kent Riemondy
License:	MIT + file LICENSE
Version:	0.8.3.9000
Built:	2025-03-01 14:29:08 UTC
Source:	https://github.com/rnabioco/valr

Compute absolute distances between intervals.

Description

Computes the absolute distance between the midpoint of each x interval and the midpoint of the closest y interval.

Usage

bed_absdist(x, y, genome)

Arguments

`x`	ivl_df
`y`	ivl_df
`genome`	genome_df

Details

Absolute distances are scaled by the inter-reference gap for the chromosome as follows. For Q query points and R reference points on a chromosome, scale the distance for each query point i to the closest reference point by the inter-reference gap for each chromosome. If an x interval has no matching y chromosome, .absdist is NA.

$d_i(x,y) = \min_k(|q_i - r_k|) \cdot \frac{R}{\text{length of chromosome}}$

Both absolute and scaled distances are reported as .absdist and .absdist_scaled.
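
As a minimal sketch of the scaling described above, the following base R snippet applies the formula to made-up midpoints on a single chromosome. The vectors q and r, the chromosome length, and all variable names are illustrative assumptions; the snippet does not call valr and is not its implementation.

```r
# Hypothetical midpoints on one chromosome (not taken from valr output)
q <- c(150, 900)          # query (x) midpoints
r <- c(100, 400, 1000)    # reference (y) midpoints
chrom_size <- 2000        # assumed chromosome length

# distance from each query midpoint to its closest reference midpoint
absdist <- vapply(q, function(qi) min(abs(qi - r)), numeric(1))

# scale by the inter-reference gap: multiply by R / chromosome length
absdist_scaled <- absdist * length(r) / chrom_size

absdist
absdist_scaled
```

Here R is the number of reference midpoints on the chromosome, so the scaled value expresses each distance in units of the average spacing between reference points.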

Interval statistics can be used in combination with dplyr::group_by() and dplyr::do() to calculate statistics for subsets of data. See vignette('interval-stats') for examples.

Value

ivl_df with .absdist and .absdist_scaled columns.

Examples

genome <- read_genome(valr_example("hg19.chrom.sizes.gz"))

x <- bed_random(genome, seed = 1010486)
y <- bed_random(genome, seed = 9203911)

bed_absdist(x, y, genome)

Identify closest intervals.

Description

Identify closest intervals.

Usage

bed_closest(x, y, overlap = TRUE, suffix = c(".x", ".y"))

Arguments

`x`	ivl_df
`y`	ivl_df
`overlap`	report overlapping intervals
`suffix`	colname suffixes in output

Details

Input tbls are grouped by chrom by default, and additional groups can be added using dplyr::group_by(). For example, grouping by strand will constrain analyses to the same strand. To compare opposing strands across two tbls, strands on the y tbl can first be inverted using flip_strands().

Value

ivl_df with additional columns:

.overlap amount of overlap with overlapping interval. Non-overlapping or adjacent intervals have an overlap of 0. .overlap will not be included in the output if overlap = FALSE.
.dist distance to closest interval. Negative distances denote upstream intervals. Book-ended intervals have a distance of 1.

Note

For each interval in x, bed_closest() returns overlapping intervals from y and the closest non-intersecting y interval. Setting overlap = FALSE will report the closest non-intersecting y intervals, ignoring any overlapping y intervals.

Examples

x <- tibble::tribble(
  ~chrom, ~start, ~end,
  "chr1", 100,    125
)

y <- tibble::tribble(
  ~chrom, ~start, ~end,
  "chr1", 25,     50,
  "chr1", 140,    175
)

bed_glyph(bed_closest(x, y))

x <- tibble::tribble(
  ~chrom, ~start, ~end,
  "chr1", 500,    600,
  "chr2", 5000,   6000
)

y <- tibble::tribble(
  ~chrom, ~start, ~end,
  "chr1", 100,    200,
  "chr1", 150,    200,
  "chr1", 550,    580,
  "chr2", 7000,   8500
)

bed_closest(x, y)

bed_closest(x, y, overlap = FALSE)

# Report distance based on strand
x <- tibble::tribble(
  ~chrom, ~start, ~end, ~name, ~score, ~strand,
  "chr1", 10, 20, "a", 1, "-"
)

y <- tibble::tribble(
  ~chrom, ~start, ~end, ~name, ~score, ~strand,
  "chr1", 8, 9, "b", 1, "+",
  "chr1", 21, 22, "b", 1, "-"
)

res <- bed_closest(x, y)

# convert distance based on strand
res$.dist_strand <- ifelse(res$strand.x == "+", res$.dist, -(res$.dist))
res

# report absolute distances
res$.abs_dist <- abs(res$.dist)
res

Cluster neighboring intervals.

Description

The output .id column can be used in downstream grouping operations. Default max_dist = 0 means that both overlapping and book-ended intervals will be clustered.

Usage

bed_cluster(x, max_dist = 0)

Arguments

`x`	ivl_df
`max_dist`	maximum distance between clustered intervals.

Value

ivl_df with .id column specifying sets of clustered intervals.
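
The clustering rule can be sketched in base R for a single chromosome. This is an illustration under stated assumptions, not valr's implementation: the data frame ivl is hypothetical, it is assumed to be sorted by start, and a new cluster is assumed to begin whenever the gap to all preceding intervals exceeds max_dist.

```r
# Hypothetical single-chromosome input, already sorted by start
ivl <- data.frame(
  start = c(1, 5, 30, 40, 80),
  end   = c(10, 20, 40, 50, 90)
)
max_dist <- 0

# furthest end seen among all preceding intervals
prev_end <- c(-Inf, head(cummax(ivl$end), -1))

# a new cluster begins when the gap to the preceding intervals exceeds max_dist
ivl$.id <- cumsum(ivl$start - prev_end > max_dist)
ivl
```

With max_dist = 0, the book-ended pair (30-40 and 40-50) has a gap of 0 and therefore shares a cluster id in this sketch.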

Examples

x <- tibble::tribble(
  ~chrom, ~start, ~end,
  "chr1", 100,    200,
  "chr1", 180,    250,
  "chr1", 250,    500,
  "chr1", 501,    1000,
  "chr2", 1,      100,
  "chr2", 150,    200
)

bed_cluster(x)

# glyph illustrating clustering of overlapping and book-ended intervals
x <- tibble::tribble(
  ~chrom, ~start, ~end,
  "chr1", 1,      10,
  "chr1", 5,      20,
  "chr1", 30,     40,
  "chr1", 40,     50,
  "chr1", 80,     90
)

bed_glyph(bed_cluster(x), label = ".id")

Identify intervals in a genome not covered by a query.

Description

Identify intervals in a genome not covered by a query.

Usage

bed_complement(x, genome)

Arguments

`x`	ivl_df
`genome`	genome_df

Value

ivl_df
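
A rough base R sketch of the idea for one chromosome follows. It assumes the query intervals are sorted and non-overlapping and that coordinates are BED-style (zero-based, half-open); the names x, chrom_size, and comp are hypothetical, and this is not how valr computes the result.

```r
# Hypothetical sorted, non-overlapping query intervals on one chromosome
x <- data.frame(start = c(0, 75), end = c(10, 100))
chrom_size <- 200

# each gap runs from the end of one interval to the start of the next,
# with the chromosome start and end closing the first and last gap
gap_start <- c(0, x$end)
gap_end   <- c(x$start, chrom_size)

# drop zero-width gaps (for example when an interval starts at position 0)
keep <- gap_end > gap_start
comp <- data.frame(start = gap_start[keep], end = gap_end[keep])
comp
```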

Examples

x <- tibble::tribble(
  ~chrom, ~start, ~end,
  "chr1", 0,      10,
  "chr1", 75,     100
)

genome <- tibble::tribble(
  ~chrom, ~size,
  "chr1", 200
)

bed_glyph(bed_complement(x, genome))

genome <- tibble::tribble(
  ~chrom,  ~size,
  "chr1",  500,
  "chr2",  600,
  "chr3",  800
)

x <- tibble::tribble(
  ~chrom, ~start, ~end,
  "chr1", 100,    300,
  "chr1", 200,    400,
  "chr2", 0,      100,
  "chr2", 200,    400,
  "chr3", 500,    600
)

# intervals not covered by x
bed_complement(x, genome)

Compute coverage of intervals.

Description

Compute coverage of intervals.

Usage

bed_coverage(x, y, ...)

Arguments

`x`	ivl_df
`y`	ivl_df
`...`	extra arguments (not used)

Value

ivl_df with the following additional columns:

.ints number of x intersections
.cov per-base coverage of x intervals
.len total length of y intervals covered by x intervals
.frac .len scaled by the number of y intervals

Note

Book-ended intervals are included in coverage calculations.

Examples

x <- tibble::tribble(
  ~chrom, ~start, ~end, ~strand,
  "chr1", 100,    500,  "+",
  "chr2", 200,    400,  "+",
  "chr2", 300,    500,  "-",
  "chr2", 800,    900,  "-"
)

y <- tibble::tribble(
  ~chrom, ~start, ~end, ~value, ~strand,
  "chr1", 150,    400,  100,    "+",
  "chr1", 500,    550,  100,    "+",
  "chr2", 230,    430,  200,    "-",
  "chr2", 350,    430,  300,    "-"
)

bed_coverage(x, y)

Fisher's test to measure overlap between two sets of intervals.

Description

Calculate Fisher's test on number of intervals that are shared and unique between two sets of x and y intervals.

Usage

bed_fisher(x, y, genome)

Arguments

`x`	ivl_df
`y`	ivl_df
`genome`	genome_df

Details

Interval statistics can be used in combination with dplyr::group_by() and dplyr::do() to calculate statistics for subsets of data. See vignette('interval-stats') for examples.

Value

ivl_df

Examples

genome <- read_genome(valr_example("hg19.chrom.sizes.gz"))

x <- bed_random(genome, n = 1e4, seed = 1010486)
y <- bed_random(genome, n = 1e4, seed = 9203911)

bed_fisher(x, y, genome)

Create flanking intervals from input intervals.

Description

Create flanking intervals from input intervals.

Usage

bed_flank(
  x,
  genome,
  both = 0,
  left = 0,
  right = 0,
  fraction = FALSE,
  strand = FALSE,
  trim = FALSE,
  ...
)

Arguments

`x`	ivl_df
`genome`	genome_df
`both`	number of bases on both sides
`left`	number of bases on left side
`right`	number of bases on right side
`fraction`	define flanks based on fraction of interval length
`strand`	define `left` and `right` based on strand
`trim`	adjust coordinates for out-of-bounds intervals
`...`	extra arguments (not used)

Value

ivl_df

Examples

x <- tibble::tribble(
  ~chrom, ~start, ~end,
  "chr1", 25, 50,
  "chr1", 100, 125
)

genome <- tibble::tribble(
  ~chrom, ~size,
  "chr1", 130
)

bed_glyph(bed_flank(x, genome, both = 20))

x <- tibble::tribble(
  ~chrom, ~start, ~end, ~name, ~score, ~strand,
  "chr1", 500,    1000, ".",   ".",    "+",
  "chr1", 1000,   1500, ".",   ".",    "-"
)

genome <- tibble::tribble(
  ~chrom, ~size,
  "chr1", 5000
)

bed_flank(x, genome, left = 100)

bed_flank(x, genome, right = 100)

bed_flank(x, genome, both = 100)

bed_flank(x, genome, both = 0.5, fraction = TRUE)

Calculate coverage across a genome

Description

This function is useful for calculating interval coverage across an entire genome.

Usage

bed_genomecov(x, genome, zero_depth = FALSE)

Arguments

`x`	ivl_df
`genome`	genome_df
`zero_depth`	If TRUE, report intervals with zero depth. Zero depth intervals will be reported with respect to groups.

Value

ivl_df with an additional column:

.depth depth of interval coverage
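
To make the notion of depth concrete, here is a small base R sketch for one chromosome. It assumes BED-style zero-based, half-open coordinates; the variable names are hypothetical and the approach (a per-base difference array followed by run-length encoding) is an illustration, not valr's implementation.

```r
# Hypothetical intervals on a single chromosome of assumed length 500
x <- data.frame(start = c(20, 50, 200, 220), end = c(70, 100, 250, 250))
chrom_size <- 500

# add 1 where an interval begins and subtract 1 where it ends;
# index p + 1 corresponds to zero-based position p
delta <- numeric(chrom_size + 1)
for (i in seq_len(nrow(x))) {
  delta[x$start[i] + 1] <- delta[x$start[i] + 1] + 1
  delta[x$end[i] + 1]   <- delta[x$end[i] + 1] - 1
}

# the running sum is the depth at each base
depth <- cumsum(delta)[seq_len(chrom_size)]

# collapse runs of equal depth into intervals
runs <- rle(depth)
ends <- cumsum(runs$lengths)
runs_df <- data.frame(start = ends - runs$lengths, end = ends, .depth = runs$values)

# keep only covered stretches
covered <- runs_df[runs_df$.depth > 0, ]
covered
```

Keeping the zero-depth rows of runs_df instead of filtering them corresponds, in spirit, to asking for zero-depth intervals as well.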

Examples

x <- tibble::tribble(
  ~chrom, ~start, ~end, ~strand,
  "chr1", 20, 70, "+",
  "chr1", 50, 100, "-",
  "chr1", 200, 250, "+",
  "chr1", 220, 250, "+"
)

genome <- tibble::tribble(
  ~chrom, ~size,
  "chr1", 500,
  "chr2", 1000
)

bed_genomecov(x, genome)

bed_genomecov(dplyr::group_by(x, strand), genome)

bed_genomecov(dplyr::group_by(x, strand), genome, zero_depth = TRUE)

Create example glyphs for valr functions.

Description

Used to illustrate the output of valr functions with small examples.

Usage

bed_glyph(expr, label = NULL)

Arguments

`expr`	expression to evaluate
`label`	column name to use for label values. Must be present in the result of the call.

Value

ggplot2::ggplot()

Examples

x <- tibble::tribble(
  ~chrom, ~start, ~end,
  "chr1", 25,     50,
  "chr1", 100,    125
)

y <- tibble::tribble(
  ~chrom, ~start, ~end, ~value,
  "chr1", 30, 75, 50
)

bed_glyph(bed_intersect(x, y))

x <- tibble::tribble(
  ~chrom, ~start, ~end,
  "chr1", 30,     75,
  "chr1", 50,     90,
  "chr1", 91,     120
)

bed_glyph(bed_merge(x))

bed_glyph(bed_cluster(x), label = ".id")

Identify intersecting intervals.

Description

Report intersecting intervals from x and y tbls. Book-ended intervals have .overlap values of 0 in the output.

Usage

bed_intersect(x, ..., invert = FALSE, suffix = c(".x", ".y"))

Arguments

`x`	ivl_df
`...`	one or more (e.g. a list of) `y` `ivl_df()`s
`invert`	report `x` intervals not in `y`
`suffix`	colname suffixes in output

Value

ivl_df with original columns from x and y suffixed with .x and .y, and a new .overlap column with the extent of overlap for the intersecting intervals.

If multiple y tbls are supplied, the .source column contains the variable names associated with each interval. All original columns from y are suffixed with .y in the output.

If ... contains named inputs (i.e., a = y, b = z or list(a = y, b = z)), then .source will contain the supplied names (see examples).

Examples

x <- tibble::tribble(
  ~chrom, ~start, ~end,
  "chr1", 25, 50,
  "chr1", 100, 125
)

y <- tibble::tribble(
  ~chrom, ~start, ~end,
  "chr1", 30,     75
)

bed_glyph(bed_intersect(x, y))

bed_glyph(bed_intersect(x, y, invert = TRUE))

x <- tibble::tribble(
  ~chrom, ~start, ~end,
  "chr1", 100,    500,
  "chr2", 200,    400,
  "chr2", 300,    500,
  "chr2", 800,    900
)

y <- tibble::tribble(
  ~chrom, ~start, ~end, ~value,
  "chr1", 150,    400,  100,
  "chr1", 500,    550,  100,
  "chr2", 230,    430,  200,
  "chr2", 350,    430,  300
)

bed_intersect(x, y)

bed_intersect(x, y, invert = TRUE)

# start and end of each overlapping interval
res <- bed_intersect(x, y)
dplyr::mutate(res,
  start = pmax(start.x, start.y),
  end = pmin(end.x, end.y)
)

z <- tibble::tribble(
  ~chrom, ~start, ~end, ~value,
  "chr1", 150,    400,  100,
  "chr1", 500,    550,  100,
  "chr2", 230,    430,  200,
  "chr2", 750,    900,  400
)

bed_intersect(x, y, z)

bed_intersect(x, exons = y, introns = z)

# a list of tbl_intervals can also be passed
bed_intersect(x, list(exons = y, introns = z))

Calculate the Jaccard statistic for two sets of intervals.

Description

Quantifies the extent of overlap between two sets of intervals in terms of base-pairs. Groups that are shared between inputs are used to calculate the statistic for subsets of data.

Usage

bed_jaccard(x, y)

Arguments

`x`	ivl_df
`y`	ivl_df

Details

The Jaccard statistic takes values in [0, 1] and is measured as:

$J(x,y) = \frac{\mid x \cap y \mid}{\mid x \cup y \mid} = \frac{\mid x \cap y \mid}{\mid x \mid + \mid y \mid - \mid x \cap y \mid}$

Interval statistics can be used in combination with dplyr::group_by() and dplyr::do() to calculate statistics for subsets of data. See vignette('interval-stats') for examples.
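
A worked instance of the formula, as a base R sketch on made-up data: it assumes each input set is already merged (no overlaps within x or within y), lies on one chromosome, and uses half-open coordinates. The helper covered() and all variable names are hypothetical; the snippet only illustrates the formula and is not valr's implementation.

```r
# Hypothetical, already-merged intervals on one chromosome
x <- data.frame(start = c(0, 100), end = c(50, 200))
y <- data.frame(start = c(25, 150), end = c(75, 250))

# total bases covered by a set of non-overlapping intervals
covered <- function(ivl) sum(ivl$end - ivl$start)

# overlap between every x and y interval, floored at zero
ov <- outer(seq_len(nrow(x)), seq_len(nrow(y)), function(i, j) {
  pmax(0, pmin(x$end[i], y$end[j]) - pmax(x$start[i], y$start[j]))
})

# |x intersect y| and |x| + |y| - |x intersect y|
len_i <- sum(ov)
jaccard <- len_i / (covered(x) + covered(y) - len_i)
jaccard
```

In this example the intersection spans 75 bases and the union 225 bases, giving a statistic of 1/3.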

Value

tibble with the following columns:

len_i length of the intersection in base-pairs
len_u length of the union in base-pairs
jaccard value of jaccard statistic
n_int number of intersecting intervals between x and y

If inputs are grouped, the return value will contain one set of values per group.

Examples

genome <- read_genome(valr_example("hg19.chrom.sizes.gz"))

x <- bed_random(genome, seed = 1010486)
y <- bed_random(genome, seed = 9203911)

bed_jaccard(x, y)

# calculate jaccard per chromosome
bed_jaccard(
  dplyr::group_by(x, chrom),
  dplyr::group_by(y, chrom)
)

Divide intervals into new sub-intervals ("windows").

Description

Divide intervals into new sub-intervals ("windows").

Usage

bed_makewindows(x, win_size = 0, step_size = 0, num_win = 0, reverse = FALSE)

Arguments

`x`	ivl_df
`win_size`	divide intervals into fixed-size windows
`step_size`	size to step before next window
`num_win`	divide intervals into a fixed number of windows
`reverse`	reverse window numbers

Value

ivl_df with .win_id column that contains a numeric identifier for the window.

Note

The name and .win_id columns can be used to create new interval names (see 'namenum' example below) or in subsequent group_by operations (see vignette).

Examples

x <- tibble::tribble(
  ~chrom, ~start, ~end, ~name, ~score, ~strand,
  "chr1", 100,    200,  "A",   ".",    "+"
)

bed_glyph(bed_makewindows(x, num_win = 10), label = ".win_id")

# Fixed number of windows
bed_makewindows(x, num_win = 10)

# Fixed window size
bed_makewindows(x, win_size = 10)

# Fixed window size with overlaps
bed_makewindows(x, win_size = 10, step_size = 5)

# reverse win_id
bed_makewindows(x, win_size = 10, reverse = TRUE)

# bedtools 'namenum'
wins <- bed_makewindows(x, win_size = 10)
dplyr::mutate(wins, namenum = stringr::str_c(name, "_", .win_id))

Calculate summaries from overlapping intervals.

Description

Apply functions like min() and max() to intersecting intervals. bed_map() uses bed_intersect() to identify intersecting intervals, so output columns will be suffixed with .x and .y. Expressions that refer to input columns from x and y must take these suffixes into account.

Usage

bed_map(x, y, ..., min_overlap = 1)

concat(.data, sep = ",")

values_unique(.data, sep = ",")

values(.data, sep = ",")

Arguments

`x`	ivl_df
`y`	ivl_df
`...`	name-value pairs specifying column names and expressions to apply
`min_overlap`	minimum overlap for intervals.
`.data`	data
`sep`	separator character

Details

Book-ended intervals can be included by setting min_overlap = 0.

Non-intersecting intervals from x are included in the result with NA values.

Value

ivl_df

Examples

x <- tibble::tribble(
  ~chrom, ~start, ~end,
  "chr1", 100,    250,
  "chr2", 250,    500
)

y <- tibble::tribble(
  ~chrom, ~start, ~end, ~value,
  "chr1", 100,    250,  10,
  "chr1", 150,    250,  20,
  "chr2", 250,    500,  500
)

bed_glyph(bed_map(x, y, value = sum(value)), label = 'value')

# summary examples
bed_map(x, y, .sum = sum(value))

bed_map(x, y, .min = min(value), .max = max(value))

# identify non-intersecting intervals to include in the result
res <- bed_map(x, y, .sum = sum(value))
x_not <- bed_intersect(x, y, invert = TRUE)
dplyr::bind_rows(res, x_not)

# create a list-column
bed_map(x, y, .values = list(value))

# use `nth` family from dplyr
bed_map(x, y, .first = dplyr::first(value))

bed_map(x, y, .absmax = abs(max(value)))

bed_map(x, y, .count = length(value))

bed_map(x, y, .vals = values(value))

# count defaults are NA not 0; differs from bedtools2 ...
bed_map(x, y, .counts = dplyr::n())

# ... but NA counts can be converted to 0's
dplyr::mutate(
  bed_map(x, y, .counts = dplyr::n()),
  .counts = ifelse(is.na(.counts), 0, .counts)
)

Merge overlapping intervals.

Description

Operations can be performed on merged intervals by specifying name-value pairs. Default max_dist of 0 means book-ended intervals are merged.

Usage

bed_merge(x, max_dist = 0, ...)

Arguments

`x`	ivl_df
`max_dist`	maximum distance between intervals to merge
`...`	name-value pairs that specify operations on merged intervals

Value

ivl_df

Examples

x <- tibble::tribble(
  ~chrom, ~start, ~end,
  "chr1", 1, 50,
  "chr1", 10, 75,
  "chr1", 100, 120
)

bed_glyph(bed_merge(x))

x <- tibble::tribble(
  ~chrom, ~start, ~end, ~value, ~strand,
  "chr1", 1,      50,   1,      "+",
  "chr1", 100,    200,  2,      "+",
  "chr1", 150,    250,  3,      "-",
  "chr2", 1,      25,   4,      "+",
  "chr2", 200,    400,  5,      "-",
  "chr2", 400,    500,  6,      "+",
  "chr2", 450,    550,  7,      "+"
)

bed_merge(x)

bed_merge(x, max_dist = 100)

# merge intervals on same strand
bed_merge(dplyr::group_by(x, strand))

bed_merge(x, .value = sum(value))

Partition intervals into elemental intervals

Description

Convert a set of intervals into elemental intervals that contain each start and end position in the set.

Usage

bed_partition(x, ...)

Arguments

`x`	ivl_df
`...`	name-value pairs specifying column names and expressions to apply

Details

Summary operations, such as min() or max() can be performed on elemental intervals by specifying name-value pairs.

This function is useful for calculating summaries across overlapping intervals without merging the intervals.
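
The notion of elemental intervals can be sketched in base R for one chromosome. The data, the helper spans(), and the column n are hypothetical, and the approach (cutting at every start and end, then summarising the inputs that span each piece) is an illustration of the concept rather than valr's implementation.

```r
# Hypothetical overlapping intervals on one chromosome
x <- data.frame(
  start = c(100, 200, 300),
  end   = c(500, 400, 550),
  value = c(10, 20, 30)
)

# every start and end is a breakpoint; adjacent breakpoints bound candidate pieces
brk <- sort(unique(c(x$start, x$end)))
el <- data.frame(start = head(brk, -1), end = tail(brk, -1))

# which input intervals fully span elemental interval i
spans <- function(i) x$start <= el$start[i] & x$end >= el$end[i]
idx <- seq_len(nrow(el))

# number of spanning inputs, and the sum of their values
el$n <- vapply(idx, function(i) sum(spans(i)), integer(1))
el$value <- vapply(idx, function(i) sum(x$value[spans(i)]), numeric(1))

# drop pieces that no input interval covers
el <- el[el$n > 0, ]
el
```

Unlike merging, the overlapping inputs stay resolved into separate pieces, each carrying its own summary.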

Value

ivl_df

Examples

x <- tibble::tribble(
  ~chrom, ~start, ~end, ~value, ~strand,
  "chr1", 100, 500, 10, "+",
  "chr1", 200, 400, 20, "-",
  "chr1", 300, 550, 30, "+",
  "chr1", 550, 575, 2, "+",
  "chr1", 800, 900, 5, "+"
)


bed_glyph(bed_partition(x))
bed_glyph(bed_partition(x, value = sum(value)), label = "value")

bed_partition(x)

# compute summary over each elemental interval
bed_partition(x, value = sum(value))

# partition and compute summaries based on group
x <- dplyr::group_by(x, strand)
bed_partition(x, value = sum(value))

# combine values across multiple tibbles
y <- tibble::tribble(
  ~chrom, ~start, ~end, ~value, ~strand,
  "chr1", 10, 500, 100, "+",
  "chr1", 250, 420, 200, "-",
  "chr1", 350, 550, 300, "+",
  "chr1", 550, 555, 20, "+",
  "chr1", 800, 900, 50, "+"
)

x <- dplyr::bind_rows(x, y)
bed_partition(x, value = sum(value))

Projection test for query interval overlap.

Description

Projection test for query interval overlap.

Usage

bed_projection(x, y, genome, by_chrom = FALSE)

Arguments

`x`	ivl_df
`y`	ivl_df
`genome`	genome_df
`by_chrom`	compute test per chromosome

Details

Interval statistics can be used in combination with dplyr::group_by() and dplyr::do() to calculate statistics for subsets of data. See vignette('interval-stats') for examples.

Value

ivl_df with the following columns:

chrom the name of chromosome tested if by_chrom = TRUE, otherwise has a value of whole_genome
p.value p-value from a binomial test. p-values > 0.5 are converted to 1 - p-value and lower_tail is FALSE
obs_exp_ratio ratio of observed to expected overlap frequency
lower_tail TRUE indicates the observed overlaps are in the lower tail of the distribution (e.g., less overlap than expected). FALSE indicates that the observed overlaps are in the upper tail of the distribution (e.g., more overlap than expected)
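
The folding of p-values described for p.value and lower_tail can be sketched with base R's pbinom(). The counts and the null probability below are invented for illustration, and the definition of the expected frequency (here simply prob) is an assumption; this is not a reproduction of valr's internal calculation.

```r
# Hypothetical counts, not derived from real data
n_query <- 100   # number of query (x) intervals tested
obs     <- 30    # query intervals observed to overlap the reference (y)
prob    <- 0.2   # assumed expected overlap frequency under the null

# lower-tail binomial probability of seeing `obs` or fewer overlaps
p.value <- pbinom(obs, size = n_query, prob = prob)
lower_tail <- TRUE

# fold p-values above 0.5 into the upper tail
if (p.value > 0.5) {
  p.value <- 1 - p.value
  lower_tail <- FALSE
}

# observed overlap frequency relative to the assumed expectation
obs_exp_ratio <- (obs / n_query) / prob
```

With these numbers the observed frequency (0.3) exceeds the assumed expectation (0.2), so the result lands in the upper tail and lower_tail is FALSE.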

Examples

genome <- read_genome(valr_example("hg19.chrom.sizes.gz"))

x <- bed_random(genome, seed = 1010486)
y <- bed_random(genome, seed = 9203911)

bed_projection(x, y, genome)

bed_projection(x, y, genome, by_chrom = TRUE)

Generate randomly placed intervals on a genome.

Description

Generate randomly placed intervals on a genome.

Usage

bed_random(genome, length = 1000, n = 1e+06, seed = 0, sorted = TRUE)

Arguments

`genome`	genome_df
`length`	length of intervals
`n`	number of intervals to generate
`seed`	seed RNG for reproducible intervals
`sorted`	return sorted output

Details

Sorting can be suppressed with sorted = FALSE.

Value

ivl_df

Examples

genome <- tibble::tribble(
  ~chrom,  ~size,
  "chr1",  10000000,
  "chr2",  50000000,
  "chr3",  60000000,
  "chrX",  5000000
)

bed_random(genome, seed = 10104)

# sorting can be suppressed
bed_random(genome, sorted = FALSE, seed = 10104)

# 500 random intervals of length 500
bed_random(genome, length = 500, n = 500, seed = 10104)

Compute relative distances between intervals.

Description

Compute relative distances between intervals.

Usage

bed_reldist(x, y, detail = FALSE)

Arguments

`x`	ivl_df
`y`	ivl_df
`detail`	report relative distances for each `x` interval.

Details

Interval statistics can be used in combination with dplyr::group_by() and dplyr::do() to calculate statistics for subsets of data. See vignette('interval-stats') for examples.

Value

If detail = FALSE, an ivl_df that summarizes calculated .reldist values with the following columns:

.reldist relative distance metric
.counts number of metric observations
.total total observations
.freq frequency of observation

If detail = TRUE, the .reldist column reports the relative distance for each input x interval.

Examples

genome <- read_genome(valr_example("hg19.chrom.sizes.gz"))

x <- bed_random(genome, seed = 1010486)
y <- bed_random(genome, seed = 9203911)

bed_reldist(x, y)

bed_reldist(x, y, detail = TRUE)

Adjust intervals by a fixed size.

Description

Out-of-bounds intervals are removed by default.

Usage

bed_shift(x, genome, size = 0, fraction = 0, trim = FALSE)

Arguments

`x`	ivl_df
`genome`	genome_df
`size`	number of bases to shift. Positive numbers shift right, negative numbers shift left.
`fraction`	define `size` as a fraction of interval
`trim`	adjust coordinates for out-of-bounds intervals

Value

ivl_df
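
The difference between `size` and `fraction` can be illustrated with a small base R sketch. The helper `shift_sketch` is hypothetical and not part of valr; it ignores strand and genome bounds, and rounding the fractional shift to whole bases is an assumption.

```r
# Hypothetical helper, not part of valr: shift intervals by a fixed number
# of bases, or by a fraction of each interval's length.
# Strand handling and bounds checking against a genome are omitted.
shift_sketch <- function(x, size = 0, fraction = 0) {
  if (fraction != 0) {
    # assumption: the per-interval shift is rounded to whole bases
    size <- round((x$end - x$start) * fraction)
  }
  x$start <- x$start + size
  x$end <- x$end + size
  x
}

x <- data.frame(
  chrom = c("chr1", "chr1"),
  start = c(100, 200),
  end = c(150, 250)
)

# fixed shift: starts become 200 and 300
shift_sketch(x, size = 100)

# each interval is 50 bases long, so a fraction of 0.5 shifts by 25 bases
shift_sketch(x, fraction = 0.5)
```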

Examples

x <- tibble::tribble(
  ~chrom, ~start, ~end,
  "chr1", 25, 50,
  "chr1", 100, 125
)

genome <- tibble::tribble(
  ~chrom, ~size,
  "chr1", 125
)

bed_glyph(bed_shift(x, genome, size = -20))

x <- tibble::tribble(
  ~chrom, ~start, ~end, ~strand,
  "chr1", 100,    150,  "+",
  "chr1", 200,    250,  "+",
  "chr2", 300,    350,  "+",
  "chr2", 400,    450,  "-",
  "chr3", 500,    550,  "-",
  "chr3", 600,    650,  "-"
)

genome <- tibble::tribble(
  ~chrom, ~size,
  "chr1", 1000,
  "chr2", 2000,
  "chr3", 3000
)

bed_shift(x, genome, 100)

bed_shift(x, genome, fraction = 0.5)

# shift with respect to strand
stranded <- dplyr::group_by(x, strand)
bed_shift(stranded, genome, 100)

Shuffle input intervals.

Description

Shuffle input intervals.

Usage

bed_shuffle(
  x,
  genome,
  incl = NULL,
  excl = NULL,
  max_tries = 1000,
  within = FALSE,
  seed = 0
)

Arguments

`x`	ivl_df
`genome`	genome_df
`incl`	ivl_df of included intervals
`excl`	ivl_df of excluded intervals
`max_tries`	maximum tries to identify a bounded interval
`within`	shuffle within chromosomes
`seed`	seed for reproducible intervals

Value

ivl_df

Examples

genome <- tibble::tribble(
  ~chrom, ~size,
  "chr1", 1e6,
  "chr2", 2e6,
  "chr3", 4e6
)

x <- bed_random(genome, seed = 1010486)

bed_shuffle(x, genome, seed = 9830491)

Increase the size of input intervals.

Description

Increase the size of input intervals.

Usage

bed_slop(
  x,
  genome,
  both = 0,
  left = 0,
  right = 0,
  fraction = FALSE,
  strand = FALSE,
  trim = FALSE,
  ...
)

Arguments

`x`	ivl_df
`genome`	genome_df
`both`	number of bases on both sides
`left`	number of bases on left side
`right`	number of bases on right side
`fraction`	define flanks based on fraction of interval length
`strand`	define `left` and `right` based on strand
`trim`	adjust coordinates for out-of-bounds intervals
`...`	extra arguments (not used)

Value

ivl_df
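
How `left`, `right`, `both` and `strand` interact can be sketched in base R. The helper `slop_sketch` below is hypothetical and not part of valr; it omits `fraction`, `trim` and genome bounds, and it assumes that with `strand = TRUE` the left and right amounts are swapped for minus-strand intervals.

```r
# Hypothetical helper, not part of valr: extend intervals on either side.
slop_sketch <- function(x, both = 0, left = 0, right = 0, strand = FALSE) {
  if (both != 0) {
    left <- both
    right <- both
  }
  n <- nrow(x)
  left <- rep(left, n)
  right <- rep(right, n)
  if (strand) {
    # assumption: on the minus strand, "left" refers to the end coordinate
    minus <- x$strand == "-"
    tmp <- left[minus]
    left[minus] <- right[minus]
    right[minus] <- tmp
  }
  x$start <- x$start - left
  x$end <- x$end + right
  x
}

x <- data.frame(
  chrom = c("chr1", "chr1"),
  start = c(500, 1000),
  end = c(1000, 1500),
  strand = c("+", "-")
)

# unstranded: both starts move 100 bases to the left
slop_sketch(x, left = 100)

# stranded: the minus-strand interval grows at its end coordinate instead
slop_sketch(x, left = 100, strand = TRUE)
```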

Examples

x <- tibble::tribble(
  ~chrom, ~start, ~end,
  "chr1", 110,    120,
  "chr1", 225,    235
)

genome <- tibble::tribble(
  ~chrom, ~size,
  "chr1", 400
)

bed_glyph(bed_slop(x, genome, both = 20, trim = TRUE))

genome <- tibble::tribble(
  ~chrom, ~size,
  "chr1", 5000
)

x <- tibble::tribble(
  ~chrom, ~start, ~end, ~name, ~score, ~strand,
  "chr1", 500, 1000, ".", ".", "+",
  "chr1", 1000, 1500, ".", ".", "-"
)

bed_slop(x, genome, left = 100)

bed_slop(x, genome, right = 100)

bed_slop(x, genome, both = 100)

bed_slop(x, genome, both = 0.5, fraction = TRUE)

Sort a set of intervals.

Description

Sort a set of intervals.

Usage

bed_sort(x, by_size = FALSE, by_chrom = FALSE, reverse = FALSE)

Arguments

`x`	ivl_df
`by_size`	sort by interval size
`by_chrom`	sort within chromosome
`reverse`	reverse sort order
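
For orientation, the sort orders described by these arguments have rough base R analogues built on `order()`. This is only a sketch of the orderings, not valr's implementation; the mapping of each line to an argument combination is an assumption based on the argument descriptions.

```r
x <- data.frame(
  chrom = c("chr8", "chr8", "chr8", "chr1", "chr1"),
  start = c(500, 1000, 100, 100, 100),
  end = c(1000, 5000, 200, 300, 200)
)
size <- x$end - x$start

# default: order by chrom, then start
by_pos <- x[order(x$chrom, x$start), ]

# by_size = TRUE: order by interval size
by_size <- x[order(size), ]

# by_size = TRUE, by_chrom = TRUE: order by size within each chrom
by_size_chrom <- x[order(x$chrom, size), ]

# by_size = TRUE, reverse = TRUE: largest intervals first
by_size_rev <- x[order(size, decreasing = TRUE), ]
```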

Examples

x <- tibble::tribble(
  ~chrom, ~start, ~end,
  "chr8", 500,    1000,
  "chr8", 1000,   5000,
  "chr8", 100,    200,
  "chr1", 100,    300,
  "chr1", 100,    200
)

# sort by chrom and start
bed_sort(x)

# reverse sort order
bed_sort(x, reverse = TRUE)

# sort by interval size
bed_sort(x, by_size = TRUE)

# sort by decreasing interval size
bed_sort(x, by_size = TRUE, reverse = TRUE)

# sort by interval size within chrom
bed_sort(x, by_size = TRUE, by_chrom = TRUE)

Subtract two sets of intervals.

Description

Subtract y intervals from x intervals.

Usage

bed_subtract(x, y, any = FALSE)

Arguments

`x`	ivl_df
`y`	ivl_df
`any`	remove any `x` intervals that overlap `y`

Details
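
The idea of subtraction can be sketched for a single x interval in base R: walk through the y intervals in start order and keep the uncovered pieces. The helper `subtract_one` is hypothetical and not part of valr; it assumes half-open coordinates and that all intervals lie on the same chromosome.

```r
# Hypothetical helper, not part of valr: remove the parts of one interval
# [start, end) that are covered by the intervals in `y`.
subtract_one <- function(start, end, y) {
  y <- y[order(y$start), ]
  pieces <- list()
  pos <- start
  for (i in seq_len(nrow(y))) {
    # skip y intervals that do not overlap the remaining part of x
    if (y$end[i] <= pos || y$start[i] >= end) next
    if (y$start[i] > pos) {
      # uncovered stretch before this y interval
      pieces[[length(pieces) + 1]] <- c(start = pos, end = y$start[i])
    }
    pos <- max(pos, y$end[i])
  }
  if (pos < end) {
    # uncovered stretch after the last overlapping y interval
    pieces[[length(pieces) + 1]] <- c(start = pos, end = end)
  }
  as.data.frame(do.call(rbind, pieces))
}

y <- data.frame(start = c(510, 550), end = c(525, 575))

# leaves three pieces: 500-510, 525-550 and 575-600
subtract_one(500, 600, y)

# an interval fully covered by y leaves nothing (zero rows)
subtract_one(1300, 1500, data.frame(start = 1299, end = 1501))
```

With `any = TRUE` no pieces are computed; according to the argument description, any x interval that overlaps y is removed whole.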

Examples

x <- tibble::tribble(
  ~chrom, ~start, ~end,
  "chr1", 1,      100
)

y <- tibble::tribble(
  ~chrom, ~start, ~end,
  "chr1", 50,     75
)

bed_glyph(bed_subtract(x, y))

x <- tibble::tribble(
  ~chrom, ~start, ~end,
  "chr1", 100,    200,
  "chr1", 250,    400,
  "chr1", 500,    600,
  "chr1", 1000,   1200,
  "chr1", 1300,   1500
)

y <- tibble::tribble(
  ~chrom, ~start, ~end,
  "chr1", 150,    175,
  "chr1", 510,    525,
  "chr1", 550,    575,
  "chr1", 900,    1050,
  "chr1", 1150,   1250,
  "chr1", 1299,   1501
)

bed_subtract(x, y)

bed_subtract(x, y, any = TRUE)

Identify intervals within a specified distance.

Description

Identify intervals within a specified distance.

Usage

bed_window(x, y, genome, ...)

Arguments

`x`	ivl_df
`y`	ivl_df
`genome`	genome_df
`...`	params for bed_slop and bed_intersect

Details
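
Because `...` is passed to bed_slop and bed_intersect, a window search can be thought of as extending each x interval and then testing for overlap with y. The base R sketch below illustrates that idea only; `window_sketch` is a hypothetical helper, not part of valr, and it ignores genome bounds and assumes half-open coordinates.

```r
# Hypothetical helper, not part of valr: find pairs of x and y intervals
# where y overlaps x after x is extended by `both` bases on each side.
window_sketch <- function(x, y, both = 0) {
  # all x/y combinations that share a chromosome
  pairs <- merge(x, y, by = "chrom", suffixes = c(".x", ".y"))
  # overlap test against the extended x interval
  hit <- pairs$start.y < pairs$end.x + both &
    pairs$end.y > pairs$start.x - both
  pairs[hit, ]
}

x <- data.frame(chrom = "chr1", start = c(25, 100), end = c(50, 125))
y <- data.frame(chrom = "chr1", start = 60, end = 75)

# only the 25-50 interval has y within 15 bases
window_sketch(x, y, both = 15)
```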

Examples

x <- tibble::tribble(
  ~chrom, ~start, ~end,
  "chr1", 25,     50,
  "chr1", 100,    125
)

y <- tibble::tribble(
  ~chrom, ~start, ~end,
  "chr1", 60,     75
)

genome <- tibble::tribble(
  ~chrom, ~size,
  "chr1", 125
)

bed_glyph(bed_window(x, y, genome, both = 15))

x <- tibble::tribble(
  ~chrom, ~start, ~end,
  "chr1", 10, 100,
  "chr2", 200, 400,
  "chr2", 300, 500,
  "chr2", 800, 900
)

y <- tibble::tribble(
  ~chrom, ~start, ~end,
  "chr1", 150,    400,
  "chr2", 230,    430,
  "chr2", 350,    430
)

genome <- tibble::tribble(
  ~chrom, ~size,
  "chr1", 500,
  "chr2", 1000
)

bed_window(x, y, genome, both = 100)

Convert BED12 to individual exons in BED6.

Description

After conversion to BED6 format, the score column contains the exon number, with respect to strand (i.e., the first exon for - strand genes will have larger start and end coordinates).

Usage

bed12_to_exons(x)

Arguments

`x`	ivl_df

Examples

x <- read_bed12(valr_example("mm9.refGene.bed.gz"))

bed12_to_exons(x)

Select intervals bounded by a genome.

Description

Used to remove out-of-bounds intervals, or trim interval coordinates using a genome.

Usage

bound_intervals(x, genome, trim = FALSE)

Arguments

`x`	ivl_df
`genome`	genome_df
`trim`	adjust coordinates for out-of-bounds intervals

Value

ivl_df
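
The two modes (remove versus trim) can be sketched in base R as follows. The helper `bound_sketch` is hypothetical and not part of valr; dropping intervals on chromosomes that are absent from `genome`, and dropping intervals that become empty after trimming, are assumptions.

```r
# Hypothetical helper, not part of valr: drop or trim intervals that fall
# outside the chromosome bounds given in `genome`.
bound_sketch <- function(x, genome, trim = FALSE) {
  size <- genome$size[match(x$chrom, genome$chrom)]
  # assumption: intervals on chromosomes missing from `genome` are dropped
  known <- !is.na(size)
  x <- x[known, ]
  size <- size[known]
  if (trim) {
    # clamp coordinates to [0, size], then drop intervals left empty
    x$start <- pmax(x$start, 0)
    x$end <- pmin(x$end, size)
    x[x$start < x$end, ]
  } else {
    x[x$start >= 0 & x$end <= size, ]
  }
}

x <- data.frame(
  chrom = "chr1",
  start = c(-100, 100, 500),
  end = c(500, 1e9, 1000)
)
# illustrative genome table with the hg19 chr1 size
genome <- data.frame(chrom = "chr1", size = 249250621)

# keeps only the 500-1000 interval
bound_sketch(x, genome)

# keeps all three intervals, clamped to the chromosome bounds
bound_sketch(x, genome, trim = TRUE)
```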

Examples

x <- tibble::tribble(
  ~chrom, ~start, ~end,
  "chr1", -100,   500,
  "chr1", 100,    1e9,
  "chr1", 500,    1000
)

genome <- read_genome(valr_example("hg19.chrom.sizes.gz"))

# out-of-bounds are removed by default ...
bound_intervals(x, genome)

# ... or can be trimmed within the bounds of a genome
bound_intervals(x, genome, trim = TRUE)

Create intron features.

Description

Numbers in the score column are intron numbers from 5' to 3' independent of strand. I.e., the first introns for + and - strand genes both have score values of 1.

Usage

create_introns(x)

Arguments

`x`	ivl_df in BED12 format

Examples

x <- read_bed12(valr_example("mm9.refGene.bed.gz"))

create_introns(x)

Create transcription start site features.

Description

Create transcription start site features.

Usage

create_tss(x)

Arguments

`x`	ivl_df in BED format

Examples

x <- read_bed12(valr_example("mm9.refGene.bed.gz"))

create_tss(x)

Create 3' UTR features.

Description

Create 3' UTR features.

Usage

create_utrs3(x)

Arguments

`x`	ivl_df in BED12 format

Examples

x <- read_bed12(valr_example("mm9.refGene.bed.gz"))

create_utrs3(x)

Create 5' UTR features.

Description

Create 5' UTR features.

Usage

create_utrs5(x)

Arguments

`x`	ivl_df in BED12 format

Examples

x <- read_bed12(valr_example("mm9.refGene.bed.gz"))

create_utrs5(x)

Fetch data from remote databases.

Description

Currently db_ucsc and db_ensembl are available for connections.

Usage

db_ucsc(
  dbname,
  host = "genome-mysql.cse.ucsc.edu",
  user = "genomep",
  password = "password",
  port = 3306,
  ...
)

db_ensembl(
  dbname,
  host = "ensembldb.ensembl.org",
  user = "anonymous",
  password = "",
  port = 3306,
  ...
)

Arguments

`dbname`	name of database
`host`	hostname
`user`	username
`password`	password
`port`	MySQL connection port
`...`	params for connection

Examples

## Not run: 
if (require(RMariaDB)) {
  library(dplyr)
  ucsc <- db_ucsc("hg38")

  # fetch the `refGene` tbl
  tbl(ucsc, "refGene")

  # the `chromInfo` tbls have size information
  tbl(ucsc, "chromInfo")
}

## End(Not run)

## Not run: 
if (require(RMariaDB)) {
  library(dplyr)
  # squirrel genome
  ensembl <- db_ensembl("spermophilus_tridecemlineatus_core_67_2")

  tbl(ensembl, "gene")
}

## End(Not run)

Flip strands in intervals.

Description

Flips positive (+) stranded intervals to negative (-) strands, and vice-versa. Facilitates comparisons among intervals on opposing strands.

Usage

flip_strands(x)

Arguments

`x`	ivl_df

Examples

x <- tibble::tribble(
  ~chrom, ~start, ~end, ~strand,
  "chr1", 1,      100,  "+",
  "chr2", 1,      100,  "-"
)

flip_strands(x)

Convert GRanges to bed tibble

Description

Convert GRanges to bed tibble

Usage

gr_to_bed(x)

Arguments

`x`	GRanges object to convert to bed tibble.

Value

tibble::tibble()

Examples

## Not run: 
gr <- GenomicRanges::GRanges(
  seqnames = S4Vectors::Rle(
    c("chr1", "chr2", "chr1", "chr3"),
    c(1, 1, 1, 1)
  ),
  ranges = IRanges::IRanges(
    start = c(1, 10, 50, 100),
    end = c(100, 500, 1000, 2000),
    names = head(letters, 4)
  ),
  strand = S4Vectors::Rle(
    c("-", "+"), c(2, 2)
  )
)

gr_to_bed(gr)

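# There are two ways to convert a bed-like data.frame to GRanges.
# Here `x` is assumed to be such a data.frame with chrom, start, end,
# name and strand columns (it is not defined in this example):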

gr <- GenomicRanges::GRanges(
  seqnames = S4Vectors::Rle(x$chrom),
  ranges = IRanges::IRanges(
    start = x$start + 1,
    end = x$end,
    names = x$name
  ),
  strand = S4Vectors::Rle(x$strand)
)
# or:

gr <- GenomicRanges::makeGRangesFromDataFrame(dplyr::mutate(x, start = start + 1))

## End(Not run)

Calculate interval spacing.

Description

Spacing for the first interval of each chromosome is undefined (NA). The leading interval of an overlapping interval pair has a negative value.

Usage

interval_spacing(x)

Arguments

`x`	ivl_df

Value

ivl_df with .spacing column.
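
A minimal base R sketch of the calculation, assuming spacing is the start of an interval minus the end of the previous interval on the same chromosome. The helper `spacing_sketch` is hypothetical and not part of valr.

```r
# Hypothetical helper, not part of valr: distance from each interval to the
# previous interval on the same chromosome.
spacing_sketch <- function(x) {
  x <- x[order(x$chrom, x$start), ]
  # end coordinate of the previous interval within each chromosome
  prev_end <- ave(x$end, x$chrom, FUN = function(e) c(NA, head(e, -1)))
  x$.spacing <- x$start - prev_end
  x
}

x <- data.frame(
  chrom = c("chr1", "chr1", "chr2"),
  start = c(1, 150, 200),
  end = c(100, 200, 300)
)

spacing_sketch(x)$.spacing
# NA, 50, NA
```

Under this assumption, overlapping neighbors produce a negative value because the later start is smaller than the earlier end.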

Examples

x <- tibble::tribble(
  ~chrom, ~start, ~end,
  "chr1", 1,      100,
  "chr1", 150,    200,
  "chr2", 200,    300
)

interval_spacing(x)

Bed-like data.frame requirements for valr functions

Description

Required column names for interval dataframes are chrom, start and end. Internally, interval dataframes are validated using check_interval().

Required column names for genome dataframes are chrom and size. Internally, genome dataframes are validated using check_genome().

Usage

check_interval(x)

check_genome(x)

Arguments

`x`	A `data.frame` or `tibble::tibble`
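
The column requirements above amount to a simple name check. The sketch below shows that check in base R; `has_columns` is a hypothetical helper, not part of valr, and the actual validators may perform additional checks (for example on column types) that are not shown here.

```r
# Hypothetical helper, not part of valr: check required column names.
has_columns <- function(x, required) {
  absent <- setdiff(required, names(x))
  if (length(absent) > 0) {
    stop("missing required column(s): ", paste(absent, collapse = ", "))
  }
  invisible(TRUE)
}

ivl <- data.frame(chrom = "chr1", start = 0, end = 100)
genome <- data.frame(chrom = "chr1", size = 1e6)

# interval tables need chrom, start and end
has_columns(ivl, c("chrom", "start", "end"))

# genome tables need chrom and size
has_columns(genome, c("chrom", "size"))

# an interval table is not a valid genome table: this call raises an error
try(has_columns(ivl, c("chrom", "size")), silent = TRUE)
```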

Examples

# using tibble
x <- tibble::tribble(
  ~chrom, ~start, ~end,
  "chr1", 1, 50,
  "chr1", 10, 75,
  "chr1", 100, 120
)

check_interval(x)

# using base R data.frame
x <- data.frame(
  chrom = "chr1",
  start = 0,
  end = 100,
  stringsAsFactors = FALSE
)

check_interval(x)

# example genome input

x <- tibble::tribble(
  ~chrom, ~size,
  "chr1", 1e6
)

check_genome(x)

Read BED and related files.

Description

Read functions for BED and related formats. Filenames can be local files or URLs. The read functions load data into tbls with consistent chrom, start and end colnames.

Usage

read_bed(
  filename,
  col_types = bed12_coltypes,
  sort = TRUE,
  ...,
  n_fields = NULL
)

read_bed12(filename, ...)

read_bedgraph(filename, ...)

read_narrowpeak(filename, ...)

read_broadpeak(filename, ...)

Arguments

`filename`	file or URL
`col_types`	column type spec for `readr::read_tsv()`
`sort`	sort the tbl by chrom and start
`...`	options to pass to `readr::read_tsv()`
`n_fields`

Details

https://genome.ucsc.edu/FAQ/FAQformat.html#format1

https://genome.ucsc.edu/goldenPath/help/bedgraph.html

https://genome.ucsc.edu/FAQ/FAQformat.html#format12

https://genome.ucsc.edu/FAQ/FAQformat.html#format13

Value

ivl_df

Examples

# read_bed assumes 3 field BED format.
read_bed(valr_example("3fields.bed.gz"))

# result is sorted by chrom and start unless `sort = FALSE`
read_bed(valr_example("3fields.bed.gz"), sort = FALSE)


read_bed12(valr_example("mm9.refGene.bed.gz"))


read_bedgraph(valr_example("test.bg.gz"))


read_narrowpeak(valr_example("sample.narrowPeak.gz"))


read_broadpeak(valr_example("sample.broadPeak.gz"))

Read a bigwig file into a valr compatible tbl

Description

This function will output a 4 column tibble with zero-based chrom, start, end, value columns.

Usage

read_bigwig(path, ...)

Arguments

`path`	path to bigWig file
`...`	params for `cpp11bigwig::read_bigwig()`

Examples

read_bigwig(valr_example("hg19.dnase1.bw"))

read_bigwig(valr_example("hg19.dnase1.bw"), as = "GRanges")

Read genome files.

Description

Genome files (UCSC "chromSize" files) contain chromosome name and size information. These sizes are used by downstream functions to identify computed intervals that have coordinates outside of the genome bounds.

Usage

read_genome(path)

Arguments

`path`	path to a file containing chrom/contig names and sizes, one pair per line, tab-delimited

Value

genome_df, sorted by size

Note

URLs to genome files can also be used.

Examples

read_genome(valr_example("hg19.chrom.sizes.gz"))

## Not run: 
# `read_genome` accepts a URL
read_genome("https://genome.ucsc.edu/goldenpath/help/hg19.chrom.sizes")

## End(Not run)

Import and convert a GTF/GFF file into a valr compatible bed tbl format

Description

This function will output a tibble with the required chrom, start, and end columns, as well as other columns depending on content in GTF/GFF file.

Usage

read_gtf(path, zero_based = TRUE)

Arguments

`path`	path to gtf or gff file
`zero_based`	if TRUE, convert to zero based

Examples


## Not run: 
gtf <- read_gtf(valr_example("hg19.gencode.gtf.gz"))
head(gtf)

## End(Not run)

Read a VCF file.

Description

Read a VCF file.

Usage

read_vcf(vcf)

Arguments

`vcf`	vcf filename

Value

data_frame

Note

The return value has chrom, start and end columns. Interval lengths are the size of the 'REF' field.

Examples

vcf_file <- valr_example("test.vcf.gz")
read_vcf(vcf_file)

valr: genome interval arithmetic in R

Description

valr provides tools to read and manipulate intervals and signals on a genome reference. valr was developed to facilitate interactive analysis of genome-scale data sets, leveraging the power of dplyr and piping.

Details

To learn more about valr, start with the vignette: browseVignettes(package = "valr")

Author(s)

Jay Hesselberth

Kent Riemondy

Provide working directory for valr example files.

Description

Provide working directory for valr example files.

Usage

valr_example(path)

Arguments

`path`	path to file

Examples

valr_example("hg19.chrom.sizes.gz")
