Address NOTE on CRAN about Rd link targets.
Change maintainer email address.
Added bed_genomecov() to compute interval coverage across a genome.
read_bed and related functions now automatically calculate the number of fields. Use of n_fields is deprecated.
bed_closest() now reports all x intervals, even when there are no closest y intervals (e.g. when there is no matching chromosome in y intervals). These intervals are returned populated with NA for .overlap, .dist and y interval locations.
Reimplemented bed_closest() to use binary search rather than an interval tree search. With the previous search strategy, the closest y interval could be missed in high-depth interval trees.
Fixed off-by-one error when using the max_dist argument in bed_cluster() (#401).
Removed SystemRequirements from DESCRIPTION to eliminate a NOTE on CRAN.
bed_coverage() now reports intervals from x with no matching group in y (#395).
Updated intervalTree header to commit f0c4046
valr now uses cli for more consistent errors and messages during interactive use.
The deprecated genome argument to bed_makewindows() was removed.
Fixed handling of max_dist for first intervals in bed_cluster() (#388)
Fixed intron score numbering error in create_introns (#377 @sheridar)
Fixed bug in handling of list inputs for bed_intersect() (#380 @sheridar)
Added read_bigwig and read_gtf functions to import data into valr-compatible tibbles (#379)
Kent Riemondy is now maintainer.
RMariaDB has replaced the deprecated RMySQL package as the database backend.
valr now imports Rcpp explicitly. This should always have been the case, but the omission was masked because readr imported Rcpp until it recently dropped that dependency.
The trbl_interval() and trbl_genome() custom tibble subclasses have been deemed unnecessary and removed from the package.
Coercing GRanges to a valr-compatible data.frame now uses the gr_to_bed() function rather than as.tbl_interval() methods.
dplyr version < 0.8.0 is no longer supported due to unnecessary code bloat and challenges with handling multiple grouping structures (#359).
The sort_by argument of bed_random() has been renamed to sorted. By default, the output is now sorted with bed_sort() rather than by naming the sorting columns. Sorting can be suppressed with sorted = FALSE.
bed_sort() now uses base R sorting with the radix method for increased speed (#353).
tbls processed by bed_merge() or bed_sort() no longer store merged or sorted as attributes, because these attributes were rarely checked in the codebase and were potential sources of unexpected behavior.
Fixed bed_closest() to prevent erroneous intervals being reported when adjacent closest intervals are present in the y table (#348).
Factor columns that are not used for grouping are returned as factors rather than inappropriately being coerced to integer vectors (#360)
Rcpp functions have been reorganized to remove all dependencies on dplyr C++ functions.
Due to internal refactoring of Rcpp functions, only data.frames containing Numeric, Logical, Integer, Character, and List column types are supported. Columns containing Raw, Complex, or other R classes are not supported and will issue an error.
Factors are now disallowed from grouping variables in multiset operations to avoid sort order discrepancies and to stay compatible with factor handling in dplyr v0.8.0. Factors are now internally converted to character and a warning is issued.
as.tbl_interval() now calls as_tibble() only on non-tibble input, which prevents groups from being stripped from tibble() input (#338).
Added new function, bed_partition(), which is similar to bed_merge() but collapses intervals to elemental intervals rather than the maximal overlapping region. bed_partition() can also compute summaries of data from overlapping intervals. See examples in bed_partition() and timings in vignette('benchmarks') @kriemo.
Several explicit comparisons to the Bioconductor GenomicRanges library are included for users considering using valr. See examples in as.tbl_interval() and timings in vignette('benchmarks').
All relevant tests from bedtools2 were ported into valr. Bugs identified in corner cases by new tests were fixed (#328 @raysinesis)
bed_jaccard() now works with grouped inputs (#216)
Update dplyr header files to v0.7
bed_intersect() and internal intersect_impl were refactored to enable return of non-intersecting intervals.
The genome argument to bed_makewindows() was deprecated and will produce a warning if used. Error handling was also added to makewindows_impl() to check and warn if there are intervals smaller than the requested window size (#312 @kriemo)
Fixed off-by-one error in reported distances from bed_closest(). Reported distances now match bedtools closest behavior (#311).
bed_glyph() accepts trbl_intervals named other than x and y (#318).
bed_makewindows() now returns the number of windows specified by num_win when the input intervals are not evenly divisible into num_win, consistent with bedtools behavior.
The output of findOverlaps() is now sorted in subtract_impl() to prevent reporting intervals that should have been dropped when calling bed_subtract() (#316 @kriemo)
A manuscript describing valr has been published in F1000Research.
New S3 generic as.tbl_interval() converts GenomicRanges::GRanges objects to tbl_interval.
New create_tss() for creating transcription start sites.
Improve documentation of interval statistics with more complex examples.
bed_sort() has been de-deprecated to reduce arrange calls in library code.
bed_merge() now reports start/end columns if spec is provided (#288)
New create_introns(), create_utrs5() and create_utrs3() functions for generating features from BED12 files.
Speed-ups in bed_makewindows() (~50x), bed_merge() (~4x), and bed_flank() (~4x) (thanks to @kriemo and @sheridar). Thanks to the sponsors of the Biofrontiers Hackathon for the caffeine underlying these improvements.
Intervals from bed_random() are now sorted properly.
Package dplyr v0.5.0 headers with valr to remove dplyr LinkingTo dependency.
bed_intersect() now accepts multiple tbls for intersection (#220 @kriemo).
New tbl_interval() and tbl_genome() wrap tibbles and enforce strict column naming. trbl_interval() and trbl_genome() are constructors that take tibble::tribble() formatting, and is.tbl_interval() and is.tbl_genome() are used to check for valid classes.
Intervals from bed_random() are sorted by chrom and start by default.
bed_jaccard() now uses numeric values for calculation (fixes #204).
Deprecated bed_sort() in favor of using dplyr::arrange() explicitly (fixes #134).
Add src/init.c that calls R_registerRoutines and R_useDynamicSymbols to address NOTE in r-devel
Deprecate dist parameter in bed_closest() in favor of using user supplied functions (#182 @kriemo)
Make .id values sequential across chroms in bed_cluster() output (#171)
Transfer repository to http://github.com/rnabioco/valr, update links and docs.
Move shiny app to new repo (http://github.com/rnabioco/valrdata).
Add Kent Riemondy to LICENSE file.
bed_merge() now merges contained intervals (#177)
Test / vignette guards for Suggested RMySQL
Fixed memory leak in absdist.cpp
Fixed vignette entry names