bed_slop() and bed_flank() now preserve input row order instead of sorting output by chrom and start (#434, #435).
Fixed bed_closest() to respect the custom suffix parameter when computing the .dist column (#436).
Improved memory efficiency in bed_intersect() by using a visitor pattern, reducing allocations for large datasets (#446).
Added min_overlap parameter to bed_intersect(), bed_subtract(), bed_coverage(), and bed_map(). Setting min_overlap = 1L matches bedtools behavior, in which book-ended (adjacent) intervals are not considered overlapping. Use min_overlap = 0L to preserve the previous valr behavior, in which book-ended intervals were treated as overlapping. Currently, calling these functions without an explicit min_overlap value emits a deprecation warning and uses the legacy behavior (min_overlap = 0L). In a future release, the default will change to min_overlap = 1L, so users should update their code to specify the desired behavior explicitly.
Eliminated all global variable dependencies by replacing bare column names with explicit .data[["column"]] syntax in data manipulation operations and all_of() in column selection operations.
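The min_overlap semantics described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not valr's implementation; it assumes 0-based, half-open (start, end) coordinates and that two intervals count as overlapping when the number of shared bases is at least min_overlap. The helper names overlap_bp and overlaps are hypothetical.

```python
def overlap_bp(x, y):
    """Shared bases between two half-open (start, end) intervals.

    Returns 0 for book-ended intervals and a negative number when
    the intervals are separated by a gap.
    """
    return min(x[1], y[1]) - max(x[0], y[0])


def overlaps(x, y, min_overlap=1):
    """Assumed rule: intervals overlap when they share >= min_overlap bases."""
    return overlap_bp(x, y) >= min_overlap


x = (100, 200)
book_ended = (200, 300)   # starts exactly where x ends
one_base = (199, 300)     # shares a single base with x
separated = (250, 300)    # 50 bp gap after x

legacy = [overlaps(x, y, min_overlap=0) for y in (book_ended, one_base, separated)]
strict = [overlaps(x, y, min_overlap=1) for y in (book_ended, one_base, separated)]
```

Under these assumptions only the book-ended pair changes status between the two settings, which is why code that relied on adjacent intervals intersecting should pass min_overlap = 0L explicitly.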
Fixed bed_makewindows() step size calculation when step_size parameter is used. Previously, overlapping windows stepped by win_size - step_size instead of the specified step_size (#438).
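The intended stepping can be sketched as follows. This is a Python illustration rather than valr code; the function name make_windows is hypothetical, and the handling of trailing windows (truncated at the interval end) is an assumption modeled on bedtools makewindows.

```python
def make_windows(start, end, win_size, step_size=None):
    """Tile [start, end) with windows of win_size whose starts advance by step_size.

    With step_size=None the windows are adjacent (step equals win_size).
    Windows that would run past `end` are truncated (assumed behavior).
    """
    step = win_size if step_size is None else step_size
    if win_size <= 0 or step <= 0:
        raise ValueError("win_size and step_size must be positive")
    return [(s, min(s + win_size, end)) for s in range(start, end, step)]


# Window starts advance by step_size (10), not by win_size - step_size (20).
windows = make_windows(0, 60, win_size=30, step_size=10)
starts = [s for s, _ in windows]
```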
Select methods (tibble, tribble) are now re-exported from the tibble package.
read_bigbed() is now re-exported from the cpp11bigwig package.
read_bigwig() now uses cpp11bigwig on CRAN. The set_strand param was removed to be
more consistent with expected bigWig contents.
read_gtf() was deprecated. The rtracklayer package used
for this functionality is no longer a dependency of valr due to errors from
CRAN AddressSanitizer checks of the UCSC C library code vendored in rtracklayer.
valr now depends on R >= 4.0.0.
Address NOTE on CRAN about Rd link targets.
Change maintainer email address.
Added bed_genomecov() to compute interval coverage across a genome.
read_bed() and related functions now automatically calculate the number of fields. Use of n_fields was deprecated.
bed_closest() now reports all x intervals, even when there are no closest y intervals (e.g. when there is no matching chromosome in y intervals). These intervals are returned populated with NA for .overlap, .dist and y interval locations.
Reimplemented bed_closest() to use binary search rather than an interval tree search. With the previous search strategy, the closest y interval could be missed in high depth interval trees.
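One way a binary search can locate the closest interval is sketched below. This is a Python illustration under stated assumptions, not the bed_closest() implementation: intervals are half-open (start, end) pairs on a single chromosome, distance is the size of the gap in bases (valr and bedtools may use a convention that differs by one), and the function name closest_distance is hypothetical.

```python
from bisect import bisect_left, bisect_right


def closest_distance(x, ys):
    """Distance from x to its nearest interval in ys (0 if any overlap).

    Uses two sorted arrays: y starts and y ends. Searching ends separately
    means a long, nested upstream interval is not missed.
    """
    starts = sorted(s for s, _ in ys)
    ends = sorted(e for _, e in ys)
    x_start, x_end = x

    n_start_before = bisect_left(starts, x_end)   # y with start < x_end
    n_end_before = bisect_right(ends, x_start)    # y with end <= x_start
    if n_start_before > n_end_before:
        return 0                                  # at least one y overlaps x

    candidates = []
    if n_end_before > 0:                          # nearest upstream y
        candidates.append(x_start - ends[n_end_before - 1])
    if n_start_before < len(starts):              # nearest downstream y
        candidates.append(starts[n_start_before] - x_end)
    return min(candidates) if candidates else None


ys = [(0, 10), (5, 100), (20, 30), (200, 210)]
d_gap = closest_distance((120, 130), ys)    # nearest is (5, 100), which contains (20, 30)
d_overlap = closest_distance((25, 28), ys)
d_empty = closest_distance((25, 28), [])
```

In this example the interval that starts latest upstream, (20, 30), is not the closest one; the containing interval (5, 100) ends nearer to x, which is the kind of case a search keyed only on start positions can get wrong.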
Fixed off-by-one error when using the max_dist argument in bed_cluster() (#401).
Removed SystemRequirements from DESCRIPTION to eliminate a NOTE on CRAN.
bed_coverage() now reports intervals from x with no matching group in y (#395).
Updated intervalTree header to commit f0c4046
valr now uses cli for more consistent errors and messages during interactive use.
The deprecated genome argument to bed_makewindows() was removed.
Fixed max_dist handling for first intervals in bed_cluster() (#388)
Fixed intron score numbering error in create_introns() (#377 @sheridar)
Fixed bug in handling of list inputs for bed_intersect() (#380 @sheridar)
Added read_bigwig and read_gtf functions to import data into valr compatible tibbles (#379)
Kent Riemondy is now maintainer.
RMariaDB has replaced the deprecated RMySQL package as the database backend.
valr now imports Rcpp. This should always have been the case, but the missing import was masked because readr imported Rcpp; readr recently dropped its use of Rcpp.
trbl_interval() and trbl_genome() custom tibble subclasses have been deemed unnecessary and have been removed from the package.
Coercing GRanges to a valr compatible data.frame now uses the gr_to_bed() function rather than as.tbl_interval() methods.
dplyr version < 0.8.0 is no longer supported due to unnecessary code bloat and challenges with handling multiple grouping structures (#359).
The sort_by argument of bed_random() has been changed to sorted, and will now by default
use bed_sort() to sort the output, rather than rely on naming the sorting columns. Sorting can
be suppressed by using sorted = FALSE.
bed_sort() now uses base R sorting with the radix method for increased speed. (#353)
tbls processed by bed_merge() or bed_sort() no longer store either merged or sorted as attributes, because these attributes were rarely checked in the codebase and were potential sources of unexpected behavior.
Fixed bed_closest() to prevent erroneous intervals being reported when adjacent closest intervals are present in the y table. (#348)
Factor columns that are not used for grouping are returned as factors rather than inappropriately being coerced to integer vectors (#360)
Rcpp functions have been reorganized to remove all dependencies on dplyr C++ functions.
Due to internal refactoring of Rcpp functions, only data.frames containing Numeric, Logical, Integer, Character, and List column types are supported. Columns containing Raw, Complex, or other R classes are not supported and will issue an error.
Factors are now disallowed from grouping variables in multiset operations to avoid sort order discrepancies and to maintain compatibility with factor handling in dplyr v0.8.0. Factors will now be internally type-converted to character and a warning is issued.
Changed as.tbl_interval() to call as_tibble() only on non-tibble input, which prevents groups from being stripped from tibble() input (#338).
Added new function, bed_partition(), which is similar to bed_merge() but collapses intervals to elemental intervals rather than the maximal overlapping region. bed_partition() can also compute summaries of data from overlapping intervals. See examples in bed_partition() and timings in vignette('benchmarks') @kriemo.
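The difference between elemental intervals and the maximal overlapping region can be sketched as follows. This is a Python illustration, not valr code; the function names partition and merge are hypothetical, intervals are assumed to be half-open (start, end) pairs on one chromosome, and merging of book-ended intervals is an assumption.

```python
def partition(intervals):
    """Split overlapping intervals at every start/end breakpoint.

    Keeps only the pieces covered by at least one input interval.
    """
    points = sorted({p for iv in intervals for p in iv})
    pieces = []
    for lo, hi in zip(points, points[1:]):
        if any(s <= lo and hi <= e for s, e in intervals):
            pieces.append((lo, hi))
    return pieces


def merge(intervals):
    """Collapse overlapping (or book-ended) intervals into maximal regions."""
    merged = []
    for s, e in sorted(intervals):
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


ivls = [(100, 500), (200, 400), (300, 600), (800, 900)]
elemental = partition(ivls)
maximal = merge(ivls)
```

Each elemental piece is covered by a fixed set of input intervals, which is what makes per-piece summaries of overlapping data possible.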
Several explicit comparisons to the Bioconductor GenomicRanges library are included for users considering using valr. See examples in as.tbl_interval() and timings in vignette('benchmarks').
All relevant tests from bedtools2 were ported into valr. Bugs identified in corner cases by new tests were fixed (#328 @raysinesis)
bed_jaccard() now works with grouped inputs (#216)
Update dplyr header files to v0.7
bed_intersect() and internal intersect_impl were refactored to enable return of non-intersecting intervals.
The genome argument to bed_makewindows() was deprecated and will produce a warning if used. Also error handling was added to check and warn if there are intervals smaller than the requested window size in makewindows_impl() (#312 @kriemo)
Fixed off-by-one error in reported distances from bed_closest(). Reported distances now match bedtools closest behavior (#311).
bed_glyph() accepts trbl_intervals named other than x and y (#318).
bed_makewindows() now returns the number of windows specified by num_win when the input intervals are not evenly divisible into num_win, consistent with bedtools behavior.
The output of findOverlaps() is now sorted in subtract_impl() to prevent reporting intervals that should have been dropped when calling bed_subtract() (#316 @kriemo)
A manuscript describing valr has been published in F1000Research.
New S3 generic as.tbl_interval() converts GenomicRanges::GRanges objects to tbl_interval.
New create_tss() for creating transcription start sites.
Improve documentation of interval statistics with more complex examples.
bed_sort() has been de-deprecated to reduce arrange calls in library code.
bed_merge() now reports start/end columns if spec is provided (#288)
New create_introns(), create_utrs5() and create_utrs3() functions for generating features from BED12 files.
Speed-ups in bed_makewindows() (~50x), bed_merge() (~4x), and bed_flank() (~4x) (thanks to @kriemo and @sheridar). Thanks to the sponsors of the Biofrontiers Hackathon for the caffeine underlying these improvements.
Intervals from bed_random() are now sorted properly.
Package dplyr v0.5.0 headers with valr to remove dplyr LinkingTo dependency.
bed_intersect() now accepts multiple tbls for intersection (#220 @kriemo).
New tbl_interval() and tbl_genome() that wrap tibbles and enforce strict column naming. trbl_interval() and trbl_genome() are constructors that take tibble::tribble() formatting, and is.tbl_interval() and is.tbl_genome() are used to check for valid classes.
Intervals from bed_random() are sorted by chrom and start by default.
Updated bed_jaccard() to use numeric values for calculation (fixes #204).
Deprecated bed_sort() in favor of using dplyr::arrange() explicitly (fixes #134).
Added src/init.c that calls R_registerRoutines and R_useDynamicSymbols to address NOTE in r-devel
Deprecate dist parameter in bed_closest() in favor of using user supplied functions (#182 @kriemo)
Make .id values sequential across chroms in bed_cluster() output (#171)
Transfer repository to https://github.com/rnabioco/valr, update links and docs.
Move shiny app to new repo (https://github.com/rnabioco/valrdata).
Add Kent Riemondy to LICENSE file.
bed_merge() now merges contained intervals (#177)
Added test / vignette guards for Suggested RMySQL
Fixed memory leak in absdist.cpp
Fixed vignette entry names