.. currentmodule:: bpnetlite

===============
Release History
===============

Version 1.0.0
=============

highlights
----------

- First major release!
- The pipeline has been simplified, cleaned up, and expanded.
- Support for training ChromBPNet models has been dropped.
- ReadTheDocs documentation has been added in the docs/ folder.
- The example_jsons folder has been removed in favor of using the `pipeline-json` command.

bpnet
-----

- The `predict` subcommand has been dropped and replaced with the `evaluate` subcommand
- Models are automatically evaluated on the validation set at the end of the `fit` subcommand
- MACS3 has been added as an optional preprocessing step to do peak calling
- All preprocessing arguments have been moved into `preprocessing_parameters` in the pipeline JSON
- Reports the number of trainable parameters in the model and its number of filters and layers
- Moved the model architecture parameters into the `fit_parameters` JSON instead of the umbrella JSON
- Added a `-pe` argument to `pipeline-json` to specify if an input is paired end data for MACS3
- Removed the `find_negatives` argument from the umbrella JSON with the assumption that you need to find negatives if you have not provided a file with them.

Version 0.9.5
=============

highlights
----------

- Minor bug fixes related to exclusion list and control tracks with the pipeline

Version 0.9.4
=============

io
----

- Fixed a small bug in the calculation of epoch length.
  Thanks @amanpatel101

Version 0.9.3
=============

bpnet cmd
---------

- Made the `bpnet evaluate` command more robust

performance
-----------

- Fixed a minor bug with count loss shapes
- Fixed an issue with dtypes not being correctly assigned in _kl_divergence

io
----

- Keep the data as numpy arrays for faster slicing

Version 0.9.2
=============

bpnet cmd
---------

- Reorganized imports to make `bpnet pipeline-json` immediate
- Added a large number of print statements if verbose is True
- Added an evaluation at the end of the `fit` command to report validation set performance measures
- Added an `evaluate` command that runs a model on a provided set and reports performance measures
- Added a reverse complement averaging function to the prediction command
- Changed default `negative_ratio` to 0.333.
- Changed the `alpha` parameter to `count_loss_weight` for interpretability
- If not provided, `count_loss_weight` will be automatically derived from training data as mean(reads_in_peaks) / n_peaks.
- Set `scheduler` to True by default
- When `scheduler` is True, use a ReduceLROnPlateau scheduler that halves the LR every 5 iterations without improvement
- Set default early stopping to 10
- Added in an optional argument for exclusion lists, where peaks and negatives overlapping them are removed

bpnet
-----

- Changed `alpha` to `count_loss_weight`
- Reorganized code to make the `fit` function easier to follow
- Added in an internal `_mixture_loss` that ensures consistency in calculating the multinomial
- Added in gradient clipping at a norm of 0.5
- Added in option (default True) to only calculate profile loss on peaks
- Calculate validation loss only at the end of every epoch
- Removed the `validation_iter` argument

performance
-----------

- Changed `calculate_performance_measures` to internally normalize logits and also to sum counts across all strands to match the model

io
----

- Replaced the DataGenerator with PeakNegativeSampler
- PeakNegativeSampler takes in a set of peaks and, separately, a set of negatives and samples from these sets according to a provided ratio, rather than trying to merge the two into a single list
- PeakGenerator now extracts peaks and negatives separately and passes them into PeakNegativeSampler.
- Filters out regions based on an optionally provided exclusion list
- Filters out regions whose signal is larger than the 99th percentile multiplied by 1.2. This multiplication means that if the values in the top percentile are all similar, nothing is filtered out.

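
The outlier filter described in the last `io` bullet above can be sketched in plain Python. This is a minimal illustration and not the library's implementation: the helper names `percentile` and `filter_outlier_regions`, the use of one total signal value per region, and the linear interpolation used to compute the percentile are all assumptions made for this example.

.. code-block:: python

    def percentile(values, q):
        """Return the q-th percentile (0-100) using linear interpolation."""
        ordered = sorted(values)
        if not ordered:
            raise ValueError("values must not be empty")
        rank = (len(ordered) - 1) * q / 100.0
        lo = int(rank)                       # index at or below the rank
        hi = min(lo + 1, len(ordered) - 1)   # index above, clamped to the end
        frac = rank - lo
        return ordered[lo] + (ordered[hi] - ordered[lo]) * frac


    def filter_outlier_regions(signal_totals, q=99.0, slack=1.2):
        """Return (indices of kept regions, threshold).

        A region is dropped only when its signal exceeds the q-th
        percentile multiplied by `slack` (hypothetical parameter names).
        """
        threshold = percentile(signal_totals, q) * slack
        keep = [i for i, total in enumerate(signal_totals) if total <= threshold]
        return keep, threshold


    # 100 ordinary regions plus one extreme region (made-up numbers).
    totals = list(range(1, 101)) + [10000]
    kept, threshold = filter_outlier_regions(totals)
    print(len(kept), threshold)

Because the threshold is the percentile scaled by 1.2 rather than the percentile itself, a data set whose largest values are all close together keeps every region, while a region far above the rest is still removed.
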
Version 0.9.1
=============

Highlights
----------

- Improved documentation in losses.py and performance.py
- Cast X as float in ChromBPNet code
- Improved automatic trimming logic based on actual receptive field
- When using ChromBPNet, resizes the bias model predictions when they are a different size than those of the accessibility model
- Add negative sampling ratio to the dataloader and BPNet training code
- Add optional learning rate scheduler to BPNet training

Version 0.9.0
=============

Highlights
----------

- Added support for specifying what device to use within the JSONs
- Added support for specifying what dtype to use within the JSONs

pipeline-json
-------------

- Added a new command-line function "pipeline-json"
- This function takes in filepaths to input data and generates a pipeline JSON that can be modified or immediately run.

pipeline
--------

- Added support for processing .BAM/.SAM/.tsv/.tsv.gz files into stranded or unstranded bigWigs before training the BPNet model using bam2bw
- Added support for calculating GC-matched negative regions from the provided peak file and FASTA file.

Version 0.8.1
==============

Highlights
----------

- Added robustness toward other characters in the nucleotide alphabet. Anything not A, C, G, or T gets ignored. This robustness has been added to PeakGenerator and the command line arguments.

Version 0.8.0
==============

Highlights
----------

- When training ChromBPNet bias models from the command line and no loci are provided for training the bias model, the default is changed to inherit from `negatives` instead of `loci`. Put simply, when the user provides a set of negatives and peak loci to train ChromBPNet, the default is now that the bias model will be trained on the negatives instead of incorrectly on the peaks.
- When calculating the maximum number of reads a negative region can have when training the bias model, changed from using the minimum number of reads in the peaks to the 1st quantile of reads because it is more robust.

Version 0.7.4
==============

Highlights
----------

- Re-added the `attribute.py` file with a `deep_lift_shap` function that wraps tangermeme's function but passes in the layers that must be registered.

Version 0.7.3
==============

Highlights
----------

- Added `BasePairNet` in `bpnetlite.bpnet` which is an implementation of the model from the official `basepairmodels`
- Added `BasePairNet.from_bpnet` to load TensorFlow-trained models from `basepairmodels` into the PyTorch wrapper
- Removed a few dependencies that are no longer needed after using tangermeme

Version 0.7.2
==============

Highlights
----------

- Complete inclusion of tangermeme as the backend for operations
- Remove the `predict` method for models in favor of `tangermeme.predict`
- Remove attribute.py in favor of `tangermeme.deep_lift_shap`
- Remove marginalization functions in favor of `tangermeme.marginalize`
- Remove plotting functions in favor of `tangermeme.plot`
- Alter the `bpnet` and `chrombpnet` command-line tools to account for these changes.
- Add in `_Log` and `_Exp` as layers for the ChromBPNet model so that they can be registered as non-linear functions for `deep_lift_shap`.

Version 0.7.1
==============

Highlights
----------

- Begin inclusion of tangermeme into the backend.

Version 0.7.0
==============

Highlights
----------

- Changed the function name from `calculate_attributions` to `attribute` to be more in line with the `predict` and `marginalize` functions. The functionality and usage should be the same.
- Changed the nomenclature from "interpret" to "attribute" to be more consistent with the name of the function and what is used colloquially.

Version 0.6.0
==============

Highlights
----------

- Replaced the negative sampling code with a simpler approach that only considers bins of signal rather than operating at bp resolution. This code is much faster and more robust but may produce slightly worse GC matches.
- The negative sampling code now allows you to pass in a bigwig so that only regions that pass a threshold are selected.

Version 0.5.7
==============

Highlights
----------

- Changed the `warning_threshold` argument to only print a warning rather than end the process when the model exceeds it.
- Added support for plotting annotations alongside `plot_attributions`
- Fixed various minor bugs.

Version 0.5.6
==============

Highlights
----------

- Changed the shape of the returned one-hot encoded sequences to match the documentation.
- Fixed an issue with dinucleotide shuffling when not all nucleotides are present.

Version 0.5.5
==============

Highlights
----------

- Fixed an issue with ChromBPNet reading.

Version 0.5.4
==============

Highlights
----------

- Added reading of TensorFlow-formatted ChromBPNet models from the official repo via `from_chrombpnet` on the BPNet and ChromBPNet objects.

Version 0.5.2
==============

Highlights
----------

- Fixed an issue where non-linear operations in DeepLiftShap were not registered correctly, causing minor divergences. Through the use of an ugly wrapper object this has been fixed.
- Added in `print_convergence_deltas` and `warning_threshold` to the `calculate_attributions` function and the `DeepLiftShap` object. The first will print convergence deltas for every example that gets explained and the second will raise a warning if the divergence is higher than it.

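
As a rough illustration of what `print_convergence_deltas` and `warning_threshold` control, the sketch below computes a convergence delta as the gap between the summed attributions and the difference between the model output and the reference output, then warns when that gap exceeds a threshold. The function names `convergence_delta` and `check_convergence`, the default threshold of 0.001, and the scalar inputs are assumptions for this example and are not taken from the library.

.. code-block:: python

    import warnings


    def convergence_delta(attributions, output, reference_output):
        """Gap between summed attributions and the output difference.

        For an exact DeepLIFT-style explanation the attributions sum to
        (output - reference_output), so the delta is zero.
        """
        return abs(sum(attributions) - (output - reference_output))


    def check_convergence(deltas, warning_threshold=0.001,
                          print_convergence_deltas=False):
        """Warn (rather than stop) for each delta above the threshold.

        Returns the number of examples that exceeded the threshold.
        """
        n_exceeded = 0
        for i, delta in enumerate(deltas):
            if print_convergence_deltas:
                print(f"example {i}: convergence delta = {delta:.6g}")
            if delta > warning_threshold:
                n_exceeded += 1
                warnings.warn(
                    f"example {i}: convergence delta {delta:.6g} exceeds "
                    f"{warning_threshold}",
                    RuntimeWarning,
                )
        return n_exceeded


    # Made-up numbers: the first example is exact, the second is not.
    deltas = [
        convergence_delta([0.5, 0.25], output=1.0, reference_output=0.25),
        convergence_delta([0.5, 0.5], output=1.0, reference_output=0.25),
    ]
    check_convergence(deltas, warning_threshold=0.1,
                      print_convergence_deltas=True)

Emitting a warning instead of raising an exception matches the behavior described for Version 0.5.7, where exceeding the threshold no longer ends the process.
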
Version 0.5.0
==============

Highlights
----------

- Extended support for the `chrombpnet` command-line tool
- Now mirrors the functionality of the `bpnet` command-line tool
- `chrombpnet pipeline` now mirrors `bpnet pipeline` except that it will run each of the reports on each of the three models: the full ChromBPNet model, the accessibility model, and the bias model. It will train a bias model and an accessibility model if not provided.
- Changed the ChromBPNet object to be compatible with the `bpnet` command options.
- Fixed issue with attributions where performance would degrade over time.

Version 0.4.0
==============

Highlights
----------

- Extended support for the `bpnet` command-line tool
- Added in `marginalize` command-line option for generating those reports
- Added in `pipeline` command-line option for running a full pipeline from model training to inference, attribution, tfmodisco, and marginalization

Version 0.3.0
==============

Highlights
----------

- I forgot.

Version 0.2.0
==============

Highlights
----------

- Addition of a `ChromBPNet` model
- Addition of an explicit, shared, `Logger` class
- "Peak" semantics have been switched to "locus" semantics

chrombpnet.py
-------------

- Newly added.
- This file contains the `ChromBPNet` class, which is a wrapper that takes in two BPNet objects, a pre-trained bias model and an untrained accessibility model, and specifies the training procedure for training the accessibility model.

io.py
-----

- The semantics of "peaks", e.g. `extract_peaks`, have been changed to loci, e.g. `extract_loci`, and the associated keywords (now `loci` from `peaks`) can take in a list or tuple of files to interleave them. This means you can now train on peaks and background regions.

logging.py
----------

- Newly added.
- This file contains the `Logger` class which is a simple way to record and display statistics during training.
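
To give a sense of what a class for recording and displaying training statistics can look like, here is a minimal sketch. It is not the `Logger` class from `logging.py`: the class name `SimpleLogger`, its methods, and the tab-separated output format are all hypothetical choices made for this example.

.. code-block:: python

    class SimpleLogger:
        """Record rows of named training statistics (hypothetical sketch)."""

        def __init__(self, names, verbose=False):
            self.names = list(names)   # column names, e.g. epoch and loss
            self.verbose = verbose     # print each row as it is added
            self.rows = []

        def add(self, row):
            """Store one row of statistics and optionally display it."""
            if len(row) != len(self.names):
                raise ValueError("row length must match the number of names")
            self.rows.append(list(row))
            if self.verbose:
                if len(self.rows) == 1:
                    # Print the header once, before the first row.
                    print("\t".join(self.names))
                print("\t".join(str(value) for value in row))

        def to_tsv(self):
            """Return everything recorded so far as tab-separated text."""
            lines = ["\t".join(self.names)]
            lines += ["\t".join(str(value) for value in row)
                      for row in self.rows]
            return "\n".join(lines) + "\n"


    # Made-up values standing in for statistics from a training loop.
    log = SimpleLogger(["epoch", "training_loss"], verbose=True)
    log.add([0, 1.5])
    log.add([1, 1.25])

Keeping the recorded rows in memory and formatting them on demand lets the same object serve both live display during training and saving the statistics afterwards.
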