Release History
Version 1.0.0
highlights
First major release!
The pipeline has been simplified, cleaned up, and expanded.
Support for training ChromBPNet models has been dropped.
ReadTheDocs documentation has been added in the docs/ folder
The example_jsons folder has been removed in favor of using the pipeline-json command
bpnet
The predict subcommand has been dropped and replaced with the evaluate subcommand
Models are automatically evaluated on the validation set at the end of the fit subcommand
MACS3 has been added as an optional preprocessing step to do peak calling
All preprocessing arguments have been moved into preprocessing_parameters in the pipeline JSON
Reports the number of trainable parameters in the model, along with its number of filters and layers
Moved the model architecture parameters into the fit_parameters JSON instead of the umbrella JSON
Added a -pe argument to pipeline-json to specify if an input is paired end data for MACS3
Removed the find_negatives argument from the umbrella JSON with the assumption that you need to find negatives if you have not provided a file with them.
Version 0.9.5
highlights
Minor bug fixes related to exclusion lists and control tracks in the pipeline
Version 0.9.4
io
Fixed a small bug in the calculation of epoch length. Thanks @amanpatel101
Version 0.9.3
bpnet cmd
Made the bpnet evaluate command more robust
performance
Fixed a minor bug with count loss shapes
Fixed an issue with dtypes not being correctly assigned in _kl_divergence
io
Keep the data as numpy arrays for faster slicing
Version 0.9.2
bpnet cmd
Reorganized imports so that bpnet pipeline-json runs immediately
Added a large number of print statements if verbose is True
Added an evaluation at the end of the fit command to report validation set performance measures
Added an evaluate command that runs a model on a provided set and reports performance measures
Added a reverse complement averaging function to the prediction command
Changed default negative_ratio to 0.333.
Changed the alpha parameter to count_loss_weight for interpretability
If not provided, count_loss_weight will be automatically derived from training data as mean(reads_in_peaks) / n_peaks.
Set scheduler to True by default
When scheduler is True, use a ReduceLROnPlateau scheduler that halves the LR every 5 iterations without improvement
Set default early stopping to 10
Added in an optional argument for exclusion lists, where peaks and negatives overlapping them are removed
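The scheduler and early-stopping defaults above can be illustrated with a small, library-free sketch. The class name PlateauHalver and its attributes are hypothetical and not part of the package. The release notes say the real implementation uses a ReduceLROnPlateau scheduler, whose exact counting of non-improving steps may differ from this toy version.

```python
class PlateauHalver:
    """Toy reduce-on-plateau rule with early stopping (hypothetical helper)."""

    def __init__(self, lr, factor=0.5, patience=5, early_stopping=10):
        self.lr = lr
        self.factor = factor                # 0.5 halves the learning rate
        self.patience = patience            # checks without improvement before a cut
        self.early_stopping = early_stopping
        self.best = float("inf")
        self.since_best = 0                 # checks since the best validation loss
        self.since_cut = 0                  # non-improving checks since the last cut

    def step(self, valid_loss):
        """Record one validation loss; return True if training should stop."""
        if valid_loss < self.best:
            self.best = valid_loss
            self.since_best = 0
            self.since_cut = 0
            return False
        self.since_best += 1
        self.since_cut += 1
        if self.since_cut >= self.patience:
            self.lr *= self.factor
            self.since_cut = 0
        return self.since_best >= self.early_stopping


# Two improving checks followed by ten checks without improvement.
sched = PlateauHalver(lr=0.001)
losses = [1.0, 0.9] + [0.95] * 10
stopped_at = None
for check, loss in enumerate(losses, start=1):
    if sched.step(loss):
        stopped_at = check
        break
```

In this run the learning rate is halved twice (after the fifth and tenth non-improving checks) and training stops at the tenth non-improving check.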
bpnet
Changed alpha to count_loss_weight
Reorganized code to make the fit function easier to follow
Added in an internal _mixture_loss that ensures consistency in calculating the multinomial
Added in gradient clipping at a norm of 0.5
Added in option (default True) to only calculate profile loss on peaks
Calculate validation loss only at the end of every epoch
Removed the validation_iter argument
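As a rough illustration of gradient clipping at a norm of 0.5, the sketch below rescales a flat list of gradient values so that their combined L2 norm does not exceed the limit. It assumes the clipping is applied to the global L2 norm across all gradients; the function name clip_by_global_norm and the eps term are illustrative choices, not taken from the package.

```python
import math


def clip_by_global_norm(grads, max_norm=0.5, eps=1e-6):
    """Rescale grads so their L2 norm is at most max_norm (hypothetical helper)."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    # Scale down only when the norm exceeds the limit; never scale up.
    scale = min(1.0, max_norm / (total_norm + eps))
    return [g * scale for g in grads], total_norm


# A gradient with norm 5.0 is shrunk to a norm of roughly 0.5.
clipped, norm_before = clip_by_global_norm([3.0, 4.0])
norm_after = math.sqrt(sum(g * g for g in clipped))

# A gradient already below the limit is left unchanged.
small, _ = clip_by_global_norm([0.1, 0.2])
```

The direction of the gradient is preserved; only its magnitude changes.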
performance
Changed calculate_performance_measures to internally normalize logits and also to sum counts across all strands to match the model
io
Replaced the DataGenerator with PeakNegativeSampler
PeakNegativeSampler takes in a set of peaks and, separately, a set of negatives, and samples from these sets according to a provided ratio rather than trying to merge the two into a single list
PeakGenerator now extracts peaks and negatives separately and passes them into PeakNegativeSampler.
Filters out regions based on an optionally provided exclusion list
Filters out regions whose signal is larger than the 99th percentile multiplied by 1.2. This multiplication means that if the values in the top 1% are all similar, nothing is filtered out.
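The outlier filter described above can be sketched in a few lines. This is a minimal illustration, assuming one total signal value per region and a linearly interpolated percentile; the function names percentile and filter_outlier_regions are hypothetical and do not come from the package.

```python
def percentile(values, q):
    """Linearly interpolated q-th percentile (0 <= q <= 100) of a non-empty list."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def filter_outlier_regions(signals, q=99, multiplier=1.2):
    """Return indices of regions whose signal is at most multiplier * percentile."""
    threshold = percentile(signals, q) * multiplier
    kept = [i for i, s in enumerate(signals) if s <= threshold]
    return kept, threshold


# Evenly spread signal: the top values are similar, so nothing is removed.
uniform = [float(v) for v in range(1, 101)]
kept_uniform, _ = filter_outlier_regions(uniform)

# One extreme region among otherwise ordinary ones: only it is removed.
spiky = [float(v) for v in range(1, 100)] + [1000.0]
kept_spiky, _ = filter_outlier_regions(spiky)
```

The 1.2 multiplier gives headroom above the 99th percentile, so only regions well beyond the bulk of the distribution are dropped.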
Version 0.9.1
Highlights
Improved documentation in losses.py and performance.py
Cast X as float in ChromBPNet code
Improved automatic trimming logic based on actual receptive field
When using ChromBPNet, resizes the bias model predictions when they are a different size than those of the accessibility model
Add negative sampling ratio to the dataloader and BPNet training code
Add optional learning rate scheduler to BPNet training
Version 0.9.0
Highlights
Added support for specifying what device to use within the JSONs
Added support for specifying what dtype to use within the JSONs
pipeline-json
Added a new command-line function “pipeline-json”
This function takes in filepaths to input data and generates a pipeline JSON that can be modified or immediately run.
pipeline
Added support for using bam2bw to process .BAM/.SAM/.tsv/.tsv.gz files into stranded or unstranded bigWigs before training the BPNet model
Added support for calculating GC-matched negative regions from the provided peak file and FASTA file.
Version 0.8.1
Highlights
Added robustness toward other characters in the nucleotide alphabet.
Anything not A, C, G, or T gets ignored. This robustness has been added to PeakGenerator and the command line arguments.
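One way to picture this behavior is a one-hot encoder that leaves an all-zero column for any character outside the alphabet. This is only a sketch under that assumption; the release notes do not say how ignored characters are represented, and the function name and (4, length) output layout here are illustrative.

```python
ALPHABET = "ACGT"


def one_hot_encode(sequence):
    """Encode a sequence as 4 rows (A, C, G, T) by len(sequence) columns."""
    index = {base: i for i, base in enumerate(ALPHABET)}
    encoded = [[0] * len(sequence) for _ in ALPHABET]
    for pos, char in enumerate(sequence):
        row = index.get(char)
        # Characters outside A, C, G, T (for example N) are skipped,
        # leaving their column as all zeros. Characters are compared as-is.
        if row is not None:
            encoded[row][pos] = 1
    return encoded


encoded = one_hot_encode("ACNGT")
column_sums = [sum(row[pos] for row in encoded) for pos in range(5)]
```

Here the column for N sums to zero while every other column sums to one.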
Version 0.8.0
Highlights
When training ChromBPNet bias models from the command line and no loci are provided for training the bias model, the default is changed to inherit from negatives instead of loci. Colloquially, when the user provides a set of negatives and peak loci to train ChromBPNet, the default is now that the bias model will be trained on the negatives instead of incorrectly on the peaks.
When calculating the maximum number of reads a negative region can have when training the bias model, changed from using the minimum number of reads in the peaks to the 1st quantile of reads because it is more robust.
Version 0.7.4
Highlights
Re-added the attribute.py file with a deep_lift_shap function that wraps tangermeme’s function but passes in the layers that must be registered.
Version 0.7.3
Highlights
Added BasePairNet in bpnetlite.bpnet which is an implementation of the model from the official basepairmodels
Added BasePairNet.from_bpnet to load TensorFlow-trained models from basepairmodels into the PyTorch wrapper
Removed a few dependencies that are no longer needed after using tangermeme
Version 0.7.2
Highlights
Complete inclusion of tangermeme as the backend for operations
Remove the predict method for models in favor of tangermeme.predict
Remove attribute.py in favor of tangermeme.deep_lift_shap
Remove marginalization functions in favor of tangermeme.marginalize
Remove plotting functions in favor of tangermeme.plot
Alter the bpnet and chrombpnet command-line tools to account for these changes.
Add in _Log and _Exp as layers for the ChromBPNet model so that they can be registered as non-linear functions for deep_lift_shap.
Version 0.7.1
Highlights
Begin inclusion of tangermeme into the backend.
Version 0.7.0
Highlights
Changed the function name from calculate_attributions to attribute to be more in line with the predict and marginalize functions. The functionality and usage should be the same.
Changed the nomenclature from “interpret” to “attribute” to be more consistent with the name of the function and what is used colloquially.
Version 0.6.0
Highlights
Replaced the negative sampling code with a simpler approach that only considers bins of signal rather than operating at bp resolution. This code is much faster and more robust but may produce slightly worse GC matches.
The negative sampling code now allows you to pass in a bigwig so that only regions that pass a threshold are selected.
Version 0.5.7
Highlights
Changed the warning_threshold argument to only print a warning rather than end the process when the model exceeds it.
Added support for plotting annotations alongside plot_attributions
Fixed various minor bugs.
Version 0.5.6
Highlights
Changed the shape of the returned one-hot encoded sequences to match the documentation.
Fixed an issue with dinucleotide shuffling when not all nucleotides are present.
Version 0.5.5
Highlights
Fixed an issue with ChromBPNet reading.
Version 0.5.4
Highlights
Added in reading of TensorFlow-formatted ChromBPNet models from the official repo using the from_chrombpnet commands to the BPNet and ChromBPNet objects.
Version 0.5.2
Highlights
Fixed an issue where non-linear operations in DeepLiftShap were not registered correctly, causing minor divergences. This has been fixed through the use of an ugly wrapper object.
Added in print_convergence_deltas and warning_threshold to the calculate_attributions function and the DeepLiftShap object. The first prints convergence deltas for every example that gets explained, and the second raises a warning if the divergence is higher than the given threshold.
Version 0.5.0
Highlights
Extended support for the chrombpnet command-line tool
Now has mirrored functionality of the bpnet command-line tool
chrombpnet pipeline now mirrors bpnet pipeline except that it will run each of the reports on each of the three models: the full ChromBPNet model, the accessibility model, and the bias model. It will train a bias model and an accessibility model if not provided.
Changed the ChromBPNet object to be compatible with the bpnet command options.
Fixed issue with attributions where performance would degrade over time.
Version 0.4.0
Highlights
Extended support for the bpnet command-line tool
Added in marginalize command-line option for generating those reports
Added in pipeline command-line option for running a full pipeline from model training to inference, attribution, tfmodisco, and marginalization
Version 0.3.0
Highlights
I forgot.
Version 0.2.0
Highlights
Addition of a ChromBPNet model
Addition of an explicit, shared, Logger class
“Peak” semantics have been switched to “locus” semantics
chrombpnet.py
Newly added.
This file contains the ChromBPNet class, which is a wrapper that takes in two BPNet objects: a pre-trained bias model, and an untrained accessibility model, and specifies the training procedure for training the accessibility model.
io.py
The semantics of “peaks”, e.g. extract_peaks, have been changed to “loci”, e.g. extract_loci, and the associated keywords (now loci instead of peaks) can take in a list or tuple of files to interleave them. This means you can now train on peaks and background regions.
logging.py
Newly added.
This file contains the Logger class which is a simple way to record and display statistics during training.