Quickstart to the BPNet Pipeline Command
Author: Jacob Schreiber
The easiest way for most people to get started and integrate BPNet into their pipelines is through the command-line tools that come packaged with the bpnet-lite repository. These tools cover the common steps of training and using BPNet models and aim to be as interoperable as possible with other tools and pipelines by relying on common file formats for data inputs and outputs. Further, because these tools are implemented using the Python API, the models and files that result from these tools can be easily loaded into Python if you want to do subsequent analyses or extend them to build your own pipelines.
Here is a complete list of the existing individual command-line tools:
negatives - Take in a set of loci and a genome and output a set of loci that are as GC matched as possible.
fit - Train a BPNet model and evaluate it.
evaluate - Evaluate the performance of a BPNet model and output the statistics in a TSV.
attribute - Apply a trained BPNet model to a set of loci and return the DeepLIFT/SHAP attributions.
seqlets - Calculate attributions for a set of regions and then use the tangermeme recursive seqlet caller on them and return a BED file.
marginalize - Calculate marginalization predictions and (optionally) attributions for a set of motifs in a database.
Tying these steps together is the pipeline command, which goes all the way from raw data to BPNet analysis results. This involves converting BAM/SAM/BED/TSV files to bigWigs (if necessary), calling peaks using MACS3 (if necessary), calling GC-matched negatives for the peaks (if necessary), training and evaluating the BPNet model, calculating DeepLIFT/SHAP attributions, calling and annotating seqlets, running TF-MoDISCo, and running in silico marginalizations.
If you already have a model, you can skip the first few steps and go straight into the attributions and discovery steps.
Using the Pipeline
The pipeline is meant to be as simple and flexible as possible. Because it is a command-line tool, we will run our examples using os.system, but in practice you should use the command line directly rather than a Jupyter notebook or Python script.
Creating a BPNet Pipeline JSON
The pipeline is composed of many steps, and each of these steps has hyperparameters that can be set and requires pointers to where the data are. To manage this process, each step has a JSON containing this information, and the pipeline has an umbrella JSON containing the JSONs for each of these steps. As steps in the pipeline are executed (such as data processing or peak/negative calling), the pointers are updated in the subsequent steps and JSONs are written out so that you have a record of the exact commands being run in each step.
Having an umbrella JSON like this might seem overwhelming at first, but you can use the pipeline-json command to create a working JSON with default parameters for each of the steps. You should pass in the pointers to your data when running pipeline-json and it will automatically figure out if it needs to preprocess the data and do peak/negative calling.
Here is an example of how to create a pipeline JSON for a MYC BPNet model. Note that the data pointers are to remote files on the ENCODE Portal to demonstrate those capabilities, but they can also be filepaths on your local computer. All you need to do is pass in where your genome FASTA file is, where the input signal files are, where the control files are (if needed), the name you want all of the intermediary steps to use when saving results, and the name of the pipeline JSON.
[1]:
import os
# https://jaspar.elixir.no/download/data/2026/CORE/JASPAR2026_CORE_vertebrates_non-redundant_pfms_meme.txt
#-m $HOME/common/JASPAR2026_CORE_vertebrates_non-redundant_pfms.meme
os.system("""
bpnet pipeline-json \
    -s $HOME/common/hg38.fa \
    -p https://www.encodeproject.org/files/ENCFF331IUK/@@download/ENCFF331IUK.bed.gz \
    -i https://www.encodeproject.org/files/ENCFF074GYD/@@download/ENCFF074GYD.bam \
    -i https://www.encodeproject.org/files/ENCFF122BQB/@@download/ENCFF122BQB.bam \
    -c https://www.encodeproject.org/files/ENCFF772ZHK/@@download/ENCFF772ZHK.bam \
    -n quick-myc \
    -o quick-myc-pipeline.json
""")
[1]:
0
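If you do need to launch the command from Python, building the argument list explicitly avoids shell quoting and line-continuation problems. The following is a minimal sketch, not part of bpnet-lite: the helper names build_pipeline_json_command and run_command are hypothetical, and only the flags already shown in this tutorial (-s, -p, -i, -c, -n, -o, -m) are used.

```python
import os
import shlex
import subprocess


def build_pipeline_json_command(sequences, loci, signals, name, output,
                                controls=(), motifs=None):
    """Assemble the arguments for `bpnet pipeline-json` as a list of strings."""
    cmd = ["bpnet", "pipeline-json", "-s", sequences, "-p", loci]
    for signal in signals:      # one -i per signal file
        cmd += ["-i", signal]
    for control in controls:    # one -c per control file, if any
        cmd += ["-c", control]
    cmd += ["-n", name, "-o", output]
    if motifs is not None:      # optional motif database
        cmd += ["-m", motifs]
    return cmd


def run_command(cmd):
    """Run the command and raise CalledProcessError on a non-zero exit code."""
    return subprocess.run(cmd, check=True)


ENCODE = "https://www.encodeproject.org/files/{0}/@@download/{0}.{1}"

cmd = build_pipeline_json_command(
    # subprocess does not expand $HOME, so expand the path in Python instead.
    sequences=os.path.expanduser("~/common/hg38.fa"),
    loci=ENCODE.format("ENCFF331IUK", "bed.gz"),
    signals=[ENCODE.format("ENCFF074GYD", "bam"),
             ENCODE.format("ENCFF122BQB", "bam")],
    controls=[ENCODE.format("ENCFF772ZHK", "bam")],
    name="quick-myc",
    output="quick-myc-pipeline.json",
)
print(shlex.join(cmd))  # inspect the command before running it
# run_command(cmd)      # uncomment once bpnet is installed
```

Passing a list to subprocess.run means no shell is involved, which is why the FASTA path is expanded with os.path.expanduser rather than relying on $HOME.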
We can then take a look at the JSON. It is large, but it is worth looking at what each of the parameters is.
[2]:
import json
with open("quick-myc-pipeline.json", "r") as infile:
    parameters = json.load(infile)
print(json.dumps(parameters, indent=4))
{
    "in_window": 2114,
    "out_window": 1000,
    "name": "quick-myc",
    "model": null,
    "dtype": "float32",
    "device": "cuda",
    "batch_size": 64,
    "verbose": true,
    "min_counts": 0,
    "max_counts": 99999999,
    "random_state": null,
    "exclusion_lists": null,
    "sequences": "/home/jmschrei/common/hg38.fa",
    "loci": [
        "https://www.encodeproject.org/files/ENCFF331IUK/@@download/ENCFF331IUK.bed.gz"
    ],
    "signals": [
        "https://www.encodeproject.org/files/ENCFF074GYD/@@download/ENCFF074GYD.bam",
        "https://www.encodeproject.org/files/ENCFF122BQB/@@download/ENCFF122BQB.bam"
    ],
    "controls": [
        "https://www.encodeproject.org/files/ENCFF772ZHK/@@download/ENCFF772ZHK.bam"
    ],
    "skip": false,
    "dry_run": false,
    "preprocessing_parameters": {
        "unstranded": false,
        "fragments": false,
        "paired_end": false,
        "pos_shift": 0,
        "neg_shift": 0,
        "callpeaks_format": null,
        "callpeaks_gsize": "hs",
        "callpeaks_q": 0.01,
        "verbose": true
    },
    "fit_parameters": {
        "n_filters": 64,
        "n_layers": 8,
        "profile_output_bias": true,
        "count_output_bias": true,
        "batch_size": 64,
        "lr": 0.001,
        "scheduler": true,
        "negative_ratio": 0.333,
        "count_loss_weight": null,
        "early_stopping": 5,
        "max_jitter": 128,
        "reverse_complement": true,
        "reverse_complement_average": true,
        "max_epochs": 20,
        "training_chroms": [
            "chr2",
            "chr4",
            "chr5",
            "chr7",
            "chr9",
            "chr10",
            "chr11",
            "chr12",
            "chr13",
            "chr14",
            "chr15",
            "chr16",
            "chr17",
            "chr18",
            "chr19",
            "chr21",
            "chr22",
            "chrX",
            "chrY"
        ],
        "validation_chroms": [
            "chr8",
            "chr20"
        ],
        "sequences": null,
        "loci": null,
        "signals": null,
        "controls": null,
        "verbose": null,
        "random_state": null,
        "summits": false,
        "performance_filename": null
    },
    "attribute_parameters": {
        "batch_size": 64,
        "chroms": [
            "chr8",
            "chr20"
        ],
        "output": "counts",
        "loci": null,
        "device": null,
        "ohe_filename": null,
        "attr_filename": null,
        "idx_filename": null,
        "n_shuffles": 20,
        "warning_threshold": 0.001,
        "random_state": null,
        "verbose": null
    },
    "seqlet_parameters": {
        "threshold": 0.01,
        "min_seqlet_len": 4,
        "max_seqlet_len": 25,
        "additional_flanks": 3,
        "in_window": null,
        "chroms": null,
        "verbose": null,
        "loci": null,
        "ohe_filename": null,
        "attr_filename": null,
        "idx_filename": null,
        "output_filename": null
    },
    "annotation_parameters": {
        "motifs": null,
        "sequences": null,
        "seqlet_filename": null,
        "n_score_bins": 100,
        "n_median_bins": 1000,
        "n_target_bins": 100,
        "n_cache": 250,
        "reverse_complement": true,
        "n_jobs": -1,
        "output_filename": null
    },
    "modisco_motifs_parameters": {
        "n_seqlets": 100000,
        "output_filename": null,
        "verbose": null
    },
    "modisco_report_parameters": {
        "motifs": null,
        "output_folder": null,
        "verbose": null
    },
    "marginalize_parameters": {
        "loci": null,
        "n_loci": 100,
        "attributions": false,
        "batch_size": 64,
        "shuffle": false,
        "random_state": null,
        "output_folder": null,
        "motifs": null,
        "minimal": true,
        "device": null,
        "verbose": null
    },
    "negatives": null,
    "motifs": null
}
The defaults for each of the steps assume that you want to train a BPNet model. This means that the negative sampling ratio is 0.333 (not 0.1 as for ChromBPNet) and that max_jitter is 128 instead of the 500 used for ChromBPNet. If you are trying to train an accessibility model, or train a BPNet model on ATAC-/DNase-seq data directly, you may need to change those parameters manually after creating the JSON.
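As a rough illustration of that manual change, the sketch below swaps in the two ChromBPNet values mentioned above (0.1 and 500) on an already-loaded parameter dictionary. The helper name apply_accessibility_defaults is hypothetical, the dictionary is a small stand-in for the full JSON, and whether other parameters should also change for your data is not covered here.

```python
import json


def apply_accessibility_defaults(parameters, negative_ratio=0.1, max_jitter=500):
    """Return a copy of the pipeline parameters with ChromBPNet-style fit settings."""
    # Round-tripping through JSON is a simple deep copy for JSON-compatible data.
    updated = json.loads(json.dumps(parameters))
    updated["fit_parameters"]["negative_ratio"] = negative_ratio
    updated["fit_parameters"]["max_jitter"] = max_jitter
    return updated


# Stand-in for the dictionary loaded from the pipeline JSON.
parameters = {
    "name": "quick-myc",
    "fit_parameters": {"negative_ratio": 0.333, "max_jitter": 128},
}

updated = apply_accessibility_defaults(parameters)
print(json.dumps(updated["fit_parameters"], indent=4))
```

In practice you would load the real file with json.load, apply the change, and write the result back with json.dump before running the pipeline.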
An important note about the structure of the JSON is that each step first looks for a parameter in its own JSON (such as the “attribute_parameters” JSON) and, if the parameter is not set there, inherits it from the umbrella pipeline JSON. This means that you only have to modify the constant parameters, such as filenames to the data or the receptive field of the model, at the highest level of the JSON for the changes to take effect in each of the steps. However, it also gives you the flexibility to set custom parameters for any individual step: a step only inherits from the umbrella JSON when its own value is not set, so you can give it a value that differs from the umbrella JSON if you would like. This allows you to do things such as train a model on the human genome and evaluate on the mouse genome, or use bfloat16 to train the model and then float32 for the attributions, or train on promoter regions and evaluate at distal peaks.
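The inheritance rule can be pictured with a few lines of Python. This is only a sketch of the behaviour described above, not the bpnet-lite implementation: the function name resolve_step_parameters is hypothetical, it assumes that a null (None) value is what "not set" means, and it only considers keys that already appear in the step's own JSON.

```python
def resolve_step_parameters(pipeline, step_key):
    """Fill unset (None) step parameters from the umbrella pipeline dictionary."""
    resolved = {}
    for key, value in pipeline[step_key].items():
        if value is None and key in pipeline:
            resolved[key] = pipeline[key]   # inherit from the umbrella JSON
        else:
            resolved[key] = value           # the step-level value wins
    return resolved


# Excerpt of the pipeline JSON shown earlier, written as a Python dictionary.
pipeline = {
    "device": "cuda",
    "verbose": True,
    "batch_size": 64,
    "attribute_parameters": {
        "batch_size": 32,    # custom step-level value (changed for illustration)
        "device": None,      # unset, so taken from the umbrella JSON
        "verbose": None,     # unset, so taken from the umbrella JSON
        "n_shuffles": 20,    # step-only parameter, kept as-is
    },
}

resolved = resolve_step_parameters(pipeline, "attribute_parameters")
print(resolved)
```

Here the attribution step keeps its own batch size while picking up the device and verbosity from the top level.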
Providing a motif database
You are not required to provide a motif database to the pipeline. However, if you do, the TF-MoDISCo report will include annotations for each of the discovered patterns and the pipeline will run in silico marginalizations using each of the motifs in the database.
[3]:
os.system("""
bpnet pipeline-json \
    -s $HOME/common/hg38.fa \
    -p https://www.encodeproject.org/files/ENCFF331IUK/@@download/ENCFF331IUK.bed.gz \
    -i https://www.encodeproject.org/files/ENCFF074GYD/@@download/ENCFF074GYD.bam \
    -i https://www.encodeproject.org/files/ENCFF122BQB/@@download/ENCFF122BQB.bam \
    -c https://www.encodeproject.org/files/ENCFF772ZHK/@@download/ENCFF772ZHK.bam \
    -n test \
    -o test.json \
    -m $HOME/common/JASPAR2026_CORE_vertebrates_non-redundant_pfms.meme
""")
[3]:
0
On ATAC-seq data
The pipeline above works well for TF ChIP-seq experiments, but several changes are needed for ATAC-/DNase-seq experiments. First, the data usually come in the form of fragments where both the start and the end correspond to cuts. This can be specified using the -f flag. Second, the data are unstranded and so do not need to be split into + and - bigWigs but should go into just one. This can be specified using the -u flag. Third, the data are paired-end, and so the -pe flag should be specified if you need MACS3 to run peak calling on your data (not needed if you already have peaks). Finally, if you want to shift the position of the reads you can pass in a shift on the positive and negative strands using -ps <int> and -ns <int>.
[4]:
os.system("""
bpnet pipeline-json \
    -s $HOME/common/hg38.fa \
    -p https://www.encodeproject.org/files/ENCFF748UZH/@@download/ENCFF748UZH.bed.gz \
    -i https://www.encodeproject.org/files/ENCFF337YBN/@@download/ENCFF337YBN.bam \
    -i https://www.encodeproject.org/files/ENCFF981FXV/@@download/ENCFF981FXV.bam \
    -i https://www.encodeproject.org/files/ENCFF144GBU/@@download/ENCFF144GBU.bam \
    -n example \
    -o example-pipeline.json \
    -m $HOME/common/JASPAR2026_CORE_vertebrates_non-redundant_pfms.meme \
    -u -f -pe -ps 4 -ns -4
""")
[4]:
0
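After generating an ATAC-seq JSON it can be worth confirming that the flags ended up where you expect. The sketch below assumes, without having verified it against the bpnet-lite source, that -u, -f, -pe, -ps and -ns populate the unstranded, fragments, paired_end, pos_shift and neg_shift entries of "preprocessing_parameters" shown in the earlier JSON. The helper name preprocessing_mismatches is hypothetical, and the dictionary is a stand-in with one deliberate mismatch.

```python
def preprocessing_mismatches(parameters, expected):
    """Return {key: (found, expected)} for each preprocessing value that differs."""
    found = parameters["preprocessing_parameters"]
    return {
        key: (found.get(key), value)
        for key, value in expected.items()
        if found.get(key) != value
    }


# Assumed mapping from the command-line flags above to JSON keys.
expected = {
    "unstranded": True,   # -u
    "fragments": True,    # -f
    "paired_end": True,   # -pe
    "pos_shift": 4,       # -ps 4
    "neg_shift": -4,      # -ns -4
}

# Stand-in for json.load(open("example-pipeline.json")); paired_end is left
# False on purpose so that the check has something to report.
parameters = {
    "preprocessing_parameters": {
        "unstranded": True,
        "fragments": True,
        "paired_end": False,
        "pos_shift": 4,
        "neg_shift": -4,
    }
}

mismatches = preprocessing_mismatches(parameters, expected)
print(mismatches)
```

An empty dictionary would mean every checked value matches the flags you passed.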
Running the Pipeline
Now that we have a JSON, running the pipeline is as simple as bpnet pipeline -p <fname>.json, where you plug in the pipeline parameter JSON you just created. In principle, this could be combined into one step where you produce the JSON and immediately run it. However, the two steps are intentionally separated so that you can inspect the JSON and change any of the default parameters before the pipeline starts. For example, if you wanted to run a larger model you would need to go in and manually change the number of filters or layers.
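One way to make such a change reproducibly is to write a modified copy of the JSON rather than editing it by hand. The following is a minimal sketch: the helper name write_modified_pipeline, the output filename, and the values 128 and 9 are all illustrative choices rather than recommendations, and the demonstration runs on a small stand-in file in a temporary directory.

```python
import json
import os
import tempfile


def write_modified_pipeline(src, dst, fit_overrides):
    """Copy a pipeline JSON to dst with some fit_parameters values replaced."""
    with open(src, "r") as infile:
        parameters = json.load(infile)

    for key, value in fit_overrides.items():
        # Reject unknown keys so a typo does not silently add a new entry.
        if key not in parameters["fit_parameters"]:
            raise KeyError(f"unknown fit parameter: {key}")
        parameters["fit_parameters"][key] = value

    with open(dst, "w") as outfile:
        json.dump(parameters, outfile, indent=4)
    return parameters


# Demonstration on a stand-in file; use your real pipeline JSON in practice.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "quick-myc-pipeline.json")
dst = os.path.join(workdir, "quick-myc-large-pipeline.json")
with open(src, "w") as outfile:
    json.dump({"name": "quick-myc",
               "fit_parameters": {"n_filters": 64, "n_layers": 8}}, outfile)

larger = write_modified_pipeline(src, dst, {"n_filters": 128, "n_layers": 9})
print(larger["fit_parameters"])
```

The new file can then be passed to bpnet pipeline -p in place of the original, which leaves the default JSON untouched for comparison.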
(Apologies for the many lines of BAM processing output. This only happens when processing many BAMs and when running the command through Jupyter notebook like this. Given that, I didn’t feel a need to fix the issue.)
[5]:
!bpnet pipeline -p quick-myc-pipeline.json
Step 0.2: Convert data to bigWigs
chrEBV encountered in input but not in FASTA/chrom sizes.
ENCFF074GYD.bam: 15759996it [00:17, 913672.45it/s]
ENCFF122BQB.bam: 0it [00:00, ?it/s]
ENCFF122BQB.bam: 11002216it [00:47, 923865.81it/s]
ENCFF122BQB.bam: 11094643it [00:47, 912099.32it/s]
ENCFF122BQB.bam: 11185913it [00:47, 888311.11it/s]
ENCFF122BQB.bam: 11283650it [00:47, 914338.16it/s]
ENCFF122BQB.bam: 11375981it [00:47, 916977.27it/s]
ENCFF122BQB.bam: 11474142it [00:47, 936115.07it/s]
ENCFF122BQB.bam: 11571184it [00:47, 946309.72it/s]
ENCFF122BQB.bam: 11665906it [00:47, 945950.70it/s]
ENCFF122BQB.bam: 11760565it [00:47, 921254.26it/s]
ENCFF122BQB.bam: 11859949it [00:48, 942401.34it/s]
ENCFF122BQB.bam: 11958566it [00:48, 955322.29it/s]
ENCFF122BQB.bam: 12055249it [00:48, 958728.23it/s]
ENCFF122BQB.bam: 12151224it [00:48, 957777.80it/s]
ENCFF122BQB.bam: 12247783it [00:48, 959984.79it/s]
ENCFF122BQB.bam: 12346971it [00:48, 969418.59it/s]
ENCFF122BQB.bam: 12446143it [00:48, 976082.78it/s]
ENCFF122BQB.bam: 12544844it [00:48, 979348.20it/s]
ENCFF122BQB.bam: 12642801it [00:48, 960223.90it/s]
ENCFF122BQB.bam: 12740249it [00:48, 964161.45it/s]
ENCFF122BQB.bam: 12836739it [00:49, 963781.45it/s]
ENCFF122BQB.bam: 12933168it [00:49, 954855.95it/s]
ENCFF122BQB.bam: 13031601it [00:49, 963586.88it/s]
ENCFF122BQB.bam: 13128004it [00:49, 962816.32it/s]
ENCFF122BQB.bam: 13225738it [00:49, 967140.61it/s]
ENCFF122BQB.bam: 13322477it [00:49, 958070.86it/s]
ENCFF122BQB.bam: 13420987it [00:49, 966094.17it/s]
ENCFF122BQB.bam: 13519240it [00:49, 970986.09it/s]
ENCFF122BQB.bam: 13616802it [00:49, 972364.91it/s]
ENCFF122BQB.bam: 13714057it [00:49, 960004.62it/s]
ENCFF122BQB.bam: 13811303it [00:50, 963692.93it/s]
ENCFF122BQB.bam: 13907723it [00:50, 963838.64it/s]
ENCFF122BQB.bam: 14006104it [00:50, 969542.98it/s]
ENCFF122BQB.bam: 14104716it [00:50, 974491.74it/s]
ENCFF122BQB.bam: 14202595it [00:50, 975775.00it/s]
ENCFF122BQB.bam: 14300185it [00:50, 963043.02it/s]
ENCFF122BQB.bam: 14397169it [00:50, 965055.98it/s]
ENCFF122BQB.bam: 14496565it [00:50, 973657.33it/s]
ENCFF122BQB.bam: 14594552it [00:50, 975451.27it/s]
ENCFF122BQB.bam: 14692119it [00:50, 975107.65it/s]
ENCFF122BQB.bam: 14789645it [00:51, 960259.51it/s]
ENCFF122BQB.bam: 14885729it [00:51, 958703.40it/s]
ENCFF122BQB.bam: 14982222it [00:51, 960279.96it/s]
ENCFF122BQB.bam: 15078596it [00:51, 961304.45it/s]
ENCFF122BQB.bam: 15176442it [00:51, 966419.25it/s]
ENCFF122BQB.bam: 15273102it [00:51, 958931.66it/s]
ENCFF122BQB.bam: 15369019it [00:51, 944851.48it/s]
ENCFF122BQB.bam: 15464766it [00:51, 948568.19it/s]
ENCFF122BQB.bam: 15561546it [00:51, 954269.81it/s]
ENCFF122BQB.bam: 15659213it [00:51, 960933.01it/s]
ENCFF122BQB.bam: 15758997it [00:52, 971940.57it/s]
ENCFF122BQB.bam: 15859242it [00:52, 981056.00it/s]
ENCFF122BQB.bam: 15958383it [00:52, 984071.19it/s]
ENCFF122BQB.bam: 16058356it [00:52, 988412.72it/s]
ENCFF122BQB.bam: 16157799it [00:52, 990211.72it/s]
ENCFF122BQB.bam: 16258715it [00:52, 995827.40it/s]
ENCFF122BQB.bam: 16358305it [00:52, 985990.10it/s]
chrEBV encountered in input but not in FASTA/chrom sizes.
ENCFF122BQB.bam: 16386917it [00:52, 311047.59it/s]
ENCFF772ZHK.bam: 33635580it [01:37, 726022.67it/s]
chrEBV encountered in input but not in FASTA/chrom sizes.
ENCFF772ZHK.bam: 33653792it [01:37, 344068.85it/s]
Step 0.3: Find GC-matched negative regions.
Processing given loci.
Getting N percentages: 18213it [00:00, 181875.64it/s]
Getting GC percentages: 18213it [00:00, 92792.60it/s]
Getting background GC: 100%|██████████████████| 455/455 [00:15<00:00, 30.02it/s]
GC Bin  Background Count  Peak Count  Chosen Count
0.00:                  0           0             0
0.02:                  0           0             0
0.04:                  1           0             0
0.06:                  6           0             0
0.08:                 11           0             0
0.10:                 16           0             0
0.12:                 23           0             0
0.14:                 32           0             0
0.16:                 46           0             0
0.18:                 45           0             0
0.20:                 83           0             0
0.22:                142           2             2
0.24:                273           1             1
0.26:               1428           4             4
0.28:               8586          47            47
0.30:              32046         223           223
0.32:              74809         714           714
0.34:             127141        1281          1281
0.36:             176336        1859          1859
0.38:             208003        2258          2258
0.40:             187209        2370          2370
0.42:             149663        2206          2206
0.44:             121417        1905          1905
0.46:              96712        1514          1514
0.48:              69704        1109          1109
0.50:              49892         896           896
0.52:              34582         564           564
0.54:              25016         417           417
0.56:              18717         299           299
0.58:              14014         233           233
0.60:               9617         139           139
0.62:               6615          67            67
0.64:               4530          51            51
0.66:               2558          30            30
0.68:               1327          13            13
0.70:                576           8             8
0.72:                329           4             4
0.74:                154           0             0
0.76:                 62           0             0
0.78:                 23           0             0
0.80:                 12           0             0
0.82:                  1           0             0
0.84:                  2           0             0
0.86:                  0           0             0
0.88:                  0           0             0
0.90:                  0           0             0
0.92:                  0           0             0
0.94:                  0           0             0
0.96:                  0           0             0
0.98:                  0           0             0
1.00:                  0           0             0
GC-bin KS test stat:0.0, p-value 1.0
Step 1: Fitting a BPNet model
Training Chroms: ['chr2', 'chr4', 'chr5', 'chr7', 'chr9', 'chr10', 'chr11', 'chr12', 'chr13', 'chr14', 'chr15', 'chr16', 'chr17', 'chr18', 'chr19', 'chr21', 'chr22', 'chrX', 'chrY']
Vaidation Chroms: ['chr8', 'chr20']
Loading peaks from: ['https://www.encodeproject.org/files/ENCFF331IUK/@@download/ENCFF331IUK.bed.gz']
Loading negatives from: ['quick-myc.negatives.bed']
Loading Loci: 100%|█████████████████████| 12365/12365 [00:01<00:00, 6818.95it/s]
Loading Loci: 100%|█████████████████████| 12760/12760 [00:01<00:00, 8266.28it/s]
Filtered Peaks: 66
Filtered Negatives: 0
Loading Loci: 100%|███████████████████████| 1792/1792 [00:00<00:00, 7195.70it/s]
Training Set Peaks: 12299
Training Set Negatives: 12760
Validation Set Size: 1792
Count loss weight set to 34.06
Negative Ratio: 1:0.333 pos:neg
Model has 8 dilated layers and 64 filters
Model has 114224 trainable parameters.
Warning: BPNet and ChromBPNet models trained using bpnet-lite may underperform those trained using the official repositories. See the GitHub README for further documentation.
Epoch  Iteration  Training Time  Validation Time  Training MNLL  Training Count MSE  Validation MNLL  Validation Profile Pearson  Validation Count Pearson  Validation Count MSE  Saved?
0      257        2.5736         0.1029           264.9587       0.3918              300.3564         0.104345106                 0.47176304                0.294                 True
1      514        1.7206         0.0913           215.5858       0.4594              260.7704         0.22135003                  0.39767185                0.462                 True
2      771        1.7087         0.0838           277.5604       0.9036              255.937          0.23200242                  0.37501767                0.2905                True
3      1028       1.7126         0.083            262.9023       0.3599              254.7375         0.23529406                  0.43602714                0.2941                True
4      1285       1.7086         0.0872           209.4381       0.4382              256.2182         0.2381413                   0.5225112                 0.2713                False
5      1542       1.7064         0.0874           240.7835       0.3972              253.5245         0.24027398                  0.56034505                0.2097                True
6      1799       1.7074         0.1008           224.9795       0.0983              253.0069         0.24281423                  0.6067278                 0.1821                True
7      2056       1.7107         0.0816           247.7721       0.1696              252.595          0.2446042                   0.6034302                 0.1935                True
8      2313       1.7059         0.081            284.273        0.1083              251.8988         0.24576186                  0.6524341                 0.1647                True
9      2570       1.705          0.0839           227.1236       0.2407              252.1148         0.24590006                  0.6052533                 0.2027                False
10     2827       1.7058         0.0807           200.6794       0.0286              252.2623         0.24741817                  0.6203805                 0.1998                False
11     3084       1.7048         0.0809           250.2796       0.1709              251.474          0.24768683                  0.67776453                0.153                 True
12     3341       1.7056         0.083            239.7303       0.2763              251.3146         0.24752548                  0.68329793                0.1481                True
13     3598       1.7056         0.081            222.844        0.361               250.6782         0.24909927                  0.6265607                 0.1922                False
14     3855       1.706          0.0829           204.7631       0.3058              250.9392         0.24914667                  0.6430525                 0.1839                False
15     4112       1.7055         0.0809           279.9958       0.5954              250.9259         0.24897979                  0.6743577                 0.1651                False
16     4369       1.712          0.0821           197.9249       0.2025              250.985          0.24975494                  0.658726                  0.1647                False
17     4626       1.7124         0.0843           226.5629       0.2748              250.743          0.24907656                  0.60358965                0.2295                False
Loading Loci: 100%|███████████████████████| 1792/1792 [00:00<00:00, 6825.98it/s]
100%|███████████████████████████████████████████| 28/28 [00:00<00:00, 93.93it/s]
100%|██████████████████████████████████████████| 28/28 [00:00<00:00, 381.91it/s]
profile_mnll        profile_jsd         profile_pearson      profile_spearman     count_pearson       count_spearman      count_mse
123.77391052246094  0.5475015044212341  0.25038570165634155  0.04760761931538582  0.6882738471031189  0.6886327862739563  0.14510877430438995
Step 2: Calculating attributions
Loading Loci: 100%|██████████████████████| 1792/1792 [00:00<00:00, 25105.27it/s]
100%|███████████████████████████████████| 35840/35840 [00:16<00:00, 2236.42it/s]
Step 3.1: Seqlet identification
Step 4.1: TF-MoDISco motifs
Using 4318 positive seqlets
Step 4.2: TF-MoDISco reports
At the end of this process, we have converted the BAMs into bigWigs, identified GC-matched negatives from the provided peaks, trained and evaluated a BPNet model, run DeepLIFT/SHAP to get attributions, identified seqlets, run TF-MoDISco, and generated the report. All of these products are now on disk, prefixed with the name you provided in the pipeline-json command. This means that you can reuse the processed data in a later run, load the BPNet model into Python scripts and use it however you like, and look through the TF-MoDISco report without needing to run the command yourself.
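Because the products are plain files, they are easy to inspect from Python. As a minimal sketch, the snippet below finds the last epoch whose model was saved, assuming the training log is available as tab-separated text with the header shown above. The function name `last_saved_epoch` is made up for this illustration, and the three rows are copied from the training output above rather than read from disk.

```python
import csv
import io

# Header and three rows copied from the training output above. They are joined
# with tabs here on the assumption that the log is tab-separated.
HEADER = ["Epoch", "Iteration", "Training Time", "Validation Time",
          "Training MNLL", "Training Count MSE", "Validation MNLL",
          "Validation Profile Pearson", "Validation Count Pearson",
          "Validation Count MSE", "Saved?"]
ROWS = [
    "11 3084 1.7048 0.0809 250.2796 0.1709 251.474 0.24768683 0.67776453 0.153 True",
    "12 3341 1.7056 0.083 239.7303 0.2763 251.3146 0.24752548 0.68329793 0.1481 True",
    "13 3598 1.7056 0.081 222.844 0.361 250.6782 0.24909927 0.6265607 0.1922 False",
]
LOG_TEXT = "\n".join(["\t".join(HEADER)] + ["\t".join(r.split()) for r in ROWS])


def last_saved_epoch(log_text):
    """Return the row (as a dict) of the last epoch marked as saved, or None."""
    reader = csv.DictReader(io.StringIO(log_text), delimiter="\t")
    saved = [row for row in reader if row["Saved?"].strip() == "True"]
    return saved[-1] if saved else None


best = last_saved_epoch(LOG_TEXT)
print(best["Epoch"], best["Validation MNLL"], best["Validation Count Pearson"])
```

Note that epoch 13 has a lower validation MNLL than epoch 12 but was not saved, which is why the sketch reads the Saved? column instead of trying to re-derive the saving criterion from a single metric.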
If you had provided a motif database, there would also be Steps 5.1 and 5.2, which run in silico marginalizations and create a similar report based on the marginal influence each motif has when run through the model.
You will also notice that several JSONs are now in the specified directories. These are the JSONs that were passed into the other subcommands, e.g., bpnet fit and bpnet attribute, and they serve as a record of the precise commands used in each of those steps.
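To review what each step was run with, these JSONs can be loaded with the standard library. The following is a minimal sketch: the helper names `load_run_configs` and `summarize` are made up for this illustration, and it assumes only that the files end in `.json` and share the prefix you chose. It makes no assumptions about which keys the files contain.

```python
import json
from pathlib import Path


def load_run_configs(directory, prefix):
    """Load every JSON file in `directory` whose file name starts with `prefix`."""
    configs = {}
    for path in sorted(Path(directory).glob(f"{prefix}*.json")):
        with open(path) as handle:
            configs[path.name] = json.load(handle)
    return configs


def summarize(configs):
    """Return one line per file listing its top-level keys (if it holds an object)."""
    lines = []
    for name, cfg in configs.items():
        keys = ", ".join(sorted(cfg)) if isinstance(cfg, dict) else "(not a JSON object)"
        lines.append(f"{name}: {keys}")
    return lines


# Example: list the settings files written next to the other outputs.
# The directory and prefix here are placeholders for whatever you used.
for line in summarize(load_run_configs(".", "quick-myc")):
    print(line)
```

Comparing the loaded dictionaries between two runs is a quick way to see exactly which settings changed.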
Finally, the pipeline command has no memory. This means that if you run it to process BAMs into bigWigs and then run it a second time, it will process the BAMs into bigWigs again. This is useful if you want to try out several different settings without worrying about previous runs contaminating future runs, but it can be annoying if you are only trying out different model architecture parameters. Fortunately, because bam2bw is so fast, reprocessing small to moderately sized datasets each time is not a big deal. If it is annoying, you can replace the BAM files with the bigWig files after the first run.
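One way to make that swap without editing the JSON by hand is sketched below. Everything specific in it is hypothetical: the `inputs` key, the file names, and the helper `swap_inputs` are placeholders, so check your own pipeline JSON for the real field names and the first run's output for the real bigWig paths. The helper walks the parsed JSON and replaces any list entry found in a user-supplied mapping, where one old path may map to several new ones (for example, one bigWig per strand).

```python
import json


def swap_inputs(config, replacements):
    """Return a copy of `config` with matching list entries replaced.

    `replacements` maps an old path to a list of new paths. Only strings that
    appear as list entries are replaced; all other values are copied as-is.
    """
    if isinstance(config, dict):
        return {key: swap_inputs(value, replacements) for key, value in config.items()}
    if isinstance(config, list):
        swapped = []
        for item in config:
            if isinstance(item, str) and item in replacements:
                swapped.extend(replacements[item])  # splice in the new paths
            else:
                swapped.append(swap_inputs(item, replacements))
        return swapped
    return config


# Hypothetical pipeline settings; the real key names may differ.
pipeline = {"inputs": ["example.bam"], "name": "example"}
# Hypothetical bigWig names; use the files actually written by the first run.
updated = swap_inputs(pipeline, {"example.bam": ["example.+.bw", "example.-.bw"]})
print(json.dumps(updated, indent=2))
```

Writing `updated` back out with `json.dump` gives a settings file for later runs that starts from the bigWigs and leaves the original file untouched.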