<!DOCTYPE html>
<html>
  <head>
    <title>Nanopore on the command line</title>
    <meta charset="utf-8">
    <meta name="author" content="Alexis Lucattini" />
    <link href="libs/remark-css-0.0.1/example.css" rel="stylesheet" />
  </head>
  <body>
    <textarea id="source">
class: center, middle, inverse, title-slide

# Nanopore on the command line
## and python3 virtual environments
### Alexis Lucattini
### 2017/08/17

---


# Installing commandline tools (0 to 10 minutes)
Mac OSX by default does not have command line tools installed.
This is a simple fix as shown [here](http://railsapps.github.io/xcode-command-line-tools.html)

Open up 'Terminal' and type in the following.

```bash
xcode-select -p
```

A path such as `/Applications/Apple Dev Tools/Xcode.app/Contents/Developer` should appear as the output.
If you get an error instead, type:

```bash
xcode-select --install
```

---

# Install anaconda3 on desktop (approx. 5 mins)
Open up Terminal and type in the following lines of code.
(Lines starting with '#' are comments and do not need to be typed.)


```bash
# Set the version number
anaconda_version="4.4.0"
# Download anaconda3 using the wget command
# (wget is not installed on a Mac by default; 'curl -O' followed by the same URL also works)
wget https://repo.continuum.io/archive/Anaconda3-${anaconda_version}-MacOSX-x86_64.sh
# Install anaconda.
# -b runs the installer in batch mode, without asking questions.
# -p sets the install location, here a folder in our home directory.
bash Anaconda3-${anaconda_version}-MacOSX-x86_64.sh -b -p $HOME/anaconda3
# Batch mode does not add conda to our PATH, so we do it for this session.
export PATH="$HOME/anaconda3/bin:$PATH"
# Now we need to update it.
conda update conda -y
# And we may need to install the latest version of git
conda install -c anaconda git -y
```

When on a server that uses modules, anaconda may already be installed.
If so, just type in the following:

```bash
module load anaconda3/4.3.1
```

---

# Installing Albacore
.small[
### Linux Users:
Unfortunately, albacore is not supported on python3.6 on Linux.
Therefore we will need to create a python3.5 environment to run our basecalling software in.
`https://mirror.oxfordnanoportal.com/software/analysis/ont_albacore-1.2.6-cp35-cp35m-manylinux1_x86_64.whl`

### Mac Users:
Albacore is supported on python3.6.
Nevertheless, we should still create a separate environment for albacore to run in.
`https://mirror.oxfordnanoportal.com/software/analysis/ont_albacore-1.2.6-cp36-cp36m-macosx_10_11_x86_64.whl`

### Windows Users:
Like Linux, Windows Users can only use python3.5.
`https://mirror.oxfordnanoportal.com/software/analysis/ont-albacore-1.2.6-amd64.msi`
]

???

Worth checking out that Linux distro on Windows.

---

# Creating a conda environment (10 mins)
An environment is a self-contained set of software versions and paths, configured for a particular program or group of programs. Unlike your general workspace, an environment must be 'sourced' (activated) before use, and anything installed into it will not disrupt your general workspace.

Here we show an example of creating an environment for albacore.

```bash
# Use 3.5 on our Linux server; swap this for 3.6 on a Mac
PYTHON_VERSION=3.5
conda create --name albacore_env python=${PYTHON_VERSION} anaconda
```
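
---

# What activating an environment does

As a rough illustration of why an environment does not disturb the general workspace: activating one mostly amounts to putting the environment's `bin` directory at the front of `PATH`, for the current shell only. The sketch below imitates that idea. It is not conda's actual implementation; `show_activated_path` is a hypothetical helper, and the path assumes anaconda3 was installed into `$HOME/anaconda3`.

```bash
# Hypothetical helper (not part of conda): print what PATH would look like
# once an environment's bin directory has been placed at the front.
show_activated_path() {
    local env_dir="$1"
    echo "${env_dir}/bin:${PATH}"
}

# Show the first three PATH entries, one per line.
# Assumes anaconda3 was installed into $HOME/anaconda3.
show_activated_path "$HOME/anaconda3/envs/albacore_env" | tr ':' '\n' | head -n 3
```

Programs found in the first entry win, which is why the environment's own python and pip are the ones that run after activation.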

---

# Installing albacore in the conda environment
.small[
Now that we have our albacore environment, we must 'source' it.

If you can't remember the name of an environment, you can see all your installed environments using:

```bash
conda info --envs
```

Now activate this environment, and install the .whl file using pip.

```bash
# Activate environment
source activate albacore_env
# Update the standard conda library (especially important when using python 3.5)
conda update --all
# Create a standard yaml file
conda env export &gt; standard.yaml
# Download only ONE of the two wheels below, otherwise the wildcard matches both.
# Albacore pip wheel for Mac:
wget https://mirror.oxfordnanoportal.com/software/analysis/ont_albacore-1.2.6-cp36-cp36m-macosx_10_11_x86_64.whl
# Or for Linux:
wget https://mirror.oxfordnanoportal.com/software/analysis/ont_albacore-1.2.6-cp35-cp35m-manylinux1_x86_64.whl
# Install albacore using pip
pip install ont_albacore-*.whl  # The star is a wildcard matching the rest of the filename
# Write what we have installed to file
conda env export &gt; albacore.yaml
# Deactivate the albacore environment
source deactivate
```
]

???

---

# Installing our other tools.
It may be wise to keep albacore in its own environment and install our other tools in a separate one.
Albacore changes quickly, with frequent upgrades.


```bash
# Create a new environment
conda create --name nanopore_tools_env python=3.6 anaconda
# Activate the environment
source activate nanopore_tools_env
# Update the standard conda library
conda update --all
# Create a standard .yaml file (picture of the blank environment)
conda env export &gt; standard.3.6.yaml
```

Now let's use conda to install some more analysis tools.

---

# Installing useful nanopore tools.

We use minimap2 from Heng Li to align these long inaccurate reads to our genome.


```bash
# Pauvre, for viewing quality and read-length distributions.
conda install -c bioconda pauvre -y
# bwa-mem and minimap2 alignment
conda install -c bioconda bwa -y
conda install -c bioconda minimap2 -y
# Samtools and bamtools for sorting and assessing alignments
conda install -c bioconda samtools -y
conda install -c bioconda bamtools -y
# Assemblers:
conda install -c bioconda unicycler -y
conda install -c bioconda canu -y
```
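
---

# Checking the tools are on our PATH

After installing, it can be worth confirming that each tool is actually visible from the active environment. A minimal sketch, where `missing_tools` is a hypothetical helper name and the tool list simply mirrors the installs above (the command names are assumed to match the package names):

```bash
# Hypothetical helper: print the name of every tool that cannot be found.
# 'command -v' prints the location of a tool, or nothing if it is missing.
missing_tools() {
    local tool
    for tool in "$@"; do
        if ! command -v "${tool}" | grep -q .; then
            echo "${tool}"
        fi
    done
}

# Anything printed here still needs installing (or the environment is not active).
missing_tools pauvre bwa minimap2 samtools bamtools unicycler canu
```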

---

# Check what we have installed


```bash
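conda env export &gt; nanopore.3.6.yaml
# Lines starting with '&lt;' in the diff output are only in nanopore.3.6.yaml, i.e. what we added.
# (Searching for '==' would only pick up pip packages; conda packages use a single '='.)
diff nanopore.3.6.yaml standard.3.6.yaml | grep '^&lt;' &gt; requirements.txt
cat requirements.txt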
# Return to normal environment
source deactivate
```
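
---

# Listing what was added (alternative)

Another way to list what was installed on top of the baseline export is to ask grep for the lines that are in one file but not the other. A minimal sketch, where `added_packages` is a hypothetical helper name:

```bash
# Hypothetical helper: print lines that appear in the current export ($2)
# but not in the baseline export ($1).
# -F fixed strings, -x whole-line match, -v invert, -f read patterns from a file.
added_packages() {
    local baseline="$1"
    local current="$2"
    grep -vxFf "${baseline}" "${current}"
}

# Usage, with the file names from this tutorial:
# added_packages standard.3.6.yaml nanopore.3.6.yaml
```

Unlike diff, this ignores line order, so a package that merely moved position in the export is not reported.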

---

# Transferring data across

On the laptop running the MinION we will need to do the following.
Note that sample_name is the name specified in MinKNOW.


```bash
source activate nanopore_tools_env
git clone https://github.com/alexiswl/poreduck.git
# To run the transfer script we will need to type in the following:
./poreduck/transfer_fast5_to_server.py \
--reads_dir &lt;/path/to/reads&gt; \
--server_name &lt;your_hpc&gt; \
--user_name &lt;user_on_hpc&gt; \
--dest_directory &lt;/path/to/dest/on/hpc&gt; \
--sample_name &lt;name_of_sample&gt;
```


---


# Running albacore

Albacore has two main commands.
* `read_fast5_basecaller.py`
* `full_1d2_analysis.py`

We only use the second one with the SQK-LSK308 kit.

We can check the options by typing the following into the terminal:

```bash
source activate albacore_env
read_fast5_basecaller.py --help
```

---

# Required options for a standard albacore command

```bash
read_fast5_basecaller.py \
--input &lt;path/to/fast5/files&gt; \
--worker_threads &lt;number_of_threads_used&gt; \
--save_path &lt;/path/to/albacore/dir/&gt; \
--flowcell &lt;flowcell_version&gt; \
--kit &lt;kit_version&gt;
```

---

# Poreduck: downloading and updating


```bash
# Download poreduck
git clone https://github.com/alexiswl/poreduck.git
# Update poreduck
cd poreduck
git pull origin master
```

---

# Poreduck: albacore_server_scaled.py


```bash
# Getting help
albacore_server_scaled.py --help
# Run albacore through poreduck
albacore_server_scaled.py \
--reads_dir &lt;/minion/directory&gt; \
--kit SQK-LSK108 \
--flowcell FLO-MIN106 \
--num_threads 5 \
--max_processes 10
```

---


# Introduction to qsub
qsub is how many users interact with an HPC, such as Milton: it submits jobs to the cluster's scheduler.
The scheduler allocates partitions of the server to users in a 'fair' manner.
To submit a job, we pipe the albacore command into the qsub command, which tells the scheduler to run the albacore job on the cluster.
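
A minimal sketch of that pattern, assuming a PBS/Torque-style scheduler where `qsub` reads the job script from standard input. The resource flags your cluster expects are site-specific and not shown. `make_albacore_job` is a hypothetical helper, the paths are placeholders, and the kit and flowcell values are only examples.

```bash
# Hypothetical helper: print a small job script that runs albacore.
# Arguments: fast5 input directory, output directory, number of threads.
make_albacore_job() {
    local input_dir="$1"
    local save_path="$2"
    local threads="$3"
    echo "source activate albacore_env"
    echo "read_fast5_basecaller.py --input ${input_dir} --worker_threads ${threads} --save_path ${save_path} --flowcell FLO-MIN106 --kit SQK-LSK108"
}

# Inspect the job script first...
make_albacore_job /path/to/fast5 /path/to/albacore_out 4
# ...then pipe it into qsub to submit it (uncomment on the cluster):
# make_albacore_job /path/to/fast5 /path/to/albacore_out 4 | qsub
```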


---
# Pauvre
.pull-left[
* Yield and read-length distribution plots.
* Summary statistics
]
.pull-right[
&lt;img src=images/pauvre_example.png width="100%"&gt;
]

```bash
# Create a margin plot
pauvre marginplot --fastq &lt;input_fastq_file&gt;
# Create a summary file.
pauvre stats --fastq &lt;input_fastq_file&gt;
```
---

# Aligning to the genome

```bash
# Create a reference index
minimap2 -x map-ont -d reference_index.mmi /path/to/reference_genome
# Use minimap2 to align to the genome. -a gives sam output instead of the default paf.
minimap2 -ax map-ont reference_index.mmi /path/to/fastq &gt; alignment.sam
# The output is a sam file. We should convert this to a bam file and sort it.
samtools view -b -o alignment.bam alignment.sam
# Now sort and index the bam file
samtools sort -o alignment.sorted.bam alignment.bam
samtools index alignment.sorted.bam alignment.sorted.bai
```
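
---

# Aligning and sorting in one step

The intermediate sam and unsorted bam files can be skipped by piping minimap2 straight into `samtools sort`. A minimal sketch, where `align_and_sort` is a hypothetical wrapper name and the arguments are placeholders:

```bash
# Hypothetical wrapper: align reads and write a sorted, indexed bam in one go.
# Arguments: reference (fasta or minimap2 index), reads fastq, output bam name.
align_and_sort() {
    local reference="$1"
    local reads="$2"
    local out_bam="$3"
    # -a asks minimap2 for sam output; '-' tells samtools sort to read from the pipe.
    minimap2 -ax map-ont "${reference}" "${reads}" | samtools sort -o "${out_bam}" -
    samtools index "${out_bam}"
}

# Usage:
# align_and_sort /path/to/reference_genome /path/to/fastq alignment.sorted.bam
```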

---

# Canu: de novo assembly.

Canu is a de novo assembler, i.e. it assembles a genome without using a reference.
It requires the user to have an estimate of the genome size prior to use.
It is designed for long, inaccurate reads.
It corrects, trims and then assembles the reads.


```bash
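# -p sets the prefix for the output file names ('assembly' is just a placeholder).
# The read files must come directly after -nanopore-raw.
canu \
-p assembly \
-d canu_assembly_directory \
genomeSize=3g \
-nanopore-raw *.fastq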
```


Full documentation at:
http://canu.readthedocs.org/en/latest/
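
---

# Canu: a rough coverage check

Since canu needs a genome size estimate anyway, the same number can be used to get a rough idea of how much coverage the reads provide (total bases divided by genome size). A minimal sketch, assuming plain four-line fastq records; `total_bases` and `estimate_coverage` are hypothetical helper names:

```bash
# Hypothetical helper: sum the lengths of all sequences in a fastq file.
# Assumes four-line records, so sequences sit on lines 2, 6, 10, ...
total_bases() {
    awk 'NR % 4 == 2 { total += length($0) } END { printf "%.0f\n", total }' "$1"
}

# Hypothetical helper: approximate fold coverage = total bases / genome size.
# The genome size is given in plain bases, e.g. 3000000000 rather than 3g.
estimate_coverage() {
    local fastq="$1"
    local genome_size="$2"
    total_bases "${fastq}" | awk -v g="${genome_size}" '{ printf "%.1f\n", $1 / g }'
}

# Usage:
# estimate_coverage reads.fastq 3000000000
```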

---

# Unicycler: for hybrid assembly.

Unicycler differs from canu in that it also takes in short reads, which are used to improve the accuracy of the assembly.

To use Unicycler in this way, you must have short-read Illumina data for the same sample.


```bash
unicycler \
-1 short_reads_1.fastq.gz \
-2 short_reads_2.fastq.gz \
-l long_reads.fastq.gz \
-o output_dir
```

Full documentation at:
https://github.com/rrwick/Unicycler
    </textarea>
<script src="https://remarkjs.com/downloads/remark-latest.min.js"></script>
<script>var slideshow = remark.create({
"highlight": "pygments",
"highlightLines": true,
"countIncrementalSlides": false
});
if (window.HTMLWidgets) slideshow.on('afterShowSlide', function (slide) {window.dispatchEvent(new Event('resize'));});</script>

<script type="text/x-mathjax-config">
MathJax.Hub.Config({
  tex2jax: {
    skipTags: ['script', 'noscript', 'style', 'textarea', 'pre']
  }
});
</script>
<!-- dynamically load mathjax for compatibility with self-contained -->
<script>
(function () {
  var script = document.createElement('script');
  script.type = 'text/javascript';
  script.src  = 'https://cdn.bootcss.com/mathjax/2.7.1/MathJax.js?config=TeX-MML-AM_CHTML';
  if (location.protocol !== 'file:' && /^https?:/.test(script.src))
    script.src  = script.src.replace(/^https?:/, '');
  document.getElementsByTagName('head')[0].appendChild(script);
})();
</script>
  </body>
</html>