diff --git a/Help/1 Introduction/1.1 What is FastQC.html b/Help/1 Introduction/1.1 What is FastQC.html index a124ff3..dc0e4a9 100644 --- a/Help/1 Introduction/1.1 What is FastQC.html +++ b/Help/1 Introduction/1.1 What is FastQC.html @@ -1,7 +1,7 @@ -What is FastQC? +1.1 - What is FastQC? -

What is FastQC

+

1.1 - What is FastQC

Modern high throughput sequencers can generate hundreds of millions of sequences in a single run. Before analysing this sequence to draw biological conclusions @@ -18,7 +18,7 @@

What is FastQC

may affect how you can usefully use it.

-Most sequencers will generate a QC report as part of their analysis pipeline, +Most sequencers will generate a QC report as part of their analysis pipeline, but this is usually only focused on identifying problems which were generated by the sequencer itself. FastQC aims to provide a QC report which can spot problems which originate either in the sequencer or in the starting library @@ -31,6 +31,36 @@

What is FastQC

for integrating into a larger analysis pipeline for the systematic processing of large numbers of files.

+
+ - \ No newline at end of file + diff --git a/Help/2 Basic Operations/2.1 Opening a sequence file.html b/Help/2 Basic Operations/2.1 Opening a sequence file.html index 993e802..f25b10e 100644 --- a/Help/2 Basic Operations/2.1 Opening a sequence file.html +++ b/Help/2 Basic Operations/2.1 Opening a sequence file.html @@ -1,7 +1,7 @@ -Opening a FastQ file +2.1 - Opening a FastQ file -

Opening a Sequence file

+

2.1 - Opening a Sequence file

To open one or more Sequence files interactively simply run the program and select File > Open. You can then select the files @@ -34,7 +34,7 @@

Opening a Sequence file

  • BAM
  • SAM/BAM Mapped only (normally used for colorspace data)
  • - +

    * Casava fastq format is the same as regular fastq except that the data is usually split across multiple files for a single sample. @@ -44,20 +44,48 @@

    Opening a Sequence file

    In Casava mode the program will exclude these flagged sequences from the report.

    - +

    By default FastQC will try to guess the file format from the name - of the input file. Anything ending in .sam or .bam will be + of the input file. Anything ending in .sam or .bam will be opened as a SAM/BAM file (using all sequences, mapped and unmapped) - , and everything else will be treated as FastQ format. If you want + , and everything else will be treated as FastQ format. If you want to override this detection and specify the file format manually then you can use the drop down file filter in the file chooser to - select the type of file you're going to load. You need to use the + select the type of file you're going to load. You need to use the drop down selector to make the program use the Mapped BAM or Casava file modes as these won't be selected automatically.
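    As an illustration only, the filename rule described above can be sketched in Python. The function name `guess_format` and the returned labels are hypothetical and are not part of FastQC; matching the extension case-insensitively is also an assumption. The Mapped BAM and Casava modes are deliberately absent because, as stated above, they are never chosen automatically.

```python
def guess_format(filename: str) -> str:
    """Guess the input format from the file name alone.

    Hypothetical helper: names ending in .sam or .bam are treated as
    SAM/BAM, and everything else is treated as FastQ.
    """
    # Assumption: extension matching ignores case.
    name = filename.lower()
    if name.endswith((".sam", ".bam")):
        return "sam/bam"
    return "fastq"


if __name__ == "__main__":
    for example in ("reads.bam", "aligned.sam", "sample_R1.fastq.gz"):
        print(example, "->", guess_format(example))
```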

    - - +
    + \ No newline at end of file diff --git a/Help/2 Basic Operations/2.2 Evaluating Results.html b/Help/2 Basic Operations/2.2 Evaluating Results.html index c9ff2d9..21e2a93 100644 --- a/Help/2 Basic Operations/2.2 Evaluating Results.html +++ b/Help/2 Basic Operations/2.2 Evaluating Results.html @@ -1,7 +1,7 @@ -Evaluating Results +2.2 - Evaluating Results -

    Evaluating Results

    +

    2.2 - Evaluating Results

    The analysis in FastQC is performed by a series of analysis modules. The left hand side of the main interactive display @@ -33,5 +33,35 @@

    Evaluating Results

    Specific guidance on how to interpret the output of each module can be found in the modules section of the help.

    +
    + \ No newline at end of file diff --git a/Help/2 Basic Operations/2.3 Saving a Report.html b/Help/2 Basic Operations/2.3 Saving a Report.html index 4d01778..177114a 100644 --- a/Help/2 Basic Operations/2.3 Saving a Report.html +++ b/Help/2 Basic Operations/2.3 Saving a Report.html @@ -1,7 +1,7 @@ -Saving a Report +2.3 - Saving a Report -

    Saving a Report

    +

    2.3 - Saving a Report

    In addition to providing an interactive report FastQC also has the option to create an HTML version of this report @@ -28,12 +28,41 @@

    Saving a Report

    The HTML file which is saved is a self-contained document with all of the graphs embedded into it, so you can distribute this single file. Alongside the HTML file is a zip file (with the -same name as the HTML file, but with .zip added to the end). +same name as the HTML file, but with .zip added to the end). This file contains the graphs from the report as separate files -but also contains data files which are designed to be easily -parsed to allow for a more detailed and automated evaluation of +but also contains data files which are designed to be easily +parsed to allow for a more detailed and automated evaluation of the raw data on which the QC report is built.

    - +
    + \ No newline at end of file diff --git a/Help/3 Analysis Modules/1 Basic Statistics.html b/Help/3 Analysis Modules/1 Basic Statistics.html index 76766aa..a0ecaba 100644 --- a/Help/3 Analysis Modules/1 Basic Statistics.html +++ b/Help/3 Analysis Modules/1 Basic Statistics.html @@ -1,7 +1,7 @@ -Basic Statistics +3.1 - Basic Statistics -

    Basic Statistics

    +

    3.1 - Basic Statistics

    Summary

    The Basic Statistics module generates some simple composition @@ -21,14 +21,14 @@

    Summary

  • File type: Says whether the file appeared to contain actual base calls or colorspace data which had to be converted to base calls
  • Encoding: Says which ASCII encoding of quality values was found in this -file. -
  • Total Sequences: A count of the total number of sequences processed. +file.
  • +
  • Total Sequences: A count of the total number of sequences processed. There are two values reported, actual and estimated. At the moment these will always be the same. In the future it may be possible to analyse just a subset of sequences and estimate the total number, to speed up the analysis, but since we have found that problematic sequences are not evenly distributed through a file we have disabled this for now.
  • -
  • Filtered Sequences: If running in Casava mode sequences flagged to be +
  • Filtered Sequences: If running in Casava mode sequences flagged to be filtered will be removed from all analyses. The number of such sequences removed will be reported here. The total sequences count above will not include these filtered sequences and will be the number of sequences actually used for the @@ -53,6 +53,35 @@

    Common reasons for warnings

    This module never raises warnings or errors

    - +
    + - + \ No newline at end of file diff --git a/Help/3 Analysis Modules/10 Adapter Content.html b/Help/3 Analysis Modules/10 Adapter Content.html index 6ebfe47..8f263b8 100644 --- a/Help/3 Analysis Modules/10 Adapter Content.html +++ b/Help/3 Analysis Modules/10 Adapter Content.html @@ -1,7 +1,7 @@ -Adapter Content +3.10 - Adapter Content -

    Adapter Content

    +

    3.10 - Adapter Content

    Summary

    The Kmer Content module will do a generic analysis of all of the Kmers -in your library to find those which do not have even coverage through +in your library to find those which do not have even coverage through the length of your reads. This can find a number of different sources of bias in the library which can include the presence of read-through adapter sequences building up on the end of your sequences. @@ -26,15 +26,15 @@

    Summary

    be interested.

    -One obvious class of sequences which you might want to analyse are -adapter sequences. It is useful to know if your library contains a -significant amount of adapter in order to be able to assess whether -you need to adapter trim or not. Although the Kmer analysis can +One obvious class of sequences which you might want to analyse are +adapter sequences. It is useful to know if your library contains a +significant amount of adapter in order to be able to assess whether +you need to adapter trim or not. Although the Kmer analysis can theoretically spot this kind of contamination it isn't always clear. This module therefore does a specific search for a set of separately defined Kmers and will give you a view of the total proportion of your -library which contain these Kmers. A results trace will always be -generated for all of the sequences present in the adapter config file +library which contain these Kmers. A results trace will always be +generated for all of the sequences present in the adapter config file so you can see the adapter content of your library, even if it's low.

    @@ -60,10 +60,39 @@

    Failure

    Common reasons for warnings

    Any library where a reasonable proportion of the insert sizes are shorter -than the read length will trigger this module. This doesn't indicate a +than the read length will trigger this module. This doesn't indicate a problem as such - just that the sequences will need to be adapter trimmed before proceeding with any downstream analysis.

    - +
    + - + \ No newline at end of file diff --git a/Help/3 Analysis Modules/11 Kmer Content.html b/Help/3 Analysis Modules/11 Kmer Content.html index b2b2a82..1297715 100644 --- a/Help/3 Analysis Modules/11 Kmer Content.html +++ b/Help/3 Analysis Modules/11 Kmer Content.html @@ -1,7 +1,7 @@ -Kmer Content +3.11 - Kmer Content -

    Kmer Content

    +

    3.11 - Kmer Content

    Summary

    The analysis of overrepresented sequences will spot an increase in @@ -25,16 +25,16 @@

    Summary

    of places within your sequence then this won't be seen either by the per base content plot or the duplicate sequence analysis.
  • - +

    -The Kmer module starts from the assumption that any small fragment +The Kmer module starts from the assumption that any small fragment of sequence should not have a positional bias in its appearance within a diverse library. There may be biological reasons why certain Kmers are enriched or depleted overall, but these biases should affect all positions within a sequence equally. This module therefore measures the number of each 7-mer at each position in your library and then uses -a binomial test to look for significant deviations from an even +a binomial test to look for significant deviations from an even coverage at all positions. Any Kmers with positionally biased enrichment are reported. The top 6 most biased Kmers are additionally plotted to show their distribution. @@ -64,14 +64,43 @@

    Common reasons for warnings

    Any individually overrepresented sequences, even if not present at a high enough threshold to trigger the overrepresented sequences module will cause the Kmers from those sequences to be highly enriched in this module. These will normally appear -as sharp spikes of enrichment at a single point in the sequence, rather than a +as sharp spikes of enrichment at a single point in the sequence, rather than a progressive or broad enrichment.

    -Libraries which derive from random priming will nearly always show Kmer bias at +Libraries which derive from random priming will nearly always show Kmer bias at the start of the library due to an incomplete sampling of the possible random primers.

    - +
    + - + \ No newline at end of file diff --git a/Help/3 Analysis Modules/12 Per Tile Sequence Quality.html b/Help/3 Analysis Modules/12 Per Tile Sequence Quality.html index 7597d58..3a7bb4d 100644 --- a/Help/3 Analysis Modules/12 Per Tile Sequence Quality.html +++ b/Help/3 Analysis Modules/12 Per Tile Sequence Quality.html @@ -1,7 +1,7 @@ -Per Tile Sequence Quality +3.12 - Per Tile Sequence Quality -

    Per Tile Sequence Quality

    +

    3.12 - Per Tile Sequence Quality

    Summary

    This graph will only appear in your analysis results if you're using @@ -21,7 +21,7 @@

    Summary

    The plot shows the deviation from the average quality for each tile. -The colours are on a cold to hot scale, with cold colours being +The colours are on a cold to hot scale, with cold colours being positions where the quality was at or above the average for that base in the run, and hotter colours indicate that a tile had worse qualities than other tiles for that base. In the example below you @@ -40,27 +40,56 @@

    Warning

    This module will issue a warning if any tile shows a mean Phred score more than 2 less than the mean for that base across all -tiles. +tiles.

    Failure

    This module will raise an error if any tile shows a mean Phred score more than 5 less than the mean for that base across all -tiles. +tiles.
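    A minimal Python sketch of the two thresholds described above, offered as an illustration and not as FastQC's own code. The function `tile_status` and its input layout (tile id mapped to a list of mean Phred scores, one per base position) are hypothetical. It also assumes that "the mean for that base across all tiles" is the unweighted mean of the per-tile means.

```python
def tile_status(tile_means, warn_drop=2.0, fail_drop=5.0):
    """Classify a run as 'pass', 'warn' or 'fail' from per-tile mean qualities.

    tile_means: dict mapping a tile id to a list of mean Phred scores,
    one value per base position (hypothetical layout, same length per tile).
    """
    n_positions = len(next(iter(tile_means.values())))
    status = "pass"
    for pos in range(n_positions):
        column = [means[pos] for means in tile_means.values()]
        # Assumption: unweighted mean of the tile means at this position.
        overall = sum(column) / len(column)
        for value in column:
            drop = overall - value
            if drop > fail_drop:
                return "fail"   # more than 5 below the mean for that base
            if drop > warn_drop:
                status = "warn"  # more than 2 below the mean for that base
    return status


if __name__ == "__main__":
    tiles = {"1101": [30, 30, 30], "1102": [30, 26, 30], "1103": [30, 30, 30]}
    print(tile_status(tiles))
```

    Returning as soon as a failing tile is found keeps the sketch short; a real report would also need to record which tile and position triggered the result.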

    Common reasons for warnings

    Whilst warnings in this module can be triggered by individual specific -events we have also observed that greater variation in the phred -scores attributed to tiles can also appear when a flowcell is generally -overloaded. In this case events appear all over the flowcell rather +events we have also observed that greater variation in the phred +scores attributed to tiles can also appear when a flowcell is generally +overloaded. In this case events appear all over the flowcell rather than being confined to a specific area or range of cycles. We would generally ignore errors which mildly affected a small number of tiles for only 1 or 2 cycles, but would pursue larger effects which showed high -deviation in scores, or which persisted for several cycles. +deviation in scores, or which persisted for several cycles.

    - +
    + - + \ No newline at end of file diff --git a/Help/3 Analysis Modules/2 Per Base Sequence Quality.html b/Help/3 Analysis Modules/2 Per Base Sequence Quality.html index 45771e2..ec62246 100644 --- a/Help/3 Analysis Modules/2 Per Base Sequence Quality.html +++ b/Help/3 Analysis Modules/2 Per Base Sequence Quality.html @@ -1,7 +1,7 @@ -Per Base Sequence Quality +3.2 - Per Base Sequence Quality -

    Per Base Sequence Quality

    +

    3.2 - Per Base Sequence Quality

    Summary

    This view shows an overview of the range of quality values across all bases @@ -28,7 +28,7 @@

    Summary

    The y-axis on the graph shows the quality scores. The higher the score -the better the base call. The background of the graph divides the +the better the base call. The background of the graph divides the y axis into very good quality calls (green), calls of reasonable quality (orange), and calls of poor quality (red). The quality of calls on most platforms will degrade as the run progresses, so it is common to see @@ -62,7 +62,7 @@

    Failure

    Common reasons for warnings

    -The most common reason for warnings and failures in this module is a general +The most common reason for warnings and failures in this module is a general degradation of quality over the duration of long runs. In general sequencing chemistry degrades with increasing read length and for long runs you may find that the general quality of the run falls to a level where a warning or error @@ -71,25 +71,55 @@

    Common reasons for warnings

    If the quality of the library falls to a low level then the most common remedy is to perform quality trimming where reads are truncated based on their average -quality. For most libraries where this type of degradation has occurred you +quality. For most libraries where this type of degradation has occurred you will often be simultaneously running into the issue of adapter read-through so a combined adapter and quality trimming step is often employed.

    Another possibility is that a warning / error is triggered because of a short loss -of quality earlier in the run, which then recovers to produce later good +of quality earlier in the run, which then recovers to produce later good quality sequence. This can happen if there is a transient problem with the run (bubbles passing through a flowcell for example). You can normally see this -type of error by looking at the per-tile quality plot (if available for your +type of error by looking at the per-tile quality plot (if available for your platform). In these cases trimming is not advisable as it will remove later good sequence, but you might want to consider masking bases during subsequent mapping or assembly.

    If your library has reads of varying length then you can find a warning or error -is triggered from this module because of very low coverage for a given base range. +is triggered from this module because of very low coverage for a given base range. Before committing to any action, check how many sequences were responsible for triggering an error by looking at the sequence length distribution module results.

    +
    + - + \ No newline at end of file diff --git a/Help/3 Analysis Modules/3 Per Sequence Quality Scores.html b/Help/3 Analysis Modules/3 Per Sequence Quality Scores.html index 7dc9ac7..24dddb9 100644 --- a/Help/3 Analysis Modules/3 Per Sequence Quality Scores.html +++ b/Help/3 Analysis Modules/3 Per Sequence Quality Scores.html @@ -1,7 +1,7 @@ -Per Sequence Quality Scores +3.3 - Per Sequence Quality Scores -

    Per Sequence Quality Scores

    +

    3.3 - Per Sequence Quality Scores

    Summary

    The per sequence quality score report allows you to see if a subset @@ -52,7 +52,35 @@

    Common reasons for warnings

    be evaluated in concert with the per-tile qualities (if available) since this might indicate the reason for the loss in quality of a subset of sequences.

    - - +
    + - + \ No newline at end of file diff --git a/Help/3 Analysis Modules/4 Per Base Sequence Content.html b/Help/3 Analysis Modules/4 Per Base Sequence Content.html index bae1142..c9c53f1 100644 --- a/Help/3 Analysis Modules/4 Per Base Sequence Content.html +++ b/Help/3 Analysis Modules/4 Per Base Sequence Content.html @@ -1,7 +1,7 @@ -Per Base Sequence Content +3.4 - Per Base Sequence Content -

    Per Base Sequence Content

    +

    3.4 - Per Base Sequence Content

    Summary

    Per Base Sequence Content plots out the proportion of each base @@ -27,11 +27,11 @@

    Summary

    It's worth noting that some types of library will always produce biased -sequence composition, normally at the start of the read. Libraries +sequence composition, normally at the start of the read. Libraries produced by priming using random hexamers (including nearly all RNA-Seq libraries) and those which were fragmented using transposases inherit an intrinsic -bias in the positions at which reads start. This bias does not concern -an absolute sequence, but instead provides enrichment of a number of +bias in the positions at which reads start. This bias does not concern +an absolute sequence, but instead provides enrichment of a number of different K-mers at the 5' end of the reads. Whilst this is a true technical bias, it isn't something which can be corrected by trimming and in most cases doesn't seem to adversely affect the downstream analysis. @@ -52,34 +52,64 @@

    Failure

    Common reasons for warnings

    -There are a number of common scenarios which would elicit a warning +There are a number of common scenarios which would elicit a warning or error from this module.

    1. Overrepresented sequences: If there is any evidence of overrepresented -sequences such as adapter dimers or rRNA in a sample then these sequences +sequences such as adapter dimers or rRNA in a sample then these sequences may bias the overall composition and their sequence will emerge from this plot.
    2. Biased fragmentation: Any library which is generated based on the ligation of random hexamers or through tagmentation should theoretically have good -diversity through the sequence, but experience has shown that these libraries +diversity through the sequence, but experience has shown that these libraries always have a selection bias in around the first 12bp of each run. This is due to a biased selection of random primers, but doesn't represent any individually biased sequences. Nearly all RNA-Seq libraries will fail this module because of -this bias, but this is not a problem which can be fixed by processing, and it +this bias, but this is not a problem which can be fixed by processing, and it doesn't seem to adversely affect the ability to measure expression.
    3. Biased composition libraries: Some libraries are inherently biased in their -sequence composition. The most obvious example would be a library which has been +sequence composition. The most obvious example would be a library which has been treated with sodium bisulphite which will then have converted most of the cytosines -to thymines, meaning that the base composition will be almost devoid of cytosines +to thymines, meaning that the base composition will be almost devoid of cytosines and will thus trigger an error, despite this being entirely normal for that type of library
    4. -
    5. If you are analysing a library which has been aggressively adapter trimmed -then you will naturally introduce a composition bias at the end of the reads as -sequences which happen to match short stretches of adapter are removed, leaving +
    6. If you are analysing a library which has been aggressively adapter trimmed +then you will naturally introduce a composition bias at the end of the reads as +sequences which happen to match short stretches of adapter are removed, leaving only sequences which do not match. Sudden deviations in composition at the end of libraries which have undergone aggressive trimming are therefore likely to be spurious.
    7. - +
    +
    + - + \ No newline at end of file diff --git a/Help/3 Analysis Modules/5 Per Sequence GC Content.html b/Help/3 Analysis Modules/5 Per Sequence GC Content.html index db6f594..b32d079 100644 --- a/Help/3 Analysis Modules/5 Per Sequence GC Content.html +++ b/Help/3 Analysis Modules/5 Per Sequence GC Content.html @@ -1,7 +1,7 @@ -Per Sequence GC Content +3.5 - Per Sequence GC Content -

    Per Sequence GC Content

    +

    3.5 - Per Sequence GC Content

    Summary

    This module measures the GC content across the whole length -of each sequence in a file and compares it to a modelled +of each sequence in a file and compares it to a modelled normal distribution of GC content.

    @@ -21,7 +21,7 @@

    Summary

    In a normal random library you would expect to see a roughly -normal distribution of GC content where the central peak +normal distribution of GC content where the central peak corresponds to the overall GC content of the underlying genome. Since we don't know the GC content of the genome the modal GC content is calculated from the observed data and used to @@ -40,7 +40,7 @@

    Summary

    Warning

    -A warning is raised if the sum of the deviations from the normal +A warning is raised if the sum of the deviations from the normal distribution represents more than 15% of the reads.

    @@ -58,6 +58,35 @@

    Common reasons for warnings

    overrepresented sequences module. Broader peaks may represent contamination with a different species.

    - +
    + - + \ No newline at end of file diff --git a/Help/3 Analysis Modules/6 Per Base N Content.html b/Help/3 Analysis Modules/6 Per Base N Content.html index 443a119..05f15cf 100644 --- a/Help/3 Analysis Modules/6 Per Base N Content.html +++ b/Help/3 Analysis Modules/6 Per Base N Content.html @@ -1,7 +1,7 @@ -Per Base N Content +3.6 - Per Base N Content -

    Per Base N Content

    +

    3.6 - Per Base N Content

    Summary

    If a sequencer is unable to make a base call with sufficient confidence @@ -23,7 +23,7 @@

    Summary

    -It's not unusual to see a very low proportion of Ns appearing in a sequence, +It's not unusual to see a very low proportion of Ns appearing in a sequence, especially nearer the end of a sequence. However, if this proportion rises above a few percent it suggests that the analysis pipeline was unable to interpret the data well enough to make valid base calls. @@ -43,18 +43,48 @@

    Common reasons for warnings

    The most common reason for the inclusion of significant proportions of Ns is a general loss of quality, so the results of this module should be evaluated -in concert with those of the various quality modules. You should check the +in concert with those of the various quality modules. You should check the coverage of a specific bin, since it's possible that the last bin in this analysis -could contain very few sequences, and an error could be prematurely triggered in +could contain very few sequences, and an error could be prematurely triggered in this case.

    Another common scenario is the incidence of a high proportion of N at a small -number of positions early in the library, against a background of generally -good quality. Such deviations can occur when you have very biased sequence +number of positions early in the library, against a background of generally +good quality. Such deviations can occur when you have very biased sequence composition in the library to the point that base callers can become confused and make poor calls. This type of problem will be apparent when looking at the per-base sequence content results.

    +
    + - + \ No newline at end of file diff --git a/Help/3 Analysis Modules/7 Sequence Length Distribution.html b/Help/3 Analysis Modules/7 Sequence Length Distribution.html index 27bbbb9..d9ad328 100644 --- a/Help/3 Analysis Modules/7 Sequence Length Distribution.html +++ b/Help/3 Analysis Modules/7 Sequence Length Distribution.html @@ -1,7 +1,7 @@ -Sequence Length Distribution +3.7 - Sequence Length Distribution -

    Sequence Length Distribution

    +

    3.7 - Sequence Length Distribution

    Summary

    Some high throughput sequencers generate sequence fragments of uniform length, but others can contain reads of wildly -varying lengths. Even within uniform length libraries some -pipelines will trim sequences to remove poor quality base calls +varying lengths. Even within uniform length libraries some +pipelines will trim sequences to remove poor quality base calls from the end.

    @@ -44,6 +44,35 @@

    Common reasons for warnings

    For some sequencing platforms it is entirely normal to have different read lengths so warnings here can be ignored.

    - +
    + - + \ No newline at end of file diff --git a/Help/3 Analysis Modules/8 Duplicate Sequences.html b/Help/3 Analysis Modules/8 Duplicate Sequences.html index 296fe4e..4c2ee7b 100644 --- a/Help/3 Analysis Modules/8 Duplicate Sequences.html +++ b/Help/3 Analysis Modules/8 Duplicate Sequences.html @@ -1,7 +1,7 @@ -Duplicate Sequences +3.8 - Duplicate Sequences -

    Duplicate Sequences

    +

    3.8 - Duplicate Sequences

    Summary

    In a diverse library most sequences will occur only once in the final @@ -28,9 +28,9 @@

    Summary

    To cut down on the memory requirements for this module only sequences which first appear in the first 100,000 sequences in each file are analysed, but this should be enough to get a good impression for the duplication -levels in the whole file. Each sequence is tracked to the end of the -file to give a representative count of the overall duplication level. -To cut down on the amount of information in the final plot any sequences +levels in the whole file. Each sequence is tracked to the end of the +file to give a representative count of the overall duplication level. +To cut down on the amount of information in the final plot any sequences with more than 10 duplicates are placed into grouped bins to give a clear impression of the overall duplication level without having to show each individual duplication value. @@ -50,14 +50,14 @@
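    The tracking scheme described above can be sketched in Python as follows. This is an assumption-laden illustration, not FastQC's implementation: the names `duplication_counts` and `bin_levels` are hypothetical, and every level above 10 is collapsed into a single ">10" bin, whereas the text only says that such levels are placed into grouped bins.

```python
from collections import Counter


def duplication_counts(sequences, track_limit=100_000):
    """Count occurrences of sequences first seen within the first track_limit reads.

    Sequences that first appear later are ignored, but tracked sequences
    keep being counted to the end of the input.
    """
    counts = Counter()
    for index, seq in enumerate(sequences):
        if seq in counts:
            counts[seq] += 1          # already tracked: count to end of file
        elif index < track_limit:
            counts[seq] = 1           # new sequence inside the tracking window
    return counts


def bin_levels(counts, cap=10):
    """Histogram of duplication level -> number of distinct sequences.

    Assumption: all levels above cap share one grouped bin labelled '>cap'.
    """
    levels = Counter()
    for n in counts.values():
        key = str(n) if n <= cap else f">{cap}"
        levels[key] += 1
    return levels


if __name__ == "__main__":
    reads = ["ACGT", "TTTT", "ACGT", "GGGG", "ACGT", "TTTT"]
    print(dict(bin_levels(duplication_counts(reads, track_limit=2))))
```

    Limiting the window only bounds how many distinct sequences are held in memory; the counts for tracked sequences still reflect the whole file.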

    Summary

    blue line takes the full sequence set and shows how its duplication levels are distributed. In the red plot the sequences are de-duplicated and the proportions shown are the proportions of the deduplicated set which come from different duplication -levels in the original data. +levels in the original data.

    In a properly diverse library most sequences should fall into the far left of the plot in both the red and blue lines. A general level of enrichment, indicating broad -oversequencing in the library will tend to flatten the lines, lowering the low end and -generally raising other categories. More specific enrichments of subsets, or the +oversequencing in the library will tend to flatten the lines, lowering the low end and +generally raising other categories. More specific enrichments of subsets, or the presence of low complexity contaminants will tend to produce spikes towards the right of the plot. These high duplication peaks will most often appear in the blue trace as they make up a high proportion of the original library, but usually disappear in the @@ -87,12 +87,12 @@

    Failure

    Common reasons for warnings

    -The underlying assumption of this module is of a diverse unenriched library. Any deviation +The underlying assumption of this module is of a diverse unenriched library. Any deviation from this assumption will naturally generate duplicates and can lead to warnings or errors from this module.

    -In general there are two potential types of duplicate in a library, technical duplicates +In general there are two potential types of duplicate in a library, technical duplicates arising from PCR artefacts, or biological duplicates which are natural collisions where different copies of exactly the same sequence are randomly selected. From a sequence level there is no way to distinguish between these two types and both will be reported as duplicates here. @@ -117,10 +117,39 @@

    Common reasons for warnings

    although the duplication there is less pronounced. Finally, if you have a library where the sequence start points are constrained (a library constructed around restriction sites for example, or an unfragmented small RNA library) then the constrained start sites will generate huge duplication levels -which should not be treated as a problem, nor removed by deduplication. In these types of library +which should not be treated as a problem, nor removed by deduplication. In these types of library you should consider using a system such as random barcoding to allow the distinction of technical and biological duplicates.

    - +
    + - + \ No newline at end of file diff --git a/Help/3 Analysis Modules/9 Overrepresented Sequences.html b/Help/3 Analysis Modules/9 Overrepresented Sequences.html index 01dca9f..c4c7269 100644 --- a/Help/3 Analysis Modules/9 Overrepresented Sequences.html +++ b/Help/3 Analysis Modules/9 Overrepresented Sequences.html @@ -1,7 +1,7 @@ -Overrepresented Sequences +3.9 - Overrepresented Sequences -

    Overrepresented Sequences

    +

    3.9 - Overrepresented Sequences

    Summary

    A normal high-throughput library will contain a diverse set of sequences, with each individual sequence making up only a tiny fraction of the whole. Finding that a single sequence is very -overrepresented in the set either means that it is highly +overrepresented in the set either means that it is highly biologically significant, or indicates that the library is contaminated, or not as diverse as you expected.

    @@ -62,9 +62,38 @@

    Failure

    Common reasons for warnings

    This module will often be triggered when used to analyse small RNA libraries -where sequences are not subjected to random fragmentation, and the same +where sequences are not subjected to random fragmentation, and the same sequence may naturally be present in a significant proportion of the library.

    - +
    + - + \ No newline at end of file