K Inskeep and M Corley Submission #2
Open
kinskeep wants to merge 17 commits into lyy005:master from kinskeep:master
Commits (17):
9b779de created python script file (kinskeep)
43a71cf initial commit of part 1 script, no plots yet (mcorley1)
b96c834 Merge branch 'master' of https://github.com/kinskeep/Intro_Biocom_ND_… (mcorley1)
ceeed73 Added to Q1, loaded Q3 dataset and made blank lists (kinskeep)
493ca77 Completed Questions 1 and 3 (kinskeep)
7017b00 Cleaned up code and added comments for Q1 and Q3 (kinskeep)
d28b565 histogram of sequence length (mcorley1)
4755988 percentGC histogram added (mcorley1)
48664ba part 1 script updated (mcorley1)
0deb9a8 part 3 script with scatter plot (mcorley1)
36b5e95 Recommit because we were accidentally working in separate files. Maki… (kinskeep)
b96c3cd trouble with plotting (mcorley1)
b5b3c8b data for part2 (mcorley1)
1dd0597 Removed quotation marks via command line (kinskeep)
0834cbf Fixed importing file issue. It works now! (kinskeep)
c51647a Added Michelle's Question 2 script. This file contains all 3 Questions (kinskeep)
6088aaa Added print commands to Q1 (kinskeep)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,82 @@ | ||
| ####Exercise7#### | ||
| #Question 1 | ||
| #load dataset | ||
| import pandas | ||
| InFile=open("Lecture11.fasta","r") | ||
| #InFile=close() | ||
| #create lists for storing information about sequences | ||
| sequenceID=[] | ||
| sequenceLength=[] | ||
| percentGC=[] | ||
| meltingTemp=[] | ||
| #for loop to separate header lines from sequence lines and record sequence lengths | ||
| for Line in InFile: | ||
|     # remove newline character from file line | ||
|     Line=Line.strip() | ||
|     print (Line) | ||
|     # header lines (marked with '>') separated from sequence lines | ||
|     if '>' in Line: | ||
|         sequenceID.append(Line[1:]) | ||
|     else: | ||
|         # compute the sequence length and append it to the list | ||
|         Seqlength = float(len(Line)) | ||
|         print (Seqlength) | ||
|         sequenceLength.append(Seqlength) | ||
|         # count the number of G's and C's | ||
|         nG=Line.count("G") | ||
|         print (nG) | ||
|         nC=Line.count("C") | ||
|         print (nC) | ||
|         # compute percent GC and append it to the list | ||
|         gcTotal = (nG+nC)/Seqlength*100 | ||
|         percentGC.append(gcTotal) | ||
| | ||
| #dataframe of resulting info | ||
| seqDF = pandas.DataFrame(list(zip(sequenceID,sequenceLength,percentGC)),columns=['sequenceID','sequenceLength','percentGC']) | ||
| #to make infile management easier | ||
| #InFile=open("Lecture11.fasta","r") | ||
| InFile.close() | ||
| | ||
| #Histogram of sequence lengths | ||
| import plotnine | ||
| from plotnine import * | ||
| p=(ggplot(data=seqDF) + | ||
|    aes(x="sequenceLength") + | ||
|    geom_histogram(binwidth=4)) | ||
| p | ||
| #Histogram of Percent GC | ||
| g=(ggplot(data=seqDF) + | ||
|    aes(x="percentGC") + | ||
|    geom_histogram(binwidth=5)) | ||
| g | ||
| | ||
| #Question 2 | ||
| import numpy | ||
| import pandas | ||
| import plotnine | ||
| from plotnine import * | ||
| | ||
| #read in file | ||
| Part2=pandas.read_csv("part2datacopy.txt", sep=",") | ||
| #print(Part2) | ||
| | ||
| #plotting data in scatterplot with trendline | ||
| a=ggplot(Part2,aes(x="oil changes per year",y="cost of repairs($)"))+theme_classic()+geom_point() | ||
| a+xlab("oil changes per year")+ylab("cost of repairs($)")+stat_smooth(method="lm") | ||
| | ||
| Review comment (Owner): Good job | ||
| #Question 3 | ||
| #load the dataset | ||
| import pandas | ||
| import numpy | ||
| Data = pandas.read_csv("data.txt", sep=',') | ||
| print (Data) | ||
| | ||
| #making bar graph with region as x and ave as y | ||
| import plotnine | ||
| from plotnine import * | ||
| d=ggplot(Data)+theme_classic()+xlab("region")+ylab("Average") | ||
| d+geom_bar(aes(x="factor(region)",y="observations"),stat="summary",fun_y=numpy.mean) | ||
| | ||
| #scatter plot of everything observed | ||
| a=ggplot(Data,aes(x="region",y="observations")) | ||
| a+geom_jitter()+coord_cartesian() | ||
| Review comment (Owner): Good job | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,49 @@ | ||
| import numpy | ||
| import pandas | ||
| from plotnine import * | ||
| | ||
| | ||
| #Question 1 | ||
| InFile=open("Lecture11.fasta","r") | ||
| | ||
| #create lists for storing information about sequences | ||
| sequenceID=[] | ||
| sequenceLength=[] | ||
| percentGC=[] | ||
| meltingTemp=[] | ||
| | ||
| #loop through each line in fasta file to process sequences | ||
| for Line in InFile: | ||
|     Line=Line.strip() #removes white space, tab, space, newline characters | ||
|     if '>' in Line: | ||
|         sequenceID.append(Line[1:]) | ||
|         #print(Line[1:]) | ||
|     else: | ||
|         seqLen=float(len(Line)) | ||
|         nG=Line.count("G") | ||
|         nC=Line.count("C") | ||
| | ||
|         #append values to lists | ||
|         sequenceLength.append(seqLen) | ||
|         percentGC.append((nG+nC)/seqLen*100) | ||
| | ||
| #combine lists into dataframe | ||
| seqDF = pandas.DataFrame(list(zip(sequenceID,sequenceLength,percentGC)),columns=['sequenceID','sequenceLength','percentGC']) | ||
| #min(seqDF.sequenceLength) | ||
| | ||
| #close file | ||
| InFile.close() | ||
| | ||
| #plots histogram of sequence length | ||
| b=ggplot(seqDF,aes(x="sequenceLength")) | ||
| b+geom_histogram(binwidth=5)+theme_classic() | ||
| | ||
| #plots histogram of percent GC | ||
| b=ggplot(seqDF,aes(x="percentGC")) | ||
| b+geom_histogram(binwidth=5)+theme_classic() | ||
| | ||
| | ||
| | ||
| | ||
| | ||
| | ||
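
The Question 1 loop above treats every non-header line as one complete sequence, and an empty line would reach the division by the sequence length. The following is only a sketch of one way to make the same calculation more defensive; it is not part of the submission. The helper names `gc_percent`, `parse_fasta`, and `summarize` are hypothetical, and the sketch assumes that sequence lines belonging to one record should be joined, which the exercise may or may not require.

```python
def gc_percent(seq):
    """Return the percentage of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq) * 100


def parse_fasta(lines):
    """Collect (id, sequence) pairs; sequence lines of one record are joined."""
    records = []
    seq_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines instead of treating them as sequences
        if line.startswith(">"):
            if seq_id is not None:
                records.append((seq_id, "".join(chunks)))
            seq_id, chunks = line[1:], []
        else:
            chunks.append(line)
    if seq_id is not None:
        records.append((seq_id, "".join(chunks)))
    return records


def summarize(lines):
    """Build one dict per record with the same column names as seqDF."""
    return [
        {"sequenceID": sid, "sequenceLength": len(seq), "percentGC": gc_percent(seq)}
        for sid, seq in parse_fasta(lines)
    ]


# Made-up records for illustration only (not taken from Lecture11.fasta).
example_lines = [">seq1", "GGCC", "AATT", "", ">seq2", "ATAT"]
for row in summarize(example_lines):
    print(row)
```

With a real file, the file object from `with open("Lecture11.fasta") as in_file:` could be passed to `summarize(in_file)`, and the `with` block would close the file even if an error occurs. The resulting list of dicts can be handed to `pandas.DataFrame` directly.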
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| oil changes per year,cost of repairs($)3,3005,3002,5003,4001,7004,4006,1004,2503,4502,6500,60010,07,150 | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,11 @@ | ||
| import numpy | ||
| import pandas | ||
| import plotnine | ||
| from plotnine import * | ||
| | ||
| Part2=pandas.read_csv("part2datacopy.txt", sep=",") | ||
| #print(Part2) | ||
| | ||
| #plotting data in scatterplot with trendline | ||
| a=ggplot(Part2,aes(x="oil changes per year",y="cost of repairs($)"))+theme_classic()+geom_point() | ||
| a+xlab("oil changes per year")+ylab("cost of repairs($)")+stat_smooth(method="lm") | ||
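
For readers wondering what the trendline in the script above represents: `stat_smooth(method="lm")` draws a fitted straight line through the points. The sketch below shows the ordinary least-squares arithmetic behind such a line in plain Python. The function name `fit_line` is hypothetical, the points are made up rather than taken from the oil-change data, and this is an illustration of the idea, not a description of how plotnine computes the fit internally.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical, slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up points lying exactly on y = 1 + 2x (not the oil-change data).
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)
```

A negative slope from such a fit would correspond to a downward trendline, meaning the y variable tends to fall as the x variable rises.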
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,16 @@ | ||
| #Question 3 | ||
| #load the dataset | ||
| import pandas | ||
| import numpy | ||
| Data = pandas.read_csv("data.txt", sep=',') | ||
| #print (Data) | ||
| | ||
| #making bar graph with region as x and ave as y | ||
| import plotnine | ||
| from plotnine import * | ||
| d=ggplot(Data)+theme_classic()+xlab("region")+ylab("Average") | ||
| d+geom_bar(aes(x="factor(region)",y="observations"),stat="summary",fun_y=numpy.mean) | ||
| | ||
| #scatter plot of all observations | ||
| a=ggplot(Data,aes(x="region",y="observations")) | ||
| a+geom_jitter()+coord_cartesian() | ||
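
The bar graph in the script above plots one mean per region, while the jitter plot shows every observation. As a rough sketch of the summary step, the per-region mean can be computed by hand as below. The function name `mean_by_group`, the region labels, and the numbers are all hypothetical; only the column names `region` and `observations` come from the script.

```python
from collections import defaultdict


def mean_by_group(rows, key="region", value="observations"):
    """Return {group: mean of value} for a list of dict rows."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[value]
        counts[row[key]] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Made-up rows for illustration only (not the contents of data.txt).
rows = [
    {"region": "north", "observations": 10},
    {"region": "north", "observations": 20},
    {"region": "south", "observations": 4},
]
print(mean_by_group(rows))
```

Two regions can share the same mean while having very different spreads, which is one reason to look at the scatter of all observations alongside the bar graph.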