exercise 7 Bruzzese Loh submission #5
base: master
Changes from all commits
0327e79
a77b810
80473e5
d7ee4b5
399fc34
d69f5f3
afe0ca3
cf86ba5
248cb60
6fcd030
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,71 @@ | ||
| #exercise 7# | ||
| #Dan Bruzzese and Zoe Loh | ||
||
| # question 1 | ||
| import pandas | ||
| from plotnine import * | ||
| File=open("Lecture11.fasta","r") | ||
| #Each sequence's length and GC content is collected here as one row per sequence | ||
| rows = [] | ||
||
| for line in File: | ||
|     line = line.strip() | ||
|     #Skip blank lines and header lines, which start with ">" | ||
|     if not line or line.startswith(">"): | ||
|         continue | ||
|     else: | ||
|         #First the length of the sequence and the fraction of G and C bases are calculated | ||
|         #strip() has already removed the trailing newline, so len(line) is the sequence length | ||
|         Length = len(line) | ||
|         #In Python 2, / on two integers is integer division, so float() forces real-number division | ||
|         GCcount = float(line.count("G")+line.count("C"))/len(line) | ||
|         #The values are stored as one row of the plotting dataframe | ||
|         rows.append({"Sequence Length": Length, "GC content": GCcount}) | ||
| File.close() | ||
| #The dataframe is built once from the list of rows | ||
| plotData = pandas.DataFrame(rows, columns = ["Sequence Length" , "GC content"]) | ||
| #GC histogram plot | ||
| a=ggplot(plotData,aes(x="GC content")) | ||
| aa= a+geom_histogram()+theme_classic() | ||
| print(aa) | ||
||
| #sequence length histogram plot | ||
| b=ggplot(plotData,aes(x="Sequence Length")) | ||
| bb=b+geom_histogram()+theme_classic() | ||
| print(bb) | ||
|
|
||
| #question2 | ||
|
|
||
| import pandas | ||
| from plotnine import * | ||
| data=pandas.read_csv("heartrate.txt",sep=",",header=0) | ||
|
|
||
| #Here I make the scatter plot showing how running speed and heart rate are related | ||
| plot=ggplot(data,aes(x="Heart rate",y="Running speed")) | ||
| p=plot+geom_point()+coord_cartesian()+stat_smooth(method="lm") | ||
| print(p) | ||
|
|
||
|
Owner
Good job
||
| #########question 3################ | ||
| from plotnine import * | ||
| import pandas as pd | ||
| dat = pd.read_csv("data.txt") | ||
|
|
||
| #barplot for mean observations in a region | ||
| grouped= dat.groupby(["region"]).mean().reset_index() #mean observations by region | ||
| print(grouped) | ||
| grouped.columns = ['region', 'mean_observations'] | ||
| p= (ggplot(data=grouped) | ||
|     + aes(x='region', y= 'mean_observations',fill= 'region') | ||
|     + geom_bar(stat = "identity") | ||
|     + theme_classic() | ||
| ) | ||
| print(p) | ||
|
|
||
| #scatterplot | ||
| d= (ggplot(data=dat) | ||
|     + aes(y='observations', x='region', fill= 'region') | ||
|     + geom_point(alpha= .1) | ||
|     + theme_classic() | ||
| ) | ||
| print(d) | ||
|
Owner
Good job. jitter plot?
||
| #Why the plots differ: the bar chart shows only the mean of the observations in each region, | ||
| #while the scatter plot shows every individual observation in each region, so it also shows their spread | ||
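
The closing comment of question 3 and the reviewer's jitter suggestion can be illustrated with a small standard-library sketch. The contents of data.txt are not shown in this diff, so the `observations` dictionary below is hypothetical, as are the helper names `summarize` and `jitter`. The sketch shows two regions with the same mean but very different spread, and the kind of random horizontal offset a jitter plot applies so that overlapping points become visible (plotnine's `geom_jitter` does this at draw time).

```python
import random
from statistics import mean, pstdev

# Hypothetical observations per region; data.txt itself is not in the diff.
observations = {
    "north": [9, 10, 11, 10, 10],
    "south": [2, 18, 5, 15, 10],
}


def summarize(groups):
    """Return {region: (mean, population standard deviation)}."""
    return {region: (mean(values), pstdev(values))
            for region, values in groups.items()}


def jitter(groups, width=0.2, seed=0):
    """Place each point near its region's index, offset by uniform noise.

    This mimics the idea behind a jitter plot; it is not plotnine's code.
    """
    rng = random.Random(seed)
    points = []
    for index, (region, values) in enumerate(groups.items()):
        for value in values:
            points.append((index + rng.uniform(-width, width), value))
    return points


# What the bar chart can show: one number per region.
summary = summarize(observations)
for region, (avg, spread) in summary.items():
    print("%s: mean=%.1f, sd=%.2f" % (region, avg, spread))

# What the scatter or jitter plot shows: every observation.
for x, y in jitter(observations):
    print("x=%.2f, y=%s" % (x, y))
```

Both hypothetical regions produce bars of equal height, yet the individual points differ widely, which is the distinction the comment describes.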
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,22 @@ | ||
| "Heart rate","Running speed" | ||
| 80,0 | ||
| 85,2 | ||
| 87,3 | ||
| 90,3 | ||
| 94,3 | ||
| 97,4 | ||
| 102,4 | ||
| 60,-5 | ||
| 110,5 | ||
| 117,6 | ||
| 120,6 | ||
| 124,7 | ||
| 130,7 | ||
| 138,8 | ||
| 143,8 | ||
| 150,9 | ||
| 157,10 | ||
| 160,11 | ||
| 165,12 | ||
| 170,12.5 | ||
| 185,14 |
Good job
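
For reference, the line that `stat_smooth(method="lm")` draws in question 2 is an ordinary least-squares fit. A minimal sketch of that fit on the rows of heartrate.txt, using only the standard library, might look like the following. The function name `fit_line` and the `cleaned` filter are illustrative assumptions, not part of the submission. The row `60,-5` has a negative running speed; the diff does not say whether it is deliberate, so the sketch fits the data both with and without it.

```python
# Rows copied from heartrate.txt in this diff: (heart rate, running speed).
ROWS = [
    (80, 0), (85, 2), (87, 3), (90, 3), (94, 3), (97, 4), (102, 4),
    (60, -5), (110, 5), (117, 6), (120, 6), (124, 7), (130, 7), (138, 8),
    (143, 8), (150, 9), (157, 10), (160, 11), (165, 12), (170, 12.5),
    (185, 14),
]


def fit_line(points):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = float(len(points))
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Fit on every row, matching the x and y used in the question 2 plot.
slope, intercept = fit_line(ROWS)
print("all rows: slope=%.4f intercept=%.4f" % (slope, intercept))

# Assumption for illustration only: treat a negative speed as suspect.
cleaned = [(hr, speed) for hr, speed in ROWS if speed >= 0]
slope_c, intercept_c = fit_line(cleaned)
print("speed >= 0: slope=%.4f intercept=%.4f" % (slope_c, intercept_c))
```

Comparing the two printed fits gives a quick sense of how much a single unusual row moves the regression line.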