-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathsql_notes
More file actions
114 lines (100 loc) · 4.2 KB
/
Copy pathsql_notes
File metadata and controls
114 lines (100 loc) · 4.2 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
DB advantages:
• More secure
• Store more data
• Many users can write queries to gather insights from the data at the same time.
--> when a db is queried, the data stored does not change; rather, the db info is accessed and presented according to instructions in the query.
Rows -> Records, Columns -> Fields
fields naming convention: lowercase, underscore instead of space, singular, never share name with the table tehy are housed in
key: aka primary key, an unique id
strings: can store up to 250 chars, VARCHAR can store up to thousands of chars
int: can store less than -2b and more than 2b
NUMERIC: float type, could store up to 38 digits
schema: blueprint of database, it shows the bd's design (show datatype of each field and relationship between tables)
AS: Aliasing, rename columns
DISTINCT: having multiple fields returns all distinct combinations of the fields
View: A vitual *table* that is the result of a saved SQL SELECT statement
ex: CREATE VIEW <view_name> AS
SELECT <fields> FROM <table>;
Relational Database (RDB):
• real-lif entities become tables
• reduced redundancy
• data integrity by relationships
* constraints, keys, referential integrity *
An information_schema is a meta-database that holds information about your current database (tables, views, columns and procedures).
You can query it with SELECT * FROM syntax.
Entity relationship diagram, rectangle -> entity types, circles connected to the rectangle -> attributes (columns)
Creating a new table:
CREATE TABLE <table_name>(
<column_name> <data_type>,
<column_name> <data_type>,
...
);
Adding column to table:
ALTER TABLE <table_name>
ADD COLUMN <column_name> <data_type>;
Renaming column in table:
ALTER TABLE <table_name>
RENAME COLUMN <old_name> TO <new_name>;
Drop a column in table:
ALTER TABLE <table_name>
DROP COLUMN <column_name>;
Alter types after table creation:
ALTER TABLE <table_name>
ALTER COLUMN <column_name>
TYPE <data_type>;
Only store DISTINCT data in the new tables:
INSERT INTO <table_name>
SELECT DISTINCT <fields>
FROM <table_name>;
Normally we use the INSERT INTO statement like this:
INSERT INTO <table_name> (<column_a>, <column_b>)
VALUES (<value_a>, <value_b>);
Integrity Constraints:
• Attribute constraints, e.g. data types on columns
• Key contraints, e.g. primary keys
• Referential integrity constraints, enforced through foreign keys
Why Constrainst?
• give the data a structure to follow
• help with consistency, thus data quality
• data quality is a business advantage / data science prerequisite
• Enforcing is difficult but PostgreSQL helps
CAST(<column_name> AS <data_type>) --> type casting
Data Types:
• Enforced on columns
• Define the domain of a column
• Define what operations are possible
• Enforce consistent storage of values
Most common types:
• text: character strings of any length
• varchar(n): a maximum of n characters
• char(n): a fiexed-length string of n characters
• boolean: TRUE, FALSE or NULL
• date, time, timestamp: various formats for date and time calculations
• numeric: arbitrary precision numbers
• integer: whole numbers in the range of approx -2b and +2b
SUBSTRING(<string> FROM 1 FOR x) --> truncate string
not-null constraint:
• disallow NULL values in a certain column
• must hold true for the current state
• must hold true for any future state
NULL:
• unknown value
• does not exist
• does not apply
and so on ...
<column_name> <data_type> NOT NULL
SET NOT NULL or DROP NOT NULL
unique constraint:
• disallow duplicate values in a column
• must hold true for the current state
• must hold true for the future state
ADD CONSTRAINT <constraint_name> UNIQUE(<column_name>)
Surrogate keys:
An artificial primary key. In other words, they are not based on a native column in your data,
but on a column that just exists for the sake of having a primary key.
REFERENCES (foreign_pkey) ON DELETE:
NO ACTION: Throw an error
CASCADE: Delete all referencing records
RESTRICT: Throw an error
SET NULL: set the referencing column to NULL
SET DEFAUL: set the referencing column to the defaul value (if set)