Advanced SQL (CMU Databases Systems Fall 2019)

Relational Languages

User only needs to specify the answer that they want, not how to compute it.
The DBMS is responsible for efficient evaluation of the query.
- Query optimizer: re-orders operations and generates query plan

Originally SEQUEL from IBM’s System R prototype

Structured English Query Language

IBM releases DB2 in 1983

ANSI Standard in 1986. ISO in 1987

Structured Query Language

Important: SQL is base on bags (duplicates)

Aggregations + Group By

String / Date / Time Operations

Output Control + Redirection

Nested Queries

Common Table Expressions

Window Functions

Aggregates

AVG(col)
MIN(col)
MAX(col)
SUM(col)
COUNT(col)
select count(login) as cnt
  from student where login like '%@cs'
  
  // or
select count(*) as cnt
  from student whare login like '%@cs'
  
    // or
select count(1) as cnt
  from student whare login like '%@cs'

select avg(s.gpa), e.cid
  from enrolled as e, student as s
 where e.sid = s.sid
 group by e.cid

Non-aggregated values in SELECT output clause must appear in GROUP BY clause

# not aggr'd yet
select avg(s.gpa) as avg_gpa, e.cid
from enrolled as e, student as s
where e.sid = sid
and avg_gpa > 3.9
group by e.cid

# right
select avg(s.gpa) as avg_gpa, e.cid
from enrolled as e, student as s
where e.sid = sid
group by e.cid
having avg_gpa > 3.9

Strings

all strings have to be case sensitive and declared with single quotes ‘’

sqlite: “” is ok

mysql is case insensitive

Like is used for stirng matching.String-matching operators

‘%’ Matches any substring (including empty strings.)
‘_’ Match any one character

SQL-92 defines string functins

Many DBMSs also have theri onw unique functions

Can be used ineither output and predicxates

1 2	UBSTRING(name, 0, 5) UPPER(e.name)

SQL standartd says to use || operator to concatenate two or ,more stirng together.

1 2	select 'an' 'dy ' 'pavlo'; > andy pavlo # only mysql

Date / Time operations

operations to manipulate and mudify date / time attributes.
Can be used in either output and predicates
Spport / syntax varies wildly…

1
2
3

select NOW();  # current timestamp
select current_timestamp();
select current_timestamp;

sqlite: no such function

1	select current_timestamp; # keyword

1 2	select extract day(day from date(‘2018-08-29')); # only pavlo select date('2919-08-29') - date('2018-01-01') as days;

1 2	# mysql select round((unix_timestamp(date()) - unix_timestamp(date())) / (60 * 60 * 24), 0) as days

1 2	# sqlite select cast((julianday(current_timestamp) - julianday('2018-01-01')) as int) as days;

!!! but sqlite is the most popular

Photoshop runs sqlite inside

cause there is no copyright on sqlite. But oracle has mysql

Output redirection

select distinct cid into courseids
  from enrolled;
 
 create table courseids (
 select distinct cid from enrolled);

Don’t take it out for me. Write it to this table.

Insert tuplesfrom query into another table:

Inner select must generatge the same columns as the target table.
dbmss have different options / syntax on what to do with duplicates.

1 2	insert into courseids (select distinct cid from enrolled);

(implements will vary widely)

Output control

select sid, grade from enrolled
  where cid = '15-721'
  order by grade;
  
select sid from enrolled
 whhere cid = '15-721'
 order by grade desc, sid asc;   # grade not in input but OK

LIMIT [offset]

limit the # of tuples returned in output.
can set an offset rto return a “range”

select sid, name from student
 where login like '%@cs'
 limit 10;
 
select sid, name from student
 where login like '%@cs'
 limit 20 offset 10;    # pagination

Nested queries

Queries containing other queries
They are often difficult to optimize.
Inner queries can appear (almost)anywhere in query.

1 2	select name from student where sid in (select sid from enrolled);

WHERE

ALL
ANY
IN
EXISTS

select name from student 
whiere sid = any(
select sid from enrolled
where cid = '15-445'
);

select (select s.name from stucent as s
         where s.sid = e.sid) as sname
  from enrolled as e
where cid = '15-445'

select sid, name from student
 where sid => all(
  select sid from enrolled
);
select sid, name from student
 where sid in (
  select max(sid) from enrolled
);
select sid, name from student
 where sid in (
  select sid from enrolled
   order by sid desc limit 1
);

find all coursed that has no students enrolled in it.

select * from course
 where not exists(
  select * from enrolled
  where course.cid = enrolled.cid
);

window functions

Performs a calculation across a set of tuples that related to a single row.
Like an aggregation but tuples are not grouped into a single output tuples.

Special window functions:

row_number() -> # of the current row
rank() -> order position of the current row

The over keyword specifies how to group together tuples when computing the window function.

Use partition by to specify group

select cid, sid,
  row_number() over (partition by sid)
 from enrolled
order by cid;

Find the student with the highest grade fro each course.

select * from (
  select *,
    rank() over (partition by cid
                order bygrace asc)
    as rank
   from enrolled) as ranking
where ranking.rank = 1;

postgres is the only one supports.

Common table expressions

provides a way to write auxiliary statements for use in a larger query.
- think of it like a temp table just for one query.

Alternative to nested queries and views.

with ctename (col1, col2) as (
 select 1, 2
)
select col1 + col2 from ctename;

with ctesource (maxid) as (
 select max(sid) from enrolled
)
select name from student, ctesource
 where student.sid = ctesource.max;
 
 
# recursion
with recursive ctesource (counter) as (
 (select 1)
 union all
 (select counter + 1 from ctesource
   where counter < 10)
)
select * from ctesource;