subqueries in the from
clause
From
clause subqueries always seemed somewhat limited to me. Generally, you need to be able to fully create the subquery staticly on it's own, without any data from the main query's results, and then join that static data set to your main query using the on
clause.
But, sometimes, you need to base the subquery on data items from the main query. (Where the subquery gets re-evaluated for each row of the main query.)
Basing the subquery on the main query data happens all the time when using subqueries in the columns
and where
sections of sql statements. for example:
if we have some tables:
student
id name
1 joe
2 mary
scores
id studentId score scoreDate
1 1 75 2025-01-31
2 1 92 2025-02-25
3 1 88 2025-03-15
4 2 87 2025-01-31
5 2 95 2025-02-25
6 2 100 2025-03-15
and we select data to return the top score for each student:
select
student.name,
(
select
score
from
scores
where
score.studentID = student.id
order by
scoreDate desc
limit 1
) as topScore
from
student
or select only the students with their top score being greater than 90:
select
student.name
from
student
where
(
select
score
from
scores
where
score.studentID = student.id
order by
scoreDate desc
limit 1
) > 90
everything is fine, but if you try to use the subquery in the from
clause :
select
student.name,
scores.score as topScore
from
student
left join (
select
score
from
scores
where
score.studentID = student.id
order by
scoreDate desc
limit 1
) as topScore
FAIL: the sql engine will say: "can not access table 'student' in subquery"
WTF, why not?!?
Well the answer lies within the "type" of subquery. A subquery that references the main query in order to re-rerun itself and get new data with every row from the main query is called a "correlated subquery". While the columns
section and the where
section don't care which type of subquery is used (correlated vs non-correlated), the from
section does care.
Question: if you can use a correlated subquery in the columns
clause, why need it in the from
clause?
Well, because sometimes you need to get multiple columns from the subquery, and subqueries in the columns
clause only allow one column to be returned. For example, what if I wanted topScore
AND the scoreDate
of that score? I would love to do this:
select
student.name,
topScore.score as topScore,
topScore.scoreDate as topScoreDate
from
student
left join (
select
score
from
scores
where
score.studentID = student.id
order by
scoreDate desc
limit 1
) as topScore
FAIL: still can't do it of course!
Do we need to create 2 subqueries in the columns
clause?!? That is extremely inefficient! The same query would have to be called twice now. (and using correlated subqueries in the first place incurs a cost, so having to double-up the cost is painful!)
select
student.name,
(
select
score
from
scores
where
score.studentID = student.id
order by
scoreDate desc
limit 1
) as topScore,
(
select
scoreDate
from
scores
where
score.studentID = student.id
order by
scoreDate desc
limit 1
) as topScoreDate,
from
student
(yuck!)
I found the answer
I've been developing in sql for about 20 years and never knew about this. When I found out, I felt a little foolish, but definitely relieved--here's how it's done:
The secret is in the on
clause. On
clauses, like columns
clauses and where
clauses also do not care if a correlated subquery is used, and that fixes the problem.
select
student.name,
topScores.score as topScore,
topScores.scoreDate as topScoreDate
from
student
left join scores as topScores on topScores.id = (
select
id
from
scores
where
score.studentID = student.id
order by
scoreDate desc
limit 1
)
Yes this does incur an extra lookup to get the topScores
record "again", however, since it's via the primary key, it's not so bad.
By the way, some database engines offer special syntax for correlated subqueries in from
clauses, which would then eliminate that extra lookup. See the outer apply
join operator for MS SQL Server, and the lateral
join operator in MySql. As of 9/2025, MariaDB and Sqlite do not have any special syntax.