Prof. Jeffrey D. Ullman


February 21st, 12pm EST

Prof. Jeffrey David 'Jeff' Ullman is an esteemed computer scientist known for his substantial contributions to database theory, data mining, and software engineering. He is a Turing Award laureate and the co-author of several foundational computer science textbooks.

Turing Award Recipient

Talk Highlights: "Data Science: Is it Real?"

Professor Ullman presents four algorithmic techniques for large-scale data problems outside machine learning: locality-sensitive hashing for entity resolution, approximate counting of distinct elements in streams, proper sampling of data streams, and optimal triangle-finding in large graphs. He demonstrates that clever algorithmic design can solve problems at scales where brute-force approaches are computationally infeasible.
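For the triangle-finding piece, here is a minimal sketch of the standard degree-ordering technique (illustrative; not necessarily the exact variant presented in the talk): orient each edge from its lower-degree endpoint to its higher-degree endpoint, so that every triangle is discovered exactly once and total work is bounded by roughly O(m^1.5) for m edges.

```python
from collections import defaultdict

def count_triangles(edges):
    """Count triangles by orienting each edge from the lower-degree
    endpoint to the higher-degree endpoint (ties broken by node id).
    Each triangle is then found exactly once."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    def rank(node):
        # Order nodes by degree, breaking ties by id.
        return (degree[node], node)

    out = defaultdict(set)  # edges oriented toward the higher-ranked endpoint
    for u, v in edges:
        lo, hi = (u, v) if rank(u) < rank(v) else (v, u)
        out[lo].add(hi)

    triangles = 0
    for u in list(out):
        for v in out[u]:
            # A common out-neighbor w of u and v closes the triangle u-v-w.
            triangles += len(out[u] & out.get(v, set()))
    return triangles
```

Because every node's out-degree in the oriented graph is at most O(sqrt(m)), the set intersections stay small even on skewed, power-law graphs.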

Key Takeaways

  • Locality-sensitive hashing solves the seemingly impossible problem of finding similar pairs among billions of items without comparing every pair — designing hash functions where similar items land in the same bucket while dissimilar items rarely do.
  • When sampling data streams, sampling by position versus by key value gives fundamentally different results — position-based sampling is useless for uniqueness queries because items appearing twice have an 18% chance of appearing exactly once in a 10% sample.
  • The Flajolet-Martin algorithm for counting distinct elements uses only a few bits per counter by tracking maximum trailing zeros in hashed values — elegant sublinear-space counting.
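The locality-sensitive hashing idea from the first takeaway can be sketched with minhash signatures plus banding (a simplified illustration, not code from the talk; the affine hash family and parameters below are assumptions). Each minhash position agrees between two sets with probability equal to their Jaccard similarity, and banding turns that into a same-bucket test:

```python
import random
from collections import defaultdict

PRIME = 2_147_483_647  # Mersenne prime used as the hash modulus (assumed choice)

def make_hash_funcs(k, seed=0):
    """k random affine hash functions x -> (a*x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(k)]

def minhash_signature(items, funcs):
    """Keep the minimum hash per function; the probability that two sets
    agree in one signature position equals their Jaccard similarity."""
    return tuple(min((a * x + b) % PRIME for x in items) for a, b in funcs)

def candidate_pairs(signatures, bands, rows):
    """LSH banding: split each signature into bands of `rows` values.
    Items agreeing on any whole band share a bucket and become candidates,
    so no all-pairs comparison is ever performed."""
    buckets = defaultdict(list)
    for name, sig in signatures.items():
        for b in range(bands):
            buckets[(b, sig[b * rows:(b + 1) * rows])].append(name)
    pairs = set()
    for ids in buckets.values():
        for i in range(len(ids)):
            for j in range(i + 1, len(ids)):
                pairs.add(tuple(sorted((ids[i], ids[j]))))
    return pairs
```

Tuning `bands` and `rows` trades false positives against false negatives: more rows per band makes a band match rarer, filtering out dissimilar pairs; more bands gives similar pairs more chances to collide.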
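The sampling takeaway's fix can be sketched as follows (an illustrative example, not code from the talk): decide membership by hashing the key itself rather than the stream position, so either every occurrence of a key enters the sample or none does, and per-key statistics such as "how many keys appear exactly once" remain answerable from the sample.

```python
import hashlib

def key_sampled(key, percent=10):
    """Sample by key value: hash the key and keep it when the hash lands
    in the chosen slice of hash space. The decision depends only on the
    key, so all occurrences of a sampled key are retained together."""
    h = int(hashlib.sha1(str(key).encode("utf-8")).hexdigest(), 16)
    return h % 100 < percent

# Every retained key keeps its full multiplicity in the sample:
stream = ["alice", "bob", "alice", "carol", "bob", "alice"]
sample = [k for k in stream if key_sampled(k)]
for k in set(stream):
    assert sample.count(k) in (0, stream.count(k))
```

Contrast this with position-based sampling, where (as noted above) a key appearing twice has a 2 × 0.1 × 0.9 = 18% chance of showing up exactly once in a 10% sample, silently inflating any uniqueness estimate.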
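The Flajolet-Martin idea in the last takeaway can be sketched in a few lines (a single-estimator illustration under an assumed hash function; not code from the talk): hash every element, remember only the largest trailing-zero count R seen, and report 2**R.

```python
import hashlib

def trailing_zeros(n):
    """Count the trailing zero bits of n (returns 0 for n == 0)."""
    return (n & -n).bit_length() - 1 if n else 0

def fm_estimate(stream):
    """Flajolet-Martin distinct-count sketch: the state is just R, the
    maximum trailing-zero count over all hashed elements, so duplicates
    cannot change the estimate and storage is O(log log n) bits."""
    R = 0
    for item in stream:
        h = int(hashlib.md5(str(item).encode("utf-8")).hexdigest(), 16)
        R = max(R, trailing_zeros(h))
    return 2 ** R
```

A single estimator has high variance (R moves in whole powers of two), so practical versions run many hash functions and combine them, e.g. taking the median of group averages.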

Notable Quotes

Everyone knows machine learning is a very important deal these days, but the point I'm trying to make is that there are still some very interesting algorithms, typically involving very large scale data, that are not really machine learning.
Prof. Jeffrey D. Ullman, Turing Minds Speaker Series
Marvin Minsky once said that you can tell a computer scientist by the fact that they believe that hashing is real. I think locality-sensitive hashing is another one of these ideas that your intuition would be that it's impossible, but it in fact is possible.
Prof. Jeffrey D. Ullman, Turing Minds Speaker Series
Biography

Jeffrey D. Ullman

Early Life and Education

Jeffrey David Ullman, born November 22, 1942, is an American computer scientist and Professor Emeritus at Stanford University. He earned his B.S. in Engineering Mathematics from Columbia University in 1963 and his Ph.D. in Electrical Engineering from Princeton University in 1966.

Career and Contributions

Bell Labs and Princeton

After his doctorate, Ullman spent three years at Bell Labs before joining Princeton University in 1969, where he was promoted to Full Professor in 1974.

Stanford University

In 1979, Ullman moved to Stanford, serving as Chair of Computer Science (1990-1994). He is best known for transformative textbooks: the "Dragon Book" on compilers (with Aho and Sethi), the "Cinderella Book" on automata theory (with Hopcroft and Motwani), and "Mining of Massive Datasets" (with Leskovec and Rajaraman). His doctoral students include Sergey Brin, co-founder of Google.

Awards and Honors

  • 2000: Knuth Prize
  • 2010: IEEE John von Neumann Medal (with Hopcroft)
  • 2020: ACM A.M. Turing Award (with Alfred Aho) for fundamental algorithms and theory underlying programming language implementation and compiler design
  • 2020: Elected to the National Academy of Sciences

Career Timeline

  • 1942: Born on November 22
  • 1963: B.S. from Columbia University
  • 1966: Ph.D. from Princeton University
  • 1966-1969: Researcher at Bell Labs
  • 1969: Joined Princeton University faculty
  • 1979: Joined Stanford University
  • 1990-1994: Chair of Stanford Computer Science Department
  • 1994: Named ACM Fellow
  • 2000: Awarded the Knuth Prize
  • 2003: Became Professor Emeritus at Stanford
  • 2010: IEEE John von Neumann Medal
  • 2020: ACM A.M. Turing Award (with Alfred Aho)

Other Turing Minds Speakers