Preface
This is the first piece in our Majors series. The whole point of the series is to give an intro-style overview of what different disciplines actually look like as undergraduate majors — what subfields they have, what kinds of career paths come out of them. For the first piece I want to talk about CS, walking through how my understanding of the field has shifted since I came to college, and how that compares to what I thought I knew in high school.
My interest in CS started as a kid playing with robots — writing simple code to make robots do deterministic tasks, the LEGO robotics kind. Later I did some algorithm competitions and picked up some competitive programming techniques. In high school I worked on a couple of small AI projects too, and that was basically my mental model of CS going into college. But that mental model fell apart almost immediately once I got to college. Looking back, what I had been exposed to was just the surface of CS — as a discipline it actually has many distinct areas with a wealth of interesting open problems. And honestly that’s the thing I admire most about it: unlike some disciplines that hold themselves at a remote, intimidating distance, CS uses different layers of abstraction to let people from all kinds of backgrounds participate and feel its appeal. Most readers of this article probably don’t understand how large language models work under the hood, but they benefit from them every day. People with a bit more interest can dive deeper — using a coding agent to write code for a project, or calling an LLM API to automate a task. People with more background can keep going down the foundation-model line: fine-tuning for specific tasks, even pretraining, developing the relevant algorithms, working on the underlying mathematical theory, and so on. There’s a place in this field for everyone. And foundation models are just one currently popular subfield within AI; the discipline has many other branches and areas like systems and theory. This is what I mean when I say CS has both vertical accessibility and horizontal heterogeneity.
Honestly, the reason CS exhibits this horizontal heterogeneity is that the discipline itself is the highly coupled product of math, ECE, physics, linguistics, cognitive science, and more. And that’s precisely why the boundaries between its subfields are often blurry. As for how exactly to carve them up, different perspectives produce different taxonomies, but the differences tend not to be huge. Here I’ll use the taxonomy from CSRankings and look at things from a research angle.
On CSRankings
Before getting started, let me briefly introduce CSRankings. The site is widely regarded within the CS community as a reasonable proxy for a school’s research strength. Its rankings are computed entirely from a weighted count of how many top-conference papers each school’s faculty publish in each area. The methodology is fairly objective, but it has its issues. First, it can’t distinguish groundbreaking work from incremental work: both count as one paper in its statistics (publication += 1) even though their actual impact can differ by orders of magnitude. Second, it counts conference publications rather than journal publications. (As an aside: the convention in CS is to publish in conferences over journals, because the field iterates so fast that journal review cycles, often several times longer than conference cycles, can’t keep up.) That counting rule puts the small set of professors and subfields that prefer journals (biocomputing, for example) at a major disadvantage. Third, schools with more faculty have a built-in advantage: even if average faculty quality is the same, the school with more faculty will rack up more papers. That isn’t entirely a bias, though, because more faculty does mean a thicker ecosystem. I’ll come back to these specific issues later in the school-selection section; for now, just keep in mind that CSRankings isn’t a perfect ranking but is still a useful reference for understanding any given school’s strength, and it works well as a guide to the field’s taxonomy.
OK, back to the main thread. On CSRankings, CS is grouped into roughly four big clusters: AI, Systems, Theory, and a set of Interdisciplinary areas. Each cluster is relatively self-contained — every area, and even every subfield within it, has its own community — but I’ve always felt these four clusters have a logical dependency: Theory is the most foundational layer, providing the mathematical grounding for algorithms and computation itself; Systems sits on top of Theory, building the actual hardware and software infrastructure that runs computation; AI extends Systems with applications that learn from data; and Interdisciplinary is where CS meets other disciplines (biology, economics, the arts, and so on), generally tilting toward applications. None of this is absolute, of course — AI itself has plenty of theoretical research directions, like learning theory. But I’ll go through them in this logical order below.
Theory: the mathematical foundation of CS
What Theory broadly does is study the mathematical properties of computation itself: what’s computable, what’s not, how many resources a computable problem requires, and how to prove an algorithm is optimal. These are the bedrock of CS as a discipline, providing the language and tools we use to talk about computation. On CSRankings, theory is further split into three subfields: Algorithms & Complexity, Cryptography, and Logic & Verification.
Algorithms & Complexity
Algorithms & Complexity is the most classical direction in theory. I remember from my algorithm-competition days that the most important thing to do before solving any problem was to look at the data scale and constraints to pick the right algorithm — that’s what got you within the time complexity the problem allowed. The spirit is the same in research, just with much more complex problems: algorithm research is about designing faster algorithms for specific problems (graph algorithms, approximation algorithms, online algorithms, randomized algorithms, etc.), and complexity is the inverse — studying the minimum resources (time, space, randomness, and so on) any problem in a given class requires, drawing a theoretical lower bound for algorithm design. The two lines push the upper bound down and the lower bound up, with the goal of meeting in the middle. For example, the lower bound for comparison-based sorting has been proven to be $\Omega(n \log n)$, and merge sort and heap sort happen to hit that bound, so sorting is essentially settled in the comparison model. But many harder problems — matrix multiplication, all-pairs shortest paths — still have non-trivial gaps between their upper and lower bounds, and that’s exactly what the algorithm and complexity people are working on. The famous P vs NP problem belongs to complexity, and it’s been open for over fifty years — listed by the Clay Mathematics Institute as one of the seven Millennium Prize Problems.
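To make the "upper bound meets lower bound" picture concrete, here is a minimal Python sketch that counts the key comparisons merge sort performs and sets that count next to the information-theoretic worst-case bound of ceil(log2(n!)) comparisons. The function names and the input size are illustrative choices, not something taken from a particular textbook or library.

```python
import math
import random

def merge_sort(items, counter):
    """Return a sorted copy of items; counter[0] accumulates key comparisons."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid], counter)
    right = merge_sort(items[mid:], counter)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        counter[0] += 1  # exactly one comparison between two keys
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

def comparison_lower_bound(n):
    """ceil(log2(n!)): comparisons any comparison sort needs in the worst case."""
    return math.ceil(math.log2(math.factorial(n)))

n = 1024  # illustrative input size
rng = random.Random(0)
data = [rng.random() for _ in range(n)]
counter = [0]
result = merge_sort(data, counter)

print("comparisons used by merge sort:", counter[0])
print("worst-case lower bound        :", comparison_lower_bound(n))
print("n * ceil(log2 n)              :", n * math.ceil(math.log2(n)))
```

Note that the lower bound is a worst-case statement, so a particular lucky input can use fewer comparisons than it. What the sketch shows is that merge sort never needs more than about n log2 n comparisons, which is the same order of growth as the bound.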
Cryptography
Cryptography looks on the surface like an application of algorithms, but it actually has its own complete and independent theoretical framework. It studies how to guarantee the confidentiality, integrity, and authenticity of information in the presence of an adversary. The unique thing about this direction is that security definitions are always built on top of some complexity assumption. RSA’s security, for example, rests on the assumption that factoring large integers is hard. So cryptography and complexity are intrinsically entangled: you have to start from a hardness assumption before you can build a crypto scheme on top of it. Active topics in this area in recent years include post-quantum cryptography (designing schemes that resist quantum attacks ahead of time, out of concern that future quantum computers will break today’s mainstream crypto), zero-knowledge proofs (letting one party prove they know some secret without revealing any additional information, which is the foundation of many blockchain systems), and multi-party computation, among others.
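The dependence on a hardness assumption is easy to see in a toy version of RSA. The sketch below uses deliberately tiny primes (61 and 53, a common textbook choice, not anything from a real deployment), so an attacker who only sees the public key can factor the modulus by trial division and rebuild the private key. All function names here are hypothetical helpers for illustration.

```python
# Toy RSA with deliberately tiny primes. Real keys use primes that are
# hundreds of digits long, so the factoring step below becomes infeasible.
p, q = 61, 53
n = p * q                      # public modulus
phi = (p - 1) * (q - 1)
e = 17                         # public exponent, coprime to phi
d = pow(e, -1, phi)            # private exponent (modular inverse, Python 3.8+)

def encrypt(message):
    """Encrypt an integer message with the public key (e, n)."""
    return pow(message, e, n)

def decrypt(ciphertext, private_exponent):
    """Decrypt with a private exponent."""
    return pow(ciphertext, private_exponent, n)

def factor_by_trial_division(modulus):
    """Brute-force factoring: only feasible because the modulus is tiny."""
    f = 2
    while f * f <= modulus:
        if modulus % f == 0:
            return f, modulus // f
        f += 1
    raise ValueError("modulus is prime")

def attacker_private_key(modulus, public_exponent):
    """Recover the private exponent from public information by factoring."""
    a, b = factor_by_trial_division(modulus)
    return pow(public_exponent, -1, (a - 1) * (b - 1))

message = 65
ciphertext = encrypt(message)
stolen_d = attacker_private_key(n, e)
print("legitimate decryption:", decrypt(ciphertext, d))
print("attacker decryption  :", decrypt(ciphertext, stolen_d))
```

The scheme is only as strong as the factoring step is hard: once the modulus can be factored, the private key falls out immediately.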
Logic & Verification
Logic & Verification is a somewhat smaller community, but the problems it tackles are very concrete: how do you use formal methods to mathematically prove that a piece of code or a system is correct? For an OS kernel, a compiler, or a distributed protocol, how do you guarantee that under all possible inputs it won’t crash, leak data, or produce a race condition? That’s what verification answers. This area has deep overlap with the PL (programming languages) community within systems, since many verification tools are built on top of PL concepts like type systems and operational semantics.
This part may feel abstract to readers who haven’t encountered verification before, so let me use a software-development analogy to explain. In everyday software development, you typically check whether a piece of code is correct by writing a lot of test cases and running them, not by doing formal verification. Test cases are easy to understand: if I have a function $f(x) = x^2$, then input $2$ should give output $4$, input $3$ should give output $9$, and so on. So you just write a lot of test cases that check whether each input produces the expected output. The cost is much lower than formal verification, because testing only checks a finite set of specific inputs against expected outputs, rather than mathematically proving that the program meets its requirements for all possible inputs. But real-world problems are usually much more complex, and there will always be corner cases your test suite doesn’t cover. For most software, that’s not a big deal — products we use every day like Chrome contain plenty of known and unknown bugs, and the core logic of building them is fast iteration rather than perfection. It’s a tradeoff: users can put up with small annoyances and wait for the next update. But in domains like aviation, aerospace, or cryptocurrency, an unconsidered corner case can mean someone dies or someone loses a lot of money — and that’s when it’s worth paying many times the cost to do formal verification.
Chip design follows the same logic. Once a chip is taped out, you can’t patch it. Out of the millions of chips you produce, a single corner case being triggered can mean recalling the entire batch. A classic example is Intel’s 1994 Pentium FDIV bug: floating-point division returned slightly wrong results for certain rare operand pairs, and Intel ended up spending nearly half a billion dollars on the recall. So before tape-out, modern chips usually go through extensive formal verification to prove that the design satisfies its specification under all valid inputs.
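The gap between passing tests and being correct for all inputs can be sketched in a few lines of Python. The function below is a hypothetical squaring routine that simulates 32-bit signed machine arithmetic, so it silently wraps on overflow. The two test cases from the discussion above pass, yet an exhaustive check over a bounded range finds a counterexample. Exhaustive checking over a finite range is only a stand-in here: real formal verification proves the property symbolically for every input rather than enumerating them.

```python
def square_int32(x):
    """Compute x * x the way a 32-bit signed register would (wraps on overflow)."""
    result = (x * x) & 0xFFFFFFFF
    # Reinterpret the low 32 bits as a signed two's-complement value.
    return result - 0x100000000 if result >= 0x80000000 else result

def passes_unit_tests():
    """The handful of spot checks a typical test suite might contain."""
    return square_int32(2) == 4 and square_int32(3) == 9

def first_counterexample(limit):
    """Check every input in [0, limit) against the specification f(x) = x**2."""
    for x in range(limit):
        if square_int32(x) != x * x:
            return x
    return None

print("unit tests pass     :", passes_unit_tests())
print("first failing input :", first_counterexample(100_000))
```

The first failing input is 46341, the smallest non-negative integer whose square no longer fits in a signed 32-bit value. A test suite with small inputs would never reach it.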
Theory has a high bar for math background. The work mostly happens on a whiteboard rather than in an IDE, and papers are almost entirely proofs rather than experiments. So if you don’t genuinely enjoy reasoning and proof, going it alone in this direction can be tough. On the other hand, theory results have the longest shelf life in CS — a good algorithms paper can stay heavily cited for decades, whereas systems and AI iterate so fast that what was state-of-the-art five years ago might be irrelevant today.
Systems: making computation actually run
Systems is probably the largest cluster in CS by footprint. It covers nearly all the infrastructure that lets computation actually run on hardware, from chip design at the bottom up through OS, networking, databases, PL, SE, and so on. The research style is almost the opposite of theory: theory is formulas and proofs on paper, systems is hands-on. Almost every paper builds a real prototype and then measures its performance (latency, throughput, power, etc.), using empirical data to back up its claims. Systems papers therefore often come with a substantial codebase.
If we take the algorithm-competition perspective: an algorithm tries to bring down the time complexity, while systems certainly also cares about making things run faster (i.e., optimizing the constant factor) — but it actually owns much messier territory than that, including how to keep multi-threaded code race-condition-free, how to avoid losing data when a machine crashes, what kind of API is good for the upper layer to use, and so on. Most of those concerns have nothing to do with raw speed, but they’re all systems’ responsibility.
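As a small illustration of the race-condition concern mentioned above, here is a sketch of a shared counter updated from several threads. The class and function names are hypothetical. The read-modify-write in `increment` is guarded by a lock; without that guard, two threads could read the same old value and one update would be lost.

```python
import threading

class SafeCounter:
    """A counter whose increments are serialized by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Only one thread at a time may execute this read-modify-write.
        with self._lock:
            self.value += 1

def run_workers(num_threads, increments_per_thread):
    """Start the threads, wait for all of them, and return the final count."""
    counter = SafeCounter()

    def worker():
        for _ in range(increments_per_thread):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value

print(run_workers(num_threads=8, increments_per_thread=10_000))
```

None of this makes the program faster. The lock actually adds overhead, which is exactly the kind of correctness-versus-speed tradeoff that systems work has to manage.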
CSRankings splits systems into about a dozen subfields. Below I’ll walk through a few of the more representative ones.
Computer Architecture
Computer Architecture is the part of systems closest to hardware, studying how the insides of CPUs, GPUs, TPUs, and similar chips should be designed: how the cache hierarchy is organized, RISC vs CISC instruction sets, how the pipeline is laid out, how to handle branch prediction and memory consistency, and so on. As Moore’s Law has slowed, the general-purpose performance gains from simply packing in more transistors have shrunk considerably, so the architecture line is increasingly focused on designing accelerators for specific workloads (deep learning, graph computation, cryptography, and so on). Google’s TPU and NVIDIA’s Tensor Cores are products of this trend.
Operating Systems
Operating Systems mostly studies how the OS kernel should be designed: how to schedule processes, how to manage memory, how to handle I/O, how to implement file systems, and so on. Active directions in recent years include unikernels (merging an application with the kernel into a single-purpose binary to squeeze out performance), verified microkernels (using formal methods to prove the correctness of the kernel core on top of the microkernel split architecture), and OS redesigns for new hardware (persistent memory, SmartNICs, disaggregated memory, and so on).
Networking
Networking studies how data is transmitted reliably and efficiently between machines: from switch design within a LAN, to wide-area networks across data centers, to routing protocols on the Internet — all of it falls under this area. A lot of recent work has focused on data-center networking, because cloud computing and large-model training are pushing everyone to care more about achieving extremely low latency and non-blocking communication inside the data center.
High-Performance Computing
High-Performance Computing studies how to scale a computation efficiently across tens of thousands of nodes. The traditional applications are scientific problems — climate simulation, fluid dynamics, first-principles materials calculations — the kind of workload that runs on national supercomputers every day, with PDE solvers and large-scale linear algebra (the broader family of numerical methods) sitting underneath. Honestly, HPC feels to me like the most encompassing direction in systems: deeply intertwined with architecture and networking, and at the same time having to care about applied-math details like numerical stability. Also worth mentioning: today’s large-model training is built directly on the infrastructure HPC has been developing for decades. GPU clusters, high-speed interconnects between nodes, collective communication — these have all been HPC’s bread and butter, and the entire training stack for large models is essentially built on top of them.
While I’m here, let me also mention numerical methods. It’s essentially a branch of applied math; when discussed within CS, it generally falls under the broader umbrella of scientific computing, which, beyond numerical methods, also covers HPC implementation, computational physics/chemistry/biology and other domain sciences, and the development and engineering optimization of scientific software. Since HPC is the most common compute platform for numerical methods, the two communities overlap substantially and are often the same group of people. What numerical methods actually does is simple: it resolves the fundamental mismatch between math and computers. This mismatch has two distinct sources. One is the gap between continuous and discrete: many mathematical objects are inherently continuous (derivatives, integrals, solutions of PDEs) or can’t be computed exactly in a finite number of steps (eigenvalues of matrices), but computers can only perform discrete, finite-step operations, so derivatives have to become finite differences, integrals become summations, PDEs become algebraic equations on a grid, and eigenvalues get approximated through iteration. The other is floating point itself: computers approximate real numbers with floating-point values, every step loses a little to roundoff, and at scale these losses accumulate to the point where the final result can become unusable. The job of numerical methods is to address these two issues: keeping discretization error under control so that algorithms converge, keeping roundoff from accumulating so that they stay numerically stable, and doing both efficiently enough to be practical.
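Both sources of mismatch show up in a few lines of Python. The sketch below first approximates a derivative with a central finite difference and records the error for several step sizes (the test function and step sizes are arbitrary choices), then adds 0.1 ten times in a plain loop and compares the result with `math.fsum`, which tracks the lost low-order bits.

```python
import math

def central_difference(f, x, h):
    """Approximate f'(x) with the finite difference (f(x+h) - f(x-h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Source 1: continuous vs discrete. The exact derivative of sin at 1.0 is cos(1.0).
exact = math.cos(1.0)
errors = {}
for h in (1e-1, 1e-3, 1e-5, 1e-13):
    errors[h] = abs(central_difference(math.sin, 1.0, h) - exact)
    print(f"h = {h:g}  error = {errors[h]:.3e}")

# Source 2: floating-point roundoff. 0.1 has no exact binary representation,
# so a plain running total drifts away from the mathematically exact answer.
naive_total = 0.0
for _ in range(10):
    naive_total += 0.1
compensated_total = math.fsum([0.1] * 10)

print("naive loop :", naive_total)
print("math.fsum  :", compensated_total)
```

Shrinking h first reduces the discretization error, but at very small h the subtraction of two nearly equal numbers lets roundoff dominate and the error grows again. Picking h is therefore already a small numerical-methods problem: the two error sources pull in opposite directions.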
In the foundation-model era, the continuous-vs-discrete layer basically doesn’t come into play during model training, because training itself is already discrete matrix multiplication with no continuous objects that need discretizing; the floating-point layer, however, is the real core problem. Training is essentially the accumulation of massive numbers of floating-point operations, and once numerical stability is mishandled the loss can spike and the model just diverges, costing tens of millions of dollars. That’s why so much work at the intersection of numerical methods and systems — mixed precision (using lower-precision formats like FP16 / BF16 / FP8 for performance — but FP16 itself introduces stability issues), loss scaling (used to patch FP16’s gradient underflow; BF16, with the same exponent width as FP32, usually doesn’t need it), and increasingly sophisticated scaling strategies — ends up at the heart of large-model training.
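The underflow problem that loss scaling patches can be sketched with nothing but the standard library. The `struct` module's `e` format rounds a value to IEEE half precision, and a crude bfloat16 can be simulated by keeping only the top 16 bits of the float32 encoding (this truncates rather than rounds to nearest, so it is only an approximation of real BF16 hardware). The gradient value and the scale factor of 2**16 are illustrative assumptions, not values taken from any particular training recipe.

```python
import struct

def to_fp16(x):
    """Round a Python float to IEEE half precision (FP16) and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_bf16_truncated(x):
    """Crude BF16: keep the top 16 bits of the float32 encoding (truncation)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]

grad = 1e-8          # a small gradient value, chosen for illustration
loss_scale = 65536.0  # hypothetical loss scale of 2**16

# Stored directly in FP16, the gradient is below the smallest subnormal
# and flushes to zero.
fp16_plain = to_fp16(grad)

# Loss scaling: scale up before the FP16 cast, then unscale in higher precision.
fp16_scaled = to_fp16(grad * loss_scale) / loss_scale

# BF16 shares FP32's 8-bit exponent, so the value survives without scaling,
# at the cost of fewer mantissa bits.
bf16_value = to_bf16_truncated(grad)

print("FP16 without scaling:", fp16_plain)
print("FP16 with scaling   :", fp16_scaled)
print("BF16 (truncated)    :", bf16_value)
```

The scaled FP16 path and the BF16 path both recover a value close to the original gradient, while the unscaled FP16 path loses it entirely. In a real training run that means the corresponding weights would simply stop updating.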
Database
Database studies how to store, index, and query large amounts of data. A modern database system has to handle a lot of problems: how to do concurrent transactions while maintaining ACID, how to distribute queries across many machines, how to optimize SQL query plans, how to handle streaming data, and so on. Active directions in recent years include in-memory databases, cloud-native databases (Snowflake, BigQuery, that kind of thing), and vector databases purpose-built for large-model retrieval.
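The atomicity part of ACID can be illustrated with Python's built-in `sqlite3` module. In the sketch below, a transfer between two accounts either applies both updates or neither. The table layout, account names, and the `transfer` helper are all made up for the example.

```python
import sqlite3

def make_bank():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                     [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn

def balances(conn):
    """Return the current balances as a dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))

def transfer(conn, src, dst, amount):
    """Move money atomically; any exception rolls back both updates."""
    with conn:  # the connection commits on success and rolls back on error
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                     (amount, src))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                     (amount, dst))
        (remaining,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)).fetchone()
        if remaining < 0:
            raise ValueError("insufficient funds")

conn = make_bank()
transfer(conn, "alice", "bob", 30)
print(balances(conn))

try:
    transfer(conn, "alice", "bob", 500)   # would overdraw alice
except ValueError:
    pass
print(balances(conn))  # unchanged by the failed transfer
```

The failed transfer had already executed both UPDATE statements before the check fired, yet neither change is visible afterwards. That all-or-nothing behavior is what a database has to preserve even when many transactions run concurrently or a machine crashes mid-write.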
Programming Languages
Programming Languages studies the languages themselves: how the language is designed, how the type system is built, how the compiler translates high-level code into machine code. Different languages embody different design tradeoffs. C++, for instance — popular in algorithm contests and high-frequency trading — gives the user full control over memory and is fast, but it’s easy to write use-after-free or buffer overflow bugs. Java and Python use a garbage collector to manage memory, which is safer but adds runtime overhead. Rust has been on the rise lately because, with ownership and borrow checking, it does memory safety at compile time — you get the safety without GC, and you avoid the worst classes of C++ bugs. The PL community has deep overlap with the formal verification work mentioned earlier, so there’s a lot of cross-talk between the two.
Security
Security spans a very wide range of layers: from low-level hardware security (Spectre, Meltdown, those kinds of side-channel attacks) to OS security and network security, all the way up to web security and ML security at the application layer. Almost every layer has its own attack model and defense mechanisms.
Software Engineering
Software Engineering studies how to organize, test, and maintain large code bases. With the rise of AI, the area has become very active — program synthesis, automatic bug fixing, and automatic test-case generation are all hot topics right now. Worth noting: Software Engineering as an academic research area has very little to do with the SWE (software engineer) job we usually talk about at big tech companies; they just happen to share a name. What SWE actually means in practice will get its own treatment later in the career-path section.
The problems systems researchers tackle mostly come from real engineering pain points, so academia and industry are unusually tightly connected in this area. That said, it’s not absolute. Software engineering, just mentioned, is a counterexample: the questions it cares about as a research topic — how to organize a large codebase, how to scale code review, how to design CI/CD pipelines — are questions on which companies like Google, by virtue of having enormous codebases, have accumulated far more first-hand experience than academia, and the relevant best practices typically come out of industry first. The unique contribution of academic software engineering researchers is to systematize these industry patterns, but the original problem-solving frontier really does sit in industry.
Systems demands the opposite kind of person from theory: it places a high bar on engineering ability, the work happens mostly in a terminal and a profiler rather than at a whiteboard, and papers are almost all benchmarks and measurements rather than mathematical proofs. So if you don’t genuinely enjoy writing code and wrestling with real hardware, the work can feel pretty tedious. On the flip side, systems also has one of the fastest paths to real-world impact in CS — a meaningful paper might get absorbed into industrial standard practice within just a few years, and during your PhD it’s quite possible to see a system you built yourself actually get deployed.
AI: from perception to decision-making
AI is the fastest-growing and most-talked-about direction in CS in recent years, so it hardly needs an introduction. But AI itself is split into many subfields too — let me briefly walk through them.
CSRankings divides AI into AI (general), Computer Vision, Machine Learning & Data Mining, Natural Language Processing, and Web & Information Retrieval. But honestly, ever since the transformer burst onto the scene in 2017 and then triggered the GenAI wave that followed, the boundaries between these subfields have grown blurrier and blurrier. Vision and NLP used to be relatively independent communities; now everyone uses the same backbone (transformer) and the same paradigm (pretrain + finetune), and multimodal models are taking over. So below I’ll go by the actual research landscape today rather than strictly following CSRankings’ categories.
Foundation Models
Foundation Models are the hottest direction of the last couple of years. The core problem is how to train a model that generalizes across a wide range of tasks. Specific subproblems include architecture design (how to modify the attention mechanism, how to handle long context), training (data composition for pretraining, scaling laws, RLHF, and so on), inference (acceleration, quantization, speculative decoding), and evaluation (designing benchmarks that actually measure model capability). This direction is extremely compute-intensive — much of the frontier work can only be done in industry labs (OpenAI, Anthropic, Google DeepMind, and others), because academia rarely has access to GPU clusters at the scale required to train a frontier model.
Computer Vision
Computer Vision studies how a model understands images and videos. Specific tasks include classification, detection, segmentation, generation, and so on. Before the foundation-model era, CV was a relatively self-contained direction with its own backbones (ResNet, ViT, and so on) and its own task suite. Today, vision is increasingly being absorbed into multimodal models. Generation is also very active right now — diffusion models and video generation are some of the hot areas.
Natural Language Processing
Natural Language Processing studies how a model understands and generates natural language. The field has essentially been reshaped by LLMs. The classic NLP tasks — translation, summarization, question answering — are now downstream applications of LLMs, and so much of NLP research has shifted toward the capabilities and alignment of the LLMs themselves.
Reinforcement Learning
Reinforcement Learning studies how an agent learns an optimal policy through interaction with an environment. RL has had an interesting arc over the last decade: it was once most famous for game-playing (AlphaGo, AlphaStar), then went through a stretch of being seen as not particularly practical. But over the last couple of years RLHF has made it indispensable for training LLMs and brought it back into the spotlight. The recent reasoning models (OpenAI’s o-series, DeepSeek’s R-series) have pushed RL to the very center of the LLM training pipeline.
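For readers who have never seen RL code, "learning a policy through interaction" can be sketched with tabular Q-learning on a toy problem. The environment below is a made-up five-cell corridor: the agent starts in cell 0 and receives a reward of 1 only when it reaches cell 4. The hyperparameters are arbitrary illustrative values, and this has nothing to do with the scale or algorithms used for LLM training.

```python
import random

N_STATES = 5                  # corridor cells 0..4; cell 4 is the goal
ACTIONS = ("left", "right")

def step(state, action):
    """Environment dynamics: move one cell; reward 1.0 only on reaching the goal."""
    nxt = max(state - 1, 0) if action == "left" else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, else act greedily."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state].values())
    return rng.choice([a for a in ACTIONS if q[state][a] == best])  # random tie-break

def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Run Q-learning and return the learned table of action values."""
    rng = random.Random(seed)
    q = [{a: 0.0 for a in ACTIONS} for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = choose_action(q, state, epsilon, rng)
            nxt, reward, done = step(state, action)
            # Bootstrapped target: immediate reward plus discounted best future value.
            target = reward if done else reward + gamma * max(q[nxt].values())
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q

q_table = train()
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

The agent is never told that "right" is the correct move. It discovers this only from the rewards it receives, and the learned values propagate backward from the goal one cell at a time.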
ML Theory
ML Theory studies the mathematical properties of ML: why deep networks generalize, what the optimization landscape looks like, why over-parameterized models don’t overfit, and so on. It has deep overlap with the theory cluster mentioned earlier, and requires a strong math background.
AI for Science
AI for Science is a relatively new cross-disciplinary direction that has emerged in the last few years, applying ML to specific scientific problems. The most famous example is DeepMind’s AlphaFold, which essentially solved protein structure prediction — an open problem in biology for fifty years — and earned the 2024 Nobel Prize in Chemistry as a result. Beyond that, AI for math, AI for materials science, and others are all getting more attention. As an aside: using AI to do the kind of formal verification mentioned earlier, in order to bring down its cost, is now one of the most important research questions within AI for math.
Honestly, AI feels to me more like the fusion of theory and systems, with a very wide range. Research in this area can be very theoretical or very systems-y. So the bar for the people in it is unusually high. The very top AI researchers tend to be full-stack researchers strong in both math and engineering — they can derive scaling laws and other theoretical analyses, and at the same time get a full training pipeline running stably across thousands of GPUs. That kind of profile is rare in any other CS subfield, but in AI it’s almost the standard at top labs.
Separately, from a personal-development standpoint, AI is probably one of the best-paying directions in CS right now, with some of the strongest job prospects. But it’s also extremely competitive and iterates extremely fast: a paper posted on arXiv can be obsolete three months later. Surviving in this direction requires a particular mindset: you have to keep up with the community’s pace, but you also have to keep your judgment in the middle of all the hype and not get swept along.
The Bitter Lesson
Finally, I want to briefly touch on The Bitter Lesson. The concept was put forward by computer scientist Richard Sutton in a short essay, and its core point is actually quite simple: general methods combined with scaled-up compute will, in the long run, always crush the specialized methods built on carefully designed human priors. This pattern has been demonstrated over and over again across 70 years of AI research. What I find most “bitter” and most ironic about it is this: a lot of conventional wisdom tells us that hard work always pays off, but according to The Bitter Lesson, those specialized efforts ultimately get replaced by more general methods — parse trees and various linguistic features in NLP went this way, hand-crafted descriptors like SIFT and HOG in CV went this way, and many things in our own lives go the same way: the diligent acts we were raised to perform — memorizing an essay, drilling a problem set, reading a book — are essentially specialized optimizations within some pre-defined evaluation framework. These efforts are effective as fine-grained optimization against an external framework, but the moment the framework changes, their transfer value collapses. So what actually stays effective across frameworks tends to be the seemingly less concrete general capabilities — things like how to update your judgment from raw observations, or how to slowly arrive at your own line of reasoning on problems without ready-made answers. Of course, these capabilities still have to be cultivated through diligent acts — reading, drilling problem sets — but what they emphasize is the generalization ability to see through surface phenomena to the underlying essence, not the effort itself; so it’s not the case that the people who read the most or work the hardest necessarily end up the strongest, which is actually quite sobering. That, I think, is where the “bitter” in The Bitter Lesson really comes from. 
Cultivating generalization ability is bound to look inefficient in the short run, but in the long run, that quality is what’s truly worth scaling. And that’s exactly the small reflection I want to share in the final section of this article, From feature engineering to representation learning.
Interdisciplinary: where CS meets other fields
The last cluster is interdisciplinary — the territory where CS meets other disciplines. The subfields here mostly apply CS methods and tools to a specific problem domain, so working in this area generally requires a dual background. Below I’ll walk through a few representative directions.
Computational Biology / Bioinformatics
Computational Biology / Bioinformatics is CS applied to biology. Specific tasks include genome sequencing, protein structure prediction, drug discovery, single-cell analysis, and so on; the AlphaFold mentioned earlier is a landmark result in this area. Because both the data scale and complexity in biology are growing very fast and ML tools keep getting stronger, this line is likely to be a growth area for a long time to come.
Computer Graphics
Computer Graphics studies how to use computers to generate, represent, and manipulate visual content — film effects, game rendering, 3D modeling, physical simulation all fall under this umbrella. With the development of VR/AR and the progress of generative models in recent years, the area has become more active too. Techniques like NeRF and Gaussian Splatting are good examples of successfully combining traditional graphics with modern ML.
Human-Computer Interaction
Human-Computer Interaction studies the interaction between people and computers: UI/UX design, accessibility, AR/VR, novel input devices. HCI is one of the most user-facing directions in CS, so its research often pulls in user studies and psychology — methodologies that aren’t traditionally part of CS. With the explosion of GenAI, a lot of people have started studying interactions between AI and humans and AI’s social impact, all of which also fall within HCI.
Robotics
Robotics studies how to make a physical agent perceive, reason, and act in the real world. The area naturally spans ML, control theory, mechanical engineering, and several other disciplines. With the recent progress in LLMs, using LLMs as the high-level planner for robots has become an active research direction.
Economics & Computation
Economics & Computation sits at the intersection of CS and economics, somewhat similar to Operations Research, with research questions including mechanism design (how to design an auction so that participants bid honestly, for example), algorithmic game theory, and market design. The line is also tightly connected to industry — the ad-bidding systems behind Google, Meta, and similar companies are backed by a lot of research in this area.
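The honest-bidding example has a classic concrete instance: the sealed-bid second-price (Vickrey) auction, where the highest bidder wins but pays the second-highest bid. The sketch below checks, for one hypothetical bidder and a fixed set of rival bids, that no alternative bid does better than bidding one's true value. Bidder names, values, and the grid of alternative bids are all illustrative, and real ad auctions are considerably more elaborate than this.

```python
def second_price_auction(bids):
    """bids maps bidder -> bid. The highest bid wins and pays the second-highest bid."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1]
    return winner, price

def utility(bidder, true_value, bids):
    """Winner's utility is value minus price; everyone else gets zero."""
    winner, price = second_price_auction(bids)
    return true_value - price if winner == bidder else 0

def truthful_is_best(true_value, rival_bids, candidate_bids):
    """Check that bidding true_value is at least as good as every candidate bid."""
    honest = utility("alice", true_value, {"alice": true_value, **rival_bids})
    return all(
        utility("alice", true_value, {"alice": b, **rival_bids}) <= honest
        for b in candidate_bids
    )

rivals_low = {"bob": 7, "carol": 4}     # alice's value exceeds every rival bid
rivals_high = {"bob": 12, "carol": 4}   # a rival outbids alice's value

print(second_price_auction({"alice": 10, **rivals_low}))
print(truthful_is_best(10, rivals_low, range(0, 21)))
print(truthful_is_best(10, rivals_high, range(0, 21)))
```

The intuition is that a bidder's own bid only decides whether they win, never what they pay. Shading the bid risks losing an auction that would have been profitable, and inflating it risks winning at a price above one's value.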
Visualization
Visualization studies how to present high-dimensional or complex data in ways humans can understand. The direction is small but very important in scenarios like data science and scientific computing.
CS Education
CS Education studies how CS should be taught. Specific research questions include language design (how to design a programming language friendly to beginners), pedagogy (how to teach abstraction, how to teach recursion), and access equity (how to bring more underrepresented groups into the discipline).
The interdisciplinary cluster as a whole offers a very wide menu, well-suited to people who, in addition to CS, have an interest in some specific domain. Another nice property: because the problems come from many sources and funding is more spread out, the area is relatively less affected by hype cycles in any single field.
Coming back to the question I opened with: what kind of discipline does CS turn out to be once you get to college? My biggest takeaway, personally, is that CS is much broader than the coding and algorithm contests I encountered in high school. It can be an extremely mathematical discipline (theory, for example), an extremely engineering-oriented discipline (systems, for example), and a discipline that crosses with almost any other field (even law, philosophy, and so on). Everyone can find a direction that fits their background and interests, and that’s the real meaning of horizontal heterogeneity that I mentioned at the beginning.
One more note for younger readers: the survey of directions above is meant to give you a basic sense of the landscape, not to push you to commit to a particular area right now. I’ve actually met plenty of people who finish their undergrad without any especially concrete idea of what they want to do, and that’s totally fine. For most people, the eventual commitment ends up being something like “this direction looks promising” or “I happened to get an offer from a professor or a company in that area,” and that’s how things settle. After laying out all of this, my only hope is that you come away with a better sense of what CS as a discipline actually looks like and maybe spot some little corner that feels right to you — not that you feel any pressure to decide right now.
Choosing a CS School
Before getting into this section, I’d recommend reading the On Choosing Colleges guide first, since a lot of what follows builds on the analysis there. One thing to flag upfront: this section sticks with the research-oriented field taxonomy from earlier and looks at schools from the angle of undergraduate teaching and academic research. The career-oriented analysis I’ll save for the next section.
Teaching-style spectrum
The first thing I’d say is that, for undergraduates, CS programs at different schools roughly fall on a spectrum, with math-heavy teaching at one end and engineering-heavy teaching at the other. The teaching style at a school usually reflects its research style to some extent, too.
The most distinctive feature of math-heavy programs is that the math prerequisites run deep. Princeton and Caltech are typical examples: required courses expect you to read and write proofs fluently. These schools fit people who want to do ML theory, cryptography, formal methods, and other math-heavy directions.
Engineering-heavy programs go the other way: the project load on required courses is heavy, and there’s a lot of hands-on training. CMU and UIUC are typical examples. While these schools do have hardcore theory requirements (like CMU 15-251 and UIUC CS 374), the overall culture and graduates’ muscle memory lean more toward systems engineering. On top of that, plenty of mid-tier R1s go very deep in systems too — UTK, for instance, has a heavy engineering culture and is among the very top in HPC. These schools fit people aiming for systems research, industry-bound SWE, or applied ML.
The middle ground is programs like MIT and Stanford that don’t have a pronounced one-sided cultural skew.
For most liberal arts colleges the picture is different — the styles aren’t as differentiated. As I mentioned in On Choosing Colleges, because LACs have limited lab resources, they default to emphasizing theory. Harvey Mudd is the exception: the whole program is well known for being engineering-heavy.
From a career standpoint, math-heavy training builds a stronger theoretical foundation and gets you closer to the essence of computation; engineering-heavy training, on the other hand, fits a wider set of industry jobs accessible to undergrads and is friendlier for going straight into the workforce after graduation. I’ll come back to the specifics later in the career-path section.
From a pure learning-and-skill standpoint, I personally don’t think either end of the spectrum is inherently better than the other. What really matters is whether you can develop genuine generalization ability in whichever direction you pick (see The Bitter Lesson earlier). Math-heavy training isn’t just pushing formulas around on paper in pursuit of meaningless abstraction, and engineering-heavy training isn’t just grinding problems, debugging, and mechanical repetition. For someone who only knows how to put in effort by rote, neither end of the spectrum will actually make them stronger, while someone with real generalization ability can carve out their own path on either end. So there isn’t a permanently right answer on this spectrum; it’s more like a self-auditing tool. You need to ask yourself, “Do I prefer proof or implementation? Theory or applied science? Mathematical clarity or engineering elegance?” and then pick the position on the spectrum that fits you.
That said, at any decent R1 CS program, whether math-heavy or engineering-heavy, almost every major area will have excellent faculty, so even if you commit to a school whose teaching emphasis isn’t a perfect fit, you can always take more electives in your area of interest or seek out experts in that area to do research with. That’s one of the biggest practical advantages of a school with a large faculty count.
CSRankings limitations and which schools are strong where
While we’re on the topic, this is a good place to return to a known limitation of CSRankings, as I promised when introducing it earlier: a school’s ranking largely reflects the institution’s total output rather than per-faculty quality, so departments with few faculty but high per-capita strength end up significantly understated. Caltech is the clearest victim of this limitation. The school is small to begin with (around 200-something undergrads per class), and the CS department is also small, so even though it has many very strong professors, its CSRankings position comes out far lower than its per-faculty strength would suggest. So how do you hedge against this bias a bit? It’s actually pretty simple: expanding a school’s entry in CSRankings shows the list of its faculty along with their publication counts, so you can weigh the strength of the top faculty alongside the school’s overall ranking, rather than looking only at the institutional total. That said, as I noted before, the lower ranking caused by having fewer faculty isn’t entirely a bias either. Schools with fewer faculty necessarily offer less breadth in both course offerings and research directions, which is a real tradeoff worth weighing. This shows up most clearly at LACs. I mentioned earlier that LACs default to emphasizing theory and offer limited course variety; limited lab resources are part of the reason, but the other half of the explanation is exactly that small faculty counts cap how much ground the electives can cover.
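The gap between total output and per-faculty strength is easy to see with a toy calculation. The sketch below uses two made-up departments and a plain paper count; it is not CSRankings’ actual scoring method, which is more involved, and every name and number here is a hypothetical chosen only to illustrate the effect.

```python
from dataclasses import dataclass


@dataclass
class Dept:
    name: str
    faculty: int    # hypothetical faculty headcount
    papers: float   # hypothetical publication count over some window

    @property
    def per_faculty(self) -> float:
        return self.papers / self.faculty


depts = [
    Dept("Large U (hypothetical)", faculty=80, papers=400.0),
    Dept("Small U (hypothetical)", faculty=15, papers=120.0),
]

# Rank the same departments two ways: by institutional total and per capita.
by_total = sorted(depts, key=lambda d: d.papers, reverse=True)
by_per_capita = sorted(depts, key=lambda d: d.per_faculty, reverse=True)

for d in depts:
    print(f"{d.name}: total={d.papers:.0f}, per-faculty={d.per_faculty:.1f}")
print("Ranked by total:      ", [d.name for d in by_total])
print("Ranked by per-faculty:", [d.name for d in by_per_capita])
```

With these made-up numbers, the larger department wins on total output (400 vs. 120) while the smaller one wins per faculty member (8.0 vs. 5.0), which is the pattern that makes small, strong departments look weaker than they are in a total-output ranking.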
Next, for readers interested in research, I want to give a rough sense of which areas each school is strong in. The Top 4 — CMU, MIT, Stanford, UC Berkeley — go without saying: these four are the strongest in most mainstream directions, so you basically don’t need to audit any specific subarea. Among the rest, University of Washington is the best in AI, UIUC is the strongest in systems, and Princeton is the best in theory. But can we really rank schools like that? This kind of single-label description is admittedly comfortable — it lets the reader build a mental model at minimal cognitive cost — but in practice it’s an irresponsibly crude simplification. “School X is especially strong in area Y” really means closer to “school X has unusually high visibility in area Y” or “school X is historically known for area Y” — it absolutely does not mean “school X is strong only in area Y,” it doesn’t mean “area Y has only this one strong school” either, and it certainly doesn’t mean “lower-ranked schools have no top-tier professors.” So if you have a specific direction in mind, I’d suggest going directly to CSRankings, picking the specific subarea filter, and taking a look at what the ecosystem in that area actually looks like, instead of relying on these rough labels.
CSRankings isn’t absolute either. It reflects a school’s historical publication record, can’t capture a program’s current trajectory, and is even worse at reflecting each faculty member’s actual influence in the community. A recently hired rising star might have huge potential but not show up in the data yet; senior faculty who have already retired might still count; and a Turing Award winner might not have published all that many papers across their entire career. So a more practical complement is to look at the peer recognition a school’s faculty have received. For early-career junior faculty, look at awards like the Sloan Research Fellowship and NSF CAREER Award, which reflect how the field judges these people’s potential over the next few years; for senior faculty, look at higher-bar honors like ACM Fellow, IEEE Fellow, and NAE / NAS membership, which reflect peer recognition of a scholar’s long-term sustained impact. Beyond those, at the single-paper level, top-conference Best Paper Awards and Test of Time Awards also carry strong signal: the former captures a single work’s immediate impact, while the latter captures whether a piece of work still holds up years later. On top of that, the placement of the PhD students in a professor’s group is also an important reference point.
These honors often reflect a school’s true level better than any ranking, but they take quite a bit of work to research. Thankfully, with the various AI tools now widely available, you don’t actually need to flip through department websites one by one — just have an AI agent (Claude or ChatGPT’s deep research mode, for example) list them out for you. AI tends to be much faster and more accurate than humans at this kind of lookup task.
At the end of the day, the review process for these honors, like the scoring process for any ranking, isn’t perfect either. In On Choosing Colleges I touched briefly on the randomness in faculty hiring, and the review processes for these honors are no less random. So I hope readers, while not blindly trusting rankings, also avoid blindly trusting honors. The “Hermès or canvas tote” analogy I used in On Choosing Colleges applies just as well here: honors should be treated as just one indirect signal of ability, not ability itself. Any metric is ultimately a compression of a living, breathing person into a soulless low-dimensional vector. (For more, see the final section of this article: From feature engineering to representation learning.)
Sweet spot framework
For readers interested in research, the ceiling × demand framework I laid out in On Choosing Colleges applies directly here. Examples like UIC for data mining and Utah for graphics — field-specific top mid-tier R1s — are textbook cases of the CS sweet spot. There are plenty more such examples across CS, and since everyone’s strengths are different, the sweet spot varies from person to person, so I won’t enumerate them all here. That said, top schools probably have more recipients of high-prestige awards like the Sloan Research Fellowship, but that doesn’t mean such people don’t exist at mid-tier R1s — so, as I mentioned in On Choosing Colleges, going to a decent school still leaves plenty of room to develop well.
National lab proximity and the compute advantage
One more mechanism worth flagging is the compute advantage that national labs provide. The US Department of Energy operates multiple national laboratories, and the top ones host top-tier supercomputing facilities, offering compute resources far beyond what’s typically available in academia. Schools geographically close to these labs naturally benefit from proximity, gaining structural advantages in collaborative research through channels like joint appointments and collaborative projects. This is a dimension almost orthogonal to overall school rankings. Concretely, California’s Lawrence Berkeley National Laboratory hosts Perlmutter, with UC Berkeley right down the hill; Illinois’s Argonne National Laboratory hosts Aurora, the world’s third-fastest supercomputer, with UChicago, UIC, UIUC, Northwestern, and other nearby schools; the world’s second-fastest supercomputer Frontier is at Tennessee’s Oak Ridge National Laboratory, with UTK and other Southern schools nearby; the world’s fastest supercomputer El Capitan is at California’s Lawrence Livermore National Laboratory, with the UC system and other California schools nearby. Beyond these there’s also California’s SLAC (near Stanford), New York’s Brookhaven (near NYC-area schools like Stony Brook, Columbia, and NYU), and many other examples. These partnerships are often complete game-changers for research in systems (especially HPC), AI for science, and large-scale ML. So if you’re aiming for one of those directions, when picking a school, beyond the metrics already discussed, it’s worth checking whether there’s a national lab nearby. As a rule of thumb, any R1 with this kind of geographic advantage will have at least a few faculty collaborating with a national lab, and at schools with deeper partnerships (UC Berkeley, UTK, that kind) the count of collaborating faculty is much higher.
CS Career Paths
Note: snapshot is as of May 2026. The CS job market shifts quickly (hot specializations rotate, comp levels fluctuate, hiring bars move), so this section is meant as a general landscape to help you build a reliable mental model; for the current state of any specific role, check up-to-date sources.
Finally, let me talk about the common CS career paths for undergrads planning to go straight into industry. Honestly, for undergrads the major-aligned paths are basically just SWE and quant. There are others — startups, product management, consulting — but I don’t have enough experience with those to say much, and they’re not CS-specific anyway, so I’ll skip them here.
I’ve always thought “SWE” at many companies is a catch-all bucket: almost anything can go in. The actual range of work the title covers is much wider than it sounds, and it varies a lot with the company’s business and stack. Let me walk through a few representative branches.
Product Engineer
The most common is the Product Engineer, which further splits into App, Web, and Backend depending on the specific business. The work centers on implementing the business logic the product manager designs — adding a new feature to an app, getting page load time from 800ms down to under 400ms, or designing a backend that can hold up under peaks of tens of millions of QPS for a high-concurrency service. This line emphasizes business understanding, fluency with distributed systems, and the engineering ability to iterate efficiently inside a large monorepo. Product is the largest headcount line at big tech and the friendliest to new grads — the vast majority of new-grad SWE hires at Meta, Google, and Amazon start out on product teams. Beyond the traditional FAANG companies, fast-growing larger firms like Stripe, Notion, Figma, and Cloudflare also hire mostly product engineers — smaller than the giants, but each engineer gets to own a wider slice of the product, the bar on engineering independence tends to be high, and they’ve become a popular choice for new grads in recent years.
Systems/Infrastructure Engineer
Next, on the systems side, is the Systems/Infrastructure Engineer. The work here is closer to the systems research mentioned earlier, but with different priorities from a researcher’s: the researcher cares about turning an idea into a paper, while the infra engineer cares about turning that paper’s idea into something that actually runs stably in production. Specific work includes in-house databases (Google’s Spanner, Meta’s MyRocks), distributed storage, CI/CD pipelines, in-house compilers and toolchains, and even kernel patches. This kind of role is disproportionately common at companies like Databricks and Snowflake, whose core business is low-level infrastructure, and every big tech company has a dedicated infra org as well. But the bar on systems fundamentals is high, so it’s not particularly friendly to most undergrads.
Machine Learning Engineer
Last, on the AI side, is the Machine Learning Engineer (MLE). Like the systems engineer, the MLE is fundamentally an engineer — the job isn’t deriving formulas but actually getting researchers’ trained models deployed into production. Day-to-day work includes wrangling massive training datasets, writing training pipelines, getting models to run with low power on mobile devices, keeping large-scale inference at low latency, and scaling model serving up and down. In practice, MLEs at frontier labs like OpenAI, Anthropic, and Google DeepMind mostly work alongside researchers to get the training runs going; MLEs at the business-driven teams inside Meta and Google mostly tune ranking models and recommendation systems for deployment. The role demands both a theoretical understanding of common AI models and fluency with stacks like PyTorch and CUDA. Over the last couple of years, with the GenAI explosion, this has also become one of the fastest-growing and highest-paid lines within SWE for new grads — but the bar on candidates is correspondingly high.
S&P 500 SWE
The three branches above are all SWE roles inside the tech industry, but SWE’s footprint extends far beyond tech companies. A substantial fraction of the S&P 500 isn’t pure tech but still has sizable engineering teams internally, and the three branches above — product, infra, MLE — are all well-represented: e-commerce at Walmart and Target maps to product engineering, Visa and Mastercard’s payment systems are textbook large-scale infra work, and Disney’s streaming recommendation system and UnitedHealth’s risk modeling increasingly rely on MLE. SWE roles at these traditional-industry companies pay a tier below big tech, with a relatively moderate pace and a more stable business — a nice fit for people who want work-life balance or to build domain knowledge in a specific industry.
Other Traditional-Industry SWE
Beyond the S&P 500 giants, there are a few other traditional-industry SWE lines worth mentioning. Accounting and consulting is probably the largest of them. The Big 4 (Deloitte, PwC, EY, KPMG) all have sizable internal engineering orgs working on audit automation, tax software, and tech delivery on consulting engagements; in strategy consulting, McKinsey (McKinsey Digital), BCG (BCG X), and Bain (Bain Vector) have all been aggressively expanding their digital arms in recent years; and within the IT consulting / outsourcing industry, Accenture, IBM Consulting, Cognizant, Infosys, and TCS collectively have SWE pools in the hundreds of thousands. Beyond that, biotech / hospital systems (Vertex, Moderna, Biogen, and Regeneron on the biotech side, plus large hospital systems like Mass General Brigham and Cleveland Clinic) form another sizable domain-specific SWE cluster, mostly working on clinical data pipelines, electronic health records, and bioinformatics. Compensation and WLB across these lines are roughly on par with the S&P 500 — a good fit for generalists or for people with a biology / medical background.
Traditional Finance SWE
Within traditional industries, the somewhat more distinctive line is finance SWE, covering both investment banks (Goldman Sachs, Morgan Stanley, JP Morgan, etc.) and asset management (BlackRock, Vanguard, Fidelity, State Street, Wellington, etc.). Many firms on both sides have engineering orgs of thousands or even tens of thousands of people internally, and the work is fairly similar across them — the three branches above are all well-represented: client-facing and internal trading platforms / business portals are product engineering, in-house low-latency trading systems and market-data systems are textbook infra engineering, and portfolio analytics, risk systems, fraud detection, credit risk, and algorithmic execution all increasingly depend on MLE. Finance SWE compensation overall sits between big tech and traditional S&P 500, and the culture is more formal than at pure tech companies; investment banks run noticeably more intense schedules with longer hours than tech, while asset management is closer to S&P 500 in this respect. Both industries are a solid choice for people who want to work on engineering problems in a financial context.
Quant
Some of those traditional finance SWE roles are actually fairly close to quant in content, just with a lower bar on the candidate than actual quant. So what does quant actually do? Before answering, I think it’s useful to split quant into two camps: high-frequency market makers and hedge funds.
A market maker plays the role of liquidity provider in the market — think “middleman” — quoting both a bid and an ask simultaneously and earning the spread between them (bid-ask spread). With the vast majority of trading now electronic, market makers mostly do this at high frequency online (though a small amount of trading is still done over the phone, such as some large ETF trades by institutions like pension funds). The most well-known firms here include Jane Street, Citadel Securities, Hudson River Trading, Jump Trading, Optiver, and others. The core challenge of market-making strategies is low latency — many strategies’ P&L comes down to microsecond- or even nanosecond-level latency differences — so the bar on systems engineering at these firms is extremely high. A substantial portion of what quant developers do is essentially indistinguishable from top-tier infra engineering, just with a stack customized for ultra-low-latency (kernel bypass, FPGA acceleration, hand-written assembly on hot paths, and so on). The roles at these firms generally split into three: quant traders handle real-time decisions and strategy tuning, quant researchers design new strategies and dig up new signals, and quant developers build and optimize the entire trading infrastructure. All three are very well paid; new-grad base salaries at the top firms are typically in the \$200–300k range, and total compensation with sign-on and bonus generally exceeds \$400k.
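As a rough illustration of the bid-ask spread mechanics described above, here is a toy PnL calculation. It is only a sketch under simplifying assumptions: the function name, the prices (in integer cents to avoid floating-point noise), and the fill counts are all hypothetical, and real market making involves fees, queue position, and much more.

```python
def spread_pnl(bid, ask, buys_filled, sells_filled, mark_price):
    """Toy market-maker PnL in cents.

    The maker buys `buys_filled` units at `bid` and sells `sells_filled`
    units at `ask`. Any leftover inventory is valued at `mark_price`.
    """
    cash = sells_filled * ask - buys_filled * bid
    inventory = buys_filled - sells_filled
    return cash + inventory * mark_price


# Hypothetical quotes: bid 99.98, ask 100.02, expressed in cents.
BID, ASK = 9998, 10002

# Balanced flow: every unit bought is also sold, so PnL is pure spread.
balanced = spread_pnl(BID, ASK, buys_filled=1000, sells_filled=1000, mark_price=10000)

# Unbalanced flow: 400 units are left in inventory while the price drops to 99.90.
unbalanced = spread_pnl(BID, ASK, buys_filled=1000, sells_filled=600, mark_price=9990)

print(f"balanced PnL:   {balanced} cents")
print(f"unbalanced PnL: {unbalanced} cents")
```

In the balanced case the maker earns the 4-cent spread on each of 1,000 round trips, for 4,000 cents. In the unbalanced case the spread earned on 600 matched units (2,400 cents) is outweighed by the loss on the 400 units still held (3,200 cents), for a net of -800 cents. Reacting before prices move against leftover inventory is one reason latency matters so much for these strategies.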
Unlike market makers, which act as “middlemen” using their own capital, hedge funds primarily manage money for clients. From a funding-source perspective this is actually similar to bank wealth management, just with a narrower client base and more aggressive strategies. Their trading frequency is much lower than market makers’ but the strategy space is much broader. The most famous firms include Citadel, Two Sigma, D.E. Shaw, Renaissance Technologies, Millennium, and Point72, and each is further subdivided into pods or business lines with different strategy approaches and investment philosophies. Their holding periods range from minutes to months, so low latency isn’t as critical as it is for market makers, but they lean heavily on mathematical modeling tools like statistical modeling, time-series analysis, and machine learning.
For CS undergrads, the set of reachable roles is smaller on the hedge fund side than on the market-maker side. For one, aside from a handful of pure quant funds (such as Renaissance Technologies), the vast majority of hedge funds still rely on discretionary rather than systematic trading. Even at the firms listed above with the strongest quant operations, a non-trivial share of the work is still discretionary, and CS offers no clear advantage over other majors in discretionary trading. For another, in every fund’s systematic business line, the most central role is almost always the quant researcher, and for that funds prefer to hire math, statistics, or physics PhDs directly, so it’s relatively hard for a new grad with only a bachelor’s degree to walk straight in. But plenty of funds’ systematic businesses still have a fair number of quant developer roles aimed at CS undergrads, for example in GQS (Global Quantitative Strategies) at Citadel. GQS quant developers primarily build research and backtest platforms for quant researchers, maintain production trading infra, and develop the tooling needed when a strategy is being deployed. This is essentially the same work as a quant developer at a market maker, with the main difference being that the stacks serve different strategies: hedge funds mainly run mid- and low-frequency trading while market makers focus on high-frequency market making. On top of that, a large share of Two Sigma’s and D.E. Shaw’s strategies are also systematic, so they too have very large quant developer pipelines internally. Base salary for these roles at top funds is roughly on par with quant developers at top market makers. Both sides’ bonuses are tied to PnL; the difference is that a pod-based fund ties your bonus more tightly to a single pod’s PnL, so the upside is higher but so is the volatility.
Another trend worth flagging from recent years: both market makers and hedge funds are leaning more heavily on ML. Market makers mostly use ML to dig up short-horizon signals and optimize execution; hedge funds use ML directly as the core engine of their systematic strategies. From an engineering point of view, this is very similar to what frontier-lab AI researchers and MLEs do, just with financial markets as the deployment context. As a result, frontier labs and quant firms are also actively poaching talent from each other right now.
Overall, whether market maker or hedge fund, the bar to enter quant is one of the highest in the CS career space, roughly on par with frontier labs. Market-maker trading roles tend to weigh quantitative intuition and reaction speed especially heavily, so they strongly favor undergrads with backgrounds in math, physics, or competitive programming. At a systematic fund, the central and most selective role is the quant researcher, which emphasizes independent research ability, making it harder for an undergrad to land an offer directly. Of course, for the very strongest undergrads these two sets of qualifications often overlap. But for the typical CS undergrad, whether targeting a market maker or a hedge fund, the most accessible entry point is actually quant developer, because the bar on pure trading or research ability is somewhat lower for that role. Just note that the bar on engineering ability for these roles is on par with or even higher than that for top big-tech infra engineers, so it’s actually a great target for those who want to go deep in systems.
CS Industry Geography
Note: snapshot is as of May 2026. CS geography shifts quickly — offices open, HQs relocate, layoffs happen — so this section is meant as a general landscape to help you build a reliable mental model; for any specific company, defer to the latest information.
Having walked through the main career paths, let me switch perspectives and look briefly at the geographic distribution of US CS industry hubs. This often gets overlooked, but it actually has real impact on internship opportunities and post-graduation placement.
Big Tech
Big tech’s core hub is the Bay Area in California (which includes downtown SF as well as the South Bay around Mountain View / Palo Alto / Cupertino): Meta, Google, Apple, and Nvidia are all headquartered here. The Seattle metro is the second-largest hub, home to Microsoft in Redmond east of Lake Washington and Amazon’s HQ in Seattle proper. NYC has plenty of big-tech offices too, mostly focused on fintech, advertising, and media-related products. In recent years Austin has been emerging as a new big-tech cluster: Apple has its largest campus outside Cupertino there, Tesla relocated its HQ to Austin for tax and policy reasons (Oracle also moved its HQ from the Bay Area to Austin in 2020, then announced in 2024 that it plans to move its global HQ again, this time to Nashville), and Meta, Google, and others have all opened sizable satellite offices in Austin.
Other Tech Companies and AI Labs
Beyond the giants, the remaining tech firms fall into two rough buckets. One is the publicly listed mid-cap tech companies like Snowflake, Cloudflare, Datadog, MongoDB, Twilio, and Figma — this tier is much more evenly distributed outside SF than big tech is, with significant footprints in NYC (Datadog, MongoDB, etc.), Boston (HubSpot, etc.), Austin, Chicago (some enterprise SaaS), and Bozeman MT (Snowflake). The other is fast-growing private firms not yet public, especially the AI labs and AI infra companies that have risen in the last couple of years: OpenAI, Anthropic, Databricks, Stripe, Notion, Vercel, and others — almost all of them are headquartered in downtown SF.
S&P 500
Traditional-industry companies in the S&P 500 are the most geographically spread out. SWE teams basically follow the company’s HQ, so the hubs cover almost the entire US. In retail, Walmart is in Bentonville, Arkansas (with a sizable Walmart Global Tech site in Sunnyvale in the Bay Area); Target is in Minneapolis (with a Bay Area tech hub as well); Home Depot is in Atlanta (and its Austin Technology Center is also a sizable tech site); Costco is in Issaquah, east of Seattle. In healthcare and pharma, UnitedHealth is in Minneapolis, CVS Health is in Rhode Island, Aetna (acquired by CVS) is in Hartford, Pfizer is in NYC (with R&D centered in Cambridge MA and Groton CT), and Merck and J&J are in New Jersey. Financial services are more spread out: Visa is in Foster City on the Peninsula (with a large tech office in Austin as well), Mastercard is in Purchase, north of NYC, Bank of America is in Charlotte (with sizable engineering teams in NYC, Dallas, and Chicago), Capital One is in McLean, outside DC (with sizable tech offices in NYC, SF, Plano TX, and Richmond), and American Express is in NYC (with Phoenix and Salt Lake City as two other large tech hubs). For automotive, Ford and GM are both in the Detroit area. In aerospace/defense, Boeing moved its HQ from Chicago to Arlington but Commercial Airplanes is still anchored around Seattle; Lockheed Martin is in Bethesda (with major sites in Fort Worth, Sunnyvale, Denver, and Orlando); and Northrop Grumman is in Falls Church. In media, Disney is in Burbank, LA (with Bay Area and Seattle also serving as Disney Streaming tech sites), and Comcast is in Philadelphia. The defining feature of this tier of jobs is how spread out it is — partly because the HQs themselves cover the whole country, and partly because most of these companies also operate sizable secondary tech / delivery centers in second-tier cities like Salt Lake City, Phoenix, Plano, Columbus, and Tampa. 
So if you don’t want to stay in Bay Area / Seattle / NYC after graduation and want to develop your career in some other US city, the S&P 500 tier offers the broadest coverage and is the easiest place to find SWE roles. Also worth noting: these companies have been heavily upgrading their engineering orgs in recent years — sub-orgs like Walmart Global Tech and Capital One Tech are essentially run on modern tech-company standards internally, and the experience for job seekers isn’t far off from big-tech SWE.
Other Traditional Industries
The defining geographic feature of accounting and consulting firms is that their headquarters mostly sit in NYC or Boston, but their office networks cover essentially every major US metro, since the consulting business is built around staying close to clients. The US headquarters of the Big 4 are all in NYC, but they each have offices of thousands of employees in major cities like Chicago, DC, Atlanta, Dallas, LA, SF, Houston, and Boston. In strategy consulting, McKinsey’s US headquarters is in NYC, while BCG and Bain, both founded in Boston, still keep their headquarters there — and all three have offices in every major metro. Within IT consulting and outsourcing, Accenture’s US presence is anchored in Chicago but their office network is extremely spread out across the country; IBM Consulting follows IBM’s headquarters in Armonk north of NYC (with Raleigh-Durham being IBM’s largest non-HQ US campus, and Austin and Atlanta also sizable sites); Cognizant is headquartered in Teaneck, NJ (with Dallas, Phoenix, and Tampa as major delivery centers); Infosys anchors its US presence in Indianapolis, where it operates a large tech / training campus; TCS is headquartered in NYC, with major delivery centers spread across Cincinnati (its largest, Seven Hills Park, sits in Milford OH), Columbus, Edison NJ, and Phoenix.
Biotech firms are mostly concentrated in Boston, the Bay Area, and NYC metro. Boston barely needs an introduction — Moderna and Biogen are both headquartered around Kendall Square, while Vertex sits in Seaport. In the Bay Area, the most prominent names are Genentech in South SF and Gilead in Foster City. In NYC metro, Pfizer is headquartered in Manhattan, Bristol Myers Squibb is headquartered in Princeton NJ (with a major office in NYC), Regeneron is in Tarrytown north of NYC, and Merck is headquartered in Rahway, NJ.
Hospital systems sit on the opposite end — unlike biotech’s heavy concentration, the top US hospital systems are actually quite spread out, with essentially every major metro having its own anchor medical center: Mass General Brigham in Boston, Cleveland Clinic in Cleveland, Mayo Clinic in Rochester, Minnesota, Johns Hopkins Medicine in Baltimore, NYU Langone in NYC, Kaiser Permanente in the Bay Area, UPMC in Pittsburgh, and so on — so health IT engineering teams naturally end up distributed across the country.
Traditional Finance
The headquarters of investment banks are almost all in Manhattan in NYC — Goldman Sachs, Morgan Stanley, JP Morgan, Citi, and so on. But their engineering teams are actually distributed far more widely. Goldman Sachs’s Salt Lake City and Dallas offices are both major US sites outside NYC with engineering making up a significant share of headcount, and the Dallas campus in particular is still being aggressively built out; JP Morgan’s tech center of gravity is actually distributed across non-NYC hubs like Columbus OH, Plano TX, and Wilmington DE; Citi has major tech sites in Tampa and Irving TX; and Morgan Stanley has sizable offices in Salt Lake City and Westchester NY. Within asset management firms, BlackRock is headquartered in NYC (with Atlanta as its primary engineering hub outside NYC, and sizable offices in Princeton NJ, Wilmington DE, and SF), Vanguard is in the Valley Forge / Malvern area west of Philadelphia (with major offices in Charlotte and Dallas), and Fidelity, State Street, and Wellington are all headquartered in Boston — and Fidelity in particular has major tech sites in Smithfield RI, Salt Lake City, Raleigh-Durham, and Westlake TX. Taking biotech and asset management together, Boston is one of the few hubs outside of NYC / Bay Area that covers multiple career paths; Salt Lake City, hosting major tech sites of Goldman, Morgan Stanley, Fidelity, and American Express, plus Adobe’s Lehi campus and many other major-company offices, has likewise grown into a mature tech hub in recent years — earning the area the nickname Silicon Slopes.
Quant
The two camps within quant have slightly different geography. For market makers, Chicago is the traditional hub for derivatives market making: Jump Trading, DRW, and others are HQed there. NYC is where the ETF / options / equities market makers Jane Street, Hudson River Trading, and Virtu sit. Houston is the hub for commodity and energy trading; large trading firms like Citadel and DRW have sizable commodity / energy desks there, and pure-play energy traders like Vitol, Mercuria, and Trafigura all have their main US offices in Houston. Hedge funds, by contrast, are mostly concentrated in NYC and neighboring Connecticut: Two Sigma, D.E. Shaw, Millennium, Point72, and others are HQed there. Miami has emerged as a new hub for both camps in recent years. Citadel and Citadel Securities relocated their HQs there from Chicago in 2022 (the Chicago office is still sizable), and Millennium, Point72, Schonfeld, Balyasny, ExodusPoint, and D1 Capital have all opened sizable offices in Miami or nearby West Palm Beach. With Florida’s lack of state income tax as an added draw, the quant footprint in the area continues to grow.
Industry clusters and school choice
The industry clusters above tie directly to undergrad school choice: CS industry distribution is highly geographic, and a school’s location often gives a major boost when it comes to finding work. California schools are the most tightly tied to Silicon Valley, where FAANG, mid-cap tech, and SF unicorns all recruit aggressively from them. UW, with both a strong CS program of its own and the geographic advantage of being in Seattle alongside Microsoft and Amazon, has a near-monopoly recruiting pipeline in the PNW. UIUC is relatively close to Chicago and is itself a systems powerhouse, so it’s one of the largest feeders to Chicago trading firms. UChicago is in Chicago itself and has a strong math department, so it too is especially popular with Chicago trading firms. UT Austin has a strong CS program and sits in Austin, so it’s the most direct undergrad feeder to the Austin big-tech cluster (Apple, Tesla, Meta, Google, etc.). Georgia Tech is in Atlanta with a top-tier CS program, and has strong pipelines both to the S&P 500 giants based in Atlanta (Home Depot, Delta, Coca-Cola, UPS, etc.) and to FAANG. Rice is in Houston and is the most natural undergrad feeder for the Houston energy trading scene mentioned above. In the NYC area, Columbia, NYU, Princeton, and Stony Brook are naturally close to Wall Street, with obvious pipelines to NYC trading firms and hedge funds. But the examples listed here are really just the tip of the iceberg.
There’s also a less-obvious alignment that often gets overlooked: many traditional giants strongly favor nearby state universities when hiring employees (including SWEs), so state schools right next to traditional-giant HQs are often the main undergrad pipeline for those companies — University of Arkansas for Walmart, University of Minnesota for Target / UnitedHealth / 3M, Michigan / Michigan State for Ford / GM, NC State / UNC for Bank of America, UMD / Virginia Tech / UVA for DC-area defense (Lockheed, Northrop) and Capital One, Purdue for Eli Lilly in Indianapolis, Arizona State for Intel’s Chandler campus, University of Utah for Adobe, and so on. There are far too many other examples to enumerate here.
So for undergrads planning to go straight to work after graduation, what really matters when picking a school by career outcomes is exactly this kind of alignment between school location and industry hub. A difference of a few places in US News or CSRankings is a superficial factor by comparison. That said, for companies and roles that are hard to break into in the first place, proximity to an industry cluster is only a meaningful boost and convenience, not a guarantee; how things ultimately pan out still comes down to individual ability. It’s analogous in spirit to the institutional advantage UWC students have over other high schoolers when applying to U.S. colleges. It is also somewhat similar to the “MBA illusion” I mentioned in On Choosing Colleges: for the better companies and roles, a non-trivial share of their employees come from nearby schools, which is not the same as a non-trivial share of nearby schools’ students getting into these companies. Unlike the “MBA illusion,” though, this one involves only the statistical mismatch, not the reverse causation.
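To make that statistical mismatch concrete, here is a back-of-the-envelope sketch. Every number in it is hypothetical and chosen only to show the asymmetry; none of it is real hiring data for any company or school.

```python
# All figures below are hypothetical, picked only to illustrate the asymmetry.
firm_new_grad_hires = 100        # new-grad hires per year at a selective local firm
hires_from_nearby_school = 30    # how many of those hires come from the nearby school
nearby_school_cs_grads = 1500    # CS graduates per year from that nearby school

# Share of the firm's hires who come from the nearby school: P(school | hired)
share_of_firm = hires_from_nearby_school / firm_new_grad_hires

# Share of the school's graduates who land at the firm: P(hired | school)
share_of_school = hires_from_nearby_school / nearby_school_cs_grads

print(f"share of the firm's hires from the nearby school: {share_of_firm:.0%}")
print(f"share of the school's CS grads hired by the firm: {share_of_school:.0%}")
```

Under these made-up numbers the same 30 people look like a large slice of the firm and a small slice of the school, which is why a strong pipeline seen from the company side says little about any individual student’s odds.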
From feature engineering to representation learning
Before getting into this section, I want to introduce a small concept from machine learning. In the traditional machine learning era, the most critical step in training a model was usually feature engineering. In this step, the engineer had to manually carve out a set of features (telling a model meant to recognize cat photos things like “cats usually have four legs” or “a cat’s legs are usually furry”), and the model would only train and learn within this pre-designed feature space. This step relied heavily on the engineer’s domain knowledge of the specific problem, and was often the make-or-break factor for whether a model could be trained well. But in the later deep learning era, people discovered a simpler and often more effective paradigm: we no longer need to hand-design complex features. Instead, we feed the raw data directly into a sufficiently powerful model and let it learn an internal representation from the data on its own. This is representation learning. The learned representation usually can’t be spelled out in natural language, yet it is often far more precise than any human-designed feature. From the CNN revolution sparked by ImageNet to today’s foundation models, this paradigm has run through almost every breakthrough in AI over the past decade-plus. And this is precisely The Bitter Lesson I mentioned earlier.
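For readers who want to see the contrast in code, here is a minimal sketch on a toy problem. The task (XOR on two inputs), the hand-crafted feature, and the TinyNet class are all my own illustrative assumptions, not anything from a real system, and a four-unit network is only a stand-in for the "sufficiently powerful model" described above. It uses only the Python standard library.

```python
import math
import random

# Toy task (hypothetical): XOR on two binary inputs. No straight line over the
# raw inputs (x1, x2) separates the two classes.
DATA = [((0.0, 0.0), 0), ((0.0, 1.0), 1), ((1.0, 0.0), 1), ((1.0, 1.0), 0)]


def handcrafted_feature(x):
    """Feature engineering: a human who understands the task supplies the feature."""
    x1, x2 = x
    return abs(x1 - x2)  # "the two inputs differ" is exactly what XOR tests


def predict_with_feature(x):
    # With the right feature in hand, a fixed threshold is all the "model" needs.
    return 1 if handcrafted_feature(x) > 0.5 else 0


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyNet:
    """Representation learning: a hidden layer learns its own features from raw inputs."""

    def __init__(self, hidden=4, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.uniform(-1, 1) for _ in range(hidden)]
        self.b2 = 0.0

    def representation(self, x):
        # The learned internal features; nobody wrote these down by hand.
        return [math.tanh(w[0] * x[0] + w[1] * x[1] + b)
                for w, b in zip(self.w1, self.b1)]

    def forward(self, x):
        h = self.representation(x)
        return sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)

    def loss(self, data):
        # Mean binary cross-entropy, clipped to avoid log(0).
        total = 0.0
        for x, y in data:
            p = min(max(self.forward(x), 1e-12), 1 - 1e-12)
            total -= y * math.log(p) + (1 - y) * math.log(1 - p)
        return total / len(data)

    def train_epoch(self, data, lr):
        # Per-example gradient descent with backpropagation written out by hand.
        for x, y in data:
            h = self.representation(x)
            p = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
            d_out = p - y  # gradient of the loss w.r.t. the pre-sigmoid output
            for j, hj in enumerate(h):
                d_h = d_out * self.w2[j] * (1 - hj * hj)  # back through tanh
                self.w2[j] -= lr * d_out * hj
                self.w1[j][0] -= lr * d_h * x[0]
                self.w1[j][1] -= lr * d_h * x[1]
                self.b1[j] -= lr * d_h
            self.b2 -= lr * d_out


net = TinyNet()
initial_loss = net.loss(DATA)
for _ in range(5000):
    net.train_epoch(DATA, lr=0.1)
final_loss = net.loss(DATA)

feature_acc = sum(predict_with_feature(x) == y for x, y in DATA) / len(DATA)
learned_acc = sum((net.forward(x) > 0.5) == bool(y) for x, y in DATA) / len(DATA)
print(f"hand-crafted feature accuracy: {feature_acc:.2f}")
print(f"learned representation accuracy: {learned_acc:.2f}")
print(f"loss before -> after training: {initial_loss:.3f} -> {final_loss:.3f}")
```

The hand-crafted route works immediately, but only because I already knew what XOR is. The network is never told that; whatever separates the classes has to show up inside representation() through training alone. How well it ends up doing depends on the seed, learning rate, and number of epochs I picked here, so treat the printed numbers as a demonstration rather than a benchmark.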
But the most interesting thing about this paradigm shift actually lies not in the model training inside ML, but in the generalizability of the paradigm itself: when it comes to schools, other people, and even ourselves, we can do representation learning instead of feature engineering. Representation learning about a school means not defining its value through hand-crafted features like US News or CSRankings, but slowly learning an internal representation of a program from the rawest observations — which faculty are doing what research, what industry hubs are nearby, whether the undergraduate culture is math-heavy or engineering-heavy, and so on. That’s exactly what this whole article, and the earlier On Choosing Colleges, have been doing. The same goes for representation learning about other people: a judgment about someone’s real ability and character should never be read directly off surface features like title, salary, or awards. Instead, we should let that judgment emerge slowly from how they handle things and how they treat people. And the same logic applies to our own self-worth — a point I already discussed in Victims of the System: anchoring our self-worth in external metrics like GPA, ranking, or offers is essentially defining our own value through a set of features someone else hand-crafted.
Of course, representation learning is never a free lunch on the technical side. It usually requires a massive amount of data and a long stretch of pretraining, and models trained this way often perform worse early on than carefully feature-engineered traditional models. The same is true on the cognitive level. If we stop using ready-made metrics and instead let our judgments about the world emerge from the rawest observations, we’ll often end up looking inefficient compared with peers who are dutifully optimizing external metrics, and we may even start wanting to take shortcuts through feature engineering ourselves. That’s exactly the price representation learning demands, because judgments that emerge from raw observations only become genuinely more accurate and more robust than any hand-crafted feature after we’ve processed enough raw data.
So I wouldn’t claim that I’ve gotten very far with representation learning either; I’m probably still in the stage where I need a lot of data to keep doing pretraining. But anyway, this is a small reflection I’ve picked up after studying CS for as long as I have, and I wanted to share it with those who are walking the same path.