
How ForkFox sees a city

A 4.6-star restaurant can serve a six-out-of-ten biryani. Here's why dish-level intelligence is harder than venue ratings — and why it's the only honest way to tell you what to eat.

The Dish · No. 02 · Concept
The star problem, explained
Stop scoring rooms. Start scoring plates.
ForkFox · concept · 2026

The star problem

Every time you open a food app, it hands you a venue. Four and a half stars. Save it. Map it. Drive there. You sit down, and the dish that was supposed to be the signature — the thing the reviewers cried about — arrives cold. Or oily. Or worse, mediocre in a way that feels personal. You paid for the reviews. The reviews were wrong.

Pull that thread and something strange happens. The star rating isn't actually about any one thing. It's an average of a thousand opinions about unrelated things. The patio is lovely. The owner walks the floor. The risotto was cold that Tuesday in March. The wine list is deep if you know how to read it. One reviewer had a terrible day and took it out on the bread course. All of those get compressed into a single 4.6, and that single number gets pasted onto a map pin, and that map pin gets called a recommendation.

It isn't one. It's a compression. And the compression throws away the only information that mattered when you were hungry: which plate should I order?

A star rating is an average of a thousand opinions about different things. It can tell you a restaurant exists. It cannot tell you what to order.

Every food app on the English-speaking internet has been playing this compression game since roughly 2004. Yelp. Google. OpenTable. TripAdvisor. Michelin's digital surface. Even the review aggregators that aggregate the aggregators. All of them rank restaurants, because restaurants are the unit that maps and reservation systems have always used, and because the word "dish" didn't have a clean database schema in the early days of local search.

But you never ate a restaurant in your life. You ate the pork shoulder. You ate the clams. You ate a burrito wrapped in foil at eleven-thirty at night standing up. The plate is the unit. The restaurant is the address where the unit lives. It is long past time somebody started ranking the units.

What a dish score actually measures

Scoring plates is harder than scoring rooms, which is the polite way of saying this is why nobody has done it yet. A room is a fuzzy average — a kind of vibe. A plate is specific. A plate is either good or it's not. And our judgment about whether a plate is good isn't one judgment, it's at least five, and they don't always agree.

ForkFox scores each plate on five attributes: flavor, texture, execution (how the kitchen handles the technical work), context (how the plate compares to its cuisine's canonical version), and value (whether the price lines up with what's on the plate). The algorithm combines them into a score out of 100 and calibrates that score to your taste.

Those five numbers feed a combined score. The combination is not a simple average — it's a weighted formula that shifts by cuisine. Flavor and execution carry more weight for technical-floor cuisines; texture and context carry more for regional and hyperlocal cuisines; value stays constant across all of them, because nobody's price tolerance changes based on what they ordered.
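The weighting described above can be sketched in a few lines. This is a hypothetical illustration, not ForkFox's actual formula: the weight tables, the two cuisine classes, and the sample scores are all invented. The only constraints taken from the text are that flavor and execution weigh more for technical-floor cuisines, texture and context weigh more for regional ones, and value stays constant.

```python
# Illustrative sketch of a cuisine-weighted dish score. Assumes five
# attributes (flavor, texture, execution, context, value), each 0-100.
# All weights and cuisine groupings here are invented for illustration.

WEIGHTS = {
    # Technical-floor cuisines: flavor and execution carry more weight.
    "technical": {"flavor": 0.30, "execution": 0.30, "texture": 0.15,
                  "context": 0.10, "value": 0.15},
    # Regional / hyperlocal cuisines: texture and context carry more.
    "regional":  {"flavor": 0.20, "execution": 0.15, "texture": 0.25,
                  "context": 0.25, "value": 0.15},
    # Note: "value" keeps the same weight in both classes.
}

def dish_score(attrs: dict, cuisine_class: str) -> float:
    """Combine five 0-100 attribute scores into one 0-100 dish score."""
    weights = WEIGHTS[cuisine_class]
    return sum(attrs[name] * w for name, w in weights.items())

# A made-up ramen plate, scored under the technical-floor weighting.
ramen = {"flavor": 90, "texture": 80, "execution": 90, "context": 80, "value": 90}
print(dish_score(ramen, "technical"))
```

The same five numbers produce a different combined score under the regional weighting, which is the point: the formula shifts with the cuisine, not with the venue.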

Why the dish wins over the restaurant

Run the algorithm on a real city and a strange thing happens. The rooms you expect to dominate — the tasting-menu fortresses, the three-starred reservations, the places where the bartender knows your name — don't always dominate the top scores. Sometimes they do. But more often, a 94 lives on the menu of a no-name spot whose reviews you've scrolled past because the venue rating was an uninspired 4.2.

We see this constantly in the beta. A 4.5-star Oakland restaurant that scores a 67 on its most-ordered dish and a 91 on a side that almost never gets mentioned. A Michelin-starred Center City room where the signature course grades out at 81 and the bread service grades out at 94. A Philadelphia taco truck whose al pastor scores higher than anything within a ten-block radius, including three sit-down restaurants with triple the price and double the star rating.

This isn't a gotcha. It's what happens when you stop compressing. The compressed number was hiding real information. The real information was that the room average lies about the plate variance. Every restaurant is actually a portfolio of dishes, and portfolios have uneven performance, and when you only read the portfolio's summary statistics you systematically miss the best individual investments.

The room average lies about the plate variance. Every restaurant is a portfolio of dishes — and portfolios have uneven performance.

The dish-first view also surfaces something the star-first view never could: the no-name spot whose one perfect plate is worth the trip. Every city has dozens of these. They never break into the top twenty restaurants. They don't have a marketing budget. Their one perfect plate is surrounded by six mediocre ones, and the venue rating drags the perfect plate down to the city's median. That plate was invisible under the star system. It becomes findable under the dish system.
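The portfolio argument above reduces to simple arithmetic. Here is a toy example with invented scores: a steady room with no standout plate out-averages an uneven room whose menu hides the best dish in the neighborhood.

```python
# Toy illustration (all scores invented): the venue average hides
# plate variance. Each list is one restaurant's menu of dish scores.
consistent_room = [78, 76, 80, 77, 79]   # steady menu, no standout
uneven_room     = [94, 61, 58, 65, 62]   # one perfect plate, rest mediocre

def venue_average(dishes: list) -> float:
    """The star-system view: one compressed number per room."""
    return sum(dishes) / len(dishes)

print(venue_average(consistent_room))  # the "better" restaurant by average
print(venue_average(uneven_room))      # dragged down to the city's median
print(max(uneven_room))                # the plate the average buried
```

A ranking by `venue_average` puts the consistent room first; a ranking by best plate puts the uneven room first. Same data, opposite recommendation.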

What we found in beta

We've been running ForkFox against San Francisco and Philadelphia for most of 2026. The beta has scored several thousand dishes. We haven't ranked restaurants yet — that would violate the very frame we just spent two thousand words defending — but we have a few pattern-level observations that held up across both cities.

The best dish on a menu is usually not the "house signature." This surprised us. The dishes marketing teams push hardest are frequently the third or fourth best thing the kitchen makes. The actual best plate is often a lunch special, a side, or a fish-of-the-day — something that wasn't written for a website and therefore reflects what the chef actually thinks is working that month.

Value scores skew higher outside the Michelin-halo neighborhoods. This surprised nobody who eats in immigrant neighborhoods. The corollary is that the Michelin-halo rooms aren't overpriced relative to their attribute scores — they're correctly priced for the technical floor they clear. You're just getting the same attribute delivery per dollar somewhere else, served on heavier ceramic.

Execution scores are the noisiest metric and the hardest to fix. A dish that's a 94 on its best day and a 76 on its worst is a reputation problem for the restaurant and a prediction problem for us. We're weighting execution scores more conservatively during beta until we have more repeat-visit data.
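One standard way to weight a noisy metric conservatively — a sketch of the idea, not necessarily what ForkFox implements — is to shrink the observed mean toward a prior until enough repeat visits accumulate. The prior value and pseudo-count below are invented for illustration.

```python
# Hypothetical shrinkage sketch for a noisy execution score: with few
# visits the score stays near a prior; repeat visits pull it toward
# the observed mean. Prior (80) and pseudo-count (k=5) are invented.

def conservative_execution(observed: list, prior: float = 80.0, k: int = 5) -> float:
    """Blend observed visit scores with a prior, weighted by visit count."""
    n = len(observed)
    if n == 0:
        return prior
    mean = sum(observed) / n
    return (k * prior + n * mean) / (k + n)

print(conservative_execution([94]))                       # one great visit barely moves it
print(conservative_execution([94, 76, 92, 90, 88, 91]))   # more visits, more trust
```

The effect is exactly the conservatism described above: a dish that graded 94 once doesn't get to wear a 94 until it has earned it across repeat visits.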

Context scores correlate with neighborhood more than with restaurant prestige. South Indian dosa scores in the Tenderloin outperform South Indian dosa scores in Pacific Heights at the same price point. The neighborhood does something to the kitchen. The algorithm can see it. We don't fully understand the mechanism yet.

These are pattern-level statements, not recommendations. We're still in intro mode. The actual lists — the ones that say "the best taco in the Mission scored a 96 and here's where to find it" — come later, once the model has more beta data and we trust the scores enough to name names.

For now, the concept is what we're shipping. The concept is simple: stop averaging rooms, start ranking plates, and pay attention to which plate the algorithm has started believing in.

The Dish newsletter
Your Friday plate, served.
Real diners. Real scores. One dish a week that broke the star system, plus where to find it. We read the reviews so you don't have to. Free forever.
One email per week · Unsubscribe anytime

Frequently asked

Why dishes instead of restaurants?
Because you eat dishes, not restaurants. A venue rating averages the room, the service, and a dozen plates you'll never order. The plate is the unit you remember. That's the unit we score.
How does the scoring actually work?
Five attributes — flavor, texture, execution, context, and value — each scored from aggregated review data, then combined into a 0-to-100 score that's personalized to your taste profile.
What counts as "execution"?
Consistency across visits, technical competence, and how close each serving lands to a repeatable version of the dish. We weight it higher for cuisines with high technical floors like ramen, sushi, and Neapolitan pizza.
How is this different from Yelp or Google?
Those rank places. We rank plates. A 4-star restaurant in our system might have one 94-scoring dish and one 62-scoring dish on the same menu — and we'll tell you which is which.
Does the algorithm get dishes wrong?
Yes. It's in beta. Every correction a user submits trains the model. The app is built to learn in public. If it scores a plate wrong, we want to know.