A sample of the shelf. Google's Dataset Search alone indexes around 25 million public datasets, and the count only climbs (the US government publishes more than 2 million on its own). The ingredients are effectively infinite. The craft is knowing which two to reach for.
Everyone holds the same public data. The craft is the connection between sources, and the taste that picks the one worth making. Here is the whole system, in one place.
Access is commodity.
The bake is the moat.
Taste is the product.
Three stages. A key makes the connection possible, a move performs it, an engine captures the value. Read left to right.
Two datasets only combine if they share a dimension. Name the key in one word, or you are forcing a join that does not exist.
Same timeline or interval. Unlocks sequence, lead and lag, divergence.
Same geography. Unlocks overlay, clustering, spatial pattern.
Same person or entity. Unlocks enrichment and personalization.
Same taxonomy. Unlocks comparison, ranking, segmentation.
Convertible measures. Unlocks translation and normalization.
The complete set. Filter by the engine each move feeds, and notice where the commercial plays cluster. Tap any card for its worked example and its failure mode.
↳ cards are interactive. Filter, then tap to expand
Same ingredients, three different businesses. They are not a menu, they are a sequence. Delight earns attention, reputation converts it to authority, commercial converts authority to revenue.
Personal, absurd, shareable. Free tools and toys that spread on their own.
The combinations nobody else connected. Builds the perception that you see signal.
Combinations that make a decision measurably better. The part people pay for.
The principles that keep the practice sharp. This is where good ideas rot if ignored.
Random combination is a generator, not an output. Nine of ten mashups are noise.
Curation is the deliverable. The selection is the work, not the combining.
Name the engine before you build. Commercial, reputational, or delight. If you cannot say which, stop.
Freshness beats completeness for signal. Completeness beats freshness for archive.
Correlation theater is a reputational landmine. Implied causation you cannot defend costs more than it earns.
The shared key is the bottleneck, not the data. Most failures happen at the seam, not at access.