This dataset contains prices paid for real-estate property in England and Wales. The data is available from 1995 onward, and the uncompressed dataset is about 4 GiB (it takes only about 278 MiB in ClickHouse).
- Source: https://www.gov.uk/government/statistical-data-sets/price-paid-data-downloads
- Description of the fields: https://www.gov.uk/guidance/about-the-price-paid-data
- Contains HM Land Registry data © Crown copyright and database right 2021. This data is licensed under the Open Government Licence v3.0.
Create the table
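A minimal sketch of a table definition that matches the preprocessing described on this page. The columns `date`, `postcode1`, `postcode2`, `type`, `duration` and `is_new` follow from the text. The `price` column, the address columns (`addr1`, `addr2`, `street`, `locality`, `town`, `district`, `county`), the numeric Enum values and the sorting key are assumptions made for illustration.

```sql
-- Sketch only: address column names, Enum numbering and ORDER BY are assumptions.
CREATE TABLE uk_price_paid
(
    price     UInt32,
    date      Date,
    postcode1 LowCardinality(String),
    postcode2 LowCardinality(String),
    type      Enum8('other' = 0, 'terraced' = 1, 'semi-detached' = 2, 'detached' = 3, 'flat' = 4),
    is_new    UInt8,
    duration  Enum8('unknown' = 0, 'freehold' = 1, 'leasehold' = 2),
    addr1     String,
    addr2     String,
    street    LowCardinality(String),
    locality  LowCardinality(String),
    town      LowCardinality(String),
    district  LowCardinality(String),
    county    LowCardinality(String)
)
ENGINE = MergeTree
ORDER BY (postcode1, postcode2, addr1, addr2);
```

`LowCardinality(String)` is used here for columns that are likely to hold a limited set of repeated values, which tends to help both compression and query speed.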
Preprocess and insert the data
We will use the `url` function to stream the data into ClickHouse. We need to preprocess some of the incoming data first, which includes:
- splitting the `postcode` into two different columns, `postcode1` and `postcode2`, which is better for storage and queries
- converting the `time` field to a date, as it only contains 00:00 time
- ignoring the UUID field because we don't need it for analysis
- transforming `type` and `duration` to more readable `Enum` fields using the transform function
- transforming the `is_new` field from a single-character string (Y/N) to a UInt8 field with 0 or 1
- dropping the last two columns since they all have the same value (which is 0)
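The steps above can be tried on literal values before touching the real file. The following is a sketch that needs no table: the sample postcode, timestamp and one-letter codes are made up for illustration, and the code-to-label mappings (`T`, `S`, `D`, `F`, `O` and `F`, `L`, `U`) are assumptions based on the field description linked above.

```sql
-- Sketch only: all input literals are hypothetical sample values.
SELECT
    splitByChar(' ', 'SW1A 1AA')[1] AS postcode1,  -- part before the space
    splitByChar(' ', 'SW1A 1AA')[2] AS postcode2,  -- part after the space
    toDate(parseDateTimeBestEffortUS('2021-03-15 00:00')) AS date,  -- drop the 00:00 time
    transform('T', ['T', 'S', 'D', 'F', 'O'],
              ['terraced', 'semi-detached', 'detached', 'flat', 'other']) AS type,
    'Y' = 'Y' AS is_new,  -- a comparison yields a UInt8 value of 0 or 1
    transform('F', ['F', 'L', 'U'],
              ['freehold', 'leasehold', 'unknown']) AS duration;
```

Array indexes in ClickHouse start at 1, which is why the two postcode halves are read with `[1]` and `[2]`.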
The `url` function streams the data from the web server into your ClickHouse table. The following command inserts 5 million rows into the `uk_price_paid` table:
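A sketch of what such an `INSERT ... SELECT` could look like, applying each preprocessing step in the `SELECT` list. The URL is a placeholder rather than the real download location (see the Source link above for that). The names in the structure string (`uuid_string`, `price_string`, `a`, `b`, `c`, `d`, `e` and the address columns) are hypothetical labels for the 16 unnamed CSV columns. The target table is assumed to have columns in the order of the `SELECT` list.

```sql
-- Sketch only: placeholder URL and hypothetical CSV column names.
INSERT INTO uk_price_paid
SELECT
    toUInt32(price_string) AS price,
    toDate(parseDateTimeBestEffortUS(time)) AS date,
    splitByChar(' ', postcode)[1] AS postcode1,
    splitByChar(' ', postcode)[2] AS postcode2,
    transform(a, ['T', 'S', 'D', 'F', 'O'],
              ['terraced', 'semi-detached', 'detached', 'flat', 'other']) AS type,
    b = 'Y' AS is_new,
    transform(c, ['F', 'L', 'U'],
              ['freehold', 'leasehold', 'unknown']) AS duration,
    addr1, addr2, street, locality, town, district, county
FROM url(
    'https://example.com/pp-complete.csv',  -- placeholder, replace with the real file URL
    'CSV',
    'uuid_string String, price_string String, time String, postcode String,
     a String, b String, c String, addr1 String, addr2 String, street String,
     locality String, town String, district String, county String,
     d String, e String'
)
SETTINGS max_http_get_redirects = 10;
```

The UUID column and the last two columns (`d`, `e`) are declared in the structure so the CSV parses, but they are left out of the `SELECT` list and so never reach the table. The `max_http_get_redirects` setting only matters if the server answers with a redirect.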