How to hack the competition by monitoring real estate market data

In the last couple of years, real estate companies have shifted their focus to the digital world. And now almost all investments have an online system showing what apartments are available. This is very convenient for potential clients, as they can easily become familiar with the apartments on offer. But did you know you can monitor real estate market data? Make the most of your data and beat the competition by monitoring frequently and analyzing sales progress.
Why is this so important? Sales dynamics are crucial figures that are rarely published as numbers, so knowing them gives you an edge in negotiations. Knowing the dynamics also enables you to better plan your cash flow over time and increase your asset efficiency in the short term. This investigation helped me to fight against the odds of the 2019 real estate market in Poland, and close apartment deals on better terms.
How can you benefit as an individual client?
- You have an additional edge in price negotiations.
- You can choose the right moment to buy.
- You can predict the next move of a developer. Was the investment successful? Will they continue to the next phase? What’s their current financial situation?
- You have first-hand knowledge about market demand in a given location.
- You can recognize patterns in your competitor’s sales process.
- You can better understand clients’ needs. Which types of apartments are sold more quickly?
Example results after monitoring one of the locations in Warsaw
The investment is being built in Praga Południe, an eastern district of Warsaw. This is an area with great potential. Currently, there are a lot of apartment buildings and sites awaiting renovation or replacement. However, it is very well connected with the city center (15 minutes by public transport or 10 minutes by car). Another advantage is that there are a lot of points of interest like shopping malls, parks, schools, or medical service points.
Here are the basic characteristics of the monitored investment:
- There are 135 apartments for sale.
- There are 8 floors in the building; each floor has 14 to 16 apartments.
- The investment started from an empty plot with a construction permit; work on the site started in Q1 2018. The planned date of completion is Q4 2019.
- The real estate developer began selling apartments in April 2018.

This is good to know, but the most interesting data comes from the periodical monitoring of sales progress. Below you can see the timeline representing the number of apartments sold each week:

What else can we see? For example, what is the current state of sales progress?

It is worth remembering that construction began in Q1 2018. Currently, only the building's foundation is ready, but almost half of the apartments are already sold.
Predicted date of selling all apartments
Let’s see what the sales dynamic is by apartment type. We can also try to make a simple prediction of when all apartments will be sold.
What we can see from this visualization:
- If the selling pace continues, almost all apartments will be sold by May 2019. This is over 6 months before the planned construction completion date.
- However, not all apartments have the same sales dynamic. The fastest-selling apartments have 1 or 2 rooms. The biggest apartments, with 4 rooms, are less popular - only 3 have sold.
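
The simple prediction mentioned above can be sketched as a straight-line extrapolation of the average selling pace. This is only an illustration under stated assumptions: the function name `projectSellOutDate`, the snapshot format, and every number in `sample` are made up for the example and are not the data behind the article's charts.

```javascript
// Minimal sketch: project the sell-out date by extending the average pace.
// The snapshot shape {date, sold} and all sample values are hypothetical.

const MS_PER_DAY = 24 * 60 * 60 * 1000;

/**
 * @param {{date: string, sold: number}[]} snapshots cumulative sold counts, oldest first
 * @param {number} total number of apartments in the investment
 * @returns {Date|null} projected sell-out date, or null if no projection is possible
 */
function projectSellOutDate(snapshots, total) {
  if (snapshots.length < 2) return null;
  const first = snapshots[0];
  const last = snapshots[snapshots.length - 1];
  const days = (Date.parse(last.date) - Date.parse(first.date)) / MS_PER_DAY;
  const soldInPeriod = last.sold - first.sold;
  // Without elapsed time or net sales there is no pace to extend.
  if (days <= 0 || soldInPeriod <= 0) return null;
  const remaining = total - last.sold;
  // remaining / (soldInPeriod / days), rearranged to limit rounding error.
  const daysLeft = Math.ceil((remaining * days) / soldInPeriod);
  return new Date(Date.parse(last.date) + daysLeft * MS_PER_DAY);
}

// Made-up weekly snapshots for a hypothetical building with 20 apartments.
const sample = [
  { date: '2018-04-01', sold: 0 },
  { date: '2018-04-08', sold: 2 },
  { date: '2018-04-15', sold: 5 },
  { date: '2018-04-22', sold: 6 },
  { date: '2018-04-29', sold: 8 },
];

const projected = projectSellOutDate(sample, 20);
console.log(projected.toISOString().slice(0, 10));
```

Running the same function separately on the snapshots for each apartment type would give one projection per type, which is one way to compare how quickly the different sizes sell. A straight line ignores seasonality and price changes, so the result is a rough guide rather than a forecast.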
Did potential buyers change their minds? Could they get a mortgage loan?
Another interesting observation is about apartments that had been sold but later became available again. There are multiple reasons why this happens. It is important to understand how the selling process works. The buyer has to sign an agreement with the developer before requesting a loan from a bank. There is always a possibility that, after the buyer has officially signed for an apartment, the bank rejects the loan request. The buyer can also change their mind and withdraw for other reasons. In the observed data, there was only one situation like this. A 2-room apartment with a separate kitchen was sold at the end of August 2018, but, in mid-September, became available again.
Data scraping
Gathering valuable data for analysis is the foundation of every data science process. When the data comes from an external online source, one of the methods is data scraping. This technique means that the data is “scraped” from a website by a web crawler (other names are “robot” or “scraper”). The robot parses text into a machine-readable format, so that it can be analyzed.
Some websites are protected by the following techniques:
- Displaying content in a dynamic way (e.g. using JavaScript) or complicating website structure, so it is harder for a crawler to parse.
- Protecting content with captcha or other robot-detection tools.
- Anomaly detection systems that analyze behavior between requests and ban suspicious visitors.
Collecting the data from the investment page
In the case described in this article, I monitored one of the locations in Warsaw. The real estate developer is selling apartments in this location using an online system, which contains the following table of apartment availability. 
This system didn’t have any captcha or anomaly detection protection. The only impediment here is that the content is loaded dynamically using JavaScript, and there is no open API endpoint where we can just request the data. You are able to view the data only after clicking through the interface.
The solution is a web crawler simulating human behavior, clicking through the interface. Imagine that you have a robotic employee, who monitors all the information you need, on a weekly basis, doesn’t get tired or bored, and is 100% precise.
I used Google’s Puppeteer.js technology; it can replicate human behavior in a browser running in the background. Here is the source code of a scraper:
const puppeteer = require('puppeteer');

// Version-independent pause. page.waitFor(), used in the first version of
// this script, is deprecated in later Puppeteer releases.
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://real-estate-developer-address.com/'); // Real URL address is confidential
  await page.click('#cbNotSale');
  await sleep(2000); // Give the dynamically loaded table time to refresh

  // page.$x() runs an XPath query; recent Puppeteer releases replace it
  // with page.$$('xpath/...').
  const nextSelector = '//input[contains(@alt, ">")]'; // The ">" (next page) button
  let next = await page.$x(nextSelector);
  while (next.length > 0) {
    const cells = await page.$x('//*[@id="gwProducts"]/tbody/tr/td');
    const vals = [];
    for (const cell of cells) {
      const val = await page.evaluate(x => x.textContent, cell);
      if (val == "F1") vals.push([]); // The quickest (dirty) way to recognize row beginning.
      if (val == "zobacz" || val.trim() == "") continue; // The quickest (dirty) way to recognize row end ("zobacz" is Polish for "see").
      if (vals.length > 0) vals[vals.length - 1].push(val); // Ignore cells that appear before the first row marker
    }
    vals.forEach(row => {
      console.log(row.join(";")); // Robot prints collected rows to the stdout
    });
    await next[0].click();
    await sleep(2000);
    next = await page.$x(nextSelector);
  }
  await page.screenshot({path: 'example.png'});
  await browser.close();
})();
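
Running the script with its output redirected saves each snapshot as a dated, semicolon-separated file:

node scraper.js > data-YYYY-MM-DD.csv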
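
Once several dated snapshots exist, the observation about apartments that were sold and later became available again can be checked automatically. The sketch below is a hedged illustration, not the analysis code behind the article: it assumes each weekly snapshot has already been reduced to a list of apartment identifiers that are listed as available. The article does not document the column layout of the scraped rows, so the identifiers (`A1`, `A2`, `A3`) and the function name `trackChanges` are hypothetical.

```javascript
// Minimal sketch: find "sold" and "returned" apartments across weekly snapshots.
// Assumption: weeklyAvailable[i] lists the apartment IDs available in week i,
// oldest week first. All IDs below are hypothetical.

/**
 * @param {string[][]} weeklyAvailable available apartment IDs per week
 * @returns {{week: number, id: string, type: 'sold'|'returned'}[]}
 */
function trackChanges(weeklyAvailable) {
  const soldSoFar = new Set();
  const events = [];
  for (let week = 1; week < weeklyAvailable.length; week++) {
    const prev = new Set(weeklyAvailable[week - 1]);
    const curr = new Set(weeklyAvailable[week]);
    // Listed last week but missing now: treated as sold.
    for (const id of prev) {
      if (!curr.has(id)) {
        soldSoFar.add(id);
        events.push({ week, id, type: 'sold' });
      }
    }
    // Listed again after having been treated as sold: returned to the offer.
    for (const id of curr) {
      if (!prev.has(id) && soldSoFar.has(id)) {
        soldSoFar.delete(id);
        events.push({ week, id, type: 'returned' });
      }
    }
  }
  return events;
}

// Made-up example: A2 disappears in week 1 and reappears in week 2.
const weeks = [
  ['A1', 'A2', 'A3'],
  ['A1', 'A3'],
  ['A1', 'A2'],
];

for (const event of trackChanges(weeks)) {
  console.log(`week ${event.week}: ${event.id} ${event.type}`);
}
```

Counting the `sold` events per week would also give the weekly sales figures used for a timeline. An apartment that vanishes from the list for a reason other than a sale (for example, a listing error) would be miscounted, so unusual events are worth checking by hand.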