How to hack competition by monitoring real estate market data


In the last couple of years, real estate companies have shifted their focus to the digital world. And now almost all investments have an online system showing what apartments are available. This is very convenient for potential clients, as they can easily become familiar with the apartments on offer. But did you know you can monitor real estate market data? Make the most of your data and beat the competition by monitoring frequently and analyzing sales progress.
<blockquote>Why is this so important? Sales dynamics are crucial figures, yet they are rarely published as a number, so knowing them gives you an edge in negotiations. Knowing the dynamics also enables you to better plan your cash flow over time, and increase your asset efficiency in the short term.</blockquote>
This investigation helped me to fight against the odds of the <a href="https://www.ey.com/en_pl/real-estate-hospitality-construction/the-polish-real-estate-guide-2019" target="_blank" rel="noopener noreferrer">2019 real estate market in Poland</a> and close apartment deals on better terms.

How can you benefit as an individual client?
<ol><li>You have an additional edge in price negotiations.</li><li>You can choose the right moment to buy.</li><li>You can predict the next move of a developer. Was the investment successful? Will they continue to the next phase? What’s their current financial situation?</li></ol>
How can a real estate company benefit from monitoring the competition?
<ol><li>You have first-hand knowledge about market demand in a given location.</li><li>You can recognize patterns in your competitor’s sales process.</li><li>You can better understand clients’ needs. Which types of apartments are sold more quickly?</li></ol>
<h2>Example results after monitoring one of the locations in Warsaw</h2>
The investment is being built in Praga Południe, a district in the eastern part of Warsaw. This is an area with great potential. Currently, it has a lot of apartment buildings and places awaiting renovation or replacement. However, it is very well connected with the city center (15 minutes by public transport or 10 minutes by car). Another advantage is that there are a lot of points of interest nearby, like shopping malls, parks, schools, and medical service points.

Here are the basic characteristics of the monitored investment:
<ul><li>There are 135 apartments for sale.</li><li>There are 8 floors in the building; each floor has 14 to 16 apartments.</li><li>The investment started from an empty plot with a construction permit; work on the site started in Q1 2018. The planned date of completion is Q4 2019.</li><li>The real estate developer began selling apartments in April 2018.</li></ul>
After collecting the first data from the investment website, we can see the current offers of different apartment types:



This is good to know, but the most interesting data comes from the periodic monitoring of sales progress. Below you can see the timeline representing the number of apartments sold each week:



What else can we see? For example, what is the current state of sales progress?



It is worth remembering that construction began in Q1 2018. Currently, only the building's foundation is ready, but almost half of the apartments are already sold.
<h2><strong>Predicted date of selling all apartments</strong></h2>
Let’s see what the <strong>sales dynamic</strong> is by apartment type. We can also try to make a simple prediction of when all apartments will be sold.



What we can see from this visualization:
<ul><li>If the selling pace continues, almost all apartments will be sold by May 2019. This is over 6 months before the planned construction completion date.</li><li>However, not all apartments have the same sales dynamic. The fastest-selling apartments have 1 or 2 rooms. The biggest apartments, with 4 rooms, are less popular - only 3 have sold.</li></ul>
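A simple prediction of this kind can be approximated with a linear extrapolation. The sketch below is only an illustration, assuming the average pace observed since the start of sales stays constant. The function name predictSellOutDate and the sample numbers are hypothetical; they are not the measured data from this investment.

```javascript
// Hypothetical helper: linear extrapolation of the sell-out date.
// ASSUMPTION: the average pace observed so far continues unchanged.
function predictSellOutDate(total, sold, salesStart, observedAt) {
  const msPerDay = 24 * 60 * 60 * 1000;
  const daysElapsed = (observedAt - salesStart) / msPerDay;
  if (sold <= 0 || daysElapsed <= 0) return null; // no pace to extrapolate from
  const pacePerDay = sold / daysElapsed;
  const daysLeft = (total - sold) / pacePerDay;
  return new Date(observedAt.getTime() + daysLeft * msPerDay);
}

// Illustrative numbers only: 50 of 100 apartments sold after 100 days.
const predicted = predictSellOutDate(
  100,
  50,
  new Date(Date.UTC(2018, 0, 1)),
  new Date(Date.UTC(2018, 3, 11))
);
console.log(predicted.toISOString().slice(0, 10));
```

Running the same calculation separately for each apartment type would show which types are likely to sell out first, under the same constant-pace assumption.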
<h3><strong>Did potential buyers change their minds? Could they get a mortgage loan?</strong></h3>
Another interesting observation concerns apartments that had been sold but later became available again. There are multiple reasons why this happens, and it is important to understand how the selling process works. The buyer has to sign an agreement with the developer before requesting a loan from a bank, so there is always a possibility that the bank rejects the loan request after the buyer has officially signed for an apartment. The buyer can also change their mind and withdraw for other reasons.

In the observed data, there was only one situation like this. A 2-room apartment with a separate kitchen was sold at the end of August 2018, but became available again in mid-September.
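Cases like this can be spotted automatically by comparing two consecutive snapshots. The following is a minimal sketch, assuming each snapshot is kept as a map from an apartment identifier to its status. The identifiers, the status strings 'sold' and 'available', and the function name findReturnedApartments are all hypothetical and do not come from the developer's system.

```javascript
// Hypothetical snapshot format: Map from apartment id to a status string.
// Returns the ids that were 'sold' in the previous snapshot
// and are 'available' in the current one.
function findReturnedApartments(previous, current) {
  const returned = [];
  for (const [id, status] of current) {
    if (status === 'available' && previous.get(id) === 'sold') {
      returned.push(id);
    }
  }
  return returned;
}

// Illustrative data only.
const lastWeek = new Map([['A1', 'sold'], ['A2', 'available'], ['A3', 'sold']]);
const thisWeek = new Map([['A1', 'available'], ['A2', 'available'], ['A3', 'sold']]);
console.log(findReturnedApartments(lastWeek, thisWeek));
```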
<h2><strong>Data scraping</strong></h2>
Gathering valuable data for analysis is the foundation of every data science process. When the data comes from an external online source, one of the methods is <a href="https://en.wikipedia.org/wiki/Data_scraping" target="_blank" rel="noopener noreferrer">data scraping</a>. This technique means that the data is “scraped” from a website by a web crawler (other names are “robot” or “scraper”). The robot parses text into a machine-readable format, so that it can be analyzed.

Some websites are protected by the following techniques:
<ul><li>Displaying content in a dynamic way (e.g. using JavaScript) or complicating website structure, so it is harder for a crawler to parse.</li><li>Protecting content with captcha or other robot-detection tools.</li><li>Anomaly detection systems that analyze behavior between requests and ban suspicious visitors.</li></ul>
All of these techniques make web scraping harder, but none of them is bulletproof. At Appsilon we know how to deal with all of these obstacles. Any data scraping should be in line with the terms of use of a given website and <a href="https://en.wikipedia.org/wiki/Web_scraping#Legal_issues" target="_blank" rel="noopener noreferrer">local law</a>. In most cases, web scraping is legal unless your robot uses more than the usual bandwidth or computing power.
<h2>Collecting the data from the investment page</h2>
In the case described in this article, I monitored one of the locations in Warsaw. The real estate developer is selling apartments in this location using an online system, which contains the following table of apartment availability.



This system didn’t have any captcha or anomaly detection protection. The only impediment here is that the content is loaded dynamically using JavaScript, and there is no open API endpoint where we can just request the data. You are able to view the data only after clicking through the interface.

The solution is a web crawler simulating human behavior, clicking through the interface. Imagine that you have a robotic employee, who monitors all the information you need, on a weekly basis, doesn’t get tired or bored, and is 100% precise.

I used <a href="https://developers.google.com/web/tools/puppeteer/">Google’s Puppeteer.js</a> technology; it can replicate human behavior in a browser running in the background. Here is the source code of a scraper:

<figure class="highlight">
<pre class="language-r"><code class="language-r" data-lang="r">
const puppeteer = require('puppeteer');

// Written against the Puppeteer API available at the time of writing;
// newer releases have since dropped page.waitFor and page.$x.
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://real-estate-developer-address.com/'); // Real URL address is confidential
  await page.click('#cbNotSale');
  await page.waitFor(2000); // Give the dynamically loaded table time to render

  // XPath of the "next page" button (its alt text contains ">")
  const nextSelector = '//input[contains(@alt, ">")]';
  let next = await page.$x(nextSelector);

  while (next.length > 0) {
    const cells = await page.$x('//*[@id="gwProducts"]/tbody/tr/td');
    const vals = [];

    for (let i = 0; i < cells.length; i++) {
      const val = await page.evaluate(x => x.textContent, cells[i]);
      if (val == "F1") vals.push([]); // The quickest (dirty) way to recognize row beginning.
      if (val == "zobacz" || val.trim() == "") continue; // The quickest (dirty) way to recognize row end.
      vals[vals.length - 1].push(val);
    }

    vals.forEach(row => {
      console.log(row.join(";")); // Robot prints collected rows to the stdout
    });

    // Move to the next page of the table and wait for it to load
    await next[0].click();
    await page.waitFor(2000);
    next = await page.$x(nextSelector);
  }

  await page.screenshot({path: 'example.png'});
  await browser.close();
})();
</code></pre>
</figure>
We can run the crawler with the command below. Just configure a cron job or another scheduler to run it weekly:
<figure class="highlight">
<pre class="language-r"><code class="language-r" data-lang="r">
node scraper.js > data-YYYY-MM-DD.csv
</code></pre>
</figure>
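Once a few weekly files exist, the number of apartments sold between two runs can be derived by comparing them. The sketch below is a hedged illustration, not part of the original scraper. It assumes that each file lists only apartments still for sale and that one column uniquely identifies an apartment. The column index, the sample rows, and the function names parseSnapshot and soldBetween are hypothetical.

```javascript
// Parse scraper output: one apartment per line, fields separated by ";".
function parseSnapshot(text) {
  return text
    .split('\n')
    .map(line => line.trim())
    .filter(line => line !== '')
    .map(line => line.split(';').map(field => field.trim()));
}

// ASSUMPTION: each snapshot lists apartments still for sale, and the column
// at idColumn uniquely identifies an apartment.
// Returns the ids present in the previous snapshot but missing from the current one.
function soldBetween(previousText, currentText, idColumn) {
  const currentIds = new Set(parseSnapshot(currentText).map(row => row[idColumn]));
  return parseSnapshot(previousText)
    .map(row => row[idColumn])
    .filter(id => !currentIds.has(id));
}

// Illustrative rows only; the real column layout is not shown in this article.
const weekOne = 'F1;A01;2 rooms\nF1;A02;1 room\nF1;A03;4 rooms\n';
const weekTwo = 'F1;A01;2 rooms\nF1;A03;4 rooms\n';
console.log(soldBetween(weekOne, weekTwo, 1));
```

Counting the returned identifiers for each pair of consecutive weeks gives the weekly sales timeline, as long as the assumption about the file contents holds.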
This is just one of a wide range of techniques that can be used for scraping. Let us know if you want to know more about <a href="https://appsilon.com/services/real-estate">collecting data and other business cases from our experience</a>!
<h2><b>Aftermath</b></h2>
As a private investor, I need to excel in my investments. There is no place for bad bets, which is why I use all available cards in the deck to make sure I do everything in my data science power. This analysis gave me information about the investment’s financial performance, and I could predict how much time I had to close the deal for my dream apartment. I had more time for the final decision, and I was more confident during negotiations. Remember that buying an apartment at the latest possible moment gives you short-term alternatives, instead of freezing your capital right away. <strong>I had an edge that other buyers did not, so I could easily tackle false claims about the popularity of the apartments,</strong> and not feel pressured into buying. Each sales process is a game with unequal distribution of knowledge. Knowledge of sales dynamics will allow you to fight against the odds and close better deals.

Damian Rodziewicz
Head of Sales