HackRcity 2019 recap: hacking the "coolest city" contest

December 6, 2019

<h3>Intro + hackathon description</h3>
Have you ever participated in a hackathon? A month ago my answer was still no (despite the fact that I’ve been working in the R Shiny/Data Science space for 5 years). Now I have checked it off my list. And my team won, so I’m 1 for 1 with a 100% success rate, and I can retire now. More seriously, the team and I made tons of mistakes and just a few good decisions, and we think it is worth sharing the lessons learned, from both the coding and the work organisation perspective.

<img class="size-medium wp-image-3278" src="https://wordpress.appsilon.com/wp-content/uploads/2019/12/image-5-5-600x366.png" alt="HackRcity participants" width="600" height="366" /> the HackRcity participants

What was the hackathon about? It was organised by <a href="https://analyx.com/en/">Analyx</a>, a company from Poznań, and held in their office. They did a great job! The problem to be solved was as follows: in 2018 one of the tourist organisations published a “popularity index” of Polish cities, and Poznań got a low score, especially in comparison to similar Polish cities such as Wrocław. That confirmed what the city authorities had seen for a long time: Poznań is not as attractive to tourists as it should be. So Poznań’s office of promotion gathered all of the available data from the Central Statistical Office of Poland and gave it to the hackathon participants with one expected result: tell us what to do to make Poznań great again, especially in how it’s perceived by tourists.
The general task was divided into several parts that were assessed by the jury: <ol><li style="font-weight: 400;">Some values of the “popularity index” were missing. Teams had to build a model to predict them, and the models were compared by <a href="https://en.wikipedia.org/wiki/Mean_absolute_error">MAE</a> (mean absolute error)</li><li style="font-weight: 400;">Reproducibility and code organisation</li><li style="font-weight: 400;">Interpretability of the model</li><li style="font-weight: 400;">Value of conclusions for the city of Poznań</li><li style="font-weight: 400;">The quality of the presentation at the end of the hackathon</li><li style="font-weight: 400;">The approach to solving a side quest: plan a trip through 5 Polish cities for a fictional family based on their preferences</li></ol>

There were also two nice concepts introduced by the organisers. They made the work more fun and emphasized a crucial idea in data science: you need to look for helpful data around you. <ul><li style="font-weight: 400;">Each team was allotted 7 “points” that could be spent on additional datasets, and each dataset had its own “price”. You needed to choose wisely, because you could only afford some of them! So each team ended up with different data, according to their brilliant (or ill-advised) ideas.</li><li style="font-weight: 400;">The names of the Polish cities in the dataset were disguised as fruit names! So part of the game was to decode the names. It turned out that ‘lemon’ was the nickname for ‘Warsaw’ :)</li></ul>

<h3>How we organised our work</h3>
So, we rolled up our sleeves and got to work! We had nine hours to complete the tasks. There was no time to lose! We decided to call our team lotR, after our favorite band of in-over-their-heads adventurers.
<img class="wp-image-3279 size-medium" src="https://wordpress.appsilon.com/wp-content/uploads/2019/12/FB_IMG_1573921391108-600x400.jpg" alt="the lotR fellowship gets to work" width="600" height="400" /> the lotR fellowship gets to work
<img class="wp-image-3283 size-medium" src="https://wordpress.appsilon.com/wp-content/uploads/2019/12/IMG_20191206_141414_386-500x500.jpg" alt="the lotR fellowship gets to work" width="500" height="500" /> the lotR fellowship gets to work

But we really had no idea where to start. Someone said, “I don’t know, maybe we should look at the data.” Then the question became “Wait, how do we get the data?” But we figured it out and were very proud of our first success of the day! ;)

We realized that we didn’t have much experience with hackathons as a team, so we needed to organise the way we worked. What developed quite naturally was a hierarchical way of splitting the work. We started by looking at the data all together and getting familiar with it. Then we formed two groups: one focused on decoding the fruit names and building the trip planner app, and the other focused on building the predictive model and making it interpretable. Each group then divided into sub-groups (well, each consisting of only 1 person) that wrote code for a specific area. We kept everyone informed of our progress and helped each other with issues.

I’m sure that sounds great, but 2 hours into the hackathon I was honestly afraid that we would have nothing to show for our efforts, and that we would have to get up on stage after 9 hours and admit our failure. But we eventually achieved some results, so lesson learned: do not give up!
<blockquote class="instagram-media" style="background: #FFF; border: 0; border-radius: 3px; box-shadow: 0 0 1px 0 rgba(0,0,0,0.5),0 1px 10px 0 rgba(0,0,0,0.15); margin: 1px; max-width: 540px; min-width: 326px; padding: 0; width: calc(100% - 2px);" data-instgrm-permalink="https://www.instagram.com/p/B5u2ofPFsLZ/?utm_source=ig_embed&utm_campaign=loading" data-instgrm-version="12"> <div style="padding: 16px;">   <div style="display: flex; flex-direction: row; align-items: center;"> <div style="background-color: #f4f4f4; border-radius: 50%; flex-grow: 0; height: 40px; margin-right: 14px; width: 40px;"></div> <div style="display: flex; flex-direction: column; flex-grow: 1; justify-content: center;"> <div style="background-color: #f4f4f4; border-radius: 4px; flex-grow: 0; height: 14px; margin-bottom: 6px; width: 100px;"></div> <div style="background-color: #f4f4f4; border-radius: 4px; flex-grow: 0; height: 14px; width: 60px;"></div> </div> </div> <div style="padding: 19% 0;"></div> <div style="display: block; height: 50px; margin: 0 auto 12px; width: 50px;"></div> <div style="padding-top: 8px;"> <div style="color: #3897f0; font-family: Arial,sans-serif; font-size: 14px; font-style: normal; font-weight: 550; line-height: 18px;">View this post on Instagram</div> </div> <div style="padding: 12.5% 0;"></div> <div style="display: flex; flex-direction: row; margin-bottom: 14px; align-items: center;"> <div> <div style="background-color: #f4f4f4; border-radius: 50%; height: 12.5px; width: 12.5px; transform: translateX(0px) translateY(7px);"></div> <div style="background-color: #f4f4f4; height: 12.5px; transform: rotate(-45deg) translateX(3px) translateY(1px); width: 12.5px; flex-grow: 0; margin-right: 14px; margin-left: 2px;"></div> <div style="background-color: #f4f4f4; border-radius: 50%; height: 12.5px; width: 12.5px; transform: translateX(9px) translateY(-18px);"></div> </div> <div style="margin-left: 8px;"> <div style="background-color: #f4f4f4; border-radius: 50%; 
flex-grow: 0; height: 20px; width: 20px;"></div> <div style="width: 0; height: 0; border-top: 2px solid transparent; border-left: 6px solid #f4f4f4; border-bottom: 2px solid transparent; transform: translateX(16px) translateY(-4px) rotate(30deg);"></div> </div> <div style="margin-left: auto;"> <div style="width: 0px; border-top: 8px solid #F4F4F4; border-right: 8px solid transparent; transform: translateY(16px);"></div> <div style="background-color: #f4f4f4; flex-grow: 0; height: 12px; width: 16px; transform: translateY(-4px);"></div> <div style="width: 0; height: 0; border-top: 8px solid #F4F4F4; border-left: 8px solid transparent; transform: translateY(-4px) translateX(8px);"></div> </div> </div> <div style="display: flex; flex-direction: column; flex-grow: 1; justify-content: center; margin-bottom: 24px;"> <div style="background-color: #f4f4f4; border-radius: 4px; flex-grow: 0; height: 14px; margin-bottom: 6px; width: 224px;"></div> <div style="background-color: #f4f4f4; border-radius: 4px; flex-grow: 0; height: 14px; width: 144px;"></div> </div>   <p style="color: #c9c8cd; font-family: Arial,sans-serif; font-size: 14px; line-height: 17px; margin-bottom: 0; margin-top: 8px; overflow: hidden; padding: 8px 0 7px; text-align: center; text-overflow: ellipsis; white-space: nowrap;"><a style="color: #c9c8cd; font-family: Arial,sans-serif; font-size: 14px; font-style: normal; font-weight: normal; line-height: 17px; text-decoration: none;" href="https://www.instagram.com/p/B5u2ofPFsLZ/?utm_source=ig_embed&utm_campaign=loading" target="_blank" rel="noopener noreferrer">A post shared by Appsilon Data Science (@appsilonds)</a> on <time style="font-family: Arial,sans-serif; font-size: 14px; line-height: 17px;" datetime="2019-12-06T13:08:10+00:00">Dec 6, 2019 at 5:08am PST</time></p> </div></blockquote> <script async src="//www.instagram.com/embed.js"></script> There was one obstacle with communication: we shared a room with a competing team! 
It caused some mistrust at the beginning. After a whole day together, though, we got along, and it turned out to be a great opportunity for networking and building relationships. We honestly kept our fingers crossed for the other team during the presentations.

<h3>Our delivered solution</h3>
We started slowly and very unorganised, and we lost a lot of time finding a workable technical setup. So here is the lesson learned: set up a repo, a Slack channel, and a common environment sooner rather than later. Integrate the tools that might be useful ahead of time: we struggled with <a href="https://www.h2o.ai/">H2O</a> on Mac and with integrating <a href="https://www.h2o.ai/">H2O</a> with <a href="https://github.com/ModelOriented/DALEX">DALEX</a> (and it was entirely predictable that DALEX would be needed, as <a href="http://pbiecek.github.io/">prof. Przemysław Biecek</a> was on the jury).

Speaking of H2O: as the prediction error was just one part of the final score, we decided not to overthink the solution and used <a href="http://docs.h2o.ai/h2o/latest-stable/h2o-docs/automl.html"><i>autoML</i></a> for R. It turned out to be the best strategy: our model was good enough (we had the 3rd best result in this section) and it took us just one click to generate the model and predictions. We used the time we saved to improve the interpretability of the model and to craft interesting recommendations.

But please don’t think that no work is needed when using such automated solutions. What we discovered is that the result is quite sensitive to feature selection. In our case that was a major issue: the dataset had 59 observations with ‘popularity index’ values and 7 to be predicted, but over 570 variables! Those were mostly each city’s budget per category and counts of tourist facilities, like hotels and restaurants.
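With far more columns than rows, two cheap filters are a natural first move: drop columns that never change, then drop one column of each highly correlated pair. Below is a minimal, language-agnostic sketch of those two filters in plain Python. It is not the R code we ran at the hackathon; the toy data, the column names, and the 0.9 cutoff are all made up for illustration, and the greedy correlation rule is a simplification (<i>caret::findCorrelation</i> chooses which column to drop using mean absolute correlations).

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant numeric lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def drop_constant(columns):
    """Keep only columns that take more than one distinct value."""
    return {name: vals for name, vals in columns.items() if len(set(vals)) > 1}


def drop_correlated(columns, cutoff=0.9):
    """Greedy filter: keep a column only if |r| <= cutoff against every kept column.

    Run drop_constant first: a constant column has zero variance, which
    would make the correlation undefined (division by zero).
    """
    kept = {}
    for name, vals in columns.items():
        if all(abs(pearson(vals, other)) <= cutoff for other in kept.values()):
            kept[name] = vals
    return kept


# Hypothetical toy data: 5 cities, 4 features.
toy = {
    "hotels": [10, 20, 30, 40, 50],
    "hotel_beds": [100, 200, 300, 400, 510],  # almost a multiple of "hotels"
    "ski_lifts": [0, 0, 0, 0, 0],             # constant, carries no signal
    "culture_budget": [5, 1, 4, 2, 3],
}

filtered = drop_correlated(drop_constant(toy))
print(sorted(filtered))
```

In this toy run the constant column and the near-duplicate column are removed, leaving two features. The order of the columns matters for the greedy rule: whichever column of a correlated pair comes first is the one that survives.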
We took the following steps, and each one improved our <i>autoML</i> model: <ol><li style="font-weight: 400;">Removed all of the variables that had 0 variance (the same value for all observations, usually 0). After this step we were left with 470 variables.</li><li style="font-weight: 400;">Removed all highly correlated variables using the <i>caret::findCorrelation</i> function. We got down to 426 variables.</li><li style="font-weight: 400;">Removed all unimportant variables using <i>caret::rfe</i> (configured with <i>caret::rfeControl</i>), that is, feature selection based on recursive feature elimination with random forests. We ended up with 16 variables.</li><li style="font-weight: 400;">Added important variables from the datasets that we “bought” with our “points”: <i>social media comments</i> about Poznań and <i>reviews from a travel portal</i>. Our final dataset had 30 variables and gave the best results.</li></ol>

We were quite surprised that a solution as advanced as <i>autoML</i>, which in just a few minutes produces results comparable to those of very advanced deep learning models, does not include a step such as removing correlated variables.

The biggest value of the model was in explaining why a city was popular or not, so we spent more time on interpretability. We discovered that a city’s public spending on culture and education correlated with its popularity with tourists. Popularity also correlated with the number and quality of reviews on various social media outlets. <blockquote>So one of our recommendations to Poznań was to encourage small businesses there to get active on social media platforms. </blockquote>

The lotR team also recommended that Poznań enhance the cultural side of the city. Poznań doesn’t have natural attractions like mountains or a seaside. It does have some historical points of interest, such as a cathedral with the graves of Poland’s first kings, and it hosts a number of massive events like concerts and picnics.
(We later discovered that Poznań is still pretty happening even at 2:30 AM. As people from the “Lemon” city, we were surprised to see that.)

Our big plus was the “Trip Planner”, built as a basic <a href="https://shiny.rstudio.com/">Shiny</a> app. It makes the solution easy to reproduce, in a user-friendly way, for different sets of family preferences. The purpose of the app is to recommend road trips for families. It captures the family members’ interests with a questionnaire, averages their preferences, compares them with the offerings of the various cities, and finds the 5 best destinations within 1500 km. In short, it’s an app for family road trip ideas!

<img class="size-medium wp-image-3282" src="https://wordpress.appsilon.com/wp-content/uploads/2019/12/image-7-600x343.png" alt="Screenshot of the Trip Planner app" width="600" height="343" /> Screenshot of the Trip Planner app

<h3>Results - lessons learned</h3>
The presentation part was done by me because the other guys on the team were busy drinking beers in the audience. Ok, to be honest, we made this decision to prevent the chaos of having four presenters on stage at the same time with little prep time, and it turned out to be a good choice. So we recommend assigning one person to give the final presentation.

Next, don’t worry about the details, just focus on what is useful. Nine hours go by quickly in this type of event. Focus on what is actually assessed! We noticed that the model was only worth 20% of the total score, so we relied on an off-the-shelf <i>autoML</i> solution, which left us time to focus on the other tasks.

We were surprised by the technical issues that we encountered, so it’s wise to prepare your work environment and tools before the hackathon starts.

<a href="https://shiny.rstudio.com/">R Shiny</a> allowed us to develop a decent-looking solution in a short amount of time. We were the only team to use it, so maybe it gave us a little edge.
:)

<h3>Closing thoughts</h3>
It was a great experience and it got us out of our typical coding environment. After the competition, the city center was still pretty happening, with crowds everywhere. Poznań really looked like a “best kept secret” type of city.

<img class="size-medium wp-image-3280" src="https://wordpress.appsilon.com/wp-content/uploads/2019/12/DSC_0401-375x500.jpg" alt="winners of the hackathon with the poster" width="375" height="500" /> lotR completed the journey

Do you have your own hackathon hacks? Please add them in the comments below. Thanks for reading! Follow us on Twitter: @dubelmarcin, @frappsilon, @krystian8207, @q_nowicki

<h4>Follow Appsilon Data Science on Social Media</h4><ul><li>Follow <a href="https://twitter.com/appsilon">@Appsilon</a> on Twitter</li><li>Follow us on <a href="https://www.linkedin.com/company/appsilon">LinkedIn</a></li><li>Sign up for our <a href="https://appsilon.com/blog/">newsletter</a></li><li>Try out our R Shiny <a href="https://appsilon.com/opensource/">open source</a> packages</li></ul>
