Machine Learning Hackathon


The Centre for Machine Intelligence, in collaboration with SparkCognition, invites students to participate in its first Machine Learning Hackathon Competition. The hackathon is open to students from all backgrounds and aims to help them:

learn about machine learning
develop their teamworking skills
develop entrepreneurial skills
improve their presentation skills.

No prior experience in Machine Learning is required but having some basic coding skills and some understanding of statistics will definitely give you a head start. Over the span of two weeks, you and your team will apply Machine Learning to solve a problem of your choice. We’ll start with “Machine Learning 101,” a short primer on the technology, what it’s good for (and not good for), and how to use it. Then you’ll pick a problem to solve and jump in!

Registration is now closed! We received over 300 registrations, with many more on the waiting list.

But, you ask, with little or no background in Machine Learning, how can you jump in without drowning? Good question. Machine Learning has matured to the point that powerful tools have been built to enable “citizen data scientists” to succeed without specialized training. SparkCognition, a leading AI company based in Austin, Texas, has given us Darwin, an Automated Machine Learning tool. Darwin solves the hard problems of wrangling and exploring datasets, searching for a good model and evaluating its efficacy. With Darwin you’ll be empowered to solve problems that would otherwise require years of AI experience.

Structure of the Competition

Projects should be done in teams of four or five.
You’re encouraged to choose your own problem and dataset(s) – think broadly and pursue your interests. Another good option is to tackle a problem from one of our corporate sponsors, using a provided dataset.
A panel of judges will select the top 10-20% of the projects to advance to the final round of competition.
These finalists will present their work to an audience, including another panel of judges. The judges will award several prizes, including a total of £2000 (in Amazon or other vouchers) to the teams producing the top two projects.

Throughout the event, SparkCognition will provide technical support on selecting problems and using Darwin. As described in the timeline below, your team will submit a short project report. The report should describe what you did: the problem you solved, why it’s interesting or important, the challenges you encountered and the results you obtained. Three to five pages should suffice.

Important Dates

Before November 6th: Recruit your teammates
November 6th (venue: 46/3001) – Kickoff 6pm - 9pm (food/drinks will be provided)
- Machine Learning 101: introduction to the technology
- Using Darwin
- What makes for a good problem and dataset
November 6th through 16th – work with your team to solve your problem using Darwin
November 17th (midnight) – submit your project report
November 18th – finalists selected and announced
November 20th – finalists present their work to judges, and prizes are awarded

You Should Participate

Everyone’s busy, especially this time of year. Nevertheless, you should strongly consider making time for this event. Machine Learning is transforming whole industries, and you need to know when it’s applicable and how to use it. This event will give you a good introduction to the technology and a state-of-the-art tool that makes you quickly productive. You’ll meet corporate leaders, work with your friends, scope out a cool project, and compete to win significant prizes.

Challenge Problems

These are pre-selected datasets and problem definitions that you may want to start your project on. These problems have been chosen based on their relevance to real-world domains and the economic and social impact they have. If you come up with a brilliant idea to solve any of these, you can be sure there will be a route to market for it. We strongly encourage you to find your own dataset and specify your own problem (see constraints on datasets below).

Download sample datasets here.

Problem 1: CHURN

A telephone company is trying to retain its customers and avoid "customer churn," a jargony way of saying that a customer moves to another service provider. The company compiles a dataset that records information about each customer, including whether the customer "churned," as recorded in the rightmost column of the dataset. Given the training set, the challenge is to build a model that predicts whether other customers, such as those in the test set, will churn.
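If you want to inspect the churn data outside Darwin, here is a minimal sketch of separating the features from the label. It assumes a CSV file with a header row and the churn label in the rightmost column, as described above. The function name `split_features_and_label` and the sample column names are hypothetical and used only for illustration; they are not part of Darwin or of the provided dataset.

```python
import csv
import io


def split_features_and_label(csv_text):
    """Split CSV text into feature rows and labels taken from the rightmost column."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)              # first row holds the column names
    features, labels = [], []
    for row in reader:
        if not row:                    # skip blank lines
            continue
        features.append(row[:-1])      # every column except the last
        labels.append(row[-1])         # rightmost column: the churn label
    return header[:-1], features, labels


# Hypothetical sample data, not taken from the hackathon dataset.
sample = "minutes,calls,churned\n120,4,no\n310,9,yes\n"
names, X, y = split_features_and_label(sample)
print(names, X, y)
```

Values are kept as strings here; you would convert numeric columns before doing any modelling of your own.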

Problem 2: POWER

A power-generation company is trying to predict the power it will produce each day, based on environmental conditions such as Temperature, Pressure and Humidity. The training set records information about environmental conditions and power generation across many days. Given the training set, the challenge is to form a model that predicts the power that will be generated under various conditions given in the test set.

Problem 3: ENGAGEMENT

We would like to categorise companies according to their engagement levels, i.e., whether they respond to calls and at what times of day they tend to answer. Each company has a number of contact points, so it is important to determine who the best person to contact might be and whether we can predict when to call them. It may also be interesting to know which sectors the most engaged companies come from, or which companies eventually set up meetings with us (see the Meeting column). Note of caution: this dataset does not come with a training/test set, has not yet been tested on Darwin, and may require you to use Python and the SDK to specify a good subset.

Problem 4: FOOTBALL PREDICTIONS

This dataset contains match outcomes from the NFL for the 2017 and 2018 seasons. Interesting problems to solve include predicting the point spread or the match outcome. Note of caution: this dataset does not come with a training/test set, has not yet been tested on Darwin, and may require you to use Python and the SDK to specify a good subset.
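For the datasets that come without a training/test set, one possible way to create your own split in plain Python is sketched below. This is an illustration only: the function name `train_test_split`, the 20% test fraction and the fixed seed are assumptions, and the sketch does not use the Darwin SDK.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle rows reproducibly and split them into training and test lists."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)                  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed gives a repeatable split
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(10))                     # stand-in for dataset rows
train, test = train_test_split(rows, test_fraction=0.2)
print(len(train), len(test))
```

A random split suits data where row order carries no meaning. For time-ordered data such as match results, holding out the most recent rows instead may give a more realistic test.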

Darwin Tutorials

You can access Darwin tutorials here.

Dataset specifications and rules

Darwin is a platform that works with numerical data suitable for prediction and classification tasks. If you know Python, you can also use the SDK (software development kit) to develop your model for time series prediction.

Darwin will help you work out what model to use to run regression, classification, and prediction on your dataset. The following rules apply:

Make sure you are allowed to use the dataset. Datasets classified as OPEN are good to use. Datasets on Kaggle.com are usually not open and come with constraints on usage, so we advise against using them.
Make sure you are aware of the conditions under which your dataset was collected. Does it have gaps? Can it contain errors? Answer these questions before you start developing your model, so that you can clean the dataset before loading it.
Your data will need to be in Excel or CSV format for Darwin to process it.
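To check a CSV file for gaps before loading it, you could start from a small sketch like the following. It counts cells in each column that look empty. The function name `count_missing`, the list of markers treated as missing and the sample columns are all assumptions made for illustration; adjust them to how your own dataset encodes missing values.

```python
import csv
import io


def count_missing(csv_text, missing_markers=("", "NA", "NaN", "null")):
    """Count cells per column whose stripped value looks like a missing entry."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in (reader.fieldnames or [])}
    for row in reader:
        for name in counts:
            value = row.get(name)          # None when a row is shorter than the header
            if value is None or value.strip() in missing_markers:
                counts[name] += 1
    return counts


# Hypothetical sample with one gap in each column.
sample = "temperature,pressure,power\n21.5,1012,480\n,1009,475\n19.0,NA,\n"
gaps = count_missing(sample)
print(gaps)
```

A column with many gaps may need to be dropped or filled in before the dataset is useful for modelling.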

Marking Scheme

These are the criteria to score your report and your presentation. Your submission will be reviewed by a panel of experts from industry and the university.

15 points	Problem Selection: How was Darwin used? Was the problem selected meaningful, appropriate for the tools used and scope of the project, and quantifiable? Was appropriate data used? Was the data set up appropriately to achieve meaningful results?
22.5 points	Outcome: Was the research complete? How well were the results analyzed to solve the problem?
25 points	Innovation: What innovation did the team bring to the table? (Examples: novel approaches to feature engineering, incorporating insights from a paper, integrating multiple data sets, or setting up a problem uniquely.)
17.5 points	Impact: What are the implications of this research, and how impactful could it be in a given field?
20 points	Presentation: How well did the team present the project and results? Were both the technical and business perspectives of the problem and solution explained?
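The maximum points listed above add up to 100. As a small worked example, the sketch below totals one team's marks, assuming the overall mark is the plain sum of the points awarded per criterion (the scheme implies this but does not state it). The function name `total_score` and the example marks are hypothetical.

```python
# Maximum points per criterion, copied from the marking scheme above.
MAX_POINTS = {
    "Problem Selection": 15.0,
    "Outcome": 22.5,
    "Innovation": 25.0,
    "Impact": 17.5,
    "Presentation": 20.0,
}


def total_score(awarded):
    """Sum awarded points, checking each stays within its criterion's maximum."""
    total = 0.0
    for criterion, maximum in MAX_POINTS.items():
        points = awarded.get(criterion, 0.0)
        if not 0 <= points <= maximum:
            raise ValueError(f"{criterion}: {points} is outside 0..{maximum}")
        total += points
    return total


# Hypothetical marks for one team, for illustration only.
example = {"Problem Selection": 12, "Outcome": 18, "Innovation": 20,
           "Impact": 14, "Presentation": 16}
print(sum(MAX_POINTS.values()), total_score(example))
```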

SparkCognition

We are very grateful to have the support of SparkCognition for this event. SparkCognition have provided a prize of £2000 for the winner of the competition and are also providing free access to their platform, tutorial videos, and daily online support for all teams.

Sridhar Sudarsan

Sridhar Sudarsan is the Chief Technology Officer of SparkCognition. Sudarsan is responsible for driving SparkCognition’s product and technology strategy, leveraging next-generation artificial intelligence systems to secure and optimize assets across key industries.

With over two decades of technology leadership experience, Sudarsan has been at the helm of several complex products and projects, collaborating with global customers on cutting-edge technologies. Previously, Sudarsan was the CTO of IBM Watson Platform and Partnerships, where he led the technology strategy and architecture of the IBM Watson platform. Sudarsan is widely recognized as an expert on the business potential and application of advanced technologies. He provides thought leadership on AI solutions and patterns for clients, partners, academics, and R&D teams. He holds over 14 patents in the areas of AI and distributed computing, has published white papers and articles for a variety of outlets, and has been a featured speaker at conferences and universities.

Sari Andoni

Sari Andoni is a Senior Data Scientist at SparkCognition, Inc. He has extensive experience in machine learning, neural networks and deep learning combined with a research background in neurobiology. With published research in leading journals, Sari currently focuses on automated model building with multivariate time-series data using artificial neural networks. He received his Bachelor's degree in Computer Sciences and Mathematics from Brigham Young University, and a PhD from the Institute for Neuroscience at The University of Texas at Austin.

For his dissertation, Sari studied the auditory midbrain and how the auditory system classifies natural vocalizations into behaviorally relevant perceptions. In his postdoctoral research, he studied the visual system focusing on the interaction of spontaneous activity with stimulus-evoked responses in the thalamocortical circuit.

Anna Dingley

Anna represents SparkCognition in Europe and the UK as Regional Vice President of Business Development. She has been with the company since 2016, working closely with the executive board as they grow in numbers of employees, numbers of clients, and global reach. She has worked in international business development throughout her career, and specialises in building Japanese business, being fluent in Japanese and having worked in Japan for over 8 years in finance, government and industry.

Anna was born in Southampton but hasn't spent much of her life in the city so she is looking forward to continuing that connection at the Hackathon today.

Bruce Porter

A two-time Chair of the University of Texas Computer Science Department, Dr. Bruce Porter serves as SparkCognition's Chief Science Officer, where he leads the company's many R&D initiatives. Currently, as University Professor, Dr. Porter's research focuses on machine reading, a technology that holds tremendous potential for capturing knowledge for automated inference, question answering, explanation generation, and other AI capabilities.

Dr. Porter also directs UT’s Knowledge Systems Research Group, an AI organization with the goal to develop methods to build knowledgeable computers. He has won the Best Paper Award at the National Conference on Artificial Intelligence, the College of Natural Sciences Teaching Excellence Award, the National Science Foundation’s Presidential Young Investigator Award, and the President’s Associates Teaching Excellence Award.

Selected Awards & Honors:

Best Paper Award, National Conference on Artificial Intelligence (AAAI)
Recipient of the President’s Associates Teaching Excellence Award by the University of Texas at Austin
Recipient of a Presidential Young Investigator Award by the National Science Foundation

Keith Moore

Keith Moore is the Director of Product Management at SparkCognition and is responsible for the development of the IoT product line (SparkPredict®). He specializes in applying advanced data science and natural language processing algorithms to complex data sets. Moore previously worked for National Instruments as an analog-to-digital converter and vibration software product manager. Prior to that, he developed client software solutions for major oil and gas, aerospace, and semiconductor organizations. Moore has served as a board member of Pi Kappa Phi fraternity, and still volunteers on the alumni engagement committee. He graduated from the University of Tennessee with a B.A. in mechanical engineering.