Scikit-Learn House Price AI

Goals: Get accustomed to Jupyter Notebooks, Scikit-Learn, and simple regression modeling. Learn concepts such as normalization, imputation, enumeration (encoding categorical values as numbers), and the foundations of CRISP-DM.

Results: The model reaches an r2 of 0.9999968, but it does not generalize to arbitrary houses because location data is factored into the model.

Process:

Use Pandas to read a CSV into a DataFrame. Enumerate the data so the frame contains only numbers. Check for unusable data and, where needed, use imputation to fill in missing values. After inspecting graphs of the data, normalize it and filter it accordingly. Split the data into training and test sets, then fit a KNeighborsRegressor model on the training set. Next, predict on the test set and measure the error. Finally, adjust the model's settings to find the most accurate version.
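The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's notebook: a small synthetic DataFrame stands in for the CSV, and the column names (`sqft`, `bedrooms`, `city`, `price`), the median imputation, the min-max scaling, and `n_neighbors=3` are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for pd.read_csv(...); every column name is hypothetical.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "sqft": rng.uniform(500, 4000, n),
    "bedrooms": rng.integers(1, 6, n).astype(float),
    "city": rng.choice(["A", "B", "C"], n),
})
df["price"] = 150 * df["sqft"] + 10_000 * df["bedrooms"] + rng.normal(0, 5_000, n)
df.loc[df.index[::25], "bedrooms"] = np.nan  # simulate unusable entries

# Enumerate: replace each category with an integer code.
df["city"] = df["city"].astype("category").cat.codes

X = df.drop(columns="price")
y = df["price"]

# Split into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Impute missing values, then normalize each feature to [0, 1].
imputer = SimpleImputer(strategy="median")
scaler = MinMaxScaler()
X_train_p = scaler.fit_transform(imputer.fit_transform(X_train))
X_test_p = scaler.transform(imputer.transform(X_test))

# Fit the regressor, predict on the test set, and measure the error.
model = KNeighborsRegressor(n_neighbors=3)
model.fit(X_train_p, y_train)
pred = model.predict(X_test_p)
score = r2_score(y_test, pred)
print(f"r2 on the test set: {score:.4f}")
```

The sketch fits the imputer and scaler on the training split only, which keeps test-set information out of preprocessing. The process described above normalizes before splitting, so this ordering is a deliberate departure for the example.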

Snippets:

Information Gain on Parameters (VarianceThreshold Not Pictured): [image: Information Gain]
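For readers without the image, here is one way such a ranking might be produced. It is a sketch under assumptions: the write-up does not say which function computed the information gain, so `mutual_info_regression` is a stand-in, and the synthetic columns (`sqft`, `noise`, `constant`) are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import VarianceThreshold, mutual_info_regression

# Synthetic features: one informative, one pure noise, one constant.
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    "sqft": rng.uniform(500, 4000, n),
    "noise": rng.normal(0, 1, n),
    "constant": np.ones(n),
})
y = 150 * X["sqft"] + rng.normal(0, 5_000, n)

# Drop features whose variance is zero (they carry no information).
selector = VarianceThreshold(threshold=0.0)
selector.fit(X)
kept = X.columns[selector.get_support()]

# Estimate the information each remaining feature shares with the target.
gain = pd.Series(
    mutual_info_regression(X[kept], y, random_state=0), index=kept
).sort_values(ascending=False)
print(gain)
```

Features near the bottom of the ranking are candidates for filtering before the model is trained.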

The r2 Values: [image: R-Squared Values]
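A comparison of r2 values across model settings might look like the sketch below. Which settings the project actually varied is not stated, so trying several `n_neighbors` values on synthetic data is purely an assumption for illustration.

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic, already-normalized features with a noisy linear target.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(300, 2))
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(0, 0.05, 300)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit one model per candidate k and record its r2 on the test set.
scores = {}
for k in (1, 3, 5, 10):
    model = KNeighborsRegressor(n_neighbors=k).fit(X_train, y_train)
    scores[k] = r2_score(y_test, model.predict(X_test))

best_k = max(scores, key=scores.get)
for k, s in scores.items():
    print(f"n_neighbors={k}: r2={s:.4f}")
print("best n_neighbors:", best_k)
```

Picking the best setting on the test set, as done here for brevity, makes the reported score optimistic; a separate validation split or cross-validation would give a fairer estimate.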

The Final Model: [image: Final Model]

See the full project on GitHub.