12/24/2023

Kaggle competition spelling corrector

Instruction

Chenglong's Part

Before proceeding, one should place all the data from the competition website into the designated folder. Note that in the following, all the commands and scripts are executed in the designated directory.

We used Python 3.5.1 and the modules that come with Anaconda 2.4.1 (64-bit). In addition, we used further libraries and modules, R packages installed via install.packages(), and several third-party packages.

We used pre-trained Word2Vec models listed in this GitHub repo. We used glove-gensim to convert GloVe vectors into Word2Vec format for easy usage with Gensim. After that, put all the models in the corresponding directory (see config.py for detail).

We also used the following external data:
- Color data from this Kaggle forum post.
- Google spelling correction dictionary from this Kaggle forum post, i.e., google_spelling_checker_dict.py in this repo.
- Home-made word replacement dictionary, i.e., Data/dict/word_replacer.csv in this repo.
- NLTK corpora and taggers data downloaded using nltk.download(), specifically: stopwords.zip, wordnet.zip and maxent_treebank_pos_tagger.zip.

To generate data and features, one should run python run_data.py. While we have tried our best to make things as parallel and efficient as possible, this part might still take 1 to 2 days to finish, depending on the computational power.

Note that varied text processing is useful for introducing diversity into the ensemble. As a matter of fact, one feature set (i.e., basic20160313) from our final solution was generated before the Fixing Typos post, i.e., without using the Google spelling correction dictionary. Such a version of the features can be generated by turning off the GOOGLE_CORRECTING_QUERY flag in config.py.

After team merging with Igor&Kostia, we rebuilt everything from scratch, and most of our models used different subsets of Igor&Kostia's features.
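To make the role of the GOOGLE_CORRECTING_QUERY flag concrete, here is a minimal sketch of how a query-level spelling dictionary could be applied or bypassed. It is an illustration under assumptions, not the code from the repo: the dictionary entries, the `SPELLING_DICT` name and the `correct_query` helper are hypothetical, and the flag is defined locally as a stand-in for the one in config.py.

```python
# Hypothetical stand-in for the flag defined in config.py.
GOOGLE_CORRECTING_QUERY = True

# Illustrative entries only; the real dictionary ships its own contents.
SPELLING_DICT = {
    "steele stake": "steel stake",
    "toliet seat": "toilet seat",
}


def correct_query(query, spelling_dict=SPELLING_DICT,
                  enabled=GOOGLE_CORRECTING_QUERY):
    """Return the corrected query if correction is enabled and the
    whole query appears in the dictionary; otherwise return it unchanged."""
    if not enabled:
        return query
    return spelling_dict.get(query, query)


queries = ["toliet seat", "cordless drill"]

# With the flag on, known misspelled queries are replaced.
corrected = [correct_query(q) for q in queries]

# With the flag off, queries pass through untouched, which yields a
# different (uncorrected) version of the text for feature generation.
raw = [correct_query(q, enabled=False) for q in queries]
```

Running feature generation once with the correction enabled and once without gives two versions of the same features, which is one way such a switch can add diversity to an ensemble.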
See Doc/Kaggle_HomeDepot_Turing_Test.pdf for documentation of Turing Test's Solution for the Home Depot Product Search Relevance Competition on Kaggle.

Submission: Simplified Single Model from Igor and Kostia (10 features)