Useful Information about the ALTA 2016 Shared Task
This page contains some links that you may find useful when attempting the task. More information may be added later on.
Related Papers
- Chisholm et al. (2016). Discovering entity knowledge bases on the web. In NAACL Workshop on Automated Knowledge Base Construction.
Some Ideas
- Train a classifier using features of the URL itself (e.g., cosine similarity between tfidf-weighted URL word vectors);
- Derive features from the provided search result title and snippet information (e.g., cosine similarity between tfidf-weighted title word vectors);
- Download the web pages and extract additional features from the full text or markup;
- Use a large web crawl such as Common Crawl or ClueWeb to collect links that point to these URLs, and build features from the mention context of those links (e.g., anchor text and surrounding words).
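The first two ideas can be sketched with the Python standard library alone. This is a minimal sketch, not a reference implementation: it assumes each instance pairs two URLs, each with a search-result title, and the function names (`url_tokens`, `text_tokens`, `idf_table`, `tfidf`, `cosine`, `pair_features`) and toy URLs are hypothetical. The idf smoothing, log((1 + N) / (1 + df)) + 1, matches the default in scikit-learn's `TfidfVectorizer`, which could be used instead.

```python
import math
import re
from collections import Counter


def url_tokens(url: str) -> list[str]:
    """Split a URL into lowercase word tokens on any non-alphanumeric run.

    Assumption: the scheme and a leading 'www' carry no signal, so they are dropped.
    """
    parts = re.split(r"[^0-9a-z]+", url.lower())
    return [p for p in parts if p and p not in {"http", "https", "www"}]


def text_tokens(text: str) -> list[str]:
    """Lowercase alphanumeric tokens for search-result titles and snippets."""
    return re.findall(r"[0-9a-z]+", text.lower())


def idf_table(docs: list[list[str]]) -> dict[str, float]:
    """Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf(tokens: list[str], idf: dict[str, float]) -> dict[str, float]:
    """Raw term count times idf; terms unseen in the corpus get the largest idf (an assumption)."""
    unseen = max(idf.values(), default=1.0)
    return {t: c * idf.get(t, unseen) for t, c in Counter(tokens).items()}


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pair_features(url_a, title_a, url_b, title_b, url_idf, title_idf) -> list[float]:
    """Hypothetical feature vector for one URL pair: [URL cosine, title cosine]."""
    url_sim = cosine(tfidf(url_tokens(url_a), url_idf), tfidf(url_tokens(url_b), url_idf))
    title_sim = cosine(tfidf(text_tokens(title_a), title_idf),
                       tfidf(text_tokens(title_b), title_idf))
    return [url_sim, title_sim]


if __name__ == "__main__":
    # Toy data; in practice the idf tables would be fitted on all training URLs and titles.
    urls = ["https://example.org/people/jane_smith",
            "https://www.example.com/jane-smith/bio",
            "https://example.net/products/widget"]
    titles = ["Jane Smith - People", "Jane Smith: Biography", "Widget product page"]
    url_idf = idf_table([url_tokens(u) for u in urls])
    title_idf = idf_table([text_tokens(t) for t in titles])
    print(pair_features(urls[0], titles[0], urls[1], titles[1], url_idf, title_idf))
```

The resulting per-pair vectors can be fed to any off-the-shelf classifier; snippet similarity or features from downloaded page text would simply be appended as further columns.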
Feel free to post questions, comments, etc. on the Kaggle in Class competition page. To access the Kaggle in Class pages, you need to register for this shared task.