Connecting firm's web scraped textual content to body of science: Utilizing Microsoft Academic Graph hierarchical topic modeling.

Arash Hajikhani, Lukas Pukelis, Arho Suominen, Sajad Ashouri, Torben Schubert, Ad Notten, Scott W Cunningham

MethodsX

Department of Government, University of Strathclyde, United Kingdom.

Published: February 2022

This paper demonstrates a method for transforming textual information scraped from companies' websites and linking it to the scientific body of knowledge. The method illustrates how Natural Language Processing (NLP) can link established economic classification systems to the novel and agile constructs that new data sources enable. To this end, we experimented with the European classification of economic activities (known as NACE) at the sectoral and company levels. We established a connection between this classification and the Microsoft Academic Graph hierarchical topic modeling, based on companies' website content. Central to the operationalization of our method are a web scraping process, NLP, and a data transformation/linkage procedure. The method comprises three main steps: data source identification, raw data retrieval, and data preparation and transformation. These steps are applied to two distinct data sources.
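The retrieval, preparation, and linkage steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the topic hierarchy `TOPIC_PARENT`, the sample HTML, and the function names `html_to_text`, `tag_topics`, and `roll_up` are all hypothetical, and simple phrase matching stands in for whatever NLP model the paper actually uses. The sketch only shows the general shape of turning website markup into text, tagging it with topics, and rolling those topics up one level of a hierarchy.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical two-level topic hierarchy (child topic -> parent field).
# A toy stand-in, NOT real Microsoft Academic Graph data.
TOPIC_PARENT = {
    "machine learning": "computer science",
    "computer vision": "computer science",
    "battery": "materials science",
    "polymer": "materials science",
}


class VisibleTextParser(HTMLParser):
    """Collect text that sits outside <script> and <style> elements."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Data preparation: reduce raw page markup to visible text."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def tag_topics(text, topic_parent):
    """Count whole-phrase mentions of each known topic in the text."""
    lowered = text.lower()
    counts = Counter()
    for topic in topic_parent:
        hits = re.findall(r"\b" + re.escape(topic) + r"\b", lowered)
        if hits:
            counts[topic] = len(hits)
    return counts


def roll_up(topic_counts, topic_parent):
    """Linkage: aggregate topic counts to their parent field."""
    fields = Counter()
    for topic, n in topic_counts.items():
        fields[topic_parent[topic]] += n
    return fields


# Hypothetical scraped page for one company.
SAMPLE_HTML = (
    "<html><head><style>p{}</style>"
    "<script>var x = 'machine learning';</script></head>"
    "<body><p>We build machine learning and computer vision tools.</p>"
    "</body></html>"
)

sample_text = html_to_text(SAMPLE_HTML)
sample_topics = tag_topics(sample_text, TOPIC_PARENT)
sample_fields = roll_up(sample_topics, TOPIC_PARENT)
```

Summing such field counts over all companies that share a NACE code would give a sector-level profile, which is one plausible reading of the sectoral and company levels mentioned in the abstract.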

Full text (PMC): http://www.ncbi.nlm.nih.gov/pmc/articles/PMC8914545
DOI: http://dx.doi.org/10.1016/j.mex.2022.101650
