Topic Model Browser for NEB Projects

Project description: This site visualizes the topic modelling output for the hearing transcripts of 35 oil pipeline project proposals reviewed by the National Energy Board of Canada between 1993 and 2018. More details about the project and methodology can be found on the About page. You can cite this work as Chavez, J.F. and Moa, B. (2023). Topic Modelling Browser – National Energy Board hearing transcripts.

Project description: This model browser visualizes the topic modelling output used to analyze the hearing transcripts of 35 oil pipeline project proposals reviewed by the National Energy Board of Canada between 1993 and 2018. In Canada, major pipeline projects were reviewed by the National Energy Board (NEB) until 2019, when regulatory changes replaced the NEB with the Canada Energy Regulator (CER). The NEB’s mandate was to “review applications to build and operate new energy pipelines and make its decision or recommendation based on the Canadian public interest” (National Energy Board, n.d.-a). As part of the review process, the NEB held public hearings in which stakeholders could challenge the plans and evidence presented by the project proponent and provide their own evidence and views. The project proponent, in turn, was permitted to respond to stakeholders’ interventions. The NEB was responsible for taking these arguments into consideration in its final decision.

Data collection: The dataset consists of 411 documents, totalling 44,231 pages and 14.9 million words, from the hearings held during the review of 35 oil pipeline project proposals by the National Energy Board between 1993 and 2018. These documents were publicly accessible on the National Energy Board’s website (https://apps.neb-one.gc.ca/) and were downloaded in May 2018.

Preparing the corpus: To prepare our corpus, we took several steps with the following objectives: first, to transform the original documents (i.e., volumes of hearings) into documents per actor, so that each document contains the verbatim transcript of an individual actor’s statements during the hearing; second, to estimate the size of each actor’s document as a proxy for the length of their participation in the hearing, and to exclude actors without significant participation (i.e., around 100 words or fewer); third, to prepare the collection of documents (one per actor) for topic extraction and topic weight quantification through topic modelling. The resulting corpus comprised 3,074 documents, one per actor, associated with the 35 cases mentioned earlier.
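As a rough illustration of the first two objectives, the Python sketch below groups transcript lines by speaker and drops actors at or below a word threshold. The "SPEAKER: text" line layout, the function names, the exact cutoff of 100 words, and the sample lines are all assumptions made for illustration; the original transcripts and the project's actual pipeline may well differ.

```python
from collections import defaultdict

# Assumed cutoff, loosely based on the "around 100 words" rule described above.
MIN_WORDS = 100


def build_actor_documents(lines):
    """Group transcript lines into one text per actor.

    Assumes each turn starts with "SPEAKER: text" (a hypothetical layout).
    Lines without a colon are treated as a continuation of the previous
    speaker. Real transcripts would need a stricter speaker pattern.
    """
    parts_by_actor = defaultdict(list)
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        speaker, sep, text = line.partition(":")
        if sep and speaker.strip():
            current = speaker.strip()
            parts_by_actor[current].append(text.strip())
        elif current is not None:
            parts_by_actor[current].append(line)
    return {actor: " ".join(parts) for actor, parts in parts_by_actor.items()}


def filter_by_length(actor_docs, min_words=MIN_WORDS):
    """Keep only actors whose document has more than min_words words."""
    return {a: t for a, t in actor_docs.items() if len(t.split()) > min_words}


# Made-up sample input, not taken from the NEB transcripts.
sample = [
    "ACTOR A: " + "pipeline " * 150,
    "ACTOR B: Thank you.",
    "And nothing further.",
]
docs = build_actor_documents(sample)
kept = filter_by_length(docs)
print(sorted(kept))
```

Word count is used here only as a simple proxy for participation length, in the spirit of the second objective.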

Topic extraction and topic weight quantification through topic modelling: To extract the topics from the transcripts and quantify them for each actor, we relied on topic modelling (TM). The TM algorithm used in this study was Latent Dirichlet Allocation (LDA) (Blei, Ng, & Jordan, 2003), a generative probabilistic model based on the assumption that documents within a corpus exhibit multiple themes that can be represented as a probabilistic mixture of topics, and that topics are in turn probabilistic mixtures of words. The entire corpus was examined to calculate the distribution of words in topics and the distribution of topics in documents (in our case, documents represent actors). The software used was MALLET, embedded in a TM processing pipeline built with both Python and R. This approach allowed us to inductively identify the topics mobilized by each actor within the hearing, as well as their weights (probability distributions). We computed models ranging from 5 to 150 topics and then selected the best model based on its coherence measure and manual examination. The optimal model had 60 topics.
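The mixture assumption behind LDA can be made concrete with a toy calculation. The sketch below is not the MALLET pipeline used in the project; it only shows how a document's word distribution follows from a topic mixture and per-topic word distributions. The topic names, words, and probabilities are invented for illustration and do not come from the NEB model.

```python
# Toy values only: two topics over a four-word vocabulary (not from the NEB model).
topic_word = {
    "environment": {"spill": 0.5, "water": 0.4, "jobs": 0.05, "tolls": 0.05},
    "economy": {"spill": 0.05, "water": 0.05, "jobs": 0.5, "tolls": 0.4},
}

# Topic mixture of one hypothetical actor document.
doc_topic = {"environment": 0.7, "economy": 0.3}


def word_probability(word, doc_topic, topic_word):
    """P(word | doc) = sum over topics of P(topic | doc) * P(word | topic)."""
    return sum(
        p_topic * topic_word[topic].get(word, 0.0)
        for topic, p_topic in doc_topic.items()
    )


def doc_word_distribution(doc_topic, topic_word):
    """Word distribution implied by the document's topic mixture."""
    vocab = sorted({w for words in topic_word.values() for w in words})
    return {w: word_probability(w, doc_topic, topic_word) for w in vocab}


dist = doc_word_distribution(doc_topic, topic_word)
for word, p in sorted(dist.items(), key=lambda item: -item[1]):
    print(f"{word}: {p:.3f}")
```

Fitting an LDA model runs this logic in reverse: given only the observed words, it estimates the topic-word and document-topic distributions that best explain them.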

Labelling topics into general concern categories: We took an abductive approach (Dubois & Gadde, 2002), “going back and forth” between our analysis of the topics and the three dimensions that characterize the notion of the public interest as described by the NEB: “environmental, economic, and social interests” (National Energy Board, n.d.-a).

This topic modelling browser is based on the model-browser interface by Andrew Goldstone; the source is available on GitHub. Made using d3.js and Bootstrap. Zip support using JSZip.
