
Content is King

Preface


In this post I look at the existing NLP-enabled L2 content tools for learning Chinese referred to in Lee, Lam and Jiang (2016) and evaluate the philosophical differences between them and the system I propose here. Most of the research I mention is cited in their paper, which is where I first came across it.

Introduction

Learners of a second language face the daunting task of learning the meanings and usage of tens of thousands of new words and phrases. This requires a very high level of motivation: it takes years and thousands of hours to master a second written language. How to motivate L2 learners has been the subject of much research (for example), and one of its key findings is that allowing learners to consume material that aligns with their personal interests greatly improves outcomes.

Content vs Learning

I think it is intuitively obvious that learners are easier to motivate when they can learn from content that interests them personally, and field research has borne this out. Most tools, however, focus on providing an optimal learning environment for a wide range of content within a dedicated application.

A number of applications have been developed over the last 20 years, but I argue that most of them take "learning the language" as their starting point rather than "consuming content". I think we should turn that on its head: a tool can be built that gives learners the cognitive scaffolding they need wherever they choose to consume digital content in their second language - on any content, on any device and at any level (above HSK3). Learning then becomes a by-product of content consumption rather than its focus. The tool should also be connected: every consumption application should let the learner interact with, and enrich, their own personal learning database wherever they are.

Obviously the current tools are very valuable learning aids, and the system I propose would integrate many of their concepts. Learners also need powerful tools when they are in full "language learning" mode, but these tools should be connected seamlessly to their general content consumption in a way that maximises positive feedback. In that sense what I propose is not so much a replacement as extra functionality that could be integrated into existing tools.

Existing applications

From Section 5 of Lee, Lam and Jiang:

    ...Fewer systems are available to learners of Chinese as a foreign language. Many focus mainly on teaching characters and words (Shei and Hsieh, 2012). Others, such as Clavis Sinica (clavisinica.com) and Du Chinese (duchinese.net), use pre-selected texts, vocabulary exercises and translations. The Smart Chinese Reader (nlptool.com) allows the user to input any text, and then automatically performs word segmentation and links the words to CC-CEDICT. In addition, it supports automatic sentence translation, and helps the user maintain a “to-learn” word list. Distinct to the systems cited above, our system automatically generates vocabulary review exercises (Section 2.3), and dynamically estimates the user’s proficiency level to personalize search results (Section 3).

Most of the tools mentioned use pre-selected texts. I think it is currently impossible to find and curate content that interests every user. I also haven't found anything that attempts to enrich text with language learning scaffolding outside of classical "textual" contexts (web pages, e-books, etc.). Music (karaoke screens) and video (subtitles) are also opportunities for language learning, and would benefit from being enriched for learners.

NLPTool

Of the systems mentioned above, NLPTool most closely mirrors what I propose. It has a strong set of learning tools that let the learner dissect a text with the same comprehension aids I use in my prototype (sentence parsing, sentence translation with word/phrase alignment, and dictionary entries). I suspect it also uses Stanford's CoreNLP for parsing, and it definitely uses CC-CEDICT for dictionary lookup. NLPTool comes with an integrated browser that lets the learner select words on any web page and get translations in place, or select whole sections for detailed study in a separate window. These are the key similarities.
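To make the "segment the text, then link each word to CC-CEDICT" step more concrete, here is a minimal Python sketch. It parses lines in the CC-CEDICT format (`Traditional Simplified [pin1 yin1] /gloss/gloss/`) and segments text by forward maximum matching, i.e. greedily taking the longest dictionary word at each position. This is a deliberately simple technique swapped in for illustration: it is not what NLPTool or my prototype actually does (CoreNLP uses a statistical segmenter), and the function names and the tiny hand-written sample dictionary are hypothetical.

```python
import re
from dataclasses import dataclass

# One CC-CEDICT entry line: "Traditional Simplified [pin1 yin1] /gloss 1/gloss 2/"
CEDICT_LINE = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] /(.*)/$")


@dataclass
class Entry:
    traditional: str
    simplified: str
    pinyin: str
    glosses: list


def parse_cedict(lines):
    """Build a map from simplified headword to its entries, skipping comments."""
    dictionary = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = CEDICT_LINE.match(line)
        if not match:
            continue
        trad, simp, pinyin, glosses = match.groups()
        dictionary.setdefault(simp, []).append(Entry(trad, simp, pinyin, glosses.split("/")))
    return dictionary


def segment(text, dictionary, max_len=8):
    """Forward maximum matching: take the longest dictionary word at each
    position, falling back to a single character when nothing matches."""
    words = []
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                words.append(candidate)
                i += length
                break
    return words


def annotate(text, dictionary):
    """Pair each segmented word with its first dictionary entry, or None."""
    return [(word, dictionary.get(word, [None])[0]) for word in segment(text, dictionary)]


# Hypothetical sample lines written in CC-CEDICT format.
SAMPLE = [
    "# CC-CEDICT sample",
    "中國 中国 [Zhong1 guo2] /China/",
    "中文 中文 [Zhong1 wen2] /Chinese language/",
    "學習 学习 [xue2 xi2] /to learn/to study/",
    "我 我 [wo3] /I/me/my/",
]

if __name__ == "__main__":
    dictionary = parse_cedict(SAMPLE)
    for word, entry in annotate("我学习中文", dictionary):
        print(word, entry.pinyin if entry else "?")
```

Greedy longest-match is easy to reason about but mis-segments ambiguous strings, which is exactly why tools like NLPTool lean on a proper statistical segmenter and parser.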

Where the systems differ is that my system is (or rather will be!) available on any device I choose to use, whereas NLPTool is a Windows desktop application. I haven't used Windows for many years and certainly don't intend to start again in the near future. Increasing numbers of users don't have Windows machines, and most younger users consume most of their content on mobile devices. NLPTool claims that having all the components installed on the learner's Windows machine is a key advantage; I disagree. For most users, internet connections are reliable and fast enough that the quantities of data transferred are not a problem. I would be very surprised if many language learners took their Windows desktops or laptops into the mountains, where connections are still poor. In any case the tool gets its source material from the internet, so a reasonable internet connection is needed anyway.

As a case in point, I am currently based in a relatively small city in Yunnan. My accommodation has no wired connection and the campus wifi is not usable where I live, so I am relying exclusively on a 4G connection. For various reasons my prototype gets sentence translations and does dictionary lookups using Azure's Translator Text API. Because that API is not available in mainland China without a VPN, all of that traffic goes over a VPN, which adds significant latency. It is still perfectly usable. Relying on a VPN is not viable long-term, but it illustrates how far networking has come, and why we can and should make this system available over the network. Being on the network also means the system needs only a very light presentation layer on the learner's devices, and that new plugins for future devices can be developed quickly.
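For readers curious what those two calls look like, here is a rough Python sketch that builds (but does not send) requests against the public Azure Translator v3 REST API: `/translate` with `includeAlignment` for sentence translation with word alignment, and `/dictionary/lookup` for single words. It is not my prototype's code; the helper names are hypothetical, and the key and region are placeholders you would replace with your own resource's values (the region header is needed for regional resources). The explicit timeout is there because, over a VPN on 4G, slow responses are a real possibility.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://api.cognitive.microsofttranslator.com"


def build_request(path, params, texts, key, region):
    """Build (but do not send) a Translator v3 POST request."""
    query = urllib.parse.urlencode({"api-version": "3.0", **params})
    body = json.dumps([{"Text": text} for text in texts], ensure_ascii=False).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Ocp-Apim-Subscription-Region": region,
        "Content-Type": "application/json",
    }
    return urllib.request.Request(f"{ENDPOINT}{path}?{query}", data=body,
                                  headers=headers, method="POST")


def translation_request(sentence, key, region):
    # includeAlignment asks the service to return source/target word alignment.
    params = {"from": "zh-Hans", "to": "en", "includeAlignment": "true"}
    return build_request("/translate", params, [sentence], key, region)


def lookup_request(word, key, region):
    # Dictionary lookup returns alternative translations for a single word or phrase.
    return build_request("/dictionary/lookup", {"from": "zh-Hans", "to": "en"}, [word], key, region)


def send(request, timeout=10):
    """Send a request and decode the JSON response (network access required)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Placeholder credentials: nothing is sent unless send() is called.
    request = translation_request("我在云南学习中文。", "YOUR_KEY", "YOUR_REGION")
    print(request.full_url)
```

Keeping calls like these on the server, rather than in each plugin, is what lets the client-side presentation layer stay thin.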

Again, I think the more "learning focused" features of NLPTool (and the others) definitely benefit from a dedicated application. The content comprehension aids, though, should be everywhere, and seamlessly connected to each other. The network will clearly be an issue sometimes, and for certain "offline" consumption, plugins would benefit from a local cache. What I envisage is some form of pre-loading of enriched content while the user has a good connection (storing an e-book, downloading a movie, etc.), plus a local cache where the user can interact with the content (signalling known/unknown words, etc.) and have those interactions synced later. This is exactly the principle the Anki spaced repetition software works on, and it works well in that context.
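The "interact offline, sync later" idea can be sketched in a few lines of Python. Each device keeps a local cache of word statuses plus a queue of interactions made while offline; on sync it pushes the queue to the server and pulls back the merged state. The merge rule here is a simple assumption (last write wins, by timestamp), and the classes, the "known"/"unknown" statuses and the in-memory `Server` standing in for the network service are all hypothetical. This does not reproduce how Anki itself syncs.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Interaction:
    word: str
    status: str        # hypothetical statuses: "known" or "unknown"
    timestamp: float


@dataclass
class Server:
    """In-memory stand-in for the learner's networked learning database."""
    statuses: dict = field(default_factory=dict)

    def merge(self, events):
        # Last-write-wins: keep the most recent interaction for each word.
        for event in events:
            current = self.statuses.get(event.word)
            if current is None or event.timestamp > current.timestamp:
                self.statuses[event.word] = event


@dataclass
class LocalCache:
    """Hypothetical device-side cache: current statuses plus unsynced interactions."""
    statuses: dict = field(default_factory=dict)
    pending: list = field(default_factory=list)

    def mark(self, word, status, timestamp=None):
        event = Interaction(word, status, time.time() if timestamp is None else timestamp)
        self.statuses[word] = event
        self.pending.append(event)

    def sync(self, server):
        """Push queued interactions, then pull the merged state back."""
        server.merge(self.pending)
        self.pending.clear()
        self.statuses = dict(server.statuses)


if __name__ == "__main__":
    server = Server()
    phone, laptop = LocalCache(), LocalCache()
    phone.mark("学习", "known", timestamp=1.0)      # offline on the phone
    laptop.mark("学习", "unknown", timestamp=2.0)   # later, offline on the laptop
    phone.sync(server)
    laptop.sync(server)
    print(server.statuses["学习"].status)  # unknown (the later interaction wins)
```

Last-write-wins is the simplest rule that converges; a real system might prefer to keep the full interaction history, since every "unknown" signal is itself useful learner data.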

Conclusion

The existing content consumption tools for learners of Chinese focus on "language learning" rather than on the content. While these are very useful, I think we should add a feature that lets learners enrich *any* content they wish to consume. Having multiple disconnected tools that need the same learner data misses an opportunity for positive learning feedback. All these tools should therefore be connected, letting the user benefit from them whenever they choose and on any device.
