AI in Data & Skill Deserts: How Pressures and Unfamiliarity Kill AI Projects
July 16, 2021
Great AI applications require deliberate design, thoughtful planning, and meaningful data. Unfortunately, many projects are subject to pressures that work against best practices. The seemingly never-ending hype around AI, and more specifically machine learning (ML), creates serious institutional fear of falling behind and losing opportunities. Budget cuts exacerbate these pressures, as AI seems to offer a way both to do more with less and to bring in additional funding.
The result is often a push to create an AI project regardless of need or infrastructure. Even when a true need exists, there is frequently no concept of the time or resources the project might take to complete. By succumbing to these negative incentives, we force development in unfavorable conditions and become vulnerable to several hazards that threaten to waste resources and kill the project long before it reaches alpha testing.
try: AI_model except: print(“We do not need it!”)
The push to find some way to develop—and then brag about the use of—artificial intelligence can cause project developers to force complex solutions on simple problems. It is often data-centered problems that become the target of these ill-conceived initiatives since machine learning models can be used to pull meaningful insights out of data. However, ML is simply a tool. Just as you would not use a hammer as a can opener, you should not look to ML to address all your problems.
If you need insights from data, you should first explore existing data science tools. ML is great at finding patterns that are difficult to discern by other means. But there is often no need to train a model because the answers are not hidden. SQL queries and some insightful analysis can be sufficient to uncover the knowledge you seek.
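As a rough illustration of that point, here is a minimal sketch of answering a question with one SQL aggregate instead of a trained model. It uses Python's built-in sqlite3 module with an in-memory database. The table name (cases), its columns, and every value in it are hypothetical, invented only for this example.

```python
import sqlite3

# Hypothetical rows: (case_type, year, days_to_close). Invented for illustration.
ROWS = [
    ("civil", 2019, 120),
    ("civil", 2020, 95),
    ("criminal", 2019, 210),
    ("criminal", 2020, 180),
]


def average_days_by_case_type(rows):
    """Return [(case_type, average_days)] sorted by case type, using plain SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE cases (case_type TEXT, year INTEGER, days_to_close INTEGER)"
        )
        conn.executemany("INSERT INTO cases VALUES (?, ?, ?)", rows)
        # A single GROUP BY answers the question; no model training involved.
        cursor = conn.execute(
            "SELECT case_type, AVG(days_to_close) "
            "FROM cases GROUP BY case_type ORDER BY case_type"
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for case_type, avg_days in average_days_by_case_type(ROWS):
        print(f"{case_type}: {avg_days:.1f} days on average")
```

If a grouped average like this already answers the question, the answer was never hidden, and a model would add cost without adding insight.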
SELECT * FROM non_existant_perfect_database
Even when machine learning is a useful way to address a problem, that does not mean the data exists to support it. The data might be unavailable, expensive, or disorganized. I routinely get requests from ambitious undergrads for trial transcript data for some natural language processing (NLP) project they want to complete for class. Many of these projects have valuable aims, but that does not mean that the data creation workflows exist to support them. We live in a world where so much information is collected and made available that users mistakenly believe, or at least passionately hope, that everything must be.
existing_Data !== usable_Data
Even when you can acquire raw data, it might take months of work to clean and engineer that data into a machine-readable format. That preparation takes time, money, and expertise. Time and money can ultimately determine the feasibility of any project. Unrealistic deadlines can hamper ethical development, leading to harmful outputs in the final model. Limited funding can bottleneck production or prevent the project from scaling with need.
Having all the data stars align will still not carry you through to the end of the project. Expertise is especially crucial, as data preparation and ML development require thoughtful choices that can affect the reliability and generalizability of the model. Coding and training a meaningful and ethical machine learning model requires knowledge beyond Python, TensorFlow, and other programming languages and libraries. Some understanding of statistical methods is required to choose the appropriate model and to fine-tune its parameters as the problem dictates. Documentation exists that suggests appropriate parameters for different problems, but ultimately, explaining the choices made in building any machine learning model falls on the developer. Failure to understand the potential impact of different decisions may result in deploying problematic models in the real world long before they are ready.
Since these pressures come from above, the responsibility falls on institutional managers and administrators to reduce these hazards. Leaders need a foundational knowledge of AI systems and development workflows if they want to foster an encouraging and supportive environment for AI development. They also need a deep understanding of the existing data infrastructure and information architecture at their institution. With that knowledge, they can better identify problems that AI might solve and likely ways forward to a solution.
Only then can they tailor short- and long-term strategic goals to allow for ethical, meaningful, and thoughtful AI development at their institutions. These goals would address data collection, curation, and management; talent acquisition and development; and intentional cross-department or cross-institution collaboration. Roughly translated, these goals need to account for the time and resources it takes to cultivate a culture conducive to ethical and valuable AI projects.