MISCA (Meta Idea Space for Creative) is an AR-based, AI-driven tool for designers and strategists to use during the ideation phase of their projects. It aims to overcome design fixation and make the ideation process more effective.
MIT Media Lab MAS.S65 Cognitive Augmentation
Group of 3: Aria Xiying Bao, Noah Deutsch, Erica Luzzi
Instructor:
Pattie Maes, Professor of Media Arts and Sciences
My contribution:
Ideation, UI/UX design, HoloLens Prototyping
We introduce MISCA, an augmented reality creativity tool that expands the possibilities of creative ideation. We hope to aid designers and strategists during the ideation phase by introducing new ideas and preventing the fixation effect. We achieve this by leveraging outputs from machine learning models to create an immersive, mixed-reality display. Our initial prototype successfully reads handwritten text from a brainstorming session, generates new ideas based on this text, and creates accompanying image artifacts which are then displayed on the whiteboard in augmented reality.
Whiteboard sketching is a method used broadly by designers and strategists to brainstorm and sketch out design concepts. It is a simple, low-tech system for augmenting creativity. Creativity is associated with the tendency to generate or recognize ideas, alternatives, or possibilities that may be useful in solving problems, communicating with others, and entertaining ourselves and others [1].
However, when whiteboarding, people often fixate and get stuck on the concepts they have already written down or sketched. Designers often have difficulty moving beyond the ideas they have experienced previously, which is referred to as fixation [3]. One way to overcome fixation is through the introduction of new stimuli. The creative power this may afford is supported by studies showing that introducing inter-domain examples is an effective way to promote evocation in the idea-generation process [12]. This was more effective than intra-domain examples, which a strategist is more likely to fixate on given their specific knowledge base. Consequently, a knowledge base beyond the user’s own would help move past design fixation and make the ideation session more fruitful. This may be why many prefer to ideate with others, who can provide inter-domain knowledge.
We examined a number of existing projects that use AI to aid in ideation or creative tasks. From the Fluid Interfaces group, Paper Dreams is an AI-based collaborative tool that augments the sketching process by suggesting elements based on the user’s inputs to the canvas. Paper Dreams allows the user to control the level of serendipity in the outputs, which determines how conventional or unconventional the outputs will be [7]. This supports our background research on the effectiveness of inter-domain examples and served as a useful precedent when designing our “wackiness meter.”
Several other precedents aimed to augment the sketching process. One study explored an AI system that provides inspiration sketches to help the user improve their existing sketches [8]. Google’s Creative Lab has developed a website called Quick, Draw! which recognizes what the user is sketching. It is less a tool than an interactive game whose purpose is to build a large hand-sketching dataset and improve the underlying machine learning model [9].
Nvidia’s Canvas AI painting tool instantly transforms sketches into realistic landscapes using a machine learning technique known as inpainting. The tool creates highly realistic outputs, yet lacks a collaborative element [10]. Finally, we wanted to explore precedents that use text-generation modalities. AI Dungeon is a game that creates an infinitely generated text adventure powered by GPT-3. The game allows users to create their own custom settings, which is an interesting precedent given the recent release of fine-tuning abilities for GPT-3 [11].
In future iterations of our project, we hope to explore this fine-tuning to provide our users with a more personalized experience. In evaluating these precedents, we realized that none utilize a 3D space, indicating a clear opportunity for mixed-reality technologies to provide a more immersive brainstorming environment. Additionally, none of the precedents we explored use both text and images generated by machine learning, a novelty we hoped to achieve in our project.
We aim to augment human creativity by leveraging generative AI. We developed our tool to reduce the fixation effect and help users generate more inspired, out-of-the-box ideas. Our system recognizes words and sketched images on a whiteboard and uses them as a basis to generate new words, images, and 3D representations in augmented reality. This helps users unlock new connections between their previous ideas and spark new areas to explore.
We built the system on first-generation Microsoft HoloLens hardware [2]. The built-in camera on the HoloLens captures real-time images of the whiteboard for our system to process. First, we use Google’s Cloud Vision API to recognize the handwriting on the whiteboard and transcribe it into a string of text. For our prototype, we created an input paradigm in which the handwritten text on the whiteboard must be organized as an “input,” a series of keywords and phrases that should influence the idea generation, and a “target,” the aim of the ideation session. This specific format allows our system to better understand the user’s ideation goal, though we hope to make it more flexible in future prototypes. After the text is received, we use the GPT-3 model [4] to generate novel ideas based on the input. The generated text is then sent to the Python server that runs the CLIP_Guided_Diffusion model to generate inspiration images.
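To make the pipeline concrete, below is a minimal sketch of the recognition and idea-generation steps, assuming the Cloud Vision Python client and the 2021-era OpenAI Completion endpoint. The function names and prompt template are illustrative, not our exact production code.

```python
# Minimal sketch of the MISCA pipeline: OCR the whiteboard photo with
# Cloud Vision, then ask GPT-3 for new ideas. The prompt template and
# function names are illustrative placeholders.
import openai
from google.cloud import vision


def read_whiteboard(image_bytes: bytes) -> str:
    """Transcribe handwritten whiteboard content with Cloud Vision."""
    client = vision.ImageAnnotatorClient()
    image = vision.Image(content=image_bytes)
    response = client.document_text_detection(image=image)
    return response.full_text_annotation.text


def generate_ideas(inputs: str, target: str, n_ideas: int = 5) -> list[str]:
    """Ask GPT-3 for novel ideas conditioned on the whiteboard's
    'input' keywords and the 'target' of the ideation session."""
    prompt = (
        f"Keywords: {inputs}\n"
        f"Goal: {target}\n"
        f"List {n_ideas} novel design ideas:\n1."
    )
    response = openai.Completion.create(
        engine="davinci",       # 2021-era GPT-3 completion endpoint
        prompt=prompt,
        max_tokens=200,
        temperature=0.8,
    )
    text = "1." + response.choices[0].text
    return [line.strip() for line in text.splitlines() if line.strip()][:n_ideas]
```

In the full system, the returned ideas are then forwarded to the diffusion server, which renders one inspiration image per idea.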
We designed the XR interface to be displayed beside the whiteboard that the user is writing on (Figure 2), though users can also move the contents wherever they wish. The XR interface in MISCA consists of two main parts: the control panel and the display of generated ideas and images. The control panel consists of containers holding the “input” and “target” keywords and phrases written on the whiteboard, a wackiness control slider which determines how unconventional the outputs will be, and the “generate” button. The generated-ideas display consists of five containers, each holding a generated text idea and an inspiration image based on that idea.
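One plausible way to implement the wackiness slider on the generation side is sketched below: map the slider’s value onto GPT-3’s sampling temperature so that higher wackiness yields less conventional completions. The exact mapping and range here are assumptions for illustration, not the tuned values from our prototype.

```python
# Hypothetical mapping from the UI's wackiness slider (0.0-1.0) to
# GPT-3's sampling temperature; the range used in our prototype may
# differ, this only illustrates the idea.
def wackiness_to_temperature(wackiness: float,
                             lo: float = 0.3, hi: float = 1.2) -> float:
    """Linearly interpolate: conventional ideas at low wackiness,
    unconventional ones at high wackiness."""
    wackiness = min(max(wackiness, 0.0), 1.0)  # clamp the slider value
    return lo + wackiness * (hi - lo)
```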
The user, wearing the HoloLens, first brainstorms ideas as they usually would on a whiteboard (Figure 3). To begin using our tool, the user presses both of the volume buttons on the HoloLens to capture the content of the whiteboard. Next, the user can adjust the wackiness meter by pinching the toggle with two fingers and moving the bar horizontally. To trigger idea generation, the user targets the “generate” button with their head gaze and confirms with a pinch gesture. After one to two seconds, the recognized input, target, and generated text ideas are displayed on the XR interface. After approximately 30 seconds, the five generated images are loaded into the scene. If the user wishes to generate another set of novel ideas, they simply press the “generate” button again.
We evaluated our tool by testing different handwritten inputs and targets and examining the ideas and images the system generated. We did not conduct a systematic evaluation; rather, we experimented with our prototype and confirmed that all parts of the pipeline are functional. Below are examples of the generated outputs.
Overall, the initial prototype we developed successfully demonstrates the potential of an AI-powered tool to flexibly support creative ideation across a range of use cases. We believe this is only the beginning, and have identified four exciting opportunities for future development in the near-term.
One opportunity for future work is to provide users with more control over the style of the image output. In our current implementation, we simply include the words “Oil on Canvas.” at the end of each text prompt in order to give each image a stylized appearance. In the future, it would be possible to allow users to select from a range of different styles, from pixel art to photorealism, to better suit their needs.
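As a sketch of how such style selection could work, the snippet below appends a user-chosen style phrase to the idea text before it is sent to the diffusion model. Only the “Oil on Canvas.” suffix comes from our current implementation; the other style strings and the helper itself are hypothetical.

```python
# Illustrative style selection: append a style phrase to the text idea
# before sending it to the diffusion model. Only "Oil on Canvas." is
# used in the current prototype; the other entries are hypothetical.
STYLES = {
    "painterly": "Oil on Canvas.",
    "pixel": "Pixel art.",
    "photo": "Photorealistic photograph.",
}


def build_image_prompt(idea: str, style: str = "painterly") -> str:
    """Combine a generated idea with the selected style suffix."""
    return f"{idea} {STYLES[style]}"
```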
Related to this opportunity, we would also like to explore the capability of generating 3D objects in addition to images. Recent work [5,6] in the research community has established new approaches to zero-shot object generation in response to text input with impressive results. Implementing 3D object generation would allow MISCA to become an even more immersive and inspiring idea space for our users and would better utilize the spatial affordances of an XR environment.
Another opportunity for MISCA is to allow users to view the progression of generated images in real time, rather than waiting for an image to be displayed only after a fixed number of diffusion steps. Additionally, we would like to give users the ability to “reorient” an image toward a new text prompt during generation so that they can steer the result to their liking.
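A rough sketch of how the diffusion server could stream partial results is shown below. Here `step_fn` stands in for one CLIP-guided denoising step and `to_image_fn` for tensor-to-image conversion; both are placeholders, not real CLIP_Guided_Diffusion APIs.

```python
# Sketch of streaming partial diffusion results to the headset instead
# of returning only the final image. step_fn and to_image_fn are
# placeholders supplied by the caller, not real library APIs.
def stream_samples(x, prompt, step_fn, to_image_fn,
                   num_steps=500, every=25):
    """Yield (step, image) pairs every `every` denoising steps."""
    for t in range(num_steps):
        x = step_fn(x, prompt, t)              # one guided denoising step
        if t % every == 0 or t == num_steps - 1:
            yield t, to_image_fn(x)            # partial image for display
```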
Lastly, we believe MISCA has the potential to support a range of different input paradigms in the future. In our current implementation, users must specify inputs (keywords or phrases to be considered) and a target (a description of what MISCA should generate) in order to ensure reliable output quality. However, as a next step, we would like to see MISCA support more unstructured synthesis of whiteboard content. In particular, it may be interesting to explore a conversational approach, where the user and system communicate back and forth with one another to facilitate more collaborative ideation. We would also like to provide users with the opportunity to indicate which outputs they found helpful so that the system can adapt accordingly.
After further development, we believe it would be valuable to test MISCA in the context of a controlled human-subjects study in order to better characterize and quantify its impact on creative ideation.