Inspiration πŸ’‘

As we navigate the vibrant city of Calgary, with its extensive transportation network of C-Trains, buses, and cabs, we often take for granted the ease of moving from one place to another. For individuals with disabilities, particularly those who are visually impaired, the story is quite different: 69% of the city's signals are not accessible to people with disabilities. This stark number sheds light on a significant accessibility gap affecting nearly 60,000 visually impaired Calgarians, for whom even the simple act of walking can be a challenge.

How We Built It βš™οΈ

We built VisionAID to simplify city exploration for the visually impaired by integrating three key functionalities: Navigation, Collision Detection, and Contextual Analysis. Our approach harnesses data from a head-mounted device equipped with a Raspberry Pi as the main processor, two ultrasonic sensors, and a camera for image capture.

For navigation, we leverage Calgary's Open Data sets, including live transit routes, signals map, and crime statistics, along with Google Maps for precise directions. Collision detection is achieved through ultrasonic sensors and computer vision using OpenCV to identify nearby obstacles. To understand context, we utilize OpenAI's Vision APIs. Finally, the device converts all textual data into speech in real-time using Amazon Polly (Boto3), ensuring accessibility for users.
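
To make the collision-detection side concrete, here is a minimal sketch of reading an ultrasonic range sensor on the Raspberry Pi. It assumes an HC-SR04-style sensor; the GPIO pin numbers and the one-metre alert threshold are illustrative placeholders, not the project's actual wiring.

```python
# Minimal sketch of ultrasonic distance sensing on a Raspberry Pi.
# Assumes an HC-SR04-style sensor; TRIG/ECHO pins and the 1 m alert
# threshold are illustrative, not the project's actual wiring.
import time
import RPi.GPIO as GPIO

TRIG, ECHO = 23, 24          # hypothetical BCM pin numbers
ALERT_DISTANCE_CM = 100      # warn when an obstacle is closer than ~1 m

GPIO.setmode(GPIO.BCM)
GPIO.setup(TRIG, GPIO.OUT)
GPIO.setup(ECHO, GPIO.IN)

def read_distance_cm() -> float:
    """Fire a 10 microsecond trigger pulse and time the echo to estimate distance."""
    GPIO.output(TRIG, True)
    time.sleep(0.00001)
    GPIO.output(TRIG, False)

    start = stop = time.time()
    while GPIO.input(ECHO) == 0:
        start = time.time()
    while GPIO.input(ECHO) == 1:
        stop = time.time()

    # Sound travels ~34300 cm/s; divide by 2 for the round trip.
    return (stop - start) * 34300 / 2

try:
    while True:
        if read_distance_cm() < ALERT_DISTANCE_CM:
            print("Obstacle ahead")  # on the device this would trigger a spoken alert
        time.sleep(0.2)
finally:
    GPIO.cleanup()
```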

Conceptualization and Planning πŸ“

The project kicked off with an intensive conceptualization phase. Our team gathered to brainstorm the needs of visually impaired users and how technology could address these needs effectively. We identified key user cases that our project should cater to, such as navigating busy streets, identifying obstacles, and understanding the immediate environment through auditory feedback.

Understanding the constraints of a 24-hour development deadline, we scoped the project to ensure feasibility without compromising on the vision for future upgradability. This meant designing a modular system that could be expanded with more features and improved AI capabilities over time.


Primary Features 🚩

1. Navigation: VisionAID integrates Calgary's Open Data with Google Maps, providing users with audio-guided, safe, and efficient routes. It optimizes pathways by considering live transit updates, signal maps, and crime statistics to enhance travel safety.

2. Collision Avoidance: The device employs ultrasonic sensors and OpenCV technology to detect imminent obstacles in the user's path. This feature alerts users in real-time, allowing them to navigate around obstacles and maintain a clear path.

3. Contextual Analysis: Leveraging OpenAI's Vision APIs, VisionAID offers a deeper understanding of the surrounding environment by interpreting visual data and providing descriptive audio cues. This analysis aids in recognizing important landmarks, interpreting traffic signals, and providing situational awareness, enriching the navigation experience with contextual insights.

Navigation πŸ“

For navigation, we leveraged maps from Calgary's Open Data portal. These comprehensive maps provided us with the data needed to guide users along safe and accessible routes. By integrating this data into our system, we were able to offer real-time navigation assistance tailored to the needs of visually impaired users, enhancing their mobility and independence.
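
As a rough illustration of the directions side, here is a minimal sketch using the `googlemaps` Python client with a placeholder API key; how each step would be re-ranked against the Open Data layers (accessible signals, crime statistics) is only hinted at in the comments.

```python
# Minimal sketch of requesting an audio-friendly walking route.
# Assumes the `googlemaps` client (pip install googlemaps) and a
# placeholder API key; cross-referencing with Calgary Open Data layers
# is only indicated in comments.
import googlemaps

gmaps = googlemaps.Client(key="YOUR_API_KEY")  # placeholder

def walking_steps(origin: str, destination: str) -> list[str]:
    """Return plain-text step instructions suitable for text-to-speech."""
    routes = gmaps.directions(origin, destination, mode="walking")
    steps = routes[0]["legs"][0]["steps"]
    # Google returns HTML-formatted instructions; a real build would strip
    # the tags and check each step against the Open Data signal map.
    return [s["html_instructions"] for s in steps]

for instruction in walking_steps("Calgary Tower", "Central Library, Calgary"):
    print(instruction)
```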

Webcam Detection (Collision Avoidance) πŸ“Ή

We utilized WebRTC and OpenCV for webcam detection, enabling our device to capture high-quality video feeds directly from a head-mounted webcam. This approach allowed us to process video data in real-time, crucial for the immediate detection of obstacles and navigation cues.
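
A minimal sketch of the capture-and-detect loop is shown below; background subtraction stands in for whatever detection logic the device actually runs, and the camera index and contour-area threshold are illustrative.

```python
# Minimal sketch of the webcam capture-and-detect loop with OpenCV.
# Background subtraction is a stand-in for the project's actual detection;
# the camera index and area threshold are illustrative.
import cv2

cap = cv2.VideoCapture(0)                      # head-mounted webcam
subtractor = cv2.createBackgroundSubtractorMOG2()

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break

    mask = subtractor.apply(frame)
    # Large moving contours are treated as potential obstacles.
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if any(cv2.contourArea(c) > 5000 for c in contours):
        print("Possible obstacle in view")     # would feed the audio alert pipeline

cap.release()
```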


Context Analysis πŸ•΅πŸ»

The core of our assistive device is the object detection feature, powered by Google Gemini's AI detection technology. We chose Gemini for its robustness and efficiency in processing visual data in real-time. Integrating Gemini's AI into a Python application, we created a lightweight yet powerful tool that could run on a Raspberry Pi. This setup allowed us to process live video feeds for object detection, ensuring our device could operate in a wide range of environments.
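
Here is a minimal sketch of asking Gemini to describe a captured frame, assuming the `google-generativeai` Python SDK; the model name, prompt, and image path are illustrative placeholders rather than the project's exact configuration.

```python
# Minimal sketch of asking Gemini to describe a captured frame.
# Assumes the `google-generativeai` SDK (pip install google-generativeai);
# model name, prompt, and image path are placeholders.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")        # placeholder
model = genai.GenerativeModel("gemini-pro-vision")

def describe_scene(image_path: str) -> str:
    """Return a short spoken-style description of the scene in the frame."""
    frame = Image.open(image_path)
    response = model.generate_content(
        ["Describe this street scene briefly for a visually impaired pedestrian, "
         "mentioning traffic signals, crossings, and nearby obstacles.", frame]
    )
    return response.text

print(describe_scene("frame.jpg"))             # hypothetical captured frame
```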


Speech-to-Text and Text-to-Speech Conversion πŸ”Š

To facilitate seamless interaction between the user and the device, we incorporated Google Cloud's Speech-to-Text and Amazon Polly (Boto3) Text-to-Speech APIs. Speech-to-Text conversion enabled users to communicate naturally with the device, asking for information or inputting commands. Conversely, the Text-to-Speech feature allowed our system to provide auditory feedback to the user, describing their surroundings, announcing navigation directions, and alerting them to obstacles.
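
For the text-to-speech half, a minimal sketch using Amazon Polly via boto3 is shown below; it assumes AWS credentials are already configured, and the voice, output format, and sample cue are illustrative choices.

```python
# Minimal sketch of turning a navigation cue into speech with Amazon Polly.
# Assumes boto3 with AWS credentials already configured; voice and output
# format are illustrative choices.
import boto3

polly = boto3.client("polly")

def speak(text: str, out_path: str = "cue.mp3") -> str:
    """Synthesize `text` to an MP3 file and return its path."""
    response = polly.synthesize_speech(
        Text=text,
        OutputFormat="mp3",
        VoiceId="Joanna",
    )
    with open(out_path, "wb") as f:
        f.write(response["AudioStream"].read())
    return out_path

speak("In 20 metres, turn left at the accessible signal.")
```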

Challenges we ran into 😀

Throughout the development process, we encountered several challenges, from integrating diverse technologies to ensuring real-time performance under the constraints of a Raspberry Pi. However, these challenges pushed us to find creative solutions and optimize our system for efficiency and reliability.

Future Vision πŸš€

Our ambition for VisionAID goes beyond the current prototype. We envision integrating this technology into sleek, sunglasses-like wearables, making it both more accessible and affordable. Our ultimate goal is to close the accessibility gap for visually impaired Calgarians, fostering a more inclusive community.

In conclusion, building this project was a comprehensive effort that blended advanced AI, software development, and user-centered design. Our team's dedication to creating an accessible and empowering tool for the visually impaired has laid a strong foundation for future enhancements and broader applications.
