More Bang for your Context: Virtual Documents for Question Answering over Long Documents
- Yosi Mass
- Boaz Carmeli
- et al.
- 2024
- EMNLP 2024
I have consistently applied rapid application development resulting in products for commercial sales or to support customer engagements. Customers are the ultimate domain experts and I gravitate to applied research working closely with them for validation. I enjoy acting as a research entrepreneur in an environment that values real-world experience and is aggressive in turning concepts into assets. My research and development has resulted in commercially viable, revenue generating products, and successful customer engagements.
Assignment History:
2022 - Present
watsonx Conversational AI
RAG Data Pipeline
Designed and developed the Retrieval Augmented Generation (RAG) Data Pipeline to accumulate training and test content for LLMs and to feed retrieval-based search that generates document-grounded responses for multi-turn conversations. Services include web content crawling with robots.txt validation and URL filtering; HTML transformation to both Markdown and formatted text; document decomposition into smaller "passages" to address LLM token limitations; document classification statistics; and Markdown table expansion.
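As an illustration of the document decomposition step described above, here is a minimal sketch of splitting a document into overlapping passages. It is not the pipeline's actual implementation: the helper name `split_into_passages`, the `max_tokens` and `overlap` parameters, and the use of whitespace-separated words as a stand-in for LLM tokens are all assumptions made for this example.

```python
from typing import List


def split_into_passages(text: str, max_tokens: int = 200, overlap: int = 20) -> List[str]:
    """Split text into overlapping passages of at most max_tokens words.

    Assumption: whitespace-separated words approximate LLM tokens.
    A real pipeline would count tokens with the target model's tokenizer.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in [0, max_tokens)")

    words = text.split()
    passages: List[str] = []
    step = max_tokens - overlap  # how far the window advances each time
    for start in range(0, len(words), step):
        passages.append(" ".join(words[start:start + max_tokens]))
        # Stop once the current window already reaches the end of the text.
        if start + max_tokens >= len(words):
            break
    return passages
```

The overlap keeps a little shared context between neighboring passages, so a sentence that straddles a boundary still appears whole in at least one of them when the overlap is large enough.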
2018 - 2022
Major Telecom, Banking, Manufacturing
IBM Agent Assist, Data & AI Automation
Commercialized Asset: Real-time automated support agent assistance, monitoring voice and chat conversations to make recommendations for what Agents can say, do, or reference to address clients’ needs. Uses deep learning models based on conversation history to build recommendation models. PII is protected through semantic information masking. Web content is transformed and interpreted to glean procedures, conditionals, and document structure to drive document reference recommendations. Integration with 3rd Party Agent Platforms (LivePerson, Genesys Cloud, NICE).
Technologies: React, Carbon Design, MongoDB, Docker, WebSockets, Stanford NLP, System T, Apache PDFBox, Java EE, Node.js, ES6 JavaScript, HTML5, Selenium, Markdown
Open Source Contributions:
Developed and Open Sourced the following projects currently active and used in IBM products:
2015 – 2018
Major Aviation Manufacturer
Conversation Systems Research
Collaborative Research Engagement: Led all technical aspects of a customer research engagement covering prototype design and development of a real-time, multi-person conversation system for situation awareness, supporting shared context for evolving, asynchronously monitored situations. The work involved real-time coordination and display of equipment health and safety status; weather monitoring and impact; geographic awareness and destination planning; decision support; and assistance with operations procedures. Developed a conversation workflow orchestration platform combining Cloud Functions, Watson Assistant, and multi-application routing to support conversation suspension and resumption, helping clients address emergency diversions and unplanned itinerary changes. Created a graph-based conversation platform to field NLP queries and generate conversation that narrows results, optimized on the evolving situation awareness facet distribution. Integrated speech to text and ML-based conversation.
Technologies: Java / JavaScript, Watson Assistant, Watson Speech to Text / Text to Speech, ANTLR, WebSockets, Blazegraph, Kafka, Zookeeper, Node.js, Web Components, Cloud Functions, REST, Node-RED, Stanford NLP, WebSphere Liberty, DB2
2014 – 2015
Major Financial Institution
Visual Analytics Research
Project Description: Led the Governance, Risk and Compliance Visual Analytics First-of-a-Kind customer engagement. https://www.youtube.com/watch?v=gSlWapg5HcE
Technologies: OpenPages, d3js.org, Web Components, WebSockets, Node.js
2013 - 2014
Major HR Vendor
Best Fit Expertise
Project Description: Led all technical aspects of Best Fit Expertise. Coordinated multi-national efforts to design and develop aspects of expertise tracking at IBM.
Technologies: Smarter Workforce, SmallBlue, Social Q&A, Expediting Expertise, Expertise Locator, and Expertise 360.
2011 - 2013
IBM Cross Software Group Joint Program
Collaborative Decision-Making
Project Description: Chief Architect and Lead Designer for development of a framework for Collaborative Decision Making. The framework allows teams to discuss and debate alternative solutions to complex issues, resulting in sound decision making. It focuses on transparency of decision-making rationale, sentiment capture, and ratings aggregation to promote leading alternatives.
Technologies: IBM Social Networking and Deployment (SaND), DB2, WebSphere, REST, Lucene, UIMA, IBM Extreme Scale
2010 - 2011
Dept. of Energy Greater Philadelphia Innovation Center
Decision Support Data Warehouse
Project Description: Conceived, designed, and developed decision support infrastructure and data warehouse to guide customers through decision trees, soliciting feedback and choices to assist with building component selection for smarter building / energy savings.
Technologies: Decision Trees McMaster University
Energy Modeling/Simulation
i-BEE Research FoaK for Smarter Buildings
Project Description: Conceived, designed, and developed Building Management System data integration framework performing automated content retrieval, transformation, and summarization into a normalized repository to feed energy modeling and simulation analytics.
2009 - 2010
Caterpillar
Condition Monitoring Collaborative Innovation Market Test
Project Description: Chief Architect, Technical Lead, and Knowledge Engineer providing condition monitoring and automated recommendations for heavy equipment in the mining, quarry, and aggregates industries using PAC, Maximo, WebSphere, and DB2. Conceived, designed, and developed the Analytic Application layer to review, correlate, and find patterns in electronic vehicle data, fluids analysis, and inspection data to generate Maximo Work Orders and associated recommended actions for preventative maintenance and improved availability of mission-critical equipment. Developed more than 25,000 rules. Managed technical delivery team from GBS and S&D. Provided Condition Monitoring, Diagnostic / Prognostic Subject Matter Expertise.
2006 - 2009
US Army Heavy Brigade Combat Team
Vehicle Health Management System (VHMS)
Project Description: Chief Architect, Technical Lead for multi-division IBM team providing Condition Based Maintenance subject matter expertise in delivery of Portal, SOA Services, and Vehicle Data Repository. Expanded prior role from CoBRA to collaborate with other Army divisions (ARDEC, TARDEC, TACOM, LOGSA) to integrate the System Integration Lab (SIL) with existing Army programs for Asset Management, Work Order Processing, and Repairs to augment vehicle operational data analysis.
CoBRA: Condition Based Reliability Analysis
Project Description: Chief Architect, Technical Lead for CoBRA, a condition-based maintenance plus initiative for the Heavy Brigade Combat Team (managing tanks, tracked and wheeled fighting vehicles). Using PAC and other IBM assets, conceived, designed, and developed an end-to-end SOA application carrying distributed operational data from vehicles to a vehicle data repository, accessible by Web Services on an Enterprise Service Bus (ESB) driving a Web Portal accessed by the US Army and by 3rd Party analytics and web developers.
2004 - 2006
US Government Agency*
Distillery
Project Description: Team Lead for the User Experience for Investigative Reasoning and Reporting. Designed and developed custom communications infrastructure.
2000 - 2004
Ford, US Army, GM, Daimler Chrysler, International Truck
Automated Analysis Initiative (AAI)
Project Description: Conceived, designed, and developed a multifaceted reasoning framework for complex systems operational analysis of high-volume time series and event data for diagnostics and prognostics. First-of-a-Kind engagements with US Army and International Truck. Less formal engagements with other automotive companies. Work commercialized as Research Asset: Parametric Analysis Center (PAC) 6949-20L in 2005.
1997 - 2000
IBM Asset
Page Detailer
Project Description: Conceived, designed, and developed Page Detailer, a client-based TCP/IP, HTTP/S, and web performance monitor providing decomposed visibility of scheduled communications activities used to interact with servers. Page Detailer shipped with WebSphere Studio and is available for download at https://www.softpedia.com/get/Internet/WEB-Design/Web-Design-related/IBM-Page-Detailer.shtml
Employment History:
12/1997 - To date
IBM, United States of America
Senior Technical Staff Member, T.J. Watson Research Center, Yorktown, NY
10/1988 - 12/1997
Class Objects, United States of America
President
06/1979 - 10/1988
Avant Garde Computing, United States of America
Director Product Development, R&D, Product Manager
Patents
Filed:
Granted
Applications