Hongkai Wu

Greater Seattle Area
3K followers 500+ connections

Experience

  • Pilot Studio Inc

    United States

  • -

    Seattle, Washington, United States

  • -

    Kirkland, Washington, United States

  • -

    Greater Seattle Area

  • -

    Greater Seattle Area

  • -

    Washington D.C.

  • -

    Washington D.C. Metro Area

Education

Publications

  • Minerva II: A Novel Entity Discovery Tool

    ACM SIGCHI

    Entity discovery is a long-standing interest of governments, enterprises, and the research community. It is a complex task that requires retrieving, extracting, linking, and displaying entities. Algorithms to support entity discovery have been proposed across disciplines including Information Retrieval (IR), Information Extraction (IE), Natural Language Processing (NLP), and Data Mining (DM). However, there has been little study of User Interface (UI) design for supporting effective entity discovery. This paper presents Minerva II, a novel entity discovery tool, to tackle this challenge. In the paper, we illustrate the UI design and how it effectively supports the typical workflow when a user performs entity discovery. We also describe a new visualization algorithm for entity networks. Our user study shows that Minerva II is able to greatly increase users' efficiency.

  • Modeling Search Engine's Explorations in Dynamic Search: An Ontological Perspective.

    M.S. Thesis. Georgetown University. DC, USA.

    Dynamic search is an information retrieval task in which information systems retrieve documents for a user’s multiple queries. Each query starts a search iteration and aims to fulfill part of the user’s information need. Modeling a search engine’s explorations in dynamic search helps the search engine explore the information space, retrieve relevant documents, and fulfill the user’s information need. Previous work employs topic modeling, such as Latent Dirichlet Allocation (LDA), to fulfill the user’s information need. In each iteration, that approach discovers potential topics of the user’s information need and diversifies the search result by retrieving documents covering these topics. This thesis proposes to structure the user’s information need as an ontology (a topic hierarchy for knowledge representation) and to utilize topic transitions on the ontology to model the search engine’s explorations in dynamic search. The ontology presents a clear landscape for the search engine’s explorations and improves the effectiveness and efficiency of the user’s information seeking. The ontology can be obtained through extra resources, such as Wikipedia, or built on top of topic construction algorithms, such as the nomothetic concept hierarchy construction method. In this thesis, we presume the ontology is presented to the search engine and focus on how the search engine efficiently achieves topic transitions on the ontology. Analogizing the search engine’s explorations on an ontology to a robot’s explorations in a world, we model the search engine’s explorations in dynamic search as a Reinforcement Learning (RL) problem and aim to learn a policy to optimize the topic transitions. We apply Multi-Armed Bandit (MAB) and Partially Observable Markov Decision Process (POMDP) models to learn the search engine’s policy. We evaluate the model using the most recent Text REtrieval Conference Dynamic Domain track (TREC DD 2015) datasets. The results show that our model is highly effective.

Courses

  • Advanced Algorithms

    COSC-540

  • Advanced Database

    COSC-580

  • Computer Hardware & System Architecture

    COSC-520

  • Information Retrieval

    COSC-488

  • Introduction to Data Analytics

    ANLY-501

  • Machine Learning

    COSC-575

  • Research Tutorial

    COSC-901

  • Thesis

    COSC-999

  • Topics in Computer Security

    COSC-730

  • Web Search and Sense Making

    COSC-589

Projects

  • Anicademy

    - Cofounded Anicademy, a 3D AI character platform for manga and fiction creators, focusing on advanced tools like 3D modeling and voice creation.
    - Led the technology strategy and development. Designed the memory system, integrated Text-to-Speech and Speech-to-Text models and used Elasticsearch as the vector database.
    - Achieved an end-to-end communication time of under 1.5 seconds, outperforming 99% of competitors.

  • StockX - Notification Platform

    - Led the Notification Platform project at StockX, guiding a team of 4 engineers and a project manager.
    - Designed and implemented a microservice and event-driven architecture, improving the platform's scalability.
    - Collaborated effectively with over 10 teams, initiating the migration of 20+ notification types and planning for an additional 100+, targeting the capacity to manage 10 million notifications daily.
    - Streamlined development, cutting time for new notification integrations by 66%, from one month to 1.5 weeks.

  • StockX - Product Feeds

    - Headed the Product Feeds project, leading 2 engineers and a product manager to develop real-time product feeds.
    - Achieved seamless integration of StockX's product pricing data with Google Shopping for instant updates, and implemented 1-day delayed pricing feeds to social media platforms like Facebook and Snapchat.
    - Played a pivotal role in driving 38% of total website traffic to StockX using these product feeds.

  • Google Cloud - Cloud SQL Backend Support

    - Spearheaded the design and execution of a Cloud SQL Blackout Window for maintenance activities, ensuring minimal disruption to services.
    - Designed and supported minor-version migrations for Cloud SQL databases. On average, a Cloud SQL instance experiences less than 60 seconds of downtime during migration.

  • Pure Storage - Time-Series Database (TSDB): CaerusDB

    CaerusDB is part of the Pure1 Monetization project and lets paid-tier customers retain metrics data for 3 years. It consists of Kairos serving online requests, a Cassandra cluster for hot data, Kafka plus workers to batch and aggregate data, and S3 as cold storage. CaerusDB handles traffic of 3.6 trillion data points per hour and supports storing up to 10 PB of data.

    - Designed the database schema and the mechanism to split data between hot storage (Cassandra) and cold storage (S3).
    - Designed and implemented the metrics pipeline, using Kafka, to batch metrics received from REST calls and save them in cold storage, and to aggregate metrics based on the requirements in each REST call.

  • Pure Storage - Active Management

    Active management aims to help customers manage their on-premises devices from the cloud. It has 4 components: a workflow engine, a Kafka cluster as the message channel, a security component for authentication and authorization, and an agent on the on-premises device to execute tasks.
    - Designed the active management flow from the cloud to on-premises devices.
    - Designed and implemented a workflow service on top of AWS Step Functions that lets customers actively manage on-premises devices, e.g., restoring a snapshot from NFS to FlashArray.

  • Pure Storage - Snapshot Catalog

    Snapshot Catalog aims to provide customers with a global view of their snapshots to help them manage the data protection module.

    - Designed and built the backend supporting the snapshot catalog in the cloud, accounting for device-local volume snapshots, protection group local/remote snapshots, to-NFS snapshots, and non-Pure snapshots.

  • Pure Storage - Single-Sign-On (SSO)

    Integrated the Pure1 login flow with Okta and Auth0 to provide single sign-on, so that customers can integrate it with their Active Directory Certificate Services and get finer-grained access control.

  • Pure Storage - Backend Support for FlashBlade

Honors & Awards

  • Computer Science Master Student Scholarship

    Department of Computer Science, Georgetown University

Languages

  • English

    Full professional proficiency

  • Chinese

    Native or bilingual proficiency
