Publications
Journal Articles
* denotes corresponding author
| J11 | Efficient EMD-based Similarity Search via Batch Pruning and Incremental Computation | Link, Code |
| J10 | Developing Big-Data Application as Queries: an Aggregate-Based approach | Link |
| J9 | Clustering Enhanced Error-tolerant Top-k Spatio-textual Search | Link |
| J8 | Formal Semantics and High Performance in Declarative Machine Learning using Datalog | Link, PDF, Code |
| J7 | Deep Entity Matching: Challenges and Opportunities | Link |
| J6 | Boosting Approximate Dictionary-based Entity Extraction with Synonyms | Link |
| J5 | Spatiotemporal Activity Modeling via Hierarchical Cross-Modal Embedding | Link |
| J4 | Large-scale Frequent Episode Mining from Complex Event Sequences with Hierarchies | Link, PDF |
| J3 | A Transformation-based Framework for KNN Set Similarity Search | Link, PDF (full version), Extended Abstract |
| J2 | Mining Precise-positioning Episode Rules from Event Sequences | Link |
| J1 | A unified framework for string similarity search with edit-distance constraint | Link |
Conference Papers
| C36 | PORCA: Root Cause Analysis with Partially Observed Data | Link |
| C35 | CENTS: A Flexible and Cost-Effective Framework for LLM-Based Table Understanding | Link, Code |
| C34 | LakeVisage: Towards Scalable, Flexible and Interactive Visualization Recommendation for Data Discovery over Data Lakes | Link, PDF (arXiv), Code |
| C33 | Fairness-aware Data Preparation for Entity Matching | Link |
| C32 | Boosting the Adversarial Robustness of Graph Neural Networks: An OOD Perspective | Link |
| C31 | Deep Dirichlet Process Mixture Model for Non-parametric Trajectory Clustering | Link |
| C30 | Watchog: A Light-weight Contrastive Learning based Framework for Column Annotation | Link, Code |
| C29 | Semantics-aware Dataset Discovery from Data Lakes with Contextualized Column-based Representation Learning | Link, PDF (arXiv), Code |
| C28 | Sudowoodo: Contrastive Self-supervised Learning for Multi-purpose Data Integration and Preparation | Link, PDF (arXiv), Code |
| C27 | MSDR: Multi-Step Dependency Relation Networks for Spatial Temporal Forecasting | Link, Code |
| C26 | Optimizing Parallel Recursive Datalog Evaluation on Multicore Machines | Link, PDF, Slides |
| C25 | Highly Efficient String Similarity Search and Join over Compressed Indexes | Link |
| C24 | Machamp: A Generalized Entity Matching Benchmark | Link, PDF (arXiv), Resource, Blog |
| C23 | A Graph-based Approach for Trajectory Similarity Computation in Spatial Networks | Link |
| C22 | Updatable Learned Index with Precise Positions | Link, Code |
| C21 | KDDLog: Performance and Scalability in Knowledge Discovery by Declarative Queries with Aggregates | Link |
| C20 | Revisiting Data Prefetching for Database Systems with Machine Learning Techniques | Link |
| C19 | Discovering Subsequence Patterns for Next POI Recommendation | Link |
| C18 | Fast Error-tolerant Location-aware Query Autocompletion | Link, PDF |
| C17 | BigData Applications from Graph Analytics to Machine Learning by Aggregates in Recursion | Link |
| C16 | Learn Smart with Less: Building Better Online Decision Trees with Fewer Training Examples | Link |
| C15 | Hierarchical Inter-Attention Network for Document Classification with Multi-Task Learning | Link |
| C14 | MF-Join: Efficient Fuzzy String Similarity Join with Multi-level Filtering | Link, PDF |
| C13 | Scalable Metric Similarity Join using MapReduce | Link |
| C12 | A Hierarchical Framework for Top-k Location-aware Error-tolerant Keyword Search | Link |
| C11 | An Efficient Sliding Window Approach for Approximate Entity Extraction with Synonyms | Link, PDF, Slides |
| C10 | Beyond Polarity: Interpretable Financial Sentiment Analysis with Hierarchical Query-driven Attention | Link |
| C9 | Modeling Patient Visit Using Electronic Medical Records for Cost Profile Estimation | Link |
| C8 | Combining Knowledge with Deep Convolutional Neural Networks for Short Text Classification | Link, PDF |
| C7 | An Efficient Framework for Exact Set Similarity Search using Tree Structure Indexes | Link |
| C6 | Mining Precise-positioning Episode Rules from Event Sequences | Link |
| C5 | Two Birds with One Stone: An Efficient Hierarchical Framework for Top-k and Threshold-based String Similarity Search | Link, PDF, Slides, Code, Poster |
| C4 | A Cost-aware Buffer Management Policy for Flash-based Storage Devices | Link |
| C3 | TL: A High Performance Buffer Replacement Strategy for Read-Write Splitting Web Applications | Link |
| C2 | A New Plug-in System Supporting Very Large Digital Library | Link, PDF |
| C1 | pLSM: A Highly Efficient LSM-Tree Index Supporting Real-Time Big Data Analysis | Link, PDF |
Others (Demo, Workshop, etc.)
| O11 | MageSQL: Enhancing In-context Learning for Text-to-SQL Applications with Large Language Models | Link |
| O10 | Demonstration of a Multi-agent Framework for Text to SQL Applications with Large Language Models | Link |
| O9 | Causal Discovery from Temporal Data | Link, Website |
| O8 | Table Discovery in Data Lakes: State-of-the-art and Future Directions | Link, Website |
| O7 | Demonstration of LogicLib: An Expressive Multi-Language Interface over Scalable Datalog System | Link |
| O6 | Machop: an End-to-End Generalized Entity Matching Framework | Link, PDF (arXiv) |
| O5 | Minun: Evaluating Counterfactual Explanations for Entity Matching | Link, PDF, Slides, Code |
| O4 | RaSQL: A Powerful Language and its System for Big Data Applications | Link, PDF |
| O3 | Synergy of Database Techniques and Machine Learning Models for String Similarity Search and Join | Link, Proposal (full version), Website |
| O2 | Distributed Query Engine for Multiple-Query Optimization over Data Stream | Link |
| O1 | Ranking Support for Matched Patterns over Complex Event Streams: the CEP-R System | Link |