Computer Science
New submissions
New submissions for Thu, 28 Mar 24
- [1] arXiv:2403.17938 [pdf, other]
Title: Circuit-centric Genetic Algorithm (CGA) for Analog and Radio-Frequency Circuit Optimization
Comments: 15 pages, 6 figures, submission to Circuits, Systems and Signal Processing
Subjects: Neural and Evolutionary Computing (cs.NE); Systems and Control (eess.SY)
This paper presents an automated method for optimizing parameters in analog/high-frequency circuits, aiming to maximize the performance of a radio-frequency (RF) receiver. The design targets are reduced power consumption and noise figure and increased conversion gain. This study investigates the use of an artificial intelligence algorithm for optimizing a receiver, illustrating how the performance targets can be met with diverse circuit parameters. To overcome issues observed in the traditional Genetic Algorithm (GA), the Circuit-centric Genetic Algorithm (CGA) is proposed as a viable approach. The new method adopts an inference process that is simpler and computationally more efficient than existing deep learning models. In addition, CGA offers significant advantages over both manual design and the conventional GA in finding optimal points, reducing the designer's workload while searching for superior optima.
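For readers unfamiliar with the GA baseline that CGA improves on, the sketch below shows a conventional real-coded GA (elitism, tournament selection, uniform crossover, Gaussian mutation) tuning two circuit parameters. It is not the authors' CGA: the parameter names, bounds and the closed-form `fitness` are hypothetical stand-ins for a circuit simulator that would report conversion gain, power and noise figure.

```python
import math
import random

# Hypothetical design variables: transistor width (um) and bias current (mA).
BOUNDS = [(1.0, 100.0), (0.1, 10.0)]

def fitness(params):
    """Toy stand-in for a simulator run: higher gain is rewarded, while power
    and noise figure are penalised. The formulas are invented for illustration."""
    width, bias = params
    gain = 20.0 * math.log10(1.0 + 0.1 * width * bias)  # grows with width and bias
    power = 1.2 * bias                                  # assumed 1.2 V supply
    noise = 3.0 / (1.0 + 0.05 * width)                  # shrinks with width
    return gain - power - noise

def random_individual(rng):
    return [rng.uniform(lo, hi) for lo, hi in BOUNDS]

def crossover(a, b, rng):
    # Uniform crossover: each gene is copied from one of the two parents.
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]

def mutate(ind, rng, rate=0.2):
    out = []
    for x, (lo, hi) in zip(ind, BOUNDS):
        if rng.random() < rate:
            x += rng.gauss(0.0, 0.1 * (hi - lo))  # Gaussian perturbation
        out.append(min(max(x, lo), hi))           # clip back into bounds
    return out

def genetic_algorithm(generations=50, pop_size=30, elite=2, seed=0):
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        next_pop = pop[:elite]  # elitism: the best designs survive unchanged
        while len(next_pop) < pop_size:
            # Tournament selection: best of three random individuals.
            p1 = max(rng.sample(pop, 3), key=fitness)
            p2 = max(rng.sample(pop, 3), key=fitness)
            next_pop.append(mutate(crossover(p1, p2, rng), rng))
        pop = next_pop
    return max(pop, key=fitness)

best = genetic_algorithm()
print(best, fitness(best))
```

Because of elitism, the best fitness never decreases from one generation to the next; CGA's contribution lies in changing this generic loop to suit circuit optimization.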
- [2] arXiv:2403.17939 [pdf, ps, other]
Title: Matrix Domination: Convergence of a Genetic Algorithm Metaheuristic with the Wisdom of Crowds to Solve the NP-Complete Problem
Authors: Shane Storm Strachan
Comments: 8 pages
Subjects: Neural and Evolutionary Computing (cs.NE)
This research explores the application of a genetic algorithm metaheuristic enriched by the wisdom of crowds to the NP-complete matrix domination problem (henceforth MDP), which is itself a constrained variant of related domination problems on graphs. Matrix domination involves placing a subset of cells, referred to as dominators, within a matrix so that they dominate the remaining cells. This research combines the exploratory nature of a genetic algorithm with the wisdom of crowds to find better solutions, using user-defined parameters to respect computational complexity considerations. Performance is gauged mainly with a fitness evaluation function, together with a constraining function that counters the stochastic nature of genetic algorithms. With this, I propose a novel approach to MDP with a genetic algorithm that incorporates the wisdom of crowds, emphasizing collective decision-making in the selection process, and explore matrix permutations and their relevance in finding optimal solutions. Results demonstrate the potential of this combination to generate efficient solutions, optimizing the trade-off between the number of dominators and their strategic placement within the matrices while consistently ensuring complete matrix domination.
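Assuming the classic formulation of matrix domination (a chosen set of nonzero cells dominates every nonzero cell that shares a row or a column with one of them), the sketch below shows a feasibility check and a penalty-based fitness of the kind a GA could use. The penalty weight and the function names are illustrative assumptions, not taken from the paper.

```python
def nonzero_cells(matrix):
    return {(r, c) for r, row in enumerate(matrix) for c, v in enumerate(row) if v}

def dominated_cells(matrix, dominators):
    """Nonzero cells sharing a row or a column with at least one dominator."""
    rows = {r for r, _ in dominators}
    cols = {c for _, c in dominators}
    return {(r, c) for r, c in nonzero_cells(matrix) if r in rows or c in cols}

def is_dominating(matrix, dominators):
    # Dominators must themselves be nonzero cells and must cover every nonzero cell.
    cells = nonzero_cells(matrix)
    return set(dominators) <= cells and dominated_cells(matrix, dominators) == cells

def fitness(matrix, dominators, penalty=10):
    """Hypothetical GA fitness (higher is better): fewer dominators are preferred,
    and every undominated nonzero cell costs `penalty`."""
    undominated = len(nonzero_cells(matrix) - dominated_cells(matrix, dominators))
    return -(len(set(dominators)) + penalty * undominated)

M = [[1, 0, 1],
     [0, 1, 0],
     [1, 0, 0]]
print(is_dominating(M, [(0, 0), (1, 1)]), fitness(M, [(0, 0)]))  # True -11
```

The penalty turns infeasible individuals into low-fitness ones instead of discarding them, which keeps the search space connected for crossover and mutation.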
- [3] arXiv:2403.17940 [pdf, ps, other]
Title: Navigating the Docker Ecosystem: A Comprehensive Taxonomy and Survey
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
The cloud computing landscape is rapidly expanding and growing in complexity. Cloud computing has emerged as a widely adopted model for efficiently processing large volumes of data by harnessing clusters of commodity computers. This evolution enables the handling of massive data through on-demand services that rely on numerous microservices with diverse dependencies. Container technology provides secure storage and enables large-scale data processing with high scalability and portability. Containers, exemplified over the last decade particularly by Docker, play a pivotal role in this scenario: they let microservices process data swiftly and let developers scale these services dynamically in real time. This paper begins by establishing a comprehensive taxonomy for delineating container architecture. Focusing specifically on Docker containers, we scrutinize the existing container-related literature. Through this taxonomy and survey, we not only discern similarities and differences among the architectural approaches of Docker container technology but also pinpoint areas requiring further research.
- [4] arXiv:2403.17942 [pdf, other]
Title: A Note On Lookahead In Real Life And Computing
Subjects: Data Structures and Algorithms (cs.DS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Past, present and future are three temporal and logical concepts that human beings have defined well for their existence and growth. As human beings, we have the privilege of using our intelligence to mentally execute an activity before it physically occurs in the real world. Knowledge of the past, aplomb in the present and vision for the future correspond to three concepts, namely look-back, look-at and look-ahead, respectively, both in real life and in diverse domains of computing. Look-ahead (LA) deals with predicting future information and processing input to produce output in advance. In this article, our main objective is to learn, understand and explore the concept of LA and to design novel models as solutions for real-world problems. We present three well-known algorithmic frameworks used in practice, distinguished by the availability of input information: offline, online and semi-online. We introduce interesting real-life applications and well-known computing problems where LA plays a significant role in making a process, system or algorithm efficient. We define new types of LA and, based on a literature review, propose a taxonomy of LA for designing novel LA models in the future. Using the concept of LA, we identify and present many interesting and non-trivial research challenges as potential future research directions. Intuitively, we observe that LA can serve future researchers as a powerful tool and framework for designing efficient computational models and algorithms for challenging optimization problems.
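Paging is a classic computing problem in which look-ahead matters. As an illustration (our choice of example, not one taken from the note), the sketch contrasts online LRU, which sees only past requests, with Belady's offline rule, which looks ahead at the entire future request sequence and evicts the page needed farthest in the future. A semi-online policy would sit in between, seeing only a bounded window of future requests.

```python
def lru_faults(requests, capacity):
    """Online policy with no look-ahead: evict the least recently used page."""
    cache, faults = [], 0  # most recently used page kept at the end
    for page in requests:
        if page in cache:
            cache.remove(page)
        else:
            faults += 1
            if len(cache) == capacity:
                cache.pop(0)
        cache.append(page)
    return faults

def belady_faults(requests, capacity):
    """Offline policy with full look-ahead: evict the page used farthest in the future."""
    cache, faults = set(), 0
    for i, page in enumerate(requests):
        if page in cache:
            continue
        faults += 1
        if len(cache) == capacity:
            future = requests[i + 1:]
            # Pages never requested again count as "farthest away".
            victim = max(cache, key=lambda p: future.index(p) if p in future else len(future))
            cache.remove(victim)
        cache.add(page)
    return faults

reqs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(lru_faults(reqs, 3), belady_faults(reqs, 3))  # 10 7
```

On this request sequence, full look-ahead saves three page faults over LRU, a small instance of the efficiency gains the note attributes to LA.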
- [5] arXiv:2403.17954 [pdf, other]
Title: Sort & Slice: A Simple and Superior Alternative to Hash-Based Folding for Extended-Connectivity Fingerprints
Comments: Submitted to Journal of Cheminformatics
Subjects: Machine Learning (cs.LG); Chemical Physics (physics.chem-ph); Biomolecules (q-bio.BM)
Extended-connectivity fingerprints (ECFPs) are a ubiquitous tool in current cheminformatics and molecular machine learning, and one of the most prevalent molecular feature extraction techniques used for chemical prediction. Atom features learned by graph neural networks can be aggregated to compound-level representations using a large spectrum of graph pooling methods; in contrast, sets of detected ECFP substructures are by default transformed into bit vectors using only a simple hash-based folding procedure. We introduce a general mathematical framework for the vectorisation of structural fingerprints via a formal operation called substructure pooling that encompasses hash-based folding, algorithmic substructure-selection, and a wide variety of other potential techniques. We go on to describe Sort & Slice, an easy-to-implement and bit-collision-free alternative to hash-based folding for the pooling of ECFP substructures. Sort & Slice first sorts ECFP substructures according to their relative prevalence in a given set of training compounds and then slices away all but the $L$ most frequent substructures which are subsequently used to generate a binary fingerprint of desired length, $L$. We computationally compare the performance of hash-based folding, Sort & Slice, and two advanced supervised substructure-selection schemes (filtering and mutual-information maximisation) for ECFP-based molecular property prediction. Our results indicate that, despite its technical simplicity, Sort & Slice robustly (and at times substantially) outperforms traditional hash-based folding as well as the other investigated methods across prediction tasks, data splitting techniques, machine-learning models and ECFP hyperparameters. We thus recommend that Sort & Slice canonically replace hash-based folding as the default substructure-pooling technique to vectorise ECFPs for supervised molecular machine learning.
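A minimal sketch of the Sort & Slice steps described above, assuming each compound has already been reduced to a set of integer ECFP substructure identifiers (for example, by a cheminformatics toolkit; that step is not shown). Prevalence is counted as the number of training compounds containing a substructure, and breaking frequency ties by identifier is our assumption.

```python
from collections import Counter

def fit_sort_and_slice(training_sets, L):
    """Map each of the L most prevalent substructures to a fixed bit position."""
    # Count in how many training compounds each substructure identifier occurs.
    counts = Counter(s for subs in training_sets for s in set(subs))
    # Sort by prevalence (descending); ties broken by identifier (assumption).
    ranked = sorted(counts, key=lambda s: (-counts[s], s))
    return {s: i for i, s in enumerate(ranked[:L])}  # slice: keep the top L

def vectorise(substructures, index, L):
    """Binary fingerprint of length L; substructures outside the top L are dropped."""
    bits = [0] * L
    for s in substructures:
        if s in index:
            bits[index[s]] = 1
    return bits

train = [{10, 20, 30}, {10, 20}, {10, 40}]
index = fit_sort_and_slice(train, 3)
print(vectorise({20, 40, 99}, index, 3))  # [0, 1, 0]
```

Unlike hash-based folding, each bit corresponds to exactly one substructure, so no two substructures can collide in the same position.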
- [6] arXiv:2403.17958 [pdf, other]
Title: Deep Generative Domain Adaptation with Temporal Attention for Cross-User Activity Recognition
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
In Human Activity Recognition (HAR), a predominant assumption is that the data used for training and evaluation are drawn from the same distribution. It is also assumed that all data samples are independent and identically distributed (i.i.d.). In practice, however, these assumptions are often violated and the data distributions differ, especially in scenarios such as cross-user HAR. Domain adaptation is a promising approach to address these challenges in cross-user HAR tasks. However, a clear gap in existing domain adaptation techniques is that they neglect the temporal relations embedded in time-series data when aligning data distributions. Addressing this oversight, our research presents the Deep Generative Domain Adaptation with Temporal Attention (DGDATA) method. This novel method uniquely recognises and integrates temporal relations during the domain adaptation process. By synergizing the capabilities of generative models with the Temporal Relation Attention mechanism, our method improves classification performance in cross-user HAR. A comprehensive evaluation on three public sensor-based HAR datasets targeting different scenarios and applications demonstrates the efficacy of the proposed DGDATA method.
- [7] arXiv:2403.17963 [pdf, other]
Title: A better compression driver? CutFEM 3D shape optimization taking viscothermal losses into account
Subjects: Numerical Analysis (math.NA); Optimization and Control (math.OC)
The compression driver, the standard sound source for midrange acoustic horns, contains a cylindrical compression chamber connected to the horn throat through a system of channels known as a phase plug. The main challenge in the design of the phase plug is to avoid resonance and interference phenomena. The complexity of these phenomena makes it difficult to carry out this design task manually, particularly when the phase-plug channels are radially oriented. Therefore, we employ an algorithmic technique that combines numerical solutions of the governing equations with a gradient-based optimization algorithm that can deform the walls of the phase plug. A particular modeling challenge here is that viscothermal losses cannot be ignored, owing to the narrow chambers and slits in the device. Fortunately, a recently developed boundary-layer model that is accurate yet computationally inexpensive is applicable. We use this model, a level-set geometry description, and the Cut Finite Element technique to avoid mesh changes when the geometry is modified by the optimization algorithm. Moreover, the shape calculus needed to compute derivatives for the optimization algorithm is carried out in the fully discrete case. Applying these techniques, the algorithm successfully designed the shape of a set of radially directed phase plugs so that the final frequency response matches surprisingly closely the ideal response derived from a lumped circuit model that does not account for wave interference effects. This result may serve to resuscitate the radial phase plug design, rarely used in today's commercial compression drivers.
- [8] arXiv:2403.17969 [pdf, other]
Title: Antimagic Labeling of Graphs Using Prime Numbers
Comments: 11 pages, 15 figures
Subjects: Discrete Mathematics (cs.DM); Combinatorics (math.CO)
Graph labeling is a technique that assigns unique labels or weights to the vertices or edges of a graph, and it is often used to analyze and solve various graph-related problems. Researchers have previously proposed a few methods on this topic, each with certain limitations. This paper focuses on antimagic labeling of different types of graphs and trees. It entails assigning distinct prime values to the edges so that the sums of the edge labels at the vertices are pairwise distinct. The paper proposes a conjecture on the antimagic labeling of any graph and proves two theorems. We first tried assigning weights to the edges randomly; because this approach ran into exceptions in particular phases, we developed an entirely new approach to mitigate the problem. The paper provides computational and mathematical verification that antimagic labeling of any perfect binary tree and any complete graph is possible.
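The antimagic condition itself is easy to check mechanically. The sketch below verifies it and finds a labeling with the first |E| primes by brute force over permutations, which is feasible only for tiny graphs; it is an illustration of the definition, not the construction or proofs from the paper.

```python
from itertools import permutations

def first_primes(k):
    """Return the first k primes by trial division."""
    primes, n = [], 2
    while len(primes) < k:
        if all(n % p for p in primes if p * p <= n):
            primes.append(n)
        n += 1
    return primes

def vertex_sums(edges, labels):
    sums = {}
    for (u, v), w in zip(edges, labels):
        sums[u] = sums.get(u, 0) + w
        sums[v] = sums.get(v, 0) + w
    return sums

def is_antimagic(edges, labels):
    # Edge labels must be distinct, and so must the vertex sums.
    sums = vertex_sums(edges, labels)
    return len(set(labels)) == len(labels) and len(set(sums.values())) == len(sums)

def find_prime_antimagic(edges):
    """Brute force over assignments of the first |E| primes (tiny graphs only)."""
    for perm in permutations(first_primes(len(edges))):
        if is_antimagic(edges, perm):
            return dict(zip(edges, perm))
    return None

k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
labeling = find_prime_antimagic(k4)
print(labeling, vertex_sums(k4, list(labeling.values())))
```

For the complete graph K4, the primes 2, 3, 5, 7, 11, 13 taken in edge order already give distinct vertex sums 10, 20, 23 and 29.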
- [9] arXiv:2403.17978 [pdf, other]
Title: Holographic Global Convolutional Networks for Long-Range Prediction Tasks in Malware Detection
Comments: To appear in Proceedings of the 27th International Conference on Artificial Intelligence and Statistics (AISTATS) 2024, Valencia, Spain
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Machine Learning (stat.ML)
Malware detection is an interesting and valuable domain to work in because it has significant real-world impact and poses unique machine-learning challenges. We investigate existing long-range techniques and benchmarks and find that they are not well suited to this problem area. In this paper, we introduce Holographic Global Convolutional Networks (HGConv), which use the properties of Holographic Reduced Representations (HRR) to encode and decode features from sequence elements. Unlike other global convolutional methods, our method does not require any intricate kernel computation or crafted kernel design: HGConv kernels are defined as simple parameters learned through backpropagation. The proposed method achieves new state-of-the-art (SOTA) results on the Microsoft Malware Classification Challenge, Drebin, and EMBER malware benchmarks. HGConv has log-linear complexity in sequence length, and the empirical results show substantially faster run-time than other methods, with far more efficient scaling even for sequence lengths $\geq 100,000$.
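HGConv builds on HRR binding. The sketch below shows the two HRR primitives in plain Python: binding by circular convolution, and approximate unbinding with the involution (equivalently, circular correlation). It is not the HGConv layer itself; the vector size and seed are arbitrary, and the O(n^2) loop stands in for the FFT-based O(n log n) computation that makes such methods log-linear.

```python
import math
import random

def bind(a, b):
    """Circular convolution (HRR binding); O(n^2) here, O(n log n) with an FFT."""
    n = len(a)
    return [sum(a[j] * b[(k - j) % n] for j in range(n)) for k in range(n)]

def approx_inverse(a):
    """Involution a*[i] = a[-i mod n]; binding with it performs circular correlation."""
    n = len(a)
    return [a[-i % n] for i in range(n)]

def random_vector(n, rng):
    """Elements drawn i.i.d. from N(0, 1/n), the usual HRR initialisation."""
    return [rng.gauss(0.0, 1.0 / math.sqrt(n)) for _ in range(n)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

rng = random.Random(0)
n = 256
key, value, other = (random_vector(n, rng) for _ in range(3))
trace = bind(key, value)                      # store the key-value pair
recovered = bind(trace, approx_inverse(key))  # query the trace with the key
print(cosine(recovered, value), cosine(recovered, other))
```

The recovered vector is a noisy copy of `value`, so its cosine with `value` should be clearly larger than with an unrelated random vector; this noisy but cheap encode/decode property is what HRR-based layers exploit.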
- [10] arXiv:2403.17980 [pdf, other]
Title: EG-ConMix: An Intrusion Detection Method based on Graph Contrastive Learning
Authors: Lijin Wu, Shanshan Lei, Feilong Liao, Yuanjun Zheng, Yuxin Liu, Wentao Fu, Hao Song, Jiajun Zhou
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
As the number of IoT devices increases, security concerns become more prominent. The impact of threats can be minimized by deploying a Network Intrusion Detection System (NIDS) that monitors network traffic, detects and discovers intrusions, and issues security alerts promptly. Most intrusion detection research in recent years has focused on the traffic itself without considering the interrelationships among traffic flows, which limits the monitoring of complex IoT network attack events. In addition, anomalous traffic accounts for only a small fraction of traffic in real networks, which leads to a severe class imbalance in the dataset that makes learning and prediction extremely difficult. In this paper, we propose EG-ConMix, a method based on E-GraphSAGE that incorporates a data augmentation module to address the data imbalance problem. We also incorporate contrastive learning to discern the differences between normal and malicious traffic samples, facilitating the extraction of key features. Extensive experiments on two publicly available datasets demonstrate that EG-ConMix achieves superior intrusion detection performance compared to state-of-the-art methods. Notably, it exhibits significant advantages in training speed and accuracy on large-scale graphs.
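The abstract does not spell out the contrastive objective, so the sketch below uses the generic InfoNCE loss as an assumed example: an anchor embedding (for instance, of a traffic flow in the graph) should be closer to its positive view than to negatives. The embeddings and the temperature are hypothetical.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def info_nce(anchor, positive, negatives, temperature=0.5):
    """-log( exp(s+/T) / (exp(s+/T) + sum_k exp(s_k/T)) ) with cosine similarities s."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    m = max(logits)  # subtract the max for numerical stability (log-sum-exp)
    log_denominator = m + math.log(sum(math.exp(logit - m) for logit in logits))
    return log_denominator - logits[0]

normal_a, normal_b, malicious = [1.0, 0.1], [0.9, 0.2], [-0.8, 0.6]
print(info_nce(normal_a, normal_b, [malicious]))  # small: the positive is close
print(info_nce(normal_a, malicious, [normal_b]))  # large: the "positive" is far
```

Minimizing this loss pulls embeddings of samples from the same class together and pushes normal and malicious samples apart, which is the effect the abstract describes.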
- [11] arXiv:2403.17983 [pdf, other]
Title: Is Watermarking LLM-Generated Code Robust?
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
We present the first study of the robustness of existing watermarking techniques on Python code generated by large language models. Although existing works showed that watermarking can be robust for natural language, we show that it is easy to remove these watermarks on code by semantic-preserving transformations.
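Variable renaming is one simple example of a semantic-preserving transformation. The sketch below (Python 3.9+ for `ast.unparse`) renames function arguments and assigned names using the standard-library `ast` module; a token-level watermark carried by the original identifiers would not survive it. It is an illustration, not necessarily one of the transformations used in the study, and it ignores corner cases such as globals, closures and name clashes.

```python
import ast

class RenameLocals(ast.NodeTransformer):
    """Rename function arguments and assigned names to v0, v1, ...
    Simplification: assumes no clashes with globals or other scopes."""

    def __init__(self):
        self.mapping = {}

    def _fresh(self, old):
        if old not in self.mapping:
            self.mapping[old] = f"v{len(self.mapping)}"
        return self.mapping[old]

    def visit_arg(self, node):
        node.arg = self._fresh(node.arg)
        return node

    def visit_Name(self, node):
        if isinstance(node.ctx, ast.Store):
            node.id = self._fresh(node.id)
        elif node.id in self.mapping:  # loads of builtins/globals are left alone
            node.id = self.mapping[node.id]
        return node

source = """
def add(a, b):
    total = a + b
    return total
"""
renamed = ast.unparse(RenameLocals().visit(ast.parse(source)))
print(renamed)
```

The renamed function behaves identically, yet its token sequence differs from what the model generated, which is why watermarks tied to specific tokens are fragile on code.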
- [12] arXiv:2403.17993 [pdf, other]
Title: Mixing Artificial and Natural Intelligence: From Statistical Mechanics to AI and Back to Turbulence
Comments: 35 pages, 9 figures
Subjects: Machine Learning (cs.LG); Statistical Mechanics (cond-mat.stat-mech); Artificial Intelligence (cs.AI); Fluid Dynamics (physics.flu-dyn)
The paper reflects on the future role of AI in scientific research, with a special focus on turbulence studies, and examines the evolution of AI, particularly through Diffusion Models rooted in non-equilibrium statistical mechanics. It underscores the significant impact of AI on advancing reduced, Lagrangian models of turbulence through innovative use of deep neural networks. Additionally, the paper reviews various other AI applications in turbulence research and outlines potential challenges and opportunities in the concurrent advancement of AI and statistical hydrodynamics. This discussion sets the stage for a future where AI and turbulence research are intricately intertwined, leading to more profound insights and advancements in both fields.
- [13] arXiv:2403.17994 [pdf, other]
Title: Solution for Point Tracking Task of ICCV 1st Perception Test Challenge 2023
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
This report proposes an improved method for the Tracking Any Point (TAP) task, which tracks any point on a physical surface through a video. Several existing approaches have explored TAP by considering temporal relationships to obtain smooth point motion trajectories; however, they still suffer from the cumulative error caused by temporal prediction. To address this issue, we propose a simple yet effective approach called TAP with confident static points (TAPIR+), which focuses on rectifying the tracking of static points in videos shot by a static camera. Specifically, our approach contains two key components: (1) Multi-granularity Camera Motion Detection, which identifies video sequences shot by a static camera; and (2) CMR-based point trajectory prediction, combined with a moving-object segmentation approach that isolates static points from moving objects. Our approach ranked first in the final test with a score of 0.46.
- [14] arXiv:2403.17995 [pdf, other]
Title: Semi-Supervised Image Captioning Considering Wasserstein Graph Matching
Authors: Yang Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Image captioning automatically generates captions for given images, and the key challenge is to learn a mapping function from visual features to natural language features. Most existing approaches are supervised, i.e., each image has a corresponding sentence in the training set. However, because describing images requires a great deal of manual effort, real-world applications usually have a limited number of described images (i.e., image-text pairs) and a large number of undescribed images. This gives rise to the problem of semi-supervised image captioning. To solve it, we propose a novel Semi-Supervised Image Captioning method considering Wasserstein Graph Matching (SSIC-WGM), which uses the raw image inputs to supervise the generated sentences. Unlike traditional single-modal semi-supervised methods, semi-supervised cross-modal learning is difficult because intermediate, comparable information must be constructed across heterogeneous modalities. SSIC-WGM adopts scene graphs, which have proven successful, as this intermediate information and constrains the generated sentences in two ways: 1) inter-modal consistency: SSIC-WGM constructs scene graphs for the raw image and for the generated sentence, then uses the Wasserstein distance to measure the similarity between the region embeddings of the two graphs; 2) intra-modal consistency: SSIC-WGM applies data augmentation to the raw images, then constrains the consistency between the augmented images and the generated sentences. In this way, SSIC-WGM combines cross-modal pseudo-supervision with a structure-invariant measure to use the undescribed images efficiently, and learns a more reasonable mapping function.
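To make the inter-modal distance concrete: when two scene graphs contribute the same number of region embeddings with uniform weights, an optimal transport plan can be taken to be a permutation, so the Wasserstein distance reduces to a minimum-cost matching. The brute-force sketch below relies on that fact and is only feasible for tiny sets; the 2-D "embeddings" are hypothetical, and a practical system would use a scalable OT solver instead.

```python
import math
from itertools import permutations

def wasserstein_uniform(xs, ys, p=1):
    """Exact p-Wasserstein distance between two equal-size point sets with uniform
    weights, found by searching all matchings (O(n!), tiny sets only)."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal-size sets")
    n = len(xs)
    best = min(
        sum(math.dist(xs[i], ys[j]) ** p for i, j in enumerate(perm))
        for perm in permutations(range(n))
    )
    return (best / n) ** (1.0 / p)

# Hypothetical 2-D region embeddings of an image graph and a sentence graph.
image_regions = [(0.0, 0.0), (1.0, 0.0)]
sentence_regions = [(0.0, 1.0), (1.0, 1.0)]
print(wasserstein_uniform(image_regions, sentence_regions))  # 1.0
```

Because the distance matches regions before comparing them, it is insensitive to the order in which regions are listed in either graph.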
- [15] arXiv:2403.17998 [pdf, other]
Title: Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval
Authors: Jiamian Wang, Guohao Sun, Pichao Wang, Dongfang Liu, Sohail Dianat, Majid Rabbani, Raghuveer Rao, Zhiqiang Tao
Comments: Accepted by CVPR 2024, code and model are available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The increasing prevalence of video clips has sparked growing interest in text-video retrieval. Recent advances focus on establishing a joint embedding space for text and video, relying on consistent embedding representations to compute similarity. However, the text in existing datasets is generally short and concise, making it hard to fully describe the redundant semantics of a video. Consequently, a single text embedding may be insufficiently expressive to capture the video embedding and support retrieval. In this study, we propose T-MASS, a new stochastic text modeling method in which text is modeled as a stochastic embedding, enriching the text embedding with a flexible and resilient semantic range that yields a text mass. Specifically, we introduce a similarity-aware radius module to adapt the scale of the text mass to the given text-video pairs. In addition, we design and develop a support text regularization to further control the text mass during training. The inference pipeline is also tailored to fully exploit the text mass for accurate retrieval. Empirical evidence suggests that T-MASS not only effectively attracts relevant text-video pairs while distancing irrelevant ones, but also enables the determination of precise text embeddings for relevant pairs. Our experimental results show a substantial improvement of T-MASS over the baseline (3% to 6.3% in R@1). T-MASS also achieves state-of-the-art performance on five benchmark datasets: MSRVTT, LSMDC, DiDeMo, VATEX, and Charades.
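A rough sketch of the text-mass idea: a stochastic text embedding is drawn around the deterministic one, $t_s = t + R \odot \epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$, and several samples are scored against the video. Here the radius `R` is simply given (in T-MASS it comes from the similarity-aware radius module, which is not reproduced), and keeping the best-scoring sample is an assumption about how retrieval could exploit the mass.

```python
import math
import random

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def sample_text_mass(text_emb, radius, rng):
    """One stochastic text embedding t_s = t + R * eps, eps ~ N(0, I), elementwise."""
    return [t + r * rng.gauss(0.0, 1.0) for t, r in zip(text_emb, radius)]

def best_sample_similarity(text_emb, radius, video_emb, num_samples=20, seed=0):
    """Score a text-video pair by the best of several samples from the text mass
    (an assumed inference rule for this sketch)."""
    rng = random.Random(seed)
    samples = [sample_text_mass(text_emb, radius, rng) for _ in range(num_samples)]
    return max(cosine(s, video_emb) for s in samples)

text, video = [1.0, 0.0, 0.5], [0.8, 0.3, 0.4]
print(cosine(text, video), best_sample_similarity(text, [0.1, 0.1, 0.1], video))
```

With a zero radius the mass collapses to the single deterministic embedding; a larger radius lets a short caption cover more of a video's semantics, at the cost of a less precise point estimate.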
- [16] arXiv:2403.18015 [pdf, other]
Title: A Constructive Method for Designing Safe Multirate Controllers for Differentially-Flat Systems
Authors: Devansh R. Agrawal, Hardik Parwana, Ryan K. Cosner, Ugo Rosolia, Aaron D. Ames, Dimitra Panagou
Comments: 6 pages, 3 figures, accepted at IEEE Control Systems Letters 2021
Journal-ref: IEEE Control Systems Letters, Vol 6, pp. 2138-2143, 2021
Subjects: Systems and Control (eess.SY); Robotics (cs.RO)
We present a multi-rate control architecture that leverages fundamental properties of differential flatness to synthesize controllers for safety-critical nonlinear dynamical systems. We propose a two-layer architecture in which the high level generates reference trajectories using a linear Model Predictive Controller (MPC) and the low level tracks this reference using a feedback controller. The novelty lies in how we couple these layers to achieve formal guarantees on the recursive feasibility of the MPC problem and on the safety of the nonlinear system. Furthermore, using differential flatness, we provide a constructive means to synthesize the multi-rate controller, removing the need to search for suitable Lyapunov or barrier functions or to approximately linearize or discretize the nonlinear dynamics. We show that the synthesized controller solves a convex optimization problem, making it amenable to real-time implementation. The method is demonstrated experimentally on a ground rover and a quadruped robot.
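To make differential flatness concrete, the sketch uses the unicycle, a textbook flat system chosen here for illustration (the paper's platforms are a rover and a quadruped). With flat outputs $(x, y)$, the heading and inputs follow algebraically from derivatives of the output trajectory: $\theta = \operatorname{atan2}(\dot y, \dot x)$, $v = \sqrt{\dot x^2 + \dot y^2}$ and $\omega = (\dot x \ddot y - \dot y \ddot x)/(\dot x^2 + \dot y^2)$. A reference planned for $(x, y)$ therefore yields feedforward inputs without linearizing the dynamics.

```python
import math

def unicycle_from_flat(xd, yd, xdd, ydd):
    """Recover heading theta, speed v and turn rate omega of the unicycle
    x' = v cos(theta), y' = v sin(theta), theta' = omega
    from first and second derivatives of the flat outputs (x, y)."""
    speed_sq = xd * xd + yd * yd
    if speed_sq == 0.0:
        raise ValueError("heading is undefined at zero speed (flatness singularity)")
    theta = math.atan2(yd, xd)
    v = math.sqrt(speed_sq)
    omega = (xd * ydd - yd * xdd) / speed_sq
    return theta, v, omega

# Reference: unit circle x = cos t, y = sin t, evaluated at t = 0.3.
t = 0.3
theta, v, omega = unicycle_from_flat(-math.sin(t), math.cos(t), -math.cos(t), -math.sin(t))
print(theta, v, omega)
```

For the unit circle traversed at unit speed, the recovered inputs are a constant speed and turn rate of 1, with the heading tangent to the circle ($\theta = t + \pi/2$).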
- [17] arXiv:2403.18017 [pdf, ps, other]
Title: Nonsingularity of unsymmetric Kansa matrices: random collocation by MultiQuadrics and Inverse MultiQuadrics
Subjects: Numerical Analysis (math.NA)
Unisolvence of unsymmetric Kansa collocation is still a substantially open problem. We prove that Kansa matrices with MultiQuadrics and Inverse MultiQuadrics for the Dirichlet problem of the Poisson equation are almost surely nonsingular, when the collocation points are chosen by any continuous random distribution in the domain interior and arbitrarily on its boundary.
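The sketch below assembles an unsymmetric Kansa matrix for the 2D Poisson equation with Dirichlet conditions on the unit square, using MultiQuadrics $\phi(r) = \sqrt{1 + (\varepsilon r)^2}$. Rows for interior points apply the Laplacian, which in 2D is $\Delta\phi = \varepsilon^2 (2 + \varepsilon^2 r^2)/(1 + \varepsilon^2 r^2)^{3/2}$, and rows for boundary points apply $\phi$ itself. Using the collocation points as centers, the shape parameter $\varepsilon = 1$ and the point counts are assumptions; a nonzero determinant for one random draw illustrates, but of course does not prove, the almost-sure nonsingularity result.

```python
import math
import random

EPS = 1.0  # MultiQuadric shape parameter (assumed value)

def mq(r):
    return math.sqrt(1.0 + (EPS * r) ** 2)

def mq_laplacian_2d(r):
    # 2-D Laplacian of sqrt(1 + (EPS r)^2): EPS^2 (2 + EPS^2 r^2) / (1 + EPS^2 r^2)^(3/2)
    s = 1.0 + (EPS * r) ** 2
    return EPS ** 2 * (1.0 + s) / s ** 1.5

def kansa_matrix(interior, boundary):
    """Unsymmetric Kansa matrix; the centers are taken to be the collocation points."""
    centers = interior + boundary
    rows = [[mq_laplacian_2d(math.dist(p, c)) for c in centers] for p in interior]
    rows += [[mq(math.dist(p, c)) for c in centers] for p in boundary]
    return rows

def determinant(a):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in a]
    n, det = len(a), 1.0
    for k in range(n):
        pivot = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[pivot][k] == 0.0:
            return 0.0
        if pivot != k:
            a[k], a[pivot] = a[pivot], a[k]
            det = -det
        det *= a[k][k]
        for i in range(k + 1, n):
            f = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= f * a[k][j]
    return det

rng = random.Random(1)
interior = [(rng.random(), rng.random()) for _ in range(8)]  # random interior points
side = (0.25, 0.75)
boundary = ([(t, 0.0) for t in side] + [(t, 1.0) for t in side]
            + [(0.0, t) for t in side] + [(1.0, t) for t in side])
A = kansa_matrix(interior, boundary)
print(len(A), determinant(A))
```

The matrix is unsymmetric because interior and boundary rows apply different operators to the same basis functions, which is exactly why its nonsingularity is not guaranteed by standard positive-definiteness arguments.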
- [18] arXiv:2403.18018 [pdf, ps, other]
Title: DORE: A Dataset For Portuguese Definition Generation
Comments: Accepted to LREC-COLING 2024 (The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation)
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Definition modelling (DM) is the task of automatically generating a dictionary definition for a specific word. Computational systems capable of DM have numerous applications benefiting a wide range of audiences. Because DM is a supervised natural language generation problem, these systems require large annotated datasets to train machine learning (ML) models. Several DM datasets have been released for English and other high-resource languages. Although Portuguese is considered a mid/high-resource language in most natural language processing tasks and has more than 200 million native speakers, no DM dataset is available for Portuguese. In this research, we fill this gap by introducing DORE, the first dataset for Definition MOdelling for PoRtuguEse, containing more than 100,000 definitions. We also evaluate several deep-learning-based DM models on DORE and report the results. The dataset and the findings of this paper will facilitate research on Portuguese in wider contexts.
- [19] arXiv:2403.18021 [pdf, other]
Title: A Study on the Use of Simulation in Synthesizing Path-Following Control Policies for Autonomous Ground Robots
Authors: Harry Zhang, Stefan Caldararu, Aaron Young, Alexis Ruiz, Huzaifa Unjhawala, Ishaan Mahajan, Sriram Ashokkumar, Nevindu Batagoda, Zhenhao Zhou, Luning Bakke, Dan Negrut
Comments: 8 pages, 7 figures
Subjects: Robotics (cs.RO)
We report results obtained and insights gained while answering the following question: how effective is it to use a simulator to establish path-following control policies for an autonomous ground robot? While the answer depends on the quality of the simulator, we found that, for the simulation platform used herein, producing four control policies for path planning was straightforward once a digital twin of the controlled robot was available. The control policies established in simulation and subsequently demonstrated in the real world are PID control, MPC, and two neural network (NN) based controllers. The two NN controllers were trained quickly via imitation learning using seven simple maneuvers: following three circles clockwise, following the same circles counter-clockwise, and driving straight. A test randomization process that employs random micro-simulations is used to rank the "goodness" of the four control policies. The policy ranking obtained in simulation correlates well with the ranking observed when the control policies were tested in the real world. The simulation platform is publicly available and released as open source under the BSD-3 license; a public Docker image is available for reproducibility studies. It contains a dynamics engine, a sensor simulator, a ROS2 bridge, and a ROS2 autonomy stack, the latter employed in both the simulator and the real-world experiments.
- [20] arXiv:2403.18024 [pdf, other]
Title: Enriching Word Usage Graphs with Cluster Definitions
Comments: LREC-COLING 2024
Subjects: Computation and Language (cs.CL)
We present a dataset of word usage graphs (WUGs) in which the existing WUGs for multiple languages are enriched with cluster labels that function as sense definitions. The definitions are generated from scratch by fine-tuned encoder-decoder language models. A human evaluation showed that these definitions match the existing clusters in WUGs better than definitions chosen from WordNet by two baseline systems. At the same time, the method is straightforward to use and easy to extend to new languages. The resulting enriched datasets can be extremely helpful for moving on to explainable semantic change modeling.
- [21] arXiv:2403.18025 [pdf, other]
Title: Improving Pre-trained Language Model Sensitivity via Mask Specific losses: A case study on Biomedical NER
Comments: Paper already accepted for publication at the NAACL 2024 conference (main conference paper)
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Adapting language models (LMs) to novel domains is often achieved by fine-tuning a pre-trained LM (PLM) on domain-specific data. Fine-tuning introduces new knowledge into an LM, enabling it to comprehend and efficiently perform a target-domain task. However, fine-tuning can be inadvertently insensitive if it ignores the wide array of disparities (e.g., in word meaning) between source and target domains. For instance, words such as chronic and pressure may be treated lightly in social conversations, whereas clinically they usually express concern. To address insensitive fine-tuning, we propose Mask Specific Language Modeling (MSLM), an approach that efficiently acquires target-domain knowledge by appropriately weighting the importance of domain-specific terms (DS-terms) during fine-tuning. MSLM jointly masks DS-terms and generic words, then learns mask-specific losses by ensuring that LMs incur larger penalties for inaccurately predicting DS-terms than for generic words. Our analysis shows that MSLM improves LMs' sensitivity to and detection of DS-terms. We empirically show that the optimal masking rate depends not only on the LM but also on the dataset and the sequence length. Our proposed masking strategy outperforms advanced masking strategies such as span- and PMI-based masking.
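The core idea of a mask-specific loss, a larger penalty for mispredicting a masked DS-term than a masked generic word, can be sketched as a weighted negative log-likelihood over masked positions. The fixed multiplier `ds_weight` is a hypothetical simplification; the exact weighting used by MSLM is not given in the abstract.

```python
import math

def mask_specific_loss(log_probs, masked, is_ds_term, ds_weight=2.0):
    """Average NLL over masked positions, where a masked DS-term costs
    `ds_weight` times as much as a masked generic word (hypothetical weighting).

    log_probs[i]: model log-probability of the true token at position i
    masked[i]: whether position i was masked
    is_ds_term[i]: whether the token at position i is a domain-specific term
    """
    total, count = 0.0, 0
    for lp, m, ds in zip(log_probs, masked, is_ds_term):
        if not m:
            continue  # unmasked positions contribute no loss
        total += (ds_weight if ds else 1.0) * -lp
        count += 1
    return total / count if count else 0.0

lp = [math.log(0.5), math.log(0.5), math.log(0.9)]
print(mask_specific_loss(lp, [True, True, False], [True, False, True]))
```

With equal model confidence on both masked tokens, the DS-term contributes twice as much to the loss, so gradient updates prioritise getting domain-specific terms right.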
- [22] arXiv:2403.18028 [pdf, other]
Title: Predicting species occurrence patterns from partial observations
Comments: Tackling Climate Change with Machine Learning workshop at ICLR 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Populations and Evolution (q-bio.PE)
To address the interlinked biodiversity and climate crises, we need an understanding of where species occur and how these patterns are changing. However, observational data on most species remains very limited, and the amount of data available varies greatly between taxonomic groups. We introduce the problem of predicting species occurrence patterns given (a) satellite imagery, and (b) known information on the occurrence of other species. To evaluate algorithms on this task, we introduce SatButterfly, a dataset of satellite images, environmental data and observational data for butterflies, which is designed to pair with the existing SatBird dataset of bird observational data. To address this task, we propose a general model, R-Tran, for predicting species occurrence patterns that enables the use of partial observational data wherever found. We find that R-Tran outperforms other methods in predicting species encounter rates with partial information both within a taxon (birds) and across taxa (birds and butterflies). Our approach opens new perspectives to leveraging insights from species with abundant data to other species with scarce data, by modelling the ecosystems in which they co-occur.
- [23] arXiv:2403.18031 [pdf, other]
Title: The Impact of Syntactic and Semantic Proximity on Machine Translation with Back-Translation
Subjects: Computation and Language (cs.CL)
Unsupervised on-the-fly back-translation, in conjunction with multilingual pretraining, is the dominant method for unsupervised neural machine translation. Theoretically, however, the method should not work in general. We therefore conduct controlled experiments with artificial languages to determine which properties of languages make back-translation an effective training method, covering lexical, syntactic, and semantic properties. We find, contrary to popular belief, that (i) parallel word frequency distributions, (ii) partially shared vocabulary, and (iii) similar syntactic structure across languages are not sufficient to explain the success of back-translation. We show, however, that even a crude semantic signal (similar lexical fields across languages) does improve the alignment of two languages through back-translation. We conjecture that rich semantic dependencies, parallel across languages, are at the root of the success of unsupervised methods based on back-translation. Overall, the success of unsupervised machine translation was far from analytically guaranteed. Rather, it is further evidence that the world's languages share deep similarities, and we hope to show how to identify which of these similarities can serve the development of unsupervised, cross-linguistic tools.
- [24] arXiv:2403.18033 [pdf, other]
Title: SpectralWaste Dataset: Multimodal Data for Waste Sorting Automation
Authors: Sara Casao, Fernando Peña, Alberto Sabater, Rosa Castillón, Darío Suárez, Eduardo Montijano, Ana C. Murillo
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
The increase in non-biodegradable waste is a worldwide concern. Recycling facilities play a crucial role, but their automation is hindered by the complex characteristics of waste recycling lines, such as clutter or object deformation. In addition, the lack of publicly available labeled data for these environments makes developing robust perception systems challenging. Our work explores the benefits of multimodal perception for object segmentation in real waste management scenarios. First, we present SpectralWaste, the first dataset collected from an operational plastic waste sorting facility that provides synchronized hyperspectral and conventional RGB images. This dataset contains labels for several categories of objects that commonly appear in sorting plants and need to be detected and separated from the main trash flow for several reasons, such as security in the management line or reuse. Additionally, we propose a pipeline employing different object segmentation architectures and evaluate the alternatives on our dataset, conducting an extensive analysis of both multimodal and unimodal alternatives. Our evaluation pays special attention to efficiency and suitability for real-time processing, and demonstrates how hyperspectral imaging (HSI) can boost RGB-only perception in these realistic industrial settings with little computational overhead.
- [25] arXiv:2403.18035 [pdf, other]
Title: Bidirectional Consistency Models
Comments: 40 pages, 25 figures
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Diffusion models (DMs) are capable of generating remarkably high-quality samples by iteratively denoising a random vector, a process that corresponds to moving along the probability flow ordinary differential equation (PF ODE). Interestingly, DMs can also invert an input image to noise by moving backward along the PF ODE, a key operation for downstream tasks such as interpolation and image editing. However, the iterative nature of this process limits its speed, hindering its broader application. Recently, Consistency Models (CMs) have emerged to address this challenge by approximating the integral of the PF ODE, thereby bypassing the need to iterate. Yet, the absence of an explicit ODE solver complicates the inversion process. To resolve this, we introduce the Bidirectional Consistency Model (BCM), which learns a single neural network that enables both forward and backward traversal along the PF ODE, efficiently unifying generation and inversion tasks within one framework. Notably, our proposed method enables one-step generation and inversion while also allowing the use of additional steps to enhance generation quality or reduce reconstruction error. Furthermore, by leveraging our model's bidirectional consistency, we introduce a sampling strategy that can improve FID while preserving the generated image content. We further showcase our model's capabilities in several downstream tasks, such as interpolation and inpainting, and present demonstrations of potential applications, including blind restoration of compressed images and defending against black-box adversarial attacks.
- [26] arXiv:2403.18036 [pdf, other]
Title: Move as You Say, Interact as You Can: Language-guided Human Motion Generation with Scene Affordance
Authors: Zan Wang, Yixin Chen, Baoxiong Jia, Puhao Li, Jinlu Zhang, Jingze Zhang, Tengyu Liu, Yixin Zhu, Wei Liang, Siyuan Huang
Comments: CVPR 2024; 16 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Despite significant advancements in text-to-motion synthesis, generating language-guided human motion within 3D environments poses substantial challenges. These challenges stem primarily from (i) the absence of powerful generative models capable of jointly modeling natural language, 3D scenes, and human motion, and (ii) the generative models' intensive data requirements contrasted with the scarcity of comprehensive, high-quality, language-scene-motion datasets. To tackle these issues, we introduce a novel two-stage framework that employs scene affordance as an intermediate representation, effectively linking 3D scene grounding and conditional motion generation. Our framework comprises an Affordance Diffusion Model (ADM) for predicting explicit affordance maps and an Affordance-to-Motion Diffusion Model (AMDM) for generating plausible human motions. By leveraging scene affordance maps, our method overcomes the difficulty of generating human motion under multimodal condition signals, especially when training with limited data lacking extensive language-scene-motion pairs. Our extensive experiments demonstrate that our approach consistently outperforms all baselines on established benchmarks, including HumanML3D and HUMANISE. Additionally, we validate our model's exceptional generalization capabilities on a specially curated evaluation set featuring previously unseen descriptions and scenes.
- [27] arXiv:2403.18038 [pdf, ps, other]
Title: TGGLinesPlus: A robust topological graph-guided computer vision algorithm for line detection from images
Comments: Our TGGLinesPlus Python implementation is open source. 27 pages, 8 figures and 4 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Line detection is a classic and essential problem in image processing, computer vision and machine intelligence. It has many important applications, including image vectorization (e.g., document recognition and art design), indoor mapping, and important societal challenges (e.g., sea ice fracture line extraction from satellite imagery). Many line detection algorithms and methods have been developed, but robust and intuitive methods are still lacking. In this paper, we propose and implement a topological graph-guided algorithm, named TGGLinesPlus, for line detection. Our experiments on images from a wide range of domains demonstrate the flexibility of our TGGLinesPlus algorithm. We also benchmark our algorithm against five classic and state-of-the-art line detection methods, and the results demonstrate the robustness of TGGLinesPlus. We hope our open-source implementation of TGGLinesPlus will inspire and pave the way for many applications where spatial science matters.
- [28] arXiv:2403.18040 [pdf, other]
Title: Global Point Cloud Registration Network for Large Transformations
Authors: Hanz Cuevas-Velasquez, Alejandro Galán-Cuenca, Antonio Javier Gallego, Marcelo Saval-Calvo, Robert B. Fisher
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Three-dimensional data registration is an established yet challenging problem that is key in many different applications, such as mapping the environment for autonomous vehicles and modeling objects and people for avatar creation, among many others. Registration refers to the process of mapping multiple data sets into the same coordinate system by matching correspondences and estimating a transformation. Recent proposals exploit deep learning architectures for this purpose, as they learn the best features for the data, providing better matches and hence better results. However, the state of the art usually focuses on relatively small transformations, although in certain applications and in real, practical environments, large transformations are very common. In this paper, we present ReLaTo (Registration for Large Transformations), an architecture that handles large transformations while maintaining good performance for local transformations. This proposal uses a novel Softmax pooling layer to find correspondences between two point sets in a bilateral consensus manner, sampling the most confident matches. These matches are used to estimate a coarse, global registration using weighted Singular Value Decomposition (SVD). A target-guided denoising step is then applied to both the obtained matches and latent features, estimating the final fine registration while considering the local geometry. All these steps are carried out in an end-to-end approach, which has been shown to outperform 10 state-of-the-art registration methods on two datasets commonly used for this task (ModelNet40 and KITTI), especially in the case of large transformations.
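The coarse registration step relies on weighted SVD. A minimal NumPy sketch of the standard weighted Kabsch (Procrustes) solution follows, assuming the correspondences and their confidence weights are already given; the Softmax pooling that produces them in ReLaTo is not reproduced, and the function name is hypothetical.

```python
import numpy as np


def weighted_rigid_transform(src, dst, w):
    """Return R, t minimizing sum_i w_i * ||R @ src[i] + t - dst[i]||^2.

    src, dst: (N, 3) arrays of matched points; w: (N,) non-negative confidences.
    """
    w = np.asarray(w, dtype=float)
    w = w / w.sum()
    # Weighted centroids of both point sets.
    src_c = w @ src
    dst_c = w @ dst
    # Weighted 3x3 cross-covariance of the centered points.
    H = (src - src_c).T @ (w[:, None] * (dst - dst_c))
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


# A large (90 degree) rotation about z plus a translation is recovered exactly from clean matches.
rng = np.random.default_rng(0)
src = rng.normal(size=(20, 3))
R_true = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
t_true = np.array([1.0, 2.0, 3.0])
dst = src @ R_true.T + t_true
R, t = weighted_rigid_transform(src, dst, rng.uniform(0.1, 1.0, size=20))
```

Because the solution is closed-form, it has no difficulty with large rotations as long as the correspondences are correct; in a learned pipeline the weights let low-confidence matches contribute less.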
- [29] arXiv:2403.18041 [pdf, other]
Title: Learning Piecewise Residuals of Control Barrier Functions for Safety of Switching Systems using Multi-Output Gaussian Processes
Comments: arXiv admin note: text overlap with arXiv:2403.09573
Subjects: Systems and Control (eess.SY); Robotics (cs.RO)
Control barrier functions (CBFs) have recently been introduced as a systematic tool to ensure safety by establishing set invariance. When combined with a control Lyapunov function (CLF), they form a safety-critical control mechanism. However, the effectiveness of CBFs and CLFs is closely tied to the system model. In practice, model uncertainty can jeopardize safety and stability guarantees and may lead to undesirable performance. In this paper, we develop a safe learning-based control strategy for switching systems in the face of uncertainty. We focus on the case in which a nominal model is available for a true underlying switching system. The resulting model uncertainty produces piecewise residuals for each switching surface, which affect the CLF and CBF constraints. We introduce a batch multi-output Gaussian process (MOGP) framework to approximate these piecewise residuals, thereby mitigating the adverse effects of uncertainty. A particular structure of the covariance function enables us to convert the MOGP-based CLF and CBF chance constraints into second-order cone constraints, which leads to a convex optimization problem. We analyze the feasibility of the resulting optimization and provide necessary and sufficient conditions for feasibility. The effectiveness of the proposed strategy is validated through a simulation of a switching adaptive cruise control system.
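For intuition on how a CBF constraint enters the controller, here is a minimal sketch of a deterministic CBF safety filter with a single constraint, assuming a control-affine nominal model and no learned residual. With one affine constraint the quadratic program reduces to a closed-form projection. The MOGP residuals and the second-order cone chance constraints of the paper are not modeled, and the function name is hypothetical.

```python
def cbf_safety_filter(u_nom, lfh, lgh, h, alpha=1.0):
    """Minimally modify u_nom so that the CBF condition Lf_h + Lg_h . u + alpha * h >= 0 holds.

    Closed-form solution of the single-constraint QP
        min ||u - u_nom||^2   s.t.   lgh . u + lfh + alpha * h >= 0,
    assuming lgh is not the zero vector (nominal model only, no uncertainty).
    """
    slack = lfh + sum(a * u for a, u in zip(lgh, u_nom)) + alpha * h
    if slack >= 0.0:
        return list(u_nom)  # the nominal input is already safe
    norm_sq = sum(a * a for a in lgh)
    # Project u_nom onto the constraint boundary along the direction lgh.
    return [u - slack * a / norm_sq for u, a in zip(u_nom, lgh)]


# 1-D integrator x' = u that must keep x <= 1.0: h(x) = 1.0 - x, so Lf h = 0 and Lg h = -1.
# At x = 0.9 (h = 0.1) a nominal input of 2.0 would leave the safe set too quickly.
u_safe = cbf_safety_filter(u_nom=[2.0], lfh=0.0, lgh=[-1.0], h=0.1)
```

In the uncertain setting described above, the terms `lfh` and `lgh` would be corrupted by the unknown residuals, which is what the learned MOGP model is meant to compensate for.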
- [30] arXiv:2403.18042 [pdf, other]
Title: Extending Network Calculus To Deal With Partially Negative And Decreasing Service Curves
Comments: To be published in part in RTAS 2024
Subjects: Networking and Internet Architecture (cs.NI)
Network Calculus (NC) is a versatile analytical methodology for efficiently computing performance bounds in networked systems. The arrival and service curve abstractions allow one to model diverse and heterogeneous distributed systems. The operations to compute residual service curves and to concatenate sequences of systems enable an efficient and accurate calculation of per-flow timing guarantees. Yet, in some scenarios involving multiple concurrent flows at a system, the central notion of so-called min-plus service curves is too weak to compute a meaningful residual service curve. In these cases, one usually resorts to so-called strict service curves, which enable the computation of per-flow bounds. However, strict service curves are restrictive: (1) there are service elements for which only min-plus service curves, but not strict ones, can be provided, and (2) strict service curves generally have no concatenation property, i.e., a sequence of two strict systems does not yield a strict service curve. In this report, we extend NC to deal with systems that only offer aggregate min-plus service curves to multiple flows. The key to this extension is the exploitation of minimal arrival curves, i.e., lower bounds on the arrival process. Technically speaking, we provide basic performance bounds (backlog and delay) for the case of negative service curves. We also discuss their accuracy and show them to be tight. To illustrate their usefulness, we also present application patterns of these new results for: (1) heterogeneous systems involving computation and communication resources and (2) finite buffers that are shared between multiple flows.
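For context, the classical bounds that this work starts from have a closed form in the textbook case of a token-bucket arrival curve and a rate-latency min-plus service curve, both non-negative and non-decreasing. The sketch below assumes a single flow with arrival rate r no larger than service rate R; it does not cover the negative or decreasing service curves, minimal arrival curves, or aggregate multiplexing addressed in the report, and all function names are hypothetical.

```python
def token_bucket(b, r):
    """Arrival curve alpha(t) = b + r*t for t > 0 and 0 at t = 0 (burst b, sustained rate r)."""
    return lambda t: b + r * t if t > 0 else 0.0


def rate_latency(R, T):
    """Min-plus service curve beta(t) = R * max(0, t - T) (rate R, latency T)."""
    return lambda t: R * max(0.0, t - T)


def classic_bounds(b, r, R, T):
    """Backlog and delay bounds for a token-bucket flow served by a rate-latency curve."""
    if r > R:
        raise ValueError("arrival rate exceeds service rate: the bounds are infinite")
    backlog = b + r * T   # vertical deviation sup_t [alpha(t) - beta(t)], attained at t = T
    delay = T + b / R     # horizontal deviation between alpha and beta
    return backlog, delay


def vertical_deviation(alpha, beta, grid):
    """Numerical check: largest gap alpha(t) - beta(t) over a finite time grid."""
    return max(alpha(t) - beta(t) for t in grid)


backlog, delay = classic_bounds(b=2.0, r=1.0, R=4.0, T=0.5)   # (2.5, 1.0)
grid = [k / 100 for k in range(1, 1001)]
numeric_backlog = vertical_deviation(token_bucket(2.0, 1.0), rate_latency(4.0, 0.5), grid)
```

The report's extension matters precisely where this picture breaks down, for example when the residual service left to a flow by an aggregate min-plus curve can become negative.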
- [31] arXiv:2403.18051 [pdf, other]
Title: Supervisory Prompt Training
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
The performance of Large Language Models (LLMs) relies heavily on the quality of prompts, which are often manually engineered and task-specific, making them costly and non-scalable. We propose a novel approach, Supervisory Prompt Training (SPT). SPT automates the generation of highly effective prompts using a dual LLM system. In this system, one LLM, the generator, performs a task while the other, the corrector, provides feedback and generates improved prompts. In contrast to earlier techniques, both the generator and corrector collaboratively and continuously improve their prompts over time. We also introduce the concept of impact scores to measure the sentence-level effectiveness of the prompts. Our method was tested on four benchmarks that measure the level of hallucination in LLMs. Notably, we were able to increase the accuracy of GPT-4 on GSM8K from 65.8% to 94.1%, an absolute gain of 28.3 percentage points. SPT advances LLMs by refining prompts to enhance performance and reduce hallucinations, offering an efficient and scalable alternative to traditional model fine-tuning.
- [32] arXiv:2403.18053 [pdf, ps, other]
Title: Shear banding and cracking in unsaturated porous media through a nonlocal THM meshfree paradigm
Subjects: Numerical Analysis (math.NA)
The thermo-hydro-mechanical behavior of unsaturated soils plays a significant role in dynamic shear banding and fracturing. In this article, we propose a thermo-hydro-mechanical material model in the periporomechanics paradigm to model shear banding and cracking triggered by temperature. Periporomechanics (PPM) is a nonlocal framework for the mechanics of unsaturated soil in which a length scale dictates the nonlocal interaction between material points. Periporomechanics unites continuous and discontinuous deformation and fluid flow in porous media. As a new contribution, we incorporate the thermo-hydro-mechanical material model into periporomechanics through the correspondence principle for modeling shear banding and cracking in unsaturated porous media. We also augment the stabilized PPM correspondence principle, which mitigates the multiphase zero-energy mode instability. At the global level, we numerically implement the periporomechanics paradigm through an explicit Lagrangian meshfree algorithm. At the local level, we use the return mapping algorithm to implement the thermo-hydro-mechanical constitutive model. We present numerical examples to demonstrate the efficacy and robustness of the proposed periporomechanics for modeling shear banding bifurcation and cracking in unsaturated porous media triggered by temperature.
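The local-level return mapping algorithm can be illustrated with its simplest instance. The sketch below assumes a one-dimensional, purely mechanical, rate-independent elastoplastic model with linear isotropic hardening: an elastic predictor followed by a plastic corrector. The thermo-hydro-mechanical constitutive model for unsaturated soils in the paper is considerably richer; this only shows the predictor-corrector structure, and the function name is hypothetical.

```python
def return_map_1d(strain, eps_p, alpha, E, sigma_y, H):
    """One strain-driven step of 1-D return mapping with linear isotropic hardening.

    strain: total strain at the end of the step
    eps_p, alpha: plastic strain and accumulated plastic strain from the previous step
    E: Young's modulus, sigma_y: initial yield stress, H: hardening modulus
    Returns the updated (stress, eps_p, alpha). Purely mechanical simplification.
    """
    sigma_trial = E * (strain - eps_p)                    # elastic predictor
    f_trial = abs(sigma_trial) - (sigma_y + H * alpha)    # trial yield function
    if f_trial <= 0.0:
        return sigma_trial, eps_p, alpha                  # step is elastic
    dgamma = f_trial / (E + H)                            # plastic multiplier from consistency
    sign = 1.0 if sigma_trial > 0.0 else -1.0
    stress = sigma_trial - E * dgamma * sign              # return to the updated yield surface
    return stress, eps_p + dgamma * sign, alpha + dgamma


stress, eps_p, alpha = return_map_1d(strain=0.01, eps_p=0.0, alpha=0.0, E=200.0, sigma_y=1.0, H=20.0)
```

After a plastic step the returned stress lies exactly on the hardened yield surface, |stress| = sigma_y + H * alpha, which is the defining property of the return mapping.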
- [33] arXiv:2403.18055 [pdf, other]
Title: Adaptive Boundary Control of the Kuramoto-Sivashinsky Equation Under Intermittent Sensing
Comments: Submitted to Automatica
Subjects: Systems and Control (eess.SY); Analysis of PDEs (math.AP)
In this paper, we study boundary stabilization, in the L2 sense, of the one-dimensional Kuramoto-Sivashinsky equation subject to intermittent sensing. We assume that the state of this spatio-temporal equation is measured on a given spatial subdomain during certain intervals of time, and on the remaining spatial subdomain during the remaining intervals of time. As a result, we assign a feedback law at the boundary of the spatial domain and force the value of the state at the junction of the two subdomains to zero. Throughout the study, the destabilizing coefficient is assumed to be space-dependent and bounded but unknown. Adaptive boundary controllers are designed under different assumptions on the forcing term. In particular, when the forcing term is null, we guarantee global exponential stability of the origin. Furthermore, when the forcing term is bounded and admits a known upper bound, we guarantee input-to-state stability, whereas only global uniform ultimate boundedness is guaranteed when the upper bound is unknown. Numerical simulations are performed to illustrate our results.
- [34] arXiv:2403.18056 [pdf, other]
Title: Self-Clustering Hierarchical Multi-Agent Reinforcement Learning with Extensible Cooperation Graph
Subjects: Artificial Intelligence (cs.AI)
Multi-Agent Reinforcement Learning (MARL) has been successful in solving many cooperative challenges. However, classic non-hierarchical MARL algorithms still cannot address various complex multi-agent problems that require hierarchical cooperative behaviors. The cooperative knowledge and policies learned by non-hierarchical algorithms are implicit and not interpretable, which restricts the integration of existing knowledge. This paper proposes a novel hierarchical MARL model called Hierarchical Cooperation Graph Learning (HCGL) for solving general multi-agent problems. HCGL has three components: a dynamic Extensible Cooperation Graph (ECG) for achieving self-clustering cooperation; a group of graph operators for adjusting the topology of ECG; and an MARL optimizer for training these graph operators. HCGL's key distinction from other MARL models is that the behaviors of agents are guided by the topology of ECG instead of policy neural networks. ECG is a three-layer graph consisting of an agent node layer, a cluster node layer, and a target node layer. To manipulate the ECG topology in response to changing environmental conditions, four graph operators are trained to adjust the edge connections of ECG dynamically. The hierarchical structure of ECG provides a unique way to merge primitive actions (actions executed by the agents) and cooperative actions (actions executed by the clusters) into a unified action space, allowing us to integrate fundamental cooperative knowledge into an extensible interface. In our experiments, the HCGL model has shown strong performance on multi-agent benchmarks with sparse rewards. We also verify that HCGL can easily be transferred to large-scale scenarios with high zero-shot transfer success rates.
- [35] arXiv:2403.18057 [pdf, other]
Title: Prioritized League Reinforcement Learning for Large-Scale Heterogeneous Multiagent Systems
Subjects: Artificial Intelligence (cs.AI)
Large-scale heterogeneous multiagent systems capture various realistic factors, such as agents with diverse abilities and overall system cost. Compared to homogeneous systems, heterogeneous systems offer significant practical advantages. Nonetheless, they also present challenges for multiagent reinforcement learning, including non-stationarity and an imbalanced number of agents of different types. We propose a Prioritized Heterogeneous League Reinforcement Learning (PHLRL) method to address large-scale heterogeneous cooperation problems. PHLRL maintains a record of the various policies that agents have explored during training and establishes a heterogeneous league consisting of diverse policies to aid future policy optimization. Furthermore, we design a prioritized policy gradient approach to compensate for the gap caused by differences in the number of agents of each type. Next, we use Unreal Engine to design a large-scale heterogeneous cooperation benchmark named Large-Scale Multiagent Operation (LSMO), a complex two-team competition scenario that requires collaboration between ground and airborne agents. Our experiments show that PHLRL outperforms state-of-the-art methods, including QTRAN and QPLEX, on LSMO.
- [36] arXiv:2403.18058 [pdf, other]
Title: COIG-CQIA: Quality is All You Need for Chinese Instruction Fine-tuning
Authors: Yuelin Bai, Xinrun Du, Yiming Liang, Yonggang Jin, Ziqiang Liu, Junting Zhou, Tianyu Zheng, Xincheng Zhang, Nuo Ma, Zekun Wang, Ruibin Yuan, Haihong Wu, Hongquan Lin, Wenhao Huang, Jiajun Zhang, Wenhu Chen, Chenghua Lin, Jie Fu, Min Yang, Shiwen Ni, Ge Zhang
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Recently, there have been significant advancements in large language models (LLMs), particularly focused on the English language. These advancements have enabled LLMs to understand and execute complex instructions with unprecedented accuracy and fluency. However, despite these advancements, there remains a noticeable gap in the development of Chinese instruction tuning. The unique linguistic features and cultural depth of the Chinese language pose challenges for instruction tuning tasks. Existing datasets are either derived from English-centric LLMs or are ill-suited for aligning with the interaction patterns of real-world Chinese users. To bridge this gap, we introduce COIG-CQIA, a high-quality Chinese instruction tuning dataset. Our aim is to build a diverse, wide-ranging instruction-tuning dataset to better align model behavior with human interactions. To this end, we collect a high-quality human-written corpus from various sources on the Chinese Internet, including Q&A communities, Wikis, examinations, and existing NLP datasets. This corpus was rigorously filtered and carefully processed to form the COIG-CQIA dataset. Furthermore, we train models of various scales on different subsets of CQIA and conduct in-depth evaluation and analyses. The findings from our experiments offer valuable insights for selecting and developing Chinese instruction-tuning datasets. We also find that models trained on CQIA-Subset achieve competitive results in human assessment as well as knowledge and security benchmarks. Data are available at https://huggingface.co/datasets/m-a-p/COIG-CQIA
- [37] arXiv:2403.18059 [pdf, ps, other]
Title: Online Submodular Welfare Maximization Meets Post-Allocation Stochasticity and Reusability
Authors: Rajan Udwani
Subjects: Data Structures and Algorithms (cs.DS)
We generalize the problem of online submodular welfare maximization to incorporate a variety of new elements arising from reusability, stochastic rewards, combinatorial actions and similar features that have received significant attention in recent years. For our general formulation, we show that a non-adaptive Greedy algorithm achieves the highest possible competitive ratio against an adaptive offline benchmark in the adversarial arrival model and in the unknown IID stochastic arrival model. In addition to generalizing several previous results, this shows that, in general, adaptivity to stochastic rewards (and similar features) offers no theoretical (worst-case) benefits.
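As a reference point, here is a minimal sketch of the plain online greedy rule for submodular welfare maximization: each arriving item is irrevocably assigned to the agent with the largest marginal gain. It assumes deterministic rewards, one-shot (non-reusable) items, and monotone submodular valuations given as Python callables. The reusability, post-allocation stochasticity, and combinatorial actions of the paper's general formulation are not modeled, and the coverage valuation is only an example.

```python
def coverage(cover):
    """Coverage valuation: f(S) = number of distinct elements covered by the items in S (submodular)."""
    return lambda bundle: len(set().union(*(cover[item] for item in bundle)))


def greedy_online_welfare(arrivals, valuations):
    """Assign each arriving item, irrevocably, to the agent with the largest marginal gain."""
    bundles = {agent: set() for agent in valuations}
    for item in arrivals:
        best_agent, best_gain = None, 0.0
        for agent, f in valuations.items():
            gain = f(bundles[agent] | {item}) - f(bundles[agent])
            if gain > best_gain:
                best_agent, best_gain = agent, gain
        if best_agent is not None:  # items with zero marginal value for everyone stay unassigned
            bundles[best_agent].add(item)
    return bundles


valuations = {
    "A": coverage({"x": {1, 2}, "y": {2}}),
    "B": coverage({"x": {1}, "y": {3, 4}}),
}
bundles = greedy_online_welfare(["x", "y"], valuations)
welfare = sum(valuations[a](bundles[a]) for a in bundles)
```

The rule is non-adaptive in the sense used above: it decides using the current bundles only, without reacting to any realized randomness after allocation.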
- [38] arXiv:2403.18062 [pdf, other]
Title: ShapeGrasp: Zero-Shot Task-Oriented Grasping with Large Language Models through Geometric Decomposition
Authors: Samuel Li, Sarthak Bhagat, Joseph Campbell, Yaqi Xie, Woojun Kim, Katia Sycara, Simon Stepputtis
Comments: 8 pages
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Task-oriented grasping of unfamiliar objects is a necessary skill for robots in dynamic in-home environments. Inspired by the human capability to grasp such objects through intuition about their shape and structure, we present a novel zero-shot task-oriented grasping method leveraging a geometric decomposition of the target object into simple, convex shapes that we represent in a graph structure, including geometric attributes and spatial relationships. Our approach employs minimal essential information, namely the object's name and the intended task, to facilitate zero-shot task-oriented grasping. We utilize the commonsense reasoning capabilities of large language models to dynamically assign semantic meaning to each decomposed part and subsequently reason over the utility of each part for the intended task. Through extensive experiments on a real-world robotics platform, we demonstrate that the decomposition and reasoning pipeline of our approach selects the correct part in 92% of the cases and successfully grasps the object in 82% of the tasks we evaluate. Additional videos, experiments, code, and data are available on our project website: https://shapegrasp.github.io/.
- [39] arXiv:2403.18063 [pdf, other]
Title: Spectral Convolutional Transformer: Harmonizing Real vs. Complex Multi-View Spectral Operators for Vision Transformer
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM)
Vision transformers have been investigated through diverse architectures such as ViT, PVT, and Swin. These have worked to improve the attention mechanism and make it more efficient. Separately, the need to include local information led to incorporating convolutions into transformers, as in CPVT and CvT. Global information is captured using a complex Fourier basis to achieve global token mixing through various methods, such as AFNO, GFNet, and Spectformer. We advocate combining three diverse views of data: local, global, and long-range dependence. We also investigate the simplest global representation, using only the real-domain spectral representation obtained through the Hartley transform. We use a convolutional operator in the initial layers to capture local information. Through these two contributions, we are able to optimize and obtain a spectral convolution transformer (SCT) that provides improved performance over the state-of-the-art methods while reducing the number of parameters. Through extensive experiments, we show that SCT-C-small gives state-of-the-art performance on the ImageNet dataset and reaches 84.5% top-1 accuracy, while SCT-C-Large reaches 85.9% and SCT-C-Huge reaches 86.4%. We evaluate SCT on transfer learning on datasets such as CIFAR-10, CIFAR-100, Oxford Flower, and Stanford Car. We also evaluate SCT on downstream tasks, i.e., instance segmentation on the MSCOCO dataset. The project page is available at https://github.com/badripatro/sct
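The Hartley transform gives a real-valued spectrum for real input, which is what makes it attractive as a "real-only" alternative to the complex Fourier basis. A minimal 1-D NumPy sketch follows, computing it from the FFT; how SCT applies it inside its token-mixing layers is not described in the abstract and is not reproduced here.

```python
import numpy as np


def dht(x):
    """Discrete Hartley transform H_k = sum_n x_n * cas(2*pi*n*k/N), with cas = cos + sin.

    For real x, FFT(x)_k = sum_n x_n * (cos - i*sin)(2*pi*n*k/N), so the DHT equals Re(FFT) - Im(FFT).
    """
    X = np.fft.fft(x)
    return X.real - X.imag


def idht(h):
    """The DHT is its own inverse up to a factor 1/N."""
    return dht(h) / len(h)


x = np.array([1.0, 2.0, 0.5, -1.0, 3.0])
h = dht(x)  # real-valued spectrum, unlike the complex FFT
```

The self-inverse property means a spectral layer built on the DHT can stay entirely in real arithmetic for both the forward and inverse transform.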
- [40] arXiv:2403.18066 [pdf, ps, other]
Title: Path Integral Control with Rollout Clustering and Dynamic Obstacles
Comments: 8 pages, 5 figures, extended version of ACC 2024 submission
Subjects: Systems and Control (eess.SY); Robotics (cs.RO)
Model Predictive Path Integral (MPPI) control has proven to be a powerful tool for the control of uncertain systems (such as systems subject to disturbances and systems with unmodeled dynamics). One important limitation of the baseline MPPI algorithm is that it does not utilize simulated trajectories to their fullest extent. First, it assumes that the average of all trajectories weighted by their performance index will be a safe trajectory. In this paper, multiple examples are shown where this assumption does not hold, and a trajectory clustering technique is presented that reduces the chances of the weighted average crossing into an unsafe region. Second, MPPI does not account for dynamic obstacles, so the authors put forward a novel cost function that accounts for dynamic obstacles without adding significant computation time to the overall algorithm. The novel contributions proposed in this paper were evaluated with extensive simulations to demonstrate improvements upon state-of-the-art MPPI techniques.
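The failure mode of the weighted average is easy to reproduce. The sketch below assumes a 1-D simplification in which each rollout is summarized by a scalar lateral offset, and that cluster labels are already known; the paper's actual clustering technique is not described in the abstract, so `clustered_average` is a hypothetical stand-in that simply averages within the cluster containing the cheapest rollout.

```python
import math


def mppi_weights(costs, lam):
    """Baseline MPPI weights: softmax of -cost / lam (shifted by the minimum for stability)."""
    c_min = min(costs)
    w = [math.exp(-(c - c_min) / lam) for c in costs]
    total = sum(w)
    return [wi / total for wi in w]


def weighted_average(samples, weights):
    """Cost-weighted average of sampled controls (scalars here for brevity)."""
    return sum(w * u for w, u in zip(weights, samples))


def clustered_average(samples, costs, labels, lam):
    """Hypothetical variant: average only within the cluster holding the cheapest sample."""
    best_label = labels[costs.index(min(costs))]
    idx = [i for i, label in enumerate(labels) if label == best_label]
    w = mppi_weights([costs[i] for i in idx], lam)
    return weighted_average([samples[i] for i in idx], w)


# Two equally good families of rollouts: steer left (-1) or right (+1) around an obstacle at 0.
samples = [-1.0, -1.1, 1.0, 1.1]
costs = [1.0, 1.2, 1.0, 1.2]
labels = [0, 0, 1, 1]
naive = weighted_average(samples, mppi_weights(costs, lam=0.5))   # ~0.0: straight into the obstacle
clustered = clustered_average(samples, costs, labels, lam=0.5)    # stays on the left side
```

Averaging within one mode keeps the result inside a region that the sampled rollouts actually visited, which is the motivation for clustering before averaging.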
- [41] arXiv:2403.18067 [pdf, other]
Title: State of the art applications of deep learning within tracking and detecting marine debris: A survey
Comments: Review paper, 60 pages including references, 1 figure, 3 tables, 1 supplementary data
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Deep learning techniques have been explored within the marine litter problem for approximately 20 years, but most of the research has developed rapidly in the last five years. We provide an in-depth, up-to-date summary and analysis of 28 of the most recent and significant contributions of deep learning in marine debris. Cross-referencing the research paper results shows that the YOLO family significantly outperforms all other object detection methods, but many respected contributions to this field agree that a comprehensive database of underwater debris is not currently available for machine learning. Using a small dataset curated and labelled by us, we tested YOLOv5 on a binary classification task and found that the accuracy was low and the rate of false positives was high, highlighting the importance of a comprehensive database. We conclude this survey with over 40 future research recommendations and open challenges.
- [42] arXiv:2403.18071 [pdf, other]
Title: From Sontag's to Cardano-Lyapunov Formula for Systems Not Affine in the Control: Convection-Enabled PDE Stabilization
Comments: To be presented at the 2024 American Control Conference
Subjects: Systems and Control (eess.SY); Analysis of PDEs (math.AP)
We propose the first generalization of Sontag's universal controller to systems not affine in the control, particularly, to PDEs with boundary actuation. We assume that the system admits a control Lyapunov function (CLF) whose derivative, rather than being affine in the control, has either a depressed cubic, quadratic, or depressed quartic dependence on the control. For each case, a continuous universal controller that vanishes at the origin and achieves global exponential stability is derived. We prove our result in the context of convection-reaction-diffusion PDEs with Dirichlet actuation. We show that if the convection has a certain structure, then the L2 norm of the state is a CLF. In addition to generalizing Sontag's formula to some non-affine systems, we present the first general Lyapunov approach for boundary control of nonlinear PDEs. We illustrate our results via a numerical example.
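For reference, the classical affine formula that this work generalizes can be sketched as follows, assuming a scalar control input and a known CLF V for a system x' = f(x) + g(x) u. It covers only the affine case; the cubic, quadratic, and quartic dependences handled by the Cardano-Lyapunov formula are not reproduced.

```python
import math


def sontag(a, b):
    """Sontag's universal formula for x' = f(x) + g(x) u with scalar input u.

    a = L_f V(x) and b = L_g V(x) for a control Lyapunov function V.
    For b != 0 the closed loop gives dV/dt = a + b*u = -sqrt(a**2 + b**4) < 0.
    """
    if b == 0.0:
        return 0.0  # the CLF property then requires a < 0, so no control is needed
    return -(a + math.sqrt(a * a + b ** 4)) / b


a, b = 1.0, 2.0
u = sontag(a, b)
```

The formula works because dV/dt is affine in u, so a single division by b suffices; when dV/dt is a polynomial in u, one must instead solve for u as a root, which is where Cardano-type formulas enter.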
- [43] arXiv:2403.18073 [pdf, other]
Title: Workflow Mini-Apps: Portable, Scalable, Tunable & Faithful Representations of Scientific Workflows
Authors: Ozgur Ozan Kilic, Tianle Wang, Matteo Turilli, Mikhail Titov, Andre Merzky, Line Pouchard, Shantenu Jha
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Workflows are critical for scientific discovery. However, the sophistication, heterogeneity, and scale of workflows make building, testing, and optimizing them increasingly challenging. Furthermore, their complexity and heterogeneity make performance reproducibility hard. In this paper, we propose workflow mini-apps as a tool to address the challenges in building and testing workflows while controlling the fidelity of representing real-world workflows. Workflow mini-apps are deployed and run on various HPC systems and architectures without workflow-specific constraints. We offer insight into their design and implementation, providing an analysis of their performance and reproducibility. Workflow mini-apps thus advance the science of workflows by providing simple, portable, and managed (fidelity) representations of otherwise complex and difficult-to-control real workflows.
- [44] arXiv:2403.18074 [pdf, other]
-
Title: Every Shot Counts: Using Exemplars for Repetition Counting in Videos
Comments: Project website: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Video repetition counting infers the number of repetitions of recurring actions or motions within a video. We propose an exemplar-based approach that discovers visual correspondences between video exemplars and repetitions within target videos. Our proposed Every Shot Counts (ESCounts) model is an attention-based encoder-decoder that encodes videos of varying lengths alongside exemplars from the same and different videos. In training, ESCounts regresses locations of high correspondence to the exemplars within the video. In tandem, our method learns a latent representation of general repetitive motions, which we use for exemplar-free, zero-shot inference. Extensive experiments on commonly used datasets (RepCount, Countix, and UCFRep) show ESCounts obtaining state-of-the-art performance across all three datasets. On RepCount, ESCounts increases the off-by-one (OBO) accuracy from 0.39 to 0.56 and decreases the mean absolute error from 0.38 to 0.21. Detailed ablations further demonstrate the effectiveness of our method.
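To make the two reported metrics concrete, here is a minimal sketch of how repetition-count MAE and off-by-one (OBO) accuracy are commonly computed. It assumes the convention, used with RepCount, of normalizing each absolute error by the true count (so true counts must be positive); the function name is hypothetical and this is not the authors' evaluation code.

```python
from typing import Sequence, Tuple


def repetition_count_metrics(
    predicted: Sequence[float], ground_truth: Sequence[float]
) -> Tuple[float, float]:
    """Return (normalized MAE, off-by-one accuracy) for repetition counts.

    Assumption: MAE is normalized by the true count, so every ground-truth
    count must be positive.
    """
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    if any(g <= 0 for g in ground_truth):
        raise ValueError("ground-truth counts must be positive")
    errors = [abs(p - g) for p, g in zip(predicted, ground_truth)]
    # Mean of |pred - gt| / gt over all videos.
    mae = sum(e / g for e, g in zip(errors, ground_truth)) / len(errors)
    # Fraction of videos whose predicted count is within 1 of the true count.
    obo = sum(e <= 1 for e in errors) / len(errors)
    return mae, obo
```

For predictions `[4, 10, 7]` against true counts `[5, 8, 7]`, the errors are 1, 2 and 0, giving a normalized MAE of 0.15 and an OBO accuracy of 2/3.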
- [45] arXiv:2403.18079 [pdf, ps, other]
-
Title: Paths to Equilibrium in Normal-Form Games
Subjects: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
In multi-agent reinforcement learning (MARL), agents repeatedly interact across time and revise their strategies as new data arrives, producing a sequence of strategy profiles. This paper studies sequences of strategies satisfying a pairwise constraint inspired by policy updating in reinforcement learning, where an agent who is best responding in period $t$ does not switch its strategy in the next period $t+1$. This constraint merely requires that optimizing agents do not switch strategies; it does not constrain the other, non-optimizing agents in any way, and thus allows for exploration. Sequences with this property are called satisficing paths and arise naturally in many MARL algorithms. A fundamental question about strategic dynamics is the following: for a given game and initial strategy profile, is it always possible to construct a satisficing path that terminates at an equilibrium strategy? The resolution of this question has implications for the capabilities or limitations of a class of MARL algorithms. We answer this question in the affirmative for mixed extensions of finite normal-form games.
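The satisficing-path constraint can be stated as a simple check. The sketch below is a simplification that assumes pure strategy profiles in a finite game, with each player's payoff given as a Python callable; the helper names and the coordination game are hypothetical and are not the paper's construction, which works with mixed extensions. Whenever a player is best responding at step $t$, its strategy must be unchanged at step $t+1$, while players who are not best responding may change arbitrarily.

```python
from typing import Callable, List, Sequence, Tuple

Profile = Tuple[int, ...]
Payoff = Callable[[Profile], float]


def is_best_response(
    payoffs: Sequence[Payoff], profile: Profile, player: int, actions: Sequence[List[int]]
) -> bool:
    """True if no unilateral deviation by `player` strictly improves its payoff."""
    current = payoffs[player](profile)
    for a in actions[player]:
        deviation = list(profile)
        deviation[player] = a
        if payoffs[player](tuple(deviation)) > current:
            return False
    return True


def is_satisficing_path(
    path: Sequence[Profile], payoffs: Sequence[Payoff], actions: Sequence[List[int]]
) -> bool:
    """Check the pairwise constraint between every consecutive pair of profiles."""
    for now, nxt in zip(path, path[1:]):
        for i in range(len(now)):
            # A best-responding player must keep its strategy in the next period.
            if is_best_response(payoffs, now, i, actions) and nxt[i] != now[i]:
                return False
    return True


def coordination(profile: Profile) -> float:
    """Hypothetical 2x2 coordination game: both players get 1 if they match."""
    return 1.0 if profile[0] == profile[1] else 0.0


payoffs = [coordination, coordination]
actions = [[0, 1], [0, 1]]
```

In this toy game, the path `(0, 1) -> (1, 1)` is satisficing (neither player is best responding at `(0, 1)`, so either may switch) and ends at a pure equilibrium, whereas `(1, 1) -> (0, 1)` is not, because player 0 abandons a best response.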
- [46] arXiv:2403.18080 [pdf, other]
-
Title: EgoPoseFormer: A Simple Baseline for Egocentric 3D Human Pose Estimation
Authors: Chenhongyi Yang, Anastasia Tkach, Shreyas Hampali, Linguang Zhang, Elliot J. Crowley, Cem Keskin
Comments: Tech Report
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We present EgoPoseFormer, a simple yet effective transformer-based model for stereo egocentric human pose estimation. The main challenge in egocentric pose estimation is overcoming joint invisibility, which is caused by self-occlusion or the limited field of view (FOV) of head-mounted cameras. Our approach overcomes this challenge with a two-stage pose estimation paradigm: in the first stage, our model leverages global information to estimate each joint's coarse location; in the second stage, it employs a DETR-style transformer to refine the coarse locations by exploiting fine-grained stereo visual features. In addition, we present a deformable stereo operation that enables our transformer to effectively process multi-view features and thus accurately localize each joint in the 3D world. We evaluate our method on the stereo UnrealEgo dataset and show that it significantly outperforms previous approaches while being computationally efficient: it improves MPJPE by 27.4 mm (45% improvement) with only 7.9% of the model parameters and 13.1% of the FLOPs of the state-of-the-art method. Surprisingly, with proper training techniques, we find that even our first-stage pose proposal network can outperform prior methods. We also show that our method can be seamlessly extended to monocular settings, where it achieves state-of-the-art performance on the SceneEgo dataset, improving MPJPE by 25.5 mm (21% improvement) over the best existing method with only 60.7% of the model parameters and 36.4% of the FLOPs.
- [47] arXiv:2403.18085 [pdf, other]
-
Title: ANOCA: AC Network-aware Optimal Curtailment Approach for Dynamic Hosting Capacity
Subjects: Systems and Control (eess.SY)
With exponential growth in distributed energy resources (DERs) coupled with at-capacity distribution grid infrastructure, prosumers cannot always export all their surplus power to the grid without violating technical limits. Consequently, a slew of dynamic hosting capacity (DHC) algorithms have emerged to optimally utilize grid infrastructure while maximizing export from DERs. Most of these DHC algorithms use the concept of operating envelopes (OEs), where the utility gives prosumers technical power export limits within which they are free to export power. Recent studies have shown that OE-based frameworks have drawbacks, as most derive power export limits from convex or linear grid models. Because OEs must capture extreme operating conditions, and convex and linear models only approximate grid physics, the resulting limits can violate technical constraints in practice. However, AC models are unsuitable as well, because AC feasibility may not hold over the whole OE region. To address these gaps, we propose a new two-stage optimization framework for DHC built on three-phase AC models. In this approach, the prosumers first run a receding-horizon multi-period optimization to identify optimal export power setpoints to communicate to the utility. The utility then performs an infeasibility-based optimization to either accept the prosumer's request or dispatch an optimal curtailment signal such that the overall system technical constraints are not violated. To explore various curtailment strategies, we develop $L_1$-, $L_2$-, and $L_\infty$-norm-based dispatch algorithms with an exact three-phase AC model. We test our framework on a 1420-node three-phase meshed distribution network and show that the proposed algorithm optimally curtails DERs while guaranteeing the AC feasibility of the network.
- [48] arXiv:2403.18086 [pdf, ps, other]
-
Title: Generalizing Better Response Paths and Weakly Acyclic Games
Subjects: Computer Science and Game Theory (cs.GT); Theoretical Economics (econ.TH)
Weakly acyclic games generalize potential games and are fundamental to the study of game-theoretic control. In this paper, we present a generalization of weakly acyclic games, and we observe its importance in multi-agent learning when agents employ experimental strategy updates in periods where they fail to best respond. While weak acyclicity is defined in terms of path connectivity properties of a game's better response graph, our generalization is defined using a generalized better response graph. We provide sufficient conditions for this notion of generalized weak acyclicity in both two-player games and $n$-player games. To demonstrate that our generalization is not trivial, we provide examples of games admitting a pure Nash equilibrium that are not generalized weakly acyclic. The generalization presented in this work is closely related to the recent theory of satisficing paths, and the counterexamples presented here constitute the first negative results in that theory.
- [49] arXiv:2403.18088 [pdf, other]
-
Title: Discretize first, filter next: learning divergence-consistent closure models for large-eddy simulation
Comments: 40 pages, 16 figures, 5 tables
Subjects: Numerical Analysis (math.NA); Fluid Dynamics (physics.flu-dyn)
We propose a new neural-network-based large-eddy simulation (LES) framework for the incompressible Navier-Stokes equations based on the paradigm "discretize first, filter and close next". This leads to full model-data consistency and allows neural closure models to be employed in the same environment in which they were trained. Since the LES discretization error is included in the learning process, the closure models can learn to account for the discretization.
Furthermore, we introduce a new divergence-consistent discrete filter defined through face-averaging. The new filter preserves the discrete divergence-free constraint by construction, unlike general discrete filters such as volume-averaging filters. We show that using a divergence-consistent LES formulation coupled with a convolutional neural closure model produces stable and accurate results for both a priori and a posteriori training, while a general (divergence-inconsistent) LES model requires a posteriori training or other stability-enforcing measures.
- [50] arXiv:2403.18092 [pdf, other]
-
Title: OCAI: Improving Optical Flow Estimation by Occlusion and Consistency Aware Interpolation
Comments: CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The scarcity of ground-truth labels is a major challenge in developing optical flow estimation models that are both generalizable and robust. While current methods rely on data augmentation, they have yet to fully exploit the rich information available in labeled video sequences. We propose OCAI, a method that supports robust frame interpolation by generating intermediate video frames alongside the optical flows in between. Using a forward warping approach, OCAI employs occlusion awareness to resolve ambiguities in pixel values and fills in missing values by leveraging the forward-backward consistency of optical flows. Additionally, we introduce a teacher-student style semi-supervised learning method on top of the interpolated frames. Using a pair of unlabeled frames and the teacher model's predicted optical flow, we generate interpolated frames and flows to train a student model. The teacher's weights are maintained as an exponential moving average (EMA) of the student's weights. Our evaluations demonstrate perceptually superior interpolation quality and enhanced optical flow accuracy on established benchmarks such as Sintel and KITTI.
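The teacher update described above follows the standard exponential-moving-average rule $\theta_T \leftarrow \beta\,\theta_T + (1-\beta)\,\theta_S$. A minimal sketch over a dictionary of scalar parameters follows; the decay default of 0.999, the dict-of-floats representation and the function name are illustrative assumptions, not values or code from the paper.

```python
from typing import Dict


def ema_update(
    teacher: Dict[str, float], student: Dict[str, float], decay: float = 0.999
) -> Dict[str, float]:
    """Return new teacher weights: decay * teacher + (1 - decay) * student."""
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must lie in [0, 1]")
    return {
        name: decay * teacher[name] + (1.0 - decay) * student[name]
        for name in teacher
    }


# Toy parameters (hypothetical); a real model would hold tensors instead.
teacher = {"w": 1.0, "b": 0.0}
student = {"w": 0.0, "b": 2.0}
teacher = ema_update(teacher, student, decay=0.9)
```

With a decay close to 1, the teacher changes slowly and smooths over noisy student updates, which is why it can serve as a stable source of pseudo-labels.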
- [51] arXiv:2403.18093 [pdf, other]
-
Title: Enhancing Legal Document Retrieval: A Multi-Phase Approach with Large Language Models
Authors: Hai-Long Nguyen, Duc-Minh Nguyen, Tan-Minh Nguyen, Ha-Thanh Nguyen, Thi-Hai-Yen Vuong, Ken Satoh
Comments: JURISIN 2024
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Large language models with billions of parameters, such as GPT-3.5, GPT-4, and LLaMA, are increasingly prevalent. Numerous studies have explored effective prompting techniques to harness the power of these LLMs for various research problems. Retrieval, specifically in the legal domain, is challenging for the direct application of prompting techniques because of the large number and substantial length of legal articles. This research focuses on maximizing the potential of prompting by placing it as the final phase of the retrieval system, preceded by two supporting phases: BM25 pre-ranking and BERT-based re-ranking. Experiments on the COLIEE 2023 dataset demonstrate that integrating prompting techniques on LLMs into the retrieval system significantly improves retrieval accuracy. However, error analysis reveals several existing issues in the retrieval system that still need resolution.
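The first phase named above, BM25 pre-ranking, can be sketched in a few lines. This is a generic Okapi BM25 scorer with whitespace tokenization, the common defaults $k_1 = 1.5$ and $b = 0.75$, and the smoothed IDF $\ln\big((N - n_t + 0.5)/(n_t + 0.5) + 1\big)$; these choices, the function names and the toy articles are assumptions for illustration, not the paper's configuration.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: str, docs: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score every document against the query with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in tokenized for term in set(d))
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            freq = tf[term]
            # Term-frequency saturation (k1) and length normalization (b).
            score += idf * freq * (k1 + 1) / (freq + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(score)
    return scores


def top_k(query: str, docs: List[str], k: int) -> List[int]:
    """Indices of the k highest-scoring documents, best first."""
    scores = bm25_scores(query, docs)
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)[:k]


articles = [
    "the lessee must pay rent",
    "a contract of sale transfers ownership",
    "rent is due monthly",
]
```

Here `top_k("rent", articles, 2)` ranks the shorter third article ahead of the first, and the second article scores zero. In a multi-phase pipeline, only such a shortlist would be passed on to the more expensive re-ranking and prompting phases.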
- [52] arXiv:2403.18094 [pdf, other]
-
Title: A Personalized Video-Based Hand Taxonomy: Application for Individuals with Spinal Cord Injury
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Hand function is critical for our interactions and quality of life. Spinal cord injuries (SCI) can impair hand function, reducing independence. A comprehensive evaluation of function in home and community settings requires a hand grasp taxonomy for individuals with impaired hand function. Developing such a taxonomy is challenging due to grasp types that are unrepresented in standard taxonomies, uneven data distribution across injury levels, and limited data. This study aims to automatically identify the dominant distinct hand grasps in egocentric video using semantic clustering. Egocentric video recordings collected in the homes of 19 individuals with cervical SCI were used to cluster grasping actions with semantic significance. A deep learning model integrating posture and appearance data was employed to create a personalized hand taxonomy. Quantitative analysis reveals a cluster purity of 67.6% ± 24.2% with 18.0% ± 21.8% redundancy. Qualitative assessment revealed meaningful clusters in video content. This methodology provides a flexible and effective strategy to analyze hand function in the wild. It offers researchers and clinicians an efficient tool for evaluating hand function, aiding sensitive assessments and tailored intervention plans.
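Cluster purity, the headline metric above, assigns each cluster its most frequent ground-truth label and counts the fraction of samples that match that label. A minimal sketch under the standard definition follows; the function name and toy labels are hypothetical, and the paper's per-participant averaging (reported as mean ± standard deviation) is not reproduced.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence


def cluster_purity(cluster_ids: Sequence[Hashable], true_labels: Sequence[Hashable]) -> float:
    """Fraction of samples whose label equals their cluster's majority label."""
    if len(cluster_ids) != len(true_labels) or not true_labels:
        raise ValueError("need two non-empty sequences of equal length")
    members: Dict[Hashable, List[Hashable]] = defaultdict(list)
    for c, y in zip(cluster_ids, true_labels):
        members[c].append(y)
    # For each cluster, count the samples carrying its most common label.
    majority_total = sum(Counter(ys).most_common(1)[0][1] for ys in members.values())
    return majority_total / len(true_labels)
```

For clusters `[0, 0, 0, 1, 1, 1]` with grasp labels `["a", "a", "b", "b", "b", "b"]`, cluster 0 contributes 2 majority samples and cluster 1 contributes 3, so purity is 5/6.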
- [53] arXiv:2403.18096 [pdf, other]
-
Title: Efficient Multi-Band Temporal Video Filter for Reducing Human-Robot Interaction
Authors: Lawrence O'Gorman
Comments: 15 pages, 5 figures, 4 tables
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
Although mobile robots have on-board sensors to perform navigation, their efficiency in completing paths can be enhanced by planning to avoid human interaction. Infrastructure cameras can capture human activity continuously to compile activity analytics for choosing efficient times and routes. We describe a cascade temporal filtering method to efficiently extract short- and long-term activity in two time dimensions, isochronal and chronological, for use in global path planning and local navigation, respectively. The temporal filter can be applied independently or, if object recognition is also required, used as a pre-filter that performs activity gating of the more computationally expensive neural network processing. For a 32-camera testbed network, we show how this hybrid approach can achieve a more than 8-fold improvement in frames-per-second throughput and a 6.5-fold reduction in system power use. We also show how the cost map of static objects in the ROS robot software development framework is augmented with dynamic regions determined from the temporal filter.
- [54] arXiv:2403.18098 [pdf, other]
-
Title: GPTs and Language Barrier: A Cross-Lingual Legal QA Examination
Comments: NLP 2024, Kobe, Japan
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
In this paper, we explore the application of Generative Pre-trained Transformers (GPTs) in cross-lingual legal question-answering (QA) systems using the COLIEE Task 4 dataset. In COLIEE Task 4, given a statement and a set of related legal articles that serve as context, the objective is to determine whether the statement is legally valid, i.e., whether it can be inferred from the provided contextual articles; this is also known as an entailment task. By benchmarking four different combinations of English and Japanese prompts and data, we provide insights into GPTs' performance in multilingual legal QA scenarios, contributing to the development of more efficient and accurate cross-lingual QA solutions in the legal domain.
- [55] arXiv:2403.18100 [pdf, ps, other]
-
Title: Driving Intelligent IoT Monitoring and Control through Cloud Computing and Machine Learning
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
This article explores how to drive intelligent IoT monitoring and control through cloud computing and machine learning. As IoT sensor devices in the network continue to generate large and diverse amounts of data, the collected data is sent to the cloud for statistical analysis, prediction, and other data analysis to achieve business objectives. However, because the cloud computing model is limited by distance, it can be problematic for critical operations in environments where the quality of the Internet connection is not ideal. Therefore, edge computing, as a distributed computing architecture, moves processing applications, data, and services from the central node of the network to logical edge nodes, reducing the dependence on cloud processing and analysis of data and achieving near-end data processing and analysis. The combination of IoT and edge computing can reduce latency, improve efficiency, and enhance security, thereby driving the development of intelligent systems. The paper also introduces the development of IoT monitoring and control technology, the application of edge computing in IoT monitoring and control, and the role of machine learning in data analysis and fault detection. Finally, the application and effect of intelligent IoT monitoring and control systems in industry, agriculture, medicine, and other fields are demonstrated through practical cases and experimental studies.
- [56] arXiv:2403.18101 [pdf, other]
-
Title: Towards Explainable Clustering: A Constrained Declarative based Approach
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
The domain of explainable AI is of interest in all machine learning fields, and it is all the more important in clustering, an unsupervised task whose result must be validated by a domain expert. We aim to find a clustering that has high quality in terms of classic clustering criteria and that is explainable, and we argue that these two dimensions must be considered when building the clustering. We consider that a good global explanation of a clustering should give the characteristics of each cluster, taking into account their ability to describe its objects (coverage) while distinguishing it from the other clusters (discrimination). Furthermore, we aim to leverage expert knowledge, at different levels, on the structure of the expected clustering or on its explanations. In our framework, an explanation of a cluster is a set of patterns, and we propose a novel interpretable constrained clustering method called ECS, for declarative clustering with Explainability-driven Cluster Selection, that integrates structural or domain expert knowledge expressed by means of constraints. It is based on the notions of coverage and discrimination, which are formalized at different levels (cluster / clustering), each allowing for exceptions through parameterized thresholds. Our method relies on four steps: generation of a set of partitions, computation of frequent patterns for each cluster, pruning of clusters that violate some constraints, and selection of clusters and associated patterns to build an interpretable clustering. This last step is combinatorial, and we have developed a Constraint Programming (CP) model to solve it. The method can integrate prior knowledge in the form of user constraints, either before or within the CP model.
- [57] arXiv:2403.18103 [pdf, other]
-
Title: Tutorial on Diffusion Models for Imaging and Vision
Authors: Stanley H. Chan
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
The astonishing growth of generative tools in recent years has empowered many exciting applications in text-to-image and text-to-video generation. The underlying principle behind these generative tools is the concept of diffusion, a particular sampling mechanism that has overcome some shortcomings that were deemed difficult to address in previous approaches. The goal of this tutorial is to discuss the essential ideas underlying diffusion models. The target audience of this tutorial includes undergraduate and graduate students who are interested in doing research on diffusion models or applying these models to solve other problems.
- [58] arXiv:2403.18104 [pdf, other]
-
Title: Mathematical Foundation and Corrections for Full Range Head Pose Estimation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Numerous works on head pose estimation (HPE) propose algorithms or neural-network-based approaches for extracting Euler angles either from facial key points or directly from images of the head region. However, many works fail to provide clear definitions of the coordinate systems and the Euler or Tait-Bryan angle orders in use. It is well known that rotation matrices depend on coordinate systems, and that yaw, roll, and pitch angles are sensitive to their application order. Without precise definitions, it is challenging to validate the correctness of the output head poses and of the drawing routines employed in prior works. In this paper, we thoroughly examine the Euler angles defined in the 300W-LP dataset, head pose estimation methods such as 3DDFA-v2, 6D-RepNet, and WHENet, and the validity of their drawing routines for the Euler angles. When necessary, we infer their coordinate system and yaw, roll, pitch sequence from the provided code. This paper presents (1) code and algorithms for inferring the coordinate system from provided source code, for determining the Euler angle application order, and for extracting precise rotation matrices and Euler angles, (2) code and algorithms for converting poses from one rotation system to another, (3) novel formulae for 2D augmentations of the rotation matrices, and (4) derivations and code for correct drawing routines for rotation matrices and poses. This paper also addresses the feasibility of defining rotations with the right-handed coordinate system used in Wikipedia and SciPy, which makes Euler angle extraction much easier for full-range head pose research.
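The order sensitivity mentioned above is easy to demonstrate numerically. The sketch below assumes a right-handed frame with pitch about $x$, yaw about $y$, and roll about $z$ (one of several conventions in use, chosen here only for illustration; the helper names are hypothetical). It composes the same three angles in two orders, shows that the resulting matrices differ, and shows that angles extracted under the assumed $R_z R_y R_x$ order round-trip only for the matching matrix.

```python
import numpy as np


def rot_x(a: float) -> np.ndarray:
    """Right-handed rotation about the x axis (assumed: pitch)."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])


def rot_y(a: float) -> np.ndarray:
    """Right-handed rotation about the y axis (assumed: yaw)."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


def rot_z(a: float) -> np.ndarray:
    """Right-handed rotation about the z axis (assumed: roll)."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def angles_zyx(r: np.ndarray):
    """Recover (pitch, yaw, roll) assuming r = rot_z(roll) @ rot_y(yaw) @ rot_x(pitch).

    Valid only away from gimbal lock (|yaw| < 90 degrees).
    """
    yaw = -np.arcsin(np.clip(r[2, 0], -1.0, 1.0))
    pitch = np.arctan2(r[2, 1], r[2, 2])
    roll = np.arctan2(r[1, 0], r[0, 0])
    return pitch, yaw, roll


pitch, yaw, roll = np.radians([20.0, 40.0, 10.0])
# The same three angles composed in two different orders.
r_zyx = rot_z(roll) @ rot_y(yaw) @ rot_x(pitch)
r_xyz = rot_x(pitch) @ rot_y(yaw) @ rot_z(roll)
```

Both matrices are valid rotations, yet they differ, and reading `r_xyz` with the Z-Y-X extraction formula returns angles other than the ones used to build it. This is exactly the kind of silent mismatch that undocumented conventions make hard to detect.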
- [59] arXiv:2403.18105 [pdf, other]
-
Title: Large Language Models for Education: A Survey and Outlook
Authors: Shen Wang, Tianlong Xu, Hang Li, Chaoli Zhang, Joleen Liang, Jiliang Tang, Philip S. Yu, Qingsong Wen
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
The advent of Large Language Models (LLMs) has ushered in a new era of possibilities in education. This survey paper summarizes the various LLM technologies used in educational settings from multifaceted perspectives, encompassing student and teacher assistance, adaptive learning, and commercial tools. We systematically review the technological advancements in each perspective, organize related datasets and benchmarks, and identify the risks and challenges associated with deploying LLMs in education. Furthermore, we outline future research opportunities, highlighting promising directions. Our survey aims to provide a comprehensive technological picture for educators, researchers, and policymakers to harness the power of LLMs to revolutionize educational practices and foster a more effective personalized learning environment.
- [60] arXiv:2403.18106 [pdf, ps, other]
-
Title: Authorized Subject Headings in the Online Automatic catalog Environment An Empirical Study on a Sample of Arabic Records
Authors: Ahmed Ammar Hussein Hammam
Comments: in Arabic language
Journal-ref: Arab journal of library and information science, v. 41, issue 1, January, 2021
Subjects: Digital Libraries (cs.DL)
Subject headings are very important to automated catalogs, given the importance of subject searching. This study aims to measure the quality of a group of authorized subject headings in a sample of Arabic bibliographic records in the catalog of Egyptian university libraries by identifying the most important practices, policies, and procedures followed and the tools used. It also assesses the actual capabilities of the lists, thesauri, and guidelines used in establishing subject access points. The study used both the descriptive-analytical and evaluative approaches to achieve its objectives.
- [61] arXiv:2403.18107 [pdf, ps, other]
-
Title: The Need for Climate Data Stewardship: 10 Tensions and Reflections regarding Climate Data Governance
Authors: Stefaan Verhulst
Comments: 13 pages, 2 figures
Subjects: Computers and Society (cs.CY)
Datafication -- the increase in data generation and advancements in data analysis -- offers new possibilities for governing and tackling worldwide challenges such as climate change. However, employing new data sources in policymaking carries various risks, such as exacerbating inequalities, introducing biases, and creating gaps in access. This paper articulates ten core tensions related to climate data and its implications for climate data governance, ranging from the diversity of data sources and stakeholders to issues of quality, access, and the balancing act between local needs and global imperatives. Through examining these tensions, the article advocates for a paradigm shift towards multi-stakeholder governance, data stewardship, and equitable data practices to harness the potential of climate data for public good. It underscores the critical role of data stewards in navigating these challenges, fostering a responsible data ecology, and ultimately contributing to a more sustainable and just approach to climate action and broader social issues.
- [62] arXiv:2403.18111 [pdf, other]
-
Title: Scrolly2Reel: Turning News Graphics into TikToks by Adjusting Narrative Beats and Pacing
Comments: 9 pages, 3 figures
Subjects: Human-Computer Interaction (cs.HC)
As media evolves, storytelling evolves. In 2012, newspapers introduced scrollytelling sequences, or "scrollies," to make news more immersive and interactive on the web. As users scroll through an article, graphics such as animations, charts, and 3D visualizations appear to provide visual dynamics to the story. Today, news consumption is shifting to short-video platforms like TikTok, particularly among younger audiences. We propose repurposing the assets from scrollies and computationally transforming them into videos. By shortening the original written text and precisely synchronizing the timing of the audio narrative with features in the visual scrolling assets, we can create reels with dynamic pacing and narrative beats. We argue that text shortening is essential to producing fast-paced videos that are compelling and visually interesting, and show that when beats are preserved in the output reel, topical alignment between them and the visual assets is crucial to the viewing experience. Understanding narrative pacing and beats in creative forms is key to the user experience of media. They are an important primitive for effectively editing, repurposing, and retargeting content while maintaining a cohesive narrative.
- [63] arXiv:2403.18114 [pdf, other]
-
Title: Segment Any Medical Model Extended
Authors: Yihao Liu, Jiaming Zhang, Andres Diaz-Pinto, Haowei Li, Alejandro Martin-Gomez, Amir Kheradmand, Mehran Armand
Comments: The content of the manuscript was presented at SPIE Medical Imaging 2024 and has been accepted to appear in the proceedings of the conference
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The Segment Anything Model (SAM) has drawn significant attention from researchers working on medical image segmentation because of its generalizability. However, researchers have found that SAM may have limited performance on medical images compared to state-of-the-art non-foundation models. Nevertheless, the community sees potential in extending, fine-tuning, modifying, and evaluating SAM for the analysis of medical images. An increasing number of works focusing on these four directions have been published, proposing variants of SAM. A unified platform would therefore help push the boundary of foundation models for medical images by facilitating the use, modification, and validation of SAM and its variants in medical image segmentation. In this work, we introduce SAMM Extended (SAMME), a platform that integrates new SAM variant models, adopts faster communication protocols, accommodates new interactive modes, and allows for fine-tuning of subcomponents of the models. These features can expand the potential of foundation models like SAM, and the results can be translated to applications such as image-guided therapy, mixed reality interaction, robotic navigation, and data augmentation.
- [64] arXiv:2403.18116 [pdf, other]
-
Title: QuakeSet: A Dataset and Low-Resource Models to Monitor Earthquakes through Sentinel-1
Comments: Accepted at ISCRAM 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Earthquake monitoring is necessary to promptly identify the affected areas and the severity of events, and ultimately to estimate damages and plan the actions needed for restoration. The use of seismic stations to monitor the strength and origin of earthquakes is limited in remote areas, since dense global coverage is not feasible. Identifying and analyzing all affected areas, including those not monitored by traditional stations, is therefore essential. Using social media images in crisis management has proven effective in various situations. However, this approach is still limited by the availability of communication infrastructure during an earthquake and by the presence of people in the area. Moreover, social media images and messages cannot be used to effectively estimate the actual severity and characteristics of earthquakes. Using satellites to monitor changes around the globe makes it possible to exploit instrumentation that is not limited by the visible spectrum, the presence of land infrastructure, or people in the affected areas. In this work, we propose a new dataset composed of images taken from Sentinel-1 and a new series of tasks to help monitor earthquakes from a new, detailed perspective. Along with the data, we provide a series of traditional machine learning and deep learning models as baselines to assess the effectiveness of ML-based models in earthquake analysis.
- [65] arXiv:2403.18117 [pdf, ps, other]
-
Title: TDIP: Tunable Deep Image Processing, a Real Time Melt Pool Monitoring Solution
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In the era of Industry 4.0, Additive Manufacturing (AM), particularly metal AM, has emerged as a significant contributor due to its innovative and cost-effective approach to fabricating highly intricate geometries. Despite its potential, this industry still lacks real-time-capable process monitoring algorithms. Recent advancements in this field suggest that Melt Pool (MP) signatures during the fabrication process contain crucial information about process dynamics and quality. To obtain this information, various sensing approaches, such as vision modules based on high-speed cameras, are employed for online fabrication monitoring. However, many conventional in-depth analyses still cannot process all the recorded data simultaneously. Although conventional Image Processing (ImP) solutions provide a targeted, tunable approach, they pose a trade-off between convergence certainty and convergence speed. As a result, conventional methods are not suitable for a dynamically changing application like MP monitoring. Therefore, this article proposes a Tunable Deep Image Processing (TDIP) method to address data-rich monitoring needs in real time. The proposed model is first trained to replicate an ImP algorithm with tunable features and methodology. The TDIP model is then further improved to account for MP geometries and fabrication quality based on the vision input and process parameters. The TDIP model achieved over 94% estimation accuracy with an $R^2$ score above 96% for quality, geometry, and MP signature estimation and isolation. The TDIP model can process 500 images per second, whereas conventional methods take a few minutes per image. This significant reduction in processing time enables the integration of vision-based monitoring in real time for process and quality estimation.
- [66] arXiv:2403.18118 [pdf, other]
-
Title: EgoLifter: Open-world 3D Segmentation for Egocentric Perception
Comments: Preprint. Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper we present EgoLifter, a novel system that can automatically segment scenes captured from egocentric sensors into a complete decomposition of individual 3D objects. The system is specifically designed for egocentric data, where scenes contain hundreds of objects captured from natural (non-scanning) motion. EgoLifter adopts 3D Gaussians as the underlying representation of 3D scenes and objects and uses segmentation masks from the Segment Anything Model (SAM) as weak supervision to learn flexible and promptable definitions of object instances, free of any specific object taxonomy. To handle dynamic objects in egocentric videos, we design a transient prediction module that learns to filter them out of the 3D reconstruction. The result is a fully automatic pipeline that reconstructs 3D object instances as collections of 3D Gaussians that collectively compose the entire scene. We created a new benchmark on the Aria Digital Twin dataset that quantitatively demonstrates its state-of-the-art performance in open-world 3D segmentation from natural egocentric input. We also run EgoLifter on various egocentric activity datasets, which shows the promise of the method for 3D egocentric perception at scale.
- [67] arXiv:2403.18119 [pdf, other]
Title: Multiple Model Reference Adaptive Control with Blending for Non-Square Multivariable Systems
Comments: 10 pages, 7 figures, IEEE Journal Submission
Subjects: Systems and Control (eess.SY)
In this paper we develop a multiple model reference adaptive controller (MMRAC) with blending. The systems under consideration are multi-input, linear, and time-invariant, with uncertain parameters that lie inside a known, compact, and convex set. They are also non-square, i.e., the number of inputs is not equal to the number of states. Moreover, the full state of the plant is available for feedback. A multiple model online identification scheme for the plant's state and input matrices is developed that guarantees the estimated parameters converge to the underlying plant model under the assumption of persistence of excitation. Using an exact matching condition, the parameter estimates are used in a control law such that the plant's states asymptotically track the reference signal generated by a state-space reference model. The control architecture is proven to keep all closed-loop signals bounded and to asymptotically drive the state tracking error to zero. Numerical simulations illustrate the stability and efficacy of the proposed MMRAC scheme.
- [68] arXiv:2403.18120 [pdf, other]
Title: Don't Trust: Verify -- Grounding LLM Quantitative Reasoning with Autoformalization
Authors: Jin Peng Zhou, Charles Staats, Wenda Li, Christian Szegedy, Kilian Q. Weinberger, Yuhuai Wu
Comments: ICLR 2024
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Large language models (LLMs), such as Google's Minerva and OpenAI's GPT families, are becoming increasingly capable of solving mathematical quantitative reasoning problems. However, they still make unjustified logical and computational errors in their reasoning steps and answers. In this paper, we leverage the fact that if the training corpus of LLMs contained sufficiently many examples of formal mathematics (e.g., in Isabelle, a formal theorem proving environment), they can be prompted to translate, i.e., autoformalize, informal mathematical statements into formal Isabelle code, which can be verified automatically for internal consistency. This provides a mechanism to automatically reject solutions whose formalized versions are inconsistent within themselves or with the formalized problem statement. We evaluate our method on the GSM8K, MATH, and MultiArith datasets and demonstrate that our approach provides a consistently better heuristic than vanilla majority voting, the previously best method for identifying correct answers, by more than 12% on GSM8K. In our experiments, it improves results consistently across all datasets and LLM sizes. The code can be found at https://github.com/jinpz/dtv.
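The filtering idea can be sketched as a vote restricted to verified samples. This is a minimal illustration, not the authors' code from the repository above: each sampled solution is assumed to arrive as an `(answer, verified)` pair, where the hypothetical `verified` flag stands in for whether its Isabelle formalization checked out, and the fallback to plain majority voting when nothing verifies is our assumption.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def majority_vote(answers: Sequence[str]) -> Optional[str]:
    """Most common answer; ties go to the answer encountered first."""
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]


def verified_majority_vote(samples: Sequence[Tuple[str, bool]]) -> Optional[str]:
    """Vote only over answers whose (hypothetical) formal check passed.

    Falls back to plain majority voting if no sample verifies.
    """
    verified = [ans for ans, ok in samples if ok]
    if verified:
        return majority_vote(verified)
    return majority_vote([ans for ans, _ in samples])


# Five sampled solutions: "42" is the most frequent, but only "36" verifies.
samples = [("42", False), ("42", False), ("36", True), ("42", False), ("36", True)]
print(majority_vote([a for a, _ in samples]))  # 42
print(verified_majority_vote(samples))         # 36
```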
- [69] arXiv:2403.18121 [pdf, other]
Title: ChatGPT Role-play Dataset: Analysis of User Motives and Model Naturalness
Comments: Accepted by LREC-COLING 2024
Subjects: Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
Recent advances in interactive large language models like ChatGPT have revolutionized various domains; however, their behavior in natural and role-play conversation settings remains underexplored. In our study, we address this gap by investigating in depth how ChatGPT behaves during conversations, analyzing its interactions in both ordinary and role-play settings. We introduce a novel dataset covering a broad range of human-AI conversations, annotated with user motives and model naturalness, to examine (i) how humans engage with the conversational AI model, and (ii) how natural the AI model's responses are. Our study highlights the diversity of user motives when interacting with ChatGPT and the variability of AI naturalness, which not only reveals the nuanced dynamics of natural conversations between humans and AI but also opens new avenues for improving the effectiveness of human-AI communication.
- [70] arXiv:2403.18125 [pdf, ps, other]
Title: For those who don't know (how) to ask: Building a dataset of technology questions for digital newcomers
Comments: Presented at the AI4ED workshop at AAAI 2024
Subjects: Computation and Language (cs.CL)
While the rise of large language models (LLMs) has created rich new opportunities to learn about digital technology, many people on the margins of this technology struggle to gain and maintain competency due to lexical or conceptual barriers that prevent them from asking appropriate questions. Although there have been many efforts to understand the factuality of LLM-created content and the ability of LLMs to answer questions, it is not well understood how unclear or nonstandard language in queries affects model outputs. We propose the creation of a dataset that captures questions of digital newcomers and outsiders, using data we have compiled from a decade's worth of one-on-one tutoring. In this paper we lay out our planned efforts and some potential uses of this dataset.
- [71] arXiv:2403.18127 [pdf, ps, other]
Title: A Correction of Pseudo Log-Likelihood Method
Comments: 7 pages
Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)
Pseudo log-likelihood is a type of maximum likelihood estimation (MLE) method used in various fields, including contextual bandits, influence maximization of social networks, and causal bandits. However, in previous literature \citep{li2017provably, zhang2022online, xiong2022combinatorial, feng2023combinatorial1, feng2023combinatorial2}, the log-likelihood function may not be bounded, which may leave the proposed algorithms ill-defined. In this paper, we give a counterexample in which maximum pseudo log-likelihood estimation fails and then provide a solution to correct the algorithms in \citep{li2017provably, zhang2022online, xiong2022combinatorial, feng2023combinatorial1, feng2023combinatorial2}.
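A standard textbook case shows how a maximum likelihood estimate can fail to exist; it is a related illustration, not the counterexample from the paper. For logistic regression on linearly separable data, the log-likelihood keeps increasing toward 0 as the parameter grows, so no finite maximizer exists (here the function is bounded above, unlike the unbounded case the abstract describes). The two-point dataset below is hypothetical.

```python
import math
from typing import Sequence


def log_sigmoid(z: float) -> float:
    """Numerically stable log(1 / (1 + exp(-z)))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def log_likelihood(theta: float, xs: Sequence[float], ys: Sequence[int]) -> float:
    """Bernoulli log-likelihood with p(y = 1 | x) = sigmoid(theta * x)."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = theta * x
        total += log_sigmoid(z) if y == 1 else log_sigmoid(-z)
    return total


# Hypothetical data, perfectly separated at x = 0.
xs, ys = [-1.0, 1.0], [0, 1]
thetas = [1.0, 5.0, 10.0, 20.0]
values = [log_likelihood(t, xs, ys) for t in thetas]
# The values increase monotonically toward 0 without reaching it,
# so the maximization over theta has no finite solution.
for t, v in zip(thetas, values):
    print(t, v)
```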
- [72] arXiv:2403.18128 [pdf, other]
Title: HealthGAT: Node Classifications in Electronic Health Records using Graph Attention Networks
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY)
While electronic health records (EHRs) are widely used across various applications in healthcare, most applications use the EHRs in their raw (tabular) format. Relying on raw or simple data pre-processing can greatly limit the performance or even applicability of downstream tasks using EHRs. To address this challenge, we present HealthGAT, a novel graph attention network framework that utilizes a hierarchical approach to generate embeddings from EHR, surpassing traditional graph-based methods. Our model iteratively refines the embeddings for medical codes, resulting in improved EHR data analysis. We also introduce customized EHR-centric auxiliary pre-training tasks to leverage the rich medical knowledge embedded within the data. This approach provides a comprehensive analysis of complex medical relationships and offers significant advancement over standard data representation techniques. HealthGAT has demonstrated its effectiveness in various healthcare scenarios through comprehensive evaluations against established methodologies. Specifically, our model shows outstanding performance in node classification and downstream tasks such as predicting readmissions and diagnosis classifications.
Our code is available at https://github.com/healthylaife/HealthGAT
- [73] arXiv:2403.18132 [pdf, other]
Title: Recommendation of data-free class-incremental learning algorithms by simulating future data
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Class-incremental learning deals with sequential data streams composed of batches of classes. Various algorithms have been proposed to address the challenging case where samples from past classes cannot be stored. However, selecting an appropriate algorithm for a user-defined setting is an open problem, as the relative performance of these algorithms depends on the incremental setting. To solve this problem, we introduce an algorithm recommendation method that simulates the future data stream. Given an initial set of classes, it leverages generative models to simulate future classes from the same visual domain. We evaluate recent algorithms on the simulated stream and recommend the one that performs best in the user-defined incremental setting. We illustrate the effectiveness of our method on three large datasets using six algorithms and six incremental settings. Our method outperforms competitive baselines, and its performance is close to that of an oracle choosing the best algorithm in each setting. This work helps facilitate the practical deployment of incremental learning.
- [74] arXiv:2403.18133 [pdf, other]
Title: AE SemRL: Learning Semantic Association Rules with Autoencoders
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Association Rule Mining (ARM) is the task of learning associations among data features in the form of logical rules. Mining association rules from high-dimensional numerical data, for example, time series data from a large number of sensors in a smart environment, is a computationally intensive task. In this study, we propose an Autoencoder-based approach to learn and extract association rules from time series data (AE SemRL). Moreover, we argue that in the presence of semantic information related to time series data sources, semantics can facilitate learning generalizable and explainable association rules. Despite enriching time series data with additional semantic features, AE SemRL makes learning association rules from high-dimensional data feasible. Our experiments show that semantic association rules can be extracted from a latent representation created by an Autoencoder, and that this method runs on the order of hundreds of times faster than state-of-the-art ARM approaches in many scenarios. We believe that this study advances a new way of extracting associations from representations and has the potential to inspire more research in this field.
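Association rules of the form A → C are conventionally scored by support and confidence. The sketch below computes these two standard measures over sets of discretized sensor states; it does not reproduce AE SemRL's autoencoder-based extraction, and the item names in the usage example are hypothetical.

```python
from typing import List, Set


def support(itemset: Set[str], transactions: List[Set[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        return 0.0
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent: Set[str], consequent: Set[str],
               transactions: List[Set[str]]) -> float:
    """Estimated P(consequent | antecedent) = supp(A ∪ C) / supp(A)."""
    supp_a = support(antecedent, transactions)
    if supp_a == 0:
        return 0.0
    return support(antecedent | consequent, transactions) / supp_a


# Hypothetical discretized sensor readings, one set per time step.
tx = [{"temp_high", "fan_on"}, {"temp_high", "fan_on"}, {"temp_high"}, {"temp_low"}]
print(support({"temp_high"}, tx))                 # 0.75
print(confidence({"temp_high"}, {"fan_on"}, tx))  # about 0.667
```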
- [75] arXiv:2403.18136 [pdf, other]
Title: Securing GNNs: Explanation-Based Identification of Backdoored Training Graphs
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Graph Neural Networks (GNNs) have gained popularity in numerous domains, yet they are vulnerable to backdoor attacks that can compromise their performance and ethical application. The detection of these attacks is crucial for maintaining the reliability and security of GNN classification tasks, but effective detection techniques are lacking. Following an initial investigation, we observed that while graph-level explanations can offer limited insights, their effectiveness in detecting backdoor triggers is inconsistent and incomplete. To bridge this gap, we extract and transform secondary outputs of GNN explanation mechanisms, designing seven novel metrics that more effectively detect backdoor attacks. Additionally, we develop an adaptive attack to rigorously evaluate our approach. We test our method on multiple benchmark datasets and examine its efficacy against various attack models. Our results show that our method can achieve high detection performance, marking a significant advancement in safeguarding GNNs against backdoor attacks.
- [76] arXiv:2403.18140 [pdf, other]
Title: Juru: Legal Brazilian Large Language Model from Reputable Sources
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
The high computational cost associated with pretraining large language models limits research on them. Two strategies have emerged to address this issue: domain specialization and pretraining with high-quality data. To explore these strategies, we specialized the Sabiá-2 Small model with 1.9 billion unique tokens from reputable Brazilian legal sources and conducted few-shot evaluations on legal and general knowledge exams. Our model, Juru, demonstrates the benefits of domain specialization with a reduced amount of pretraining data. However, this specialization comes at the expense of degraded performance in other knowledge areas within the same language. This study contributes to the growing body of scientific evidence showing that pretraining data selection may enhance the performance of large language models, enabling the exploration of these models at a lower cost.
- [77] arXiv:2403.18142 [pdf, other]
Title: HERTA: A High-Efficiency and Rigorous Training Algorithm for Unfolded Graph Neural Networks
Subjects: Machine Learning (cs.LG)
As a variant of Graph Neural Networks (GNNs), Unfolded GNNs offer enhanced interpretability and flexibility over traditional designs. Nevertheless, they still suffer from scalability challenges when it comes to the training cost. Although many methods have been proposed to address the scalability issues, they mostly focus on per-iteration efficiency, without worst-case convergence guarantees. Moreover, those methods typically add components to or modify the original model, thus possibly breaking the interpretability of Unfolded GNNs. In this paper, we propose HERTA: a High-Efficiency and Rigorous Training Algorithm for Unfolded GNNs that accelerates the whole training process, achieving a nearly-linear time worst-case training guarantee. Crucially, HERTA converges to the optimum of the original model, thus preserving the interpretability of Unfolded GNNs. Additionally, as a byproduct of HERTA, we propose a new spectral sparsification method applicable to normalized and regularized graph Laplacians that ensures tighter bounds for our algorithm than existing spectral sparsifiers do. Experiments on real-world datasets verify the superiority of HERTA as well as its adaptability to various loss functions and optimizers.
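HERTA's sparsifier targets normalized and regularized graph Laplacians. As a reference point, the sketch below builds one common form of that matrix, I - D^{-1/2} A D^{-1/2} + λI; the exact regularization used in the paper is an assumption here, and neither the sparsification step nor HERTA's training loop is shown. On a 4-cycle the unregularized eigenvalues are 0, 1, 1, 2, so adding λ = 0.1 shifts each of them by 0.1.

```python
import numpy as np


def normalized_laplacian(adj: np.ndarray, reg: float = 0.0) -> np.ndarray:
    """Return I - D^{-1/2} A D^{-1/2} + reg * I for a symmetric adjacency A.

    Assumes every node has positive degree (no isolated nodes).
    """
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    n = adj.shape[0]
    # Scale rows and columns of A by D^{-1/2} via broadcasting.
    lap = np.eye(n) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    return lap + reg * np.eye(n)


# 4-cycle graph: 0-1-2-3-0
adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]], dtype=float)
eigs = np.linalg.eigvalsh(normalized_laplacian(adj, reg=0.1))
print(eigs)  # ascending eigenvalues of the regularized Laplacian
```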
- [78] arXiv:2403.18144 [pdf, other]
Title: Leak and Learn: An Attacker's Cookbook to Train Using Leaked Data from Federated Learning
Comments: Accepted to CVPR 2024
Subjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
Federated learning is a decentralized learning paradigm introduced to preserve the privacy of client data. Despite this, prior work has shown that an attacker at the server can still reconstruct the private training data using only the client updates. These attacks are known as data reconstruction attacks and fall into two major categories: gradient inversion (GI) and linear layer leakage (LLL) attacks. However, despite demonstrating the effectiveness of these attacks in breaching privacy, prior work has not investigated the usefulness of the reconstructed data for downstream tasks. In this work, we explore data reconstruction attacks through the lens of training and improving models with leaked data. We demonstrate the effectiveness of both GI and LLL attacks for maliciously training models on the leaked data, yielding models more accurate than those from a benign federated learning strategy. Counter-intuitively, this bump in training quality can occur despite limited reconstruction quality or a small total number of leaked images. Finally, we show the limitations of these attacks for downstream training, separately for GI attacks and for LLL attacks.
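Linear layer leakage rests on a simple identity: for a fully connected layer y = Wx + b, the gradient of any loss satisfies ∂L/∂W = (∂L/∂y) xᵀ and ∂L/∂b = ∂L/∂y, so dividing row i of the weight gradient by the i-th bias gradient recovers the input x. The sketch below checks this for a single sample with a squared-error loss; it is a textbook illustration, not the paper's attack, and practical LLL attacks need additional mechanisms to separate inputs whose gradients are aggregated within a batch.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=5)          # private client input the attacker wants
W = rng.normal(size=(3, 5))     # fully connected layer weights
b = rng.normal(size=3)          # layer bias
target = rng.normal(size=3)

# Client side: forward pass and gradients of L = 0.5 * ||W x + b - target||^2
y = W @ x + b
dL_dy = y - target
grad_W = np.outer(dL_dy, x)     # dL/dW = (dL/dy) x^T
grad_b = dL_dy                  # dL/db = dL/dy

# Server side: any row with a nonzero bias gradient reveals x exactly.
i = int(np.argmax(np.abs(grad_b)))
x_reconstructed = grad_W[i] / grad_b[i]
print(np.max(np.abs(x_reconstructed - x)))  # reconstruction error (tiny)
```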
- [79] arXiv:2403.18145 [pdf, other]
Title: A Real-Time Rescheduling Algorithm for Multi-robot Plan Execution
Comments: ICAPS 2024
Subjects: Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Robotics (cs.RO)
One area of research in multi-agent path finding is how to replan efficiently when agents are delayed during execution. One option is to reschedule the passing order of agents, i.e., the sequence in which agents visit the same location. To this end, we propose Switchable-Edge Search (SES), an A*-style algorithm designed to find optimal passing orders. We prove the optimality of SES and evaluate its efficiency via simulations. The best variant of SES takes less than 1 second for small- and medium-sized problems and runs up to 4 times faster than baselines for large-sized problems.
- [80] arXiv:2403.18146 [pdf, ps, other]
Title: Adaptive TTD Configurations for Near-Field Communications: An Unsupervised Transformer Approach
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
True-time delayers (TTDs) are popular analog devices for facilitating near-field wideband beamforming subject to the spatial-wideband effect. In this paper, an adaptive TTD configuration is proposed for short-range TTDs. Compared to existing TTD configurations, the proposed one can effectively combat the spatial-wideband effect for arbitrary user locations and array shapes with the aid of a switch network. A novel end-to-end deep neural network is proposed to optimize the hybrid beamforming with adaptive TTDs for maximizing spectral efficiency. 1) First, based on the U-Net architecture, a near-field channel learning module (NFC-LM) is proposed for adaptive beamformer design by extracting the latent channel response features of various users across different frequencies. In the NFC-LM, an improved cross attention (CA) is introduced to further optimize beamformer design by enhancing the latent feature connection between the near-field channel and different beamformers. 2) Second, a switch multi-user transformer (S-MT) is proposed to adaptively control the connection between TTDs and phase shifters (PSs). In the S-MT, an improved multi-head attention, namely multi-user attention (MSA), is introduced to optimize the switch network by exploring the latent channel relations among various users. 3) Third, a multi-feature cross attention (MCA) is introduced to simultaneously optimize the NFC-LM and S-MT by enhancing the latent feature correlation between beamformers and the switch network. Numerical simulation results show that 1) the proposed adaptive TTD configuration effectively eliminates the spatial-wideband effect under uniform linear array (ULA) and uniform circular array (UCA) architectures, and 2) the proposed deep neural network can provide near-optimal spectral efficiency and solve the multi-user beamformer design and dynamic connection problem in real time.
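The spatial-wideband effect can be seen numerically: phase shifters apply the same phase at every frequency, whereas true-time delays produce a phase that scales with frequency. The sketch below compares the normalized array gain of a phase-shifter-only beamformer tuned at the carrier with an idealized per-element TTD beamformer for a near-field user. All values (100 GHz carrier, 256-element half-wavelength ULA, user at 5 m and 30°) are assumptions for illustration, and the paper's adaptive configuration, with shared TTDs behind a switch network, is not modeled.

```python
import numpy as np

C = 3e8          # speed of light (m/s)
FC = 100e9       # assumed carrier frequency (Hz)
N = 256          # assumed number of ULA elements
D = C / FC / 2   # half-wavelength spacing at the carrier


def element_distances(r: float, theta: float) -> np.ndarray:
    """Exact (spherical-wave) distance from each element to a near-field user."""
    pos = (np.arange(N) - (N - 1) / 2) * D           # element positions on the x-axis
    ux, uy = r * np.sin(theta), r * np.cos(theta)    # user location
    return np.sqrt((ux - pos) ** 2 + uy ** 2)


def normalized_gain(f: float, dists: np.ndarray, use_ttd: bool) -> float:
    """|w^T h| / sqrt(N); equals 1 when the beamformer matches the channel."""
    h = np.exp(-2j * np.pi * f * dists / C)          # channel at frequency f
    if use_ttd:
        # Idealized per-element true-time delay: phase scales with f.
        w = np.exp(2j * np.pi * f * dists / C) / np.sqrt(N)
    else:
        # Frequency-flat phase shifters designed at the carrier FC.
        w = np.exp(2j * np.pi * FC * dists / C) / np.sqrt(N)
    return float(np.abs(w @ h) / np.sqrt(N))


dists = element_distances(r=5.0, theta=np.deg2rad(30))
for f in (FC - 5e9, FC, FC + 5e9):
    print(f"{f / 1e9:.0f} GHz  PS: {normalized_gain(f, dists, False):.3f}"
          f"  TTD: {normalized_gain(f, dists, True):.3f}")
```

With these assumed values, the phase-shifter gain equals 1 at the carrier but drops sharply 5 GHz away from it, while the idealized TTD gain stays at 1 across the band.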
- [81] arXiv:2403.18147 [pdf, other]
Title: Divide, Conquer, Combine Bayesian Decision Tree Sampling
Comments: 38 pages, 5 figures
Subjects: Machine Learning (cs.LG)
Decision trees are commonly used predictive models due to their flexibility and interpretability. This paper is directed at quantifying the uncertainty of decision tree predictions by employing a Bayesian inference approach. This is challenging because these approaches need to explore both the tree structure space and the space of decision parameters associated with each tree structure. This has been handled by using Markov Chain Monte Carlo (MCMC) methods, where a Markov Chain is constructed to provide samples from the desired Bayesian estimate. Importantly, the structure and the decision parameters are tightly coupled; small changes in the tree structure can demand vastly different decision parameters to provide accurate predictions. A challenge for existing MCMC approaches is proposing joint changes in both the tree structure and the decision parameters that result in efficient sampling. This paper takes a different approach, where each distinct tree structure is associated with a unique set of decision parameters. The proposed approach, entitled DCC-Tree, is inspired by the work in Zhou et al. [23] for probabilistic programs and Cochrane et al. [4] for Hamiltonian Monte Carlo (HMC) based sampling for decision trees. Results show that DCC-Tree performs comparably to other HMC-based methods and better than existing Bayesian tree methods while improving on consistency and reducing the per-proposal complexity.
- [82] arXiv:2403.18148 [pdf, other]
Title: Large Language Models Produce Responses Perceived to be Empathic
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Large Language Models (LLMs) have demonstrated surprising performance on many tasks, including writing supportive messages that display empathy. Here, we had these models generate empathic messages in response to posts describing common life experiences, such as workplace situations, parenting, relationships, and other anxiety- and anger-eliciting situations. Across two studies (N=192, 202), we showed human raters a variety of responses written by several models (GPT4 Turbo, Llama2, and Mistral), and had people rate these responses on how empathic they seemed to be. We found that LLM-generated responses were consistently rated as more empathic than human-written responses. Linguistic analyses also show that these models write in distinct, predictable "styles", in terms of their use of punctuation, emojis, and certain words. These results highlight the potential of using LLMs to enhance human peer support in contexts where empathy is important.
- [83] arXiv:2403.18149 [pdf, other]
Title: Code Generation for Conic Model-Predictive Control on Microcontrollers with TinyMPC
Comments: Submitted to CDC, 2024. First two authors contributed equally
Subjects: Robotics (cs.RO); Systems and Control (eess.SY); Optimization and Control (math.OC)
Conic constraints appear in many important control applications like legged locomotion, robotic manipulation, and autonomous rocket landing. However, current solvers for conic optimization problems have relatively heavy computational demands in terms of both floating-point operations and memory footprint, making them impractical for use on small embedded devices. We extend TinyMPC, an open-source, high-speed solver targeting low-power embedded control applications, to handle second-order cone constraints. We also present code-generation software to enable deployment of TinyMPC on a variety of microcontrollers. We benchmark our generated code against state-of-the-art embedded QP and SOCP solvers, demonstrating a two-order-of-magnitude speed increase over ECOS while consuming less memory. Finally, we demonstrate TinyMPC's efficacy on the Crazyflie, a lightweight, resource-constrained quadrotor with fast dynamics. TinyMPC and its code-generation tools are publicly available at https://tinympc.org.
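Second-order cone constraints, such as friction cones in legged locomotion or thrust-pointing cones in rocket landing, are commonly handled in ADMM-type first-order solvers through a Euclidean projection onto the cone, which has a closed form. The sketch below implements that standard projection; whether TinyMPC's generated code uses this exact routine is an assumption, and the function name `project_soc` is hypothetical.

```python
import numpy as np


def project_soc(v: np.ndarray, s: float):
    """Euclidean projection of (v, s) onto the cone {(v, s) : ||v||_2 <= s}."""
    norm_v = float(np.linalg.norm(v))
    if norm_v <= s:
        # Already inside the cone: unchanged.
        return v.copy(), s
    if norm_v <= -s:
        # Inside the polar cone: the closest cone point is the apex.
        return np.zeros_like(v), 0.0
    # Otherwise project onto the cone boundary.
    scale = (norm_v + s) / 2.0
    return scale * v / norm_v, scale


# (3, 4, 0) lies outside the cone; its projection lands on the boundary.
vp, sp = project_soc(np.array([3.0, 4.0]), 0.0)
print(vp, sp)  # [1.5 2. ] 2.5
```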
- [84] arXiv:2403.18152 [pdf, other]
Title: Large Language Models as Financial Data Annotators: A Study on Effectiveness and Efficiency
Authors: Toyin Aguda, Suchetha Siddagangappa, Elena Kochkina, Simerjot Kaur, Dongsheng Wang, Charese Smiley, Sameena Shah
Comments: Accepted to LREC-COLING 2024
Subjects: Computation and Language (cs.CL)
Collecting labeled datasets in finance is challenging due to the scarcity of domain experts and the higher cost of employing them. While Large Language Models (LLMs) have demonstrated remarkable performance in data annotation tasks on general-domain datasets, their effectiveness on domain-specific datasets remains underexplored. To address this gap, we investigate the potential of LLMs as efficient data annotators for extracting relations in financial documents. We compare the annotations produced by three LLMs (GPT-4, PaLM 2, and MPT Instruct) against those of expert annotators and crowdworkers. We demonstrate that current state-of-the-art LLMs can be sufficient alternatives to non-expert crowdworkers. We analyze models using various prompts and parameter settings and find that customizing the prompts for each relation group by providing specific examples belonging to that group is paramount. Furthermore, we introduce a reliability index (LLM-RelIndex) used to identify outputs that may require expert attention. Finally, we perform an extensive time, cost, and error analysis and provide recommendations for the collection and usage of automated annotations in domain-specific settings.
- [85] arXiv:2403.18158 [pdf, other]
Title: The Effects of Short Video-Sharing Services on Video Copy Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Short video-sharing services that allow users to post 10-30 second videos (e.g., YouTube Shorts and TikTok) have attracted a lot of attention in recent years. However, conventional video copy detection (VCD) methods mainly focus on general video-sharing services (e.g., YouTube and Bilibili), and the effects of short video-sharing services on video copy detection are still unclear. Considering that illegally copied videos in short video-sharing services have service-distinctive characteristics, especially in their time lengths, the pros and cons of VCD in those services need to be analyzed. In this paper, we examine the effects of short video-sharing services on VCD by constructing a dataset that has short video-sharing service characteristics. Our novel dataset is automatically constructed from a publicly available dataset to have reference videos and fixed short-time-length query videos, and this automated procedure ensures the reproducibility and data privacy preservation of this paper. From the experimental results focusing on segment-level and video-level situations, we observe three effects: "Segment-level VCD in short video-sharing services is more difficult than that in general video-sharing services", "Video-level VCD in short video-sharing services is easier than that in general video-sharing services", and "The video alignment component mainly suppresses the detection performance in short video-sharing services".
- [86] arXiv:2403.18159 [pdf, other]
Title: Oh! We Freeze: Improving Quantized Knowledge Distillation via Signal Propagation Analysis for Large Language Models
Authors: Kartikeya Bhardwaj, Nilesh Prasad Pandey, Sweta Priyadarshi, Kyunggeun Lee, Jun Ma, Harris Teague
Comments: Accepted at Practical ML for Low Resource Settings Workshop at ICLR 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Large generative models, such as large language models (LLMs) and diffusion models, have revolutionized the fields of NLP and computer vision, respectively. However, their slow inference and high computation and memory requirements make it challenging to deploy them on edge devices. In this study, we propose a lightweight quantization-aware fine-tuning technique using knowledge distillation (KD-QAT) to improve the performance of 4-bit weight-quantized LLMs, using commonly available datasets to realize a popular language use case: on-device chat applications. To improve this fine-tuning paradigm, as our main contributions, we provide insights into the stability of KD-QAT by empirically studying gradient propagation during training, to better understand the vulnerabilities of KD-QAT-based approaches to low-bit quantization errors. Based on our insights, we propose ov-freeze, a simple technique to stabilize the KD-QAT process. Finally, we experiment with the popular 7B LLaMAv2-Chat model at the 4-bit quantization level and demonstrate that ov-freeze results in near floating-point precision performance, i.e., less than 0.7% loss of accuracy on Commonsense Reasoning benchmarks.
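KD-QAT fine-tunes a model whose weights are quantized to 4 bits. A common way to simulate that during training is round-to-nearest quantization followed by dequantization; the sketch below does this with a symmetric, per-output-channel scale that maps each row onto the integer range [-8, 7]. The symmetric per-channel scheme is an assumption, not necessarily the paper's, and neither the distillation loss nor ov-freeze is shown.

```python
import numpy as np


def quantize_int4_per_channel(w: np.ndarray):
    """Symmetric per-row 4-bit quantization of a 2-D weight matrix.

    Returns int8-stored codes in [-8, 7] and one float scale per row.
    """
    max_abs = np.abs(w).max(axis=1, keepdims=True)
    # Map the largest |w| in each row to 7; all-zero rows get scale 1.
    scale = np.where(max_abs > 0, max_abs / 7.0, 1.0)
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale


def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Map integer codes back to real values ("fake quantization")."""
    return q.astype(np.float32) * scale


rng = np.random.default_rng(0)
w = rng.normal(size=(4, 16)).astype(np.float32)
q, s = quantize_int4_per_channel(w)
err = np.abs(w - dequantize(q, s)).max()
print(q.min(), q.max(), err)  # codes stay in [-8, 7]; error is at most scale / 2
```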
- [87] arXiv:2403.18160 [pdf, other]
Title: Eternagram: Probing Player Attitudes in Alternate Climate Scenarios Through a ChatGPT-Driven Text Adventure
Comments: 22 pages, 6 figures, Accepted by CHI Conference on Human Factors in Computing Systems 2024
Subjects: Human-Computer Interaction (cs.HC)
Conventional methods of assessing attitudes towards climate change are limited in capturing authentic opinions, primarily stemming from a lack of context-specific assessment strategies and an overreliance on simplistic surveys. Game-based Assessments (GBA) have demonstrated the ability to overcome these issues by immersing participants in engaging gameplay within carefully crafted, scenario-based environments. Concurrently, advancements in AI and Natural Language Processing (NLP) show promise in enhancing the gamified testing environment, achieving this by generating context-aware, human-like dialogues that contribute to a more natural and effective assessment. Our study introduces a new technique for probing climate change attitudes by actualizing a GPT-driven chatbot system in harmony with a game design depicting a futuristic climate scenario. The correlation analysis reveals an assimilation effect, where players' post-game climate awareness tends to align with their in-game perceptions. Key predictors of pro-climate attitudes are identified as traits like 'Openness' and 'Agreeableness', and a preference for democratic values.
- [88] arXiv:2403.18162 [pdf, other]
Title: Optimizing Cyber Response Time on Temporal Active Directory Networks Using Decoys
Comments: To appear in ACM GECCO 2024
Subjects: Cryptography and Security (cs.CR); Computer Science and Game Theory (cs.GT); Neural and Evolutionary Computing (cs.NE)
Microsoft Active Directory (AD) is the default security management system for Windows domain networks. We study the problem of placing decoys in AD networks to detect potential attacks. We model the problem as a Stackelberg game between an attacker and a defender on AD attack graphs, where the defender employs a set of decoys to detect the attacker on their way to Domain Admin (DA). Contrary to previous works, we consider time-varying (temporal) attack graphs. We propose a novel metric called response time to measure the effectiveness of our decoy placement in temporal attack graphs. Response time is defined as the duration from the moment attackers trigger the first decoy to when they compromise the DA. Our goal is to maximize the defender's response time to the worst-case attack paths. We establish that the defender's optimization problem is NP-hard, which leads us to develop Evolutionary Diversity Optimization (EDO) algorithms. EDO algorithms identify diverse sets of high-quality solutions for the optimization problem. Although the fitness function can be computed in polynomial time, it proves slow in experiments on larger graphs. To enhance scalability, we propose an algorithm that exploits the static nature of AD infrastructure in the temporal setting. Then, we introduce tailored repair operations, ensuring convergence to better results while maintaining scalability for larger graphs.
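The response-time metric can be made concrete for fixed attack paths. In the sketch below, each path is a hypothetical list of `(node, time)` pairs from a temporal attack graph, decoys are assumed to sit on nodes, and a path that never touches a decoy is assigned a response time of 0; these representation choices are assumptions. The defender's objective is then the minimum response time over the attacker's candidate paths, which a decoy placement tries to maximize.

```python
from typing import Iterable, List, Set, Tuple

# (node, time at which the attacker reaches it); hypothetical representation
Path = List[Tuple[str, float]]


def response_time(path: Path, decoys: Set[str], target: str = "DA") -> float:
    """Time between triggering the first decoy and reaching `target` on one path.

    Returns 0.0 if the path reaches the target without touching a decoy.
    """
    t_target = next(t for node, t in path if node == target)
    hits = [t for node, t in path if node in decoys and t <= t_target]
    return t_target - min(hits) if hits else 0.0


def worst_case_response_time(paths: Iterable[Path], decoys: Set[str]) -> float:
    """Defender's objective: the smallest response time over attack paths."""
    return min(response_time(p, decoys) for p in paths)


paths = [
    [("u1", 0.0), ("h1", 2.0), ("h2", 5.0), ("DA", 9.0)],
    [("u2", 0.0), ("h3", 4.0), ("DA", 6.0)],
]
print(worst_case_response_time(paths, {"h1", "h3"}))  # 2.0
print(worst_case_response_time(paths, {"h1"}))        # 0.0 (path 2 undetected)
```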
- [89] arXiv:2403.18163 [pdf, other]
Title: A Study of Three Influencer Archetypes for the Control of Opinion Spread in Time-Varying Social Networks
Comments: Submission to IEEE 2024 Conference on Decision and Control. 8 pages, 7 figures, 1 table
Subjects: Social and Information Networks (cs.SI); Systems and Control (eess.SY); Physics and Society (physics.soc-ph)
In this work we consider the impact of information spread in time-varying social networks, where agents request to follow other agents with aligned opinions while dropping ties to neighbors whose posts are too dissimilar to their own views. Opinion control and rhetorical influence have a very long history, employing various methods including education, persuasion, propaganda, marketing, and manipulation through mis-, dis-, and mal-information. The automation of opinion controllers, however, has only recently become easily deployable at a wide scale, with the advent of large language models (LLMs) and generative AI that can translate the quantified commands from opinion controllers into actual content with the appropriate nuance. Automated agents in social networks can be deployed for various purposes, such as breaking up echo chambers, bridging valuable new connections between agents, or shaping the opinions of a target population; all of these raise important ethical concerns that deserve serious attention, thoughtful discussion, and debate. This paper attempts to contribute to this discussion by considering three archetypal influencing styles observed by human drivers in these settings, comparing and contrasting the impact of these different control methods on the opinions of agents in the network. We will demonstrate the efficacy of current generative AI for generating nuanced content consistent with the command signal from automatic opinion controllers like these, and we will report on frameworks for approaching the relevant ethical considerations.
- [90] arXiv:2403.18164 [pdf, other]
-
Title: Incentive Designs for Learning Agents to Stabilize Coupled Exogenous Systems
Comments: 8 pages, 3 figures
Subjects: Systems and Control (eess.SY); Dynamical Systems (math.DS); Optimization and Control (math.OC)
We consider a large population of learning agents noncooperatively selecting strategies from a common set, influencing the dynamics of an exogenous system (ES) we seek to stabilize at a desired equilibrium. Our approach is to design a dynamic payoff mechanism capable of shaping the population's strategy profile, thus affecting the ES's state, by offering incentives for specific strategies within budget limits. Employing system-theoretic passivity concepts, we establish conditions under which a payoff mechanism can be systematically constructed to ensure the global asymptotic stabilization of the ES's equilibrium. In comparison to previous approaches originally studied in the context of the so-called epidemic population games, the method proposed here allows for more realistic epidemic models and other types of ES, such as predator-prey dynamics. Stabilization is established with the support of a Lyapunov function, which provides useful bounds on the transients.
- [91] arXiv:2403.18166 [pdf, other]
-
Title: Incentive-Compatible Vertiport Reservation in Advanced Air Mobility: An Auction-Based Approach
Comments: 26 pages, 2 figures, 1 table
Subjects: Systems and Control (eess.SY); Multiagent Systems (cs.MA); Theoretical Economics (econ.TH); Optimization and Control (math.OC)
Advanced air mobility (AAM) is expected to become a multibillion-dollar industry in the near future. Market-based mechanisms are touted to be an integral part of AAM operations, which involve heterogeneous operators with private valuations. In this work, we study the problem of designing a mechanism to coordinate the movement of electric vertical take-off and landing (eVTOL) aircraft between vertiports, where the aircraft belong to multiple operators, each with heterogeneous valuations associated with its fleet, while enforcing the arrival, departure, and parking constraints at vertiports. In particular, we propose an incentive-compatible and individually rational vertiport reservation mechanism that maximizes a social welfare metric, which encapsulates the objective of maximizing the overall valuations of all operators while minimizing congestion at vertiports. Additionally, we improve the computational tractability of designing the reservation mechanism by proposing a mixed binary linear programming approach based on constructing a network flow graph corresponding to the underlying problem.
- [92] arXiv:2403.18167 [pdf, other]
-
Title: Mechanisms of non-factual hallucinations in language models
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
State-of-the-art language models (LMs) sometimes generate non-factual hallucinations that misalign with world knowledge. Despite extensive efforts to detect and mitigate hallucinations, understanding their internal mechanisms remains elusive. Our study investigates the mechanistic causes of hallucination, specifically non-factual ones where the LM incorrectly predicts object attributes in response to subject-relation queries. With causal mediation analysis and embedding space projection, we identify two general mechanistic causes of hallucinations shared across LMs of various scales and designs: 1) insufficient subject attribute knowledge in lower layer MLPs, and 2) failing to select the correct object attribute in upper layer attention heads and MLPs. These two mechanisms exhibit varying degrees of subject-object association, predictive uncertainty and perturbation robustness. Additionally, we scrutinize LM pre-training checkpoints, revealing distinct learning dynamics for the two mechanistic causes of hallucinations. We also highlight how attribution features from our causal analysis can effectively construct hallucination detectors. Our work proposes a mechanistic understanding of LM factual errors.
- [93] arXiv:2403.18171 [pdf, other]
-
Title: Higher order multi-dimension reduction methods via Einstein-product
Subjects: Numerical Analysis (math.NA)
This paper explores the extension of dimension reduction (DR) techniques to the multi-dimension case by using the Einstein product. Our focus lies on graph-based methods, encompassing both linear and nonlinear approaches, within both supervised and unsupervised learning paradigms. Additionally, we investigate variants such as repulsion graphs and kernel methods for linear approaches. Furthermore, we present two generalizations for each method, based on single or multiple weights. We demonstrate the straightforward nature of these generalizations and provide theoretical insights. Numerical experiments are conducted, and results are compared with original methods, highlighting the efficiency of our proposed methods, particularly in handling high-dimensional data such as color images.
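For readers unfamiliar with the Einstein product, a minimal sketch of its standard definition follows: contract the last n modes of A with the first n modes of B. With n=1 and 2-D inputs it reduces to ordinary matrix multiplication. How the paper builds each dimension reduction method on top of it is not shown; the function name `einstein_product` is hypothetical.

```python
import numpy as np


def einstein_product(A: np.ndarray, B: np.ndarray, n: int) -> np.ndarray:
    """Einstein product A *_n B: sum over the last n modes of A and the first n of B."""
    if n < 1:
        raise ValueError("n must be a positive number of contracted modes")
    if A.shape[-n:] != B.shape[:n]:
        raise ValueError("contracted modes must have matching sizes")
    # np.tensordot with an integer `axes` contracts exactly these modes.
    return np.tensordot(A, B, axes=n)
```

Equivalently, flattening the contracted modes turns the product into a matrix product, which is why matrix-based DR formulations carry over to tensor data such as color images.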
- [94] arXiv:2403.18172 [pdf, other]
-
Title: Vision-Based Force Estimation for Minimally Invasive Telesurgery Through Contact Detection and Local Stiffness Models
Comments: Preprint of an article accepted in Journal of Medical Robotics Research © 2024 World Scientific Publishing Company
Subjects: Robotics (cs.RO)
In minimally invasive telesurgery, obtaining accurate force information is difficult due to the complexities of in-vivo end effector force sensing. This constrains the development and implementation of haptic feedback and force-based automated performance metrics. Vision-based force sensing approaches using deep learning are a promising alternative to intrinsic end effector force sensing. However, they have limited ability to generalize to novel scenarios and require training on high-quality force sensor data that can be difficult to obtain. To address these challenges, this paper presents a novel vision-based contact-conditional approach for force estimation in telesurgical environments. Our method leverages supervised learning with human labels and end effector position data to train deep neural networks. Predictions from these trained models are optionally combined with robot joint torque information to estimate forces indirectly from visual data. We benchmark our method against ground truth force sensor data and demonstrate generality by fine-tuning to novel surgical scenarios in a data-efficient manner. Our methods achieved greater than 90% accuracy on contact detection and less than 10% force prediction error. These results suggest the potential usefulness of contact-conditional force estimation for sensory substitution haptic feedback and tissue handling skill evaluation in clinical settings.
- [95] arXiv:2403.18173 [pdf, ps, other]
-
Title: LLMs in HCI Data Work: Bridging the Gap Between Information Retrieval and Responsible Research Practices
Comments: 5 pages, CHI2024 Workshop on LLMs as Research Tools: Applications and Evaluations in HCI Data Work
Subjects: Human-Computer Interaction (cs.HC); Information Retrieval (cs.IR)
Efficient and accurate information extraction from scientific papers is important for the literature review process in the rapidly developing field of human-computer interaction (HCI) research. Our paper introduces and analyses a new information retrieval system that uses state-of-the-art Large Language Models (LLMs) in combination with structured text analysis techniques to extract experimental data from HCI literature, emphasizing key elements. We then analyze the challenges and risks of using LLMs in research. We performed a comprehensive analysis on a dataset we constructed, which contained the specified information of 300 CHI 2020-2022 papers, to evaluate the performance of two large language models, GPT-3.5 (text-davinci-003) and Llama-2-70b, paired with structured text analysis techniques. The GPT-3.5 model achieves an accuracy of 58% and a mean absolute error of 7.00. In contrast, the Llama-2 model achieves an accuracy of 56% with a mean absolute error of 7.63. The system also includes a question-answering capability for working with the streamlined data. By evaluating the risks and opportunities presented by LLMs, our work contributes to the ongoing dialogue on establishing methodological validity and ethical guidelines for LLM use in HCI data work.
- [96] arXiv:2403.18174 [pdf, ps, other]
-
Title: Local (coarse) correlated equilibria in non-concave games
Authors: Mete Şeref Ahunbay
Comments: 39 pages
Subjects: Computer Science and Game Theory (cs.GT)
We investigate local notions of correlated equilibria, distributions of actions for smooth games such that players do not incur any regret against modifications of their strategies along a set of continuous vector fields. Our analysis shows that such equilibria are intrinsically linked to the projected gradient dynamics of the game. We identify the equivalent of coarse equilibria in this setting when no regret is incurred against any gradient field of a differentiable function. As a result, such equilibria are approximable when all players employ online (projected) gradient ascent with equal step-sizes as learning algorithms, and when their compact and convex action sets either (1) possess a smooth boundary, or (2) are polyhedra over which linear optimisation is "trivial". As a consequence, primal-dual proofs of performance guarantees for local coarse equilibria take the form of a generalised Lyapunov function for the gradient dynamics of the game. Adapting the regret matching framework to our setting, we also show that general local correlated equilibria are approximable when the set of vector fields is finite, given access to a fixed-point oracle for linear or conical combinations. For the class of affine-linear vector fields, which subsumes correlated equilibria of normal form games as a special case, such a fixed point turns out to be the solution of a convex quadratic minimisation problem. Our results are independent of concavity assumptions on players' utilities.
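The learning dynamics named above, simultaneous online projected gradient ascent with a shared step size, can be sketched for box-shaped action sets, where projection is a clip. The two-player game used in the check is hypothetical. Per the abstract, the empirical distribution of the resulting play is what approximates a local coarse equilibrium under the stated conditions; this sketch only generates the play and does not verify that guarantee.

```python
import numpy as np


def projected_gradient_ascent(grads, x0, lo, hi, step=0.05, rounds=2000):
    """Simultaneous projected gradient ascent for all players.

    grads: maps the joint action (one entry per player) to each player's
        partial derivative of its own utility with respect to its own action.
    Actions live in the box [lo, hi], so Euclidean projection is np.clip.
    Returns the trajectory of joint actions, shape (rounds + 1, players).
    """
    x = np.asarray(x0, dtype=float)
    traj = [x.copy()]
    for _ in range(rounds):
        x = np.clip(x + step * grads(x), lo, hi)  # same step size for every player
        traj.append(x.copy())
    return np.array(traj)
```

For example, with utilities u1 = -(x1 - 0.3)^2 and u2 = -(x2 - x1)^2 on [0, 1]^2, play settles at (0.3, 0.3); moving player 1's target outside the box makes the projection bind at the boundary.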
- [97] arXiv:2403.18176 [pdf, other]
-
Title: Mistake, Manipulation and Margin Guarantees in Online Strategic Classification
Subjects: Machine Learning (cs.LG); Computer Science and Game Theory (cs.GT); Optimization and Control (math.OC)
We consider an online strategic classification problem where each arriving agent can manipulate their true feature vector to obtain a positive predicted label, while incurring a cost that depends on the amount of manipulation. The learner seeks to predict the agent's true label given access only to the manipulated features. After the learner releases their prediction, the agent's true label is revealed. Previous algorithms such as the strategic perceptron guarantee finitely many mistakes under a margin assumption on agents' true feature vectors. However, these algorithms are not guaranteed to encourage agents to be truthful. Promoting truthfulness is intimately linked to obtaining adequate margin on the predictions; thus, we provide two new algorithms aimed at recovering the maximum margin classifier in the presence of strategic agent behavior. We prove convergence, finite mistake, and finite manipulation guarantees for a variety of agent cost structures. We also provide generalized versions of the strategic perceptron with mistake guarantees for different costs. Our numerical study on real and synthetic data demonstrates that the new algorithms outperform previous ones in terms of margin, number of manipulations, and number of mistakes.
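To illustrate why margin and truthfulness are linked, here is a minimal sketch of one agent's best response to a linear classifier sign(w·x + b). It assumes a Euclidean-distance manipulation cost and a hypothetical `max_cost` the agent is willing to pay for a positive label; the paper considers a variety of cost structures, and this is not its algorithm.

```python
import numpy as np


def best_response(x, w, b, max_cost, eps=1e-6):
    """Agent's reported features against the linear classifier sign(w @ x + b).

    Assumed cost model: moving from x to z costs ||z - x||_2, and the agent
    manipulates only if a positive label is reachable within max_cost.
    """
    score = w @ x + b
    if score >= 0:
        return x.copy()  # already predicted positive: no reason to move
    w_norm = np.linalg.norm(w)
    dist = -score / w_norm  # Euclidean distance to the decision boundary
    if dist + eps > max_cost:
        return x.copy()  # manipulation too expensive: report truthfully
    # Move along the normal direction to land just past the boundary.
    return x + (dist + eps) * w / w_norm
```

Under this cost model, every negative agent within `max_cost` of the boundary flips its label, so a classifier whose margin on true negatives exceeds `max_cost` leaves no agent an incentive to manipulate.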
- [98] arXiv:2403.18178 [pdf, other]
-
Title: Online Embedding Multi-Scale CLIP Features into 3D Maps
Comments: 8 pages, 7 figures
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
This study introduces a novel approach to online embedding of multi-scale CLIP (Contrastive Language-Image Pre-Training) features into 3D maps. By harnessing CLIP, this methodology overcomes the constraints of conventional vocabulary-limited methods and enables the incorporation of semantic information into the resultant maps. While recent approaches have explored the embedding of multi-modal features in maps, they often impose significant computational costs, making them impractical for exploring unfamiliar environments in real time. Our approach tackles these challenges by efficiently computing and embedding multi-scale CLIP features, thereby facilitating the exploration of unfamiliar environments through real-time map generation. Moreover, embedding CLIP features into the resultant maps makes offline retrieval via linguistic queries feasible. In essence, our approach simultaneously achieves real-time object search and mapping of unfamiliar environments. Additionally, we propose a zero-shot object-goal navigation system based on our mapping approach, and we validate its efficacy through object-goal navigation, offline object retrieval, and multi-object-goal navigation in both simulated environments and real robot experiments. The findings demonstrate that our method not only runs faster than state-of-the-art mapping methods but also surpasses them in terms of the success rate of object-goal navigation tasks.
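Offline retrieval via linguistic queries, as described above, typically reduces to ranking stored features by cosine similarity with a text embedding in the shared CLIP space. The sketch below assumes the map features and the query embedding have already been produced by matching CLIP image and text encoders (not shown); `retrieve` is a hypothetical name, not the authors' interface.

```python
import numpy as np


def retrieve(map_features: np.ndarray, query: np.ndarray, top_k: int = 3):
    """Rank map cells by cosine similarity to a text-query embedding.

    map_features: (num_cells, dim) CLIP-space features stored in the map.
    query: (dim,) text embedding from the matching CLIP text encoder.
    Returns the indices of the top_k most similar cells and all similarities.
    """
    F = map_features / np.linalg.norm(map_features, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    sims = F @ q  # cosine similarity per cell
    return np.argsort(-sims)[:top_k], sims
```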
- [99] arXiv:2403.18180 [pdf, other]
-
Title: Multi-Layer Dense Attention Decoder for Polyp Segmentation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Detecting and segmenting polyps is crucial for expediting the diagnosis of colon cancer. This is a challenging task due to the large variations of polyps in color, texture, and lighting conditions, along with subtle differences between the polyp and its surrounding area. Recently, vision Transformers have shown robust abilities in modeling global context for polyp segmentation. However, they face two major limitations: the inability to learn local relations among multi-level layers and inadequate feature aggregation in the decoder. To address these issues, we propose a novel decoder architecture aimed at hierarchically aggregating locally enhanced multi-level dense features. Specifically, we introduce a novel module named Dense Attention Gate (DAG), which adaptively fuses all previous layers' features to establish local feature relations among all layers. Furthermore, we propose a novel nested decoder architecture that hierarchically aggregates decoder features, thereby enhancing semantic features. We incorporate our novel dense decoder with the PVT backbone network and conduct evaluations on five polyp segmentation datasets: Kvasir, CVC-300, CVC-ColonDB, CVC-ClinicDB, and ETIS. Our experiments and comparisons with nine competing segmentation models demonstrate that the proposed architecture achieves state-of-the-art performance and outperforms the previous models on four datasets. The source code is available at: https://github.com/krushi1992/Dense-Decoder.
- [100] arXiv:2403.18181 [pdf, ps, other]
-
Title: Compression of the Koopman matrix for nonlinear physical models via hierarchical clustering
Comments: 9 pages, 10 figures
Subjects: Machine Learning (cs.LG); Dynamical Systems (math.DS)
Machine learning methods allow the prediction of nonlinear dynamical systems from data alone. The Koopman operator is one such approach, enabling linear analysis to be applied to nonlinear dynamical systems. The linearity of the Koopman operator is promising for understanding nonlinear dynamics and performing rapid predictions. Extended dynamic mode decomposition (EDMD) is one of the methods used to approximate the Koopman operator as a finite-dimensional matrix. In this work, we propose a method to compress the Koopman matrix using hierarchical clustering. Numerical demonstrations on the cart-pole model and comparisons with conventional singular value decomposition (SVD) are presented; the results indicate that hierarchical clustering performs better than naive SVD compression.
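A minimal EDMD sketch may help place the Koopman matrix and the SVD baseline in context. It uses the common least-squares formulation ψ(Y) ≈ ψ(X)K with a user-supplied dictionary of observables; the dictionary in the check below is a toy assumption, and the paper's hierarchical-clustering compression is not reproduced here.

```python
import numpy as np


def edmd(X, Y, dictionary):
    """Extended dynamic mode decomposition (least-squares form).

    X, Y: arrays of shape (m, d), where Y[i] is the successor state of X[i].
    dictionary: maps an (m, d) array to an (m, k) array of observables.
    Returns the k x k Koopman matrix K with dictionary(Y) ~= dictionary(X) @ K.
    """
    PX, PY = dictionary(X), dictionary(Y)
    K, *_ = np.linalg.lstsq(PX, PY, rcond=None)
    return K


def truncate_svd(K, r):
    """Rank-r approximation of K, i.e. the SVD-style compression baseline."""
    U, s, Vt = np.linalg.svd(K)
    return (U[:, :r] * s[:r]) @ Vt[:r]
```

For the linear map x ↦ 0.9x with observables [x, x²], EDMD recovers K = diag(0.9, 0.81) exactly, since each observable evolves linearly.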
- [101] arXiv:2403.18182 [pdf, other]
-
Title: ZAEBUC-Spoken: A Multilingual Multidialectal Arabic-English Speech Corpus
Comments: Accepted to LREC-COLING 2024
Subjects: Computation and Language (cs.CL)
We present ZAEBUC-Spoken, a multilingual multidialectal Arabic-English speech corpus. The corpus comprises twelve hours of Zoom meetings involving multiple speakers role-playing a work situation in which Students brainstorm ideas for a certain topic and then discuss it with an Interlocutor. The meetings cover different topics and are divided into phases with different language setups. The corpus presents a challenging set for automatic speech recognition (ASR), including two languages (Arabic and English) with Arabic spoken in multiple variants (Modern Standard Arabic, Gulf Arabic, and Egyptian Arabic) and English used with various accents. Adding to the complexity of the corpus, there is also code-switching between these languages and dialects. As part of our work, we take inspiration from established sets of transcription guidelines to present a set of guidelines handling issues of conversational speech, code-switching and orthography of both languages. We further enrich the corpus with two layers of annotations: (1) dialectness level annotation for the portion of the corpus where mixing occurs between different variants of Arabic, and (2) automatic morphological annotations, including tokenization, lemmatization, and part-of-speech tagging.
- [102] arXiv:2403.18183 [pdf, other]
-
Title: Can AI Models Appreciate Document Aesthetics? An Exploration of Legibility and Layout Quality in Relation to Prediction Confidence
Subjects: Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
A well-designed document communicates not only through its words but also through its visual eloquence. Authors utilize aesthetic elements such as colors, fonts, graphics, and layouts to shape the perception of information. Thoughtful document design, informed by psychological insights, enhances both the visual appeal and the comprehension of the content. While state-of-the-art document AI models demonstrate the benefits of incorporating layout and image data, it remains unclear whether the nuances of document aesthetics are effectively captured. To bridge the gap between human cognition and AI interpretation of aesthetic elements, we formulated hypotheses concerning AI behavior in document understanding tasks, specifically anchored in document design principles. With a focus on legibility and layout quality, we tested four aspects of aesthetic effects: noise, font-size contrast, alignment, and complexity, on model confidence using correlational analysis. The results and observations highlight the value of model analysis rooted in document design theories. Our work serves as a trailhead for further studies, and we advocate for continued research on this topic to deepen our understanding of how AI interprets document aesthetics.
- [103] arXiv:2403.18186 [pdf, other]
-
Title: Don't Look into the Dark: Latent Codes for Pluralistic Image Inpainting
Comments: CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We present a method for large-mask pluralistic image inpainting based on the generative framework of discrete latent codes. Our method learns latent priors, discretized as tokens, by only performing computations at the visible locations of the image. This is realized by a restrictive partial encoder that predicts the token label for each visible block, a bidirectional transformer that infers the missing labels by only looking at these tokens, and a dedicated synthesis network that couples the tokens with the partial image priors to generate coherent and pluralistic complete images even under extreme mask settings. Experiments on public benchmarks validate our design choices, as the proposed method outperforms strong baselines in both visual quality and diversity metrics.
- [104] arXiv:2403.18187 [pdf, other]
-
Title: LayoutFlow: Flow Matching for Layout Generation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Finding a suitable layout is a crucial task for diverse applications in graphic design. Motivated by simpler and smoother sampling trajectories, we explore the use of Flow Matching as an alternative to current diffusion-based layout generation models. Specifically, we propose LayoutFlow, an efficient flow-based model capable of generating high-quality layouts. Instead of progressively denoising the elements of a noisy layout, our method learns to gradually move, or flow, the elements of an initial sample until it reaches its final prediction. In addition, we employ a conditioning scheme that allows us to handle various generation tasks with varying degrees of conditioning with a single model. Empirically, LayoutFlow performs on par with state-of-the-art models while being significantly faster.
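The "flow" idea above can be sketched with the common straight-line (conditional) flow matching setup: training pairs interpolate between an initial sample x0 and a target x1 with velocity target x1 − x0, and sampling integrates the learned velocity field from t = 0 to t = 1. The straight-line path, the Euler integrator, and encoding a layout element as a vector such as (x, y, w, h) are assumptions for illustration; LayoutFlow's network and conditioning scheme are not shown.

```python
import numpy as np


def flow_matching_pair(x0, x1, t):
    """Training pair on the straight path from x0 to x1.

    Returns the interpolated point x_t and the velocity target x1 - x0 that a
    network v(x_t, t) would be regressed onto.
    """
    xt = (1.0 - t) * x0 + t * x1
    return xt, x1 - x0


def euler_sample(velocity, x0, steps=10):
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with explicit Euler."""
    x, dt = np.array(x0, dtype=float), 1.0 / steps
    for i in range(steps):
        x = x + dt * velocity(x, i * dt)
    return x
```

In the check below, the ideal velocity field for a single known target, (x1 − x)/(1 − t), stands in for a trained network, and the Euler sampler lands on the target layout.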
- [105] arXiv:2403.18188 [pdf, ps, other]
-
Title: Integrating urban digital twins with cloud-based geospatial dashboards for coastal resilience planning: A case study in Florida
Authors: Changjie Chen, Yu Han, Andrea Galinski, Christian Calle, Jeffery Carney, Xinyue Ye, Cees van Westen
Subjects: Computers and Society (cs.CY)
Coastal communities are confronted with a growing incidence of climate-induced flooding, necessitating adaptation measures for resilience. In this paper, we introduce a framework that integrates an urban digital twin with a geospatial dashboard to allow visualization of the vulnerabilities within critical infrastructure across a range of spatial and temporal scales. The synergy between these two technologies fosters heightened community awareness of increased flood risks and establishes a unified understanding, the foundation for collective decision-making in adaptation plans. The paper also discusses ethical considerations in developing the platform, including ensuring accessibility, promoting transparency and equity, and safeguarding individual privacy.
- [106] arXiv:2403.18191 [pdf, other]
-
Title: The process of polarisation as a loss of dimensionality: measuring changes in polarisation using Singular Value Decomposition of network graphs
Comments: 24 pages, 7 figures, abstract presented at ICCS 2023
Subjects: Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph)
The increasing polarisation in our societies is a major international concern. Current approaches to defining and detecting polarisation largely rely on finding evidence of bimodality in social networks or voter opinion surveys. It is difficult to detect temporal trends in polarisation, as the results usually fall into a binary of polarised or non-polarised, which cannot robustly show that subsequent increases in bimodality are statistically significant.
Our work is aligned with Baldassarri and Gelman's theory that polarisation should be defined as increasing correlation between positions in the ideological field. We also draw from post-structuralist work which argues that polarisation is the process of both the ideological and material layers of society being segregated into two poles, as in cases of apartheid. Thus, to measure polarisation in a society, it would be beneficial to be able to assess social networks directly.
In this paper we use Random Dot Product Graphs to embed social networks in metric spaces. In the case of a social network, the embedded dimensionality corresponds to the number of reasons any two people may form a social connection. A decrease in the optimal dimensionality for the embedding of the network graph, as measured using truncated Singular Value Decomposition of the graph adjacency matrix, indicates increasing polarisation in the network.
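The measurement described above can be sketched as follows: take the singular values of the adjacency matrix, pick an embedding dimension, and embed nodes as in a random dot product graph. The largest-gap rule used to choose the dimension is a simple stand-in assumed here; the paper's exact criterion for the "optimal" dimension is not specified in this abstract.

```python
import numpy as np


def embedding_dimension(A: np.ndarray) -> int:
    """Estimate the embedding dimension from the adjacency matrix spectrum.

    Uses the largest gap between consecutive singular values (an assumed
    heuristic, not necessarily the paper's selection rule).
    """
    s = np.linalg.svd(A, compute_uv=False)  # singular values, descending
    gaps = s[:-1] - s[1:]
    return int(np.argmax(gaps)) + 1


def rdpg_embedding(A: np.ndarray, d: int) -> np.ndarray:
    """Adjacency spectral embedding: row i is node i's latent position in R^d."""
    U, s, _ = np.linalg.svd(A)
    return U[:, :d] * np.sqrt(s[:d])
```

Tracking `embedding_dimension` over successive snapshots of a network gives the kind of time series in which, under the paper's argument, a falling dimension would signal rising polarisation.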
We apply this method to two different Twitter networks based on discussions of climate change, and show that our methods agree with other researchers' detection of polarisation in this space. We also use networks generated by stochastic block models to explore how an increase in the isolation between distinct communities in a network, or an increase in the predominance of one community over the other, can be identified as a polarisation process.
- [107] arXiv:2403.18192 [pdf, other]
-
Title: Multi-Label Adaptive Batch Selection by Highlighting Hard and Imbalanced Samples
Subjects: Machine Learning (cs.LG)
Deep neural network models have demonstrated their effectiveness in classifying multi-label data from various domains. Typically, they employ a training mode that combines mini-batches with optimizers, where each sample is randomly selected with equal probability when constructing mini-batches. However, the intrinsic class imbalance in multi-label data may bias the model towards majority labels, since samples relevant to minority labels may be underrepresented in each mini-batch. Meanwhile, during the training process, we observe that instances associated with minority labels tend to induce greater losses. Existing heuristic batch selection methods, such as priority selection of samples with high contribution to the objective function, i.e., samples with high loss, have been shown to accelerate convergence while reducing the loss and test error on single-label data. However, batch selection methods have not yet been applied and validated on multi-label data. In this study, we introduce a simple yet effective adaptive batch selection algorithm tailored to multi-label deep learning models. It adaptively selects each batch by prioritizing hard samples related to minority labels. A variant of our method also takes informative label correlations into consideration. Comprehensive experiments combining five multi-label deep learning models on thirteen benchmark datasets show that our method converges faster and performs better than random batch selection.
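The idea of prioritizing hard samples related to minority labels can be sketched as loss-weighted sampling with a rarity boost. The weighting `loss * (1 + sum of inverse label frequencies)` is a hypothetical choice for illustration, not the paper's formula, and it omits the label-correlation variant.

```python
import numpy as np


def selection_probs(losses: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Sampling probabilities favouring high-loss samples with rare labels.

    losses: (n,) current per-sample losses.
    labels: (n, L) binary label matrix.
    """
    counts = labels.sum(axis=0)                      # positives per label
    rarity = labels @ (1.0 / np.maximum(counts, 1))  # sum of inverse label frequencies
    score = (losses + 1e-8) * (1.0 + rarity)         # epsilon keeps every sample selectable
    return score / score.sum()


def select_batch(losses, labels, batch_size, rng):
    """Draw a mini-batch of distinct sample indices using the scores above."""
    p = selection_probs(losses, labels)
    return rng.choice(len(losses), size=batch_size, replace=False, p=p)
```

In practice the losses would be refreshed as training proceeds (for example, from the most recent forward pass of each sample), so the selection adapts over epochs.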
- [108] arXiv:2403.18193 [pdf, other]
-
Title: Middle Fusion and Multi-Stage, Multi-Form Prompts for Robust RGB-T Tracking
Subjects: Computer Vision and Pattern Recognition (cs.CV)
RGB-T tracking, a vital downstream task of object tracking, has made remarkable progress in recent years. Yet it remains hindered by two major challenges: 1) the trade-off between performance and efficiency; 2) the scarcity of training data. To address the latter challenge, some recent methods employ prompts to fine-tune pre-trained RGB tracking models and leverage upstream knowledge in a parameter-efficient manner. However, these methods inadequately explore modality-independent patterns and disregard the dynamic reliability of different modalities in open scenarios. We propose M3PT, a novel RGB-T prompt tracking method that leverages middle fusion and multi-modal and multi-stage visual prompts to overcome these challenges. We pioneer the use of the middle fusion framework for RGB-T tracking, which achieves a balance between performance and efficiency. Furthermore, we incorporate the pre-trained RGB tracking model into the framework and utilize multiple flexible prompt strategies to adapt the pre-trained model to the comprehensive exploration of uni-modal patterns and the improved modeling of fusion-modal features, harnessing the potential of prompt learning in RGB-T tracking. Our method outperforms the state-of-the-art methods on four challenging benchmarks, while attaining an inference speed of 46.1 fps.
- [109] arXiv:2403.18195 [pdf, other]
-
Title: SCANet: Correcting LEGO Assembly Errors with Self-Correct Assembly Network
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Autonomous assembly in robotics and 3D vision presents significant challenges, particularly in ensuring assembly correctness. Presently, predominant methods such as MEPNet focus on assembling components based on images from instruction manuals. However, these approaches often fall short of satisfactory results on tasks requiring long-term planning. We also observe that integrating a self-correction module can partially alleviate such issues. Motivated by this observation, we introduce the single-step assembly error correction task, which involves identifying and rectifying misassembled components. To support research in this area, we present the LEGO Error Correction Assembly Dataset (LEGO-ECA), comprising manual images for assembly steps and instances of assembly failures. Additionally, we propose the Self-Correct Assembly Network (SCANet), a novel method to address this task. SCANet treats assembled components as queries, determining their correctness in manual images and providing corrections when necessary. Finally, we utilize SCANet to correct the assembly results of MEPNet. Experimental results demonstrate that SCANet can identify and correct MEPNet's misassembled results, significantly improving the correctness of assembly. Our code and dataset are available at https://github.com/Yaser-wyx/SCANet.
- [110] arXiv:2403.18196 [pdf, ps, other]
-
Title: Looking Beyond What You See: An Empirical Analysis on Subgroup Intersectional Fairness for Multi-label Chest X-ray Classification Using Social Determinants of Racial Health Inequities
Comments: ICCV CVAMD 2023
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY)
There has been significant progress in implementing deep learning models for disease diagnosis using chest X-rays. Despite these advancements, inherent biases in these models can lead to disparities in prediction accuracy across protected groups. In this study, we propose a framework to achieve accurate diagnostic outcomes and ensure fairness across intersectional groups in high-dimensional chest X-ray multi-label classification. Transcending traditional protected attributes, we consider complex interactions within social determinants, enabling a more granular benchmark and evaluation of fairness. We present a simple and robust method that involves retraining the last classification layer of pre-trained models using a balanced dataset across groups. Additionally, we account for fairness constraints and integrate class-balanced fine-tuning for multi-label settings. The evaluation of our method on the MIMIC-CXR dataset demonstrates that our framework achieves an optimal tradeoff between accuracy and fairness compared to baseline methods.
- [111] arXiv:2403.18197 [pdf, other]
-
Title: LocoMan: Advancing Versatile Quadrupedal Dexterity with Lightweight Loco-Manipulators
Authors: Changyi Lin, Xingyu Liu, Yuxiang Yang, Yaru Niu, Wenhao Yu, Tingnan Zhang, Jie Tan, Byron Boots, Ding Zhao
Comments: Project page: this https URL
Subjects: Robotics (cs.RO)
Quadrupedal robots have emerged as versatile agents capable of locomoting and manipulating in complex environments. Traditional designs typically rely on the robot's inherent body parts or incorporate top-mounted arms for manipulation tasks. However, these configurations may limit the robot's operational dexterity, efficiency and adaptability, particularly in cluttered or constrained spaces. In this work, we present LocoMan, a dexterous quadrupedal robot with a novel morphology to perform versatile manipulation in diverse constrained environments. By equipping a Unitree Go1 robot with two low-cost and lightweight modular 3-DoF loco-manipulators on its front calves, LocoMan leverages the combined mobility and functionality of the legs and grippers for complex manipulation tasks that require precise 6D positioning of the end effector in a wide workspace. To harness the loco-manipulation capabilities of LocoMan, we introduce a unified control framework that extends the whole-body controller (WBC) to integrate the dynamics of loco-manipulators. Through experiments, we validate that the proposed whole-body controller can accurately and stably follow desired 6D trajectories of the end effector and torso, which, when combined with the large workspace from our design, facilitates a diverse set of challenging dexterous loco-manipulation tasks in confined spaces, such as opening doors, plugging into sockets, picking objects in narrow and low-lying spaces, and bimanual manipulation.
- [112] arXiv:2403.18200 [pdf, other]
-
Title: Fault-tolerant properties of scale-free linear protocols for synchronization of homogeneous multi-agent systemsComments: The article was submitted to IEEE Transactions on Automatic Control for review on March 27th, 2024Subjects: Systems and Control (eess.SY)
Originally, protocols were designed for multi-agent systems (MAS) using information about the network. However, in many cases there is no or only limited information available about the network. Recently, there has been a focus on scale-free synchronization of multi-agent systems (MAS). In this case, the protocol is designed without any prior information about the network. As long as the network contains a directed spanning tree, the scale-free protocol guarantees that the network achieves synchronization.
If the network has no directed spanning tree, synchronization cannot be achieved. But what happens when these scale-free protocols are applied to a network in which the directed spanning tree no longer exists? This might arise if, for instance, a fault occurs in one or more crucial links. This paper establishes that the network decomposes into a number of basic bicomponents, each of which achieves synchronization among all of its nodes. On the other hand, nodes that are not part of any basic bicomponent converge to a weighted average of the synchronized trajectories of the basic bicomponents. The weights are independent of both the initial conditions and the designed protocol.
- [113] arXiv:2403.18201 [pdf, other]
-
Title: Few-shot Online Anomaly Detection and SegmentationSubjects: Computer Vision and Pattern Recognition (cs.CV)
Detecting anomaly patterns from images is a crucial artificial intelligence technique in industrial applications. Recent research in this domain has emphasized the necessity of a large volume of training data, overlooking the practical scenario where, post-deployment of the model, unlabeled data containing both normal and abnormal samples can be utilized to enhance the model's performance. Consequently, this paper focuses on addressing the challenging yet practical few-shot online anomaly detection and segmentation (FOADS) task. Under the FOADS framework, models are trained on a few-shot normal dataset, followed by inspection and improvement of their capabilities by leveraging unlabeled streaming data containing both normal and abnormal samples simultaneously.
To tackle this issue, we propose modeling the feature distribution of normal images using a Neural Gas network, which offers the flexibility to adapt its topology structure to identify outliers in the data flow. To achieve improved performance with limited training samples, we employ multi-scale feature embeddings extracted from a CNN pre-trained on ImageNet to obtain a robust representation. Furthermore, we introduce an algorithm that can incrementally update parameters without the need to store previous samples. Comprehensive experiments on the MVTec AD and BTAD datasets demonstrate that our method achieves substantial performance under the FOADS setting while keeping the time complexity within an acceptable range.
- [114] arXiv:2403.18202 [pdf, other]
-
Title: TGMM: Combining Parse Tree with GPU for Scalable Multilingual and Multi-Granularity Code Clone DetectionComments: 14 pages, 7 figuresSubjects: Software Engineering (cs.SE)
The rapid evolution of programming languages and software systems has necessitated multilingual and scalable clone detection tools. However, it is difficult to meet both requirements at the same time, and most existing tools focus on only one of them. In this work, we propose TGMM, a tree and GPU-based tool for multilingual and multi-granularity code clone detection. By generating parse trees based on user-provided grammar files, TGMM can extract code blocks at a specified granularity and detect Type-3 clones efficiently. To evaluate the performance of TGMM, we compare it with seven state-of-the-art tools in terms of recall, precision, and execution time. TGMM ranks first in execution time and precision, while its recall is comparable to the others. Moreover, we analyze the language extensibility of TGMM across 30 mainstream programming languages. Of these, 25 languages are supported, while the remaining five currently lack the necessary grammar files. Finally, we analyze the clone characteristics of nine popular languages at five common granularities, hoping to inspire future researchers. The source code of TGMM is available at: https://github.com/TGMM24/TGMM.git.
- [115] arXiv:2403.18203 [pdf, other]
-
Title: EndToEndML: An Open-Source End-to-End Pipeline for Machine Learning ApplicationsAuthors: Nisha Pillai, Athish Ram Das, Moses Ayoola, Ganga Gireesan, Bindu Nanduri, Mahalingam RamkumarComments: 2024 7th International Conference on Information and Computer Technologies (ICICT)Subjects: Artificial Intelligence (cs.AI)
Artificial intelligence (AI) techniques are widely applied in the life sciences. However, applying innovative AI techniques to understand and deconvolute biological complexity is hindered by the learning curve life scientists face in understanding and using programming languages. An open-source, user-friendly interface for AI models that does not require programming skills to analyze complex biological data would be extremely valuable to the bioinformatics community. With easy access to different sequencing technologies and increased interest in different 'omics' studies, the number of biological datasets being generated has increased, and analyzing these high-throughput datasets is computationally demanding. The majority of AI libraries today require advanced programming skills as well as machine learning, data preprocessing, and visualization skills. In this research, we propose a web-based end-to-end pipeline that is capable of preprocessing, training, evaluating, and visualizing machine learning (ML) models without manual intervention or coding expertise. By integrating traditional machine learning and deep neural network models with visualizations, our library assists in recognizing, classifying, clustering, and predicting a wide range of multi-modal, multi-sensor datasets, including images, languages, and one-dimensional numerical data, for drug discovery, pathogen classification, and medical diagnostics.
- [116] arXiv:2403.18205 [pdf, other]
-
Title: Exploring the Privacy Protection Capabilities of Chinese Large Language ModelsComments: 11 pagesSubjects: Artificial Intelligence (cs.AI)
Large language models (LLMs), renowned for their impressive capabilities in various tasks, have significantly advanced artificial intelligence. Yet these advancements have raised growing concerns about privacy and security implications. To address these issues and explain the risks inherent in these models, we have devised a three-tiered progressive framework tailored for evaluating privacy in language systems. Each tier of this framework consists of progressively more complex and in-depth privacy test tasks. Our primary objective is to comprehensively evaluate the sensitivity of large language models to private information, examining how effectively they discern, manage, and safeguard sensitive data in diverse scenarios. This systematic evaluation helps us understand the degree to which these models comply with privacy protection guidelines and the effectiveness of their inherent safeguards against privacy breaches. Our observations indicate that existing Chinese large language models universally show shortcomings in privacy protection. At present, this widespread issue appears unavoidable and may pose corresponding privacy risks in applications based on these models.
- [117] arXiv:2403.18206 [pdf, other]
-
Title: Sailing Through Point Clouds: Safe Navigation Using Point Cloud Based Control Barrier FunctionsSubjects: Robotics (cs.RO)
The capability to navigate safely in an unstructured environment is crucial when deploying robotic systems in real-world scenarios. Recently, control barrier function (CBF) based approaches have been highly effective in synthesizing safety-critical controllers. In this work, we propose a novel CBF-based local planner composed of two components: Vessel and Mariner. The Vessel is a novel scaling factor based CBF formulation that synthesizes CBFs using only point cloud data. The Mariner is a CBF-based preview control framework that mitigates the risk of getting stuck in spurious equilibria during navigation. To demonstrate the efficacy of our proposed approach, we first compare the proposed point cloud based CBF formulation with other point cloud based CBF formulations. Then, we demonstrate the performance of our proposed approach and its integration with global planners through experimental studies on the Unitree B1 and Unitree Go2 quadruped robots in various environments.
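For readers new to CBF safety filters, the sketch below shows the generic idea in its simplest form; it is not the Vessel scaling-factor formulation. It assumes single-integrator dynamics x_dot = u, takes only the nearest point of the cloud as the obstacle, and uses h(x) = ||x - p||^2 - r^2 as the barrier. With a single constraint, the usual QP "stay as close as possible to the nominal command subject to h_dot >= -alpha * h" has a closed-form projection. The function name and parameters are hypothetical.

```python
import numpy as np


def cbf_filter(x, u_ref, cloud, radius=0.5, alpha=1.0):
    """Minimally modify u_ref so that h(x) = ||x - p||^2 - radius^2 >= 0 is
    preserved for the nearest cloud point p, assuming single-integrator
    dynamics x_dot = u (an illustrative assumption)."""
    x = np.asarray(x, dtype=float)
    u_ref = np.asarray(u_ref, dtype=float)
    cloud = np.asarray(cloud, dtype=float)
    p = cloud[np.argmin(np.linalg.norm(cloud - x, axis=1))]  # nearest point
    h = np.dot(x - p, x - p) - radius**2   # barrier value, >= 0 means safe
    a = 2.0 * (x - p)                      # gradient of h with respect to x
    c = -alpha * h                         # CBF condition: a . u >= -alpha * h
    slack = np.dot(a, u_ref) - c
    if slack >= 0:
        return u_ref                       # nominal command already satisfies the condition
    # Closed-form solution of min ||u - u_ref||^2 subject to a . u >= c
    return u_ref - slack / np.dot(a, a) * a
```

Keeping only the nearest point is a simplification: near concave or cluttered clouds several points can be active at once, and a proper QP over multiple constraints would then be needed. The sketch also assumes the robot is not exactly on a cloud point (otherwise `a` would be zero).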
- [118] arXiv:2403.18207 [pdf, other]
-
Title: Road Obstacle Detection based on Unknown Objectness ScoresComments: ICRA 2024Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
The detection of unknown traffic obstacles is vital to ensure safe autonomous driving. Standard object-detection methods cannot identify unknown objects that do not belong to predefined categories, because they are trained to assign a background label to pixels corresponding to unknown objects. To address this problem, the pixel-wise anomaly-detection approach has attracted increased research attention. Anomaly-detection techniques, such as uncertainty estimation and perceptual difference from reconstructed images, make it possible to identify pixels of unknown objects as out-of-distribution (OoD) samples. However, when applied to images with many unknowns and complex components, such as driving scenes, these methods often exhibit unstable performance. The purpose of this study is to achieve stable performance for detecting unknown objects by incorporating object-detection techniques into pixel-wise anomaly-detection methods. To achieve this goal, we adopt a semantic-segmentation network with a sigmoid head that simultaneously provides pixel-wise anomaly scores and objectness scores. Our experimental results show that the objectness scores play an important role in improving detection performance. Based on these results, we propose a novel anomaly score that integrates these two scores, which we term the unknown objectness score. Quantitative evaluations show that the proposed method outperforms state-of-the-art methods on publicly available datasets.
- [119] arXiv:2403.18208 [pdf, other]
-
Title: An Evolutionary Network Architecture Search Framework with Adaptive Multimodal Fusion for Hand Gesture RecognitionSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)
Hand gesture recognition (HGR) based on multimodal data has attracted considerable attention owing to its great potential in applications. Various manually designed multimodal deep networks have performed well in multimodal HGR (MHGR), but most existing algorithms require extensive expert experience and time-consuming manual trials. To address these issues, we propose an evolutionary network architecture search framework with adaptive multimodal fusion (AMF-ENAS). Specifically, we design an encoding space that simultaneously considers fusion positions and ratios of the multimodal data, allowing for the automatic construction of multimodal networks with different architectures through decoding. Additionally, we consider three input streams corresponding to intra-modal surface electromyography (sEMG), intra-modal accelerometer (ACC), and inter-modal sEMG-ACC. To automatically adapt to various datasets, the ENAS framework is designed to automatically search for an MHGR network with appropriate fusion positions and ratios. To the best of our knowledge, this is the first time that ENAS has been utilized in MHGR to tackle issues related to the fusion position and ratio of multimodal data. Experimental results demonstrate that AMF-ENAS achieves state-of-the-art performance on the Ninapro DB2, DB3, and DB7 datasets.
- [120] arXiv:2403.18209 [pdf, other]
-
Title: Long and Short-Term Constraints Driven Safe Reinforcement Learning for Autonomous DrivingSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO)
Reinforcement learning (RL) has been widely used in decision-making tasks, but because it requires interaction with the environment, it cannot guarantee the agent's safety during training, which seriously limits its industrial applications such as autonomous driving. Safe RL methods have been developed to handle this issue by constraining the expected safety violation costs as a training objective, but they still permit the occurrence of unsafe states, which is unacceptable in autonomous driving tasks. Moreover, these methods struggle to balance the expected cost against the expected return, which degrades their learning performance. In this paper, we propose a novel algorithm based on long and short-term constraints (LSTC) for safe RL. The short-term constraint aims to guarantee the safety of the short-term states that the vehicle explores, while the long-term constraint ensures the overall safety of the vehicle throughout the decision-making process. In addition, we develop a safe RL method with dual-constraint optimization based on the Lagrange multiplier to optimize the training process for end-to-end autonomous driving. Comprehensive experiments were conducted on the MetaDrive simulator. Experimental results demonstrate that the proposed method achieves higher safety in continuous state and action tasks, and exhibits higher exploration performance in long-distance decision-making tasks compared with state-of-the-art methods.
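A common way to handle two constraints with Lagrange multipliers is projected dual ascent: the policy maximizes the penalized return, and each multiplier grows while its constraint is violated and is clipped at zero otherwise. The sketch below is a generic version of that update under assumed names (`DualConstraintMultipliers`, `lr`, the cost limits); it is not the LSTC algorithm itself, and how the short- and long-term costs are measured is left to the caller.

```python
from dataclasses import dataclass


@dataclass
class DualConstraintMultipliers:
    """Hypothetical container for one multiplier per constraint."""
    lam_short: float = 0.0
    lam_long: float = 0.0
    lr: float = 0.05

    def penalized_return(self, ret, short_cost, long_cost):
        # Lagrangian objective that the policy update would maximize
        return ret - self.lam_short * short_cost - self.lam_long * long_cost

    def update(self, short_cost, short_limit, long_cost, long_limit):
        # Projected dual ascent: increase a multiplier when its constraint
        # is violated, decrease it otherwise, never letting it go negative
        self.lam_short = max(0.0, self.lam_short + self.lr * (short_cost - short_limit))
        self.lam_long = max(0.0, self.lam_long + self.lr * (long_cost - long_limit))
```

Alternating policy updates on `penalized_return` with calls to `update` is the standard primal-dual pattern; the step size `lr` controls how aggressively violations are punished.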
- [121] arXiv:2403.18211 [pdf, other]
-
Title: NeuroPictor: Refining fMRI-to-Image Reconstruction via Multi-individual Pretraining and Multi-level ModulationSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Recent fMRI-to-image approaches have mainly focused on associating fMRI signals with specific conditions of pre-trained diffusion models. These approaches, while producing high-quality images, capture only a limited aspect of the complex information in fMRI signals and offer little detailed control over image creation. In contrast, this paper proposes to directly modulate the generation process of diffusion models using fMRI signals. Our approach, NeuroPictor, divides the fMRI-to-image process into three steps: i) fMRI calibrated-encoding, which tackles multi-individual pre-training for a shared latent space to minimize individual differences and enable the subsequent cross-subject training; ii) fMRI-to-image cross-subject pre-training, which perceptually learns to guide the diffusion model with high- and low-level conditions across different individuals; iii) fMRI-to-image single-subject refining, which is similar to step ii but focuses on adapting to a particular individual. NeuroPictor extracts high-level semantic features from fMRI signals that characterize the visual stimulus and incrementally fine-tunes the diffusion model with a low-level manipulation network to provide precise structural instructions. By training with over 60,000 fMRI-image pairs from various individuals, our model enjoys superior fMRI-to-image decoding capacity, particularly in the within-subject setting, as evidenced on benchmark datasets. Project page: https://jingyanghuo.github.io/neuropictor/.
- [122] arXiv:2403.18212 [pdf, other]
-
Title: Preference-Based Planning in Stochastic Environments: From Partially-Ordered Temporal Goals to Most Preferred PoliciesComments: arXiv admin note: substantial text overlap with arXiv:2209.12267Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Formal Languages and Automata Theory (cs.FL); Logic in Computer Science (cs.LO)
Human preferences are not always represented via complete linear orders: It is natural to employ partially-ordered preferences for expressing incomparable outcomes. In this work, we consider decision-making and probabilistic planning in stochastic systems modeled as Markov decision processes (MDPs), given a partially ordered preference over a set of temporally extended goals. Specifically, each temporally extended goal is expressed using a formula in Linear Temporal Logic on Finite Traces (LTL$_f$). To plan with the partially ordered preference, we introduce order theory to map a preference over temporal goals to a preference over policies for the MDP. Accordingly, a most preferred policy under a stochastic ordering induces a stochastic nondominated probability distribution over the finite paths in the MDP. To synthesize a most preferred policy, our technical approach includes two key steps. In the first step, we develop a procedure to transform a partially ordered preference over temporal goals into a computational model, called preference automaton, which is a semi-automaton with a partial order over acceptance conditions. In the second step, we prove that finding a most preferred policy is equivalent to computing a Pareto-optimal policy in a multi-objective MDP that is constructed from the original MDP, the preference automaton, and the chosen stochastic ordering relation. Throughout the paper, we employ running examples to illustrate the proposed preference specification and solution approaches. We demonstrate the efficacy of our algorithm using these examples, providing detailed analysis, and then discuss several potential future directions.
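The final step relies on Pareto optimality. As a small illustration only, the sketch below filters a finite list of candidate value vectors (for example, each policy's probabilities of satisfying the different temporal goals, higher being better) down to the nondominated ones. It does not build the multi-objective MDP or the preference automaton; the function names are hypothetical.

```python
def dominates(a, b):
    """True if a is at least as good as b in every objective and strictly
    better in at least one (all objectives are maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]
```

For instance, among the vectors (0.8, 0.1), (0.5, 0.5), (0.4, 0.4), and (0.1, 0.9), only (0.4, 0.4) is dominated (by (0.5, 0.5)), so the other three form the front.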
- [123] arXiv:2403.18217 [pdf, ps, other]
-
Title: Mixed Variational Formulation of Coupled PlatesSubjects: Numerical Analysis (math.NA)
This paper proposes a mixed variational formulation for the problem of two coupled plates with a rigid junction. The proposed mixed formulation introduces the union of stresses and moments as an auxiliary variable; these quantities are commonly of great interest in practical applications. The primary challenge lies in determining a suitable space involving both boundary and junction conditions of the auxiliary variable. The theory of densely defined operators in Hilbert spaces is employed to define a nonstandard Sobolev space without the use of trace operators. The well-posedness of the mixed formulation is established. Based on these conditions, this paper provides a framework for conforming mixed finite element methods. Numerical experiments are given to validate the theoretical results.
- [124] arXiv:2403.18218 [pdf, other]
-
Title: Leveraging Large Language Models for Fuzzy String Matching in Political ScienceAuthors: Yu WangComments: 7 pages, 2 figures, 1 tableSubjects: Artificial Intelligence (cs.AI)
Fuzzy string matching remains a key issue when political scientists combine data from different sources. Existing matching methods invariably rely on string distances, such as Levenshtein distance and cosine similarity. As such, they are inherently incapable of matching strings that refer to the same entity with different names, such as "JP Morgan" and "Chase Bank", "DPRK" and "North Korea", or "Chuck Fleischmann (R)" and "Charles Fleischmann (R)". In this letter, we propose to use large language models to sidestep this problem entirely in an easy and intuitive manner. Extensive experiments show that our proposed methods can improve the state of the art by as much as 39% in terms of average precision while being substantially easier and more intuitive for political scientists to use. Moreover, our results are robust across various temperature settings. We further note that enhanced prompting can lead to additional performance improvements.
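To see why distance-based matching struggles with these pairs, the sketch below implements the classic Levenshtein dynamic program and a length-normalized similarity. This is the baseline family the abstract argues against, not the proposed LLM-based method; `similarity` is a hypothetical helper. "Chuck Fleischmann (R)" and "Charles Fleischmann (R)" still score fairly high because they share a long suffix, whereas "DPRK" and "North Korea" share almost no characters and score low even though they name the same entity.

```python
def levenshtein(s, t):
    """Edit distance with unit-cost insertion, deletion, and substitution."""
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,                # delete cs
                            curr[j - 1] + 1,            # insert ct
                            prev[j - 1] + (cs != ct)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def similarity(s, t):
    """Normalize the distance to [0, 1], where 1 means identical strings."""
    if not s and not t:
        return 1.0
    return 1.0 - levenshtein(s, t) / max(len(s), len(t))
```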
- [125] arXiv:2403.18219 [pdf, ps, other]
-
Title: From Two-Dimensional to Three-Dimensional Environment with Q-Learning: Modeling Autonomous Navigation with Reinforcement Learning and no LibrariesAuthors: Ergon Cugler de Moraes SilvaSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation (stat.CO)
Reinforcement learning (RL) algorithms have become indispensable tools in artificial intelligence, empowering agents to acquire optimal decision-making policies through interactions with their environment and feedback mechanisms. This study explores the performance of RL agents in both two-dimensional (2D) and three-dimensional (3D) environments, aiming to investigate the dynamics of learning across different spatial dimensions. A key aspect of this investigation is the absence of pre-made learning libraries: the algorithm is developed exclusively through computational mathematics. The methodological framework centers on RL principles, employing a Q-learning agent class and distinct environment classes tailored to each spatial dimension. The research addresses the question: How do reinforcement learning agents adapt and perform in environments of varying spatial dimensions, particularly in 2D and 3D settings? Through empirical analysis, the study evaluates agents' learning trajectories and adaptation processes, revealing insights into the efficacy of RL algorithms in navigating complex, multi-dimensional spaces. Reflections on the findings prompt considerations for future research, particularly in understanding the dynamics of learning in higher-dimensional environments.
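In the same no-libraries spirit, here is a minimal tabular Q-learning sketch on a 2D grid using only the Python standard library. The grid size, rewards, and hyperparameters are illustrative assumptions, not the study's setup. The agent starts in one corner, pays -1 per move, and the episode ends at the opposite corner. Initializing Q to zero is optimistic here (all true values are negative), which pushes the agent to try every action.

```python
import random

ACTIONS = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # right, down, left, up


def step(state, action, n):
    """Move on an n x n grid; bumping into a wall leaves the agent in place."""
    r, c = state
    dr, dc = ACTIONS[action]
    nr, nc = min(max(r + dr, 0), n - 1), min(max(c + dc, 0), n - 1)
    done = (nr, nc) == (n - 1, n - 1)
    return (nr, nc), -1.0, done  # -1 per move rewards short paths


def train_q(n=4, episodes=2000, alpha=0.5, gamma=0.95, eps=0.2, seed=0):
    rng = random.Random(seed)
    q = {(r, c): [0.0] * len(ACTIONS) for r in range(n) for c in range(n)}
    for _ in range(episodes):
        s, done = (0, 0), False
        while not done:
            if rng.random() < eps:  # epsilon-greedy exploration
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda k: q[s][k])
            s2, reward, done = step(s, a, n)
            target = reward if done else reward + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])  # Q-learning update
            s = s2
    return q


def greedy_path(q, n, max_steps=100):
    """Follow the learned greedy policy from the start corner."""
    s, path = (0, 0), [(0, 0)]
    while s != (n - 1, n - 1) and len(path) <= max_steps:
        a = max(range(len(ACTIONS)), key=lambda k: q[s][k])
        s, _, _ = step(s, a, n)
        path.append(s)
    return path
```

A 3D variant would only need a third coordinate in the state and two more entries in `ACTIONS`; the update rule is unchanged, but the table grows with the cube of the grid size.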
- [126] arXiv:2403.18222 [pdf, other]
-
Title: Uncertainty-Aware Deployment of Pre-trained Language-Conditioned Imitation Learning PoliciesComments: 8 pages, 7 figuresSubjects: Robotics (cs.RO); Machine Learning (cs.LG)
Large-scale robotic policies trained on data from diverse tasks and robotic platforms hold great promise for enabling general-purpose robots; however, reliable generalization to new environment conditions remains a major challenge. Toward addressing this challenge, we propose a novel approach for uncertainty-aware deployment of pre-trained language-conditioned imitation learning agents. Specifically, we use temperature scaling to calibrate these models and exploit the calibrated model to make uncertainty-aware decisions by aggregating the local information of candidate actions. We implement our approach in simulation using three such pre-trained models, and showcase its potential to significantly enhance task completion rates. The accompanying code is accessible at the link: https://github.com/BobWu1998/uncertainty_quant_all.git
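Temperature scaling fits a single scalar T on held-out data and divides every logit by it; T > 1 softens overconfident predictions without changing the argmax. The sketch below fits T by grid search over the negative log-likelihood, a simple stand-in for the gradient-based fitting that is often used. It covers only the calibration step, not the aggregation over candidate actions, and all names are hypothetical.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def nll(logits, labels, T):
    """Mean negative log-likelihood of the labels at temperature T."""
    p = softmax(logits / T)
    return -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))


def fit_temperature(logits, labels, grid=np.linspace(0.25, 10.0, 400)):
    """Return the temperature on the grid with the lowest validation NLL."""
    losses = [nll(logits, labels, T) for T in grid]
    return float(grid[int(np.argmin(losses))])
```

As a sanity check, if labels are drawn from softmax(z) but the model reports logits 3 * z, the fitted temperature should land near 3.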
- [127] arXiv:2403.18223 [pdf, other]
-
Title: A Transformer-Based Framework for Payload Malware Detection and ClassificationSubjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
As malicious cyber threats become more sophisticated in breaching computer networks, the need for effective intrusion detection systems (IDSs) becomes crucial. Techniques such as Deep Packet Inspection (DPI) have been introduced to allow IDSs to analyze the content of network packets, providing more context for identifying potential threats. IDSs traditionally rely on anomaly-based and signature-based detection techniques to detect unrecognized and suspicious activity. Deep learning techniques have shown great potential in DPI for IDSs due to their ability to learn intricate patterns from the packet content transmitted through the network. In this paper, we propose a novel DPI algorithm based on transformers adapted for the purpose of detecting malicious traffic with a classifier head. Transformers learn the complex content of sequence data and generalize well to similar scenarios thanks to their self-attention mechanism. Our proposed method uses the raw payload bytes that represent the packet contents and is deployed as a man-in-the-middle. The payload bytes are used to detect malicious packets and classify their types. Experimental results on the UNSW-NB15 and CIC-IOT23 datasets demonstrate that our transformer-based model is effective in distinguishing malicious from benign traffic in the test dataset, attaining an average accuracy of 79% in binary classification and 72% in the multi-class classification experiment, both using solely payload bytes.
- [128] arXiv:2403.18226 [pdf, other]
-
Title: How is Testing Related to Single Statement Bugs?Subjects: Software Engineering (cs.SE)
In this study, we analyzed the correlation between unit test coverage and the occurrence of Single Statement Bugs (SSBs) in open-source Java projects. We analyzed data from the top 100 Maven-based projects on GitHub, which include 7824 SSBs. Our preliminary findings suggest a weak to moderate correlation, indicating that increased test coverage is associated with a somewhat reduced occurrence of SSBs. However, this relationship is not very strong, emphasizing the need for better tests. Our study contributes to the ongoing discussion on enhancing software quality and provides a basis for future research into effective testing practices aimed at mitigating SSBs.
- [129] arXiv:2403.18227 [pdf, other]
-
Title: One Backpropagation in Two Tower Recommendation ModelsComments: 9 pages, 8 figuresSubjects: Information Retrieval (cs.IR)
Recent years have witnessed extensive research on developing two-tower recommendation models for relieving information overload. Four building modules can be identified in such models, namely, user-item encoding, negative sampling, loss computing and back-propagation updating. To the best of our knowledge, existing algorithms have focused only on the first three modules, neglecting the backpropagation module. They all adopt a two-backpropagation strategy, which is based on an implicit assumption of treating users and items equally in the training phase. In this paper, we challenge this equal training assumption and propose a novel one-backpropagation updating strategy, which keeps the normal gradient backpropagation for the item encoding tower but cuts off the backpropagation for the user encoding tower. Instead, we propose a moving-aggregation updating strategy to update a user encoding in each training epoch. Apart from the proposed backpropagation updating module, we implement the other three modules with the most straightforward choices. Experiments on four public datasets validate the effectiveness and efficiency of our model in terms of improved recommendation performance and reduced computational overhead compared with state-of-the-art competitors.
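The abstract does not spell out the moving-aggregation rule, so the sketch below shows one plausible form under loudly stated assumptions: once per epoch, each user's encoding is blended with the mean encoding of the items that user interacted with, using an exponential moving average and no gradient. The function name, the momentum value, and the choice of a mean as the aggregation are all hypothetical.

```python
import numpy as np


def moving_aggregation_update(user_emb, item_emb, interactions, momentum=0.9):
    """Hypothetical gradient-free user update (an assumption, not the paper's
    exact rule): blend each user's previous encoding with the mean encoding
    of the items they interacted with.

    user_emb:     (num_users, d) float array, updated in place and returned
    item_emb:     (num_items, d) array produced by the trainable item tower
    interactions: dict mapping user id -> list of item ids
    """
    for u, items in interactions.items():
        if items:  # users with no interactions keep their current encoding
            target = item_emb[items].mean(axis=0)
            user_emb[u] = momentum * user_emb[u] + (1.0 - momentum) * target
    return user_emb
```

Because only the item tower receives gradients in this arrangement, each training step backpropagates through one tower instead of two, which is where a cost saving would come from.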
- [130] arXiv:2403.18228 [pdf, other]
-
Title: Fourier or Wavelet bases as counterpart self-attention in spikformer for efficient visual classificationComments: 18 pages, 2 figures. arXiv admin note: substantial text overlap with arXiv:2308.02557Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Energy-efficient spikformer has been proposed by integrating the biologically plausible spiking neural network (SNN) and artificial Transformer, whereby Spiking Self-Attention (SSA) is used to achieve both higher accuracy and lower computational cost. However, self-attention does not seem to be always necessary, especially in sparse spike-form calculation. In this paper, we replace vanilla SSA (which uses dynamic bases calculated from Query and Key) with spike-form Fourier Transform, Wavelet Transform, and their combinations (which use fixed trigonometric or wavelet bases), based on the key hypothesis that both use a set of basis functions for information transformation. Hence, the Fourier-or-Wavelet-based spikformer (FWformer) is proposed and verified in visual classification tasks, including both static image and event-based video datasets. The FWformer can achieve comparable or even higher accuracies ($0.4\%$-$1.5\%$), higher running speed ($9\%$-$51\%$ for training and $19\%$-$70\%$ for inference), reduced theoretical energy consumption ($20\%$-$25\%$), and reduced GPU memory usage ($4\%$-$26\%$), compared to the standard spikformer. Our results indicate that the continued refinement of new Transformers, inspired either by biological discovery (spike form) or by information theory (Fourier or Wavelet Transform), is promising.
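To make "fixed bases instead of Query/Key" concrete, the sketch below shows two parameter-free token-mixing operations on a real-valued (tokens, features) array: an FNet-style mixer that keeps the real part of a 2D DFT, and a one-level orthonormal Haar wavelet transform along the token axis. This is a non-spiking illustration under those assumptions, not the FWformer implementation; spike encoding, the exact transforms, and their combinations are omitted.

```python
import numpy as np


def fourier_mixing(x):
    """Mix tokens with a fixed Fourier basis: real part of the 2D DFT over
    the (token, feature) axes. No learned parameters."""
    return np.real(np.fft.fft2(x))


def haar_mixing(x):
    """One-level orthonormal Haar transform along the token axis.
    Assumes an even number of tokens; output rows are [approx; detail]."""
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)  # local averages
    detail = (even - odd) / np.sqrt(2.0)  # local differences
    return np.concatenate([approx, detail], axis=0)
```

Both operations cost no parameters and no attention matrix; the Haar transform is orthonormal, so it preserves the overall signal energy.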
- [131] arXiv:2403.18229 [pdf, ps, other]
-
Title: A Comprehensive Overview of the Lebesgue Differentiation Theorem in CoqSubjects: Logic in Computer Science (cs.LO)
Formalization of real analysis offers a chance to rebuild traditional proofs of important theorems as unambiguous theories that can be interactively explored. This paper provides a comprehensive overview of the Lebesgue Differentiation Theorem formalized in the Coq proof assistant, from which the first Fundamental Theorem of Calculus (FTC) for the Lebesgue integral is obtained as a corollary. Proving the first FTC in this way has the advantage of decomposing into loosely-coupled theories of moderate size and of independent interest that lend themselves well to incremental and collaborative development. We explain how we formalize all the topological constructs and all the standard lemmas needed to eventually relate the definitions of derivability and of Lebesgue integration of MathComp-Analysis, a formalization of analysis developed on top of the Mathematical Components library. In the course of this experiment, we substantially enrich MathComp-Analysis and even devise a new proof for Urysohn's lemma.
- [132] arXiv:2403.18230 [pdf, other]
-
Title: Large Language Models Need Consultants for Reasoning: Becoming an Expert in a Complex Human System Through Behavior SimulationSubjects: Artificial Intelligence (cs.AI)
Large language models (LLMs), in conjunction with various reasoning reinforcement methodologies, have demonstrated remarkable capabilities comparable to humans in fields such as mathematics, law, coding, common sense, and world knowledge. In this paper, we delve into the reasoning abilities of LLMs within complex human systems. We propose a novel reasoning framework, termed "Mosaic Expert Observation Wall" (MEOW), that exploits a generative-agents-based simulation technique. In the MEOW framework, simulated data are used to train an expert model that concentrates "experience" about a specific task from each independent simulation run. It is the "experience" accumulated through simulation that makes an expert on a task in a complex human system. We conduct experiments within a communication game that mirrors real-world security scenarios. The results indicate that our proposed methodology can cooperate with existing methodologies to enhance the reasoning abilities of LLMs in complex human systems.
- [133] arXiv:2403.18231 [pdf, ps, other]
-
Title: The Dimensions of the Hulls of Conorm Codes from Algebraic Geometry CodesSubjects: Information Theory (cs.IT)
Chara et al. introduced conorm codes defined over algebraic geometry codes, but the hulls of conorm codes have not yet been determined. In this paper, we study the dimension of the hull of conorm codes using the method introduced by Camps et al. For an algebraic geometry code $\mathcal{C}:=C_\mathscr{L}(D, G)$, we consider the divisor $\gcd(G, H)$, where $H$ is the divisor satisfying \[C_\mathscr{L}(D, G)^\perp=C_\mathscr{L}(D, H).\] Given an extension $F'/\mathbb{F}_{q^t}$ of an algebraic function field $F/\mathbb{F}_q$, we assume that the divisor $\gcd(G, H)$ is non-special. If the degree of $\gcd(G, H)$ is greater than $2g-2+{t\over [F':F]}\deg\text{Diff}(F'/F)$, we determine the exact dimension of the hull of the conorm of $\mathcal{C}$. Otherwise, we determine a lower bound on the dimension of the hull of the conorm of $\mathcal{C}$. We provide some examples of the dimension of the hull of certain conorm codes of AG codes defined over a rational function field.
- [134] arXiv:2403.18235 [pdf, other]
-
Title: An Execution-time-certified QP Algorithm for $\ell_1$ penalty-based Soft-constrained MPC
Comments: 6 pages
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
Providing an execution-time certificate and handling possible infeasibility in closed loop are two pressing requirements of Model Predictive Control (MPC). To meet both requirements simultaneously, this paper adopts an $\ell_1$ penalty-based soft-constrained MPC formulation and transforms the resulting non-smooth QP into a box-constrained QP, which is solved by our previously proposed direct, execution-time-certified algorithm whose exact number of iterations depends only on the problem dimension and not on the data [1]. This approach overcomes the limitation of our previous algorithm [1], which applies only to input-constrained MPC, and it retains the exact recovery property of the $\ell_1$ penalty-based soft-constrained formulation (it recovers exactly the same solution whenever the original problem is feasible) without suffering the numerical difficulties caused by the resulting non-smoothness. Other real-time QP applications, not limited to MPC, will also benefit from our QP algorithm with its execution-time certificate and global feasibility.
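For orientation, the textbook form of an $\ell_1$ penalty-based soft-constrained QP is sketched below. The symbols $H, h, G, g, \rho, s$ and the multiplier $\lambda^\star$ are generic placeholders, and the right-hand side is the standard slack-variable reformulation, not the box-constrained transformation proposed in the paper.

```latex
\begin{aligned}
&\min_{x}\; \tfrac{1}{2}x^\top H x + h^\top x + \rho\,\bigl\|\max(0,\, Gx - g)\bigr\|_1 \\
&\qquad\Longleftrightarrow\quad
\min_{x,\,s}\; \tfrac{1}{2}x^\top H x + h^\top x + \rho\,\mathbf{1}^\top s
\quad \text{s.t.}\quad Gx - g \le s,\quad s \ge 0 .
\end{aligned}
```

If the hard-constrained QP (with $Gx \le g$) is feasible, $H \succ 0$, and $\rho > \|\lambda^\star\|_\infty$ for an optimal multiplier $\lambda^\star$ of that QP, the classical exact-penalty result says the penalized problem returns the same minimizer, which is the exact recovery property referred to in the abstract.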
- [135] arXiv:2403.18236 [pdf, other]
-
Title: Multi-AGV Path Planning Method via Reinforcement Learning and Particle Filters
Authors: Shao Shuo
Subjects: Robotics (cs.RO)
Reinforcement Learning (RL), renowned for its robust learning capability and search stability, has attracted significant attention and found extensive application in Automated Guided Vehicle (AGV) path planning. However, RL planning algorithms face challenges stemming from the high variance of neural networks caused by environmental instability and large fluctuations in system structure. These challenges manifest as slow convergence and low learning efficiency. To tackle this issue, this paper presents the Particle Filter-Double Deep Q-Network (PF-DDQN) approach, which incorporates the Particle Filter (PF) into multi-AGV reinforcement learning path planning. PF-DDQN treats the network's imprecise weight values as state values to formulate a state-space equation. Through the iterative fusion of the neural network and the particle filter, the DDQN model is optimized to obtain the optimal true weight values, thus improving the algorithm's efficiency. The effectiveness and superiority of the proposed method are validated through numerical simulations: the proposed algorithm surpasses the traditional DDQN algorithm on the path-planning superiority and training-time indicators by 92.62% and 76.88%, respectively. In summary, by integrating the Particle Filter and optimizing the DDQN model, PF-DDQN addresses the challenges that RL planning algorithms face in AGV path planning and achieves higher efficiency than the traditional DDQN algorithm.
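The abstract does not spell out how particles are fused with the DDQN weights. As a generic illustration only, a bootstrap particle filter that tracks a single scalar (standing in for one imprecise weight value) from noisy measurements can be sketched as follows; the random-walk motion model, the Gaussian likelihood, and all parameter values are assumptions.

```python
import math
import random


def particle_filter(observations, n_particles=500, process_std=0.05,
                    obs_std=0.5, seed=0):
    """Bootstrap particle filter for a scalar hidden state.

    Assumed model: x_k = x_{k-1} + N(0, process_std^2),  z_k = x_k + N(0, obs_std^2).
    Returns the posterior-mean estimate after each observation.
    """
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]  # prior N(0, 1)
    estimates = []
    for z in observations:
        # Predict: propagate every particle through the random-walk model.
        particles = [p + rng.gauss(0.0, process_std) for p in particles]
        # Update: weight each particle by the Gaussian likelihood of z.
        weights = [math.exp(-0.5 * ((z - p) / obs_std) ** 2) for p in particles]
        total = sum(weights)
        if total == 0.0:  # all weights underflowed; fall back to uniform
            weights = [1.0 / n_particles] * n_particles
        else:
            weights = [w / total for w in weights]
        estimates.append(sum(w * p for w, p in zip(weights, particles)))
        # Resample: draw particles in proportion to their weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates
```

Feeding it a constant stream such as `[2.0] * 50` moves the estimate from the prior mean 0 toward 2.0 within a few steps.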
- [136] arXiv:2403.18238 [pdf, other]
-
Title: TAFormer: A Unified Target-Aware Transformer for Video and Motion Joint Prediction in Aerial Scenes
Authors: Liangyu Xu, Wanxuan Lu, Hongfeng Yu, Yongqiang Mao, Hanbo Bi, Chenglong Liu, Xian Sun, Kun Fu
Comments: 17 pages, 9 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
As drone technology advances, using unmanned aerial vehicles for aerial surveys has become the dominant trend in modern low-altitude remote sensing. The surge in aerial video data calls for accurate prediction of future scenes and of the motion states of targets of interest, particularly in applications such as traffic management and disaster response. Existing video prediction methods focus solely on predicting future scenes (video frames) and neglect explicit modeling of the target's motion states, which is crucial for aerial video interpretation. To address this issue, we introduce a novel task called Target-Aware Aerial Video Prediction, which aims to predict future scenes and the target's motion states simultaneously. We further design a model specifically for this task, named TAFormer, which provides a unified modeling approach for both video and target motion states. Specifically, we introduce Spatiotemporal Attention (STA), which decouples the learning of video dynamics into spatial static attention and temporal dynamic attention, effectively modeling scene appearance and motion. Additionally, we design an Information Sharing Mechanism (ISM), which unifies the modeling of video and target motion by facilitating information interaction through two sets of messenger tokens. Moreover, to alleviate the difficulty of distinguishing targets in blurry predictions, we introduce a Target-Sensitive Gaussian Loss (TSGL) that enhances the model's sensitivity to both the target's position and content. Extensive experiments on UAV123VP and VisDroneVP (derived from single-object tracking datasets) demonstrate the strong performance of TAFormer in target-aware video prediction and its adaptability to the additional target-awareness requirements of aerial video interpretation.
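The idea of decoupling spatial and temporal attention can be illustrated with a factorized attention sketch in NumPy. This is not the STA module itself: it assumes a single head, identity query/key/value projections, and no residuals or normalization, all of which a real model would add.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    """Scaled dot-product attention over the second-to-last axis; leading axes are batch axes."""
    d = q.shape[-1]
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(d)
    return softmax(scores, axis=-1) @ v


def factorized_st_attention(x):
    """x: (T, N, D) = frames x tokens per frame x channels.

    Assumption: single head, identity Q/K/V projections, no residuals or norms.
    """
    # Spatial step: each frame attends over its own N tokens (batched over T).
    spatial = attention(x, x, x)
    # Temporal step: each token position attends over the T frames (batched over N).
    xt = np.swapaxes(spatial, 0, 1)  # (N, T, D)
    temporal = attention(xt, xt, xt)
    return np.swapaxes(temporal, 0, 1)  # back to (T, N, D)
```

The design motivation for factorizing is cost: the two steps take roughly O(T·N²·D + N·T²·D) work, versus O((T·N)²·D) for joint attention over all T·N tokens.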
- [137] arXiv:2403.18241 [pdf, other]
-
Title: NeuSDFusion: A Spatial-Aware Generative Model for 3D Shape Completion, Reconstruction, and Generation
Authors: Ruikai Cui, Weizhe Liu, Weixuan Sun, Senbo Wang, Taizhang Shang, Yang Li, Xibin Song, Han Yan, Zhennan Wu, Shenzhou Chen, Hongdong Li, Pan Ji
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG)
3D shape generation aims to produce innovative 3D content adhering to specific conditions and constraints. Existing methods often decompose 3D shapes into a sequence of localized components, treating each element in isolation without considering spatial consistency. As a result, these approaches exhibit limited versatility in 3D data representation and shape generation, hindering their ability to generate highly diverse 3D shapes that comply with the specified constraints. In this paper, we introduce a novel spatial-aware 3D shape generation framework that leverages 2D plane representations for enhanced 3D shape modeling. To ensure spatial coherence and reduce memory usage, we incorporate a hybrid shape representation technique that directly learns a continuous signed distance field representation of the 3D shape using orthogonal 2D planes. Additionally, we enforce spatial correspondences across distinct planes using a transformer-based autoencoder structure, promoting the preservation of spatial relationships in the generated 3D shapes. The resulting algorithm consistently outperforms state-of-the-art 3D shape generation methods on various tasks, including unconditional shape generation, multi-modal shape completion, single-view reconstruction, and text-to-shape synthesis.
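To make "a signed distance field represented with orthogonal 2D planes" concrete, here is a minimal tri-plane-style query, assuming three (R, R, C) feature planes aligned with the xy, xz, and yz axes, bilinear sampling, feature summation, and a linear decoder. The plane names, the summation, and the linear decoder are hypothetical simplifications; the paper's hybrid representation and decoder are not specified in the abstract.

```python
import numpy as np


def bilinear_sample(plane, u, v):
    """Bilinearly sample an (R, R, C) feature plane at continuous coords u, v in [-1, 1]."""
    R = plane.shape[0]  # assumes R >= 2
    x = (u + 1.0) * 0.5 * (R - 1)
    y = (v + 1.0) * 0.5 * (R - 1)
    x0 = int(np.clip(np.floor(x), 0, R - 2))
    y0 = int(np.clip(np.floor(y), 0, R - 2))
    dx, dy = x - x0, y - y0
    return ((1 - dx) * (1 - dy) * plane[x0, y0]
            + dx * (1 - dy) * plane[x0 + 1, y0]
            + (1 - dx) * dy * plane[x0, y0 + 1]
            + dx * dy * plane[x0 + 1, y0 + 1])


def triplane_sdf(point, planes, w, b):
    """Hypothetical SDF query: sum features from the three planes, then a linear decoder."""
    x, y, z = point
    feat = (bilinear_sample(planes["xy"], x, y)
            + bilinear_sample(planes["xz"], x, z)
            + bilinear_sample(planes["yz"], y, z))
    return float(feat @ w + b)
```

Because each 3D query touches only three 2D grids, memory grows as O(R²) per plane rather than O(R³) for a dense voxel grid, which is one common motivation for plane-based representations.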
- [138] arXiv:2403.18243 [pdf, other]
-
Title: Boosting Conversational Question Answering with Fine-Grained Retrieval-Augmentation and Self-Check
Subjects: Artificial Intelligence (cs.AI)
Retrieval-Augmented Generation (RAG) aims to generate more reliable and accurate responses by augmenting large language models (LLMs) with vast, dynamic external knowledge. Most previous work focuses on RAG for single-round question answering, while adapting RAG to the complex conversational setting, in which each question depends on the preceding context, has not been well studied. In this paper, we propose a conversation-level RAG approach that incorporates fine-grained retrieval augmentation and self-check for conversational question answering (CQA). In particular, our approach consists of three components, namely a conversational question refiner, a fine-grained retriever, and a self-check-based response generator, which work collaboratively for question understanding and relevant information acquisition in conversational settings. Extensive experiments demonstrate clear advantages of our approach over state-of-the-art baselines. Moreover, we release a Chinese CQA dataset with new features, including reformulated questions, extracted keywords, retrieved paragraphs, and their helpfulness, which facilitates further research on RAG-enhanced CQA.
- [139] arXiv:2403.18247 [pdf, other]
-
Title: An Experimentally Validated Feasible Quantum Protocol for Identity-Based Signature with Application to Secure Email Communication
Authors: Tapaswini Mohanty, Vikas Srivastava, Sumit Kumar Debnath, Debasish Roy, Kouichi Sakurai, Sourav Mukhopadhyay
Subjects: Cryptography and Security (cs.CR); Information Theory (cs.IT)
Digital signatures are among the simplest cryptographic building blocks that provide appealing security characteristics such as authenticity, unforgeability, and undeniability. In 1984, Shamir introduced the first identity-based signature (IBS) scheme to simplify public key infrastructure and avoid the need for certificates. IBS simplifies the process by enabling users to verify digital signatures using only the signers' identifiers, such as an email address or phone number. Nearly all existing IBS protocols rely on computational hardness assumptions. Unfortunately, these hard problems are insecure against quantum adversaries. Thus, designing IBS algorithms that can withstand quantum attacks and ensure long-term security is an important direction for future research. Quantum cryptography (QC) is one such approach. In this paper, we propose an IBS based on QC. The security of our scheme rests on the laws of quantum mechanics; it thereby achieves long-term security and resists quantum attacks. We verify the correctness and feasibility of the proposed design by simulating it on a prototype quantum device and on the IBM Qiskit quantum simulator. The Qiskit implementation code, in a Jupyter notebook, is provided in the Annexure. Moreover, we discuss the application of our design to secure email communication.
- [140] arXiv:2403.18249 [pdf, other]
-
Title: Exploring the Deceptive Power of LLM-Generated Fake News: A Study of Real-World Detection Challenges
Subjects: Computation and Language (cs.CL); Social and Information Networks (cs.SI)
Recent advancements in Large Language Models (LLMs) have enabled the creation of fake news, particularly in complex fields such as healthcare. Studies highlight a gap in deceptive power between LLM-generated fake news produced with and without human assistance, yet the potential of prompting techniques has not been fully explored. This work therefore aims to determine whether prompting strategies can effectively narrow this gap. Current LLM-based fake news attacks require human intervention for information gathering, and they often miss details and fail to maintain contextual consistency. Therefore, to better understand threat tactics, we propose a strong fake news attack method called conditional Variational-autoencoder-Like Prompt (VLPrompt). Unlike current methods, VLPrompt eliminates the need for additional data collection while maintaining contextual coherence and preserving the intricacies of the original text. To propel future research on detecting VLPrompt attacks, we created a new dataset named VLPrompt fake news (VLPFN) containing real and fake texts. We conducted experiments with various detection methods and novel human-study metrics to assess their performance on our dataset, yielding numerous findings.
- [141] arXiv:2403.18250 [pdf, other]
-
Title: Linear Hybrid Asymmetrical Load-Modulated Balanced Amplifier with Multi-Band Reconfigurability and Antenna-VSWR Resilience
Comments: This work has been submitted to the IEEE for possible publication
Subjects: Systems and Control (eess.SY)
This paper presents the first highly linear and load-insensitive three-way load-modulation power amplifier (PA) based on a reconfigurable hybrid asymmetrical load-modulated balanced amplifier (H-ALMBA). Through proper amplitude and phase control, the carrier, control amplifier (CA), and two peaking balanced amplifiers (BA1 and BA2) can form a linear high-order load modulation over a wide bandwidth. Moreover, it is shown theoretically that the load-modulation behavior of H-ALMBA can be made insensitive to load mismatch by leveraging bias reconfiguration and the intrinsic load insensitivity of the balanced topology. Specifically, the PA's linearity and efficiency profiles can be maintained against arbitrary load mismatch through $Z_\mathrm{L}$-dependent reconfiguration of the CA supply voltage ($V_\mathrm{DD,CA}$) and of the turn-on sequence of BA1 and BA2. Based on the proposed theory, an RF-input linear H-ALMBA is developed with GaN transistors and wideband quadrature hybrids. Over the design bandwidth of $1.7$-$2.9$ GHz, an efficiency of $56.8\%$-$72.9\%$ at peak power and $49.8\%$-$61.2\%$ at $10$-dB power back-off (PBO) is measured, together with linear AM-AM and AM-PM responses. In modulated evaluation with a 4G LTE signal, an EVM of $3.1\%$, an ACPR of $-39$ dB, and an average efficiency of up to $52\%$ are measured. Moreover, the reconfigurable H-ALMBA experimentally maintains excellent average efficiency and linearity against arbitrary load mismatch at $2:1$ VSWR, and this mismatch-resilient operation can be achieved at any in-band frequency. The overall measured performance compares favorably with the state of the art.
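To make the $2:1$ VSWR test condition concrete: it corresponds to a reflection-coefficient magnitude $|\Gamma| = (\mathrm{VSWR}-1)/(\mathrm{VSWR}+1) = 1/3$, and "arbitrary load mismatch" means any load on that constant-$|\Gamma|$ circle. The helpers below assume a 50 Ω reference impedance, which the abstract does not state, and require VSWR > 1.

```python
import cmath
import math


def vswr_to_gamma(vswr):
    """Reflection-coefficient magnitude |Gamma| for a given VSWR (> 1)."""
    return (vswr - 1.0) / (vswr + 1.0)


def return_loss_db(vswr):
    """Return loss in dB (positive number)."""
    return -20.0 * math.log10(vswr_to_gamma(vswr))


def mismatch_loss_db(vswr):
    """Power lost to reflection, in dB: -10 log10(1 - |Gamma|^2)."""
    g = vswr_to_gamma(vswr)
    return -10.0 * math.log10(1.0 - g * g)


def load_on_vswr_circle(vswr, phase_rad, z0=50.0):
    """Load impedance with |Gamma| fixed by the VSWR and Gamma at the given phase.

    z0 = 50 ohm is an assumed reference impedance.
    """
    gamma = vswr_to_gamma(vswr) * cmath.exp(1j * phase_rad)
    return z0 * (1 + gamma) / (1 - gamma)
```

For VSWR = 2, this gives |Γ| = 1/3, a return loss of about 9.54 dB, a mismatch loss of about 0.51 dB, and loads sweeping from 100 Ω (phase 0) to 25 Ω (phase π) around the circle.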
- [142] arXiv:2403.18251 [pdf, other]
-
Title: Since the Scientific Literature Is Multilingual, Our Models Should Be Too
Subjects: Computation and Language (cs.CL)
English has long been assumed to be the lingua franca of scientific research, and this notion is reflected in natural language processing (NLP) research on scientific document representation. In this position piece, we quantitatively show that the literature is largely multilingual and argue that current models and benchmarks should reflect this linguistic diversity. We provide evidence that text-based models fail to create meaningful representations for non-English papers and highlight the negative user-facing impacts of applying English-only models indiscriminately across a multilingual domain. We end with suggestions for the NLP community on how to improve performance on non-English documents.
- [143] arXiv:2403.18252 [pdf, other]
-
Title: Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM)
Visual representation learning has been a cornerstone of computer vision, evolving from supervised learning with human-annotated labels to aligning image-text pairs from the Internet. Despite recent advancements in multi-modal large language models (MLLMs), the visual representations they rely on, such as CLIP embeddings, often lack access to external world knowledge critical for real-world visual reasoning. In this work, we propose Visual Table, a novel visual representation tailored for MLLMs. It provides hierarchical text descriptions of holistic visual scenes, consisting of a scene description and multiple object-centric descriptions that encompass categories, attributes, and knowledge at the instance level. We further develop a scalable generator for visual table generation and train it on small-scale annotations from GPT-4V. Extensive evaluations demonstrate that, with generated visual tables as additional visual representations, our model consistently outperforms state-of-the-art (SOTA) MLLMs across diverse benchmarks. When visual tables serve as standalone visual representations, our model closely matches or even beats SOTA MLLMs built on CLIP visual embeddings. Our code is available at https://github.com/LaVi-Lab/Visual-Table.
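The hierarchy described above (one scene description plus object-centric entries with category, attributes, and knowledge) can be pictured as a small data structure. The class names, field names, and text layout below are hypothetical illustrations; the actual Visual Table format is defined in the paper and its repository.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ObjectEntry:
    """One object-centric description (hypothetical field names)."""
    category: str
    attributes: List[str]
    knowledge: List[str]


@dataclass
class VisualTable:
    """Scene-level description plus per-instance entries."""
    scene_description: str
    objects: List[ObjectEntry] = field(default_factory=list)

    def to_text(self) -> str:
        """Flatten the table into text that could be fed to a language model."""
        lines = [f"Scene: {self.scene_description}"]
        for i, obj in enumerate(self.objects, start=1):
            lines.append(
                f"Object {i}: {obj.category} | attributes: {', '.join(obj.attributes)}"
                f" | knowledge: {'; '.join(obj.knowledge)}"
            )
        return "\n".join(lines)
```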
- [144] arXiv:2403.18253 [pdf, other]
-
Title: MD-PK: Metaphor Detection via Prompt Learning and Knowledge Distillation
Subjects: Computation and Language (cs.CL)
Metaphors are ubiquitous in daily life, yet detecting them poses a significant challenge. Previous approaches often struggled with improper application of language rules and overlooked the issue of data sparsity. To address these challenges, we introduce knowledge distillation and prompt learning into metaphor detection. Specifically, we devise a prompt learning template tailored to the metaphor detection task. By masking target words and providing relevant prompt information, we guide the model to accurately infer the contextual meaning of these words. This approach not only mitigates interference from the literal meaning of target words but also ensures proper use of the MIP language rules for metaphor detection. Moreover, we employ a teacher model equipped with prior knowledge to generate meaningful soft labels that guide the optimization of the student model. The inclusion of soft labels, akin to label smoothing, helps alleviate the model's tendency toward over-confidence and effectively addresses the challenge of data sparsity. Experimental results demonstrate that our proposed model achieves state-of-the-art performance across multiple datasets.
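For readers unfamiliar with soft-label distillation, the common Hinton-style objective mixes a hard-label cross-entropy with a temperature-softened KL term between teacher and student. This is a generic sketch: the weighting `alpha`, the temperature, and the `T**2` scaling are conventional choices assumed here, not details given in the abstract.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, hard_label, alpha=0.5, temperature=2.0):
    """alpha * CE(hard label) + (1 - alpha) * T^2 * KL(teacher_T || student_T).

    alpha and temperature are assumed hyperparameters.
    """
    # Hard-label cross-entropy at temperature 1.
    p_student = softmax(student_logits)
    ce = -math.log(p_student[hard_label])
    # Soft-label term: the teacher's softened distribution acts as the target.
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s) if pt > 0)
    return alpha * ce + (1 - alpha) * temperature ** 2 * kl
```

Because the soft targets spread probability mass across classes, they penalize overly peaked student predictions, which is the link to label smoothing mentioned above.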
- [145] arXiv:2403.18254 [pdf, other]
-
Title: Differentially Private Distributed Nonconvex Stochastic Optimization with Quantized Communications
Subjects: Systems and Control (eess.SY)
This paper proposes a new distributed nonconvex stochastic optimization algorithm that achieves privacy protection, communication efficiency, and convergence simultaneously. Specifically, each node adds time-varying privacy noise to its local state to avoid information leakage, and then quantizes its noise-perturbed state before transmission to improve communication efficiency. By employing a subsampling method controlled through a sample-size parameter, the proposed algorithm reduces the impact of the privacy noise and enhances the differential privacy level. When the global cost function satisfies the Polyak-Łojasiewicz condition, the mean and high-probability convergence rates and the oracle complexity of the proposed algorithm are given. Importantly, the proposed algorithm achieves both mean convergence and a finite cumulative differential privacy budget over infinite iterations as the sample size goes to infinity. A numerical example of distributed training on the MNIST dataset is given to show the effectiveness of the algorithm.
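The "perturb, then quantize" step each node performs can be sketched as below. The Laplace noise, its geometric scale schedule, and the unbiased stochastic quantizer with step `delta` are all assumptions chosen for illustration; the abstract specifies only that the noise is time-varying and that the noisy state is quantized.

```python
import math
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def stochastic_quantize(x, delta, rng):
    """Unbiased quantizer onto the grid delta * Z, so that E[q(x)] = x."""
    low = math.floor(x / delta) * delta
    p_up = (x - low) / delta  # probability of rounding up
    return low + delta if rng.random() < p_up else low


def perturb_and_quantize(state, k, rng, noise_scale0=1.0, decay=0.9, delta=0.1):
    """Hypothetical node step at iteration k: add Laplace noise, then quantize."""
    scale = noise_scale0 * decay ** k  # assumed geometric (time-varying) noise schedule
    return [stochastic_quantize(s + laplace_noise(scale, rng), delta, rng) for s in state]
```

Using an unbiased (randomized) quantizer rather than plain rounding keeps the transmitted value correct in expectation, which is the property convergence analyses of quantized methods typically rely on.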
- [146] arXiv:2403.18256 [pdf, other]
-
Title: Manipulating Neural Path Planners via Slight Perturbations
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Data-driven neural path planners are attracting increasing interest in the robotics community. However, their neural network components typically come as black boxes, obscuring their underlying decision-making processes. This black-box nature exposes them to the risk of being compromised through the insertion of hidden malicious behaviors. For example, an attacker may hide behaviors that, when triggered, hijack a delivery robot by guiding it to a specific (albeit wrong) destination, trapping it in a predefined region, or inducing unnecessary energy expenditure by causing the robot to repeatedly circle a region. In this paper, we propose a novel approach to specify and inject a range of hidden malicious behaviors, known as backdoors, into neural path planners. Our approach provides a concise but flexible way to define these behaviors, and we show that hidden behaviors can be triggered by slight perturbations (e.g., inserting a tiny, unnoticeable object) that can nonetheless significantly compromise the planners' integrity. We also discuss potential techniques for identifying these backdoors to alleviate such risks. We demonstrate our approach on both sampling-based and search-based neural path planners.
- [147] arXiv:2403.18258 [pdf, other]
-
Title: Enhancing Generative Class Incremental Learning Performance with Model Forgetting Approach
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
This study presents a novel approach to Generative Class Incremental Learning (GCIL) that introduces a forgetting mechanism aimed at dynamically managing class information for better adaptation to streaming data. GCIL, the continual learning of generative models, is an active topic in computer vision and is considered a crucial task for society. In humans, the ability to forget is a crucial brain function that facilitates continual learning by selectively discarding less relevant information. In machine learning, however, intentional forgetting has not been extensively investigated. In this study, we aim to bridge this gap by incorporating forgetting mechanisms into GCIL and examining their impact on the models' ability to learn continually. Our experiments show that integrating forgetting mechanisms significantly enhances the models' performance in acquiring new knowledge, underscoring the positive role that strategic forgetting plays in continual learning.
- [148] arXiv:2403.18259 [pdf, other]
-
Title: RoboKeyGen: Robot Pose and Joint Angles Estimation via Diffusion-based 3D Keypoint Generation
Comments: Accepted by ICRA 2024
Subjects: Robotics (cs.RO)
Estimating robot pose and joint angles is important in advanced robotics, enabling applications such as robot collaboration and online hand-eye calibration. However, unknown joint angles make prediction more complex than simple robot pose estimation because of the higher dimensionality. Previous methods either regress 3D keypoints directly or use a render&compare strategy. These approaches often fall short in performance or efficiency and struggle with the cross-camera gap problem. This paper presents a novel framework that splits the high-dimensional prediction task into two manageable subtasks: 2D keypoint detection and lifting 2D keypoints to 3D. This separation promises enhanced performance without sacrificing the efficiency inherent to keypoint-based techniques. A vital component of our method is the lifting of 2D keypoints to 3D keypoints. Common deterministic regression methods may falter when faced with uncertainties from 2D detection errors or self-occlusions. Leveraging the strong modeling capacity of diffusion models, we reframe this issue as a conditional 3D keypoint generation task. To improve cross-camera adaptability, we introduce the Normalised Camera Coordinate Space (NCCS), ensuring alignment of estimated 2D keypoints across varying camera intrinsics. Experimental results demonstrate that the proposed method outperforms the state-of-the-art render&compare method and achieves higher inference speed. Furthermore, the tests highlight our method's robust cross-camera generalisation. We intend to release both the dataset and code at https://nimolty.github.io/Robokeygen/
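The abstract does not define NCCS precisely. As an assumption about the general idea, the standard way to remove the effect of camera intrinsics from 2D keypoints is to map pixels to normalized image coordinates, $((u-c_x)/f_x, (v-c_y)/f_y)$, so that the same 3D point yields the same coordinates under different cameras. The sketch below illustrates only that standard normalization; `project` is a hypothetical pinhole helper used to generate test pixels.

```python
def project(point, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame 3D point (X, Y, Z) to pixel coords."""
    X, Y, Z = point
    return (fx * X / Z + cx, fy * Y / Z + cy)


def normalize_keypoints(keypoints_px, fx, fy, cx, cy):
    """Map pixel keypoints to intrinsics-free normalized coordinates (X/Z, Y/Z)."""
    return [((u - cx) / fx, (v - cy) / fy) for u, v in keypoints_px]
```

For example, a point at (0.2, -0.1, 2.0) in the camera frame lands on different pixels under a 500 px and an 800 px focal length, but normalizes to (0.1, -0.05) in both cases.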
- [149] arXiv:2403.18260 [pdf, other]
-
Title: Toward Interactive Regional Understanding in Vision-Large Language Models
Comments: NAACL 2024 Main Conference
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
Recent Vision-Language Pre-training (VLP) models have demonstrated significant advancements. Nevertheless, these models rely heavily on image-text pairs that capture only coarse, global information about an image, which limits their regional understanding. In this work, we introduce RegionVLM, equipped with explicit regional modeling capabilities that allow it to understand user-indicated image regions. To achieve this, we design a simple yet innovative approach that requires no modifications to the model architecture or objective function. Additionally, we leverage a dataset that contains a novel source of information, namely Localized Narratives, which has been overlooked in previous VLP research. Our experiments demonstrate that our single generalist model not only achieves an interactive dialogue system but also exhibits superior performance on various zero-shot region understanding tasks, without compromising its ability for global image understanding.
- [150] arXiv:2403.18266 [pdf, other]
-
Title: Branch-Tuning: Balancing Stability and Plasticity for Continual Self-Supervised Learning
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Self-supervised learning (SSL) has emerged as an effective paradigm for deriving general representations from vast amounts of unlabeled data. However, as real-world applications continually integrate new content, the high computational and resource demands of SSL necessitate continual learning rather than complete retraining. This poses the challenge of striking a balance between stability and plasticity when adapting to new information. In this paper, we employ Centered Kernel Alignment to quantitatively analyze model stability and plasticity, revealing the critical roles of batch normalization layers for stability and convolutional layers for plasticity. Motivated by this, we propose Branch-tuning, an efficient and straightforward method that achieves a balance between stability and plasticity in continual SSL. Branch-tuning consists of branch expansion and compression, and can be easily applied to various SSL methods without modifying the original methods or retaining old data or models. We validate our method through incremental experiments on various benchmark datasets, demonstrating its effectiveness and practical value in real-world scenarios. We hope our work offers new insights for future continual self-supervised learning research. The code will be made publicly available.
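Centered Kernel Alignment compares two layers' activations on the same inputs, so a high CKA between a layer before and after an update indicates stability. A minimal linear-kernel version is sketched below; the abstract does not say which kernel the paper uses, so the linear kernel is an assumption.

```python
import numpy as np


def linear_cka(X, Y):
    """Linear CKA between activations X (n x p) and Y (n x q) for the same n inputs."""
    # Center each feature over the n examples.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # HSIC-style cross term and its normalizers, all with a linear kernel.
    cross = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, ord="fro")
    norm_y = np.linalg.norm(Y.T @ Y, ord="fro")
    return float(cross / (norm_x * norm_y))
```

Linear CKA is invariant to orthogonal transformations and isotropic scaling of either representation, so `linear_cka(X, 3 * X @ Q)` equals 1 for any orthogonal `Q`, while unrelated representations score much lower.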
- [151] arXiv:2403.18267 [pdf, other]
-
Title: DSF-GAN: DownStream Feedback Generative Adversarial Network
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Utility and privacy are two crucial measures of the quality of synthetic tabular data. While significant advancements have been made in privacy measures, generating synthetic samples with high utility remains challenging. To enhance the utility of synthetic samples, we propose a novel architecture called the DownStream Feedback Generative Adversarial Network (DSF-GAN). This approach incorporates feedback from a downstream prediction model during training to augment the generator's loss function with valuable information; DSF-GAN thus uses a downstream prediction task to enhance the utility of synthetic samples. To evaluate our method, we tested it on two popular datasets. Our experiments demonstrate improved model performance when training on synthetic samples generated by DSF-GAN, compared with those generated by the same GAN architecture without feedback. Both were evaluated on the same validation set of real samples. All code and datasets used in this research will be made openly available for ease of reproduction.
- [152] arXiv:2403.18270 [pdf, other]
-
Title: Image Deraining via Self-supervised Reinforcement Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
The quality of images captured outdoors is often affected by the weather. One factor that interferes with visibility is rain, which can obstruct the view of observers and of computer vision applications that rely on those images. This work aims to restore rainy images by removing rain streaks via Self-supervised Reinforcement Learning (RL) for image deraining (SRL-Derain). We locate rain-streak pixels in the input rainy image via dictionary learning and use pixel-wise RL agents that take multiple inpainting actions to remove rain progressively. To our knowledge, this work is the first to apply self-supervised RL to image deraining. Experimental results on several benchmark image-deraining datasets show that the proposed SRL-Derain performs favorably against state-of-the-art few-shot and self-supervised deraining and denoising methods.
- [153] arXiv:2403.18271 [pdf, other]
-
Title: Unleashing the Potential of SAM for Medical Adaptation via Hierarchical Decoding
Comments: CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The Segment Anything Model (SAM) has garnered significant attention for its versatile segmentation abilities and intuitive prompt-based interface. However, its application to medical imaging presents challenges, requiring either substantial training costs and extensive medical datasets for full model fine-tuning or high-quality prompts for optimal performance. This paper introduces H-SAM: a prompt-free adaptation of SAM tailored for efficient fine-tuning on medical images via a two-stage hierarchical decoding procedure. In the initial stage, H-SAM employs SAM's original decoder to generate a prior probabilistic mask, which guides a more intricate decoding process in the second stage. Specifically, we propose two key designs: 1) a class-balanced, mask-guided self-attention mechanism that addresses the unbalanced label distribution and enhances the image embedding; 2) a learnable mask cross-attention mechanism that spatially modulates the interplay among different image regions based on the prior mask. Moreover, the inclusion of a hierarchical pixel decoder in H-SAM enhances its proficiency in capturing fine-grained and localized details. This approach enables SAM to effectively integrate learned medical priors, facilitating enhanced adaptation for medical image segmentation with limited samples. Our H-SAM demonstrates a 4.78% improvement in average Dice compared to existing prompt-free SAM variants for multi-organ segmentation using only 10% of 2D slices. Notably, without using any unlabeled data, H-SAM even outperforms state-of-the-art semi-supervised models relying on extensive unlabeled training data across various medical datasets. Our code is available at https://github.com/Cccccczh404/H-SAM.
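The Dice coefficient behind the reported improvement measures mask overlap as $2|A\cap B|/(|A|+|B|)$. A minimal sketch follows; the per-class-then-mean averaging in `mean_dice` and the small `eps` smoothing term are assumptions, since the paper's exact evaluation protocol is not given in the abstract.

```python
import numpy as np


def dice(pred, target, eps=1e-8):
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks; eps makes two empty masks score ~1."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    inter = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    return float((2.0 * inter + eps) / (denom + eps))


def mean_dice(pred_labels, true_labels, classes):
    """Average per-class Dice over integer label maps (assumed averaging protocol)."""
    pred_labels = np.asarray(pred_labels)
    true_labels = np.asarray(true_labels)
    return float(np.mean([dice(pred_labels == c, true_labels == c) for c in classes]))
```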
- [154] arXiv:2403.18274 [pdf, other]
-
Title: DVLO: Deep Visual-LiDAR Odometry with Local-to-Global Feature Fusion and Bi-Directional Structure Alignment
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Visual and LiDAR data carry complementary information: fine-grained texture in images and abundant geometric information in point clouds. However, effective visual-LiDAR fusion remains challenging, mainly because of the intrinsic inconsistency between the data structures of the two modalities: images are regular and dense, whereas LiDAR points are unordered and sparse. To address this problem, we propose a local-to-global fusion network with bi-directional structure alignment. To obtain locally fused features, we project points onto the image plane as cluster centers and cluster the image pixels around each center. Image pixels are pre-organized as pseudo points for image-to-point structure alignment. Then, we convert points to pseudo images by cylindrical projection (point-to-image structure alignment) and perform adaptive global feature fusion between point features and the locally fused features. Our method achieves state-of-the-art performance on the KITTI odometry and FlyingThings3D scene flow datasets compared with both single-modal and multi-modal methods. Code will be released later.
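Cylindrical projection turns an unordered point cloud into a regular grid: the azimuth of each point selects a column and its height selects a row. The sketch below assumes a 64 x 360 image, a height range of [-2, 2], and keeps the nearest horizontal range per cell. These resolution and range choices, and the use of raw height rather than elevation angle for rows, are illustrative assumptions, not the paper's settings.

```python
import math


def cylindrical_index(point, height=64, width=360, z_min=-2.0, z_max=2.0):
    """Map a LiDAR point (x, y, z) to a (row, col) cell of a cylindrical pseudo image.

    Columns sweep the azimuth atan2(y, x) from -pi to pi; rows sweep height z from
    z_max (top) to z_min (bottom). Resolution and height range are assumed values.
    """
    x, y, z = point
    azimuth = math.atan2(y, x)  # in [-pi, pi]
    col = math.floor((azimuth + math.pi) / (2.0 * math.pi) * width)
    row = math.floor((z_max - z) / (z_max - z_min) * height)
    # Clamp to the image bounds (azimuth == pi, or z outside the height range).
    col = min(max(col, 0), width - 1)
    row = min(max(row, 0), height - 1)
    return row, col


def pseudo_image(points, height=64, width=360, z_min=-2.0, z_max=2.0):
    """Keep the nearest horizontal range per cell; empty cells stay at +inf."""
    img = [[math.inf] * width for _ in range(height)]
    for p in points:
        r, c = cylindrical_index(p, height, width, z_min, z_max)
        img[r][c] = min(img[r][c], math.hypot(p[0], p[1]))
    return img
```

When several points fall into one cell, keeping the nearest one mirrors the usual z-buffer choice for range images; other per-cell reductions (mean, max intensity) are equally plausible.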
- [155] arXiv:2403.18275 [pdf, other]
-
Title: Differentially Private Dual Gradient Tracking for Distributed Resource Allocation
Subjects: Systems and Control (eess.SY)
This paper investigates privacy issues in distributed resource allocation over directed networks, where each agent holds a private cost function and optimizes its decision subject to a global coupling constraint through local interaction with other agents. Conventional methods for resource allocation over directed networks require all agents to transmit their original data to neighbors, which poses the risk of disclosing sensitive and private information. To address this issue, we propose an algorithm called differentially private dual gradient tracking (DP-DGT) for distributed resource allocation, which obfuscates the exchanged messages using independent Laplacian noise. Our algorithm ensures that the agents' decisions converge to a neighborhood of the optimal solution almost surely. Furthermore, without the assumption of bounded gradients, we prove that the cumulative differential privacy loss under the proposed algorithm is finite even when the number of iterations goes to infinity. To the best of our knowledge, we are the first to simultaneously achieve these two goals in distributed resource allocation problems over directed networks. Finally, numerical simulations on economic dispatch problems within the IEEE 14-bus system illustrate the effectiveness of our proposed algorithm.
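The two ingredients named above, Laplace obfuscation of exchanged messages and a cumulative privacy loss that stays finite, can be illustrated with the textbook Laplace mechanism (scale $b$ on a query of $L_1$ sensitivity $s$ gives $\varepsilon = s/b$) and basic sequential composition (per-step budgets add). The decaying sensitivity schedule in the usage note is a hypothetical choice that shows how a finite total can arise; it is not the DP-DGT analysis, which the abstract says holds without assuming bounded gradients.

```python
import random


def laplace_mechanism(value, sensitivity, epsilon, rng):
    """Release value + Laplace(0, sensitivity / epsilon) noise."""
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return value + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def cumulative_epsilon(sensitivities, noise_scales):
    """Total budget under basic sequential composition: sum of s_k / b_k."""
    return sum(s / b for s, b in zip(sensitivities, noise_scales))
```

With a fixed noise scale $b_k = 1$ and a hypothetical sensitivity schedule $s_k = 0.9^k$, the total budget converges to $1/(1-0.9) = 10$ no matter how many iterations run.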
- [156] arXiv:2403.18276 [pdf, other]
-
Title: RankMamba, Benchmarking Mamba's Document Ranking Performance in the Era of Transformers
Authors: Zhichao Xu
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)
The Transformer architecture has achieved great success in many applied machine learning communities, such as natural language processing (NLP), computer vision (CV), and information retrieval (IR). Its core mechanism, attention, requires $O(n^2)$ time complexity in training and $O(n)$ time complexity in inference. Many works have been proposed to improve the scalability of attention, such as Flash Attention and Multi-query Attention. A different line of work aims to design new mechanisms to replace attention. Recently, Mamba, a notable architecture based on state space models, has achieved transformer-equivalent performance in multiple sequence modeling tasks.
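The quadratic training cost comes from the score matrix: for a length-$n$ sequence, every query is compared with every key, producing $n \times n$ scores. A minimal single-head NumPy sketch (no masking, no batching, assumed purely for illustration) makes that shape explicit.

```python
import numpy as np


def attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray):
    """Single-head scaled dot-product attention without masking.

    Returns the attended values and the shape of the score matrix, which is
    (n, n) for a length-n sequence: the source of the quadratic cost.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                   # (n, n) pairwise scores
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V, scores.shape


rng = np.random.default_rng(0)
for n in (128, 256):
    X = rng.standard_normal((n, 16))
    _, shape = attention(X, X, X)
    print(n, shape, shape[0] * shape[1])   # doubling n quadruples the entries
```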
In this work, we examine Mamba's efficacy through the lens of a classical IR task: document ranking. A reranker model takes a query and a document as input and predicts a scalar relevance score. This task demands that the language model comprehend lengthy contextual inputs and capture the interaction between query and document tokens. We find that (1) Mamba models achieve competitive performance compared to transformer-based models trained with the same recipe, but (2) they have lower training throughput than efficient transformer implementations such as Flash Attention. We hope this study can serve as a starting point for exploring Mamba models in other classical IR tasks. Our code and trained checkpoints are publicly available at https://github.com/zhichaoxu-shufe/RankMamba to facilitate reproducibility.
- [157] arXiv:2403.18277 [pdf, other]
-
Title: BlendX: Complex Multi-Intent Detection with Blended Patterns
Comments: Accepted to LREC-COLING 2024
Subjects: Computation and Language (cs.CL)
Task-oriented dialogue (TOD) systems are commonly designed with the presumption that each utterance represents a single intent. However, this assumption may not reflect real-world situations, where users frequently express multiple intents within a single utterance. While there is emerging interest in multi-intent detection (MID), existing in-domain datasets such as MixATIS and MixSNIPS have limitations in their formulation. To address these issues, we present BlendX, a suite of refined datasets featuring more diverse patterns than their predecessors, increasing both complexity and diversity. For dataset construction, we use both rule-based heuristics and a generative tool, OpenAI's ChatGPT, augmented with a similarity-driven strategy for utterance selection. To ensure the quality of the proposed datasets, we also introduce three novel metrics that assess the statistical properties of an utterance related to word count, conjunction use, and pronoun usage. Extensive experiments on BlendX reveal that state-of-the-art MID models struggle with the challenges posed by the new datasets, highlighting the need to reexamine the current state of the MID field. The dataset is available at https://github.com/HYU-NLP/BlendX.
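The abstract names the statistics behind its three metrics (word count, conjunction use, pronoun usage) but not their formulas. As a hedged illustration, the sketch below only computes the raw per-utterance counts such metrics could build on; the tokeniser and the small English word lists are hypothetical and are not the paper's.

```python
import re

# Hypothetical, deliberately small word lists; the paper's actual lists are not given.
CONJUNCTIONS = {"and", "but", "or", "also", "then"}
PRONOUNS = {"i", "me", "my", "you", "your", "it", "its", "they", "them", "their",
            "we", "us", "our", "he", "him", "his", "she", "her"}


def utterance_stats(utterance: str) -> dict:
    """Count words, conjunctions, and pronouns in one utterance."""
    tokens = re.findall(r"[a-z']+", utterance.lower())   # crude lowercase word tokeniser
    return {
        "word_count": len(tokens),
        "conjunctions": sum(t in CONJUNCTIONS for t in tokens),
        "pronouns": sum(t in PRONOUNS for t in tokens),
    }


print(utterance_stats("Book a flight to Boston and tell me its price"))
```

A multi-intent utterance joined by "and" shows a non-zero conjunction count, which is the kind of surface signal the abstract's metrics examine.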
- [158] arXiv:2403.18278 [pdf, other]
-
Title: Identification and Uses of Deep Learning Backbones via Pattern Mining
Comments: 9 pages, 6 figures, published SIAM SDM24
Subjects: Artificial Intelligence (cs.AI)
Deep learning is extensively used in many areas of data mining as a black-box method with impressive results. However, understanding the core mechanism of how deep learning makes predictions is a relatively understudied problem. Here we explore the notion of identifying a backbone of deep learning for a given group of instances. A group here can be instances of the same class or even misclassified instances of the same class. We view each instance in a given group as activating a subset of neurons and attempt to find a subgraph of neurons associated with a given concept/group. We formulate this as a set-cover-style problem, show that it is intractable, and present a highly constrained integer linear programming (ILP) formulation. As an alternative, we explore a coverage-based heuristic approach related to pattern mining and show that it converges to a Pareto equilibrium point of the ILP formulation. Experimentally, we use these backbones to identify mistakes and to improve performance, explanation, and visualization. We demonstrate application-based results on several challenging datasets, including the Bird Audio Detection (BAD) Challenge and Labeled Faces in the Wild (LFW), as well as the classic MNIST data.
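The abstract does not detail its coverage-based heuristic, so the sketch below shows only the textbook greedy approximation for set cover as a point of reference. The universe is the group's instances, each neuron "covers" the instances that activate it, and the loop repeatedly picks the neuron covering the most still-uncovered instances. The function name and the dictionary input format are assumptions.

```python
def greedy_backbone(activations: dict[str, set[str]], instances: set[str]) -> list[str]:
    """Greedy set cover: pick neurons until every instance in the group is covered.

    activations maps a neuron id to the set of instance ids that activate it.
    """
    uncovered = set(instances)
    chosen: list[str] = []
    while uncovered:
        best = max(activations, key=lambda n: len(activations[n] & uncovered), default=None)
        if best is None or not activations[best] & uncovered:
            break                      # nothing left can cover the remaining instances
        chosen.append(best)
        uncovered -= activations[best]
    return chosen


acts = {"n1": {"a", "b"}, "n2": {"b", "c", "d"}, "n3": {"d"}}
print(greedy_backbone(acts, {"a", "b", "c", "d"}))   # picks n2 first, then n1
```

Greedy set cover carries the classic logarithmic approximation guarantee, which is one reason coverage-style heuristics are a common fallback when the exact ILP is intractable.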
- [159] arXiv:2403.18280 [pdf, other]
-
Title: Improving Out-of-Vocabulary Handling in Recommendation Systems
Authors: William Shiao, Mingxuan Ju, Zhichun Guo, Xin Chen, Evangelos Papalexakis, Tong Zhao, Neil Shah, Yozen Liu
Comments: 11 pages, 6 figures
Subjects: Information Retrieval (cs.IR)
Recommendation systems (RS) are an increasingly relevant area for both academic and industry researchers, given their widespread impact on the daily online experiences of billions of users. One common issue in real RS is the cold-start problem, where users and items may not have enough information to produce high-quality recommendations. This work focuses on a complementary problem: making recommendations involving new users and items that were unseen (out-of-vocabulary, or OOV) at training time. This setting is known as the inductive setting and is especially problematic for factorization-based models, which encode only the users/items seen at training time, each with a fixed parameter vector. Many solutions used in practice are naive, such as assigning OOV users/items to random buckets. In this work, we tackle this problem and propose approaches that better leverage available user/item features to improve OOV handling at the embedding table level. We discuss general-purpose plug-and-play approaches that are easily applicable to most RS models and improve inductive performance without negatively impacting transductive model performance. We extensively evaluate 9 OOV embedding methods on 5 models across 4 datasets (spanning different domains). One of these datasets is a proprietary production dataset from a prominent RS employed by a large social platform serving hundreds of millions of daily active users. In our experiments, we find that several proposed methods that exploit feature similarity using LSH consistently outperform alternatives on most model-dataset combinations, with the best method showing a mean improvement of 3.74% over the industry-standard baseline in inductive performance. We release our code and hope our work helps practitioners make more informed decisions when handling OOV in their RS and further inspires academic research into improving OOV support in RS.
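The abstract credits LSH over user/item features but does not give the exact scheme. A minimal sketch, assuming random-hyperplane (SimHash) LSH and a shared embedding table with one row per bucket: an OOV entity is mapped to the row of its feature hash, so entities with similar features tend to share an embedding instead of falling into a random bucket. The class name and all sizes are hypothetical.

```python
import numpy as np


class LSHEmbedding:
    """Shared embedding table indexed by a random-hyperplane (SimHash) bucket.

    Hypothetical sketch: entities whose feature vectors point in similar
    directions tend to land in the same bucket and so share an embedding row.
    """

    def __init__(self, feature_dim: int, n_bits: int, emb_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.planes = rng.standard_normal((n_bits, feature_dim))         # random hyperplanes
        self.table = 0.01 * rng.standard_normal((2 ** n_bits, emb_dim))  # one row per bucket
        self.powers = 1 << np.arange(n_bits)                              # bit weights 1, 2, 4, ...

    def bucket(self, features: np.ndarray) -> int:
        bits = (self.planes @ features > 0).astype(np.int64)  # sign pattern of projections
        return int(bits @ self.powers)

    def embed(self, features: np.ndarray) -> np.ndarray:
        return self.table[self.bucket(features)]


emb = LSHEmbedding(feature_dim=8, n_bits=4, emb_dim=3)
new_user = np.random.default_rng(1).standard_normal(8)   # features of an unseen user
print(emb.bucket(new_user), emb.embed(new_user))
```

Because only the sign of each projection matters, rescaling a feature vector never changes its bucket.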
- [160] arXiv:2403.18281 [pdf, other]
-
Title: AIR-HLoc: Adaptive Image Retrieval for Efficient Visual Localisation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
State-of-the-art (SOTA) hierarchical localisation pipelines (HLoc) rely on image retrieval (IR) techniques to establish 2D-3D correspondences by selecting the $k$ most similar images from a reference image database for a given query image. Although higher values of $k$ enhance localisation robustness, the computational cost of feature matching increases linearly with $k$. In this paper, we observe that queries that are most similar to images in the database yield a higher proportion of feature matches and, thus, more accurate positioning. Hence, a small number of images suffices for queries that are very similar to images in the reference database. We then propose a novel approach, AIR-HLoc, which divides query images into localisation difficulty levels based on their similarity to the reference image database. We consider an image with high similarity to the reference images an easy query and an image with low similarity a hard query. For easy queries, increasing $k$ yields only a limited improvement in accuracy, whereas for hard queries, higher values of $k$ improve accuracy significantly. We therefore adapt the value of $k$ to the query's difficulty level, so that AIR-HLoc reduces processing time without losing accuracy. Our extensive experiments on the Cambridge Landmarks, 7Scenes, and Aachen Day-Night-v1.1 datasets demonstrate our algorithm's efficacy: it reduces computational overhead by 30%, 26%, and 11%, respectively, while maintaining SOTA accuracy compared to HLoc with fixed image retrieval.
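The adaptive-$k$ idea can be sketched as a similarity-to-budget mapping. This assumes global image descriptors compared by cosine similarity and uses the top-1 similarity as the difficulty signal; the thresholds `(0.8, 0.6)`, the budgets `(5, 10, 20)`, and the function names are hypothetical, since the abstract gives neither the difficulty levels nor the $k$ values.

```python
import numpy as np


def choose_k(similarity: float, thresholds=(0.8, 0.6), ks=(5, 10, 20)) -> int:
    """Map top-1 similarity to a retrieval budget: easy queries get a small k."""
    for threshold, k in zip(thresholds, ks):
        if similarity >= threshold:
            return k
    return ks[-1]   # hardest queries get the largest k


def retrieve(query: np.ndarray, database: np.ndarray, thresholds=(0.8, 0.6), ks=(5, 10, 20)):
    """Return indices of the k most similar database descriptors, with k chosen adaptively."""
    q = query / np.linalg.norm(query)
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    sims = db @ q                                   # cosine similarities
    k = choose_k(float(sims.max()), thresholds, ks)
    return np.argsort(-sims)[:k]
```

Only the retrieved images go on to feature matching, so a smaller $k$ for easy queries directly cuts the matching cost that grows linearly with $k$.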
- [161] arXiv:2403.18282 [pdf, other]
-
Title: SGDM: Static-Guided Dynamic Module Make Stronger Visual Models
Comments: 16 pages, 4 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The spatial attention mechanism has been widely used to improve object detection performance. However, its operation is currently limited to static convolutions lacking content-adaptive features. This paper approaches the problem from the perspective of dynamic convolution. We propose Razor Dynamic Convolution (RDConv) to address the two flaws of dynamic weight convolution that make it hard to implement in spatial mechanisms: 1) it is computationally heavy; 2) spatial information is disregarded when generating weights. First, by using a Razor Operation to generate certain features, we greatly reduce the number of parameters of the entire dynamic convolution operation. Second, we add a spatial branch inside RDConv to generate convolutional kernel parameters with richer spatial information. Embedding dynamic convolution also introduces sensitivity to high-frequency noise. We propose the Static-Guided Dynamic Module (SGDM) to address this limitation. SGDM uses a set of asymmetric static convolution kernel parameters to guide the construction of dynamic convolution. We introduce a shared-weight mechanism in static convolution to counter the sensitivity of dynamic convolution to high-frequency noise. Extensive experiments show that multiple object detection backbones equipped with SGDM achieve a highly competitive performance boost (e.g., +4% mAP with YOLOv5n on VOC and +1.7% mAP with YOLOv8n on COCO) with a negligible parameter increase (i.e., +0.33M on YOLOv5n and +0.19M on YOLOv8n).
- [162] arXiv:2403.18285 [pdf, other]
-
Title: Stability and convergence of the penalty formulation for nonlinear magnetostatics
Subjects: Numerical Analysis (math.NA); Optimization and Control (math.OC)
The magnetostatic field distribution in a nonlinear medium is the unique minimizer of the magnetic coenergy over all fields that can be generated by the same current. This is a nonlinear saddle-point problem whose numerical solution can in principle be achieved by mixed finite element methods and appropriate nonlinear solvers. The saddle-point structure, however, makes the solution cumbersome. A remedy is to split the magnetic field into a known source field and the gradient of a scalar potential, which is governed by a convex minimization problem. The penalty approach avoids the use of artificial potentials and Lagrange multipliers and leads to an unconstrained convex minimization problem involving a large parameter. We provide a rigorous justification of the penalty approach by deriving error estimates for the approximation due to penalization. We further highlight the close connections to the Lagrange-multiplier and scalar-potential approaches. The theoretical results are illustrated by numerical tests for a typical benchmark problem.
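The trade-off behind the penalty approach can be seen on a finite-dimensional toy problem, assumed here purely for illustration: minimise the convex quadratic $\tfrac12 x^\top A x - b^\top x$ subject to $Cx = d$. The constrained minimiser solves the saddle-point (KKT) system $\begin{pmatrix} A & C^\top \\ C & 0 \end{pmatrix}\begin{pmatrix} x \\ \lambda \end{pmatrix} = \begin{pmatrix} b \\ d \end{pmatrix}$, while the penalty version drops the multiplier and minimises with the extra term $\tfrac{\mu}{2}\lVert Cx - d\rVert^2$, whose optimality condition is $(A + \mu C^\top C)x = b + \mu C^\top d$. This is not the paper's nonlinear magnetostatic formulation; it only shows the penalty error shrinking as the large parameter $\mu$ grows.

```python
import numpy as np


def kkt_solution(A, b, C, d):
    """Exact constrained minimiser from the saddle-point (KKT) system."""
    n, m = A.shape[0], C.shape[0]
    K = np.block([[A, C.T], [C, np.zeros((m, m))]])
    sol = np.linalg.solve(K, np.concatenate([b, d]))
    return sol[:n]                      # drop the Lagrange multiplier


def penalty_solution(A, b, C, d, mu):
    """Unconstrained penalised minimiser: (A + mu C^T C) x = b + mu C^T d."""
    return np.linalg.solve(A + mu * C.T @ C, b + mu * C.T @ d)


A = np.eye(2)
b = np.array([1.0, 2.0])
C = np.array([[1.0, 1.0]])              # constraint x1 + x2 = 1
d = np.array([1.0])
x_star = kkt_solution(A, b, C, d)
for mu in (1e2, 1e4, 1e6):
    print(mu, np.linalg.norm(penalty_solution(A, b, C, d, mu) - x_star))
```

For this toy case the exact solution is $x^\ast = (0, 1)$ and the penalty error works out to $\sqrt{2}/(1 + 2\mu)$, i.e. it decays like $1/\mu$, while the penalised system becomes increasingly ill-conditioned.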
- [163] arXiv:2403.18286 [pdf, other]
-
Title: Few-Shot Recalibration of Language Models
Comments: preprint
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Recent work has uncovered promising ways to extract well-calibrated confidence estimates from language models (LMs), where the model's confidence score reflects how likely it is to be correct. However, while LMs may appear well-calibrated over broad distributions, this often hides significant miscalibration within narrower slices (e.g., systemic over-confidence in math can balance out systemic under-confidence in history, yielding perfect calibration in aggregate). To attain well-calibrated confidence estimates for any slice of a distribution, we propose a new framework for few-shot slice-specific recalibration. Specifically, we train a recalibration model that takes in a few unlabeled examples from any given slice and predicts a curve that remaps confidence scores to be more accurate for that slice. Our trained model can recalibrate for arbitrary new slices, without using any labeled data from that slice. This enables us to identify domain-specific confidence thresholds above which the LM's predictions can be trusted, and below which it should abstain. Experiments show that our few-shot recalibrator consistently outperforms existing calibration methods, for instance improving calibration error for PaLM2-Large on MMLU by 16%, as compared to temperature scaling.
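The slice-versus-aggregate effect described above can be reproduced with the standard binned expected calibration error (ECE). The sketch below assumes 10 equal-width bins and hypothetical toy data: two slices that are each off by 0.2 pool to an ECE of zero. It illustrates the metric only, not the authors' recalibration model.

```python
import numpy as np


def expected_calibration_error(confidences, correct, n_bins: int = 10) -> float:
    """Binned ECE: weighted mean |confidence - accuracy| over equal-width bins.

    Confidences of exactly 0 fall outside every (lo, hi] bin; that is fine for this sketch.
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(confidences[in_bin].mean() - correct[in_bin].mean())
            ece += in_bin.mean() * gap     # weight by fraction of samples in the bin
    return float(ece)


# Hypothetical toy slices: the model always reports 0.7 confidence.
conf = np.full(10, 0.7)
history_slice = np.array([1] * 9 + [0])      # 90% correct: under-confident
math_slice = np.array([1] * 5 + [0] * 5)     # 50% correct: over-confident
print(expected_calibration_error(conf, history_slice))
print(expected_calibration_error(conf, math_slice))
print(expected_calibration_error(np.concatenate([conf, conf]),
                                 np.concatenate([history_slice, math_slice])))
```

Each slice has an ECE of about 0.2, yet the pooled ECE is about 0, which is exactly the masking effect that motivates slice-specific recalibration.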
- [164] arXiv:2403.18287 [pdf, ps, other]
-
Title: Frozen Gaussian approximation for the fractional Schrödinger equation
Subjects: Numerical Analysis (math.NA); Analysis of PDEs (math.AP)
We develop the frozen Gaussian approximation (FGA) for the fractional Schrödinger equation in the semi-classical regime, where the solution is highly oscillatory when the scaled Planck constant $\varepsilon$ is small. This method approximates the solution to the Schrödinger equation by an integral representation based on asymptotic analysis and provides a highly efficient computational method for high-frequency wave function evolution. In particular, we revise the standard FGA formula to address the singularities that arise in the higher-order derivatives of the coefficients of the associated Hamiltonian flow, which conventional FGA analysis assumes to be twice continuously differentiable or smooth. We then establish its convergence to the true solution. Additionally, we provide numerical examples to verify the accuracy and convergence behavior of the frozen Gaussian approximation method.
- [165] arXiv:2403.18291 [pdf, other]
-
Title: Towards Non-Exemplar Semi-Supervised Class-Incremental Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Deep neural networks perform remarkably well in closed-world scenarios. However, novel classes emerge continually in real applications, making it necessary to learn incrementally. Class-incremental learning (CIL) aims to gradually recognize new classes while maintaining the discriminability of old ones. Existing CIL methods have two limitations: a heavy reliance on preserving old data to mitigate forgetting and the need for vast amounts of labeled data for knowledge adaptation. To overcome these issues, we propose a non-exemplar semi-supervised CIL framework with contrastive learning and a semi-supervised incremental prototype classifier (Semi-IPC). On the one hand, contrastive learning helps the model learn rich representations, easing the trade-off between learning representations of new classes and forgetting those of old classes. On the other hand, Semi-IPC learns a prototype for each class with unsupervised regularization, enabling the model to learn incrementally from partially labeled new data while maintaining the knowledge of old classes. Experiments on benchmark datasets demonstrate the strong performance of our method: without storing any old samples and using less than 1% of labels, Semi-IPC outperforms advanced exemplar-based methods. We hope our work offers new insights for future CIL research. The code will be made publicly available.
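Semi-IPC's exact training objective is not given in the abstract. The sketch below illustrates only the general non-exemplar idea behind a prototype classifier: keep one running-mean feature vector per class, classify by the nearest prototype, and add new classes without retaining old samples. It assumes features are already extracted and ignores the unsupervised regularization on unlabeled data; the class and method names are hypothetical.

```python
import numpy as np


class PrototypeClassifier:
    """Keeps one prototype (mean feature) per class; no old samples are stored."""

    def __init__(self):
        self.prototypes = {}   # class id -> (mean vector, sample count)

    def update(self, features: np.ndarray, labels) -> None:
        for f, y in zip(features, labels):
            mean, n = self.prototypes.get(y, (np.zeros_like(f, dtype=float), 0))
            n += 1
            mean = mean + (f - mean) / n          # incremental mean update
            self.prototypes[y] = (mean, n)

    def predict(self, features: np.ndarray) -> list:
        classes = list(self.prototypes)
        P = np.stack([self.prototypes[c][0] for c in classes])
        dists = ((features[:, None, :] - P[None, :, :]) ** 2).sum(axis=-1)
        return [classes[i] for i in dists.argmin(axis=1)]   # nearest prototype


clf = PrototypeClassifier()
clf.update(np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 0.0], [5.2, 0.0]]), [0, 0, 1, 1])  # task 1
clf.update(np.array([[0.0, 5.0], [0.0, 5.2]]), [2, 2])                                # task 2
print(clf.predict(np.array([[0.1, 0.1], [5.0, 0.1], [0.1, 4.9]])))
```

Adding class 2 in the second task touches only its own prototype, so the old classes remain recognisable without replaying their data.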
- [166] arXiv:2403.18293 [pdf, other]
-
Title: Efficient Test-Time Adaptation of Vision-Language Models
Comments: Accepted to CVPR 2024. The code has been released.
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Test-time adaptation with pre-trained vision-language models has attracted increasing attention for tackling distribution shifts during test time. Though prior studies have achieved very promising performance, they involve intensive computation, which is poorly suited to test-time adaptation. We design TDA, a training-free dynamic adapter that enables effective and efficient test-time adaptation with vision-language models. TDA works with a lightweight key-value cache that maintains a dynamic queue with few-shot pseudo labels as values and the corresponding test-sample features as keys. Leveraging the key-value cache, TDA adapts to test data gradually via progressive pseudo-label refinement, which is highly efficient and requires no backpropagation. In addition, we introduce negative pseudo labeling, which alleviates the adverse impact of pseudo-label noise by assigning pseudo labels to certain negative classes when the model is uncertain about its pseudo-label predictions. Extensive experiments over two benchmarks demonstrate TDA's superior effectiveness and efficiency compared with the state of the art. The code has been released at https://kdiaaa.github.io/tda/.
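A minimal sketch of the key-value cache idea, assuming cosine similarity between L2-normalised features, a fixed per-class capacity, prediction entropy as the confidence criterion, and an exponential affinity term $\exp(-\beta(1 - \cos))$ added to the zero-shot logits. The class name `PseudoLabelCache` and the parameters `capacity`, `alpha`, and `beta` are hypothetical, and negative pseudo labeling is omitted; this is not TDA's released implementation.

```python
import numpy as np


class PseudoLabelCache:
    """Per-class queues of (entropy, normalised feature) keeping the most confident test samples."""

    def __init__(self, n_classes: int, capacity: int = 3, alpha: float = 1.0, beta: float = 5.0):
        self.n_classes, self.capacity = n_classes, capacity
        self.alpha, self.beta = alpha, beta
        self.entries = {c: [] for c in range(n_classes)}

    def add(self, feature: np.ndarray, probs: np.ndarray) -> None:
        label = int(np.argmax(probs))                          # pseudo label = value
        entropy = float(-(probs * np.log(probs + 1e-12)).sum())
        queue = self.entries[label]
        queue.append((entropy, feature / np.linalg.norm(feature)))  # feature = key
        queue.sort(key=lambda e: e[0])                         # lowest entropy first
        del queue[self.capacity:]                              # drop the least confident

    def adjust(self, feature: np.ndarray, zero_shot_logits: np.ndarray) -> np.ndarray:
        f = feature / np.linalg.norm(feature)
        bonus = np.zeros(self.n_classes)
        for c, queue in self.entries.items():
            for _, key in queue:
                bonus[c] += np.exp(-self.beta * (1.0 - key @ f))   # affinity to cached key
        return zero_shot_logits + self.alpha * bonus
```

Each test sample costs one cache update and one similarity pass, with no gradient step, which is the training-free property the abstract emphasises.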
- [167] arXiv:2403.18294 [pdf, other]
-
Title: Multi-scale Unified Network for Image Classification
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Convolutional Neural Networks (CNNs) have advanced significantly in visual representation learning and recognition. However, they face notable challenges in performance and computational efficiency when dealing with real-world, multi-scale image inputs. Conventional methods rescale all input images to a fixed size: a larger fixed size favors performance, but upscaling small images incurs digitization noise and increased computation cost. In this work, we carry out a comprehensive, layer-wise investigation of how CNN models respond to scale variation, based on Centered Kernel Alignment (CKA) analysis. The observations reveal that lower layers are more sensitive to input image scale variations than high-level layers. Inspired by this insight, we propose the Multi-scale Unified Network (MSUN), consisting of multi-scale subnets, a unified network, and a scale-invariant constraint. Our method divides the shallow layers into multi-scale subnets to enable feature extraction from multi-scale inputs, and the low-level features are unified in deep layers for extracting high-level semantic features. A scale-invariant constraint is imposed to maintain feature consistency across different scales. Extensive experiments on ImageNet and other scale-diverse datasets demonstrate that MSUN achieves significant improvements in both model performance and computational efficiency. In particular, MSUN yields an accuracy increase of up to 44.53% and reduces FLOPs by 7.01-16.13% in multi-scale scenarios.
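For reference, the standard linear-kernel form of CKA used in layer-wise representation comparisons can be computed as below, with `X` and `Y` holding activations for the same $n$ images (for example, the same layer fed two input scales). Whether the authors used the linear kernel is not stated in the abstract; the function name is an assumption.

```python
import numpy as np


def linear_cka(X: np.ndarray, Y: np.ndarray) -> float:
    """Linear CKA between representations X (n x p) and Y (n x q) of the same n inputs.

    CKA = ||Y_c^T X_c||_F^2 / (||X_c^T X_c||_F * ||Y_c^T Y_c||_F), with column-centred X_c, Y_c.
    """
    X = X - X.mean(axis=0, keepdims=True)    # centre each feature
    Y = Y - Y.mean(axis=0, keepdims=True)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    return float(hsic / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")))


rng = np.random.default_rng(0)
feats_224 = rng.standard_normal((50, 8))    # hypothetical layer activations at one scale
Q, _ = np.linalg.qr(rng.standard_normal((8, 8)))
print(linear_cka(feats_224, 3.0 * feats_224 @ Q))             # rotated and scaled copy
print(linear_cka(feats_224, rng.standard_normal((50, 8))))    # unrelated features
```

Linear CKA is invariant to orthogonal transforms and isotropic scaling, so the first comparison scores 1 (up to rounding) while unrelated features score much lower; a layer whose CKA across input scales drops is, in this sense, scale-sensitive.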
- [168] arXiv:2403.18295 [pdf, other]
-
Title: Dual Instruction Tuning with Large Language Models for Mathematical Reasoning
Subjects: Computation and Language (cs.CL)
Recent advancements highlight the success of instruction tuning with large language models (LLMs) utilizing Chain-of-Thought (CoT) data for mathematical reasoning tasks. Despite such fine-tuning, challenges persist, such as incorrect, missing, and redundant steps in CoT generation, leading to inaccurate answer predictions. To alleviate this problem, we propose a dual instruction tuning strategy to meticulously model mathematical reasoning from both forward and reverse directions. This involves introducing the Intermediate Reasoning State Prediction task (forward reasoning) and the Instruction Reconstruction task (reverse reasoning) to enhance the LLMs' understanding and execution of instructions. Training instances for these tasks are constructed based on existing mathematical instruction tuning datasets. Subsequently, LLMs undergo multi-task fine-tuning using both existing mathematical instructions and the newly created data. Comprehensive experiments validate the effectiveness and domain generalization of the dual instruction tuning strategy across various mathematical reasoning tasks.
- [169] arXiv:2403.18296 [pdf, other]
-
Title: GeNet: A Graph Neural Network-based Anti-noise Task-Oriented Semantic Communication Paradigm
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
Traditional approaches to semantic communication tasks rely on knowledge of the signal-to-noise ratio (SNR) to mitigate channel noise. However, these methods require training under specific SNR conditions, entailing considerable time and computational resources. In this paper, we propose GeNet, a Graph Neural Network (GNN)-based paradigm for semantic communication aimed at combating noise, thereby facilitating Task-Oriented Communication (TOC). In our approach, we first transform input images into graph structures. Then we leverage a GNN-based encoder to extract semantic information from the source data. This extracted semantic information is then transmitted through the channel. At the receiver's end, a GNN-based decoder reconstructs the relevant semantic information from the source data for TOC. Through experimental evaluation, we show GeNet's effectiveness in anti-noise TOC while decoupling the SNR dependency. We further evaluate GeNet's performance by varying the number of nodes, revealing its versatility as a new paradigm for semantic communication. Additionally, we show GeNet's robustness to geometric transformations by testing it with different rotation angles, without resorting to data augmentation.
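The abstract does not say how images become graphs. One simple construction, assumed here purely for illustration, treats each non-overlapping patch as a node whose features are its raw pixels and connects 4-neighbouring patches; GeNet's actual graph construction may differ, and the function name is hypothetical.

```python
import numpy as np


def image_to_patch_graph(image: np.ndarray, patch: int):
    """Split an image into non-overlapping patch nodes linked to their 4-neighbours.

    Returns node features of shape (num_patches, patch * patch * channels)
    and an (num_edges, 2) array of undirected edges between patch indices.
    """
    H, W = image.shape[:2]
    gh, gw = H // patch, W // patch                 # patch grid size (remainder cropped)
    nodes = (image[:gh * patch, :gw * patch]
             .reshape(gh, patch, gw, patch, -1)
             .transpose(0, 2, 1, 3, 4)             # (grid row, grid col, py, px, channel)
             .reshape(gh * gw, -1))
    edges = []
    for r in range(gh):
        for c in range(gw):
            i = r * gw + c
            if c + 1 < gw:
                edges.append((i, i + 1))            # right neighbour
            if r + 1 < gh:
                edges.append((i, i + gw))           # bottom neighbour
    return nodes, np.array(edges)


img = np.arange(28 * 28, dtype=float).reshape(28, 28)   # e.g. an MNIST-sized grayscale image
nodes, edges = image_to_patch_graph(img, 7)
print(nodes.shape, edges.shape)
```

Changing `patch` changes the number of nodes, which is one way to run the kind of node-count sweep the abstract mentions.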
- [170] arXiv:2403.18300 [pdf, other]
-
Title: HotStuff-2 vs. HotStuff: The Difference and Advantage
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)
Byzantine consensus protocols are essential in blockchain technology. The widely recognized HotStuff protocol uses cryptographic measures for efficient view changes and reduced communication complexity. Recently, the main authors of HotStuff introduced an advanced iteration named HotStuff-2. This paper compares the principles and analyzes the effectiveness of both protocols, aiming to highlight their key differences and assess the potential enhancements offered by HotStuff-2.
- [171] arXiv:2403.18301 [pdf, other]
-
Title: Selective Mixup Fine-Tuning for Optimizing Non-Decomposable Objectives
Authors: Shrinivas Ramasubramanian, Harsh Rangwani, Sho Takemori, Kunal Samanta, Yuhei Umeda, Venkatesh Babu Radhakrishnan
Comments: ICLR 2024 Spotlight
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
The rise in internet usage has led to the generation of massive amounts of data, resulting in the adoption of various supervised and semi-supervised machine learning algorithms that can effectively utilize this data to train models. However, before these models are deployed in the real world, they must be rigorously evaluated on performance measures such as worst-case recall and must satisfy constraints such as fairness. We find that current state-of-the-art empirical techniques offer sub-optimal performance on these practical, non-decomposable performance objectives. On the other hand, theoretical techniques require training a new model from scratch for each performance objective. To bridge the gap, we propose SelMix, an inexpensive selective mixup-based fine-tuning technique for pre-trained models that optimizes for the desired objective. The core idea of our framework is to determine a sampling distribution for mixing up features between samples from particular classes such that the given objective is optimized. We comprehensively evaluate our technique against existing empirical and theoretically principled methods on standard benchmark datasets for imbalanced classification. We find that the proposed SelMix fine-tuning significantly improves performance on various practical non-decomposable objectives across benchmarks.
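The abstract does not specify how the sampling distribution is chosen or how mixed targets are formed, so the sketch below takes a class-pair distribution `pair_probs` as given (hypothetical) and only shows the sampling-and-mixing step: draw a class pair $(i, j)$, pick one sample from each class, and mix their features with a fixed coefficient `lam`. The function name and arguments are assumptions, not SelMix's interface.

```python
import numpy as np


def selective_mixup_batch(features, labels, pair_probs, batch_size, lam, rng):
    """Mix features of class pairs drawn from a given K x K distribution (hypothetical sketch)."""
    K = pair_probs.shape[0]
    flat = rng.choice(K * K, size=batch_size, p=pair_probs.ravel())  # sample (i, j) pairs
    ci, cj = np.divmod(flat, K)                                      # row = i, column = j
    mixed = []
    for a, b in zip(ci, cj):
        xa = features[rng.choice(np.flatnonzero(labels == a))]       # one sample of class a
        xb = features[rng.choice(np.flatnonzero(labels == b))]       # one sample of class b
        mixed.append(lam * xa + (1.0 - lam) * xb)
    return np.stack(mixed), ci, cj


rng = np.random.default_rng(0)
feats = rng.standard_normal((6, 4))
labels = np.array([0, 0, 1, 1, 2, 2])
pair_probs = np.full((3, 3), 1.0 / 9)    # uniform over pairs; SelMix would tune this
mixed, ci, cj = selective_mixup_batch(feats, labels, pair_probs, 4, 0.7, rng)
print(mixed.shape, list(zip(ci, cj)))
```

Shifting probability mass in `pair_probs` toward particular pairs (for example, pairs involving a low-recall class) is the lever the abstract describes for steering fine-tuning toward a chosen objective.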
- [172] arXiv:2403.18305 [pdf, other]
-
Title: A Recommender System for NFT Collectibles with Item Feature
Comments: Presented at the AAAI 2023 Bridge on AI for Financial Services
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
Recommender systems have been actively studied and applied in various domains to deal with information overload. Although there are numerous studies on recommender systems for movies, music, and e-commerce, comparatively little attention has been paid to recommender systems for NFTs, despite the continuous growth of the NFT market. This paper presents a recommender system for NFTs that utilizes a variety of data sources, from NFT transaction records to external item features, to generate precise recommendations that cater to individual preferences. We develop a data-efficient graph-based recommender system to efficiently capture the complex relationships between items and users and to generate node (item) embeddings that incorporate both node feature information and graph structure. Furthermore, we exploit inputs beyond user-item interactions, such as image, text, and price features. Numerical experiments show that the performance of the graph-based recommender system improves significantly after utilizing all types of item features as side information, thereby outperforming all other baselines.
- [173] arXiv:2403.18306 [pdf, other]
-
Title: Sm-Nd Isotope Data Compilation from Geoscientific Literature Using an Automated Tabular Extraction Method
Authors: Zhixin Guo, Tao Wang, Chaoyang Wang, Jianping Zhou, Guanjie Zheng, Xinbing Wang, Chenghu Zhou
Subjects: Databases (cs.DB)
The rare earth elements Sm and Nd help address fundamental questions about crustal growth, such as its spatiotemporal evolution and the interplay between orogenesis and crustal accretion. Their relative immobility during high-grade metamorphism makes the Sm-Nd isotopic system crucial for inferring crustal formation times. Historically, data have been disseminated sporadically in the scientific literature because of complicated and costly sampling procedures, resulting in a fragmented knowledge base. This scattering of critical geoscience data across multiple publications makes compiling it costly in human effort and time. In response, we present an automated tabular extraction method for harvesting tabular geoscience data. Using this method, we collect 10,624 Sm-Nd data entries from 9,138 tables in over 20,000 geoscience publications. We manually selected 2,118 data points from these entries to supplement our previously constructed global Sm-Nd dataset, increasing its sample count by over 20%. Our automatic data collection methodology enhances the efficiency of data acquisition across various scientific domains. Furthermore, the constructed Sm-Nd isotopic dataset should motivate research on classifying global orogenic belts.
- [174] arXiv:2403.18307 [pdf, ps, other]
-
Title: Mutual Information Optimization for SIM-Based Holographic MIMO Systems
Comments: 5 pages, 2 figures
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
In the context of emerging stacked intelligent metasurface (SIM)-based holographic MIMO (HMIMO) systems, a fundamental problem is to study the mutual information (MI) between transmitted and received signals to establish their capacity. However, direct optimization or analytical evaluation of the MI, particularly for discrete signaling, is often intractable. To address this challenge, we adopt the channel cutoff rate (CR) as an alternative optimization metric for MI maximization. In this regard, we propose an alternating projected gradient method (APGM), which optimizes the CR of a SIM-based HMIMO system by adjusting signal precoding and the phase shifts across the transmit and receive SIMs on a layer-by-layer basis. Simulation results indicate that the proposed algorithm significantly enhances the CR, achieving substantial gains proportional to those observed for the corresponding MI. This justifies the effectiveness of using the channel CR for MI optimization. Moreover, we demonstrate that the integration of digital precoding, even on a modest scale, has a significant impact on the ultimate performance of SIM-aided systems.
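The projection step at the heart of a projected gradient method over phase shifts is simple: a unit-modulus constraint $|x_n| = 1$ is enforced by keeping each entry's phase and discarding its magnitude. The sketch below applies plain projected gradient ascent to a hypothetical single-layer toy objective $f(x) = |a^\top x|^2$, not to the cutoff rate, precoder, or multi-layer SIM model of the paper; the step size and iteration count are arbitrary choices.

```python
import numpy as np


def project_unit_modulus(x: np.ndarray) -> np.ndarray:
    """Projection onto {x : |x_n| = 1}: keep each phase, drop the magnitude."""
    return np.exp(1j * np.angle(x))


def maximize_gain(a: np.ndarray, n_iter: int = 50, step: float = 10.0, seed: int = 0):
    """Projected gradient ascent on the toy objective f(x) = |a^T x|^2 over phase shifts x."""
    rng = np.random.default_rng(seed)
    x = np.exp(1j * rng.uniform(0, 2 * np.pi, a.size))   # random feasible start
    for _ in range(n_iter):
        grad = np.conj(a) * (a @ x)       # Wirtinger gradient of f w.r.t. conj(x)
        x = project_unit_modulus(x + step * grad)
    return x, abs(a @ x) ** 2


rng = np.random.default_rng(3)
a = rng.standard_normal(16) + 1j * rng.standard_normal(16)   # hypothetical cascaded channel
x, f = maximize_gain(a)
print(f, np.abs(a).sum() ** 2)   # achieved gain vs. the phase-aligned upper bound
```

For this rank-one toy objective the optimum aligns every phase with $\overline{a_n}$, giving $(\sum_n |a_n|)^2$; the iteration approaches that bound while every iterate stays feasible.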
- [175] arXiv:2403.18308 [pdf, ps, other]
-
Title: Comparison of different methods for identification of dominant oscillation mode
Journal-ref: B&H Electrical Engineering Vol.14, Special Edition, pp 43-50, 2020
Subjects: Computational Engineering, Finance, and Science (cs.CE)
This paper introduces and compares various techniques for identifying and analyzing low-frequency oscillations in a power system, focusing on inter-area electromechanical oscillations. After multiresolution decomposition of characteristic signals, the physical characteristics of system oscillations in the signal components are identified and presented in a time-frequency domain representation using the Fourier transform, Prony method, Matrix Pencil Analysis Method, S-transform, Global Wavelet Spectrum, and Hilbert-Huang transform (Hilbert Marginal Spectrum). The analyses were performed on real frequency signals obtained from the FNET GridEye system during the earthquake that triggered the shutdown of the North Anna Nuclear Generating Station on the east coast of the United States. The results show that the proposed methods are reliable for identifying the model parameters of low-frequency oscillations in power systems. The analyses are carried out in the MATLAB environment.
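Of the listed techniques, the Matrix Pencil method is compact enough to sketch. Assuming a noise-free, uniformly sampled signal and a known number of modes, the sketch stacks the samples into a Hankel matrix, keeps the dominant right singular vectors, and reads the discrete-time poles $z$ off a small eigenvalue problem; each pole maps to $s = \ln(z)/\Delta t$, whose imaginary part gives the frequency and whose real part gives the damping. The synthetic 0.5 Hz test signal is hypothetical, not FNET data, and real measurements would need SVD-based noise filtering and a careful choice of the mode count. The default pencil length `N // 3` is a commonly used choice.

```python
import numpy as np


def matrix_pencil(y: np.ndarray, dt: float, n_modes: int, pencil: int = 0):
    """Estimate (frequency in Hz, damping in 1/s) of the dominant modes of a sampled signal."""
    N = len(y)
    L = pencil or N // 3                                   # pencil parameter
    Y = np.array([y[i:i + L + 1] for i in range(N - L)])   # Hankel matrix, (N-L) x (L+1)
    _, _, Vh = np.linalg.svd(Y, full_matrices=False)
    W = Vh[:n_modes]                                       # dominant right singular vectors
    W1, W2 = W[:, :-1], W[:, 1:]                           # shifted pencil pair
    z = np.linalg.eigvals(W2 @ np.linalg.pinv(W1))         # discrete-time poles
    s = np.log(z.astype(complex)) / dt                     # continuous-time poles
    return np.abs(s.imag) / (2 * np.pi), s.real


dt = 0.1                                                   # 10 Hz sampling
t = np.arange(100) * dt
y = np.exp(-0.1 * t) * np.cos(2 * np.pi * 0.5 * t)        # one damped inter-area-like mode
freqs, damping = matrix_pencil(y, dt, n_modes=2)           # a real mode is a conjugate pair
print(freqs, damping)
```

A real oscillatory mode appears as a complex-conjugate pole pair, which is why `n_modes=2` recovers a single 0.5 Hz component with damping of about -0.1 1/s.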
- [176] arXiv:2403.18309 [pdf, other]
-
Title: Bayesian Learned Models Can Detect Adversarial Malware For Free
Authors: Bao Gia Doan, Dang Quang Nguyen, Paul Montague, Tamas Abraham, Olivier De Vel, Seyit Camtepe, Salil S. Kanhere, Ehsan Abbasnejad, Damith C. Ranasinghe
Comments: Accepted to the 29th European Symposium on Research in Computer Security (ESORICS) 2024 Conference
Subjects: Cryptography and Security (cs.CR)
The vulnerability of machine learning-based malware detectors to adversarial attacks has prompted the need for robust solutions. Adversarial training is effective but computationally expensive to scale to large datasets, and it sacrifices model performance for robustness. We hypothesize that adversarial malware exploits the low-confidence regions of models and can be identified using the epistemic uncertainty of ML approaches; epistemic uncertainty in a machine learning-based malware detector results from a lack of similar training samples in regions of the problem space. In particular, a Bayesian formulation can capture the distribution of model parameters and quantify epistemic uncertainty without sacrificing model performance. To verify our hypothesis, we consider Bayesian learning approaches with a mutual information-based formulation to quantify uncertainty and detect adversarial malware in the Android and Windows domains and in PDF malware. We find that quantifying uncertainty through Bayesian learning methods can defend against adversarial malware. In particular, Bayesian models: (1) are generally capable of identifying adversarial malware in both feature and problem space, (2) can detect concept drift by measuring uncertainty, and (3) when combined with a diversity-promoting approach (or better posterior approximations), yield parameter instances from the posterior that significantly enhance a detector's ability.
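A mutual-information uncertainty score of this kind is commonly computed from posterior samples as the entropy of the averaged prediction minus the average entropy of the individual predictions; it is large when the samples disagree. A minimal sketch, assuming the per-sample class probabilities for one input are already available (for example from an ensemble or Monte Carlo draws); the function name and any detection threshold are hypothetical.

```python
import numpy as np


def mutual_information(member_probs, eps: float = 1e-12) -> float:
    """Epistemic uncertainty from posterior samples: H(mean p) - mean H(p).

    member_probs: array (n_samples, n_classes) of predictive distributions for one
    input, one row per posterior sample (e.g. ensemble member or MC draw).
    """
    p = np.asarray(member_probs, dtype=float)
    mean = p.mean(axis=0)
    entropy_of_mean = -(mean * np.log(mean + eps)).sum()        # total uncertainty
    mean_entropy = -(p * np.log(p + eps)).sum(axis=1).mean()    # aleatoric part
    return float(entropy_of_mean - mean_entropy)


agree = [[0.9, 0.1]] * 4                  # samples agree: low epistemic uncertainty
disagree = [[1.0, 0.0], [0.0, 1.0]]       # samples contradict each other
print(mutual_information(agree), mutual_information(disagree))
```

Two confident but contradictory samples give a score of $\ln 2$ (about 0.693) nats, while agreeing samples give about 0; flagging inputs above a chosen threshold is how such a score would act as a detector.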
- [177] arXiv:2403.18310 [pdf, ps, other]
-
Title: A thermodynamically consistent physics-informed deep learning material model for short fiber/polymer nanocomposites
Comments: arXiv admin note: text overlap with arXiv:2305.08102
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Numerical Analysis (math.NA)
This work proposes a physics-informed deep learning (PIDL)-based constitutive model for investigating the viscoelastic-viscoplastic behavior of short fiber-reinforced nanoparticle-filled epoxies under various ambient conditions. The deep-learning model is trained to enforce thermodynamic principles, leading to a thermodynamically consistent constitutive model. To accomplish this, a long short-term memory network is combined with a feed-forward neural network to predict internal variables required for characterizing the internal dissipation of the nanocomposite materials. In addition, another feed-forward neural network is used to represent the free-energy function, which defines the thermodynamic state of the entire system. The PIDL model is initially developed for the three-dimensional case by generating synthetic data from a classical constitutive model. The model is then trained on data extracted directly from cyclic loading-unloading experimental tests. Numerical examples show that the PIDL model can accurately predict the mechanical behavior of epoxy-based nanocomposites for different volume fractions of fibers and nanoparticles under various hygrothermal conditions.
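The thermodynamic-consistency idea can be illustrated with a toy one-dimensional model. In this sketch, which is an assumption and not the paper's network, a hand-written quadratic free energy stands in for the free-energy network and a linear rate law stands in for the LSTM that predicts internal variables. Stress is obtained as the derivative of the free energy with respect to strain, and the internal variable evolves along its conjugate thermodynamic force, so the dissipation in the Clausius-Duhem inequality stays non-negative by construction. The constants `E` and `ETA` are hypothetical.

```python
E = 1000.0   # hypothetical elastic modulus (arbitrary units)
ETA = 50.0   # hypothetical viscosity controlling the internal-variable rate


def free_energy(eps, alpha):
    # Stand-in for the free-energy network: psi = 0.5 * E * (eps - alpha)^2
    return 0.5 * E * (eps - alpha) ** 2


def grad_psi(eps, alpha, h=1e-6):
    # Central finite differences stand in for automatic differentiation.
    d_eps = (free_energy(eps + h, alpha) - free_energy(eps - h, alpha)) / (2 * h)
    d_alpha = (free_energy(eps, alpha + h) - free_energy(eps, alpha - h)) / (2 * h)
    return d_eps, d_alpha


def simulate(strain_history, dt=0.01):
    """Return stress and dissipation for each strain in the history."""
    alpha = 0.0                   # internal (inelastic) strain variable
    stresses, dissipations = [], []
    for eps in strain_history:
        d_eps, d_alpha = grad_psi(eps, alpha)
        sigma = d_eps             # stress derived from the free energy
        force = -d_alpha          # thermodynamic force conjugate to alpha
        alpha_rate = force / ETA  # rate law chosen so that D = force^2 / ETA >= 0
        dissipation = force * alpha_rate
        stresses.append(sigma)
        dissipations.append(dissipation)
        alpha += dt * alpha_rate
    return stresses, dissipations


# Cyclic loading-unloading strain history: 0 -> 0.02 -> 0.
cycle = [0.001 * i for i in range(21)] + [0.001 * i for i in range(19, -1, -1)]
stresses, dissipations = simulate(cycle)
print(f"peak stress {max(stresses):.2f}, min dissipation {min(dissipations):.3e}")
```

Here consistency holds by construction; a learned model would more likely enforce it through the architecture or a penalty term in the loss.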
- [178] arXiv:2403.18314 [pdf, other]
Title: Chinese Offensive Language Detection: Current Status and Future Directions
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Despite the considerable efforts being made to monitor and regulate user-generated content on social media platforms, the pervasiveness of offensive language, such as hate speech or cyberbullying, in the digital space remains a significant challenge. Given the importance of maintaining a civilized and respectful online environment, there is an urgent and growing need for automatic systems capable of detecting offensive speech in real time. However, developing effective systems for processing languages such as Chinese presents a significant challenge, owing to the language's complex and nuanced nature. This paper provides a comprehensive overview of offensive language detection in Chinese, examining current benchmarks and approaches and highlighting specific models and tools that address the unique challenges of detecting offensive language in Chinese. The primary objective of this survey is to explore the existing techniques and identify potential avenues for further research that can address the cultural and linguistic complexities of Chinese.
- [179] arXiv:2403.18316 [pdf, other]
Title: Multi-Modal Contrastive Learning for Online Clinical Time-Series Applications
Comments: Accepted as a Workshop Paper at TS4H@ICLR2024
Subjects: Machine Learning (cs.LG)
Electronic Health Record (EHR) datasets from Intensive Care Units (ICU) contain a diverse set of data modalities. While prior works have successfully leveraged multiple modalities in supervised settings, we apply advanced self-supervised multi-modal contrastive learning techniques to ICU data, specifically focusing on clinical notes and time-series for clinically relevant online prediction tasks. We introduce a loss function, the Multi-Modal Neighborhood Contrastive Loss (MM-NCL), together with a soft neighborhood function, and showcase the excellent linear probe and zero-shot performance of our approach.
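For orientation, the sketch below shows a plain symmetric cross-modal InfoNCE loss between clinical-note embeddings and time-series embeddings, where the i-th note and the i-th time series in a batch form the positive pair and all other pairings act as negatives. It is not MM-NCL: the paper's loss additionally uses a soft neighborhood function, which is omitted here. The embeddings and the temperature value are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct 'class' of row i is column i."""
    loss = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # log-sum-exp with max subtraction for numerical stability
        lse = m + math.log(sum(math.exp(x - m) for x in row))
        loss += lse - row[i]
    return loss / len(logits)


def info_nce(note_emb, series_emb, temperature=0.1):
    """Symmetric InfoNCE over a batch of (note, time-series) embedding pairs."""
    a = [normalize(v) for v in note_emb]
    b = [normalize(v) for v in series_emb]
    logits = [[dot(ai, bj) / temperature for bj in b] for ai in a]
    transposed = [list(col) for col in zip(*logits)]
    # Average of note->series and series->note directions.
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(transposed))


notes = [[1.0, 0.0], [0.0, 1.0]]
print(info_nce(notes, [[0.9, 0.1], [0.1, 0.9]]))  # aligned pairs: low loss
print(info_nce(notes, [[0.1, 0.9], [0.9, 0.1]]))  # swapped pairs: high loss
```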
- [180] arXiv:2403.18317 [pdf, other]
Title: A Situation-aware Enhancer for Personalized Recommendation
Comments: Accepted at the International Conference on Database Systems for Advanced Applications (DASFAA 2024)
Subjects: Information Retrieval (cs.IR)
When users interact with Recommender Systems (RecSys), current situations, such as time, location, and environment, significantly influence their preferences. Situations serve as the background for interactions, where relationships between users and items evolve with situation changes. However, existing RecSys treat situations, users, and items on the same level. They can only model the relations between situations and users/items separately, rather than the dynamic impact of situations on user-item associations (i.e., user preferences). In this paper, we provide a new perspective that takes situations as the preconditions for users' interactions. This perspective allows us to separate situations from user/item representations and capture situations' influences over the user-item relationship, offering a more comprehensive understanding of situations. Based on this perspective, we propose a novel Situation-Aware Recommender Enhancer (SARE), a pluggable module to integrate situations into various existing RecSys. Since users' perception of situations and situations' impact on preferences are both personalized, SARE includes a Personalized Situation Fusion (PSF) and a User-Conditioned Preference Encoder (UCPE) to model the perception and impact of situations, respectively. We apply SARE to seven backbones in various settings on two real-world datasets. Experimental results indicate that SARE significantly improves recommendation performance compared with the backbones and SOTA situation-aware baselines.
- [181] arXiv:2403.18318 [pdf, other]
Title: Uncertainty-Aware SAR ATR: Defending Against Adversarial Attacks via Bayesian Neural Networks
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Adversarial attacks have demonstrated the vulnerability of Machine Learning (ML) image classifiers in Synthetic Aperture Radar (SAR) Automatic Target Recognition (ATR) systems. An adversarial attack can deceive the classifier into making incorrect predictions by perturbing the input SAR images, for example, with a few scatterers attached to the on-ground objects. Therefore, it is critical to develop robust SAR ATR systems that can detect potential adversarial attacks by leveraging the inherent uncertainty in ML classifiers, thereby effectively alerting human decision-makers. In this paper, we propose a novel uncertainty-aware SAR ATR for detecting adversarial attacks. Specifically, we leverage the capability of Bayesian Neural Networks (BNNs) to perform image classification with quantified epistemic uncertainty in order to measure the confidence for each input SAR image. By evaluating the uncertainty, our method raises an alert when the input SAR image is likely to be adversarially generated. Simultaneously, we also generate visual explanations that reveal the specific regions in the SAR image where the adversarial scatterers are likely to be present, thus aiding human decision-making with evidence of adversarial attacks. Experiments on the MSTAR dataset demonstrate that our approach can identify over 80% of adversarial SAR images with fewer than 20% false alarms, and our visual explanations can identify up to over 90% of scatterers in an adversarial SAR image.
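One way to turn per-image epistemic uncertainty into an alarm, sketched under the assumption that uncertainty scores are available for a set of clean validation images, is to choose the threshold that keeps the false-alarm rate on clean data within a budget (20% here, mirroring the operating point quoted above) and then measure how many adversarial images exceed it. The scores below are made up for illustration and the function names are hypothetical.

```python
def pick_threshold(clean_scores, max_false_alarm=0.2):
    """Threshold t such that at most max_false_alarm of clean images score > t.

    Assumes 0 <= max_false_alarm < 1.
    """
    s = sorted(clean_scores)
    allowed = int(max_false_alarm * len(s))  # clean images allowed to raise an alarm
    return s[len(s) - allowed - 1]


def alarm_rate(scores, threshold):
    """Fraction of images whose uncertainty exceeds the threshold."""
    return sum(score > threshold for score in scores) / len(scores)


clean = [0.02, 0.05, 0.03, 0.10, 0.04, 0.01, 0.06, 0.12, 0.03, 0.02]
adversarial = [0.30, 0.05, 0.25, 0.40, 0.09]
t = pick_threshold(clean)
print(t, alarm_rate(clean, t), alarm_rate(adversarial, t))
```

Ties at the threshold are resolved conservatively: only scores strictly above it raise an alarm, so the false-alarm budget is never exceeded on the calibration set.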
- [182] arXiv:2403.18321 [pdf, ps, other]
Title: Implementation of the Principal Component Analysis onto High-Performance Computer Facilities for Hyperspectral Dimensionality Reduction: Results and Comparisons
Authors: E. Martel, R. Lazcano, J. Lopez, D. Madroñal, R. Salvador, S. Lopez, E. Juarez, R. Guerra, C. Sanz, R. Sarmiento
Comments: 30 pages, 10 figures
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Dimensionality reduction represents a critical preprocessing step to increase the efficiency and the performance of many hyperspectral imaging algorithms. However, dimensionality reduction algorithms, such as the Principal Component Analysis (PCA), are computationally demanding, which makes it advisable to implement them on high-performance computer architectures for applications under strict latency constraints. This work presents the implementation of the PCA algorithm on two different high-performance devices, namely, an NVIDIA Graphics Processing Unit (GPU) and a Kalray manycore, uncovering a valuable set of tips and tricks to take full advantage of the inherent parallelism of these high-performance computing platforms and hence reduce the time required to process a given hyperspectral image. Moreover, the results obtained with different hyperspectral images have been compared with those obtained with a recently published field programmable gate array (FPGA)-based implementation of the PCA algorithm, providing, for the first time in the literature, a comprehensive analysis that highlights the pros and cons of each option.
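The core computation being accelerated can be sketched on a CPU in a few lines: reshape the hyperspectral cube into a matrix of pixel spectra, center it, form the band covariance matrix, extract the leading eigenvectors, and project every pixel onto them. This pure-Python version finds eigenvectors by power iteration with deflation; that choice is an assumption made to keep the example dependency-free and is not necessarily the eigensolver used in the GPU, manycore, or FPGA implementations. The covariance and projection loops are the parts that map most naturally onto parallel hardware.

```python
import math
import random


def pca(pixels, k, iters=200, seed=0):
    """Top-k principal components of pixel spectra.

    pixels: list of spectra (the cube reshaped to num_pixels x num_bands).
    Returns (components, reduced) where reduced is num_pixels x k.
    """
    n, d = len(pixels), len(pixels[0])
    mean = [sum(p[j] for p in pixels) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in pixels]
    # Band-by-band covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]
        norm = 1.0
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        if norm < 1e-12:
            break  # no variance left: fewer than k meaningful components
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
        components.append(v)
        # Deflation: remove the found component before searching for the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    reduced = [[sum(r[j] * c[j] for j in range(d)) for c in components] for r in centered]
    return components, reduced


# Tiny synthetic "image": 4 pixels x 3 bands, reduced to 2 components.
cube = [[1.0, 2.0, 3.0], [2.0, 4.1, 6.0], [3.0, 6.0, 9.2], [4.0, 7.9, 12.1]]
components, reduced = pca(cube, k=2)
print(len(reduced), len(reduced[0]))
```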
- [183] arXiv:2403.18322 [pdf, other]
Title: Quantum Algorithms: A New Frontier in Financial Crime Prevention
Subjects: Machine Learning (cs.LG); Emerging Technologies (cs.ET)
The rapid proliferation and growing sophistication of financial crimes require novel approaches that provide robust and effective solutions. This paper explores the potential of quantum algorithms in combating financial crimes. It highlights the advantages of quantum computing by examining traditional and Machine Learning (ML) techniques alongside quantum approaches. The study showcases advanced methodologies such as Quantum Machine Learning (QML) and Quantum Artificial Intelligence (QAI) as powerful solutions for financial crime detection and prevention, addressing money laundering, cryptocurrency attacks, and market manipulation. These quantum approaches leverage the inherent computational capabilities of quantum computers to overcome limitations faced by classical methods. Furthermore, the paper illustrates how quantum computing can support enhanced financial risk management analysis. By exploiting the quantum advantage, financial institutions can improve their ability to identify and mitigate risks, leading to more robust risk management strategies. This research underscores the transformative impact of quantum algorithms on financial risk management. By embracing quantum technologies, organisations can enhance their capabilities to combat evolving threats and ensure the integrity and stability of financial systems.
- [184] arXiv:2403.18323 [pdf, other]
Title: How to Cache Important Contents for Multi-modal Service in Dynamic Networks: A DRL-based Caching Scheme
Journal-ref: IEEE Transactions on Multimedia (Early Access), 2024
Subjects: Networking and Internet Architecture (cs.NI); Multimedia (cs.MM)
With the continuous evolution of networking technologies, multi-modal services that involve video, audio, and haptic contents are expected to become the dominant multimedia service in the near future. Edge caching is a key technology that can significantly reduce network load and content transmission latency, which is critical for the delivery of multi-modal contents. However, existing caching approaches rely on only a limited number of factors, e.g., popularity, to evaluate the importance of contents for caching, which is inefficient for caching multi-modal contents, especially in dynamic network environments. To overcome this issue, we propose a content importance-based caching scheme that consists of a content importance evaluation model and a caching model. By leveraging a dueling double deep Q network (D3QN) model, the content importance evaluation model can adaptively evaluate the importance of contents in dynamic networks. Based on the evaluated importance, the caching model can easily decide which contents to cache and which to evict to improve caching efficiency. The simulation results show that the proposed content importance-based caching scheme outperforms existing caching schemes in terms of cache hit ratio (at least 15% higher), network load (up to 22% reduction), average number of hops (up to 27% lower), and unsatisfied requests ratio (more than 47% reduction).
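The two ingredients named in "dueling double deep Q network" can be shown without any neural network. The dueling head combines a state value and per-action advantages into Q-values, and the double-DQN target lets the online network choose the next action while the target network evaluates it, which reduces overestimation. How the paper defines states, actions (for example, which content to keep or evict), and rewards (for example, cache hits) is not stated in the abstract, so the numbers and the discount factor below are placeholders.

```python
def dueling_q(value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.9, done=False):
    """Double-DQN target: online net selects the next action, target net evaluates it."""
    if done:
        return reward
    best = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best]


# Hypothetical numbers: two candidate actions in the next state.
q_online = dueling_q(1.0, [0.0, 4.0])
q_target = dueling_q(1.0, [2.0, 1.0])
print(q_online, q_target)
print(double_dqn_target(reward=1.0, next_q_online=q_online, next_q_target=q_target))
```

The mean subtraction in `dueling_q` keeps V and A identifiable; without it, any constant could be shifted between the two streams.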
- [185] arXiv:2403.18325 [pdf, other]
Title: Common Sense Enhanced Knowledge-based Recommendation with Large Language Model
Comments: Accepted by DASFAA 2024
Subjects: Information Retrieval (cs.IR)
Knowledge-based recommendation models effectively alleviate the data sparsity issue by leveraging the side information in the knowledge graph, and have achieved considerable performance. Nevertheless, the knowledge graphs used in previous work, namely metadata-based knowledge graphs, are usually constructed based on the attributes of items and co-occurring relations (e.g., also buy), in which the former provides limited information and the latter relies on sufficient interaction data and still suffers from the cold-start issue. Common sense, as a form of knowledge with generality and universality, can be used as a supplement to the metadata-based knowledge graph and provides a new perspective for modeling users' preferences. Recently, benefiting from the emergent world knowledge of large language models, efficient acquisition of common sense has become possible. In this paper, we propose CSRec, a novel knowledge-based recommendation framework incorporating common sense, which can be flexibly coupled to existing knowledge-based methods. To address the knowledge gap between the common sense-based knowledge graph and the metadata-based knowledge graph, we propose a knowledge fusion approach based on mutual information maximization theory. Experimental results on public datasets demonstrate that our approach significantly improves the performance of existing knowledge-based recommendation models.
- [186] arXiv:2403.18326 [pdf, ps, other]
Title: Privacy-Preserving Distributed Nonnegative Matrix Factorization
Comments: 5 pages, 1 figure, submitted to EUSIPCO 2024 conference
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Signal Processing (eess.SP)
Nonnegative matrix factorization (NMF) is an effective data representation tool with numerous applications in signal processing and machine learning. However, deploying NMF in a decentralized manner over ad-hoc networks introduces privacy concerns due to the conventional approach of sharing raw data among network agents. To address this, we propose a privacy-preserving algorithm for fully-distributed NMF that decomposes a distributed large data matrix into left and right matrix factors while safeguarding each agent's local data privacy. It facilitates collaborative estimation of the left matrix factor among agents and enables them to estimate their respective right factors without exposing raw data. To ensure data privacy, we secure information exchanges between neighboring agents using the Paillier cryptosystem, a probabilistic asymmetric algorithm for public-key cryptography whose additive homomorphism allows computations on encrypted data without decryption. Simulation results on synthetic and real-world datasets demonstrate the effectiveness of the proposed algorithm in achieving privacy-preserving distributed NMF over ad-hoc networks.
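The property that makes Paillier suitable here is additive homomorphism: multiplying two ciphertexts yields an encryption of the sum of the plaintexts, and raising a ciphertext to an integer power multiplies the plaintext. The toy sketch below uses the common simplification g = n + 1 and tiny hard-coded primes, which are assumptions for illustration only; real deployments need primes of at least 1024 bits and a cryptographically secure random source. How the paper encodes real-valued NMF updates as integers is not covered. `pow(x, -1, n)` requires Python 3.8 or later.

```python
import math
import random


def keygen(p=293, q=433):
    """Toy Paillier key pair from two small primes (insecure, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because g = n + 1 gives L(g^lam mod n^2) = lam mod n
    return n, (lam, mu)


def encrypt(n, m, rng):
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    # c = g^m * r^n mod n^2 with g = n + 1; the random r makes encryption probabilistic.
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, priv, c):
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n) * mu % n  # L(x) = (x - 1) / n


def add_encrypted(n, c1, c2):
    """Enc(m1) * Enc(m2) mod n^2 decrypts to m1 + m2 (mod n)."""
    return (c1 * c2) % (n * n)


def scale_encrypted(n, c, k):
    """Enc(m)^k mod n^2 decrypts to k * m (mod n)."""
    return pow(c, k, n * n)


rng = random.Random(0)  # use the secrets module for real keys
n, priv = keygen()
c = add_encrypted(n, encrypt(n, 20, rng), encrypt(n, 22, rng))
print(decrypt(n, priv, c))  # 42; whoever combined the ciphertexts never saw 20 or 22
```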
- [187] arXiv:2403.18327 [pdf, other]
Title: Can LLMs Converse Formally? Automatically Assessing LLMs in Translating and Interpreting Formal Specifications
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Stakeholders often describe system requirements in natural language, which a domain expert then converts to formal syntax, increasing design costs. This paper assesses the capabilities of Large Language Models (LLMs) in converting between natural language descriptions and formal specifications. Existing work has evaluated the capabilities of LLMs in generating formal syntax such as source code, but such experiments are typically hand-crafted, use problems that are likely to be in the training set of LLMs, and often require human-annotated datasets. We propose an approach that uses two copies of an LLM in conjunction with an off-the-shelf verifier to automatically evaluate its translation abilities without any additional human input. Our approach uses language grammars to automatically generate a dataset of formal syntax. We conduct an empirical evaluation to measure the accuracy of this translation task and show that SOTA LLMs cannot adequately solve this task, limiting their current utility in the design of complex systems.
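The evaluation loop can be sketched for the simplest formal language, propositional logic. A small grammar generates random formulas; one model call would translate each formula to English and a second would translate it back; a verifier then checks that the original and round-tripped formulas are semantically equivalent. Here an exhaustive truth-table check stands in for the off-the-shelf verifier, and the two translation functions are hypothetical placeholders for LLM calls. The grammar, variable names, and function names are all assumptions.

```python
import itertools
import random

VARIABLES = ["p", "q", "r"]


def generate(rng, depth=3):
    """Random formula from the grammar F ::= var | ~F | (F & F) | (F | F)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(VARIABLES)
    op = rng.choice(["~", "&", "|"])
    if op == "~":
        return "~" + generate(rng, depth - 1)
    return f"({generate(rng, depth - 1)} {op} {generate(rng, depth - 1)})"


def evaluate(formula, env):
    """Recursive-descent evaluator for the grammar above (avoids eval on model output)."""
    tokens = [ch for ch in formula if not ch.isspace()]
    pos = 0

    def parse():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "~":
            return not parse()
        if tok == "(":
            left = parse()
            op = tokens[pos]
            pos += 1
            right = parse()
            if op not in "&|" or tokens[pos] != ")":
                raise ValueError(f"malformed formula: {formula}")
            pos += 1
            return (left and right) if op == "&" else (left or right)
        return env[tok]

    value = parse()
    if pos != len(tokens):
        raise ValueError(f"trailing input in: {formula}")
    return value


def equivalent(f1, f2):
    """Truth-table check over all assignments (stand-in for a real verifier)."""
    for values in itertools.product([False, True], repeat=len(VARIABLES)):
        env = dict(zip(VARIABLES, values))
        if evaluate(f1, env) != evaluate(f2, env):
            return False
    return True


def round_trip_correct(formula, to_english, to_formal):
    """to_english / to_formal are hypothetical stand-ins for the two LLM copies."""
    return equivalent(formula, to_formal(to_english(formula)))


rng = random.Random(1)
dataset = [generate(rng) for _ in range(5)]
print(dataset)
print(equivalent("~(p & q)", "(~p | ~q)"))  # De Morgan: True
```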
- [188] arXiv:2403.18328 [pdf, other]
Title: PIPNet3D: Interpretable Detection of Alzheimer in MRI Scans
Authors: Lisa Anita De Santi, Jörg Schlötterer, Michael Scheschenja, Joel Wessendorf, Meike Nauta, Vincenzo Positano, Christin Seifert
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Information from neuroimaging examinations (CT, MRI) is increasingly used to support diagnoses of dementia, e.g., Alzheimer's disease. While current clinical practice is mainly based on visual inspection and feature engineering, Deep Learning approaches can be used to automate the analysis and to discover new image biomarkers. Part-prototype neural networks (PP-NNs) are an alternative to standard blackbox models and have shown promising results in general computer vision. PP-NNs base their reasoning on prototypical image regions that are learned in a fully unsupervised manner and combined with a simple-to-understand decision layer. We present PIPNet3D, a PP-NN for volumetric images. We apply PIPNet3D to the clinical case study of Alzheimer's Disease diagnosis from structural Magnetic Resonance Imaging (sMRI). We assess the quality of prototypes under a systematic evaluation framework, propose new metrics to evaluate brain prototypes, and perform an evaluation with domain experts. Our results show that PIPNet3D is an interpretable, compact model for Alzheimer's diagnosis with its reasoning well aligned to medical domain knowledge. Notably, PIPNet3D achieves the same accuracy as its blackbox counterpart, and removing the remaining clinically irrelevant prototypes from its decision process does not decrease predictive performance.
- [189] arXiv:2403.18330 [pdf, other]
Title: Tracking-Assisted Object Detection with Event Cameras
Authors: Ting-Kang Yen, Igor Morawski, Shusil Dangi, Kai He, Chung-Yi Lin, Jia-Fong Yeh, Hung-Ting Su, Winston Hsu
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Event-based object detection has recently garnered attention in the computer vision community due to the exceptional properties of event cameras, such as high dynamic range and no motion blur. However, feature asynchronism and sparsity make objects invisible when they have no motion relative to the camera, posing a significant challenge in the task. Prior works have studied various memory mechanisms to preserve as many features as possible at the current time, guided by temporal clues. While these implicitly learned memories retain some short-term information, they still struggle to preserve long-term features effectively. In this paper, we consider those invisible objects as pseudo-occluded objects and aim to reveal their features. First, we introduce a visibility attribute for objects and contribute an auto-labeling algorithm that appends visibility labels to an existing event camera dataset. Second, we exploit tracking strategies for pseudo-occluded objects to maintain their permanence and retain their bounding boxes, even when features have not been available for a very long time. These strategies can be treated as an explicitly learned memory guided by the tracking objective to record the displacements of objects across frames. Lastly, we propose a spatio-temporal feature aggregation module to enrich the latent features and a consistency loss to increase the robustness of the overall pipeline. We conduct comprehensive experiments to verify our method's effectiveness, in which still objects are retained but truly occluded objects are discarded. The results demonstrate that (1) the additional visibility labels can assist in supervised training, and (2) our method outperforms state-of-the-art approaches with a significant improvement of 7.9% absolute mAP.
- [190] arXiv:2403.18331 [pdf, other]
Title: Neighbor-Environment Observer: An Intelligent Agent for Immersive Working Companionship
Comments: UIST 2023
Subjects: Human-Computer Interaction (cs.HC)
Human-computer symbiosis is a crucial direction for the development of artificial intelligence. As intelligent systems become increasingly prevalent in our work and personal lives, it is important to develop strategies to support users across physical and virtual environments. While technological advances in personal digital devices, such as personal computers and virtual reality devices, can provide immersive experiences, they can also disrupt users' awareness of their surroundings and amplify the frustration caused by disturbances. In this paper, we propose a joint observation strategy for artificial agents to support users across virtual and physical environments. We introduce a prototype system, neighbor-environment observer (NEO), that uses non-invasive sensors to assist users in dealing with disruptions to their immersive experience. System experiments evaluate NEO from different perspectives and demonstrate the effectiveness of the joint observation strategy. A user study is conducted to evaluate its usability. The results show that NEO could lessen users' workload with the learned user preference. We suggest that the proposed strategy can be applied to various smart home scenarios.
- [191] arXiv:2403.18334 [pdf, other]
Title: DODA: Diffusion for Object-detection Domain Adaptation in Agriculture
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The diverse and high-quality content generated by recent generative models demonstrates the great potential of using synthetic data to train downstream models. However, in vision, and especially in object detection, related areas are not fully explored: synthetic images are merely used to balance the long tails of existing datasets, the accuracy of the generated labels is low, and the full potential of generative models has not been exploited. In this paper, we propose DODA, a data synthesizer that can generate high-quality object detection data for new domains in agriculture. Specifically, we improve the controllability of layout-to-image generation by encoding the layout as an image, thereby improving the quality of labels, and we use a visual encoder to provide visual clues to the diffusion model, decoupling visual features from the diffusion model and giving the model the ability to generate data in new domains. On the Global Wheat Head Detection (GWHD) Dataset, which is the largest dataset in agriculture and contains diverse domains, using the data synthesized by DODA improves the performance of the object detector by 12.74-17.76 AP$_{50}$ in the domain that was significantly shifted from the training data.
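The idea of encoding a layout as an image can be illustrated by rasterizing bounding boxes into a label map that an image-conditioned diffusion model could consume. The specific encoding below (box interiors filled with 1, outlines drawn with 2 on top so that touching or overlapping wheat heads stay separable) is an assumption for illustration; DODA's actual layout encoding may differ. Boxes use inclusive pixel coordinates `(x0, y0, x1, y1)`.

```python
def layout_to_image(boxes, height, width):
    """Rasterize boxes (x0, y0, x1, y1), inclusive pixel coords, into a label map.

    0 = background, 1 = box interior, 2 = box outline.
    """
    img = [[0] * width for _ in range(height)]
    clipped = []
    for x0, y0, x1, y1 in boxes:
        x0, y0 = max(0, x0), max(0, y0)
        x1, y1 = min(width - 1, x1), min(height - 1, y1)
        if x0 <= x1 and y0 <= y1:  # skip boxes entirely outside the image
            clipped.append((x0, y0, x1, y1))
    for x0, y0, x1, y1 in clipped:  # fill all interiors first
        for y in range(y0, y1 + 1):
            for x in range(x0, x1 + 1):
                img[y][x] = 1
    for x0, y0, x1, y1 in clipped:  # then outlines, so overlaps keep their edges
        for x in range(x0, x1 + 1):
            img[y0][x] = 2
            img[y1][x] = 2
        for y in range(y0, y1 + 1):
            img[y][x0] = 2
            img[y][x1] = 2
    return img


for row in layout_to_image([(1, 1, 4, 3), (3, 2, 6, 5)], height=7, width=8):
    print("".join(str(v) for v in row))
```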
- [192] arXiv:2403.18336 [pdf, other]
Title: A Dataset for Pharmacovigilance in German, French, and Japanese: Annotating Adverse Drug Reactions across Languages
Authors: Lisa Raithel, Hui-Syuan Yeh, Shuntaro Yada, Cyril Grouin, Thomas Lavergne, Aurélie Névéol, Patrick Paroubek, Philippe Thomas, Tomohiro Nishiyama, Sebastian Möller, Eiji Aramaki, Yuji Matsumoto, Roland Roller, Pierre Zweigenbaum
Comments: Accepted at LREC-COLING 2024
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
User-generated data sources have gained significance in uncovering Adverse Drug Reactions (ADRs), with an increasing number of discussions occurring in the digital world. However, the existing clinical corpora predominantly revolve around scientific articles in English. This work presents a multilingual corpus of texts concerning ADRs gathered from diverse sources, including patient fora, social media, and clinical reports in German, French, and Japanese. Our corpus contains annotations covering 12 entity types, four attribute types, and 13 relation types. It contributes to the development of real-world multilingual language models for healthcare. We provide statistics to highlight certain challenges associated with the corpus and conduct preliminary experiments resulting in strong baselines for extracting entities and relations between these entities, both within and across languages.
- [193] arXiv:2403.18337 [pdf, ps, other]
Title: Macroscale fracture surface segmentation via semi-supervised learning considering the structural similarity
Comments: During review title changed to: Deep learning based initial crack size measurements utilizing macroscale fracture surface segmentation
Journal-ref: Engineering Fracture Mechanics, Volume 293, 1st December 2023, 109868
Subjects: Machine Learning (cs.LG)
To date, the safety assessment of materials, used for example in the nuclear power sector, commonly relies on a fracture mechanics analysis utilizing macroscopic concepts, where a global load quantity K or J is compared to the material's fracture toughness curve. Part of the experimental effort involved in these concepts is dedicated to the quantitative analysis of fracture surfaces. Within the scope of this study, a methodology for the semi-supervised training of deep learning models for fracture surface segmentation on a macroscopic level was established. To this end, three distinct datasets were created to analyze the influence of structural similarity on the segmentation capability. The structural similarity differs due to the assessed materials and specimens, as well as imaging-induced variance caused by fluctuations in image acquisition in different laboratories. The datasets correspond to typical isolated laboratory conditions, complex real-world circumstances, and a curated subset of the two. We implemented a weak-to-strong consistency regularization for semi-supervised learning. On the heterogeneous dataset, we were able to train robust and well-generalizing models that learned feature representations from images across different domains without observing a significant drop in prediction quality. Furthermore, our approach reduced the number of labeled images required for training by a factor of 6. To demonstrate the success of our method and the benefit of our approach for the fracture mechanics assessment, we utilized the models for initial crack size measurements with the area average method. For the laboratory setting, the deep learning assisted measurements proved to have the same quality as manual measurements. For models trained on the heterogeneous dataset, very good measurement accuracies with mean deviations smaller than 1 % could be achieved...
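A weak-to-strong consistency term can be sketched in the style of FixMatch-type methods, which is an assumption about the exact formulation used. The prediction on a weakly augmented view of an unlabeled image provides per-pixel pseudo-labels, only confident pixels are kept, and the prediction on a strongly augmented view of the same image is penalized with cross-entropy against those pseudo-labels. The 0.95 confidence threshold and the class probabilities below are illustrative.

```python
import math


def weak_to_strong_loss(weak_probs, strong_probs, threshold=0.95):
    """Consistency loss averaged over all unlabeled pixels.

    weak_probs / strong_probs: per-pixel class distributions predicted for the
    weakly and strongly augmented views of the same image. In a real framework
    the weak-view prediction is detached, so gradients flow only through the
    strong view.
    """
    if not weak_probs:
        return 0.0
    total = 0.0
    for pw, ps in zip(weak_probs, strong_probs):
        confidence = max(pw)
        if confidence < threshold:
            continue  # low-confidence pixels contribute nothing
        pseudo_label = pw.index(confidence)
        total += -math.log(max(ps[pseudo_label], 1e-12))  # cross-entropy vs pseudo-label
    return total / len(weak_probs)


weak = [[0.98, 0.02], [0.60, 0.40], [0.03, 0.97]]
strong = [[0.70, 0.30], [0.10, 0.90], [0.20, 0.80]]
print(weak_to_strong_loss(weak, strong))
```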
- [194] arXiv:2403.18338 [pdf, other]
Title: mALBERT: Is a Compact Multilingual BERT Model Still Worth It?
Comments: The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, May 2024, Torino, Italy
Subjects: Artificial Intelligence (cs.AI)
Within the current trend of Pretrained Language Models (PLMs), more and more criticisms are emerging about the ethical and ecological impact of such models. In this article, considering these critical remarks, we propose to focus on smaller models, such as compact models like ALBERT, which are more ecologically virtuous than these PLMs. However, PLMs enable huge breakthroughs in Natural Language Processing tasks, such as Spoken and Natural Language Understanding, classification, and Question Answering. PLMs also have the advantage of being multilingual, and, as far as we know, a multilingual version of compact ALBERT models does not exist. Considering these facts, we propose the free release of the first version of a multilingual compact ALBERT model, pre-trained using Wikipedia data, which complies with the ethical aspect of such a language model. We also evaluate the model against classical multilingual PLMs in classical NLP tasks. Finally, this paper proposes a rare study on the impact of subword tokenization on language performance.
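Since the paper studies how subword tokenization affects performance, a minimal byte-pair encoding (BPE) learner shows what a subword vocabulary is built from: starting from characters, the most frequent adjacent symbol pair is merged repeatedly. BPE is shown here only as one representative subword scheme; it is not claimed to be the tokenizer used for mALBERT, and the toy word counts are made up.

```python
from collections import Counter


def learn_bpe(word_counts, num_merges):
    """Learn BPE merge rules from a {word: frequency} dict."""
    # Each word starts as a tuple of characters plus an end-of-word marker.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair (first one wins ties)
        merges.append(best)
        new_vocab = {}
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            key = tuple(merged)
            new_vocab[key] = new_vocab.get(key, 0) + freq
        vocab = new_vocab
    return merges


print(learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, num_merges=5))
```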
- [195] arXiv:2403.18340 [pdf, ps, other]
Title: The Metric Distortion of Randomized Social Choice Functions: C1 Maximal Lottery Rules and Simulations
Comments: 8 pages, 1 figure
Subjects: Computer Science and Game Theory (cs.GT)
The metric distortion of a randomized social choice function (RSCF) quantifies its worst-case approximation ratio of the optimal social cost when the voters' costs for alternatives are given by distances in a metric space. This notion has recently attracted significant attention as numerous RSCFs that aim to minimize the metric distortion have been suggested. However, such tailored voting rules usually have little appeal other than their low metric distortion. In this paper, we thus study the metric distortion of well-established RSCFs. In more detail, we first show that C1 maximal lottery rules, a well-known class of RSCFs, have a metric distortion of $4$ and furthermore prove that this is optimal within the class of majoritarian RSCFs (which only depend on the majority relation). As our second contribution, we perform extensive computer experiments on the metric distortion of established RSCFs to obtain insights into their average-case performance. These computer experiments are based on a new linear program for computing the metric distortion of a lottery on a given profile and reveal that some classical RSCFs perform almost as well as the currently best known RSCF with respect to the metric distortion on randomly sampled profiles.
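To make the definition concrete: for one fixed metric, the quantity below is the expected social cost of a lottery divided by the social cost of the best alternative; the metric distortion is the worst case of this ratio over all metrics consistent with the voters' rankings. The sketch fixes a single metric by placing voters and alternatives on a line, which is an assumption, so it only yields a lower bound on the distortion of the lottery on that profile, not the paper's linear-program computation.

```python
def social_cost(position, voters):
    """Sum of voter distances to an alternative on the real line."""
    return sum(abs(v - position) for v in voters)


def cost_ratio(lottery, alternatives, voters):
    """E[SC(lottery)] / min_x SC(x) for one fixed 1-D metric.

    lottery: {name: probability}; alternatives: {name: position}.
    Assumes the optimal social cost is positive.
    """
    expected = sum(p * social_cost(alternatives[a], voters) for a, p in lottery.items())
    optimal = min(social_cost(x, voters) for x in alternatives.values())
    return expected / optimal


# Two voters at 0 prefer "a", one voter at 1 prefers "b".
voters = [0.0, 0.0, 1.0]
alternatives = {"a": 0.0, "b": 1.0}
print(cost_ratio({"a": 0.5, "b": 0.5}, alternatives, voters))  # 1.5
```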
- [196] arXiv:2403.18341 [pdf, other]
Title: IterAlign: Iterative Constitutional Alignment of Large Language Models
Comments: NAACL 2024
Subjects: Computation and Language (cs.CL)
With the rapid development of large language models (LLMs), aligning LLMs with human values and societal norms to ensure their reliability and safety has become crucial. Reinforcement learning from human feedback (RLHF) and Constitutional AI (CAI) have been proposed for LLM alignment. However, these methods require either heavy human annotations or explicitly pre-defined constitutions, which are labor-intensive and resource-consuming. To overcome these drawbacks, we study constitution-based LLM alignment and propose a data-driven constitution discovery and self-alignment framework called IterAlign. IterAlign leverages red teaming to unveil the weaknesses of an LLM and automatically discovers new constitutions using a stronger LLM. These constitutions are then used to guide self-correction of the base LLM. Such a constitution discovery pipeline can be run iteratively and automatically to discover new constitutions that specifically target the alignment gaps in the current LLM. Empirical results on several safety benchmark datasets and multiple base LLMs show that IterAlign successfully improves truthfulness, helpfulness, harmlessness and honesty, improving the LLM alignment by up to $13.5\%$ in harmlessness.
- [197] arXiv:2403.18342 [pdf, other]
Title: Learning Inclusion Matching for Animation Paint Bucket Colorization
Comments: accepted to CVPR 2024. Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Colorizing line art is a pivotal task in the production of hand-drawn cel animation. This typically involves digital painters using a paint bucket tool to manually color each segment enclosed by lines, based on RGB values predetermined by a color designer. This frame-by-frame process is both arduous and time-intensive. Current automated methods mainly focus on segment matching. This technique migrates colors from a reference to the target frame by aligning features within line-enclosed segments across frames. However, issues like occlusion and wrinkles in animations often disrupt these direct correspondences, leading to mismatches. In this work, we introduce a new learning-based inclusion matching pipeline, which directs the network to comprehend the inclusion relationships between segments rather than relying solely on direct visual correspondences. Our method features a two-stage pipeline that integrates a coarse color warping module with an inclusion matching module, enabling more nuanced and accurate colorization. To facilitate the training of our network, we also develop a unique dataset, referred to as PaintBucket-Character. This dataset includes rendered line art alongside its colorized counterparts, featuring various 3D characters. Extensive experiments demonstrate the effectiveness and superiority of our method over existing techniques.
- [198] arXiv:2403.18343 [pdf, other]
Title: The Artificial Neural Twin -- Process Optimization and Continual Learning in Distributed Process Chains
Authors: Johannes Emmert, Ronald Mendez, Houman Mirzaalian Dastjerdi, Christopher Syben, Andreas Maier
Comments: 20 pages, 11 figures
Subjects: Machine Learning (cs.LG)
Industrial process optimization and control is crucial for increasing economic and ecological efficiency. However, data sovereignty, differing goals, and the expert knowledge required for implementation impede holistic deployment. Further, the increasing use of data-driven AI methods in process models and industrial sensors often requires regular fine-tuning to accommodate distribution drift. We propose the Artificial Neural Twin, which combines concepts from model predictive control, deep learning, and sensor networks to address these issues. Our approach introduces differentiable data fusion to estimate the state of distributed process steps and their dependence on input data. By treating the interconnected process steps as a quasi neural network, we can backpropagate loss gradients to process parameters for process optimization, or to AI models for fine-tuning. The concept is demonstrated on a virtual machine park simulated in Unity, consisting of bulk material processes in plastic recycling.
- [199] arXiv:2403.18344 [pdf, other]
Title: LC-LLM: Explainable Lane-Change Intention and Trajectory Predictions with Large Language Models
Authors: Mingxing Peng, Xusen Guo, Xianda Chen, Meixin Zhu, Kehua Chen, Hao (Frank) Yang, Xuesong Wang, Yinhai Wang
Subjects: Artificial Intelligence (cs.AI)
To ensure safe driving in dynamic environments, autonomous vehicles should be able to accurately predict the lane change intentions of surrounding vehicles in advance and forecast their future trajectories. Existing motion prediction approaches have ample room for improvement, particularly in terms of long-term prediction accuracy and interpretability. In this paper, we address these challenges by proposing LC-LLM, an explainable lane change prediction model that leverages the strong reasoning capabilities and self-explanation abilities of Large Language Models (LLMs). Essentially, we reformulate the lane change prediction task as a language modeling problem, processing heterogeneous driving scenario information in natural language as prompts for input into the LLM and employing a supervised fine-tuning technique to tailor the LLM specifically to our lane change prediction task. This allows us to utilize the LLM's powerful common sense reasoning abilities to understand complex interactive information, thereby improving the accuracy of long-term predictions. Furthermore, we incorporate explanatory requirements into the prompts in the inference stage. As a result, LC-LLM can not only predict lane change intentions and trajectories but also provide explanations for its predictions, enhancing interpretability. Extensive experiments on the large-scale highD dataset demonstrate the superior performance and interpretability of LC-LLM on the lane change prediction task. To the best of our knowledge, this is the first attempt to utilize LLMs for predicting lane change behavior. Our study shows that LLMs can encode comprehensive interaction information for driving behavior understanding.
- [200] arXiv:2403.18346 [pdf, other]
Title: Quantifying and Mitigating Unimodal Biases in Multimodal Large Language Models: A Causal Perspective
Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
Recent advancements in Large Language Models (LLMs) have facilitated the development of Multimodal LLMs (MLLMs). Despite their impressive capabilities, MLLMs often suffer from an over-reliance on unimodal biases (e.g., language bias and vision bias), leading to incorrect answers in complex multimodal tasks. To investigate this issue, we propose a causal framework to interpret the biases in Visual Question Answering (VQA) problems. Within our framework, we devise a causal graph to elucidate the predictions of MLLMs on VQA problems, and assess the causal effect of biases through an in-depth causal analysis. Motivated by the causal graph, we introduce a novel MORE dataset, consisting of 12,000 VQA instances. This dataset is designed to challenge MLLMs' abilities, necessitating multi-hop reasoning and the surmounting of unimodal biases. Furthermore, we propose two strategies to mitigate unimodal biases and enhance MLLMs' reasoning capabilities, including a Decompose-Verify-Answer (DeVA) framework for limited-access MLLMs and the refinement of open-source MLLMs through fine-tuning. Extensive quantitative and qualitative experiments offer valuable insights for future research.
- [201] arXiv:2403.18348 [pdf, other]
Title: Sequential Recommendation with Latent Relations based on Large Language Model
Comments: Accepted by SIGIR 2024
Subjects: Information Retrieval (cs.IR)
Sequential recommender systems predict items that may interest users by modeling their preferences based on historical interactions. Traditional sequential recommendation methods rely on capturing implicit collaborative filtering signals among items. Recent relation-aware sequential recommendation models have achieved promising performance by explicitly incorporating item relations into the modeling of user historical sequences, where most relations are extracted from knowledge graphs. However, existing methods rely on manually predefined relations and suffer from the sparsity issue, limiting their generalization ability in diverse scenarios with varied item relations. In this paper, we propose a novel relation-aware sequential recommendation framework with Latent Relation Discovery (LRD). Different from previous relation-aware models that rely on predefined rules, we propose to leverage the Large Language Model (LLM) to provide new types of relations and connections between items. The motivation is that LLMs contain abundant world knowledge, which can be adopted to mine latent relations of items for recommendation. Specifically, inspired by the fact that humans can describe relations between items using natural language, LRD harnesses an LLM that has demonstrated human-like knowledge to obtain language knowledge representations of items. These representations are fed into a latent relation discovery module based on the discrete state variational autoencoder (DVAE). The self-supervised relation discovery tasks and recommendation tasks are then jointly optimized. Experimental results on multiple public datasets demonstrate that our proposed latent relation discovery method can be incorporated with existing relation-aware sequential recommendation models and significantly improves their performance. Further analysis experiments indicate the effectiveness and reliability of the discovered latent relations.
- [202] arXiv:2403.18349 [pdf, other]
Title: Rejection Improves Reliability: Training LLMs to Refuse Unknown Questions Using RL from Knowledge Feedback
Subjects: Computation and Language (cs.CL)
Large Language Models (LLMs) often generate erroneous outputs, known as hallucinations, due to their limitations in discerning questions beyond their knowledge scope. While addressing hallucination has been a focal point in research, previous efforts have primarily concentrated on enhancing correctness without giving due consideration to the significance of rejection mechanisms. In this paper, we conduct a comprehensive examination of the role of rejection, introducing the notion of model reliability along with corresponding metrics. These metrics measure the model's ability to provide accurate responses while adeptly rejecting questions exceeding its knowledge boundaries, thereby minimizing hallucinations. To improve the inherent reliability of LLMs, we present a novel alignment framework called Reinforcement Learning from Knowledge Feedback (RLKF). RLKF leverages knowledge feedback to dynamically determine the model's knowledge boundary and trains a reliable reward model to encourage the refusal of out-of-knowledge questions. Experimental results on mathematical questions show that RLKF substantially enhances LLM reliability.
- [203] arXiv:2403.18350 [pdf, other]
Title: Evaluation of Semantic Search and its Role in Retrieved-Augmented-Generation (RAG) for Arabic Language
Subjects: Computation and Language (cs.CL)
The latest advancements in machine learning and deep learning have brought forth the concept of semantic similarity, which has proven immensely beneficial in multiple applications and has largely replaced keyword search. However, evaluating semantic similarity and conducting searches for a specific query across various documents remain complicated tasks. This complexity is due to the multifaceted nature of the task and the lack of standard benchmarks, and these challenges are further amplified for the Arabic language. This paper endeavors to establish a straightforward yet potent benchmark for semantic search in Arabic. Moreover, to precisely evaluate the effectiveness of these metrics and the dataset, we conduct our assessment of semantic search within the framework of retrieval augmented generation (RAG).
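As a generic illustration of the semantic search step that feeds a RAG pipeline (not the benchmark or embedding model used in this paper, which the abstract does not name), documents can be ranked by the cosine similarity between a query embedding and each document embedding. The function names and the toy two-dimensional vectors below are hypothetical; in practice the vectors would come from an Arabic-capable sentence encoder, and the top-k passages would be placed into the generator's prompt.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_search(query_vec: List[float],
                    doc_vecs: Dict[str, List[float]],
                    top_k: int = 3) -> List[Tuple[str, float]]:
    """Return the top_k (doc_id, score) pairs, highest cosine similarity first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings; a real system would use a sentence encoder.
docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
print(semantic_search([1.0, 0.1], docs, top_k=2))
```

Unlike keyword search, this ranking rewards documents whose embeddings point in a similar direction even when they share no surface tokens with the query.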
- [204] arXiv:2403.18351 [pdf, other]
Title: Generating Diverse Agricultural Data for Vision-Based Farming Applications
Authors: Mikolaj Cieslak, Umabharathi Govindarajan, Alejandro Garcia, Anuradha Chandrashekar, Torsten Hädrich, Aleksander Mendoza-Drosik, Dominik L. Michels, Sören Pirk, Chia-Chun Fu, Wojciech Pałubicki
Comments: 10 pages, 8 figures, 3 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG)
We present a specialized procedural model for generating synthetic agricultural scenes, focusing on soybean crops, along with various weeds. This model is capable of simulating distinct growth stages of these plants, diverse soil conditions, and randomized field arrangements under varying lighting conditions. The integration of real-world textures and environmental factors into the procedural generation process enhances the photorealism and applicability of the synthetic data. Our dataset includes 12,000 images with semantic labels, offering a comprehensive resource for computer vision tasks in precision agriculture, such as semantic segmentation for autonomous weed control. We validate our model's effectiveness by comparing the synthetic data against real agricultural images, demonstrating its potential to significantly augment training data for machine learning models in agriculture. This approach not only provides a cost-effective solution for generating high-quality, diverse data but also addresses specific needs in agricultural vision tasks that are not fully covered by general-purpose models.
- [205] arXiv:2403.18356 [pdf, other]
Title: MonoHair: High-Fidelity Hair Modeling from a Monocular Video
Authors: Keyu Wu, Lingchen Yang, Zhiyi Kuang, Yao Feng, Xutao Han, Yuefan Shen, Hongbo Fu, Kun Zhou, Youyi Zheng
Comments: Accepted by IEEE CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Undoubtedly, high-fidelity 3D hair is crucial for achieving realism, artistic expression, and immersion in computer graphics. While existing 3D hair modeling methods have achieved impressive performance, the challenge of achieving high-quality hair reconstruction persists: they either require strict capture conditions, making practical applications difficult, or heavily rely on learned prior data, obscuring fine-grained details in images. To address these challenges, we propose MonoHair, a generic framework to achieve high-fidelity hair reconstruction from a monocular video, without specific requirements for environments. Our approach bifurcates the hair modeling process into two main stages: precise exterior reconstruction and interior structure inference. The exterior is meticulously crafted using our Patch-based Multi-View Optimization (PMVO). This method strategically collects and integrates hair information from multiple views, independent of prior data, to produce a high-fidelity exterior 3D line map. This map not only captures intricate details but also facilitates the inference of the hair's inner structure. For the interior, we employ a data-driven, multi-view 3D hair reconstruction method. This method utilizes 2D structural renderings derived from the reconstructed exterior, mirroring the synthetic 2D inputs used during training. This alignment effectively bridges the domain gap between our training data and real-world data, thereby enhancing the accuracy and reliability of our interior structure inference. Lastly, we generate a strand model and resolve the directional ambiguity with our hair growth algorithm. Our experiments demonstrate that our method exhibits robustness across diverse hairstyles and achieves state-of-the-art performance. For more results, please refer to our project page https://keyuwu-cs.github.io/MonoHair/.
- [206] arXiv:2403.18358 [pdf, ps, other]
Title: Imaging radar and LiDAR image translation for 3-DOF extrinsic calibration
Subjects: Robotics (cs.RO)
The integration of sensor data is crucial in the field of robotics to take full advantage of the various sensors employed. One critical aspect of this integration is determining the extrinsic calibration parameters, such as the relative transformation, between sensors. The use of data fusion between complementary sensors, such as radar and LiDAR, can provide significant benefits, particularly in harsh environments where accurate depth data is required. However, noise in radar sensor data can make the estimation of extrinsic calibration challenging. To address this issue, we present a novel framework for the extrinsic calibration of radar and LiDAR sensors, utilizing CycleGAN as a method of image-to-image translation. Our proposed method translates radar bird's-eye-view images into LiDAR-style images to estimate the 3-DOF extrinsic parameters. Image registration techniques are employed, together with deskewing based on sensor odometry and B-spline interpolation to address the rolling shutter effect commonly present in spinning sensors. On the MulRan dataset, our method demonstrates a notable improvement in extrinsic calibration compared to filter-based methods.
- [207] arXiv:2403.18360 [pdf, other]
Title: Learning CNN on ViT: A Hybrid Model to Explicitly Class-specific Boundaries for Domain Adaptation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Most domain adaptation (DA) methods are based on either convolutional neural networks (CNNs) or vision transformers (ViTs). They use these networks as encoders to align the distribution differences between domains without considering their unique characteristics. For instance, ViT excels in accuracy due to its superior ability to capture global representations, while CNN has an advantage in capturing local representations. This fact has led us to design a hybrid method that takes full advantage of both ViT and CNN, called Explicitly Class-specific Boundaries (ECB). ECB learns CNN on ViT to combine their distinct strengths. In particular, we leverage ViT's properties to explicitly find class-specific decision boundaries by maximizing the discrepancy between the outputs of the two classifiers to detect target samples far from the source support. In contrast, the CNN encoder clusters target features based on the previously defined class-specific boundaries by minimizing the discrepancy between the probabilities of the two classifiers. Finally, ViT and CNN mutually exchange knowledge to improve the quality of pseudo labels and reduce the knowledge discrepancies of these models. Compared to conventional DA methods, our ECB achieves superior performance, which verifies the effectiveness of this hybrid model. The project website can be found at https://dotrannhattuong.github.io/ECB/website/.
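The "discrepancy between the two classifiers" that ECB maximizes on the ViT side and minimizes on the CNN side can be made concrete with a small sketch. The abstract does not define the measure, so the version below assumes the mean absolute (L1) difference between the two classifiers' softmax probabilities, as popularized by Maximum Classifier Discrepancy (MCD); the function names are hypothetical and this is not the authors' code.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one sample's logits."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classifier_discrepancy(logits_1: List[float], logits_2: List[float]) -> float:
    """Assumed MCD-style discrepancy: mean |p1 - p2| over classes for one sample."""
    p = softmax(logits_1)
    q = softmax(logits_2)
    return sum(abs(a - b) for a, b in zip(p, q)) / len(p)


def batch_discrepancy(batch_1: List[List[float]], batch_2: List[List[float]]) -> float:
    """Average per-sample discrepancy over a batch of target samples."""
    return sum(classifier_discrepancy(a, b) for a, b in zip(batch_1, batch_2)) / len(batch_1)


# Two classifiers that agree give 0; confident disagreement approaches 1.
print(classifier_discrepancy([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
print(classifier_discrepancy([10.0, 0.0], [0.0, 10.0]))
```

In an adversarial scheme of this kind, the classifier heads would be updated to increase `batch_discrepancy` on target samples (exposing samples far from the source support), while the feature encoder would be updated to decrease it.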
- [208] arXiv:2403.18361 [pdf, other]
Title: ViTAR: Vision Transformer with Any Resolution
Authors: Qihang Fan, Quanzeng You, Xiaotian Han, Yongfei Liu, Yunzhe Tao, Huaibo Huang, Ran He, Hongxia Yang
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper tackles a significant challenge faced by Vision Transformers (ViTs): their constrained scalability across different image resolutions. Typically, ViTs experience a performance decline when processing resolutions different from those seen during training. Our work introduces two key innovations to address this issue. First, we propose a novel module for dynamic resolution adjustment, designed with a single Transformer block, specifically to achieve highly efficient incremental token integration. Second, we introduce fuzzy positional encoding in the Vision Transformer to provide consistent positional awareness across multiple resolutions, thereby preventing overfitting to any single training resolution. Our resulting model, ViTAR (Vision Transformer with Any Resolution), demonstrates impressive adaptability, achieving 83.3\% top-1 accuracy at a 1120x1120 resolution and 80.4\% accuracy at a 4032x4032 resolution, all while reducing computational costs. ViTAR also shows strong performance in downstream tasks such as instance and semantic segmentation and can easily be combined with self-supervised learning techniques like Masked AutoEncoder. Our work provides a cost-effective solution for enhancing the resolution scalability of ViTs, paving the way for more versatile and efficient high-resolution image processing.
- [209] arXiv:2403.18362 [pdf, other]
Title: Fractional variational integrators based on convolution quadrature
Subjects: Numerical Analysis (math.NA)
Fractional dissipation is a powerful tool to study non-local physical phenomena such as damping models. The design of geometric (in particular, variational) integrators for the numerical simulation of such systems relies on a variational formulation of the model. In [19], a new approach is proposed to deal with dissipative systems, including fractionally damped systems, in a variational way for both the continuous and the discrete setting. It is based on the doubling of variables and their fractional derivatives. The aim of this work is to derive higher-order fractional variational integrators by means of convolution quadrature (CQ) based on backward difference formulas. We then provide numerical methods of order 2, improving a previous result in [19]. The convergence properties of the fractional variational integrators and saturation effects due to the approximation of the fractional derivatives by CQ are studied numerically.
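To make "convolution quadrature based on backward difference formulas" concrete, here is a minimal sketch of the standard CQ approximation of a fractional derivative, not the variational integrator from this paper. With step size $h$ and BDF generating polynomial $\delta(\zeta)$ ($\delta(\zeta) = 1-\zeta$ for BDF1, $\delta(\zeta) = \tfrac{3}{2} - 2\zeta + \tfrac{1}{2}\zeta^2$ for BDF2), the weights $w_j$ are the Taylor coefficients of $\delta(\zeta)^\alpha$, and $D^\alpha f(t_n) \approx h^{-\alpha}\sum_{j=0}^{n} w_j f(t_{n-j})$. The power series of $\delta(\zeta)^\alpha$ is expanded with J.C.P. Miller's recurrence; function names are hypothetical, and no starting-weight corrections are included.

```python
import math
from typing import Callable, List, Sequence


def cq_weights_bdf1(alpha: float, n: int) -> List[float]:
    """Taylor coefficients of (1 - z)**alpha, i.e. BDF1 (Grunwald-Letnikov) weights."""
    w = [1.0]
    for j in range(1, n + 1):
        w.append(w[-1] * (1.0 - (alpha + 1.0) / j))
    return w


def cq_weights_bdf2(alpha: float, n: int) -> List[float]:
    """Taylor coefficients of (3/2 - 2z + z**2/2)**alpha via Miller's power recurrence."""
    a = [1.5, -2.0, 0.5]  # coefficients of the BDF2 generating polynomial delta(z)
    w = [a[0] ** alpha]
    for k in range(1, n + 1):
        s = 0.0
        for j in range(1, min(k, 2) + 1):
            s += ((alpha + 1.0) * j - k) * a[j] * w[k - j]
        w.append(s / (k * a[0]))
    return w


def cq_fractional_derivative(f_values: Sequence[float], h: float, alpha: float,
                             weights: Callable[[float, int], List[float]]) -> float:
    """Approximate the Riemann-Liouville derivative of order alpha at t_n = n*h
    (equal to the Caputo derivative when f(0) = 0)."""
    n = len(f_values) - 1
    w = weights(alpha, n)
    return sum(w[j] * f_values[n - j] for j in range(n + 1)) / h ** alpha


# Example: D^{1/2} of f(t) = t at t = 1; the exact value is 1 / Gamma(3/2).
h = 1e-3
values = [k * h for k in range(1001)]
exact = 1.0 / math.gamma(1.5)
print(cq_fractional_derivative(values, h, 0.5, cq_weights_bdf1), exact)
print(cq_fractional_derivative(values, h, 0.5, cq_weights_bdf2), exact)
```

For $\alpha = 1$ the weights reduce to the ordinary BDF1 and BDF2 difference stencils, which is a quick sanity check on the recurrences.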
- [210] arXiv:2403.18364 [pdf, other]
Title: Intent-Aware DRL-Based Uplink Dynamic Scheduler for 5G-NR
Subjects: Information Theory (cs.IT); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
We investigate the problem of supporting Industrial Internet of Things user equipment (IIoT UEs) with intent (i.e., requested quality of service (QoS)) and random traffic arrival. A deep reinforcement learning (DRL) based centralized dynamic scheduler for time-frequency resources is proposed to learn how to schedule the available communication resources among the IIoT UEs. The proposed scheduler leverages an RL framework to adapt to the dynamic changes in the wireless communication system and traffic arrivals. Moreover, a graph-based reduction scheme is proposed to reduce the state and action space of the RL framework to allow fast convergence and a better learning strategy. Simulation results demonstrate the effectiveness of the proposed intelligent scheduler in guaranteeing the expressed intent of IIoT UEs compared to several traditional scheduling schemes, such as round-robin, semi-static, and heuristic approaches. The proposed scheduler also outperforms the contention-free and contention-based schemes in maximizing the number of successfully computed tasks.
- [211] arXiv:2403.18365 [pdf, other]
Title: BLADE: Enhancing Black-box Large Language Models with Small Domain-Specific Models
Comments: 11 pages
Subjects: Computation and Language (cs.CL)
Large Language Models (LLMs) like ChatGPT and GPT-4 are versatile and capable of addressing a diverse range of tasks. However, general LLMs, which are developed on open-domain data, may lack the domain-specific knowledge essential for tasks in vertical domains such as law and medicine. To address this issue, previous approaches either conduct continual pre-training with domain-specific data or employ retrieval augmentation to support general LLMs. Unfortunately, these strategies are either cost-intensive or unreliable in practical applications. To this end, we present a novel framework named BLADE, which enhances Black-box LArge language models with small Domain-spEcific models. BLADE consists of a black-box LLM and a small domain-specific LM. The small LM preserves domain-specific knowledge and offers specialized insights, while the general LLM contributes robust language comprehension and reasoning capabilities. Specifically, our method involves three steps: 1) pre-training the small LM with domain-specific data, 2) fine-tuning this model using knowledge instruction data, and 3) joint Bayesian optimization of the general LLM and the small LM. Extensive experiments conducted on public legal and medical benchmarks reveal that BLADE significantly outperforms existing approaches. This shows the potential of BLADE as an effective and cost-efficient solution for adapting general LLMs to vertical domains.
- [212] arXiv:2403.18367 [pdf, other]
Title: Merits of Time-Domain Computing for VMM -- A Quantitative Comparison
Subjects: Hardware Architecture (cs.AR)
Vector-matrix-multiplication (VMM) accelerators have gained a lot of traction, especially due to the rise of convolutional neural networks (CNNs) and the desire to compute them on the edge. Besides the classical digital approach, analog computing has gone through a renaissance to push energy efficiency further. A more recent approach is called time-domain (TD) computing. In contrast to analog computing, TD computing permits easy technology as well as voltage scaling. As it has received limited research attention, it is not yet clear which scenarios are most suitable to be computed in the TD. In this work, we investigate these scenarios, focussing on energy efficiency considering approximative computations that preserve accuracy. Both goals are addressed by a novel efficiency metric, which is used to find a baseline design. We use SPICE simulation data which is fed into a python framework to evaluate how performance scales for VMM computation. We see that TD computing offers best energy efficiency for small to medium sized arrays. With throughput and silicon footprint we investigate two additional metrics, giving a holistic comparison.
- [213] arXiv:2403.18369 [pdf, other]
Title: Damage Mechanics Challenge: Predictions based on the phase field fracture model
Subjects: Computational Engineering, Finance, and Science (cs.CE); Applied Physics (physics.app-ph)
In this work, we describe our contribution to the Purdue-SANDIA-LLNL \emph{Damage Mechanics Challenge}. The phase field fracture model is adopted to blindly estimate the failure characteristics of the challenge test, an unconventional three-point bending experiment on an additively manufactured rock resembling a type of gypsum. The model is formulated in a variationally consistent fashion, incorporating a volumetric-deviatoric strain energy decomposition, and the numerical implementation adopts a monolithic unconditionally stable solution scheme. Our focus is on providing an efficient and simple yet rigorous approach capable of delivering accurate predictions based solely on physical parameters. Model inputs are Young's modulus $E$, Poisson's ratio $\nu$, toughness $G_c$ and strength $\sigma_c$ (as determined by the choice of phase field length scale $\ell$). We show that a single mode I three-point bending test is sufficient to calibrate the model, and that the calibrated model can then reliably predict the force versus displacement responses, crack paths and surface crack morphologies of more intricate three-point bending experiments that are inherently mixed-mode. Importantly, our peak load, crack trajectory and crack surface morphology predictions for the challenge test, submitted before the experimental data was released, show a remarkable agreement with experiments. The characteristics of the challenge, and how changes in these can impact the predictive abilities of phase field fracture models, are also discussed.
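The link between strength $\sigma_c$ and phase field length scale $\ell$ mentioned above can be illustrated with the textbook relations for a homogeneous 1D bar under uniaxial tension. The abstract does not state which phase field variant is used, so this sketch assumes the common AT1 and AT2 models without any strain energy split (the volumetric-deviatoric split used in the paper can change the relation for general stress states): AT1 gives $\sigma_c = \sqrt{3 E G_c / (8\ell)}$ and AT2 gives $\sigma_c = \frac{9}{16}\sqrt{E G_c / (3\ell)}$. Function names and the numbers in the usage lines are hypothetical, not the challenge's calibration.

```python
import math


def length_scale_at1(E: float, Gc: float, sigma_c: float) -> float:
    """AT1: invert sigma_c = sqrt(3 E Gc / (8 ell)) for ell."""
    return 3.0 * E * Gc / (8.0 * sigma_c ** 2)


def length_scale_at2(E: float, Gc: float, sigma_c: float) -> float:
    """AT2: invert sigma_c = (9/16) sqrt(E Gc / (3 ell)) for ell."""
    return 27.0 * E * Gc / (256.0 * sigma_c ** 2)


def strength_at1(E: float, Gc: float, ell: float) -> float:
    """AT1 critical stress of a homogeneous 1D bar for a given length scale."""
    return math.sqrt(3.0 * E * Gc / (8.0 * ell))


def strength_at2(E: float, Gc: float, ell: float) -> float:
    """AT2 critical stress of a homogeneous 1D bar for a given length scale."""
    return 9.0 / 16.0 * math.sqrt(E * Gc / (3.0 * ell))


# Hypothetical inputs in consistent units (MPa, N/mm -> ell in mm).
E, Gc, sigma_c = 5000.0, 0.01, 5.0
print(length_scale_at1(E, Gc, sigma_c), length_scale_at2(E, Gc, sigma_c))
```

Choosing $\ell$ this way is what lets a phase field model be driven by measured $E$, $G_c$ and $\sigma_c$ rather than treating $\ell$ as a purely numerical regularization parameter.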
- [214] arXiv:2403.18370 [pdf, other]
Title: Ship in Sight: Diffusion Models for Ship-Image Super Resolution
Comments: Accepted at 2024 International Joint Conference on Neural Networks (IJCNN)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
In recent years, remarkable advancements have been achieved in the field of image generation, primarily driven by the escalating demand for high-quality outcomes across various image generation subtasks, such as inpainting, denoising, and super resolution. A major effort is devoted to exploring the application of super-resolution techniques to enhance the quality of low-resolution images. In this context, our method explores in depth the problem of ship image super resolution, which is crucial for coastal and port surveillance. We investigate the opportunity given by the growing interest in text-to-image diffusion models, taking advantage of the prior knowledge that such foundation models have already learned. In particular, we present a diffusion-model-based architecture that leverages text conditioning during training while being class-aware, to best preserve the crucial details of the ships during the generation of the super-resolved image. Given the specificity of this task and the scarce availability of off-the-shelf data, we also introduce a large labeled ship dataset scraped from online ship images, mostly from the ShipSpotting website (www.shipspotting.com). Our method achieves more robust results than other deep learning models previously employed for super resolution, as shown by multiple experiments. Moreover, we investigate how this model can benefit downstream tasks, such as classification and object detection, thus emphasizing practical implementation in a real-world scenario. Experimental results show the flexibility, reliability, and impressive performance of the proposed framework over state-of-the-art methods for different tasks. The code is available at: https://github.com/LuigiSigillo/ShipinSight
- [215] arXiv:2403.18371 [pdf, other]
Title: Multivariable control of modular multilevel converters with convergence and safety guarantees
Comments: Submitted to IEEE Open Journal of the Industrial Electronics
Subjects: Systems and Control (eess.SY)
Well-designed current control is a key factor in ensuring the efficient and safe operation of modular multilevel converters (MMCs). Even though this control problem involves multiple control objectives, conventional current control schemes are comprised of independently designed decoupled controllers, e.g., proportional-integral (PI) or proportional-resonant (PR). Due to the bilinearity of the MMC dynamics, tuning PI and PR controllers so that good performance and constraint satisfaction are guaranteed is quite challenging. This challenge becomes more relevant in an AC/AC MMC configuration due to the complexity of tracking the single-phase sinusoidal components of the MMC output. In this paper, we propose a method to design a multivariable controller, i.e., a static feedback gain, to regulate the MMC currents. We use a physics-informed transformation to model the MMC dynamics linearly and synthesise the proposed controller. We use this linear model to formulate a linear matrix inequality that computes a feedback gain that guarantees safe and effective operation, including (i) limited tracking error, (ii) stability, and (iii) meeting all constraints. To test the efficacy of our method, we examine its performance in a direct AC/AC MMC simulated in Simulink/PLECS and in a scaled-down AC/AC MMC prototype to investigate the ultra-fast charging of electric vehicles.
- [216] arXiv:2403.18373 [pdf, other]
Title: BAM: Box Abstraction Monitors for Real-time OoD Detection in Object Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Out-of-distribution (OoD) detection techniques for deep neural networks (DNNs) have become crucial because they filter abnormal inputs, especially when DNNs are used in safety-critical applications and interact with an open and dynamic environment. Nevertheless, integrating OoD detection into state-of-the-art (SOTA) object detection DNNs poses significant challenges, partly due to the complexity introduced by SOTA OoD construction methods, which require modifying the DNN architecture and introducing complex loss functions. This paper proposes Box Abstraction-based Monitors (BAM), a simple yet surprisingly effective method that requires neither retraining nor architectural changes to the object detection DNN. The novelty of BAM stems from using a finite union of convex box abstractions to capture the learned features of objects for in-distribution (ID) data, and from the important observation that features from OoD data are more likely to fall outside of these boxes. The union of convex regions within the feature space allows the formation of non-convex and interpretable decision boundaries, overcoming the limitations of VOS-like detectors without sacrificing real-time performance. Experiments integrating BAM into Faster R-CNN-based object detection DNNs demonstrate considerably improved performance compared with SOTA OoD detection techniques.
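The core idea of a box abstraction monitor, a union of axis-aligned boxes built from in-distribution features with anything outside every box flagged as OoD, fits in a few lines. This is a simplified sketch, not the BAM implementation: it assumes the ID features have already been grouped into clusters (how BAM partitions them is not described in the abstract), and the class name and the optional `margin` enlargement are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = List[Tuple[float, float]]  # per-dimension (low, high) bounds


class BoxAbstractionMonitor:
    """Union of axis-aligned boxes over in-distribution feature clusters."""

    def __init__(self, margin: float = 0.0) -> None:
        self.margin = margin  # hypothetical slack added to every bound
        self.boxes: List[Box] = []

    def fit(self, clusters: Sequence[Sequence[Sequence[float]]]) -> None:
        """Build one bounding box per cluster of ID feature vectors."""
        self.boxes = []
        for cluster in clusters:
            box: Box = []
            for d in range(len(cluster[0])):
                values = [feature[d] for feature in cluster]
                box.append((min(values) - self.margin, max(values) + self.margin))
            self.boxes.append(box)

    def is_in_distribution(self, feature: Sequence[float]) -> bool:
        """A feature is ID if it lies inside at least one box; otherwise flag it as OoD."""
        return any(
            all(lo <= x <= hi for x, (lo, hi) in zip(feature, box))
            for box in self.boxes
        )


monitor = BoxAbstractionMonitor()
monitor.fit([[[0.0, 0.0], [1.0, 1.0]], [[5.0, 5.0], [6.0, 6.0]]])
print(monitor.is_in_distribution([0.5, 0.5]))  # inside the first box
print(monitor.is_in_distribution([3.0, 3.0]))  # between the boxes
```

The second query lies in the convex hull of all ID features but outside both boxes, which is exactly the non-convex decision boundary that a union of boxes provides and a single convex region cannot.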
- [217] arXiv:2403.18374 [pdf, other]
Title: Optimizing Communication for Latency Sensitive HPC Applications on up to 48 FPGAs Using ACCL
Authors: Marius Meyer, Tobias Kenter, Lucian Petrica, Kenneth O'Brien, Michaela Blott, Christian Pessl
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Hardware Architecture (cs.AR)
Most FPGA boards in the HPC domain are well-suited for parallel scaling because of the direct integration of versatile and high-throughput network ports. However, utilizing their network capabilities is often challenging and error-prone because the whole network stack and communication patterns have to be implemented and managed on the FPGAs. Also, this approach conceptually involves a trade-off between the performance potential of improved communication and the resource consumption of the communication infrastructure, since the resources it uses on the FPGAs could otherwise be used for computations. In this work, we investigate this trade-off. First, we use synthetic benchmarks to evaluate the different configuration options of the communication framework ACCL and their impact on communication latency and throughput. Then, we use our findings to implement a shallow water simulation whose scalability heavily depends on low-latency communication. With a suitable configuration of ACCL, good scaling behavior can be shown up to all 48 FPGAs installed in the system. Overall, the results show that both the availability of inter-FPGA communication frameworks and the configurability of framework and network stack are crucial to achieve the best application performance with low latency communication.
- [218] arXiv:2403.18375 [pdf, other]
Title: Stragglers-Aware Low-Latency Synchronous Federated Learning via Layer-Wise Model Updates
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)
Synchronous federated learning (FL) is a popular paradigm for collaborative edge learning. It typically involves a set of heterogeneous devices locally training neural network (NN) models in parallel with periodic centralized aggregations. As some of the devices may have limited computational resources and varying availability, FL latency is highly sensitive to stragglers. Conventional approaches discard incomplete intra-model updates done by stragglers, alter the amount of local workload and architecture, or resort to asynchronous settings, all of which affect the trained model's performance under tight training latency constraints. In this work, we propose straggler-aware layer-wise federated learning (SALF), which leverages the optimization procedure of NNs via backpropagation to update the global model in a layer-wise fashion. SALF allows stragglers to synchronously convey partial gradients, having each layer of the global model be updated independently with a different contributing set of users. We provide a theoretical analysis, establishing convergence guarantees for the global model under mild assumptions on the distribution of the participating devices and revealing that SALF converges at the same asymptotic rate as FL with no timing limitations. This insight is matched with empirical observations, demonstrating the performance gains of SALF compared to alternative mechanisms for mitigating the device heterogeneity gap in FL.
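The layer-wise aggregation idea can be sketched as follows. This is an illustrative simplification rather than SALF's exact update rule: it assumes each user reports gradients only for the layers that backpropagation reached before the deadline (for a straggler, the layers nearest the output), and that the server applies a plain gradient step per layer using the average over whichever users contributed to that layer. Layers are flattened to Python lists; the function name and learning rate are hypothetical.

```python
from typing import Dict, List

Layer = List[float]


def salf_aggregate(global_layers: List[Layer],
                   user_grads: List[Dict[int, Layer]],
                   lr: float) -> List[Layer]:
    """Update each layer with the mean gradient of the users that reached it.

    global_layers[i] holds layer i's parameters (0 = input side).
    user_grads[u] maps layer index -> gradient; stragglers omit early layers.
    """
    updated: List[Layer] = []
    for idx, params in enumerate(global_layers):
        contributions = [grads[idx] for grads in user_grads if idx in grads]
        if not contributions:
            updated.append(list(params))  # no user reached this layer: keep it unchanged
            continue
        avg = [sum(vals) / len(contributions) for vals in zip(*contributions)]
        updated.append([p - lr * g for p, g in zip(params, avg)])
    return updated


# User 0 finished backpropagation; user 1 (a straggler) only reached the output layer.
grads = [{0: [1.0], 1: [1.0]}, {1: [3.0]}]
print(salf_aggregate([[1.0], [1.0]], grads, lr=0.5))
```

Because each layer is averaged over its own set of contributors, a straggler's partial work still improves the output-side layers instead of being discarded.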
- [219] arXiv:2403.18376 [pdf, other]
-
Title: Extensible Hook System for Rendezvous and Docking of a Cubesat Swarm
Subjects: Robotics (cs.RO)
The use of cubesat swarms is being proposed for different missions where cooperation between satellites is required. Commonly, the cubesat swarm requires formation flight and even rendezvous and docking, which are very challenging tasks since they require more energy and the use of advanced guidance, navigation and control techniques. In this paper, we propose the use of an extensible hook system to mitigate these drawbacks, i.e., it saves fuel and reduces system complexity by including techniques that have been previously demonstrated on Earth. This system is based on a scissor boom structure, which could reach up to five meters for a 4U dimension, including three degrees of freedom to place the end effector at any pose within the system workspace. We simulated the dynamic behaviour of a cubesat with the proposed system, demonstrating that the power required for a 16U cubesat equipped with one extensible hook system is acceptable according to current state-of-the-art actuators.
- [220] arXiv:2403.18378 [pdf, ps, other]
-
Title: Improvements to the theoretical estimates of the Schwarz preconditioner with $\Delta$-GenEO coarse space for the indefinite Helmholtz problem
Subjects: Numerical Analysis (math.NA)
The purpose of this work is to improve the estimates for the $\Delta$-GenEO method from the paper "Overlapping Schwarz methods with GenEO coarse spaces for indefinite and nonself-adjoint problems" by N. Bootland, V. Dolean, I. G. Graham, C. Ma, R. Scheichl (https://doi.org/10.1093/imanum/drac036) when applied to the indefinite Helmholtz equation. We derive $k$-dependent estimates of quantities of interest that ensure the robustness of the method.
- [221] arXiv:2403.18379 [pdf, other]
-
Title: IIP-Mixer: Intra-Inter Patch Mixing Architecture for Battery Remaining Useful Life Prediction
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Accurately estimating the Remaining Useful Life (RUL) of lithium-ion batteries is crucial for maintaining the safe and stable operation of rechargeable battery management systems. However, this task is often challenging due to the complex temporal dynamics involved. Recently, attention-based networks, such as Transformers and Informer, have become popular architectures in time series forecasting. Despite their effectiveness, these models with abundant parameters require substantial training time to unravel temporal patterns. To tackle these challenges, we propose a simple MLP-Mixer-based architecture named 'Intra-Inter Patch Mixer' (IIP-Mixer), which is an architecture based exclusively on multi-layer perceptrons (MLPs), extracting information by mixing operations along both intra-patch and inter-patch dimensions for battery RUL prediction. The proposed IIP-Mixer comprises parallel dual-head mixer layers: the intra-patch mixing MLP, capturing local temporal patterns in the short-term period, and the inter-patch mixing MLP, capturing global temporal patterns in the long-term period. Notably, to address the varying importance of features in RUL prediction, we introduce a weighted loss function in the MLP-Mixer-based architecture, marking the first time such an approach has been employed. Our experiments demonstrate that IIP-Mixer achieves competitive performance in battery RUL prediction, outperforming other popular time-series frameworks.
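The two mixing directions can be illustrated with a small NumPy sketch. This is not the IIP-Mixer architecture itself: the function names are hypothetical, the weights are random rather than trained, and combining the two parallel heads by summation with a residual connection is our assumption. It only shows which axis each MLP mixes over once a series is cut into patches, plus a generic weighted squared-error loss.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, w1, w2):
    # Two-layer MLP with ReLU, applied over the last axis of x.
    return np.maximum(x @ w1, 0.0) @ w2

def iip_mixer_layer(series, patch_len, hidden=16):
    """Hypothetical intra/inter patch mixing step (random weights, illustration only).

    series: 1-D array whose length is a multiple of patch_len.
    """
    patches = series.reshape(-1, patch_len)  # (num_patches, patch_len)
    n_patches = patches.shape[0]
    # Intra-patch MLP: mixes values inside each patch (local, short-term).
    w1a = rng.normal(size=(patch_len, hidden))
    w2a = rng.normal(size=(hidden, patch_len))
    intra = mlp(patches, w1a, w2a)
    # Inter-patch MLP: mixes the same in-patch position across patches
    # (global, long-term); transpose so the patch axis is last.
    w1b = rng.normal(size=(n_patches, hidden))
    w2b = rng.normal(size=(hidden, n_patches))
    inter = mlp(patches.T, w1b, w2b).T
    # Assumption: the parallel heads are summed with a residual connection.
    return patches + intra + inter

def weighted_mse(pred, target, weights):
    # Per-target weights make some outputs count more in the loss.
    return float(np.sum(weights * (pred - target) ** 2) / np.sum(weights))
```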
- [222] arXiv:2403.18381 [pdf, other]
-
Title: Improving Attributed Text Generation of Large Language Models via Preference Learning
Comments: 23 pages, 15 tables, 2 figures
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Large language models have been widely adopted in natural language processing, yet they face the challenge of generating unreliable content. Recent works aim to reduce misinformation and hallucinations by resorting to attribution as a means to provide evidence (i.e., citations). However, current attribution methods usually focus on the retrieval stage and automatic evaluation, neglecting to mirror the citation mechanisms used in human scholarly writing to bolster credibility. In this paper, we address these challenges by modelling the attribution task as preference learning and introducing an Automatic Preference Optimization (APO) framework. First, we create a curated collection for post-training with 6,330 examples by collecting and filtering from existing datasets. Second, considering the high cost of labelling preference data, we further propose an automatic method to synthesize attribution preference data, resulting in 95,263 pairs. Moreover, inspired by the human citation process, we further propose a progressive preference optimization method by leveraging fine-grained information. Extensive experiments on three datasets (i.e., ASQA, StrategyQA, and ELI5) demonstrate that APO achieves state-of-the-art citation F1 with higher answer quality.
- [223] arXiv:2403.18383 [pdf, other]
-
Title: Generative Multi-modal Models are Good Class-Incremental Learners
Comments: Accepted at CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
In class-incremental learning (CIL) scenarios, the phenomenon of catastrophic forgetting caused by the classifier's bias towards the current task has long posed a significant challenge. It stems mainly from the characteristics of discriminative models. With the growing popularity of generative multi-modal models, we explore replacing discriminative models with generative ones for CIL. However, transitioning from discriminative to generative models requires addressing two key challenges. The primary challenge lies in transferring the generated textual information into the classification of distinct categories. Additionally, it requires formulating the task of CIL within a generative framework. To this end, we propose a novel generative multi-modal model (GMM) framework for class-incremental learning. Our approach directly generates labels for images using an adapted generative model. After obtaining the detailed text, we use a text encoder to extract text features and employ feature matching to determine the most similar label as the classification prediction. In the conventional CIL settings, we achieve significantly better results in long-sequence task scenarios. Under the Few-shot CIL setting, we improve accuracy by at least 14% over all current state-of-the-art methods, with significantly less forgetting. Our code is available at https://github.com/DoubleClass/GMM.
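The feature-matching step, which maps generated text to a class, can be sketched as a nearest-label search. This is a minimal sketch assuming cosine similarity between text-encoder features; the helper `match_label` and the toy two-dimensional embeddings are hypothetical, and the paper's actual encoder and similarity measure may differ.

```python
import numpy as np

def match_label(generated_feat, label_feats, label_names):
    """Return the label whose text embedding is most cosine-similar
    to the embedding of the generated description (hypothetical helper)."""
    g = generated_feat / np.linalg.norm(generated_feat)
    L = label_feats / np.linalg.norm(label_feats, axis=1, keepdims=True)
    scores = L @ g  # cosine similarity with every candidate label
    return label_names[int(np.argmax(scores))]
```

With label embeddings `[[1, 0], [0, 1]]` for `["cat", "dog"]`, a generated-text embedding of `[0.2, 0.9]` is matched to `"dog"`. New classes only add rows to `label_feats`, which is one reason a text-matching classifier has no fixed output layer biased towards the current task.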
- [224] arXiv:2403.18388 [pdf, other]
-
Title: FTBC: Forward Temporal Bias Correction for Optimizing ANN-SNN Conversion
Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Spiking Neural Networks (SNNs) offer a promising avenue for energy-efficient computing compared with Artificial Neural Networks (ANNs), closely mirroring biological neural processes. However, this potential comes with inherent challenges in directly training SNNs through spatio-temporal backpropagation (stemming from the temporal dynamics of spiking neurons and their discrete signal processing), which necessitates alternative ways of training, most notably ANN-SNN conversion. In this work, we introduce a lightweight Forward Temporal Bias Correction (FTBC) technique, aimed at enhancing conversion accuracy without incurring computational overhead. We ground our method in theoretical findings showing that, through proper temporal bias calibration, the expected error of ANN-SNN conversion can be reduced to zero after each time step. We further propose a heuristic algorithm for finding the temporal bias only in the forward pass, thus eliminating the computational burden of backpropagation. We evaluate our method on the CIFAR-10/100 and ImageNet datasets, achieving a notable increase in accuracy on all datasets. Code is released in a GitHub repository.
- [225] arXiv:2403.18393 [pdf, other]
-
Title: Tensor-based Graph Learning with Consistency and Specificity for Multi-view Clustering
Subjects: Machine Learning (cs.LG)
Graph learning is widely recognized as a crucial technique in multi-view clustering. Existing graph learning methods typically construct an adaptive neighbor graph based on probabilistic neighbors and then learn a consensus graph for clustering; however, they face two limitations. First, they often rely on Euclidean distance to measure similarity when constructing the adaptive neighbor graph, which proves inadequate for capturing the intrinsic structure among data points in many real-world scenarios. Second, most of these methods focus solely on the consensus graph, ignoring view-specific graph information. To address these drawbacks, in this paper we propose a novel tensor-based graph learning framework that simultaneously considers consistency and specificity for multi-view clustering. Specifically, we calculate the similarity distance on the Stiefel manifold to preserve the intrinsic structure among data points. By assuming that the learned neighbor graph of each view comprises both a consistent graph and a view-specific graph, we formulate a new tensor-based target graph learning paradigm. Owing to the benefits of tensor singular value decomposition (t-SVD) in uncovering high-order correlations, this model is capable of achieving a complete understanding of the target graph. Furthermore, we develop an iterative algorithm to solve the proposed objective optimization problem. Experiments conducted on real-world datasets demonstrate the superior performance of the proposed method over some state-of-the-art multi-view clustering methods. The source code has been released at https://github.com/lshi91/CSTGL-Code.
- [226] arXiv:2403.18397 [pdf, ps, other]
-
Title: Colour and Brush Stroke Pattern Recognition in Abstract Art using Modified Deep Convolutional Generative Adversarial Networks
Comments: 28 pages, 5 tables, 7 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Abstract Art is an immensely popular and widely discussed form of art that often has the ability to depict the emotions of an artist. Many researchers have attempted to study abstract art through edge detection, brush stroke and emotion recognition algorithms using machine and deep learning. This paper describes the study of a wide distribution of abstract paintings using Generative Adversarial Networks (GANs). GANs have the ability to learn and reproduce a distribution, enabling researchers and scientists to effectively explore and study the generated image space. However, the challenge lies in developing an efficient GAN architecture that overcomes common training pitfalls. This paper addresses this challenge by introducing a modified-DCGAN (mDCGAN) specifically designed for high-quality artwork generation. The approach involves a thorough exploration of the modifications made, delving into the intricate workings of DCGANs, optimisation techniques, and regularisation methods aimed at improving stability and realism in art generation, enabling effective study of the generated patterns. The proposed mDCGAN incorporates meticulous adjustments in layer configurations and architectural choices, offering tailored solutions to the unique demands of art generation while effectively combating issues like mode collapse and vanishing gradients. Further, this paper explores the generated latent space by performing random walks to understand vector relationships between brush strokes and colours in the abstract art space, and performs a statistical analysis of the unstable outputs produced after a certain period of GAN training, testing whether their differences are significant. These findings validate the effectiveness of the proposed approach, emphasising its potential to revolutionise the field of digital art generation and the digital art ecosystem.
- [227] arXiv:2403.18398 [pdf, ps, other]
-
Title: Adaptive Economic Model Predictive Control for linear systems with performance guarantees
Comments: 8 pages, 3 figures, submitted to IEEE CDC 2024
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
We present a model predictive control (MPC) formulation to directly optimize economic criteria for linear constrained systems subject to disturbances and uncertain model parameters. The proposed formulation combines a certainty equivalent economic MPC with a simple least-squares parameter adaptation. For the resulting adaptive economic MPC scheme, we derive strong asymptotic and transient performance guarantees. We provide a numerical example involving building temperature control and demonstrate performance benefits of online parameter adaptation.
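A least-squares parameter adaptation of the kind combined here with certainty-equivalent MPC can be illustrated with textbook recursive least squares (RLS). This sketch is an assumption, not the paper's update law, which may add projection or other safeguards. It estimates `theta` in a model that is linear in the parameters, `y_k = phi_k^T theta`. In a certainty-equivalent scheme, the MPC would use the current `theta` at every step as if it were the true parameter. The class name and `p0` initialisation are hypothetical.

```python
import numpy as np

class RecursiveLeastSquares:
    """Textbook RLS estimate of theta in y_k = phi_k^T theta + noise."""

    def __init__(self, n_params, p0=1e3):
        self.theta = np.zeros(n_params)
        # Covariance-like matrix; a large initial value means "very uncertain".
        self.P = p0 * np.eye(n_params)

    def update(self, phi, y):
        phi = np.asarray(phi, dtype=float)
        P_phi = self.P @ phi
        gain = P_phi / (1.0 + phi @ P_phi)
        # Correct the estimate in proportion to the prediction error.
        self.theta = self.theta + gain * (y - phi @ self.theta)
        # Rank-one covariance update (P is symmetric, so phi^T P = P_phi^T).
        self.P = self.P - np.outer(gain, P_phi)
        return self.theta
```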
- [228] arXiv:2403.18402 [pdf, other]
-
Title: On Spectrogram Analysis in a Multiple Classifier Fusion Framework for Power Grid Classification Using Electric Network Frequency
Comments: 13th International Conference on Pattern Recognition Applications and Methods (ICPRAM)
Subjects: Machine Learning (cs.LG)
The Electric Network Frequency (ENF) serves as a unique signature inherent to power distribution systems. Here, a novel approach for power grid classification is developed, leveraging ENF. Spectrograms are generated from audio and power recordings across different grids, revealing distinctive ENF patterns that aid in grid classification through a fusion of classifiers. Four traditional machine learning classifiers plus a Convolutional Neural Network (CNN), optimized using Neural Architecture Search, are developed for One-vs-All classification. This process generates numerous predictions per sample, which are then compiled and used to train a shallow multi-label neural network specifically designed to model the fusion process, ultimately leading to the conclusive class prediction for each sample. Experimental findings reveal that both validation and testing accuracy outperform those of current state-of-the-art classifiers, underlining the effectiveness and robustness of the proposed methodology.
- [229] arXiv:2403.18403 [pdf, other]
-
Title: FoC: Figure out the Cryptographic Functions in Stripped Binaries with LLMs
Subjects: Cryptography and Security (cs.CR)
Analyzing the behavior of cryptographic functions in stripped binaries is a challenging but essential task. Cryptographic algorithms exhibit greater logical complexity than typical code, yet their analysis is unavoidable in areas such as virus analysis and legacy code inspection. Existing methods often rely on data or structural pattern matching, which leads to suboptimal generalizability and requires substantial manual work. In this paper, we propose a novel framework called FoC to Figure out the Cryptographic functions in stripped binaries. In FoC, we first build a binary large language model (FoC-BinLLM) to summarize the semantics of cryptographic functions in natural language. The predictions of FoC-BinLLM are insensitive to minor changes, such as vulnerability patches. To mitigate this, we further build a binary code similarity model (FoC-Sim) upon FoC-BinLLM to create change-sensitive representations, and use it to retrieve similar implementations of unknown cryptographic functions from a database. In addition, we construct a cryptographic binary dataset for evaluation and to facilitate further research in this domain, and we devise an automated method to create semantic labels for a large number of binary functions. Evaluation results demonstrate that FoC-BinLLM outperforms ChatGPT by 14.61% on the ROUGE-L score. FoC-Sim outperforms the previous best methods with a 52% higher Recall@1. Furthermore, our method also shows practical ability in virus analysis and 1-day vulnerability detection.
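Recall@1, the retrieval metric reported for FoC-Sim, is the fraction of queries whose correct match is the top-ranked database entry. This is a minimal sketch of the generic Recall@k computation, assuming cosine similarity between embeddings. The function name is hypothetical, and the paper's exact ranking setup is not reproduced here.

```python
import numpy as np

def recall_at_k(query_emb, db_emb, true_idx, k=1):
    """Fraction of queries whose correct database entry is among the
    k most cosine-similar database embeddings (hypothetical helper)."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    d = db_emb / np.linalg.norm(db_emb, axis=1, keepdims=True)
    sims = q @ d.T  # (num_queries, db_size) similarity matrix
    topk = np.argsort(-sims, axis=1)[:, :k]  # indices of the k best matches
    hits = [true_idx[i] in topk[i] for i in range(len(true_idx))]
    return float(np.mean(hits))
```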
- [230] arXiv:2403.18404 [pdf, ps, other]
-
Title: Convexity of near-optimal orthogonal-pair-free sets on the unit sphere
Authors: Apurva Mudgal
Subjects: Computational Geometry (cs.CG)
A subset $S$ of the unit sphere $\mathbb{S}^2$ is called orthogonal-pair-free if and only if there do not exist two distinct points $u, v \in S$ at distance $\frac{\pi}{2}$ from each other. Witsenhausen \cite{witsenhausen} asked the following question: {\it What is the least upper bound $\alpha_3$ on the Lebesgue measure of any measurable orthogonal-pair-free subset of $\mathbb{S}^2$?} We prove the following result in this paper: Let $\mathcal{A}$ be the collection of all orthogonal-pair-free sets $S$ such that $S$ consists of a finite number of mutually disjoint convex sets. Then, $\alpha_3 = \limsup_{S \in \mathcal{A}} \mu(S)$. Thus, if the double cap conjecture \cite{kalai1} is not true, there is a set in $\mathcal{A}$ with measure strictly greater than the measure of the double cap.
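The definition is easy to check for a finite point set. On $\mathbb{S}^2$, geodesic distance $\frac{\pi}{2}$ between unit vectors is equivalent to a zero dot product. The sketch below also computes the area of the double cap, taken here as two antipodal caps of geodesic radius $\frac{\pi}{4}$, using the cap area $2\pi(1-\cos r)$. Both helpers are illustrative, and the floating-point tolerance is our assumption.

```python
import itertools
import math

def is_orthogonal_pair_free(points, tol=1e-9):
    """points: unit vectors in R^3. Geodesic distance pi/2 <=> dot product 0."""
    for u, v in itertools.combinations(points, 2):
        dot = sum(a * b for a, b in zip(u, v))
        if abs(dot) < tol:
            return False
    return True

def double_cap_area(radius=math.pi / 4):
    """Area of two antipodal spherical caps of geodesic radius `radius` on S^2."""
    return 2 * (2 * math.pi * (1 - math.cos(radius)))
```

Dividing `double_cap_area()` by the total area $4\pi$ gives $1 - \frac{\sqrt{2}}{2} \approx 0.293$ of the sphere.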
- [231] arXiv:2403.18405 [pdf, other]
-
Title: Leveraging Large Language Models for Relevance Judgments in Legal Case Retrieval
Subjects: Artificial Intelligence (cs.AI)
Collecting relevance judgments for legal case retrieval is a challenging and time-consuming task. Accurately judging the relevance between two legal cases requires considerable effort to read the lengthy text and a high level of domain expertise to extract legal facts and make juridical judgments. With the advent of advanced large language models, some recent studies have suggested that it is promising to use LLMs for relevance judgment. Nonetheless, how to employ a general large language model for reliable relevance judgments in legal case retrieval has yet to be thoroughly explored. To fill this research gap, we devise a novel few-shot workflow tailored to the relevance judgment of legal cases. The proposed workflow breaks down the annotation process into a series of stages, imitating the process employed by human annotators and enabling a flexible integration of expert reasoning to enhance the accuracy of relevance judgments. By comparing the relevance judgments of LLMs and human experts, we empirically show that we can obtain reliable relevance judgments with the proposed workflow. Furthermore, we demonstrate the capacity to augment existing legal case retrieval models with synthetic data generated by the large language model.
- [232] arXiv:2403.18406 [pdf, other]
-
Title: An Image Grid Can Be Worth a Video: Zero-shot Video Question Answering Using a VLM
Comments: Our code is available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Stimulated by the sophisticated reasoning capabilities of recent Large Language Models (LLMs), a variety of strategies for bridging the video modality have been devised. A prominent strategy involves Video Language Models (VideoLMs), which train a learnable interface with video data to connect advanced vision encoders with LLMs. Recently, an alternative strategy has surfaced, employing readily available foundation models, such as VideoLMs and LLMs, across multiple stages for modality bridging. In this study, we introduce a simple yet novel strategy where only a single Vision Language Model (VLM) is utilized. Our starting point is the plain insight that a video comprises a series of images, or frames, interwoven with temporal information. The essence of video comprehension lies in adeptly managing the temporal aspects along with the spatial details of each frame. Initially, we transform a video into a single composite image by arranging multiple frames in a grid layout. The resulting single image is termed an image grid. This format, while maintaining the appearance of a solitary image, effectively retains temporal information within the grid structure. Therefore, the image grid approach enables direct application of a single high-performance VLM without necessitating any video-data training. Our extensive experimental analysis across ten zero-shot video question answering benchmarks, including five open-ended and five multiple-choice benchmarks, reveals that the proposed Image Grid Vision Language Model (IG-VLM) surpasses the existing methods in nine out of ten benchmarks.
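Turning a video into an image grid is a sampling step followed by a tiling step. This NumPy sketch assumes that frames are sampled uniformly and tiled in row-major order (left to right, then top to bottom). The grid shape, the sampling rule, and both helper names are assumptions, not IG-VLM's published settings.

```python
import numpy as np

def sample_frame_indices(num_frames, k):
    # k indices spread uniformly over the video, first and last frame included.
    return np.linspace(0, num_frames - 1, k).round().astype(int)

def make_image_grid(frames, rows, cols):
    """Tile rows*cols frames of shape (H, W, C) into one (rows*H, cols*W, C)
    image, keeping temporal order in row-major layout."""
    frames = np.asarray(frames)
    n, h, w, c = frames.shape
    if n != rows * cols:
        raise ValueError("need exactly rows * cols frames")
    grid = frames.reshape(rows, cols, h, w, c)  # (rows, cols, H, W, C)
    grid = grid.transpose(0, 2, 1, 3, 4)        # (rows, H, cols, W, C)
    return grid.reshape(rows * h, cols * w, c)
```

For example, `video[sample_frame_indices(len(video), 6)]` followed by `make_image_grid(..., rows=2, cols=3)` would produce one still image that a VLM can take as ordinary input.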
- [233] arXiv:2403.18407 [pdf, other]
-
Title: A Channel-ensemble Approach: Unbiased and Low-variance Pseudo-labels is Critical for Semi-supervised Classification
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Semi-supervised learning (SSL) is a practical challenge in computer vision. Pseudo-label (PL) methods, e.g., FixMatch and FreeMatch, achieve state-of-the-art (SOTA) performance in SSL. These approaches employ a threshold-to-pseudo-label (T2L) process to generate PLs by truncating the confidence scores of unlabeled data predicted by the self-training method. However, self-trained models typically yield biased and high-variance predictions, especially when only a small amount of labeled data is supplied. To address this issue, we propose a lightweight channel-based ensemble method to effectively consolidate multiple inferior PLs into a single PL that is theoretically guaranteed to be unbiased and low-variance. Importantly, our approach can be readily extended to any SSL framework, such as FixMatch or FreeMatch. Experimental results demonstrate that our method significantly outperforms state-of-the-art techniques on CIFAR10/100 in terms of effectiveness and efficiency.
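The threshold-to-pseudo-label step can be illustrated together with a simple ensemble. This sketch averages the softmax outputs of several prediction channels and then applies a FixMatch-style fixed confidence threshold. Plain probability averaging and the 0.95 threshold are our assumptions; the paper's channel-ensemble estimator and its unbiasedness guarantee are not reproduced here.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ensemble_pseudo_labels(channel_logits, threshold=0.95):
    """channel_logits: (num_channels, batch, num_classes) logits.

    Averages per-channel probabilities, then keeps only confident
    predictions (the T2L step). Returns (labels, mask); mask marks
    the unlabeled samples whose pseudo-label is accepted."""
    probs = softmax(np.asarray(channel_logits, dtype=float)).mean(axis=0)
    conf = probs.max(axis=-1)
    labels = probs.argmax(axis=-1)
    return labels, conf >= threshold
```

Averaging lets channels that disagree cancel out: a sample on which two channels lean in opposite directions ends up near 0.5 confidence and is rejected, rather than receiving a confident but biased label from one channel.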
- [234] arXiv:2403.18413 [pdf, ps, other]
-
Title: HyRRT-Connect: A Bidirectional Rapidly-Exploring Random Trees Motion Planning Algorithm for Hybrid Systems
Comments: Accepted by the 8th IFAC International Conference on Analysis and Design of Hybrid Systems (ADHS 2024)
Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
This paper proposes a bidirectional rapidly-exploring random trees (RRT) algorithm to solve the motion planning problem for hybrid systems. The proposed algorithm, called HyRRT-Connect, propagates in both forward and backward directions in hybrid time until an overlap between the forward and backward propagation results is detected. Then, HyRRT-Connect constructs a motion plan through the reversal and concatenation of functions defined on hybrid time domains, ensuring the motion plan thoroughly satisfies the given hybrid dynamics. To address the potential discontinuity along the flow caused by tolerating some distance between the forward and backward partial motion plans, we reconstruct the backward partial motion plan by a forward-in-hybrid-time simulation from the final state of the forward partial motion plan. By applying the reversed input of the backward partial motion plan, the reconstruction process effectively eliminates the discontinuity and ensures that as the tolerance distance decreases to zero, the distance between the endpoint of the reconstructed motion plan and the final state set approaches zero. The proposed algorithm is applied to an actuated bouncing ball example and a walking robot example so as to highlight its generality and computational improvement.
- [235] arXiv:2403.18415 [pdf, ps, other]
-
Title: The Topos of Transformer Networks
Subjects: Machine Learning (cs.LG); Category Theory (math.CT)
The transformer neural network has significantly outshone all other neural network architectures as the engine behind large language models. We provide a theoretical analysis of the expressivity of the transformer architecture through the lens of topos theory. From this viewpoint, we show that many common neural network architectures, such as the convolutional, recurrent and graph convolutional networks, can be embedded in a pretopos of piecewise-linear functions, but that the transformer necessarily lives in its topos completion. In particular, this suggests that the two network families instantiate different fragments of logic: the former are first order, whereas transformers are higher-order reasoners. Furthermore, we draw parallels with architecture search and gradient descent, integrating our analysis in the framework of cybernetic agents.
- [236] arXiv:2403.18416 [pdf, other]
-
Title: A Delaunay Refinement Algorithm for the Particle Finite Element Method applied to Free Surface Flows
Subjects: Computational Engineering, Finance, and Science (cs.CE)
This paper proposes two contributions to the calculation of free surface flows using the particle finite element method (PFEM). The PFEM is based on a Lagrangian approach: a set of particles defines the fluid. Then, unlike a pure Lagrangian method, all the particles are connected by a triangular mesh. The difficulty lies in locating the free surface from this mesh. It is a matter of deciding which of the elements in the mesh are part of the fluid domain, and of defining a boundary: the free surface. Then, the incompressible Navier-Stokes equations are solved on the fluid domain and the particles' positions are updated using the resulting velocity vector. Our first contribution is to propose an approach to adapt the mesh with theoretical guarantees of quality: the mesh generation community has acquired a lot of experience and understanding about mesh adaptation approaches with guarantees of quality on the final mesh. We use here a Delaunay refinement strategy, which allows nodes to be inserted and removed while gradually improving mesh quality. We show that this allows us to create stable and smooth free surface geometries. Our PFEM approach models the topological evolution of one fluid. It is nevertheless necessary to apply conditions on the domain boundaries. When a boundary is a free surface, the flow on the other side is not modelled; it is represented by an external pressure. On the external free surface boundary, atmospheric pressure can be imposed. Nevertheless, there may be internal free surfaces: the fluid can fully encapsulate cavities to form bubbles. The pressure required to maintain the volume of those bubbles is a priori unknown. We propose a multi-point constraint approach to enforce global incompressibility of those empty bubbles. This approach allows us to accurately model bubbly flows that involve two fluids with large density differences, while only modelling the heavier fluid.
- [237] arXiv:2403.18417 [pdf, other]
-
Title: ECNet: Effective Controllable Text-to-Image Diffusion Models
Authors: Sicheng Li, Keqiang Sun, Zhixin Lai, Xiaoshi Wu, Feng Qiu, Haoran Xie, Kazunori Miyata, Hongsheng Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Conditional text-to-image diffusion models have garnered significant attention in recent years. However, the precision of these models is often compromised for two main reasons: ambiguous condition inputs and inadequate condition guidance from a single denoising loss. To address these challenges, we introduce two innovative solutions. First, we propose a Spatial Guidance Injector (SGI), which enhances conditional detail by encoding text inputs with precise annotation information. This method directly tackles the issue of ambiguous control inputs by providing clear, annotated guidance to the model. Second, to overcome the issue of limited conditional supervision, we introduce Diffusion Consistency Loss (DCL), which applies supervision on the denoised latent code at any given time step. This encourages consistency between the latent code at each time step and the input signal, thereby enhancing the robustness and accuracy of the output. The combination of SGI and DCL results in our Effective Controllable Network (ECNet), which offers a more accurate controllable end-to-end text-to-image generation framework with a more precise conditioning input and stronger controllable supervision. We validate our approach through extensive experiments on generation under various conditions, such as human body skeletons, facial landmarks, and sketches of general objects. The results consistently demonstrate that our method significantly enhances the controllability and robustness of the generated images, outperforming existing state-of-the-art controllable text-to-image models.
- [238] arXiv:2403.18421 [pdf, other]
-
Title: BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text
Authors: Elliot Bolton, Abhinav Venigalla, Michihiro Yasunaga, David Hall, Betty Xiong, Tony Lee, Roxana Daneshjou, Jonathan Frankle, Percy Liang, Michael Carbin, Christopher D. Manning
Comments: 23 pages
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Models such as GPT-4 and Med-PaLM 2 have demonstrated impressive performance on a wide variety of biomedical NLP tasks. However, these models have hundreds of billions of parameters, are computationally expensive to run, require users to send their input data over the internet, and are trained on unknown data sources. Can smaller, more targeted models compete? To address this question, we build and release BioMedLM, a 2.7 billion parameter GPT-style autoregressive model trained exclusively on PubMed abstracts and full articles. When fine-tuned, BioMedLM can produce strong multiple-choice biomedical question-answering results competitive with much larger models, such as achieving a score of 57.3% on MedMCQA (dev) and 69.0% on the MMLU Medical Genetics exam. BioMedLM can also be fine-tuned to produce useful answers to patient questions on medical topics. This demonstrates that smaller models can potentially serve as transparent, privacy-preserving, economical and environmentally friendly foundations for particular NLP applications, such as in biomedicine. The model is available on the Hugging Face Hub: https://huggingface.co/stanford-crfm/BioMedLM.
- [239] arXiv:2403.18422 [pdf, ps, other]
-
Title: Feedback Linearizable Discretizations of Second Order Mechanical Systems using Retraction Maps
Subjects: Systems and Control (eess.SY)
Mechanical systems, in nature, are often described by a set of continuous-time, nonlinear, second-order differential equations (SODEs). This has motivated the design of various control laws implemented on digital controllers, which in turn require numerical discretization schemes. Feedback linearizability of such sampled systems depends on the choice of discretization scheme or map. In this article, we utilize retraction maps and their lifts to construct feedback linearizable discretizations for SODEs, which can be applied to various mechanical systems.
- [240] arXiv:2403.18423 [pdf, other]
-
Title: SemRoDe: Macro Adversarial Training to Learn Representations That are Robust to Word-Level Attacks
Comments: Published in NAACL 2024 (Main Track)
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Language models (LMs) are indispensable tools for natural language processing tasks, but their vulnerability to adversarial attacks remains a concern. While current research has explored adversarial training techniques, their improvements to defend against word-level attacks have been limited. In this work, we propose a novel approach called Semantic Robust Defence (SemRoDe), a Macro Adversarial Training strategy to enhance the robustness of LMs. Drawing inspiration from recent studies in the image domain, we investigate and later confirm that in a discrete data setting such as language, adversarial samples generated via word substitutions do indeed belong to an adversarial domain exhibiting a high Wasserstein distance from the base domain. Our method learns a robust representation that bridges these two domains. We hypothesize that if samples were not projected into an adversarial domain, but instead to a domain with minimal shift, it would improve attack robustness. We align the domains by incorporating a new distance-based objective. With this, our model is able to learn more generalized representations by aligning the model's high-level output features and therefore better handling unseen adversarial samples. This method can be generalized across word embeddings, even when they share minimal overlap at both vocabulary and word-substitution levels. To evaluate the effectiveness of our approach, we conduct experiments on BERT and RoBERTa models on three datasets. The results demonstrate promising state-of-the-art robustness.
- [241] arXiv:2403.18425 [pdf, other]
Title: U-Sketch: An Efficient Approach for Sketch to Image Diffusion Models
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Diffusion models have demonstrated remarkable performance in text-to-image synthesis, producing realistic, high-resolution images that faithfully adhere to the corresponding text prompts. Despite their great success, they still lag behind in sketch-to-image synthesis tasks, where, in addition to text prompts, the spatial layout of the generated images has to closely follow the outlines of the reference sketches. Recent work has proposed employing an MLP latent edge predictor to guide the spatial layout of the synthesized image by predicting edge maps at each denoising step. Despite yielding promising results, the pixel-wise operation of the MLP does not take the spatial layout into account as a whole, and it demands numerous denoising iterations to produce satisfactory images, leading to time inefficiency. To this end, we introduce U-Sketch, a framework featuring a U-Net type latent edge predictor, which is capable of efficiently capturing both local and global features, as well as spatial correlations between pixels. Moreover, we propose adding a sketch simplification network that gives the user the option of preprocessing and simplifying input sketches for enhanced outputs. The experimental results, corroborated by user feedback, demonstrate that our proposed U-Net latent edge predictor leads to more realistic results that are better aligned with the spatial outlines of the reference sketches, while drastically reducing the number of required denoising steps and, consequently, the overall execution time.
- [242] arXiv:2403.18426 [pdf, other]
Title: TriviaHG: A Dataset for Automatic Hint Generation from Factoid Questions
Comments: Accepted at SIGIR 2024
Subjects: Computation and Language (cs.CL)
Nowadays, individuals tend to engage in dialogues with Large Language Models, seeking answers to their questions. At a time when such answers are readily accessible to anyone, stimulating and preserving humans' cognitive abilities, and ensuring that humans maintain good reasoning skills, become crucial. This study addresses these needs by proposing hints (instead of final answers, or before giving them) as a viable solution. We introduce a framework for automatic hint generation for factoid questions and use it to construct TriviaHG, a novel large-scale dataset featuring 160,230 hints corresponding to 16,645 questions from the TriviaQA dataset. Additionally, we present an automatic evaluation method that measures the Convergence and Familiarity quality attributes of hints. To evaluate the TriviaHG dataset and the proposed evaluation method, we enlisted 10 individuals to annotate 2,791 hints and tasked 6 participants with answering questions using the provided hints. The effectiveness of hints varied, with success rates of 96%, 78%, and 36% for questions with easy, medium, and hard answers, respectively. Moreover, the proposed automatic evaluation methods showed a robust correlation with the annotators' results. Overall, the findings highlight three key insights: the facilitative role of hints in resolving unknown questions, the dependence of hint quality on answer difficulty, and the feasibility of employing automatic evaluation methods for hint assessment.
- [243] arXiv:2403.18430 [pdf, other]
Title: Exploring language relations through syntactic distances and geographic proximity
Comments: 36 pages
Subjects: Computation and Language (cs.CL); Data Analysis, Statistics and Probability (physics.data-an); Physics and Society (physics.soc-ph); Applications (stat.AP)
Languages are grouped into families that share common linguistic traits. While this approach has been successful in understanding genetic relations between diverse languages, more analyses are needed to accurately quantify their relatedness, especially at less studied linguistic levels such as syntax. Here, we explore linguistic distances using sequences of parts of speech (POS) extracted from the Universal Dependencies dataset. Within an information-theoretic framework, we show that employing POS trigrams maximizes the possibility of capturing syntactic variations while remaining compatible with the amount of available data. Linguistic connections are then established by assessing pairwise distances based on the POS distributions. Intriguingly, our analysis reveals definite clusters that correspond to well-known language families and groups, with exceptions explained by distinct morphological typologies. Furthermore, we obtain a significant correlation between language similarity and geographic distance, which underscores the influence of spatial proximity on language kinship.
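The abstract does not name the distance applied to the POS distributions. As an illustration only, the sketch below builds relative trigram frequencies from a tag sequence and compares two languages with the Jensen-Shannon divergence, a common information-theoretic choice that is assumed here. The tag sequences are hypothetical Universal Dependencies UPOS tags; a real analysis would pool trigrams over an entire treebank.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Trigram = Tuple[str, str, str]


def trigram_distribution(pos_tags: List[str]) -> Dict[Trigram, float]:
    """Relative frequency of each consecutive POS-tag trigram in a tag sequence."""
    counts = Counter(zip(pos_tags, pos_tags[1:], pos_tags[2:]))
    total = sum(counts.values())
    return {tri: c / total for tri, c in counts.items()}


def jensen_shannon(p: Dict[Trigram, float], q: Dict[Trigram, float]) -> float:
    """Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint support)."""
    keys = set(p) | set(q)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_m(dist: Dict[Trigram, float]) -> float:
        # Zero-probability trigrams contribute nothing to the KL divergence.
        return sum(v * math.log2(v / m[k]) for k, v in dist.items() if v > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


# Hypothetical UPOS tag sequences for two languages.
lang_a = ["DET", "NOUN", "VERB", "DET", "NOUN"]
lang_b = ["NOUN", "DET", "VERB", "NOUN", "DET"]
print(jensen_shannon(trigram_distribution(lang_a), trigram_distribution(lang_b)))
```

Computing this value for every language pair yields a distance matrix that standard hierarchical clustering can turn into the kind of family clusters described above.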
- [244] arXiv:2403.18433 [pdf, other]
Title: iFace: Hand-Over-Face Gesture Recognition Leveraging Impedance Sensing
Comments: Accepted by Augmented Humans 2024
Subjects: Human-Computer Interaction (cs.HC)
Hand-over-face gestures can convey important implicit information during conversations, such as frustration or excitement. However, in situations where interlocutors are not visible, such as phone calls or textual communication, the potential meaning contained in hand-over-face gestures is lost. In this work, we present iFace, an unobtrusive, wearable impedance-sensing solution for recognizing different hand-over-face gestures. In contrast to most existing works, iFace does not require placing sensors on the user's face or hands. Instead, we propose a novel sensing configuration on the shoulders, which remains invisible to both the user and outside observers. The system monitors the shoulder-to-shoulder impedance variation caused by gestures through electrodes attached to each shoulder. We evaluated iFace in a user study with eight participants, collecting six kinds of hand-over-face gestures with different meanings. Using a convolutional neural network and user-dependent classification, iFace reaches a macro F1 score of 82.58%. We discuss potential application scenarios of iFace as an implicit interaction interface.
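The reported 82.58% is a macro F1 score: F1 = 2TP / (2TP + FP + FN) is computed separately for each gesture class and the per-class scores are averaged with equal weight, so rarely performed gestures count as much as frequent ones. A minimal sketch with hypothetical gesture labels:

```python
from typing import Sequence


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 scores over all labels seen."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # A class never predicted and never present would have denom == 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical gesture labels for a handful of test windows.
truth = ["chin_rest", "chin_rest", "face_cover", "face_cover"]
preds = ["chin_rest", "face_cover", "face_cover", "face_cover"]
print(macro_f1(truth, preds))
```

Libraries such as scikit-learn expose the same metric via `f1_score(..., average="macro")`.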
- [245] arXiv:2403.18435 [pdf, other]
Title: DELTA: Pre-train a Discriminative Encoder for Legal Case Retrieval via Structural Word Alignment
Comments: 11 pages
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)
Recent research demonstrates the effectiveness of using pre-trained language models for legal case retrieval. Most existing works focus on improving the representation ability of the contextualized embedding of the [CLS] token and calculate relevance using textual semantic similarity. However, in the legal domain, textual semantic similarity does not necessarily imply that cases are relevant. Instead, relevance in legal cases primarily depends on the similarity of key facts that impact the final judgment. Without proper treatment, the discriminative ability of learned representations may be limited, since legal cases are lengthy and contain numerous non-key facts. To this end, we introduce DELTA, a discriminative model designed for legal case retrieval. The basic idea is to pinpoint key facts in legal cases and pull the contextualized embedding of the [CLS] token closer to the key facts while pushing it away from the non-key facts, which can warm up the case embedding space in an unsupervised manner. Specifically, this study brings the word alignment mechanism to the contextual masked auto-encoder. First, we leverage shallow decoders to create information bottlenecks, aiming to enhance the representation ability. Second, we employ the deep decoder to enable translation between different structures, with the goal of pinpointing key facts to enhance discriminative ability. Comprehensive experiments on publicly available legal benchmarks show that our approach can outperform existing state-of-the-art methods in legal case retrieval. It provides a new perspective on the in-depth understanding and processing of legal case documents.
- [246] arXiv:2403.18436 [pdf, other]
Title: Collaborative Active Learning in Conditional Trust Environment
Comments: 5 pages, 9 figures, conference
Subjects: Machine Learning (cs.LG)
In this paper, we investigate collaborative active learning, a paradigm in which multiple collaborators explore a new domain by leveraging their combined machine learning capabilities without disclosing their existing data and models. Instead, the collaborators share prediction results from the new domain and newly acquired labels. This collaboration offers several advantages: (a) it addresses privacy and security concerns by eliminating the need for direct model and data disclosure; (b) it enables the use of different data sources and insights without direct data exchange; and (c) it promotes cost-effectiveness and resource efficiency through shared labeling costs. To realize these benefits, we introduce a collaborative active learning framework designed to fulfill the aforementioned objectives. We validate the effectiveness of the proposed framework through simulations. The results demonstrate that collaboration leads to higher AUC scores compared to independent efforts, highlighting the framework's ability to overcome the limitations of individual models. These findings support the use of collaborative approaches in active learning, emphasizing their potential to enhance outcomes through collective expertise and shared resources. Our work provides a foundation for further research on collaborative active learning and its practical applications in various domains where data privacy, cost efficiency, and model performance are critical considerations.
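One way the prediction-sharing step could drive label acquisition is sketched below, under assumptions the abstract does not state: a binary task, each collaborator shares only its predicted positive-class probabilities on a common unlabeled pool, and the pool items whose averaged probability is closest to 0.5 are queried for labels. The function name and the averaging rule are hypothetical; the paper's framework may aggregate predictions differently.

```python
from typing import List, Sequence


def select_queries(collaborator_probs: Sequence[Sequence[float]], budget: int) -> List[int]:
    """Return pool indices to label next, most uncertain first.

    collaborator_probs[c][i] is collaborator c's predicted probability that
    pool item i is positive. Only these predictions are exchanged; no model
    weights or training data are shared.
    """
    if not collaborator_probs:
        raise ValueError("need predictions from at least one collaborator")
    n_items = len(collaborator_probs[0])
    n_collab = len(collaborator_probs)
    mean = [sum(p[i] for p in collaborator_probs) / n_collab for i in range(n_items)]
    # Binary uncertainty: how close the averaged probability is to 0.5.
    ranked = sorted(range(n_items), key=lambda i: abs(mean[i] - 0.5))
    return ranked[:budget]


# Hypothetical shared predictions from two collaborators on a four-item pool.
shared = [[0.9, 0.5, 0.1, 0.6], [0.8, 0.4, 0.2, 0.6]]
print(select_queries(shared, budget=2))
```

The newly acquired labels for the selected items would then be shared with all collaborators, which is how labeling costs are split among them.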
- [247] arXiv:2403.18438 [pdf, other]
Title: Global Vegetation Modeling with Pre-Trained Weather Transformers
Comments: Tackling Climate Change with Machine Learning Workshop @ ICLR 2024
Subjects: Machine Learning (cs.LG)
Accurate vegetation models can provide further insights into the complex interaction between vegetation activity and ecosystem processes. Previous research has established that long-term trends and short-term variability of temperature and precipitation affect vegetation activity. Motivated by the recent success of Transformer-based deep learning models for medium-range weather forecasting, we adapt the publicly available pre-trained FourCastNet to model vegetation activity while accounting for the short-term dynamics of climate variability. We investigate how the learned global representation of the atmosphere's state can be transferred to model the normalized difference vegetation index (NDVI). Our model globally estimates vegetation activity at a resolution of 0.25° while relying only on meteorological data. We demonstrate that leveraging pre-trained weather models improves the NDVI estimates compared to learning an NDVI model from scratch. Additionally, we compare our results to other recent data-driven NDVI modeling approaches from the machine learning and ecology literature. We further provide experimental evidence on how much data and training time are necessary to turn FourCastNet into an effective vegetation model. Code and models will be made available upon publication.
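For reference, NDVI is conventionally computed from near-infrared (NIR) and red surface reflectance as (NIR - Red) / (NIR + Red), giving values in [-1, 1] with dense green vegetation near the top of the range. The model above estimates this quantity from meteorological inputs, so the formula defines the prediction target rather than the model. The helper below also shows the size of a global grid at 0.25° spacing; the cell-centred layout is an assumption (point grids that include both poles have one extra latitude row).

```python
from typing import Tuple


def ndvi(nir: float, red: float) -> float:
    """Normalized difference vegetation index from NIR and red reflectance."""
    total = nir + red
    if total == 0:
        raise ValueError("NIR + Red must be non-zero")
    return (nir - red) / total


def global_grid_shape(resolution_deg: float) -> Tuple[int, int]:
    """Latitude rows and longitude columns of a cell-centred global grid."""
    return round(180 / resolution_deg), round(360 / resolution_deg)


# Hypothetical reflectance values for a vegetated pixel.
print(ndvi(nir=0.5, red=0.1))
print(global_grid_shape(0.25))
```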
- [248] arXiv:2403.18439 [pdf, other]
Title: Generalized Policy Learning for Smart Grids: FL TRPO Approach
Comments: ICLR 2024 Workshop: Tackling Climate Change with Machine Learning
Subjects: Machine Learning (cs.LG)
The smart grid domain requires bolstering the capabilities of existing energy management systems. Federated Learning (FL) aligns with this goal, as it demonstrates a remarkable ability to train models on heterogeneous datasets while maintaining data privacy. This makes it suitable for smart grid applications, which often involve disparate data distributions and interdependencies among features that hinder the suitability of linear models. This paper introduces a framework that combines FL with Trust Region Policy Optimization (FL TRPO), aiming to reduce energy-associated emissions and costs. Our approach reveals latent interconnections and employs personalized encoding methods to capture unique insights, understanding the relationships between features and optimal strategies and allowing our model to generalize to previously unseen data. Experimental results validate the robustness of our approach, affirming its proficiency in effectively learning policy models for smart grid challenges.
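The abstract does not say how the federated clients' models are aggregated. A common choice in FL is FedAvg-style averaging of client parameters weighted by local dataset size; the sketch below shows only that aggregation step, on hypothetical flat parameter vectors, and is not the FL TRPO training loop (which would also involve TRPO's constrained policy updates on each client).

```python
from typing import List, Sequence


def federated_average(client_params: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """Average client parameter vectors, weighted by each client's sample count.

    Only parameters leave the clients; their raw energy data stays local.
    """
    if len(client_params) != len(client_sizes) or not client_params:
        raise ValueError("need one size per client and at least one client")
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(params[i] * n for params, n in zip(client_params, client_sizes)) / total
        for i in range(dim)
    ]


# Hypothetical policy parameters from two households with 100 and 300 samples.
print(federated_average([[1.0, 0.0], [3.0, 4.0]], [100, 300]))
```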
- [249] arXiv:2403.18442 [pdf, other]
Title: Backpropagation-free Network for 3D Test-time Adaptation
Authors: Yanshuo Wang, Ali Cheraghian, Zeeshan Hayder, Jie Hong, Sameera Ramasinghe, Shafin Rahman, David Ahmedt-Aristizabal, Xuesong Li, Lars Petersson, Mehrtash Harandi
Comments: CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Real-world systems often encounter new data over time, which leads to target domain shifts. Existing Test-Time Adaptation (TTA) methods tend to apply computationally heavy and memory-intensive backpropagation-based approaches to handle this. Here, we propose a novel backpropagation-free approach to TTA for the specific case of 3D data. Our model uses a two-stream architecture to maintain knowledge about the source domain as well as complementary target-domain-specific information. The backpropagation-free property of our model helps address the well-known forgetting problem and mitigates error accumulation. The proposed method also eliminates the need for the usually noisy process of pseudo-labeling and the reliance on costly self-supervised training. Moreover, our method leverages subspace learning, effectively reducing the distribution variance between the two domains. Furthermore, the source-domain-specific and target-domain-specific streams are aligned using a novel entropy-based adaptive fusion strategy. Extensive experiments on popular benchmarks demonstrate the effectiveness of our method. The code will be available at https://github.com/abie-e/BFTT3D.
- [250] arXiv:2403.18443 [pdf, other]
Title: F²Depth: Self-supervised Indoor Monocular Depth Estimation via Optical Flow Consistency and Feature Map Synthesis
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Self-supervised monocular depth estimation methods have attracted increasing attention because they do not require large, labelled datasets. Such methods require high-quality salient features and consequently suffer a severe performance drop in indoor scenes, where the low-textured regions that dominate these scenes are almost indiscriminative. To address this issue, we propose a self-supervised indoor monocular depth estimation framework called F²Depth. A self-supervised optical flow estimation network is introduced to supervise depth learning. To improve optical flow estimation performance in low-textured areas, only some patches of points with more discriminative features are used for finetuning, based on our well-designed patch-based photometric loss. The finetuned optical flow estimation network generates high-accuracy optical flow as a supervisory signal for depth estimation. Correspondingly, an optical flow consistency loss is designed. Multi-scale feature maps produced by the finetuned optical flow estimation network are warped to compute a feature map synthesis loss, which serves as another supervisory signal for depth learning. Experimental results on the NYU Depth V2 dataset demonstrate the effectiveness of the framework and our proposed losses. To evaluate the generalization ability of F²Depth, we collect a Campus Indoor depth dataset composed of approximately 1500 points selected from 99 images in 18 scenes. Zero-shot generalization experiments on the 7-Scenes dataset and Campus Indoor achieve δ1 accuracy of 75.8% and 76.0%, respectively. These results show that our model generalizes well to monocular images captured in unknown indoor scenes.
- [251] arXiv:2403.18444 [pdf, other]
Title: FRESCO: Federated Reinforcement Energy System for Cooperative Optimization
Comments: Tiny Paper at ICLR 2023
Subjects: Machine Learning (cs.LG)
The rise in renewable energy is creating new dynamics in the energy grid that promise a cleaner and more participative grid, in which technology plays a crucial part in providing the flexibility required to achieve the vision of the next-generation grid. This work presents FRESCO, a framework that aims to ease the implementation of energy markets using a hierarchical control architecture of reinforcement learning agents trained with federated learning. The core concept we aim to prove is that having greedy agents subject to changing conditions from a higher-level agent creates a cooperative setup that allows all individual objectives to be fulfilled. This paper presents a general overview of the framework, the current progress, and some insights we obtained from recent results.
- [252] arXiv:2403.18447 [pdf, other]
Title: Can Language Beat Numerical Regression? Language-Based Multimodal Trajectory Prediction
Comments: Accepted at CVPR 2024
Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO)
Language models have demonstrated impressive ability in context understanding and generative performance. Inspired by the recent success of language foundation models, in this paper we propose LMTraj (Language-based Multimodal Trajectory predictor), which recasts the trajectory prediction task as a kind of question-answering problem. Departing from traditional numerical regression models, which treat the trajectory coordinate sequence as continuous signals, we consider them discrete signals, like text prompts. Specifically, we first transform the input space of the trajectory coordinates into the natural language space. Here, the entire time-series trajectories of pedestrians are converted into a text prompt, and scene images are described as text through image captioning. The transformed numerical and image data are then wrapped into a question-answering template for use in a language model. Next, to guide the language model in understanding and reasoning about high-level knowledge, such as scene context and social relationships between pedestrians, we introduce auxiliary multi-task question answering. We then train a numerical tokenizer with the prompt data. We encourage the tokenizer to separate the integer and decimal parts well, and leverage it to capture correlations between consecutive numbers in the language model. Lastly, we train the language model using the numerical tokenizer and all of the question-answer prompts. Here, we propose a beam-search-based most-likely prediction and a temperature-based multimodal prediction to implement both deterministic and stochastic inference. Applying LMTraj, we show that a language-based model can be a powerful pedestrian trajectory predictor that outperforms existing numerical-based predictor methods. Code is publicly available at https://github.com/inhwanbae/LMTrajectory .
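The difference between the deterministic and stochastic inference modes can be illustrated at the level of a single next-token choice. The sketch below contrasts greedy selection (the beam-width-1 special case of beam search) with temperature-scaled sampling over hypothetical token logits; it is a generic decoding illustration, not LMTraj's decoder, and the function names are assumptions.

```python
import math
import random
from typing import List, Sequence


def softmax_with_temperature(logits: Sequence[float], temperature: float) -> List[float]:
    """Convert logits to probabilities; higher temperature flattens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_token(logits: Sequence[float]) -> int:
    """Deterministic choice: the index of the largest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])


def sample_token(logits: Sequence[float], temperature: float, rng: random.Random) -> int:
    """Stochastic choice: draw one index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Hypothetical logits over three candidate coordinate tokens.
logits = [0.1, 2.0, 0.5]
rng = random.Random(0)
print(greedy_token(logits))
print([sample_token(logits, 1.5, rng) for _ in range(5)])
```

Repeating the sampled decoding several times yields multiple plausible futures, which is what multimodal trajectory benchmarks evaluate.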
- [253] arXiv:2403.18451 [pdf, other]
Title: CoRAST: Towards Foundation Model-Powered Correlated Data Analysis in Resource-Constrained CPS and IoT
Comments: accepted and to be published in 2024 IEEE International Workshop on Foundation Models for Cyber-Physical Systems & Internet of Things (FMSys)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Foundation models (FMs) have emerged as a promising solution for harnessing distributed and diverse environmental data by leveraging prior knowledge to understand the complicated temporal and spatial correlations within heterogeneous datasets. Unlike distributed learning frameworks such as federated learning, which often struggle with multimodal data, FMs can transform diverse inputs into embeddings. This process facilitates the integration of information from various modalities and the application of prior learning to new domains. However, deploying FMs in resource-constrained edge systems poses significant challenges. To this end, we introduce CoRAST, a novel learning framework that uses FMs for enhanced analysis of distributed, correlated, heterogeneous data. Using a server-based FM, CoRAST can exploit existing environment information to extract temporal, spatial, and cross-modal correlations among sensor data. This enables CoRAST to offer context-aware insights for localized client tasks through FM-powered global representation learning. Our evaluation on a real-world weather dataset demonstrates CoRAST's ability to exploit correlated heterogeneous data through environmental representation learning, reducing forecast errors by up to 50.3% compared to the baselines.
- [254] arXiv:2403.18452 [pdf, other]
Title: SingularTrajectory: Universal Trajectory Predictor Using Diffusion Model
Comments: Accepted at CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO)
There are five types of trajectory prediction tasks: deterministic, stochastic, domain adaptation, momentary observation, and few-shot. These associated tasks are defined by various factors, such as the length of input paths, data split, and pre-processing methods. Interestingly, even though they all take sequential coordinates of observations as input and infer future paths in the same coordinates as output, designing specialized architectures for each task is still necessary, and applying an architecture designed for one task to another can lead to sub-optimal performance due to generality issues. In this paper, we propose SingularTrajectory, a diffusion-based universal trajectory prediction framework that reduces the performance gap across the five tasks. The core of SingularTrajectory is to unify a variety of human dynamics representations on the associated tasks. To do this, we first build a Singular space that projects all types of motion patterns from each task into one embedding space. We next propose an adaptive anchor working in the Singular space. Unlike traditional fixed anchor methods that sometimes yield unacceptable paths, our adaptive anchor corrects anchors that are placed in wrong locations, based on a traversability map. Finally, we adopt a diffusion-based predictor to further enhance the prototype paths using a cascaded denoising process. Our unified framework ensures generality across various benchmark settings, such as input modality and trajectory length. Extensive experiments on five public benchmarks demonstrate that SingularTrajectory substantially outperforms existing models, highlighting its effectiveness in estimating the general dynamics of human movements. Code is publicly available at https://github.com/inhwanbae/SingularTrajectory .
- [255] arXiv:2403.18453 [pdf, other]
Title: Annotating Slack Directly on Your Verilog: Fine-Grained RTL Timing Evaluation for Early Optimization
Comments: Published as a conference paper at Design Automation Conference (DAC) 2024
Subjects: Hardware Architecture (cs.AR)
In digital IC design, compared with post-synthesis netlists or layouts, the early register-transfer level (RTL) stage offers greater optimization flexibility for both designers and EDA tools. However, timing information is typically unavailable at this early stage. Some recent machine learning (ML) solutions propose predicting the total negative slack (TNS) and worst negative slack (WNS) of an entire design at the RTL stage, but the fine-grained timing information of individual registers remains unavailable. In this work, we address the unique challenges of RTL timing prediction and introduce our solution, named RTL-Timer. To the best of our knowledge, this is the first fine-grained general timing estimator applicable to any given design. RTL-Timer explores multiple promising RTL representations and proposes customized loss functions to capture the maximum arrival time at register endpoints. RTL-Timer's fine-grained predictions are further applied to guide optimization in a standard synthesis flow. On unseen test designs, the average results show a correlation above 0.89 and around 3% WNS and 10% TNS improvement after optimization.
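Under the common definitions (slack at an endpoint = required time minus arrival time; WNS = the most negative endpoint slack; TNS = the sum of all negative endpoint slacks), both design-level metrics can be derived from per-register arrival times such as those RTL-Timer predicts. The sketch below assumes a single required time (e.g. the clock period) for every endpoint and reports both metrics as 0.0 when no endpoint violates timing; tools differ in how they report WNS when all slacks are positive. The arrival values are hypothetical.

```python
from typing import List, Sequence, Tuple


def endpoint_slacks(arrival_times: Sequence[float], required_time: float) -> List[float]:
    """Slack at each register endpoint: required time minus arrival time."""
    return [required_time - t for t in arrival_times]


def wns_tns(slacks: Sequence[float]) -> Tuple[float, float]:
    """Worst negative slack and total negative slack (both 0.0 if no violations)."""
    negative = [s for s in slacks if s < 0]
    return min(negative, default=0.0), sum(negative, 0.0)


# Hypothetical predicted arrival times (ns) at three registers, 1.0 ns clock.
slacks = endpoint_slacks([0.8, 1.1, 1.4], required_time=1.0)
print(wns_tns(slacks))
```

Per-endpoint slacks are what make the prediction fine-grained: they show which registers drive the WNS and TNS figures, not just the design-level totals.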
- [256] arXiv:2403.18454 [pdf, other]
Title: Scaling Vision-and-Language Navigation With Offline RL
Comments: Published in Transactions on Machine Learning Research (04/2024)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The study of vision-and-language navigation (VLN) has typically relied on expert trajectories, which may not always be available in real-world situations due to the significant effort required to collect them. On the other hand, existing approaches to training VLN agents that go beyond available expert data involve data augmentations or online exploration, which can be tedious and risky. In contrast, it is easy to access large repositories of suboptimal offline trajectories. Inspired by research in offline reinforcement learning (ORL), we introduce a new problem setup of VLN-ORL which studies VLN using suboptimal demonstration data. We introduce a simple and effective reward-conditioned approach that can account for dataset suboptimality for training VLN agents, as well as benchmarks to evaluate progress and promote research in this area. We empirically study various noise models for characterizing dataset suboptimality among other unique challenges in VLN-ORL and instantiate it for the VLN↻BERT and MTVM architectures in the R2R and RxR environments. Our experiments demonstrate that the proposed reward-conditioned approach leads to significant performance improvements, even in complex and intricate environments.
- [257] arXiv:2403.18456 [pdf, other]
Title: Inverse kinematics learning of a continuum manipulator using limited real time data
Subjects: Robotics (cs.RO)
Data-driven control of a continuum manipulator requires a lot of training data, but generating a sufficient amount of real-time data is not cost-efficient. Random actuation of the manipulator can also be unsafe. Meta-learning has been used successfully to adapt to new environments. Hence, this paper tries to solve the above-mentioned problem using meta-learning. We consider two cases. First, this paper proposes a method that uses simulation data to train the model with MAML (Model-Agnostic Meta-Learning) and then adapts to the real world using gradient steps. Second, if the simulation model is unavailable or difficult to formulate, we propose a CGAN (Conditional Generative Adversarial Network)-MAML based method. The model is trained using a small amount of real-time data and augmented data for different loading conditions. Adaptation is then done in the real environment. The experiments show that the relative positioning error in both cases is below 3%. The proposed models are experimentally verified on a real continuum manipulator.
- [258] arXiv:2403.18459 [pdf, other]
Title: CoBOS: Constraint-Based Online Scheduler for Human-Robot Collaboration
Comments: 7 pages, 8 figures
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
Assembly processes involving humans and robots are challenging scenarios because the individual activities and access to the shared workspace have to be coordinated. Fixed robot programs leave no room to diverge from a fixed protocol. Working on such a process can be stressful for the user and lead to ineffective behavior or failure. We propose CoBOS, a novel approach to online constraint-based scheduling in a reactive execution control framework that uses behavior trees. This allows the robot to adapt to uncertain events such as delayed activity completions and activity selection (by the human). The user experiences less stress as the robotic coworkers adapt their behavior to best complement the human-selected activities to complete the common task. In addition to the improved working conditions, our algorithm leads to increased efficiency, even in highly uncertain scenarios. We evaluate our algorithm using a probabilistic simulation study with 56,000 experiments. We outperform all baselines by a margin of 4-10%. Initial real robot experiments using a Franka Emika Panda robot and human tracking based on HTC Vive VR gloves look promising.
- [259] arXiv:2403.18461 [pdf, other]
Title: DiffStyler: Diffusion-based Localized Image Style Transfer
Authors: Shaoxu Li
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Image style transfer aims to imbue digital imagery with the distinctive attributes of style targets, such as colors, brushstrokes, and shapes, while preserving the semantic integrity of the content. Despite advances in arbitrary style transfer methods, a prevalent challenge remains the delicate balance between content semantics and style attributes. Recent developments in large-scale text-to-image diffusion models have brought unprecedented synthesis capabilities, but at the expense of relying on extensive and often imprecise textual descriptions to delineate artistic styles. Addressing these limitations, this paper introduces DiffStyler, a novel approach that facilitates efficient and precise arbitrary image style transfer. At the core of DiffStyler lies a LoRA built on a text-to-image Stable Diffusion model, which encapsulates the essence of style targets. This approach, coupled with strategic cross-LoRA feature and attention injection, guides the style transfer process. Our methodology is rooted in the observation that LoRA maintains the spatial feature consistency of the UNet, a discovery that further inspired the development of a mask-wise style transfer technique. This technique employs masks extracted by a pre-trained FastSAM model, using mask prompts to facilitate feature fusion during the denoising process, thereby enabling localized style transfer that preserves the unaffected regions of the original image. Moreover, our approach accommodates multiple style targets through the use of corresponding masks. Through extensive experimentation, we demonstrate that DiffStyler surpasses previous methods in achieving a more harmonious balance between content preservation and style integration.
- [260] arXiv:2403.18462 [pdf, other]
Title: Decoy Effect In Search Interaction: Understanding User Behavior and Measuring System Vulnerability
Subjects: Information Retrieval (cs.IR)
This study examines the decoy effect's underexplored influence on user search interactions and methods for measuring information retrieval (IR) systems' vulnerability to this effect. It explores how decoy results alter users' interactions on search engine result pages, focusing on metrics like click-through likelihood, browsing time, and perceived document usefulness. By analyzing user interaction logs from multiple datasets, the study demonstrates that decoy results significantly affect users' behavior and perceptions. Furthermore, it investigates how different levels of task difficulty and user knowledge modify the decoy effect's impact, finding that easier tasks and lower knowledge levels lead to higher engagement with target documents. In terms of IR system evaluation, the study introduces the DEJA-VU metric to assess systems' susceptibility to the decoy effect, testing it on specific retrieval tasks. The results show differences in systems' effectiveness and vulnerability, contributing to our understanding of cognitive biases in search behavior and suggesting pathways for creating more balanced and bias-aware IR evaluations.
- [261] arXiv:2403.18469 [pdf, other]
Title: Density-guided Translator Boosts Synthetic-to-Real Unsupervised Domain Adaptive Segmentation of 3D Point Clouds
Comments: CVPR2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
3D synthetic-to-real unsupervised domain adaptive segmentation is crucial to annotating new domains. Self-training is a competitive approach for this task, but its performance is limited by different sensor sampling patterns (i.e., variations in point density) and incomplete training strategies. In this work, we propose a density-guided translator (DGT), which translates point density between domains, and integrate it into a two-stage self-training pipeline named DGT-ST. First, in contrast to existing works that simultaneously conduct data generation and feature/output alignment within unstable adversarial training, we employ the non-learnable DGT to bridge the domain gap at the input level. Second, to provide a well-initialized model for self-training, we propose a category-level adversarial network in stage one that utilizes the prototype to prevent negative transfer. Finally, leveraging the designs above, we propose a domain-mixed self-training method with a source-aware consistency loss in stage two to narrow the domain gap further. Experiments on two synthetic-to-real segmentation tasks (SynLiDAR → SemanticKITTI and SynLiDAR → SemanticPOSS) demonstrate that DGT-ST outperforms state-of-the-art methods, achieving 9.4% and 4.3% mIoU improvements, respectively. Code is available at https://github.com/yuan-zm/DGT-ST.
- [262] arXiv:2403.18471 [pdf, other]
-
Title: DiffusionFace: Towards a Comprehensive Dataset for Diffusion-Based Face Forgery AnalysisSubjects: Computer Vision and Pattern Recognition (cs.CV)
The rapid progress in deep learning has given rise to hyper-realistic facial forgery methods, leading to concerns related to misinformation and security risks. Existing face forgery datasets have limitations in generating high-quality facial images and addressing the challenges posed by evolving generative techniques. To combat this, we present DiffusionFace, the first diffusion-based face forgery dataset, covering various forgery categories, including unconditional and text-guided facial image generation, Img2Img, Inpaint, and diffusion-based facial exchange algorithms. Our DiffusionFace dataset stands out with its extensive collection of 11 diffusion models and the high quality of the generated images, providing essential metadata and a real-world internet-sourced forgery facial image dataset for evaluation. Additionally, we provide an in-depth analysis of the data and introduce practical evaluation protocols to rigorously assess discriminative models' effectiveness in detecting counterfeit facial images, aiming to enhance security in facial image authentication processes. The dataset is available for download at https://github.com/Rapisurazurite/DiffFace.
- [263] arXiv:2403.18472 [pdf, other]
-
Title: Computational decomposition and composition technique for approximate solution of nonstationary problemsAuthors: P.N. VabishchevichComments: 22 pages, 3 figuresSubjects: Numerical Analysis (math.NA)
Stable computational algorithms for the approximate solution of the Cauchy problem for nonstationary problems are based on implicit time approximations. Computational costs for boundary value problems for systems of coupled multidimensional equations can be reduced by additive decomposition of the problem operator(s) and composition of the approximate solution using particular explicit-implicit time approximations. Such techniques are currently applied in settings where the decomposition step is straightforward. A general approach is proposed to construct decomposition-composition algorithms for evolution equations in finite-dimensional Hilbert spaces. It is based on two main variants of the decomposition of the unit operator in the corresponding spaces at the decomposition stage and the application of additive operator-difference schemes at the composition stage. The general results are illustrated on the boundary value problem for a second-order parabolic equation by constructing standard splitting schemes with respect to spatial variables and region-additive schemes (domain decomposition schemes).
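As a generic illustration of additive operator decomposition (a textbook Lie splitting, not the decomposition-composition schemes constructed in the paper), the sketch below splits a 2D finite-difference Laplacian into x- and y-direction parts, A = A1 + A2, and advances u' + Au = 0 by two implicit substeps per time step, comparing the result against unsplit implicit Euler. The grid size, time step, and initial data are arbitrary assumptions; dense solves are used for brevity, whereas a practical code would exploit that each substep decouples into independent one-dimensional tridiagonal systems.

```python
import numpy as np


def neg_laplacian_1d(n: int, h: float) -> np.ndarray:
    """Second-order finite-difference -d^2/dx^2 with homogeneous Dirichlet BCs."""
    main = 2.0 * np.ones(n)
    off = -np.ones(n - 1)
    return (np.diag(main) + np.diag(off, 1) + np.diag(off, -1)) / h**2


def split_vs_unsplit(n: int = 20, tau: float = 1e-3, steps: int = 50):
    """Advance u' + (A1 + A2) u = 0 on the unit square with
    (a) unsplit implicit Euler and (b) Lie splitting into two implicit substeps."""
    h = 1.0 / (n + 1)
    L = neg_laplacian_1d(n, h)
    I_n = np.eye(n)
    A1 = np.kron(L, I_n)  # part of the operator acting along x
    A2 = np.kron(I_n, L)  # part of the operator acting along y
    E = np.eye(n * n)

    x = np.linspace(h, 1.0 - h, n)
    X, Y = np.meshgrid(x, x, indexing="ij")
    u0 = (np.sin(np.pi * X) * np.sin(2.0 * np.pi * Y)).ravel()

    u_full, u_split = u0.copy(), u0.copy()
    for _ in range(steps):
        # Unsplit: (E + tau*(A1 + A2)) u^{k+1} = u^k
        u_full = np.linalg.solve(E + tau * (A1 + A2), u_full)
        # Split: (E + tau*A1) w = u^k, then (E + tau*A2) u^{k+1} = w
        u_split = np.linalg.solve(E + tau * A1, u_split)
        u_split = np.linalg.solve(E + tau * A2, u_split)
    return u0, u_full, u_split
```

Because the splitting error is O(τ), the two solutions differ by roughly 2% for these parameters; the paper's unit-operator decompositions and region-additive schemes are more general than this two-term case.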
- [264] arXiv:2403.18476 [pdf, other]
-
Title: Modeling uncertainty for Gaussian SplattingSubjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
We present Stochastic Gaussian Splatting (SGS): the first framework for uncertainty estimation using Gaussian Splatting (GS). GS recently advanced the novel-view synthesis field by achieving impressive reconstruction quality at a fraction of the computational cost of Neural Radiance Fields (NeRF). However, contrary to NeRF, GS still lacks the ability to provide information about the confidence associated with its outputs. To address this limitation, we introduce a Variational Inference-based approach that seamlessly integrates uncertainty prediction into the common rendering pipeline of GS. Additionally, we introduce the Area Under Sparsification Error (AUSE) as a new term in the loss function, enabling optimization of uncertainty estimation alongside image reconstruction. Experimental results on the LLFF dataset demonstrate that our method outperforms existing approaches in terms of both image rendering quality and uncertainty estimation accuracy. Overall, our framework equips practitioners with valuable insights into the reliability of synthesized views, facilitating safer decision-making in real-world applications.
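For readers unfamiliar with AUSE, the sketch below computes it in its usual evaluation form: samples (e.g. pixels) are removed in order of decreasing predicted uncertainty, the mean error of the remaining samples is tracked, and the area between this curve and the oracle curve (removal by true error) is integrated. The fraction grid, the normalisation by the full-image mean error (assumed nonzero), and the trapezoidal rule are assumptions; this non-differentiable version is not the loss term used in the paper, whose exact formulation is not given in the abstract.

```python
import numpy as np


def sparsification_curve(errors, scores, fractions):
    """Mean error of the samples that remain after removing, for each fraction f,
    the f-share of samples with the highest `scores`."""
    order = np.argsort(-scores, kind="stable")  # highest score first
    sorted_err = errors[order]
    n = len(errors)
    return np.array([sorted_err[int(f * n):].mean() for f in fractions])


def ause(errors, uncertainty, num_steps=100):
    """Area Under the Sparsification Error curve (lower is better)."""
    errors = np.asarray(errors, dtype=float)
    uncertainty = np.asarray(uncertainty, dtype=float)
    fractions = np.linspace(0.0, 1.0, num_steps, endpoint=False)
    # Ranking by predicted uncertainty vs. the oracle ranking by true error.
    by_uncertainty = sparsification_curve(errors, uncertainty, fractions)
    oracle = sparsification_curve(errors, errors, fractions)
    # Normalise by the mean error when nothing has been removed.
    diff = (by_uncertainty - oracle) / errors.mean()
    # Trapezoidal rule over the removed fraction.
    return float(np.sum(0.5 * (diff[1:] + diff[:-1]) * np.diff(fractions)))
```

A ranking that matches the true errors exactly gives AUSE = 0; larger values mean the uncertainty ordering deviates more from the oracle.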
- [265] arXiv:2403.18479 [pdf, other]
-
Title: Lightweight Embeddings for Graph Collaborative FilteringComments: Accepted by SIGIR '24Subjects: Information Retrieval (cs.IR)
Graph neural networks (GNNs) are currently one of the most performant collaborative filtering methods. Meanwhile, because they use an embedding table to represent each user/item as a distinct vector, GNN-based recommenders have inherited the long-standing defect of parameter inefficiency. As a common practice for scalable embeddings, parameter sharing enables the use of fewer embedding vectors (i.e., meta-embeddings). When assigning meta-embeddings, most existing methods use a heuristically designed, predefined mapping from each user's/item's ID to the corresponding meta-embedding indexes, thus simplifying the optimization problem to learning only the meta-embeddings. However, in the context of GNN-based collaborative filtering, such a fixed mapping omits the semantic correlations between entities that are evident in the user-item interaction graph, leading to suboptimal recommendation performance. To this end, we propose Lightweight Embeddings for Graph Collaborative Filtering (LEGCF), a parameter-efficient embedding framework dedicated to GNN-based recommenders. LEGCF introduces an assignment matrix as an extra learnable component on top of the meta-embeddings. To jointly optimize these two heavily entangled components, aside from learning the meta-embeddings by minimizing the recommendation loss, LEGCF performs an efficient assignment update by enforcing a novel semantic similarity constraint and finding its closed-form solution based on the matrix pseudo-inverse. The meta-embeddings and assignment matrix are updated alternately, and the latter is sparsified on the fly to ensure negligible storage overhead. Extensive experiments on three benchmark datasets verify that LEGCF achieves the best trade-off between size and performance, with consistent accuracy gains over state-of-the-art baselines. The codebase of LEGCF is available at https://github.com/xurong-liang/LEGCF.
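The sketch below illustrates the general idea of composing user/item embeddings from a small table of meta-embeddings through an assignment matrix, with a pseudo-inverse (least-squares) assignment update followed by per-row sparsification. The `target` embeddings standing in for the paper's semantic similarity constraint, the function names, and the top-t sparsification rule are hypothetical; LEGCF's actual objective and update are described in the paper.

```python
import numpy as np


def update_assignment(target, meta, top_t):
    """Closed-form least-squares assignment S = target @ pinv(meta), i.e. the
    minimum-norm minimiser of ||S @ meta - target||_F, then sparsified by
    keeping the top_t largest-magnitude entries in each row."""
    dense = target @ np.linalg.pinv(meta)                 # (n_entities, n_meta)
    keep = np.argsort(-np.abs(dense), axis=1)[:, :top_t]  # column ids to keep
    rows = np.arange(dense.shape[0])[:, None]
    sparse = np.zeros_like(dense)
    sparse[rows, keep] = dense[rows, keep]
    return sparse


def compose_embeddings(assignment, meta):
    """Each user/item embedding is a weighted mix of shared meta-embeddings."""
    return assignment @ meta
```

When the targets are exact sparse combinations of linearly independent meta-embeddings, this update recovers the original assignment; in general it returns the least-squares fit before sparsification.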
- [266] arXiv:2403.18480 [pdf, other]
-
Title: Enhanced Generative Recommendation via Content and Collaboration IntegrationAuthors: Yidan Wang, Zhaochun Ren, Weiwei Sun, Jiyuan Yang, Zhixiang Liang, Xin Chen, Ruobing Xie, Su Yan, Xu Zhang, Pengjie Ren, Zhumin Chen, Xin XinSubjects: Information Retrieval (cs.IR)
Generative recommendation has emerged as a promising paradigm aimed at augmenting recommender systems with recent advancements in generative artificial intelligence. This task has been formulated as a sequence-to-sequence generation process, wherein the input sequence encompasses data pertaining to the user's previously interacted items, and the output sequence denotes the generative identifier for the suggested item. However, existing generative recommendation approaches still encounter challenges in (i) effectively integrating user-item collaborative signals and item content information within a unified generative framework, and (ii) executing an efficient alignment between content information and collaborative signals.
In this paper, we introduce content-based collaborative generation for recommender systems, denoted as ColaRec. To capture collaborative signals, the generative item identifiers are derived from a pretrained collaborative filtering model, while the user is represented through the aggregation of interacted items' content. Subsequently, the aggregated textual description of items is fed into a language model to encapsulate content information. This integration enables ColaRec to amalgamate collaborative signals and content information within an end-to-end framework. Regarding the alignment, we propose an item indexing task to facilitate the mapping between the content-based semantic space and the interaction-based collaborative space. Additionally, a contrastive loss is introduced to ensure that items with similar collaborative generative identifiers (GIDs) possess comparable content representations, thereby enhancing alignment. To validate the efficacy of ColaRec, we conduct experiments on three benchmark datasets. Empirical results substantiate the superior performance of ColaRec.
- [267] arXiv:2403.18482 [pdf, other]
-
Title: UVL Sentinel: a tool for parsing and syntactic correction of UVL datasetsComments: Presented at 6th International Workshop on Languages for Modelling Variability (MODEVAR'24) (arXiv:cs/2402.15511)Subjects: Software Engineering (cs.SE)
Feature models have become a de facto standard for representing variability in software product lines. UVL (Universal Variability Language) is a language that expresses features and the dependencies and constraints between them. UVL models are written in plain text and follow a syntactic structure that must be processed by a parser, which enforces specific syntactic rules that models must comply with to be processed correctly. Researchers maintain datasets containing numerous feature models, and the way these models are written is tied to a particular version of the parser. When the parser is updated to support new features or correct previous ones, these feature models often stop being compatible, generating incompatibilities and inconsistencies within the dataset. In this paper, we present UVL Sentinel. This tool analyzes a dataset of feature models in UVL format, generates error analysis reports describing those errors and, where possible, applies syntactic processing based on the most common solutions. The tool can detect incompatibilities in a dataset's feature models when the parser is updated and tries to correct the most common syntactic errors, facilitating the management of the dataset and the adaptation of its models to the new version of the parser. Our tool was evaluated using a dataset of 1,479 UVL models from different sources and helped semi-automatically fix 185 warnings and syntax errors.
- [268] arXiv:2403.18486 [pdf, other]
-
Title: Synthesizing EEG Signals from Event-Related Potential Paradigms with Conditional Diffusion ModelsComments: submitted to 9th Graz BCI conference, 6 pages, 3 figures, first figure is split into two subfigures, 1 tableSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
Data scarcity in the brain-computer interface field can be alleviated through the use of generative models, specifically diffusion models. While diffusion models have previously been successfully applied to electroencephalogram (EEG) data, existing models lack flexibility with respect to sampling or require alternative representations of the EEG data. To overcome these limitations, we introduce a novel approach to conditional diffusion models that utilizes classifier-free guidance to directly generate subject-, session-, and class-specific EEG data. In addition to commonly used metrics, domain-specific metrics are employed to evaluate the specificity of the generated samples. The results indicate that the proposed model can generate EEG data that resembles real data for each subject, session, and class.
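Classifier-free guidance combines two predictions of the same noise-prediction network, one with and one without the condition. A minimal sketch follows; the `model(x_t, t, cond)` call signature, the use of `None` for the dropped condition, and `toy_model` are hypothetical stand-ins, and how the paper encodes subject, session, and class into `cond` is not specified in the abstract.

```python
import numpy as np


def guided_noise(model, x_t, t, cond, guidance_scale):
    """Classifier-free guidance: query the same noise predictor with and
    without the condition (None = dropped/null condition) and move the
    estimate from the unconditional toward the conditional prediction."""
    eps_uncond = model(x_t, t, None)
    eps_cond = model(x_t, t, cond)
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)


def toy_model(x_t, t, cond):
    """Hypothetical stand-in for a trained network; ignores t."""
    shift = 0.0 if cond is None else float(cond)
    return 0.1 * x_t + shift
```

With `guidance_scale = 1` the result equals the conditional prediction and with 0 the unconditional one; the common parameterisation (1 + w)·ε(x, c) − w·ε(x, ∅) corresponds to `guidance_scale = 1 + w`.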
- [269] arXiv:2403.18488 [pdf, ps, other]
-
Title: The Guesswork of Ordered Statistics Decoding: Complexity and Practical DesignComments: Submitted for peer review;19 pages;15 figuresSubjects: Information Theory (cs.IT)
This paper investigates guesswork over ordered statistics and formulates the complexity of ordered statistics decoding (OSD) in binary additive white Gaussian noise (AWGN) channels. It first develops a new upper bound on guesswork for independent sequences by applying Hölder's inequality to Hamming shell-based subspaces. This upper bound is then extended to the ordered statistics by constructing conditionally independent sequences within the ordered statistics sequences. We leverage the established bounds to formulate the best achievable decoding complexity of OSD that ensures no loss in error performance, where OSD stops immediately when the correct codeword estimate is found. We show that the average complexity of OSD at maximum decoding order can be accurately approximated by the modified Bessel function, which increases near-exponentially with code dimension. We also identify a complexity saturation threshold, beyond which increasing the OSD decoding order improves error performance without further raising decoding complexity. Finally, the paper presents insights on applying these findings to enhance the efficiency of practical decoder implementations.
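Guesswork is the expected number of guesses needed to identify a random quantity when candidates are tried in decreasing order of probability. The sketch below computes it for a simplified case: an i.i.d. binary error pattern with flip probability p < 0.5, where guessing shell by shell in increasing Hamming weight is optimal. In OSD the ordered bits are not identically distributed, so this only illustrates the Hamming-shell idea and is not the paper's bound; the brute-force function is a cross-check for small n.

```python
from itertools import product
from math import comb


def guesswork_iid(n: int, p: float) -> float:
    """Expected number of guesses for an i.i.d. Bernoulli(p) error pattern of
    length n (assumes p < 0.5), guessing Hamming shells in increasing weight,
    i.e. in decreasing probability."""
    expected = 0.0
    start = 0  # number of patterns guessed before the current shell
    for w in range(n + 1):
        size = comb(n, w)
        prob_each = p**w * (1 - p) ** (n - w)
        # Patterns in a shell are equiprobable, so the mean rank inside it
        # is start + (size + 1) / 2.
        expected += size * prob_each * (start + (size + 1) / 2)
        start += size
    return expected


def guesswork_bruteforce(n: int, p: float) -> float:
    """Cross-check for small n: enumerate all 2**n patterns and sort them."""
    probs = sorted(
        (p ** sum(e) * (1 - p) ** (n - sum(e)) for e in product((0, 1), repeat=n)),
        reverse=True,
    )
    return sum(rank * q for rank, q in enumerate(probs, start=1))
```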
- [270] arXiv:2403.18489 [pdf, ps, other]
-
Title: Impact of Employing Weather Forecast Data as Input to the Estimation of Evapotranspiration by Deep Neural Network ModelsComments: A partial version of the work submitted to ESRE/INTERNATIONAL CONFERENCE ON ENVIRONMENTAL SCIENCES AND RENEWABLE ENERGYSubjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Reference Evapotranspiration (ET0) is a key parameter for designing smart irrigation schedules, since it is related to a crop's water needs by a coefficient. The United Nations Food and Agriculture Organization proposed a standard method for ET0 computation (FAO56-PM), based on a parameterization of the Penman-Monteith equation, which is widely adopted in the literature. To compute ET0 with the FAO56-PM method, four main weather parameters are needed: temperature, humidity, wind, and solar radiation (SR). One way to estimate daily ET0 for future days is to use freely available weather forecast services (WFSs), which estimate many meteorological parameters up to 15 days ahead. A problem with this approach is that most of these online services do not currently provide SR as a free forecast parameter, and such forecasts usually come at a financial cost. For this reason, several ET0 estimation models based on machine and deep learning have been developed and presented in the literature that use as input features a reduced set of carefully selected weather parameters compatible with common, freely available WFSs. However, most studies on this topic have evaluated model performance only with data from weather stations (WSs), without considering the effect of using weather forecast data. In this study, the performance of the authors' previous models is evaluated using weather forecast data from two online WFSs in the following scenarios: (i) direct ET0 estimation by an ANN model, and (ii) SR estimation by an ANN model, with that estimate then used to compute ET0 with the FAO56-PM method. Using data collected from two WFSs and a WS located in Vale do Lobo, Portugal, the latter approach achieved the best result, with a coefficient of determination (R2) ranging between 0.893 and 0.667 for forecasts up to 15 days ahead.
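The FAO56-PM method computes daily ET0 from temperature, humidity, wind, and net radiation. Below is a minimal sketch of the standard daily form of the equation; it is not the authors' ANN models. It assumes a daily time step, soil heat flux G = 0 by default, saturation vapour pressure averaged from Tmax and Tmin, and actual vapour pressure `ea` supplied by the caller. Units: °C, kPa, MJ m⁻² day⁻¹ for net radiation `rn`, m s⁻¹ for wind speed at 2 m, metres for altitude; the result is in mm day⁻¹.

```python
import math


def sat_vapour_pressure(t_c: float) -> float:
    """Saturation vapour pressure e°(T) in kPa for air temperature T in °C."""
    return 0.6108 * math.exp(17.27 * t_c / (t_c + 237.3))


def fao56_pm_et0(t_max, t_min, ea, rn, u2, altitude_m, g=0.0):
    """Daily reference evapotranspiration (mm/day), FAO-56 Penman-Monteith."""
    t_mean = (t_max + t_min) / 2.0
    es = (sat_vapour_pressure(t_max) + sat_vapour_pressure(t_min)) / 2.0
    # Slope of the saturation vapour pressure curve at the mean temperature.
    delta = 4098.0 * sat_vapour_pressure(t_mean) / (t_mean + 237.3) ** 2
    # Atmospheric pressure from altitude, then the psychrometric constant.
    pressure = 101.3 * ((293.0 - 0.0065 * altitude_m) / 293.0) ** 5.26
    gamma = 0.665e-3 * pressure
    numerator = (0.408 * delta * (rn - g)
                 + gamma * (900.0 / (t_mean + 273.0)) * u2 * (es - ea))
    denominator = delta + gamma * (1.0 + 0.34 * u2)
    return numerator / denominator
```

For example, `fao56_pm_et0(21.5, 12.3, 1.409, 13.28, 2.078, 100.0)` evaluates to about 3.9 mm/day for these illustrative inputs. In scenario (ii) of the study, an ANN estimate of SR would feed the net-radiation term; the conversion from SR to net radiation is a separate FAO-56 step not shown here.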
- [271] arXiv:2403.18490 [pdf, other]
-
Title: I2CKD : Intra- and Inter-Class Knowledge Distillation for Semantic SegmentationSubjects: Computer Vision and Pattern Recognition (cs.CV)
This paper proposes a new knowledge distillation method tailored for image semantic segmentation, termed Intra- and Inter-Class Knowledge Distillation (I2CKD). The focus of this method is on capturing and transferring knowledge between the intermediate layers of teacher (cumbersome model) and student (compact model). For knowledge extraction, we exploit class prototypes derived from feature maps. To facilitate knowledge transfer, we employ a triplet loss in order to minimize intra-class variances and maximize inter-class variances between teacher and student prototypes. Consequently, I2CKD enables the student to better mimic the feature representation of the teacher for each class, thereby enhancing the segmentation performance of the compact network. Extensive experiments on three segmentation datasets, i.e., Cityscapes, Pascal VOC and CamVid, using various teacher-student network pairs demonstrate the effectiveness of the proposed method.
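The prototype-plus-triplet idea can be sketched as follows: class prototypes are obtained by masked average pooling of a feature map, and a triplet loss pulls each student prototype toward the teacher prototype of the same class while pushing it away from teacher prototypes of other classes. The Euclidean distance, the margin, using every other class as a negative, and the assumption that teacher and student features share the same channel dimension and label resolution are all illustrative choices, not details confirmed by the abstract.

```python
import numpy as np


def class_prototypes(features, labels, num_classes):
    """Masked average pooling. features: (C, H, W); labels: (H, W) class ids.
    Returns (num_classes, C); classes absent from the map get a zero row."""
    c = features.shape[0]
    flat = features.reshape(c, -1)  # (C, H*W)
    lab = labels.reshape(-1)
    protos = np.zeros((num_classes, c))
    for k in range(num_classes):
        mask = lab == k
        if mask.any():
            protos[k] = flat[:, mask].mean(axis=1)
    return protos


def prototype_triplet_loss(student, teacher, margin=1.0):
    """Anchor = student prototype k, positive = teacher prototype k,
    negatives = teacher prototypes of the other classes (needs >= 2 classes)."""
    num_classes = student.shape[0]
    # d[i, j] = Euclidean distance between student_i and teacher_j.
    d = np.linalg.norm(student[:, None, :] - teacher[None, :, :], axis=-1)
    losses = [
        max(0.0, d[k, k] - d[k, j] + margin)
        for k in range(num_classes)
        for j in range(num_classes)
        if j != k
    ]
    return float(np.mean(losses))
```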
- [272] arXiv:2403.18491 [pdf, other]
-
Title: Algorithmic Details behind the Predator Shape AnalyserComments: Book chapter previewSubjects: Software Engineering (cs.SE); Programming Languages (cs.PL)
This chapter, which is an extended and revised version of the conference paper 'Predator: Byte-Precise Verification of Low-Level List Manipulation', concentrates on a detailed description of the algorithms behind the Predator shape analyser based on abstract interpretation and symbolic memory graphs. Predator is particularly suited for formal analysis and verification of sequential non-recursive C code that uses low-level pointer operations to manipulate various kinds of linked lists of unbounded size as well as various other kinds of pointer structures of bounded size. The tool supports practically relevant forms of pointer arithmetic, block operations, address alignment, or memory reinterpretation. We present the overall architecture of the tool, along with selected implementation details of the tool as well as its extension into so-called Predator Hunting Party, which utilises multiple concurrently-running Predator analysers with various restrictions on their behaviour. Results of experiments with Predator within the SV-COMP competition as well as on our own benchmarks are provided.
- [273] arXiv:2403.18493 [pdf, other]
-
Title: VersaT2I: Improving Text-to-Image Models with Versatile RewardAuthors: Jianshu Guo, Wenhao Chai, Jie Deng, Hsiang-Wei Huang, Tian Ye, Yichen Xu, Jiawei Zhang, Jenq-Neng Hwang, Gaoang WangSubjects: Computer Vision and Pattern Recognition (cs.CV)
Recent text-to-image (T2I) models have benefited from large-scale and high-quality data, demonstrating impressive performance. However, these T2I models still struggle to produce images that are aesthetically pleasing, geometrically accurate, faithful to text, and of good low-level quality. We present VersaT2I, a versatile training framework that can boost the performance of any T2I model using multiple rewards. We decompose image quality into several aspects, such as aesthetics, text-image alignment, geometry, and low-level quality. Then, for every quality aspect, we select high-quality images generated by the model in that aspect as the training set to fine-tune the T2I model using Low-Rank Adaptation (LoRA). Furthermore, we introduce a gating function to combine multiple quality aspects, which can avoid conflicts between different quality aspects. Our method is easy to extend and does not require any manual annotation, reinforcement learning, or model architecture changes. Extensive experiments demonstrate that VersaT2I outperforms the baseline methods across various quality criteria.
- [274] arXiv:2403.18494 [pdf, other]
-
Title: Learning in PINNs: Phase transition, total diffusion, and generalizationAuthors: Sokratis J. Anagnostopoulos, Juan Diego Toscano, Nikolaos Stergiopulos, George Em KarniadakisSubjects: Machine Learning (cs.LG)
We investigate the learning dynamics of fully-connected neural networks through the lens of gradient signal-to-noise ratio (SNR), examining the behavior of first-order optimizers like Adam in non-convex objectives. By interpreting the drift/diffusion phases in the information bottleneck theory, focusing on gradient homogeneity, we identify a third phase termed "total diffusion", characterized by equilibrium in the learning rates and homogeneous gradients. This phase is marked by an abrupt SNR increase, uniform residuals across the sample space and the most rapid training convergence. We propose a residual-based re-weighting scheme to accelerate this diffusion in quadratic loss functions, enhancing generalization. We also explore the information compression phenomenon, pinpointing a significant saturation-induced compression of activations at the total diffusion phase, with deeper layers experiencing negligible information loss. Supported by experimental data on physics-informed neural networks (PINNs), which underscore the importance of gradient homogeneity due to their PDE-based sample inter-dependence, our findings suggest that recognizing phase transitions could refine ML optimization strategies for improved generalization.
- [275] arXiv:2403.18495 [pdf, other]
-
Title: Direct mineral content prediction from drill core images via transfer learningAuthors: Romana Boiger, Sergey V. Churakov, Ignacio Ballester Llagaria, Georg Kosakowski, Raphael Wüst, Nikolaos I. PrasianakisSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Deep subsurface exploration is important for mining, oil and gas industries, as well as in the assessment of geological units for the disposal of chemical or nuclear waste, or the viability of geothermal energy systems. Typically, detailed examinations of subsurface formations or units are performed on cuttings or core materials extracted during drilling campaigns, as well as on geophysical borehole data, which provide detailed information about the petrophysical properties of the rocks. Depending on the volume of rock samples and the analytical program, the laboratory analysis and diagnostics can be very time-consuming. This study investigates the potential of utilizing machine learning, specifically convolutional neural networks (CNN), to assess the lithology and mineral content solely from analysis of drill core images, aiming to support and expedite the subsurface geological exploration. The paper outlines a comprehensive methodology, encompassing data preprocessing, machine learning methods, and transfer learning techniques. The outcome reveals a remarkable 96.7% accuracy in the classification of drill core segments into distinct formation classes. Furthermore, a CNN model was trained for the evaluation of mineral content using a learning data set from multidimensional log analysis data (silicate, total clay, carbonate). When benchmarked against laboratory XRD measurements on samples from the cores, both the advanced multidimensional log analysis model and the neural network approach developed here provide equally good performance. This work demonstrates that deep learning and particularly transfer learning can support extracting petrophysical properties, including mineral content and formation classification, from drill core images, thus offering a road map for enhancing model performance and data set quality in image-based analysis of drill cores.
- [276] arXiv:2403.18497 [pdf, other]
-
Title: Minimum sum vertex cover: kernelization and parameterized algorithmsSubjects: Data Structures and Algorithms (cs.DS)
Given an ordering of the vertices of a graph, the cost of covering an edge is the smaller of the positions of its two endpoints. The minimum sum vertex cover problem asks for an ordering that minimizes the total cost of covering all edges. We consider the parameterized complexity of this problem, using the largest cost $k$ of covering a single edge as the parameter. Note that the first $k$ vertices form a (not necessarily minimal) vertex cover of the graph, and the ordering of vertices after position $k$ is irrelevant. We present a $(2k^2 + 3k)$-vertex kernel and an $O(|E(G)| + 2^k k! k^4)$-time algorithm for the minimum sum vertex cover problem.
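To make the objective concrete, the sketch below evaluates the cost of an ordering, the parameter k (the largest cost of a single edge), and an exact optimum by exhaustive search. Brute force is exponential and only meant for tiny graphs; it is not the kernelization or the parameterized algorithm from the paper, and the function names are hypothetical.

```python
from itertools import permutations


def cover_cost(order, edges):
    """Total cost of an ordering: each edge pays the 1-based position of its
    earlier endpoint."""
    pos = {v: i + 1 for i, v in enumerate(order)}
    return sum(min(pos[u], pos[v]) for u, v in edges)


def largest_edge_cost(order, edges):
    """The parameter k: the largest cost paid by a single edge."""
    pos = {v: i + 1 for i, v in enumerate(order)}
    return max(min(pos[u], pos[v]) for u, v in edges)


def msvc_brute_force(vertices, edges):
    """Exact optimum by trying every ordering (feasible only for tiny graphs)."""
    best = min(permutations(vertices), key=lambda order: cover_cost(order, edges))
    return cover_cost(best, edges), best
```

For the path a-b-c, placing b first covers both edges at cost 1 each, for a total of 2 and k = 1; as noted in the abstract, the first k vertices ({b}) form a vertex cover.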
- [277] arXiv:2403.18504 [pdf, ps, other]
-
Title: AcTED: Automatic Acquisition of Typical Event Duration for Semi-supervised Temporal Commonsense QAAuthors: Felix Virgo, Fei Cheng, Lis Kanashiro Pereira, Masayuki Asahara, Ichiro Kobayashi, Sadao KurohashiSubjects: Computation and Language (cs.CL)
We propose a voting-driven semi-supervised approach to automatically acquire the typical duration of an event and use it as pseudo-labeled data. The human evaluation demonstrates that our pseudo labels exhibit surprisingly high accuracy and balanced coverage. In the temporal commonsense QA task, experimental results show that using only pseudo examples of 400 events, we achieve performance comparable to the existing BERT-based weakly supervised approaches that require a significant amount of training examples. When compared to the RoBERTa baselines, our best approach establishes state-of-the-art performance with a 7% improvement in Exact Match.
- [278] arXiv:2403.18506 [pdf, other]
-
Title: Faster Convergence for Transformer Fine-tuning with Line Search MethodsSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Recent works have shown that line search methods greatly improve the performance of traditional stochastic gradient descent methods on a variety of datasets and architectures [1], [2]. In this work, we extend line search methods to the novel and highly popular Transformer architecture and to dataset domains in natural language processing. More specifically, we combine the Armijo line search with the Adam optimizer and extend it by subdividing the network's architecture into sensible units and performing the line search separately on these local units. Our optimization method outperforms the traditional Adam optimizer and achieves significant performance improvements for small datasets or small training budgets, while performing equally well or better in the other tested cases. Our work is publicly available as a Python package, which provides a hyperparameter-free PyTorch optimizer that is compatible with arbitrary network architectures.
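The Armijo condition accepts a step size only if it yields a sufficient decrease of the loss. A minimal sketch of backtracking Armijo line search with plain steepest-descent directions on a toy quadratic follows; the constants (eta0 = 1, c = 1e-4, beta = 0.5) are conventional defaults assumed here, and the paper's combination with Adam and its per-unit search are not reproduced.

```python
import numpy as np


def armijo_step_size(f, grad_fx, x, direction, eta0=1.0, c=1e-4, beta=0.5,
                     max_backtracks=50):
    """Backtracking line search along x - eta * direction.

    Shrinks eta by `beta` until the Armijo sufficient-decrease condition
        f(x - eta * d) <= f(x) - c * eta * <grad f(x), d>
    holds, or until max_backtracks shrink steps have been tried.
    """
    fx = f(x)
    slope = float(np.dot(grad_fx, direction))
    eta = eta0
    for _ in range(max_backtracks):
        if f(x - eta * direction) <= fx - c * eta * slope:
            break
        eta *= beta
    return eta


def gradient_descent_armijo(f, grad, x0, iterations=200):
    """Steepest descent where every step size is chosen by Armijo backtracking."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iterations):
        g = grad(x)
        x = x - armijo_step_size(f, g, x, g) * g
    return x
```

When the search direction comes from Adam rather than the raw gradient, it should still satisfy ⟨∇f(x), d⟩ > 0 for a successful Armijo test to guarantee a decrease in the loss.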
- [279] arXiv:2403.18508 [pdf, ps, other]
-
Title: On Propositional Dynamic Logic and ConcurrencySubjects: Logic in Computer Science (cs.LO)
Dynamic logic in the setting of concurrency has proved problematic because of the challenge of capturing interleaving. This challenge stems from the fact that the operational semantics for programs considered in these logics is tailored to trace reasoning for sequential programs. In this work, we generalise propositional dynamic logic (PDL) to a logic framework we call operational propositional dynamic logic (OPDL), in which we are able to reason about sets of programs provided with arbitrary operational semantics. We prove cut-elimination and adequacy of a sequent calculus for PDL and we extend these results to OPDL. We conclude by discussing OPDL for Milner's CCS and Choreographic Programming.
- [280] arXiv:2403.18509 [pdf, ps, other]
-
Title: Distributed Maximum Consensus over Noisy LinksComments: 5 pages, 7 figures, submitted to EUSIPCO 2024 conferenceSubjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Signal Processing (eess.SP)
We introduce a distributed algorithm, termed noise-robust distributed maximum consensus (RD-MC), for estimating the maximum value within a multi-agent network in the presence of noisy communication links. Our approach entails redefining the maximum consensus problem as a distributed optimization problem, allowing a solution using the alternating direction method of multipliers. Unlike existing algorithms that rely on multiple sets of noise-corrupted estimates, RD-MC employs a single set, enhancing both robustness and efficiency. To further mitigate the effects of link noise and improve robustness, we apply moving averaging to the local estimates. Through extensive simulations, we demonstrate that RD-MC is significantly more robust to communication link noise compared to existing maximum-consensus algorithms.
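For context, the sketch below shows the classic noise-free max-consensus iteration (each agent keeps the maximum of its own and its neighbours' estimates) on a ring, and what happens when received values carry additive noise: because the max operator retains upward noise excursions, estimates drift above the true maximum. This is the baseline problem RD-MC addresses, not RD-MC itself (which uses ADMM and moving averaging); the ring topology, Gaussian noise model, and seeds are assumptions.

```python
import numpy as np


def max_consensus(values, neighbours, iterations, noise_std=0.0, seed=0):
    """Synchronous max-consensus: every agent keeps the largest of its own
    estimate and the estimates received from its neighbours. Each received
    value is corrupted by additive Gaussian noise with std `noise_std`."""
    rng = np.random.default_rng(seed)
    x = np.asarray(values, dtype=float).copy()
    for _ in range(iterations):
        new_x = x.copy()
        for i, nbrs in enumerate(neighbours):
            received = [x[j] + noise_std * rng.standard_normal() for j in nbrs]
            new_x[i] = max([x[i], *received])
        x = new_x
    return x


def ring(n):
    """Neighbour lists of an undirected ring with n agents."""
    return [[(i - 1) % n, (i + 1) % n] for i in range(n)]
```

Without noise, six agents with values 0 to 5 agree on the true maximum (5) after three iterations, the ring's diameter; with `noise_std = 0.1` and fifty iterations the estimates drift above 5.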
- [281] arXiv:2403.18512 [pdf, other]
-
Title: ParCo: Part-Coordinating Text-to-Motion SynthesisSubjects: Computer Vision and Pattern Recognition (cs.CV)
We study a challenging task: text-to-motion synthesis, which aims to generate motions that align with textual descriptions and exhibit coordinated movements. Currently, part-based methods introduce part partition into the motion synthesis process to achieve finer-grained generation. However, these methods encounter challenges such as the lack of coordination between different part motions and difficulties for networks to understand part concepts. Moreover, introducing finer-grained part concepts poses computational complexity challenges. In this paper, we propose Part-Coordinating Text-to-Motion Synthesis (ParCo), endowed with enhanced capabilities for understanding part motions and communication among different part motion generators, ensuring coordinated and fine-grained motion synthesis. Specifically, we discretize whole-body motion into multiple part motions to establish the prior concept of different parts. Afterward, we employ multiple lightweight generators designed to synthesize different part motions and coordinate them through our part coordination module. Our approach demonstrates superior performance with economical computation on common benchmarks, including HumanML3D and KIT-ML, providing substantial evidence of its effectiveness. Code is available at https://github.com/qrzou/ParCo .
- [282] arXiv:2403.18513 [pdf, other]
-
Title: Realizing temporal transportation treesSubjects: Data Structures and Algorithms (cs.DS)
In this paper, we study the complexity of the periodic temporal graph realization problem with respect to upper bounds on the fastest path durations among its vertices. This constraint with respect to upper bounds appears naturally in transportation network design applications where, for example, a road network is given, and the goal is to appropriately schedule periodic travel routes while not exceeding some desired upper bounds on the travel times. This approach is in contrast to verification applications of graph realization problems, where exact values for the distances (respectively, fastest travel times) are given, following some kind of precise measurement. In our work, we focus only on underlying tree topologies, which are fundamental in many transportation network applications.
As it turns out, the periodic upper-bounded temporal tree realization problem (TTR) has a very different computational complexity behavior than both (i) the classic graph realization problem with respect to shortest path distances in static graphs and (ii) the periodic temporal graph realization problem with exact given fastest travel times (which was recently introduced). First, we prove that, surprisingly, TTR is NP-hard, even for a constant period $\Delta$ and when the input tree $G$ satisfies at least one of the following conditions: (a) $G$ has a constant diameter, or (b) $G$ has constant maximum degree. In contrast, when we are given exact values of the fastest travel delays, the problem is known to be solvable in polynomial time. Second, we prove that TTR is fixed-parameter tractable (FPT) with respect to the number of leaves in the input tree $G$, via a novel combination of techniques for totally unimodular matrices and mixed integer linear programming.
- [283] arXiv:2403.18517 [pdf, other]
-
Title: Efficient Algorithms for Regularized Nonnegative Scale-invariant Low-rank Approximation Models
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA); Optimization and Control (math.OC)
Regularized nonnegative low-rank approximations such as sparse Nonnegative Matrix Factorization or sparse Nonnegative Tucker Decomposition are an important branch of dimensionality reduction models with enhanced interpretability. However, from a practical perspective, the choice of regularizers and regularization coefficients, as well as the design of efficient algorithms, is challenging because of the multifactor nature of these models and the lack of theory to back these choices. This paper aims to address these issues. By studying a more general model called the Homogeneous Regularized Scale-Invariant model, we prove that the scale invariance inherent to low-rank approximation models causes an implicit regularization with both unexpected beneficial and detrimental effects. This observation helps to better understand the effect of regularization functions in low-rank approximation models, to guide the choice of the regularization hyperparameters, and to design balancing strategies that enhance the convergence speed of dedicated optimization algorithms. Some of these results were already known but restricted to specific instances of regularized low-rank approximations. We also derive a generic Majorization-Minimization algorithm, with convergence guarantees, that handles many regularized nonnegative low-rank approximations. We showcase our contributions on sparse Nonnegative Matrix Factorization, ridge-regularized Canonical Polyadic decomposition, and sparse Nonnegative Tucker Decomposition.
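As a hedged illustration of what a majorization-minimization (MM) update looks like in this setting, the sketch below uses the classical multiplicative updates for l1-penalized (sparse) NMF, which are known MM steps and therefore never increase the objective. It is not the paper's generic algorithm; the function name and parameters are hypothetical.

```python
import numpy as np


def sparse_nmf(X, rank, lam=0.1, iters=200, seed=0, eps=1e-12):
    """Minimize 0.5*||X - W H||_F^2 + lam*sum(H) over W, H >= 0 with
    multiplicative (majorization-minimization) updates."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, rank)) + 0.1
    H = rng.random((rank, n)) + 0.1
    losses = []
    for _ in range(iters):
        # H-step: the l1 penalty contributes lam to the denominator
        H *= (W.T @ X) / (W.T @ W @ H + lam + eps)
        # W-step: unpenalized least-squares multiplicative update
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        losses.append(0.5 * np.linalg.norm(X - W @ H) ** 2 + lam * H.sum())
    return W, H, losses
```

Because only H is penalized, the objective can be lowered by rescaling (W, H) to (cW, H/c) with c > 1 without changing the product WH; this is the kind of scale-invariance effect the abstract refers to, and rebalancing the factors is the kind of strategy it mentions.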
- [284] arXiv:2403.18519 [pdf, other]
-
Title: Improving Line Search Methods for Large Scale Neural Network Training
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
In recent studies, line search methods have significantly improved the performance of traditional stochastic gradient descent techniques, eliminating the need for a specific learning rate schedule. In this paper, we identify existing issues in state-of-the-art line search methods, propose enhancements, and rigorously evaluate their effectiveness. We test these methods on larger datasets and more complex data domains than before. Specifically, we improve the Armijo line search by integrating the momentum term from Adam into its search direction, enabling efficient large-scale training, a task that was previously prone to failure with Armijo line search methods. Our optimization approach outperforms both the previous Armijo implementation and tuned learning rate schedules for Adam. Our evaluation focuses on Transformers and CNNs in the domains of NLP and image data. Our work is publicly available as a Python package, which provides a hyperparameter-free PyTorch optimizer.
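The idea of combining an Adam-style search direction with an Armijo step-size rule can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' package: the function names are hypothetical, and falling back to the negative gradient when the momentum direction is not a descent direction is an assumption of this sketch that keeps the Armijo condition meaningful.

```python
import math


def armijo_step(f, grad_f, x, d, t0=1.0, c=1e-4, beta=0.5, max_iter=50):
    """Backtracking Armijo search along direction d; returns (new_x, step)."""
    fx = f(x)
    slope = sum(gi * di for gi, di in zip(grad_f(x), d))  # directional derivative
    t = t0
    for _ in range(max_iter):
        x_new = [xi + t * di for xi, di in zip(x, d)]
        if f(x_new) <= fx + c * t * slope:  # sufficient-decrease condition
            return x_new, t
        t *= beta
    return x, 0.0  # no acceptable step found; keep the current point


def adam_armijo(f, grad_f, x, steps=100, b1=0.9, b2=0.999, eps=1e-8):
    """Hypothetical optimizer: Adam-style direction, Armijo-chosen step size."""
    m = [0.0] * len(x)
    v = [0.0] * len(x)
    for k in range(1, steps + 1):
        g = grad_f(x)
        m = [b1 * mi + (1 - b1) * gi for mi, gi in zip(m, g)]
        v = [b2 * vi + (1 - b2) * gi * gi for vi, gi in zip(v, g)]
        m_hat = [mi / (1 - b1 ** k) for mi in m]  # bias correction
        v_hat = [vi / (1 - b2 ** k) for vi in v]
        d = [-mh / (math.sqrt(vh) + eps) for mh, vh in zip(m_hat, v_hat)]
        # Momentum can point uphill; fall back to steepest descent in that case
        if sum(gi * di for gi, di in zip(g, d)) >= 0:
            d = [-gi for gi in g]
        x, _ = armijo_step(f, grad_f, x, d)
    return x
```

Since every accepted step satisfies the Armijo condition with a descent direction, the objective never increases, whatever the momentum history.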
- [285] arXiv:2403.18520 [pdf, other]
-
Title: Global convergence of iterative solvers for problems of nonlinear magnetostatics
Subjects: Numerical Analysis (math.NA)
We consider the convergence of iterative solvers for problems of nonlinear magnetostatics. Using the equivalence to an underlying minimization problem, we establish global linear convergence for a large class of methods, including the damped Newton method, fixed-point iteration, and the Kacanov iteration, all of which can be interpreted as generalized gradient descent methods. Armijo backtracking is considered for an adaptive choice of the step size. The general assumptions required for our analysis cover inhomogeneous, nonlinear, and anisotropic materials, as well as permanent magnets. The main results are proven on the continuous level, but they carry over almost verbatim to various approximation schemes, including finite elements and isogeometric analysis, leading to bounds on the number of iterations that are independent of the particular discretization. The theoretical results are illustrated by numerical tests for a typical benchmark problem.
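One of the method classes named above, a damped Newton method with Armijo backtracking applied to a convex energy, can be illustrated on a 1D finite-difference toy problem. The energy density W(s) = s^2/2 + s^4/4, the grid, and all function names are hypothetical assumptions; this is not a magnetostatics model and not the paper's code, only the minimization-plus-line-search structure the abstract describes.

```python
import numpy as np


# Hypothetical convex, nonlinear energy density and its derivatives
def W(s):
    return 0.5 * s**2 + 0.25 * s**4


def dW(s):
    return s + s**3


def d2W(s):
    return 1.0 + 3.0 * s**2


def difference_matrix(n, h):
    """Forward differences of interior values u_1..u_n with u_0 = u_{n+1} = 0."""
    D = np.zeros((n + 1, n))
    for i in range(n):
        D[i, i] = 1.0 / h
        D[i + 1, i] = -1.0 / h
    return D


def damped_newton(f, h, tol=1e-10, max_iter=50, c=1e-4, beta=0.5):
    """Minimize E(u) = h*sum W(Du) - h*f.u by Newton steps with Armijo backtracking."""
    D = difference_matrix(f.size, h)

    def energy(u):
        return h * np.sum(W(D @ u)) - h * (f @ u)

    def gradient(u):
        return h * (D.T @ dW(D @ u)) - h * f

    u = np.zeros(f.size)
    for k in range(max_iter):
        g = gradient(u)
        if np.linalg.norm(g) < tol:
            break
        H = h * (D.T @ (d2W(D @ u)[:, None] * D))  # symmetric positive definite
        d = np.linalg.solve(H, -g)                  # Newton direction
        t = 1.0
        # Armijo backtracking: shrink t until sufficient decrease holds
        while energy(u + t * d) > energy(u) + c * t * (g @ d) and t > 1e-12:
            t *= beta
        u = u + t * d
    return u, k, np.linalg.norm(gradient(u))
```

The full Newton step (t = 1) is tried first, so the iteration typically switches to fast local convergence once backtracking is no longer needed.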
- [286] arXiv:2403.18524 [pdf, other]
-
Title: Bridging the Gap: Regularized Reinforcement Learning for Improved Classical Motion Planning with Safety Modules
Comments: 8 pages
Subjects: Robotics (cs.RO)
Classical navigation planners can provide safe navigation, albeit often suboptimally and with limited compliance with human norms. Contemporary ML-based autonomous navigation algorithms can imitate more natural, human-compliant navigation, but they usually require large and realistic datasets and do not always provide safety guarantees. We present an approach that leverages a classical algorithm to guide reinforcement learning (RL). This greatly improves the results and convergence rate of the underlying RL algorithm and requires no human-expert demonstrations to jump-start the process. Additionally, we incorporate a practical fallback system that can switch back to a classical planner to ensure safety. The outcome is a sample-efficient ML approach for mobile navigation that builds on classical algorithms, improves them to ensure human compliance, and guarantees safety.
- [287] arXiv:2403.18525 [pdf, other]
-
Title: Language Plays a Pivotal Role in the Object-Attribute Compositional Generalization of CLIP
Comments: Accepted as an oral presentation at OODCV 2023 (this http URL)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
Vision-language models, such as CLIP, have shown promising out-of-distribution (OoD) generalization under various types of distribution shifts. Recent studies have investigated the leading cause of this capability. In this work, we follow the same path but focus on a specific type of OoD data, namely images with novel compositions of attribute-object pairs, and study whether such models can successfully classify those images into composition classes. We carefully designed an authentic image test dataset called ImageNet-AO, consisting of attributes for objects that are unlikely to be encountered in the CLIP training sets. We found that CLIPs trained with large datasets such as OpenAI CLIP, LAION-400M, and LAION-2B show orders-of-magnitude improvement in effective compositional OoD generalization compared to both supervised models and CLIPs trained with smaller datasets, such as CC-12M and YFCC-15M. Our results provide evidence that the scale and diversity of training data and language supervision play a key role in unlocking the compositional generalization abilities of vision-language models.
- [288] arXiv:2403.18527 [pdf, other]
-
Title: Wirtinger gradient descent methods for low-dose Poisson phase retrieval
Subjects: Numerical Analysis (math.NA); Optimization and Control (math.OC)
The problem of phase retrieval has many applications in the field of optical imaging. Motivated by imaging experiments with biological specimens, we primarily consider the setting of low-dose illumination, where Poisson noise plays the dominant role. In this paper, we discuss gradient descent algorithms based on different loss functions adapted to data affected by Poisson noise, in particular in the low-dose regime. Starting from the maximum log-likelihood function for the Poisson distribution, we investigate different regularizations and approximations of the problem to design an algorithm that meets the requirements encountered in applications. In doing so, we focus on low-count measurements. For all suggested loss functions, we study the convergence of the respective gradient descent algorithms to stationary points and find constant step sizes that guarantee descent of the loss in each iteration. Numerical experiments in the low-dose regime are performed to corroborate the theoretical observations.
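A Wirtinger gradient step for a Poisson log-likelihood loss can be sketched with NumPy. Assumptions (not taken from the paper): the loss sum(|Ax|^2 - y log(|Ax|^2 + eps)) with a fixed regularization eps, a random complex measurement matrix, and a conservative constant step derived from a crude Lipschitz bound of this toy loss; the paper studies several losses and derives its own step sizes. Function names are hypothetical.

```python
import numpy as np


def poisson_loss(x, A, y, eps):
    """eps-regularized negative Poisson log-likelihood (up to constants)."""
    p = np.abs(A @ x) ** 2
    return np.sum(p - y * np.log(p + eps))


def wirtinger_grad(x, A, y, eps):
    """Gradient of the loss with respect to conj(x): A^H [(1 - y/(|z|^2+eps)) z]."""
    z = A @ x
    p = np.abs(z) ** 2
    return A.conj().T @ ((1.0 - y / (p + eps)) * z)


def wirtinger_descent(A, y, x0, eps=1.0, iters=300):
    # Conservative constant step from a crude Lipschitz bound of this toy loss;
    # it keeps the loss non-increasing (hypothetical choice, not the paper's).
    mu = 1.0 / (np.linalg.norm(A, 2) ** 2 * (1.0 + y.max() / eps))
    x = x0.copy()
    losses = [poisson_loss(x, A, y, eps)]
    for _ in range(iters):
        x = x - mu * wirtinger_grad(x, A, y, eps)
        losses.append(poisson_loss(x, A, y, eps))
    return x, losses
```

Taking the step along the conjugate (Wirtinger) derivative is equivalent to ordinary gradient descent on the real and imaginary parts with half the step, which is why real-variable smoothness bounds apply.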
- [289] arXiv:2403.18536 [pdf, ps, other]
-
Title: A Novel Behavior-Based Recommendation System for E-commerce
Authors: Reza Barzegar Nozari, Mahdi Divsalar, Sepehr Akbarzadeh Abkenar, Mohammadreza Fadavi Amiri, Ali Divsalar
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
The majority of existing recommender systems rely on user ratings, which are limited by the lack of user collaboration and by data sparsity. To address these issues, this study proposes a behavior-based recommender system that leverages customers' natural behaviors, such as browsing and clicking, on e-commerce platforms. The proposed recommendation system clusters active customers, determines neighborhoods, collects similar users, calculates product reputation based on similar users, and recommends high-reputation products. To cope with the complexity of customer behaviors and the limitations of traditional clustering methods, an unsupervised clustering approach based on product categories is developed to enhance the recommendation methodology. This study makes contributions in several respects. First, a new behavior-based recommendation methodology is developed that incorporates customer behavior to generate accurate and tailored recommendations, leading to improved customer satisfaction and engagement. Second, an original unsupervised clustering method focusing on product categories enables more precise clustering and facilitates accurate recommendations. Finally, an approach to determine neighborhoods for active customers within clusters is established, which groups customers with similar behavioral patterns to enhance recommendation accuracy and relevance. The proposed recommendation methodology and clustering method improve recommendation performance, offering valuable insights for researchers and practitioners in the field of e-commerce recommendation systems. In experiments on a behavior dataset from the e-commerce site Alibaba, the proposed method outperforms benchmark methods.
- [290] arXiv:2403.18537 [pdf, ps, other]
-
Title: A Path Towards Legal Autonomy: An interoperable and explainable approach to extracting, transforming, loading and computing legal information using large language models, expert systems and Bayesian networks
Authors: Axel Constant, Hannes Westermann, Bryan Wilson, Alex Kiefer, Ines Hipolito, Sylvain Pronovost, Steven Swanson, Mahault Albarracin, Maxwell J.D. Ramstead
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computers and Society (cs.CY); Logic in Computer Science (cs.LO)
Legal autonomy, the lawful activity of artificial intelligence agents, can be achieved in one of two ways: either by imposing constraints on AI actors such as developers, deployers, and users, and on AI resources such as data, or by imposing constraints on the range and scope of the impact that AI agents can have on the environment. The latter approach involves encoding extant rules concerning AI-driven devices into the software of the AI agents controlling those devices (e.g., encoding rules about limitations on zones of operation into the agent software of an autonomous drone). This is challenging, since the effectiveness of such an approach requires a method of extracting, transforming, loading, and computing legal information that is both explainable and legally interoperable, and that enables AI agents to reason about the law. In this paper, we sketch a proof of principle for such a method using large language models (LLMs), expert legal systems known as legal decision paths, and Bayesian networks. We then show how the proposed method could be applied to extant regulation on autonomous cars, such as the California Vehicle Code.
- [291] arXiv:2403.18539 [pdf, other]
-
Title: Safe and Robust Reinforcement-Learning: Principles and Practice
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
Reinforcement Learning (RL) has shown remarkable success in solving relatively complex tasks, yet the deployment of RL systems in real-world scenarios poses significant challenges related to safety and robustness. This paper aims to identify and further understand those challenges through an exploration of the main dimensions of the safe and robust RL landscape, encompassing algorithmic, ethical, and practical considerations. We conduct a comprehensive review of methodologies and open problems that summarizes recent efforts to address the inherent risks associated with RL applications.
After discussing and proposing definitions for both safe and robust RL, the paper categorizes existing research works into different algorithmic approaches that enhance the safety and robustness of RL agents. We examine techniques such as uncertainty estimation, optimisation methodologies, exploration-exploitation trade-offs, and adversarial training. Environmental factors, including sim-to-real transfer and domain adaptation, are also scrutinized to understand how RL systems can adapt to diverse and dynamic surroundings. Moreover, human involvement is an integral ingredient of the analysis, acknowledging the broad set of roles that humans can take in this context.
Importantly, to aid practitioners in navigating the complexities of safe and robust RL implementation, this paper introduces a practical checklist derived from the synthesized literature. The checklist encompasses critical aspects of algorithm design, training environment considerations, and ethical guidelines. It is intended to serve as a resource for developers and policymakers alike to ensure the responsible deployment of RL systems in many application domains.
- [292] arXiv:2403.18542 [pdf, other]
-
Title: Attention-aware semantic relevance predicting Chinese sentence reading
Authors: Kun Sun
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
In recent years, several influential computational models and metrics have been proposed to predict how humans comprehend and process sentences. One particularly promising approach is contextual semantic similarity. Inspired by the attention algorithm in the Transformer and by human memory mechanisms, this study proposes an "attention-aware" approach for computing contextual semantic relevance. This new approach takes into account the different contributions of contextual parts and the expectation effect, allowing it to fully incorporate contextual information. The attention-aware approach also makes it possible to simulate and evaluate existing reading models. The resulting "attention-aware" metrics of semantic relevance predict fixation durations in Chinese reading tasks, recorded in an eye-tracking corpus, more accurately than metrics calculated by existing approaches. The study's findings further provide strong support for the presence of semantic preview benefits in Chinese naturalistic reading. Furthermore, the attention-aware metrics of semantic relevance, being memory-based, are highly interpretable from both linguistic and cognitive standpoints, making them a valuable computational tool for modeling eye movements in reading and for gaining further insight into the process of language comprehension. Our approach underscores the potential of these metrics to advance our understanding of how humans comprehend and process language.
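One way to picture an attention-style contextual relevance score is a weighted average of cosine similarities between a word and its neighbors, with softmax weights that favor similar and nearby words (including following words, in the spirit of preview effects). The weighting scheme, window size, and function name below are hypothetical illustrations, not the metric defined in the paper.

```python
import numpy as np


def attention_relevance(emb, i, window=2, temperature=1.0):
    """Hypothetical attention-weighted semantic relevance of word i.

    Each context word j within `window` positions of i (before or after)
    contributes its cosine similarity to word i; weights are a softmax over
    those similarities, scaled down by 1/(1 + distance).
    """
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    lo, hi = max(0, i - window), min(len(emb), i + window + 1)
    ctx = [j for j in range(lo, hi) if j != i]
    sims = np.array([unit[i] @ unit[j] for j in ctx])
    dist = np.array([abs(i - j) for j in ctx], dtype=float)
    logits = sims / temperature - np.log1p(dist)  # exp(logit) = exp(sim/T)/(1+d)
    weights = np.exp(logits - logits.max())       # numerically stable softmax
    weights /= weights.sum()
    return float(weights @ sims)
```

A word surrounded by identical embeddings scores 1, and one orthogonal to all of its neighbors scores 0; real metrics would use contextual word embeddings from a trained model.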
- [293] arXiv:2403.18545 [pdf, other]
-
Title: Optimal Resource Efficiency with Fairness in Heterogeneous GPU Clusters
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Ensuring the highest training throughput to maximize resource efficiency, while maintaining fairness among users, is critical for deep learning (DL) training in heterogeneous GPU clusters. However, current DL schedulers provide only limited fairness properties and suboptimal training throughput, impeding tenants from effectively leveraging heterogeneous resources. The underlying design challenge stems from inherent conflicts between efficiency and fairness properties.
In this paper, we introduce OEF, a new resource allocation framework specifically developed for achieving optimal resource efficiency and ensuring diverse fairness properties in heterogeneous GPU clusters. By integrating resource efficiency and fairness within a global optimization framework, OEF is capable of providing users with maximized overall efficiency, as well as various guarantees of fairness, in both cooperative and non-cooperative environments. We have implemented OEF in a cluster resource manager and conducted large-scale experiments, showing that OEF can improve the overall training throughput by up to 32% while improving fairness compared to state-of-the-art heterogeneity-aware schedulers.
- [294] arXiv:2403.18546 [pdf, other]
-
Title: Efficient Heatmap-Guided 6-Dof Grasp Detection in Cluttered Scenes
Comments: Extensive results on GraspNet-1B dataset
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Fast and robust object grasping in clutter is a crucial component of robotics. Most current works use the whole observed point cloud for 6-DoF grasp generation, ignoring the guidance information extracted from global semantics, which limits high-quality grasp generation and real-time performance. In this work, we show that the usefulness of the widely used heatmaps for efficient 6-DoF grasp generation has been underestimated. We therefore propose an effective local grasp generator combined with grasp heatmaps as guidance, which infers in a global-to-local, semantic-to-point manner. Specifically, Gaussian encoding and a grid-based strategy are applied to predict grasp heatmaps that serve as guidance, aggregating local points into graspable regions and providing global semantic information. Further, a novel non-uniform anchor sampling mechanism is designed to improve grasp accuracy and diversity. Benefiting from high-efficiency encoding in the image space and from focusing on points in local graspable regions, our framework performs high-quality grasp detection in real time and achieves state-of-the-art results. In addition, real robot experiments demonstrate the effectiveness of our method, with a success rate of 94% and a clutter completion rate of 100%. Our code is available at https://github.com/THU-VCLab/HGGD.
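Gaussian heatmap encoding of grasp locations can be sketched as follows. Assumptions: grasp centers are given as pixel coordinates, overlapping Gaussians are merged by an element-wise maximum, and peaks are read out with a simple 3x3 local-maximum filter, which is a swapped-in technique rather than the paper's grid-based strategy. Names and parameters are hypothetical.

```python
import numpy as np


def gaussian_heatmap(height, width, centers, sigma=2.0):
    """Encode grasp centers (row, col) as a heatmap: each center contributes a
    2D Gaussian with peak value 1; overlaps are merged by element-wise max."""
    rows = np.arange(height)[:, None]
    cols = np.arange(width)[None, :]
    heat = np.zeros((height, width))
    for r, c in centers:
        g = np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2.0 * sigma ** 2))
        heat = np.maximum(heat, g)
    return heat


def local_peaks(heat, threshold=0.5):
    """Cells that are >= threshold and maximal within their 3x3 neighbourhood."""
    padded = np.pad(heat, 1, constant_values=-np.inf)
    h, w = heat.shape
    neigh = np.stack([padded[dr:dr + h, dc:dc + w]
                      for dr in range(3) for dc in range(3)])
    mask = (heat >= neigh.max(axis=0)) & (heat >= threshold)
    return [tuple(int(v) for v in p) for p in np.argwhere(mask)]
```

In a learned pipeline, the encoded heatmap would be the training target of a network and the peak readout would be applied to the network's prediction to select regions whose local points are then processed further.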
- [295] arXiv:2403.18547 [pdf, other]
-
Title: Neural Architecture Search for Sentence Classification with BERT
Subjects: Artificial Intelligence (cs.AI)
Pre-training language models on large text corpora is common practice in Natural Language Processing. These models are then fine-tuned to achieve the best results on a variety of tasks. In this paper, we question the common practice of adding only a single output layer as a classification head on top of the network. We perform an AutoML search to find architectures that outperform the current single layer at only a small compute cost. We validate our classification architecture on a variety of NLP tasks from the GLUE benchmark.
- [296] arXiv:2403.18548 [pdf, other]
-
Title: A Semi-supervised Nighttime Dehazing Baseline with Spatial-Frequency Aware and Realistic Brightness Constraint
Comments: This paper is accepted by CVPR2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Existing research based on deep learning has extensively explored the problem of daytime image dehazing. However, few studies have considered the characteristics of nighttime hazy scenes. Nighttime haze differs from daytime haze in two ways. First, nighttime scenes may contain multiple active colored light sources with lower illumination intensity, which may cause haze, glow, and noise with localized, coupled, and frequency-inconsistent characteristics. Second, due to the domain discrepancy between simulated and real-world data, unrealistic brightness may occur when a dehazing model trained on simulated data is applied to real-world data. To address these two issues, we propose a semi-supervised model for real-world nighttime dehazing. First, spatial attention and frequency-spectrum filtering are implemented as a spatial-frequency domain information interaction module to handle the first issue. Second, a pseudo-label-based retraining strategy and a local window-based brightness loss are designed for the semi-supervised training process to suppress haze and glow while achieving realistic brightness. Experiments on public benchmarks validate the effectiveness of the proposed method and its superiority over state-of-the-art methods. The source code and Supplementary Materials are available at https://github.com/Xiaofeng-life/SFSNiD.
- [297] arXiv:2403.18550 [pdf, other]
-
Title: OrCo: Towards Better Generalization via Orthogonality and Contrast for Few-Shot Class-Incremental Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Few-Shot Class-Incremental Learning (FSCIL) introduces a paradigm in which the problem space expands with limited data. FSCIL methods inherently face the challenge of catastrophic forgetting as data arrives incrementally, making models susceptible to overwriting previously acquired knowledge. Moreover, given the scarcity of labeled samples available at any given time, models may be prone to overfitting and may struggle to strike a balance between extensive pretraining and the limited incremental data. To address these challenges, we propose the OrCo framework, built on two core principles: orthogonality of features in the representation space, and contrastive learning. In particular, we improve the generalization of the embedding space by employing a combination of supervised and self-supervised contrastive losses during the pretraining phase. Additionally, we introduce the OrCo loss to address challenges arising from data limitations during incremental sessions. Through feature space perturbations and orthogonality between classes, the OrCo loss maximizes margins and reserves space for subsequent incremental data. This, in turn, ensures the accommodation of incoming classes in the feature space without compromising previously acquired knowledge. Our experimental results show state-of-the-art performance on three benchmark datasets: mini-ImageNet, CIFAR100, and CUB. Code is available at https://github.com/noorahmedds/OrCo
- [298] arXiv:2403.18551 [pdf, other]
-
Title: Attention Calibration for Disentangled Text-to-Image Personalization
Comments: Accepted to CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Recent progress in large-scale text-to-image (T2I) models has enabled unprecedented synthesis quality of AI-generated content (AIGC), including image generation, 3D, and video composition. Further, personalization techniques enable customized generation of a novel concept given only a few reference images. However, an intriguing problem remains: is it possible to capture multiple novel concepts from a single reference image? In this paper, we identify that existing approaches fail to preserve visual consistency with the reference image and to eliminate cross-influence between concepts. To alleviate this, we propose an attention calibration mechanism to improve the concept-level understanding of the T2I model. Specifically, we first introduce new learnable modifiers bound with classes to capture attributes of multiple concepts. Then, the classes are separated and strengthened following the activation of the cross-attention operation, ensuring comprehensive and self-contained concepts. Additionally, we suppress the attention activation of different classes to mitigate mutual influence among concepts. Together, our proposed method, dubbed DisenDiff, can learn multiple disentangled concepts from a single image and produce novel customized images with the learned concepts. We demonstrate that our method outperforms the current state of the art in both qualitative and quantitative evaluations. More importantly, our proposed techniques are compatible with LoRA and inpainting pipelines, enabling more interactive experiences.
- [299] arXiv:2403.18552 [pdf, other]
-
Title: Generalized convergence of the deep BSDE method: a step towards fully-coupled FBSDEs and applications in stochastic control
Comments: 25 pages, 3 figures, 1 table
Subjects: Numerical Analysis (math.NA); Optimization and Control (math.OC)
We are concerned with high-dimensional coupled FBSDE systems approximated by the deep BSDE method of Han et al. (2018). It was shown by Han and Long (2020) that the errors induced by the deep BSDE method admit an a posteriori estimate depending on the loss function, whenever the backward equation only couples into the forward diffusion through the Y process. We generalize this result to fully-coupled drift coefficients and give sufficient conditions for convergence under standard assumptions. The resulting conditions are directly verifiable for any equation. Consequently, unlike earlier theory, our convergence analysis enables the treatment of FBSDEs stemming from stochastic optimal control problems. In particular, we provide a theoretical justification for the non-convergence of the deep BSDE method observed in recent literature, and present direct guidelines for when convergence can be guaranteed in practice. Our theoretical findings are supported by several numerical experiments in high-dimensional settings.
- [300] arXiv:2403.18554 [pdf, other]
-
Title: CosalPure: Learning Concept from Group Images for Robust Co-Saliency Detection
Comments: 8 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Co-salient object detection (CoSOD) aims to identify the common and salient (usually foreground) regions across a given group of images. Despite significant progress, state-of-the-art CoSOD methods can easily be affected by adversarial perturbations, leading to substantial accuracy reduction. Adversarial perturbations can mislead CoSOD methods but do not change the high-level semantic information (e.g., the concept) of the co-salient objects. In this paper, we propose a novel robustness enhancement framework that first learns the concept of the co-salient objects from the input group images and then leverages this concept to purify adversarial perturbations; the purified images are subsequently fed to CoSOD methods for robustness enhancement. Specifically, we propose CosalPure, which contains two modules: group-image concept learning and concept-guided diffusion purification. For the first module, we adopt a pre-trained text-to-image diffusion model to learn the concept of co-salient objects within group images, where the learned concept is robust to adversarial examples. For the second module, we map the adversarial image to the latent space and then perform diffusion generation by embedding the learned concept into the noise prediction function as an extra condition. Our method can effectively alleviate the influence of a state-of-the-art adversarial attack with different adversarial patterns, including exposure and noise. Extensive results demonstrate that our method significantly enhances the robustness of CoSOD methods.
- [301] arXiv:2403.18555 [pdf, other]
-
Title: Debiasing Sentence Embedders through Contrastive Word Pairs
Subjects: Computation and Language (cs.CL)
In recent years, various sentence embedders have been integral to the success of current machine learning approaches to Natural Language Processing (NLP). Unfortunately, multiple sources have shown that these embedding methods learn the bias inherent in the datasets on which they are trained. A variety of approaches to remove biases from embeddings exists in the literature. Most of these approaches are applicable to word embeddings and, in fewer cases, to sentence embeddings. This is problematic because most debiasing approaches are transferred directly from word embeddings and therefore fail to take into account the nonlinear nature of sentence embedders and the embeddings they produce. It has been shown in the literature that bias information is still present when sentence embeddings are debiased using such methods. In this contribution, we explore an approach to remove linear and nonlinear bias information for NLP solutions without impacting downstream performance. We compare our approach to common debiasing methods on classical bias metrics and on bias metrics that take nonlinear information into account.
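For contrast with the nonlinear approach described above, the common linear baseline that the abstract argues is insufficient can be sketched in a few lines: estimate a bias direction from contrastive pairs and project it out. This is not the authors' method; the function names are hypothetical, and taking the top singular vector of the pair differences is one common choice among several.

```python
import numpy as np


def bias_direction(pairs):
    """Unit bias direction: top right-singular vector of the stacked
    difference vectors a - b of contrastive pairs (a, b)."""
    diffs = np.array([a - b for a, b in pairs])
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    return vt[0]


def project_out(embeddings, direction):
    """Remove each row's component along `direction` (linear debiasing)."""
    d = direction / np.linalg.norm(direction)
    return embeddings - np.outer(embeddings @ d, d)
```

After projection the embeddings carry no linear signal along the estimated direction, but a nonlinear probe may still recover the attribute, which is the gap the abstract points to.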
- [302] arXiv:2403.18561 [pdf, other]
-
Title: A Dynamic Programming Approach for Road Traffic Estimation
Subjects: Systems and Control (eess.SY)
We consider a road network represented by a directed graph. We assume that many measurements of traffic flows are collected on all the network arcs, or on a subset of them. We assume that the users are divided into different groups, each following a different path. The flows of all user groups are modeled as a set of independent Poisson processes. Our focus is on estimating the path followed by each user group and the means of the associated Poisson processes. We present a possible solution based on a Dynamic Programming algorithm. The method relies on knowledge of high-order cumulants. We discuss the theoretical properties of the introduced method. Finally, we present numerical tests on well-known benchmark networks using synthetic data.
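The reason high-order cumulants carry path information can be made concrete. Under the stated model (independent Poisson group flows, each group using a fixed set of arcs), every joint cumulant of arc flows equals the sum of the rates of the groups whose paths contain all arcs involved, because all cumulants of a Poisson variable equal its mean and cross-cumulants of independent variables vanish. The sketch below computes these cumulants and checks the second-order ones by simulation; it is not the paper's Dynamic Programming algorithm, and the names are hypothetical.

```python
import numpy as np
from itertools import combinations_with_replacement


def theoretical_cumulants(paths, rates, arcs, order):
    """Joint cumulant of arc flows for every multiset of `order` arcs:
    the sum of rates of groups whose path contains all of those arcs."""
    return {combo: sum(lam for path, lam in zip(paths, rates) if set(combo) <= path)
            for combo in combinations_with_replacement(arcs, order)}


def simulate_flows(paths, rates, arcs, samples, seed=0):
    """Sample arc flows: each group's count is Poisson and adds to every arc it uses."""
    rng = np.random.default_rng(seed)
    counts = rng.poisson(rates, size=(samples, len(rates)))  # per-group counts
    incidence = np.array([[a in p for p in paths] for a in arcs], dtype=float)
    return counts @ incidence.T  # shape (samples, number of arcs)


arcs = ["a", "b", "c"]
paths = [{"a", "b"}, {"b", "c"}]
rates = [2.0, 3.0]
print(theoretical_cumulants(paths, rates, arcs, 2))
```

Higher orders work the same way (for example, the third-order cumulant for arcs a, b, b here equals 2.0), so comparing empirical and theoretical cumulants constrains which arcs appear together on a group's path.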
- [303] arXiv:2403.18564 [pdf, ps, other]
-
Title: Formal Verification with Constrained Polynomial Logical Zonotope
Subjects: Systems and Control (eess.SY); Logic in Computer Science (cs.LO)
In this paper, we propose using constrained polynomial logical zonotopes for formal verification of logical systems. We perform reachability analysis to compute the set of states that could be reached. To do this, we utilize a recently introduced set representation called polynomial logical zonotopes for performing computationally efficient and exact reachability analysis on logical systems. Notably, polynomial logical zonotopes address the "curse of dimensionality" when analyzing the reachability of logical systems since the set representation can represent 2^n binary vectors using n generators. After finishing the reachability analysis, the formal verification involves verifying whether the intersection of the calculated reachable set and the unsafe set is empty or not. However, polynomial logical zonotopes are not closed under intersections. To address this, we formulate constrained polynomial logical zonotopes, which maintain the computational efficiency and exactness of polynomial logical zonotopes for reachability analysis while supporting exact intersections. Furthermore, we present an extensive empirical study illustrating and verifying the benefits of using constrained polynomial logical zonotopes for the formal verification of logical systems.
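The claim that n generators can represent 2^n binary vectors can be illustrated with a plain logical zonotope, i.e., all XOR combinations of generators applied to a center. The sketch below is deliberately naive: it enumerates the set explicitly, which is exactly the exponential cost the paper's set representation avoids, and it omits the polynomial (product) terms and constraints of the constrained polynomial variant. Function names are hypothetical.

```python
from itertools import product


def xor(u, v):
    """Element-wise XOR of two binary tuples."""
    return tuple(a ^ b for a, b in zip(u, v))


def logical_zonotope(center, generators):
    """Enumerate {center XOR (b_1 AND g_1) XOR ... XOR (b_k AND g_k) : b in {0,1}^k}."""
    points = set()
    for bits in product((0, 1), repeat=len(generators)):
        x = center
        for b, g in zip(bits, generators):
            if b:
                x = xor(x, g)
        points.add(x)
    return points


def is_safe(reachable, unsafe):
    """Verification step: the reachable set must not intersect the unsafe set."""
    return reachable.isdisjoint(unsafe)


Z = logical_zonotope((0, 0, 0), [(1, 0, 0), (0, 1, 0), (0, 0, 1)])
print(len(Z))
```

With the three unit generators, the set covers all eight binary vectors of length 3; the exact intersection needed for the emptiness check is what constrained polynomial logical zonotopes provide without enumeration.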
- [304] arXiv:2403.18565 [pdf, other]
-
Title: Artifact Reduction in 3D and 4D Cone-beam Computed Tomography Images with Deep Learning -- A Review
Comments: 16 pages, 4 figures, 1 Table, published in IEEE Access Journal
Journal-ref: IEEE Access, vol. 12, pp. 10281-10295, 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Deep learning based approaches have been used to improve image quality in cone-beam computed tomography (CBCT), a medical imaging technique often used in applications such as image-guided radiation therapy, implant dentistry, or orthopaedics. In particular, while deep learning methods have been applied to reduce various types of CBCT image artifacts arising from motion, metal objects, or low-dose acquisition, a comprehensive review summarizing the successes and shortcomings of these approaches, with a primary focus on the type of artifacts rather than the architecture of neural networks, is lacking in the literature. In this review, the data generation and simulation pipelines, and artifact reduction techniques are specifically investigated for each type of artifact. We provide an overview of deep learning techniques that have successfully been shown to reduce artifacts in 3D, as well as in time-resolved (4D) CBCT through the use of projection- and/or volume-domain optimizations, or by introducing neural networks directly within the CBCT reconstruction algorithms. Research gaps are identified to suggest avenues for future exploration. One of the key findings of this work is an observed trend towards the use of generative models, including GANs and score-based or diffusion models, accompanied by the need for more diverse and open training datasets and simulations.
- [305] arXiv:2403.18569 [pdf, other]
-
Title: PDNNet: PDN-Aware GNN-CNN Heterogeneous Network for Dynamic IR Drop Prediction
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
IR drop on the power delivery network (PDN) is closely related to the PDN's configuration and to cell current consumption. As integrated circuit (IC) designs grow larger, dynamic IR drop simulation becomes computationally unaffordable, and machine learning-based IR drop prediction has been explored as a promising solution. Although CNN-based methods have been adapted to the IR drop prediction task in several works, their shortcoming of overlooking the PDN configuration is non-negligible. In this paper, we consider not only how to properly represent the cell-PDN relation, but also how to model IR drop following its physical nature in the feature aggregation procedure. To this end, we propose a novel graph structure, PDNGraph, to unify the representations of the PDN structure and the fine-grained cell-PDN relation. We further propose a dual-branch heterogeneous network, PDNNet, incorporating two parallel GNN-CNN branches to capture these features during learning. Several key designs are presented to make dynamic IR drop prediction highly effective and interpretable. This is the first work to apply a graph structure to deep learning-based dynamic IR drop prediction. Experiments show that PDNNet outperforms state-of-the-art CNN-based methods with up to a 39.3% reduction in prediction error and achieves a 545x speedup compared to the commercial tool, which demonstrates the superiority of our method.
- [306] arXiv:2403.18570 [pdf, other]
-
Title: Physics-Informed Graph Neural Networks for Water Distribution Systems
Comments: Extended version of the paper with the same title published at Proceedings of the AAAI Conference on Artificial Intelligence 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Water distribution systems (WDS) are an integral part of critical infrastructure and pivotal to urban development. As 70% of the world's population will likely live in urban environments by 2050, efficient simulation and planning tools for WDS play a crucial role in reaching the UN's Sustainable Development Goal (SDG) 6, "Clean water and sanitation for all". In this context, we propose a novel and efficient machine learning emulator, more precisely a physics-informed deep learning (DL) model, for hydraulic state estimation in WDS. Using a recursive approach, our model needs only a few graph convolutional neural network (GCN) layers and employs an innovative algorithm based on message passing. Unlike conventional machine learning tasks, the model uses hydraulic principles to infer two additional hydraulic state features while reconstructing the available ground-truth feature in an unsupervised manner. To the best of our knowledge, this is the first DL approach to emulate the popular hydraulic simulator EPANET without using any additional information. Like most DL models and unlike the hydraulic simulator, our model achieves vastly faster emulation times that do not increase drastically with the size of the WDS. Moreover, we achieve high accuracy on the ground truth and results very similar to those of the hydraulic simulator, as demonstrated through experiments on five real-world WDS datasets.
- [307] arXiv:2403.18571 [pdf, ps, other]
-
Title: Bootstrapping Guarantees: Stability and Performance Analysis for Dynamic Encrypted Control
Subjects: Systems and Control (eess.SY); Cryptography and Security (cs.CR); Optimization and Control (math.OC)
Encrypted dynamic controllers that operate for an unlimited time have been a challenging subject of research. The fundamental difficulty is the accumulation of errors and scaling factors in the internal state during operation. Bootstrapping, a technique commonly employed in fully homomorphic cryptosystems, can be used to avoid overflows in the controller state but can potentially introduce significant numerical errors. In this paper, we analyze dynamic encrypted control with explicit consideration of bootstrapping. By recognizing the bootstrapping errors occurring in the controller's state as an uncertainty in the robust control framework, we can provide stability and performance guarantees for the whole encrypted control system. Further, the conservatism of the stability and performance test is reduced by using a lifted version of the control system.
- [308] arXiv:2403.18572 [pdf, ps, other]
-
Title: ACES: Evaluating Automated Audio Captioning Models on the Semantics of Sounds
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
Automated Audio Captioning is a multimodal task that aims to convert audio content into natural language. The assessment of audio captioning systems is typically based on quantitative metrics applied to text data. Previous studies have employed metrics derived from machine translation and image captioning to evaluate the quality of generated audio captions. Drawing inspiration from auditory cognitive neuroscience research, we introduce a novel metric approach -- Audio Captioning Evaluation on Semantics of Sound (ACES). ACES takes into account how human listeners parse semantic information from sounds, providing a novel and comprehensive evaluation perspective for automated audio captioning systems. ACES combines semantic similarities and semantic entity labeling. ACES outperforms similar automated audio captioning metrics on the Clotho-Eval FENSE benchmark in two evaluation categories.
- [309] arXiv:2403.18575 [pdf, other]
-
Title: HandBooster: Boosting 3D Hand-Mesh Reconstruction by Conditional Synthesis and Sampling of Hand-Object Interactions
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Reconstructing a 3D hand mesh robustly from a single image is very challenging due to the lack of diversity in existing real-world datasets. While data synthesis helps relieve the issue, the syn-to-real gap still hinders its usage. In this work, we present HandBooster, a new approach that increases data diversity and boosts 3D hand-mesh reconstruction performance by training a conditional generative space on hand-object interactions and purposely sampling that space to synthesize effective data samples. First, we construct versatile content-aware conditions to guide a diffusion model to produce realistic images with diverse hand appearances, poses, views, and backgrounds; conveniently, accurate 3D annotations are obtained for free. Then, we design a novel condition creator based on our similarity-aware distribution sampling strategies to deliberately find novel and realistic interaction poses that are distinct from those in the training set. Equipped with our method, several baselines can be significantly improved beyond the SOTA on the HO3D and DexYCB benchmarks. Our code will be released at https://github.com/hxwork/HandBooster_Pytorch.
- [310] arXiv:2403.18579 [pdf, other]
-
Title: On Optimizing Hyperparameters for Quantum Neural Networks
Subjects: Machine Learning (cs.LG); Emerging Technologies (cs.ET)
The increasing capabilities of Machine Learning (ML) models go hand in hand with an immense amount of data and computational power required for training. Therefore, training is usually outsourced to HPC facilities, where we have started to experience limits in scaling conventional HPC hardware, as theorized by Moore's law. Despite heavy parallelization and optimization efforts, current state-of-the-art ML models require weeks of training, which is associated with an enormous $CO_2$ footprint. Quantum Computing, and specifically Quantum Machine Learning (QML), can offer significant theoretical speed-ups and enhanced expressive power. However, training QML models requires tuning various hyperparameters, which is a nontrivial task, and suboptimal choices can strongly affect the trainability and performance of the models. In this study, we identify the most impactful hyperparameters and collect data about the performance of QML models. We compare different configurations and provide researchers with performance data and concrete suggestions for hyperparameter selection.
- [311] arXiv:2403.18580 [pdf, other]
-
Title: MisGUIDE: Defense Against Data-Free Deep Learning Model Extraction
Comments: Under Review
Subjects: Cryptography and Security (cs.CR)
The rise of Machine Learning as a Service (MLaaS) has led to the widespread deployment of machine learning models trained on diverse datasets. These models are employed for predictive services through APIs, raising concerns about the security and confidentiality of the models due to emerging vulnerabilities in prediction APIs. Of particular concern are model cloning attacks, where individuals with limited data and no knowledge of the training dataset manage to replicate a victim model's functionality through black-box query access. This commonly entails sending adversarial queries to the victim model, thereby creating a labeled dataset.
This paper proposes "MisGUIDE", a two-step defense framework for Deep Learning models that disrupts the adversarial sample generation process by providing a probabilistic response when a query is deemed out-of-distribution (OOD). The first step employs a Vision Transformer-based framework to identify OOD queries, while the second step perturbs the response for such queries, introducing a probabilistic loss function to MisGUIDE the attackers. The aim of the proposed defense method is to reduce the accuracy of the cloned model while maintaining accuracy on authentic queries. Extensive experiments conducted on two benchmark datasets demonstrate that the proposed framework significantly enhances resistance against state-of-the-art data-free model extraction in black-box settings.
- [312] arXiv:2403.18587 [pdf, other]
-
Title: The Impact of Uniform Inputs on Activation Sparsity and Energy-Latency Attacks in Computer Vision
Comments: Accepted at the DLSP 2024
Subjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Resource efficiency plays an important role in machine learning nowadays. Energy and decision latency are two critical aspects of ensuring a sustainable and practical application. Unfortunately, energy consumption and decision latency are not robust against adversaries. Researchers have recently demonstrated that attackers can compute and submit so-called sponge examples at inference time to increase the energy consumption and decision latency of neural networks. In computer vision, the proposed strategy crafts inputs with less activation sparsity, which could otherwise be used to accelerate the computation. In this paper, we analyze the mechanism by which these energy-latency attacks reduce activation sparsity. In particular, we find that input uniformity is a key enabler. A uniform image, that is, an image with mostly flat, uniformly colored surfaces, triggers more activations due to a specific interplay of convolution, batch normalization, and ReLU activation. Based on these insights, we propose two new simple yet effective strategies for crafting sponge examples: sampling images from a probability distribution and identifying dense yet inconspicuous inputs in natural datasets. We empirically examine our findings in a comprehensive evaluation with multiple image classification models and show that our attack achieves the same sparsity effect as prior sponge-example methods at a fraction of the computational effort. We also show that our sponge examples transfer between different neural networks. Finally, we discuss beneficial applications of our findings, namely improving efficiency by increasing sparsity.
- [313] arXiv:2403.18588 [pdf, other]
-
Title: From Virtual Reality to the Emerging Discipline of Perception Engineering
Authors: Steven M. LaValle, Evan G. Center, Timo Ojala, Matti Pouke, Nicoletta Prencipe, Basak Sakcak, Markku Suomalainen, Kalle G. Timperi, Vadim K. Weinstein
Comments: 30 pages, 5 figures
Journal-ref: Annu. Rev. Control Robot. Auton. Syst. v. 7, 2023
Subjects: Human-Computer Interaction (cs.HC); Systems and Control (eess.SY)
This paper makes the case that a powerful new discipline, which we term perception engineering, is steadily emerging. It follows from a progression of ideas that involve creating illusions, from historical paintings and film to video games and virtual reality in modern times. Rather than creating physical artifacts such as bridges, airplanes, or computers, perception engineers create illusory perceptual experiences. The scope is defined over any agent that interacts with the physical world, including both biological organisms (humans, animals) and engineered systems (robots, autonomous systems). The key idea is that an agent, called a producer, alters the environment with the intent to alter the perceptual experience of another agent, called a receiver. Most importantly, the paper introduces a precise mathematical formulation of this process, based on the von Neumann-Morgenstern notion of information, to help scope and define the discipline. It is then applied to the cases of engineered and biological agents, with discussion of its implications for existing fields such as virtual reality, robotics, and even social media. Finally, open challenges and opportunities for involvement are identified.
- [314] arXiv:2403.18591 [pdf, ps, other]
-
Title: Safety Verification of Wait-Only Non-Blocking Broadcast Protocols
Comments: Long version of a paper accepted to PetriNets 2024
Subjects: Logic in Computer Science (cs.LO); Multiagent Systems (cs.MA)
We study networks of processes that all execute the same finite protocol and communicate synchronously in two different ways: a process can broadcast one message to all other processes or send it to at most one other process. In both cases, if no process can receive the message, it is still sent. We establish the precise complexity of two coverability problems with a parameterised number of processes: the state coverability problem and the configuration coverability problem. It is already known that these problems are Ackermann-hard (but decidable) in the general case. We show that when the protocol is Wait-Only, i.e., it has no state from which a process can both send and receive messages, the complexity drops to P and PSPACE, respectively.
- [315] arXiv:2403.18593 [pdf, other]
-
Title: Homogeneous Tokenizer Matters: Homogeneous Visual Tokenizer for Remote Sensing Image Understanding
Comments: 20 pages, 8 figures, 6 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
The tokenizer, as one of the fundamental components of large models, has long been overlooked or even misunderstood in visual tasks. One key factor behind the great comprehension power of large language models is that natural language tokenizers use meaningful words or subwords as the basic elements of language. In contrast, mainstream visual tokenizers, represented by patch-based methods such as Patch Embed, rely on meaningless rectangular patches as the basic elements of vision, which cannot serve as effectively as words or subwords do in language. Starting from the essence of the tokenizer, we define semantically independent regions (SIRs) for vision. We design a simple HOmogeneous visual tOKenizer: HOOK. HOOK mainly consists of two modules: the Object Perception Module (OPM) and the Object Vectorization Module (OVM). To achieve homogeneity, the OPM splits the image into 4×4 pixel seeds and then uses an attention mechanism to perceive SIRs. The OVM employs cross-attention to merge seeds within the same SIR. To achieve adaptability, the OVM defines a variable number of learnable vectors as cross-attention queries, allowing the number of tokens to be adjusted. We conducted experiments on the NWPU-RESISC45 and WHU-RS19 classification datasets and the GID5 segmentation dataset, covering sparse and dense tasks. The results demonstrate that the visual tokens obtained by HOOK correspond to individual objects, which demonstrates homogeneity. HOOK outperformed Patch Embed by 6% and 10% in the two tasks and achieved state-of-the-art performance compared to the baselines used for comparison. Compared to Patch Embed, which requires more than one hundred tokens for one image, HOOK requires only 6 and 8 tokens for sparse and dense tasks, respectively, resulting in efficiency improvements of 1.5 to 2.8 times. The code is available at https://github.com/GeoX-Lab/Hook.
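The first step of the OPM, splitting an image into 4×4 pixel seeds, can be sketched with a NumPy reshape. This assumes a channels-last (H, W, C) array whose height and width are divisible by 4, and the function name is hypothetical. The attention-based perception of SIRs and the OVM's cross-attention merging are not shown, and the released code may organise seeds differently.

```python
import numpy as np


def split_into_seeds(image, seed=4):
    """Split an (H, W, C) image into non-overlapping seed x seed pixel seeds.

    Hypothetical helper. Returns an array of shape (H//seed * W//seed,
    seed*seed*C), one flattened vector per seed in row-major seed order.
    """
    h, w, c = image.shape
    if h % seed or w % seed:
        raise ValueError("H and W must be divisible by the seed size")
    x = image.reshape(h // seed, seed, w // seed, seed, c)
    x = x.transpose(0, 2, 1, 3, 4)          # (seed rows, seed cols, seed, seed, C)
    return x.reshape(-1, seed * seed * c)   # flatten each seed into one vector


img = np.arange(8 * 8 * 3).reshape(8, 8, 3)
seeds = split_into_seeds(img)
print(seeds.shape)  # (4, 48)
```

Each row of the result could then be embedded and fed to an attention layer; how HOOK groups seeds into SIRs is beyond this sketch.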
- [316] arXiv:2403.18599 [pdf, other]
-
Title: Proving correctness for SQL implementations of OCL constraints
Comments: 11 pages
Subjects: Databases (cs.DB)
In the context of the model-driven development of data-centric applications, OCL constraints play a major role in adding precision to the source models (e.g., data models and security models). Several code generators have been proposed to bridge the gap between source models with OCL constraints and their corresponding database implementations. However, the database queries produced by these code generators are significantly less efficient, in terms of execution-time performance, than implementations written manually by database experts. In this paper, we propose a different approach to bridging the gap between models with OCL constraints and their corresponding database implementations. In particular, we introduce a model-based methodology for proving the correctness of manually written SQL implementations of OCL constraints. This methodology is based on a novel mapping from a significant subset of the SQL language into many-sorted first-order logic. Moreover, by leveraging an existing mapping from the OCL language into many-sorted first-order logic, we can use SMT solvers to automatically prove the correctness of SQL implementations of OCL constraints. To illustrate and show the applicability of our approach, we include a number of non-trivial examples in the paper. Finally, we report on the status of a suite of tools supporting our approach.
- [317] arXiv:2403.18600 [pdf, other]
-
Title: RAP: Retrieval-Augmented Planner for Adaptive Procedure Planning in Instructional Videos
Comments: 23 pages, 6 figures, 12 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
Procedure planning in instructional videos entails generating a sequence of action steps based on visual observations of the initial and target states. Despite rapid progress on this task, several critical challenges remain: (1) Adaptive procedures: prior works make the unrealistic assumption that the number of action steps is known and fixed, leading to models that do not generalize to real-world scenarios where the sequence length varies. (2) Temporal relation: understanding the temporal relations between steps is essential for producing reasonable and executable plans. (3) Annotation cost: annotating instructional videos with step-level labels (i.e., timestamps) or sequence-level labels (i.e., action categories) is demanding and labor-intensive, limiting generalizability to large-scale datasets. In this work, we propose a new and practical setting, called adaptive procedure planning in instructional videos, where the procedure length is not fixed or predetermined. To address these challenges, we introduce the Retrieval-Augmented Planner (RAP) model. Specifically, for adaptive procedures, RAP adaptively determines the conclusion of actions using an auto-regressive model architecture. For temporal relation, RAP establishes an external memory module to explicitly retrieve the most relevant state-action pairs from the training videos and revises the generated procedures. To tackle high annotation cost, RAP uses weakly supervised learning to expand the training dataset to other task-relevant, unannotated videos by generating pseudo labels for action steps. Experiments on the CrossTask and COIN benchmarks show the superiority of RAP over traditional fixed-length models, establishing it as a strong baseline solution for adaptive procedure planning.
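To give a feel for the retrieval step, here is a hedged sketch of cosine-similarity top-k lookup over an external memory of state-action pairs. It assumes states are already embedded as vectors and actions are stored as plain labels. The memory layout, the similarity measure, and all names are illustrative assumptions rather than RAP's actual module, and the revision of generated procedures is not shown.

```python
import numpy as np


def retrieve_top_k(query, memory_keys, memory_values, k=2):
    """Return the k memory entries whose keys are most cosine-similar to query.

    Hypothetical helper. memory_keys: (N, D) array of state embeddings;
    memory_values: list of N action labels. Returns (action, score) pairs,
    best match first.
    """
    q = query / np.linalg.norm(query)
    keys = memory_keys / np.linalg.norm(memory_keys, axis=1, keepdims=True)
    scores = keys @ q                      # cosine similarity per memory entry
    top = np.argsort(-scores)[:k]          # indices of the k largest scores
    return [(memory_values[i], float(scores[i])) for i in top]


keys = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
values = ["pour water", "add sugar", "stir"]
print(retrieve_top_k(np.array([1.0, 0.1]), keys, values))
```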
- [318] arXiv:2403.18604 [pdf, other]
-
Title: Modeling Sustainable City Trips: Integrating CO2 Emissions, Popularity, and Seasonality into Tourism Recommender Systems
Subjects: Information Retrieval (cs.IR)
In an era of information overload and complex decision-making processes, Recommender Systems (RS) have emerged as indispensable tools across diverse domains, particularly travel and tourism. These systems simplify trip planning by offering personalized recommendations that consider individual preferences and address broader challenges like seasonality, travel regulations, and capacity constraints. The intricacies of the tourism domain, characterized by multiple stakeholders, including consumers, item providers, platforms, and society, underscore the complexity of achieving balance among diverse interests. Although previous research has focused on fairness in Tourism Recommender Systems (TRS) from a multistakeholder perspective, limited work has focused on generating sustainable recommendations.
Our paper introduces a novel approach for assigning a sustainability indicator (SF index) to city trips accessible from the user's starting point, integrating CO2e analysis, destination popularity, and seasonal demand. Our methodology involves comprehensive data gathering on transportation modes and emissions, complemented by analyses of destination popularity and seasonal demand. A user study validates our index, showcasing its practicality and efficacy in providing well-rounded and sustainable city trip recommendations. Our findings contribute significantly to the evolution of responsible tourism strategies, harmonizing the interests of tourists, local communities, and the environment, while paving the way for future research in responsible and equitable tourism practices.
- [319] arXiv:2403.18605 [pdf, other]
-
Title: FlexEdit: Flexible and Controllable Diffusion-based Object-centric Image Editing
Comments: Our project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Our work addresses limitations seen in previous approaches for object-centric editing problems, such as unrealistic results due to shape discrepancies and limited control in object replacement or insertion. To this end, we introduce FlexEdit, a flexible and controllable editing framework for objects where we iteratively adjust latents at each denoising step using our FlexEdit block. Initially, we optimize latents at test time to align with specified object constraints. Then, our framework employs an adaptive mask, automatically extracted during denoising, to protect the background while seamlessly blending new content into the target image. We demonstrate the versatility of FlexEdit in various object editing tasks and curate an evaluation test suite with samples from both real and synthetic images, along with novel evaluation metrics designed for object-centric editing. We conduct extensive experiments on different editing scenarios, demonstrating the superiority of our editing framework over recent advanced text-guided image editing methods. Our project page is published at https://flex-edit.github.io/.
- [320] arXiv:2403.18607 [pdf, other]
-
Title: Spikewhisper: Temporal Spike Backdoor Attacks on Federated Neuromorphic Learning over Low-power Devices
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
Federated neuromorphic learning (FedNL) leverages event-driven spiking neural networks and federated learning frameworks to effectively execute intelligent analysis tasks over large numbers of distributed low-power devices, but it is also vulnerable to poisoning attacks. The threat of backdoor attacks on traditional deep neural networks typically comes from time-invariant data. However, in FedNL, unknown threats may be hidden in time-varying spike signals. In this paper, we explore a novel vulnerability of FedNL-based systems based on the concept of time division multiplexing, termed Spikewhisper, which allows attackers to evade detection as much as possible, as multiple malicious clients can imperceptibly poison with different triggers at different timeslices. In particular, the stealthiness of Spikewhisper derives from the time-domain divisibility of global triggers, in which each malicious client pastes only one local trigger into a certain timeslice of the neuromorphic sample, and the polarity and motion of each local trigger can be configured by attackers. Extensive experiments on two different neuromorphic datasets demonstrate that the attack success rate of Spikewhisper is higher than that of temporally centralized attacks. Besides, it is validated that the effect of Spikewhisper is sensitive to the trigger duration.
- [321] arXiv:2403.18609 [pdf, ps, other]
-
Title: A survey on learning models of spiking neural membrane systems and spiking neural networks
Subjects: Neural and Evolutionary Computing (cs.NE); Computation and Language (cs.CL)
Spiking neural networks (SNN) are a biologically inspired model of neural networks with certain brain-like properties. In the past few decades, this model has received increasing attention in the computer science community, owing in part to the success of deep learning. In SNN, communication between neurons takes place through spikes and spike trains. This differentiates these models from "standard" artificial neural networks (ANN), where the frequency of spikes is replaced by real-valued signals. Spiking neural P systems (SNPS) can be considered a branch of SNN based more on the principles of formal automata, with many variants developed within the framework of membrane computing theory. In this paper, we first briefly compare the structure, function, advantages, and drawbacks of SNN and SNPS. A key part of the article is a survey of recent results and applications of machine learning and deep learning models of both SNN and SNPS formalisms.
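To illustrate the difference between spike-based and real-valued communication, the sketch below simulates a discrete-time leaky integrate-and-fire (LIF) neuron. LIF is a common textbook spiking model chosen here purely for illustration; the survey covers many other models, and SNPS are not shown. The leak factor `tau`, the threshold, and the reset value are assumed parameters. A constant real-valued input is turned into a binary spike train.

```python
def lif_spike_train(inputs, tau=0.9, threshold=1.0, v_reset=0.0):
    """Discrete-time leaky integrate-and-fire neuron (illustrative parameters).

    inputs: sequence of input currents, one per time step.
    Returns a binary spike train of the same length.
    """
    v = v_reset
    spikes = []
    for current in inputs:
        v = tau * v + current      # leak the membrane potential, then integrate
        if v >= threshold:
            spikes.append(1)       # emit a spike ...
            v = v_reset            # ... and reset the membrane potential
        else:
            spikes.append(0)
    return spikes


print(lif_spike_train([0.5] * 6))  # [0, 0, 1, 0, 0, 1]
```

The information a downstream neuron receives is carried by when (and how often) the 1s occur, rather than by the value 0.5 itself.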
- [322] arXiv:2403.18613 [pdf, ps, other]
-
Title: Scalable Lipschitz Estimation for CNNs
Subjects: Machine Learning (cs.LG)
Estimating the Lipschitz constant of deep neural networks is of growing interest, as it is informative about generalisability and adversarial robustness. Convolutional neural networks (CNNs) in particular underpin much of the recent success in computer vision applications. However, although existing methods for estimating the Lipschitz constant can be tight, they have limited scalability when applied to CNNs. To tackle this, we propose a novel method to accelerate Lipschitz constant estimation for CNNs. The core idea is to divide a large convolutional block, via a joint layer-wise and width-wise partition, into a collection of smaller blocks. We prove an upper bound on the Lipschitz constant of the larger block in terms of the Lipschitz constants of the smaller blocks. By varying the partition factor, the resulting method can be adjusted to prioritise either accuracy or scalability, and it permits parallelisation. We demonstrate enhanced scalability and comparable accuracy to existing baselines through a range of experiments.
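For context, the sketch below computes the classical layer-wise bound that finer methods improve on. It relies on three standard facts: Lip(f o g) <= Lip(f) * Lip(g), ReLU is 1-Lipschitz, and the l2 Lipschitz constant of x -> Wx is the spectral norm of W. A convolution is also a linear map, so it could in principle be written as such a matrix. This is only the naive product bound, not the paper's joint layer- and width-wise partition, and the function name is hypothetical.

```python
import numpy as np


def naive_lipschitz_upper_bound(weights):
    """Upper bound on the l2 Lipschitz constant of x -> W_k relu(... relu(W_1 x)).

    Hypothetical helper: multiplies the spectral norms of the weight matrices,
    using that ReLU is 1-Lipschitz and that composition multiplies constants.
    """
    bound = 1.0
    for w in weights:
        bound *= np.linalg.norm(w, ord=2)   # largest singular value of w
    return bound


rng = np.random.default_rng(0)
layers = [rng.standard_normal((16, 8)), rng.standard_normal((4, 16))]
print(naive_lipschitz_upper_bound(layers))
```

Such products are cheap but often very loose for deep networks, which is why tighter estimators exist and why their scalability matters.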
- [323] arXiv:2403.18616 [pdf, other]
-
Title: Will You Participate? Exploring the Potential of Robotics Competitions on Human-centric Topics
Journal-ref: International Conference on Human-Computer Interaction (HCII) 2024
Subjects: Human-Computer Interaction (cs.HC); Robotics (cs.RO)
This paper presents findings from an exploratory needfinding study investigating the current research status of the robotics community on four human-centric topics (safety, privacy, explainability, and federated learning) and its potential participation in competitions on these topics. We conducted a survey with 34 participants across three distinguished European robotics consortia, nearly 60% of whom had over five years of research experience in robotics. Our qualitative and quantitative analysis revealed that current mainstream robotics researchers prioritize safety and explainability, expressing a greater willingness to invest in further research in these areas. Conversely, our results indicate that privacy and federated learning garner less attention and are perceived to have lower potential. Additionally, the study suggests a lack of enthusiasm within the robotics community for participating in competitions related to these topics. Based on these findings, we recommend targeting other communities, such as the machine learning community, for future competitions related to these four human-centric topics.
- [324] arXiv:2403.18619 [pdf, other]
-
Title: Enhanced OpenMP Algorithm to Compute All-Pairs Shortest Path on x86 Architectures
Comments: Accepted for publication in Computer Science - CACIC 2023
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Graphs have become a key tool for modeling and solving problems in different areas. The Floyd-Warshall (FW) algorithm computes the shortest paths between all pairs of vertices in a graph and is employed in areas such as communication networking, traffic routing, and bioinformatics. However, FW is computationally and spatially expensive, since it requires O(n^3) operations and O(n^2) memory space. As the graph gets larger, parallel computing becomes necessary to provide a solution in an acceptable time. In this paper, we study an FW code developed for Xeon Phi KNL processors and adapt it to run on any Intel x86 processor, removing its dependence on the KNL architecture. To do so, we verify one by one the optimizations proposed in the original code, making adjustments to the base code where necessary, and analyze its performance on two Intel servers under different test scenarios. In addition, we propose a new optimization to increase the degree of concurrency of the parallel algorithm, implemented using two different synchronization mechanisms. The experimental results show that all optimizations were beneficial on the two selected x86 platforms. Finally, the new optimization improved performance by up to 23%.
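For reference, the sequential Floyd-Warshall recurrence that such codes start from can be sketched in Python as below; it shows where the O(n^3) operations and O(n^2) memory come from. This is the textbook triple loop, not the blocked, vectorized KNL code studied in the paper. A common parallelization strategy (an assumption here, not necessarily the paper's) keeps the k loop sequential and splits the i loop across threads, since row k and column k do not change during iteration k when the graph has no negative cycles.

```python
import math


def floyd_warshall(dist):
    """All-pairs shortest paths.

    dist: n x n list of edge weights, math.inf where there is no edge and 0 on
    the diagonal. Returns a new matrix; O(n^3) time and O(n^2) memory.
    """
    n = len(dist)
    d = [row[:] for row in dist]           # copy so the input is left untouched
    for k in range(n):                     # allow vertex k as an intermediate
        dk = d[k]
        for i in range(n):                 # rows are independent for a fixed k
            dik = d[i][k]
            if dik == math.inf:
                continue                   # no path i -> k, nothing to relax
            di = d[i]
            for j in range(n):
                if dik + dk[j] < di[j]:
                    di[j] = dik + dk[j]
    return d


INF = math.inf
graph = [[0, 3, INF, 7],
         [8, 0, 2, INF],
         [5, INF, 0, 1],
         [2, INF, INF, 0]]
print(floyd_warshall(graph))
# [[0, 3, 5, 6], [5, 0, 2, 3], [3, 6, 0, 1], [2, 5, 7, 0]]
```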
- [325] arXiv:2403.18621 [pdf, other]
-
Title: Performance Analysis of Integrated Sensing and Communication Networks with Blockage Effects
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Integrated communication and sensing is an up-and-coming area of research that enables wireless networks to perform communication and sensing tasks simultaneously. However, in urban cellular networks, blockage by buildings results in a complex signal propagation environment, which affects the performance analysis of integrated sensing and communication (ISAC) networks. To overcome this obstacle, this paper constructs a comprehensive framework that accounts for building blockage and employs a distance-correlated blockage model to analyze interference from line-of-sight (LoS), non-line-of-sight (NLoS), and target reflection cascading (TRC) links. Using stochastic geometry, expressions for the signal-to-interference-plus-noise ratio (SINR) and the coverage probability of communication and sensing in the presence of blockage are derived, allowing for a comprehensive comparison under the same parameters. The findings indicate that blockage can positively impact coverage, especially by enhancing communication performance. The analysis also suggests that, when the blockage is of the same order of magnitude as the base station (BS) density, there exists an optimal BS density that maximizes the communication or sensing coverage probability.
- [326] arXiv:2403.18622 [pdf, other]
-
Title: qIoV: A Quantum-Driven Internet-of-Vehicles-Based Approach for Environmental Monitoring and Rapid Response Systems
Subjects: Emerging Technologies (cs.ET); Networking and Internet Architecture (cs.NI)
This research addresses the critical need for advanced rapid response operations in managing a spectrum of environmental hazards. We propose a novel framework, qIoV, that integrates quantum computing with the Internet-of-Vehicles (IoV) to leverage the computational efficiency, parallelism, and entanglement properties of quantum mechanics. Our approach uses environmental sensors mounted on vehicles for precise air quality assessment. These sensors are designed to be highly sensitive and accurate, leveraging the principles of quantum mechanics to detect and measure environmental parameters. A salient feature of our proposal is the Quantum Mesh Network Fabric (QMF), a system designed to dynamically adjust the quantum network topology in accordance with vehicular movements. This capability is critical to maintaining the integrity of quantum states against environmental and vehicular disturbances, thereby ensuring reliable data transmission and processing. Moreover, our methodology is further augmented by the incorporation of a variational quantum classifier (VQC) with advanced quantum entanglement techniques. This integration significantly reduces the latency of hazard alert transmission, thus enabling expedited communication of crucial data to emergency response teams and the public. Our study on the IBM OpenQASM 3 platform, using a 127-qubit system, revealed significant advancements in pair plot analysis, achieving over 90% in precision, recall, and F1-score metrics and an 83% increase in the speed of toxic gas detection compared to conventional methods. Additionally, theoretical analyses validate the efficiency of quantum rotation, teleportation protocols, and the fidelity of quantum entanglement, further underscoring the potential of quantum computing in enhancing analytical performance.
- [327] arXiv:2403.18623 [pdf, other]
-
Title: Antitrust, Amazon, and Algorithmic AuditingAuthors: Abhisek Dash, Abhijnan Chakraborty, Saptarshi Ghosh, Animesh Mukherjee, Jens Frankenreiter, Stefan Bechtold, Krishna P. GummadiComments: The paper has been accepted to appear in the Journal of Institutional and Theoretical Economics (JITE) 2024Subjects: Computers and Society (cs.CY); Human-Computer Interaction (cs.HC); Information Retrieval (cs.IR)
In digital markets, antitrust law and special regulations aim to ensure that markets remain competitive despite the dominating role that digital platforms play today in everyone's life. Unlike traditional markets, market participant behavior is easily observable in these markets. We present a series of empirical investigations into the extent to which Amazon engages in practices that are typically described as self-preferencing. We discuss how the computer science tools used in this paper can be used in a regulatory environment that is based on algorithmic auditing and requires regulating digital markets at scale.
- [328] arXiv:2403.18624 [pdf, other]
-
Title: Vulnerability Detection with Code Language Models: How Far Are We?Authors: Yangruibo Ding, Yanjun Fu, Omniyyah Ibrahim, Chawin Sitawarin, Xinyun Chen, Basel Alomair, David Wagner, Baishakhi Ray, Yizheng ChenSubjects: Software Engineering (cs.SE); Computation and Language (cs.CL)
In the context of the rising interest in code language models (code LMs) and vulnerability detection, we study the effectiveness of code LMs for detecting vulnerabilities. Our analysis reveals significant shortcomings in existing vulnerability datasets, including poor data quality, low label accuracy, and high duplication rates, leading to unreliable model performance in realistic vulnerability detection scenarios. Additionally, the evaluation methods used with these datasets are not representative of real-world vulnerability detection.
To address these challenges, we introduce PrimeVul, a new dataset for training and evaluating code LMs for vulnerability detection. PrimeVul incorporates a novel set of data labeling techniques that achieve comparable label accuracy to human-verified benchmarks while significantly expanding the dataset. It also implements a rigorous data de-duplication and chronological data splitting strategy to mitigate data leakage issues, alongside introducing more realistic evaluation metrics and settings. This comprehensive approach aims to provide a more accurate assessment of code LMs' performance in real-world conditions.
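A minimal sketch of the two leakage defenses named above (de-duplication and chronological splitting). It assumes each sample is a dict with hypothetical `code` and `commit_date` keys and that duplicates differ only in whitespace. PrimeVul's actual normalization and splitting rules are not given in this abstract.

```python
import hashlib
from datetime import date


def normalize(code):
    """Collapse all whitespace so formatting-only copies hash identically."""
    return " ".join(code.split())


def dedupe(samples):
    """Keep the first sample for each normalized function body."""
    seen, unique = set(), []
    for s in samples:
        key = hashlib.sha256(normalize(s["code"]).encode()).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(s)
    return unique


def chronological_split(samples, train_frac=0.8):
    """Sort by commit date so no training sample is newer than any test sample."""
    ordered = sorted(samples, key=lambda s: s["commit_date"])
    cut = int(len(ordered) * train_frac)
    return ordered[:cut], ordered[cut:]
```

Splitting by time rather than at random keeps future commits out of training, which is the leakage the chronological split is meant to prevent.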
Evaluating code LMs on PrimeVul reveals that existing benchmarks significantly overestimate the performance of these models. For instance, a state-of-the-art 7B model scored 68.26% F1 on BigVul but only 3.09% F1 on PrimeVul. Attempts to improve performance through advanced training techniques and larger models like GPT-3.5 and GPT-4 were unsuccessful, with results akin to random guessing in the most stringent settings. These findings underscore the considerable gap between current capabilities and the practical requirements for deploying code LMs in security roles, highlighting the need for more innovative research in this domain.
- [329] arXiv:2403.18628 [pdf, other]
-
Title: To Recommend or Not: Recommendability Identification in Conversations with Pre-trained Language ModelsSubjects: Information Retrieval (cs.IR)
Most current recommender systems focus primarily on what to recommend, assuming that users always require personalized recommendations. However, with the widespread adoption of ChatGPT and other chatbots, a more crucial problem for conversational systems is how to minimize user disruption when providing recommendation services. While previous research has extensively explored different user intents in dialogue systems, less effort has been devoted to investigating whether recommendations should be provided at all. In this paper, we formally define the recommendability identification problem, which aims to determine whether recommendations are necessary in a specific scenario. First, we propose and define the recommendability identification task, which investigates the need for recommendations in the current conversational context, and we construct a new dataset for it. Subsequently, we discuss and evaluate the feasibility of leveraging pre-trained language models (PLMs) for recommendability identification. Finally, through comparative experiments, we demonstrate that directly employing PLMs in a zero-shot setting falls short of the task requirements. In addition, fine-tuning or using soft prompt techniques yields results comparable to traditional classification methods. Our work is the first to study recommendability before recommendation and provides preliminary ways to make it a fundamental component of future recommendation systems.
- [330] arXiv:2403.18631 [pdf, other]
-
Title: First Experiences with the Identification of People at Risk for Diabetes in Argentina using Machine Learning TechniquesAuthors: Enzo Rucci, Gonzalo Tittarelli, Franco Ronchetti, Jorge F. Elgart, Laura Lanzarini, Juan José GagliardinoComments: Accepted for publication in Computer Science - CACIC 2023Subjects: Machine Learning (cs.LG)
Detecting Type 2 Diabetes (T2D) and Prediabetes (PD) is a real challenge for medicine due to the absence of pathogenic symptoms and the lack of known associated risk factors. Even though some machine learning models have been proposed to identify people at risk, the nature of the condition means that a model suitable for one population may not be suitable for another. This article discusses the development and assessment of predictive models to identify people at risk for T2D and PD specifically in Argentina. First, the database was thoroughly preprocessed and three specific datasets were generated, balancing the number of records against the number of available variables. After five different classification models were applied, very good performance was observed for two of the datasets with some of these models. In particular, random forest (RF), decision tree (DT), and artificial neural network (ANN) models demonstrated strong classification power, with good values for the metrics under consideration. Given the lack of this type of tool in Argentina, this work represents a first step towards the development of more sophisticated models.
- [331] arXiv:2403.18632 [pdf, other]
-
Title: Optimal Control Synthesis of Markov Decision Processes for Efficiency with Surveillance TasksSubjects: Systems and Control (eess.SY)
We investigate the problem of optimal control synthesis for Markov Decision Processes (MDPs), addressing both qualitative and quantitative objectives. Specifically, we require the system to fulfill a qualitative surveillance task in the sense that a specific region of interest is visited infinitely often with probability one. Furthermore, to quantify the performance of the system, we consider the concept of efficiency, which is defined as the ratio between rewards and costs. This measure is more general than the standard long-run average reward metric, as it aims to maximize the reward obtained per unit cost. Our objective is to synthesize a control policy that ensures the surveillance task while maximizing efficiency. We provide an effective approach to synthesize a stationary control policy achieving $\epsilon$-optimality by integrating state classifications of MDPs and perturbation analysis in a novel manner. Our results generalize existing work on efficiency-optimal control synthesis for MDPs by incorporating qualitative surveillance tasks. A robot motion planning case study is provided to illustrate the proposed algorithm.
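As an illustration of the efficiency measure only (not the paper's synthesis algorithm), the sketch below evaluates a fixed stationary policy. It assumes the policy induces a finite, irreducible, aperiodic Markov chain with hypothetical state-based rewards and positive costs. For such a chain, long-run efficiency equals the ratio of stationary expected reward to stationary expected cost, and every state, including any region of interest, is visited infinitely often with probability one.

```python
def stationary_distribution(P, iters=10_000, tol=1e-12):
    """Power iteration for the stationary distribution of an ergodic Markov chain.

    P[i][j] is the probability of moving from state i to state j under the policy.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


def efficiency(P, reward, cost):
    """Long-run reward per unit cost: sum(pi * r) / sum(pi * c)."""
    pi = stationary_distribution(P)
    total_reward = sum(p * r for p, r in zip(pi, reward))
    total_cost = sum(p * c for p, c in zip(pi, cost))
    return total_reward / total_cost
```

With `P = [[0.9, 0.1], [0.5, 0.5]]` the stationary distribution is (5/6, 1/6), so rewards (1, 10) and costs (1, 4) give an efficiency of 2.5 / 1.5 = 5/3.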
- [332] arXiv:2403.18635 [pdf, other]
-
Title: Fusion approaches for emotion recognition from speech using acoustic and text-based featuresComments: 5 pages. Accepted at ICASSP 2020Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
In this paper, we study different approaches for classifying emotions from speech using acoustic and text-based features. We propose using BERT to obtain contextualized word embeddings that represent the information contained in speech transcriptions, and show that this yields better performance than using GloVe embeddings. We also propose and compare different strategies to combine the audio and text modalities, evaluating them on the IEMOCAP and MSP-PODCAST datasets. We find that fusing acoustic and text-based systems is beneficial on both datasets, though only subtle differences are observed across the evaluated fusion approaches. Finally, for IEMOCAP, we show that the criteria used to define the cross-validation folds have a large effect on results. In particular, the standard way of creating folds for this dataset results in a highly optimistic estimate of performance for the text-based system, suggesting that some previous works may overestimate the advantage of incorporating transcriptions.
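A minimal sketch of one common fusion strategy, score-level (late) fusion. It assumes each modality outputs per-class logits over the same label set; the weight `w_text` and the function names are hypothetical, and this is not necessarily one of the strategies compared in the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(acoustic_logits, text_logits, w_text=0.5):
    """Weighted average of per-modality class posteriors (score-level fusion)."""
    pa, pt = softmax(acoustic_logits), softmax(text_logits)
    return [(1 - w_text) * a + w_text * t for a, t in zip(pa, pt)]
```

Fusing at the score level keeps the two systems independently trainable; the alternative, concatenating acoustic and text features before a joint classifier (early fusion), lets the model learn cross-modal interactions at the cost of joint training.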
- [333] arXiv:2403.18639 [pdf, other]
-
Title: Dependency Aware Incident Linking in Large Cloud SystemsAuthors: Supriyo Ghosh, Karish Grover, Jimmy Wong, Chetan Bansal, Rakesh Namineni, Mohit Verma, Saravan RajmohanSubjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
Despite significant reliability efforts, large-scale cloud services inevitably experience production incidents that can significantly impact service availability and customer satisfaction. Worse, in many cases one incident can lead to multiple downstream failures due to cascading effects that create several related incidents across different dependent services. On-call engineers (OCEs) often examine these incidents in silos, which leads to a significant amount of manual toil and increases the overall time to mitigate incidents. Therefore, developing efficient incident linking models is of paramount importance for grouping related incidents into clusters, so that major outages can be resolved quickly and on-call fatigue reduced. Existing incident linking methods mostly leverage textual and contextual information about incidents (e.g., title, description, severity, impacted components), and thus fail to exploit the inter-dependencies between services. In this paper, we propose the dependency-aware incident linking (DiLink) framework, which leverages both textual and service dependency graph information to improve the accuracy and coverage of incident links, not only within the same service but also across different services and workloads. Furthermore, we propose a novel method to align the embeddings of multi-modal (i.e., textual and graphical) data using Orthogonal Procrustes. Extensive experimental results on real-world incidents from 5 workloads of Microsoft demonstrate that our alignment method achieves an F1-score of 0.96 (a 14% gain over current state-of-the-art methods). We are also in the process of deploying this solution across 610 services from these 5 workloads to continuously support OCEs, improving incident management and reducing manual toil.
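The Orthogonal Procrustes step can be sketched with NumPy. The sketch assumes that the textual and graph embeddings have already been brought to the same dimension and that row `i` of each matrix describes the same incident; the function name is hypothetical. The orthogonal matrix minimizing the Frobenius error is `U @ Vt`, where `U, S, Vt` is the SVD of `text_emb.T @ graph_emb`.

```python
import numpy as np


def procrustes_align(text_emb, graph_emb):
    """Orthogonal R minimising ||text_emb @ R - graph_emb||_F.

    Both inputs are (n_incidents, dim) arrays whose rows are paired incidents.
    """
    u, _, vt = np.linalg.svd(text_emb.T @ graph_emb)
    return u @ vt
```

Constraining `R` to be orthogonal preserves distances and angles within the text embedding space, so the alignment rotates that space onto the graph space rather than distorting it.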
- [334] arXiv:2403.18641 [pdf, other]
-
Title: Improving Efficiency of Parallel Across the Method Spectral Deferred CorrectionsComments: 24 pagesSubjects: Numerical Analysis (math.NA); Distributed, Parallel, and Cluster Computing (cs.DC)
Parallel-across-the-method time integration can provide small-scale parallelism when solving initial value problems. Spectral deferred corrections (SDC) with a diagonal sweeper, which is closely related to the iterated Runge-Kutta methods proposed by Van der Houwen and Sommeijer, can use a number of threads equal to the number of quadrature nodes in the underlying collocation method. However, convergence speed, efficiency, and stability depend critically on the coefficients used. Previous approaches have used numerical optimization to find good parameters. Instead, we propose an ansatz that allows optimal parameters to be found analytically. We show that the resulting parallel SDC methods provide stability domains and convergence orders very similar to those of well-established serial SDC variants. Using a computational cost model that assumes 80% efficiency for a parallel SDC implementation, we show that our variants are competitive with serial SDC, previously published parallel SDC coefficients, Picard iteration, explicit RKM-4, and an implicit fourth-order diagonally implicit Runge-Kutta method.
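To make the diagonal-sweeper idea concrete, the sketch below applies SDC to the Dahlquist test equation u' = λu over one step with three right Radau nodes. The diagonal preconditioner `diag(tau_m)` is an illustrative assumption, not the analytically optimal coefficients derived in the paper. What it shows is that a diagonal `Q_delta` decouples the node updates within each sweep, which is the source of the parallelism.

```python
import numpy as np

# Right Radau nodes on [0, 1] for M = 3 (the Radau IIA collocation nodes).
NODES = np.array([(4 - np.sqrt(6)) / 10, (4 + np.sqrt(6)) / 10, 1.0])


def quadrature_matrix(nodes):
    """Q[m, j] = integral from 0 to nodes[m] of the j-th Lagrange basis polynomial."""
    M = len(nodes)
    k = np.arange(M)
    V = nodes[:, None] ** k                   # Vandermonde: V[m, k] = tau_m^k
    A = nodes[:, None] ** (k + 1) / (k + 1)   # monomials integrated from 0 to tau_m
    return A @ np.linalg.inv(V)


def sdc_diagonal(lam, dt, u0, sweeps=20, nodes=NODES):
    """SDC sweeps for u' = lam * u on one step with a diagonal preconditioner.

    Each sweep solves (I - dt*lam*QD) U_new = u0 + dt*lam*(Q - QD) U_old, whose
    fixed point is the collocation solution U = u0 + dt*lam*Q U.
    """
    Q = quadrature_matrix(nodes)
    qd = nodes.copy()          # illustrative choice: Q_delta = diag(tau_m)
    QD = np.diag(qd)
    U = np.full(len(nodes), u0, dtype=float)
    for _ in range(sweeps):
        rhs = u0 + dt * lam * (Q - QD) @ U
        # Diagonal Q_delta: one independent scalar solve per node, so the
        # M updates within a sweep could run on M threads.
        U = rhs / (1 - dt * lam * qd)
    return U
```

The value at the last node approximates u(dt); with this non-stiff setting the sweeps converge quickly to the order-5 collocation solution.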
- [335] arXiv:2403.18642 [pdf, other]
-
Title: Collective schedules: axioms and algorithmsSubjects: Computer Science and Game Theory (cs.GT)
The collective schedules problem consists of computing a schedule of tasks shared between individuals. Tasks may have different durations, and individuals have preferences over the order of the shared tasks. This problem has numerous applications, since tasks may model public infrastructure projects, events taking place in a shared room, or work done by co-workers. Given the preferred schedules of individuals (voters), our aim is to return a consensus schedule. We propose an axiomatic study of the collective schedule problem, using classic axioms from computational social choice as well as new axioms that take task durations into account. We show that some axioms are incompatible, and we study the axioms fulfilled by three rules: one studied in the seminal paper on collective schedules (Pascual et al. 2018), one that generalizes the Kemeny rule, and one that generalizes Spearman's footrule. From an algorithmic point of view, we show that these rules solve NP-hard problems, but that these problems can be solved optimally for small but realistically sized instances, and we give an efficient heuristic for large instances. We conclude the paper with experiments.
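For intuition about the Kemeny-type rule, here is the classical Kemeny consensus on task orders, computed by brute force. It is a sketch that ignores task durations (which the paper's generalization accounts for), and its factorial running time suits only a handful of tasks, in line with the NP-hardness noted above.

```python
from itertools import combinations, permutations


def kendall_tau(a, b):
    """Number of task pairs ordered differently in schedules a and b."""
    pos = {task: i for i, task in enumerate(b)}
    # combinations(a, 2) yields pairs (x, y) with x before y in a.
    return sum(1 for x, y in combinations(a, 2) if pos[x] > pos[y])


def kemeny_schedule(preferences):
    """Brute-force Kemeny consensus: the order minimising total pairwise disagreement."""
    tasks = preferences[0]
    best = min(permutations(tasks),
               key=lambda order: sum(kendall_tau(order, p) for p in preferences))
    return list(best)
```

For the voters `["a", "b", "c"]`, `["a", "b", "c"]`, and `["b", "a", "c"]`, the consensus `["a", "b", "c"]` disagrees with the voters on only one pair in total.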
- [336] arXiv:2403.18643 [pdf, other]
-
Title: Sampling-Based Motion Planning with Online Racing Line Generation for Autonomous Driving on Three-Dimensional Race TracksComments: 8 pages, submitted to be published at the 35th IEEE Intelligent Vehicles Symposium, June 2 - 5, 2024, Jeju Shinhwa World, Jeju Island, KoreaSubjects: Robotics (cs.RO)
Existing approaches to trajectory planning for autonomous racing employ sampling-based methods, generating numerous jerk-optimal trajectories and selecting the most favorable feasible trajectory based on a cost function that penalizes deviations from an offline-calculated racing line. While successful on oval tracks, these methods face limitations on complex circuits because the simple geometry of jerk-optimal edges fails to capture the complexity of the racing line. Additionally, they consider only two-dimensional tracks, potentially neglecting or exceeding the actual dynamic potential. In this paper, we present a sampling-based local trajectory planning approach for autonomous racing that can maintain the lap time of the racing line even on complex race tracks and that accounts for the race track's three-dimensional effects. In simulation experiments, we demonstrate that our approach achieves lower lap times and better utilization of dynamic limits than existing approaches. We also investigate the impact of online racing line generation, in which the time-optimal solution is planned from the current vehicle state over a limited spatial horizon, in contrast to a closed racing line calculated offline. We show that combining the sampling-based planner with online racing line generation can significantly reduce lap times in multi-vehicle scenarios.
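Jerk-optimal edges of this kind are typically quintic polynomials fixed by start and end position, velocity, and acceleration. The one-dimensional sketch below builds such a polynomial in closed form and picks, among sampled end positions, the candidate with the lowest cost. The cost (squared deviation from a reference line sampled in time) and all names are hypothetical stand-ins for the paper's planner, which also handles three-dimensional track effects.

```python
def quintic(p0, v0, a0, p1, v1, a1, T):
    """Coefficients c0..c5 of the jerk-optimal quintic meeting both boundary states."""
    d = p1 - (p0 + v0 * T + 0.5 * a0 * T ** 2)  # position gap left by the quadratic part
    e = v1 - (v0 + a0 * T)                      # velocity gap
    f = a1 - a0                                 # acceleration gap
    c3 = (10 * d - 4 * e * T + 0.5 * f * T ** 2) / T ** 3
    c4 = (-15 * d + 7 * e * T - f * T ** 2) / T ** 4
    c5 = (6 * d - 3 * e * T + 0.5 * f * T ** 2) / T ** 5
    return [p0, v0, 0.5 * a0, c3, c4, c5]


def evaluate(c, t):
    """Position, velocity, and acceleration of the polynomial at time t."""
    pos = sum(ci * t ** i for i, ci in enumerate(c))
    vel = sum(i * ci * t ** (i - 1) for i, ci in enumerate(c) if i >= 1)
    acc = sum(i * (i - 1) * ci * t ** (i - 2) for i, ci in enumerate(c) if i >= 2)
    return pos, vel, acc


def sample_and_select(p0, v0, a0, end_positions, v1, T, reference):
    """Generate one candidate per sampled end position and keep the cheapest."""
    def cost(c):
        # Hypothetical cost: squared deviation from the reference, sampled at 11 times.
        times = [T * k / 10 for k in range(11)]
        return sum((evaluate(c, t)[0] - reference(t)) ** 2 for t in times)

    candidates = [quintic(p0, v0, a0, p1, v1, 0.0, T) for p1 in end_positions]
    return min(candidates, key=cost)
```

For a rest-to-rest move from 0 to 1 over T = 1, `quintic` returns the familiar minimum-jerk profile 10t³ − 15t⁴ + 6t⁵.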
- [337] arXiv:2403.18646 [pdf, ps, other]
-
Title: Synergistic KnowledgeSubjects: Logic in Computer Science (cs.LO)
In formal epistemology, group knowledge is often modelled as the knowledge that the group would have, if the agents shared all their individual knowledge. However, this interpretation does not account for relations between agents. In this work, we propose the notion of synergistic knowledge which makes it possible to model those relationships.
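The standard notion described in the first sentence is usually called distributed knowledge. Here is a minimal sketch on a Kripke model, assuming accessibility relations given as hypothetical sets of world pairs: pooling the agents' knowledge amounts to intersecting the sets of worlds each agent considers possible. This does not implement synergistic knowledge, which the abstract does not define formally.

```python
def distributed_knows(relations, group, world, prop):
    """True if prop holds in every world that no agent in the (non-empty) group can rule out.

    relations[agent] is a set of (w, v) pairs: in world w, the agent considers v possible.
    Pooling the group's knowledge means intersecting these possibility sets.
    """
    candidates = None
    for agent in group:
        reachable = {v for (w, v) in relations[agent] if w == world}
        candidates = reachable if candidates is None else candidates & reachable
    return all(prop(v) for v in candidates)
```

If agent `a` cannot distinguish worlds 1 and 2, and agent `b` cannot distinguish worlds 1 and 3, neither knows a fact that holds only in world 1, but together they do.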
- [338] arXiv:2403.18647 [pdf, other]
-
Title: SDSAT: Accelerating LLM Inference through Speculative Decoding with Semantic Adaptive TokensComments: 12 pages, 7 figuresSubjects: Computation and Language (cs.CL)
We propose an acceleration scheme for large language models (LLMs) through Speculative Decoding with Semantic Adaptive Tokens (SDSAT). The primary objective of this design is to enhance the LLM's ability to generate draft tokens more accurately without compromising its accuracy. The core strategies are: 1) fine-tuning the model by incorporating semantic adaptive tokens that possess flexible decoding capabilities, without changing its structure, allowing them to generate high-quality draft tokens; 2) employing a training method that does not affect the standard tokens, so that the model acquires parallel decoding abilities atop its original framework with minimal training overhead; and 3) designing "two-step-draft-then-verify" generation strategies using both greedy search and nucleus sampling. Experiments conducted on the CodeLlama-13B and 7B models yielded speed increases of over 3.5X and 3.0X, respectively. Please refer to https://github.com/hasuoshenyun/SDSAT.
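The verification half of a draft-then-verify scheme can be sketched as follows for greedy search. The sketch assumes a hypothetical `target_next` function that returns the target model's greedy next token for a context. SDSAT's semantic adaptive tokens and its nucleus-sampling variant are not modelled.

```python
def draft_then_verify(prefix, draft_tokens, target_next):
    """Greedy verification step of speculative decoding.

    Accept draft tokens while they match the target model's greedy choice; at the
    first mismatch, emit the target's token instead and stop. If every draft is
    accepted, the target contributes one extra token. A real implementation scores
    all draft positions in one batched forward pass; here target_next is called
    per position for clarity.
    """
    context = list(prefix)
    emitted = []
    for token in draft_tokens:
        expected = target_next(context)
        if token != expected:
            emitted.append(expected)  # correction from the target model
            return emitted
        emitted.append(token)
        context.append(token)
    emitted.append(target_next(context))  # bonus token after full acceptance
    return emitted
```

Under greedy verification the output is identical to what the target model alone would produce; the speed-up comes from accepting several tokens per target pass whenever the drafts are accurate.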
- [339] arXiv:2403.18649 [pdf, other]
-
Title: Addressing Data Annotation Challenges in Multiple Sensors: A Solution for Scania Collected DatasetsAuthors: Ajinkya Khoche, Aron Asefaw, Alejandro Gonzalez, Bogdan Timus, Sina Sharif Mansouri, Patric JensfeltComments: Accepted to European Control Conference 2024Subjects: Computer Vision and Pattern Recognition (cs.CV); Systems and Control (eess.SY)
Data annotation for autonomous vehicles is a critical step in developing Deep Neural Network (DNN) based models or evaluating the performance of the perception system. This often takes the form of adding 3D bounding boxes to time-sequential, registered series of point sets captured by active sensors such as Light Detection and Ranging (LiDAR) and Radio Detection and Ranging (RADAR). When annotating data from multiple active sensors, the points must be motion-compensated and transformed to a consistent timestamp and coordinate frame. However, highly dynamic objects pose a unique challenge, as they can appear at different timestamps in each sensor's data. Because the objects' speeds are unknown, their positions appear different in different sensor outputs. Thus, even after motion compensation, highly dynamic objects from multiple sensors are not matched in the same frame, and human annotators struggle to add unique bounding boxes that capture all objects. This article addresses this challenge, primarily in the context of Scania-collected datasets. The proposed solution takes the track of an annotated object as input and uses Moving Horizon Estimation (MHE) to robustly estimate its speed. The estimated speed profile is used to correct the position of the annotated box and to add boxes for object clusters missed by the original annotation.
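MHE solves a constrained optimization over a sliding window of measurements. As a much simpler stand-in, the sketch below fits a constant-velocity model to a window of hypothetical `(t, x, y)` box centres by ordinary least squares (effectively MHE without arrival cost, process noise, or constraints). It then shifts a box centre to another sensor's timestamp; both function names are hypothetical.

```python
def estimate_velocity(track):
    """Least-squares constant-velocity fit over a window of (t, x, y) observations."""
    n = len(track)
    t_mean = sum(t for t, _, _ in track) / n
    x_mean = sum(x for _, x, _ in track) / n
    y_mean = sum(y for _, _, y in track) / n
    denom = sum((t - t_mean) ** 2 for t, _, _ in track)
    vx = sum((t - t_mean) * (x - x_mean) for t, x, _ in track) / denom
    vy = sum((t - t_mean) * (y - y_mean) for t, _, y in track) / denom
    return vx, vy


def correct_box_center(center, t_box, t_sensor, velocity):
    """Shift an annotated box centre from its annotation time to another sensor's timestamp."""
    dt = t_sensor - t_box
    return center[0] + velocity[0] * dt, center[1] + velocity[1] * dt
```

A box annotated at t = 1.0 s on an object moving at 3 m/s along x would be shifted by 0.3 m to match a sensor sweep captured 0.1 s later.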
- [340] arXiv:2403.18650 [pdf, other]
-
Title: MPC-CBF with Adaptive Safety Margins for Safety-critical Teleoperation over Imperfect Network ConnectionsComments: Accepted for publication in the 2024 European Control Conference (ECC)Subjects: Systems and Control (eess.SY)
The paper focuses on the design of a control strategy for safety-critical remote teleoperation. The main goal is to make the controlled system track the desired velocity specified by an operator while avoiding obstacles despite communication delays. Control Barrier Functions (CBFs) are used to define the safety constraints that the system must respect to avoid obstacles, while Model Predictive Control (MPC) provides the framework for adjusting the desired input while taking the constraints into account. The resulting input is sent to the remote system, where appropriate low-level velocity controllers translate it into system-specific commands. The main novelty of the paper is a method that makes the CBFs robust against the uncertainties that network delays introduce into the system's state, and does so in a less conservative manner. The results show that the proposed method successfully solves the safety-critical teleoperation problem, making the controlled systems avoid obstacles under different types of network delay. The controller has been tested both in simulation and on a real manipulator, demonstrating its general applicability when reliable low-level velocity controllers are available.
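For intuition about how a CBF constrains the operator's command, here is a single-step CBF filter for a 2D single integrator with one circular obstacle, solved in closed form. It is not the paper's MPC formulation. The `delay_margin` rule (distance coverable at top speed during the delay) is a hypothetical example of an inflated safety margin.

```python
def cbf_filter(x, u_des, obstacle, radius, margin, alpha=1.0):
    """Minimally modify u_des so that h(x) = |x - o|^2 - (radius + margin)^2
    satisfies dh/dt >= -alpha * h for single-integrator dynamics x' = u.

    Closed-form solution of the one-constraint CBF quadratic program
    min |u - u_des|^2  s.t.  grad_h . u >= -alpha * h.
    Assumes x is not exactly at the obstacle centre (gradient would vanish).
    """
    dx = [xi - oi for xi, oi in zip(x, obstacle)]
    h = sum(d * d for d in dx) - (radius + margin) ** 2
    grad = [2 * d for d in dx]  # gradient of h with respect to x
    lhs = sum(g * u for g, u in zip(grad, u_des))
    rhs = -alpha * h
    if lhs >= rhs:  # desired input already satisfies the constraint
        return list(u_des)
    scale = (rhs - lhs) / sum(g * g for g in grad)  # project onto constraint boundary
    return [u + scale * g for u, g in zip(u_des, grad)]


def delay_margin(v_max, delay):
    """Hypothetical adaptive margin: distance coverable at v_max during the delay."""
    return v_max * delay
```

Far from the obstacle the operator's command passes through unchanged; close to it, only the velocity component pointing into the inflated obstacle is reduced.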
- [341] arXiv:2403.18659 [pdf, other]
-
Title: INEXA: Interactive and Explainable Process Model Abstraction Through Object-Centric Process MiningSubjects: Artificial Intelligence (cs.AI)
Process events are recorded by multiple information systems at different granularity levels. Based on the resulting event logs, process models are discovered at different granularity levels as well. Events stored at a fine-grained granularity level, for example, may prevent the discovered process model from being displayed because of the high number of resulting model elements. The discovered process model of a real-world manufacturing process, for example, consists of 1,489 model elements and over 2,000 arcs. Existing process model abstraction techniques could help reduce the size of the model, but would disconnect it from the underlying event log. Existing event abstraction techniques support neither the analysis of mixed granularity levels nor interactive exploration of a suitable granularity level. To enable the exploration of discovered process models at different granularity levels, we propose INEXA, an interactive, explainable process model abstraction method that keeps the link to the event log. As a starting point, INEXA aggregates large process models to a "displayable" size, e.g., for the manufacturing use case, to a process model with 58 model elements. Then, the process analyst can explore granularity levels interactively, while applied abstractions are automatically traced in the event log for explainability.
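As a rough illustration of abstraction that keeps the link to the event log, the sketch below relabels the events of one flat trace with a hypothetical activity mapping, merges consecutive repeats, and records which original events each abstract event covers. INEXA itself works on object-centric event data, which this sketch does not model.

```python
def abstract_trace(trace, mapping):
    """Map fine-grained activities to coarser ones and merge consecutive repeats.

    Returns the abstracted trace plus, for every abstract activity, the indices of
    the original events it covers, so each abstraction step stays traceable.
    """
    abstracted, links = [], []
    for i, activity in enumerate(trace):
        label = mapping.get(activity, activity)  # unmapped activities stay as-is
        if abstracted and abstracted[-1] == label:
            links[-1].append(i)  # extend the current abstract event
        else:
            abstracted.append(label)
            links.append([i])
    return abstracted, links
```

Because `links` points back to event positions, any node of the smaller model can be expanded into the original events it summarizes.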
- [342] arXiv:2403.18660 [pdf, other]
-
Title: InstructBrush: Learning Attention-based Instruction Optimization for Image EditingAuthors: Ruoyu Zhao, Qingnan Fan, Fei Kou, Shuai Qin, Hong Gu, Wei Wu, Pengcheng Xu, Mingrui Zhu, Nannan Wang, Xinbo GaoComments: Project Page: this https URLSubjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV)
In recent years, instruction-based image editing methods have garnered significant attention. However, despite encompassing a wide range of editing priors, these methods struggle with editing tasks that are difficult to describe accurately in language. We propose InstructBrush, an inversion method for instruction-based image editing methods, to bridge this gap. It extracts editing effects from exemplar image pairs as editing instructions, which are then applied for image editing. Two key techniques are introduced in InstructBrush, Attention-based Instruction Optimization and Transformation-oriented Instruction Initialization, to address the limitations of the previous method in terms of inversion effects and instruction generalization. To explore the ability of instruction inversion methods to guide image editing in open scenarios, we establish a Transformation-Oriented Paired Benchmark (TOP-Bench), which contains a rich set of scenes and editing types. This benchmark paves the way for further exploration of instruction inversion. Quantitatively and qualitatively, our approach achieves superior editing performance and is more semantically consistent with the target editing effects.
- [343] arXiv:2403.18667 [pdf, other]
-
Title: Improving Content Recommendation: Knowledge Graph-Based Semantic Contrastive Learning for Diversity and Cold-Start UsersAuthors: Yejin Kim, Scott Rome, Kevin Foley, Mayur Nankani, Rimon Melamed, Javier Morales, Abhay Yadav, Maria Peifer, Sardar Hamidian, H. Howie HuangComments: Accepted at LREC-COLING 2024Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)
Addressing the challenges of data sparsity, cold-start problems, and diversity in recommendation systems is both crucial and demanding. Many current solutions leverage knowledge graphs to tackle these issues by combining item-based and user-item collaborative signals. A common trend in these approaches is to improve ranking performance at the cost of increased model complexity, reduced diversity, and a more complicated task. It is essential to provide recommendations that are both personalized and diverse, rather than relying solely on high rank-based performance, such as Click-through Rate, Recall, etc. In this paper, we propose a hybrid multi-task learning approach, training on user-item and item-item interactions. We apply item-based contrastive learning on descriptive text, sampling positive and negative pairs based on item metadata. Our approach allows the model to better understand the relationships between entities within the knowledge graph by utilizing semantic information from text. This leads to more accurate, relevant, and diverse user recommendations, with benefits that extend even to cold-start users who have few interactions with items. We perform extensive experiments on two widely used datasets to validate the effectiveness of our approach. Our findings demonstrate that jointly training on user-item interactions and item-based signals using synopsis text is highly effective. Furthermore, our results provide evidence that item-based contrastive learning enhances the quality of entity embeddings, as indicated by metrics such as uniformity and alignment.
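The abstract does not state the contrastive objective. As an assumption, the sketch below uses InfoNCE, a common choice, with cosine similarity and a hypothetical temperature of 0.1. In this setting the item embeddings would come from the descriptive text, and positives and negatives from metadata-based sampling.

```python
import math


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss: negative log-softmax score of the positive among positive + negatives."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # stabilise the log-sum-exp
    log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denom - logits[0]
```

Minimizing this loss pulls an item toward its metadata-matched positive and away from the sampled negatives, which is what the alignment and uniformity metrics mentioned above measure.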
- [344] arXiv:2403.18668 [pdf, ps, other]
-
Title: Aiming for RelevanceComments: 10 pages, 9 figures, AMIA Informatics 2024Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (stat.ML)
Vital signs are crucial in intensive care units (ICUs). They are used to track the patient's state and to identify clinically significant changes. Predicting vital sign trajectories is valuable for early detection of adverse events. However, conventional machine learning metrics like RMSE often fail to capture the true clinical relevance of such predictions. We introduce novel vital sign prediction performance metrics that align with clinical contexts, focusing on deviations from clinical norms, overall trends, and trend deviations. These metrics are derived from empirical utility curves obtained in a previous study through interviews with ICU clinicians. We validate the metrics' usefulness using simulated and real clinical datasets (MIMIC and eICU). Furthermore, we employ these metrics as loss functions for neural networks, resulting in models that excel in predicting clinically significant events. This research paves the way for clinically relevant machine learning model evaluation and optimization, promising to improve ICU patient care.
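To show how a clinically motivated metric can diverge from RMSE or MAE, the sketch below weights absolute errors more heavily when the true value lies outside a normal range. The step weighting, the `outside_weight` value, and the heart-rate range used in the usage note are hypothetical choices, not the utility curves from the paper.

```python
def clinically_weighted_mae(y_true, y_pred, low, high, outside_weight=3.0):
    """Mean absolute error where errors on abnormal true values count more.

    Hypothetical weighting: weight 1 inside [low, high], outside_weight outside it.
    """
    total = 0.0
    for t, p in zip(y_true, y_pred):
        w = 1.0 if low <= t <= high else outside_weight
        total += w * abs(t - p)
    return total / len(y_true)
```

With a hypothetical normal heart-rate range of 60 to 100 bpm, true values (80, 130) and predictions (85, 120) give a plain MAE of 7.5 but a weighted score of 17.5, because the larger miss occurs on the abnormal reading.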
- [345] arXiv:2403.18671 [pdf, other]
-
Title: Fact Checking Beyond Training SetComments: NAACL 2024Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Evaluating the veracity of everyday claims is time-consuming and in some cases requires domain expertise. We empirically demonstrate that the commonly used fact-checking pipeline, known as the retriever-reader, suffers from performance deterioration when it is trained on labeled data from one domain and used in another. We then delve into each component of the pipeline and propose novel algorithms to address this problem. We propose an adversarial algorithm to make the retriever component robust against distribution shift. Our core idea is to first train a bi-encoder on the labeled source data and then adversarially train two separate document and claim encoders using unlabeled target data. We then focus on the reader component and propose to train it so that it is insensitive to the order of claims and evidence documents. Our empirical evaluations support the hypothesis that such a reader is more robust against distribution shift. To our knowledge, there is no publicly available multi-topic fact-checking dataset. Thus, we propose a simple automatic method to re-purpose two well-known fact-checking datasets. We then construct eight fact-checking scenarios from these datasets and compare our model to a set of strong baseline models, including recent domain adaptation models that use GPT-4 to generate synthetic data.
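The abstract does not say how order insensitivity is achieved. One plausible approach, assumed here, is data augmentation: present the reader with both claim-first and evidence-first inputs, with the evidence documents shuffled. The `[SEP]` separator and the function name are hypothetical.

```python
import random


def order_augmented_inputs(claim, evidence, n_shuffles=2, seed=0):
    """Build reader inputs with claim-first and evidence-first layouts and shuffled evidence."""
    rng = random.Random(seed)
    variants = []
    for _ in range(n_shuffles):
        docs = evidence[:]  # copy so the caller's list is not reordered
        rng.shuffle(docs)
        ev = " [SEP] ".join(docs)
        variants.append(f"{claim} [SEP] {ev}")  # claim first
        variants.append(f"{ev} [SEP] {claim}")  # evidence first
    return variants
```

Training on every variant with the same label discourages the reader from relying on input position as a cue.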
- [346] arXiv:2403.18674 [pdf, other]
-
Title: Deep Learning for Robust and Explainable Models in Computer VisionAuthors: Mohammadreza AmirianComments: 150 pages, 37 figures, 12 tablesJournal-ref: OPARU is the OPen Access Repository of Ulm University and Ulm University of Applied Sciences, 2023Subjects: Computer Vision and Pattern Recognition (cs.CV)
Recent breakthroughs in machine and deep learning (ML and DL) research have provided excellent tools for leveraging enormous amounts of data and optimizing huge models with millions of parameters to obtain accurate networks for image processing. These developments open up tremendous opportunities for using artificial intelligence (AI) in automation and in the human-assisted AI industry. However, as more and more models are deployed and used in practice, many challenges have emerged. This thesis presents various approaches that address the robustness and explainability challenges of using ML and DL in practice.
Robustness and reliability are critical requirements for any model before certification and deployment in practice. Deep convolutional neural networks (CNNs) are vulnerable to transformations of their inputs, such as rotation and scaling, and to intentional manipulations as described in the adversarial attack literature. In addition, building trust in AI-based models requires a better understanding of current models and the development of methods that are more explainable and interpretable a priori.
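The adversarial manipulations mentioned above can be illustrated with the Fast Gradient Sign Method (FGSM) on a logistic-regression classifier, chosen here only because its input gradient has a closed form. The thesis does not specify this attack or model, and the function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logistic_loss(w, b, x, y):
    """Binary cross-entropy of a logistic-regression model on one example (y in {0, 1})."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def _sign(g):
    return (g > 0) - (g < 0)


def fgsm(w, b, x, y, eps):
    """Fast Gradient Sign Method: step each feature by eps in the loss-increasing direction."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    grad = [(p - y) * wi for wi in w]  # d(loss)/dx for logistic regression
    return [xi + eps * _sign(gi) for xi, gi in zip(x, grad)]
```

For deep CNNs the same step uses the input gradient obtained by backpropagation; a small `eps` keeps the perturbed image visually unchanged while pushing the loss up.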
This thesis presents developments in the robustness and explainability of computer vision models. Furthermore, it offers an example of using vision models' feature response visualization (the models' interpretations) to improve robustness, even though the related research treats interpretability and robustness as seemingly unrelated. Besides methodological developments for robust and explainable vision models, a key message of this thesis is the introduction of model interpretation techniques as a tool for understanding vision models and improving their design and robustness. In addition to the theoretical developments, this thesis demonstrates several applications of ML and DL in different contexts, such as medical imaging and affective computing.
- [347] arXiv:2403.18679 [pdf, ps, other]
-
Title: An Exploratory Study on Upper-Level Computing Students' Use of Large Language Models as Tools in a Semester-Long ProjectComments: Accepted to the 2024 General Conference of the American Society for Engineering Education (ASEE)Subjects: Software Engineering (cs.SE); Human-Computer Interaction (cs.HC)
Background: Large Language Models (LLMs) such as ChatGPT and Copilot are influencing software engineering practice. Software engineering educators must teach future software engineers how to use such tools well. As yet, few studies have reported on the use of LLMs in the classroom. It is therefore important to evaluate students' perceptions of LLMs and possible ways of adapting the computing curriculum to these shifting paradigms.
Purpose: The purpose of this study is to explore computing students' experiences and approaches to using LLMs during a semester-long software engineering project.
Design/Method: We collected data from a senior-level software engineering course at Purdue University. This course uses a project-based learning (PBL) design. The students used LLMs such as ChatGPT and Copilot in their projects. A sample of these student teams was interviewed to understand (1) how they used LLMs in their projects; and (2) whether and how their perspectives on LLMs changed over the course of the semester. We analyzed the data to identify themes related to students' usage patterns and learning outcomes.
Results/Discussion: When computing students use LLMs within a project, their use cases cover both technical and professional applications. In addition, these students perceive LLMs as efficient tools for obtaining information and completing tasks. However, they expressed concerns about using LLMs responsibly without harming their own learning outcomes. Based on our findings, we recommend that future research investigate the use of LLMs in lower-level computer engineering courses to understand whether and how LLMs can be integrated as a learning aid without hurting learning outcomes.
- [348] arXiv:2403.18680 [pdf, other]
-
Title: NL-ITI: Optimizing Probing and Intervention for Improvement of ITI Method
Authors: Jakub Hoscilowicz, Adam Wiacek, Jan Chojnacki, Adam Cieslak, Leszek Michon, Vitalii Urbanevych, Artur Janicki
Comments: Code is available at this https URL
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Large Language Models (LLMs) are prone to returning false information, which constitutes one of the major challenges in the AI field. In our work, we explore the paradigm introduced by Inference-Time Intervention (ITI). In the first stage, ITI identifies the attention heads that contain the most knowledge of the desired type (e.g., truthfulness). Afterwards, during inference, LLM activations are shifted for the chosen subset of attention heads. We further improve the ITI framework by introducing nonlinear probing and multi-token intervention: Non-Linear ITI (NL-ITI). NL-ITI is tested on diverse multiple-choice benchmarks, including TruthfulQA, on which we report an improvement of around 14% in the MC1 metric with respect to the baseline ITI results. NL-ITI also achieves encouraging results on other test sets: on the Business Ethics subdomain of MMLU, it achieves around 18% MC1 improvement over the baseline LLaMA2-7B. Additionally, NL-ITI performs better while being less invasive in the behavior of the LLM (as measured by Kullback-Leibler divergence).
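The two ITI stages described above (probe each head, then shift the selected heads' activations at inference time) can be sketched in a few lines. This is a minimal illustration of the *linear* baseline idea only, assuming a mass-mean-shift direction, a midpoint-threshold probe, and a shift of `alpha * sigma` along the probe direction; all helper names are hypothetical. It is not the NL-ITI implementation, which replaces the linear probe with a nonlinear one and intervenes over multiple tokens.

```python
import math
from statistics import pstdev


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def mass_mean_direction(pos_acts, neg_acts):
    """Unit vector from the mean 'undesired' activation to the mean 'desired' one."""
    diff = [p - q for p, q in zip(mean_vector(pos_acts), mean_vector(neg_acts))]
    norm = math.sqrt(sum(v * v for v in diff))
    if norm == 0.0:
        return [0.0] * len(diff)  # no separating direction for this head
    return [v / norm for v in diff]


def project(vec, direction):
    return sum(a * b for a, b in zip(vec, direction))


def probe_accuracy(pos_acts, neg_acts, direction):
    """Linear probe: threshold the projection at the midpoint of the class means."""
    pos_proj = [project(a, direction) for a in pos_acts]
    neg_proj = [project(a, direction) for a in neg_acts]
    threshold = (sum(pos_proj) / len(pos_proj) + sum(neg_proj) / len(neg_proj)) / 2
    correct = sum(p >= threshold for p in pos_proj) + sum(q < threshold for q in neg_proj)
    return correct / (len(pos_proj) + len(neg_proj))


def select_heads(head_data, k):
    """Stage 1: keep the k heads whose activations best separate the property."""
    scores = {
        head: probe_accuracy(pos, neg, mass_mean_direction(pos, neg))
        for head, (pos, neg) in head_data.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


def projection_std(acts, direction):
    """Spread of activations along the direction, used as sigma."""
    return pstdev([project(a, direction) for a in acts])


def intervene(activation, direction, sigma, alpha):
    """Stage 2: shift one head's activation by alpha * sigma along the direction."""
    return [a + alpha * sigma * d for a, d in zip(activation, direction)]
```

In a real model, `head_data` would hold per-head activations collected on labeled examples, and `intervene` would be applied inside the forward pass only for the heads returned by `select_heads`.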
- [349] arXiv:2403.18681 [pdf, other]
-
Title: TransFusion: Contrastive Learning with Transformers
Comments: 17 pages, 4 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
This paper proposes a novel framework, TransFusion, designed to make the process of contrastive learning more analytical and explainable. TransFusion consists of attention blocks in which the softmax is replaced by ReLU, and the weighted-sum operation of its final block is truncated so that the adjacency matrix is left as the output. The model is trained by minimizing the Jensen-Shannon divergence between its output and the target affinity matrix, which indicates whether each pair of samples belongs to the same or different classes. The main contribution of TransFusion lies in defining theoretical limits that answer two fundamental questions in the field: the maximum level of data augmentation and the minimum batch size required for effective contrastive learning. Furthermore, experimental results indicate that TransFusion successfully extracts features that isolate clusters in complex real-world data, leading to improved classification accuracy in downstream tasks.
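The training objective can be illustrated with a small sketch. The assumptions here are not taken from the paper: a single attention block with identity query/key projections, ReLU applied to the scaled dot products to form the adjacency matrix, and the Jensen-Shannon divergence computed row by row after normalizing each row into a distribution. The function names are hypothetical.

```python
import math


def target_affinity(labels):
    """Target matrix: 1 where two samples share a class, else 0."""
    return [[1.0 if a == b else 0.0 for b in labels] for a in labels]


def relu_attention_adjacency(x, scale=None):
    """Scaled dot-product scores with ReLU in place of softmax (identity Q/K assumed)."""
    d = len(x[0])
    s = 1.0 / math.sqrt(d) if scale is None else scale
    return [[max(0.0, s * sum(a * b for a, b in zip(xi, xj))) for xj in x] for xi in x]


def _normalize(row, eps=1e-12):
    """Turn a non-negative row into a probability vector (eps avoids all-zero rows)."""
    total = sum(row) + eps * len(row)
    return [(v + eps) / total for v in row]


def _kl(p, q):
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


def jensen_shannon(p, q):
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * _kl(p, m) + 0.5 * _kl(q, m)


def affinity_jsd_loss(output, target):
    """Mean row-wise JSD between the model's adjacency and the target affinity."""
    rows = [jensen_shannon(_normalize(o), _normalize(t)) for o, t in zip(output, target)]
    return sum(rows) / len(rows)
```

Embeddings that already form one direction per class give a loss near zero, while embeddings that mix classes give a positive loss bounded by ln 2 (natural logarithm), which is what makes the divergence usable as a training signal.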
- [350] arXiv:2403.18682 [pdf, other]
-
Title: JumpBackHash: Say Goodbye to the Modulo Operation to Distribute Keys Uniformly to Buckets
Authors: Otmar Ertl
Comments: 8 pages
Subjects: Data Structures and Algorithms (cs.DS); Databases (cs.DB); Distributed, Parallel, and Cluster Computing (cs.DC)
The distribution of keys to a given number of buckets is a fundamental task in distributed data processing and storage. A simple, fast, and therefore popular approach is to map the hash values of keys to buckets based on the remainder after dividing by the number of buckets. Unfortunately, these mappings are not stable when the number of buckets changes, which can lead to severe spikes in system resource utilization, such as network or database requests. Consistent hash algorithms can minimize remappings, but are either significantly slower than the modulo-based approach, require floating-point arithmetic, or are based on a family of hash functions rarely available in standard libraries. This paper introduces JumpBackHash, which uses only integer arithmetic and a standard pseudorandom generator. Due to its speed and simple implementation, it can safely replace the modulo-based approach to improve assignment and system stability. A production-ready Java implementation of JumpBackHash has been released as part of the Hash4j open source library.
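The instability of the modulo approach, and what a consistent hash buys, can be demonstrated with a short experiment. JumpBackHash's own algorithm is not reproduced here. As a plainly swapped-in point of comparison, the sketch uses the well-known Jump Consistent Hash of Lamping and Veach, which relies on floating-point division, one of the drawbacks the abstract attributes to earlier consistent hash algorithms. Key values and bucket counts are arbitrary.

```python
import random

MASK64 = (1 << 64) - 1


def modulo_bucket(key: int, num_buckets: int) -> int:
    """The simple, unstable approach: remainder after division."""
    return key % num_buckets


def jump_consistent_hash(key: int, num_buckets: int) -> int:
    """Jump Consistent Hash (Lamping & Veach), NOT JumpBackHash."""
    key &= MASK64
    b, j = -1, 0
    while j < num_buckets:
        b = j
        key = (key * 2862933555777941757 + 1) & MASK64  # 64-bit LCG step
        j = int((b + 1) * (float(1 << 31) / float((key >> 33) + 1)))
    return b


def moved_fraction(assign, keys, old_n, new_n):
    """Fraction of keys whose bucket changes when the bucket count changes."""
    moved = sum(assign(k, old_n) != assign(k, new_n) for k in keys)
    return moved / len(keys)


if __name__ == "__main__":
    rng = random.Random(42)
    keys = [rng.getrandbits(64) for _ in range(10_000)]
    print("modulo:", moved_fraction(modulo_bucket, keys, 10, 11))        # roughly 10/11
    print("jump:  ", moved_fraction(jump_consistent_hash, keys, 10, 11))  # roughly 1/11
```

Growing from 10 to 11 buckets, the modulo mapping reassigns most keys, whereas a consistent hash only moves the roughly 1/11 of keys that must land in the new bucket; every moved key goes to bucket 10.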
- [351] arXiv:2403.18684 [pdf, other]
-
Title: Scaling Laws For Dense Retrieval
Comments: Accepted at SIGIR 2024
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)
Scaling up neural models has yielded significant advancements in a wide array of tasks, particularly in language generation. Previous studies have found that the performance of neural models frequently adheres to predictable scaling laws, correlated with factors such as training set size and model size. This insight is invaluable, especially as large-scale experiments grow increasingly resource-intensive. Yet such scaling laws have not been fully explored in dense retrieval, due to the discrete nature of retrieval metrics and the complex relationships between training data and model sizes in retrieval tasks. In this study, we investigate whether the performance of dense retrieval models follows scaling laws as other neural models do. We propose to use contrastive log-likelihood as the evaluation metric and conduct extensive experiments with dense retrieval models implemented with different numbers of parameters and trained with different amounts of annotated data. Results indicate that, under our settings, the performance of dense retrieval models follows a precise power-law scaling related to the model size and the number of annotations. Additionally, we examine scaling with prevalent data augmentation methods to assess the impact of annotation quality, and apply the scaling law to find the best resource allocation strategy under a budget constraint. We believe that these insights will significantly contribute to understanding the scaling effect of dense retrieval models and offer meaningful guidance for future research.
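A pure power law is a straight line in log-log space, so a fit of the kind described above can be sketched as ordinary least squares on log-transformed data. This is an illustrative assumption: the sketch fits $L(N) = A N^{-\alpha}$ without an irreducible-loss offset and uses synthetic, noise-free numbers, whereas the paper's exact functional form, variables (model size and number of annotations), and fitted constants may differ.

```python
import math


def fit_power_law(sizes, losses):
    """Fit L(N) = A * N**(-alpha) by least squares on log L = log A - alpha * log N."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(loss) for loss in losses]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return math.exp(intercept), -slope  # (A, alpha)


def predict(a, alpha, n):
    """Extrapolate the fitted law to a new scale n."""
    return a * n ** (-alpha)


if __name__ == "__main__":
    sizes = [1e6, 1e7, 1e8, 1e9]                # synthetic model sizes
    losses = [5.0 * n ** -0.1 for n in sizes]   # synthetic losses, exact power law
    a, alpha = fit_power_law(sizes, losses)
    print(f"A={a:.3f}, alpha={alpha:.3f}, L(1e10)={predict(a, alpha, 1e10):.3f}")
```

Once fitted for each resource (parameters, annotations), such curves can be compared under a cost model to decide where an extra unit of budget reduces the loss most, which is the spirit of the budget-allocation analysis mentioned above.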
- [352] arXiv:2403.18685 [pdf, other]
-
Title: Representatividad Muestral en la Incertidumbre Simétrica Multivariada para la Selección de Atributos (Sample Representativeness in Multivariate Symmetric Uncertainty for Attribute Selection)
Authors: Gustavo Sosa-Cabrera
Comments: 52 pages, in Spanish. Advisors: Miguel García-Torres, Santiago Gómez-Guerrero, Christian E. Schaerer Serra
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG); Statistics Theory (math.ST)
In this work, we analyze the behavior of the multivariate symmetric uncertainty (MSU) measure using statistical simulation techniques under various mixes of randomly generated informative and non-informative features. Experiments show how the number of attributes, their cardinalities, and the sample size affect the MSU. Based on these results, this thesis proposes a heuristic condition that preserves good quality in the MSU under different combinations of these three factors, providing a new, useful criterion to help drive the process of dimension reduction.
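For reference, here is a sketch of an empirical MSU estimate from discrete samples. It assumes the definition used in the MSU literature, $\mathrm{MSU}(X_1,\dots,X_n) = \frac{n}{n-1}\left(1 - \frac{H(X_1,\dots,X_n)}{\sum_i H(X_i)}\right)$, which reduces to the usual symmetric uncertainty for $n = 2$, together with plug-in (frequency-count) entropy estimates. The thesis's exact estimator may differ, and returning 0 for all-constant features is a convention chosen here.

```python
import math
from collections import Counter


def entropy(samples):
    """Plug-in Shannon entropy (bits) of a sequence of hashable values."""
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in Counter(samples).values())


def msu(*columns):
    """Empirical multivariate symmetric uncertainty of two or more discrete features."""
    k = len(columns)
    if k < 2:
        raise ValueError("MSU needs at least two variables")
    joint = list(zip(*columns))
    marginal_sum = sum(entropy(col) for col in columns)
    if marginal_sum == 0:
        return 0.0  # all features constant: convention chosen for this sketch
    return k / (k - 1) * (1 - entropy(joint) / marginal_sum)
```

Plug-in estimates are where sample size and cardinality enter: with few samples and many categories, the joint entropy tends to be underestimated more than the marginals, so MSU is inflated even for independent features. For example, two independently drawn features with ten categories and only twenty samples typically yield a clearly positive MSU, consistent with the abstract's point that these factors affect the measure.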
--
In this work, we analyzed the behavior of a multivariate version of symmetric uncertainty through statistical simulation techniques on various combinations of randomly generated informative and non-informative attributes. The experiments show how the number of attributes, their cardinalities, and the sample size affect the MSU as a measure. In this thesis, by observing the results, we proposed a condition that preserves good quality in the MSU under different combinations of the three factors mentioned, which provides a new and valuable criterion for carrying out the dimensionality reduction process.
- [353] arXiv:2403.18687 [pdf, ps, other]
-
Title: InceptionTime vs. Wavelet – A comparison for time series classification
Comments: 4 pages, 1 figure
Subjects: Machine Learning (cs.LG)
Neural networks were used to classify infrasound data, and two different approaches were compared. The first is based on the direct classification of time series data, using a custom implementation of the InceptionTime network. For the second, we generated 2D images of the wavelet transforms of the signals, which were subsequently classified using a ResNet implementation. With appropriate hyperparameter settings, both achieve a classification accuracy above 90%, with the direct approach reaching 95.2%.
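The second pipeline turns each 1-D signal into a 2-D time-scale image before classification. Below is a minimal NumPy sketch of that step, assuming a Ricker (Mexican-hat) mother wavelet and integer widths. The abstract does not name the wavelet or the scales used, and the resulting array would still need resizing and channel handling before being fed to a ResNet.

```python
import numpy as np


def ricker(points, a):
    """Ricker (Mexican-hat) wavelet sampled at `points` positions with width `a`."""
    t = np.arange(points) - (points - 1) / 2.0
    norm = 2.0 / (np.sqrt(3.0 * a) * np.pi ** 0.25)
    return norm * (1 - (t / a) ** 2) * np.exp(-(t ** 2) / (2 * a ** 2))


def scalogram(signal, widths):
    """Continuous wavelet transform magnitude: one row per width, one column per sample."""
    signal = np.asarray(signal, dtype=float)
    out = np.empty((len(widths), signal.size))
    for i, w in enumerate(widths):
        n = min(10 * int(w), signal.size)  # keep the kernel no longer than the signal
        out[i] = np.convolve(signal, ricker(n, w), mode="same")
    return np.abs(out)


if __name__ == "__main__":
    t = np.linspace(0.0, 1.0, 500)
    image = scalogram(np.sin(2 * np.pi * 5 * t), range(1, 33))
    print(image.shape)  # (32, 500)
```

The direct approach skips this transform entirely and feeds the raw series to InceptionTime, which is one reason it is simpler to deploy.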
- [354] arXiv:2403.18690 [pdf, other]
-
Title: Annolid: Annotate, Segment, and Track Anything You Need
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Annolid is a deep learning-based software package designed for the segmentation, labeling, and tracking of research targets within video files, focusing primarily on animal behavior analysis. Based on state-of-the-art instance segmentation methods, Annolid now harnesses the Cutie video object segmentation model to achieve resilient, markerless tracking of multiple animals from single annotated frames, even in environments in which they may be partially or entirely concealed by environmental features or by one another. Our integration of Segment Anything and Grounding-DINO strategies additionally enables the automatic masking and segmentation of recognizable animals and objects by text command, removing the need for manual annotation. Annolid's comprehensive approach to object segmentation flexibly accommodates a broad spectrum of behavior analysis applications, enabling the classification of diverse behavioral states such as freezing, digging, pup huddling, and social interactions in addition to the tracking of animals and their body parts.
- [355] arXiv:2403.18692 [pdf, ps, other]
-
Title: Teaching Introductory HRI: UChicago Course "Human-Robot Interaction: Research and Practice"
Authors: Sarah Sebo
Comments: 4 pages, 2 tables, Presented at the Designing an Intro to HRI Course Workshop at HRI 2024 (arXiv:2403.05588)
Subjects: Robotics (cs.RO); Human-Computer Interaction (cs.HC)
In 2020, I designed the course CMSC 20630/30630 Human-Robot Interaction: Research and Practice as a hands-on introduction to human-robot interaction (HRI) research for both undergraduate and graduate students at the University of Chicago. Since 2020, I have taught and refined this course each academic year. Human-Robot Interaction: Research and Practice focuses on the core concepts and cutting-edge research in the field of HRI, covering topics that include nonverbal robot behavior, verbal robot behavior, social dynamics, norms & ethics, collaboration & learning, group interactions, applications, and future challenges of HRI. In course meetings, students lead discussions about cutting-edge, peer-reviewed HRI research publications. Students also participate in a quarter-long collaborative research project, in which they pursue an HRI research question that often involves conducting their own human-subjects study in which they recruit participants to interact with a robot. In this paper, I detail the structure of the course and its learning goals, as well as my reflections and student feedback on the course.
- [356] arXiv:2403.18695 [pdf, other]
-
Title: An Efficient Risk-aware Branch MPC for Automated Driving that is Robust to Uncertain Vehicle Behaviors
Subjects: Systems and Control (eess.SY); Robotics (cs.RO)
One of the critical challenges in automated driving is ensuring the safety of automated vehicles despite the unknown behavior of other vehicles. Although motion prediction modules can generate a probability distribution over various behavior modes, their probabilistic estimates are often inaccurate, which can lead to unsafe trajectories. To overcome this challenge, we propose a risk-aware motion planning framework that appropriately accounts for the ambiguity in the estimated probability distribution. We formulate the risk-aware motion planning problem as a min-max optimization problem and develop an efficient iterative method by incorporating a regularization term in the probability update step. Via extensive numerical studies, we validate the convergence of our method and demonstrate its advantages compared to state-of-the-art approaches.
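The min-max idea can be illustrated with a closed-form special case. The following is an assumption, not the paper's algorithm: ambiguity in the predicted mode probabilities is handled by a KL-divergence penalty with weight `1/lam`, for which the inner maximization over mode probabilities has the log-sum-exp solution used below. The paper instead uses an iterative method with a regularized probability update, and its cost model is far richer than the per-mode scalar costs of these hypothetical candidate plans.

```python
import math


def worst_case_cost(costs, p_hat, lam):
    """max_p [sum_i p_i c_i - KL(p || p_hat) / lam] = (1/lam) * log sum_i p_hat_i exp(lam c_i)."""
    m = max(costs)  # shift for numerical stability
    s = sum(p * math.exp(lam * (c - m)) for p, c in zip(p_hat, costs))
    return m + math.log(s) / lam


def worst_case_probs(costs, p_hat, lam):
    """The maximizing distribution: p_hat reweighted toward high-cost modes."""
    m = max(costs)
    w = [p * math.exp(lam * (c - m)) for p, c in zip(p_hat, costs)]
    z = sum(w)
    return [v / z for v in w]


def nominal_cost(costs, p_hat):
    """Expected cost if the predicted probabilities were trusted exactly."""
    return sum(p * c for p, c in zip(p_hat, costs))


def choose_plan(plan_costs, p_hat, lam):
    """Outer minimization over a finite set of candidate trajectories."""
    values = [worst_case_cost(c, p_hat, lam) for c in plan_costs]
    best = min(range(len(values)), key=values.__getitem__)
    return best, values


if __name__ == "__main__":
    plans = [[0.0, 10.0], [2.0, 3.0]]  # cost of each plan under two behavior modes
    p_hat = [0.9, 0.1]                 # predicted mode probabilities
    print(choose_plan(plans, p_hat, lam=1.0))
```

With these numbers the nominal planner prefers the first plan (expected cost 1.0 vs. 2.1), while the ambiguity-aware planner prefers the second, because the first becomes very costly if the rare mode is more likely than predicted. As `lam` approaches 0 the worst-case value approaches the nominal expected cost; larger `lam` trusts the prediction less.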
- [357] arXiv:2403.18697 [pdf, other]
-
Title: The Invalsi Benchmark: Measuring Language Models' Mathematical and Language Understanding in Italian
Subjects: Computation and Language (cs.CL)
While Italian is by all metrics a high-resource language, there is currently no language model pre-trained exclusively on Italian. As a result, fewer benchmarks are available to evaluate the performance of language models in Italian.
This work presents two new benchmarks to evaluate models' mathematical understanding and language understanding in Italian. These benchmarks are based on real tests taken by students aged 11 to 18 in the Italian school system, and have therefore been validated by several experts in didactics and pedagogy.
To validate this dataset, we evaluate the performance of 9 language models that perform best when writing in Italian, including our own fine-tuned models. We show that this is a challenging benchmark on which current language models are bounded by 60% accuracy.
We believe that the release of this dataset paves the way for improving future models' mathematical and language understanding in Italian.
- [358] arXiv:2403.18699 [pdf, other]
-
Title: Contrastive Learning with Orthonormal Anchors (CLOA)
Comments: 11 pages, 4 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
This study addresses the instability issues prevalent in contrastive learning, specifically examining the InfoNCE loss function and its derivatives. We reveal that these loss functions exhibit a restrictive behavior, leading to a convergence phenomenon in which embeddings tend to merge into a single point. This "over-fusion" effect harms classification accuracy in subsequent supervised-learning tasks. Through theoretical analysis, we demonstrate that embeddings that are equalized or confined to a rank-1 linear subspace represent a local minimum for InfoNCE. In response to this challenge, we introduce a strategy that uses the same amount of labeled data as, or less than, is typically used in the fine-tuning phase. The proposed loss, Orthonormal Anchor Regression Loss, is designed to disentangle embedding clusters, significantly enhancing the distinctiveness of each embedding while simultaneously ensuring their aggregation into dense, well-defined clusters. Our method demonstrates remarkable improvements with just a fraction of the conventional label requirements, as evidenced by our results on the CIFAR10 and CIFAR100 datasets.
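Here is a rough sketch of the anchor idea, under loudly stated assumptions: each class is assigned one standard basis vector as its orthonormal anchor, and the loss is the mean squared distance between each L2-normalized embedding and its class anchor. The abstract does not give the exact form of the Orthonormal Anchor Regression Loss, so the functions below are hypothetical. They only show why collapsed ("over-fused") embeddings are penalized while class-separated ones are not.

```python
import math


def orthonormal_anchors(num_classes, dim):
    """One standard basis vector per class (requires dim >= num_classes)."""
    if dim < num_classes:
        raise ValueError("need dim >= num_classes for orthonormal anchors")
    return [[1.0 if i == c else 0.0 for i in range(dim)] for c in range(num_classes)]


def l2_normalize(v, eps=1e-12):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]


def anchor_regression_loss(embeddings, labels, anchors):
    """Mean squared distance between normalized embeddings and their class anchors."""
    total = 0.0
    for z, y in zip(embeddings, labels):
        zn = l2_normalize(z)
        total += sum((a - b) ** 2 for a, b in zip(zn, anchors[y]))
    return total / len(embeddings)


if __name__ == "__main__":
    anchors = orthonormal_anchors(3, 4)
    collapsed = [[1.0, 1.0, 1.0, 0.0]] * 3  # every class mapped to the same point
    print(anchor_regression_loss(collapsed, [0, 1, 2], anchors))
```

Because the anchors are mutually orthogonal, no single point can be close to all of them, so a rank-1 collapse of the kind described above always leaves a positive loss.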
- [359] arXiv:2403.18702 [pdf, other]
-
Title: Toward CXL-Native Memory Tiering via Device-Side Profiling
Authors: Zhe Zhou, Yiqi Chen, Tao Zhang, Yang Wang, Ran Shu, Shuotao Xu, Peng Cheng, Lei Qu, Yongqiang Xiong, Guangyu Sun
Subjects: Hardware Architecture (cs.AR)
The Compute Express Link (CXL) interconnect makes it possible to integrate diverse memory types into servers via byte-addressable SerDes links. Harnessing the full potential of such heterogeneous memory systems requires efficient memory tiering. However, existing research in this domain has been constrained by low-resolution and high-overhead memory access profiling techniques. To address this critical challenge, we propose to enhance existing memory tiering systems with NeoMem, a novel solution that offloads memory profiling functions to device-side controllers by integrating a dedicated hardware unit called NeoProf. NeoProf tracks memory accesses and provides the operating system with crucial page hotness statistics and other useful system state information. On the OS kernel side, we introduce a revamped memory tiering strategy that enables accurate and timely hot page promotion based on NeoProf statistics. We implement NeoMem on a real CXL-enabled FPGA platform and Linux kernel v6.3. Comprehensive evaluations demonstrate that NeoMem achieves 32% to 67% geomean speedup over several existing memory tiering solutions.
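The division of labor described above (device-side access counting, OS-side promotion decisions) can be sketched as follows. Everything here is hypothetical: the `DeviceProfiler` class stands in for a hardware counter such as NeoProf but does not model its actual interface, and the threshold-plus-budget promotion policy is an illustrative assumption, not NeoMem's kernel strategy.

```python
from collections import Counter

PAGE_SIZE = 4096  # assumed 4 KiB pages


class DeviceProfiler:
    """Hypothetical device-side profiler: counts accesses per physical page."""

    def __init__(self, page_size=PAGE_SIZE):
        self.page_size = page_size
        self.counts = Counter()

    def record(self, phys_addr):
        self.counts[phys_addr // self.page_size] += 1

    def drain(self):
        """Hand the current statistics to the OS and start a new epoch."""
        snapshot, self.counts = self.counts, Counter()
        return snapshot


def pages_to_promote(counts, fast_tier, threshold, budget):
    """OS side: hottest slow-tier pages above a threshold, at most `budget` per epoch."""
    hot = [(c, p) for p, c in counts.items() if c >= threshold and p not in fast_tier]
    hot.sort(key=lambda cp: (-cp[0], cp[1]))  # hottest first, ties by page number
    return [p for _, p in hot[:budget]]
```

Counting on the device means the CPU never pays for sampling or page-table scanning; the OS only reads a compact per-page summary once per epoch, which is the overhead argument the abstract makes.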
- [360] arXiv:2403.18703 [pdf, other]
-
Title: FPGA-Based Neural Thrust Controller for UAVs
Authors: Sharif Azem, David Scheunert, Mengguang Li, Jonas Gehrunger, Kai Cui, Christian Hochberger, Heinz Koepp
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
The advent of unmanned aerial vehicles (UAVs) has improved a variety of fields by providing a versatile, cost-effective, and accessible platform for implementing state-of-the-art algorithms. To accomplish a broader range of tasks, there is a growing need for enhanced on-board computing to cope with increasing complexity and dynamic environmental conditions. Recent advances have seen the application of Deep Neural Networks (DNNs), particularly in combination with Reinforcement Learning (RL), to improve the adaptability and performance of UAVs, especially in unknown environments. However, the computational requirements of DNNs pose a challenge to the limited computing resources available on many UAVs. This work explores the use of Field Programmable Gate Arrays (FPGAs) as a viable solution to this challenge, offering flexibility, high performance, and energy and time efficiency. We propose a novel hardware board equipped with an Artix-7 FPGA for a popular open-source micro-UAV platform, and we validate its functionality in real-world experiments by implementing an RL-based low-level controller.
- [361] arXiv:2403.18704 [pdf, ps, other]
-
Title: Convergence rates under a range invariance condition with application to electrical impedance tomography
Authors: Barbara Kaltenbacher
Subjects: Numerical Analysis (math.NA)
This paper is devoted to proving convergence rates of variational and iterative regularization methods under variational source conditions (VSCs) for inverse problems whose linearization satisfies a range invariance condition. To achieve this, an appropriate relaxation of the problem often needs to be found; it is usually based on an augmentation of the set of unknowns and leads to a particularly structured reformulation of the inverse problem. We analyze three approaches that make use of this structure: a variational scheme and a Newton-type scheme, whose convergence without rates has already been established in [rangeinvar], and, additionally, a split minimization approach that we propose and show to satisfy the same rate results.
The range invariance condition has been verified for several coefficient identification problems for partial differential equations from boundary observations, as relevant in a variety of tomographic imaging modalities. Our motivation comes particularly from the now-classical inverse problem of electrical impedance tomography (EIT), and we study both the original formulation by a diffusion-type equation and its reformulation as a Schrödinger equation. For both, we find relaxations that can be proven to satisfy the range invariance condition. Combining results on VSCs from [Diss-Weidling] with the abstract framework for the three approaches mentioned above, we arrive at convergence rate results for the variational, split minimization, and Newton-type methods in EIT.
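For orientation, a variational source condition in its commonly used generic form is shown below. The notation ($F$ forward operator, $\mathcal{R}$ regularization functional, $D$ an error measure such as a Bregman distance, $\varphi$ a concave index function with $\varphi(0) = 0$, $\mathcal{M}$ a neighborhood of the exact solution $x^\dagger$, $\delta$ the noise level) is the standard one from the regularization literature and is not taken from this paper, whose precise conditions may differ.

```latex
\beta\, D(x, x^\dagger) \;\le\; \mathcal{R}(x) - \mathcal{R}(x^\dagger)
  + \varphi\big(\|F(x) - F(x^\dagger)\|\big)
  \qquad \text{for all } x \in \mathcal{M}, \text{ with some } \beta > 0.

% Under such a condition and a suitable choice of the regularization
% parameter \alpha = \alpha(\delta), the regularized solutions typically satisfy
D\big(x_\alpha^\delta, x^\dagger\big) = \mathcal{O}\big(\varphi(\delta)\big)
  \qquad \text{as } \delta \to 0.
```

The rate is thus inherited directly from the index function $\varphi$, which is why verifying a VSC for a concrete problem such as EIT translates into an explicit convergence rate.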
- [362] arXiv:2403.18705 [pdf, other]
-
Title: Conditional Wasserstein Distances with Applications in Bayesian OT Flow Matching
Comments: This paper supersedes arXiv:2310.13433
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC)
In inverse problems, many conditional generative models approximate the posterior measure by minimizing a distance between the joint measure and its learned approximation. While this approach also controls the distance between the posterior measures in the case of the Kullback-Leibler divergence, this does not hold in general for the Wasserstein distance. In this paper, we introduce a conditional Wasserstein distance via a set of restricted couplings that equals the expected Wasserstein distance of the posteriors. Interestingly, the dual formulation of the conditional Wasserstein-1 flow resembles losses in the conditional Wasserstein GAN literature in a quite natural way. We derive theoretical properties of the conditional Wasserstein distance and characterize the corresponding geodesics and velocity fields as well as the flow ODEs. Subsequently, we propose to approximate the velocity fields by relaxing the conditional Wasserstein distance. Based on this, we propose an extension of OT Flow Matching for solving Bayesian inverse problems and demonstrate its numerical advantages on an inverse problem and class-conditional image generation.
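For intuition, the quantity the conditional distance is designed to equal, the expected Wasserstein distance between posteriors, can be computed directly in a toy setting. Assumptions: the observation $y$ takes finitely many values, $x$ is one-dimensional, and for each $y$ both sample sets have the same size, so that the empirical $W_1$ reduces to the mean absolute difference of sorted samples. The helpers are hypothetical and do not implement the paper's restricted-coupling construction or its OT Flow Matching method.

```python
from collections import defaultdict


def w1_1d(xs, zs):
    """Empirical W1 between equal-size 1-D samples: mean |sorted difference|."""
    if len(xs) != len(zs):
        raise ValueError("this sketch assumes equal sample sizes per observation")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(zs))) / len(xs)


def conditional_w1(true_pairs, model_pairs):
    """Expected posterior W1 over discrete y; pairs are (y, x), weights are empirical P(y)."""
    by_y_true, by_y_model = defaultdict(list), defaultdict(list)
    for y, x in true_pairs:
        by_y_true[y].append(x)
    for y, x in model_pairs:
        by_y_model[y].append(x)
    n = len(true_pairs)
    return sum(len(xs) / n * w1_1d(xs, by_y_model[y]) for y, xs in by_y_true.items())


if __name__ == "__main__":
    data = [(0, 0.0), (0, 1.0), (1, 5.0), (1, 6.0)]
    swapped = [(0, 5.0), (0, 6.0), (1, 0.0), (1, 1.0)]  # posteriors exchanged
    print(conditional_w1(data, swapped))
```

The swapped model reproduces the marginal distribution of $x$ exactly, yet every posterior is wrong, and the conditional comparison reports this as a distance of 5.0.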
- [363] arXiv:2403.18708 [pdf, other]
-
Title: Dense Vision Transformer Compression with Few Samples
Comments: Accepted to CVPR 2024. Note: Jianxin Wu is a contributing author for the arXiv version of this paper but is not listed as an author in the CVPR version due to his role as Program Chair
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Few-shot model compression aims to compress a large model into a more compact one using only a tiny training set (possibly without labels). Block-level pruning has recently emerged as a leading technique for achieving high accuracy and low latency in few-shot CNN compression. However, few-shot compression for Vision Transformers (ViT) remains largely unexplored, which presents a new challenge. In particular, traditional few-shot CNN methods suffer from sparse compression: they can only produce very few compressed models of different model sizes. This paper proposes a novel framework for few-shot ViT compression named DC-ViT. Instead of dropping the entire block, DC-ViT selectively eliminates the attention module while retaining and reusing portions of the MLP module. DC-ViT enables dense compression, which outputs numerous compressed models that densely populate the range of model complexity. DC-ViT outperforms state-of-the-art few-shot compression methods by a significant margin of 10 percentage points, along with lower latency in the compression of ViT and its variants.
- [364] arXiv:2403.18710 [pdf, other]
-
Title: Deep Learning for Traffic Flow Prediction using Cellular Automata-based Model and CNN-LSTM architecture
Subjects: Machine Learning (cs.LG)
Recent works have attempted to use deep learning to predict future states of traffic flow, but have met with mixed results. These approaches face two key challenges. First, training deep neural networks requires large amounts of training data, which are not yet easily available for traffic flow systems. Second, even when data is available, the neural networks require access to historical data that covers most possible traffic flow dynamics to successfully predict future traffic states. Moreover, these deep learning approaches do not fully leverage domain knowledge about traffic flow dynamics, despite a significant existing knowledge base. In this work, we propose to solve both issues using a Convolutional Neural Network (CNN) with Long Short-Term Memory (LSTM) deep learning architecture to successfully predict traffic flow, while leveraging a cellular automata-based statistical mechanics model of traffic flow to generate training and test data. Another major contribution of this paper is the insight that training data for a large traffic system can actually be sampled from simulations of a much smaller traffic system. This is achieved by observing that the normalized energy distribution of the statistical mechanics model is scale invariant, which significantly eases the burden of data generation for large-scale traffic systems. The resulting simulations indicate good agreement between the predicted and the true traffic flow dynamics.
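As a concrete example of a cellular-automaton traffic model of the kind used here to generate training data, the sketch below implements the classic Nagel-Schreckenberg rules on a ring road. This is a swapped-in textbook model chosen for illustration: the abstract does not specify the paper's statistical mechanics model, and the road length, maximum speed, and slowdown probability are arbitrary.

```python
import random


def nasch_step(positions, velocities, road_length, v_max, p_slow, rng):
    """One parallel Nagel-Schreckenberg update on a circular road of `road_length` cells."""
    order = sorted(range(len(positions)), key=lambda i: positions[i])
    pos = [positions[i] for i in order]
    vel = [velocities[i] for i in order]
    n = len(pos)
    new_vel = []
    for i in range(n):
        gap = (pos[(i + 1) % n] - pos[i] - 1) % road_length  # empty cells ahead
        v = min(vel[i] + 1, v_max)           # 1. accelerate
        v = min(v, gap)                      # 2. brake to avoid a collision
        if v > 0 and rng.random() < p_slow:  # 3. random slowdown
            v -= 1
        new_vel.append(v)
    new_pos = [(x + v) % road_length for x, v in zip(pos, new_vel)]  # 4. move
    return new_pos, new_vel


def simulate(n_cars, road_length, steps, v_max=5, p_slow=0.3, seed=0):
    """Run the automaton and record the flow (density times mean speed) per step."""
    rng = random.Random(seed)
    spacing = road_length // n_cars
    pos = [i * spacing for i in range(n_cars)]
    vel = [0] * n_cars
    flows = []
    for _ in range(steps):
        pos, vel = nasch_step(pos, vel, road_length, v_max, p_slow, rng)
        flows.append(sum(vel) / road_length)
    return pos, vel, flows
```

Sequences of occupancy grids or flows produced this way could serve as inputs and targets for a sequence model such as a CNN-LSTM; sweeping the number of cars traces out the familiar free-flow and jammed regimes.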
- [365] arXiv:2403.18711 [pdf, other]
-
Title: SAT-NGP: Unleashing Neural Graphics Primitives for Fast Relightable Transient-Free 3D reconstruction from Satellite Imagery
Comments: 5 pages, 3 figures, 1 table; Accepted to International Geoscience and Remote Sensing Symposium (IGARSS) 2024; Code available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Current stereo-vision pipelines produce highly accurate 3D reconstructions when using multiple pairs or triplets of satellite images. However, these pipelines are sensitive to changes between images that can occur as a result of multi-date acquisitions. Such variations are mainly due to variable shadows, reflections, and transient objects (cars, vegetation). To take such changes into account, Neural Radiance Fields (NeRF) have recently been applied to multi-date satellite imagery. However, neural methods are very compute-intensive, taking dozens of hours to train, compared with minutes for standard stereo-vision pipelines. Following the ideas of Instant Neural Graphics Primitives, we propose to use an efficient sampling strategy and multi-resolution hash encoding to accelerate training. Our model, Satellite Neural Graphics Primitives (SAT-NGP), decreases the training time to 15 minutes while maintaining the quality of the 3D reconstruction.
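The multi-resolution hash encoding borrowed from Instant Neural Graphics Primitives maps each grid corner at each resolution level to a slot in a fixed-size feature table. The sketch below shows only the indexing part: geometric level resolutions and the XOR-of-primes spatial hash described in the Instant-NGP paper. The learned feature tables, trilinear interpolation, the dense indexing Instant-NGP uses for coarse levels that fit in the table, and SAT-NGP's sampling strategy are all omitted, and the parameter values are illustrative.

```python
import math

# Per-axis primes of the Instant-NGP spatial hash (the first axis uses 1).
PRIMES = (1, 2654435761, 805459861)


def level_resolutions(n_levels, n_min, n_max):
    """Grid resolution per level, growing geometrically from n_min toward n_max."""
    growth = math.exp((math.log(n_max) - math.log(n_min)) / (n_levels - 1))
    return [math.floor(n_min * growth ** level) for level in range(n_levels)]


def spatial_hash(corner, table_size):
    """XOR the per-axis products, then reduce modulo the table size."""
    h = 0
    for coord, prime in zip(corner, PRIMES):
        h ^= coord * prime
    return h % table_size


def corner_slots(point, resolution, table_size):
    """Table slots of the 8 voxel corners around `point` (coordinates in [0, 1])."""
    base = [math.floor(x * resolution) for x in point]
    return [
        spatial_hash((base[0] + dx, base[1] + dy, base[2] + dz), table_size)
        for dx in (0, 1) for dy in (0, 1) for dz in (0, 1)
    ]


if __name__ == "__main__":
    table_size = 2 ** 14
    for res in level_resolutions(n_levels=4, n_min=16, n_max=512):
        print(res, corner_slots((0.3, 0.7, 0.1), res, table_size))
```

Because every level reads only a handful of table entries per sample instead of evaluating a large MLP, the network that follows can be tiny, which is where most of the training speed-up comes from.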
- [366] arXiv:2403.18713 [pdf, other]
-
Title: Characterization of Spatial-Temporal Channel Statistics from Indoor Measurement Data at D Band
Authors: Chathuri Weragama, Joonas Kokkoniemi, Mar Francis De Guzman, Katsuyuki Haneda, Pekka Kyosti, Markku Juntti
Comments: 6 pages, 22 figures
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Millimeter-wave (mmWave) and D band (110–170 GHz) frequencies are poised to play a pivotal role in the advancement of sixth-generation (6G) systems and beyond, owing to their ability to improve capacity and spectral efficiency and to enable ultra-low latency. This paper concentrates on deriving statistical insights into power, delay, and the number of paths based on measurements conducted at four distinct locations at a center frequency of 143.1 GHz. The findings show that several distributions, including lognormal, Nakagami, gamma, and beta, are suitable for characterizing power behavior in line-of-sight (LOS) scenarios, whereas the loglogistic distribution gives the best fit for the power distribution in non-line-of-sight (NLOS) scenarios. Moreover, the exponential distribution proves to be the most appropriate model for the delay distribution in both LOS and NLOS scenarios. In terms of the number of paths, observations indicate that the highest concentration occurs within the 10 m to 30 m distance range between the transmitter (Tx) and receiver (Rx). These insights shed light on the statistical nature of D band propagation characteristics, which are vital for informing the design and optimization of future 6G communication systems.
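Fitting the exponential delay model mentioned above reduces to a one-line maximum-likelihood estimate (the rate is the reciprocal of the sample mean), and goodness of fit can be checked with a Kolmogorov-Smirnov statistic. The sketch below uses synthetic delays, not the paper's measurement data, and the paper's fitting and model-selection procedure may differ (for example, in whether delays are measured relative to the first arrival).

```python
import math
import random


def fit_exponential_rate(delays):
    """MLE of the rate of an exponential distribution: 1 / sample mean."""
    return len(delays) / sum(delays)


def ks_statistic_exponential(delays, rate):
    """Largest gap between the empirical CDF and the fitted exponential CDF."""
    xs = sorted(delays)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = 1.0 - math.exp(-rate * x)
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


if __name__ == "__main__":
    rng = random.Random(1)
    delays_ns = [rng.expovariate(1 / 12.0) for _ in range(5000)]  # synthetic, mean 12 ns
    rate = fit_exponential_rate(delays_ns)
    print(f"mean delay ~ {1 / rate:.2f} ns, KS = {ks_statistic_exponential(delays_ns, rate):.4f}")
```

Comparing candidate distributions (lognormal, Nakagami, loglogistic, and so on) then amounts to fitting each one and preferring the one with the smallest KS statistic or the best information criterion.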
- [367] arXiv:2403.18714 [pdf, other]
-
Title: Bringing Textual Prompt to AI-Generated Image Quality Assessment
Comments: 6 pages, 3 figures, accepted by ICME 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
AI-Generated Images (AGIs) are inherently multimodal. Unlike traditional image quality assessment (IQA) on natural scenes, AGI quality assessment (AGIQA) takes the correspondence between an image and its textual prompt into consideration. This correspondence is entangled in the ground-truth score, which confuses unimodal IQA methods. To solve this problem, we introduce IP-IQA (AGIs Quality Assessment via Image and Prompt), a multimodal framework for AGIQA that incorporates both the image and its corresponding prompt. Specifically, we propose a novel incremental pretraining task named Image2Prompt for better understanding of AGIs and their corresponding textual prompts. An effective and efficient image-prompt fusion module, along with a novel special [QA] token, is also applied. Both are plug-and-play and beneficial for the cooperation of an image and its corresponding prompt. Experiments demonstrate that IP-IQA achieves state-of-the-art performance on the AGIQA-1k and AGIQA-3k datasets. Code will be available.
- [368] arXiv:2403.18715 [pdf, other]
-
Title: Mitigating Hallucinations in Large Vision-Language Models with Instruction Contrastive Decoding
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM)
Large Vision-Language Models (LVLMs) are increasingly adept at generating contextually detailed and coherent responses from visual inputs. However, their application in multimodal decision-making and open-ended generation is hindered by a notable rate of hallucinations, where generated text inaccurately represents the visual contents. To address this issue, this paper introduces the Instruction Contrastive Decoding (ICD) method, a novel approach designed to reduce hallucinations during LVLM inference. Our method is inspired by our observation that what we call disturbance instructions significantly exacerbate hallucinations in multimodal fusion modules. ICD contrasts the distributions obtained with standard and disturbed instructions, thereby increasing alignment uncertainty and effectively subtracting hallucinated concepts from the original distribution. Through comprehensive experiments on discriminative benchmarks (POPE and MME) and a generative benchmark (LLaVa-Bench), we demonstrate that ICD significantly mitigates both object-level and attribute-level hallucinations. Moreover, our method not only addresses hallucinations but also significantly enhances the general perception and recognition capabilities of LVLMs.
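The contrast step can be sketched on toy logits. It assumes the widely used contrastive-decoding form $(1+\alpha)\,\ell_{\text{std}} - \alpha\,\ell_{\text{dist}}$, where $\ell_{\text{std}}$ are next-token logits under the standard instruction and $\ell_{\text{dist}}$ those under the disturbance instruction. ICD's exact formula and any additional plausibility constraints may differ, and the numbers are made up.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def contrastive_logits(standard, disturbed, alpha):
    """Boost tokens the standard instruction favors; penalize those the disturbance inflates."""
    return [(1 + alpha) * s - alpha * d for s, d in zip(standard, disturbed)]


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


if __name__ == "__main__":
    standard = [2.0, 1.9, 0.0]   # token 0: hallucinated object, token 1: grounded object
    disturbed = [3.0, 0.5, 0.0]  # the disturbance instruction inflates token 0
    probs = softmax(contrastive_logits(standard, disturbed, alpha=1.0))
    print(argmax(standard), argmax(probs))  # prints: 0 1
```

Tokens whose probability rises mainly because of the disturbance are treated as likely hallucinations and are pushed down, while tokens supported under the standard instruction keep their lead; `alpha = 0` recovers ordinary decoding.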
- [369] arXiv:2403.18716 [pdf, other]
-
Title: Statistical testing of random number generators and their improvement using randomness extraction
Comments: 20+10 pages, 8 figures and 28 tables. Comments are welcome!
Subjects: Cryptography and Security (cs.CR); Quantum Physics (quant-ph)
Random number generators (RNGs) are notoriously hard to build and test, especially in a cryptographic setting. Although one cannot conclusively determine the quality of an RNG by testing the statistical properties of its output alone, running numerical tests is both a powerful verification tool and the only universally applicable method. In this work, we present and make available a comprehensive statistical testing environment (STE) that is based on existing statistical test suites. The STE can be parameterised to run anything from lightweight (i.e., fast) to intensive testing, going far beyond what is required by certification bodies. With it, we benchmark the statistical properties of several RNGs, comparing them against each other. We then present and implement a variety of post-processing methods, in the form of randomness extractors, which improve the RNG's output quality under different sets of assumptions, and we analyse their impact through numerical testing with the STE.
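Two of the ingredients mentioned above are easy to show in miniature: the frequency (monobit) test, one of the simplest tests in suites such as NIST SP 800-22, and the von Neumann extractor, a classic post-processing method. The sketch assumes the raw bits are independent with a fixed bias, which is exactly the assumption under which von Neumann extraction yields unbiased output. It does not reproduce the paper's STE or its specific extractors.

```python
import math
import random


def von_neumann_extract(bits):
    """Pair up bits; emit the first bit of each unequal pair (10 -> 1, 01 -> 0)."""
    return [a for a, b in zip(bits[0::2], bits[1::2]) if a != b]


def monobit_p_value(bits):
    """Frequency (monobit) test: p-value for the hypothesis that 0s and 1s are balanced."""
    n = len(bits)
    s = sum(2 * b - 1 for b in bits)  # map 0 -> -1, 1 -> +1
    s_obs = abs(s) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


if __name__ == "__main__":
    rng = random.Random(7)
    raw = [1 if rng.random() < 0.7 else 0 for _ in range(100_000)]  # biased source
    extracted = von_neumann_extract(raw)
    print(f"raw: p = {monobit_p_value(raw):.3g}")
    print(f"extracted: {len(extracted)} bits, p = {monobit_p_value(extracted):.3g}")
```

For independent bits with bias p, the pairs 10 and 01 both occur with probability p(1 - p), so the output is unbiased; the price is throughput, since only about 2p(1 - p) of the pairs produce a bit. Correlated sources break this guarantee, which is why extractors are analysed under explicit assumptions.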
- [370] arXiv:2403.18717 [pdf, other]
-
Title: Semi-Supervised Learning for Deep Causal Generative Models
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Developing models that can answer questions of the form "How would $x$ change if $y$ had been $z$?" is fundamental for advancing medical image analysis. Training causal generative models that address such counterfactual questions, though, currently requires that all relevant variables have been observed and that corresponding labels are available in training data. However, clinical data may not have complete records for all patients, and state-of-the-art causal generative models are unable to take full advantage of this. We thus develop, for the first time, a semi-supervised deep causal generative model that exploits the causal relationships between variables to maximise the use of all available data. We explore this in the setting where each sample is either fully labelled or fully unlabelled, as well as the more clinically realistic case of having different labels missing for each sample. We leverage techniques from causal inference to infer missing values and subsequently generate realistic counterfactuals, even for samples with incomplete labels.
- [371] arXiv:2403.18720 [pdf, ps, other]
-
Title: Testing Resource Isolation for System-on-Chip Architectures
Comments: In Proceedings MARS 2024, arXiv:2403.17862
Journal-ref: EPTCS 399, 2024, pp. 129-168
Subjects: Hardware Architecture (cs.AR); Cryptography and Security (cs.CR); Software Engineering (cs.SE)
Ensuring resource isolation at the hardware level is a crucial step towards greater security in the Internet of Things. Even though there is still no generally accepted technique for generating appropriate tests, it has become clear that tests should be generated at the system level. In this paper, we illustrate the modeling aspects of test generation for resource isolation, namely modeling the behavior and expressing the intended test scenario. We present both aspects using the industrial standard PSS and an academic approach based on conformance testing.
- [372] arXiv:2403.18721 [pdf, other]
-
Title: PhysicsAssistant: An LLM-Powered Interactive Learning Robot for Physics Lab Investigations
Comments: Submitted to IEEE RO-MAN
Subjects: Robotics (cs.RO)
Robot systems in education can leverage the natural language understanding capabilities of large language models (LLMs) to provide assistance and facilitate learning. This paper proposes a multimodal interactive robot (PhysicsAssistant), built on YOLOv8 object detection, cameras, speech recognition, and an LLM-based chatbot, to assist students in physics labs. We conduct a user study with ten 8th-grade students, in which a human expert empirically evaluates the performance of PhysicsAssistant. The expert rates the assistants' responses to student queries on a 0-4 scale based on Bloom's taxonomy to provide educational support. We compared the performance of PhysicsAssistant (YOLOv8+GPT-3.5-turbo) with GPT-4 and found that the human expert's ratings of both systems for factual understanding are the same. However, GPT-4's ratings for conceptual and procedural knowledge (3 and 3.2 vs. 2.2 and 2.6, respectively) are significantly higher than those of PhysicsAssistant (p < 0.05). On the other hand, the response time of GPT-4 is significantly longer than that of PhysicsAssistant (3.54 vs. 1.64 sec, p < 0.05). Hence, despite its lower response quality relative to GPT-4, PhysicsAssistant shows potential as a real-time lab assistant that provides timely responses and can relieve teachers of repetitive tasks. To the best of our knowledge, this is the first attempt to build such an interactive multimodal robotic assistant for K-12 science (physics) education.
- [373] arXiv:2403.18722 [pdf, other]
-
Title: Formally Modelling the Rijkswaterstaat Tunnel Control Systems in a Constrained Industrial Environment
Authors: Kevin H.J. Jilissen (Rijkswaterstaat), Peter Dieleman (Rijkswaterstaat), Jan Friso Groote (Eindhoven University of Technology)
Comments: In Proceedings MARS 2024, arXiv:2403.17862
Journal-ref: EPTCS 399, 2024, pp. 101-127
Subjects: Logic in Computer Science (cs.LO)
Rijkswaterstaat, the Dutch national body responsible for infrastructure, recognised the importance of formal modelling and set up a program to model the control of road tunnels. This is done to improve the standardisation of tunnel control and to make communication with suppliers smoother. A subset of SysML is used to formulate the models, which are substantial. In an earlier paper, we showed that these models can be used to prove behavioural properties by manually translating them to mCRL2. In this paper, we report on an automatic translation to mCRL2. As the results of the translation became unwieldy, we also investigated modelling tunnel control in the specification language Dezyne, which has built-in verification capabilities, and compared the results.
- [374] arXiv:2403.18723 [pdf, other]
-
Title: Four Formal Models of IEEE 1394 Link Layer
Authors: Hubert Garavel (Univ. Grenoble Alpes, INRIA, CNRS, Grenoble INP, LIG, Grenoble, France), Bas Luttik (Eindhoven University of Technology, The Netherlands)
Comments: In Proceedings MARS 2024, arXiv:2403.17862
Journal-ref: EPTCS 399, 2024, pp. 21-100
Subjects: Logic in Computer Science (cs.LO); Hardware Architecture (cs.AR); Programming Languages (cs.PL)
We revisit the IEEE 1394 high-performance serial bus ("FireWire"), which became a success story in formal methods after three PhD students, by using process algebra and model checking, detected a deadlock error in this IEEE standard. We present four formal models for the asynchronous mode of the Link Layer of IEEE 1394: the original model in muCRL, a simplified model in mCRL2, a revised model in LOTOS, and a novel model in LNT.
- [375] arXiv:2403.18724 [pdf, other]
-
Title: An exactly curl-free finite-volume scheme for a hyperbolic compressible barotropic two-phase model
Subjects: Numerical Analysis (math.NA)
We present a new second-order accurate structure-preserving finite volume scheme for the solution of the compressible barotropic two-phase model of Romenski et al. in multiple space dimensions. The governing equations fall into the wider class of symmetric hyperbolic and thermodynamically compatible (SHTC) systems and consist of a set of first-order hyperbolic partial differential equations (PDEs). In the absence of algebraic source terms, the model is subject to a curl-free constraint on the relative velocity between the two phases. The main objective of this paper is, therefore, to preserve this structural property exactly at the discrete level as well. The new numerical method is based on a staggered grid arrangement in which the relative velocity field is stored at the cell vertices while all the remaining variables are stored at the cell centers. This allows the definition of discretely compatible gradient and curl operators, which ensure that the discrete curl errors of the relative velocity field remain zero up to machine precision. A set of numerical results confirms this property experimentally.
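The key property is that the discrete curl annihilates discrete gradients, and a small Python sketch can check it numerically. The stencils below are an assumption: a common corner-based construction on a uniform 2D staggered grid, with the scalar at cell centres, the vector field at vertices, and the curl evaluated back at cell centres. They are not necessarily the exact operators of the paper, and they act on a generic field rather than on the two-phase model's relative velocity.

```python
import random


def vertex_gradient(phi, dx, dy):
    """Gradient of a cell-centred scalar phi[i][j] (cell (i, j) has centre
    ((i + 1/2) dx, (j + 1/2) dy)), evaluated at each interior vertex (i, j)
    from the four cells sharing it. Returns dicts keyed by vertex (i, j)."""
    nx, ny = len(phi), len(phi[0])
    gx, gy = {}, {}
    for i in range(1, nx):
        for j in range(1, ny):
            gx[i, j] = ((phi[i][j] + phi[i][j - 1]) - (phi[i - 1][j] + phi[i - 1][j - 1])) / (2 * dx)
            gy[i, j] = ((phi[i][j] + phi[i - 1][j]) - (phi[i][j - 1] + phi[i - 1][j - 1])) / (2 * dy)
    return gx, gy


def cell_curl(u, w, nx, ny, dx, dy):
    """z-component of the curl of a vertex-based vector field (u, w), evaluated
    at the centre of every cell (i, j) whose four corner vertices
    (i..i+1, j..j+1) are interior vertices."""
    curl = {}
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            dw_dx = ((w[i + 1, j] + w[i + 1, j + 1]) - (w[i, j] + w[i, j + 1])) / (2 * dx)
            du_dy = ((u[i, j + 1] + u[i + 1, j + 1]) - (u[i, j] + u[i + 1, j])) / (2 * dy)
            curl[i, j] = dw_dx - du_dy
    return curl


if __name__ == "__main__":
    rng = random.Random(0)
    nx, ny, dx, dy = 8, 6, 0.1, 0.2
    phi = [[rng.random() for _ in range(ny)] for _ in range(nx)]
    gx, gy = vertex_gradient(phi, dx, dy)
    # Discrete identity curl(grad phi) = 0: only floating-point round-off remains.
    print(max(abs(c) for c in cell_curl(gx, gy, nx, ny, dx, dy).values()))
    # Sanity check: the rotational field u = -y, w = x has curl 2.
    u = {(i, j): -j * dy for i in range(1, nx) for j in range(1, ny)}
    w = {(i, j): i * dx for i in range(1, nx) for j in range(1, ny)}
    print(max(abs(c - 2.0) for c in cell_curl(u, w, nx, ny, dx, dy).values()))
```

Both operators combine a difference in one direction with an average in the other. On a structured grid these shift operators commute, so the two terms of curl(grad φ) cancel identically rather than only to truncation-error accuracy.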
- [376] arXiv:2403.18725 [pdf, ps, other]
-
Title: Probabilistic Model Checking of Stochastic Reinforcement Learning Policies
Subjects: Artificial Intelligence (cs.AI)
We introduce a method to verify stochastic reinforcement learning (RL) policies. This approach is compatible with any RL algorithm as long as the algorithm and its corresponding environment collectively adhere to the Markov property. In this setting, the future state of the environment should depend solely on its current state and the action executed, independent of any previous states or actions. Our method integrates a verification technique, referred to as model checking, with RL, leveraging a Markov decision process, a trained RL policy, and a probabilistic computation tree logic (PCTL) formula to build a formal model that can subsequently be verified via the model checker Storm. We demonstrate our method's applicability across multiple benchmarks, comparing it to baseline methods called deterministic safety estimates and naive monolithic model checking. Our results show that our method is suited to verifying stochastic RL policies.
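The core computation can be shown in miniature. An MDP and a stochastic policy that depends only on the current state (the Markov assumption above) are combined into a discrete-time Markov chain, and the PCTL query P=? [F "crash"] is answered by value iteration. The toy model, the state names, and the plain-Python value iteration are illustrative assumptions. The paper uses Storm, whose input formats and API are not reproduced here.

```python
from collections import defaultdict


def induced_dtmc(mdp, policy):
    """Combine an MDP {state: {action: [(prob, next_state), ...]}} with a
    memoryless stochastic policy {state: {action: prob}} into a DTMC
    {state: {next_state: prob}} via P(s' | s) = sum_a pi(a | s) * P(s' | s, a)."""
    dtmc = {}
    for s, actions in mdp.items():
        row = defaultdict(float)
        for a, p_action in policy[s].items():
            for p_next, s_next in actions[a]:
                row[s_next] += p_action * p_next
        dtmc[s] = dict(row)
    return dtmc


def reach_probability(dtmc, targets, tol=1e-12, max_iter=100_000):
    """Probability of eventually reaching a target state from each state,
    i.e. the PCTL query P=? [F target], by value iteration starting from 0."""
    x = {s: (1.0 if s in targets else 0.0) for s in dtmc}
    for _ in range(max_iter):
        delta = 0.0
        for s, row in dtmc.items():
            if s in targets:
                continue
            new = sum(p * x[s_next] for s_next, p in row.items())
            delta = max(delta, abs(new - x[s]))
            x[s] = new
        if delta < tol:
            break
    return x


if __name__ == "__main__":
    mdp = {
        "start": {"safe": [(1.0, "mid")], "risky": [(0.7, "goal"), (0.3, "crash")]},
        "mid": {"go": [(0.9, "goal"), (0.1, "crash")]},
        "goal": {"stay": [(1.0, "goal")]},
        "crash": {"stay": [(1.0, "crash")]},
    }
    policy = {
        "start": {"safe": 0.5, "risky": 0.5},
        "mid": {"go": 1.0},
        "goal": {"stay": 1.0},
        "crash": {"stay": 1.0},
    }
    probs = reach_probability(induced_dtmc(mdp, policy), targets={"crash"})
    print(round(probs["start"], 6))  # 0.2 = 0.5 * 0.1 + 0.5 * 0.3
```

Building only the chain induced by the policy keeps the model small. Production model checkers use more carefully engineered numerical methods than the naive convergence check used here.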
- [377] arXiv:2403.18729 [pdf, other]
-
Title: ConstraintFlow: A DSL for Specification and Verification of Neural Network Analyses
Subjects: Programming Languages (cs.PL)
The uninterpretability of DNNs hinders their deployment in safety-critical applications. Recent works have shown that Abstract-Interpretation-based formal certification techniques provide promising avenues for building trust in DNNs to some extent. The intricate mathematical background of Abstract Interpretation poses two challenges: (i) easily designing algorithms that capture intricate DNN behavior while balancing the cost vs. precision tradeoff, and (ii) maintaining the over-approximation-based soundness of these certifiers.
General-purpose programming languages like C++ provide extensive functionality; however, verifying the soundness of the algorithms written in them can be impractical. The most commonly used DNN certification libraries, like auto_LiRPA and ERAN, prove the correctness of their analyses. However, they consist of only a few hard-coded abstract domains and abstract transformers (or transfer functions) and do not allow the user to define new analyses. Further, these libraries can handle only specific DNN architectures.
To address these issues, we develop a declarative DSL, ConstraintFlow, that can be used to specify Abstract Interpretation-based DNN certifiers. In ConstraintFlow, programmers can easily define various existing and new abstract domains and transformers, all within a few tens of lines of code, as opposed to the thousands of lines of code in existing libraries. We also provide lightweight automatic verification, which can be used to ensure the over-approximation-based soundness of the certifier code written in ConstraintFlow for arbitrary (but bounded) DNN architectures. Using this automated verification procedure, for the first time, we can verify the soundness of state-of-the-art DNN certifiers for arbitrary DNN architectures, all within a few minutes. We prove the soundness of our verification procedure and the completeness of a subset of ConstraintFlow.
- [378] arXiv:2403.18730 [pdf, other]
-
Title: Towards Image Ambient Lighting Normalization
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Lighting normalization is a crucial but underexplored restoration task with broad applications. However, existing works often simplify this task within the context of shadow removal, limiting the light sources to one and oversimplifying the scene, thus excluding complex self-shadows and restricting surface classes to smooth ones. Although promising, such simplifications hinder generalizability to more realistic settings encountered in daily use. In this paper, we propose a new challenging task termed Ambient Lighting Normalization (ALN), which enables the study of interactions between shadows, unifying image restoration and shadow removal in a broader context. To address the lack of appropriate datasets for ALN, we introduce the large-scale high-resolution dataset Ambient6K, comprising samples obtained from multiple light sources and including self-shadows resulting from complex geometries, which is the first of its kind. For benchmarking, we select various mainstream methods and rigorously evaluate them on Ambient6K. Additionally, we propose IFBlend, a novel strong baseline that maximizes Image-Frequency joint entropy to selectively restore local areas under different lighting conditions, without relying on shadow localization priors. Experiments show that IFBlend achieves SOTA scores on Ambient6K and exhibits competitive performance on conventional shadow removal benchmarks compared to shadow-specific models with mask priors. The dataset, benchmark, and code are available at https://github.com/fvasluianu97/IFBlend.
- [379] arXiv:2403.18731 [pdf, other]
-
Title: Enhancing Manufacturing Quality Prediction Models through the Integration of Explainability Methods
Subjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY); Machine Learning (cs.LG)
This research presents a method that utilizes explainability techniques to amplify the performance of machine learning (ML) models in forecasting the quality of milling processes, as demonstrated in this paper through a manufacturing use case. The methodology entails the initial training of ML models, followed by a fine-tuning phase where irrelevant features identified through explainability methods are eliminated. This procedural refinement results in performance enhancements, paving the way for potential reductions in manufacturing costs and a better understanding of the trained ML models. This study highlights the usefulness of explainability techniques in both explaining and optimizing predictive models in the manufacturing realm.
- [380] arXiv:2403.18735 [pdf, other]
-
Title: Nonlinear model reduction for operator learning
Comments: Published as a Tiny Paper at ICLR 2024 (Notable)
Subjects: Machine Learning (cs.LG); Numerical Analysis (math.NA)
Operator learning provides methods to approximate mappings between infinite-dimensional function spaces. Deep operator networks (DeepONets) are a notable architecture in this field. Recently, an extension of DeepONet based on model reduction and neural networks, proper orthogonal decomposition (POD)-DeepONet, has been able to outperform other architectures in terms of accuracy for several benchmark tests. We extend this idea towards nonlinear model order reduction by proposing an efficient framework that combines neural networks with kernel principal component analysis (KPCA) for operator learning. Our results demonstrate the superior performance of KPCA-DeepONet over POD-DeepONet.
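For readers unfamiliar with KPCA, the NumPy sketch below shows the standard forward projection. It builds a kernel matrix, centres it in feature space, and projects onto the kernel's leading eigenvectors. The RBF kernel and its `gamma` are illustrative choices. The sketch covers only this encoding step and does not reproduce how KPCA-DeepONet maps coefficients back to output functions.

```python
import numpy as np


def rbf_kernel(X, Y, gamma):
    """Gaussian (RBF) kernel matrix k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negative round-off


class KernelPCA:
    def __init__(self, n_components, gamma=1.0):
        self.n_components = n_components
        self.gamma = gamma

    def fit(self, X):
        self.X_train = np.asarray(X, dtype=float)
        K = rbf_kernel(self.X_train, self.X_train, self.gamma)
        # Statistics needed to centre kernels in feature space (train and new data).
        self.col_mean = K.mean(axis=0)
        self.all_mean = K.mean()
        Kc = K - K.mean(axis=1, keepdims=True) - self.col_mean[None, :] + self.all_mean
        eigvals, eigvecs = np.linalg.eigh(Kc)  # eigenvalues in ascending order
        order = np.argsort(eigvals)[::-1][: self.n_components]
        self.lambdas = eigvals[order]
        # Rescale so the corresponding feature-space principal axes have unit norm.
        self.alphas = eigvecs[:, order] / np.sqrt(self.lambdas)
        return self

    def transform(self, Y):
        Ky = rbf_kernel(np.asarray(Y, dtype=float), self.X_train, self.gamma)
        Kc = Ky - Ky.mean(axis=1, keepdims=True) - self.col_mean[None, :] + self.all_mean
        return Kc @ self.alphas  # nonlinear "reduced coordinates" of each sample


if __name__ == "__main__":
    # Hypothetical stand-in for discretised output functions: 200 sine curves on 64 points.
    grid = np.linspace(0.0, 1.0, 64)
    rng = np.random.default_rng(0)
    freqs = rng.uniform(1.0, 3.0, size=(200, 1))
    U = np.sin(2 * np.pi * freqs * grid[None, :])
    codes = KernelPCA(n_components=8, gamma=0.05).fit(U).transform(U)
    print(codes.shape)  # (200, 8)
```

Unlike POD, which is linear PCA on the same snapshots, the reduced coordinates here are nonlinear functions of the data. Decoding them back to a function is therefore not a simple matrix product.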
- [381] arXiv:2403.18739 [pdf, other]
-
Title: Usage-Specific Survival Modeling Based on Operational Data and Neural Networks
Comments: 7 pages
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY); Machine Learning (stat.ML)
Accurate predictions of when a component will fail are crucial when planning maintenance, and by modeling the distribution of these failure times, survival models have been shown to be particularly useful in this context. The presented methodology is based on conventional neural network-based survival models that are trained using data that is continuously gathered and stored at specific times, called snapshots. An important property of this type of training data is that it can contain more than one snapshot from a specific individual, which means that standard maximum likelihood training cannot be applied directly, since the data is not independent. However, the paper shows that if the data is in a specific format where all snapshot times are the same for all individuals, called homogeneously sampled, maximum likelihood training can be applied and produces desirable results. In many cases, the data is not homogeneously sampled, and in this case it is proposed to resample the data to make it homogeneously sampled. How densely the dataset is sampled turns out to be an important parameter: it should be chosen large enough to produce good results, but this also increases the size of the dataset, which makes training slow. To reduce the number of samples needed during training, the paper also proposes a technique that, instead of resampling the dataset once before training starts, randomly resamples the dataset at the start of each epoch during training. The proposed methodology is evaluated on both a simulated dataset and an experimental dataset of starter battery failures. The results show that if the data is homogeneously sampled, the methodology works as intended and produces accurate survival models. The results also show that randomly resampling the dataset in each epoch is an effective way to reduce the size of the training data.
- [382] arXiv:2403.18742 [pdf, other]
-
Title: Understanding the Learning Dynamics of Alignment with Human Feedback
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Aligning large language models (LLMs) with human intentions has become a critical task for safely deploying models in real-world systems. While existing alignment approaches have seen empirical success, theoretically understanding how these methods affect model behavior remains an open question. Our work provides an initial attempt to theoretically analyze the learning dynamics of human preference alignment. We formally show how the distribution of preference datasets influences the rate of model updates and provide rigorous guarantees on the training accuracy. Our theory also reveals an intricate phenomenon where the optimization is prone to prioritizing certain behaviors with higher preference distinguishability. We empirically validate our findings on contemporary LLMs and alignment tasks, reinforcing our theoretical insights and shedding light on considerations for future alignment approaches. Disclaimer: This paper contains potentially offensive text; reader discretion is advised.
- [383] arXiv:2403.18745 [pdf, other]
-
Title: Fast Decision Algorithms for Efficient Access Point Assignment in SDN-Controlled Wireless Access Networks
Authors: Pablo Fondo-Ferreiro, Saber Mhiri, Cristina López-Bravo, Francisco Javier González-Castaño, Felipe Gil-Castiñeira
Comments: Accepted version of the article published in IEEE Transactions on Network and Service Management
Journal-ref: IEEE Transactions on Network and Service Management, vol. 16, no. 3, pp. 1059-1070, September 2019
Subjects: Networking and Internet Architecture (cs.NI)
Global optimization of access point (AP) assignment to user terminals requires efficient monitoring of user behavior, fast decision algorithms, efficient control signaling, and fast AP reassignment mechanisms. In this scenario, software defined networking (SDN) technology may be suitable for network monitoring, signaling, and control. We recently proposed embedding virtual switches in user terminals for direct management by an SDN controller, further contributing to SDN-oriented access network optimization. However, since users may restrict terminal-side traffic monitoring for privacy reasons (a common assumption by previous authors), we infer user traffic classes at the APs. On the other hand, since handovers will be more frequent in dense small-cell networks (e.g., mmWave-based 5G deployments will require dense network topologies with inter-site distances of ~150-200 m), the delay in making assignment decisions should be minimal. To this end, we propose making fast decisions based exclusively on extremely simple network-side application flow-type predictions derived from past user behavior. Using real data, we show that a centralized allocation algorithm based on those predictions achieves network utilization levels that approximate those of optimal allocations. We also test a distributed version of this algorithm. Finally, we quantify the time that elapses from the moment a user traffic event takes place until the user's terminal is assigned an AP, when needed.
- [384] arXiv:2403.18746 [pdf, other]
-
Title: CYCLE: Learning to Self-Refine the Code Generation
Comments: Camera-ready for OOPSLA'24
Subjects: Software Engineering (cs.SE); Computation and Language (cs.CL)
Pre-trained code language models have achieved promising performance in code generation and improved the programming efficiency of human developers. However, their self-refinement capability is typically overlooked by existing evaluations of code LMs, which focus only on the accuracy of the one-time prediction. When code LMs fail to implement the correct program, developers find it hard to debug and fix the faulty prediction, since they did not write it themselves. Unfortunately, our study reveals that code LMs also cannot efficiently self-refine their faulty generations.
In this paper, we propose the CYCLE framework, which learns to self-refine faulty generations according to the available feedback, such as the execution results reported by test suites. We evaluate CYCLE on three popular code generation benchmarks: HumanEval, MBPP, and APPS. The results reveal that CYCLE maintains, and sometimes improves, the quality of one-time code generation, while significantly improving the self-refinement capability of code LMs. We implement four variants of CYCLE with 350M, 1B, 2B, and 3B parameters, and the experiments show that CYCLE consistently boosts code generation performance, by up to 63.5%, across benchmarks and model sizes. We also observe that CYCLE outperforms code LMs that have 3$\times$ more parameters in self-refinement.
- [385] arXiv:2403.18749 [pdf, other]
-
Title: Robust Numerical Algebraic Geometry
Subjects: Numerical Analysis (math.NA)
The field of numerical algebraic geometry consists of algorithms for numerically solving systems of polynomial equations. When the system is exact, such as having rational coefficients, the solution set is well-defined. However, for a member of a parameterized family of polynomial systems where the parameter values may be measured with imprecision or arise from prior numerical computations, uncertainty may arise in the structure of the solution set, including the number of isolated solutions, the existence of higher dimensional solution components, and the number of irreducible components along with their multiplicities. The loci where these structures change form a stratification of exceptional algebraic sets in the space of parameters. We describe methodologies for making the interpretation of numerical results more robust by searching for nearby parameter values on an exceptional set. We demonstrate these techniques on several illustrative examples and then treat several more substantial problems arising from the kinematics of mechanisms and robots.
- [386] arXiv:2403.18755 [pdf, other]
-
Title: Many-Objective Evolutionary Influence Maximization: Balancing Spread, Budget, Fairness, and Time
Comments: To appear in Genetic and Evolutionary Computation Conference (GECCO '24 Companion), July 14-18, 2024, Melbourne, VIC, Australia. ACM, New York, NY, USA
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Social and Information Networks (cs.SI)
The Influence Maximization (IM) problem seeks to find the set of nodes in a graph that maximizes the spread of information. This problem is known to be NP-hard, and it is usually studied by maximizing the influence (spread) and, optionally, optimizing a second objective, such as minimizing the seed set size or maximizing the influence fairness. However, in many practical scenarios multiple aspects of the IM problem must be optimized at the same time. In this work, we propose a first case study where several IM-specific objective functions, namely budget, fairness, communities, and time, are optimized on top of the maximization of influence and minimization of the seed set size. To this end, we introduce MOEIM (Many-Objective Evolutionary Algorithm for Influence Maximization), a Multi-Objective Evolutionary Algorithm (MOEA) based on NSGA-II that incorporates graph-aware operators and a smart initialization. We compare MOEIM in two experimental settings, including a total of nine graph datasets, two heuristic methods, a related MOEA, and a state-of-the-art Deep Learning approach. The experiments show that MOEIM overall outperforms the competitors in most of the tested many-objective settings. To conclude, we also investigate the correlation between the objectives, leading to novel insights into the topic. The codebase is available at https://github.com/eliacunegatti/MOEIM.
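In IM, spread is commonly estimated by Monte Carlo simulation of a diffusion model, and NSGA-II-style methods compare candidate seed sets by Pareto dominance. The sketch below shows both pieces for just two of the objectives, spread and seed-set size. The independent cascade model with a uniform activation probability `p` is an assumption made for illustration, since the abstract does not state which propagation model MOEIM uses. The budget, fairness, community, and time objectives are omitted.

```python
import random


def ic_spread(graph, seeds, p=0.1, runs=1000, seed=0):
    """Monte Carlo estimate of expected spread under the independent cascade
    model: each newly activated node gets one chance to activate each of its
    inactive out-neighbours, independently with probability p."""
    rng = random.Random(seed)
    total = 0
    for _ in range(runs):
        active = set(seeds)
        frontier = list(active)
        while frontier:
            newly_active = []
            for u in frontier:
                for v in graph.get(u, ()):
                    if v not in active and rng.random() < p:
                        active.add(v)
                        newly_active.append(v)
            frontier = newly_active
        total += len(active)
    return total / runs


def objectives(graph, seeds, **kw):
    """Objective vector, both to be maximised: (spread, -seed set size)."""
    return (ic_spread(graph, seeds, **kw), -len(set(seeds)))


def dominates(a, b):
    """Pareto dominance for maximisation: a is no worse in every objective
    and strictly better in at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


if __name__ == "__main__":
    # Toy directed graph as adjacency lists (illustrative only).
    g = {0: [1, 2, 3], 1: [4], 2: [4, 5], 3: [5], 4: [6], 5: [6], 6: []}
    small, large = objectives(g, [0], p=0.5), objectives(g, [0, 6], p=0.5)
    print(small, large, dominates(small, large), dominates(large, small))
```

Adding further objectives only lengthens the tuple returned by `objectives`. The dominance test is unchanged, which is why NSGA-II-style selection extends naturally to the many-objective setting.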
- [387] arXiv:2403.18756 [pdf, ps, other]
-
Title: Detection of subclinical atherosclerosis by image-based deep learning on chest x-ray
Authors: Guglielmo Gallone, Francesco Iodice, Alberto Presta, Davide Tore, Ovidio de Filippo, Michele Visciano, Carlo Alberto Barbano, Alessandro Serafini, Paola Gorrini, Alessandro Bruno, Walter Grosso Marra, James Hughes, Mario Iannaccone, Paolo Fonio, Attilio Fiandrotti, Alessandro Depaoli, Marco Grangetto, Gaetano Maria de Ferrari, Fabrizio D'Ascenzo
Comments: Submitted to European Heart Journal - Cardiovascular Imaging. Also includes the additional material. 44 pages (30 main paper, 14 additional material), 14 figures (5 main manuscript, 9 additional material)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Aims. To develop a deep-learning based system for recognition of subclinical atherosclerosis on a plain frontal chest x-ray. Methods and Results. A deep-learning algorithm to predict the coronary artery calcium (CAC) score (the AI-CAC model) was developed on 460 chest x-rays (80% training cohort, 20% internal validation cohort) of primary prevention patients (58.4% male, median age 63 [51-74] years) with available paired chest x-ray and chest computed tomography (CT) indicated for any clinical reason and performed within 3 months. The CAC score calculated on chest CT was used as ground truth. The model was validated on a temporally independent cohort of 90 patients from the same institution (external validation). The diagnostic accuracy of the AI-CAC model, assessed by the area under the curve (AUC), was the primary outcome. Overall, the median AI-CAC score was 35 (0-388) and 28.9% of patients had no AI-CAC. The AUC of the AI-CAC model to identify a CAC>0 was 0.90 in the internal validation cohort and 0.77 in the external validation cohort. Sensitivity was consistently above 92% in both cohorts. In the overall cohort (n=540), among patients with AI-CAC=0, a single ASCVD event occurred, after 4.3 years. Patients with AI-CAC>0 had significantly higher Kaplan-Meier estimates for ASCVD events (13.5% vs. 3.4%, log-rank=0.013). Conclusion. The AI-CAC model seems to accurately detect subclinical atherosclerosis on chest x-ray with high sensitivity, and to predict ASCVD events with a high negative predictive value. Adoption of the AI-CAC model to refine CV risk stratification or as an opportunistic screening tool requires prospective evaluation.
- [388] arXiv:2403.18760 [pdf, other]
-
Title: MLDT: Multi-Level Decomposition for Complex Long-Horizon Robotic Task Planning with Open-Source Large Language Model
Subjects: Robotics (cs.RO)
In the realm of data-driven AI technology, the application of open-source large language models (LLMs) in robotic task planning represents a significant milestone. Recent robotic task planning methods based on open-source LLMs typically leverage vast task planning datasets to enhance models' planning abilities. While these methods show promise, they struggle with complex long-horizon tasks, which require comprehending more context and generating longer action sequences. This paper addresses this limitation by proposing MLDT, the Multi-Level Decomposition Task planning method. This method innovatively decomposes tasks at the goal level, task level, and action level to mitigate the challenge of complex long-horizon tasks. To enhance open-source LLMs' planning abilities, we introduce a goal-sensitive corpus generation method to create high-quality training data and conduct instruction tuning on the generated corpus. Since the complexity of the existing datasets is not high enough, we construct a more challenging dataset, LongTasks, to specifically evaluate planning ability on complex long-horizon tasks. We evaluate our method using various LLMs on four datasets in VirtualHome. Our results demonstrate a significant performance enhancement in robotic task planning, showcasing MLDT's effectiveness in overcoming the limitations of existing methods based on open-source LLMs as well as its practicality in complex, real-world scenarios.
- [389] arXiv:2403.18761 [pdf, other]
-
Title: MATTopo: Topology-preserving Medial Axis Transform with Restricted Power Diagram
Subjects: Graphics (cs.GR)
We present a novel volumetric RPD (restricted power diagram) based framework for approximating the medial axes of 3D CAD shapes adaptively, while preserving topological equivalence, medial features, and geometric convergence. To solve the topology preservation problem, we propose a volumetric RPD based strategy, which discretizes the input volume into sub-regions given a set of medial spheres. With this intermediate structure, we convert the homotopy equivalence between the generated medial mesh and the input 3D shape into a localized problem between each primitive of the medial mesh (vertex, edge, face) and its dual restricted elements (power cell, power face, power edge), by checking their connected components and Euler characteristics. We further propose a fractional Euler characteristic strategy for efficient GPU-based computation of the Euler characteristic of each restricted element on the fly while computing the volumetric RPD. Compared with existing voxel-based or sampling-based methods, our method is the first that can adaptively and directly revise the medial mesh without globally modifying the structure it depends on, such as voxel size or sampling density. Compared with the feature preservation method MATFP, our method offers geometrically comparable results with fewer spheres, while capturing the topology of the input shape more robustly.
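The two topological quantities mentioned, connected components and the Euler characteristic, are cheap to compute for a simplicial patch. The sketch below computes both for a set of triangles given by vertex indices, using χ = V - E + F and a union-find. Representing a restricted element as a plain triangle list is an assumption for illustration. The paper's GPU-based fractional Euler characteristic strategy is not reproduced.

```python
def euler_characteristic(triangles):
    """chi = V - E + F for a 2D simplicial complex given as vertex-index triples."""
    vertices, edges = set(), set()
    for a, b, c in triangles:
        vertices.update((a, b, c))
        for u, v in ((a, b), (b, c), (c, a)):
            edges.add((min(u, v), max(u, v)))  # undirected edge key
    return len(vertices) - len(edges) + len(triangles)


def connected_components(triangles):
    """Number of connected components, via union-find over triangle vertices."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, c in triangles:
        for u, v in ((a, b), (b, c)):
            parent[find(u)] = find(v)  # merge the sets containing u and v
    return len({find(v) for v in list(parent)})


if __name__ == "__main__":
    disk = [(0, 1, 2), (0, 2, 3)]                          # square split into two triangles
    sphere = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]  # tetrahedron surface
    print(euler_characteristic(disk), connected_components(disk))      # 1 1
    print(euler_characteristic(sphere), connected_components(sphere))  # 2 1
```

A single disk-like patch (one component, χ = 1) is what one would expect from a contractible region. A mismatch in either number flags a local topology problem without any global recomputation.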
- [390] arXiv:2403.18762 [pdf, other]
-
Title: ModaLink: Unifying Modalities for Efficient Image-to-PointCloud Place Recognition
Authors: Weidong Xie, Lun Luo, Nanfei Ye, Yi Ren, Shaoyi Du, Minhang Wang, Jintao Xu, Rui Ai, Weihao Gu, Xieyuanli Chen
Comments: 8 pages, 11 figures, conference
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
Place recognition is an important task for robots and autonomous cars to localize themselves and close loops in pre-built maps. While single-modal sensor-based methods have shown satisfactory performance, cross-modal place recognition, which matches query images against a point-cloud database, remains a challenging problem. Current cross-modal methods transform images into 3D points using depth estimation for modality conversion, which is usually computationally intensive and requires expensive labeled data for depth supervision. In this work, we introduce a fast and lightweight framework to encode images and point clouds into place-distinctive descriptors. We propose an effective Field of View (FoV) transformation module to convert point clouds into a modality analogous to images. This module eliminates the need for depth estimation and helps subsequent modules achieve real-time performance. We further design a non-negative factorization-based encoder to extract mutually consistent semantic features between point clouds and images. This encoder yields more distinctive global descriptors for retrieval. Experimental results on the KITTI dataset show that our proposed methods achieve state-of-the-art performance while running in real time. Additional evaluation on the HAOMO dataset covering a 17 km trajectory further shows their practical generalization capabilities. We have released the implementation of our methods as open source at: https://github.com/haomo-ai/ModaLink.git.
- [391] arXiv:2403.18764 [pdf, other]
-
Title: Temporal Logic Formalisation of ISO 34502 Critical Scenarios: Modular Construction with the RSS Safety Distance
Authors: Jesse Reimann, Nico Mansion, James Haydon, Benjamin Bray, Agnishom Chattopadhyay, Sota Sato, Masaki Waga, Étienne André, Ichiro Hasuo, Naoki Ueda, Yosuke Yokoyama
Comments: 12 pages, 4 figures, 5 tables. Accepted to SAC 2024
Subjects: Robotics (cs.RO); Logic in Computer Science (cs.LO)
As the development of autonomous vehicles progresses, efficient safety assurance methods become increasingly necessary. Safety assurance methods such as monitoring and scenario-based testing call for formalisation of driving scenarios. In this paper, we develop a temporal-logic formalisation of an important class of critical scenarios in the ISO standard 34502. We use signal temporal logic (STL) as a logical formalism. Our formalisation has two main features: 1) modular composition of logical formulas for systematic and comprehensive formalisation (following the compositional methodology of ISO 34502); 2) use of the RSS distance for defining danger. We find our formalisation comes with few parameters to tune thanks to the RSS distance. We experimentally evaluated our formalisation; using its results, we discuss the validity of our formalisation and its stability with respect to the choice of some parameter values.
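The RSS safe longitudinal distance used to define danger has a well-known closed form from the original RSS model. The STL robustness of "always keep at least the RSS distance" is then simply the minimum margin over a sampled trace. The sketch below computes both. The parameter values are illustrative assumptions, and this is not the paper's modular ISO 34502 formalisation.

```python
def rss_min_distance(v_rear, v_front, rho, a_accel_max, a_brake_min, a_brake_max):
    """RSS minimum safe longitudinal distance between a rear car (speed v_rear)
    and a front car (speed v_front) driving in the same direction.

    During the response time rho the rear car may accelerate at up to
    a_accel_max and afterwards brakes at least at a_brake_min, while the
    front car may brake at up to a_brake_max. Units: m, s, m/s, m/s^2."""
    v_after = v_rear + rho * a_accel_max
    d = (v_rear * rho
         + 0.5 * a_accel_max * rho ** 2
         + v_after ** 2 / (2 * a_brake_min)
         - v_front ** 2 / (2 * a_brake_max))
    return max(0.0, d)


def robustness_always_safe(trace, **params):
    """Robustness of the STL formula G (gap - d_rss >= 0) over a sampled
    trace of (gap, v_rear, v_front) tuples: a positive value means the
    formula holds with that margin, a negative value means it is violated."""
    return min(gap - rss_min_distance(v_r, v_f, **params) for gap, v_r, v_f in trace)


if __name__ == "__main__":
    params = dict(rho=1.0, a_accel_max=1.0, a_brake_min=5.0, a_brake_max=10.0)
    print(rss_min_distance(10.0, 10.0, **params))  # about 17.6 m
    trace = [(20.0, 10.0, 10.0), (18.0, 10.0, 10.0), (16.0, 10.0, 10.0)]
    print(robustness_always_safe(trace, **params))  # about -1.6: violated at the last sample
```

The only tunable quantities are the physical RSS parameters, which is in line with the observation that the formalisation needs few parameters.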
- [392] arXiv:2403.18765 [pdf, other]
-
Title: CaT: Constraints as Terminations for Legged Locomotion Reinforcement Learning
Authors: Elliot Chane-Sane, Pierre-Alexandre Leziart, Thomas Flayols, Olivier Stasse, Philippe Souères, Nicolas Mansard
Comments: Project webpage: this https URL
Subjects: Robotics (cs.RO); Machine Learning (cs.LG)
Deep Reinforcement Learning (RL) has demonstrated impressive results in solving complex robotic tasks such as quadruped locomotion. Yet, current solvers fail to produce efficient policies respecting hard constraints. In this work, we advocate for integrating constraints into robot learning and present Constraints as Terminations (CaT), a novel constrained RL algorithm. Departing from classical constrained RL formulations, we reformulate constraints through stochastic terminations during policy learning: any violation of a constraint triggers a probability of terminating potential future rewards the RL agent could attain. We propose an algorithmic approach to this formulation, by minimally modifying widely used off-the-shelf RL algorithms in robot learning (such as Proximal Policy Optimization). Our approach leads to excellent constraint adherence without introducing undue complexity and computational overhead, thus mitigating barriers to broader adoption. Through empirical evaluation on the real quadruped robot Solo crossing challenging obstacles, we demonstrate that CaT provides a compelling solution for incorporating constraints into RL frameworks. Videos and code are available at https://constraints-as-terminations.github.io.
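The reformulation can be illustrated by the change it makes to the return an agent optimises. A constraint violation at step t becomes a termination probability δ_t, and everything after step t is discounted by an extra factor (1 - δ_t). The sketch below assumes a hypothetical mapping from normalised violation magnitudes to δ_t, capped at `p_max`. The abstract specifies neither CaT's exact mapping nor how it is wired into PPO, so this illustrates the idea rather than the algorithm.

```python
def termination_probability(violations, p_max=1.0):
    """Hypothetical mapping from constraint violations to a termination
    probability: violations are non-negative magnitudes normalised so that
    1.0 means 'severe'; the largest one sets delta, capped at p_max."""
    worst = max(violations, default=0.0)
    return p_max * min(1.0, max(0.0, worst))


def returns_with_stochastic_termination(rewards, deltas, gamma=0.99):
    """Discounted returns where, after step t, the episode continues only with
    probability 1 - delta_t:  G_t = r_t + gamma * (1 - delta_t) * G_{t+1}."""
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * (1.0 - deltas[t]) * g
        returns[t] = g
    return returns


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0, 1.0]
    # Per-step normalised violations of two constraints (illustrative numbers).
    violations = [[0.0, 0.0], [0.3, 0.0], [0.0, 0.0], [0.0, 0.0]]
    deltas = [termination_probability(v, p_max=0.5) for v in violations]
    print(returns_with_stochastic_termination(rewards, deltas, gamma=1.0))
```

Because only the continuation factor changes, any return or advantage estimator that already handles episode terminations can consume `deltas` with a minimal modification.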
- [393] arXiv:2403.18766 [pdf, ps, other]
-
Title: Superior Parallel Big Data Clustering through Competitive Stochastic Sample Size Optimization in Big-means
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Information Retrieval (cs.IR)
This paper introduces a novel K-means clustering algorithm, an advancement on the conventional Big-means methodology. The proposed method efficiently integrates parallel processing, stochastic sampling, and competitive optimization to create a scalable variant designed for big data applications. It addresses scalability and computation time challenges typically faced with traditional techniques. The algorithm adjusts sample sizes dynamically for each worker during execution, optimizing performance. Data from these sample sizes are continually analyzed, facilitating the identification of the most efficient configuration. By incorporating a competitive element among workers using different sample sizes, efficiency within the Big-means algorithm is further stimulated. In essence, the algorithm balances computational time and clustering quality by employing a stochastic, competitive sampling strategy in a parallel computing setting.
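The competitive sampling loop can be sketched in Python with heavy simplifications. Workers run sequentially rather than in parallel, and every candidate is scored on one shared validation sample. The dictionary `wins` only records which sample size produced improvements; it does not reproduce the paper's dynamic per-worker adjustment. All function names are hypothetical.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cost(points, centroids):
    """Mean squared distance of each point to its nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points) / len(points)


def lloyd(points, centroids, iters=10):
    """Plain k-means (Lloyd) iterations started from the given centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            j = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
            groups[j].append(p)
        # Keep the old centroid if its cluster is empty on this sample.
        centroids = [tuple(sum(xs) / len(g) for xs in zip(*g)) if g else c
                     for g, c in zip(groups, centroids)]
    return centroids


def competitive_big_means(data, k, sample_sizes, rounds=20, val_size=None, seed=0):
    """Hypothetical sequential stand-in for competing workers with different sample sizes."""
    rng = random.Random(seed)
    val = data if val_size is None else rng.sample(data, min(val_size, len(data)))
    best, best_cost = rng.sample(data, k), float("inf")
    wins = {s: 0 for s in sample_sizes}  # how often each sample size improved the incumbent
    for _ in range(rounds):
        for s in sample_sizes:  # one "worker" per sample size
            sample = rng.sample(data, min(s, len(data)))
            candidate = lloyd(sample, best)  # each worker refines the shared incumbent
            c = cost(val, candidate)
            if c < best_cost:
                best, best_cost = candidate, c
                wins[s] += 1
    return best, wins
```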
- [394] arXiv:2403.18769 [pdf, ps, other]
-
Title: Improved Neural Protoform Reconstruction via Reflex Prediction
Comments: Accepted to LREC-COLING 2024
Subjects: Computation and Language (cs.CL)
Protolanguage reconstruction is central to historical linguistics. The comparative method, one of the most influential theoretical and methodological frameworks in the history of the language sciences, allows linguists to infer protoforms (reconstructed ancestral words) from their reflexes (related modern words) based on the assumption of regular sound change. Not surprisingly, numerous computational linguists have attempted to operationalize comparative reconstruction through various computational models, the most successful of which have been supervised encoder-decoder models, which treat the problem of predicting protoforms given sets of reflexes as a sequence-to-sequence problem. We argue that this framework ignores one of the most important aspects of the comparative method: not only should protoforms be inferable from cognate sets (sets of related reflexes) but the reflexes should also be inferable from the protoforms. Leveraging another line of research -- reflex prediction -- we propose a system in which candidate protoforms from a reconstruction model are reranked by a reflex prediction model. We show that this more complete implementation of the comparative method allows us to surpass state-of-the-art protoform reconstruction methods on three of four Chinese and Romance datasets.
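Below is a sketch of the reranking step, assuming both models expose log-probabilities. Each candidate protoform is scored by its reconstruction log-probability plus a weighted sum of the reflex prediction model's log-probabilities for the attested reflexes. The linear combination and the `weight` parameter are assumptions for illustration; the abstract does not say how the two scores are combined.

```python
from typing import Callable, Dict, List


def rerank(candidates: List[str],
           recon_logprob: Callable[[str], float],
           reflex_logprob: Callable[[str, str, str], float],
           cognate_set: Dict[str, str],
           weight: float = 1.0) -> List[str]:
    """Sort candidate protoforms by a combined score (hypothetical combination rule).

    recon_logprob(c): log P(c | cognate set) from the reconstruction model.
    reflex_logprob(c, lang, reflex): log P(reflex | c, lang) from the reflex model.
    """
    def score(c: str) -> float:
        # How well does candidate c explain every attested reflex in the cognate set?
        forward = sum(reflex_logprob(c, lang, r) for lang, r in cognate_set.items())
        return recon_logprob(c) + weight * forward
    return sorted(candidates, key=score, reverse=True)
```

With made-up scores where the reconstruction model alone slightly prefers a wrong candidate, a reflex model that explains the attested reflexes much better for the other candidate reverses the ranking.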
- [395] arXiv:2403.18771 [pdf, other]
-
Title: CheckEval: Robust Evaluation Framework using Large Language Model via Checklist
Comments: HEAL at CHI 2024
Subjects: Computation and Language (cs.CL)
We introduce CheckEval, a novel evaluation framework using Large Language Models, addressing the challenges of ambiguity and inconsistency in current evaluation methods. CheckEval addresses these challenges by dividing evaluation criteria into detailed sub-aspects and constructing a checklist of Boolean questions for each, simplifying the evaluation. This approach not only renders the process more interpretable but also significantly enhances the robustness and reliability of results by focusing on specific evaluation dimensions. Validated through a focused case study using the SummEval benchmark, CheckEval shows a strong correlation with human judgments. Furthermore, it demonstrates a highly consistent Inter-Annotator Agreement. These findings highlight the effectiveness of CheckEval for objective, flexible, and precise evaluations. By offering a customizable and interactive framework, CheckEval sets a new standard for the use of LLMs in evaluation, responding to the evolving needs of the field and establishing a clear method for future LLM-based evaluation.
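The following is a minimal aggregation sketch, assuming the LLM answers each sub-aspect's checklist questions with Booleans. The per-aspect score is the fraction of "yes" answers, and the overall score is their unweighted mean. This averaging rule is an assumption and is not necessarily the paper's scoring.

```python
from typing import Dict, List, Tuple


def checklist_score(answers: Dict[str, List[bool]]) -> Tuple[Dict[str, float], float]:
    """answers maps each sub-aspect to the Boolean answers for its checklist questions.

    Returns (per-aspect fraction of "yes" answers, unweighted mean over aspects).
    Aspects with no answered questions are skipped.
    """
    per_aspect = {a: sum(v) / len(v) for a, v in answers.items() if v}
    overall = sum(per_aspect.values()) / len(per_aspect) if per_aspect else 0.0
    return per_aspect, overall
```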
- [396] arXiv:2403.18774 [pdf, other]
-
Title: RAW: A Robust and Agile Plug-and-Play Watermark Framework for AI-Generated Images with Provable Guarantees
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Safeguarding intellectual property and preventing potential misuse of AI-generated images are of paramount importance. This paper introduces a robust and agile plug-and-play watermark detection framework, dubbed RAW. As a departure from traditional encoder-decoder methods, which incorporate fixed binary codes as watermarks within latent representations, our approach introduces learnable watermarks directly into the original image data. Subsequently, we employ a classifier that is jointly trained with the watermark to detect the presence of the watermark. The proposed framework is compatible with various generative architectures and supports on-the-fly watermark injection after training. By incorporating state-of-the-art smoothing techniques, we show that the framework provides provable guarantees regarding the false positive rate for misclassifying a watermarked image, even in the presence of certain adversarial attacks targeting watermark removal. Experiments on a diverse range of images generated by state-of-the-art diffusion models reveal substantial performance enhancements compared to existing approaches. For instance, our method demonstrates a notable increase in AUROC, from 0.48 to 0.82, when compared to state-of-the-art approaches in detecting watermarked images under adversarial attacks, while maintaining image quality, as indicated by closely aligned FID and CLIP scores.
- [397] arXiv:2403.18775 [pdf, other]
-
Title: ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object
Comments: Accepted at CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
We establish rigorous benchmarks for visual perception robustness. Synthetic image benchmarks such as ImageNet-C, ImageNet-9, and Stylized ImageNet provide specific types of evaluation over synthetic corruptions, backgrounds, and textures, yet they are restricted to specified variations and have low synthetic quality. In this work, we introduce generative models as a data source for synthesizing hard images that benchmark deep models' robustness. Leveraging diffusion models, we are able to generate images with more diversified backgrounds, textures, and materials than any prior work; we term this benchmark ImageNet-D. Experimental results show that ImageNet-D causes a significant accuracy drop across a range of vision models, from the standard ResNet visual classifier to the latest foundation models like CLIP and MiniGPT-4, reducing their accuracy by up to 60%. Our work suggests that diffusion models can be an effective source to test vision models. The code and dataset are available at https://github.com/chenshuang-zhang/imagenet_d.
- [398] arXiv:2403.18777 [pdf, other]
-
Title: New Graph and Hypergraph Container Lemmas with Applications in Property Testing
Comments: To appear at STOC 2024
Subjects: Data Structures and Algorithms (cs.DS)
The graph and hypergraph container methods are powerful tools with a wide range of applications across combinatorics. Recently, Blais and Seth (FOCS 2023) showed that the graph container method is particularly well-suited for the analysis of the natural canonical tester for two fundamental graph properties: having a large independent set and $k$-colorability. In this work, we show that the connection between the container method and property testing extends further along two different directions.
First, we show that the container method can be used to analyze the canonical tester for many other properties of graphs and hypergraphs. We introduce a new hypergraph container lemma and use it to give an upper bound of $\widetilde{O}(kq^3/\epsilon)$ on the sample complexity of $\epsilon$-testing satisfiability, where $q$ is the number of variables per constraint and $k$ is the size of the alphabet. This is the first upper bound for the problem that is polynomial in all of $k$, $q$ and $1/\epsilon$. As a corollary, we get new upper bounds on the sample complexity of the canonical testers for hypergraph colorability and for every semi-homogeneous graph partition property.
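For readers unfamiliar with the canonical tester mentioned above: it samples a set of vertices uniformly at random and accepts if and only if the induced subgraph has the property. A brute-force Python sketch for $k$-colorability follows. The sample size `q` is left as a free parameter (the bounds above govern how large it must be), and the exhaustive color search is only practical for small samples.

```python
import itertools
import random


def is_k_colorable(vertices, edges, k):
    """Exhaustive check: does the given (small) graph have a proper k-coloring?"""
    vs = list(vertices)
    for colors in itertools.product(range(k), repeat=len(vs)):
        color = dict(zip(vs, colors))
        if all(color[u] != color[v] for u, v in edges):
            return True
    return False


def canonical_tester(n, edges, k, q, seed=0):
    """Sample q of the n vertices uniformly, query all pairs among them, and accept
    iff the induced subgraph is k-colorable (one-sided error: k-colorable graphs
    are always accepted)."""
    rng = random.Random(seed)
    sample = set(rng.sample(range(n), min(q, n)))
    induced = [(u, v) for u, v in edges if u in sample and v in sample]
    return is_k_colorable(sample, induced, k)
```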
Second, we show that the container method can also be used to study the query complexity of (non-canonical) graph property testers. This result is obtained by introducing a new container lemma for the class of all independent set stars, a strict superset of the class of all independent sets. We use this container lemma to give a new upper bound of $\widetilde{O}(\rho^5/\epsilon^{7/2})$ on the query complexity of $\epsilon$-testing the $\rho$-independent set property. This establishes for the first time the non-optimality of the canonical tester for a non-homogeneous graph partition property.
- [399] arXiv:2403.18778 [pdf, other]
-
Title: 3P-LLM: Probabilistic Path Planning using Large Language Model for Autonomous Robot Navigation
Authors: Ehsan Latif
Comments: Exploratory Study
Subjects: Robotics (cs.RO)
Much worldly semantic knowledge can be encoded in large language models (LLMs). Such information could be of great use to robots that want to carry out high-level, temporally extended commands stated in natural language. However, language models' lack of real-world experience is a key limitation that makes it challenging to use them for decision-making inside a particular embodiment. This research assesses the feasibility of using an LLM (OpenAI's GPT-3.5-turbo chatbot) for robotic path planning. The research is motivated by the shortcomings of conventional approaches in managing complex environments and in developing trustworthy plans under shifting environmental conditions. Owing to its sophisticated natural language processing abilities, its capacity to provide effective and adaptive path-planning algorithms in real time, its high accuracy, and its few-shot learning capabilities, GPT-3.5-turbo is well suited for path planning in robotics. In numerous simulated scenarios, the research compares the performance of GPT-3.5-turbo with that of state-of-the-art path planners such as Rapidly Exploring Random Tree (RRT) and A*. We observed that GPT-3.5-turbo is able to provide real-time path planning feedback to the robot and outperforms its counterparts. This paper establishes the foundation for LLM-powered path planning for robotic systems.
- [400] arXiv:2403.18781 [pdf, other]
-
Title: Hypergraph Unreliability in Quasi-Polynomial Time
Comments: To appear in STOC 2024
Subjects: Data Structures and Algorithms (cs.DS)
The hypergraph unreliability problem asks for the probability that a hypergraph gets disconnected when every hyperedge fails independently with a given probability. For graphs, the unreliability problem has been studied over many decades, and multiple fully polynomial-time approximation schemes are known starting with the work of Karger (STOC 1995). In contrast, prior to this work, no non-trivial result was known for hypergraphs (of arbitrary rank).
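To make the problem definition concrete, the following Python sketch estimates hypergraph unreliability by naive Monte Carlo sampling, with a union-find connectivity check. This is not the paper's algorithm. When the unreliability $u$ is tiny, naive sampling needs on the order of $1/u$ trials just to observe a disconnection, and that is exactly the regime where approximation schemes such as the ones above matter.

```python
import random


def is_connected(n, hyperedges):
    """Union-find over vertices 0..n-1: do the given hyperedges connect all of them?"""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = n
    for e in hyperedges:
        e = list(e)
        for v in e[1:]:  # a hyperedge merges all of its vertices into one component
            a, b = find(e[0]), find(v)
            if a != b:
                parent[a] = b
                components -= 1
    return components <= 1


def unreliability_mc(n, hyperedges, p, trials=10_000, seed=0):
    """Naive Monte Carlo estimate of P(hypergraph disconnected) when each hyperedge
    fails independently with probability p. Not the paper's algorithm."""
    rng = random.Random(seed)
    disconnected = 0
    for _ in range(trials):
        alive = [e for e in hyperedges if rng.random() >= p]  # survives w.p. 1 - p
        if not is_connected(n, alive):
            disconnected += 1
    return disconnected / trials
```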
In this paper, we give quasi-polynomial time approximation schemes for the hypergraph unreliability problem. For any fixed $\varepsilon \in (0, 1)$, we first give a $(1+\varepsilon)$-approximation algorithm that runs in $m^{O(\log n)}$ time on an $m$-hyperedge, $n$-vertex hypergraph. Then, we improve the running time to $m\cdot n^{O(\log^2 n)}$ with an additional exponentially small additive term in the approximation.
- [401] arXiv:2403.18783 [pdf, other]
-
Title: Towards a World-English Language Model for On-Device Virtual Assistants
Authors: Rricha Jalota, Lyan Verwimp, Markus Nussbaum-Thom, Amr Mousa, Arturo Argueta, Youssef Oualil
Comments: Accepted in ICASSP 2024
Subjects: Computation and Language (cs.CL)
Neural Network Language Models (NNLMs) for Virtual Assistants (VAs) are generally language-, region-, and in some cases, device-dependent, which increases the effort to scale and maintain them. Combining NNLMs for one or more of the categories is one way to improve scalability. In this work, we combine regional variants of English to build a "World English" NNLM for on-device VAs. In particular, we investigate the application of adapter bottlenecks to model dialect-specific characteristics in our existing production NNLMs and enhance the multi-dialect baselines. We find that adapter modules are more effective in modeling dialects than specializing entire sub-networks. Based on this insight and leveraging the design of our production models, we introduce a new architecture for World English NNLM that meets the accuracy, latency, and memory constraints of our single-dialect models.
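A bottleneck adapter is commonly built from a down-projection, a nonlinearity, an up-projection, and a residual connection. The plain-Python sketch below applies one such adapter per dialect on top of a shared hidden state. The ReLU activation, the dictionary keyed by dialect, and the function names are illustrative assumptions, not the production architecture described in the paper.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def matvec(W: Matrix, x: List[float]) -> List[float]:
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def adapter(h: List[float], W_down: Matrix, W_up: Matrix) -> List[float]:
    """Bottleneck adapter: d -> r (r much smaller than d), ReLU, r -> d, plus residual."""
    z = [max(0.0, v) for v in matvec(W_down, h)]  # down-projection + ReLU
    return [hi + ui for hi, ui in zip(h, matvec(W_up, z))]  # up-projection + residual


def dialect_forward(h: List[float], dialect: str,
                    adapters: Dict[str, Tuple[Matrix, Matrix]]) -> List[float]:
    """Hypothetical routing: the shared hidden state h passes through the adapter
    of the requested dialect; everything else is shared across dialects."""
    W_down, W_up = adapters[dialect]
    return adapter(h, W_down, W_up)
```

Because only the small adapter matrices differ per dialect, adding a dialect costs far fewer parameters than specializing a whole sub-network, which is the trade-off the abstract highlights.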
- [402] arXiv:2403.18784 [pdf, other]
-
Title: SplatFace: Gaussian Splat Face Reconstruction Leveraging an Optimizable Surface
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We present SplatFace, a novel Gaussian splatting framework designed for 3D human face reconstruction without reliance on accurate pre-determined geometry. Our method is designed to simultaneously deliver both high-quality novel view rendering and accurate 3D mesh reconstructions. We incorporate a generic 3D Morphable Model (3DMM) to provide a surface geometric structure, making it possible to reconstruct faces with a limited set of input images. We introduce a joint optimization strategy that refines both the Gaussians and the morphable surface through a synergistic non-rigid alignment process. A novel distance metric, splat-to-surface, is proposed to improve alignment by considering both the Gaussian position and covariance. The surface information is also utilized to incorporate a world-space densification process, resulting in superior reconstruction quality. Our experimental analysis demonstrates that the proposed method is competitive with both other Gaussian splatting techniques in novel view synthesis and other 3D reconstruction methods in producing 3D face meshes with high geometric precision.
- [403] arXiv:2403.18788 [pdf, other]
-
Title: Peregrine: ML-based Malicious Traffic Detection for Terabit Networks
Authors: João Romeiras Amado, Francisco Pereira, David Pissarra, Salvatore Signorello, Miguel Correia, Fernando M. V. Ramos
Subjects: Networking and Internet Architecture (cs.NI)
Malicious traffic detectors leveraging machine learning (ML), particularly those incorporating deep learning techniques, exhibit impressive detection capabilities across multiple attacks. However, their effectiveness becomes compromised when deployed in networks handling Terabit-speed traffic. In practice, these systems require substantial traffic sampling to reconcile the high data plane packet rates with the comparatively slower processing speeds of ML detection. As sampling significantly reduces traffic observability, it fundamentally undermines their detection capability.
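The observability problem can be illustrated in software by contrasting two pipelines: sampling raw packets first, versus computing features on every packet and sampling only the exported feature records. In this Python sketch, per-flow packet-size mean and variance are updated with Welford's algorithm. Flow keys, the choice of features, and the export rule are assumptions. A real switch implementation would run in the data plane with restricted arithmetic, which this sketch does not model.

```python
from collections import defaultdict


class FlowStats:
    """Running per-flow packet-size statistics (Welford's algorithm)."""
    __slots__ = ("count", "mean", "m2")

    def __init__(self):
        self.count, self.mean, self.m2 = 0, 0.0, 0.0

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self.m2 / self.count if self.count else 0.0


def features_then_sample(packets, every=2):
    """Update per-flow features on every packet; export only every `every`-th record."""
    stats = defaultdict(FlowStats)
    exported = []
    for i, (flow, size) in enumerate(packets):
        s = stats[flow]
        s.update(size)
        if i % every == 0:
            exported.append((flow, s.count, s.mean, s.variance))
    return exported


def sample_then_features(packets, every=2):
    """Baseline: sample raw packets first, so features only see the sampled traffic."""
    return features_then_sample(packets[::every], every=1)
```

For a flow alternating 100-byte and 300-byte packets, the baseline that keeps every second packet sees only 100-byte packets and reports zero variance, while the feature-first pipeline still reflects all of the traffic.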
We present Peregrine, an ML-based malicious traffic detector for Terabit networks. The key idea is to run the detection process partially in the network data plane. Specifically, we offload the detector's ML feature computation to a commodity switch. The Peregrine switch processes a diversity of features per packet at Tbps line rates (three orders of magnitude higher than the fastest detector) to feed the ML-based component in the control plane. Our offloading approach presents a distinct advantage. While, in practice, current systems sample raw traffic, in Peregrine sampling occurs after feature computation. This essential trait enables computing features over all traffic, significantly enhancing detection performance. The Peregrine detector is not only effective for Terabit networks, but it is also energy- and cost-efficient. Further, by shifting a compute-heavy component to the switch, it saves precious CPU cycles and improves detection throughput.
- [404] arXiv:2403.18791 [pdf, other]
-
Title: Object Pose Estimation via the Aggregation of Diffusion Features
Comments: Accepted to CVPR2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Estimating the pose of objects from images is a crucial task of 3D scene understanding, and recent approaches have shown promising results on very large benchmarks. However, these methods experience a significant performance drop when dealing with unseen objects. We believe that this results from the limited generalizability of image features. To address this problem, we conduct an in-depth analysis of the features of diffusion models, e.g., Stable Diffusion, which hold substantial potential for modeling unseen objects. Based on this analysis, we then introduce these diffusion features for object pose estimation. To achieve this, we propose three distinct architectures that can effectively capture and aggregate diffusion features of different granularity, greatly improving the generalizability of object pose estimation. Our approach outperforms the state-of-the-art methods by a considerable margin on three popular benchmark datasets, LM, O-LM, and T-LESS. In particular, our method achieves higher accuracy than the previous best methods on unseen objects: 98.2% vs. 93.5% on Unseen LM, 85.9% vs. 76.3% on Unseen O-LM, showing the strong generalizability of our method. Our code is released at https://github.com/Tianfu18/diff-feats-pose.
- [405] arXiv:2403.18795 [pdf, other]
-
Title: Gamba: Marry Gaussian Splatting with Mamba for single view 3D reconstruction
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
We tackle the challenge of efficiently reconstructing a 3D asset from a single image, driven by growing demand for automated 3D content creation pipelines. Previous methods primarily rely on Score Distillation Sampling (SDS) and Neural Radiance Fields (NeRF). Despite their significant success, these approaches encounter practical limitations due to lengthy optimization and considerable memory usage. In this report, we introduce Gamba, an end-to-end amortized 3D reconstruction model from single-view images, emphasizing two main insights: (1) 3D representation: leveraging a large number of 3D Gaussians for an efficient 3D Gaussian splatting process; (2) Backbone design: introducing a Mamba-based sequential network that facilitates context-dependent reasoning and linear scalability with the sequence (token) length, accommodating a substantial number of Gaussians. Gamba incorporates significant advancements in data preprocessing, regularization design, and training methodologies. We assessed Gamba against existing optimization-based and feed-forward 3D generation approaches using the real-world scanned OmniObject3D dataset. Here, Gamba demonstrates competitive generation capabilities, both qualitatively and quantitatively, while achieving remarkable speed: approximately 0.6 seconds on a single NVIDIA A100 GPU.
- [406] arXiv:2403.18797 [pdf, other]
-
Title: SolderlessPCB: Reusing Electronic Components in PCB Prototyping through Detachable 3D Printed Housings
Journal-ref: Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems
Subjects: Human-Computer Interaction (cs.HC)
The iterative prototyping process for printed circuit boards (PCBs) frequently employs surface-mounted device (SMD) components, which are often discarded rather than reused due to the challenges associated with desoldering, leading to unnecessary electronic waste. This paper introduces SolderlessPCB, a collection of techniques for solder-free PCB prototyping, specifically designed to promote the recycling and reuse of electronic components. Central to this approach are custom 3D-printable housings that allow SMD components to be mounted onto PCBs without soldering. We detail the design of SolderlessPCB and the experiments conducted to evaluate its design parameters, electrical performance, and durability. To illustrate the potential for reusing SMD components with SolderlessPCB, we discuss two scenarios: the reuse of components from earlier design iterations and from obsolete prototypes. We also provide examples demonstrating that SolderlessPCB can handle high-current applications and is suitable for high-speed data transmission. The paper concludes by discussing the limitations of our approach and suggesting future directions to overcome these challenges.
- [407] arXiv:2403.18802 [pdf, other]
-
Title: Long-form factuality in large language models
Authors: Jerry Wei, Chengrun Yang, Xinying Song, Yifeng Lu, Nathan Hu, Dustin Tran, Daiyi Peng, Ruibo Liu, Da Huang, Cosmo Du, Quoc V. Le
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Large language models (LLMs) often generate content that contains factual errors when responding to fact-seeking prompts on open-ended topics. To benchmark a model's long-form factuality in open domains, we first use GPT-4 to generate LongFact, a prompt set comprising thousands of questions spanning 38 topics. We then propose that LLM agents can be used as automated evaluators for long-form factuality through a method which we call Search-Augmented Factuality Evaluator (SAFE). SAFE utilizes an LLM to break down a long-form response into a set of individual facts and to evaluate the accuracy of each fact using a multi-step reasoning process comprising sending search queries to Google Search and determining whether a fact is supported by the search results. Furthermore, we propose extending F1 score as an aggregated metric for long-form factuality. To do so, we balance the percentage of supported facts in a response (precision) with the percentage of provided facts relative to a hyperparameter representing a user's preferred response length (recall).
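The aggregated metric can be written down directly. This Python sketch follows our reading of the paper's F1@K and assumes the following: precision is the fraction of supported facts among all rated facts, and recall is the number of supported facts divided by the hyperparameter K, capped at 1. A response with no supported facts scores 0.

```python
def f1_at_k(supported: int, not_supported: int, k: int) -> float:
    """F1 over rated facts: precision from support ratio, recall relative to K."""
    if supported == 0:
        return 0.0
    precision = supported / (supported + not_supported)
    recall = min(supported / k, 1.0)  # saturates once the response has K supported facts
    return 2 * precision * recall / (precision + recall)
```

The cap on recall means that, beyond K supported facts, longer responses gain nothing and only precision matters, which is how K encodes a preferred response length.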
Empirically, we demonstrate that LLM agents can achieve superhuman rating performance: on a set of ~16k individual facts, SAFE agrees with crowdsourced human annotators 72% of the time, and on a random subset of 100 disagreement cases, SAFE wins 76% of the time. At the same time, SAFE is more than 20 times cheaper than human annotators. We also benchmark thirteen language models on LongFact across four model families (Gemini, GPT, Claude, and PaLM-2), finding that larger language models generally achieve better long-form factuality. LongFact, SAFE, and all experimental code are available at https://github.com/google-deepmind/long-form-factuality.
- [408] arXiv:2403.18803 [pdf, other]
-
Title: Projective Methods for Mitigating Gender Bias in Pre-trained Language Models
Subjects: Computation and Language (cs.CL)
Mitigation of gender bias in NLP has a long history tied to debiasing static word embeddings. More recently, attention has shifted to debiasing pre-trained language models. We study to what extent the simplest projective debiasing methods, developed for word embeddings, can help when applied to BERT's internal representations. Projective methods are fast to implement, use a small number of saved parameters, and make no updates to the existing model parameters. We evaluate the efficacy of the methods in reducing both intrinsic bias, as measured by BERT's next sentence prediction task, and in mitigating observed bias in a downstream setting when fine-tuned. To this end, we also provide a critical analysis of a popular gender-bias assessment test for quantifying intrinsic bias, resulting in an enhanced test set and new bias measures. We find that projective methods can be effective at both intrinsic bias and downstream bias mitigation, but that the two outcomes are not necessarily correlated. This finding serves as a warning that intrinsic bias test sets, based either on language modeling tasks or next sentence prediction, should not be the only benchmark in developing a debiased language model.
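The simplest projective method removes each representation's component along a bias direction, $v' = v - \frac{v \cdot g}{g \cdot g}\, g$. In the Python sketch below, the direction is estimated as the mean difference of paired embeddings (for example, from "he"/"she" style pairs). That estimator is a simplification assumed for illustration; classic hard debiasing uses principal components of such differences instead.

```python
from typing import List, Sequence, Tuple

Vec = Sequence[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def project_out(v: Vec, g: Vec) -> List[float]:
    """Remove the component of v along direction g (g need not be unit length)."""
    gg = dot(g, g)
    if gg == 0:
        return list(v)
    c = dot(v, g) / gg
    return [vi - c * gi for vi, gi in zip(v, g)]


def gender_direction(pairs: List[Tuple[Vec, Vec]]) -> List[float]:
    """Hypothetical estimator: mean difference of paired embeddings."""
    diffs = [[a - b for a, b in zip(x, y)] for x, y in pairs]
    return [sum(col) / len(diffs) for col in zip(*diffs)]
```

Nothing is trained and the model's own parameters are untouched; only the direction `g` is stored, which matches the abstract's point that projective methods save few parameters.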
- [409] arXiv:2403.18804 [pdf, other]
-
Title: Is Modularity Transferable? A Case Study through the Lens of Knowledge Distillation
Comments: Accepted at LREC-COLING 2024
Subjects: Computation and Language (cs.CL)
The rise of Modular Deep Learning showcases its potential in various Natural Language Processing applications. Parameter-efficient fine-tuning (PEFT) modularity has been shown to work for various use cases, from domain adaptation to multilingual setups. However, all this work covers the case where the modular components are trained and deployed within one single Pre-trained Language Model (PLM). This model-specific setup is a substantial limitation on the very modularity that modular architectures are trying to achieve. We ask whether current modular approaches are transferable between models and whether we can transfer the modules from more robust and larger PLMs to smaller ones. In this work, we aim to fill this gap through the lens of Knowledge Distillation, commonly used for model compression, and present an extremely straightforward approach to transferring pre-trained, task-specific PEFT modules between same-family PLMs. Moreover, we propose a method that allows the transfer of modules between incompatible PLMs without any change in the inference complexity. The experiments on Named Entity Recognition, Natural Language Inference, and Paraphrase Identification tasks over multiple languages and PEFT methods showcase the initial potential of transferable modularity.
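Knowledge distillation is typically trained with a temperature-softened KL divergence between the teacher's and the student's output distributions. The Python sketch below shows that standard loss as background only. The abstract does not say which distillation objective the paper uses to transfer PEFT modules.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], T: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    m = max(logits)
    exps = [math.exp((z - m) / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def kd_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
            T: float = 2.0) -> float:
    """T^2 * KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)
    return (T ** 2) * kl  # T^2 keeps gradient magnitudes comparable across temperatures
```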
- [410] arXiv:2403.18807 [pdf, other]
-
Title: ECoDepth: Effective Conditioning of Diffusion Models for Monocular Depth Estimation
Comments: Accepted at IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
In the absence of parallax cues, a learning-based single image depth estimation (SIDE) model relies heavily on shading and contextual cues in the image. While this simplicity is attractive, it is necessary to train such models on large and varied datasets, which are difficult to capture. It has been shown that using embeddings from pre-trained foundational models, such as CLIP, improves zero-shot transfer in several applications. Taking inspiration from this, in our paper we explore the use of global image priors generated from a pre-trained ViT model to provide more detailed contextual information. We argue that the embedding vector from a ViT model, pre-trained on a large dataset, captures more relevant information for SIDE than the usual route of generating pseudo image captions, followed by CLIP-based text embeddings. Based on this idea, we propose a new SIDE model using a diffusion backbone which is conditioned on ViT embeddings. Our proposed design establishes a new state-of-the-art (SOTA) for SIDE on the NYUv2 dataset, achieving an Abs Rel error of 0.059 (14% improvement) compared to 0.069 for the current SOTA (VPD), and on the KITTI dataset, achieving a Sq Rel error of 0.139 (2% improvement) compared to 0.142 for the current SOTA (GEDepth). For zero-shot transfer with a model trained on NYUv2, we report mean relative improvement of (20%, 23%, 81%, 25%) over NeWCRFs on (Sun-RGBD, iBims1, DIODE, HyperSim) datasets, compared to (16%, 18%, 45%, 9%) by ZoeDepth. The code is available at https://github.com/Aradhye2002/EcoDepth.
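For reference, the two error metrics quoted above have standard definitions over per-pixel predicted depth $\hat d$ and ground truth $d$: Abs Rel is the mean of $|\hat d - d|/d$, and Sq Rel is the mean of $(\hat d - d)^2/d$. The Python sketch below also checks the quoted relative improvements, $(0.069 - 0.059)/0.069 \approx 14\%$ and $(0.142 - 0.139)/0.142 \approx 2\%$. It omits the masking of invalid pixels that benchmarks usually apply.

```python
from typing import Sequence


def abs_rel(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Mean absolute relative error over valid (gt > 0) depth values."""
    return sum(abs(p - g) / g for p, g in zip(pred, gt)) / len(gt)


def sq_rel(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Mean squared error divided by ground-truth depth."""
    return sum((p - g) ** 2 / g for p, g in zip(pred, gt)) / len(gt)


def relative_improvement(new: float, old: float) -> float:
    """Fractional error reduction of `new` relative to `old` (lower error is better)."""
    return (old - new) / old
```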
- [411] arXiv:2403.18810 [pdf, other]
-
Title: LightningNet: Distributed Graph-based Cellular Network Performance Forecasting for the Edge
Authors: Konstantinos Zacharopoulos, Georgios Koutroumpas, Ioannis Arapakis, Konstantinos Georgopoulos, Javad Khangosstar, Sotiris Ioannidis
Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG)
The cellular network plays a pivotal role in providing Internet access, since it is the only global-scale infrastructure with ubiquitous mobility support. To manage and maintain large-scale networks, mobile network operators require timely information, or even accurate performance forecasts. In this paper, we propose LightningNet, a lightweight and distributed graph-based framework for forecasting cellular network performance, which can capture spatio-temporal dependencies that arise in the network traffic. LightningNet achieves a steady performance increase over state-of-the-art forecasting techniques, while maintaining a similar resource usage profile. Our architecture is also specifically designed to support IoT and edge devices, a further advantage over the current state of the art, as indicated by our performance experiments with NVIDIA Jetson.
- [412] arXiv:2403.18811 [pdf, other]
-
Title: Duolando: Follower GPT with Off-Policy Reinforcement Learning for Dance Accompaniment
Authors: Li Siyao, Tianpei Gu, Zhitao Yang, Zhengyu Lin, Ziwei Liu, Henghui Ding, Lei Yang, Chen Change Loy
Comments: ICLR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
We introduce a novel task within the field of 3D dance generation, termed dance accompaniment, which necessitates the generation of responsive movements from a dance partner, the "follower", synchronized with the lead dancer's movements and the underlying musical rhythm. Unlike existing solo or group dance generation tasks, a duet dance scenario entails a heightened degree of interaction between the two participants, requiring delicate coordination in both pose and position. To support this task, we first build a large-scale and diverse duet interactive dance dataset, DD100, by recording about 117 minutes of professional dancers' performances. To address the challenges inherent in this task, we propose a GPT-based model, Duolando, which autoregressively predicts the subsequent tokenized motion conditioned on the coordinated information of the music, the leader's and the follower's movements. To further enhance the GPT's capabilities of generating stable results on unseen conditions (music and leader motions), we devise an off-policy reinforcement learning strategy that allows the model to explore viable trajectories from out-of-distribution samplings, guided by human-defined rewards. Based on the collected dataset and proposed method, we establish a benchmark with several carefully designed metrics.
- [413] arXiv:2403.18812 [pdf, other]
-
Title: On the Communication Complexity of Approximate Pattern Matching
Comments: 62 pages; abstract shortened
Subjects: Data Structures and Algorithms (cs.DS); Quantum Physics (quant-ph)
The decades-old Pattern Matching with Edits problem, given a length-$n$ string $T$ (the text), a length-$m$ string $P$ (the pattern), and a positive integer $k$ (the threshold), asks to list all fragments of $T$ that are at edit distance at most $k$ from $P$. The one-way communication complexity of this problem is the minimum amount of space needed to encode the answer so that it can be retrieved without accessing the input strings $P$ and $T$.
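A classical quadratic-time baseline makes the problem statement concrete. The Python sketch below is Sellers' dynamic program. It reports every end position $j$ (an exclusive index into $T$) such that some fragment of $T$ ending there is within edit distance $k$ of $P$. Recovering the start positions, and anything about communication complexity, is beyond this sketch.

```python
from typing import List


def k_error_end_positions(text: str, pattern: str, k: int) -> List[int]:
    """Sellers' DP: every end index j (0 < j <= len(text)) such that some fragment
    text[i:j] is within edit distance k of pattern. Uses one column of memory."""
    m = len(pattern)
    col = list(range(m + 1))  # D[i][0] = i: pattern[:i] against an empty fragment
    ends = []
    for j, c in enumerate(text, start=1):
        prev_diag, col[0] = col[0], 0  # D[0][j] = 0: a fragment may start anywhere
        for i in range(1, m + 1):
            cur = min(col[i] + 1,                           # extra text char text[j-1]
                      col[i - 1] + 1,                       # pattern char pattern[i-1] dropped
                      prev_diag + (pattern[i - 1] != c))    # match or substitution
            prev_diag, col[i] = col[i], cur
        if col[m] <= k:
            ends.append(j)
    return ends
```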
The closely related Pattern Matching with Mismatches problem (defined in terms of the Hamming distance instead of the edit distance) is already well understood from the communication complexity perspective: Clifford, Kociumaka, and Porat [SODA 2019] proved that $\Omega(n/m \cdot k \log(m/k))$ bits are necessary and $O(n/m \cdot k\log (m|\Sigma|/k))$ bits are sufficient; the upper bound allows encoding not only the occurrences of $P$ in $T$ with at most $k$ mismatches but also the substitutions needed to make each $k$-mismatch occurrence exact.
Despite recent improvements in the running time [Charalampopoulos, Kociumaka, and Wellnitz; FOCS 2020 and 2022], the communication complexity of Pattern Matching with Edits remained unexplored, with a lower bound of $\Omega(n/m \cdot k\log(m/k))$ bits and an upper bound of $O(n/m \cdot k^3\log m)$ bits stemming from previous research. In this work, we prove an upper bound of $O(n/m \cdot k \log^2 m)$ bits, thus establishing the optimal communication complexity up to logarithmic factors. We also show that $O(n/m \cdot k \log m \log (m|\Sigma|))$ bits allow encoding, for each $k$-error occurrence of $P$ in $T$, the shortest sequence of edits needed to make the occurrence exact.
We leverage the techniques behind our new result on the communication complexity to obtain quantum algorithms for Pattern Matching with Edits.
- [414] arXiv:2403.18814 [pdf, other]
Title: Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models
Authors: Yanwei Li, Yuechen Zhang, Chengyao Wang, Zhisheng Zhong, Yixin Chen, Ruihang Chu, Shaoteng Liu, Jiaya Jia
Comments: Code and models are available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
In this work, we introduce Mini-Gemini, a simple and effective framework for enhancing multi-modality Vision Language Models (VLMs). Despite advancements in VLMs that facilitate basic visual dialog and reasoning, a performance gap persists compared to advanced models like GPT-4 and Gemini. We try to narrow the gap by mining the potential of VLMs for better performance and an any-to-any workflow from three aspects: high-resolution visual tokens, high-quality data, and VLM-guided generation. To enhance visual tokens, we propose to utilize an additional visual encoder for high-resolution refinement without increasing the visual token count. We further construct a high-quality dataset that promotes precise image comprehension and reasoning-based generation, expanding the operational scope of current VLMs. In general, Mini-Gemini further mines the potential of VLMs and empowers current frameworks with image understanding, reasoning, and generation simultaneously. Mini-Gemini supports a series of dense and MoE Large Language Models (LLMs) from 2B to 34B parameters. It is demonstrated to achieve leading performance on several zero-shot benchmarks and even surpasses well-developed private models. Code and models are available at https://github.com/dvlab-research/MiniGemini.
- [415] arXiv:2403.18816 [pdf, other]
Title: Garment3DGen: 3D Garment Stylization and Texture Generation
Comments: Project Page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We introduce Garment3DGen, a new method to synthesize 3D garment assets from a base mesh given a single input image as guidance. Our proposed approach allows users to generate 3D textured clothes based on both real and synthetic images, such as those generated by text prompts. The generated assets can be directly draped and simulated on human bodies. First, we leverage the recent progress of image-to-3D diffusion methods to generate 3D garment geometries. However, since these geometries cannot be utilized directly for downstream tasks, we propose to use them as pseudo ground truth and set up a mesh deformation optimization procedure that deforms a base template mesh to match the generated 3D target. Second, we introduce carefully designed losses that allow the input base mesh to freely deform towards the desired target, yet preserve mesh quality and topology such that it can be simulated. Finally, a texture estimation module generates high-fidelity texture maps that are globally and locally consistent and faithfully capture the input guidance, allowing us to render the generated 3D assets. With Garment3DGen, users can generate the textured 3D garment of their choice without the need for artist intervention. One can provide a textual prompt describing the desired garment to generate a simulation-ready 3D asset. We present a wide range of quantitative and qualitative comparisons on various assets, both real and generated, and provide use cases showing how one can generate simulation-ready 3D garments.
- [416] arXiv:2403.18818 [pdf, other]
Title: ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Diffusion models have revolutionized image editing but often generate images that violate physical laws, particularly the effects of objects on the scene, e.g., occlusions, shadows, and reflections. By analyzing the limitations of self-supervised approaches, we propose a practical solution centered on a "counterfactual" dataset. Our method involves capturing a scene before and after removing a single object, while minimizing other changes. By fine-tuning a diffusion model on this dataset, we are able to remove not only objects but also their effects on the scene. However, we find that applying this approach to photorealistic object insertion requires an impractically large dataset. To tackle this challenge, we propose bootstrap supervision: leveraging our object removal model trained on a small counterfactual dataset, we synthetically expand this dataset considerably. Our approach significantly outperforms prior methods in photorealistic object removal and insertion, particularly at modeling the effects of objects on the scene.
- [417] arXiv:2403.18819 [pdf, other]
Title: Benchmarking Object Detectors with COCO: A New Path Forward
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The Common Objects in Context (COCO) dataset has been instrumental in benchmarking object detectors over the past decade. Like every dataset, COCO contains subtle errors and imperfections stemming from its annotation procedure. With the advent of high-performing models, we ask whether these errors in COCO hinder its utility in reliably benchmarking further progress. In search of an answer, we inspect thousands of masks from COCO (2017 version) and uncover different types of errors such as imprecise mask boundaries, non-exhaustively annotated instances, and mislabeled masks. Due to the prevalence of COCO, we choose to correct these errors to maintain continuity with prior research. We develop COCO-ReM (Refined Masks), a cleaner set of annotations with visibly better mask quality than COCO-2017. We evaluate fifty object detectors and find that models that predict visually sharper masks score higher on COCO-ReM, affirming that they were being incorrectly penalized due to errors in COCO-2017. Moreover, our models trained using COCO-ReM converge faster and score higher than their larger variants trained using COCO-2017, highlighting the importance of data quality in improving object detectors. With these findings, we advocate using COCO-ReM for future object detection research. Our dataset is available at https://cocorem.xyz
- [418] arXiv:2403.18820 [pdf, other]
Title: MetaCap: Meta-learning Priors from Multi-View Imagery for Sparse-view Human Performance Capture and Rendering
Comments: Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Faithful human performance capture and free-view rendering from sparse RGB observations is a long-standing problem in Vision and Graphics. The main challenges are the lack of observations and the inherent ambiguities of the setting, e.g. occlusions and depth ambiguity. As a result, radiance fields, which have shown great promise in capturing high-frequency appearance and geometry details in dense setups, perform poorly when naïvely supervised on sparse camera views, as the field simply overfits to the sparse-view inputs. To address this, we propose MetaCap, a method for efficient and high-quality geometry recovery and novel view synthesis given very sparse or even a single view of the human. Our key idea is to meta-learn the radiance field weights solely from potentially sparse multi-view videos, which can serve as a prior when fine-tuning them on sparse imagery depicting the human. This prior provides a good network weight initialization, thereby effectively addressing ambiguities in sparse-view capture. Due to the articulated structure of the human body and motion-induced surface deformations, learning such a prior is non-trivial. Therefore, we propose to meta-learn the field weights in a pose-canonicalized space, which reduces the spatial feature range and makes feature learning more effective. Consequently, one can fine-tune our field parameters to quickly generalize to unseen poses, novel illumination conditions, and novel and sparse (even monocular) camera views. To evaluate our method under different scenarios, we collect a new dataset, WildDynaCap, which contains subjects captured both in a dense camera dome and in in-the-wild sparse camera rigs, and demonstrate superior results compared to recent state-of-the-art methods on public data as well as on WildDynaCap.
- [419] arXiv:2403.18821 [pdf, other]
Title: Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark
Authors: Ziyang Chen, Israel D. Gebru, Christian Richardt, Anurag Kumar, William Laney, Andrew Owens, Alexander Richard
Comments: Accepted to CVPR 2024. Project site: this https URL
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
We present a new dataset called Real Acoustic Fields (RAF) that captures real acoustic room data from multiple modalities. The dataset includes high-quality and densely captured room impulse response data paired with multi-view images, and precise 6DoF pose tracking data for sound emitters and listeners in the rooms. We used this dataset to evaluate existing methods for novel-view acoustic synthesis and impulse response generation, which previously relied on synthetic data. In our evaluation, we thoroughly assessed existing audio and audio-visual models against multiple criteria and proposed settings to enhance their performance on real-world data. We also conducted experiments to investigate the impact of incorporating visual data (i.e., images and depth) into neural acoustic field models. Additionally, we demonstrated the effectiveness of a simple sim2real approach, where a model is pre-trained with simulated data and fine-tuned with sparse real-world data, resulting in significant improvements in the few-shot learning setting. RAF is the first dataset to provide densely captured room acoustic data, making it an ideal resource for researchers working on audio and audio-visual neural acoustic field modeling techniques. Demos and datasets are available on our project page: https://facebookresearch.github.io/real-acoustic-fields/
Cross-lists for Thu, 28 Mar 24
- [420] arXiv:2403.17946 (cross-list from math.FA) [pdf, ps, other]
Title: Nonlinear Heisenberg-Robertson-Schrodinger Uncertainty Principle
Authors: K. Mahesh Krishna
Comments: 4 Pages, 0 Figures
Subjects: Functional Analysis (math.FA); Information Theory (cs.IT); Mathematical Physics (math-ph)
We derive an uncertainty principle for Lipschitz maps acting on subsets of Banach spaces. We show that this nonlinear uncertainty principle reduces to the Heisenberg-Robertson-Schrodinger uncertainty principle for linear operators acting on Hilbert spaces.
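For reference, the linear relation mentioned above is commonly stated as follows for self-adjoint operators $A$, $B$ on a Hilbert space and a unit vector $\psi$ in a suitable common domain, writing $\langle X\rangle = \langle \psi, X\psi\rangle$, $\sigma_X^2 = \langle X^2\rangle - \langle X\rangle^2$, $[A,B] = AB - BA$ and $\{A,B\} = AB + BA$. This is the textbook form only; the paper's nonlinear statement and its exact hypotheses may differ.

$$\sigma_A^2\,\sigma_B^2 \;\ge\; \left|\tfrac{1}{2}\langle\{A,B\}\rangle - \langle A\rangle\langle B\rangle\right|^2 + \left|\tfrac{1}{2i}\langle[A,B]\rangle\right|^2$$

Dropping the first (covariance) term gives the Robertson bound $\sigma_A\sigma_B \ge \tfrac12\left|\langle[A,B]\rangle\right|$.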
- [421] arXiv:2403.17961 (cross-list from math.CT) [pdf, ps, other]
Title: A categorical formulation of Kraus' paradox
Authors: Andrew W. Swan
Subjects: Category Theory (math.CT); Logic in Computer Science (cs.LO); Logic (math.LO)
We give a categorical formulation of Kraus' "magic trick" for recovering information from truncated types. Rather than type theory, we work in Van den Berg-Moerdijk path categories with a univalent universe, and rather than propositional truncation we work with arbitrary cofibrations, which include truncation as a special case. We show, using Kraus' argument, that any cofibration with homogeneous domain is a monomorphism. We give some simple concrete examples in groupoids to illustrate the interaction between homogeneous types, cofibrations, and univalent fibrations.
- [422] arXiv:2403.17982 (cross-list from stat.ME) [pdf, ps, other]
Title: Markov chain models for inspecting response dynamics in psychological testing
Authors: Andrea Bosco
Comments: 20 pages, 1 figure, 3 tables, 25 equations/matrices. Part of this paper was presented to the XXIX AIP Congress, Experimental Psychology Section. September 18th-20th 2023, Lucca, Italy. Title of the talk: "Differentiating students with signs of ADHD or OCD based on hysteresis in responses to a mind-wandering test. A Study of Markov Chain Test Response Sequences"
Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Probability (math.PR)
This study underscores the importance of considering contextual probabilities in shaping response patterns within psychological testing, even though order effects are ubiquitous and have been discussed extensively in the methodological literature. Drawing on concepts such as path dependency, first-order autocorrelation, state dependency, and hysteresis, the present study attempts to address how earlier responses serve as an anchor for subsequent answers in tests, surveys, and questionnaires. Introducing the notion of non-commuting observables derived from quantum physics, I highlight their role in characterizing psychological processes and the impact of measurement instruments on participants' responses. I advocate the use of first-order Markov chain modeling to capture and forecast sequential dependencies in survey and test responses. The rationale for employing a first-order Markov chain model lies in individuals' propensity to attend partially to preceding responses, with the most recent items likely exerting a substantial influence on subsequent response selection. This study contributes to advancing our understanding of the dynamics inherent in sequential data within psychological research and provides a methodological framework for conducting longitudinal analyses of response patterns in tests and questionnaires.
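The first-order Markov model described above can be sketched as follows: transition probabilities are estimated by counting consecutive answer pairs in a respondent's sequence, and the next answer is forecast from the row of the current one. This is a minimal Python illustration under assumptions not stated in the abstract (a single respondent, a fixed answer scale, maximum-likelihood estimates without smoothing); the names and data are hypothetical.

```python
from collections import Counter


def transition_matrix(responses, states):
    """Maximum-likelihood first-order transition probabilities P[prev][next]
    estimated from one respondent's ordered sequence of answers."""
    counts = {s: Counter() for s in states}
    for prev, nxt in zip(responses, responses[1:]):
        counts[prev][nxt] += 1
    matrix = {}
    for s in states:
        total = sum(counts[s].values())
        # Rows for answers that are never followed by another answer stay all zero.
        matrix[s] = {t: (counts[s][t] / total if total else 0.0) for t in states}
    return matrix


def most_likely_next(matrix, current):
    """Forecast the next answer as the most probable successor of the current one."""
    row = matrix[current]
    return max(row, key=row.get)


# Hypothetical answers on a 1-5 Likert scale, in the order the items were presented.
answers = [3, 3, 4, 4, 4, 5, 4, 3, 3, 4]
P = transition_matrix(answers, states=[1, 2, 3, 4, 5])
print(P[4], most_likely_next(P, 4))
```

A strongly diagonal matrix would indicate that respondents tend to repeat their previous answer, one simple signature of the anchoring effect discussed above.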
- [423] arXiv:2403.17992 (cross-list from q-bio.QM) [pdf, other]
Title: Interpretable cancer cell detection with phonon microscopy using multi-task conditional neural networks for inter-batch calibration
Subjects: Quantitative Methods (q-bio.QM); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Signal Processing (eess.SP)
Advances in artificial intelligence (AI) show great potential in revealing underlying information from phonon microscopy (high-frequency ultrasound) data to identify cancerous cells. However, this technology suffers from the 'batch effect', which arises from unavoidable technical variations between experiments and creates confounding variables that the AI model may inadvertently learn. We therefore present a multi-task conditional neural network framework that simultaneously achieves inter-batch calibration, by removing confounding variables, and accurate cell classification of time-resolved phonon-derived signals. We validate our approach by training and validating on different experimental batches, achieving a balanced precision of 89.22% and an average cross-validated precision of 89.07% for classifying background, healthy and cancerous regions. Classification can be performed in 0.5 seconds, with only simple prior batch information required for multiple batch corrections. Further, we extend our model to reconstruct denoised signals, enabling physical interpretation of salient features indicating disease state, including sound velocity, sound attenuation and cell adhesion to the substrate.
- [424] arXiv:2403.18026 (cross-list from eess.IV) [pdf, ps, other]
Title: Cross-system biological image quality enhancement based on the generative adversarial network as a foundation for establishing a multi-institute microscopy cooperative network
Authors: Dominik Panek, Carina Rząca, Maksymilian Szczypior, Joanna Sorysz, Krzysztof Misztal, Zbigniew Baster, Zenon Rajfur
Comments: 15 Pages, 5 Figures, 1 Table, 3 pages Supplementary Materials
Subjects: Image and Video Processing (eess.IV); Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)
High-quality fluorescence imaging of biological systems is limited by processes like photobleaching and phototoxicity and, in many cases, by limited access to the latest generations of microscopes. Moreover, low temporal resolution can lead to motion blur in living systems. Our work presents a deep learning (DL) generative-adversarial approach to the problem of obtaining high-quality (HQ) images from their low-quality (LQ) equivalents. We propose a generative adversarial network (GAN) for contrast transfer between two separate microscopy systems: a confocal microscope (producing HQ images) and a wide-field fluorescence microscope (producing LQ images). Our model shows that such transfer is possible, yielding HQ-generated images characterized by low mean squared error (MSE), high structural similarity index (SSIM), and high peak signal-to-noise ratio (PSNR) values. For our best model, comparing HQ-generated images with HQ ground-truth images gives median values of $6\times10^{-4}$, 0.9413, and 31.87 for MSE, SSIM, and PSNR, respectively. In contrast, comparing LQ images with HQ ground truth gives median values of 0.0071, 0.8304, and 21.48 for MSE, SSIM, and PSNR, respectively. Therefore, we observe significant increases of 14% and 49% for SSIM and PSNR, respectively. These results, together with other single-system cross-modality studies, provide proof of concept for further implementation of cross-system biological image quality enhancement.
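MSE and PSNR are linked by $\mathrm{PSNR} = 10\log_{10}(\mathrm{MAX}^2/\mathrm{MSE})$, where MAX is the largest possible pixel intensity. The Python sketch below computes both, assuming intensities scaled to $[0, 1]$ (the abstract does not state the normalization); SSIM is omitted. Because the reported figures are medians over many images, they need not satisfy the relation exactly.

```python
import math


def mse(a, b):
    """Mean squared error between two equally sized images given as flat pixel lists."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def psnr(a, b, max_value=1.0):
    """Peak signal-to-noise ratio in dB; max_value is the largest possible intensity."""
    err = mse(a, b)
    return math.inf if err == 0 else 10 * math.log10(max_value ** 2 / err)


generated = [0.20, 0.45, 0.80, 0.10]     # hypothetical pixel values
ground_truth = [0.22, 0.40, 0.83, 0.12]
print(mse(generated, ground_truth), psnr(generated, ground_truth))
```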
- [425] arXiv:2403.18030 (cross-list from quant-ph) [pdf, other]
Title: EinExprs: Contraction Paths of Tensor Networks as Symbolic Expressions
Comments: 4 pages, 5 figures, submitted to JuliaCon Proceedings 2023
Subjects: Quantum Physics (quant-ph); Mathematical Software (cs.MS)
Tensor Networks are graph representations of summation expressions in which vertices represent tensors and edges represent tensor indices or vector spaces. In this work, we present EinExprs.jl, a Julia package for contraction path optimization that offers state-of-the-art optimizers. We propose a representation of the contraction path of a Tensor Network based on symbolic expressions. Using this package, the user may choose among a collection of different methods, such as greedy algorithms or an approach based on the hypergraph partitioning problem. We benchmark this library with examples obtained from the simulation of Random Quantum Circuits (RQC), a well-known example where Tensor Networks provide state-of-the-art methods.
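To illustrate what a greedy contraction-path optimizer does, here is a small Python sketch. It is not the EinExprs.jl API (which is Julia); the function name is hypothetical, and the selection rule (contract the pair whose result tensor is smallest, preferring pairs that share an index) is one common greedy variant chosen as an assumption. Each index is assumed to appear in at most two tensors.

```python
from itertools import combinations
from math import prod


def greedy_contraction_path(tensors, dims):
    """Greedy pairwise contraction order for a tensor network.

    tensors: list of index-label collections, one per tensor.
    dims: mapping from index label to its dimension.
    Returns (path, cost): each path entry indexes the *current* tensor list,
    with the newly formed tensor appended at the end; cost counts multiply-adds.
    """
    current = [frozenset(t) for t in tensors]
    path, cost = [], 0
    while len(current) > 1:
        pairs = list(combinations(range(len(current)), 2))
        # Prefer pairs that share an index, to avoid outer products when possible.
        candidates = [p for p in pairs if current[p[0]] & current[p[1]]] or pairs

        def result_size(pair):
            a, b = current[pair[0]], current[pair[1]]
            return prod(dims[x] for x in a ^ b)  # shared indices are summed away

        i, j = min(candidates, key=result_size)
        a, b = current[i], current[j]
        cost += prod(dims[x] for x in a | b)  # one multiply-add per index combination
        path.append((i, j))
        current = [t for n, t in enumerate(current) if n not in (i, j)] + [a ^ b]
    return path, cost


# Matrix chain A[i,j] B[j,k] C[k,l]: contracting A and B first keeps intermediates small.
dims = {"i": 10, "j": 100, "k": 5, "l": 10}
print(greedy_contraction_path([("i", "j"), ("j", "k"), ("k", "l")], dims))
```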
- [426] arXiv:2403.18044 (cross-list from math.OC) [pdf, other]
Title: Deep polytopic autoencoders for low-dimensional linear parameter-varying approximations and nonlinear feedback design
Comments: 9 pages, 6 figures, 2 tables
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Dynamical Systems (math.DS); Numerical Analysis (math.NA); Fluid Dynamics (physics.flu-dyn)
Polytopic autoencoders provide low-dimensional parametrizations of states in a polytope. For nonlinear PDEs, this readily yields low-dimensional linear parameter-varying (LPV) approximations, which have been exploited for efficient nonlinear controller design via series expansions of the solution to the state-dependent Riccati equation. In this work, we develop a polytopic autoencoder for control applications and show how it outperforms standard linear approaches in terms of LPV approximations of nonlinear systems, and how the particular architecture enables higher-order series expansions at little extra computational effort. We illustrate the properties and potential of this approach to computational nonlinear controller design for large-scale systems with a thorough numerical study.
- [427] arXiv:2403.18052 (cross-list from astro-ph.IM) [pdf, other]
Title: R2D2 image reconstruction with model uncertainty quantification in radio astronomy
Comments: submitted to IEEE EUSIPCO 2024
Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Signal Processing (eess.SP)
The "Residual-to-Residual DNN series for high-Dynamic range imaging" (R2D2) approach was recently introduced for Radio-Interferometric (RI) imaging in astronomy. R2D2's reconstruction is formed as a series of residual images, iteratively estimated as outputs of Deep Neural Networks (DNNs) that take the previous iteration's image estimate and associated data residual as inputs. In this work, we investigate the robustness of the R2D2 image estimation process by studying the uncertainty associated with its series of learned models. Adopting an ensemble averaging approach, multiple series can be trained, arising from different random DNN initializations of the training process at each iteration. The resulting multiple R2D2 instances can also be leveraged to generate "R2D2 samples", whose empirical mean and standard deviation endow the algorithm with a joint estimation and uncertainty quantification functionality. Focusing on RI imaging, and adopting a telescope-specific approach, multiple R2D2 instances were trained to encompass the most general observation setting of the Very Large Array (VLA). Simulations and real-data experiments confirm that: (i) R2D2's image estimation capability is superior to that of the state-of-the-art algorithms; (ii) its ultra-fast reconstruction capability (arising from series with only a few DNNs) makes the computation of multiple reconstruction samples and of uncertainty maps practical even at large image dimensions; (iii) it is characterized by a very low model uncertainty.
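The residual series and the ensemble-based uncertainty estimate can be illustrated with a toy NumPy sketch. It assumes an identity measurement operator, so the data residual is simply $y - x$, and replaces each trained DNN with a hypothetical scaled-residual function; real RI imaging involves a Fourier-sampling operator and back-projected residuals. Only the structure is meant to match the description: accumulate the outputs of a series, then take the pixel-wise mean and standard deviation across independently trained series.

```python
import numpy as np


def r2d2_style_reconstruct(y, series):
    """Run one series of residual estimators (toy: identity measurement operator).

    Each element of `series` stands in for a trained DNN and maps
    (current image estimate, data residual) to an image update.
    """
    x = np.zeros_like(y)
    for net in series:
        residual = y - x          # data residual for the current estimate
        x = x + net(x, residual)  # add the predicted residual image
    return x


def ensemble_mean_std(y, ensemble):
    """Pixel-wise empirical mean and standard deviation over several series."""
    samples = np.stack([r2d2_style_reconstruct(y, series) for series in ensemble])
    return samples.mean(axis=0), samples.std(axis=0)


def make_toy_net(gain):
    # Hypothetical stand-in for a trained network: a scaled copy of the residual.
    return lambda x, residual: gain * residual


y = np.array([1.0, 2.0, 4.0])
ensemble = [
    [make_toy_net(0.5)] * 3,  # mimics one set of random initializations
    [make_toy_net(1.0)],      # mimics another
]
mean, std = ensemble_mean_std(y, ensemble)
print(mean, std)
```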
- [428] arXiv:2403.18072 (cross-list from stat.CO) [pdf, other]
Title: Goal-Oriented Bayesian Optimal Experimental Design for Nonlinear Models using Markov Chain Monte Carlo
Subjects: Computation (stat.CO); Machine Learning (cs.LG); Methodology (stat.ME); Machine Learning (stat.ML)
Optimal experimental design (OED) provides a systematic approach to quantify and maximize the value of experimental data. Under a Bayesian approach, conventional OED maximizes the expected information gain (EIG) on model parameters. However, we are often interested not in the parameters themselves, but in predictive quantities of interest (QoIs) that depend on the parameters in a nonlinear manner. We present a computational framework for predictive goal-oriented OED (GO-OED) suitable for nonlinear observation and prediction models, which seeks the experimental design providing the greatest EIG on the QoIs. In particular, we propose a nested Monte Carlo estimator for the QoI EIG, featuring Markov chain Monte Carlo for posterior sampling and kernel density estimation for evaluating the posterior-predictive density and its Kullback-Leibler divergence from the prior-predictive density. The GO-OED design is then found by maximizing the EIG over the design space using Bayesian optimization. We demonstrate the effectiveness of the overall nonlinear GO-OED method, and illustrate how it differs from conventional non-GO-OED, through various test problems and an application of sensor placement for source inversion in a convection-diffusion field.
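A nested Monte Carlo EIG estimator can be sketched on a toy linear-Gaussian model $y = \theta d + \varepsilon$ with $\theta \sim \mathcal{N}(0,1)$ and $\varepsilon \sim \mathcal{N}(0,\sigma^2)$, for which the exact EIG $\tfrac12\log(1 + d^2/\sigma^2)$ is available for checking. This sketch targets the conventional EIG on the parameter, not the paper's QoI EIG, and uses direct prior sampling instead of MCMC and kernel density estimation; a small grid search stands in for Bayesian optimization. All names are hypothetical.

```python
import numpy as np


def log_likelihood(y, theta, d, sigma):
    """log N(y; theta * d, sigma^2) for the toy model y = theta * d + noise."""
    return -0.5 * ((y - theta * d) / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))


def nested_mc_eig(d, sigma=1.0, n_outer=2000, n_inner=1000, seed=0):
    """Nested Monte Carlo estimate of the expected information gain on theta."""
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal(n_outer)                   # outer prior draws
    y = theta * d + sigma * rng.standard_normal(n_outer)   # simulated observations
    outer = log_likelihood(y, theta, d, sigma)
    # Inner loop: estimate the evidence p(y | d) with fresh prior draws per outer sample.
    inner = log_likelihood(y[:, None], rng.standard_normal((n_outer, n_inner)), d, sigma)
    peak = inner.max(axis=1, keepdims=True)                # log-mean-exp for stability
    log_evidence = (peak + np.log(np.exp(inner - peak).mean(axis=1, keepdims=True))).ravel()
    return float(np.mean(outer - log_evidence))


designs = [0.0, 0.5, 1.0, 2.0]
estimates = {d: nested_mc_eig(d) for d in designs}
best = max(estimates, key=estimates.get)
print(estimates, best)
```

In this toy model a larger $|d|$ always yields more information, so the grid search simply picks the largest design; in realistic problems the EIG surface is not monotone, which is where a global optimizer earns its keep.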
- [429] arXiv:2403.18087 (cross-list from eess.SP) [pdf, other]
Title: Channel Estimation and Beamforming for Beyond Diagonal Reconfigurable Intelligent Surfaces
Comments: 12 pages, 10 figures, submitted to IEEE journal
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)
Beyond diagonal reconfigurable intelligent surface (BD-RIS) is a new advance and generalization of the RIS technique. BD-RIS breaks through the isolation between RIS elements by creatively introducing inter-element connections, thereby enabling smarter wave manipulation and enlarging coverage. However, designing channel estimation schemes suitable for BD-RIS aided communication systems remains an open problem. In this paper, we study channel estimation and beamforming design for BD-RIS aided multi-antenna systems. We first describe the channel estimation strategy based on the least squares (LS) method, derive the mean square error (MSE) of the LS estimation, and formulate the joint pilot sequence and BD-RIS design problem with unique constraints induced by BD-RIS architectures. Specifically, we propose an efficient pilot sequence and BD-RIS design that is theoretically guaranteed to achieve the minimum MSE. With the estimated channel, we then consider two BD-RIS scenarios and propose beamforming design algorithms. Finally, we provide simulation results to verify the effectiveness of the proposed channel estimation scheme and beamforming design algorithms. We also show that more inter-element connections in BD-RIS improve the performance while increasing the training overhead for channel estimation.
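The LS step can be illustrated on a generic point-to-point MIMO model $Y = HX + N$, where $X$ holds the transmitted pilots and $Y$ the received samples; the estimate is $\hat H = YX^H(XX^H)^{-1}$. This NumPy sketch does not model the BD-RIS cascaded channel or its architecture constraints, and the DFT-row pilots, dimensions and noise level are illustrative assumptions.

```python
import numpy as np


def ls_channel_estimate(Y, X):
    """Least-squares estimate of H in Y = H X + N.

    Y: (n_rx, L) received pilot samples; X: (n_tx, L) transmitted pilots with
    L >= n_tx and full row rank, so that X X^H is invertible.
    """
    Xh = X.conj().T
    return Y @ Xh @ np.linalg.inv(X @ Xh)


rng = np.random.default_rng(0)
n_tx, n_rx, L = 2, 3, 4
H = rng.standard_normal((n_rx, n_tx)) + 1j * rng.standard_normal((n_rx, n_tx))
# Orthogonal pilots: the first n_tx rows of an L-point DFT matrix, so X X^H = L * I.
X = np.exp(-2j * np.pi * np.outer(np.arange(n_tx), np.arange(L)) / L)
noise = 0.01 * (rng.standard_normal((n_rx, L)) + 1j * rng.standard_normal((n_rx, L)))
H_hat = ls_channel_estimate(H @ X + noise, X)
print(np.mean(np.abs(H - H_hat) ** 2))  # empirical per-entry MSE
```

With orthogonal pilots the matrix inverse is just a scaling by $1/L$; longer pilot sequences average out more noise, which mirrors the training-overhead trade-off mentioned in the abstract.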
- [430] arXiv:2403.18102 (cross-list from math.CT) [pdf, ps, other]
Title: The operadic theory of convexity
Comments: 42 pages
Subjects: Category Theory (math.CT); Information Theory (cs.IT); Quantum Physics (quant-ph)
In this article, we characterize convexity in terms of algebras over a PROP, and establish a tensor-product-like symmetric monoidal structure on the category of convex sets. Using these two structures, and the theory of $\scr{O}$-monoidal categories, we state and prove a Grothendieck construction for lax $\scr{O}$-monoidal functors into convex sets. We apply this construction to the categorical characterization of entropy of Baez, Fritz, and Leinster, and to the study of quantum contextuality in the framework of simplicial distributions.
- [431] arXiv:2403.18130 (cross-list from math.OC) [pdf, other]
Title: Generalized Maximum Entropy Differential Dynamic Programming
Comments: 7 pages, 5 figures, This paper is for CDC 2024
Subjects: Optimization and Control (math.OC); Information Theory (cs.IT)
We present a sampling-based trajectory optimization method derived from the maximum entropy formulation of Differential Dynamic Programming with Tsallis entropy. This method can be seen as a generalization of prior work with Shannon entropy, which leads to a Gaussian optimal control policy for exploration during optimization. With the Tsallis entropy, the optimal control policy takes the form of a $q$-Gaussian, which further encourages exploration with its heavy-tailed shape. Moreover, in our formulation, the exploration variance, which was scaled by a fixed constant inverse temperature in the original formulation with Shannon entropy, is automatically scaled based on the value function of the trajectory. Due to this property, our algorithms can promote exploration when necessary, that is, when the cost of the trajectory is high, rather than using the same scaling factor throughout. The simulation results demonstrate the properties of the proposed algorithm described above.
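The abstract does not give formulas, but the standard Tsallis $q$-Gaussian has unnormalized density $[1 - (1-q)\beta x^2]_+^{1/(1-q)}$, which recovers $e^{-\beta x^2}$ as $q \to 1$ and has polynomial (heavy) tails for $1 < q < 3$. In that range it is a rescaled Student-t distribution with $\nu = (3-q)/(q-1)$ degrees of freedom and scale $1/\sqrt{(3-q)\beta}$, which gives a simple sampler. This Python sketch covers only these textbook facts, not the paper's controller; the function names are hypothetical.

```python
import math

import numpy as np


def q_gaussian_unnormalized(x, q, beta=1.0):
    """Unnormalized q-Gaussian density [1 - (1 - q) * beta * x^2]_+^(1 / (1 - q))."""
    if q == 1.0:
        return math.exp(-beta * x * x)  # Gaussian limit
    base = 1.0 - (1.0 - q) * beta * x * x
    if base <= 0.0:
        return 0.0  # compact-support cut-off, only reachable for q < 1
    return base ** (1.0 / (1.0 - q))


def sample_q_gaussian(q, beta=1.0, size=1, seed=None):
    """Draw q-Gaussian samples for 1 < q < 3 via the Student-t correspondence."""
    if not 1.0 < q < 3.0:
        raise ValueError("this sampler assumes 1 < q < 3")
    rng = np.random.default_rng(seed)
    nu = (3.0 - q) / (q - 1.0)
    return rng.standard_t(nu, size=size) / math.sqrt((3.0 - q) * beta)


# At x = 3 the q = 1.5 density is far larger than the Gaussian one: heavier tails.
print(q_gaussian_unnormalized(3.0, 1.5), q_gaussian_unnormalized(3.0, 1.0))
```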
- [432] arXiv:2403.18134 (cross-list from eess.IV) [pdf, other]
Title: Integrative Graph-Transformer Framework for Histopathology Whole Slide Image Representation and Classification
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
In digital pathology, the multiple instance learning (MIL) strategy is widely used in the weakly supervised histopathology whole slide image (WSI) classification task, where gigapixel WSIs are only labeled at the slide level. However, existing attention-based MIL approaches often overlook contextual information and intrinsic spatial relationships between neighboring tissue tiles, while graph-based MIL frameworks have limited power to recognize long-range dependencies. In this paper, we introduce the integrative graph-transformer framework, which simultaneously captures context-aware relational features and global WSI representations through a novel Graph Transformer Integration (GTI) block. Specifically, each GTI block consists of a Graph Convolutional Network (GCN) layer modeling neighboring relations at the local instance level and an efficient global attention model capturing comprehensive global information from extensive feature embeddings. Extensive experiments on three publicly available WSI datasets (TCGA-NSCLC, TCGA-RCC and BRIGHT) demonstrate the superiority of our approach over current state-of-the-art MIL methods, achieving an improvement of 1.0% to 2.6% in accuracy and 0.7% to 1.6% in AUROC.
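For readers unfamiliar with the local half of the GTI block, the widely used GCN propagation rule of Kipf and Welling, $H' = \mathrm{ReLU}(\hat D^{-1/2}(A + I)\hat D^{-1/2} H W)$, can be sketched in NumPy as below. Whether the paper uses exactly this normalization is not stated in the abstract, and building the adjacency from spatially neighboring tiles is an assumption made for illustration.

```python
import numpy as np


def gcn_layer(adjacency, features, weight):
    """One graph-convolution layer: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adjacency: (n, n) symmetric 0/1 matrix, e.g. spatially neighboring tissue tiles.
    features:  (n, f_in) tile embeddings; weight: (f_in, f_out) learnable matrix.
    """
    a_hat = adjacency + np.eye(adjacency.shape[0])  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))   # degrees are >= 1 thanks to self-loops
    norm = d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]
    return np.maximum(norm @ features @ weight, 0.0)


# Three tiles: tiles 0 and 1 are neighbors, tile 2 is isolated.
A = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
print(gcn_layer(A, np.eye(3), np.eye(3)))
```

Each output row mixes only a tile and its direct neighbors, which is why the framework pairs this layer with a global attention module to capture long-range dependencies.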
- [433] arXiv:2403.18139 (cross-list from eess.IV) [pdf, other]
Title: Pseudo-MRI-Guided PET Image Reconstruction Method Based on a Diffusion Probabilistic Model
Authors: Weijie Gan, Huidong Xie, Carl von Gall, Günther Platsch, Michael T. Jurkiewicz, Andrea Andrade, Udunna C. Anazodo, Ulugbek S. Kamilov, Hongyu An, Jorge Cabello
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Anatomically guided PET reconstruction using MRI information has been shown to have the potential to improve PET image quality. However, these improvements are limited to PET scans with paired MRI information. In this work, we employed a diffusion probabilistic model (DPM) to infer T1-weighted MRI (deep-MRI) images from FDG-PET brain images. We then used the DPM-generated T1w-MRI to guide the PET reconstruction. The model was trained with brain FDG scans and tested on datasets containing multiple levels of counts. Deep-MRI images appeared somewhat degraded compared with the acquired MRI images. Regarding PET image quality, volume-of-interest analysis in different brain regions showed that PET images reconstructed using either the acquired or the deep-MRI images had improved image quality compared to OSEM. The same conclusions were reached when analysing the decimated datasets. A subjective evaluation performed by two physicians confirmed that OSEM scored consistently worse than the MRI-guided PET images, and no significant differences were observed between the MRI-guided PET images. This proof of concept shows that it is possible to infer DPM-based MRI imagery to guide the PET reconstruction, enabling the possibility of changing reconstruction parameters such as the strength of the prior on anatomically guided PET reconstruction in the absence of MRI.
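For context on the OSEM baseline, the textbook ordered-subsets EM update for Poisson data $y \sim \mathrm{Poisson}(Ax)$ multiplies each voxel by the back-projected ratio of measured to expected counts, normalized by the subset sensitivity. The NumPy sketch below shows only this unregularized baseline (no anatomical prior, no attenuation or randoms correction); the toy system matrix is an assumption for illustration.

```python
import numpy as np


def osem(A, y, n_subsets=1, n_iter=10):
    """Ordered-subsets expectation maximization for y ~ Poisson(A x), x >= 0.

    A: (n_bins, n_voxels) system matrix; y: (n_bins,) measured counts.
    With n_subsets=1 this reduces to plain MLEM.
    """
    x = np.ones(A.shape[1])
    # Interleaved row subsets, as is common when rows correspond to projection angles.
    subsets = [np.arange(s, A.shape[0], n_subsets) for s in range(n_subsets)]
    for _ in range(n_iter):
        for rows in subsets:
            A_s, y_s = A[rows], y[rows]
            sensitivity = A_s.sum(axis=0)             # A_s^T 1
            ratio = y_s / np.maximum(A_s @ x, 1e-12)  # measured / expected counts
            # Voxels not seen by this subset keep their current value.
            update = np.divide(A_s.T @ ratio, sensitivity,
                               out=np.ones_like(x), where=sensitivity > 0)
            x = x * update
    return x


A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # toy 3-bin, 2-voxel system
y = A @ np.array([2.0, 3.0])                        # noise-free counts
print(osem(A, y, n_subsets=1, n_iter=500))
```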
- [434] arXiv:2403.18151 (cross-list from eess.IV) [pdf, ps, other]
Title: Automated Report Generation for Lung Cytological Images Using a CNN Vision Classifier and Multiple-Transformer Text Decoders: Preliminary Study
Authors: Atsushi Teramoto, Ayano Michiba, Yuka Kiriyama, Tetsuya Tsukamoto, Kazuyoshi Imaizumi, Hiroshi Fujita
Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Medical Physics (physics.med-ph)
Cytology plays a crucial role in lung cancer diagnosis. Pulmonary cytology involves cell morphological characterization in the specimen and reporting the corresponding findings, which are extremely burdensome tasks. In this study, we propose a report-generation technique for lung cytology images. In total, 71 benign and 135 malignant pulmonary cytology specimens were collected. Patch images were extracted from the captured specimen images, and the findings were assigned to each image as a dataset for report generation. The proposed method consists of a vision model and a text decoder. In the former, a convolutional neural network (CNN) is used to classify a given image as benign or malignant, and the features related to the image are extracted from the intermediate layer. Independent text decoders for benign and malignant cells are prepared for text generation, and the text decoder is switched according to the CNN classification result. The text decoder is configured using a Transformer that uses the features obtained from the CNN for report generation. Based on the evaluation results, the sensitivity and specificity were 100% and 96.4%, respectively, for automated benign and malignant case classification, and the saliency map indicated characteristic benign and malignant areas. The grammar and style of the generated texts were confirmed as correct and in better agreement with the gold standard than those of existing LLM-based image-captioning methods and a single-text-decoder ablation model. These results indicate that the proposed method is useful for pulmonary cytology classification and reporting.
- [435] arXiv:2403.18155 (cross-list from math.OC) [pdf, other]
-
Title: An inexact infeasible arc-search interior-point method for linear programming problemsComments: 25 pages, 3 figuresSubjects: Optimization and Control (math.OC); Numerical Analysis (math.NA)
Inexact interior-point methods (IPMs) are a type of interior-point method that solve the linear equation system for the search direction only inexactly. Arc-search IPMs, on the other hand, approximate the central path with an ellipsoidal arc obtained by solving two linear equation systems in each iteration, whereas conventional line-search IPMs solve only one; therefore, the savings from solving these systems inexactly can be more beneficial in arc-search IPMs than in conventional IPMs. In this paper, we propose an inexact infeasible arc-search interior-point method. Through its convergence analysis, we establish that the proposed method is a polynomial-time algorithm. Numerical experiments with the conjugate gradient method show that the proposed method can reduce the number of iterations compared with an existing method on benchmark problems; for more than 70% of the problems, the number of iterations is reduced to two-thirds.
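The "inexact" ingredient can be illustrated with a conjugate gradient (CG) solve that stops at a loose relative residual tolerance instead of solving to machine precision. This is a minimal sketch, not the authors' method: the matrix `A` stands in for the symmetric positive definite normal-equation matrix that IPMs for linear programming commonly form (of the form M D M^T with a positive diagonal D), and the tolerance `rtol` is an illustrative assumption.

```python
import numpy as np


def cg_inexact(A, b, rtol=1e-2, max_iter=1000):
    """Conjugate gradient for SPD A that stops once ||b - A x|| <= rtol * ||b||."""
    x = np.zeros_like(b, dtype=float)
    r = b - A @ x            # initial residual
    p = r.copy()             # initial search direction
    rs_old = r @ r
    b_norm = np.linalg.norm(b)
    for _ in range(max_iter):
        if np.sqrt(rs_old) <= rtol * b_norm:
            break            # accept an inexact solution
        Ap = A @ p
        alpha = rs_old / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x


# Example: a normal-equation-style SPD matrix M D M^T with a positive diagonal D.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 8))
D = np.diag(rng.uniform(0.1, 10.0, size=8))
A = M @ D @ M.T
b = rng.standard_normal(5)
x = cg_inexact(A, b, rtol=1e-2)
print(np.linalg.norm(b - A @ x) / np.linalg.norm(b))  # at most 1e-2
```

A looser `rtol` means fewer CG iterations per IPM step, at the cost of a less accurate search direction; the convergence analysis in the paper is what controls that trade-off.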
- [436] arXiv:2403.18198 (cross-list from eess.IV) [pdf, other]
-
Title: Generative Medical SegmentationSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Rapid advancements in medical image segmentation performance have been significantly driven by the development of Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs). However, these models introduce high computational demands and often have limited ability to generalize across diverse medical imaging datasets. In this manuscript, we introduce Generative Medical Segmentation (GMS), a novel approach leveraging a generative model for image segmentation. Concretely, GMS employs a robust pre-trained Variational Autoencoder (VAE) to derive latent representations of both images and masks, followed by a mapping model that learns the transition from image to mask in the latent space. The pre-trained VAE decoder then generates a precise segmentation mask in the image space. The design of GMS leads to fewer learnable parameters in the model, resulting in a reduced computational burden and enhanced generalization capability. Our extensive experimental analysis across five public datasets in different medical imaging domains demonstrates that GMS outperforms existing discriminative segmentation models and has remarkable domain generalization. Our experiments suggest GMS could set a new benchmark for medical image segmentation, offering a scalable and effective solution. GMS implementation and model weights are available at https://github.com/King-HAW/GMS.
- [437] arXiv:2403.18216 (cross-list from stat.ML) [pdf, other]
-
Title: Minimax Optimal Fair Classification with Bounded Demographic DisparitySubjects: Machine Learning (stat.ML); Computers and Society (cs.CY); Machine Learning (cs.LG); Statistics Theory (math.ST)
Mitigating the disparate impact of statistical machine learning methods is crucial for ensuring fairness. While extensive research aims to reduce disparity, the effect of using a \emph{finite dataset} -- as opposed to the entire population -- remains unclear. This paper explores the statistical foundations of fair binary classification with two protected groups, focusing on controlling demographic disparity, defined as the difference in acceptance rates between the groups. Although fairness may come at the cost of accuracy even with infinite data, we show that using a finite sample incurs additional costs due to the need to estimate group-specific acceptance thresholds. We study the minimax optimal classification error while constraining demographic disparity to a user-specified threshold. To quantify the impact of fairness constraints, we introduce a novel measure called \emph{fairness-aware excess risk} and derive a minimax lower bound on this measure that all classifiers must satisfy. Furthermore, we propose FairBayes-DDP+, a group-wise thresholding method with an offset that we show attains the minimax lower bound. Our lower bound proofs involve several innovations. Experiments suggest that FairBayes-DDP+ controls disparity at the user-specified level, while being faster and having a more favorable fairness-accuracy tradeoff than several baselines.
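Demographic disparity, as defined above, is the difference in acceptance rates between the two protected groups. A minimal sketch, assuming classifier scores in [0, 1] and hypothetical per-group thresholds; this is plain group-wise thresholding, not FairBayes-DDP+ itself, whose offset is derived in the paper.

```python
import numpy as np


def acceptance_rate(scores, threshold):
    """Fraction of a group whose score is at or above the group's threshold."""
    return float(np.mean(np.asarray(scores) >= threshold))


def demographic_disparity(scores_a, scores_b, t_a, t_b):
    """Acceptance rate of group A minus acceptance rate of group B."""
    return acceptance_rate(scores_a, t_a) - acceptance_rate(scores_b, t_b)


# Hypothetical classifier scores for the two protected groups.
scores_a = np.array([0.9, 0.8, 0.6, 0.4, 0.2])
scores_b = np.array([0.7, 0.5, 0.3, 0.2, 0.1])

# One shared threshold: acceptance rates 3/5 vs 2/5, disparity about 0.2.
print(demographic_disparity(scores_a, scores_b, 0.5, 0.5))
# Group-wise thresholds: 2/5 vs 2/5, disparity 0.
print(demographic_disparity(scores_a, scores_b, 0.7, 0.5))
```

In practice the group-wise thresholds must be estimated from a finite sample, which is the source of the additional cost the abstract describes.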
- [438] arXiv:2403.18233 (cross-list from eess.IV) [pdf, other]
-
Title: Benchmarking Image Transformers for Prostate Cancer Detection from Ultrasound DataAuthors: Mohamed Harmanani, Paul F. R. Wilson, Fahimeh Fooladgar, Amoon Jamzad, Mahdi Gilany, Minh Nguyen Nhat To, Brian Wodlinger, Purang Abolmaesumi, Parvin MousaviComments: early draft, 7 pages; Accepted to SPIE Medical Imaging 2024Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Tissues and Organs (q-bio.TO)
PURPOSE: Deep learning methods for classifying prostate cancer (PCa) in ultrasound images typically employ convolutional networks (CNNs) to detect cancer in small regions of interest (ROI) along a needle trace region. However, this approach suffers from weak labelling, since the ground-truth histopathology labels do not describe the properties of individual ROIs. Recently, multi-scale approaches have sought to mitigate this issue by combining the context awareness of transformers with a CNN feature extractor to detect cancer from multiple ROIs using multiple-instance learning (MIL). In this work, we present a detailed study of several image transformer architectures for both ROI-scale and multi-scale classification, and a comparison of the performance of CNNs and transformers for ultrasound-based prostate cancer classification. We also design a novel multi-objective learning strategy that combines both ROI and core predictions to further mitigate label noise. METHODS: We evaluate three image transformers on ROI-scale cancer classification, then use the strongest model to tune a multi-scale classifier with MIL. We train our MIL models using our novel multi-objective learning strategy and compare our results to existing baselines. RESULTS: We find that for both ROI-scale and multi-scale PCa detection, image transformer backbones lag behind their CNN counterparts. This deficit in performance is even more noticeable for larger models. When using multi-objective learning, we improve the performance of MIL, reaching a 77.9% AUROC, a sensitivity of 75.9%, and a specificity of 66.3%. CONCLUSION: Convolutional networks are better suited for modelling sparse datasets of prostate ultrasounds, producing more robust features than transformers in PCa detection. Multi-scale methods remain the best architecture for this task, with multi-objective learning presenting an effective way to improve performance.
- [439] arXiv:2403.18257 (cross-list from eess.AS) [pdf, other]
-
Title: Dual-path Mamba: Short and Long-term Bidirectional Selective Structured State Space Models for Speech SeparationSubjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
Transformers have been the most successful architecture for various speech modeling tasks, including speech separation. However, the quadratic complexity of the self-attention mechanism makes transformers inefficient in computation and memory. Recent models incorporate new layers and modules alongside transformers for better performance but also introduce extra model complexity. In this work, we replace transformers with Mamba, a selective state space model, for speech separation. We propose dual-path Mamba, which models short-term and long-term forward and backward dependencies of speech signals using selective state spaces. Our experimental results on the WSJ0-2mix data show that our dual-path Mamba models match or outperform the dual-path transformer models Sepformer and QDPN with only 60% and 30% of their parameters, respectively. Our large model also reaches a new state-of-the-art SI-SNRi of 24.4 dB.
- [440] arXiv:2403.18269 (cross-list from stat.ML) [pdf, other]
-
Title: Clustering Change Sign Detection by Fusing Mixture ComplexityComments: 23 pagesSubjects: Machine Learning (stat.ML); Information Theory (cs.IT); Machine Learning (cs.LG)
This paper proposes an early detection method for cluster structural changes. Cluster structure refers to discrete structural characteristics, such as the number of clusters, when data are represented using finite mixture models, such as Gaussian mixture models. We focus on scenarios in which the cluster structure changes gradually over time. For finite mixture models, the concept of mixture complexity (MC) measures the continuous cluster size by considering the cluster proportion bias and the overlap between clusters. In this paper, we propose MC fusion as an extension of MC to handle situations in which multiple mixture numbers are possible in a finite mixture model. By fusing multiple models, our approach accurately captures the cluster structure during transitional periods of gradual change. Moreover, we introduce a method for detecting changes in the cluster structure by examining the transition of MC fusion. We demonstrate the effectiveness of our method through empirical analysis using both artificial and real-world datasets.
- [441] arXiv:2403.18302 (cross-list from astro-ph.SR) [pdf, other]
-
Title: Super-Resolution of SOHO/MDI Magnetograms of Solar Active Regions Using SDO/HMI Data and an Attention-Aided Convolutional Neural NetworkComments: 17 pages, 7 figuresSubjects: Solar and Stellar Astrophysics (astro-ph.SR); Machine Learning (cs.LG)
Image super-resolution has been an important subject in image processing and recognition. Here, we present an attention-aided convolutional neural network (CNN) for solar image super-resolution. Our method, named SolarCNN, aims to enhance the quality of line-of-sight (LOS) magnetograms of solar active regions (ARs) collected by the Michelson Doppler Imager (MDI) on board the Solar and Heliospheric Observatory (SOHO). The ground-truth labels used for training SolarCNN are the LOS magnetograms collected by the Helioseismic and Magnetic Imager (HMI) on board the Solar Dynamics Observatory (SDO). Solar ARs consist of strong magnetic fields in which magnetic energy can suddenly be released to produce extreme space weather events, such as solar flares, coronal mass ejections, and solar energetic particles. SOHO/MDI covers Solar Cycle 23, which was stronger and had more eruptive events than Cycle 24. Enhanced SOHO/MDI magnetograms allow for better understanding and forecasting of violent space weather events. Experimental results show that SolarCNN improves the quality of SOHO/MDI magnetograms in terms of the structural similarity index measure (SSIM), Pearson's correlation coefficient (PCC), and the peak signal-to-noise ratio (PSNR).
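Two of the metrics named above, PSNR and Pearson's correlation coefficient, are short enough to sketch directly (SSIM is longer; scikit-image provides it as `skimage.metrics.structural_similarity`). This is a generic sketch, not SolarCNN's evaluation code; in particular, `data_range` for signed magnetogram values in gauss is an assumption the user must choose, for example the span of a clipped field range.

```python
import numpy as np


def psnr(reference, estimate, data_range):
    """Peak signal-to-noise ratio in dB: 10 * log10(data_range**2 / MSE)."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(data_range ** 2 / mse))


def pearson_cc(reference, estimate):
    """Pearson correlation coefficient between the flattened images."""
    return float(np.corrcoef(np.ravel(reference), np.ravel(estimate))[0, 1])


# Toy 2x2 "magnetograms": the estimate is the reference plus a constant offset.
ref = np.array([[0.0, 1.0], [2.0, 3.0]])
est = ref + 0.1
print(psnr(ref, est, data_range=3.0))  # 10*log10(9 / 0.01), about 29.5 dB
print(pearson_cc(ref, est))            # a constant offset leaves PCC at 1
```

The toy example also shows why several metrics are reported together: a constant bias lowers PSNR but is invisible to PCC.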
- [442] arXiv:2403.18339 (cross-list from eess.IV) [pdf, other]
-
Title: H2ASeg: Hierarchical Adaptive Interaction and Weighting Network for Tumor Segmentation in PET/CT ImagesComments: 10 pages, 4 figuresSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Positron emission tomography (PET) combined with computed tomography (CT) imaging is routinely used in cancer diagnosis and prognosis by providing complementary information. Automatically segmenting tumors in PET/CT images can significantly improve examination efficiency. Traditional multi-modal segmentation solutions mainly rely on concatenation operations for modality fusion, which fail to effectively model the non-linear dependencies between PET and CT modalities. Recent studies have investigated various approaches to optimize the fusion of modality-specific features for enhancing joint representations. However, modality-specific encoders used in these methods operate independently, inadequately leveraging the synergistic relationships inherent in PET and CT modalities, for example, the complementarity between semantics and structure. To address these issues, we propose a Hierarchical Adaptive Interaction and Weighting Network termed H2ASeg to explore the intrinsic cross-modal correlations and transfer potential complementary information. Specifically, we design a Modality-Cooperative Spatial Attention (MCSA) module that performs intra- and inter-modal interactions globally and locally. Additionally, a Target-Aware Modality Weighting (TAMW) module is developed to highlight tumor-related features within multi-modal features, thereby refining tumor segmentation. By embedding these modules across different layers, H2ASeg can hierarchically model cross-modal correlations, enabling a nuanced understanding of both semantic and structural tumor features. Extensive experiments demonstrate the superiority of H2ASeg, outperforming state-of-the-art methods on AutoPet-II and Hecktor2022 benchmarks. The code is released at https://github.com/G14nTDo4/H2ASeg.
- [443] arXiv:2403.18347 (cross-list from astro-ph.SR) [pdf, other]
-
Title: A Quantum Fuzzy-based Approach for Real-Time Detection of Solar Coronal HolesComments: 14 pages, 5 figures, 3 tablesSubjects: Solar and Stellar Astrophysics (astro-ph.SR); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
The detection and analysis of solar coronal holes (CHs) is an important field of study in solar physics. It is mainly required for the accurate prediction of geomagnetic storms, which directly or indirectly affect various space- and ground-based systems. To date, solar scientists have relied on manual, hand-drawn approaches to detect CHs. With the advancement of image processing technologies, however, some automated image segmentation methods have been used for CH detection. Despite this, fast and accurate detection of CHs is still a major issue. In this work, a novel quantum computing-based fast fuzzy c-means technique has been developed for fast detection of CH regions. The task is carried out in two stages: in the first stage, the solar image is segmented using a quantum computing-based fast fuzzy c-means (QCFFCM) algorithm, and in the second stage, the CHs are extracted from the segmented image using morphological operations. Quantum computing is used to optimize the cost function of the fast fuzzy c-means (FFCM) algorithm, with the quantum approximate optimization algorithm (QAOA) optimizing the quadratic part of the cost function. The proposed method has been tested on 193 Å SDO/AIA full-disk solar image datasets and compared with existing techniques. The results show that the proposed method performs comparably to existing methods in much less time.
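The fuzzy c-means (FCM) clustering at the core of the first stage can be sketched classically. This is plain FCM with the standard alternating updates, not the paper's fast FCM (FFCM) or its QAOA-based optimization; the fuzzifier `m = 2`, the tolerance, and clustering raw pixel intensities are assumptions for illustration. Coronal holes appear dark in EUV images such as 193 Å, so the darkest cluster is the natural candidate region.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Classical fuzzy c-means.

    X: array of shape (n_samples, n_features). Returns (centers, U), where
    U[i, j] is the membership of sample i in cluster j (rows sum to 1).
    """
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)                # random fuzzy partition
    for _ in range(n_iter):
        W = U ** m                                   # fuzzified memberships
        centers = (W.T @ X) / W.sum(axis=0)[:, None] # weighted cluster means
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                        # guard against zero distance
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1))
        ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=2)
        converged = np.max(np.abs(U_new - U)) < tol
        U = U_new
        if converged:
            break
    return centers, U


# Toy 1-D example: dark (coronal-hole-like) pixels and bright pixels.
intensities = np.array([0.10, 0.12, 0.11, 0.90, 0.88, 0.92]).reshape(-1, 1)
centers, U = fuzzy_c_means(intensities, c=2)
labels = U.argmax(axis=1)               # hard assignment from fuzzy memberships
dark_cluster = int(np.argmin(centers[:, 0]))
print(labels == dark_cluster)           # True for the three dark pixels
```

In the two-stage pipeline described above, a mask built from the dark cluster would then be cleaned with morphological operations.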
- [444] arXiv:2403.18355 (cross-list from stat.ML) [pdf, other]
-
Title: Supervised Multiple Kernel Learning approaches for multi-omics data integrationAuthors: Mitja Briscik (IMT), Gabriele Tazza, Marie-Agnes Dillies, László Vidács, Sébastien Dejean (IMT)Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Applications (stat.AP)
Advances in high-throughput technologies have led to an ever-increasing availability of omics datasets. The integration of multiple heterogeneous data sources is currently a challenge for biology and bioinformatics. Multiple kernel learning (MKL) has been shown to be a flexible and valid approach for handling the diverse nature of multi-omics inputs, despite being an underused tool in genomic data mining. We provide novel MKL approaches based on different kernel fusion strategies. To learn from the meta-kernel of input kernels, we adapted unsupervised integration algorithms for supervised tasks with support vector machines. We also tested deep learning architectures for kernel fusion and classification. The results show that MKL-based models can compete with more complex, state-of-the-art, supervised multi-omics integrative approaches. Multiple kernel learning offers a natural framework for predictive models in multi-omics genomic data. Our results offer a direction for bio-data mining research and further development of methods for heterogeneous data integration.
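A meta-kernel of the kind described above can be built by normalizing one kernel per omics layer and averaging them. This sketch shows only the simplest fusion strategy (an unweighted mean of cosine-normalized kernels), chosen for illustration; the paper's own fusion strategies and adapted integration algorithms are not reproduced, and the RBF width `gamma` and the toy data are arbitrary assumptions.

```python
import numpy as np


def linear_kernel(X):
    """Linear kernel K = X X^T for samples in rows."""
    return X @ X.T


def rbf_kernel(X, gamma):
    """Gaussian (RBF) kernel exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))  # clip tiny negatives from rounding


def cosine_normalize(K):
    """Rescale so each sample has unit self-similarity: K_ij / sqrt(K_ii * K_jj)."""
    d = np.sqrt(np.diag(K))
    return K / np.outer(d, d)


def average_meta_kernel(kernels):
    """Unweighted mean of normalized kernels (one simple fusion strategy)."""
    return sum(cosine_normalize(K) for K in kernels) / len(kernels)


# Two hypothetical omics layers measured on the same 6 samples.
rng = np.random.default_rng(0)
X_expr = rng.standard_normal((6, 10))   # e.g. gene expression
X_meth = rng.standard_normal((6, 4))    # e.g. methylation
K_meta = average_meta_kernel([linear_kernel(X_expr), rbf_kernel(X_meth, gamma=0.1)])
print(K_meta.shape)  # (6, 6)
```

Because a mean of positive semidefinite kernels is itself positive semidefinite, `K_meta` can be passed to any kernel method that accepts a precomputed Gram matrix, for example scikit-learn's `SVC(kernel="precomputed")`.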
- [445] arXiv:2403.18468 (cross-list from eess.IV) [pdf, ps, other]
-
Title: Deep Learning Segmentation and Classification of Red Blood Cells Using a Large Multi-Scanner DatasetComments: 15 pages, 12 figures, 8 tablesSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Digital pathology has recently been revolutionized by advancements in artificial intelligence, deep learning, and high-performance computing. With its advanced tools, digital pathology can help improve and speed up the diagnostic process, reduce human errors, and streamline the reporting step. In this paper, we report a new large red blood cell (RBC) image dataset and propose a two-stage deep learning framework for RBC image segmentation and classification. The dataset is highly diverse, containing more than 100K RBCs from eight different classes. The dataset, which is considerably larger than any publicly available hematopathology dataset, was labeled independently by two hematopathologists who also manually created masks for RBC cell segmentation. In the proposed framework, first, a U-Net model was trained to achieve automatic RBC image segmentation. Second, an EfficientNetB0 model was trained to classify RBC images into one of the eight classes using a transfer learning approach with a 5×2 cross-validation scheme. An IoU of 98.03% and an average classification accuracy of 96.5% were attained on the test set. Moreover, we have performed experimental comparisons against several prominent CNN models. These comparisons show that the proposed model is superior, with a good balance between performance and computational cost.
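The segmentation metric quoted above, intersection over union (IoU), compares a predicted mask with a reference mask. A generic sketch on binary NumPy masks, not the authors' evaluation code; how IoU was averaged over cells or images in the paper is not stated here, and the empty-mask convention is an assumption.

```python
import numpy as np


def iou(pred_mask, ref_mask):
    """Intersection over union of two binary masks of the same shape."""
    pred = np.asarray(pred_mask, dtype=bool)
    ref = np.asarray(ref_mask, dtype=bool)
    union = np.logical_or(pred, ref).sum()
    if union == 0:
        return 1.0  # both masks empty: counted as perfect agreement (a convention)
    return float(np.logical_and(pred, ref).sum() / union)


pred_example = [[1, 1, 0],
                [0, 1, 0]]
ref_example = [[1, 0, 0],
               [0, 1, 1]]
print(iou(pred_example, ref_example))  # 2 overlapping pixels / 4 in the union = 0.5
```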
- [446] arXiv:2403.18501 (cross-list from eess.IV) [pdf, other]
-
Title: HEMIT: H&E to Multiplex-immunohistochemistry Image Translation with Dual-Branch Pix2pix GeneratorSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Computational analysis of multiplexed immunofluorescence histology data is emerging as an important method for understanding the tumour micro-environment in cancer. This work presents HEMIT, a dataset designed for translating Hematoxylin and Eosin (H&E) sections to multiplex-immunohistochemistry (mIHC) images, featuring DAPI, CD3, and panCK markers. Distinctively, HEMIT's mIHC images are multi-component and cellular-level aligned with H&E, enriching supervised stain translation tasks. To our knowledge, HEMIT is the first publicly available cellular-level aligned dataset that enables H&E to multi-target mIHC image translation. This dataset provides the computer vision community with a valuable resource to develop novel computational methods which have the potential to gain new insights from H&E slide archives.
We also propose a new dual-branch generator architecture, using residual Convolutional Neural Networks (CNNs) and Swin Transformers which achieves better translation outcomes than other popular algorithms. When evaluated on HEMIT, it outperforms pix2pixHD, pix2pix, U-Net, and ResNet, achieving the highest overall score on key metrics including the Structural Similarity Index Measure (SSIM), Pearson correlation score (R), and Peak signal-to-noise Ratio (PSNR). Additionally, downstream analysis has been used to further validate the quality of the generated mIHC images. These results set a new benchmark in the field of stain translation tasks.
- [447] arXiv:2403.18514 (cross-list from eess.IV) [pdf, other]
-
Title: CT-3DFlow : Leveraging 3D Normalizing Flows for Unsupervised Detection of Pathological Pulmonary CT scansAuthors: Aissam Djahnine, Alexandre Popoff, Emilien Jupin-Delevaux, Vincent Cottin, Olivier Nempont, Loic BousselSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Unsupervised pathology detection can be implemented by training a model on healthy data only and measuring the deviation from the training set at inference, for example with CNN-based feature extraction and one-class classifiers, or with reconstruction-score-based methods such as AEs, GANs and diffusion models. Normalizing Flows (NF) can directly learn the probability distribution of training examples through an invertible architecture. We leverage this property in a novel 3D NF-based model named CT-3DFlow, specifically tailored for patient-level pulmonary pathology detection in chest CT data. Our model is trained unsupervised on healthy 3D pulmonary CT patches and detects deviations from its log-likelihood distribution as anomalies. We aggregate patch-level likelihood values from a patient's CT scan to provide a patient-level 'normal'/'abnormal' prediction. Out-of-distribution detection performance is evaluated using expert annotations on a separate chest CT test dataset, on which the model outperforms other state-of-the-art methods.
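The abstract does not say how patch-level likelihoods are aggregated, so the rule below is a hypothetical stand-in: the scan's anomaly score is the negative mean of its `k` lowest patch log-likelihoods, compared with a threshold that would be calibrated on healthy scans. Both `k` and `threshold`, and the toy numbers, are assumptions.

```python
import numpy as np


def patient_anomaly_score(patch_log_likelihoods, k=5):
    """Hypothetical aggregation: negative mean of the k lowest patch log-likelihoods.

    Higher scores mean the scan looks less like the healthy training data.
    """
    ll = np.sort(np.asarray(patch_log_likelihoods, dtype=float))  # ascending order
    k = min(k, ll.size)
    return float(-ll[:k].mean())


def predict_patient(patch_log_likelihoods, threshold, k=5):
    """Return 'abnormal' if the scan-level score exceeds the threshold, else 'normal'."""
    score = patient_anomaly_score(patch_log_likelihoods, k)
    return "abnormal" if score > threshold else "normal"


# Toy per-patch log-likelihoods for two scans (illustrative numbers only).
healthy_scan = [-10.0, -11.0, -9.0, -10.5, -10.0, -9.5]
diseased_scan = [-10.0, -11.0, -40.0, -35.0, -10.0, -9.5]
print(predict_patient(healthy_scan, threshold=15.0, k=2))   # normal
print(predict_patient(diseased_scan, threshold=15.0, k=2))  # abnormal
```

Averaging over only the lowest few patches, rather than all of them, keeps a small focal lesion from being diluted by many healthy patches.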
- [448] arXiv:2403.18535 (cross-list from eess.IV) [pdf, other]
-
Title: Theoretical Bound-Guided Hierarchical VAE for Neural Image CodecsComments: 2024 IEEE International Conference on Multimedia and Expo (ICME2024)Subjects: Image and Video Processing (eess.IV); Machine Learning (cs.LG)
Recent studies reveal a significant theoretical link between variational autoencoders (VAEs) and rate-distortion theory, notably in utilizing VAEs to estimate the theoretical upper bound of the information rate-distortion function of images. Such estimated theoretical bounds substantially exceed the performance of existing neural image codecs (NICs). To narrow this gap, we propose a theoretical bound-guided hierarchical VAE (BG-VAE) for NIC. The proposed BG-VAE leverages the theoretical bound to guide the NIC model towards enhanced performance. We implement the BG-VAE using Hierarchical VAEs and demonstrate its effectiveness through extensive experiments. Along with advanced neural network blocks, we provide a versatile, variable-rate NIC that outperforms existing methods when considering both rate-distortion performance and computational complexity. The code is available at BG-VAE.
- [449] arXiv:2403.18540 (cross-list from stat.ML) [pdf, other]
-
Title: skscope: Fast Sparsity-Constrained Optimization in PythonAuthors: Zezhi Wang, Jin Zhu, Peng Chen, Huiyang Peng, Xiaoke Zhang, Anran Wang, Yu Zheng, Junxian Zhu, Xueqin WangComments: 4 pagesSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Computation (stat.CO)
Applying iterative solvers to sparsity-constrained optimization (SCO) requires tedious mathematical derivation and careful programming and debugging, which hinders these solvers' broad impact. In this paper, the library skscope is introduced to overcome this obstacle. With skscope, users can solve an SCO problem just by programming the objective function. The convenience of skscope is demonstrated through two examples in the paper, where sparse linear regression and trend filtering are addressed with just four lines of code. More importantly, skscope's efficient implementation allows state-of-the-art solvers to quickly attain the sparse solution even when the parameter space is high-dimensional. Numerical experiments reveal that the available solvers in skscope can achieve up to 80x speedup over the competing relaxation solutions obtained via the benchmarked convex solver. skscope is published on the Python Package Index (PyPI) and Conda, and its source code is available at: https://github.com/abess-team/skscope.
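skscope's own API is not reproduced here. To illustrate the kind of solver such a library automates, the sketch below implements iterative hard thresholding (IHT), a classical SCO algorithm, for sparse linear regression in plain NumPy; the step-size rule, iteration count, and toy data are assumptions, and nothing here should be read as skscope's implementation.

```python
import numpy as np


def iht_least_squares(X, y, k, n_iter=500):
    """Iterative hard thresholding for
    minimize ||y - X b||^2 / (2n)  subject to  ||b||_0 <= k.
    """
    n, p = X.shape
    step = n / np.linalg.norm(X, 2) ** 2     # 1 / Lipschitz constant of the gradient
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n      # gradient of the least-squares loss
        z = beta - step * grad               # plain gradient step
        keep = np.argsort(np.abs(z))[-k:]    # indices of the k largest entries
        beta = np.zeros(p)
        beta[keep] = z[keep]                 # hard-threshold to a k-sparse vector
    return beta


# Toy noiseless problem with a 3-sparse ground truth.
rng = np.random.default_rng(1)
n, p = 200, 20
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[[2, 7, 11]] = [3.0, -2.0, 1.5]
y = X @ beta_true
beta_hat = iht_least_squares(X, y, k=3)
print(np.flatnonzero(beta_hat))  # recovered support
```

The hand-derived gradient and step size are exactly the "tedious mathematical derivation" the abstract refers to; a library that differentiates the objective automatically removes that step.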
- [450] arXiv:2403.18557 (cross-list from math.OC) [pdf, other]
-
Title: Stability Properties of the Impulsive Goodwin's Oscillator in 1-cycleComments: submitted to IEEE CDC 2024Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
The Impulsive Goodwin's Oscillator (IGO) is a mathematical model of a hybrid closed-loop system. It arises by closing a special kind of continuous linear positive time-invariant system with impulsive feedback, which employs both amplitude and frequency pulse modulation. The structure of the IGO precludes the existence of equilibria, and all its solutions are oscillatory. With its origin in mathematical biology, the IGO also presents a control paradigm useful in a wide range of applications, in particular the dosing of chemicals and medicines. Since the pulse-modulated feedback mechanism introduces significant nonlinearity and non-smoothness in the closed-loop dynamics, conventional controller design methods do not apply. However, the hybrid dynamics of the IGO reduce to a nonlinear, time-invariant discrete-time system, exhibiting a one-to-one correspondence between periodic solutions of the original IGO and those of the discrete-time system. The paper proposes a design approach that leverages the linearization of the equivalent discrete-time dynamics in the vicinity of a fixed point. A simple and efficient local stability condition of the 1-cycle in terms of the characteristics of the amplitude and frequency modulation functions is obtained.
- [451] arXiv:2403.18560 (cross-list from eess.AS) [pdf, other]
-
Title: Noise-Robust Keyword Spotting through Self-supervised PretrainingSubjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
Voice assistants are now widely available, and a keyword spotting (KWS) algorithm is used to activate them. Modern KWS systems are mainly trained using supervised learning methods and require a large amount of labelled data to achieve good performance. Leveraging unlabelled data through self-supervised learning (SSL) has been shown to increase accuracy in clean conditions. This paper explores how SSL pretraining such as Data2Vec can be used to enhance the robustness of KWS models in noisy conditions, a setting that remains under-explored.
Models of three different sizes are pretrained using different pretraining approaches and then fine-tuned for KWS. These models are then tested and compared to models trained using two baseline supervised learning methods, one being standard training using clean data and the other one being multi-style training (MTR). The results show that pretraining and fine-tuning on clean data is superior to supervised learning on clean data across all testing conditions, and superior to supervised MTR for testing conditions of SNR above 5 dB. This indicates that pretraining alone can increase the model's robustness. Finally, it is found that using noisy data for pretraining models, especially with the Data2Vec-denoising approach, significantly enhances the robustness of KWS models in noisy conditions.
- [452] arXiv:2403.18578 (cross-list from stat.ML) [pdf, other]
-
Title: SteinGen: Generating Fidelitous and Diverse Graph SamplesSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Generating graphs that preserve characteristic structures while promoting sample diversity can be challenging, especially when the number of graph observations is small. Here, we tackle the problem of graph generation from only one observed graph. The classical approach of graph generation from parametric models relies on the estimation of parameters, which can be inconsistent or expensive to compute due to intractable normalisation constants. Generative modelling based on machine learning techniques to generate high-quality graph samples avoids parameter estimation but usually requires abundant training samples. Our proposed generating procedure, SteinGen, which is phrased in the setting of graphs as realisations of exponential random graph models, combines ideas from Stein's method and MCMC by employing Markovian dynamics which are based on a Stein operator for the target model. SteinGen uses the Glauber dynamics associated with an estimated Stein operator to generate a sample, and re-estimates the Stein operator from the sample after every sampling step. We show that on a class of exponential random graph models this novel "estimation and re-estimation" generation strategy yields high distributional similarity (high fidelity) to the original data, combined with high sample diversity.
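The sampling step SteinGen builds on, Glauber dynamics for an exponential random graph model (ERGM), can be sketched for a model with edge and triangle counts. This is a minimal sketch under assumptions: the parameters `theta_edge` and `theta_tri` are treated as known, whereas SteinGen estimates a Stein operator from the observed graph and re-estimates it after every sampling step, which is not shown.

```python
import numpy as np


def glauber_ergm(A, theta_edge, theta_tri, n_steps, seed=0):
    """Glauber dynamics for an ERGM with edge and triangle statistics,
    P(G) proportional to exp(theta_edge * #edges + theta_tri * #triangles).

    A: symmetric 0/1 integer adjacency matrix with zero diagonal (not modified).
    """
    rng = np.random.default_rng(seed)
    A = A.copy()
    n = A.shape[0]
    for _ in range(n_steps):
        i, j = rng.choice(n, size=2, replace=False)  # pick a vertex pair uniformly
        common = int(A[i] @ A[j])                    # triangles that edge (i, j) would close
        logit = theta_edge + theta_tri * common      # change statistic for toggling (i, j)
        p_edge = 1.0 / (1.0 + np.exp(-logit))        # P(edge present | rest of graph)
        A[i, j] = A[j, i] = int(rng.random() < p_edge)
    return A


A0 = np.zeros((10, 10), dtype=int)
A = glauber_ergm(A0, theta_edge=-1.0, theta_tri=0.2, n_steps=2000)
print(A.sum() // 2, "edges")
```

Each update only needs the conditional probability of one edge given the rest of the graph, so the intractable normalizing constant of the ERGM never has to be computed.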
- [453] arXiv:2403.18582 (cross-list from hep-ph) [pdf, other]
-
Title: One flow to correct them all: improving simulations in high-energy physics with a single normalising flow and a switchAuthors: Caio Cesar Daumann, Mauro Donega, Johannes Erdmann, Massimiliano Galli, Jan Lukas Späh, Davide ValsecchiComments: 19 pages, 12 figuresSubjects: High Energy Physics - Phenomenology (hep-ph); Machine Learning (cs.LG); High Energy Physics - Experiment (hep-ex); Data Analysis, Statistics and Probability (physics.data-an)
Simulated events are key ingredients in almost all high-energy physics analyses. However, imperfections in the simulation can lead to sizeable differences between the observed data and simulated events. The effects of such mismodelling on relevant observables must be corrected either effectively via scale factors, with weights or by modifying the distributions of the observables and their correlations. We introduce a correction method that transforms one multidimensional distribution (simulation) into another one (data) using a simple architecture based on a single normalising flow with a boolean condition. We demonstrate the effectiveness of the method on a physics-inspired toy dataset with non-trivial mismodelling of several observables and their correlations.
- [454] arXiv:2403.18589 (cross-list from eess.IV) [pdf, ps, other]
-
Title: Users prefer Jpegli over same-sized libjpeg-turbo or MozJPEGSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
We performed pairwise comparisons by human raters of JPEG images from MozJPEG, libjpeg-turbo and our new Jpegli encoder. When compressing images at a quality similar to libjpeg-turbo quality 95, the Jpegli images were 54% likely to be preferred over both libjpeg-turbo and MozJPEG images, but used only 2.8 bits per pixel compared to libjpeg-turbo and MozJPEG that used 3.8 and 3.5 bits per pixel respectively. The raw ratings and source images are publicly available for further analysis and study.
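The bit-rate figures above translate into relative savings with simple arithmetic: Jpegli's 2.8 bits per pixel is roughly 26% less than libjpeg-turbo's 3.8 and 20% less than MozJPEG's 3.5. The snippet below only reproduces that calculation from the three numbers in the abstract.

```python
# Bits per pixel (bpp) reported in the abstract at comparable quality.
bpp = {"Jpegli": 2.8, "libjpeg-turbo": 3.8, "MozJPEG": 3.5}

# Relative saving = 1 - (Jpegli size / other encoder's size).
savings = {name: 1.0 - bpp["Jpegli"] / bpp[name] for name in ("libjpeg-turbo", "MozJPEG")}

for name, saving in savings.items():
    print(f"Jpegli uses {saving:.1%} fewer bits per pixel than {name}")
```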
- [455] arXiv:2403.18597 (cross-list from cond-mat.mtrl-sci) [pdf, other]
-
Title: Heterogeneous Peridynamic Neural Operators: Discover Biotissue Constitutive Law and Microstructure From Digital Image Correlation MeasurementsAuthors: Siavash Jafarzadeh, Stewart Silling, Lu Zhang, Colton Ross, Chung-Hao Lee, S. M. Rakibur Rahman, Shuodao Wang, Yue YuSubjects: Materials Science (cond-mat.mtrl-sci); Machine Learning (cs.LG)
Human tissues are highly organized structures with specific collagen fiber arrangements that vary from point to point. The effects of such heterogeneity play an important role in tissue function, and hence it is critical to discover and understand the distribution of such fiber orientations from experimental measurements, such as digital image correlation data. To this end, we introduce the heterogeneous peridynamic neural operator (HeteroPNO) approach for data-driven constitutive modeling of heterogeneous anisotropic materials. The goal is to learn both a nonlocal constitutive law and the material microstructure, in the form of a heterogeneous fiber orientation field, from loading field-displacement field measurements. For this purpose, we propose a two-phase learning approach. First, we learn a homogeneous constitutive law in the form of a neural network-based kernel function and a nonlocal bond force, to capture complex homogeneous material responses from data. Then, in the second phase, we reinitialize the learnt bond force and kernel function and train them together with a fiber orientation field for each material point. Owing to the state-based peridynamic skeleton, our HeteroPNO-learned material models are objective and have the balance of linear and angular momentum guaranteed. Moreover, the effects of heterogeneity and of the nonlinear constitutive relationship are captured by the kernel function and the bond force, respectively, enabling physical interpretability. As a result, our HeteroPNO architecture can learn a constitutive model for a biological tissue with anisotropic heterogeneous response in the large-deformation regime. Furthermore, the framework is capable of providing displacement and stress field predictions for new and unseen loading instances.
- [456] arXiv:2403.18636 (cross-list from eess.AS) [pdf, other]
-
Title: A Diffusion-Based Generative Equalizer for Music Restoration
Comments: Submitted to DAFx24. Historical music restoration examples are available at: this http URL
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
This paper presents a novel approach to audio restoration, focusing on the enhancement of low-quality music recordings, and in particular historical ones. Building upon a previous algorithm called BABE, or Blind Audio Bandwidth Extension, we introduce BABE-2, which presents a series of significant improvements. This research broadens the concept of bandwidth extension to generative equalization, a novel task that, to the best of our knowledge, has not been explicitly addressed in previous studies. BABE-2 is built around an optimization algorithm utilizing priors from diffusion models, which are trained or fine-tuned using a curated set of high-quality music tracks. The algorithm simultaneously performs two critical tasks: estimation of the filter degradation magnitude response and hallucination of the restored audio. The proposed method is objectively evaluated on historical piano recordings, showing a marked enhancement over the prior version. The method yields similarly impressive results in rejuvenating the works of renowned vocalists Enrico Caruso and Nellie Melba. This research represents an advancement in the practical restoration of historical music.
- [457] arXiv:2403.18637 (cross-list from eess.IV) [pdf, other]
-
Title: Transformers-based architectures for stroke segmentation: A review
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Stroke remains a significant global health concern, necessitating precise and efficient diagnostic tools for timely intervention and improved patient outcomes. The emergence of deep learning methodologies has transformed the landscape of medical image analysis. Recently, Transformers, initially designed for natural language processing, have exhibited remarkable capabilities in various computer vision applications, including medical image analysis. This comprehensive review provides an in-depth exploration of the cutting-edge Transformer-based architectures applied to stroke segmentation. It begins with an overview of stroke pathology, imaging modalities, and the challenges associated with accurate diagnosis and segmentation. The review then covers the fundamental ideas of Transformers, offering detailed insights into their architecture and the underlying mechanisms that enable them to capture complex spatial information within medical images. The existing literature is systematically categorized and analyzed, discussing various approaches that leverage Transformers for stroke segmentation. A critical assessment is provided, highlighting the strengths and limitations of these methods, including considerations of performance and computational efficiency. Additionally, this review explores potential avenues for future research and development.
- [458] arXiv:2403.18664 (cross-list from stat.ML) [pdf, other]
-
Title: Neural Network-Based Piecewise Survival Models
Comments: 7 pages
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Systems and Control (eess.SY)
In this paper, a family of neural network-based survival models is presented. The models are specified by piecewise definitions of the hazard function and the density function on a partition of the time axis; both piecewise-constant and piecewise-linear definitions are presented, resulting in a family of four models. The models can be seen as an extension of the commonly used discrete-time and piecewise exponential models and thereby add flexibility to this set of standard models. Using a simulated dataset, the models are shown to perform well compared to the highly expressive, state-of-the-art energy-based model, while requiring only a fraction of the computation time.
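The piecewise-constant hazard case can be sketched in a few lines. On a partition 0 = τ_0 < τ_1 < ... of the time axis, the cumulative hazard H(t) is the sum of (rate × time spent) over the intervals up to t, the survival function is S(t) = exp(-H(t)), and the density is f(t) = h(t) S(t). This minimal sketch assumes the last interval is open-ended and takes the hazards as fixed numbers; in a neural model they would be network outputs for a given covariate vector. The function names are hypothetical, and only the piecewise-constant hazard variant (the classical piecewise exponential model) is shown, not the other three models.

```python
import math
from bisect import bisect_right
from typing import Sequence


def cumulative_hazard(t: float, breaks: Sequence[float], hazards: Sequence[float]) -> float:
    """H(t) for a piecewise-constant hazard.

    breaks[k] is the left edge of interval k (breaks[0] == 0); hazards[k] is the
    constant rate on [breaks[k], breaks[k + 1]), and the last interval is open-ended.
    """
    if len(breaks) != len(hazards):
        raise ValueError("need one hazard value per interval")
    total = 0.0
    for k, rate in enumerate(hazards):
        start = breaks[k]
        end = breaks[k + 1] if k + 1 < len(breaks) else math.inf
        if t <= start:
            break
        # Time spent in interval k up to t, multiplied by the constant rate.
        total += rate * (min(t, end) - start)
    return total


def survival(t: float, breaks: Sequence[float], hazards: Sequence[float]) -> float:
    """S(t) = exp(-H(t))."""
    return math.exp(-cumulative_hazard(t, breaks, hazards))


def density(t: float, breaks: Sequence[float], hazards: Sequence[float]) -> float:
    """f(t) = h(t) * S(t), where h(t) is the rate of the interval containing t."""
    k = bisect_right(breaks, t) - 1
    return hazards[k] * survival(t, breaks, hazards)
```

For example, with breaks [0, 1, 2] and hazards [0.5, 1.0, 2.0], the cumulative hazard at t = 1.5 is 0.5·1 + 1.0·0.5 = 1.0, so S(1.5) = e^{-1}.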
- [459] arXiv:2403.18686 (cross-list from math.PR) [pdf, other]
-
Title: Decision-Epoch Matters: Unveiling its Impact on the Stability of Scheduling with Randomly Varying Connectivity
Subjects: Probability (math.PR); Performance (cs.PF)
A classical queuing theory result states that in a parallel-queue single-server model, the maximum stability region does not depend on the scheduling decision epochs, and in particular is the same for preemptive and non-preemptive systems. We consider here the case in which each queue may or may not be connected to the server, depending on an exogenous process. In our main result, we show that the maximum stability region now strongly depends on how the decision epochs are defined. We compare the setting where decisions can be made at any moment in time (the unconstrained setting) to two other settings: decisions are taken either (i) at moments of a departure (non-preemptive scheduling), or (ii) when an exponential clock with rate $\gamma$ rings. We characterise the maximum stability region for the two constrained configurations, which reveals a reduction compared to the unconstrained configuration. In the non-preemptive setting, the maximum stability region is drastically reduced compared to the unconstrained setting, and we conclude that a non-preemptive scheduler cannot opportunistically take advantage (in terms of stability) of the randomly varying connectivity. In contrast, for the $\gamma$ decision epochs, we observe that the maximum stability region is monotone in the rate $\gamma$ of the decision moments, and that it can be made arbitrarily close to the maximum stability region of the unconstrained setting by choosing $\gamma$ large enough. We further show that the Serve Longest Connected (SLC) queue policy is maximum stable in both constrained settings, within the set of policies that select a queue among the connected ones. From a methodological viewpoint, we introduce a novel theoretical tool termed a "test for fluid limits" (TFL) that might be of independent interest. TFL is a simple test that, if satisfied by the fluid limit, allows us to conclude stability.
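A minimal sketch of the two ingredients named above, assuming a simple discrete representation of the system state: the SLC rule picks, among the queues currently connected to the server, the longest nonempty one, and in the rate-γ setting the decision epochs are the ring times of an exponential clock (independent exponential inter-ring times with mean 1/γ). Tie-breaking by lowest index and idling when no connected queue has work are assumptions of this sketch, not details taken from the paper; the function names are hypothetical.

```python
import random
from typing import List, Optional, Sequence


def serve_longest_connected(queue_lengths: Sequence[int],
                            connected: Sequence[bool]) -> Optional[int]:
    """Index of the longest nonempty connected queue, or None to idle.

    Ties are broken in favour of the lowest index (an assumption of this sketch).
    """
    best: Optional[int] = None
    for i, (length, is_up) in enumerate(zip(queue_lengths, connected)):
        if is_up and length > 0 and (best is None or length > queue_lengths[best]):
            best = i
    return best


def decision_epochs(gamma: float, horizon: float, rng: random.Random) -> List[float]:
    """Ring times in [0, horizon) of an exponential clock with rate gamma."""
    epochs: List[float] = []
    t = rng.expovariate(gamma)  # exponential inter-ring time with mean 1/gamma
    while t < horizon:
        epochs.append(t)
        t += rng.expovariate(gamma)
    return epochs
```

For instance, `serve_longest_connected([3, 5, 2], [True, False, True])` returns 0: queue 1 is longer but currently disconnected, so it cannot be served.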
- [460] arXiv:2403.18707 (cross-list from math.OC) [pdf, other]
-
Title: Connections between Reachability and Time Optimality
Comments: Submitted to Automatica
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
This paper presents the concept of an equivalence relation on the set of optimal control problems. By leveraging this concept, we show that the boundary of the reachability set can be constructed from the solutions of time optimal problems. A more general equivalence theorem is also presented. The findings facilitate the use of solution structures from a certain class of optimal control problems to address problems in the corresponding equivalence classes. As a byproduct, we state and prove methods for constructing the reachability sets of three-dimensional curves with a prescribed curvature bound. The findings are twofold. First, we prove that any boundary point of the reachability set, with the terminal direction taken into account, can be accessed via curves of type H, CSC, CCC, or their respective subsegments, where H denotes a helicoidal arc, C a circular arc with maximum curvature, and S a straight segment. Second, we show that any boundary point of the reachability set, without considering the terminal direction, can be accessed by curves of type CC, CS, or their respective subsegments. These findings extend developments in the literature on planar curves, or Dubins car dynamics, to spatial curves in $\mathbb{R}^3$. For higher dimensions, we confirm that the problem of identifying the reachability set of curvature-bounded paths subsumes the well-known Markov-Dubins problem. These advances in understanding the reachability of curvature-bounded paths in $\mathbb{R}^3$ have significant practical implications, particularly for mission planning problems and time optimal guidance.
- [461] arXiv:2403.18719 (cross-list from math.CO) [pdf, other]
-
Title: On the scaling of random Tamari intervals and Schnyder woods of random triangulations (with an asymptotic D-finite trick)
Authors: Guillaume Chapuy
Comments: 24 pages
Subjects: Combinatorics (math.CO); Symbolic Computation (cs.SC); Probability (math.PR)
We consider a Tamari interval of size $n$ (i.e., a pair of Dyck paths which are comparable for the Tamari relation) chosen uniformly at random. We show that the height of a uniformly chosen vertex on the upper or lower path scales as $n^{3/4}$, and has an explicit limit law. By the Bernardi-Bonichon bijection, this result also describes the height of points in the canonical Schnyder trees of a uniform random plane triangulation of size $n$.
The exact solution of the model is based on polynomial equations with one and two catalytic variables. To prove convergence from the exact solution, we use a version of moment pumping based on D-finiteness, which is essentially automatic and should apply to many other models. We are not aware of this simple trick having been used before.
It would be interesting to study the universality of this convergence for decomposition trees associated with positive Bousquet-Mélou–Jehanne equations.
- [462] arXiv:2403.18734 (cross-list from eess.IV) [pdf, other]
-
Title: A vascular synthetic model for improved aneurysm segmentation and detection via Deep Neural Networks
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
We hereby present a full synthetic model able to mimic the various constituents of the cerebral vascular tree: the cerebral arteries, the bifurcations and the intracranial aneurysms. By building this model, our goal was to provide a substantial dataset of brain arteries that could be used by a 3D Convolutional Neural Network (CNN) to segment or detect/recognize various vascular diseases (such as artery dissection/thrombosis), or even some portions of the cerebral vasculature, such as the bifurcations or aneurysms. In this study, we focus in particular on Intra-Cranial Aneurysm (ICA) detection and segmentation. Cerebral aneurysms most often occur on a particular structure of the vascular tree named the Circle of Willis. Various studies have been conducted to detect and monitor ICAs, and those based on Deep Learning (DL) achieve the best performance. Specifically, in this work, we propose a full synthetic 3D model able to mimic the brain vasculature as acquired by Magnetic Resonance Angiography (MRA), and more particularly by the Time Of Flight (TOF) principle. Among the various MRI modalities, MRA-TOF provides a relatively good rendering of the blood vessels and is non-invasive (no contrast agent injection). Our model has been designed to simultaneously mimic the arterial geometry, the ICA shape and the background noise. The geometry of the vascular tree is modeled by interpolation with 3D spline functions, and the statistical properties of the background MRI noise are collected from MRA acquisitions and reproduced within the model. In this work, we thoroughly describe the synthetic vasculature model, build a neural network designed for ICA segmentation and detection, and finally carry out an in-depth evaluation of the performance gain obtained from data augmentation with the synthetic model.
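The abstract states only that the vessel geometry is interpolated with 3D spline functions. As an illustration of that idea, the sketch below draws a smooth vessel centerline through a few hypothetical 3D control points using a uniform Catmull-Rom spline. The choice of Catmull-Rom is an assumption (the spline family used in the paper is not specified here), and the sketch covers only the centerline, not the aneurysm shape, vessel radius or the MRA noise model.

```python
import numpy as np


def catmull_rom_centerline(control_points, samples_per_segment=20):
    """Sample a uniform Catmull-Rom spline through 3D control points.

    The curve passes through every control point; the end points are padded
    by duplication so that the first and last segments are defined.
    """
    P = np.asarray(control_points, dtype=float)
    P = np.vstack([P[0], P, P[-1]])  # pad both ends
    t = np.linspace(0.0, 1.0, samples_per_segment, endpoint=False)[:, None]
    pieces = []
    for i in range(1, len(P) - 2):
        p0, p1, p2, p3 = P[i - 1], P[i], P[i + 1], P[i + 2]
        # Cubic segment running from p1 (t = 0) to p2 (t = 1).
        seg = 0.5 * (2.0 * p1
                     + (-p0 + p2) * t
                     + (2.0 * p0 - 5.0 * p1 + 4.0 * p2 - p3) * t ** 2
                     + (-p0 + 3.0 * p1 - 3.0 * p2 + p3) * t ** 3)
        pieces.append(seg)
    pieces.append(P[-2][None, :])  # end exactly at the last control point
    return np.vstack(pieces)
```

With four control points and 10 samples per segment, the result has 3 × 10 + 1 = 31 points, and every 10th point coincides with a control point. A vessel wall could then be swept around this centerline, but that step is outside the scope of this sketch.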
- [463] arXiv:2403.18809 (cross-list from math.DS) [pdf, other]
-
Title: $L^\infty$-error bounds for approximations of the Koopman operator by kernel extended dynamic mode decomposition
Comments: 21 pages, 2 figures, 2 tables
Subjects: Dynamical Systems (math.DS); Numerical Analysis (math.NA)
Extended dynamic mode decomposition (EDMD) is a well-established method to generate a data-driven approximation of the Koopman operator for analysis and prediction of nonlinear dynamical systems. Recently, kernel EDMD (kEDMD) has gained popularity due to its ability to resolve the challenging task of choosing a suitable dictionary by defining data-based observables. In this paper, we provide the first pointwise bounds on the approximation error of kEDMD. The main idea consists of two steps. First, we show that the reproducing kernel Hilbert spaces of Wendland functions are invariant under the Koopman operator. Second, exploiting that the learning problem given by regression in the native norm can be recast as an interpolation problem, we prove our novel error bounds by using interpolation estimates. Finally, we validate our findings with numerical experiments.
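The second step of the argument views the learning problem as kernel interpolation. A minimal sketch of that view, assuming data pairs y_i = F(x_i) and an observable ψ: the Koopman image (Kψ)(x) = ψ(F(x)) is approximated by the kernel interpolant of the values ψ(y_i) at the nodes x_i, using the compactly supported Wendland function φ_{3,1}(r) = (1 - r)_+^4 (4r + 1), which is positive definite on R^d for d ≤ 3. The function names, the fixed support scale and the test map are hypothetical choices for illustration; this is not the authors' implementation and it does not compute their error bounds.

```python
import numpy as np


def wendland_31(r):
    """Wendland function phi_{3,1}(r) = (1 - r)_+^4 (4r + 1), zero for r >= 1."""
    r = np.asarray(r, dtype=float)
    return np.where(r < 1.0, (1.0 - r) ** 4 * (4.0 * r + 1.0), 0.0)


def kernel_matrix(A, B, scale):
    """Matrix of k(a_i, b_j) = phi(|a_i - b_j| / scale) for row-wise points."""
    dists = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    return wendland_31(dists / scale)


def kedmd_predict(X, Y, psi, X_query, scale=1.0):
    """Approximate (K psi)(x) = psi(F(x)) at X_query.

    X, Y: arrays of shape (m, d) with Y[i] = F(X[i]).
    psi: observable mapping an (n, d) array to an (n,) array.
    Interpolates the values psi(Y) at the nodes X with the Wendland kernel.
    """
    G = kernel_matrix(X, X, scale)          # Gram matrix, positive definite
    coeffs = np.linalg.solve(G, psi(Y))     # interpolation coefficients
    return kernel_matrix(X_query, X, scale) @ coeffs
```

Because the interpolant reproduces the data exactly at the nodes, evaluating `kedmd_predict` at X returns ψ(y_i); accuracy between nodes depends on the node spacing and the kernel scale, which is what pointwise error bounds of this kind quantify.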
Replacements for Thu, 28 Mar 24
- [464] arXiv:1904.07184 (replaced) [pdf, ps, other]
-
Title: A monotone scheme for G-equations with application to the explicit convergence rate of robust central limit theorem
Comments: 33 pages
Subjects: Probability (math.PR); Numerical Analysis (math.NA)
- [465] arXiv:2102.12920 (replaced) [pdf, ps, other]
-
Title: Emerging Trends in Federated Learning: From Model Fusion to Federated X Learning
Authors: Shaoxiong Ji, Yue Tan, Teemu Saravirta, Zhiqin Yang, Yixin Liu, Lauri Vasankari, Shirui Pan, Guodong Long, Anwar Walid
Comments: To appear in the International Journal of Machine Learning and Cybernetics
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
- [466] arXiv:2109.00970 (replaced) [pdf, ps, other]
-
Title: A Construction of 2-D Z-Complementary Array Code Sets with Flexible Even Row Lengths and Applications in Massive MIMO
Subjects: Information Theory (cs.IT)
- [467] arXiv:2111.03354 (replaced) [pdf, other]
-
Title: Programming with union, intersection, and negation types
Authors: Giuseppe Castagna
Subjects: Programming Languages (cs.PL)
- [468] arXiv:2201.06180 (replaced) [pdf, other]
-
Title: Nonlinear Control Allocation: A Learning Based Approach
Comments: submitted to IEEE Conference on Decision and Control (CDC), 2024
Subjects: Systems and Control (eess.SY); Artificial Intelligence (cs.AI); Optimization and Control (math.OC)
- [469] arXiv:2204.11041 (replaced) [pdf, other]
-
Title: Learning by Erasing: Conditional Entropy based Transferable Out-Of-Distribution Detection
Comments: update new experimental results
Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [470] arXiv:2208.02767 (replaced) [pdf, other]
-
Title: Parabolic PDE-constrained optimal control under uncertainty with entropic risk measure using quasi-Monte Carlo integration
Subjects: Numerical Analysis (math.NA); Optimization and Control (math.OC)
- [471] arXiv:2209.02200 (replaced) [pdf, other]
-
Title: Task-wise Sampling Convolutions for Arbitrary-Oriented Object Detection in Aerial Images
Comments: 15 pages, 13 figures, 11 tables
Journal-ref: IEEE Transactions on Neural Networks and Learning Systems, 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [472] arXiv:2210.11634 (replaced) [pdf, other]
-
Title: A Polynomial-time Algorithm for the Large Scale of Airplane Refueling Problem
Comments: 18 pages, 2 figures
Subjects: Data Structures and Algorithms (cs.DS); Optimization and Control (math.OC)
- [473] arXiv:2211.17126 (replaced) [pdf, other]
-
Title: BEVUDA: Multi-geometric Space Alignments for Domain Adaptive BEV 3D Object Detection
Authors: Jiaming Liu, Rongyu Zhang, Xiaoqi Li, Xiaowei Chi, Zehui Chen, Ming Lu, Yandong Guo, Shanghang Zhang
Comments: Accepted by ICRA2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [474] arXiv:2212.08251 (replaced) [pdf, other]
-
Title: Task-Adaptive Saliency Guidance for Exemplar-free Class Incremental Learning
Comments: Accepted at CVPR 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [475] arXiv:2301.02505 (replaced) [pdf, other]
-
Title: Nested Dirichlet models for unsupervised attack pattern detection in honeypot data
Authors: Francesco Sanna Passino, Anastasia Mantziou, Daniyar Ghani, Philip Thiede, Ross Bevington, Nicholas A. Heard
Subjects: Cryptography and Security (cs.CR); Applications (stat.AP)
- [476] arXiv:2301.10856 (replaced) [pdf, other]
-
Title: Partial Mobilization: Tracking Multilingual Information Flows Amongst Russian Media Outlets and Telegram
Comments: Accepted to ICWSM 2024
Subjects: Computers and Society (cs.CY); Computation and Language (cs.CL); Machine Learning (cs.LG); Social and Information Networks (cs.SI)
- [477] arXiv:2301.11104 (replaced) [pdf, other]
-
Title: Discovering and Mitigating Visual Biases through Keyword Explanation
Comments: CVPR 2024. First two authors contributed equally
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
- [478] arXiv:2302.01421 (replaced) [pdf, other]
-
Title: Follower Agnostic Methods for Stackelberg Games
Comments: 31 pages
Subjects: Optimization and Control (math.OC); Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT); Dynamical Systems (math.DS)
- [479] arXiv:2302.06912 (replaced) [pdf, other]
-
Title: Regret-Based Defense in Adversarial Reinforcement Learning
Comments: Accepted at AAMAS 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- [480] arXiv:2302.08463 (replaced) [pdf, other]
-
Title: Dynamic Grasping with a Learned Meta-Controller
Comments: 9 pages
Subjects: Robotics (cs.RO)
- [481] arXiv:2302.12468 (replaced) [pdf, other]
-
Title: Adapting Knowledge for Few-shot Table-to-Text Generation
Comments: arXiv admin note: substantial text overlap with arXiv:2302.04415
Subjects: Computation and Language (cs.CL)
- [482] arXiv:2302.13483 (replaced) [pdf, other]
-
Title: CrystalBox: Future-Based Explanations for Input-Driven Deep RL Systems
Subjects: Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI); Systems and Control (eess.SY)
- [483] arXiv:2303.08231 (replaced) [pdf, other]
-
Title: Rotation-Invariant Transformer for Point Cloud Matching
Comments: Accepted to CVPR 2023
Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [484] arXiv:2303.09618 (replaced) [pdf, other]
-
Title: HIVE: Harnessing Human Feedback for Instructional Visual Editing
Authors: Shu Zhang, Xinyi Yang, Yihao Feng, Can Qin, Chia-Chih Chen, Ning Yu, Zeyuan Chen, Huan Wang, Silvio Savarese, Stefano Ermon, Caiming Xiong, Ran Xu
Comments: In CVPR, 2024
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
- [485] arXiv:2303.09817 (replaced) [pdf, other]
-
Title: Interpretable machine learning for time-to-event prediction in medicine and healthcare
Authors: Hubert Baniecki, Bartlomiej Sobieski, Patryk Szatkowski, Przemyslaw Bombinski, Przemyslaw Biecek
Comments: An extended version of an AIME 2023 paper submitted to Artificial Intelligence in Medicine
Journal-ref: Artificial Intelligence in Medicine, vol. 1, pp. 65-74, 2023
Subjects: Computer Vision and Pattern Recognition (cs.CV); Applications (stat.AP)
- [486] arXiv:2303.09992 (replaced) [pdf, other]
-
Title: LION: Implicit Vision Prompt Tuning
Comments: Accepted by AAAI2024; 9 pages, 3 figures, 4 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [487] arXiv:2303.10365 (replaced) [pdf, other]
-
Title: CroSel: Cross Selection of Confident Pseudo Labels for Partial-Label Learning
Comments: Accepted by CVPR 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- [488] arXiv:2303.12091 (replaced) [pdf, other]
-
Title: Adaptive Negative Evidential Deep Learning for Open-set Semi-supervised Learning
Comments: Accepted by AAAI2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
- [489] arXiv:2303.13300 (replaced) [pdf, ps, other]
-
Title: The Innovation Paradox: Concept Space Expansion with Diminishing Originality and the Promise of Creative AI
Comments: Forthcoming in Design Science
Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI)
- [490] arXiv:2303.17251 (replaced) [pdf, other]
-
Title: Demystifying Misconceptions in Social Bots Research
Authors: Stefano Cresci, Kai-Cheng Yang, Angelo Spognardi, Roberto Di Pietro, Filippo Menczer, Marinella Petrocchi
Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)
- [491] arXiv:2304.01973 (replaced) [pdf, other]
-
Title: ERM++: An Improved Baseline for Domain Generalization
Comments: An improved baseline for Domain Generalization
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
- [492] arXiv:2304.03544 (replaced) [pdf, other]
-
Title: InfoCTM: A Mutual Information Maximization Perspective of Cross-Lingual Topic Modeling
Comments: Accepted to AAAI2023 conference. Code is available at this https URL
Subjects: Computation and Language (cs.CL)
- [493] arXiv:2304.06427 (replaced) [pdf, other]
-
Title: In-Distribution and Out-of-Distribution Self-supervised ECG Representation Learning for Arrhythmia Detection
Comments: This paper has been published in the IEEE Journal of Biomedical and Health Informatics (JBHI). Copyright IEEE. Please cite as: S. Soltanieh, J. Hashemi and A. Etemad, "In-Distribution and Out-of-Distribution Self-Supervised ECG Representation Learning for Arrhythmia Detection," in IEEE Journal of Biomedical and Health Informatics, vol. 28, no. 2, pp. 789-800, Feb. 2024
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Signal Processing (eess.SP)
- [494] arXiv:2304.14394 (replaced) [pdf, other]
-
Title: Unified Sequence-to-Sequence Learning for Single- and Multi-Modal Visual Object Tracking
Comments: This is a new expanded version of our previous CVPR2023 paper "SeqTrack: Sequence to Sequence Learning for Visual Object Tracking." SeqTrackv2 extends SeqTrack to four multi-modal tracking tasks with a unified model and parameter set
Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [495] arXiv:2305.01716 (replaced) [pdf, other]
-
Title: The Pseudoinverse of $A=CR$ is $A^+=R^+C^+$ (?)
Comments: 10 pages, 5 figures, matlab code, new paragraphs introduce general formulas for the pseudoinverse of CR, new Figures and the randomized pseudoinverse algorithm
Subjects: Numerical Analysis (math.NA)
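The identity in the title is known to hold for a rank factorization, that is, when C has full column rank and R has full row rank, and it can fail otherwise. A quick numerical check with NumPy's `numpy.linalg.pinv` illustrates both cases; the helper name and the specific matrices are illustrative choices, not taken from the paper (whose listed code is MATLAB).

```python
import numpy as np


def pinv_product_rule_holds(C: np.ndarray, R: np.ndarray) -> bool:
    """Check numerically whether pinv(C @ R) equals pinv(R) @ pinv(C)."""
    A = C @ R
    return bool(np.allclose(np.linalg.pinv(A), np.linalg.pinv(R) @ np.linalg.pinv(C)))


rng = np.random.default_rng(0)
# Random Gaussian C (6x3) and R (3x5) have full column / row rank with probability 1.
C_full = rng.standard_normal((6, 3))
R_full = rng.standard_normal((3, 5))
print(pinv_product_rule_holds(C_full, R_full))  # rank factorization: identity holds

# C has full row rank but not full column rank, so C @ R is not a rank factorization.
C_bad = np.array([[1.0, 1.0]])
R_bad = np.array([[1.0], [0.0]])
print(pinv_product_rule_holds(C_bad, R_bad))  # pinv(A) = 1, but pinv(R) @ pinv(C) = 0.5
```

In the second case A = [1], so A^+ = 1, while R^+ = [1 0] and C^+ = [0.5 0.5]^T give R^+C^+ = 0.5.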
- [496] arXiv:2305.02151 (replaced) [pdf, other]
-
Title: Identifying the Correlation Between Language Distance and Cross-Lingual Transfer in a Multilingual Representation Space
Comments: SIGTYP Workshop 2023 (co-located with EACL 2023)
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [497] arXiv:2305.03123 (replaced) [pdf, ps, other]
-
Title: ChatGPT Needs SPADE (Sustainability, PrivAcy, Digital divide, and Ethics) Evaluation: A Review
Comments: 29 pages, 8 figures, 4 tables
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
- [498] arXiv:2305.09497 (replaced) [pdf, other]
-
Title: Curious Rhythms: Temporal Regularities of Wikipedia Consumption
Comments: ICWSM 2024
Subjects: Computers and Society (cs.CY); Digital Libraries (cs.DL)
- [499] arXiv:2305.12523 (replaced) [pdf, ps, other]
-
Title: Multi-Static Target Detection and Power Allocation for Integrated Sensing and Communication in Cell-Free Massive MIMO
Comments: 16 pages, 7 figures
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
- [500] arXiv:2305.13525 (replaced) [pdf, other]
-
Title: A 4D Hybrid Algorithm to Scale Parallel Training to Thousands of GPUs
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)
- [501] arXiv:2305.14258 (replaced) [pdf, other]
-
Title: Weakly Supervised AUC Optimization: A Unified Partial AUC Approach
Comments: Accepted by IEEE TPAMI
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- [502] arXiv:2305.14718 (replaced) [pdf, other]
-
Title: Leftover-Lunch: Advantage-based Offline Reinforcement Learning for Language Models
Comments: published at ICLR 2024
Subjects: Computation and Language (cs.CL)
- [503] arXiv:2305.14965 (replaced) [pdf, other]
-
Title: Tricking LLMs into Disobedience: Formalizing, Analyzing, and Detecting Jailbreaks
Comments: Accepted at LREC-COLING 2024 - The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation
Subjects: Computation and Language (cs.CL)
- [504] arXiv:2305.17079 (replaced) [pdf, other]
-
Title: Complete Multiparty Session Type Projection with Automata
Comments: 24 pages, 44 pages including appendix; CAV 2023
Subjects: Formal Languages and Automata Theory (cs.FL); Distributed, Parallel, and Cluster Computing (cs.DC); Programming Languages (cs.PL)
- [505] arXiv:2305.17294 (replaced) [pdf, other]
-
Title: A boundary integral equation method for the complete electrode model in electrical impedance tomography with tests on experimental data
Comments: 27 pages. The published version linked to below is substantially expanded
Journal-ref: SIAM Journal on Imaging Sciences, volume 17, number 1, 672-705, 2024
Subjects: Numerical Analysis (math.NA); Analysis of PDEs (math.AP)
- [506] arXiv:2306.02928 (replaced) [pdf, other]
-
Title: Weakly-Supervised Conditional Embedding for Referred Visual Search
Comments: 28 pages, 13 figures, 5 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [507] arXiv:2306.03997 (replaced) [pdf, other]
-
Title: Sentiment Analysis in Finance: From Transformers Back to eXplainable Lexicons (XLex)
Comments: Published by IEEE Access DOI: 10.1109/ACCESS.2024.3349970 Link: this https URL
Subjects: Computation and Language (cs.CL)
- [508] arXiv:2306.04344 (replaced) [pdf, other]
-
Title: ViDA: Homeostatic Visual Domain Adapter for Continual Test Time Adaptation
Authors: Jiaming Liu, Senqiao Yang, Peidong Jia, Renrui Zhang, Ming Lu, Yandong Guo, Wei Xue, Shanghang Zhang
Comments: Accepted by ICLR2024
Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [509] arXiv:2306.04357 (replaced) [pdf, other]
-
Title: Dial-MAE: ConTextual Masked Auto-Encoder for Retrieval-based Dialogue Systems
Comments: This paper has been accepted by NAACL 2024
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
- [510] arXiv:2306.05882 (replaced) [pdf, ps, other]
-
Title: Good, but not always Fair: An Evaluation of Gender Bias for three commercial Machine Translation Systems
Journal-ref: Hermes Journal of Language and Communication in Business no 63 2023
Subjects: Computation and Language (cs.CL)
- [511] arXiv:2306.08304 (replaced) [pdf, other]
-
Title: Chart2Vec: A Universal Embedding of Context-Aware Visualizations
Subjects: Human-Computer Interaction (cs.HC)
- [512] arXiv:2306.09459 (replaced) [pdf, other]
-
Title: Recurrent Action Transformer with Memory
Comments: 15 pages, 11 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- [513] arXiv:2306.11376 (replaced) [pdf, other]
-
Title: Coevolution of cognition and cooperation in structured populations under reinforcement learning
Comments: 10 pages, 2 figures
Subjects: Physics and Society (physics.soc-ph); Computer Science and Game Theory (cs.GT); Multiagent Systems (cs.MA); General Economics (econ.GN)
- [514] arXiv:2306.12609 (replaced) [pdf, other]
-
Title: Towards Regulatable AI Systems: Technical Gaps and Policy Opportunities
Authors: Xudong Shen, Hannah Brown, Jiashu Tao, Martin Strobel, Yao Tong, Akshay Narayan, Harold Soh, Finale Doshi-Velez
Comments: scheduled for publication in the Communications of the ACM, titled "Directions of Technical Innovation for Regulatable AI Systems"
Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
- [515] arXiv:2306.15328 (replaced) [pdf, ps, other]
-
Title: Simulating counterfactuals
Subjects: Machine Learning (stat.ML); Computers and Society (cs.CY); Machine Learning (cs.LG); Computation (stat.CO)
- [516] arXiv:2306.16772 (replaced) [pdf, other]
-
Title: Learning from Synthetic Human Group Activities
Authors: Che-Jui Chang, Danrui Li, Deep Patel, Parth Goel, Honglu Zhou, Seonghyeon Moon, Samuel S. Sohn, Sejong Yoon, Vladimir Pavlovic, Mubbasir Kapadia
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [517] arXiv:2307.02203 (replaced) [pdf, other]
-
Title: Neural Fields for Interactive Visualization of Statistical Dependencies in 3D Simulation Ensembles
Authors: Fatemeh Farokhmanesh, Kevin Höhlein, Christoph Neuhauser, Tobias Necker, Martin Weissmann, Takemasa Miyoshi, Rüdiger Westermann
Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [518] arXiv:2307.07572 (replaced) [pdf, other]
-
Title: High-Rate Phase Association with Travel Time Neural Fields
Subjects: Geophysics (physics.geo-ph); Machine Learning (cs.LG); Signal Processing (eess.SP)
- [519] arXiv:2307.09136 (replaced) [pdf, other]
-
Title: The Effects of Mixed Sample Data Augmentation are Class Dependent
Comments: 21 pages, 18 figures, Overall Revision
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [520] arXiv:2307.13352 (replaced) [pdf, other]
-
Title: High Dimensional Distributed Gradient Descent with Arbitrary Number of Byzantine Attackers
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)
- [521] arXiv:2307.16071 (replaced) [pdf, other]
-
Title: ÌròyìnSpeech: A multi-purpose Yorùbá Speech Corpus
Comments: Accepted to LREC-COLING 2024
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [522] arXiv:2307.16075 (replaced) [pdf, ps, other]
-
Title: Redesigning Large-Scale Multimodal Transit Networks with Shared Autonomous Mobility Services
Comments: 48 pages, 18 figures, accepted for publication in Transportation Research Part C: Emerging Technologies, and presentation in the 25th International Symposium on Transportation and Traffic Theory (ISTTT25)
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
- [523] arXiv:2308.00911 (replaced) [pdf, other]
-
Title: Optimal Sensor Deception to Deviate from an Allowed Itinerary
Subjects: Robotics (cs.RO)
- [524] arXiv:2308.02396 (replaced) [pdf, other]
-
Title: HOOD: Real-Time Human Presence and Out-of-Distribution Detection Using FMCW Radar
Comments: 10 pages, 2 figures, project page: this https URL
Subjects: Signal Processing (eess.SP); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [525] arXiv:2308.06098 (replaced) [pdf, ps, other]
-
Title: Automated Construction of Time-Space Diagrams for Traffic Analysis Using Street-View Video Sequence
Comments: The paper is published in 2023 IEEE 26th International Conference on Intelligent Transportation Systems (ITSC)
Journal-ref: 2023 IEEE 26th International Conference on Intelligent Transportation Systems (ITSC), Bilbao, Spain, 2023, pp. 2282-2288
Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [526] arXiv:2308.06822 (replaced) [pdf, other]
-
Title: Approximate and Weighted Data Reconstruction Attack in Federated Learning
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Optimization and Control (math.OC)
- [527] arXiv:2308.10483 (replaced) [pdf, ps, other]
-
Title: Aggregate Model of District Heating Network for Integrated Energy Dispatch: A Physically Informed Data-Driven Approach
Subjects: Systems and Control (eess.SY)
- [528] arXiv:2308.11138 (replaced) [pdf, ps, other]
-
Title: NLP-based detection of systematic anomalies among the narratives of consumer complaints
Subjects: Methodology (stat.ME); Computation and Language (cs.CL); Risk Management (q-fin.RM); Machine Learning (stat.ML)
- [529] arXiv:2308.12531 (replaced) [pdf, other]
-
Title: CARE: Co-Attention Network for Joint Entity and Relation Extraction
Comments: Accepted by LREC-COLING 2024
Subjects: Computation and Language (cs.CL)
- [530] arXiv:2308.12882 (replaced) [pdf, other]
-
Title: LCANets++: Robust Audio Classification using Multi-layer Neural Networks with Lateral CompetitionComments: Accepted at 2024 IEEE International Conference on Acoustics, Speech and Signal Processing Workshops (ICASSPW)Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- [531] arXiv:2308.13356 (replaced) [pdf, ps, other]
-
Title: CEIMVEN: An Approach of Cutting Edge Implementation of Modified Versions of EfficientNet (V1-V2) Architecture for Breast Cancer Detection and Classification from Ultrasound ImagesSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
- [532] arXiv:2309.04381 (replaced) [pdf, other]
-
Title: Generalization Bounds: Perspectives from Information Theory and PAC-BayesComments: 228 pagesSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Information Theory (cs.IT); Statistics Theory (math.ST); Machine Learning (stat.ML)
- [533] arXiv:2309.06075 (replaced) [pdf, other]
-
Title: A2V: A Semi-Supervised Domain Adaptation Framework for Brain Vessel Segmentation via Two-Phase Training Angiography-to-Venography TranslationAuthors: Francesco Galati, Daniele Falcetta, Rosa Cortese, Barbara Casolla, Ferran Prados, Ninon Burgos, Maria A. ZuluagaComments: Accepted at the 34th British Machine Vision Conference (BMVC)Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [534] arXiv:2309.06494 (replaced) [pdf, other]
-
Title: Non-smooth Control Barrier Functions for Stochastic Dynamical SystemsSubjects: Robotics (cs.RO)
- [535] arXiv:2309.07798 (replaced) [pdf, other]
-
Title: Enhancing Performance, Calibration Time and Efficiency in Brain-Machine Interfaces through Transfer Learning and Wearable EEG TechnologySubjects: Signal Processing (eess.SP); Systems and Control (eess.SY)
- [536] arXiv:2309.10718 (replaced) [pdf, other]
-
Title: DRIVE: Data-driven Robot Input Vector ExplorationAuthors: Dominic Baril, Simon-Pierre Deschênes, Luc Coupal, Cyril Goffin, Julien Lépine, Philippe Giguère, François PomerleauComments: 8 pages, 7 figures, 1 table, accepted for publication at the 2024 IEEE International Conference on Robotics and Automation (ICRA2024), Yokohama, JapanSubjects: Robotics (cs.RO)
- [537] arXiv:2309.11190 (replaced) [pdf, other]
-
Title: Space and Move-optimal Arbitrary Pattern Formation on Infinite Rectangular Grid by Oblivious Robot SwarmSubjects: Distributed, Parallel, and Cluster Computing (cs.DC)
- [538] arXiv:2309.11427 (replaced) [pdf, other]
-
Title: Generative Pre-Training of Time-Series Data for Unsupervised Fault Detection in Semiconductor ManufacturingSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- [539] arXiv:2309.11798 (replaced) [pdf, other]
-
Title: A Comprehensive Review of Community Detection in GraphsAuthors: Jiakang Li, Songning Lai, Zhihao Shuai, Yuan Tan, Yifan Jia, Mianyang Yu, Zichen Song, Xiaokang Peng, Ziyang Xu, Yongxin Ni, Haifeng Qiu, Jiayu Yang, Yutong Liu, Yonggang LuSubjects: Social and Information Networks (cs.SI); Machine Learning (cs.LG)
- [540] arXiv:2309.12857 (replaced) [pdf, other]
-
Title: Risk-aware Control for Robots with Non-Gaussian Belief SpacesSubjects: Robotics (cs.RO)
- [541] arXiv:2309.13320 (replaced) [pdf, other]
-
Title: GlotScript: A Resource and Tool for Low Resource Writing System IdentificationComments: LREC-COLING 2024Subjects: Computation and Language (cs.CL)
- [542] arXiv:2309.13322 (replaced) [pdf, other]
-
Title: From Text to Source: Results in Detecting Large Language Model-Generated ContentComments: Accepted to COLING-LREC 2024Subjects: Computation and Language (cs.CL)
- [543] arXiv:2309.16046 (replaced) [pdf, other]
-
Title: Confidence and second-order errors in cortical circuitsSubjects: Neurons and Cognition (q-bio.NC); Neural and Evolutionary Computing (cs.NE)
- [544] arXiv:2309.16421 (replaced) [pdf, other]
-
Title: Distilling ODE Solvers of Diffusion Models into Smaller StepsSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [545] arXiv:2310.00117 (replaced) [pdf, other]
-
Title: ABScribe: Rapid Exploration & Organization of Multiple Writing Variations in Human-AI Co-Writing Tasks using Large Language ModelsAuthors: Mohi Reza, Nathan Laundry, Ilya Musabirov, Peter Dushniku, Zhi Yuan "Michael" Yu, Kashish Mittal, Tovi Grossman, Michael Liut, Anastasia Kuzminykh, Joseph Jay WilliamsComments: CHI 2024Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [546] arXiv:2310.02879 (replaced) [pdf, other]
-
Title: Online Mechanism Design with PredictionsComments: 25 pages, 1 figureSubjects: Computer Science and Game Theory (cs.GT)
- [547] arXiv:2310.03325 (replaced) [pdf, other]
-
Title: Learning Concept-Based Causal Transition and Symbolic Reasoning for Visual PlanningSubjects: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [548] arXiv:2310.04181 (replaced) [pdf, other]
-
Title: DiffPrompter: Differentiable Implicit Visual Prompts for Semantic-Segmentation in Adverse ConditionsAuthors: Sanket Kalwar, Mihir Ungarala, Shruti Jain, Aaron Monis, Krishna Reddy Konda, Sourav Garg, K Madhava KrishnaSubjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- [549] arXiv:2310.05649 (replaced) [pdf, other]
-
Title: Context, Composition, Automation, and Communication -- The C2AC Roadmap for Modeling and SimulationAuthors: Adelinde Uhrmacher, Peter Frazier, Reiner Hähnle, Franziska Klügl, Fabian Lorig, Bertram Ludäscher, Laura Nenzi, Cristina Ruiz-Martin, Bernhard Rumpe, Claudia Szabo, Gabriel A. Wainer, Pia WilsdorfSubjects: Computational Engineering, Finance, and Science (cs.CE)
- [550] arXiv:2310.05723 (replaced) [pdf, other]
-
Title: Planning to Go Out-of-Distribution in Offline-to-Online Reinforcement LearningComments: 10 pages, 17 figures, preprintSubjects: Machine Learning (cs.LG)
- [551] arXiv:2310.08106 (replaced) [pdf, other]
-
Title: Generalized Logit Adjustment: Calibrating Fine-tuned Models by Removing Label Bias in Foundation ModelsComments: V2 proposed a more effective method for label distribution estimation. V1 fixed a typo in abstract; Accepted by NeurIPS2023Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [552] arXiv:2310.10900 (replaced) [pdf, other]
-
Title: Stability of Sequential Lateration and of Stress Minimization in the Presence of NoiseComments: arXiv admin note: substantial text overlap with arXiv:2207.07218Subjects: Statistics Theory (math.ST); Networking and Internet Architecture (cs.NI); Probability (math.PR)
- [553] arXiv:2310.12370 (replaced) [pdf, other]
-
Title: No-Regret Learning in Bilateral Trade via Global Budget BalanceComments: Accepted at STOC 2024Subjects: Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG)
- [554] arXiv:2310.15081 (replaced) [pdf, other]
-
Title: E4S: Fine-grained Face Swapping via Editing With Regional GAN InversionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [555] arXiv:2310.17072 (replaced) [pdf, other]
-
Title: MMP++: Motion Manifold Primitives with Parametric Curve ModelsAuthors: Yonghyeon LeeComments: 12 pages. This work has been submitted to the IEEE for possible publicationSubjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
- [556] arXiv:2310.18011 (replaced) [pdf, ps, other]
-
Title: Data journeys in popular science: Producing climate change and COVID-19 data visualizations at Scientific AmericanComments: 44 pages, 4 figures, 3 boxesSubjects: Digital Libraries (cs.DL); Human-Computer Interaction (cs.HC); Popular Physics (physics.pop-ph)
- [557] arXiv:2310.19055 (replaced) [pdf, other]
-
Title: A Few-Shot Learning Focused Survey on Recent Named Entity Recognition and Relation Classification MethodsSubjects: Computation and Language (cs.CL)
- [558] arXiv:2311.01191 (replaced) [pdf, other]
-
Title: VIGraph: Generative Self-supervised Learning for Class-Imbalanced Node ClassificationSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- [559] arXiv:2311.01483 (replaced) [pdf, other]
-
Title: FedSN: A Novel Federated Learning Framework over LEO Satellite NetworksComments: 14 pages, 17 figuresSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
- [560] arXiv:2311.02749 (replaced) [pdf, other]
-
Title: Fast Point Cloud to Mesh Reconstruction for Deformable Object TrackingComments: 8 pages with appendix, 16 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- [561] arXiv:2311.03189 (replaced) [pdf, other]
-
Title: Safe Control for Soft-Rigid Robots with Self-Contact using Control Barrier FunctionsComments: 6 pages, 6 figures, submitted to IEEE Robosoft 2024 ConferenceSubjects: Robotics (cs.RO)
- [562] arXiv:2311.03683 (replaced) [pdf, other]
-
Title: Preventing Arbitrarily High Confidence on Far-Away Data in Point-Estimated Discriminative Neural NetworksComments: Accepted at AISTATS 2024Subjects: Machine Learning (cs.LG)
- [563] arXiv:2311.04698 (replaced) [pdf, other]
-
Title: Challenging Common Paradigms in Multi-Task LearningComments: -Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
- [564] arXiv:2311.05362 (replaced) [pdf, other]
-
Title: Modeling and Control of Intrinsically Elasticity Coupled Soft-Rigid RobotsComments: 7 pages, 8 figuresSubjects: Robotics (cs.RO)
- [565] arXiv:2311.06373 (replaced) [pdf, other]
-
Title: Partial Information Decomposition for Continuous Variables based on Shared Exclusions: Analytical Formulation and EstimationAuthors: David A. Ehrlich, Kyle Schick-Poland, Abdullah Makkeh, Felix Lanfermann, Patricia Wollstadt, Michael WibralComments: 32 pages, 15 figuresSubjects: Information Theory (cs.IT); Probability (math.PR); Statistics Theory (math.ST); Computation (stat.CO)
- [566] arXiv:2311.07335 (replaced) [pdf, other]
-
Title: Throughput Maximization in Multi-Band Optical Networks with Column GenerationComments: 6 pages, 4 figures, accepted by IEEE International Conference on Communications 2024 (ICC2024)Subjects: Networking and Internet Architecture (cs.NI)
- [567] arXiv:2311.07838 (replaced) [pdf, other]
-
Title: LLatrieval: LLM-Verified Retrieval for Verifiable GenerationComments: Accepted by NAACL 2024 (Main Conference)Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
- [568] arXiv:2311.08100 (replaced) [pdf, other]
-
Title: PPAD: Iterative Interactions of Prediction and Planning for End-to-end Autonomous DrivingSubjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- [569] arXiv:2311.08268 (replaced) [pdf, other]
-
Title: A Wolf in Sheep's Clothing: Generalized Nested Jailbreak Prompts can Fool Large Language Models EasilyComments: Accepted by NAACL 2024, 18 pages, 7 figures, 13 tablesSubjects: Computation and Language (cs.CL)
- [570] arXiv:2311.08590 (replaced) [pdf, other]
-
Title: PEMA: An Offsite-Tunable Plug-in External Memory Adaptation for Language ModelsComments: Accepted to NAACL 2024Subjects: Computation and Language (cs.CL)
- [571] arXiv:2311.08787 (replaced) [pdf, other]
-
Title: Polygonal Cone Control Barrier Functions (PolyC2BF) for safe navigation in cluttered environmentsComments: 6 Pages, 6 Figures. Accepted at European Control Conference (ECC) 2024. arXiv admin note: text overlap with arXiv:2303.15871Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
- [572] arXiv:2311.10319 (replaced) [pdf, other]
-
Title: Shifting to Machine Supervision: Annotation-Efficient Semi and Self-Supervised Learning for Automatic Medical Image Segmentation and ClassificationAuthors: Pranav Singh, Raviteja Chukkapalli, Shravan Chaudhari, Luoyao Chen, Mei Chen, Jinqian Pan, Craig Smuda, Jacopo CirroneComments: Seventeen pages (incl. references), five figures, and one table. (Under Review)Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [573] arXiv:2311.10522 (replaced) [pdf, other]
-
Title: Enhancing Object Coherence in Layout-to-Image SynthesisSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [574] arXiv:2311.12028 (replaced) [pdf, other]
-
Title: Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose EstimationComments: Accepted by CVPR 2024, Open SourcedSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [575] arXiv:2311.12386 (replaced) [pdf, other]
-
Title: Point, Segment and Count: A Generalized Framework for Object CountingComments: Accepted by CVPR 2024. Camera readySubjects: Computer Vision and Pattern Recognition (cs.CV)
- [576] arXiv:2311.13888 (replaced) [pdf, other]
-
Title: On the robustness of high-order upwind summation-by-parts methods for nonlinear conservation lawsAuthors: Hendrik Ranocha, Andrew R. Winters, Michael Schlottke-Lakemper, Philipp Öffner, Jan Glaubitz, Gregor J. GassnerSubjects: Numerical Analysis (math.NA)
- [577] arXiv:2311.13967 (replaced) [pdf, other]
-
Title: Unconstrained learning of networked nonlinear systems via free parametrization of stable interconnected operatorsComments: Full version of the paper to appear at ECC 2024Subjects: Systems and Control (eess.SY)
- [578] arXiv:2311.15803 (replaced) [pdf, other]
-
Title: SOAC: Spatio-Temporal Overlap-Aware Multi-Sensor Calibration using Neural Radiance FieldsAuthors: Quentin Herau, Nathan Piasco, Moussab Bennehar, Luis Roldão, Dzmitry Tsishkou, Cyrille Migniot, Pascal Vasseur, Cédric DemonceauxComments: Accepted at CVPR 2024. Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- [579] arXiv:2311.15864 (replaced) [pdf, other]
-
Title: InterControl: Generate Human Motion Interactions by Controlling Every JointComments: Generate human interactions with only single-person data via joint contact pairs, code this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [580] arXiv:2311.17456 (replaced) [pdf, other]
-
Title: DifFlow3D: Toward Robust Uncertainty-Aware Scene Flow Estimation with Iterative Diffusion-Based RefinementAuthors: Jiuming Liu, Guangming Wang, Weicai Ye, Chaokang Jiang, Jinru Han, Zhe Liu, Guofeng Zhang, Dalong Du, Hesheng WangComments: Camera-ready version of CVPR 2024. Codes are released at this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [581] arXiv:2311.17532 (replaced) [pdf, other]
-
Title: Weakly-Supervised Emotion Transition Learning for Diverse 3D Co-speech Gesture GenerationAuthors: Xingqun Qi, Jiahao Pan, Peng Li, Ruibin Yuan, Xiaowei Chi, Mengfei Li, Wenhan Luo, Wei Xue, Shanghang Zhang, Qifeng Liu, Yike GuoComments: Accepted by CVPR 2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [582] arXiv:2311.18113 (replaced) [pdf, other]
-
Title: Back to 3D: Few-Shot 3D Keypoint Detection with Back-Projected 2D FeaturesComments: Accepted to CVPR 2024, Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
- [583] arXiv:2312.01220 (replaced) [pdf, other]
-
Title: Boosting Object Detection with Zero-Shot Day-Night Domain AdaptationComments: Accepted to CVPR 2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [584] arXiv:2312.01616 (replaced) [pdf, other]
-
Title: SchurVINS: Schur Complement-Based Lightweight Visual Inertial Navigation SystemSubjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
- [585] arXiv:2312.01629 (replaced) [pdf, other]
-
Title: CLAMP: Contrastive LAnguage Model Prompt-tuningSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [586] arXiv:2312.02126 (replaced) [pdf, other]
-
Title: SplaTAM: Splat, Track & Map 3D Gaussians for Dense RGB-D SLAMAuthors: Nikhil Keetha, Jay Karhade, Krishna Murthy Jatavallabhula, Gengshan Yang, Sebastian Scherer, Deva Ramanan, Jonathon LuitenComments: CVPR 2024. Website: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
- [587] arXiv:2312.03256 (replaced) [pdf, other]
-
Title: CAFE: Towards Compact, Adaptive, and Fast Embedding for Large-scale Recommendation ModelsSubjects: Machine Learning (cs.LG)
- [588] arXiv:2312.03620 (replaced) [pdf, other]
-
Title: Golden Gemini is All You Need: Finding the Sweet Spots for Speaker VerificationComments: Accepted to IEEE/ACM Transactions on Audio, Speech, and Language Processing. Copyright may be transferred without notice, after which this version may no longer be accessibleSubjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
- [589] arXiv:2312.05677 (replaced) [pdf, other]
-
Title: Batched Low-Rank Adaptation of Foundation ModelsComments: 16 pages, 3 figuresSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
- [590] arXiv:2312.06358 (replaced) [pdf, other]
-
Title: Intraoperative 2D/3D Image Registration via Differentiable X-ray RenderingComments: CVPR 2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [591] arXiv:2312.06733 (replaced) [pdf, other]
-
Title: TULIP: Transformer for Upsampling of LiDAR Point CloudAuthors: Bin Yang, Patrick Pfreundschuh, Roland Siegwart, Marco Hutter, Peyman Moghadam, Vaishakh PatilComments: The paper was accepted by CVPR 2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [592] arXiv:2312.07264 (replaced) [pdf, other]
-
Title: Dual Structure-Aware Image Filterings for Semi-supervised Medical Image SegmentationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [593] arXiv:2312.07472 (replaced) [pdf, other]
-
Title: MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active PerceptionAuthors: Yiran Qin, Enshen Zhou, Qichang Liu, Zhenfei Yin, Lu Sheng, Ruimao Zhang, Yu Qiao, Jing ShaoComments: Accepted to CVPR2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [594] arXiv:2312.07950 (replaced) [pdf, other]
-
Title: CBQ: Cross-Block Quantization for Large Language ModelsAuthors: Xin Ding, Xiaoyu Liu, Zhijun Tu, Yun Zhang, Wei Li, Jie Hu, Hanting Chen, Yehui Tang, Zhiwei Xiong, Baoqun Yin, Yunhe WangSubjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
- [595] arXiv:2312.08344 (replaced) [pdf, other]
-
Title: FoundationPose: Unified 6D Pose Estimation and Tracking of Novel ObjectsSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
- [596] arXiv:2312.08479 (replaced) [pdf, ps, other]
-
Title: Vision Transformer-Based Deep Learning for Histologic Classification of Endometrial CancerAuthors: Manu Goyal, Laura J. Tafe, James X. Feng, Kristen E. Muller, Liesbeth Hondelink, Jessica L. Bentz, Saeed HassanpourComments: 4 Tables and 3 FiguresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [597] arXiv:2312.08533 (replaced) [pdf, other]
-
Title: World Models via Policy-Guided Trajectory DiffusionComments: Published in TMLR, March 2024Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- [598] arXiv:2312.09138 (replaced) [pdf, other]
-
Title: Living Scenes: Multi-object Relocalization and Reconstruction in Changing 3D EnvironmentsComments: CVPR 2024 camera-readySubjects: Computer Vision and Pattern Recognition (cs.CV)
- [599] arXiv:2312.10114 (replaced) [pdf, other]
-
Title: FoMo-Bench: a multi-modal, multi-scale and multi-task Forest Monitoring Benchmark for remote sensing foundation modelsComments: 26 pagesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [600] arXiv:2312.10812 (replaced) [pdf, other]
-
Title: Learning to Act without ActionsComments: Accepted at ICLR 2024 (spotlight). The code can be found at this http URLSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- [601] arXiv:2312.10842 (replaced) [pdf, other]
-
Title: Compositional Inductive Invariant Based Verification of Neural Network Controlled SystemsSubjects: Logic in Computer Science (cs.LO); Machine Learning (cs.LG); Systems and Control (eess.SY)
- [602] arXiv:2312.10997 (replaced) [pdf, other]
-
Title: Retrieval-Augmented Generation for Large Language Models: A SurveyAuthors: Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, Meng Wang, Haofen WangComments: Ongoing WorkSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
- [603] arXiv:2312.12359 (replaced) [pdf, other]
-
Title: CLIP-DINOiser: Teaching CLIP a few DINO tricks for open-vocabulary semantic segmentationAuthors: Monika Wysoczańska, Oriane Siméoni, Michaël Ramamonjisoa, Andrei Bursuc, Tomasz Trzciński, Patrick PérezSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [604] arXiv:2312.12480 (replaced) [pdf, other]
-
Title: Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time AdaptationAuthors: Jiaming Liu, Ran Xu, Senqiao Yang, Renrui Zhang, Qizhe Zhang, Zehui Chen, Yandong Guo, Shanghang ZhangComments: Accepted by CVPR2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [605] arXiv:2312.12558 (replaced) [pdf, other]
-
Title: Sample Efficient Reinforcement Learning with Partial Dynamics KnowledgeComments: Published in the 38th Annual AAAI Conference on Artificial IntelligenceSubjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
- [606] arXiv:2312.13094 (replaced) [pdf, other]
-
Title: Automated MPI code generation for scalable finite-difference solversAuthors: George Bisbas, Rhodri Nelson, Mathias Louboutin, Paul H.J. Kelly, Fabio Luporini, Gerard GormanComments: 10 pages, 12 figures (18 pages with References and Appendix)Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Mathematical Software (cs.MS); Performance (cs.PF)
- [607] arXiv:2312.16943 (replaced) [pdf, other]
-
Title: SAR-Net: Multi-scale Direction-aware SAR Network via Global Information FusionSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [608] arXiv:2401.00374 (replaced) [pdf, other]
-
Title: EMAGE: Towards Unified Holistic Co-Speech Gesture Generation via Expressive Masked Audio Gesture ModelingAuthors: Haiyang Liu, Zihao Zhu, Giorgio Becherini, Yichen Peng, Mingyang Su, You Zhou, Xuefei Zhe, Naoya Iwamoto, Bo Zheng, Michael J. BlackComments: Conflict of Interest Disclosure; CVPR Camera Ready; Project Page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [609] arXiv:2401.01647 (replaced) [pdf, other]
-
Title: SIGNeRF: Scene Integrated Generation for Neural Radiance FieldsComments: Project Page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
- [610] arXiv:2401.02009 (replaced) [pdf, other]
-
Title: Self-Contrast: Better Reflection Through Inconsistent Solving PerspectivesAuthors: Wenqi Zhang, Yongliang Shen, Linjuan Wu, Qiuying Peng, Jun Wang, Yueting Zhuang, Weiming LuSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
- [611] arXiv:2401.02379 (replaced) [pdf, other]
-
Title: Detection and Discovery of Misinformation Sources using Attributed WebgraphsSubjects: Social and Information Networks (cs.SI); Computers and Society (cs.CY)
- [612] arXiv:2401.03244 (replaced) [pdf, other]
-
Title: Artificial Intelligence for Operations Research: Revolutionizing the Operations Research ProcessSubjects: Optimization and Control (math.OC); Artificial Intelligence (cs.AI)
- [613] arXiv:2401.06201 (replaced) [pdf, other]
-
Title: EASYTOOL: Enhancing LLM-based Agents with Concise Tool InstructionAuthors: Siyu Yuan, Kaitao Song, Jiangjie Chen, Xu Tan, Yongliang Shen, Ren Kan, Dongsheng Li, Deqing YangSubjects: Computation and Language (cs.CL)
- [614] arXiv:2401.06712 (replaced) [pdf, other]
-
Title: Few-Shot Detection of Machine-Generated Text using Style RepresentationsSubjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
- [615] arXiv:2401.07494 (replaced) [pdf, other]
-
Title: Input Convex Lipschitz RNN: A Fast and Robust Approach for Engineering TasksSubjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE); Systems and Control (eess.SY)
- [616] arXiv:2401.08742 (replaced) [pdf, other]
-
Title: Fast Dynamic 3D Object Generation from a Single-view VideoComments: Technical reportSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [617] arXiv:2401.11542 (replaced) [pdf, other]
-
Title: Nigel -- Mechatronic Design and Robust Sim2Real Control of an Over-Actuated Autonomous VehicleSubjects: Robotics (cs.RO); Systems and Control (eess.SY)
- [618] arXiv:2401.12492 (replaced) [pdf, other]
-
Title: Comparing Pre-trained Human Language Models: Is it Better with Human Context as Groups, Individual Traits, or Both?Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [619] arXiv:2401.13387 (replaced) [pdf, other]
-
Title: A Mathematical Theory of Semantic CommunicationComments: (version 2.0 updated) 96 pages, 18 figures. This paper is submitted to IEEE Transactions on Information Theory (TIT)Subjects: Information Theory (cs.IT)
- [620] arXiv:2401.15120 (replaced) [pdf, other]
-
Title: Incorporating simulated spatial context information improves the effectiveness of contrastive learning modelsSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [621] arXiv:2401.16025 (replaced) [pdf, other]
- [622] arXiv:2401.16063 (replaced) [pdf, ps, other]
-
Title: Shannon Capacity of Channels with Markov Insertions, Deletions and SubstitutionsComments: 15 pages, 1 figureSubjects: Information Theory (cs.IT)
- [623] arXiv:2401.17098 (replaced) [pdf, ps, other]
-
Title: Deep Learning-Driven Approach for Handwritten Chinese Character ClassificationComments: 30 pages, 9 figures, 2 tables, preprint v2Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [624] arXiv:2401.17879 (replaced) [pdf, other]
-
Title: AEROBLADE: Training-Free Detection of Latent Diffusion Images Using Autoencoder Reconstruction ErrorComments: Accepted to CVPR 2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [625] arXiv:2402.01216 (replaced) [pdf, other]
-
Title: Robust Commutation Design: Applied to Switched Reluctance MotorsComments: 6 pages, 7 figures. Final versionSubjects: Systems and Control (eess.SY)
- [626] arXiv:2402.01739 (replaced) [pdf, other]
-
Title: OpenMoE: An Early Effort on Open Mixture-of-Experts Language ModelsSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
- [627] arXiv:2402.02561 (replaced) [pdf, other]
-
Title: Foundation Model Makes Clustering A Better Initialization For Cold-Start Active LearningSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
- [628] arXiv:2402.04848 (replaced) [pdf, other]
-
Title: Nonlinear behavior of memristive devices for hardware security primitives and neuromorphic computing systemsAuthors: Sahitya Yarragolla, Torben Hemke, Fares Jalled, Tobias Gergs, Jan Trieschmann, Tolga Arul, Thomas MussenbrockSubjects: Emerging Technologies (cs.ET); Mesoscale and Nanoscale Physics (cond-mat.mes-hall)
- [629] arXiv:2402.07868 (replaced) [pdf, ps, other]
-
Title: Nesting Particle Filters for Experimental Design in Dynamical SystemsSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)
- [630] arXiv:2402.08950 (replaced) [pdf, other]
-
Title: Taking GPU Programming Models to Task for Performance PortabilityAuthors: Joshua H. Davis, Pranav Sivaraman, Joy Kitson, Konstantinos Parasyris, Harshitha Menon, Isaac Minn, Giorgis Georgakoudis, Abhinav BhateleComments: 12 pages, 7 figuresSubjects: Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)
- [631] arXiv:2402.09283 (replaced) [pdf, other]
-
Title: Attacks, Defenses and Evaluations for LLM Conversation Safety: A SurveyComments: Accepted to NAACL 2024Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)
- [632] arXiv:2402.09654 (replaced) [pdf, other]
-
Title: GPT-4's assessment of its performance in a USMLE-based case studyAuthors: Uttam Dhakal, Aniket Kumar Singh, Suman Devkota, Yogesh Sapkota, Bishal Lamichhane, Suprinsa Paudyal, Chandra DhakalSubjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Multiagent Systems (cs.MA); Machine Learning (stat.ML)
- [633] arXiv:2402.11800 (replaced) [pdf, other]
-
Title: Stochastic Approximation with Delayed Updates: Finite-Time Rates under Markovian SamplingAuthors: Arman Adibi, Nicolo Dal Fabbro, Luca Schenato, Sanjeev Kulkarni, H. Vincent Poor, George J. Pappas, Hamed Hassani, Aritra MitraComments: Accepted to the 27th International Conference on Artificial Intelligence and Statistics (AISTATS) 2024!Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Systems and Control (eess.SY); Optimization and Control (math.OC)
- [634] arXiv:2402.12000 (replaced) [pdf, ps, other]
-
Title: Thinking Outside the Black Box: Insights from a Digital Exhibition in the HumanitiesComments: Submitted to the AIUCD2024 Conference: this https URLSubjects: Digital Libraries (cs.DL)
- [635] arXiv:2402.12997 (replaced) [pdf, other]
-
Title: Towards Trustworthy Reranking: A Simple yet Effective Abstention MechanismAuthors: Hippolyte Gisserot-Boukhlef, Manuel Faysse, Emmanuel Malherbe, Céline Hudelot, Pierre ColomboSubjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)
- [636] arXiv:2402.13284 (replaced) [pdf, other]
-
Title: Structure Guided Large Language Model for SQL GenerationSubjects: Databases (cs.DB); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
- [637] arXiv:2402.13376 (replaced) [pdf, ps, other]
-
Title: Probabilistic automatic complexity of finite stringsAuthors: Kenneth GillComments: 41 pages, 5 figures. This work extends Chapter 2 of the author's PhD dissertation at Penn State. V2: fix statement of Proposition 3.4Subjects: Formal Languages and Automata Theory (cs.FL); Logic (math.LO)
- [638] arXiv:2402.13729 (replaced) [pdf, other]
-
Title: Hybrid Video Diffusion Models with 2D Triplane and 3D Wavelet RepresentationComments: 17 pages, 13 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [639] arXiv:2402.14326 (replaced) [pdf, other]
-
Title: Think before You Leap: Content-Aware Low-Cost Edge-Assisted Video Semantic SegmentationComments: Accepted by ACM Multimedia 2023Subjects: Multimedia (cs.MM)
- [640] arXiv:2402.15764 (replaced) [pdf, other]
-
Title: Look Before You Leap: Problem Elaboration Prompting Improves Mathematical Reasoning in Large Language ModelsSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
- [641] arXiv:2402.17304 (replaced) [pdf, ps, other]
-
Title: Probing Multimodal Large Language Models for Global and Local Semantic RepresentationsComments: Accepted by LREC-COLING 2024 as a short paper (Camera Ready)Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
- [642] arXiv:2402.17464 (replaced) [pdf, other]
-
Title: Generative 3D Part Assembly via Part-Whole-Hierarchy Message PassingSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [643] arXiv:2402.17574 (replaced) [pdf, other]
-
Title: Agent-Pro: Learning to Evolve via Policy-Level Reflection and OptimizationAuthors: Wenqi Zhang, Ke Tang, Hai Wu, Mengna Wang, Yongliang Shen, Guiyang Hou, Zeqi Tan, Peng Li, Yueting Zhuang, Weiming LuComments: LLM-based AgentSubjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
- [644] arXiv:2402.18355 (replaced) [pdf, ps, other]
-
Title: COPR -- Efficient, large-scale log storage and retrievalComments: 14 pages, 8 figuresSubjects: Information Retrieval (cs.IR); Databases (cs.DB); Data Structures and Algorithms (cs.DS)
- [645] arXiv:2402.18920 (replaced) [pdf, other]
-
Title: Spectral Meets Spatial: Harmonising 3D Shape Matching and InterpolationComments: accepted by CVPR2024Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computational Geometry (cs.CG)
- [646] arXiv:2402.19146 (replaced) [pdf, other]
-
Title: Computing Longest Common Subsequence under Cartesian-Tree Matching ModelSubjects: Data Structures and Algorithms (cs.DS)
- [647] arXiv:2402.19473 (replaced) [pdf, other]
-
Title: Retrieval-Augmented Generation for AI-Generated Content: A SurveyAuthors: Penghao Zhao, Hailin Zhang, Qinhan Yu, Zhengren Wang, Yunteng Geng, Fangcheng Fu, Ling Yang, Wentao Zhang, Bin CuiComments: Citing 380 papers, 36 pages, 16 figures. Project: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [648] arXiv:2403.00154 (replaced) [pdf, other]
-
Title: LLMs in Political Science: Heralding a New Era of Visual AnalysisAuthors: Yu WangComments: 7 pages, 3 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [649] arXiv:2403.00174 (replaced) [pdf, other]
-
Title: A citizen science toolkit to collect human perceptions of urban environments using open street view imagesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [650] arXiv:2403.00211 (replaced) [pdf, other]
-
Title: Trustworthy Self-Attention: Enabling the Network to Focus Only on the Most Relevant ReferencesComments: Correct Figure 1Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [651] arXiv:2403.00465 (replaced) [pdf, other]
-
Title: Polyamorous SchedulingComments: v2: stronger and simplified hardness-of-approximation results, corrected constant in layering approximation algorithmSubjects: Data Structures and Algorithms (cs.DS); Social and Information Networks (cs.SI); Optimization and Control (math.OC)
- [652] arXiv:2403.00868 (replaced) [pdf, other]
-
Title: SoftTiger: A Clinical Foundation Model for Healthcare WorkflowsSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
- [653] arXiv:2403.02649 (replaced) [pdf, other]
-
Title: Few-shot Learner Parameterization by Diffusion Time-stepsComments: Accepted by CVPR 2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [654] arXiv:2403.03100 (replaced) [pdf, other]
-
Title: NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion ModelsAuthors: Zeqian Ju, Yuancheng Wang, Kai Shen, Xu Tan, Detai Xin, Dongchao Yang, Yanqing Liu, Yichong Leng, Kaitao Song, Siliang Tang, Zhizheng Wu, Tao Qin, Xiang-Yang Li, Wei Ye, Shikun Zhang, Jiang Bian, Lei He, Jinyu Li, Sheng ZhaoComments: Achieving human-level quality and naturalness on multi-speaker datasets (e.g., LibriSpeech) in a zero-shot waySubjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
- [655] arXiv:2403.03271 (replaced) [pdf, ps, other]
-
Title: Low-Complexity Linear Decoupling of Users for Uplink Massive MU-MIMO DetectionSubjects: Signal Processing (eess.SP); Information Theory (cs.IT)
- [656] arXiv:2403.03532 (replaced) [pdf, other]
-
Title: Extend Your Own Correspondences: Unsupervised Distant Point Cloud Registration by Progressive Distance ExtensionComments: In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [657] arXiv:2403.04125 (replaced) [pdf, other]
-
Title: Scalable and Robust Transformer Decoders for Interpretable Image Classification with Foundation ModelsSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [658] arXiv:2403.04507 (replaced) [pdf, other]
-
Title: NLPre: a revised approach towards language-centric benchmarking of Natural Language Preprocessing systemsComments: Accepted at LREC-COLING 2024Subjects: Computation and Language (cs.CL)
- [659] arXiv:2403.05218 (replaced) [pdf, other]
-
Title: 3D Face Reconstruction Using A Spectral-Based Graph Convolution EncoderComments: 4 pages, 3 figures. Accepted to WWW 2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [660] arXiv:2403.05262 (replaced) [pdf, other]
-
Title: Debiasing Multimodal Large Language ModelsAuthors: Yi-Fan Zhang, Weichen Yu, Qingsong Wen, Xue Wang, Zhang Zhang, Liang Wang, Rong Jin, Tieniu TanComments: 38 pages, 17 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [661] arXiv:2403.05465 (replaced) [pdf, other]
-
Title: Algorithm-Hardware Co-Design of Distribution-Aware Logarithmic-Posit Encodings for Efficient DNN InferenceComments: 2024 61st IEEE/ACM Design Automation Conference (DAC)Subjects: Hardware Architecture (cs.AR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
- [662] arXiv:2403.06054 (replaced) [pdf, other]
-
Title: Decoupled Data Consistency with Diffusion Purification for Image RestorationSubjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Signal Processing (eess.SP)
- [663] arXiv:2403.06629 (replaced) [pdf, other]
-
Title: Assembly Theory is an approximation to algorithmic complexity based on LZ compression that does not explain selection or evolutionComments: 15 pages + appendix, 2 figuresSubjects: Information Theory (cs.IT); Biomolecules (q-bio.BM)
- [664] arXiv:2403.06633 (replaced) [pdf, other]
-
Title: Fractal spatio-temporal scale-free messaging: amplitude modulation of self-executable carriers given by the Weierstrass function's componentsComments: 15 pages + appendix (21 pages total)Subjects: Information Theory (cs.IT)
- [665] arXiv:2403.06646 (replaced) [pdf, ps, other]
-
Title: Unisolvence of random Kansa collocation by Thin-Plate Splines for the Poisson equationSubjects: Numerical Analysis (math.NA)
- [666] arXiv:2403.06747 (replaced) [pdf, other]
-
Title: MetaSplit: Meta-Split Network for Limited-Stock Product RecommendationComments: Accepted at WWW 2024. This work has already been deployed on the Xianyu platform in Alibaba. The first two authors contributed equallySubjects: Information Retrieval (cs.IR)
- [667] arXiv:2403.07091 (replaced) [pdf, other]
-
Title: Sim-to-Real gap in RL: Use Case with TIAGo and Isaac Sim/GymComments: Accepted in ERF24 workshop "Towards Efficient and Portable Robot Learning for Real-World Settings". To be published in Springer Proceedings in Advanced RoboticsSubjects: Robotics (cs.RO)
- [668] arXiv:2403.07359 (replaced) [pdf, other]
-
Title: FSC: Few-point Shape CompletionComments: Accepted by CVPR 2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [669] arXiv:2403.07392 (replaced) [pdf, other]
-
Title: ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense PredictionsComments: CVPR2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [670] arXiv:2403.07636 (replaced) [pdf, other]
-
Title: Decomposing Disease Descriptions for Enhanced Pathology Detection: A Multi-Aspect Vision-Language Pre-training FrameworkAuthors: Vu Minh Hieu Phan, Yutong Xie, Yuankai Qi, Lingqiao Liu, Liyang Liu, Bowen Zhang, Zhibin Liao, Qi Wu, Minh-Son To, Johan W. VerjansComments: Accepted at CVPR2024. Pre-print before final camera-ready versionJournal-ref: CVPR2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [671] arXiv:2403.07711 (replaced) [pdf, other]
-
Title: SSM Meets Video Diffusion Models: Efficient Video Generation with Structured State SpacesComments: Accepted as workshop paper at ICLR 2024Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [672] arXiv:2403.07961 (replaced) [pdf, other]
-
Title: The $L_p$-discrepancy for finite $p>1$ suffers from the curse of dimensionalityComments: arXiv admin note: substantial text overlap with arXiv:2303.01787Subjects: Numerical Analysis (math.NA)
- [673] arXiv:2403.08579 (replaced) [pdf, other]
-
Title: Machine Learning Optimized Orthogonal Basis Piecewise Polynomial ApproximationComments: Submitted to LION18Subjects: Machine Learning (cs.LG)
- [674] arXiv:2403.09069 (replaced) [pdf, other]
-
Title: Dyadic Interaction Modeling for Social Behavior GenerationSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [675] arXiv:2403.09131 (replaced) [pdf, other]
-
Title: ProSwitch: Knowledge-Guided Language Model Fine-Tuning to Generate Professional and Non-Professional Styled TextComments: 8 pagesSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
- [676] arXiv:2403.09267 (replaced) [pdf, other]
-
Title: Deep Limit Order Book ForecastingComments: 43 pages, 14 figures, 12 tablesSubjects: Trading and Market Microstructure (q-fin.TR); Machine Learning (cs.LG)
- [677] arXiv:2403.09700 (replaced) [pdf, other]
-
Title: Shapley Values-Powered Framework for Fair Reward Split in Content Produced by GenAIComments: 36 pages, 32 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
- [678] arXiv:2403.09887 (replaced) [pdf, other]
-
Title: Sabiá-2: A New Generation of Portuguese Large Language ModelsSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
- [679] arXiv:2403.10030 (replaced) [pdf, other]
-
Title: Multi-criteria Token Fusion with One-step-ahead Attention for Efficient Vision TransformersComments: Conference on Computer Vision and Pattern Recognition (CVPR), 2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [680] arXiv:2403.10066 (replaced) [pdf, other]
-
Title: Contrastive Pre-Training with Multi-View Fusion for No-Reference Point Cloud Quality AssessmentAuthors: Ziyu Shan, Yujie Zhang, Qi Yang, Haichen Yang, Yiling Xu, Jenq-Neng Hwang, Xiaozhong Xu, Shan LiuSubjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM)
- [681] arXiv:2403.10158 (replaced) [pdf, other]
-
Title: Functional Graph Convolutional Networks: A unified multi-task and multi-modal learning framework to facilitate health and social-care insightsAuthors: Tobia Boschi, Francesca Bonin, Rodrigo Ordonez-Hurtado, Cécile Rousseau, Alessandra Pascale, John DinsmoreSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- [682] arXiv:2403.10286 (replaced) [pdf, other]
-
Title: RACH-less Handover with Early Timing Advance Acquisition for Outage ReductionComments: 7 pages, 7 figures. Accepted for presentation at the 2024 IEEE 99th Vehicular Technology Conference (VTC2024)-Spring to be held in SingaporeSubjects: Networking and Internet Architecture (cs.NI)
- [683] arXiv:2403.11107 (replaced) [pdf, other]
-
Title: Self-supervised co-salient object detection via feature correspondence at multiple scalesSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [684] arXiv:2403.11128 (replaced) [pdf, other]
-
Title: Beyond Static Evaluation: A Dynamic Approach to Assessing AI Assistants' API Invocation CapabilitiesComments: Accepted at LREC-COLING 2024Subjects: Computation and Language (cs.CL)
- [685] arXiv:2403.11399 (replaced) [pdf, other]
-
Title: X-LLaVA: Optimizing Bilingual Large Vision-Language AlignmentAuthors: Dongjae Shin, Hyunseok Lim, Inho Won, Changsu Choi, Minjun Kim, Seungwoo Song, Hangyeol Yoo, Sangmin Kim, Kyungtae LimSubjects: Computation and Language (cs.CL)
- [686] arXiv:2403.11617 (replaced) [pdf, other]
-
Title: Frontier-Based Exploration for Multi-Robot Rendezvous in Communication-Restricted Unknown EnvironmentsSubjects: Robotics (cs.RO)
- [687] arXiv:2403.11656 (replaced) [pdf, other]
-
Title: LocalStyleFool: Regional Video Style Transfer Attack Using Segment Anything ModelComments: Accepted to 2024 IEEE Security and Privacy Workshops (SPW)Subjects: Computer Vision and Pattern Recognition (cs.CV)
- [688] arXiv:2403.12452 (replaced) [src]
-
Title: On the Salient Limitations of `On the Salient Limitations of the Methods of Assembly Theory and their Classification of Molecular Biosignatures'Authors: Leroy CroninComments: the arguments here are being discussed in more detail and will be ready laterSubjects: Information Theory (cs.IT); Adaptation and Self-Organizing Systems (nlin.AO); Biological Physics (physics.bio-ph)
- [689] arXiv:2403.12820 (replaced) [src]
-
Title: A Physics-embedded Deep Learning Framework for Cloth SimulationAuthors: Zhiwei ZhaoComments: A derivation is incomplete, and updates are being processedSubjects: Graphics (cs.GR); Machine Learning (cs.LG)
- [690] arXiv:2403.13374 (replaced) [pdf, other]
-
Title: Byzantine-resilient Federated Learning With Adaptivity to Data HeterogeneitySubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
- [691] arXiv:2403.13680 (replaced) [pdf, other]
-
Title: Step-Calibrated Diffusion for Biomedical Optical Image RestorationAuthors: Yiwei Lyu, Sung Jik Cha, Cheng Jiang, Asadur Chowdury, Xinhai Hou, Edward Harake, Akhil Kondepudi, Christian Freudiger, Honglak Lee, Todd C. HollonSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
- [692] arXiv:2403.14623 (replaced) [pdf, other]
-
Title: Simplified Diffusion Schrödinger BridgeSubjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
- [693] arXiv:2403.14721 (replaced) [pdf, ps, other]
-
Title: Automated Extraction and Maturity Analysis of Open Source Clinical Informatics Repositories from Scientific LiteratureAuthors: Jeremy R. HarperSubjects: Digital Libraries (cs.DL); Software Engineering (cs.SE)
- [694] arXiv:2403.14814 (replaced) [pdf, ps, other]
-
Title: The opportunities and risks of large language models in mental healthAuthors: Hannah R. Lawrence, Renee A. Schneider, Susan B. Rubin, Maja J. Mataric, Daniel J. McDuff, Megan Jones BellComments: 12 pages, 2 tables, 4 figuresSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
- [695] arXiv:2403.14864 (replaced) [pdf, other]
-
Title: Learning Quadruped Locomotion Using Differentiable SimulationSubjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
- [696] arXiv:2403.14916 (replaced) [pdf, other]
-
Title: Snail: Secure Single Iteration LocalizationSubjects: Cryptography and Security (cs.CR)
- [697] arXiv:2403.15098 (replaced) [pdf, other]
-
Title: UniTraj: A Unified Framework for Scalable Vehicle Trajectory PredictionAuthors: Lan Feng, Mohammadhossein Bahari, Kaouther Messaoud Ben Amor, Éloi Zablocki, Matthieu Cord, Alexandre AlahiSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [698] arXiv:2403.15114 (replaced) [pdf, other]
-
Title: Solving a Real-World Package Delivery Routing Problem Using Quantum AnnealersComments: 15 pages, 11 figures and 4 tables. Paper submitted for review in Scientific ReportsSubjects: Emerging Technologies (cs.ET); Artificial Intelligence (cs.AI)
- [699] arXiv:2403.15198 (replaced) [pdf, ps, other]
-
Title: On the Weighted Top-Difference Distance: Axioms, Aggregation, and ApproximationComments: 64 pagesSubjects: Computer Science and Game Theory (cs.GT); Discrete Mathematics (cs.DM); Theoretical Economics (econ.TH); Methodology (stat.ME)
- [700] arXiv:2403.15201 (replaced) [pdf, other]
-
Title: Flip-Breakability: A Combinatorial Dichotomy for Monadically Dependent Graph ClassesComments: v2: added section "Conclusions and Future Work"Subjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM); Logic in Computer Science (cs.LO); Logic (math.LO)
- [701] arXiv:2403.15472 (replaced) [pdf, other]
-
Title: Enhancing Programming Education with ChatGPT: A Case Study on Student Perceptions and Interactions in a Python CourseSubjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Programming Languages (cs.PL)
- [702] arXiv:2403.15721 (replaced) [pdf, other]
-
Title: Design and Implementation of an Analysis Pipeline for Heterogeneous DataAuthors: Arup Kumar Sarker, Aymen Alsaadi, Niranda Perera, Mills Staylor, Gregor von Laszewski, Matteo Turilli, Ozgur Ozan Kilic, Mikhail Titov, Andre Merzky, Shantenu Jha, Geoffrey FoxComments: 14 pages, 16 figures, 2 tablesSubjects: Distributed, Parallel, and Cluster Computing (cs.DC)
- [703] arXiv:2403.15837 (replaced) [pdf, other]
-
Title: Centered Masking for Language-Image Pre-TrainingSubjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
- [704] arXiv:2403.16157 (replaced) [pdf, other]
-
Title: pyKCN: A Python Tool for Bridging Scientific KnowledgeSubjects: Digital Libraries (cs.DL)
- [705] arXiv:2403.16271 (replaced) [pdf, other]
-
Title: Object Detectors in the Open Environment: Challenges, Solutions, and OutlookAuthors: Siyuan Liang, Wei Wang, Ruoyu Chen, Aishan Liu, Boxi Wu, Ee-Chien Chang, Xiaochun Cao, Dacheng TaoComments: 32 pages, 17 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
- [706] arXiv:2403.16290 (replaced) [pdf, other]
-
Title: An Information Theoretic Treatment of Animal Movement TracksAuthors: Wayne M GetzComments: 20 pages, 2 tables, 1 figureSubjects: Populations and Evolution (q-bio.PE); Information Theory (cs.IT)
- [707] arXiv:2403.16335 (replaced) [pdf, other]
-
Title: MEDDAP: Medical Dataset Enhancement via Diversified Augmentation PipelineComments: Submitted to MICCAI 2024Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [708] arXiv:2403.16427 (replaced) [pdf, other]
-
Title: Re2LLM: Reflective Reinforcement Large Language Model for Session-based RecommendationComments: 11 pages, 4 figuresSubjects: Artificial Intelligence (cs.AI)
- [709] arXiv:2403.16432 (replaced) [pdf, other]
-
Title: $\textit{LinkPrompt}$: Natural and Universal Adversarial Attacks on Prompt-based Language ModelsComments: Accepted to the main conference of NAACL2024Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
- [710] arXiv:2403.16451 (replaced) [pdf, other]
-
Title: DeepMachining: Online Prediction of Machining Errors of Lathe MachinesSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- [711] arXiv:2403.16488 (replaced) [pdf, ps, other]
-
Title: Ensuring Disturbance Rejection Performance by Synthesizing Grid-Following and Grid-Forming Inverters in Power SystemsComments: 6 pagesSubjects: Systems and Control (eess.SY)
- [712] arXiv:2403.16512 (replaced) [pdf, other]
-
Title: LLMs Are Few-Shot In-Context Low-Resource Language LearnersSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
- [713] arXiv:2403.16516 (replaced) [pdf, other]
-
Title: Visually Guided Generative Text-Layout Pre-training for Document IntelligenceComments: Accepted to NAACL 2024 main conference. The first version of this paper was submitted to OpenReview (this https URL) in June 2023Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
- [714] arXiv:2403.16915 (replaced) [pdf, other]
-
Title: Coarse-Tuning for Ad-hoc Document Retrieval Using Pre-trained Language ModelsComments: Accepted at LREC-COLING 2024Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
- [715] arXiv:2403.16967 (replaced) [pdf, other]
-
Title: Visual Whole-Body Control for Legged Loco-ManipulationComments: The first two authors contributed equally. Project page: this https URLSubjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [716] arXiv:2403.16975 (replaced) [pdf, other]
-
Title: Unconditionally positivity-preserving approximations of the Ait-Sahalia type model: Explicit Milstein-type schemesComments: 19 pages, 3 figuresSubjects: Numerical Analysis (math.NA)
- [717] arXiv:2403.17139 (replaced) [pdf, other]
-
Title: An Equilibrium Analysis of the Arad-Rubinstein GameSubjects: Computer Science and Game Theory (cs.GT)
- [718] arXiv:2403.17143 (replaced) [pdf, other]
-
Title: Guided Distant Supervision for Multilingual Relation Extraction Data: Adapting to a New LanguageComments: Accepted to LREC-COLING 2024 (The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation)Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
- [719] arXiv:2403.17165 (replaced) [pdf, other]
-
Title: Building an Open-Source Community to Enhance Autonomic Nervous System Signal Analysis: DBDP-AutonomicAuthors: Jessilyn Dunn, Varun Mishra, Md Mobashir Hasan Shandhi, Hayoung Jeong, Natasha Yamane, Yuna Watanabe, Matthew S. GoodwinSubjects: Human-Computer Interaction (cs.HC)
- [720] arXiv:2403.17219 (replaced) [pdf, other]
-
Title: SeSaMe: A Framework to Simulate Self-Reported Ground Truth for Mental Health Sensing StudiesSubjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
- [721] arXiv:2403.17301 (replaced) [pdf, other]
-
Title: Physical 3D Adversarial Attacks against Monocular Depth Estimation in Autonomous DrivingComments: Accepted by CVPR 2024Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR)
- [722] arXiv:2403.17320 (replaced) [pdf, other]
-
Title: Leveraging Symmetry in RL-based Legged Locomotion ControlAuthors: Zhi Su, Xiaoyu Huang, Daniel Ordoñez-Apraez, Yunfei Li, Zhongyu Li, Qiayuan Liao, Giulio Turrisi, Massimiliano Pontil, Claudio Semini, Yi Wu, Koushil SreenathSubjects: Robotics (cs.RO)
- [723] arXiv:2403.17343 (replaced) [pdf, other]
-
Title: Language Models are Free Boosters for Biomedical Imaging TasksSubjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
- [724] arXiv:2403.17367 (replaced) [pdf, other]
-
Title: RoboDuet: A Framework Affording Mobile-Manipulation and Cross-EmbodimentAuthors: Guoping Pan, Qingwei Ben, Zhecheng Yuan, Guangqi Jiang, Yandong Ji, Jiangmiao Pang, Houde Liu, Huazhe XuSubjects: Robotics (cs.RO)
- [725] arXiv:2403.17392 (replaced) [pdf, other]
-
Title: Natural-artificial hybrid swarm: Cyborg-insect group navigation in unknown obstructed soft terrainAuthors: Yang Bai, Phuoc Thanh Tran Ngoc, Huu Duoc Nguyen, Duc Long Le, Quang Huy Ha, Kazuki Kai, Yu Xiang See To, Yaosheng Deng, Jie Song, Naoki Wakamiya, Hirotaka Sato, Masaki OguraSubjects: Robotics (cs.RO); Systems and Control (eess.SY); Adaptation and Self-Organizing Systems (nlin.AO)
- [726] arXiv:2403.17421 (replaced) [pdf, other]
-
Title: MA4DIV: Multi-Agent Reinforcement Learning for Search Result DiversificationAuthors: Yiqun Chen, Jiaxin Mao, Yi Zhang, Dehong Ma, Long Xia, Jun Fan, Daiting Shi, Zhicong Cheng, Simiu Gu, Dawei YinSubjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
- [727] arXiv:2403.17456 (replaced) [pdf, other]
-
Title: Imitating Cost-Constrained Behaviors in Reinforcement LearningComments: Accepted to the 34th International Conference on Automated Planning and Scheduling (ICAPS-24)Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
- [728] arXiv:2403.17458 (replaced) [pdf, ps, other]
-
Title: Expectations Versus Reality: Evaluating Intrusion Detection Systems in PracticeAuthors: Jake Hesford, Daniel Cheng, Alan Wan, Larry Huynh, Seungho Kim, Hyoungshick Kim, Jin B. HongComments: 10 pagesSubjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
- [729] arXiv:2403.17636 (replaced) [pdf, other]
-
Title: Mix-Initiative Response Generation with Dynamic Prefix TuningComments: Accepted to the main conference of NAACL 2024Subjects: Computation and Language (cs.CL)
- [730] arXiv:2403.17647 (replaced) [pdf, other]
-
Title: Intrinsic Subgraph Generation for Interpretable Graph based Visual Question AnsweringComments: Accepted at LREC-COLING 2024Subjects: Computation and Language (cs.CL)
- [731] arXiv:2403.17767 (replaced) [pdf, ps, other]
-
Title: Asymptotic Bayes risk of semi-supervised learning with uncertain labelingSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
- [732] arXiv:2403.17786 (replaced) [pdf, other]
-
Title: Query Refinement for Diverse Top-$k$ SelectionComments: v2 corrects author orderSubjects: Databases (cs.DB)
- [733] arXiv:2403.17794 (replaced) [pdf, other]
-
Title: Fermihedral: On the Optimal Compilation for Fermion-to-Qubit EncodingJournal-ref: ASPLOS 2024Subjects: Quantum Physics (quant-ph); Emerging Technologies (cs.ET)
- [734] arXiv:2403.17878 (replaced) [pdf, other]
-
Title: Empowering Data Mesh with Federated LearningSubjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC)
- [735] arXiv:2403.17905 (replaced) [pdf, other]
-
Title: Scalable Non-Cartesian Magnetic Resonance Imaging with R2D2Comments: submitted to IEEE EUSIPCO 2024Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Signal Processing (eess.SP)