Resource
2024 EN
Kerem Zaman · Leshem Choshen · Shashank Srivastava
Model fusion research aims to aggregate the knowledge of multiple individual models to enhance performance by combining their weights. In this work, we study the inverse problem: investigating whether model fusion can be used to reduce unwanted knowledge. We investigate the effects of model fusion in three scenarios: the learning of shortcuts, social biases, and memorization of training data in fine-tuned language models. Through experiments covering classification and generation tasks, our analysis highlights that shared knowledge among models is enhanced during model fusion, while unshared knowledge is usually forgotten. Based on this observation, we demonstrate the potential of model fusion as a debiasing tool and showcase its efficacy in addressing privacy concerns associated with language models.
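As a rough illustration of fusion by combining weights, the sketch below averages parameters element-wise across models. It assumes the simplest fusion rule (uniform weight averaging) and a hypothetical name-to-values layout for the weights; the abstract does not state which fusion method the experiments use.

```python
from typing import Dict, List

# Hypothetical layout: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def fuse_by_averaging(models: List[Weights]) -> Weights:
    """Uniformly average each parameter across models (assumed fusion rule)."""
    if not models:
        raise ValueError("need at least one model")
    fused: Weights = {}
    for name in models[0]:
        # Group the i-th value of this parameter from every model together.
        columns = zip(*(m[name] for m in models))
        fused[name] = [sum(col) / len(models) for col in columns]
    return fused


# Toy weights: the first coordinate is identical in both models,
# the other two are present in only one model each.
model_a = {"w": [1.0, 0.0, 4.0]}
model_b = {"w": [1.0, 2.0, 0.0]}
fused = fuse_by_averaging([model_a, model_b])
print(fused["w"])
```

In this toy case the coordinate both models agree on is preserved, while the coordinates held by only one model are halved. That is only an intuition for the shared-versus-unshared observation, not a reproduction of the paper's analysis.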
Resource
2024 EN
Carlos Perez-Lara · James Wetzel · Ugur Akgun
+36 more
The RADiCAL Collaboration is conducting R&D on high performance electromagnetic (EM) calorimetry to address the challenges expected in future collider experiments under conditions of high luminosity and/or high irradiation (FCC-ee, FCC-hh and fixed target and forward physics environments). Under development is a sampling calorimeter approach, known as RADiCAL modules, based on scintillation and wavelength-shifting (WLS) technologies and photosensors, including SiPM and SiPM-like technology. The modules discussed herein consist of alternating layers of very dense (W) absorber and scintillating crystal (LYSO:Ce) plates, assembled to a depth of 25 $X_0$. The scintillation signals produced by the EM showers in the region of EM shower maximum (shower max) are transmitted to SiPMs located at the upstream and downstream ends of the modules via quartz capillaries which penetrate the full length of the module. The capillaries contain DSB1 organic plastic WLS filaments positioned within the region of shower max, where the shower energy deposition is greatest, and fused with quartz rod elsewhere. The wavelength-shifted light from this spatially localized shower max region is then propagated to the photosensors. This paper presents the results of an initial measurement of the time resolution of a RADiCAL module over the energy range 25 GeV $\leq E \leq$ 150 GeV using the H2 electron beam at CERN. The data indicate an energy dependence of the time resolution that follows the functional form $\sigma_{t} = a/\sqrt{E} \oplus b$, where a = 256 $\sqrt{\mathrm{GeV}}$ ps and b = 17.5 ps. The time resolution measured at the highest electron beam energy for which data have been recorded so far (150 GeV) was found to be $\sigma_{t}$ = 27 ps.
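The fitted form can be checked numerically. The sketch below evaluates $\sigma_{t} = a/\sqrt{E} \oplus b$ with the quoted a and b, assuming that $\oplus$ denotes addition in quadrature (the usual convention for resolution terms); the function and constant names are illustrative.

```python
import math

A_STOCHASTIC = 256.0  # ps * sqrt(GeV), value of a quoted in the abstract
B_CONSTANT = 17.5     # ps, value of b quoted in the abstract


def time_resolution_ps(energy_gev: float) -> float:
    """Evaluate sigma_t = a/sqrt(E) (+) b, with (+) taken as a quadrature sum."""
    stochastic = A_STOCHASTIC / math.sqrt(energy_gev)
    return math.hypot(stochastic, B_CONSTANT)


for energy in (25.0, 50.0, 100.0, 150.0):
    print(f"E = {energy:5.0f} GeV -> sigma_t = {time_resolution_ps(energy):.1f} ps")
```

At 150 GeV this gives about 27.3 ps, consistent with the measured 27 ps quoted above.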
Resource
2024 EN
Shuvro Chowdhury · Shaila Niazi · Kerem Y. Camsari
Despite their appeal as physics-inspired, energy-based, and generative models, general Boltzmann Machines (BM) are considered intractable to train. This belief led to simplified models of BMs with restricted intralayer connections or layer-by-layer training of deep BMs. Recent developments in domain-specific hardware -- specifically probabilistic computers (p-computer) with probabilistic bits (p-bit) -- may change established wisdom on the tractability of deep BMs. In this paper, we show that deep and unrestricted BMs can be trained using p-computers generating hundreds of billions of Markov Chain Monte Carlo (MCMC) samples per second, on sparse networks developed originally for use in D-Wave's annealers. To maximize the efficiency of learning the p-computer, we introduce two families of Mean-Field Theory assisted learning algorithms, or xMFTs (x = Naive and Hierarchical). The xMFTs are used to estimate the averages and correlations during the positive phase of the contrastive divergence (CD) algorithm and our custom-designed p-computer is used to estimate the averages and correlations in the negative phase. A custom Field-Programmable Gate Array (FPGA) emulation of the p-computer architecture takes up to 45 billion flips per second, allowing the implementation of CD-$n$ where $n$ can be of the order of millions, unlike RBMs where $n$ is typically 1 or 2. Experiments on the full MNIST dataset with the combined algorithm show that the positive phase can be efficiently computed by xMFTs without much degradation when the negative phase is computed by the p-computer. Our algorithm can be used in other scalable Ising machines and its variants can be used to train BMs, previously thought to be intractable.
Resource
2024 EN
Emre Ozfatura · Kerem Ozfatura · Alptekin Kupcu
+1 more
Federated learning (FL) has been introduced to enable a large number of clients, possibly mobile devices, to collaborate on generating a generalized machine learning model by utilizing a larger number of local samples without sharing them, which offers a degree of privacy to collaborating clients. However, due to the participation of a large number of clients, it is often difficult to profile and verify each client, which leads to a security threat: malicious participants may hamper the accuracy of the trained model by conveying poisoned models during training. Hence, the aggregation framework at the parameter server also needs to minimize the detrimental effects of these malicious clients. A plethora of attack and defence strategies have been analyzed in the literature. However, the Byzantine problem is often analyzed solely from the outlier detection perspective, being oblivious to the topology of neural networks (NNs). In the scope of this work, we argue that by extracting certain side information specific to the NN topology, one can design stronger attacks. Hence, inspired by sparse neural networks, we introduce a hybrid sparse Byzantine attack that is composed of two parts: one exhibiting a sparse nature and attacking only certain NN locations with higher sensitivity, and the other being more silent but accumulating over time, where each ideally targets a different type of defence mechanism, and together they form a strong but imperceptible attack. Finally, we show through extensive simulations that the proposed hybrid Byzantine attack is effective against 8 different defence methods.
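To make the outlier-detection view of Byzantine robustness concrete, the sketch below compares plain averaging with a coordinate-wise median at the parameter server. This is a generic textbook-style illustration with made-up numbers: the crude single-coordinate perturbation is not the hybrid attack proposed in the paper, and the abstract does not say whether the median is among the 8 defences evaluated.

```python
from statistics import median
from typing import List

Update = List[float]  # hypothetical flat parameter update from one client


def aggregate_mean(updates: List[Update]) -> Update:
    """Plain averaging: every client contributes equally to each coordinate."""
    return [sum(col) / len(updates) for col in zip(*updates)]


def aggregate_median(updates: List[Update]) -> Update:
    """Coordinate-wise median: a minority of extreme values is ignored."""
    return [median(col) for col in zip(*updates)]


# Four honest clients and one malicious client that perturbs a single
# coordinate (illustrative values only).
honest = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 2.9],
    [0.9, 1.9, 3.1],
    [1.0, 2.0, 3.0],
]
malicious = [1.0, 2.0, 100.0]
updates = honest + [malicious]

print("mean:  ", aggregate_mean(updates))
print("median:", aggregate_median(updates))
```

A large, obvious perturbation like this one is exactly what outlier-based defences catch; the abstract's argument is that attacks which stay small or accumulate slowly can evade such checks.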
Resource
2024 EN
Kagan Ucak · Faruk Karatas · Emre Cetinkaya
+1 more
A blood turbine-pump system (iATVA), resembling a turbocharger, was proposed as a mechanical right-heart assist device without external drive power. In this study, the iATVA system is investigated with particular emphasis on the blood turbine flow dynamics. A time-resolved 2D particle image velocimetry (PIV) set-up, equipped with a beam splitter and two high-speed cameras, allowed simultaneous recordings from both the turbine and pump impellers at 7 different phase-locked instances. The iATVA prototype is 3D printed using an optically clear resin following our earlier PIV protocols. Results showed that the magnetically coupled impellers operated synchronously. As the turbine flow rate increased from 1.6 to 2.4 LPM, the rotational speed and relative inlet flow angle increased from 630 to 900 rpm, and 38 to 55% respectively. At the trailing edges, the backflow region spanned 3/5 of the total passage outlet flow, and an extra leakage flow was observed at the leading edge. For this initial turbine design, approximately 75% of the turbine blade passage was not contributing to the impulse operation mode. The maximum non-wall shear rate was approximately 2288 s$^{-1}$ near the inlet exit, which is significantly lower than in commercial blood pumps, encouraging further research and blood experiments on this novel concept. Experimental results will improve the hydrodynamic design of the turbine impeller and volute regions and will be useful in computational fluid dynamics validation studies of similar passive devices.
Resource
2024 EN
Mehmet Kerem Turkcan · Sanjeev Narasimhan · Chengbo Zang
+6 more
We introduce Constellation, a dataset of 13K images suitable for research on detection of objects in dense urban streetscapes observed from high-elevation cameras, collected for a variety of temporal conditions. The dataset addresses the need for curated data to explore problems in small object detection exemplified by the limited pixel footprint of pedestrians observed tens of meters from above. It enables the testing of object detection models for variations in lighting, building shadows, weather, and scene dynamics. We evaluate contemporary object detection architectures on the dataset, observing that state-of-the-art methods have lower performance in detecting small pedestrians compared to vehicles, corresponding to a 10% difference in average precision (AP). Using structurally similar datasets for pretraining the models results in an increase of 1.8% mean AP (mAP). We further find that incorporating domain-specific data augmentations helps improve model performance. Using pseudo-labeled data, obtained from inference outcomes of the best-performing models, improves the performance of the models. Finally, comparing the models trained using the data collected in two different time intervals, we find a performance drift in models due to the changes in intersection conditions over time. The best-performing model achieves a pedestrian AP of 92.0% with 11.5 ms inference time on NVIDIA A100 GPUs, and an mAP of 95.4%.
Resource
2024 EN
Togay Yazar · Mucahid Kutlu · İsa Kerem Bayırlı
Over the past century, the Turkish language has undergone substantial changes, primarily driven by governmental interventions. In this work, our goal is to investigate the evolution of the Turkish language since the establishment of Türkiye in 1923. Thus, we first introduce Turkronicles which is a diachronic corpus for Turkish derived from the Official Gazette of Türkiye. Turkronicles contains 45,375 documents, detailing governmental actions, making it a pivotal resource for analyzing the linguistic evolution influenced by the state policies. In addition, we expand an existing diachronic Turkish corpus which consists of the records of the Grand National Assembly of Türkiye by covering additional years. Next, combining these two diachronic corpora, we seek answers for two main research questions: How have the Turkish vocabulary and the writing conventions changed since the 1920s? Our analysis reveals that the vocabularies of two different time periods diverge more as the time between them increases, and newly coined Turkish words take the place of their old counterparts. We also observe changes in writing conventions. In particular, the use of circumflex noticeably decreases and words ending with the letters "-b" and "-d" are successively replaced with "-p" and "-t" letters, respectively. Overall, this study quantitatively highlights the dramatic changes in Turkish from various aspects of the language in a diachronic perspective.
Resource
2024 EN
Lütfi Kerem Senel · Besnik Fetahu · Davis Yoshida
+5 more
Recommender systems are widely used to suggest engaging content, and Large Language Models (LLMs) have given rise to generative recommenders. Such systems can directly generate items, including for open-set tasks like question suggestion. While the world knowledge of LLMs enables good recommendations, improving the generated content through user feedback is challenging as continuously fine-tuning LLMs is prohibitively expensive. We present a training-free approach for optimizing generative recommenders by connecting user feedback loops to LLM-based optimizers. We propose a generative explore-exploit method that can not only exploit generated items with known high engagement, but also actively explore and discover hidden population preferences to improve recommendation quality. We evaluate our approach on question generation in two domains (e-commerce and general knowledge), and model user feedback with Click Through Rate (CTR). Experiments show our LLM-based explore-exploit approach can iteratively improve recommendations, and consistently increase CTR. Ablation analysis shows that generative exploration is key to learning user preferences, avoiding the pitfalls of greedy exploit-only approaches. A human evaluation strongly supports our quantitative findings.
Resource
2024 EN
Arif Kerem Dayı · Orhan Eren Akgün · Stephanie Gil
+2 more
In this work, we introduce the Resilient Projected Push-Pull (RP3) algorithm designed for distributed optimization in multi-agent cyber-physical systems with directed communication graphs and the presence of malicious agents. Our algorithm leverages stochastic inter-agent trust values and gradient tracking to achieve geometric convergence rates in expectation even in adversarial environments. We introduce growing constraint sets to limit the impact of the malicious agents without compromising the geometric convergence rate of the algorithm. We prove that RP3 converges to the nominal optimal solution almost surely and in the $r$-th mean for any $r\geq 1$, provided the step sizes are sufficiently small and the constraint sets are appropriately chosen. We validate our approach with numerical studies on average consensus and multi-robot target tracking problems, demonstrating that RP3 effectively mitigates the impact of malicious agents and achieves the desired geometric convergence.
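For readers unfamiliar with gradient tracking on directed graphs, the sketch below shows a plain push-pull iteration with no malicious agents. It is not RP3: the trust values, projection step, and growing constraint sets are omitted, and the mixing matrices, step size, and quadratic local costs are illustrative assumptions.

```python
from typing import Callable, List

Matrix = List[List[float]]


def mat_vec(m: Matrix, v: List[float]) -> List[float]:
    """Multiply a dense matrix by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def push_pull(grads: List[Callable[[float], float]], R: Matrix, C: Matrix,
              x0: List[float], step: float, iters: int) -> List[float]:
    """Plain push-pull gradient tracking with one scalar variable per agent.

    R is row-stochastic (agents pull decision variables from in-neighbours);
    C is column-stochastic (agents push gradient trackers to out-neighbours).
    """
    x = list(x0)
    g = [grad(xi) for grad, xi in zip(grads, x)]
    y = list(g)  # trackers start at the local gradients
    for _ in range(iters):
        x_new = mat_vec(R, [xi - step * yi for xi, yi in zip(x, y)])
        g_new = [grad(xi) for grad, xi in zip(grads, x_new)]
        mixed = mat_vec(C, y)
        # Tracker update: mix, then add the change in the local gradient.
        y = [m + gn - go for m, gn, go in zip(mixed, g_new, g)]
        x, g = x_new, g_new
    return x


# Three agents on a directed ring; agent i holds f_i(x) = 0.5 * (x - t_i)^2,
# so the minimiser of the summed cost is the mean of the targets (3.0).
targets = [1.0, 2.0, 6.0]
grads = [lambda x, t=t: x - t for t in targets]
R = [[0.7, 0.3, 0.0], [0.0, 0.6, 0.4], [0.5, 0.0, 0.5]]  # rows sum to 1
C = [[0.5, 0.0, 0.4], [0.5, 0.7, 0.0], [0.0, 0.3, 0.6]]  # columns sum to 1
result = push_pull(grads, R, C, x0=[0.0, 0.0, 0.0], step=0.05, iters=2000)
print(result)
```

Two different matrices are used because, on a directed graph, an agent can generally normalise either its incoming or its outgoing weights but not both, which is what the pull and push steps accommodate.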
Resource
2024 EN
Arda Yüksel · Abdullatif Köksal · Lütfi Kerem Şenel
+2 more
Multiple choice question answering tasks evaluate the reasoning, comprehension, and mathematical abilities of Large Language Models (LLMs). While existing benchmarks employ automatic translation for multilingual evaluation, this approach is error-prone and potentially introduces culturally biased questions, especially in social sciences. We introduce the first multitask, multiple-choice Turkish QA benchmark, TurkishMMLU, to evaluate LLMs' understanding of the Turkish language. TurkishMMLU includes over 10,000 questions, covering 9 different subjects from Turkish high-school education curricula. These questions are written by curriculum experts, suitable for the high-school curricula in Turkey, covering subjects ranging from natural sciences and math questions to more culturally representative topics such as Turkish Literature and the history of the Turkish Republic. We evaluate over 20 LLMs, including multilingual open-source (e.g., Gemma, Llama, MT5), closed-source (GPT 4o, Claude, Gemini), and Turkish-adapted (e.g., Trendyol) models. We provide an extensive evaluation, including zero-shot and few-shot evaluation of LLMs, chain-of-thought reasoning, and question difficulty analysis along with model performance. We provide an in-depth analysis of the Turkish capabilities and limitations of current LLMs to provide insights for future LLMs for the Turkish language. We publicly release our code for the dataset and evaluation: https://github.com/ArdaYueksel/TurkishMMLU.