Publications by authors named "Lucian Ilie"

Sequence similarity is of paramount importance in biology, as similar sequences tend to have similar function and share common ancestry. Scoring matrices, such as PAM or BLOSUM, play a crucial role in all bioinformatics algorithms for identifying similarities, but have the drawback that they are fixed, independent of context. We propose a new scoring method for amino acid similarity that remedies this weakness, being contextually dependent.
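
To make the "fixed, independent of context" drawback concrete, here is a minimal sketch of how a fixed scoring matrix is applied. The scores in `TOY_SCORES` are made-up values chosen for illustration (they are not PAM or BLOSUM entries), and the function names are hypothetical.

```python
# Toy, made-up substitution scores (NOT real PAM/BLOSUM values).
TOY_SCORES = {
    ("L", "L"): 4, ("I", "I"): 4, ("K", "K"): 5,
    ("L", "I"): 2, ("L", "K"): -2, ("I", "K"): -3,
}


def pair_score(a, b):
    """Look up the symmetric score of one aligned residue pair."""
    if (a, b) in TOY_SCORES:
        return TOY_SCORES[(a, b)]
    return TOY_SCORES[(b, a)]


def alignment_score(seq1, seq2):
    """Sum per-position scores of two equal-length, gap-free sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    return sum(pair_score(a, b) for a, b in zip(seq1, seq2))


# The L/I pair contributes the same score wherever it occurs,
# regardless of the neighbouring residues.
print(alignment_score("LIK", "ILK"))  # 9
print(alignment_score("KLI", "KIL"))  # 9
```

A context-dependent method, as proposed in the abstract, would let the score of a pair vary with its surroundings; the sketch above shows only the fixed baseline.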

Motivation: Proteins accomplish cellular functions by interacting with each other, which makes the prediction of interaction sites a fundamental problem. As experimental methods are expensive and time-consuming, computational prediction of the interaction sites has been studied extensively. Structure-based programs are the most accurate, while the sequence-based ones are much more widely applicable, as the sequences available outnumber the structures by two orders of magnitude.

Several proteins work independently, but the majority work together to maintain the functions of the cell. Thus, it is crucial to know the interaction sites that facilitate protein-protein interactions. The development of effective computational methods is essential because experimental methods are expensive and time-consuming.

The preservation of food supplies has been humankind's priority since ancient times, and it is arguably more relevant today than ever before. Food sustainability and safety have been heavily prioritized by consumers, producers, and government entities alike. In this regard, filamentous fungi have always been a health hazard due to their contamination of the food substrate with mycotoxins.

Cellular functions are governed by proteins, and, while some proteins work independently, most work by interacting with other proteins. As a result, it is crucial to know the interaction sites that facilitate the interactions between proteins. Since experimental methods are costly and time-consuming, it is essential to develop effective computational methods.

Motivation: Sequence similarity is the most frequently used procedure in biological research, as proved by the widely used BLAST program. The consecutive seed used by BLAST can be dramatically improved by considering multiple spaced seeds. Finding the best seeds is a hard problem and much effort went into developing heuristic algorithms and software for designing highly sensitive spaced seeds.
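
As an illustration of why a spaced seed can detect a similarity that a consecutive seed of the same weight misses, here is a minimal sketch. It assumes the common convention that a seed is a string over `1` (position must match) and `0` (don't care), and that an aligned region is summarized as a string of matches (`1`) and mismatches (`0`). The function name and the example strings are hypothetical.

```python
def seed_hits(seed, matches):
    """Return the start positions at which `seed` hits `matches`.

    seed:    string over {'1', '0'}; '1' = must match, '0' = don't care.
    matches: string over {'1', '0'}; '1' = aligned letters match.
    """
    required = [j for j, c in enumerate(seed) if c == "1"]
    hits = []
    for i in range(len(matches) - len(seed) + 1):
        if all(matches[i + j] == "1" for j in required):
            hits.append(i)
    return hits


region = "1101011"      # toy match/mismatch pattern of an alignment
consecutive = "111"     # consecutive seed, weight 3
spaced = "1101"         # spaced seed, also weight 3

print(seed_hits(consecutive, region))  # []
print(seed_hits(spaced, region))       # [0]
```

Designing good seeds means choosing the patterns that maximize the chance of such hits over many regions, which is the hard problem the abstract refers to; the sketch only checks hits for given seeds.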

Motivation: Proteins usually perform their functions by interacting with other proteins, which is why accurately predicting protein-protein interaction (PPI) binding sites is a fundamental problem. Experimental methods are slow and expensive. Therefore, great efforts are being made towards increasing the performance of computational methods.

Understanding protein-protein interactions (PPIs) is vital to reveal the functional mechanisms in cells. Thus, predicting and identifying PPIs is one of the fundamental problems in systems biology. Various high-throughput experimental and computational methods have been developed to predict PPIs.

Background: The next generation sequencing (NGS) techniques have been around for over a decade. Many of their fundamental applications rely on the ability to compute good genome assemblies. As the technology evolves, the assembly algorithms and tools have to continuously adjust and improve.

Background: Proteins usually perform their functions by interacting with other proteins. Predicting which proteins interact is a fundamental problem. Experimental methods are slow, expensive, and have a high rate of error.

Summary: De novo genome assembly of next-generation sequencing data is a fundamental problem in bioinformatics. There are many programs that assemble small genomes, but very few can assemble whole human genomes. We present a new algorithm for parallel overlap graph construction, which is capable of assembling human genomes and improves upon the current state-of-the-art in genome assembly.
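
For readers unfamiliar with overlap graphs, the following is a naive, sequential sketch of the basic idea: reads are nodes, and an edge joins read `i` to read `j` when a suffix of `i` equals a prefix of `j`. It is not the parallel algorithm presented in the paper; the function names, the `min_len` threshold, and the toy reads are assumptions made for illustration.

```python
from itertools import permutations


def overlap(a, b, min_len):
    """Length of the longest suffix of `a` equal to a prefix of `b`.

    Returns 0 if no such overlap of at least `min_len` letters exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def overlap_graph(reads, min_len):
    """Map each ordered pair (i, j) of read indices to its overlap length."""
    graph = {}
    for i, j in permutations(range(len(reads)), 2):
        k = overlap(reads[i], reads[j], min_len)
        if k:
            graph[(i, j)] = k
    return graph


reads = ["ACGTAC", "GTACGG", "ACGGTT"]
print(overlap_graph(reads, min_len=3))  # {(0, 1): 4, (1, 2): 4}
```

This all-pairs comparison is quadratic in the number of reads, which is exactly why genome-scale assemblers need more sophisticated (and, as in the abstract, parallel) constructions.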

Background: Genome assembly is a fundamental problem with multiple applications. Current technological limitations do not allow assembling of entire genomes and many programs have been designed to produce longer and more reliable contigs. Assessing the quality of these assemblies and comparing those produced by different tools is essential in choosing the best ones.

Motivation: Alignment of similar whole genomes is often performed using anchors given by the maximal exact matches (MEMs) between their sequences. In spite of a significant amount of research on this problem, the computation of MEMs for large genomes remains challenging. The leading current algorithms employ full text indexes, the sparse suffix array giving the best results.
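
To clarify what a maximal exact match is, here is a brute-force sketch that reports every exact match that cannot be extended to the left or to the right. It deliberately uses plain pairwise comparison rather than the suffix-array indexes mentioned above, so it is only suitable for tiny inputs; the function name and the `min_len` parameter are hypothetical.

```python
def naive_mems(a, b, min_len):
    """Return all MEMs between `a` and `b` as (pos_in_a, pos_in_b, length)."""
    mems = []
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i] != b[j]:
                continue
            # Skip starts that could be extended to the left.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend to the right as far as the letters agree.
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            if k >= min_len:
                mems.append((i, j, k))
    return mems


print(naive_mems("ACGTAC", "TACGTT", min_len=3))  # [(0, 1, 4), (3, 0, 3)]
```

The running time is proportional to the product of the sequence lengths (or worse on repetitive input), which illustrates why indexed approaches are needed for large genomes.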

Background: De novo genome assembly of next-generation sequencing data is one of the most important current problems in bioinformatics, essential in many biological applications. In spite of a significant amount of work in this area, better solutions are still very much needed.

Results: We present a new program, SAGE, for de novo genome assembly.

Next-generation sequencing technologies revolutionized the ways in which genetic information is obtained and have opened the door for many essential applications in biomedical sciences. Hundreds of gigabytes of data are being produced, and all applications are affected by the errors in the data. Many programs have been designed to correct these errors, most of them targeting the data produced by the dominant technology of Illumina.

Motivation: High-throughput next-generation sequencing technologies enable increasingly fast and affordable sequencing of genomes and transcriptomes, with a broad range of applications. The quality of the sequencing data is crucial for all applications. A significant portion of the data produced contains errors, and ever more efficient error correction programs are needed.

Background: DNA microarrays have become ubiquitous in biological and medical research. The most difficult problem that needs to be solved is the design of DNA oligonucleotides that (i) are highly specific, that is, bind only to the intended target, (ii) cover the highest possible number of genes, that is, all genes that allow such unique regions, and (iii) are computed fast. None of the existing programs meet all these criteria.

Summary: Multiple spaced seeds represent the current state-of-the-art for similarity search in bioinformatics, with applications in various areas such as sequence alignment, read mapping, oligonucleotide design, etc. We present SpEED, a software program that computes highly sensitive multiple spaced seeds. SpEED can be several orders of magnitude faster and computes better seeds than the existing leading software programs.

Background: DNA oligonucleotides are a very useful tool in biology. The best algorithms for designing good DNA oligonucleotides are filtering out unsuitable regions using a seeding approach. Determining the quality of the seeds is crucial for the performance of these algorithms.

We report on a major update (version 2) of the original SHort Read Mapping Program (SHRiMP). SHRiMP2 primarily targets mapping sensitivity, and is able to achieve high accuracy at a very reasonable speed. SHRiMP2 supports both letter space and color space (AB/SOLiD) reads, enables direct alignment of paired reads, and uses parallel computation to fully utilize multi-core architectures.

Motivation: High-throughput sequencing technologies produce very large amounts of data, and sequencing errors constitute one of the major problems in analyzing such data. Current algorithms for correcting these errors are not very accurate and do not automatically adapt to the given data.

Results: We present HiTEC, an algorithm that provides a highly accurate, robust and fully automated method to correct reads produced by high-throughput sequencing methods.

Motivation: Alignment of biological sequences is one of the most frequently performed computer tasks. The current state of the art involves the use of (multiple) spaced seeds for producing high quality alignments. A particularly important class is that of neighbor seeds, which combine high sensitivity with reduced space requirements.

Motivation: Homology search finds similar segments between two biological sequences, such as DNA or protein sequences. The introduction of optimal spaced seeds in PatternHunter has increased both the sensitivity and the speed of homology search, and it has been adopted by many alignment programs such as BLAST. With the further improvement provided by multiple spaced seeds in PatternHunterII, Smith-Waterman sensitivity is approached at BLASTn speed.
