Background: The presence of B cells in early-stage non-small cell lung cancer (NSCLC) is associated with longer survival; however, the role these cells play in the generation and maintenance of anti-tumor immunity is unclear. B cells differentiate into a variety of subsets with differing characteristics and functions. To date, there is limited information on the specific B cell subsets found within NSCLC.
Background: Numerous microarray-based prognostic gene expression signatures of primary neoplasms have been published, but often with little concurrence between studies, which limits their clinical utility. We describe a methodology using logistic regression, which circumvents limitations of conventional Kaplan-Meier analysis. We applied this approach to a thrice-analyzed and published squamous cell carcinoma (SQCC) of the lung data set, with the objective of identifying gene expressions predictive of early death versus long survival in early-stage disease.
Pearson correlation analysis of expression data from the Lymphoma/Leukemia Molecular Profiling Project (LLMPP) demonstrated that Aurora A and B are highly correlated with MYC in DLBCL and mantle cell lymphoma (MCL), while both Auroras correlate with BCL2 only in DLBCL. Auroras are up-regulated by MYC dysregulation, with associated aneuploidy and resistance to microtubule-targeted agents such as vincristine. Myc and Bcl2 are differentially expressed in U-2932, TMD-8, OCI-Ly10 and Granta-519, but only U-2932 cells over-express mutated p53.
Background: Inflammatory breast cancer (IBC) is a rare, highly aggressive form of breast cancer. The mechanism of IBC carcinogenesis remains unknown. We sought to evaluate potential genetic risk factors for IBC and whether the IBC cell lines SUM149 and SUM190 demonstrated evidence of viral infection.
Sequence alignment editors enable the user to manually edit a multiple sequence alignment (msa) in order to obtain a more reasonable or expected alignment. Editors allow sequences to be reordered and/or modified using the computer's cut and paste commands. They are designed to accept various msa formats and to provide the output file in a suitable user-designated format.
Finding a global optimal alignment of more than two sequences that includes matches, mismatches, and gaps and that takes into account the degree of variation in all of the sequences at the same time is especially difficult. The dynamic programming algorithm used for optimal alignment of pairs of sequences can be extended to global alignment of three sequences, but for more than three sequences, only a small number of relatively short sequences may be analyzed. Thus, approximate methods are used for global alignment.
Finding a global optimal alignment of more than two sequences that includes matches, mismatches, and gaps and that takes into account the degree of variation in all of the sequences at the same time is especially difficult. The dynamic programming algorithm used for optimal alignment of pairs of sequences can be extended to global alignment of three sequences, but for more than three sequences, only a small number of relatively short sequences may be analyzed. Thus, approximate methods are used for global sequence alignment.
A hidden Markov model (HMM) is a probabilistic model of a multiple sequence alignment (msa) of proteins. In the model, each column of symbols in the alignment is represented by a frequency distribution of the symbols (called a "state"), and insertions and deletions are represented by other states. One moves through the model along a particular path from state to state in a Markov chain (i.
Cold Spring Harb Protoc
July 2009
It is difficult to find a global optimal alignment of more than two sequences (and, especially, more than three) that includes matches, mismatches, and gaps and that takes into account the degree of variation in all of the sequences at the same time. Thus, approximate methods are used, such as progressive global alignment, iterative global alignment, alignments based on locally conserved patterns found in the same order in the sequences, statistical methods that generate probabilistic models of the sequences, and multiple sequence alignments produced by graph-based methods. When 10 or more sequences are being compared, it is common to begin by determining sequence similarities between all pairs of sequences in the set.
Introduction: To obtain the best possible alignment between two sequences, it is necessary to include gaps in sequence alignments and use gap penalties. For aligning DNA sequences, a simple positive score for matches and a negative score for mismatches and gaps are most often used. To score matches and mismatches in alignments of proteins, it is necessary to know how often one amino acid is substituted for another in related proteins.
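The DNA scoring scheme described above (a positive score for matches, negative scores for mismatches and gaps) can be illustrated with a minimal sketch of global alignment scoring by dynamic programming. The function name and the particular score values (+1, -1, -2) are assumptions chosen for illustration; the abstract does not specify them, and a linear gap penalty is assumed for simplicity.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score of a and b (linear gap penalty).

    The score values are illustrative assumptions, not recommended settings.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against nothing costs one gap per character.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            # Diagonal move: align a[i-1] with b[j-1] (match or mismatch).
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Vertical or horizontal move: place a gap in one sequence.
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            score[i][j] = max(diag, up, left)

    return score[-1][-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))
```

With these assumed values, aligning ACGT against AGT is best done by opening one gap opposite the C, which is why a gap penalty is needed at all: without gaps the two sequences could only be aligned with mismatches.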
Introduction: The original Dayhoff percent accepted mutation (PAM) matrices were developed based on a small number of protein sequences and an evolutionary model of protein change. By extrapolating from the observed changes at small evolutionary distances to large ones, it was possible to establish a PAM250 scoring matrix for sequences that were highly divergent. Another approach to finding a scoring matrix for divergent sequences is to start with a more divergent set of sequences and produce a scoring matrix from the substitutions found in those less-related sequences.
Introduction: Certain amino acid substitutions commonly occur in related proteins from different species. Because a protein still functions with these substitutions, the substituted amino acids are compatible with protein structure and function. Knowing the types of changes that are most and least common in a large number of proteins can assist with predicting alignments for any set of protein sequences.
Introduction: Comparing different amino acid scoring matrix-gap penalty combinations poses several problems. For example, the analysis often overlooks the purposes of different matrices; e.g.
Introduction: The choice of a scoring system including scores for matches, mismatches, substitutions, insertions, and deletions influences the alignment of both DNA and protein sequences. To score matches and mismatches in alignments of proteins, it is necessary to know how often one amino acid is substituted for another in related proteins. Percent accepted mutation (PAM) matrices list the likelihood of change from one amino acid to another in homologous protein sequences during evolution and thus are focused on tracking the evolutionary origins of proteins.
Introduction: The percent accepted mutation (PAM) scoring matrix is based on the Dayhoff model of protein evolution, which is a Markov process. In the Markov model of amino acid change, the probability of mutation at each site is independent of the previous history of mutations. Use of this model makes it possible to extrapolate amino acid substitutions observed over a relatively short period of evolutionary time to longer periods of evolutionary time.
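The extrapolation step can be sketched as repeated multiplication of a one-step transition matrix by itself, which is what the Markov assumption permits. The sketch below uses a made-up three-letter alphabet with invented probabilities; these are not Dayhoff's values, and the function names are hypothetical.

```python
def mat_mul(x, y):
    """Multiply two square matrices given as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def extrapolate(step, n):
    """Return step raised to the n-th power (n Markov steps)."""
    size = len(step)
    # Start from the identity matrix: zero steps means no change.
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, step)
    return result


# Toy one-step matrix (assumed values): row i gives the probability that
# symbol i stays the same or changes to each other symbol in one step.
TOY_STEP = [
    [0.990, 0.006, 0.004],
    [0.006, 0.990, 0.004],
    [0.004, 0.004, 0.992],
]

if __name__ == "__main__":
    far = extrapolate(TOY_STEP, 250)
    for row in far:
        print([round(p, 3) for p in row])
```

Each row of the extrapolated matrix still sums to 1, and the probability of a symbol remaining unchanged falls as the number of steps grows, which mirrors how a matrix for a short evolutionary distance is extended to a longer one.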
Introduction: Maximum likelihood (ML) methods are especially useful for phylogenetic prediction when there is considerable variation among the sequences in the multiple sequence alignment (msa) to be analyzed. ML methods start with a simple model, in this case a model of rates of evolutionary change in nucleic acid or protein sequences and tree models that represent a pattern of evolutionary change, and then adjust the model until there is a best fit to the observed data. For phylogenetic analysis, the observed data are the observed sequence variations found within the columns of an msa.
Introduction: Phylogenetic analysis of a multiple sequence alignment (msa) can be performed using distance methods, which are based on genetic distances between sequence pairs in an msa. The genetic distance between two sequences is the fraction of aligned positions in which the sequence has been changed. In contrast, sequence identity is the fraction of the aligned positions that are identical.
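The two definitions above can be computed directly from a pair of aligned sequences. This is a minimal sketch: the function name is hypothetical, and ignoring columns that contain a gap in either sequence is an assumption, since the abstract does not say how gaps are treated.

```python
def identity_and_distance(seq_a, seq_b, gap="-"):
    """Return (identity, distance) for two rows of an alignment.

    Columns with a gap in either sequence are skipped (an assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")

    # Keep only the columns where both sequences have a residue.
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != gap and y != gap]
    if not pairs:
        raise ValueError("no aligned positions to compare")

    same = sum(1 for x, y in pairs if x == y)
    identity = same / len(pairs)
    # Distance is the fraction of compared positions that differ.
    return identity, 1.0 - identity


if __name__ == "__main__":
    print(identity_and_distance("ACGT-A", "ACCTTA"))
```

Under this definition the two quantities always sum to 1, so a distance method that starts from identities only needs one of them per sequence pair.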
Introduction: Maximum parsimony predicts the evolutionary tree or trees that minimize the number of steps required to generate the observed variation in the sequences from common ancestral sequences. For this reason, the method is also sometimes referred to as the minimum evolution method. A multiple sequence alignment (msa) is required to predict which sequence positions are likely to correspond.
Introduction: Three methods (maximum parsimony, distance, and maximum likelihood) are generally used to find the evolutionary tree or trees that best account for the observed variation in a group of sequences. Each of these methods uses a different type of analysis. Programs based on distance methods are commonly used in the molecular biology laboratory because they are straightforward and can be used with a large number of sequences.
Introduction: The BLAST algorithm was developed as a way to perform DNA and protein sequence similarity searches by an algorithm that is faster than FASTA but considered to be equally sensitive. Both of these methods follow a heuristic (tried-and-true) method that almost always works to find related sequences in a database search but, unlike the dynamic programming algorithm, does not guarantee an optimal solution. FASTA finds short common patterns in query and database sequences and joins these into an alignment.
Introduction: FASTA is a program for rapid alignment of pairs of protein and DNA sequences. Rather than comparing individual residues in the two sequences, FASTA searches for matching sequence patterns or words, called k-tuples. These patterns comprise k consecutive matches of letters in both sequences.
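The idea of matching words rather than single residues can be sketched with a small lookup table. This is not the FASTA program itself, only an illustration of finding shared k-tuples; the function name and the default k=2 are assumptions.

```python
from collections import defaultdict


def ktuple_matches(query, target, k=2):
    """Return (query_pos, target_pos) pairs where a k-letter word matches."""
    # Index every k-letter word of the target by its start position.
    index = defaultdict(list)
    for j in range(len(target) - k + 1):
        index[target[j:j + k]].append(j)

    # Look up each k-letter word of the query in the index.
    hits = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            hits.append((i, j))
    return hits


if __name__ == "__main__":
    print(ktuple_matches("ACGT", "TACG", k=3))
```

Indexing one sequence first means each word in the other sequence is found by a single lookup, which is where the speed advantage over residue-by-residue comparison comes from.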
Introduction: Database similarity search programs tend to produce large volumes of output. It can become difficult to screen this volume of material and to assess whether the more remotely related sequences are really related to the query sequence. Thus, it is important to limit the sequence output; there are some relatively simple procedures that may be followed for each program, as described in this article.
Introduction: The BLAST algorithm performs DNA and protein sequence similarity searches by an algorithm that is faster than FASTA but considered to be equally sensitive. BLAST is very popular due to availability of the program on the World Wide Web through a large server at the National Center for Biotechnology Information (NCBI) and at many other sites. The BLAST algorithm has evolved to provide a set of very powerful search tools for the molecular biologist that are freely available to run on many computer platforms.
Introduction: The following strategy is recommended for searches with FASTA for finding the most homologous sequences in a database search while avoiding false-negative matches.
Introduction: A dot matrix analysis is primarily a method for comparing two sequences to look for possible alignment of characters between the sequences. The method is also used for finding direct or inverted repeats in protein and DNA sequences, and for predicting regions in RNA that are self-complementary and that, therefore, have the potential of forming secondary structure through base-pairing.
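A dot matrix comparison can be sketched in a few lines: one sequence labels the rows, the other the columns, and a mark is placed wherever the characters match. The function name and the choice of "*" and "." as symbols are assumptions, and no window or stringency filtering is applied in this simplest form.

```python
def dot_matrix(seq_a, seq_b):
    """Return one text row per character of seq_a; '*' marks a match."""
    return ["".join("*" if a == b else "." for b in seq_b) for a in seq_a]


if __name__ == "__main__":
    for row in dot_matrix("GATTACA", "GATCACA"):
        print(row)
```

Runs of marks along a diagonal indicate regions that could be aligned, and comparing a sequence with itself in the same way exposes direct repeats as off-diagonal runs.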